Training: 2022-07-05 14:25:06,695-rank_id: 0
Training: 2022-07-05 14:26:27,213-: margin_list [1.0, 0.0, 0.4]
Training: 2022-07-05 14:26:27,214-: network vit_b_dp005_mask_005
Training: 2022-07-05 14:26:27,214-: resume False
Training: 2022-07-05 14:26:27,214-: save_all_states False
Training: 2022-07-05 14:26:27,214-: output work_dirs/wf42m_pfc03_40epoch_8gpu_vit_b
Training: 2022-07-05 14:26:27,215-: embedding_size 512
Training: 2022-07-05 14:26:27,215-: sample_rate 0.3
Training: 2022-07-05 14:26:27,215-: interclass_filtering_threshold 0
Training: 2022-07-05 14:26:27,215-: fp16 True
Training: 2022-07-05 14:26:27,215-: batch_size 256
Training: 2022-07-05 14:26:27,215-: optimizer adamw
Training: 2022-07-05 14:26:27,215-: lr 0.001
Training: 2022-07-05 14:26:27,215-: momentum 0.9
Training: 2022-07-05 14:26:27,215-: weight_decay 0.1
Training: 2022-07-05 14:26:27,215-: verbose 2000
Training: 2022-07-05 14:26:27,215-: frequent 10
Training: 2022-07-05 14:26:27,215-: dali True
Training: 2022-07-05 14:26:27,215-: gradient_acc 12
Training: 2022-07-05 14:26:27,215-: seed 2048
Training: 2022-07-05 14:26:27,215-: num_workers 2
Training: 2022-07-05 14:26:27,215-: rec /train_tmp/WebFace42M
Training: 2022-07-05 14:26:27,215-: num_classes 2059906
Training: 2022-07-05 14:26:27,215-: num_image 42474557
Training: 2022-07-05 14:26:27,215-: num_epoch 40
Training: 2022-07-05 14:26:27,215-: warmup_epoch 4
Training: 2022-07-05 14:26:27,215-: val_targets []
Training: 2022-07-05 14:26:27,215-: total_batch_size 2048
Training: 2022-07-05 14:26:27,215-: warmup_step 82956
Training: 2022-07-05 14:26:27,215-: total_step 829560
Training: 2022-07-05 14:26:29,158-Reducer buckets have been rebuilt in this iteration.
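The schedule fields at the end of the config dump (`total_batch_size`, `warmup_step`, `total_step`) are consistent with simple arithmetic on the other settings. The sketch below re-derives them under assumptions the log does not state: 8 GPUs (inferred from `8gpu` in the output path and from 2048 / 256), steps per epoch computed as `num_image // total_batch_size`, a linear warmup from 0 to `lr`, and `Speed` counting samples across all GPUs. The helpers `warmup_lr` and `eta_hours` are hypothetical, not functions from the training script.

```python
# Sketch: cross-check the schedule fields logged in the config dump.
# Assumptions (not confirmed by the log): 8 GPUs, floor division for steps
# per epoch, and a linear warmup from 0 to the base learning rate.

NUM_IMAGE = 42_474_557       # num_image
BATCH_SIZE_PER_GPU = 256     # batch_size
NUM_GPUS = 8                 # assumed from "8gpu" in the output path
NUM_EPOCH = 40               # num_epoch
WARMUP_EPOCH = 4             # warmup_epoch
BASE_LR = 0.001              # lr

total_batch_size = BATCH_SIZE_PER_GPU * NUM_GPUS   # 2048
steps_per_epoch = NUM_IMAGE // total_batch_size    # 20739
warmup_step = steps_per_epoch * WARMUP_EPOCH       # 82956
total_step = steps_per_epoch * NUM_EPOCH           # 829560


def warmup_lr(global_step: int) -> float:
    """Hypothetical linear warmup; only meaningful while global_step <= warmup_step."""
    return BASE_LR * global_step / warmup_step


def eta_hours(global_step: int, samples_per_sec: float) -> float:
    """Hypothetical ETA: remaining samples divided by the current global throughput."""
    remaining_samples = (total_step - global_step) * total_batch_size
    return remaining_samples / samples_per_sec / 3600.0


if __name__ == "__main__":
    print(total_batch_size, warmup_step, total_step)  # 2048 82956 829560
    print(f"{warmup_lr(1000):.6f}")                   # 0.000012
    print(round(eta_hours(1000, 2491.84)))            # 189
```

Under these assumptions the numbers line up with the entries that follow: at Global Step 1000 the sketch gives a learning rate of 0.000012 and roughly 189 hours remaining, matching the logged `LearningRate` and `Required` values. The training script may compute its ETA differently; this is only a cross-check.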
Training: 2022-07-05 14:26:43,994-Speed 2465.81 samples/sec Loss 42.8759 LearningRate 0.000000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 65536 Required: 176 hours
Training: 2022-07-05 14:26:52,191-Speed 2499.56 samples/sec Loss 42.8873 LearningRate 0.000000 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-05 14:27:00,390-Speed 2498.03 samples/sec Loss 42.8853 LearningRate 0.000000 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 65536 Required: 185 hours
Training: 2022-07-05 14:27:08,603-Speed 2494.24 samples/sec Loss 42.8770 LearningRate 0.000001 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 65536 Required: 185 hours
Training: 2022-07-05 14:27:16,816-Speed 2494.31 samples/sec Loss 42.9002 LearningRate 0.000001 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 65536 Required: 185 hours
Training: 2022-07-05 14:27:24,960-Speed 2514.86 samples/sec Loss 42.8944 LearningRate 0.000001 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 65536 Required: 185 hours
Training: 2022-07-05 14:27:33,159-Speed 2498.31 samples/sec Loss 42.9074 LearningRate 0.000001 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 65536 Required: 185 hours
Training: 2022-07-05 14:27:41,359-Speed 2497.93 samples/sec Loss 42.8974 LearningRate 0.000001 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 65536 Required: 187 hours
Training: 2022-07-05 14:27:49,587-Speed 2489.40 samples/sec Loss 42.8989 LearningRate 0.000001 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 65536 Required: 187 hours
Training: 2022-07-05 14:27:57,786-Speed 2498.34 samples/sec Loss 42.8683 LearningRate 0.000001 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 65536 Required: 187 hours
Training: 2022-07-05 14:28:05,988-Speed 2497.85 samples/sec Loss 42.9023 LearningRate 0.000001 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 65536 Required: 187 hours
Training: 2022-07-05 14:28:14,136-Speed 2514.08 samples/sec Loss 42.8960 LearningRate 0.000002 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 65536 Required: 186 hours
Training: 2022-07-05 14:28:22,336-Speed 2498.79 samples/sec Loss 42.8896 LearningRate 0.000002 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:28:30,539-Speed 2497.19 samples/sec Loss 42.8854 LearningRate 0.000002 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:28:38,737-Speed 2498.62 samples/sec Loss 42.8915 LearningRate 0.000002 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 65536 Required: 187 hours
Training: 2022-07-05 14:28:46,942-Speed 2496.30 samples/sec Loss 42.8592 LearningRate 0.000002 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 65536 Required: 187 hours
Training: 2022-07-05 14:28:55,141-Speed 2498.25 samples/sec Loss 42.8656 LearningRate 0.000002 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 65536 Required: 187 hours
Training: 2022-07-05 14:29:03,299-Speed 2510.74 samples/sec Loss 42.8576 LearningRate 0.000002 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:29:11,497-Speed 2498.56 samples/sec Loss 42.8673 LearningRate 0.000002 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:29:19,696-Speed 2498.37 samples/sec Loss 42.8688 LearningRate 0.000003 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:29:27,897-Speed 2497.73 samples/sec Loss 42.8638 LearningRate 0.000003 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:29:36,104-Speed 2496.24 samples/sec Loss 42.8426 LearningRate 0.000003 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 65536 Required: 187 hours
Training: 2022-07-05 14:29:44,303-Speed 2498.07 samples/sec Loss 42.8292 LearningRate 0.000003 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:29:52,450-Speed 2514.51 samples/sec Loss 42.8466 LearningRate 0.000003 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:30:00,650-Speed 2497.74 samples/sec Loss 42.8370 LearningRate 0.000003 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:30:08,855-Speed 2496.68 samples/sec Loss 42.8421 LearningRate 0.000003 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:30:17,055-Speed 2497.78 samples/sec Loss 42.8396 LearningRate 0.000003 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:30:25,261-Speed 2496.12 samples/sec Loss 42.8021 LearningRate 0.000003 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:30:33,471-Speed 2495.08 samples/sec Loss 42.8027 LearningRate 0.000004 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:30:41,627-Speed 2511.42 samples/sec Loss 42.8220 LearningRate 0.000004 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:30:49,837-Speed 2494.67 samples/sec Loss 42.8152 LearningRate 0.000004 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:30:58,044-Speed 2495.96 samples/sec Loss 42.7693 LearningRate 0.000004 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:31:06,256-Speed 2494.25 samples/sec Loss 42.8121 LearningRate 0.000004 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:31:14,463-Speed 2495.79 samples/sec Loss 42.8171 LearningRate 0.000004 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:31:22,676-Speed 2494.10 samples/sec Loss 42.7638 LearningRate 0.000004 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:31:30,836-Speed 2510.17 samples/sec Loss 42.7814 LearningRate 0.000004 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:31:39,045-Speed 2495.31 samples/sec Loss 42.7696 LearningRate 0.000005 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:31:47,252-Speed 2496.07 samples/sec Loss 42.7520 LearningRate 0.000005 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:31:55,461-Speed 2495.24 samples/sec Loss 42.7406 LearningRate 0.000005 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:32:03,670-Speed 2495.31 samples/sec Loss 42.7321 LearningRate 0.000005 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:32:11,881-Speed 2494.48 samples/sec Loss 42.7013 LearningRate 0.000005 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:32:20,037-Speed 2511.30 samples/sec Loss 42.6830 LearningRate 0.000005 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:32:28,245-Speed 2495.81 samples/sec Loss 42.6980 LearningRate 0.000005 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:32:36,455-Speed 2494.98 samples/sec Loss 42.6577 LearningRate 0.000005 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:32:44,678-Speed 2490.84 samples/sec Loss 42.6406 LearningRate 0.000006 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:32:52,886-Speed 2495.64 samples/sec Loss 42.6143 LearningRate 0.000006 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:33:01,095-Speed 2495.17 samples/sec Loss 42.6261 LearningRate 0.000006 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:33:09,258-Speed 2509.35 samples/sec Loss 42.6473 LearningRate 0.000006 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:33:17,468-Speed 2494.61 samples/sec Loss 42.5694 LearningRate 0.000006 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:33:25,682-Speed 2494.00 samples/sec Loss 42.5494 LearningRate 0.000006 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:33:33,903-Speed 2491.54 samples/sec Loss 42.5700 LearningRate 0.000006 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:33:42,111-Speed 2495.68 samples/sec Loss 42.5329 LearningRate 0.000006 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:33:50,318-Speed 2495.88 samples/sec Loss 42.4547 LearningRate 0.000006 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:33:58,473-Speed 2511.44 samples/sec Loss 42.4910 LearningRate 0.000007 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:34:06,701-Speed 2489.77 samples/sec Loss 42.4582 LearningRate 0.000007 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:34:14,916-Speed 2493.37 samples/sec Loss 42.4272 LearningRate 0.000007 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:34:23,130-Speed 2493.67 samples/sec Loss 42.4211 LearningRate 0.000007 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:34:31,341-Speed 2494.46 samples/sec Loss 42.4027 LearningRate 0.000007 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:34:39,553-Speed 2494.43 samples/sec Loss 42.3545 LearningRate 0.000007 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:34:47,714-Speed 2509.73 samples/sec Loss 42.3302 LearningRate 0.000007 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:34:55,927-Speed 2493.88 samples/sec Loss 42.2536 LearningRate 0.000007 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:35:04,139-Speed 2494.43 samples/sec Loss 42.2492 LearningRate 0.000008 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:35:12,353-Speed 2493.64 samples/sec Loss 42.2125 LearningRate 0.000008 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:35:20,565-Speed 2494.46 samples/sec Loss 42.2106 LearningRate 0.000008 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:35:28,777-Speed 2494.00 samples/sec Loss 42.1817 LearningRate 0.000008 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:35:36,939-Speed 2509.71 samples/sec Loss 42.1424 LearningRate 0.000008 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:35:45,153-Speed 2493.87 samples/sec Loss 42.0933 LearningRate 0.000008 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:35:53,368-Speed 2493.51 samples/sec Loss 42.0168 LearningRate 0.000008 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:36:01,580-Speed 2494.13 samples/sec Loss 42.0148 LearningRate 0.000008 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:36:09,796-Speed 2493.10 samples/sec Loss 41.9782 LearningRate 0.000009 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:36:18,011-Speed 2493.61 samples/sec Loss 41.9445 LearningRate 0.000009 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:36:26,169-Speed 2510.62 samples/sec Loss 41.8653 LearningRate 0.000009 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:36:34,383-Speed 2493.73 samples/sec Loss 41.8537 LearningRate 0.000009 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:36:42,603-Speed 2492.14 samples/sec Loss 41.8026 LearningRate 0.000009 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:36:50,817-Speed 2493.51 samples/sec Loss 41.7859 LearningRate 0.000009 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:36:59,031-Speed 2493.95 samples/sec Loss 41.7256 LearningRate 0.000009 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:37:07,242-Speed 2494.45 samples/sec Loss 41.6653 LearningRate 0.000009 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:37:15,407-Speed 2508.86 samples/sec Loss 41.6417 LearningRate 0.000010 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:37:23,621-Speed 2493.70 samples/sec Loss 41.6083 LearningRate 0.000010 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:37:31,837-Speed 2492.90 samples/sec Loss 41.5331 LearningRate 0.000010 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:37:40,053-Speed 2493.33 samples/sec Loss 41.5065 LearningRate 0.000010 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:37:48,267-Speed 2493.55 samples/sec Loss 41.4799 LearningRate 0.000010 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:37:56,486-Speed 2492.26 samples/sec Loss 41.4267 LearningRate 0.000010 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:38:04,651-Speed 2508.79 samples/sec Loss 41.3923 LearningRate 0.000010 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:38:12,865-Speed 2494.15 samples/sec Loss 41.3296 LearningRate 0.000010 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:38:21,080-Speed 2493.35 samples/sec Loss 41.3027 LearningRate 0.000010 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:38:29,295-Speed 2493.24 samples/sec Loss 41.3163 LearningRate 0.000011 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:38:37,510-Speed 2493.58 samples/sec Loss 41.2329 LearningRate 0.000011 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:38:45,723-Speed 2494.08 samples/sec Loss 41.1736 LearningRate 0.000011 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:38:53,884-Speed 2509.91 samples/sec Loss 41.1384 LearningRate 0.000011 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:39:02,103-Speed 2492.27 samples/sec Loss 41.0751 LearningRate 0.000011 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:39:10,325-Speed 2491.35 samples/sec Loss 41.0840 LearningRate 0.000011 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:39:18,536-Speed 2494.44 samples/sec Loss 40.9867 LearningRate 0.000011 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:39:26,749-Speed 2493.91 samples/sec Loss 41.0030 LearningRate 0.000011 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:39:34,964-Speed 2493.62 samples/sec Loss 40.9342 LearningRate 0.000012 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:39:43,126-Speed 2509.58 samples/sec Loss 40.9109 LearningRate 0.000012 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:39:51,342-Speed 2493.46 samples/sec Loss 40.8690 LearningRate 0.000012 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:39:59,557-Speed 2493.32 samples/sec Loss 40.8253 LearningRate 0.000012 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:40:07,778-Speed 2491.84 samples/sec Loss 40.8094 LearningRate 0.000012 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:40:16,004-Speed 2490.11 samples/sec Loss 40.7978 LearningRate 0.000012 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:40:24,227-Speed 2491.23 samples/sec Loss 40.7375 LearningRate 0.000012 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:40:32,392-Speed 2508.40 samples/sec Loss 40.7021 LearningRate 0.000012 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:40:40,610-Speed 2493.01 samples/sec Loss 40.6558 LearningRate 0.000013 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:40:48,824-Speed 2493.51 samples/sec Loss 40.6244 LearningRate 0.000013 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:40:57,041-Speed 2492.83 samples/sec Loss 40.5895 LearningRate 0.000013 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:41:05,257-Speed 2493.17 samples/sec Loss 40.5489 LearningRate 0.000013 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:41:13,473-Speed 2493.40 samples/sec Loss 40.5004 LearningRate 0.000013 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:41:21,645-Speed 2506.56 samples/sec Loss 40.4834 LearningRate 0.000013 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:41:29,869-Speed 2490.39 samples/sec Loss 40.4509 LearningRate 0.000013 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:41:38,093-Speed 2490.86 samples/sec Loss 40.4129 LearningRate 0.000013 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:41:46,320-Speed 2489.63 samples/sec Loss 40.3963 LearningRate 0.000013 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:41:54,542-Speed 2491.35 samples/sec Loss 40.3413 LearningRate 0.000014 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:42:02,765-Speed 2491.06 samples/sec Loss 40.3600 LearningRate 0.000014 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:42:10,937-Speed 2506.62 samples/sec Loss 40.3071 LearningRate 0.000014 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:42:19,154-Speed 2492.45 samples/sec Loss 40.2816 LearningRate 0.000014 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:42:27,376-Speed 2491.58 samples/sec Loss 40.2409 LearningRate 0.000014 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:42:35,598-Speed 2491.11 samples/sec Loss 40.2048 LearningRate 0.000014 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:42:43,817-Speed 2492.37 samples/sec Loss 40.1737 LearningRate 0.000014 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:42:52,035-Speed 2492.37 samples/sec Loss 40.1687 LearningRate 0.000014 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:43:00,198-Speed 2509.15 samples/sec Loss 40.1047 LearningRate 0.000015 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 131072 Required: 188 hours
Training: 2022-07-05 14:43:08,413-Speed 2493.44 samples/sec Loss 40.1240 LearningRate 0.000015 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:43:16,631-Speed 2492.40 samples/sec Loss 40.0668 LearningRate 0.000015 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:43:24,854-Speed 2491.07 samples/sec Loss 40.0863 LearningRate 0.000015 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:43:33,068-Speed 2493.64 samples/sec Loss 40.0365 LearningRate 0.000015 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:43:41,287-Speed 2492.11 samples/sec Loss 40.0130 LearningRate 0.000015 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:43:49,449-Speed 2509.86 samples/sec Loss 39.9623 LearningRate 0.000015 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:43:57,669-Speed 2491.81 samples/sec Loss 39.9703 LearningRate 0.000015 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:44:05,915-Speed 2484.16 samples/sec Loss 39.9391 LearningRate 0.000016 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:44:14,137-Speed 2491.18 samples/sec Loss 39.9104 LearningRate 0.000016 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:44:22,359-Speed 2491.34 samples/sec Loss 39.9128 LearningRate 0.000016 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:44:30,574-Speed 2493.36 samples/sec Loss 39.8696 LearningRate 0.000016 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:44:38,733-Speed 2510.33 samples/sec Loss 39.8547 LearningRate 0.000016 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:44:46,950-Speed 2493.00 samples/sec Loss 39.8398 LearningRate 0.000016 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:44:55,163-Speed 2493.97 samples/sec Loss 39.8330 LearningRate 0.000016 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 131072 Required: 189 hours
Training: 2022-07-05 14:45:03,342-Speed 2504.35 samples/sec Loss 39.8090 LearningRate 0.000016 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:45:11,567-Speed 2490.18 samples/sec Loss 39.8100 LearningRate 0.000017 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:45:19,784-Speed 2492.97 samples/sec Loss 39.7835 LearningRate 0.000017 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:45:27,948-Speed 2508.86 samples/sec Loss 39.7600 LearningRate 0.000017 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:45:36,166-Speed 2492.61 samples/sec Loss 39.7410 LearningRate 0.000017 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:45:44,382-Speed 2492.89 samples/sec Loss 39.7251 LearningRate 0.000017 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:45:52,601-Speed 2492.20 samples/sec Loss 39.7112 LearningRate 0.000017 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:46:00,833-Speed 2488.14 samples/sec Loss 39.6894 LearningRate 0.000017 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:46:09,054-Speed 2491.74 samples/sec Loss 39.6846 LearningRate 0.000017 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:46:17,216-Speed 2509.75 samples/sec Loss 39.6686 LearningRate 0.000017 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 65536 Required: 188 hours
Training: 2022-07-05 14:46:25,433-Speed 2492.75 samples/sec Loss 39.6304 LearningRate 0.000018 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:46:33,655-Speed 2491.43 samples/sec Loss 39.6332 LearningRate 0.000018 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:46:41,875-Speed 2492.29 samples/sec Loss 39.6159 LearningRate 0.000018 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:46:50,094-Speed 2491.96 samples/sec Loss 39.6031 LearningRate 0.000018 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:46:58,313-Speed 2492.43 samples/sec Loss 39.5781 LearningRate 0.000018 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:47:06,479-Speed 2508.76 samples/sec Loss 39.5439 LearningRate 0.000018 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:47:14,698-Speed 2492.44 samples/sec Loss 39.5596 LearningRate 0.000018 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:47:22,939-Speed 2485.50 samples/sec Loss 39.5491 LearningRate 0.000018 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:47:31,159-Speed 2491.96 samples/sec Loss 39.5277 LearningRate 0.000019 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:47:39,378-Speed 2492.06 samples/sec Loss 39.5370 LearningRate 0.000019 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 65536 Required: 189 hours
Training: 2022-07-05 14:47:47,551-Speed 2506.39 samples/sec Loss 39.5212 LearningRate 0.000019 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:47:55,715-Speed 2508.67 samples/sec Loss 39.5071 LearningRate 0.000019 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:48:03,935-Speed 2491.97 samples/sec Loss 39.5238 LearningRate 0.000019 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:48:12,168-Speed 2488.14 samples/sec Loss 39.5029 LearningRate 0.000019 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:48:20,385-Speed 2492.62 samples/sec Loss 39.4922 LearningRate 0.000019 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:48:28,604-Speed 2492.20 samples/sec Loss 39.4906 LearningRate 0.000019 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:48:36,820-Speed 2493.26 samples/sec Loss 39.4733 LearningRate 0.000020 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:48:44,986-Speed 2508.50 samples/sec Loss 39.4583 LearningRate 0.000020 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:48:53,205-Speed 2491.90 samples/sec Loss 39.4431 LearningRate 0.000020 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 32768 Required: 188 hours
Training: 2022-07-05 14:49:01,429-Speed 2490.63 samples/sec Loss 39.4200 LearningRate 0.000020 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:49:09,655-Speed 2490.32 samples/sec Loss 39.4138 LearningRate 0.000020 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:49:17,881-Speed 2489.97 samples/sec Loss 39.4104 LearningRate 0.000020 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:49:26,100-Speed 2492.01 samples/sec Loss 39.4068 LearningRate 0.000020 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:49:34,263-Speed 2509.35 samples/sec Loss 39.3957 LearningRate 0.000020 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:49:42,484-Speed 2491.52 samples/sec Loss 39.3886 LearningRate 0.000020 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:49:50,703-Speed 2492.22 samples/sec Loss 39.3713 LearningRate 0.000021 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:49:58,924-Speed 2491.63 samples/sec Loss 39.3606 LearningRate 0.000021 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:50:07,140-Speed 2493.18 samples/sec Loss 39.3365 LearningRate 0.000021 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:50:15,361-Speed 2491.64 samples/sec Loss 39.3431 LearningRate 0.000021 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:50:23,530-Speed 2507.33 samples/sec Loss 39.3362 LearningRate 0.000021 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:50:31,759-Speed 2489.26 samples/sec Loss 39.3253 LearningRate 0.000021 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:50:39,976-Speed 2492.85 samples/sec Loss 39.3116 LearningRate 0.000021 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:50:48,194-Speed 2492.39 samples/sec Loss 39.3151 LearningRate 0.000021 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 32768 Required: 188 hours
Training: 2022-07-05 14:50:56,410-Speed 2493.22 samples/sec Loss 39.3063 LearningRate 0.000022 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:51:04,641-Speed 2488.28 samples/sec Loss 39.2966 LearningRate 0.000022 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:51:12,807-Speed 2508.66 samples/sec Loss 39.2991 LearningRate 0.000022 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:51:21,027-Speed 2491.99 samples/sec Loss 39.3025 LearningRate 0.000022 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:51:29,244-Speed 2492.57 samples/sec Loss 39.2947 LearningRate 0.000022 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:51:37,463-Speed 2492.11 samples/sec Loss 39.2834 LearningRate 0.000022 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:51:45,676-Speed 2494.00 samples/sec Loss 39.2751 LearningRate 0.000022 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:51:53,908-Speed 2488.39 samples/sec Loss 39.2654 LearningRate 0.000022 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:52:02,071-Speed 2509.14 samples/sec Loss 39.2503 LearningRate 0.000023 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:52:10,296-Speed 2490.23 samples/sec Loss 39.2596 LearningRate 0.000023 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:52:18,518-Speed 2491.46 samples/sec Loss 39.2560 LearningRate 0.000023 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:52:26,733-Speed 2493.41 samples/sec Loss 39.2417 LearningRate 0.000023 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:52:34,950-Speed 2492.79 samples/sec Loss 39.2342 LearningRate 0.000023 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:52:43,163-Speed 2493.93 samples/sec Loss 39.2584 LearningRate 0.000023 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 32768 Required: 188 hours
Training: 2022-07-05 14:52:51,331-Speed 2508.14 samples/sec Loss 39.2331 LearningRate 0.000023 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:52:59,559-Speed 2489.42 samples/sec Loss 39.2402 LearningRate 0.000023 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:53:07,788-Speed 2489.33 samples/sec Loss 39.2158 LearningRate 0.000023 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:53:16,002-Speed 2493.74 samples/sec Loss 39.2263 LearningRate 0.000024 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:53:24,218-Speed 2493.10 samples/sec Loss 39.2082 LearningRate 0.000024 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:53:32,434-Speed 2492.98 samples/sec Loss 39.2208 LearningRate 0.000024 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:53:40,598-Speed 2508.96 samples/sec Loss 39.2060 LearningRate 0.000024 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:53:48,815-Speed 2493.22 samples/sec Loss 39.2038 LearningRate 0.000024 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:53:57,026-Speed 2494.66 samples/sec Loss 39.2101 LearningRate 0.000024 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 32768 Required: 188 hours
Training: 2022-07-05 14:54:05,251-Speed 2490.43 samples/sec Loss 39.2039 LearningRate 0.000024 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:54:13,489-Speed 2486.47 samples/sec Loss 39.1803 LearningRate 0.000024 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 32768 Required: 189 hours
Training: 2022-07-05 14:54:21,665-Speed 2505.29 samples/sec Loss 39.1782 LearningRate 0.000025 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 16384 Required: 189 hours
Training: 2022-07-05 14:54:29,833-Speed 2507.56 samples/sec Loss 39.2004 LearningRate 0.000025 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 16384 Required: 189 hours
Training: 2022-07-05 14:54:38,048-Speed 2493.47 samples/sec Loss 39.2022 LearningRate 0.000025 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 16384 Required: 188 hours
Training: 2022-07-05 14:54:46,266-Speed 2492.91 samples/sec Loss 39.1836 LearningRate 0.000025 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 16384 Required: 189 hours
Training: 2022-07-05 14:54:54,481-Speed
2493.49 samples/sec Loss 39.1787 LearningRate 0.000025 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:55:02,700-Speed 2491.89 samples/sec Loss 39.1698 LearningRate 0.000025 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:55:10,914-Speed 2493.66 samples/sec Loss 39.1688 LearningRate 0.000025 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:55:19,077-Speed 2509.54 samples/sec Loss 39.1698 LearningRate 0.000025 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:55:27,295-Speed 2492.44 samples/sec Loss 39.1817 LearningRate 0.000026 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:55:35,509-Speed 2493.66 samples/sec Loss 39.1613 LearningRate 0.000026 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:55:43,729-Speed 2491.74 samples/sec Loss 39.1497 LearningRate 0.000026 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:55:51,959-Speed 2489.07 samples/sec Loss 39.1778 LearningRate 0.000026 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:56:00,191-Speed 2488.20 samples/sec Loss 39.1548 LearningRate 0.000026 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:56:08,359-Speed 2507.79 samples/sec Loss 39.1675 LearningRate 0.000026 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:56:16,599-Speed 2485.56 samples/sec Loss 39.1728 LearningRate 0.000026 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:56:24,830-Speed 2488.71 samples/sec Loss 39.1645 LearningRate 0.000026 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:56:33,046-Speed 2493.07 samples/sec 
Loss 39.1624 LearningRate 0.000027 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:56:41,262-Speed 2493.20 samples/sec Loss 39.1627 LearningRate 0.000027 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:56:49,478-Speed 2493.46 samples/sec Loss 39.1977 LearningRate 0.000027 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:56:57,640-Speed 2509.77 samples/sec Loss 39.1464 LearningRate 0.000027 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:57:05,849-Speed 2495.22 samples/sec Loss 39.1685 LearningRate 0.000027 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:57:14,073-Speed 2490.49 samples/sec Loss 39.1754 LearningRate 0.000027 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:57:22,314-Speed 2485.84 samples/sec Loss 39.1644 LearningRate 0.000027 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:57:30,528-Speed 2493.61 samples/sec Loss 39.1641 LearningRate 0.000027 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:57:38,744-Speed 2493.05 samples/sec Loss 39.1464 LearningRate 0.000027 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:57:46,906-Speed 2509.54 samples/sec Loss 39.1616 LearningRate 0.000028 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:57:55,122-Speed 2493.36 samples/sec Loss 39.1558 LearningRate 0.000028 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:58:03,339-Speed 2492.73 samples/sec Loss 39.1545 LearningRate 0.000028 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:58:11,555-Speed 2492.99 samples/sec Loss 39.1751 
LearningRate 0.000028 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:58:19,772-Speed 2493.06 samples/sec Loss 39.1537 LearningRate 0.000028 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:58:27,991-Speed 2492.32 samples/sec Loss 39.1511 LearningRate 0.000028 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:58:36,155-Speed 2508.97 samples/sec Loss 39.1352 LearningRate 0.000028 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:58:44,371-Speed 2493.06 samples/sec Loss 39.1398 LearningRate 0.000028 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:58:52,590-Speed 2492.29 samples/sec Loss 39.1414 LearningRate 0.000029 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:59:00,804-Speed 2493.74 samples/sec Loss 39.1390 LearningRate 0.000029 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:59:09,023-Speed 2492.14 samples/sec Loss 39.1466 LearningRate 0.000029 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:59:17,244-Speed 2491.73 samples/sec Loss 39.1326 LearningRate 0.000029 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 14:59:25,407-Speed 2509.22 samples/sec Loss 39.1272 LearningRate 0.000029 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:59:33,629-Speed 2491.45 samples/sec Loss 39.1112 LearningRate 0.000029 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:59:41,854-Speed 2490.43 samples/sec Loss 39.1052 LearningRate 0.000029 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:59:50,078-Speed 2490.68 samples/sec Loss 39.1137 LearningRate 
0.000029 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 14:59:58,301-Speed 2490.82 samples/sec Loss 39.1099 LearningRate 0.000030 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 16384 Required: 189 hours Training: 2022-07-05 15:00:06,519-Speed 2492.39 samples/sec Loss 39.1033 LearningRate 0.000030 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:00:14,683-Speed 2509.13 samples/sec Loss 39.1246 LearningRate 0.000030 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:00:22,857-Speed 2505.84 samples/sec Loss 39.0958 LearningRate 0.000030 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:00:31,078-Speed 2491.56 samples/sec Loss 39.0986 LearningRate 0.000030 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:00:39,314-Speed 2487.06 samples/sec Loss 39.0863 LearningRate 0.000030 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:00:47,532-Speed 2492.55 samples/sec Loss 39.0831 LearningRate 0.000030 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:00:55,749-Speed 2492.73 samples/sec Loss 39.0930 LearningRate 0.000030 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:01:03,916-Speed 2508.13 samples/sec Loss 39.0969 LearningRate 0.000030 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:01:12,132-Speed 2493.22 samples/sec Loss 39.0894 LearningRate 0.000031 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:01:20,351-Speed 2492.02 samples/sec Loss 39.0741 LearningRate 0.000031 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:01:28,568-Speed 2492.73 samples/sec Loss 39.0834 LearningRate 0.000031 Epoch: 0 Global 
Step: 2560 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:01:36,785-Speed 2492.98 samples/sec Loss 39.0770 LearningRate 0.000031 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:01:45,008-Speed 2491.17 samples/sec Loss 39.0761 LearningRate 0.000031 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:01:53,174-Speed 2508.07 samples/sec Loss 39.0786 LearningRate 0.000031 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:02:01,395-Speed 2491.82 samples/sec Loss 39.0875 LearningRate 0.000031 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:02:09,615-Speed 2492.01 samples/sec Loss 39.0863 LearningRate 0.000031 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:02:17,831-Speed 2493.12 samples/sec Loss 39.0678 LearningRate 0.000032 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:02:26,048-Speed 2492.58 samples/sec Loss 39.0777 LearningRate 0.000032 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:02:34,264-Speed 2493.39 samples/sec Loss 39.0708 LearningRate 0.000032 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:02:42,427-Speed 2509.27 samples/sec Loss 39.0835 LearningRate 0.000032 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:02:50,650-Speed 2490.99 samples/sec Loss 39.0775 LearningRate 0.000032 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:02:58,872-Speed 2491.32 samples/sec Loss 39.0713 LearningRate 0.000032 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:03:07,090-Speed 2492.45 samples/sec Loss 39.0761 LearningRate 0.000032 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 
8192 Required: 188 hours Training: 2022-07-05 15:03:15,313-Speed 2490.98 samples/sec Loss 39.0765 LearningRate 0.000032 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:03:23,531-Speed 2492.45 samples/sec Loss 39.0875 LearningRate 0.000033 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:03:31,698-Speed 2508.02 samples/sec Loss 39.0772 LearningRate 0.000033 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:03:39,917-Speed 2493.17 samples/sec Loss 39.0757 LearningRate 0.000033 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:03:48,135-Speed 2492.49 samples/sec Loss 39.0767 LearningRate 0.000033 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:03:56,361-Speed 2490.42 samples/sec Loss 39.0700 LearningRate 0.000033 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:04:04,579-Speed 2492.45 samples/sec Loss 39.0568 LearningRate 0.000033 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:04:12,799-Speed 2492.83 samples/sec Loss 39.0713 LearningRate 0.000033 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:04:20,981-Speed 2503.32 samples/sec Loss 39.0705 LearningRate 0.000033 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:04:29,209-Speed 2489.63 samples/sec Loss 39.0479 LearningRate 0.000033 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:04:37,430-Speed 2491.51 samples/sec Loss 39.0617 LearningRate 0.000034 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:04:45,661-Speed 2488.58 samples/sec Loss 39.0744 LearningRate 0.000034 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 8192 Required: 188 hours 
Training: 2022-07-05 15:04:53,876-Speed 2493.47 samples/sec Loss 39.0713 LearningRate 0.000034 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:05:02,095-Speed 2491.98 samples/sec Loss 39.0385 LearningRate 0.000034 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:05:10,257-Speed 2509.95 samples/sec Loss 39.0561 LearningRate 0.000034 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:05:18,477-Speed 2491.83 samples/sec Loss 39.0416 LearningRate 0.000034 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:05:26,694-Speed 2492.88 samples/sec Loss 39.0518 LearningRate 0.000034 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:05:34,911-Speed 2492.82 samples/sec Loss 39.0391 LearningRate 0.000034 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:05:43,127-Speed 2492.88 samples/sec Loss 39.0431 LearningRate 0.000035 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:05:51,342-Speed 2493.69 samples/sec Loss 39.0524 LearningRate 0.000035 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:05:59,512-Speed 2507.25 samples/sec Loss 39.0683 LearningRate 0.000035 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:06:07,741-Speed 2488.86 samples/sec Loss 39.0894 LearningRate 0.000035 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:06:15,953-Speed 2494.42 samples/sec Loss 39.1288 LearningRate 0.000035 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:06:24,167-Speed 2493.87 samples/sec Loss 39.1241 LearningRate 0.000035 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:06:32,395-Speed 2489.37 samples/sec Loss 39.1311 LearningRate 0.000035 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:06:40,615-Speed 2492.26 samples/sec Loss 39.1216 LearningRate 0.000035 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:06:48,783-Speed 2507.85 samples/sec Loss 39.0961 LearningRate 0.000036 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:06:57,002-Speed 2492.16 samples/sec Loss 39.0800 LearningRate 0.000036 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:07:05,226-Speed 2490.73 samples/sec Loss 39.0363 LearningRate 0.000036 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:07:13,445-Speed 2492.06 samples/sec Loss 39.0680 LearningRate 0.000036 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:07:21,662-Speed 2492.88 samples/sec Loss 39.0744 LearningRate 0.000036 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:07:29,875-Speed 2493.82 samples/sec Loss 39.0605 LearningRate 0.000036 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:07:38,039-Speed 2508.98 samples/sec Loss 39.0733 LearningRate 0.000036 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:07:46,255-Speed 2494.12 samples/sec Loss 39.0696 LearningRate 0.000036 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:07:54,470-Speed 2493.47 samples/sec Loss 39.0559 LearningRate 0.000037 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:08:02,683-Speed 2494.06 samples/sec Loss 39.0633 LearningRate 0.000037 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:08:10,897-Speed 2493.71 samples/sec Loss 39.0664 LearningRate 0.000037 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:08:19,117-Speed 2491.72 samples/sec Loss 39.0708 LearningRate 0.000037 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:08:27,277-Speed 2510.24 samples/sec Loss 39.0405 LearningRate 0.000037 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:08:35,493-Speed 2493.16 samples/sec Loss 39.0370 LearningRate 0.000037 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:08:43,707-Speed 2493.83 samples/sec Loss 39.0225 LearningRate 0.000037 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:08:51,922-Speed 2493.30 samples/sec Loss 39.0464 LearningRate 0.000037 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:09:00,142-Speed 2491.98 samples/sec Loss 39.0247 LearningRate 0.000037 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:09:08,359-Speed 2492.75 samples/sec Loss 39.0246 LearningRate 0.000038 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:09:16,525-Speed 2508.44 samples/sec Loss 39.0459 LearningRate 0.000038 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:09:24,741-Speed 2493.03 samples/sec Loss 39.0241 LearningRate 0.000038 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:09:32,954-Speed 2494.08 samples/sec Loss 39.0230 LearningRate 0.000038 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:09:41,171-Speed 2492.89 samples/sec Loss 39.0162 LearningRate 0.000038 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:09:49,384-Speed 2493.87 samples/sec Loss 39.0413 LearningRate 0.000038 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:09:57,605-Speed 2491.80 samples/sec Loss 39.0477 LearningRate 0.000038 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:10:05,777-Speed 2506.57 samples/sec Loss 39.0481 LearningRate 0.000038 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:10:13,994-Speed 2492.69 samples/sec Loss 39.0436 LearningRate 0.000039 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:10:22,209-Speed 2493.63 samples/sec Loss 39.0659 LearningRate 0.000039 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:10:30,433-Speed 2490.42 samples/sec Loss 39.0403 LearningRate 0.000039 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:10:38,649-Speed 2493.28 samples/sec Loss 39.0407 LearningRate 0.000039 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:10:46,870-Speed 2491.64 samples/sec Loss 39.0322 LearningRate 0.000039 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:10:55,033-Speed 2509.42 samples/sec Loss 39.0115 LearningRate 0.000039 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:11:03,251-Speed 2492.28 samples/sec Loss 39.0339 LearningRate 0.000039 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:11:11,485-Speed 2487.76 samples/sec Loss 39.0398 LearningRate 0.000039 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:11:19,697-Speed 2494.38 samples/sec Loss 39.0328 LearningRate 0.000040 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:11:27,910-Speed 2494.28 samples/sec Loss 39.0543 LearningRate 0.000040 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:11:36,124-Speed 2493.69 samples/sec Loss 39.0665 LearningRate 0.000040 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:11:44,285-Speed 2509.73 samples/sec Loss 39.0653 LearningRate 0.000040 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:11:52,499-Speed 2493.76 samples/sec Loss 39.0454 LearningRate 0.000040 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:12:00,714-Speed 2493.44 samples/sec Loss 39.0769 LearningRate 0.000040 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:12:08,929-Speed 2493.54 samples/sec Loss 39.0454 LearningRate 0.000040 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:12:17,143-Speed 2494.03 samples/sec Loss 39.0522 LearningRate 0.000040 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:12:25,355-Speed 2494.16 samples/sec Loss 39.0305 LearningRate 0.000040 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:12:33,529-Speed 2506.07 samples/sec Loss 39.0561 LearningRate 0.000041 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:12:41,750-Speed 2491.61 samples/sec Loss 39.0413 LearningRate 0.000041 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:12:49,961-Speed 2494.51 samples/sec Loss 39.0477 LearningRate 0.000041 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:12:58,175-Speed 2493.55 samples/sec Loss 39.0356 LearningRate 0.000041 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:13:06,390-Speed 2493.28 samples/sec Loss 39.0273 LearningRate 0.000041 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:13:14,601-Speed 2494.76 samples/sec Loss 39.0145 LearningRate 0.000041 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:13:22,760-Speed 2510.41 samples/sec Loss 39.0161 LearningRate 0.000041 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:13:30,973-Speed 2494.06 samples/sec Loss 39.0237 LearningRate 0.000041 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:13:39,187-Speed 2494.06 samples/sec Loss 39.0084 LearningRate 0.000042 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 8192 Required: 188 hours
Training: 2022-07-05 15:13:47,356-Speed 2507.51 samples/sec Loss 39.0091 LearningRate 0.000042 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:13:55,570-Speed 2493.74 samples/sec Loss 39.0105 LearningRate 0.000042 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:14:03,780-Speed 2495.11 samples/sec Loss 39.0144 LearningRate 0.000042 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:14:11,942-Speed 2509.40 samples/sec Loss 39.0354 LearningRate 0.000042 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:14:20,155-Speed 2494.18 samples/sec Loss 39.0051 LearningRate 0.000042 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:14:28,368-Speed 2494.02 samples/sec Loss 39.0209 LearningRate 0.000042 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:14:36,575-Speed 2495.60 samples/sec Loss 39.0176 LearningRate 0.000042 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:14:44,787-Speed 2494.39 samples/sec Loss 39.0299 LearningRate 0.000043 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:14:52,998-Speed 2494.55 samples/sec Loss 39.0007 LearningRate 0.000043 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:15:01,152-Speed 2512.09 samples/sec Loss 39.0667 LearningRate 0.000043 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:15:09,360-Speed 2495.69 samples/sec Loss 39.0930 LearningRate 0.000043 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:15:17,573-Speed 2494.72 samples/sec Loss 39.1050 LearningRate 0.000043 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:15:25,786-Speed 2494.14 samples/sec Loss 39.2027 LearningRate 0.000043 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:15:33,993-Speed 2495.78 samples/sec Loss 39.1396 LearningRate 0.000043 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:15:42,206-Speed 2494.16 samples/sec Loss 39.1316 LearningRate 0.000043 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:15:50,359-Speed 2512.42 samples/sec Loss 39.1426 LearningRate 0.000044 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:15:58,572-Speed 2493.95 samples/sec Loss 39.0824 LearningRate 0.000044 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:16:06,782-Speed 2494.93 samples/sec Loss 39.0853 LearningRate 0.000044 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:16:14,994-Speed 2494.43 samples/sec Loss 39.0594 LearningRate 0.000044 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:16:23,212-Speed 2492.13 samples/sec Loss 39.0735 LearningRate 0.000044 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:16:31,427-Speed 2493.38 samples/sec Loss 39.0694 LearningRate 0.000044 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:16:39,587-Speed 2510.51 samples/sec Loss 39.0349 LearningRate 0.000044 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:16:47,799-Speed 2494.37 samples/sec Loss 39.0522 LearningRate 0.000044 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:16:56,011-Speed 2494.28 samples/sec Loss 39.0491 LearningRate 0.000044 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:17:04,223-Speed 2494.36 samples/sec Loss 39.0544 LearningRate 0.000045 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:17:12,441-Speed 2492.45 samples/sec Loss 39.0580 LearningRate 0.000045 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:17:20,653-Speed 2494.48 samples/sec Loss 39.0389 LearningRate 0.000045 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:17:28,826-Speed 2506.07 samples/sec Loss 39.0405 LearningRate 0.000045 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:17:37,035-Speed 2495.54 samples/sec Loss 39.0600 LearningRate 0.000045 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:17:45,250-Speed 2493.39 samples/sec Loss 39.0514 LearningRate 0.000045 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:17:53,466-Speed 2493.17 samples/sec Loss 39.0429 LearningRate 0.000045 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:18:01,680-Speed 2493.52 samples/sec Loss 39.0455 LearningRate 0.000045 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:18:09,893-Speed 2493.89 samples/sec Loss 39.0449 LearningRate 0.000046 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:18:18,057-Speed 2509.32 samples/sec Loss 39.0275 LearningRate 0.000046 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:18:26,265-Speed 2495.36 samples/sec Loss 39.0547 LearningRate 0.000046 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:18:34,478-Speed 2494.26 samples/sec Loss 39.0499 LearningRate 0.000046 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:18:42,684-Speed 2496.02 samples/sec Loss 39.0292 LearningRate 0.000046 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:18:50,893-Speed 2495.08 samples/sec Loss 39.0499 LearningRate 0.000046 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:18:59,104-Speed 2495.01 samples/sec Loss 39.0383 LearningRate 0.000046 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:19:07,262-Speed 2510.65 samples/sec Loss 39.0473 LearningRate 0.000046 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:19:15,478-Speed 2492.98 samples/sec Loss 39.0344 LearningRate 0.000047 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:19:23,691-Speed 2494.51 samples/sec Loss 39.0302 LearningRate 0.000047 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:19:31,901-Speed 2494.96 samples/sec Loss 39.0174 LearningRate 0.000047 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:19:40,119-Speed 2492.34 samples/sec Loss 39.0312 LearningRate 0.000047 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:19:48,326-Speed 2495.78 samples/sec Loss 39.0284 LearningRate 0.000047 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:19:56,484-Speed 2510.94 samples/sec Loss 39.0398 LearningRate 0.000047 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:20:04,697-Speed 2493.93 samples/sec Loss 39.0223 LearningRate 0.000047 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:20:12,909-Speed 2494.37 samples/sec Loss 39.0156 LearningRate 0.000047 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:20:21,128-Speed 2492.33 samples/sec Loss 39.0269 LearningRate 0.000047 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:20:29,343-Speed 2493.10 samples/sec Loss 39.0424 LearningRate 0.000048 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:20:37,563-Speed 2491.81 samples/sec Loss 39.0602 LearningRate 0.000048 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:20:45,727-Speed 2508.93 samples/sec Loss 39.0686 LearningRate 0.000048 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:20:53,945-Speed 2492.71 samples/sec Loss 39.0622 LearningRate 0.000048 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:21:02,163-Speed 2492.35 samples/sec Loss 39.0552 LearningRate 0.000048 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:21:10,370-Speed 2495.80 samples/sec Loss 39.0631 LearningRate 0.000048 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:21:18,574-Speed 2496.69 samples/sec Loss 39.0522 LearningRate 0.000048 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:21:26,792-Speed 2492.66 samples/sec Loss 39.0482 LearningRate 0.000048 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:21:34,965-Speed 2506.05 samples/sec Loss 39.0448 LearningRate 0.000049 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:21:43,173-Speed 2495.75 samples/sec Loss 39.0541 LearningRate 0.000049 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:21:51,378-Speed 2496.65 samples/sec Loss 39.0624 LearningRate 0.000049 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:21:59,583-Speed 2496.28 samples/sec Loss 39.0452 LearningRate 0.000049 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:22:07,792-Speed 2495.17 samples/sec Loss 39.0544 LearningRate 0.000049 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:22:16,004-Speed 2494.31 samples/sec Loss 39.0656 LearningRate 0.000049 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:22:24,162-Speed 2510.82 samples/sec Loss 39.0456 LearningRate 0.000049 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:22:32,373-Speed 2494.85 samples/sec Loss 39.0599 LearningRate 0.000049 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:22:40,579-Speed 2495.93 samples/sec Loss 39.0633 LearningRate 0.000050 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:22:48,788-Speed 2495.21 samples/sec Loss 39.0527 LearningRate 0.000050 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:22:57,001-Speed 2494.11 samples/sec Loss 39.0571 LearningRate 0.000050 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:23:05,212-Speed 2494.67 samples/sec Loss 39.0565 LearningRate 0.000050 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:23:13,369-Speed 2511.37 samples/sec Loss 39.0750 LearningRate 0.000050 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:23:21,581-Speed 2494.12 samples/sec Loss 39.0651 LearningRate 0.000050 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:23:29,792-Speed 2494.86 samples/sec Loss 39.0587 LearningRate 0.000050 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:23:38,009-Speed 2492.76 samples/sec Loss 39.0412 LearningRate 0.000050 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:23:46,219-Speed 2494.82 samples/sec Loss 39.0569 LearningRate 0.000050 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:23:54,430-Speed 2494.89 samples/sec Loss 39.0540 LearningRate 0.000051 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:24:02,588-Speed 2510.60 samples/sec Loss 39.0695 LearningRate 0.000051 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:24:10,800-Speed 2494.50 samples/sec Loss 39.0587 LearningRate 0.000051 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:24:19,011-Speed 2494.41 samples/sec Loss 39.0515 LearningRate 0.000051 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:24:27,224-Speed 2494.30 samples/sec Loss 39.0659 LearningRate 0.000051 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:24:35,437-Speed 2493.93 samples/sec Loss 39.0633 LearningRate 0.000051 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:24:43,649-Speed 2494.19 samples/sec Loss 39.0982 LearningRate 0.000051 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 4096 Required: 188 hours
Training: 2022-07-05 15:24:51,808-Speed 2510.73 samples/sec Loss 39.0711 LearningRate 0.000051 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:25:00,020-Speed 2494.11 samples/sec Loss 39.0812 LearningRate 0.000052 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:25:08,230-Speed 2494.96 samples/sec Loss 39.0896 LearningRate 0.000052 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:25:16,440-Speed 2494.99 samples/sec Loss 39.0889 LearningRate 0.000052 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:25:24,656-Speed 2493.17 samples/sec Loss 39.0923 LearningRate 0.000052 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:25:32,869-Speed 2494.11 samples/sec Loss 39.0822 LearningRate 0.000052 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:25:41,026-Speed 2510.98 samples/sec Loss 39.1114 LearningRate 0.000052 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:25:49,232-Speed 2496.09 samples/sec Loss 39.0971 LearningRate 0.000052 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:25:57,437-Speed 2496.60 samples/sec Loss 39.0983 LearningRate 0.000052 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:26:05,659-Speed 2491.09 samples/sec Loss 39.1055 LearningRate 0.000053 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:26:13,868-Speed 2495.29 samples/sec Loss 39.0948 LearningRate 0.000053 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:26:22,081-Speed 2494.39 samples/sec Loss 39.0932 LearningRate 0.000053 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 
15:26:30,241-Speed 2510.17 samples/sec Loss 39.0816 LearningRate 0.000053 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:26:38,454-Speed 2494.32 samples/sec Loss 39.1021 LearningRate 0.000053 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:26:46,665-Speed 2494.66 samples/sec Loss 39.1090 LearningRate 0.000053 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:26:54,879-Speed 2493.94 samples/sec Loss 39.1188 LearningRate 0.000053 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:27:03,088-Speed 2495.01 samples/sec Loss 39.0958 LearningRate 0.000053 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:27:11,300-Speed 2494.65 samples/sec Loss 39.0933 LearningRate 0.000054 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:27:19,457-Speed 2510.94 samples/sec Loss 39.0916 LearningRate 0.000054 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:27:27,662-Speed 2496.46 samples/sec Loss 39.0968 LearningRate 0.000054 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:27:35,877-Speed 2493.45 samples/sec Loss 39.0929 LearningRate 0.000054 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:27:44,086-Speed 2495.62 samples/sec Loss 39.0835 LearningRate 0.000054 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:27:52,297-Speed 2494.56 samples/sec Loss 39.1001 LearningRate 0.000054 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:28:00,503-Speed 2495.71 samples/sec Loss 39.0798 LearningRate 0.000054 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:28:08,659-Speed 2511.63 
samples/sec Loss 39.1060 LearningRate 0.000054 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:28:16,863-Speed 2496.73 samples/sec Loss 39.0930 LearningRate 0.000054 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:28:25,071-Speed 2495.57 samples/sec Loss 39.0936 LearningRate 0.000055 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:28:33,278-Speed 2495.69 samples/sec Loss 39.0985 LearningRate 0.000055 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:28:41,487-Speed 2495.16 samples/sec Loss 39.0751 LearningRate 0.000055 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:28:49,697-Speed 2495.24 samples/sec Loss 39.0720 LearningRate 0.000055 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:28:57,855-Speed 2510.86 samples/sec Loss 39.0889 LearningRate 0.000055 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:29:06,075-Speed 2491.85 samples/sec Loss 39.1083 LearningRate 0.000055 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:29:14,289-Speed 2493.81 samples/sec Loss 39.0888 LearningRate 0.000055 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:29:22,500-Speed 2494.77 samples/sec Loss 39.0894 LearningRate 0.000055 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:29:30,720-Speed 2491.60 samples/sec Loss 39.0980 LearningRate 0.000056 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:29:38,932-Speed 2494.26 samples/sec Loss 39.1116 LearningRate 0.000056 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:29:47,087-Speed 2511.84 samples/sec Loss 39.0821 
LearningRate 0.000056 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:29:55,299-Speed 2494.25 samples/sec Loss 39.1038 LearningRate 0.000056 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:30:03,508-Speed 2495.08 samples/sec Loss 39.0964 LearningRate 0.000056 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 15:30:11,721-Speed 2493.99 samples/sec Loss 39.1015 LearningRate 0.000056 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:30:19,933-Speed 2494.49 samples/sec Loss 39.1182 LearningRate 0.000056 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:30:28,168-Speed 2498.27 samples/sec Loss 39.1220 LearningRate 0.000056 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:30:36,324-Speed 2511.58 samples/sec Loss 39.1123 LearningRate 0.000057 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:30:44,593-Speed 2496.80 samples/sec Loss 39.1120 LearningRate 0.000057 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:30:52,802-Speed 2495.09 samples/sec Loss 39.1317 LearningRate 0.000057 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:31:01,069-Speed 2496.62 samples/sec Loss 39.1202 LearningRate 0.000057 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:31:09,315-Speed 2496.36 samples/sec Loss 39.1472 LearningRate 0.000057 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:31:17,529-Speed 2493.84 samples/sec Loss 39.1278 LearningRate 0.000057 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:31:25,728-Speed 2511.91 samples/sec Loss 39.1036 LearningRate 0.000057 Epoch: 0 
Global Step: 4750 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:31:35,534-Speed 2093.97 samples/sec Loss 39.1165 LearningRate 0.000057 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:31:43,743-Speed 2495.02 samples/sec Loss 39.1217 LearningRate 0.000057 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:31:51,952-Speed 2495.27 samples/sec Loss 39.1039 LearningRate 0.000058 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:32:02,153-Speed 2025.34 samples/sec Loss 39.1214 LearningRate 0.000058 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:32:10,405-Speed 2495.75 samples/sec Loss 39.1129 LearningRate 0.000058 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:32:18,569-Speed 2508.85 samples/sec Loss 39.1245 LearningRate 0.000058 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:32:26,823-Speed 2495.96 samples/sec Loss 39.1393 LearningRate 0.000058 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:32:35,068-Speed 2495.75 samples/sec Loss 39.1509 LearningRate 0.000058 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:32:43,282-Speed 2493.69 samples/sec Loss 39.1310 LearningRate 0.000058 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:32:51,515-Speed 2487.82 samples/sec Loss 39.1509 LearningRate 0.000058 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:32:59,744-Speed 2495.81 samples/sec Loss 39.1375 LearningRate 0.000059 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:33:07,919-Speed 2513.04 samples/sec Loss 39.1479 LearningRate 0.000059 Epoch: 0 Global Step: 4870 Fp16 Grad 
Scale: 8192 Required: 188 hours Training: 2022-07-05 15:33:16,125-Speed 2495.83 samples/sec Loss 39.1245 LearningRate 0.000059 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:33:24,369-Speed 2496.47 samples/sec Loss 39.1394 LearningRate 0.000059 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:33:33,364-Speed 2493.81 samples/sec Loss 39.2011 LearningRate 0.000059 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:33:41,579-Speed 2493.20 samples/sec Loss 39.1618 LearningRate 0.000059 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:33:49,808-Speed 2495.29 samples/sec Loss 39.1459 LearningRate 0.000059 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:33:58,862-Speed 2512.74 samples/sec Loss 39.1566 LearningRate 0.000059 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:34:07,096-Speed 2495.51 samples/sec Loss 39.1341 LearningRate 0.000060 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:34:15,309-Speed 2493.81 samples/sec Loss 39.1343 LearningRate 0.000060 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:34:24,540-Speed 2413.45 samples/sec Loss 39.1590 LearningRate 0.000060 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:34:32,783-Speed 2496.04 samples/sec Loss 39.1858 LearningRate 0.000060 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:34:41,025-Speed 2497.31 samples/sec Loss 39.1419 LearningRate 0.000060 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:34:49,177-Speed 2512.68 samples/sec Loss 39.1665 LearningRate 0.000060 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 8192 Required: 188 hours 
Training: 2022-07-05 15:34:57,383-Speed 2496.03 samples/sec Loss 39.1629 LearningRate 0.000060 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:35:05,640-Speed 2496.81 samples/sec Loss 39.1718 LearningRate 0.000060 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:35:13,921-Speed 2497.94 samples/sec Loss 39.1680 LearningRate 0.000061 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:35:22,136-Speed 2493.41 samples/sec Loss 39.1717 LearningRate 0.000061 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:35:30,344-Speed 2497.85 samples/sec Loss 39.1760 LearningRate 0.000061 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:35:40,184-Speed 2504.68 samples/sec Loss 39.1685 LearningRate 0.000061 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:35:48,392-Speed 2495.74 samples/sec Loss 39.1324 LearningRate 0.000061 Epoch: 0 Global Step: 5060 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:35:56,618-Speed 2489.92 samples/sec Loss 39.1493 LearningRate 0.000061 Epoch: 0 Global Step: 5070 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:36:04,831-Speed 2494.30 samples/sec Loss 39.1362 LearningRate 0.000061 Epoch: 0 Global Step: 5080 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:36:13,038-Speed 2495.67 samples/sec Loss 39.1507 LearningRate 0.000061 Epoch: 0 Global Step: 5090 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:36:21,248-Speed 2494.91 samples/sec Loss 39.1433 LearningRate 0.000061 Epoch: 0 Global Step: 5100 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:36:29,405-Speed 2511.17 samples/sec Loss 39.1298 LearningRate 0.000062 Epoch: 0 Global Step: 5110 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 
15:36:37,617-Speed 2494.42 samples/sec Loss 39.1572 LearningRate 0.000062 Epoch: 0 Global Step: 5120 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:36:45,832-Speed 2493.38 samples/sec Loss 39.1815 LearningRate 0.000062 Epoch: 0 Global Step: 5130 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:36:54,045-Speed 2493.92 samples/sec Loss 39.1537 LearningRate 0.000062 Epoch: 0 Global Step: 5140 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:37:02,254-Speed 2495.31 samples/sec Loss 39.1562 LearningRate 0.000062 Epoch: 0 Global Step: 5150 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:37:10,465-Speed 2494.67 samples/sec Loss 39.1782 LearningRate 0.000062 Epoch: 0 Global Step: 5160 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:37:18,621-Speed 2511.32 samples/sec Loss 39.1743 LearningRate 0.000062 Epoch: 0 Global Step: 5170 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:37:26,832-Speed 2494.76 samples/sec Loss 39.1740 LearningRate 0.000062 Epoch: 0 Global Step: 5180 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:37:35,039-Speed 2495.96 samples/sec Loss 39.1492 LearningRate 0.000063 Epoch: 0 Global Step: 5190 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:37:43,248-Speed 2495.16 samples/sec Loss 39.1786 LearningRate 0.000063 Epoch: 0 Global Step: 5200 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:37:51,473-Speed 2490.45 samples/sec Loss 39.1787 LearningRate 0.000063 Epoch: 0 Global Step: 5210 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:37:59,709-Speed 2487.13 samples/sec Loss 39.1751 LearningRate 0.000063 Epoch: 0 Global Step: 5220 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:38:07,867-Speed 2510.86 samples/sec Loss 39.1902 LearningRate 0.000063 Epoch: 0 Global Step: 5230 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:38:16,079-Speed 2494.15 
samples/sec Loss 39.1976 LearningRate 0.000063 Epoch: 0 Global Step: 5240 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:38:24,294-Speed 2493.58 samples/sec Loss 39.1976 LearningRate 0.000063 Epoch: 0 Global Step: 5250 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:38:32,506-Speed 2494.27 samples/sec Loss 39.2110 LearningRate 0.000063 Epoch: 0 Global Step: 5260 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:38:40,714-Speed 2495.36 samples/sec Loss 39.2028 LearningRate 0.000064 Epoch: 0 Global Step: 5270 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:38:48,925-Speed 2494.85 samples/sec Loss 39.2085 LearningRate 0.000064 Epoch: 0 Global Step: 5280 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:38:57,085-Speed 2510.25 samples/sec Loss 39.2010 LearningRate 0.000064 Epoch: 0 Global Step: 5290 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:39:05,294-Speed 2495.03 samples/sec Loss 39.2042 LearningRate 0.000064 Epoch: 0 Global Step: 5300 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:39:13,503-Speed 2495.50 samples/sec Loss 39.1903 LearningRate 0.000064 Epoch: 0 Global Step: 5310 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:39:21,714-Speed 2494.69 samples/sec Loss 39.1894 LearningRate 0.000064 Epoch: 0 Global Step: 5320 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:39:29,924-Speed 2495.00 samples/sec Loss 39.1612 LearningRate 0.000064 Epoch: 0 Global Step: 5330 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:39:38,137-Speed 2494.13 samples/sec Loss 39.1820 LearningRate 0.000064 Epoch: 0 Global Step: 5340 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:39:46,295-Speed 2510.51 samples/sec Loss 39.1873 LearningRate 0.000064 Epoch: 0 Global Step: 5350 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:39:54,505-Speed 2494.87 samples/sec Loss 39.1893 
LearningRate 0.000065 Epoch: 0 Global Step: 5360 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:40:02,717-Speed 2494.65 samples/sec Loss 39.2054 LearningRate 0.000065 Epoch: 0 Global Step: 5370 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:40:10,931-Speed 2493.60 samples/sec Loss 39.2030 LearningRate 0.000065 Epoch: 0 Global Step: 5380 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:40:19,139-Speed 2495.79 samples/sec Loss 39.1968 LearningRate 0.000065 Epoch: 0 Global Step: 5390 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:40:27,347-Speed 2495.77 samples/sec Loss 39.1988 LearningRate 0.000065 Epoch: 0 Global Step: 5400 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:40:35,505-Speed 2510.96 samples/sec Loss 39.1909 LearningRate 0.000065 Epoch: 0 Global Step: 5410 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:40:43,716-Speed 2494.63 samples/sec Loss 39.2179 LearningRate 0.000065 Epoch: 0 Global Step: 5420 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:40:51,933-Speed 2492.89 samples/sec Loss 39.1986 LearningRate 0.000065 Epoch: 0 Global Step: 5430 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:41:00,141-Speed 2495.27 samples/sec Loss 39.2236 LearningRate 0.000066 Epoch: 0 Global Step: 5440 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:41:08,354-Speed 2494.10 samples/sec Loss 39.1870 LearningRate 0.000066 Epoch: 0 Global Step: 5450 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:41:16,567-Speed 2493.98 samples/sec Loss 39.2183 LearningRate 0.000066 Epoch: 0 Global Step: 5460 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:41:24,732-Speed 2508.72 samples/sec Loss 39.2275 LearningRate 0.000066 Epoch: 0 Global Step: 5470 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:41:32,951-Speed 2492.10 samples/sec Loss 39.2350 LearningRate 0.000066 Epoch: 0 
Global Step: 5480 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:41:41,166-Speed 2493.33 samples/sec Loss 39.2222 LearningRate 0.000066 Epoch: 0 Global Step: 5490 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:41:49,379-Speed 2494.01 samples/sec Loss 39.2209 LearningRate 0.000066 Epoch: 0 Global Step: 5500 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:41:57,593-Speed 2493.87 samples/sec Loss 39.2027 LearningRate 0.000066 Epoch: 0 Global Step: 5510 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:42:05,807-Speed 2493.62 samples/sec Loss 39.1932 LearningRate 0.000067 Epoch: 0 Global Step: 5520 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:42:13,965-Speed 2511.00 samples/sec Loss 39.2115 LearningRate 0.000067 Epoch: 0 Global Step: 5530 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:42:22,182-Speed 2492.98 samples/sec Loss 39.2357 LearningRate 0.000067 Epoch: 0 Global Step: 5540 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:42:30,397-Speed 2493.65 samples/sec Loss 39.2258 LearningRate 0.000067 Epoch: 0 Global Step: 5550 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:42:38,608-Speed 2494.54 samples/sec Loss 39.2451 LearningRate 0.000067 Epoch: 0 Global Step: 5560 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:42:46,817-Speed 2495.21 samples/sec Loss 39.2461 LearningRate 0.000067 Epoch: 0 Global Step: 5570 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:42:55,027-Speed 2494.71 samples/sec Loss 39.2462 LearningRate 0.000067 Epoch: 0 Global Step: 5580 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:43:03,193-Speed 2508.37 samples/sec Loss 39.2176 LearningRate 0.000067 Epoch: 0 Global Step: 5590 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:43:11,421-Speed 2489.55 samples/sec Loss 39.2306 LearningRate 0.000067 Epoch: 0 Global Step: 5600 Fp16 Grad 
Scale: 8192 Required: 188 hours Training: 2022-07-05 15:43:19,631-Speed 2495.05 samples/sec Loss 39.2240 LearningRate 0.000068 Epoch: 0 Global Step: 5610 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:43:27,841-Speed 2495.18 samples/sec Loss 39.2214 LearningRate 0.000068 Epoch: 0 Global Step: 5620 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:43:36,055-Speed 2493.60 samples/sec Loss 39.2375 LearningRate 0.000068 Epoch: 0 Global Step: 5630 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:43:44,283-Speed 2489.56 samples/sec Loss 39.2834 LearningRate 0.000068 Epoch: 0 Global Step: 5640 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:43:52,445-Speed 2509.51 samples/sec Loss 39.2927 LearningRate 0.000068 Epoch: 0 Global Step: 5650 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:44:00,662-Speed 2493.26 samples/sec Loss 39.2667 LearningRate 0.000068 Epoch: 0 Global Step: 5660 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:44:08,876-Speed 2493.77 samples/sec Loss 39.2709 LearningRate 0.000068 Epoch: 0 Global Step: 5670 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:44:17,088-Speed 2494.21 samples/sec Loss 39.2562 LearningRate 0.000068 Epoch: 0 Global Step: 5680 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:44:25,304-Speed 2493.28 samples/sec Loss 39.2610 LearningRate 0.000069 Epoch: 0 Global Step: 5690 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:44:33,619-Speed 2463.27 samples/sec Loss 39.2586 LearningRate 0.000069 Epoch: 0 Global Step: 5700 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:44:41,776-Speed 2511.10 samples/sec Loss 39.2368 LearningRate 0.000069 Epoch: 0 Global Step: 5710 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:44:49,995-Speed 2492.32 samples/sec Loss 39.2638 LearningRate 0.000069 Epoch: 0 Global Step: 5720 Fp16 Grad Scale: 8192 Required: 188 hours 
Training: 2022-07-05 15:44:58,208-Speed 2493.87 samples/sec Loss 39.2337 LearningRate 0.000069 Epoch: 0 Global Step: 5730 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:45:06,429-Speed 2491.68 samples/sec Loss 39.2497 LearningRate 0.000069 Epoch: 0 Global Step: 5740 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:45:14,645-Speed 2492.85 samples/sec Loss 39.2581 LearningRate 0.000069 Epoch: 0 Global Step: 5750 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:45:22,862-Speed 2492.86 samples/sec Loss 39.2473 LearningRate 0.000069 Epoch: 0 Global Step: 5760 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:45:31,026-Speed 2509.06 samples/sec Loss 39.2506 LearningRate 0.000070 Epoch: 0 Global Step: 5770 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:45:39,240-Speed 2493.81 samples/sec Loss 39.2465 LearningRate 0.000070 Epoch: 0 Global Step: 5780 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:45:47,455-Speed 2493.52 samples/sec Loss 39.2153 LearningRate 0.000070 Epoch: 0 Global Step: 5790 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:45:55,671-Speed 2493.26 samples/sec Loss 39.2391 LearningRate 0.000070 Epoch: 0 Global Step: 5800 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:46:03,888-Speed 2492.81 samples/sec Loss 39.2240 LearningRate 0.000070 Epoch: 0 Global Step: 5810 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:46:12,196-Speed 2465.46 samples/sec Loss 39.2342 LearningRate 0.000070 Epoch: 0 Global Step: 5820 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:46:20,371-Speed 2505.47 samples/sec Loss 39.2495 LearningRate 0.000070 Epoch: 0 Global Step: 5830 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:46:28,586-Speed 2493.70 samples/sec Loss 39.2474 LearningRate 0.000070 Epoch: 0 Global Step: 5840 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 
15:46:36,802-Speed 2493.58 samples/sec Loss 39.2540 LearningRate 0.000071 Epoch: 0 Global Step: 5850 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:46:45,017-Speed 2493.16 samples/sec Loss 39.2591 LearningRate 0.000071 Epoch: 0 Global Step: 5860 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:46:53,234-Speed 2492.95 samples/sec Loss 39.2581 LearningRate 0.000071 Epoch: 0 Global Step: 5870 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:47:01,449-Speed 2493.66 samples/sec Loss 39.2793 LearningRate 0.000071 Epoch: 0 Global Step: 5880 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:47:09,610-Speed 2509.82 samples/sec Loss 39.2510 LearningRate 0.000071 Epoch: 0 Global Step: 5890 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:47:17,841-Speed 2488.60 samples/sec Loss 39.2478 LearningRate 0.000071 Epoch: 0 Global Step: 5900 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:47:26,061-Speed 2491.96 samples/sec Loss 39.2680 LearningRate 0.000071 Epoch: 0 Global Step: 5910 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:47:34,272-Speed 2494.58 samples/sec Loss 39.2725 LearningRate 0.000071 Epoch: 0 Global Step: 5920 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:47:42,487-Speed 2493.29 samples/sec Loss 39.2551 LearningRate 0.000071 Epoch: 0 Global Step: 5930 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:47:50,703-Speed 2493.14 samples/sec Loss 39.2940 LearningRate 0.000072 Epoch: 0 Global Step: 5940 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:47:58,862-Speed 2510.60 samples/sec Loss 39.2859 LearningRate 0.000072 Epoch: 0 Global Step: 5950 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:48:07,086-Speed 2490.82 samples/sec Loss 39.3028 LearningRate 0.000072 Epoch: 0 Global Step: 5960 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:48:15,314-Speed 
2489.30 samples/sec Loss 39.3049 LearningRate 0.000072 Epoch: 0 Global Step: 5970 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:48:23,531-Speed 2493.14 samples/sec Loss 39.2853 LearningRate 0.000072 Epoch: 0 Global Step: 5980 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:48:31,748-Speed 2492.73 samples/sec Loss 39.2931 LearningRate 0.000072 Epoch: 0 Global Step: 5990 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:48:39,963-Speed 2493.21 samples/sec Loss 39.3100 LearningRate 0.000072 Epoch: 0 Global Step: 6000 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:48:48,126-Speed 2509.42 samples/sec Loss 39.3230 LearningRate 0.000072 Epoch: 0 Global Step: 6010 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:48:56,339-Speed 2494.07 samples/sec Loss 39.3578 LearningRate 0.000073 Epoch: 0 Global Step: 6020 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:49:04,558-Speed 2492.48 samples/sec Loss 39.3537 LearningRate 0.000073 Epoch: 0 Global Step: 6030 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:49:12,782-Speed 2490.61 samples/sec Loss 39.3699 LearningRate 0.000073 Epoch: 0 Global Step: 6040 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:49:20,994-Speed 2494.24 samples/sec Loss 39.3982 LearningRate 0.000073 Epoch: 0 Global Step: 6050 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:49:29,206-Speed 2494.23 samples/sec Loss 39.3748 LearningRate 0.000073 Epoch: 0 Global Step: 6060 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:49:37,367-Speed 2510.08 samples/sec Loss 39.3994 LearningRate 0.000073 Epoch: 0 Global Step: 6070 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:49:45,583-Speed 2493.29 samples/sec Loss 39.4332 LearningRate 0.000073 Epoch: 0 Global Step: 6080 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:49:53,798-Speed 2493.25 samples/sec 
Loss 39.4465 LearningRate 0.000073 Epoch: 0 Global Step: 6090 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:50:02,015-Speed 2493.04 samples/sec Loss 39.4587 LearningRate 0.000074 Epoch: 0 Global Step: 6100 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:50:10,232-Speed 2492.60 samples/sec Loss 39.4674 LearningRate 0.000074 Epoch: 0 Global Step: 6110 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:50:18,451-Speed 2492.37 samples/sec Loss 39.4746 LearningRate 0.000074 Epoch: 0 Global Step: 6120 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:50:26,612-Speed 2509.80 samples/sec Loss 39.4401 LearningRate 0.000074 Epoch: 0 Global Step: 6130 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:50:34,825-Speed 2494.07 samples/sec Loss 39.4704 LearningRate 0.000074 Epoch: 0 Global Step: 6140 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:50:43,041-Speed 2493.23 samples/sec Loss 39.4640 LearningRate 0.000074 Epoch: 0 Global Step: 6150 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:50:51,255-Speed 2493.90 samples/sec Loss 39.4140 LearningRate 0.000074 Epoch: 0 Global Step: 6160 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:50:59,474-Speed 2492.07 samples/sec Loss 39.4202 LearningRate 0.000074 Epoch: 0 Global Step: 6170 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:51:07,690-Speed 2493.03 samples/sec Loss 39.3910 LearningRate 0.000074 Epoch: 0 Global Step: 6180 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:51:15,849-Speed 2510.52 samples/sec Loss 39.3894 LearningRate 0.000075 Epoch: 0 Global Step: 6190 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:51:24,065-Speed 2493.07 samples/sec Loss 39.3719 LearningRate 0.000075 Epoch: 0 Global Step: 6200 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:51:32,293-Speed 2489.83 samples/sec Loss 39.3879 
LearningRate 0.000075 Epoch: 0 Global Step: 6210 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:51:40,512-Speed 2492.13 samples/sec Loss 39.3745 LearningRate 0.000075 Epoch: 0 Global Step: 6220 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:51:48,729-Speed 2492.74 samples/sec Loss 39.3641 LearningRate 0.000075 Epoch: 0 Global Step: 6230 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:51:56,948-Speed 2492.17 samples/sec Loss 39.3679 LearningRate 0.000075 Epoch: 0 Global Step: 6240 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:52:05,114-Speed 2508.39 samples/sec Loss 39.3536 LearningRate 0.000075 Epoch: 0 Global Step: 6250 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:52:13,335-Speed 2491.70 samples/sec Loss 39.3370 LearningRate 0.000075 Epoch: 0 Global Step: 6260 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:52:21,552-Speed 2492.84 samples/sec Loss 39.3448 LearningRate 0.000076 Epoch: 0 Global Step: 6270 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:52:29,766-Speed 2493.89 samples/sec Loss 39.3419 LearningRate 0.000076 Epoch: 0 Global Step: 6280 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:52:37,983-Speed 2493.10 samples/sec Loss 39.3432 LearningRate 0.000076 Epoch: 0 Global Step: 6290 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:52:46,198-Speed 2493.34 samples/sec Loss 39.3106 LearningRate 0.000076 Epoch: 0 Global Step: 6300 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:52:54,357-Speed 2510.40 samples/sec Loss 39.3163 LearningRate 0.000076 Epoch: 0 Global Step: 6310 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:53:02,571-Speed 2493.97 samples/sec Loss 39.3165 LearningRate 0.000076 Epoch: 0 Global Step: 6320 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:53:10,784-Speed 2493.78 samples/sec Loss 39.3329 LearningRate 
0.000076 Epoch: 0 Global Step: 6330 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:53:18,998-Speed 2493.74 samples/sec Loss 39.3002 LearningRate 0.000076 Epoch: 0 Global Step: 6340 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:53:27,213-Speed 2493.46 samples/sec Loss 39.3171 LearningRate 0.000077 Epoch: 0 Global Step: 6350 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:53:35,435-Speed 2491.28 samples/sec Loss 39.3301 LearningRate 0.000077 Epoch: 0 Global Step: 6360 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:53:43,595-Speed 2510.33 samples/sec Loss 39.3204 LearningRate 0.000077 Epoch: 0 Global Step: 6370 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:53:51,807-Speed 2494.36 samples/sec Loss 39.3628 LearningRate 0.000077 Epoch: 0 Global Step: 6380 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:54:00,023-Speed 2493.03 samples/sec Loss 39.3564 LearningRate 0.000077 Epoch: 0 Global Step: 6390 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:54:08,238-Speed 2493.47 samples/sec Loss 39.3577 LearningRate 0.000077 Epoch: 0 Global Step: 6400 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:54:16,452-Speed 2493.69 samples/sec Loss 39.3604 LearningRate 0.000077 Epoch: 0 Global Step: 6410 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:54:24,668-Speed 2493.22 samples/sec Loss 39.3586 LearningRate 0.000077 Epoch: 0 Global Step: 6420 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:54:32,830-Speed 2509.46 samples/sec Loss 39.3610 LearningRate 0.000077 Epoch: 0 Global Step: 6430 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:54:41,048-Speed 2492.32 samples/sec Loss 39.3636 LearningRate 0.000078 Epoch: 0 Global Step: 6440 Fp16 Grad Scale: 16384 Required: 188 hours Training: 2022-07-05 15:54:49,217-Speed 2507.63 samples/sec Loss 39.3784 LearningRate 0.000078 Epoch: 0 
Global Step: 6450 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:54:57,427-Speed 2494.88 samples/sec Loss 39.3794 LearningRate 0.000078 Epoch: 0 Global Step: 6460 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:55:05,641-Speed 2493.92 samples/sec Loss 39.3870 LearningRate 0.000078 Epoch: 0 Global Step: 6470 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:55:13,854-Speed 2494.19 samples/sec Loss 39.3651 LearningRate 0.000078 Epoch: 0 Global Step: 6480 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:55:22,014-Speed 2510.32 samples/sec Loss 39.3954 LearningRate 0.000078 Epoch: 0 Global Step: 6490 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:55:30,231-Speed 2492.71 samples/sec Loss 39.4017 LearningRate 0.000078 Epoch: 0 Global Step: 6500 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:55:38,442-Speed 2494.56 samples/sec Loss 39.4296 LearningRate 0.000078 Epoch: 0 Global Step: 6510 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:55:46,656-Speed 2494.01 samples/sec Loss 39.4344 LearningRate 0.000079 Epoch: 0 Global Step: 6520 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:55:54,872-Speed 2493.04 samples/sec Loss 39.4386 LearningRate 0.000079 Epoch: 0 Global Step: 6530 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:56:03,092-Speed 2491.73 samples/sec Loss 39.4747 LearningRate 0.000079 Epoch: 0 Global Step: 6540 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:56:11,252-Speed 2510.16 samples/sec Loss 39.5114 LearningRate 0.000079 Epoch: 0 Global Step: 6550 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:56:19,465-Speed 2494.07 samples/sec Loss 39.4962 LearningRate 0.000079 Epoch: 0 Global Step: 6560 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:56:27,675-Speed 2495.17 samples/sec Loss 39.5107 LearningRate 0.000079 Epoch: 0 Global Step: 6570 Fp16 Grad 
Scale: 8192 Required: 188 hours Training: 2022-07-05 15:56:35,891-Speed 2492.94 samples/sec Loss 39.5371 LearningRate 0.000079 Epoch: 0 Global Step: 6580 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:56:44,103-Speed 2494.34 samples/sec Loss 39.5595 LearningRate 0.000079 Epoch: 0 Global Step: 6590 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:56:52,319-Speed 2493.22 samples/sec Loss 39.5612 LearningRate 0.000080 Epoch: 0 Global Step: 6600 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:57:00,479-Speed 2509.95 samples/sec Loss 39.5826 LearningRate 0.000080 Epoch: 0 Global Step: 6610 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:57:08,704-Speed 2490.54 samples/sec Loss 39.5447 LearningRate 0.000080 Epoch: 0 Global Step: 6620 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:57:16,919-Speed 2493.64 samples/sec Loss 39.5585 LearningRate 0.000080 Epoch: 0 Global Step: 6630 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:57:25,130-Speed 2494.52 samples/sec Loss 39.5080 LearningRate 0.000080 Epoch: 0 Global Step: 6640 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:57:33,344-Speed 2493.57 samples/sec Loss 39.4840 LearningRate 0.000080 Epoch: 0 Global Step: 6650 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:57:41,558-Speed 2493.74 samples/sec Loss 39.4430 LearningRate 0.000080 Epoch: 0 Global Step: 6660 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:57:49,722-Speed 2509.00 samples/sec Loss 39.4112 LearningRate 0.000080 Epoch: 0 Global Step: 6670 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:57:57,938-Speed 2493.13 samples/sec Loss 39.4039 LearningRate 0.000081 Epoch: 0 Global Step: 6680 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:58:06,147-Speed 2495.43 samples/sec Loss 39.4220 LearningRate 0.000081 Epoch: 0 Global Step: 6690 Fp16 Grad Scale: 8192 Required: 188 hours 
Training: 2022-07-05 15:58:14,364-Speed 2492.62 samples/sec Loss 39.4062 LearningRate 0.000081 Epoch: 0 Global Step: 6700 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:58:22,604-Speed 2486.16 samples/sec Loss 39.4205 LearningRate 0.000081 Epoch: 0 Global Step: 6710 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:58:30,827-Speed 2490.94 samples/sec Loss 39.3841 LearningRate 0.000081 Epoch: 0 Global Step: 6720 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:58:38,980-Speed 2512.37 samples/sec Loss 39.4075 LearningRate 0.000081 Epoch: 0 Global Step: 6730 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:58:47,203-Speed 2491.07 samples/sec Loss 39.3769 LearningRate 0.000081 Epoch: 0 Global Step: 6740 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:58:55,429-Speed 2490.05 samples/sec Loss 39.4087 LearningRate 0.000081 Epoch: 0 Global Step: 6750 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:59:03,642-Speed 2493.89 samples/sec Loss 39.3953 LearningRate 0.000081 Epoch: 0 Global Step: 6760 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:59:11,854-Speed 2494.28 samples/sec Loss 39.3863 LearningRate 0.000082 Epoch: 0 Global Step: 6770 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:59:20,067-Speed 2493.80 samples/sec Loss 39.3652 LearningRate 0.000082 Epoch: 0 Global Step: 6780 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:59:28,230-Speed 2509.34 samples/sec Loss 39.3719 LearningRate 0.000082 Epoch: 0 Global Step: 6790 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:59:36,443-Speed 2494.98 samples/sec Loss 39.3977 LearningRate 0.000082 Epoch: 0 Global Step: 6800 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 15:59:44,655-Speed 2494.49 samples/sec Loss 39.3708 LearningRate 0.000082 Epoch: 0 Global Step: 6810 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 
15:59:52,868-Speed 2493.81 samples/sec Loss 39.3926 LearningRate 0.000082 Epoch: 0 Global Step: 6820 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:00:01,078-Speed 2494.83 samples/sec Loss 39.3660 LearningRate 0.000082 Epoch: 0 Global Step: 6830 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:00:09,288-Speed 2494.92 samples/sec Loss 39.3645 LearningRate 0.000082 Epoch: 0 Global Step: 6840 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:00:17,448-Speed 2510.27 samples/sec Loss 39.3802 LearningRate 0.000083 Epoch: 0 Global Step: 6850 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:00:25,662-Speed 2493.80 samples/sec Loss 39.3951 LearningRate 0.000083 Epoch: 0 Global Step: 6860 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:00:33,873-Speed 2494.56 samples/sec Loss 39.3862 LearningRate 0.000083 Epoch: 0 Global Step: 6870 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:00:42,078-Speed 2496.52 samples/sec Loss 39.3764 LearningRate 0.000083 Epoch: 0 Global Step: 6880 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:00:50,291-Speed 2493.79 samples/sec Loss 39.3933 LearningRate 0.000083 Epoch: 0 Global Step: 6890 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:00:58,507-Speed 2493.05 samples/sec Loss 39.4180 LearningRate 0.000083 Epoch: 0 Global Step: 6900 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:01:06,662-Speed 2511.86 samples/sec Loss 39.4085 LearningRate 0.000083 Epoch: 0 Global Step: 6910 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:01:14,869-Speed 2496.01 samples/sec Loss 39.4118 LearningRate 0.000083 Epoch: 0 Global Step: 6920 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:01:23,075-Speed 2496.11 samples/sec Loss 39.4284 LearningRate 0.000084 Epoch: 0 Global Step: 6930 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:01:31,290-Speed 2493.36 
samples/sec Loss 39.3678 LearningRate 0.000084 Epoch: 0 Global Step: 6940 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:01:39,506-Speed 2493.08 samples/sec Loss 39.3979 LearningRate 0.000084 Epoch: 0 Global Step: 6950 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:01:47,735-Speed 2489.10 samples/sec Loss 39.3872 LearningRate 0.000084 Epoch: 0 Global Step: 6960 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:01:55,901-Speed 2508.49 samples/sec Loss 39.3821 LearningRate 0.000084 Epoch: 0 Global Step: 6970 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:02:04,116-Speed 2493.23 samples/sec Loss 39.4101 LearningRate 0.000084 Epoch: 0 Global Step: 6980 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:02:12,346-Speed 2488.82 samples/sec Loss 39.3909 LearningRate 0.000084 Epoch: 0 Global Step: 6990 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:02:20,564-Speed 2492.25 samples/sec Loss 39.3885 LearningRate 0.000084 Epoch: 0 Global Step: 7000 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:02:28,776-Speed 2494.14 samples/sec Loss 39.3892 LearningRate 0.000084 Epoch: 0 Global Step: 7010 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:02:36,988-Speed 2494.66 samples/sec Loss 39.3913 LearningRate 0.000085 Epoch: 0 Global Step: 7020 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:02:45,146-Speed 2510.79 samples/sec Loss 39.4187 LearningRate 0.000085 Epoch: 0 Global Step: 7030 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:02:53,356-Speed 2495.05 samples/sec Loss 39.4158 LearningRate 0.000085 Epoch: 0 Global Step: 7040 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:03:01,564-Speed 2495.37 samples/sec Loss 39.3766 LearningRate 0.000085 Epoch: 0 Global Step: 7050 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:03:09,773-Speed 2495.32 samples/sec Loss 39.3913 
LearningRate 0.000085 Epoch: 0 Global Step: 7060 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:03:17,986-Speed 2494.00 samples/sec Loss 39.3982 LearningRate 0.000085 Epoch: 0 Global Step: 7070 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:03:26,199-Speed 2493.72 samples/sec Loss 39.3995 LearningRate 0.000085 Epoch: 0 Global Step: 7080 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:03:34,354-Speed 2512.26 samples/sec Loss 39.4051 LearningRate 0.000085 Epoch: 0 Global Step: 7090 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:03:42,566-Speed 2494.34 samples/sec Loss 39.4212 LearningRate 0.000086 Epoch: 0 Global Step: 7100 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:03:50,776-Speed 2494.85 samples/sec Loss 39.3930 LearningRate 0.000086 Epoch: 0 Global Step: 7110 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:03:58,983-Speed 2495.83 samples/sec Loss 39.3990 LearningRate 0.000086 Epoch: 0 Global Step: 7120 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:04:07,195-Speed 2494.55 samples/sec Loss 39.3953 LearningRate 0.000086 Epoch: 0 Global Step: 7130 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:04:15,408-Speed 2494.29 samples/sec Loss 39.3907 LearningRate 0.000086 Epoch: 0 Global Step: 7140 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:04:23,563-Speed 2511.52 samples/sec Loss 39.3754 LearningRate 0.000086 Epoch: 0 Global Step: 7150 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:04:31,774-Speed 2494.51 samples/sec Loss 39.3802 LearningRate 0.000086 Epoch: 0 Global Step: 7160 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:04:39,985-Speed 2494.62 samples/sec Loss 39.3791 LearningRate 0.000086 Epoch: 0 Global Step: 7170 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:04:48,191-Speed 2496.10 samples/sec Loss 39.3907 LearningRate 0.000087 Epoch: 0 
Global Step: 7180 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:04:56,399-Speed 2495.57 samples/sec Loss 39.3883 LearningRate 0.000087 Epoch: 0 Global Step: 7190 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:05:04,626-Speed 2489.94 samples/sec Loss 39.4064 LearningRate 0.000087 Epoch: 0 Global Step: 7200 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:05:12,786-Speed 2510.04 samples/sec Loss 39.4106 LearningRate 0.000087 Epoch: 0 Global Step: 7210 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:05:20,998-Speed 2494.36 samples/sec Loss 39.3982 LearningRate 0.000087 Epoch: 0 Global Step: 7220 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:05:29,209-Speed 2494.47 samples/sec Loss 39.3949 LearningRate 0.000087 Epoch: 0 Global Step: 7230 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:05:37,424-Speed 2493.42 samples/sec Loss 39.4004 LearningRate 0.000087 Epoch: 0 Global Step: 7240 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:05:45,634-Speed 2495.05 samples/sec Loss 39.4169 LearningRate 0.000087 Epoch: 0 Global Step: 7250 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:05:53,844-Speed 2494.86 samples/sec Loss 39.4146 LearningRate 0.000088 Epoch: 0 Global Step: 7260 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:06:02,002-Speed 2510.76 samples/sec Loss 39.3984 LearningRate 0.000088 Epoch: 0 Global Step: 7270 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:06:10,218-Speed 2493.25 samples/sec Loss 39.4158 LearningRate 0.000088 Epoch: 0 Global Step: 7280 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:06:18,430-Speed 2494.04 samples/sec Loss 39.4067 LearningRate 0.000088 Epoch: 0 Global Step: 7290 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:06:26,636-Speed 2496.16 samples/sec Loss 39.4011 LearningRate 0.000088 Epoch: 0 Global Step: 7300 Fp16 Grad 
Scale: 8192 Required: 188 hours Training: 2022-07-05 16:06:34,844-Speed 2495.29 samples/sec Loss 39.4042 LearningRate 0.000088 Epoch: 0 Global Step: 7310 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:06:43,053-Speed 2494.99 samples/sec Loss 39.4142 LearningRate 0.000088 Epoch: 0 Global Step: 7320 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:06:51,212-Speed 2510.63 samples/sec Loss 39.4517 LearningRate 0.000088 Epoch: 0 Global Step: 7330 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:06:59,425-Speed 2493.92 samples/sec Loss 39.4271 LearningRate 0.000088 Epoch: 0 Global Step: 7340 Fp16 Grad Scale: 8192 Required: 188 hours Training: 2022-07-05 16:07:07,604-Speed 2504.19 samples/sec Loss 39.5015 LearningRate 0.000089 Epoch: 0 Global Step: 7350 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:07:15,814-Speed 2495.01 samples/sec Loss 39.4261 LearningRate 0.000089 Epoch: 0 Global Step: 7360 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:07:24,027-Speed 2493.96 samples/sec Loss 39.4543 LearningRate 0.000089 Epoch: 0 Global Step: 7370 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:07:32,240-Speed 2494.09 samples/sec Loss 39.4849 LearningRate 0.000089 Epoch: 0 Global Step: 7380 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:07:40,400-Speed 2510.03 samples/sec Loss 39.4631 LearningRate 0.000089 Epoch: 0 Global Step: 7390 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:07:48,613-Speed 2494.04 samples/sec Loss 39.4386 LearningRate 0.000089 Epoch: 0 Global Step: 7400 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:07:56,831-Speed 2492.56 samples/sec Loss 39.4192 LearningRate 0.000089 Epoch: 0 Global Step: 7410 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:08:05,042-Speed 2494.56 samples/sec Loss 39.4510 LearningRate 0.000089 Epoch: 0 Global Step: 7420 Fp16 Grad Scale: 4096 Required: 188 hours 
Training: 2022-07-05 16:08:13,254-Speed 2494.28 samples/sec Loss 39.4245 LearningRate 0.000090 Epoch: 0 Global Step: 7430 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:08:21,463-Speed 2495.19 samples/sec Loss 39.4638 LearningRate 0.000090 Epoch: 0 Global Step: 7440 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:08:29,620-Speed 2511.06 samples/sec Loss 39.4347 LearningRate 0.000090 Epoch: 0 Global Step: 7450 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:08:37,826-Speed 2496.19 samples/sec Loss 39.4228 LearningRate 0.000090 Epoch: 0 Global Step: 7460 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:08:46,036-Speed 2494.81 samples/sec Loss 39.4320 LearningRate 0.000090 Epoch: 0 Global Step: 7470 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:08:54,248-Speed 2494.18 samples/sec Loss 39.4296 LearningRate 0.000090 Epoch: 0 Global Step: 7480 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:09:02,457-Speed 2495.48 samples/sec Loss 39.4068 LearningRate 0.000090 Epoch: 0 Global Step: 7490 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:09:10,670-Speed 2493.83 samples/sec Loss 39.4278 LearningRate 0.000090 Epoch: 0 Global Step: 7500 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:09:18,827-Speed 2511.22 samples/sec Loss 39.4186 LearningRate 0.000091 Epoch: 0 Global Step: 7510 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:09:27,038-Speed 2494.62 samples/sec Loss 39.4080 LearningRate 0.000091 Epoch: 0 Global Step: 7520 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:09:35,249-Speed 2494.70 samples/sec Loss 39.4064 LearningRate 0.000091 Epoch: 0 Global Step: 7530 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:09:43,467-Speed 2492.46 samples/sec Loss 39.4289 LearningRate 0.000091 Epoch: 0 Global Step: 7540 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 
16:09:51,671-Speed 2496.60 samples/sec Loss 39.4203 LearningRate 0.000091 Epoch: 0 Global Step: 7550 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:09:59,882-Speed 2494.72 samples/sec Loss 39.4424 LearningRate 0.000091 Epoch: 0 Global Step: 7560 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:10:08,036-Speed 2511.96 samples/sec Loss 39.4188 LearningRate 0.000091 Epoch: 0 Global Step: 7570 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:10:16,241-Speed 2496.46 samples/sec Loss 39.4285 LearningRate 0.000091 Epoch: 0 Global Step: 7580 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:10:24,449-Speed 2495.62 samples/sec Loss 39.4397 LearningRate 0.000091 Epoch: 0 Global Step: 7590 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:10:32,661-Speed 2494.34 samples/sec Loss 39.4341 LearningRate 0.000092 Epoch: 0 Global Step: 7600 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:10:40,866-Speed 2496.26 samples/sec Loss 39.4158 LearningRate 0.000092 Epoch: 0 Global Step: 7610 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:10:49,074-Speed 2495.67 samples/sec Loss 39.4358 LearningRate 0.000092 Epoch: 0 Global Step: 7620 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:10:57,230-Speed 2511.23 samples/sec Loss 39.4482 LearningRate 0.000092 Epoch: 0 Global Step: 7630 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:11:05,443-Speed 2494.09 samples/sec Loss 39.4405 LearningRate 0.000092 Epoch: 0 Global Step: 7640 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:11:13,651-Speed 2495.45 samples/sec Loss 39.4220 LearningRate 0.000092 Epoch: 0 Global Step: 7650 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:11:21,860-Speed 2495.00 samples/sec Loss 39.4278 LearningRate 0.000092 Epoch: 0 Global Step: 7660 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:11:30,080-Speed 2491.95 
samples/sec Loss 39.4253 LearningRate 0.000092 Epoch: 0 Global Step: 7670 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:11:38,290-Speed 2494.81 samples/sec Loss 39.4117 LearningRate 0.000093 Epoch: 0 Global Step: 7680 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:11:46,452-Speed 2509.48 samples/sec Loss 39.4171 LearningRate 0.000093 Epoch: 0 Global Step: 7690 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:11:54,664-Speed 2494.45 samples/sec Loss 39.4329 LearningRate 0.000093 Epoch: 0 Global Step: 7700 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:12:02,876-Speed 2494.14 samples/sec Loss 39.4303 LearningRate 0.000093 Epoch: 0 Global Step: 7710 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:12:11,085-Speed 2495.43 samples/sec Loss 39.4363 LearningRate 0.000093 Epoch: 0 Global Step: 7720 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:12:19,289-Speed 2496.55 samples/sec Loss 39.4266 LearningRate 0.000093 Epoch: 0 Global Step: 7730 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:12:27,494-Speed 2496.27 samples/sec Loss 39.4290 LearningRate 0.000093 Epoch: 0 Global Step: 7740 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:12:35,651-Speed 2511.18 samples/sec Loss 39.4220 LearningRate 0.000093 Epoch: 0 Global Step: 7750 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:12:43,860-Speed 2495.05 samples/sec Loss 39.4635 LearningRate 0.000094 Epoch: 0 Global Step: 7760 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:12:52,083-Speed 2490.97 samples/sec Loss 39.4640 LearningRate 0.000094 Epoch: 0 Global Step: 7770 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:13:00,293-Speed 2495.08 samples/sec Loss 39.4828 LearningRate 0.000094 Epoch: 0 Global Step: 7780 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:13:08,518-Speed 2490.51 samples/sec Loss 39.4543 
LearningRate 0.000094 Epoch: 0 Global Step: 7790 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:13:16,728-Speed 2494.66 samples/sec Loss 39.4626 LearningRate 0.000094 Epoch: 0 Global Step: 7800 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:13:24,886-Speed 2510.88 samples/sec Loss 39.4493 LearningRate 0.000094 Epoch: 0 Global Step: 7810 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:13:33,106-Speed 2491.86 samples/sec Loss 39.4731 LearningRate 0.000094 Epoch: 0 Global Step: 7820 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:13:41,316-Speed 2495.00 samples/sec Loss 39.4738 LearningRate 0.000094 Epoch: 0 Global Step: 7830 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:13:49,525-Speed 2495.20 samples/sec Loss 39.4656 LearningRate 0.000094 Epoch: 0 Global Step: 7840 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:13:57,732-Speed 2495.73 samples/sec Loss 39.4507 LearningRate 0.000095 Epoch: 0 Global Step: 7850 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:14:05,944-Speed 2494.55 samples/sec Loss 39.4794 LearningRate 0.000095 Epoch: 0 Global Step: 7860 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:14:14,114-Speed 2507.07 samples/sec Loss 39.4402 LearningRate 0.000095 Epoch: 0 Global Step: 7870 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:14:22,329-Speed 2493.29 samples/sec Loss 39.4542 LearningRate 0.000095 Epoch: 0 Global Step: 7880 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:14:30,541-Speed 2494.19 samples/sec Loss 39.4363 LearningRate 0.000095 Epoch: 0 Global Step: 7890 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:14:38,751-Speed 2494.78 samples/sec Loss 39.4503 LearningRate 0.000095 Epoch: 0 Global Step: 7900 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:14:46,957-Speed 2496.59 samples/sec Loss 39.4623 LearningRate 0.000095 Epoch: 0 
Global Step: 7910 Fp16 Grad Scale: 4096 Required: 187 hours Training: 2022-07-05 16:14:55,168-Speed 2494.58 samples/sec Loss 39.4790 LearningRate 0.000095 Epoch: 0 Global Step: 7920 Fp16 Grad Scale: 4096 Required: 187 hours Training: 2022-07-05 16:15:03,324-Speed 2511.52 samples/sec Loss 39.4784 LearningRate 0.000096 Epoch: 0 Global Step: 7930 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:15:11,543-Speed 2492.29 samples/sec Loss 39.4686 LearningRate 0.000096 Epoch: 0 Global Step: 7940 Fp16 Grad Scale: 4096 Required: 188 hours Training: 2022-07-05 16:15:19,751-Speed 2495.40 samples/sec Loss 39.4662 LearningRate 0.000096 Epoch: 0 Global Step: 7950 Fp16 Grad Scale: 4096 Required: 187 hours Training: 2022-07-05 16:15:27,924-Speed 2506.08 samples/sec Loss 39.4860 LearningRate 0.000096 Epoch: 0 Global Step: 7960 Fp16 Grad Scale: 2048 Required: 187 hours Training: 2022-07-05 16:15:36,135-Speed 2494.90 samples/sec Loss 39.4922 LearningRate 0.000096 Epoch: 0 Global Step: 7970 Fp16 Grad Scale: 2048 Required: 187 hours Training: 2022-07-05 16:15:44,341-Speed 2495.94 samples/sec Loss 39.4769 LearningRate 0.000096 Epoch: 0 Global Step: 7980 Fp16 Grad Scale: 2048 Required: 187 hours Training: 2022-07-05 16:15:52,509-Speed 2507.76 samples/sec Loss 39.4514 LearningRate 0.000096 Epoch: 0 Global Step: 7990 Fp16 Grad Scale: 2048 Required: 187 hours Training: 2022-07-05 16:16:00,717-Speed 2495.79 samples/sec Loss 39.4711 LearningRate 0.000096 Epoch: 0 Global Step: 8000 Fp16 Grad Scale: 2048 Required: 187 hours Training: 2022-07-05 16:16:08,920-Speed 2496.78 samples/sec Loss 39.4663 LearningRate 0.000097 Epoch: 0 Global Step: 8010 Fp16 Grad Scale: 2048 Required: 187 hours Training: 2022-07-05 16:16:17,129-Speed 2495.40 samples/sec Loss 39.4909 LearningRate 0.000097 Epoch: 0 Global Step: 8020 Fp16 Grad Scale: 2048 Required: 187 hours Training: 2022-07-05 16:16:25,337-Speed 2495.55 samples/sec Loss 39.4896 LearningRate 0.000097 Epoch: 0 Global Step: 8030 Fp16 Grad 
Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:16:33,544-Speed 2495.44 samples/sec Loss 39.4792 LearningRate 0.000097 Epoch: 0 Global Step: 8040 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:16:41,698-Speed 2512.11 samples/sec Loss 39.4666 LearningRate 0.000097 Epoch: 0 Global Step: 8050 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:16:49,909-Speed 2494.64 samples/sec Loss 39.4597 LearningRate 0.000097 Epoch: 0 Global Step: 8060 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:16:58,121-Speed 2494.45 samples/sec Loss 39.4922 LearningRate 0.000097 Epoch: 0 Global Step: 8070 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:17:06,328-Speed 2495.73 samples/sec Loss 39.4609 LearningRate 0.000097 Epoch: 0 Global Step: 8080 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:17:14,531-Speed 2497.01 samples/sec Loss 39.4495 LearningRate 0.000098 Epoch: 0 Global Step: 8090 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:17:22,738-Speed 2495.55 samples/sec Loss 39.4691 LearningRate 0.000098 Epoch: 0 Global Step: 8100 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:17:30,895-Speed 2511.23 samples/sec Loss 39.4517 LearningRate 0.000098 Epoch: 0 Global Step: 8110 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:17:39,101-Speed 2495.98 samples/sec Loss 39.4991 LearningRate 0.000098 Epoch: 0 Global Step: 8120 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:17:47,313-Speed 2494.50 samples/sec Loss 39.4922 LearningRate 0.000098 Epoch: 0 Global Step: 8130 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:17:55,525-Speed 2494.20 samples/sec Loss 39.4643 LearningRate 0.000098 Epoch: 0 Global Step: 8140 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:18:03,735-Speed 2494.79 samples/sec Loss 39.4768 LearningRate 0.000098 Epoch: 0 Global Step: 8150 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:18:11,946-Speed 2494.42 samples/sec Loss 39.4760 LearningRate 0.000098 Epoch: 0 Global Step: 8160 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:18:20,105-Speed 2510.61 samples/sec Loss 39.4936 LearningRate 0.000098 Epoch: 0 Global Step: 8170 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:18:28,313-Speed 2495.29 samples/sec Loss 39.5005 LearningRate 0.000099 Epoch: 0 Global Step: 8180 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:18:36,523-Speed 2494.68 samples/sec Loss 39.5082 LearningRate 0.000099 Epoch: 0 Global Step: 8190 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:18:44,737-Speed 2493.72 samples/sec Loss 39.4764 LearningRate 0.000099 Epoch: 0 Global Step: 8200 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:18:52,949-Speed 2494.35 samples/sec Loss 39.4828 LearningRate 0.000099 Epoch: 0 Global Step: 8210 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:19:01,160-Speed 2494.61 samples/sec Loss 39.4758 LearningRate 0.000099 Epoch: 0 Global Step: 8220 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:19:09,326-Speed 2508.17 samples/sec Loss 39.5183 LearningRate 0.000099 Epoch: 0 Global Step: 8230 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:19:17,536-Speed 2494.69 samples/sec Loss 39.5258 LearningRate 0.000099 Epoch: 0 Global Step: 8240 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:19:25,759-Speed 2491.00 samples/sec Loss 39.5135 LearningRate 0.000099 Epoch: 0 Global Step: 8250 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:19:33,966-Speed 2495.74 samples/sec Loss 39.5105 LearningRate 0.000100 Epoch: 0 Global Step: 8260 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:19:42,169-Speed 2496.86 samples/sec Loss 39.4971 LearningRate 0.000100 Epoch: 0 Global Step: 8270 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:19:50,373-Speed 2496.85 samples/sec Loss 39.5007 LearningRate 0.000100 Epoch: 0 Global Step: 8280 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:19:58,530-Speed 2511.10 samples/sec Loss 39.4923 LearningRate 0.000100 Epoch: 0 Global Step: 8290 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:20:06,737-Speed 2495.62 samples/sec Loss 39.5008 LearningRate 0.000100 Epoch: 0 Global Step: 8300 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:20:14,947-Speed 2494.89 samples/sec Loss 39.5014 LearningRate 0.000100 Epoch: 0 Global Step: 8310 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:20:23,157-Speed 2494.90 samples/sec Loss 39.5047 LearningRate 0.000100 Epoch: 0 Global Step: 8320 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:20:31,366-Speed 2495.70 samples/sec Loss 39.4722 LearningRate 0.000100 Epoch: 0 Global Step: 8330 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:20:39,569-Speed 2496.80 samples/sec Loss 39.4981 LearningRate 0.000101 Epoch: 0 Global Step: 8340 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:20:47,719-Speed 2513.24 samples/sec Loss 39.4851 LearningRate 0.000101 Epoch: 0 Global Step: 8350 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:20:55,925-Speed 2495.94 samples/sec Loss 39.4951 LearningRate 0.000101 Epoch: 0 Global Step: 8360 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:21:04,135-Speed 2494.72 samples/sec Loss 39.5059 LearningRate 0.000101 Epoch: 0 Global Step: 8370 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:21:12,342-Speed 2495.93 samples/sec Loss 39.5129 LearningRate 0.000101 Epoch: 0 Global Step: 8380 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:21:20,549-Speed 2495.63 samples/sec Loss 39.5110 LearningRate 0.000101 Epoch: 0 Global Step: 8390 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:21:28,758-Speed 2495.15 samples/sec Loss 39.5090 LearningRate 0.000101 Epoch: 0 Global Step: 8400 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:21:36,915-Speed 2511.47 samples/sec Loss 39.5066 LearningRate 0.000101 Epoch: 0 Global Step: 8410 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:21:45,127-Speed 2493.99 samples/sec Loss 39.5002 LearningRate 0.000101 Epoch: 0 Global Step: 8420 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:21:53,339-Speed 2494.32 samples/sec Loss 39.5163 LearningRate 0.000102 Epoch: 0 Global Step: 8430 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:22:01,550-Speed 2494.66 samples/sec Loss 39.4896 LearningRate 0.000102 Epoch: 0 Global Step: 8440 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:22:09,759-Speed 2495.22 samples/sec Loss 39.4869 LearningRate 0.000102 Epoch: 0 Global Step: 8450 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:22:17,962-Speed 2497.03 samples/sec Loss 39.5008 LearningRate 0.000102 Epoch: 0 Global Step: 8460 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:22:26,114-Speed 2512.53 samples/sec Loss 39.4917 LearningRate 0.000102 Epoch: 0 Global Step: 8470 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:22:34,324-Speed 2494.72 samples/sec Loss 39.5228 LearningRate 0.000102 Epoch: 0 Global Step: 8480 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:22:42,545-Speed 2491.46 samples/sec Loss 39.4987 LearningRate 0.000102 Epoch: 0 Global Step: 8490 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:22:50,754-Speed 2495.33 samples/sec Loss 39.5028 LearningRate 0.000102 Epoch: 0 Global Step: 8500 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:22:58,966-Speed 2494.16 samples/sec Loss 39.4948 LearningRate 0.000103 Epoch: 0 Global Step: 8510 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:23:07,178-Speed 2494.14 samples/sec Loss 39.5176 LearningRate 0.000103 Epoch: 0 Global Step: 8520 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:23:15,337-Speed 2510.60 samples/sec Loss 39.4975 LearningRate 0.000103 Epoch: 0 Global Step: 8530 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:23:23,545-Speed 2495.47 samples/sec Loss 39.4785 LearningRate 0.000103 Epoch: 0 Global Step: 8540 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:23:31,753-Speed 2495.47 samples/sec Loss 39.4922 LearningRate 0.000103 Epoch: 0 Global Step: 8550 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:23:39,961-Speed 2495.99 samples/sec Loss 39.4952 LearningRate 0.000103 Epoch: 0 Global Step: 8560 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:23:48,172-Speed 2494.80 samples/sec Loss 39.4959 LearningRate 0.000103 Epoch: 0 Global Step: 8570 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:23:56,380-Speed 2495.33 samples/sec Loss 39.4885 LearningRate 0.000103 Epoch: 0 Global Step: 8580 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:24:04,534-Speed 2512.23 samples/sec Loss 39.4954 LearningRate 0.000104 Epoch: 0 Global Step: 8590 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:24:12,745-Speed 2494.58 samples/sec Loss 39.4799 LearningRate 0.000104 Epoch: 0 Global Step: 8600 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:24:20,958-Speed 2493.70 samples/sec Loss 39.5163 LearningRate 0.000104 Epoch: 0 Global Step: 8610 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:24:29,182-Speed 2490.76 samples/sec Loss 39.5256 LearningRate 0.000104 Epoch: 0 Global Step: 8620 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:24:37,390-Speed 2495.47 samples/sec Loss 39.4970 LearningRate 0.000104 Epoch: 0 Global Step: 8630 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:24:45,598-Speed 2495.43 samples/sec Loss 39.5139 LearningRate 0.000104 Epoch: 0 Global Step: 8640 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:24:53,754-Speed 2511.14 samples/sec Loss 39.5197 LearningRate 0.000104 Epoch: 0 Global Step: 8650 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:25:01,967-Speed 2494.11 samples/sec Loss 39.4920 LearningRate 0.000104 Epoch: 0 Global Step: 8660 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:25:10,175-Speed 2495.55 samples/sec Loss 39.5005 LearningRate 0.000105 Epoch: 0 Global Step: 8670 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:25:18,384-Speed 2495.19 samples/sec Loss 39.4924 LearningRate 0.000105 Epoch: 0 Global Step: 8680 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:25:26,597-Speed 2493.93 samples/sec Loss 39.5098 LearningRate 0.000105 Epoch: 0 Global Step: 8690 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:25:34,815-Speed 2492.24 samples/sec Loss 39.5252 LearningRate 0.000105 Epoch: 0 Global Step: 8700 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:25:42,978-Speed 2509.70 samples/sec Loss 39.5483 LearningRate 0.000105 Epoch: 0 Global Step: 8710 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:25:51,194-Speed 2492.63 samples/sec Loss 39.4892 LearningRate 0.000105 Epoch: 0 Global Step: 8720 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:25:59,413-Speed 2492.21 samples/sec Loss 39.5038 LearningRate 0.000105 Epoch: 0 Global Step: 8730 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:26:07,628-Speed 2493.46 samples/sec Loss 39.5034 LearningRate 0.000105 Epoch: 0 Global Step: 8740 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:26:15,840-Speed 2494.58 samples/sec Loss 39.5448 LearningRate 0.000105 Epoch: 0 Global Step: 8750 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:26:24,045-Speed 2496.16 samples/sec Loss 39.5204 LearningRate 0.000106 Epoch: 0 Global Step: 8760 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:26:32,205-Speed 2510.09 samples/sec Loss 39.5246 LearningRate 0.000106 Epoch: 0 Global Step: 8770 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:26:40,415-Speed 2494.98 samples/sec Loss 39.4824 LearningRate 0.000106 Epoch: 0 Global Step: 8780 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:26:48,623-Speed 2495.65 samples/sec Loss 39.5194 LearningRate 0.000106 Epoch: 0 Global Step: 8790 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:26:56,835-Speed 2494.35 samples/sec Loss 39.5332 LearningRate 0.000106 Epoch: 0 Global Step: 8800 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:27:05,053-Speed 2492.40 samples/sec Loss 39.5008 LearningRate 0.000106 Epoch: 0 Global Step: 8810 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:27:13,267-Speed 2493.77 samples/sec Loss 39.5174 LearningRate 0.000106 Epoch: 0 Global Step: 8820 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:27:21,423-Speed 2511.45 samples/sec Loss 39.5099 LearningRate 0.000106 Epoch: 0 Global Step: 8830 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:27:29,632-Speed 2494.94 samples/sec Loss 39.4878 LearningRate 0.000107 Epoch: 0 Global Step: 8840 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:27:37,839-Speed 2496.07 samples/sec Loss 39.5335 LearningRate 0.000107 Epoch: 0 Global Step: 8850 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:27:46,048-Speed 2494.88 samples/sec Loss 39.5438 LearningRate 0.000107 Epoch: 0 Global Step: 8860 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:27:54,263-Speed 2493.60 samples/sec Loss 39.5376 LearningRate 0.000107 Epoch: 0 Global Step: 8870 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:28:02,480-Speed 2492.81 samples/sec Loss 39.5059 LearningRate 0.000107 Epoch: 0 Global Step: 8880 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:28:10,634-Speed 2512.12 samples/sec Loss 39.4937 LearningRate 0.000107 Epoch: 0 Global Step: 8890 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:28:18,845-Speed 2494.68 samples/sec Loss 39.5137 LearningRate 0.000107 Epoch: 0 Global Step: 8900 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:28:27,052-Speed 2495.91 samples/sec Loss 39.5162 LearningRate 0.000107 Epoch: 0 Global Step: 8910 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:28:35,260-Speed 2495.94 samples/sec Loss 39.5334 LearningRate 0.000108 Epoch: 0 Global Step: 8920 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:28:43,466-Speed 2495.73 samples/sec Loss 39.5329 LearningRate 0.000108 Epoch: 0 Global Step: 8930 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:28:51,672-Speed 2496.39 samples/sec Loss 39.5256 LearningRate 0.000108 Epoch: 0 Global Step: 8940 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:28:59,829-Speed 2510.90 samples/sec Loss 39.5257 LearningRate 0.000108 Epoch: 0 Global Step: 8950 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:29:08,043-Speed 2493.68 samples/sec Loss 39.5156 LearningRate 0.000108 Epoch: 0 Global Step: 8960 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:29:16,253-Speed 2494.60 samples/sec Loss 39.5132 LearningRate 0.000108 Epoch: 0 Global Step: 8970 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:29:24,463-Speed 2495.42 samples/sec Loss 39.5152 LearningRate 0.000108 Epoch: 0 Global Step: 8980 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:29:32,676-Speed 2493.98 samples/sec Loss 39.5281 LearningRate 0.000108 Epoch: 0 Global Step: 8990 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:29:40,885-Speed 2495.13 samples/sec Loss 39.5236 LearningRate 0.000108 Epoch: 0 Global Step: 9000 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:29:49,041-Speed 2511.20 samples/sec Loss 39.5240 LearningRate 0.000109 Epoch: 0 Global Step: 9010 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:29:57,251-Speed 2494.83 samples/sec Loss 39.5154 LearningRate 0.000109 Epoch: 0 Global Step: 9020 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:30:05,460-Speed 2495.35 samples/sec Loss 39.5097 LearningRate 0.000109 Epoch: 0 Global Step: 9030 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:30:13,670-Speed 2494.98 samples/sec Loss 39.5582 LearningRate 0.000109 Epoch: 0 Global Step: 9040 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:30:21,882-Speed 2494.19 samples/sec Loss 39.5108 LearningRate 0.000109 Epoch: 0 Global Step: 9050 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:30:30,091-Speed 2495.32 samples/sec Loss 39.5250 LearningRate 0.000109 Epoch: 0 Global Step: 9060 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:30:38,254-Speed 2509.13 samples/sec Loss 39.5098 LearningRate 0.000109 Epoch: 0 Global Step: 9070 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:30:46,473-Speed 2492.13 samples/sec Loss 39.5342 LearningRate 0.000109 Epoch: 0 Global Step: 9080 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:30:54,685-Speed 2494.47 samples/sec Loss 39.5084 LearningRate 0.000110 Epoch: 0 Global Step: 9090 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:31:02,895-Speed 2494.61 samples/sec Loss 39.5247 LearningRate 0.000110 Epoch: 0 Global Step: 9100 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:31:11,107-Speed 2494.40 samples/sec Loss 39.5416 LearningRate 0.000110 Epoch: 0 Global Step: 9110 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:31:19,316-Speed 2494.86 samples/sec Loss 39.5339 LearningRate 0.000110 Epoch: 0 Global Step: 9120 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:31:28,204-Speed 2511.75 samples/sec Loss 39.5261 LearningRate 0.000110 Epoch: 0 Global Step: 9130 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:31:37,721-Speed 2498.52 samples/sec Loss 39.5181 LearningRate 0.000110 Epoch: 0 Global Step: 9140 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:31:45,929-Speed 2495.31 samples/sec Loss 39.5606 LearningRate 0.000110 Epoch: 0 Global Step: 9150 Fp16 Grad Scale: 2048 Required: 187 hours
Training: 2022-07-05 16:31:54,406-Speed 2497.32 samples/sec Loss 39.5560 LearningRate 0.000110 Epoch: 0 Global Step: 9160 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:32:02,614-Speed 2495.39 samples/sec Loss 39.5628 LearningRate 0.000111 Epoch: 0 Global Step: 9170 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:32:10,825-Speed 2494.47 samples/sec Loss 39.5393 LearningRate 0.000111 Epoch: 0 Global Step: 9180 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:32:18,984-Speed 2510.50 samples/sec Loss 39.5599 LearningRate 0.000111 Epoch: 0 Global Step: 9190 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:32:27,202-Speed 2492.82 samples/sec Loss 39.5586 LearningRate 0.000111 Epoch: 0 Global Step: 9200 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:32:35,424-Speed 2491.52 samples/sec Loss 39.5407 LearningRate 0.000111 Epoch: 0 Global Step: 9210 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:32:43,645-Speed 2491.49 samples/sec Loss 39.5253 LearningRate 0.000111 Epoch: 0 Global Step: 9220 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:32:51,861-Speed 2493.18 samples/sec Loss 39.5568 LearningRate 0.000111 Epoch: 0 Global Step: 9230 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:33:00,092-Speed 2488.92 samples/sec Loss 39.5380 LearningRate 0.000111 Epoch: 0 Global Step: 9240 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:33:08,252-Speed 2510.21 samples/sec Loss 39.5156 LearningRate 0.000111 Epoch: 0 Global Step: 9250 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:33:16,473-Speed 2491.65 samples/sec Loss 39.5511 LearningRate 0.000112 Epoch: 0 Global Step: 9260 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:33:24,699-Speed 2490.21 samples/sec Loss 39.5360 LearningRate 0.000112 Epoch: 0 Global Step: 9270 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:33:32,912-Speed 2493.99 samples/sec Loss 39.5601 LearningRate 0.000112 Epoch: 0 Global Step: 9280 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:33:41,140-Speed 2489.50 samples/sec Loss 39.5336 LearningRate 0.000112 Epoch: 0 Global Step: 9290 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:33:49,354-Speed 2493.46 samples/sec Loss 39.5319 LearningRate 0.000112 Epoch: 0 Global Step: 9300 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:33:57,513-Speed 2510.69 samples/sec Loss 39.5108 LearningRate 0.000112 Epoch: 0 Global Step: 9310 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:34:05,726-Speed 2494.06 samples/sec Loss 39.5375 LearningRate 0.000112 Epoch: 0 Global Step: 9320 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:34:13,944-Speed 2492.56 samples/sec Loss 39.5527 LearningRate 0.000112 Epoch: 0 Global Step: 9330 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:34:22,159-Speed 2493.16 samples/sec Loss 39.5461 LearningRate 0.000113 Epoch: 0 Global Step: 9340 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:34:30,379-Speed 2491.84 samples/sec Loss 39.4970 LearningRate 0.000113 Epoch: 0 Global Step: 9350 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:34:38,609-Speed 2488.95 samples/sec Loss 39.5029 LearningRate 0.000113 Epoch: 0 Global Step: 9360 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:34:46,779-Speed 2507.02 samples/sec Loss 39.5382 LearningRate 0.000113 Epoch: 0 Global Step: 9370 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:34:54,995-Speed 2493.07 samples/sec Loss 39.5312 LearningRate 0.000113 Epoch: 0 Global Step: 9380 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:35:03,212-Speed 2493.11 samples/sec Loss 39.5478 LearningRate 0.000113 Epoch: 0 Global Step: 9390 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:35:11,429-Speed 2493.09 samples/sec Loss 39.5349 LearningRate 0.000113 Epoch: 0 Global Step: 9400 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:35:19,649-Speed 2491.91 samples/sec Loss 39.5259 LearningRate 0.000113 Epoch: 0 Global Step: 9410 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:35:27,865-Speed 2492.89 samples/sec Loss 39.5092 LearningRate 0.000114 Epoch: 0 Global Step: 9420 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:35:36,029-Speed 2509.02 samples/sec Loss 39.5240 LearningRate 0.000114 Epoch: 0 Global Step: 9430 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:35:44,244-Speed 2493.42 samples/sec Loss 39.5527 LearningRate 0.000114 Epoch: 0 Global Step: 9440 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:35:52,465-Speed 2491.49 samples/sec Loss 39.5373 LearningRate 0.000114 Epoch: 0 Global Step: 9450 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:36:00,685-Speed 2491.97 samples/sec Loss 39.5376 LearningRate 0.000114 Epoch: 0 Global Step: 9460 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:36:08,907-Speed 2491.37 samples/sec Loss 39.5388 LearningRate 0.000114 Epoch: 0 Global Step: 9470 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:36:17,126-Speed 2491.88 samples/sec Loss 39.5794 LearningRate 0.000114 Epoch: 0 Global Step: 9480 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:36:25,290-Speed 2508.97 samples/sec Loss 39.5607 LearningRate 0.000114 Epoch: 0 Global Step: 9490 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:36:33,517-Speed 2490.02 samples/sec Loss 39.5395 LearningRate 0.000115 Epoch: 0 Global Step: 9500 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:36:41,728-Speed 2494.61 samples/sec Loss 39.5408 LearningRate 0.000115 Epoch: 0 Global Step: 9510 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:36:49,956-Speed 2489.43 samples/sec Loss 39.5416 LearningRate 0.000115 Epoch: 0 Global Step: 9520 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:36:58,196-Speed 2485.83 samples/sec Loss 39.5470 LearningRate 0.000115 Epoch: 0 Global Step: 9530 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:37:06,410-Speed 2493.88 samples/sec Loss 39.5145 LearningRate 0.000115 Epoch: 0 Global Step: 9540 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:37:14,578-Speed 2507.57 samples/sec Loss 39.5440 LearningRate 0.000115 Epoch: 0 Global Step: 9550 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:37:22,795-Speed 2492.86 samples/sec Loss 39.5720 LearningRate 0.000115 Epoch: 0 Global Step: 9560 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:37:31,007-Speed 2494.20 samples/sec Loss 39.5483 LearningRate 0.000115 Epoch: 0 Global Step: 9570 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:37:39,232-Speed 2490.33 samples/sec Loss 39.5102 LearningRate 0.000115 Epoch: 0 Global Step: 9580 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:37:47,449-Speed 2492.77 samples/sec Loss 39.5393 LearningRate 0.000116 Epoch: 0 Global Step: 9590 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:37:55,662-Speed 2493.98 samples/sec Loss 39.5399 LearningRate 0.000116 Epoch: 0 Global Step: 9600 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:38:03,826-Speed 2509.09 samples/sec Loss 39.5117 LearningRate 0.000116 Epoch: 0 Global Step: 9610 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:38:12,040-Speed 2493.88 samples/sec Loss 39.5152 LearningRate 0.000116 Epoch: 0 Global Step: 9620 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:38:20,253-Speed 2493.75 samples/sec Loss 39.5462 LearningRate 0.000116 Epoch: 0 Global Step: 9630 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:38:28,471-Speed 2492.67 samples/sec Loss 39.5205 LearningRate 0.000116 Epoch: 0 Global Step: 9640 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:38:36,684-Speed 2493.79 samples/sec Loss 39.5224 LearningRate 0.000116 Epoch: 0 Global Step: 9650 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:38:44,899-Speed 2493.48 samples/sec Loss 39.5449 LearningRate 0.000116 Epoch: 0 Global Step: 9660 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:38:53,075-Speed 2505.14 samples/sec Loss 39.5283 LearningRate 0.000117 Epoch: 0 Global Step: 9670 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:39:01,304-Speed 2489.25 samples/sec Loss 39.5400 LearningRate 0.000117 Epoch: 0 Global Step: 9680 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:39:09,518-Speed 2493.34 samples/sec Loss 39.5127 LearningRate 0.000117 Epoch: 0 Global Step: 9690 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:39:17,735-Speed 2492.95 samples/sec Loss 39.5216 LearningRate 0.000117 Epoch: 0 Global Step: 9700 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:39:25,951-Speed 2493.21 samples/sec Loss 39.5476 LearningRate 0.000117 Epoch: 0 Global Step: 9710 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:39:34,169-Speed 2492.42 samples/sec Loss 39.5511 LearningRate 0.000117 Epoch: 0 Global Step: 9720 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:39:42,332-Speed 2509.35 samples/sec Loss 39.5743 LearningRate 0.000117 Epoch: 0 Global Step: 9730 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:39:50,546-Speed 2493.47 samples/sec Loss 39.5376 LearningRate 0.000117 Epoch: 0 Global Step: 9740 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:39:58,761-Speed 2493.42 samples/sec Loss 39.5462 LearningRate 0.000118 Epoch: 0 Global Step: 9750 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:40:06,978-Speed 2492.81 samples/sec Loss 39.5230 LearningRate 0.000118 Epoch: 0 Global Step: 9760 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:40:15,205-Speed 2489.45 samples/sec Loss 39.5489 LearningRate 0.000118 Epoch: 0 Global Step: 9770 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:40:23,420-Speed 2493.53 samples/sec Loss 39.5310 LearningRate 0.000118 Epoch: 0 Global Step: 9780 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:40:31,582-Speed 2509.69 samples/sec Loss 39.5202 LearningRate 0.000118 Epoch: 0 Global Step: 9790 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:40:39,796-Speed 2493.55 samples/sec Loss 39.5238 LearningRate 0.000118 Epoch: 0 Global Step: 9800 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:40:48,012-Speed 2493.12 samples/sec Loss 39.5235 LearningRate 0.000118 Epoch: 0 Global Step: 9810 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:40:56,227-Speed 2493.43 samples/sec Loss 39.5461 LearningRate 0.000118 Epoch: 0 Global Step: 9820 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:41:04,447-Speed 2492.14 samples/sec Loss 39.5253 LearningRate 0.000118 Epoch: 0 Global Step: 9830 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:41:12,661-Speed 2493.78 samples/sec Loss 39.5350 LearningRate 0.000119 Epoch: 0 Global Step: 9840 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:41:20,818-Speed 2510.88 samples/sec Loss 39.4963 LearningRate 0.000119 Epoch: 0 Global Step: 9850 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:41:29,032-Speed 2493.82 samples/sec Loss 39.5360 LearningRate 0.000119 Epoch: 0 Global Step: 9860 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:41:37,247-Speed 2493.29 samples/sec Loss 39.5439 LearningRate 0.000119 Epoch: 0 Global Step: 9870 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:41:45,471-Speed 2490.94 samples/sec Loss 39.5312 LearningRate 0.000119 Epoch: 0 Global Step: 9880 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:41:53,684-Speed 2493.98 samples/sec Loss 39.5626 LearningRate 0.000119 Epoch: 0 Global Step: 9890 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:42:01,897-Speed 2493.91 samples/sec Loss 39.5291 LearningRate 0.000119 Epoch: 0 Global Step: 9900 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:42:10,062-Speed 2508.90 samples/sec Loss 39.5489 LearningRate 0.000119 Epoch: 0 Global Step: 9910 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:42:18,274-Speed 2494.42 samples/sec Loss 39.5702 LearningRate 0.000120 Epoch: 0 Global Step: 9920 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:42:26,485-Speed 2494.49 samples/sec Loss 39.5225 LearningRate 0.000120 Epoch: 0 Global Step: 9930 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:42:34,701-Speed 2493.49 samples/sec Loss 39.5264 LearningRate 0.000120 Epoch: 0 Global Step: 9940 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:42:42,913-Speed 2494.30 samples/sec Loss 39.5599 LearningRate 0.000120 Epoch: 0 Global Step: 9950 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:42:51,137-Speed 2490.70 samples/sec Loss 39.5338 LearningRate 0.000120 Epoch: 0 Global Step: 9960 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:42:59,298-Speed 2509.75 samples/sec Loss 39.5359 LearningRate 0.000120 Epoch: 0 Global Step: 9970 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:43:07,528-Speed 2488.72 samples/sec Loss 39.5339 LearningRate 0.000120 Epoch: 0 Global Step: 9980 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:43:15,741-Speed 2494.02 samples/sec Loss 39.5447 LearningRate 0.000120 Epoch: 0 Global Step: 9990 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:43:23,960-Speed 2492.23 samples/sec Loss 39.5647 LearningRate 0.000121 Epoch: 0 Global Step: 10000 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:43:32,175-Speed 2493.35 samples/sec Loss 39.5471 LearningRate 0.000121 Epoch: 0 Global Step: 10010 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:43:40,384-Speed 2495.02 samples/sec Loss 39.5492 LearningRate 0.000121 Epoch: 0 Global Step: 10020 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:43:48,549-Speed 2508.81 samples/sec Loss 39.5343 LearningRate 0.000121 Epoch: 0 Global Step: 10030 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:43:56,764-Speed 2493.27 samples/sec Loss 39.5476 LearningRate 0.000121 Epoch: 0 Global Step: 10040 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:44:04,979-Speed 2493.45 samples/sec Loss 39.5393 LearningRate 0.000121 Epoch: 0 Global Step: 10050 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:44:13,191-Speed 2494.64 samples/sec Loss 39.5582 LearningRate 0.000121 Epoch: 0 Global Step: 10060 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:44:21,406-Speed 2493.51 samples/sec Loss 39.5667 LearningRate 0.000121 Epoch: 0 Global Step: 10070 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:44:29,626-Speed 2491.85 samples/sec Loss 39.5361 LearningRate 0.000121 Epoch: 0 Global Step: 10080 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:44:37,791-Speed 2508.80 samples/sec Loss 39.5531 LearningRate 0.000122 Epoch: 0 Global Step: 10090 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:44:46,007-Speed 2493.01 samples/sec Loss 39.5282 LearningRate 0.000122 Epoch: 0 Global Step: 10100 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:44:54,225-Speed 2492.84 samples/sec Loss 39.5492 LearningRate 0.000122 Epoch: 0 Global Step: 10110 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:45:02,442-Speed 2493.05 samples/sec Loss 39.5365 LearningRate 0.000122 Epoch: 0 Global Step: 10120 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:45:10,660-Speed 2492.63 samples/sec Loss 39.5126 LearningRate 0.000122 Epoch: 0 Global Step: 10130 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:45:18,878-Speed 2492.29 samples/sec Loss 39.5391 LearningRate 0.000122 Epoch: 0 Global Step: 10140 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:45:27,042-Speed 2509.00 samples/sec Loss 39.5619 LearningRate 0.000122 Epoch: 0 Global Step: 10150 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:45:35,262-Speed 2492.06 samples/sec Loss 39.5509 LearningRate 0.000122 Epoch: 0 Global Step: 10160 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:45:43,475-Speed 2493.95 samples/sec Loss 39.5547 LearningRate 0.000123 Epoch: 0 Global Step: 10170 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:45:51,689-Speed 2493.65 samples/sec Loss 39.5471 LearningRate 0.000123 Epoch: 0 Global Step: 10180 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:45:59,906-Speed 2492.95 samples/sec Loss 39.5398 LearningRate 0.000123 Epoch: 0 Global Step: 10190 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:46:08,122-Speed 2493.03 samples/sec Loss 39.5369 LearningRate 0.000123 Epoch: 0 Global Step: 10200 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:46:16,281-Speed 2510.41 samples/sec Loss 39.5090 LearningRate 0.000123 Epoch: 0 Global Step: 10210 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:46:24,493-Speed 2494.26 samples/sec Loss 39.5298 LearningRate 0.000123 Epoch: 0 Global Step: 10220 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:46:32,708-Speed 2493.41 samples/sec Loss 39.5441 LearningRate 0.000123 Epoch: 0 Global Step: 10230 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:46:40,925-Speed 2492.83 samples/sec Loss 39.5410 LearningRate 0.000123 Epoch: 0 Global Step: 10240 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:46:49,153-Speed 2489.56 samples/sec Loss 39.5018 LearningRate 0.000124 Epoch: 0 Global Step: 10250 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:46:57,367-Speed 2493.48 samples/sec Loss 39.5346 LearningRate 0.000124 Epoch: 0 Global Step: 10260 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:47:05,539-Speed 2507.46 samples/sec Loss 39.5407 LearningRate 0.000124 Epoch: 0 Global Step: 10270 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:47:13,756-Speed 2492.92 samples/sec Loss 39.5227 LearningRate 0.000124 Epoch: 0 Global Step: 10280 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:47:21,975-Speed 2491.97 samples/sec Loss 39.5640 LearningRate 0.000124 Epoch: 0 Global Step: 10290 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:47:30,192-Speed 2492.91 samples/sec Loss 39.5166 LearningRate 0.000124 Epoch: 0 Global Step: 10300 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:47:38,406-Speed 2493.56 samples/sec Loss 39.5001 LearningRate 0.000124 Epoch: 0 Global Step: 10310 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:47:46,617-Speed 2494.63 samples/sec Loss 39.5445 LearningRate 0.000124 Epoch: 0 Global Step: 10320 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:47:54,781-Speed 2509.27 samples/sec Loss 39.5595 LearningRate 0.000125 Epoch: 0 Global Step: 10330 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:48:03,007-Speed 2490.17 samples/sec Loss 39.5099 LearningRate 0.000125 Epoch: 0 Global Step: 10340 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:48:11,221-Speed 2493.67 samples/sec Loss 39.5538 LearningRate 0.000125 Epoch: 0 Global Step: 10350 Fp16 Grad Scale: 4096 Required: 187 hours
Training: 2022-07-05 16:48:19,433-Speed 2494.28 samples/sec Loss 39.5355 LearningRate 0.000125 Epoch: 0 Global Step: 10360 Fp16 Grad Scale: 8192 Required: 187 hours
Training: 2022-07-05 16:48:27,647-Speed 2493.89 samples/sec Loss 39.5301 LearningRate 0.000125 Epoch: 0 Global Step: 10370 Fp16 Grad Scale: 8192 Required: 187 hours
Training: 2022-07-05 16:48:35,864-Speed 2492.65 samples/sec Loss 39.5441 LearningRate 0.000125 Epoch: 0 Global Step: 10380 Fp16 Grad Scale: 8192 Required: 187 hours
Training: 2022-07-05 16:48:44,035-Speed 2507.01 samples/sec Loss 39.5579 LearningRate 0.000125 Epoch: 0 Global Step: 10390 Fp16 Grad Scale: 8192 Required: 187 hours
Training: 2022-07-05 16:48:52,255-Speed 2491.83 samples/sec Loss 39.5453 LearningRate 0.000125 Epoch: 0 Global Step: 10400 Fp16 Grad Scale: 8192 Required: 187 hours
Training: 2022-07-05 16:49:00,470-Speed 2493.53 samples/sec Loss 39.5385 LearningRate 0.000125 Epoch: 0 Global Step: 10410 Fp16 Grad Scale: 8192 Required: 187 hours
Training: 2022-07-05 16:49:08,683-Speed 2493.92 samples/sec Loss 39.5418 LearningRate 0.000126 Epoch: 0 Global Step: 10420 Fp16 Grad Scale: 8192 Required: 187 hours
Training: 2022-07-05 16:49:16,901-Speed 2492.75 samples/sec Loss 39.5045 LearningRate 0.000126 Epoch: 0 Global Step: 10430 Fp16 Grad Scale: 8192 Required: 187 hours
Training: 2022-07-05 16:49:25,117-Speed 2493.19 samples/sec Loss 39.5404 LearningRate 0.000126 Epoch: 0 Global Step: 10440 Fp16 Grad Scale: 8192 Required: 187 hours
Training: 2022-07-05 16:49:33,281-Speed 2509.11 samples/sec Loss 39.4911 LearningRate 0.000126 Epoch: 0 Global Step: 10450 Fp16 Grad Scale: 8192 Required: 187 hours
Training: 2022-07-05 16:49:41,496-Speed 2493.44 samples/sec Loss 39.5452 LearningRate 0.000126 Epoch: 0 Global Step: 10460 Fp16 Grad Scale: 8192
Required: 187 hours Training: 2022-07-05 16:49:49,712-Speed 2493.11 samples/sec Loss 39.5181 LearningRate 0.000126 Epoch: 0 Global Step: 10470 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:49:57,929-Speed 2492.95 samples/sec Loss 39.5497 LearningRate 0.000126 Epoch: 0 Global Step: 10480 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:50:06,145-Speed 2492.95 samples/sec Loss 39.5607 LearningRate 0.000126 Epoch: 0 Global Step: 10490 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:50:14,367-Speed 2491.36 samples/sec Loss 39.5491 LearningRate 0.000127 Epoch: 0 Global Step: 10500 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:50:22,529-Speed 2509.50 samples/sec Loss 39.5913 LearningRate 0.000127 Epoch: 0 Global Step: 10510 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:50:30,746-Speed 2493.13 samples/sec Loss 39.5272 LearningRate 0.000127 Epoch: 0 Global Step: 10520 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:50:38,958-Speed 2494.15 samples/sec Loss 39.5590 LearningRate 0.000127 Epoch: 0 Global Step: 10530 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:50:47,174-Speed 2492.94 samples/sec Loss 39.5256 LearningRate 0.000127 Epoch: 0 Global Step: 10540 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:50:55,391-Speed 2492.97 samples/sec Loss 39.5743 LearningRate 0.000127 Epoch: 0 Global Step: 10550 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:51:03,610-Speed 2492.21 samples/sec Loss 39.5168 LearningRate 0.000127 Epoch: 0 Global Step: 10560 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:51:11,772-Speed 2509.49 samples/sec Loss 39.5374 LearningRate 0.000127 Epoch: 0 Global Step: 10570 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:51:19,994-Speed 2491.19 samples/sec Loss 39.5415 LearningRate 0.000128 Epoch: 0 Global Step: 10580 Fp16 Grad Scale: 8192 Required: 187 hours 
Training: 2022-07-05 16:51:28,211-Speed 2492.85 samples/sec Loss 39.5167 LearningRate 0.000128 Epoch: 0 Global Step: 10590 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:51:36,429-Speed 2492.91 samples/sec Loss 39.5051 LearningRate 0.000128 Epoch: 0 Global Step: 10600 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:51:44,641-Speed 2494.14 samples/sec Loss 39.5245 LearningRate 0.000128 Epoch: 0 Global Step: 10610 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:51:52,864-Speed 2490.86 samples/sec Loss 39.5686 LearningRate 0.000128 Epoch: 0 Global Step: 10620 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:52:01,021-Speed 2511.38 samples/sec Loss 39.5600 LearningRate 0.000128 Epoch: 0 Global Step: 10630 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:52:09,239-Speed 2492.59 samples/sec Loss 39.5214 LearningRate 0.000128 Epoch: 0 Global Step: 10640 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:52:17,455-Speed 2493.08 samples/sec Loss 39.5224 LearningRate 0.000128 Epoch: 0 Global Step: 10650 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:52:25,671-Speed 2492.84 samples/sec Loss 39.5271 LearningRate 0.000128 Epoch: 0 Global Step: 10660 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:52:33,895-Speed 2490.84 samples/sec Loss 39.5215 LearningRate 0.000129 Epoch: 0 Global Step: 10670 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:52:42,113-Speed 2492.63 samples/sec Loss 39.5280 LearningRate 0.000129 Epoch: 0 Global Step: 10680 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:52:50,276-Speed 2509.35 samples/sec Loss 39.5613 LearningRate 0.000129 Epoch: 0 Global Step: 10690 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:52:58,507-Speed 2488.49 samples/sec Loss 39.5691 LearningRate 0.000129 Epoch: 0 Global Step: 10700 Fp16 Grad Scale: 8192 Required: 187 hours Training: 
2022-07-05 16:53:06,730-Speed 2491.13 samples/sec Loss 39.5259 LearningRate 0.000129 Epoch: 0 Global Step: 10710 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:53:14,949-Speed 2492.16 samples/sec Loss 39.5530 LearningRate 0.000129 Epoch: 0 Global Step: 10720 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:53:23,164-Speed 2493.82 samples/sec Loss 39.5317 LearningRate 0.000129 Epoch: 0 Global Step: 10730 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:53:31,387-Speed 2491.08 samples/sec Loss 39.5427 LearningRate 0.000129 Epoch: 0 Global Step: 10740 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:53:39,549-Speed 2509.58 samples/sec Loss 39.5195 LearningRate 0.000130 Epoch: 0 Global Step: 10750 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:53:47,765-Speed 2492.93 samples/sec Loss 39.5268 LearningRate 0.000130 Epoch: 0 Global Step: 10760 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:53:55,996-Speed 2488.68 samples/sec Loss 39.5341 LearningRate 0.000130 Epoch: 0 Global Step: 10770 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:54:04,213-Speed 2493.01 samples/sec Loss 39.5420 LearningRate 0.000130 Epoch: 0 Global Step: 10780 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:54:12,429-Speed 2493.23 samples/sec Loss 39.5680 LearningRate 0.000130 Epoch: 0 Global Step: 10790 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:54:20,642-Speed 2493.79 samples/sec Loss 39.5159 LearningRate 0.000130 Epoch: 0 Global Step: 10800 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:54:28,807-Speed 2508.74 samples/sec Loss 39.5432 LearningRate 0.000130 Epoch: 0 Global Step: 10810 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:54:37,025-Speed 2492.50 samples/sec Loss 39.5066 LearningRate 0.000130 Epoch: 0 Global Step: 10820 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 
16:54:45,238-Speed 2494.10 samples/sec Loss 39.5299 LearningRate 0.000131 Epoch: 0 Global Step: 10830 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:54:53,454-Speed 2493.29 samples/sec Loss 39.5396 LearningRate 0.000131 Epoch: 0 Global Step: 10840 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:55:01,671-Speed 2492.92 samples/sec Loss 39.5433 LearningRate 0.000131 Epoch: 0 Global Step: 10850 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:55:09,890-Speed 2492.03 samples/sec Loss 39.5426 LearningRate 0.000131 Epoch: 0 Global Step: 10860 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:55:18,056-Speed 2508.50 samples/sec Loss 39.5624 LearningRate 0.000131 Epoch: 0 Global Step: 10870 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:55:26,272-Speed 2492.88 samples/sec Loss 39.5314 LearningRate 0.000131 Epoch: 0 Global Step: 10880 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:55:34,492-Speed 2491.86 samples/sec Loss 39.5374 LearningRate 0.000131 Epoch: 0 Global Step: 10890 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:55:42,712-Speed 2491.98 samples/sec Loss 39.5681 LearningRate 0.000131 Epoch: 0 Global Step: 10900 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:55:50,942-Speed 2489.02 samples/sec Loss 39.5299 LearningRate 0.000132 Epoch: 0 Global Step: 10910 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:55:59,161-Speed 2491.94 samples/sec Loss 39.5526 LearningRate 0.000132 Epoch: 0 Global Step: 10920 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:56:07,326-Speed 2508.80 samples/sec Loss 39.5245 LearningRate 0.000132 Epoch: 0 Global Step: 10930 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:56:15,543-Speed 2493.00 samples/sec Loss 39.5351 LearningRate 0.000132 Epoch: 0 Global Step: 10940 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:56:23,763-Speed 
2491.69 samples/sec Loss 39.5316 LearningRate 0.000132 Epoch: 0 Global Step: 10950 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:56:31,979-Speed 2493.06 samples/sec Loss 39.5318 LearningRate 0.000132 Epoch: 0 Global Step: 10960 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:56:40,210-Speed 2488.74 samples/sec Loss 39.5261 LearningRate 0.000132 Epoch: 0 Global Step: 10970 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:56:48,426-Speed 2493.03 samples/sec Loss 39.5236 LearningRate 0.000132 Epoch: 0 Global Step: 10980 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:56:56,590-Speed 2509.04 samples/sec Loss 39.5472 LearningRate 0.000132 Epoch: 0 Global Step: 10990 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:57:04,811-Speed 2491.34 samples/sec Loss 39.5353 LearningRate 0.000133 Epoch: 0 Global Step: 11000 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:57:13,027-Speed 2493.41 samples/sec Loss 39.5346 LearningRate 0.000133 Epoch: 0 Global Step: 11010 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:57:21,242-Speed 2493.40 samples/sec Loss 39.5369 LearningRate 0.000133 Epoch: 0 Global Step: 11020 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:57:29,461-Speed 2492.09 samples/sec Loss 39.5091 LearningRate 0.000133 Epoch: 0 Global Step: 11030 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:57:37,690-Speed 2489.27 samples/sec Loss 39.5381 LearningRate 0.000133 Epoch: 0 Global Step: 11040 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:57:45,853-Speed 2509.40 samples/sec Loss 39.5213 LearningRate 0.000133 Epoch: 0 Global Step: 11050 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:57:54,069-Speed 2493.01 samples/sec Loss 39.5470 LearningRate 0.000133 Epoch: 0 Global Step: 11060 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:58:02,284-Speed 2493.27 samples/sec 
Loss 39.5199 LearningRate 0.000133 Epoch: 0 Global Step: 11070 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:58:10,520-Speed 2487.13 samples/sec Loss 39.5680 LearningRate 0.000134 Epoch: 0 Global Step: 11080 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:58:18,746-Speed 2489.84 samples/sec Loss 39.5310 LearningRate 0.000134 Epoch: 0 Global Step: 11090 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:58:26,965-Speed 2492.14 samples/sec Loss 39.5319 LearningRate 0.000134 Epoch: 0 Global Step: 11100 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:58:35,139-Speed 2506.15 samples/sec Loss 39.5371 LearningRate 0.000134 Epoch: 0 Global Step: 11110 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:58:43,353-Speed 2493.62 samples/sec Loss 39.5263 LearningRate 0.000134 Epoch: 0 Global Step: 11120 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:58:51,573-Speed 2492.04 samples/sec Loss 39.5334 LearningRate 0.000134 Epoch: 0 Global Step: 11130 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:58:59,801-Speed 2489.24 samples/sec Loss 39.4914 LearningRate 0.000134 Epoch: 0 Global Step: 11140 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:59:08,021-Speed 2492.01 samples/sec Loss 39.5257 LearningRate 0.000134 Epoch: 0 Global Step: 11150 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:59:16,239-Speed 2492.30 samples/sec Loss 39.5348 LearningRate 0.000135 Epoch: 0 Global Step: 11160 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:59:24,404-Speed 2508.91 samples/sec Loss 39.5221 LearningRate 0.000135 Epoch: 0 Global Step: 11170 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:59:32,621-Speed 2492.54 samples/sec Loss 39.5300 LearningRate 0.000135 Epoch: 0 Global Step: 11180 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:59:40,848-Speed 2489.72 samples/sec Loss 39.4873 
LearningRate 0.000135 Epoch: 0 Global Step: 11190 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:59:49,066-Speed 2492.82 samples/sec Loss 39.5131 LearningRate 0.000135 Epoch: 0 Global Step: 11200 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 16:59:57,286-Speed 2491.87 samples/sec Loss 39.5092 LearningRate 0.000135 Epoch: 0 Global Step: 11210 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:00:05,506-Speed 2491.89 samples/sec Loss 39.5252 LearningRate 0.000135 Epoch: 0 Global Step: 11220 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:00:13,673-Speed 2508.05 samples/sec Loss 39.5180 LearningRate 0.000135 Epoch: 0 Global Step: 11230 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:00:21,897-Speed 2490.88 samples/sec Loss 39.5547 LearningRate 0.000135 Epoch: 0 Global Step: 11240 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:00:30,116-Speed 2492.20 samples/sec Loss 39.5010 LearningRate 0.000136 Epoch: 0 Global Step: 11250 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:00:38,334-Speed 2492.27 samples/sec Loss 39.5327 LearningRate 0.000136 Epoch: 0 Global Step: 11260 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:00:46,551-Speed 2492.91 samples/sec Loss 39.5142 LearningRate 0.000136 Epoch: 0 Global Step: 11270 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:00:54,768-Speed 2492.78 samples/sec Loss 39.5332 LearningRate 0.000136 Epoch: 0 Global Step: 11280 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:01:02,932-Speed 2508.89 samples/sec Loss 39.5024 LearningRate 0.000136 Epoch: 0 Global Step: 11290 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:01:11,152-Speed 2491.95 samples/sec Loss 39.5326 LearningRate 0.000136 Epoch: 0 Global Step: 11300 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:01:19,368-Speed 2493.16 samples/sec Loss 39.4833 LearningRate 
0.000136 Epoch: 0 Global Step: 11310 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:01:27,587-Speed 2492.41 samples/sec Loss 39.5027 LearningRate 0.000136 Epoch: 0 Global Step: 11320 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:01:35,804-Speed 2492.53 samples/sec Loss 39.4837 LearningRate 0.000137 Epoch: 0 Global Step: 11330 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:01:44,027-Speed 2491.17 samples/sec Loss 39.5175 LearningRate 0.000137 Epoch: 0 Global Step: 11340 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:01:52,192-Speed 2508.51 samples/sec Loss 39.5213 LearningRate 0.000137 Epoch: 0 Global Step: 11350 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:02:00,409-Speed 2492.88 samples/sec Loss 39.4984 LearningRate 0.000137 Epoch: 0 Global Step: 11360 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:02:08,627-Speed 2492.53 samples/sec Loss 39.5133 LearningRate 0.000137 Epoch: 0 Global Step: 11370 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:02:16,846-Speed 2492.10 samples/sec Loss 39.5051 LearningRate 0.000137 Epoch: 0 Global Step: 11380 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:02:25,064-Speed 2492.56 samples/sec Loss 39.5074 LearningRate 0.000137 Epoch: 0 Global Step: 11390 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:02:33,285-Speed 2491.53 samples/sec Loss 39.4755 LearningRate 0.000137 Epoch: 0 Global Step: 11400 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:02:41,453-Speed 2507.91 samples/sec Loss 39.5059 LearningRate 0.000138 Epoch: 0 Global Step: 11410 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:02:49,671-Speed 2492.30 samples/sec Loss 39.5247 LearningRate 0.000138 Epoch: 0 Global Step: 11420 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:02:57,896-Speed 2490.62 samples/sec Loss 39.5106 LearningRate 0.000138 Epoch: 0 
Global Step: 11430 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:03:06,126-Speed 2488.91 samples/sec Loss 39.4965 LearningRate 0.000138 Epoch: 0 Global Step: 11440 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:03:14,342-Speed 2493.16 samples/sec Loss 39.4728 LearningRate 0.000138 Epoch: 0 Global Step: 11450 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:03:22,562-Speed 2491.88 samples/sec Loss 39.4802 LearningRate 0.000138 Epoch: 0 Global Step: 11460 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:03:30,724-Speed 2509.33 samples/sec Loss 39.4786 LearningRate 0.000138 Epoch: 0 Global Step: 11470 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:03:38,946-Speed 2491.52 samples/sec Loss 39.4985 LearningRate 0.000138 Epoch: 0 Global Step: 11480 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:03:47,172-Speed 2489.92 samples/sec Loss 39.4626 LearningRate 0.000138 Epoch: 0 Global Step: 11490 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:03:55,388-Speed 2492.97 samples/sec Loss 39.4770 LearningRate 0.000139 Epoch: 0 Global Step: 11500 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:04:03,610-Speed 2491.55 samples/sec Loss 39.4643 LearningRate 0.000139 Epoch: 0 Global Step: 11510 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:04:11,827-Speed 2492.89 samples/sec Loss 39.4897 LearningRate 0.000139 Epoch: 0 Global Step: 11520 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:04:19,995-Speed 2507.82 samples/sec Loss 39.4936 LearningRate 0.000139 Epoch: 0 Global Step: 11530 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:04:28,212-Speed 2492.62 samples/sec Loss 39.4798 LearningRate 0.000139 Epoch: 0 Global Step: 11540 Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:04:36,429-Speed 2492.95 samples/sec Loss 39.4838 LearningRate 0.000139 Epoch: 0 Global Step: 11550 
Fp16 Grad Scale: 8192 Required: 187 hours Training: 2022-07-05 17:04:44,647-Speed 2492.63 samples/sec Loss 39.4934 LearningRate 0.000139 Epoch: 0 Global Step: 11560 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:04:52,867-Speed 2491.78 samples/sec Loss 39.4918 LearningRate 0.000139 Epoch: 0 Global Step: 11570 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:05:01,088-Speed 2491.52 samples/sec Loss 39.4862 LearningRate 0.000140 Epoch: 0 Global Step: 11580 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:05:09,255-Speed 2508.08 samples/sec Loss 39.4966 LearningRate 0.000140 Epoch: 0 Global Step: 11590 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:05:17,473-Speed 2492.15 samples/sec Loss 39.4805 LearningRate 0.000140 Epoch: 0 Global Step: 11600 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:05:25,693-Speed 2491.96 samples/sec Loss 39.4657 LearningRate 0.000140 Epoch: 0 Global Step: 11610 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:05:33,916-Speed 2491.19 samples/sec Loss 39.5087 LearningRate 0.000140 Epoch: 0 Global Step: 11620 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:05:42,135-Speed 2492.04 samples/sec Loss 39.4654 LearningRate 0.000140 Epoch: 0 Global Step: 11630 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:05:50,353-Speed 2492.73 samples/sec Loss 39.4725 LearningRate 0.000140 Epoch: 0 Global Step: 11640 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:05:58,520-Speed 2508.18 samples/sec Loss 39.4307 LearningRate 0.000140 Epoch: 0 Global Step: 11650 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:06:06,755-Speed 2487.68 samples/sec Loss 39.4378 LearningRate 0.000141 Epoch: 0 Global Step: 11660 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:06:14,976-Speed 2491.68 samples/sec Loss 39.4801 LearningRate 0.000141 Epoch: 0 Global Step: 11670 Fp16 
Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:06:23,194-Speed 2492.51 samples/sec Loss 39.4812 LearningRate 0.000141 Epoch: 0 Global Step: 11680 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:06:31,410-Speed 2493.15 samples/sec Loss 39.4578 LearningRate 0.000141 Epoch: 0 Global Step: 11690 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:06:39,628-Speed 2492.48 samples/sec Loss 39.4518 LearningRate 0.000141 Epoch: 0 Global Step: 11700 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:06:47,793-Speed 2508.65 samples/sec Loss 39.4505 LearningRate 0.000141 Epoch: 0 Global Step: 11710 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:06:56,014-Speed 2491.64 samples/sec Loss 39.4625 LearningRate 0.000141 Epoch: 0 Global Step: 11720 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:07:04,246-Speed 2488.30 samples/sec Loss 39.4358 LearningRate 0.000141 Epoch: 0 Global Step: 11730 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:07:12,464-Speed 2492.32 samples/sec Loss 39.4093 LearningRate 0.000142 Epoch: 0 Global Step: 11740 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:07:20,681-Speed 2493.02 samples/sec Loss 39.4238 LearningRate 0.000142 Epoch: 0 Global Step: 11750 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:07:28,901-Speed 2491.79 samples/sec Loss 39.4296 LearningRate 0.000142 Epoch: 0 Global Step: 11760 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:07:37,067-Speed 2508.06 samples/sec Loss 39.4316 LearningRate 0.000142 Epoch: 0 Global Step: 11770 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:07:45,298-Speed 2488.65 samples/sec Loss 39.4463 LearningRate 0.000142 Epoch: 0 Global Step: 11780 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:07:53,529-Speed 2488.93 samples/sec Loss 39.4259 LearningRate 0.000142 Epoch: 0 Global Step: 11790 Fp16 Grad 
Scale: 16384 Required: 187 hours Training: 2022-07-05 17:08:01,747-Speed 2492.30 samples/sec Loss 39.4478 LearningRate 0.000142 Epoch: 0 Global Step: 11800 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:08:09,973-Speed 2489.93 samples/sec Loss 39.4541 LearningRate 0.000142 Epoch: 0 Global Step: 11810 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:08:18,193-Speed 2491.92 samples/sec Loss 39.4437 LearningRate 0.000142 Epoch: 0 Global Step: 11820 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:08:26,354-Speed 2509.91 samples/sec Loss 39.4303 LearningRate 0.000143 Epoch: 0 Global Step: 11830 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:08:34,577-Speed 2490.75 samples/sec Loss 39.4414 LearningRate 0.000143 Epoch: 0 Global Step: 11840 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:08:42,805-Speed 2489.38 samples/sec Loss 39.4339 LearningRate 0.000143 Epoch: 0 Global Step: 11850 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:08:51,025-Speed 2491.98 samples/sec Loss 39.3951 LearningRate 0.000143 Epoch: 0 Global Step: 11860 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:08:59,246-Speed 2491.44 samples/sec Loss 39.4257 LearningRate 0.000143 Epoch: 0 Global Step: 11870 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:09:07,466-Speed 2491.55 samples/sec Loss 39.4283 LearningRate 0.000143 Epoch: 0 Global Step: 11880 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:09:15,631-Speed 2508.56 samples/sec Loss 39.4282 LearningRate 0.000143 Epoch: 0 Global Step: 11890 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:09:23,851-Speed 2492.04 samples/sec Loss 39.4488 LearningRate 0.000143 Epoch: 0 Global Step: 11900 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:09:32,069-Speed 2492.03 samples/sec Loss 39.4026 LearningRate 0.000144 Epoch: 0 Global Step: 11910 Fp16 Grad Scale: 
16384 Required: 187 hours Training: 2022-07-05 17:09:40,302-Speed 2488.15 samples/sec Loss 39.4436 LearningRate 0.000144 Epoch: 0 Global Step: 11920 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:09:48,520-Speed 2492.41 samples/sec Loss 39.4597 LearningRate 0.000144 Epoch: 0 Global Step: 11930 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:09:56,740-Speed 2491.73 samples/sec Loss 39.6087 LearningRate 0.000144 Epoch: 0 Global Step: 11940 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:10:04,905-Speed 2508.43 samples/sec Loss 39.6269 LearningRate 0.000144 Epoch: 0 Global Step: 11950 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:10:13,123-Speed 2492.43 samples/sec Loss 39.4938 LearningRate 0.000144 Epoch: 0 Global Step: 11960 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:10:21,343-Speed 2491.94 samples/sec Loss 39.4749 LearningRate 0.000144 Epoch: 0 Global Step: 11970 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:10:29,560-Speed 2492.41 samples/sec Loss 39.5016 LearningRate 0.000144 Epoch: 0 Global Step: 11980 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:10:37,783-Speed 2490.89 samples/sec Loss 39.4713 LearningRate 0.000145 Epoch: 0 Global Step: 11990 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:10:46,003-Speed 2491.62 samples/sec Loss 39.4377 LearningRate 0.000145 Epoch: 0 Global Step: 12000 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:10:54,169-Speed 2508.43 samples/sec Loss 39.4980 LearningRate 0.000145 Epoch: 0 Global Step: 12010 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:11:02,387-Speed 2492.26 samples/sec Loss 39.4919 LearningRate 0.000145 Epoch: 0 Global Step: 12020 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:11:10,622-Speed 2487.34 samples/sec Loss 39.4437 LearningRate 0.000145 Epoch: 0 Global Step: 12030 Fp16 Grad Scale: 16384 
Required: 187 hours Training: 2022-07-05 17:11:18,841-Speed 2492.09 samples/sec Loss 39.4147 LearningRate 0.000145 Epoch: 0 Global Step: 12040 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:11:27,061-Speed 2491.82 samples/sec Loss 39.3963 LearningRate 0.000145 Epoch: 0 Global Step: 12050 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:11:35,279-Speed 2492.78 samples/sec Loss 39.3926 LearningRate 0.000145 Epoch: 0 Global Step: 12060 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:11:43,447-Speed 2507.63 samples/sec Loss 39.4366 LearningRate 0.000145 Epoch: 0 Global Step: 12070 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:11:51,669-Speed 2491.20 samples/sec Loss 39.4079 LearningRate 0.000146 Epoch: 0 Global Step: 12080 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:11:59,885-Speed 2492.97 samples/sec Loss 39.4424 LearningRate 0.000146 Epoch: 0 Global Step: 12090 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:12:08,114-Speed 2488.99 samples/sec Loss 39.3850 LearningRate 0.000146 Epoch: 0 Global Step: 12100 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:12:16,334-Speed 2491.93 samples/sec Loss 39.3844 LearningRate 0.000146 Epoch: 0 Global Step: 12110 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:12:24,552-Speed 2492.22 samples/sec Loss 39.3960 LearningRate 0.000146 Epoch: 0 Global Step: 12120 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:12:32,722-Speed 2507.20 samples/sec Loss 39.3964 LearningRate 0.000146 Epoch: 0 Global Step: 12130 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:12:40,943-Speed 2491.53 samples/sec Loss 39.3975 LearningRate 0.000146 Epoch: 0 Global Step: 12140 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:12:49,161-Speed 2492.61 samples/sec Loss 39.4037 LearningRate 0.000146 Epoch: 0 Global Step: 12150 Fp16 Grad Scale: 16384 
Required: 187 hours Training: 2022-07-05 17:12:57,381-Speed 2491.73 samples/sec Loss 39.3734 LearningRate 0.000147 Epoch: 0 Global Step: 12160 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:13:05,597-Speed 2493.13 samples/sec Loss 39.3988 LearningRate 0.000147 Epoch: 0 Global Step: 12170 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:13:13,816-Speed 2492.28 samples/sec Loss 39.3601 LearningRate 0.000147 Epoch: 0 Global Step: 12180 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:13:21,981-Speed 2508.55 samples/sec Loss 39.3076 LearningRate 0.000147 Epoch: 0 Global Step: 12190 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:13:30,198-Speed 2492.82 samples/sec Loss 39.3750 LearningRate 0.000147 Epoch: 0 Global Step: 12200 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:13:38,418-Speed 2491.67 samples/sec Loss 39.3644 LearningRate 0.000147 Epoch: 0 Global Step: 12210 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:13:46,636-Speed 2492.41 samples/sec Loss 39.3660 LearningRate 0.000147 Epoch: 0 Global Step: 12220 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:13:54,859-Speed 2491.34 samples/sec Loss 39.3566 LearningRate 0.000147 Epoch: 0 Global Step: 12230 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:14:03,076-Speed 2492.88 samples/sec Loss 39.3701 LearningRate 0.000148 Epoch: 0 Global Step: 12240 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:14:11,241-Speed 2508.87 samples/sec Loss 39.3654 LearningRate 0.000148 Epoch: 0 Global Step: 12250 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:14:19,459-Speed 2492.57 samples/sec Loss 39.3603 LearningRate 0.000148 Epoch: 0 Global Step: 12260 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:14:27,677-Speed 2492.33 samples/sec Loss 39.3423 LearningRate 0.000148 Epoch: 0 Global Step: 12270 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:14:35,895-Speed 2492.66 samples/sec Loss 39.2958 LearningRate 0.000148 Epoch: 0 Global Step: 12280 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:14:44,112-Speed 2492.50 samples/sec Loss 39.3107 LearningRate 0.000148 Epoch: 0 Global Step: 12290 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:14:52,328-Speed 2493.19 samples/sec Loss 39.3699 LearningRate 0.000148 Epoch: 0 Global Step: 12300 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:15:00,497-Speed 2507.68 samples/sec Loss 39.3490 LearningRate 0.000148 Epoch: 0 Global Step: 12310 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:15:08,718-Speed 2491.51 samples/sec Loss 39.3315 LearningRate 0.000149 Epoch: 0 Global Step: 12320 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:15:16,938-Speed 2491.74 samples/sec Loss 39.3196 LearningRate 0.000149 Epoch: 0 Global Step: 12330 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:15:25,160-Speed 2491.65 samples/sec Loss 39.2903 LearningRate 0.000149 Epoch: 0 Global Step: 12340 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:15:33,378-Speed 2492.62 samples/sec Loss 39.3054 LearningRate 0.000149 Epoch: 0 Global Step: 12350 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:15:41,597-Speed 2492.22 samples/sec Loss 39.2895 LearningRate 0.000149 Epoch: 0 Global Step: 12360 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:15:49,762-Speed 2508.46 samples/sec Loss 39.2828 LearningRate 0.000149 Epoch: 0 Global Step: 12370 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:15:57,993-Speed 2488.70 samples/sec Loss 39.2762 LearningRate 0.000149 Epoch: 0 Global Step: 12380 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:16:06,213-Speed 2491.65 samples/sec Loss 39.3178 LearningRate 0.000149 Epoch: 0 Global Step: 12390 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:16:14,437-Speed 2490.75 samples/sec Loss 39.2847 LearningRate 0.000149 Epoch: 0 Global Step: 12400 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:16:22,660-Speed 2491.21 samples/sec Loss 39.2960 LearningRate 0.000150 Epoch: 0 Global Step: 12410 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:16:30,886-Speed 2490.15 samples/sec Loss 39.2771 LearningRate 0.000150 Epoch: 0 Global Step: 12420 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:16:39,064-Speed 2504.61 samples/sec Loss 39.3052 LearningRate 0.000150 Epoch: 0 Global Step: 12430 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:16:47,283-Speed 2492.32 samples/sec Loss 39.3043 LearningRate 0.000150 Epoch: 0 Global Step: 12440 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:16:55,527-Speed 2484.34 samples/sec Loss 39.3069 LearningRate 0.000150 Epoch: 0 Global Step: 12450 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:17:03,750-Speed 2491.18 samples/sec Loss 39.3198 LearningRate 0.000150 Epoch: 0 Global Step: 12460 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:17:11,971-Speed 2491.60 samples/sec Loss 39.2802 LearningRate 0.000150 Epoch: 0 Global Step: 12470 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:17:20,220-Speed 2482.73 samples/sec Loss 39.2895 LearningRate 0.000150 Epoch: 0 Global Step: 12480 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:17:28,389-Speed 2507.73 samples/sec Loss 39.2722 LearningRate 0.000151 Epoch: 0 Global Step: 12490 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:17:36,614-Speed 2490.48 samples/sec Loss 39.3025 LearningRate 0.000151 Epoch: 0 Global Step: 12500 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:17:44,833-Speed 2492.16 samples/sec Loss 39.2685 LearningRate 0.000151 Epoch: 0 Global Step: 12510 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:17:53,049-Speed 2493.20 samples/sec Loss 39.2345 LearningRate 0.000151 Epoch: 0 Global Step: 12520 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:18:01,269-Speed 2491.82 samples/sec Loss 39.2801 LearningRate 0.000151 Epoch: 0 Global Step: 12530 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:18:09,486-Speed 2492.77 samples/sec Loss 39.2721 LearningRate 0.000151 Epoch: 0 Global Step: 12540 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:18:17,661-Speed 2505.48 samples/sec Loss 39.2389 LearningRate 0.000151 Epoch: 0 Global Step: 12550 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:18:25,878-Speed 2492.91 samples/sec Loss 39.2620 LearningRate 0.000151 Epoch: 0 Global Step: 12560 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:18:34,093-Speed 2493.30 samples/sec Loss 39.2565 LearningRate 0.000152 Epoch: 0 Global Step: 12570 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:18:42,315-Speed 2491.60 samples/sec Loss 39.2238 LearningRate 0.000152 Epoch: 0 Global Step: 12580 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:18:50,530-Speed 2493.05 samples/sec Loss 39.2234 LearningRate 0.000152 Epoch: 0 Global Step: 12590 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:18:58,748-Speed 2492.57 samples/sec Loss 39.2335 LearningRate 0.000152 Epoch: 0 Global Step: 12600 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:19:06,920-Speed 2506.67 samples/sec Loss 39.2284 LearningRate 0.000152 Epoch: 0 Global Step: 12610 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:19:15,138-Speed 2492.47 samples/sec Loss 39.2500 LearningRate 0.000152 Epoch: 0 Global Step: 12620 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:19:23,355-Speed 2492.79 samples/sec Loss 39.2270 LearningRate 0.000152 Epoch: 0 Global Step: 12630 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:19:31,587-Speed 2488.21 samples/sec Loss 39.2025 LearningRate 0.000152 Epoch: 0 Global Step: 12640 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:19:39,806-Speed 2492.25 samples/sec Loss 39.2179 LearningRate 0.000152 Epoch: 0 Global Step: 12650 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:19:48,025-Speed 2492.14 samples/sec Loss 39.2121 LearningRate 0.000153 Epoch: 0 Global Step: 12660 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:19:56,190-Speed 2508.39 samples/sec Loss 39.1981 LearningRate 0.000153 Epoch: 0 Global Step: 12670 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:20:04,407-Speed 2492.88 samples/sec Loss 39.2232 LearningRate 0.000153 Epoch: 0 Global Step: 12680 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:20:12,627-Speed 2491.75 samples/sec Loss 39.2398 LearningRate 0.000153 Epoch: 0 Global Step: 12690 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:20:20,846-Speed 2492.07 samples/sec Loss 39.2313 LearningRate 0.000153 Epoch: 0 Global Step: 12700 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:20:29,065-Speed 2492.49 samples/sec Loss 39.2305 LearningRate 0.000153 Epoch: 0 Global Step: 12710 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:20:37,285-Speed 2491.73 samples/sec Loss 39.1804 LearningRate 0.000153 Epoch: 0 Global Step: 12720 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:20:45,449-Speed 2508.82 samples/sec Loss 39.1813 LearningRate 0.000153 Epoch: 0 Global Step: 12730 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:20:53,677-Speed 2489.72 samples/sec Loss 39.1869 LearningRate 0.000154 Epoch: 0 Global Step: 12740 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:21:01,899-Speed 2491.11 samples/sec Loss 39.1829 LearningRate 0.000154 Epoch: 0 Global Step: 12750 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:21:10,122-Speed 2491.27 samples/sec Loss 39.1801 LearningRate 0.000154 Epoch: 0 Global Step: 12760 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:21:18,344-Speed 2491.09 samples/sec Loss 39.1693 LearningRate 0.000154 Epoch: 0 Global Step: 12770 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:21:26,562-Speed 2492.90 samples/sec Loss 39.1903 LearningRate 0.000154 Epoch: 0 Global Step: 12780 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:21:34,730-Speed 2507.69 samples/sec Loss 39.1768 LearningRate 0.000154 Epoch: 0 Global Step: 12790 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:21:42,952-Speed 2491.27 samples/sec Loss 39.1339 LearningRate 0.000154 Epoch: 0 Global Step: 12800 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:21:51,170-Speed 2492.76 samples/sec Loss 39.1687 LearningRate 0.000154 Epoch: 0 Global Step: 12810 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:21:59,388-Speed 2492.29 samples/sec Loss 39.1829 LearningRate 0.000155 Epoch: 0 Global Step: 12820 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:22:07,607-Speed 2492.25 samples/sec Loss 39.1879 LearningRate 0.000155 Epoch: 0 Global Step: 12830 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:22:15,838-Speed 2488.47 samples/sec Loss 39.1251 LearningRate 0.000155 Epoch: 0 Global Step: 12840 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:22:24,002-Speed 2509.06 samples/sec Loss 39.1255 LearningRate 0.000155 Epoch: 0 Global Step: 12850 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:22:32,221-Speed 2492.04 samples/sec Loss 39.1395 LearningRate 0.000155 Epoch: 0 Global Step: 12860 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:22:40,444-Speed 2491.04 samples/sec Loss 39.1643 LearningRate 0.000155 Epoch: 0 Global Step: 12870 Fp16 Grad Scale: 32768 
Required: 186 hours Training: 2022-07-05 17:22:48,665-Speed 2491.61 samples/sec Loss 39.1129 LearningRate 0.000155 Epoch: 0 Global Step: 12880 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:22:56,885-Speed 2491.69 samples/sec Loss 39.1818 LearningRate 0.000155 Epoch: 0 Global Step: 12890 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:23:05,073-Speed 2501.68 samples/sec Loss 39.1443 LearningRate 0.000155 Epoch: 0 Global Step: 12900 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:23:13,239-Speed 2508.49 samples/sec Loss 39.1311 LearningRate 0.000156 Epoch: 0 Global Step: 12910 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:23:21,472-Speed 2487.99 samples/sec Loss 39.1208 LearningRate 0.000156 Epoch: 0 Global Step: 12920 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:23:29,698-Speed 2490.02 samples/sec Loss 39.1257 LearningRate 0.000156 Epoch: 0 Global Step: 12930 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:23:37,919-Speed 2491.97 samples/sec Loss 39.1662 LearningRate 0.000156 Epoch: 0 Global Step: 12940 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:23:46,136-Speed 2492.89 samples/sec Loss 39.1245 LearningRate 0.000156 Epoch: 0 Global Step: 12950 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:23:54,358-Speed 2490.90 samples/sec Loss 39.1342 LearningRate 0.000156 Epoch: 0 Global Step: 12960 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:24:02,522-Speed 2509.13 samples/sec Loss 39.1276 LearningRate 0.000156 Epoch: 0 Global Step: 12970 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:24:10,763-Speed 2485.65 samples/sec Loss 39.1767 LearningRate 0.000156 Epoch: 0 Global Step: 12980 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:24:18,979-Speed 2492.92 samples/sec Loss 39.1362 LearningRate 0.000157 Epoch: 0 Global Step: 12990 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:24:27,197-Speed 2492.36 samples/sec Loss 39.1082 LearningRate 0.000157 Epoch: 0 Global Step: 13000 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:24:35,413-Speed 2493.40 samples/sec Loss 39.1173 LearningRate 0.000157 Epoch: 0 Global Step: 13010 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:24:43,631-Speed 2492.70 samples/sec Loss 39.1105 LearningRate 0.000157 Epoch: 0 Global Step: 13020 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:24:51,810-Speed 2504.57 samples/sec Loss 39.0900 LearningRate 0.000157 Epoch: 0 Global Step: 13030 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:25:00,029-Speed 2492.36 samples/sec Loss 39.0921 LearningRate 0.000157 Epoch: 0 Global Step: 13040 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:25:08,264-Speed 2487.39 samples/sec Loss 39.0602 LearningRate 0.000157 Epoch: 0 Global Step: 13050 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:25:16,502-Speed 2486.44 samples/sec Loss 39.1072 LearningRate 0.000157 Epoch: 0 Global Step: 13060 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:25:24,733-Speed 2488.45 samples/sec Loss 39.0759 LearningRate 0.000158 Epoch: 0 Global Step: 13070 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:25:32,955-Speed 2491.13 samples/sec Loss 39.1109 LearningRate 0.000158 Epoch: 0 Global Step: 13080 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:25:41,122-Speed 2508.03 samples/sec Loss 39.0689 LearningRate 0.000158 Epoch: 0 Global Step: 13090 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:25:49,342-Speed 2492.07 samples/sec Loss 39.0552 LearningRate 0.000158 Epoch: 0 Global Step: 13100 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:25:57,559-Speed 2492.49 samples/sec Loss 39.0640 LearningRate 0.000158 Epoch: 0 Global Step: 13110 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:26:05,775-Speed 2492.95 samples/sec Loss 39.0435 LearningRate 0.000158 Epoch: 0 Global Step: 13120 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:26:13,994-Speed 2492.27 samples/sec Loss 39.0200 LearningRate 0.000158 Epoch: 0 Global Step: 13130 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:26:22,213-Speed 2492.26 samples/sec Loss 39.0910 LearningRate 0.000158 Epoch: 0 Global Step: 13140 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:26:30,378-Speed 2508.62 samples/sec Loss 39.1077 LearningRate 0.000159 Epoch: 0 Global Step: 13150 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:26:38,596-Speed 2493.05 samples/sec Loss 39.0839 LearningRate 0.000159 Epoch: 0 Global Step: 13160 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:26:46,814-Speed 2492.84 samples/sec Loss 39.0527 LearningRate 0.000159 Epoch: 0 Global Step: 13170 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:26:55,042-Speed 2489.91 samples/sec Loss 39.0391 LearningRate 0.000159 Epoch: 0 Global Step: 13180 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:27:03,262-Speed 2491.95 samples/sec Loss 39.0695 LearningRate 0.000159 Epoch: 0 Global Step: 13190 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:27:11,478-Speed 2493.11 samples/sec Loss 39.0853 LearningRate 0.000159 Epoch: 0 Global Step: 13200 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:27:19,643-Speed 2508.74 samples/sec Loss 39.0350 LearningRate 0.000159 Epoch: 0 Global Step: 13210 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:27:27,860-Speed 2492.86 samples/sec Loss 39.0276 LearningRate 0.000159 Epoch: 0 Global Step: 13220 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:27:36,077-Speed 2492.85 samples/sec Loss 39.0619 LearningRate 0.000159 Epoch: 0 Global Step: 13230 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:27:44,294-Speed 2492.93 samples/sec Loss 39.0288 LearningRate 0.000160 Epoch: 0 Global Step: 13240 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:27:52,515-Speed 2491.56 samples/sec Loss 39.0219 LearningRate 0.000160 Epoch: 0 Global Step: 13250 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:28:00,735-Speed 2491.76 samples/sec Loss 39.0750 LearningRate 0.000160 Epoch: 0 Global Step: 13260 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:28:08,898-Speed 2509.39 samples/sec Loss 39.0113 LearningRate 0.000160 Epoch: 0 Global Step: 13270 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:28:17,115-Speed 2492.77 samples/sec Loss 39.0014 LearningRate 0.000160 Epoch: 0 Global Step: 13280 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:28:25,330-Speed 2493.43 samples/sec Loss 39.0053 LearningRate 0.000160 Epoch: 0 Global Step: 13290 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:28:33,549-Speed 2492.18 samples/sec Loss 38.9799 LearningRate 0.000160 Epoch: 0 Global Step: 13300 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:28:41,765-Speed 2493.24 samples/sec Loss 38.9905 LearningRate 0.000160 Epoch: 0 Global Step: 13310 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:28:49,985-Speed 2491.85 samples/sec Loss 38.9618 LearningRate 0.000161 Epoch: 0 Global Step: 13320 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:28:58,147-Speed 2509.66 samples/sec Loss 38.9900 LearningRate 0.000161 Epoch: 0 Global Step: 13330 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:29:06,367-Speed 2491.98 samples/sec Loss 39.0137 LearningRate 0.000161 Epoch: 0 Global Step: 13340 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:29:14,581-Speed 2493.88 samples/sec Loss 38.9820 LearningRate 0.000161 Epoch: 0 Global Step: 13350 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:29:22,798-Speed 2492.81 samples/sec Loss 38.9346 LearningRate 0.000161 Epoch: 0 Global Step: 13360 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:29:31,028-Speed 2488.75 samples/sec Loss 38.9652 LearningRate 0.000161 Epoch: 0 Global Step: 13370 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:29:39,247-Speed 2492.36 samples/sec Loss 39.0298 LearningRate 0.000161 Epoch: 0 Global Step: 13380 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:29:47,409-Speed 2509.39 samples/sec Loss 39.0472 LearningRate 0.000161 Epoch: 0 Global Step: 13390 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:29:55,628-Speed 2492.47 samples/sec Loss 38.9642 LearningRate 0.000162 Epoch: 0 Global Step: 13400 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:30:03,848-Speed 2491.76 samples/sec Loss 38.9958 LearningRate 0.000162 Epoch: 0 Global Step: 13410 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:30:12,065-Speed 2492.90 samples/sec Loss 38.9644 LearningRate 0.000162 Epoch: 0 Global Step: 13420 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:30:20,280-Speed 2493.30 samples/sec Loss 38.9386 LearningRate 0.000162 Epoch: 0 Global Step: 13430 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:30:28,498-Speed 2492.43 samples/sec Loss 39.0116 LearningRate 0.000162 Epoch: 0 Global Step: 13440 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:30:36,681-Speed 2503.01 samples/sec Loss 39.0114 LearningRate 0.000162 Epoch: 0 Global Step: 13450 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:30:44,897-Speed 2493.13 samples/sec Loss 38.9625 LearningRate 0.000162 Epoch: 0 Global Step: 13460 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:30:53,113-Speed 2493.03 samples/sec Loss 38.9719 LearningRate 0.000162 Epoch: 0 Global Step: 13470 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:31:01,329-Speed 2493.14 samples/sec Loss 38.9271 LearningRate 0.000162 Epoch: 0 Global Step: 13480 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:31:09,546-Speed 2492.98 samples/sec Loss 38.9555 LearningRate 0.000163 Epoch: 0 Global Step: 13490 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:31:17,763-Speed 2492.44 samples/sec Loss 38.9703 LearningRate 0.000163 Epoch: 0 Global Step: 13500 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:31:25,927-Speed 2509.15 samples/sec Loss 38.9319 LearningRate 0.000163 Epoch: 0 Global Step: 13510 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:31:34,144-Speed 2493.09 samples/sec Loss 38.9307 LearningRate 0.000163 Epoch: 0 Global Step: 13520 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:31:42,362-Speed 2492.33 samples/sec Loss 38.9327 LearningRate 0.000163 Epoch: 0 Global Step: 13530 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:31:50,580-Speed 2492.49 samples/sec Loss 38.9220 LearningRate 0.000163 Epoch: 0 Global Step: 13540 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:31:58,817-Speed 2486.91 samples/sec Loss 38.9232 LearningRate 0.000163 Epoch: 0 Global Step: 13550 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:32:07,036-Speed 2492.14 samples/sec Loss 38.9130 LearningRate 0.000163 Epoch: 0 Global Step: 13560 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:32:15,199-Speed 2509.12 samples/sec Loss 38.9210 LearningRate 0.000164 Epoch: 0 Global Step: 13570 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:32:23,418-Speed 2492.09 samples/sec Loss 38.9156 LearningRate 0.000164 Epoch: 0 Global Step: 13580 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:32:31,641-Speed 2491.15 samples/sec Loss 38.8725 LearningRate 0.000164 Epoch: 0 Global Step: 13590 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:32:39,866-Speed 2490.51 samples/sec Loss 38.8991 LearningRate 0.000164 Epoch: 0 Global Step: 13600 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:32:48,082-Speed 2493.14 samples/sec Loss 38.8817 LearningRate 0.000164 Epoch: 0 Global Step: 13610 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:32:56,299-Speed 2492.68 samples/sec Loss 38.8587 LearningRate 0.000164 Epoch: 0 Global Step: 13620 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:33:04,464-Speed 2508.72 samples/sec Loss 38.8596 LearningRate 0.000164 Epoch: 0 Global Step: 13630 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:33:12,686-Speed 2491.41 samples/sec Loss 38.8992 LearningRate 0.000164 Epoch: 0 Global Step: 13640 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:33:20,899-Speed 2494.03 samples/sec Loss 38.8611 LearningRate 0.000165 Epoch: 0 Global Step: 13650 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:33:29,118-Speed 2492.51 samples/sec Loss 38.8371 LearningRate 0.000165 Epoch: 0 Global Step: 13660 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:33:37,335-Speed 2492.56 samples/sec Loss 38.8494 LearningRate 0.000165 Epoch: 0 Global Step: 13670 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:33:45,552-Speed 2492.85 samples/sec Loss 38.8790 LearningRate 0.000165 Epoch: 0 Global Step: 13680 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:33:53,717-Speed 2508.72 samples/sec Loss 38.8594 LearningRate 0.000165 Epoch: 0 Global Step: 13690 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:34:01,941-Speed 2490.87 samples/sec Loss 38.8463 LearningRate 0.000165 Epoch: 0 Global Step: 13700 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:34:10,157-Speed 2493.05 samples/sec Loss 38.8423 LearningRate 0.000165 Epoch: 0 Global Step: 13710 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:34:18,376-Speed 2492.21 samples/sec Loss 38.8865 LearningRate 0.000165 Epoch: 0 Global Step: 13720 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:34:26,592-Speed 2493.00 samples/sec Loss 38.8665 LearningRate 0.000165 Epoch: 0 Global Step: 13730 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:34:34,814-Speed 2491.32 samples/sec Loss 38.8262 LearningRate 0.000166 Epoch: 0 Global Step: 13740 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:34:42,981-Speed 2507.91 samples/sec Loss 38.8014 LearningRate 0.000166 Epoch: 0 Global Step: 13750 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:34:51,197-Speed 2493.32 samples/sec Loss 38.8388 LearningRate 0.000166 Epoch: 0 Global Step: 13760 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:34:59,412-Speed 2493.14 samples/sec Loss 38.8259 LearningRate 0.000166 Epoch: 0 Global Step: 13770 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:35:07,628-Speed 2493.31 samples/sec Loss 38.8731 LearningRate 0.000166 Epoch: 0 Global Step: 13780 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:35:15,846-Speed 2492.46 samples/sec Loss 38.8323 LearningRate 0.000166 Epoch: 0 Global Step: 13790 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:35:24,063-Speed 2492.82 samples/sec Loss 38.8150 LearningRate 0.000166 Epoch: 0 Global Step: 13800 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:35:32,232-Speed 2507.53 samples/sec Loss 38.8518 LearningRate 0.000166 Epoch: 0 Global Step: 13810 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:35:40,451-Speed 2492.09 samples/sec Loss 38.8652 LearningRate 0.000167 Epoch: 0 Global Step: 13820 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:35:48,669-Speed 2492.64 samples/sec Loss 38.8450 LearningRate 0.000167 Epoch: 0 Global Step: 13830 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:35:56,886-Speed 2492.70 samples/sec Loss 38.8200 LearningRate 0.000167 Epoch: 0 Global Step: 13840 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:36:05,105-Speed 2492.23 samples/sec Loss 38.8069 LearningRate 0.000167 Epoch: 0 Global Step: 13850 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:36:13,319-Speed 2493.80 samples/sec Loss 38.8145 LearningRate 0.000167 Epoch: 0 Global Step: 13860 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:36:21,499-Speed 2504.13 samples/sec Loss 38.9397 LearningRate 0.000167 Epoch: 0 Global Step: 13870 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:36:29,719-Speed 2491.86 samples/sec Loss 38.9778 LearningRate 0.000167 Epoch: 0 Global Step: 13880 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:36:37,938-Speed 2492.27 samples/sec Loss 38.9664 LearningRate 0.000167 Epoch: 0 Global Step: 13890 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:36:46,158-Speed 2492.21 samples/sec Loss 38.8684 LearningRate 0.000168 Epoch: 0 Global Step: 13900 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:36:54,390-Speed 2488.42 samples/sec Loss 38.8696 LearningRate 0.000168 Epoch: 0 Global Step: 13910 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:37:02,607-Speed 2492.53 samples/sec Loss 38.8419 LearningRate 0.000168 Epoch: 0 Global Step: 13920 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:37:10,774-Speed 2508.35 samples/sec Loss 38.7738 LearningRate 0.000168 Epoch: 0 Global Step: 13930 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:37:18,995-Speed 2491.78 samples/sec Loss 38.8108 LearningRate 0.000168 Epoch: 0 Global Step: 13940 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:37:27,216-Speed 2491.36 samples/sec Loss 38.7683 LearningRate 0.000168 Epoch: 0 Global Step: 13950 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:37:35,434-Speed 2492.71 samples/sec Loss 38.7910 LearningRate 0.000168 Epoch: 0 Global Step: 13960 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:37:43,657-Speed 2490.95 samples/sec Loss 38.7610 LearningRate 0.000168 Epoch: 0 Global Step: 13970 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:37:51,881-Speed 2490.46 samples/sec Loss 38.7788 LearningRate 0.000169 Epoch: 0 Global Step: 13980 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:38:00,094-Speed 2509.54 samples/sec Loss 38.7738 LearningRate 0.000169 Epoch: 0 Global Step: 13990 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:38:08,330-Speed 2494.78 samples/sec Loss 38.7540 LearningRate 0.000169 Epoch: 0 Global Step: 14000 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:38:16,575-Speed 2494.30 samples/sec Loss 38.7847 LearningRate 0.000169 Epoch: 0 Global Step: 14010 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:38:24,794-Speed 2491.92 samples/sec Loss 38.7651 LearningRate 0.000169 Epoch: 0 Global Step: 14020 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:38:33,018-Speed 2490.71 samples/sec Loss 38.7353 LearningRate 0.000169 Epoch: 0 Global Step: 14030 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:38:41,269-Speed 2495.18 samples/sec Loss 38.7409 LearningRate 0.000169 Epoch: 0 Global Step: 14040 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:38:49,512-Speed 2510.12 samples/sec Loss 38.7566 LearningRate 0.000169 Epoch: 0 Global Step: 14050 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:38:57,729-Speed 2492.66 samples/sec Loss 38.7550 LearningRate 0.000169 Epoch: 0 Global Step: 14060 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:39:05,989-Speed 2490.21 samples/sec Loss 38.7336 LearningRate 0.000170 Epoch: 0 Global Step: 14070 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:39:14,432-Speed 2493.91 samples/sec Loss 38.6811 LearningRate 0.000170 Epoch: 0 Global Step: 14080 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:39:22,648-Speed 2493.04 samples/sec Loss 38.7208 LearningRate 0.000170 Epoch: 0 Global Step: 14090 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:39:30,908-Speed 2495.21 samples/sec Loss 38.7201 LearningRate 0.000170 Epoch: 0 Global Step: 14100 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:39:39,109-Speed 2510.64 samples/sec Loss 38.7262 LearningRate 0.000170 Epoch: 0 Global Step: 14110 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:39:47,434-Speed 2493.79 samples/sec Loss 38.6632 LearningRate 0.000170 Epoch: 0 Global Step: 14120 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:39:55,647-Speed 2493.80 samples/sec Loss 38.7156 LearningRate 0.000170 Epoch: 0 Global Step: 14130 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:40:03,892-Speed 2494.83 samples/sec Loss 38.6807 LearningRate 0.000170 Epoch: 0 Global Step: 14140 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:40:12,061-Speed 2507.39 samples/sec Loss 38.6831 LearningRate 0.000171 Epoch: 0 Global Step: 14150 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:40:20,283-Speed 2492.11 samples/sec Loss 38.7228 LearningRate 0.000171 Epoch: 0 Global Step: 14160 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:40:28,500-Speed 2510.61 samples/sec Loss 38.7199 LearningRate 0.000171 Epoch: 0 Global Step: 14170 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:40:36,718-Speed 2492.48 samples/sec Loss 38.7083 LearningRate 0.000171 Epoch: 0 Global Step: 14180 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:40:44,936-Speed 2492.59 samples/sec Loss 38.6790 LearningRate 0.000171 Epoch: 0 Global Step: 14190 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:40:53,195-Speed 2494.01 samples/sec Loss 38.6680 LearningRate 0.000171 Epoch: 0 Global Step: 14200 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:41:01,437-Speed 2493.84 samples/sec Loss 38.7025 LearningRate 0.000171 Epoch: 0 Global Step: 14210 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:41:09,657-Speed 2491.72 samples/sec Loss 38.6340 LearningRate 0.000171 Epoch: 0 Global Step: 14220 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:41:17,819-Speed 2509.96 samples/sec Loss 38.6739 LearningRate 0.000172 Epoch: 0 Global Step: 14230 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:41:26,606-Speed 2497.07 samples/sec Loss 38.6914 LearningRate 0.000172 Epoch: 0 Global Step: 14240 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:41:34,821-Speed 2493.49 samples/sec Loss 38.6644 LearningRate 0.000172 Epoch: 0 Global Step: 14250 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:41:43,069-Speed 2495.92 samples/sec Loss 38.6703 LearningRate 0.000172 Epoch: 0 Global Step: 14260 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:42:23,718-Speed 504.56 samples/sec Loss 38.6369 LearningRate 0.000172 Epoch: 0 Global Step: 14270 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:42:31,912-Speed 2500.31 samples/sec Loss 38.6459 LearningRate 0.000172 Epoch: 0 Global Step: 14280 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:42:40,099-Speed 2514.76 samples/sec Loss 38.6284 LearningRate 0.000172 Epoch: 0 Global Step: 14290 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:42:48,328-Speed 2489.06 samples/sec Loss 38.6203 LearningRate 0.000172 Epoch: 0 Global Step: 14300 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:42:56,547-Speed 2492.10 samples/sec Loss 38.6044 LearningRate 0.000172 Epoch: 0 Global Step: 14310 Fp16 Grad Scale: 16384 
Required: 187 hours Training: 2022-07-05 17:43:04,766-Speed 2492.05 samples/sec Loss 38.6496 LearningRate 0.000173 Epoch: 0 Global Step: 14320 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:43:12,984-Speed 2492.55 samples/sec Loss 38.5892 LearningRate 0.000173 Epoch: 0 Global Step: 14330 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:43:21,203-Speed 2492.29 samples/sec Loss 38.6451 LearningRate 0.000173 Epoch: 0 Global Step: 14340 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:43:29,365-Speed 2509.39 samples/sec Loss 38.5412 LearningRate 0.000173 Epoch: 0 Global Step: 14350 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:43:37,579-Speed 2493.96 samples/sec Loss 38.6149 LearningRate 0.000173 Epoch: 0 Global Step: 14360 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:43:45,796-Speed 2493.05 samples/sec Loss 38.6125 LearningRate 0.000173 Epoch: 0 Global Step: 14370 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:43:54,010-Speed 2493.56 samples/sec Loss 38.6116 LearningRate 0.000173 Epoch: 0 Global Step: 14380 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:44:02,226-Speed 2493.12 samples/sec Loss 38.5930 LearningRate 0.000173 Epoch: 0 Global Step: 14390 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:44:10,444-Speed 2492.60 samples/sec Loss 38.5959 LearningRate 0.000174 Epoch: 0 Global Step: 14400 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:44:18,609-Speed 2508.67 samples/sec Loss 38.5867 LearningRate 0.000174 Epoch: 0 Global Step: 14410 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:44:26,827-Speed 2492.34 samples/sec Loss 38.5946 LearningRate 0.000174 Epoch: 0 Global Step: 14420 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:44:35,042-Speed 2493.59 samples/sec Loss 38.5984 LearningRate 0.000174 Epoch: 0 Global Step: 14430 Fp16 Grad Scale: 16384 
Required: 187 hours Training: 2022-07-05 17:44:43,257-Speed 2493.35 samples/sec Loss 38.5715 LearningRate 0.000174 Epoch: 0 Global Step: 14440 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:44:51,486-Speed 2489.41 samples/sec Loss 38.5828 LearningRate 0.000174 Epoch: 0 Global Step: 14450 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:44:59,708-Speed 2491.11 samples/sec Loss 38.5309 LearningRate 0.000174 Epoch: 0 Global Step: 14460 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:45:07,876-Speed 2507.98 samples/sec Loss 38.5394 LearningRate 0.000174 Epoch: 0 Global Step: 14470 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:45:16,096-Speed 2491.86 samples/sec Loss 38.5251 LearningRate 0.000175 Epoch: 0 Global Step: 14480 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:45:24,314-Speed 2492.48 samples/sec Loss 38.5201 LearningRate 0.000175 Epoch: 0 Global Step: 14490 Fp16 Grad Scale: 16384 Required: 187 hours Training: 2022-07-05 17:45:32,534-Speed 2491.92 samples/sec Loss 38.5442 LearningRate 0.000175 Epoch: 0 Global Step: 14500 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:45:40,753-Speed 2492.46 samples/sec Loss 38.5825 LearningRate 0.000175 Epoch: 0 Global Step: 14510 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:45:48,974-Speed 2491.59 samples/sec Loss 38.5358 LearningRate 0.000175 Epoch: 0 Global Step: 14520 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:45:57,140-Speed 2508.23 samples/sec Loss 38.5303 LearningRate 0.000175 Epoch: 0 Global Step: 14530 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:46:05,357-Speed 2492.99 samples/sec Loss 38.5335 LearningRate 0.000175 Epoch: 0 Global Step: 14540 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:46:13,575-Speed 2492.56 samples/sec Loss 38.5257 LearningRate 0.000175 Epoch: 0 Global Step: 14550 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:46:21,797-Speed 2491.46 samples/sec Loss 38.5291 LearningRate 0.000176 Epoch: 0 Global Step: 14560 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:46:30,024-Speed 2489.87 samples/sec Loss 38.5045 LearningRate 0.000176 Epoch: 0 Global Step: 14570 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:46:38,244-Speed 2491.68 samples/sec Loss 38.5502 LearningRate 0.000176 Epoch: 0 Global Step: 14580 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:46:46,406-Speed 2509.76 samples/sec Loss 38.5118 LearningRate 0.000176 Epoch: 0 Global Step: 14590 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:46:54,626-Speed 2492.06 samples/sec Loss 38.5360 LearningRate 0.000176 Epoch: 0 Global Step: 14600 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:47:02,845-Speed 2492.24 samples/sec Loss 38.5171 LearningRate 0.000176 Epoch: 0 Global Step: 14610 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:47:11,063-Speed 2492.55 samples/sec Loss 38.5034 LearningRate 0.000176 Epoch: 0 Global Step: 14620 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:47:19,281-Speed 2492.24 samples/sec Loss 38.4782 LearningRate 0.000176 Epoch: 0 Global Step: 14630 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:47:27,499-Speed 2492.79 samples/sec Loss 38.5047 LearningRate 0.000176 Epoch: 0 Global Step: 14640 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:47:35,666-Speed 2508.20 samples/sec Loss 38.4817 LearningRate 0.000177 Epoch: 0 Global Step: 14650 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:47:43,895-Speed 2488.98 samples/sec Loss 38.4511 LearningRate 0.000177 Epoch: 0 Global Step: 14660 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:47:52,114-Speed 2492.25 samples/sec Loss 38.4905 LearningRate 0.000177 Epoch: 0 Global Step: 14670 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:48:00,331-Speed 2492.74 samples/sec Loss 38.4542 LearningRate 0.000177 Epoch: 0 Global Step: 14680 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:48:08,551-Speed 2491.99 samples/sec Loss 38.4804 LearningRate 0.000177 Epoch: 0 Global Step: 14690 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:48:16,770-Speed 2492.53 samples/sec Loss 38.4596 LearningRate 0.000177 Epoch: 0 Global Step: 14700 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:48:24,933-Speed 2509.27 samples/sec Loss 38.4932 LearningRate 0.000177 Epoch: 0 Global Step: 14710 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:48:33,149-Speed 2493.11 samples/sec Loss 38.4623 LearningRate 0.000177 Epoch: 0 Global Step: 14720 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:48:41,371-Speed 2491.46 samples/sec Loss 38.4427 LearningRate 0.000178 Epoch: 0 Global Step: 14730 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:48:49,589-Speed 2492.49 samples/sec Loss 38.4393 LearningRate 0.000178 Epoch: 0 Global Step: 14740 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:48:57,805-Speed 2493.36 samples/sec Loss 38.4434 LearningRate 0.000178 Epoch: 0 Global Step: 14750 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:49:06,023-Speed 2492.18 samples/sec Loss 38.4147 LearningRate 0.000178 Epoch: 0 Global Step: 14760 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:49:14,188-Speed 2508.73 samples/sec Loss 38.3564 LearningRate 0.000178 Epoch: 0 Global Step: 14770 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:49:22,406-Speed 2492.66 samples/sec Loss 38.4159 LearningRate 0.000178 Epoch: 0 Global Step: 14780 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:49:30,627-Speed 2491.72 samples/sec Loss 38.4417 LearningRate 0.000178 Epoch: 0 Global Step: 14790 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:49:38,840-Speed 2493.97 samples/sec Loss 38.4257 LearningRate 0.000178 Epoch: 0 Global Step: 14800 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:49:47,058-Speed 2492.83 samples/sec Loss 38.4039 LearningRate 0.000179 Epoch: 0 Global Step: 14810 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:49:55,275-Speed 2492.70 samples/sec Loss 38.4096 LearningRate 0.000179 Epoch: 0 Global Step: 14820 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:50:03,442-Speed 2508.07 samples/sec Loss 38.3970 LearningRate 0.000179 Epoch: 0 Global Step: 14830 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:50:11,659-Speed 2492.91 samples/sec Loss 38.4306 LearningRate 0.000179 Epoch: 0 Global Step: 14840 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:50:19,876-Speed 2492.76 samples/sec Loss 38.4020 LearningRate 0.000179 Epoch: 0 Global Step: 14850 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:50:28,091-Speed 2493.27 samples/sec Loss 38.3980 LearningRate 0.000179 Epoch: 0 Global Step: 14860 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:50:36,309-Speed 2492.62 samples/sec Loss 38.3582 LearningRate 0.000179 Epoch: 0 Global Step: 14870 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:50:44,528-Speed 2492.27 samples/sec Loss 38.3702 LearningRate 0.000179 Epoch: 0 Global Step: 14880 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:50:52,693-Speed 2508.58 samples/sec Loss 38.3982 LearningRate 0.000179 Epoch: 0 Global Step: 14890 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:51:00,910-Speed 2492.85 samples/sec Loss 38.4377 LearningRate 0.000180 Epoch: 0 Global Step: 14900 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:51:09,129-Speed 2492.12 samples/sec Loss 38.4331 LearningRate 0.000180 Epoch: 0 Global Step: 14910 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:51:17,349-Speed 2491.70 samples/sec Loss 38.4385 LearningRate 0.000180 Epoch: 0 Global Step: 14920 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:51:25,566-Speed 2492.96 samples/sec Loss 38.4613 LearningRate 0.000180 Epoch: 0 Global Step: 14930 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:51:33,786-Speed 2491.80 samples/sec Loss 38.3594 LearningRate 0.000180 Epoch: 0 Global Step: 14940 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:51:41,950-Speed 2509.00 samples/sec Loss 38.3910 LearningRate 0.000180 Epoch: 0 Global Step: 14950 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:51:50,171-Speed 2491.59 samples/sec Loss 38.4064 LearningRate 0.000180 Epoch: 0 Global Step: 14960 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:51:58,390-Speed 2492.27 samples/sec Loss 38.3842 LearningRate 0.000180 Epoch: 0 Global Step: 14970 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:52:06,610-Speed 2491.88 samples/sec Loss 38.3778 LearningRate 0.000181 Epoch: 0 Global Step: 14980 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:52:14,827-Speed 2492.89 samples/sec Loss 38.3945 LearningRate 0.000181 Epoch: 0 Global Step: 14990 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:52:23,048-Speed 2491.51 samples/sec Loss 38.3377 LearningRate 0.000181 Epoch: 0 Global Step: 15000 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:52:31,215-Speed 2507.93 samples/sec Loss 38.3297 LearningRate 0.000181 Epoch: 0 Global Step: 15010 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:52:39,449-Speed 2487.83 samples/sec Loss 38.3733 LearningRate 0.000181 Epoch: 0 Global Step: 15020 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:52:47,666-Speed 2492.61 samples/sec Loss 38.3548 LearningRate 0.000181 Epoch: 0 Global Step: 15030 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:52:55,889-Speed 2490.89 samples/sec Loss 38.3207 LearningRate 0.000181 Epoch: 0 Global Step: 15040 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:53:04,107-Speed 2492.95 samples/sec Loss 38.3139 LearningRate 0.000181 Epoch: 0 Global Step: 15050 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:53:12,329-Speed 2491.43 samples/sec Loss 38.2855 LearningRate 0.000182 Epoch: 0 Global Step: 15060 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:53:20,497-Speed 2507.58 samples/sec Loss 38.3404 LearningRate 0.000182 Epoch: 0 Global Step: 15070 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:53:28,735-Speed 2486.52 samples/sec Loss 38.3104 LearningRate 0.000182 Epoch: 0 Global Step: 15080 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:53:36,953-Speed 2492.48 samples/sec Loss 38.2754 LearningRate 0.000182 Epoch: 0 Global Step: 15090 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:53:45,169-Speed 2493.01 samples/sec Loss 38.2899 LearningRate 0.000182 Epoch: 0 Global Step: 15100 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:53:53,386-Speed 2492.83 samples/sec Loss 38.2580 LearningRate 0.000182 Epoch: 0 Global Step: 15110 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:54:01,605-Speed 2492.18 samples/sec Loss 38.3261 LearningRate 0.000182 Epoch: 0 Global Step: 15120 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:54:09,772-Speed 2508.16 samples/sec Loss 38.3007 LearningRate 0.000182 Epoch: 0 Global Step: 15130 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:54:17,988-Speed 2493.19 samples/sec Loss 38.2435 LearningRate 0.000182 Epoch: 0 Global Step: 15140 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:54:26,208-Speed 2491.87 samples/sec Loss 38.2449 LearningRate 0.000183 Epoch: 0 Global Step: 15150 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:54:34,426-Speed 2492.77 samples/sec Loss 38.3161 LearningRate 0.000183 Epoch: 0 Global Step: 15160 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:54:42,640-Speed 2493.61 samples/sec Loss 38.2695 LearningRate 0.000183 Epoch: 0 Global Step: 15170 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:54:50,857-Speed 2492.90 samples/sec Loss 38.2338 LearningRate 0.000183 Epoch: 0 Global Step: 15180 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:54:59,022-Speed 2508.76 samples/sec Loss 38.1967 LearningRate 0.000183 Epoch: 0 Global Step: 15190 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:55:07,246-Speed 2490.51 samples/sec Loss 38.2734 LearningRate 0.000183 Epoch: 0 Global Step: 15200 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:55:15,464-Speed 2492.81 samples/sec Loss 38.2167 LearningRate 0.000183 Epoch: 0 Global Step: 15210 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:55:23,679-Speed 2493.61 samples/sec Loss 38.2233 LearningRate 0.000183 Epoch: 0 Global Step: 15220 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:55:31,895-Speed 2492.94 samples/sec Loss 38.2589 LearningRate 0.000184 Epoch: 0 Global Step: 15230 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:55:40,110-Speed 2493.53 samples/sec Loss 38.2604 LearningRate 0.000184 Epoch: 0 Global Step: 15240 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:55:48,275-Speed 2508.78 samples/sec Loss 38.2780 LearningRate 0.000184 Epoch: 0 Global Step: 15250 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:55:56,493-Speed 2492.19 samples/sec Loss 38.2579 LearningRate 0.000184 Epoch: 0 Global Step: 15260 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:56:04,709-Speed 2493.12 samples/sec Loss 38.1572 LearningRate 0.000184 Epoch: 0 Global Step: 15270 Fp16 Grad Scale: 16384 
Required: 186 hours Training: 2022-07-05 17:56:12,928-Speed 2492.40 samples/sec Loss 38.1844 LearningRate 0.000184 Epoch: 0 Global Step: 15280 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:56:21,148-Speed 2492.06 samples/sec Loss 38.1993 LearningRate 0.000184 Epoch: 0 Global Step: 15290 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:56:29,362-Speed 2493.54 samples/sec Loss 38.2014 LearningRate 0.000184 Epoch: 0 Global Step: 15300 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:56:37,523-Speed 2510.09 samples/sec Loss 38.1922 LearningRate 0.000185 Epoch: 0 Global Step: 15310 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:56:45,738-Speed 2493.48 samples/sec Loss 38.1906 LearningRate 0.000185 Epoch: 0 Global Step: 15320 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:56:53,953-Speed 2493.29 samples/sec Loss 38.1693 LearningRate 0.000185 Epoch: 0 Global Step: 15330 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:57:02,169-Speed 2493.16 samples/sec Loss 38.2055 LearningRate 0.000185 Epoch: 0 Global Step: 15340 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:57:10,392-Speed 2491.19 samples/sec Loss 38.1764 LearningRate 0.000185 Epoch: 0 Global Step: 15350 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:57:18,614-Speed 2491.30 samples/sec Loss 38.1686 LearningRate 0.000185 Epoch: 0 Global Step: 15360 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:57:26,776-Speed 2509.72 samples/sec Loss 38.2002 LearningRate 0.000185 Epoch: 0 Global Step: 15370 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:57:34,994-Speed 2492.51 samples/sec Loss 38.1556 LearningRate 0.000185 Epoch: 0 Global Step: 15380 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:57:43,210-Speed 2492.83 samples/sec Loss 38.2050 LearningRate 0.000186 Epoch: 0 Global Step: 15390 Fp16 Grad Scale: 32768 
Required: 186 hours Training: 2022-07-05 17:57:51,428-Speed 2492.54 samples/sec Loss 38.1876 LearningRate 0.000186 Epoch: 0 Global Step: 15400 Fp16 Grad Scale: 32768 Required: 186 hours Training: 2022-07-05 17:57:59,603-Speed 2505.82 samples/sec Loss 38.1603 LearningRate 0.000186 Epoch: 0 Global Step: 15410 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:58:07,817-Speed 2493.45 samples/sec Loss 38.1543 LearningRate 0.000186 Epoch: 0 Global Step: 15420 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:58:15,979-Speed 2509.69 samples/sec Loss 38.1516 LearningRate 0.000186 Epoch: 0 Global Step: 15430 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:58:24,196-Speed 2493.00 samples/sec Loss 38.1239 LearningRate 0.000186 Epoch: 0 Global Step: 15440 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:58:32,429-Speed 2487.95 samples/sec Loss 38.1125 LearningRate 0.000186 Epoch: 0 Global Step: 15450 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:58:40,645-Speed 2493.03 samples/sec Loss 38.0649 LearningRate 0.000186 Epoch: 0 Global Step: 15460 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:58:48,862-Speed 2492.64 samples/sec Loss 38.1439 LearningRate 0.000186 Epoch: 0 Global Step: 15470 Fp16 Grad Scale: 16384 Required: 186 hours Training: 2022-07-05 17:58:57,050-Speed 2501.53 samples/sec Loss 38.1367 LearningRate 0.000187 Epoch: 0 Global Step: 15480 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 17:59:05,214-Speed 2509.02 samples/sec Loss 38.1723 LearningRate 0.000187 Epoch: 0 Global Step: 15490 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 17:59:13,434-Speed 2492.35 samples/sec Loss 38.1877 LearningRate 0.000187 Epoch: 0 Global Step: 15500 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 17:59:21,651-Speed 2492.50 samples/sec Loss 38.0958 LearningRate 0.000187 Epoch: 0 Global Step: 15510 Fp16 Grad Scale: 8192 Required: 
186 hours Training: 2022-07-05 17:59:29,866-Speed 2493.78 samples/sec Loss 38.1019 LearningRate 0.000187 Epoch: 0 Global Step: 15520 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 17:59:38,081-Speed 2493.67 samples/sec Loss 38.0487 LearningRate 0.000187 Epoch: 0 Global Step: 15530 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 17:59:46,299-Speed 2492.14 samples/sec Loss 38.0811 LearningRate 0.000187 Epoch: 0 Global Step: 15540 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 17:59:54,464-Speed 2508.97 samples/sec Loss 38.1360 LearningRate 0.000187 Epoch: 0 Global Step: 15550 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:00:02,702-Speed 2486.58 samples/sec Loss 38.1736 LearningRate 0.000188 Epoch: 0 Global Step: 15560 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:00:10,918-Speed 2493.06 samples/sec Loss 38.2146 LearningRate 0.000188 Epoch: 0 Global Step: 15570 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:00:19,135-Speed 2492.90 samples/sec Loss 38.1209 LearningRate 0.000188 Epoch: 0 Global Step: 15580 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:00:27,352-Speed 2492.99 samples/sec Loss 38.1138 LearningRate 0.000188 Epoch: 0 Global Step: 15590 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:00:35,568-Speed 2493.13 samples/sec Loss 38.0665 LearningRate 0.000188 Epoch: 0 Global Step: 15600 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:00:43,731-Speed 2509.15 samples/sec Loss 38.0891 LearningRate 0.000188 Epoch: 0 Global Step: 15610 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:00:51,948-Speed 2493.15 samples/sec Loss 38.0649 LearningRate 0.000188 Epoch: 0 Global Step: 15620 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:01:00,167-Speed 2492.47 samples/sec Loss 38.0612 LearningRate 0.000188 Epoch: 0 Global Step: 15630 Fp16 Grad Scale: 8192 Required: 186 hours Training: 
2022-07-05 18:01:08,386-Speed 2492.11 samples/sec Loss 38.0203 LearningRate 0.000189 Epoch: 0 Global Step: 15640 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:01:16,602-Speed 2492.97 samples/sec Loss 38.0892 LearningRate 0.000189 Epoch: 0 Global Step: 15650 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:01:24,823-Speed 2491.49 samples/sec Loss 38.0736 LearningRate 0.000189 Epoch: 0 Global Step: 15660 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:01:32,991-Speed 2508.05 samples/sec Loss 38.0271 LearningRate 0.000189 Epoch: 0 Global Step: 15670 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:01:41,206-Speed 2493.56 samples/sec Loss 38.0586 LearningRate 0.000189 Epoch: 0 Global Step: 15680 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:01:49,420-Speed 2493.81 samples/sec Loss 38.0259 LearningRate 0.000189 Epoch: 0 Global Step: 15690 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:01:57,644-Speed 2490.49 samples/sec Loss 38.0133 LearningRate 0.000189 Epoch: 0 Global Step: 15700 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:02:05,870-Speed 2490.14 samples/sec Loss 38.0096 LearningRate 0.000189 Epoch: 0 Global Step: 15710 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:02:14,089-Speed 2492.08 samples/sec Loss 38.0176 LearningRate 0.000189 Epoch: 0 Global Step: 15720 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:02:22,251-Speed 2509.98 samples/sec Loss 38.0202 LearningRate 0.000190 Epoch: 0 Global Step: 15730 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:02:30,469-Speed 2492.73 samples/sec Loss 38.0363 LearningRate 0.000190 Epoch: 0 Global Step: 15740 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:02:38,683-Speed 2493.55 samples/sec Loss 37.9938 LearningRate 0.000190 Epoch: 0 Global Step: 15750 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 
18:02:46,895-Speed 2493.97 samples/sec Loss 38.0093 LearningRate 0.000190 Epoch: 0 Global Step: 15760 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:02:55,117-Speed 2491.49 samples/sec Loss 38.0183 LearningRate 0.000190 Epoch: 0 Global Step: 15770 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:03:03,335-Speed 2492.52 samples/sec Loss 38.0068 LearningRate 0.000190 Epoch: 0 Global Step: 15780 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:03:11,498-Speed 2509.20 samples/sec Loss 38.0018 LearningRate 0.000190 Epoch: 0 Global Step: 15790 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:03:19,710-Speed 2494.60 samples/sec Loss 37.9592 LearningRate 0.000190 Epoch: 0 Global Step: 15800 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:03:27,925-Speed 2493.65 samples/sec Loss 38.0029 LearningRate 0.000191 Epoch: 0 Global Step: 15810 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:03:36,155-Speed 2489.06 samples/sec Loss 37.9956 LearningRate 0.000191 Epoch: 0 Global Step: 15820 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:03:44,369-Speed 2493.66 samples/sec Loss 37.9879 LearningRate 0.000191 Epoch: 0 Global Step: 15830 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:03:52,588-Speed 2491.95 samples/sec Loss 37.9604 LearningRate 0.000191 Epoch: 0 Global Step: 15840 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:04:00,749-Speed 2509.94 samples/sec Loss 37.9550 LearningRate 0.000191 Epoch: 0 Global Step: 15850 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:04:08,982-Speed 2488.18 samples/sec Loss 37.9644 LearningRate 0.000191 Epoch: 0 Global Step: 15860 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:04:17,197-Speed 2493.44 samples/sec Loss 37.9001 LearningRate 0.000191 Epoch: 0 Global Step: 15870 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:04:25,410-Speed 
2494.05 samples/sec Loss 37.9527 LearningRate 0.000191 Epoch: 0 Global Step: 15880 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:04:33,639-Speed 2489.06 samples/sec Loss 37.9998 LearningRate 0.000192 Epoch: 0 Global Step: 15890 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:04:41,855-Speed 2493.02 samples/sec Loss 37.9292 LearningRate 0.000192 Epoch: 0 Global Step: 15900 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:04:50,017-Speed 2509.44 samples/sec Loss 37.9236 LearningRate 0.000192 Epoch: 0 Global Step: 15910 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:04:58,233-Speed 2493.35 samples/sec Loss 37.9525 LearningRate 0.000192 Epoch: 0 Global Step: 15920 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:05:06,446-Speed 2493.91 samples/sec Loss 37.9413 LearningRate 0.000192 Epoch: 0 Global Step: 15930 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:05:14,660-Speed 2493.81 samples/sec Loss 37.9302 LearningRate 0.000192 Epoch: 0 Global Step: 15940 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:05:22,881-Speed 2491.51 samples/sec Loss 37.8909 LearningRate 0.000192 Epoch: 0 Global Step: 15950 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:05:31,095-Speed 2493.85 samples/sec Loss 37.8881 LearningRate 0.000192 Epoch: 0 Global Step: 15960 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:05:39,257-Speed 2509.84 samples/sec Loss 37.8818 LearningRate 0.000192 Epoch: 0 Global Step: 15970 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:05:47,473-Speed 2492.87 samples/sec Loss 37.9066 LearningRate 0.000193 Epoch: 0 Global Step: 15980 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:05:55,692-Speed 2492.27 samples/sec Loss 37.9931 LearningRate 0.000193 Epoch: 0 Global Step: 15990 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:06:03,922-Speed 2489.06 samples/sec 
Loss 38.0671 LearningRate 0.000193 Epoch: 0 Global Step: 16000 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:06:12,141-Speed 2492.15 samples/sec Loss 38.0387 LearningRate 0.000193 Epoch: 0 Global Step: 16010 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:06:20,353-Speed 2494.29 samples/sec Loss 38.1581 LearningRate 0.000193 Epoch: 0 Global Step: 16020 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:06:28,513-Speed 2510.25 samples/sec Loss 38.2458 LearningRate 0.000193 Epoch: 0 Global Step: 16030 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:06:36,726-Speed 2494.36 samples/sec Loss 38.0917 LearningRate 0.000193 Epoch: 0 Global Step: 16040 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:06:44,939-Speed 2493.61 samples/sec Loss 38.0155 LearningRate 0.000193 Epoch: 0 Global Step: 16050 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:06:53,259-Speed 2461.92 samples/sec Loss 38.0291 LearningRate 0.000194 Epoch: 0 Global Step: 16060 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:07:01,471-Speed 2494.25 samples/sec Loss 37.8877 LearningRate 0.000194 Epoch: 0 Global Step: 16070 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:07:09,690-Speed 2492.26 samples/sec Loss 37.8698 LearningRate 0.000194 Epoch: 0 Global Step: 16080 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:07:17,861-Speed 2506.64 samples/sec Loss 37.8708 LearningRate 0.000194 Epoch: 0 Global Step: 16090 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:07:26,077-Speed 2493.02 samples/sec Loss 37.8701 LearningRate 0.000194 Epoch: 0 Global Step: 16100 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:07:34,294-Speed 2493.06 samples/sec Loss 37.8855 LearningRate 0.000194 Epoch: 0 Global Step: 16110 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:07:42,507-Speed 2493.83 samples/sec Loss 37.8937 
LearningRate 0.000194 Epoch: 0 Global Step: 16120 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:07:50,725-Speed 2492.69 samples/sec Loss 37.8694 LearningRate 0.000194 Epoch: 0 Global Step: 16130 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:07:58,939-Speed 2493.71 samples/sec Loss 37.8481 LearningRate 0.000195 Epoch: 0 Global Step: 16140 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:08:07,102-Speed 2509.17 samples/sec Loss 37.8585 LearningRate 0.000195 Epoch: 0 Global Step: 16150 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:08:15,321-Speed 2492.09 samples/sec Loss 37.8813 LearningRate 0.000195 Epoch: 0 Global Step: 16160 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:08:23,535-Speed 2493.79 samples/sec Loss 37.8683 LearningRate 0.000195 Epoch: 0 Global Step: 16170 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:08:31,844-Speed 2465.27 samples/sec Loss 37.7994 LearningRate 0.000195 Epoch: 0 Global Step: 16180 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:08:40,056-Speed 2494.60 samples/sec Loss 37.8102 LearningRate 0.000195 Epoch: 0 Global Step: 16190 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:08:48,270-Speed 2493.82 samples/sec Loss 37.7939 LearningRate 0.000195 Epoch: 0 Global Step: 16200 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:08:56,433-Speed 2509.02 samples/sec Loss 37.8663 LearningRate 0.000195 Epoch: 0 Global Step: 16210 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:09:04,647-Speed 2493.90 samples/sec Loss 37.8375 LearningRate 0.000196 Epoch: 0 Global Step: 16220 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:09:12,862-Speed 2493.24 samples/sec Loss 37.8669 LearningRate 0.000196 Epoch: 0 Global Step: 16230 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:09:21,074-Speed 2494.42 samples/sec Loss 37.8492 LearningRate 
0.000196 Epoch: 0 Global Step: 16240 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:09:29,287-Speed 2493.92 samples/sec Loss 37.7680 LearningRate 0.000196 Epoch: 0 Global Step: 16250 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:09:37,502-Speed 2493.24 samples/sec Loss 37.7659 LearningRate 0.000196 Epoch: 0 Global Step: 16260 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:09:45,666-Speed 2508.98 samples/sec Loss 37.7733 LearningRate 0.000196 Epoch: 0 Global Step: 16270 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:09:53,882-Speed 2492.87 samples/sec Loss 37.7370 LearningRate 0.000196 Epoch: 0 Global Step: 16280 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:10:02,099-Speed 2492.76 samples/sec Loss 37.7325 LearningRate 0.000196 Epoch: 0 Global Step: 16290 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:10:10,314-Speed 2493.72 samples/sec Loss 37.7848 LearningRate 0.000196 Epoch: 0 Global Step: 16300 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:10:18,542-Speed 2489.55 samples/sec Loss 37.7348 LearningRate 0.000197 Epoch: 0 Global Step: 16310 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:10:26,723-Speed 2503.56 samples/sec Loss 37.7944 LearningRate 0.000197 Epoch: 0 Global Step: 16320 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:10:34,884-Speed 2510.11 samples/sec Loss 37.8028 LearningRate 0.000197 Epoch: 0 Global Step: 16330 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:10:43,098-Speed 2493.58 samples/sec Loss 37.7525 LearningRate 0.000197 Epoch: 0 Global Step: 16340 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:10:51,312-Speed 2493.75 samples/sec Loss 37.6990 LearningRate 0.000197 Epoch: 0 Global Step: 16350 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:10:59,528-Speed 2493.10 samples/sec Loss 37.7119 LearningRate 0.000197 Epoch: 0 
Global Step: 16360 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:11:07,743-Speed 2493.26 samples/sec Loss 37.7655 LearningRate 0.000197 Epoch: 0 Global Step: 16370 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:11:15,958-Speed 2493.65 samples/sec Loss 37.7029 LearningRate 0.000197 Epoch: 0 Global Step: 16380 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:11:24,116-Speed 2510.75 samples/sec Loss 37.6887 LearningRate 0.000198 Epoch: 0 Global Step: 16390 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:11:32,331-Speed 2493.43 samples/sec Loss 37.6787 LearningRate 0.000198 Epoch: 0 Global Step: 16400 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:11:40,543-Speed 2494.68 samples/sec Loss 37.6595 LearningRate 0.000198 Epoch: 0 Global Step: 16410 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:11:48,756-Speed 2494.17 samples/sec Loss 37.6495 LearningRate 0.000198 Epoch: 0 Global Step: 16420 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:11:56,974-Speed 2492.50 samples/sec Loss 37.6449 LearningRate 0.000198 Epoch: 0 Global Step: 16430 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:12:05,187-Speed 2494.10 samples/sec Loss 37.6407 LearningRate 0.000198 Epoch: 0 Global Step: 16440 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:12:13,350-Speed 2509.42 samples/sec Loss 37.6532 LearningRate 0.000198 Epoch: 0 Global Step: 16450 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:12:21,564-Speed 2493.64 samples/sec Loss 37.6249 LearningRate 0.000198 Epoch: 0 Global Step: 16460 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:12:29,777-Speed 2493.99 samples/sec Loss 37.6225 LearningRate 0.000199 Epoch: 0 Global Step: 16470 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:12:37,988-Speed 2494.67 samples/sec Loss 37.6084 LearningRate 0.000199 Epoch: 0 Global Step: 16480 
Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:12:46,204-Speed 2493.16 samples/sec Loss 37.6121 LearningRate 0.000199 Epoch: 0 Global Step: 16490 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:12:54,418-Speed 2493.73 samples/sec Loss 37.7256 LearningRate 0.000199 Epoch: 0 Global Step: 16500 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:13:02,580-Speed 2509.35 samples/sec Loss 37.6248 LearningRate 0.000199 Epoch: 0 Global Step: 16510 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:13:10,793-Speed 2493.88 samples/sec Loss 37.6115 LearningRate 0.000199 Epoch: 0 Global Step: 16520 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:13:19,005-Speed 2494.49 samples/sec Loss 37.6255 LearningRate 0.000199 Epoch: 0 Global Step: 16530 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:13:27,217-Speed 2494.22 samples/sec Loss 37.6072 LearningRate 0.000199 Epoch: 0 Global Step: 16540 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:13:35,443-Speed 2490.15 samples/sec Loss 37.5709 LearningRate 0.000199 Epoch: 0 Global Step: 16550 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:13:43,661-Speed 2492.51 samples/sec Loss 37.6068 LearningRate 0.000200 Epoch: 0 Global Step: 16560 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:13:51,822-Speed 2509.93 samples/sec Loss 37.5405 LearningRate 0.000200 Epoch: 0 Global Step: 16570 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:14:00,036-Speed 2493.62 samples/sec Loss 37.5858 LearningRate 0.000200 Epoch: 0 Global Step: 16580 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:14:08,250-Speed 2493.82 samples/sec Loss 37.5578 LearningRate 0.000200 Epoch: 0 Global Step: 16590 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:14:16,465-Speed 2493.47 samples/sec Loss 37.5333 LearningRate 0.000200 Epoch: 0 Global Step: 16600 Fp16 Grad Scale: 
4096 Required: 186 hours Training: 2022-07-05 18:14:24,685-Speed 2491.79 samples/sec Loss 37.5217 LearningRate 0.000200 Epoch: 0 Global Step: 16610 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:14:32,897-Speed 2494.51 samples/sec Loss 37.5561 LearningRate 0.000200 Epoch: 0 Global Step: 16620 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:14:41,057-Speed 2510.54 samples/sec Loss 37.5885 LearningRate 0.000200 Epoch: 0 Global Step: 16630 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:14:49,272-Speed 2493.39 samples/sec Loss 37.5615 LearningRate 0.000201 Epoch: 0 Global Step: 16640 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:14:57,486-Speed 2493.76 samples/sec Loss 37.5610 LearningRate 0.000201 Epoch: 0 Global Step: 16650 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:15:05,702-Speed 2492.88 samples/sec Loss 37.5888 LearningRate 0.000201 Epoch: 0 Global Step: 16660 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:15:13,915-Speed 2494.18 samples/sec Loss 37.5454 LearningRate 0.000201 Epoch: 0 Global Step: 16670 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:15:22,130-Speed 2493.20 samples/sec Loss 37.5290 LearningRate 0.000201 Epoch: 0 Global Step: 16680 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:15:30,292-Speed 2509.64 samples/sec Loss 37.4950 LearningRate 0.000201 Epoch: 0 Global Step: 16690 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:15:38,504-Speed 2494.31 samples/sec Loss 37.5187 LearningRate 0.000201 Epoch: 0 Global Step: 16700 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:15:46,717-Speed 2494.25 samples/sec Loss 37.5216 LearningRate 0.000201 Epoch: 0 Global Step: 16710 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:15:54,934-Speed 2492.70 samples/sec Loss 37.5033 LearningRate 0.000202 Epoch: 0 Global Step: 16720 Fp16 Grad Scale: 4096 Required: 186 
hours Training: 2022-07-05 18:16:03,145-Speed 2494.82 samples/sec Loss 37.4838 LearningRate 0.000202 Epoch: 0 Global Step: 16730 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:16:11,363-Speed 2492.41 samples/sec Loss 37.4397 LearningRate 0.000202 Epoch: 0 Global Step: 16740 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:16:19,520-Speed 2511.21 samples/sec Loss 37.4452 LearningRate 0.000202 Epoch: 0 Global Step: 16750 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:16:27,746-Speed 2490.04 samples/sec Loss 37.4739 LearningRate 0.000202 Epoch: 0 Global Step: 16760 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:16:35,970-Speed 2490.87 samples/sec Loss 37.4124 LearningRate 0.000202 Epoch: 0 Global Step: 16770 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:16:44,183-Speed 2493.86 samples/sec Loss 37.4522 LearningRate 0.000202 Epoch: 0 Global Step: 16780 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:16:52,396-Speed 2494.05 samples/sec Loss 37.4525 LearningRate 0.000202 Epoch: 0 Global Step: 16790 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:17:00,606-Speed 2495.05 samples/sec Loss 37.4645 LearningRate 0.000203 Epoch: 0 Global Step: 16800 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:17:08,779-Speed 2506.20 samples/sec Loss 37.4918 LearningRate 0.000203 Epoch: 0 Global Step: 16810 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:17:16,990-Speed 2494.35 samples/sec Loss 37.4716 LearningRate 0.000203 Epoch: 0 Global Step: 16820 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:17:25,204-Speed 2493.88 samples/sec Loss 37.4398 LearningRate 0.000203 Epoch: 0 Global Step: 16830 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:17:33,417-Speed 2494.13 samples/sec Loss 37.5105 LearningRate 0.000203 Epoch: 0 Global Step: 16840 Fp16 Grad Scale: 4096 Required: 186 hours Training: 
2022-07-05 18:17:41,630-Speed 2493.95 samples/sec Loss 37.4972 LearningRate 0.000203 Epoch: 0 Global Step: 16850 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:17:49,842-Speed 2494.13 samples/sec Loss 37.4995 LearningRate 0.000203 Epoch: 0 Global Step: 16860 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:17:58,003-Speed 2509.98 samples/sec Loss 37.4847 LearningRate 0.000203 Epoch: 0 Global Step: 16870 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:18:06,215-Speed 2494.29 samples/sec Loss 37.4395 LearningRate 0.000203 Epoch: 0 Global Step: 16880 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:18:14,428-Speed 2493.78 samples/sec Loss 37.4284 LearningRate 0.000204 Epoch: 0 Global Step: 16890 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:18:22,642-Speed 2493.84 samples/sec Loss 37.4480 LearningRate 0.000204 Epoch: 0 Global Step: 16900 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:18:30,854-Speed 2494.48 samples/sec Loss 37.4030 LearningRate 0.000204 Epoch: 0 Global Step: 16910 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:18:39,064-Speed 2494.78 samples/sec Loss 37.3649 LearningRate 0.000204 Epoch: 0 Global Step: 16920 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:18:47,222-Speed 2510.94 samples/sec Loss 37.3588 LearningRate 0.000204 Epoch: 0 Global Step: 16930 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:18:55,434-Speed 2494.35 samples/sec Loss 37.3683 LearningRate 0.000204 Epoch: 0 Global Step: 16940 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:19:03,648-Speed 2493.76 samples/sec Loss 37.3646 LearningRate 0.000204 Epoch: 0 Global Step: 16950 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:19:11,865-Speed 2492.76 samples/sec Loss 37.2822 LearningRate 0.000204 Epoch: 0 Global Step: 16960 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 
18:19:20,080-Speed 2493.31 samples/sec Loss 37.3431 LearningRate 0.000205 Epoch: 0 Global Step: 16970 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:19:28,306-Speed 2490.19 samples/sec Loss 37.3904 LearningRate 0.000205 Epoch: 0 Global Step: 16980 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:19:36,467-Speed 2509.82 samples/sec Loss 37.3185 LearningRate 0.000205 Epoch: 0 Global Step: 16990 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:19:44,682-Speed 2493.91 samples/sec Loss 37.3187 LearningRate 0.000205 Epoch: 0 Global Step: 17000 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:19:52,899-Speed 2492.89 samples/sec Loss 37.2810 LearningRate 0.000205 Epoch: 0 Global Step: 17010 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:20:01,114-Speed 2493.35 samples/sec Loss 37.2762 LearningRate 0.000205 Epoch: 0 Global Step: 17020 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:20:09,325-Speed 2494.46 samples/sec Loss 37.3045 LearningRate 0.000205 Epoch: 0 Global Step: 17030 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:20:17,535-Speed 2495.20 samples/sec Loss 37.2626 LearningRate 0.000205 Epoch: 0 Global Step: 17040 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:20:25,697-Speed 2509.55 samples/sec Loss 37.2639 LearningRate 0.000206 Epoch: 0 Global Step: 17050 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:20:33,906-Speed 2495.05 samples/sec Loss 37.2645 LearningRate 0.000206 Epoch: 0 Global Step: 17060 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:20:42,119-Speed 2493.95 samples/sec Loss 37.2823 LearningRate 0.000206 Epoch: 0 Global Step: 17070 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:20:50,333-Speed 2493.77 samples/sec Loss 37.3256 LearningRate 0.000206 Epoch: 0 Global Step: 17080 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:20:58,544-Speed 
2494.62 samples/sec Loss 37.2412 LearningRate 0.000206 Epoch: 0 Global Step: 17090 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:21:06,758-Speed 2493.72 samples/sec Loss 37.2700 LearningRate 0.000206 Epoch: 0 Global Step: 17100 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:21:14,920-Speed 2509.63 samples/sec Loss 37.2366 LearningRate 0.000206 Epoch: 0 Global Step: 17110 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:21:23,134-Speed 2493.74 samples/sec Loss 37.2520 LearningRate 0.000206 Epoch: 0 Global Step: 17120 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:21:31,348-Speed 2493.37 samples/sec Loss 37.2375 LearningRate 0.000206 Epoch: 0 Global Step: 17130 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:21:39,560-Speed 2494.35 samples/sec Loss 37.2025 LearningRate 0.000207 Epoch: 0 Global Step: 17140 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:21:47,770-Speed 2495.10 samples/sec Loss 37.2660 LearningRate 0.000207 Epoch: 0 Global Step: 17150 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:21:55,987-Speed 2492.69 samples/sec Loss 37.2173 LearningRate 0.000207 Epoch: 0 Global Step: 17160 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:22:04,154-Speed 2508.07 samples/sec Loss 37.2471 LearningRate 0.000207 Epoch: 0 Global Step: 17170 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:22:12,367-Speed 2494.06 samples/sec Loss 37.2245 LearningRate 0.000207 Epoch: 0 Global Step: 17180 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:22:20,590-Speed 2490.87 samples/sec Loss 37.2475 LearningRate 0.000207 Epoch: 0 Global Step: 17190 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:22:28,804-Speed 2493.67 samples/sec Loss 37.2423 LearningRate 0.000207 Epoch: 0 Global Step: 17200 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:22:37,015-Speed 2494.78 samples/sec 
Loss 37.2042 LearningRate 0.000207 Epoch: 0 Global Step: 17210 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:22:45,229-Speed 2493.75 samples/sec Loss 37.2147 LearningRate 0.000208 Epoch: 0 Global Step: 17220 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:22:53,393-Speed 2508.93 samples/sec Loss 37.1574 LearningRate 0.000208 Epoch: 0 Global Step: 17230 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:23:01,605-Speed 2494.39 samples/sec Loss 37.2355 LearningRate 0.000208 Epoch: 0 Global Step: 17240 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:23:09,818-Speed 2494.01 samples/sec Loss 37.2944 LearningRate 0.000208 Epoch: 0 Global Step: 17250 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:23:18,031-Speed 2494.05 samples/sec Loss 37.2577 LearningRate 0.000208 Epoch: 0 Global Step: 17260 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:23:26,245-Speed 2493.59 samples/sec Loss 37.2203 LearningRate 0.000208 Epoch: 0 Global Step: 17270 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:23:34,459-Speed 2493.92 samples/sec Loss 37.2166 LearningRate 0.000208 Epoch: 0 Global Step: 17280 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:23:42,626-Speed 2508.14 samples/sec Loss 37.2081 LearningRate 0.000208 Epoch: 0 Global Step: 17290 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:23:50,834-Speed 2495.27 samples/sec Loss 37.1452 LearningRate 0.000209 Epoch: 0 Global Step: 17300 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:23:59,047-Speed 2494.11 samples/sec Loss 37.1482 LearningRate 0.000209 Epoch: 0 Global Step: 17310 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:24:07,258-Speed 2494.41 samples/sec Loss 37.0666 LearningRate 0.000209 Epoch: 0 Global Step: 17320 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:24:15,471-Speed 2494.05 samples/sec Loss 37.1501 
LearningRate 0.000209 Epoch: 0 Global Step: 17330 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:24:23,683-Speed 2494.25 samples/sec Loss 37.0946 LearningRate 0.000209 Epoch: 0 Global Step: 17340 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:24:31,846-Speed 2509.29 samples/sec Loss 37.1532 LearningRate 0.000209 Epoch: 0 Global Step: 17350 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:24:40,058-Speed 2494.50 samples/sec Loss 37.1456 LearningRate 0.000209 Epoch: 0 Global Step: 17360 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:24:48,276-Speed 2492.41 samples/sec Loss 37.1066 LearningRate 0.000209 Epoch: 0 Global Step: 17370 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:24:56,492-Speed 2493.17 samples/sec Loss 37.1273 LearningRate 0.000209 Epoch: 0 Global Step: 17380 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:25:04,729-Speed 2486.55 samples/sec Loss 37.1294 LearningRate 0.000210 Epoch: 0 Global Step: 17390 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:25:12,942-Speed 2494.08 samples/sec Loss 37.1161 LearningRate 0.000210 Epoch: 0 Global Step: 17400 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:25:21,099-Speed 2511.05 samples/sec Loss 37.1144 LearningRate 0.000210 Epoch: 0 Global Step: 17410 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:25:29,310-Speed 2494.75 samples/sec Loss 37.1028 LearningRate 0.000210 Epoch: 0 Global Step: 17420 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:25:37,523-Speed 2494.02 samples/sec Loss 37.1023 LearningRate 0.000210 Epoch: 0 Global Step: 17430 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:25:45,736-Speed 2493.89 samples/sec Loss 37.0681 LearningRate 0.000210 Epoch: 0 Global Step: 17440 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:25:53,957-Speed 2491.69 samples/sec Loss 37.0186 LearningRate 
0.000210 Epoch: 0 Global Step: 17450 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:26:02,172-Speed 2493.39 samples/sec Loss 37.0572 LearningRate 0.000210 Epoch: 0 Global Step: 17460 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:26:10,332-Speed 2510.33 samples/sec Loss 37.0380 LearningRate 0.000211 Epoch: 0 Global Step: 17470 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:26:18,541-Speed 2495.23 samples/sec Loss 37.0417 LearningRate 0.000211 Epoch: 0 Global Step: 17480 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:26:26,759-Speed 2492.37 samples/sec Loss 37.0304 LearningRate 0.000211 Epoch: 0 Global Step: 17490 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:26:34,975-Speed 2493.08 samples/sec Loss 37.0307 LearningRate 0.000211 Epoch: 0 Global Step: 17500 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:26:43,193-Speed 2492.46 samples/sec Loss 37.0438 LearningRate 0.000211 Epoch: 0 Global Step: 17510 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:26:51,406-Speed 2493.87 samples/sec Loss 37.0348 LearningRate 0.000211 Epoch: 0 Global Step: 17520 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:26:59,577-Speed 2507.03 samples/sec Loss 36.9607 LearningRate 0.000211 Epoch: 0 Global Step: 17530 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:27:07,808-Speed 2488.65 samples/sec Loss 37.0351 LearningRate 0.000211 Epoch: 0 Global Step: 17540 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:27:16,035-Speed 2489.87 samples/sec Loss 37.0032 LearningRate 0.000212 Epoch: 0 Global Step: 17550 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:27:24,243-Speed 2495.30 samples/sec Loss 36.9865 LearningRate 0.000212 Epoch: 0 Global Step: 17560 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:27:32,457-Speed 2493.91 samples/sec Loss 37.0228 LearningRate 0.000212 Epoch: 0 
Global Step: 17570 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:27:40,670-Speed 2493.78 samples/sec Loss 36.9549 LearningRate 0.000212 Epoch: 0 Global Step: 17580 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:27:48,832-Speed 2509.77 samples/sec Loss 36.9810 LearningRate 0.000212 Epoch: 0 Global Step: 17590 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:27:57,057-Speed 2490.55 samples/sec Loss 36.9867 LearningRate 0.000212 Epoch: 0 Global Step: 17600 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:28:05,273-Speed 2493.09 samples/sec Loss 36.9860 LearningRate 0.000212 Epoch: 0 Global Step: 17610 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:28:13,485-Speed 2494.49 samples/sec Loss 36.9916 LearningRate 0.000212 Epoch: 0 Global Step: 17620 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:28:21,697-Speed 2494.06 samples/sec Loss 37.0039 LearningRate 0.000213 Epoch: 0 Global Step: 17630 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:28:29,910-Speed 2494.07 samples/sec Loss 36.9317 LearningRate 0.000213 Epoch: 0 Global Step: 17640 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:28:38,069-Speed 2510.60 samples/sec Loss 36.9412 LearningRate 0.000213 Epoch: 0 Global Step: 17650 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:28:46,284-Speed 2493.34 samples/sec Loss 36.9165 LearningRate 0.000213 Epoch: 0 Global Step: 17660 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:28:54,497-Speed 2494.22 samples/sec Loss 36.9245 LearningRate 0.000213 Epoch: 0 Global Step: 17670 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:29:02,708-Speed 2494.77 samples/sec Loss 36.8911 LearningRate 0.000213 Epoch: 0 Global Step: 17680 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:29:10,920-Speed 2494.14 samples/sec Loss 36.9379 LearningRate 0.000213 Epoch: 0 Global Step: 17690 
Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:29:19,138-Speed 2492.74 samples/sec Loss 36.9732 LearningRate 0.000213 Epoch: 0 Global Step: 17700 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:29:27,299-Speed 2509.93 samples/sec Loss 36.9156 LearningRate 0.000213 Epoch: 0 Global Step: 17710 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:29:35,513-Speed 2493.65 samples/sec Loss 36.9587 LearningRate 0.000214 Epoch: 0 Global Step: 17720 Fp16 Grad Scale: 8192 Required: 186 hours Training: 2022-07-05 18:29:43,684-Speed 2506.81 samples/sec Loss 36.9087 LearningRate 0.000214 Epoch: 0 Global Step: 17730 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:29:51,899-Speed 2493.46 samples/sec Loss 36.9335 LearningRate 0.000214 Epoch: 0 Global Step: 17740 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:30:00,109-Speed 2494.75 samples/sec Loss 36.9014 LearningRate 0.000214 Epoch: 0 Global Step: 17750 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:30:08,324-Speed 2493.47 samples/sec Loss 36.9004 LearningRate 0.000214 Epoch: 0 Global Step: 17760 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:30:16,485-Speed 2509.98 samples/sec Loss 36.9268 LearningRate 0.000214 Epoch: 0 Global Step: 17770 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:30:24,696-Speed 2494.82 samples/sec Loss 36.9115 LearningRate 0.000214 Epoch: 0 Global Step: 17780 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:30:32,911-Speed 2493.36 samples/sec Loss 36.8987 LearningRate 0.000214 Epoch: 0 Global Step: 17790 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:30:41,126-Speed 2493.62 samples/sec Loss 36.8996 LearningRate 0.000215 Epoch: 0 Global Step: 17800 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:30:49,339-Speed 2494.09 samples/sec Loss 36.9198 LearningRate 0.000215 Epoch: 0 Global Step: 17810 Fp16 Grad Scale: 
4096 Required: 186 hours Training: 2022-07-05 18:30:57,554-Speed 2493.39 samples/sec Loss 36.9086 LearningRate 0.000215 Epoch: 0 Global Step: 17820 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:31:05,718-Speed 2508.86 samples/sec Loss 36.9252 LearningRate 0.000215 Epoch: 0 Global Step: 17830 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:31:13,934-Speed 2493.35 samples/sec Loss 36.8843 LearningRate 0.000215 Epoch: 0 Global Step: 17840 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:31:22,148-Speed 2493.67 samples/sec Loss 36.8903 LearningRate 0.000215 Epoch: 0 Global Step: 17850 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:31:30,359-Speed 2494.53 samples/sec Loss 36.8712 LearningRate 0.000215 Epoch: 0 Global Step: 17860 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:31:38,573-Speed 2494.10 samples/sec Loss 36.8862 LearningRate 0.000215 Epoch: 0 Global Step: 17870 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:31:46,786-Speed 2494.13 samples/sec Loss 36.8833 LearningRate 0.000216 Epoch: 0 Global Step: 17880 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:31:54,945-Speed 2510.56 samples/sec Loss 36.8008 LearningRate 0.000216 Epoch: 0 Global Step: 17890 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:32:03,162-Speed 2492.69 samples/sec Loss 36.8572 LearningRate 0.000216 Epoch: 0 Global Step: 17900 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:32:11,375-Speed 2493.77 samples/sec Loss 36.8226 LearningRate 0.000216 Epoch: 0 Global Step: 17910 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:32:19,611-Speed 2487.07 samples/sec Loss 36.8255 LearningRate 0.000216 Epoch: 0 Global Step: 17920 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:32:27,831-Speed 2491.97 samples/sec Loss 36.8175 LearningRate 0.000216 Epoch: 0 Global Step: 17930 Fp16 Grad Scale: 4096 Required: 186 
hours Training: 2022-07-05 18:32:36,043-Speed 2494.35 samples/sec Loss 36.8341 LearningRate 0.000216 Epoch: 0 Global Step: 17940 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:32:44,202-Speed 2510.38 samples/sec Loss 36.7769 LearningRate 0.000216 Epoch: 0 Global Step: 17950 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:32:52,414-Speed 2494.29 samples/sec Loss 36.7828 LearningRate 0.000216 Epoch: 0 Global Step: 17960 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:33:00,624-Speed 2495.00 samples/sec Loss 36.8434 LearningRate 0.000217 Epoch: 0 Global Step: 17970 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:33:08,842-Speed 2492.60 samples/sec Loss 36.7982 LearningRate 0.000217 Epoch: 0 Global Step: 17980 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:33:17,060-Speed 2492.49 samples/sec Loss 36.8429 LearningRate 0.000217 Epoch: 0 Global Step: 17990 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:33:25,274-Speed 2493.57 samples/sec Loss 36.7875 LearningRate 0.000217 Epoch: 0 Global Step: 18000 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:33:33,433-Speed 2510.56 samples/sec Loss 36.8168 LearningRate 0.000217 Epoch: 0 Global Step: 18010 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:33:41,644-Speed 2494.53 samples/sec Loss 36.7925 LearningRate 0.000217 Epoch: 0 Global Step: 18020 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:33:49,856-Speed 2494.40 samples/sec Loss 36.7747 LearningRate 0.000217 Epoch: 0 Global Step: 18030 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:33:58,068-Speed 2494.50 samples/sec Loss 36.7601 LearningRate 0.000217 Epoch: 0 Global Step: 18040 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:34:06,278-Speed 2494.82 samples/sec Loss 36.8254 LearningRate 0.000218 Epoch: 0 Global Step: 18050 Fp16 Grad Scale: 4096 Required: 186 hours Training: 
2022-07-05 18:34:14,503-Speed 2490.46 samples/sec Loss 36.8321 LearningRate 0.000218 Epoch: 0 Global Step: 18060 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:34:22,659-Speed 2511.16 samples/sec Loss 36.7877 LearningRate 0.000218 Epoch: 0 Global Step: 18070 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:34:30,876-Speed 2492.91 samples/sec Loss 36.7575 LearningRate 0.000218 Epoch: 0 Global Step: 18080 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:34:39,092-Speed 2493.18 samples/sec Loss 36.7393 LearningRate 0.000218 Epoch: 0 Global Step: 18090 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:34:47,308-Speed 2493.05 samples/sec Loss 36.7506 LearningRate 0.000218 Epoch: 0 Global Step: 18100 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:34:55,519-Speed 2494.56 samples/sec Loss 36.6954 LearningRate 0.000218 Epoch: 0 Global Step: 18110 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:35:03,734-Speed 2493.56 samples/sec Loss 36.7704 LearningRate 0.000218 Epoch: 0 Global Step: 18120 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:35:11,902-Speed 2507.66 samples/sec Loss 36.7417 LearningRate 0.000219 Epoch: 0 Global Step: 18130 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:35:20,116-Speed 2493.88 samples/sec Loss 36.7355 LearningRate 0.000219 Epoch: 0 Global Step: 18140 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:35:28,329-Speed 2493.83 samples/sec Loss 36.7606 LearningRate 0.000219 Epoch: 0 Global Step: 18150 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:35:36,546-Speed 2492.94 samples/sec Loss 36.7174 LearningRate 0.000219 Epoch: 0 Global Step: 18160 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:35:44,771-Speed 2490.37 samples/sec Loss 36.7022 LearningRate 0.000219 Epoch: 0 Global Step: 18170 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 
18:35:52,986-Speed 2493.29 samples/sec Loss 36.7193 LearningRate 0.000219 Epoch: 0 Global Step: 18180 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:36:01,144-Speed 2510.84 samples/sec Loss 36.6705 LearningRate 0.000219 Epoch: 0 Global Step: 18190 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:36:09,359-Speed 2493.58 samples/sec Loss 36.6672 LearningRate 0.000219 Epoch: 0 Global Step: 18200 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:36:17,570-Speed 2494.59 samples/sec Loss 36.5855 LearningRate 0.000220 Epoch: 0 Global Step: 18210 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:36:25,780-Speed 2495.00 samples/sec Loss 36.6707 LearningRate 0.000220 Epoch: 0 Global Step: 18220 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:36:33,994-Speed 2493.55 samples/sec Loss 36.6432 LearningRate 0.000220 Epoch: 0 Global Step: 18230 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:36:42,205-Speed 2494.64 samples/sec Loss 36.6692 LearningRate 0.000220 Epoch: 0 Global Step: 18240 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 18:36:50,368-Speed 2509.14 samples/sec Loss 36.6129 LearningRate 0.000220 Epoch: 0 Global Step: 18250 Fp16 Grad Scale: 4096 Required: 186 hours Training: 2022-07-05 18:36:58,583-Speed 2493.50 samples/sec Loss 36.6065 LearningRate 0.000220 Epoch: 0 Global Step: 18260 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 18:37:06,800-Speed 2492.94 samples/sec Loss 36.6026 LearningRate 0.000220 Epoch: 0 Global Step: 18270 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 18:37:15,011-Speed 2494.46 samples/sec Loss 36.5970 LearningRate 0.000220 Epoch: 0 Global Step: 18280 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 18:37:23,234-Speed 2490.89 samples/sec Loss 36.6214 LearningRate 0.000220 Epoch: 0 Global Step: 18290 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 18:37:31,458-Speed 
2490.76 samples/sec Loss 36.6030 LearningRate 0.000221 Epoch: 0 Global Step: 18300 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:37:39,617-Speed 2510.67 samples/sec Loss 36.6453 LearningRate 0.000221 Epoch: 0 Global Step: 18310 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:37:47,831-Speed 2493.55 samples/sec Loss 36.5716 LearningRate 0.000221 Epoch: 0 Global Step: 18320 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:37:56,044-Speed 2494.16 samples/sec Loss 36.6534 LearningRate 0.000221 Epoch: 0 Global Step: 18330 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:38:04,258-Speed 2493.68 samples/sec Loss 36.6274 LearningRate 0.000221 Epoch: 0 Global Step: 18340 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:38:12,468-Speed 2494.99 samples/sec Loss 36.6639 LearningRate 0.000221 Epoch: 0 Global Step: 18350 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:38:20,698-Speed 2488.77 samples/sec Loss 36.6019 LearningRate 0.000221 Epoch: 0 Global Step: 18360 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:38:28,854-Speed 2511.50 samples/sec Loss 36.5484 LearningRate 0.000221 Epoch: 0 Global Step: 18370 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:38:37,071-Speed 2493.19 samples/sec Loss 36.5969 LearningRate 0.000222 Epoch: 0 Global Step: 18380 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:38:45,283-Speed 2494.33 samples/sec Loss 36.5555 LearningRate 0.000222 Epoch: 0 Global Step: 18390 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:38:53,506-Speed 2491.20 samples/sec Loss 36.5655 LearningRate 0.000222 Epoch: 0 Global Step: 18400 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:39:01,718-Speed 2494.65 samples/sec Loss 36.6195 LearningRate 0.000222 Epoch: 0 Global Step: 18410 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:39:09,932-Speed 2493.63 samples/sec Loss 36.5940 LearningRate 0.000222 Epoch: 0 Global Step: 18420 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:39:18,103-Speed 2506.84 samples/sec Loss 36.5572 LearningRate 0.000222 Epoch: 0 Global Step: 18430 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:39:26,319-Speed 2493.02 samples/sec Loss 36.5999 LearningRate 0.000222 Epoch: 0 Global Step: 18440 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:39:34,533-Speed 2493.85 samples/sec Loss 36.6643 LearningRate 0.000222 Epoch: 0 Global Step: 18450 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:39:42,749-Speed 2493.23 samples/sec Loss 36.6024 LearningRate 0.000223 Epoch: 0 Global Step: 18460 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:39:50,964-Speed 2493.36 samples/sec Loss 36.5673 LearningRate 0.000223 Epoch: 0 Global Step: 18470 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:39:59,178-Speed 2493.55 samples/sec Loss 36.5779 LearningRate 0.000223 Epoch: 0 Global Step: 18480 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:40:07,349-Speed 2506.81 samples/sec Loss 36.5136 LearningRate 0.000223 Epoch: 0 Global Step: 18490 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:40:15,562-Speed 2494.12 samples/sec Loss 36.5657 LearningRate 0.000223 Epoch: 0 Global Step: 18500 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:40:23,774-Speed 2494.24 samples/sec Loss 36.5150 LearningRate 0.000223 Epoch: 0 Global Step: 18510 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:40:31,985-Speed 2494.72 samples/sec Loss 36.5506 LearningRate 0.000223 Epoch: 0 Global Step: 18520 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:40:40,198-Speed 2493.86 samples/sec Loss 36.5259 LearningRate 0.000223 Epoch: 0 Global Step: 18530 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:40:48,414-Speed 2493.19 samples/sec Loss 36.4815 LearningRate 0.000223 Epoch: 0 Global Step: 18540 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:40:56,575-Speed 2509.99 samples/sec Loss 36.5800 LearningRate 0.000224 Epoch: 0 Global Step: 18550 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:41:04,798-Speed 2490.86 samples/sec Loss 36.5823 LearningRate 0.000224 Epoch: 0 Global Step: 18560 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:41:13,012-Speed 2493.78 samples/sec Loss 36.7331 LearningRate 0.000224 Epoch: 0 Global Step: 18570 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:41:21,223-Speed 2494.84 samples/sec Loss 36.6814 LearningRate 0.000224 Epoch: 0 Global Step: 18580 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:41:29,437-Speed 2493.89 samples/sec Loss 36.5703 LearningRate 0.000224 Epoch: 0 Global Step: 18590 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:41:37,649-Speed 2494.41 samples/sec Loss 36.5134 LearningRate 0.000224 Epoch: 0 Global Step: 18600 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:41:45,819-Speed 2507.02 samples/sec Loss 36.5029 LearningRate 0.000224 Epoch: 0 Global Step: 18610 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:41:54,032-Speed 2494.03 samples/sec Loss 36.4920 LearningRate 0.000224 Epoch: 0 Global Step: 18620 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:42:02,278-Speed 2483.84 samples/sec Loss 36.4670 LearningRate 0.000225 Epoch: 0 Global Step: 18630 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:42:10,492-Speed 2493.76 samples/sec Loss 36.4622 LearningRate 0.000225 Epoch: 0 Global Step: 18640 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:42:18,703-Speed 2494.72 samples/sec Loss 36.4428 LearningRate 0.000225 Epoch: 0 Global Step: 18650 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:42:26,915-Speed 2494.55 samples/sec Loss 36.4529 LearningRate 0.000225 Epoch: 0 Global Step: 18660 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:42:35,074-Speed 2510.80 samples/sec Loss 36.5199 LearningRate 0.000225 Epoch: 0 Global Step: 18670 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:42:43,287-Speed 2494.27 samples/sec Loss 36.4736 LearningRate 0.000225 Epoch: 0 Global Step: 18680 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:42:51,498-Speed 2494.77 samples/sec Loss 36.4874 LearningRate 0.000225 Epoch: 0 Global Step: 18690 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:42:59,709-Speed 2494.71 samples/sec Loss 36.4592 LearningRate 0.000225 Epoch: 0 Global Step: 18700 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:43:07,925-Speed 2493.16 samples/sec Loss 36.4839 LearningRate 0.000226 Epoch: 0 Global Step: 18710 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:43:16,138-Speed 2494.05 samples/sec Loss 36.4792 LearningRate 0.000226 Epoch: 0 Global Step: 18720 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:43:24,297-Speed 2510.56 samples/sec Loss 36.4261 LearningRate 0.000226 Epoch: 0 Global Step: 18730 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:43:32,510-Speed 2494.10 samples/sec Loss 36.5177 LearningRate 0.000226 Epoch: 0 Global Step: 18740 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:43:40,722-Speed 2494.10 samples/sec Loss 36.5042 LearningRate 0.000226 Epoch: 0 Global Step: 18750 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:43:48,936-Speed 2494.08 samples/sec Loss 36.4836 LearningRate 0.000226 Epoch: 0 Global Step: 18760 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:43:57,148-Speed 2494.39 samples/sec Loss 36.3928 LearningRate 0.000226 Epoch: 0 Global Step: 18770 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:44:05,364-Speed 2492.96 samples/sec Loss 36.4145 LearningRate 0.000226 Epoch: 0 Global Step: 18780 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:44:13,523-Speed 2510.66 samples/sec Loss 36.4833 LearningRate 0.000226 Epoch: 0 Global Step: 18790 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:44:21,734-Speed 2494.55 samples/sec Loss 36.4468 LearningRate 0.000227 Epoch: 0 Global Step: 18800 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:44:29,949-Speed 2493.61 samples/sec Loss 36.3932 LearningRate 0.000227 Epoch: 0 Global Step: 18810 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:44:38,166-Speed 2492.76 samples/sec Loss 36.4664 LearningRate 0.000227 Epoch: 0 Global Step: 18820 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:44:46,375-Speed 2495.03 samples/sec Loss 36.3791 LearningRate 0.000227 Epoch: 0 Global Step: 18830 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:44:54,590-Speed 2493.54 samples/sec Loss 36.3473 LearningRate 0.000227 Epoch: 0 Global Step: 18840 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:45:02,749-Speed 2510.68 samples/sec Loss 36.3415 LearningRate 0.000227 Epoch: 0 Global Step: 18850 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:45:10,964-Speed 2493.37 samples/sec Loss 36.2715 LearningRate 0.000227 Epoch: 0 Global Step: 18860 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:45:19,174-Speed 2494.98 samples/sec Loss 36.3365 LearningRate 0.000227 Epoch: 0 Global Step: 18870 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:45:27,384-Speed 2494.92 samples/sec Loss 36.2910 LearningRate 0.000228 Epoch: 0 Global Step: 18880 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:45:35,604-Speed 2491.93 samples/sec Loss 36.2860 LearningRate 0.000228 Epoch: 0 Global Step: 18890 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:45:43,831-Speed 2489.57 samples/sec Loss 36.3136 LearningRate 0.000228 Epoch: 0 Global Step: 18900 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:45:51,992-Speed 2509.93 samples/sec Loss 36.3898 LearningRate 0.000228 Epoch: 0 Global Step: 18910 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:46:00,201-Speed 2495.47 samples/sec Loss 36.3575 LearningRate 0.000228 Epoch: 0 Global Step: 18920 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:46:08,410-Speed 2495.20 samples/sec Loss 36.3653 LearningRate 0.000228 Epoch: 0 Global Step: 18930 Fp16 Grad Scale: 8192 Required: 185 hours
Training: 2022-07-05 18:46:16,624-Speed 2494.34 samples/sec Loss 36.3164 LearningRate 0.000228 Epoch: 0 Global Step: 18940 Fp16 Grad Scale: 8192 Required: 185 hours
Training: 2022-07-05 18:46:24,838-Speed 2493.60 samples/sec Loss 36.3725 LearningRate 0.000228 Epoch: 0 Global Step: 18950 Fp16 Grad Scale: 8192 Required: 185 hours
Training: 2022-07-05 18:46:33,063-Speed 2490.46 samples/sec Loss 36.2858 LearningRate 0.000229 Epoch: 0 Global Step: 18960 Fp16 Grad Scale: 8192 Required: 185 hours
Training: 2022-07-05 18:46:41,235-Speed 2506.53 samples/sec Loss 36.2361 LearningRate 0.000229 Epoch: 0 Global Step: 18970 Fp16 Grad Scale: 8192 Required: 185 hours
Training: 2022-07-05 18:46:49,450-Speed 2493.43 samples/sec Loss 36.3492 LearningRate 0.000229 Epoch: 0 Global Step: 18980 Fp16 Grad Scale: 8192 Required: 185 hours
Training: 2022-07-05 18:46:57,620-Speed 2507.40 samples/sec Loss 36.3505 LearningRate 0.000229 Epoch: 0 Global Step: 18990 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:47:05,833-Speed 2493.84 samples/sec Loss 36.4138 LearningRate 0.000229 Epoch: 0 Global Step: 19000 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:47:14,043-Speed 2494.91 samples/sec Loss 36.3372 LearningRate 0.000229 Epoch: 0 Global Step: 19010 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:47:22,258-Speed 2493.42 samples/sec Loss 36.3800 LearningRate 0.000229 Epoch: 0 Global Step: 19020 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:47:30,413-Speed 2511.68 samples/sec Loss 36.3364 LearningRate 0.000229 Epoch: 0 Global Step: 19030 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:47:38,624-Speed 2494.74 samples/sec Loss 36.3038 LearningRate 0.000230 Epoch: 0 Global Step: 19040 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:47:46,835-Speed 2494.60 samples/sec Loss 36.3341 LearningRate 0.000230 Epoch: 0 Global Step: 19050 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:47:55,053-Speed 2492.58 samples/sec Loss 36.2977 LearningRate 0.000230 Epoch: 0 Global Step: 19060 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:48:03,265-Speed 2494.39 samples/sec Loss 36.3090 LearningRate 0.000230 Epoch: 0 Global Step: 19070 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:48:11,481-Speed 2493.08 samples/sec Loss 36.3214 LearningRate 0.000230 Epoch: 0 Global Step: 19080 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:48:19,638-Speed 2511.05 samples/sec Loss 36.2610 LearningRate 0.000230 Epoch: 0 Global Step: 19090 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:48:27,853-Speed 2493.21 samples/sec Loss 36.2589 LearningRate 0.000230 Epoch: 0 Global Step: 19100 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:48:36,063-Speed 2494.84 samples/sec Loss 36.2412 LearningRate 0.000230 Epoch: 0 Global Step: 19110 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:48:44,289-Speed 2490.18 samples/sec Loss 36.2854 LearningRate 0.000230 Epoch: 0 Global Step: 19120 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:48:52,499-Speed 2494.99 samples/sec Loss 36.2676 LearningRate 0.000231 Epoch: 0 Global Step: 19130 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:49:00,711-Speed 2494.33 samples/sec Loss 36.3119 LearningRate 0.000231 Epoch: 0 Global Step: 19140 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:49:08,868-Speed 2511.17 samples/sec Loss 36.2679 LearningRate 0.000231 Epoch: 0 Global Step: 19150 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:49:17,079-Speed 2494.56 samples/sec Loss 36.2220 LearningRate 0.000231 Epoch: 0 Global Step: 19160 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:49:25,296-Speed 2492.83 samples/sec Loss 36.1620 LearningRate 0.000231 Epoch: 0 Global Step: 19170 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:49:33,507-Speed 2494.68 samples/sec Loss 36.2081 LearningRate 0.000231 Epoch: 0 Global Step: 19180 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:49:41,718-Speed 2494.52 samples/sec Loss 36.1780 LearningRate 0.000231 Epoch: 0 Global Step: 19190 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:49:49,927-Speed 2495.08 samples/sec Loss 36.1279 LearningRate 0.000231 Epoch: 0 Global Step: 19200 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:49:58,084-Speed 2511.07 samples/sec Loss 36.1920 LearningRate 0.000232 Epoch: 0 Global Step: 19210 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:50:06,299-Speed 2493.60 samples/sec Loss 36.1746 LearningRate 0.000232 Epoch: 0 Global Step: 19220 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:50:14,514-Speed 2493.61 samples/sec Loss 36.1552 LearningRate 0.000232 Epoch: 0 Global Step: 19230 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:50:22,725-Speed 2494.57 samples/sec Loss 36.1638 LearningRate 0.000232 Epoch: 0 Global Step: 19240 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:50:30,935-Speed 2494.78 samples/sec Loss 36.1746 LearningRate 0.000232 Epoch: 0 Global Step: 19250 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:50:39,146-Speed 2494.59 samples/sec Loss 36.1105 LearningRate 0.000232 Epoch: 0 Global Step: 19260 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:50:47,306-Speed 2510.45 samples/sec Loss 36.1049 LearningRate 0.000232 Epoch: 0 Global Step: 19270 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:50:55,520-Speed 2494.10 samples/sec Loss 36.0705 LearningRate 0.000232 Epoch: 0 Global Step: 19280 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:51:03,729-Speed 2495.12 samples/sec Loss 36.2084 LearningRate 0.000233 Epoch: 0 Global Step: 19290 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:51:11,943-Speed 2493.73 samples/sec Loss 36.1557 LearningRate 0.000233 Epoch: 0 Global Step: 19300 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:51:20,155-Speed 2494.45 samples/sec Loss 36.1112 LearningRate 0.000233 Epoch: 0 Global Step: 19310 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:51:28,363-Speed 2495.29 samples/sec Loss 36.1784 LearningRate 0.000233 Epoch: 0 Global Step: 19320 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:51:36,520-Speed 2511.23 samples/sec Loss 36.1609 LearningRate 0.000233 Epoch: 0 Global Step: 19330 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:51:44,728-Speed 2495.49 samples/sec Loss 36.1518 LearningRate 0.000233 Epoch: 0 Global Step: 19340 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:51:52,940-Speed 2494.45 samples/sec Loss 36.2237 LearningRate 0.000233 Epoch: 0 Global Step: 19350 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:52:01,151-Speed 2494.59 samples/sec Loss 36.1838 LearningRate 0.000233 Epoch: 0 Global Step: 19360 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:52:09,363-Speed 2494.15 samples/sec Loss 36.1140 LearningRate 0.000233 Epoch: 0 Global Step: 19370 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:52:17,573-Speed 2494.95 samples/sec Loss 36.1576 LearningRate 0.000234 Epoch: 0 Global Step: 19380 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:52:25,734-Speed 2510.00 samples/sec Loss 36.0617 LearningRate 0.000234 Epoch: 0 Global Step: 19390 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:52:33,945-Speed 2494.53 samples/sec Loss 36.1044 LearningRate 0.000234 Epoch: 0 Global Step: 19400 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:52:42,160-Speed 2493.28 samples/sec Loss 36.1591 LearningRate 0.000234 Epoch: 0 Global Step: 19410 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:52:50,396-Speed 2487.16 samples/sec Loss 36.1192 LearningRate 0.000234 Epoch: 0 Global Step: 19420 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:52:58,608-Speed 2494.24 samples/sec Loss 36.1833 LearningRate 0.000234 Epoch: 0 Global Step: 19430 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:53:06,829-Speed 2491.48 samples/sec Loss 36.0685 LearningRate 0.000234 Epoch: 0 Global Step: 19440 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:53:14,984-Speed 2511.67 samples/sec Loss 36.0932 LearningRate 0.000234 Epoch: 0 Global Step: 19450 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:53:23,194-Speed 2494.99 samples/sec Loss 36.0882 LearningRate 0.000235 Epoch: 0 Global Step: 19460 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:53:31,408-Speed 2493.74 samples/sec Loss 36.0608 LearningRate 0.000235 Epoch: 0 Global Step: 19470 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:53:39,632-Speed 2490.62 samples/sec Loss 36.0465 LearningRate 0.000235 Epoch: 0 Global Step: 19480 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:53:47,843-Speed 2495.05 samples/sec Loss 36.0499 LearningRate 0.000235 Epoch: 0 Global Step: 19490 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:53:56,059-Speed 2493.22 samples/sec Loss 36.1094 LearningRate 0.000235 Epoch: 0 Global Step: 19500 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:54:04,219-Speed 2510.10 samples/sec Loss 36.0762 LearningRate 0.000235 Epoch: 0 Global Step: 19510 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:54:12,430-Speed 2494.59 samples/sec Loss 36.0000 LearningRate 0.000235 Epoch: 0 Global Step: 19520 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:54:20,651-Speed 2491.76 samples/sec Loss 36.0887 LearningRate 0.000235 Epoch: 0 Global Step: 19530 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:54:28,865-Speed 2493.99 samples/sec Loss 36.0802 LearningRate 0.000236 Epoch: 0 Global Step: 19540 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:54:37,076-Speed 2494.51 samples/sec Loss 36.0726 LearningRate 0.000236 Epoch: 0 Global Step: 19550 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:54:45,287-Speed 2494.75 samples/sec Loss 36.0036 LearningRate 0.000236 Epoch: 0 Global Step: 19560 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:54:53,445-Speed 2510.77 samples/sec Loss 36.0159 LearningRate 0.000236 Epoch: 0 Global Step: 19570 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:55:01,655-Speed 2494.98 samples/sec Loss 36.0498 LearningRate 0.000236 Epoch: 0 Global Step: 19580 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:55:09,868-Speed 2494.11 samples/sec Loss 36.0725 LearningRate 0.000236 Epoch: 0 Global Step: 19590 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:55:18,082-Speed 2493.58 samples/sec Loss 36.1285 LearningRate 0.000236 Epoch: 0 Global Step: 19600 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:55:26,293-Speed 2494.60 samples/sec Loss 36.1355 LearningRate 0.000236 Epoch: 0 Global Step: 19610 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:55:34,504-Speed 2494.53 samples/sec Loss 36.0986 LearningRate 0.000236 Epoch: 0 Global Step: 19620 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:55:42,663-Speed 2510.50 samples/sec Loss 36.1273 LearningRate 0.000237 Epoch: 0 Global Step: 19630 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:55:50,873-Speed 2495.05 samples/sec Loss 36.1812 LearningRate 0.000237 Epoch: 0 Global Step: 19640 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:55:59,084-Speed 2494.50 samples/sec Loss 36.1362 LearningRate 0.000237 Epoch: 0 Global Step: 19650 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:56:07,298-Speed 2493.76 samples/sec Loss 36.0666 LearningRate 0.000237 Epoch: 0 Global Step: 19660 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:56:15,508-Speed 2495.00 samples/sec Loss 36.1181 LearningRate 0.000237 Epoch: 0 Global Step: 19670 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:56:23,731-Speed 2491.04 samples/sec Loss 36.1112 LearningRate 0.000237 Epoch: 0 Global Step: 19680 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:56:31,900-Speed 2507.52 samples/sec Loss 36.0128 LearningRate 0.000237 Epoch: 0 Global Step: 19690 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:56:40,114-Speed 2493.57 samples/sec Loss 36.0519 LearningRate 0.000237 Epoch: 0 Global Step: 19700 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:56:48,325-Speed 2494.40 samples/sec Loss 35.9998 LearningRate 0.000238 Epoch: 0 Global Step: 19710 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:56:56,537-Speed 2494.45 samples/sec Loss 36.0243 LearningRate 0.000238 Epoch: 0 Global Step: 19720 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:57:04,752-Speed 2493.32 samples/sec Loss 35.9466 LearningRate 0.000238 Epoch: 0 Global Step: 19730 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:57:12,970-Speed 2492.58 samples/sec Loss 36.0214 LearningRate 0.000238 Epoch: 0 Global Step: 19740 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:57:21,131-Speed 2509.80 samples/sec Loss 36.0704 LearningRate 0.000238 Epoch: 0 Global Step: 19750 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:57:29,343-Speed 2494.52 samples/sec Loss 36.0448 LearningRate 0.000238 Epoch: 0 Global Step: 19760 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:57:37,552-Speed 2495.01 samples/sec Loss 35.9814 LearningRate 0.000238 Epoch: 0 Global Step: 19770 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:57:45,764-Speed 2494.23 samples/sec Loss 35.9571 LearningRate 0.000238 Epoch: 0 Global Step: 19780 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:57:53,975-Speed 2494.73 samples/sec Loss 35.9910 LearningRate 0.000239 Epoch: 0 Global Step: 19790 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:58:02,186-Speed 2494.73 samples/sec Loss 35.9679 LearningRate 0.000239 Epoch: 0 Global Step: 19800 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:58:10,344-Speed 2510.64 samples/sec Loss 35.9578 LearningRate 0.000239 Epoch: 0 Global Step: 19810 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:58:18,557-Speed 2494.03 samples/sec Loss 35.9208 LearningRate 0.000239 Epoch: 0 Global Step: 19820 Fp16 Grad Scale: 4096 Required: 185 hours
Training: 2022-07-05 18:58:26,723-Speed 2508.59 samples/sec Loss 35.8498 LearningRate 0.000239 Epoch: 0 Global Step: 19830 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 18:58:34,936-Speed 2493.95 samples/sec Loss 35.9146 LearningRate 0.000239 Epoch: 0 Global Step: 19840 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 18:58:43,146-Speed 2494.75 samples/sec Loss 35.8450 LearningRate 0.000239 Epoch: 0 Global Step: 19850 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 18:58:51,363-Speed 2493.15 samples/sec Loss 35.9029 LearningRate 0.000239 Epoch: 0 Global Step: 19860 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 18:58:59,530-Speed 2507.99 samples/sec Loss 35.8606 LearningRate 0.000240 Epoch: 0 Global Step: 19870 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 18:59:07,755-Speed 2490.21 samples/sec Loss 35.8442 LearningRate 0.000240 Epoch: 0 Global Step: 19880 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 18:59:15,967-Speed 2494.19 samples/sec Loss 35.8242 LearningRate 0.000240 Epoch: 0 Global Step: 19890 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 18:59:24,176-Speed 2495.19 samples/sec Loss 35.8310 LearningRate 0.000240 Epoch: 0 Global Step: 19900 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 18:59:32,390-Speed 2493.84 samples/sec Loss 35.8289 LearningRate 0.000240 Epoch: 0 Global Step: 19910 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 18:59:40,607-Speed 2492.61 samples/sec Loss 35.8284 LearningRate 0.000240 Epoch: 0 Global Step: 19920 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 18:59:48,761-Speed 2512.14 samples/sec Loss 35.8401 LearningRate 0.000240 Epoch: 0 Global Step: 19930 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 18:59:56,975-Speed 2493.58 samples/sec Loss 35.8700 LearningRate 0.000240 Epoch: 0 Global Step: 19940 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:00:05,187-Speed 2494.35 samples/sec Loss 35.7699 LearningRate 0.000240 Epoch: 0 Global Step: 19950 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:00:13,399-Speed 2494.51 samples/sec Loss 35.8118 LearningRate 0.000241 Epoch: 0 Global Step: 19960 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:00:21,612-Speed 2493.96 samples/sec Loss 35.7986 LearningRate 0.000241 Epoch: 0 Global Step: 19970 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:00:29,838-Speed 2490.49 samples/sec Loss 35.8083 LearningRate 0.000241 Epoch: 0 Global Step: 19980 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:00:37,995-Speed 2511.22 samples/sec Loss 35.8120 LearningRate 0.000241 Epoch: 0 Global Step: 19990 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:00:46,205-Speed 2494.84 samples/sec Loss 35.8296 LearningRate 0.000241 Epoch: 0 Global Step: 20000 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:00:54,417-Speed 2494.54 samples/sec Loss 35.8088 LearningRate 0.000241 Epoch: 0 Global Step: 20010 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:01:02,626-Speed 2495.01 samples/sec Loss 35.8535 LearningRate 0.000241 Epoch: 0 Global Step: 20020 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:01:10,843-Speed 2492.98 samples/sec Loss 35.8686 LearningRate 0.000241 Epoch: 0 Global Step: 20030 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:01:19,056-Speed 2494.12 samples/sec Loss 35.9188 LearningRate 0.000242 Epoch: 0 Global Step: 20040 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:01:27,214-Speed 2510.61 samples/sec Loss 35.7218 LearningRate 0.000242 Epoch: 0 Global Step: 20050 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:01:35,422-Speed 2495.38 samples/sec Loss 35.8413 LearningRate 0.000242 Epoch: 0 Global Step: 20060 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:01:43,629-Speed 2495.85 samples/sec Loss 35.8169 LearningRate 0.000242 Epoch: 0 Global Step: 20070 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:01:51,839-Speed 2494.85 samples/sec Loss 35.7912 LearningRate 0.000242 Epoch: 0 Global Step: 20080 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:02:00,044-Speed 2496.44 samples/sec Loss 35.7785 LearningRate 0.000242 Epoch: 0 Global Step: 20090 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:02:08,255-Speed 2494.57 samples/sec Loss 35.7508 LearningRate 0.000242 Epoch: 0 Global Step: 20100 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:02:16,414-Speed 2510.49 samples/sec Loss 35.7455 LearningRate 0.000242 Epoch: 0 Global Step: 20110 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:02:24,636-Speed 2491.36 samples/sec Loss 35.6854 LearningRate 0.000243 Epoch: 0 Global Step: 20120 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:02:32,850-Speed 2493.62 samples/sec Loss 35.7759 LearningRate 0.000243 Epoch: 0 Global Step: 20130 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:02:41,064-Speed 2493.88 samples/sec Loss 35.7479 LearningRate 0.000243 Epoch: 0 Global Step: 20140 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:02:49,285-Speed 2491.63 samples/sec Loss 35.6788 LearningRate 0.000243 Epoch: 0 Global Step: 20150 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:02:57,498-Speed 2494.02 samples/sec Loss 35.6983 LearningRate 0.000243 Epoch: 0 Global Step: 20160 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:03:05,653-Speed 2511.87 samples/sec Loss 35.7191 LearningRate 0.000243 Epoch: 0 Global Step: 20170 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:03:13,864-Speed 2494.76 samples/sec Loss 35.6519 LearningRate 0.000243 Epoch: 0 Global Step: 20180 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:03:22,075-Speed 2494.63 samples/sec Loss 35.6515 LearningRate 0.000243 Epoch: 0 Global Step: 20190 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:03:30,286-Speed 2494.41 samples/sec Loss 35.6669 LearningRate 0.000243 Epoch: 0 Global Step: 20200 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:03:38,497-Speed 2494.59 samples/sec Loss 35.6664 LearningRate 0.000244 Epoch: 0 Global Step: 20210 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:03:46,707-Speed 2495.21 samples/sec Loss 35.6232 LearningRate 0.000244 Epoch: 0 Global Step: 20220 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:03:54,871-Speed 2508.91 samples/sec Loss 35.7366 LearningRate 0.000244 Epoch: 0 Global Step: 20230 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:04:03,086-Speed 2493.45 samples/sec Loss 35.7356 LearningRate 0.000244 Epoch: 0 Global Step: 20240 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:04:11,295-Speed 2495.45 samples/sec Loss 35.7755 LearningRate 0.000244 Epoch: 0 Global Step: 20250 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:04:19,506-Speed 2494.57 samples/sec Loss 35.6582 LearningRate 0.000244 Epoch: 0 Global Step: 20260 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:04:27,715-Speed 2495.06 samples/sec Loss 35.6695 LearningRate 0.000244 Epoch: 0 Global Step: 20270 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:04:35,926-Speed 2494.71 samples/sec Loss 35.6408 LearningRate 0.000244 Epoch: 0 Global Step: 20280 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:04:44,084-Speed 2510.81 samples/sec Loss 35.6345 LearningRate 0.000245 Epoch: 0 Global Step: 20290 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:04:52,300-Speed 2493.46 samples/sec Loss 35.5763 LearningRate 0.000245 Epoch: 0 Global Step: 20300 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:05:00,515-Speed 2493.18 samples/sec Loss 35.6424 LearningRate 0.000245 Epoch: 0 Global Step: 20310 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:05:08,734-Speed 2492.52 samples/sec Loss 35.6445 LearningRate 0.000245 Epoch: 0 Global Step: 20320 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:05:16,945-Speed 2494.73 samples/sec Loss 35.6697 LearningRate 0.000245 Epoch: 0 Global Step: 20330 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:05:25,157-Speed 2494.22 samples/sec Loss 35.6475 LearningRate 0.000245 Epoch: 0 Global Step: 20340 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:05:33,316-Speed 2510.58 samples/sec Loss 35.5744 LearningRate 0.000245 Epoch: 0 Global Step: 20350 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:05:41,527-Speed 2494.35 samples/sec Loss 35.6277 LearningRate 0.000245 Epoch: 0 Global Step: 20360 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:05:49,739-Speed 2494.52 samples/sec Loss 35.6061 LearningRate 0.000246 Epoch: 0 Global Step: 20370 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:05:57,956-Speed 2492.72 samples/sec Loss 35.6401 LearningRate 0.000246 Epoch: 0 Global Step: 20380 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:06:06,166-Speed 2494.97 samples/sec Loss 35.6224 LearningRate 0.000246 Epoch: 0 Global Step: 20390 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:06:14,374-Speed 2495.37 samples/sec Loss 35.6524 LearningRate 0.000246 Epoch: 0 Global Step: 20400 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:06:22,535-Speed 2510.24 samples/sec Loss 35.6083 LearningRate 0.000246 Epoch: 0 Global Step: 20410 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:06:30,751-Speed 2493.02 samples/sec Loss 35.6223 LearningRate 0.000246 Epoch: 0 Global Step: 20420 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:06:38,963-Speed 2494.07 samples/sec Loss 35.5134 LearningRate 0.000246 Epoch: 0 Global Step: 20430 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:06:47,187-Speed 2490.80 samples/sec Loss 35.5220 LearningRate 0.000246 Epoch: 0 Global Step: 20440 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:06:55,402-Speed 2493.51 samples/sec Loss 35.5021 LearningRate 0.000247 Epoch: 0 Global Step: 20450 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:07:03,611-Speed 2495.32 samples/sec Loss 35.5013 LearningRate 0.000247 Epoch: 0 Global Step: 20460 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:07:11,768-Speed 2511.11 samples/sec Loss 35.5270 LearningRate 0.000247 Epoch: 0 Global Step: 20470 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:07:19,979-Speed 2494.89 samples/sec Loss 35.5221 LearningRate 0.000247 Epoch: 0 Global Step: 20480 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:07:28,187-Speed 2495.49 samples/sec Loss 35.5435 LearningRate 0.000247 Epoch: 0 Global Step: 20490 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:07:36,400-Speed 2493.97 samples/sec Loss 35.5937 LearningRate 0.000247 Epoch: 0 Global Step: 20500 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:07:44,615-Speed 2493.36 samples/sec Loss 35.5200 LearningRate 0.000247 Epoch: 0 Global Step: 20510 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:07:52,833-Speed 2492.42 samples/sec Loss 35.4899 LearningRate 0.000247 Epoch: 0 Global Step: 20520 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:08:00,994-Speed 2510.14 samples/sec Loss 35.5299 LearningRate 0.000247 Epoch: 0 Global Step: 20530 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:08:09,203-Speed 2495.22 samples/sec Loss 35.5058 LearningRate 0.000248 Epoch: 0 Global Step: 20540 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:08:17,416-Speed 2494.00 samples/sec Loss 35.4953 LearningRate 0.000248 Epoch: 0 Global Step: 20550 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:08:25,626-Speed 2494.90 samples/sec Loss 35.4481 LearningRate 0.000248 Epoch: 0 Global Step: 20560 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:08:33,838-Speed 2494.29 samples/sec Loss 35.4822 LearningRate 0.000248 Epoch: 0 Global Step: 20570 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:08:42,053-Speed 2493.67 samples/sec Loss 35.4830 LearningRate 0.000248 Epoch: 0 Global Step: 20580 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:08:50,211-Speed 2510.91 samples/sec Loss 35.4245 LearningRate 0.000248 Epoch: 0 Global Step: 20590 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:08:58,421-Speed 2494.79 samples/sec Loss 35.4272 LearningRate 0.000248 Epoch: 0 Global Step: 20600 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:09:06,632-Speed 2494.82 samples/sec Loss 35.5125 LearningRate 0.000248 Epoch: 0 Global Step: 20610 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:09:14,843-Speed 2494.69 samples/sec Loss 35.4225 LearningRate 0.000249 Epoch: 0 Global Step: 20620 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:09:23,053-Speed 2494.70 samples/sec Loss 35.4063 LearningRate 0.000249 Epoch: 0 Global Step: 20630 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:09:31,275-Speed 2491.47 samples/sec Loss 35.4020 LearningRate 0.000249 Epoch: 0 Global Step: 20640 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:09:39,437-Speed 2509.58 samples/sec Loss 35.4071 LearningRate 0.000249 Epoch: 0 Global Step: 20650 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:09:47,646-Speed 2495.33 samples/sec Loss 35.3895 LearningRate 0.000249 Epoch: 0 Global Step: 20660 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:09:55,857-Speed 2494.30 samples/sec Loss 35.4648 LearningRate 0.000249 Epoch: 0 Global Step: 20670 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:10:04,075-Speed 2492.72 samples/sec Loss 35.4542 LearningRate 0.000249 Epoch: 0 Global Step: 20680 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:10:12,291-Speed 2493.08 samples/sec Loss 35.4609 LearningRate 0.000249 Epoch: 0 Global Step: 20690 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:10:20,521-Speed 2488.83 samples/sec Loss 35.4752 LearningRate 0.000250 Epoch: 0 Global Step: 20700 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:10:28,675-Speed 2512.12 samples/sec Loss 35.4523 LearningRate 0.000250 Epoch: 0 Global Step: 20710 Fp16 Grad Scale: 2048 Required: 185 hours
Training: 2022-07-05 19:10:36,881-Speed
2496.21 samples/sec Loss 35.4721 LearningRate 0.000250 Epoch: 0 Global Step: 20720 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:10:45,091-Speed 2494.99 samples/sec Loss 35.5310 LearningRate 0.000250 Epoch: 0 Global Step: 20730 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:10:53,310-Speed 2492.01 samples/sec Loss 35.5499 LearningRate 0.000250 Epoch: 0 Global Step: 20740 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:11:03,978-Speed 1920.25 samples/sec Loss 35.4756 LearningRate 0.000250 Epoch: 1 Global Step: 20750 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:11:12,182-Speed 2496.76 samples/sec Loss 35.4556 LearningRate 0.000250 Epoch: 1 Global Step: 20760 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:11:20,343-Speed 2510.05 samples/sec Loss 35.4586 LearningRate 0.000250 Epoch: 1 Global Step: 20770 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:11:28,548-Speed 2496.46 samples/sec Loss 35.4161 LearningRate 0.000250 Epoch: 1 Global Step: 20780 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:11:36,752-Speed 2497.09 samples/sec Loss 35.3611 LearningRate 0.000251 Epoch: 1 Global Step: 20790 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:11:44,957-Speed 2496.30 samples/sec Loss 35.3174 LearningRate 0.000251 Epoch: 1 Global Step: 20800 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:11:53,162-Speed 2496.50 samples/sec Loss 35.3098 LearningRate 0.000251 Epoch: 1 Global Step: 20810 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:12:01,371-Speed 2494.98 samples/sec Loss 35.3585 LearningRate 0.000251 Epoch: 1 Global Step: 20820 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:12:09,531-Speed 2510.43 samples/sec Loss 35.3169 LearningRate 0.000251 Epoch: 1 Global Step: 20830 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:12:17,736-Speed 2496.30 samples/sec 
Loss 35.4133 LearningRate 0.000251 Epoch: 1 Global Step: 20840 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:12:25,949-Speed 2494.18 samples/sec Loss 35.3519 LearningRate 0.000251 Epoch: 1 Global Step: 20850 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:12:34,154-Speed 2496.19 samples/sec Loss 35.3766 LearningRate 0.000251 Epoch: 1 Global Step: 20860 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:12:42,370-Speed 2493.53 samples/sec Loss 35.3077 LearningRate 0.000252 Epoch: 1 Global Step: 20870 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:12:50,579-Speed 2495.12 samples/sec Loss 35.3385 LearningRate 0.000252 Epoch: 1 Global Step: 20880 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:12:58,734-Speed 2511.79 samples/sec Loss 35.3411 LearningRate 0.000252 Epoch: 1 Global Step: 20890 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:13:06,957-Speed 2491.21 samples/sec Loss 35.3141 LearningRate 0.000252 Epoch: 1 Global Step: 20900 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:13:15,164-Speed 2495.84 samples/sec Loss 35.2801 LearningRate 0.000252 Epoch: 1 Global Step: 20910 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:13:23,377-Speed 2494.23 samples/sec Loss 35.2259 LearningRate 0.000252 Epoch: 1 Global Step: 20920 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:13:31,583-Speed 2495.85 samples/sec Loss 35.2613 LearningRate 0.000252 Epoch: 1 Global Step: 20930 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:13:39,794-Speed 2495.16 samples/sec Loss 35.2347 LearningRate 0.000252 Epoch: 1 Global Step: 20940 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:13:47,950-Speed 2511.18 samples/sec Loss 35.2500 LearningRate 0.000253 Epoch: 1 Global Step: 20950 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:13:56,159-Speed 2495.25 samples/sec Loss 35.2227 
LearningRate 0.000253 Epoch: 1 Global Step: 20960 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:14:04,367-Speed 2495.76 samples/sec Loss 35.3250 LearningRate 0.000253 Epoch: 1 Global Step: 20970 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:14:12,574-Speed 2495.75 samples/sec Loss 35.2366 LearningRate 0.000253 Epoch: 1 Global Step: 20980 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:14:20,786-Speed 2494.31 samples/sec Loss 35.2596 LearningRate 0.000253 Epoch: 1 Global Step: 20990 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:14:28,996-Speed 2494.74 samples/sec Loss 35.3150 LearningRate 0.000253 Epoch: 1 Global Step: 21000 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:14:37,152-Speed 2511.64 samples/sec Loss 35.2755 LearningRate 0.000253 Epoch: 1 Global Step: 21010 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:14:45,362-Speed 2494.98 samples/sec Loss 35.2182 LearningRate 0.000253 Epoch: 1 Global Step: 21020 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:14:53,571-Speed 2494.98 samples/sec Loss 35.2428 LearningRate 0.000253 Epoch: 1 Global Step: 21030 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:15:01,782-Speed 2494.48 samples/sec Loss 35.2753 LearningRate 0.000254 Epoch: 1 Global Step: 21040 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:15:09,992-Speed 2495.19 samples/sec Loss 35.3084 LearningRate 0.000254 Epoch: 1 Global Step: 21050 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:15:18,199-Speed 2495.82 samples/sec Loss 35.2996 LearningRate 0.000254 Epoch: 1 Global Step: 21060 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:15:26,355-Speed 2511.35 samples/sec Loss 35.2270 LearningRate 0.000254 Epoch: 1 Global Step: 21070 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:15:34,567-Speed 2494.56 samples/sec Loss 35.2001 LearningRate 
0.000254 Epoch: 1 Global Step: 21080 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:15:42,771-Speed 2496.73 samples/sec Loss 35.2224 LearningRate 0.000254 Epoch: 1 Global Step: 21090 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:15:50,979-Speed 2495.77 samples/sec Loss 35.1940 LearningRate 0.000254 Epoch: 1 Global Step: 21100 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:15:59,186-Speed 2495.55 samples/sec Loss 35.1863 LearningRate 0.000254 Epoch: 1 Global Step: 21110 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:16:07,394-Speed 2495.66 samples/sec Loss 35.2326 LearningRate 0.000255 Epoch: 1 Global Step: 21120 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:16:15,548-Speed 2512.18 samples/sec Loss 35.1531 LearningRate 0.000255 Epoch: 1 Global Step: 21130 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:16:23,768-Speed 2491.86 samples/sec Loss 35.2474 LearningRate 0.000255 Epoch: 1 Global Step: 21140 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:16:31,975-Speed 2495.63 samples/sec Loss 35.2278 LearningRate 0.000255 Epoch: 1 Global Step: 21150 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:16:40,186-Speed 2494.94 samples/sec Loss 35.1589 LearningRate 0.000255 Epoch: 1 Global Step: 21160 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:16:48,396-Speed 2494.81 samples/sec Loss 35.1638 LearningRate 0.000255 Epoch: 1 Global Step: 21170 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:16:56,607-Speed 2494.53 samples/sec Loss 35.2294 LearningRate 0.000255 Epoch: 1 Global Step: 21180 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:17:04,777-Speed 2507.31 samples/sec Loss 35.2689 LearningRate 0.000255 Epoch: 1 Global Step: 21190 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:17:12,984-Speed 2495.84 samples/sec Loss 35.1969 LearningRate 0.000256 Epoch: 1 
Global Step: 21200 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:17:21,194-Speed 2494.98 samples/sec Loss 35.0823 LearningRate 0.000256 Epoch: 1 Global Step: 21210 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:17:29,404-Speed 2494.89 samples/sec Loss 35.1521 LearningRate 0.000256 Epoch: 1 Global Step: 21220 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:17:37,609-Speed 2496.25 samples/sec Loss 35.1805 LearningRate 0.000256 Epoch: 1 Global Step: 21230 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:17:45,829-Speed 2491.98 samples/sec Loss 35.1581 LearningRate 0.000256 Epoch: 1 Global Step: 21240 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:17:53,981-Speed 2512.91 samples/sec Loss 35.1289 LearningRate 0.000256 Epoch: 1 Global Step: 21250 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:18:02,186-Speed 2496.36 samples/sec Loss 35.0739 LearningRate 0.000256 Epoch: 1 Global Step: 21260 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:18:10,396-Speed 2494.97 samples/sec Loss 35.0950 LearningRate 0.000256 Epoch: 1 Global Step: 21270 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:18:18,612-Speed 2493.12 samples/sec Loss 35.1067 LearningRate 0.000257 Epoch: 1 Global Step: 21280 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:18:26,819-Speed 2496.47 samples/sec Loss 34.9884 LearningRate 0.000257 Epoch: 1 Global Step: 21290 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:18:35,025-Speed 2496.01 samples/sec Loss 35.0288 LearningRate 0.000257 Epoch: 1 Global Step: 21300 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:18:43,183-Speed 2510.68 samples/sec Loss 35.0514 LearningRate 0.000257 Epoch: 1 Global Step: 21310 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:18:51,390-Speed 2495.99 samples/sec Loss 35.0837 LearningRate 0.000257 Epoch: 1 Global Step: 21320 
Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:18:59,599-Speed 2495.43 samples/sec Loss 34.9906 LearningRate 0.000257 Epoch: 1 Global Step: 21330 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:19:07,808-Speed 2495.17 samples/sec Loss 34.9669 LearningRate 0.000257 Epoch: 1 Global Step: 21340 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:19:16,018-Speed 2495.04 samples/sec Loss 35.0064 LearningRate 0.000257 Epoch: 1 Global Step: 21350 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:19:24,231-Speed 2494.09 samples/sec Loss 34.9938 LearningRate 0.000257 Epoch: 1 Global Step: 21360 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:19:32,388-Speed 2510.96 samples/sec Loss 35.0236 LearningRate 0.000258 Epoch: 1 Global Step: 21370 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:19:40,598-Speed 2494.90 samples/sec Loss 34.9388 LearningRate 0.000258 Epoch: 1 Global Step: 21380 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:19:48,807-Speed 2495.09 samples/sec Loss 34.9700 LearningRate 0.000258 Epoch: 1 Global Step: 21390 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:19:57,031-Speed 2490.84 samples/sec Loss 34.9563 LearningRate 0.000258 Epoch: 1 Global Step: 21400 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:20:05,245-Speed 2493.57 samples/sec Loss 34.9920 LearningRate 0.000258 Epoch: 1 Global Step: 21410 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:20:13,457-Speed 2494.32 samples/sec Loss 35.0426 LearningRate 0.000258 Epoch: 1 Global Step: 21420 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:20:21,626-Speed 2507.69 samples/sec Loss 34.9531 LearningRate 0.000258 Epoch: 1 Global Step: 21430 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:20:29,832-Speed 2496.18 samples/sec Loss 34.9924 LearningRate 0.000258 Epoch: 1 Global Step: 21440 Fp16 Grad Scale: 
4096 Required: 185 hours Training: 2022-07-05 19:20:38,041-Speed 2495.22 samples/sec Loss 34.9501 LearningRate 0.000259 Epoch: 1 Global Step: 21450 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:20:46,255-Speed 2493.71 samples/sec Loss 34.9355 LearningRate 0.000259 Epoch: 1 Global Step: 21460 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:20:54,464-Speed 2495.43 samples/sec Loss 34.9363 LearningRate 0.000259 Epoch: 1 Global Step: 21470 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:21:02,674-Speed 2494.70 samples/sec Loss 34.9442 LearningRate 0.000259 Epoch: 1 Global Step: 21480 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:21:10,844-Speed 2507.04 samples/sec Loss 34.9372 LearningRate 0.000259 Epoch: 1 Global Step: 21490 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:21:19,056-Speed 2494.33 samples/sec Loss 34.9133 LearningRate 0.000259 Epoch: 1 Global Step: 21500 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:21:27,265-Speed 2495.25 samples/sec Loss 34.9025 LearningRate 0.000259 Epoch: 1 Global Step: 21510 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:21:35,478-Speed 2493.86 samples/sec Loss 34.9532 LearningRate 0.000259 Epoch: 1 Global Step: 21520 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:21:43,686-Speed 2495.41 samples/sec Loss 34.9619 LearningRate 0.000260 Epoch: 1 Global Step: 21530 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:21:51,900-Speed 2493.69 samples/sec Loss 35.0164 LearningRate 0.000260 Epoch: 1 Global Step: 21540 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:22:00,064-Speed 2509.89 samples/sec Loss 35.0199 LearningRate 0.000260 Epoch: 1 Global Step: 21550 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:22:08,270-Speed 2496.11 samples/sec Loss 35.0148 LearningRate 0.000260 Epoch: 1 Global Step: 21560 Fp16 Grad Scale: 4096 Required: 185 
hours Training: 2022-07-05 19:22:16,487-Speed 2492.76 samples/sec Loss 34.9536 LearningRate 0.000260 Epoch: 1 Global Step: 21570 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:22:24,694-Speed 2495.96 samples/sec Loss 34.8726 LearningRate 0.000260 Epoch: 1 Global Step: 21580 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:22:32,909-Speed 2493.39 samples/sec Loss 35.0427 LearningRate 0.000260 Epoch: 1 Global Step: 21590 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:22:41,115-Speed 2496.09 samples/sec Loss 35.0672 LearningRate 0.000260 Epoch: 1 Global Step: 21600 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:22:49,269-Speed 2511.98 samples/sec Loss 34.9660 LearningRate 0.000260 Epoch: 1 Global Step: 21610 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:22:57,476-Speed 2495.82 samples/sec Loss 34.9414 LearningRate 0.000261 Epoch: 1 Global Step: 21620 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:23:05,683-Speed 2495.76 samples/sec Loss 34.9120 LearningRate 0.000261 Epoch: 1 Global Step: 21630 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:23:13,898-Speed 2493.45 samples/sec Loss 34.8417 LearningRate 0.000261 Epoch: 1 Global Step: 21640 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:23:22,107-Speed 2495.15 samples/sec Loss 34.8778 LearningRate 0.000261 Epoch: 1 Global Step: 21650 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:23:30,315-Speed 2495.66 samples/sec Loss 34.8560 LearningRate 0.000261 Epoch: 1 Global Step: 21660 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:23:38,472-Speed 2511.22 samples/sec Loss 34.7921 LearningRate 0.000261 Epoch: 1 Global Step: 21670 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:23:46,681-Speed 2495.06 samples/sec Loss 34.7646 LearningRate 0.000261 Epoch: 1 Global Step: 21680 Fp16 Grad Scale: 4096 Required: 185 hours Training: 
2022-07-05 19:23:54,893-Speed 2494.34 samples/sec Loss 34.7746 LearningRate 0.000261 Epoch: 1 Global Step: 21690 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:24:03,111-Speed 2492.55 samples/sec Loss 34.7966 LearningRate 0.000262 Epoch: 1 Global Step: 21700 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:24:11,326-Speed 2493.76 samples/sec Loss 34.8241 LearningRate 0.000262 Epoch: 1 Global Step: 21710 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:24:19,532-Speed 2495.87 samples/sec Loss 34.8741 LearningRate 0.000262 Epoch: 1 Global Step: 21720 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:24:27,686-Speed 2512.13 samples/sec Loss 34.8180 LearningRate 0.000262 Epoch: 1 Global Step: 21730 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:24:35,891-Speed 2496.47 samples/sec Loss 34.7755 LearningRate 0.000262 Epoch: 1 Global Step: 21740 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:24:44,108-Speed 2493.01 samples/sec Loss 34.7654 LearningRate 0.000262 Epoch: 1 Global Step: 21750 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:24:52,314-Speed 2496.10 samples/sec Loss 34.8007 LearningRate 0.000262 Epoch: 1 Global Step: 21760 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:25:00,523-Speed 2495.07 samples/sec Loss 34.7484 LearningRate 0.000262 Epoch: 1 Global Step: 21770 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:25:08,733-Speed 2495.07 samples/sec Loss 34.7685 LearningRate 0.000263 Epoch: 1 Global Step: 21780 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:25:16,885-Speed 2512.36 samples/sec Loss 34.7872 LearningRate 0.000263 Epoch: 1 Global Step: 21790 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:25:25,096-Speed 2494.69 samples/sec Loss 34.7354 LearningRate 0.000263 Epoch: 1 Global Step: 21800 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 
19:25:33,305-Speed 2495.45 samples/sec Loss 34.7154 LearningRate 0.000263 Epoch: 1 Global Step: 21810 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:25:41,510-Speed 2496.60 samples/sec Loss 34.7568 LearningRate 0.000263 Epoch: 1 Global Step: 21820 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:25:49,718-Speed 2495.32 samples/sec Loss 34.7960 LearningRate 0.000263 Epoch: 1 Global Step: 21830 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:25:57,945-Speed 2489.75 samples/sec Loss 34.8354 LearningRate 0.000263 Epoch: 1 Global Step: 21840 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:26:06,101-Speed 2511.51 samples/sec Loss 34.9123 LearningRate 0.000263 Epoch: 1 Global Step: 21850 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:26:14,309-Speed 2495.68 samples/sec Loss 34.8760 LearningRate 0.000264 Epoch: 1 Global Step: 21860 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:26:22,517-Speed 2495.40 samples/sec Loss 34.8132 LearningRate 0.000264 Epoch: 1 Global Step: 21870 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:26:30,724-Speed 2495.75 samples/sec Loss 34.8662 LearningRate 0.000264 Epoch: 1 Global Step: 21880 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:26:38,933-Speed 2495.26 samples/sec Loss 34.8014 LearningRate 0.000264 Epoch: 1 Global Step: 21890 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:26:47,138-Speed 2496.63 samples/sec Loss 34.8486 LearningRate 0.000264 Epoch: 1 Global Step: 21900 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:26:55,291-Speed 2512.25 samples/sec Loss 34.7969 LearningRate 0.000264 Epoch: 1 Global Step: 21910 Fp16 Grad Scale: 4096 Required: 185 hours Training: 2022-07-05 19:27:03,450-Speed 2510.26 samples/sec Loss 34.6658 LearningRate 0.000264 Epoch: 1 Global Step: 21920 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:27:11,656-Speed 
2496.26 samples/sec Loss 34.7822 LearningRate 0.000264 Epoch: 1 Global Step: 21930 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:27:19,862-Speed 2496.08 samples/sec Loss 34.7451 LearningRate 0.000264 Epoch: 1 Global Step: 21940 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:27:28,065-Speed 2496.94 samples/sec Loss 34.8211 LearningRate 0.000265 Epoch: 1 Global Step: 21950 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:27:36,269-Speed 2496.91 samples/sec Loss 34.7944 LearningRate 0.000265 Epoch: 1 Global Step: 21960 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:27:44,420-Speed 2512.96 samples/sec Loss 34.7838 LearningRate 0.000265 Epoch: 1 Global Step: 21970 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:27:52,623-Speed 2497.09 samples/sec Loss 34.6983 LearningRate 0.000265 Epoch: 1 Global Step: 21980 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:28:00,844-Speed 2491.41 samples/sec Loss 34.8261 LearningRate 0.000265 Epoch: 1 Global Step: 21990 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:28:09,058-Speed 2493.79 samples/sec Loss 34.8170 LearningRate 0.000265 Epoch: 1 Global Step: 22000 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:28:17,265-Speed 2495.98 samples/sec Loss 34.8889 LearningRate 0.000265 Epoch: 1 Global Step: 22010 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:28:25,471-Speed 2495.90 samples/sec Loss 34.7840 LearningRate 0.000265 Epoch: 1 Global Step: 22020 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:28:33,626-Speed 2511.84 samples/sec Loss 34.6859 LearningRate 0.000266 Epoch: 1 Global Step: 22030 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:28:41,833-Speed 2495.71 samples/sec Loss 34.7170 LearningRate 0.000266 Epoch: 1 Global Step: 22040 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:28:50,040-Speed 2495.77 samples/sec 
Loss 34.6448 LearningRate 0.000266 Epoch: 1 Global Step: 22050 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:28:58,252-Speed 2494.39 samples/sec Loss 34.6928 LearningRate 0.000266 Epoch: 1 Global Step: 22060 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:29:06,458-Speed 2496.07 samples/sec Loss 34.6858 LearningRate 0.000266 Epoch: 1 Global Step: 22070 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:29:14,665-Speed 2495.88 samples/sec Loss 34.5968 LearningRate 0.000266 Epoch: 1 Global Step: 22080 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:29:22,815-Speed 2513.44 samples/sec Loss 34.6651 LearningRate 0.000266 Epoch: 1 Global Step: 22090 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:29:31,022-Speed 2495.80 samples/sec Loss 34.6047 LearningRate 0.000266 Epoch: 1 Global Step: 22100 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:29:39,226-Speed 2496.79 samples/sec Loss 34.6249 LearningRate 0.000267 Epoch: 1 Global Step: 22110 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:29:47,435-Speed 2495.00 samples/sec Loss 34.6109 LearningRate 0.000267 Epoch: 1 Global Step: 22120 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:29:55,659-Speed 2490.69 samples/sec Loss 34.6196 LearningRate 0.000267 Epoch: 1 Global Step: 22130 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:30:03,865-Speed 2496.02 samples/sec Loss 34.5842 LearningRate 0.000267 Epoch: 1 Global Step: 22140 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:30:12,017-Speed 2512.89 samples/sec Loss 34.5800 LearningRate 0.000267 Epoch: 1 Global Step: 22150 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:30:20,220-Speed 2497.00 samples/sec Loss 34.5745 LearningRate 0.000267 Epoch: 1 Global Step: 22160 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:30:28,424-Speed 2496.70 samples/sec Loss 34.5002 
LearningRate 0.000267 Epoch: 1 Global Step: 22170 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:30:36,636-Speed 2494.92 samples/sec Loss 34.5139 LearningRate 0.000267 Epoch: 1 Global Step: 22180 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:30:44,869-Speed 2487.99 samples/sec Loss 34.4506 LearningRate 0.000267 Epoch: 1 Global Step: 22190 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:30:53,074-Speed 2496.42 samples/sec Loss 34.5293 LearningRate 0.000268 Epoch: 1 Global Step: 22200 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:31:01,228-Speed 2512.19 samples/sec Loss 34.5008 LearningRate 0.000268 Epoch: 1 Global Step: 22210 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:31:09,433-Speed 2496.63 samples/sec Loss 34.4826 LearningRate 0.000268 Epoch: 1 Global Step: 22220 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:31:17,640-Speed 2495.67 samples/sec Loss 34.5304 LearningRate 0.000268 Epoch: 1 Global Step: 22230 Fp16 Grad Scale: 2048 Required: 185 hours Training: 2022-07-05 19:31:25,849-Speed 2495.50 samples/sec Loss 34.4723 LearningRate 0.000268 Epoch: 1 Global Step: 22240 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:31:34,069-Speed 2491.74 samples/sec Loss 34.4271 LearningRate 0.000268 Epoch: 1 Global Step: 22250 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:31:42,282-Speed 2493.81 samples/sec Loss 34.5295 LearningRate 0.000268 Epoch: 1 Global Step: 22260 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:31:50,444-Speed 2509.55 samples/sec Loss 34.5168 LearningRate 0.000268 Epoch: 1 Global Step: 22270 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:31:58,657-Speed 2494.48 samples/sec Loss 34.5145 LearningRate 0.000269 Epoch: 1 Global Step: 22280 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:32:06,864-Speed 2495.99 samples/sec Loss 34.4565 LearningRate 
0.000269 Epoch: 1 Global Step: 22290 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:32:15,069-Speed 2496.41 samples/sec Loss 34.5088 LearningRate 0.000269 Epoch: 1 Global Step: 22300 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:32:23,278-Speed 2495.09 samples/sec Loss 34.4092 LearningRate 0.000269 Epoch: 1 Global Step: 22310 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:32:31,489-Speed 2494.95 samples/sec Loss 34.4078 LearningRate 0.000269 Epoch: 1 Global Step: 22320 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:32:39,645-Speed 2511.50 samples/sec Loss 34.3722 LearningRate 0.000269 Epoch: 1 Global Step: 22330 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:32:47,852-Speed 2495.82 samples/sec Loss 34.4889 LearningRate 0.000269 Epoch: 1 Global Step: 22340 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:32:56,059-Speed 2495.80 samples/sec Loss 34.5644 LearningRate 0.000269 Epoch: 1 Global Step: 22350 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:33:04,265-Speed 2496.06 samples/sec Loss 34.5571 LearningRate 0.000270 Epoch: 1 Global Step: 22360 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:33:12,471-Speed 2496.12 samples/sec Loss 34.5481 LearningRate 0.000270 Epoch: 1 Global Step: 22370 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:33:20,674-Speed 2496.96 samples/sec Loss 34.4635 LearningRate 0.000270 Epoch: 1 Global Step: 22380 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:33:28,830-Speed 2511.66 samples/sec Loss 34.4492 LearningRate 0.000270 Epoch: 1 Global Step: 22390 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:33:37,036-Speed 2496.01 samples/sec Loss 34.3936 LearningRate 0.000270 Epoch: 1 Global Step: 22400 Fp16 Grad Scale: 2048 Required: 184 hours Training: 2022-07-05 19:33:45,243-Speed 2495.83 samples/sec Loss 34.3097 LearningRate 0.000270 Epoch: 1 
Global Step: 22410 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:33:53,451-Speed 2495.60 samples/sec Loss 34.3751 LearningRate 0.000270 Epoch: 1 Global Step: 22420 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:34:01,661-Speed 2494.84 samples/sec Loss 34.4266 LearningRate 0.000270 Epoch: 1 Global Step: 22430 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:34:09,868-Speed 2495.84 samples/sec Loss 34.3417 LearningRate 0.000270 Epoch: 1 Global Step: 22440 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:34:18,019-Speed 2512.91 samples/sec Loss 34.3349 LearningRate 0.000271 Epoch: 1 Global Step: 22450 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:34:26,239-Speed 2492.13 samples/sec Loss 34.2721 LearningRate 0.000271 Epoch: 1 Global Step: 22460 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:34:34,440-Speed 2497.60 samples/sec Loss 34.3937 LearningRate 0.000271 Epoch: 1 Global Step: 22470 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:34:42,656-Speed 2493.10 samples/sec Loss 34.3491 LearningRate 0.000271 Epoch: 1 Global Step: 22480 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:34:50,863-Speed 2495.74 samples/sec Loss 34.3116 LearningRate 0.000271 Epoch: 1 Global Step: 22490 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:34:59,070-Speed 2495.91 samples/sec Loss 34.2667 LearningRate 0.000271 Epoch: 1 Global Step: 22500 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:35:07,236-Speed 2508.34 samples/sec Loss 34.3372 LearningRate 0.000271 Epoch: 1 Global Step: 22510 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:35:15,446-Speed 2495.09 samples/sec Loss 34.2954 LearningRate 0.000271 Epoch: 1 Global Step: 22520 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:35:23,652-Speed 2496.00 samples/sec Loss 34.2089 LearningRate 0.000272 Epoch: 1 Global Step: 22530 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:35:31,856-Speed 2496.93 samples/sec Loss 34.3099 LearningRate 0.000272 Epoch: 1 Global Step: 22540 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:35:40,076-Speed 2491.94 samples/sec Loss 34.2880 LearningRate 0.000272 Epoch: 1 Global Step: 22550 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:35:48,281-Speed 2496.23 samples/sec Loss 34.2376 LearningRate 0.000272 Epoch: 1 Global Step: 22560 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:35:56,439-Speed 2510.89 samples/sec Loss 34.2089 LearningRate 0.000272 Epoch: 1 Global Step: 22570 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:36:04,647-Speed 2495.60 samples/sec Loss 34.1987 LearningRate 0.000272 Epoch: 1 Global Step: 22580 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:36:12,851-Speed 2496.71 samples/sec Loss 34.1983 LearningRate 0.000272 Epoch: 1 Global Step: 22590 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:36:21,069-Speed 2492.62 samples/sec Loss 34.2404 LearningRate 0.000272 Epoch: 1 Global Step: 22600 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:36:29,275-Speed 2496.31 samples/sec Loss 34.1034 LearningRate 0.000273 Epoch: 1 Global Step: 22610 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:36:37,485-Speed 2494.96 samples/sec Loss 34.1653 LearningRate 0.000273 Epoch: 1 Global Step: 22620 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:36:45,637-Speed 2512.60 samples/sec Loss 34.1201 LearningRate 0.000273 Epoch: 1 Global Step: 22630 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:36:53,842-Speed 2496.46 samples/sec Loss 34.1946 LearningRate 0.000273 Epoch: 1 Global Step: 22640 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:37:02,047-Speed 2496.75 samples/sec Loss 34.1637 LearningRate 0.000273 Epoch: 1 Global Step: 22650 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:37:10,255-Speed 2495.67 samples/sec Loss 34.1601 LearningRate 0.000273 Epoch: 1 Global Step: 22660 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:37:18,460-Speed 2496.12 samples/sec Loss 34.1213 LearningRate 0.000273 Epoch: 1 Global Step: 22670 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:37:26,669-Speed 2495.25 samples/sec Loss 34.1102 LearningRate 0.000273 Epoch: 1 Global Step: 22680 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:37:34,822-Speed 2512.24 samples/sec Loss 34.1400 LearningRate 0.000274 Epoch: 1 Global Step: 22690 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:37:43,031-Speed 2495.19 samples/sec Loss 34.1376 LearningRate 0.000274 Epoch: 1 Global Step: 22700 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:37:51,235-Speed 2496.70 samples/sec Loss 34.0922 LearningRate 0.000274 Epoch: 1 Global Step: 22710 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:37:59,441-Speed 2496.39 samples/sec Loss 34.1052 LearningRate 0.000274 Epoch: 1 Global Step: 22720 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:38:07,649-Speed 2495.41 samples/sec Loss 34.0946 LearningRate 0.000274 Epoch: 1 Global Step: 22730 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:38:15,854-Speed 2496.63 samples/sec Loss 34.0008 LearningRate 0.000274 Epoch: 1 Global Step: 22740 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:38:24,029-Speed 2505.65 samples/sec Loss 34.1303 LearningRate 0.000274 Epoch: 1 Global Step: 22750 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:38:32,233-Speed 2496.91 samples/sec Loss 34.0510 LearningRate 0.000274 Epoch: 1 Global Step: 22760 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:38:40,439-Speed 2496.22 samples/sec Loss 34.0162 LearningRate 0.000274 Epoch: 1 Global Step: 22770 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:38:48,643-Speed 2496.71 samples/sec Loss 33.9773 LearningRate 0.000275 Epoch: 1 Global Step: 22780 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:38:56,852-Speed 2495.25 samples/sec Loss 34.0365 LearningRate 0.000275 Epoch: 1 Global Step: 22790 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:39:05,058-Speed 2496.14 samples/sec Loss 34.0763 LearningRate 0.000275 Epoch: 1 Global Step: 22800 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:39:13,214-Speed 2511.53 samples/sec Loss 33.9881 LearningRate 0.000275 Epoch: 1 Global Step: 22810 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:39:21,423-Speed 2495.07 samples/sec Loss 34.0547 LearningRate 0.000275 Epoch: 1 Global Step: 22820 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:39:29,634-Speed 2494.61 samples/sec Loss 34.0033 LearningRate 0.000275 Epoch: 1 Global Step: 22830 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:39:37,842-Speed 2495.52 samples/sec Loss 34.0243 LearningRate 0.000275 Epoch: 1 Global Step: 22840 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:39:46,061-Speed 2492.07 samples/sec Loss 34.0439 LearningRate 0.000275 Epoch: 1 Global Step: 22850 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:39:54,267-Speed 2496.10 samples/sec Loss 33.9983 LearningRate 0.000276 Epoch: 1 Global Step: 22860 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:40:02,421-Speed 2511.90 samples/sec Loss 33.9328 LearningRate 0.000276 Epoch: 1 Global Step: 22870 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:40:10,639-Speed 2492.50 samples/sec Loss 33.9139 LearningRate 0.000276 Epoch: 1 Global Step: 22880 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:40:18,859-Speed 2492.01 samples/sec Loss 33.9566 LearningRate 0.000276 Epoch: 1 Global Step: 22890 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:40:27,070-Speed 2494.82 samples/sec Loss 33.8689 LearningRate 0.000276 Epoch: 1 Global Step: 22900 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:40:35,284-Speed 2493.69 samples/sec Loss 33.9301 LearningRate 0.000276 Epoch: 1 Global Step: 22910 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:40:43,502-Speed 2492.45 samples/sec Loss 33.9116 LearningRate 0.000276 Epoch: 1 Global Step: 22920 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:40:51,657-Speed 2511.75 samples/sec Loss 33.8789 LearningRate 0.000276 Epoch: 1 Global Step: 22930 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:40:59,862-Speed 2496.33 samples/sec Loss 33.8808 LearningRate 0.000277 Epoch: 1 Global Step: 22940 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:41:08,080-Speed 2492.65 samples/sec Loss 33.9003 LearningRate 0.000277 Epoch: 1 Global Step: 22950 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:41:16,285-Speed 2496.44 samples/sec Loss 33.9164 LearningRate 0.000277 Epoch: 1 Global Step: 22960 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:41:24,493-Speed 2495.53 samples/sec Loss 33.8474 LearningRate 0.000277 Epoch: 1 Global Step: 22970 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:41:32,698-Speed 2496.40 samples/sec Loss 33.9598 LearningRate 0.000277 Epoch: 1 Global Step: 22980 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:41:40,852-Speed 2512.05 samples/sec Loss 33.9838 LearningRate 0.000277 Epoch: 1 Global Step: 22990 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:41:49,058-Speed 2495.99 samples/sec Loss 33.9066 LearningRate 0.000277 Epoch: 1 Global Step: 23000 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:41:57,265-Speed 2495.82 samples/sec Loss 33.9935 LearningRate 0.000277 Epoch: 1 Global Step: 23010 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:42:05,467-Speed 2497.75 samples/sec Loss 33.9584 LearningRate 0.000277 Epoch: 1 Global Step: 23020 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:42:13,672-Speed 2496.69 samples/sec Loss 33.9461 LearningRate 0.000278 Epoch: 1 Global Step: 23030 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:42:21,877-Speed 2496.37 samples/sec Loss 33.8867 LearningRate 0.000278 Epoch: 1 Global Step: 23040 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:42:30,031-Speed 2512.02 samples/sec Loss 33.8415 LearningRate 0.000278 Epoch: 1 Global Step: 23050 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:42:38,238-Speed 2495.93 samples/sec Loss 33.9454 LearningRate 0.000278 Epoch: 1 Global Step: 23060 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:42:46,443-Speed 2496.51 samples/sec Loss 33.9034 LearningRate 0.000278 Epoch: 1 Global Step: 23070 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:42:54,649-Speed 2496.22 samples/sec Loss 33.8960 LearningRate 0.000278 Epoch: 1 Global Step: 23080 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:43:02,853-Speed 2496.56 samples/sec Loss 33.8393 LearningRate 0.000278 Epoch: 1 Global Step: 23090 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:43:11,059-Speed 2495.97 samples/sec Loss 33.8693 LearningRate 0.000278 Epoch: 1 Global Step: 23100 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:43:19,213-Speed 2512.31 samples/sec Loss 33.8477 LearningRate 0.000279 Epoch: 1 Global Step: 23110 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:43:27,417-Speed 2496.87 samples/sec Loss 33.8298 LearningRate 0.000279 Epoch: 1 Global Step: 23120 Fp16 Grad Scale: 4096 Required: 184 hours
Training: 2022-07-05 19:43:35,621-Speed 2496.68 samples/sec Loss 33.7691 LearningRate 0.000279 Epoch: 1 Global Step: 23130 Fp16 Grad Scale: 4096 Required: 184 hours
Training: 2022-07-05 19:43:43,829-Speed 2495.44 samples/sec Loss 33.7645 LearningRate 0.000279 Epoch: 1 Global Step: 23140 Fp16 Grad Scale: 4096 Required: 184 hours
Training: 2022-07-05 19:43:51,993-Speed 2508.91 samples/sec Loss 33.8430 LearningRate 0.000279 Epoch: 1 Global Step: 23150 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:44:00,202-Speed 2495.44 samples/sec Loss 33.8889 LearningRate 0.000279 Epoch: 1 Global Step: 23160 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:44:08,356-Speed 2511.84 samples/sec Loss 33.8175 LearningRate 0.000279 Epoch: 1 Global Step: 23170 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:44:16,567-Speed 2494.85 samples/sec Loss 33.8084 LearningRate 0.000279 Epoch: 1 Global Step: 23180 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:44:24,807-Speed 2485.77 samples/sec Loss 33.8035 LearningRate 0.000280 Epoch: 1 Global Step: 23190 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:44:33,012-Speed 2496.63 samples/sec Loss 33.7436 LearningRate 0.000280 Epoch: 1 Global Step: 23200 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:44:41,217-Speed 2496.45 samples/sec Loss 33.8215 LearningRate 0.000280 Epoch: 1 Global Step: 23210 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:44:49,420-Speed 2496.84 samples/sec Loss 33.7141 LearningRate 0.000280 Epoch: 1 Global Step: 23220 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:44:57,579-Speed 2510.57 samples/sec Loss 33.7310 LearningRate 0.000280 Epoch: 1 Global Step: 23230 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:45:05,781-Speed 2497.65 samples/sec Loss 33.6301 LearningRate 0.000280 Epoch: 1 Global Step: 23240 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:45:13,990-Speed 2495.12 samples/sec Loss 33.6770 LearningRate 0.000280 Epoch: 1 Global Step: 23250 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:45:22,192-Speed 2497.14 samples/sec Loss 33.6085 LearningRate 0.000280 Epoch: 1 Global Step: 23260 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:45:30,400-Speed 2495.69 samples/sec Loss 33.6947 LearningRate 0.000280 Epoch: 1 Global Step: 23270 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:45:38,617-Speed 2492.96 samples/sec Loss 33.9135 LearningRate 0.000281 Epoch: 1 Global Step: 23280 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:45:46,772-Speed 2511.79 samples/sec Loss 33.8620 LearningRate 0.000281 Epoch: 1 Global Step: 23290 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:45:54,973-Speed 2497.58 samples/sec Loss 33.7440 LearningRate 0.000281 Epoch: 1 Global Step: 23300 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:46:03,175-Speed 2497.48 samples/sec Loss 33.7804 LearningRate 0.000281 Epoch: 1 Global Step: 23310 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:46:11,380-Speed 2496.66 samples/sec Loss 33.7534 LearningRate 0.000281 Epoch: 1 Global Step: 23320 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:46:19,582-Speed 2497.21 samples/sec Loss 33.7051 LearningRate 0.000281 Epoch: 1 Global Step: 23330 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:46:27,786-Speed 2496.57 samples/sec Loss 33.7017 LearningRate 0.000281 Epoch: 1 Global Step: 23340 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:46:35,963-Speed 2513.54 samples/sec Loss 33.6793 LearningRate 0.000281 Epoch: 1 Global Step: 23350 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:46:44,211-Speed 2497.06 samples/sec Loss 33.7958 LearningRate 0.000282 Epoch: 1 Global Step: 23360 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:46:52,419-Speed 2495.37 samples/sec Loss 33.7396 LearningRate 0.000282 Epoch: 1 Global Step: 23370 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:47:00,624-Speed 2496.17 samples/sec Loss 33.6851 LearningRate 0.000282 Epoch: 1 Global Step: 23380 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:47:08,859-Speed 2498.27 samples/sec Loss 33.6749 LearningRate 0.000282 Epoch: 1 Global Step: 23390 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:47:17,094-Speed 2498.78 samples/sec Loss 33.7305 LearningRate 0.000282 Epoch: 1 Global Step: 23400 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:47:25,244-Speed 2513.40 samples/sec Loss 33.7276 LearningRate 0.000282 Epoch: 1 Global Step: 23410 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:47:33,472-Speed 2497.77 samples/sec Loss 33.7111 LearningRate 0.000282 Epoch: 1 Global Step: 23420 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:47:41,719-Speed 2498.15 samples/sec Loss 33.6467 LearningRate 0.000282 Epoch: 1 Global Step: 23430 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:47:49,941-Speed 2498.81 samples/sec Loss 33.6357 LearningRate 0.000283 Epoch: 1 Global Step: 23440 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:47:58,144-Speed 2497.07 samples/sec Loss 33.7270 LearningRate 0.000283 Epoch: 1 Global Step: 23450 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:48:06,348-Speed 2496.66 samples/sec Loss 33.6383 LearningRate 0.000283 Epoch: 1 Global Step: 23460 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:48:14,589-Speed 2514.43 samples/sec Loss 33.5261 LearningRate 0.000283 Epoch: 1 Global Step: 23470 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:48:23,744-Speed 2499.45 samples/sec Loss 33.5233 LearningRate 0.000283 Epoch: 1 Global Step: 23480 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:48:31,951-Speed 2495.76 samples/sec Loss 33.5736 LearningRate 0.000283 Epoch: 1 Global Step: 23490 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:48:40,891-Speed 2498.96 samples/sec Loss 33.5809 LearningRate 0.000283 Epoch: 1 Global Step: 23500 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:48:49,099-Speed 2498.43 samples/sec Loss 33.5037 LearningRate 0.000283 Epoch: 1 Global Step: 23510 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:48:57,360-Speed 2498.04 samples/sec Loss 33.5774 LearningRate 0.000284 Epoch: 1 Global Step: 23520 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:49:05,634-Speed 2514.74 samples/sec Loss 33.4799 LearningRate 0.000284 Epoch: 1 Global Step: 23530 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:49:13,838-Speed 2496.44 samples/sec Loss 33.4831 LearningRate 0.000284 Epoch: 1 Global Step: 23540 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:49:22,053-Speed 2498.75 samples/sec Loss 33.4462 LearningRate 0.000284 Epoch: 1 Global Step: 23550 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:49:30,312-Speed 2498.12 samples/sec Loss 33.4715 LearningRate 0.000284 Epoch: 1 Global Step: 23560 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:49:38,518-Speed 2495.91 samples/sec Loss 33.5187 LearningRate 0.000284 Epoch: 1 Global Step: 23570 Fp16 Grad Scale: 2048 Required: 184 hours
Training: 2022-07-05 19:49:46,761-Speed 2511.26 samples/sec Loss 33.4967 LearningRate 0.000284 Epoch: 1 Global Step: 23580 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:49:54,912-Speed 2515.19 samples/sec Loss 33.5474 LearningRate 0.000284 Epoch: 1 Global Step: 23590 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:50:03,374-Speed 2495.57 samples/sec Loss 33.4955 LearningRate 0.000284 Epoch: 1 Global Step: 23600 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:50:11,579-Speed 2496.40 samples/sec Loss 33.5409 LearningRate 0.000285 Epoch: 1 Global Step: 23610 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:50:19,946-Speed 2498.99 samples/sec Loss 33.4705 LearningRate 0.000285 Epoch: 1 Global Step: 23620 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:50:28,147-Speed 2497.44 samples/sec Loss 33.4776 LearningRate 0.000285 Epoch: 1 Global Step: 23630 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:50:36,416-Speed 2498.00 samples/sec Loss 33.4395 LearningRate 0.000285 Epoch: 1 Global Step: 23640 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:50:44,611-Speed 2510.65 samples/sec Loss 33.3867 LearningRate 0.000285 Epoch: 1 Global Step: 23650 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:50:52,814-Speed 2496.83 samples/sec Loss 33.3598 LearningRate 0.000285 Epoch: 1 Global Step: 23660 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:51:01,017-Speed 2496.80 samples/sec Loss 33.3875 LearningRate 0.000285 Epoch: 1 Global Step: 23670 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:51:09,491-Speed 2498.34 samples/sec Loss 33.3973 LearningRate 0.000285 Epoch: 1 Global Step: 23680 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:51:20,783-Speed 2493.97 samples/sec Loss 33.3017 LearningRate 0.000286 Epoch: 1 Global Step: 23690 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:51:28,990-Speed 2495.40 samples/sec Loss 33.3187 LearningRate 0.000286 Epoch: 1 Global Step: 23700 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:51:37,139-Speed 2513.72 samples/sec Loss 33.4067 LearningRate 0.000286 Epoch: 1 Global Step: 23710 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:51:45,337-Speed 2498.72 samples/sec Loss 33.3252 LearningRate 0.000286 Epoch: 1 Global Step: 23720 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:51:53,541-Speed 2496.82 samples/sec Loss 33.3752 LearningRate 0.000286 Epoch: 1 Global Step: 23730 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:52:01,740-Speed 2498.29 samples/sec Loss 33.3532 LearningRate 0.000286 Epoch: 1 Global Step: 23740 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:52:09,940-Speed 2498.46 samples/sec Loss 33.3778 LearningRate 0.000286 Epoch: 1 Global Step: 23750 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:52:18,147-Speed 2496.12 samples/sec Loss 33.3621 LearningRate 0.000286 Epoch: 1 Global Step: 23760 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:52:26,292-Speed 2514.93 samples/sec Loss 33.3539 LearningRate 0.000287 Epoch: 1 Global Step: 23770 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:52:34,498-Speed 2496.16 samples/sec Loss 33.3347 LearningRate 0.000287 Epoch: 1 Global Step: 23780 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:52:42,699-Speed 2497.43 samples/sec Loss 33.4222 LearningRate 0.000287 Epoch: 1 Global Step: 23790 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:52:50,903-Speed 2496.96 samples/sec Loss 33.3824 LearningRate 0.000287 Epoch: 1 Global Step: 23800 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:52:59,107-Speed 2496.75 samples/sec Loss 33.3689 LearningRate 0.000287 Epoch: 1 Global Step: 23810 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:53:07,315-Speed 2495.25 samples/sec Loss 33.2359 LearningRate 0.000287 Epoch: 1 Global Step: 23820 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:53:15,482-Speed 2508.01 samples/sec Loss 33.3197 LearningRate 0.000287 Epoch: 1 Global Step: 23830 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:53:23,695-Speed 2494.04 samples/sec Loss 33.3083 LearningRate 0.000287 Epoch: 1 Global Step: 23840 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:53:31,905-Speed 2495.18 samples/sec Loss 33.3326 LearningRate 0.000287 Epoch: 1 Global Step: 23850 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:53:40,109-Speed 2496.80 samples/sec Loss 33.2910 LearningRate 0.000288 Epoch: 1 Global Step: 23860 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:53:48,315-Speed 2495.99 samples/sec Loss 33.2766 LearningRate 0.000288 Epoch: 1 Global Step: 23870 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:53:56,525-Speed 2495.05 samples/sec Loss 33.2293 LearningRate 0.000288 Epoch: 1 Global Step: 23880 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:54:04,679-Speed 2511.88 samples/sec Loss 33.2717 LearningRate 0.000288 Epoch: 1 Global Step: 23890 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:54:12,884-Speed 2496.85 samples/sec Loss 33.1711 LearningRate 0.000288 Epoch: 1 Global Step: 23900 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:54:21,088-Speed 2496.46 samples/sec Loss 33.2236 LearningRate 0.000288 Epoch: 1 Global Step: 23910 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:54:29,296-Speed 2495.75 samples/sec Loss 33.1394 LearningRate 0.000288 Epoch: 1 Global Step: 23920 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:54:37,508-Speed 2494.22 samples/sec Loss 33.1276 LearningRate 0.000288 Epoch: 1 Global Step: 23930 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:54:45,727-Speed 2492.10 samples/sec Loss 33.1683 LearningRate 0.000289 Epoch: 1 Global Step: 23940 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:54:53,878-Speed 2512.89 samples/sec Loss 33.1664 LearningRate 0.000289 Epoch: 1 Global Step: 23950 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:55:02,083-Speed 2496.58 samples/sec Loss 33.1389 LearningRate 0.000289 Epoch: 1 Global Step: 23960 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:55:10,285-Speed 2497.47 samples/sec Loss 33.1222 LearningRate 0.000289 Epoch: 1 Global Step: 23970 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:55:18,489-Speed 2496.97 samples/sec Loss 33.1233 LearningRate 0.000289 Epoch: 1 Global Step: 23980 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:55:26,693-Speed 2496.70 samples/sec Loss 33.2252 LearningRate 0.000289 Epoch: 1 Global Step: 23990 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:55:34,912-Speed 2492.12 samples/sec Loss 33.1699 LearningRate 0.000289 Epoch: 1 Global Step: 24000 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:55:43,075-Speed 2509.17 samples/sec Loss 33.2733 LearningRate 0.000289 Epoch: 1 Global Step: 24010 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:55:51,281-Speed 2496.23 samples/sec Loss 33.1992 LearningRate 0.000290 Epoch: 1 Global Step: 24020 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:55:59,489-Speed 2495.62 samples/sec Loss 33.2350 LearningRate 0.000290 Epoch: 1 Global Step: 24030 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:56:07,696-Speed 2495.84 samples/sec Loss 33.2665 LearningRate 0.000290 Epoch: 1 Global Step: 24040 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:56:15,899-Speed 2496.86 samples/sec Loss 33.1565 LearningRate 0.000290 Epoch: 1 Global Step: 24050 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:56:24,118-Speed 2491.93 samples/sec Loss 33.2066 LearningRate 0.000290 Epoch: 1 Global Step: 24060 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:56:32,270-Speed 2512.94 samples/sec Loss 33.0607 LearningRate 0.000290 Epoch: 1 Global Step: 24070 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:56:40,473-Speed 2496.97 samples/sec Loss 33.1001 LearningRate 0.000290 Epoch: 1 Global Step: 24080 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:56:48,679-Speed 2496.14 samples/sec Loss 33.0645 LearningRate 0.000290 Epoch: 1 Global Step: 24090 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:56:56,909-Speed 2488.97 samples/sec Loss 33.0311 LearningRate 0.000291 Epoch: 1 Global Step: 24100 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:57:05,113-Speed 2496.82 samples/sec Loss 33.1355 LearningRate 0.000291 Epoch: 1 Global Step: 24110 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:57:13,322-Speed 2495.19 samples/sec Loss 33.1852 LearningRate 0.000291 Epoch: 1 Global Step: 24120 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:57:21,472-Speed 2513.29 samples/sec Loss 33.1823 LearningRate 0.000291 Epoch: 1 Global Step: 24130 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:57:29,680-Speed 2495.73 samples/sec Loss 33.2730 LearningRate 0.000291 Epoch: 1 Global Step: 24140 Fp16 Grad Scale: 1024 Required: 184 hours
Training: 2022-07-05 19:57:37,838-Speed 2510.64 samples/sec Loss 33.2205 LearningRate 0.000291 Epoch: 1 Global Step: 24150 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:57:46,041-Speed 2497.26 samples/sec Loss 33.1216 LearningRate 0.000291 Epoch: 1 Global Step: 24160 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:57:54,246-Speed 2496.46 samples/sec Loss 33.3251 LearningRate 0.000291 Epoch: 1 Global Step: 24170 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:58:02,450-Speed 2496.83 samples/sec Loss 33.3539 LearningRate 0.000291 Epoch: 1 Global Step: 24180 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:58:10,599-Speed 2513.65 samples/sec Loss 33.1732 LearningRate 0.000292 Epoch: 1 Global Step: 24190 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:58:18,802-Speed 2496.87 samples/sec Loss 33.0591 LearningRate 0.000292 Epoch: 1 Global Step: 24200 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:58:27,005-Speed 2496.99 samples/sec Loss 33.1538 LearningRate 0.000292 Epoch: 1 Global Step: 24210 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:58:35,213-Speed 2495.73 samples/sec Loss 33.0709 LearningRate 0.000292 Epoch: 1 Global Step: 24220 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:58:43,422-Speed 2495.17 samples/sec Loss 33.0796 LearningRate 0.000292 Epoch: 1 Global Step: 24230 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:58:51,629-Speed 2495.84 samples/sec Loss 33.0822 LearningRate 0.000292 Epoch: 1 Global Step: 24240 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:58:59,784-Speed 2511.91 samples/sec Loss 33.0311 LearningRate 0.000292 Epoch: 1 Global Step: 24250 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:59:07,991-Speed 2495.77 samples/sec Loss 32.9989 LearningRate 0.000292 Epoch: 1 Global Step: 24260 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:59:16,198-Speed 2495.71 samples/sec Loss 33.0336 LearningRate 0.000293 Epoch: 1 Global Step: 24270 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:59:24,410-Speed 2494.33 samples/sec Loss 33.0315 LearningRate 0.000293 Epoch: 1 Global Step: 24280 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:59:32,616-Speed 2496.00 samples/sec Loss 33.0093 LearningRate 0.000293 Epoch: 1 Global Step: 24290 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:59:40,821-Speed 2496.59 samples/sec Loss 33.0280 LearningRate 0.000293 Epoch: 1 Global Step: 24300 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:59:48,971-Speed 2513.47 samples/sec Loss 32.9928 LearningRate 0.000293 Epoch: 1 Global Step: 24310 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 19:59:57,178-Speed 2495.88 samples/sec Loss 32.9757 LearningRate 0.000293 Epoch: 1 Global Step: 24320 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:00:05,380-Speed 2497.14 samples/sec Loss 33.1060 LearningRate 0.000293 Epoch: 1 Global Step: 24330 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:00:13,586-Speed 2496.22 samples/sec Loss 33.0064 LearningRate 0.000293 Epoch: 1 Global Step: 24340 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:00:21,797-Speed 2494.54 samples/sec Loss 33.0919 LearningRate 0.000294 Epoch: 1 Global Step: 24350 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:00:30,001-Speed 2496.75 samples/sec Loss 33.1081 LearningRate 0.000294 Epoch: 1 Global Step: 24360 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:00:38,156-Speed 2511.63 samples/sec Loss 32.8561 LearningRate 0.000294 Epoch: 1 Global Step: 24370 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:00:46,364-Speed 2496.04 samples/sec Loss 32.8790 LearningRate 0.000294 Epoch: 1 Global Step: 24380 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:00:54,570-Speed 2496.14 samples/sec Loss 32.9206 LearningRate 0.000294 Epoch: 1 Global Step: 24390 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:01:02,781-Speed 2494.50 samples/sec Loss 32.8656 LearningRate 0.000294 Epoch: 1 Global Step: 24400 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:01:10,986-Speed 2496.34 samples/sec Loss 32.8417 LearningRate 0.000294 Epoch: 1 Global Step: 24410 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:01:19,191-Speed 2496.57 samples/sec Loss 32.9291 LearningRate 0.000294 Epoch: 1 Global Step: 24420 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:01:27,349-Speed 2510.78 samples/sec Loss 32.7406 LearningRate 0.000294 Epoch: 1 Global Step: 24430 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:01:35,562-Speed 2494.21 samples/sec Loss 32.8637 LearningRate 0.000295 Epoch: 1 Global Step: 24440 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:01:43,771-Speed 2494.96 samples/sec Loss 32.8166 LearningRate 0.000295 Epoch: 1 Global Step: 24450 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:01:51,975-Speed 2496.93 samples/sec Loss 32.8047 LearningRate 0.000295 Epoch: 1 Global Step: 24460 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:02:00,180-Speed 2496.53 samples/sec Loss 32.7606 LearningRate 0.000295 Epoch: 1 Global Step: 24470 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:02:08,383-Speed 2496.78 samples/sec Loss 32.7251 LearningRate 0.000295 Epoch: 1 Global Step: 24480 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:02:16,534-Speed 2513.24 samples/sec Loss 32.7989 LearningRate 0.000295 Epoch: 1 Global Step: 24490 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:02:24,736-Speed 2497.40 samples/sec Loss 32.7163 LearningRate 0.000295 Epoch: 1 Global Step: 24500 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:02:32,940-Speed 2496.60 samples/sec Loss 32.6928 LearningRate 0.000295 Epoch: 1 Global Step: 24510 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:02:41,157-Speed 2493.24 samples/sec Loss 32.7449 LearningRate 0.000296 Epoch: 1 Global Step: 24520 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:02:49,359-Speed 2497.22 samples/sec Loss 32.6926 LearningRate 0.000296 Epoch: 1 Global Step: 24530 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:02:57,570-Speed 2494.63 samples/sec Loss 32.6652 LearningRate 0.000296 Epoch: 1 Global Step: 24540 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:03:05,729-Speed 2510.73 samples/sec Loss 32.6852 LearningRate 0.000296 Epoch: 1 Global Step: 24550 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:03:13,931-Speed 2497.28 samples/sec Loss 32.7997 LearningRate 0.000296 Epoch: 1 Global Step: 24560 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:03:22,134-Speed 2496.92 samples/sec Loss 32.7440 LearningRate 0.000296 Epoch: 1 Global Step: 24570 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:03:30,343-Speed 2495.11 samples/sec Loss 32.7272 LearningRate 0.000296 Epoch: 1 Global Step: 24580 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:03:38,548-Speed 2496.86 samples/sec Loss 32.7644 LearningRate 0.000296 Epoch: 1 Global Step: 24590 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:03:46,751-Speed 2496.85 samples/sec Loss 32.7146 LearningRate 0.000297 Epoch: 1 Global Step: 24600 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:03:54,902-Speed 2512.92 samples/sec Loss 32.6818 LearningRate 0.000297 Epoch: 1 Global Step: 24610 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:04:03,105-Speed 2497.12 samples/sec Loss 32.6484 LearningRate 0.000297 Epoch: 1 Global Step: 24620 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:04:11,306-Speed 2497.74 samples/sec Loss 32.7171 LearningRate 0.000297 Epoch: 1 Global Step: 24630 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:04:19,520-Speed 2493.79 samples/sec Loss 32.6490 LearningRate 0.000297 Epoch: 1 Global Step: 24640 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:04:27,723-Speed 2496.93 samples/sec Loss 32.7901 LearningRate 0.000297 Epoch: 1 Global Step: 24650 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:04:35,927-Speed 2496.75 samples/sec Loss 32.7633 LearningRate 0.000297 Epoch: 1 Global Step: 24660 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:04:44,082-Speed 2511.98 samples/sec Loss 32.6891 LearningRate 0.000297 Epoch: 1 Global Step: 24670 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:04:52,286-Speed 2496.57 samples/sec Loss 32.7113 LearningRate 0.000297 Epoch: 1 Global Step: 24680 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:05:00,521-Speed 2487.13 samples/sec Loss 32.6715 LearningRate 0.000298 Epoch: 1 Global Step: 24690 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:05:08,731-Speed 2495.10 samples/sec Loss 32.5415 LearningRate 0.000298 Epoch: 1 Global Step: 24700 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:05:16,939-Speed 2495.43 samples/sec Loss 32.5743 LearningRate 0.000298 Epoch: 1 Global Step: 24710 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:05:25,144-Speed 2496.22 samples/sec Loss 32.5464 LearningRate 0.000298 Epoch: 1 Global Step: 24720 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:05:33,308-Speed 2508.84 samples/sec Loss 32.6604 LearningRate 0.000298 Epoch: 1 Global Step: 24730 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:05:41,511-Speed 2497.30 samples/sec Loss 32.6559 LearningRate 0.000298 Epoch: 1 Global Step: 24740 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:05:49,714-Speed 2496.94 samples/sec Loss 32.6443 LearningRate 0.000298 Epoch: 1 Global Step: 24750 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:05:57,916-Speed 2497.10 samples/sec Loss 32.5202 LearningRate 0.000298 Epoch: 1 Global Step: 24760 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:06:06,119-Speed 2497.52 samples/sec Loss 32.4976 LearningRate 0.000299 Epoch: 1 Global Step: 24770 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:06:14,324-Speed 2496.44 samples/sec Loss 32.4891 LearningRate 0.000299 Epoch: 1 Global Step: 24780 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:06:22,478-Speed 2511.78 samples/sec Loss 32.5388 LearningRate 0.000299 Epoch: 1 Global Step: 24790 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:06:30,682-Speed 2496.66 samples/sec Loss 32.4584 LearningRate 0.000299 Epoch: 1 Global Step: 24800 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:06:38,887-Speed 2496.57 samples/sec Loss 32.4462 LearningRate 0.000299 Epoch: 1 Global Step: 24810 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:06:47,092-Speed 2496.62 samples/sec Loss 32.4910 LearningRate 0.000299 Epoch: 1 Global Step: 24820 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:06:55,299-Speed 2495.77 samples/sec Loss 32.4677 LearningRate 0.000299 Epoch: 1 Global Step: 24830 Fp16 Grad Scale: 512 Required: 184 hours
Training: 2022-07-05 20:07:03,511-Speed 2494.28 samples/sec Loss 32.3412 LearningRate 0.000299 Epoch: 1 Global Step: 24840 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:07:11,664-Speed 2512.21 samples/sec Loss 32.5492 LearningRate 0.000300 Epoch: 1 Global Step: 24850 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:07:19,869-Speed 2496.56 samples/sec Loss 32.5292 LearningRate 0.000300 Epoch: 1 Global Step: 24860 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:07:28,072-Speed 2497.04 samples/sec Loss 32.5331 LearningRate 0.000300 Epoch: 1 Global Step: 24870 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:07:36,278-Speed 2495.95 samples/sec Loss 32.4501 LearningRate 0.000300 Epoch: 1 Global Step: 24880 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:07:44,484-Speed 2496.03 samples/sec Loss 32.3448 LearningRate 0.000300 Epoch: 1 Global Step: 24890 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:07:52,689-Speed 2496.55 samples/sec Loss 32.3462 LearningRate 0.000300 Epoch: 1 Global Step: 24900 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:08:00,843-Speed 2511.89 samples/sec Loss 32.3986 LearningRate 0.000300 Epoch: 1 Global Step: 24910 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:08:09,054-Speed 2494.41 samples/sec Loss 32.3324 LearningRate 0.000300 Epoch: 1 Global Step: 24920 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:08:17,258-Speed 2496.87 samples/sec Loss 32.3721 LearningRate 0.000301 Epoch: 1 Global Step: 24930 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:08:25,464-Speed 2496.30 samples/sec Loss 32.3437 LearningRate 0.000301 Epoch: 1 Global Step: 24940 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:08:33,666-Speed 2497.14 samples/sec Loss 32.4082 LearningRate 0.000301 Epoch: 1 Global Step: 24950 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 
20:08:41,869-Speed 2497.02 samples/sec Loss 32.3699 LearningRate 0.000301 Epoch: 1 Global Step: 24960 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:08:50,029-Speed 2510.37 samples/sec Loss 32.4306 LearningRate 0.000301 Epoch: 1 Global Step: 24970 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:08:58,234-Speed 2496.70 samples/sec Loss 32.3080 LearningRate 0.000301 Epoch: 1 Global Step: 24980 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:09:06,440-Speed 2496.05 samples/sec Loss 32.3402 LearningRate 0.000301 Epoch: 1 Global Step: 24990 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:09:14,646-Speed 2495.94 samples/sec Loss 32.3176 LearningRate 0.000301 Epoch: 1 Global Step: 25000 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:09:22,850-Speed 2496.90 samples/sec Loss 32.3016 LearningRate 0.000301 Epoch: 1 Global Step: 25010 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:09:31,059-Speed 2495.04 samples/sec Loss 32.3873 LearningRate 0.000302 Epoch: 1 Global Step: 25020 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:09:39,213-Speed 2511.94 samples/sec Loss 32.2669 LearningRate 0.000302 Epoch: 1 Global Step: 25030 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:09:47,417-Speed 2496.99 samples/sec Loss 32.3031 LearningRate 0.000302 Epoch: 1 Global Step: 25040 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:09:55,628-Speed 2494.71 samples/sec Loss 32.2999 LearningRate 0.000302 Epoch: 1 Global Step: 25050 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:10:03,830-Speed 2497.25 samples/sec Loss 32.2248 LearningRate 0.000302 Epoch: 1 Global Step: 25060 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:10:12,035-Speed 2496.35 samples/sec Loss 32.2325 LearningRate 0.000302 Epoch: 1 Global Step: 25070 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:10:20,242-Speed 2495.69 
samples/sec Loss 32.3510 LearningRate 0.000302 Epoch: 1 Global Step: 25080 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:10:28,395-Speed 2512.58 samples/sec Loss 32.4733 LearningRate 0.000302 Epoch: 1 Global Step: 25090 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:10:36,597-Speed 2497.16 samples/sec Loss 32.2897 LearningRate 0.000303 Epoch: 1 Global Step: 25100 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:10:44,800-Speed 2497.27 samples/sec Loss 32.3391 LearningRate 0.000303 Epoch: 1 Global Step: 25110 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:10:53,012-Speed 2494.36 samples/sec Loss 32.3072 LearningRate 0.000303 Epoch: 1 Global Step: 25120 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:11:01,214-Speed 2497.13 samples/sec Loss 32.1994 LearningRate 0.000303 Epoch: 1 Global Step: 25130 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:11:09,420-Speed 2496.47 samples/sec Loss 32.1301 LearningRate 0.000303 Epoch: 1 Global Step: 25140 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:11:17,571-Speed 2513.06 samples/sec Loss 32.1906 LearningRate 0.000303 Epoch: 1 Global Step: 25150 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:11:25,779-Speed 2495.60 samples/sec Loss 32.2795 LearningRate 0.000303 Epoch: 1 Global Step: 25160 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:11:33,981-Speed 2497.48 samples/sec Loss 32.2456 LearningRate 0.000303 Epoch: 1 Global Step: 25170 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:11:42,184-Speed 2496.78 samples/sec Loss 32.2143 LearningRate 0.000304 Epoch: 1 Global Step: 25180 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:11:50,390-Speed 2496.70 samples/sec Loss 32.2657 LearningRate 0.000304 Epoch: 1 Global Step: 25190 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:11:58,592-Speed 2497.28 samples/sec Loss 32.3627 
LearningRate 0.000304 Epoch: 1 Global Step: 25200 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:12:06,742-Speed 2513.30 samples/sec Loss 32.3041 LearningRate 0.000304 Epoch: 1 Global Step: 25210 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:12:14,944-Speed 2497.20 samples/sec Loss 32.3859 LearningRate 0.000304 Epoch: 1 Global Step: 25220 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:12:23,147-Speed 2497.27 samples/sec Loss 32.2314 LearningRate 0.000304 Epoch: 1 Global Step: 25230 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:12:31,353-Speed 2496.01 samples/sec Loss 32.2180 LearningRate 0.000304 Epoch: 1 Global Step: 25240 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:12:39,557-Speed 2496.71 samples/sec Loss 32.3763 LearningRate 0.000304 Epoch: 1 Global Step: 25250 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:12:47,763-Speed 2496.06 samples/sec Loss 32.1571 LearningRate 0.000304 Epoch: 1 Global Step: 25260 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:12:55,914-Speed 2512.86 samples/sec Loss 32.3105 LearningRate 0.000305 Epoch: 1 Global Step: 25270 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:13:04,115-Speed 2497.58 samples/sec Loss 32.3711 LearningRate 0.000305 Epoch: 1 Global Step: 25280 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:13:12,319-Speed 2496.70 samples/sec Loss 32.2713 LearningRate 0.000305 Epoch: 1 Global Step: 25290 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:13:20,521-Speed 2497.35 samples/sec Loss 32.2114 LearningRate 0.000305 Epoch: 1 Global Step: 25300 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:13:28,722-Speed 2497.64 samples/sec Loss 32.1858 LearningRate 0.000305 Epoch: 1 Global Step: 25310 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:13:36,930-Speed 2495.43 samples/sec Loss 32.0999 LearningRate 0.000305 Epoch: 1 
Global Step: 25320 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:13:45,080-Speed 2513.35 samples/sec Loss 32.1120 LearningRate 0.000305 Epoch: 1 Global Step: 25330 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:13:53,288-Speed 2495.52 samples/sec Loss 32.0347 LearningRate 0.000305 Epoch: 1 Global Step: 25340 Fp16 Grad Scale: 512 Required: 184 hours Training: 2022-07-05 20:14:01,497-Speed 2495.18 samples/sec Loss 32.1290 LearningRate 0.000306 Epoch: 1 Global Step: 25350 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:14:09,715-Speed 2492.82 samples/sec Loss 31.8875 LearningRate 0.000306 Epoch: 1 Global Step: 25360 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:14:17,916-Speed 2497.40 samples/sec Loss 31.9380 LearningRate 0.000306 Epoch: 1 Global Step: 25370 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:14:26,120-Speed 2496.84 samples/sec Loss 32.0008 LearningRate 0.000306 Epoch: 1 Global Step: 25380 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:14:34,282-Speed 2509.76 samples/sec Loss 31.9027 LearningRate 0.000306 Epoch: 1 Global Step: 25390 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:14:42,489-Speed 2495.80 samples/sec Loss 31.9004 LearningRate 0.000306 Epoch: 1 Global Step: 25400 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:14:50,695-Speed 2496.33 samples/sec Loss 31.9455 LearningRate 0.000306 Epoch: 1 Global Step: 25410 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:14:58,900-Speed 2496.05 samples/sec Loss 31.9468 LearningRate 0.000306 Epoch: 1 Global Step: 25420 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:15:07,103-Speed 2497.21 samples/sec Loss 31.9077 LearningRate 0.000307 Epoch: 1 Global Step: 25430 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:15:15,311-Speed 2495.45 samples/sec Loss 31.9807 LearningRate 0.000307 Epoch: 1 Global Step: 25440 
Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:15:23,464-Speed 2512.37 samples/sec Loss 31.9171 LearningRate 0.000307 Epoch: 1 Global Step: 25450 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:15:31,677-Speed 2494.02 samples/sec Loss 31.8391 LearningRate 0.000307 Epoch: 1 Global Step: 25460 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:15:39,885-Speed 2495.36 samples/sec Loss 31.7531 LearningRate 0.000307 Epoch: 1 Global Step: 25470 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:15:48,088-Speed 2497.07 samples/sec Loss 31.8745 LearningRate 0.000307 Epoch: 1 Global Step: 25480 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:15:56,304-Speed 2492.94 samples/sec Loss 31.8558 LearningRate 0.000307 Epoch: 1 Global Step: 25490 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:16:04,520-Speed 2493.20 samples/sec Loss 31.7941 LearningRate 0.000307 Epoch: 1 Global Step: 25500 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:16:12,671-Speed 2513.55 samples/sec Loss 31.8529 LearningRate 0.000308 Epoch: 1 Global Step: 25510 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:16:20,875-Speed 2496.51 samples/sec Loss 31.7923 LearningRate 0.000308 Epoch: 1 Global Step: 25520 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:16:29,082-Speed 2495.86 samples/sec Loss 31.8603 LearningRate 0.000308 Epoch: 1 Global Step: 25530 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:16:37,287-Speed 2496.43 samples/sec Loss 31.7991 LearningRate 0.000308 Epoch: 1 Global Step: 25540 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:16:45,497-Speed 2495.02 samples/sec Loss 31.8380 LearningRate 0.000308 Epoch: 1 Global Step: 25550 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:16:53,702-Speed 2496.37 samples/sec Loss 31.8941 LearningRate 0.000308 Epoch: 1 Global Step: 25560 Fp16 Grad Scale: 
1024 Required: 184 hours Training: 2022-07-05 20:17:01,849-Speed 2514.07 samples/sec Loss 32.0415 LearningRate 0.000308 Epoch: 1 Global Step: 25570 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:17:10,072-Speed 2491.05 samples/sec Loss 31.9726 LearningRate 0.000308 Epoch: 1 Global Step: 25580 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:17:18,289-Speed 2492.91 samples/sec Loss 31.8153 LearningRate 0.000308 Epoch: 1 Global Step: 25590 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:17:26,494-Speed 2496.35 samples/sec Loss 31.9202 LearningRate 0.000309 Epoch: 1 Global Step: 25600 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:17:34,698-Speed 2496.85 samples/sec Loss 31.8010 LearningRate 0.000309 Epoch: 1 Global Step: 25610 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:17:42,915-Speed 2492.80 samples/sec Loss 31.7487 LearningRate 0.000309 Epoch: 1 Global Step: 25620 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:17:51,068-Speed 2512.05 samples/sec Loss 31.8118 LearningRate 0.000309 Epoch: 1 Global Step: 25630 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:17:59,285-Speed 2492.73 samples/sec Loss 31.7639 LearningRate 0.000309 Epoch: 1 Global Step: 25640 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:18:07,485-Speed 2497.93 samples/sec Loss 31.7145 LearningRate 0.000309 Epoch: 1 Global Step: 25650 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:18:15,701-Speed 2493.30 samples/sec Loss 31.6880 LearningRate 0.000309 Epoch: 1 Global Step: 25660 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:18:23,905-Speed 2496.46 samples/sec Loss 31.6492 LearningRate 0.000309 Epoch: 1 Global Step: 25670 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:18:32,110-Speed 2496.61 samples/sec Loss 31.6460 LearningRate 0.000310 Epoch: 1 Global Step: 25680 Fp16 Grad Scale: 1024 Required: 184 
hours Training: 2022-07-05 20:18:40,258-Speed 2513.69 samples/sec Loss 31.6598 LearningRate 0.000310 Epoch: 1 Global Step: 25690 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:18:48,465-Speed 2495.92 samples/sec Loss 31.6538 LearningRate 0.000310 Epoch: 1 Global Step: 25700 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:18:56,672-Speed 2495.58 samples/sec Loss 31.6522 LearningRate 0.000310 Epoch: 1 Global Step: 25710 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:19:04,880-Speed 2495.82 samples/sec Loss 31.5906 LearningRate 0.000310 Epoch: 1 Global Step: 25720 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:19:13,085-Speed 2496.44 samples/sec Loss 31.5828 LearningRate 0.000310 Epoch: 1 Global Step: 25730 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:19:21,289-Speed 2496.97 samples/sec Loss 31.6886 LearningRate 0.000310 Epoch: 1 Global Step: 25740 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:19:29,438-Speed 2513.58 samples/sec Loss 31.5538 LearningRate 0.000310 Epoch: 1 Global Step: 25750 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:19:37,642-Speed 2496.88 samples/sec Loss 31.5500 LearningRate 0.000311 Epoch: 1 Global Step: 25760 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:19:45,845-Speed 2497.06 samples/sec Loss 31.5494 LearningRate 0.000311 Epoch: 1 Global Step: 25770 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:19:54,049-Speed 2496.59 samples/sec Loss 31.5327 LearningRate 0.000311 Epoch: 1 Global Step: 25780 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:20:02,251-Speed 2497.25 samples/sec Loss 31.4678 LearningRate 0.000311 Epoch: 1 Global Step: 25790 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:20:10,456-Speed 2496.42 samples/sec Loss 31.4884 LearningRate 0.000311 Epoch: 1 Global Step: 25800 Fp16 Grad Scale: 1024 Required: 184 hours Training: 
2022-07-05 20:20:18,604-Speed 2513.85 samples/sec Loss 31.4919 LearningRate 0.000311 Epoch: 1 Global Step: 25810 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:20:26,808-Speed 2497.27 samples/sec Loss 31.5137 LearningRate 0.000311 Epoch: 1 Global Step: 25820 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:20:35,010-Speed 2497.27 samples/sec Loss 31.5362 LearningRate 0.000311 Epoch: 1 Global Step: 25830 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:20:43,217-Speed 2495.94 samples/sec Loss 31.4985 LearningRate 0.000311 Epoch: 1 Global Step: 25840 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:20:51,420-Speed 2497.06 samples/sec Loss 31.5277 LearningRate 0.000312 Epoch: 1 Global Step: 25850 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:20:59,623-Speed 2497.13 samples/sec Loss 31.5172 LearningRate 0.000312 Epoch: 1 Global Step: 25860 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:21:07,775-Speed 2512.55 samples/sec Loss 31.6680 LearningRate 0.000312 Epoch: 1 Global Step: 25870 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:21:15,977-Speed 2497.40 samples/sec Loss 31.5537 LearningRate 0.000312 Epoch: 1 Global Step: 25880 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:21:24,179-Speed 2497.33 samples/sec Loss 31.6955 LearningRate 0.000312 Epoch: 1 Global Step: 25890 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:21:32,385-Speed 2496.11 samples/sec Loss 31.6385 LearningRate 0.000312 Epoch: 1 Global Step: 25900 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:21:40,589-Speed 2496.70 samples/sec Loss 31.5793 LearningRate 0.000312 Epoch: 1 Global Step: 25910 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:21:48,789-Speed 2498.00 samples/sec Loss 31.4828 LearningRate 0.000312 Epoch: 1 Global Step: 25920 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 
20:21:56,942-Speed 2512.36 samples/sec Loss 31.5095 LearningRate 0.000313 Epoch: 1 Global Step: 25930 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:22:05,145-Speed 2497.13 samples/sec Loss 31.5200 LearningRate 0.000313 Epoch: 1 Global Step: 25940 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:22:13,349-Speed 2496.93 samples/sec Loss 31.5169 LearningRate 0.000313 Epoch: 1 Global Step: 25950 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:22:21,551-Speed 2497.44 samples/sec Loss 31.6015 LearningRate 0.000313 Epoch: 1 Global Step: 25960 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:22:29,756-Speed 2496.56 samples/sec Loss 31.4587 LearningRate 0.000313 Epoch: 1 Global Step: 25970 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:22:37,960-Speed 2496.50 samples/sec Loss 31.4085 LearningRate 0.000313 Epoch: 1 Global Step: 25980 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:22:46,113-Speed 2512.59 samples/sec Loss 31.3613 LearningRate 0.000313 Epoch: 1 Global Step: 25990 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:22:54,314-Speed 2497.48 samples/sec Loss 31.4802 LearningRate 0.000313 Epoch: 1 Global Step: 26000 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:23:02,516-Speed 2497.57 samples/sec Loss 31.2842 LearningRate 0.000314 Epoch: 1 Global Step: 26010 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:23:10,719-Speed 2496.87 samples/sec Loss 31.3569 LearningRate 0.000314 Epoch: 1 Global Step: 26020 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:23:18,920-Speed 2497.63 samples/sec Loss 31.3023 LearningRate 0.000314 Epoch: 1 Global Step: 26030 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:23:27,123-Speed 2497.00 samples/sec Loss 31.3123 LearningRate 0.000314 Epoch: 1 Global Step: 26040 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:23:35,357-Speed 
2487.66 samples/sec Loss 31.3162 LearningRate 0.000314 Epoch: 1 Global Step: 26050 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:23:43,561-Speed 2496.52 samples/sec Loss 31.2293 LearningRate 0.000314 Epoch: 1 Global Step: 26060 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:23:51,763-Speed 2497.42 samples/sec Loss 31.1915 LearningRate 0.000314 Epoch: 1 Global Step: 26070 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:23:59,983-Speed 2491.95 samples/sec Loss 31.2682 LearningRate 0.000314 Epoch: 1 Global Step: 26080 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:24:08,185-Speed 2497.30 samples/sec Loss 31.1435 LearningRate 0.000314 Epoch: 1 Global Step: 26090 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:24:16,385-Speed 2497.81 samples/sec Loss 31.1860 LearningRate 0.000315 Epoch: 1 Global Step: 26100 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:24:24,536-Speed 2513.09 samples/sec Loss 31.2498 LearningRate 0.000315 Epoch: 1 Global Step: 26110 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:24:32,751-Speed 2493.55 samples/sec Loss 31.2791 LearningRate 0.000315 Epoch: 1 Global Step: 26120 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:24:40,955-Speed 2496.53 samples/sec Loss 31.2550 LearningRate 0.000315 Epoch: 1 Global Step: 26130 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:24:49,157-Speed 2497.43 samples/sec Loss 31.2299 LearningRate 0.000315 Epoch: 1 Global Step: 26140 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:24:57,358-Speed 2497.55 samples/sec Loss 31.2803 LearningRate 0.000315 Epoch: 1 Global Step: 26150 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:25:05,565-Speed 2495.83 samples/sec Loss 31.1884 LearningRate 0.000315 Epoch: 1 Global Step: 26160 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:25:13,713-Speed 2513.73 samples/sec 
Loss 31.2313 LearningRate 0.000315 Epoch: 1 Global Step: 26170 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:25:21,917-Speed 2496.80 samples/sec Loss 31.2780 LearningRate 0.000316 Epoch: 1 Global Step: 26180 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:25:30,121-Speed 2497.02 samples/sec Loss 31.2994 LearningRate 0.000316 Epoch: 1 Global Step: 26190 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:25:38,325-Speed 2497.07 samples/sec Loss 31.2858 LearningRate 0.000316 Epoch: 1 Global Step: 26200 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:25:46,524-Speed 2498.07 samples/sec Loss 31.3007 LearningRate 0.000316 Epoch: 1 Global Step: 26210 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:25:54,726-Speed 2497.31 samples/sec Loss 31.2867 LearningRate 0.000316 Epoch: 1 Global Step: 26220 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:26:02,887-Speed 2510.57 samples/sec Loss 31.2132 LearningRate 0.000316 Epoch: 1 Global Step: 26230 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:26:11,092-Speed 2496.58 samples/sec Loss 31.1798 LearningRate 0.000316 Epoch: 1 Global Step: 26240 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:26:19,300-Speed 2495.37 samples/sec Loss 31.2423 LearningRate 0.000316 Epoch: 1 Global Step: 26250 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:26:27,505-Speed 2496.57 samples/sec Loss 31.2544 LearningRate 0.000317 Epoch: 1 Global Step: 26260 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:26:35,707-Speed 2497.03 samples/sec Loss 31.2072 LearningRate 0.000317 Epoch: 1 Global Step: 26270 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:26:43,909-Speed 2497.58 samples/sec Loss 31.2484 LearningRate 0.000317 Epoch: 1 Global Step: 26280 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:26:52,062-Speed 2512.32 samples/sec Loss 31.1424 
LearningRate 0.000317 Epoch: 1 Global Step: 26290 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:27:00,268-Speed 2496.22 samples/sec Loss 31.1398 LearningRate 0.000317 Epoch: 1 Global Step: 26300 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:27:08,478-Speed 2495.68 samples/sec Loss 31.2081 LearningRate 0.000317 Epoch: 1 Global Step: 26310 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:27:16,678-Speed 2497.71 samples/sec Loss 31.0531 LearningRate 0.000317 Epoch: 1 Global Step: 26320 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:27:24,882-Speed 2496.75 samples/sec Loss 31.1151 LearningRate 0.000317 Epoch: 1 Global Step: 26330 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:27:33,083-Speed 2497.84 samples/sec Loss 31.0539 LearningRate 0.000318 Epoch: 1 Global Step: 26340 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:27:41,231-Speed 2513.58 samples/sec Loss 31.0767 LearningRate 0.000318 Epoch: 1 Global Step: 26350 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:27:49,436-Speed 2496.42 samples/sec Loss 30.9893 LearningRate 0.000318 Epoch: 1 Global Step: 26360 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:27:57,645-Speed 2495.27 samples/sec Loss 31.0701 LearningRate 0.000318 Epoch: 1 Global Step: 26370 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:28:05,852-Speed 2496.09 samples/sec Loss 30.9919 LearningRate 0.000318 Epoch: 1 Global Step: 26380 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:28:14,054-Speed 2497.40 samples/sec Loss 31.0262 LearningRate 0.000318 Epoch: 1 Global Step: 26390 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:28:22,256-Speed 2497.10 samples/sec Loss 31.0711 LearningRate 0.000318 Epoch: 1 Global Step: 26400 Fp16 Grad Scale: 1024 Required: 184 hours Training: 2022-07-05 20:28:30,405-Speed 2513.70 samples/sec Loss 31.0424 LearningRate 
0.000318 Epoch: 1 Global Step: 26410 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:28:38,629-Speed 2490.73 samples/sec Loss 30.9536 LearningRate 0.000318 Epoch: 1 Global Step: 26420 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:28:46,836-Speed 2495.73 samples/sec Loss 30.9925 LearningRate 0.000319 Epoch: 1 Global Step: 26430 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:28:55,039-Speed 2497.04 samples/sec Loss 30.9742 LearningRate 0.000319 Epoch: 1 Global Step: 26440 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:29:03,240-Speed 2497.53 samples/sec Loss 30.9151 LearningRate 0.000319 Epoch: 1 Global Step: 26450 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:29:11,445-Speed 2496.67 samples/sec Loss 30.9545 LearningRate 0.000319 Epoch: 1 Global Step: 26460 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:29:19,594-Speed 2513.71 samples/sec Loss 30.9654 LearningRate 0.000319 Epoch: 1 Global Step: 26470 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:29:27,797-Speed 2497.07 samples/sec Loss 30.8525 LearningRate 0.000319 Epoch: 1 Global Step: 26480 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:29:35,999-Speed 2497.24 samples/sec Loss 31.0210 LearningRate 0.000319 Epoch: 1 Global Step: 26490 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:29:44,200-Speed 2497.70 samples/sec Loss 30.9371 LearningRate 0.000319 Epoch: 1 Global Step: 26500 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:29:52,402-Speed 2497.60 samples/sec Loss 31.0277 LearningRate 0.000320 Epoch: 1 Global Step: 26510 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:30:00,620-Speed 2492.25 samples/sec Loss 30.9085 LearningRate 0.000320 Epoch: 1 Global Step: 26520 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:30:08,768-Speed 2514.04 samples/sec Loss 30.8710 LearningRate 0.000320 Epoch: 1 
Global Step: 26530 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:30:16,970-Speed 2497.35 samples/sec Loss 30.8444 LearningRate 0.000320 Epoch: 1 Global Step: 26540 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:30:25,171-Speed 2497.64 samples/sec Loss 30.8210 LearningRate 0.000320 Epoch: 1 Global Step: 26550 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:30:33,376-Speed 2496.44 samples/sec Loss 30.8558 LearningRate 0.000320 Epoch: 1 Global Step: 26560 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:30:41,578-Speed 2497.30 samples/sec Loss 30.7881 LearningRate 0.000320 Epoch: 1 Global Step: 26570 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:30:49,779-Speed 2497.51 samples/sec Loss 30.8394 LearningRate 0.000320 Epoch: 1 Global Step: 26580 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:30:57,927-Speed 2513.82 samples/sec Loss 30.7659 LearningRate 0.000321 Epoch: 1 Global Step: 26590 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:31:06,130-Speed 2496.97 samples/sec Loss 30.8637 LearningRate 0.000321 Epoch: 1 Global Step: 26600 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:31:14,337-Speed 2495.94 samples/sec Loss 30.7294 LearningRate 0.000321 Epoch: 1 Global Step: 26610 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:31:22,576-Speed 2486.22 samples/sec Loss 30.7988 LearningRate 0.000321 Epoch: 1 Global Step: 26620 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:31:30,776-Speed 2497.95 samples/sec Loss 30.7356 LearningRate 0.000321 Epoch: 1 Global Step: 26630 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:31:38,977-Speed 2497.41 samples/sec Loss 30.6528 LearningRate 0.000321 Epoch: 1 Global Step: 26640 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:31:47,126-Speed 2514.00 samples/sec Loss 30.8588 LearningRate 0.000321 Epoch: 1 Global Step: 26650 
Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:31:55,333-Speed 2495.84 samples/sec Loss 30.6689 LearningRate 0.000321 Epoch: 1 Global Step: 26660 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:32:03,538-Speed 2496.64 samples/sec Loss 30.7829 LearningRate 0.000321 Epoch: 1 Global Step: 26670 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:32:11,746-Speed 2495.56 samples/sec Loss 30.6658 LearningRate 0.000322 Epoch: 1 Global Step: 26680 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:32:19,972-Speed 2490.19 samples/sec Loss 30.6534 LearningRate 0.000322 Epoch: 1 Global Step: 26690 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:32:28,176-Speed 2496.53 samples/sec Loss 30.5873 LearningRate 0.000322 Epoch: 1 Global Step: 26700 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:32:36,326-Speed 2513.19 samples/sec Loss 30.7319 LearningRate 0.000322 Epoch: 1 Global Step: 26710 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:32:44,528-Speed 2497.56 samples/sec Loss 30.6228 LearningRate 0.000322 Epoch: 1 Global Step: 26720 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:32:52,730-Speed 2497.45 samples/sec Loss 30.7110 LearningRate 0.000322 Epoch: 1 Global Step: 26730 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:33:00,930-Speed 2497.86 samples/sec Loss 30.7899 LearningRate 0.000322 Epoch: 1 Global Step: 26740 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:33:09,135-Speed 2496.48 samples/sec Loss 30.7135 LearningRate 0.000322 Epoch: 1 Global Step: 26750 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:33:17,334-Speed 2498.41 samples/sec Loss 30.6589 LearningRate 0.000323 Epoch: 1 Global Step: 26760 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:33:25,486-Speed 2512.54 samples/sec Loss 30.5798 LearningRate 0.000323 Epoch: 1 Global Step: 26770 Fp16 Grad Scale: 
2048 Required: 183 hours Training: 2022-07-05 20:33:33,689-Speed 2497.04 samples/sec Loss 30.5297 LearningRate 0.000323 Epoch: 1 Global Step: 26780 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:33:41,890-Speed 2497.63 samples/sec Loss 30.5115 LearningRate 0.000323 Epoch: 1 Global Step: 26790 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:33:50,091-Speed 2497.78 samples/sec Loss 30.5553 LearningRate 0.000323 Epoch: 1 Global Step: 26800 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:33:58,292-Speed 2497.55 samples/sec Loss 30.6917 LearningRate 0.000323 Epoch: 1 Global Step: 26810 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:34:06,493-Speed 2497.45 samples/sec Loss 30.5827 LearningRate 0.000323 Epoch: 1 Global Step: 26820 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:34:14,640-Speed 2514.29 samples/sec Loss 30.6131 LearningRate 0.000323 Epoch: 1 Global Step: 26830 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:34:22,844-Speed 2496.99 samples/sec Loss 30.5362 LearningRate 0.000324 Epoch: 1 Global Step: 26840 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:34:31,044-Speed 2497.81 samples/sec Loss 30.4713 LearningRate 0.000324 Epoch: 1 Global Step: 26850 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:34:39,245-Speed 2497.84 samples/sec Loss 30.5630 LearningRate 0.000324 Epoch: 1 Global Step: 26860 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:34:47,449-Speed 2496.86 samples/sec Loss 30.5452 LearningRate 0.000324 Epoch: 1 Global Step: 26870 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:34:55,652-Speed 2497.22 samples/sec Loss 30.4815 LearningRate 0.000324 Epoch: 1 Global Step: 26880 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:35:03,798-Speed 2514.27 samples/sec Loss 30.4300 LearningRate 0.000324 Epoch: 1 Global Step: 26890 Fp16 Grad Scale: 2048 Required: 183 
hours Training: 2022-07-05 20:35:11,997-Speed 2498.23 samples/sec Loss 30.4307 LearningRate 0.000324 Epoch: 1 Global Step: 26900 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:35:20,199-Speed 2497.24 samples/sec Loss 30.4132 LearningRate 0.000324 Epoch: 1 Global Step: 26910 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:35:28,401-Speed 2497.81 samples/sec Loss 30.4142 LearningRate 0.000324 Epoch: 1 Global Step: 26920 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:35:36,604-Speed 2497.02 samples/sec Loss 30.3449 LearningRate 0.000325 Epoch: 1 Global Step: 26930 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:35:44,807-Speed 2497.19 samples/sec Loss 30.4174 LearningRate 0.000325 Epoch: 1 Global Step: 26940 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:35:52,958-Speed 2513.08 samples/sec Loss 30.3344 LearningRate 0.000325 Epoch: 1 Global Step: 26950 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:36:01,163-Speed 2496.34 samples/sec Loss 30.3410 LearningRate 0.000325 Epoch: 1 Global Step: 26960 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:36:09,369-Speed 2496.18 samples/sec Loss 30.4122 LearningRate 0.000325 Epoch: 1 Global Step: 26970 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:36:17,573-Speed 2496.52 samples/sec Loss 30.3758 LearningRate 0.000325 Epoch: 1 Global Step: 26980 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:36:25,780-Speed 2496.13 samples/sec Loss 30.2991 LearningRate 0.000325 Epoch: 1 Global Step: 26990 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:36:33,988-Speed 2495.52 samples/sec Loss 30.3205 LearningRate 0.000325 Epoch: 1 Global Step: 27000 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:36:42,138-Speed 2513.54 samples/sec Loss 30.2299 LearningRate 0.000326 Epoch: 1 Global Step: 27010 Fp16 Grad Scale: 2048 Required: 183 hours Training: 
2022-07-05 20:36:50,338-Speed 2497.60 samples/sec Loss 30.2163 LearningRate 0.000326 Epoch: 1 Global Step: 27020 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:36:58,541-Speed 2497.12 samples/sec Loss 30.2621 LearningRate 0.000326 Epoch: 1 Global Step: 27030 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:37:06,742-Speed 2497.86 samples/sec Loss 30.2358 LearningRate 0.000326 Epoch: 1 Global Step: 27040 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:37:14,948-Speed 2496.12 samples/sec Loss 30.2532 LearningRate 0.000326 Epoch: 1 Global Step: 27050 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:37:23,155-Speed 2495.79 samples/sec Loss 30.2940 LearningRate 0.000326 Epoch: 1 Global Step: 27060 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:37:31,313-Speed 2510.87 samples/sec Loss 30.2035 LearningRate 0.000326 Epoch: 1 Global Step: 27070 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:37:39,515-Speed 2497.15 samples/sec Loss 30.2181 LearningRate 0.000326 Epoch: 1 Global Step: 27080 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:37:47,717-Speed 2497.42 samples/sec Loss 30.1548 LearningRate 0.000327 Epoch: 1 Global Step: 27090 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:37:55,920-Speed 2496.76 samples/sec Loss 30.2818 LearningRate 0.000327 Epoch: 1 Global Step: 27100 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:38:04,124-Speed 2497.17 samples/sec Loss 30.1641 LearningRate 0.000327 Epoch: 1 Global Step: 27110 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:38:12,325-Speed 2497.60 samples/sec Loss 30.1799 LearningRate 0.000327 Epoch: 1 Global Step: 27120 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:38:20,472-Speed 2514.39 samples/sec Loss 30.1598 LearningRate 0.000327 Epoch: 1 Global Step: 27130 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 
20:38:28,674-Speed 2497.06 samples/sec Loss 30.1113 LearningRate 0.000327 Epoch: 1 Global Step: 27140 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:38:36,879-Speed 2496.35 samples/sec Loss 30.1386 LearningRate 0.000327 Epoch: 1 Global Step: 27150 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:38:45,088-Speed 2495.08 samples/sec Loss 30.1045 LearningRate 0.000327 Epoch: 1 Global Step: 27160 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:38:53,299-Speed 2494.67 samples/sec Loss 30.0804 LearningRate 0.000328 Epoch: 1 Global Step: 27170 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:39:01,500-Speed 2497.54 samples/sec Loss 30.0966 LearningRate 0.000328 Epoch: 1 Global Step: 27180 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:39:09,652-Speed 2512.63 samples/sec Loss 30.1466 LearningRate 0.000328 Epoch: 1 Global Step: 27190 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:39:17,854-Speed 2497.31 samples/sec Loss 30.0636 LearningRate 0.000328 Epoch: 1 Global Step: 27200 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:39:26,056-Speed 2497.35 samples/sec Loss 30.1459 LearningRate 0.000328 Epoch: 1 Global Step: 27210 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:39:34,258-Speed 2497.44 samples/sec Loss 29.9836 LearningRate 0.000328 Epoch: 1 Global Step: 27220 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:39:42,462-Speed 2496.61 samples/sec Loss 30.1219 LearningRate 0.000328 Epoch: 1 Global Step: 27230 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:39:50,665-Speed 2497.18 samples/sec Loss 30.0754 LearningRate 0.000328 Epoch: 1 Global Step: 27240 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:39:58,828-Speed 2509.19 samples/sec Loss 30.0896 LearningRate 0.000328 Epoch: 1 Global Step: 27250 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:40:07,031-Speed 
2497.16 samples/sec Loss 30.1017 LearningRate 0.000329 Epoch: 1 Global Step: 27260 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:40:15,233-Speed 2497.56 samples/sec Loss 29.9932 LearningRate 0.000329 Epoch: 1 Global Step: 27270 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:40:23,441-Speed 2495.57 samples/sec Loss 30.0921 LearningRate 0.000329 Epoch: 1 Global Step: 27280 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:40:31,640-Speed 2498.19 samples/sec Loss 30.0118 LearningRate 0.000329 Epoch: 1 Global Step: 27290 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:40:39,843-Speed 2497.05 samples/sec Loss 29.9326 LearningRate 0.000329 Epoch: 1 Global Step: 27300 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:40:47,993-Speed 2512.95 samples/sec Loss 30.0409 LearningRate 0.000329 Epoch: 1 Global Step: 27310 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:40:56,195-Speed 2497.53 samples/sec Loss 29.9835 LearningRate 0.000329 Epoch: 1 Global Step: 27320 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:41:04,401-Speed 2496.61 samples/sec Loss 30.0777 LearningRate 0.000329 Epoch: 1 Global Step: 27330 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:41:12,600-Speed 2498.10 samples/sec Loss 29.9482 LearningRate 0.000330 Epoch: 1 Global Step: 27340 Fp16 Grad Scale: 2048 Required: 183 hours Training: 2022-07-05 20:41:20,758-Speed 2510.74 samples/sec Loss 29.9571 LearningRate 0.000330 Epoch: 1 Global Step: 27350 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:41:28,959-Speed 2497.68 samples/sec Loss 29.9819 LearningRate 0.000330 Epoch: 1 Global Step: 27360 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:41:37,107-Speed 2513.97 samples/sec Loss 30.0224 LearningRate 0.000330 Epoch: 1 Global Step: 27370 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:41:45,311-Speed 2496.87 samples/sec 
Loss 30.0541 LearningRate 0.000330 Epoch: 1 Global Step: 27380 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:41:53,511-Speed 2497.78 samples/sec Loss 29.9730 LearningRate 0.000330 Epoch: 1 Global Step: 27390 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:42:01,721-Speed 2495.01 samples/sec Loss 29.9657 LearningRate 0.000330 Epoch: 1 Global Step: 27400 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:42:09,919-Speed 2498.58 samples/sec Loss 29.8557 LearningRate 0.000330 Epoch: 1 Global Step: 27410 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:42:18,122-Speed 2496.93 samples/sec Loss 29.9007 LearningRate 0.000331 Epoch: 1 Global Step: 27420 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:42:26,272-Speed 2513.45 samples/sec Loss 29.8942 LearningRate 0.000331 Epoch: 1 Global Step: 27430 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:42:34,476-Speed 2496.81 samples/sec Loss 29.9213 LearningRate 0.000331 Epoch: 1 Global Step: 27440 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:42:42,676-Speed 2498.03 samples/sec Loss 29.8932 LearningRate 0.000331 Epoch: 1 Global Step: 27450 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:42:50,878-Speed 2497.41 samples/sec Loss 29.8443 LearningRate 0.000331 Epoch: 1 Global Step: 27460 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:42:59,081-Speed 2496.88 samples/sec Loss 29.8200 LearningRate 0.000331 Epoch: 1 Global Step: 27470 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:43:07,293-Speed 2494.41 samples/sec Loss 29.8007 LearningRate 0.000331 Epoch: 1 Global Step: 27480 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:43:15,442-Speed 2513.42 samples/sec Loss 29.8878 LearningRate 0.000331 Epoch: 1 Global Step: 27490 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:43:23,642-Speed 2498.41 samples/sec Loss 29.7874 
LearningRate 0.000331 Epoch: 1 Global Step: 27500 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:43:31,844-Speed 2497.29 samples/sec Loss 29.7341 LearningRate 0.000332 Epoch: 1 Global Step: 27510 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:43:40,043-Speed 2498.30 samples/sec Loss 29.8105 LearningRate 0.000332 Epoch: 1 Global Step: 27520 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:43:48,244-Speed 2497.51 samples/sec Loss 29.7515 LearningRate 0.000332 Epoch: 1 Global Step: 27530 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:43:56,471-Speed 2489.64 samples/sec Loss 29.7902 LearningRate 0.000332 Epoch: 1 Global Step: 27540 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:44:04,620-Speed 2513.72 samples/sec Loss 29.7458 LearningRate 0.000332 Epoch: 1 Global Step: 27550 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:44:12,819-Speed 2498.14 samples/sec Loss 29.7302 LearningRate 0.000332 Epoch: 1 Global Step: 27560 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:44:21,019-Speed 2497.98 samples/sec Loss 29.6791 LearningRate 0.000332 Epoch: 1 Global Step: 27570 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:44:29,217-Speed 2498.51 samples/sec Loss 29.6240 LearningRate 0.000332 Epoch: 1 Global Step: 27580 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:44:37,417-Speed 2498.11 samples/sec Loss 29.6587 LearningRate 0.000333 Epoch: 1 Global Step: 27590 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:44:45,616-Speed 2498.05 samples/sec Loss 29.6129 LearningRate 0.000333 Epoch: 1 Global Step: 27600 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:44:53,762-Speed 2514.93 samples/sec Loss 29.6054 LearningRate 0.000333 Epoch: 1 Global Step: 27610 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:45:01,964-Speed 2497.35 samples/sec Loss 29.5513 LearningRate 
0.000333 Epoch: 1 Global Step: 27620 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:45:10,165-Speed 2497.77 samples/sec Loss 29.5792 LearningRate 0.000333 Epoch: 1 Global Step: 27630 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:45:18,363-Speed 2498.46 samples/sec Loss 29.6325 LearningRate 0.000333 Epoch: 1 Global Step: 27640 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:45:26,563-Speed 2498.10 samples/sec Loss 29.6829 LearningRate 0.000333 Epoch: 1 Global Step: 27650 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:45:34,767-Speed 2496.52 samples/sec Loss 29.5761 LearningRate 0.000333 Epoch: 1 Global Step: 27660 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:45:42,913-Speed 2514.62 samples/sec Loss 29.5984 LearningRate 0.000334 Epoch: 1 Global Step: 27670 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:45:51,127-Speed 2493.82 samples/sec Loss 29.5692 LearningRate 0.000334 Epoch: 1 Global Step: 27680 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:45:59,329-Speed 2497.72 samples/sec Loss 29.6242 LearningRate 0.000334 Epoch: 1 Global Step: 27690 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:46:07,528-Speed 2498.05 samples/sec Loss 29.4023 LearningRate 0.000334 Epoch: 1 Global Step: 27700 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:46:15,728-Speed 2498.08 samples/sec Loss 29.4366 LearningRate 0.000334 Epoch: 1 Global Step: 27710 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:46:23,929-Speed 2497.58 samples/sec Loss 29.6083 LearningRate 0.000334 Epoch: 1 Global Step: 27720 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:46:32,079-Speed 2513.55 samples/sec Loss 29.5325 LearningRate 0.000334 Epoch: 1 Global Step: 27730 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:46:40,275-Speed 2499.18 samples/sec Loss 29.5091 LearningRate 0.000334 Epoch: 1 
Global Step: 27740 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:46:48,473-Speed 2498.56 samples/sec Loss 29.4248 LearningRate 0.000335 Epoch: 1 Global Step: 27750 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:46:56,673-Speed 2498.03 samples/sec Loss 29.5112 LearningRate 0.000335 Epoch: 1 Global Step: 27760 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:47:04,872-Speed 2498.36 samples/sec Loss 29.3756 LearningRate 0.000335 Epoch: 1 Global Step: 27770 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:47:13,074-Speed 2497.50 samples/sec Loss 29.3503 LearningRate 0.000335 Epoch: 1 Global Step: 27780 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:47:21,224-Speed 2513.58 samples/sec Loss 29.4400 LearningRate 0.000335 Epoch: 1 Global Step: 27790 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:47:29,424-Speed 2497.95 samples/sec Loss 29.4286 LearningRate 0.000335 Epoch: 1 Global Step: 27800 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:47:37,624-Speed 2497.68 samples/sec Loss 29.3852 LearningRate 0.000335 Epoch: 1 Global Step: 27810 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:47:45,824-Speed 2498.15 samples/sec Loss 29.3731 LearningRate 0.000335 Epoch: 1 Global Step: 27820 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:47:54,026-Speed 2497.09 samples/sec Loss 29.3896 LearningRate 0.000335 Epoch: 1 Global Step: 27830 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:48:02,230-Speed 2497.14 samples/sec Loss 29.3491 LearningRate 0.000336 Epoch: 1 Global Step: 27840 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:48:10,376-Speed 2514.74 samples/sec Loss 29.3639 LearningRate 0.000336 Epoch: 1 Global Step: 27850 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:48:18,583-Speed 2495.71 samples/sec Loss 29.3742 LearningRate 0.000336 Epoch: 1 Global Step: 27860 
Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:48:26,792-Speed 2495.29 samples/sec Loss 29.3637 LearningRate 0.000336 Epoch: 1 Global Step: 27870 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:48:34,991-Speed 2498.34 samples/sec Loss 29.2260 LearningRate 0.000336 Epoch: 1 Global Step: 27880 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:48:43,187-Speed 2499.02 samples/sec Loss 29.2895 LearningRate 0.000336 Epoch: 1 Global Step: 27890 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:48:51,388-Speed 2498.15 samples/sec Loss 29.1769 LearningRate 0.000336 Epoch: 1 Global Step: 27900 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:48:59,546-Speed 2511.08 samples/sec Loss 29.3520 LearningRate 0.000336 Epoch: 1 Global Step: 27910 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:49:07,751-Speed 2496.31 samples/sec Loss 29.3577 LearningRate 0.000337 Epoch: 1 Global Step: 27920 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:49:15,953-Speed 2497.31 samples/sec Loss 29.3001 LearningRate 0.000337 Epoch: 1 Global Step: 27930 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:49:24,155-Speed 2497.40 samples/sec Loss 29.2708 LearningRate 0.000337 Epoch: 1 Global Step: 27940 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:49:32,360-Speed 2496.58 samples/sec Loss 29.2581 LearningRate 0.000337 Epoch: 1 Global Step: 27950 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:49:40,562-Speed 2497.39 samples/sec Loss 29.1600 LearningRate 0.000337 Epoch: 1 Global Step: 27960 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:49:48,709-Speed 2514.10 samples/sec Loss 29.2612 LearningRate 0.000337 Epoch: 1 Global Step: 27970 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:49:56,918-Speed 2495.24 samples/sec Loss 29.3130 LearningRate 0.000337 Epoch: 1 Global Step: 27980 Fp16 Grad Scale: 
1024 Required: 183 hours Training: 2022-07-05 20:50:05,118-Speed 2498.04 samples/sec Loss 29.2006 LearningRate 0.000337 Epoch: 1 Global Step: 27990 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:50:13,320-Speed 2497.23 samples/sec Loss 29.2437 LearningRate 0.000338 Epoch: 1 Global Step: 28000 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:50:21,519-Speed 2498.29 samples/sec Loss 29.2213 LearningRate 0.000338 Epoch: 1 Global Step: 28010 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:50:29,718-Speed 2498.24 samples/sec Loss 29.3488 LearningRate 0.000338 Epoch: 1 Global Step: 28020 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:50:37,862-Speed 2515.04 samples/sec Loss 29.1741 LearningRate 0.000338 Epoch: 1 Global Step: 28030 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:50:46,064-Speed 2497.42 samples/sec Loss 29.2115 LearningRate 0.000338 Epoch: 1 Global Step: 28040 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:50:54,266-Speed 2497.25 samples/sec Loss 29.1142 LearningRate 0.000338 Epoch: 1 Global Step: 28050 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:51:02,467-Speed 2497.65 samples/sec Loss 29.2017 LearningRate 0.000338 Epoch: 1 Global Step: 28060 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:51:10,687-Speed 2492.07 samples/sec Loss 29.2081 LearningRate 0.000338 Epoch: 1 Global Step: 28070 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:51:18,889-Speed 2497.38 samples/sec Loss 29.0958 LearningRate 0.000338 Epoch: 1 Global Step: 28080 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:51:27,037-Speed 2513.61 samples/sec Loss 29.1947 LearningRate 0.000339 Epoch: 1 Global Step: 28090 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:51:35,241-Speed 2496.72 samples/sec Loss 29.2508 LearningRate 0.000339 Epoch: 1 Global Step: 28100 Fp16 Grad Scale: 1024 Required: 183 
hours Training: 2022-07-05 20:51:43,453-Speed 2494.63 samples/sec Loss 29.2206 LearningRate 0.000339 Epoch: 1 Global Step: 28110 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:51:51,652-Speed 2497.96 samples/sec Loss 29.1314 LearningRate 0.000339 Epoch: 1 Global Step: 28120 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:51:59,857-Speed 2496.78 samples/sec Loss 29.0416 LearningRate 0.000339 Epoch: 1 Global Step: 28130 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:52:08,056-Speed 2498.20 samples/sec Loss 29.0431 LearningRate 0.000339 Epoch: 1 Global Step: 28140 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:52:16,206-Speed 2513.42 samples/sec Loss 29.0474 LearningRate 0.000339 Epoch: 1 Global Step: 28150 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:52:24,404-Speed 2498.72 samples/sec Loss 29.0061 LearningRate 0.000339 Epoch: 1 Global Step: 28160 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:52:32,605-Speed 2497.62 samples/sec Loss 29.0683 LearningRate 0.000340 Epoch: 1 Global Step: 28170 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:52:40,812-Speed 2495.58 samples/sec Loss 29.1192 LearningRate 0.000340 Epoch: 1 Global Step: 28180 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:52:49,024-Speed 2494.82 samples/sec Loss 29.0261 LearningRate 0.000340 Epoch: 1 Global Step: 28190 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:52:57,227-Speed 2497.02 samples/sec Loss 29.0714 LearningRate 0.000340 Epoch: 1 Global Step: 28200 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:53:05,372-Speed 2514.77 samples/sec Loss 28.9373 LearningRate 0.000340 Epoch: 1 Global Step: 28210 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:53:13,574-Speed 2497.18 samples/sec Loss 28.9422 LearningRate 0.000340 Epoch: 1 Global Step: 28220 Fp16 Grad Scale: 1024 Required: 183 hours Training: 
2022-07-05 20:53:21,777-Speed 2497.19 samples/sec Loss 28.9157 LearningRate 0.000340 Epoch: 1 Global Step: 28230 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:53:29,977-Speed 2497.81 samples/sec Loss 28.8910 LearningRate 0.000340 Epoch: 1 Global Step: 28240 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:53:38,178-Speed 2497.62 samples/sec Loss 28.8897 LearningRate 0.000341 Epoch: 1 Global Step: 28250 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:53:46,380-Speed 2497.74 samples/sec Loss 28.8443 LearningRate 0.000341 Epoch: 1 Global Step: 28260 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:53:54,524-Speed 2514.80 samples/sec Loss 28.8871 LearningRate 0.000341 Epoch: 1 Global Step: 28270 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:54:02,727-Speed 2497.06 samples/sec Loss 28.8191 LearningRate 0.000341 Epoch: 1 Global Step: 28280 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:54:10,928-Speed 2497.60 samples/sec Loss 28.8069 LearningRate 0.000341 Epoch: 1 Global Step: 28290 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:54:19,130-Speed 2497.41 samples/sec Loss 28.9418 LearningRate 0.000341 Epoch: 1 Global Step: 28300 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:54:27,340-Speed 2494.93 samples/sec Loss 28.8509 LearningRate 0.000341 Epoch: 1 Global Step: 28310 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:54:35,539-Speed 2498.16 samples/sec Loss 28.8659 LearningRate 0.000341 Epoch: 1 Global Step: 28320 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:54:43,688-Speed 2513.83 samples/sec Loss 28.7731 LearningRate 0.000341 Epoch: 1 Global Step: 28330 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:54:51,887-Speed 2498.08 samples/sec Loss 28.8055 LearningRate 0.000342 Epoch: 1 Global Step: 28340 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 
20:55:00,090-Speed 2497.04 samples/sec Loss 28.8517 LearningRate 0.000342 Epoch: 1 Global Step: 28350 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:55:08,293-Speed 2496.98 samples/sec Loss 28.8414 LearningRate 0.000342 Epoch: 1 Global Step: 28360 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:55:16,494-Speed 2497.62 samples/sec Loss 28.8625 LearningRate 0.000342 Epoch: 1 Global Step: 28370 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:55:24,698-Speed 2496.72 samples/sec Loss 28.8276 LearningRate 0.000342 Epoch: 1 Global Step: 28380 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:55:32,850-Speed 2512.63 samples/sec Loss 28.8251 LearningRate 0.000342 Epoch: 1 Global Step: 28390 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:55:41,052-Speed 2497.25 samples/sec Loss 28.8405 LearningRate 0.000342 Epoch: 1 Global Step: 28400 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:55:49,254-Speed 2497.53 samples/sec Loss 28.7442 LearningRate 0.000342 Epoch: 1 Global Step: 28410 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:55:57,456-Speed 2497.67 samples/sec Loss 28.7313 LearningRate 0.000343 Epoch: 1 Global Step: 28420 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:56:05,662-Speed 2495.96 samples/sec Loss 28.6398 LearningRate 0.000343 Epoch: 1 Global Step: 28430 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:56:13,865-Speed 2497.16 samples/sec Loss 28.7498 LearningRate 0.000343 Epoch: 1 Global Step: 28440 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:56:22,012-Speed 2514.05 samples/sec Loss 28.7194 LearningRate 0.000343 Epoch: 1 Global Step: 28450 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:56:30,209-Speed 2498.94 samples/sec Loss 28.6728 LearningRate 0.000343 Epoch: 1 Global Step: 28460 Fp16 Grad Scale: 1024 Required: 183 hours Training: 2022-07-05 20:56:38,419-Speed 
2495.01 samples/sec Loss 28.7607 LearningRate 0.000343 Epoch: 1 Global Step: 28470 Fp16 Grad Scale: 1024 Required: 183 hours
Training: 2022-07-05 20:56:46,616-Speed 2499.10 samples/sec Loss 28.7300 LearningRate 0.000343 Epoch: 1 Global Step: 28480 Fp16 Grad Scale: 1024 Required: 183 hours
Training: 2022-07-05 20:56:54,811-Speed 2499.71 samples/sec Loss 28.6328 LearningRate 0.000343 Epoch: 1 Global Step: 28490 Fp16 Grad Scale: 1024 Required: 183 hours
Training: 2022-07-05 20:57:03,010-Speed 2498.15 samples/sec Loss 28.6041 LearningRate 0.000344 Epoch: 1 Global Step: 28500 Fp16 Grad Scale: 1024 Required: 183 hours
Training: 2022-07-05 20:57:11,176-Speed 2508.32 samples/sec Loss 28.5482 LearningRate 0.000344 Epoch: 1 Global Step: 28510 Fp16 Grad Scale: 1024 Required: 183 hours
Training: 2022-07-05 20:57:19,373-Speed 2498.91 samples/sec Loss 28.6024 LearningRate 0.000344 Epoch: 1 Global Step: 28520 Fp16 Grad Scale: 1024 Required: 183 hours
Training: 2022-07-05 20:57:27,577-Speed 2496.77 samples/sec Loss 28.6622 LearningRate 0.000344 Epoch: 1 Global Step: 28530 Fp16 Grad Scale: 1024 Required: 183 hours
Training: 2022-07-05 20:57:35,775-Speed 2498.57 samples/sec Loss 28.7326 LearningRate 0.000344 Epoch: 1 Global Step: 28540 Fp16 Grad Scale: 1024 Required: 183 hours
Training: 2022-07-05 20:57:43,977-Speed 2497.41 samples/sec Loss 28.6256 LearningRate 0.000344 Epoch: 1 Global Step: 28550 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:57:52,175-Speed 2498.68 samples/sec Loss 28.5885 LearningRate 0.000344 Epoch: 1 Global Step: 28560 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:58:00,324-Speed 2513.55 samples/sec Loss 28.6517 LearningRate 0.000344 Epoch: 1 Global Step: 28570 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:58:08,523-Speed 2498.08 samples/sec Loss 28.5794 LearningRate 0.000345 Epoch: 1 Global Step: 28580 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:58:16,724-Speed 2497.95 samples/sec Loss 28.6781 LearningRate 0.000345 Epoch: 1 Global Step: 28590 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:58:24,927-Speed 2496.89 samples/sec Loss 28.5958 LearningRate 0.000345 Epoch: 1 Global Step: 28600 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:58:33,128-Speed 2497.68 samples/sec Loss 28.5786 LearningRate 0.000345 Epoch: 1 Global Step: 28610 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:58:41,330-Speed 2497.28 samples/sec Loss 28.5001 LearningRate 0.000345 Epoch: 1 Global Step: 28620 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:58:49,481-Speed 2512.93 samples/sec Loss 28.4955 LearningRate 0.000345 Epoch: 1 Global Step: 28630 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:58:57,684-Speed 2497.32 samples/sec Loss 28.4591 LearningRate 0.000345 Epoch: 1 Global Step: 28640 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:59:05,889-Speed 2496.44 samples/sec Loss 28.5457 LearningRate 0.000345 Epoch: 1 Global Step: 28650 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:59:14,095-Speed 2496.08 samples/sec Loss 28.4195 LearningRate 0.000345 Epoch: 1 Global Step: 28660 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:59:22,296-Speed 2497.80 samples/sec Loss 28.4735 LearningRate 0.000346 Epoch: 1 Global Step: 28670 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:59:30,510-Speed 2493.83 samples/sec Loss 28.3919 LearningRate 0.000346 Epoch: 1 Global Step: 28680 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:59:38,659-Speed 2513.48 samples/sec Loss 28.4119 LearningRate 0.000346 Epoch: 1 Global Step: 28690 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:59:46,859-Speed 2498.14 samples/sec Loss 28.4166 LearningRate 0.000346 Epoch: 1 Global Step: 28700 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 20:59:55,062-Speed 2496.97 samples/sec Loss 28.3797 LearningRate 0.000346 Epoch: 1 Global Step: 28710 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:00:03,269-Speed 2496.02 samples/sec Loss 28.3063 LearningRate 0.000346 Epoch: 1 Global Step: 28720 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:00:11,474-Speed 2496.46 samples/sec Loss 28.4199 LearningRate 0.000346 Epoch: 1 Global Step: 28730 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:00:19,683-Speed 2495.19 samples/sec Loss 28.3975 LearningRate 0.000346 Epoch: 1 Global Step: 28740 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:00:27,829-Speed 2514.31 samples/sec Loss 28.3525 LearningRate 0.000347 Epoch: 1 Global Step: 28750 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:00:36,027-Speed 2498.67 samples/sec Loss 28.3681 LearningRate 0.000347 Epoch: 1 Global Step: 28760 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:00:44,238-Speed 2494.89 samples/sec Loss 28.3962 LearningRate 0.000347 Epoch: 1 Global Step: 28770 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:00:52,436-Speed 2498.42 samples/sec Loss 28.3710 LearningRate 0.000347 Epoch: 1 Global Step: 28780 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:01:00,637-Speed 2498.57 samples/sec Loss 28.3239 LearningRate 0.000347 Epoch: 1 Global Step: 28790 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:01:08,838-Speed 2497.53 samples/sec Loss 28.2821 LearningRate 0.000347 Epoch: 1 Global Step: 28800 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:01:16,993-Speed 2511.76 samples/sec Loss 28.2883 LearningRate 0.000347 Epoch: 1 Global Step: 28810 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:01:25,195-Speed 2497.42 samples/sec Loss 28.2850 LearningRate 0.000347 Epoch: 1 Global Step: 28820 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:01:33,415-Speed 2499.28 samples/sec Loss 28.3509 LearningRate 0.000348 Epoch: 1 Global Step: 28830 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:01:41,622-Speed 2495.92 samples/sec Loss 28.2706 LearningRate 0.000348 Epoch: 1 Global Step: 28840 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:01:49,995-Speed 2446.27 samples/sec Loss 28.1325 LearningRate 0.000348 Epoch: 1 Global Step: 28850 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:01:58,197-Speed 2497.31 samples/sec Loss 28.1704 LearningRate 0.000348 Epoch: 1 Global Step: 28860 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:02:06,345-Speed 2513.83 samples/sec Loss 28.1909 LearningRate 0.000348 Epoch: 1 Global Step: 28870 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:02:18,378-Speed 1702.82 samples/sec Loss 28.1490 LearningRate 0.000348 Epoch: 1 Global Step: 28880 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:02:26,572-Speed 2499.81 samples/sec Loss 28.1805 LearningRate 0.000348 Epoch: 1 Global Step: 28890 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:02:34,766-Speed 2499.88 samples/sec Loss 28.2245 LearningRate 0.000348 Epoch: 1 Global Step: 28900 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:02:42,959-Speed 2500.19 samples/sec Loss 28.1757 LearningRate 0.000348 Epoch: 1 Global Step: 28910 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:02:51,167-Speed 2495.32 samples/sec Loss 28.2003 LearningRate 0.000349 Epoch: 1 Global Step: 28920 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:02:59,311-Speed 2515.28 samples/sec Loss 28.0982 LearningRate 0.000349 Epoch: 1 Global Step: 28930 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:03:07,510-Speed 2498.19 samples/sec Loss 28.0372 LearningRate 0.000349 Epoch: 1 Global Step: 28940 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:03:15,711-Speed 2497.76 samples/sec Loss 28.0795 LearningRate 0.000349 Epoch: 1 Global Step: 28950 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:03:23,911-Speed 2498.05 samples/sec Loss 28.0957 LearningRate 0.000349 Epoch: 1 Global Step: 28960 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:03:32,115-Speed 2496.62 samples/sec Loss 28.2154 LearningRate 0.000349 Epoch: 1 Global Step: 28970 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:03:40,320-Speed 2496.68 samples/sec Loss 28.0167 LearningRate 0.000349 Epoch: 1 Global Step: 28980 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:03:48,468-Speed 2513.90 samples/sec Loss 28.1080 LearningRate 0.000349 Epoch: 1 Global Step: 28990 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:03:56,667-Speed 2498.06 samples/sec Loss 27.9990 LearningRate 0.000350 Epoch: 1 Global Step: 29000 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:04:04,870-Speed 2497.05 samples/sec Loss 27.9722 LearningRate 0.000350 Epoch: 1 Global Step: 29010 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:04:13,071-Speed 2497.80 samples/sec Loss 28.1545 LearningRate 0.000350 Epoch: 1 Global Step: 29020 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:04:21,284-Speed 2494.19 samples/sec Loss 28.1142 LearningRate 0.000350 Epoch: 1 Global Step: 29030 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:04:29,488-Speed 2496.60 samples/sec Loss 28.0376 LearningRate 0.000350 Epoch: 1 Global Step: 29040 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:04:37,635-Speed 2514.21 samples/sec Loss 27.9521 LearningRate 0.000350 Epoch: 1 Global Step: 29050 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:04:45,837-Speed 2497.55 samples/sec Loss 28.0479 LearningRate 0.000350 Epoch: 1 Global Step: 29060 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:04:54,039-Speed 2497.27 samples/sec Loss 28.0312 LearningRate 0.000350 Epoch: 1 Global Step: 29070 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:05:02,238-Speed 2498.19 samples/sec Loss 28.0461 LearningRate 0.000351 Epoch: 1 Global Step: 29080 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:05:10,442-Speed 2497.19 samples/sec Loss 27.9915 LearningRate 0.000351 Epoch: 1 Global Step: 29090 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:05:18,643-Speed 2497.83 samples/sec Loss 27.9648 LearningRate 0.000351 Epoch: 1 Global Step: 29100 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:05:26,790-Speed 2513.97 samples/sec Loss 27.8900 LearningRate 0.000351 Epoch: 1 Global Step: 29110 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:05:34,990-Speed 2497.99 samples/sec Loss 27.9033 LearningRate 0.000351 Epoch: 1 Global Step: 29120 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:05:43,191-Speed 2497.76 samples/sec Loss 27.8771 LearningRate 0.000351 Epoch: 1 Global Step: 29130 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:05:51,390-Speed 2498.37 samples/sec Loss 27.8522 LearningRate 0.000351 Epoch: 1 Global Step: 29140 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:05:59,591-Speed 2497.61 samples/sec Loss 27.8726 LearningRate 0.000351 Epoch: 1 Global Step: 29150 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:06:07,794-Speed 2497.18 samples/sec Loss 28.0212 LearningRate 0.000351 Epoch: 1 Global Step: 29160 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:06:15,941-Speed 2513.92 samples/sec Loss 27.9000 LearningRate 0.000352 Epoch: 1 Global Step: 29170 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:06:24,155-Speed 2493.90 samples/sec Loss 27.8470 LearningRate 0.000352 Epoch: 1 Global Step: 29180 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:06:32,359-Speed 2496.82 samples/sec Loss 27.8085 LearningRate 0.000352 Epoch: 1 Global Step: 29190 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:06:40,573-Speed 2493.77 samples/sec Loss 27.8024 LearningRate 0.000352 Epoch: 1 Global Step: 29200 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:06:48,776-Speed 2497.22 samples/sec Loss 27.8601 LearningRate 0.000352 Epoch: 1 Global Step: 29210 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:06:56,980-Speed 2496.86 samples/sec Loss 27.7514 LearningRate 0.000352 Epoch: 1 Global Step: 29220 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:07:05,128-Speed 2513.61 samples/sec Loss 27.7490 LearningRate 0.000352 Epoch: 1 Global Step: 29230 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:07:13,335-Speed 2496.00 samples/sec Loss 27.7415 LearningRate 0.000352 Epoch: 1 Global Step: 29240 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:07:21,537-Speed 2497.18 samples/sec Loss 27.6699 LearningRate 0.000353 Epoch: 1 Global Step: 29250 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:07:29,754-Speed 2492.95 samples/sec Loss 27.7438 LearningRate 0.000353 Epoch: 1 Global Step: 29260 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:07:37,956-Speed 2497.55 samples/sec Loss 27.7717 LearningRate 0.000353 Epoch: 1 Global Step: 29270 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:07:46,156-Speed 2498.11 samples/sec Loss 27.6955 LearningRate 0.000353 Epoch: 1 Global Step: 29280 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:07:54,303-Speed 2514.18 samples/sec Loss 27.7562 LearningRate 0.000353 Epoch: 1 Global Step: 29290 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:08:02,510-Speed 2496.06 samples/sec Loss 27.6761 LearningRate 0.000353 Epoch: 1 Global Step: 29300 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:08:10,709-Speed 2498.09 samples/sec Loss 27.7006 LearningRate 0.000353 Epoch: 1 Global Step: 29310 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:08:18,913-Speed 2496.84 samples/sec Loss 27.6040 LearningRate 0.000353 Epoch: 1 Global Step: 29320 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:08:27,131-Speed 2492.50 samples/sec Loss 27.6923 LearningRate 0.000354 Epoch: 1 Global Step: 29330 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:08:35,335-Speed 2496.64 samples/sec Loss 27.6943 LearningRate 0.000354 Epoch: 1 Global Step: 29340 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:08:43,480-Speed 2514.89 samples/sec Loss 27.6846 LearningRate 0.000354 Epoch: 1 Global Step: 29350 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:08:51,684-Speed 2497.19 samples/sec Loss 27.6269 LearningRate 0.000354 Epoch: 1 Global Step: 29360 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:08:59,887-Speed 2496.78 samples/sec Loss 27.6542 LearningRate 0.000354 Epoch: 1 Global Step: 29370 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:09:08,095-Speed 2495.50 samples/sec Loss 27.6763 LearningRate 0.000354 Epoch: 1 Global Step: 29380 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:09:16,295-Speed 2497.97 samples/sec Loss 27.7369 LearningRate 0.000354 Epoch: 1 Global Step: 29390 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:09:24,500-Speed 2496.57 samples/sec Loss 27.6485 LearningRate 0.000354 Epoch: 1 Global Step: 29400 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:09:32,652-Speed 2512.56 samples/sec Loss 27.7143 LearningRate 0.000355 Epoch: 1 Global Step: 29410 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:09:40,853-Speed 2497.83 samples/sec Loss 27.6224 LearningRate 0.000355 Epoch: 1 Global Step: 29420 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:09:49,055-Speed 2497.64 samples/sec Loss 27.6349 LearningRate 0.000355 Epoch: 1 Global Step: 29430 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:09:57,258-Speed 2497.03 samples/sec Loss 27.5693 LearningRate 0.000355 Epoch: 1 Global Step: 29440 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:10:05,462-Speed 2496.71 samples/sec Loss 27.5820 LearningRate 0.000355 Epoch: 1 Global Step: 29450 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:10:13,664-Speed 2497.34 samples/sec Loss 27.5953 LearningRate 0.000355 Epoch: 1 Global Step: 29460 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:10:21,811-Speed 2514.44 samples/sec Loss 27.5552 LearningRate 0.000355 Epoch: 1 Global Step: 29470 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:10:30,024-Speed 2494.07 samples/sec Loss 27.5673 LearningRate 0.000355 Epoch: 1 Global Step: 29480 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:10:38,226-Speed 2497.42 samples/sec Loss 27.4256 LearningRate 0.000355 Epoch: 1 Global Step: 29490 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:10:46,431-Speed 2496.36 samples/sec Loss 27.5318 LearningRate 0.000356 Epoch: 1 Global Step: 29500 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:10:54,645-Speed 2493.80 samples/sec Loss 27.4672 LearningRate 0.000356 Epoch: 1 Global Step: 29510 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:11:02,845-Speed 2498.13 samples/sec Loss 27.4307 LearningRate 0.000356 Epoch: 1 Global Step: 29520 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:11:11,000-Speed 2511.67 samples/sec Loss 27.5794 LearningRate 0.000356 Epoch: 1 Global Step: 29530 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:11:19,200-Speed 2498.02 samples/sec Loss 27.4906 LearningRate 0.000356 Epoch: 1 Global Step: 29540 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:11:27,398-Speed 2498.55 samples/sec Loss 27.5148 LearningRate 0.000356 Epoch: 1 Global Step: 29550 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:11:35,601-Speed 2497.09 samples/sec Loss 27.3872 LearningRate 0.000356 Epoch: 1 Global Step: 29560 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:11:43,800-Speed 2498.16 samples/sec Loss 27.3458 LearningRate 0.000356 Epoch: 1 Global Step: 29570 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:11:51,999-Speed 2498.13 samples/sec Loss 27.3629 LearningRate 0.000357 Epoch: 1 Global Step: 29580 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:12:00,147-Speed 2514.15 samples/sec Loss 27.3809 LearningRate 0.000357 Epoch: 1 Global Step: 29590 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:12:08,347-Speed 2497.76 samples/sec Loss 27.2343 LearningRate 0.000357 Epoch: 1 Global Step: 29600 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:12:16,550-Speed 2497.21 samples/sec Loss 27.3938 LearningRate 0.000357 Epoch: 1 Global Step: 29610 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:12:24,755-Speed 2496.36 samples/sec Loss 27.3388 LearningRate 0.000357 Epoch: 1 Global Step: 29620 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:12:32,959-Speed 2496.93 samples/sec Loss 27.2675 LearningRate 0.000357 Epoch: 1 Global Step: 29630 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:12:41,161-Speed 2497.25 samples/sec Loss 27.2302 LearningRate 0.000357 Epoch: 1 Global Step: 29640 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:12:49,317-Speed 2511.53 samples/sec Loss 27.3599 LearningRate 0.000357 Epoch: 1 Global Step: 29650 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:12:57,522-Speed 2496.41 samples/sec Loss 27.2776 LearningRate 0.000358 Epoch: 1 Global Step: 29660 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:13:05,723-Speed 2497.71 samples/sec Loss 27.1923 LearningRate 0.000358 Epoch: 1 Global Step: 29670 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:13:13,926-Speed 2496.99 samples/sec Loss 27.3079 LearningRate 0.000358 Epoch: 1 Global Step: 29680 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:13:22,126-Speed 2497.88 samples/sec Loss 27.3777 LearningRate 0.000358 Epoch: 1 Global Step: 29690 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:13:30,346-Speed 2492.04 samples/sec Loss 27.3639 LearningRate 0.000358 Epoch: 1 Global Step: 29700 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:13:38,495-Speed 2513.82 samples/sec Loss 27.2649 LearningRate 0.000358 Epoch: 1 Global Step: 29710 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:13:46,699-Speed 2496.66 samples/sec Loss 27.3633 LearningRate 0.000358 Epoch: 1 Global Step: 29720 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:13:54,906-Speed 2495.95 samples/sec Loss 27.1935 LearningRate 0.000358 Epoch: 1 Global Step: 29730 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:14:03,106-Speed 2497.93 samples/sec Loss 27.2199 LearningRate 0.000358 Epoch: 1 Global Step: 29740 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:14:11,320-Speed 2493.54 samples/sec Loss 27.1953 LearningRate 0.000359 Epoch: 1 Global Step: 29750 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:14:19,539-Speed 2492.26 samples/sec Loss 27.1452 LearningRate 0.000359 Epoch: 1 Global Step: 29760 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:14:27,692-Speed 2512.38 samples/sec Loss 27.0960 LearningRate 0.000359 Epoch: 1 Global Step: 29770 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:14:35,893-Speed 2497.87 samples/sec Loss 27.2461 LearningRate 0.000359 Epoch: 1 Global Step: 29780 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:14:44,096-Speed 2497.13 samples/sec Loss 27.2288 LearningRate 0.000359 Epoch: 1 Global Step: 29790 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:14:52,300-Speed 2496.82 samples/sec Loss 27.3097 LearningRate 0.000359 Epoch: 1 Global Step: 29800 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:15:00,517-Speed 2492.99 samples/sec Loss 27.3273 LearningRate 0.000359 Epoch: 1 Global Step: 29810 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:15:08,722-Speed 2496.59 samples/sec Loss 27.2748 LearningRate 0.000359 Epoch: 1 Global Step: 29820 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:15:16,872-Speed 2513.45 samples/sec Loss 27.2310 LearningRate 0.000360 Epoch: 1 Global Step: 29830 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:15:25,073-Speed 2497.59 samples/sec Loss 27.1715 LearningRate 0.000360 Epoch: 1 Global Step: 29840 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:15:33,277-Speed 2496.54 samples/sec Loss 27.1799 LearningRate 0.000360 Epoch: 1 Global Step: 29850 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:15:41,479-Speed 2497.20 samples/sec Loss 27.1275 LearningRate 0.000360 Epoch: 1 Global Step: 29860 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:15:49,680-Speed 2497.52 samples/sec Loss 27.1286 LearningRate 0.000360 Epoch: 1 Global Step: 29870 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:15:57,894-Speed 2493.95 samples/sec Loss 27.0064 LearningRate 0.000360 Epoch: 1 Global Step: 29880 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:16:06,041-Speed 2514.25 samples/sec Loss 27.0458 LearningRate 0.000360 Epoch: 1 Global Step: 29890 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:16:14,245-Speed 2496.32 samples/sec Loss 26.9067 LearningRate 0.000360 Epoch: 1 Global Step: 29900 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:16:22,450-Speed 2496.54 samples/sec Loss 27.1327 LearningRate 0.000361 Epoch: 1 Global Step: 29910 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:16:30,653-Speed 2497.14 samples/sec Loss 27.0005 LearningRate 0.000361 Epoch: 1 Global Step: 29920 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:16:38,856-Speed 2497.28 samples/sec Loss 27.0917 LearningRate 0.000361 Epoch: 1 Global Step: 29930 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:16:47,060-Speed 2496.78 samples/sec Loss 26.9952 LearningRate 0.000361 Epoch: 1 Global Step: 29940 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:16:55,212-Speed 2512.43 samples/sec Loss 26.9829 LearningRate 0.000361 Epoch: 1 Global Step: 29950 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:17:03,415-Speed 2496.91 samples/sec Loss 26.9250 LearningRate 0.000361 Epoch: 1 Global Step: 29960 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:17:11,620-Speed 2496.69 samples/sec Loss 26.9996 LearningRate 0.000361 Epoch: 1 Global Step: 29970 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:17:19,822-Speed 2497.10 samples/sec Loss 26.9058 LearningRate 0.000361 Epoch: 1 Global Step: 29980 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:17:28,025-Speed 2497.20 samples/sec Loss 26.9039 LearningRate 0.000362 Epoch: 1 Global Step: 29990 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:17:36,227-Speed 2497.30 samples/sec Loss 26.9549 LearningRate 0.000362 Epoch: 1 Global Step: 30000 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:17:44,378-Speed 2512.90 samples/sec Loss 26.9609 LearningRate 0.000362 Epoch: 1 Global Step: 30010 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:17:52,580-Speed 2497.06 samples/sec Loss 26.8035 LearningRate 0.000362 Epoch: 1 Global Step: 30020 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:18:00,784-Speed 2496.83 samples/sec Loss 26.8128 LearningRate 0.000362 Epoch: 1 Global Step: 30030 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:18:08,987-Speed 2496.82 samples/sec Loss 26.8005 LearningRate 0.000362 Epoch: 1 Global Step: 30040 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:18:17,187-Speed 2497.68 samples/sec Loss 26.9096 LearningRate 0.000362 Epoch: 1 Global Step: 30050 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:18:25,395-Speed 2495.37 samples/sec Loss 26.9097 LearningRate 0.000362 Epoch: 1 Global Step: 30060 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:18:33,542-Speed 2514.18 samples/sec Loss 26.8607 LearningRate 0.000362 Epoch: 1 Global Step: 30070 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:18:41,744-Speed 2497.45 samples/sec Loss 26.8188 LearningRate 0.000363 Epoch: 1 Global Step: 30080 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:18:49,947-Speed 2496.99 samples/sec Loss 26.8707 LearningRate 0.000363 Epoch: 1 Global Step: 30090 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:18:58,158-Speed 2494.64 samples/sec Loss 26.8241 LearningRate 0.000363 Epoch: 1 Global Step: 30100 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:19:06,392-Speed 2487.81 samples/sec Loss 26.8405 LearningRate 0.000363 Epoch: 1 Global Step: 30110 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:19:14,597-Speed 2496.34 samples/sec Loss 26.8104 LearningRate 0.000363 Epoch: 1 Global Step: 30120 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:19:22,750-Speed 2512.55 samples/sec Loss 26.7774 LearningRate 0.000363 Epoch: 1 Global Step: 30130 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:19:30,953-Speed 2496.94 samples/sec Loss 26.7468 LearningRate 0.000363 Epoch: 1 Global Step: 30140 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:19:39,155-Speed 2497.33 samples/sec Loss 26.8462 LearningRate 0.000363 Epoch: 1 Global Step: 30150 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:19:47,384-Speed 2489.31 samples/sec Loss 26.7967 LearningRate 0.000364 Epoch: 1 Global Step: 30160 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:19:55,589-Speed 2496.19 samples/sec Loss 26.6214 LearningRate 0.000364 Epoch: 1 Global Step: 30170 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:20:03,804-Speed 2493.46 samples/sec Loss 26.8088 LearningRate 0.000364 Epoch: 1 Global Step: 30180 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:20:11,956-Speed 2512.90 samples/sec Loss 26.6075 LearningRate 0.000364 Epoch: 1 Global Step: 30190 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:20:20,157-Speed 2497.37 samples/sec Loss 26.6905 LearningRate 0.000364 Epoch: 1 Global Step: 30200 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:20:28,360-Speed 2497.35 samples/sec Loss 26.6153 LearningRate 0.000364 Epoch: 1 Global Step: 30210 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:20:36,563-Speed 2496.84 samples/sec Loss 26.6846 LearningRate 0.000364 Epoch: 1 Global Step: 30220 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:20:44,766-Speed 2497.35 samples/sec Loss 26.6565 LearningRate 0.000364 Epoch: 1 Global Step: 30230 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:20:52,965-Speed 2498.10 samples/sec Loss 26.6297 LearningRate 0.000365 Epoch: 1 Global Step: 30240 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:21:01,110-Speed 2514.83 samples/sec Loss 26.6370 LearningRate 0.000365 Epoch: 1 Global Step: 30250 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:21:09,312-Speed 2497.65 samples/sec Loss 26.5629 LearningRate 0.000365 Epoch: 1 Global Step: 30260 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:21:17,508-Speed 2499.25 samples/sec Loss 26.5080 LearningRate 0.000365 Epoch: 1 Global Step: 30270 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:21:25,714-Speed 2496.43 samples/sec Loss 26.4005 LearningRate 0.000365 Epoch: 1 Global Step: 30280 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:21:33,915-Speed 2497.37 samples/sec Loss 26.5389 LearningRate 0.000365 Epoch: 1 Global Step: 30290 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:21:42,119-Speed 2496.77 samples/sec Loss 26.4735 LearningRate 0.000365 Epoch: 1 Global Step: 30300 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:21:50,267-Speed 2514.20 samples/sec Loss 26.4474 LearningRate 0.000365 Epoch: 1 Global Step: 30310 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:21:58,468-Speed 2497.60 samples/sec Loss 26.5021 LearningRate 0.000365 Epoch: 1 Global Step: 30320 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:22:06,673-Speed 2496.32 samples/sec Loss 26.3819 LearningRate 0.000366 Epoch: 1 Global Step: 30330 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:22:14,876-Speed 2497.11 samples/sec Loss 26.5150 LearningRate 0.000366 Epoch: 1 Global Step: 30340 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:22:23,082-Speed 2496.16 samples/sec Loss 26.4413 LearningRate 0.000366 Epoch: 1 Global Step: 30350 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:22:31,288-Speed 2496.00 samples/sec Loss 26.4103 LearningRate 0.000366 Epoch: 1 Global Step: 30360 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:22:39,438-Speed 2513.42 samples/sec Loss 26.4441 LearningRate 0.000366 Epoch: 1 Global Step: 30370 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:22:47,644-Speed 2496.28 samples/sec Loss 26.3726 LearningRate 0.000366 Epoch: 1 Global Step: 30380 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:22:55,846-Speed 2497.39 samples/sec Loss 26.4081 LearningRate 0.000366 Epoch: 1 Global Step: 30390 Fp16 Grad Scale: 4096 Required: 183 hours
Training: 2022-07-05 21:23:04,004-Speed 2510.86 samples/sec Loss 26.4375 LearningRate 0.000366 Epoch: 1 Global Step: 30400 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:23:12,209-Speed 2496.67 samples/sec Loss 26.4100 LearningRate 0.000367 Epoch: 1 Global Step: 30410 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:23:20,410-Speed 2497.63 samples/sec Loss 26.3289 LearningRate 0.000367 Epoch: 1 Global Step: 30420 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:23:28,554-Speed 2515.24 samples/sec Loss 26.4161 LearningRate 0.000367 Epoch: 1 Global Step: 30430 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:23:36,759-Speed 2496.44 samples/sec Loss 26.4562 LearningRate 0.000367 Epoch: 1 Global Step: 30440 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:23:44,964-Speed 2496.40 samples/sec Loss 26.4913 LearningRate 0.000367 Epoch: 1 Global Step: 30450 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:23:53,169-Speed 2496.40 samples/sec Loss 26.4054 LearningRate 0.000367 Epoch: 1 Global Step: 30460 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:24:01,378-Speed 2495.05 samples/sec Loss 26.4065 LearningRate 0.000367 Epoch: 1 Global Step: 30470 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:24:09,584-Speed 2496.22 samples/sec Loss 26.3859 LearningRate 0.000367 Epoch: 1 Global Step: 30480 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:24:17,735-Speed 2513.15 samples/sec Loss 26.2815 LearningRate 0.000368 Epoch: 1 Global Step: 30490 Fp16 Grad Scale: 2048 Required: 183 hours
Training: 2022-07-05 21:24:25,936-Speed 2497.45 samples/sec Loss 26.2340 LearningRate 0.000368 Epoch: 1 Global Step: 30500 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:24:34,149-Speed 2494.07 samples/sec Loss 26.2417 LearningRate 0.000368 Epoch: 1 Global Step: 30510 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:24:42,357-Speed 2495.52 samples/sec Loss 26.3282 LearningRate 0.000368 Epoch: 1 Global Step: 30520 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:24:50,560-Speed 2497.02 samples/sec Loss 26.3010 LearningRate 0.000368 Epoch: 1 Global Step: 30530 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:24:58,762-Speed 2497.65 samples/sec Loss 26.2134 LearningRate 0.000368 Epoch: 1 Global Step: 30540 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:25:06,909-Speed 2514.12 samples/sec Loss 26.2841 LearningRate 0.000368 Epoch: 1 Global Step: 30550 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:25:15,111-Speed 2497.55 samples/sec Loss 26.2636 LearningRate 0.000368 Epoch: 1 Global Step: 30560 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:25:23,316-Speed 2496.11 samples/sec Loss 26.2535 LearningRate 0.000368 Epoch: 1 Global Step: 30570 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:25:31,518-Speed 2497.50 samples/sec Loss 26.2391 LearningRate 0.000369 Epoch: 1 Global Step: 30580 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:25:39,717-Speed 2498.29 samples/sec Loss 26.2423 LearningRate 0.000369 Epoch: 1 Global Step: 30590 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:25:47,917-Speed 2497.74 samples/sec Loss 26.1610 LearningRate 0.000369 Epoch: 1 Global Step: 30600 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:25:56,069-Speed 2512.66 samples/sec Loss 26.2166 LearningRate 0.000369 Epoch: 1 Global Step: 30610 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:26:04,268-Speed 2498.50 samples/sec Loss 26.2329 LearningRate 0.000369 Epoch: 1 Global Step: 30620 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:26:12,472-Speed 2496.62 samples/sec Loss 26.1474 LearningRate 0.000369 Epoch: 1 Global Step: 30630 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:26:20,678-Speed 2496.43 samples/sec Loss 26.1281 LearningRate 0.000369 Epoch: 1 Global Step: 30640 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:26:28,880-Speed 2497.44 samples/sec Loss 26.1392 LearningRate 0.000369 Epoch: 1 Global Step: 30650 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:26:37,081-Speed 2497.45 samples/sec Loss 26.0728 LearningRate 0.000370 Epoch: 1 Global Step: 30660 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:26:45,229-Speed 2513.66 samples/sec Loss 26.1642 LearningRate 0.000370 Epoch: 1 Global Step: 30670 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:26:53,430-Speed 2497.73 samples/sec Loss 26.1398 LearningRate 0.000370 Epoch: 1 Global Step: 30680 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:27:01,633-Speed 2497.08 samples/sec Loss 25.9800 LearningRate 0.000370 Epoch: 1 Global Step: 30690 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:27:09,834-Speed 2497.70 samples/sec Loss 26.1075 LearningRate 0.000370 Epoch: 1 Global Step: 30700 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:27:18,039-Speed 2496.32 samples/sec Loss 26.0974 LearningRate 0.000370 Epoch: 1 Global Step: 30710 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:27:26,241-Speed 2497.53 samples/sec Loss 26.0709 LearningRate 0.000370 Epoch: 1 Global Step: 30720 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:27:34,419-Speed 2504.56 samples/sec Loss 26.0987 LearningRate 0.000370 Epoch: 1 Global Step: 30730 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:27:42,620-Speed 2497.66 samples/sec Loss 26.0970 LearningRate 0.000371 Epoch: 1 Global Step: 30740 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:27:50,827-Speed 2495.99 samples/sec Loss 26.0649 LearningRate 0.000371 Epoch: 1 Global Step: 30750 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:27:59,032-Speed 2496.75 samples/sec Loss 25.9695 LearningRate 0.000371 Epoch: 1 Global Step: 30760 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:28:07,234-Speed 2497.27 samples/sec Loss 25.9237 LearningRate 0.000371 Epoch: 1 Global Step: 30770 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:28:15,443-Speed 2494.91 samples/sec Loss 25.9890 LearningRate 0.000371 Epoch: 1 Global Step: 30780 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:28:23,597-Speed 2512.43 samples/sec Loss 25.9211 LearningRate 0.000371 Epoch: 1 Global Step: 30790 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:28:31,798-Speed 2497.60 samples/sec Loss 25.9990 LearningRate 0.000371 Epoch: 1 Global Step: 30800 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:28:40,002-Speed 2496.94 samples/sec Loss 25.9243 LearningRate 0.000371 Epoch: 1 Global Step: 30810 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:28:48,203-Speed 2497.51 samples/sec Loss 25.9329 LearningRate 0.000372 Epoch: 1 Global Step: 30820 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:28:56,418-Speed 2493.59 samples/sec Loss 25.8742 LearningRate 0.000372 Epoch: 1 Global Step: 30830 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:29:04,619-Speed 2497.45 samples/sec Loss 25.8188 LearningRate 0.000372 Epoch: 1 Global Step: 30840 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:29:12,773-Speed 2512.06 samples/sec Loss 25.8080 LearningRate 0.000372 Epoch: 1 Global Step: 30850 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:29:20,974-Speed 2497.59 samples/sec Loss 25.9749 LearningRate 0.000372 Epoch: 1 Global Step: 30860 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:29:29,181-Speed 2495.74 samples/sec Loss 25.8701 LearningRate 0.000372 Epoch: 1 Global Step: 30870 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:29:37,388-Speed 2496.04 samples/sec Loss 25.8410 LearningRate 0.000372 Epoch: 1 Global Step: 30880 Fp16 Grad Scale: 2048 Required: 182 hours
Training: 2022-07-05 21:29:45,602-Speed 
2493.72 samples/sec Loss 25.8150 LearningRate 0.000372 Epoch: 1 Global Step: 30890 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:29:53,814-Speed 2494.08 samples/sec Loss 25.7970 LearningRate 0.000372 Epoch: 1 Global Step: 30900 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:30:01,963-Speed 2513.57 samples/sec Loss 25.8326 LearningRate 0.000373 Epoch: 1 Global Step: 30910 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:30:10,170-Speed 2495.58 samples/sec Loss 25.7813 LearningRate 0.000373 Epoch: 1 Global Step: 30920 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:30:18,372-Speed 2497.68 samples/sec Loss 25.7786 LearningRate 0.000373 Epoch: 1 Global Step: 30930 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:30:26,572-Speed 2497.88 samples/sec Loss 25.8465 LearningRate 0.000373 Epoch: 1 Global Step: 30940 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:30:34,774-Speed 2497.32 samples/sec Loss 25.6775 LearningRate 0.000373 Epoch: 1 Global Step: 30950 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:30:42,990-Speed 2493.03 samples/sec Loss 25.7425 LearningRate 0.000373 Epoch: 1 Global Step: 30960 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:30:51,140-Speed 2514.35 samples/sec Loss 25.6799 LearningRate 0.000373 Epoch: 1 Global Step: 30970 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:30:59,342-Speed 2497.37 samples/sec Loss 25.7071 LearningRate 0.000373 Epoch: 1 Global Step: 30980 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:31:07,542-Speed 2498.05 samples/sec Loss 25.7494 LearningRate 0.000374 Epoch: 1 Global Step: 30990 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:31:15,741-Speed 2498.19 samples/sec Loss 25.7291 LearningRate 0.000374 Epoch: 1 Global Step: 31000 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:31:23,940-Speed 2498.05 samples/sec 
Loss 25.6526 LearningRate 0.000374 Epoch: 1 Global Step: 31010 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:31:32,139-Speed 2498.17 samples/sec Loss 25.7260 LearningRate 0.000374 Epoch: 1 Global Step: 31020 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:31:40,284-Speed 2515.05 samples/sec Loss 25.7365 LearningRate 0.000374 Epoch: 1 Global Step: 31030 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:31:48,500-Speed 2493.49 samples/sec Loss 25.6943 LearningRate 0.000374 Epoch: 1 Global Step: 31040 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:31:56,705-Speed 2496.25 samples/sec Loss 25.5876 LearningRate 0.000374 Epoch: 1 Global Step: 31050 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:32:04,907-Speed 2497.29 samples/sec Loss 25.5800 LearningRate 0.000374 Epoch: 1 Global Step: 31060 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:32:13,107-Speed 2498.18 samples/sec Loss 25.5513 LearningRate 0.000375 Epoch: 1 Global Step: 31070 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:32:21,316-Speed 2495.52 samples/sec Loss 25.6559 LearningRate 0.000375 Epoch: 1 Global Step: 31080 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:32:29,476-Speed 2510.00 samples/sec Loss 25.5551 LearningRate 0.000375 Epoch: 1 Global Step: 31090 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:32:37,677-Speed 2498.02 samples/sec Loss 25.5821 LearningRate 0.000375 Epoch: 1 Global Step: 31100 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:32:45,893-Speed 2493.08 samples/sec Loss 25.6664 LearningRate 0.000375 Epoch: 1 Global Step: 31110 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:32:54,096-Speed 2497.25 samples/sec Loss 25.5889 LearningRate 0.000375 Epoch: 1 Global Step: 31120 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:33:02,296-Speed 2497.95 samples/sec Loss 25.5592 
LearningRate 0.000375 Epoch: 1 Global Step: 31130 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:33:10,499-Speed 2497.14 samples/sec Loss 25.5059 LearningRate 0.000375 Epoch: 1 Global Step: 31140 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:33:18,648-Speed 2513.40 samples/sec Loss 25.4501 LearningRate 0.000375 Epoch: 1 Global Step: 31150 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:33:26,851-Speed 2497.24 samples/sec Loss 25.4812 LearningRate 0.000376 Epoch: 1 Global Step: 31160 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:33:35,069-Speed 2492.58 samples/sec Loss 25.5294 LearningRate 0.000376 Epoch: 1 Global Step: 31170 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:33:43,271-Speed 2497.14 samples/sec Loss 25.5632 LearningRate 0.000376 Epoch: 1 Global Step: 31180 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:33:51,482-Speed 2494.93 samples/sec Loss 25.5894 LearningRate 0.000376 Epoch: 1 Global Step: 31190 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:33:59,687-Speed 2496.66 samples/sec Loss 25.5452 LearningRate 0.000376 Epoch: 1 Global Step: 31200 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:34:07,838-Speed 2513.08 samples/sec Loss 25.5000 LearningRate 0.000376 Epoch: 1 Global Step: 31210 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:34:16,038-Speed 2497.83 samples/sec Loss 25.4436 LearningRate 0.000376 Epoch: 1 Global Step: 31220 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:34:24,240-Speed 2497.31 samples/sec Loss 25.5587 LearningRate 0.000376 Epoch: 1 Global Step: 31230 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:34:32,443-Speed 2497.02 samples/sec Loss 25.4708 LearningRate 0.000377 Epoch: 1 Global Step: 31240 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:34:40,647-Speed 2496.69 samples/sec Loss 25.5411 LearningRate 
0.000377 Epoch: 1 Global Step: 31250 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:34:48,851-Speed 2496.63 samples/sec Loss 25.3876 LearningRate 0.000377 Epoch: 1 Global Step: 31260 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:34:57,006-Speed 2512.00 samples/sec Loss 25.2655 LearningRate 0.000377 Epoch: 1 Global Step: 31270 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:35:05,212-Speed 2495.91 samples/sec Loss 25.3613 LearningRate 0.000377 Epoch: 1 Global Step: 31280 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:35:13,427-Speed 2493.30 samples/sec Loss 25.4494 LearningRate 0.000377 Epoch: 1 Global Step: 31290 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:35:21,636-Speed 2495.69 samples/sec Loss 25.3944 LearningRate 0.000377 Epoch: 1 Global Step: 31300 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:35:29,838-Speed 2497.09 samples/sec Loss 25.3395 LearningRate 0.000377 Epoch: 1 Global Step: 31310 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:35:38,043-Speed 2496.35 samples/sec Loss 25.4491 LearningRate 0.000378 Epoch: 1 Global Step: 31320 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:35:46,194-Speed 2513.12 samples/sec Loss 25.3775 LearningRate 0.000378 Epoch: 1 Global Step: 31330 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:35:54,394-Speed 2497.88 samples/sec Loss 25.3645 LearningRate 0.000378 Epoch: 1 Global Step: 31340 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:36:02,599-Speed 2496.59 samples/sec Loss 25.4204 LearningRate 0.000378 Epoch: 1 Global Step: 31350 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:36:10,802-Speed 2496.91 samples/sec Loss 25.4306 LearningRate 0.000378 Epoch: 1 Global Step: 31360 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:36:19,002-Speed 2497.95 samples/sec Loss 25.5967 LearningRate 0.000378 Epoch: 1 
Global Step: 31370 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:36:27,203-Speed 2497.51 samples/sec Loss 25.5113 LearningRate 0.000378 Epoch: 1 Global Step: 31380 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:36:35,353-Speed 2513.47 samples/sec Loss 25.3279 LearningRate 0.000378 Epoch: 1 Global Step: 31390 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:36:43,553-Speed 2497.83 samples/sec Loss 25.4395 LearningRate 0.000379 Epoch: 1 Global Step: 31400 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:36:51,758-Speed 2496.48 samples/sec Loss 25.4328 LearningRate 0.000379 Epoch: 1 Global Step: 31410 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:36:59,961-Speed 2496.81 samples/sec Loss 25.2750 LearningRate 0.000379 Epoch: 1 Global Step: 31420 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:37:08,167-Speed 2496.16 samples/sec Loss 25.3233 LearningRate 0.000379 Epoch: 1 Global Step: 31430 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:37:16,372-Speed 2496.57 samples/sec Loss 25.2985 LearningRate 0.000379 Epoch: 1 Global Step: 31440 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:37:24,521-Speed 2513.50 samples/sec Loss 25.1831 LearningRate 0.000379 Epoch: 1 Global Step: 31450 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:37:32,725-Speed 2496.70 samples/sec Loss 25.2226 LearningRate 0.000379 Epoch: 1 Global Step: 31460 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:37:40,926-Speed 2497.46 samples/sec Loss 25.1139 LearningRate 0.000379 Epoch: 1 Global Step: 31470 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:37:49,126-Speed 2497.90 samples/sec Loss 25.1744 LearningRate 0.000379 Epoch: 1 Global Step: 31480 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:37:57,326-Speed 2497.89 samples/sec Loss 25.1927 LearningRate 0.000380 Epoch: 1 Global Step: 31490 
Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:38:05,531-Speed 2496.83 samples/sec Loss 25.1856 LearningRate 0.000380 Epoch: 1 Global Step: 31500 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:38:13,680-Speed 2513.52 samples/sec Loss 25.1344 LearningRate 0.000380 Epoch: 1 Global Step: 31510 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:38:21,882-Speed 2497.10 samples/sec Loss 25.2039 LearningRate 0.000380 Epoch: 1 Global Step: 31520 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:38:30,083-Speed 2497.60 samples/sec Loss 25.1933 LearningRate 0.000380 Epoch: 1 Global Step: 31530 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:38:38,285-Speed 2497.36 samples/sec Loss 25.1668 LearningRate 0.000380 Epoch: 1 Global Step: 31540 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:38:46,501-Speed 2493.30 samples/sec Loss 25.1353 LearningRate 0.000380 Epoch: 1 Global Step: 31550 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:38:54,701-Speed 2497.60 samples/sec Loss 25.0622 LearningRate 0.000380 Epoch: 1 Global Step: 31560 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:39:02,848-Speed 2514.33 samples/sec Loss 25.0993 LearningRate 0.000381 Epoch: 1 Global Step: 31570 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:39:11,048-Speed 2497.78 samples/sec Loss 25.1075 LearningRate 0.000381 Epoch: 1 Global Step: 31580 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:39:19,247-Speed 2498.55 samples/sec Loss 25.1792 LearningRate 0.000381 Epoch: 1 Global Step: 31590 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:39:27,447-Speed 2497.94 samples/sec Loss 25.2287 LearningRate 0.000381 Epoch: 1 Global Step: 31600 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 21:39:35,648-Speed 2497.68 samples/sec Loss 25.2809 LearningRate 0.000381 Epoch: 1 Global Step: 31610 Fp16 Grad Scale: 
4096 Required: 182 hours Training: 2022-07-05 21:39:43,807-Speed 2510.70 samples/sec Loss 25.1908 LearningRate 0.000381 Epoch: 1 Global Step: 31620 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:39:51,954-Speed 2513.95 samples/sec Loss 25.2175 LearningRate 0.000381 Epoch: 1 Global Step: 31630 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:40:00,165-Speed 2494.93 samples/sec Loss 25.2692 LearningRate 0.000381 Epoch: 1 Global Step: 31640 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:40:08,364-Speed 2498.28 samples/sec Loss 25.1313 LearningRate 0.000382 Epoch: 1 Global Step: 31650 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:40:16,565-Speed 2497.56 samples/sec Loss 25.1635 LearningRate 0.000382 Epoch: 1 Global Step: 31660 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:40:24,769-Speed 2496.84 samples/sec Loss 25.0880 LearningRate 0.000382 Epoch: 1 Global Step: 31670 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:40:32,973-Speed 2496.87 samples/sec Loss 25.1050 LearningRate 0.000382 Epoch: 1 Global Step: 31680 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:40:41,120-Speed 2513.74 samples/sec Loss 25.0541 LearningRate 0.000382 Epoch: 1 Global Step: 31690 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:40:49,324-Speed 2497.28 samples/sec Loss 25.0674 LearningRate 0.000382 Epoch: 1 Global Step: 31700 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:40:57,524-Speed 2497.92 samples/sec Loss 25.0403 LearningRate 0.000382 Epoch: 1 Global Step: 31710 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:41:05,738-Speed 2493.61 samples/sec Loss 24.9593 LearningRate 0.000382 Epoch: 1 Global Step: 31720 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:41:13,948-Speed 2494.90 samples/sec Loss 24.9429 LearningRate 0.000382 Epoch: 1 Global Step: 31730 Fp16 Grad Scale: 2048 Required: 182 
hours Training: 2022-07-05 21:41:22,150-Speed 2497.18 samples/sec Loss 24.9422 LearningRate 0.000383 Epoch: 1 Global Step: 31740 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:41:30,300-Speed 2513.52 samples/sec Loss 24.8486 LearningRate 0.000383 Epoch: 1 Global Step: 31750 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:41:38,503-Speed 2496.92 samples/sec Loss 24.9540 LearningRate 0.000383 Epoch: 1 Global Step: 31760 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:41:46,704-Speed 2497.72 samples/sec Loss 24.9373 LearningRate 0.000383 Epoch: 1 Global Step: 31770 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:41:54,904-Speed 2497.99 samples/sec Loss 24.8933 LearningRate 0.000383 Epoch: 1 Global Step: 31780 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:42:03,111-Speed 2495.70 samples/sec Loss 24.9703 LearningRate 0.000383 Epoch: 1 Global Step: 31790 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:42:11,311-Speed 2498.07 samples/sec Loss 24.8594 LearningRate 0.000383 Epoch: 1 Global Step: 31800 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:42:19,457-Speed 2514.69 samples/sec Loss 24.8615 LearningRate 0.000383 Epoch: 1 Global Step: 31810 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:42:27,656-Speed 2498.09 samples/sec Loss 24.8493 LearningRate 0.000384 Epoch: 1 Global Step: 31820 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:42:35,866-Speed 2495.18 samples/sec Loss 24.7840 LearningRate 0.000384 Epoch: 1 Global Step: 31830 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:42:44,063-Speed 2498.79 samples/sec Loss 24.9457 LearningRate 0.000384 Epoch: 1 Global Step: 31840 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:42:52,265-Speed 2497.35 samples/sec Loss 24.9292 LearningRate 0.000384 Epoch: 1 Global Step: 31850 Fp16 Grad Scale: 2048 Required: 182 hours Training: 
2022-07-05 21:43:00,465-Speed 2497.95 samples/sec Loss 24.9683 LearningRate 0.000384 Epoch: 1 Global Step: 31860 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:43:08,614-Speed 2513.55 samples/sec Loss 24.8514 LearningRate 0.000384 Epoch: 1 Global Step: 31870 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:43:16,812-Speed 2498.32 samples/sec Loss 25.0122 LearningRate 0.000384 Epoch: 1 Global Step: 31880 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:43:25,015-Speed 2497.06 samples/sec Loss 24.9937 LearningRate 0.000384 Epoch: 1 Global Step: 31890 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:43:33,217-Speed 2497.26 samples/sec Loss 24.9262 LearningRate 0.000385 Epoch: 1 Global Step: 31900 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:43:41,418-Speed 2497.85 samples/sec Loss 24.9849 LearningRate 0.000385 Epoch: 1 Global Step: 31910 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:43:49,623-Speed 2496.28 samples/sec Loss 24.9797 LearningRate 0.000385 Epoch: 1 Global Step: 31920 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:43:57,784-Speed 2510.06 samples/sec Loss 25.0660 LearningRate 0.000385 Epoch: 1 Global Step: 31930 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:44:05,984-Speed 2497.78 samples/sec Loss 24.8925 LearningRate 0.000385 Epoch: 1 Global Step: 31940 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:44:14,182-Speed 2498.61 samples/sec Loss 24.8142 LearningRate 0.000385 Epoch: 1 Global Step: 31950 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:44:22,382-Speed 2498.22 samples/sec Loss 24.7360 LearningRate 0.000385 Epoch: 1 Global Step: 31960 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:44:30,581-Speed 2498.30 samples/sec Loss 24.7699 LearningRate 0.000385 Epoch: 1 Global Step: 31970 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 
21:44:38,780-Speed 2498.03 samples/sec Loss 24.6569 LearningRate 0.000385 Epoch: 1 Global Step: 31980 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:44:46,928-Speed 2513.83 samples/sec Loss 24.6704 LearningRate 0.000386 Epoch: 1 Global Step: 31990 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:44:55,133-Speed 2496.62 samples/sec Loss 24.7362 LearningRate 0.000386 Epoch: 1 Global Step: 32000 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:45:03,341-Speed 2495.32 samples/sec Loss 24.6270 LearningRate 0.000386 Epoch: 1 Global Step: 32010 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:45:11,543-Speed 2497.58 samples/sec Loss 24.6803 LearningRate 0.000386 Epoch: 1 Global Step: 32020 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:45:19,746-Speed 2496.96 samples/sec Loss 24.5968 LearningRate 0.000386 Epoch: 1 Global Step: 32030 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:45:27,949-Speed 2496.93 samples/sec Loss 24.5030 LearningRate 0.000386 Epoch: 1 Global Step: 32040 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:45:36,100-Speed 2513.11 samples/sec Loss 24.5816 LearningRate 0.000386 Epoch: 1 Global Step: 32050 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:45:44,299-Speed 2498.24 samples/sec Loss 24.5357 LearningRate 0.000386 Epoch: 1 Global Step: 32060 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:45:52,501-Speed 2497.43 samples/sec Loss 24.6256 LearningRate 0.000387 Epoch: 1 Global Step: 32070 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:46:00,701-Speed 2497.90 samples/sec Loss 24.6399 LearningRate 0.000387 Epoch: 1 Global Step: 32080 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:46:08,902-Speed 2497.77 samples/sec Loss 24.5022 LearningRate 0.000387 Epoch: 1 Global Step: 32090 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:46:17,104-Speed 
2497.37 samples/sec Loss 24.6420 LearningRate 0.000387 Epoch: 1 Global Step: 32100 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:46:25,253-Speed 2513.40 samples/sec Loss 24.4035 LearningRate 0.000387 Epoch: 1 Global Step: 32110 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:46:33,453-Speed 2497.97 samples/sec Loss 24.4719 LearningRate 0.000387 Epoch: 1 Global Step: 32120 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:46:41,655-Speed 2497.59 samples/sec Loss 24.5018 LearningRate 0.000387 Epoch: 1 Global Step: 32130 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:46:49,858-Speed 2497.15 samples/sec Loss 24.5312 LearningRate 0.000387 Epoch: 1 Global Step: 32140 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:46:58,059-Speed 2497.67 samples/sec Loss 24.5620 LearningRate 0.000388 Epoch: 1 Global Step: 32150 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:47:06,262-Speed 2496.99 samples/sec Loss 24.3981 LearningRate 0.000388 Epoch: 1 Global Step: 32160 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:47:14,411-Speed 2513.73 samples/sec Loss 24.4899 LearningRate 0.000388 Epoch: 1 Global Step: 32170 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:47:22,611-Speed 2497.90 samples/sec Loss 24.3780 LearningRate 0.000388 Epoch: 1 Global Step: 32180 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:47:30,818-Speed 2496.48 samples/sec Loss 24.4533 LearningRate 0.000388 Epoch: 1 Global Step: 32190 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:47:39,024-Speed 2496.58 samples/sec Loss 24.5787 LearningRate 0.000388 Epoch: 1 Global Step: 32200 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:47:47,225-Speed 2497.53 samples/sec Loss 24.3969 LearningRate 0.000388 Epoch: 1 Global Step: 32210 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 21:47:55,385-Speed 2509.99 samples/sec 
Loss 24.4004 LearningRate 0.000388 Epoch: 1 Global Step: 32220 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:48:03,534-Speed 2513.83 samples/sec Loss 24.3189 LearningRate 0.000389 Epoch: 1 Global Step: 32230 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:48:11,733-Speed 2498.36 samples/sec Loss 24.3748 LearningRate 0.000389 Epoch: 1 Global Step: 32240 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:48:19,939-Speed 2495.94 samples/sec Loss 24.3370 LearningRate 0.000389 Epoch: 1 Global Step: 32250 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:48:28,138-Speed 2498.32 samples/sec Loss 24.2926 LearningRate 0.000389 Epoch: 1 Global Step: 32260 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:48:36,350-Speed 2494.54 samples/sec Loss 24.2440 LearningRate 0.000389 Epoch: 1 Global Step: 32270 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:48:44,564-Speed 2493.97 samples/sec Loss 24.4007 LearningRate 0.000389 Epoch: 1 Global Step: 32280 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:48:52,712-Speed 2513.93 samples/sec Loss 24.2067 LearningRate 0.000389 Epoch: 1 Global Step: 32290 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:49:00,912-Speed 2497.92 samples/sec Loss 24.2807 LearningRate 0.000389 Epoch: 1 Global Step: 32300 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:49:09,112-Speed 2497.93 samples/sec Loss 24.2584 LearningRate 0.000389 Epoch: 1 Global Step: 32310 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:49:17,315-Speed 2497.39 samples/sec Loss 24.2903 LearningRate 0.000390 Epoch: 1 Global Step: 32320 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:49:25,513-Speed 2498.63 samples/sec Loss 24.2970 LearningRate 0.000390 Epoch: 1 Global Step: 32330 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:49:33,718-Speed 2496.34 samples/sec Loss 24.2195 
LearningRate 0.000390 Epoch: 1 Global Step: 32340 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:49:41,868-Speed 2513.52 samples/sec Loss 24.2777 LearningRate 0.000390 Epoch: 1 Global Step: 32350 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:49:50,068-Speed 2498.12 samples/sec Loss 24.1930 LearningRate 0.000390 Epoch: 1 Global Step: 32360 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:49:58,271-Speed 2497.05 samples/sec Loss 24.2398 LearningRate 0.000390 Epoch: 1 Global Step: 32370 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:50:06,468-Speed 2498.93 samples/sec Loss 24.2565 LearningRate 0.000390 Epoch: 1 Global Step: 32380 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:50:14,669-Speed 2497.44 samples/sec Loss 24.2316 LearningRate 0.000390 Epoch: 1 Global Step: 32390 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:50:22,873-Speed 2497.06 samples/sec Loss 24.3110 LearningRate 0.000391 Epoch: 1 Global Step: 32400 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:50:31,026-Speed 2512.41 samples/sec Loss 24.3035 LearningRate 0.000391 Epoch: 1 Global Step: 32410 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:50:39,226-Speed 2498.11 samples/sec Loss 24.1516 LearningRate 0.000391 Epoch: 1 Global Step: 32420 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:50:47,424-Speed 2498.68 samples/sec Loss 24.2102 LearningRate 0.000391 Epoch: 1 Global Step: 32430 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:50:55,625-Speed 2497.57 samples/sec Loss 24.1660 LearningRate 0.000391 Epoch: 1 Global Step: 32440 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:51:03,830-Speed 2496.71 samples/sec Loss 24.1583 LearningRate 0.000391 Epoch: 1 Global Step: 32450 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:51:12,026-Speed 2499.56 samples/sec Loss 24.1730 LearningRate 
0.000391 Epoch: 1 Global Step: 32460 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:51:20,169-Speed 2515.62 samples/sec Loss 24.1061 LearningRate 0.000391 Epoch: 1 Global Step: 32470 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:51:28,367-Speed 2498.42 samples/sec Loss 24.1162 LearningRate 0.000392 Epoch: 1 Global Step: 32480 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:51:36,571-Speed 2496.64 samples/sec Loss 24.3136 LearningRate 0.000392 Epoch: 1 Global Step: 32490 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:51:44,771-Speed 2497.95 samples/sec Loss 24.2586 LearningRate 0.000392 Epoch: 1 Global Step: 32500 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:51:52,971-Speed 2497.94 samples/sec Loss 24.1157 LearningRate 0.000392 Epoch: 1 Global Step: 32510 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:52:01,172-Speed 2497.78 samples/sec Loss 24.0403 LearningRate 0.000392 Epoch: 1 Global Step: 32520 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:52:09,318-Speed 2514.23 samples/sec Loss 24.0328 LearningRate 0.000392 Epoch: 1 Global Step: 32530 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:52:17,513-Speed 2499.63 samples/sec Loss 24.0626 LearningRate 0.000392 Epoch: 1 Global Step: 32540 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:52:25,710-Speed 2498.90 samples/sec Loss 24.1071 LearningRate 0.000392 Epoch: 1 Global Step: 32550 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:52:33,909-Speed 2498.27 samples/sec Loss 24.0117 LearningRate 0.000392 Epoch: 1 Global Step: 32560 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:52:42,111-Speed 2497.45 samples/sec Loss 24.0810 LearningRate 0.000393 Epoch: 1 Global Step: 32570 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:52:50,313-Speed 2497.07 samples/sec Loss 23.9630 LearningRate 0.000393 Epoch: 1 
Global Step: 32580 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:52:58,456-Speed 2515.58 samples/sec Loss 24.1533 LearningRate 0.000393 Epoch: 1 Global Step: 32590 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:53:06,653-Speed 2498.74 samples/sec Loss 23.9000 LearningRate 0.000393 Epoch: 1 Global Step: 32600 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:53:14,855-Speed 2497.28 samples/sec Loss 24.0398 LearningRate 0.000393 Epoch: 1 Global Step: 32610 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:53:23,052-Speed 2498.84 samples/sec Loss 23.9662 LearningRate 0.000393 Epoch: 1 Global Step: 32620 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:53:31,254-Speed 2497.36 samples/sec Loss 23.9752 LearningRate 0.000393 Epoch: 1 Global Step: 32630 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:53:39,455-Speed 2497.77 samples/sec Loss 24.0598 LearningRate 0.000393 Epoch: 1 Global Step: 32640 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:53:47,599-Speed 2515.05 samples/sec Loss 23.9780 LearningRate 0.000394 Epoch: 1 Global Step: 32650 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:53:55,814-Speed 2493.45 samples/sec Loss 23.8998 LearningRate 0.000394 Epoch: 1 Global Step: 32660 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:54:04,012-Speed 2498.47 samples/sec Loss 23.7996 LearningRate 0.000394 Epoch: 1 Global Step: 32670 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:54:12,212-Speed 2497.92 samples/sec Loss 23.9056 LearningRate 0.000394 Epoch: 1 Global Step: 32680 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:54:20,408-Speed 2499.28 samples/sec Loss 23.9655 LearningRate 0.000394 Epoch: 1 Global Step: 32690 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:54:28,608-Speed 2498.09 samples/sec Loss 23.9438 LearningRate 0.000394 Epoch: 1 Global Step: 32700 
Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:54:36,753-Speed 2514.87 samples/sec Loss 23.9982 LearningRate 0.000394 Epoch: 1 Global Step: 32710 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:54:44,951-Speed 2498.49 samples/sec Loss 23.9109 LearningRate 0.000394 Epoch: 1 Global Step: 32720 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:54:53,154-Speed 2497.18 samples/sec Loss 24.0242 LearningRate 0.000395 Epoch: 1 Global Step: 32730 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:55:01,397-Speed 2499.68 samples/sec Loss 24.0709 LearningRate 0.000395 Epoch: 1 Global Step: 32740 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:55:09,632-Speed 2495.40 samples/sec Loss 23.9327 LearningRate 0.000395 Epoch: 1 Global Step: 32750 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:55:17,846-Speed 2493.70 samples/sec Loss 23.9620 LearningRate 0.000395 Epoch: 1 Global Step: 32760 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:55:26,022-Speed 2515.37 samples/sec Loss 23.7894 LearningRate 0.000395 Epoch: 1 Global Step: 32770 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:55:36,835-Speed 2503.27 samples/sec Loss 23.8356 LearningRate 0.000395 Epoch: 1 Global Step: 32780 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:55:45,027-Speed 2500.48 samples/sec Loss 23.9046 LearningRate 0.000395 Epoch: 1 Global Step: 32790 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:55:53,224-Speed 2498.73 samples/sec Loss 23.7361 LearningRate 0.000395 Epoch: 1 Global Step: 32800 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:56:01,485-Speed 2501.69 samples/sec Loss 23.8414 LearningRate 0.000395 Epoch: 1 Global Step: 32810 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:56:09,709-Speed 2499.63 samples/sec Loss 23.7809 LearningRate 0.000396 Epoch: 1 Global Step: 32820 Fp16 Grad Scale: 
1024 Required: 182 hours Training: 2022-07-05 21:56:17,855-Speed 2514.48 samples/sec Loss 23.8673 LearningRate 0.000396 Epoch: 1 Global Step: 32830 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:56:26,111-Speed 2497.08 samples/sec Loss 23.9164 LearningRate 0.000396 Epoch: 1 Global Step: 32840 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:56:34,332-Speed 2498.86 samples/sec Loss 23.9254 LearningRate 0.000396 Epoch: 1 Global Step: 32850 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:56:42,532-Speed 2497.92 samples/sec Loss 23.7781 LearningRate 0.000396 Epoch: 1 Global Step: 32860 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:56:51,798-Speed 2500.27 samples/sec Loss 23.8362 LearningRate 0.000396 Epoch: 1 Global Step: 32870 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:57:00,025-Speed 2499.78 samples/sec Loss 23.8168 LearningRate 0.000396 Epoch: 1 Global Step: 32880 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:57:08,230-Speed 2515.09 samples/sec Loss 23.8240 LearningRate 0.000396 Epoch: 1 Global Step: 32890 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:57:16,429-Speed 2498.17 samples/sec Loss 23.7253 LearningRate 0.000397 Epoch: 1 Global Step: 32900 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:57:24,658-Speed 2498.89 samples/sec Loss 23.8044 LearningRate 0.000397 Epoch: 1 Global Step: 32910 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:57:32,893-Speed 2499.45 samples/sec Loss 23.6759 LearningRate 0.000397 Epoch: 1 Global Step: 32920 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:57:41,667-Speed 2500.39 samples/sec Loss 23.6195 LearningRate 0.000397 Epoch: 1 Global Step: 32930 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:57:49,865-Speed 2498.38 samples/sec Loss 23.6607 LearningRate 0.000397 Epoch: 1 Global Step: 32940 Fp16 Grad Scale: 1024 Required: 182 
hours Training: 2022-07-05 21:57:58,010-Speed 2514.68 samples/sec Loss 23.5544 LearningRate 0.000397 Epoch: 1 Global Step: 32950 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:58:06,673-Speed 2496.60 samples/sec Loss 23.6503 LearningRate 0.000397 Epoch: 1 Global Step: 32960 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:58:14,942-Speed 2500.38 samples/sec Loss 23.5723 LearningRate 0.000397 Epoch: 1 Global Step: 32970 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:58:23,141-Speed 2498.05 samples/sec Loss 23.6872 LearningRate 0.000398 Epoch: 1 Global Step: 32980 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:58:31,379-Speed 2501.34 samples/sec Loss 23.5233 LearningRate 0.000398 Epoch: 1 Global Step: 32990 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:58:39,607-Speed 2499.63 samples/sec Loss 23.5856 LearningRate 0.000398 Epoch: 1 Global Step: 33000 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:58:47,753-Speed 2514.52 samples/sec Loss 23.6347 LearningRate 0.000398 Epoch: 1 Global Step: 33010 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:58:55,992-Speed 2494.98 samples/sec Loss 23.5681 LearningRate 0.000398 Epoch: 1 Global Step: 33020 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:59:04,987-Speed 2376.16 samples/sec Loss 23.4844 LearningRate 0.000398 Epoch: 1 Global Step: 33030 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:59:13,947-Speed 2501.26 samples/sec Loss 23.5604 LearningRate 0.000398 Epoch: 1 Global Step: 33040 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:59:22,145-Speed 2498.46 samples/sec Loss 23.5044 LearningRate 0.000398 Epoch: 1 Global Step: 33050 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:59:30,378-Speed 2499.28 samples/sec Loss 23.4392 LearningRate 0.000399 Epoch: 1 Global Step: 33060 Fp16 Grad Scale: 1024 Required: 182 hours Training: 
2022-07-05 21:59:38,541-Speed 2515.90 samples/sec Loss 23.5016 LearningRate 0.000399 Epoch: 1 Global Step: 33070 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:59:46,956-Speed 2434.11 samples/sec Loss 23.3524 LearningRate 0.000399 Epoch: 1 Global Step: 33080 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 21:59:55,196-Speed 2500.84 samples/sec Loss 23.4339 LearningRate 0.000399 Epoch: 1 Global Step: 33090 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:00:05,727-Speed 2501.57 samples/sec Loss 23.3377 LearningRate 0.000399 Epoch: 1 Global Step: 33100 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:00:13,923-Speed 2499.45 samples/sec Loss 23.3308 LearningRate 0.000399 Epoch: 1 Global Step: 33110 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:00:22,121-Speed 2498.73 samples/sec Loss 23.3674 LearningRate 0.000399 Epoch: 1 Global Step: 33120 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:00:30,275-Speed 2512.20 samples/sec Loss 23.3895 LearningRate 0.000399 Epoch: 1 Global Step: 33130 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:00:38,472-Speed 2498.84 samples/sec Loss 23.4463 LearningRate 0.000399 Epoch: 1 Global Step: 33140 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:00:46,674-Speed 2497.30 samples/sec Loss 23.3962 LearningRate 0.000400 Epoch: 1 Global Step: 33150 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:00:54,875-Speed 2497.83 samples/sec Loss 23.3815 LearningRate 0.000400 Epoch: 1 Global Step: 33160 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:01:03,075-Speed 2497.90 samples/sec Loss 23.3273 LearningRate 0.000400 Epoch: 1 Global Step: 33170 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:01:11,278-Speed 2496.95 samples/sec Loss 23.2712 LearningRate 0.000400 Epoch: 1 Global Step: 33180 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 
22:01:19,425-Speed 2514.17 samples/sec Loss 23.2843 LearningRate 0.000400 Epoch: 1 Global Step: 33190 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:01:27,632-Speed 2495.85 samples/sec Loss 23.3094 LearningRate 0.000400 Epoch: 1 Global Step: 33200 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:01:35,834-Speed 2497.58 samples/sec Loss 23.2990 LearningRate 0.000400 Epoch: 1 Global Step: 33210 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:01:44,032-Speed 2498.56 samples/sec Loss 23.2362 LearningRate 0.000400 Epoch: 1 Global Step: 33220 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:01:52,251-Speed 2492.09 samples/sec Loss 23.2116 LearningRate 0.000401 Epoch: 1 Global Step: 33230 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:02:00,448-Speed 2499.03 samples/sec Loss 23.1514 LearningRate 0.000401 Epoch: 1 Global Step: 33240 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:02:08,600-Speed 2512.75 samples/sec Loss 23.2184 LearningRate 0.000401 Epoch: 1 Global Step: 33250 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:02:16,804-Speed 2496.68 samples/sec Loss 23.0967 LearningRate 0.000401 Epoch: 1 Global Step: 33260 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:02:25,007-Speed 2497.11 samples/sec Loss 23.1668 LearningRate 0.000401 Epoch: 1 Global Step: 33270 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:02:33,207-Speed 2497.83 samples/sec Loss 23.1688 LearningRate 0.000401 Epoch: 1 Global Step: 33280 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:02:41,405-Speed 2498.59 samples/sec Loss 23.2388 LearningRate 0.000401 Epoch: 1 Global Step: 33290 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:02:49,603-Speed 2498.83 samples/sec Loss 23.0878 LearningRate 0.000401 Epoch: 1 Global Step: 33300 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:02:57,751-Speed 
2513.89 samples/sec Loss 23.1490 LearningRate 0.000402 Epoch: 1 Global Step: 33310 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:03:05,957-Speed 2496.03 samples/sec Loss 23.1297 LearningRate 0.000402 Epoch: 1 Global Step: 33320 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:03:14,157-Speed 2497.95 samples/sec Loss 23.1035 LearningRate 0.000402 Epoch: 1 Global Step: 33330 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:03:22,356-Speed 2498.42 samples/sec Loss 23.1844 LearningRate 0.000402 Epoch: 1 Global Step: 33340 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:03:30,570-Speed 2493.82 samples/sec Loss 23.1214 LearningRate 0.000402 Epoch: 1 Global Step: 33350 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:03:38,767-Speed 2498.72 samples/sec Loss 23.1307 LearningRate 0.000402 Epoch: 1 Global Step: 33360 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:03:46,912-Speed 2514.76 samples/sec Loss 23.1606 LearningRate 0.000402 Epoch: 1 Global Step: 33370 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:03:55,109-Speed 2498.86 samples/sec Loss 23.1378 LearningRate 0.000402 Epoch: 1 Global Step: 33380 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:04:03,306-Speed 2499.23 samples/sec Loss 23.2333 LearningRate 0.000402 Epoch: 1 Global Step: 33390 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:04:11,508-Speed 2497.20 samples/sec Loss 23.1009 LearningRate 0.000403 Epoch: 1 Global Step: 33400 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:04:19,707-Speed 2498.69 samples/sec Loss 23.0139 LearningRate 0.000403 Epoch: 1 Global Step: 33410 Fp16 Grad Scale: 1024 Required: 182 hours Training: 2022-07-05 22:04:27,908-Speed 2497.52 samples/sec Loss 23.0921 LearningRate 0.000403 Epoch: 1 Global Step: 33420 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:04:36,057-Speed 2513.73 samples/sec 
Loss 23.0218 LearningRate 0.000403 Epoch: 1 Global Step: 33430 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:04:44,258-Speed 2497.54 samples/sec Loss 23.0333 LearningRate 0.000403 Epoch: 1 Global Step: 33440 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:04:52,460-Speed 2497.76 samples/sec Loss 23.0456 LearningRate 0.000403 Epoch: 1 Global Step: 33450 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:05:00,658-Speed 2498.49 samples/sec Loss 22.8440 LearningRate 0.000403 Epoch: 1 Global Step: 33460 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:05:08,871-Speed 2493.89 samples/sec Loss 22.9903 LearningRate 0.000403 Epoch: 1 Global Step: 33470 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:05:17,071-Speed 2497.97 samples/sec Loss 23.0438 LearningRate 0.000404 Epoch: 1 Global Step: 33480 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:05:25,221-Speed 2513.34 samples/sec Loss 23.1300 LearningRate 0.000404 Epoch: 1 Global Step: 33490 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:05:33,423-Speed 2497.23 samples/sec Loss 22.9399 LearningRate 0.000404 Epoch: 1 Global Step: 33500 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:05:41,625-Speed 2497.07 samples/sec Loss 22.9915 LearningRate 0.000404 Epoch: 1 Global Step: 33510 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:05:49,828-Speed 2497.01 samples/sec Loss 23.0103 LearningRate 0.000404 Epoch: 1 Global Step: 33520 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:05:58,036-Speed 2495.68 samples/sec Loss 22.9830 LearningRate 0.000404 Epoch: 1 Global Step: 33530 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:06:06,239-Speed 2497.07 samples/sec Loss 22.9325 LearningRate 0.000404 Epoch: 1 Global Step: 33540 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:06:14,385-Speed 2514.52 samples/sec Loss 22.9059 
LearningRate 0.000404 Epoch: 1 Global Step: 33550 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:06:22,588-Speed 2497.32 samples/sec Loss 22.7877 LearningRate 0.000405 Epoch: 1 Global Step: 33560 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:06:30,795-Speed 2495.74 samples/sec Loss 22.9086 LearningRate 0.000405 Epoch: 1 Global Step: 33570 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:06:38,997-Speed 2497.31 samples/sec Loss 22.9396 LearningRate 0.000405 Epoch: 1 Global Step: 33580 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:06:47,203-Speed 2496.12 samples/sec Loss 22.8910 LearningRate 0.000405 Epoch: 1 Global Step: 33590 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:06:55,400-Speed 2498.81 samples/sec Loss 22.7822 LearningRate 0.000405 Epoch: 1 Global Step: 33600 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:07:03,546-Speed 2514.79 samples/sec Loss 22.9216 LearningRate 0.000405 Epoch: 1 Global Step: 33610 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:07:11,751-Speed 2496.44 samples/sec Loss 22.8670 LearningRate 0.000405 Epoch: 1 Global Step: 33620 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:07:19,967-Speed 2492.88 samples/sec Loss 22.8256 LearningRate 0.000405 Epoch: 1 Global Step: 33630 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:07:28,165-Speed 2498.78 samples/sec Loss 22.7483 LearningRate 0.000406 Epoch: 1 Global Step: 33640 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:07:36,366-Speed 2497.73 samples/sec Loss 22.8202 LearningRate 0.000406 Epoch: 1 Global Step: 33650 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:07:44,572-Speed 2496.05 samples/sec Loss 22.8672 LearningRate 0.000406 Epoch: 1 Global Step: 33660 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:07:52,726-Speed 2511.97 samples/sec Loss 22.8202 LearningRate 
0.000406 Epoch: 1 Global Step: 33670 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:08:00,939-Speed 2493.93 samples/sec Loss 22.8097 LearningRate 0.000406 Epoch: 1 Global Step: 33680 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:08:09,139-Speed 2497.98 samples/sec Loss 22.8393 LearningRate 0.000406 Epoch: 1 Global Step: 33690 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:08:17,343-Speed 2496.63 samples/sec Loss 22.8276 LearningRate 0.000406 Epoch: 1 Global Step: 33700 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:08:25,544-Speed 2497.63 samples/sec Loss 22.8260 LearningRate 0.000406 Epoch: 1 Global Step: 33710 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:08:33,747-Speed 2497.31 samples/sec Loss 22.7518 LearningRate 0.000406 Epoch: 1 Global Step: 33720 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:08:41,893-Speed 2514.54 samples/sec Loss 22.7899 LearningRate 0.000407 Epoch: 1 Global Step: 33730 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:08:50,092-Speed 2498.20 samples/sec Loss 22.7853 LearningRate 0.000407 Epoch: 1 Global Step: 33740 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:08:58,294-Speed 2497.58 samples/sec Loss 22.7224 LearningRate 0.000407 Epoch: 1 Global Step: 33750 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:09:06,495-Speed 2497.78 samples/sec Loss 22.7696 LearningRate 0.000407 Epoch: 1 Global Step: 33760 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:09:14,701-Speed 2496.00 samples/sec Loss 22.8263 LearningRate 0.000407 Epoch: 1 Global Step: 33770 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:09:22,905-Speed 2496.64 samples/sec Loss 22.9178 LearningRate 0.000407 Epoch: 1 Global Step: 33780 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:09:31,054-Speed 2513.38 samples/sec Loss 22.8824 LearningRate 0.000407 Epoch: 1 
Global Step: 33790 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:09:39,260-Speed 2496.51 samples/sec Loss 22.7271 LearningRate 0.000407 Epoch: 1 Global Step: 33800 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:09:47,461-Speed 2497.80 samples/sec Loss 22.7733 LearningRate 0.000408 Epoch: 1 Global Step: 33810 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:09:55,661-Speed 2498.05 samples/sec Loss 22.6967 LearningRate 0.000408 Epoch: 1 Global Step: 33820 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:10:03,871-Speed 2494.85 samples/sec Loss 22.7515 LearningRate 0.000408 Epoch: 1 Global Step: 33830 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:10:12,074-Speed 2497.15 samples/sec Loss 22.7085 LearningRate 0.000408 Epoch: 1 Global Step: 33840 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:10:20,222-Speed 2513.83 samples/sec Loss 22.6942 LearningRate 0.000408 Epoch: 1 Global Step: 33850 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:10:28,424-Speed 2497.11 samples/sec Loss 22.7109 LearningRate 0.000408 Epoch: 1 Global Step: 33860 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:10:36,623-Speed 2498.29 samples/sec Loss 22.5588 LearningRate 0.000408 Epoch: 1 Global Step: 33870 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:10:44,823-Speed 2497.87 samples/sec Loss 22.5804 LearningRate 0.000408 Epoch: 1 Global Step: 33880 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:10:53,028-Speed 2496.44 samples/sec Loss 22.7125 LearningRate 0.000409 Epoch: 1 Global Step: 33890 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:11:01,227-Speed 2498.30 samples/sec Loss 22.6993 LearningRate 0.000409 Epoch: 1 Global Step: 33900 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:11:09,381-Speed 2512.63 samples/sec Loss 22.5706 LearningRate 0.000409 Epoch: 1 Global Step: 33910 
Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:11:17,580-Speed 2498.21 samples/sec Loss 22.5894 LearningRate 0.000409 Epoch: 1 Global Step: 33920 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:11:25,780-Speed 2497.91 samples/sec Loss 22.4076 LearningRate 0.000409 Epoch: 1 Global Step: 33930 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:11:33,982-Speed 2497.29 samples/sec Loss 22.5181 LearningRate 0.000409 Epoch: 1 Global Step: 33940 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:11:42,183-Speed 2497.73 samples/sec Loss 22.5376 LearningRate 0.000409 Epoch: 1 Global Step: 33950 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:11:50,386-Speed 2497.31 samples/sec Loss 22.5780 LearningRate 0.000409 Epoch: 1 Global Step: 33960 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:11:58,555-Speed 2507.49 samples/sec Loss 22.7067 LearningRate 0.000409 Epoch: 1 Global Step: 33970 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:12:06,754-Speed 2498.14 samples/sec Loss 22.5879 LearningRate 0.000410 Epoch: 1 Global Step: 33980 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:12:14,955-Speed 2497.71 samples/sec Loss 22.4998 LearningRate 0.000410 Epoch: 1 Global Step: 33990 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:12:23,156-Speed 2497.46 samples/sec Loss 22.6086 LearningRate 0.000410 Epoch: 1 Global Step: 34000 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:12:31,376-Speed 2491.80 samples/sec Loss 22.4604 LearningRate 0.000410 Epoch: 1 Global Step: 34010 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:12:39,591-Speed 2493.75 samples/sec Loss 22.3268 LearningRate 0.000410 Epoch: 1 Global Step: 34020 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:12:47,753-Speed 2509.54 samples/sec Loss 22.4579 LearningRate 0.000410 Epoch: 1 Global Step: 34030 Fp16 Grad Scale: 
2048 Required: 182 hours Training: 2022-07-05 22:12:55,959-Speed 2496.25 samples/sec Loss 22.5628 LearningRate 0.000410 Epoch: 1 Global Step: 34040 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:13:04,163-Speed 2496.73 samples/sec Loss 22.4649 LearningRate 0.000410 Epoch: 1 Global Step: 34050 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:13:12,365-Speed 2497.61 samples/sec Loss 22.5110 LearningRate 0.000411 Epoch: 1 Global Step: 34060 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:13:20,570-Speed 2496.46 samples/sec Loss 22.3544 LearningRate 0.000411 Epoch: 1 Global Step: 34070 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:13:28,772-Speed 2497.09 samples/sec Loss 22.4132 LearningRate 0.000411 Epoch: 1 Global Step: 34080 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:13:36,931-Speed 2510.75 samples/sec Loss 22.3852 LearningRate 0.000411 Epoch: 1 Global Step: 34090 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:13:45,142-Speed 2494.47 samples/sec Loss 22.3562 LearningRate 0.000411 Epoch: 1 Global Step: 34100 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:13:53,347-Speed 2496.41 samples/sec Loss 22.3796 LearningRate 0.000411 Epoch: 1 Global Step: 34110 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:14:01,549-Speed 2497.26 samples/sec Loss 22.3175 LearningRate 0.000411 Epoch: 1 Global Step: 34120 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:14:09,751-Speed 2497.44 samples/sec Loss 22.3887 LearningRate 0.000411 Epoch: 1 Global Step: 34130 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:14:17,956-Speed 2496.29 samples/sec Loss 22.3824 LearningRate 0.000412 Epoch: 1 Global Step: 34140 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:14:26,103-Speed 2514.16 samples/sec Loss 22.2658 LearningRate 0.000412 Epoch: 1 Global Step: 34150 Fp16 Grad Scale: 2048 Required: 182 
hours Training: 2022-07-05 22:14:34,303-Speed 2498.01 samples/sec Loss 22.2747 LearningRate 0.000412 Epoch: 1 Global Step: 34160 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:14:42,505-Speed 2497.38 samples/sec Loss 22.2931 LearningRate 0.000412 Epoch: 1 Global Step: 34170 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:14:50,707-Speed 2497.24 samples/sec Loss 22.1902 LearningRate 0.000412 Epoch: 1 Global Step: 34180 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:14:58,911-Speed 2496.72 samples/sec Loss 22.0978 LearningRate 0.000412 Epoch: 1 Global Step: 34190 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:15:07,117-Speed 2495.92 samples/sec Loss 22.1649 LearningRate 0.000412 Epoch: 1 Global Step: 34200 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:15:15,266-Speed 2513.89 samples/sec Loss 22.2359 LearningRate 0.000412 Epoch: 1 Global Step: 34210 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:15:23,482-Speed 2492.98 samples/sec Loss 22.2431 LearningRate 0.000412 Epoch: 1 Global Step: 34220 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:15:31,682-Speed 2498.03 samples/sec Loss 22.1552 LearningRate 0.000413 Epoch: 1 Global Step: 34230 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:15:39,892-Speed 2494.94 samples/sec Loss 22.3153 LearningRate 0.000413 Epoch: 1 Global Step: 34240 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:15:48,092-Speed 2497.72 samples/sec Loss 22.1984 LearningRate 0.000413 Epoch: 1 Global Step: 34250 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:15:56,293-Speed 2497.77 samples/sec Loss 22.3138 LearningRate 0.000413 Epoch: 1 Global Step: 34260 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:16:04,450-Speed 2511.14 samples/sec Loss 22.2593 LearningRate 0.000413 Epoch: 1 Global Step: 34270 Fp16 Grad Scale: 2048 Required: 182 hours Training: 
2022-07-05 22:16:12,653-Speed 2497.36 samples/sec Loss 22.0952 LearningRate 0.000413 Epoch: 1 Global Step: 34280 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:16:20,854-Speed 2497.47 samples/sec Loss 22.1834 LearningRate 0.000413 Epoch: 1 Global Step: 34290 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:16:29,058-Speed 2496.70 samples/sec Loss 22.0214 LearningRate 0.000413 Epoch: 1 Global Step: 34300 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:16:37,257-Speed 2498.51 samples/sec Loss 22.1791 LearningRate 0.000414 Epoch: 1 Global Step: 34310 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:16:45,458-Speed 2497.64 samples/sec Loss 22.0818 LearningRate 0.000414 Epoch: 1 Global Step: 34320 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:16:53,606-Speed 2513.75 samples/sec Loss 22.0166 LearningRate 0.000414 Epoch: 1 Global Step: 34330 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:17:01,808-Speed 2497.71 samples/sec Loss 22.0944 LearningRate 0.000414 Epoch: 1 Global Step: 34340 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:17:10,007-Speed 2498.13 samples/sec Loss 22.0541 LearningRate 0.000414 Epoch: 1 Global Step: 34350 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:17:18,205-Speed 2498.56 samples/sec Loss 22.0161 LearningRate 0.000414 Epoch: 1 Global Step: 34360 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:17:26,403-Speed 2498.68 samples/sec Loss 22.1233 LearningRate 0.000414 Epoch: 1 Global Step: 34370 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:17:34,607-Speed 2496.68 samples/sec Loss 22.1924 LearningRate 0.000414 Epoch: 1 Global Step: 34380 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:17:42,763-Speed 2511.56 samples/sec Loss 22.0512 LearningRate 0.000415 Epoch: 1 Global Step: 34390 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 
22:17:50,970-Speed 2495.67 samples/sec Loss 22.0511 LearningRate 0.000415 Epoch: 1 Global Step: 34400 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:17:59,168-Speed 2498.55 samples/sec Loss 21.8909 LearningRate 0.000415 Epoch: 1 Global Step: 34410 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:18:07,368-Speed 2498.20 samples/sec Loss 21.9927 LearningRate 0.000415 Epoch: 1 Global Step: 34420 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:18:15,570-Speed 2497.70 samples/sec Loss 22.0051 LearningRate 0.000415 Epoch: 1 Global Step: 34430 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:18:23,767-Speed 2498.64 samples/sec Loss 22.0404 LearningRate 0.000415 Epoch: 1 Global Step: 34440 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:18:31,917-Speed 2513.32 samples/sec Loss 22.0639 LearningRate 0.000415 Epoch: 1 Global Step: 34450 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:18:40,117-Speed 2498.30 samples/sec Loss 22.0222 LearningRate 0.000415 Epoch: 1 Global Step: 34460 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:18:48,320-Speed 2497.73 samples/sec Loss 21.9346 LearningRate 0.000416 Epoch: 1 Global Step: 34470 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:18:56,518-Speed 2498.16 samples/sec Loss 22.1509 LearningRate 0.000416 Epoch: 1 Global Step: 34480 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:19:04,724-Speed 2496.05 samples/sec Loss 22.1081 LearningRate 0.000416 Epoch: 1 Global Step: 34490 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:19:12,929-Speed 2496.46 samples/sec Loss 22.0187 LearningRate 0.000416 Epoch: 1 Global Step: 34500 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:19:21,080-Speed 2513.19 samples/sec Loss 21.9529 LearningRate 0.000416 Epoch: 1 Global Step: 34510 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:19:29,275-Speed 
2499.25 samples/sec Loss 21.9528 LearningRate 0.000416 Epoch: 1 Global Step: 34520 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:19:37,476-Speed 2497.73 samples/sec Loss 21.8649 LearningRate 0.000416 Epoch: 1 Global Step: 34530 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:19:45,675-Speed 2498.48 samples/sec Loss 21.7291 LearningRate 0.000416 Epoch: 1 Global Step: 34540 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:19:53,873-Speed 2498.19 samples/sec Loss 21.7542 LearningRate 0.000416 Epoch: 1 Global Step: 34550 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:20:02,074-Speed 2497.66 samples/sec Loss 21.8784 LearningRate 0.000417 Epoch: 1 Global Step: 34560 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:20:10,224-Speed 2513.68 samples/sec Loss 21.9263 LearningRate 0.000417 Epoch: 1 Global Step: 34570 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:20:18,424-Speed 2497.90 samples/sec Loss 21.9403 LearningRate 0.000417 Epoch: 1 Global Step: 34580 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:20:26,622-Speed 2498.57 samples/sec Loss 21.8634 LearningRate 0.000417 Epoch: 1 Global Step: 34590 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:20:34,821-Speed 2498.35 samples/sec Loss 21.8733 LearningRate 0.000417 Epoch: 1 Global Step: 34600 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:20:43,021-Speed 2497.96 samples/sec Loss 21.7857 LearningRate 0.000417 Epoch: 1 Global Step: 34610 Fp16 Grad Scale: 2048 Required: 182 hours Training: 2022-07-05 22:20:51,219-Speed 2498.57 samples/sec Loss 21.8214 LearningRate 0.000417 Epoch: 1 Global Step: 34620 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:20:59,368-Speed 2513.82 samples/sec Loss 21.8113 LearningRate 0.000417 Epoch: 1 Global Step: 34630 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:21:07,575-Speed 2495.66 samples/sec 
Loss 21.7819 LearningRate 0.000418 Epoch: 1 Global Step: 34640 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:21:15,774-Speed 2498.17 samples/sec Loss 21.7381 LearningRate 0.000418 Epoch: 1 Global Step: 34650 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:21:23,977-Speed 2497.09 samples/sec Loss 21.7591 LearningRate 0.000418 Epoch: 1 Global Step: 34660 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:21:32,179-Speed 2497.19 samples/sec Loss 21.7298 LearningRate 0.000418 Epoch: 1 Global Step: 34670 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:21:40,378-Speed 2498.37 samples/sec Loss 21.7315 LearningRate 0.000418 Epoch: 1 Global Step: 34680 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:21:48,530-Speed 2512.72 samples/sec Loss 21.7287 LearningRate 0.000418 Epoch: 1 Global Step: 34690 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:21:56,734-Speed 2496.57 samples/sec Loss 21.6445 LearningRate 0.000418 Epoch: 1 Global Step: 34700 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:22:04,934-Speed 2497.91 samples/sec Loss 21.6909 LearningRate 0.000418 Epoch: 1 Global Step: 34710 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:22:13,135-Speed 2497.69 samples/sec Loss 21.7087 LearningRate 0.000419 Epoch: 1 Global Step: 34720 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:22:21,335-Speed 2497.93 samples/sec Loss 21.8957 LearningRate 0.000419 Epoch: 1 Global Step: 34730 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:22:29,531-Speed 2499.17 samples/sec Loss 21.7770 LearningRate 0.000419 Epoch: 1 Global Step: 34740 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:22:37,679-Speed 2513.72 samples/sec Loss 21.7044 LearningRate 0.000419 Epoch: 1 Global Step: 34750 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:22:45,880-Speed 2497.87 samples/sec Loss 21.7179 
LearningRate 0.000419 Epoch: 1 Global Step: 34760 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:22:54,085-Speed 2496.66 samples/sec Loss 21.6777 LearningRate 0.000419 Epoch: 1 Global Step: 34770 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:23:02,285-Speed 2498.00 samples/sec Loss 21.7270 LearningRate 0.000419 Epoch: 1 Global Step: 34780 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:23:10,497-Speed 2494.30 samples/sec Loss 21.6278 LearningRate 0.000419 Epoch: 1 Global Step: 34790 Fp16 Grad Scale: 4096 Required: 182 hours Training: 2022-07-05 22:23:18,699-Speed 2497.49 samples/sec Loss 21.6413 LearningRate 0.000419 Epoch: 1 Global Step: 34800 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:23:26,850-Speed 2513.09 samples/sec Loss 21.6253 LearningRate 0.000420 Epoch: 1 Global Step: 34810 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:23:35,052-Speed 2497.17 samples/sec Loss 21.5519 LearningRate 0.000420 Epoch: 1 Global Step: 34820 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:23:43,254-Speed 2497.61 samples/sec Loss 21.6725 LearningRate 0.000420 Epoch: 1 Global Step: 34830 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:23:51,451-Speed 2498.72 samples/sec Loss 21.5847 LearningRate 0.000420 Epoch: 1 Global Step: 34840 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:23:59,656-Speed 2496.53 samples/sec Loss 21.4798 LearningRate 0.000420 Epoch: 1 Global Step: 34850 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:24:07,859-Speed 2497.26 samples/sec Loss 21.4141 LearningRate 0.000420 Epoch: 1 Global Step: 34860 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:24:16,007-Speed 2513.85 samples/sec Loss 21.5138 LearningRate 0.000420 Epoch: 1 Global Step: 34870 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:24:24,209-Speed 2497.32 samples/sec Loss 21.5322 LearningRate 
0.000420 Epoch: 1 Global Step: 34880 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:24:32,409-Speed 2498.03 samples/sec Loss 21.5179 LearningRate 0.000421 Epoch: 1 Global Step: 34890 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:24:40,605-Speed 2499.11 samples/sec Loss 21.4825 LearningRate 0.000421 Epoch: 1 Global Step: 34900 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:24:48,805-Speed 2498.07 samples/sec Loss 21.5347 LearningRate 0.000421 Epoch: 1 Global Step: 34910 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:24:57,004-Speed 2498.22 samples/sec Loss 21.4258 LearningRate 0.000421 Epoch: 1 Global Step: 34920 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:25:05,161-Speed 2510.87 samples/sec Loss 21.3731 LearningRate 0.000421 Epoch: 1 Global Step: 34930 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:25:13,362-Speed 2497.69 samples/sec Loss 21.4532 LearningRate 0.000421 Epoch: 1 Global Step: 34940 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:25:21,562-Speed 2497.88 samples/sec Loss 21.3841 LearningRate 0.000421 Epoch: 1 Global Step: 34950 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:25:29,760-Speed 2498.65 samples/sec Loss 21.4276 LearningRate 0.000421 Epoch: 1 Global Step: 34960 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:25:37,965-Speed 2496.54 samples/sec Loss 21.4702 LearningRate 0.000422 Epoch: 1 Global Step: 34970 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:25:46,175-Speed 2494.87 samples/sec Loss 21.4621 LearningRate 0.000422 Epoch: 1 Global Step: 34980 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:25:54,321-Speed 2514.25 samples/sec Loss 21.4236 LearningRate 0.000422 Epoch: 1 Global Step: 34990 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:26:02,524-Speed 2497.13 samples/sec Loss 21.4287 LearningRate 0.000422 Epoch: 1 
Global Step: 35000 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:26:10,728-Speed 2496.82 samples/sec Loss 21.2532 LearningRate 0.000422 Epoch: 1 Global Step: 35010 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:26:18,933-Speed 2496.21 samples/sec Loss 21.3672 LearningRate 0.000422 Epoch: 1 Global Step: 35020 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:26:27,139-Speed 2496.28 samples/sec Loss 21.3855 LearningRate 0.000422 Epoch: 1 Global Step: 35030 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:26:35,339-Speed 2497.92 samples/sec Loss 21.3119 LearningRate 0.000422 Epoch: 1 Global Step: 35040 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:26:43,490-Speed 2513.13 samples/sec Loss 21.3483 LearningRate 0.000423 Epoch: 1 Global Step: 35050 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:26:51,698-Speed 2495.43 samples/sec Loss 21.2175 LearningRate 0.000423 Epoch: 1 Global Step: 35060 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:26:59,909-Speed 2494.70 samples/sec Loss 21.2756 LearningRate 0.000423 Epoch: 1 Global Step: 35070 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:27:08,144-Speed 2487.30 samples/sec Loss 21.4561 LearningRate 0.000423 Epoch: 1 Global Step: 35080 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:27:16,345-Speed 2497.74 samples/sec Loss 21.3561 LearningRate 0.000423 Epoch: 1 Global Step: 35090 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:27:24,550-Speed 2496.54 samples/sec Loss 21.3178 LearningRate 0.000423 Epoch: 1 Global Step: 35100 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:27:32,696-Speed 2514.37 samples/sec Loss 21.2490 LearningRate 0.000423 Epoch: 1 Global Step: 35110 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:27:40,898-Speed 2497.42 samples/sec Loss 21.2123 LearningRate 0.000423 Epoch: 1 Global Step: 35120 
Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:27:49,100-Speed 2497.49 samples/sec Loss 21.3355 LearningRate 0.000423 Epoch: 1 Global Step: 35130 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:27:57,301-Speed 2497.46 samples/sec Loss 21.4014 LearningRate 0.000424 Epoch: 1 Global Step: 35140 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:28:05,498-Speed 2498.90 samples/sec Loss 21.3114 LearningRate 0.000424 Epoch: 1 Global Step: 35150 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:28:13,697-Speed 2498.35 samples/sec Loss 21.1926 LearningRate 0.000424 Epoch: 1 Global Step: 35160 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:28:21,844-Speed 2514.03 samples/sec Loss 21.2555 LearningRate 0.000424 Epoch: 1 Global Step: 35170 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:28:30,044-Speed 2497.88 samples/sec Loss 21.3231 LearningRate 0.000424 Epoch: 1 Global Step: 35180 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:28:38,244-Speed 2498.16 samples/sec Loss 21.1493 LearningRate 0.000424 Epoch: 1 Global Step: 35190 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:28:46,447-Speed 2497.39 samples/sec Loss 21.1997 LearningRate 0.000424 Epoch: 1 Global Step: 35200 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:28:54,649-Speed 2497.25 samples/sec Loss 21.2805 LearningRate 0.000424 Epoch: 1 Global Step: 35210 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:29:02,846-Speed 2498.80 samples/sec Loss 21.2657 LearningRate 0.000425 Epoch: 1 Global Step: 35220 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:29:11,006-Speed 2510.30 samples/sec Loss 21.1614 LearningRate 0.000425 Epoch: 1 Global Step: 35230 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:29:19,207-Speed 2497.50 samples/sec Loss 21.1799 LearningRate 0.000425 Epoch: 1 Global Step: 35240 Fp16 Grad Scale: 
4096 Required: 181 hours Training: 2022-07-05 22:29:27,406-Speed 2498.20 samples/sec Loss 21.0711 LearningRate 0.000425 Epoch: 1 Global Step: 35250 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:29:35,616-Speed 2495.00 samples/sec Loss 21.0719 LearningRate 0.000425 Epoch: 1 Global Step: 35260 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:29:43,816-Speed 2498.04 samples/sec Loss 21.1310 LearningRate 0.000425 Epoch: 1 Global Step: 35270 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:29:52,017-Speed 2497.58 samples/sec Loss 21.1416 LearningRate 0.000425 Epoch: 1 Global Step: 35280 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:30:00,164-Speed 2514.11 samples/sec Loss 21.1609 LearningRate 0.000425 Epoch: 1 Global Step: 35290 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:30:08,367-Speed 2497.03 samples/sec Loss 21.1196 LearningRate 0.000426 Epoch: 1 Global Step: 35300 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:30:16,566-Speed 2498.10 samples/sec Loss 21.0900 LearningRate 0.000426 Epoch: 1 Global Step: 35310 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:30:24,772-Speed 2496.35 samples/sec Loss 21.1171 LearningRate 0.000426 Epoch: 1 Global Step: 35320 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:30:32,972-Speed 2497.87 samples/sec Loss 21.1496 LearningRate 0.000426 Epoch: 1 Global Step: 35330 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:30:41,172-Speed 2498.01 samples/sec Loss 21.1191 LearningRate 0.000426 Epoch: 1 Global Step: 35340 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:30:49,331-Speed 2510.66 samples/sec Loss 21.0876 LearningRate 0.000426 Epoch: 1 Global Step: 35350 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:30:57,529-Speed 2498.42 samples/sec Loss 21.0414 LearningRate 0.000426 Epoch: 1 Global Step: 35360 Fp16 Grad Scale: 4096 Required: 181 
hours Training: 2022-07-05 22:31:05,730-Speed 2497.59 samples/sec Loss 20.9914 LearningRate 0.000426 Epoch: 1 Global Step: 35370 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:31:13,933-Speed 2497.15 samples/sec Loss 21.0255 LearningRate 0.000426 Epoch: 1 Global Step: 35380 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:31:22,129-Speed 2499.27 samples/sec Loss 21.0365 LearningRate 0.000427 Epoch: 1 Global Step: 35390 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:31:30,330-Speed 2497.50 samples/sec Loss 20.9081 LearningRate 0.000427 Epoch: 1 Global Step: 35400 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:31:38,477-Speed 2514.43 samples/sec Loss 20.9758 LearningRate 0.000427 Epoch: 1 Global Step: 35410 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:31:46,677-Speed 2498.34 samples/sec Loss 20.9143 LearningRate 0.000427 Epoch: 1 Global Step: 35420 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:31:54,875-Speed 2498.48 samples/sec Loss 20.9446 LearningRate 0.000427 Epoch: 1 Global Step: 35430 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:32:03,078-Speed 2497.19 samples/sec Loss 20.8584 LearningRate 0.000427 Epoch: 1 Global Step: 35440 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:32:11,278-Speed 2498.00 samples/sec Loss 21.0226 LearningRate 0.000427 Epoch: 1 Global Step: 35450 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:32:19,496-Speed 2492.71 samples/sec Loss 20.9458 LearningRate 0.000427 Epoch: 1 Global Step: 35460 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:32:27,646-Speed 2513.07 samples/sec Loss 20.9020 LearningRate 0.000428 Epoch: 1 Global Step: 35470 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:32:35,847-Speed 2497.74 samples/sec Loss 20.9359 LearningRate 0.000428 Epoch: 1 Global Step: 35480 Fp16 Grad Scale: 4096 Required: 181 hours Training: 
2022-07-05 22:32:44,044-Speed 2498.65 samples/sec Loss 21.0190 LearningRate 0.000428 Epoch: 1 Global Step: 35490 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:32:52,249-Speed 2496.60 samples/sec Loss 21.0179 LearningRate 0.000428 Epoch: 1 Global Step: 35500 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:33:00,451-Speed 2497.38 samples/sec Loss 20.9381 LearningRate 0.000428 Epoch: 1 Global Step: 35510 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:33:08,651-Speed 2497.90 samples/sec Loss 20.8929 LearningRate 0.000428 Epoch: 1 Global Step: 35520 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:33:16,798-Speed 2514.00 samples/sec Loss 20.8546 LearningRate 0.000428 Epoch: 1 Global Step: 35530 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:33:24,998-Speed 2498.07 samples/sec Loss 20.7865 LearningRate 0.000428 Epoch: 1 Global Step: 35540 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:33:33,197-Speed 2498.87 samples/sec Loss 20.9317 LearningRate 0.000429 Epoch: 1 Global Step: 35550 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:33:41,415-Speed 2492.22 samples/sec Loss 20.9281 LearningRate 0.000429 Epoch: 1 Global Step: 35560 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:33:49,613-Speed 2498.64 samples/sec Loss 20.9104 LearningRate 0.000429 Epoch: 1 Global Step: 35570 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:33:57,817-Speed 2497.00 samples/sec Loss 20.8108 LearningRate 0.000429 Epoch: 1 Global Step: 35580 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:34:05,990-Speed 2505.87 samples/sec Loss 20.8353 LearningRate 0.000429 Epoch: 1 Global Step: 35590 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:34:14,190-Speed 2498.09 samples/sec Loss 20.8171 LearningRate 0.000429 Epoch: 1 Global Step: 35600 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 
22:34:22,388-Speed 2498.72 samples/sec Loss 20.8722 LearningRate 0.000429 Epoch: 1 Global Step: 35610 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:34:30,585-Speed 2498.89 samples/sec Loss 20.7901 LearningRate 0.000429 Epoch: 1 Global Step: 35620 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:34:38,794-Speed 2495.24 samples/sec Loss 20.7911 LearningRate 0.000429 Epoch: 1 Global Step: 35630 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:34:46,996-Speed 2497.22 samples/sec Loss 20.8532 LearningRate 0.000430 Epoch: 1 Global Step: 35640 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:34:55,142-Speed 2514.48 samples/sec Loss 20.7127 LearningRate 0.000430 Epoch: 1 Global Step: 35650 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:35:03,347-Speed 2496.65 samples/sec Loss 20.8712 LearningRate 0.000430 Epoch: 1 Global Step: 35660 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:35:11,565-Speed 2492.52 samples/sec Loss 20.7494 LearningRate 0.000430 Epoch: 1 Global Step: 35670 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:35:19,769-Speed 2496.82 samples/sec Loss 20.7491 LearningRate 0.000430 Epoch: 1 Global Step: 35680 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:35:27,976-Speed 2496.01 samples/sec Loss 20.8747 LearningRate 0.000430 Epoch: 1 Global Step: 35690 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:35:36,187-Speed 2494.67 samples/sec Loss 20.8382 LearningRate 0.000430 Epoch: 1 Global Step: 35700 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:35:44,338-Speed 2512.96 samples/sec Loss 20.7979 LearningRate 0.000430 Epoch: 1 Global Step: 35710 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:35:52,547-Speed 2495.48 samples/sec Loss 20.7927 LearningRate 0.000431 Epoch: 1 Global Step: 35720 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:36:00,752-Speed 
2496.42 samples/sec Loss 20.8222 LearningRate 0.000431 Epoch: 1 Global Step: 35730 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:36:08,955-Speed 2497.04 samples/sec Loss 20.7251 LearningRate 0.000431 Epoch: 1 Global Step: 35740 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:36:17,169-Speed 2493.54 samples/sec Loss 20.7202 LearningRate 0.000431 Epoch: 1 Global Step: 35750 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:36:25,377-Speed 2495.76 samples/sec Loss 20.7699 LearningRate 0.000431 Epoch: 1 Global Step: 35760 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:36:33,526-Speed 2513.52 samples/sec Loss 20.6255 LearningRate 0.000431 Epoch: 1 Global Step: 35770 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:36:41,728-Speed 2497.08 samples/sec Loss 20.7948 LearningRate 0.000431 Epoch: 1 Global Step: 35780 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:36:49,925-Speed 2498.84 samples/sec Loss 20.5888 LearningRate 0.000431 Epoch: 1 Global Step: 35790 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:36:58,127-Speed 2497.29 samples/sec Loss 20.6326 LearningRate 0.000432 Epoch: 1 Global Step: 35800 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:37:06,329-Speed 2497.71 samples/sec Loss 20.6500 LearningRate 0.000432 Epoch: 1 Global Step: 35810 Fp16 Grad Scale: 4096 Required: 181 hours Training: 2022-07-05 22:37:14,534-Speed 2496.44 samples/sec Loss 20.6210 LearningRate 0.000432 Epoch: 1 Global Step: 35820 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:37:22,684-Speed 2513.28 samples/sec Loss 20.5861 LearningRate 0.000432 Epoch: 1 Global Step: 35830 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:37:30,885-Speed 2497.48 samples/sec Loss 20.6614 LearningRate 0.000432 Epoch: 1 Global Step: 35840 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:37:39,089-Speed 2496.94 samples/sec 
Loss 20.6487 LearningRate 0.000432 Epoch: 1 Global Step: 35850 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:37:47,284-Speed 2499.52 samples/sec Loss 20.6227 LearningRate 0.000432 Epoch: 1 Global Step: 35860 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:37:55,487-Speed 2497.09 samples/sec Loss 20.5848 LearningRate 0.000432 Epoch: 1 Global Step: 35870 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:38:03,689-Speed 2497.45 samples/sec Loss 20.5867 LearningRate 0.000433 Epoch: 1 Global Step: 35880 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:38:11,838-Speed 2513.71 samples/sec Loss 20.7562 LearningRate 0.000433 Epoch: 1 Global Step: 35890 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:38:20,041-Speed 2496.98 samples/sec Loss 20.6473 LearningRate 0.000433 Epoch: 1 Global Step: 35900 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:38:28,244-Speed 2497.03 samples/sec Loss 20.6098 LearningRate 0.000433 Epoch: 1 Global Step: 35910 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:38:36,444-Speed 2498.01 samples/sec Loss 20.5668 LearningRate 0.000433 Epoch: 1 Global Step: 35920 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:38:44,646-Speed 2497.17 samples/sec Loss 20.6407 LearningRate 0.000433 Epoch: 1 Global Step: 35930 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:38:52,867-Speed 2491.68 samples/sec Loss 20.5249 LearningRate 0.000433 Epoch: 1 Global Step: 35940 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:39:01,014-Speed 2514.10 samples/sec Loss 20.4317 LearningRate 0.000433 Epoch: 1 Global Step: 35950 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:39:09,219-Speed 2496.47 samples/sec Loss 20.4352 LearningRate 0.000433 Epoch: 1 Global Step: 35960 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:39:17,422-Speed 2497.00 samples/sec Loss 20.5186 
LearningRate 0.000434 Epoch: 1 Global Step: 35970 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:39:25,622-Speed 2498.11 samples/sec Loss 20.5539 LearningRate 0.000434 Epoch: 1 Global Step: 35980 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:39:33,823-Speed 2497.31 samples/sec Loss 20.4724 LearningRate 0.000434 Epoch: 1 Global Step: 35990 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:39:42,031-Speed 2495.61 samples/sec Loss 20.4406 LearningRate 0.000434 Epoch: 1 Global Step: 36000 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:39:50,182-Speed 2513.28 samples/sec Loss 20.4929 LearningRate 0.000434 Epoch: 1 Global Step: 36010 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:39:58,391-Speed 2494.92 samples/sec Loss 20.4865 LearningRate 0.000434 Epoch: 1 Global Step: 36020 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:40:06,601-Speed 2495.07 samples/sec Loss 20.3199 LearningRate 0.000434 Epoch: 1 Global Step: 36030 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:40:14,808-Speed 2495.77 samples/sec Loss 20.4541 LearningRate 0.000434 Epoch: 1 Global Step: 36040 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:40:23,025-Speed 2492.71 samples/sec Loss 20.3555 LearningRate 0.000435 Epoch: 1 Global Step: 36050 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:40:31,232-Speed 2495.96 samples/sec Loss 20.2691 LearningRate 0.000435 Epoch: 1 Global Step: 36060 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:40:39,386-Speed 2512.14 samples/sec Loss 20.3630 LearningRate 0.000435 Epoch: 1 Global Step: 36070 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:40:47,589-Speed 2497.09 samples/sec Loss 20.3771 LearningRate 0.000435 Epoch: 1 Global Step: 36080 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:40:55,792-Speed 2497.12 samples/sec Loss 20.3693 LearningRate 
0.000435 Epoch: 1 Global Step: 36090 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:41:03,997-Speed 2496.61 samples/sec Loss 20.3570 LearningRate 0.000435 Epoch: 1 Global Step: 36100 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:41:12,207-Speed 2494.96 samples/sec Loss 20.3015 LearningRate 0.000435 Epoch: 1 Global Step: 36110 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:41:20,410-Speed 2497.16 samples/sec Loss 20.3061 LearningRate 0.000435 Epoch: 1 Global Step: 36120 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:41:28,566-Speed 2511.19 samples/sec Loss 20.3766 LearningRate 0.000436 Epoch: 1 Global Step: 36130 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:41:36,766-Speed 2498.18 samples/sec Loss 20.4007 LearningRate 0.000436 Epoch: 1 Global Step: 36140 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:41:44,970-Speed 2496.77 samples/sec Loss 20.3084 LearningRate 0.000436 Epoch: 1 Global Step: 36150 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:41:53,174-Speed 2496.91 samples/sec Loss 20.2414 LearningRate 0.000436 Epoch: 1 Global Step: 36160 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:42:01,375-Speed 2497.72 samples/sec Loss 20.2779 LearningRate 0.000436 Epoch: 1 Global Step: 36170 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:42:09,576-Speed 2497.38 samples/sec Loss 20.4987 LearningRate 0.000436 Epoch: 1 Global Step: 36180 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:42:17,724-Speed 2514.19 samples/sec Loss 20.3353 LearningRate 0.000436 Epoch: 1 Global Step: 36190 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:42:25,931-Speed 2495.87 samples/sec Loss 20.2255 LearningRate 0.000436 Epoch: 1 Global Step: 36200 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:42:34,148-Speed 2492.60 samples/sec Loss 20.2296 LearningRate 0.000436 Epoch: 1 
Global Step: 36210 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:42:42,348-Speed 2497.76 samples/sec Loss 20.1514 LearningRate 0.000437 Epoch: 1 Global Step: 36220 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:42:50,557-Speed 2495.50 samples/sec Loss 20.1825 LearningRate 0.000437 Epoch: 1 Global Step: 36230 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:42:58,760-Speed 2496.97 samples/sec Loss 20.2419 LearningRate 0.000437 Epoch: 1 Global Step: 36240 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:43:06,913-Speed 2512.29 samples/sec Loss 20.2712 LearningRate 0.000437 Epoch: 1 Global Step: 36250 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:43:15,118-Speed 2496.53 samples/sec Loss 20.1910 LearningRate 0.000437 Epoch: 1 Global Step: 36260 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:43:23,321-Speed 2497.32 samples/sec Loss 20.2164 LearningRate 0.000437 Epoch: 1 Global Step: 36270 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:43:31,524-Speed 2497.01 samples/sec Loss 20.1764 LearningRate 0.000437 Epoch: 1 Global Step: 36280 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:43:39,725-Speed 2497.72 samples/sec Loss 20.2316 LearningRate 0.000437 Epoch: 1 Global Step: 36290 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:43:48,011-Speed 2472.39 samples/sec Loss 20.0737 LearningRate 0.000438 Epoch: 1 Global Step: 36300 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:43:56,171-Speed 2510.16 samples/sec Loss 20.1329 LearningRate 0.000438 Epoch: 1 Global Step: 36310 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:44:04,373-Speed 2497.27 samples/sec Loss 20.0857 LearningRate 0.000438 Epoch: 1 Global Step: 36320 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:44:12,575-Speed 2497.21 samples/sec Loss 20.1100 LearningRate 0.000438 Epoch: 1 Global Step: 36330 
Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:44:20,779-Speed 2496.99 samples/sec Loss 20.1731 LearningRate 0.000438 Epoch: 1 Global Step: 36340 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:44:28,982-Speed 2497.18 samples/sec Loss 20.0505 LearningRate 0.000438 Epoch: 1 Global Step: 36350 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:44:37,187-Speed 2496.37 samples/sec Loss 20.0835 LearningRate 0.000438 Epoch: 1 Global Step: 36360 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:44:45,337-Speed 2513.18 samples/sec Loss 20.0967 LearningRate 0.000438 Epoch: 1 Global Step: 36370 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:44:53,539-Speed 2497.52 samples/sec Loss 20.1067 LearningRate 0.000439 Epoch: 1 Global Step: 36380 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:45:01,742-Speed 2497.04 samples/sec Loss 20.0169 LearningRate 0.000439 Epoch: 1 Global Step: 36390 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:45:09,946-Speed 2496.63 samples/sec Loss 19.9660 LearningRate 0.000439 Epoch: 1 Global Step: 36400 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:45:18,152-Speed 2496.28 samples/sec Loss 19.9655 LearningRate 0.000439 Epoch: 1 Global Step: 36410 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:45:26,359-Speed 2495.92 samples/sec Loss 19.9281 LearningRate 0.000439 Epoch: 1 Global Step: 36420 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:45:34,522-Speed 2509.38 samples/sec Loss 20.1085 LearningRate 0.000439 Epoch: 1 Global Step: 36430 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:45:42,738-Speed 2492.98 samples/sec Loss 20.0750 LearningRate 0.000439 Epoch: 1 Global Step: 36440 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:45:50,945-Speed 2495.99 samples/sec Loss 20.0757 LearningRate 0.000439 Epoch: 1 Global Step: 36450 Fp16 Grad Scale: 
8192 Required: 181 hours Training: 2022-07-05 22:45:59,146-Speed 2497.68 samples/sec Loss 19.8426 LearningRate 0.000439 Epoch: 1 Global Step: 36460 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:46:07,351-Speed 2496.48 samples/sec Loss 19.9270 LearningRate 0.000440 Epoch: 1 Global Step: 36470 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:46:15,560-Speed 2495.01 samples/sec Loss 19.9518 LearningRate 0.000440 Epoch: 1 Global Step: 36480 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:46:23,722-Speed 2509.65 samples/sec Loss 19.8870 LearningRate 0.000440 Epoch: 1 Global Step: 36490 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:46:31,924-Speed 2497.16 samples/sec Loss 19.8966 LearningRate 0.000440 Epoch: 1 Global Step: 36500 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:46:40,130-Speed 2496.31 samples/sec Loss 20.0066 LearningRate 0.000440 Epoch: 1 Global Step: 36510 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:46:48,332-Speed 2497.20 samples/sec Loss 19.7988 LearningRate 0.000440 Epoch: 1 Global Step: 36520 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:46:56,535-Speed 2497.13 samples/sec Loss 19.8805 LearningRate 0.000440 Epoch: 1 Global Step: 36530 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:47:04,742-Speed 2496.02 samples/sec Loss 19.9458 LearningRate 0.000440 Epoch: 1 Global Step: 36540 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:47:12,897-Speed 2511.52 samples/sec Loss 19.9279 LearningRate 0.000441 Epoch: 1 Global Step: 36550 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:47:21,100-Speed 2497.23 samples/sec Loss 19.8459 LearningRate 0.000441 Epoch: 1 Global Step: 36560 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:47:29,300-Speed 2497.70 samples/sec Loss 19.8766 LearningRate 0.000441 Epoch: 1 Global Step: 36570 Fp16 Grad Scale: 8192 Required: 181 
hours Training: 2022-07-05 22:47:37,510-Speed 2495.42 samples/sec Loss 19.8005 LearningRate 0.000441 Epoch: 1 Global Step: 36580 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:47:45,724-Speed 2493.62 samples/sec Loss 19.8090 LearningRate 0.000441 Epoch: 1 Global Step: 36590 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:47:53,939-Speed 2493.09 samples/sec Loss 19.7652 LearningRate 0.000441 Epoch: 1 Global Step: 36600 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:48:02,102-Speed 2509.44 samples/sec Loss 19.7348 LearningRate 0.000441 Epoch: 1 Global Step: 36610 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:48:10,311-Speed 2495.23 samples/sec Loss 19.9248 LearningRate 0.000441 Epoch: 1 Global Step: 36620 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:48:18,514-Speed 2497.14 samples/sec Loss 19.8383 LearningRate 0.000442 Epoch: 1 Global Step: 36630 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:48:26,736-Speed 2491.20 samples/sec Loss 19.7487 LearningRate 0.000442 Epoch: 1 Global Step: 36640 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:48:34,940-Speed 2497.12 samples/sec Loss 19.8498 LearningRate 0.000442 Epoch: 1 Global Step: 36650 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:48:43,143-Speed 2496.95 samples/sec Loss 19.8387 LearningRate 0.000442 Epoch: 1 Global Step: 36660 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:48:51,293-Speed 2513.29 samples/sec Loss 19.8025 LearningRate 0.000442 Epoch: 1 Global Step: 36670 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:48:59,495-Speed 2497.34 samples/sec Loss 19.8191 LearningRate 0.000442 Epoch: 1 Global Step: 36680 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:49:07,694-Speed 2498.18 samples/sec Loss 19.7254 LearningRate 0.000442 Epoch: 1 Global Step: 36690 Fp16 Grad Scale: 8192 Required: 181 hours Training: 
2022-07-05 22:49:15,901-Speed 2495.98 samples/sec Loss 19.7719 LearningRate 0.000442 Epoch: 1 Global Step: 36700 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:49:24,103-Speed 2497.46 samples/sec Loss 19.7949 LearningRate 0.000443 Epoch: 1 Global Step: 36710 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:49:32,307-Speed 2496.57 samples/sec Loss 19.5355 LearningRate 0.000443 Epoch: 1 Global Step: 36720 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:49:40,469-Speed 2509.87 samples/sec Loss 19.6621 LearningRate 0.000443 Epoch: 1 Global Step: 36730 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:49:48,668-Speed 2498.13 samples/sec Loss 19.6591 LearningRate 0.000443 Epoch: 1 Global Step: 36740 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:49:56,871-Speed 2496.99 samples/sec Loss 19.6595 LearningRate 0.000443 Epoch: 1 Global Step: 36750 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:50:05,073-Speed 2497.48 samples/sec Loss 19.6346 LearningRate 0.000443 Epoch: 1 Global Step: 36760 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:50:13,294-Speed 2491.84 samples/sec Loss 19.6079 LearningRate 0.000443 Epoch: 1 Global Step: 36770 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:50:21,517-Speed 2491.08 samples/sec Loss 19.5869 LearningRate 0.000443 Epoch: 1 Global Step: 36780 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:50:29,668-Speed 2512.85 samples/sec Loss 19.6671 LearningRate 0.000443 Epoch: 1 Global Step: 36790 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:50:37,875-Speed 2495.72 samples/sec Loss 19.6454 LearningRate 0.000444 Epoch: 1 Global Step: 36800 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 22:50:46,082-Speed 2496.11 samples/sec Loss 19.6732 LearningRate 0.000444 Epoch: 1 Global Step: 36810 Fp16 Grad Scale: 8192 Required: 181 hours Training: 2022-07-05 
22:50:54,286-Speed 2496.60 samples/sec Loss 19.7782 LearningRate 0.000444 Epoch: 1 Global Step: 36820 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:51:02,490-Speed 2497.02 samples/sec Loss 19.7233 LearningRate 0.000444 Epoch: 1 Global Step: 36830 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:51:10,698-Speed 2495.59 samples/sec Loss 19.7837 LearningRate 0.000444 Epoch: 1 Global Step: 36840 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:51:18,850-Speed 2512.73 samples/sec Loss 19.6570 LearningRate 0.000444 Epoch: 1 Global Step: 36850 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:51:27,054-Speed 2496.61 samples/sec Loss 19.6952 LearningRate 0.000444 Epoch: 1 Global Step: 36860 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:51:35,270-Speed 2493.13 samples/sec Loss 19.6202 LearningRate 0.000444 Epoch: 1 Global Step: 36870 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:51:43,476-Speed 2496.00 samples/sec Loss 19.6544 LearningRate 0.000445 Epoch: 1 Global Step: 36880 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:51:51,682-Speed 2496.24 samples/sec Loss 19.6714 LearningRate 0.000445 Epoch: 1 Global Step: 36890 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:51:59,887-Speed 2496.08 samples/sec Loss 19.6813 LearningRate 0.000445 Epoch: 1 Global Step: 36900 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:52:08,036-Speed 2513.68 samples/sec Loss 19.5501 LearningRate 0.000445 Epoch: 1 Global Step: 36910 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:52:16,242-Speed 2496.27 samples/sec Loss 19.4777 LearningRate 0.000445 Epoch: 1 Global Step: 36920 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:52:24,443-Speed 2497.65 samples/sec Loss 19.6251 LearningRate 0.000445 Epoch: 1 Global Step: 36930 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:52:32,657-Speed 2493.39 samples/sec Loss 19.4856 LearningRate 0.000445 Epoch: 1 Global Step: 36940 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:52:40,862-Speed 2496.53 samples/sec Loss 19.4767 LearningRate 0.000445 Epoch: 1 Global Step: 36950 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:52:49,079-Speed 2492.75 samples/sec Loss 19.4448 LearningRate 0.000446 Epoch: 1 Global Step: 36960 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:52:57,232-Speed 2512.57 samples/sec Loss 19.5404 LearningRate 0.000446 Epoch: 1 Global Step: 36970 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:53:05,433-Speed 2497.43 samples/sec Loss 19.5086 LearningRate 0.000446 Epoch: 1 Global Step: 36980 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:53:13,638-Speed 2496.41 samples/sec Loss 19.4875 LearningRate 0.000446 Epoch: 1 Global Step: 36990 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:53:21,842-Speed 2497.05 samples/sec Loss 19.5141 LearningRate 0.000446 Epoch: 1 Global Step: 37000 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:53:30,041-Speed 2498.36 samples/sec Loss 19.4640 LearningRate 0.000446 Epoch: 1 Global Step: 37010 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 22:53:38,243-Speed 2497.24 samples/sec Loss 19.4782 LearningRate 0.000446 Epoch: 1 Global Step: 37020 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:53:46,393-Speed 2513.28 samples/sec Loss 19.4909 LearningRate 0.000446 Epoch: 1 Global Step: 37030 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:53:54,598-Speed 2496.67 samples/sec Loss 19.3156 LearningRate 0.000446 Epoch: 1 Global Step: 37040 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:54:02,804-Speed 2495.87 samples/sec Loss 19.5085 LearningRate 0.000447 Epoch: 1 Global Step: 37050 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:54:11,010-Speed 2496.05 samples/sec Loss 19.5104 LearningRate 0.000447 Epoch: 1 Global Step: 37060 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:54:19,214-Speed 2496.83 samples/sec Loss 19.3425 LearningRate 0.000447 Epoch: 1 Global Step: 37070 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:54:27,419-Speed 2496.53 samples/sec Loss 19.6232 LearningRate 0.000447 Epoch: 1 Global Step: 37080 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:54:35,567-Speed 2513.85 samples/sec Loss 19.4861 LearningRate 0.000447 Epoch: 1 Global Step: 37090 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:54:43,782-Speed 2493.22 samples/sec Loss 19.7073 LearningRate 0.000447 Epoch: 1 Global Step: 37100 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:54:51,988-Speed 2496.33 samples/sec Loss 19.5081 LearningRate 0.000447 Epoch: 1 Global Step: 37110 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:55:00,197-Speed 2495.40 samples/sec Loss 19.4748 LearningRate 0.000447 Epoch: 1 Global Step: 37120 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:55:08,405-Speed 2495.49 samples/sec Loss 19.4815 LearningRate 0.000448 Epoch: 1 Global Step: 37130 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:55:16,609-Speed 2496.59 samples/sec Loss 19.3745 LearningRate 0.000448 Epoch: 1 Global Step: 37140 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:55:24,763-Speed 2512.10 samples/sec Loss 19.4819 LearningRate 0.000448 Epoch: 1 Global Step: 37150 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:55:32,972-Speed 2495.43 samples/sec Loss 19.4252 LearningRate 0.000448 Epoch: 1 Global Step: 37160 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:55:41,188-Speed 2492.76 samples/sec Loss 19.5132 LearningRate 0.000448 Epoch: 1 Global Step: 37170 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:55:49,396-Speed 2495.75 samples/sec Loss 19.3991 LearningRate 0.000448 Epoch: 1 Global Step: 37180 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:55:57,629-Speed 2488.17 samples/sec Loss 19.4450 LearningRate 0.000448 Epoch: 1 Global Step: 37190 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:56:05,829-Speed 2497.80 samples/sec Loss 19.4001 LearningRate 0.000448 Epoch: 1 Global Step: 37200 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:56:13,977-Speed 2513.76 samples/sec Loss 19.2741 LearningRate 0.000449 Epoch: 1 Global Step: 37210 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:56:22,184-Speed 2496.19 samples/sec Loss 19.3474 LearningRate 0.000449 Epoch: 1 Global Step: 37220 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:56:30,384-Speed 2498.09 samples/sec Loss 19.4670 LearningRate 0.000449 Epoch: 1 Global Step: 37230 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:56:38,591-Speed 2495.79 samples/sec Loss 19.3170 LearningRate 0.000449 Epoch: 1 Global Step: 37240 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:56:46,795-Speed 2496.86 samples/sec Loss 19.2699 LearningRate 0.000449 Epoch: 1 Global Step: 37250 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:56:54,997-Speed 2497.29 samples/sec Loss 19.3171 LearningRate 0.000449 Epoch: 1 Global Step: 37260 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:57:03,147-Speed 2513.23 samples/sec Loss 19.3707 LearningRate 0.000449 Epoch: 1 Global Step: 37270 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:57:11,346-Speed 2498.06 samples/sec Loss 19.3544 LearningRate 0.000449 Epoch: 1 Global Step: 37280 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:57:19,546-Speed 2498.17 samples/sec Loss 19.3484 LearningRate 0.000450 Epoch: 1 Global Step: 37290 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:57:27,762-Speed 2493.09 samples/sec Loss 19.2602 LearningRate 0.000450 Epoch: 1 Global Step: 37300 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:57:35,963-Speed 2497.71 samples/sec Loss 19.2412 LearningRate 0.000450 Epoch: 1 Global Step: 37310 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:57:44,167-Speed 2496.79 samples/sec Loss 19.1366 LearningRate 0.000450 Epoch: 1 Global Step: 37320 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:57:52,312-Speed 2514.74 samples/sec Loss 19.2330 LearningRate 0.000450 Epoch: 1 Global Step: 37330 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:58:00,524-Speed 2494.45 samples/sec Loss 19.1403 LearningRate 0.000450 Epoch: 1 Global Step: 37340 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:58:08,727-Speed 2496.68 samples/sec Loss 19.2639 LearningRate 0.000450 Epoch: 1 Global Step: 37350 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:58:16,933-Speed 2496.47 samples/sec Loss 19.2003 LearningRate 0.000450 Epoch: 1 Global Step: 37360 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:58:25,137-Speed 2496.57 samples/sec Loss 19.1062 LearningRate 0.000450 Epoch: 1 Global Step: 37370 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:58:33,340-Speed 2497.29 samples/sec Loss 19.1995 LearningRate 0.000451 Epoch: 1 Global Step: 37380 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:58:41,493-Speed 2512.23 samples/sec Loss 19.1838 LearningRate 0.000451 Epoch: 1 Global Step: 37390 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:58:49,701-Speed 2495.58 samples/sec Loss 19.1893 LearningRate 0.000451 Epoch: 1 Global Step: 37400 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:58:57,900-Speed 2498.21 samples/sec Loss 19.0032 LearningRate 0.000451 Epoch: 1 Global Step: 37410 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:59:06,101-Speed 2497.91 samples/sec Loss 19.2237 LearningRate 0.000451 Epoch: 1 Global Step: 37420 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:59:14,305-Speed 2496.61 samples/sec Loss 19.0541 LearningRate 0.000451 Epoch: 1 Global Step: 37430 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:59:22,508-Speed 2497.29 samples/sec Loss 19.1069 LearningRate 0.000451 Epoch: 1 Global Step: 37440 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:59:30,658-Speed 2513.04 samples/sec Loss 19.1844 LearningRate 0.000451 Epoch: 1 Global Step: 37450 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:59:38,868-Speed 2495.04 samples/sec Loss 18.9742 LearningRate 0.000452 Epoch: 1 Global Step: 37460 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:59:47,073-Speed 2496.66 samples/sec Loss 19.0392 LearningRate 0.000452 Epoch: 1 Global Step: 37470 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 22:59:55,277-Speed 2496.59 samples/sec Loss 19.1411 LearningRate 0.000452 Epoch: 1 Global Step: 37480 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:00:03,479-Speed 2497.50 samples/sec Loss 18.9915 LearningRate 0.000452 Epoch: 1 Global Step: 37490 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:00:11,684-Speed 2496.41 samples/sec Loss 19.0565 LearningRate 0.000452 Epoch: 1 Global Step: 37500 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:00:19,832-Speed 2513.89 samples/sec Loss 18.9871 LearningRate 0.000452 Epoch: 1 Global Step: 37510 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:00:27,990-Speed 2510.92 samples/sec Loss 19.0385 LearningRate 0.000452 Epoch: 1 Global Step: 37520 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:00:36,194-Speed 2496.78 samples/sec Loss 19.0695 LearningRate 0.000452 Epoch: 1 Global Step: 37530 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:00:44,395-Speed 2497.43 samples/sec Loss 18.9623 LearningRate 0.000453 Epoch: 1 Global Step: 37540 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:00:52,594-Speed 2498.25 samples/sec Loss 18.9819 LearningRate 0.000453 Epoch: 1 Global Step: 37550 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:01:00,795-Speed 2497.72 samples/sec Loss 18.9508 LearningRate 0.000453 Epoch: 1 Global Step: 37560 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:01:08,943-Speed 2514.01 samples/sec Loss 19.0847 LearningRate 0.000453 Epoch: 1 Global Step: 37570 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:01:17,145-Speed 2497.13 samples/sec Loss 19.0780 LearningRate 0.000453 Epoch: 1 Global Step: 37580 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:01:25,347-Speed 2497.39 samples/sec Loss 19.0841 LearningRate 0.000453 Epoch: 1 Global Step: 37590 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:01:33,551-Speed 2496.66 samples/sec Loss 18.9154 LearningRate 0.000453 Epoch: 1 Global Step: 37600 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:01:41,751-Speed 2497.87 samples/sec Loss 18.9601 LearningRate 0.000453 Epoch: 1 Global Step: 37610 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:01:49,951-Speed 2498.11 samples/sec Loss 18.8946 LearningRate 0.000453 Epoch: 1 Global Step: 37620 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:01:58,102-Speed 2512.82 samples/sec Loss 19.0217 LearningRate 0.000454 Epoch: 1 Global Step: 37630 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:02:06,304-Speed 2497.48 samples/sec Loss 19.0330 LearningRate 0.000454 Epoch: 1 Global Step: 37640 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:02:14,503-Speed 2498.36 samples/sec Loss 18.9257 LearningRate 0.000454 Epoch: 1 Global Step: 37650 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:02:22,708-Speed 2496.45 samples/sec Loss 18.8314 LearningRate 0.000454 Epoch: 1 Global Step: 37660 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:02:30,921-Speed 2494.18 samples/sec Loss 18.8867 LearningRate 0.000454 Epoch: 1 Global Step: 37670 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:02:39,122-Speed 2497.62 samples/sec Loss 18.8068 LearningRate 0.000454 Epoch: 1 Global Step: 37680 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:02:47,281-Speed 2510.19 samples/sec Loss 18.8590 LearningRate 0.000454 Epoch: 1 Global Step: 37690 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:02:55,480-Speed 2498.23 samples/sec Loss 18.8673 LearningRate 0.000454 Epoch: 1 Global Step: 37700 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:03:03,683-Speed 2497.08 samples/sec Loss 18.9801 LearningRate 0.000455 Epoch: 1 Global Step: 37710 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:03:11,897-Speed 2493.95 samples/sec Loss 18.8064 LearningRate 0.000455 Epoch: 1 Global Step: 37720 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:03:20,094-Speed 2498.62 samples/sec Loss 18.8013 LearningRate 0.000455 Epoch: 1 Global Step: 37730 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:03:28,296-Speed 2497.41 samples/sec Loss 18.8457 LearningRate 0.000455 Epoch: 1 Global Step: 37740 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:03:36,446-Speed 2513.00 samples/sec Loss 18.8070 LearningRate 0.000455 Epoch: 1 Global Step: 37750 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:03:44,647-Speed 2497.70 samples/sec Loss 18.8751 LearningRate 0.000455 Epoch: 1 Global Step: 37760 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:03:52,850-Speed 2497.19 samples/sec Loss 18.8334 LearningRate 0.000455 Epoch: 1 Global Step: 37770 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:04:01,053-Speed 2496.94 samples/sec Loss 18.8386 LearningRate 0.000455 Epoch: 1 Global Step: 37780 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:04:09,265-Speed 2494.31 samples/sec Loss 18.9181 LearningRate 0.000456 Epoch: 1 Global Step: 37790 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:04:17,467-Speed 2497.58 samples/sec Loss 18.7710 LearningRate 0.000456 Epoch: 1 Global Step: 37800 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:04:25,613-Speed 2514.40 samples/sec Loss 18.8298 LearningRate 0.000456 Epoch: 1 Global Step: 37810 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:04:33,824-Speed 2494.53 samples/sec Loss 18.8124 LearningRate 0.000456 Epoch: 1 Global Step: 37820 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:04:42,027-Speed 2497.23 samples/sec Loss 18.8133 LearningRate 0.000456 Epoch: 1 Global Step: 37830 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:04:50,224-Speed 2498.81 samples/sec Loss 18.7424 LearningRate 0.000456 Epoch: 1 Global Step: 37840 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:04:58,424-Speed 2498.10 samples/sec Loss 18.7717 LearningRate 0.000456 Epoch: 1 Global Step: 37850 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:05:06,625-Speed 2497.49 samples/sec Loss 18.7968 LearningRate 0.000456 Epoch: 1 Global Step: 37860 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:05:14,775-Speed 2513.44 samples/sec Loss 18.7129 LearningRate 0.000456 Epoch: 1 Global Step: 37870 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:05:22,973-Speed 2498.62 samples/sec Loss 18.5692 LearningRate 0.000457 Epoch: 1 Global Step: 37880 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:05:31,184-Speed 2494.57 samples/sec Loss 18.6506 LearningRate 0.000457 Epoch: 1 Global Step: 37890 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:05:39,388-Speed 2496.64 samples/sec Loss 18.6321 LearningRate 0.000457 Epoch: 1 Global Step: 37900 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:05:47,593-Speed 2496.48 samples/sec Loss 18.5822 LearningRate 0.000457 Epoch: 1 Global Step: 37910 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:05:55,801-Speed 2495.81 samples/sec Loss 18.5848 LearningRate 0.000457 Epoch: 1 Global Step: 37920 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:06:03,952-Speed 2512.86 samples/sec Loss 18.5432 LearningRate 0.000457 Epoch: 1 Global Step: 37930 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:06:12,156-Speed 2496.62 samples/sec Loss 18.5618 LearningRate 0.000457 Epoch: 1 Global Step: 37940 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:06:20,361-Speed 2496.46 samples/sec Loss 18.7018 LearningRate 0.000457 Epoch: 1 Global Step: 37950 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:06:28,569-Speed 2495.45 samples/sec Loss 18.6030 LearningRate 0.000458 Epoch: 1 Global Step: 37960 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:06:36,774-Speed 2496.55 samples/sec Loss 18.6802 LearningRate 0.000458 Epoch: 1 Global Step: 37970 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:06:44,979-Speed 2496.33 samples/sec Loss 18.6249 LearningRate 0.000458 Epoch: 1 Global Step: 37980 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:06:53,131-Speed 2512.89 samples/sec Loss 18.5749 LearningRate 0.000458 Epoch: 1 Global Step: 37990 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:07:01,335-Speed 2496.49 samples/sec Loss 18.5540 LearningRate 0.000458 Epoch: 1 Global Step: 38000 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:07:09,549-Speed 2496.35 samples/sec Loss 18.6675 LearningRate 0.000458 Epoch: 1 Global Step: 38010 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:07:17,756-Speed 2496.01 samples/sec Loss 18.5243 LearningRate 0.000458 Epoch: 1 Global Step: 38020 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:07:25,962-Speed 2496.23 samples/sec Loss 18.6890 LearningRate 0.000458 Epoch: 1 Global Step: 38030 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:07:34,178-Speed 2493.01 samples/sec Loss 18.7915 LearningRate 0.000459 Epoch: 1 Global Step: 38040 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:07:42,332-Speed 2512.05 samples/sec Loss 18.6469 LearningRate 0.000459 Epoch: 1 Global Step: 38050 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:07:50,543-Speed 2494.71 samples/sec Loss 18.6956 LearningRate 0.000459 Epoch: 1 Global Step: 38060 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:07:58,755-Speed 2494.42 samples/sec Loss 18.7437 LearningRate 0.000459 Epoch: 1 Global Step: 38070 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:08:06,960-Speed 2496.33 samples/sec Loss 18.6603 LearningRate 0.000459 Epoch: 1 Global Step: 38080 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:08:15,167-Speed 2496.21 samples/sec Loss 18.8180 LearningRate 0.000459 Epoch: 1 Global Step: 38090 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:08:23,368-Speed 2497.64 samples/sec Loss 18.8988 LearningRate 0.000459 Epoch: 1 Global Step: 38100 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:08:31,523-Speed 2511.50 samples/sec Loss 18.5361 LearningRate 0.000459 Epoch: 1 Global Step: 38110 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:08:39,729-Speed 2496.34 samples/sec Loss 18.6677 LearningRate 0.000460 Epoch: 1 Global Step: 38120 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:08:47,935-Speed 2496.34 samples/sec Loss 18.7014 LearningRate 0.000460 Epoch: 1 Global Step: 38130 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:08:56,140-Speed 2496.29 samples/sec Loss 18.6875 LearningRate 0.000460 Epoch: 1 Global Step: 38140 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:09:04,342-Speed 2497.20 samples/sec Loss 18.5841 LearningRate 0.000460 Epoch: 1 Global Step: 38150 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:09:12,546-Speed 2496.90 samples/sec Loss 18.6313 LearningRate 0.000460 Epoch: 1 Global Step: 38160 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:09:20,694-Speed 2513.81 samples/sec Loss 18.5490 LearningRate 0.000460 Epoch: 1 Global Step: 38170 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:09:28,900-Speed 2496.22 samples/sec Loss 18.5328 LearningRate 0.000460 Epoch: 1 Global Step: 38180 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:09:37,109-Speed 2495.08 samples/sec Loss 18.5503 LearningRate 0.000460 Epoch: 1 Global Step: 38190 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:09:45,315-Speed 2496.11 samples/sec Loss 18.5217 LearningRate 0.000460 Epoch: 1 Global Step: 38200 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:09:53,519-Speed 2497.02 samples/sec Loss 18.3902 LearningRate 0.000461 Epoch: 1 Global Step: 38210 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:10:01,733-Speed 2494.19 samples/sec Loss 18.4681 LearningRate 0.000461 Epoch: 1 Global Step: 38220 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:10:09,897-Speed 2508.99 samples/sec Loss 18.3525 LearningRate 0.000461 Epoch: 1 Global Step: 38230 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:10:18,099-Speed 2497.36 samples/sec Loss 18.6408 LearningRate 0.000461 Epoch: 1 Global Step: 38240 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:10:26,299-Speed 2498.06 samples/sec Loss 18.4599 LearningRate 0.000461 Epoch: 1 Global Step: 38250 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:10:34,498-Speed 2498.08 samples/sec Loss 18.2477 LearningRate 0.000461 Epoch: 1 Global Step: 38260 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:10:42,698-Speed 2498.08 samples/sec Loss 18.4162 LearningRate 0.000461 Epoch: 1 Global Step: 38270 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:10:50,904-Speed 2495.83 samples/sec Loss 18.3454 LearningRate 0.000461 Epoch: 1 Global Step: 38280 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:10:59,055-Speed 2513.19 samples/sec Loss 18.4757 LearningRate 0.000462 Epoch: 1 Global Step: 38290 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:11:07,260-Speed 2496.23 samples/sec Loss 18.4773 LearningRate 0.000462 Epoch: 1 Global Step: 38300 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:11:15,462-Speed 2497.54 samples/sec Loss 18.3687 LearningRate 0.000462 Epoch: 1 Global Step: 38310 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:11:23,660-Speed 2498.36 samples/sec Loss 18.3815 LearningRate 0.000462 Epoch: 1 Global Step: 38320 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:11:31,873-Speed 2494.18 samples/sec Loss 18.3211 LearningRate 0.000462 Epoch: 1 Global Step: 38330 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:11:40,074-Speed 2497.66 samples/sec Loss 18.3179 LearningRate 0.000462 Epoch: 1 Global Step: 38340 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:11:48,221-Speed 2513.85 samples/sec Loss 18.4147 LearningRate 0.000462 Epoch: 1 Global Step: 38350 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:11:56,422-Speed 2497.79 samples/sec Loss 18.2928 LearningRate 0.000462 Epoch: 1 Global Step: 38360 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:12:04,625-Speed 2497.12 samples/sec Loss 18.4559 LearningRate 0.000463 Epoch: 1 Global Step: 38370 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:12:12,830-Speed 2496.39 samples/sec Loss 18.4300 LearningRate 0.000463 Epoch: 1 Global Step: 38380 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:12:21,029-Speed 2498.26 samples/sec Loss 18.5686 LearningRate 0.000463 Epoch: 1 Global Step: 38390 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:12:29,232-Speed 2497.12 samples/sec Loss 18.4309 LearningRate 0.000463 Epoch: 1 Global Step: 38400 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:12:37,382-Speed 2513.16 samples/sec Loss 18.4891 LearningRate 0.000463 Epoch: 1 Global Step: 38410 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:12:45,585-Speed 2497.27 samples/sec Loss 18.3201 LearningRate 0.000463 Epoch: 1 Global Step: 38420 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:12:53,789-Speed 2496.55 samples/sec Loss 18.3745 LearningRate 0.000463 Epoch: 1 Global Step: 38430 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:13:01,992-Speed 2497.11 samples/sec Loss 18.3293 LearningRate 0.000463 Epoch: 1 Global Step: 38440 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:13:10,195-Speed 2496.94 samples/sec Loss 18.2856 LearningRate 0.000463 Epoch: 1 Global Step: 38450 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:13:18,397-Speed 2497.26 samples/sec Loss 18.3667 LearningRate 0.000464 Epoch: 1 Global Step: 38460 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:13:26,545-Speed 2513.98 samples/sec Loss 18.2639 LearningRate 0.000464 Epoch: 1 Global Step: 38470 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:13:34,750-Speed 2496.44 samples/sec Loss 18.3304 LearningRate 0.000464 Epoch: 1 Global Step: 38480 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:13:42,950-Speed 2498.08 samples/sec Loss 18.2442 LearningRate 0.000464 Epoch: 1 Global Step: 38490 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:13:51,163-Speed 2493.83 samples/sec Loss 18.3052 LearningRate 0.000464 Epoch: 1 Global Step: 38500 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:13:59,384-Speed 2491.44 samples/sec Loss 18.2715 LearningRate 0.000464 Epoch: 1 Global Step: 38510 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:14:07,582-Speed 2498.86 samples/sec Loss 18.2140 LearningRate 0.000464 Epoch: 1 Global Step: 38520 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:14:15,730-Speed 2513.79 samples/sec Loss 18.2807 LearningRate 0.000464 Epoch: 1 Global Step: 38530 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:14:23,935-Speed 2496.47 samples/sec Loss 18.3273 LearningRate 0.000465 Epoch: 1 Global Step: 38540 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:14:32,132-Speed 2498.91 samples/sec Loss 18.1432 LearningRate 0.000465 Epoch: 1 Global Step: 38550 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:14:40,335-Speed 2497.02 samples/sec Loss 18.1786 LearningRate 0.000465 Epoch: 1 Global Step: 38560 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:14:48,534-Speed 2498.50 samples/sec Loss 18.2025 LearningRate 0.000465 Epoch: 1 Global Step: 38570 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:14:56,737-Speed 2496.95 samples/sec Loss 18.1630 LearningRate 0.000465 Epoch: 1 Global Step: 38580 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:15:04,885-Speed 2514.16 samples/sec Loss 18.1895 LearningRate 0.000465 Epoch: 1 Global Step: 38590 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:15:13,086-Speed 2497.78 samples/sec Loss 18.0918 LearningRate 0.000465 Epoch: 1 Global Step: 38600 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:15:21,286-Speed 2497.84 samples/sec Loss 18.1566 LearningRate 0.000465 Epoch: 1 Global Step: 38610 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:15:29,493-Speed 2495.86 samples/sec Loss 18.2243 LearningRate 0.000466 Epoch: 1 Global Step: 38620 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:15:37,700-Speed 2495.91 samples/sec Loss 18.1051 LearningRate 0.000466 Epoch: 1 Global Step: 38630 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:15:45,903-Speed 2496.86 samples/sec Loss 18.1183 LearningRate 0.000466 Epoch: 1 Global Step: 38640 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:15:54,051-Speed 2513.94 samples/sec Loss 18.0692 LearningRate 0.000466 Epoch: 1 Global Step: 38650 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:16:02,253-Speed 2497.51 samples/sec Loss 18.1715 LearningRate 0.000466 Epoch: 1 Global Step: 38660 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:16:10,454-Speed 2497.42 samples/sec Loss 18.1398 LearningRate 0.000466 Epoch: 1 Global Step: 38670 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:16:18,658-Speed 2496.91 samples/sec Loss 18.1773 LearningRate 0.000466 Epoch: 1 Global Step: 38680 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:16:26,873-Speed 2493.34 samples/sec Loss 18.1792 LearningRate 0.000466 Epoch: 1 Global Step: 38690 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:16:35,077-Speed 2496.76 samples/sec Loss 18.1041 LearningRate 0.000467 Epoch: 1 Global Step: 38700 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:16:43,226-Speed 2513.42 samples/sec Loss 18.1454 LearningRate 0.000467 Epoch: 1 Global Step: 38710 Fp16 Grad Scale: 8192 Required: 181 hours
Training: 2022-07-05 23:16:51,425-Speed 2498.47 samples/sec Loss 18.0746 LearningRate 0.000467 Epoch: 1 Global Step: 38720 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:16:59,625-Speed 2497.87 samples/sec Loss 18.1491 LearningRate 0.000467 Epoch: 1 Global Step: 38730 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:17:07,830-Speed 2496.32 samples/sec Loss 18.1694 LearningRate 0.000467 Epoch: 1 Global Step: 38740 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:17:16,033-Speed 2497.07 samples/sec Loss 18.0977 LearningRate 0.000467 Epoch: 1 Global Step: 38750 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:17:24,235-Speed 2497.45 samples/sec Loss 18.0895 LearningRate 0.000467 Epoch: 1 Global Step: 38760 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:17:32,389-Speed 2512.01 samples/sec Loss 17.9625 LearningRate 0.000467 Epoch: 1 Global Step: 38770 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:17:40,589-Speed 2498.10 samples/sec Loss 18.1358 LearningRate 0.000467 Epoch: 1 Global Step: 38780 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:17:48,788-Speed 2498.10 samples/sec Loss 18.0451 LearningRate 0.000468 Epoch: 1 Global Step: 38790 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:17:56,990-Speed 2497.33 samples/sec Loss 18.1855 LearningRate 0.000468 Epoch: 1 Global Step: 38800 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:18:05,191-Speed 2497.87 samples/sec Loss 18.0040 LearningRate 0.000468 Epoch: 1 Global Step: 38810 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:18:13,393-Speed 2497.24 samples/sec Loss 17.9984 LearningRate 0.000468 Epoch: 1 Global Step: 38820 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:18:21,542-Speed 2513.59 samples/sec Loss 18.0035 LearningRate 0.000468 Epoch: 1 Global Step: 38830 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:18:29,743-Speed 2497.60 samples/sec Loss 18.0098 LearningRate 0.000468 Epoch: 1 Global Step: 38840 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:18:37,948-Speed 2496.65 samples/sec Loss 17.9901 LearningRate 0.000468 Epoch: 1 Global Step: 38850 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:18:46,148-Speed 2497.84 samples/sec Loss 17.9024 LearningRate 0.000468 Epoch: 1 Global Step: 38860 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:18:54,350-Speed 2497.62 samples/sec Loss 17.9159 LearningRate 0.000469 Epoch: 1 Global Step: 38870 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:19:02,554-Speed 2496.95 samples/sec Loss 17.9317 LearningRate 0.000469 Epoch: 1 Global Step: 38880 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:19:10,722-Speed 2507.75 samples/sec Loss 17.9826 LearningRate 0.000469 Epoch: 1 Global Step: 38890 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:19:18,924-Speed 2497.17 samples/sec Loss 17.8609 LearningRate 0.000469 Epoch: 1 Global Step: 38900 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:19:27,130-Speed 2496.48 samples/sec Loss 17.9369 LearningRate 0.000469 Epoch: 1 Global Step: 38910 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:19:35,331-Speed 2497.69 samples/sec Loss 17.8964 LearningRate 0.000469 Epoch: 1 Global Step: 38920 Fp16 Grad Scale: 16384 Required: 181 hours
Training: 2022-07-05 23:19:43,533-Speed 2497.11 samples/sec Loss 17.9856 LearningRate 0.000469 Epoch: 1 Global Step: 38930 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:19:51,733-Speed 2498.32 samples/sec Loss 17.9104 LearningRate 0.000469 Epoch: 1 Global Step: 38940 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:19:59,879-Speed 2514.51 samples/sec Loss 17.8494 LearningRate 0.000470 Epoch: 1 Global Step: 38950 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:20:08,079-Speed 2497.97 samples/sec Loss 17.7650 LearningRate 0.000470 Epoch: 1 Global Step: 38960 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:20:16,280-Speed 2497.70 samples/sec Loss 17.7453 LearningRate 0.000470 Epoch: 1 Global Step: 38970 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:20:24,479-Speed 2498.24 samples/sec Loss 17.8113 LearningRate 0.000470 Epoch: 1 Global Step: 38980 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:20:32,682-Speed 2496.96 samples/sec Loss 17.7279 LearningRate 0.000470 Epoch: 1 Global Step: 38990 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:20:40,887-Speed 2496.80 samples/sec Loss 17.7155 LearningRate 0.000470 Epoch: 1 Global Step: 39000 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:20:49,036-Speed 2513.41 samples/sec Loss 17.9178 LearningRate 0.000470 Epoch: 1 Global Step: 39010 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:20:57,244-Speed 2495.52 samples/sec Loss 17.7851 LearningRate 0.000470 Epoch: 1 Global Step: 39020 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:21:05,449-Speed 2496.53 samples/sec Loss 17.8317 LearningRate 0.000470 Epoch: 1 Global Step: 39030 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:21:13,653-Speed 2497.02 samples/sec Loss 17.7821 LearningRate 0.000471 Epoch: 1 Global Step: 39040 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:21:21,855-Speed 2497.49 samples/sec Loss 17.7903 LearningRate 0.000471 Epoch: 1 Global Step: 39050 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:21:30,055-Speed 2497.91 samples/sec Loss 17.8741 LearningRate 0.000471 Epoch: 1 Global Step: 39060 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:21:38,204-Speed 2513.71 samples/sec Loss 18.0634 LearningRate 0.000471 Epoch: 1 Global Step: 39070 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:21:46,407-Speed 2496.81 samples/sec Loss 17.8359 LearningRate 0.000471 Epoch: 1 Global Step: 39080 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:21:54,626-Speed 2492.20 samples/sec Loss 17.8727 LearningRate 0.000471 Epoch: 1 Global Step: 39090 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:22:02,831-Speed 2496.44 samples/sec Loss 17.9667 LearningRate 0.000471 Epoch: 1 Global Step: 39100 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:22:11,030-Speed 2498.40 samples/sec Loss 17.8572 LearningRate 0.000471 Epoch: 1 Global Step: 39110 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:22:19,234-Speed 2496.42 samples/sec Loss 17.8545 LearningRate 0.000472 Epoch: 1 Global Step: 39120 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:22:27,383-Speed 2513.52 samples/sec Loss 17.8068 LearningRate 0.000472 Epoch: 1 Global Step: 39130 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:22:35,587-Speed 2497.20 samples/sec Loss 17.7527 LearningRate 0.000472 Epoch: 1 Global Step: 39140 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:22:43,790-Speed 2496.99 samples/sec Loss 17.7583 LearningRate 0.000472 Epoch: 1 Global Step: 39150 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:22:51,991-Speed 2497.74 samples/sec Loss 17.6603 LearningRate 0.000472 Epoch: 1 Global Step: 39160 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:23:00,196-Speed 2496.34 samples/sec Loss 17.6891 LearningRate 0.000472 Epoch: 1 Global Step: 39170 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:23:08,408-Speed 2494.32 samples/sec Loss 17.6631 LearningRate 0.000472 Epoch: 1 Global Step: 39180 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:23:16,557-Speed 2513.74 samples/sec Loss 17.7562 LearningRate 0.000472 Epoch: 1 Global Step: 39190 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:23:24,758-Speed 2497.69 samples/sec Loss 17.6037 LearningRate 0.000473 Epoch: 1 Global Step: 39200 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:23:32,961-Speed 2496.91 samples/sec Loss 17.6643 LearningRate 0.000473 Epoch: 1 Global Step: 39210 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:23:41,162-Speed 2497.67 samples/sec Loss 17.7618 LearningRate 0.000473 Epoch: 1 Global Step: 39220 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 
23:23:49,363-Speed 2497.72 samples/sec Loss 17.7722 LearningRate 0.000473 Epoch: 1 Global Step: 39230 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:23:57,566-Speed 2496.79 samples/sec Loss 17.7257 LearningRate 0.000473 Epoch: 1 Global Step: 39240 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:24:05,719-Speed 2512.87 samples/sec Loss 17.5682 LearningRate 0.000473 Epoch: 1 Global Step: 39250 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:24:13,922-Speed 2496.94 samples/sec Loss 17.6324 LearningRate 0.000473 Epoch: 1 Global Step: 39260 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:24:22,134-Speed 2494.11 samples/sec Loss 17.7521 LearningRate 0.000473 Epoch: 1 Global Step: 39270 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:24:30,340-Speed 2496.38 samples/sec Loss 17.6068 LearningRate 0.000473 Epoch: 1 Global Step: 39280 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:24:38,541-Speed 2497.58 samples/sec Loss 17.5947 LearningRate 0.000474 Epoch: 1 Global Step: 39290 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:24:46,745-Speed 2496.89 samples/sec Loss 17.5809 LearningRate 0.000474 Epoch: 1 Global Step: 39300 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:24:54,898-Speed 2512.51 samples/sec Loss 17.7309 LearningRate 0.000474 Epoch: 1 Global Step: 39310 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:25:03,103-Speed 2496.50 samples/sec Loss 17.7030 LearningRate 0.000474 Epoch: 1 Global Step: 39320 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:25:11,318-Speed 2493.24 samples/sec Loss 17.5554 LearningRate 0.000474 Epoch: 1 Global Step: 39330 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:25:19,519-Speed 2497.69 samples/sec Loss 17.6939 LearningRate 0.000474 Epoch: 1 Global Step: 39340 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 
23:25:27,721-Speed 2497.48 samples/sec Loss 17.5059 LearningRate 0.000474 Epoch: 1 Global Step: 39350 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:25:35,941-Speed 2491.72 samples/sec Loss 17.5560 LearningRate 0.000474 Epoch: 1 Global Step: 39360 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:25:44,086-Speed 2514.82 samples/sec Loss 17.6469 LearningRate 0.000475 Epoch: 1 Global Step: 39370 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:25:52,284-Speed 2498.63 samples/sec Loss 17.6148 LearningRate 0.000475 Epoch: 1 Global Step: 39380 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:26:00,484-Speed 2497.75 samples/sec Loss 17.5234 LearningRate 0.000475 Epoch: 1 Global Step: 39390 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:26:08,689-Speed 2496.54 samples/sec Loss 17.5082 LearningRate 0.000475 Epoch: 1 Global Step: 39400 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:26:16,900-Speed 2494.61 samples/sec Loss 17.4943 LearningRate 0.000475 Epoch: 1 Global Step: 39410 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:26:25,110-Speed 2495.36 samples/sec Loss 17.5149 LearningRate 0.000475 Epoch: 1 Global Step: 39420 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:26:33,258-Speed 2513.76 samples/sec Loss 17.5372 LearningRate 0.000475 Epoch: 1 Global Step: 39430 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:26:41,461-Speed 2497.15 samples/sec Loss 17.4634 LearningRate 0.000475 Epoch: 1 Global Step: 39440 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:26:49,662-Speed 2497.67 samples/sec Loss 17.5836 LearningRate 0.000476 Epoch: 1 Global Step: 39450 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:26:57,874-Speed 2494.39 samples/sec Loss 17.5524 LearningRate 0.000476 Epoch: 1 Global Step: 39460 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 
23:27:06,077-Speed 2497.09 samples/sec Loss 17.6175 LearningRate 0.000476 Epoch: 1 Global Step: 39470 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:27:14,279-Speed 2497.23 samples/sec Loss 17.4491 LearningRate 0.000476 Epoch: 1 Global Step: 39480 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:27:22,429-Speed 2513.28 samples/sec Loss 17.4810 LearningRate 0.000476 Epoch: 1 Global Step: 39490 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:27:30,636-Speed 2495.93 samples/sec Loss 17.5084 LearningRate 0.000476 Epoch: 1 Global Step: 39500 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:27:38,840-Speed 2496.50 samples/sec Loss 17.4090 LearningRate 0.000476 Epoch: 1 Global Step: 39510 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:27:47,044-Speed 2496.97 samples/sec Loss 17.4472 LearningRate 0.000476 Epoch: 1 Global Step: 39520 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:27:55,244-Speed 2497.85 samples/sec Loss 17.4562 LearningRate 0.000477 Epoch: 1 Global Step: 39530 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:28:03,442-Speed 2498.32 samples/sec Loss 17.3835 LearningRate 0.000477 Epoch: 1 Global Step: 39540 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:28:11,591-Speed 2513.77 samples/sec Loss 17.4120 LearningRate 0.000477 Epoch: 1 Global Step: 39550 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:28:19,791-Speed 2497.63 samples/sec Loss 17.4720 LearningRate 0.000477 Epoch: 1 Global Step: 39560 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:28:27,998-Speed 2495.91 samples/sec Loss 17.3923 LearningRate 0.000477 Epoch: 1 Global Step: 39570 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:28:36,196-Speed 2498.69 samples/sec Loss 17.4456 LearningRate 0.000477 Epoch: 1 Global Step: 39580 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 
23:28:44,401-Speed 2496.40 samples/sec Loss 17.4400 LearningRate 0.000477 Epoch: 1 Global Step: 39590 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:28:52,604-Speed 2496.91 samples/sec Loss 17.5069 LearningRate 0.000477 Epoch: 1 Global Step: 39600 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:29:00,751-Speed 2514.38 samples/sec Loss 17.3394 LearningRate 0.000477 Epoch: 1 Global Step: 39610 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:29:08,954-Speed 2497.20 samples/sec Loss 17.3364 LearningRate 0.000478 Epoch: 1 Global Step: 39620 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:29:17,156-Speed 2497.40 samples/sec Loss 17.3431 LearningRate 0.000478 Epoch: 1 Global Step: 39630 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:29:25,354-Speed 2498.54 samples/sec Loss 17.3545 LearningRate 0.000478 Epoch: 1 Global Step: 39640 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:29:33,559-Speed 2496.80 samples/sec Loss 17.3524 LearningRate 0.000478 Epoch: 1 Global Step: 39650 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:29:41,761-Speed 2497.34 samples/sec Loss 17.3346 LearningRate 0.000478 Epoch: 1 Global Step: 39660 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:29:49,911-Speed 2513.35 samples/sec Loss 17.3353 LearningRate 0.000478 Epoch: 1 Global Step: 39670 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:29:58,118-Speed 2496.37 samples/sec Loss 17.3304 LearningRate 0.000478 Epoch: 1 Global Step: 39680 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:30:06,321-Speed 2497.16 samples/sec Loss 17.3059 LearningRate 0.000478 Epoch: 1 Global Step: 39690 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:30:14,528-Speed 2495.61 samples/sec Loss 17.2856 LearningRate 0.000479 Epoch: 1 Global Step: 39700 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 
23:30:22,736-Speed 2495.53 samples/sec Loss 17.2668 LearningRate 0.000479 Epoch: 1 Global Step: 39710 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:30:30,940-Speed 2497.10 samples/sec Loss 17.4128 LearningRate 0.000479 Epoch: 1 Global Step: 39720 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:30:39,090-Speed 2513.14 samples/sec Loss 17.2708 LearningRate 0.000479 Epoch: 1 Global Step: 39730 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:30:47,289-Speed 2498.70 samples/sec Loss 17.3100 LearningRate 0.000479 Epoch: 1 Global Step: 39740 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:30:55,494-Speed 2496.40 samples/sec Loss 17.2612 LearningRate 0.000479 Epoch: 1 Global Step: 39750 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:31:03,697-Speed 2497.30 samples/sec Loss 17.2322 LearningRate 0.000479 Epoch: 1 Global Step: 39760 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:31:11,902-Speed 2496.43 samples/sec Loss 17.2962 LearningRate 0.000479 Epoch: 1 Global Step: 39770 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:31:20,142-Speed 2485.82 samples/sec Loss 17.2074 LearningRate 0.000480 Epoch: 1 Global Step: 39780 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:31:28,293-Speed 2512.88 samples/sec Loss 17.2264 LearningRate 0.000480 Epoch: 1 Global Step: 39790 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:31:36,499-Speed 2496.31 samples/sec Loss 17.2095 LearningRate 0.000480 Epoch: 1 Global Step: 39800 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:31:44,709-Speed 2495.02 samples/sec Loss 17.1684 LearningRate 0.000480 Epoch: 1 Global Step: 39810 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:31:52,909-Speed 2497.95 samples/sec Loss 17.1921 LearningRate 0.000480 Epoch: 1 Global Step: 39820 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 
23:32:01,108-Speed 2498.68 samples/sec Loss 17.1647 LearningRate 0.000480 Epoch: 1 Global Step: 39830 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:32:09,311-Speed 2496.90 samples/sec Loss 17.2977 LearningRate 0.000480 Epoch: 1 Global Step: 39840 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:32:17,463-Speed 2512.58 samples/sec Loss 17.2386 LearningRate 0.000480 Epoch: 1 Global Step: 39850 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:32:25,668-Speed 2496.79 samples/sec Loss 17.0942 LearningRate 0.000480 Epoch: 1 Global Step: 39860 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:32:33,870-Speed 2497.29 samples/sec Loss 17.1718 LearningRate 0.000481 Epoch: 1 Global Step: 39870 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:32:42,070-Speed 2498.01 samples/sec Loss 17.2836 LearningRate 0.000481 Epoch: 1 Global Step: 39880 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:32:50,285-Speed 2493.21 samples/sec Loss 17.2498 LearningRate 0.000481 Epoch: 1 Global Step: 39890 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:32:58,499-Speed 2493.95 samples/sec Loss 17.0723 LearningRate 0.000481 Epoch: 1 Global Step: 39900 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:33:06,645-Speed 2514.65 samples/sec Loss 17.2347 LearningRate 0.000481 Epoch: 1 Global Step: 39910 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:33:14,863-Speed 2492.72 samples/sec Loss 17.1594 LearningRate 0.000481 Epoch: 1 Global Step: 39920 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:33:23,063-Speed 2498.19 samples/sec Loss 17.1565 LearningRate 0.000481 Epoch: 1 Global Step: 39930 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:33:31,263-Speed 2497.70 samples/sec Loss 17.2015 LearningRate 0.000481 Epoch: 1 Global Step: 39940 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 
23:33:39,467-Speed 2497.06 samples/sec Loss 17.2095 LearningRate 0.000482 Epoch: 1 Global Step: 39950 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:33:47,667-Speed 2498.76 samples/sec Loss 17.1536 LearningRate 0.000482 Epoch: 1 Global Step: 39960 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:33:55,816-Speed 2513.58 samples/sec Loss 17.1382 LearningRate 0.000482 Epoch: 1 Global Step: 39970 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:34:04,030-Speed 2493.60 samples/sec Loss 17.2690 LearningRate 0.000482 Epoch: 1 Global Step: 39980 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:34:12,231-Speed 2497.56 samples/sec Loss 17.1498 LearningRate 0.000482 Epoch: 1 Global Step: 39990 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:34:20,432-Speed 2498.12 samples/sec Loss 17.0333 LearningRate 0.000482 Epoch: 1 Global Step: 40000 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:34:28,632-Speed 2498.22 samples/sec Loss 17.1213 LearningRate 0.000482 Epoch: 1 Global Step: 40010 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:34:36,833-Speed 2497.16 samples/sec Loss 17.0893 LearningRate 0.000482 Epoch: 1 Global Step: 40020 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:34:44,996-Speed 2509.42 samples/sec Loss 17.0061 LearningRate 0.000483 Epoch: 1 Global Step: 40030 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:34:53,196-Speed 2498.13 samples/sec Loss 17.0959 LearningRate 0.000483 Epoch: 1 Global Step: 40040 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:35:01,399-Speed 2497.12 samples/sec Loss 17.1449 LearningRate 0.000483 Epoch: 1 Global Step: 40050 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:35:09,603-Speed 2496.69 samples/sec Loss 17.0809 LearningRate 0.000483 Epoch: 1 Global Step: 40060 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 
23:35:17,801-Speed 2498.67 samples/sec Loss 16.9780 LearningRate 0.000483 Epoch: 1 Global Step: 40070 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:35:26,008-Speed 2496.06 samples/sec Loss 17.1011 LearningRate 0.000483 Epoch: 1 Global Step: 40080 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:35:34,160-Speed 2512.52 samples/sec Loss 17.1139 LearningRate 0.000483 Epoch: 1 Global Step: 40090 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:35:42,361-Speed 2497.90 samples/sec Loss 17.0464 LearningRate 0.000483 Epoch: 1 Global Step: 40100 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:35:50,563-Speed 2497.30 samples/sec Loss 17.1552 LearningRate 0.000483 Epoch: 1 Global Step: 40110 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:35:58,762-Speed 2498.22 samples/sec Loss 17.1634 LearningRate 0.000484 Epoch: 1 Global Step: 40120 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:36:06,965-Speed 2496.87 samples/sec Loss 17.0187 LearningRate 0.000484 Epoch: 1 Global Step: 40130 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:36:15,180-Speed 2493.26 samples/sec Loss 16.9900 LearningRate 0.000484 Epoch: 1 Global Step: 40140 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:36:23,327-Speed 2514.13 samples/sec Loss 16.9791 LearningRate 0.000484 Epoch: 1 Global Step: 40150 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:36:31,526-Speed 2498.36 samples/sec Loss 16.9483 LearningRate 0.000484 Epoch: 1 Global Step: 40160 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:36:39,726-Speed 2497.81 samples/sec Loss 17.0971 LearningRate 0.000484 Epoch: 1 Global Step: 40170 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:36:47,934-Speed 2495.99 samples/sec Loss 17.2314 LearningRate 0.000484 Epoch: 1 Global Step: 40180 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 
23:36:56,138-Speed 2496.76 samples/sec Loss 17.0068 LearningRate 0.000484 Epoch: 1 Global Step: 40190 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:37:04,343-Speed 2496.27 samples/sec Loss 17.0630 LearningRate 0.000485 Epoch: 1 Global Step: 40200 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:37:12,502-Speed 2510.95 samples/sec Loss 17.1513 LearningRate 0.000485 Epoch: 1 Global Step: 40210 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:37:20,705-Speed 2497.06 samples/sec Loss 17.0442 LearningRate 0.000485 Epoch: 1 Global Step: 40220 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:37:28,904-Speed 2498.18 samples/sec Loss 17.0114 LearningRate 0.000485 Epoch: 1 Global Step: 40230 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:37:37,125-Speed 2491.51 samples/sec Loss 17.1009 LearningRate 0.000485 Epoch: 1 Global Step: 40240 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:37:45,327-Speed 2497.59 samples/sec Loss 16.9524 LearningRate 0.000485 Epoch: 1 Global Step: 40250 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:37:53,528-Speed 2497.54 samples/sec Loss 17.0029 LearningRate 0.000485 Epoch: 1 Global Step: 40260 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:38:01,684-Speed 2511.24 samples/sec Loss 16.8939 LearningRate 0.000485 Epoch: 1 Global Step: 40270 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:38:09,887-Speed 2497.14 samples/sec Loss 16.9272 LearningRate 0.000486 Epoch: 1 Global Step: 40280 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:38:18,092-Speed 2496.58 samples/sec Loss 16.9717 LearningRate 0.000486 Epoch: 1 Global Step: 40290 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:38:26,295-Speed 2497.00 samples/sec Loss 16.9535 LearningRate 0.000486 Epoch: 1 Global Step: 40300 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 
23:38:34,519-Speed 2490.77 samples/sec Loss 16.9477 LearningRate 0.000486 Epoch: 1 Global Step: 40310 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:38:42,739-Speed 2491.85 samples/sec Loss 16.8654 LearningRate 0.000486 Epoch: 1 Global Step: 40320 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:38:50,902-Speed 2509.41 samples/sec Loss 16.8110 LearningRate 0.000486 Epoch: 1 Global Step: 40330 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:38:59,107-Speed 2496.63 samples/sec Loss 16.8797 LearningRate 0.000486 Epoch: 1 Global Step: 40340 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:39:07,312-Speed 2496.34 samples/sec Loss 16.7418 LearningRate 0.000486 Epoch: 1 Global Step: 40350 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:39:15,516-Speed 2496.61 samples/sec Loss 16.8040 LearningRate 0.000487 Epoch: 1 Global Step: 40360 Fp16 Grad Scale: 32768 Required: 180 hours Training: 2022-07-05 23:39:23,677-Speed 2509.87 samples/sec Loss 16.9722 LearningRate 0.000487 Epoch: 1 Global Step: 40370 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:39:31,880-Speed 2496.88 samples/sec Loss 16.8106 LearningRate 0.000487 Epoch: 1 Global Step: 40380 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:39:40,033-Speed 2512.42 samples/sec Loss 16.8715 LearningRate 0.000487 Epoch: 1 Global Step: 40390 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:39:48,240-Speed 2495.72 samples/sec Loss 16.9118 LearningRate 0.000487 Epoch: 1 Global Step: 40400 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:39:56,445-Speed 2496.47 samples/sec Loss 16.7434 LearningRate 0.000487 Epoch: 1 Global Step: 40410 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:40:04,649-Speed 2496.97 samples/sec Loss 16.8527 LearningRate 0.000487 Epoch: 1 Global Step: 40420 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 
23:40:12,861-Speed 2494.14 samples/sec Loss 16.8703 LearningRate 0.000487 Epoch: 1 Global Step: 40430 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:40:21,061-Speed 2497.78 samples/sec Loss 16.7485 LearningRate 0.000487 Epoch: 1 Global Step: 40440 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:40:29,212-Speed 2513.18 samples/sec Loss 16.6639 LearningRate 0.000488 Epoch: 1 Global Step: 40450 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:40:37,411-Speed 2498.25 samples/sec Loss 16.7722 LearningRate 0.000488 Epoch: 1 Global Step: 40460 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:40:45,615-Speed 2496.74 samples/sec Loss 16.8197 LearningRate 0.000488 Epoch: 1 Global Step: 40470 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:40:53,816-Speed 2497.57 samples/sec Loss 16.6894 LearningRate 0.000488 Epoch: 1 Global Step: 40480 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:41:02,015-Speed 2498.25 samples/sec Loss 16.7423 LearningRate 0.000488 Epoch: 1 Global Step: 40490 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:41:10,219-Speed 2496.71 samples/sec Loss 16.7785 LearningRate 0.000488 Epoch: 1 Global Step: 40500 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:41:18,371-Speed 2512.52 samples/sec Loss 16.7181 LearningRate 0.000488 Epoch: 1 Global Step: 40510 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:41:26,575-Speed 2496.86 samples/sec Loss 16.7507 LearningRate 0.000488 Epoch: 1 Global Step: 40520 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:41:34,777-Speed 2497.28 samples/sec Loss 16.6881 LearningRate 0.000489 Epoch: 1 Global Step: 40530 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:41:42,976-Speed 2498.14 samples/sec Loss 16.8321 LearningRate 0.000489 Epoch: 1 Global Step: 40540 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 
23:41:51,181-Speed 2496.55 samples/sec Loss 17.0335 LearningRate 0.000489 Epoch: 1 Global Step: 40550 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:41:59,388-Speed 2495.86 samples/sec Loss 16.7277 LearningRate 0.000489 Epoch: 1 Global Step: 40560 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:42:07,536-Speed 2513.85 samples/sec Loss 17.0260 LearningRate 0.000489 Epoch: 1 Global Step: 40570 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:42:15,738-Speed 2497.17 samples/sec Loss 16.8186 LearningRate 0.000489 Epoch: 1 Global Step: 40580 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:42:23,943-Speed 2496.78 samples/sec Loss 16.8130 LearningRate 0.000489 Epoch: 1 Global Step: 40590 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:42:32,144-Speed 2497.48 samples/sec Loss 16.8416 LearningRate 0.000489 Epoch: 1 Global Step: 40600 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:42:40,348-Speed 2496.66 samples/sec Loss 16.7890 LearningRate 0.000490 Epoch: 1 Global Step: 40610 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:42:48,550-Speed 2497.35 samples/sec Loss 16.8231 LearningRate 0.000490 Epoch: 1 Global Step: 40620 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:42:56,699-Speed 2513.68 samples/sec Loss 16.6940 LearningRate 0.000490 Epoch: 1 Global Step: 40630 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:43:04,905-Speed 2496.39 samples/sec Loss 16.6712 LearningRate 0.000490 Epoch: 1 Global Step: 40640 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:43:13,107-Speed 2497.06 samples/sec Loss 16.6588 LearningRate 0.000490 Epoch: 1 Global Step: 40650 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:43:21,310-Speed 2497.25 samples/sec Loss 16.6292 LearningRate 0.000490 Epoch: 1 Global Step: 40660 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 
23:43:29,512-Speed 2497.48 samples/sec Loss 16.6755 LearningRate 0.000490 Epoch: 1 Global Step: 40670 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:43:37,716-Speed 2497.08 samples/sec Loss 16.5196 LearningRate 0.000490 Epoch: 1 Global Step: 40680 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:43:45,864-Speed 2513.93 samples/sec Loss 16.6076 LearningRate 0.000490 Epoch: 1 Global Step: 40690 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:43:54,068-Speed 2496.77 samples/sec Loss 16.5825 LearningRate 0.000491 Epoch: 1 Global Step: 40700 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:44:02,270-Speed 2497.57 samples/sec Loss 16.6869 LearningRate 0.000491 Epoch: 1 Global Step: 40710 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:44:10,469-Speed 2498.10 samples/sec Loss 16.5018 LearningRate 0.000491 Epoch: 1 Global Step: 40720 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:44:18,669-Speed 2498.12 samples/sec Loss 16.5426 LearningRate 0.000491 Epoch: 1 Global Step: 40730 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:44:26,868-Speed 2498.29 samples/sec Loss 16.5267 LearningRate 0.000491 Epoch: 1 Global Step: 40740 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:44:35,019-Speed 2512.80 samples/sec Loss 16.5605 LearningRate 0.000491 Epoch: 1 Global Step: 40750 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:44:43,220-Speed 2497.91 samples/sec Loss 16.5693 LearningRate 0.000491 Epoch: 1 Global Step: 40760 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:44:51,418-Speed 2498.31 samples/sec Loss 16.5088 LearningRate 0.000491 Epoch: 1 Global Step: 40770 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 23:44:59,618-Speed 2497.96 samples/sec Loss 16.6189 LearningRate 0.000492 Epoch: 1 Global Step: 40780 Fp16 Grad Scale: 16384 Required: 180 hours Training: 2022-07-05 
23:45:07,819-Speed 2497.57 samples/sec Loss 16.5403 LearningRate 0.000492 Epoch: 1 Global Step: 40790 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:45:16,024-Speed 2496.67 samples/sec Loss 16.6083 LearningRate 0.000492 Epoch: 1 Global Step: 40800 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:45:24,170-Speed 2514.29 samples/sec Loss 16.4405 LearningRate 0.000492 Epoch: 1 Global Step: 40810 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:45:32,370-Speed 2498.01 samples/sec Loss 16.5634 LearningRate 0.000492 Epoch: 1 Global Step: 40820 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:45:40,570-Speed 2497.89 samples/sec Loss 16.4947 LearningRate 0.000492 Epoch: 1 Global Step: 40830 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:45:48,770-Speed 2498.07 samples/sec Loss 16.7094 LearningRate 0.000492 Epoch: 1 Global Step: 40840 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:45:56,981-Speed 2494.79 samples/sec Loss 16.6468 LearningRate 0.000492 Epoch: 1 Global Step: 40850 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:46:05,182-Speed 2497.52 samples/sec Loss 16.5743 LearningRate 0.000493 Epoch: 1 Global Step: 40860 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:46:13,334-Speed 2512.57 samples/sec Loss 16.4936 LearningRate 0.000493 Epoch: 1 Global Step: 40870 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:46:21,536-Speed 2497.29 samples/sec Loss 16.5783 LearningRate 0.000493 Epoch: 1 Global Step: 40880 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:46:29,736-Speed 2498.05 samples/sec Loss 16.5448 LearningRate 0.000493 Epoch: 1 Global Step: 40890 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:46:37,938-Speed 2497.51 samples/sec Loss 16.4931 LearningRate 0.000493 Epoch: 1 Global Step: 40900 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:46:46,138-Speed 2498.30 samples/sec Loss 16.5381 LearningRate 0.000493 Epoch: 1 Global Step: 40910 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:46:54,337-Speed 2498.27 samples/sec Loss 16.5643 LearningRate 0.000493 Epoch: 1 Global Step: 40920 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:47:02,483-Speed 2514.39 samples/sec Loss 16.5178 LearningRate 0.000493 Epoch: 1 Global Step: 40930 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:47:10,683-Speed 2498.05 samples/sec Loss 16.5206 LearningRate 0.000494 Epoch: 1 Global Step: 40940 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:47:18,885-Speed 2497.32 samples/sec Loss 16.4792 LearningRate 0.000494 Epoch: 1 Global Step: 40950 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:47:27,089-Speed 2496.87 samples/sec Loss 16.5873 LearningRate 0.000494 Epoch: 1 Global Step: 40960 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:47:35,294-Speed 2496.37 samples/sec Loss 16.5574 LearningRate 0.000494 Epoch: 1 Global Step: 40970 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:47:43,496-Speed 2497.56 samples/sec Loss 16.5332 LearningRate 0.000494 Epoch: 1 Global Step: 40980 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:47:51,650-Speed 2512.10 samples/sec Loss 16.4792 LearningRate 0.000494 Epoch: 1 Global Step: 40990 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:47:59,853-Speed 2497.01 samples/sec Loss 16.5173 LearningRate 0.000494 Epoch: 1 Global Step: 41000 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:48:08,054-Speed 2497.63 samples/sec Loss 16.4967 LearningRate 0.000494 Epoch: 1 Global Step: 41010 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:48:16,252-Speed 2498.67 samples/sec Loss 16.3712 LearningRate 0.000494 Epoch: 1 Global Step: 41020 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:48:24,456-Speed 2497.68 samples/sec Loss 16.5427 LearningRate 0.000495 Epoch: 1 Global Step: 41030 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:48:32,655-Speed 2498.30 samples/sec Loss 16.4165 LearningRate 0.000495 Epoch: 1 Global Step: 41040 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:48:40,805-Speed 2513.15 samples/sec Loss 16.5873 LearningRate 0.000495 Epoch: 1 Global Step: 41050 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:48:49,006-Speed 2497.68 samples/sec Loss 16.3576 LearningRate 0.000495 Epoch: 1 Global Step: 41060 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:48:57,211-Speed 2496.48 samples/sec Loss 16.3345 LearningRate 0.000495 Epoch: 1 Global Step: 41070 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:49:05,416-Speed 2496.43 samples/sec Loss 16.4565 LearningRate 0.000495 Epoch: 1 Global Step: 41080 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:49:13,630-Speed 2493.70 samples/sec Loss 16.3315 LearningRate 0.000495 Epoch: 1 Global Step: 41090 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:49:21,831-Speed 2497.59 samples/sec Loss 16.4416 LearningRate 0.000495 Epoch: 1 Global Step: 41100 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:49:29,976-Speed 2514.84 samples/sec Loss 16.3863 LearningRate 0.000496 Epoch: 1 Global Step: 41110 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:49:38,179-Speed 2497.14 samples/sec Loss 16.3488 LearningRate 0.000496 Epoch: 1 Global Step: 41120 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:49:46,378-Speed 2498.13 samples/sec Loss 16.4312 LearningRate 0.000496 Epoch: 1 Global Step: 41130 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:49:54,576-Speed 2498.94 samples/sec Loss 16.3475 LearningRate 0.000496 Epoch: 1 Global Step: 41140 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:50:02,792-Speed 2492.90 samples/sec Loss 16.3989 LearningRate 0.000496 Epoch: 1 Global Step: 41150 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:50:10,992-Speed 2498.16 samples/sec Loss 16.3355 LearningRate 0.000496 Epoch: 1 Global Step: 41160 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:50:19,140-Speed 2513.91 samples/sec Loss 16.3097 LearningRate 0.000496 Epoch: 1 Global Step: 41170 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:50:27,349-Speed 2495.29 samples/sec Loss 16.4236 LearningRate 0.000496 Epoch: 1 Global Step: 41180 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:50:35,552-Speed 2496.97 samples/sec Loss 16.2412 LearningRate 0.000497 Epoch: 1 Global Step: 41190 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:50:43,755-Speed 2496.94 samples/sec Loss 16.2927 LearningRate 0.000497 Epoch: 1 Global Step: 41200 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:50:51,951-Speed 2499.10 samples/sec Loss 16.4169 LearningRate 0.000497 Epoch: 1 Global Step: 41210 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:51:00,151-Speed 2498.21 samples/sec Loss 16.2581 LearningRate 0.000497 Epoch: 1 Global Step: 41220 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:51:08,303-Speed 2512.46 samples/sec Loss 16.4837 LearningRate 0.000497 Epoch: 1 Global Step: 41230 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:51:16,507-Speed 2496.89 samples/sec Loss 16.4145 LearningRate 0.000497 Epoch: 1 Global Step: 41240 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:51:24,714-Speed 2496.02 samples/sec Loss 16.3307 LearningRate 0.000497 Epoch: 1 Global Step: 41250 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:51:32,925-Speed 2494.38 samples/sec Loss 16.3352 LearningRate 0.000497 Epoch: 1 Global Step: 41260 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:51:41,128-Speed 2497.14 samples/sec Loss 16.2988 LearningRate 0.000497 Epoch: 1 Global Step: 41270 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:51:49,331-Speed 2497.06 samples/sec Loss 16.4271 LearningRate 0.000498 Epoch: 1 Global Step: 41280 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:51:57,480-Speed 2513.55 samples/sec Loss 16.3962 LearningRate 0.000498 Epoch: 1 Global Step: 41290 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:52:05,687-Speed 2495.95 samples/sec Loss 16.3541 LearningRate 0.000498 Epoch: 1 Global Step: 41300 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:52:13,891-Speed 2497.10 samples/sec Loss 16.3185 LearningRate 0.000498 Epoch: 1 Global Step: 41310 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:52:22,093-Speed 2497.32 samples/sec Loss 16.3083 LearningRate 0.000498 Epoch: 1 Global Step: 41320 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:52:30,296-Speed 2497.04 samples/sec Loss 16.2208 LearningRate 0.000498 Epoch: 1 Global Step: 41330 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:52:38,499-Speed 2497.36 samples/sec Loss 16.3492 LearningRate 0.000498 Epoch: 1 Global Step: 41340 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:52:46,648-Speed 2513.41 samples/sec Loss 16.2869 LearningRate 0.000498 Epoch: 1 Global Step: 41350 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:52:54,849-Speed 2497.67 samples/sec Loss 16.2722 LearningRate 0.000499 Epoch: 1 Global Step: 41360 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:53:03,048-Speed 2498.42 samples/sec Loss 16.2339 LearningRate 0.000499 Epoch: 1 Global Step: 41370 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:53:11,260-Speed 2494.37 samples/sec Loss 16.2592 LearningRate 0.000499 Epoch: 1 Global Step: 41380 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:53:19,461-Speed 2497.87 samples/sec Loss 16.2469 LearningRate 0.000499 Epoch: 1 Global Step: 41390 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:53:27,661-Speed 2497.75 samples/sec Loss 16.2341 LearningRate 0.000499 Epoch: 1 Global Step: 41400 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:53:35,810-Speed 2514.30 samples/sec Loss 16.1798 LearningRate 0.000499 Epoch: 1 Global Step: 41410 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:53:44,008-Speed 2498.61 samples/sec Loss 16.2000 LearningRate 0.000499 Epoch: 1 Global Step: 41420 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:53:52,210-Speed 2497.29 samples/sec Loss 16.2545 LearningRate 0.000499 Epoch: 1 Global Step: 41430 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:54:00,414-Speed 2496.86 samples/sec Loss 16.1530 LearningRate 0.000500 Epoch: 1 Global Step: 41440 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:54:08,611-Speed 2498.74 samples/sec Loss 16.4143 LearningRate 0.000500 Epoch: 1 Global Step: 41450 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:54:16,816-Speed 2496.57 samples/sec Loss 16.2247 LearningRate 0.000500 Epoch: 1 Global Step: 41460 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:54:24,961-Speed 2514.72 samples/sec Loss 16.2797 LearningRate 0.000500 Epoch: 1 Global Step: 41470 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:54:33,176-Speed 2493.10 samples/sec Loss 16.2072 LearningRate 0.000500 Epoch: 1 Global Step: 41480 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:54:43,813-Speed 1925.56 samples/sec Loss 16.1523 LearningRate 0.000500 Epoch: 2 Global Step: 41490 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:54:52,011-Speed 2498.44 samples/sec Loss 16.1492 LearningRate 0.000500 Epoch: 2 Global Step: 41500 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:55:00,206-Speed 2499.81 samples/sec Loss 16.1753 LearningRate 0.000500 Epoch: 2 Global Step: 41510 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:55:08,400-Speed 2499.71 samples/sec Loss 16.1518 LearningRate 0.000500 Epoch: 2 Global Step: 41520 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:55:16,552-Speed 2512.66 samples/sec Loss 16.1610 LearningRate 0.000501 Epoch: 2 Global Step: 41530 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:55:24,752-Speed 2498.20 samples/sec Loss 16.0456 LearningRate 0.000501 Epoch: 2 Global Step: 41540 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:55:32,951-Speed 2498.24 samples/sec Loss 16.1188 LearningRate 0.000501 Epoch: 2 Global Step: 41550 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:55:41,150-Speed 2498.32 samples/sec Loss 16.1254 LearningRate 0.000501 Epoch: 2 Global Step: 41560 Fp16 Grad Scale: 16384 Required: 180 hours
Training: 2022-07-05 23:55:49,349-Speed 2498.27 samples/sec Loss 16.1586 LearningRate 0.000501 Epoch: 2 Global Step: 41570 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:55:57,565-Speed 2492.70 samples/sec Loss 15.9708 LearningRate 0.000501 Epoch: 2 Global Step: 41580 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:56:05,716-Speed 2513.16 samples/sec Loss 16.0990 LearningRate 0.000501 Epoch: 2 Global Step: 41590 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:56:13,920-Speed 2496.78 samples/sec Loss 16.1364 LearningRate 0.000501 Epoch: 2 Global Step: 41600 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:56:22,128-Speed 2495.30 samples/sec Loss 16.1233 LearningRate 0.000502 Epoch: 2 Global Step: 41610 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:56:30,334-Speed 2496.32 samples/sec Loss 16.0228 LearningRate 0.000502 Epoch: 2 Global Step: 41620 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:56:38,534-Speed 2497.76 samples/sec Loss 16.0265 LearningRate 0.000502 Epoch: 2 Global Step: 41630 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:56:46,740-Speed 2496.25 samples/sec Loss 16.1053 LearningRate 0.000502 Epoch: 2 Global Step: 41640 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:56:54,891-Speed 2512.91 samples/sec Loss 16.1100 LearningRate 0.000502 Epoch: 2 Global Step: 41650 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:57:03,095-Speed 2497.47 samples/sec Loss 16.1812 LearningRate 0.000502 Epoch: 2 Global Step: 41660 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:57:11,309-Speed 2493.60 samples/sec Loss 16.1189 LearningRate 0.000502 Epoch: 2 Global Step: 41670 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:57:19,512-Speed 2496.99 samples/sec Loss 16.0402 LearningRate 0.000502 Epoch: 2 Global Step: 41680 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:57:27,714-Speed 2497.43 samples/sec Loss 15.9989 LearningRate 0.000503 Epoch: 2 Global Step: 41690 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:57:35,920-Speed 2495.96 samples/sec Loss 16.1006 LearningRate 0.000503 Epoch: 2 Global Step: 41700 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:57:44,070-Speed 2513.42 samples/sec Loss 16.0156 LearningRate 0.000503 Epoch: 2 Global Step: 41710 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:57:52,285-Speed 2493.29 samples/sec Loss 16.0065 LearningRate 0.000503 Epoch: 2 Global Step: 41720 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:58:00,493-Speed 2495.53 samples/sec Loss 16.0001 LearningRate 0.000503 Epoch: 2 Global Step: 41730 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:58:08,700-Speed 2495.72 samples/sec Loss 15.9316 LearningRate 0.000503 Epoch: 2 Global Step: 41740 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:58:16,904-Speed 2496.62 samples/sec Loss 16.0311 LearningRate 0.000503 Epoch: 2 Global Step: 41750 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:58:25,108-Speed 2496.78 samples/sec Loss 16.0104 LearningRate 0.000503 Epoch: 2 Global Step: 41760 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:58:33,260-Speed 2512.57 samples/sec Loss 15.8529 LearningRate 0.000504 Epoch: 2 Global Step: 41770 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:58:41,473-Speed 2493.98 samples/sec Loss 15.8978 LearningRate 0.000504 Epoch: 2 Global Step: 41780 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:58:49,672-Speed 2498.35 samples/sec Loss 15.9204 LearningRate 0.000504 Epoch: 2 Global Step: 41790 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:58:57,871-Speed 2498.41 samples/sec Loss 15.8641 LearningRate 0.000504 Epoch: 2 Global Step: 41800 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:59:06,073-Speed 2497.21 samples/sec Loss 15.9312 LearningRate 0.000504 Epoch: 2 Global Step: 41810 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:59:14,278-Speed 2496.46 samples/sec Loss 15.8923 LearningRate 0.000504 Epoch: 2 Global Step: 41820 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:59:22,433-Speed 2511.85 samples/sec Loss 15.9878 LearningRate 0.000504 Epoch: 2 Global Step: 41830 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:59:30,634-Speed 2497.66 samples/sec Loss 15.9040 LearningRate 0.000504 Epoch: 2 Global Step: 41840 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:59:38,833-Speed 2498.51 samples/sec Loss 15.7851 LearningRate 0.000504 Epoch: 2 Global Step: 41850 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:59:47,035-Speed 2497.32 samples/sec Loss 15.9011 LearningRate 0.000505 Epoch: 2 Global Step: 41860 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-05 23:59:55,238-Speed 2496.76 samples/sec Loss 15.8122 LearningRate 0.000505 Epoch: 2 Global Step: 41870 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:00:03,440-Speed 2497.48 samples/sec Loss 15.8418 LearningRate 0.000505 Epoch: 2 Global Step: 41880 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:00:11,593-Speed 2512.33 samples/sec Loss 15.8092 LearningRate 0.000505 Epoch: 2 Global Step: 41890 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:00:19,817-Speed 2490.85 samples/sec Loss 15.7070 LearningRate 0.000505 Epoch: 2 Global Step: 41900 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:00:28,019-Speed 2497.29 samples/sec Loss 15.8791 LearningRate 0.000505 Epoch: 2 Global Step: 41910 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:00:36,219-Speed 2498.06 samples/sec Loss 15.8345 LearningRate 0.000505 Epoch: 2 Global Step: 41920 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:00:44,438-Speed 2492.34 samples/sec Loss 15.7285 LearningRate 0.000505 Epoch: 2 Global Step: 41930 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:00:52,656-Speed 2492.45 samples/sec Loss 15.8899 LearningRate 0.000506 Epoch: 2 Global Step: 41940 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:01:00,815-Speed 2510.71 samples/sec Loss 15.7711 LearningRate 0.000506 Epoch: 2 Global Step: 41950 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:01:09,030-Speed 2493.38 samples/sec Loss 15.7946 LearningRate 0.000506 Epoch: 2 Global Step: 41960 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:01:17,246-Speed 2493.11 samples/sec Loss 15.9510 LearningRate 0.000506 Epoch: 2 Global Step: 41970 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:01:25,454-Speed 2495.42 samples/sec Loss 15.8746 LearningRate 0.000506 Epoch: 2 Global Step: 41980 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:01:33,654-Speed 2498.02 samples/sec Loss 15.7567 LearningRate 0.000506 Epoch: 2 Global Step: 41990 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:01:41,854-Speed 2497.86 samples/sec Loss 15.8741 LearningRate 0.000506 Epoch: 2 Global Step: 42000 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:01:50,005-Speed 2512.95 samples/sec Loss 15.8184 LearningRate 0.000506 Epoch: 2 Global Step: 42010 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:01:58,207-Speed 2497.31 samples/sec Loss 15.8579 LearningRate 0.000507 Epoch: 2 Global Step: 42020 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:02:06,407-Speed 2498.07 samples/sec Loss 15.9012 LearningRate 0.000507 Epoch: 2 Global Step: 42030 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:02:14,609-Speed 2497.13 samples/sec Loss 15.8249 LearningRate 0.000507 Epoch: 2 Global Step: 42040 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:02:22,823-Speed 2494.14 samples/sec Loss 15.8705 LearningRate 0.000507 Epoch: 2 Global Step: 42050 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:02:31,021-Speed 2498.48 samples/sec Loss 15.8265 LearningRate 0.000507 Epoch: 2 Global Step: 42060 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:02:39,166-Speed 2514.93 samples/sec Loss 15.7438 LearningRate 0.000507 Epoch: 2 Global Step: 42070 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:02:47,375-Speed 2495.08 samples/sec Loss 15.8066 LearningRate 0.000507 Epoch: 2 Global Step: 42080 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:02:55,576-Speed 2497.68 samples/sec Loss 15.7562 LearningRate 0.000507 Epoch: 2 Global Step: 42090 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:03:03,773-Speed 2498.86 samples/sec Loss 15.8011 LearningRate 0.000507 Epoch: 2 Global Step: 42100 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:03:11,990-Speed 2492.86 samples/sec Loss 15.7144 LearningRate 0.000508 Epoch: 2 Global Step: 42110 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:03:20,190-Speed 2498.09 samples/sec Loss 15.6737 LearningRate 0.000508 Epoch: 2 Global Step: 42120 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:03:28,356-Speed 2508.30 samples/sec Loss 15.6144 LearningRate 0.000508 Epoch: 2 Global Step: 42130 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:03:36,560-Speed 2496.64 samples/sec Loss 15.8477 LearningRate 0.000508 Epoch: 2 Global Step: 42140 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:03:44,762-Speed 2497.51 samples/sec Loss 15.7617 LearningRate 0.000508 Epoch: 2 Global Step: 42150 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:03:52,960-Speed 2498.30 samples/sec Loss 15.8002 LearningRate 0.000508 Epoch: 2 Global Step: 42160 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:04:01,421-Speed 2498.88 samples/sec Loss 15.8047 LearningRate 0.000508 Epoch: 2 Global Step: 42170 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:04:09,620-Speed 2498.17 samples/sec Loss 15.7060 LearningRate 0.000508 Epoch: 2 Global Step: 42180 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:04:17,772-Speed 2514.93 samples/sec Loss 15.7704 LearningRate 0.000509 Epoch: 2 Global Step: 42190 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:04:26,331-Speed 2407.31 samples/sec Loss 15.9113 LearningRate 0.000509 Epoch: 2 Global Step: 42200 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:04:34,542-Speed 2494.38 samples/sec Loss 15.7702 LearningRate 0.000509 Epoch: 2 Global Step: 42210 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:04:42,760-Speed 2499.00 samples/sec Loss 15.8319 LearningRate 0.000509 Epoch: 2 Global Step: 42220 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:04:50,984-Speed 2499.58 samples/sec Loss 15.7517 LearningRate 0.000509 Epoch: 2 Global Step: 42230 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:05:01,310-Speed 2501.73 samples/sec Loss 15.5980 LearningRate 0.000509 Epoch: 2 Global Step: 42240 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:05:09,459-Speed 2513.26 samples/sec Loss 15.8471 LearningRate 0.000509 Epoch: 2 Global Step: 42250 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:05:17,663-Speed 2496.71 samples/sec Loss 15.6373 LearningRate 0.000509 Epoch: 2 Global Step: 42260 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:05:26,223-Speed 2495.70 samples/sec Loss 15.7289 LearningRate 0.000510 Epoch: 2 Global Step: 42270 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:05:34,460-Speed 2499.94 samples/sec Loss 15.6639 LearningRate 0.000510 Epoch: 2 Global Step: 42280 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:05:42,660-Speed 2497.91 samples/sec Loss 15.7558 LearningRate 0.000510 Epoch: 2 Global Step: 42290 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:05:50,891-Speed 2499.10 samples/sec Loss 15.6605 LearningRate 0.000510 Epoch: 2 Global Step: 42300 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:05:59,069-Speed 2515.32 samples/sec Loss 15.8545 LearningRate 0.000510 Epoch: 2 Global Step: 42310 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:06:07,275-Speed 2496.34 samples/sec Loss 15.5915 LearningRate 0.000510 Epoch: 2 Global Step: 42320 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:06:15,475-Speed 2497.81 samples/sec Loss 15.6354 LearningRate 0.000510 Epoch: 2 Global Step: 42330 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:06:23,706-Speed 2495.41 samples/sec Loss 15.6654 LearningRate 0.000510 Epoch: 2 Global Step: 42340 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:06:31,935-Speed 2499.62 samples/sec Loss 15.5386 LearningRate 0.000510 Epoch: 2 Global Step: 42350 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:06:40,136-Speed 2497.35 samples/sec Loss 15.6730 LearningRate 0.000511 Epoch: 2 Global Step: 42360 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:06:48,288-Speed 2515.84 samples/sec Loss 15.7163 LearningRate 0.000511 Epoch: 2 Global Step: 42370 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:06:56,514-Speed 2500.20 samples/sec Loss 15.6494 LearningRate 0.000511 Epoch: 2 Global Step: 42380 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:07:04,743-Speed 2499.86 samples/sec Loss 15.7508 LearningRate 0.000511 Epoch: 2 Global Step: 42390 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:07:12,943-Speed 2497.78 samples/sec Loss 15.6726 LearningRate 0.000511 Epoch: 2 Global Step: 42400 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:07:21,285-Speed 2499.62 samples/sec Loss 15.6378 LearningRate 0.000511 Epoch: 2 Global Step: 42410 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:07:29,497-Speed 2494.18 samples/sec Loss 15.5950 LearningRate 0.000511 Epoch: 2 Global Step: 42420 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:07:37,703-Speed 2514.77 samples/sec Loss 15.5774 LearningRate 0.000511 Epoch: 2 Global Step: 42430 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:07:46,052-Speed 2498.86 samples/sec Loss 15.6209 LearningRate 0.000512 Epoch: 2 Global Step: 42440 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:07:54,257-Speed 2496.26 samples/sec Loss 15.5623 LearningRate 0.000512 Epoch: 2 Global Step: 42450 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:08:02,513-Speed 2497.95 samples/sec Loss 15.6111 LearningRate 0.000512 Epoch: 2 Global Step: 42460 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:08:14,494-Speed 2499.57 samples/sec Loss 15.5277 LearningRate 0.000512 Epoch: 2 Global Step: 42470 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:08:22,694-Speed 2497.79 samples/sec Loss 15.6010 LearningRate 0.000512 Epoch: 2 Global Step: 42480 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:08:30,909-Speed 2515.21 samples/sec Loss 15.5151 LearningRate 0.000512 Epoch: 2 Global Step: 42490 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:08:39,121-Speed 2498.51 samples/sec Loss 15.5678 LearningRate 0.000512 Epoch: 2 Global Step: 42500 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:08:52,075-Speed 1582.07 samples/sec Loss 15.6242 LearningRate 0.000512 Epoch: 2 Global Step: 42510 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:09:00,283-Speed 2499.28 samples/sec Loss 15.5160 LearningRate 0.000513 Epoch: 2 Global Step: 42520 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:09:08,513-Speed 2498.71 samples/sec Loss 15.4366 LearningRate 0.000513 Epoch: 2 Global Step: 42530 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:09:21,241-Speed 2498.00 samples/sec Loss 15.5034 LearningRate 0.000513 Epoch: 2 Global Step: 42540 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:09:29,385-Speed 2515.32 samples/sec Loss 15.5799 LearningRate 0.000513 Epoch: 2 Global Step: 42550 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:09:37,589-Speed 2496.71 samples/sec Loss 15.4345 LearningRate 0.000513 Epoch: 2 Global Step: 42560 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:09:45,793-Speed 2496.70 samples/sec Loss 15.3666 LearningRate 0.000513 Epoch: 2 Global Step: 42570 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:09:54,000-Speed 2495.97 samples/sec Loss 15.3528 LearningRate 0.000513 Epoch: 2 Global Step: 42580 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:10:02,205-Speed 2496.37 samples/sec Loss 15.3858 LearningRate 0.000513 Epoch: 2 Global Step: 42590 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:10:10,416-Speed 2494.86 samples/sec Loss 15.3469 LearningRate 0.000514 Epoch: 2 Global Step: 42600 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:10:18,572-Speed 2511.48 samples/sec Loss 15.3833 LearningRate 0.000514 Epoch: 2 Global Step: 42610 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:10:26,789-Speed 2492.70 samples/sec Loss 15.2985 LearningRate 0.000514 Epoch: 2 Global Step: 42620 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:10:34,998-Speed 2495.10 samples/sec Loss 15.3597 LearningRate 0.000514 Epoch: 2 Global Step: 42630 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:10:43,210-Speed 2494.31 samples/sec Loss 15.3325 LearningRate 0.000514 Epoch: 2 Global Step: 42640 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:10:51,433-Speed 2491.26 samples/sec Loss 15.4529 LearningRate 0.000514 Epoch: 2 Global Step: 42650 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:10:59,639-Speed 2496.03 samples/sec Loss 15.3617 LearningRate 0.000514 Epoch: 2 Global Step: 42660 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:11:07,798-Speed 2510.45 samples/sec Loss 15.3073 LearningRate 0.000514 Epoch: 2 Global Step: 42670 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:11:16,008-Speed 2494.86 samples/sec Loss 15.3257 LearningRate 0.000514 Epoch: 2 Global Step: 42680 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:11:24,213-Speed 2496.60 samples/sec Loss 15.4133 LearningRate 0.000515 Epoch: 2 Global Step: 42690 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:11:32,432-Speed 2491.89 samples/sec Loss 15.4156 LearningRate 0.000515 Epoch: 2 Global Step: 42700 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:11:40,633-Speed 2497.78 samples/sec Loss 15.3276 LearningRate 0.000515 Epoch: 2 Global Step: 42710 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:11:48,835-Speed 2497.26 samples/sec Loss 15.4585 LearningRate 0.000515 Epoch: 2 Global Step: 42720 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:11:56,988-Speed 2512.53 samples/sec Loss 15.3625 LearningRate 0.000515 Epoch: 2 Global Step: 42730 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:12:05,192-Speed 2496.64 samples/sec Loss 15.3830 LearningRate 0.000515 Epoch: 2 Global Step: 42740 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:12:13,396-Speed 2496.73 samples/sec Loss 15.2849 LearningRate 0.000515 Epoch: 2 Global Step: 42750 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:12:21,597-Speed 2497.65 samples/sec Loss 15.4309 LearningRate 0.000515 Epoch: 2 Global Step: 42760 Fp16 Grad Scale: 32768 Required: 180 hours
Training: 2022-07-06 00:12:29,801-Speed 2496.68 samples/sec Loss 15.3366 LearningRate 0.000516 Epoch: 2 Global Step: 42770 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:12:38,005-Speed 2496.94 samples/sec Loss 15.2256 LearningRate 0.000516 Epoch: 2 Global Step: 42780 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:12:46,177-Speed 2506.16 samples/sec Loss 15.4996 LearningRate 0.000516 Epoch: 2 Global Step: 42790 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:12:54,385-Speed 2495.55 samples/sec Loss 15.2765 LearningRate 0.000516 Epoch: 2 Global Step: 42800 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:13:02,593-Speed 2495.75 samples/sec Loss 15.3141 LearningRate 0.000516 Epoch: 2 Global Step: 42810 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:13:10,815-Speed 2490.95 samples/sec Loss 15.3520 LearningRate 0.000516 Epoch: 2 Global Step: 42820 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:13:19,022-Speed 2495.87 samples/sec Loss 15.4202 LearningRate 0.000516 Epoch: 2 Global Step: 42830 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:13:27,226-Speed 2496.74 samples/sec Loss 15.3961 LearningRate 0.000516 Epoch: 2 Global Step: 42840 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:13:35,384-Speed 2510.89 samples/sec Loss 15.3412 LearningRate 0.000517 Epoch: 2 Global Step: 42850 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:13:43,589-Speed 2496.52 samples/sec Loss 15.2284 LearningRate 0.000517 Epoch: 2 Global Step: 42860 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:13:51,790-Speed 2497.72 samples/sec Loss 15.2914 LearningRate 0.000517 Epoch: 2 Global Step: 42870 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:14:00,005-Speed 2493.37 samples/sec Loss 15.4262 LearningRate 0.000517 Epoch: 2 Global Step: 42880 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:14:08,205-Speed 2498.23 samples/sec Loss 15.3405 LearningRate 0.000517 Epoch: 2 Global Step: 42890 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:14:16,407-Speed 2497.36 samples/sec Loss 15.2561 LearningRate 0.000517 Epoch: 2 Global Step: 42900 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:14:24,556-Speed 2513.54 samples/sec Loss 15.4186 LearningRate 0.000517 Epoch: 2 Global Step: 42910 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:14:32,757-Speed 2497.87 samples/sec Loss 15.3510 LearningRate 0.000517 Epoch: 2 Global Step: 42920 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:14:40,959-Speed 2497.11 samples/sec Loss 15.3209 LearningRate 0.000517 Epoch: 2 Global Step: 42930 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:14:49,166-Speed 2495.64 samples/sec Loss 15.2317 LearningRate 0.000518 Epoch: 2 Global Step: 42940 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:14:57,366-Speed 2498.17 samples/sec Loss 15.4200 LearningRate 0.000518 Epoch: 2 Global Step: 42950 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:15:05,572-Speed 2495.96 samples/sec Loss 15.2681 LearningRate 0.000518 Epoch: 2 Global Step: 42960 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:15:13,735-Speed 2509.34 samples/sec Loss 15.2802 LearningRate 0.000518 Epoch: 2 Global Step: 42970 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:15:21,935-Speed 2497.91 samples/sec Loss 15.1893 LearningRate 0.000518 Epoch: 2 Global Step: 42980 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:15:30,140-Speed 2496.53 samples/sec Loss 15.2677 LearningRate 0.000518 Epoch: 2 Global Step: 42990 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:15:38,344-Speed 2496.81 samples/sec Loss 15.3090 LearningRate 0.000518 Epoch: 2 Global Step: 43000 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:15:46,549-Speed 2496.29 samples/sec Loss 15.3984 LearningRate 0.000518 Epoch: 2 Global Step: 43010 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:15:54,759-Speed 2494.79 samples/sec Loss 15.3603 LearningRate 0.000519 Epoch: 2 Global Step: 43020 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:16:02,924-Speed 2508.48 samples/sec Loss 15.3164 LearningRate 0.000519 Epoch: 2 Global Step: 43030 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:16:11,132-Speed 2495.84 samples/sec Loss 15.2227 LearningRate 0.000519 Epoch: 2 Global Step: 43040 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:16:19,344-Speed 2494.14 samples/sec Loss 15.1432 LearningRate 0.000519 Epoch: 2 Global Step: 43050 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:16:27,552-Speed 2495.42 samples/sec Loss 15.2749 LearningRate 0.000519 Epoch: 2 Global Step: 43060 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:16:35,761-Speed 2495.41 samples/sec Loss 15.2201 LearningRate 0.000519 Epoch: 2 Global Step: 43070 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:16:43,981-Speed 2491.83 samples/sec Loss 15.1264 LearningRate 0.000519 Epoch: 2 Global Step: 43080 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:16:52,129-Speed 2513.74 samples/sec Loss 15.3196 LearningRate 0.000519 Epoch: 2 Global Step: 43090 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:17:00,333-Speed 2497.10 samples/sec Loss 15.2043 LearningRate 0.000520 Epoch: 2 Global Step: 43100 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:17:08,545-Speed 2494.34 samples/sec Loss 15.1786 LearningRate 0.000520 Epoch: 2 Global Step: 43110 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:17:16,752-Speed 2495.61 samples/sec Loss 15.2616 LearningRate 0.000520 Epoch: 2 Global Step: 43120 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:17:24,957-Speed 2496.74 samples/sec Loss 15.3493 LearningRate 0.000520 Epoch: 2 Global Step: 43130 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:17:33,159-Speed 2497.03 samples/sec Loss 15.3633 LearningRate 0.000520 Epoch: 2 Global Step: 43140 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:17:41,311-Speed 2512.91 samples/sec Loss 15.1401 LearningRate 0.000520 Epoch: 2 Global Step: 43150 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:17:49,524-Speed 2494.02 samples/sec Loss 15.2337 LearningRate 0.000520 Epoch: 2 Global Step: 43160 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:17:57,729-Speed 2496.46 samples/sec Loss 15.2519 LearningRate 0.000520 Epoch: 2 Global Step: 43170 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:18:05,944-Speed 2493.36 samples/sec Loss 15.1610 LearningRate 0.000521 Epoch: 2 Global Step: 43180 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:18:14,148-Speed 2496.68 samples/sec Loss 15.1812 LearningRate 0.000521 Epoch: 2 Global Step: 43190 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:18:22,355-Speed 2495.85 samples/sec Loss 15.2474 LearningRate 0.000521 Epoch: 2 Global Step: 43200 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:18:30,505-Speed 2513.36 samples/sec Loss 15.1248 LearningRate 0.000521 Epoch: 2 Global Step: 43210 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:18:38,721-Speed 2493.21 samples/sec Loss 15.0992 LearningRate 0.000521 Epoch: 2 Global Step: 43220 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:18:46,922-Speed 2497.68 samples/sec Loss 15.1386 LearningRate 0.000521 Epoch: 2 Global Step: 43230 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:18:55,125-Speed 2497.01 samples/sec Loss 15.2236 LearningRate 0.000521 Epoch: 2 Global Step: 43240 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:19:03,330-Speed 2496.57 samples/sec Loss 15.0833 LearningRate 0.000521 Epoch: 2 Global Step: 43250 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:19:11,531-Speed 2497.46 samples/sec Loss 15.0563 LearningRate 0.000521 Epoch: 2 Global Step: 43260 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:19:19,685-Speed 2512.25 samples/sec Loss 15.1664 LearningRate 0.000522 Epoch: 2 Global Step: 43270 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:19:27,889-Speed 2496.57 samples/sec Loss 15.1442 LearningRate 0.000522 Epoch: 2 Global Step: 43280 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:19:36,091-Speed 2497.19 samples/sec Loss 15.0310 LearningRate 0.000522 Epoch: 2 Global Step: 43290 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 00:19:44,303-Speed 2494.39 samples/sec Loss 15.0475 LearningRate 0.000522 Epoch: 2 Global Step: 43300 Fp16 Grad Scale: 65536 Required: 180 hours
Training: 2022-07-06 
00:19:52,505-Speed 2497.77 samples/sec Loss 15.1349 LearningRate 0.000522 Epoch: 2 Global Step: 43310 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:20:00,713-Speed 2495.42 samples/sec Loss 15.0267 LearningRate 0.000522 Epoch: 2 Global Step: 43320 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:20:08,864-Speed 2512.74 samples/sec Loss 15.1554 LearningRate 0.000522 Epoch: 2 Global Step: 43330 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:20:17,066-Speed 2497.39 samples/sec Loss 15.0516 LearningRate 0.000522 Epoch: 2 Global Step: 43340 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:20:25,272-Speed 2496.30 samples/sec Loss 15.0885 LearningRate 0.000523 Epoch: 2 Global Step: 43350 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:20:33,479-Speed 2495.87 samples/sec Loss 15.0972 LearningRate 0.000523 Epoch: 2 Global Step: 43360 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:20:41,681-Speed 2497.36 samples/sec Loss 15.0476 LearningRate 0.000523 Epoch: 2 Global Step: 43370 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:20:49,888-Speed 2495.80 samples/sec Loss 15.0348 LearningRate 0.000523 Epoch: 2 Global Step: 43380 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:20:58,039-Speed 2512.90 samples/sec Loss 15.1465 LearningRate 0.000523 Epoch: 2 Global Step: 43390 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:21:06,247-Speed 2495.33 samples/sec Loss 15.0415 LearningRate 0.000523 Epoch: 2 Global Step: 43400 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:21:14,449-Speed 2497.42 samples/sec Loss 15.0566 LearningRate 0.000523 Epoch: 2 Global Step: 43410 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:21:22,652-Speed 2497.02 samples/sec Loss 15.1198 LearningRate 0.000523 Epoch: 2 Global Step: 43420 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 
00:21:30,859-Speed 2495.85 samples/sec Loss 15.1746 LearningRate 0.000524 Epoch: 2 Global Step: 43430 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:21:39,060-Speed 2497.67 samples/sec Loss 15.1570 LearningRate 0.000524 Epoch: 2 Global Step: 43440 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:21:47,209-Speed 2513.59 samples/sec Loss 15.1496 LearningRate 0.000524 Epoch: 2 Global Step: 43450 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:21:55,410-Speed 2497.94 samples/sec Loss 15.0313 LearningRate 0.000524 Epoch: 2 Global Step: 43460 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:22:03,614-Speed 2496.54 samples/sec Loss 15.0115 LearningRate 0.000524 Epoch: 2 Global Step: 43470 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:22:11,822-Speed 2495.71 samples/sec Loss 15.1373 LearningRate 0.000524 Epoch: 2 Global Step: 43480 Fp16 Grad Scale: 65536 Required: 180 hours Training: 2022-07-06 00:22:20,038-Speed 2493.17 samples/sec Loss 15.0129 LearningRate 0.000524 Epoch: 2 Global Step: 43490 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:22:28,244-Speed 2496.63 samples/sec Loss 15.0232 LearningRate 0.000524 Epoch: 2 Global Step: 43500 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:22:36,412-Speed 2507.77 samples/sec Loss 15.0342 LearningRate 0.000524 Epoch: 2 Global Step: 43510 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:22:44,618-Speed 2495.99 samples/sec Loss 15.0868 LearningRate 0.000525 Epoch: 2 Global Step: 43520 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:22:52,829-Speed 2494.80 samples/sec Loss 15.1722 LearningRate 0.000525 Epoch: 2 Global Step: 43530 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:23:01,032-Speed 2496.81 samples/sec Loss 14.9984 LearningRate 0.000525 Epoch: 2 Global Step: 43540 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 
00:23:09,238-Speed 2496.39 samples/sec Loss 14.9860 LearningRate 0.000525 Epoch: 2 Global Step: 43550 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:23:17,438-Speed 2497.81 samples/sec Loss 15.1233 LearningRate 0.000525 Epoch: 2 Global Step: 43560 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:23:25,587-Speed 2513.57 samples/sec Loss 15.0909 LearningRate 0.000525 Epoch: 2 Global Step: 43570 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:23:33,795-Speed 2495.67 samples/sec Loss 15.0469 LearningRate 0.000525 Epoch: 2 Global Step: 43580 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:23:41,999-Speed 2496.83 samples/sec Loss 15.1323 LearningRate 0.000525 Epoch: 2 Global Step: 43590 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:23:50,200-Speed 2497.34 samples/sec Loss 15.0163 LearningRate 0.000526 Epoch: 2 Global Step: 43600 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:23:58,399-Speed 2498.28 samples/sec Loss 15.0473 LearningRate 0.000526 Epoch: 2 Global Step: 43610 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:24:06,599-Speed 2498.17 samples/sec Loss 15.0037 LearningRate 0.000526 Epoch: 2 Global Step: 43620 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:24:14,748-Speed 2513.63 samples/sec Loss 15.0893 LearningRate 0.000526 Epoch: 2 Global Step: 43630 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:24:22,951-Speed 2497.09 samples/sec Loss 14.9442 LearningRate 0.000526 Epoch: 2 Global Step: 43640 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:24:31,151-Speed 2498.08 samples/sec Loss 14.9817 LearningRate 0.000526 Epoch: 2 Global Step: 43650 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:24:39,361-Speed 2494.92 samples/sec Loss 14.9241 LearningRate 0.000526 Epoch: 2 Global Step: 43660 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 
00:24:47,581-Speed 2491.92 samples/sec Loss 14.8765 LearningRate 0.000526 Epoch: 2 Global Step: 43670 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:24:55,784-Speed 2497.06 samples/sec Loss 14.9309 LearningRate 0.000527 Epoch: 2 Global Step: 43680 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:25:03,933-Speed 2513.62 samples/sec Loss 14.9341 LearningRate 0.000527 Epoch: 2 Global Step: 43690 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:25:12,133-Speed 2497.90 samples/sec Loss 14.8191 LearningRate 0.000527 Epoch: 2 Global Step: 43700 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:25:20,334-Speed 2497.62 samples/sec Loss 14.7846 LearningRate 0.000527 Epoch: 2 Global Step: 43710 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:25:28,534-Speed 2498.08 samples/sec Loss 14.8589 LearningRate 0.000527 Epoch: 2 Global Step: 43720 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:25:36,737-Speed 2496.96 samples/sec Loss 14.8772 LearningRate 0.000527 Epoch: 2 Global Step: 43730 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:25:44,938-Speed 2497.71 samples/sec Loss 14.7452 LearningRate 0.000527 Epoch: 2 Global Step: 43740 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:25:53,086-Speed 2513.81 samples/sec Loss 14.7791 LearningRate 0.000527 Epoch: 2 Global Step: 43750 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:26:01,295-Speed 2495.27 samples/sec Loss 14.8413 LearningRate 0.000527 Epoch: 2 Global Step: 43760 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:26:09,496-Speed 2497.89 samples/sec Loss 14.8042 LearningRate 0.000528 Epoch: 2 Global Step: 43770 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:26:17,704-Speed 2495.52 samples/sec Loss 14.7654 LearningRate 0.000528 Epoch: 2 Global Step: 43780 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 
00:26:25,901-Speed 2498.72 samples/sec Loss 14.7233 LearningRate 0.000528 Epoch: 2 Global Step: 43790 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:26:34,108-Speed 2495.84 samples/sec Loss 14.8161 LearningRate 0.000528 Epoch: 2 Global Step: 43800 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:26:42,262-Speed 2511.91 samples/sec Loss 14.7433 LearningRate 0.000528 Epoch: 2 Global Step: 43810 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:26:50,463-Speed 2497.69 samples/sec Loss 14.7386 LearningRate 0.000528 Epoch: 2 Global Step: 43820 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:26:58,665-Speed 2497.28 samples/sec Loss 14.8451 LearningRate 0.000528 Epoch: 2 Global Step: 43830 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:27:06,867-Speed 2497.37 samples/sec Loss 14.6693 LearningRate 0.000528 Epoch: 2 Global Step: 43840 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:27:15,070-Speed 2497.12 samples/sec Loss 14.8504 LearningRate 0.000529 Epoch: 2 Global Step: 43850 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:27:23,279-Speed 2495.23 samples/sec Loss 14.6259 LearningRate 0.000529 Epoch: 2 Global Step: 43860 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:27:31,437-Speed 2510.73 samples/sec Loss 14.7628 LearningRate 0.000529 Epoch: 2 Global Step: 43870 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:27:39,638-Speed 2497.53 samples/sec Loss 14.7780 LearningRate 0.000529 Epoch: 2 Global Step: 43880 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:27:47,841-Speed 2497.12 samples/sec Loss 14.7603 LearningRate 0.000529 Epoch: 2 Global Step: 43890 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:27:56,047-Speed 2496.23 samples/sec Loss 15.0001 LearningRate 0.000529 Epoch: 2 Global Step: 43900 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 
00:28:04,248-Speed 2497.84 samples/sec Loss 14.9247 LearningRate 0.000529 Epoch: 2 Global Step: 43910 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:28:12,452-Speed 2496.58 samples/sec Loss 14.8515 LearningRate 0.000529 Epoch: 2 Global Step: 43920 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:28:20,599-Speed 2514.48 samples/sec Loss 15.0716 LearningRate 0.000530 Epoch: 2 Global Step: 43930 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:28:28,814-Speed 2493.70 samples/sec Loss 14.9447 LearningRate 0.000530 Epoch: 2 Global Step: 43940 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:28:37,019-Speed 2496.43 samples/sec Loss 14.7931 LearningRate 0.000530 Epoch: 2 Global Step: 43950 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:28:45,226-Speed 2495.79 samples/sec Loss 14.8477 LearningRate 0.000530 Epoch: 2 Global Step: 43960 Fp16 Grad Scale: 65536 Required: 179 hours Training: 2022-07-06 00:28:53,432-Speed 2496.39 samples/sec Loss 14.7509 LearningRate 0.000530 Epoch: 2 Global Step: 43970 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:29:01,649-Speed 2492.59 samples/sec Loss 14.8264 LearningRate 0.000530 Epoch: 2 Global Step: 43980 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:29:09,799-Speed 2513.08 samples/sec Loss 14.8167 LearningRate 0.000530 Epoch: 2 Global Step: 43990 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:29:18,007-Speed 2495.78 samples/sec Loss 14.7250 LearningRate 0.000530 Epoch: 2 Global Step: 44000 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:29:26,218-Speed 2494.59 samples/sec Loss 14.8200 LearningRate 0.000531 Epoch: 2 Global Step: 44010 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:29:34,423-Speed 2496.61 samples/sec Loss 14.8470 LearningRate 0.000531 Epoch: 2 Global Step: 44020 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 
00:29:42,627-Speed 2496.60 samples/sec Loss 14.8626 LearningRate 0.000531 Epoch: 2 Global Step: 44030 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:29:50,840-Speed 2493.88 samples/sec Loss 14.8209 LearningRate 0.000531 Epoch: 2 Global Step: 44040 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:29:58,989-Speed 2513.74 samples/sec Loss 14.7530 LearningRate 0.000531 Epoch: 2 Global Step: 44050 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:30:07,195-Speed 2496.12 samples/sec Loss 14.8649 LearningRate 0.000531 Epoch: 2 Global Step: 44060 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:30:15,397-Speed 2497.20 samples/sec Loss 14.7927 LearningRate 0.000531 Epoch: 2 Global Step: 44070 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:30:23,599-Speed 2497.71 samples/sec Loss 14.7479 LearningRate 0.000531 Epoch: 2 Global Step: 44080 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:30:31,800-Speed 2497.34 samples/sec Loss 15.0306 LearningRate 0.000531 Epoch: 2 Global Step: 44090 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:30:40,007-Speed 2495.94 samples/sec Loss 14.7276 LearningRate 0.000532 Epoch: 2 Global Step: 44100 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:30:48,158-Speed 2513.02 samples/sec Loss 14.7487 LearningRate 0.000532 Epoch: 2 Global Step: 44110 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:30:56,367-Speed 2494.98 samples/sec Loss 14.8527 LearningRate 0.000532 Epoch: 2 Global Step: 44120 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:31:04,574-Speed 2495.87 samples/sec Loss 14.6846 LearningRate 0.000532 Epoch: 2 Global Step: 44130 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:31:12,782-Speed 2495.68 samples/sec Loss 14.8400 LearningRate 0.000532 Epoch: 2 Global Step: 44140 Fp16 Grad Scale: 131072 Required: 179 hours Training: 
2022-07-06 00:31:20,985-Speed 2497.28 samples/sec Loss 14.7238 LearningRate 0.000532 Epoch: 2 Global Step: 44150 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:31:29,191-Speed 2496.05 samples/sec Loss 14.7479 LearningRate 0.000532 Epoch: 2 Global Step: 44160 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:31:37,337-Speed 2514.64 samples/sec Loss 14.6424 LearningRate 0.000532 Epoch: 2 Global Step: 44170 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:31:45,542-Speed 2496.35 samples/sec Loss 14.7088 LearningRate 0.000533 Epoch: 2 Global Step: 44180 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:31:53,753-Speed 2494.64 samples/sec Loss 14.7432 LearningRate 0.000533 Epoch: 2 Global Step: 44190 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:32:01,960-Speed 2495.93 samples/sec Loss 14.6276 LearningRate 0.000533 Epoch: 2 Global Step: 44200 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:32:10,168-Speed 2495.57 samples/sec Loss 14.6499 LearningRate 0.000533 Epoch: 2 Global Step: 44210 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:32:18,372-Speed 2496.84 samples/sec Loss 14.5927 LearningRate 0.000533 Epoch: 2 Global Step: 44220 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:32:26,521-Speed 2513.43 samples/sec Loss 14.5658 LearningRate 0.000533 Epoch: 2 Global Step: 44230 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:32:34,723-Speed 2497.61 samples/sec Loss 14.5499 LearningRate 0.000533 Epoch: 2 Global Step: 44240 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:32:42,923-Speed 2497.71 samples/sec Loss 14.6756 LearningRate 0.000533 Epoch: 2 Global Step: 44250 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:32:51,130-Speed 2496.23 samples/sec Loss 14.5887 LearningRate 0.000534 Epoch: 2 Global Step: 44260 Fp16 Grad Scale: 131072 Required: 179 hours 
Training: 2022-07-06 00:32:59,340-Speed 2494.95 samples/sec Loss 14.6231 LearningRate 0.000534 Epoch: 2 Global Step: 44270 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:33:07,544-Speed 2496.90 samples/sec Loss 14.5478 LearningRate 0.000534 Epoch: 2 Global Step: 44280 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:33:15,696-Speed 2512.48 samples/sec Loss 14.6102 LearningRate 0.000534 Epoch: 2 Global Step: 44290 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:33:23,902-Speed 2496.09 samples/sec Loss 14.6095 LearningRate 0.000534 Epoch: 2 Global Step: 44300 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:33:32,105-Speed 2497.08 samples/sec Loss 14.5452 LearningRate 0.000534 Epoch: 2 Global Step: 44310 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:33:40,308-Speed 2497.18 samples/sec Loss 14.5232 LearningRate 0.000534 Epoch: 2 Global Step: 44320 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:33:48,510-Speed 2497.34 samples/sec Loss 14.4506 LearningRate 0.000534 Epoch: 2 Global Step: 44330 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:33:56,713-Speed 2496.99 samples/sec Loss 14.4793 LearningRate 0.000534 Epoch: 2 Global Step: 44340 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:34:04,861-Speed 2513.85 samples/sec Loss 14.5870 LearningRate 0.000535 Epoch: 2 Global Step: 44350 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:34:13,066-Speed 2496.41 samples/sec Loss 14.6445 LearningRate 0.000535 Epoch: 2 Global Step: 44360 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:34:21,278-Speed 2494.32 samples/sec Loss 14.5241 LearningRate 0.000535 Epoch: 2 Global Step: 44370 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:34:29,480-Speed 2497.39 samples/sec Loss 14.5470 LearningRate 0.000535 Epoch: 2 Global Step: 44380 Fp16 Grad Scale: 131072 Required: 179 
hours Training: 2022-07-06 00:34:37,682-Speed 2497.42 samples/sec Loss 14.6857 LearningRate 0.000535 Epoch: 2 Global Step: 44390 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:34:45,891-Speed 2495.07 samples/sec Loss 14.5733 LearningRate 0.000535 Epoch: 2 Global Step: 44400 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:34:54,040-Speed 2513.86 samples/sec Loss 14.6827 LearningRate 0.000535 Epoch: 2 Global Step: 44410 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:35:02,244-Speed 2496.82 samples/sec Loss 14.6329 LearningRate 0.000535 Epoch: 2 Global Step: 44420 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:35:10,452-Speed 2495.36 samples/sec Loss 14.5622 LearningRate 0.000536 Epoch: 2 Global Step: 44430 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:35:18,667-Speed 2493.49 samples/sec Loss 14.6017 LearningRate 0.000536 Epoch: 2 Global Step: 44440 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:35:26,869-Speed 2497.18 samples/sec Loss 14.6281 LearningRate 0.000536 Epoch: 2 Global Step: 44450 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:35:35,075-Speed 2496.21 samples/sec Loss 14.5122 LearningRate 0.000536 Epoch: 2 Global Step: 44460 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:35:43,226-Speed 2513.03 samples/sec Loss 14.6346 LearningRate 0.000536 Epoch: 2 Global Step: 44470 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:35:51,430-Speed 2496.85 samples/sec Loss 14.5375 LearningRate 0.000536 Epoch: 2 Global Step: 44480 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:35:59,632-Speed 2497.29 samples/sec Loss 14.4930 LearningRate 0.000536 Epoch: 2 Global Step: 44490 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:36:07,834-Speed 2497.61 samples/sec Loss 14.4881 LearningRate 0.000536 Epoch: 2 Global Step: 44500 Fp16 Grad Scale: 131072 Required: 
179 hours Training: 2022-07-06 00:36:16,048-Speed 2493.69 samples/sec Loss 14.4863 LearningRate 0.000537 Epoch: 2 Global Step: 44510 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:36:24,250-Speed 2497.42 samples/sec Loss 14.5014 LearningRate 0.000537 Epoch: 2 Global Step: 44520 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:36:32,398-Speed 2513.65 samples/sec Loss 14.4411 LearningRate 0.000537 Epoch: 2 Global Step: 44530 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:36:40,600-Speed 2497.71 samples/sec Loss 14.4234 LearningRate 0.000537 Epoch: 2 Global Step: 44540 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:36:48,803-Speed 2497.01 samples/sec Loss 14.4169 LearningRate 0.000537 Epoch: 2 Global Step: 44550 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:36:57,013-Speed 2494.92 samples/sec Loss 14.4698 LearningRate 0.000537 Epoch: 2 Global Step: 44560 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:37:05,215-Speed 2497.35 samples/sec Loss 14.5086 LearningRate 0.000537 Epoch: 2 Global Step: 44570 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:37:13,441-Speed 2490.01 samples/sec Loss 14.6266 LearningRate 0.000537 Epoch: 2 Global Step: 44580 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:37:21,589-Speed 2514.06 samples/sec Loss 14.4838 LearningRate 0.000538 Epoch: 2 Global Step: 44590 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:37:29,802-Speed 2494.10 samples/sec Loss 14.5180 LearningRate 0.000538 Epoch: 2 Global Step: 44600 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:37:38,011-Speed 2495.28 samples/sec Loss 14.4819 LearningRate 0.000538 Epoch: 2 Global Step: 44610 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:37:46,215-Speed 2496.61 samples/sec Loss 14.4420 LearningRate 0.000538 Epoch: 2 Global Step: 44620 Fp16 Grad Scale: 131072 
Required: 179 hours Training: 2022-07-06 00:37:54,437-Speed 2491.34 samples/sec Loss 14.5168 LearningRate 0.000538 Epoch: 2 Global Step: 44630 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:38:02,643-Speed 2496.37 samples/sec Loss 14.3730 LearningRate 0.000538 Epoch: 2 Global Step: 44640 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:38:10,803-Speed 2510.56 samples/sec Loss 14.4563 LearningRate 0.000538 Epoch: 2 Global Step: 44650 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:38:19,008-Speed 2496.05 samples/sec Loss 14.3718 LearningRate 0.000538 Epoch: 2 Global Step: 44660 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:38:27,212-Speed 2496.99 samples/sec Loss 14.4064 LearningRate 0.000538 Epoch: 2 Global Step: 44670 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:38:35,413-Speed 2497.59 samples/sec Loss 14.2886 LearningRate 0.000539 Epoch: 2 Global Step: 44680 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:38:43,624-Speed 2494.59 samples/sec Loss 14.4012 LearningRate 0.000539 Epoch: 2 Global Step: 44690 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:38:51,827-Speed 2497.15 samples/sec Loss 14.4206 LearningRate 0.000539 Epoch: 2 Global Step: 44700 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:38:59,979-Speed 2512.78 samples/sec Loss 14.4346 LearningRate 0.000539 Epoch: 2 Global Step: 44710 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:39:08,197-Speed 2492.21 samples/sec Loss 14.4494 LearningRate 0.000539 Epoch: 2 Global Step: 44720 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:39:16,400-Speed 2497.19 samples/sec Loss 14.4164 LearningRate 0.000539 Epoch: 2 Global Step: 44730 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:39:24,620-Speed 2491.67 samples/sec Loss 14.3249 LearningRate 0.000539 Epoch: 2 Global Step: 44740 Fp16 Grad Scale: 
131072 Required: 179 hours Training: 2022-07-06 00:39:32,829-Speed 2495.23 samples/sec Loss 14.3982 LearningRate 0.000539 Epoch: 2 Global Step: 44750 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:39:41,031-Speed 2497.74 samples/sec Loss 14.4589 LearningRate 0.000540 Epoch: 2 Global Step: 44760 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:39:49,180-Speed 2513.77 samples/sec Loss 14.3226 LearningRate 0.000540 Epoch: 2 Global Step: 44770 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:39:57,381-Speed 2497.58 samples/sec Loss 14.3583 LearningRate 0.000540 Epoch: 2 Global Step: 44780 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:40:05,580-Speed 2498.06 samples/sec Loss 14.3638 LearningRate 0.000540 Epoch: 2 Global Step: 44790 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:40:13,782-Speed 2497.43 samples/sec Loss 14.3379 LearningRate 0.000540 Epoch: 2 Global Step: 44800 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:40:21,984-Speed 2497.39 samples/sec Loss 14.3523 LearningRate 0.000540 Epoch: 2 Global Step: 44810 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:40:30,200-Speed 2493.09 samples/sec Loss 14.4396 LearningRate 0.000540 Epoch: 2 Global Step: 44820 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:40:38,351-Speed 2512.82 samples/sec Loss 14.3716 LearningRate 0.000540 Epoch: 2 Global Step: 44830 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:40:46,555-Speed 2496.96 samples/sec Loss 14.3726 LearningRate 0.000541 Epoch: 2 Global Step: 44840 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:40:54,759-Speed 2496.52 samples/sec Loss 14.3588 LearningRate 0.000541 Epoch: 2 Global Step: 44850 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:41:02,964-Speed 2496.35 samples/sec Loss 14.2904 LearningRate 0.000541 Epoch: 2 Global Step: 44860 Fp16 Grad 
Scale: 131072 Required: 179 hours Training: 2022-07-06 00:41:11,162-Speed 2498.97 samples/sec Loss 14.3113 LearningRate 0.000541 Epoch: 2 Global Step: 44870 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:41:19,365-Speed 2497.13 samples/sec Loss 14.2825 LearningRate 0.000541 Epoch: 2 Global Step: 44880 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:41:27,515-Speed 2513.17 samples/sec Loss 14.3571 LearningRate 0.000541 Epoch: 2 Global Step: 44890 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:41:35,719-Speed 2497.04 samples/sec Loss 14.3101 LearningRate 0.000541 Epoch: 2 Global Step: 44900 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:41:43,922-Speed 2496.93 samples/sec Loss 14.2756 LearningRate 0.000541 Epoch: 2 Global Step: 44910 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:41:52,133-Speed 2494.87 samples/sec Loss 14.2578 LearningRate 0.000541 Epoch: 2 Global Step: 44920 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:42:00,349-Speed 2493.08 samples/sec Loss 14.1283 LearningRate 0.000542 Epoch: 2 Global Step: 44930 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:42:08,551-Speed 2497.39 samples/sec Loss 14.3351 LearningRate 0.000542 Epoch: 2 Global Step: 44940 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:42:16,704-Speed 2512.18 samples/sec Loss 14.2806 LearningRate 0.000542 Epoch: 2 Global Step: 44950 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:42:24,911-Speed 2495.88 samples/sec Loss 14.2190 LearningRate 0.000542 Epoch: 2 Global Step: 44960 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:42:33,113-Speed 2497.38 samples/sec Loss 14.2705 LearningRate 0.000542 Epoch: 2 Global Step: 44970 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:42:41,314-Speed 2497.91 samples/sec Loss 14.1989 LearningRate 0.000542 Epoch: 2 Global Step: 44980 Fp16 
Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:42:49,529-Speed 2493.30 samples/sec Loss 14.1762 LearningRate 0.000542 Epoch: 2 Global Step: 44990 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:42:57,731-Speed 2497.33 samples/sec Loss 14.1713 LearningRate 0.000542 Epoch: 2 Global Step: 45000 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:43:05,881-Speed 2513.44 samples/sec Loss 14.1086 LearningRate 0.000543 Epoch: 2 Global Step: 45010 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:43:14,082-Speed 2497.51 samples/sec Loss 14.1902 LearningRate 0.000543 Epoch: 2 Global Step: 45020 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:43:22,285-Speed 2497.13 samples/sec Loss 14.4424 LearningRate 0.000543 Epoch: 2 Global Step: 45030 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:43:30,486-Speed 2497.53 samples/sec Loss 14.2748 LearningRate 0.000543 Epoch: 2 Global Step: 45040 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:43:38,692-Speed 2496.28 samples/sec Loss 14.2700 LearningRate 0.000543 Epoch: 2 Global Step: 45050 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:43:46,893-Speed 2497.79 samples/sec Loss 14.3828 LearningRate 0.000543 Epoch: 2 Global Step: 45060 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:43:55,040-Speed 2514.10 samples/sec Loss 14.3101 LearningRate 0.000543 Epoch: 2 Global Step: 45070 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:44:03,246-Speed 2496.31 samples/sec Loss 14.4676 LearningRate 0.000543 Epoch: 2 Global Step: 45080 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:44:11,448-Speed 2497.34 samples/sec Loss 14.2469 LearningRate 0.000544 Epoch: 2 Global Step: 45090 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:44:19,662-Speed 2493.83 samples/sec Loss 14.2553 LearningRate 0.000544 Epoch: 2 Global Step: 45100 
Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:44:27,865-Speed 2496.87 samples/sec Loss 14.2202 LearningRate 0.000544 Epoch: 2 Global Step: 45110 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:44:36,068-Speed 2497.23 samples/sec Loss 14.2725 LearningRate 0.000544 Epoch: 2 Global Step: 45120 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:44:44,218-Speed 2513.37 samples/sec Loss 14.4406 LearningRate 0.000544 Epoch: 2 Global Step: 45130 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:44:52,419-Speed 2497.76 samples/sec Loss 14.3283 LearningRate 0.000544 Epoch: 2 Global Step: 45140 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:45:00,617-Speed 2498.54 samples/sec Loss 14.2011 LearningRate 0.000544 Epoch: 2 Global Step: 45150 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:45:08,823-Speed 2496.22 samples/sec Loss 14.2694 LearningRate 0.000544 Epoch: 2 Global Step: 45160 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:45:17,024-Speed 2497.67 samples/sec Loss 14.2253 LearningRate 0.000544 Epoch: 2 Global Step: 45170 Fp16 Grad Scale: 262144 Required: 179 hours Training: 2022-07-06 00:45:25,181-Speed 2511.51 samples/sec Loss 14.1298 LearningRate 0.000545 Epoch: 2 Global Step: 45180 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:45:33,330-Speed 2513.45 samples/sec Loss 14.1247 LearningRate 0.000545 Epoch: 2 Global Step: 45190 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:45:41,533-Speed 2497.07 samples/sec Loss 14.2530 LearningRate 0.000545 Epoch: 2 Global Step: 45200 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:45:49,736-Speed 2497.32 samples/sec Loss 14.2141 LearningRate 0.000545 Epoch: 2 Global Step: 45210 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:45:57,943-Speed 2495.89 samples/sec Loss 14.1527 LearningRate 0.000545 Epoch: 2 Global Step: 
45220 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:46:06,147-Speed 2496.65 samples/sec Loss 14.1590 LearningRate 0.000545 Epoch: 2 Global Step: 45230 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:46:14,348-Speed 2497.82 samples/sec Loss 14.1789 LearningRate 0.000545 Epoch: 2 Global Step: 45240 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:46:22,500-Speed 2512.69 samples/sec Loss 14.2006 LearningRate 0.000545 Epoch: 2 Global Step: 45250 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:46:30,712-Speed 2494.11 samples/sec Loss 14.2661 LearningRate 0.000546 Epoch: 2 Global Step: 45260 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:46:38,915-Speed 2497.11 samples/sec Loss 14.1649 LearningRate 0.000546 Epoch: 2 Global Step: 45270 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:46:47,115-Speed 2497.97 samples/sec Loss 14.1803 LearningRate 0.000546 Epoch: 2 Global Step: 45280 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:46:55,320-Speed 2496.51 samples/sec Loss 14.2616 LearningRate 0.000546 Epoch: 2 Global Step: 45290 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:47:03,523-Speed 2497.15 samples/sec Loss 14.2018 LearningRate 0.000546 Epoch: 2 Global Step: 45300 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:47:11,672-Speed 2513.52 samples/sec Loss 14.1385 LearningRate 0.000546 Epoch: 2 Global Step: 45310 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:47:19,875-Speed 2497.55 samples/sec Loss 14.2676 LearningRate 0.000546 Epoch: 2 Global Step: 45320 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:47:28,077-Speed 2497.57 samples/sec Loss 14.1198 LearningRate 0.000546 Epoch: 2 Global Step: 45330 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:47:36,294-Speed 2492.92 samples/sec Loss 14.3183 LearningRate 0.000547 Epoch: 2 Global 
Step: 45340 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:47:44,494-Speed 2498.00 samples/sec Loss 14.2288 LearningRate 0.000547 Epoch: 2 Global Step: 45350 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:47:52,697-Speed 2496.82 samples/sec Loss 14.1921 LearningRate 0.000547 Epoch: 2 Global Step: 45360 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:48:00,844-Speed 2514.23 samples/sec Loss 14.1710 LearningRate 0.000547 Epoch: 2 Global Step: 45370 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:48:09,049-Speed 2496.64 samples/sec Loss 14.1095 LearningRate 0.000547 Epoch: 2 Global Step: 45380 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:48:17,251-Speed 2497.22 samples/sec Loss 14.1640 LearningRate 0.000547 Epoch: 2 Global Step: 45390 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:48:25,455-Speed 2497.31 samples/sec Loss 14.0770 LearningRate 0.000547 Epoch: 2 Global Step: 45400 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:48:33,659-Speed 2496.94 samples/sec Loss 14.0790 LearningRate 0.000547 Epoch: 2 Global Step: 45410 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:48:41,859-Speed 2497.76 samples/sec Loss 14.1029 LearningRate 0.000548 Epoch: 2 Global Step: 45420 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:48:50,008-Speed 2513.38 samples/sec Loss 14.0284 LearningRate 0.000548 Epoch: 2 Global Step: 45430 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:48:58,209-Speed 2498.54 samples/sec Loss 14.0691 LearningRate 0.000548 Epoch: 2 Global Step: 45440 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:49:06,409-Speed 2498.12 samples/sec Loss 14.0947 LearningRate 0.000548 Epoch: 2 Global Step: 45450 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:49:14,619-Speed 2494.86 samples/sec Loss 13.9775 LearningRate 0.000548 Epoch: 2 
Global Step: 45460 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:49:22,821-Speed 2497.38 samples/sec Loss 14.0444 LearningRate 0.000548 Epoch: 2 Global Step: 45470 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:49:31,025-Speed 2497.07 samples/sec Loss 14.1602 LearningRate 0.000548 Epoch: 2 Global Step: 45480 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:49:39,174-Speed 2513.71 samples/sec Loss 14.0362 LearningRate 0.000548 Epoch: 2 Global Step: 45490 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:49:47,372-Speed 2498.49 samples/sec Loss 14.1071 LearningRate 0.000548 Epoch: 2 Global Step: 45500 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:49:55,577-Speed 2496.30 samples/sec Loss 14.1177 LearningRate 0.000549 Epoch: 2 Global Step: 45510 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:50:03,780-Speed 2497.27 samples/sec Loss 14.0237 LearningRate 0.000549 Epoch: 2 Global Step: 45520 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:50:11,980-Speed 2498.19 samples/sec Loss 14.0013 LearningRate 0.000549 Epoch: 2 Global Step: 45530 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:50:20,181-Speed 2497.37 samples/sec Loss 14.1457 LearningRate 0.000549 Epoch: 2 Global Step: 45540 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:50:28,330-Speed 2513.56 samples/sec Loss 14.0752 LearningRate 0.000549 Epoch: 2 Global Step: 45550 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:50:36,534-Speed 2496.57 samples/sec Loss 14.1989 LearningRate 0.000549 Epoch: 2 Global Step: 45560 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:50:44,744-Speed 2495.19 samples/sec Loss 14.0835 LearningRate 0.000549 Epoch: 2 Global Step: 45570 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:50:52,948-Speed 2496.86 samples/sec Loss 14.0197 LearningRate 0.000549 
Epoch: 2 Global Step: 45580 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:51:01,147-Speed 2498.18 samples/sec Loss 14.0801 LearningRate 0.000550 Epoch: 2 Global Step: 45590 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:51:09,346-Speed 2498.00 samples/sec Loss 13.9621 LearningRate 0.000550 Epoch: 2 Global Step: 45600 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:51:17,496-Speed 2513.51 samples/sec Loss 13.7829 LearningRate 0.000550 Epoch: 2 Global Step: 45610 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:51:25,696-Speed 2498.10 samples/sec Loss 13.9839 LearningRate 0.000550 Epoch: 2 Global Step: 45620 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:51:33,897-Speed 2497.49 samples/sec Loss 14.0938 LearningRate 0.000550 Epoch: 2 Global Step: 45630 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:51:42,098-Speed 2497.53 samples/sec Loss 13.9673 LearningRate 0.000550 Epoch: 2 Global Step: 45640 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:51:50,299-Speed 2497.72 samples/sec Loss 13.9858 LearningRate 0.000550 Epoch: 2 Global Step: 45650 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:51:58,501-Speed 2497.27 samples/sec Loss 14.0232 LearningRate 0.000550 Epoch: 2 Global Step: 45660 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:52:06,650-Speed 2513.39 samples/sec Loss 13.9280 LearningRate 0.000551 Epoch: 2 Global Step: 45670 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:52:14,855-Speed 2496.46 samples/sec Loss 13.8876 LearningRate 0.000551 Epoch: 2 Global Step: 45680 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:52:23,061-Speed 2496.56 samples/sec Loss 13.9526 LearningRate 0.000551 Epoch: 2 Global Step: 45690 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:52:31,266-Speed 2496.02 samples/sec Loss 13.9180 LearningRate 
0.000551 Epoch: 2 Global Step: 45700 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:52:39,468-Speed 2497.53 samples/sec Loss 13.9494 LearningRate 0.000551 Epoch: 2 Global Step: 45710 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:52:47,668-Speed 2498.07 samples/sec Loss 13.8759 LearningRate 0.000551 Epoch: 2 Global Step: 45720 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:52:55,818-Speed 2513.37 samples/sec Loss 13.7057 LearningRate 0.000551 Epoch: 2 Global Step: 45730 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:53:04,019-Speed 2497.72 samples/sec Loss 13.9440 LearningRate 0.000551 Epoch: 2 Global Step: 45740 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:53:12,231-Speed 2494.44 samples/sec Loss 13.8385 LearningRate 0.000551 Epoch: 2 Global Step: 45750 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:53:20,431-Speed 2498.10 samples/sec Loss 13.8213 LearningRate 0.000552 Epoch: 2 Global Step: 45760 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:53:28,632-Speed 2497.56 samples/sec Loss 13.8780 LearningRate 0.000552 Epoch: 2 Global Step: 45770 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:53:36,835-Speed 2496.99 samples/sec Loss 13.8644 LearningRate 0.000552 Epoch: 2 Global Step: 45780 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:53:44,984-Speed 2513.58 samples/sec Loss 13.8213 LearningRate 0.000552 Epoch: 2 Global Step: 45790 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:53:53,199-Speed 2493.59 samples/sec Loss 13.8271 LearningRate 0.000552 Epoch: 2 Global Step: 45800 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:54:01,401-Speed 2497.26 samples/sec Loss 13.8138 LearningRate 0.000552 Epoch: 2 Global Step: 45810 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:54:09,607-Speed 2496.10 samples/sec Loss 13.9135 
LearningRate 0.000552 Epoch: 2 Global Step: 45820 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:54:17,811-Speed 2496.83 samples/sec Loss 13.9376 LearningRate 0.000552 Epoch: 2 Global Step: 45830 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:54:26,024-Speed 2494.01 samples/sec Loss 13.9337 LearningRate 0.000553 Epoch: 2 Global Step: 45840 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:54:34,183-Speed 2510.56 samples/sec Loss 13.8728 LearningRate 0.000553 Epoch: 2 Global Step: 45850 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:54:42,383-Speed 2497.76 samples/sec Loss 13.8284 LearningRate 0.000553 Epoch: 2 Global Step: 45860 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:54:50,584-Speed 2497.54 samples/sec Loss 13.8640 LearningRate 0.000553 Epoch: 2 Global Step: 45870 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:54:58,789-Speed 2496.25 samples/sec Loss 13.9011 LearningRate 0.000553 Epoch: 2 Global Step: 45880 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:55:06,991-Speed 2497.58 samples/sec Loss 13.7768 LearningRate 0.000553 Epoch: 2 Global Step: 45890 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:55:15,206-Speed 2493.19 samples/sec Loss 13.8350 LearningRate 0.000553 Epoch: 2 Global Step: 45900 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:55:23,355-Speed 2513.48 samples/sec Loss 13.7779 LearningRate 0.000553 Epoch: 2 Global Step: 45910 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:55:31,555-Speed 2497.83 samples/sec Loss 13.7824 LearningRate 0.000554 Epoch: 2 Global Step: 45920 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:55:39,760-Speed 2496.52 samples/sec Loss 13.7968 LearningRate 0.000554 Epoch: 2 Global Step: 45930 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:55:47,962-Speed 2497.11 samples/sec Loss 
13.8094 LearningRate 0.000554 Epoch: 2 Global Step: 45940 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:55:56,168-Speed 2496.75 samples/sec Loss 13.7347 LearningRate 0.000554 Epoch: 2 Global Step: 45950 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:56:04,372-Speed 2496.69 samples/sec Loss 13.7511 LearningRate 0.000554 Epoch: 2 Global Step: 45960 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:56:12,521-Speed 2513.45 samples/sec Loss 13.8463 LearningRate 0.000554 Epoch: 2 Global Step: 45970 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:56:20,723-Speed 2497.09 samples/sec Loss 13.8298 LearningRate 0.000554 Epoch: 2 Global Step: 45980 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:56:28,923-Speed 2497.83 samples/sec Loss 13.7902 LearningRate 0.000554 Epoch: 2 Global Step: 45990 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:56:37,123-Speed 2498.12 samples/sec Loss 13.8072 LearningRate 0.000554 Epoch: 2 Global Step: 46000 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:56:45,326-Speed 2497.03 samples/sec Loss 13.8633 LearningRate 0.000555 Epoch: 2 Global Step: 46010 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:56:53,525-Speed 2497.98 samples/sec Loss 13.7191 LearningRate 0.000555 Epoch: 2 Global Step: 46020 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:57:01,676-Speed 2513.18 samples/sec Loss 14.0510 LearningRate 0.000555 Epoch: 2 Global Step: 46030 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:57:09,877-Speed 2497.74 samples/sec Loss 14.0123 LearningRate 0.000555 Epoch: 2 Global Step: 46040 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:57:18,079-Speed 2497.36 samples/sec Loss 13.9103 LearningRate 0.000555 Epoch: 2 Global Step: 46050 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:57:26,282-Speed 2496.92 samples/sec 
Loss 13.9762 LearningRate 0.000555 Epoch: 2 Global Step: 46060 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:57:34,480-Speed 2498.32 samples/sec Loss 13.9952 LearningRate 0.000555 Epoch: 2 Global Step: 46070 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:57:42,682-Speed 2497.30 samples/sec Loss 13.7873 LearningRate 0.000555 Epoch: 2 Global Step: 46080 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:57:50,833-Speed 2512.92 samples/sec Loss 13.8388 LearningRate 0.000556 Epoch: 2 Global Step: 46090 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:57:59,038-Speed 2496.80 samples/sec Loss 13.8980 LearningRate 0.000556 Epoch: 2 Global Step: 46100 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:58:07,238-Speed 2497.69 samples/sec Loss 13.8672 LearningRate 0.000556 Epoch: 2 Global Step: 46110 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:58:15,440-Speed 2497.59 samples/sec Loss 13.8171 LearningRate 0.000556 Epoch: 2 Global Step: 46120 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:58:23,649-Speed 2495.44 samples/sec Loss 13.8776 LearningRate 0.000556 Epoch: 2 Global Step: 46130 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:58:31,852-Speed 2497.11 samples/sec Loss 13.9381 LearningRate 0.000556 Epoch: 2 Global Step: 46140 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:58:39,999-Speed 2514.02 samples/sec Loss 13.7968 LearningRate 0.000556 Epoch: 2 Global Step: 46150 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:58:48,226-Speed 2489.76 samples/sec Loss 13.7484 LearningRate 0.000556 Epoch: 2 Global Step: 46160 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:58:56,432-Speed 2496.15 samples/sec Loss 13.8388 LearningRate 0.000557 Epoch: 2 Global Step: 46170 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:59:04,646-Speed 2493.75 
samples/sec Loss 13.9049 LearningRate 0.000557 Epoch: 2 Global Step: 46180 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:59:12,860-Speed 2494.51 samples/sec Loss 13.8688 LearningRate 0.000557 Epoch: 2 Global Step: 46190 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:59:21,062-Speed 2497.29 samples/sec Loss 13.9108 LearningRate 0.000557 Epoch: 2 Global Step: 46200 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:59:29,212-Speed 2513.39 samples/sec Loss 13.8206 LearningRate 0.000557 Epoch: 2 Global Step: 46210 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:59:37,411-Speed 2498.04 samples/sec Loss 13.7332 LearningRate 0.000557 Epoch: 2 Global Step: 46220 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:59:45,617-Speed 2496.31 samples/sec Loss 13.7363 LearningRate 0.000557 Epoch: 2 Global Step: 46230 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 00:59:53,818-Speed 2497.58 samples/sec Loss 13.8993 LearningRate 0.000557 Epoch: 2 Global Step: 46240 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:00:02,024-Speed 2496.10 samples/sec Loss 13.8387 LearningRate 0.000558 Epoch: 2 Global Step: 46250 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:00:10,227-Speed 2496.93 samples/sec Loss 13.6052 LearningRate 0.000558 Epoch: 2 Global Step: 46260 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:00:18,377-Speed 2513.27 samples/sec Loss 13.7077 LearningRate 0.000558 Epoch: 2 Global Step: 46270 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:00:26,579-Speed 2497.49 samples/sec Loss 13.6432 LearningRate 0.000558 Epoch: 2 Global Step: 46280 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:00:34,782-Speed 2497.36 samples/sec Loss 13.6991 LearningRate 0.000558 Epoch: 2 Global Step: 46290 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:00:42,986-Speed 
2496.77 samples/sec Loss 13.7011 LearningRate 0.000558 Epoch: 2 Global Step: 46300 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:00:51,187-Speed 2497.47 samples/sec Loss 13.6088 LearningRate 0.000558 Epoch: 2 Global Step: 46310 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:00:59,392-Speed 2496.76 samples/sec Loss 13.5889 LearningRate 0.000558 Epoch: 2 Global Step: 46320 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:01:07,538-Speed 2514.55 samples/sec Loss 13.7599 LearningRate 0.000558 Epoch: 2 Global Step: 46330 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:01:15,736-Speed 2498.29 samples/sec Loss 13.6225 LearningRate 0.000559 Epoch: 2 Global Step: 46340 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:01:23,937-Speed 2497.75 samples/sec Loss 13.7545 LearningRate 0.000559 Epoch: 2 Global Step: 46350 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:01:32,140-Speed 2497.19 samples/sec Loss 13.7770 LearningRate 0.000559 Epoch: 2 Global Step: 46360 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:01:40,344-Speed 2496.91 samples/sec Loss 13.6460 LearningRate 0.000559 Epoch: 2 Global Step: 46370 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:01:48,545-Speed 2497.46 samples/sec Loss 13.7642 LearningRate 0.000559 Epoch: 2 Global Step: 46380 Fp16 Grad Scale: 262144 Required: 179 hours Training: 2022-07-06 01:01:56,690-Speed 2514.80 samples/sec Loss 13.8894 LearningRate 0.000559 Epoch: 2 Global Step: 46390 Fp16 Grad Scale: 262144 Required: 179 hours Training: 2022-07-06 01:02:04,848-Speed 2510.84 samples/sec Loss 13.8271 LearningRate 0.000559 Epoch: 2 Global Step: 46400 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:02:13,049-Speed 2497.74 samples/sec Loss 13.7024 LearningRate 0.000559 Epoch: 2 Global Step: 46410 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 
01:02:21,249-Speed 2498.06 samples/sec Loss 13.7333 LearningRate 0.000560 Epoch: 2 Global Step: 46420 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:02:29,450-Speed 2497.47 samples/sec Loss 13.6359 LearningRate 0.000560 Epoch: 2 Global Step: 46430 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:02:37,651-Speed 2497.90 samples/sec Loss 13.6685 LearningRate 0.000560 Epoch: 2 Global Step: 46440 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:02:45,800-Speed 2513.53 samples/sec Loss 13.7622 LearningRate 0.000560 Epoch: 2 Global Step: 46450 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:02:54,007-Speed 2495.69 samples/sec Loss 13.7075 LearningRate 0.000560 Epoch: 2 Global Step: 46460 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:03:02,208-Speed 2497.94 samples/sec Loss 13.7936 LearningRate 0.000560 Epoch: 2 Global Step: 46470 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:03:10,420-Speed 2494.52 samples/sec Loss 13.7199 LearningRate 0.000560 Epoch: 2 Global Step: 46480 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:03:18,620-Speed 2497.81 samples/sec Loss 13.7472 LearningRate 0.000560 Epoch: 2 Global Step: 46490 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:03:26,825-Speed 2496.52 samples/sec Loss 13.6346 LearningRate 0.000561 Epoch: 2 Global Step: 46500 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:03:34,986-Speed 2509.99 samples/sec Loss 13.7514 LearningRate 0.000561 Epoch: 2 Global Step: 46510 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:03:43,202-Speed 2492.94 samples/sec Loss 13.6205 LearningRate 0.000561 Epoch: 2 Global Step: 46520 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:03:51,405-Speed 2497.06 samples/sec Loss 13.5748 LearningRate 0.000561 Epoch: 2 Global Step: 46530 Fp16 Grad Scale: 131072 Required: 179 hours Training: 
2022-07-06 01:03:59,607-Speed 2497.25 samples/sec Loss 13.5777 LearningRate 0.000561 Epoch: 2 Global Step: 46540 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:04:07,809-Speed 2497.25 samples/sec Loss 13.6462 LearningRate 0.000561 Epoch: 2 Global Step: 46550 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:04:16,015-Speed 2496.36 samples/sec Loss 13.5834 LearningRate 0.000561 Epoch: 2 Global Step: 46560 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:04:24,161-Speed 2514.50 samples/sec Loss 13.4838 LearningRate 0.000561 Epoch: 2 Global Step: 46570 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:04:32,358-Speed 2498.83 samples/sec Loss 13.6225 LearningRate 0.000561 Epoch: 2 Global Step: 46580 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:04:40,571-Speed 2493.90 samples/sec Loss 13.6582 LearningRate 0.000562 Epoch: 2 Global Step: 46590 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:04:48,772-Speed 2497.59 samples/sec Loss 13.5466 LearningRate 0.000562 Epoch: 2 Global Step: 46600 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:04:56,978-Speed 2496.03 samples/sec Loss 13.5821 LearningRate 0.000562 Epoch: 2 Global Step: 46610 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:05:05,193-Speed 2493.50 samples/sec Loss 13.6742 LearningRate 0.000562 Epoch: 2 Global Step: 46620 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:05:13,427-Speed 2487.41 samples/sec Loss 13.6117 LearningRate 0.000562 Epoch: 2 Global Step: 46630 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:05:21,628-Speed 2497.59 samples/sec Loss 13.7169 LearningRate 0.000562 Epoch: 2 Global Step: 46640 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:05:29,829-Speed 2497.58 samples/sec Loss 13.5194 LearningRate 0.000562 Epoch: 2 Global Step: 46650 Fp16 Grad Scale: 131072 Required: 179 hours 
Training: 2022-07-06 01:05:38,031-Speed 2497.27 samples/sec Loss 13.6745 LearningRate 0.000562 Epoch: 2 Global Step: 46660 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:05:46,235-Speed 2496.74 samples/sec Loss 13.8350 LearningRate 0.000563 Epoch: 2 Global Step: 46670 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:05:54,440-Speed 2496.60 samples/sec Loss 13.6017 LearningRate 0.000563 Epoch: 2 Global Step: 46680 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:06:02,588-Speed 2514.01 samples/sec Loss 13.6288 LearningRate 0.000563 Epoch: 2 Global Step: 46690 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:06:10,794-Speed 2495.82 samples/sec Loss 13.7467 LearningRate 0.000563 Epoch: 2 Global Step: 46700 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:06:18,998-Speed 2496.91 samples/sec Loss 13.6907 LearningRate 0.000563 Epoch: 2 Global Step: 46710 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:06:27,197-Speed 2498.08 samples/sec Loss 13.6970 LearningRate 0.000563 Epoch: 2 Global Step: 46720 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:06:35,396-Speed 2498.54 samples/sec Loss 13.6156 LearningRate 0.000563 Epoch: 2 Global Step: 46730 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:06:43,600-Speed 2496.67 samples/sec Loss 13.6063 LearningRate 0.000563 Epoch: 2 Global Step: 46740 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:06:51,751-Speed 2513.22 samples/sec Loss 13.7026 LearningRate 0.000564 Epoch: 2 Global Step: 46750 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:06:59,951-Speed 2497.97 samples/sec Loss 13.6464 LearningRate 0.000564 Epoch: 2 Global Step: 46760 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:07:08,149-Speed 2498.46 samples/sec Loss 13.6241 LearningRate 0.000564 Epoch: 2 Global Step: 46770 Fp16 Grad Scale: 131072 Required: 179 
hours Training: 2022-07-06 01:07:16,352-Speed 2497.02 samples/sec Loss 13.5910 LearningRate 0.000564 Epoch: 2 Global Step: 46780 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:07:24,551-Speed 2498.30 samples/sec Loss 13.6033 LearningRate 0.000564 Epoch: 2 Global Step: 46790 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:07:32,751-Speed 2498.01 samples/sec Loss 13.6247 LearningRate 0.000564 Epoch: 2 Global Step: 46800 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:07:40,897-Speed 2514.57 samples/sec Loss 13.6801 LearningRate 0.000564 Epoch: 2 Global Step: 46810 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:07:49,097-Speed 2497.69 samples/sec Loss 13.6877 LearningRate 0.000564 Epoch: 2 Global Step: 46820 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:07:57,296-Speed 2498.41 samples/sec Loss 13.6275 LearningRate 0.000565 Epoch: 2 Global Step: 46830 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:08:05,507-Speed 2494.83 samples/sec Loss 13.5312 LearningRate 0.000565 Epoch: 2 Global Step: 46840 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:08:13,707-Speed 2497.83 samples/sec Loss 13.4369 LearningRate 0.000565 Epoch: 2 Global Step: 46850 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:08:21,908-Speed 2497.58 samples/sec Loss 13.4933 LearningRate 0.000565 Epoch: 2 Global Step: 46860 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:08:30,055-Speed 2514.55 samples/sec Loss 13.5344 LearningRate 0.000565 Epoch: 2 Global Step: 46870 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:08:38,254-Speed 2498.15 samples/sec Loss 13.5535 LearningRate 0.000565 Epoch: 2 Global Step: 46880 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:08:46,469-Speed 2493.39 samples/sec Loss 13.5481 LearningRate 0.000565 Epoch: 2 Global Step: 46890 Fp16 Grad Scale: 131072 Required: 
179 hours Training: 2022-07-06 01:08:54,676-Speed 2495.82 samples/sec Loss 13.4561 LearningRate 0.000565 Epoch: 2 Global Step: 46900 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:09:02,878-Speed 2497.34 samples/sec Loss 13.4575 LearningRate 0.000565 Epoch: 2 Global Step: 46910 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:09:11,091-Speed 2493.95 samples/sec Loss 13.4367 LearningRate 0.000566 Epoch: 2 Global Step: 46920 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:09:19,243-Speed 2512.62 samples/sec Loss 13.4099 LearningRate 0.000566 Epoch: 2 Global Step: 46930 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:09:27,444-Speed 2497.80 samples/sec Loss 13.4890 LearningRate 0.000566 Epoch: 2 Global Step: 46940 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:09:35,645-Speed 2497.53 samples/sec Loss 13.5113 LearningRate 0.000566 Epoch: 2 Global Step: 46950 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:09:43,844-Speed 2498.13 samples/sec Loss 13.3593 LearningRate 0.000566 Epoch: 2 Global Step: 46960 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:09:52,045-Speed 2497.81 samples/sec Loss 13.3148 LearningRate 0.000566 Epoch: 2 Global Step: 46970 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:10:00,244-Speed 2498.03 samples/sec Loss 13.3286 LearningRate 0.000566 Epoch: 2 Global Step: 46980 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:10:08,389-Speed 2514.98 samples/sec Loss 13.4341 LearningRate 0.000566 Epoch: 2 Global Step: 46990 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:10:16,592-Speed 2496.77 samples/sec Loss 13.5055 LearningRate 0.000567 Epoch: 2 Global Step: 47000 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:10:24,800-Speed 2495.53 samples/sec Loss 13.5013 LearningRate 0.000567 Epoch: 2 Global Step: 47010 Fp16 Grad Scale: 131072 
Required: 179 hours Training: 2022-07-06 01:10:33,003-Speed 2497.37 samples/sec Loss 13.4091 LearningRate 0.000567 Epoch: 2 Global Step: 47020 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:10:41,205-Speed 2497.50 samples/sec Loss 13.5130 LearningRate 0.000567 Epoch: 2 Global Step: 47030 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:10:49,406-Speed 2497.33 samples/sec Loss 13.4957 LearningRate 0.000567 Epoch: 2 Global Step: 47040 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:10:57,565-Speed 2510.76 samples/sec Loss 13.4281 LearningRate 0.000567 Epoch: 2 Global Step: 47050 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:11:05,771-Speed 2496.41 samples/sec Loss 13.4001 LearningRate 0.000567 Epoch: 2 Global Step: 47060 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:11:13,972-Speed 2497.75 samples/sec Loss 13.4961 LearningRate 0.000567 Epoch: 2 Global Step: 47070 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:11:22,172-Speed 2497.88 samples/sec Loss 13.3613 LearningRate 0.000568 Epoch: 2 Global Step: 47080 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:11:30,374-Speed 2497.40 samples/sec Loss 13.2501 LearningRate 0.000568 Epoch: 2 Global Step: 47090 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:11:38,577-Speed 2497.08 samples/sec Loss 13.3540 LearningRate 0.000568 Epoch: 2 Global Step: 47100 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:11:46,746-Speed 2507.52 samples/sec Loss 13.3110 LearningRate 0.000568 Epoch: 2 Global Step: 47110 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:11:54,960-Speed 2493.84 samples/sec Loss 13.3575 LearningRate 0.000568 Epoch: 2 Global Step: 47120 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:12:03,187-Speed 2489.46 samples/sec Loss 13.3269 LearningRate 0.000568 Epoch: 2 Global Step: 47130 Fp16 Grad Scale: 
131072 Required: 179 hours Training: 2022-07-06 01:12:11,389-Speed 2497.92 samples/sec Loss 13.3762 LearningRate 0.000568 Epoch: 2 Global Step: 47140 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:12:19,599-Speed 2494.76 samples/sec Loss 13.2843 LearningRate 0.000568 Epoch: 2 Global Step: 47150 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:12:27,800-Speed 2497.57 samples/sec Loss 13.4102 LearningRate 0.000568 Epoch: 2 Global Step: 47160 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:12:35,944-Speed 2515.38 samples/sec Loss 13.3854 LearningRate 0.000569 Epoch: 2 Global Step: 47170 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:12:44,141-Speed 2498.76 samples/sec Loss 13.3930 LearningRate 0.000569 Epoch: 2 Global Step: 47180 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:12:52,354-Speed 2493.83 samples/sec Loss 13.3665 LearningRate 0.000569 Epoch: 2 Global Step: 47190 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:13:00,553-Speed 2498.42 samples/sec Loss 13.4598 LearningRate 0.000569 Epoch: 2 Global Step: 47200 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:13:08,754-Speed 2497.76 samples/sec Loss 13.5579 LearningRate 0.000569 Epoch: 2 Global Step: 47210 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:13:16,957-Speed 2497.16 samples/sec Loss 13.3576 LearningRate 0.000569 Epoch: 2 Global Step: 47220 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:13:25,106-Speed 2513.78 samples/sec Loss 13.3156 LearningRate 0.000569 Epoch: 2 Global Step: 47230 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:13:33,307-Speed 2497.52 samples/sec Loss 13.3518 LearningRate 0.000569 Epoch: 2 Global Step: 47240 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:13:41,511-Speed 2496.65 samples/sec Loss 13.3751 LearningRate 0.000570 Epoch: 2 Global Step: 47250 Fp16 Grad 
Scale: 131072 Required: 179 hours Training: 2022-07-06 01:13:49,712-Speed 2497.75 samples/sec Loss 13.4551 LearningRate 0.000570 Epoch: 2 Global Step: 47260 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:13:57,926-Speed 2493.71 samples/sec Loss 13.3752 LearningRate 0.000570 Epoch: 2 Global Step: 47270 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:14:06,125-Speed 2498.12 samples/sec Loss 13.3535 LearningRate 0.000570 Epoch: 2 Global Step: 47280 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:14:14,275-Speed 2513.54 samples/sec Loss 13.2970 LearningRate 0.000570 Epoch: 2 Global Step: 47290 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:14:22,491-Speed 2492.92 samples/sec Loss 13.2984 LearningRate 0.000570 Epoch: 2 Global Step: 47300 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:14:30,690-Speed 2498.25 samples/sec Loss 13.2365 LearningRate 0.000570 Epoch: 2 Global Step: 47310 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:14:38,905-Speed 2493.44 samples/sec Loss 13.2346 LearningRate 0.000570 Epoch: 2 Global Step: 47320 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:14:47,108-Speed 2497.35 samples/sec Loss 13.2324 LearningRate 0.000571 Epoch: 2 Global Step: 47330 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:14:55,312-Speed 2496.76 samples/sec Loss 13.3615 LearningRate 0.000571 Epoch: 2 Global Step: 47340 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:15:03,465-Speed 2512.19 samples/sec Loss 13.3398 LearningRate 0.000571 Epoch: 2 Global Step: 47350 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:15:11,666-Speed 2497.81 samples/sec Loss 13.2295 LearningRate 0.000571 Epoch: 2 Global Step: 47360 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:15:19,873-Speed 2495.78 samples/sec Loss 13.2906 LearningRate 0.000571 Epoch: 2 Global Step: 47370 Fp16 
Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:15:28,075-Speed 2497.52 samples/sec Loss 13.4224 LearningRate 0.000571 Epoch: 2 Global Step: 47380 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:15:36,281-Speed 2495.80 samples/sec Loss 13.1655 LearningRate 0.000571 Epoch: 2 Global Step: 47390 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:15:44,495-Speed 2494.01 samples/sec Loss 13.3188 LearningRate 0.000571 Epoch: 2 Global Step: 47400 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:15:52,650-Speed 2511.97 samples/sec Loss 13.4002 LearningRate 0.000571 Epoch: 2 Global Step: 47410 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:16:00,852-Speed 2497.31 samples/sec Loss 13.2389 LearningRate 0.000572 Epoch: 2 Global Step: 47420 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:16:09,062-Speed 2494.66 samples/sec Loss 13.3198 LearningRate 0.000572 Epoch: 2 Global Step: 47430 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:16:17,270-Speed 2496.00 samples/sec Loss 13.4050 LearningRate 0.000572 Epoch: 2 Global Step: 47440 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:16:25,487-Speed 2492.72 samples/sec Loss 13.3941 LearningRate 0.000572 Epoch: 2 Global Step: 47450 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:16:33,694-Speed 2495.82 samples/sec Loss 13.3839 LearningRate 0.000572 Epoch: 2 Global Step: 47460 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:16:41,837-Speed 2515.19 samples/sec Loss 13.3638 LearningRate 0.000572 Epoch: 2 Global Step: 47470 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:16:50,038-Speed 2498.11 samples/sec Loss 13.2410 LearningRate 0.000572 Epoch: 2 Global Step: 47480 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:16:58,237-Speed 2498.48 samples/sec Loss 13.3667 LearningRate 0.000572 Epoch: 2 Global Step: 47490 
Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:17:06,438-Speed 2497.51 samples/sec Loss 13.2921 LearningRate 0.000573 Epoch: 2 Global Step: 47500 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:17:14,640-Speed 2497.24 samples/sec Loss 13.2005 LearningRate 0.000573 Epoch: 2 Global Step: 47510 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:17:22,840-Speed 2498.05 samples/sec Loss 13.1449 LearningRate 0.000573 Epoch: 2 Global Step: 47520 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:17:30,985-Speed 2514.89 samples/sec Loss 13.3075 LearningRate 0.000573 Epoch: 2 Global Step: 47530 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:17:39,187-Speed 2497.16 samples/sec Loss 13.3653 LearningRate 0.000573 Epoch: 2 Global Step: 47540 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:17:47,384-Speed 2498.90 samples/sec Loss 13.3413 LearningRate 0.000573 Epoch: 2 Global Step: 47550 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:17:55,584-Speed 2498.08 samples/sec Loss 13.2207 LearningRate 0.000573 Epoch: 2 Global Step: 47560 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:18:03,790-Speed 2496.28 samples/sec Loss 13.2450 LearningRate 0.000573 Epoch: 2 Global Step: 47570 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:18:11,990-Speed 2497.95 samples/sec Loss 13.2296 LearningRate 0.000574 Epoch: 2 Global Step: 47580 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:18:20,136-Speed 2514.43 samples/sec Loss 13.1321 LearningRate 0.000574 Epoch: 2 Global Step: 47590 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:18:28,345-Speed 2495.30 samples/sec Loss 13.3226 LearningRate 0.000574 Epoch: 2 Global Step: 47600 Fp16 Grad Scale: 262144 Required: 179 hours Training: 2022-07-06 01:18:36,512-Speed 2508.45 samples/sec Loss 13.2288 LearningRate 0.000574 Epoch: 2 Global Step: 
47610 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:18:44,716-Speed 2496.72 samples/sec Loss 13.1857 LearningRate 0.000574 Epoch: 2 Global Step: 47620 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:18:52,917-Speed 2497.64 samples/sec Loss 13.2157 LearningRate 0.000574 Epoch: 2 Global Step: 47630 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:19:01,119-Speed 2498.04 samples/sec Loss 13.3071 LearningRate 0.000574 Epoch: 2 Global Step: 47640 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:19:09,271-Speed 2512.69 samples/sec Loss 13.1424 LearningRate 0.000574 Epoch: 2 Global Step: 47650 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:19:17,476-Speed 2496.54 samples/sec Loss 13.2565 LearningRate 0.000575 Epoch: 2 Global Step: 47660 Fp16 Grad Scale: 131072 Required: 179 hours Training: 2022-07-06 01:19:25,682-Speed 2496.14 samples/sec Loss 13.2024 LearningRate 0.000575 Epoch: 2 Global Step: 47670 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:19:33,885-Speed 2497.23 samples/sec Loss 13.2440 LearningRate 0.000575 Epoch: 2 Global Step: 47680 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:19:42,088-Speed 2496.99 samples/sec Loss 13.1671 LearningRate 0.000575 Epoch: 2 Global Step: 47690 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:19:50,290-Speed 2497.33 samples/sec Loss 13.3080 LearningRate 0.000575 Epoch: 2 Global Step: 47700 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:19:58,440-Speed 2513.13 samples/sec Loss 13.2424 LearningRate 0.000575 Epoch: 2 Global Step: 47710 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:20:06,642-Speed 2497.61 samples/sec Loss 13.1701 LearningRate 0.000575 Epoch: 2 Global Step: 47720 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:20:14,841-Speed 2498.14 samples/sec Loss 13.1648 LearningRate 0.000575 Epoch: 2 Global 
Step: 47730 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:20:23,042-Speed 2497.71 samples/sec Loss 13.2348 LearningRate 0.000575 Epoch: 2 Global Step: 47740 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:20:31,243-Speed 2498.06 samples/sec Loss 13.2407 LearningRate 0.000576 Epoch: 2 Global Step: 47750 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:20:39,447-Speed 2496.79 samples/sec Loss 13.2136 LearningRate 0.000576 Epoch: 2 Global Step: 47760 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:20:47,601-Speed 2512.22 samples/sec Loss 13.1899 LearningRate 0.000576 Epoch: 2 Global Step: 47770 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:20:55,799-Speed 2498.34 samples/sec Loss 13.2542 LearningRate 0.000576 Epoch: 2 Global Step: 47780 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:21:03,999-Speed 2498.29 samples/sec Loss 13.1934 LearningRate 0.000576 Epoch: 2 Global Step: 47790 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:21:12,201-Speed 2497.20 samples/sec Loss 13.1061 LearningRate 0.000576 Epoch: 2 Global Step: 47800 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:21:20,500-Speed 2468.14 samples/sec Loss 13.1507 LearningRate 0.000576 Epoch: 2 Global Step: 47810 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:21:28,701-Speed 2497.80 samples/sec Loss 13.1550 LearningRate 0.000576 Epoch: 2 Global Step: 47820 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:21:36,846-Speed 2514.87 samples/sec Loss 13.0983 LearningRate 0.000577 Epoch: 2 Global Step: 47830 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:21:45,050-Speed 2497.13 samples/sec Loss 13.1531 LearningRate 0.000577 Epoch: 2 Global Step: 47840 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:21:53,250-Speed 2497.81 samples/sec Loss 13.1054 LearningRate 0.000577 Epoch: 2 
Global Step: 47850 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:22:01,452-Speed 2497.09 samples/sec Loss 13.0870 LearningRate 0.000577 Epoch: 2 Global Step: 47860 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:22:09,653-Speed 2497.87 samples/sec Loss 13.0474 LearningRate 0.000577 Epoch: 2 Global Step: 47870 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:22:17,850-Speed 2498.90 samples/sec Loss 13.0498 LearningRate 0.000577 Epoch: 2 Global Step: 47880 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:22:26,004-Speed 2512.11 samples/sec Loss 13.0707 LearningRate 0.000577 Epoch: 2 Global Step: 47890 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:22:34,204-Speed 2497.82 samples/sec Loss 13.1660 LearningRate 0.000577 Epoch: 2 Global Step: 47900 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:22:42,407-Speed 2497.11 samples/sec Loss 13.1757 LearningRate 0.000578 Epoch: 2 Global Step: 47910 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:22:50,605-Speed 2498.58 samples/sec Loss 13.1775 LearningRate 0.000578 Epoch: 2 Global Step: 47920 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:22:58,806-Speed 2497.49 samples/sec Loss 13.1697 LearningRate 0.000578 Epoch: 2 Global Step: 47930 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:23:07,006-Speed 2497.97 samples/sec Loss 13.2871 LearningRate 0.000578 Epoch: 2 Global Step: 47940 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:23:15,150-Speed 2515.26 samples/sec Loss 13.1154 LearningRate 0.000578 Epoch: 2 Global Step: 47950 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:23:23,351-Speed 2497.53 samples/sec Loss 13.0996 LearningRate 0.000578 Epoch: 2 Global Step: 47960 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:23:31,549-Speed 2498.46 samples/sec Loss 13.0570 LearningRate 0.000578 
Epoch: 2 Global Step: 47970 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:23:39,762-Speed 2494.18 samples/sec Loss 13.0443 LearningRate 0.000578 Epoch: 2 Global Step: 47980 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:23:47,961-Speed 2498.43 samples/sec Loss 13.0289 LearningRate 0.000578 Epoch: 2 Global Step: 47990 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:23:56,174-Speed 2493.90 samples/sec Loss 13.0516 LearningRate 0.000579 Epoch: 2 Global Step: 48000 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:24:04,322-Speed 2514.15 samples/sec Loss 12.9993 LearningRate 0.000579 Epoch: 2 Global Step: 48010 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:24:12,522-Speed 2497.92 samples/sec Loss 13.0969 LearningRate 0.000579 Epoch: 2 Global Step: 48020 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:24:20,724-Speed 2497.18 samples/sec Loss 12.9041 LearningRate 0.000579 Epoch: 2 Global Step: 48030 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:24:28,931-Speed 2495.81 samples/sec Loss 13.0877 LearningRate 0.000579 Epoch: 2 Global Step: 48040 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:24:37,127-Speed 2499.20 samples/sec Loss 13.0616 LearningRate 0.000579 Epoch: 2 Global Step: 48050 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:24:45,327-Speed 2497.72 samples/sec Loss 13.0576 LearningRate 0.000579 Epoch: 2 Global Step: 48060 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:24:53,478-Speed 2513.04 samples/sec Loss 13.0681 LearningRate 0.000579 Epoch: 2 Global Step: 48070 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:25:01,678-Speed 2497.82 samples/sec Loss 13.0388 LearningRate 0.000580 Epoch: 2 Global Step: 48080 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:25:09,883-Speed 2496.38 samples/sec Loss 12.9742 LearningRate 
0.000580 Epoch: 2 Global Step: 48090 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:25:18,096-Speed 2494.29 samples/sec Loss 13.0175 LearningRate 0.000580 Epoch: 2 Global Step: 48100 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:25:26,297-Speed 2497.62 samples/sec Loss 13.0345 LearningRate 0.000580 Epoch: 2 Global Step: 48110 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:25:34,505-Speed 2495.40 samples/sec Loss 12.9741 LearningRate 0.000580 Epoch: 2 Global Step: 48120 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:25:42,656-Speed 2513.09 samples/sec Loss 12.8908 LearningRate 0.000580 Epoch: 2 Global Step: 48130 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:25:50,866-Speed 2494.92 samples/sec Loss 12.9464 LearningRate 0.000580 Epoch: 2 Global Step: 48140 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:25:59,073-Speed 2495.74 samples/sec Loss 12.9827 LearningRate 0.000580 Epoch: 2 Global Step: 48150 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:26:07,283-Speed 2495.03 samples/sec Loss 13.0204 LearningRate 0.000581 Epoch: 2 Global Step: 48160 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:26:15,486-Speed 2496.79 samples/sec Loss 12.9680 LearningRate 0.000581 Epoch: 2 Global Step: 48170 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:26:23,703-Speed 2492.96 samples/sec Loss 13.1087 LearningRate 0.000581 Epoch: 2 Global Step: 48180 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:26:31,853-Speed 2513.37 samples/sec Loss 13.1153 LearningRate 0.000581 Epoch: 2 Global Step: 48190 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:26:40,059-Speed 2496.10 samples/sec Loss 12.9613 LearningRate 0.000581 Epoch: 2 Global Step: 48200 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:26:48,261-Speed 2497.47 samples/sec Loss 12.9755 
LearningRate 0.000581 Epoch: 2 Global Step: 48210 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:26:56,465-Speed 2496.76 samples/sec Loss 12.9155 LearningRate 0.000581 Epoch: 2 Global Step: 48220 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:27:04,669-Speed 2496.78 samples/sec Loss 12.8628 LearningRate 0.000581 Epoch: 2 Global Step: 48230 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:27:12,874-Speed 2496.14 samples/sec Loss 13.0206 LearningRate 0.000582 Epoch: 2 Global Step: 48240 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:27:21,023-Speed 2513.72 samples/sec Loss 12.9107 LearningRate 0.000582 Epoch: 2 Global Step: 48250 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:27:29,224-Speed 2497.70 samples/sec Loss 12.9813 LearningRate 0.000582 Epoch: 2 Global Step: 48260 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:27:37,438-Speed 2493.66 samples/sec Loss 12.9813 LearningRate 0.000582 Epoch: 2 Global Step: 48270 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:27:45,640-Speed 2497.50 samples/sec Loss 12.9242 LearningRate 0.000582 Epoch: 2 Global Step: 48280 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:27:53,843-Speed 2497.21 samples/sec Loss 12.9012 LearningRate 0.000582 Epoch: 2 Global Step: 48290 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:28:02,060-Speed 2492.79 samples/sec Loss 13.0224 LearningRate 0.000582 Epoch: 2 Global Step: 48300 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:28:10,222-Speed 2509.75 samples/sec Loss 13.0783 LearningRate 0.000582 Epoch: 2 Global Step: 48310 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:28:18,436-Speed 2493.72 samples/sec Loss 13.0410 LearningRate 0.000582 Epoch: 2 Global Step: 48320 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:28:26,637-Speed 2497.49 samples/sec Loss 
13.0035 LearningRate 0.000583 Epoch: 2 Global Step: 48330 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:28:34,838-Speed 2497.85 samples/sec Loss 12.9442 LearningRate 0.000583 Epoch: 2 Global Step: 48340 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:28:43,036-Speed 2498.42 samples/sec Loss 12.9593 LearningRate 0.000583 Epoch: 2 Global Step: 48350 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:28:51,235-Speed 2498.21 samples/sec Loss 12.8695 LearningRate 0.000583 Epoch: 2 Global Step: 48360 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:28:59,388-Speed 2512.24 samples/sec Loss 12.9516 LearningRate 0.000583 Epoch: 2 Global Step: 48370 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:29:07,588-Speed 2498.14 samples/sec Loss 12.9496 LearningRate 0.000583 Epoch: 2 Global Step: 48380 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:29:15,801-Speed 2493.80 samples/sec Loss 12.8017 LearningRate 0.000583 Epoch: 2 Global Step: 48390 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:29:24,000-Speed 2498.22 samples/sec Loss 12.9592 LearningRate 0.000583 Epoch: 2 Global Step: 48400 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:29:32,202-Speed 2497.37 samples/sec Loss 12.8479 LearningRate 0.000584 Epoch: 2 Global Step: 48410 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:29:40,400-Speed 2498.66 samples/sec Loss 12.9189 LearningRate 0.000584 Epoch: 2 Global Step: 48420 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:29:48,547-Speed 2514.28 samples/sec Loss 13.0827 LearningRate 0.000584 Epoch: 2 Global Step: 48430 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:29:56,747-Speed 2497.76 samples/sec Loss 12.9701 LearningRate 0.000584 Epoch: 2 Global Step: 48440 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:30:04,950-Speed 2497.15 samples/sec 
Loss 12.9249 LearningRate 0.000584 Epoch: 2 Global Step: 48450 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:30:13,155-Speed 2496.70 samples/sec Loss 12.9217 LearningRate 0.000584 Epoch: 2 Global Step: 48460 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:30:21,355-Speed 2497.83 samples/sec Loss 12.9374 LearningRate 0.000584 Epoch: 2 Global Step: 48470 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:30:29,556-Speed 2497.62 samples/sec Loss 12.8388 LearningRate 0.000584 Epoch: 2 Global Step: 48480 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:30:37,703-Speed 2514.18 samples/sec Loss 12.9011 LearningRate 0.000585 Epoch: 2 Global Step: 48490 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:30:45,907-Speed 2496.83 samples/sec Loss 12.8599 LearningRate 0.000585 Epoch: 2 Global Step: 48500 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 01:30:54,065-Speed 2510.88 samples/sec Loss 12.8745 LearningRate 0.000585 Epoch: 2 Global Step: 48510 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:31:02,267-Speed 2497.84 samples/sec Loss 13.0165 LearningRate 0.000585 Epoch: 2 Global Step: 48520 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:31:10,471-Speed 2496.58 samples/sec Loss 12.7565 LearningRate 0.000585 Epoch: 2 Global Step: 48530 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:31:18,676-Speed 2496.39 samples/sec Loss 12.7368 LearningRate 0.000585 Epoch: 2 Global Step: 48540 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:31:26,825-Speed 2513.47 samples/sec Loss 12.7609 LearningRate 0.000585 Epoch: 2 Global Step: 48550 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:31:35,024-Speed 2498.31 samples/sec Loss 12.6400 LearningRate 0.000585 Epoch: 2 Global Step: 48560 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:31:43,226-Speed 2497.43 samples/sec 
Loss 12.7589 LearningRate 0.000585 Epoch: 2 Global Step: 48570 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:31:51,425-Speed 2498.09 samples/sec Loss 12.8350 LearningRate 0.000586 Epoch: 2 Global Step: 48580 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:31:59,629-Speed 2496.84 samples/sec Loss 12.8781 LearningRate 0.000586 Epoch: 2 Global Step: 48590 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:32:07,832-Speed 2497.19 samples/sec Loss 12.9821 LearningRate 0.000586 Epoch: 2 Global Step: 48600 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:32:15,988-Speed 2511.56 samples/sec Loss 12.9169 LearningRate 0.000586 Epoch: 2 Global Step: 48610 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:32:24,190-Speed 2497.53 samples/sec Loss 12.9333 LearningRate 0.000586 Epoch: 2 Global Step: 48620 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:32:32,392-Speed 2497.48 samples/sec Loss 12.9476 LearningRate 0.000586 Epoch: 2 Global Step: 48630 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:32:40,599-Speed 2496.86 samples/sec Loss 12.8846 LearningRate 0.000586 Epoch: 2 Global Step: 48640 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:32:48,803-Speed 2496.86 samples/sec Loss 12.8207 LearningRate 0.000586 Epoch: 2 Global Step: 48650 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:32:57,008-Speed 2496.41 samples/sec Loss 12.8209 LearningRate 0.000587 Epoch: 2 Global Step: 48660 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:33:05,160-Speed 2512.40 samples/sec Loss 12.7894 LearningRate 0.000587 Epoch: 2 Global Step: 48670 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:33:14,105-Speed 2499.79 samples/sec Loss 12.7436 LearningRate 0.000587 Epoch: 2 Global Step: 48680 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:33:22,307-Speed 2497.45 samples/sec Loss 
12.8771 LearningRate 0.000587 Epoch: 2 Global Step: 48690 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:33:30,509-Speed 2497.43 samples/sec Loss 12.8171 LearningRate 0.000587 Epoch: 2 Global Step: 48700 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:33:38,759-Speed 2500.12 samples/sec Loss 12.7334 LearningRate 0.000587 Epoch: 2 Global Step: 48710 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:33:48,351-Speed 2500.54 samples/sec Loss 12.7706 LearningRate 0.000587 Epoch: 2 Global Step: 48720 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:33:56,504-Speed 2512.56 samples/sec Loss 12.7394 LearningRate 0.000587 Epoch: 2 Global Step: 48730 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:34:04,704-Speed 2497.98 samples/sec Loss 12.7817 LearningRate 0.000588 Epoch: 2 Global Step: 48740 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:34:12,902-Speed 2498.74 samples/sec Loss 12.8544 LearningRate 0.000588 Epoch: 2 Global Step: 48750 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:34:21,103-Speed 2497.71 samples/sec Loss 12.7384 LearningRate 0.000588 Epoch: 2 Global Step: 48760 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:34:29,304-Speed 2497.59 samples/sec Loss 12.8343 LearningRate 0.000588 Epoch: 2 Global Step: 48770 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:34:37,502-Speed 2498.97 samples/sec Loss 12.7485 LearningRate 0.000588 Epoch: 2 Global Step: 48780 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:34:45,646-Speed 2515.44 samples/sec Loss 12.8490 LearningRate 0.000588 Epoch: 2 Global Step: 48790 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:34:53,846-Speed 2498.03 samples/sec Loss 12.8344 LearningRate 0.000588 Epoch: 2 Global Step: 48800 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:35:02,054-Speed 2495.35 samples/sec Loss 12.7388 
LearningRate 0.000588 Epoch: 2 Global Step: 48810 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:35:10,260-Speed 2496.55 samples/sec Loss 12.6785 LearningRate 0.000588 Epoch: 2 Global Step: 48820 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:35:18,464-Speed 2496.53 samples/sec Loss 12.8021 LearningRate 0.000589 Epoch: 2 Global Step: 48830 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:35:26,667-Speed 2497.14 samples/sec Loss 12.7851 LearningRate 0.000589 Epoch: 2 Global Step: 48840 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:35:34,817-Speed 2513.36 samples/sec Loss 12.7742 LearningRate 0.000589 Epoch: 2 Global Step: 48850 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:35:43,031-Speed 2493.60 samples/sec Loss 12.8590 LearningRate 0.000589 Epoch: 2 Global Step: 48860 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:35:51,233-Speed 2497.50 samples/sec Loss 12.8465 LearningRate 0.000589 Epoch: 2 Global Step: 48870 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:35:59,438-Speed 2496.66 samples/sec Loss 12.6670 LearningRate 0.000589 Epoch: 2 Global Step: 48880 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:36:07,646-Speed 2495.46 samples/sec Loss 12.8690 LearningRate 0.000589 Epoch: 2 Global Step: 48890 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:36:15,848-Speed 2497.15 samples/sec Loss 12.7569 LearningRate 0.000589 Epoch: 2 Global Step: 48900 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:36:23,995-Speed 2514.49 samples/sec Loss 12.6759 LearningRate 0.000590 Epoch: 2 Global Step: 48910 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:36:32,194-Speed 2498.22 samples/sec Loss 12.8618 LearningRate 0.000590 Epoch: 2 Global Step: 48920 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 01:36:40,399-Speed 2496.60 samples/sec Loss 12.6827 
LearningRate 0.000590 Epoch: 2 Global Step: 48930 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:36:48,600-Speed 2497.52 samples/sec Loss 12.6753 LearningRate 0.000590 Epoch: 2 Global Step: 48940 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:36:56,801-Speed 2497.57 samples/sec Loss 12.5783 LearningRate 0.000590 Epoch: 2 Global Step: 48950 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:37:05,003-Speed 2497.52 samples/sec Loss 12.7211 LearningRate 0.000590 Epoch: 2 Global Step: 48960 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:37:13,152-Speed 2513.51 samples/sec Loss 12.6422 LearningRate 0.000590 Epoch: 2 Global Step: 48970 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:37:21,354-Speed 2497.29 samples/sec Loss 12.5610 LearningRate 0.000590 Epoch: 2 Global Step: 48980 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:37:29,561-Speed 2496.09 samples/sec Loss 12.6199 LearningRate 0.000591 Epoch: 2 Global Step: 48990 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:37:37,760-Speed 2498.00 samples/sec Loss 12.5608 LearningRate 0.000591 Epoch: 2 Global Step: 49000 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:37:45,961-Speed 2497.65 samples/sec Loss 12.5963 LearningRate 0.000591 Epoch: 2 Global Step: 49010 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:37:54,163-Speed 2497.61 samples/sec Loss 12.6089 LearningRate 0.000591 Epoch: 2 Global Step: 49020 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:38:02,316-Speed 2512.33 samples/sec Loss 12.6495 LearningRate 0.000591 Epoch: 2 Global Step: 49030 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:38:10,515-Speed 2498.13 samples/sec Loss 12.7041 LearningRate 0.000591 Epoch: 2 Global Step: 49040 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:38:18,720-Speed 2496.45 samples/sec Loss 12.5863 LearningRate 0.000591 Epoch: 2 Global Step: 49050 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:38:26,928-Speed 2495.50 samples/sec Loss 12.5637 LearningRate 0.000591 Epoch: 2 Global Step: 49060 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:38:35,129-Speed 2497.63 samples/sec Loss 12.8332 LearningRate 0.000592 Epoch: 2 Global Step: 49070 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:38:43,333-Speed 2496.68 samples/sec Loss 12.6603 LearningRate 0.000592 Epoch: 2 Global Step: 49080 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:38:51,480-Speed 2514.13 samples/sec Loss 12.6975 LearningRate 0.000592 Epoch: 2 Global Step: 49090 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:38:59,684-Speed 2497.23 samples/sec Loss 12.8425 LearningRate 0.000592 Epoch: 2 Global Step: 49100 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:39:07,887-Speed 2497.07 samples/sec Loss 12.8245 LearningRate 0.000592 Epoch: 2 Global Step: 49110 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:39:16,091-Speed 2496.79 samples/sec Loss 12.7308 LearningRate 0.000592 Epoch: 2 Global Step: 49120 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:39:24,291-Speed 2498.13 samples/sec Loss 12.7150 LearningRate 0.000592 Epoch: 2 Global Step: 49130 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:39:32,500-Speed 2495.28 samples/sec Loss 12.6614 LearningRate 0.000592 Epoch: 2 Global Step: 49140 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:39:40,646-Speed 2514.34 samples/sec Loss 12.6777 LearningRate 0.000592 Epoch: 2 Global Step: 49150 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:39:48,848-Speed 2497.43 samples/sec Loss 12.8271 LearningRate 0.000593 Epoch: 2 Global Step: 49160 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:39:57,048-Speed 2497.87 samples/sec Loss 12.6689 LearningRate 0.000593 Epoch: 2 Global Step: 49170 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:40:05,250-Speed 2497.32 samples/sec Loss 12.7076 LearningRate 0.000593 Epoch: 2 Global Step: 49180 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:40:13,448-Speed 2498.79 samples/sec Loss 12.6422 LearningRate 0.000593 Epoch: 2 Global Step: 49190 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:40:21,654-Speed 2496.20 samples/sec Loss 12.8287 LearningRate 0.000593 Epoch: 2 Global Step: 49200 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:40:29,800-Speed 2514.35 samples/sec Loss 12.6607 LearningRate 0.000593 Epoch: 2 Global Step: 49210 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:40:38,001-Speed 2497.57 samples/sec Loss 12.7014 LearningRate 0.000593 Epoch: 2 Global Step: 49220 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:40:46,199-Speed 2498.68 samples/sec Loss 12.6672 LearningRate 0.000593 Epoch: 2 Global Step: 49230 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:40:54,400-Speed 2497.77 samples/sec Loss 12.6449 LearningRate 0.000594 Epoch: 2 Global Step: 49240 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:41:02,601-Speed 2497.74 samples/sec Loss 12.6471 LearningRate 0.000594 Epoch: 2 Global Step: 49250 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:41:10,806-Speed 2496.43 samples/sec Loss 12.7683 LearningRate 0.000594 Epoch: 2 Global Step: 49260 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:41:18,971-Speed 2508.76 samples/sec Loss 12.7014 LearningRate 0.000594 Epoch: 2 Global Step: 49270 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:41:27,167-Speed 2499.11 samples/sec Loss 12.7521 LearningRate 0.000594 Epoch: 2 Global Step: 49280 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:41:35,372-Speed 2496.32 samples/sec Loss 12.8045 LearningRate 0.000594 Epoch: 2 Global Step: 49290 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:41:43,584-Speed 2494.29 samples/sec Loss 12.7380 LearningRate 0.000594 Epoch: 2 Global Step: 49300 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:41:51,789-Speed 2496.61 samples/sec Loss 12.6699 LearningRate 0.000594 Epoch: 2 Global Step: 49310 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:41:59,987-Speed 2498.36 samples/sec Loss 12.6045 LearningRate 0.000595 Epoch: 2 Global Step: 49320 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:42:08,136-Speed 2513.59 samples/sec Loss 12.6653 LearningRate 0.000595 Epoch: 2 Global Step: 49330 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:42:16,338-Speed 2497.56 samples/sec Loss 12.6033 LearningRate 0.000595 Epoch: 2 Global Step: 49340 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:42:24,541-Speed 2497.19 samples/sec Loss 12.6919 LearningRate 0.000595 Epoch: 2 Global Step: 49350 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:42:32,756-Speed 2493.64 samples/sec Loss 12.6804 LearningRate 0.000595 Epoch: 2 Global Step: 49360 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:42:40,958-Speed 2497.32 samples/sec Loss 12.6591 LearningRate 0.000595 Epoch: 2 Global Step: 49370 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:42:49,173-Speed 2493.29 samples/sec Loss 12.5361 LearningRate 0.000595 Epoch: 2 Global Step: 49380 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:42:57,325-Speed 2512.62 samples/sec Loss 12.5971 LearningRate 0.000595 Epoch: 2 Global Step: 49390 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:43:05,539-Speed 2493.88 samples/sec Loss 12.4900 LearningRate 0.000595 Epoch: 2 Global Step: 49400 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:43:13,741-Speed 2497.17 samples/sec Loss 12.5748 LearningRate 0.000596 Epoch: 2 Global Step: 49410 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:43:21,937-Speed 2499.09 samples/sec Loss 12.5138 LearningRate 0.000596 Epoch: 2 Global Step: 49420 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:43:30,146-Speed 2495.32 samples/sec Loss 12.6966 LearningRate 0.000596 Epoch: 2 Global Step: 49430 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:43:38,348-Speed 2497.51 samples/sec Loss 12.5535 LearningRate 0.000596 Epoch: 2 Global Step: 49440 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:43:46,495-Speed 2514.32 samples/sec Loss 12.5321 LearningRate 0.000596 Epoch: 2 Global Step: 49450 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:43:54,695-Speed 2497.71 samples/sec Loss 12.6257 LearningRate 0.000596 Epoch: 2 Global Step: 49460 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:44:02,893-Speed 2498.49 samples/sec Loss 12.6997 LearningRate 0.000596 Epoch: 2 Global Step: 49470 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:44:11,092-Speed 2498.95 samples/sec Loss 12.8014 LearningRate 0.000596 Epoch: 2 Global Step: 49480 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:44:19,293-Speed 2497.57 samples/sec Loss 12.5833 LearningRate 0.000597 Epoch: 2 Global Step: 49490 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:44:27,497-Speed 2496.96 samples/sec Loss 12.6264 LearningRate 0.000597 Epoch: 2 Global Step: 49500 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:44:35,645-Speed 2513.72 samples/sec Loss 12.5156 LearningRate 0.000597 Epoch: 2 Global Step: 49510 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:44:43,845-Speed 2498.15 samples/sec Loss 12.6766 LearningRate 0.000597 Epoch: 2 Global Step: 49520 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:44:52,045-Speed 2497.83 samples/sec Loss 12.5574 LearningRate 0.000597 Epoch: 2 Global Step: 49530 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:45:00,246-Speed 2498.12 samples/sec Loss 12.6382 LearningRate 0.000597 Epoch: 2 Global Step: 49540 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:45:08,443-Speed 2498.61 samples/sec Loss 12.5395 LearningRate 0.000597 Epoch: 2 Global Step: 49550 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:45:16,642-Speed 2498.37 samples/sec Loss 12.6964 LearningRate 0.000597 Epoch: 2 Global Step: 49560 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:45:24,787-Speed 2514.87 samples/sec Loss 12.5010 LearningRate 0.000598 Epoch: 2 Global Step: 49570 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:45:33,000-Speed 2494.17 samples/sec Loss 12.4487 LearningRate 0.000598 Epoch: 2 Global Step: 49580 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:45:41,198-Speed 2498.47 samples/sec Loss 12.5306 LearningRate 0.000598 Epoch: 2 Global Step: 49590 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:45:49,397-Speed 2498.16 samples/sec Loss 12.5389 LearningRate 0.000598 Epoch: 2 Global Step: 49600 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:45:57,597-Speed 2498.21 samples/sec Loss 12.5423 LearningRate 0.000598 Epoch: 2 Global Step: 49610 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:46:05,797-Speed 2497.80 samples/sec Loss 12.4778 LearningRate 0.000598 Epoch: 2 Global Step: 49620 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:46:13,947-Speed 2513.22 samples/sec Loss 12.4582 LearningRate 0.000598 Epoch: 2 Global Step: 49630 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:46:22,154-Speed 2495.84 samples/sec Loss 12.5705 LearningRate 0.000598 Epoch: 2 Global Step: 49640 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:46:30,361-Speed 2496.08 samples/sec Loss 12.5956 LearningRate 0.000598 Epoch: 2 Global Step: 49650 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:46:38,576-Speed 2493.36 samples/sec Loss 12.5579 LearningRate 0.000599 Epoch: 2 Global Step: 49660 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:46:46,790-Speed 2493.69 samples/sec Loss 12.6349 LearningRate 0.000599 Epoch: 2 Global Step: 49670 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:46:54,993-Speed 2497.25 samples/sec Loss 12.5955 LearningRate 0.000599 Epoch: 2 Global Step: 49680 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:47:03,142-Speed 2513.55 samples/sec Loss 12.5996 LearningRate 0.000599 Epoch: 2 Global Step: 49690 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:47:11,348-Speed 2496.07 samples/sec Loss 12.5601 LearningRate 0.000599 Epoch: 2 Global Step: 49700 Fp16 Grad Scale: 65536 Required: 178 hours
Training: 2022-07-06 01:47:19,548-Speed 2498.01 samples/sec Loss 12.5878 LearningRate 0.000599 Epoch: 2 Global Step: 49710 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:47:27,750-Speed 2497.43 samples/sec Loss 12.6023 LearningRate 0.000599 Epoch: 2 Global Step: 49720 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:47:35,955-Speed 2496.44 samples/sec Loss 12.5701 LearningRate 0.000599 Epoch: 2 Global Step: 49730 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:47:44,156-Speed 2497.59 samples/sec Loss 12.4965 LearningRate 0.000600 Epoch: 2 Global Step: 49740 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:47:52,311-Speed 2511.90 samples/sec Loss 12.5417 LearningRate 0.000600 Epoch: 2 Global Step: 49750 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:48:00,514-Speed 2497.09 samples/sec Loss 12.5498 LearningRate 0.000600 Epoch: 2 Global Step: 49760 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:48:08,714-Speed 2498.00 samples/sec Loss 12.5730 LearningRate 0.000600 Epoch: 2 Global Step: 49770 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:48:16,914-Speed 2497.76 samples/sec Loss 12.5079 LearningRate 0.000600 Epoch: 2 Global Step: 49780 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:48:25,115-Speed 2497.77 samples/sec Loss 12.5765 LearningRate 0.000600 Epoch: 2 Global Step: 49790 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:48:33,320-Speed 2496.57 samples/sec Loss 12.5151 LearningRate 0.000600 Epoch: 2 Global Step: 49800 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:48:41,465-Speed 2514.68 samples/sec Loss 12.4815 LearningRate 0.000600 Epoch: 2 Global Step: 49810 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:48:49,671-Speed 2496.41 samples/sec Loss 12.5011 LearningRate 0.000601 Epoch: 2 Global Step: 49820 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:48:57,872-Speed 2497.66 samples/sec Loss 12.5752 LearningRate 0.000601 Epoch: 2 Global Step: 49830 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:49:06,071-Speed 2498.22 samples/sec Loss 12.4734 LearningRate 0.000601 Epoch: 2 Global Step: 49840 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:49:14,277-Speed 2496.00 samples/sec Loss 12.3612 LearningRate 0.000601 Epoch: 2 Global Step: 49850 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:49:22,478-Speed 2497.70 samples/sec Loss 12.4284 LearningRate 0.000601 Epoch: 2 Global Step: 49860 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:49:30,626-Speed 2514.00 samples/sec Loss 12.4430 LearningRate 0.000601 Epoch: 2 Global Step: 49870 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:49:38,823-Speed 2498.67 samples/sec Loss 12.4431 LearningRate 0.000601 Epoch: 2 Global Step: 49880 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:49:47,023-Speed 2498.76 samples/sec Loss 12.3714 LearningRate 0.000601 Epoch: 2 Global Step: 49890 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:49:55,222-Speed 2498.55 samples/sec Loss 12.4464 LearningRate 0.000602 Epoch: 2 Global Step: 49900 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:50:03,423-Speed 2497.60 samples/sec Loss 12.5337 LearningRate 0.000602 Epoch: 2 Global Step: 49910 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:50:11,626-Speed 2497.38 samples/sec Loss 12.3843 LearningRate 0.000602 Epoch: 2 Global Step: 49920 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:50:19,774-Speed 2513.75 samples/sec Loss 12.5005 LearningRate 0.000602 Epoch: 2 Global Step: 49930 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:50:27,976-Speed 2497.51 samples/sec Loss 12.3547 LearningRate 0.000602 Epoch: 2 Global Step: 49940 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:50:36,174-Speed 2498.58 samples/sec Loss 12.4597 LearningRate 0.000602 Epoch: 2 Global Step: 49950 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:50:44,375-Speed 2497.74 samples/sec Loss 12.3602 LearningRate 0.000602 Epoch: 2 Global Step: 49960 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:50:52,579-Speed 2496.55 samples/sec Loss 12.5256 LearningRate 0.000602 Epoch: 2 Global Step: 49970 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:51:00,780-Speed 2497.72 samples/sec Loss 12.4397 LearningRate 0.000602 Epoch: 2 Global Step: 49980 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:51:08,942-Speed 2509.54 samples/sec Loss 12.3632 LearningRate 0.000603 Epoch: 2 Global Step: 49990 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:51:17,156-Speed 2493.79 samples/sec Loss 12.3871 LearningRate 0.000603 Epoch: 2 Global Step: 50000 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:51:25,355-Speed 2497.94 samples/sec Loss 12.3573 LearningRate 0.000603 Epoch: 2 Global Step: 50010 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:51:33,556-Speed 2497.59 samples/sec Loss 12.3290 LearningRate 0.000603 Epoch: 2 Global Step: 50020 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:51:41,763-Speed 2495.89 samples/sec Loss 12.3101 LearningRate 0.000603 Epoch: 2 Global Step: 50030 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:51:49,964-Speed 2497.58 samples/sec Loss 12.2960 LearningRate 0.000603 Epoch: 2 Global Step: 50040 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:51:58,111-Speed 2514.35 samples/sec Loss 12.6045 LearningRate 0.000603 Epoch: 2 Global Step: 50050 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:52:06,312-Speed 2497.66 samples/sec Loss 12.5486 LearningRate 0.000603 Epoch: 2 Global Step: 50060 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:52:14,525-Speed 2494.29 samples/sec Loss 12.4748 LearningRate 0.000604 Epoch: 2 Global Step: 50070 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:52:22,722-Speed 2498.56 samples/sec Loss 12.4723 LearningRate 0.000604 Epoch: 2 Global Step: 50080 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:52:30,922-Speed 2497.95 samples/sec Loss 12.4917 LearningRate 0.000604 Epoch: 2 Global Step: 50090 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:52:39,135-Speed 2494.06 samples/sec Loss 12.4251 LearningRate 0.000604 Epoch: 2 Global Step: 50100 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:52:47,284-Speed 2513.67 samples/sec Loss 12.4131 LearningRate 0.000604 Epoch: 2 Global Step: 50110 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:52:55,487-Speed 2497.05 samples/sec Loss 12.4031 LearningRate 0.000604 Epoch: 2 Global Step: 50120 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:53:03,692-Speed 2496.36 samples/sec Loss 12.4295 LearningRate 0.000604 Epoch: 2 Global Step: 50130 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:53:11,898-Speed 2495.94 samples/sec Loss 12.4777 LearningRate 0.000604 Epoch: 2 Global Step: 50140 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:53:20,097-Speed 2498.48 samples/sec Loss 12.4276 LearningRate 0.000605 Epoch: 2 Global Step: 50150 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:53:28,297-Speed 2497.74 samples/sec Loss 12.4696 LearningRate 0.000605 Epoch: 2 Global Step: 50160 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:53:36,459-Speed 2509.61 samples/sec Loss 12.3660 LearningRate 0.000605 Epoch: 2 Global Step: 50170 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:53:44,660-Speed 2497.59 samples/sec Loss 12.4047 LearningRate 0.000605 Epoch: 2 Global Step: 50180 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:53:52,865-Speed 2496.57 samples/sec Loss 12.4691 LearningRate 0.000605 Epoch: 2 Global Step: 50190 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:54:01,065-Speed 2497.90 samples/sec Loss 12.3936 LearningRate 0.000605 Epoch: 2 Global Step: 50200 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:54:09,263-Speed 2498.33 samples/sec Loss 12.3563 LearningRate 0.000605 Epoch: 2 Global Step: 50210 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:54:17,465-Speed 2497.52 samples/sec Loss 12.4019 LearningRate 0.000605 Epoch: 2 Global Step: 50220 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:54:25,610-Speed 2514.87 samples/sec Loss 12.3757 LearningRate 0.000605 Epoch: 2 Global Step: 50230 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:54:33,815-Speed 2496.37 samples/sec Loss 12.2720 LearningRate 0.000606 Epoch: 2 Global Step: 50240 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:54:42,014-Speed 2498.07 samples/sec Loss 12.3615 LearningRate 0.000606 Epoch: 2 Global Step: 50250 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:54:50,214-Speed 2498.28 samples/sec Loss 12.3044 LearningRate 0.000606 Epoch: 2 Global Step: 50260 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:54:58,412-Speed 2498.79 samples/sec Loss 12.5247 LearningRate 0.000606 Epoch: 2 Global Step: 50270 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:55:06,612-Speed 2497.93 samples/sec Loss 12.3797 LearningRate 0.000606 Epoch: 2 Global Step: 50280 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:55:14,758-Speed 2514.55 samples/sec Loss 12.3878 LearningRate 0.000606 Epoch: 2 Global Step: 50290 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:55:22,964-Speed 2496.04 samples/sec Loss 12.3463 LearningRate 0.000606 Epoch: 2 Global Step: 50300 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:55:31,163-Speed 2498.33 samples/sec Loss 12.3288 LearningRate 0.000606 Epoch: 2 Global Step: 50310 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:55:39,362-Speed 2498.15 samples/sec Loss 12.3135 LearningRate 0.000607 Epoch: 2 Global Step: 50320 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:55:47,563-Speed 2497.55 samples/sec Loss 12.2524 LearningRate 0.000607 Epoch: 2 Global Step: 50330 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:55:55,761-Speed 2498.67 samples/sec Loss 12.3109 LearningRate 0.000607 Epoch: 2 Global Step: 50340 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:56:03,906-Speed 2514.64 samples/sec Loss 12.4022 LearningRate 0.000607 Epoch: 2 Global Step: 50350 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:56:12,106-Speed 2498.05 samples/sec Loss 12.3406 LearningRate 0.000607 Epoch: 2 Global Step: 50360 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:56:20,312-Speed 2496.07 samples/sec Loss 12.2529 LearningRate 0.000607 Epoch: 2 Global Step: 50370 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:56:28,518-Speed 2496.49 samples/sec Loss 12.2996 LearningRate 0.000607 Epoch: 2 Global Step: 50380 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:56:36,720-Speed 2497.11 samples/sec Loss 12.3416 LearningRate 0.000607 Epoch: 2 Global Step: 50390 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:56:44,921-Speed 2497.73 samples/sec Loss 12.3082 LearningRate 0.000608 Epoch: 2 Global Step: 50400 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:56:53,066-Speed 2514.64 samples/sec Loss 12.3882 LearningRate 0.000608 Epoch: 2 Global Step: 50410 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:57:01,264-Speed 2498.70 samples/sec Loss 12.3666 LearningRate 0.000608 Epoch: 2 Global Step: 50420 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:57:09,464-Speed 2497.84 samples/sec Loss 12.3017 LearningRate 0.000608 Epoch: 2 Global Step: 50430 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:57:17,664-Speed 2497.54 samples/sec Loss 12.3027 LearningRate 0.000608 Epoch: 2 Global Step: 50440 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:57:25,864-Speed 2497.88 samples/sec Loss 12.2576 LearningRate 0.000608 Epoch: 2 Global Step: 50450 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:57:34,065-Speed 2497.64 samples/sec Loss 12.3038 LearningRate 0.000608 Epoch: 2 Global Step: 50460 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:57:42,210-Speed 2514.89 samples/sec Loss 12.4053 LearningRate 0.000608 Epoch: 2 Global Step: 50470 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:57:50,409-Speed 2498.29 samples/sec Loss 12.3670 LearningRate 0.000609 Epoch: 2 Global Step: 50480 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:57:58,613-Speed 2497.10 samples/sec Loss 12.3473 LearningRate 0.000609 Epoch: 2 Global Step: 50490 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:58:06,812-Speed 2498.24 samples/sec Loss 12.2881 LearningRate 0.000609 Epoch: 2 Global Step: 50500 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:58:15,011-Speed 2498.30 samples/sec Loss 12.2590 LearningRate 0.000609 Epoch: 2 Global Step: 50510 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:58:23,209-Speed 2498.78 samples/sec Loss 12.2917 LearningRate 0.000609 Epoch: 2 Global Step: 50520 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:58:31,358-Speed 2513.49 samples/sec Loss 12.3037 LearningRate 0.000609 Epoch: 2 Global Step: 50530 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:58:39,555-Speed 2498.97 samples/sec Loss 12.3035 LearningRate 0.000609 Epoch: 2 Global Step: 50540 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:58:47,753-Speed 2498.64 samples/sec Loss 12.2849 LearningRate 0.000609 Epoch: 2 Global Step: 50550 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:58:55,953-Speed 2497.82 samples/sec Loss 12.2189 LearningRate 0.000609 Epoch: 2 Global Step: 50560 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:59:04,153-Speed 2497.91 samples/sec Loss 12.2552 LearningRate 0.000610 Epoch: 2 Global Step: 50570 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:59:12,355-Speed 2497.23 samples/sec Loss 12.3953 LearningRate 0.000610 Epoch: 2 Global Step: 50580 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:59:20,501-Speed 2514.61 samples/sec Loss 12.1917 LearningRate 0.000610 Epoch: 2 Global Step: 50590 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:59:28,700-Speed 2498.10 samples/sec Loss 12.3560 LearningRate 0.000610 Epoch: 2 Global Step: 50600 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:59:36,911-Speed 2494.89 samples/sec Loss 12.3712 LearningRate 0.000610 Epoch: 2 Global Step: 50610 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:59:45,116-Speed 2496.68 samples/sec Loss 12.4144 LearningRate 0.000610 Epoch: 2 Global Step: 50620 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 01:59:53,321-Speed 2496.52 samples/sec Loss 12.2701 LearningRate 0.000610 Epoch: 2 Global Step: 50630 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:00:01,526-Speed 2496.41 samples/sec Loss 12.3701 LearningRate 0.000610 Epoch: 2 Global Step: 50640 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:00:09,676-Speed 2513.28 samples/sec Loss 12.3020 LearningRate 0.000611 Epoch: 2 Global Step: 50650 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:00:17,877-Speed 2497.72 samples/sec Loss 12.3330 LearningRate 0.000611 Epoch: 2 Global Step: 50660 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:00:26,097-Speed 2492.06 samples/sec Loss 12.3933 LearningRate 0.000611 Epoch: 2 Global Step: 50670 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:00:34,301-Speed 2496.47 samples/sec Loss 12.2632 LearningRate 0.000611 Epoch: 2 Global Step: 50680 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:00:42,515-Speed 2493.92 samples/sec Loss 12.2728 LearningRate 0.000611 Epoch: 2 Global Step: 50690 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:00:50,725-Speed 2494.91 samples/sec Loss 12.3084 LearningRate 0.000611 Epoch: 2 Global Step: 50700 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:00:58,872-Speed 2514.16 samples/sec Loss 12.1559 LearningRate 0.000611 Epoch: 2 Global Step: 50710 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:01:07,072-Speed 2497.97 samples/sec Loss 12.2543 LearningRate 0.000611 Epoch: 2 Global Step: 50720 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:01:15,285-Speed 2493.92 samples/sec Loss 12.1732 LearningRate 0.000612 Epoch: 2 Global Step: 50730 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:01:23,485-Speed 2498.10 samples/sec Loss 12.3893 LearningRate 0.000612 Epoch: 2 Global Step: 50740 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:01:31,685-Speed 2497.95 samples/sec Loss 12.4731 LearningRate 0.000612 Epoch: 2 Global Step: 50750 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:01:39,885-Speed 2497.83 samples/sec Loss 12.3494 LearningRate 0.000612 Epoch: 2 Global Step: 50760 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:01:48,035-Speed 2513.62 samples/sec Loss 12.4216 LearningRate 0.000612 Epoch: 2 Global Step: 50770 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:01:56,233-Speed 2498.43 samples/sec Loss 12.2674 LearningRate 0.000612 Epoch: 2 Global Step: 50780 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:02:04,434-Speed 2497.78 samples/sec Loss 12.2625 LearningRate 0.000612 Epoch: 2 Global Step: 50790 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:02:12,637-Speed 2497.03 samples/sec Loss 12.2676 LearningRate 0.000612 Epoch: 2 Global Step: 50800 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:02:20,835-Speed 2498.43 samples/sec Loss 12.3012 LearningRate 0.000612 Epoch: 2 Global Step: 50810 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:02:29,037-Speed 2497.30 samples/sec Loss 12.4219 LearningRate 0.000613 Epoch: 2 Global Step: 50820 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:02:37,183-Speed 2514.60 samples/sec Loss 12.3747 LearningRate 0.000613 Epoch: 2 Global Step: 50830 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:02:45,381-Speed 2498.47 samples/sec Loss 12.2528 LearningRate 0.000613 Epoch: 2 Global Step: 50840 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:02:53,581-Speed 2498.02 samples/sec Loss 12.3831 LearningRate 0.000613 Epoch: 2 Global Step: 50850 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:03:01,780-Speed 2498.25 samples/sec Loss 12.3775 LearningRate 0.000613 Epoch: 2 Global Step: 50860 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:03:09,981-Speed 2497.81 samples/sec Loss 12.2181 LearningRate 0.000613 Epoch: 2 Global Step: 50870 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:03:18,183-Speed 2497.43 samples/sec Loss 12.3067 LearningRate 0.000613 Epoch: 2 Global Step: 50880 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:03:26,330-Speed 2514.09 samples/sec Loss 12.1633 LearningRate 0.000613 Epoch: 2 Global Step: 50890 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:03:34,542-Speed 2494.37 samples/sec Loss 12.2250 LearningRate 0.000614 Epoch: 2 Global Step: 50900 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:03:42,743-Speed 2497.81 samples/sec Loss 12.1990 LearningRate 0.000614 Epoch: 2 Global Step: 50910 Fp16 Grad Scale: 262144 Required: 178 hours
Training: 2022-07-06 02:03:50,904-Speed 2510.03 samples/sec Loss 12.1072 LearningRate 0.000614 Epoch: 2 Global Step: 50920 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:03:59,108-Speed 2496.76 samples/sec Loss 11.9979 LearningRate 0.000614 Epoch: 2 Global Step: 50930 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:04:07,309-Speed 2497.75 samples/sec Loss 12.1631 LearningRate 0.000614 Epoch: 2 Global Step: 50940 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:04:15,457-Speed 2513.89 samples/sec Loss 12.1810 LearningRate 0.000614 Epoch: 2 Global Step: 50950 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:04:23,659-Speed 2497.39 samples/sec Loss 12.0303 LearningRate 0.000614 Epoch: 2 Global Step: 50960 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:04:31,869-Speed 2495.09 samples/sec Loss 12.1694 LearningRate 0.000614 Epoch: 2 Global Step: 50970 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:04:40,074-Speed 2496.32 samples/sec Loss 12.1449 LearningRate 0.000615 Epoch: 2 Global Step: 50980 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:04:48,280-Speed 2496.36 samples/sec Loss 12.1312 LearningRate 0.000615 Epoch: 2 Global Step: 50990 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:04:56,485-Speed 2496.15 samples/sec Loss 12.2307 LearningRate 0.000615 Epoch: 2 Global Step: 51000 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:05:04,630-Speed 2514.91 samples/sec Loss 11.9917 LearningRate 0.000615 Epoch: 2 Global Step: 51010 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:05:12,829-Speed 2498.36 samples/sec Loss 12.3730 LearningRate 0.000615 Epoch: 2 Global Step: 51020 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:05:21,031-Speed 2497.27 samples/sec Loss 12.2222 LearningRate 0.000615 Epoch: 2 Global Step: 51030 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:05:29,245-Speed 2493.69 samples/sec Loss 12.3930 LearningRate 0.000615 Epoch: 2 Global Step: 51040 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:05:37,443-Speed 2498.37 samples/sec Loss 12.3907 LearningRate 0.000615 Epoch: 2 Global Step: 51050 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:05:45,642-Speed 2498.24 samples/sec Loss 12.4404 LearningRate 0.000615 Epoch: 2 Global Step: 51060 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:05:53,794-Speed 2512.85 samples/sec Loss 12.3228 LearningRate 0.000616 Epoch: 2 Global Step: 51070 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:06:01,995-Speed 2497.67 samples/sec Loss 12.2686 LearningRate 0.000616 Epoch: 2 Global Step: 51080 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:06:10,197-Speed 2497.49 samples/sec Loss 12.2296 LearningRate 0.000616 Epoch: 2 Global Step: 51090 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:06:18,400-Speed 2497.11 samples/sec Loss 12.1991 LearningRate 0.000616 Epoch: 2 Global Step: 51100 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:06:26,602-Speed 2497.48 samples/sec Loss 12.2089 LearningRate 0.000616 Epoch: 2 Global Step: 51110 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:06:34,803-Speed 2497.70 samples/sec Loss 12.1382 LearningRate 0.000616 Epoch: 2 Global Step: 51120 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:06:42,951-Speed 2513.95 samples/sec Loss 12.2754 LearningRate 0.000616 Epoch: 2 Global Step: 51130 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:06:51,153-Speed 2497.38 samples/sec Loss 12.2162 LearningRate 0.000616 Epoch: 2 Global Step: 51140 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:06:59,349-Speed 2499.29 samples/sec Loss 12.2076 LearningRate 0.000617 Epoch: 2 Global Step: 51150 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:07:07,553-Speed 2496.58 samples/sec Loss 12.1040 LearningRate 0.000617 Epoch: 2 Global Step: 51160 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:07:15,759-Speed 2496.26 samples/sec Loss 12.1150 LearningRate 0.000617 Epoch: 2 Global Step: 51170 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:07:23,960-Speed 2497.70 samples/sec Loss 12.0960 LearningRate 0.000617 Epoch: 2 Global Step: 51180 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:07:32,110-Speed 2513.19 samples/sec Loss 12.1228 LearningRate 0.000617 Epoch: 2 Global Step: 51190 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:07:40,311-Speed 2497.60 samples/sec Loss 12.1334 LearningRate 0.000617 Epoch: 2 Global Step: 51200 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:07:48,512-Speed 2498.20 samples/sec Loss 12.0010 LearningRate 0.000617 Epoch: 2 Global Step: 51210 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:07:56,712-Speed 2497.97 samples/sec Loss 12.0765 LearningRate 0.000617 Epoch: 2 Global Step: 51220 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:08:04,913-Speed 2497.51 samples/sec Loss 12.1203 LearningRate 0.000618 Epoch: 2 Global Step: 51230 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:08:13,117-Speed 2496.96 samples/sec Loss 12.1874 LearningRate 0.000618 Epoch: 2 Global Step: 51240 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:08:21,266-Speed 2513.38 samples/sec Loss 12.1446 LearningRate 0.000618 Epoch: 2 Global Step: 51250 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:08:29,470-Speed 2496.93 samples/sec Loss 12.0073 LearningRate 0.000618 Epoch: 2 Global Step: 51260 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:08:37,672-Speed 2497.47 samples/sec Loss 12.0350 LearningRate 0.000618 Epoch: 2 Global Step: 51270 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:08:45,872-Speed 2497.84 samples/sec Loss 11.9465 LearningRate 0.000618 Epoch: 2 Global Step: 51280 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:08:54,072-Speed 2498.21 samples/sec Loss 11.9291 LearningRate 0.000618 Epoch: 2 Global Step: 51290 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:09:02,277-Speed 2496.31 samples/sec Loss 12.0015 LearningRate 0.000618 Epoch: 2 Global Step: 51300 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:09:10,423-Speed 2514.61 samples/sec Loss 12.0892 LearningRate 0.000619 Epoch: 2 Global Step: 51310 Fp16 Grad Scale: 131072 Required: 178 hours
Training: 2022-07-06 02:09:18,632-Speed 2495.41 samples/sec Loss 11.9701 LearningRate 0.000619 Epoch: 2 Global Step: 51320 Fp16
Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:09:26,830-Speed 2498.32 samples/sec Loss 12.0649 LearningRate 0.000619 Epoch: 2 Global Step: 51330 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:09:35,032-Speed 2497.48 samples/sec Loss 12.0578 LearningRate 0.000619 Epoch: 2 Global Step: 51340 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:09:43,235-Speed 2497.02 samples/sec Loss 12.0827 LearningRate 0.000619 Epoch: 2 Global Step: 51350 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:09:51,435-Speed 2497.81 samples/sec Loss 11.9536 LearningRate 0.000619 Epoch: 2 Global Step: 51360 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:09:59,585-Speed 2514.16 samples/sec Loss 12.1102 LearningRate 0.000619 Epoch: 2 Global Step: 51370 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:10:07,788-Speed 2497.22 samples/sec Loss 12.0486 LearningRate 0.000619 Epoch: 2 Global Step: 51380 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:10:15,987-Speed 2498.21 samples/sec Loss 11.8776 LearningRate 0.000619 Epoch: 2 Global Step: 51390 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:10:24,187-Speed 2497.99 samples/sec Loss 11.9312 LearningRate 0.000620 Epoch: 2 Global Step: 51400 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:10:32,389-Speed 2497.37 samples/sec Loss 11.9565 LearningRate 0.000620 Epoch: 2 Global Step: 51410 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:10:40,589-Speed 2497.96 samples/sec Loss 12.0035 LearningRate 0.000620 Epoch: 2 Global Step: 51420 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:10:48,740-Speed 2513.13 samples/sec Loss 11.9821 LearningRate 0.000620 Epoch: 2 Global Step: 51430 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:10:56,938-Speed 2498.59 samples/sec Loss 11.9315 LearningRate 0.000620 Epoch: 2 Global Step: 51440 
Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:11:05,157-Speed 2492.40 samples/sec Loss 12.0752 LearningRate 0.000620 Epoch: 2 Global Step: 51450 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:11:13,356-Speed 2498.24 samples/sec Loss 12.0533 LearningRate 0.000620 Epoch: 2 Global Step: 51460 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:11:21,555-Speed 2498.21 samples/sec Loss 12.1421 LearningRate 0.000620 Epoch: 2 Global Step: 51470 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:11:29,760-Speed 2496.42 samples/sec Loss 12.2484 LearningRate 0.000621 Epoch: 2 Global Step: 51480 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:11:37,905-Speed 2514.79 samples/sec Loss 12.1322 LearningRate 0.000621 Epoch: 2 Global Step: 51490 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:11:46,106-Speed 2497.65 samples/sec Loss 12.0157 LearningRate 0.000621 Epoch: 2 Global Step: 51500 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:11:54,312-Speed 2496.20 samples/sec Loss 11.9959 LearningRate 0.000621 Epoch: 2 Global Step: 51510 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:12:02,519-Speed 2495.96 samples/sec Loss 12.0127 LearningRate 0.000621 Epoch: 2 Global Step: 51520 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:12:10,720-Speed 2497.71 samples/sec Loss 11.9287 LearningRate 0.000621 Epoch: 2 Global Step: 51530 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:12:18,923-Speed 2496.80 samples/sec Loss 12.0728 LearningRate 0.000621 Epoch: 2 Global Step: 51540 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:12:27,077-Speed 2512.03 samples/sec Loss 12.0311 LearningRate 0.000621 Epoch: 2 Global Step: 51550 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:12:35,278-Speed 2497.74 samples/sec Loss 12.0405 LearningRate 0.000622 Epoch: 2 Global Step: 
51560 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:12:43,479-Speed 2497.63 samples/sec Loss 12.0248 LearningRate 0.000622 Epoch: 2 Global Step: 51570 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:12:51,686-Speed 2495.71 samples/sec Loss 12.1226 LearningRate 0.000622 Epoch: 2 Global Step: 51580 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:12:59,885-Speed 2498.44 samples/sec Loss 12.1200 LearningRate 0.000622 Epoch: 2 Global Step: 51590 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:13:08,087-Speed 2497.42 samples/sec Loss 11.9551 LearningRate 0.000622 Epoch: 2 Global Step: 51600 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:13:16,234-Speed 2514.41 samples/sec Loss 11.9870 LearningRate 0.000622 Epoch: 2 Global Step: 51610 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:13:24,437-Speed 2497.37 samples/sec Loss 12.0146 LearningRate 0.000622 Epoch: 2 Global Step: 51620 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:13:32,637-Speed 2497.88 samples/sec Loss 12.0105 LearningRate 0.000622 Epoch: 2 Global Step: 51630 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:13:40,871-Speed 2499.54 samples/sec Loss 11.9133 LearningRate 0.000622 Epoch: 2 Global Step: 51640 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:13:49,112-Speed 2499.43 samples/sec Loss 11.8779 LearningRate 0.000623 Epoch: 2 Global Step: 51650 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:13:59,429-Speed 1985.19 samples/sec Loss 11.8446 LearningRate 0.000623 Epoch: 2 Global Step: 51660 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:14:07,624-Speed 2514.54 samples/sec Loss 12.0316 LearningRate 0.000623 Epoch: 2 Global Step: 51670 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:14:15,856-Speed 2498.47 samples/sec Loss 11.8223 LearningRate 0.000623 Epoch: 2 Global 
Step: 51680 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:14:27,988-Speed 1693.97 samples/sec Loss 11.8740 LearningRate 0.000623 Epoch: 2 Global Step: 51690 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:14:36,193-Speed 2501.89 samples/sec Loss 11.8750 LearningRate 0.000623 Epoch: 2 Global Step: 51700 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:14:44,438-Speed 2499.79 samples/sec Loss 11.8517 LearningRate 0.000623 Epoch: 2 Global Step: 51710 Fp16 Grad Scale: 131072 Required: 178 hours Training: 2022-07-06 02:14:52,645-Speed 2511.23 samples/sec Loss 11.7655 LearningRate 0.000623 Epoch: 2 Global Step: 51720 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:15:01,050-Speed 2436.94 samples/sec Loss 11.8068 LearningRate 0.000624 Epoch: 2 Global Step: 51730 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:15:09,284-Speed 2497.86 samples/sec Loss 11.8933 LearningRate 0.000624 Epoch: 2 Global Step: 51740 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:15:17,524-Speed 2499.57 samples/sec Loss 11.9208 LearningRate 0.000624 Epoch: 2 Global Step: 51750 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:15:26,117-Speed 2407.16 samples/sec Loss 11.7739 LearningRate 0.000624 Epoch: 2 Global Step: 51760 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:15:34,318-Speed 2497.50 samples/sec Loss 11.7619 LearningRate 0.000624 Epoch: 2 Global Step: 51770 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:15:42,520-Speed 2497.31 samples/sec Loss 11.8633 LearningRate 0.000624 Epoch: 2 Global Step: 51780 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:15:50,699-Speed 2514.32 samples/sec Loss 11.8677 LearningRate 0.000624 Epoch: 2 Global Step: 51790 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:16:02,608-Speed 2497.85 samples/sec Loss 11.8354 LearningRate 0.000624 Epoch: 2 Global 
Step: 51800 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:16:10,806-Speed 2498.28 samples/sec Loss 11.9876 LearningRate 0.000625 Epoch: 2 Global Step: 51810 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:16:19,012-Speed 2500.84 samples/sec Loss 12.0031 LearningRate 0.000625 Epoch: 2 Global Step: 51820 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:16:27,235-Speed 2499.74 samples/sec Loss 11.9772 LearningRate 0.000625 Epoch: 2 Global Step: 51830 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:16:38,488-Speed 1820.16 samples/sec Loss 11.8670 LearningRate 0.000625 Epoch: 2 Global Step: 51840 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:16:46,633-Speed 2517.21 samples/sec Loss 11.8633 LearningRate 0.000625 Epoch: 2 Global Step: 51850 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:16:54,841-Speed 2501.26 samples/sec Loss 11.9128 LearningRate 0.000625 Epoch: 2 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:17:03,038-Speed 2498.87 samples/sec Loss 11.8878 LearningRate 0.000625 Epoch: 2 Global Step: 51870 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:17:11,237-Speed 2498.01 samples/sec Loss 11.8245 LearningRate 0.000625 Epoch: 2 Global Step: 51880 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:17:19,448-Speed 2499.03 samples/sec Loss 11.7593 LearningRate 0.000626 Epoch: 2 Global Step: 51890 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:17:27,666-Speed 2501.18 samples/sec Loss 11.9480 LearningRate 0.000626 Epoch: 2 Global Step: 51900 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:17:35,822-Speed 2511.26 samples/sec Loss 11.7963 LearningRate 0.000626 Epoch: 2 Global Step: 51910 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:17:46,282-Speed 1966.40 samples/sec Loss 12.0144 LearningRate 0.000626 Epoch: 2 Global Step: 
51920 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:17:54,500-Speed 2501.01 samples/sec Loss 12.0253 LearningRate 0.000626 Epoch: 2 Global Step: 51930 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:18:04,777-Speed 2499.85 samples/sec Loss 12.0600 LearningRate 0.000626 Epoch: 2 Global Step: 51940 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:18:12,973-Speed 2499.29 samples/sec Loss 11.9643 LearningRate 0.000626 Epoch: 2 Global Step: 51950 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:18:24,958-Speed 2501.07 samples/sec Loss 12.0023 LearningRate 0.000626 Epoch: 2 Global Step: 51960 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:18:35,598-Speed 2349.25 samples/sec Loss 11.9702 LearningRate 0.000626 Epoch: 2 Global Step: 51970 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:18:43,935-Speed 2498.31 samples/sec Loss 11.8638 LearningRate 0.000627 Epoch: 2 Global Step: 51980 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:18:52,144-Speed 2495.16 samples/sec Loss 11.7985 LearningRate 0.000627 Epoch: 2 Global Step: 51990 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:19:00,357-Speed 2494.00 samples/sec Loss 12.0233 LearningRate 0.000627 Epoch: 2 Global Step: 52000 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:19:08,576-Speed 2492.31 samples/sec Loss 11.9162 LearningRate 0.000627 Epoch: 2 Global Step: 52010 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:19:16,797-Speed 2491.61 samples/sec Loss 11.8864 LearningRate 0.000627 Epoch: 2 Global Step: 52020 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:19:24,974-Speed 2504.89 samples/sec Loss 11.8319 LearningRate 0.000627 Epoch: 2 Global Step: 52030 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:19:33,208-Speed 2487.75 samples/sec Loss 11.8283 LearningRate 0.000627 Epoch: 2 Global Step: 52040 
Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:19:41,432-Speed 2490.74 samples/sec Loss 11.8486 LearningRate 0.000627 Epoch: 2 Global Step: 52050 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:19:49,652-Speed 2491.89 samples/sec Loss 11.7881 LearningRate 0.000628 Epoch: 2 Global Step: 52060 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:19:57,870-Speed 2492.62 samples/sec Loss 11.9021 LearningRate 0.000628 Epoch: 2 Global Step: 52070 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:20:06,084-Speed 2493.82 samples/sec Loss 11.7382 LearningRate 0.000628 Epoch: 2 Global Step: 52080 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:20:14,238-Speed 2512.15 samples/sec Loss 11.8727 LearningRate 0.000628 Epoch: 2 Global Step: 52090 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:20:22,444-Speed 2495.87 samples/sec Loss 11.9029 LearningRate 0.000628 Epoch: 2 Global Step: 52100 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:20:30,642-Speed 2498.50 samples/sec Loss 11.8549 LearningRate 0.000628 Epoch: 2 Global Step: 52110 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:20:38,851-Speed 2495.34 samples/sec Loss 11.8971 LearningRate 0.000628 Epoch: 2 Global Step: 52120 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:20:47,043-Speed 2500.36 samples/sec Loss 11.8833 LearningRate 0.000628 Epoch: 2 Global Step: 52130 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:20:55,245-Speed 2497.40 samples/sec Loss 11.8251 LearningRate 0.000629 Epoch: 2 Global Step: 52140 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:21:03,391-Speed 2514.66 samples/sec Loss 11.8444 LearningRate 0.000629 Epoch: 2 Global Step: 52150 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:21:11,595-Speed 2497.49 samples/sec Loss 11.8790 LearningRate 0.000629 Epoch: 2 Global Step: 52160 Fp16 
Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:21:19,800-Speed 2496.56 samples/sec Loss 11.8132 LearningRate 0.000629 Epoch: 2 Global Step: 52170 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:21:28,003-Speed 2497.17 samples/sec Loss 11.7166 LearningRate 0.000629 Epoch: 2 Global Step: 52180 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:21:36,231-Speed 2489.42 samples/sec Loss 11.7776 LearningRate 0.000629 Epoch: 2 Global Step: 52190 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:21:44,436-Speed 2496.41 samples/sec Loss 11.8296 LearningRate 0.000629 Epoch: 2 Global Step: 52200 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:21:52,586-Speed 2513.17 samples/sec Loss 11.8106 LearningRate 0.000629 Epoch: 2 Global Step: 52210 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:22:00,800-Speed 2493.44 samples/sec Loss 11.8331 LearningRate 0.000629 Epoch: 2 Global Step: 52220 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:22:09,014-Speed 2494.03 samples/sec Loss 11.8413 LearningRate 0.000630 Epoch: 2 Global Step: 52230 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:22:17,217-Speed 2496.94 samples/sec Loss 11.7877 LearningRate 0.000630 Epoch: 2 Global Step: 52240 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:22:25,420-Speed 2496.99 samples/sec Loss 11.6935 LearningRate 0.000630 Epoch: 2 Global Step: 52250 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:22:33,623-Speed 2497.14 samples/sec Loss 11.7401 LearningRate 0.000630 Epoch: 2 Global Step: 52260 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:22:41,777-Speed 2512.17 samples/sec Loss 11.7956 LearningRate 0.000630 Epoch: 2 Global Step: 52270 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:22:49,997-Speed 2492.12 samples/sec Loss 11.7821 LearningRate 0.000630 Epoch: 2 Global Step: 52280 Fp16 Grad 
Scale: 65536 Required: 178 hours Training: 2022-07-06 02:22:58,209-Speed 2494.45 samples/sec Loss 11.8078 LearningRate 0.000630 Epoch: 2 Global Step: 52290 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:23:06,410-Speed 2497.59 samples/sec Loss 11.8411 LearningRate 0.000630 Epoch: 2 Global Step: 52300 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:23:14,610-Speed 2497.98 samples/sec Loss 12.1087 LearningRate 0.000631 Epoch: 2 Global Step: 52310 Fp16 Grad Scale: 65536 Required: 178 hours Training: 2022-07-06 02:23:22,812-Speed 2497.35 samples/sec Loss 12.0238 LearningRate 0.000631 Epoch: 2 Global Step: 52320 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:23:30,968-Speed 2511.53 samples/sec Loss 11.9027 LearningRate 0.000631 Epoch: 2 Global Step: 52330 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:23:39,171-Speed 2497.11 samples/sec Loss 11.9265 LearningRate 0.000631 Epoch: 2 Global Step: 52340 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:23:47,379-Speed 2495.87 samples/sec Loss 11.9071 LearningRate 0.000631 Epoch: 2 Global Step: 52350 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:23:55,584-Speed 2496.63 samples/sec Loss 11.8916 LearningRate 0.000631 Epoch: 2 Global Step: 52360 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:24:03,787-Speed 2496.95 samples/sec Loss 11.7534 LearningRate 0.000631 Epoch: 2 Global Step: 52370 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:24:11,992-Speed 2496.32 samples/sec Loss 11.8615 LearningRate 0.000631 Epoch: 2 Global Step: 52380 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:24:20,145-Speed 2512.42 samples/sec Loss 11.7717 LearningRate 0.000632 Epoch: 2 Global Step: 52390 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:24:28,348-Speed 2497.15 samples/sec Loss 11.9889 LearningRate 0.000632 Epoch: 2 Global Step: 52400 Fp16 Grad Scale: 
65536 Required: 177 hours Training: 2022-07-06 02:24:36,553-Speed 2496.27 samples/sec Loss 11.8966 LearningRate 0.000632 Epoch: 2 Global Step: 52410 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:24:44,756-Speed 2497.09 samples/sec Loss 11.8685 LearningRate 0.000632 Epoch: 2 Global Step: 52420 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:24:52,961-Speed 2496.68 samples/sec Loss 11.8473 LearningRate 0.000632 Epoch: 2 Global Step: 52430 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:25:01,180-Speed 2492.11 samples/sec Loss 11.8593 LearningRate 0.000632 Epoch: 2 Global Step: 52440 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:25:09,333-Speed 2512.35 samples/sec Loss 11.9205 LearningRate 0.000632 Epoch: 2 Global Step: 52450 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:25:17,537-Speed 2496.72 samples/sec Loss 11.8082 LearningRate 0.000632 Epoch: 2 Global Step: 52460 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:25:25,739-Speed 2497.29 samples/sec Loss 11.8288 LearningRate 0.000632 Epoch: 2 Global Step: 52470 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:25:33,942-Speed 2497.25 samples/sec Loss 11.8408 LearningRate 0.000633 Epoch: 2 Global Step: 52480 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:25:42,142-Speed 2497.67 samples/sec Loss 11.7587 LearningRate 0.000633 Epoch: 2 Global Step: 52490 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:25:50,345-Speed 2497.32 samples/sec Loss 11.8673 LearningRate 0.000633 Epoch: 2 Global Step: 52500 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:25:58,502-Speed 2511.16 samples/sec Loss 11.9052 LearningRate 0.000633 Epoch: 2 Global Step: 52510 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:26:06,714-Speed 2494.21 samples/sec Loss 11.8761 LearningRate 0.000633 Epoch: 2 Global Step: 52520 Fp16 Grad Scale: 65536 
Required: 177 hours Training: 2022-07-06 02:26:14,915-Speed 2497.69 samples/sec Loss 11.8076 LearningRate 0.000633 Epoch: 2 Global Step: 52530 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:26:23,117-Speed 2497.31 samples/sec Loss 11.7684 LearningRate 0.000633 Epoch: 2 Global Step: 52540 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:26:31,315-Speed 2498.41 samples/sec Loss 11.9214 LearningRate 0.000633 Epoch: 2 Global Step: 52550 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:26:39,515-Speed 2498.12 samples/sec Loss 11.7364 LearningRate 0.000634 Epoch: 2 Global Step: 52560 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:26:47,658-Speed 2515.49 samples/sec Loss 11.8097 LearningRate 0.000634 Epoch: 2 Global Step: 52570 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:26:55,858-Speed 2497.84 samples/sec Loss 11.8221 LearningRate 0.000634 Epoch: 2 Global Step: 52580 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:27:04,063-Speed 2496.28 samples/sec Loss 11.8546 LearningRate 0.000634 Epoch: 2 Global Step: 52590 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:27:12,264-Speed 2497.75 samples/sec Loss 11.7899 LearningRate 0.000634 Epoch: 2 Global Step: 52600 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:27:20,464-Speed 2498.15 samples/sec Loss 11.8018 LearningRate 0.000634 Epoch: 2 Global Step: 52610 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:27:28,668-Speed 2496.73 samples/sec Loss 11.7243 LearningRate 0.000634 Epoch: 2 Global Step: 52620 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:27:36,818-Speed 2513.40 samples/sec Loss 12.0785 LearningRate 0.000634 Epoch: 2 Global Step: 52630 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:27:45,016-Speed 2498.38 samples/sec Loss 11.8126 LearningRate 0.000635 Epoch: 2 Global Step: 52640 Fp16 Grad Scale: 65536 
Required: 177 hours Training: 2022-07-06 02:27:53,218-Speed 2497.50 samples/sec Loss 11.9376 LearningRate 0.000635 Epoch: 2 Global Step: 52650 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:28:01,417-Speed 2498.34 samples/sec Loss 11.9026 LearningRate 0.000635 Epoch: 2 Global Step: 52660 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:28:09,619-Speed 2497.49 samples/sec Loss 11.8162 LearningRate 0.000635 Epoch: 2 Global Step: 52670 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:28:17,820-Speed 2497.60 samples/sec Loss 11.8934 LearningRate 0.000635 Epoch: 2 Global Step: 52680 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:28:25,967-Speed 2514.10 samples/sec Loss 11.8563 LearningRate 0.000635 Epoch: 2 Global Step: 52690 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:28:34,171-Speed 2496.65 samples/sec Loss 11.8183 LearningRate 0.000635 Epoch: 2 Global Step: 52700 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:28:42,373-Speed 2497.77 samples/sec Loss 11.8780 LearningRate 0.000635 Epoch: 2 Global Step: 52710 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:28:50,573-Speed 2498.31 samples/sec Loss 11.7522 LearningRate 0.000636 Epoch: 2 Global Step: 52720 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:28:58,780-Speed 2496.31 samples/sec Loss 11.6604 LearningRate 0.000636 Epoch: 2 Global Step: 52730 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:29:06,981-Speed 2497.50 samples/sec Loss 11.8576 LearningRate 0.000636 Epoch: 2 Global Step: 52740 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:29:15,135-Speed 2512.06 samples/sec Loss 11.8479 LearningRate 0.000636 Epoch: 2 Global Step: 52750 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:29:23,336-Speed 2497.51 samples/sec Loss 11.8039 LearningRate 0.000636 Epoch: 2 Global Step: 52760 Fp16 Grad Scale: 65536 
Required: 177 hours Training: 2022-07-06 02:29:31,552-Speed 2493.18 samples/sec Loss 11.8056 LearningRate 0.000636 Epoch: 2 Global Step: 52770 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:29:39,758-Speed 2496.25 samples/sec Loss 11.7650 LearningRate 0.000636 Epoch: 2 Global Step: 52780 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:29:47,959-Speed 2497.41 samples/sec Loss 11.7505 LearningRate 0.000636 Epoch: 2 Global Step: 52790 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:29:56,162-Speed 2496.97 samples/sec Loss 11.8171 LearningRate 0.000636 Epoch: 2 Global Step: 52800 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:30:04,316-Speed 2512.25 samples/sec Loss 11.7764 LearningRate 0.000637 Epoch: 2 Global Step: 52810 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:30:12,518-Speed 2497.99 samples/sec Loss 11.8113 LearningRate 0.000637 Epoch: 2 Global Step: 52820 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:30:20,725-Speed 2495.84 samples/sec Loss 11.7930 LearningRate 0.000637 Epoch: 2 Global Step: 52830 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:30:28,952-Speed 2489.82 samples/sec Loss 11.7242 LearningRate 0.000637 Epoch: 2 Global Step: 52840 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:30:37,156-Speed 2496.89 samples/sec Loss 11.7660 LearningRate 0.000637 Epoch: 2 Global Step: 52850 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:30:45,359-Speed 2497.02 samples/sec Loss 11.7669 LearningRate 0.000637 Epoch: 2 Global Step: 52860 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:30:53,507-Speed 2513.95 samples/sec Loss 11.8271 LearningRate 0.000637 Epoch: 2 Global Step: 52870 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:31:01,710-Speed 2496.92 samples/sec Loss 11.6126 LearningRate 0.000637 Epoch: 2 Global Step: 52880 Fp16 Grad Scale: 65536 
Required: 177 hours Training: 2022-07-06 02:31:09,918-Speed 2496.16 samples/sec Loss 11.7024 LearningRate 0.000638 Epoch: 2 Global Step: 52890 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:31:18,122-Speed 2496.60 samples/sec Loss 11.7238 LearningRate 0.000638 Epoch: 2 Global Step: 52900 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:31:26,325-Speed 2497.46 samples/sec Loss 11.7872 LearningRate 0.000638 Epoch: 2 Global Step: 52910 Fp16 Grad Scale: 65536 Required: 177 hours Training: 2022-07-06 02:31:34,541-Speed 2493.11 samples/sec Loss 11.6392 LearningRate 0.000638 Epoch: 2 Global Step: 52920 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:31:42,687-Speed 2514.54 samples/sec Loss 11.7316 LearningRate 0.000638 Epoch: 2 Global Step: 52930 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:31:50,889-Speed 2497.23 samples/sec Loss 11.6718 LearningRate 0.000638 Epoch: 2 Global Step: 52940 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:31:59,090-Speed 2497.77 samples/sec Loss 11.6552 LearningRate 0.000638 Epoch: 2 Global Step: 52950 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:32:07,310-Speed 2491.90 samples/sec Loss 11.5497 LearningRate 0.000638 Epoch: 2 Global Step: 52960 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:32:15,515-Speed 2496.25 samples/sec Loss 11.6764 LearningRate 0.000639 Epoch: 2 Global Step: 52970 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:32:23,715-Speed 2498.11 samples/sec Loss 11.6101 LearningRate 0.000639 Epoch: 2 Global Step: 52980 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:32:31,866-Speed 2512.97 samples/sec Loss 11.5839 LearningRate 0.000639 Epoch: 2 Global Step: 52990 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:32:40,068-Speed 2497.79 samples/sec Loss 11.6161 LearningRate 0.000639 Epoch: 2 Global Step: 53000 Fp16 Grad Scale: 
131072 Required: 177 hours Training: 2022-07-06 02:32:48,270-Speed 2497.23 samples/sec Loss 11.6618 LearningRate 0.000639 Epoch: 2 Global Step: 53010 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:32:56,484-Speed 2493.66 samples/sec Loss 11.4951 LearningRate 0.000639 Epoch: 2 Global Step: 53020 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:33:04,687-Speed 2497.50 samples/sec Loss 11.6318 LearningRate 0.000639 Epoch: 2 Global Step: 53030 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:33:12,891-Speed 2496.50 samples/sec Loss 11.6573 LearningRate 0.000639 Epoch: 2 Global Step: 53040 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:33:21,041-Speed 2513.19 samples/sec Loss 11.8272 LearningRate 0.000639 Epoch: 2 Global Step: 53050 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:33:29,243-Speed 2497.58 samples/sec Loss 11.6439 LearningRate 0.000640 Epoch: 2 Global Step: 53060 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:33:37,447-Speed 2496.68 samples/sec Loss 11.5802 LearningRate 0.000640 Epoch: 2 Global Step: 53070 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:33:45,650-Speed 2497.12 samples/sec Loss 11.5904 LearningRate 0.000640 Epoch: 2 Global Step: 53080 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:33:53,863-Speed 2493.92 samples/sec Loss 11.6834 LearningRate 0.000640 Epoch: 2 Global Step: 53090 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:34:02,066-Speed 2497.33 samples/sec Loss 11.6560 LearningRate 0.000640 Epoch: 2 Global Step: 53100 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:34:10,213-Speed 2514.82 samples/sec Loss 11.5812 LearningRate 0.000640 Epoch: 2 Global Step: 53110 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:34:18,417-Speed 2496.73 samples/sec Loss 11.5713 LearningRate 0.000640 Epoch: 2 Global Step: 53120 Fp16 Grad 
Scale: 131072 Required: 177 hours Training: 2022-07-06 02:34:26,620-Speed 2497.16 samples/sec Loss 11.5249 LearningRate 0.000640 Epoch: 2 Global Step: 53130 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:34:34,818-Speed 2498.26 samples/sec Loss 11.5141 LearningRate 0.000641 Epoch: 2 Global Step: 53140 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:34:43,020-Speed 2497.51 samples/sec Loss 11.5432 LearningRate 0.000641 Epoch: 2 Global Step: 53150 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:34:51,225-Speed 2496.40 samples/sec Loss 11.6219 LearningRate 0.000641 Epoch: 2 Global Step: 53160 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:34:59,373-Speed 2514.08 samples/sec Loss 11.5262 LearningRate 0.000641 Epoch: 2 Global Step: 53170 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:35:07,576-Speed 2496.93 samples/sec Loss 11.5318 LearningRate 0.000641 Epoch: 2 Global Step: 53180 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:35:15,777-Speed 2497.82 samples/sec Loss 11.5122 LearningRate 0.000641 Epoch: 2 Global Step: 53190 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:35:23,980-Speed 2496.78 samples/sec Loss 11.4562 LearningRate 0.000641 Epoch: 2 Global Step: 53200 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:35:32,186-Speed 2496.31 samples/sec Loss 11.4555 LearningRate 0.000641 Epoch: 2 Global Step: 53210 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:35:40,391-Speed 2496.43 samples/sec Loss 11.5835 LearningRate 0.000642 Epoch: 2 Global Step: 53220 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:35:48,540-Speed 2513.59 samples/sec Loss 11.5344 LearningRate 0.000642 Epoch: 2 Global Step: 53230 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:35:56,763-Speed 2490.90 samples/sec Loss 11.5159 LearningRate 0.000642 Epoch: 2 Global Step: 53240 Fp16 
Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:36:04,969-Speed 2495.94 samples/sec Loss 11.5657 LearningRate 0.000642 Epoch: 2 Global Step: 53250 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:36:13,173-Speed 2496.99 samples/sec Loss 11.5191 LearningRate 0.000642 Epoch: 2 Global Step: 53260 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:36:21,382-Speed 2495.21 samples/sec Loss 11.5056 LearningRate 0.000642 Epoch: 2 Global Step: 53270 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:36:29,599-Speed 2492.58 samples/sec Loss 11.5296 LearningRate 0.000642 Epoch: 2 Global Step: 53280 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:36:37,751-Speed 2512.73 samples/sec Loss 11.5104 LearningRate 0.000642 Epoch: 2 Global Step: 53290 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:36:45,969-Speed 2492.43 samples/sec Loss 11.5831 LearningRate 0.000642 Epoch: 2 Global Step: 53300 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:36:54,175-Speed 2496.12 samples/sec Loss 11.5781 LearningRate 0.000643 Epoch: 2 Global Step: 53310 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:37:02,378-Speed 2497.20 samples/sec Loss 11.5292 LearningRate 0.000643 Epoch: 2 Global Step: 53320 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:37:10,588-Speed 2494.83 samples/sec Loss 11.5322 LearningRate 0.000643 Epoch: 2 Global Step: 53330 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:37:18,790-Speed 2497.48 samples/sec Loss 11.4921 LearningRate 0.000643 Epoch: 2 Global Step: 53340 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:37:26,949-Speed 2510.57 samples/sec Loss 11.7886 LearningRate 0.000643 Epoch: 2 Global Step: 53350 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:37:35,151-Speed 2497.57 samples/sec Loss 11.6396 LearningRate 0.000643 Epoch: 2 Global Step: 53360 
Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:37:43,353-Speed 2497.42 samples/sec Loss 11.5704 LearningRate 0.000643 Epoch: 2 Global Step: 53370 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:37:51,555-Speed 2497.36 samples/sec Loss 11.5314 LearningRate 0.000643 Epoch: 2 Global Step: 53380 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:37:59,755-Speed 2498.58 samples/sec Loss 11.5749 LearningRate 0.000644 Epoch: 2 Global Step: 53390 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:38:07,968-Speed 2493.75 samples/sec Loss 11.5212 LearningRate 0.000644 Epoch: 2 Global Step: 53400 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:38:16,117-Speed 2513.56 samples/sec Loss 11.5868 LearningRate 0.000644 Epoch: 2 Global Step: 53410 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:38:24,318-Speed 2498.11 samples/sec Loss 11.5557 LearningRate 0.000644 Epoch: 2 Global Step: 53420 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:38:32,533-Speed 2493.50 samples/sec Loss 11.5282 LearningRate 0.000644 Epoch: 2 Global Step: 53430 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:38:40,736-Speed 2497.03 samples/sec Loss 11.4381 LearningRate 0.000644 Epoch: 2 Global Step: 53440 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:38:48,934-Speed 2498.30 samples/sec Loss 11.4889 LearningRate 0.000644 Epoch: 2 Global Step: 53450 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:38:57,138-Speed 2496.58 samples/sec Loss 11.4901 LearningRate 0.000644 Epoch: 2 Global Step: 53460 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:39:05,296-Speed 2511.00 samples/sec Loss 11.5925 LearningRate 0.000645 Epoch: 2 Global Step: 53470 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:39:13,502-Speed 2496.12 samples/sec Loss 11.5284 LearningRate 0.000645 Epoch: 2 Global Step: 
53480 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:39:21,703-Speed 2497.78 samples/sec Loss 11.4874 LearningRate 0.000645 Epoch: 2 Global Step: 53490 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:39:29,903-Speed 2498.10 samples/sec Loss 11.4054 LearningRate 0.000645 Epoch: 2 Global Step: 53500 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:39:38,120-Speed 2492.65 samples/sec Loss 11.4111 LearningRate 0.000645 Epoch: 2 Global Step: 53510 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:39:46,323-Speed 2496.99 samples/sec Loss 11.3942 LearningRate 0.000645 Epoch: 2 Global Step: 53520 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:39:54,477-Speed 2512.25 samples/sec Loss 11.5793 LearningRate 0.000645 Epoch: 2 Global Step: 53530 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:40:02,679-Speed 2497.43 samples/sec Loss 11.5436 LearningRate 0.000645 Epoch: 2 Global Step: 53540 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:40:10,880-Speed 2497.59 samples/sec Loss 11.5047 LearningRate 0.000646 Epoch: 2 Global Step: 53550 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:40:19,085-Speed 2496.68 samples/sec Loss 11.5739 LearningRate 0.000646 Epoch: 2 Global Step: 53560 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:40:27,286-Speed 2497.43 samples/sec Loss 11.5491 LearningRate 0.000646 Epoch: 2 Global Step: 53570 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:40:35,494-Speed 2495.62 samples/sec Loss 11.5496 LearningRate 0.000646 Epoch: 2 Global Step: 53580 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:40:43,643-Speed 2513.81 samples/sec Loss 11.4567 LearningRate 0.000646 Epoch: 2 Global Step: 53590 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:40:51,845-Speed 2497.60 samples/sec Loss 11.5533 LearningRate 0.000646 Epoch: 2 Global 
Step: 53600 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:41:00,053-Speed 2495.39 samples/sec Loss 11.4483 LearningRate 0.000646 Epoch: 2 Global Step: 53610 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:41:08,267-Speed 2493.78 samples/sec Loss 11.4929 LearningRate 0.000646 Epoch: 2 Global Step: 53620 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:41:16,468-Speed 2497.74 samples/sec Loss 11.4145 LearningRate 0.000646 Epoch: 2 Global Step: 53630 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:41:24,668-Speed 2497.91 samples/sec Loss 11.3594 LearningRate 0.000647 Epoch: 2 Global Step: 53640 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:41:32,816-Speed 2513.90 samples/sec Loss 11.5977 LearningRate 0.000647 Epoch: 2 Global Step: 53650 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:41:41,019-Speed 2497.12 samples/sec Loss 11.5396 LearningRate 0.000647 Epoch: 2 Global Step: 53660 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:41:49,226-Speed 2495.85 samples/sec Loss 11.6017 LearningRate 0.000647 Epoch: 2 Global Step: 53670 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:41:57,424-Speed 2498.44 samples/sec Loss 11.6186 LearningRate 0.000647 Epoch: 2 Global Step: 53680 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:42:05,624-Speed 2498.06 samples/sec Loss 11.5988 LearningRate 0.000647 Epoch: 2 Global Step: 53690 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:42:13,827-Speed 2497.13 samples/sec Loss 11.4792 LearningRate 0.000647 Epoch: 2 Global Step: 53700 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:42:21,974-Speed 2514.12 samples/sec Loss 11.5630 LearningRate 0.000647 Epoch: 2 Global Step: 53710 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:42:30,179-Speed 2496.51 samples/sec Loss 11.4999 LearningRate 0.000648 Epoch: 2 
Global Step: 53720 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:42:38,379-Speed 2497.91 samples/sec Loss 11.4414 LearningRate 0.000648 Epoch: 2 Global Step: 53730 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:42:46,582-Speed 2497.18 samples/sec Loss 11.4101 LearningRate 0.000648 Epoch: 2 Global Step: 53740 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:42:54,782-Speed 2497.97 samples/sec Loss 11.3458 LearningRate 0.000648 Epoch: 2 Global Step: 53750 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:43:02,983-Speed 2497.45 samples/sec Loss 11.4210 LearningRate 0.000648 Epoch: 2 Global Step: 53760 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:43:11,130-Speed 2514.35 samples/sec Loss 11.4039 LearningRate 0.000648 Epoch: 2 Global Step: 53770 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:43:19,327-Speed 2498.96 samples/sec Loss 11.5127 LearningRate 0.000648 Epoch: 2 Global Step: 53780 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:43:27,526-Speed 2498.26 samples/sec Loss 11.2828 LearningRate 0.000648 Epoch: 2 Global Step: 53790 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:43:35,728-Speed 2497.42 samples/sec Loss 11.3689 LearningRate 0.000649 Epoch: 2 Global Step: 53800 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:43:43,927-Speed 2498.28 samples/sec Loss 11.3506 LearningRate 0.000649 Epoch: 2 Global Step: 53810 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:43:52,131-Speed 2496.87 samples/sec Loss 11.4096 LearningRate 0.000649 Epoch: 2 Global Step: 53820 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:44:00,281-Speed 2513.42 samples/sec Loss 11.4093 LearningRate 0.000649 Epoch: 2 Global Step: 53830 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:44:08,483-Speed 2497.40 samples/sec Loss 11.4514 LearningRate 0.000649 
Epoch: 2 Global Step: 53840 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:44:16,685-Speed 2497.35 samples/sec Loss 11.4038 LearningRate 0.000649 Epoch: 2 Global Step: 53850 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:44:24,886-Speed 2497.72 samples/sec Loss 11.4792 LearningRate 0.000649 Epoch: 2 Global Step: 53860 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:44:33,083-Speed 2498.60 samples/sec Loss 11.4677 LearningRate 0.000649 Epoch: 2 Global Step: 53870 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:44:41,283-Speed 2498.05 samples/sec Loss 11.4119 LearningRate 0.000649 Epoch: 2 Global Step: 53880 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:44:49,439-Speed 2511.51 samples/sec Loss 11.4948 LearningRate 0.000650 Epoch: 2 Global Step: 53890 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:44:57,645-Speed 2496.10 samples/sec Loss 11.4536 LearningRate 0.000650 Epoch: 2 Global Step: 53900 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:45:05,846-Speed 2497.60 samples/sec Loss 11.4367 LearningRate 0.000650 Epoch: 2 Global Step: 53910 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:45:14,047-Speed 2497.40 samples/sec Loss 11.4192 LearningRate 0.000650 Epoch: 2 Global Step: 53920 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:45:22,249-Speed 2497.28 samples/sec Loss 11.3547 LearningRate 0.000650 Epoch: 2 Global Step: 53930 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:45:30,451-Speed 2497.37 samples/sec Loss 11.3898 LearningRate 0.000650 Epoch: 2 Global Step: 53940 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:45:38,602-Speed 2513.00 samples/sec Loss 11.2636 LearningRate 0.000650 Epoch: 2 Global Step: 53950 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:45:46,803-Speed 2497.64 samples/sec Loss 11.3765 LearningRate 
0.000650 Epoch: 2 Global Step: 53960 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:45:55,005-Speed 2497.58 samples/sec Loss 11.4782 LearningRate 0.000651 Epoch: 2 Global Step: 53970 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:46:03,204-Speed 2498.10 samples/sec Loss 11.3792 LearningRate 0.000651 Epoch: 2 Global Step: 53980 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:46:11,406-Speed 2497.52 samples/sec Loss 11.2905 LearningRate 0.000651 Epoch: 2 Global Step: 53990 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:46:19,605-Speed 2498.40 samples/sec Loss 11.4793 LearningRate 0.000651 Epoch: 2 Global Step: 54000 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:46:27,752-Speed 2514.06 samples/sec Loss 11.4614 LearningRate 0.000651 Epoch: 2 Global Step: 54010 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:46:35,961-Speed 2495.33 samples/sec Loss 11.5050 LearningRate 0.000651 Epoch: 2 Global Step: 54020 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:46:44,161-Speed 2497.73 samples/sec Loss 11.6854 LearningRate 0.000651 Epoch: 2 Global Step: 54030 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:46:52,366-Speed 2496.60 samples/sec Loss 11.5961 LearningRate 0.000651 Epoch: 2 Global Step: 54040 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:47:00,566-Speed 2498.21 samples/sec Loss 11.7228 LearningRate 0.000652 Epoch: 2 Global Step: 54050 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:47:08,766-Speed 2497.88 samples/sec Loss 11.7934 LearningRate 0.000652 Epoch: 2 Global Step: 54060 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:47:16,914-Speed 2513.94 samples/sec Loss 11.7936 LearningRate 0.000652 Epoch: 2 Global Step: 54070 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:47:25,115-Speed 2497.44 samples/sec Loss 11.6079 
LearningRate 0.000652 Epoch: 2 Global Step: 54080 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:47:33,332-Speed 2492.76 samples/sec Loss 11.5773 LearningRate 0.000652 Epoch: 2 Global Step: 54090 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:47:41,539-Speed 2495.87 samples/sec Loss 11.4865 LearningRate 0.000652 Epoch: 2 Global Step: 54100 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:47:49,741-Speed 2497.17 samples/sec Loss 11.5197 LearningRate 0.000652 Epoch: 2 Global Step: 54110 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:47:57,941-Speed 2497.86 samples/sec Loss 11.4897 LearningRate 0.000652 Epoch: 2 Global Step: 54120 Fp16 Grad Scale: 262144 Required: 177 hours Training: 2022-07-06 02:48:06,104-Speed 2509.28 samples/sec Loss 11.5506 LearningRate 0.000653 Epoch: 2 Global Step: 54130 Fp16 Grad Scale: 262144 Required: 177 hours Training: 2022-07-06 02:48:14,275-Speed 2506.75 samples/sec Loss 11.6097 LearningRate 0.000653 Epoch: 2 Global Step: 54140 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:48:22,476-Speed 2497.66 samples/sec Loss 11.4887 LearningRate 0.000653 Epoch: 2 Global Step: 54150 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:48:30,685-Speed 2495.27 samples/sec Loss 11.3928 LearningRate 0.000653 Epoch: 2 Global Step: 54160 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:48:38,887-Speed 2497.44 samples/sec Loss 11.3400 LearningRate 0.000653 Epoch: 2 Global Step: 54170 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:48:47,092-Speed 2496.41 samples/sec Loss 11.4593 LearningRate 0.000653 Epoch: 2 Global Step: 54180 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:48:55,241-Speed 2513.80 samples/sec Loss 11.3461 LearningRate 0.000653 Epoch: 2 Global Step: 54190 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:49:03,454-Speed 2493.99 samples/sec Loss 
11.5032 LearningRate 0.000653 Epoch: 2 Global Step: 54200 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:49:11,654-Speed 2498.20 samples/sec Loss 11.4757 LearningRate 0.000653 Epoch: 2 Global Step: 54210 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:49:19,852-Speed 2498.47 samples/sec Loss 11.4616 LearningRate 0.000654 Epoch: 2 Global Step: 54220 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:49:28,069-Speed 2492.90 samples/sec Loss 11.4600 LearningRate 0.000654 Epoch: 2 Global Step: 54230 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:49:36,284-Speed 2493.51 samples/sec Loss 11.3648 LearningRate 0.000654 Epoch: 2 Global Step: 54240 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:49:44,428-Speed 2514.93 samples/sec Loss 11.4121 LearningRate 0.000654 Epoch: 2 Global Step: 54250 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:49:52,627-Speed 2498.36 samples/sec Loss 11.3989 LearningRate 0.000654 Epoch: 2 Global Step: 54260 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:50:00,834-Speed 2495.86 samples/sec Loss 11.3930 LearningRate 0.000654 Epoch: 2 Global Step: 54270 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:50:09,034-Speed 2497.97 samples/sec Loss 11.4394 LearningRate 0.000654 Epoch: 2 Global Step: 54280 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:50:17,233-Speed 2498.32 samples/sec Loss 11.4016 LearningRate 0.000654 Epoch: 2 Global Step: 54290 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:50:25,433-Speed 2497.88 samples/sec Loss 11.4126 LearningRate 0.000655 Epoch: 2 Global Step: 54300 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:50:33,585-Speed 2512.78 samples/sec Loss 11.2957 LearningRate 0.000655 Epoch: 2 Global Step: 54310 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:50:41,791-Speed 2496.06 samples/sec 
Loss 11.3299 LearningRate 0.000655 Epoch: 2 Global Step: 54320 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:50:49,994-Speed 2497.26 samples/sec Loss 11.3697 LearningRate 0.000655 Epoch: 2 Global Step: 54330 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:50:58,199-Speed 2496.51 samples/sec Loss 11.3385 LearningRate 0.000655 Epoch: 2 Global Step: 54340 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:51:06,405-Speed 2496.25 samples/sec Loss 11.3577 LearningRate 0.000655 Epoch: 2 Global Step: 54350 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:51:14,615-Speed 2494.59 samples/sec Loss 11.3308 LearningRate 0.000655 Epoch: 2 Global Step: 54360 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:51:22,763-Speed 2513.94 samples/sec Loss 11.3666 LearningRate 0.000655 Epoch: 2 Global Step: 54370 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:51:30,966-Speed 2497.17 samples/sec Loss 11.2288 LearningRate 0.000656 Epoch: 2 Global Step: 54380 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:51:39,165-Speed 2498.01 samples/sec Loss 11.3985 LearningRate 0.000656 Epoch: 2 Global Step: 54390 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:51:47,365-Speed 2497.86 samples/sec Loss 11.3727 LearningRate 0.000656 Epoch: 2 Global Step: 54400 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:51:55,579-Speed 2493.86 samples/sec Loss 11.4132 LearningRate 0.000656 Epoch: 2 Global Step: 54410 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:52:03,781-Speed 2497.47 samples/sec Loss 11.3306 LearningRate 0.000656 Epoch: 2 Global Step: 54420 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:52:11,930-Speed 2513.63 samples/sec Loss 11.3906 LearningRate 0.000656 Epoch: 2 Global Step: 54430 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:52:20,132-Speed 2497.24 
samples/sec Loss 11.2655 LearningRate 0.000656 Epoch: 2 Global Step: 54440 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:52:28,339-Speed 2495.95 samples/sec Loss 11.2843 LearningRate 0.000656 Epoch: 2 Global Step: 54450 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:52:36,543-Speed 2496.93 samples/sec Loss 11.2222 LearningRate 0.000656 Epoch: 2 Global Step: 54460 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:52:44,747-Speed 2496.59 samples/sec Loss 11.2834 LearningRate 0.000657 Epoch: 2 Global Step: 54470 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:52:52,954-Speed 2495.53 samples/sec Loss 11.2130 LearningRate 0.000657 Epoch: 2 Global Step: 54480 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:53:01,105-Speed 2513.23 samples/sec Loss 11.1738 LearningRate 0.000657 Epoch: 2 Global Step: 54490 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:53:09,320-Speed 2493.35 samples/sec Loss 11.2536 LearningRate 0.000657 Epoch: 2 Global Step: 54500 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:53:17,531-Speed 2494.84 samples/sec Loss 11.1707 LearningRate 0.000657 Epoch: 2 Global Step: 54510 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:53:25,730-Speed 2498.38 samples/sec Loss 11.2392 LearningRate 0.000657 Epoch: 2 Global Step: 54520 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:53:33,935-Speed 2496.62 samples/sec Loss 11.4931 LearningRate 0.000657 Epoch: 2 Global Step: 54530 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:53:42,146-Speed 2494.72 samples/sec Loss 11.5023 LearningRate 0.000657 Epoch: 2 Global Step: 54540 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:53:50,290-Speed 2515.02 samples/sec Loss 11.2808 LearningRate 0.000658 Epoch: 2 Global Step: 54550 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:53:58,497-Speed 
2495.78 samples/sec Loss 11.4900 LearningRate 0.000658 Epoch: 2 Global Step: 54560 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:54:06,696-Speed 2498.25 samples/sec Loss 11.3207 LearningRate 0.000658 Epoch: 2 Global Step: 54570 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:54:14,897-Speed 2497.75 samples/sec Loss 11.2856 LearningRate 0.000658 Epoch: 2 Global Step: 54580 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:54:23,106-Speed 2495.02 samples/sec Loss 11.4175 LearningRate 0.000658 Epoch: 2 Global Step: 54590 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:54:31,305-Speed 2498.39 samples/sec Loss 11.3260 LearningRate 0.000658 Epoch: 2 Global Step: 54600 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:54:39,454-Speed 2513.40 samples/sec Loss 11.1379 LearningRate 0.000658 Epoch: 2 Global Step: 54610 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:54:47,669-Speed 2493.94 samples/sec Loss 11.3133 LearningRate 0.000658 Epoch: 2 Global Step: 54620 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:54:55,878-Speed 2495.23 samples/sec Loss 11.3312 LearningRate 0.000659 Epoch: 2 Global Step: 54630 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:55:04,083-Speed 2496.68 samples/sec Loss 11.2747 LearningRate 0.000659 Epoch: 2 Global Step: 54640 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:55:12,288-Speed 2496.31 samples/sec Loss 11.3137 LearningRate 0.000659 Epoch: 2 Global Step: 54650 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:55:20,490-Speed 2497.26 samples/sec Loss 11.2363 LearningRate 0.000659 Epoch: 2 Global Step: 54660 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:55:28,638-Speed 2513.92 samples/sec Loss 11.2612 LearningRate 0.000659 Epoch: 2 Global Step: 54670 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 
02:55:36,839-Speed 2497.78 samples/sec Loss 11.2929 LearningRate 0.000659 Epoch: 2 Global Step: 54680 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:55:45,040-Speed 2497.48 samples/sec Loss 11.2509 LearningRate 0.000659 Epoch: 2 Global Step: 54690 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:55:53,245-Speed 2496.42 samples/sec Loss 11.3576 LearningRate 0.000659 Epoch: 2 Global Step: 54700 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:56:01,444-Speed 2498.32 samples/sec Loss 11.3072 LearningRate 0.000659 Epoch: 2 Global Step: 54710 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:56:09,643-Speed 2498.25 samples/sec Loss 11.2384 LearningRate 0.000660 Epoch: 2 Global Step: 54720 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:56:17,789-Speed 2514.65 samples/sec Loss 11.2515 LearningRate 0.000660 Epoch: 2 Global Step: 54730 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:56:25,988-Speed 2498.01 samples/sec Loss 11.3564 LearningRate 0.000660 Epoch: 2 Global Step: 54740 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:56:34,193-Speed 2496.51 samples/sec Loss 11.2767 LearningRate 0.000660 Epoch: 2 Global Step: 54750 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:56:42,393-Speed 2497.90 samples/sec Loss 11.2848 LearningRate 0.000660 Epoch: 2 Global Step: 54760 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:56:50,608-Speed 2493.51 samples/sec Loss 11.3929 LearningRate 0.000660 Epoch: 2 Global Step: 54770 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:56:58,806-Speed 2498.40 samples/sec Loss 11.3412 LearningRate 0.000660 Epoch: 2 Global Step: 54780 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:57:06,956-Speed 2513.25 samples/sec Loss 11.2844 LearningRate 0.000660 Epoch: 2 Global Step: 54790 Fp16 Grad Scale: 131072 Required: 177 hours Training: 
2022-07-06 02:57:15,164-Speed 2495.73 samples/sec Loss 11.2712 LearningRate 0.000661 Epoch: 2 Global Step: 54800 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:57:23,363-Speed 2498.18 samples/sec Loss 11.1819 LearningRate 0.000661 Epoch: 2 Global Step: 54810 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:57:31,572-Speed 2495.13 samples/sec Loss 11.2300 LearningRate 0.000661 Epoch: 2 Global Step: 54820 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:57:39,773-Speed 2497.77 samples/sec Loss 11.2905 LearningRate 0.000661 Epoch: 2 Global Step: 54830 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:57:47,974-Speed 2497.82 samples/sec Loss 11.2082 LearningRate 0.000661 Epoch: 2 Global Step: 54840 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:57:56,122-Speed 2513.79 samples/sec Loss 11.2900 LearningRate 0.000661 Epoch: 2 Global Step: 54850 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:58:04,325-Speed 2497.15 samples/sec Loss 11.2483 LearningRate 0.000661 Epoch: 2 Global Step: 54860 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:58:12,529-Speed 2496.57 samples/sec Loss 11.2358 LearningRate 0.000661 Epoch: 2 Global Step: 54870 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:58:20,732-Speed 2497.36 samples/sec Loss 11.2889 LearningRate 0.000662 Epoch: 2 Global Step: 54880 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:58:28,934-Speed 2497.39 samples/sec Loss 11.3518 LearningRate 0.000662 Epoch: 2 Global Step: 54890 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:58:37,139-Speed 2496.46 samples/sec Loss 11.3871 LearningRate 0.000662 Epoch: 2 Global Step: 54900 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:58:45,284-Speed 2514.66 samples/sec Loss 11.2704 LearningRate 0.000662 Epoch: 2 Global Step: 54910 Fp16 Grad Scale: 131072 Required: 177 hours 
Training: 2022-07-06 02:58:53,490-Speed 2496.43 samples/sec Loss 11.1654 LearningRate 0.000662 Epoch: 2 Global Step: 54920 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:59:01,706-Speed 2493.05 samples/sec Loss 11.2403 LearningRate 0.000662 Epoch: 2 Global Step: 54930 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:59:09,908-Speed 2497.41 samples/sec Loss 11.2149 LearningRate 0.000662 Epoch: 2 Global Step: 54940 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:59:18,106-Speed 2498.33 samples/sec Loss 11.2375 LearningRate 0.000662 Epoch: 2 Global Step: 54950 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:59:26,308-Speed 2497.45 samples/sec Loss 11.1496 LearningRate 0.000663 Epoch: 2 Global Step: 54960 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:59:34,452-Speed 2515.20 samples/sec Loss 11.2378 LearningRate 0.000663 Epoch: 2 Global Step: 54970 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:59:42,652-Speed 2498.06 samples/sec Loss 11.0952 LearningRate 0.000663 Epoch: 2 Global Step: 54980 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:59:50,851-Speed 2498.04 samples/sec Loss 11.1407 LearningRate 0.000663 Epoch: 2 Global Step: 54990 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 02:59:59,051-Speed 2498.00 samples/sec Loss 11.1807 LearningRate 0.000663 Epoch: 2 Global Step: 55000 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:00:07,254-Speed 2497.16 samples/sec Loss 11.1000 LearningRate 0.000663 Epoch: 2 Global Step: 55010 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:00:15,454-Speed 2497.68 samples/sec Loss 11.1007 LearningRate 0.000663 Epoch: 2 Global Step: 55020 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:00:23,601-Speed 2514.58 samples/sec Loss 11.1337 LearningRate 0.000663 Epoch: 2 Global Step: 55030 Fp16 Grad Scale: 131072 Required: 177 
hours Training: 2022-07-06 03:00:31,817-Speed 2493.11 samples/sec Loss 11.1993 LearningRate 0.000663 Epoch: 2 Global Step: 55040 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:00:40,016-Speed 2498.19 samples/sec Loss 11.1326 LearningRate 0.000664 Epoch: 2 Global Step: 55050 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:00:48,216-Speed 2498.12 samples/sec Loss 11.2297 LearningRate 0.000664 Epoch: 2 Global Step: 55060 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:00:56,420-Speed 2496.74 samples/sec Loss 11.1088 LearningRate 0.000664 Epoch: 2 Global Step: 55070 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:01:04,624-Speed 2496.62 samples/sec Loss 11.0975 LearningRate 0.000664 Epoch: 2 Global Step: 55080 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:01:12,776-Speed 2512.59 samples/sec Loss 11.1802 LearningRate 0.000664 Epoch: 2 Global Step: 55090 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:01:20,980-Speed 2496.80 samples/sec Loss 11.2324 LearningRate 0.000664 Epoch: 2 Global Step: 55100 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:01:29,180-Speed 2498.20 samples/sec Loss 11.1522 LearningRate 0.000664 Epoch: 2 Global Step: 55110 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:01:37,379-Speed 2498.46 samples/sec Loss 11.1572 LearningRate 0.000664 Epoch: 2 Global Step: 55120 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:01:45,578-Speed 2498.02 samples/sec Loss 11.1176 LearningRate 0.000665 Epoch: 2 Global Step: 55130 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:01:53,777-Speed 2498.30 samples/sec Loss 11.0777 LearningRate 0.000665 Epoch: 2 Global Step: 55140 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:02:01,926-Speed 2513.63 samples/sec Loss 11.1678 LearningRate 0.000665 Epoch: 2 Global Step: 55150 Fp16 Grad Scale: 131072 Required: 
177 hours Training: 2022-07-06 03:02:10,130-Speed 2496.68 samples/sec Loss 11.2012 LearningRate 0.000665 Epoch: 2 Global Step: 55160 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:02:18,328-Speed 2498.60 samples/sec Loss 11.1856 LearningRate 0.000665 Epoch: 2 Global Step: 55170 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:02:26,543-Speed 2493.61 samples/sec Loss 11.2109 LearningRate 0.000665 Epoch: 2 Global Step: 55180 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:02:34,743-Speed 2498.07 samples/sec Loss 11.1130 LearningRate 0.000665 Epoch: 2 Global Step: 55190 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:02:42,941-Speed 2498.44 samples/sec Loss 11.1424 LearningRate 0.000665 Epoch: 2 Global Step: 55200 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:02:51,089-Speed 2513.80 samples/sec Loss 11.1452 LearningRate 0.000666 Epoch: 2 Global Step: 55210 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:02:59,290-Speed 2497.49 samples/sec Loss 11.2627 LearningRate 0.000666 Epoch: 2 Global Step: 55220 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:03:07,492-Speed 2497.90 samples/sec Loss 11.0777 LearningRate 0.000666 Epoch: 2 Global Step: 55230 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:03:15,696-Speed 2496.62 samples/sec Loss 11.1249 LearningRate 0.000666 Epoch: 2 Global Step: 55240 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:03:23,898-Speed 2497.32 samples/sec Loss 11.0637 LearningRate 0.000666 Epoch: 2 Global Step: 55250 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:03:32,112-Speed 2493.66 samples/sec Loss 11.1380 LearningRate 0.000666 Epoch: 2 Global Step: 55260 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:03:40,263-Speed 2512.99 samples/sec Loss 11.1619 LearningRate 0.000666 Epoch: 2 Global Step: 55270 Fp16 Grad Scale: 131072 
Required: 177 hours Training: 2022-07-06 03:03:48,465-Speed 2497.51 samples/sec Loss 11.0942 LearningRate 0.000666 Epoch: 2 Global Step: 55280 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:03:56,668-Speed 2497.13 samples/sec Loss 11.1553 LearningRate 0.000666 Epoch: 2 Global Step: 55290 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:04:04,877-Speed 2495.11 samples/sec Loss 11.2736 LearningRate 0.000667 Epoch: 2 Global Step: 55300 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:04:13,079-Speed 2497.66 samples/sec Loss 11.1305 LearningRate 0.000667 Epoch: 2 Global Step: 55310 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:04:21,283-Speed 2496.63 samples/sec Loss 11.0706 LearningRate 0.000667 Epoch: 2 Global Step: 55320 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:04:29,430-Speed 2514.00 samples/sec Loss 11.1364 LearningRate 0.000667 Epoch: 2 Global Step: 55330 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:04:37,633-Speed 2497.28 samples/sec Loss 11.0708 LearningRate 0.000667 Epoch: 2 Global Step: 55340 Fp16 Grad Scale: 262144 Required: 177 hours Training: 2022-07-06 03:04:45,792-Speed 2510.53 samples/sec Loss 11.0231 LearningRate 0.000667 Epoch: 2 Global Step: 55350 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:04:53,996-Speed 2496.58 samples/sec Loss 11.0912 LearningRate 0.000667 Epoch: 2 Global Step: 55360 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:05:02,200-Speed 2497.03 samples/sec Loss 11.1493 LearningRate 0.000667 Epoch: 2 Global Step: 55370 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:05:10,406-Speed 2495.95 samples/sec Loss 11.1123 LearningRate 0.000668 Epoch: 2 Global Step: 55380 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:05:18,579-Speed 2506.23 samples/sec Loss 11.0430 LearningRate 0.000668 Epoch: 2 Global Step: 55390 Fp16 Grad Scale: 
131072 Required: 177 hours Training: 2022-07-06 03:05:26,783-Speed 2496.83 samples/sec Loss 10.9871 LearningRate 0.000668 Epoch: 2 Global Step: 55400 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:05:34,983-Speed 2497.91 samples/sec Loss 11.0375 LearningRate 0.000668 Epoch: 2 Global Step: 55410 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:05:43,187-Speed 2496.82 samples/sec Loss 11.1456 LearningRate 0.000668 Epoch: 2 Global Step: 55420 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:05:51,391-Speed 2496.62 samples/sec Loss 11.1248 LearningRate 0.000668 Epoch: 2 Global Step: 55430 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:05:59,602-Speed 2494.44 samples/sec Loss 11.0197 LearningRate 0.000668 Epoch: 2 Global Step: 55440 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:06:07,757-Speed 2511.81 samples/sec Loss 11.0141 LearningRate 0.000668 Epoch: 2 Global Step: 55450 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:06:15,970-Speed 2493.91 samples/sec Loss 11.0120 LearningRate 0.000669 Epoch: 2 Global Step: 55460 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:06:24,175-Speed 2497.11 samples/sec Loss 11.0587 LearningRate 0.000669 Epoch: 2 Global Step: 55470 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:06:32,393-Speed 2492.32 samples/sec Loss 11.1761 LearningRate 0.000669 Epoch: 2 Global Step: 55480 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:06:40,605-Speed 2494.20 samples/sec Loss 10.9915 LearningRate 0.000669 Epoch: 2 Global Step: 55490 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:06:48,825-Speed 2491.81 samples/sec Loss 11.0641 LearningRate 0.000669 Epoch: 2 Global Step: 55500 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:06:56,974-Speed 2513.65 samples/sec Loss 11.1187 LearningRate 0.000669 Epoch: 2 Global Step: 55510 Fp16 Grad 
Scale: 131072 Required: 177 hours Training: 2022-07-06 03:07:05,170-Speed 2499.13 samples/sec Loss 11.0181 LearningRate 0.000669 Epoch: 2 Global Step: 55520 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:07:13,386-Speed 2493.17 samples/sec Loss 11.0232 LearningRate 0.000669 Epoch: 2 Global Step: 55530 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:07:21,605-Speed 2492.42 samples/sec Loss 11.0260 LearningRate 0.000669 Epoch: 2 Global Step: 55540 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:07:29,806-Speed 2497.50 samples/sec Loss 11.0183 LearningRate 0.000670 Epoch: 2 Global Step: 55550 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:07:38,008-Speed 2497.34 samples/sec Loss 11.1454 LearningRate 0.000670 Epoch: 2 Global Step: 55560 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:07:46,154-Speed 2514.60 samples/sec Loss 11.0585 LearningRate 0.000670 Epoch: 2 Global Step: 55570 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:07:54,353-Speed 2498.44 samples/sec Loss 11.2175 LearningRate 0.000670 Epoch: 2 Global Step: 55580 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:08:02,556-Speed 2496.94 samples/sec Loss 11.1508 LearningRate 0.000670 Epoch: 2 Global Step: 55590 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:08:10,759-Speed 2497.09 samples/sec Loss 11.1276 LearningRate 0.000670 Epoch: 2 Global Step: 55600 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:08:18,958-Speed 2498.15 samples/sec Loss 11.1052 LearningRate 0.000670 Epoch: 2 Global Step: 55610 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:08:27,164-Speed 2496.14 samples/sec Loss 11.0531 LearningRate 0.000670 Epoch: 2 Global Step: 55620 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:08:35,312-Speed 2513.82 samples/sec Loss 11.1067 LearningRate 0.000671 Epoch: 2 Global Step: 55630 Fp16 
Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:08:43,525-Speed 2493.96 samples/sec Loss 11.0970 LearningRate 0.000671 Epoch: 2 Global Step: 55640 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:08:51,733-Speed 2495.48 samples/sec Loss 11.0836 LearningRate 0.000671 Epoch: 2 Global Step: 55650 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:08:59,936-Speed 2497.21 samples/sec Loss 11.1132 LearningRate 0.000671 Epoch: 2 Global Step: 55660 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:09:08,135-Speed 2498.23 samples/sec Loss 11.1176 LearningRate 0.000671 Epoch: 2 Global Step: 55670 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:09:16,360-Speed 2490.27 samples/sec Loss 11.0705 LearningRate 0.000671 Epoch: 2 Global Step: 55680 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:09:24,512-Speed 2512.87 samples/sec Loss 11.3832 LearningRate 0.000671 Epoch: 2 Global Step: 55690 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:09:32,712-Speed 2497.62 samples/sec Loss 11.3147 LearningRate 0.000671 Epoch: 2 Global Step: 55700 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:09:40,916-Speed 2496.73 samples/sec Loss 11.1696 LearningRate 0.000672 Epoch: 2 Global Step: 55710 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:09:49,122-Speed 2496.12 samples/sec Loss 11.1460 LearningRate 0.000672 Epoch: 2 Global Step: 55720 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:09:57,339-Speed 2492.87 samples/sec Loss 11.0838 LearningRate 0.000672 Epoch: 2 Global Step: 55730 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:10:05,542-Speed 2497.00 samples/sec Loss 11.1823 LearningRate 0.000672 Epoch: 2 Global Step: 55740 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:10:13,692-Speed 2513.42 samples/sec Loss 10.9962 LearningRate 0.000672 Epoch: 2 Global Step: 55750 
Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:10:21,899-Speed 2495.81 samples/sec Loss 11.1090 LearningRate 0.000672 Epoch: 2 Global Step: 55760 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:10:30,101-Speed 2497.45 samples/sec Loss 11.2337 LearningRate 0.000672 Epoch: 2 Global Step: 55770 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:10:38,303-Speed 2497.21 samples/sec Loss 11.2337 LearningRate 0.000672 Epoch: 2 Global Step: 55780 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:10:46,513-Speed 2495.03 samples/sec Loss 11.0659 LearningRate 0.000673 Epoch: 2 Global Step: 55790 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:10:54,729-Speed 2492.85 samples/sec Loss 11.0758 LearningRate 0.000673 Epoch: 2 Global Step: 55800 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:11:02,878-Speed 2513.64 samples/sec Loss 11.0157 LearningRate 0.000673 Epoch: 2 Global Step: 55810 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:11:11,081-Speed 2496.96 samples/sec Loss 11.0391 LearningRate 0.000673 Epoch: 2 Global Step: 55820 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:11:19,285-Speed 2496.95 samples/sec Loss 11.0030 LearningRate 0.000673 Epoch: 2 Global Step: 55830 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:11:27,488-Speed 2497.09 samples/sec Loss 11.0145 LearningRate 0.000673 Epoch: 2 Global Step: 55840 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:11:35,691-Speed 2496.85 samples/sec Loss 11.0948 LearningRate 0.000673 Epoch: 2 Global Step: 55850 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:11:43,896-Speed 2496.37 samples/sec Loss 11.0409 LearningRate 0.000673 Epoch: 2 Global Step: 55860 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:11:52,044-Speed 2514.04 samples/sec Loss 11.1311 LearningRate 0.000673 Epoch: 2 Global Step: 
55870 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:12:00,246-Speed 2497.44 samples/sec Loss 11.1510 LearningRate 0.000674 Epoch: 2 Global Step: 55880 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:12:08,446-Speed 2498.15 samples/sec Loss 11.1798 LearningRate 0.000674 Epoch: 2 Global Step: 55890 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:12:16,645-Speed 2498.13 samples/sec Loss 11.1121 LearningRate 0.000674 Epoch: 2 Global Step: 55900 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:12:24,845-Speed 2497.90 samples/sec Loss 10.9863 LearningRate 0.000674 Epoch: 2 Global Step: 55910 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:12:33,046-Speed 2497.74 samples/sec Loss 11.0246 LearningRate 0.000674 Epoch: 2 Global Step: 55920 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:12:41,192-Speed 2514.57 samples/sec Loss 11.0427 LearningRate 0.000674 Epoch: 2 Global Step: 55930 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:12:49,391-Speed 2498.16 samples/sec Loss 11.0909 LearningRate 0.000674 Epoch: 2 Global Step: 55940 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:12:57,589-Speed 2498.46 samples/sec Loss 10.9601 LearningRate 0.000674 Epoch: 2 Global Step: 55950 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:13:05,792-Speed 2497.15 samples/sec Loss 11.0395 LearningRate 0.000675 Epoch: 2 Global Step: 55960 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:13:13,992-Speed 2497.97 samples/sec Loss 11.0566 LearningRate 0.000675 Epoch: 2 Global Step: 55970 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:13:22,191-Speed 2498.21 samples/sec Loss 10.9910 LearningRate 0.000675 Epoch: 2 Global Step: 55980 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:13:30,352-Speed 2509.57 samples/sec Loss 11.0134 LearningRate 0.000675 Epoch: 2 Global 
Step: 55990 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:13:38,552-Speed 2498.27 samples/sec Loss 11.0263 LearningRate 0.000675 Epoch: 2 Global Step: 56000 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:13:46,751-Speed 2498.05 samples/sec Loss 10.9752 LearningRate 0.000675 Epoch: 2 Global Step: 56010 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:13:54,953-Speed 2497.46 samples/sec Loss 10.9899 LearningRate 0.000675 Epoch: 2 Global Step: 56020 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:14:03,152-Speed 2498.32 samples/sec Loss 11.1119 LearningRate 0.000675 Epoch: 2 Global Step: 56030 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:14:11,355-Speed 2497.22 samples/sec Loss 10.9629 LearningRate 0.000676 Epoch: 2 Global Step: 56040 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:14:19,502-Speed 2514.19 samples/sec Loss 10.9316 LearningRate 0.000676 Epoch: 2 Global Step: 56050 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:14:27,706-Speed 2496.58 samples/sec Loss 10.9209 LearningRate 0.000676 Epoch: 2 Global Step: 56060 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:14:35,914-Speed 2495.45 samples/sec Loss 10.9401 LearningRate 0.000676 Epoch: 2 Global Step: 56070 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:14:44,116-Speed 2497.77 samples/sec Loss 10.9366 LearningRate 0.000676 Epoch: 2 Global Step: 56080 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:14:52,327-Speed 2495.28 samples/sec Loss 10.9347 LearningRate 0.000676 Epoch: 2 Global Step: 56090 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:15:00,528-Speed 2497.66 samples/sec Loss 10.9761 LearningRate 0.000676 Epoch: 2 Global Step: 56100 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:15:08,675-Speed 2514.12 samples/sec Loss 10.9139 LearningRate 0.000676 Epoch: 2 
Global Step: 56110 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:15:16,878-Speed 2497.05 samples/sec Loss 10.8458 LearningRate 0.000676 Epoch: 2 Global Step: 56120 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:15:25,080-Speed 2497.51 samples/sec Loss 10.9712 LearningRate 0.000677 Epoch: 2 Global Step: 56130 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:15:33,277-Speed 2498.79 samples/sec Loss 11.0027 LearningRate 0.000677 Epoch: 2 Global Step: 56140 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:15:41,483-Speed 2496.19 samples/sec Loss 10.8980 LearningRate 0.000677 Epoch: 2 Global Step: 56150 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:15:49,683-Speed 2498.06 samples/sec Loss 10.8343 LearningRate 0.000677 Epoch: 2 Global Step: 56160 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:15:57,835-Speed 2512.71 samples/sec Loss 10.8489 LearningRate 0.000677 Epoch: 2 Global Step: 56170 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:16:06,037-Speed 2497.43 samples/sec Loss 10.8664 LearningRate 0.000677 Epoch: 2 Global Step: 56180 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:16:14,237-Speed 2498.07 samples/sec Loss 10.8088 LearningRate 0.000677 Epoch: 2 Global Step: 56190 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:16:22,448-Speed 2494.43 samples/sec Loss 10.9125 LearningRate 0.000677 Epoch: 2 Global Step: 56200 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:16:30,653-Speed 2496.34 samples/sec Loss 10.8457 LearningRate 0.000678 Epoch: 2 Global Step: 56210 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:16:38,852-Speed 2498.26 samples/sec Loss 10.8912 LearningRate 0.000678 Epoch: 2 Global Step: 56220 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:16:46,999-Speed 2514.36 samples/sec Loss 10.7952 LearningRate 0.000678 
Epoch: 2 Global Step: 56230 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:16:55,199-Speed 2497.66 samples/sec Loss 10.8948 LearningRate 0.000678 Epoch: 2 Global Step: 56240 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:17:03,400-Speed 2497.88 samples/sec Loss 10.8908 LearningRate 0.000678 Epoch: 2 Global Step: 56250 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:17:11,598-Speed 2498.40 samples/sec Loss 10.9024 LearningRate 0.000678 Epoch: 2 Global Step: 56260 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:17:19,801-Speed 2497.08 samples/sec Loss 10.9330 LearningRate 0.000678 Epoch: 2 Global Step: 56270 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:17:28,003-Speed 2497.25 samples/sec Loss 10.9015 LearningRate 0.000678 Epoch: 2 Global Step: 56280 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:17:36,151-Speed 2514.01 samples/sec Loss 10.8726 LearningRate 0.000679 Epoch: 2 Global Step: 56290 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:17:44,357-Speed 2496.13 samples/sec Loss 10.7656 LearningRate 0.000679 Epoch: 2 Global Step: 56300 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:17:52,562-Speed 2496.38 samples/sec Loss 10.8996 LearningRate 0.000679 Epoch: 2 Global Step: 56310 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:18:00,763-Speed 2497.86 samples/sec Loss 10.9728 LearningRate 0.000679 Epoch: 2 Global Step: 56320 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:18:08,976-Speed 2493.92 samples/sec Loss 10.9921 LearningRate 0.000679 Epoch: 2 Global Step: 56330 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:18:17,175-Speed 2498.13 samples/sec Loss 10.9412 LearningRate 0.000679 Epoch: 2 Global Step: 56340 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:18:25,327-Speed 2512.82 samples/sec Loss 11.0595 LearningRate 
0.000679 Epoch: 2 Global Step: 56350 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:18:33,530-Speed 2496.84 samples/sec Loss 10.8960 LearningRate 0.000679 Epoch: 2 Global Step: 56360 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:18:41,729-Speed 2498.67 samples/sec Loss 11.1202 LearningRate 0.000680 Epoch: 2 Global Step: 56370 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:18:49,929-Speed 2497.87 samples/sec Loss 11.1570 LearningRate 0.000680 Epoch: 2 Global Step: 56380 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:18:58,128-Speed 2498.56 samples/sec Loss 11.0180 LearningRate 0.000680 Epoch: 2 Global Step: 56390 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:19:06,339-Speed 2494.23 samples/sec Loss 11.0559 LearningRate 0.000680 Epoch: 2 Global Step: 56400 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:19:14,488-Speed 2513.56 samples/sec Loss 11.0964 LearningRate 0.000680 Epoch: 2 Global Step: 56410 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:19:22,686-Speed 2498.51 samples/sec Loss 11.0125 LearningRate 0.000680 Epoch: 2 Global Step: 56420 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:19:30,887-Speed 2497.97 samples/sec Loss 10.9932 LearningRate 0.000680 Epoch: 2 Global Step: 56430 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:19:39,090-Speed 2497.17 samples/sec Loss 11.0991 LearningRate 0.000680 Epoch: 2 Global Step: 56440 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:19:47,295-Speed 2496.44 samples/sec Loss 11.0394 LearningRate 0.000680 Epoch: 2 Global Step: 56450 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:19:55,497-Speed 2497.57 samples/sec Loss 11.0306 LearningRate 0.000681 Epoch: 2 Global Step: 56460 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:20:03,646-Speed 2513.47 samples/sec Loss 11.0372 
LearningRate 0.000681 Epoch: 2 Global Step: 56470 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:20:11,852-Speed 2496.17 samples/sec Loss 10.9467 LearningRate 0.000681 Epoch: 2 Global Step: 56480 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:20:20,058-Speed 2496.14 samples/sec Loss 11.0433 LearningRate 0.000681 Epoch: 2 Global Step: 56490 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:20:28,263-Speed 2496.62 samples/sec Loss 11.0169 LearningRate 0.000681 Epoch: 2 Global Step: 56500 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:20:36,464-Speed 2497.53 samples/sec Loss 10.9649 LearningRate 0.000681 Epoch: 2 Global Step: 56510 Fp16 Grad Scale: 131072 Required: 177 hours Training: 2022-07-06 03:20:44,665-Speed 2497.64 samples/sec Loss 11.1387 LearningRate 0.000681 Epoch: 2 Global Step: 56520 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:20:52,820-Speed 2511.89 samples/sec Loss 10.9640 LearningRate 0.000681 Epoch: 2 Global Step: 56530 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:21:01,020-Speed 2498.10 samples/sec Loss 11.1229 LearningRate 0.000682 Epoch: 2 Global Step: 56540 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:21:09,223-Speed 2496.97 samples/sec Loss 11.0704 LearningRate 0.000682 Epoch: 2 Global Step: 56550 Fp16 Grad Scale: 262144 Required: 176 hours Training: 2022-07-06 03:21:17,422-Speed 2498.06 samples/sec Loss 11.0231 LearningRate 0.000682 Epoch: 2 Global Step: 56560 Fp16 Grad Scale: 262144 Required: 176 hours Training: 2022-07-06 03:21:25,582-Speed 2510.51 samples/sec Loss 11.0300 LearningRate 0.000682 Epoch: 2 Global Step: 56570 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:21:33,782-Speed 2498.04 samples/sec Loss 10.9287 LearningRate 0.000682 Epoch: 2 Global Step: 56580 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:21:41,930-Speed 2513.75 samples/sec Loss 
10.9897 LearningRate 0.000682 Epoch: 2 Global Step: 56590 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:21:50,133-Speed 2497.09 samples/sec Loss 10.9814 LearningRate 0.000682 Epoch: 2 Global Step: 56600 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:21:58,348-Speed 2493.70 samples/sec Loss 10.8848 LearningRate 0.000682 Epoch: 2 Global Step: 56610 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:22:06,547-Speed 2498.25 samples/sec Loss 10.9190 LearningRate 0.000683 Epoch: 2 Global Step: 56620 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:22:14,749-Speed 2497.13 samples/sec Loss 10.8674 LearningRate 0.000683 Epoch: 2 Global Step: 56630 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:22:22,949-Speed 2498.20 samples/sec Loss 10.9351 LearningRate 0.000683 Epoch: 2 Global Step: 56640 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:22:31,095-Speed 2514.43 samples/sec Loss 10.9227 LearningRate 0.000683 Epoch: 2 Global Step: 56650 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:22:39,301-Speed 2496.03 samples/sec Loss 10.7926 LearningRate 0.000683 Epoch: 2 Global Step: 56660 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:22:47,502-Speed 2497.74 samples/sec Loss 10.8606 LearningRate 0.000683 Epoch: 2 Global Step: 56670 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:22:55,703-Speed 2497.73 samples/sec Loss 10.9629 LearningRate 0.000683 Epoch: 2 Global Step: 56680 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:23:03,905-Speed 2497.30 samples/sec Loss 10.8694 LearningRate 0.000683 Epoch: 2 Global Step: 56690 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:23:12,111-Speed 2496.13 samples/sec Loss 10.7538 LearningRate 0.000683 Epoch: 2 Global Step: 56700 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:23:20,258-Speed 2514.53 samples/sec 
Loss 11.0760 LearningRate 0.000684 Epoch: 2 Global Step: 56710 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:23:28,456-Speed 2498.63 samples/sec Loss 11.0951 LearningRate 0.000684 Epoch: 2 Global Step: 56720 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:23:36,655-Speed 2498.61 samples/sec Loss 10.9970 LearningRate 0.000684 Epoch: 2 Global Step: 56730 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:23:44,867-Speed 2494.20 samples/sec Loss 11.2383 LearningRate 0.000684 Epoch: 2 Global Step: 56740 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:23:53,064-Speed 2499.12 samples/sec Loss 11.0532 LearningRate 0.000684 Epoch: 2 Global Step: 56750 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:24:01,264-Speed 2497.63 samples/sec Loss 10.9498 LearningRate 0.000684 Epoch: 2 Global Step: 56760 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:24:09,412-Speed 2514.10 samples/sec Loss 10.9934 LearningRate 0.000684 Epoch: 2 Global Step: 56770 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:24:17,611-Speed 2498.46 samples/sec Loss 10.9872 LearningRate 0.000684 Epoch: 2 Global Step: 56780 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:24:25,812-Speed 2497.37 samples/sec Loss 10.8881 LearningRate 0.000685 Epoch: 2 Global Step: 56790 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:24:34,012-Speed 2498.00 samples/sec Loss 10.8404 LearningRate 0.000685 Epoch: 2 Global Step: 56800 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:24:42,212-Speed 2498.13 samples/sec Loss 10.8187 LearningRate 0.000685 Epoch: 2 Global Step: 56810 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:24:50,411-Speed 2498.17 samples/sec Loss 10.8525 LearningRate 0.000685 Epoch: 2 Global Step: 56820 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:24:58,557-Speed 2514.55 
samples/sec Loss 10.8408 LearningRate 0.000685 Epoch: 2 Global Step: 56830 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:25:06,770-Speed 2493.87 samples/sec Loss 10.9228 LearningRate 0.000685 Epoch: 2 Global Step: 56840 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:25:14,971-Speed 2497.90 samples/sec Loss 10.7787 LearningRate 0.000685 Epoch: 2 Global Step: 56850 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:25:23,170-Speed 2497.96 samples/sec Loss 10.9064 LearningRate 0.000685 Epoch: 2 Global Step: 56860 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:25:31,373-Speed 2497.15 samples/sec Loss 10.8664 LearningRate 0.000686 Epoch: 2 Global Step: 56870 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:25:39,667-Speed 2469.84 samples/sec Loss 11.0418 LearningRate 0.000686 Epoch: 2 Global Step: 56880 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:25:47,817-Speed 2513.40 samples/sec Loss 10.8867 LearningRate 0.000686 Epoch: 2 Global Step: 56890 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:25:56,021-Speed 2496.99 samples/sec Loss 10.8915 LearningRate 0.000686 Epoch: 2 Global Step: 56900 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:26:04,224-Speed 2497.08 samples/sec Loss 10.8509 LearningRate 0.000686 Epoch: 2 Global Step: 56910 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:26:12,429-Speed 2496.78 samples/sec Loss 10.8440 LearningRate 0.000686 Epoch: 2 Global Step: 56920 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:26:20,632-Speed 2496.88 samples/sec Loss 10.8612 LearningRate 0.000686 Epoch: 2 Global Step: 56930 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:26:28,836-Speed 2496.79 samples/sec Loss 10.7866 LearningRate 0.000686 Epoch: 2 Global Step: 56940 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:26:36,986-Speed 
2513.29 samples/sec Loss 10.8652 LearningRate 0.000686 Epoch: 2 Global Step: 56950 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:26:45,188-Speed 2497.43 samples/sec Loss 10.9151 LearningRate 0.000687 Epoch: 2 Global Step: 56960 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:26:53,405-Speed 2492.79 samples/sec Loss 10.8055 LearningRate 0.000687 Epoch: 2 Global Step: 56970 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:27:01,605-Speed 2498.01 samples/sec Loss 10.7579 LearningRate 0.000687 Epoch: 2 Global Step: 56980 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:27:09,807-Speed 2497.09 samples/sec Loss 10.8993 LearningRate 0.000687 Epoch: 2 Global Step: 56990 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:27:18,009-Speed 2497.53 samples/sec Loss 10.7982 LearningRate 0.000687 Epoch: 2 Global Step: 57000 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:27:26,157-Speed 2513.56 samples/sec Loss 10.7672 LearningRate 0.000687 Epoch: 2 Global Step: 57010 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:27:34,356-Speed 2498.53 samples/sec Loss 10.8039 LearningRate 0.000687 Epoch: 2 Global Step: 57020 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:27:42,557-Speed 2497.33 samples/sec Loss 10.7613 LearningRate 0.000687 Epoch: 2 Global Step: 57030 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:27:50,754-Speed 2498.79 samples/sec Loss 10.7475 LearningRate 0.000688 Epoch: 2 Global Step: 57040 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:27:58,956-Speed 2497.72 samples/sec Loss 10.7230 LearningRate 0.000688 Epoch: 2 Global Step: 57050 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 03:28:07,166-Speed 2494.80 samples/sec Loss 10.8327 LearningRate 0.000688 Epoch: 2 Global Step: 57060 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 
03:28:15,313-Speed 2514.26 samples/sec Loss 10.7330 LearningRate 0.000688 Epoch: 2 Global Step: 57070 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:28:23,519-Speed 2495.99 samples/sec Loss 10.7343 LearningRate 0.000688 Epoch: 2 Global Step: 57080 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:28:31,725-Speed 2496.28 samples/sec Loss 10.7593 LearningRate 0.000688 Epoch: 2 Global Step: 57090 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:28:39,930-Speed 2496.47 samples/sec Loss 10.7966 LearningRate 0.000688 Epoch: 2 Global Step: 57100 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:28:48,135-Speed 2496.24 samples/sec Loss 10.8224 LearningRate 0.000688 Epoch: 2 Global Step: 57110 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:28:56,340-Speed 2496.62 samples/sec Loss 10.7878 LearningRate 0.000689 Epoch: 2 Global Step: 57120 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:29:04,491-Speed 2513.16 samples/sec Loss 10.7014 LearningRate 0.000689 Epoch: 2 Global Step: 57130 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:29:12,695-Speed 2496.71 samples/sec Loss 10.9178 LearningRate 0.000689 Epoch: 2 Global Step: 57140 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:29:20,899-Speed 2496.72 samples/sec Loss 10.8344 LearningRate 0.000689 Epoch: 2 Global Step: 57150 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:29:29,116-Speed 2492.75 samples/sec Loss 10.8551 LearningRate 0.000689 Epoch: 2 Global Step: 57160 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:29:37,312-Speed 2499.18 samples/sec Loss 10.8661 LearningRate 0.000689 Epoch: 2 Global Step: 57170 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:29:45,520-Speed 2495.48 samples/sec Loss 10.7752 LearningRate 0.000689 Epoch: 2 Global Step: 57180 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:29:53,668-Speed 2514.00 samples/sec Loss 10.8036 LearningRate 0.000689 Epoch: 2 Global Step: 57190 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:30:01,868-Speed 2497.99 samples/sec Loss 10.6557 LearningRate 0.000690 Epoch: 2 Global Step: 57200 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:30:10,070-Speed 2497.50 samples/sec Loss 10.7393 LearningRate 0.000690 Epoch: 2 Global Step: 57210 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:30:18,268-Speed 2498.38 samples/sec Loss 10.7874 LearningRate 0.000690 Epoch: 2 Global Step: 57220 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:30:26,470-Speed 2497.60 samples/sec Loss 10.8113 LearningRate 0.000690 Epoch: 2 Global Step: 57230 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:30:34,672-Speed 2497.24 samples/sec Loss 10.6600 LearningRate 0.000690 Epoch: 2 Global Step: 57240 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:30:42,819-Speed 2514.05 samples/sec Loss 10.7566 LearningRate 0.000690 Epoch: 2 Global Step: 57250 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:30:51,018-Speed 2498.27 samples/sec Loss 10.6502 LearningRate 0.000690 Epoch: 2 Global Step: 57260 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:30:59,217-Speed 2498.34 samples/sec Loss 10.8435 LearningRate 0.000690 Epoch: 2 Global Step: 57270 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:31:07,416-Speed 2498.27 samples/sec Loss 10.7778 LearningRate 0.000690 Epoch: 2 Global Step: 57280 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:31:15,616-Speed 2498.27 samples/sec Loss 10.7703 LearningRate 0.000691 Epoch: 2 Global Step: 57290 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:31:23,815-Speed 2498.15 samples/sec Loss 10.8995 LearningRate 0.000691 Epoch: 2 Global Step: 57300 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:31:31,965-Speed 2513.34 samples/sec Loss 10.8900 LearningRate 0.000691 Epoch: 2 Global Step: 57310 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:31:40,162-Speed 2498.68 samples/sec Loss 10.8214 LearningRate 0.000691 Epoch: 2 Global Step: 57320 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:31:48,363-Speed 2497.71 samples/sec Loss 10.9575 LearningRate 0.000691 Epoch: 2 Global Step: 57330 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:31:56,565-Speed 2497.38 samples/sec Loss 10.9046 LearningRate 0.000691 Epoch: 2 Global Step: 57340 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:32:04,765-Speed 2497.99 samples/sec Loss 10.8928 LearningRate 0.000691 Epoch: 2 Global Step: 57350 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:32:12,977-Speed 2494.09 samples/sec Loss 10.8420 LearningRate 0.000691 Epoch: 2 Global Step: 57360 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:32:21,127-Speed 2513.48 samples/sec Loss 10.8416 LearningRate 0.000692 Epoch: 2 Global Step: 57370 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:32:29,325-Speed 2498.49 samples/sec Loss 10.9312 LearningRate 0.000692 Epoch: 2 Global Step: 57380 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:32:37,524-Speed 2498.46 samples/sec Loss 10.8118 LearningRate 0.000692 Epoch: 2 Global Step: 57390 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:32:45,727-Speed 2496.81 samples/sec Loss 10.7703 LearningRate 0.000692 Epoch: 2 Global Step: 57400 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:32:53,926-Speed 2498.39 samples/sec Loss 10.7312 LearningRate 0.000692 Epoch: 2 Global Step: 57410 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:33:02,126-Speed 2497.82 samples/sec Loss 10.7260 LearningRate 0.000692 Epoch: 2 Global Step: 57420 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:33:10,287-Speed 2509.88 samples/sec Loss 10.8019 LearningRate 0.000692 Epoch: 2 Global Step: 57430 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:33:18,496-Speed 2495.59 samples/sec Loss 10.7510 LearningRate 0.000692 Epoch: 2 Global Step: 57440 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:33:26,700-Speed 2496.56 samples/sec Loss 10.7574 LearningRate 0.000693 Epoch: 2 Global Step: 57450 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:33:34,913-Speed 2494.19 samples/sec Loss 10.7347 LearningRate 0.000693 Epoch: 2 Global Step: 57460 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:33:43,114-Speed 2497.48 samples/sec Loss 10.9303 LearningRate 0.000693 Epoch: 2 Global Step: 57470 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:33:51,317-Speed 2497.17 samples/sec Loss 10.7779 LearningRate 0.000693 Epoch: 2 Global Step: 57480 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:33:59,497-Speed 2504.16 samples/sec Loss 10.9460 LearningRate 0.000693 Epoch: 2 Global Step: 57490 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:34:07,698-Speed 2497.46 samples/sec Loss 10.9050 LearningRate 0.000693 Epoch: 2 Global Step: 57500 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:34:15,915-Speed 2492.81 samples/sec Loss 10.8491 LearningRate 0.000693 Epoch: 2 Global Step: 57510 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:34:24,118-Speed 2497.07 samples/sec Loss 10.7912 LearningRate 0.000693 Epoch: 2 Global Step: 57520 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:34:32,323-Speed 2496.44 samples/sec Loss 10.8267 LearningRate 0.000693 Epoch: 2 Global Step: 57530 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:34:40,528-Speed 2496.54 samples/sec Loss 10.7098 LearningRate 0.000694 Epoch: 2 Global Step: 57540 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:34:48,675-Speed 2514.13 samples/sec Loss 10.7579 LearningRate 0.000694 Epoch: 2 Global Step: 57550 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:34:56,878-Speed 2496.90 samples/sec Loss 10.7392 LearningRate 0.000694 Epoch: 2 Global Step: 57560 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:35:05,082-Speed 2496.88 samples/sec Loss 10.7984 LearningRate 0.000694 Epoch: 2 Global Step: 57570 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:35:13,286-Speed 2496.95 samples/sec Loss 10.8033 LearningRate 0.000694 Epoch: 2 Global Step: 57580 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:35:21,486-Speed 2497.94 samples/sec Loss 10.8235 LearningRate 0.000694 Epoch: 2 Global Step: 57590 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:35:29,687-Speed 2497.48 samples/sec Loss 10.8076 LearningRate 0.000694 Epoch: 2 Global Step: 57600 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:35:37,841-Speed 2512.11 samples/sec Loss 10.7977 LearningRate 0.000694 Epoch: 2 Global Step: 57610 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:35:46,044-Speed 2497.27 samples/sec Loss 10.7090 LearningRate 0.000695 Epoch: 2 Global Step: 57620 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:35:54,255-Speed 2494.78 samples/sec Loss 10.6861 LearningRate 0.000695 Epoch: 2 Global Step: 57630 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:36:02,454-Speed 2498.13 samples/sec Loss 10.6643 LearningRate 0.000695 Epoch: 2 Global Step: 57640 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:36:10,659-Speed 2496.58 samples/sec Loss 10.5947 LearningRate 0.000695 Epoch: 2 Global Step: 57650 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:36:18,859-Speed 2497.66 samples/sec Loss 10.5800 LearningRate 0.000695 Epoch: 2 Global Step: 57660 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:36:27,009-Speed 2513.25 samples/sec Loss 10.7168 LearningRate 0.000695 Epoch: 2 Global Step: 57670 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:36:35,218-Speed 2495.42 samples/sec Loss 10.7576 LearningRate 0.000695 Epoch: 2 Global Step: 57680 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:36:43,418-Speed 2497.98 samples/sec Loss 10.5511 LearningRate 0.000695 Epoch: 2 Global Step: 57690 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:36:51,616-Speed 2498.61 samples/sec Loss 10.6697 LearningRate 0.000696 Epoch: 2 Global Step: 57700 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:36:59,815-Speed 2498.41 samples/sec Loss 10.7038 LearningRate 0.000696 Epoch: 2 Global Step: 57710 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:37:08,028-Speed 2493.81 samples/sec Loss 10.7372 LearningRate 0.000696 Epoch: 2 Global Step: 57720 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:37:16,178-Speed 2513.46 samples/sec Loss 10.6312 LearningRate 0.000696 Epoch: 2 Global Step: 57730 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:37:24,379-Speed 2497.55 samples/sec Loss 10.5796 LearningRate 0.000696 Epoch: 2 Global Step: 57740 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:37:32,579-Speed 2498.13 samples/sec Loss 10.5065 LearningRate 0.000696 Epoch: 2 Global Step: 57750 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:37:40,807-Speed 2489.24 samples/sec Loss 10.6237 LearningRate 0.000696 Epoch: 2 Global Step: 57760 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:37:49,007-Speed 2498.01 samples/sec Loss 10.5181 LearningRate 0.000696 Epoch: 2 Global Step: 57770 Fp16 Grad Scale: 262144 Required: 176 hours
Training: 2022-07-06 03:37:57,163-Speed 2511.42 samples/sec Loss 10.6171 LearningRate 0.000697 Epoch: 2 Global Step: 57780 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:38:05,312-Speed 2513.63 samples/sec Loss 10.6051 LearningRate 0.000697 Epoch: 2 Global Step: 57790 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:38:13,517-Speed 2496.37 samples/sec Loss 10.6451 LearningRate 0.000697 Epoch: 2 Global Step: 57800 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:38:21,724-Speed 2496.01 samples/sec Loss 10.6037 LearningRate 0.000697 Epoch: 2 Global Step: 57810 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:38:29,926-Speed 2497.17 samples/sec Loss 10.6016 LearningRate 0.000697 Epoch: 2 Global Step: 57820 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:38:38,128-Speed 2497.39 samples/sec Loss 10.5533 LearningRate 0.000697 Epoch: 2 Global Step: 57830 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:38:46,333-Speed 2496.57 samples/sec Loss 10.5123 LearningRate 0.000697 Epoch: 2 Global Step: 57840 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:38:54,481-Speed 2513.91 samples/sec Loss 10.6431 LearningRate 0.000697 Epoch: 2 Global Step: 57850 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:39:02,683-Speed 2497.50 samples/sec Loss 10.5191 LearningRate 0.000697 Epoch: 2 Global Step: 57860 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:39:10,884-Speed 2497.47 samples/sec Loss 10.6327 LearningRate 0.000698 Epoch: 2 Global Step: 57870 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:39:19,088-Speed 2496.74 samples/sec Loss 10.6610 LearningRate 0.000698 Epoch: 2 Global Step: 57880 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:39:27,290-Speed 2497.44 samples/sec Loss 10.6983 LearningRate 0.000698 Epoch: 2 Global Step: 57890 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:39:35,500-Speed 2494.85 samples/sec Loss 10.7145 LearningRate 0.000698 Epoch: 2 Global Step: 57900 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:39:43,648-Speed 2513.90 samples/sec Loss 10.5502 LearningRate 0.000698 Epoch: 2 Global Step: 57910 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:39:51,850-Speed 2497.40 samples/sec Loss 10.6013 LearningRate 0.000698 Epoch: 2 Global Step: 57920 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:40:00,061-Speed 2494.73 samples/sec Loss 10.6441 LearningRate 0.000698 Epoch: 2 Global Step: 57930 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:40:08,261-Speed 2497.84 samples/sec Loss 10.6444 LearningRate 0.000698 Epoch: 2 Global Step: 57940 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:40:16,470-Speed 2495.36 samples/sec Loss 10.6843 LearningRate 0.000699 Epoch: 2 Global Step: 57950 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:40:24,672-Speed 2497.43 samples/sec Loss 10.6287 LearningRate 0.000699 Epoch: 2 Global Step: 57960 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:40:32,817-Speed 2514.71 samples/sec Loss 10.5526 LearningRate 0.000699 Epoch: 2 Global Step: 57970 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:40:41,017-Speed 2497.97 samples/sec Loss 10.6010 LearningRate 0.000699 Epoch: 2 Global Step: 57980 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:40:49,217-Speed 2498.04 samples/sec Loss 10.6353 LearningRate 0.000699 Epoch: 2 Global Step: 57990 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:40:57,419-Speed 2497.20 samples/sec Loss 10.5802 LearningRate 0.000699 Epoch: 2 Global Step: 58000 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:41:05,620-Speed 2497.63 samples/sec Loss 10.6055 LearningRate 0.000699 Epoch: 2 Global Step: 58010 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:41:13,820-Speed 2498.01 samples/sec Loss 10.6499 LearningRate 0.000699 Epoch: 2 Global Step: 58020 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:41:21,969-Speed 2513.62 samples/sec Loss 10.5781 LearningRate 0.000700 Epoch: 2 Global Step: 58030 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:41:30,169-Speed 2497.81 samples/sec Loss 10.5747 LearningRate 0.000700 Epoch: 2 Global Step: 58040 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:41:38,373-Speed 2496.80 samples/sec Loss 10.5184 LearningRate 0.000700 Epoch: 2 Global Step: 58050 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:41:46,579-Speed 2496.46 samples/sec Loss 10.5986 LearningRate 0.000700 Epoch: 2 Global Step: 58060 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:41:54,783-Speed 2496.73 samples/sec Loss 10.5617 LearningRate 0.000700 Epoch: 2 Global Step: 58070 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:42:03,001-Speed 2492.44 samples/sec Loss 10.4726 LearningRate 0.000700 Epoch: 2 Global Step: 58080 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:42:11,157-Speed 2511.26 samples/sec Loss 10.7052 LearningRate 0.000700 Epoch: 2 Global Step: 58090 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:42:19,361-Speed 2496.85 samples/sec Loss 10.5432 LearningRate 0.000700 Epoch: 2 Global Step: 58100 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:42:27,575-Speed 2493.93 samples/sec Loss 10.5560 LearningRate 0.000700 Epoch: 2 Global Step: 58110 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:42:35,778-Speed 2497.01 samples/sec Loss 10.5567 LearningRate 0.000701 Epoch: 2 Global Step: 58120 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:42:43,980-Speed 2497.40 samples/sec Loss 10.7014 LearningRate 0.000701 Epoch: 2 Global Step: 58130 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:42:52,186-Speed 2496.14 samples/sec Loss 10.6595 LearningRate 0.000701 Epoch: 2 Global Step: 58140 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:43:00,333-Speed 2514.25 samples/sec Loss 10.5782 LearningRate 0.000701 Epoch: 2 Global Step: 58150 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:43:08,546-Speed 2493.97 samples/sec Loss 10.6229 LearningRate 0.000701 Epoch: 2 Global Step: 58160 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:43:16,748-Speed 2497.48 samples/sec Loss 10.6298 LearningRate 0.000701 Epoch: 2 Global Step: 58170 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:43:24,950-Speed 2497.56 samples/sec Loss 10.6072 LearningRate 0.000701 Epoch: 2 Global Step: 58180 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:43:33,152-Speed 2497.64 samples/sec Loss 10.5728 LearningRate 0.000701 Epoch: 2 Global Step: 58190 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:43:41,352-Speed 2497.76 samples/sec Loss 10.4753 LearningRate 0.000702 Epoch: 2 Global Step: 58200 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:43:49,506-Speed 2512.12 samples/sec Loss 10.4384 LearningRate 0.000702 Epoch: 2 Global Step: 58210 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:43:57,707-Speed 2497.62 samples/sec Loss 10.5103 LearningRate 0.000702 Epoch: 2 Global Step: 58220 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:44:05,908-Speed 2497.58 samples/sec Loss 10.5276 LearningRate 0.000702 Epoch: 2 Global Step: 58230 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:44:14,107-Speed 2498.47 samples/sec Loss 10.5358 LearningRate 0.000702 Epoch: 2 Global Step: 58240 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:44:22,306-Speed 2498.18 samples/sec Loss 10.6814 LearningRate 0.000702 Epoch: 2 Global Step: 58250 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:44:30,508-Speed 2497.70 samples/sec Loss 10.6099 LearningRate 0.000702 Epoch: 2 Global Step: 58260 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:44:38,656-Speed 2513.89 samples/sec Loss 10.5685 LearningRate 0.000702 Epoch: 2 Global Step: 58270 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:44:46,858-Speed 2497.16 samples/sec Loss 10.5397 LearningRate 0.000703 Epoch: 2 Global Step: 58280 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:44:55,058-Speed 2498.02 samples/sec Loss 10.6011 LearningRate 0.000703 Epoch: 2 Global Step: 58290 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:45:03,261-Speed 2497.28 samples/sec Loss 10.5902 LearningRate 0.000703 Epoch: 2 Global Step: 58300 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:45:11,463-Speed 2497.70 samples/sec Loss 10.6907 LearningRate 0.000703 Epoch: 2 Global Step: 58310 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:45:19,667-Speed 2496.57 samples/sec Loss 10.5922 LearningRate 0.000703 Epoch: 2 Global Step: 58320 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:45:27,817-Speed 2513.35 samples/sec Loss 10.5280 LearningRate 0.000703 Epoch: 2 Global Step: 58330 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:45:36,017-Speed 2498.01 samples/sec Loss 10.5446 LearningRate 0.000703 Epoch: 2 Global Step: 58340 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:45:44,217-Speed 2498.03 samples/sec Loss 10.6405 LearningRate 0.000703 Epoch: 2 Global Step: 58350 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:45:52,419-Speed 2497.74 samples/sec Loss 10.5400 LearningRate 0.000703 Epoch: 2 Global Step: 58360 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:46:00,629-Speed 2494.94 samples/sec Loss 10.5633 LearningRate 0.000704 Epoch: 2 Global Step: 58370 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:46:08,830-Speed 2497.48 samples/sec Loss 10.7098 LearningRate 0.000704 Epoch: 2 Global Step: 58380 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:46:16,979-Speed 2513.55 samples/sec Loss 10.5120 LearningRate 0.000704 Epoch: 2 Global Step: 58390 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:46:25,185-Speed 2496.29 samples/sec Loss 10.5193 LearningRate 0.000704 Epoch: 2 Global Step: 58400 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:46:33,383-Speed 2498.56 samples/sec Loss 10.5730 LearningRate 0.000704 Epoch: 2 Global Step: 58410 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:46:41,585-Speed 2497.29 samples/sec Loss 10.4590 LearningRate 0.000704 Epoch: 2 Global Step: 58420 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:46:49,785-Speed 2498.09 samples/sec Loss 10.5601 LearningRate 0.000704 Epoch: 2 Global Step: 58430 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:46:57,985-Speed 2497.71 samples/sec Loss 10.5265 LearningRate 0.000704 Epoch: 2 Global Step: 58440 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:47:06,137-Speed 2513.03 samples/sec Loss 10.4590 LearningRate 0.000705 Epoch: 2 Global Step: 58450 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:47:14,353-Speed 2493.12 samples/sec Loss 10.5613 LearningRate 0.000705 Epoch: 2 Global Step: 58460 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:47:22,553-Speed 2497.72 samples/sec Loss 10.5322 LearningRate 0.000705 Epoch: 2 Global Step: 58470 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:47:30,763-Speed 2495.15 samples/sec Loss 10.5360 LearningRate 0.000705 Epoch: 2 Global Step: 58480 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:47:38,966-Speed 2497.14 samples/sec Loss 10.5337 LearningRate 0.000705 Epoch: 2 Global Step: 58490 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:47:47,183-Speed 2492.72 samples/sec Loss 10.5672 LearningRate 0.000705 Epoch: 2 Global Step: 58500 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:47:55,328-Speed 2514.66 samples/sec Loss 10.6261 LearningRate 0.000705 Epoch: 2 Global Step: 58510 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:48:03,531-Speed 2497.11 samples/sec Loss 10.6101 LearningRate 0.000705 Epoch: 2 Global Step: 58520 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:48:11,736-Speed 2496.28 samples/sec Loss 10.6297 LearningRate 0.000706 Epoch: 2 Global Step: 58530 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:48:19,951-Speed 2493.56 samples/sec Loss 10.6289 LearningRate 0.000706 Epoch: 2 Global Step: 58540 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:48:28,152-Speed 2497.41 samples/sec Loss 10.5301 LearningRate 0.000706 Epoch: 2 Global Step: 58550 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:48:36,357-Speed 2496.49 samples/sec Loss 10.4858 LearningRate 0.000706 Epoch: 2 Global Step: 58560 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:48:44,506-Speed 2513.87 samples/sec Loss 10.4439 LearningRate 0.000706 Epoch: 2 Global Step: 58570 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:48:52,722-Speed 2493.04 samples/sec Loss 10.4540 LearningRate 0.000706 Epoch: 2 Global Step: 58580 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:49:00,925-Speed 2496.95 samples/sec Loss 10.5784 LearningRate 0.000706 Epoch: 2 Global Step: 58590 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:49:09,128-Speed 2497.16 samples/sec Loss 10.5554 LearningRate 0.000706 Epoch: 2 Global Step: 58600 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:49:17,342-Speed 2493.49 samples/sec Loss 10.5394 LearningRate 0.000707 Epoch: 2 Global Step: 58610 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:49:25,543-Speed 2497.77 samples/sec Loss 10.6584 LearningRate 0.000707 Epoch: 2 Global Step: 58620 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:49:33,693-Speed 2513.24 samples/sec Loss 10.7374 LearningRate 0.000707 Epoch: 2 Global Step: 58630 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:49:41,891-Speed 2498.67 samples/sec Loss 10.5821 LearningRate 0.000707 Epoch: 2 Global Step: 58640 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:49:50,092-Speed 2497.67 samples/sec Loss 10.6295 LearningRate 0.000707 Epoch: 2 Global Step: 58650 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:49:58,300-Speed 2495.41 samples/sec Loss 10.6147 LearningRate 0.000707 Epoch: 2 Global Step: 58660 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:50:06,502-Speed 2497.53 samples/sec Loss 10.4911 LearningRate 0.000707 Epoch: 2 Global Step: 58670 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:50:14,716-Speed 2493.67 samples/sec Loss 10.4916 LearningRate 0.000707 Epoch: 2 Global Step: 58680 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:50:22,862-Speed 2514.42 samples/sec Loss 10.4939 LearningRate 0.000707 Epoch: 2 Global Step: 58690 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:50:31,063-Speed 2497.61 samples/sec Loss 10.6820 LearningRate 0.000708 Epoch: 2 Global Step: 58700 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:50:39,267-Speed 2497.04 samples/sec Loss 10.4556 LearningRate 0.000708 Epoch: 2 Global Step: 58710 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:50:47,467-Speed 2497.71 samples/sec Loss 10.5408 LearningRate 0.000708 Epoch: 2 Global Step: 58720 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:50:55,670-Speed 2497.39 samples/sec Loss 10.5337 LearningRate 0.000708 Epoch: 2 Global Step: 58730 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:51:03,874-Speed 2496.69 samples/sec Loss 10.5935 LearningRate 0.000708 Epoch: 2 Global Step: 58740 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:51:12,027-Speed 2512.30 samples/sec Loss 10.5574 LearningRate 0.000708 Epoch: 2 Global Step: 58750 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:51:20,231-Speed 2496.60 samples/sec Loss 10.6274 LearningRate 0.000708 Epoch: 2 Global Step: 58760 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:51:28,435-Speed 2497.02 samples/sec Loss 10.6411 LearningRate 0.000708 Epoch: 2 Global Step: 58770 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:51:36,669-Speed 2487.63 samples/sec Loss 10.4864 LearningRate 0.000709 Epoch: 2 Global Step: 58780 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:51:44,877-Speed 2495.42 samples/sec Loss 10.6125 LearningRate 0.000709 Epoch: 2 Global Step: 58790 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:51:53,080-Speed 2497.12 samples/sec Loss 10.5178 LearningRate 0.000709 Epoch: 2 Global Step: 58800 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:52:01,231-Speed 2513.12 samples/sec Loss 10.5751 LearningRate 0.000709 Epoch: 2 Global Step: 58810 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:52:09,437-Speed 2495.89 samples/sec Loss 10.5739 LearningRate 0.000709 Epoch: 2 Global Step: 58820 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:52:17,642-Speed 2496.76 samples/sec Loss 10.4868 LearningRate 0.000709 Epoch: 2 Global Step: 58830 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:52:25,842-Speed 2498.08 samples/sec Loss 10.4523 LearningRate 0.000709 Epoch: 2 Global Step: 58840 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:52:34,049-Speed 2495.73 samples/sec Loss 10.6195 LearningRate 0.000709 Epoch: 2 Global Step: 58850 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:52:42,253-Speed 2496.57 samples/sec Loss 10.4469 LearningRate 0.000710 Epoch: 2 Global Step: 58860 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:52:50,402-Speed 2513.70 samples/sec Loss 10.5650 LearningRate 0.000710 Epoch: 2 Global Step: 58870 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:52:58,601-Speed 2498.09 samples/sec Loss 10.4604 LearningRate 0.000710 Epoch: 2 Global Step: 58880 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:53:06,807-Speed 2496.25 samples/sec Loss 10.5928 LearningRate 0.000710 Epoch: 2 Global Step: 58890 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:53:15,006-Speed 2498.17 samples/sec Loss 10.4827 LearningRate 0.000710 Epoch: 2 Global Step: 58900 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:53:23,205-Speed 2498.26 samples/sec Loss 10.6077 LearningRate 0.000710 Epoch: 2 Global Step: 58910 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:53:31,407-Speed 2497.36 samples/sec Loss 10.5933 LearningRate 0.000710 Epoch: 2 Global Step: 58920 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:53:39,564-Speed 2511.09 samples/sec Loss 10.3817 LearningRate 0.000710 Epoch: 2 Global Step: 58930 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:53:47,768-Speed 2496.99 samples/sec Loss 10.4614 LearningRate 0.000710 Epoch: 2 Global Step: 58940 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:53:55,968-Speed 2498.03 samples/sec Loss 10.5303 LearningRate 0.000711 Epoch: 2 Global Step: 58950 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:54:04,168-Speed 2498.00 samples/sec Loss 10.4381 LearningRate 0.000711 Epoch: 2 Global Step: 58960 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:54:12,370-Speed 2497.29 samples/sec Loss 10.4073 LearningRate 0.000711 Epoch: 2 Global Step: 58970 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:54:20,571-Speed 2497.45 samples/sec Loss 10.4867 LearningRate 0.000711 Epoch: 2 Global Step: 58980 Fp16 Grad Scale: 262144 Required: 176 hours
Training: 2022-07-06 03:54:28,721-Speed 2513.46 samples/sec Loss 10.3865 LearningRate 0.000711 Epoch: 2 Global Step: 58990 Fp16 Grad Scale: 262144 Required: 176 hours
Training: 2022-07-06 03:54:36,885-Speed 2508.96 samples/sec Loss 10.4411 LearningRate 0.000711 Epoch: 2 Global Step: 59000 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:54:45,097-Speed 2494.03 samples/sec Loss 10.4685 LearningRate 0.000711 Epoch: 2 Global Step: 59010 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:54:53,295-Speed 2498.50 samples/sec Loss 10.3926 LearningRate 0.000711 Epoch: 2 Global Step: 59020 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:55:01,497-Speed 2497.87 samples/sec Loss 10.4342 LearningRate 0.000712 Epoch: 2 Global Step: 59030 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:55:09,696-Speed 2498.11 samples/sec Loss 10.4097 LearningRate 0.000712 Epoch: 2 Global Step: 59040 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:55:17,845-Speed 2513.44 samples/sec Loss 10.3115 LearningRate 0.000712 Epoch: 2 Global Step: 59050 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:55:26,043-Speed 2498.67 samples/sec Loss 10.4067 LearningRate 0.000712 Epoch: 2 Global Step: 59060 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:55:34,246-Speed 2496.88 samples/sec Loss 10.5249 LearningRate 0.000712 Epoch: 2 Global Step: 59070 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:55:42,449-Speed 2497.13 samples/sec Loss 10.3917 LearningRate 0.000712 Epoch: 2 Global Step: 59080 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:55:50,651-Speed 2497.47 samples/sec Loss 10.5236 LearningRate 0.000712 Epoch: 2 Global Step: 59090 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:55:58,860-Speed 2495.95 samples/sec Loss 10.4742 LearningRate 0.000712 Epoch: 2 Global Step: 59100 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:56:07,026-Speed 2508.49 samples/sec Loss 10.3762 LearningRate 0.000713 Epoch: 2 Global Step: 59110 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:56:15,231-Speed 2496.38 samples/sec Loss 10.4149 LearningRate 0.000713 Epoch: 2 Global Step: 59120 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:56:23,437-Speed 2496.52 samples/sec Loss 10.4266 LearningRate 0.000713 Epoch: 2 Global Step: 59130 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:56:31,639-Speed 2497.33 samples/sec Loss 10.3659 LearningRate 0.000713 Epoch: 2 Global Step: 59140 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:56:39,856-Speed 2492.87 samples/sec Loss 10.4610 LearningRate 0.000713 Epoch: 2 Global Step: 59150 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:56:48,061-Speed 2496.79 samples/sec Loss 10.4865 LearningRate 0.000713 Epoch: 2 Global Step: 59160 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:56:56,211-Speed 2513.27 samples/sec Loss 10.2840 LearningRate 0.000713 Epoch: 2 Global Step: 59170 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:57:04,412-Speed 2497.56 samples/sec Loss 10.3788 LearningRate 0.000713 Epoch: 2 Global Step: 59180 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:57:12,620-Speed 2495.73 samples/sec Loss 10.3485 LearningRate 0.000713 Epoch: 2 Global Step: 59190 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:57:20,819-Speed 2498.05 samples/sec Loss 10.4557 LearningRate 0.000714 Epoch: 2 Global Step: 59200 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:57:29,018-Speed 2498.29 samples/sec Loss 10.4359 LearningRate 0.000714 Epoch: 2 Global Step: 59210 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:57:37,218-Speed 2497.90 samples/sec Loss 10.3176 LearningRate 0.000714 Epoch: 2 Global Step: 59220 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:57:45,367-Speed 2513.81 samples/sec Loss 10.2595 LearningRate 0.000714 Epoch: 2 Global Step: 59230 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:57:53,585-Speed 2492.25 samples/sec Loss 10.4107 LearningRate 0.000714 Epoch: 2 Global Step: 59240 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:58:01,790-Speed 2496.42 samples/sec Loss 10.4165 LearningRate 0.000714 Epoch: 2 Global Step: 59250 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:58:09,990-Speed 2497.91 samples/sec Loss 10.3365 LearningRate 0.000714 Epoch: 2 Global Step: 59260 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:58:18,188-Speed 2499.06 samples/sec Loss 10.3494 LearningRate 0.000714 Epoch: 2 Global Step: 59270 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:58:26,387-Speed 2498.06 samples/sec Loss 10.3490 LearningRate 0.000715 Epoch: 2 Global Step: 59280 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:58:34,536-Speed 2513.64 samples/sec Loss 10.2495 LearningRate 0.000715 Epoch: 2 Global Step: 59290 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:58:42,735-Speed 2498.19 samples/sec Loss 10.3065 LearningRate 0.000715 Epoch: 2 Global Step: 59300 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:58:50,935-Speed 2498.04 samples/sec Loss 10.3502 LearningRate 0.000715 Epoch: 2 Global Step: 59310 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:58:59,136-Speed 2498.15 samples/sec Loss 10.4023 LearningRate 0.000715 Epoch: 2 Global Step: 59320 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:59:07,333-Speed 2498.74 samples/sec Loss 10.2334 LearningRate 0.000715 Epoch: 2 Global Step: 59330 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:59:15,536-Speed 2497.17 samples/sec Loss 10.3309 LearningRate 0.000715 Epoch: 2 Global Step: 59340 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:59:23,684-Speed 2514.00 samples/sec Loss 10.4961 LearningRate 0.000715 Epoch: 2 Global Step: 59350 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:59:31,884-Speed 2497.89 samples/sec Loss 10.4110 LearningRate 0.000716 Epoch: 2 Global Step: 59360 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:59:40,096-Speed 2494.28 samples/sec Loss 10.4295 LearningRate 0.000716 Epoch: 2 Global Step: 59370 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:59:48,299-Speed 2497.52 samples/sec Loss 10.4504 LearningRate 0.000716 Epoch: 2 Global Step: 59380 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 03:59:56,499-Speed 2497.83 samples/sec Loss 10.4946 LearningRate 0.000716 Epoch: 2 Global Step: 59390 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:00:04,701-Speed 2497.44 samples/sec Loss 10.4908 LearningRate 0.000716 Epoch: 2 Global Step: 59400 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:00:12,849-Speed 2513.95 samples/sec Loss 10.3886 LearningRate 0.000716 Epoch: 2 Global Step: 59410 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:00:21,053-Speed 2496.54 samples/sec Loss 10.4866 LearningRate 0.000716 Epoch: 2 Global Step: 59420 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:00:29,253-Speed 2497.98 samples/sec Loss 10.5234 LearningRate 0.000716 Epoch: 2 Global Step: 59430 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:00:37,460-Speed 2496.05 samples/sec Loss 10.3288 LearningRate 0.000717 Epoch: 2 Global Step: 59440 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:00:45,658-Speed 2498.44 samples/sec Loss 10.3195 LearningRate 0.000717 Epoch: 2 Global Step: 59450 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:00:53,859-Speed 2497.92 samples/sec Loss 10.2506 LearningRate 0.000717 Epoch: 2 Global Step: 59460 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:01:02,008-Speed 2513.49 samples/sec Loss 10.3057 LearningRate 0.000717 Epoch: 2 Global Step: 59470 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:01:10,222-Speed 2494.04 samples/sec Loss 10.2832 LearningRate 0.000717 Epoch: 2 Global Step: 59480 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:01:18,420-Speed 2498.38 samples/sec Loss 10.2594 LearningRate 0.000717 Epoch: 2 Global Step: 59490 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:01:26,620-Speed 2498.02 samples/sec Loss 10.3180 LearningRate 0.000717 Epoch: 2 Global Step: 59500 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:01:34,819-Speed 2498.13 samples/sec Loss 10.3600 LearningRate 0.000717 Epoch: 2 Global Step: 59510 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:01:43,027-Speed 2495.47 samples/sec Loss 10.2917 LearningRate 0.000717 Epoch: 2 Global Step: 59520 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:01:51,174-Speed 2514.28 samples/sec Loss 10.5265 LearningRate 0.000718 Epoch: 2 Global Step: 59530 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:01:59,380-Speed 2496.25 samples/sec Loss 10.4337 LearningRate 0.000718 Epoch: 2 Global Step: 59540 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:02:07,577-Speed 2498.63 samples/sec Loss 10.3098 LearningRate 0.000718 Epoch: 2 Global Step: 59550 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:02:15,787-Speed 2494.91 samples/sec Loss 10.3071 LearningRate 0.000718 Epoch: 2 Global Step: 59560 Fp16 Grad Scale: 131072 Required: 176 hours
Training: 2022-07-06 04:02:23,998-Speed 2494.57 samples/sec Loss 10.4403 LearningRate 0.000718 Epoch: 2 Global Step: 59570 Fp16 Grad Scale: 131072 Required: 176 hours
Training:
2022-07-06 04:02:32,200-Speed 2497.78 samples/sec Loss 10.2582 LearningRate 0.000718 Epoch: 2 Global Step: 59580 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:02:40,351-Speed 2512.97 samples/sec Loss 10.4326 LearningRate 0.000718 Epoch: 2 Global Step: 59590 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:02:48,571-Speed 2491.98 samples/sec Loss 10.4067 LearningRate 0.000718 Epoch: 2 Global Step: 59600 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:02:56,772-Speed 2497.71 samples/sec Loss 10.1981 LearningRate 0.000719 Epoch: 2 Global Step: 59610 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:03:04,973-Speed 2498.23 samples/sec Loss 10.2657 LearningRate 0.000719 Epoch: 2 Global Step: 59620 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:03:13,175-Speed 2497.40 samples/sec Loss 10.3538 LearningRate 0.000719 Epoch: 2 Global Step: 59630 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:03:21,377-Speed 2497.20 samples/sec Loss 10.5264 LearningRate 0.000719 Epoch: 2 Global Step: 59640 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:03:29,524-Speed 2514.35 samples/sec Loss 10.3872 LearningRate 0.000719 Epoch: 2 Global Step: 59650 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:03:37,724-Speed 2498.07 samples/sec Loss 10.3523 LearningRate 0.000719 Epoch: 2 Global Step: 59660 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:03:45,927-Speed 2496.94 samples/sec Loss 10.3944 LearningRate 0.000719 Epoch: 2 Global Step: 59670 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:03:54,127-Speed 2498.27 samples/sec Loss 10.3380 LearningRate 0.000719 Epoch: 2 Global Step: 59680 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:04:02,333-Speed 2496.14 samples/sec Loss 10.3420 LearningRate 0.000720 Epoch: 2 Global Step: 59690 Fp16 Grad Scale: 131072 Required: 176 hours 
Training: 2022-07-06 04:04:10,532-Speed 2498.13 samples/sec Loss 10.4370 LearningRate 0.000720 Epoch: 2 Global Step: 59700 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:04:18,681-Speed 2513.65 samples/sec Loss 10.3096 LearningRate 0.000720 Epoch: 2 Global Step: 59710 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:04:26,884-Speed 2497.03 samples/sec Loss 10.3858 LearningRate 0.000720 Epoch: 2 Global Step: 59720 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:04:35,088-Speed 2496.72 samples/sec Loss 10.3199 LearningRate 0.000720 Epoch: 2 Global Step: 59730 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:04:43,289-Speed 2497.65 samples/sec Loss 10.2267 LearningRate 0.000720 Epoch: 2 Global Step: 59740 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:04:51,488-Speed 2498.11 samples/sec Loss 10.3043 LearningRate 0.000720 Epoch: 2 Global Step: 59750 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:04:59,697-Speed 2495.28 samples/sec Loss 10.3138 LearningRate 0.000720 Epoch: 2 Global Step: 59760 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:05:07,846-Speed 2513.62 samples/sec Loss 10.3559 LearningRate 0.000720 Epoch: 2 Global Step: 59770 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:05:16,046-Speed 2497.88 samples/sec Loss 10.1536 LearningRate 0.000721 Epoch: 2 Global Step: 59780 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:05:24,245-Speed 2498.31 samples/sec Loss 10.1817 LearningRate 0.000721 Epoch: 2 Global Step: 59790 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:05:32,446-Speed 2497.75 samples/sec Loss 10.2608 LearningRate 0.000721 Epoch: 2 Global Step: 59800 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:05:40,646-Speed 2497.99 samples/sec Loss 10.3241 LearningRate 0.000721 Epoch: 2 Global Step: 59810 Fp16 Grad Scale: 131072 Required: 176 
hours Training: 2022-07-06 04:05:48,848-Speed 2497.42 samples/sec Loss 10.3140 LearningRate 0.000721 Epoch: 2 Global Step: 59820 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:05:56,996-Speed 2513.76 samples/sec Loss 10.3909 LearningRate 0.000721 Epoch: 2 Global Step: 59830 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:06:05,197-Speed 2497.99 samples/sec Loss 10.3997 LearningRate 0.000721 Epoch: 2 Global Step: 59840 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:06:13,392-Speed 2499.58 samples/sec Loss 10.4281 LearningRate 0.000721 Epoch: 2 Global Step: 59850 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:06:21,594-Speed 2497.47 samples/sec Loss 10.4113 LearningRate 0.000722 Epoch: 2 Global Step: 59860 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:06:29,791-Speed 2498.81 samples/sec Loss 10.4777 LearningRate 0.000722 Epoch: 2 Global Step: 59870 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:06:37,993-Speed 2497.53 samples/sec Loss 10.2960 LearningRate 0.000722 Epoch: 2 Global Step: 59880 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:06:46,139-Speed 2514.44 samples/sec Loss 10.3662 LearningRate 0.000722 Epoch: 2 Global Step: 59890 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:06:54,338-Speed 2498.21 samples/sec Loss 10.3807 LearningRate 0.000722 Epoch: 2 Global Step: 59900 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:07:02,538-Speed 2497.95 samples/sec Loss 10.2435 LearningRate 0.000722 Epoch: 2 Global Step: 59910 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:07:10,740-Speed 2497.37 samples/sec Loss 10.2315 LearningRate 0.000722 Epoch: 2 Global Step: 59920 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:07:18,941-Speed 2497.83 samples/sec Loss 10.2890 LearningRate 0.000722 Epoch: 2 Global Step: 59930 Fp16 Grad Scale: 131072 Required: 
176 hours Training: 2022-07-06 04:07:27,141-Speed 2498.04 samples/sec Loss 10.2771 LearningRate 0.000723 Epoch: 2 Global Step: 59940 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:07:35,292-Speed 2512.77 samples/sec Loss 10.3685 LearningRate 0.000723 Epoch: 2 Global Step: 59950 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:07:43,492-Speed 2498.09 samples/sec Loss 10.1452 LearningRate 0.000723 Epoch: 2 Global Step: 59960 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:07:51,722-Speed 2489.09 samples/sec Loss 10.3527 LearningRate 0.000723 Epoch: 2 Global Step: 59970 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:07:59,935-Speed 2493.78 samples/sec Loss 10.3775 LearningRate 0.000723 Epoch: 2 Global Step: 59980 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:08:08,133-Speed 2498.78 samples/sec Loss 10.2694 LearningRate 0.000723 Epoch: 2 Global Step: 59990 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:08:16,332-Speed 2498.46 samples/sec Loss 10.2616 LearningRate 0.000723 Epoch: 2 Global Step: 60000 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:08:24,479-Speed 2514.03 samples/sec Loss 10.2382 LearningRate 0.000723 Epoch: 2 Global Step: 60010 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:08:32,679-Speed 2497.86 samples/sec Loss 10.2974 LearningRate 0.000724 Epoch: 2 Global Step: 60020 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:08:40,880-Speed 2497.76 samples/sec Loss 10.2962 LearningRate 0.000724 Epoch: 2 Global Step: 60030 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:08:49,080-Speed 2498.03 samples/sec Loss 10.2512 LearningRate 0.000724 Epoch: 2 Global Step: 60040 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:08:57,282-Speed 2497.56 samples/sec Loss 10.2684 LearningRate 0.000724 Epoch: 2 Global Step: 60050 Fp16 Grad Scale: 131072 
Required: 176 hours Training: 2022-07-06 04:09:05,478-Speed 2499.04 samples/sec Loss 10.2719 LearningRate 0.000724 Epoch: 2 Global Step: 60060 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:09:13,627-Speed 2513.48 samples/sec Loss 10.2671 LearningRate 0.000724 Epoch: 2 Global Step: 60070 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:09:21,827-Speed 2498.17 samples/sec Loss 10.1951 LearningRate 0.000724 Epoch: 2 Global Step: 60080 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:09:30,029-Speed 2497.42 samples/sec Loss 10.1122 LearningRate 0.000724 Epoch: 2 Global Step: 60090 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:09:38,227-Speed 2498.76 samples/sec Loss 10.1580 LearningRate 0.000724 Epoch: 2 Global Step: 60100 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:09:46,442-Speed 2493.58 samples/sec Loss 10.2050 LearningRate 0.000725 Epoch: 2 Global Step: 60110 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:09:54,646-Speed 2496.70 samples/sec Loss 10.1896 LearningRate 0.000725 Epoch: 2 Global Step: 60120 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:10:02,795-Speed 2513.40 samples/sec Loss 10.2783 LearningRate 0.000725 Epoch: 2 Global Step: 60130 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:10:10,997-Speed 2497.46 samples/sec Loss 10.3885 LearningRate 0.000725 Epoch: 2 Global Step: 60140 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:10:19,194-Speed 2498.95 samples/sec Loss 10.3444 LearningRate 0.000725 Epoch: 2 Global Step: 60150 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:10:27,395-Speed 2497.67 samples/sec Loss 10.2947 LearningRate 0.000725 Epoch: 2 Global Step: 60160 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:10:35,595-Speed 2498.05 samples/sec Loss 10.3698 LearningRate 0.000725 Epoch: 2 Global Step: 60170 Fp16 Grad Scale: 
131072 Required: 176 hours Training: 2022-07-06 04:10:43,807-Speed 2494.09 samples/sec Loss 10.3280 LearningRate 0.000725 Epoch: 2 Global Step: 60180 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:10:51,954-Speed 2514.28 samples/sec Loss 10.3841 LearningRate 0.000726 Epoch: 2 Global Step: 60190 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:11:00,154-Speed 2498.05 samples/sec Loss 10.3661 LearningRate 0.000726 Epoch: 2 Global Step: 60200 Fp16 Grad Scale: 262144 Required: 176 hours Training: 2022-07-06 04:11:08,313-Speed 2510.19 samples/sec Loss 10.2378 LearningRate 0.000726 Epoch: 2 Global Step: 60210 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:11:16,527-Speed 2494.20 samples/sec Loss 10.2820 LearningRate 0.000726 Epoch: 2 Global Step: 60220 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:11:24,729-Speed 2497.57 samples/sec Loss 10.2927 LearningRate 0.000726 Epoch: 2 Global Step: 60230 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:11:32,927-Speed 2498.58 samples/sec Loss 10.2249 LearningRate 0.000726 Epoch: 2 Global Step: 60240 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:11:41,073-Speed 2514.41 samples/sec Loss 10.1200 LearningRate 0.000726 Epoch: 2 Global Step: 60250 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:11:49,277-Speed 2497.12 samples/sec Loss 10.3746 LearningRate 0.000726 Epoch: 2 Global Step: 60260 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:11:57,489-Speed 2494.40 samples/sec Loss 10.4540 LearningRate 0.000727 Epoch: 2 Global Step: 60270 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:12:05,707-Speed 2492.45 samples/sec Loss 10.4720 LearningRate 0.000727 Epoch: 2 Global Step: 60280 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:12:13,906-Speed 2498.51 samples/sec Loss 10.4776 LearningRate 0.000727 Epoch: 2 Global Step: 60290 Fp16 Grad 
Scale: 131072 Required: 176 hours Training: 2022-07-06 04:12:22,107-Speed 2497.60 samples/sec Loss 10.5504 LearningRate 0.000727 Epoch: 2 Global Step: 60300 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:12:30,261-Speed 2511.99 samples/sec Loss 10.4220 LearningRate 0.000727 Epoch: 2 Global Step: 60310 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:12:38,460-Speed 2498.44 samples/sec Loss 10.4984 LearningRate 0.000727 Epoch: 2 Global Step: 60320 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:12:46,666-Speed 2496.02 samples/sec Loss 10.3434 LearningRate 0.000727 Epoch: 2 Global Step: 60330 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:12:54,865-Speed 2498.51 samples/sec Loss 10.4155 LearningRate 0.000727 Epoch: 2 Global Step: 60340 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:13:03,062-Speed 2498.76 samples/sec Loss 10.3796 LearningRate 0.000727 Epoch: 2 Global Step: 60350 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:13:11,263-Speed 2497.58 samples/sec Loss 10.3108 LearningRate 0.000728 Epoch: 2 Global Step: 60360 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:13:19,419-Speed 2511.29 samples/sec Loss 10.2393 LearningRate 0.000728 Epoch: 2 Global Step: 60370 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:13:27,623-Speed 2496.77 samples/sec Loss 10.2188 LearningRate 0.000728 Epoch: 2 Global Step: 60380 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:13:35,822-Speed 2498.35 samples/sec Loss 10.2575 LearningRate 0.000728 Epoch: 2 Global Step: 60390 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:13:44,023-Speed 2497.57 samples/sec Loss 10.2622 LearningRate 0.000728 Epoch: 2 Global Step: 60400 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:13:52,226-Speed 2496.88 samples/sec Loss 10.3551 LearningRate 0.000728 Epoch: 2 Global Step: 60410 Fp16 
Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:14:00,427-Speed 2497.90 samples/sec Loss 10.2832 LearningRate 0.000728 Epoch: 2 Global Step: 60420 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:14:08,588-Speed 2509.73 samples/sec Loss 10.2646 LearningRate 0.000728 Epoch: 2 Global Step: 60430 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:14:16,790-Speed 2497.46 samples/sec Loss 10.2917 LearningRate 0.000729 Epoch: 2 Global Step: 60440 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:14:24,989-Speed 2498.50 samples/sec Loss 10.2710 LearningRate 0.000729 Epoch: 2 Global Step: 60450 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:14:33,196-Speed 2495.71 samples/sec Loss 10.1669 LearningRate 0.000729 Epoch: 2 Global Step: 60460 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:14:41,390-Speed 2499.57 samples/sec Loss 10.2597 LearningRate 0.000729 Epoch: 2 Global Step: 60470 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:14:49,589-Speed 2498.26 samples/sec Loss 10.3101 LearningRate 0.000729 Epoch: 2 Global Step: 60480 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:14:57,734-Speed 2514.85 samples/sec Loss 10.2492 LearningRate 0.000729 Epoch: 2 Global Step: 60490 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:15:05,930-Speed 2499.24 samples/sec Loss 10.1857 LearningRate 0.000729 Epoch: 2 Global Step: 60500 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:15:14,132-Speed 2497.24 samples/sec Loss 10.2282 LearningRate 0.000729 Epoch: 2 Global Step: 60510 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:15:22,334-Speed 2497.25 samples/sec Loss 10.2752 LearningRate 0.000730 Epoch: 2 Global Step: 60520 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:15:30,536-Speed 2497.92 samples/sec Loss 10.3011 LearningRate 0.000730 Epoch: 2 Global Step: 60530 
Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:15:38,737-Speed 2497.38 samples/sec Loss 10.2498 LearningRate 0.000730 Epoch: 2 Global Step: 60540 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:15:46,883-Speed 2514.43 samples/sec Loss 10.1992 LearningRate 0.000730 Epoch: 2 Global Step: 60550 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:15:55,084-Speed 2497.90 samples/sec Loss 10.1314 LearningRate 0.000730 Epoch: 2 Global Step: 60560 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:16:03,293-Speed 2495.15 samples/sec Loss 10.1423 LearningRate 0.000730 Epoch: 2 Global Step: 60570 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:16:11,493-Speed 2497.93 samples/sec Loss 10.2034 LearningRate 0.000730 Epoch: 2 Global Step: 60580 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:16:19,707-Speed 2493.96 samples/sec Loss 10.1331 LearningRate 0.000730 Epoch: 2 Global Step: 60590 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:16:27,909-Speed 2497.23 samples/sec Loss 10.1120 LearningRate 0.000730 Epoch: 2 Global Step: 60600 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:16:36,057-Speed 2513.95 samples/sec Loss 10.2097 LearningRate 0.000731 Epoch: 2 Global Step: 60610 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:16:44,262-Speed 2496.63 samples/sec Loss 10.1540 LearningRate 0.000731 Epoch: 2 Global Step: 60620 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:16:52,475-Speed 2494.01 samples/sec Loss 10.2419 LearningRate 0.000731 Epoch: 2 Global Step: 60630 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:17:00,678-Speed 2497.18 samples/sec Loss 10.3823 LearningRate 0.000731 Epoch: 2 Global Step: 60640 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:17:08,902-Speed 2490.69 samples/sec Loss 10.2243 LearningRate 0.000731 Epoch: 2 Global Step: 
60650 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:17:17,103-Speed 2497.69 samples/sec Loss 10.2275 LearningRate 0.000731 Epoch: 2 Global Step: 60660 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:17:25,253-Speed 2513.55 samples/sec Loss 10.2093 LearningRate 0.000731 Epoch: 2 Global Step: 60670 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:17:33,455-Speed 2497.30 samples/sec Loss 10.2172 LearningRate 0.000731 Epoch: 2 Global Step: 60680 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:17:41,663-Speed 2495.41 samples/sec Loss 10.1344 LearningRate 0.000732 Epoch: 2 Global Step: 60690 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:17:49,869-Speed 2496.36 samples/sec Loss 10.2107 LearningRate 0.000732 Epoch: 2 Global Step: 60700 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:17:58,074-Speed 2496.49 samples/sec Loss 10.1907 LearningRate 0.000732 Epoch: 2 Global Step: 60710 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:18:06,288-Speed 2493.57 samples/sec Loss 10.2573 LearningRate 0.000732 Epoch: 2 Global Step: 60720 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:18:14,436-Speed 2514.04 samples/sec Loss 10.1638 LearningRate 0.000732 Epoch: 2 Global Step: 60730 Fp16 Grad Scale: 131072 Required: 176 hours Training: 2022-07-06 04:18:22,635-Speed 2498.13 samples/sec Loss 10.2394 LearningRate 0.000732 Epoch: 2 Global Step: 60740 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:18:30,840-Speed 2496.26 samples/sec Loss 10.1551 LearningRate 0.000732 Epoch: 2 Global Step: 60750 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:18:39,043-Speed 2497.04 samples/sec Loss 10.0987 LearningRate 0.000732 Epoch: 2 Global Step: 60760 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:18:47,248-Speed 2496.60 samples/sec Loss 10.2219 LearningRate 0.000733 Epoch: 2 Global 
Step: 60770 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:18:55,466-Speed 2492.48 samples/sec Loss 10.3492 LearningRate 0.000733 Epoch: 2 Global Step: 60780 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:19:03,616-Speed 2513.31 samples/sec Loss 10.1616 LearningRate 0.000733 Epoch: 2 Global Step: 60790 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:19:11,818-Speed 2497.11 samples/sec Loss 10.1507 LearningRate 0.000733 Epoch: 2 Global Step: 60800 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:19:20,020-Speed 2497.47 samples/sec Loss 10.2768 LearningRate 0.000733 Epoch: 2 Global Step: 60810 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:19:28,225-Speed 2496.43 samples/sec Loss 10.2128 LearningRate 0.000733 Epoch: 2 Global Step: 60820 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:19:36,427-Speed 2497.36 samples/sec Loss 10.1404 LearningRate 0.000733 Epoch: 2 Global Step: 60830 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:19:44,630-Speed 2497.08 samples/sec Loss 10.1776 LearningRate 0.000733 Epoch: 2 Global Step: 60840 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:19:52,783-Speed 2512.33 samples/sec Loss 10.1345 LearningRate 0.000734 Epoch: 2 Global Step: 60850 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:20:00,984-Speed 2497.44 samples/sec Loss 10.0946 LearningRate 0.000734 Epoch: 2 Global Step: 60860 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:20:09,183-Speed 2498.17 samples/sec Loss 10.1733 LearningRate 0.000734 Epoch: 2 Global Step: 60870 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:20:17,384-Speed 2497.94 samples/sec Loss 10.1039 LearningRate 0.000734 Epoch: 2 Global Step: 60880 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:20:25,580-Speed 2499.12 samples/sec Loss 10.0564 LearningRate 0.000734 Epoch: 2 
Global Step: 60890 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:20:33,792-Speed 2494.01 samples/sec Loss 10.2084 LearningRate 0.000734 Epoch: 2 Global Step: 60900 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:20:41,939-Speed 2514.13 samples/sec Loss 10.1440 LearningRate 0.000734 Epoch: 2 Global Step: 60910 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:20:50,140-Speed 2498.06 samples/sec Loss 10.0698 LearningRate 0.000734 Epoch: 2 Global Step: 60920 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:20:58,344-Speed 2496.86 samples/sec Loss 10.0633 LearningRate 0.000734 Epoch: 2 Global Step: 60930 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:21:06,562-Speed 2492.58 samples/sec Loss 10.0455 LearningRate 0.000735 Epoch: 2 Global Step: 60940 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:21:14,763-Speed 2497.71 samples/sec Loss 10.1010 LearningRate 0.000735 Epoch: 2 Global Step: 60950 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:21:22,966-Speed 2496.87 samples/sec Loss 9.9957 LearningRate 0.000735 Epoch: 2 Global Step: 60960 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:21:31,120-Speed 2512.26 samples/sec Loss 10.0548 LearningRate 0.000735 Epoch: 2 Global Step: 60970 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:21:39,321-Speed 2497.45 samples/sec Loss 10.0911 LearningRate 0.000735 Epoch: 2 Global Step: 60980 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:21:47,522-Speed 2497.58 samples/sec Loss 9.9763 LearningRate 0.000735 Epoch: 2 Global Step: 60990 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:21:55,723-Speed 2498.05 samples/sec Loss 10.0201 LearningRate 0.000735 Epoch: 2 Global Step: 61000 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:22:03,923-Speed 2498.09 samples/sec Loss 10.0317 LearningRate 0.000735 Epoch: 
2 Global Step: 61010 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:22:12,124-Speed 2497.54 samples/sec Loss 10.1602 LearningRate 0.000736 Epoch: 2 Global Step: 61020 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:22:20,274-Speed 2513.19 samples/sec Loss 10.0611 LearningRate 0.000736 Epoch: 2 Global Step: 61030 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:22:28,474-Speed 2498.26 samples/sec Loss 10.1959 LearningRate 0.000736 Epoch: 2 Global Step: 61040 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:22:36,675-Speed 2497.47 samples/sec Loss 10.2301 LearningRate 0.000736 Epoch: 2 Global Step: 61050 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:22:44,893-Speed 2492.51 samples/sec Loss 10.0999 LearningRate 0.000736 Epoch: 2 Global Step: 61060 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:22:53,091-Speed 2498.72 samples/sec Loss 10.3112 LearningRate 0.000736 Epoch: 2 Global Step: 61070 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:23:01,290-Speed 2498.15 samples/sec Loss 10.1881 LearningRate 0.000736 Epoch: 2 Global Step: 61080 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:23:09,468-Speed 2504.81 samples/sec Loss 10.2201 LearningRate 0.000736 Epoch: 2 Global Step: 61090 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:23:17,672-Speed 2496.67 samples/sec Loss 10.3218 LearningRate 0.000737 Epoch: 2 Global Step: 61100 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:23:25,886-Speed 2493.78 samples/sec Loss 10.2504 LearningRate 0.000737 Epoch: 2 Global Step: 61110 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:23:34,087-Speed 2497.83 samples/sec Loss 10.1469 LearningRate 0.000737 Epoch: 2 Global Step: 61120 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:23:42,298-Speed 2494.47 samples/sec Loss 10.1339 LearningRate 0.000737 
Epoch: 2 Global Step: 61130 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:23:50,500-Speed 2497.69 samples/sec Loss 10.0836 LearningRate 0.000737 Epoch: 2 Global Step: 61140 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:23:58,646-Speed 2514.39 samples/sec Loss 10.1724 LearningRate 0.000737 Epoch: 2 Global Step: 61150 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:24:06,860-Speed 2493.96 samples/sec Loss 10.0739 LearningRate 0.000737 Epoch: 2 Global Step: 61160 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:24:15,064-Speed 2496.76 samples/sec Loss 10.0486 LearningRate 0.000737 Epoch: 2 Global Step: 61170 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:24:23,266-Speed 2497.34 samples/sec Loss 10.1201 LearningRate 0.000737 Epoch: 2 Global Step: 61180 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:24:31,467-Speed 2497.69 samples/sec Loss 10.1317 LearningRate 0.000738 Epoch: 2 Global Step: 61190 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:24:39,670-Speed 2496.80 samples/sec Loss 10.2021 LearningRate 0.000738 Epoch: 2 Global Step: 61200 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:24:47,821-Speed 2513.18 samples/sec Loss 10.2004 LearningRate 0.000738 Epoch: 2 Global Step: 61210 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:24:56,115-Speed 2498.85 samples/sec Loss 10.1342 LearningRate 0.000738 Epoch: 2 Global Step: 61220 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:25:04,398-Speed 2492.41 samples/sec Loss 10.1322 LearningRate 0.000738 Epoch: 2 Global Step: 61230 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:25:12,602-Speed 2496.52 samples/sec Loss 10.1775 LearningRate 0.000738 Epoch: 2 Global Step: 61240 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:25:20,858-Speed 2498.48 samples/sec Loss 10.4513 LearningRate 
0.000738 Epoch: 2 Global Step: 61250 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:25:30,965-Speed 2036.17 samples/sec Loss 10.1890 LearningRate 0.000738 Epoch: 2 Global Step: 61260 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:25:39,116-Speed 2512.87 samples/sec Loss 10.4232 LearningRate 0.000739 Epoch: 2 Global Step: 61270 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:25:47,321-Speed 2496.51 samples/sec Loss 10.3678 LearningRate 0.000739 Epoch: 2 Global Step: 61280 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:25:58,832-Speed 1791.45 samples/sec Loss 10.2243 LearningRate 0.000739 Epoch: 2 Global Step: 61290 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:26:07,058-Speed 2498.22 samples/sec Loss 10.2224 LearningRate 0.000739 Epoch: 2 Global Step: 61300 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:26:15,252-Speed 2499.75 samples/sec Loss 10.2187 LearningRate 0.000739 Epoch: 2 Global Step: 61310 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:26:23,450-Speed 2498.46 samples/sec Loss 10.2338 LearningRate 0.000739 Epoch: 2 Global Step: 61320 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:26:33,497-Speed 2054.31 samples/sec Loss 10.2553 LearningRate 0.000739 Epoch: 2 Global Step: 61330 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:26:41,708-Speed 2500.89 samples/sec Loss 10.1442 LearningRate 0.000739 Epoch: 2 Global Step: 61340 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:26:49,907-Speed 2498.31 samples/sec Loss 10.0426 LearningRate 0.000740 Epoch: 2 Global Step: 61350 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:26:58,106-Speed 2498.09 samples/sec Loss 10.1320 LearningRate 0.000740 Epoch: 2 Global Step: 61360 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:27:06,345-Speed 2500.21 samples/sec Loss 10.1370 
LearningRate 0.000740 Epoch: 2 Global Step: 61370 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:27:17,504-Speed 2483.58 samples/sec Loss 10.1895 LearningRate 0.000740 Epoch: 2 Global Step: 61380 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:27:25,649-Speed 2514.70 samples/sec Loss 10.0733 LearningRate 0.000740 Epoch: 2 Global Step: 61390 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:27:36,849-Speed 1828.82 samples/sec Loss 10.0834 LearningRate 0.000740 Epoch: 2 Global Step: 61400 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:27:45,096-Speed 2495.96 samples/sec Loss 10.0304 LearningRate 0.000740 Epoch: 2 Global Step: 61410 Fp16 Grad Scale: 262144 Required: 175 hours Training: 2022-07-06 04:27:53,298-Speed 2500.62 samples/sec Loss 9.9982 LearningRate 0.000740 Epoch: 2 Global Step: 61420 Fp16 Grad Scale: 262144 Required: 175 hours Training: 2022-07-06 04:28:01,510-Speed 2494.18 samples/sec Loss 9.9692 LearningRate 0.000741 Epoch: 2 Global Step: 61430 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:28:09,708-Speed 2498.44 samples/sec Loss 10.1159 LearningRate 0.000741 Epoch: 2 Global Step: 61440 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:28:17,864-Speed 2515.77 samples/sec Loss 9.9921 LearningRate 0.000741 Epoch: 2 Global Step: 61450 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:28:27,947-Speed 2133.20 samples/sec Loss 10.0348 LearningRate 0.000741 Epoch: 2 Global Step: 61460 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:28:36,154-Speed 2495.74 samples/sec Loss 10.1637 LearningRate 0.000741 Epoch: 2 Global Step: 61470 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:28:44,404-Speed 2498.46 samples/sec Loss 10.1283 LearningRate 0.000741 Epoch: 2 Global Step: 61480 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:28:52,677-Speed 2496.80 samples/sec Loss 9.9461 
LearningRate 0.000741 Epoch: 2 Global Step: 61490 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:29:00,877-Speed 2497.69 samples/sec Loss 10.0892 LearningRate 0.000741 Epoch: 2 Global Step: 61500 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:29:10,366-Speed 2502.45 samples/sec Loss 9.9839 LearningRate 0.000741 Epoch: 2 Global Step: 61510 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:29:18,600-Speed 2499.10 samples/sec Loss 10.0658 LearningRate 0.000742 Epoch: 2 Global Step: 61520 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:29:27,750-Speed 2255.97 samples/sec Loss 10.0548 LearningRate 0.000742 Epoch: 2 Global Step: 61530 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:29:35,996-Speed 2483.85 samples/sec Loss 9.9870 LearningRate 0.000742 Epoch: 2 Global Step: 61540 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:29:44,199-Speed 2497.01 samples/sec Loss 10.0143 LearningRate 0.000742 Epoch: 2 Global Step: 61550 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:29:54,018-Speed 2104.67 samples/sec Loss 10.1196 LearningRate 0.000742 Epoch: 2 Global Step: 61560 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:30:02,164-Speed 2514.54 samples/sec Loss 9.8862 LearningRate 0.000742 Epoch: 2 Global Step: 61570 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:30:10,366-Speed 2497.20 samples/sec Loss 10.1015 LearningRate 0.000742 Epoch: 2 Global Step: 61580 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:30:18,606-Speed 2498.71 samples/sec Loss 10.0909 LearningRate 0.000742 Epoch: 2 Global Step: 61590 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:30:26,813-Speed 2495.71 samples/sec Loss 10.0476 LearningRate 0.000743 Epoch: 2 Global Step: 61600 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:30:35,010-Speed 2498.76 samples/sec Loss 
10.1056 LearningRate 0.000743 Epoch: 2 Global Step: 61610 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:30:43,209-Speed 2498.51 samples/sec Loss 10.0908 LearningRate 0.000743 Epoch: 2 Global Step: 61620 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:30:51,351-Speed 2515.80 samples/sec Loss 9.9751 LearningRate 0.000743 Epoch: 2 Global Step: 61630 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:30:59,554-Speed 2496.99 samples/sec Loss 9.9852 LearningRate 0.000743 Epoch: 2 Global Step: 61640 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:31:07,756-Speed 2497.34 samples/sec Loss 10.1286 LearningRate 0.000743 Epoch: 2 Global Step: 61650 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:31:15,960-Speed 2496.68 samples/sec Loss 10.1054 LearningRate 0.000743 Epoch: 2 Global Step: 61660 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:31:24,155-Speed 2499.81 samples/sec Loss 10.0557 LearningRate 0.000743 Epoch: 2 Global Step: 61670 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:31:32,365-Speed 2494.84 samples/sec Loss 10.0463 LearningRate 0.000744 Epoch: 2 Global Step: 61680 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:31:40,519-Speed 2512.19 samples/sec Loss 9.9252 LearningRate 0.000744 Epoch: 2 Global Step: 61690 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:31:48,717-Speed 2498.45 samples/sec Loss 10.0869 LearningRate 0.000744 Epoch: 2 Global Step: 61700 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:31:56,915-Speed 2498.52 samples/sec Loss 10.0223 LearningRate 0.000744 Epoch: 2 Global Step: 61710 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:32:05,116-Speed 2497.81 samples/sec Loss 10.0605 LearningRate 0.000744 Epoch: 2 Global Step: 61720 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:32:13,320-Speed 2496.79 samples/sec 
Loss 10.1141 LearningRate 0.000744 Epoch: 2 Global Step: 61730 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:32:21,520-Speed 2497.86 samples/sec Loss 10.0888 LearningRate 0.000744 Epoch: 2 Global Step: 61740 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:32:29,674-Speed 2516.05 samples/sec Loss 9.9809 LearningRate 0.000744 Epoch: 2 Global Step: 61750 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:32:37,873-Speed 2498.18 samples/sec Loss 10.0108 LearningRate 0.000744 Epoch: 2 Global Step: 61760 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:32:46,074-Speed 2497.57 samples/sec Loss 10.1433 LearningRate 0.000745 Epoch: 2 Global Step: 61770 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:32:54,285-Speed 2494.96 samples/sec Loss 10.1868 LearningRate 0.000745 Epoch: 2 Global Step: 61780 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:33:02,490-Speed 2496.62 samples/sec Loss 9.9680 LearningRate 0.000745 Epoch: 2 Global Step: 61790 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:33:10,691-Speed 2497.55 samples/sec Loss 9.9808 LearningRate 0.000745 Epoch: 2 Global Step: 61800 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:33:18,840-Speed 2513.61 samples/sec Loss 10.0211 LearningRate 0.000745 Epoch: 2 Global Step: 61810 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:33:27,039-Speed 2497.99 samples/sec Loss 10.0022 LearningRate 0.000745 Epoch: 2 Global Step: 61820 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:33:35,280-Speed 2485.75 samples/sec Loss 9.8997 LearningRate 0.000745 Epoch: 2 Global Step: 61830 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:33:43,481-Speed 2497.52 samples/sec Loss 9.9682 LearningRate 0.000745 Epoch: 2 Global Step: 61840 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:33:51,684-Speed 2497.42 samples/sec 
Loss 10.0080 LearningRate 0.000746 Epoch: 2 Global Step: 61850 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:33:59,887-Speed 2497.18 samples/sec Loss 10.0188 LearningRate 0.000746 Epoch: 2 Global Step: 61860 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:34:08,046-Speed 2510.55 samples/sec Loss 9.8919 LearningRate 0.000746 Epoch: 2 Global Step: 61870 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:34:16,245-Speed 2498.02 samples/sec Loss 10.0906 LearningRate 0.000746 Epoch: 2 Global Step: 61880 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:34:24,457-Speed 2494.31 samples/sec Loss 10.0722 LearningRate 0.000746 Epoch: 2 Global Step: 61890 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:34:32,654-Speed 2498.85 samples/sec Loss 10.0969 LearningRate 0.000746 Epoch: 2 Global Step: 61900 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:34:40,852-Speed 2498.77 samples/sec Loss 10.0628 LearningRate 0.000746 Epoch: 2 Global Step: 61910 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:34:49,053-Speed 2497.58 samples/sec Loss 9.9525 LearningRate 0.000746 Epoch: 2 Global Step: 61920 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:34:57,201-Speed 2513.97 samples/sec Loss 10.1545 LearningRate 0.000747 Epoch: 2 Global Step: 61930 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:35:05,409-Speed 2495.60 samples/sec Loss 9.9811 LearningRate 0.000747 Epoch: 2 Global Step: 61940 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:35:13,623-Speed 2493.96 samples/sec Loss 10.0723 LearningRate 0.000747 Epoch: 2 Global Step: 61950 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:35:21,830-Speed 2495.64 samples/sec Loss 10.1510 LearningRate 0.000747 Epoch: 2 Global Step: 61960 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:35:30,035-Speed 2496.69 
samples/sec Loss 9.9750 LearningRate 0.000747 Epoch: 2 Global Step: 61970 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:35:38,238-Speed 2496.88 samples/sec Loss 9.9428 LearningRate 0.000747 Epoch: 2 Global Step: 61980 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:35:46,387-Speed 2513.50 samples/sec Loss 10.1666 LearningRate 0.000747 Epoch: 2 Global Step: 61990 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:35:54,599-Speed 2494.43 samples/sec Loss 10.0407 LearningRate 0.000747 Epoch: 2 Global Step: 62000 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:36:02,798-Speed 2498.23 samples/sec Loss 9.9370 LearningRate 0.000747 Epoch: 2 Global Step: 62010 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:36:10,998-Speed 2498.78 samples/sec Loss 10.0656 LearningRate 0.000748 Epoch: 2 Global Step: 62020 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:36:19,204-Speed 2496.30 samples/sec Loss 10.0331 LearningRate 0.000748 Epoch: 2 Global Step: 62030 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:36:27,406-Speed 2497.27 samples/sec Loss 10.0727 LearningRate 0.000748 Epoch: 2 Global Step: 62040 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:36:35,555-Speed 2513.64 samples/sec Loss 10.0257 LearningRate 0.000748 Epoch: 2 Global Step: 62050 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:36:43,756-Speed 2497.88 samples/sec Loss 10.0053 LearningRate 0.000748 Epoch: 2 Global Step: 62060 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:36:51,959-Speed 2497.07 samples/sec Loss 9.9370 LearningRate 0.000748 Epoch: 2 Global Step: 62070 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:37:00,162-Speed 2497.06 samples/sec Loss 9.9007 LearningRate 0.000748 Epoch: 2 Global Step: 62080 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:37:08,362-Speed 2498.07 
samples/sec Loss 9.8989 LearningRate 0.000748 Epoch: 2 Global Step: 62090 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:37:16,567-Speed 2496.29 samples/sec Loss 9.9317 LearningRate 0.000749 Epoch: 2 Global Step: 62100 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:37:24,714-Speed 2514.40 samples/sec Loss 10.0124 LearningRate 0.000749 Epoch: 2 Global Step: 62110 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:37:32,916-Speed 2497.32 samples/sec Loss 10.0450 LearningRate 0.000749 Epoch: 2 Global Step: 62120 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:37:41,120-Speed 2496.70 samples/sec Loss 9.9564 LearningRate 0.000749 Epoch: 2 Global Step: 62130 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:37:49,322-Speed 2497.54 samples/sec Loss 9.8339 LearningRate 0.000749 Epoch: 2 Global Step: 62140 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:37:57,522-Speed 2497.84 samples/sec Loss 9.9074 LearningRate 0.000749 Epoch: 2 Global Step: 62150 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:38:05,738-Speed 2493.30 samples/sec Loss 9.8366 LearningRate 0.000749 Epoch: 2 Global Step: 62160 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:38:13,884-Speed 2514.26 samples/sec Loss 9.8402 LearningRate 0.000749 Epoch: 2 Global Step: 62170 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:38:22,088-Speed 2497.44 samples/sec Loss 9.9953 LearningRate 0.000750 Epoch: 2 Global Step: 62180 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:38:30,288-Speed 2497.96 samples/sec Loss 9.8409 LearningRate 0.000750 Epoch: 2 Global Step: 62190 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:38:38,489-Speed 2498.03 samples/sec Loss 9.8581 LearningRate 0.000750 Epoch: 2 Global Step: 62200 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:38:46,688-Speed 2498.28 
samples/sec Loss 9.8638 LearningRate 0.000750 Epoch: 2 Global Step: 62210 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:38:57,627-Speed 1872.44 samples/sec Loss 10.0265 LearningRate 0.000750 Epoch: 3 Global Step: 62220 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:39:05,771-Speed 2514.97 samples/sec Loss 9.7962 LearningRate 0.000750 Epoch: 3 Global Step: 62230 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:39:13,971-Speed 2498.65 samples/sec Loss 9.9099 LearningRate 0.000750 Epoch: 3 Global Step: 62240 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:39:22,173-Speed 2497.34 samples/sec Loss 9.8690 LearningRate 0.000750 Epoch: 3 Global Step: 62250 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:39:30,378-Speed 2496.30 samples/sec Loss 10.0573 LearningRate 0.000751 Epoch: 3 Global Step: 62260 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:39:38,580-Speed 2497.48 samples/sec Loss 9.9959 LearningRate 0.000751 Epoch: 3 Global Step: 62270 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:39:46,788-Speed 2495.50 samples/sec Loss 9.9816 LearningRate 0.000751 Epoch: 3 Global Step: 62280 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:39:54,938-Speed 2513.38 samples/sec Loss 10.0114 LearningRate 0.000751 Epoch: 3 Global Step: 62290 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:40:03,136-Speed 2498.45 samples/sec Loss 10.0090 LearningRate 0.000751 Epoch: 3 Global Step: 62300 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:40:11,336-Speed 2497.98 samples/sec Loss 9.9605 LearningRate 0.000751 Epoch: 3 Global Step: 62310 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:40:19,541-Speed 2496.54 samples/sec Loss 9.9002 LearningRate 0.000751 Epoch: 3 Global Step: 62320 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:40:27,743-Speed 2497.29 
samples/sec Loss 9.9226 LearningRate 0.000751 Epoch: 3 Global Step: 62330 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:40:35,967-Speed 2490.69 samples/sec Loss 9.9144 LearningRate 0.000751 Epoch: 3 Global Step: 62340 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:40:44,118-Speed 2512.99 samples/sec Loss 9.8117 LearningRate 0.000752 Epoch: 3 Global Step: 62350 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:40:52,325-Speed 2495.79 samples/sec Loss 9.8611 LearningRate 0.000752 Epoch: 3 Global Step: 62360 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:41:00,528-Speed 2496.98 samples/sec Loss 9.8946 LearningRate 0.000752 Epoch: 3 Global Step: 62370 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:41:08,732-Speed 2496.79 samples/sec Loss 9.9819 LearningRate 0.000752 Epoch: 3 Global Step: 62380 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:41:16,948-Speed 2493.08 samples/sec Loss 10.0109 LearningRate 0.000752 Epoch: 3 Global Step: 62390 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:41:25,151-Speed 2496.86 samples/sec Loss 9.9386 LearningRate 0.000752 Epoch: 3 Global Step: 62400 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:41:33,303-Speed 2512.59 samples/sec Loss 9.9518 LearningRate 0.000752 Epoch: 3 Global Step: 62410 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:41:41,507-Speed 2497.11 samples/sec Loss 9.9493 LearningRate 0.000752 Epoch: 3 Global Step: 62420 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:41:49,712-Speed 2496.58 samples/sec Loss 9.9613 LearningRate 0.000753 Epoch: 3 Global Step: 62430 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:41:57,912-Speed 2497.80 samples/sec Loss 10.0515 LearningRate 0.000753 Epoch: 3 Global Step: 62440 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:42:06,115-Speed 2496.98 
samples/sec Loss 9.8680 LearningRate 0.000753 Epoch: 3 Global Step: 62450 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:42:14,316-Speed 2497.99 samples/sec Loss 10.0739 LearningRate 0.000753 Epoch: 3 Global Step: 62460 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:42:22,467-Speed 2512.89 samples/sec Loss 9.9265 LearningRate 0.000753 Epoch: 3 Global Step: 62470 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:42:30,679-Speed 2494.29 samples/sec Loss 9.9841 LearningRate 0.000753 Epoch: 3 Global Step: 62480 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:42:38,888-Speed 2495.27 samples/sec Loss 9.8589 LearningRate 0.000753 Epoch: 3 Global Step: 62490 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:42:47,091-Speed 2496.93 samples/sec Loss 9.9499 LearningRate 0.000753 Epoch: 3 Global Step: 62500 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:42:55,295-Speed 2496.88 samples/sec Loss 9.8571 LearningRate 0.000754 Epoch: 3 Global Step: 62510 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:43:03,507-Speed 2494.52 samples/sec Loss 9.8494 LearningRate 0.000754 Epoch: 3 Global Step: 62520 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:43:11,660-Speed 2512.45 samples/sec Loss 9.9017 LearningRate 0.000754 Epoch: 3 Global Step: 62530 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:43:19,877-Speed 2492.62 samples/sec Loss 9.8948 LearningRate 0.000754 Epoch: 3 Global Step: 62540 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:43:28,092-Speed 2493.47 samples/sec Loss 9.8718 LearningRate 0.000754 Epoch: 3 Global Step: 62550 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:43:36,299-Speed 2496.02 samples/sec Loss 9.9402 LearningRate 0.000754 Epoch: 3 Global Step: 62560 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:43:44,505-Speed 2495.95 
samples/sec Loss 9.8129 LearningRate 0.000754 Epoch: 3 Global Step: 62570 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:43:52,707-Speed 2497.35 samples/sec Loss 9.7320 LearningRate 0.000754 Epoch: 3 Global Step: 62580 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:44:00,858-Speed 2513.25 samples/sec Loss 9.8079 LearningRate 0.000754 Epoch: 3 Global Step: 62590 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:44:09,063-Speed 2496.50 samples/sec Loss 9.8051 LearningRate 0.000755 Epoch: 3 Global Step: 62600 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:44:17,264-Speed 2497.34 samples/sec Loss 9.8237 LearningRate 0.000755 Epoch: 3 Global Step: 62610 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:44:25,465-Speed 2497.81 samples/sec Loss 9.7696 LearningRate 0.000755 Epoch: 3 Global Step: 62620 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:44:33,668-Speed 2497.05 samples/sec Loss 9.8799 LearningRate 0.000755 Epoch: 3 Global Step: 62630 Fp16 Grad Scale: 262144 Required: 175 hours Training: 2022-07-06 04:44:41,837-Speed 2507.14 samples/sec Loss 9.9341 LearningRate 0.000755 Epoch: 3 Global Step: 62640 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:44:49,988-Speed 2512.97 samples/sec Loss 9.8730 LearningRate 0.000755 Epoch: 3 Global Step: 62650 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:44:58,204-Speed 2493.30 samples/sec Loss 9.7019 LearningRate 0.000755 Epoch: 3 Global Step: 62660 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:45:06,410-Speed 2496.24 samples/sec Loss 9.8160 LearningRate 0.000755 Epoch: 3 Global Step: 62670 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:45:14,612-Speed 2497.60 samples/sec Loss 9.8374 LearningRate 0.000756 Epoch: 3 Global Step: 62680 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:45:22,815-Speed 2496.78 
samples/sec Loss 9.8688 LearningRate 0.000756 Epoch: 3 Global Step: 62690 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:45:31,023-Speed 2495.81 samples/sec Loss 10.1129 LearningRate 0.000756 Epoch: 3 Global Step: 62700 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:45:39,176-Speed 2512.24 samples/sec Loss 9.9696 LearningRate 0.000756 Epoch: 3 Global Step: 62710 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:45:47,386-Speed 2495.07 samples/sec Loss 10.1470 LearningRate 0.000756 Epoch: 3 Global Step: 62720 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:45:55,595-Speed 2495.34 samples/sec Loss 10.0145 LearningRate 0.000756 Epoch: 3 Global Step: 62730 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:46:03,798-Speed 2496.99 samples/sec Loss 10.0281 LearningRate 0.000756 Epoch: 3 Global Step: 62740 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:46:11,999-Speed 2497.71 samples/sec Loss 10.0067 LearningRate 0.000756 Epoch: 3 Global Step: 62750 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:46:20,204-Speed 2496.33 samples/sec Loss 9.9874 LearningRate 0.000757 Epoch: 3 Global Step: 62760 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:46:28,350-Speed 2514.41 samples/sec Loss 9.8992 LearningRate 0.000757 Epoch: 3 Global Step: 62770 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:46:36,553-Speed 2497.23 samples/sec Loss 10.0082 LearningRate 0.000757 Epoch: 3 Global Step: 62780 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:46:44,756-Speed 2496.96 samples/sec Loss 9.9224 LearningRate 0.000757 Epoch: 3 Global Step: 62790 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:46:52,956-Speed 2497.96 samples/sec Loss 9.8572 LearningRate 0.000757 Epoch: 3 Global Step: 62800 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:47:01,156-Speed 2497.99 
samples/sec Loss 9.9007 LearningRate 0.000757 Epoch: 3 Global Step: 62810 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:47:09,356-Speed 2498.01 samples/sec Loss 9.8783 LearningRate 0.000757 Epoch: 3 Global Step: 62820 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:47:17,502-Speed 2514.25 samples/sec Loss 9.8350 LearningRate 0.000757 Epoch: 3 Global Step: 62830 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:47:25,708-Speed 2496.21 samples/sec Loss 9.7878 LearningRate 0.000757 Epoch: 3 Global Step: 62840 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:47:33,911-Speed 2497.22 samples/sec Loss 9.8126 LearningRate 0.000758 Epoch: 3 Global Step: 62850 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:47:42,112-Speed 2497.94 samples/sec Loss 9.8394 LearningRate 0.000758 Epoch: 3 Global Step: 62860 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:47:50,313-Speed 2497.32 samples/sec Loss 10.0250 LearningRate 0.000758 Epoch: 3 Global Step: 62870 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:47:58,532-Speed 2492.29 samples/sec Loss 9.9605 LearningRate 0.000758 Epoch: 3 Global Step: 62880 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:48:06,686-Speed 2511.92 samples/sec Loss 9.8560 LearningRate 0.000758 Epoch: 3 Global Step: 62890 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:48:14,888-Speed 2497.56 samples/sec Loss 9.9338 LearningRate 0.000758 Epoch: 3 Global Step: 62900 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:48:23,090-Speed 2497.43 samples/sec Loss 9.8458 LearningRate 0.000758 Epoch: 3 Global Step: 62910 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:48:31,292-Speed 2497.41 samples/sec Loss 9.9233 LearningRate 0.000758 Epoch: 3 Global Step: 62920 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:48:39,491-Speed 2498.17 
samples/sec Loss 9.9599 LearningRate 0.000759 Epoch: 3 Global Step: 62930 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:48:47,694-Speed 2497.03 samples/sec Loss 9.8473 LearningRate 0.000759 Epoch: 3 Global Step: 62940 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:48:55,844-Speed 2513.33 samples/sec Loss 9.9512 LearningRate 0.000759 Epoch: 3 Global Step: 62950 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:49:04,046-Speed 2497.45 samples/sec Loss 9.8281 LearningRate 0.000759 Epoch: 3 Global Step: 62960 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:49:12,249-Speed 2496.88 samples/sec Loss 9.8550 LearningRate 0.000759 Epoch: 3 Global Step: 62970 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:49:20,451-Speed 2497.38 samples/sec Loss 9.8958 LearningRate 0.000759 Epoch: 3 Global Step: 62980 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:49:28,654-Speed 2497.06 samples/sec Loss 9.7745 LearningRate 0.000759 Epoch: 3 Global Step: 62990 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:49:36,864-Speed 2494.85 samples/sec Loss 9.9003 LearningRate 0.000759 Epoch: 3 Global Step: 63000 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:49:45,013-Speed 2513.75 samples/sec Loss 9.9343 LearningRate 0.000760 Epoch: 3 Global Step: 63010 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:49:53,215-Speed 2497.46 samples/sec Loss 9.9061 LearningRate 0.000760 Epoch: 3 Global Step: 63020 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:50:01,416-Speed 2497.56 samples/sec Loss 9.8166 LearningRate 0.000760 Epoch: 3 Global Step: 63030 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:50:09,620-Speed 2496.90 samples/sec Loss 9.8515 LearningRate 0.000760 Epoch: 3 Global Step: 63040 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:50:17,822-Speed 2497.21 
samples/sec Loss 9.9005 LearningRate 0.000760 Epoch: 3 Global Step: 63050 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:50:26,025-Speed 2497.11 samples/sec Loss 9.8821 LearningRate 0.000760 Epoch: 3 Global Step: 63060 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:50:34,171-Speed 2514.33 samples/sec Loss 9.9101 LearningRate 0.000760 Epoch: 3 Global Step: 63070 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:50:42,371-Speed 2497.98 samples/sec Loss 9.9701 LearningRate 0.000760 Epoch: 3 Global Step: 63080 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:50:50,574-Speed 2497.29 samples/sec Loss 9.8847 LearningRate 0.000761 Epoch: 3 Global Step: 63090 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:50:58,774-Speed 2498.02 samples/sec Loss 9.9282 LearningRate 0.000761 Epoch: 3 Global Step: 63100 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:51:06,979-Speed 2496.43 samples/sec Loss 9.8625 LearningRate 0.000761 Epoch: 3 Global Step: 63110 Fp16 Grad Scale: 131072 Required: 175 hours Training: 2022-07-06 04:51:15,138-Speed 2510.57 samples/sec Loss 9.6932 LearningRate 0.000761 Epoch: 3 Global Step: 63120 Fp16 Grad Scale: 65536 Required: 175 hours Training: 2022-07-06 04:51:23,287-Speed 2513.59 samples/sec Loss 9.8748 LearningRate 0.000761 Epoch: 3 Global Step: 63130 Fp16 Grad Scale: 65536 Required: 175 hours Training: 2022-07-06 04:51:31,486-Speed 2498.24 samples/sec Loss 9.8223 LearningRate 0.000761 Epoch: 3 Global Step: 63140 Fp16 Grad Scale: 65536 Required: 175 hours Training: 2022-07-06 04:51:39,683-Speed 2498.71 samples/sec Loss 9.8095 LearningRate 0.000761 Epoch: 3 Global Step: 63150 Fp16 Grad Scale: 65536 Required: 175 hours Training: 2022-07-06 04:51:47,884-Speed 2497.70 samples/sec Loss 9.8729 LearningRate 0.000761 Epoch: 3 Global Step: 63160 Fp16 Grad Scale: 65536 Required: 175 hours Training: 2022-07-06 04:51:56,085-Speed 2497.99 samples/sec 
Loss 9.8126 LearningRate 0.000761 Epoch: 3 Global Step: 63170 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:52:04,289-Speed 2496.71 samples/sec Loss 9.8678 LearningRate 0.000762 Epoch: 3 Global Step: 63180 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:52:12,434-Speed 2515.46 samples/sec Loss 9.7488 LearningRate 0.000762 Epoch: 3 Global Step: 63190 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:52:20,635-Speed 2497.92 samples/sec Loss 9.8573 LearningRate 0.000762 Epoch: 3 Global Step: 63200 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:52:28,833-Speed 2498.38 samples/sec Loss 9.8540 LearningRate 0.000762 Epoch: 3 Global Step: 63210 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:52:37,036-Speed 2497.14 samples/sec Loss 9.9011 LearningRate 0.000762 Epoch: 3 Global Step: 63220 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:52:45,238-Speed 2497.52 samples/sec Loss 9.9389 LearningRate 0.000762 Epoch: 3 Global Step: 63230 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:52:53,438-Speed 2497.78 samples/sec Loss 9.7551 LearningRate 0.000762 Epoch: 3 Global Step: 63240 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:53:01,584-Speed 2514.52 samples/sec Loss 9.7996 LearningRate 0.000762 Epoch: 3 Global Step: 63250 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:53:09,786-Speed 2497.65 samples/sec Loss 9.7734 LearningRate 0.000763 Epoch: 3 Global Step: 63260 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:53:17,989-Speed 2496.89 samples/sec Loss 9.8251 LearningRate 0.000763 Epoch: 3 Global Step: 63270 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:53:26,189-Speed 2497.81 samples/sec Loss 9.7750 LearningRate 0.000763 Epoch: 3 Global Step: 63280 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:53:34,406-Speed 2492.91 samples/sec Loss 9.8139 LearningRate 0.000763 Epoch: 3 Global Step: 63290 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:53:42,608-Speed 2497.24 samples/sec Loss 9.7600 LearningRate 0.000763 Epoch: 3 Global Step: 63300 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:53:50,757-Speed 2513.67 samples/sec Loss 9.8082 LearningRate 0.000763 Epoch: 3 Global Step: 63310 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:53:58,975-Speed 2492.72 samples/sec Loss 9.7722 LearningRate 0.000763 Epoch: 3 Global Step: 63320 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:54:07,188-Speed 2493.93 samples/sec Loss 9.7680 LearningRate 0.000763 Epoch: 3 Global Step: 63330 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:54:15,392-Speed 2496.71 samples/sec Loss 9.7002 LearningRate 0.000764 Epoch: 3 Global Step: 63340 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:54:23,596-Speed 2496.86 samples/sec Loss 9.7239 LearningRate 0.000764 Epoch: 3 Global Step: 63350 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:54:31,807-Speed 2494.62 samples/sec Loss 9.7760 LearningRate 0.000764 Epoch: 3 Global Step: 63360 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:54:39,958-Speed 2513.12 samples/sec Loss 9.8076 LearningRate 0.000764 Epoch: 3 Global Step: 63370 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:54:48,156-Speed 2498.19 samples/sec Loss 9.8146 LearningRate 0.000764 Epoch: 3 Global Step: 63380 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:54:56,366-Speed 2495.16 samples/sec Loss 9.8365 LearningRate 0.000764 Epoch: 3 Global Step: 63390 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:55:04,566-Speed 2498.13 samples/sec Loss 9.7698 LearningRate 0.000764 Epoch: 3 Global Step: 63400 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:55:12,764-Speed 2498.27 samples/sec Loss 9.8270 LearningRate 0.000764 Epoch: 3 Global Step: 63410 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:55:20,963-Speed 2498.14 samples/sec Loss 10.0526 LearningRate 0.000764 Epoch: 3 Global Step: 63420 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:55:29,106-Speed 2515.68 samples/sec Loss 9.8095 LearningRate 0.000765 Epoch: 3 Global Step: 63430 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:55:37,306-Speed 2497.86 samples/sec Loss 9.9508 LearningRate 0.000765 Epoch: 3 Global Step: 63440 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:55:45,507-Speed 2497.53 samples/sec Loss 9.8666 LearningRate 0.000765 Epoch: 3 Global Step: 63450 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:55:53,706-Speed 2498.43 samples/sec Loss 9.8241 LearningRate 0.000765 Epoch: 3 Global Step: 63460 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:56:01,906-Speed 2497.92 samples/sec Loss 9.7979 LearningRate 0.000765 Epoch: 3 Global Step: 63470 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:56:10,106-Speed 2497.88 samples/sec Loss 9.8982 LearningRate 0.000765 Epoch: 3 Global Step: 63480 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:56:18,253-Speed 2514.02 samples/sec Loss 9.7942 LearningRate 0.000765 Epoch: 3 Global Step: 63490 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:56:26,454-Speed 2497.86 samples/sec Loss 9.8493 LearningRate 0.000765 Epoch: 3 Global Step: 63500 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:56:34,654-Speed 2497.90 samples/sec Loss 9.8419 LearningRate 0.000766 Epoch: 3 Global Step: 63510 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:56:42,862-Speed 2495.45 samples/sec Loss 9.8208 LearningRate 0.000766 Epoch: 3 Global Step: 63520 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:56:51,066-Speed 2496.85 samples/sec Loss 9.7727 LearningRate 0.000766 Epoch: 3 Global Step: 63530 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:56:59,262-Speed 2499.09 samples/sec Loss 9.7702 LearningRate 0.000766 Epoch: 3 Global Step: 63540 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:57:07,418-Speed 2511.75 samples/sec Loss 9.7148 LearningRate 0.000766 Epoch: 3 Global Step: 63550 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:57:15,617-Speed 2498.24 samples/sec Loss 9.7003 LearningRate 0.000766 Epoch: 3 Global Step: 63560 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:57:23,813-Speed 2499.00 samples/sec Loss 9.7419 LearningRate 0.000766 Epoch: 3 Global Step: 63570 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:57:32,012-Speed 2498.45 samples/sec Loss 9.7546 LearningRate 0.000766 Epoch: 3 Global Step: 63580 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:57:40,213-Speed 2497.67 samples/sec Loss 9.7113 LearningRate 0.000767 Epoch: 3 Global Step: 63590 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:57:48,415-Speed 2497.44 samples/sec Loss 9.7018 LearningRate 0.000767 Epoch: 3 Global Step: 63600 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:57:56,566-Speed 2512.75 samples/sec Loss 9.7292 LearningRate 0.000767 Epoch: 3 Global Step: 63610 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:58:04,768-Speed 2497.51 samples/sec Loss 9.6393 LearningRate 0.000767 Epoch: 3 Global Step: 63620 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:58:12,970-Speed 2497.13 samples/sec Loss 9.7441 LearningRate 0.000767 Epoch: 3 Global Step: 63630 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:58:21,178-Speed 2495.67 samples/sec Loss 9.7841 LearningRate 0.000767 Epoch: 3 Global Step: 63640 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:58:29,382-Speed 2496.77 samples/sec Loss 9.7131 LearningRate 0.000767 Epoch: 3 Global Step: 63650 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:58:37,584-Speed 2497.41 samples/sec Loss 9.7784 LearningRate 0.000767 Epoch: 3 Global Step: 63660 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:58:45,733-Speed 2513.53 samples/sec Loss 9.6699 LearningRate 0.000768 Epoch: 3 Global Step: 63670 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:58:53,937-Speed 2496.71 samples/sec Loss 9.7599 LearningRate 0.000768 Epoch: 3 Global Step: 63680 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:59:02,138-Speed 2497.79 samples/sec Loss 9.8127 LearningRate 0.000768 Epoch: 3 Global Step: 63690 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:59:10,361-Speed 2491.01 samples/sec Loss 9.7453 LearningRate 0.000768 Epoch: 3 Global Step: 63700 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:59:18,563-Speed 2497.31 samples/sec Loss 9.8564 LearningRate 0.000768 Epoch: 3 Global Step: 63710 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:59:26,771-Speed 2495.53 samples/sec Loss 9.8220 LearningRate 0.000768 Epoch: 3 Global Step: 63720 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:59:34,917-Speed 2514.68 samples/sec Loss 9.7977 LearningRate 0.000768 Epoch: 3 Global Step: 63730 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:59:43,116-Speed 2498.18 samples/sec Loss 9.8332 LearningRate 0.000768 Epoch: 3 Global Step: 63740 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:59:51,315-Speed 2498.23 samples/sec Loss 9.8204 LearningRate 0.000768 Epoch: 3 Global Step: 63750 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 04:59:59,517-Speed 2497.54 samples/sec Loss 9.7655 LearningRate 0.000769 Epoch: 3 Global Step: 63760 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:00:07,733-Speed 2492.98 samples/sec Loss 9.7303 LearningRate 0.000769 Epoch: 3 Global Step: 63770 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:00:15,936-Speed 2497.05 samples/sec Loss 9.7925 LearningRate 0.000769 Epoch: 3 Global Step: 63780 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:00:24,083-Speed 2514.16 samples/sec Loss 9.8680 LearningRate 0.000769 Epoch: 3 Global Step: 63790 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:00:32,284-Speed 2497.67 samples/sec Loss 9.6716 LearningRate 0.000769 Epoch: 3 Global Step: 63800 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:00:40,485-Speed 2497.48 samples/sec Loss 9.8169 LearningRate 0.000769 Epoch: 3 Global Step: 63810 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:00:48,690-Speed 2496.57 samples/sec Loss 9.7908 LearningRate 0.000769 Epoch: 3 Global Step: 63820 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:00:56,887-Speed 2498.82 samples/sec Loss 9.7510 LearningRate 0.000769 Epoch: 3 Global Step: 63830 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:01:05,090-Speed 2497.17 samples/sec Loss 9.7449 LearningRate 0.000770 Epoch: 3 Global Step: 63840 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:01:13,241-Speed 2513.15 samples/sec Loss 9.7939 LearningRate 0.000770 Epoch: 3 Global Step: 63850 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:01:21,438-Speed 2498.65 samples/sec Loss 9.7692 LearningRate 0.000770 Epoch: 3 Global Step: 63860 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:01:29,650-Speed 2494.39 samples/sec Loss 9.7926 LearningRate 0.000770 Epoch: 3 Global Step: 63870 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:01:37,850-Speed 2498.21 samples/sec Loss 9.6756 LearningRate 0.000770 Epoch: 3 Global Step: 63880 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:01:46,048-Speed 2498.77 samples/sec Loss 9.6921 LearningRate 0.000770 Epoch: 3 Global Step: 63890 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:01:54,249-Speed 2497.47 samples/sec Loss 9.6347 LearningRate 0.000770 Epoch: 3 Global Step: 63900 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:02:02,400-Speed 2513.28 samples/sec Loss 9.7208 LearningRate 0.000770 Epoch: 3 Global Step: 63910 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:02:10,603-Speed 2496.82 samples/sec Loss 9.7131 LearningRate 0.000771 Epoch: 3 Global Step: 63920 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:02:18,804-Speed 2497.64 samples/sec Loss 9.6778 LearningRate 0.000771 Epoch: 3 Global Step: 63930 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:02:27,003-Speed 2498.63 samples/sec Loss 9.7570 LearningRate 0.000771 Epoch: 3 Global Step: 63940 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:02:35,207-Speed 2496.70 samples/sec Loss 9.8084 LearningRate 0.000771 Epoch: 3 Global Step: 63950 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:02:43,406-Speed 2498.20 samples/sec Loss 9.8348 LearningRate 0.000771 Epoch: 3 Global Step: 63960 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:02:51,552-Speed 2514.53 samples/sec Loss 9.9466 LearningRate 0.000771 Epoch: 3 Global Step: 63970 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:02:59,757-Speed 2496.64 samples/sec Loss 9.9782 LearningRate 0.000771 Epoch: 3 Global Step: 63980 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:03:07,956-Speed 2498.21 samples/sec Loss 9.7180 LearningRate 0.000771 Epoch: 3 Global Step: 63990 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:03:16,153-Speed 2498.81 samples/sec Loss 9.8118 LearningRate 0.000771 Epoch: 3 Global Step: 64000 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:03:24,355-Speed 2497.56 samples/sec Loss 9.7330 LearningRate 0.000772 Epoch: 3 Global Step: 64010 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:03:32,567-Speed 2494.16 samples/sec Loss 9.5850 LearningRate 0.000772 Epoch: 3 Global Step: 64020 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:03:40,730-Speed 2509.38 samples/sec Loss 9.7031 LearningRate 0.000772 Epoch: 3 Global Step: 64030 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:03:48,929-Speed 2498.22 samples/sec Loss 9.6888 LearningRate 0.000772 Epoch: 3 Global Step: 64040 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:03:57,131-Speed 2497.50 samples/sec Loss 9.8252 LearningRate 0.000772 Epoch: 3 Global Step: 64050 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:04:05,334-Speed 2496.85 samples/sec Loss 9.7053 LearningRate 0.000772 Epoch: 3 Global Step: 64060 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:04:13,533-Speed 2498.49 samples/sec Loss 9.6519 LearningRate 0.000772 Epoch: 3 Global Step: 64070 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:04:21,734-Speed 2497.84 samples/sec Loss 9.7794 LearningRate 0.000772 Epoch: 3 Global Step: 64080 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:04:29,884-Speed 2513.10 samples/sec Loss 9.6964 LearningRate 0.000773 Epoch: 3 Global Step: 64090 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:04:38,084-Speed 2497.95 samples/sec Loss 9.8143 LearningRate 0.000773 Epoch: 3 Global Step: 64100 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:04:46,288-Speed 2496.79 samples/sec Loss 9.6564 LearningRate 0.000773 Epoch: 3 Global Step: 64110 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:04:54,488-Speed 2497.89 samples/sec Loss 9.7275 LearningRate 0.000773 Epoch: 3 Global Step: 64120 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:05:02,685-Speed 2498.82 samples/sec Loss 9.5864 LearningRate 0.000773 Epoch: 3 Global Step: 64130 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:05:10,905-Speed 2491.68 samples/sec Loss 9.6522 LearningRate 0.000773 Epoch: 3 Global Step: 64140 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:05:19,051-Speed 2514.43 samples/sec Loss 9.7667 LearningRate 0.000773 Epoch: 3 Global Step: 64150 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:05:27,250-Speed 2498.71 samples/sec Loss 9.7957 LearningRate 0.000773 Epoch: 3 Global Step: 64160 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:05:35,455-Speed 2496.61 samples/sec Loss 9.5882 LearningRate 0.000774 Epoch: 3 Global Step: 64170 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:05:43,652-Speed 2498.81 samples/sec Loss 9.7527 LearningRate 0.000774 Epoch: 3 Global Step: 64180 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:05:51,853-Speed 2497.68 samples/sec Loss 9.6622 LearningRate 0.000774 Epoch: 3 Global Step: 64190 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:06:00,051-Speed 2498.67 samples/sec Loss 9.5826 LearningRate 0.000774 Epoch: 3 Global Step: 64200 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:06:08,196-Speed 2514.69 samples/sec Loss 9.8576 LearningRate 0.000774 Epoch: 3 Global Step: 64210 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:06:16,406-Speed 2494.86 samples/sec Loss 9.6689 LearningRate 0.000774 Epoch: 3 Global Step: 64220 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:06:24,607-Speed 2497.81 samples/sec Loss 9.7191 LearningRate 0.000774 Epoch: 3 Global Step: 64230 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:06:32,805-Speed 2498.52 samples/sec Loss 9.8010 LearningRate 0.000774 Epoch: 3 Global Step: 64240 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:06:41,005-Speed 2498.09 samples/sec Loss 9.8486 LearningRate 0.000774 Epoch: 3 Global Step: 64250 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:06:49,216-Speed 2494.44 samples/sec Loss 9.6675 LearningRate 0.000775 Epoch: 3 Global Step: 64260 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:06:57,366-Speed 2513.32 samples/sec Loss 9.7066 LearningRate 0.000775 Epoch: 3 Global Step: 64270 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:07:05,571-Speed 2496.44 samples/sec Loss 9.7288 LearningRate 0.000775 Epoch: 3 Global Step: 64280 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:07:13,774-Speed 2496.91 samples/sec Loss 9.7417 LearningRate 0.000775 Epoch: 3 Global Step: 64290 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:07:21,976-Speed 2497.57 samples/sec Loss 9.7092 LearningRate 0.000775 Epoch: 3 Global Step: 64300 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:07:30,186-Speed 2495.24 samples/sec Loss 9.6688 LearningRate 0.000775 Epoch: 3 Global Step: 64310 Fp16 Grad Scale: 65536 Required: 175 hours
Training: 2022-07-06 05:07:38,391-Speed 2496.37 samples/sec Loss 9.7398 LearningRate 0.000775 Epoch: 3 Global Step: 64320 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:07:46,542-Speed 2513.04 samples/sec Loss 9.5853 LearningRate 0.000775 Epoch: 3 Global Step: 64330 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:07:54,742-Speed 2497.99 samples/sec Loss 9.6537 LearningRate 0.000776 Epoch: 3 Global Step: 64340 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:08:02,945-Speed 2497.02 samples/sec Loss 9.6165 LearningRate 0.000776 Epoch: 3 Global Step: 64350 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:08:11,147-Speed 2497.27 samples/sec Loss 9.6705 LearningRate 0.000776 Epoch: 3 Global Step: 64360 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:08:19,351-Speed 2496.81 samples/sec Loss 9.6295 LearningRate 0.000776 Epoch: 3 Global Step: 64370 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:08:27,551-Speed 2498.21 samples/sec Loss 9.5127 LearningRate 0.000776 Epoch: 3 Global Step: 64380 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:08:35,695-Speed 2515.05 samples/sec Loss 9.6424 LearningRate 0.000776 Epoch: 3 Global Step: 64390 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:08:43,901-Speed 2496.35 samples/sec Loss 9.6041 LearningRate 0.000776 Epoch: 3 Global Step: 64400 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:08:52,101-Speed 2497.74 samples/sec Loss 9.5823 LearningRate 0.000776 Epoch: 3 Global Step: 64410 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:09:00,302-Speed 2497.90 samples/sec Loss 9.5918 LearningRate 0.000777 Epoch: 3 Global Step: 64420 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:09:08,504-Speed 2497.39 samples/sec Loss 9.5254 LearningRate 0.000777 Epoch: 3 Global Step: 64430 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:09:16,702-Speed 2498.50 samples/sec Loss 9.5822 LearningRate 0.000777 Epoch: 3 Global Step: 64440 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:09:24,853-Speed 2513.24 samples/sec Loss 9.5816 LearningRate 0.000777 Epoch: 3 Global Step: 64450 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:09:33,056-Speed 2496.97 samples/sec Loss 9.5170 LearningRate 0.000777 Epoch: 3 Global Step: 64460 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:09:41,265-Speed 2495.07 samples/sec Loss 9.6284 LearningRate 0.000777 Epoch: 3 Global Step: 64470 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:09:49,465-Speed 2497.96 samples/sec Loss 9.6581 LearningRate 0.000777 Epoch: 3 Global Step: 64480 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:09:57,669-Speed 2496.69 samples/sec Loss 9.5226 LearningRate 0.000777 Epoch: 3 Global Step: 64490 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:10:05,870-Speed 2498.16 samples/sec Loss 9.6542 LearningRate 0.000778 Epoch: 3 Global Step: 64500 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:10:14,019-Speed 2513.46 samples/sec Loss 9.6361 LearningRate 0.000778 Epoch: 3 Global Step: 64510 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:10:22,221-Speed 2497.17 samples/sec Loss 9.4975 LearningRate 0.000778 Epoch: 3 Global Step: 64520 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:10:30,424-Speed 2497.37 samples/sec Loss 9.5948 LearningRate 0.000778 Epoch: 3 Global Step: 64530 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:10:38,624-Speed 2497.85 samples/sec Loss 9.4952 LearningRate 0.000778 Epoch: 3 Global Step: 64540 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:10:46,826-Speed 2497.32 samples/sec Loss 9.5135 LearningRate 0.000778 Epoch: 3 Global Step: 64550 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:10:55,026-Speed 2498.07 samples/sec Loss 9.7144 LearningRate 0.000778 Epoch: 3 Global Step: 64560 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:11:03,183-Speed 2511.03 samples/sec Loss 9.6740 LearningRate 0.000778 Epoch: 3 Global Step: 64570 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:11:11,387-Speed 2496.55 samples/sec Loss 9.6666 LearningRate 0.000778 Epoch: 3 Global Step: 64580 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:11:19,589-Speed 2497.39 samples/sec Loss 9.6392 LearningRate 0.000779 Epoch: 3 Global Step: 64590 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:11:27,789-Speed 2498.03 samples/sec Loss 9.5398 LearningRate 0.000779 Epoch: 3 Global Step: 64600 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:11:35,990-Speed 2497.54 samples/sec Loss 9.7218 LearningRate 0.000779 Epoch: 3 Global Step: 64610 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:11:44,192-Speed 2497.39 samples/sec Loss 9.6179 LearningRate 0.000779 Epoch: 3 Global Step: 64620 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:11:52,341-Speed 2513.79 samples/sec Loss 9.6449 LearningRate 0.000779 Epoch: 3 Global Step: 64630 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:12:00,555-Speed 2493.64 samples/sec Loss 9.6761 LearningRate 0.000779 Epoch: 3 Global Step: 64640 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:12:08,753-Speed 2498.70 samples/sec Loss 9.6088 LearningRate 0.000779 Epoch: 3 Global Step: 64650 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:12:16,953-Speed 2498.10 samples/sec Loss 9.5907 LearningRate 0.000779 Epoch: 3 Global Step: 64660 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:12:25,156-Speed 2496.95 samples/sec Loss 9.6059 LearningRate 0.000780 Epoch: 3 Global Step: 64670 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:12:33,361-Speed 2496.52 samples/sec Loss 9.7029 LearningRate 0.000780 Epoch: 3 Global Step: 64680 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:12:41,511-Speed 2513.43 samples/sec Loss 9.6653 LearningRate 0.000780 Epoch: 3 Global Step: 64690 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:12:49,709-Speed 2498.63 samples/sec Loss 9.6885 LearningRate 0.000780 Epoch: 3 Global Step: 64700 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:12:57,913-Speed 2496.65 samples/sec Loss 9.6360 LearningRate 0.000780 Epoch: 3 Global Step: 64710 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:13:06,111-Speed 2498.72 samples/sec Loss 9.6448 LearningRate 0.000780 Epoch: 3 Global Step: 64720 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:13:14,313-Speed 2497.52 samples/sec Loss 9.6448 LearningRate 0.000780 Epoch: 3 Global Step: 64730 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:13:22,514-Speed 2497.38 samples/sec Loss 9.6554 LearningRate 0.000780 Epoch: 3 Global Step: 64740 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:13:30,660-Speed 2514.76 samples/sec Loss 9.5160 LearningRate 0.000781 Epoch: 3 Global Step: 64750 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:13:38,858-Speed 2498.53 samples/sec Loss 9.5558 LearningRate 0.000781 Epoch: 3 Global Step: 64760 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:13:47,058-Speed 2497.95 samples/sec Loss 9.6935 LearningRate 0.000781 Epoch: 3 Global Step: 64770 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:13:55,260-Speed 2497.37 samples/sec Loss 9.6166 LearningRate 0.000781 Epoch: 3 Global Step: 64780 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:14:03,464-Speed 2496.70 samples/sec Loss 9.5816 LearningRate 0.000781 Epoch: 3 Global Step: 64790 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:14:11,667-Speed 2497.16 samples/sec Loss 9.7686 LearningRate 0.000781 Epoch: 3 Global Step: 64800 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:14:19,813-Speed 2514.54 samples/sec Loss 9.6849 LearningRate 0.000781 Epoch: 3 Global Step: 64810 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:14:28,029-Speed 2493.20 samples/sec Loss 9.5981 LearningRate 0.000781 Epoch: 3 Global Step: 64820 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:14:36,228-Speed 2497.84 samples/sec Loss 9.5369 LearningRate 0.000781 Epoch: 3 Global Step: 64830 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:14:44,430-Speed 2497.52 samples/sec Loss 9.5651 LearningRate 0.000782 Epoch: 3 Global Step: 64840 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:14:52,632-Speed 2497.36 samples/sec Loss 9.5937 LearningRate 0.000782 Epoch: 3 Global Step: 64850 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:15:00,848-Speed 2493.08 samples/sec Loss 9.5724 LearningRate 0.000782 Epoch: 3 Global Step: 64860 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:15:08,996-Speed 2514.12 samples/sec Loss 9.6081 LearningRate 0.000782 Epoch: 3 Global Step: 64870 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:15:17,201-Speed 2496.54 samples/sec Loss 9.6490 LearningRate 0.000782 Epoch: 3 Global Step: 64880 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:15:25,401-Speed 2497.93 samples/sec Loss 9.5699 LearningRate 0.000782 Epoch: 3 Global Step: 64890 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:15:33,604-Speed 2497.24 samples/sec Loss 9.5340 LearningRate 0.000782 Epoch: 3 Global Step: 64900 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:15:41,808-Speed 2496.60 samples/sec Loss 9.7684 LearningRate 0.000782 Epoch: 3 Global Step: 64910 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:15:50,006-Speed 2498.48 samples/sec Loss 9.7700 LearningRate 0.000783 Epoch: 3 Global Step: 64920 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:15:58,160-Speed 2512.24 samples/sec Loss 9.6544 LearningRate 0.000783 Epoch: 3 Global Step: 64930 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:16:06,359-Speed 2498.19 samples/sec Loss 9.7339 LearningRate 0.000783 Epoch: 3 Global Step: 64940 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:16:14,561-Speed 2497.36 samples/sec Loss 9.7119 LearningRate 0.000783 Epoch: 3 Global Step: 64950 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:16:22,761-Speed 2498.18 samples/sec Loss 9.7350 LearningRate 0.000783 Epoch: 3 Global Step: 64960 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:16:30,959-Speed 2498.39 samples/sec Loss 9.6336 LearningRate 0.000783 Epoch: 3 Global Step: 64970 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:16:39,159-Speed 2498.04 samples/sec Loss 9.6702 LearningRate 0.000783 Epoch: 3 Global Step: 64980 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:16:47,307-Speed 2513.79 samples/sec Loss 9.6071 LearningRate 0.000783 Epoch: 3 Global Step: 64990 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:16:55,503-Speed 2499.08 samples/sec Loss 9.6149 LearningRate 0.000784 Epoch: 3 Global Step: 65000 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:17:03,707-Speed 2496.69 samples/sec Loss 9.6127 LearningRate 0.000784 Epoch: 3 Global Step: 65010 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:17:11,910-Speed 2497.35 samples/sec Loss 9.5490 LearningRate 0.000784 Epoch: 3 Global Step: 65020 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:17:20,110-Speed 2498.16 samples/sec Loss 9.5740 LearningRate 0.000784 Epoch: 3 Global Step: 65030 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:17:28,311-Speed 2497.56 samples/sec Loss 9.4627 LearningRate 0.000784 Epoch: 3 Global Step: 65040 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:17:36,460-Speed 2513.92 samples/sec Loss 9.4951 LearningRate 0.000784 Epoch: 3 Global Step: 65050 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:17:44,660-Speed 2497.85 samples/sec Loss 9.5587 LearningRate 0.000784 Epoch: 3 Global Step: 65060 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:17:52,870-Speed 2494.85 samples/sec Loss 9.5809 LearningRate 0.000784 Epoch: 3 Global Step: 65070 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:18:01,081-Speed 2494.75 samples/sec Loss 9.5113 LearningRate 0.000785 Epoch: 3 Global Step: 65080 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:18:09,291-Speed 2494.83 samples/sec Loss 9.4735 LearningRate 0.000785 Epoch: 3 Global Step: 65090 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:18:17,491-Speed 2497.91 samples/sec Loss 9.5800 LearningRate 0.000785 Epoch: 3 Global Step: 65100 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:18:25,640-Speed 2513.89 samples/sec Loss 9.6150 LearningRate 0.000785 Epoch: 3 Global Step: 65110 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:18:33,837-Speed 2498.55 samples/sec Loss 9.6242 LearningRate 0.000785 Epoch: 3 Global Step: 65120 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:18:42,036-Speed 2498.42 samples/sec Loss 9.5274 LearningRate 0.000785 Epoch: 3 Global Step: 65130 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:18:50,238-Speed 2497.46 samples/sec Loss 9.5965 LearningRate 0.000785 Epoch: 3 Global Step: 65140 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:18:58,437-Speed 2498.24 samples/sec Loss 9.5392 LearningRate 0.000785 Epoch: 3 Global Step: 65150 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:19:06,636-Speed 2498.08 samples/sec Loss 9.3853 LearningRate 0.000785 Epoch: 3 Global Step: 65160 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:19:14,785-Speed 2513.58 samples/sec Loss 9.4581 LearningRate 0.000786 Epoch: 3 Global Step: 65170 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:19:22,985-Speed 2498.03 samples/sec Loss 9.4418 LearningRate 0.000786 Epoch: 3 Global Step: 65180 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:19:31,186-Speed 2497.85 samples/sec Loss 9.5433 LearningRate 0.000786 Epoch: 3 Global Step: 65190 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:19:39,386-Speed 2498.11 samples/sec Loss 9.6760 LearningRate 0.000786 Epoch: 3 Global Step: 65200 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:19:47,586-Speed 2497.73 samples/sec Loss 9.6531 LearningRate 0.000786 Epoch: 3 Global Step: 65210 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:19:55,785-Speed 2498.52 samples/sec Loss 9.6577 LearningRate 0.000786 Epoch: 3 Global Step: 65220 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:20:03,930-Speed 2514.60 samples/sec Loss 9.5796 LearningRate 0.000786 Epoch: 3 Global Step: 65230 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:20:12,132-Speed 2497.22 samples/sec Loss 9.7235 LearningRate 0.000786 Epoch: 3 Global Step: 65240 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:20:20,339-Speed 2495.99 samples/sec Loss 9.6009 LearningRate 0.000787 Epoch: 3 Global Step: 65250 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:20:28,554-Speed 2493.46 samples/sec Loss 9.6267 LearningRate 0.000787 Epoch: 3 Global Step: 65260 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:20:36,755-Speed 2497.74 samples/sec Loss 9.5598 LearningRate 0.000787 Epoch: 3 Global Step: 65270 Fp16 Grad Scale: 131072 Required: 175 hours
Training: 2022-07-06 05:20:44,971-Speed 2493.21 samples/sec Loss 9.6047 LearningRate 0.000787 Epoch: 3 Global Step: 65280 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:20:53,121-Speed 2513.05 samples/sec Loss 9.6553 LearningRate 0.000787 Epoch: 3 Global Step: 65290 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:21:01,326-Speed 2496.36 samples/sec Loss 9.5817 LearningRate 0.000787 Epoch: 3 Global Step: 65300 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:21:09,535-Speed 2495.43 samples/sec Loss 9.5626 LearningRate 0.000787 Epoch: 3 Global Step: 65310 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:21:17,737-Speed 2497.06 samples/sec Loss 9.4769 LearningRate 0.000787 Epoch: 3 Global Step: 65320 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:21:25,939-Speed 2497.47 samples/sec Loss 9.4848 LearningRate 0.000788 Epoch: 3 Global Step: 65330 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:21:34,146-Speed 2495.86 samples/sec Loss 9.5383 LearningRate 0.000788 Epoch: 3 Global Step: 65340 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:21:42,301-Speed 2511.96 samples/sec Loss 9.5068 LearningRate 0.000788 Epoch: 3 Global Step: 65350 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:21:50,508-Speed 2495.72 samples/sec Loss 9.4884 LearningRate 0.000788 Epoch: 3 Global Step: 65360 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:21:58,714-Speed 2496.22 samples/sec Loss 9.4767 LearningRate 0.000788 Epoch: 3 Global Step: 65370 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:22:06,920-Speed 2496.23 samples/sec Loss 9.4285 LearningRate 0.000788 Epoch: 3 Global Step: 65380 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:22:15,121-Speed 2497.85 samples/sec Loss 9.5760 LearningRate 0.000788 Epoch: 3 Global Step: 65390 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:22:23,320-Speed 2498.01 samples/sec Loss 9.4851 LearningRate 0.000788 Epoch: 3 Global Step: 65400 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:22:31,468-Speed 2514.63 samples/sec Loss 9.4270 LearningRate 0.000788 Epoch: 3 Global Step: 65410 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:22:39,667-Speed 2498.13 samples/sec Loss 9.5590 LearningRate 0.000789 Epoch: 3 Global Step: 65420 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:22:47,867-Speed 2498.20 samples/sec Loss 9.4301 LearningRate 0.000789 Epoch: 3 Global Step: 65430 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:22:56,073-Speed 2496.13 samples/sec Loss 9.4247 LearningRate 0.000789 Epoch: 3 Global Step: 65440 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:23:04,277-Speed 2496.87 samples/sec Loss 9.4988 LearningRate 0.000789 Epoch: 3 Global Step: 65450 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:23:12,481-Speed 2496.94 samples/sec Loss 9.4390 LearningRate 0.000789 Epoch: 3 Global Step: 65460 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:23:20,627-Speed 2514.31 samples/sec Loss 9.4146 LearningRate 0.000789 Epoch: 3 Global Step: 65470 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:23:28,828-Speed 2497.55 samples/sec Loss 9.4383 LearningRate 0.000789 Epoch: 3 Global Step: 65480 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:23:37,034-Speed 2496.23 samples/sec Loss 9.4149 LearningRate 0.000789 Epoch: 3 Global Step: 65490 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:23:45,237-Speed 2497.03 samples/sec Loss 9.5016 LearningRate 0.000790 Epoch: 3 Global Step: 65500 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:23:53,443-Speed 2496.37 samples/sec Loss 9.4324 LearningRate 0.000790 Epoch: 3 Global Step: 65510 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:24:01,646-Speed 2496.92 samples/sec Loss 9.6798 LearningRate 0.000790 Epoch: 3 Global Step: 65520 Fp16 Grad Scale: 262144 Required: 174 hours
Training: 2022-07-06 05:24:09,796-Speed 2513.50 samples/sec Loss 9.6070 LearningRate 0.000790 Epoch: 3 Global Step: 65530 Fp16 Grad Scale: 262144 Required: 174 hours
Training: 2022-07-06 05:24:17,999-Speed 2496.82 samples/sec Loss 9.4768 LearningRate 0.000790 Epoch: 3 Global Step: 65540 Fp16 Grad Scale: 262144 Required: 174 hours
Training: 2022-07-06 05:24:26,204-Speed 2496.67 samples/sec Loss 9.4780 LearningRate 0.000790 Epoch: 3 Global Step: 65550 Fp16 Grad Scale: 262144 Required: 174 hours
Training: 2022-07-06 05:24:34,362-Speed 2510.56 samples/sec Loss 9.5667 LearningRate 0.000790 Epoch: 3 Global Step: 65560 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:24:42,568-Speed 2496.50 samples/sec Loss 9.5957 LearningRate 0.000790 Epoch: 3 Global Step: 65570 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:24:50,786-Speed 2492.14 samples/sec Loss 9.5262 LearningRate 0.000791 Epoch: 3 Global Step: 65580 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:24:58,933-Speed 2514.35 samples/sec Loss 9.5045 LearningRate 0.000791 Epoch: 3 Global Step: 65590 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:25:07,133-Speed 2497.95 samples/sec Loss 9.5311 LearningRate 0.000791 Epoch: 3 Global Step: 65600 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:25:15,344-Speed 2494.73 samples/sec Loss 9.4449 LearningRate 0.000791 Epoch: 3 Global Step: 65610 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:25:23,543-Speed 2498.21 samples/sec Loss 9.5406 LearningRate 0.000791 Epoch: 3 Global Step: 65620 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:25:31,748-Speed 2496.54 samples/sec Loss 9.5349 LearningRate 0.000791 Epoch: 3 Global Step: 65630 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:25:39,951-Speed 2496.93 samples/sec Loss 9.4601 LearningRate 0.000791 Epoch: 3 Global Step: 65640 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:25:48,098-Speed 2514.33 samples/sec Loss 9.4354 LearningRate 0.000791 Epoch: 3 Global Step: 65650 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:25:56,312-Speed 2493.76 samples/sec Loss 9.4961 LearningRate 0.000791 Epoch: 3 Global Step: 65660 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:26:04,517-Speed 2496.50 samples/sec Loss 9.4888 LearningRate 0.000792 Epoch: 3 Global Step: 65670 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:26:12,717-Speed 2497.93 samples/sec Loss 9.5715 LearningRate 0.000792 Epoch: 3 Global Step: 65680 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:26:20,918-Speed 2497.63 samples/sec Loss 9.4819 LearningRate 0.000792 Epoch: 3 Global Step: 65690 Fp16 Grad Scale: 131072 Required: 174 hours
Training: 2022-07-06 05:26:29,118-Speed 2498.23
samples/sec Loss 9.3830 LearningRate 0.000792 Epoch: 3 Global Step: 65700 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:26:37,264-Speed 2515.13 samples/sec Loss 9.4219 LearningRate 0.000792 Epoch: 3 Global Step: 65710 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:26:45,468-Speed 2497.02 samples/sec Loss 9.3723 LearningRate 0.000792 Epoch: 3 Global Step: 65720 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:26:53,667-Speed 2498.16 samples/sec Loss 9.4601 LearningRate 0.000792 Epoch: 3 Global Step: 65730 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:27:01,867-Speed 2498.03 samples/sec Loss 9.3655 LearningRate 0.000792 Epoch: 3 Global Step: 65740 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:27:10,068-Speed 2497.79 samples/sec Loss 9.3799 LearningRate 0.000793 Epoch: 3 Global Step: 65750 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:27:18,268-Speed 2498.15 samples/sec Loss 9.3885 LearningRate 0.000793 Epoch: 3 Global Step: 65760 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:27:26,413-Speed 2514.67 samples/sec Loss 9.3526 LearningRate 0.000793 Epoch: 3 Global Step: 65770 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:27:34,625-Speed 2494.05 samples/sec Loss 9.4543 LearningRate 0.000793 Epoch: 3 Global Step: 65780 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:27:42,827-Speed 2497.58 samples/sec Loss 9.4466 LearningRate 0.000793 Epoch: 3 Global Step: 65790 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:27:51,025-Speed 2498.31 samples/sec Loss 9.3833 LearningRate 0.000793 Epoch: 3 Global Step: 65800 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:27:59,227-Speed 2497.49 samples/sec Loss 9.5021 LearningRate 0.000793 Epoch: 3 Global Step: 65810 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:28:07,430-Speed 2497.09 
samples/sec Loss 9.3688 LearningRate 0.000793 Epoch: 3 Global Step: 65820 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:28:15,575-Speed 2514.83 samples/sec Loss 9.3868 LearningRate 0.000794 Epoch: 3 Global Step: 65830 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:28:23,780-Speed 2496.54 samples/sec Loss 9.5280 LearningRate 0.000794 Epoch: 3 Global Step: 65840 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:28:31,980-Speed 2497.80 samples/sec Loss 9.5115 LearningRate 0.000794 Epoch: 3 Global Step: 65850 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:28:40,181-Speed 2497.66 samples/sec Loss 9.3964 LearningRate 0.000794 Epoch: 3 Global Step: 65860 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:28:48,382-Speed 2497.84 samples/sec Loss 9.5216 LearningRate 0.000794 Epoch: 3 Global Step: 65870 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:28:56,584-Speed 2497.52 samples/sec Loss 9.4501 LearningRate 0.000794 Epoch: 3 Global Step: 65880 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:29:04,734-Speed 2513.15 samples/sec Loss 9.4984 LearningRate 0.000794 Epoch: 3 Global Step: 65890 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:29:12,936-Speed 2497.53 samples/sec Loss 9.4942 LearningRate 0.000794 Epoch: 3 Global Step: 65900 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:29:21,142-Speed 2496.22 samples/sec Loss 9.5156 LearningRate 0.000795 Epoch: 3 Global Step: 65910 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:29:29,342-Speed 2498.07 samples/sec Loss 9.5611 LearningRate 0.000795 Epoch: 3 Global Step: 65920 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:29:37,551-Speed 2495.28 samples/sec Loss 9.5956 LearningRate 0.000795 Epoch: 3 Global Step: 65930 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:29:45,755-Speed 2496.55 
samples/sec Loss 9.5048 LearningRate 0.000795 Epoch: 3 Global Step: 65940 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:29:53,906-Speed 2513.09 samples/sec Loss 9.4378 LearningRate 0.000795 Epoch: 3 Global Step: 65950 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:30:02,109-Speed 2497.06 samples/sec Loss 9.5053 LearningRate 0.000795 Epoch: 3 Global Step: 65960 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:30:10,308-Speed 2498.32 samples/sec Loss 9.4173 LearningRate 0.000795 Epoch: 3 Global Step: 65970 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:30:18,508-Speed 2498.09 samples/sec Loss 9.3909 LearningRate 0.000795 Epoch: 3 Global Step: 65980 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:30:26,713-Speed 2497.14 samples/sec Loss 9.4580 LearningRate 0.000795 Epoch: 3 Global Step: 65990 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:30:34,915-Speed 2497.44 samples/sec Loss 9.4594 LearningRate 0.000796 Epoch: 3 Global Step: 66000 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:30:43,061-Speed 2514.16 samples/sec Loss 9.4370 LearningRate 0.000796 Epoch: 3 Global Step: 66010 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:30:51,266-Speed 2496.38 samples/sec Loss 9.4266 LearningRate 0.000796 Epoch: 3 Global Step: 66020 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:30:59,465-Speed 2498.52 samples/sec Loss 9.4442 LearningRate 0.000796 Epoch: 3 Global Step: 66030 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:31:07,664-Speed 2498.05 samples/sec Loss 9.3655 LearningRate 0.000796 Epoch: 3 Global Step: 66040 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:31:15,868-Speed 2496.76 samples/sec Loss 9.3906 LearningRate 0.000796 Epoch: 3 Global Step: 66050 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:31:24,069-Speed 2497.81 
samples/sec Loss 9.4171 LearningRate 0.000796 Epoch: 3 Global Step: 66060 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:31:32,222-Speed 2512.15 samples/sec Loss 9.4142 LearningRate 0.000796 Epoch: 3 Global Step: 66070 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:31:40,423-Speed 2497.62 samples/sec Loss 9.4181 LearningRate 0.000797 Epoch: 3 Global Step: 66080 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:31:48,621-Speed 2498.70 samples/sec Loss 9.4753 LearningRate 0.000797 Epoch: 3 Global Step: 66090 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:31:56,828-Speed 2495.74 samples/sec Loss 9.4218 LearningRate 0.000797 Epoch: 3 Global Step: 66100 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:32:05,029-Speed 2497.74 samples/sec Loss 9.3984 LearningRate 0.000797 Epoch: 3 Global Step: 66110 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:32:13,229-Speed 2498.15 samples/sec Loss 9.3989 LearningRate 0.000797 Epoch: 3 Global Step: 66120 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:32:21,378-Speed 2513.70 samples/sec Loss 9.3888 LearningRate 0.000797 Epoch: 3 Global Step: 66130 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:32:29,600-Speed 2491.24 samples/sec Loss 9.3967 LearningRate 0.000797 Epoch: 3 Global Step: 66140 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:32:37,804-Speed 2496.70 samples/sec Loss 9.3941 LearningRate 0.000797 Epoch: 3 Global Step: 66150 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:32:46,003-Speed 2498.26 samples/sec Loss 9.2670 LearningRate 0.000798 Epoch: 3 Global Step: 66160 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:32:54,203-Speed 2498.14 samples/sec Loss 9.4188 LearningRate 0.000798 Epoch: 3 Global Step: 66170 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:33:02,406-Speed 2496.91 
samples/sec Loss 9.2923 LearningRate 0.000798 Epoch: 3 Global Step: 66180 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:33:10,553-Speed 2514.42 samples/sec Loss 9.5960 LearningRate 0.000798 Epoch: 3 Global Step: 66190 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:33:18,764-Speed 2494.35 samples/sec Loss 9.5926 LearningRate 0.000798 Epoch: 3 Global Step: 66200 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:33:26,964-Speed 2497.98 samples/sec Loss 9.5937 LearningRate 0.000798 Epoch: 3 Global Step: 66210 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:33:35,174-Speed 2494.86 samples/sec Loss 9.5640 LearningRate 0.000798 Epoch: 3 Global Step: 66220 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:33:43,374-Speed 2498.01 samples/sec Loss 9.5197 LearningRate 0.000798 Epoch: 3 Global Step: 66230 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:33:51,572-Speed 2498.38 samples/sec Loss 9.4057 LearningRate 0.000798 Epoch: 3 Global Step: 66240 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:33:59,731-Speed 2510.70 samples/sec Loss 9.4355 LearningRate 0.000799 Epoch: 3 Global Step: 66250 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:34:07,943-Speed 2494.33 samples/sec Loss 9.3483 LearningRate 0.000799 Epoch: 3 Global Step: 66260 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:34:16,152-Speed 2495.37 samples/sec Loss 9.4000 LearningRate 0.000799 Epoch: 3 Global Step: 66270 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:34:24,350-Speed 2498.42 samples/sec Loss 9.4444 LearningRate 0.000799 Epoch: 3 Global Step: 66280 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:34:32,554-Speed 2496.74 samples/sec Loss 9.4581 LearningRate 0.000799 Epoch: 3 Global Step: 66290 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:34:40,757-Speed 2497.29 
samples/sec Loss 9.5299 LearningRate 0.000799 Epoch: 3 Global Step: 66300 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:34:48,906-Speed 2513.35 samples/sec Loss 9.4820 LearningRate 0.000799 Epoch: 3 Global Step: 66310 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:34:57,104-Speed 2498.85 samples/sec Loss 9.4401 LearningRate 0.000799 Epoch: 3 Global Step: 66320 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:35:05,304-Speed 2497.95 samples/sec Loss 9.5345 LearningRate 0.000800 Epoch: 3 Global Step: 66330 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:35:13,506-Speed 2497.49 samples/sec Loss 9.4146 LearningRate 0.000800 Epoch: 3 Global Step: 66340 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:35:21,705-Speed 2498.05 samples/sec Loss 9.6095 LearningRate 0.000800 Epoch: 3 Global Step: 66350 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:35:29,906-Speed 2497.96 samples/sec Loss 9.5048 LearningRate 0.000800 Epoch: 3 Global Step: 66360 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:35:38,055-Speed 2513.83 samples/sec Loss 9.6186 LearningRate 0.000800 Epoch: 3 Global Step: 66370 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:35:46,257-Speed 2497.66 samples/sec Loss 9.3759 LearningRate 0.000800 Epoch: 3 Global Step: 66380 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:35:54,455-Speed 2498.32 samples/sec Loss 9.3582 LearningRate 0.000800 Epoch: 3 Global Step: 66390 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:36:02,657-Speed 2497.44 samples/sec Loss 9.3994 LearningRate 0.000800 Epoch: 3 Global Step: 66400 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:36:10,858-Speed 2497.61 samples/sec Loss 9.4507 LearningRate 0.000801 Epoch: 3 Global Step: 66410 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:36:19,058-Speed 2498.08 
samples/sec Loss 9.3344 LearningRate 0.000801 Epoch: 3 Global Step: 66420 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:36:27,205-Speed 2513.96 samples/sec Loss 9.2903 LearningRate 0.000801 Epoch: 3 Global Step: 66430 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:36:35,407-Speed 2498.03 samples/sec Loss 9.3200 LearningRate 0.000801 Epoch: 3 Global Step: 66440 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:36:43,607-Speed 2498.18 samples/sec Loss 9.3001 LearningRate 0.000801 Epoch: 3 Global Step: 66450 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:36:51,811-Speed 2496.99 samples/sec Loss 9.3725 LearningRate 0.000801 Epoch: 3 Global Step: 66460 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:37:00,017-Speed 2496.02 samples/sec Loss 9.3601 LearningRate 0.000801 Epoch: 3 Global Step: 66470 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:37:08,222-Speed 2496.55 samples/sec Loss 9.4068 LearningRate 0.000801 Epoch: 3 Global Step: 66480 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:37:16,369-Speed 2514.14 samples/sec Loss 9.3789 LearningRate 0.000801 Epoch: 3 Global Step: 66490 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:37:24,572-Speed 2497.42 samples/sec Loss 9.4712 LearningRate 0.000802 Epoch: 3 Global Step: 66500 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:37:32,791-Speed 2492.03 samples/sec Loss 9.4034 LearningRate 0.000802 Epoch: 3 Global Step: 66510 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:37:40,999-Speed 2495.74 samples/sec Loss 9.3627 LearningRate 0.000802 Epoch: 3 Global Step: 66520 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:37:49,203-Speed 2496.74 samples/sec Loss 9.3256 LearningRate 0.000802 Epoch: 3 Global Step: 66530 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:37:57,403-Speed 2498.01 
samples/sec Loss 9.3729 LearningRate 0.000802 Epoch: 3 Global Step: 66540 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:38:05,550-Speed 2514.15 samples/sec Loss 9.4153 LearningRate 0.000802 Epoch: 3 Global Step: 66550 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:38:13,749-Speed 2498.26 samples/sec Loss 9.4615 LearningRate 0.000802 Epoch: 3 Global Step: 66560 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:38:21,974-Speed 2490.82 samples/sec Loss 9.3998 LearningRate 0.000802 Epoch: 3 Global Step: 66570 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:38:30,172-Speed 2498.53 samples/sec Loss 9.4465 LearningRate 0.000803 Epoch: 3 Global Step: 66580 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:38:38,373-Speed 2497.63 samples/sec Loss 9.4944 LearningRate 0.000803 Epoch: 3 Global Step: 66590 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:38:46,574-Speed 2497.75 samples/sec Loss 9.4192 LearningRate 0.000803 Epoch: 3 Global Step: 66600 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:38:54,723-Speed 2514.27 samples/sec Loss 9.4656 LearningRate 0.000803 Epoch: 3 Global Step: 66610 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:39:02,926-Speed 2497.07 samples/sec Loss 9.3629 LearningRate 0.000803 Epoch: 3 Global Step: 66620 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:39:11,128-Speed 2497.31 samples/sec Loss 9.2720 LearningRate 0.000803 Epoch: 3 Global Step: 66630 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:39:19,328-Speed 2498.01 samples/sec Loss 9.3837 LearningRate 0.000803 Epoch: 3 Global Step: 66640 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:39:27,527-Speed 2498.27 samples/sec Loss 9.3542 LearningRate 0.000803 Epoch: 3 Global Step: 66650 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:39:35,727-Speed 2497.87 
samples/sec Loss 9.3899 LearningRate 0.000804 Epoch: 3 Global Step: 66660 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:39:43,876-Speed 2513.74 samples/sec Loss 9.3217 LearningRate 0.000804 Epoch: 3 Global Step: 66670 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:39:52,074-Speed 2498.75 samples/sec Loss 9.2629 LearningRate 0.000804 Epoch: 3 Global Step: 66680 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:40:00,273-Speed 2498.27 samples/sec Loss 9.4047 LearningRate 0.000804 Epoch: 3 Global Step: 66690 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:40:08,502-Speed 2489.24 samples/sec Loss 9.4259 LearningRate 0.000804 Epoch: 3 Global Step: 66700 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:40:16,706-Speed 2496.70 samples/sec Loss 9.3746 LearningRate 0.000804 Epoch: 3 Global Step: 66710 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:40:24,906-Speed 2498.05 samples/sec Loss 9.3353 LearningRate 0.000804 Epoch: 3 Global Step: 66720 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:40:33,052-Speed 2514.48 samples/sec Loss 9.2445 LearningRate 0.000804 Epoch: 3 Global Step: 66730 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:40:41,263-Speed 2494.67 samples/sec Loss 9.3866 LearningRate 0.000805 Epoch: 3 Global Step: 66740 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:40:49,465-Speed 2497.31 samples/sec Loss 9.3371 LearningRate 0.000805 Epoch: 3 Global Step: 66750 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:40:57,669-Speed 2496.73 samples/sec Loss 9.4023 LearningRate 0.000805 Epoch: 3 Global Step: 66760 Fp16 Grad Scale: 262144 Required: 174 hours Training: 2022-07-06 05:41:05,830-Speed 2510.06 samples/sec Loss 9.4111 LearningRate 0.000805 Epoch: 3 Global Step: 66770 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:41:14,037-Speed 2495.73 
samples/sec Loss 9.4637 LearningRate 0.000805 Epoch: 3 Global Step: 66780 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:41:22,182-Speed 2514.86 samples/sec Loss 9.4387 LearningRate 0.000805 Epoch: 3 Global Step: 66790 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:41:30,388-Speed 2496.65 samples/sec Loss 9.4015 LearningRate 0.000805 Epoch: 3 Global Step: 66800 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:41:38,587-Speed 2498.33 samples/sec Loss 9.3580 LearningRate 0.000805 Epoch: 3 Global Step: 66810 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:41:46,790-Speed 2497.12 samples/sec Loss 9.3563 LearningRate 0.000805 Epoch: 3 Global Step: 66820 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:41:54,991-Speed 2497.71 samples/sec Loss 9.2972 LearningRate 0.000806 Epoch: 3 Global Step: 66830 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:42:03,186-Speed 2499.34 samples/sec Loss 9.2921 LearningRate 0.000806 Epoch: 3 Global Step: 66840 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:42:11,332-Speed 2514.69 samples/sec Loss 9.3243 LearningRate 0.000806 Epoch: 3 Global Step: 66850 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:42:19,530-Speed 2498.53 samples/sec Loss 9.4222 LearningRate 0.000806 Epoch: 3 Global Step: 66860 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:42:27,726-Speed 2499.24 samples/sec Loss 9.4311 LearningRate 0.000806 Epoch: 3 Global Step: 66870 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:42:35,927-Speed 2497.79 samples/sec Loss 9.4024 LearningRate 0.000806 Epoch: 3 Global Step: 66880 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:42:44,128-Speed 2497.48 samples/sec Loss 9.3736 LearningRate 0.000806 Epoch: 3 Global Step: 66890 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:42:52,326-Speed 2498.77 
samples/sec Loss 9.4336 LearningRate 0.000806 Epoch: 3 Global Step: 66900 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:43:00,473-Speed 2514.59 samples/sec Loss 9.4908 LearningRate 0.000807 Epoch: 3 Global Step: 66910 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:43:08,676-Speed 2496.99 samples/sec Loss 9.4928 LearningRate 0.000807 Epoch: 3 Global Step: 66920 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:43:16,876-Speed 2498.13 samples/sec Loss 9.4233 LearningRate 0.000807 Epoch: 3 Global Step: 66930 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:43:25,077-Speed 2497.62 samples/sec Loss 9.3808 LearningRate 0.000807 Epoch: 3 Global Step: 66940 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:43:33,272-Speed 2499.61 samples/sec Loss 9.4364 LearningRate 0.000807 Epoch: 3 Global Step: 66950 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:43:41,473-Speed 2497.68 samples/sec Loss 9.2130 LearningRate 0.000807 Epoch: 3 Global Step: 66960 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:43:49,618-Speed 2514.94 samples/sec Loss 9.3238 LearningRate 0.000807 Epoch: 3 Global Step: 66970 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:43:57,816-Speed 2498.62 samples/sec Loss 9.3580 LearningRate 0.000807 Epoch: 3 Global Step: 66980 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:44:06,024-Speed 2495.47 samples/sec Loss 9.2852 LearningRate 0.000808 Epoch: 3 Global Step: 66990 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:44:14,221-Speed 2498.85 samples/sec Loss 9.3240 LearningRate 0.000808 Epoch: 3 Global Step: 67000 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:44:22,426-Speed 2496.45 samples/sec Loss 9.3500 LearningRate 0.000808 Epoch: 3 Global Step: 67010 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:44:30,633-Speed 2496.32 
samples/sec Loss 9.3265 LearningRate 0.000808 Epoch: 3 Global Step: 67020 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:44:38,786-Speed 2512.57 samples/sec Loss 9.3369 LearningRate 0.000808 Epoch: 3 Global Step: 67030 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:44:46,993-Speed 2495.62 samples/sec Loss 9.3513 LearningRate 0.000808 Epoch: 3 Global Step: 67040 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:44:55,205-Speed 2494.33 samples/sec Loss 9.2994 LearningRate 0.000808 Epoch: 3 Global Step: 67050 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:45:03,407-Speed 2497.46 samples/sec Loss 9.2762 LearningRate 0.000808 Epoch: 3 Global Step: 67060 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:45:11,606-Speed 2498.27 samples/sec Loss 9.2777 LearningRate 0.000808 Epoch: 3 Global Step: 67070 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:45:19,810-Speed 2496.73 samples/sec Loss 9.2861 LearningRate 0.000809 Epoch: 3 Global Step: 67080 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:45:28,046-Speed 2486.79 samples/sec Loss 9.3007 LearningRate 0.000809 Epoch: 3 Global Step: 67090 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:45:36,247-Speed 2497.99 samples/sec Loss 9.2846 LearningRate 0.000809 Epoch: 3 Global Step: 67100 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:45:44,462-Speed 2493.44 samples/sec Loss 9.1728 LearningRate 0.000809 Epoch: 3 Global Step: 67110 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:45:52,665-Speed 2496.99 samples/sec Loss 9.2412 LearningRate 0.000809 Epoch: 3 Global Step: 67120 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:46:00,863-Speed 2498.59 samples/sec Loss 9.3476 LearningRate 0.000809 Epoch: 3 Global Step: 67130 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:46:09,067-Speed 2496.79 
samples/sec Loss 9.4702 LearningRate 0.000809 Epoch: 3 Global Step: 67140 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:46:17,213-Speed 2514.53 samples/sec Loss 9.2851 LearningRate 0.000809 Epoch: 3 Global Step: 67150 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:46:25,414-Speed 2497.90 samples/sec Loss 9.4613 LearningRate 0.000810 Epoch: 3 Global Step: 67160 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:46:33,614-Speed 2498.20 samples/sec Loss 9.4803 LearningRate 0.000810 Epoch: 3 Global Step: 67170 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:46:41,812-Speed 2498.56 samples/sec Loss 9.4869 LearningRate 0.000810 Epoch: 3 Global Step: 67180 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:46:50,015-Speed 2497.13 samples/sec Loss 9.3851 LearningRate 0.000810 Epoch: 3 Global Step: 67190 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:46:58,223-Speed 2495.44 samples/sec Loss 9.4495 LearningRate 0.000810 Epoch: 3 Global Step: 67200 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:47:06,370-Speed 2514.17 samples/sec Loss 9.3226 LearningRate 0.000810 Epoch: 3 Global Step: 67210 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:47:14,570-Speed 2497.86 samples/sec Loss 9.3790 LearningRate 0.000810 Epoch: 3 Global Step: 67220 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:47:22,770-Speed 2498.05 samples/sec Loss 9.3761 LearningRate 0.000810 Epoch: 3 Global Step: 67230 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:47:30,972-Speed 2497.60 samples/sec Loss 9.3735 LearningRate 0.000811 Epoch: 3 Global Step: 67240 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:47:39,174-Speed 2497.18 samples/sec Loss 9.3990 LearningRate 0.000811 Epoch: 3 Global Step: 67250 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:47:47,373-Speed 2498.43 
samples/sec Loss 9.3901 LearningRate 0.000811 Epoch: 3 Global Step: 67260 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:47:55,520-Speed 2514.14 samples/sec Loss 9.3292 LearningRate 0.000811 Epoch: 3 Global Step: 67270 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:48:03,721-Speed 2497.73 samples/sec Loss 9.2845 LearningRate 0.000811 Epoch: 3 Global Step: 67280 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:48:11,923-Speed 2497.44 samples/sec Loss 9.2731 LearningRate 0.000811 Epoch: 3 Global Step: 67290 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:48:20,123-Speed 2498.00 samples/sec Loss 9.3499 LearningRate 0.000811 Epoch: 3 Global Step: 67300 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:48:28,323-Speed 2497.85 samples/sec Loss 9.3308 LearningRate 0.000811 Epoch: 3 Global Step: 67310 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:48:36,525-Speed 2497.52 samples/sec Loss 9.3233 LearningRate 0.000812 Epoch: 3 Global Step: 67320 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:48:44,687-Speed 2509.54 samples/sec Loss 9.1895 LearningRate 0.000812 Epoch: 3 Global Step: 67330 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:48:52,886-Speed 2498.00 samples/sec Loss 9.1804 LearningRate 0.000812 Epoch: 3 Global Step: 67340 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:49:01,094-Speed 2496.49 samples/sec Loss 9.2973 LearningRate 0.000812 Epoch: 3 Global Step: 67350 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:49:09,298-Speed 2496.95 samples/sec Loss 9.2759 LearningRate 0.000812 Epoch: 3 Global Step: 67360 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:49:17,498-Speed 2497.91 samples/sec Loss 9.2262 LearningRate 0.000812 Epoch: 3 Global Step: 67370 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:49:25,699-Speed 2497.74 
samples/sec Loss 9.3407 LearningRate 0.000812 Epoch: 3 Global Step: 67380 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:49:33,844-Speed 2514.80 samples/sec Loss 9.2863 LearningRate 0.000812 Epoch: 3 Global Step: 67390 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:49:42,047-Speed 2497.68 samples/sec Loss 9.2936 LearningRate 0.000812 Epoch: 3 Global Step: 67400 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:49:50,257-Speed 2494.84 samples/sec Loss 9.3789 LearningRate 0.000813 Epoch: 3 Global Step: 67410 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:49:58,474-Speed 2492.49 samples/sec Loss 9.3903 LearningRate 0.000813 Epoch: 3 Global Step: 67420 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:50:06,677-Speed 2497.12 samples/sec Loss 9.3498 LearningRate 0.000813 Epoch: 3 Global Step: 67430 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:50:14,888-Speed 2494.75 samples/sec Loss 9.4461 LearningRate 0.000813 Epoch: 3 Global Step: 67440 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:50:23,037-Speed 2513.37 samples/sec Loss 9.2804 LearningRate 0.000813 Epoch: 3 Global Step: 67450 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:50:31,252-Speed 2493.39 samples/sec Loss 9.4656 LearningRate 0.000813 Epoch: 3 Global Step: 67460 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:50:39,454-Speed 2497.50 samples/sec Loss 9.5020 LearningRate 0.000813 Epoch: 3 Global Step: 67470 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:50:47,659-Speed 2496.40 samples/sec Loss 9.2936 LearningRate 0.000813 Epoch: 3 Global Step: 67480 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:50:55,861-Speed 2497.75 samples/sec Loss 9.2910 LearningRate 0.000814 Epoch: 3 Global Step: 67490 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:51:04,067-Speed 2496.21 
samples/sec Loss 9.4804 LearningRate 0.000814 Epoch: 3 Global Step: 67500 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:51:12,216-Speed 2513.48 samples/sec Loss 9.3089 LearningRate 0.000814 Epoch: 3 Global Step: 67510 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:51:20,419-Speed 2496.86 samples/sec Loss 9.4087 LearningRate 0.000814 Epoch: 3 Global Step: 67520 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:51:28,621-Speed 2497.42 samples/sec Loss 9.3834 LearningRate 0.000814 Epoch: 3 Global Step: 67530 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:51:36,825-Speed 2496.86 samples/sec Loss 9.3790 LearningRate 0.000814 Epoch: 3 Global Step: 67540 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:51:45,025-Speed 2497.82 samples/sec Loss 9.4711 LearningRate 0.000814 Epoch: 3 Global Step: 67550 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:51:53,225-Speed 2497.92 samples/sec Loss 9.3099 LearningRate 0.000814 Epoch: 3 Global Step: 67560 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:52:01,373-Speed 2514.09 samples/sec Loss 9.4012 LearningRate 0.000815 Epoch: 3 Global Step: 67570 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:52:09,571-Speed 2498.55 samples/sec Loss 9.2944 LearningRate 0.000815 Epoch: 3 Global Step: 67580 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:52:17,784-Speed 2494.12 samples/sec Loss 9.3344 LearningRate 0.000815 Epoch: 3 Global Step: 67590 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:52:25,984-Speed 2497.80 samples/sec Loss 9.3142 LearningRate 0.000815 Epoch: 3 Global Step: 67600 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:52:34,190-Speed 2496.43 samples/sec Loss 9.3430 LearningRate 0.000815 Epoch: 3 Global Step: 67610 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:52:42,408-Speed 2492.28 
samples/sec Loss 9.3354 LearningRate 0.000815 Epoch: 3 Global Step: 67620 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:52:50,556-Speed 2513.99 samples/sec Loss 9.2041 LearningRate 0.000815 Epoch: 3 Global Step: 67630 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:52:58,766-Speed 2494.94 samples/sec Loss 9.2330 LearningRate 0.000815 Epoch: 3 Global Step: 67640 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:53:06,968-Speed 2497.37 samples/sec Loss 9.3158 LearningRate 0.000815 Epoch: 3 Global Step: 67650 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:53:15,176-Speed 2495.37 samples/sec Loss 9.3243 LearningRate 0.000816 Epoch: 3 Global Step: 67660 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:53:23,389-Speed 2494.04 samples/sec Loss 9.3530 LearningRate 0.000816 Epoch: 3 Global Step: 67670 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:53:31,592-Speed 2497.04 samples/sec Loss 9.3558 LearningRate 0.000816 Epoch: 3 Global Step: 67680 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:53:39,742-Speed 2513.03 samples/sec Loss 9.3818 LearningRate 0.000816 Epoch: 3 Global Step: 67690 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:53:47,945-Speed 2497.26 samples/sec Loss 9.3299 LearningRate 0.000816 Epoch: 3 Global Step: 67700 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:53:56,147-Speed 2497.28 samples/sec Loss 9.2937 LearningRate 0.000816 Epoch: 3 Global Step: 67710 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:54:04,348-Speed 2497.81 samples/sec Loss 9.2997 LearningRate 0.000816 Epoch: 3 Global Step: 67720 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:54:12,552-Speed 2496.46 samples/sec Loss 9.2256 LearningRate 0.000816 Epoch: 3 Global Step: 67730 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:54:20,753-Speed 2497.89 
samples/sec Loss 9.2707 LearningRate 0.000817 Epoch: 3 Global Step: 67740 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:54:28,906-Speed 2512.26 samples/sec Loss 9.1715 LearningRate 0.000817 Epoch: 3 Global Step: 67750 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:54:37,107-Speed 2497.78 samples/sec Loss 9.2519 LearningRate 0.000817 Epoch: 3 Global Step: 67760 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:54:45,309-Speed 2497.26 samples/sec Loss 9.3647 LearningRate 0.000817 Epoch: 3 Global Step: 67770 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:54:53,515-Speed 2496.50 samples/sec Loss 9.2870 LearningRate 0.000817 Epoch: 3 Global Step: 67780 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:55:01,716-Speed 2497.40 samples/sec Loss 9.3485 LearningRate 0.000817 Epoch: 3 Global Step: 67790 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:55:09,924-Speed 2495.68 samples/sec Loss 9.2711 LearningRate 0.000817 Epoch: 3 Global Step: 67800 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:55:18,073-Speed 2513.43 samples/sec Loss 9.2965 LearningRate 0.000817 Epoch: 3 Global Step: 67810 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:55:26,277-Speed 2496.75 samples/sec Loss 9.2759 LearningRate 0.000818 Epoch: 3 Global Step: 67820 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:55:34,486-Speed 2495.21 samples/sec Loss 9.2436 LearningRate 0.000818 Epoch: 3 Global Step: 67830 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:55:42,717-Speed 2488.35 samples/sec Loss 9.2196 LearningRate 0.000818 Epoch: 3 Global Step: 67840 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:55:50,938-Speed 2491.87 samples/sec Loss 9.2973 LearningRate 0.000818 Epoch: 3 Global Step: 67850 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:55:59,144-Speed 2496.16 
samples/sec Loss 9.1943 LearningRate 0.000818 Epoch: 3 Global Step: 67860 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:56:07,295-Speed 2512.92 samples/sec Loss 9.2234 LearningRate 0.000818 Epoch: 3 Global Step: 67870 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:56:15,500-Speed 2496.58 samples/sec Loss 9.0880 LearningRate 0.000818 Epoch: 3 Global Step: 67880 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:56:23,703-Speed 2496.99 samples/sec Loss 9.1636 LearningRate 0.000818 Epoch: 3 Global Step: 67890 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:56:31,907-Speed 2496.79 samples/sec Loss 9.2486 LearningRate 0.000818 Epoch: 3 Global Step: 67900 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:56:40,108-Speed 2497.73 samples/sec Loss 9.0121 LearningRate 0.000819 Epoch: 3 Global Step: 67910 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:56:48,312-Speed 2496.65 samples/sec Loss 9.1052 LearningRate 0.000819 Epoch: 3 Global Step: 67920 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:56:56,465-Speed 2512.41 samples/sec Loss 9.2623 LearningRate 0.000819 Epoch: 3 Global Step: 67930 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:57:04,678-Speed 2493.99 samples/sec Loss 9.2021 LearningRate 0.000819 Epoch: 3 Global Step: 67940 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:57:12,879-Speed 2497.54 samples/sec Loss 9.1371 LearningRate 0.000819 Epoch: 3 Global Step: 67950 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:57:21,087-Speed 2495.38 samples/sec Loss 9.2722 LearningRate 0.000819 Epoch: 3 Global Step: 67960 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:57:29,291-Speed 2496.91 samples/sec Loss 9.2212 LearningRate 0.000819 Epoch: 3 Global Step: 67970 Fp16 Grad Scale: 262144 Required: 174 hours Training: 2022-07-06 05:57:37,449-Speed 2510.66 
samples/sec Loss 9.0677 LearningRate 0.000819 Epoch: 3 Global Step: 67980 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:57:45,597-Speed 2513.86 samples/sec Loss 9.1414 LearningRate 0.000820 Epoch: 3 Global Step: 67990 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:57:53,799-Speed 2497.62 samples/sec Loss 9.0966 LearningRate 0.000820 Epoch: 3 Global Step: 68000 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:58:02,014-Speed 2493.31 samples/sec Loss 9.1449 LearningRate 0.000820 Epoch: 3 Global Step: 68010 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:58:10,221-Speed 2495.85 samples/sec Loss 9.1662 LearningRate 0.000820 Epoch: 3 Global Step: 68020 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:58:18,425-Speed 2496.89 samples/sec Loss 9.1371 LearningRate 0.000820 Epoch: 3 Global Step: 68030 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:58:26,627-Speed 2497.62 samples/sec Loss 9.1573 LearningRate 0.000820 Epoch: 3 Global Step: 68040 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:58:34,775-Speed 2514.21 samples/sec Loss 9.2892 LearningRate 0.000820 Epoch: 3 Global Step: 68050 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:58:42,991-Speed 2493.07 samples/sec Loss 9.1856 LearningRate 0.000820 Epoch: 3 Global Step: 68060 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:58:51,191-Speed 2497.90 samples/sec Loss 9.3208 LearningRate 0.000821 Epoch: 3 Global Step: 68070 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:58:59,397-Speed 2496.16 samples/sec Loss 9.2757 LearningRate 0.000821 Epoch: 3 Global Step: 68080 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:59:07,601-Speed 2496.70 samples/sec Loss 9.1921 LearningRate 0.000821 Epoch: 3 Global Step: 68090 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:59:15,810-Speed 2495.27 
samples/sec Loss 9.1557 LearningRate 0.000821 Epoch: 3 Global Step: 68100 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:59:23,985-Speed 2505.59 samples/sec Loss 9.2155 LearningRate 0.000821 Epoch: 3 Global Step: 68110 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:59:32,189-Speed 2496.69 samples/sec Loss 9.1806 LearningRate 0.000821 Epoch: 3 Global Step: 68120 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:59:40,391-Speed 2497.50 samples/sec Loss 9.2513 LearningRate 0.000821 Epoch: 3 Global Step: 68130 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:59:48,597-Speed 2496.31 samples/sec Loss 9.1555 LearningRate 0.000821 Epoch: 3 Global Step: 68140 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 05:59:56,800-Speed 2496.74 samples/sec Loss 9.1218 LearningRate 0.000822 Epoch: 3 Global Step: 68150 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:00:05,003-Speed 2497.15 samples/sec Loss 9.1115 LearningRate 0.000822 Epoch: 3 Global Step: 68160 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:00:13,152-Speed 2513.38 samples/sec Loss 9.2495 LearningRate 0.000822 Epoch: 3 Global Step: 68170 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:00:21,355-Speed 2496.99 samples/sec Loss 9.1879 LearningRate 0.000822 Epoch: 3 Global Step: 68180 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:00:29,557-Speed 2497.42 samples/sec Loss 9.3252 LearningRate 0.000822 Epoch: 3 Global Step: 68190 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:00:37,760-Speed 2497.09 samples/sec Loss 9.3040 LearningRate 0.000822 Epoch: 3 Global Step: 68200 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:00:45,962-Speed 2497.32 samples/sec Loss 9.1678 LearningRate 0.000822 Epoch: 3 Global Step: 68210 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:00:54,163-Speed 2497.53 
samples/sec Loss 9.2350 LearningRate 0.000822 Epoch: 3 Global Step: 68220 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:01:02,311-Speed 2513.84 samples/sec Loss 9.2025 LearningRate 0.000822 Epoch: 3 Global Step: 68230 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:01:10,513-Speed 2497.60 samples/sec Loss 9.1903 LearningRate 0.000823 Epoch: 3 Global Step: 68240 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:01:18,723-Speed 2495.00 samples/sec Loss 9.1698 LearningRate 0.000823 Epoch: 3 Global Step: 68250 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:01:26,921-Speed 2498.36 samples/sec Loss 9.2458 LearningRate 0.000823 Epoch: 3 Global Step: 68260 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:01:35,123-Speed 2497.86 samples/sec Loss 9.1802 LearningRate 0.000823 Epoch: 3 Global Step: 68270 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:01:43,325-Speed 2497.61 samples/sec Loss 9.2210 LearningRate 0.000823 Epoch: 3 Global Step: 68280 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:01:51,469-Speed 2515.07 samples/sec Loss 9.2229 LearningRate 0.000823 Epoch: 3 Global Step: 68290 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:01:59,670-Speed 2497.90 samples/sec Loss 9.2329 LearningRate 0.000823 Epoch: 3 Global Step: 68300 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:02:07,870-Speed 2497.80 samples/sec Loss 9.3045 LearningRate 0.000823 Epoch: 3 Global Step: 68310 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:02:16,069-Speed 2498.26 samples/sec Loss 9.3630 LearningRate 0.000824 Epoch: 3 Global Step: 68320 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:02:24,274-Speed 2496.35 samples/sec Loss 9.1732 LearningRate 0.000824 Epoch: 3 Global Step: 68330 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:02:32,478-Speed 2496.80 
samples/sec Loss 9.3104 LearningRate 0.000824 Epoch: 3 Global Step: 68340 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:02:40,639-Speed 2510.05 samples/sec Loss 9.2671 LearningRate 0.000824 Epoch: 3 Global Step: 68350 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:02:48,842-Speed 2496.90 samples/sec Loss 9.2339 LearningRate 0.000824 Epoch: 3 Global Step: 68360 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:02:57,044-Speed 2497.36 samples/sec Loss 9.2332 LearningRate 0.000824 Epoch: 3 Global Step: 68370 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:03:05,250-Speed 2495.95 samples/sec Loss 9.1409 LearningRate 0.000824 Epoch: 3 Global Step: 68380 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:03:13,455-Speed 2496.62 samples/sec Loss 9.1598 LearningRate 0.000824 Epoch: 3 Global Step: 68390 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:03:21,751-Speed 2499.69 samples/sec Loss 9.2315 LearningRate 0.000825 Epoch: 3 Global Step: 68400 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:03:29,899-Speed 2513.69 samples/sec Loss 9.2533 LearningRate 0.000825 Epoch: 3 Global Step: 68410 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:03:39,797-Speed 2240.83 samples/sec Loss 9.2006 LearningRate 0.000825 Epoch: 3 Global Step: 68420 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:03:47,995-Speed 2498.66 samples/sec Loss 9.2258 LearningRate 0.000825 Epoch: 3 Global Step: 68430 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:03:56,190-Speed 2499.52 samples/sec Loss 9.1806 LearningRate 0.000825 Epoch: 3 Global Step: 68440 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:04:06,457-Speed 1995.11 samples/sec Loss 9.1372 LearningRate 0.000825 Epoch: 3 Global Step: 68450 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:04:14,880-Speed 2431.66 
samples/sec Loss 9.2418 LearningRate 0.000825 Epoch: 3 Global Step: 68460 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:04:23,029-Speed 2513.60 samples/sec Loss 9.4213 LearningRate 0.000825 Epoch: 3 Global Step: 68470 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:04:31,226-Speed 2498.97 samples/sec Loss 9.2918 LearningRate 0.000825 Epoch: 3 Global Step: 68480 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:04:39,423-Speed 2498.68 samples/sec Loss 9.2025 LearningRate 0.000826 Epoch: 3 Global Step: 68490 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:04:47,629-Speed 2496.38 samples/sec Loss 9.3204 LearningRate 0.000826 Epoch: 3 Global Step: 68500 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:04:55,833-Speed 2496.85 samples/sec Loss 9.2484 LearningRate 0.000826 Epoch: 3 Global Step: 68510 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:05:04,036-Speed 2496.92 samples/sec Loss 9.1902 LearningRate 0.000826 Epoch: 3 Global Step: 68520 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:05:12,186-Speed 2513.59 samples/sec Loss 9.2717 LearningRate 0.000826 Epoch: 3 Global Step: 68530 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:05:20,384-Speed 2498.51 samples/sec Loss 9.0768 LearningRate 0.000826 Epoch: 3 Global Step: 68540 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:05:28,585-Speed 2497.61 samples/sec Loss 9.1841 LearningRate 0.000826 Epoch: 3 Global Step: 68550 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:05:36,785-Speed 2497.96 samples/sec Loss 9.1562 LearningRate 0.000826 Epoch: 3 Global Step: 68560 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:05:44,986-Speed 2497.74 samples/sec Loss 9.1775 LearningRate 0.000827 Epoch: 3 Global Step: 68570 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:05:53,193-Speed 2495.93 
samples/sec Loss 9.0061 LearningRate 0.000827 Epoch: 3 Global Step: 68580 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:06:01,332-Speed 2516.53 samples/sec Loss 9.1136 LearningRate 0.000827 Epoch: 3 Global Step: 68590 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:06:09,537-Speed 2496.34 samples/sec Loss 9.1697 LearningRate 0.000827 Epoch: 3 Global Step: 68600 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:06:17,736-Speed 2498.22 samples/sec Loss 9.1114 LearningRate 0.000827 Epoch: 3 Global Step: 68610 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:06:25,937-Speed 2498.11 samples/sec Loss 9.1046 LearningRate 0.000827 Epoch: 3 Global Step: 68620 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:06:34,140-Speed 2496.73 samples/sec Loss 9.2642 LearningRate 0.000827 Epoch: 3 Global Step: 68630 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:06:42,342-Speed 2497.56 samples/sec Loss 9.1833 LearningRate 0.000827 Epoch: 3 Global Step: 68640 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:06:50,489-Speed 2514.20 samples/sec Loss 9.0555 LearningRate 0.000828 Epoch: 3 Global Step: 68650 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:06:58,695-Speed 2496.24 samples/sec Loss 9.1821 LearningRate 0.000828 Epoch: 3 Global Step: 68660 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:07:06,897-Speed 2497.15 samples/sec Loss 9.1855 LearningRate 0.000828 Epoch: 3 Global Step: 68670 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:07:15,097-Speed 2498.14 samples/sec Loss 9.2856 LearningRate 0.000828 Epoch: 3 Global Step: 68680 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:07:23,297-Speed 2497.80 samples/sec Loss 9.2235 LearningRate 0.000828 Epoch: 3 Global Step: 68690 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:07:31,497-Speed 2497.93 
samples/sec Loss 9.3357 LearningRate 0.000828 Epoch: 3 Global Step: 68700 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:07:39,646-Speed 2513.81 samples/sec Loss 9.1291 LearningRate 0.000828 Epoch: 3 Global Step: 68710 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:07:47,841-Speed 2499.41 samples/sec Loss 9.1392 LearningRate 0.000828 Epoch: 3 Global Step: 68720 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:07:56,040-Speed 2498.50 samples/sec Loss 9.1822 LearningRate 0.000828 Epoch: 3 Global Step: 68730 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:08:04,240-Speed 2498.23 samples/sec Loss 9.1976 LearningRate 0.000829 Epoch: 3 Global Step: 68740 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:08:12,439-Speed 2498.22 samples/sec Loss 9.1421 LearningRate 0.000829 Epoch: 3 Global Step: 68750 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:08:20,643-Speed 2496.67 samples/sec Loss 9.2066 LearningRate 0.000829 Epoch: 3 Global Step: 68760 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:08:28,789-Speed 2514.63 samples/sec Loss 9.0641 LearningRate 0.000829 Epoch: 3 Global Step: 68770 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:08:36,985-Speed 2499.10 samples/sec Loss 9.1025 LearningRate 0.000829 Epoch: 3 Global Step: 68780 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:08:45,193-Speed 2495.62 samples/sec Loss 9.2783 LearningRate 0.000829 Epoch: 3 Global Step: 68790 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:08:53,397-Speed 2496.69 samples/sec Loss 9.1374 LearningRate 0.000829 Epoch: 3 Global Step: 68800 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:09:01,598-Speed 2497.72 samples/sec Loss 9.1275 LearningRate 0.000829 Epoch: 3 Global Step: 68810 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:09:09,799-Speed 2497.76 
samples/sec Loss 9.1796 LearningRate 0.000830 Epoch: 3 Global Step: 68820 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:09:17,957-Speed 2510.81 samples/sec Loss 9.0698 LearningRate 0.000830 Epoch: 3 Global Step: 68830 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:09:26,157-Speed 2497.99 samples/sec Loss 9.0749 LearningRate 0.000830 Epoch: 3 Global Step: 68840 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:09:34,360-Speed 2497.15 samples/sec Loss 9.1001 LearningRate 0.000830 Epoch: 3 Global Step: 68850 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:09:42,572-Speed 2494.40 samples/sec Loss 9.0639 LearningRate 0.000830 Epoch: 3 Global Step: 68860 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:09:50,773-Speed 2497.60 samples/sec Loss 9.1861 LearningRate 0.000830 Epoch: 3 Global Step: 68870 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:09:58,978-Speed 2496.69 samples/sec Loss 9.1090 LearningRate 0.000830 Epoch: 3 Global Step: 68880 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:10:07,125-Speed 2514.06 samples/sec Loss 9.0616 LearningRate 0.000830 Epoch: 3 Global Step: 68890 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:10:15,327-Speed 2497.40 samples/sec Loss 9.1013 LearningRate 0.000831 Epoch: 3 Global Step: 68900 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:10:23,527-Speed 2498.36 samples/sec Loss 9.1004 LearningRate 0.000831 Epoch: 3 Global Step: 68910 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:10:31,727-Speed 2497.96 samples/sec Loss 9.0013 LearningRate 0.000831 Epoch: 3 Global Step: 68920 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:10:39,927-Speed 2497.74 samples/sec Loss 9.1722 LearningRate 0.000831 Epoch: 3 Global Step: 68930 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:10:48,128-Speed 2498.10 
samples/sec Loss 9.0725 LearningRate 0.000831 Epoch: 3 Global Step: 68940 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:10:56,279-Speed 2513.25 samples/sec Loss 9.1450 LearningRate 0.000831 Epoch: 3 Global Step: 68950 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:11:04,477-Speed 2498.38 samples/sec Loss 9.0367 LearningRate 0.000831 Epoch: 3 Global Step: 68960 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:11:12,682-Speed 2496.54 samples/sec Loss 9.0113 LearningRate 0.000831 Epoch: 3 Global Step: 68970 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:11:20,883-Speed 2497.76 samples/sec Loss 9.0303 LearningRate 0.000832 Epoch: 3 Global Step: 68980 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:11:29,084-Speed 2497.69 samples/sec Loss 9.0799 LearningRate 0.000832 Epoch: 3 Global Step: 68990 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:11:37,288-Speed 2496.94 samples/sec Loss 9.0881 LearningRate 0.000832 Epoch: 3 Global Step: 69000 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:11:45,433-Speed 2514.89 samples/sec Loss 9.1401 LearningRate 0.000832 Epoch: 3 Global Step: 69010 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:11:53,630-Speed 2498.67 samples/sec Loss 9.1441 LearningRate 0.000832 Epoch: 3 Global Step: 69020 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:12:01,834-Speed 2496.91 samples/sec Loss 9.0010 LearningRate 0.000832 Epoch: 3 Global Step: 69030 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:12:10,037-Speed 2496.84 samples/sec Loss 9.0658 LearningRate 0.000832 Epoch: 3 Global Step: 69040 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:12:18,255-Speed 2492.46 samples/sec Loss 9.0488 LearningRate 0.000832 Epoch: 3 Global Step: 69050 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:12:26,470-Speed 2493.28 
samples/sec Loss 9.0428 LearningRate 0.000832 Epoch: 3 Global Step: 69060 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:12:34,617-Speed 2514.72 samples/sec Loss 9.0124 LearningRate 0.000833 Epoch: 3 Global Step: 69070 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:12:42,818-Speed 2497.58 samples/sec Loss 9.0013 LearningRate 0.000833 Epoch: 3 Global Step: 69080 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:12:51,019-Speed 2497.74 samples/sec Loss 9.0361 LearningRate 0.000833 Epoch: 3 Global Step: 69090 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:12:59,219-Speed 2497.91 samples/sec Loss 8.9557 LearningRate 0.000833 Epoch: 3 Global Step: 69100 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:13:07,424-Speed 2496.48 samples/sec Loss 9.0643 LearningRate 0.000833 Epoch: 3 Global Step: 69110 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:13:15,636-Speed 2494.33 samples/sec Loss 9.1405 LearningRate 0.000833 Epoch: 3 Global Step: 69120 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:13:23,784-Speed 2513.75 samples/sec Loss 9.1252 LearningRate 0.000833 Epoch: 3 Global Step: 69130 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:13:31,991-Speed 2495.69 samples/sec Loss 9.2772 LearningRate 0.000833 Epoch: 3 Global Step: 69140 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:13:40,204-Speed 2494.04 samples/sec Loss 9.2276 LearningRate 0.000834 Epoch: 3 Global Step: 69150 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:13:48,407-Speed 2497.31 samples/sec Loss 9.1298 LearningRate 0.000834 Epoch: 3 Global Step: 69160 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:13:56,610-Speed 2497.07 samples/sec Loss 9.1227 LearningRate 0.000834 Epoch: 3 Global Step: 69170 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:14:04,811-Speed 2497.55 
samples/sec Loss 9.1736 LearningRate 0.000834 Epoch: 3 Global Step: 69180 Fp16 Grad Scale: 262144 Required: 174 hours Training: 2022-07-06 06:14:12,968-Speed 2511.23 samples/sec Loss 9.1572 LearningRate 0.000834 Epoch: 3 Global Step: 69190 Fp16 Grad Scale: 262144 Required: 174 hours Training: 2022-07-06 06:14:21,172-Speed 2496.76 samples/sec Loss 9.2298 LearningRate 0.000834 Epoch: 3 Global Step: 69200 Fp16 Grad Scale: 262144 Required: 174 hours Training: 2022-07-06 06:14:29,332-Speed 2510.13 samples/sec Loss 9.2686 LearningRate 0.000834 Epoch: 3 Global Step: 69210 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:14:37,541-Speed 2495.13 samples/sec Loss 9.2170 LearningRate 0.000834 Epoch: 3 Global Step: 69220 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:14:45,770-Speed 2489.53 samples/sec Loss 9.1080 LearningRate 0.000835 Epoch: 3 Global Step: 69230 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:14:53,978-Speed 2495.48 samples/sec Loss 9.1268 LearningRate 0.000835 Epoch: 3 Global Step: 69240 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:15:02,137-Speed 2510.62 samples/sec Loss 9.0721 LearningRate 0.000835 Epoch: 3 Global Step: 69250 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:15:10,339-Speed 2497.08 samples/sec Loss 9.1651 LearningRate 0.000835 Epoch: 3 Global Step: 69260 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:15:18,540-Speed 2497.86 samples/sec Loss 9.1667 LearningRate 0.000835 Epoch: 3 Global Step: 69270 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:15:26,742-Speed 2497.43 samples/sec Loss 9.1678 LearningRate 0.000835 Epoch: 3 Global Step: 69280 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:15:34,943-Speed 2497.49 samples/sec Loss 9.0188 LearningRate 0.000835 Epoch: 3 Global Step: 69290 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:15:43,145-Speed 2497.43 
samples/sec Loss 9.1143 LearningRate 0.000835 Epoch: 3 Global Step: 69300 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:15:51,292-Speed 2514.62 samples/sec Loss 9.0629 LearningRate 0.000835 Epoch: 3 Global Step: 69310 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:15:59,495-Speed 2496.94 samples/sec Loss 9.0995 LearningRate 0.000836 Epoch: 3 Global Step: 69320 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:16:07,699-Speed 2496.79 samples/sec Loss 9.0561 LearningRate 0.000836 Epoch: 3 Global Step: 69330 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:16:15,900-Speed 2497.56 samples/sec Loss 9.0132 LearningRate 0.000836 Epoch: 3 Global Step: 69340 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:16:24,102-Speed 2497.34 samples/sec Loss 9.0832 LearningRate 0.000836 Epoch: 3 Global Step: 69350 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:16:32,308-Speed 2496.23 samples/sec Loss 9.1411 LearningRate 0.000836 Epoch: 3 Global Step: 69360 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:16:40,457-Speed 2513.64 samples/sec Loss 9.0374 LearningRate 0.000836 Epoch: 3 Global Step: 69370 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:16:48,677-Speed 2492.09 samples/sec Loss 9.0933 LearningRate 0.000836 Epoch: 3 Global Step: 69380 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:16:56,879-Speed 2497.50 samples/sec Loss 9.1080 LearningRate 0.000836 Epoch: 3 Global Step: 69390 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:17:05,078-Speed 2498.01 samples/sec Loss 9.0361 LearningRate 0.000837 Epoch: 3 Global Step: 69400 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:17:13,288-Speed 2495.16 samples/sec Loss 9.0146 LearningRate 0.000837 Epoch: 3 Global Step: 69410 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:17:21,490-Speed 2497.29 
samples/sec Loss 9.0513 LearningRate 0.000837 Epoch: 3 Global Step: 69420 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:17:29,638-Speed 2513.77 samples/sec Loss 9.0149 LearningRate 0.000837 Epoch: 3 Global Step: 69430 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:17:37,839-Speed 2497.66 samples/sec Loss 9.0796 LearningRate 0.000837 Epoch: 3 Global Step: 69440 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:17:46,039-Speed 2497.96 samples/sec Loss 9.0637 LearningRate 0.000837 Epoch: 3 Global Step: 69450 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:17:54,242-Speed 2496.81 samples/sec Loss 8.9998 LearningRate 0.000837 Epoch: 3 Global Step: 69460 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:18:02,452-Speed 2495.38 samples/sec Loss 9.0387 LearningRate 0.000837 Epoch: 3 Global Step: 69470 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:18:10,656-Speed 2496.67 samples/sec Loss 9.0044 LearningRate 0.000838 Epoch: 3 Global Step: 69480 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:18:18,808-Speed 2512.62 samples/sec Loss 9.2268 LearningRate 0.000838 Epoch: 3 Global Step: 69490 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:18:27,025-Speed 2493.14 samples/sec Loss 9.0690 LearningRate 0.000838 Epoch: 3 Global Step: 69500 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:18:35,230-Speed 2496.43 samples/sec Loss 9.1215 LearningRate 0.000838 Epoch: 3 Global Step: 69510 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:18:43,431-Speed 2497.41 samples/sec Loss 9.1007 LearningRate 0.000838 Epoch: 3 Global Step: 69520 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:18:51,637-Speed 2496.44 samples/sec Loss 9.1655 LearningRate 0.000838 Epoch: 3 Global Step: 69530 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:18:59,843-Speed 2496.17 
samples/sec Loss 9.1101 LearningRate 0.000838 Epoch: 3 Global Step: 69540 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:19:08,006-Speed 2509.09 samples/sec Loss 9.0538 LearningRate 0.000838 Epoch: 3 Global Step: 69550 Fp16 Grad Scale: 131072 Required: 174 hours Training: 2022-07-06 06:19:16,211-Speed 2496.39 samples/sec Loss 9.1153 LearningRate 0.000839 Epoch: 3 Global Step: 69560 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:19:24,416-Speed 2496.58 samples/sec Loss 9.0991 LearningRate 0.000839 Epoch: 3 Global Step: 69570 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:19:32,619-Speed 2496.85 samples/sec Loss 9.0641 LearningRate 0.000839 Epoch: 3 Global Step: 69580 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:19:40,824-Speed 2496.46 samples/sec Loss 9.0641 LearningRate 0.000839 Epoch: 3 Global Step: 69590 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:19:49,025-Speed 2497.71 samples/sec Loss 8.9417 LearningRate 0.000839 Epoch: 3 Global Step: 69600 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:19:57,177-Speed 2512.62 samples/sec Loss 9.0512 LearningRate 0.000839 Epoch: 3 Global Step: 69610 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:20:05,380-Speed 2497.07 samples/sec Loss 8.9422 LearningRate 0.000839 Epoch: 3 Global Step: 69620 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:20:13,581-Speed 2497.71 samples/sec Loss 9.1517 LearningRate 0.000839 Epoch: 3 Global Step: 69630 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:20:21,779-Speed 2498.48 samples/sec Loss 9.1887 LearningRate 0.000839 Epoch: 3 Global Step: 69640 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:20:29,980-Speed 2497.57 samples/sec Loss 9.2395 LearningRate 0.000840 Epoch: 3 Global Step: 69650 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:20:38,181-Speed 2497.68 
samples/sec Loss 9.0129 LearningRate 0.000840 Epoch: 3 Global Step: 69660 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:20:46,327-Speed 2514.41 samples/sec Loss 9.0663 LearningRate 0.000840 Epoch: 3 Global Step: 69670 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:20:54,526-Speed 2498.32 samples/sec Loss 8.9921 LearningRate 0.000840 Epoch: 3 Global Step: 69680 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:21:02,722-Speed 2499.06 samples/sec Loss 9.1848 LearningRate 0.000840 Epoch: 3 Global Step: 69690 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:21:10,926-Speed 2496.57 samples/sec Loss 9.0758 LearningRate 0.000840 Epoch: 3 Global Step: 69700 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:21:19,130-Speed 2496.74 samples/sec Loss 9.1624 LearningRate 0.000840 Epoch: 3 Global Step: 69710 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:21:27,332-Speed 2497.52 samples/sec Loss 9.0397 LearningRate 0.000840 Epoch: 3 Global Step: 69720 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:21:35,480-Speed 2513.71 samples/sec Loss 9.0825 LearningRate 0.000841 Epoch: 3 Global Step: 69730 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:21:43,681-Speed 2497.73 samples/sec Loss 9.0813 LearningRate 0.000841 Epoch: 3 Global Step: 69740 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:21:51,882-Speed 2497.68 samples/sec Loss 9.0848 LearningRate 0.000841 Epoch: 3 Global Step: 69750 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:22:00,086-Speed 2496.92 samples/sec Loss 9.0602 LearningRate 0.000841 Epoch: 3 Global Step: 69760 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:22:08,284-Speed 2498.41 samples/sec Loss 8.9291 LearningRate 0.000841 Epoch: 3 Global Step: 69770 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:22:16,484-Speed 2497.86 
samples/sec Loss 8.9666 LearningRate 0.000841 Epoch: 3 Global Step: 69780 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:22:24,629-Speed 2514.82 samples/sec Loss 8.9676 LearningRate 0.000841 Epoch: 3 Global Step: 69790 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:22:32,836-Speed 2496.01 samples/sec Loss 8.8601 LearningRate 0.000841 Epoch: 3 Global Step: 69800 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:22:41,037-Speed 2497.62 samples/sec Loss 9.0080 LearningRate 0.000842 Epoch: 3 Global Step: 69810 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:22:49,239-Speed 2497.30 samples/sec Loss 8.9304 LearningRate 0.000842 Epoch: 3 Global Step: 69820 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:22:57,438-Speed 2498.07 samples/sec Loss 8.8939 LearningRate 0.000842 Epoch: 3 Global Step: 69830 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:23:05,640-Speed 2497.59 samples/sec Loss 8.8675 LearningRate 0.000842 Epoch: 3 Global Step: 69840 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:23:13,787-Speed 2514.06 samples/sec Loss 8.8670 LearningRate 0.000842 Epoch: 3 Global Step: 69850 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:23:21,989-Speed 2497.38 samples/sec Loss 8.9611 LearningRate 0.000842 Epoch: 3 Global Step: 69860 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:23:30,188-Speed 2498.42 samples/sec Loss 9.0386 LearningRate 0.000842 Epoch: 3 Global Step: 69870 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:23:38,387-Speed 2498.22 samples/sec Loss 9.0369 LearningRate 0.000842 Epoch: 3 Global Step: 69880 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:23:46,588-Speed 2497.39 samples/sec Loss 9.0470 LearningRate 0.000842 Epoch: 3 Global Step: 69890 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:23:54,788-Speed 2498.07 
samples/sec Loss 9.1491 LearningRate 0.000843 Epoch: 3 Global Step: 69900 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:24:02,943-Speed 2511.60 samples/sec Loss 8.9681 LearningRate 0.000843 Epoch: 3 Global Step: 69910 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:24:11,145-Speed 2497.40 samples/sec Loss 8.9916 LearningRate 0.000843 Epoch: 3 Global Step: 69920 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:24:19,342-Speed 2499.02 samples/sec Loss 9.0245 LearningRate 0.000843 Epoch: 3 Global Step: 69930 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:24:27,540-Speed 2498.85 samples/sec Loss 9.1071 LearningRate 0.000843 Epoch: 3 Global Step: 69940 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:24:35,742-Speed 2497.69 samples/sec Loss 9.0689 LearningRate 0.000843 Epoch: 3 Global Step: 69950 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:24:43,946-Speed 2496.61 samples/sec Loss 8.9912 LearningRate 0.000843 Epoch: 3 Global Step: 69960 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:24:52,094-Speed 2514.06 samples/sec Loss 9.1208 LearningRate 0.000843 Epoch: 3 Global Step: 69970 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:25:00,304-Speed 2494.59 samples/sec Loss 9.0453 LearningRate 0.000844 Epoch: 3 Global Step: 69980 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:25:08,500-Speed 2499.38 samples/sec Loss 9.0318 LearningRate 0.000844 Epoch: 3 Global Step: 69990 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:25:16,706-Speed 2496.08 samples/sec Loss 8.9400 LearningRate 0.000844 Epoch: 3 Global Step: 70000 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:25:24,904-Speed 2498.70 samples/sec Loss 8.9367 LearningRate 0.000844 Epoch: 3 Global Step: 70010 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:25:33,110-Speed 2496.32 
samples/sec Loss 8.9998 LearningRate 0.000844 Epoch: 3 Global Step: 70020 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:25:41,262-Speed 2512.72 samples/sec Loss 9.0056 LearningRate 0.000844 Epoch: 3 Global Step: 70030 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:25:49,458-Speed 2498.93 samples/sec Loss 9.0252 LearningRate 0.000844 Epoch: 3 Global Step: 70040 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:25:57,657-Speed 2498.42 samples/sec Loss 9.0209 LearningRate 0.000844 Epoch: 3 Global Step: 70050 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:26:05,858-Speed 2497.80 samples/sec Loss 9.0205 LearningRate 0.000845 Epoch: 3 Global Step: 70060 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:26:14,057-Speed 2498.43 samples/sec Loss 8.9891 LearningRate 0.000845 Epoch: 3 Global Step: 70070 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:26:22,270-Speed 2493.89 samples/sec Loss 8.9995 LearningRate 0.000845 Epoch: 3 Global Step: 70080 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:26:30,418-Speed 2513.73 samples/sec Loss 9.0089 LearningRate 0.000845 Epoch: 3 Global Step: 70090 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:26:38,618-Speed 2498.08 samples/sec Loss 8.9711 LearningRate 0.000845 Epoch: 3 Global Step: 70100 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:26:46,821-Speed 2497.24 samples/sec Loss 8.9148 LearningRate 0.000845 Epoch: 3 Global Step: 70110 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:26:54,982-Speed 2509.86 samples/sec Loss 8.9819 LearningRate 0.000845 Epoch: 3 Global Step: 70120 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:27:03,181-Speed 2497.97 samples/sec Loss 9.0696 LearningRate 0.000845 Epoch: 3 Global Step: 70130 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:27:11,384-Speed 2497.44 
samples/sec Loss 8.9952 LearningRate 0.000845 Epoch: 3 Global Step: 70140 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:27:19,541-Speed 2511.17 samples/sec Loss 9.2571 LearningRate 0.000846 Epoch: 3 Global Step: 70150 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:27:27,739-Speed 2498.45 samples/sec Loss 8.9818 LearningRate 0.000846 Epoch: 3 Global Step: 70160 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:27:35,939-Speed 2498.19 samples/sec Loss 9.1723 LearningRate 0.000846 Epoch: 3 Global Step: 70170 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:27:44,136-Speed 2498.91 samples/sec Loss 9.2196 LearningRate 0.000846 Epoch: 3 Global Step: 70180 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:27:52,335-Speed 2498.26 samples/sec Loss 9.1118 LearningRate 0.000846 Epoch: 3 Global Step: 70190 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:28:00,537-Speed 2497.00 samples/sec Loss 8.9972 LearningRate 0.000846 Epoch: 3 Global Step: 70200 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:28:08,684-Speed 2514.16 samples/sec Loss 8.9738 LearningRate 0.000846 Epoch: 3 Global Step: 70210 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:28:16,885-Speed 2497.91 samples/sec Loss 9.0843 LearningRate 0.000846 Epoch: 3 Global Step: 70220 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:28:25,089-Speed 2496.86 samples/sec Loss 9.0891 LearningRate 0.000847 Epoch: 3 Global Step: 70230 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:28:33,287-Speed 2498.34 samples/sec Loss 9.0534 LearningRate 0.000847 Epoch: 3 Global Step: 70240 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:28:41,488-Speed 2497.72 samples/sec Loss 9.0871 LearningRate 0.000847 Epoch: 3 Global Step: 70250 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:28:49,690-Speed 2497.57 samples/sec Loss 
9.0388 LearningRate 0.000847 Epoch: 3 Global Step: 70260 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:28:57,843-Speed 2512.38 samples/sec Loss 9.0207 LearningRate 0.000847 Epoch: 3 Global Step: 70270 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:29:06,040-Speed 2498.92 samples/sec Loss 9.0959 LearningRate 0.000847 Epoch: 3 Global Step: 70280 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:29:14,249-Speed 2495.66 samples/sec Loss 9.1107 LearningRate 0.000847 Epoch: 3 Global Step: 70290 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:29:22,447-Speed 2498.44 samples/sec Loss 8.9633 LearningRate 0.000847 Epoch: 3 Global Step: 70300 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:29:30,647-Speed 2498.01 samples/sec Loss 9.0461 LearningRate 0.000848 Epoch: 3 Global Step: 70310 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:29:38,844-Speed 2498.58 samples/sec Loss 9.0411 LearningRate 0.000848 Epoch: 3 Global Step: 70320 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:29:46,992-Speed 2513.93 samples/sec Loss 8.9040 LearningRate 0.000848 Epoch: 3 Global Step: 70330 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:29:55,194-Speed 2497.49 samples/sec Loss 9.1078 LearningRate 0.000848 Epoch: 3 Global Step: 70340 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:30:03,400-Speed 2496.02 samples/sec Loss 9.0090 LearningRate 0.000848 Epoch: 3 Global Step: 70350 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:30:11,600-Speed 2497.92 samples/sec Loss 8.9732 LearningRate 0.000848 Epoch: 3 Global Step: 70360 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:30:19,801-Speed 2497.86 samples/sec Loss 8.8434 LearningRate 0.000848 Epoch: 3 Global Step: 70370 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:30:28,002-Speed 2497.69 samples/sec Loss 9.0447 LearningRate 
0.000848 Epoch: 3 Global Step: 70380 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:30:36,151-Speed 2513.47 samples/sec Loss 9.1629 LearningRate 0.000849 Epoch: 3 Global Step: 70390 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:30:44,363-Speed 2494.43 samples/sec Loss 8.9245 LearningRate 0.000849 Epoch: 3 Global Step: 70400 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:30:52,562-Speed 2498.23 samples/sec Loss 8.9759 LearningRate 0.000849 Epoch: 3 Global Step: 70410 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:31:00,760-Speed 2498.70 samples/sec Loss 8.9435 LearningRate 0.000849 Epoch: 3 Global Step: 70420 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:31:08,967-Speed 2495.69 samples/sec Loss 8.8997 LearningRate 0.000849 Epoch: 3 Global Step: 70430 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:31:17,168-Speed 2497.57 samples/sec Loss 9.0303 LearningRate 0.000849 Epoch: 3 Global Step: 70440 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:31:25,314-Speed 2514.45 samples/sec Loss 8.9413 LearningRate 0.000849 Epoch: 3 Global Step: 70450 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:31:33,518-Speed 2496.80 samples/sec Loss 8.9626 LearningRate 0.000849 Epoch: 3 Global Step: 70460 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:31:41,716-Speed 2498.19 samples/sec Loss 8.9299 LearningRate 0.000849 Epoch: 3 Global Step: 70470 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:31:49,917-Speed 2497.91 samples/sec Loss 8.9426 LearningRate 0.000850 Epoch: 3 Global Step: 70480 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:31:58,119-Speed 2497.46 samples/sec Loss 8.8158 LearningRate 0.000850 Epoch: 3 Global Step: 70490 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:32:06,322-Speed 2497.21 samples/sec Loss 8.9425 LearningRate 0.000850 Epoch: 3 
Global Step: 70500 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:32:14,484-Speed 2509.69 samples/sec Loss 9.0881 LearningRate 0.000850 Epoch: 3 Global Step: 70510 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:32:22,687-Speed 2497.16 samples/sec Loss 8.9349 LearningRate 0.000850 Epoch: 3 Global Step: 70520 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:32:30,906-Speed 2492.46 samples/sec Loss 8.9329 LearningRate 0.000850 Epoch: 3 Global Step: 70530 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:32:39,113-Speed 2495.60 samples/sec Loss 8.9958 LearningRate 0.000850 Epoch: 3 Global Step: 70540 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:32:47,318-Speed 2496.84 samples/sec Loss 8.8922 LearningRate 0.000850 Epoch: 3 Global Step: 70550 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:32:55,522-Speed 2496.79 samples/sec Loss 8.8922 LearningRate 0.000851 Epoch: 3 Global Step: 70560 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:33:03,671-Speed 2513.49 samples/sec Loss 8.8217 LearningRate 0.000851 Epoch: 3 Global Step: 70570 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:33:11,874-Speed 2497.00 samples/sec Loss 8.9442 LearningRate 0.000851 Epoch: 3 Global Step: 70580 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:33:20,080-Speed 2496.23 samples/sec Loss 8.7542 LearningRate 0.000851 Epoch: 3 Global Step: 70590 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:33:28,290-Speed 2494.84 samples/sec Loss 8.9090 LearningRate 0.000851 Epoch: 3 Global Step: 70600 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:33:36,491-Speed 2497.67 samples/sec Loss 8.9378 LearningRate 0.000851 Epoch: 3 Global Step: 70610 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:33:44,691-Speed 2498.08 samples/sec Loss 8.8650 LearningRate 0.000851 Epoch: 3 Global Step: 70620 
Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:33:52,838-Speed 2514.11 samples/sec Loss 8.9425 LearningRate 0.000851 Epoch: 3 Global Step: 70630 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:34:01,038-Speed 2498.39 samples/sec Loss 8.9322 LearningRate 0.000852 Epoch: 3 Global Step: 70640 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:34:09,246-Speed 2495.37 samples/sec Loss 8.8743 LearningRate 0.000852 Epoch: 3 Global Step: 70650 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:34:17,445-Speed 2498.45 samples/sec Loss 8.9229 LearningRate 0.000852 Epoch: 3 Global Step: 70660 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:34:25,651-Speed 2496.53 samples/sec Loss 8.9746 LearningRate 0.000852 Epoch: 3 Global Step: 70670 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:34:33,853-Speed 2497.40 samples/sec Loss 8.9618 LearningRate 0.000852 Epoch: 3 Global Step: 70680 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:34:42,005-Speed 2512.48 samples/sec Loss 8.9062 LearningRate 0.000852 Epoch: 3 Global Step: 70690 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:34:50,207-Speed 2497.60 samples/sec Loss 8.9699 LearningRate 0.000852 Epoch: 3 Global Step: 70700 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:34:58,407-Speed 2497.90 samples/sec Loss 9.0011 LearningRate 0.000852 Epoch: 3 Global Step: 70710 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:35:06,606-Speed 2498.30 samples/sec Loss 8.8454 LearningRate 0.000852 Epoch: 3 Global Step: 70720 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:35:14,807-Speed 2497.74 samples/sec Loss 8.9212 LearningRate 0.000853 Epoch: 3 Global Step: 70730 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:35:23,008-Speed 2497.67 samples/sec Loss 8.8422 LearningRate 0.000853 Epoch: 3 Global Step: 70740 Fp16 Grad Scale: 
65536 Required: 173 hours Training: 2022-07-06 06:35:31,159-Speed 2513.08 samples/sec Loss 9.0119 LearningRate 0.000853 Epoch: 3 Global Step: 70750 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:35:39,360-Speed 2497.75 samples/sec Loss 8.9079 LearningRate 0.000853 Epoch: 3 Global Step: 70760 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:35:47,558-Speed 2498.30 samples/sec Loss 8.9768 LearningRate 0.000853 Epoch: 3 Global Step: 70770 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:35:55,760-Speed 2497.22 samples/sec Loss 8.8688 LearningRate 0.000853 Epoch: 3 Global Step: 70780 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:36:03,960-Speed 2498.04 samples/sec Loss 8.9789 LearningRate 0.000853 Epoch: 3 Global Step: 70790 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:36:12,159-Speed 2498.73 samples/sec Loss 8.9435 LearningRate 0.000853 Epoch: 3 Global Step: 70800 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:36:20,306-Speed 2514.06 samples/sec Loss 8.9323 LearningRate 0.000854 Epoch: 3 Global Step: 70810 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:36:28,508-Speed 2497.94 samples/sec Loss 8.9371 LearningRate 0.000854 Epoch: 3 Global Step: 70820 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:36:36,775-Speed 2498.80 samples/sec Loss 8.8873 LearningRate 0.000854 Epoch: 3 Global Step: 70830 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:36:45,030-Speed 2499.28 samples/sec Loss 8.8981 LearningRate 0.000854 Epoch: 3 Global Step: 70840 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:36:53,233-Speed 2496.91 samples/sec Loss 8.8035 LearningRate 0.000854 Epoch: 3 Global Step: 70850 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:37:04,417-Speed 1844.73 samples/sec Loss 8.8124 LearningRate 0.000854 Epoch: 3 Global Step: 70860 Fp16 Grad Scale: 65536 Required: 173 
hours Training: 2022-07-06 06:37:12,574-Speed 2516.36 samples/sec Loss 8.8488 LearningRate 0.000854 Epoch: 3 Global Step: 70870 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:37:20,791-Speed 2499.13 samples/sec Loss 8.8728 LearningRate 0.000854 Epoch: 3 Global Step: 70880 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:37:28,992-Speed 2497.40 samples/sec Loss 8.9541 LearningRate 0.000855 Epoch: 3 Global Step: 70890 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:37:37,195-Speed 2497.19 samples/sec Loss 8.9504 LearningRate 0.000855 Epoch: 3 Global Step: 70900 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:37:47,358-Speed 2486.92 samples/sec Loss 8.8853 LearningRate 0.000855 Epoch: 3 Global Step: 70910 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:37:55,584-Speed 2499.68 samples/sec Loss 8.7802 LearningRate 0.000855 Epoch: 3 Global Step: 70920 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:38:03,729-Speed 2514.73 samples/sec Loss 8.8363 LearningRate 0.000855 Epoch: 3 Global Step: 70930 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:38:12,540-Speed 2488.36 samples/sec Loss 8.8421 LearningRate 0.000855 Epoch: 3 Global Step: 70940 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:38:20,767-Speed 2499.77 samples/sec Loss 8.8164 LearningRate 0.000855 Epoch: 3 Global Step: 70950 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:38:32,551-Speed 1738.16 samples/sec Loss 8.7783 LearningRate 0.000855 Epoch: 3 Global Step: 70960 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:38:40,782-Speed 2501.58 samples/sec Loss 9.0503 LearningRate 0.000856 Epoch: 3 Global Step: 70970 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:38:49,026-Speed 2500.29 samples/sec Loss 8.8665 LearningRate 0.000856 Epoch: 3 Global Step: 70980 Fp16 Grad Scale: 65536 Required: 173 hours Training: 
2022-07-06 06:38:57,264-Speed 2486.56 samples/sec Loss 8.8544 LearningRate 0.000856 Epoch: 3 Global Step: 70990 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:39:05,467-Speed 2501.14 samples/sec Loss 8.9308 LearningRate 0.000856 Epoch: 3 Global Step: 71000 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:39:13,671-Speed 2496.66 samples/sec Loss 8.9455 LearningRate 0.000856 Epoch: 3 Global Step: 71010 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:40:31,347-Speed 263.74 samples/sec Loss 9.0791 LearningRate 0.000856 Epoch: 3 Global Step: 71020 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:40:39,546-Speed 2506.16 samples/sec Loss 9.0590 LearningRate 0.000856 Epoch: 3 Global Step: 71030 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:40:47,761-Speed 2503.44 samples/sec Loss 9.0267 LearningRate 0.000856 Epoch: 3 Global Step: 71040 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:40:55,901-Speed 2516.28 samples/sec Loss 8.9653 LearningRate 0.000856 Epoch: 3 Global Step: 71050 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:41:04,140-Speed 2500.60 samples/sec Loss 8.9912 LearningRate 0.000857 Epoch: 3 Global Step: 71060 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:41:12,341-Speed 2497.69 samples/sec Loss 8.9560 LearningRate 0.000857 Epoch: 3 Global Step: 71070 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:41:20,564-Speed 2498.12 samples/sec Loss 8.9269 LearningRate 0.000857 Epoch: 3 Global Step: 71080 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:41:28,811-Speed 2499.31 samples/sec Loss 8.9711 LearningRate 0.000857 Epoch: 3 Global Step: 71090 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:41:37,946-Speed 2242.26 samples/sec Loss 8.9201 LearningRate 0.000857 Epoch: 3 Global Step: 71100 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 
06:41:46,103-Speed 2511.19 samples/sec Loss 8.9278 LearningRate 0.000857 Epoch: 3 Global Step: 71110 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:41:54,316-Speed 2493.96 samples/sec Loss 8.8947 LearningRate 0.000857 Epoch: 3 Global Step: 71120 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:42:02,530-Speed 2493.74 samples/sec Loss 8.9584 LearningRate 0.000857 Epoch: 3 Global Step: 71130 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:42:10,750-Speed 2491.95 samples/sec Loss 8.9818 LearningRate 0.000858 Epoch: 3 Global Step: 71140 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:42:18,978-Speed 2489.47 samples/sec Loss 8.9032 LearningRate 0.000858 Epoch: 3 Global Step: 71150 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:42:27,189-Speed 2494.87 samples/sec Loss 8.9014 LearningRate 0.000858 Epoch: 3 Global Step: 71160 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:42:35,359-Speed 2507.06 samples/sec Loss 8.9504 LearningRate 0.000858 Epoch: 3 Global Step: 71170 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:42:43,569-Speed 2495.00 samples/sec Loss 8.8129 LearningRate 0.000858 Epoch: 3 Global Step: 71180 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:42:51,787-Speed 2492.64 samples/sec Loss 8.9101 LearningRate 0.000858 Epoch: 3 Global Step: 71190 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:42:59,992-Speed 2496.08 samples/sec Loss 8.9251 LearningRate 0.000858 Epoch: 3 Global Step: 71200 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:43:08,198-Speed 2496.41 samples/sec Loss 8.9137 LearningRate 0.000858 Epoch: 3 Global Step: 71210 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:43:16,401-Speed 2497.15 samples/sec Loss 8.8970 LearningRate 0.000859 Epoch: 3 Global Step: 71220 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:43:24,549-Speed 
2513.84 samples/sec Loss 8.9548 LearningRate 0.000859 Epoch: 3 Global Step: 71230 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:43:32,755-Speed 2496.19 samples/sec Loss 8.9143 LearningRate 0.000859 Epoch: 3 Global Step: 71240 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:43:40,960-Speed 2496.18 samples/sec Loss 8.8436 LearningRate 0.000859 Epoch: 3 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:43:49,170-Speed 2494.76 samples/sec Loss 8.8932 LearningRate 0.000859 Epoch: 3 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:43:57,375-Speed 2496.61 samples/sec Loss 8.7543 LearningRate 0.000859 Epoch: 3 Global Step: 71270 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:44:05,580-Speed 2496.36 samples/sec Loss 8.7803 LearningRate 0.000859 Epoch: 3 Global Step: 71280 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:44:13,736-Speed 2511.52 samples/sec Loss 8.8447 LearningRate 0.000859 Epoch: 3 Global Step: 71290 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:44:21,945-Speed 2495.12 samples/sec Loss 8.7613 LearningRate 0.000859 Epoch: 3 Global Step: 71300 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:44:30,152-Speed 2495.82 samples/sec Loss 8.7519 LearningRate 0.000860 Epoch: 3 Global Step: 71310 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:44:38,359-Speed 2495.97 samples/sec Loss 8.7746 LearningRate 0.000860 Epoch: 3 Global Step: 71320 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:44:46,568-Speed 2495.10 samples/sec Loss 8.7140 LearningRate 0.000860 Epoch: 3 Global Step: 71330 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:44:54,774-Speed 2496.30 samples/sec Loss 8.7935 LearningRate 0.000860 Epoch: 3 Global Step: 71340 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:45:02,925-Speed 2512.98 
samples/sec Loss 9.0573 LearningRate 0.000860 Epoch: 3 Global Step: 71350 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:45:11,143-Speed 2492.60 samples/sec Loss 8.9566 LearningRate 0.000860 Epoch: 3 Global Step: 71360 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:45:19,348-Speed 2496.44 samples/sec Loss 9.0472 LearningRate 0.000860 Epoch: 3 Global Step: 71370 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:45:27,555-Speed 2495.60 samples/sec Loss 8.9300 LearningRate 0.000860 Epoch: 3 Global Step: 71380 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:45:35,761-Speed 2496.31 samples/sec Loss 8.8732 LearningRate 0.000861 Epoch: 3 Global Step: 71390 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:45:43,968-Speed 2496.00 samples/sec Loss 8.8954 LearningRate 0.000861 Epoch: 3 Global Step: 71400 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:45:52,122-Speed 2512.12 samples/sec Loss 8.8455 LearningRate 0.000861 Epoch: 3 Global Step: 71410 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:46:00,327-Speed 2496.52 samples/sec Loss 8.8515 LearningRate 0.000861 Epoch: 3 Global Step: 71420 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:46:08,532-Speed 2496.30 samples/sec Loss 8.9260 LearningRate 0.000861 Epoch: 3 Global Step: 71430 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:46:16,737-Speed 2496.25 samples/sec Loss 8.9102 LearningRate 0.000861 Epoch: 3 Global Step: 71440 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:46:24,948-Speed 2494.87 samples/sec Loss 8.8826 LearningRate 0.000861 Epoch: 3 Global Step: 71450 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:46:33,155-Speed 2495.81 samples/sec Loss 8.9501 LearningRate 0.000861 Epoch: 3 Global Step: 71460 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:46:41,308-Speed 2512.10 
samples/sec Loss 8.8455 LearningRate 0.000862 Epoch: 3 Global Step: 71470 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:46:49,513-Speed 2496.68 samples/sec Loss 8.8960 LearningRate 0.000862 Epoch: 3 Global Step: 71480 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:46:57,719-Speed 2496.02 samples/sec Loss 8.8391 LearningRate 0.000862 Epoch: 3 Global Step: 71490 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:47:05,925-Speed 2496.13 samples/sec Loss 8.8522 LearningRate 0.000862 Epoch: 3 Global Step: 71500 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:47:14,131-Speed 2496.31 samples/sec Loss 8.8269 LearningRate 0.000862 Epoch: 3 Global Step: 71510 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:47:22,333-Speed 2497.10 samples/sec Loss 8.9020 LearningRate 0.000862 Epoch: 3 Global Step: 71520 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:47:30,487-Speed 2512.30 samples/sec Loss 8.9388 LearningRate 0.000862 Epoch: 3 Global Step: 71530 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:47:38,690-Speed 2496.98 samples/sec Loss 8.8697 LearningRate 0.000862 Epoch: 3 Global Step: 71540 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:47:46,900-Speed 2494.83 samples/sec Loss 8.8827 LearningRate 0.000862 Epoch: 3 Global Step: 71550 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:47:55,108-Speed 2495.58 samples/sec Loss 8.8725 LearningRate 0.000863 Epoch: 3 Global Step: 71560 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:48:03,311-Speed 2497.01 samples/sec Loss 8.7718 LearningRate 0.000863 Epoch: 3 Global Step: 71570 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:48:11,524-Speed 2494.43 samples/sec Loss 8.9253 LearningRate 0.000863 Epoch: 3 Global Step: 71580 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:48:19,678-Speed 2511.65 
samples/sec Loss 8.8961 LearningRate 0.000863 Epoch: 3 Global Step: 71590 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:48:27,882-Speed 2496.95 samples/sec Loss 8.8118 LearningRate 0.000863 Epoch: 3 Global Step: 71600 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:48:36,087-Speed 2496.27 samples/sec Loss 9.0198 LearningRate 0.000863 Epoch: 3 Global Step: 71610 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 06:48:44,254-Speed 2508.12 samples/sec Loss 8.8987 LearningRate 0.000863 Epoch: 3 Global Step: 71620 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:48:52,456-Speed 2497.37 samples/sec Loss 8.8720 LearningRate 0.000863 Epoch: 3 Global Step: 71630 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:49:00,660-Speed 2496.76 samples/sec Loss 8.8780 LearningRate 0.000864 Epoch: 3 Global Step: 71640 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:49:08,811-Speed 2513.03 samples/sec Loss 8.8753 LearningRate 0.000864 Epoch: 3 Global Step: 71650 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:49:17,013-Speed 2497.12 samples/sec Loss 8.8001 LearningRate 0.000864 Epoch: 3 Global Step: 71660 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:49:25,218-Speed 2496.50 samples/sec Loss 8.8605 LearningRate 0.000864 Epoch: 3 Global Step: 71670 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:49:33,436-Speed 2492.61 samples/sec Loss 8.7815 LearningRate 0.000864 Epoch: 3 Global Step: 71680 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:49:41,640-Speed 2496.55 samples/sec Loss 8.8415 LearningRate 0.000864 Epoch: 3 Global Step: 71690 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:49:49,843-Speed 2496.95 samples/sec Loss 8.8215 LearningRate 0.000864 Epoch: 3 Global Step: 71700 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:49:57,993-Speed 2513.30 samples/sec Loss 
8.7066 LearningRate 0.000864 Epoch: 3 Global Step: 71710 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:50:06,201-Speed 2495.56 samples/sec Loss 8.7073 LearningRate 0.000865 Epoch: 3 Global Step: 71720 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:50:14,405-Speed 2496.86 samples/sec Loss 8.6960 LearningRate 0.000865 Epoch: 3 Global Step: 71730 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:50:22,609-Speed 2496.48 samples/sec Loss 8.7972 LearningRate 0.000865 Epoch: 3 Global Step: 71740 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:50:30,828-Speed 2492.55 samples/sec Loss 8.9248 LearningRate 0.000865 Epoch: 3 Global Step: 71750 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:50:39,032-Speed 2496.52 samples/sec Loss 8.7181 LearningRate 0.000865 Epoch: 3 Global Step: 71760 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:50:47,183-Speed 2513.24 samples/sec Loss 8.8367 LearningRate 0.000865 Epoch: 3 Global Step: 71770 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:50:55,398-Speed 2493.29 samples/sec Loss 8.8999 LearningRate 0.000865 Epoch: 3 Global Step: 71780 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:51:03,603-Speed 2496.60 samples/sec Loss 8.8190 LearningRate 0.000865 Epoch: 3 Global Step: 71790 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:51:11,810-Speed 2495.95 samples/sec Loss 8.7826 LearningRate 0.000866 Epoch: 3 Global Step: 71800 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:51:20,014-Speed 2496.78 samples/sec Loss 8.8624 LearningRate 0.000866 Epoch: 3 Global Step: 71810 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:51:28,221-Speed 2495.91 samples/sec Loss 8.8296 LearningRate 0.000866 Epoch: 3 Global Step: 71820 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:51:36,371-Speed 2513.22 samples/sec Loss 8.9135 LearningRate 
0.000866 Epoch: 3 Global Step: 71830 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:51:44,577-Speed 2496.37 samples/sec Loss 8.7479 LearningRate 0.000866 Epoch: 3 Global Step: 71840 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:51:52,780-Speed 2497.02 samples/sec Loss 8.8452 LearningRate 0.000866 Epoch: 3 Global Step: 71850 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:52:00,984-Speed 2496.93 samples/sec Loss 8.9245 LearningRate 0.000866 Epoch: 3 Global Step: 71860 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:52:09,192-Speed 2495.57 samples/sec Loss 8.7631 LearningRate 0.000866 Epoch: 3 Global Step: 71870 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:52:17,395-Speed 2497.00 samples/sec Loss 8.7582 LearningRate 0.000866 Epoch: 3 Global Step: 71880 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:52:25,552-Speed 2511.30 samples/sec Loss 8.7015 LearningRate 0.000867 Epoch: 3 Global Step: 71890 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:52:33,755-Speed 2497.04 samples/sec Loss 8.7148 LearningRate 0.000867 Epoch: 3 Global Step: 71900 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:52:41,956-Speed 2497.64 samples/sec Loss 8.6517 LearningRate 0.000867 Epoch: 3 Global Step: 71910 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:52:50,161-Speed 2496.45 samples/sec Loss 8.7456 LearningRate 0.000867 Epoch: 3 Global Step: 71920 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:52:58,369-Speed 2495.71 samples/sec Loss 8.7326 LearningRate 0.000867 Epoch: 3 Global Step: 71930 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:53:06,577-Speed 2495.53 samples/sec Loss 8.7380 LearningRate 0.000867 Epoch: 3 Global Step: 71940 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:53:14,748-Speed 2506.98 samples/sec Loss 8.6555 LearningRate 0.000867 Epoch: 3 
Global Step: 71950 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:53:22,957-Speed 2495.16 samples/sec Loss 8.9185 LearningRate 0.000867 Epoch: 3 Global Step: 71960 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:53:31,173-Speed 2493.08 samples/sec Loss 8.9040 LearningRate 0.000868 Epoch: 3 Global Step: 71970 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:53:39,378-Speed 2496.52 samples/sec Loss 8.8441 LearningRate 0.000868 Epoch: 3 Global Step: 71980 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:53:47,579-Speed 2497.63 samples/sec Loss 8.8618 LearningRate 0.000868 Epoch: 3 Global Step: 71990 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:53:55,783-Speed 2496.65 samples/sec Loss 8.8230 LearningRate 0.000868 Epoch: 3 Global Step: 72000 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:54:03,940-Speed 2511.03 samples/sec Loss 8.8336 LearningRate 0.000868 Epoch: 3 Global Step: 72010 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:54:12,156-Speed 2493.26 samples/sec Loss 8.8156 LearningRate 0.000868 Epoch: 3 Global Step: 72020 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:54:20,360-Speed 2496.87 samples/sec Loss 8.8946 LearningRate 0.000868 Epoch: 3 Global Step: 72030 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:54:28,576-Speed 2493.14 samples/sec Loss 8.8601 LearningRate 0.000868 Epoch: 3 Global Step: 72040 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:54:36,778-Speed 2497.20 samples/sec Loss 8.8973 LearningRate 0.000869 Epoch: 3 Global Step: 72050 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:54:44,983-Speed 2496.34 samples/sec Loss 8.7357 LearningRate 0.000869 Epoch: 3 Global Step: 72060 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:54:53,132-Speed 2513.96 samples/sec Loss 8.8217 LearningRate 0.000869 Epoch: 3 Global Step: 72070 
Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:55:01,335-Speed 2496.77 samples/sec Loss 8.8108 LearningRate 0.000869 Epoch: 3 Global Step: 72080 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:55:09,545-Speed 2495.23 samples/sec Loss 8.7536 LearningRate 0.000869 Epoch: 3 Global Step: 72090 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:55:17,746-Speed 2497.56 samples/sec Loss 8.8405 LearningRate 0.000869 Epoch: 3 Global Step: 72100 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:55:25,949-Speed 2497.13 samples/sec Loss 8.7738 LearningRate 0.000869 Epoch: 3 Global Step: 72110 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:55:34,161-Speed 2494.51 samples/sec Loss 8.8019 LearningRate 0.000869 Epoch: 3 Global Step: 72120 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:55:42,315-Speed 2512.05 samples/sec Loss 8.8067 LearningRate 0.000869 Epoch: 3 Global Step: 72130 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:55:50,518-Speed 2496.81 samples/sec Loss 8.7935 LearningRate 0.000870 Epoch: 3 Global Step: 72140 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:55:58,721-Speed 2497.13 samples/sec Loss 8.8226 LearningRate 0.000870 Epoch: 3 Global Step: 72150 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:56:06,928-Speed 2495.73 samples/sec Loss 8.7974 LearningRate 0.000870 Epoch: 3 Global Step: 72160 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:56:15,138-Speed 2494.94 samples/sec Loss 8.8776 LearningRate 0.000870 Epoch: 3 Global Step: 72170 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:56:23,346-Speed 2496.00 samples/sec Loss 8.8014 LearningRate 0.000870 Epoch: 3 Global Step: 72180 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:56:31,498-Speed 2513.04 samples/sec Loss 8.9499 LearningRate 0.000870 Epoch: 3 Global Step: 72190 Fp16 Grad Scale: 
65536 Required: 173 hours Training: 2022-07-06 06:56:39,722-Speed 2490.58 samples/sec Loss 8.8334 LearningRate 0.000870 Epoch: 3 Global Step: 72200 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:56:47,925-Speed 2496.83 samples/sec Loss 8.9023 LearningRate 0.000870 Epoch: 3 Global Step: 72210 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:56:56,131-Speed 2496.29 samples/sec Loss 8.8087 LearningRate 0.000871 Epoch: 3 Global Step: 72220 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:57:04,335-Speed 2496.71 samples/sec Loss 8.8540 LearningRate 0.000871 Epoch: 3 Global Step: 72230 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:57:12,541-Speed 2496.11 samples/sec Loss 8.7722 LearningRate 0.000871 Epoch: 3 Global Step: 72240 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:57:20,691-Speed 2513.08 samples/sec Loss 8.8595 LearningRate 0.000871 Epoch: 3 Global Step: 72250 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:57:28,895-Speed 2496.97 samples/sec Loss 8.7941 LearningRate 0.000871 Epoch: 3 Global Step: 72260 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:57:37,100-Speed 2496.31 samples/sec Loss 8.8181 LearningRate 0.000871 Epoch: 3 Global Step: 72270 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:57:45,304-Speed 2496.90 samples/sec Loss 8.8323 LearningRate 0.000871 Epoch: 3 Global Step: 72280 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:57:53,506-Speed 2497.26 samples/sec Loss 8.8738 LearningRate 0.000871 Epoch: 3 Global Step: 72290 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:58:01,713-Speed 2495.82 samples/sec Loss 8.8021 LearningRate 0.000872 Epoch: 3 Global Step: 72300 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:58:09,863-Speed 2513.24 samples/sec Loss 8.8160 LearningRate 0.000872 Epoch: 3 Global Step: 72310 Fp16 Grad Scale: 65536 Required: 173 
hours Training: 2022-07-06 06:58:18,069-Speed 2496.24 samples/sec Loss 8.7542 LearningRate 0.000872 Epoch: 3 Global Step: 72320 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:58:26,276-Speed 2495.80 samples/sec Loss 8.8127 LearningRate 0.000872 Epoch: 3 Global Step: 72330 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:58:34,478-Speed 2497.48 samples/sec Loss 8.7351 LearningRate 0.000872 Epoch: 3 Global Step: 72340 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:58:42,688-Speed 2494.93 samples/sec Loss 8.7476 LearningRate 0.000872 Epoch: 3 Global Step: 72350 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:58:50,898-Speed 2494.58 samples/sec Loss 8.8064 LearningRate 0.000872 Epoch: 3 Global Step: 72360 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:58:59,050-Speed 2512.83 samples/sec Loss 8.6635 LearningRate 0.000872 Epoch: 3 Global Step: 72370 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:59:07,254-Speed 2496.73 samples/sec Loss 8.6069 LearningRate 0.000872 Epoch: 3 Global Step: 72380 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:59:15,463-Speed 2495.23 samples/sec Loss 8.7185 LearningRate 0.000873 Epoch: 3 Global Step: 72390 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:59:23,667-Speed 2497.00 samples/sec Loss 8.7595 LearningRate 0.000873 Epoch: 3 Global Step: 72400 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:59:31,874-Speed 2495.79 samples/sec Loss 8.7285 LearningRate 0.000873 Epoch: 3 Global Step: 72410 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:59:40,077-Speed 2497.17 samples/sec Loss 8.6594 LearningRate 0.000873 Epoch: 3 Global Step: 72420 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 06:59:48,229-Speed 2512.68 samples/sec Loss 8.7064 LearningRate 0.000873 Epoch: 3 Global Step: 72430 Fp16 Grad Scale: 65536 Required: 173 hours Training: 
2022-07-06 06:59:56,431-Speed 2497.18 samples/sec Loss 8.7823 LearningRate 0.000873 Epoch: 3 Global Step: 72440 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:00:04,663-Speed 2488.40 samples/sec Loss 8.7929 LearningRate 0.000873 Epoch: 3 Global Step: 72450 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:00:12,868-Speed 2496.39 samples/sec Loss 8.7727 LearningRate 0.000873 Epoch: 3 Global Step: 72460 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:00:21,077-Speed 2495.36 samples/sec Loss 8.8463 LearningRate 0.000874 Epoch: 3 Global Step: 72470 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:00:29,281-Speed 2496.47 samples/sec Loss 8.7826 LearningRate 0.000874 Epoch: 3 Global Step: 72480 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:00:37,435-Speed 2512.08 samples/sec Loss 8.7001 LearningRate 0.000874 Epoch: 3 Global Step: 72490 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:00:45,638-Speed 2497.43 samples/sec Loss 8.9723 LearningRate 0.000874 Epoch: 3 Global Step: 72500 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:00:53,843-Speed 2496.50 samples/sec Loss 8.7972 LearningRate 0.000874 Epoch: 3 Global Step: 72510 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:01:02,051-Speed 2495.52 samples/sec Loss 8.7520 LearningRate 0.000874 Epoch: 3 Global Step: 72520 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:01:10,253-Speed 2497.27 samples/sec Loss 8.7290 LearningRate 0.000874 Epoch: 3 Global Step: 72530 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:01:18,474-Speed 2491.61 samples/sec Loss 8.8436 LearningRate 0.000874 Epoch: 3 Global Step: 72540 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:01:26,624-Speed 2513.23 samples/sec Loss 8.7776 LearningRate 0.000875 Epoch: 3 Global Step: 72550 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 
07:01:34,827-Speed 2497.10 samples/sec Loss 8.8682 LearningRate 0.000875 Epoch: 3 Global Step: 72560 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:01:43,034-Speed 2495.71 samples/sec Loss 8.7279 LearningRate 0.000875 Epoch: 3 Global Step: 72570 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:01:51,237-Speed 2497.27 samples/sec Loss 8.8181 LearningRate 0.000875 Epoch: 3 Global Step: 72580 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:01:59,442-Speed 2496.61 samples/sec Loss 8.8502 LearningRate 0.000875 Epoch: 3 Global Step: 72590 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:02:07,648-Speed 2495.98 samples/sec Loss 8.8730 LearningRate 0.000875 Epoch: 3 Global Step: 72600 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:02:15,798-Speed 2513.58 samples/sec Loss 8.8693 LearningRate 0.000875 Epoch: 3 Global Step: 72610 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:02:24,000-Speed 2497.12 samples/sec Loss 8.8474 LearningRate 0.000875 Epoch: 3 Global Step: 72620 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:02:32,203-Speed 2497.19 samples/sec Loss 8.8537 LearningRate 0.000876 Epoch: 3 Global Step: 72630 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:02:40,423-Speed 2491.75 samples/sec Loss 8.7297 LearningRate 0.000876 Epoch: 3 Global Step: 72640 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:02:48,627-Speed 2496.87 samples/sec Loss 8.7857 LearningRate 0.000876 Epoch: 3 Global Step: 72650 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:02:56,830-Speed 2496.76 samples/sec Loss 8.6908 LearningRate 0.000876 Epoch: 3 Global Step: 72660 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:03:04,980-Speed 2513.62 samples/sec Loss 8.7821 LearningRate 0.000876 Epoch: 3 Global Step: 72670 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:03:13,187-Speed 
2495.93 samples/sec Loss 8.8799 LearningRate 0.000876 Epoch: 3 Global Step: 72680 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:03:21,403-Speed 2493.00 samples/sec Loss 8.8879 LearningRate 0.000876 Epoch: 3 Global Step: 72690 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:03:29,612-Speed 2495.14 samples/sec Loss 8.8533 LearningRate 0.000876 Epoch: 3 Global Step: 72700 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:03:37,817-Speed 2497.36 samples/sec Loss 8.7181 LearningRate 0.000876 Epoch: 3 Global Step: 72710 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:03:46,024-Speed 2495.57 samples/sec Loss 8.7103 LearningRate 0.000877 Epoch: 3 Global Step: 72720 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:03:54,177-Speed 2512.46 samples/sec Loss 8.8627 LearningRate 0.000877 Epoch: 3 Global Step: 72730 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:04:02,382-Speed 2496.61 samples/sec Loss 8.7813 LearningRate 0.000877 Epoch: 3 Global Step: 72740 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:04:10,587-Speed 2496.62 samples/sec Loss 8.5961 LearningRate 0.000877 Epoch: 3 Global Step: 72750 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:04:18,789-Speed 2497.46 samples/sec Loss 8.7545 LearningRate 0.000877 Epoch: 3 Global Step: 72760 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:04:26,995-Speed 2496.04 samples/sec Loss 8.7524 LearningRate 0.000877 Epoch: 3 Global Step: 72770 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:04:35,203-Speed 2495.34 samples/sec Loss 8.6631 LearningRate 0.000877 Epoch: 3 Global Step: 72780 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:04:43,355-Speed 2512.87 samples/sec Loss 8.7756 LearningRate 0.000877 Epoch: 3 Global Step: 72790 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:04:51,565-Speed 2494.95 samples/sec 
Loss 8.7002 LearningRate 0.000878 Epoch: 3 Global Step: 72800 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:04:59,769-Speed 2496.59 samples/sec Loss 8.6372 LearningRate 0.000878 Epoch: 3 Global Step: 72810 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:05:07,986-Speed 2492.82 samples/sec Loss 8.9152 LearningRate 0.000878 Epoch: 3 Global Step: 72820 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:05:16,191-Speed 2496.63 samples/sec Loss 8.9030 LearningRate 0.000878 Epoch: 3 Global Step: 72830 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:05:24,398-Speed 2495.66 samples/sec Loss 8.7205 LearningRate 0.000878 Epoch: 3 Global Step: 72840 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:05:32,555-Speed 2511.38 samples/sec Loss 8.8666 LearningRate 0.000878 Epoch: 3 Global Step: 72850 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:05:40,763-Speed 2495.23 samples/sec Loss 8.7820 LearningRate 0.000878 Epoch: 3 Global Step: 72860 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:05:48,976-Speed 2494.40 samples/sec Loss 8.7688 LearningRate 0.000878 Epoch: 3 Global Step: 72870 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:05:57,180-Speed 2496.62 samples/sec Loss 8.8407 LearningRate 0.000879 Epoch: 3 Global Step: 72880 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:06:05,384-Speed 2496.78 samples/sec Loss 8.6423 LearningRate 0.000879 Epoch: 3 Global Step: 72890 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:06:13,590-Speed 2496.00 samples/sec Loss 8.6187 LearningRate 0.000879 Epoch: 3 Global Step: 72900 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:06:21,744-Speed 2512.09 samples/sec Loss 8.6818 LearningRate 0.000879 Epoch: 3 Global Step: 72910 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:06:29,947-Speed 2497.58 samples/sec Loss 
8.6369 LearningRate 0.000879 Epoch: 3 Global Step: 72920 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:06:38,152-Speed 2496.49 samples/sec Loss 8.6550 LearningRate 0.000879 Epoch: 3 Global Step: 72930 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:06:46,380-Speed 2489.26 samples/sec Loss 8.7062 LearningRate 0.000879 Epoch: 3 Global Step: 72940 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:06:54,584-Speed 2496.66 samples/sec Loss 8.8397 LearningRate 0.000879 Epoch: 3 Global Step: 72950 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:07:02,793-Speed 2495.48 samples/sec Loss 8.7460 LearningRate 0.000879 Epoch: 3 Global Step: 72960 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:07:10,945-Speed 2512.59 samples/sec Loss 8.7632 LearningRate 0.000880 Epoch: 3 Global Step: 72970 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:07:19,156-Speed 2494.68 samples/sec Loss 8.6915 LearningRate 0.000880 Epoch: 3 Global Step: 72980 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:07:27,362-Speed 2496.43 samples/sec Loss 8.7588 LearningRate 0.000880 Epoch: 3 Global Step: 72990 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:07:35,566-Speed 2496.76 samples/sec Loss 8.7804 LearningRate 0.000880 Epoch: 3 Global Step: 73000 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:07:43,770-Speed 2496.78 samples/sec Loss 8.7152 LearningRate 0.000880 Epoch: 3 Global Step: 73010 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:07:51,972-Speed 2497.33 samples/sec Loss 8.8107 LearningRate 0.000880 Epoch: 3 Global Step: 73020 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:08:00,124-Speed 2513.00 samples/sec Loss 8.8143 LearningRate 0.000880 Epoch: 3 Global Step: 73030 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:08:08,331-Speed 2495.79 samples/sec Loss 8.8367 
LearningRate 0.000880 Epoch: 3 Global Step: 73040 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:08:16,536-Speed 2496.31 samples/sec Loss 8.8333 LearningRate 0.000881 Epoch: 3 Global Step: 73050 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:08:24,745-Speed 2495.27 samples/sec Loss 8.7777 LearningRate 0.000881 Epoch: 3 Global Step: 73060 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:08:32,952-Speed 2495.77 samples/sec Loss 8.7917 LearningRate 0.000881 Epoch: 3 Global Step: 73070 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:08:41,157-Speed 2496.56 samples/sec Loss 8.7248 LearningRate 0.000881 Epoch: 3 Global Step: 73080 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:08:49,308-Speed 2513.02 samples/sec Loss 8.7141 LearningRate 0.000881 Epoch: 3 Global Step: 73090 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:08:57,511-Speed 2496.92 samples/sec Loss 8.7404 LearningRate 0.000881 Epoch: 3 Global Step: 73100 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:09:05,717-Speed 2496.16 samples/sec Loss 8.7468 LearningRate 0.000881 Epoch: 3 Global Step: 73110 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:09:13,923-Speed 2495.99 samples/sec Loss 8.6627 LearningRate 0.000881 Epoch: 3 Global Step: 73120 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:09:22,132-Speed 2495.27 samples/sec Loss 8.5484 LearningRate 0.000882 Epoch: 3 Global Step: 73130 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:09:30,338-Speed 2496.23 samples/sec Loss 8.5920 LearningRate 0.000882 Epoch: 3 Global Step: 73140 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:09:38,493-Speed 2511.70 samples/sec Loss 8.6824 LearningRate 0.000882 Epoch: 3 Global Step: 73150 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:09:46,695-Speed 2497.55 samples/sec Loss 8.6776 
LearningRate 0.000882 Epoch: 3 Global Step: 73160 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:09:54,911-Speed 2493.04 samples/sec Loss 8.7251 LearningRate 0.000882 Epoch: 3 Global Step: 73170 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:10:03,117-Speed 2496.08 samples/sec Loss 8.6990 LearningRate 0.000882 Epoch: 3 Global Step: 73180 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:10:11,320-Speed 2497.17 samples/sec Loss 8.6543 LearningRate 0.000882 Epoch: 3 Global Step: 73190 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:10:19,533-Speed 2493.93 samples/sec Loss 8.4564 LearningRate 0.000882 Epoch: 3 Global Step: 73200 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:10:27,694-Speed 2509.82 samples/sec Loss 8.7871 LearningRate 0.000883 Epoch: 3 Global Step: 73210 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:10:35,898-Speed 2497.00 samples/sec Loss 8.5549 LearningRate 0.000883 Epoch: 3 Global Step: 73220 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:10:44,105-Speed 2495.83 samples/sec Loss 8.6567 LearningRate 0.000883 Epoch: 3 Global Step: 73230 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:10:52,311-Speed 2496.17 samples/sec Loss 8.6448 LearningRate 0.000883 Epoch: 3 Global Step: 73240 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:11:00,517-Speed 2496.07 samples/sec Loss 8.6374 LearningRate 0.000883 Epoch: 3 Global Step: 73250 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:11:08,734-Speed 2492.58 samples/sec Loss 8.7671 LearningRate 0.000883 Epoch: 3 Global Step: 73260 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:11:16,882-Speed 2514.15 samples/sec Loss 8.5621 LearningRate 0.000883 Epoch: 3 Global Step: 73270 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:11:25,096-Speed 2493.53 samples/sec Loss 8.6520 
LearningRate 0.000883 Epoch: 3 Global Step: 73280 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:11:33,300-Speed 2496.55 samples/sec Loss 8.6236 LearningRate 0.000883 Epoch: 3 Global Step: 73290 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:11:41,508-Speed 2495.74 samples/sec Loss 8.6084 LearningRate 0.000884 Epoch: 3 Global Step: 73300 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:11:49,723-Speed 2493.36 samples/sec Loss 8.5885 LearningRate 0.000884 Epoch: 3 Global Step: 73310 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:11:57,928-Speed 2496.44 samples/sec Loss 8.6155 LearningRate 0.000884 Epoch: 3 Global Step: 73320 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:12:06,078-Speed 2513.39 samples/sec Loss 8.7954 LearningRate 0.000884 Epoch: 3 Global Step: 73330 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:12:14,283-Speed 2496.60 samples/sec Loss 8.6641 LearningRate 0.000884 Epoch: 3 Global Step: 73340 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:12:22,500-Speed 2492.74 samples/sec Loss 8.6520 LearningRate 0.000884 Epoch: 3 Global Step: 73350 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:12:30,706-Speed 2495.92 samples/sec Loss 8.7239 LearningRate 0.000884 Epoch: 3 Global Step: 73360 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:12:38,908-Speed 2497.56 samples/sec Loss 8.6361 LearningRate 0.000884 Epoch: 3 Global Step: 73370 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:12:47,112-Speed 2496.59 samples/sec Loss 8.6657 LearningRate 0.000885 Epoch: 3 Global Step: 73380 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:12:55,263-Speed 2513.13 samples/sec Loss 8.6661 LearningRate 0.000885 Epoch: 3 Global Step: 73390 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:13:03,472-Speed 2495.06 samples/sec Loss 8.6914 
LearningRate 0.000885 Epoch: 3 Global Step: 73400 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:13:11,680-Speed 2495.43 samples/sec Loss 8.7389 LearningRate 0.000885 Epoch: 3 Global Step: 73410 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:13:19,891-Speed 2494.81 samples/sec Loss 8.7284 LearningRate 0.000885 Epoch: 3 Global Step: 73420 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:13:28,099-Speed 2495.51 samples/sec Loss 8.6949 LearningRate 0.000885 Epoch: 3 Global Step: 73430 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:13:36,302-Speed 2496.94 samples/sec Loss 8.5777 LearningRate 0.000885 Epoch: 3 Global Step: 73440 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:13:44,450-Speed 2513.95 samples/sec Loss 8.6588 LearningRate 0.000885 Epoch: 3 Global Step: 73450 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:13:52,650-Speed 2497.78 samples/sec Loss 8.5272 LearningRate 0.000886 Epoch: 3 Global Step: 73460 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:14:00,854-Speed 2496.89 samples/sec Loss 8.5992 LearningRate 0.000886 Epoch: 3 Global Step: 73470 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:14:09,060-Speed 2496.18 samples/sec Loss 8.6680 LearningRate 0.000886 Epoch: 3 Global Step: 73480 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:14:17,264-Speed 2496.48 samples/sec Loss 8.7148 LearningRate 0.000886 Epoch: 3 Global Step: 73490 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:14:25,479-Speed 2493.56 samples/sec Loss 8.6374 LearningRate 0.000886 Epoch: 3 Global Step: 73500 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:14:33,630-Speed 2512.70 samples/sec Loss 8.8758 LearningRate 0.000886 Epoch: 3 Global Step: 73510 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:14:41,832-Speed 2497.61 samples/sec Loss 8.6869 
LearningRate 0.000886 Epoch: 3 Global Step: 73520 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:14:50,032-Speed 2497.70 samples/sec Loss 8.6965 LearningRate 0.000886 Epoch: 3 Global Step: 73530 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:14:58,235-Speed 2497.21 samples/sec Loss 8.6822 LearningRate 0.000886 Epoch: 3 Global Step: 73540 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:15:06,395-Speed 2509.98 samples/sec Loss 8.7479 LearningRate 0.000887 Epoch: 3 Global Step: 73550 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:15:14,597-Speed 2497.47 samples/sec Loss 8.7125 LearningRate 0.000887 Epoch: 3 Global Step: 73560 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:15:22,744-Speed 2514.28 samples/sec Loss 8.6251 LearningRate 0.000887 Epoch: 3 Global Step: 73570 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:15:30,944-Speed 2497.69 samples/sec Loss 8.7235 LearningRate 0.000887 Epoch: 3 Global Step: 73580 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:15:39,158-Speed 2493.76 samples/sec Loss 8.6891 LearningRate 0.000887 Epoch: 3 Global Step: 73590 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:15:47,361-Speed 2497.30 samples/sec Loss 8.6596 LearningRate 0.000887 Epoch: 3 Global Step: 73600 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:15:55,571-Speed 2494.83 samples/sec Loss 8.6361 LearningRate 0.000887 Epoch: 3 Global Step: 73610 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:16:03,771-Speed 2497.51 samples/sec Loss 8.6507 LearningRate 0.000887 Epoch: 3 Global Step: 73620 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:16:11,916-Speed 2515.04 samples/sec Loss 8.8087 LearningRate 0.000888 Epoch: 3 Global Step: 73630 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:16:20,126-Speed 2494.65 samples/sec Loss 8.6960 LearningRate 
0.000888 Epoch: 3 Global Step: 73640 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:16:28,325-Speed 2498.25 samples/sec Loss 8.6653 LearningRate 0.000888 Epoch: 3 Global Step: 73650 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:16:36,527-Speed 2497.39 samples/sec Loss 8.7740 LearningRate 0.000888 Epoch: 3 Global Step: 73660 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:16:44,739-Speed 2494.03 samples/sec Loss 8.8648 LearningRate 0.000888 Epoch: 3 Global Step: 73670 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:16:52,942-Speed 2497.05 samples/sec Loss 8.6879 LearningRate 0.000888 Epoch: 3 Global Step: 73680 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:17:01,088-Speed 2514.44 samples/sec Loss 8.7230 LearningRate 0.000888 Epoch: 3 Global Step: 73690 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:17:09,289-Speed 2497.39 samples/sec Loss 8.7221 LearningRate 0.000888 Epoch: 3 Global Step: 73700 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:17:17,491-Speed 2497.36 samples/sec Loss 8.6931 LearningRate 0.000889 Epoch: 3 Global Step: 73710 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:17:25,692-Speed 2497.43 samples/sec Loss 8.6082 LearningRate 0.000889 Epoch: 3 Global Step: 73720 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:17:33,907-Speed 2493.38 samples/sec Loss 8.5468 LearningRate 0.000889 Epoch: 3 Global Step: 73730 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:17:42,118-Speed 2494.50 samples/sec Loss 8.6297 LearningRate 0.000889 Epoch: 3 Global Step: 73740 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:17:50,265-Speed 2514.14 samples/sec Loss 8.5877 LearningRate 0.000889 Epoch: 3 Global Step: 73750 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:17:58,475-Speed 2495.32 samples/sec Loss 8.6577 LearningRate 0.000889 Epoch: 3 
Global Step: 73760 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:18:06,676-Speed 2497.61 samples/sec Loss 8.6696 LearningRate 0.000889 Epoch: 3 Global Step: 73770 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:18:14,876-Speed 2497.69 samples/sec Loss 8.6366 LearningRate 0.000889 Epoch: 3 Global Step: 73780 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:18:23,078-Speed 2497.58 samples/sec Loss 8.5539 LearningRate 0.000889 Epoch: 3 Global Step: 73790 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:18:31,279-Speed 2497.79 samples/sec Loss 8.7514 LearningRate 0.000890 Epoch: 3 Global Step: 73800 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:18:39,425-Speed 2514.25 samples/sec Loss 8.7729 LearningRate 0.000890 Epoch: 3 Global Step: 73810 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:18:47,628-Speed 2497.39 samples/sec Loss 8.7506 LearningRate 0.000890 Epoch: 3 Global Step: 73820 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:18:55,827-Speed 2498.59 samples/sec Loss 8.8001 LearningRate 0.000890 Epoch: 3 Global Step: 73830 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:19:04,031-Speed 2496.95 samples/sec Loss 8.7721 LearningRate 0.000890 Epoch: 3 Global Step: 73840 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:19:12,233-Speed 2497.09 samples/sec Loss 8.6857 LearningRate 0.000890 Epoch: 3 Global Step: 73850 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:19:20,438-Speed 2496.44 samples/sec Loss 8.7235 LearningRate 0.000890 Epoch: 3 Global Step: 73860 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:19:28,585-Speed 2515.06 samples/sec Loss 8.6870 LearningRate 0.000890 Epoch: 3 Global Step: 73870 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:19:36,784-Speed 2498.30 samples/sec Loss 8.6242 LearningRate 0.000891 Epoch: 3 Global Step: 73880 
Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:19:44,984-Speed 2497.95 samples/sec Loss 8.6804 LearningRate 0.000891 Epoch: 3 Global Step: 73890 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:19:53,181-Speed 2498.79 samples/sec Loss 8.5876 LearningRate 0.000891 Epoch: 3 Global Step: 73900 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:20:01,390-Speed 2495.32 samples/sec Loss 8.5441 LearningRate 0.000891 Epoch: 3 Global Step: 73910 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:20:09,589-Speed 2498.12 samples/sec Loss 8.6207 LearningRate 0.000891 Epoch: 3 Global Step: 73920 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:20:17,735-Speed 2514.64 samples/sec Loss 8.6067 LearningRate 0.000891 Epoch: 3 Global Step: 73930 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:20:25,932-Speed 2499.03 samples/sec Loss 8.6878 LearningRate 0.000891 Epoch: 3 Global Step: 73940 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:20:34,135-Speed 2497.23 samples/sec Loss 8.6634 LearningRate 0.000891 Epoch: 3 Global Step: 73950 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:20:42,336-Speed 2497.55 samples/sec Loss 8.6025 LearningRate 0.000892 Epoch: 3 Global Step: 73960 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:20:50,536-Speed 2498.04 samples/sec Loss 8.5701 LearningRate 0.000892 Epoch: 3 Global Step: 73970 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:20:58,735-Speed 2498.14 samples/sec Loss 8.5306 LearningRate 0.000892 Epoch: 3 Global Step: 73980 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:21:06,882-Speed 2514.45 samples/sec Loss 8.5212 LearningRate 0.000892 Epoch: 3 Global Step: 73990 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:21:15,081-Speed 2497.86 samples/sec Loss 8.6042 LearningRate 0.000892 Epoch: 3 Global Step: 74000 Fp16 Grad Scale: 
65536 Required: 173 hours Training: 2022-07-06 07:21:23,283-Speed 2497.45 samples/sec Loss 8.6572 LearningRate 0.000892 Epoch: 3 Global Step: 74010 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:21:31,489-Speed 2495.99 samples/sec Loss 8.6412 LearningRate 0.000892 Epoch: 3 Global Step: 74020 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:21:39,691-Speed 2497.42 samples/sec Loss 8.6627 LearningRate 0.000892 Epoch: 3 Global Step: 74030 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:21:47,888-Speed 2498.71 samples/sec Loss 8.6078 LearningRate 0.000893 Epoch: 3 Global Step: 74040 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:21:56,040-Speed 2512.86 samples/sec Loss 8.7472 LearningRate 0.000893 Epoch: 3 Global Step: 74050 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:22:04,244-Speed 2496.92 samples/sec Loss 8.5977 LearningRate 0.000893 Epoch: 3 Global Step: 74060 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:22:12,479-Speed 2487.29 samples/sec Loss 8.6334 LearningRate 0.000893 Epoch: 3 Global Step: 74070 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:22:20,679-Speed 2497.82 samples/sec Loss 8.6429 LearningRate 0.000893 Epoch: 3 Global Step: 74080 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:22:28,879-Speed 2498.21 samples/sec Loss 8.6538 LearningRate 0.000893 Epoch: 3 Global Step: 74090 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:22:37,077-Speed 2498.55 samples/sec Loss 8.7255 LearningRate 0.000893 Epoch: 3 Global Step: 74100 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:22:45,224-Speed 2514.39 samples/sec Loss 8.6362 LearningRate 0.000893 Epoch: 3 Global Step: 74110 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:22:53,421-Speed 2498.73 samples/sec Loss 8.6088 LearningRate 0.000893 Epoch: 3 Global Step: 74120 Fp16 Grad Scale: 65536 Required: 173 
hours Training: 2022-07-06 07:23:01,620-Speed 2498.39 samples/sec Loss 8.5316 LearningRate 0.000894 Epoch: 3 Global Step: 74130 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:23:09,834-Speed 2493.60 samples/sec Loss 8.5623 LearningRate 0.000894 Epoch: 3 Global Step: 74140 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:23:18,034-Speed 2498.27 samples/sec Loss 8.4928 LearningRate 0.000894 Epoch: 3 Global Step: 74150 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:23:26,235-Speed 2497.64 samples/sec Loss 8.6462 LearningRate 0.000894 Epoch: 3 Global Step: 74160 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:23:34,384-Speed 2513.45 samples/sec Loss 8.5192 LearningRate 0.000894 Epoch: 3 Global Step: 74170 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:23:42,593-Speed 2495.34 samples/sec Loss 8.5401 LearningRate 0.000894 Epoch: 3 Global Step: 74180 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:23:50,805-Speed 2494.27 samples/sec Loss 8.5575 LearningRate 0.000894 Epoch: 3 Global Step: 74190 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:23:59,008-Speed 2496.87 samples/sec Loss 8.5230 LearningRate 0.000894 Epoch: 3 Global Step: 74200 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:24:07,205-Speed 2499.00 samples/sec Loss 8.5744 LearningRate 0.000895 Epoch: 3 Global Step: 74210 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:24:15,403-Speed 2498.31 samples/sec Loss 8.5916 LearningRate 0.000895 Epoch: 3 Global Step: 74220 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:24:23,549-Speed 2514.67 samples/sec Loss 8.9054 LearningRate 0.000895 Epoch: 3 Global Step: 74230 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:24:31,748-Speed 2498.38 samples/sec Loss 8.5736 LearningRate 0.000895 Epoch: 3 Global Step: 74240 Fp16 Grad Scale: 65536 Required: 173 hours Training: 
2022-07-06 07:24:39,948-Speed 2498.13 samples/sec Loss 8.7803 LearningRate 0.000895 Epoch: 3 Global Step: 74250 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:24:48,146-Speed 2498.70 samples/sec Loss 8.7874 LearningRate 0.000895 Epoch: 3 Global Step: 74260 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:24:56,345-Speed 2497.88 samples/sec Loss 8.7247 LearningRate 0.000895 Epoch: 3 Global Step: 74270 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:25:04,546-Speed 2497.61 samples/sec Loss 8.6240 LearningRate 0.000895 Epoch: 3 Global Step: 74280 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:25:12,700-Speed 2512.21 samples/sec Loss 8.6669 LearningRate 0.000896 Epoch: 3 Global Step: 74290 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:25:20,899-Speed 2498.49 samples/sec Loss 8.6795 LearningRate 0.000896 Epoch: 3 Global Step: 74300 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:25:29,097-Speed 2498.50 samples/sec Loss 8.6457 LearningRate 0.000896 Epoch: 3 Global Step: 74310 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:25:37,295-Speed 2498.17 samples/sec Loss 8.7127 LearningRate 0.000896 Epoch: 3 Global Step: 74320 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:25:45,499-Speed 2497.00 samples/sec Loss 8.5371 LearningRate 0.000896 Epoch: 3 Global Step: 74330 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:25:53,697-Speed 2498.61 samples/sec Loss 8.5932 LearningRate 0.000896 Epoch: 3 Global Step: 74340 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:26:01,841-Speed 2514.94 samples/sec Loss 8.5112 LearningRate 0.000896 Epoch: 3 Global Step: 74350 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:26:10,042-Speed 2497.84 samples/sec Loss 8.5822 LearningRate 0.000896 Epoch: 3 Global Step: 74360 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 
07:26:18,240-Speed 2498.71 samples/sec Loss 8.5786 LearningRate 0.000896 Epoch: 3 Global Step: 74370 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:26:26,440-Speed 2497.72 samples/sec Loss 8.5190 LearningRate 0.000897 Epoch: 3 Global Step: 74380 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:26:34,640-Speed 2498.13 samples/sec Loss 8.5928 LearningRate 0.000897 Epoch: 3 Global Step: 74390 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:26:42,840-Speed 2498.09 samples/sec Loss 8.5722 LearningRate 0.000897 Epoch: 3 Global Step: 74400 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:26:50,989-Speed 2513.53 samples/sec Loss 8.5956 LearningRate 0.000897 Epoch: 3 Global Step: 74410 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:26:59,191-Speed 2497.37 samples/sec Loss 8.6128 LearningRate 0.000897 Epoch: 3 Global Step: 74420 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:27:07,389-Speed 2498.54 samples/sec Loss 8.5941 LearningRate 0.000897 Epoch: 3 Global Step: 74430 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:27:15,590-Speed 2497.95 samples/sec Loss 8.6497 LearningRate 0.000897 Epoch: 3 Global Step: 74440 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:27:23,791-Speed 2498.33 samples/sec Loss 8.5480 LearningRate 0.000897 Epoch: 3 Global Step: 74450 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:27:31,991-Speed 2497.96 samples/sec Loss 8.6637 LearningRate 0.000898 Epoch: 3 Global Step: 74460 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:27:40,138-Speed 2514.17 samples/sec Loss 8.4604 LearningRate 0.000898 Epoch: 3 Global Step: 74470 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:27:48,338-Speed 2498.15 samples/sec Loss 8.5614 LearningRate 0.000898 Epoch: 3 Global Step: 74480 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:27:56,540-Speed 
2497.23 samples/sec Loss 8.6919 LearningRate 0.000898 Epoch: 3 Global Step: 74490 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:28:04,741-Speed 2497.86 samples/sec Loss 8.6054 LearningRate 0.000898 Epoch: 3 Global Step: 74500 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:28:12,968-Speed 2489.76 samples/sec Loss 8.6164 LearningRate 0.000898 Epoch: 3 Global Step: 74510 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:28:21,169-Speed 2497.43 samples/sec Loss 8.6360 LearningRate 0.000898 Epoch: 3 Global Step: 74520 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:28:29,315-Speed 2514.51 samples/sec Loss 8.5570 LearningRate 0.000898 Epoch: 3 Global Step: 74530 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:28:37,515-Speed 2498.08 samples/sec Loss 8.5409 LearningRate 0.000899 Epoch: 3 Global Step: 74540 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:28:45,720-Speed 2496.31 samples/sec Loss 8.5115 LearningRate 0.000899 Epoch: 3 Global Step: 74550 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:28:53,919-Speed 2498.23 samples/sec Loss 8.5607 LearningRate 0.000899 Epoch: 3 Global Step: 74560 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:29:02,116-Speed 2498.81 samples/sec Loss 8.4696 LearningRate 0.000899 Epoch: 3 Global Step: 74570 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:29:10,315-Speed 2498.28 samples/sec Loss 8.5095 LearningRate 0.000899 Epoch: 3 Global Step: 74580 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:29:18,462-Speed 2514.41 samples/sec Loss 8.4343 LearningRate 0.000899 Epoch: 3 Global Step: 74590 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:29:26,660-Speed 2498.37 samples/sec Loss 8.5070 LearningRate 0.000899 Epoch: 3 Global Step: 74600 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:29:34,864-Speed 2496.65 samples/sec 
Loss 8.5186 LearningRate 0.000899 Epoch: 3 Global Step: 74610 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:29:43,065-Speed 2497.77 samples/sec Loss 8.5254 LearningRate 0.000900 Epoch: 3 Global Step: 74620 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:29:51,267-Speed 2497.21 samples/sec Loss 8.5252 LearningRate 0.000900 Epoch: 3 Global Step: 74630 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:29:59,472-Speed 2496.67 samples/sec Loss 8.5702 LearningRate 0.000900 Epoch: 3 Global Step: 74640 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:30:07,652-Speed 2503.83 samples/sec Loss 8.5386 LearningRate 0.000900 Epoch: 3 Global Step: 74650 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:30:15,860-Speed 2496.04 samples/sec Loss 8.5988 LearningRate 0.000900 Epoch: 3 Global Step: 74660 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:30:24,066-Speed 2496.03 samples/sec Loss 8.5122 LearningRate 0.000900 Epoch: 3 Global Step: 74670 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:30:32,268-Speed 2497.56 samples/sec Loss 8.5370 LearningRate 0.000900 Epoch: 3 Global Step: 74680 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:30:40,471-Speed 2497.02 samples/sec Loss 8.4037 LearningRate 0.000900 Epoch: 3 Global Step: 74690 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:30:48,674-Speed 2497.05 samples/sec Loss 8.4988 LearningRate 0.000900 Epoch: 3 Global Step: 74700 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:30:56,822-Speed 2513.67 samples/sec Loss 8.6641 LearningRate 0.000901 Epoch: 3 Global Step: 74710 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:31:05,023-Speed 2497.76 samples/sec Loss 8.6082 LearningRate 0.000901 Epoch: 3 Global Step: 74720 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:31:13,226-Speed 2496.87 samples/sec Loss 8.4525 
LearningRate 0.000901 Epoch: 3 Global Step: 74730 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:31:21,437-Speed 2495.11 samples/sec Loss 8.5639 LearningRate 0.000901 Epoch: 3 Global Step: 74740 Fp16 Grad Scale: 65536 Required: 173 hours Training: 2022-07-06 07:31:29,637-Speed 2497.70 samples/sec Loss 8.4971 LearningRate 0.000901 Epoch: 3 Global Step: 74750 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:31:37,837-Speed 2497.94 samples/sec Loss 8.4779 LearningRate 0.000901 Epoch: 3 Global Step: 74760 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:31:45,982-Speed 2514.96 samples/sec Loss 8.4296 LearningRate 0.000901 Epoch: 3 Global Step: 74770 Fp16 Grad Scale: 131072 Required: 173 hours Training: 2022-07-06 07:31:54,188-Speed 2496.22 samples/sec Loss 8.4591 LearningRate 0.000901 Epoch: 3 Global Step: 74780 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:32:02,388-Speed 2498.20 samples/sec Loss 8.6389 LearningRate 0.000902 Epoch: 3 Global Step: 74790 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:32:10,590-Speed 2497.46 samples/sec Loss 8.5909 LearningRate 0.000902 Epoch: 3 Global Step: 74800 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:32:18,791-Speed 2497.72 samples/sec Loss 8.5589 LearningRate 0.000902 Epoch: 3 Global Step: 74810 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:32:26,995-Speed 2496.62 samples/sec Loss 8.6904 LearningRate 0.000902 Epoch: 3 Global Step: 74820 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:32:35,145-Speed 2513.31 samples/sec Loss 8.6855 LearningRate 0.000902 Epoch: 3 Global Step: 74830 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:32:43,351-Speed 2496.26 samples/sec Loss 8.6162 LearningRate 0.000902 Epoch: 3 Global Step: 74840 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:32:51,564-Speed 2493.89 samples/sec Loss 8.5305 
LearningRate 0.000902 Epoch: 3 Global Step: 74850 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:32:59,770-Speed 2496.35 samples/sec Loss 8.6815 LearningRate 0.000902 Epoch: 3 Global Step: 74860 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:33:07,975-Speed 2496.57 samples/sec Loss 8.6767 LearningRate 0.000903 Epoch: 3 Global Step: 74870 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:33:16,177-Speed 2497.12 samples/sec Loss 8.6573 LearningRate 0.000903 Epoch: 3 Global Step: 74880 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:33:24,328-Speed 2512.85 samples/sec Loss 8.6457 LearningRate 0.000903 Epoch: 3 Global Step: 74890 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:33:32,537-Speed 2495.32 samples/sec Loss 8.4782 LearningRate 0.000903 Epoch: 3 Global Step: 74900 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:33:40,750-Speed 2494.07 samples/sec Loss 8.5889 LearningRate 0.000903 Epoch: 3 Global Step: 74910 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:33:48,957-Speed 2495.85 samples/sec Loss 8.7010 LearningRate 0.000903 Epoch: 3 Global Step: 74920 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:33:57,160-Speed 2496.98 samples/sec Loss 8.5751 LearningRate 0.000903 Epoch: 3 Global Step: 74930 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:34:05,363-Speed 2497.22 samples/sec Loss 8.6643 LearningRate 0.000903 Epoch: 3 Global Step: 74940 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:34:13,519-Speed 2511.38 samples/sec Loss 8.5516 LearningRate 0.000903 Epoch: 3 Global Step: 74950 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:34:21,723-Speed 2496.63 samples/sec Loss 8.5508 LearningRate 0.000904 Epoch: 3 Global Step: 74960 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:34:29,932-Speed 2495.25 samples/sec Loss 8.5575 
LearningRate 0.000904 Epoch: 3 Global Step: 74970 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:34:38,136-Speed 2496.80 samples/sec Loss 8.5647 LearningRate 0.000904 Epoch: 3 Global Step: 74980 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:34:46,339-Speed 2496.86 samples/sec Loss 8.5064 LearningRate 0.000904 Epoch: 3 Global Step: 74990 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:34:54,542-Speed 2497.25 samples/sec Loss 8.5457 LearningRate 0.000904 Epoch: 3 Global Step: 75000 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:35:02,691-Speed 2513.41 samples/sec Loss 8.4775 LearningRate 0.000904 Epoch: 3 Global Step: 75010 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:35:10,896-Speed 2496.59 samples/sec Loss 8.4853 LearningRate 0.000904 Epoch: 3 Global Step: 75020 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:35:19,102-Speed 2496.12 samples/sec Loss 8.4909 LearningRate 0.000904 Epoch: 3 Global Step: 75030 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:35:27,305-Speed 2497.05 samples/sec Loss 8.4589 LearningRate 0.000905 Epoch: 3 Global Step: 75040 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:35:35,512-Speed 2496.10 samples/sec Loss 8.5335 LearningRate 0.000905 Epoch: 3 Global Step: 75050 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:35:43,715-Speed 2496.96 samples/sec Loss 8.7304 LearningRate 0.000905 Epoch: 3 Global Step: 75060 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:35:51,876-Speed 2510.06 samples/sec Loss 8.5722 LearningRate 0.000905 Epoch: 3 Global Step: 75070 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:36:00,083-Speed 2495.92 samples/sec Loss 8.6439 LearningRate 0.000905 Epoch: 3 Global Step: 75080 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:36:08,283-Speed 2497.87 samples/sec Loss 8.5362 
LearningRate 0.000905 Epoch: 3 Global Step: 75090 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:36:16,486-Speed 2497.06 samples/sec Loss 8.5460 LearningRate 0.000905 Epoch: 3 Global Step: 75100 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:36:24,690-Speed 2496.60 samples/sec Loss 8.5911 LearningRate 0.000905 Epoch: 3 Global Step: 75110 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:36:32,890-Speed 2497.94 samples/sec Loss 8.5561 LearningRate 0.000906 Epoch: 3 Global Step: 75120 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:36:41,037-Speed 2514.37 samples/sec Loss 8.5107 LearningRate 0.000906 Epoch: 3 Global Step: 75130 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:36:49,237-Speed 2497.96 samples/sec Loss 8.4477 LearningRate 0.000906 Epoch: 3 Global Step: 75140 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:36:57,437-Speed 2498.09 samples/sec Loss 8.4966 LearningRate 0.000906 Epoch: 3 Global Step: 75150 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:37:05,650-Speed 2494.05 samples/sec Loss 8.6862 LearningRate 0.000906 Epoch: 3 Global Step: 75160 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:37:13,851-Speed 2497.74 samples/sec Loss 8.5065 LearningRate 0.000906 Epoch: 3 Global Step: 75170 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:37:22,057-Speed 2495.89 samples/sec Loss 8.5642 LearningRate 0.000906 Epoch: 3 Global Step: 75180 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:37:30,201-Speed 2515.33 samples/sec Loss 8.4271 LearningRate 0.000906 Epoch: 3 Global Step: 75190 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:37:38,401-Speed 2497.87 samples/sec Loss 8.5442 LearningRate 0.000906 Epoch: 3 Global Step: 75200 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:37:46,603-Speed 2497.39 samples/sec Loss 8.5120 
LearningRate 0.000907 Epoch: 3 Global Step: 75210 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:37:54,805-Speed 2497.25 samples/sec Loss 8.4461 LearningRate 0.000907 Epoch: 3 Global Step: 75220 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:38:03,007-Speed 2497.43 samples/sec Loss 8.5345 LearningRate 0.000907 Epoch: 3 Global Step: 75230 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:38:11,214-Speed 2495.83 samples/sec Loss 8.4504 LearningRate 0.000907 Epoch: 3 Global Step: 75240 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:38:19,356-Speed 2515.73 samples/sec Loss 8.3876 LearningRate 0.000907 Epoch: 3 Global Step: 75250 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:38:27,553-Speed 2498.85 samples/sec Loss 8.4481 LearningRate 0.000907 Epoch: 3 Global Step: 75260 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:38:35,753-Speed 2498.16 samples/sec Loss 8.3011 LearningRate 0.000907 Epoch: 3 Global Step: 75270 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:38:43,951-Speed 2498.64 samples/sec Loss 8.3917 LearningRate 0.000907 Epoch: 3 Global Step: 75280 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:38:52,162-Speed 2494.56 samples/sec Loss 8.3802 LearningRate 0.000908 Epoch: 3 Global Step: 75290 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:39:00,365-Speed 2497.18 samples/sec Loss 8.3826 LearningRate 0.000908 Epoch: 3 Global Step: 75300 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:39:08,513-Speed 2514.03 samples/sec Loss 8.4381 LearningRate 0.000908 Epoch: 3 Global Step: 75310 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:39:16,728-Speed 2493.18 samples/sec Loss 8.4928 LearningRate 0.000908 Epoch: 3 Global Step: 75320 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:39:24,929-Speed 2497.82 samples/sec Loss 8.6686 
LearningRate 0.000908 Epoch: 3 Global Step: 75330 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:39:33,136-Speed 2495.83 samples/sec Loss 8.5573 LearningRate 0.000908 Epoch: 3 Global Step: 75340 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:39:41,347-Speed 2494.63 samples/sec Loss 8.5202 LearningRate 0.000908 Epoch: 3 Global Step: 75350 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:39:49,546-Speed 2498.30 samples/sec Loss 8.4821 LearningRate 0.000908 Epoch: 3 Global Step: 75360 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:39:57,692-Speed 2514.61 samples/sec Loss 8.6655 LearningRate 0.000909 Epoch: 3 Global Step: 75370 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:40:05,846-Speed 2511.94 samples/sec Loss 8.4952 LearningRate 0.000909 Epoch: 3 Global Step: 75380 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:40:14,045-Speed 2498.46 samples/sec Loss 8.4821 LearningRate 0.000909 Epoch: 3 Global Step: 75390 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:40:22,248-Speed 2497.39 samples/sec Loss 8.4517 LearningRate 0.000909 Epoch: 3 Global Step: 75400 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:40:30,445-Speed 2498.72 samples/sec Loss 8.5533 LearningRate 0.000909 Epoch: 3 Global Step: 75410 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:40:38,645-Speed 2498.02 samples/sec Loss 8.4711 LearningRate 0.000909 Epoch: 3 Global Step: 75420 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:40:46,796-Speed 2513.00 samples/sec Loss 8.4580 LearningRate 0.000909 Epoch: 3 Global Step: 75430 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:40:55,000-Speed 2496.97 samples/sec Loss 8.3605 LearningRate 0.000909 Epoch: 3 Global Step: 75440 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:41:03,200-Speed 2498.02 samples/sec Loss 8.3731 LearningRate 
0.000910 Epoch: 3 Global Step: 75450 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:41:11,398-Speed 2498.51 samples/sec Loss 8.3236 LearningRate 0.000910 Epoch: 3 Global Step: 75460 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:41:19,595-Speed 2498.79 samples/sec Loss 8.3741 LearningRate 0.000910 Epoch: 3 Global Step: 75470 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:41:27,802-Speed 2496.10 samples/sec Loss 8.4748 LearningRate 0.000910 Epoch: 3 Global Step: 75480 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:41:35,947-Speed 2514.77 samples/sec Loss 8.5402 LearningRate 0.000910 Epoch: 3 Global Step: 75490 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:41:44,149-Speed 2497.46 samples/sec Loss 8.4713 LearningRate 0.000910 Epoch: 3 Global Step: 75500 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:41:52,352-Speed 2496.84 samples/sec Loss 8.6457 LearningRate 0.000910 Epoch: 3 Global Step: 75510 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:42:00,552-Speed 2498.24 samples/sec Loss 8.5322 LearningRate 0.000910 Epoch: 3 Global Step: 75520 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:42:08,758-Speed 2496.07 samples/sec Loss 8.4601 LearningRate 0.000910 Epoch: 3 Global Step: 75530 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:42:16,960-Speed 2497.35 samples/sec Loss 8.6642 LearningRate 0.000911 Epoch: 3 Global Step: 75540 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:42:25,106-Speed 2514.66 samples/sec Loss 8.5522 LearningRate 0.000911 Epoch: 3 Global Step: 75550 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:42:33,307-Speed 2497.70 samples/sec Loss 8.6664 LearningRate 0.000911 Epoch: 3 Global Step: 75560 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:42:41,507-Speed 2498.08 samples/sec Loss 8.6113 LearningRate 0.000911 Epoch: 3 
Global Step: 75570 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:42:49,704-Speed 2498.62 samples/sec Loss 8.5747 LearningRate 0.000911 Epoch: 3 Global Step: 75580 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:42:57,903-Speed 2498.45 samples/sec Loss 8.4878 LearningRate 0.000911 Epoch: 3 Global Step: 75590 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:43:06,104-Speed 2497.73 samples/sec Loss 8.5166 LearningRate 0.000911 Epoch: 3 Global Step: 75600 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:43:14,252-Speed 2513.94 samples/sec Loss 8.4883 LearningRate 0.000911 Epoch: 3 Global Step: 75610 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:43:22,449-Speed 2498.90 samples/sec Loss 8.5562 LearningRate 0.000912 Epoch: 3 Global Step: 75620 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:43:30,663-Speed 2493.78 samples/sec Loss 8.5251 LearningRate 0.000912 Epoch: 3 Global Step: 75630 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:43:38,865-Speed 2497.12 samples/sec Loss 8.4996 LearningRate 0.000912 Epoch: 3 Global Step: 75640 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:43:47,064-Speed 2498.39 samples/sec Loss 8.5870 LearningRate 0.000912 Epoch: 3 Global Step: 75650 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:43:55,272-Speed 2495.55 samples/sec Loss 8.5880 LearningRate 0.000912 Epoch: 3 Global Step: 75660 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:44:03,416-Speed 2515.08 samples/sec Loss 8.5562 LearningRate 0.000912 Epoch: 3 Global Step: 75670 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:44:11,616-Speed 2497.84 samples/sec Loss 8.5785 LearningRate 0.000912 Epoch: 3 Global Step: 75680 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:44:19,816-Speed 2497.90 samples/sec Loss 8.5310 LearningRate 0.000912 Epoch: 3 Global Step: 75690 
Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:44:28,023-Speed 2495.94 samples/sec Loss 8.4316 LearningRate 0.000913 Epoch: 3 Global Step: 75700 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:44:36,223-Speed 2498.02 samples/sec Loss 8.5387 LearningRate 0.000913 Epoch: 3 Global Step: 75710 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:44:44,422-Speed 2497.88 samples/sec Loss 8.5127 LearningRate 0.000913 Epoch: 3 Global Step: 75720 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:44:52,571-Speed 2514.41 samples/sec Loss 8.5918 LearningRate 0.000913 Epoch: 3 Global Step: 75730 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:45:00,771-Speed 2498.24 samples/sec Loss 8.5059 LearningRate 0.000913 Epoch: 3 Global Step: 75740 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:45:08,982-Speed 2494.59 samples/sec Loss 8.5046 LearningRate 0.000913 Epoch: 3 Global Step: 75750 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:45:17,180-Speed 2498.34 samples/sec Loss 8.4497 LearningRate 0.000913 Epoch: 3 Global Step: 75760 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:45:25,380-Speed 2498.26 samples/sec Loss 8.4902 LearningRate 0.000913 Epoch: 3 Global Step: 75770 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:45:33,579-Speed 2498.34 samples/sec Loss 8.3886 LearningRate 0.000913 Epoch: 3 Global Step: 75780 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:45:41,725-Speed 2514.48 samples/sec Loss 8.4247 LearningRate 0.000914 Epoch: 3 Global Step: 75790 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:45:49,924-Speed 2498.29 samples/sec Loss 8.4310 LearningRate 0.000914 Epoch: 3 Global Step: 75800 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:45:58,125-Speed 2497.50 samples/sec Loss 8.2846 LearningRate 0.000914 Epoch: 3 Global Step: 75810 Fp16 Grad Scale: 
65536 Required: 172 hours Training: 2022-07-06 07:46:06,328-Speed 2496.86 samples/sec Loss 8.3881 LearningRate 0.000914 Epoch: 3 Global Step: 75820 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:46:14,529-Speed 2497.70 samples/sec Loss 8.4046 LearningRate 0.000914 Epoch: 3 Global Step: 75830 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:46:22,743-Speed 2493.57 samples/sec Loss 8.3580 LearningRate 0.000914 Epoch: 3 Global Step: 75840 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:46:30,897-Speed 2512.18 samples/sec Loss 8.3006 LearningRate 0.000914 Epoch: 3 Global Step: 75850 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:46:39,099-Speed 2497.35 samples/sec Loss 8.4334 LearningRate 0.000914 Epoch: 3 Global Step: 75860 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:46:47,298-Speed 2498.18 samples/sec Loss 8.4174 LearningRate 0.000915 Epoch: 3 Global Step: 75870 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:46:55,511-Speed 2494.23 samples/sec Loss 8.3382 LearningRate 0.000915 Epoch: 3 Global Step: 75880 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:47:03,709-Speed 2498.63 samples/sec Loss 8.5525 LearningRate 0.000915 Epoch: 3 Global Step: 75890 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:47:11,908-Speed 2498.29 samples/sec Loss 8.4173 LearningRate 0.000915 Epoch: 3 Global Step: 75900 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:47:20,055-Speed 2514.34 samples/sec Loss 8.5319 LearningRate 0.000915 Epoch: 3 Global Step: 75910 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:47:28,254-Speed 2498.23 samples/sec Loss 8.4964 LearningRate 0.000915 Epoch: 3 Global Step: 75920 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:47:36,457-Speed 2496.88 samples/sec Loss 8.3729 LearningRate 0.000915 Epoch: 3 Global Step: 75930 Fp16 Grad Scale: 65536 Required: 172 
hours Training: 2022-07-06 07:47:44,656-Speed 2498.23 samples/sec Loss 8.4302 LearningRate 0.000915 Epoch: 3 Global Step: 75940 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:47:52,855-Speed 2498.25 samples/sec Loss 8.4739 LearningRate 0.000916 Epoch: 3 Global Step: 75950 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:48:01,055-Speed 2497.93 samples/sec Loss 8.5421 LearningRate 0.000916 Epoch: 3 Global Step: 75960 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:48:09,204-Speed 2513.78 samples/sec Loss 8.6262 LearningRate 0.000916 Epoch: 3 Global Step: 75970 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:48:17,405-Speed 2497.76 samples/sec Loss 8.5623 LearningRate 0.000916 Epoch: 3 Global Step: 75980 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:48:25,605-Speed 2497.86 samples/sec Loss 8.5248 LearningRate 0.000916 Epoch: 3 Global Step: 75990 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:48:33,806-Speed 2497.51 samples/sec Loss 8.5453 LearningRate 0.000916 Epoch: 3 Global Step: 76000 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:48:42,012-Speed 2496.39 samples/sec Loss 8.4973 LearningRate 0.000916 Epoch: 3 Global Step: 76010 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:48:50,212-Speed 2497.98 samples/sec Loss 8.4736 LearningRate 0.000916 Epoch: 3 Global Step: 76020 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:48:58,371-Speed 2510.23 samples/sec Loss 8.4109 LearningRate 0.000916 Epoch: 3 Global Step: 76030 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:49:06,573-Speed 2497.54 samples/sec Loss 8.4742 LearningRate 0.000917 Epoch: 3 Global Step: 76040 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:49:14,773-Speed 2497.85 samples/sec Loss 8.3958 LearningRate 0.000917 Epoch: 3 Global Step: 76050 Fp16 Grad Scale: 65536 Required: 172 hours Training: 
2022-07-06 07:49:22,975-Speed 2497.56 samples/sec Loss 8.3857 LearningRate 0.000917 Epoch: 3 Global Step: 76060 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:49:31,181-Speed 2496.06 samples/sec Loss 8.4825 LearningRate 0.000917 Epoch: 3 Global Step: 76070 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:49:39,382-Speed 2497.47 samples/sec Loss 8.3687 LearningRate 0.000917 Epoch: 3 Global Step: 76080 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:49:47,531-Speed 2513.87 samples/sec Loss 8.4349 LearningRate 0.000917 Epoch: 3 Global Step: 76090 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:49:55,733-Speed 2497.47 samples/sec Loss 8.3685 LearningRate 0.000917 Epoch: 3 Global Step: 76100 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:50:03,938-Speed 2496.46 samples/sec Loss 8.3607 LearningRate 0.000917 Epoch: 3 Global Step: 76110 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:50:12,138-Speed 2497.75 samples/sec Loss 8.3036 LearningRate 0.000918 Epoch: 3 Global Step: 76120 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:50:20,349-Speed 2494.90 samples/sec Loss 8.3564 LearningRate 0.000918 Epoch: 3 Global Step: 76130 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:50:28,549-Speed 2497.75 samples/sec Loss 8.4853 LearningRate 0.000918 Epoch: 3 Global Step: 76140 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:50:36,697-Speed 2513.83 samples/sec Loss 8.3390 LearningRate 0.000918 Epoch: 3 Global Step: 76150 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:50:44,900-Speed 2497.26 samples/sec Loss 8.3159 LearningRate 0.000918 Epoch: 3 Global Step: 76160 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:50:53,097-Speed 2498.96 samples/sec Loss 8.3986 LearningRate 0.000918 Epoch: 3 Global Step: 76170 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 
07:51:01,309-Speed 2494.15 samples/sec Loss 8.2790 LearningRate 0.000918 Epoch: 3 Global Step: 76180 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:51:09,511-Speed 2497.40 samples/sec Loss 8.2516 LearningRate 0.000918 Epoch: 3 Global Step: 76190 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:51:17,714-Speed 2497.12 samples/sec Loss 8.3637 LearningRate 0.000919 Epoch: 3 Global Step: 76200 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:51:25,862-Speed 2513.82 samples/sec Loss 8.3619 LearningRate 0.000919 Epoch: 3 Global Step: 76210 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:51:34,061-Speed 2498.04 samples/sec Loss 8.3822 LearningRate 0.000919 Epoch: 3 Global Step: 76220 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:51:42,267-Speed 2496.33 samples/sec Loss 8.3786 LearningRate 0.000919 Epoch: 3 Global Step: 76230 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:51:50,479-Speed 2494.29 samples/sec Loss 8.4353 LearningRate 0.000919 Epoch: 3 Global Step: 76240 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:51:58,682-Speed 2496.91 samples/sec Loss 8.5666 LearningRate 0.000919 Epoch: 3 Global Step: 76250 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:52:06,883-Speed 2497.77 samples/sec Loss 8.4376 LearningRate 0.000919 Epoch: 3 Global Step: 76260 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:52:15,031-Speed 2514.22 samples/sec Loss 8.2304 LearningRate 0.000919 Epoch: 3 Global Step: 76270 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:52:23,235-Speed 2496.92 samples/sec Loss 8.3908 LearningRate 0.000920 Epoch: 3 Global Step: 76280 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:52:31,455-Speed 2491.76 samples/sec Loss 8.4024 LearningRate 0.000920 Epoch: 3 Global Step: 76290 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:52:39,653-Speed 
2498.55 samples/sec Loss 8.4424 LearningRate 0.000920 Epoch: 3 Global Step: 76300 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:52:47,853-Speed 2497.90 samples/sec Loss 8.4523 LearningRate 0.000920 Epoch: 3 Global Step: 76310 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:52:56,053-Speed 2497.86 samples/sec Loss 8.3698 LearningRate 0.000920 Epoch: 3 Global Step: 76320 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:53:04,206-Speed 2512.49 samples/sec Loss 8.4290 LearningRate 0.000920 Epoch: 3 Global Step: 76330 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:53:12,416-Speed 2494.96 samples/sec Loss 8.3684 LearningRate 0.000920 Epoch: 3 Global Step: 76340 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:53:20,617-Speed 2497.90 samples/sec Loss 8.4509 LearningRate 0.000920 Epoch: 3 Global Step: 76350 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:53:28,817-Speed 2497.77 samples/sec Loss 8.3620 LearningRate 0.000920 Epoch: 3 Global Step: 76360 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:53:37,017-Speed 2497.95 samples/sec Loss 8.4245 LearningRate 0.000921 Epoch: 3 Global Step: 76370 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:53:45,218-Speed 2497.83 samples/sec Loss 8.4745 LearningRate 0.000921 Epoch: 3 Global Step: 76380 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:53:53,366-Speed 2514.00 samples/sec Loss 8.2338 LearningRate 0.000921 Epoch: 3 Global Step: 76390 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:54:01,571-Speed 2496.23 samples/sec Loss 8.3596 LearningRate 0.000921 Epoch: 3 Global Step: 76400 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:54:09,772-Speed 2497.87 samples/sec Loss 8.3938 LearningRate 0.000921 Epoch: 3 Global Step: 76410 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:54:17,972-Speed 2497.92 samples/sec 
Loss 8.4853 LearningRate 0.000921 Epoch: 3 Global Step: 76420 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:54:26,173-Speed 2497.50 samples/sec Loss 8.5038 LearningRate 0.000921 Epoch: 3 Global Step: 76430 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:54:34,378-Speed 2496.51 samples/sec Loss 8.4789 LearningRate 0.000921 Epoch: 3 Global Step: 76440 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:54:42,525-Speed 2514.28 samples/sec Loss 8.3598 LearningRate 0.000922 Epoch: 3 Global Step: 76450 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:54:50,738-Speed 2494.19 samples/sec Loss 8.4334 LearningRate 0.000922 Epoch: 3 Global Step: 76460 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:54:58,939-Speed 2497.78 samples/sec Loss 8.4228 LearningRate 0.000922 Epoch: 3 Global Step: 76470 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:55:07,155-Speed 2493.05 samples/sec Loss 8.3589 LearningRate 0.000922 Epoch: 3 Global Step: 76480 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:55:15,357-Speed 2497.18 samples/sec Loss 8.3810 LearningRate 0.000922 Epoch: 3 Global Step: 76490 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:55:23,557-Speed 2497.92 samples/sec Loss 8.3638 LearningRate 0.000922 Epoch: 3 Global Step: 76500 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:55:31,705-Speed 2514.15 samples/sec Loss 8.4800 LearningRate 0.000922 Epoch: 3 Global Step: 76510 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:55:39,906-Speed 2497.85 samples/sec Loss 8.4922 LearningRate 0.000922 Epoch: 3 Global Step: 76520 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:55:48,107-Speed 2497.54 samples/sec Loss 8.3864 LearningRate 0.000923 Epoch: 3 Global Step: 76530 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:55:56,313-Speed 2496.33 samples/sec Loss 8.3429 
LearningRate 0.000923 Epoch: 3 Global Step: 76540 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:56:04,526-Speed 2493.94 samples/sec Loss 8.3250 LearningRate 0.000923 Epoch: 3 Global Step: 76550 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:56:12,723-Speed 2498.71 samples/sec Loss 8.3588 LearningRate 0.000923 Epoch: 3 Global Step: 76560 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:56:20,868-Speed 2515.03 samples/sec Loss 8.3824 LearningRate 0.000923 Epoch: 3 Global Step: 76570 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 07:56:29,069-Speed 2497.54 samples/sec Loss 8.3723 LearningRate 0.000923 Epoch: 3 Global Step: 76580 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:56:37,267-Speed 2498.62 samples/sec Loss 8.3664 LearningRate 0.000923 Epoch: 3 Global Step: 76590 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:56:45,465-Speed 2498.89 samples/sec Loss 8.3469 LearningRate 0.000923 Epoch: 3 Global Step: 76600 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:56:53,664-Speed 2498.33 samples/sec Loss 8.3752 LearningRate 0.000923 Epoch: 3 Global Step: 76610 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:57:01,879-Speed 2493.34 samples/sec Loss 8.3337 LearningRate 0.000924 Epoch: 3 Global Step: 76620 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:57:10,025-Speed 2514.59 samples/sec Loss 8.2266 LearningRate 0.000924 Epoch: 3 Global Step: 76630 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:57:18,226-Speed 2497.82 samples/sec Loss 8.4729 LearningRate 0.000924 Epoch: 3 Global Step: 76640 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:57:26,423-Speed 2498.73 samples/sec Loss 8.4118 LearningRate 0.000924 Epoch: 3 Global Step: 76650 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:57:34,622-Speed 2498.22 samples/sec Loss 8.3102 
LearningRate 0.000924 Epoch: 3 Global Step: 76660 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:57:42,822-Speed 2498.22 samples/sec Loss 8.4110 LearningRate 0.000924 Epoch: 3 Global Step: 76670 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:57:51,031-Speed 2495.13 samples/sec Loss 8.4207 LearningRate 0.000924 Epoch: 3 Global Step: 76680 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:57:59,177-Speed 2514.25 samples/sec Loss 8.3711 LearningRate 0.000924 Epoch: 3 Global Step: 76690 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:58:07,379-Speed 2497.72 samples/sec Loss 8.3236 LearningRate 0.000925 Epoch: 3 Global Step: 76700 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:58:15,580-Speed 2497.56 samples/sec Loss 8.3108 LearningRate 0.000925 Epoch: 3 Global Step: 76710 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:58:23,782-Speed 2497.38 samples/sec Loss 8.3372 LearningRate 0.000925 Epoch: 3 Global Step: 76720 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:58:31,990-Speed 2495.48 samples/sec Loss 8.2932 LearningRate 0.000925 Epoch: 3 Global Step: 76730 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:58:40,188-Speed 2498.67 samples/sec Loss 8.2835 LearningRate 0.000925 Epoch: 3 Global Step: 76740 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:58:48,334-Speed 2514.44 samples/sec Loss 8.3377 LearningRate 0.000925 Epoch: 3 Global Step: 76750 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:58:56,531-Speed 2498.91 samples/sec Loss 8.3472 LearningRate 0.000925 Epoch: 3 Global Step: 76760 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:59:04,727-Speed 2499.06 samples/sec Loss 8.2820 LearningRate 0.000925 Epoch: 3 Global Step: 76770 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:59:12,931-Speed 2496.90 samples/sec Loss 8.2570 
LearningRate 0.000926 Epoch: 3 Global Step: 76780 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:59:21,141-Speed 2494.91 samples/sec Loss 8.1981 LearningRate 0.000926 Epoch: 3 Global Step: 76790 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:59:29,355-Speed 2493.67 samples/sec Loss 8.2982 LearningRate 0.000926 Epoch: 3 Global Step: 76800 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:59:37,503-Speed 2513.88 samples/sec Loss 8.2641 LearningRate 0.000926 Epoch: 3 Global Step: 76810 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:59:45,708-Speed 2496.41 samples/sec Loss 8.3599 LearningRate 0.000926 Epoch: 3 Global Step: 76820 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 07:59:53,910-Speed 2497.63 samples/sec Loss 8.3320 LearningRate 0.000926 Epoch: 3 Global Step: 76830 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:00:02,106-Speed 2498.87 samples/sec Loss 8.4638 LearningRate 0.000926 Epoch: 3 Global Step: 76840 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:00:10,302-Speed 2499.38 samples/sec Loss 8.3707 LearningRate 0.000926 Epoch: 3 Global Step: 76850 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:00:18,509-Speed 2495.72 samples/sec Loss 8.3795 LearningRate 0.000927 Epoch: 3 Global Step: 76860 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:00:26,659-Speed 2513.10 samples/sec Loss 8.4348 LearningRate 0.000927 Epoch: 3 Global Step: 76870 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:00:34,860-Speed 2497.86 samples/sec Loss 8.3750 LearningRate 0.000927 Epoch: 3 Global Step: 76880 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:00:43,061-Speed 2497.90 samples/sec Loss 8.4411 LearningRate 0.000927 Epoch: 3 Global Step: 76890 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:00:51,265-Speed 2496.80 samples/sec Loss 8.4157 
LearningRate 0.000927 Epoch: 3 Global Step: 76900 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:00:59,464-Speed 2498.11 samples/sec Loss 8.3354 LearningRate 0.000927 Epoch: 3 Global Step: 76910 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:01:07,665-Speed 2497.48 samples/sec Loss 8.3450 LearningRate 0.000927 Epoch: 3 Global Step: 76920 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:01:15,811-Speed 2514.65 samples/sec Loss 8.4484 LearningRate 0.000927 Epoch: 3 Global Step: 76930 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:01:24,013-Speed 2497.27 samples/sec Loss 8.4146 LearningRate 0.000927 Epoch: 3 Global Step: 76940 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:01:32,212-Speed 2498.15 samples/sec Loss 8.3230 LearningRate 0.000928 Epoch: 3 Global Step: 76950 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:01:40,415-Speed 2497.04 samples/sec Loss 8.2872 LearningRate 0.000928 Epoch: 3 Global Step: 76960 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:01:48,619-Speed 2497.02 samples/sec Loss 8.3145 LearningRate 0.000928 Epoch: 3 Global Step: 76970 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:01:56,816-Speed 2498.71 samples/sec Loss 8.4394 LearningRate 0.000928 Epoch: 3 Global Step: 76980 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:02:04,963-Speed 2514.21 samples/sec Loss 8.4715 LearningRate 0.000928 Epoch: 3 Global Step: 76990 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:02:13,166-Speed 2497.27 samples/sec Loss 8.3978 LearningRate 0.000928 Epoch: 3 Global Step: 77000 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:02:21,370-Speed 2496.93 samples/sec Loss 8.2696 LearningRate 0.000928 Epoch: 3 Global Step: 77010 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:02:29,568-Speed 2498.32 samples/sec Loss 8.2412 
LearningRate 0.000928 Epoch: 3 Global Step: 77020 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:02:37,772-Speed 2496.69 samples/sec Loss 8.3094 LearningRate 0.000929 Epoch: 3 Global Step: 77030 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:02:45,976-Speed 2496.97 samples/sec Loss 8.3042 LearningRate 0.000929 Epoch: 3 Global Step: 77040 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:02:54,119-Speed 2515.42 samples/sec Loss 8.2961 LearningRate 0.000929 Epoch: 3 Global Step: 77050 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:03:02,319-Speed 2497.88 samples/sec Loss 8.4733 LearningRate 0.000929 Epoch: 3 Global Step: 77060 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:03:10,521-Speed 2497.44 samples/sec Loss 8.4296 LearningRate 0.000929 Epoch: 3 Global Step: 77070 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:03:18,718-Speed 2498.97 samples/sec Loss 8.4091 LearningRate 0.000929 Epoch: 3 Global Step: 77080 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:03:26,919-Speed 2497.85 samples/sec Loss 8.4214 LearningRate 0.000929 Epoch: 3 Global Step: 77090 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:03:35,117-Speed 2498.53 samples/sec Loss 8.3887 LearningRate 0.000929 Epoch: 3 Global Step: 77100 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:03:43,265-Speed 2513.89 samples/sec Loss 8.3238 LearningRate 0.000930 Epoch: 3 Global Step: 77110 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:03:51,464-Speed 2498.47 samples/sec Loss 8.2326 LearningRate 0.000930 Epoch: 3 Global Step: 77120 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:03:59,664-Speed 2498.11 samples/sec Loss 8.2956 LearningRate 0.000930 Epoch: 3 Global Step: 77130 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:04:07,868-Speed 2496.74 samples/sec Loss 8.2681 
LearningRate 0.000930 Epoch: 3 Global Step: 77140 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:04:16,071-Speed 2497.05 samples/sec Loss 8.3023 LearningRate 0.000930 Epoch: 3 Global Step: 77150 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:04:24,272-Speed 2497.99 samples/sec Loss 8.4322 LearningRate 0.000930 Epoch: 3 Global Step: 77160 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:04:32,419-Speed 2513.97 samples/sec Loss 8.2633 LearningRate 0.000930 Epoch: 3 Global Step: 77170 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:04:40,617-Speed 2498.59 samples/sec Loss 8.4358 LearningRate 0.000930 Epoch: 3 Global Step: 77180 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:04:48,818-Speed 2497.82 samples/sec Loss 8.3595 LearningRate 0.000930 Epoch: 3 Global Step: 77190 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:04:57,018-Speed 2497.93 samples/sec Loss 8.2500 LearningRate 0.000931 Epoch: 3 Global Step: 77200 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:05:05,221-Speed 2496.99 samples/sec Loss 8.4097 LearningRate 0.000931 Epoch: 3 Global Step: 77210 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:05:13,424-Speed 2497.50 samples/sec Loss 8.2387 LearningRate 0.000931 Epoch: 3 Global Step: 77220 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:05:21,583-Speed 2510.41 samples/sec Loss 8.3419 LearningRate 0.000931 Epoch: 3 Global Step: 77230 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:05:29,785-Speed 2497.52 samples/sec Loss 8.3342 LearningRate 0.000931 Epoch: 3 Global Step: 77240 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:05:37,989-Speed 2496.79 samples/sec Loss 8.3854 LearningRate 0.000931 Epoch: 3 Global Step: 77250 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:05:46,193-Speed 2496.64 samples/sec Loss 8.3652 
LearningRate 0.000931 Epoch: 3 Global Step: 77260 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:05:54,394-Speed 2497.87 samples/sec Loss 8.3068 LearningRate 0.000931 Epoch: 3 Global Step: 77270 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:06:02,593-Speed 2498.28 samples/sec Loss 8.3458 LearningRate 0.000932 Epoch: 3 Global Step: 77280 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:06:10,744-Speed 2512.88 samples/sec Loss 8.4284 LearningRate 0.000932 Epoch: 3 Global Step: 77290 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:06:18,958-Speed 2493.81 samples/sec Loss 8.2664 LearningRate 0.000932 Epoch: 3 Global Step: 77300 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:06:27,157-Speed 2498.16 samples/sec Loss 8.2695 LearningRate 0.000932 Epoch: 3 Global Step: 77310 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:06:35,360-Speed 2497.02 samples/sec Loss 8.1409 LearningRate 0.000932 Epoch: 3 Global Step: 77320 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:06:43,560-Speed 2498.14 samples/sec Loss 8.2670 LearningRate 0.000932 Epoch: 3 Global Step: 77330 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:06:51,856-Speed 2469.07 samples/sec Loss 8.2150 LearningRate 0.000932 Epoch: 3 Global Step: 77340 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:07:00,001-Speed 2514.91 samples/sec Loss 8.2988 LearningRate 0.000932 Epoch: 3 Global Step: 77350 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:07:08,200-Speed 2498.26 samples/sec Loss 8.2511 LearningRate 0.000933 Epoch: 3 Global Step: 77360 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:07:16,406-Speed 2496.26 samples/sec Loss 8.2364 LearningRate 0.000933 Epoch: 3 Global Step: 77370 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:07:24,609-Speed 2497.04 samples/sec Loss 8.2390 
LearningRate 0.000933 Epoch: 3 Global Step: 77380 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:07:32,806-Speed 2498.67 samples/sec Loss 8.2314 LearningRate 0.000933 Epoch: 3 Global Step: 77390 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:07:41,007-Speed 2497.59 samples/sec Loss 8.1667 LearningRate 0.000933 Epoch: 3 Global Step: 77400 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:07:49,157-Speed 2513.45 samples/sec Loss 8.2604 LearningRate 0.000933 Epoch: 3 Global Step: 77410 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:07:57,361-Speed 2496.57 samples/sec Loss 8.1998 LearningRate 0.000933 Epoch: 3 Global Step: 77420 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:08:05,564-Speed 2497.15 samples/sec Loss 8.2494 LearningRate 0.000933 Epoch: 3 Global Step: 77430 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:08:13,765-Speed 2497.64 samples/sec Loss 8.2969 LearningRate 0.000933 Epoch: 3 Global Step: 77440 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:08:21,969-Speed 2496.86 samples/sec Loss 8.3335 LearningRate 0.000934 Epoch: 3 Global Step: 77450 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:08:30,171-Speed 2497.49 samples/sec Loss 8.4104 LearningRate 0.000934 Epoch: 3 Global Step: 77460 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:08:38,320-Speed 2513.32 samples/sec Loss 8.3059 LearningRate 0.000934 Epoch: 3 Global Step: 77470 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:08:46,531-Speed 2494.72 samples/sec Loss 8.3307 LearningRate 0.000934 Epoch: 3 Global Step: 77480 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:08:54,734-Speed 2497.16 samples/sec Loss 8.3465 LearningRate 0.000934 Epoch: 3 Global Step: 77490 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:09:02,934-Speed 2498.08 samples/sec Loss 8.2812 
LearningRate 0.000934 Epoch: 3 Global Step: 77500 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:09:11,134-Speed 2497.94 samples/sec Loss 8.3612 LearningRate 0.000934 Epoch: 3 Global Step: 77510 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:09:19,332-Speed 2498.31 samples/sec Loss 8.4166 LearningRate 0.000934 Epoch: 3 Global Step: 77520 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:09:27,488-Speed 2511.50 samples/sec Loss 8.2409 LearningRate 0.000935 Epoch: 3 Global Step: 77530 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:09:35,687-Speed 2498.29 samples/sec Loss 8.3652 LearningRate 0.000935 Epoch: 3 Global Step: 77540 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:09:43,900-Speed 2494.01 samples/sec Loss 8.3285 LearningRate 0.000935 Epoch: 3 Global Step: 77550 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:09:52,102-Speed 2497.25 samples/sec Loss 8.2716 LearningRate 0.000935 Epoch: 3 Global Step: 77560 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:10:00,304-Speed 2497.64 samples/sec Loss 8.2231 LearningRate 0.000935 Epoch: 3 Global Step: 77570 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:10:08,504-Speed 2498.15 samples/sec Loss 8.3045 LearningRate 0.000935 Epoch: 3 Global Step: 77580 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:10:16,649-Speed 2514.78 samples/sec Loss 8.2481 LearningRate 0.000935 Epoch: 3 Global Step: 77590 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:10:24,846-Speed 2498.61 samples/sec Loss 8.2481 LearningRate 0.000935 Epoch: 3 Global Step: 77600 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:10:33,044-Speed 2498.85 samples/sec Loss 8.2743 LearningRate 0.000936 Epoch: 3 Global Step: 77610 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:10:41,242-Speed 2498.42 samples/sec Loss 8.2034 
LearningRate 0.000936 Epoch: 3 Global Step: 77620 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:10:49,451-Speed 2495.29 samples/sec Loss 8.1919 LearningRate 0.000936 Epoch: 3 Global Step: 77630 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:10:57,651-Speed 2497.95 samples/sec Loss 8.2137 LearningRate 0.000936 Epoch: 3 Global Step: 77640 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:11:05,797-Speed 2514.60 samples/sec Loss 8.3649 LearningRate 0.000936 Epoch: 3 Global Step: 77650 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:11:14,002-Speed 2496.57 samples/sec Loss 8.3441 LearningRate 0.000936 Epoch: 3 Global Step: 77660 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:11:22,200-Speed 2498.43 samples/sec Loss 8.5619 LearningRate 0.000936 Epoch: 3 Global Step: 77670 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:11:30,401-Speed 2497.72 samples/sec Loss 8.4311 LearningRate 0.000936 Epoch: 3 Global Step: 77680 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:11:38,604-Speed 2497.57 samples/sec Loss 8.2793 LearningRate 0.000937 Epoch: 3 Global Step: 77690 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:11:46,816-Speed 2494.24 samples/sec Loss 8.3533 LearningRate 0.000937 Epoch: 3 Global Step: 77700 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:11:54,964-Speed 2514.06 samples/sec Loss 8.2044 LearningRate 0.000937 Epoch: 3 Global Step: 77710 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:12:03,163-Speed 2498.14 samples/sec Loss 8.4454 LearningRate 0.000937 Epoch: 3 Global Step: 77720 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:12:11,376-Speed 2494.15 samples/sec Loss 8.3421 LearningRate 0.000937 Epoch: 3 Global Step: 77730 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:12:19,580-Speed 2496.96 samples/sec Loss 8.4123 
LearningRate 0.000937 Epoch: 3 Global Step: 77740 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:12:27,780-Speed 2497.71 samples/sec Loss 8.2668 LearningRate 0.000937 Epoch: 3 Global Step: 77750 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:12:35,981-Speed 2497.89 samples/sec Loss 8.4170 LearningRate 0.000937 Epoch: 3 Global Step: 77760 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:12:44,127-Speed 2514.44 samples/sec Loss 8.3544 LearningRate 0.000937 Epoch: 3 Global Step: 77770 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:12:52,323-Speed 2499.20 samples/sec Loss 8.3636 LearningRate 0.000938 Epoch: 3 Global Step: 77780 Fp16 Grad Scale: 262144 Required: 172 hours Training: 2022-07-06 08:13:00,523-Speed 2498.03 samples/sec Loss 8.2592 LearningRate 0.000938 Epoch: 3 Global Step: 77790 Fp16 Grad Scale: 262144 Required: 172 hours Training: 2022-07-06 08:13:08,685-Speed 2509.81 samples/sec Loss 8.3140 LearningRate 0.000938 Epoch: 3 Global Step: 77800 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:13:16,887-Speed 2497.44 samples/sec Loss 8.3549 LearningRate 0.000938 Epoch: 3 Global Step: 77810 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:13:25,089-Speed 2497.31 samples/sec Loss 8.4319 LearningRate 0.000938 Epoch: 3 Global Step: 77820 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:13:33,236-Speed 2514.27 samples/sec Loss 8.3780 LearningRate 0.000938 Epoch: 3 Global Step: 77830 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:13:41,440-Speed 2496.75 samples/sec Loss 8.3318 LearningRate 0.000938 Epoch: 3 Global Step: 77840 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:13:49,639-Speed 2498.10 samples/sec Loss 8.3041 LearningRate 0.000938 Epoch: 3 Global Step: 77850 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:13:57,837-Speed 2498.76 samples/sec Loss 8.2694 
LearningRate 0.000939 Epoch: 3 Global Step: 77860 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:14:06,043-Speed 2496.09 samples/sec Loss 8.3340 LearningRate 0.000939 Epoch: 3 Global Step: 77870 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:14:14,250-Speed 2496.04 samples/sec Loss 8.3240 LearningRate 0.000939 Epoch: 3 Global Step: 77880 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:14:22,393-Speed 2515.32 samples/sec Loss 8.2529 LearningRate 0.000939 Epoch: 3 Global Step: 77890 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:14:30,592-Speed 2498.33 samples/sec Loss 8.3484 LearningRate 0.000939 Epoch: 3 Global Step: 77900 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:14:38,792-Speed 2498.10 samples/sec Loss 8.3157 LearningRate 0.000939 Epoch: 3 Global Step: 77910 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:14:47,016-Speed 2490.94 samples/sec Loss 8.3243 LearningRate 0.000939 Epoch: 3 Global Step: 77920 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:14:55,215-Speed 2498.13 samples/sec Loss 8.2133 LearningRate 0.000939 Epoch: 3 Global Step: 77930 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:15:03,415-Speed 2497.92 samples/sec Loss 8.1641 LearningRate 0.000940 Epoch: 3 Global Step: 77940 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:15:11,566-Speed 2512.84 samples/sec Loss 8.2049 LearningRate 0.000940 Epoch: 3 Global Step: 77950 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:15:19,779-Speed 2494.49 samples/sec Loss 8.1369 LearningRate 0.000940 Epoch: 3 Global Step: 77960 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:15:27,978-Speed 2498.34 samples/sec Loss 8.2132 LearningRate 0.000940 Epoch: 3 Global Step: 77970 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:15:36,176-Speed 2498.72 samples/sec Loss 8.1738 
LearningRate 0.000940 Epoch: 3 Global Step: 77980 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:15:44,377-Speed 2497.70 samples/sec Loss 8.3130 LearningRate 0.000940 Epoch: 3 Global Step: 77990 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:15:52,576-Speed 2498.25 samples/sec Loss 8.2053 LearningRate 0.000940 Epoch: 3 Global Step: 78000 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:16:00,723-Speed 2514.14 samples/sec Loss 8.2555 LearningRate 0.000940 Epoch: 3 Global Step: 78010 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:16:08,941-Speed 2492.51 samples/sec Loss 8.2544 LearningRate 0.000940 Epoch: 3 Global Step: 78020 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:16:17,144-Speed 2497.20 samples/sec Loss 8.2426 LearningRate 0.000941 Epoch: 3 Global Step: 78030 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:16:25,354-Speed 2494.60 samples/sec Loss 8.2019 LearningRate 0.000941 Epoch: 3 Global Step: 78040 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:16:33,565-Speed 2494.86 samples/sec Loss 8.2298 LearningRate 0.000941 Epoch: 3 Global Step: 78050 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:16:41,768-Speed 2497.10 samples/sec Loss 8.2262 LearningRate 0.000941 Epoch: 3 Global Step: 78060 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:16:49,916-Speed 2514.33 samples/sec Loss 8.1921 LearningRate 0.000941 Epoch: 3 Global Step: 78070 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:16:58,114-Speed 2498.97 samples/sec Loss 8.2653 LearningRate 0.000941 Epoch: 3 Global Step: 78080 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:17:06,316-Speed 2497.26 samples/sec Loss 8.1764 LearningRate 0.000941 Epoch: 3 Global Step: 78090 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:17:14,516-Speed 2498.07 samples/sec Loss 8.3307 
LearningRate 0.000941 Epoch: 3 Global Step: 78100 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:17:22,714-Speed 2498.47 samples/sec Loss 8.2476 LearningRate 0.000942 Epoch: 3 Global Step: 78110 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:17:30,916-Speed 2497.26 samples/sec Loss 8.3120 LearningRate 0.000942 Epoch: 3 Global Step: 78120 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:17:39,060-Speed 2515.28 samples/sec Loss 8.2091 LearningRate 0.000942 Epoch: 3 Global Step: 78130 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:17:47,258-Speed 2498.67 samples/sec Loss 8.3117 LearningRate 0.000942 Epoch: 3 Global Step: 78140 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:17:55,469-Speed 2494.49 samples/sec Loss 8.1759 LearningRate 0.000942 Epoch: 3 Global Step: 78150 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:18:03,670-Speed 2497.42 samples/sec Loss 8.1485 LearningRate 0.000942 Epoch: 3 Global Step: 78160 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:18:11,869-Speed 2498.53 samples/sec Loss 8.2833 LearningRate 0.000942 Epoch: 3 Global Step: 78170 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:18:20,071-Speed 2497.51 samples/sec Loss 8.1860 LearningRate 0.000942 Epoch: 3 Global Step: 78180 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:18:28,216-Speed 2514.65 samples/sec Loss 8.1185 LearningRate 0.000943 Epoch: 3 Global Step: 78190 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:18:36,437-Speed 2491.71 samples/sec Loss 8.2129 LearningRate 0.000943 Epoch: 3 Global Step: 78200 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:18:44,638-Speed 2497.51 samples/sec Loss 8.2043 LearningRate 0.000943 Epoch: 3 Global Step: 78210 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:18:52,847-Speed 2495.32 samples/sec Loss 8.3113 
LearningRate 0.000943 Epoch: 3 Global Step: 78220 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:19:01,050-Speed 2496.91 samples/sec Loss 8.2268 LearningRate 0.000943 Epoch: 3 Global Step: 78230 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:19:09,251-Speed 2497.70 samples/sec Loss 8.2104 LearningRate 0.000943 Epoch: 3 Global Step: 78240 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:19:17,397-Speed 2514.43 samples/sec Loss 8.1907 LearningRate 0.000943 Epoch: 3 Global Step: 78250 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:19:25,595-Speed 2498.94 samples/sec Loss 8.2141 LearningRate 0.000943 Epoch: 3 Global Step: 78260 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:19:33,794-Speed 2498.57 samples/sec Loss 8.2083 LearningRate 0.000944 Epoch: 3 Global Step: 78270 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:19:41,991-Speed 2498.78 samples/sec Loss 8.3534 LearningRate 0.000944 Epoch: 3 Global Step: 78280 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:19:50,192-Speed 2497.59 samples/sec Loss 8.3386 LearningRate 0.000944 Epoch: 3 Global Step: 78290 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:19:58,389-Speed 2499.19 samples/sec Loss 8.2619 LearningRate 0.000944 Epoch: 3 Global Step: 78300 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:20:06,534-Speed 2514.94 samples/sec Loss 8.2671 LearningRate 0.000944 Epoch: 3 Global Step: 78310 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:20:14,737-Speed 2497.02 samples/sec Loss 8.2630 LearningRate 0.000944 Epoch: 3 Global Step: 78320 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:20:22,948-Speed 2494.57 samples/sec Loss 8.3514 LearningRate 0.000944 Epoch: 3 Global Step: 78330 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:20:31,152-Speed 2496.60 samples/sec Loss 8.4000 
LearningRate 0.000944 Epoch: 3 Global Step: 78340 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:20:39,351-Speed 2498.22 samples/sec Loss 8.3244 LearningRate 0.000944 Epoch: 3 Global Step: 78350 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:20:47,553-Speed 2497.64 samples/sec Loss 8.3140 LearningRate 0.000945 Epoch: 3 Global Step: 78360 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:20:55,702-Speed 2513.37 samples/sec Loss 8.2002 LearningRate 0.000945 Epoch: 3 Global Step: 78370 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:21:03,901-Speed 2498.84 samples/sec Loss 8.2247 LearningRate 0.000945 Epoch: 3 Global Step: 78380 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:21:12,102-Speed 2497.39 samples/sec Loss 8.2322 LearningRate 0.000945 Epoch: 3 Global Step: 78390 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:21:20,301-Speed 2498.20 samples/sec Loss 8.1372 LearningRate 0.000945 Epoch: 3 Global Step: 78400 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:21:28,499-Speed 2498.50 samples/sec Loss 8.2088 LearningRate 0.000945 Epoch: 3 Global Step: 78410 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:21:36,699-Speed 2498.41 samples/sec Loss 8.2189 LearningRate 0.000945 Epoch: 3 Global Step: 78420 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:21:44,845-Speed 2514.36 samples/sec Loss 8.2121 LearningRate 0.000945 Epoch: 3 Global Step: 78430 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:21:53,046-Speed 2497.64 samples/sec Loss 8.2491 LearningRate 0.000946 Epoch: 3 Global Step: 78440 Fp16 Grad Scale: 131072 Required: 172 hours Training: 2022-07-06 08:22:01,200-Speed 2512.30 samples/sec Loss 8.2293 LearningRate 0.000946 Epoch: 3 Global Step: 78450 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:22:09,404-Speed 2496.82 samples/sec Loss 8.1904 
LearningRate 0.000946 Epoch: 3 Global Step: 78460 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:22:17,603-Speed 2498.47 samples/sec Loss 8.1738 LearningRate 0.000946 Epoch: 3 Global Step: 78470 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:22:25,803-Speed 2498.07 samples/sec Loss 8.2123 LearningRate 0.000946 Epoch: 3 Global Step: 78480 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:22:33,949-Speed 2514.70 samples/sec Loss 8.1150 LearningRate 0.000946 Epoch: 3 Global Step: 78490 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:22:42,153-Speed 2496.54 samples/sec Loss 8.2617 LearningRate 0.000946 Epoch: 3 Global Step: 78500 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:22:50,354-Speed 2497.66 samples/sec Loss 8.1825 LearningRate 0.000946 Epoch: 3 Global Step: 78510 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:22:58,552-Speed 2498.75 samples/sec Loss 8.1891 LearningRate 0.000947 Epoch: 3 Global Step: 78520 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:23:06,749-Speed 2498.67 samples/sec Loss 8.2372 LearningRate 0.000947 Epoch: 3 Global Step: 78530 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:23:14,960-Speed 2494.48 samples/sec Loss 8.2503 LearningRate 0.000947 Epoch: 3 Global Step: 78540 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:23:23,108-Speed 2513.88 samples/sec Loss 8.2704 LearningRate 0.000947 Epoch: 3 Global Step: 78550 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:23:31,306-Speed 2498.67 samples/sec Loss 8.2938 LearningRate 0.000947 Epoch: 3 Global Step: 78560 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:23:39,504-Speed 2498.34 samples/sec Loss 8.1286 LearningRate 0.000947 Epoch: 3 Global Step: 78570 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:23:47,702-Speed 2498.84 samples/sec Loss 8.1542 LearningRate 
0.000947 Epoch: 3 Global Step: 78580 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:23:55,898-Speed 2498.99 samples/sec Loss 8.2256 LearningRate 0.000947 Epoch: 3 Global Step: 78590 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:24:04,096-Speed 2498.63 samples/sec Loss 8.2012 LearningRate 0.000947 Epoch: 3 Global Step: 78600 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:24:12,245-Speed 2513.66 samples/sec Loss 8.2512 LearningRate 0.000948 Epoch: 3 Global Step: 78610 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:24:20,443-Speed 2498.58 samples/sec Loss 8.2411 LearningRate 0.000948 Epoch: 3 Global Step: 78620 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:24:28,644-Speed 2497.62 samples/sec Loss 8.1232 LearningRate 0.000948 Epoch: 3 Global Step: 78630 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:24:36,841-Speed 2498.79 samples/sec Loss 8.2378 LearningRate 0.000948 Epoch: 3 Global Step: 78640 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:24:45,041-Speed 2498.14 samples/sec Loss 8.3132 LearningRate 0.000948 Epoch: 3 Global Step: 78650 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:24:53,239-Speed 2498.19 samples/sec Loss 8.2963 LearningRate 0.000948 Epoch: 3 Global Step: 78660 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:25:01,388-Speed 2513.60 samples/sec Loss 8.2589 LearningRate 0.000948 Epoch: 3 Global Step: 78670 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:25:09,589-Speed 2497.77 samples/sec Loss 8.2803 LearningRate 0.000948 Epoch: 3 Global Step: 78680 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:25:17,787-Speed 2498.52 samples/sec Loss 8.3678 LearningRate 0.000949 Epoch: 3 Global Step: 78690 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:25:25,989-Speed 2497.32 samples/sec Loss 8.1996 LearningRate 0.000949 Epoch: 3 
Global Step: 78700 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:25:34,197-Speed 2495.51 samples/sec Loss 8.1803 LearningRate 0.000949 Epoch: 3 Global Step: 78710 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:25:42,401-Speed 2496.93 samples/sec Loss 8.2423 LearningRate 0.000949 Epoch: 3 Global Step: 78720 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:25:50,544-Speed 2515.29 samples/sec Loss 8.2257 LearningRate 0.000949 Epoch: 3 Global Step: 78730 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:25:58,744-Speed 2497.81 samples/sec Loss 8.1902 LearningRate 0.000949 Epoch: 3 Global Step: 78740 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:26:06,954-Speed 2494.99 samples/sec Loss 8.2552 LearningRate 0.000949 Epoch: 3 Global Step: 78750 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:26:15,153-Speed 2498.16 samples/sec Loss 8.2762 LearningRate 0.000949 Epoch: 3 Global Step: 78760 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:26:23,354-Speed 2497.80 samples/sec Loss 8.3800 LearningRate 0.000950 Epoch: 3 Global Step: 78770 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:26:31,555-Speed 2497.70 samples/sec Loss 8.1953 LearningRate 0.000950 Epoch: 3 Global Step: 78780 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:26:39,701-Speed 2514.39 samples/sec Loss 8.2993 LearningRate 0.000950 Epoch: 3 Global Step: 78790 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:26:47,901-Speed 2497.88 samples/sec Loss 8.2123 LearningRate 0.000950 Epoch: 3 Global Step: 78800 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:26:56,099-Speed 2498.55 samples/sec Loss 8.1656 LearningRate 0.000950 Epoch: 3 Global Step: 78810 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:27:04,303-Speed 2496.86 samples/sec Loss 8.2324 LearningRate 0.000950 Epoch: 3 Global Step: 78820 
Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:27:12,507-Speed 2496.80 samples/sec Loss 8.1326 LearningRate 0.000950 Epoch: 3 Global Step: 78830 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:27:20,712-Speed 2496.47 samples/sec Loss 8.2272 LearningRate 0.000950 Epoch: 3 Global Step: 78840 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:27:28,862-Speed 2513.53 samples/sec Loss 8.1841 LearningRate 0.000950 Epoch: 3 Global Step: 78850 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:27:37,065-Speed 2496.96 samples/sec Loss 8.1779 LearningRate 0.000951 Epoch: 3 Global Step: 78860 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:27:45,264-Speed 2498.21 samples/sec Loss 8.1412 LearningRate 0.000951 Epoch: 3 Global Step: 78870 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:27:53,462-Speed 2498.82 samples/sec Loss 8.1290 LearningRate 0.000951 Epoch: 3 Global Step: 78880 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:28:01,659-Speed 2498.71 samples/sec Loss 8.2053 LearningRate 0.000951 Epoch: 3 Global Step: 78890 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:28:09,860-Speed 2497.70 samples/sec Loss 8.2886 LearningRate 0.000951 Epoch: 3 Global Step: 78900 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:28:18,006-Speed 2514.43 samples/sec Loss 8.1921 LearningRate 0.000951 Epoch: 3 Global Step: 78910 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:28:26,209-Speed 2497.06 samples/sec Loss 8.1549 LearningRate 0.000951 Epoch: 3 Global Step: 78920 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:28:34,413-Speed 2497.03 samples/sec Loss 8.1646 LearningRate 0.000951 Epoch: 3 Global Step: 78930 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:28:42,615-Speed 2497.27 samples/sec Loss 8.2543 LearningRate 0.000952 Epoch: 3 Global Step: 78940 Fp16 Grad Scale: 
65536 Required: 172 hours Training: 2022-07-06 08:28:50,815-Speed 2497.99 samples/sec Loss 8.1485 LearningRate 0.000952 Epoch: 3 Global Step: 78950 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:28:59,013-Speed 2498.43 samples/sec Loss 8.1792 LearningRate 0.000952 Epoch: 3 Global Step: 78960 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:29:07,158-Speed 2514.74 samples/sec Loss 8.1655 LearningRate 0.000952 Epoch: 3 Global Step: 78970 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:29:15,359-Speed 2497.90 samples/sec Loss 8.3066 LearningRate 0.000952 Epoch: 3 Global Step: 78980 Fp16 Grad Scale: 65536 Required: 172 hours Training: 2022-07-06 08:29:23,560-Speed 2497.79 samples/sec Loss 8.0942 LearningRate 0.000952 Epoch: 3 Global Step: 78990 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:29:31,760-Speed 2497.90 samples/sec Loss 7.9648 LearningRate 0.000952 Epoch: 3 Global Step: 79000 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:29:39,994-Speed 2487.62 samples/sec Loss 8.0995 LearningRate 0.000952 Epoch: 3 Global Step: 79010 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:29:48,195-Speed 2497.91 samples/sec Loss 8.0110 LearningRate 0.000953 Epoch: 3 Global Step: 79020 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:29:56,346-Speed 2513.39 samples/sec Loss 8.0466 LearningRate 0.000953 Epoch: 3 Global Step: 79030 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:30:04,542-Speed 2499.12 samples/sec Loss 8.1894 LearningRate 0.000953 Epoch: 3 Global Step: 79040 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:30:12,744-Speed 2497.68 samples/sec Loss 8.2006 LearningRate 0.000953 Epoch: 3 Global Step: 79050 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:30:20,944-Speed 2497.90 samples/sec Loss 8.2169 LearningRate 0.000953 Epoch: 3 Global Step: 79060 Fp16 Grad Scale: 65536 Required: 171 
hours Training: 2022-07-06 08:30:29,147-Speed 2496.89 samples/sec Loss 8.1333 LearningRate 0.000953 Epoch: 3 Global Step: 79070 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:30:37,344-Speed 2498.91 samples/sec Loss 8.1114 LearningRate 0.000953 Epoch: 3 Global Step: 79080 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:30:45,491-Speed 2514.28 samples/sec Loss 8.1311 LearningRate 0.000953 Epoch: 3 Global Step: 79090 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:30:53,688-Speed 2498.82 samples/sec Loss 8.0357 LearningRate 0.000954 Epoch: 3 Global Step: 79100 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:31:01,888-Speed 2498.10 samples/sec Loss 8.0959 LearningRate 0.000954 Epoch: 3 Global Step: 79110 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:31:10,091-Speed 2496.98 samples/sec Loss 8.1791 LearningRate 0.000954 Epoch: 3 Global Step: 79120 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:31:18,293-Speed 2497.09 samples/sec Loss 8.3249 LearningRate 0.000954 Epoch: 3 Global Step: 79130 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:31:26,509-Speed 2493.13 samples/sec Loss 8.2024 LearningRate 0.000954 Epoch: 3 Global Step: 79140 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:31:34,655-Speed 2514.74 samples/sec Loss 8.0987 LearningRate 0.000954 Epoch: 3 Global Step: 79150 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:31:42,855-Speed 2497.80 samples/sec Loss 8.1056 LearningRate 0.000954 Epoch: 3 Global Step: 79160 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:31:51,056-Speed 2497.64 samples/sec Loss 8.1514 LearningRate 0.000954 Epoch: 3 Global Step: 79170 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:31:59,258-Speed 2498.51 samples/sec Loss 8.1175 LearningRate 0.000954 Epoch: 3 Global Step: 79180 Fp16 Grad Scale: 65536 Required: 171 hours Training: 
2022-07-06 08:32:07,459-Speed 2497.56 samples/sec Loss 8.1714 LearningRate 0.000955 Epoch: 3 Global Step: 79190 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:32:15,657-Speed 2498.21 samples/sec Loss 8.1344 LearningRate 0.000955 Epoch: 3 Global Step: 79200 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:32:23,803-Speed 2514.75 samples/sec Loss 8.1258 LearningRate 0.000955 Epoch: 3 Global Step: 79210 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:32:32,010-Speed 2495.69 samples/sec Loss 8.1047 LearningRate 0.000955 Epoch: 3 Global Step: 79220 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:32:40,211-Speed 2497.65 samples/sec Loss 8.0557 LearningRate 0.000955 Epoch: 3 Global Step: 79230 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:32:48,410-Speed 2498.40 samples/sec Loss 8.0772 LearningRate 0.000955 Epoch: 3 Global Step: 79240 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:32:56,609-Speed 2498.42 samples/sec Loss 8.1146 LearningRate 0.000955 Epoch: 3 Global Step: 79250 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:33:04,809-Speed 2497.78 samples/sec Loss 8.0885 LearningRate 0.000955 Epoch: 3 Global Step: 79260 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:33:12,954-Speed 2515.01 samples/sec Loss 8.1371 LearningRate 0.000956 Epoch: 3 Global Step: 79270 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:33:21,157-Speed 2497.22 samples/sec Loss 8.1787 LearningRate 0.000956 Epoch: 3 Global Step: 79280 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:33:29,358-Speed 2498.36 samples/sec Loss 8.0902 LearningRate 0.000956 Epoch: 3 Global Step: 79290 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:33:37,562-Speed 2496.69 samples/sec Loss 8.1368 LearningRate 0.000956 Epoch: 3 Global Step: 79300 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 
08:33:45,762-Speed 2497.90 samples/sec Loss 8.0505 LearningRate 0.000956 Epoch: 3 Global Step: 79310 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:33:53,961-Speed 2498.26 samples/sec Loss 8.1769 LearningRate 0.000956 Epoch: 3 Global Step: 79320 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:34:02,107-Speed 2514.53 samples/sec Loss 8.3307 LearningRate 0.000956 Epoch: 3 Global Step: 79330 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:34:10,306-Speed 2498.20 samples/sec Loss 8.1563 LearningRate 0.000956 Epoch: 3 Global Step: 79340 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:34:18,509-Speed 2497.13 samples/sec Loss 8.2353 LearningRate 0.000957 Epoch: 3 Global Step: 79350 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:34:26,709-Speed 2497.93 samples/sec Loss 8.2019 LearningRate 0.000957 Epoch: 3 Global Step: 79360 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:34:34,906-Speed 2499.05 samples/sec Loss 8.2259 LearningRate 0.000957 Epoch: 3 Global Step: 79370 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:34:43,101-Speed 2499.15 samples/sec Loss 8.0786 LearningRate 0.000957 Epoch: 3 Global Step: 79380 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:34:51,248-Speed 2514.11 samples/sec Loss 8.3167 LearningRate 0.000957 Epoch: 3 Global Step: 79390 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:34:59,448-Speed 2498.27 samples/sec Loss 8.2650 LearningRate 0.000957 Epoch: 3 Global Step: 79400 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:35:07,660-Speed 2494.28 samples/sec Loss 8.1762 LearningRate 0.000957 Epoch: 3 Global Step: 79410 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:35:15,861-Speed 2497.46 samples/sec Loss 8.2841 LearningRate 0.000957 Epoch: 3 Global Step: 79420 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:35:24,059-Speed 
2498.25 samples/sec Loss 8.2284 LearningRate 0.000957 Epoch: 3 Global Step: 79430 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:35:32,263-Speed 2496.98 samples/sec Loss 8.1814 LearningRate 0.000958 Epoch: 3 Global Step: 79440 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:35:40,408-Speed 2515.11 samples/sec Loss 8.2916 LearningRate 0.000958 Epoch: 3 Global Step: 79450 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:35:48,608-Speed 2497.90 samples/sec Loss 8.2623 LearningRate 0.000958 Epoch: 3 Global Step: 79460 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:35:56,808-Speed 2498.16 samples/sec Loss 8.3234 LearningRate 0.000958 Epoch: 3 Global Step: 79470 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:36:05,007-Speed 2498.61 samples/sec Loss 8.2275 LearningRate 0.000958 Epoch: 3 Global Step: 79480 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:36:13,205-Speed 2498.29 samples/sec Loss 8.1697 LearningRate 0.000958 Epoch: 3 Global Step: 79490 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:36:21,410-Speed 2496.60 samples/sec Loss 8.1634 LearningRate 0.000958 Epoch: 3 Global Step: 79500 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:36:29,560-Speed 2513.30 samples/sec Loss 8.3006 LearningRate 0.000958 Epoch: 3 Global Step: 79510 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:36:37,764-Speed 2496.99 samples/sec Loss 8.1967 LearningRate 0.000959 Epoch: 3 Global Step: 79520 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:36:45,964-Speed 2497.75 samples/sec Loss 8.2100 LearningRate 0.000959 Epoch: 3 Global Step: 79530 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:36:54,165-Speed 2497.83 samples/sec Loss 8.1467 LearningRate 0.000959 Epoch: 3 Global Step: 79540 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:37:02,363-Speed 2498.59 samples/sec 
Loss 8.2166 LearningRate 0.000959 Epoch: 3 Global Step: 79550 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:37:10,563-Speed 2497.91 samples/sec Loss 8.0084 LearningRate 0.000959 Epoch: 3 Global Step: 79560 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:37:18,725-Speed 2509.64 samples/sec Loss 8.2215 LearningRate 0.000959 Epoch: 3 Global Step: 79570 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:37:26,921-Speed 2499.04 samples/sec Loss 8.2011 LearningRate 0.000959 Epoch: 3 Global Step: 79580 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:37:35,121-Speed 2498.14 samples/sec Loss 8.1772 LearningRate 0.000959 Epoch: 3 Global Step: 79590 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:37:43,332-Speed 2494.46 samples/sec Loss 8.1880 LearningRate 0.000960 Epoch: 3 Global Step: 79600 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:37:51,532-Speed 2497.96 samples/sec Loss 8.2013 LearningRate 0.000960 Epoch: 3 Global Step: 79610 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:37:59,736-Speed 2496.65 samples/sec Loss 8.0145 LearningRate 0.000960 Epoch: 3 Global Step: 79620 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:38:07,884-Speed 2514.05 samples/sec Loss 8.2618 LearningRate 0.000960 Epoch: 3 Global Step: 79630 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:38:16,082-Speed 2498.54 samples/sec Loss 8.1253 LearningRate 0.000960 Epoch: 3 Global Step: 79640 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:38:24,308-Speed 2490.05 samples/sec Loss 8.0981 LearningRate 0.000960 Epoch: 3 Global Step: 79650 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:38:32,514-Speed 2496.30 samples/sec Loss 8.1933 LearningRate 0.000960 Epoch: 3 Global Step: 79660 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:38:40,715-Speed 2497.65 samples/sec Loss 8.0943 
LearningRate 0.000960 Epoch: 3 Global Step: 79670 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:38:48,913-Speed 2498.53 samples/sec Loss 8.1752 LearningRate 0.000960 Epoch: 3 Global Step: 79680 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:38:57,055-Speed 2515.70 samples/sec Loss 8.2620 LearningRate 0.000961 Epoch: 3 Global Step: 79690 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:39:05,256-Speed 2497.74 samples/sec Loss 8.0974 LearningRate 0.000961 Epoch: 3 Global Step: 79700 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:39:13,458-Speed 2497.32 samples/sec Loss 8.1544 LearningRate 0.000961 Epoch: 3 Global Step: 79710 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:39:21,668-Speed 2495.08 samples/sec Loss 8.1702 LearningRate 0.000961 Epoch: 3 Global Step: 79720 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:39:29,864-Speed 2498.97 samples/sec Loss 8.1897 LearningRate 0.000961 Epoch: 3 Global Step: 79730 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:39:38,063-Speed 2498.21 samples/sec Loss 8.1303 LearningRate 0.000961 Epoch: 3 Global Step: 79740 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:39:46,208-Speed 2515.12 samples/sec Loss 8.1253 LearningRate 0.000961 Epoch: 3 Global Step: 79750 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:39:54,423-Speed 2493.20 samples/sec Loss 8.1617 LearningRate 0.000961 Epoch: 3 Global Step: 79760 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:40:02,621-Speed 2498.40 samples/sec Loss 8.1734 LearningRate 0.000962 Epoch: 3 Global Step: 79770 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:40:10,823-Speed 2497.35 samples/sec Loss 8.1088 LearningRate 0.000962 Epoch: 3 Global Step: 79780 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:40:19,025-Speed 2497.44 samples/sec Loss 8.0813 
LearningRate 0.000962 Epoch: 3 Global Step: 79790 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:40:27,225-Speed 2497.88 samples/sec Loss 8.0969 LearningRate 0.000962 Epoch: 3 Global Step: 79800 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:40:35,381-Speed 2511.52 samples/sec Loss 8.1100 LearningRate 0.000962 Epoch: 3 Global Step: 79810 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:40:43,579-Speed 2498.54 samples/sec Loss 8.1091 LearningRate 0.000962 Epoch: 3 Global Step: 79820 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:40:51,781-Speed 2497.73 samples/sec Loss 8.1219 LearningRate 0.000962 Epoch: 3 Global Step: 79830 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:40:59,990-Speed 2495.10 samples/sec Loss 8.0294 LearningRate 0.000962 Epoch: 3 Global Step: 79840 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:41:08,192-Speed 2497.65 samples/sec Loss 8.1077 LearningRate 0.000963 Epoch: 3 Global Step: 79850 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:41:16,394-Speed 2497.23 samples/sec Loss 8.0648 LearningRate 0.000963 Epoch: 3 Global Step: 79860 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:41:24,541-Speed 2514.47 samples/sec Loss 8.1144 LearningRate 0.000963 Epoch: 3 Global Step: 79870 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:41:32,742-Speed 2497.46 samples/sec Loss 8.0519 LearningRate 0.000963 Epoch: 3 Global Step: 79880 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:41:40,944-Speed 2497.45 samples/sec Loss 8.1639 LearningRate 0.000963 Epoch: 3 Global Step: 79890 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:41:49,147-Speed 2497.06 samples/sec Loss 8.1660 LearningRate 0.000963 Epoch: 3 Global Step: 79900 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:41:57,346-Speed 2498.45 samples/sec Loss 8.1406 
LearningRate 0.000963 Epoch: 3 Global Step: 79910 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:42:05,545-Speed 2498.16 samples/sec Loss 8.0663 LearningRate 0.000963 Epoch: 3 Global Step: 79920 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:42:13,703-Speed 2510.77 samples/sec Loss 8.0185 LearningRate 0.000964 Epoch: 3 Global Step: 79930 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:42:21,903-Speed 2498.15 samples/sec Loss 8.1268 LearningRate 0.000964 Epoch: 3 Global Step: 79940 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:42:30,104-Speed 2497.69 samples/sec Loss 8.0880 LearningRate 0.000964 Epoch: 3 Global Step: 79950 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:42:38,304-Speed 2498.07 samples/sec Loss 8.0876 LearningRate 0.000964 Epoch: 3 Global Step: 79960 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:42:46,505-Speed 2497.72 samples/sec Loss 8.3118 LearningRate 0.000964 Epoch: 3 Global Step: 79970 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:42:54,709-Speed 2496.73 samples/sec Loss 8.1711 LearningRate 0.000964 Epoch: 3 Global Step: 79980 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:43:02,856-Speed 2514.02 samples/sec Loss 8.1183 LearningRate 0.000964 Epoch: 3 Global Step: 79990 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:43:11,081-Speed 2490.59 samples/sec Loss 8.0701 LearningRate 0.000964 Epoch: 3 Global Step: 80000 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:43:19,280-Speed 2498.06 samples/sec Loss 8.0340 LearningRate 0.000964 Epoch: 3 Global Step: 80010 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:43:27,494-Speed 2494.04 samples/sec Loss 8.0544 LearningRate 0.000965 Epoch: 3 Global Step: 80020 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:43:35,704-Speed 2494.99 samples/sec Loss 7.9762 
LearningRate 0.000965 Epoch: 3 Global Step: 80030 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:43:43,912-Speed 2495.22 samples/sec Loss 8.0492 LearningRate 0.000965 Epoch: 3 Global Step: 80040 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:43:52,055-Speed 2515.69 samples/sec Loss 8.1388 LearningRate 0.000965 Epoch: 3 Global Step: 80050 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:44:00,254-Speed 2498.23 samples/sec Loss 8.0775 LearningRate 0.000965 Epoch: 3 Global Step: 80060 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:44:08,454-Speed 2497.82 samples/sec Loss 8.0369 LearningRate 0.000965 Epoch: 3 Global Step: 80070 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:44:16,653-Speed 2498.13 samples/sec Loss 8.0708 LearningRate 0.000965 Epoch: 3 Global Step: 80080 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:44:24,853-Speed 2498.11 samples/sec Loss 7.9778 LearningRate 0.000965 Epoch: 3 Global Step: 80090 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:44:33,053-Speed 2497.97 samples/sec Loss 8.1849 LearningRate 0.000966 Epoch: 3 Global Step: 80100 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:44:41,199-Speed 2514.44 samples/sec Loss 8.0153 LearningRate 0.000966 Epoch: 3 Global Step: 80110 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:44:49,397-Speed 2498.61 samples/sec Loss 8.0401 LearningRate 0.000966 Epoch: 3 Global Step: 80120 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:44:57,596-Speed 2498.22 samples/sec Loss 8.0900 LearningRate 0.000966 Epoch: 3 Global Step: 80130 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:45:05,806-Speed 2495.01 samples/sec Loss 8.0326 LearningRate 0.000966 Epoch: 3 Global Step: 80140 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:45:14,031-Speed 2490.25 samples/sec Loss 8.0616 
LearningRate 0.000966 Epoch: 3 Global Step: 80150 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:45:22,231-Speed 2498.24 samples/sec Loss 8.0886 LearningRate 0.000966 Epoch: 3 Global Step: 80160 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:45:30,380-Speed 2513.62 samples/sec Loss 8.0206 LearningRate 0.000966 Epoch: 3 Global Step: 80170 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:45:38,578-Speed 2498.69 samples/sec Loss 8.0573 LearningRate 0.000967 Epoch: 3 Global Step: 80180 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:45:46,782-Speed 2497.02 samples/sec Loss 8.1090 LearningRate 0.000967 Epoch: 3 Global Step: 80190 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:45:54,979-Speed 2498.99 samples/sec Loss 8.0651 LearningRate 0.000967 Epoch: 3 Global Step: 80200 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:46:03,175-Speed 2499.01 samples/sec Loss 8.0757 LearningRate 0.000967 Epoch: 3 Global Step: 80210 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:46:11,374-Speed 2498.35 samples/sec Loss 7.9742 LearningRate 0.000967 Epoch: 3 Global Step: 80220 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:46:19,518-Speed 2514.98 samples/sec Loss 8.1284 LearningRate 0.000967 Epoch: 3 Global Step: 80230 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:46:27,719-Speed 2497.62 samples/sec Loss 7.9629 LearningRate 0.000967 Epoch: 3 Global Step: 80240 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:46:35,917-Speed 2498.46 samples/sec Loss 8.0218 LearningRate 0.000967 Epoch: 3 Global Step: 80250 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:46:44,117-Speed 2498.01 samples/sec Loss 8.0039 LearningRate 0.000967 Epoch: 3 Global Step: 80260 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:46:52,324-Speed 2495.88 samples/sec Loss 8.0416 
LearningRate 0.000968 Epoch: 3 Global Step: 80270 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:47:00,521-Speed 2498.84 samples/sec Loss 8.4165 LearningRate 0.000968 Epoch: 3 Global Step: 80280 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:47:08,665-Speed 2515.26 samples/sec Loss 8.3115 LearningRate 0.000968 Epoch: 3 Global Step: 80290 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:47:16,871-Speed 2496.15 samples/sec Loss 8.2848 LearningRate 0.000968 Epoch: 3 Global Step: 80300 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:47:25,074-Speed 2497.11 samples/sec Loss 8.2216 LearningRate 0.000968 Epoch: 3 Global Step: 80310 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:47:33,276-Speed 2497.59 samples/sec Loss 8.2588 LearningRate 0.000968 Epoch: 3 Global Step: 80320 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:47:41,488-Speed 2500.22 samples/sec Loss 8.2762 LearningRate 0.000968 Epoch: 3 Global Step: 80330 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:47:49,686-Speed 2498.68 samples/sec Loss 8.1981 LearningRate 0.000968 Epoch: 3 Global Step: 80340 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:47:57,830-Speed 2514.95 samples/sec Loss 8.1163 LearningRate 0.000969 Epoch: 3 Global Step: 80350 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:48:06,831-Speed 2500.85 samples/sec Loss 8.0981 LearningRate 0.000969 Epoch: 3 Global Step: 80360 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:48:15,258-Speed 2501.56 samples/sec Loss 8.1111 LearningRate 0.000969 Epoch: 3 Global Step: 80370 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:48:23,463-Speed 2496.27 samples/sec Loss 7.9949 LearningRate 0.000969 Epoch: 3 Global Step: 80380 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:48:33,174-Speed 2499.02 samples/sec Loss 8.0081 
LearningRate 0.000969 Epoch: 3 Global Step: 80390 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:48:41,403-Speed 2499.08 samples/sec Loss 8.0510 LearningRate 0.000969 Epoch: 3 Global Step: 80400 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:48:49,580-Speed 2514.32 samples/sec Loss 7.9656 LearningRate 0.000969 Epoch: 3 Global Step: 80410 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:48:59,771-Speed 2499.95 samples/sec Loss 7.8939 LearningRate 0.000969 Epoch: 3 Global Step: 80420 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:49:08,000-Speed 2499.88 samples/sec Loss 8.0106 LearningRate 0.000970 Epoch: 3 Global Step: 80430 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:49:16,226-Speed 2499.34 samples/sec Loss 7.9863 LearningRate 0.000970 Epoch: 3 Global Step: 80440 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:49:24,427-Speed 2497.53 samples/sec Loss 7.9155 LearningRate 0.000970 Epoch: 3 Global Step: 80450 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:49:37,631-Speed 2501.95 samples/sec Loss 8.0499 LearningRate 0.000970 Epoch: 3 Global Step: 80460 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:49:45,794-Speed 2517.76 samples/sec Loss 8.0953 LearningRate 0.000970 Epoch: 3 Global Step: 80470 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:49:54,002-Speed 2495.35 samples/sec Loss 8.1085 LearningRate 0.000970 Epoch: 3 Global Step: 80480 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:50:02,649-Speed 2500.09 samples/sec Loss 8.2196 LearningRate 0.000970 Epoch: 3 Global Step: 80490 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:50:12,866-Speed 2009.04 samples/sec Loss 8.0054 LearningRate 0.000970 Epoch: 3 Global Step: 80500 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:50:21,089-Speed 2498.59 samples/sec Loss 8.1409 
LearningRate 0.000971 Epoch: 3 Global Step: 80510 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:50:29,296-Speed 2495.70 samples/sec Loss 8.1624 LearningRate 0.000971 Epoch: 3 Global Step: 80520 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:50:37,477-Speed 2514.58 samples/sec Loss 8.0346 LearningRate 0.000971 Epoch: 3 Global Step: 80530 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:50:49,097-Speed 2498.55 samples/sec Loss 8.1978 LearningRate 0.000971 Epoch: 3 Global Step: 80540 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:50:57,334-Speed 2499.36 samples/sec Loss 8.2078 LearningRate 0.000971 Epoch: 3 Global Step: 80550 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:51:05,541-Speed 2495.49 samples/sec Loss 8.1048 LearningRate 0.000971 Epoch: 3 Global Step: 80560 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:51:13,763-Speed 2498.97 samples/sec Loss 8.0578 LearningRate 0.000971 Epoch: 3 Global Step: 80570 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:51:21,992-Speed 2497.83 samples/sec Loss 8.0535 LearningRate 0.000971 Epoch: 3 Global Step: 80580 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:51:30,141-Speed 2513.49 samples/sec Loss 8.1821 LearningRate 0.000971 Epoch: 3 Global Step: 80590 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 08:51:38,493-Speed 2513.43 samples/sec Loss 8.1247 LearningRate 0.000972 Epoch: 3 Global Step: 80600 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:51:46,710-Speed 2500.29 samples/sec Loss 8.0974 LearningRate 0.000972 Epoch: 3 Global Step: 80610 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:51:54,952-Speed 2500.06 samples/sec Loss 8.0246 LearningRate 0.000972 Epoch: 3 Global Step: 80620 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:52:08,556-Speed 1505.46 samples/sec Loss 8.0444 
LearningRate 0.000972 Epoch: 3 Global Step: 80630 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:52:16,750-Speed 2500.45 samples/sec Loss 8.0910 LearningRate 0.000972 Epoch: 3 Global Step: 80640 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:52:24,901-Speed 2514.34 samples/sec Loss 7.9670 LearningRate 0.000972 Epoch: 3 Global Step: 80650 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:52:33,106-Speed 2496.35 samples/sec Loss 8.0386 LearningRate 0.000972 Epoch: 3 Global Step: 80660 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:52:46,680-Speed 2498.49 samples/sec Loss 7.9874 LearningRate 0.000972 Epoch: 3 Global Step: 80670 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:52:54,890-Speed 2495.23 samples/sec Loss 8.0648 LearningRate 0.000973 Epoch: 3 Global Step: 80680 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:53:03,100-Speed 2494.56 samples/sec Loss 8.2196 LearningRate 0.000973 Epoch: 3 Global Step: 80690 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:53:11,316-Speed 2493.20 samples/sec Loss 8.0457 LearningRate 0.000973 Epoch: 3 Global Step: 80700 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:53:19,489-Speed 2506.27 samples/sec Loss 8.0667 LearningRate 0.000973 Epoch: 3 Global Step: 80710 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:53:27,705-Speed 2493.18 samples/sec Loss 8.0837 LearningRate 0.000973 Epoch: 3 Global Step: 80720 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:53:35,919-Speed 2493.64 samples/sec Loss 8.0724 LearningRate 0.000973 Epoch: 3 Global Step: 80730 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:53:44,130-Speed 2494.57 samples/sec Loss 8.0203 LearningRate 0.000973 Epoch: 3 Global Step: 80740 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:53:52,338-Speed 2495.57 samples/sec Loss 8.0378 LearningRate 
0.000973 Epoch: 3 Global Step: 80750 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:54:00,545-Speed 2495.78 samples/sec Loss 8.0573 LearningRate 0.000974 Epoch: 3 Global Step: 80760 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:54:08,697-Speed 2512.76 samples/sec Loss 8.0490 LearningRate 0.000974 Epoch: 3 Global Step: 80770 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:54:16,902-Speed 2496.43 samples/sec Loss 8.0608 LearningRate 0.000974 Epoch: 3 Global Step: 80780 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:54:25,105-Speed 2496.87 samples/sec Loss 8.1449 LearningRate 0.000974 Epoch: 3 Global Step: 80790 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:54:33,310-Speed 2496.63 samples/sec Loss 8.1094 LearningRate 0.000974 Epoch: 3 Global Step: 80800 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:54:41,517-Speed 2495.85 samples/sec Loss 7.9553 LearningRate 0.000974 Epoch: 3 Global Step: 80810 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:54:49,726-Speed 2495.17 samples/sec Loss 7.9479 LearningRate 0.000974 Epoch: 3 Global Step: 80820 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:54:57,879-Speed 2512.22 samples/sec Loss 8.0696 LearningRate 0.000974 Epoch: 3 Global Step: 80830 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:55:06,091-Speed 2494.56 samples/sec Loss 7.9514 LearningRate 0.000974 Epoch: 3 Global Step: 80840 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:55:14,303-Speed 2494.34 samples/sec Loss 8.0475 LearningRate 0.000975 Epoch: 3 Global Step: 80850 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:55:22,522-Speed 2492.12 samples/sec Loss 8.1067 LearningRate 0.000975 Epoch: 3 Global Step: 80860 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:55:30,728-Speed 2496.27 samples/sec Loss 8.3222 LearningRate 0.000975 Epoch: 3 
Global Step: 80870 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:55:38,934-Speed 2495.76 samples/sec Loss 8.1699 LearningRate 0.000975 Epoch: 3 Global Step: 80880 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:55:47,089-Speed 2511.69 samples/sec Loss 8.1340 LearningRate 0.000975 Epoch: 3 Global Step: 80890 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:55:55,296-Speed 2495.84 samples/sec Loss 8.1083 LearningRate 0.000975 Epoch: 3 Global Step: 80900 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:56:03,510-Speed 2493.64 samples/sec Loss 8.0929 LearningRate 0.000975 Epoch: 3 Global Step: 80910 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:56:11,728-Speed 2492.47 samples/sec Loss 8.0230 LearningRate 0.000975 Epoch: 3 Global Step: 80920 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:56:19,935-Speed 2495.95 samples/sec Loss 7.9452 LearningRate 0.000976 Epoch: 3 Global Step: 80930 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:56:28,148-Speed 2494.02 samples/sec Loss 8.0268 LearningRate 0.000976 Epoch: 3 Global Step: 80940 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:56:36,306-Speed 2510.64 samples/sec Loss 8.0215 LearningRate 0.000976 Epoch: 3 Global Step: 80950 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:56:44,522-Speed 2493.15 samples/sec Loss 8.1255 LearningRate 0.000976 Epoch: 3 Global Step: 80960 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:56:52,733-Speed 2494.46 samples/sec Loss 8.1128 LearningRate 0.000976 Epoch: 3 Global Step: 80970 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:57:00,940-Speed 2496.01 samples/sec Loss 8.2198 LearningRate 0.000976 Epoch: 3 Global Step: 80980 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:57:09,145-Speed 2496.27 samples/sec Loss 8.0535 LearningRate 0.000976 Epoch: 3 Global Step: 80990 
Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:57:17,350-Speed 2496.47 samples/sec Loss 8.1554 LearningRate 0.000976 Epoch: 3 Global Step: 81000 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:57:25,504-Speed 2511.82 samples/sec Loss 7.9496 LearningRate 0.000977 Epoch: 3 Global Step: 81010 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:57:33,718-Speed 2493.88 samples/sec Loss 7.9672 LearningRate 0.000977 Epoch: 3 Global Step: 81020 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:57:41,951-Speed 2487.70 samples/sec Loss 8.1334 LearningRate 0.000977 Epoch: 3 Global Step: 81030 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:57:50,163-Speed 2494.39 samples/sec Loss 8.0441 LearningRate 0.000977 Epoch: 3 Global Step: 81040 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:57:58,372-Speed 2495.33 samples/sec Loss 8.0341 LearningRate 0.000977 Epoch: 3 Global Step: 81050 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:58:06,577-Speed 2496.24 samples/sec Loss 8.0074 LearningRate 0.000977 Epoch: 3 Global Step: 81060 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:58:14,731-Speed 2512.33 samples/sec Loss 8.0339 LearningRate 0.000977 Epoch: 3 Global Step: 81070 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:58:22,935-Speed 2496.59 samples/sec Loss 7.9726 LearningRate 0.000977 Epoch: 3 Global Step: 81080 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:58:31,145-Speed 2494.70 samples/sec Loss 8.0849 LearningRate 0.000977 Epoch: 3 Global Step: 81090 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:58:39,352-Speed 2495.85 samples/sec Loss 8.0137 LearningRate 0.000978 Epoch: 3 Global Step: 81100 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:58:47,557-Speed 2496.51 samples/sec Loss 7.9850 LearningRate 0.000978 Epoch: 3 Global Step: 81110 Fp16 Grad Scale: 
65536 Required: 171 hours Training: 2022-07-06 08:58:55,763-Speed 2496.11 samples/sec Loss 7.9466 LearningRate 0.000978 Epoch: 3 Global Step: 81120 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:59:03,915-Speed 2512.78 samples/sec Loss 7.9285 LearningRate 0.000978 Epoch: 3 Global Step: 81130 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:59:12,138-Speed 2491.30 samples/sec Loss 7.8748 LearningRate 0.000978 Epoch: 3 Global Step: 81140 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:59:20,359-Speed 2491.42 samples/sec Loss 7.8970 LearningRate 0.000978 Epoch: 3 Global Step: 81150 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:59:28,563-Speed 2496.59 samples/sec Loss 7.9050 LearningRate 0.000978 Epoch: 3 Global Step: 81160 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:59:36,775-Speed 2494.46 samples/sec Loss 7.9049 LearningRate 0.000978 Epoch: 3 Global Step: 81170 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:59:44,981-Speed 2495.97 samples/sec Loss 7.9066 LearningRate 0.000979 Epoch: 3 Global Step: 81180 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 08:59:53,134-Speed 2512.28 samples/sec Loss 8.2171 LearningRate 0.000979 Epoch: 3 Global Step: 81190 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:00:01,342-Speed 2495.73 samples/sec Loss 8.0582 LearningRate 0.000979 Epoch: 3 Global Step: 81200 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:00:09,547-Speed 2496.41 samples/sec Loss 7.9864 LearningRate 0.000979 Epoch: 3 Global Step: 81210 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:00:17,751-Speed 2496.75 samples/sec Loss 7.9961 LearningRate 0.000979 Epoch: 3 Global Step: 81220 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:00:25,957-Speed 2496.02 samples/sec Loss 8.0173 LearningRate 0.000979 Epoch: 3 Global Step: 81230 Fp16 Grad Scale: 65536 Required: 171 
hours Training: 2022-07-06 09:00:34,167-Speed 2494.83 samples/sec Loss 8.1574 LearningRate 0.000979 Epoch: 3 Global Step: 81240 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:00:42,335-Speed 2507.72 samples/sec Loss 8.0981 LearningRate 0.000979 Epoch: 3 Global Step: 81250 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:00:50,539-Speed 2496.82 samples/sec Loss 8.1159 LearningRate 0.000980 Epoch: 3 Global Step: 81260 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:00:58,746-Speed 2495.92 samples/sec Loss 8.0458 LearningRate 0.000980 Epoch: 3 Global Step: 81270 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:01:06,951-Speed 2496.30 samples/sec Loss 7.9635 LearningRate 0.000980 Epoch: 3 Global Step: 81280 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:01:15,159-Speed 2495.79 samples/sec Loss 8.0408 LearningRate 0.000980 Epoch: 3 Global Step: 81290 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:01:23,369-Speed 2495.03 samples/sec Loss 8.1461 LearningRate 0.000980 Epoch: 3 Global Step: 81300 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:01:31,524-Speed 2511.71 samples/sec Loss 8.0504 LearningRate 0.000980 Epoch: 3 Global Step: 81310 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:01:39,744-Speed 2491.67 samples/sec Loss 8.0487 LearningRate 0.000980 Epoch: 3 Global Step: 81320 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:01:47,949-Speed 2496.51 samples/sec Loss 8.0956 LearningRate 0.000980 Epoch: 3 Global Step: 81330 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:01:56,155-Speed 2496.15 samples/sec Loss 8.1224 LearningRate 0.000981 Epoch: 3 Global Step: 81340 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:02:04,360-Speed 2496.48 samples/sec Loss 8.1035 LearningRate 0.000981 Epoch: 3 Global Step: 81350 Fp16 Grad Scale: 65536 Required: 171 hours Training: 
2022-07-06 09:02:12,565-Speed 2496.27 samples/sec Loss 8.1244 LearningRate 0.000981 Epoch: 3 Global Step: 81360 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:02:20,718-Speed 2512.50 samples/sec Loss 8.0473 LearningRate 0.000981 Epoch: 3 Global Step: 81370 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:02:28,922-Speed 2496.57 samples/sec Loss 7.9536 LearningRate 0.000981 Epoch: 3 Global Step: 81380 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:02:37,125-Speed 2497.11 samples/sec Loss 7.9851 LearningRate 0.000981 Epoch: 3 Global Step: 81390 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:02:45,328-Speed 2497.02 samples/sec Loss 7.9967 LearningRate 0.000981 Epoch: 3 Global Step: 81400 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:02:53,531-Speed 2497.06 samples/sec Loss 8.0454 LearningRate 0.000981 Epoch: 3 Global Step: 81410 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:03:01,736-Speed 2496.64 samples/sec Loss 7.9537 LearningRate 0.000981 Epoch: 3 Global Step: 81420 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:03:09,886-Speed 2513.15 samples/sec Loss 8.0002 LearningRate 0.000982 Epoch: 3 Global Step: 81430 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:03:18,091-Speed 2496.70 samples/sec Loss 8.1263 LearningRate 0.000982 Epoch: 3 Global Step: 81440 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:03:26,297-Speed 2496.06 samples/sec Loss 8.0024 LearningRate 0.000982 Epoch: 3 Global Step: 81450 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:03:34,501-Speed 2496.78 samples/sec Loss 7.9433 LearningRate 0.000982 Epoch: 3 Global Step: 81460 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:03:42,702-Speed 2497.79 samples/sec Loss 8.0437 LearningRate 0.000982 Epoch: 3 Global Step: 81470 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 
09:03:50,908-Speed 2496.60 samples/sec Loss 8.0356 LearningRate 0.000982 Epoch: 3 Global Step: 81480 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:03:59,058-Speed 2513.22 samples/sec Loss 8.0416 LearningRate 0.000982 Epoch: 3 Global Step: 81490 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:04:07,264-Speed 2496.08 samples/sec Loss 8.0002 LearningRate 0.000982 Epoch: 3 Global Step: 81500 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:04:15,467-Speed 2497.66 samples/sec Loss 7.9368 LearningRate 0.000983 Epoch: 3 Global Step: 81510 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:04:23,671-Speed 2496.87 samples/sec Loss 8.0267 LearningRate 0.000983 Epoch: 3 Global Step: 81520 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:04:31,876-Speed 2496.34 samples/sec Loss 8.1328 LearningRate 0.000983 Epoch: 3 Global Step: 81530 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:04:40,080-Speed 2496.92 samples/sec Loss 7.9666 LearningRate 0.000983 Epoch: 3 Global Step: 81540 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:04:48,234-Speed 2512.15 samples/sec Loss 8.1362 LearningRate 0.000983 Epoch: 3 Global Step: 81550 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:04:56,442-Speed 2495.53 samples/sec Loss 8.0337 LearningRate 0.000983 Epoch: 3 Global Step: 81560 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:05:04,647-Speed 2496.84 samples/sec Loss 7.9577 LearningRate 0.000983 Epoch: 3 Global Step: 81570 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:05:12,848-Speed 2497.52 samples/sec Loss 7.9904 LearningRate 0.000983 Epoch: 3 Global Step: 81580 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:05:21,053-Speed 2496.19 samples/sec Loss 8.0717 LearningRate 0.000984 Epoch: 3 Global Step: 81590 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:05:29,260-Speed 
2496.10 samples/sec Loss 8.0653 LearningRate 0.000984 Epoch: 3 Global Step: 81600 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:05:37,409-Speed 2513.62 samples/sec Loss 8.0346 LearningRate 0.000984 Epoch: 3 Global Step: 81610 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:05:45,619-Speed 2494.86 samples/sec Loss 8.1131 LearningRate 0.000984 Epoch: 3 Global Step: 81620 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:05:53,822-Speed 2497.11 samples/sec Loss 8.1381 LearningRate 0.000984 Epoch: 3 Global Step: 81630 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:06:02,028-Speed 2496.27 samples/sec Loss 8.0064 LearningRate 0.000984 Epoch: 3 Global Step: 81640 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:06:10,233-Speed 2496.57 samples/sec Loss 8.0570 LearningRate 0.000984 Epoch: 3 Global Step: 81650 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:06:18,436-Speed 2496.96 samples/sec Loss 7.9879 LearningRate 0.000984 Epoch: 3 Global Step: 81660 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:06:26,587-Speed 2512.83 samples/sec Loss 8.0971 LearningRate 0.000984 Epoch: 3 Global Step: 81670 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:06:34,794-Speed 2495.92 samples/sec Loss 8.0686 LearningRate 0.000985 Epoch: 3 Global Step: 81680 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:06:42,995-Speed 2497.62 samples/sec Loss 8.0080 LearningRate 0.000985 Epoch: 3 Global Step: 81690 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:06:51,197-Speed 2497.11 samples/sec Loss 8.0432 LearningRate 0.000985 Epoch: 3 Global Step: 81700 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:06:59,399-Speed 2497.31 samples/sec Loss 7.9306 LearningRate 0.000985 Epoch: 3 Global Step: 81710 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:07:07,602-Speed 2497.62 samples/sec 
Loss 8.0852 LearningRate 0.000985 Epoch: 3 Global Step: 81720 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:07:15,752-Speed 2513.01 samples/sec Loss 7.8984 LearningRate 0.000985 Epoch: 3 Global Step: 81730 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:07:23,957-Speed 2496.65 samples/sec Loss 7.9462 LearningRate 0.000985 Epoch: 3 Global Step: 81740 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:07:32,162-Speed 2496.48 samples/sec Loss 8.0259 LearningRate 0.000985 Epoch: 3 Global Step: 81750 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:07:40,366-Speed 2496.68 samples/sec Loss 7.9619 LearningRate 0.000986 Epoch: 3 Global Step: 81760 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:07:48,571-Speed 2496.48 samples/sec Loss 7.9922 LearningRate 0.000986 Epoch: 3 Global Step: 81770 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:07:56,777-Speed 2496.12 samples/sec Loss 7.9056 LearningRate 0.000986 Epoch: 3 Global Step: 81780 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:08:04,926-Speed 2513.49 samples/sec Loss 7.9276 LearningRate 0.000986 Epoch: 3 Global Step: 81790 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:08:13,128-Speed 2497.73 samples/sec Loss 7.8327 LearningRate 0.000986 Epoch: 3 Global Step: 81800 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:08:21,288-Speed 2510.19 samples/sec Loss 7.8261 LearningRate 0.000986 Epoch: 3 Global Step: 81810 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:08:29,491-Speed 2496.92 samples/sec Loss 7.7872 LearningRate 0.000986 Epoch: 3 Global Step: 81820 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:08:37,694-Speed 2497.10 samples/sec Loss 7.8757 LearningRate 0.000986 Epoch: 3 Global Step: 81830 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:08:45,909-Speed 2493.70 samples/sec Loss 7.8178 
LearningRate 0.000987 Epoch: 3 Global Step: 81840 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:08:54,059-Speed 2513.32 samples/sec Loss 7.8555 LearningRate 0.000987 Epoch: 3 Global Step: 81850 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:09:02,266-Speed 2495.94 samples/sec Loss 8.0331 LearningRate 0.000987 Epoch: 3 Global Step: 81860 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:09:10,473-Speed 2495.77 samples/sec Loss 7.9678 LearningRate 0.000987 Epoch: 3 Global Step: 81870 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:09:18,682-Speed 2495.28 samples/sec Loss 8.0310 LearningRate 0.000987 Epoch: 3 Global Step: 81880 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:09:26,885-Speed 2497.04 samples/sec Loss 8.0787 LearningRate 0.000987 Epoch: 3 Global Step: 81890 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:09:35,089-Speed 2496.64 samples/sec Loss 8.0128 LearningRate 0.000987 Epoch: 3 Global Step: 81900 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:09:43,254-Speed 2508.89 samples/sec Loss 7.9686 LearningRate 0.000987 Epoch: 3 Global Step: 81910 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:09:51,458-Speed 2496.59 samples/sec Loss 8.0879 LearningRate 0.000987 Epoch: 3 Global Step: 81920 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:09:59,663-Speed 2496.32 samples/sec Loss 8.0169 LearningRate 0.000988 Epoch: 3 Global Step: 81930 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:10:07,870-Speed 2496.20 samples/sec Loss 8.0284 LearningRate 0.000988 Epoch: 3 Global Step: 81940 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:10:16,073-Speed 2496.90 samples/sec Loss 8.0617 LearningRate 0.000988 Epoch: 3 Global Step: 81950 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:10:24,275-Speed 2497.16 samples/sec Loss 7.9727 LearningRate 
0.000988 Epoch: 3 Global Step: 81960 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:10:32,426-Speed 2513.04 samples/sec Loss 7.8605 LearningRate 0.000988 Epoch: 3 Global Step: 81970 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:10:40,629-Speed 2496.97 samples/sec Loss 7.9354 LearningRate 0.000988 Epoch: 3 Global Step: 81980 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:10:48,836-Speed 2495.85 samples/sec Loss 7.8562 LearningRate 0.000988 Epoch: 3 Global Step: 81990 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:10:57,044-Speed 2495.61 samples/sec Loss 7.8756 LearningRate 0.000988 Epoch: 3 Global Step: 82000 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:11:05,249-Speed 2496.38 samples/sec Loss 7.9123 LearningRate 0.000989 Epoch: 3 Global Step: 82010 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:11:13,460-Speed 2494.42 samples/sec Loss 7.8883 LearningRate 0.000989 Epoch: 3 Global Step: 82020 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:11:21,611-Speed 2513.04 samples/sec Loss 8.2170 LearningRate 0.000989 Epoch: 3 Global Step: 82030 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:11:29,817-Speed 2496.00 samples/sec Loss 8.0595 LearningRate 0.000989 Epoch: 3 Global Step: 82040 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:11:38,020-Speed 2497.26 samples/sec Loss 8.0257 LearningRate 0.000989 Epoch: 3 Global Step: 82050 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:11:46,225-Speed 2496.43 samples/sec Loss 8.0925 LearningRate 0.000989 Epoch: 3 Global Step: 82060 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:11:54,430-Speed 2496.35 samples/sec Loss 8.0929 LearningRate 0.000989 Epoch: 3 Global Step: 82070 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:12:02,636-Speed 2495.99 samples/sec Loss 8.1719 LearningRate 0.000989 Epoch: 3 
Global Step: 82080 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:12:10,785-Speed 2513.87 samples/sec Loss 7.9319 LearningRate 0.000990 Epoch: 3 Global Step: 82090 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:12:19,000-Speed 2493.40 samples/sec Loss 8.0777 LearningRate 0.000990 Epoch: 3 Global Step: 82100 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:12:27,199-Speed 2498.39 samples/sec Loss 8.0495 LearningRate 0.000990 Epoch: 3 Global Step: 82110 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:12:35,405-Speed 2496.07 samples/sec Loss 8.0054 LearningRate 0.000990 Epoch: 3 Global Step: 82120 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:12:43,610-Speed 2496.18 samples/sec Loss 7.9413 LearningRate 0.000990 Epoch: 3 Global Step: 82130 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:12:51,816-Speed 2496.41 samples/sec Loss 8.0018 LearningRate 0.000990 Epoch: 3 Global Step: 82140 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:12:59,975-Speed 2510.58 samples/sec Loss 7.9979 LearningRate 0.000990 Epoch: 3 Global Step: 82150 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:13:08,175-Speed 2497.70 samples/sec Loss 7.9669 LearningRate 0.000990 Epoch: 3 Global Step: 82160 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:13:16,378-Speed 2497.09 samples/sec Loss 7.9617 LearningRate 0.000991 Epoch: 3 Global Step: 82170 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:13:24,585-Speed 2496.01 samples/sec Loss 8.0066 LearningRate 0.000991 Epoch: 3 Global Step: 82180 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:13:32,798-Speed 2493.86 samples/sec Loss 7.9348 LearningRate 0.000991 Epoch: 3 Global Step: 82190 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:13:41,017-Speed 2492.33 samples/sec Loss 7.9147 LearningRate 0.000991 Epoch: 3 Global Step: 82200 
Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:13:49,167-Speed 2512.97 samples/sec Loss 7.9233 LearningRate 0.000991 Epoch: 3 Global Step: 82210 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:13:57,370-Speed 2497.16 samples/sec Loss 7.8171 LearningRate 0.000991 Epoch: 3 Global Step: 82220 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:14:05,571-Speed 2497.81 samples/sec Loss 7.9034 LearningRate 0.000991 Epoch: 3 Global Step: 82230 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:14:13,773-Speed 2497.06 samples/sec Loss 7.9044 LearningRate 0.000991 Epoch: 3 Global Step: 82240 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:14:21,976-Speed 2497.18 samples/sec Loss 7.8995 LearningRate 0.000991 Epoch: 3 Global Step: 82250 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:14:30,179-Speed 2496.91 samples/sec Loss 7.7592 LearningRate 0.000992 Epoch: 3 Global Step: 82260 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:14:38,333-Speed 2512.21 samples/sec Loss 8.0057 LearningRate 0.000992 Epoch: 3 Global Step: 82270 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:14:46,539-Speed 2496.41 samples/sec Loss 7.8922 LearningRate 0.000992 Epoch: 3 Global Step: 82280 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:14:54,741-Speed 2497.18 samples/sec Loss 7.8222 LearningRate 0.000992 Epoch: 3 Global Step: 82290 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:15:02,945-Speed 2496.89 samples/sec Loss 7.7909 LearningRate 0.000992 Epoch: 3 Global Step: 82300 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:15:11,151-Speed 2495.76 samples/sec Loss 7.7879 LearningRate 0.000992 Epoch: 3 Global Step: 82310 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:15:19,366-Speed 2493.44 samples/sec Loss 7.8825 LearningRate 0.000992 Epoch: 3 Global Step: 82320 Fp16 Grad Scale: 
65536 Required: 171 hours Training: 2022-07-06 09:15:27,518-Speed 2512.68 samples/sec Loss 7.9948 LearningRate 0.000992 Epoch: 3 Global Step: 82330 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:15:35,722-Speed 2496.74 samples/sec Loss 8.0158 LearningRate 0.000993 Epoch: 3 Global Step: 82340 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:15:43,926-Speed 2496.69 samples/sec Loss 7.8950 LearningRate 0.000993 Epoch: 3 Global Step: 82350 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:15:52,130-Speed 2497.07 samples/sec Loss 7.9337 LearningRate 0.000993 Epoch: 3 Global Step: 82360 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:16:00,332-Speed 2497.21 samples/sec Loss 7.9718 LearningRate 0.000993 Epoch: 3 Global Step: 82370 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:16:08,535-Speed 2497.01 samples/sec Loss 7.9144 LearningRate 0.000993 Epoch: 3 Global Step: 82380 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:16:16,688-Speed 2512.41 samples/sec Loss 7.9594 LearningRate 0.000993 Epoch: 3 Global Step: 82390 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:16:24,896-Speed 2495.56 samples/sec Loss 7.8686 LearningRate 0.000993 Epoch: 3 Global Step: 82400 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:16:33,101-Speed 2496.70 samples/sec Loss 7.9359 LearningRate 0.000993 Epoch: 3 Global Step: 82410 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:16:41,307-Speed 2496.02 samples/sec Loss 7.8368 LearningRate 0.000994 Epoch: 3 Global Step: 82420 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:16:49,508-Speed 2497.44 samples/sec Loss 7.8809 LearningRate 0.000994 Epoch: 3 Global Step: 82430 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:16:57,715-Speed 2495.96 samples/sec Loss 7.9705 LearningRate 0.000994 Epoch: 3 Global Step: 82440 Fp16 Grad Scale: 65536 Required: 171 
hours Training: 2022-07-06 09:17:05,863-Speed 2514.03 samples/sec Loss 7.8817 LearningRate 0.000994 Epoch: 3 Global Step: 82450 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:17:14,068-Speed 2496.33 samples/sec Loss 7.9144 LearningRate 0.000994 Epoch: 3 Global Step: 82460 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:17:22,276-Speed 2495.42 samples/sec Loss 7.8747 LearningRate 0.000994 Epoch: 3 Global Step: 82470 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:17:30,482-Speed 2496.43 samples/sec Loss 7.8960 LearningRate 0.000994 Epoch: 3 Global Step: 82480 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:17:38,686-Speed 2496.51 samples/sec Loss 7.9322 LearningRate 0.000994 Epoch: 3 Global Step: 82490 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:17:46,896-Speed 2495.15 samples/sec Loss 8.0765 LearningRate 0.000994 Epoch: 3 Global Step: 82500 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:17:55,043-Speed 2514.34 samples/sec Loss 8.0797 LearningRate 0.000995 Epoch: 3 Global Step: 82510 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:18:03,259-Speed 2493.05 samples/sec Loss 7.9544 LearningRate 0.000995 Epoch: 3 Global Step: 82520 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:18:11,462-Speed 2496.75 samples/sec Loss 7.9112 LearningRate 0.000995 Epoch: 3 Global Step: 82530 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:18:19,670-Speed 2495.80 samples/sec Loss 7.8273 LearningRate 0.000995 Epoch: 3 Global Step: 82540 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:18:27,873-Speed 2496.86 samples/sec Loss 7.8905 LearningRate 0.000995 Epoch: 3 Global Step: 82550 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:18:36,088-Speed 2493.26 samples/sec Loss 7.8891 LearningRate 0.000995 Epoch: 3 Global Step: 82560 Fp16 Grad Scale: 65536 Required: 171 hours Training: 
2022-07-06 09:18:44,237-Speed 2513.64 samples/sec Loss 7.9056 LearningRate 0.000995 Epoch: 3 Global Step: 82570 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:18:52,443-Speed 2496.07 samples/sec Loss 7.8171 LearningRate 0.000995 Epoch: 3 Global Step: 82580 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:19:00,654-Speed 2494.61 samples/sec Loss 7.8332 LearningRate 0.000996 Epoch: 3 Global Step: 82590 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:19:08,864-Speed 2495.38 samples/sec Loss 7.9437 LearningRate 0.000996 Epoch: 3 Global Step: 82600 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:19:17,067-Speed 2496.83 samples/sec Loss 7.9283 LearningRate 0.000996 Epoch: 3 Global Step: 82610 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:19:25,281-Speed 2493.67 samples/sec Loss 8.0273 LearningRate 0.000996 Epoch: 3 Global Step: 82620 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:19:33,431-Speed 2513.33 samples/sec Loss 7.8656 LearningRate 0.000996 Epoch: 3 Global Step: 82630 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:19:41,635-Speed 2497.07 samples/sec Loss 7.9042 LearningRate 0.000996 Epoch: 3 Global Step: 82640 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:19:49,836-Speed 2497.46 samples/sec Loss 7.8438 LearningRate 0.000996 Epoch: 3 Global Step: 82650 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:19:58,042-Speed 2496.02 samples/sec Loss 7.9046 LearningRate 0.000996 Epoch: 3 Global Step: 82660 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:20:06,244-Speed 2497.48 samples/sec Loss 7.8599 LearningRate 0.000997 Epoch: 3 Global Step: 82670 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:20:14,447-Speed 2497.15 samples/sec Loss 7.9911 LearningRate 0.000997 Epoch: 3 Global Step: 82680 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 
09:20:22,598-Speed 2512.87 samples/sec Loss 7.9742 LearningRate 0.000997 Epoch: 3 Global Step: 82690 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:20:30,800-Speed 2497.07 samples/sec Loss 7.8663 LearningRate 0.000997 Epoch: 3 Global Step: 82700 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:20:39,005-Speed 2496.46 samples/sec Loss 7.9817 LearningRate 0.000997 Epoch: 3 Global Step: 82710 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:20:47,219-Speed 2493.79 samples/sec Loss 7.9156 LearningRate 0.000997 Epoch: 3 Global Step: 82720 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:20:55,421-Speed 2497.17 samples/sec Loss 7.9205 LearningRate 0.000997 Epoch: 3 Global Step: 82730 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:21:03,623-Speed 2497.80 samples/sec Loss 7.8376 LearningRate 0.000997 Epoch: 3 Global Step: 82740 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:21:11,774-Speed 2513.15 samples/sec Loss 8.0004 LearningRate 0.000998 Epoch: 3 Global Step: 82750 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:21:19,976-Speed 2497.02 samples/sec Loss 7.8398 LearningRate 0.000998 Epoch: 3 Global Step: 82760 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:21:28,180-Speed 2497.40 samples/sec Loss 7.8659 LearningRate 0.000998 Epoch: 3 Global Step: 82770 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:21:36,385-Speed 2496.44 samples/sec Loss 7.8915 LearningRate 0.000998 Epoch: 3 Global Step: 82780 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:21:44,590-Speed 2496.63 samples/sec Loss 7.9878 LearningRate 0.000998 Epoch: 3 Global Step: 82790 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:21:52,792-Speed 2497.04 samples/sec Loss 8.0038 LearningRate 0.000998 Epoch: 3 Global Step: 82800 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:22:00,947-Speed 
2511.70 samples/sec Loss 7.8677 LearningRate 0.000998 Epoch: 3 Global Step: 82810 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:22:09,151-Speed 2496.72 samples/sec Loss 8.0610 LearningRate 0.000998 Epoch: 3 Global Step: 82820 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:22:17,354-Speed 2497.30 samples/sec Loss 7.9803 LearningRate 0.000998 Epoch: 3 Global Step: 82830 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:22:25,555-Speed 2497.73 samples/sec Loss 7.9493 LearningRate 0.000999 Epoch: 3 Global Step: 82840 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:22:33,759-Speed 2496.66 samples/sec Loss 7.9208 LearningRate 0.000999 Epoch: 3 Global Step: 82850 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:22:41,958-Speed 2498.08 samples/sec Loss 8.0197 LearningRate 0.000999 Epoch: 3 Global Step: 82860 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:22:50,105-Speed 2514.15 samples/sec Loss 8.0699 LearningRate 0.000999 Epoch: 3 Global Step: 82870 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:22:58,320-Speed 2493.30 samples/sec Loss 8.0114 LearningRate 0.000999 Epoch: 3 Global Step: 82880 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:23:06,521-Speed 2497.46 samples/sec Loss 7.8447 LearningRate 0.000999 Epoch: 3 Global Step: 82890 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:23:14,723-Speed 2497.28 samples/sec Loss 7.8152 LearningRate 0.000999 Epoch: 3 Global Step: 82900 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:23:22,932-Speed 2495.19 samples/sec Loss 7.9648 LearningRate 0.000999 Epoch: 3 Global Step: 82910 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:23:31,133-Speed 2497.81 samples/sec Loss 8.0212 LearningRate 0.001000 Epoch: 3 Global Step: 82920 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:23:39,283-Speed 2513.06 samples/sec 
Loss 7.9034 LearningRate 0.001000 Epoch: 3 Global Step: 82930 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:23:47,483-Speed 2497.83 samples/sec Loss 7.9226 LearningRate 0.001000 Epoch: 3 Global Step: 82940 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:23:55,683-Speed 2497.97 samples/sec Loss 7.7999 LearningRate 0.001000 Epoch: 3 Global Step: 82950 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:24:06,005-Speed 1984.25 samples/sec Loss 7.9449 LearningRate 0.001000 Epoch: 4 Global Step: 82960 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:24:14,202-Speed 2498.91 samples/sec Loss 7.9014 LearningRate 0.001000 Epoch: 4 Global Step: 82970 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:24:22,399-Speed 2498.64 samples/sec Loss 7.8654 LearningRate 0.001000 Epoch: 4 Global Step: 82980 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:24:30,542-Speed 2515.40 samples/sec Loss 7.9477 LearningRate 0.001000 Epoch: 4 Global Step: 82990 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:24:38,738-Speed 2499.01 samples/sec Loss 7.9214 LearningRate 0.001000 Epoch: 4 Global Step: 83000 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:24:46,943-Speed 2496.39 samples/sec Loss 7.8857 LearningRate 0.001000 Epoch: 4 Global Step: 83010 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:24:55,142-Speed 2498.50 samples/sec Loss 7.9292 LearningRate 0.001000 Epoch: 4 Global Step: 83020 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:25:03,338-Speed 2499.08 samples/sec Loss 8.0116 LearningRate 0.001000 Epoch: 4 Global Step: 83030 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:25:11,535-Speed 2498.83 samples/sec Loss 7.9850 LearningRate 0.001000 Epoch: 4 Global Step: 83040 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:25:19,678-Speed 2515.52 samples/sec Loss 7.9500 
LearningRate 0.001000 Epoch: 4 Global Step: 83050 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:25:27,902-Speed 2490.91 samples/sec Loss 7.9548 LearningRate 0.001000 Epoch: 4 Global Step: 83060 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:25:36,112-Speed 2494.71 samples/sec Loss 7.8163 LearningRate 0.001000 Epoch: 4 Global Step: 83070 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:25:44,314-Speed 2497.18 samples/sec Loss 7.7935 LearningRate 0.001000 Epoch: 4 Global Step: 83080 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:25:52,513-Speed 2498.61 samples/sec Loss 8.0296 LearningRate 0.001000 Epoch: 4 Global Step: 83090 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:26:00,710-Speed 2498.97 samples/sec Loss 7.8482 LearningRate 0.001000 Epoch: 4 Global Step: 83100 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:26:08,855-Speed 2514.81 samples/sec Loss 7.9368 LearningRate 0.001000 Epoch: 4 Global Step: 83110 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:26:17,065-Speed 2495.13 samples/sec Loss 7.9993 LearningRate 0.001000 Epoch: 4 Global Step: 83120 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:26:25,264-Speed 2498.58 samples/sec Loss 7.9480 LearningRate 0.001000 Epoch: 4 Global Step: 83130 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:26:33,468-Speed 2496.73 samples/sec Loss 7.9929 LearningRate 0.001000 Epoch: 4 Global Step: 83140 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:26:41,666-Speed 2498.44 samples/sec Loss 7.8700 LearningRate 0.000999 Epoch: 4 Global Step: 83150 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:26:49,866-Speed 2498.14 samples/sec Loss 7.8302 LearningRate 0.000999 Epoch: 4 Global Step: 83160 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:26:58,015-Speed 2513.50 samples/sec Loss 7.9030 
LearningRate 0.000999 Epoch: 4 Global Step: 83170 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:27:06,213-Speed 2498.48 samples/sec Loss 7.9200 LearningRate 0.000999 Epoch: 4 Global Step: 83180 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:27:14,413-Speed 2497.91 samples/sec Loss 8.0338 LearningRate 0.000999 Epoch: 4 Global Step: 83190 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:27:22,630-Speed 2493.02 samples/sec Loss 7.9659 LearningRate 0.000999 Epoch: 4 Global Step: 83200 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:27:30,829-Speed 2498.29 samples/sec Loss 7.9538 LearningRate 0.000999 Epoch: 4 Global Step: 83210 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:27:39,032-Speed 2497.00 samples/sec Loss 7.8415 LearningRate 0.000999 Epoch: 4 Global Step: 83220 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:27:47,186-Speed 2512.15 samples/sec Loss 7.9370 LearningRate 0.000999 Epoch: 4 Global Step: 83230 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:27:55,384-Speed 2498.50 samples/sec Loss 7.9120 LearningRate 0.000999 Epoch: 4 Global Step: 83240 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:28:03,587-Speed 2497.00 samples/sec Loss 7.8600 LearningRate 0.000999 Epoch: 4 Global Step: 83250 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:28:11,786-Speed 2498.48 samples/sec Loss 7.7449 LearningRate 0.000999 Epoch: 4 Global Step: 83260 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:28:19,997-Speed 2494.89 samples/sec Loss 7.8998 LearningRate 0.000999 Epoch: 4 Global Step: 83270 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:28:28,197-Speed 2497.79 samples/sec Loss 7.8118 LearningRate 0.000999 Epoch: 4 Global Step: 83280 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:28:36,342-Speed 2514.74 samples/sec Loss 7.7761 
LearningRate 0.000999 Epoch: 4 Global Step: 83290 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:28:44,543-Speed 2497.85 samples/sec Loss 7.9449 LearningRate 0.000999 Epoch: 4 Global Step: 83300 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:28:52,742-Speed 2498.18 samples/sec Loss 7.9155 LearningRate 0.000999 Epoch: 4 Global Step: 83310 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:29:00,941-Speed 2498.45 samples/sec Loss 7.7904 LearningRate 0.000999 Epoch: 4 Global Step: 83320 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:29:09,143-Speed 2497.28 samples/sec Loss 7.7530 LearningRate 0.000999 Epoch: 4 Global Step: 83330 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:29:17,342-Speed 2498.07 samples/sec Loss 7.8530 LearningRate 0.000999 Epoch: 4 Global Step: 83340 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:29:25,489-Speed 2514.38 samples/sec Loss 7.9235 LearningRate 0.000999 Epoch: 4 Global Step: 83350 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:29:33,688-Speed 2498.47 samples/sec Loss 8.0272 LearningRate 0.000999 Epoch: 4 Global Step: 83360 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:29:41,884-Speed 2498.91 samples/sec Loss 7.8536 LearningRate 0.000999 Epoch: 4 Global Step: 83370 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:29:50,083-Speed 2498.49 samples/sec Loss 7.8731 LearningRate 0.000999 Epoch: 4 Global Step: 83380 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:29:58,284-Speed 2497.38 samples/sec Loss 7.9240 LearningRate 0.000999 Epoch: 4 Global Step: 83390 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:30:06,486-Speed 2497.54 samples/sec Loss 7.9162 LearningRate 0.000999 Epoch: 4 Global Step: 83400 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:30:14,630-Speed 2515.13 samples/sec Loss 7.7976 
LearningRate 0.000999 Epoch: 4 Global Step: 83410 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:30:22,831-Speed 2497.69 samples/sec Loss 7.7325 LearningRate 0.000999 Epoch: 4 Global Step: 83420 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:30:31,026-Speed 2499.31 samples/sec Loss 7.9070 LearningRate 0.000999 Epoch: 4 Global Step: 83430 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:30:39,227-Speed 2497.82 samples/sec Loss 7.9034 LearningRate 0.000999 Epoch: 4 Global Step: 83440 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:30:47,436-Speed 2494.96 samples/sec Loss 7.7473 LearningRate 0.000999 Epoch: 4 Global Step: 83450 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:30:55,636-Speed 2498.35 samples/sec Loss 7.8614 LearningRate 0.000999 Epoch: 4 Global Step: 83460 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:31:03,783-Speed 2514.14 samples/sec Loss 7.7943 LearningRate 0.000999 Epoch: 4 Global Step: 83470 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:31:11,983-Speed 2498.03 samples/sec Loss 7.7692 LearningRate 0.000999 Epoch: 4 Global Step: 83480 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:31:20,181-Speed 2498.38 samples/sec Loss 7.8418 LearningRate 0.000999 Epoch: 4 Global Step: 83490 Fp16 Grad Scale: 131072 Required: 171 hours Training: 2022-07-06 09:31:28,338-Speed 2511.25 samples/sec Loss 7.7623 LearningRate 0.000999 Epoch: 4 Global Step: 83500 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:31:36,546-Speed 2495.70 samples/sec Loss 7.8065 LearningRate 0.000999 Epoch: 4 Global Step: 83510 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:31:44,743-Speed 2498.62 samples/sec Loss 7.7856 LearningRate 0.000998 Epoch: 4 Global Step: 83520 Fp16 Grad Scale: 65536 Required: 171 hours Training: 2022-07-06 09:31:52,899-Speed 2511.73 samples/sec Loss 7.8225 
LearningRate 0.000998 Epoch: 4 Global Step: 83530 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:32:01,111-Speed 2494.04 samples/sec Loss 7.7714 LearningRate 0.000998 Epoch: 4 Global Step: 83540 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:32:09,310-Speed 2498.21 samples/sec Loss 7.7651 LearningRate 0.000998 Epoch: 4 Global Step: 83550 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:32:17,508-Speed 2498.64 samples/sec Loss 7.9101 LearningRate 0.000998 Epoch: 4 Global Step: 83560 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:32:25,712-Speed 2496.81 samples/sec Loss 7.8532 LearningRate 0.000998 Epoch: 4 Global Step: 83570 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:32:33,914-Speed 2497.38 samples/sec Loss 7.8488 LearningRate 0.000998 Epoch: 4 Global Step: 83580 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:32:42,059-Speed 2514.67 samples/sec Loss 7.8131 LearningRate 0.000998 Epoch: 4 Global Step: 83590 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:32:50,261-Speed 2497.38 samples/sec Loss 7.9986 LearningRate 0.000998 Epoch: 4 Global Step: 83600 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:32:58,463-Speed 2497.36 samples/sec Loss 7.9657 LearningRate 0.000998 Epoch: 4 Global Step: 83610 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:33:06,662-Speed 2498.71 samples/sec Loss 7.9839 LearningRate 0.000998 Epoch: 4 Global Step: 83620 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:33:14,859-Speed 2498.91 samples/sec Loss 7.9835 LearningRate 0.000998 Epoch: 4 Global Step: 83630 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:33:23,060-Speed 2497.58 samples/sec Loss 7.9762 LearningRate 0.000998 Epoch: 4 Global Step: 83640 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:33:31,212-Speed 2512.56 samples/sec Loss 7.8151 LearningRate 
0.000998 Epoch: 4 Global Step: 83650 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:33:39,411-Speed 2498.39 samples/sec Loss 7.9451 LearningRate 0.000998 Epoch: 4 Global Step: 83660 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:33:47,621-Speed 2494.89 samples/sec Loss 7.9364 LearningRate 0.000998 Epoch: 4 Global Step: 83670 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:33:55,823-Speed 2497.39 samples/sec Loss 7.8567 LearningRate 0.000998 Epoch: 4 Global Step: 83680 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:34:04,037-Speed 2493.70 samples/sec Loss 7.8721 LearningRate 0.000998 Epoch: 4 Global Step: 83690 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:34:12,236-Speed 2498.46 samples/sec Loss 7.8047 LearningRate 0.000998 Epoch: 4 Global Step: 83700 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:34:20,383-Speed 2514.31 samples/sec Loss 7.9030 LearningRate 0.000998 Epoch: 4 Global Step: 83710 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:34:28,593-Speed 2495.05 samples/sec Loss 7.8982 LearningRate 0.000998 Epoch: 4 Global Step: 83720 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:34:36,790-Speed 2498.96 samples/sec Loss 7.9431 LearningRate 0.000998 Epoch: 4 Global Step: 83730 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:34:44,988-Speed 2498.47 samples/sec Loss 7.8736 LearningRate 0.000998 Epoch: 4 Global Step: 83740 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:34:53,192-Speed 2497.07 samples/sec Loss 7.8779 LearningRate 0.000998 Epoch: 4 Global Step: 83750 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:35:01,416-Speed 2490.53 samples/sec Loss 7.9335 LearningRate 0.000998 Epoch: 4 Global Step: 83760 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:35:09,569-Speed 2512.46 samples/sec Loss 7.9073 LearningRate 0.000998 Epoch: 4 
Global Step: 83770 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:35:17,769-Speed 2498.13 samples/sec Loss 7.9483 LearningRate 0.000998 Epoch: 4 Global Step: 83780 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:35:25,968-Speed 2498.23 samples/sec Loss 7.9492 LearningRate 0.000998 Epoch: 4 Global Step: 83790 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:35:34,171-Speed 2497.03 samples/sec Loss 8.0009 LearningRate 0.000998 Epoch: 4 Global Step: 83800 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:35:42,369-Speed 2498.68 samples/sec Loss 7.8402 LearningRate 0.000998 Epoch: 4 Global Step: 83810 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:35:50,569-Speed 2497.94 samples/sec Loss 7.8895 LearningRate 0.000998 Epoch: 4 Global Step: 83820 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:35:58,717-Speed 2513.97 samples/sec Loss 7.9228 LearningRate 0.000998 Epoch: 4 Global Step: 83830 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:36:06,919-Speed 2497.37 samples/sec Loss 7.8926 LearningRate 0.000998 Epoch: 4 Global Step: 83840 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:36:15,117-Speed 2498.33 samples/sec Loss 7.8627 LearningRate 0.000998 Epoch: 4 Global Step: 83850 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:36:23,319-Speed 2497.36 samples/sec Loss 7.8510 LearningRate 0.000998 Epoch: 4 Global Step: 83860 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:36:31,521-Speed 2497.50 samples/sec Loss 7.8089 LearningRate 0.000998 Epoch: 4 Global Step: 83870 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:36:39,722-Speed 2497.62 samples/sec Loss 7.9111 LearningRate 0.000998 Epoch: 4 Global Step: 83880 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:36:47,869-Speed 2514.39 samples/sec Loss 7.7782 LearningRate 0.000998 Epoch: 4 Global Step: 83890 
Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:36:56,067-Speed 2498.51 samples/sec Loss 7.7159 LearningRate 0.000997 Epoch: 4 Global Step: 83900 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:37:04,267-Speed 2497.98 samples/sec Loss 7.8641 LearningRate 0.000997 Epoch: 4 Global Step: 83910 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:37:12,468-Speed 2497.59 samples/sec Loss 7.8589 LearningRate 0.000997 Epoch: 4 Global Step: 83920 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:37:20,668-Speed 2497.90 samples/sec Loss 7.8579 LearningRate 0.000997 Epoch: 4 Global Step: 83930 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:37:28,866-Speed 2498.70 samples/sec Loss 7.7440 LearningRate 0.000997 Epoch: 4 Global Step: 83940 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:37:37,011-Speed 2514.63 samples/sec Loss 7.8069 LearningRate 0.000997 Epoch: 4 Global Step: 83950 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:37:45,214-Speed 2497.37 samples/sec Loss 7.7919 LearningRate 0.000997 Epoch: 4 Global Step: 83960 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:37:53,413-Speed 2498.07 samples/sec Loss 7.8185 LearningRate 0.000997 Epoch: 4 Global Step: 83970 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:38:01,613-Speed 2498.23 samples/sec Loss 7.8086 LearningRate 0.000997 Epoch: 4 Global Step: 83980 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:38:09,812-Speed 2498.23 samples/sec Loss 7.7507 LearningRate 0.000997 Epoch: 4 Global Step: 83990 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:38:18,024-Speed 2494.29 samples/sec Loss 7.7571 LearningRate 0.000997 Epoch: 4 Global Step: 84000 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:38:26,168-Speed 2515.29 samples/sec Loss 7.8274 LearningRate 0.000997 Epoch: 4 Global Step: 84010 Fp16 Grad Scale: 
65536 Required: 170 hours Training: 2022-07-06 09:38:34,366-Speed 2498.62 samples/sec Loss 7.7073 LearningRate 0.000997 Epoch: 4 Global Step: 84020 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:38:42,565-Speed 2498.26 samples/sec Loss 7.7964 LearningRate 0.000997 Epoch: 4 Global Step: 84030 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:38:50,764-Speed 2498.45 samples/sec Loss 7.7569 LearningRate 0.000997 Epoch: 4 Global Step: 84040 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:38:58,963-Speed 2498.22 samples/sec Loss 7.7325 LearningRate 0.000997 Epoch: 4 Global Step: 84050 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:39:07,167-Speed 2496.95 samples/sec Loss 7.8161 LearningRate 0.000997 Epoch: 4 Global Step: 84060 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:39:15,314-Speed 2514.14 samples/sec Loss 7.7719 LearningRate 0.000997 Epoch: 4 Global Step: 84070 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:39:23,525-Speed 2494.80 samples/sec Loss 7.7695 LearningRate 0.000997 Epoch: 4 Global Step: 84080 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:39:31,726-Speed 2497.85 samples/sec Loss 7.7156 LearningRate 0.000997 Epoch: 4 Global Step: 84090 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:39:39,928-Speed 2497.22 samples/sec Loss 7.8602 LearningRate 0.000997 Epoch: 4 Global Step: 84100 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:39:48,137-Speed 2495.46 samples/sec Loss 7.7931 LearningRate 0.000997 Epoch: 4 Global Step: 84110 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:39:56,336-Speed 2498.09 samples/sec Loss 7.6734 LearningRate 0.000997 Epoch: 4 Global Step: 84120 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:40:04,482-Speed 2514.35 samples/sec Loss 7.7282 LearningRate 0.000997 Epoch: 4 Global Step: 84130 Fp16 Grad Scale: 65536 Required: 170 
hours Training: 2022-07-06 09:40:12,679-Speed 2498.88 samples/sec Loss 7.6876 LearningRate 0.000997 Epoch: 4 Global Step: 84140 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:40:20,890-Speed 2494.87 samples/sec Loss 7.7236 LearningRate 0.000997 Epoch: 4 Global Step: 84150 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:40:29,090-Speed 2498.31 samples/sec Loss 7.7528 LearningRate 0.000997 Epoch: 4 Global Step: 84160 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:40:37,287-Speed 2498.80 samples/sec Loss 7.7994 LearningRate 0.000997 Epoch: 4 Global Step: 84170 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:40:45,488-Speed 2497.94 samples/sec Loss 7.7631 LearningRate 0.000997 Epoch: 4 Global Step: 84180 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:40:53,633-Speed 2514.97 samples/sec Loss 7.9538 LearningRate 0.000997 Epoch: 4 Global Step: 84190 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:41:01,835-Speed 2497.27 samples/sec Loss 7.7370 LearningRate 0.000997 Epoch: 4 Global Step: 84200 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:41:10,038-Speed 2497.17 samples/sec Loss 7.8775 LearningRate 0.000997 Epoch: 4 Global Step: 84210 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:41:18,241-Speed 2497.15 samples/sec Loss 7.7566 LearningRate 0.000997 Epoch: 4 Global Step: 84220 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:41:26,437-Speed 2499.18 samples/sec Loss 7.7043 LearningRate 0.000997 Epoch: 4 Global Step: 84230 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:41:34,641-Speed 2497.03 samples/sec Loss 7.7304 LearningRate 0.000997 Epoch: 4 Global Step: 84240 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:41:42,794-Speed 2512.44 samples/sec Loss 7.6401 LearningRate 0.000997 Epoch: 4 Global Step: 84250 Fp16 Grad Scale: 65536 Required: 170 hours Training: 
2022-07-06 09:41:50,992-Speed 2498.49 samples/sec Loss 7.8852 LearningRate 0.000997 Epoch: 4 Global Step: 84260 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:41:59,191-Speed 2498.27 samples/sec Loss 7.8313 LearningRate 0.000996 Epoch: 4 Global Step: 84270 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:42:07,389-Speed 2498.52 samples/sec Loss 7.7744 LearningRate 0.000996 Epoch: 4 Global Step: 84280 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:42:15,589-Speed 2498.07 samples/sec Loss 7.8263 LearningRate 0.000996 Epoch: 4 Global Step: 84290 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:42:23,795-Speed 2496.32 samples/sec Loss 7.8021 LearningRate 0.000996 Epoch: 4 Global Step: 84300 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:42:31,964-Speed 2507.35 samples/sec Loss 7.8838 LearningRate 0.000996 Epoch: 4 Global Step: 84310 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:42:40,162-Speed 2498.50 samples/sec Loss 7.8556 LearningRate 0.000996 Epoch: 4 Global Step: 84320 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:42:48,359-Speed 2499.00 samples/sec Loss 7.8454 LearningRate 0.000996 Epoch: 4 Global Step: 84330 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:42:56,572-Speed 2494.00 samples/sec Loss 7.7796 LearningRate 0.000996 Epoch: 4 Global Step: 84340 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:43:04,773-Speed 2497.60 samples/sec Loss 7.8591 LearningRate 0.000996 Epoch: 4 Global Step: 84350 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:43:12,971-Speed 2498.64 samples/sec Loss 7.7875 LearningRate 0.000996 Epoch: 4 Global Step: 84360 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:43:21,117-Speed 2514.60 samples/sec Loss 7.6839 LearningRate 0.000996 Epoch: 4 Global Step: 84370 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 
09:43:29,314-Speed 2498.64 samples/sec Loss 7.8451 LearningRate 0.000996 Epoch: 4 Global Step: 84380 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:43:37,512-Speed 2498.64 samples/sec Loss 7.8226 LearningRate 0.000996 Epoch: 4 Global Step: 84390 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:43:45,708-Speed 2499.10 samples/sec Loss 7.8093 LearningRate 0.000996 Epoch: 4 Global Step: 84400 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:43:53,904-Speed 2499.00 samples/sec Loss 7.7469 LearningRate 0.000996 Epoch: 4 Global Step: 84410 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:44:02,098-Speed 2499.81 samples/sec Loss 7.7074 LearningRate 0.000996 Epoch: 4 Global Step: 84420 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:44:10,242-Speed 2515.42 samples/sec Loss 7.8296 LearningRate 0.000996 Epoch: 4 Global Step: 84430 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:44:18,437-Speed 2499.52 samples/sec Loss 7.8093 LearningRate 0.000996 Epoch: 4 Global Step: 84440 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:44:26,636-Speed 2498.39 samples/sec Loss 7.7176 LearningRate 0.000996 Epoch: 4 Global Step: 84450 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:44:34,830-Speed 2499.60 samples/sec Loss 7.6454 LearningRate 0.000996 Epoch: 4 Global Step: 84460 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:44:43,027-Speed 2499.12 samples/sec Loss 7.6877 LearningRate 0.000996 Epoch: 4 Global Step: 84470 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:44:51,233-Speed 2496.18 samples/sec Loss 7.7148 LearningRate 0.000996 Epoch: 4 Global Step: 84480 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:44:59,375-Speed 2515.59 samples/sec Loss 7.6802 LearningRate 0.000996 Epoch: 4 Global Step: 84490 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:45:07,571-Speed 
2499.50 samples/sec Loss 7.7168 LearningRate 0.000996 Epoch: 4 Global Step: 84500 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:45:15,769-Speed 2498.57 samples/sec Loss 7.6867 LearningRate 0.000996 Epoch: 4 Global Step: 84510 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:45:23,963-Speed 2499.67 samples/sec Loss 7.7669 LearningRate 0.000996 Epoch: 4 Global Step: 84520 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:45:32,163-Speed 2498.28 samples/sec Loss 7.6944 LearningRate 0.000996 Epoch: 4 Global Step: 84530 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:45:40,357-Speed 2499.69 samples/sec Loss 7.7414 LearningRate 0.000996 Epoch: 4 Global Step: 84540 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:45:48,502-Speed 2514.95 samples/sec Loss 7.6964 LearningRate 0.000996 Epoch: 4 Global Step: 84550 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:45:56,700-Speed 2498.58 samples/sec Loss 7.7309 LearningRate 0.000996 Epoch: 4 Global Step: 84560 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:46:04,896-Speed 2499.24 samples/sec Loss 7.7894 LearningRate 0.000996 Epoch: 4 Global Step: 84570 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:46:13,091-Speed 2499.57 samples/sec Loss 7.8544 LearningRate 0.000996 Epoch: 4 Global Step: 84580 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:46:21,296-Speed 2496.53 samples/sec Loss 7.7623 LearningRate 0.000996 Epoch: 4 Global Step: 84590 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:46:29,508-Speed 2494.31 samples/sec Loss 7.6819 LearningRate 0.000996 Epoch: 4 Global Step: 84600 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:46:37,650-Speed 2515.53 samples/sec Loss 7.7249 LearningRate 0.000996 Epoch: 4 Global Step: 84610 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:46:45,854-Speed 2496.79 samples/sec 
Loss 7.7136 LearningRate 0.000996 Epoch: 4 Global Step: 84620 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:46:54,049-Speed 2499.79 samples/sec Loss 7.6967 LearningRate 0.000996 Epoch: 4 Global Step: 84630 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:47:02,256-Speed 2495.99 samples/sec Loss 7.6791 LearningRate 0.000995 Epoch: 4 Global Step: 84640 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:47:10,450-Speed 2499.77 samples/sec Loss 7.6755 LearningRate 0.000995 Epoch: 4 Global Step: 84650 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:47:18,645-Speed 2499.52 samples/sec Loss 7.6881 LearningRate 0.000995 Epoch: 4 Global Step: 84660 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:47:26,785-Speed 2516.39 samples/sec Loss 7.6933 LearningRate 0.000995 Epoch: 4 Global Step: 84670 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:47:34,984-Speed 2497.95 samples/sec Loss 7.6617 LearningRate 0.000995 Epoch: 4 Global Step: 84680 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:47:43,182-Speed 2498.72 samples/sec Loss 7.7345 LearningRate 0.000995 Epoch: 4 Global Step: 84690 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:47:51,378-Speed 2499.14 samples/sec Loss 7.6968 LearningRate 0.000995 Epoch: 4 Global Step: 84700 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:47:59,590-Speed 2494.34 samples/sec Loss 7.7545 LearningRate 0.000995 Epoch: 4 Global Step: 84710 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:48:07,785-Speed 2499.51 samples/sec Loss 7.8061 LearningRate 0.000995 Epoch: 4 Global Step: 84720 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:48:15,926-Speed 2516.28 samples/sec Loss 7.6247 LearningRate 0.000995 Epoch: 4 Global Step: 84730 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:48:24,121-Speed 2499.53 samples/sec Loss 7.7220 
LearningRate 0.000995 Epoch: 4 Global Step: 84740 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:48:32,326-Speed 2496.33 samples/sec Loss 7.7159 LearningRate 0.000995 Epoch: 4 Global Step: 84750 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:48:40,522-Speed 2499.31 samples/sec Loss 7.7611 LearningRate 0.000995 Epoch: 4 Global Step: 84760 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:48:48,719-Speed 2499.16 samples/sec Loss 7.7198 LearningRate 0.000995 Epoch: 4 Global Step: 84770 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:48:56,917-Speed 2498.59 samples/sec Loss 7.7270 LearningRate 0.000995 Epoch: 4 Global Step: 84780 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:49:05,063-Speed 2514.66 samples/sec Loss 7.5867 LearningRate 0.000995 Epoch: 4 Global Step: 84790 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:49:13,258-Speed 2499.66 samples/sec Loss 7.7521 LearningRate 0.000995 Epoch: 4 Global Step: 84800 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:49:21,455-Speed 2498.70 samples/sec Loss 7.6707 LearningRate 0.000995 Epoch: 4 Global Step: 84810 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:49:29,653-Speed 2498.64 samples/sec Loss 7.6374 LearningRate 0.000995 Epoch: 4 Global Step: 84820 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:49:37,853-Speed 2498.11 samples/sec Loss 7.6603 LearningRate 0.000995 Epoch: 4 Global Step: 84830 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:49:46,073-Speed 2491.85 samples/sec Loss 7.7428 LearningRate 0.000995 Epoch: 4 Global Step: 84840 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:49:54,221-Speed 2513.80 samples/sec Loss 7.6213 LearningRate 0.000995 Epoch: 4 Global Step: 84850 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:50:02,433-Speed 2494.42 samples/sec Loss 7.7357 
LearningRate 0.000995 Epoch: 4 Global Step: 84860 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:50:10,646-Speed 2493.96 samples/sec Loss 7.7026 LearningRate 0.000995 Epoch: 4 Global Step: 84870 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:50:18,851-Speed 2496.49 samples/sec Loss 7.6831 LearningRate 0.000995 Epoch: 4 Global Step: 84880 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:50:27,049-Speed 2498.47 samples/sec Loss 7.6601 LearningRate 0.000995 Epoch: 4 Global Step: 84890 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:50:35,246-Speed 2499.12 samples/sec Loss 7.8350 LearningRate 0.000995 Epoch: 4 Global Step: 84900 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:50:43,392-Speed 2514.36 samples/sec Loss 7.6666 LearningRate 0.000995 Epoch: 4 Global Step: 84910 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:50:51,588-Speed 2499.24 samples/sec Loss 7.6465 LearningRate 0.000995 Epoch: 4 Global Step: 84920 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:50:59,783-Speed 2499.45 samples/sec Loss 7.6547 LearningRate 0.000995 Epoch: 4 Global Step: 84930 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:51:07,975-Speed 2500.30 samples/sec Loss 7.5901 LearningRate 0.000995 Epoch: 4 Global Step: 84940 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:51:16,175-Speed 2498.14 samples/sec Loss 7.6779 LearningRate 0.000995 Epoch: 4 Global Step: 84950 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:51:24,376-Speed 2497.55 samples/sec Loss 7.6986 LearningRate 0.000995 Epoch: 4 Global Step: 84960 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:51:32,518-Speed 2515.81 samples/sec Loss 7.7674 LearningRate 0.000995 Epoch: 4 Global Step: 84970 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:51:40,713-Speed 2499.47 samples/sec Loss 7.7170 
LearningRate 0.000995 Epoch: 4 Global Step: 84980 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:51:48,908-Speed 2499.67 samples/sec Loss 7.7161 LearningRate 0.000995 Epoch: 4 Global Step: 84990 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:51:57,102-Speed 2499.69 samples/sec Loss 7.7374 LearningRate 0.000995 Epoch: 4 Global Step: 85000 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:52:05,298-Speed 2499.28 samples/sec Loss 7.8506 LearningRate 0.000995 Epoch: 4 Global Step: 85010 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:52:13,505-Speed 2495.95 samples/sec Loss 7.7757 LearningRate 0.000994 Epoch: 4 Global Step: 85020 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:52:21,652-Speed 2514.21 samples/sec Loss 7.7999 LearningRate 0.000994 Epoch: 4 Global Step: 85030 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:52:29,849-Speed 2498.84 samples/sec Loss 7.8588 LearningRate 0.000994 Epoch: 4 Global Step: 85040 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:52:38,045-Speed 2499.26 samples/sec Loss 7.7019 LearningRate 0.000994 Epoch: 4 Global Step: 85050 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:52:46,239-Speed 2499.65 samples/sec Loss 7.7817 LearningRate 0.000994 Epoch: 4 Global Step: 85060 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:52:54,434-Speed 2499.59 samples/sec Loss 7.7465 LearningRate 0.000994 Epoch: 4 Global Step: 85070 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:53:02,637-Speed 2497.38 samples/sec Loss 7.7099 LearningRate 0.000994 Epoch: 4 Global Step: 85080 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:53:10,775-Speed 2517.08 samples/sec Loss 7.7137 LearningRate 0.000994 Epoch: 4 Global Step: 85090 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:53:18,983-Speed 2495.46 samples/sec Loss 7.6684 
LearningRate 0.000994 Epoch: 4 Global Step: 85100 Fp16 Grad Scale: 131072 Required: 170 hours Training: 2022-07-06 09:53:27,136-Speed 2512.25 samples/sec Loss 7.7011 LearningRate 0.000994 Epoch: 4 Global Step: 85110 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:53:35,332-Speed 2499.19 samples/sec Loss 7.6914 LearningRate 0.000994 Epoch: 4 Global Step: 85120 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:53:43,530-Speed 2498.60 samples/sec Loss 7.6708 LearningRate 0.000994 Epoch: 4 Global Step: 85130 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:53:51,726-Speed 2499.17 samples/sec Loss 7.6139 LearningRate 0.000994 Epoch: 4 Global Step: 85140 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:53:59,873-Speed 2514.16 samples/sec Loss 7.6471 LearningRate 0.000994 Epoch: 4 Global Step: 85150 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:54:08,086-Speed 2494.14 samples/sec Loss 7.6768 LearningRate 0.000994 Epoch: 4 Global Step: 85160 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:54:16,280-Speed 2499.68 samples/sec Loss 7.6103 LearningRate 0.000994 Epoch: 4 Global Step: 85170 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:54:24,479-Speed 2498.34 samples/sec Loss 7.6041 LearningRate 0.000994 Epoch: 4 Global Step: 85180 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:54:32,675-Speed 2499.08 samples/sec Loss 7.6916 LearningRate 0.000994 Epoch: 4 Global Step: 85190 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:54:40,875-Speed 2497.99 samples/sec Loss 7.6153 LearningRate 0.000994 Epoch: 4 Global Step: 85200 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:54:49,020-Speed 2514.84 samples/sec Loss 7.7745 LearningRate 0.000994 Epoch: 4 Global Step: 85210 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:54:57,219-Speed 2498.21 samples/sec Loss 7.6277 LearningRate 
0.000994 Epoch: 4 Global Step: 85220 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:55:05,416-Speed 2498.94 samples/sec Loss 7.7451 LearningRate 0.000994 Epoch: 4 Global Step: 85230 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:55:13,623-Speed 2496.03 samples/sec Loss 7.7486 LearningRate 0.000994 Epoch: 4 Global Step: 85240 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:55:21,825-Speed 2497.36 samples/sec Loss 7.5798 LearningRate 0.000994 Epoch: 4 Global Step: 85250 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:55:30,022-Speed 2498.66 samples/sec Loss 7.6629 LearningRate 0.000994 Epoch: 4 Global Step: 85260 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:55:38,165-Speed 2515.42 samples/sec Loss 7.5709 LearningRate 0.000994 Epoch: 4 Global Step: 85270 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:55:46,362-Speed 2499.08 samples/sec Loss 7.6566 LearningRate 0.000994 Epoch: 4 Global Step: 85280 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:55:54,557-Speed 2499.40 samples/sec Loss 7.6580 LearningRate 0.000994 Epoch: 4 Global Step: 85290 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:56:02,761-Speed 2497.05 samples/sec Loss 7.6338 LearningRate 0.000994 Epoch: 4 Global Step: 85300 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:56:10,959-Speed 2498.65 samples/sec Loss 7.6376 LearningRate 0.000994 Epoch: 4 Global Step: 85310 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:56:19,162-Speed 2497.08 samples/sec Loss 7.6281 LearningRate 0.000994 Epoch: 4 Global Step: 85320 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:56:27,305-Speed 2515.45 samples/sec Loss 7.7008 LearningRate 0.000994 Epoch: 4 Global Step: 85330 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:56:35,500-Speed 2499.31 samples/sec Loss 7.6863 LearningRate 0.000994 Epoch: 4 
Global Step: 85340 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:56:43,697-Speed 2499.02 samples/sec Loss 7.6364 LearningRate 0.000994 Epoch: 4 Global Step: 85350 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:56:51,907-Speed 2495.00 samples/sec Loss 7.6905 LearningRate 0.000994 Epoch: 4 Global Step: 85360 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:57:00,108-Speed 2497.47 samples/sec Loss 7.5934 LearningRate 0.000994 Epoch: 4 Global Step: 85370 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:57:08,306-Speed 2498.69 samples/sec Loss 7.5824 LearningRate 0.000994 Epoch: 4 Global Step: 85380 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:57:16,451-Speed 2515.18 samples/sec Loss 7.5389 LearningRate 0.000993 Epoch: 4 Global Step: 85390 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:57:24,647-Speed 2499.24 samples/sec Loss 7.5653 LearningRate 0.000993 Epoch: 4 Global Step: 85400 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:57:32,846-Speed 2498.03 samples/sec Loss 7.6256 LearningRate 0.000993 Epoch: 4 Global Step: 85410 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:57:41,044-Speed 2498.73 samples/sec Loss 7.6238 LearningRate 0.000993 Epoch: 4 Global Step: 85420 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:57:49,243-Speed 2498.30 samples/sec Loss 7.5724 LearningRate 0.000993 Epoch: 4 Global Step: 85430 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:57:57,440-Speed 2498.82 samples/sec Loss 7.6757 LearningRate 0.000993 Epoch: 4 Global Step: 85440 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:58:05,600-Speed 2510.18 samples/sec Loss 7.7691 LearningRate 0.000993 Epoch: 4 Global Step: 85450 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:58:13,802-Speed 2497.20 samples/sec Loss 7.5874 LearningRate 0.000993 Epoch: 4 Global Step: 85460 
Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:58:22,001-Speed 2498.66 samples/sec Loss 7.6146 LearningRate 0.000993 Epoch: 4 Global Step: 85470 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:58:30,201-Speed 2497.91 samples/sec Loss 7.6283 LearningRate 0.000993 Epoch: 4 Global Step: 85480 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:58:38,400-Speed 2498.40 samples/sec Loss 7.6568 LearningRate 0.000993 Epoch: 4 Global Step: 85490 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:58:46,597-Speed 2499.07 samples/sec Loss 7.5588 LearningRate 0.000993 Epoch: 4 Global Step: 85500 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:58:54,745-Speed 2513.84 samples/sec Loss 7.7597 LearningRate 0.000993 Epoch: 4 Global Step: 85510 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:59:02,945-Speed 2497.91 samples/sec Loss 7.6974 LearningRate 0.000993 Epoch: 4 Global Step: 85520 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:59:11,150-Speed 2496.88 samples/sec Loss 7.7705 LearningRate 0.000993 Epoch: 4 Global Step: 85530 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:59:19,350-Speed 2498.10 samples/sec Loss 7.6728 LearningRate 0.000993 Epoch: 4 Global Step: 85540 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:59:27,547-Speed 2498.73 samples/sec Loss 7.6445 LearningRate 0.000993 Epoch: 4 Global Step: 85550 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:59:35,747-Speed 2497.96 samples/sec Loss 7.5881 LearningRate 0.000993 Epoch: 4 Global Step: 85560 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:59:43,892-Speed 2515.04 samples/sec Loss 7.7003 LearningRate 0.000993 Epoch: 4 Global Step: 85570 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 09:59:52,090-Speed 2498.83 samples/sec Loss 7.5386 LearningRate 0.000993 Epoch: 4 Global Step: 85580 Fp16 Grad Scale: 
65536 Required: 170 hours Training: 2022-07-06 10:00:00,284-Speed 2499.61 samples/sec Loss 7.6719 LearningRate 0.000993 Epoch: 4 Global Step: 85590 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 10:00:08,483-Speed 2498.47 samples/sec Loss 7.5607 LearningRate 0.000993 Epoch: 4 Global Step: 85600 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 10:00:16,680-Speed 2498.97 samples/sec Loss 7.7330 LearningRate 0.000993 Epoch: 4 Global Step: 85610 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 10:00:24,879-Speed 2498.02 samples/sec Loss 7.7184 LearningRate 0.000993 Epoch: 4 Global Step: 85620 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 10:00:33,023-Speed 2515.03 samples/sec Loss 7.5973 LearningRate 0.000993 Epoch: 4 Global Step: 85630 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 10:00:41,218-Speed 2499.46 samples/sec Loss 7.7606 LearningRate 0.000993 Epoch: 4 Global Step: 85640 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 10:00:49,426-Speed 2495.80 samples/sec Loss 7.7517 LearningRate 0.000993 Epoch: 4 Global Step: 85650 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 10:00:57,623-Speed 2498.67 samples/sec Loss 7.6708 LearningRate 0.000993 Epoch: 4 Global Step: 85660 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 10:01:05,820-Speed 2498.82 samples/sec Loss 7.6257 LearningRate 0.000993 Epoch: 4 Global Step: 85670 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 10:01:14,022-Speed 2497.61 samples/sec Loss 7.7349 LearningRate 0.000993 Epoch: 4 Global Step: 85680 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 10:01:22,169-Speed 2514.32 samples/sec Loss 7.7137 LearningRate 0.000993 Epoch: 4 Global Step: 85690 Fp16 Grad Scale: 65536 Required: 170 hours Training: 2022-07-06 10:01:30,368-Speed 2498.26 samples/sec Loss 7.7024 LearningRate 0.000993 Epoch: 4 Global Step: 85700 Fp16 Grad Scale: 65536 Required: 170 
hours
Training: 2022-07-06 10:01:38,565-Speed 2498.99 samples/sec Loss 7.6801 LearningRate 0.000993 Epoch: 4 Global Step: 85710 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:01:46,765-Speed 2498.02 samples/sec Loss 7.6099 LearningRate 0.000993 Epoch: 4 Global Step: 85720 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:01:54,962-Speed 2498.54 samples/sec Loss 7.7004 LearningRate 0.000993 Epoch: 4 Global Step: 85730 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:02:03,158-Speed 2499.29 samples/sec Loss 7.5361 LearningRate 0.000993 Epoch: 4 Global Step: 85740 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:02:11,300-Speed 2516.31 samples/sec Loss 7.6807 LearningRate 0.000993 Epoch: 4 Global Step: 85750 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:02:19,493-Speed 2500.08 samples/sec Loss 7.6321 LearningRate 0.000993 Epoch: 4 Global Step: 85760 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:02:27,693-Speed 2497.76 samples/sec Loss 7.7530 LearningRate 0.000992 Epoch: 4 Global Step: 85770 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:02:35,902-Speed 2495.41 samples/sec Loss 7.7883 LearningRate 0.000992 Epoch: 4 Global Step: 85780 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:02:44,100-Speed 2498.74 samples/sec Loss 7.6855 LearningRate 0.000992 Epoch: 4 Global Step: 85790 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:02:52,299-Speed 2498.15 samples/sec Loss 7.8042 LearningRate 0.000992 Epoch: 4 Global Step: 85800 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:03:00,443-Speed 2515.31 samples/sec Loss 7.7201 LearningRate 0.000992 Epoch: 4 Global Step: 85810 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:03:08,640-Speed 2498.91 samples/sec Loss 7.6812 LearningRate 0.000992 Epoch: 4 Global Step: 85820 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:03:16,835-Speed 2499.47 samples/sec Loss 7.6785 LearningRate 0.000992 Epoch: 4 Global Step: 85830 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:03:25,029-Speed 2499.88 samples/sec Loss 7.7294 LearningRate 0.000992 Epoch: 4 Global Step: 85840 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:03:33,226-Speed 2498.97 samples/sec Loss 7.6309 LearningRate 0.000992 Epoch: 4 Global Step: 85850 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:03:41,424-Speed 2498.32 samples/sec Loss 7.6359 LearningRate 0.000992 Epoch: 4 Global Step: 85860 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:03:49,567-Speed 2515.55 samples/sec Loss 7.6360 LearningRate 0.000992 Epoch: 4 Global Step: 85870 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:03:57,762-Speed 2499.59 samples/sec Loss 7.6516 LearningRate 0.000992 Epoch: 4 Global Step: 85880 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:04:05,969-Speed 2495.66 samples/sec Loss 7.5662 LearningRate 0.000992 Epoch: 4 Global Step: 85890 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:04:14,165-Speed 2499.27 samples/sec Loss 7.5652 LearningRate 0.000992 Epoch: 4 Global Step: 85900 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:04:22,359-Speed 2499.66 samples/sec Loss 7.5680 LearningRate 0.000992 Epoch: 4 Global Step: 85910 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:04:30,557-Speed 2498.70 samples/sec Loss 7.5575 LearningRate 0.000992 Epoch: 4 Global Step: 85920 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:04:38,699-Speed 2516.04 samples/sec Loss 7.6083 LearningRate 0.000992 Epoch: 4 Global Step: 85930 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:04:46,896-Speed 2498.72 samples/sec Loss 7.6051 LearningRate 0.000992 Epoch: 4 Global Step: 85940 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:04:55,092-Speed 2499.12 samples/sec Loss 7.5394 LearningRate 0.000992 Epoch: 4 Global Step: 85950 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:05:03,287-Speed 2499.65 samples/sec Loss 7.5397 LearningRate 0.000992 Epoch: 4 Global Step: 85960 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:05:11,479-Speed 2500.25 samples/sec Loss 7.6185 LearningRate 0.000992 Epoch: 4 Global Step: 85970 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:05:19,676-Speed 2499.10 samples/sec Loss 7.7849 LearningRate 0.000992 Epoch: 4 Global Step: 85980 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:05:27,822-Speed 2514.83 samples/sec Loss 7.5897 LearningRate 0.000992 Epoch: 4 Global Step: 85990 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:05:36,019-Speed 2498.65 samples/sec Loss 7.6244 LearningRate 0.000992 Epoch: 4 Global Step: 86000 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:05:44,217-Speed 2498.86 samples/sec Loss 7.6034 LearningRate 0.000992 Epoch: 4 Global Step: 86010 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:05:52,413-Speed 2499.20 samples/sec Loss 7.6103 LearningRate 0.000992 Epoch: 4 Global Step: 86020 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:06:00,608-Speed 2499.37 samples/sec Loss 7.5766 LearningRate 0.000992 Epoch: 4 Global Step: 86030 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:06:08,806-Speed 2498.69 samples/sec Loss 7.5010 LearningRate 0.000992 Epoch: 4 Global Step: 86040 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:06:16,946-Speed 2516.35 samples/sec Loss 7.5763 LearningRate 0.000992 Epoch: 4 Global Step: 86050 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:06:25,138-Speed 2500.36 samples/sec Loss 7.6594 LearningRate 0.000992 Epoch: 4 Global Step: 86060 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:06:33,331-Speed 2500.00 samples/sec Loss 7.5834 LearningRate 0.000992 Epoch: 4 Global Step: 86070 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:06:41,524-Speed 2500.29 samples/sec Loss 7.5759 LearningRate 0.000992 Epoch: 4 Global Step: 86080 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:06:49,740-Speed 2492.92 samples/sec Loss 7.6532 LearningRate 0.000992 Epoch: 4 Global Step: 86090 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:06:57,942-Speed 2497.43 samples/sec Loss 7.5600 LearningRate 0.000992 Epoch: 4 Global Step: 86100 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:07:06,088-Speed 2514.60 samples/sec Loss 7.6219 LearningRate 0.000992 Epoch: 4 Global Step: 86110 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:07:14,283-Speed 2499.34 samples/sec Loss 7.5986 LearningRate 0.000992 Epoch: 4 Global Step: 86120 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:07:22,482-Speed 2498.75 samples/sec Loss 7.4980 LearningRate 0.000992 Epoch: 4 Global Step: 86130 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:07:30,682-Speed 2498.11 samples/sec Loss 7.5323 LearningRate 0.000991 Epoch: 4 Global Step: 86140 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:07:38,878-Speed 2499.38 samples/sec Loss 7.5872 LearningRate 0.000991 Epoch: 4 Global Step: 86150 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:07:47,090-Speed 2494.38 samples/sec Loss 7.4993 LearningRate 0.000991 Epoch: 4 Global Step: 86160 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:07:55,237-Speed 2514.06 samples/sec Loss 7.5930 LearningRate 0.000991 Epoch: 4 Global Step: 86170 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:08:03,446-Speed 2495.44 samples/sec Loss 7.6433 LearningRate 0.000991 Epoch: 4 Global Step: 86180 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:08:11,649-Speed 2497.33 samples/sec Loss 7.5968 LearningRate 0.000991 Epoch: 4 Global Step: 86190 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:08:19,848-Speed 2498.24 samples/sec Loss 7.6315 LearningRate 0.000991 Epoch: 4 Global Step: 86200 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:08:28,047-Speed 2498.33 samples/sec Loss 7.4485 LearningRate 0.000991 Epoch: 4 Global Step: 86210 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:08:36,249-Speed 2497.48 samples/sec Loss 7.4629 LearningRate 0.000991 Epoch: 4 Global Step: 86220 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:08:44,395-Speed 2514.25 samples/sec Loss 7.6612 LearningRate 0.000991 Epoch: 4 Global Step: 86230 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:08:52,596-Speed 2497.86 samples/sec Loss 7.5686 LearningRate 0.000991 Epoch: 4 Global Step: 86240 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:09:00,795-Speed 2498.33 samples/sec Loss 7.5744 LearningRate 0.000991 Epoch: 4 Global Step: 86250 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:09:08,996-Speed 2497.74 samples/sec Loss 7.5206 LearningRate 0.000991 Epoch: 4 Global Step: 86260 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:09:17,198-Speed 2497.24 samples/sec Loss 7.5599 LearningRate 0.000991 Epoch: 4 Global Step: 86270 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:09:25,396-Speed 2498.62 samples/sec Loss 7.5461 LearningRate 0.000991 Epoch: 4 Global Step: 86280 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:09:33,540-Speed 2514.89 samples/sec Loss 7.6089 LearningRate 0.000991 Epoch: 4 Global Step: 86290 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:09:41,745-Speed 2496.74 samples/sec Loss 7.6315 LearningRate 0.000991 Epoch: 4 Global Step: 86300 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:09:49,944-Speed 2498.18 samples/sec Loss 7.5796 LearningRate 0.000991 Epoch: 4 Global Step: 86310 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:09:58,144-Speed 2498.03 samples/sec Loss 7.4751 LearningRate 0.000991 Epoch: 4 Global Step: 86320 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:10:06,344-Speed 2498.02 samples/sec Loss 7.5683 LearningRate 0.000991 Epoch: 4 Global Step: 86330 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:10:14,544-Speed 2498.11 samples/sec Loss 7.7065 LearningRate 0.000991 Epoch: 4 Global Step: 86340 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:10:22,692-Speed 2513.74 samples/sec Loss 7.5499 LearningRate 0.000991 Epoch: 4 Global Step: 86350 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:10:30,891-Speed 2498.29 samples/sec Loss 7.3691 LearningRate 0.000991 Epoch: 4 Global Step: 86360 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:10:39,051-Speed 2510.04 samples/sec Loss 7.5125 LearningRate 0.000991 Epoch: 4 Global Step: 86370 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:10:47,250-Speed 2498.35 samples/sec Loss 7.6159 LearningRate 0.000991 Epoch: 4 Global Step: 86380 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:10:55,450-Speed 2498.15 samples/sec Loss 7.6312 LearningRate 0.000991 Epoch: 4 Global Step: 86390 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:11:03,650-Speed 2497.76 samples/sec Loss 7.4839 LearningRate 0.000991 Epoch: 4 Global Step: 86400 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:11:11,800-Speed 2513.53 samples/sec Loss 7.6926 LearningRate 0.000991 Epoch: 4 Global Step: 86410 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:11:20,003-Speed 2497.11 samples/sec Loss 7.6787 LearningRate 0.000991 Epoch: 4 Global Step: 86420 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:11:28,200-Speed 2498.95 samples/sec Loss 7.5982 LearningRate 0.000991 Epoch: 4 Global Step: 86430 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:11:36,392-Speed 2500.37 samples/sec Loss 7.5920 LearningRate 0.000991 Epoch: 4 Global Step: 86440 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:11:44,588-Speed 2499.30 samples/sec Loss 7.5903 LearningRate 0.000991 Epoch: 4 Global Step: 86450 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:11:52,781-Speed 2499.80 samples/sec Loss 7.5593 LearningRate 0.000991 Epoch: 4 Global Step: 86460 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:12:00,924-Speed 2515.58 samples/sec Loss 7.6451 LearningRate 0.000991 Epoch: 4 Global Step: 86470 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:12:09,121-Speed 2498.74 samples/sec Loss 7.5586 LearningRate 0.000991 Epoch: 4 Global Step: 86480 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:12:17,320-Speed 2498.36 samples/sec Loss 7.6363 LearningRate 0.000991 Epoch: 4 Global Step: 86490 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:12:25,520-Speed 2498.11 samples/sec Loss 7.5661 LearningRate 0.000991 Epoch: 4 Global Step: 86500 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:12:33,716-Speed 2499.21 samples/sec Loss 7.5648 LearningRate 0.000991 Epoch: 4 Global Step: 86510 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:12:41,912-Speed 2498.92 samples/sec Loss 7.5446 LearningRate 0.000990 Epoch: 4 Global Step: 86520 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:12:50,060-Speed 2514.13 samples/sec Loss 7.5503 LearningRate 0.000990 Epoch: 4 Global Step: 86530 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:12:58,256-Speed 2499.18 samples/sec Loss 7.4838 LearningRate 0.000990 Epoch: 4 Global Step: 86540 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:13:06,450-Speed 2499.60 samples/sec Loss 7.5264 LearningRate 0.000990 Epoch: 4 Global Step: 86550 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:13:14,643-Speed 2500.20 samples/sec Loss 7.5852 LearningRate 0.000990 Epoch: 4 Global Step: 86560 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:13:22,840-Speed 2498.77 samples/sec Loss 7.5377 LearningRate 0.000990 Epoch: 4 Global Step: 86570 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:13:31,039-Speed 2498.36 samples/sec Loss 7.4630 LearningRate 0.000990 Epoch: 4 Global Step: 86580 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:13:39,180-Speed 2516.05 samples/sec Loss 7.6564 LearningRate 0.000990 Epoch: 4 Global Step: 86590 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:13:47,378-Speed 2498.72 samples/sec Loss 7.5101 LearningRate 0.000990 Epoch: 4 Global Step: 86600 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:13:55,575-Speed 2498.90 samples/sec Loss 7.5140 LearningRate 0.000990 Epoch: 4 Global Step: 86610 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:14:03,776-Speed 2497.60 samples/sec Loss 7.5720 LearningRate 0.000990 Epoch: 4 Global Step: 86620 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:14:11,972-Speed 2499.18 samples/sec Loss 7.5170 LearningRate 0.000990 Epoch: 4 Global Step: 86630 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:14:20,167-Speed 2499.39 samples/sec Loss 7.4338 LearningRate 0.000990 Epoch: 4 Global Step: 86640 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:14:28,316-Speed 2513.70 samples/sec Loss 7.6378 LearningRate 0.000990 Epoch: 4 Global Step: 86650 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:14:36,515-Speed 2498.29 samples/sec Loss 7.5770 LearningRate 0.000990 Epoch: 4 Global Step: 86660 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:14:44,719-Speed 2496.69 samples/sec Loss 7.5669 LearningRate 0.000990 Epoch: 4 Global Step: 86670 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:14:52,918-Speed 2498.45 samples/sec Loss 7.5882 LearningRate 0.000990 Epoch: 4 Global Step: 86680 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:15:01,113-Speed 2499.32 samples/sec Loss 7.5571 LearningRate 0.000990 Epoch: 4 Global Step: 86690 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:15:09,314-Speed 2497.81 samples/sec Loss 7.7079 LearningRate 0.000990 Epoch: 4 Global Step: 86700 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:15:17,473-Speed 2510.45 samples/sec Loss 7.5784 LearningRate 0.000990 Epoch: 4 Global Step: 86710 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:15:25,677-Speed 2496.78 samples/sec Loss 7.6059 LearningRate 0.000990 Epoch: 4 Global Step: 86720 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:15:33,875-Speed 2498.65 samples/sec Loss 7.6204 LearningRate 0.000990 Epoch: 4 Global Step: 86730 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:15:42,074-Speed 2498.32 samples/sec Loss 7.4853 LearningRate 0.000990 Epoch: 4 Global Step: 86740 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:15:50,273-Speed 2498.19 samples/sec Loss 7.4724 LearningRate 0.000990 Epoch: 4 Global Step: 86750 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:15:58,468-Speed 2499.74 samples/sec Loss 7.5260 LearningRate 0.000990 Epoch: 4 Global Step: 86760 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:16:06,617-Speed 2513.72 samples/sec Loss 7.5418 LearningRate 0.000990 Epoch: 4 Global Step: 86770 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:16:14,813-Speed 2498.96 samples/sec Loss 7.5111 LearningRate 0.000990 Epoch: 4 Global Step: 86780 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:16:23,012-Speed 2498.29 samples/sec Loss 7.4402 LearningRate 0.000990 Epoch: 4 Global Step: 86790 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:16:31,214-Speed 2497.67 samples/sec Loss 7.5636 LearningRate 0.000990 Epoch: 4 Global Step: 86800 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:16:39,412-Speed 2498.46 samples/sec Loss 7.5343 LearningRate 0.000990 Epoch: 4 Global Step: 86810 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:16:47,611-Speed 2498.06 samples/sec Loss 7.6498 LearningRate 0.000990 Epoch: 4 Global Step: 86820 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:16:55,759-Speed 2513.97 samples/sec Loss 7.5497 LearningRate 0.000990 Epoch: 4 Global Step: 86830 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:17:03,957-Speed 2498.73 samples/sec Loss 7.5091 LearningRate 0.000990 Epoch: 4 Global Step: 86840 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:17:12,155-Speed 2498.43 samples/sec Loss 7.5794 LearningRate 0.000990 Epoch: 4 Global Step: 86850 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:17:20,351-Speed 2499.33 samples/sec Loss 7.5371 LearningRate 0.000990 Epoch: 4 Global Step: 86860 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:17:28,548-Speed 2498.68 samples/sec Loss 7.4774 LearningRate 0.000990 Epoch: 4 Global Step: 86870 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:17:36,749-Speed 2497.83 samples/sec Loss 7.6197 LearningRate 0.000990 Epoch: 4 Global Step: 86880 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:17:44,894-Speed 2514.81 samples/sec Loss 7.5513 LearningRate 0.000989 Epoch: 4 Global Step: 86890 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:17:53,095-Speed 2497.69 samples/sec Loss 7.5049 LearningRate 0.000989 Epoch: 4 Global Step: 86900 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:18:01,296-Speed 2497.80 samples/sec Loss 7.5539 LearningRate 0.000989 Epoch: 4 Global Step: 86910 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:18:09,503-Speed 2495.85 samples/sec Loss 7.6613 LearningRate 0.000989 Epoch: 4 Global Step: 86920 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:18:17,703-Speed 2497.80 samples/sec Loss 7.5793 LearningRate 0.000989 Epoch: 4 Global Step: 86930 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:18:25,900-Speed 2498.94 samples/sec Loss 7.6203 LearningRate 0.000989 Epoch: 4 Global Step: 86940 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:18:34,051-Speed 2512.99 samples/sec Loss 7.5289 LearningRate 0.000989 Epoch: 4 Global Step: 86950 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:18:42,250-Speed 2498.32 samples/sec Loss 7.7148 LearningRate 0.000989 Epoch: 4 Global Step: 86960 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:18:50,449-Speed 2498.44 samples/sec Loss 7.5982 LearningRate 0.000989 Epoch: 4 Global Step: 86970 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:18:58,648-Speed 2498.06 samples/sec Loss 7.4975 LearningRate 0.000989 Epoch: 4 Global Step: 86980 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:19:06,849-Speed 2498.07 samples/sec Loss 7.6380 LearningRate 0.000989 Epoch: 4 Global Step: 86990 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:19:15,047-Speed 2498.64 samples/sec Loss 7.5350 LearningRate 0.000989 Epoch: 4 Global Step: 87000 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:19:23,193-Speed 2514.25 samples/sec Loss 7.4454 LearningRate 0.000989 Epoch: 4 Global Step: 87010 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:19:31,392-Speed 2498.49 samples/sec Loss 7.5285 LearningRate 0.000989 Epoch: 4 Global Step: 87020 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:19:39,591-Speed 2498.42 samples/sec Loss 7.5402 LearningRate 0.000989 Epoch: 4 Global Step: 87030 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:19:47,792-Speed 2497.52 samples/sec Loss 7.5490 LearningRate 0.000989 Epoch: 4 Global Step: 87040 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:19:55,994-Speed 2497.31 samples/sec Loss 7.4948 LearningRate 0.000989 Epoch: 4 Global Step: 87050 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:20:04,191-Speed 2498.84 samples/sec Loss 7.5182 LearningRate 0.000989 Epoch: 4 Global Step: 87060 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:20:12,338-Speed 2514.32 samples/sec Loss 7.5812 LearningRate 0.000989 Epoch: 4 Global Step: 87070 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:20:20,542-Speed 2496.94 samples/sec Loss 7.4167 LearningRate 0.000989 Epoch: 4 Global Step: 87080 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:20:28,742-Speed 2497.69 samples/sec Loss 7.4643 LearningRate 0.000989 Epoch: 4 Global Step: 87090 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:20:36,943-Speed 2497.68 samples/sec Loss 7.5120 LearningRate 0.000989 Epoch: 4 Global Step: 87100 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:20:45,141-Speed 2498.76 samples/sec Loss 7.4551 LearningRate 0.000989 Epoch: 4 Global Step: 87110 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:20:53,337-Speed 2499.12 samples/sec Loss 7.4633 LearningRate 0.000989 Epoch: 4 Global Step: 87120 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:21:01,480-Speed 2515.32 samples/sec Loss 7.4407 LearningRate 0.000989 Epoch: 4 Global Step: 87130 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:21:09,678-Speed 2498.77 samples/sec Loss 7.5692 LearningRate 0.000989 Epoch: 4 Global Step: 87140 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:21:17,874-Speed 2498.98 samples/sec Loss 7.4597 LearningRate 0.000989 Epoch: 4 Global Step: 87150 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:21:26,071-Speed 2498.79 samples/sec Loss 7.4694 LearningRate 0.000989 Epoch: 4 Global Step: 87160 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:21:34,269-Speed 2498.75 samples/sec Loss 7.6056 LearningRate 0.000989 Epoch: 4 Global Step: 87170 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:21:42,466-Speed 2498.68 samples/sec Loss 7.6280 LearningRate 0.000989 Epoch: 4 Global Step: 87180 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:21:50,611-Speed 2514.99 samples/sec Loss 7.5642 LearningRate 0.000989 Epoch: 4 Global Step: 87190 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:21:58,810-Speed 2498.35 samples/sec Loss 7.5549 LearningRate 0.000989 Epoch: 4 Global Step: 87200 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:22:07,004-Speed 2499.68 samples/sec Loss 7.4540 LearningRate 0.000989 Epoch: 4 Global Step: 87210 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:22:15,198-Speed 2499.78 samples/sec Loss 7.4313 LearningRate 0.000989 Epoch: 4 Global Step: 87220 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:22:23,408-Speed 2495.08 samples/sec Loss 7.4368 LearningRate 0.000989 Epoch: 4 Global Step: 87230 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:22:31,610-Speed 2497.24 samples/sec Loss 7.4566 LearningRate 0.000989 Epoch: 4 Global Step: 87240 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:22:39,757-Speed 2514.33 samples/sec Loss 7.3928 LearningRate 0.000989 Epoch: 4 Global Step: 87250 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:22:47,956-Speed 2498.19 samples/sec Loss 7.4590 LearningRate 0.000989 Epoch: 4 Global Step: 87260 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:22:56,157-Speed 2497.45 samples/sec Loss 7.4385 LearningRate 0.000988 Epoch: 4 Global Step: 87270 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:23:04,360-Speed 2497.18 samples/sec Loss 7.5203 LearningRate 0.000988 Epoch: 4 Global Step: 87280 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:23:12,562-Speed 2497.22 samples/sec Loss 7.4173 LearningRate 0.000988 Epoch: 4 Global Step: 87290 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:23:20,765-Speed 2496.98 samples/sec Loss 7.4867 LearningRate 0.000988 Epoch: 4 Global Step: 87300 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:23:28,914-Speed 2513.60 samples/sec Loss 7.4911 LearningRate 0.000988 Epoch: 4 Global Step: 87310 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:23:37,114-Speed 2498.09 samples/sec Loss 7.5005 LearningRate 0.000988 Epoch: 4 Global Step: 87320 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:23:45,316-Speed 2497.51 samples/sec Loss 7.4909 LearningRate 0.000988 Epoch: 4 Global Step: 87330 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:23:53,510-Speed 2499.69 samples/sec Loss 7.4396 LearningRate 0.000988 Epoch: 4 Global Step: 87340 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:24:01,711-Speed 2497.77 samples/sec Loss 7.4490 LearningRate 0.000988 Epoch: 4 Global Step: 87350 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:24:09,911-Speed 2498.11 samples/sec Loss 7.3899 LearningRate 0.000988 Epoch: 4 Global Step: 87360 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:24:18,052-Speed 2516.03 samples/sec Loss 7.6272 LearningRate 0.000988 Epoch: 4 Global Step: 87370 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:24:26,249-Speed 2499.00 samples/sec Loss 7.4738 LearningRate 0.000988 Epoch: 4 Global Step: 87380 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:24:34,444-Speed 2499.42 samples/sec Loss 7.5293 LearningRate 0.000988 Epoch: 4 Global Step: 87390 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:24:42,647-Speed 2497.13 samples/sec Loss 7.4177 LearningRate 0.000988 Epoch: 4 Global Step: 87400 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:24:50,852-Speed 2496.18 samples/sec Loss 7.5695 LearningRate 0.000988 Epoch: 4 Global Step: 87410 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:24:59,048-Speed 2499.04 samples/sec Loss 7.6386 LearningRate 0.000988 Epoch: 4 Global Step: 87420 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:25:07,194-Speed 2514.73 samples/sec Loss 7.5631 LearningRate 0.000988 Epoch: 4 Global Step: 87430 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:25:15,396-Speed 2496.99 samples/sec Loss 7.6904 LearningRate 0.000988 Epoch: 4 Global Step: 87440 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:25:23,599-Speed 2497.19 samples/sec Loss 7.5862 LearningRate 0.000988 Epoch: 4 Global Step: 87450 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:25:31,801-Speed 2497.48 samples/sec Loss 7.5237 LearningRate 0.000988 Epoch: 4 Global Step: 87460 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:25:40,011-Speed 2494.72 samples/sec Loss 7.5911 LearningRate 0.000988 Epoch: 4 Global Step: 87470 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:25:48,214-Speed 2496.89 samples/sec Loss 7.5500 LearningRate 0.000988 Epoch: 4 Global Step: 87480 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:25:56,362-Speed 2514.25 samples/sec Loss 7.4908 LearningRate 0.000988 Epoch: 4 Global Step: 87490 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:26:04,558-Speed 2498.90 samples/sec Loss 7.4407 LearningRate 0.000988 Epoch: 4 Global Step: 87500 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:26:12,754-Speed 2499.10 samples/sec Loss 7.5120 LearningRate 0.000988 Epoch: 4 Global Step: 87510 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:26:20,951-Speed 2499.08 samples/sec Loss 7.3967 LearningRate 0.000988 Epoch: 4 Global Step: 87520 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:26:29,146-Speed 2499.25 samples/sec Loss 7.4397 LearningRate 0.000988 Epoch: 4 Global Step: 87530 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:26:37,344-Speed 2498.57 samples/sec Loss 7.5159 LearningRate 0.000988 Epoch: 4 Global Step: 87540 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:26:45,501-Speed 2510.87 samples/sec Loss 7.4664 LearningRate 0.000988 Epoch: 4 Global Step: 87550 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:26:53,698-Speed 2499.15 samples/sec Loss 7.4103 LearningRate 0.000988 Epoch: 4 Global Step: 87560 Fp16 Grad Scale: 65536 Required: 170 hours
Training: 2022-07-06 10:27:01,894-Speed 2499.23 samples/sec Loss 7.4119 LearningRate 0.000988 Epoch: 4 Global Step: 87570 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:27:10,090-Speed 2499.06 samples/sec Loss 7.3311 LearningRate 0.000988 Epoch: 4 Global Step: 87580 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:27:18,287-Speed 2498.95 samples/sec Loss 7.3133 LearningRate 0.000988 Epoch: 4 Global Step: 87590 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:27:26,489-Speed 2497.70 samples/sec Loss 7.3082 LearningRate 0.000988 Epoch: 4 Global Step: 87600 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:27:34,635-Speed 2514.44 samples/sec Loss 7.4188 LearningRate 0.000988 Epoch: 4 Global Step: 87610 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:27:42,833-Speed 2498.79 samples/sec Loss 7.4359 LearningRate 0.000988 Epoch: 4 Global Step: 87620 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:27:51,033-Speed 2498.06 samples/sec Loss 7.2424 LearningRate 0.000988 Epoch: 4 Global Step: 87630 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:27:59,232-Speed 2498.22 samples/sec Loss 7.4668 LearningRate 0.000987 Epoch: 4 Global Step: 87640 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:28:07,428-Speed 2499.32 samples/sec Loss 7.4067 LearningRate 0.000987 Epoch: 4 Global Step: 87650 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:28:15,627-Speed 2498.27 samples/sec Loss 7.4804 LearningRate 0.000987 Epoch: 4 Global Step: 87660 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:28:23,857-Speed 2488.72 samples/sec Loss 7.4760 LearningRate 0.000987 Epoch: 4 Global Step: 87670 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:28:32,063-Speed 2495.90 samples/sec Loss 7.3484 LearningRate 0.000987 Epoch: 4 Global Step: 87680 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:28:40,260-Speed 2499.02 samples/sec Loss 7.5242 LearningRate 0.000987 Epoch: 4 Global Step: 87690 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:28:48,456-Speed 2499.21 samples/sec Loss 7.6622 LearningRate 0.000987 Epoch: 4 Global Step: 87700 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:28:56,651-Speed 2499.26 samples/sec Loss 7.5069 LearningRate 0.000987 Epoch: 4 Global Step: 87710 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:29:04,850-Speed 2498.13 samples/sec Loss 7.7220 LearningRate 0.000987 Epoch: 4 Global Step: 87720 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:29:12,997-Speed 2514.05 samples/sec Loss 7.6730 LearningRate 0.000987 Epoch: 4 Global Step: 87730 Fp16 Grad Scale: 131072 Required: 170 hours
Training: 2022-07-06 10:29:21,193-Speed 2499.22 samples/sec Loss 7.5407 LearningRate 0.000987 Epoch: 4 Global Step: 87740 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:29:29,392-Speed 2498.16 samples/sec Loss 7.5441 LearningRate 0.000987 Epoch: 4 Global Step: 87750 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:29:37,595-Speed 2497.15 samples/sec Loss 7.5343 LearningRate 0.000987 Epoch: 4 Global Step: 87760 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:29:45,793-Speed 2498.36 samples/sec Loss 7.5535 LearningRate 0.000987 Epoch: 4 Global Step: 87770 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:29:53,993-Speed 2497.98 samples/sec Loss 7.5905 LearningRate 0.000987 Epoch: 4 Global Step: 87780 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:30:02,146-Speed 2512.34 samples/sec Loss 7.4758 LearningRate 0.000987 Epoch: 4 Global Step: 87790 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:30:10,349-Speed 2497.55 samples/sec Loss 7.5681 LearningRate 0.000987 Epoch: 4 Global Step: 87800 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:30:18,547-Speed 2498.38 samples/sec Loss 7.5825 LearningRate 0.000987 Epoch: 4 Global Step: 87810 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:30:26,747-Speed 2498.20 samples/sec Loss 7.4531 LearningRate 0.000987 Epoch: 4 Global Step: 87820 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:30:34,946-Speed 2498.21 samples/sec Loss 7.4498 LearningRate 0.000987 Epoch: 4 Global Step: 87830 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:30:43,144-Speed 2498.34 samples/sec Loss 7.4486 LearningRate 0.000987 Epoch: 4 Global Step: 87840 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:30:51,289-Speed 2514.71 samples/sec Loss 7.4521 LearningRate 0.000987 Epoch: 4 Global Step: 87850 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:30:59,487-Speed 2498.81 samples/sec Loss 7.4459 LearningRate 0.000987 Epoch: 4 Global Step: 87860 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:31:07,688-Speed 2497.72 samples/sec Loss 7.5346 LearningRate 0.000987 Epoch: 4 Global Step: 87870 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:31:15,901-Speed 2493.89 samples/sec Loss 7.4008 LearningRate 0.000987 Epoch: 4 Global Step: 87880 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:31:24,096-Speed 2499.29 samples/sec Loss 7.4742 LearningRate 0.000987 Epoch: 4 Global Step: 87890 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:31:32,290-Speed 2499.84 samples/sec Loss 7.5591 LearningRate 0.000987 Epoch: 4 Global Step: 87900 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:31:40,432-Speed 2515.91 samples/sec Loss 7.4972 LearningRate 0.000987 Epoch: 4 Global Step: 87910 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:31:48,634-Speed 2497.40 samples/sec Loss 7.4041 LearningRate 0.000987 Epoch: 4 Global Step: 87920 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:31:56,829-Speed 2499.19 samples/sec Loss 7.4855 LearningRate 0.000987 Epoch: 4 Global Step: 87930 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:32:05,040-Speed 2494.92 samples/sec Loss 7.3927 LearningRate 0.000987 Epoch: 4 Global Step: 87940 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:32:13,251-Speed 2494.58 samples/sec Loss 7.5008 LearningRate 0.000987 Epoch: 4 Global Step: 87950 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:32:21,452-Speed 2497.69 samples/sec Loss 7.5704 LearningRate 0.000987 Epoch: 4 Global Step: 87960 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:32:29,597-Speed 2514.96 samples/sec Loss 7.3991 LearningRate 0.000987 Epoch: 4 Global Step: 87970 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:32:37,802-Speed 2496.39 samples/sec Loss 7.3693 LearningRate 0.000987 Epoch: 4 Global Step: 87980 Fp16 Grad Scale: 131072 Required: 169 hours
Training: 2022-07-06 10:32:45,962-Speed 2510.26 samples/sec Loss 7.5075 LearningRate 0.000987 Epoch: 4 Global Step: 87990 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:32:54,172-Speed 2494.61 samples/sec Loss 7.5146 LearningRate 0.000987 Epoch: 4 Global Step: 88000 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:33:02,366-Speed 2499.81 samples/sec Loss 7.3565 LearningRate 0.000987 Epoch: 4 Global Step: 88010 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:33:10,561-Speed 2499.89 samples/sec Loss 7.3953 LearningRate 0.000986 Epoch: 4 Global Step: 88020 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:33:18,706-Speed 2515.05 samples/sec Loss 7.4352 LearningRate 0.000986 Epoch: 4 Global Step: 88030 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:33:26,902-Speed 2499.29 samples/sec Loss 7.4307 LearningRate 0.000986 Epoch: 4 Global Step: 88040 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:33:35,095-Speed 2499.95 samples/sec Loss 7.4116 LearningRate 0.000986 Epoch: 4 Global Step: 88050 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:33:43,293-Speed 2498.76 samples/sec Loss 7.4278 LearningRate 0.000986 Epoch: 4 Global Step: 88060 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:33:51,492-Speed 2498.27 samples/sec Loss 7.4909 LearningRate 0.000986 Epoch: 4 Global Step: 88070 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:33:59,687-Speed 2499.46 samples/sec Loss 7.4860 LearningRate 0.000986 Epoch: 4 Global Step: 88080 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:34:08,000-Speed 2515.61 samples/sec Loss 7.4672 LearningRate 0.000986 Epoch: 4 Global Step: 88090 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:34:16,212-Speed 2494.37 samples/sec Loss 7.4775 LearningRate 0.000986 Epoch: 4 Global Step: 88100 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:34:24,405-Speed 2499.92 samples/sec Loss 7.4261 LearningRate 0.000986 Epoch: 4 Global Step: 88110 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:34:32,609-Speed 2496.92 samples/sec Loss 7.4985 LearningRate 0.000986 Epoch: 4 Global Step: 88120 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:34:40,869-Speed 2500.91 samples/sec Loss 7.4164 LearningRate 0.000986 Epoch: 4 Global Step: 88130 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:34:49,068-Speed 2498.32 samples/sec Loss 7.3855 LearningRate 0.000986 Epoch: 4 Global Step: 88140 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:34:57,213-Speed 2514.76 samples/sec Loss 7.3653 LearningRate 0.000986 Epoch: 4 Global Step: 88150 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:35:06,662-Speed 2500.81 samples/sec Loss 7.3628 LearningRate 0.000986 Epoch: 4 Global Step: 88160 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:35:14,857-Speed 2499.31 samples/sec Loss 7.2590 LearningRate 0.000986 Epoch: 4 Global Step: 88170 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:35:23,055-Speed 2498.80 samples/sec Loss 7.3752 LearningRate 0.000986 Epoch: 4 Global Step: 88180 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:35:31,258-Speed 2496.99 samples/sec Loss 7.3180 LearningRate 0.000986 Epoch: 4 Global Step: 88190 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:35:39,454-Speed 2499.20 samples/sec Loss 7.2789 LearningRate 0.000986 Epoch: 4 Global Step: 88200 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:35:47,599-Speed 2514.88 samples/sec Loss 7.4251 LearningRate 0.000986 Epoch: 4 Global Step: 88210 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:35:55,798-Speed 2498.20 samples/sec Loss 7.4403 LearningRate 0.000986 Epoch: 4 Global Step: 88220 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:36:03,997-Speed 2498.03 samples/sec Loss 7.4874 LearningRate 0.000986 Epoch: 4 Global Step: 88230 Fp16 Grad Scale: 65536 Required: 169 hours
Training: 2022-07-06 10:36:12,195-Speed 2498.93 samples/sec Loss 7.3471 LearningRate 0.000986 Epoch: 4 Global Step: 88240 Fp16
Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:36:20,390-Speed 2499.44 samples/sec Loss 7.3262 LearningRate 0.000986 Epoch: 4 Global Step: 88250 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:36:28,590-Speed 2497.95 samples/sec Loss 7.3847 LearningRate 0.000986 Epoch: 4 Global Step: 88260 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:36:36,734-Speed 2515.29 samples/sec Loss 7.3893 LearningRate 0.000986 Epoch: 4 Global Step: 88270 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:36:44,936-Speed 2497.54 samples/sec Loss 7.3520 LearningRate 0.000986 Epoch: 4 Global Step: 88280 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:36:53,134-Speed 2498.57 samples/sec Loss 7.2930 LearningRate 0.000986 Epoch: 4 Global Step: 88290 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:37:01,335-Speed 2497.55 samples/sec Loss 7.4537 LearningRate 0.000986 Epoch: 4 Global Step: 88300 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:37:09,532-Speed 2498.98 samples/sec Loss 7.5177 LearningRate 0.000986 Epoch: 4 Global Step: 88310 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:37:17,727-Speed 2499.55 samples/sec Loss 7.4451 LearningRate 0.000986 Epoch: 4 Global Step: 88320 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:37:25,871-Speed 2515.11 samples/sec Loss 7.4416 LearningRate 0.000986 Epoch: 4 Global Step: 88330 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:37:34,074-Speed 2497.24 samples/sec Loss 7.4442 LearningRate 0.000986 Epoch: 4 Global Step: 88340 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:37:42,272-Speed 2498.43 samples/sec Loss 7.3814 LearningRate 0.000986 Epoch: 4 Global Step: 88350 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:37:50,472-Speed 2498.17 samples/sec Loss 7.4408 LearningRate 0.000986 Epoch: 4 Global Step: 88360 Fp16 Grad Scale: 65536 
Required: 169 hours Training: 2022-07-06 10:37:58,677-Speed 2496.58 samples/sec Loss 7.3692 LearningRate 0.000986 Epoch: 4 Global Step: 88370 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:38:06,875-Speed 2498.73 samples/sec Loss 7.3202 LearningRate 0.000986 Epoch: 4 Global Step: 88380 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:38:15,019-Speed 2515.32 samples/sec Loss 7.2701 LearningRate 0.000985 Epoch: 4 Global Step: 88390 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:38:23,216-Speed 2498.86 samples/sec Loss 7.3008 LearningRate 0.000985 Epoch: 4 Global Step: 88400 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:38:31,414-Speed 2498.41 samples/sec Loss 7.4866 LearningRate 0.000985 Epoch: 4 Global Step: 88410 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:38:39,625-Speed 2494.65 samples/sec Loss 7.5133 LearningRate 0.000985 Epoch: 4 Global Step: 88420 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:38:47,826-Speed 2497.66 samples/sec Loss 7.3874 LearningRate 0.000985 Epoch: 4 Global Step: 88430 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:38:56,024-Speed 2498.68 samples/sec Loss 7.3902 LearningRate 0.000985 Epoch: 4 Global Step: 88440 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:39:04,169-Speed 2514.65 samples/sec Loss 7.4266 LearningRate 0.000985 Epoch: 4 Global Step: 88450 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:39:12,374-Speed 2496.46 samples/sec Loss 7.4376 LearningRate 0.000985 Epoch: 4 Global Step: 88460 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:39:20,574-Speed 2498.08 samples/sec Loss 7.2704 LearningRate 0.000985 Epoch: 4 Global Step: 88470 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:39:28,771-Speed 2498.75 samples/sec Loss 7.2643 LearningRate 0.000985 Epoch: 4 Global Step: 88480 Fp16 Grad Scale: 65536 Required: 169 hours 
Training: 2022-07-06 10:39:36,968-Speed 2498.92 samples/sec Loss 7.4189 LearningRate 0.000985 Epoch: 4 Global Step: 88490 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:39:45,167-Speed 2498.27 samples/sec Loss 7.3003 LearningRate 0.000985 Epoch: 4 Global Step: 88500 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:39:53,314-Speed 2514.19 samples/sec Loss 7.4786 LearningRate 0.000985 Epoch: 4 Global Step: 88510 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:40:01,513-Speed 2498.21 samples/sec Loss 7.3960 LearningRate 0.000985 Epoch: 4 Global Step: 88520 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:40:09,713-Speed 2497.93 samples/sec Loss 7.3569 LearningRate 0.000985 Epoch: 4 Global Step: 88530 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:40:17,925-Speed 2494.52 samples/sec Loss 7.4145 LearningRate 0.000985 Epoch: 4 Global Step: 88540 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:40:26,144-Speed 2491.89 samples/sec Loss 7.4037 LearningRate 0.000985 Epoch: 4 Global Step: 88550 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:40:34,347-Speed 2497.15 samples/sec Loss 7.3402 LearningRate 0.000985 Epoch: 4 Global Step: 88560 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:40:42,495-Speed 2513.94 samples/sec Loss 7.3660 LearningRate 0.000985 Epoch: 4 Global Step: 88570 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:40:50,693-Speed 2498.56 samples/sec Loss 7.3527 LearningRate 0.000985 Epoch: 4 Global Step: 88580 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:40:58,905-Speed 2494.23 samples/sec Loss 7.3780 LearningRate 0.000985 Epoch: 4 Global Step: 88590 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:41:07,102-Speed 2498.74 samples/sec Loss 7.3436 LearningRate 0.000985 Epoch: 4 Global Step: 88600 Fp16 Grad Scale: 65536 Required: 169 hours Training: 
2022-07-06 10:41:15,311-Speed 2495.29 samples/sec Loss 7.3430 LearningRate 0.000985 Epoch: 4 Global Step: 88610 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:41:23,510-Speed 2498.26 samples/sec Loss 7.4414 LearningRate 0.000985 Epoch: 4 Global Step: 88620 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:41:31,655-Speed 2515.02 samples/sec Loss 7.4022 LearningRate 0.000985 Epoch: 4 Global Step: 88630 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:41:39,849-Speed 2499.54 samples/sec Loss 7.4010 LearningRate 0.000985 Epoch: 4 Global Step: 88640 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:41:48,046-Speed 2499.31 samples/sec Loss 7.3887 LearningRate 0.000985 Epoch: 4 Global Step: 88650 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:41:56,240-Speed 2499.86 samples/sec Loss 7.3906 LearningRate 0.000985 Epoch: 4 Global Step: 88660 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:42:04,449-Speed 2495.05 samples/sec Loss 7.4216 LearningRate 0.000985 Epoch: 4 Global Step: 88670 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:42:12,647-Speed 2498.49 samples/sec Loss 7.3192 LearningRate 0.000985 Epoch: 4 Global Step: 88680 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:42:20,792-Speed 2514.75 samples/sec Loss 7.3957 LearningRate 0.000985 Epoch: 4 Global Step: 88690 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:42:28,991-Speed 2498.12 samples/sec Loss 7.3745 LearningRate 0.000985 Epoch: 4 Global Step: 88700 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:42:37,195-Speed 2496.68 samples/sec Loss 7.3753 LearningRate 0.000985 Epoch: 4 Global Step: 88710 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:42:45,397-Speed 2497.40 samples/sec Loss 7.4661 LearningRate 0.000985 Epoch: 4 Global Step: 88720 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 
10:42:53,598-Speed 2497.81 samples/sec Loss 7.4372 LearningRate 0.000985 Epoch: 4 Global Step: 88730 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:43:01,793-Speed 2499.38 samples/sec Loss 7.3010 LearningRate 0.000985 Epoch: 4 Global Step: 88740 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:43:09,938-Speed 2514.95 samples/sec Loss 7.4792 LearningRate 0.000985 Epoch: 4 Global Step: 88750 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:43:18,142-Speed 2496.77 samples/sec Loss 7.4456 LearningRate 0.000985 Epoch: 4 Global Step: 88760 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:43:26,340-Speed 2498.60 samples/sec Loss 7.4641 LearningRate 0.000984 Epoch: 4 Global Step: 88770 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:43:34,538-Speed 2498.99 samples/sec Loss 7.3464 LearningRate 0.000984 Epoch: 4 Global Step: 88780 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:43:42,736-Speed 2498.51 samples/sec Loss 7.3520 LearningRate 0.000984 Epoch: 4 Global Step: 88790 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:43:50,929-Speed 2500.07 samples/sec Loss 7.4331 LearningRate 0.000984 Epoch: 4 Global Step: 88800 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:43:59,070-Speed 2516.02 samples/sec Loss 7.3595 LearningRate 0.000984 Epoch: 4 Global Step: 88810 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:44:07,266-Speed 2499.45 samples/sec Loss 7.3019 LearningRate 0.000984 Epoch: 4 Global Step: 88820 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:44:15,472-Speed 2496.17 samples/sec Loss 7.2950 LearningRate 0.000984 Epoch: 4 Global Step: 88830 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:44:23,667-Speed 2499.39 samples/sec Loss 7.3672 LearningRate 0.000984 Epoch: 4 Global Step: 88840 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:44:31,863-Speed 
2499.37 samples/sec Loss 7.2636 LearningRate 0.000984 Epoch: 4 Global Step: 88850 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:44:40,061-Speed 2498.42 samples/sec Loss 7.3143 LearningRate 0.000984 Epoch: 4 Global Step: 88860 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:44:48,204-Speed 2515.47 samples/sec Loss 7.4448 LearningRate 0.000984 Epoch: 4 Global Step: 88870 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:44:56,404-Speed 2498.07 samples/sec Loss 7.3727 LearningRate 0.000984 Epoch: 4 Global Step: 88880 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:45:04,601-Speed 2499.12 samples/sec Loss 7.2518 LearningRate 0.000984 Epoch: 4 Global Step: 88890 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:45:12,800-Speed 2498.83 samples/sec Loss 7.3528 LearningRate 0.000984 Epoch: 4 Global Step: 88900 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:45:20,998-Speed 2498.47 samples/sec Loss 7.3932 LearningRate 0.000984 Epoch: 4 Global Step: 88910 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:45:29,198-Speed 2497.98 samples/sec Loss 7.3297 LearningRate 0.000984 Epoch: 4 Global Step: 88920 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:45:37,342-Speed 2514.93 samples/sec Loss 7.3210 LearningRate 0.000984 Epoch: 4 Global Step: 88930 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:45:45,544-Speed 2497.33 samples/sec Loss 7.2771 LearningRate 0.000984 Epoch: 4 Global Step: 88940 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:45:53,741-Speed 2498.86 samples/sec Loss 7.3466 LearningRate 0.000984 Epoch: 4 Global Step: 88950 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 10:46:01,897-Speed 2511.61 samples/sec Loss 7.3520 LearningRate 0.000984 Epoch: 4 Global Step: 88960 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:46:10,094-Speed 2498.87 samples/sec 
Loss 7.3011 LearningRate 0.000984 Epoch: 4 Global Step: 88970 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:46:18,300-Speed 2496.02 samples/sec Loss 7.3131 LearningRate 0.000984 Epoch: 4 Global Step: 88980 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:46:26,447-Speed 2514.30 samples/sec Loss 7.4659 LearningRate 0.000984 Epoch: 4 Global Step: 88990 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:46:34,646-Speed 2498.59 samples/sec Loss 7.3268 LearningRate 0.000984 Epoch: 4 Global Step: 89000 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:46:42,844-Speed 2498.37 samples/sec Loss 7.4022 LearningRate 0.000984 Epoch: 4 Global Step: 89010 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:46:51,044-Speed 2498.00 samples/sec Loss 7.3537 LearningRate 0.000984 Epoch: 4 Global Step: 89020 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:46:59,237-Speed 2500.05 samples/sec Loss 7.3005 LearningRate 0.000984 Epoch: 4 Global Step: 89030 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:47:07,435-Speed 2498.48 samples/sec Loss 7.3824 LearningRate 0.000984 Epoch: 4 Global Step: 89040 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:47:15,579-Speed 2515.19 samples/sec Loss 7.4104 LearningRate 0.000984 Epoch: 4 Global Step: 89050 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:47:23,778-Speed 2498.32 samples/sec Loss 7.2966 LearningRate 0.000984 Epoch: 4 Global Step: 89060 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:47:31,973-Speed 2499.42 samples/sec Loss 7.2937 LearningRate 0.000984 Epoch: 4 Global Step: 89070 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:47:40,169-Speed 2499.01 samples/sec Loss 7.3616 LearningRate 0.000984 Epoch: 4 Global Step: 89080 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:47:48,370-Speed 2497.77 samples/sec Loss 7.2857 
LearningRate 0.000984 Epoch: 4 Global Step: 89090 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:47:56,568-Speed 2498.42 samples/sec Loss 7.2830 LearningRate 0.000984 Epoch: 4 Global Step: 89100 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:48:04,712-Speed 2515.28 samples/sec Loss 7.3605 LearningRate 0.000984 Epoch: 4 Global Step: 89110 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:48:12,914-Speed 2497.49 samples/sec Loss 7.3027 LearningRate 0.000984 Epoch: 4 Global Step: 89120 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:48:21,114-Speed 2497.90 samples/sec Loss 7.1966 LearningRate 0.000984 Epoch: 4 Global Step: 89130 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:48:29,317-Speed 2497.04 samples/sec Loss 7.2156 LearningRate 0.000984 Epoch: 4 Global Step: 89140 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:48:37,518-Speed 2497.75 samples/sec Loss 7.3493 LearningRate 0.000983 Epoch: 4 Global Step: 89150 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:48:45,715-Speed 2498.83 samples/sec Loss 7.2822 LearningRate 0.000983 Epoch: 4 Global Step: 89160 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:48:53,859-Speed 2515.23 samples/sec Loss 7.2348 LearningRate 0.000983 Epoch: 4 Global Step: 89170 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:49:02,057-Speed 2498.57 samples/sec Loss 7.3736 LearningRate 0.000983 Epoch: 4 Global Step: 89180 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:49:10,261-Speed 2496.88 samples/sec Loss 7.3520 LearningRate 0.000983 Epoch: 4 Global Step: 89190 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:49:18,467-Speed 2496.17 samples/sec Loss 7.2509 LearningRate 0.000983 Epoch: 4 Global Step: 89200 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:49:26,664-Speed 2498.74 samples/sec Loss 7.3327 LearningRate 
0.000983 Epoch: 4 Global Step: 89210 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:49:34,866-Speed 2497.51 samples/sec Loss 7.4293 LearningRate 0.000983 Epoch: 4 Global Step: 89220 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:49:43,008-Speed 2515.76 samples/sec Loss 7.1924 LearningRate 0.000983 Epoch: 4 Global Step: 89230 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:49:51,203-Speed 2499.61 samples/sec Loss 7.3577 LearningRate 0.000983 Epoch: 4 Global Step: 89240 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:49:59,403-Speed 2497.71 samples/sec Loss 7.3028 LearningRate 0.000983 Epoch: 4 Global Step: 89250 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:50:07,603-Speed 2498.17 samples/sec Loss 7.2769 LearningRate 0.000983 Epoch: 4 Global Step: 89260 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:50:15,797-Speed 2499.59 samples/sec Loss 7.2254 LearningRate 0.000983 Epoch: 4 Global Step: 89270 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:50:23,995-Speed 2498.52 samples/sec Loss 7.1479 LearningRate 0.000983 Epoch: 4 Global Step: 89280 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:50:32,144-Speed 2513.88 samples/sec Loss 7.3547 LearningRate 0.000983 Epoch: 4 Global Step: 89290 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:50:40,340-Speed 2499.00 samples/sec Loss 7.2268 LearningRate 0.000983 Epoch: 4 Global Step: 89300 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:50:48,539-Speed 2498.62 samples/sec Loss 7.1947 LearningRate 0.000983 Epoch: 4 Global Step: 89310 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:50:56,735-Speed 2499.28 samples/sec Loss 7.1814 LearningRate 0.000983 Epoch: 4 Global Step: 89320 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:51:04,937-Speed 2497.36 samples/sec Loss 7.2380 LearningRate 0.000983 Epoch: 4 
Global Step: 89330 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:51:13,137-Speed 2498.17 samples/sec Loss 7.2207 LearningRate 0.000983 Epoch: 4 Global Step: 89340 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:51:21,281-Speed 2515.02 samples/sec Loss 7.5221 LearningRate 0.000983 Epoch: 4 Global Step: 89350 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:51:29,495-Speed 2493.85 samples/sec Loss 7.3613 LearningRate 0.000983 Epoch: 4 Global Step: 89360 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:51:37,695-Speed 2497.73 samples/sec Loss 7.2627 LearningRate 0.000983 Epoch: 4 Global Step: 89370 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:51:45,894-Speed 2498.24 samples/sec Loss 7.3359 LearningRate 0.000983 Epoch: 4 Global Step: 89380 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:51:54,093-Speed 2498.41 samples/sec Loss 7.2084 LearningRate 0.000983 Epoch: 4 Global Step: 89390 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:52:02,292-Speed 2498.37 samples/sec Loss 7.3628 LearningRate 0.000983 Epoch: 4 Global Step: 89400 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:52:10,438-Speed 2514.61 samples/sec Loss 7.2679 LearningRate 0.000983 Epoch: 4 Global Step: 89410 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:52:18,646-Speed 2495.48 samples/sec Loss 7.2529 LearningRate 0.000983 Epoch: 4 Global Step: 89420 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:52:26,843-Speed 2499.09 samples/sec Loss 7.2518 LearningRate 0.000983 Epoch: 4 Global Step: 89430 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:52:35,041-Speed 2498.68 samples/sec Loss 7.2552 LearningRate 0.000983 Epoch: 4 Global Step: 89440 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:52:43,240-Speed 2498.26 samples/sec Loss 7.2610 LearningRate 0.000983 Epoch: 4 Global Step: 89450 
Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:52:51,439-Speed 2498.45 samples/sec Loss 7.2761 LearningRate 0.000983 Epoch: 4 Global Step: 89460 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:52:59,586-Speed 2514.21 samples/sec Loss 7.2518 LearningRate 0.000983 Epoch: 4 Global Step: 89470 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:53:07,784-Speed 2498.61 samples/sec Loss 7.3091 LearningRate 0.000983 Epoch: 4 Global Step: 89480 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:53:15,981-Speed 2498.72 samples/sec Loss 7.3300 LearningRate 0.000983 Epoch: 4 Global Step: 89490 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:53:24,179-Speed 2498.43 samples/sec Loss 7.1658 LearningRate 0.000983 Epoch: 4 Global Step: 89500 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:53:32,391-Speed 2494.21 samples/sec Loss 7.1560 LearningRate 0.000983 Epoch: 4 Global Step: 89510 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:53:40,588-Speed 2499.40 samples/sec Loss 7.2932 LearningRate 0.000982 Epoch: 4 Global Step: 89520 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:53:48,730-Speed 2515.75 samples/sec Loss 7.2116 LearningRate 0.000982 Epoch: 4 Global Step: 89530 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:53:56,930-Speed 2497.73 samples/sec Loss 7.2559 LearningRate 0.000982 Epoch: 4 Global Step: 89540 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:54:05,127-Speed 2499.25 samples/sec Loss 7.2571 LearningRate 0.000982 Epoch: 4 Global Step: 89550 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:54:13,327-Speed 2497.82 samples/sec Loss 7.2081 LearningRate 0.000982 Epoch: 4 Global Step: 89560 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:54:21,525-Speed 2498.71 samples/sec Loss 7.2068 LearningRate 0.000982 Epoch: 4 Global Step: 89570 Fp16 Grad Scale: 
32768 Required: 169 hours Training: 2022-07-06 10:54:29,721-Speed 2499.28 samples/sec Loss 7.3501 LearningRate 0.000982 Epoch: 4 Global Step: 89580 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:54:37,866-Speed 2514.60 samples/sec Loss 7.1648 LearningRate 0.000982 Epoch: 4 Global Step: 89590 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:54:46,063-Speed 2498.97 samples/sec Loss 7.3668 LearningRate 0.000982 Epoch: 4 Global Step: 89600 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:54:54,261-Speed 2498.46 samples/sec Loss 7.3365 LearningRate 0.000982 Epoch: 4 Global Step: 89610 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:55:02,465-Speed 2496.88 samples/sec Loss 7.3455 LearningRate 0.000982 Epoch: 4 Global Step: 89620 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:55:10,661-Speed 2498.99 samples/sec Loss 7.2215 LearningRate 0.000982 Epoch: 4 Global Step: 89630 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:55:18,861-Speed 2498.10 samples/sec Loss 7.3278 LearningRate 0.000982 Epoch: 4 Global Step: 89640 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:55:27,005-Speed 2515.43 samples/sec Loss 7.2869 LearningRate 0.000982 Epoch: 4 Global Step: 89650 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:55:35,217-Speed 2494.29 samples/sec Loss 7.3673 LearningRate 0.000982 Epoch: 4 Global Step: 89660 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:55:43,420-Speed 2496.94 samples/sec Loss 7.2474 LearningRate 0.000982 Epoch: 4 Global Step: 89670 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:55:51,618-Speed 2498.88 samples/sec Loss 7.2990 LearningRate 0.000982 Epoch: 4 Global Step: 89680 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:55:59,815-Speed 2498.86 samples/sec Loss 7.3367 LearningRate 0.000982 Epoch: 4 Global Step: 89690 Fp16 Grad Scale: 32768 Required: 169 
hours Training: 2022-07-06 10:56:08,014-Speed 2498.40 samples/sec Loss 7.3570 LearningRate 0.000982 Epoch: 4 Global Step: 89700 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:56:16,161-Speed 2514.16 samples/sec Loss 7.2304 LearningRate 0.000982 Epoch: 4 Global Step: 89710 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:56:24,362-Speed 2497.79 samples/sec Loss 7.2639 LearningRate 0.000982 Epoch: 4 Global Step: 89720 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:56:32,560-Speed 2498.43 samples/sec Loss 7.3170 LearningRate 0.000982 Epoch: 4 Global Step: 89730 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:56:40,758-Speed 2499.04 samples/sec Loss 7.2042 LearningRate 0.000982 Epoch: 4 Global Step: 89740 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:56:48,953-Speed 2499.39 samples/sec Loss 7.2185 LearningRate 0.000982 Epoch: 4 Global Step: 89750 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:56:57,153-Speed 2498.09 samples/sec Loss 7.2298 LearningRate 0.000982 Epoch: 4 Global Step: 89760 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:57:05,297-Speed 2515.18 samples/sec Loss 7.1407 LearningRate 0.000982 Epoch: 4 Global Step: 89770 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:57:13,495-Speed 2498.62 samples/sec Loss 7.2054 LearningRate 0.000982 Epoch: 4 Global Step: 89780 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:57:21,707-Speed 2494.33 samples/sec Loss 7.2331 LearningRate 0.000982 Epoch: 4 Global Step: 89790 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:57:29,910-Speed 2496.79 samples/sec Loss 7.1653 LearningRate 0.000982 Epoch: 4 Global Step: 89800 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:57:38,111-Speed 2497.75 samples/sec Loss 7.1786 LearningRate 0.000982 Epoch: 4 Global Step: 89810 Fp16 Grad Scale: 32768 Required: 169 hours Training: 
2022-07-06 10:57:46,308-Speed 2498.91 samples/sec Loss 7.0793 LearningRate 0.000982 Epoch: 4 Global Step: 89820 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:57:54,453-Speed 2514.82 samples/sec Loss 7.3380 LearningRate 0.000982 Epoch: 4 Global Step: 89830 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:58:02,649-Speed 2499.11 samples/sec Loss 7.1894 LearningRate 0.000982 Epoch: 4 Global Step: 89840 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:58:10,847-Speed 2498.75 samples/sec Loss 7.2220 LearningRate 0.000982 Epoch: 4 Global Step: 89850 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:58:19,057-Speed 2494.93 samples/sec Loss 7.2322 LearningRate 0.000982 Epoch: 4 Global Step: 89860 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:58:27,253-Speed 2499.16 samples/sec Loss 7.2970 LearningRate 0.000982 Epoch: 4 Global Step: 89870 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:58:35,452-Speed 2498.22 samples/sec Loss 7.1817 LearningRate 0.000982 Epoch: 4 Global Step: 89880 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:58:43,596-Speed 2515.07 samples/sec Loss 7.4368 LearningRate 0.000982 Epoch: 4 Global Step: 89890 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:58:51,793-Speed 2498.74 samples/sec Loss 7.2913 LearningRate 0.000981 Epoch: 4 Global Step: 89900 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:58:59,995-Speed 2497.56 samples/sec Loss 7.3343 LearningRate 0.000981 Epoch: 4 Global Step: 89910 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:59:12,225-Speed 1676.78 samples/sec Loss 7.3470 LearningRate 0.000981 Epoch: 4 Global Step: 89920 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:59:20,500-Speed 2499.49 samples/sec Loss 7.2959 LearningRate 0.000981 Epoch: 4 Global Step: 89930 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 
10:59:28,693-Speed 2499.98 samples/sec Loss 7.3111 LearningRate 0.000981 Epoch: 4 Global Step: 89940 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:59:36,831-Speed 2516.71 samples/sec Loss 7.3327 LearningRate 0.000981 Epoch: 4 Global Step: 89950 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:59:45,042-Speed 2499.85 samples/sec Loss 7.1400 LearningRate 0.000981 Epoch: 4 Global Step: 89960 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 10:59:53,321-Speed 2499.69 samples/sec Loss 7.2623 LearningRate 0.000981 Epoch: 4 Global Step: 89970 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:00:01,517-Speed 2499.06 samples/sec Loss 7.1846 LearningRate 0.000981 Epoch: 4 Global Step: 89980 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:00:09,756-Speed 2499.96 samples/sec Loss 7.2300 LearningRate 0.000981 Epoch: 4 Global Step: 89990 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:00:21,892-Speed 2484.70 samples/sec Loss 7.1771 LearningRate 0.000981 Epoch: 4 Global Step: 90000 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:00:30,034-Speed 2518.47 samples/sec Loss 7.2484 LearningRate 0.000981 Epoch: 4 Global Step: 90010 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:00:38,225-Speed 2500.47 samples/sec Loss 7.2006 LearningRate 0.000981 Epoch: 4 Global Step: 90020 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:00:49,684-Speed 1791.83 samples/sec Loss 7.3025 LearningRate 0.000981 Epoch: 4 Global Step: 90030 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:00:57,877-Speed 2500.31 samples/sec Loss 7.2796 LearningRate 0.000981 Epoch: 4 Global Step: 90040 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:01:06,076-Speed 2498.32 samples/sec Loss 7.3921 LearningRate 0.000981 Epoch: 4 Global Step: 90050 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:01:18,519-Speed 
1651.89 samples/sec Loss 7.2648 LearningRate 0.000981 Epoch: 4 Global Step: 90060 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:01:30,748-Speed 2519.09 samples/sec Loss 7.2840 LearningRate 0.000981 Epoch: 4 Global Step: 90070 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:01:38,961-Speed 2502.56 samples/sec Loss 7.2515 LearningRate 0.000981 Epoch: 4 Global Step: 90080 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:01:51,860-Speed 2505.29 samples/sec Loss 7.1692 LearningRate 0.000981 Epoch: 4 Global Step: 90090 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:02:00,080-Speed 2504.05 samples/sec Loss 7.1904 LearningRate 0.000981 Epoch: 4 Global Step: 90100 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:02:08,271-Speed 2500.70 samples/sec Loss 7.1143 LearningRate 0.000981 Epoch: 4 Global Step: 90110 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:02:20,324-Speed 1747.36 samples/sec Loss 7.2882 LearningRate 0.000981 Epoch: 4 Global Step: 90120 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:02:28,483-Speed 2519.07 samples/sec Loss 7.2353 LearningRate 0.000981 Epoch: 4 Global Step: 90130 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:02:36,675-Speed 2500.20 samples/sec Loss 7.2770 LearningRate 0.000981 Epoch: 4 Global Step: 90140 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:02:44,901-Speed 2501.31 samples/sec Loss 7.2971 LearningRate 0.000981 Epoch: 4 Global Step: 90150 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:02:56,323-Speed 2109.56 samples/sec Loss 7.2147 LearningRate 0.000981 Epoch: 4 Global Step: 90160 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:03:04,539-Speed 2500.45 samples/sec Loss 7.1528 LearningRate 0.000981 Epoch: 4 Global Step: 90170 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:03:18,109-Speed 1509.34 samples/sec 
Loss 7.2892 LearningRate 0.000981 Epoch: 4 Global Step: 90180 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:03:26,336-Speed 2516.22 samples/sec Loss 7.2051 LearningRate 0.000981 Epoch: 4 Global Step: 90190 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:03:34,627-Speed 2475.19 samples/sec Loss 7.2085 LearningRate 0.000981 Epoch: 4 Global Step: 90200 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:03:42,827-Speed 2497.73 samples/sec Loss 7.3043 LearningRate 0.000981 Epoch: 4 Global Step: 90210 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:03:51,026-Speed 2498.20 samples/sec Loss 7.4151 LearningRate 0.000981 Epoch: 4 Global Step: 90220 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:04:02,619-Speed 1774.25 samples/sec Loss 7.2465 LearningRate 0.000981 Epoch: 4 Global Step: 90230 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:04:10,851-Speed 2500.44 samples/sec Loss 7.3605 LearningRate 0.000981 Epoch: 4 Global Step: 90240 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:04:19,004-Speed 2512.37 samples/sec Loss 7.3062 LearningRate 0.000981 Epoch: 4 Global Step: 90250 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:04:27,214-Speed 2494.81 samples/sec Loss 7.2611 LearningRate 0.000981 Epoch: 4 Global Step: 90260 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:04:35,418-Speed 2496.54 samples/sec Loss 7.2029 LearningRate 0.000981 Epoch: 4 Global Step: 90270 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:04:43,622-Speed 2496.88 samples/sec Loss 7.2847 LearningRate 0.000980 Epoch: 4 Global Step: 90280 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:04:51,827-Speed 2496.47 samples/sec Loss 7.2618 LearningRate 0.000980 Epoch: 4 Global Step: 90290 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:05:00,029-Speed 2497.41 samples/sec Loss 7.2396 
LearningRate 0.000980 Epoch: 4 Global Step: 90300 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:05:08,179-Speed 2513.40 samples/sec Loss 7.3313 LearningRate 0.000980 Epoch: 4 Global Step: 90310 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:05:16,382-Speed 2497.01 samples/sec Loss 7.2137 LearningRate 0.000980 Epoch: 4 Global Step: 90320 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:05:24,587-Speed 2496.54 samples/sec Loss 7.1900 LearningRate 0.000980 Epoch: 4 Global Step: 90330 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:05:32,803-Speed 2493.08 samples/sec Loss 7.2499 LearningRate 0.000980 Epoch: 4 Global Step: 90340 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:05:41,004-Speed 2497.48 samples/sec Loss 7.1244 LearningRate 0.000980 Epoch: 4 Global Step: 90350 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:05:49,207-Speed 2497.45 samples/sec Loss 7.2862 LearningRate 0.000980 Epoch: 4 Global Step: 90360 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:05:57,353-Speed 2514.47 samples/sec Loss 7.3093 LearningRate 0.000980 Epoch: 4 Global Step: 90370 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:06:05,559-Speed 2496.23 samples/sec Loss 7.2299 LearningRate 0.000980 Epoch: 4 Global Step: 90380 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:06:13,760-Speed 2497.42 samples/sec Loss 7.1996 LearningRate 0.000980 Epoch: 4 Global Step: 90390 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:06:21,977-Speed 2492.78 samples/sec Loss 7.2054 LearningRate 0.000980 Epoch: 4 Global Step: 90400 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:06:30,188-Speed 2495.04 samples/sec Loss 7.2149 LearningRate 0.000980 Epoch: 4 Global Step: 90410 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:06:38,392-Speed 2496.90 samples/sec Loss 7.1382 LearningRate 
0.000980 Epoch: 4 Global Step: 90420 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:06:46,536-Speed 2515.09 samples/sec Loss 7.1103 LearningRate 0.000980 Epoch: 4 Global Step: 90430 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:06:54,735-Speed 2498.23 samples/sec Loss 7.1058 LearningRate 0.000980 Epoch: 4 Global Step: 90440 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:07:02,934-Speed 2498.55 samples/sec Loss 7.2410 LearningRate 0.000980 Epoch: 4 Global Step: 90450 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:07:11,132-Speed 2498.55 samples/sec Loss 7.2945 LearningRate 0.000980 Epoch: 4 Global Step: 90460 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:07:19,334-Speed 2497.33 samples/sec Loss 7.2438 LearningRate 0.000980 Epoch: 4 Global Step: 90470 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:07:27,533-Speed 2498.50 samples/sec Loss 7.1859 LearningRate 0.000980 Epoch: 4 Global Step: 90480 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:07:35,686-Speed 2512.36 samples/sec Loss 7.1095 LearningRate 0.000980 Epoch: 4 Global Step: 90490 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:07:43,904-Speed 2492.57 samples/sec Loss 7.0547 LearningRate 0.000980 Epoch: 4 Global Step: 90500 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:07:52,118-Speed 2493.76 samples/sec Loss 7.1393 LearningRate 0.000980 Epoch: 4 Global Step: 90510 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:08:00,348-Speed 2488.87 samples/sec Loss 7.1978 LearningRate 0.000980 Epoch: 4 Global Step: 90520 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:08:08,546-Speed 2498.47 samples/sec Loss 7.1579 LearningRate 0.000980 Epoch: 4 Global Step: 90530 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:08:16,747-Speed 2498.04 samples/sec Loss 7.1392 LearningRate 0.000980 Epoch: 4 
Global Step: 90540 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:08:24,893-Speed 2514.67 samples/sec Loss 7.1828 LearningRate 0.000980 Epoch: 4 Global Step: 90550 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:08:33,093-Speed 2497.97 samples/sec Loss 7.0614 LearningRate 0.000980 Epoch: 4 Global Step: 90560 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:08:41,297-Speed 2496.64 samples/sec Loss 7.0968 LearningRate 0.000980 Epoch: 4 Global Step: 90570 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:08:49,499-Speed 2497.59 samples/sec Loss 7.1847 LearningRate 0.000980 Epoch: 4 Global Step: 90580 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:08:57,695-Speed 2499.06 samples/sec Loss 7.2458 LearningRate 0.000980 Epoch: 4 Global Step: 90590 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:09:05,897-Speed 2497.38 samples/sec Loss 7.2799 LearningRate 0.000980 Epoch: 4 Global Step: 90600 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:09:14,043-Speed 2514.49 samples/sec Loss 7.1418 LearningRate 0.000980 Epoch: 4 Global Step: 90610 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:09:22,277-Speed 2487.74 samples/sec Loss 7.1490 LearningRate 0.000980 Epoch: 4 Global Step: 90620 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:09:30,479-Speed 2497.55 samples/sec Loss 7.2039 LearningRate 0.000980 Epoch: 4 Global Step: 90630 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:09:38,677-Speed 2498.64 samples/sec Loss 7.1870 LearningRate 0.000980 Epoch: 4 Global Step: 90640 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:09:46,876-Speed 2499.22 samples/sec Loss 7.1714 LearningRate 0.000979 Epoch: 4 Global Step: 90650 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:09:55,076-Speed 2497.68 samples/sec Loss 7.0977 LearningRate 0.000979 Epoch: 4 Global Step: 90660 
Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:10:03,222-Speed 2514.49 samples/sec Loss 7.1092 LearningRate 0.000979 Epoch: 4 Global Step: 90670 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:10:11,421-Speed 2498.95 samples/sec Loss 7.2058 LearningRate 0.000979 Epoch: 4 Global Step: 90680 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:10:19,620-Speed 2498.06 samples/sec Loss 7.1566 LearningRate 0.000979 Epoch: 4 Global Step: 90690 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:10:27,821-Speed 2497.63 samples/sec Loss 7.1658 LearningRate 0.000979 Epoch: 4 Global Step: 90700 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:10:36,022-Speed 2497.83 samples/sec Loss 7.2411 LearningRate 0.000979 Epoch: 4 Global Step: 90710 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:10:44,227-Speed 2496.24 samples/sec Loss 7.2988 LearningRate 0.000979 Epoch: 4 Global Step: 90720 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:10:52,373-Speed 2514.68 samples/sec Loss 7.2146 LearningRate 0.000979 Epoch: 4 Global Step: 90730 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:11:00,571-Speed 2498.57 samples/sec Loss 7.3534 LearningRate 0.000979 Epoch: 4 Global Step: 90740 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:11:08,768-Speed 2498.92 samples/sec Loss 7.2931 LearningRate 0.000979 Epoch: 4 Global Step: 90750 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:11:16,968-Speed 2498.07 samples/sec Loss 7.2725 LearningRate 0.000979 Epoch: 4 Global Step: 90760 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:11:25,169-Speed 2497.74 samples/sec Loss 7.2521 LearningRate 0.000979 Epoch: 4 Global Step: 90770 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:11:33,369-Speed 2498.00 samples/sec Loss 7.1730 LearningRate 0.000979 Epoch: 4 Global Step: 90780 Fp16 Grad Scale: 
65536 Required: 169 hours Training: 2022-07-06 11:11:41,512-Speed 2515.52 samples/sec Loss 7.2353 LearningRate 0.000979 Epoch: 4 Global Step: 90790 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:11:49,715-Speed 2496.79 samples/sec Loss 7.2345 LearningRate 0.000979 Epoch: 4 Global Step: 90800 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:11:57,915-Speed 2498.14 samples/sec Loss 7.1465 LearningRate 0.000979 Epoch: 4 Global Step: 90810 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:12:06,117-Speed 2497.66 samples/sec Loss 7.1888 LearningRate 0.000979 Epoch: 4 Global Step: 90820 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:12:14,315-Speed 2498.47 samples/sec Loss 7.2029 LearningRate 0.000979 Epoch: 4 Global Step: 90830 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:12:22,515-Speed 2497.91 samples/sec Loss 7.1687 LearningRate 0.000979 Epoch: 4 Global Step: 90840 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:12:30,661-Speed 2514.55 samples/sec Loss 7.2404 LearningRate 0.000979 Epoch: 4 Global Step: 90850 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:12:38,860-Speed 2498.32 samples/sec Loss 7.1884 LearningRate 0.000979 Epoch: 4 Global Step: 90860 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:12:47,060-Speed 2497.99 samples/sec Loss 7.1698 LearningRate 0.000979 Epoch: 4 Global Step: 90870 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:12:55,258-Speed 2498.61 samples/sec Loss 7.0837 LearningRate 0.000979 Epoch: 4 Global Step: 90880 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:13:03,460-Speed 2497.14 samples/sec Loss 7.1878 LearningRate 0.000979 Epoch: 4 Global Step: 90890 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:13:11,663-Speed 2497.25 samples/sec Loss 7.0589 LearningRate 0.000979 Epoch: 4 Global Step: 90900 Fp16 Grad Scale: 65536 Required: 169 
hours Training: 2022-07-06 11:13:19,811-Speed 2514.08 samples/sec Loss 7.1101 LearningRate 0.000979 Epoch: 4 Global Step: 90910 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:13:28,011-Speed 2497.87 samples/sec Loss 7.0341 LearningRate 0.000979 Epoch: 4 Global Step: 90920 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:13:36,208-Speed 2498.86 samples/sec Loss 7.0740 LearningRate 0.000979 Epoch: 4 Global Step: 90930 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:13:44,405-Speed 2498.83 samples/sec Loss 7.1145 LearningRate 0.000979 Epoch: 4 Global Step: 90940 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:13:52,608-Speed 2497.27 samples/sec Loss 7.1179 LearningRate 0.000979 Epoch: 4 Global Step: 90950 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:14:00,807-Speed 2498.24 samples/sec Loss 7.2162 LearningRate 0.000979 Epoch: 4 Global Step: 90960 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:14:08,950-Speed 2515.40 samples/sec Loss 7.0799 LearningRate 0.000979 Epoch: 4 Global Step: 90970 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:14:17,148-Speed 2498.43 samples/sec Loss 7.2017 LearningRate 0.000979 Epoch: 4 Global Step: 90980 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:14:25,356-Speed 2495.93 samples/sec Loss 7.2475 LearningRate 0.000979 Epoch: 4 Global Step: 90990 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:14:33,553-Speed 2498.80 samples/sec Loss 7.1853 LearningRate 0.000979 Epoch: 4 Global Step: 91000 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:14:41,758-Speed 2496.53 samples/sec Loss 7.1524 LearningRate 0.000979 Epoch: 4 Global Step: 91010 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:14:49,965-Speed 2495.89 samples/sec Loss 7.2026 LearningRate 0.000979 Epoch: 4 Global Step: 91020 Fp16 Grad Scale: 65536 Required: 169 hours Training: 
2022-07-06 11:14:58,111-Speed 2514.77 samples/sec Loss 7.1497 LearningRate 0.000978 Epoch: 4 Global Step: 91030 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:15:06,310-Speed 2498.36 samples/sec Loss 7.1081 LearningRate 0.000978 Epoch: 4 Global Step: 91040 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:15:14,509-Speed 2498.26 samples/sec Loss 7.1904 LearningRate 0.000978 Epoch: 4 Global Step: 91050 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:15:22,714-Speed 2496.91 samples/sec Loss 7.1751 LearningRate 0.000978 Epoch: 4 Global Step: 91060 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:15:30,910-Speed 2499.11 samples/sec Loss 7.1698 LearningRate 0.000978 Epoch: 4 Global Step: 91070 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:15:39,109-Speed 2498.31 samples/sec Loss 7.1225 LearningRate 0.000978 Epoch: 4 Global Step: 91080 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:15:47,266-Speed 2511.16 samples/sec Loss 7.1482 LearningRate 0.000978 Epoch: 4 Global Step: 91090 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:15:55,466-Speed 2497.87 samples/sec Loss 7.1874 LearningRate 0.000978 Epoch: 4 Global Step: 91100 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:16:03,665-Speed 2498.15 samples/sec Loss 7.1856 LearningRate 0.000978 Epoch: 4 Global Step: 91110 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:16:11,862-Speed 2499.06 samples/sec Loss 7.1374 LearningRate 0.000978 Epoch: 4 Global Step: 91120 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:16:20,062-Speed 2497.97 samples/sec Loss 7.1305 LearningRate 0.000978 Epoch: 4 Global Step: 91130 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:16:28,260-Speed 2498.54 samples/sec Loss 7.1506 LearningRate 0.000978 Epoch: 4 Global Step: 91140 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 
11:16:36,405-Speed 2515.06 samples/sec Loss 7.1586 LearningRate 0.000978 Epoch: 4 Global Step: 91150 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:16:44,616-Speed 2494.48 samples/sec Loss 7.1333 LearningRate 0.000978 Epoch: 4 Global Step: 91160 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:16:52,826-Speed 2495.22 samples/sec Loss 7.1286 LearningRate 0.000978 Epoch: 4 Global Step: 91170 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:17:01,023-Speed 2498.72 samples/sec Loss 7.1273 LearningRate 0.000978 Epoch: 4 Global Step: 91180 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:17:09,227-Speed 2496.80 samples/sec Loss 7.1044 LearningRate 0.000978 Epoch: 4 Global Step: 91190 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:17:17,425-Speed 2498.63 samples/sec Loss 7.1181 LearningRate 0.000978 Epoch: 4 Global Step: 91200 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:17:25,575-Speed 2513.68 samples/sec Loss 7.1770 LearningRate 0.000978 Epoch: 4 Global Step: 91210 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:17:33,776-Speed 2497.47 samples/sec Loss 7.1134 LearningRate 0.000978 Epoch: 4 Global Step: 91220 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:17:41,998-Speed 2491.41 samples/sec Loss 7.1448 LearningRate 0.000978 Epoch: 4 Global Step: 91230 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:17:50,199-Speed 2497.69 samples/sec Loss 7.1850 LearningRate 0.000978 Epoch: 4 Global Step: 91240 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:17:58,400-Speed 2497.62 samples/sec Loss 7.0799 LearningRate 0.000978 Epoch: 4 Global Step: 91250 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:18:06,597-Speed 2498.93 samples/sec Loss 7.0844 LearningRate 0.000978 Epoch: 4 Global Step: 91260 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:18:14,743-Speed 
2514.51 samples/sec Loss 7.0914 LearningRate 0.000978 Epoch: 4 Global Step: 91270 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:18:22,945-Speed 2497.36 samples/sec Loss 7.0336 LearningRate 0.000978 Epoch: 4 Global Step: 91280 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:18:31,153-Speed 2495.38 samples/sec Loss 6.9759 LearningRate 0.000978 Epoch: 4 Global Step: 91290 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:18:39,360-Speed 2496.01 samples/sec Loss 6.9875 LearningRate 0.000978 Epoch: 4 Global Step: 91300 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:18:47,568-Speed 2495.59 samples/sec Loss 7.0912 LearningRate 0.000978 Epoch: 4 Global Step: 91310 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:18:55,765-Speed 2498.58 samples/sec Loss 7.0489 LearningRate 0.000978 Epoch: 4 Global Step: 91320 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:19:03,909-Speed 2515.23 samples/sec Loss 6.9717 LearningRate 0.000978 Epoch: 4 Global Step: 91330 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:19:12,107-Speed 2498.50 samples/sec Loss 6.9774 LearningRate 0.000978 Epoch: 4 Global Step: 91340 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:19:20,306-Speed 2498.49 samples/sec Loss 7.0381 LearningRate 0.000978 Epoch: 4 Global Step: 91350 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:19:28,506-Speed 2498.04 samples/sec Loss 7.0583 LearningRate 0.000978 Epoch: 4 Global Step: 91360 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:19:36,702-Speed 2499.08 samples/sec Loss 7.0824 LearningRate 0.000978 Epoch: 4 Global Step: 91370 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:19:44,897-Speed 2499.62 samples/sec Loss 7.1348 LearningRate 0.000978 Epoch: 4 Global Step: 91380 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:19:53,046-Speed 2513.47 
samples/sec Loss 6.9378 LearningRate 0.000978 Epoch: 4 Global Step: 91390 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:20:01,242-Speed 2499.24 samples/sec Loss 7.0623 LearningRate 0.000978 Epoch: 4 Global Step: 91400 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:20:09,437-Speed 2499.39 samples/sec Loss 7.1951 LearningRate 0.000977 Epoch: 4 Global Step: 91410 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:20:17,645-Speed 2495.57 samples/sec Loss 7.1160 LearningRate 0.000977 Epoch: 4 Global Step: 91420 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:20:25,841-Speed 2499.07 samples/sec Loss 7.1258 LearningRate 0.000977 Epoch: 4 Global Step: 91430 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:20:34,047-Speed 2496.19 samples/sec Loss 7.0299 LearningRate 0.000977 Epoch: 4 Global Step: 91440 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:20:42,193-Speed 2514.82 samples/sec Loss 7.0928 LearningRate 0.000977 Epoch: 4 Global Step: 91450 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:20:50,392-Speed 2498.38 samples/sec Loss 7.0592 LearningRate 0.000977 Epoch: 4 Global Step: 91460 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:20:58,589-Speed 2498.59 samples/sec Loss 7.1024 LearningRate 0.000977 Epoch: 4 Global Step: 91470 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:21:06,786-Speed 2499.10 samples/sec Loss 7.1824 LearningRate 0.000977 Epoch: 4 Global Step: 91480 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:21:14,982-Speed 2499.17 samples/sec Loss 7.0581 LearningRate 0.000977 Epoch: 4 Global Step: 91490 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:21:23,186-Speed 2496.66 samples/sec Loss 7.0857 LearningRate 0.000977 Epoch: 4 Global Step: 91500 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:21:31,338-Speed 2512.81 
samples/sec Loss 7.2071 LearningRate 0.000977 Epoch: 4 Global Step: 91510 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:21:39,534-Speed 2499.36 samples/sec Loss 7.1469 LearningRate 0.000977 Epoch: 4 Global Step: 91520 Fp16 Grad Scale: 131072 Required: 169 hours Training: 2022-07-06 11:21:47,686-Speed 2512.88 samples/sec Loss 7.2697 LearningRate 0.000977 Epoch: 4 Global Step: 91530 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:21:55,887-Speed 2497.47 samples/sec Loss 7.1782 LearningRate 0.000977 Epoch: 4 Global Step: 91540 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:22:04,084-Speed 2498.88 samples/sec Loss 7.1067 LearningRate 0.000977 Epoch: 4 Global Step: 91550 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:22:12,280-Speed 2499.22 samples/sec Loss 7.0996 LearningRate 0.000977 Epoch: 4 Global Step: 91560 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:22:20,433-Speed 2512.46 samples/sec Loss 7.0856 LearningRate 0.000977 Epoch: 4 Global Step: 91570 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:22:28,634-Speed 2497.69 samples/sec Loss 7.0554 LearningRate 0.000977 Epoch: 4 Global Step: 91580 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:22:36,841-Speed 2495.55 samples/sec Loss 7.0987 LearningRate 0.000977 Epoch: 4 Global Step: 91590 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:22:45,039-Speed 2498.72 samples/sec Loss 7.0468 LearningRate 0.000977 Epoch: 4 Global Step: 91600 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:22:53,253-Speed 2494.10 samples/sec Loss 7.0740 LearningRate 0.000977 Epoch: 4 Global Step: 91610 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:23:01,452-Speed 2498.12 samples/sec Loss 7.0401 LearningRate 0.000977 Epoch: 4 Global Step: 91620 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:23:09,603-Speed 2513.24 samples/sec Loss 
7.1247 LearningRate 0.000977 Epoch: 4 Global Step: 91630 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:23:17,802-Speed 2498.38 samples/sec Loss 7.0580 LearningRate 0.000977 Epoch: 4 Global Step: 91640 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:23:26,002-Speed 2497.90 samples/sec Loss 6.9971 LearningRate 0.000977 Epoch: 4 Global Step: 91650 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:23:34,204-Speed 2497.22 samples/sec Loss 7.0927 LearningRate 0.000977 Epoch: 4 Global Step: 91660 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:23:42,413-Speed 2495.77 samples/sec Loss 7.1130 LearningRate 0.000977 Epoch: 4 Global Step: 91670 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:23:50,611-Speed 2498.29 samples/sec Loss 7.0855 LearningRate 0.000977 Epoch: 4 Global Step: 91680 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:23:58,756-Speed 2515.15 samples/sec Loss 7.1724 LearningRate 0.000977 Epoch: 4 Global Step: 91690 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:24:06,953-Speed 2498.85 samples/sec Loss 7.0602 LearningRate 0.000977 Epoch: 4 Global Step: 91700 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:24:15,148-Speed 2499.38 samples/sec Loss 7.0397 LearningRate 0.000977 Epoch: 4 Global Step: 91710 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:24:23,371-Speed 2491.21 samples/sec Loss 7.0358 LearningRate 0.000977 Epoch: 4 Global Step: 91720 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:24:31,566-Speed 2499.36 samples/sec Loss 6.9773 LearningRate 0.000977 Epoch: 4 Global Step: 91730 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:24:39,764-Speed 2498.44 samples/sec Loss 7.0936 LearningRate 0.000977 Epoch: 4 Global Step: 91740 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:24:47,909-Speed 2514.82 samples/sec Loss 7.0328 LearningRate 
0.000977 Epoch: 4 Global Step: 91750 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:24:56,106-Speed 2498.78 samples/sec Loss 7.0385 LearningRate 0.000977 Epoch: 4 Global Step: 91760 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:25:04,306-Speed 2498.07 samples/sec Loss 7.1263 LearningRate 0.000977 Epoch: 4 Global Step: 91770 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:25:12,505-Speed 2498.10 samples/sec Loss 6.9642 LearningRate 0.000977 Epoch: 4 Global Step: 91780 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:25:20,706-Speed 2497.85 samples/sec Loss 7.0687 LearningRate 0.000976 Epoch: 4 Global Step: 91790 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:25:28,909-Speed 2496.93 samples/sec Loss 7.1045 LearningRate 0.000976 Epoch: 4 Global Step: 91800 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:25:37,052-Speed 2515.46 samples/sec Loss 6.9838 LearningRate 0.000976 Epoch: 4 Global Step: 91810 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:25:45,250-Speed 2498.68 samples/sec Loss 7.0355 LearningRate 0.000976 Epoch: 4 Global Step: 91820 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:25:53,447-Speed 2498.77 samples/sec Loss 7.1096 LearningRate 0.000976 Epoch: 4 Global Step: 91830 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:26:01,643-Speed 2499.08 samples/sec Loss 7.1880 LearningRate 0.000976 Epoch: 4 Global Step: 91840 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:26:09,841-Speed 2498.71 samples/sec Loss 7.1523 LearningRate 0.000976 Epoch: 4 Global Step: 91850 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:26:18,037-Speed 2499.03 samples/sec Loss 7.0704 LearningRate 0.000976 Epoch: 4 Global Step: 91860 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:26:26,181-Speed 2515.11 samples/sec Loss 7.0254 LearningRate 0.000976 Epoch: 4 
Global Step: 91870 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:26:34,382-Speed 2497.65 samples/sec Loss 7.1430 LearningRate 0.000976 Epoch: 4 Global Step: 91880 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:26:42,575-Speed 2500.17 samples/sec Loss 7.0764 LearningRate 0.000976 Epoch: 4 Global Step: 91890 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:26:50,773-Speed 2498.48 samples/sec Loss 7.1520 LearningRate 0.000976 Epoch: 4 Global Step: 91900 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:26:58,983-Speed 2495.40 samples/sec Loss 7.2051 LearningRate 0.000976 Epoch: 4 Global Step: 91910 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:27:07,180-Speed 2498.93 samples/sec Loss 7.0497 LearningRate 0.000976 Epoch: 4 Global Step: 91920 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:27:15,325-Speed 2514.84 samples/sec Loss 7.0213 LearningRate 0.000976 Epoch: 4 Global Step: 91930 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:27:23,523-Speed 2498.64 samples/sec Loss 7.0108 LearningRate 0.000976 Epoch: 4 Global Step: 91940 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:27:31,721-Speed 2498.81 samples/sec Loss 6.9426 LearningRate 0.000976 Epoch: 4 Global Step: 91950 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:27:39,920-Speed 2498.13 samples/sec Loss 7.0256 LearningRate 0.000976 Epoch: 4 Global Step: 91960 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:27:48,121-Speed 2497.46 samples/sec Loss 7.0987 LearningRate 0.000976 Epoch: 4 Global Step: 91970 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:27:56,322-Speed 2497.70 samples/sec Loss 6.9256 LearningRate 0.000976 Epoch: 4 Global Step: 91980 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:28:04,466-Speed 2515.35 samples/sec Loss 7.0103 LearningRate 0.000976 Epoch: 4 Global Step: 91990 
Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:28:12,675-Speed 2495.36 samples/sec Loss 7.0924 LearningRate 0.000976 Epoch: 4 Global Step: 92000 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:28:20,877-Speed 2497.29 samples/sec Loss 7.0441 LearningRate 0.000976 Epoch: 4 Global Step: 92010 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:28:29,078-Speed 2497.63 samples/sec Loss 6.9677 LearningRate 0.000976 Epoch: 4 Global Step: 92020 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:28:37,277-Speed 2498.36 samples/sec Loss 6.9409 LearningRate 0.000976 Epoch: 4 Global Step: 92030 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:28:45,482-Speed 2496.65 samples/sec Loss 6.9803 LearningRate 0.000976 Epoch: 4 Global Step: 92040 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:28:53,634-Speed 2512.49 samples/sec Loss 7.0137 LearningRate 0.000976 Epoch: 4 Global Step: 92050 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:29:01,834-Speed 2497.82 samples/sec Loss 6.9336 LearningRate 0.000976 Epoch: 4 Global Step: 92060 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:29:10,038-Speed 2496.77 samples/sec Loss 7.0441 LearningRate 0.000976 Epoch: 4 Global Step: 92070 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:29:18,241-Speed 2497.15 samples/sec Loss 7.0015 LearningRate 0.000976 Epoch: 4 Global Step: 92080 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:29:26,446-Speed 2496.65 samples/sec Loss 7.0139 LearningRate 0.000976 Epoch: 4 Global Step: 92090 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:29:34,642-Speed 2499.07 samples/sec Loss 7.0562 LearningRate 0.000976 Epoch: 4 Global Step: 92100 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:29:42,800-Speed 2510.71 samples/sec Loss 6.9881 LearningRate 0.000976 Epoch: 4 Global Step: 92110 Fp16 Grad Scale: 
65536 Required: 169 hours Training: 2022-07-06 11:29:50,999-Speed 2498.21 samples/sec Loss 7.0472 LearningRate 0.000976 Epoch: 4 Global Step: 92120 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:29:59,197-Speed 2498.56 samples/sec Loss 7.0251 LearningRate 0.000976 Epoch: 4 Global Step: 92130 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:30:07,395-Speed 2498.45 samples/sec Loss 7.0170 LearningRate 0.000976 Epoch: 4 Global Step: 92140 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:30:15,593-Speed 2498.99 samples/sec Loss 6.9352 LearningRate 0.000976 Epoch: 4 Global Step: 92150 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:30:23,796-Speed 2497.17 samples/sec Loss 6.9738 LearningRate 0.000975 Epoch: 4 Global Step: 92160 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:30:31,943-Speed 2514.21 samples/sec Loss 6.9545 LearningRate 0.000975 Epoch: 4 Global Step: 92170 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:30:40,141-Speed 2498.60 samples/sec Loss 6.9770 LearningRate 0.000975 Epoch: 4 Global Step: 92180 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:30:48,342-Speed 2497.50 samples/sec Loss 7.0327 LearningRate 0.000975 Epoch: 4 Global Step: 92190 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:30:56,538-Speed 2499.19 samples/sec Loss 7.0639 LearningRate 0.000975 Epoch: 4 Global Step: 92200 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:31:04,736-Speed 2498.73 samples/sec Loss 7.0677 LearningRate 0.000975 Epoch: 4 Global Step: 92210 Fp16 Grad Scale: 65536 Required: 169 hours Training: 2022-07-06 11:31:12,887-Speed 2513.02 samples/sec Loss 6.9878 LearningRate 0.000975 Epoch: 4 Global Step: 92220 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:31:21,044-Speed 2511.35 samples/sec Loss 6.9819 LearningRate 0.000975 Epoch: 4 Global Step: 92230 Fp16 Grad Scale: 32768 Required: 169 
hours Training: 2022-07-06 11:31:29,247-Speed 2496.96 samples/sec Loss 6.9827 LearningRate 0.000975 Epoch: 4 Global Step: 92240 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:31:37,447-Speed 2497.80 samples/sec Loss 6.9375 LearningRate 0.000975 Epoch: 4 Global Step: 92250 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:31:45,646-Speed 2498.27 samples/sec Loss 6.9823 LearningRate 0.000975 Epoch: 4 Global Step: 92260 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:31:53,848-Speed 2497.38 samples/sec Loss 7.0566 LearningRate 0.000975 Epoch: 4 Global Step: 92270 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:32:02,051-Speed 2497.16 samples/sec Loss 6.9799 LearningRate 0.000975 Epoch: 4 Global Step: 92280 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:32:10,202-Speed 2512.77 samples/sec Loss 6.9346 LearningRate 0.000975 Epoch: 4 Global Step: 92290 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:32:18,416-Speed 2493.94 samples/sec Loss 7.0443 LearningRate 0.000975 Epoch: 4 Global Step: 92300 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:32:26,616-Speed 2497.80 samples/sec Loss 7.0852 LearningRate 0.000975 Epoch: 4 Global Step: 92310 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:32:34,822-Speed 2496.46 samples/sec Loss 7.0747 LearningRate 0.000975 Epoch: 4 Global Step: 92320 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:32:43,028-Speed 2496.23 samples/sec Loss 7.0346 LearningRate 0.000975 Epoch: 4 Global Step: 92330 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:32:51,228-Speed 2498.04 samples/sec Loss 7.0027 LearningRate 0.000975 Epoch: 4 Global Step: 92340 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:32:59,380-Speed 2512.49 samples/sec Loss 7.0199 LearningRate 0.000975 Epoch: 4 Global Step: 92350 Fp16 Grad Scale: 32768 Required: 169 hours Training: 
2022-07-06 11:33:07,591-Speed 2494.54 samples/sec Loss 7.0296 LearningRate 0.000975 Epoch: 4 Global Step: 92360 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:33:15,801-Speed 2495.01 samples/sec Loss 7.0865 LearningRate 0.000975 Epoch: 4 Global Step: 92370 Fp16 Grad Scale: 32768 Required: 169 hours Training: 2022-07-06 11:33:24,008-Speed 2495.95 samples/sec Loss 6.9681 LearningRate 0.000975 Epoch: 4 Global Step: 92380 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:33:32,210-Speed 2497.10 samples/sec Loss 7.1086 LearningRate 0.000975 Epoch: 4 Global Step: 92390 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:33:40,414-Speed 2496.92 samples/sec Loss 6.9719 LearningRate 0.000975 Epoch: 4 Global Step: 92400 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:33:48,561-Speed 2514.33 samples/sec Loss 7.1215 LearningRate 0.000975 Epoch: 4 Global Step: 92410 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:33:56,760-Speed 2498.16 samples/sec Loss 7.0364 LearningRate 0.000975 Epoch: 4 Global Step: 92420 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:34:04,962-Speed 2497.44 samples/sec Loss 7.1127 LearningRate 0.000975 Epoch: 4 Global Step: 92430 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:34:13,162-Speed 2497.88 samples/sec Loss 7.0297 LearningRate 0.000975 Epoch: 4 Global Step: 92440 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:34:21,361-Speed 2498.37 samples/sec Loss 7.0459 LearningRate 0.000975 Epoch: 4 Global Step: 92450 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:34:29,561-Speed 2498.02 samples/sec Loss 7.0189 LearningRate 0.000975 Epoch: 4 Global Step: 92460 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:34:37,706-Speed 2514.66 samples/sec Loss 7.0541 LearningRate 0.000975 Epoch: 4 Global Step: 92470 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 
11:34:45,906-Speed 2498.20 samples/sec Loss 7.0276 LearningRate 0.000975 Epoch: 4 Global Step: 92480 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:34:54,106-Speed 2497.86 samples/sec Loss 6.9861 LearningRate 0.000975 Epoch: 4 Global Step: 92490 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:35:02,306-Speed 2498.05 samples/sec Loss 7.0319 LearningRate 0.000975 Epoch: 4 Global Step: 92500 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:35:10,510-Speed 2496.68 samples/sec Loss 7.0800 LearningRate 0.000975 Epoch: 4 Global Step: 92510 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:35:18,709-Speed 2498.28 samples/sec Loss 6.8974 LearningRate 0.000975 Epoch: 4 Global Step: 92520 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:35:26,858-Speed 2513.71 samples/sec Loss 7.0171 LearningRate 0.000975 Epoch: 4 Global Step: 92530 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:35:35,054-Speed 2499.08 samples/sec Loss 6.9324 LearningRate 0.000974 Epoch: 4 Global Step: 92540 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:35:43,256-Speed 2497.26 samples/sec Loss 6.9970 LearningRate 0.000974 Epoch: 4 Global Step: 92550 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:35:51,465-Speed 2495.36 samples/sec Loss 6.9895 LearningRate 0.000974 Epoch: 4 Global Step: 92560 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:35:59,664-Speed 2498.51 samples/sec Loss 7.0070 LearningRate 0.000974 Epoch: 4 Global Step: 92570 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:36:07,865-Speed 2497.47 samples/sec Loss 6.9765 LearningRate 0.000974 Epoch: 4 Global Step: 92580 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:36:16,013-Speed 2514.06 samples/sec Loss 7.0265 LearningRate 0.000974 Epoch: 4 Global Step: 92590 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:36:24,211-Speed 
2498.58 samples/sec Loss 7.1404 LearningRate 0.000974 Epoch: 4 Global Step: 92600 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:36:32,410-Speed 2498.09 samples/sec Loss 6.9801 LearningRate 0.000974 Epoch: 4 Global Step: 92610 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:36:40,612-Speed 2497.58 samples/sec Loss 7.0073 LearningRate 0.000974 Epoch: 4 Global Step: 92620 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:36:48,812-Speed 2497.69 samples/sec Loss 7.0722 LearningRate 0.000974 Epoch: 4 Global Step: 92630 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:36:57,015-Speed 2497.02 samples/sec Loss 6.9476 LearningRate 0.000974 Epoch: 4 Global Step: 92640 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:37:05,162-Speed 2514.31 samples/sec Loss 6.9783 LearningRate 0.000974 Epoch: 4 Global Step: 92650 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:37:13,362-Speed 2497.88 samples/sec Loss 6.9999 LearningRate 0.000974 Epoch: 4 Global Step: 92660 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:37:21,572-Speed 2494.97 samples/sec Loss 7.0331 LearningRate 0.000974 Epoch: 4 Global Step: 92670 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:37:29,769-Speed 2498.83 samples/sec Loss 6.9092 LearningRate 0.000974 Epoch: 4 Global Step: 92680 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:37:37,965-Speed 2499.43 samples/sec Loss 7.0194 LearningRate 0.000974 Epoch: 4 Global Step: 92690 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:37:46,173-Speed 2495.36 samples/sec Loss 6.9861 LearningRate 0.000974 Epoch: 4 Global Step: 92700 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:37:54,314-Speed 2516.18 samples/sec Loss 7.0443 LearningRate 0.000974 Epoch: 4 Global Step: 92710 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:38:02,513-Speed 2498.39 samples/sec 
Loss 6.9440 LearningRate 0.000974 Epoch: 4 Global Step: 92720 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:38:10,713-Speed 2497.90 samples/sec Loss 6.9408 LearningRate 0.000974 Epoch: 4 Global Step: 92730 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:38:18,914-Speed 2497.78 samples/sec Loss 6.9434 LearningRate 0.000974 Epoch: 4 Global Step: 92740 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:38:27,110-Speed 2499.52 samples/sec Loss 6.9691 LearningRate 0.000974 Epoch: 4 Global Step: 92750 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:38:35,310-Speed 2497.99 samples/sec Loss 6.8201 LearningRate 0.000974 Epoch: 4 Global Step: 92760 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:38:43,454-Speed 2514.83 samples/sec Loss 6.9359 LearningRate 0.000974 Epoch: 4 Global Step: 92770 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:38:51,666-Speed 2494.50 samples/sec Loss 6.9509 LearningRate 0.000974 Epoch: 4 Global Step: 92780 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:38:59,865-Speed 2498.19 samples/sec Loss 6.9845 LearningRate 0.000974 Epoch: 4 Global Step: 92790 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:39:08,064-Speed 2498.27 samples/sec Loss 6.9930 LearningRate 0.000974 Epoch: 4 Global Step: 92800 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:39:16,261-Speed 2498.83 samples/sec Loss 6.8768 LearningRate 0.000974 Epoch: 4 Global Step: 92810 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:39:24,460-Speed 2498.30 samples/sec Loss 6.9317 LearningRate 0.000974 Epoch: 4 Global Step: 92820 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:39:32,606-Speed 2514.66 samples/sec Loss 7.0107 LearningRate 0.000974 Epoch: 4 Global Step: 92830 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:39:40,804-Speed 2498.67 samples/sec Loss 6.8542 
LearningRate 0.000974 Epoch: 4 Global Step: 92840 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:39:49,001-Speed 2498.66 samples/sec Loss 6.9541 LearningRate 0.000974 Epoch: 4 Global Step: 92850 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:39:57,210-Speed 2495.45 samples/sec Loss 6.8776 LearningRate 0.000974 Epoch: 4 Global Step: 92860 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:40:05,409-Speed 2498.19 samples/sec Loss 6.9027 LearningRate 0.000974 Epoch: 4 Global Step: 92870 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:40:13,606-Speed 2498.77 samples/sec Loss 7.0348 LearningRate 0.000974 Epoch: 4 Global Step: 92880 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:40:21,750-Speed 2515.27 samples/sec Loss 6.9552 LearningRate 0.000974 Epoch: 4 Global Step: 92890 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:40:29,947-Speed 2498.87 samples/sec Loss 7.1243 LearningRate 0.000974 Epoch: 4 Global Step: 92900 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:40:38,147-Speed 2498.20 samples/sec Loss 7.0561 LearningRate 0.000974 Epoch: 4 Global Step: 92910 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:40:46,350-Speed 2496.99 samples/sec Loss 7.0919 LearningRate 0.000973 Epoch: 4 Global Step: 92920 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:40:54,549-Speed 2498.10 samples/sec Loss 6.9976 LearningRate 0.000973 Epoch: 4 Global Step: 92930 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:41:02,746-Speed 2499.12 samples/sec Loss 7.0497 LearningRate 0.000973 Epoch: 4 Global Step: 92940 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:41:10,890-Speed 2515.06 samples/sec Loss 6.8823 LearningRate 0.000973 Epoch: 4 Global Step: 92950 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:41:19,092-Speed 2497.46 samples/sec Loss 6.9731 LearningRate 
0.000973 Epoch: 4 Global Step: 92960 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:41:27,286-Speed 2499.69 samples/sec Loss 7.0149 LearningRate 0.000973 Epoch: 4 Global Step: 92970 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:41:35,486-Speed 2497.86 samples/sec Loss 7.0511 LearningRate 0.000973 Epoch: 4 Global Step: 92980 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:41:43,685-Speed 2498.35 samples/sec Loss 7.0602 LearningRate 0.000973 Epoch: 4 Global Step: 92990 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:41:51,882-Speed 2498.88 samples/sec Loss 7.0624 LearningRate 0.000973 Epoch: 4 Global Step: 93000 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:42:00,027-Speed 2515.09 samples/sec Loss 6.9178 LearningRate 0.000973 Epoch: 4 Global Step: 93010 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:42:08,236-Speed 2495.17 samples/sec Loss 7.0100 LearningRate 0.000973 Epoch: 4 Global Step: 93020 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:42:16,437-Speed 2497.99 samples/sec Loss 7.0344 LearningRate 0.000973 Epoch: 4 Global Step: 93030 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:42:24,637-Speed 2497.94 samples/sec Loss 6.9012 LearningRate 0.000973 Epoch: 4 Global Step: 93040 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:42:32,836-Speed 2498.44 samples/sec Loss 6.9829 LearningRate 0.000973 Epoch: 4 Global Step: 93050 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:42:41,039-Speed 2496.73 samples/sec Loss 6.9476 LearningRate 0.000973 Epoch: 4 Global Step: 93060 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:42:49,186-Speed 2514.40 samples/sec Loss 7.0071 LearningRate 0.000973 Epoch: 4 Global Step: 93070 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:42:57,386-Speed 2497.94 samples/sec Loss 6.9766 LearningRate 0.000973 Epoch: 4 
Global Step: 93080 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:43:05,587-Speed 2497.67 samples/sec Loss 7.0113 LearningRate 0.000973 Epoch: 4 Global Step: 93090 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:43:13,783-Speed 2499.67 samples/sec Loss 7.0645 LearningRate 0.000973 Epoch: 4 Global Step: 93100 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:43:21,982-Speed 2498.14 samples/sec Loss 7.1026 LearningRate 0.000973 Epoch: 4 Global Step: 93110 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:43:30,179-Speed 2498.75 samples/sec Loss 6.9628 LearningRate 0.000973 Epoch: 4 Global Step: 93120 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:43:38,324-Speed 2515.01 samples/sec Loss 6.9390 LearningRate 0.000973 Epoch: 4 Global Step: 93130 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:43:46,520-Speed 2499.08 samples/sec Loss 6.9572 LearningRate 0.000973 Epoch: 4 Global Step: 93140 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:43:54,714-Speed 2499.81 samples/sec Loss 6.9840 LearningRate 0.000973 Epoch: 4 Global Step: 93150 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:44:02,923-Speed 2495.34 samples/sec Loss 6.9686 LearningRate 0.000973 Epoch: 4 Global Step: 93160 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:44:11,126-Speed 2496.91 samples/sec Loss 6.9363 LearningRate 0.000973 Epoch: 4 Global Step: 93170 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:44:19,323-Speed 2498.96 samples/sec Loss 6.9743 LearningRate 0.000973 Epoch: 4 Global Step: 93180 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:44:27,470-Speed 2514.26 samples/sec Loss 6.9613 LearningRate 0.000973 Epoch: 4 Global Step: 93190 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:44:35,670-Speed 2498.13 samples/sec Loss 7.0295 LearningRate 0.000973 Epoch: 4 Global Step: 93200 
Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:44:43,866-Speed 2499.03 samples/sec Loss 6.9835 LearningRate 0.000973 Epoch: 4 Global Step: 93210 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:44:52,063-Speed 2499.01 samples/sec Loss 6.9786 LearningRate 0.000973 Epoch: 4 Global Step: 93220 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:45:00,258-Speed 2499.39 samples/sec Loss 6.9784 LearningRate 0.000973 Epoch: 4 Global Step: 93230 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:45:08,460-Speed 2497.37 samples/sec Loss 6.8737 LearningRate 0.000973 Epoch: 4 Global Step: 93240 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:45:16,610-Speed 2513.23 samples/sec Loss 6.9887 LearningRate 0.000973 Epoch: 4 Global Step: 93250 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:45:24,804-Speed 2499.77 samples/sec Loss 6.9447 LearningRate 0.000973 Epoch: 4 Global Step: 93260 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:45:33,004-Speed 2498.16 samples/sec Loss 6.9905 LearningRate 0.000973 Epoch: 4 Global Step: 93270 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:45:41,202-Speed 2498.27 samples/sec Loss 7.0065 LearningRate 0.000973 Epoch: 4 Global Step: 93280 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:45:49,398-Speed 2499.07 samples/sec Loss 6.9376 LearningRate 0.000973 Epoch: 4 Global Step: 93290 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:45:57,598-Speed 2498.16 samples/sec Loss 7.0003 LearningRate 0.000972 Epoch: 4 Global Step: 93300 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:46:05,745-Speed 2514.11 samples/sec Loss 6.9565 LearningRate 0.000972 Epoch: 4 Global Step: 93310 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:46:13,944-Speed 2498.54 samples/sec Loss 6.9912 LearningRate 0.000972 Epoch: 4 Global Step: 93320 Fp16 Grad Scale: 
32768 Required: 168 hours Training: 2022-07-06 11:46:22,140-Speed 2499.54 samples/sec Loss 6.9988 LearningRate 0.000972 Epoch: 4 Global Step: 93330 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:46:30,336-Speed 2499.08 samples/sec Loss 6.9427 LearningRate 0.000972 Epoch: 4 Global Step: 93340 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:46:38,539-Speed 2497.12 samples/sec Loss 6.9620 LearningRate 0.000972 Epoch: 4 Global Step: 93350 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:46:46,742-Speed 2497.21 samples/sec Loss 7.0107 LearningRate 0.000972 Epoch: 4 Global Step: 93360 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:46:54,886-Speed 2515.28 samples/sec Loss 6.9977 LearningRate 0.000972 Epoch: 4 Global Step: 93370 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:47:03,084-Speed 2498.56 samples/sec Loss 6.9732 LearningRate 0.000972 Epoch: 4 Global Step: 93380 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:47:11,290-Speed 2495.94 samples/sec Loss 7.0152 LearningRate 0.000972 Epoch: 4 Global Step: 93390 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:47:19,491-Speed 2497.67 samples/sec Loss 6.8984 LearningRate 0.000972 Epoch: 4 Global Step: 93400 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:47:27,691-Speed 2497.94 samples/sec Loss 6.8885 LearningRate 0.000972 Epoch: 4 Global Step: 93410 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:47:35,890-Speed 2498.23 samples/sec Loss 6.9976 LearningRate 0.000972 Epoch: 4 Global Step: 93420 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:47:44,033-Speed 2515.16 samples/sec Loss 6.9489 LearningRate 0.000972 Epoch: 4 Global Step: 93430 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:47:52,240-Speed 2495.95 samples/sec Loss 6.9425 LearningRate 0.000972 Epoch: 4 Global Step: 93440 Fp16 Grad Scale: 65536 Required: 168 
hours Training: 2022-07-06 11:48:00,441-Speed 2497.88 samples/sec Loss 6.8810 LearningRate 0.000972 Epoch: 4 Global Step: 93450 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:48:08,639-Speed 2498.36 samples/sec Loss 6.8238 LearningRate 0.000972 Epoch: 4 Global Step: 93460 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:48:16,852-Speed 2494.10 samples/sec Loss 6.8638 LearningRate 0.000972 Epoch: 4 Global Step: 93470 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:48:25,050-Speed 2498.53 samples/sec Loss 7.0016 LearningRate 0.000972 Epoch: 4 Global Step: 93480 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:48:33,208-Speed 2510.89 samples/sec Loss 6.8941 LearningRate 0.000972 Epoch: 4 Global Step: 93490 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:48:41,409-Speed 2497.64 samples/sec Loss 6.9277 LearningRate 0.000972 Epoch: 4 Global Step: 93500 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:48:49,607-Speed 2498.77 samples/sec Loss 6.8806 LearningRate 0.000972 Epoch: 4 Global Step: 93510 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:48:57,803-Speed 2499.05 samples/sec Loss 6.9139 LearningRate 0.000972 Epoch: 4 Global Step: 93520 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:49:06,021-Speed 2492.62 samples/sec Loss 6.9613 LearningRate 0.000972 Epoch: 4 Global Step: 93530 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:49:14,229-Speed 2495.55 samples/sec Loss 6.8605 LearningRate 0.000972 Epoch: 4 Global Step: 93540 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:49:22,387-Speed 2510.68 samples/sec Loss 6.9438 LearningRate 0.000972 Epoch: 4 Global Step: 93550 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:49:30,586-Speed 2498.37 samples/sec Loss 6.9724 LearningRate 0.000972 Epoch: 4 Global Step: 93560 Fp16 Grad Scale: 65536 Required: 168 hours Training: 
2022-07-06 11:49:38,791-Speed 2496.53 samples/sec Loss 6.8924 LearningRate 0.000972 Epoch: 4 Global Step: 93570 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:49:46,985-Speed 2499.67 samples/sec Loss 6.8566 LearningRate 0.000972 Epoch: 4 Global Step: 93580 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:49:55,184-Speed 2498.42 samples/sec Loss 6.9551 LearningRate 0.000972 Epoch: 4 Global Step: 93590 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:50:03,383-Speed 2498.69 samples/sec Loss 6.9537 LearningRate 0.000972 Epoch: 4 Global Step: 93600 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:50:11,531-Speed 2513.98 samples/sec Loss 6.9023 LearningRate 0.000972 Epoch: 4 Global Step: 93610 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:50:19,729-Speed 2498.23 samples/sec Loss 6.9256 LearningRate 0.000972 Epoch: 4 Global Step: 93620 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:50:27,927-Speed 2498.71 samples/sec Loss 6.9467 LearningRate 0.000972 Epoch: 4 Global Step: 93630 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:50:36,124-Speed 2498.81 samples/sec Loss 6.9511 LearningRate 0.000972 Epoch: 4 Global Step: 93640 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:50:44,323-Speed 2498.53 samples/sec Loss 6.9104 LearningRate 0.000972 Epoch: 4 Global Step: 93650 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:50:52,521-Speed 2498.46 samples/sec Loss 6.9795 LearningRate 0.000972 Epoch: 4 Global Step: 93660 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:51:00,666-Speed 2514.80 samples/sec Loss 6.9543 LearningRate 0.000972 Epoch: 4 Global Step: 93670 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:51:08,865-Speed 2498.34 samples/sec Loss 6.9240 LearningRate 0.000971 Epoch: 4 Global Step: 93680 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 
11:51:17,067-Speed 2497.29 samples/sec Loss 6.9080 LearningRate 0.000971 Epoch: 4 Global Step: 93690 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:51:25,264-Speed 2499.00 samples/sec Loss 6.8586 LearningRate 0.000971 Epoch: 4 Global Step: 93700 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:51:33,463-Speed 2498.14 samples/sec Loss 6.9454 LearningRate 0.000971 Epoch: 4 Global Step: 93710 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:51:41,663-Speed 2498.11 samples/sec Loss 6.8800 LearningRate 0.000971 Epoch: 4 Global Step: 93720 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:51:49,808-Speed 2514.84 samples/sec Loss 6.9033 LearningRate 0.000971 Epoch: 4 Global Step: 93730 Fp16 Grad Scale: 65536 Required: 168 hours Training: 2022-07-06 11:51:57,964-Speed 2511.60 samples/sec Loss 6.8080 LearningRate 0.000971 Epoch: 4 Global Step: 93740 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:52:06,163-Speed 2498.25 samples/sec Loss 6.8902 LearningRate 0.000971 Epoch: 4 Global Step: 93750 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:52:14,360-Speed 2498.93 samples/sec Loss 6.9759 LearningRate 0.000971 Epoch: 4 Global Step: 93760 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:52:22,568-Speed 2495.47 samples/sec Loss 6.8799 LearningRate 0.000971 Epoch: 4 Global Step: 93770 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:52:30,767-Speed 2498.21 samples/sec Loss 7.0206 LearningRate 0.000971 Epoch: 4 Global Step: 93780 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:52:38,913-Speed 2514.71 samples/sec Loss 6.9578 LearningRate 0.000971 Epoch: 4 Global Step: 93790 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:52:47,123-Speed 2494.95 samples/sec Loss 6.9869 LearningRate 0.000971 Epoch: 4 Global Step: 93800 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:52:55,336-Speed 
2493.86 samples/sec Loss 6.9643 LearningRate 0.000971 Epoch: 4 Global Step: 93810 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:53:03,536-Speed 2498.61 samples/sec Loss 6.8932 LearningRate 0.000971 Epoch: 4 Global Step: 93820 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:53:11,734-Speed 2498.60 samples/sec Loss 6.8962 LearningRate 0.000971 Epoch: 4 Global Step: 93830 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:53:19,933-Speed 2498.26 samples/sec Loss 6.9184 LearningRate 0.000971 Epoch: 4 Global Step: 93840 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:53:28,080-Speed 2514.27 samples/sec Loss 6.9141 LearningRate 0.000971 Epoch: 4 Global Step: 93850 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:53:36,279-Speed 2498.39 samples/sec Loss 6.7808 LearningRate 0.000971 Epoch: 4 Global Step: 93860 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:53:44,476-Speed 2498.84 samples/sec Loss 6.8653 LearningRate 0.000971 Epoch: 4 Global Step: 93870 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:53:52,679-Speed 2497.06 samples/sec Loss 6.8088 LearningRate 0.000971 Epoch: 4 Global Step: 93880 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:54:00,879-Speed 2497.94 samples/sec Loss 6.8607 LearningRate 0.000971 Epoch: 4 Global Step: 93890 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:54:09,082-Speed 2497.17 samples/sec Loss 6.9563 LearningRate 0.000971 Epoch: 4 Global Step: 93900 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:54:17,230-Speed 2514.03 samples/sec Loss 6.9381 LearningRate 0.000971 Epoch: 4 Global Step: 93910 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:54:25,427-Speed 2498.94 samples/sec Loss 6.8729 LearningRate 0.000971 Epoch: 4 Global Step: 93920 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 11:54:33,627-Speed 2497.85 samples/sec 
Loss 6.8316 LearningRate 0.000971 Epoch: 4 Global Step: 93930 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:54:41,825-Speed 2498.80 samples/sec Loss 7.0139 LearningRate 0.000971 Epoch: 4 Global Step: 93940 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:54:50,021-Speed 2499.30 samples/sec Loss 6.9247 LearningRate 0.000971 Epoch: 4 Global Step: 93950 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:54:58,219-Speed 2498.36 samples/sec Loss 6.8090 LearningRate 0.000971 Epoch: 4 Global Step: 93960 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:55:06,367-Speed 2514.13 samples/sec Loss 6.8090 LearningRate 0.000971 Epoch: 4 Global Step: 93970 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:55:14,565-Speed 2498.39 samples/sec Loss 6.7817 LearningRate 0.000971 Epoch: 4 Global Step: 93980 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:55:22,769-Speed 2496.80 samples/sec Loss 6.9355 LearningRate 0.000971 Epoch: 4 Global Step: 93990 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:55:30,970-Speed 2497.49 samples/sec Loss 6.8086 LearningRate 0.000971 Epoch: 4 Global Step: 94000 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:55:39,174-Speed 2496.86 samples/sec Loss 6.7954 LearningRate 0.000971 Epoch: 4 Global Step: 94010 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:55:47,371-Speed 2499.10 samples/sec Loss 6.7991 LearningRate 0.000971 Epoch: 4 Global Step: 94020 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:55:55,518-Speed 2514.29 samples/sec Loss 6.7890 LearningRate 0.000971 Epoch: 4 Global Step: 94030 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:56:03,722-Speed 2496.52 samples/sec Loss 6.8570 LearningRate 0.000971 Epoch: 4 Global Step: 94040 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:56:11,921-Speed 2498.46 samples/sec Loss 6.9090 LearningRate 0.000971 Epoch: 4 Global Step: 94050 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:56:20,117-Speed 2499.50 samples/sec Loss 6.8872 LearningRate 0.000970 Epoch: 4 Global Step: 94060 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:56:28,317-Speed 2497.97 samples/sec Loss 6.9415 LearningRate 0.000970 Epoch: 4 Global Step: 94070 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:56:36,518-Speed 2497.47 samples/sec Loss 6.8883 LearningRate 0.000970 Epoch: 4 Global Step: 94080 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:56:44,664-Speed 2514.47 samples/sec Loss 6.8555 LearningRate 0.000970 Epoch: 4 Global Step: 94090 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:56:52,861-Speed 2499.14 samples/sec Loss 6.9672 LearningRate 0.000970 Epoch: 4 Global Step: 94100 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:57:01,060-Speed 2498.16 samples/sec Loss 6.7858 LearningRate 0.000970 Epoch: 4 Global Step: 94110 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:57:09,259-Speed 2498.31 samples/sec Loss 6.8186 LearningRate 0.000970 Epoch: 4 Global Step: 94120 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:57:17,457-Speed 2498.51 samples/sec Loss 7.0038 LearningRate 0.000970 Epoch: 4 Global Step: 94130 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:57:25,655-Speed 2498.79 samples/sec Loss 7.1296 LearningRate 0.000970 Epoch: 4 Global Step: 94140 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:57:33,800-Speed 2514.56 samples/sec Loss 6.8999 LearningRate 0.000970 Epoch: 4 Global Step: 94150 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:57:42,002-Speed 2497.44 samples/sec Loss 6.9751 LearningRate 0.000970 Epoch: 4 Global Step: 94160 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:57:50,200-Speed 2498.66 samples/sec Loss 6.8758 LearningRate 0.000970 Epoch: 4 Global Step: 94170 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:57:58,397-Speed 2498.81 samples/sec Loss 6.9109 LearningRate 0.000970 Epoch: 4 Global Step: 94180 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:58:06,596-Speed 2498.34 samples/sec Loss 6.9624 LearningRate 0.000970 Epoch: 4 Global Step: 94190 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:58:14,794-Speed 2498.56 samples/sec Loss 6.8562 LearningRate 0.000970 Epoch: 4 Global Step: 94200 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:58:22,939-Speed 2514.94 samples/sec Loss 6.9674 LearningRate 0.000970 Epoch: 4 Global Step: 94210 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:58:31,143-Speed 2496.65 samples/sec Loss 6.9414 LearningRate 0.000970 Epoch: 4 Global Step: 94220 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:58:39,339-Speed 2499.09 samples/sec Loss 6.8721 LearningRate 0.000970 Epoch: 4 Global Step: 94230 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:58:47,540-Speed 2497.67 samples/sec Loss 6.8959 LearningRate 0.000970 Epoch: 4 Global Step: 94240 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:58:55,740-Speed 2497.96 samples/sec Loss 6.8255 LearningRate 0.000970 Epoch: 4 Global Step: 94250 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:59:03,937-Speed 2498.83 samples/sec Loss 7.0052 LearningRate 0.000970 Epoch: 4 Global Step: 94260 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:59:12,089-Speed 2512.99 samples/sec Loss 6.9073 LearningRate 0.000970 Epoch: 4 Global Step: 94270 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:59:20,302-Speed 2494.10 samples/sec Loss 6.8532 LearningRate 0.000970 Epoch: 4 Global Step: 94280 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:59:28,500-Speed 2498.41 samples/sec Loss 6.8250 LearningRate 0.000970 Epoch: 4 Global Step: 94290 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:59:36,703-Speed 2496.93 samples/sec Loss 6.8629 LearningRate 0.000970 Epoch: 4 Global Step: 94300 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:59:44,910-Speed 2495.91 samples/sec Loss 6.7853 LearningRate 0.000970 Epoch: 4 Global Step: 94310 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 11:59:53,109-Speed 2498.52 samples/sec Loss 6.7839 LearningRate 0.000970 Epoch: 4 Global Step: 94320 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:00:01,256-Speed 2514.05 samples/sec Loss 6.7980 LearningRate 0.000970 Epoch: 4 Global Step: 94330 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:00:09,454-Speed 2498.56 samples/sec Loss 6.8878 LearningRate 0.000970 Epoch: 4 Global Step: 94340 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:00:17,654-Speed 2497.77 samples/sec Loss 6.9271 LearningRate 0.000970 Epoch: 4 Global Step: 94350 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:00:25,853-Speed 2498.35 samples/sec Loss 6.9440 LearningRate 0.000970 Epoch: 4 Global Step: 94360 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:00:34,050-Speed 2498.89 samples/sec Loss 6.9018 LearningRate 0.000970 Epoch: 4 Global Step: 94370 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:00:42,256-Speed 2495.88 samples/sec Loss 6.7973 LearningRate 0.000970 Epoch: 4 Global Step: 94380 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:00:50,401-Speed 2515.20 samples/sec Loss 6.8746 LearningRate 0.000970 Epoch: 4 Global Step: 94390 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:00:58,604-Speed 2497.02 samples/sec Loss 6.8243 LearningRate 0.000970 Epoch: 4 Global Step: 94400 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:01:06,811-Speed 2495.84 samples/sec Loss 6.8344 LearningRate 0.000970 Epoch: 4 Global Step: 94410 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:01:15,009-Speed 2498.37 samples/sec Loss 6.9788 LearningRate 0.000970 Epoch: 4 Global Step: 94420 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:01:23,206-Speed 2499.20 samples/sec Loss 7.0663 LearningRate 0.000970 Epoch: 4 Global Step: 94430 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:01:31,402-Speed 2498.95 samples/sec Loss 7.0161 LearningRate 0.000969 Epoch: 4 Global Step: 94440 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:01:39,548-Speed 2514.64 samples/sec Loss 6.9306 LearningRate 0.000969 Epoch: 4 Global Step: 94450 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:01:47,747-Speed 2498.43 samples/sec Loss 7.0267 LearningRate 0.000969 Epoch: 4 Global Step: 94460 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:01:55,952-Speed 2496.65 samples/sec Loss 6.9319 LearningRate 0.000969 Epoch: 4 Global Step: 94470 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:02:04,149-Speed 2498.68 samples/sec Loss 6.9225 LearningRate 0.000969 Epoch: 4 Global Step: 94480 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:02:12,351-Speed 2497.65 samples/sec Loss 6.8371 LearningRate 0.000969 Epoch: 4 Global Step: 94490 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:02:20,550-Speed 2498.15 samples/sec Loss 6.8278 LearningRate 0.000969 Epoch: 4 Global Step: 94500 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:02:28,693-Speed 2515.43 samples/sec Loss 6.8253 LearningRate 0.000969 Epoch: 4 Global Step: 94510 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:02:36,893-Speed 2498.04 samples/sec Loss 6.8470 LearningRate 0.000969 Epoch: 4 Global Step: 94520 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:02:45,090-Speed 2498.82 samples/sec Loss 6.9254 LearningRate 0.000969 Epoch: 4 Global Step: 94530 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:02:53,288-Speed 2498.58 samples/sec Loss 6.8325 LearningRate 0.000969 Epoch: 4 Global Step: 94540 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:03:01,487-Speed 2498.40 samples/sec Loss 6.7471 LearningRate 0.000969 Epoch: 4 Global Step: 94550 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:03:09,684-Speed 2498.72 samples/sec Loss 6.7432 LearningRate 0.000969 Epoch: 4 Global Step: 94560 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:03:17,827-Speed 2515.72 samples/sec Loss 6.8054 LearningRate 0.000969 Epoch: 4 Global Step: 94570 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:03:26,028-Speed 2497.83 samples/sec Loss 6.9072 LearningRate 0.000969 Epoch: 4 Global Step: 94580 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:03:34,236-Speed 2495.64 samples/sec Loss 6.8814 LearningRate 0.000969 Epoch: 4 Global Step: 94590 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:03:42,437-Speed 2497.87 samples/sec Loss 6.8254 LearningRate 0.000969 Epoch: 4 Global Step: 94600 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:03:50,637-Speed 2497.91 samples/sec Loss 6.8233 LearningRate 0.000969 Epoch: 4 Global Step: 94610 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:03:58,836-Speed 2498.13 samples/sec Loss 6.8072 LearningRate 0.000969 Epoch: 4 Global Step: 94620 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:04:06,981-Speed 2515.15 samples/sec Loss 6.8548 LearningRate 0.000969 Epoch: 4 Global Step: 94630 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:04:15,180-Speed 2498.34 samples/sec Loss 6.7396 LearningRate 0.000969 Epoch: 4 Global Step: 94640 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:04:23,381-Speed 2497.55 samples/sec Loss 6.8177 LearningRate 0.000969 Epoch: 4 Global Step: 94650 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:04:31,591-Speed 2494.66 samples/sec Loss 6.8440 LearningRate 0.000969 Epoch: 4 Global Step: 94660 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:04:39,788-Speed 2498.86 samples/sec Loss 6.8198 LearningRate 0.000969 Epoch: 4 Global Step: 94670 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:04:47,987-Speed 2498.30 samples/sec Loss 6.7720 LearningRate 0.000969 Epoch: 4 Global Step: 94680 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:04:56,134-Speed 2514.12 samples/sec Loss 6.8474 LearningRate 0.000969 Epoch: 4 Global Step: 94690 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:05:04,332-Speed 2499.04 samples/sec Loss 6.7785 LearningRate 0.000969 Epoch: 4 Global Step: 94700 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:05:12,532-Speed 2498.28 samples/sec Loss 6.8321 LearningRate 0.000969 Epoch: 4 Global Step: 94710 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:05:20,729-Speed 2499.00 samples/sec Loss 6.7981 LearningRate 0.000969 Epoch: 4 Global Step: 94720 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:05:28,927-Speed 2498.65 samples/sec Loss 6.8099 LearningRate 0.000969 Epoch: 4 Global Step: 94730 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:05:37,127-Speed 2498.01 samples/sec Loss 6.8563 LearningRate 0.000969 Epoch: 4 Global Step: 94740 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:05:45,272-Speed 2514.75 samples/sec Loss 6.8784 LearningRate 0.000969 Epoch: 4 Global Step: 94750 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:05:53,473-Speed 2497.77 samples/sec Loss 6.8135 LearningRate 0.000969 Epoch: 4 Global Step: 94760 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:06:01,670-Speed 2498.68 samples/sec Loss 6.7260 LearningRate 0.000969 Epoch: 4 Global Step: 94770 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:06:09,864-Speed 2500.37 samples/sec Loss 6.7875 LearningRate 0.000969 Epoch: 4 Global Step: 94780 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:06:18,065-Speed 2497.86 samples/sec Loss 6.6829 LearningRate 0.000969 Epoch: 4 Global Step: 94790 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:06:26,268-Speed 2497.00 samples/sec Loss 6.7137 LearningRate 0.000969 Epoch: 4 Global Step: 94800 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:06:34,416-Speed 2513.91 samples/sec Loss 6.7215 LearningRate 0.000969 Epoch: 4 Global Step: 94810 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:06:42,616-Speed 2497.83 samples/sec Loss 6.8575 LearningRate 0.000968 Epoch: 4 Global Step: 94820 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:06:50,815-Speed 2498.44 samples/sec Loss 6.6996 LearningRate 0.000968 Epoch: 4 Global Step: 94830 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:06:59,014-Speed 2498.25 samples/sec Loss 6.7842 LearningRate 0.000968 Epoch: 4 Global Step: 94840 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:07:07,221-Speed 2495.81 samples/sec Loss 6.7853 LearningRate 0.000968 Epoch: 4 Global Step: 94850 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:07:15,421-Speed 2498.02 samples/sec Loss 6.8011 LearningRate 0.000968 Epoch: 4 Global Step: 94860 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:07:23,575-Speed 2512.11 samples/sec Loss 6.9053 LearningRate 0.000968 Epoch: 4 Global Step: 94870 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:07:31,771-Speed 2499.11 samples/sec Loss 7.0148 LearningRate 0.000968 Epoch: 4 Global Step: 94880 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:07:39,971-Speed 2498.15 samples/sec Loss 6.8779 LearningRate 0.000968 Epoch: 4 Global Step: 94890 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:07:48,173-Speed 2497.29 samples/sec Loss 6.8694 LearningRate 0.000968 Epoch: 4 Global Step: 94900 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:07:56,377-Speed 2496.72 samples/sec Loss 6.9497 LearningRate 0.000968 Epoch: 4 Global Step: 94910 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:08:04,576-Speed 2498.31 samples/sec Loss 6.8955 LearningRate 0.000968 Epoch: 4 Global Step: 94920 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:08:12,722-Speed 2514.68 samples/sec Loss 6.9113 LearningRate 0.000968 Epoch: 4 Global Step: 94930 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:08:20,920-Speed 2498.91 samples/sec Loss 7.0122 LearningRate 0.000968 Epoch: 4 Global Step: 94940 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:08:29,130-Speed 2494.73 samples/sec Loss 6.8818 LearningRate 0.000968 Epoch: 4 Global Step: 94950 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:08:37,330-Speed 2498.08 samples/sec Loss 6.8501 LearningRate 0.000968 Epoch: 4 Global Step: 94960 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:08:45,528-Speed 2498.53 samples/sec Loss 6.9249 LearningRate 0.000968 Epoch: 4 Global Step: 94970 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:08:53,724-Speed 2499.19 samples/sec Loss 6.8107 LearningRate 0.000968 Epoch: 4 Global Step: 94980 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:09:01,865-Speed 2516.22 samples/sec Loss 6.9040 LearningRate 0.000968 Epoch: 4 Global Step: 94990 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:09:10,061-Speed 2499.23 samples/sec Loss 6.8513 LearningRate 0.000968 Epoch: 4 Global Step: 95000 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:09:18,274-Speed 2494.14 samples/sec Loss 6.8821 LearningRate 0.000968 Epoch: 4 Global Step: 95010 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:09:26,467-Speed 2500.12 samples/sec Loss 6.7509 LearningRate 0.000968 Epoch: 4 Global Step: 95020 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:09:34,662-Speed 2499.71 samples/sec Loss 6.8689 LearningRate 0.000968 Epoch: 4 Global Step: 95030 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:09:42,859-Speed 2498.66 samples/sec Loss 6.7041 LearningRate 0.000968 Epoch: 4 Global Step: 95040 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:09:51,000-Speed 2516.15 samples/sec Loss 6.7818 LearningRate 0.000968 Epoch: 4 Global Step: 95050 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:09:59,191-Speed 2500.44 samples/sec Loss 6.8179 LearningRate 0.000968 Epoch: 4 Global Step: 95060 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:10:07,391-Speed 2497.84 samples/sec Loss 6.8392 LearningRate 0.000968 Epoch: 4 Global Step: 95070 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:10:15,590-Speed 2498.20 samples/sec Loss 6.8664 LearningRate 0.000968 Epoch: 4 Global Step: 95080 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:10:23,788-Speed 2499.23 samples/sec Loss 6.7941 LearningRate 0.000968 Epoch: 4 Global Step: 95090 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:10:31,980-Speed 2500.11 samples/sec Loss 7.0105 LearningRate 0.000968 Epoch: 4 Global Step: 95100 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:10:40,126-Speed 2514.59 samples/sec Loss 6.8281 LearningRate 0.000968 Epoch: 4 Global Step: 95110 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:10:48,319-Speed 2500.31 samples/sec Loss 6.8634 LearningRate 0.000968 Epoch: 4 Global Step: 95120 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:10:56,515-Speed 2499.06 samples/sec Loss 6.7668 LearningRate 0.000968 Epoch: 4 Global Step: 95130 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:11:04,723-Speed 2495.85 samples/sec Loss 6.8303 LearningRate 0.000968 Epoch: 4 Global Step: 95140 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:11:12,916-Speed 2499.98 samples/sec Loss 6.7569 LearningRate 0.000968 Epoch: 4 Global Step: 95150 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:11:21,143-Speed 2489.82 samples/sec Loss 6.7691 LearningRate 0.000968 Epoch: 4 Global Step: 95160 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:11:29,287-Speed 2514.95 samples/sec Loss 6.6847 LearningRate 0.000968 Epoch: 4 Global Step: 95170 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:11:37,486-Speed 2498.43 samples/sec Loss 6.7545 LearningRate 0.000968 Epoch: 4 Global Step: 95180 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:11:45,686-Speed 2497.91 samples/sec Loss 6.7345 LearningRate 0.000967 Epoch: 4 Global Step: 95190 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:11:53,887-Speed 2497.75 samples/sec Loss 6.6984 LearningRate 0.000967 Epoch: 4 Global Step: 95200 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:12:02,089-Speed 2497.28 samples/sec Loss 6.8922 LearningRate 0.000967 Epoch: 4 Global Step: 95210 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:12:10,291-Speed 2497.53 samples/sec Loss 6.8714 LearningRate 0.000967 Epoch: 4 Global Step: 95220 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:12:18,435-Speed 2515.07 samples/sec Loss 6.8734 LearningRate 0.000967 Epoch: 4 Global Step: 95230 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:12:26,633-Speed 2498.52 samples/sec Loss 6.6656 LearningRate 0.000967 Epoch: 4 Global Step: 95240 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:12:34,831-Speed 2498.61 samples/sec Loss 6.7168 LearningRate 0.000967 Epoch: 4 Global Step: 95250 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:12:43,033-Speed 2497.27 samples/sec Loss 6.7926 LearningRate 0.000967 Epoch: 4 Global Step: 95260 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:12:51,234-Speed 2497.66 samples/sec Loss 6.8835 LearningRate 0.000967 Epoch: 4 Global Step: 95270 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:12:59,438-Speed 2497.11 samples/sec Loss 6.8367 LearningRate 0.000967 Epoch: 4 Global Step: 95280 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:13:07,596-Speed 2510.76 samples/sec Loss 6.7551 LearningRate 0.000967 Epoch: 4 Global Step: 95290 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:13:15,797-Speed 2497.80 samples/sec Loss 6.8235 LearningRate 0.000967 Epoch: 4 Global Step: 95300 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:13:24,003-Speed 2496.14 samples/sec Loss 6.8069 LearningRate 0.000967 Epoch: 4 Global Step: 95310 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:13:32,201-Speed 2498.53 samples/sec Loss 6.7266 LearningRate 0.000967 Epoch: 4 Global Step: 95320 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:13:40,400-Speed 2498.58 samples/sec Loss 6.7425 LearningRate 0.000967 Epoch: 4 Global Step: 95330 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:13:48,593-Speed 2499.95 samples/sec Loss 6.7044 LearningRate 0.000967 Epoch: 4 Global Step: 95340 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:13:56,737-Speed 2515.30 samples/sec Loss 6.8671 LearningRate 0.000967 Epoch: 4 Global Step: 95350 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:14:04,930-Speed 2499.83 samples/sec Loss 6.8097 LearningRate 0.000967 Epoch: 4 Global Step: 95360 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:14:13,126-Speed 2499.43 samples/sec Loss 6.8032 LearningRate 0.000967 Epoch: 4 Global Step: 95370 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:14:21,324-Speed 2498.67 samples/sec Loss 6.8287 LearningRate 0.000967 Epoch: 4 Global Step: 95380 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:14:29,520-Speed 2499.07 samples/sec Loss 6.7779 LearningRate 0.000967 Epoch: 4 Global Step: 95390 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:14:37,714-Speed 2499.70 samples/sec Loss 6.7155 LearningRate 0.000967 Epoch: 4 Global Step: 95400 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:14:45,856-Speed 2516.00 samples/sec Loss 6.8784 LearningRate 0.000967 Epoch: 4 Global Step: 95410 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:14:54,052-Speed 2499.07 samples/sec Loss 6.7740 LearningRate 0.000967 Epoch: 4 Global Step: 95420 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:15:02,249-Speed 2498.69 samples/sec Loss 6.7238 LearningRate 0.000967 Epoch: 4 Global Step: 95430 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:15:10,445-Speed 2499.58 samples/sec Loss 6.7305 LearningRate 0.000967 Epoch: 4 Global Step: 95440 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:15:18,641-Speed 2499.28 samples/sec Loss 6.7312 LearningRate 0.000967 Epoch: 4 Global Step: 95450 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:15:26,845-Speed 2497.02 samples/sec Loss 6.7516 LearningRate 0.000967 Epoch: 4 Global Step: 95460 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:15:34,990-Speed 2514.81 samples/sec Loss 6.7722 LearningRate 0.000967 Epoch: 4 Global Step: 95470 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:15:43,186-Speed 2499.50 samples/sec Loss 6.7631 LearningRate 0.000967 Epoch: 4 Global Step: 95480 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:15:51,383-Speed 2498.79 samples/sec Loss 6.7239 LearningRate 0.000967 Epoch: 4 Global Step: 95490 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:15:59,580-Speed 2498.95 samples/sec Loss 6.7072 LearningRate 0.000967 Epoch: 4 Global Step: 95500 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:16:07,775-Speed 2499.48 samples/sec Loss 6.7386 LearningRate 0.000967 Epoch: 4 Global Step: 95510 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:16:15,970-Speed 2499.30 samples/sec Loss 6.8410 LearningRate 0.000967 Epoch: 4 Global Step: 95520 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:16:24,114-Speed 2515.17 samples/sec Loss 6.7202 LearningRate 0.000967 Epoch: 4 Global Step: 95530 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:16:32,310-Speed 2499.22 samples/sec Loss 6.8529 LearningRate 0.000967 Epoch: 4 Global Step: 95540 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:16:40,506-Speed 2499.30 samples/sec Loss 6.8770 LearningRate 0.000967 Epoch: 4 Global Step: 95550 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:16:48,700-Speed 2499.55 samples/sec Loss 6.8827 LearningRate 0.000967 Epoch: 4 Global Step: 95560 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:16:56,896-Speed 2499.26 samples/sec Loss 6.9547 LearningRate 0.000966 Epoch: 4 Global Step: 95570 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:17:05,091-Speed 2499.47 samples/sec Loss 6.8557 LearningRate 0.000966 Epoch: 4 Global Step: 95580 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:17:13,235-Speed 2515.04 samples/sec Loss 6.7805 LearningRate 0.000966 Epoch: 4 Global Step: 95590 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:17:21,432-Speed 2499.00 samples/sec Loss 6.7899 LearningRate 0.000966 Epoch: 4 Global Step: 95600 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:17:29,631-Speed 2498.20 samples/sec Loss 6.8874 LearningRate 0.000966 Epoch: 4 Global Step: 95610 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:17:37,827-Speed 2499.43 samples/sec Loss 6.8368 LearningRate 0.000966 Epoch: 4 Global Step: 95620 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:17:46,022-Speed 2499.34 samples/sec Loss 6.8942 LearningRate 0.000966 Epoch: 4 Global Step: 95630 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:17:54,232-Speed 2495.04 samples/sec Loss 6.7415 LearningRate 0.000966 Epoch: 4 Global Step: 95640 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:18:02,378-Speed 2514.69 samples/sec Loss 6.8956 LearningRate 0.000966 Epoch: 4 Global Step: 95650 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:18:10,577-Speed 2498.16 samples/sec Loss 6.9773 LearningRate 0.000966 Epoch: 4 Global Step: 95660 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:18:18,776-Speed 2498.34 samples/sec Loss 6.9343 LearningRate 0.000966 Epoch: 4 Global Step: 95670 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:18:26,977-Speed 2497.57 samples/sec Loss 6.8316 LearningRate 0.000966 Epoch: 4 Global Step: 95680 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:18:35,177-Speed 2498.04 samples/sec Loss 6.8031 LearningRate 0.000966 Epoch: 4 Global Step: 95690 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:18:43,376-Speed 2498.17 samples/sec Loss 6.7585 LearningRate 0.000966 Epoch: 4 Global Step: 95700 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:18:51,523-Speed 2514.42 samples/sec Loss 6.7805 LearningRate 0.000966 Epoch: 4 Global Step: 95710 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:18:59,722-Speed 2498.10 samples/sec Loss 6.7729 LearningRate 0.000966 Epoch: 4 Global Step: 95720 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:19:07,922-Speed 2498.16 samples/sec Loss 6.7214 LearningRate 0.000966 Epoch: 4 Global Step: 95730 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:19:16,120-Speed 2498.50 samples/sec Loss 6.8845 LearningRate 0.000966 Epoch: 4 Global Step: 95740 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:19:24,316-Speed 2499.41 samples/sec Loss 6.7372 LearningRate 0.000966 Epoch: 4 Global Step: 95750 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:19:32,513-Speed 2498.76 samples/sec Loss 6.7288 LearningRate 0.000966 Epoch: 4 Global Step: 95760 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:19:40,659-Speed 2514.46 samples/sec Loss 6.7764 LearningRate 0.000966 Epoch: 4 Global Step: 95770 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:19:48,854-Speed 2499.75 samples/sec Loss 6.7621 LearningRate 0.000966 Epoch: 4 Global Step: 95780 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:19:57,052-Speed 2498.74 samples/sec Loss 6.6669 LearningRate 0.000966 Epoch: 4 Global Step: 95790 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:20:05,254-Speed 2497.28 samples/sec Loss 6.7930 LearningRate 0.000966 Epoch: 4 Global Step: 95800 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:20:13,450-Speed 2499.40 samples/sec Loss 6.7558 LearningRate 0.000966 Epoch: 4 Global Step: 95810 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:20:21,646-Speed 2499.08 samples/sec Loss 6.7797 LearningRate 0.000966 Epoch: 4 Global Step: 95820 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:20:29,792-Speed 2514.54 samples/sec Loss 6.7938 LearningRate 0.000966 Epoch: 4 Global Step: 95830 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:20:37,990-Speed 2498.48 samples/sec Loss 6.8119 LearningRate 0.000966 Epoch: 4 Global Step: 95840 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:20:46,196-Speed 2496.05 samples/sec Loss 6.7913 LearningRate 0.000966 Epoch: 4 Global Step: 95850 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:20:54,411-Speed 2493.61 samples/sec Loss 6.8342 LearningRate 0.000966 Epoch: 4 Global Step: 95860 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:21:02,608-Speed 2498.66 samples/sec Loss 6.7734 LearningRate 0.000966 Epoch: 4 Global Step: 95870 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:21:10,811-Speed 2497.06 samples/sec Loss 6.7350 LearningRate 0.000966 Epoch: 4 Global Step: 95880 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:21:18,956-Speed 2514.65 samples/sec Loss 6.7902 LearningRate 0.000966 Epoch: 4 Global Step: 95890 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:21:27,157-Speed 2497.76 samples/sec Loss 6.6684 LearningRate 0.000966 Epoch: 4 Global Step: 95900 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:21:35,358-Speed 2497.82 samples/sec Loss 6.7981 LearningRate 0.000966 Epoch: 4 Global Step: 95910 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:21:43,557-Speed 2498.10 samples/sec Loss 6.7866 LearningRate 0.000966 Epoch: 4 Global Step: 95920 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:21:51,756-Speed 2498.35 samples/sec Loss 6.6967 LearningRate 0.000966 Epoch: 4 Global Step: 95930 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:21:59,954-Speed 2498.62 samples/sec Loss 6.7730 LearningRate 0.000966 Epoch: 4 Global Step: 95940 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:22:08,102-Speed 2513.99 samples/sec Loss 6.7292 LearningRate 0.000965 Epoch: 4 Global Step: 95950 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:22:16,300-Speed 2498.54 samples/sec Loss 6.6732 LearningRate 0.000965 Epoch: 4 Global Step: 95960 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:22:24,503-Speed 2497.06 samples/sec Loss 6.6485 LearningRate 0.000965 Epoch: 4 Global Step: 95970 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:22:32,704-Speed 2497.95 samples/sec Loss 6.7035 LearningRate 0.000965 Epoch: 4 Global Step: 95980 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:22:40,905-Speed 2497.51 samples/sec Loss 6.6795 LearningRate 0.000965 Epoch: 4 Global Step: 95990 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:22:49,106-Speed 2497.66 samples/sec Loss 6.6562 LearningRate 0.000965 Epoch: 4 Global Step: 96000 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:22:57,259-Speed 2512.44 samples/sec Loss 6.7402 LearningRate 0.000965 Epoch: 4 Global Step: 96010 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:23:05,456-Speed 2498.82 samples/sec Loss 6.7668 LearningRate 0.000965 Epoch: 4 Global Step: 96020 Fp16 Grad Scale: 65536 Required: 168 hours
Training: 2022-07-06 12:23:13,619-Speed 2509.21 samples/sec Loss 6.8401 LearningRate 0.000965 Epoch: 4 Global Step: 96030 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:23:21,822-Speed 2497.08 samples/sec Loss 6.7314 LearningRate 0.000965 Epoch: 4 Global Step: 96040 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:23:30,022-Speed 2498.10 samples/sec Loss 6.7074 LearningRate 0.000965 Epoch: 4 Global Step: 96050 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:23:38,226-Speed 2497.06 samples/sec Loss 6.8736 LearningRate 0.000965 Epoch: 4 Global Step: 96060 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:23:46,379-Speed 2512.27 samples/sec Loss 6.7607 LearningRate 0.000965 Epoch: 4 Global Step: 96070 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:23:54,580-Speed 2497.84 samples/sec Loss 6.8291 LearningRate 0.000965 Epoch: 4 Global Step: 96080 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:24:02,780-Speed 2498.07 samples/sec Loss 6.8131 LearningRate 0.000965 Epoch: 4 Global Step: 96090 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:24:10,982-Speed 2497.01 samples/sec Loss 6.7270 LearningRate 0.000965 Epoch: 4 Global Step: 96100 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:24:19,178-Speed 2499.25 samples/sec Loss 6.8395 LearningRate 0.000965 Epoch: 4 Global Step: 96110 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:24:27,378-Speed 2498.11 samples/sec Loss 6.6467 LearningRate 0.000965 Epoch: 4 Global Step: 96120 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:24:35,524-Speed 2514.51 samples/sec Loss 6.7993 LearningRate 0.000965 Epoch: 4 Global Step: 96130 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:24:43,725-Speed 2497.58 samples/sec Loss 6.7418 LearningRate 0.000965 Epoch: 4 Global Step: 96140 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:24:51,922-Speed 2498.80 samples/sec Loss 6.7392 LearningRate 0.000965 Epoch: 4 Global Step: 96150 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:25:00,117-Speed 2499.50 samples/sec Loss 6.7724 LearningRate 0.000965 Epoch: 4 Global Step: 96160 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:25:08,316-Speed 2498.66 samples/sec Loss 6.7032 LearningRate 0.000965 Epoch: 4 Global Step: 96170 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:25:16,513-Speed 2498.75 samples/sec Loss 6.6218 LearningRate 0.000965 Epoch: 4 Global Step: 96180 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:25:24,659-Speed 2514.72 samples/sec Loss 6.7131 LearningRate 0.000965 Epoch: 4 Global Step: 96190 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:25:32,862-Speed 2496.85 samples/sec Loss 6.7072 LearningRate 0.000965 Epoch: 4 Global Step: 96200 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:25:41,061-Speed 2498.44 samples/sec Loss 6.6723 LearningRate 0.000965 Epoch: 4 Global Step: 96210 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:25:49,258-Speed 2498.71 samples/sec Loss 6.6847 LearningRate 0.000965 Epoch: 4 Global Step: 96220 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:25:57,456-Speed 2498.58 samples/sec Loss 6.6862 LearningRate 0.000965 Epoch: 4 Global Step: 96230 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:26:05,652-Speed 2499.30 samples/sec Loss 6.6181 LearningRate 0.000965 Epoch: 4 Global Step: 96240 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:26:13,798-Speed 2514.57 samples/sec Loss 6.6268 LearningRate 0.000965 Epoch: 4 Global Step: 96250 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:26:21,996-Speed 2498.68 samples/sec Loss 6.6740 LearningRate 0.000965 Epoch: 4 Global Step: 96260 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:26:30,196-Speed 2498.02 samples/sec Loss 6.6044 LearningRate 0.000965 Epoch: 4 Global Step: 96270 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:26:38,398-Speed 2497.07 samples/sec Loss 6.6933 LearningRate 0.000965 Epoch: 4 Global Step: 96280 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:26:46,600-Speed 2500.24 samples/sec Loss 6.8439 LearningRate 0.000965 Epoch: 4 Global Step: 96290 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:26:54,798-Speed 2498.80 samples/sec Loss 6.6514 LearningRate 0.000965 Epoch: 4 Global Step: 96300 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:27:02,945-Speed 2514.01 samples/sec Loss 7.0640 LearningRate 0.000965 Epoch: 4 Global Step: 96310 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:27:11,146-Speed 2497.59 samples/sec Loss 6.9578 LearningRate 0.000965 Epoch: 4 Global Step: 96320 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:27:19,343-Speed 2498.88 samples/sec Loss 6.9446 LearningRate 0.000964 Epoch: 4 Global Step: 96330 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:27:27,542-Speed 2498.25 samples/sec Loss 6.8831 LearningRate 0.000964 Epoch: 4 Global Step: 96340 Fp16 Grad Scale: 32768 Required: 168 hours
Training: 2022-07-06 12:27:35,736-Speed 2500.11 samples/sec
Loss 6.8880 LearningRate 0.000964 Epoch: 4 Global Step: 96350 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:27:43,956-Speed 2492.08 samples/sec Loss 6.7771 LearningRate 0.000964 Epoch: 4 Global Step: 96360 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:27:52,098-Speed 2515.54 samples/sec Loss 6.7679 LearningRate 0.000964 Epoch: 4 Global Step: 96370 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:28:00,290-Speed 2500.32 samples/sec Loss 6.7702 LearningRate 0.000964 Epoch: 4 Global Step: 96380 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:28:08,484-Speed 2499.74 samples/sec Loss 6.7582 LearningRate 0.000964 Epoch: 4 Global Step: 96390 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:28:16,681-Speed 2499.02 samples/sec Loss 6.7090 LearningRate 0.000964 Epoch: 4 Global Step: 96400 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:28:24,892-Speed 2494.82 samples/sec Loss 6.6934 LearningRate 0.000964 Epoch: 4 Global Step: 96410 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:28:33,089-Speed 2498.72 samples/sec Loss 6.9290 LearningRate 0.000964 Epoch: 4 Global Step: 96420 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:28:41,232-Speed 2515.48 samples/sec Loss 6.6901 LearningRate 0.000964 Epoch: 4 Global Step: 96430 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:28:49,432-Speed 2497.80 samples/sec Loss 6.7255 LearningRate 0.000964 Epoch: 4 Global Step: 96440 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:28:57,647-Speed 2493.40 samples/sec Loss 6.7062 LearningRate 0.000964 Epoch: 4 Global Step: 96450 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:29:05,839-Speed 2500.45 samples/sec Loss 6.7749 LearningRate 0.000964 Epoch: 4 Global Step: 96460 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:29:14,037-Speed 2498.66 samples/sec Loss 6.7174 
LearningRate 0.000964 Epoch: 4 Global Step: 96470 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:29:22,240-Speed 2497.18 samples/sec Loss 6.7910 LearningRate 0.000964 Epoch: 4 Global Step: 96480 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:29:30,389-Speed 2513.26 samples/sec Loss 6.7434 LearningRate 0.000964 Epoch: 4 Global Step: 96490 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:29:38,590-Speed 2497.93 samples/sec Loss 6.6836 LearningRate 0.000964 Epoch: 4 Global Step: 96500 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:29:46,785-Speed 2499.27 samples/sec Loss 6.6965 LearningRate 0.000964 Epoch: 4 Global Step: 96510 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:29:54,980-Speed 2499.48 samples/sec Loss 6.6842 LearningRate 0.000964 Epoch: 4 Global Step: 96520 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:30:03,178-Speed 2498.81 samples/sec Loss 6.7228 LearningRate 0.000964 Epoch: 4 Global Step: 96530 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:30:11,377-Speed 2498.08 samples/sec Loss 6.7375 LearningRate 0.000964 Epoch: 4 Global Step: 96540 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:30:19,525-Speed 2514.17 samples/sec Loss 6.7226 LearningRate 0.000964 Epoch: 4 Global Step: 96550 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:30:27,724-Speed 2498.23 samples/sec Loss 6.8024 LearningRate 0.000964 Epoch: 4 Global Step: 96560 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:30:35,924-Speed 2497.75 samples/sec Loss 6.6828 LearningRate 0.000964 Epoch: 4 Global Step: 96570 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:30:44,125-Speed 2497.81 samples/sec Loss 6.6618 LearningRate 0.000964 Epoch: 4 Global Step: 96580 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:30:52,341-Speed 2493.28 samples/sec Loss 6.6947 LearningRate 
0.000964 Epoch: 4 Global Step: 96590 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:31:00,541-Speed 2497.94 samples/sec Loss 6.6663 LearningRate 0.000964 Epoch: 4 Global Step: 96600 Fp16 Grad Scale: 32768 Required: 168 hours Training: 2022-07-06 12:31:08,692-Speed 2512.91 samples/sec Loss 6.7173 LearningRate 0.000964 Epoch: 4 Global Step: 96610 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:31:16,896-Speed 2496.74 samples/sec Loss 6.6829 LearningRate 0.000964 Epoch: 4 Global Step: 96620 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:31:25,094-Speed 2499.02 samples/sec Loss 6.7265 LearningRate 0.000964 Epoch: 4 Global Step: 96630 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:31:33,293-Speed 2498.21 samples/sec Loss 6.6488 LearningRate 0.000964 Epoch: 4 Global Step: 96640 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:31:41,506-Speed 2493.94 samples/sec Loss 6.6334 LearningRate 0.000964 Epoch: 4 Global Step: 96650 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:31:49,704-Speed 2498.55 samples/sec Loss 6.5675 LearningRate 0.000964 Epoch: 4 Global Step: 96660 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:31:57,850-Speed 2514.63 samples/sec Loss 6.7790 LearningRate 0.000964 Epoch: 4 Global Step: 96670 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:32:06,047-Speed 2498.76 samples/sec Loss 6.5977 LearningRate 0.000964 Epoch: 4 Global Step: 96680 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:32:14,246-Speed 2498.41 samples/sec Loss 6.7110 LearningRate 0.000964 Epoch: 4 Global Step: 96690 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:32:22,438-Speed 2500.24 samples/sec Loss 6.6436 LearningRate 0.000964 Epoch: 4 Global Step: 96700 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:32:30,633-Speed 2499.41 samples/sec Loss 6.6870 LearningRate 0.000963 Epoch: 4 
Global Step: 96710 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:32:38,830-Speed 2498.87 samples/sec Loss 6.6621 LearningRate 0.000963 Epoch: 4 Global Step: 96720 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:32:46,975-Speed 2514.93 samples/sec Loss 6.7319 LearningRate 0.000963 Epoch: 4 Global Step: 96730 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:32:55,170-Speed 2499.73 samples/sec Loss 6.7154 LearningRate 0.000963 Epoch: 4 Global Step: 96740 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:33:03,368-Speed 2498.34 samples/sec Loss 6.7113 LearningRate 0.000963 Epoch: 4 Global Step: 96750 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:33:11,563-Speed 2499.44 samples/sec Loss 6.7220 LearningRate 0.000963 Epoch: 4 Global Step: 96760 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:33:19,757-Speed 2499.61 samples/sec Loss 6.6332 LearningRate 0.000963 Epoch: 4 Global Step: 96770 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:33:27,955-Speed 2498.61 samples/sec Loss 6.6812 LearningRate 0.000963 Epoch: 4 Global Step: 96780 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:33:36,109-Speed 2512.02 samples/sec Loss 6.7598 LearningRate 0.000963 Epoch: 4 Global Step: 96790 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:33:44,304-Speed 2499.41 samples/sec Loss 6.7057 LearningRate 0.000963 Epoch: 4 Global Step: 96800 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:33:52,501-Speed 2498.88 samples/sec Loss 6.6110 LearningRate 0.000963 Epoch: 4 Global Step: 96810 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:34:00,702-Speed 2497.54 samples/sec Loss 6.7341 LearningRate 0.000963 Epoch: 4 Global Step: 96820 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:34:08,899-Speed 2499.08 samples/sec Loss 6.6596 LearningRate 0.000963 Epoch: 4 Global Step: 96830 
Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:34:17,097-Speed 2498.43 samples/sec Loss 6.6540 LearningRate 0.000963 Epoch: 4 Global Step: 96840 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:34:25,240-Speed 2515.77 samples/sec Loss 6.6357 LearningRate 0.000963 Epoch: 4 Global Step: 96850 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:34:33,434-Speed 2499.62 samples/sec Loss 6.6803 LearningRate 0.000963 Epoch: 4 Global Step: 96860 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:34:41,632-Speed 2498.65 samples/sec Loss 6.6335 LearningRate 0.000963 Epoch: 4 Global Step: 96870 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:34:49,830-Speed 2498.66 samples/sec Loss 6.6092 LearningRate 0.000963 Epoch: 4 Global Step: 96880 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:34:58,029-Speed 2498.30 samples/sec Loss 6.7220 LearningRate 0.000963 Epoch: 4 Global Step: 96890 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:35:06,226-Speed 2498.85 samples/sec Loss 6.6888 LearningRate 0.000963 Epoch: 4 Global Step: 96900 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:35:14,377-Speed 2513.15 samples/sec Loss 6.5915 LearningRate 0.000963 Epoch: 4 Global Step: 96910 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:35:22,578-Speed 2497.59 samples/sec Loss 6.7113 LearningRate 0.000963 Epoch: 4 Global Step: 96920 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:35:30,776-Speed 2498.42 samples/sec Loss 6.7161 LearningRate 0.000963 Epoch: 4 Global Step: 96930 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:35:38,985-Speed 2495.41 samples/sec Loss 6.6927 LearningRate 0.000963 Epoch: 4 Global Step: 96940 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:35:47,186-Speed 2497.78 samples/sec Loss 6.6702 LearningRate 0.000963 Epoch: 4 Global Step: 96950 Fp16 Grad Scale: 
32768 Required: 167 hours Training: 2022-07-06 12:35:55,388-Speed 2497.42 samples/sec Loss 6.6766 LearningRate 0.000963 Epoch: 4 Global Step: 96960 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:36:03,536-Speed 2513.75 samples/sec Loss 6.6948 LearningRate 0.000963 Epoch: 4 Global Step: 96970 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:36:11,734-Speed 2498.69 samples/sec Loss 6.6558 LearningRate 0.000963 Epoch: 4 Global Step: 96980 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:36:19,930-Speed 2499.29 samples/sec Loss 6.5945 LearningRate 0.000963 Epoch: 4 Global Step: 96990 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:36:28,128-Speed 2498.50 samples/sec Loss 6.5037 LearningRate 0.000963 Epoch: 4 Global Step: 97000 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:36:36,329-Speed 2498.02 samples/sec Loss 6.6261 LearningRate 0.000963 Epoch: 4 Global Step: 97010 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:36:44,529-Speed 2497.95 samples/sec Loss 6.7201 LearningRate 0.000963 Epoch: 4 Global Step: 97020 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:36:52,671-Speed 2515.69 samples/sec Loss 6.7306 LearningRate 0.000963 Epoch: 4 Global Step: 97030 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:37:00,867-Speed 2499.04 samples/sec Loss 6.6970 LearningRate 0.000963 Epoch: 4 Global Step: 97040 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:37:09,069-Speed 2497.56 samples/sec Loss 6.6553 LearningRate 0.000963 Epoch: 4 Global Step: 97050 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:37:17,266-Speed 2498.80 samples/sec Loss 6.7269 LearningRate 0.000963 Epoch: 4 Global Step: 97060 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:37:25,465-Speed 2498.45 samples/sec Loss 6.7764 LearningRate 0.000963 Epoch: 4 Global Step: 97070 Fp16 Grad Scale: 32768 Required: 167 
hours Training: 2022-07-06 12:37:33,666-Speed 2497.77 samples/sec Loss 6.6863 LearningRate 0.000963 Epoch: 4 Global Step: 97080 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:37:41,811-Speed 2514.91 samples/sec Loss 6.6313 LearningRate 0.000962 Epoch: 4 Global Step: 97090 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:37:50,010-Speed 2498.20 samples/sec Loss 6.6592 LearningRate 0.000962 Epoch: 4 Global Step: 97100 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:37:58,208-Speed 2498.88 samples/sec Loss 6.5920 LearningRate 0.000962 Epoch: 4 Global Step: 97110 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:38:06,405-Speed 2498.78 samples/sec Loss 6.6791 LearningRate 0.000962 Epoch: 4 Global Step: 97120 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:38:14,602-Speed 2498.92 samples/sec Loss 6.6782 LearningRate 0.000962 Epoch: 4 Global Step: 97130 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:38:22,803-Speed 2497.78 samples/sec Loss 6.6269 LearningRate 0.000962 Epoch: 4 Global Step: 97140 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:38:30,948-Speed 2514.65 samples/sec Loss 6.6976 LearningRate 0.000962 Epoch: 4 Global Step: 97150 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:38:39,149-Speed 2497.85 samples/sec Loss 6.6927 LearningRate 0.000962 Epoch: 4 Global Step: 97160 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:38:47,347-Speed 2498.29 samples/sec Loss 6.6207 LearningRate 0.000962 Epoch: 4 Global Step: 97170 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:38:55,559-Speed 2494.35 samples/sec Loss 6.6379 LearningRate 0.000962 Epoch: 4 Global Step: 97180 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:39:03,759-Speed 2498.18 samples/sec Loss 6.5916 LearningRate 0.000962 Epoch: 4 Global Step: 97190 Fp16 Grad Scale: 32768 Required: 167 hours Training: 
2022-07-06 12:39:11,963-Speed 2496.41 samples/sec Loss 6.6470 LearningRate 0.000962 Epoch: 4 Global Step: 97200 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:39:20,116-Speed 2512.39 samples/sec Loss 6.6776 LearningRate 0.000962 Epoch: 4 Global Step: 97210 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:39:28,322-Speed 2496.28 samples/sec Loss 6.6341 LearningRate 0.000962 Epoch: 4 Global Step: 97220 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:39:36,528-Speed 2496.27 samples/sec Loss 6.5426 LearningRate 0.000962 Epoch: 4 Global Step: 97230 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:39:44,728-Speed 2497.86 samples/sec Loss 6.5414 LearningRate 0.000962 Epoch: 4 Global Step: 97240 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:39:52,928-Speed 2498.04 samples/sec Loss 6.5768 LearningRate 0.000962 Epoch: 4 Global Step: 97250 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:40:01,129-Speed 2497.66 samples/sec Loss 6.6260 LearningRate 0.000962 Epoch: 4 Global Step: 97260 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:40:09,272-Speed 2515.34 samples/sec Loss 6.6352 LearningRate 0.000962 Epoch: 4 Global Step: 97270 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:40:17,472-Speed 2497.89 samples/sec Loss 6.5138 LearningRate 0.000962 Epoch: 4 Global Step: 97280 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:40:25,675-Speed 2498.01 samples/sec Loss 6.5454 LearningRate 0.000962 Epoch: 4 Global Step: 97290 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:40:33,873-Speed 2498.51 samples/sec Loss 6.5667 LearningRate 0.000962 Epoch: 4 Global Step: 97300 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:40:42,074-Speed 2497.60 samples/sec Loss 6.5599 LearningRate 0.000962 Epoch: 4 Global Step: 97310 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 
12:40:50,273-Speed 2498.22 samples/sec Loss 6.6478 LearningRate 0.000962 Epoch: 4 Global Step: 97320 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:40:58,425-Speed 2512.76 samples/sec Loss 6.6413 LearningRate 0.000962 Epoch: 4 Global Step: 97330 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:41:06,635-Speed 2494.70 samples/sec Loss 6.6065 LearningRate 0.000962 Epoch: 4 Global Step: 97340 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:41:14,835-Speed 2498.03 samples/sec Loss 6.5550 LearningRate 0.000962 Epoch: 4 Global Step: 97350 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:41:23,046-Speed 2494.96 samples/sec Loss 6.5401 LearningRate 0.000962 Epoch: 4 Global Step: 97360 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:41:31,245-Speed 2498.35 samples/sec Loss 6.5572 LearningRate 0.000962 Epoch: 4 Global Step: 97370 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:41:39,442-Speed 2498.59 samples/sec Loss 6.6563 LearningRate 0.000962 Epoch: 4 Global Step: 97380 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:41:47,591-Speed 2513.84 samples/sec Loss 6.7976 LearningRate 0.000962 Epoch: 4 Global Step: 97390 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:41:55,789-Speed 2498.44 samples/sec Loss 6.5624 LearningRate 0.000962 Epoch: 4 Global Step: 97400 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:42:03,989-Speed 2498.04 samples/sec Loss 6.6397 LearningRate 0.000962 Epoch: 4 Global Step: 97410 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:42:12,188-Speed 2498.34 samples/sec Loss 6.6758 LearningRate 0.000962 Epoch: 4 Global Step: 97420 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:42:20,387-Speed 2498.23 samples/sec Loss 6.5844 LearningRate 0.000962 Epoch: 4 Global Step: 97430 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:42:28,594-Speed 
2495.82 samples/sec Loss 6.7209 LearningRate 0.000962 Epoch: 4 Global Step: 97440 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:42:36,739-Speed 2514.75 samples/sec Loss 6.5522 LearningRate 0.000962 Epoch: 4 Global Step: 97450 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:42:44,949-Speed 2494.83 samples/sec Loss 6.7627 LearningRate 0.000962 Epoch: 4 Global Step: 97460 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:42:53,146-Speed 2498.98 samples/sec Loss 6.6024 LearningRate 0.000962 Epoch: 4 Global Step: 97470 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:43:01,345-Speed 2498.34 samples/sec Loss 6.6534 LearningRate 0.000961 Epoch: 4 Global Step: 97480 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:43:09,542-Speed 2498.71 samples/sec Loss 6.5450 LearningRate 0.000961 Epoch: 4 Global Step: 97490 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:43:17,740-Speed 2498.62 samples/sec Loss 6.6701 LearningRate 0.000961 Epoch: 4 Global Step: 97500 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:43:25,893-Speed 2512.32 samples/sec Loss 6.6552 LearningRate 0.000961 Epoch: 4 Global Step: 97510 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:43:34,091-Speed 2498.75 samples/sec Loss 6.6134 LearningRate 0.000961 Epoch: 4 Global Step: 97520 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:43:42,285-Speed 2500.12 samples/sec Loss 6.5802 LearningRate 0.000961 Epoch: 4 Global Step: 97530 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:43:50,485-Speed 2498.11 samples/sec Loss 6.5809 LearningRate 0.000961 Epoch: 4 Global Step: 97540 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:43:58,681-Speed 2499.04 samples/sec Loss 6.6157 LearningRate 0.000961 Epoch: 4 Global Step: 97550 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:44:06,879-Speed 2498.68 samples/sec 
Loss 6.5541 LearningRate 0.000961 Epoch: 4 Global Step: 97560 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:44:15,028-Speed 2513.39 samples/sec Loss 6.6431 LearningRate 0.000961 Epoch: 4 Global Step: 97570 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:44:23,226-Speed 2498.65 samples/sec Loss 6.6577 LearningRate 0.000961 Epoch: 4 Global Step: 97580 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:44:31,425-Speed 2498.31 samples/sec Loss 6.5762 LearningRate 0.000961 Epoch: 4 Global Step: 97590 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:44:39,627-Speed 2497.33 samples/sec Loss 6.5535 LearningRate 0.000961 Epoch: 4 Global Step: 97600 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:44:47,831-Speed 2496.73 samples/sec Loss 6.6778 LearningRate 0.000961 Epoch: 4 Global Step: 97610 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:44:56,033-Speed 2497.30 samples/sec Loss 6.6564 LearningRate 0.000961 Epoch: 4 Global Step: 97620 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:45:04,182-Speed 2513.74 samples/sec Loss 6.5956 LearningRate 0.000961 Epoch: 4 Global Step: 97630 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:45:12,382-Speed 2497.96 samples/sec Loss 6.6883 LearningRate 0.000961 Epoch: 4 Global Step: 97640 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:45:20,597-Speed 2493.37 samples/sec Loss 6.7242 LearningRate 0.000961 Epoch: 4 Global Step: 97650 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:45:28,796-Speed 2498.61 samples/sec Loss 6.6226 LearningRate 0.000961 Epoch: 4 Global Step: 97660 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:45:36,995-Speed 2498.14 samples/sec Loss 6.6237 LearningRate 0.000961 Epoch: 4 Global Step: 97670 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:45:45,193-Speed 2498.61 samples/sec Loss 6.6053 
LearningRate 0.000961 Epoch: 4 Global Step: 97680 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:45:53,336-Speed 2515.26 samples/sec Loss 6.7322 LearningRate 0.000961 Epoch: 4 Global Step: 97690 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 12:46:01,490-Speed 2512.29 samples/sec Loss 6.5992 LearningRate 0.000961 Epoch: 4 Global Step: 97700 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:46:09,686-Speed 2499.14 samples/sec Loss 6.5893 LearningRate 0.000961 Epoch: 4 Global Step: 97710 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:46:17,890-Speed 2496.80 samples/sec Loss 6.5510 LearningRate 0.000961 Epoch: 4 Global Step: 97720 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:46:26,087-Speed 2498.80 samples/sec Loss 6.6196 LearningRate 0.000961 Epoch: 4 Global Step: 97730 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:46:34,286-Speed 2498.35 samples/sec Loss 6.6412 LearningRate 0.000961 Epoch: 4 Global Step: 97740 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:46:42,438-Speed 2512.66 samples/sec Loss 6.5727 LearningRate 0.000961 Epoch: 4 Global Step: 97750 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:46:50,637-Speed 2498.29 samples/sec Loss 6.6292 LearningRate 0.000961 Epoch: 4 Global Step: 97760 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:46:58,836-Speed 2498.27 samples/sec Loss 6.5252 LearningRate 0.000961 Epoch: 4 Global Step: 97770 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:47:07,037-Speed 2497.80 samples/sec Loss 6.6238 LearningRate 0.000961 Epoch: 4 Global Step: 97780 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:47:15,238-Speed 2497.76 samples/sec Loss 6.5515 LearningRate 0.000961 Epoch: 4 Global Step: 97790 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:47:23,442-Speed 2496.91 samples/sec Loss 6.5152 LearningRate 
0.000961 Epoch: 4 Global Step: 97800 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:47:31,596-Speed 2512.10 samples/sec Loss 6.7439 LearningRate 0.000961 Epoch: 4 Global Step: 97810 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:47:39,796-Speed 2498.01 samples/sec Loss 6.6890 LearningRate 0.000961 Epoch: 4 Global Step: 97820 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:47:47,998-Speed 2497.63 samples/sec Loss 6.6741 LearningRate 0.000961 Epoch: 4 Global Step: 97830 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:47:56,197-Speed 2497.99 samples/sec Loss 6.6841 LearningRate 0.000961 Epoch: 4 Global Step: 97840 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:48:04,397-Speed 2497.98 samples/sec Loss 6.6279 LearningRate 0.000961 Epoch: 4 Global Step: 97850 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:48:12,598-Speed 2497.84 samples/sec Loss 6.6667 LearningRate 0.000960 Epoch: 4 Global Step: 97860 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:48:20,744-Speed 2514.38 samples/sec Loss 6.6691 LearningRate 0.000960 Epoch: 4 Global Step: 97870 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:48:28,941-Speed 2498.73 samples/sec Loss 6.6188 LearningRate 0.000960 Epoch: 4 Global Step: 97880 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:48:37,138-Speed 2498.80 samples/sec Loss 6.5629 LearningRate 0.000960 Epoch: 4 Global Step: 97890 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:48:45,336-Speed 2498.62 samples/sec Loss 6.5531 LearningRate 0.000960 Epoch: 4 Global Step: 97900 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:48:53,536-Speed 2498.03 samples/sec Loss 6.6687 LearningRate 0.000960 Epoch: 4 Global Step: 97910 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:49:01,819-Speed 2472.71 samples/sec Loss 6.6368 LearningRate 0.000960 Epoch: 4 
Global Step: 97920 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:49:09,963-Speed 2515.18 samples/sec Loss 6.5938 LearningRate 0.000960 Epoch: 4 Global Step: 97930 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:49:18,158-Speed 2499.58 samples/sec Loss 6.6690 LearningRate 0.000960 Epoch: 4 Global Step: 97940 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:49:26,354-Speed 2499.31 samples/sec Loss 6.6604 LearningRate 0.000960 Epoch: 4 Global Step: 97950 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:49:34,550-Speed 2499.35 samples/sec Loss 6.5398 LearningRate 0.000960 Epoch: 4 Global Step: 97960 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:49:42,745-Speed 2499.63 samples/sec Loss 6.6535 LearningRate 0.000960 Epoch: 4 Global Step: 97970 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:49:50,942-Speed 2498.74 samples/sec Loss 6.5434 LearningRate 0.000960 Epoch: 4 Global Step: 97980 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:49:59,083-Speed 2515.85 samples/sec Loss 6.4993 LearningRate 0.000960 Epoch: 4 Global Step: 97990 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:50:07,292-Speed 2495.87 samples/sec Loss 6.5156 LearningRate 0.000960 Epoch: 4 Global Step: 98000 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:50:15,486-Speed 2499.76 samples/sec Loss 6.5684 LearningRate 0.000960 Epoch: 4 Global Step: 98010 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:50:23,681-Speed 2499.45 samples/sec Loss 6.5844 LearningRate 0.000960 Epoch: 4 Global Step: 98020 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:50:31,878-Speed 2498.99 samples/sec Loss 6.5211 LearningRate 0.000960 Epoch: 4 Global Step: 98030 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:50:40,074-Speed 2499.17 samples/sec Loss 6.5650 LearningRate 0.000960 Epoch: 4 Global Step: 98040 
Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:50:48,217-Speed 2515.62 samples/sec Loss 6.5785 LearningRate 0.000960 Epoch: 4 Global Step: 98050 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:50:56,415-Speed 2498.29 samples/sec Loss 6.6551 LearningRate 0.000960 Epoch: 4 Global Step: 98060 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:51:04,617-Speed 2497.28 samples/sec Loss 6.5389 LearningRate 0.000960 Epoch: 4 Global Step: 98070 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:51:12,811-Speed 2499.91 samples/sec Loss 6.5497 LearningRate 0.000960 Epoch: 4 Global Step: 98080 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:51:21,020-Speed 2495.15 samples/sec Loss 6.5203 LearningRate 0.000960 Epoch: 4 Global Step: 98090 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:51:29,218-Speed 2498.55 samples/sec Loss 6.3984 LearningRate 0.000960 Epoch: 4 Global Step: 98100 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:51:37,361-Speed 2515.47 samples/sec Loss 6.5929 LearningRate 0.000960 Epoch: 4 Global Step: 98110 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:51:45,559-Speed 2498.44 samples/sec Loss 6.4856 LearningRate 0.000960 Epoch: 4 Global Step: 98120 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:51:53,753-Speed 2499.85 samples/sec Loss 6.5961 LearningRate 0.000960 Epoch: 4 Global Step: 98130 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:52:01,952-Speed 2498.23 samples/sec Loss 6.5275 LearningRate 0.000960 Epoch: 4 Global Step: 98140 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:52:10,147-Speed 2499.69 samples/sec Loss 6.6530 LearningRate 0.000960 Epoch: 4 Global Step: 98150 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:52:18,353-Speed 2496.21 samples/sec Loss 6.5681 LearningRate 0.000960 Epoch: 4 Global Step: 98160 Fp16 Grad Scale: 
32768 Required: 167 hours Training: 2022-07-06 12:52:26,495-Speed 2515.78 samples/sec Loss 6.6043 LearningRate 0.000960 Epoch: 4 Global Step: 98170 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:52:34,689-Speed 2500.22 samples/sec Loss 6.5255 LearningRate 0.000960 Epoch: 4 Global Step: 98180 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:52:42,883-Speed 2499.51 samples/sec Loss 6.5390 LearningRate 0.000960 Epoch: 4 Global Step: 98190 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:52:51,096-Speed 2493.95 samples/sec Loss 6.6417 LearningRate 0.000960 Epoch: 4 Global Step: 98200 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:52:59,292-Speed 2499.19 samples/sec Loss 6.6019 LearningRate 0.000960 Epoch: 4 Global Step: 98210 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:53:07,498-Speed 2496.27 samples/sec Loss 6.6644 LearningRate 0.000960 Epoch: 4 Global Step: 98220 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:53:15,641-Speed 2515.51 samples/sec Loss 6.5358 LearningRate 0.000960 Epoch: 4 Global Step: 98230 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:53:23,837-Speed 2499.09 samples/sec Loss 6.5824 LearningRate 0.000959 Epoch: 4 Global Step: 98240 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:53:32,038-Speed 2497.71 samples/sec Loss 6.6538 LearningRate 0.000959 Epoch: 4 Global Step: 98250 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:53:40,240-Speed 2497.34 samples/sec Loss 6.5537 LearningRate 0.000959 Epoch: 4 Global Step: 98260 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:53:48,439-Speed 2498.26 samples/sec Loss 6.5465 LearningRate 0.000959 Epoch: 4 Global Step: 98270 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:53:56,641-Speed 2497.23 samples/sec Loss 6.5548 LearningRate 0.000959 Epoch: 4 Global Step: 98280 Fp16 Grad Scale: 32768 Required: 167 
hours Training: 2022-07-06 12:54:04,787-Speed 2514.66 samples/sec Loss 6.4826 LearningRate 0.000959 Epoch: 4 Global Step: 98290 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:54:12,986-Speed 2498.27 samples/sec Loss 6.5654 LearningRate 0.000959 Epoch: 4 Global Step: 98300 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:54:21,186-Speed 2497.84 samples/sec Loss 6.5779 LearningRate 0.000959 Epoch: 4 Global Step: 98310 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:54:29,386-Speed 2498.11 samples/sec Loss 6.5196 LearningRate 0.000959 Epoch: 4 Global Step: 98320 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:54:37,588-Speed 2497.34 samples/sec Loss 6.6632 LearningRate 0.000959 Epoch: 4 Global Step: 98330 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:54:45,785-Speed 2499.14 samples/sec Loss 6.5100 LearningRate 0.000959 Epoch: 4 Global Step: 98340 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:54:53,931-Speed 2514.73 samples/sec Loss 6.4985 LearningRate 0.000959 Epoch: 4 Global Step: 98350 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:55:02,127-Speed 2499.09 samples/sec Loss 6.6145 LearningRate 0.000959 Epoch: 4 Global Step: 98360 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:55:10,335-Speed 2495.66 samples/sec Loss 6.6297 LearningRate 0.000959 Epoch: 4 Global Step: 98370 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:55:18,542-Speed 2495.71 samples/sec Loss 6.6145 LearningRate 0.000959 Epoch: 4 Global Step: 98380 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:55:26,740-Speed 2498.49 samples/sec Loss 6.6848 LearningRate 0.000959 Epoch: 4 Global Step: 98390 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:55:34,936-Speed 2499.60 samples/sec Loss 6.5716 LearningRate 0.000959 Epoch: 4 Global Step: 98400 Fp16 Grad Scale: 32768 Required: 167 hours Training: 
2022-07-06 12:55:43,082-Speed 2514.32 samples/sec Loss 6.6571 LearningRate 0.000959 Epoch: 4 Global Step: 98410 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:55:51,281-Speed 2498.37 samples/sec Loss 6.6188 LearningRate 0.000959 Epoch: 4 Global Step: 98420 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:55:59,478-Speed 2498.65 samples/sec Loss 6.6277 LearningRate 0.000959 Epoch: 4 Global Step: 98430 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:56:07,679-Speed 2497.77 samples/sec Loss 6.5510 LearningRate 0.000959 Epoch: 4 Global Step: 98440 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:56:15,880-Speed 2497.74 samples/sec Loss 6.5589 LearningRate 0.000959 Epoch: 4 Global Step: 98450 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:56:24,084-Speed 2496.61 samples/sec Loss 6.5524 LearningRate 0.000959 Epoch: 4 Global Step: 98460 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:56:32,235-Speed 2513.14 samples/sec Loss 6.5716 LearningRate 0.000959 Epoch: 4 Global Step: 98470 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:56:40,437-Speed 2497.34 samples/sec Loss 6.6227 LearningRate 0.000959 Epoch: 4 Global Step: 98480 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:56:48,649-Speed 2494.32 samples/sec Loss 6.6906 LearningRate 0.000959 Epoch: 4 Global Step: 98490 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:56:56,852-Speed 2497.08 samples/sec Loss 6.5970 LearningRate 0.000959 Epoch: 4 Global Step: 98500 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:57:05,052-Speed 2497.88 samples/sec Loss 6.5440 LearningRate 0.000959 Epoch: 4 Global Step: 98510 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:57:13,261-Speed 2495.03 samples/sec Loss 6.6327 LearningRate 0.000959 Epoch: 4 Global Step: 98520 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 
12:57:21,409-Speed 2513.78 samples/sec Loss 6.5008 LearningRate 0.000959 Epoch: 4 Global Step: 98530 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:57:29,609-Speed 2498.49 samples/sec Loss 6.5832 LearningRate 0.000959 Epoch: 4 Global Step: 98540 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:57:37,824-Speed 2493.48 samples/sec Loss 6.5628 LearningRate 0.000959 Epoch: 4 Global Step: 98550 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:57:46,024-Speed 2497.97 samples/sec Loss 6.5588 LearningRate 0.000959 Epoch: 4 Global Step: 98560 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:57:54,223-Speed 2498.36 samples/sec Loss 6.5453 LearningRate 0.000959 Epoch: 4 Global Step: 98570 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:58:02,420-Speed 2498.92 samples/sec Loss 6.5046 LearningRate 0.000959 Epoch: 4 Global Step: 98580 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:58:10,563-Speed 2515.78 samples/sec Loss 6.4650 LearningRate 0.000959 Epoch: 4 Global Step: 98590 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:58:18,762-Speed 2498.21 samples/sec Loss 6.4484 LearningRate 0.000959 Epoch: 4 Global Step: 98600 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:58:26,958-Speed 2499.19 samples/sec Loss 6.5296 LearningRate 0.000959 Epoch: 4 Global Step: 98610 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:58:35,152-Speed 2499.70 samples/sec Loss 6.4405 LearningRate 0.000958 Epoch: 4 Global Step: 98620 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:58:43,351-Speed 2498.69 samples/sec Loss 6.5277 LearningRate 0.000958 Epoch: 4 Global Step: 98630 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:58:51,544-Speed 2499.97 samples/sec Loss 6.4557 LearningRate 0.000958 Epoch: 4 Global Step: 98640 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:58:59,689-Speed 
2514.91 samples/sec Loss 6.5109 LearningRate 0.000958 Epoch: 4 Global Step: 98650 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:59:07,891-Speed 2497.66 samples/sec Loss 6.4775 LearningRate 0.000958 Epoch: 4 Global Step: 98660 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:59:16,088-Speed 2499.04 samples/sec Loss 6.4655 LearningRate 0.000958 Epoch: 4 Global Step: 98670 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:59:24,289-Speed 2497.57 samples/sec Loss 6.5086 LearningRate 0.000958 Epoch: 4 Global Step: 98680 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:59:32,492-Speed 2497.23 samples/sec Loss 6.5261 LearningRate 0.000958 Epoch: 4 Global Step: 98690 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:59:40,691-Speed 2499.01 samples/sec Loss 6.5412 LearningRate 0.000958 Epoch: 4 Global Step: 98700 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:59:48,840-Speed 2514.18 samples/sec Loss 6.4629 LearningRate 0.000958 Epoch: 4 Global Step: 98710 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 12:59:57,037-Speed 2498.63 samples/sec Loss 6.5409 LearningRate 0.000958 Epoch: 4 Global Step: 98720 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:00:05,239-Speed 2497.17 samples/sec Loss 6.4973 LearningRate 0.000958 Epoch: 4 Global Step: 98730 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:00:13,439-Speed 2498.85 samples/sec Loss 6.5481 LearningRate 0.000958 Epoch: 4 Global Step: 98740 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:00:21,634-Speed 2499.61 samples/sec Loss 6.5107 LearningRate 0.000958 Epoch: 4 Global Step: 98750 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:00:29,840-Speed 2496.23 samples/sec Loss 6.4794 LearningRate 0.000958 Epoch: 4 Global Step: 98760 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:00:37,984-Speed 2515.09 samples/sec 
Loss 6.4472 LearningRate 0.000958 Epoch: 4 Global Step: 98770 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:00:46,182-Speed 2498.82 samples/sec Loss 6.4471 LearningRate 0.000958 Epoch: 4 Global Step: 98780 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:00:54,381-Speed 2498.24 samples/sec Loss 6.5379 LearningRate 0.000958 Epoch: 4 Global Step: 98790 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:01:02,575-Speed 2499.68 samples/sec Loss 6.5300 LearningRate 0.000958 Epoch: 4 Global Step: 98800 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:01:10,771-Speed 2499.16 samples/sec Loss 6.4483 LearningRate 0.000958 Epoch: 4 Global Step: 98810 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:01:18,970-Speed 2498.44 samples/sec Loss 6.5606 LearningRate 0.000958 Epoch: 4 Global Step: 98820 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:01:27,115-Speed 2514.79 samples/sec Loss 6.5411 LearningRate 0.000958 Epoch: 4 Global Step: 98830 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:01:35,313-Speed 2498.70 samples/sec Loss 6.5144 LearningRate 0.000958 Epoch: 4 Global Step: 98840 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:01:43,517-Speed 2496.49 samples/sec Loss 6.4484 LearningRate 0.000958 Epoch: 4 Global Step: 98850 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:01:51,712-Speed 2499.60 samples/sec Loss 6.4898 LearningRate 0.000958 Epoch: 4 Global Step: 98860 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:01:59,906-Speed 2499.75 samples/sec Loss 6.5351 LearningRate 0.000958 Epoch: 4 Global Step: 98870 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:02:08,113-Speed 2495.77 samples/sec Loss 6.4951 LearningRate 0.000958 Epoch: 4 Global Step: 98880 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:02:16,257-Speed 2515.25 samples/sec Loss 6.5377 
LearningRate 0.000958 Epoch: 4 Global Step: 98890 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:02:24,454-Speed 2499.11 samples/sec Loss 6.4626 LearningRate 0.000958 Epoch: 4 Global Step: 98900 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:02:32,669-Speed 2493.33 samples/sec Loss 6.5696 LearningRate 0.000958 Epoch: 4 Global Step: 98910 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:02:40,870-Speed 2497.83 samples/sec Loss 6.6097 LearningRate 0.000958 Epoch: 4 Global Step: 98920 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:02:49,072-Speed 2497.20 samples/sec Loss 6.4687 LearningRate 0.000958 Epoch: 4 Global Step: 98930 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:02:57,273-Speed 2497.81 samples/sec Loss 6.5231 LearningRate 0.000958 Epoch: 4 Global Step: 98940 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:03:05,422-Speed 2513.40 samples/sec Loss 6.5478 LearningRate 0.000958 Epoch: 4 Global Step: 98950 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:03:13,627-Speed 2496.50 samples/sec Loss 6.5160 LearningRate 0.000958 Epoch: 4 Global Step: 98960 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:03:21,831-Speed 2497.19 samples/sec Loss 6.4731 LearningRate 0.000958 Epoch: 4 Global Step: 98970 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:03:30,034-Speed 2497.19 samples/sec Loss 6.3649 LearningRate 0.000958 Epoch: 4 Global Step: 98980 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:03:38,235-Speed 2497.65 samples/sec Loss 6.5577 LearningRate 0.000958 Epoch: 4 Global Step: 98990 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:03:46,436-Speed 2497.80 samples/sec Loss 6.4693 LearningRate 0.000957 Epoch: 4 Global Step: 99000 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:03:54,581-Speed 2514.70 samples/sec Loss 6.4887 LearningRate 
0.000957 Epoch: 4 Global Step: 99010 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:04:02,782-Speed 2497.70 samples/sec Loss 6.5206 LearningRate 0.000957 Epoch: 4 Global Step: 99020 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:04:10,986-Speed 2496.66 samples/sec Loss 6.4881 LearningRate 0.000957 Epoch: 4 Global Step: 99030 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:04:19,187-Speed 2497.85 samples/sec Loss 6.3884 LearningRate 0.000957 Epoch: 4 Global Step: 99040 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:04:27,383-Speed 2499.36 samples/sec Loss 6.4753 LearningRate 0.000957 Epoch: 4 Global Step: 99050 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:04:35,585-Speed 2497.30 samples/sec Loss 6.4779 LearningRate 0.000957 Epoch: 4 Global Step: 99060 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:04:43,745-Speed 2510.31 samples/sec Loss 6.4582 LearningRate 0.000957 Epoch: 4 Global Step: 99070 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:04:51,943-Speed 2498.71 samples/sec Loss 6.4393 LearningRate 0.000957 Epoch: 4 Global Step: 99080 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:05:00,142-Speed 2498.13 samples/sec Loss 6.5149 LearningRate 0.000957 Epoch: 4 Global Step: 99090 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:05:08,341-Speed 2498.46 samples/sec Loss 6.5426 LearningRate 0.000957 Epoch: 4 Global Step: 99100 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:05:16,542-Speed 2497.57 samples/sec Loss 6.6396 LearningRate 0.000957 Epoch: 4 Global Step: 99110 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:05:24,741-Speed 2498.17 samples/sec Loss 6.6294 LearningRate 0.000957 Epoch: 4 Global Step: 99120 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:05:32,890-Speed 2513.72 samples/sec Loss 6.5508 LearningRate 0.000957 Epoch: 4 
Global Step: 99130 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:05:41,089-Speed 2498.21 samples/sec Loss 6.5692 LearningRate 0.000957 Epoch: 4 Global Step: 99140 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:05:49,289-Speed 2497.80 samples/sec Loss 6.5662 LearningRate 0.000957 Epoch: 4 Global Step: 99150 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:05:57,489-Speed 2498.08 samples/sec Loss 6.5277 LearningRate 0.000957 Epoch: 4 Global Step: 99160 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:06:05,697-Speed 2495.43 samples/sec Loss 6.5110 LearningRate 0.000957 Epoch: 4 Global Step: 99170 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:06:13,898-Speed 2497.58 samples/sec Loss 6.6488 LearningRate 0.000957 Epoch: 4 Global Step: 99180 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:06:22,046-Speed 2513.94 samples/sec Loss 6.6013 LearningRate 0.000957 Epoch: 4 Global Step: 99190 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:06:30,246-Speed 2498.06 samples/sec Loss 6.5353 LearningRate 0.000957 Epoch: 4 Global Step: 99200 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:06:38,447-Speed 2497.89 samples/sec Loss 6.5793 LearningRate 0.000957 Epoch: 4 Global Step: 99210 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:06:46,649-Speed 2497.40 samples/sec Loss 6.5488 LearningRate 0.000957 Epoch: 4 Global Step: 99220 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:06:54,848-Speed 2498.30 samples/sec Loss 6.5130 LearningRate 0.000957 Epoch: 4 Global Step: 99230 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:07:03,044-Speed 2499.30 samples/sec Loss 6.5771 LearningRate 0.000957 Epoch: 4 Global Step: 99240 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:07:11,190-Speed 2514.60 samples/sec Loss 6.5868 LearningRate 0.000957 Epoch: 4 Global Step: 99250 
Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:07:19,400-Speed 2494.80 samples/sec Loss 6.5698 LearningRate 0.000957 Epoch: 4 Global Step: 99260 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:07:27,602-Speed 2497.79 samples/sec Loss 6.5270 LearningRate 0.000957 Epoch: 4 Global Step: 99270 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:07:35,801-Speed 2498.12 samples/sec Loss 6.4168 LearningRate 0.000957 Epoch: 4 Global Step: 99280 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:07:44,001-Speed 2497.86 samples/sec Loss 6.5223 LearningRate 0.000957 Epoch: 4 Global Step: 99290 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:07:52,206-Speed 2496.58 samples/sec Loss 6.5111 LearningRate 0.000957 Epoch: 4 Global Step: 99300 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:08:04,907-Speed 2515.58 samples/sec Loss 6.4408 LearningRate 0.000957 Epoch: 4 Global Step: 99310 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:08:13,136-Speed 2497.58 samples/sec Loss 6.4701 LearningRate 0.000957 Epoch: 4 Global Step: 99320 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:08:21,344-Speed 2495.37 samples/sec Loss 6.4093 LearningRate 0.000957 Epoch: 4 Global Step: 99330 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:08:41,553-Speed 1015.97 samples/sec Loss 6.4582 LearningRate 0.000957 Epoch: 4 Global Step: 99340 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:08:49,749-Speed 2499.02 samples/sec Loss 6.5270 LearningRate 0.000957 Epoch: 4 Global Step: 99350 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:09:02,636-Speed 1591.92 samples/sec Loss 6.4513 LearningRate 0.000957 Epoch: 4 Global Step: 99360 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:09:11,226-Speed 2513.50 samples/sec Loss 6.4843 LearningRate 0.000957 Epoch: 4 Global Step: 99370 Fp16 Grad Scale: 
65536 Required: 167 hours Training: 2022-07-06 13:09:19,433-Speed 2495.82 samples/sec Loss 6.4592 LearningRate 0.000956 Epoch: 4 Global Step: 99380 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:09:27,644-Speed 2494.42 samples/sec Loss 6.4468 LearningRate 0.000956 Epoch: 4 Global Step: 99390 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:09:37,897-Speed 2004.23 samples/sec Loss 6.4901 LearningRate 0.000956 Epoch: 4 Global Step: 99400 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:09:46,249-Speed 2495.92 samples/sec Loss 6.3953 LearningRate 0.000956 Epoch: 4 Global Step: 99410 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:09:54,471-Speed 2491.18 samples/sec Loss 6.4793 LearningRate 0.000956 Epoch: 4 Global Step: 99420 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:10:02,639-Speed 2507.64 samples/sec Loss 6.5160 LearningRate 0.000956 Epoch: 4 Global Step: 99430 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:10:15,623-Speed 1582.35 samples/sec Loss 6.4266 LearningRate 0.000956 Epoch: 4 Global Step: 99440 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:10:23,863-Speed 2489.52 samples/sec Loss 6.5016 LearningRate 0.000956 Epoch: 4 Global Step: 99450 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:10:32,097-Speed 2487.73 samples/sec Loss 6.5892 LearningRate 0.000956 Epoch: 4 Global Step: 99460 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:10:44,197-Speed 1702.84 samples/sec Loss 6.5263 LearningRate 0.000956 Epoch: 4 Global Step: 99470 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:10:52,778-Speed 2493.40 samples/sec Loss 6.5196 LearningRate 0.000956 Epoch: 4 Global Step: 99480 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:11:00,942-Speed 2509.05 samples/sec Loss 6.6030 LearningRate 0.000956 Epoch: 4 Global Step: 99490 Fp16 Grad Scale: 65536 Required: 167 
hours Training: 2022-07-06 13:11:09,218-Speed 2492.28 samples/sec Loss 6.5010 LearningRate 0.000956 Epoch: 4 Global Step: 99500 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:11:20,469-Speed 1989.39 samples/sec Loss 6.4681 LearningRate 0.000956 Epoch: 4 Global Step: 99510 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:11:28,762-Speed 2493.17 samples/sec Loss 6.5095 LearningRate 0.000956 Epoch: 4 Global Step: 99520 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:11:36,975-Speed 2493.76 samples/sec Loss 6.5305 LearningRate 0.000956 Epoch: 4 Global Step: 99530 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:11:48,834-Speed 1751.72 samples/sec Loss 6.4700 LearningRate 0.000956 Epoch: 4 Global Step: 99540 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:11:57,032-Speed 2512.98 samples/sec Loss 6.4201 LearningRate 0.000956 Epoch: 4 Global Step: 99550 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:12:05,260-Speed 2489.37 samples/sec Loss 6.4522 LearningRate 0.000956 Epoch: 4 Global Step: 99560 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:12:13,917-Speed 2496.63 samples/sec Loss 6.4126 LearningRate 0.000956 Epoch: 4 Global Step: 99570 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:12:23,183-Speed 2499.63 samples/sec Loss 6.3404 LearningRate 0.000956 Epoch: 4 Global Step: 99580 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:12:31,421-Speed 2499.22 samples/sec Loss 6.3596 LearningRate 0.000956 Epoch: 4 Global Step: 99590 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:12:42,693-Speed 1817.13 samples/sec Loss 6.3964 LearningRate 0.000956 Epoch: 4 Global Step: 99600 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:12:50,854-Speed 2515.82 samples/sec Loss 6.5340 LearningRate 0.000956 Epoch: 4 Global Step: 99610 Fp16 Grad Scale: 65536 Required: 167 hours Training: 
2022-07-06 13:12:59,056-Speed 2497.22 samples/sec Loss 6.4574 LearningRate 0.000956 Epoch: 4 Global Step: 99620 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:13:07,260-Speed 2496.86 samples/sec Loss 6.5402 LearningRate 0.000956 Epoch: 4 Global Step: 99630 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:13:15,461-Speed 2497.79 samples/sec Loss 6.4533 LearningRate 0.000956 Epoch: 4 Global Step: 99640 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:13:23,663-Speed 2497.24 samples/sec Loss 6.4267 LearningRate 0.000956 Epoch: 4 Global Step: 99650 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:13:31,868-Speed 2496.31 samples/sec Loss 6.5039 LearningRate 0.000956 Epoch: 4 Global Step: 99660 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:13:40,030-Speed 2509.77 samples/sec Loss 6.5228 LearningRate 0.000956 Epoch: 4 Global Step: 99670 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:13:48,237-Speed 2495.94 samples/sec Loss 6.4002 LearningRate 0.000956 Epoch: 4 Global Step: 99680 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:13:56,441-Speed 2496.89 samples/sec Loss 6.5197 LearningRate 0.000956 Epoch: 4 Global Step: 99690 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:14:04,645-Speed 2496.62 samples/sec Loss 6.4944 LearningRate 0.000956 Epoch: 4 Global Step: 99700 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:14:12,855-Speed 2495.19 samples/sec Loss 6.5444 LearningRate 0.000956 Epoch: 4 Global Step: 99710 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:14:21,060-Speed 2496.33 samples/sec Loss 6.4156 LearningRate 0.000956 Epoch: 4 Global Step: 99720 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:14:29,211-Speed 2512.92 samples/sec Loss 6.4931 LearningRate 0.000956 Epoch: 4 Global Step: 99730 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 
13:14:37,416-Speed 2496.67 samples/sec Loss 6.5263 LearningRate 0.000956 Epoch: 4 Global Step: 99740 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:14:45,618-Speed 2497.32 samples/sec Loss 6.4346 LearningRate 0.000956 Epoch: 4 Global Step: 99750 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:14:53,819-Speed 2497.68 samples/sec Loss 6.4616 LearningRate 0.000955 Epoch: 4 Global Step: 99760 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:15:02,020-Speed 2497.78 samples/sec Loss 6.3998 LearningRate 0.000955 Epoch: 4 Global Step: 99770 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:15:10,235-Speed 2493.31 samples/sec Loss 6.4561 LearningRate 0.000955 Epoch: 4 Global Step: 99780 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:15:18,384-Speed 2513.32 samples/sec Loss 6.4765 LearningRate 0.000955 Epoch: 4 Global Step: 99790 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:15:26,599-Speed 2493.53 samples/sec Loss 6.3666 LearningRate 0.000955 Epoch: 4 Global Step: 99800 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:15:34,801-Speed 2497.35 samples/sec Loss 6.3941 LearningRate 0.000955 Epoch: 4 Global Step: 99810 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:15:43,002-Speed 2497.52 samples/sec Loss 6.3799 LearningRate 0.000955 Epoch: 4 Global Step: 99820 Fp16 Grad Scale: 65536 Required: 167 hours Training: 2022-07-06 13:15:51,176-Speed 2505.77 samples/sec Loss 6.3453 LearningRate 0.000955 Epoch: 4 Global Step: 99830 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:15:59,381-Speed 2496.46 samples/sec Loss 6.4478 LearningRate 0.000955 Epoch: 4 Global Step: 99840 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:16:07,534-Speed 2512.43 samples/sec Loss 6.4343 LearningRate 0.000955 Epoch: 4 Global Step: 99850 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:16:15,742-Speed 
2495.45 samples/sec Loss 6.4049 LearningRate 0.000955 Epoch: 4 Global Step: 99860 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:16:23,945-Speed 2497.32 samples/sec Loss 6.4159 LearningRate 0.000955 Epoch: 4 Global Step: 99870 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:16:32,147-Speed 2497.28 samples/sec Loss 6.4472 LearningRate 0.000955 Epoch: 4 Global Step: 99880 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:16:40,357-Speed 2494.87 samples/sec Loss 6.3630 LearningRate 0.000955 Epoch: 4 Global Step: 99890 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:16:48,531-Speed 2505.92 samples/sec Loss 6.3875 LearningRate 0.000955 Epoch: 4 Global Step: 99900 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:16:56,682-Speed 2512.94 samples/sec Loss 6.4286 LearningRate 0.000955 Epoch: 4 Global Step: 99910 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:17:04,888-Speed 2496.30 samples/sec Loss 6.4336 LearningRate 0.000955 Epoch: 4 Global Step: 99920 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:17:13,091-Speed 2496.89 samples/sec Loss 6.3756 LearningRate 0.000955 Epoch: 4 Global Step: 99930 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:17:21,330-Speed 2486.23 samples/sec Loss 6.4573 LearningRate 0.000955 Epoch: 4 Global Step: 99940 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:17:29,532-Speed 2497.70 samples/sec Loss 6.3627 LearningRate 0.000955 Epoch: 4 Global Step: 99950 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:17:37,736-Speed 2496.69 samples/sec Loss 6.4655 LearningRate 0.000955 Epoch: 4 Global Step: 99960 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:17:45,896-Speed 2510.00 samples/sec Loss 6.4985 LearningRate 0.000955 Epoch: 4 Global Step: 99970 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:17:54,100-Speed 2496.75 samples/sec 
Loss 6.4200 LearningRate 0.000955 Epoch: 4 Global Step: 99980 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:18:02,310-Speed 2495.19 samples/sec Loss 6.4139 LearningRate 0.000955 Epoch: 4 Global Step: 99990 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:18:10,522-Speed 2494.40 samples/sec Loss 6.5307 LearningRate 0.000955 Epoch: 4 Global Step: 100000 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:18:18,729-Speed 2495.69 samples/sec Loss 6.3670 LearningRate 0.000955 Epoch: 4 Global Step: 100010 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:18:26,953-Speed 2490.80 samples/sec Loss 6.3892 LearningRate 0.000955 Epoch: 4 Global Step: 100020 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:18:35,105-Speed 2512.86 samples/sec Loss 6.4088 LearningRate 0.000955 Epoch: 4 Global Step: 100030 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:18:43,322-Speed 2492.87 samples/sec Loss 6.3789 LearningRate 0.000955 Epoch: 4 Global Step: 100040 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:18:51,527-Speed 2496.55 samples/sec Loss 6.3525 LearningRate 0.000955 Epoch: 4 Global Step: 100050 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:18:59,729-Speed 2497.25 samples/sec Loss 6.4592 LearningRate 0.000955 Epoch: 4 Global Step: 100060 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:19:07,932-Speed 2497.07 samples/sec Loss 6.4515 LearningRate 0.000955 Epoch: 4 Global Step: 100070 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:19:16,141-Speed 2495.10 samples/sec Loss 6.4343 LearningRate 0.000955 Epoch: 4 Global Step: 100080 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:19:24,290-Speed 2513.69 samples/sec Loss 6.4950 LearningRate 0.000955 Epoch: 4 Global Step: 100090 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:19:32,493-Speed 2497.27 samples/sec Loss 
6.5980 LearningRate 0.000955 Epoch: 4 Global Step: 100100 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:19:40,699-Speed 2496.38 samples/sec Loss 6.4540 LearningRate 0.000955 Epoch: 4 Global Step: 100110 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:19:48,902-Speed 2497.06 samples/sec Loss 6.4794 LearningRate 0.000955 Epoch: 4 Global Step: 100120 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:19:57,105-Speed 2497.08 samples/sec Loss 6.4862 LearningRate 0.000955 Epoch: 4 Global Step: 100130 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:20:05,311-Speed 2495.99 samples/sec Loss 6.4630 LearningRate 0.000954 Epoch: 4 Global Step: 100140 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:20:13,459-Speed 2513.97 samples/sec Loss 6.4612 LearningRate 0.000954 Epoch: 4 Global Step: 100150 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:20:21,673-Speed 2493.90 samples/sec Loss 6.3970 LearningRate 0.000954 Epoch: 4 Global Step: 100160 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:20:29,873-Speed 2497.94 samples/sec Loss 6.3607 LearningRate 0.000954 Epoch: 4 Global Step: 100170 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:20:38,078-Speed 2496.85 samples/sec Loss 6.3759 LearningRate 0.000954 Epoch: 4 Global Step: 100180 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:20:46,280-Speed 2497.22 samples/sec Loss 6.4147 LearningRate 0.000954 Epoch: 4 Global Step: 100190 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:20:54,484-Speed 2496.66 samples/sec Loss 6.4381 LearningRate 0.000954 Epoch: 4 Global Step: 100200 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:21:02,633-Speed 2513.47 samples/sec Loss 6.4655 LearningRate 0.000954 Epoch: 4 Global Step: 100210 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:21:10,839-Speed 2496.38 samples/sec Loss 6.5722 
LearningRate 0.000954 Epoch: 4 Global Step: 100220 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:21:19,038-Speed 2498.17 samples/sec Loss 6.5106 LearningRate 0.000954 Epoch: 4 Global Step: 100230 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:21:27,253-Speed 2493.71 samples/sec Loss 6.5164 LearningRate 0.000954 Epoch: 4 Global Step: 100240 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:21:35,458-Speed 2496.67 samples/sec Loss 6.5395 LearningRate 0.000954 Epoch: 4 Global Step: 100250 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:21:43,657-Speed 2498.07 samples/sec Loss 6.4307 LearningRate 0.000954 Epoch: 4 Global Step: 100260 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:21:51,812-Speed 2511.90 samples/sec Loss 6.4109 LearningRate 0.000954 Epoch: 4 Global Step: 100270 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:22:00,012-Speed 2498.05 samples/sec Loss 6.3567 LearningRate 0.000954 Epoch: 4 Global Step: 100280 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:22:08,210-Speed 2498.45 samples/sec Loss 6.4121 LearningRate 0.000954 Epoch: 4 Global Step: 100290 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:22:16,410-Speed 2498.00 samples/sec Loss 6.4634 LearningRate 0.000954 Epoch: 4 Global Step: 100300 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:22:24,614-Speed 2496.91 samples/sec Loss 6.4924 LearningRate 0.000954 Epoch: 4 Global Step: 100310 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:22:32,814-Speed 2497.79 samples/sec Loss 6.5208 LearningRate 0.000954 Epoch: 4 Global Step: 100320 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:22:40,959-Speed 2515.15 samples/sec Loss 6.4092 LearningRate 0.000954 Epoch: 4 Global Step: 100330 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:22:49,161-Speed 2497.23 samples/sec Loss 6.3119 
LearningRate 0.000954 Epoch: 4 Global Step: 100340 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:22:57,360-Speed 2498.37 samples/sec Loss 6.3758 LearningRate 0.000954 Epoch: 4 Global Step: 100350 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:23:05,559-Speed 2498.29 samples/sec Loss 6.3617 LearningRate 0.000954 Epoch: 4 Global Step: 100360 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:23:13,759-Speed 2498.19 samples/sec Loss 6.3325 LearningRate 0.000954 Epoch: 4 Global Step: 100370 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:23:21,973-Speed 2493.71 samples/sec Loss 6.4029 LearningRate 0.000954 Epoch: 4 Global Step: 100380 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:23:30,119-Speed 2514.52 samples/sec Loss 6.4108 LearningRate 0.000954 Epoch: 4 Global Step: 100390 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:23:38,333-Speed 2493.51 samples/sec Loss 6.3411 LearningRate 0.000954 Epoch: 4 Global Step: 100400 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:23:46,534-Speed 2497.88 samples/sec Loss 6.2860 LearningRate 0.000954 Epoch: 4 Global Step: 100410 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:23:54,737-Speed 2497.34 samples/sec Loss 6.3679 LearningRate 0.000954 Epoch: 4 Global Step: 100420 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:24:02,949-Speed 2493.93 samples/sec Loss 6.2747 LearningRate 0.000954 Epoch: 4 Global Step: 100430 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:24:11,150-Speed 2498.96 samples/sec Loss 6.3640 LearningRate 0.000954 Epoch: 4 Global Step: 100440 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:24:19,298-Speed 2513.72 samples/sec Loss 6.3976 LearningRate 0.000954 Epoch: 4 Global Step: 100450 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:24:27,495-Speed 2498.97 samples/sec Loss 6.2827 
LearningRate 0.000954 Epoch: 4 Global Step: 100460 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:24:35,698-Speed 2497.45 samples/sec Loss 6.4412 LearningRate 0.000954 Epoch: 4 Global Step: 100470 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:24:43,897-Speed 2498.43 samples/sec Loss 6.3450 LearningRate 0.000954 Epoch: 4 Global Step: 100480 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:24:52,101-Speed 2496.77 samples/sec Loss 6.3344 LearningRate 0.000954 Epoch: 4 Global Step: 100490 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:25:00,300-Speed 2498.11 samples/sec Loss 6.4613 LearningRate 0.000954 Epoch: 4 Global Step: 100500 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:25:08,447-Speed 2514.25 samples/sec Loss 6.3597 LearningRate 0.000954 Epoch: 4 Global Step: 100510 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:25:16,644-Speed 2498.83 samples/sec Loss 6.4161 LearningRate 0.000954 Epoch: 4 Global Step: 100520 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:25:24,857-Speed 2494.26 samples/sec Loss 6.3777 LearningRate 0.000953 Epoch: 4 Global Step: 100530 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:25:33,058-Speed 2497.86 samples/sec Loss 6.3864 LearningRate 0.000953 Epoch: 4 Global Step: 100540 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:25:41,260-Speed 2497.16 samples/sec Loss 6.4090 LearningRate 0.000953 Epoch: 4 Global Step: 100550 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:25:49,458-Speed 2498.74 samples/sec Loss 6.4438 LearningRate 0.000953 Epoch: 4 Global Step: 100560 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:25:57,616-Speed 2510.75 samples/sec Loss 6.3389 LearningRate 0.000953 Epoch: 4 Global Step: 100570 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:26:05,813-Speed 2498.86 samples/sec Loss 6.3876 
LearningRate 0.000953 Epoch: 4 Global Step: 100580 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:26:14,011-Speed 2498.64 samples/sec Loss 6.4125 LearningRate 0.000953 Epoch: 4 Global Step: 100590 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:26:22,211-Speed 2498.11 samples/sec Loss 6.2902 LearningRate 0.000953 Epoch: 4 Global Step: 100600 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:26:30,412-Speed 2497.52 samples/sec Loss 6.2910 LearningRate 0.000953 Epoch: 4 Global Step: 100610 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:26:38,610-Speed 2498.28 samples/sec Loss 6.2649 LearningRate 0.000953 Epoch: 4 Global Step: 100620 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:26:46,759-Speed 2513.72 samples/sec Loss 6.3305 LearningRate 0.000953 Epoch: 4 Global Step: 100630 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:26:54,957-Speed 2498.51 samples/sec Loss 6.2846 LearningRate 0.000953 Epoch: 4 Global Step: 100640 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:27:03,158-Speed 2498.04 samples/sec Loss 6.3569 LearningRate 0.000953 Epoch: 4 Global Step: 100650 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:27:11,357-Speed 2498.20 samples/sec Loss 6.3104 LearningRate 0.000953 Epoch: 4 Global Step: 100660 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:27:19,556-Speed 2498.53 samples/sec Loss 6.4582 LearningRate 0.000953 Epoch: 4 Global Step: 100670 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:27:27,758-Speed 2497.44 samples/sec Loss 6.4889 LearningRate 0.000953 Epoch: 4 Global Step: 100680 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:27:35,903-Speed 2514.72 samples/sec Loss 6.2980 LearningRate 0.000953 Epoch: 4 Global Step: 100690 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:27:44,108-Speed 2496.40 samples/sec Loss 6.5401 
LearningRate 0.000953 Epoch: 4 Global Step: 100700 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:27:52,305-Speed 2498.87 samples/sec Loss 6.4954 LearningRate 0.000953 Epoch: 4 Global Step: 100710 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:28:00,505-Speed 2498.07 samples/sec Loss 6.5938 LearningRate 0.000953 Epoch: 4 Global Step: 100720 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:28:08,703-Speed 2498.73 samples/sec Loss 6.4749 LearningRate 0.000953 Epoch: 4 Global Step: 100730 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:28:16,902-Speed 2498.00 samples/sec Loss 6.4425 LearningRate 0.000953 Epoch: 4 Global Step: 100740 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:28:25,043-Speed 2516.13 samples/sec Loss 6.5147 LearningRate 0.000953 Epoch: 4 Global Step: 100750 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:28:33,242-Speed 2498.59 samples/sec Loss 6.4538 LearningRate 0.000953 Epoch: 4 Global Step: 100760 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:28:41,436-Speed 2499.50 samples/sec Loss 6.4465 LearningRate 0.000953 Epoch: 4 Global Step: 100770 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:28:49,631-Speed 2499.50 samples/sec Loss 6.4591 LearningRate 0.000953 Epoch: 4 Global Step: 100780 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:28:57,832-Speed 2498.09 samples/sec Loss 6.5116 LearningRate 0.000953 Epoch: 4 Global Step: 100790 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:29:06,031-Speed 2498.19 samples/sec Loss 6.4185 LearningRate 0.000953 Epoch: 4 Global Step: 100800 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:29:14,176-Speed 2514.88 samples/sec Loss 6.4419 LearningRate 0.000953 Epoch: 4 Global Step: 100810 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:29:22,375-Speed 2498.40 samples/sec Loss 6.3421 
LearningRate 0.000953 Epoch: 4 Global Step: 100820 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:29:30,573-Speed 2498.40 samples/sec Loss 6.4190 LearningRate 0.000953 Epoch: 4 Global Step: 100830 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:29:38,771-Speed 2498.47 samples/sec Loss 6.4111 LearningRate 0.000953 Epoch: 4 Global Step: 100840 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:29:46,972-Speed 2497.67 samples/sec Loss 6.3653 LearningRate 0.000953 Epoch: 4 Global Step: 100850 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:29:55,171-Speed 2498.33 samples/sec Loss 6.2850 LearningRate 0.000953 Epoch: 4 Global Step: 100860 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:30:03,322-Speed 2512.93 samples/sec Loss 6.4316 LearningRate 0.000953 Epoch: 4 Global Step: 100870 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:30:11,525-Speed 2497.39 samples/sec Loss 6.4021 LearningRate 0.000953 Epoch: 4 Global Step: 100880 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:30:19,721-Speed 2498.90 samples/sec Loss 6.4029 LearningRate 0.000953 Epoch: 4 Global Step: 100890 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:30:27,920-Speed 2498.35 samples/sec Loss 6.3794 LearningRate 0.000953 Epoch: 4 Global Step: 100900 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:30:36,119-Speed 2498.21 samples/sec Loss 6.3436 LearningRate 0.000952 Epoch: 4 Global Step: 100910 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:30:44,317-Speed 2498.56 samples/sec Loss 6.3413 LearningRate 0.000952 Epoch: 4 Global Step: 100920 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:30:52,461-Speed 2515.24 samples/sec Loss 6.3618 LearningRate 0.000952 Epoch: 4 Global Step: 100930 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:31:00,661-Speed 2498.04 samples/sec Loss 6.4837 
LearningRate 0.000952 Epoch: 4 Global Step: 100940 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:31:08,857-Speed 2498.85 samples/sec Loss 6.4516 LearningRate 0.000952 Epoch: 4 Global Step: 100950 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:31:17,062-Speed 2496.49 samples/sec Loss 6.3622 LearningRate 0.000952 Epoch: 4 Global Step: 100960 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:31:25,259-Speed 2498.90 samples/sec Loss 6.4800 LearningRate 0.000952 Epoch: 4 Global Step: 100970 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:31:33,458-Speed 2498.31 samples/sec Loss 6.3561 LearningRate 0.000952 Epoch: 4 Global Step: 100980 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:31:41,603-Speed 2514.94 samples/sec Loss 6.3462 LearningRate 0.000952 Epoch: 4 Global Step: 100990 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:31:49,800-Speed 2498.84 samples/sec Loss 6.3249 LearningRate 0.000952 Epoch: 4 Global Step: 101000 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:31:58,005-Speed 2496.33 samples/sec Loss 6.3638 LearningRate 0.000952 Epoch: 4 Global Step: 101010 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:32:06,219-Speed 2493.79 samples/sec Loss 6.3508 LearningRate 0.000952 Epoch: 4 Global Step: 101020 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:32:14,415-Speed 2499.18 samples/sec Loss 6.4102 LearningRate 0.000952 Epoch: 4 Global Step: 101030 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:32:22,612-Speed 2498.89 samples/sec Loss 6.3865 LearningRate 0.000952 Epoch: 4 Global Step: 101040 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:32:30,764-Speed 2512.56 samples/sec Loss 6.4205 LearningRate 0.000952 Epoch: 4 Global Step: 101050 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:32:38,965-Speed 2497.60 samples/sec Loss 6.3947 
LearningRate 0.000952 Epoch: 4 Global Step: 101060 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:32:47,167-Speed 2497.52 samples/sec Loss 6.3286 LearningRate 0.000952 Epoch: 4 Global Step: 101070 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:32:55,366-Speed 2498.38 samples/sec Loss 6.3394 LearningRate 0.000952 Epoch: 4 Global Step: 101080 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:33:03,566-Speed 2497.65 samples/sec Loss 6.3538 LearningRate 0.000952 Epoch: 4 Global Step: 101090 Fp16 Grad Scale: 16384 Required: 167 hours Training: 2022-07-06 13:33:11,768-Speed 2497.43 samples/sec Loss 6.4054 LearningRate 0.000952 Epoch: 4 Global Step: 101100 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:33:19,917-Speed 2514.13 samples/sec Loss 6.4594 LearningRate 0.000952 Epoch: 4 Global Step: 101110 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:33:28,119-Speed 2497.22 samples/sec Loss 6.3528 LearningRate 0.000952 Epoch: 4 Global Step: 101120 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:33:36,326-Speed 2495.81 samples/sec Loss 6.3558 LearningRate 0.000952 Epoch: 4 Global Step: 101130 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:33:44,520-Speed 2499.75 samples/sec Loss 6.2964 LearningRate 0.000952 Epoch: 4 Global Step: 101140 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:33:52,721-Speed 2497.69 samples/sec Loss 6.3468 LearningRate 0.000952 Epoch: 4 Global Step: 101150 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:34:00,916-Speed 2499.28 samples/sec Loss 6.3923 LearningRate 0.000952 Epoch: 4 Global Step: 101160 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:34:09,064-Speed 2514.03 samples/sec Loss 6.2768 LearningRate 0.000952 Epoch: 4 Global Step: 101170 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:34:17,269-Speed 2496.77 samples/sec Loss 6.3520 
LearningRate 0.000952 Epoch: 4 Global Step: 101180 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:34:25,464-Speed 2499.44 samples/sec Loss 6.2515 LearningRate 0.000952 Epoch: 4 Global Step: 101190 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:34:33,659-Speed 2499.51 samples/sec Loss 6.3599 LearningRate 0.000952 Epoch: 4 Global Step: 101200 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:34:41,855-Speed 2499.31 samples/sec Loss 6.2909 LearningRate 0.000952 Epoch: 4 Global Step: 101210 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:34:50,050-Speed 2499.66 samples/sec Loss 6.3047 LearningRate 0.000952 Epoch: 4 Global Step: 101220 Fp16 Grad Scale: 32768 Required: 167 hours Training: 2022-07-06 13:34:58,192-Speed 2515.68 samples/sec Loss 6.3355 LearningRate 0.000952 Epoch: 4 Global Step: 101230 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:35:06,390-Speed 2498.65 samples/sec Loss 6.3407 LearningRate 0.000952 Epoch: 4 Global Step: 101240 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:35:14,589-Speed 2498.49 samples/sec Loss 6.3665 LearningRate 0.000952 Epoch: 4 Global Step: 101250 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:35:22,786-Speed 2498.66 samples/sec Loss 6.5677 LearningRate 0.000952 Epoch: 4 Global Step: 101260 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:35:30,986-Speed 2498.07 samples/sec Loss 6.4082 LearningRate 0.000952 Epoch: 4 Global Step: 101270 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:35:39,185-Speed 2498.37 samples/sec Loss 6.2681 LearningRate 0.000952 Epoch: 4 Global Step: 101280 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:35:47,328-Speed 2515.28 samples/sec Loss 6.7027 LearningRate 0.000951 Epoch: 4 Global Step: 101290 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:35:55,526-Speed 2498.68 samples/sec Loss 6.4230 
LearningRate 0.000951 Epoch: 4 Global Step: 101300 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:36:03,723-Speed 2498.99 samples/sec Loss 6.5135 LearningRate 0.000951 Epoch: 4 Global Step: 101310 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:36:11,921-Speed 2498.55 samples/sec Loss 6.5815 LearningRate 0.000951 Epoch: 4 Global Step: 101320 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:36:20,121-Speed 2498.01 samples/sec Loss 6.4221 LearningRate 0.000951 Epoch: 4 Global Step: 101330 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:36:28,346-Speed 2490.31 samples/sec Loss 6.5132 LearningRate 0.000951 Epoch: 4 Global Step: 101340 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:36:36,494-Speed 2514.09 samples/sec Loss 6.5223 LearningRate 0.000951 Epoch: 4 Global Step: 101350 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:36:44,695-Speed 2497.56 samples/sec Loss 6.4257 LearningRate 0.000951 Epoch: 4 Global Step: 101360 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:36:52,901-Speed 2496.36 samples/sec Loss 6.3872 LearningRate 0.000951 Epoch: 4 Global Step: 101370 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:37:01,101-Speed 2498.00 samples/sec Loss 6.4327 LearningRate 0.000951 Epoch: 4 Global Step: 101380 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:37:09,313-Speed 2494.39 samples/sec Loss 6.3502 LearningRate 0.000951 Epoch: 4 Global Step: 101390 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:37:17,510-Speed 2498.70 samples/sec Loss 6.3393 LearningRate 0.000951 Epoch: 4 Global Step: 101400 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:37:25,657-Speed 2514.35 samples/sec Loss 6.3897 LearningRate 0.000951 Epoch: 4 Global Step: 101410 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:37:33,860-Speed 2496.96 samples/sec Loss 6.3068 
LearningRate 0.000951 Epoch: 4 Global Step: 101420 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:37:42,056-Speed 2499.05 samples/sec Loss 6.3949 LearningRate 0.000951 Epoch: 4 Global Step: 101430 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:37:50,252-Speed 2499.31 samples/sec Loss 6.3262 LearningRate 0.000951 Epoch: 4 Global Step: 101440 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:37:58,448-Speed 2499.33 samples/sec Loss 6.3222 LearningRate 0.000951 Epoch: 4 Global Step: 101450 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:38:06,645-Speed 2498.61 samples/sec Loss 6.2930 LearningRate 0.000951 Epoch: 4 Global Step: 101460 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:38:14,792-Speed 2514.38 samples/sec Loss 6.3993 LearningRate 0.000951 Epoch: 4 Global Step: 101470 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:38:22,989-Speed 2499.15 samples/sec Loss 6.3249 LearningRate 0.000951 Epoch: 4 Global Step: 101480 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:38:31,193-Speed 2496.62 samples/sec Loss 6.3318 LearningRate 0.000951 Epoch: 4 Global Step: 101490 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:38:39,394-Speed 2497.87 samples/sec Loss 6.3511 LearningRate 0.000951 Epoch: 4 Global Step: 101500 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:38:47,596-Speed 2497.32 samples/sec Loss 6.3245 LearningRate 0.000951 Epoch: 4 Global Step: 101510 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:38:55,799-Speed 2496.93 samples/sec Loss 6.2471 LearningRate 0.000951 Epoch: 4 Global Step: 101520 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:39:03,947-Speed 2514.07 samples/sec Loss 6.2559 LearningRate 0.000951 Epoch: 4 Global Step: 101530 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:39:12,146-Speed 2498.26 samples/sec Loss 6.2933 
LearningRate 0.000951 Epoch: 4 Global Step: 101540 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:39:20,343-Speed 2498.89 samples/sec Loss 6.3215 LearningRate 0.000951 Epoch: 4 Global Step: 101550 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:39:28,552-Speed 2495.33 samples/sec Loss 6.2938 LearningRate 0.000951 Epoch: 4 Global Step: 101560 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:39:36,759-Speed 2496.00 samples/sec Loss 6.2661 LearningRate 0.000951 Epoch: 4 Global Step: 101570 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:39:44,959-Speed 2497.82 samples/sec Loss 6.3979 LearningRate 0.000951 Epoch: 4 Global Step: 101580 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:39:53,113-Speed 2512.46 samples/sec Loss 6.3719 LearningRate 0.000951 Epoch: 4 Global Step: 101590 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:40:01,323-Speed 2495.11 samples/sec Loss 6.4945 LearningRate 0.000951 Epoch: 4 Global Step: 101600 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:40:09,523-Speed 2497.83 samples/sec Loss 6.3720 LearningRate 0.000951 Epoch: 4 Global Step: 101610 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:40:17,720-Speed 2498.84 samples/sec Loss 6.3150 LearningRate 0.000951 Epoch: 4 Global Step: 101620 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:40:25,916-Speed 2500.11 samples/sec Loss 6.3460 LearningRate 0.000951 Epoch: 4 Global Step: 101630 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:40:34,119-Speed 2497.15 samples/sec Loss 6.3990 LearningRate 0.000951 Epoch: 4 Global Step: 101640 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:40:42,270-Speed 2513.06 samples/sec Loss 6.2574 LearningRate 0.000951 Epoch: 4 Global Step: 101650 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:40:50,473-Speed 2497.23 samples/sec Loss 6.4159 
LearningRate 0.000951 Epoch: 4 Global Step: 101660 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:40:58,672-Speed 2498.26 samples/sec Loss 6.2655 LearningRate 0.000950 Epoch: 4 Global Step: 101670 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:41:06,886-Speed 2494.00 samples/sec Loss 6.2892 LearningRate 0.000950 Epoch: 4 Global Step: 101680 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:41:15,095-Speed 2495.25 samples/sec Loss 6.3627 LearningRate 0.000950 Epoch: 4 Global Step: 101690 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:41:23,297-Speed 2497.44 samples/sec Loss 6.3374 LearningRate 0.000950 Epoch: 4 Global Step: 101700 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:41:31,446-Speed 2513.60 samples/sec Loss 6.2797 LearningRate 0.000950 Epoch: 4 Global Step: 101710 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:41:39,645-Speed 2498.31 samples/sec Loss 6.3208 LearningRate 0.000950 Epoch: 4 Global Step: 101720 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:41:47,843-Speed 2499.08 samples/sec Loss 6.2742 LearningRate 0.000950 Epoch: 4 Global Step: 101730 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:41:56,043-Speed 2497.83 samples/sec Loss 6.3637 LearningRate 0.000950 Epoch: 4 Global Step: 101740 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:42:04,259-Speed 2493.07 samples/sec Loss 6.3591 LearningRate 0.000950 Epoch: 4 Global Step: 101750 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:42:12,457-Speed 2498.62 samples/sec Loss 6.3730 LearningRate 0.000950 Epoch: 4 Global Step: 101760 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:42:20,605-Speed 2513.78 samples/sec Loss 6.3581 LearningRate 0.000950 Epoch: 4 Global Step: 101770 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:42:28,809-Speed 2497.03 samples/sec Loss 6.1976 
LearningRate 0.000950 Epoch: 4 Global Step: 101780 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:42:37,011-Speed 2497.37 samples/sec Loss 6.3112 LearningRate 0.000950 Epoch: 4 Global Step: 101790 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:42:45,208-Speed 2498.90 samples/sec Loss 6.3543 LearningRate 0.000950 Epoch: 4 Global Step: 101800 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:42:53,411-Speed 2496.80 samples/sec Loss 6.3427 LearningRate 0.000950 Epoch: 4 Global Step: 101810 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:43:01,612-Speed 2497.73 samples/sec Loss 6.3285 LearningRate 0.000950 Epoch: 4 Global Step: 101820 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:43:09,756-Speed 2515.07 samples/sec Loss 6.2192 LearningRate 0.000950 Epoch: 4 Global Step: 101830 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:43:17,968-Speed 2494.32 samples/sec Loss 6.3519 LearningRate 0.000950 Epoch: 4 Global Step: 101840 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:43:26,164-Speed 2499.07 samples/sec Loss 6.2934 LearningRate 0.000950 Epoch: 4 Global Step: 101850 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:43:34,365-Speed 2497.91 samples/sec Loss 6.2934 LearningRate 0.000950 Epoch: 4 Global Step: 101860 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:43:42,572-Speed 2495.53 samples/sec Loss 6.3467 LearningRate 0.000950 Epoch: 4 Global Step: 101870 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:43:50,772-Speed 2498.20 samples/sec Loss 6.2598 LearningRate 0.000950 Epoch: 4 Global Step: 101880 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:43:58,912-Speed 2516.69 samples/sec Loss 6.2501 LearningRate 0.000950 Epoch: 4 Global Step: 101890 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:44:07,113-Speed 2497.73 samples/sec Loss 6.2964 
LearningRate 0.000950 Epoch: 4 Global Step: 101900 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:44:15,313-Speed 2497.72 samples/sec Loss 6.2701 LearningRate 0.000950 Epoch: 4 Global Step: 101910 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:44:23,533-Speed 2492.05 samples/sec Loss 6.3052 LearningRate 0.000950 Epoch: 4 Global Step: 101920 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:44:31,734-Speed 2497.73 samples/sec Loss 6.3307 LearningRate 0.000950 Epoch: 4 Global Step: 101930 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:44:39,932-Speed 2498.70 samples/sec Loss 6.2803 LearningRate 0.000950 Epoch: 4 Global Step: 101940 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:44:48,075-Speed 2515.46 samples/sec Loss 6.2620 LearningRate 0.000950 Epoch: 4 Global Step: 101950 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:44:56,273-Speed 2498.49 samples/sec Loss 6.3308 LearningRate 0.000950 Epoch: 4 Global Step: 101960 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:45:04,468-Speed 2499.66 samples/sec Loss 6.2257 LearningRate 0.000950 Epoch: 4 Global Step: 101970 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:45:12,663-Speed 2499.55 samples/sec Loss 6.3549 LearningRate 0.000950 Epoch: 4 Global Step: 101980 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:45:20,863-Speed 2498.01 samples/sec Loss 6.2359 LearningRate 0.000950 Epoch: 4 Global Step: 101990 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:45:29,065-Speed 2496.99 samples/sec Loss 6.2520 LearningRate 0.000950 Epoch: 4 Global Step: 102000 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:45:37,215-Speed 2513.67 samples/sec Loss 6.2557 LearningRate 0.000950 Epoch: 4 Global Step: 102010 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:45:45,416-Speed 2497.67 samples/sec Loss 6.4192 
LearningRate 0.000950 Epoch: 4 Global Step: 102020 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:45:53,613-Speed 2498.94 samples/sec Loss 6.3131 LearningRate 0.000950 Epoch: 4 Global Step: 102030 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:46:01,810-Speed 2498.85 samples/sec Loss 6.2674 LearningRate 0.000950 Epoch: 4 Global Step: 102040 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:46:10,008-Speed 2498.78 samples/sec Loss 6.2328 LearningRate 0.000950 Epoch: 4 Global Step: 102050 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:46:18,208-Speed 2498.02 samples/sec Loss 6.3192 LearningRate 0.000949 Epoch: 4 Global Step: 102060 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:46:26,357-Speed 2513.40 samples/sec Loss 6.2024 LearningRate 0.000949 Epoch: 4 Global Step: 102070 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:46:34,554-Speed 2499.02 samples/sec Loss 6.3322 LearningRate 0.000949 Epoch: 4 Global Step: 102080 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:46:42,757-Speed 2497.03 samples/sec Loss 6.2845 LearningRate 0.000949 Epoch: 4 Global Step: 102090 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:46:50,954-Speed 2498.86 samples/sec Loss 6.2735 LearningRate 0.000949 Epoch: 4 Global Step: 102100 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:46:59,158-Speed 2496.76 samples/sec Loss 6.2178 LearningRate 0.000949 Epoch: 4 Global Step: 102110 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:47:07,360-Speed 2497.26 samples/sec Loss 6.2149 LearningRate 0.000949 Epoch: 4 Global Step: 102120 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:47:15,504-Speed 2515.57 samples/sec Loss 6.1813 LearningRate 0.000949 Epoch: 4 Global Step: 102130 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:47:23,702-Speed 2498.72 samples/sec Loss 6.3997 
LearningRate 0.000949 Epoch: 4 Global Step: 102140 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:47:31,898-Speed 2499.09 samples/sec Loss 6.3749 LearningRate 0.000949 Epoch: 4 Global Step: 102150 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:47:40,100-Speed 2497.47 samples/sec Loss 6.3702 LearningRate 0.000949 Epoch: 4 Global Step: 102160 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:47:48,315-Speed 2493.58 samples/sec Loss 6.3188 LearningRate 0.000949 Epoch: 4 Global Step: 102170 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:47:56,512-Speed 2498.63 samples/sec Loss 6.3658 LearningRate 0.000949 Epoch: 4 Global Step: 102180 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:48:04,670-Speed 2510.89 samples/sec Loss 6.2529 LearningRate 0.000949 Epoch: 4 Global Step: 102190 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:48:12,868-Speed 2498.54 samples/sec Loss 6.3033 LearningRate 0.000949 Epoch: 4 Global Step: 102200 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:48:21,072-Speed 2496.89 samples/sec Loss 6.2768 LearningRate 0.000949 Epoch: 4 Global Step: 102210 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:48:29,270-Speed 2498.82 samples/sec Loss 6.2522 LearningRate 0.000949 Epoch: 4 Global Step: 102220 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:48:37,470-Speed 2497.76 samples/sec Loss 6.3113 LearningRate 0.000949 Epoch: 4 Global Step: 102230 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:48:45,664-Speed 2499.74 samples/sec Loss 6.2330 LearningRate 0.000949 Epoch: 4 Global Step: 102240 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:48:53,818-Speed 2512.04 samples/sec Loss 6.2010 LearningRate 0.000949 Epoch: 4 Global Step: 102250 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:49:02,013-Speed 2499.37 samples/sec Loss 6.2853 
LearningRate 0.000949 Epoch: 4 Global Step: 102260 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:49:10,210-Speed 2498.79 samples/sec Loss 6.2798 LearningRate 0.000949 Epoch: 4 Global Step: 102270 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:49:18,420-Speed 2494.85 samples/sec Loss 6.2890 LearningRate 0.000949 Epoch: 4 Global Step: 102280 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:49:26,644-Speed 2490.93 samples/sec Loss 6.3226 LearningRate 0.000949 Epoch: 4 Global Step: 102290 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:49:34,841-Speed 2498.85 samples/sec Loss 6.2777 LearningRate 0.000949 Epoch: 4 Global Step: 102300 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:49:42,984-Speed 2515.26 samples/sec Loss 6.2384 LearningRate 0.000949 Epoch: 4 Global Step: 102310 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:49:51,182-Speed 2498.78 samples/sec Loss 6.2837 LearningRate 0.000949 Epoch: 4 Global Step: 102320 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:49:59,392-Speed 2494.72 samples/sec Loss 6.2171 LearningRate 0.000949 Epoch: 4 Global Step: 102330 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:50:07,590-Speed 2498.57 samples/sec Loss 6.2946 LearningRate 0.000949 Epoch: 4 Global Step: 102340 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:50:15,784-Speed 2500.20 samples/sec Loss 6.2124 LearningRate 0.000949 Epoch: 4 Global Step: 102350 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:50:23,981-Speed 2498.88 samples/sec Loss 6.2203 LearningRate 0.000949 Epoch: 4 Global Step: 102360 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:50:32,125-Speed 2514.91 samples/sec Loss 6.1479 LearningRate 0.000949 Epoch: 4 Global Step: 102370 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:50:40,324-Speed 2498.61 samples/sec Loss 6.2923 
LearningRate 0.000949 Epoch: 4 Global Step: 102380 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:50:48,538-Speed 2493.75 samples/sec Loss 6.3617 LearningRate 0.000949 Epoch: 4 Global Step: 102390 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:50:56,741-Speed 2497.03 samples/sec Loss 6.2852 LearningRate 0.000949 Epoch: 4 Global Step: 102400 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:51:04,946-Speed 2496.79 samples/sec Loss 6.3516 LearningRate 0.000949 Epoch: 4 Global Step: 102410 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:51:13,153-Speed 2495.49 samples/sec Loss 6.2839 LearningRate 0.000949 Epoch: 4 Global Step: 102420 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:51:21,301-Speed 2514.03 samples/sec Loss 6.2324 LearningRate 0.000949 Epoch: 4 Global Step: 102430 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:51:29,507-Speed 2496.28 samples/sec Loss 6.2312 LearningRate 0.000948 Epoch: 4 Global Step: 102440 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:51:37,710-Speed 2496.85 samples/sec Loss 6.2644 LearningRate 0.000948 Epoch: 4 Global Step: 102450 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:51:45,913-Speed 2497.23 samples/sec Loss 6.2997 LearningRate 0.000948 Epoch: 4 Global Step: 102460 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:51:54,111-Speed 2498.75 samples/sec Loss 6.4158 LearningRate 0.000948 Epoch: 4 Global Step: 102470 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:52:02,322-Speed 2494.82 samples/sec Loss 6.1986 LearningRate 0.000948 Epoch: 4 Global Step: 102480 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:52:10,471-Speed 2513.49 samples/sec Loss 6.1707 LearningRate 0.000948 Epoch: 4 Global Step: 102490 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:52:18,670-Speed 2498.10 samples/sec Loss 6.2759 
LearningRate 0.000948 Epoch: 4 Global Step: 102500 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:52:26,879-Speed 2495.39 samples/sec Loss 6.3098 LearningRate 0.000948 Epoch: 4 Global Step: 102510 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:52:35,076-Speed 2498.91 samples/sec Loss 6.3014 LearningRate 0.000948 Epoch: 4 Global Step: 102520 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:52:43,275-Speed 2498.16 samples/sec Loss 6.3426 LearningRate 0.000948 Epoch: 4 Global Step: 102530 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:52:51,470-Speed 2499.37 samples/sec Loss 6.2780 LearningRate 0.000948 Epoch: 4 Global Step: 102540 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:52:59,614-Speed 2514.96 samples/sec Loss 6.3113 LearningRate 0.000948 Epoch: 4 Global Step: 102550 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:53:07,813-Speed 2498.46 samples/sec Loss 6.2480 LearningRate 0.000948 Epoch: 4 Global Step: 102560 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:53:16,024-Speed 2494.72 samples/sec Loss 6.3476 LearningRate 0.000948 Epoch: 4 Global Step: 102570 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:53:24,219-Speed 2499.58 samples/sec Loss 6.3820 LearningRate 0.000948 Epoch: 4 Global Step: 102580 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:53:32,414-Speed 2499.49 samples/sec Loss 6.2482 LearningRate 0.000948 Epoch: 4 Global Step: 102590 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:53:40,619-Speed 2496.36 samples/sec Loss 6.2132 LearningRate 0.000948 Epoch: 4 Global Step: 102600 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:53:48,788-Speed 2507.47 samples/sec Loss 6.4845 LearningRate 0.000948 Epoch: 4 Global Step: 102610 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:53:56,985-Speed 2499.10 samples/sec Loss 6.2968 
LearningRate 0.000948 Epoch: 4 Global Step: 102620 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:54:05,191-Speed 2495.90 samples/sec Loss 6.4175 LearningRate 0.000948 Epoch: 4 Global Step: 102630 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:54:13,403-Speed 2494.94 samples/sec Loss 6.3905 LearningRate 0.000948 Epoch: 4 Global Step: 102640 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:54:21,600-Speed 2498.75 samples/sec Loss 6.3726 LearningRate 0.000948 Epoch: 4 Global Step: 102650 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:54:29,797-Speed 2498.64 samples/sec Loss 6.4054 LearningRate 0.000948 Epoch: 4 Global Step: 102660 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:54:37,948-Speed 2513.13 samples/sec Loss 6.3155 LearningRate 0.000948 Epoch: 4 Global Step: 102670 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:54:46,157-Speed 2495.27 samples/sec Loss 6.3080 LearningRate 0.000948 Epoch: 4 Global Step: 102680 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:54:54,364-Speed 2495.75 samples/sec Loss 6.2734 LearningRate 0.000948 Epoch: 4 Global Step: 102690 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:55:02,560-Speed 2499.37 samples/sec Loss 6.2482 LearningRate 0.000948 Epoch: 4 Global Step: 102700 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:55:10,757-Speed 2499.11 samples/sec Loss 6.2406 LearningRate 0.000948 Epoch: 4 Global Step: 102710 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:55:18,962-Speed 2496.40 samples/sec Loss 6.3219 LearningRate 0.000948 Epoch: 4 Global Step: 102720 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:55:27,118-Speed 2511.30 samples/sec Loss 6.2200 LearningRate 0.000948 Epoch: 4 Global Step: 102730 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 13:55:35,278-Speed 2510.43 samples/sec Loss 6.2086 
LearningRate 0.000948 Epoch: 4 Global Step: 102740 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:55:43,484-Speed 2496.48 samples/sec Loss 6.2374 LearningRate 0.000948 Epoch: 4 Global Step: 102750 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:55:51,696-Speed 2494.33 samples/sec Loss 6.2773 LearningRate 0.000948 Epoch: 4 Global Step: 102760 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:55:59,907-Speed 2494.76 samples/sec Loss 6.3158 LearningRate 0.000948 Epoch: 4 Global Step: 102770 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:56:08,106-Speed 2499.07 samples/sec Loss 6.2492 LearningRate 0.000948 Epoch: 4 Global Step: 102780 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:56:16,252-Speed 2514.36 samples/sec Loss 6.1504 LearningRate 0.000948 Epoch: 4 Global Step: 102790 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:56:24,449-Speed 2498.90 samples/sec Loss 6.3344 LearningRate 0.000948 Epoch: 4 Global Step: 102800 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:56:32,657-Speed 2495.55 samples/sec Loss 6.2097 LearningRate 0.000948 Epoch: 4 Global Step: 102810 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:56:40,859-Speed 2497.42 samples/sec Loss 6.2525 LearningRate 0.000947 Epoch: 4 Global Step: 102820 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:56:49,059-Speed 2498.36 samples/sec Loss 6.2178 LearningRate 0.000947 Epoch: 4 Global Step: 102830 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:56:57,254-Speed 2499.38 samples/sec Loss 6.2504 LearningRate 0.000947 Epoch: 4 Global Step: 102840 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:57:05,398-Speed 2515.11 samples/sec Loss 6.3850 LearningRate 0.000947 Epoch: 4 Global Step: 102850 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:57:13,609-Speed 2495.05 samples/sec Loss 6.2848 
LearningRate 0.000947 Epoch: 4 Global Step: 102860 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:57:21,807-Speed 2498.23 samples/sec Loss 6.2961 LearningRate 0.000947 Epoch: 4 Global Step: 102870 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:57:30,006-Speed 2498.22 samples/sec Loss 6.2783 LearningRate 0.000947 Epoch: 4 Global Step: 102880 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:57:38,217-Speed 2494.86 samples/sec Loss 6.2270 LearningRate 0.000947 Epoch: 4 Global Step: 102890 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:57:46,429-Speed 2494.22 samples/sec Loss 6.1742 LearningRate 0.000947 Epoch: 4 Global Step: 102900 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:57:54,585-Speed 2511.30 samples/sec Loss 6.1886 LearningRate 0.000947 Epoch: 4 Global Step: 102910 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:58:02,791-Speed 2496.10 samples/sec Loss 6.2626 LearningRate 0.000947 Epoch: 4 Global Step: 102920 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:58:10,992-Speed 2497.82 samples/sec Loss 6.3007 LearningRate 0.000947 Epoch: 4 Global Step: 102930 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:58:19,208-Speed 2493.20 samples/sec Loss 6.1721 LearningRate 0.000947 Epoch: 4 Global Step: 102940 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:58:27,407-Speed 2498.11 samples/sec Loss 6.2239 LearningRate 0.000947 Epoch: 4 Global Step: 102950 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:58:35,611-Speed 2496.81 samples/sec Loss 6.1960 LearningRate 0.000947 Epoch: 4 Global Step: 102960 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:58:43,761-Speed 2513.25 samples/sec Loss 6.2591 LearningRate 0.000947 Epoch: 4 Global Step: 102970 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:58:51,971-Speed 2494.95 samples/sec Loss 6.1675 
LearningRate 0.000947 Epoch: 4 Global Step: 102980 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:59:00,213-Speed 2485.51 samples/sec Loss 6.1573 LearningRate 0.000947 Epoch: 4 Global Step: 102990 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:59:08,412-Speed 2498.14 samples/sec Loss 6.2007 LearningRate 0.000947 Epoch: 4 Global Step: 103000 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:59:16,617-Speed 2496.56 samples/sec Loss 6.1848 LearningRate 0.000947 Epoch: 4 Global Step: 103010 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:59:24,813-Speed 2498.98 samples/sec Loss 6.2760 LearningRate 0.000947 Epoch: 4 Global Step: 103020 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:59:32,957-Speed 2515.49 samples/sec Loss 6.2630 LearningRate 0.000947 Epoch: 4 Global Step: 103030 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:59:41,154-Speed 2498.99 samples/sec Loss 6.3386 LearningRate 0.000947 Epoch: 4 Global Step: 103040 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:59:49,351-Speed 2498.84 samples/sec Loss 6.2109 LearningRate 0.000947 Epoch: 4 Global Step: 103050 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 13:59:57,552-Speed 2497.87 samples/sec Loss 6.3273 LearningRate 0.000947 Epoch: 4 Global Step: 103060 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:00:05,752-Speed 2497.89 samples/sec Loss 6.2322 LearningRate 0.000947 Epoch: 4 Global Step: 103070 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:00:13,949-Speed 2498.64 samples/sec Loss 6.2881 LearningRate 0.000947 Epoch: 4 Global Step: 103080 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:00:22,096-Speed 2514.42 samples/sec Loss 6.1762 LearningRate 0.000947 Epoch: 4 Global Step: 103090 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:00:30,295-Speed 2498.43 samples/sec Loss 6.1722 
LearningRate 0.000947 Epoch: 4 Global Step: 103100 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:00:38,496-Speed 2497.35 samples/sec Loss 6.2294 LearningRate 0.000947 Epoch: 4 Global Step: 103110 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:00:46,694-Speed 2498.69 samples/sec Loss 6.2351 LearningRate 0.000947 Epoch: 4 Global Step: 103120 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:00:54,903-Speed 2495.19 samples/sec Loss 6.2356 LearningRate 0.000947 Epoch: 4 Global Step: 103130 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:01:03,098-Speed 2499.63 samples/sec Loss 6.1207 LearningRate 0.000947 Epoch: 4 Global Step: 103140 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:01:11,243-Speed 2514.80 samples/sec Loss 6.3024 LearningRate 0.000947 Epoch: 4 Global Step: 103150 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:01:19,441-Speed 2498.50 samples/sec Loss 6.2518 LearningRate 0.000947 Epoch: 4 Global Step: 103160 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:01:27,642-Speed 2497.60 samples/sec Loss 6.2231 LearningRate 0.000947 Epoch: 4 Global Step: 103170 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:01:35,837-Speed 2499.82 samples/sec Loss 6.2487 LearningRate 0.000947 Epoch: 4 Global Step: 103180 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:01:44,039-Speed 2497.57 samples/sec Loss 6.2974 LearningRate 0.000947 Epoch: 4 Global Step: 103190 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:01:52,256-Speed 2492.62 samples/sec Loss 6.2709 LearningRate 0.000947 Epoch: 4 Global Step: 103200 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:02:00,405-Speed 2513.51 samples/sec Loss 6.2377 LearningRate 0.000946 Epoch: 4 Global Step: 103210 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:02:08,601-Speed 2499.30 samples/sec Loss 6.2820 
LearningRate 0.000946 Epoch: 4 Global Step: 103220 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:02:16,797-Speed 2498.95 samples/sec Loss 6.1956 LearningRate 0.000946 Epoch: 4 Global Step: 103230 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:02:24,999-Speed 2497.57 samples/sec Loss 6.1755 LearningRate 0.000946 Epoch: 4 Global Step: 103240 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:02:33,203-Speed 2496.84 samples/sec Loss 6.1546 LearningRate 0.000946 Epoch: 4 Global Step: 103250 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:02:41,405-Speed 2497.80 samples/sec Loss 6.3041 LearningRate 0.000946 Epoch: 4 Global Step: 103260 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:02:49,552-Speed 2514.19 samples/sec Loss 6.3057 LearningRate 0.000946 Epoch: 4 Global Step: 103270 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:02:57,756-Speed 2496.69 samples/sec Loss 6.2425 LearningRate 0.000946 Epoch: 4 Global Step: 103280 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:03:05,972-Speed 2492.92 samples/sec Loss 6.1846 LearningRate 0.000946 Epoch: 4 Global Step: 103290 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:03:14,176-Speed 2497.02 samples/sec Loss 6.2725 LearningRate 0.000946 Epoch: 4 Global Step: 103300 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:03:22,375-Speed 2498.23 samples/sec Loss 6.2837 LearningRate 0.000946 Epoch: 4 Global Step: 103310 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:03:30,573-Speed 2498.52 samples/sec Loss 6.2040 LearningRate 0.000946 Epoch: 4 Global Step: 103320 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:03:38,717-Speed 2515.10 samples/sec Loss 6.2965 LearningRate 0.000946 Epoch: 4 Global Step: 103330 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:03:46,922-Speed 2496.64 samples/sec Loss 6.1680 
LearningRate 0.000946 Epoch: 4 Global Step: 103340 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:03:55,139-Speed 2492.88 samples/sec Loss 6.1506 LearningRate 0.000946 Epoch: 4 Global Step: 103350 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:04:03,342-Speed 2497.91 samples/sec Loss 6.2230 LearningRate 0.000946 Epoch: 4 Global Step: 103360 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:04:11,538-Speed 2499.38 samples/sec Loss 6.2379 LearningRate 0.000946 Epoch: 4 Global Step: 103370 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:04:19,746-Speed 2495.44 samples/sec Loss 6.2023 LearningRate 0.000946 Epoch: 4 Global Step: 103380 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:04:27,892-Speed 2514.57 samples/sec Loss 6.2227 LearningRate 0.000946 Epoch: 4 Global Step: 103390 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:04:36,090-Speed 2498.74 samples/sec Loss 6.2775 LearningRate 0.000946 Epoch: 4 Global Step: 103400 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:04:44,285-Speed 2499.63 samples/sec Loss 6.2319 LearningRate 0.000946 Epoch: 4 Global Step: 103410 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:04:52,493-Speed 2495.66 samples/sec Loss 6.2067 LearningRate 0.000946 Epoch: 4 Global Step: 103420 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:05:00,686-Speed 2499.78 samples/sec Loss 6.2516 LearningRate 0.000946 Epoch: 4 Global Step: 103430 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:05:08,882-Speed 2499.45 samples/sec Loss 6.1877 LearningRate 0.000946 Epoch: 4 Global Step: 103440 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:05:17,027-Speed 2514.74 samples/sec Loss 6.1887 LearningRate 0.000946 Epoch: 4 Global Step: 103450 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:05:25,223-Speed 2499.21 samples/sec Loss 6.2235 
LearningRate 0.000946 Epoch: 4 Global Step: 103460 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:05:33,426-Speed 2497.21 samples/sec Loss 6.2827 LearningRate 0.000946 Epoch: 4 Global Step: 103470 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:05:41,632-Speed 2495.93 samples/sec Loss 6.1433 LearningRate 0.000946 Epoch: 4 Global Step: 103480 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:05:49,831-Speed 2498.64 samples/sec Loss 6.2462 LearningRate 0.000946 Epoch: 4 Global Step: 103490 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:05:58,030-Speed 2498.22 samples/sec Loss 6.2936 LearningRate 0.000946 Epoch: 4 Global Step: 103500 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:06:06,177-Speed 2514.11 samples/sec Loss 6.1647 LearningRate 0.000946 Epoch: 4 Global Step: 103510 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:06:14,372-Speed 2499.36 samples/sec Loss 6.2877 LearningRate 0.000946 Epoch: 4 Global Step: 103520 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:06:22,581-Speed 2495.33 samples/sec Loss 6.2066 LearningRate 0.000946 Epoch: 4 Global Step: 103530 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:06:30,785-Speed 2496.65 samples/sec Loss 6.1576 LearningRate 0.000946 Epoch: 4 Global Step: 103540 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:06:38,991-Speed 2496.27 samples/sec Loss 6.1920 LearningRate 0.000946 Epoch: 4 Global Step: 103550 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:06:47,186-Speed 2499.76 samples/sec Loss 6.2702 LearningRate 0.000946 Epoch: 4 Global Step: 103560 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:06:55,330-Speed 2514.98 samples/sec Loss 6.3171 LearningRate 0.000946 Epoch: 4 Global Step: 103570 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:07:03,528-Speed 2498.76 samples/sec Loss 6.2461 
LearningRate 0.000946 Epoch: 4 Global Step: 103580 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:07:11,731-Speed 2497.21 samples/sec Loss 6.2545 LearningRate 0.000945 Epoch: 4 Global Step: 103590 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:07:19,943-Speed 2494.52 samples/sec Loss 6.2872 LearningRate 0.000945 Epoch: 4 Global Step: 103600 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:07:28,145-Speed 2497.27 samples/sec Loss 6.3293 LearningRate 0.000945 Epoch: 4 Global Step: 103610 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:07:36,342-Speed 2499.06 samples/sec Loss 6.2718 LearningRate 0.000945 Epoch: 4 Global Step: 103620 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:07:44,487-Speed 2514.69 samples/sec Loss 6.2063 LearningRate 0.000945 Epoch: 4 Global Step: 103630 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:07:52,686-Speed 2498.87 samples/sec Loss 6.2104 LearningRate 0.000945 Epoch: 4 Global Step: 103640 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:08:00,897-Speed 2494.75 samples/sec Loss 6.1765 LearningRate 0.000945 Epoch: 4 Global Step: 103650 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:08:09,096-Speed 2498.37 samples/sec Loss 6.2157 LearningRate 0.000945 Epoch: 4 Global Step: 103660 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:08:17,292-Speed 2499.32 samples/sec Loss 6.2537 LearningRate 0.000945 Epoch: 4 Global Step: 103670 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:08:25,490-Speed 2498.38 samples/sec Loss 6.1409 LearningRate 0.000945 Epoch: 4 Global Step: 103680 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:08:33,642-Speed 2512.62 samples/sec Loss 6.1724 LearningRate 0.000945 Epoch: 4 Global Step: 103690 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:08:44,120-Speed 1954.94 samples/sec Loss 6.2641 
LearningRate 0.000945 Epoch: 5 Global Step: 103700 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:08:52,317-Speed 2499.01 samples/sec Loss 6.1973 LearningRate 0.000945 Epoch: 5 Global Step: 103710 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:09:00,518-Speed 2497.72 samples/sec Loss 6.2580 LearningRate 0.000945 Epoch: 5 Global Step: 103720 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:09:08,724-Speed 2496.09 samples/sec Loss 6.2006 LearningRate 0.000945 Epoch: 5 Global Step: 103730 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:09:16,923-Speed 2498.33 samples/sec Loss 6.2615 LearningRate 0.000945 Epoch: 5 Global Step: 103740 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:09:25,068-Speed 2514.83 samples/sec Loss 6.2731 LearningRate 0.000945 Epoch: 5 Global Step: 103750 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:09:33,278-Speed 2494.93 samples/sec Loss 6.1522 LearningRate 0.000945 Epoch: 5 Global Step: 103760 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:09:41,483-Speed 2496.46 samples/sec Loss 6.1576 LearningRate 0.000945 Epoch: 5 Global Step: 103770 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:09:49,685-Speed 2497.57 samples/sec Loss 6.2132 LearningRate 0.000945 Epoch: 5 Global Step: 103780 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:09:57,912-Speed 2490.17 samples/sec Loss 6.1780 LearningRate 0.000945 Epoch: 5 Global Step: 103790 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:10:06,116-Speed 2496.85 samples/sec Loss 6.1751 LearningRate 0.000945 Epoch: 5 Global Step: 103800 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:10:14,268-Speed 2512.68 samples/sec Loss 6.2525 LearningRate 0.000945 Epoch: 5 Global Step: 103810 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:10:22,468-Speed 2497.90 samples/sec Loss 6.1892 
LearningRate 0.000945 Epoch: 5 Global Step: 103820 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:10:30,667-Speed 2498.38 samples/sec Loss 6.2338 LearningRate 0.000945 Epoch: 5 Global Step: 103830 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:10:38,877-Speed 2494.75 samples/sec Loss 6.2201 LearningRate 0.000945 Epoch: 5 Global Step: 103840 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:10:47,078-Speed 2497.69 samples/sec Loss 6.2487 LearningRate 0.000945 Epoch: 5 Global Step: 103850 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:10:55,278-Speed 2498.11 samples/sec Loss 6.1659 LearningRate 0.000945 Epoch: 5 Global Step: 103860 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:11:03,424-Speed 2514.37 samples/sec Loss 6.1867 LearningRate 0.000945 Epoch: 5 Global Step: 103870 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:11:11,627-Speed 2497.93 samples/sec Loss 6.1710 LearningRate 0.000945 Epoch: 5 Global Step: 103880 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:11:19,821-Speed 2499.85 samples/sec Loss 6.1906 LearningRate 0.000945 Epoch: 5 Global Step: 103890 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:11:28,026-Speed 2496.44 samples/sec Loss 6.0477 LearningRate 0.000945 Epoch: 5 Global Step: 103900 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:11:36,232-Speed 2496.20 samples/sec Loss 6.1085 LearningRate 0.000945 Epoch: 5 Global Step: 103910 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:11:44,428-Speed 2499.13 samples/sec Loss 6.2114 LearningRate 0.000945 Epoch: 5 Global Step: 103920 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:11:52,579-Speed 2512.85 samples/sec Loss 6.1291 LearningRate 0.000945 Epoch: 5 Global Step: 103930 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:12:00,778-Speed 2498.60 samples/sec Loss 6.2143 
LearningRate 0.000945 Epoch: 5 Global Step: 103940 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:12:08,977-Speed 2498.17 samples/sec Loss 6.1552 LearningRate 0.000945 Epoch: 5 Global Step: 103950 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:12:17,181-Speed 2496.69 samples/sec Loss 6.1914 LearningRate 0.000945 Epoch: 5 Global Step: 103960 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:12:25,380-Speed 2498.44 samples/sec Loss 6.1822 LearningRate 0.000945 Epoch: 5 Global Step: 103970 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:12:33,578-Speed 2498.59 samples/sec Loss 6.0958 LearningRate 0.000944 Epoch: 5 Global Step: 103980 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:12:41,727-Speed 2513.52 samples/sec Loss 6.1244 LearningRate 0.000944 Epoch: 5 Global Step: 103990 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:12:49,924-Speed 2498.99 samples/sec Loss 6.1736 LearningRate 0.000944 Epoch: 5 Global Step: 104000 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:12:58,123-Speed 2498.42 samples/sec Loss 6.1702 LearningRate 0.000944 Epoch: 5 Global Step: 104010 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:13:06,331-Speed 2495.29 samples/sec Loss 6.1883 LearningRate 0.000944 Epoch: 5 Global Step: 104020 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:13:14,525-Speed 2499.88 samples/sec Loss 6.1478 LearningRate 0.000944 Epoch: 5 Global Step: 104030 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:13:22,733-Speed 2495.54 samples/sec Loss 6.2960 LearningRate 0.000944 Epoch: 5 Global Step: 104040 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:13:30,879-Speed 2514.62 samples/sec Loss 6.1474 LearningRate 0.000944 Epoch: 5 Global Step: 104050 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:13:39,076-Speed 2498.90 samples/sec Loss 6.2248 
LearningRate 0.000944 Epoch: 5 Global Step: 104060 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:13:47,281-Speed 2496.28 samples/sec Loss 6.2251 LearningRate 0.000944 Epoch: 5 Global Step: 104070 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:13:55,481-Speed 2497.91 samples/sec Loss 6.1888 LearningRate 0.000944 Epoch: 5 Global Step: 104080 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:14:03,681-Speed 2497.98 samples/sec Loss 6.1912 LearningRate 0.000944 Epoch: 5 Global Step: 104090 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:14:11,882-Speed 2497.96 samples/sec Loss 6.2361 LearningRate 0.000944 Epoch: 5 Global Step: 104100 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:14:20,026-Speed 2515.01 samples/sec Loss 6.4794 LearningRate 0.000944 Epoch: 5 Global Step: 104110 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:14:28,230-Speed 2496.89 samples/sec Loss 6.3171 LearningRate 0.000944 Epoch: 5 Global Step: 104120 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:14:36,430-Speed 2498.01 samples/sec Loss 6.3027 LearningRate 0.000944 Epoch: 5 Global Step: 104130 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:14:44,631-Speed 2497.46 samples/sec Loss 6.3076 LearningRate 0.000944 Epoch: 5 Global Step: 104140 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:14:52,832-Speed 2497.78 samples/sec Loss 6.2405 LearningRate 0.000944 Epoch: 5 Global Step: 104150 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:15:01,030-Speed 2498.39 samples/sec Loss 6.2801 LearningRate 0.000944 Epoch: 5 Global Step: 104160 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:15:09,175-Speed 2515.08 samples/sec Loss 6.2224 LearningRate 0.000944 Epoch: 5 Global Step: 104170 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:15:17,374-Speed 2498.20 samples/sec Loss 6.2570 
LearningRate 0.000944 Epoch: 5 Global Step: 104180 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:15:25,572-Speed 2499.31 samples/sec Loss 6.2230 LearningRate 0.000944 Epoch: 5 Global Step: 104190 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:15:33,774-Speed 2497.67 samples/sec Loss 6.1348 LearningRate 0.000944 Epoch: 5 Global Step: 104200 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:15:41,979-Speed 2496.40 samples/sec Loss 6.2253 LearningRate 0.000944 Epoch: 5 Global Step: 104210 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:15:50,174-Speed 2499.47 samples/sec Loss 6.2548 LearningRate 0.000944 Epoch: 5 Global Step: 104220 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:15:58,315-Speed 2516.38 samples/sec Loss 6.2183 LearningRate 0.000944 Epoch: 5 Global Step: 104230 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:16:06,510-Speed 2499.85 samples/sec Loss 6.1546 LearningRate 0.000944 Epoch: 5 Global Step: 104240 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:16:14,708-Speed 2498.62 samples/sec Loss 6.2206 LearningRate 0.000944 Epoch: 5 Global Step: 104250 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:16:22,904-Speed 2499.28 samples/sec Loss 6.3174 LearningRate 0.000944 Epoch: 5 Global Step: 104260 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:16:31,100-Speed 2499.22 samples/sec Loss 6.2242 LearningRate 0.000944 Epoch: 5 Global Step: 104270 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:16:39,314-Speed 2493.64 samples/sec Loss 6.1776 LearningRate 0.000944 Epoch: 5 Global Step: 104280 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:16:47,460-Speed 2514.45 samples/sec Loss 6.2242 LearningRate 0.000944 Epoch: 5 Global Step: 104290 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:16:55,660-Speed 2498.10 samples/sec Loss 6.0898 
LearningRate 0.000944 Epoch: 5 Global Step: 104300 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:17:03,859-Speed 2498.30 samples/sec Loss 6.0978 LearningRate 0.000944 Epoch: 5 Global Step: 104310 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:17:12,055-Speed 2499.22 samples/sec Loss 6.1247 LearningRate 0.000944 Epoch: 5 Global Step: 104320 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:17:20,251-Speed 2499.15 samples/sec Loss 6.0901 LearningRate 0.000944 Epoch: 5 Global Step: 104330 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:17:28,445-Speed 2499.60 samples/sec Loss 6.1272 LearningRate 0.000944 Epoch: 5 Global Step: 104340 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:17:36,592-Speed 2514.35 samples/sec Loss 6.1088 LearningRate 0.000944 Epoch: 5 Global Step: 104350 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:17:44,791-Speed 2498.31 samples/sec Loss 6.3001 LearningRate 0.000943 Epoch: 5 Global Step: 104360 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:17:52,991-Speed 2497.99 samples/sec Loss 6.1512 LearningRate 0.000943 Epoch: 5 Global Step: 104370 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:18:01,192-Speed 2497.63 samples/sec Loss 6.1095 LearningRate 0.000943 Epoch: 5 Global Step: 104380 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:18:09,389-Speed 2499.06 samples/sec Loss 6.1452 LearningRate 0.000943 Epoch: 5 Global Step: 104390 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:18:17,584-Speed 2499.52 samples/sec Loss 6.1430 LearningRate 0.000943 Epoch: 5 Global Step: 104400 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:18:25,727-Speed 2515.55 samples/sec Loss 6.1221 LearningRate 0.000943 Epoch: 5 Global Step: 104410 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:18:33,924-Speed 2498.87 samples/sec Loss 6.0826 
LearningRate 0.000943 Epoch: 5 Global Step: 104420 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:18:42,118-Speed 2499.95 samples/sec Loss 6.1588 LearningRate 0.000943 Epoch: 5 Global Step: 104430 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:18:50,311-Speed 2500.24 samples/sec Loss 6.1186 LearningRate 0.000943 Epoch: 5 Global Step: 104440 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:18:58,510-Speed 2498.37 samples/sec Loss 6.1798 LearningRate 0.000943 Epoch: 5 Global Step: 104450 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:19:06,707-Speed 2498.55 samples/sec Loss 6.3540 LearningRate 0.000943 Epoch: 5 Global Step: 104460 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:19:14,859-Speed 2512.77 samples/sec Loss 6.1463 LearningRate 0.000943 Epoch: 5 Global Step: 104470 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:19:23,056-Speed 2498.88 samples/sec Loss 6.2427 LearningRate 0.000943 Epoch: 5 Global Step: 104480 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:19:31,263-Speed 2495.78 samples/sec Loss 6.1246 LearningRate 0.000943 Epoch: 5 Global Step: 104490 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:19:39,462-Speed 2498.62 samples/sec Loss 6.2088 LearningRate 0.000943 Epoch: 5 Global Step: 104500 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:19:47,660-Speed 2498.75 samples/sec Loss 6.1822 LearningRate 0.000943 Epoch: 5 Global Step: 104510 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:19:55,859-Speed 2498.13 samples/sec Loss 6.0747 LearningRate 0.000943 Epoch: 5 Global Step: 104520 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:20:04,012-Speed 2512.44 samples/sec Loss 6.0690 LearningRate 0.000943 Epoch: 5 Global Step: 104530 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:20:12,209-Speed 2498.80 samples/sec Loss 6.1826 
LearningRate 0.000943 Epoch: 5 Global Step: 104540 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:20:20,405-Speed 2499.43 samples/sec Loss 6.0662 LearningRate 0.000943 Epoch: 5 Global Step: 104550 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:20:28,600-Speed 2499.38 samples/sec Loss 6.2158 LearningRate 0.000943 Epoch: 5 Global Step: 104560 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:20:36,801-Speed 2497.88 samples/sec Loss 6.0789 LearningRate 0.000943 Epoch: 5 Global Step: 104570 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:20:44,999-Speed 2498.61 samples/sec Loss 6.0309 LearningRate 0.000943 Epoch: 5 Global Step: 104580 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:20:53,145-Speed 2514.49 samples/sec Loss 6.1662 LearningRate 0.000943 Epoch: 5 Global Step: 104590 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:21:01,346-Speed 2497.74 samples/sec Loss 6.0829 LearningRate 0.000943 Epoch: 5 Global Step: 104600 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:21:09,541-Speed 2499.33 samples/sec Loss 6.1095 LearningRate 0.000943 Epoch: 5 Global Step: 104610 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:21:17,743-Speed 2497.41 samples/sec Loss 6.0790 LearningRate 0.000943 Epoch: 5 Global Step: 104620 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:21:25,939-Speed 2499.21 samples/sec Loss 6.1599 LearningRate 0.000943 Epoch: 5 Global Step: 104630 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:21:34,136-Speed 2498.89 samples/sec Loss 6.1571 LearningRate 0.000943 Epoch: 5 Global Step: 104640 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:21:42,305-Speed 2507.46 samples/sec Loss 6.2148 LearningRate 0.000943 Epoch: 5 Global Step: 104650 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:21:50,506-Speed 2497.78 samples/sec Loss 6.0932 
LearningRate 0.000943 Epoch: 5 Global Step: 104660 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:21:58,700-Speed 2500.67 samples/sec Loss 6.0679 LearningRate 0.000943 Epoch: 5 Global Step: 104670 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:22:06,903-Speed 2497.16 samples/sec Loss 6.1968 LearningRate 0.000943 Epoch: 5 Global Step: 104680 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:22:15,099-Speed 2499.06 samples/sec Loss 6.1239 LearningRate 0.000943 Epoch: 5 Global Step: 104690 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:22:23,300-Speed 2498.23 samples/sec Loss 6.1645 LearningRate 0.000943 Epoch: 5 Global Step: 104700 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:22:31,442-Speed 2515.67 samples/sec Loss 6.1857 LearningRate 0.000943 Epoch: 5 Global Step: 104710 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:22:39,643-Speed 2497.53 samples/sec Loss 6.0567 LearningRate 0.000943 Epoch: 5 Global Step: 104720 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:22:47,840-Speed 2499.94 samples/sec Loss 6.1408 LearningRate 0.000943 Epoch: 5 Global Step: 104730 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:22:56,034-Speed 2499.86 samples/sec Loss 6.0822 LearningRate 0.000942 Epoch: 5 Global Step: 104740 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:23:04,233-Speed 2498.15 samples/sec Loss 6.0885 LearningRate 0.000942 Epoch: 5 Global Step: 104750 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:23:12,427-Speed 2499.84 samples/sec Loss 6.1448 LearningRate 0.000942 Epoch: 5 Global Step: 104760 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:23:20,575-Speed 2514.22 samples/sec Loss 6.0037 LearningRate 0.000942 Epoch: 5 Global Step: 104770 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:23:28,770-Speed 2499.61 samples/sec Loss 6.1452 
LearningRate 0.000942 Epoch: 5 Global Step: 104780 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:23:36,965-Speed 2499.50 samples/sec Loss 6.1894 LearningRate 0.000942 Epoch: 5 Global Step: 104790 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:23:45,161-Speed 2499.27 samples/sec Loss 6.2398 LearningRate 0.000942 Epoch: 5 Global Step: 104800 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:23:53,356-Speed 2499.47 samples/sec Loss 6.2364 LearningRate 0.000942 Epoch: 5 Global Step: 104810 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:24:01,555-Speed 2498.23 samples/sec Loss 6.1227 LearningRate 0.000942 Epoch: 5 Global Step: 104820 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:24:09,696-Speed 2516.71 samples/sec Loss 6.1598 LearningRate 0.000942 Epoch: 5 Global Step: 104830 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:24:17,899-Speed 2497.07 samples/sec Loss 6.1349 LearningRate 0.000942 Epoch: 5 Global Step: 104840 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:24:26,097-Speed 2498.75 samples/sec Loss 6.1610 LearningRate 0.000942 Epoch: 5 Global Step: 104850 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:24:34,294-Speed 2498.81 samples/sec Loss 6.1483 LearningRate 0.000942 Epoch: 5 Global Step: 104860 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:24:42,503-Speed 2495.20 samples/sec Loss 6.1060 LearningRate 0.000942 Epoch: 5 Global Step: 104870 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:24:50,708-Speed 2496.41 samples/sec Loss 6.0801 LearningRate 0.000942 Epoch: 5 Global Step: 104880 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:24:58,858-Speed 2513.53 samples/sec Loss 6.0971 LearningRate 0.000942 Epoch: 5 Global Step: 104890 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:25:07,062-Speed 2496.84 samples/sec Loss 6.0550 
LearningRate 0.000942 Epoch: 5 Global Step: 104900 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:25:15,268-Speed 2496.20 samples/sec Loss 6.2213 LearningRate 0.000942 Epoch: 5 Global Step: 104910 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:25:23,470-Speed 2497.17 samples/sec Loss 6.3017 LearningRate 0.000942 Epoch: 5 Global Step: 104920 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:25:31,670-Speed 2497.82 samples/sec Loss 6.1397 LearningRate 0.000942 Epoch: 5 Global Step: 104930 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:25:39,873-Speed 2497.91 samples/sec Loss 6.2664 LearningRate 0.000942 Epoch: 5 Global Step: 104940 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:25:48,015-Speed 2515.72 samples/sec Loss 6.1668 LearningRate 0.000942 Epoch: 5 Global Step: 104950 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:25:56,210-Speed 2499.89 samples/sec Loss 6.1399 LearningRate 0.000942 Epoch: 5 Global Step: 104960 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:26:04,407-Speed 2499.09 samples/sec Loss 6.1264 LearningRate 0.000942 Epoch: 5 Global Step: 104970 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:26:12,610-Speed 2497.08 samples/sec Loss 6.1570 LearningRate 0.000942 Epoch: 5 Global Step: 104980 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:26:20,817-Speed 2496.35 samples/sec Loss 6.1076 LearningRate 0.000942 Epoch: 5 Global Step: 104990 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:26:29,017-Speed 2498.14 samples/sec Loss 6.1106 LearningRate 0.000942 Epoch: 5 Global Step: 105000 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:26:37,160-Speed 2515.42 samples/sec Loss 6.1636 LearningRate 0.000942 Epoch: 5 Global Step: 105010 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:26:45,355-Speed 2499.38 samples/sec Loss 6.2032 
LearningRate 0.000942 Epoch: 5 Global Step: 105020 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:26:53,552-Speed 2498.77 samples/sec Loss 6.2934 LearningRate 0.000942 Epoch: 5 Global Step: 105030 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:27:01,751-Speed 2498.47 samples/sec Loss 6.1983 LearningRate 0.000942 Epoch: 5 Global Step: 105040 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:27:09,947-Speed 2499.21 samples/sec Loss 6.1860 LearningRate 0.000942 Epoch: 5 Global Step: 105050 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:27:18,144-Speed 2498.83 samples/sec Loss 6.1886 LearningRate 0.000942 Epoch: 5 Global Step: 105060 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:27:26,285-Speed 2516.06 samples/sec Loss 6.1186 LearningRate 0.000942 Epoch: 5 Global Step: 105070 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:27:34,484-Speed 2498.55 samples/sec Loss 6.0784 LearningRate 0.000942 Epoch: 5 Global Step: 105080 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:27:42,681-Speed 2498.72 samples/sec Loss 6.1552 LearningRate 0.000942 Epoch: 5 Global Step: 105090 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:27:50,883-Speed 2497.83 samples/sec Loss 6.2351 LearningRate 0.000942 Epoch: 5 Global Step: 105100 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:27:59,081-Speed 2498.63 samples/sec Loss 6.1396 LearningRate 0.000942 Epoch: 5 Global Step: 105110 Fp16 Grad Scale: 65536 Required: 166 hours Training: 2022-07-06 14:28:07,238-Speed 2511.16 samples/sec Loss 6.1177 LearningRate 0.000942 Epoch: 5 Global Step: 105120 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:28:15,384-Speed 2514.49 samples/sec Loss 6.1738 LearningRate 0.000941 Epoch: 5 Global Step: 105130 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:28:23,588-Speed 2496.83 samples/sec Loss 6.1466 
LearningRate 0.000941 Epoch: 5 Global Step: 105140 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:28:31,786-Speed 2498.39 samples/sec Loss 6.1982 LearningRate 0.000941 Epoch: 5 Global Step: 105150 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:28:39,989-Speed 2497.21 samples/sec Loss 6.0786 LearningRate 0.000941 Epoch: 5 Global Step: 105160 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:28:48,192-Speed 2497.25 samples/sec Loss 6.1470 LearningRate 0.000941 Epoch: 5 Global Step: 105170 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:28:56,391-Speed 2497.97 samples/sec Loss 6.0556 LearningRate 0.000941 Epoch: 5 Global Step: 105180 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:29:04,537-Speed 2514.51 samples/sec Loss 6.1900 LearningRate 0.000941 Epoch: 5 Global Step: 105190 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:29:12,740-Speed 2497.33 samples/sec Loss 6.0782 LearningRate 0.000941 Epoch: 5 Global Step: 105200 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:29:20,938-Speed 2498.60 samples/sec Loss 6.0594 LearningRate 0.000941 Epoch: 5 Global Step: 105210 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:29:29,147-Speed 2495.46 samples/sec Loss 6.2284 LearningRate 0.000941 Epoch: 5 Global Step: 105220 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:29:37,347-Speed 2498.05 samples/sec Loss 6.2211 LearningRate 0.000941 Epoch: 5 Global Step: 105230 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:29:45,547-Speed 2497.77 samples/sec Loss 6.1937 LearningRate 0.000941 Epoch: 5 Global Step: 105240 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:29:53,693-Speed 2514.59 samples/sec Loss 6.0656 LearningRate 0.000941 Epoch: 5 Global Step: 105250 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:30:01,892-Speed 2498.14 samples/sec Loss 6.1165 
LearningRate 0.000941 Epoch: 5 Global Step: 105260 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:30:10,108-Speed 2493.02 samples/sec Loss 6.0891 LearningRate 0.000941 Epoch: 5 Global Step: 105270 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:30:18,312-Speed 2496.85 samples/sec Loss 6.1715 LearningRate 0.000941 Epoch: 5 Global Step: 105280 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:30:26,521-Speed 2495.34 samples/sec Loss 6.0244 LearningRate 0.000941 Epoch: 5 Global Step: 105290 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:30:34,720-Speed 2498.52 samples/sec Loss 6.0938 LearningRate 0.000941 Epoch: 5 Global Step: 105300 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:30:42,868-Speed 2514.06 samples/sec Loss 6.0121 LearningRate 0.000941 Epoch: 5 Global Step: 105310 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:30:51,066-Speed 2498.67 samples/sec Loss 6.1118 LearningRate 0.000941 Epoch: 5 Global Step: 105320 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:30:59,265-Speed 2498.10 samples/sec Loss 6.1881 LearningRate 0.000941 Epoch: 5 Global Step: 105330 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:31:07,466-Speed 2497.92 samples/sec Loss 6.1463 LearningRate 0.000941 Epoch: 5 Global Step: 105340 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:31:15,665-Speed 2498.17 samples/sec Loss 6.0742 LearningRate 0.000941 Epoch: 5 Global Step: 105350 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:31:23,865-Speed 2498.16 samples/sec Loss 6.0984 LearningRate 0.000941 Epoch: 5 Global Step: 105360 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:31:32,010-Speed 2514.91 samples/sec Loss 6.0611 LearningRate 0.000941 Epoch: 5 Global Step: 105370 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:31:40,220-Speed 2494.88 samples/sec Loss 6.0901 
LearningRate 0.000941 Epoch: 5 Global Step: 105380 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:31:48,431-Speed 2494.64 samples/sec Loss 6.1101 LearningRate 0.000941 Epoch: 5 Global Step: 105390 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:31:56,631-Speed 2498.00 samples/sec Loss 6.0424 LearningRate 0.000941 Epoch: 5 Global Step: 105400 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:32:04,830-Speed 2498.52 samples/sec Loss 6.1219 LearningRate 0.000941 Epoch: 5 Global Step: 105410 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:32:13,033-Speed 2497.04 samples/sec Loss 6.0296 LearningRate 0.000941 Epoch: 5 Global Step: 105420 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:32:21,180-Speed 2514.06 samples/sec Loss 6.2580 LearningRate 0.000941 Epoch: 5 Global Step: 105430 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:32:29,391-Speed 2494.74 samples/sec Loss 6.1146 LearningRate 0.000941 Epoch: 5 Global Step: 105440 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:32:37,589-Speed 2498.36 samples/sec Loss 6.0365 LearningRate 0.000941 Epoch: 5 Global Step: 105450 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:32:45,800-Speed 2495.14 samples/sec Loss 6.1119 LearningRate 0.000941 Epoch: 5 Global Step: 105460 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:32:53,991-Speed 2500.72 samples/sec Loss 6.0445 LearningRate 0.000941 Epoch: 5 Global Step: 105470 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:33:02,187-Speed 2498.99 samples/sec Loss 6.2175 LearningRate 0.000941 Epoch: 5 Global Step: 105480 Fp16 Grad Scale: 32768 Required: 166 hours Training: 2022-07-06 14:33:10,335-Speed 2513.97 samples/sec Loss 6.0992 LearningRate 0.000941 Epoch: 5 Global Step: 105490 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:33:18,539-Speed 2496.77 samples/sec Loss 6.1647 
LearningRate 0.000941 Epoch: 5 Global Step: 105500 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:33:26,742-Speed 2496.87 samples/sec Loss 6.1439 LearningRate 0.000940 Epoch: 5 Global Step: 105510 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:33:34,940-Speed 2498.91 samples/sec Loss 6.2909 LearningRate 0.000940 Epoch: 5 Global Step: 105520 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:33:43,139-Speed 2498.38 samples/sec Loss 6.2831 LearningRate 0.000940 Epoch: 5 Global Step: 105530 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:33:51,338-Speed 2498.45 samples/sec Loss 6.2918 LearningRate 0.000940 Epoch: 5 Global Step: 105540 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:33:59,486-Speed 2513.92 samples/sec Loss 6.1735 LearningRate 0.000940 Epoch: 5 Global Step: 105550 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:34:07,685-Speed 2498.16 samples/sec Loss 6.2015 LearningRate 0.000940 Epoch: 5 Global Step: 105560 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:34:15,900-Speed 2493.56 samples/sec Loss 6.1252 LearningRate 0.000940 Epoch: 5 Global Step: 105570 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:34:24,124-Speed 2490.77 samples/sec Loss 6.1875 LearningRate 0.000940 Epoch: 5 Global Step: 105580 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:34:32,323-Speed 2498.29 samples/sec Loss 6.1841 LearningRate 0.000940 Epoch: 5 Global Step: 105590 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:34:40,521-Speed 2498.39 samples/sec Loss 6.0614 LearningRate 0.000940 Epoch: 5 Global Step: 105600 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:34:48,671-Speed 2513.08 samples/sec Loss 6.1003 LearningRate 0.000940 Epoch: 5 Global Step: 105610 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:34:56,867-Speed 2499.02 samples/sec Loss 6.1001 
LearningRate 0.000940 Epoch: 5 Global Step: 105620 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:35:05,076-Speed 2495.42 samples/sec Loss 6.2148 LearningRate 0.000940 Epoch: 5 Global Step: 105630 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:35:13,274-Speed 2498.43 samples/sec Loss 6.0900 LearningRate 0.000940 Epoch: 5 Global Step: 105640 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:35:21,469-Speed 2499.37 samples/sec Loss 6.1171 LearningRate 0.000940 Epoch: 5 Global Step: 105650 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:35:29,666-Speed 2499.14 samples/sec Loss 6.0867 LearningRate 0.000940 Epoch: 5 Global Step: 105660 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:35:37,808-Speed 2515.63 samples/sec Loss 6.1301 LearningRate 0.000940 Epoch: 5 Global Step: 105670 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:35:46,016-Speed 2495.76 samples/sec Loss 6.1157 LearningRate 0.000940 Epoch: 5 Global Step: 105680 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:35:54,213-Speed 2498.82 samples/sec Loss 6.2197 LearningRate 0.000940 Epoch: 5 Global Step: 105690 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:36:02,410-Speed 2498.98 samples/sec Loss 6.3104 LearningRate 0.000940 Epoch: 5 Global Step: 105700 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:36:10,608-Speed 2498.48 samples/sec Loss 6.0832 LearningRate 0.000940 Epoch: 5 Global Step: 105710 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:36:18,808-Speed 2498.06 samples/sec Loss 6.2694 LearningRate 0.000940 Epoch: 5 Global Step: 105720 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:36:26,953-Speed 2514.75 samples/sec Loss 6.1624 LearningRate 0.000940 Epoch: 5 Global Step: 105730 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:36:35,146-Speed 2500.05 samples/sec Loss 6.2223 
LearningRate 0.000940 Epoch: 5 Global Step: 105740 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:36:43,343-Speed 2498.80 samples/sec Loss 6.1262 LearningRate 0.000940 Epoch: 5 Global Step: 105750 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:36:51,541-Speed 2498.66 samples/sec Loss 6.1599 LearningRate 0.000940 Epoch: 5 Global Step: 105760 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:36:59,738-Speed 2498.78 samples/sec Loss 6.1573 LearningRate 0.000940 Epoch: 5 Global Step: 105770 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:37:07,939-Speed 2497.67 samples/sec Loss 6.1730 LearningRate 0.000940 Epoch: 5 Global Step: 105780 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:37:16,084-Speed 2514.92 samples/sec Loss 6.2355 LearningRate 0.000940 Epoch: 5 Global Step: 105790 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:37:24,285-Speed 2497.79 samples/sec Loss 6.1053 LearningRate 0.000940 Epoch: 5 Global Step: 105800 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:37:32,485-Speed 2498.05 samples/sec Loss 6.0445 LearningRate 0.000940 Epoch: 5 Global Step: 105810 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:37:40,684-Speed 2498.22 samples/sec Loss 6.0512 LearningRate 0.000940 Epoch: 5 Global Step: 105820 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:37:48,880-Speed 2499.16 samples/sec Loss 6.0798 LearningRate 0.000940 Epoch: 5 Global Step: 105830 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:37:57,086-Speed 2495.99 samples/sec Loss 6.1831 LearningRate 0.000940 Epoch: 5 Global Step: 105840 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:38:05,236-Speed 2513.34 samples/sec Loss 6.0308 LearningRate 0.000940 Epoch: 5 Global Step: 105850 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:38:13,445-Speed 2495.67 samples/sec Loss 6.1280 
LearningRate 0.000940 Epoch: 5 Global Step: 105860 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:38:21,642-Speed 2498.90 samples/sec Loss 6.0748 LearningRate 0.000940 Epoch: 5 Global Step: 105870 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:38:29,838-Speed 2499.24 samples/sec Loss 6.0673 LearningRate 0.000940 Epoch: 5 Global Step: 105880 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:38:38,031-Speed 2499.89 samples/sec Loss 6.1113 LearningRate 0.000940 Epoch: 5 Global Step: 105890 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:38:46,231-Speed 2497.92 samples/sec Loss 6.1902 LearningRate 0.000939 Epoch: 5 Global Step: 105900 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:38:54,376-Speed 2514.86 samples/sec Loss 6.1098 LearningRate 0.000939 Epoch: 5 Global Step: 105910 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:39:02,577-Speed 2497.80 samples/sec Loss 6.0209 LearningRate 0.000939 Epoch: 5 Global Step: 105920 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:39:10,775-Speed 2498.46 samples/sec Loss 6.0523 LearningRate 0.000939 Epoch: 5 Global Step: 105930 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:39:18,978-Speed 2497.17 samples/sec Loss 5.9930 LearningRate 0.000939 Epoch: 5 Global Step: 105940 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:39:27,183-Speed 2496.53 samples/sec Loss 6.0062 LearningRate 0.000939 Epoch: 5 Global Step: 105950 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:39:35,382-Speed 2498.03 samples/sec Loss 6.0451 LearningRate 0.000939 Epoch: 5 Global Step: 105960 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:39:43,526-Speed 2515.37 samples/sec Loss 6.1263 LearningRate 0.000939 Epoch: 5 Global Step: 105970 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:39:51,724-Speed 2498.66 samples/sec Loss 5.9534 
LearningRate 0.000939 Epoch: 5 Global Step: 105980 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:39:59,929-Speed 2496.29 samples/sec Loss 6.1915 LearningRate 0.000939 Epoch: 5 Global Step: 105990 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:40:08,126-Speed 2498.87 samples/sec Loss 6.1187 LearningRate 0.000939 Epoch: 5 Global Step: 106000 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:40:16,323-Speed 2498.86 samples/sec Loss 6.1251 LearningRate 0.000939 Epoch: 5 Global Step: 106010 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:40:24,517-Speed 2499.86 samples/sec Loss 6.0746 LearningRate 0.000939 Epoch: 5 Global Step: 106020 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:40:32,660-Speed 2515.41 samples/sec Loss 6.0798 LearningRate 0.000939 Epoch: 5 Global Step: 106030 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:40:40,854-Speed 2500.04 samples/sec Loss 6.0685 LearningRate 0.000939 Epoch: 5 Global Step: 106040 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:40:49,053-Speed 2498.21 samples/sec Loss 6.0762 LearningRate 0.000939 Epoch: 5 Global Step: 106050 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:40:57,246-Speed 2499.89 samples/sec Loss 6.0446 LearningRate 0.000939 Epoch: 5 Global Step: 106060 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:41:05,438-Speed 2500.28 samples/sec Loss 6.0336 LearningRate 0.000939 Epoch: 5 Global Step: 106070 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:41:13,637-Speed 2498.19 samples/sec Loss 6.0271 LearningRate 0.000939 Epoch: 5 Global Step: 106080 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:41:21,780-Speed 2515.89 samples/sec Loss 6.0938 LearningRate 0.000939 Epoch: 5 Global Step: 106090 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:41:29,979-Speed 2498.08 samples/sec Loss 6.0908 
LearningRate 0.000939 Epoch: 5 Global Step: 106100 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:41:38,180-Speed 2497.73 samples/sec Loss 5.9709 LearningRate 0.000939 Epoch: 5 Global Step: 106110 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:41:46,376-Speed 2499.11 samples/sec Loss 6.0085 LearningRate 0.000939 Epoch: 5 Global Step: 106120 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:41:54,573-Speed 2498.82 samples/sec Loss 6.0600 LearningRate 0.000939 Epoch: 5 Global Step: 106130 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:42:02,771-Speed 2498.41 samples/sec Loss 6.0349 LearningRate 0.000939 Epoch: 5 Global Step: 106140 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:42:10,914-Speed 2515.67 samples/sec Loss 6.0252 LearningRate 0.000939 Epoch: 5 Global Step: 106150 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:42:19,111-Speed 2498.81 samples/sec Loss 6.0266 LearningRate 0.000939 Epoch: 5 Global Step: 106160 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:42:27,304-Speed 2499.82 samples/sec Loss 6.0865 LearningRate 0.000939 Epoch: 5 Global Step: 106170 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:42:35,513-Speed 2495.45 samples/sec Loss 6.1018 LearningRate 0.000939 Epoch: 5 Global Step: 106180 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:42:43,712-Speed 2497.97 samples/sec Loss 6.0382 LearningRate 0.000939 Epoch: 5 Global Step: 106190 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:42:51,908-Speed 2499.30 samples/sec Loss 6.0634 LearningRate 0.000939 Epoch: 5 Global Step: 106200 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:43:00,049-Speed 2516.21 samples/sec Loss 6.0557 LearningRate 0.000939 Epoch: 5 Global Step: 106210 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 14:43:08,246-Speed 2498.89 samples/sec Loss 6.0553 
LearningRate 0.000939 Epoch: 5 Global Step: 106220 Fp16 Grad Scale: 32768 Required: 165 hours
Training: 2022-07-06 14:43:16,442-Speed 2499.11 samples/sec Loss 6.0570 LearningRate 0.000939 Epoch: 5 Global Step: 106230 Fp16 Grad Scale: 32768 Required: 165 hours
Training: 2022-07-06 14:43:24,640-Speed 2498.55 samples/sec Loss 6.0599 LearningRate 0.000939 Epoch: 5 Global Step: 106240 Fp16 Grad Scale: 32768 Required: 165 hours
Training: 2022-07-06 14:43:32,836-Speed 2499.14 samples/sec Loss 6.0511 LearningRate 0.000939 Epoch: 5 Global Step: 106250 Fp16 Grad Scale: 32768 Required: 165 hours
Training: 2022-07-06 14:43:41,033-Speed 2498.73 samples/sec Loss 6.1299 LearningRate 0.000939 Epoch: 5 Global Step: 106260 Fp16 Grad Scale: 32768 Required: 165 hours
Training: 2022-07-06 14:43:49,177-Speed 2515.49 samples/sec Loss 6.0269 LearningRate 0.000939 Epoch: 5 Global Step: 106270 Fp16 Grad Scale: 32768 Required: 165 hours
Training: 2022-07-06 14:43:57,375-Speed 2498.51 samples/sec Loss 6.1065 LearningRate 0.000938 Epoch: 5 Global Step: 106280 Fp16 Grad Scale: 32768 Required: 165 hours
Training: 2022-07-06 14:44:05,571-Speed 2499.29 samples/sec Loss 6.0720 LearningRate 0.000938 Epoch: 5 Global Step: 106290 Fp16 Grad Scale: 32768 Required: 165 hours
Training: 2022-07-06 14:44:13,768-Speed 2498.94 samples/sec Loss 6.1078 LearningRate 0.000938 Epoch: 5 Global Step: 106300 Fp16 Grad Scale: 32768 Required: 165 hours
Training: 2022-07-06 14:44:21,974-Speed 2496.19 samples/sec Loss 5.9842 LearningRate 0.000938 Epoch: 5 Global Step: 106310 Fp16 Grad Scale: 32768 Required: 165 hours
Training: 2022-07-06 14:44:30,172-Speed 2498.57 samples/sec Loss 6.1368 LearningRate 0.000938 Epoch: 5 Global Step: 106320 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:44:38,313-Speed 2516.08 samples/sec Loss 5.9783 LearningRate 0.000938 Epoch: 5 Global Step: 106330 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:44:46,508-Speed 2499.80 samples/sec Loss 6.0007 LearningRate 0.000938 Epoch: 5 Global Step: 106340 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:44:54,705-Speed 2499.10 samples/sec Loss 6.0378 LearningRate 0.000938 Epoch: 5 Global Step: 106350 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:45:02,899-Speed 2499.49 samples/sec Loss 6.0196 LearningRate 0.000938 Epoch: 5 Global Step: 106360 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:45:11,099-Speed 2498.10 samples/sec Loss 6.0415 LearningRate 0.000938 Epoch: 5 Global Step: 106370 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:45:19,314-Speed 2493.58 samples/sec Loss 6.0498 LearningRate 0.000938 Epoch: 5 Global Step: 106380 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:45:27,454-Speed 2516.08 samples/sec Loss 6.0302 LearningRate 0.000938 Epoch: 5 Global Step: 106390 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:45:35,653-Speed 2498.27 samples/sec Loss 5.9660 LearningRate 0.000938 Epoch: 5 Global Step: 106400 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:45:43,848-Speed 2499.70 samples/sec Loss 5.9531 LearningRate 0.000938 Epoch: 5 Global Step: 106410 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:45:52,041-Speed 2499.99 samples/sec Loss 6.0310 LearningRate 0.000938 Epoch: 5 Global Step: 106420 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:46:00,248-Speed 2495.87 samples/sec Loss 6.0473 LearningRate 0.000938 Epoch: 5 Global Step: 106430 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:46:08,447-Speed 2498.40 samples/sec Loss 6.0693 LearningRate 0.000938 Epoch: 5 Global Step: 106440 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:46:16,592-Speed 2514.53 samples/sec Loss 5.9591 LearningRate 0.000938 Epoch: 5 Global Step: 106450 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:46:24,789-Speed 2498.92 samples/sec Loss 6.0323 LearningRate 0.000938 Epoch: 5 Global Step: 106460 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:46:32,986-Speed 2498.91 samples/sec Loss 5.9991 LearningRate 0.000938 Epoch: 5 Global Step: 106470 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:46:41,191-Speed 2496.18 samples/sec Loss 6.0323 LearningRate 0.000938 Epoch: 5 Global Step: 106480 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:46:49,390-Speed 2498.36 samples/sec Loss 6.0874 LearningRate 0.000938 Epoch: 5 Global Step: 106490 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:46:57,588-Speed 2498.75 samples/sec Loss 6.0178 LearningRate 0.000938 Epoch: 5 Global Step: 106500 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:47:05,734-Speed 2514.41 samples/sec Loss 6.0576 LearningRate 0.000938 Epoch: 5 Global Step: 106510 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:47:13,932-Speed 2498.37 samples/sec Loss 6.0754 LearningRate 0.000938 Epoch: 5 Global Step: 106520 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:47:22,133-Speed 2497.49 samples/sec Loss 6.1059 LearningRate 0.000938 Epoch: 5 Global Step: 106530 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:47:30,333-Speed 2498.25 samples/sec Loss 6.1900 LearningRate 0.000938 Epoch: 5 Global Step: 106540 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:47:38,531-Speed 2498.62 samples/sec Loss 5.9834 LearningRate 0.000938 Epoch: 5 Global Step: 106550 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:47:46,730-Speed 2498.43 samples/sec Loss 5.9878 LearningRate 0.000938 Epoch: 5 Global Step: 106560 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:47:54,890-Speed 2510.61 samples/sec Loss 6.0216 LearningRate 0.000938 Epoch: 5 Global Step: 106570 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:48:03,087-Speed 2498.88 samples/sec Loss 6.0929 LearningRate 0.000938 Epoch: 5 Global Step: 106580 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:48:11,289-Speed 2497.41 samples/sec Loss 5.9592 LearningRate 0.000938 Epoch: 5 Global Step: 106590 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:48:19,487-Speed 2498.59 samples/sec Loss 6.0598 LearningRate 0.000938 Epoch: 5 Global Step: 106600 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:48:27,688-Speed 2497.83 samples/sec Loss 6.1199 LearningRate 0.000938 Epoch: 5 Global Step: 106610 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:48:35,885-Speed 2498.75 samples/sec Loss 6.0280 LearningRate 0.000938 Epoch: 5 Global Step: 106620 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:48:44,031-Speed 2514.66 samples/sec Loss 5.9824 LearningRate 0.000938 Epoch: 5 Global Step: 106630 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:48:52,228-Speed 2498.89 samples/sec Loss 6.0272 LearningRate 0.000938 Epoch: 5 Global Step: 106640 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:49:00,425-Speed 2499.00 samples/sec Loss 6.0047 LearningRate 0.000938 Epoch: 5 Global Step: 106650 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:49:08,619-Speed 2499.93 samples/sec Loss 6.1155 LearningRate 0.000938 Epoch: 5 Global Step: 106660 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:49:16,813-Speed 2499.82 samples/sec Loss 6.0504 LearningRate 0.000937 Epoch: 5 Global Step: 106670 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:49:25,011-Speed 2498.67 samples/sec Loss 6.0103 LearningRate 0.000937 Epoch: 5 Global Step: 106680 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:49:33,152-Speed 2515.94 samples/sec Loss 6.0271 LearningRate 0.000937 Epoch: 5 Global Step: 106690 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:49:41,349-Speed 2498.97 samples/sec Loss 6.0395 LearningRate 0.000937 Epoch: 5 Global Step: 106700 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:49:49,545-Speed 2499.12 samples/sec Loss 6.0440 LearningRate 0.000937 Epoch: 5 Global Step: 106710 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:49:57,739-Speed 2499.56 samples/sec Loss 6.0253 LearningRate 0.000937 Epoch: 5 Global Step: 106720 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:50:05,937-Speed 2498.79 samples/sec Loss 6.0289 LearningRate 0.000937 Epoch: 5 Global Step: 106730 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:50:14,132-Speed 2499.44 samples/sec Loss 6.0840 LearningRate 0.000937 Epoch: 5 Global Step: 106740 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:50:22,277-Speed 2514.83 samples/sec Loss 6.0205 LearningRate 0.000937 Epoch: 5 Global Step: 106750 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:50:30,471-Speed 2499.63 samples/sec Loss 5.9922 LearningRate 0.000937 Epoch: 5 Global Step: 106760 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:50:38,668-Speed 2499.14 samples/sec Loss 6.0414 LearningRate 0.000937 Epoch: 5 Global Step: 106770 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:50:46,863-Speed 2499.50 samples/sec Loss 6.0505 LearningRate 0.000937 Epoch: 5 Global Step: 106780 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:50:55,056-Speed 2499.93 samples/sec Loss 6.0457 LearningRate 0.000937 Epoch: 5 Global Step: 106790 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:51:03,253-Speed 2499.01 samples/sec Loss 6.0306 LearningRate 0.000937 Epoch: 5 Global Step: 106800 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:51:11,396-Speed 2515.59 samples/sec Loss 6.0912 LearningRate 0.000937 Epoch: 5 Global Step: 106810 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:51:19,590-Speed 2499.57 samples/sec Loss 6.0814 LearningRate 0.000937 Epoch: 5 Global Step: 106820 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:51:27,786-Speed 2499.15 samples/sec Loss 6.1714 LearningRate 0.000937 Epoch: 5 Global Step: 106830 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:51:35,983-Speed 2498.86 samples/sec Loss 6.1678 LearningRate 0.000937 Epoch: 5 Global Step: 106840 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:51:44,181-Speed 2498.61 samples/sec Loss 6.0233 LearningRate 0.000937 Epoch: 5 Global Step: 106850 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:51:52,376-Speed 2499.41 samples/sec Loss 6.0591 LearningRate 0.000937 Epoch: 5 Global Step: 106860 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:52:00,519-Speed 2515.61 samples/sec Loss 6.0568 LearningRate 0.000937 Epoch: 5 Global Step: 106870 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:52:08,714-Speed 2499.22 samples/sec Loss 6.0298 LearningRate 0.000937 Epoch: 5 Global Step: 106880 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:52:16,916-Speed 2497.42 samples/sec Loss 6.0158 LearningRate 0.000937 Epoch: 5 Global Step: 106890 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:52:25,110-Speed 2499.87 samples/sec Loss 5.9224 LearningRate 0.000937 Epoch: 5 Global Step: 106900 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:52:33,313-Speed 2497.19 samples/sec Loss 5.9124 LearningRate 0.000937 Epoch: 5 Global Step: 106910 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:52:41,514-Speed 2497.78 samples/sec Loss 6.0002 LearningRate 0.000937 Epoch: 5 Global Step: 106920 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:52:49,654-Speed 2516.43 samples/sec Loss 6.0058 LearningRate 0.000937 Epoch: 5 Global Step: 106930 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:52:57,849-Speed 2499.57 samples/sec Loss 6.0114 LearningRate 0.000937 Epoch: 5 Global Step: 106940 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:53:06,047-Speed 2498.68 samples/sec Loss 5.9979 LearningRate 0.000937 Epoch: 5 Global Step: 106950 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:53:14,245-Speed 2498.65 samples/sec Loss 6.0206 LearningRate 0.000937 Epoch: 5 Global Step: 106960 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:53:22,446-Speed 2497.79 samples/sec Loss 6.0070 LearningRate 0.000937 Epoch: 5 Global Step: 106970 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:53:30,639-Speed 2499.85 samples/sec Loss 6.0307 LearningRate 0.000937 Epoch: 5 Global Step: 106980 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:53:38,783-Speed 2515.22 samples/sec Loss 6.0257 LearningRate 0.000937 Epoch: 5 Global Step: 106990 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:53:46,975-Speed 2500.42 samples/sec Loss 5.9790 LearningRate 0.000937 Epoch: 5 Global Step: 107000 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:53:55,173-Speed 2499.19 samples/sec Loss 5.9779 LearningRate 0.000937 Epoch: 5 Global Step: 107010 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:54:03,367-Speed 2499.79 samples/sec Loss 6.0122 LearningRate 0.000937 Epoch: 5 Global Step: 107020 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:54:11,562-Speed 2499.53 samples/sec Loss 5.9603 LearningRate 0.000937 Epoch: 5 Global Step: 107030 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:54:19,758-Speed 2499.24 samples/sec Loss 6.0029 LearningRate 0.000937 Epoch: 5 Global Step: 107040 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:54:27,904-Speed 2514.48 samples/sec Loss 5.9768 LearningRate 0.000937 Epoch: 5 Global Step: 107050 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:54:36,104-Speed 2498.01 samples/sec Loss 5.9675 LearningRate 0.000936 Epoch: 5 Global Step: 107060 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:54:44,309-Speed 2496.41 samples/sec Loss 5.9919 LearningRate 0.000936 Epoch: 5 Global Step: 107070 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:54:52,502-Speed 2500.17 samples/sec Loss 5.9807 LearningRate 0.000936 Epoch: 5 Global Step: 107080 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:55:00,703-Speed 2498.00 samples/sec Loss 5.9579 LearningRate 0.000936 Epoch: 5 Global Step: 107090 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:55:08,901-Speed 2498.42 samples/sec Loss 6.0027 LearningRate 0.000936 Epoch: 5 Global Step: 107100 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:55:17,045-Speed 2515.15 samples/sec Loss 6.2001 LearningRate 0.000936 Epoch: 5 Global Step: 107110 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:55:25,241-Speed 2499.28 samples/sec Loss 6.1531 LearningRate 0.000936 Epoch: 5 Global Step: 107120 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:55:33,437-Speed 2499.14 samples/sec Loss 6.0431 LearningRate 0.000936 Epoch: 5 Global Step: 107130 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:55:41,641-Speed 2496.99 samples/sec Loss 6.0135 LearningRate 0.000936 Epoch: 5 Global Step: 107140 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:55:49,839-Speed 2498.44 samples/sec Loss 6.0882 LearningRate 0.000936 Epoch: 5 Global Step: 107150 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:55:58,040-Speed 2497.48 samples/sec Loss 5.9664 LearningRate 0.000936 Epoch: 5 Global Step: 107160 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:56:06,184-Speed 2515.42 samples/sec Loss 6.1092 LearningRate 0.000936 Epoch: 5 Global Step: 107170 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:56:14,377-Speed 2499.88 samples/sec Loss 6.0255 LearningRate 0.000936 Epoch: 5 Global Step: 107180 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:56:22,571-Speed 2500.05 samples/sec Loss 5.9926 LearningRate 0.000936 Epoch: 5 Global Step: 107190 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:56:30,770-Speed 2498.29 samples/sec Loss 6.1007 LearningRate 0.000936 Epoch: 5 Global Step: 107200 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:56:38,961-Speed 2500.74 samples/sec Loss 6.0107 LearningRate 0.000936 Epoch: 5 Global Step: 107210 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:56:47,156-Speed 2499.42 samples/sec Loss 5.9863 LearningRate 0.000936 Epoch: 5 Global Step: 107220 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:56:55,298-Speed 2515.86 samples/sec Loss 6.0118 LearningRate 0.000936 Epoch: 5 Global Step: 107230 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:57:03,491-Speed 2500.04 samples/sec Loss 6.0426 LearningRate 0.000936 Epoch: 5 Global Step: 107240 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:57:11,687-Speed 2499.08 samples/sec Loss 5.9356 LearningRate 0.000936 Epoch: 5 Global Step: 107250 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:57:19,890-Speed 2497.18 samples/sec Loss 5.9337 LearningRate 0.000936 Epoch: 5 Global Step: 107260 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:57:28,086-Speed 2499.21 samples/sec Loss 5.9754 LearningRate 0.000936 Epoch: 5 Global Step: 107270 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:57:36,287-Speed 2497.72 samples/sec Loss 5.9725 LearningRate 0.000936 Epoch: 5 Global Step: 107280 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:57:44,429-Speed 2515.78 samples/sec Loss 6.2012 LearningRate 0.000936 Epoch: 5 Global Step: 107290 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:57:52,637-Speed 2495.60 samples/sec Loss 6.0524 LearningRate 0.000936 Epoch: 5 Global Step: 107300 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:58:00,836-Speed 2498.37 samples/sec Loss 6.0229 LearningRate 0.000936 Epoch: 5 Global Step: 107310 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:58:09,038-Speed 2497.25 samples/sec Loss 6.0151 LearningRate 0.000936 Epoch: 5 Global Step: 107320 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:58:17,246-Speed 2495.49 samples/sec Loss 5.9740 LearningRate 0.000936 Epoch: 5 Global Step: 107330 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:58:25,444-Speed 2498.57 samples/sec Loss 5.9687 LearningRate 0.000936 Epoch: 5 Global Step: 107340 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:58:33,593-Speed 2513.65 samples/sec Loss 6.0160 LearningRate 0.000936 Epoch: 5 Global Step: 107350 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:58:41,791-Speed 2498.47 samples/sec Loss 5.9380 LearningRate 0.000936 Epoch: 5 Global Step: 107360 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:58:49,987-Speed 2499.58 samples/sec Loss 6.1225 LearningRate 0.000936 Epoch: 5 Global Step: 107370 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:58:58,183-Speed 2499.12 samples/sec Loss 5.9852 LearningRate 0.000936 Epoch: 5 Global Step: 107380 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:59:06,379-Speed 2499.35 samples/sec Loss 6.0833 LearningRate 0.000936 Epoch: 5 Global Step: 107390 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:59:14,579-Speed 2498.05 samples/sec Loss 6.0453 LearningRate 0.000936 Epoch: 5 Global Step: 107400 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:59:22,722-Speed 2515.45 samples/sec Loss 5.9915 LearningRate 0.000936 Epoch: 5 Global Step: 107410 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:59:30,920-Speed 2498.56 samples/sec Loss 5.9713 LearningRate 0.000936 Epoch: 5 Global Step: 107420 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:59:39,115-Speed 2499.58 samples/sec Loss 5.9635 LearningRate 0.000936 Epoch: 5 Global Step: 107430 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:59:47,316-Speed 2497.92 samples/sec Loss 5.9023 LearningRate 0.000935 Epoch: 5 Global Step: 107440 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 14:59:55,513-Speed 2498.73 samples/sec Loss 5.9744 LearningRate 0.000935 Epoch: 5 Global Step: 107450 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:00:03,711-Speed 2498.83 samples/sec Loss 5.9489 LearningRate 0.000935 Epoch: 5 Global Step: 107460 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:00:11,853-Speed 2515.73 samples/sec Loss 6.0817 LearningRate 0.000935 Epoch: 5 Global Step: 107470 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:00:20,048-Speed 2499.37 samples/sec Loss 6.0106 LearningRate 0.000935 Epoch: 5 Global Step: 107480 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:00:28,245-Speed 2499.17 samples/sec Loss 6.0312 LearningRate 0.000935 Epoch: 5 Global Step: 107490 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:00:36,438-Speed 2500.10 samples/sec Loss 6.0540 LearningRate 0.000935 Epoch: 5 Global Step: 107500 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:00:44,636-Speed 2498.59 samples/sec Loss 5.9439 LearningRate 0.000935 Epoch: 5 Global Step: 107510 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:00:52,831-Speed 2499.38 samples/sec Loss 5.9631 LearningRate 0.000935 Epoch: 5 Global Step: 107520 Fp16 Grad Scale: 131072 Required: 165 hours
Training: 2022-07-06 15:01:00,973-Speed 2515.89 samples/sec Loss 6.0324 LearningRate 0.000935 Epoch: 5 Global Step: 107530 Fp16 Grad Scale: 131072 Required: 165 hours
Training: 2022-07-06 15:01:09,127-Speed 2512.03 samples/sec Loss 6.0308 LearningRate 0.000935 Epoch: 5 Global Step: 107540 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:01:17,324-Speed 2498.86 samples/sec Loss 6.0117 LearningRate 0.000935 Epoch: 5 Global Step: 107550 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:01:25,520-Speed 2499.46 samples/sec Loss 5.9752 LearningRate 0.000935 Epoch: 5 Global Step: 107560 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:01:33,717-Speed 2498.70 samples/sec Loss 5.9849 LearningRate 0.000935 Epoch: 5 Global Step: 107570 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:01:41,917-Speed 2497.92 samples/sec Loss 5.9821 LearningRate 0.000935 Epoch: 5 Global Step: 107580 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:01:50,075-Speed 2511.02 samples/sec Loss 6.0814 LearningRate 0.000935 Epoch: 5 Global Step: 107590 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:01:58,281-Speed 2495.89 samples/sec Loss 5.9323 LearningRate 0.000935 Epoch: 5 Global Step: 107600 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:02:06,480-Speed 2498.39 samples/sec Loss 5.9711 LearningRate 0.000935 Epoch: 5 Global Step: 107610 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:02:14,676-Speed 2499.38 samples/sec Loss 6.1088 LearningRate 0.000935 Epoch: 5 Global Step: 107620 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:02:22,873-Speed 2499.00 samples/sec Loss 6.0258 LearningRate 0.000935 Epoch: 5 Global Step: 107630 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:02:31,070-Speed 2498.77 samples/sec Loss 5.8910 LearningRate 0.000935 Epoch: 5 Global Step: 107640 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:02:39,219-Speed 2513.78 samples/sec Loss 5.9912 LearningRate 0.000935 Epoch: 5 Global Step: 107650 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:02:47,419-Speed 2498.02 samples/sec Loss 5.9510 LearningRate 0.000935 Epoch: 5 Global Step: 107660 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:02:55,613-Speed 2499.45 samples/sec Loss 6.0274 LearningRate 0.000935 Epoch: 5 Global Step: 107670 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:03:03,812-Speed 2498.42 samples/sec Loss 5.9569 LearningRate 0.000935 Epoch: 5 Global Step: 107680 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:03:12,009-Speed 2498.89 samples/sec Loss 5.9569 LearningRate 0.000935 Epoch: 5 Global Step: 107690 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:03:20,204-Speed 2499.54 samples/sec Loss 5.8865 LearningRate 0.000935 Epoch: 5 Global Step: 107700 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:03:28,348-Speed 2515.28 samples/sec Loss 5.9281 LearningRate 0.000935 Epoch: 5 Global Step: 107710 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:03:36,545-Speed 2499.15 samples/sec Loss 5.9391 LearningRate 0.000935 Epoch: 5 Global Step: 107720 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:03:44,750-Speed 2496.39 samples/sec Loss 5.9792 LearningRate 0.000935 Epoch: 5 Global Step: 107730 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:03:52,945-Speed 2499.62 samples/sec Loss 6.0408 LearningRate 0.000935 Epoch: 5 Global Step: 107740 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:04:01,140-Speed 2499.33 samples/sec Loss 5.8735 LearningRate 0.000935 Epoch: 5 Global Step: 107750 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:04:09,335-Speed 2499.50 samples/sec Loss 5.9692 LearningRate 0.000935 Epoch: 5 Global Step: 107760 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:04:17,480-Speed 2514.98 samples/sec Loss 5.8845 LearningRate 0.000935 Epoch: 5 Global Step: 107770 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:04:25,678-Speed 2498.44 samples/sec Loss 6.0571 LearningRate 0.000935 Epoch: 5 Global Step: 107780 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:04:33,876-Speed 2498.45 samples/sec Loss 5.9857 LearningRate 0.000935 Epoch: 5 Global Step: 107790 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:04:42,070-Speed 2499.73 samples/sec Loss 5.9242 LearningRate 0.000935 Epoch: 5 Global Step: 107800 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:04:50,268-Speed 2498.71 samples/sec Loss 5.9757 LearningRate 0.000935 Epoch: 5 Global Step: 107810 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:04:58,582-Speed 2501.35 samples/sec Loss 5.9585 LearningRate 0.000935 Epoch: 5 Global Step: 107820 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:05:06,723-Speed 2516.07 samples/sec Loss 6.0766 LearningRate 0.000934 Epoch: 5 Global Step: 107830 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:05:14,918-Speed 2499.56 samples/sec Loss 5.9381 LearningRate 0.000934 Epoch: 5 Global Step: 107840 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:05:23,744-Speed 2500.44 samples/sec Loss 5.9067 LearningRate 0.000934 Epoch: 5 Global Step: 107850 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:05:31,944-Speed 2497.76 samples/sec Loss 6.0087 LearningRate 0.000934 Epoch: 5 Global Step: 107860 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:05:40,142-Speed 2498.42 samples/sec Loss 6.0041 LearningRate 0.000934 Epoch: 5 Global Step: 107870 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:05:48,342-Speed 2498.02 samples/sec Loss 6.0349 LearningRate 0.000934 Epoch: 5 Global Step: 107880 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:05:56,481-Speed 2516.75 samples/sec Loss 5.8710 LearningRate 0.000934 Epoch: 5 Global Step: 107890 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:06:04,677-Speed 2499.22 samples/sec Loss 6.0327 LearningRate 0.000934 Epoch: 5 Global Step: 107900 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:06:12,875-Speed 2498.58 samples/sec Loss 5.9874 LearningRate 0.000934 Epoch: 5 Global Step: 107910 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:06:21,075-Speed 2497.99 samples/sec Loss 5.9873 LearningRate 0.000934 Epoch: 5 Global Step: 107920 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:06:29,266-Speed 2500.71 samples/sec Loss 5.9245 LearningRate 0.000934 Epoch: 5 Global Step: 107930 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:06:37,466-Speed 2498.37 samples/sec Loss 6.0468 LearningRate 0.000934 Epoch: 5 Global Step: 107940 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:06:45,606-Speed 2516.69 samples/sec Loss 5.9089 LearningRate 0.000934 Epoch: 5 Global Step: 107950 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:06:53,802-Speed 2499.11 samples/sec Loss 5.9328 LearningRate 0.000934 Epoch: 5 Global Step: 107960 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:07:02,001-Speed 2498.14 samples/sec Loss 5.9490 LearningRate 0.000934 Epoch: 5 Global Step: 107970 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:07:10,200-Speed 2498.29 samples/sec Loss 5.9394 LearningRate 0.000934 Epoch: 5 Global Step: 107980 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:07:18,399-Speed 2498.56 samples/sec Loss 5.9134 LearningRate 0.000934 Epoch: 5 Global Step: 107990 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:07:26,594-Speed 2499.43 samples/sec Loss 5.9066 LearningRate 0.000934 Epoch: 5 Global Step: 108000 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:07:34,739-Speed 2514.61 samples/sec Loss 5.9031 LearningRate 0.000934 Epoch: 5 Global Step: 108010 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:07:42,950-Speed 2494.60 samples/sec Loss 5.9152 LearningRate 0.000934 Epoch: 5 Global Step: 108020 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:07:51,147-Speed 2499.02 samples/sec Loss 5.9129 LearningRate 0.000934 Epoch: 5 Global Step: 108030 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:07:59,344-Speed 2498.88 samples/sec Loss 5.8576 LearningRate 0.000934 Epoch: 5 Global Step: 108040 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:08:07,544-Speed 2497.92 samples/sec Loss 6.0246 LearningRate 0.000934 Epoch: 5 Global Step: 108050 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:08:15,744-Speed 2498.04 samples/sec Loss 5.9015 LearningRate 0.000934 Epoch: 5 Global Step: 108060 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:08:23,901-Speed 2511.13 samples/sec Loss 6.0917 LearningRate 0.000934 Epoch: 5 Global Step: 108070 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:08:32,101-Speed 2498.04 samples/sec Loss 5.9830 LearningRate 0.000934 Epoch: 5 Global Step: 108080 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:08:40,299-Speed 2498.75 samples/sec Loss 5.9427 LearningRate 0.000934 Epoch: 5 Global Step: 108090 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:08:48,515-Speed 2492.82 samples/sec Loss 5.9985 LearningRate 0.000934 Epoch: 5 Global Step: 108100 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:08:56,728-Speed 2494.26 samples/sec Loss 5.8981 LearningRate 0.000934 Epoch: 5 Global Step: 108110 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:09:04,923-Speed 2499.54 samples/sec Loss 5.9550 LearningRate 0.000934 Epoch: 5 Global Step: 108120 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:09:13,067-Speed 2514.86 samples/sec Loss 6.0379 LearningRate 0.000934 Epoch: 5 Global Step: 108130 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:09:21,271-Speed 2496.93 samples/sec Loss 5.8949 LearningRate 0.000934 Epoch: 5 Global Step: 108140 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:09:29,479-Speed 2495.69 samples/sec Loss 5.9074 LearningRate 0.000934 Epoch: 5 Global Step: 108150 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:09:37,679-Speed 2497.78 samples/sec Loss 5.9672 LearningRate 0.000934 Epoch: 5 Global Step: 108160 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:09:45,879-Speed 2497.97 samples/sec Loss 5.9479 LearningRate 0.000934 Epoch: 5 Global Step: 108170 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:09:54,078-Speed 2498.18 samples/sec Loss 5.9077 LearningRate 0.000934 Epoch: 5 Global Step: 108180 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:10:02,224-Speed 2514.48 samples/sec Loss 6.0200 LearningRate 0.000934 Epoch: 5 Global Step: 108190 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:10:10,421-Speed 2498.80 samples/sec Loss 5.9618 LearningRate 0.000934 Epoch: 5 Global Step: 108200 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:10:18,623-Speed 2497.40 samples/sec Loss 5.9622 LearningRate 0.000933 Epoch: 5 Global Step: 108210 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:10:26,820-Speed 2498.75 samples/sec Loss 5.9693 LearningRate 0.000933 Epoch: 5 Global Step: 108220 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:10:35,027-Speed 2495.72 samples/sec Loss 5.9197 LearningRate 0.000933 Epoch: 5 Global Step: 108230 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:10:43,232-Speed 2496.35 samples/sec Loss 5.9111 LearningRate 0.000933 Epoch: 5 Global Step: 108240 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:10:51,384-Speed 2512.93 samples/sec Loss 5.9011 LearningRate 0.000933 Epoch: 5 Global Step: 108250 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:10:59,591-Speed 2495.90 samples/sec Loss 5.8817 LearningRate 0.000933 Epoch: 5 Global Step: 108260 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:11:07,794-Speed 2496.91 samples/sec Loss 6.0194 LearningRate 0.000933 Epoch: 5 Global Step: 108270 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:11:15,995-Speed 2497.52 samples/sec Loss 5.9863 LearningRate 0.000933 Epoch: 5 Global Step: 108280 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:11:24,199-Speed 2496.87 samples/sec Loss 6.0488 LearningRate 0.000933 Epoch: 5 Global Step: 108290 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:11:32,402-Speed 2497.02 samples/sec Loss 5.9784 LearningRate 0.000933 Epoch: 5 Global Step: 108300 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:11:40,557-Speed 2511.81 samples/sec Loss 5.8817 LearningRate 0.000933 Epoch: 5 Global Step: 108310 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:11:48,773-Speed 2492.88 samples/sec Loss 5.9470 LearningRate 0.000933 Epoch: 5 Global Step: 108320 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:11:56,975-Speed 2497.45 samples/sec Loss 5.9557 LearningRate 0.000933 Epoch: 5 Global Step: 108330 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:12:05,186-Speed 2494.59 samples/sec Loss 5.8728 LearningRate 0.000933 Epoch: 5 Global Step: 108340 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:12:13,387-Speed 2497.72 samples/sec Loss 5.8302 LearningRate 0.000933 Epoch: 5 Global Step: 108350 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:12:21,588-Speed 2497.50 samples/sec Loss 5.9287 LearningRate 0.000933 Epoch: 5 Global Step: 108360 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:12:29,742-Speed 2512.12 samples/sec Loss 5.9104 LearningRate 0.000933 Epoch: 5 Global Step: 108370 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:12:37,943-Speed 2497.80 samples/sec Loss 5.9406 LearningRate 0.000933 Epoch: 5 Global Step: 108380 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:12:46,143-Speed 2497.79 samples/sec Loss 5.9629 LearningRate 0.000933 Epoch: 5 Global Step: 108390 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:12:54,348-Speed 2496.61 samples/sec Loss 5.9447 LearningRate 0.000933 Epoch: 5 Global Step: 108400 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:13:02,554-Speed 2496.24 samples/sec Loss 5.8899 LearningRate 0.000933 Epoch: 5 Global Step: 108410 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:13:10,760-Speed 2495.96 samples/sec Loss 5.9714 LearningRate 0.000933 Epoch: 5 Global Step: 108420 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:13:18,928-Speed 2507.67 samples/sec Loss 5.9788 LearningRate 0.000933 Epoch: 5 Global Step: 108430 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:13:27,132-Speed 2496.79 samples/sec Loss 6.0475 LearningRate 0.000933 Epoch: 5 Global Step: 108440 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:13:35,335-Speed 2496.87 samples/sec Loss 6.0190 LearningRate 0.000933 Epoch: 5 Global Step: 108450 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:13:43,534-Speed 2498.43 samples/sec Loss 5.9425 LearningRate 0.000933 Epoch: 5 Global Step: 108460 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:13:51,739-Speed 2496.32 samples/sec Loss 5.9262 LearningRate 0.000933 Epoch: 5 Global Step: 108470 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:13:59,950-Speed 2494.63 samples/sec Loss 6.0204 LearningRate 0.000933 Epoch: 5 Global Step: 108480 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:14:08,097-Speed 2514.38 samples/sec Loss 5.9137 LearningRate 0.000933 Epoch: 5 Global Step: 108490 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:14:16,298-Speed 2497.55 samples/sec Loss 5.9084 LearningRate 0.000933 Epoch: 5 Global Step: 108500 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:14:24,511-Speed 2494.02 samples/sec Loss 5.9612 LearningRate 0.000933 Epoch: 5 Global Step: 108510 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:14:32,713-Speed 2497.21 samples/sec Loss 5.8825 LearningRate 0.000933 Epoch: 5 Global Step: 108520 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:14:40,916-Speed 2497.14 samples/sec Loss 5.9711 LearningRate 0.000933 Epoch: 5 Global Step: 108530 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:14:49,117-Speed 2497.77 samples/sec Loss 5.9132 LearningRate 0.000933 Epoch: 5 Global Step: 108540 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:14:57,264-Speed 2513.97 samples/sec Loss 6.0249 LearningRate 0.000933 Epoch: 5 Global Step: 108550 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:15:05,462-Speed 2498.60 samples/sec Loss 5.9801 LearningRate 0.000933 Epoch: 5 Global Step: 108560 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:15:13,673-Speed 2494.77 samples/sec Loss 5.9900 LearningRate 0.000933 Epoch: 5 Global Step: 108570 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:15:21,890-Speed 2492.68 samples/sec Loss 5.9866 LearningRate 0.000933 Epoch: 5 Global Step: 108580 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:15:30,092-Speed 2497.34 samples/sec Loss 6.0113 LearningRate 0.000933 Epoch: 5 Global Step: 108590 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:15:38,323-Speed 2488.39 samples/sec Loss 5.9875 LearningRate 0.000932 Epoch: 5 Global Step: 108600 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:15:46,473-Speed 2513.38 samples/sec Loss 5.9336 LearningRate 0.000932 Epoch: 5 Global Step: 108610 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:15:54,671-Speed 2498.78 samples/sec Loss 5.9034 LearningRate 0.000932 Epoch: 5 Global Step: 108620 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:16:02,871-Speed 2497.77 samples/sec Loss 5.9543 LearningRate 0.000932 Epoch: 5 Global Step: 108630 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:16:11,081-Speed 2495.01 samples/sec Loss 5.9721 LearningRate 0.000932 Epoch: 5 Global Step: 108640 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:16:19,282-Speed 2497.77 samples/sec Loss 5.8769 LearningRate 0.000932 Epoch: 5 Global Step: 108650 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:16:27,493-Speed 2494.54 samples/sec Loss 5.9771 LearningRate 0.000932 Epoch: 5 Global Step: 108660 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:16:35,639-Speed 2514.59 samples/sec Loss 6.0038 LearningRate 0.000932 Epoch: 5 Global Step: 108670 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:16:43,839-Speed 2497.98 samples/sec Loss 5.9227 LearningRate 0.000932 Epoch: 5 Global Step: 108680 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:16:52,037-Speed 2498.44 samples/sec Loss 5.9146 LearningRate 0.000932 Epoch: 5 Global Step: 108690 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:17:00,237-Speed 2497.77 samples/sec Loss 5.9525 LearningRate 0.000932 Epoch: 5 Global Step: 108700 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:17:08,436-Speed 2498.11 samples/sec Loss 5.9809 LearningRate 0.000932 Epoch: 5 Global Step: 108710 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:17:16,633-Speed 2500.53 samples/sec Loss 5.9583 LearningRate 0.000932 Epoch: 5 Global Step: 108720 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:17:24,829-Speed 2516.24 samples/sec Loss 5.9006 LearningRate 0.000932 Epoch: 5 Global Step: 108730 Fp16 Grad Scale: 65536 Required: 165 hours
Training: 2022-07-06 15:17:33,027-Speed 2498.46 samples/sec Loss 5.9123
LearningRate 0.000932 Epoch: 5 Global Step: 108740 Fp16 Grad Scale: 131072 Required: 165 hours Training: 2022-07-06 15:17:41,184-Speed 2511.15 samples/sec Loss 5.9721 LearningRate 0.000932 Epoch: 5 Global Step: 108750 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:17:49,436-Speed 2499.72 samples/sec Loss 5.9480 LearningRate 0.000932 Epoch: 5 Global Step: 108760 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:17:57,680-Speed 2500.15 samples/sec Loss 5.9055 LearningRate 0.000932 Epoch: 5 Global Step: 108770 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:18:08,302-Speed 1928.28 samples/sec Loss 5.9704 LearningRate 0.000932 Epoch: 5 Global Step: 108780 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:18:16,451-Speed 2513.48 samples/sec Loss 5.9116 LearningRate 0.000932 Epoch: 5 Global Step: 108790 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:18:24,693-Speed 2497.42 samples/sec Loss 5.9111 LearningRate 0.000932 Epoch: 5 Global Step: 108800 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:18:32,923-Speed 2499.65 samples/sec Loss 5.9901 LearningRate 0.000932 Epoch: 5 Global Step: 108810 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:18:41,132-Speed 2494.90 samples/sec Loss 6.0256 LearningRate 0.000932 Epoch: 5 Global Step: 108820 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:18:52,134-Speed 1887.45 samples/sec Loss 5.9784 LearningRate 0.000932 Epoch: 5 Global Step: 108830 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:19:00,351-Speed 2500.66 samples/sec Loss 5.8718 LearningRate 0.000932 Epoch: 5 Global Step: 108840 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:19:11,628-Speed 1821.66 samples/sec Loss 5.9292 LearningRate 0.000932 Epoch: 5 Global Step: 108850 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:19:19,820-Speed 2500.36 samples/sec Loss 5.8225 
LearningRate 0.000932 Epoch: 5 Global Step: 108860 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:19:28,093-Speed 2496.31 samples/sec Loss 5.8900 LearningRate 0.000932 Epoch: 5 Global Step: 108870 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:19:36,300-Speed 2500.85 samples/sec Loss 5.9444 LearningRate 0.000932 Epoch: 5 Global Step: 108880 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:19:44,542-Speed 2496.15 samples/sec Loss 5.8732 LearningRate 0.000932 Epoch: 5 Global Step: 108890 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:19:52,737-Speed 2499.35 samples/sec Loss 6.1338 LearningRate 0.000932 Epoch: 5 Global Step: 108900 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:20:00,884-Speed 2514.27 samples/sec Loss 5.9035 LearningRate 0.000932 Epoch: 5 Global Step: 108910 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:20:09,162-Speed 2499.48 samples/sec Loss 5.9496 LearningRate 0.000932 Epoch: 5 Global Step: 108920 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:20:17,383-Speed 2499.51 samples/sec Loss 6.0393 LearningRate 0.000932 Epoch: 5 Global Step: 108930 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:20:25,584-Speed 2497.67 samples/sec Loss 6.0940 LearningRate 0.000932 Epoch: 5 Global Step: 108940 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:20:33,822-Speed 2500.79 samples/sec Loss 6.0849 LearningRate 0.000932 Epoch: 5 Global Step: 108950 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:20:42,564-Speed 2451.33 samples/sec Loss 5.9533 LearningRate 0.000932 Epoch: 5 Global Step: 108960 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:20:50,731-Speed 2516.90 samples/sec Loss 5.9302 LearningRate 0.000932 Epoch: 5 Global Step: 108970 Fp16 Grad Scale: 65536 Required: 165 hours Training: 2022-07-06 15:20:58,887-Speed 2511.25 samples/sec Loss 5.9453 
LearningRate 0.000932 Epoch: 5 Global Step: 108980 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:21:07,134-Speed 2501.78 samples/sec Loss 5.9744 LearningRate 0.000931 Epoch: 5 Global Step: 108990 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:21:15,391-Speed 2501.73 samples/sec Loss 5.9387 LearningRate 0.000931 Epoch: 5 Global Step: 109000 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:21:23,588-Speed 2498.63 samples/sec Loss 5.8939 LearningRate 0.000931 Epoch: 5 Global Step: 109010 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:21:31,854-Speed 2496.35 samples/sec Loss 5.8748 LearningRate 0.000931 Epoch: 5 Global Step: 109020 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:21:40,005-Speed 2516.26 samples/sec Loss 5.9476 LearningRate 0.000931 Epoch: 5 Global Step: 109030 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:21:48,849-Speed 2501.26 samples/sec Loss 5.8926 LearningRate 0.000931 Epoch: 5 Global Step: 109040 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:21:57,059-Speed 2494.78 samples/sec Loss 5.9067 LearningRate 0.000931 Epoch: 5 Global Step: 109050 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:22:05,263-Speed 2496.56 samples/sec Loss 5.9487 LearningRate 0.000931 Epoch: 5 Global Step: 109060 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:22:13,581-Speed 2465.22 samples/sec Loss 5.8821 LearningRate 0.000931 Epoch: 5 Global Step: 109070 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:22:21,814-Speed 2487.79 samples/sec Loss 5.9662 LearningRate 0.000931 Epoch: 5 Global Step: 109080 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:22:29,960-Speed 2514.54 samples/sec Loss 5.9086 LearningRate 0.000931 Epoch: 5 Global Step: 109090 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:22:38,157-Speed 2498.98 samples/sec Loss 5.8162 
LearningRate 0.000931 Epoch: 5 Global Step: 109100 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:22:46,354-Speed 2499.23 samples/sec Loss 5.8336 LearningRate 0.000931 Epoch: 5 Global Step: 109110 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:22:54,550-Speed 2499.01 samples/sec Loss 5.9702 LearningRate 0.000931 Epoch: 5 Global Step: 109120 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:23:02,749-Speed 2498.46 samples/sec Loss 5.9758 LearningRate 0.000931 Epoch: 5 Global Step: 109130 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:23:10,943-Speed 2499.98 samples/sec Loss 5.9296 LearningRate 0.000931 Epoch: 5 Global Step: 109140 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:23:19,086-Speed 2515.76 samples/sec Loss 5.8846 LearningRate 0.000931 Epoch: 5 Global Step: 109150 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:23:27,283-Speed 2498.69 samples/sec Loss 5.8445 LearningRate 0.000931 Epoch: 5 Global Step: 109160 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:23:35,481-Speed 2498.47 samples/sec Loss 5.8999 LearningRate 0.000931 Epoch: 5 Global Step: 109170 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:23:43,673-Speed 2500.55 samples/sec Loss 5.9402 LearningRate 0.000931 Epoch: 5 Global Step: 109180 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:23:51,869-Speed 2499.43 samples/sec Loss 5.9579 LearningRate 0.000931 Epoch: 5 Global Step: 109190 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:24:00,064-Speed 2499.38 samples/sec Loss 5.9691 LearningRate 0.000931 Epoch: 5 Global Step: 109200 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:24:08,211-Speed 2514.26 samples/sec Loss 5.8996 LearningRate 0.000931 Epoch: 5 Global Step: 109210 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:24:16,406-Speed 2499.36 samples/sec Loss 5.8336 
LearningRate 0.000931 Epoch: 5 Global Step: 109220 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:24:24,605-Speed 2498.44 samples/sec Loss 5.9235 LearningRate 0.000931 Epoch: 5 Global Step: 109230 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:24:32,797-Speed 2500.47 samples/sec Loss 6.0474 LearningRate 0.000931 Epoch: 5 Global Step: 109240 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:24:40,996-Speed 2498.37 samples/sec Loss 6.0741 LearningRate 0.000931 Epoch: 5 Global Step: 109250 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:24:49,192-Speed 2499.19 samples/sec Loss 5.9386 LearningRate 0.000931 Epoch: 5 Global Step: 109260 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:24:57,359-Speed 2508.06 samples/sec Loss 6.0442 LearningRate 0.000931 Epoch: 5 Global Step: 109270 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:25:05,553-Speed 2499.61 samples/sec Loss 6.0539 LearningRate 0.000931 Epoch: 5 Global Step: 109280 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:25:13,750-Speed 2498.86 samples/sec Loss 5.8974 LearningRate 0.000931 Epoch: 5 Global Step: 109290 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:25:21,948-Speed 2498.66 samples/sec Loss 5.9121 LearningRate 0.000931 Epoch: 5 Global Step: 109300 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:25:30,158-Speed 2494.90 samples/sec Loss 5.9413 LearningRate 0.000931 Epoch: 5 Global Step: 109310 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:25:38,354-Speed 2499.23 samples/sec Loss 6.0018 LearningRate 0.000931 Epoch: 5 Global Step: 109320 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:25:46,497-Speed 2515.59 samples/sec Loss 6.1184 LearningRate 0.000931 Epoch: 5 Global Step: 109330 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:25:54,693-Speed 2499.24 samples/sec Loss 5.9633 
LearningRate 0.000931 Epoch: 5 Global Step: 109340 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:26:02,894-Speed 2497.69 samples/sec Loss 5.9587 LearningRate 0.000931 Epoch: 5 Global Step: 109350 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:26:11,109-Speed 2493.58 samples/sec Loss 5.9831 LearningRate 0.000931 Epoch: 5 Global Step: 109360 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:26:19,304-Speed 2499.75 samples/sec Loss 5.9805 LearningRate 0.000930 Epoch: 5 Global Step: 109370 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:26:27,502-Speed 2498.43 samples/sec Loss 6.0177 LearningRate 0.000930 Epoch: 5 Global Step: 109380 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:26:35,651-Speed 2513.66 samples/sec Loss 5.9001 LearningRate 0.000930 Epoch: 5 Global Step: 109390 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:26:43,849-Speed 2498.45 samples/sec Loss 5.9355 LearningRate 0.000930 Epoch: 5 Global Step: 109400 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:26:52,047-Speed 2498.55 samples/sec Loss 5.8555 LearningRate 0.000930 Epoch: 5 Global Step: 109410 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:27:00,246-Speed 2498.54 samples/sec Loss 5.8890 LearningRate 0.000930 Epoch: 5 Global Step: 109420 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:27:08,450-Speed 2496.71 samples/sec Loss 5.8950 LearningRate 0.000930 Epoch: 5 Global Step: 109430 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:27:16,651-Speed 2497.60 samples/sec Loss 5.9009 LearningRate 0.000930 Epoch: 5 Global Step: 109440 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:27:24,794-Speed 2515.45 samples/sec Loss 5.8689 LearningRate 0.000930 Epoch: 5 Global Step: 109450 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:27:33,008-Speed 2493.65 samples/sec Loss 5.9051 
LearningRate 0.000930 Epoch: 5 Global Step: 109460 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:27:41,209-Speed 2497.68 samples/sec Loss 5.8372 LearningRate 0.000930 Epoch: 5 Global Step: 109470 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:27:49,406-Speed 2498.74 samples/sec Loss 5.7768 LearningRate 0.000930 Epoch: 5 Global Step: 109480 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:27:57,605-Speed 2498.61 samples/sec Loss 5.8788 LearningRate 0.000930 Epoch: 5 Global Step: 109490 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:28:05,805-Speed 2497.67 samples/sec Loss 5.8979 LearningRate 0.000930 Epoch: 5 Global Step: 109500 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:28:13,953-Speed 2514.02 samples/sec Loss 5.9138 LearningRate 0.000930 Epoch: 5 Global Step: 109510 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:28:22,155-Speed 2497.54 samples/sec Loss 5.9189 LearningRate 0.000930 Epoch: 5 Global Step: 109520 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:28:30,354-Speed 2498.34 samples/sec Loss 5.9089 LearningRate 0.000930 Epoch: 5 Global Step: 109530 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:28:38,565-Speed 2494.59 samples/sec Loss 5.9808 LearningRate 0.000930 Epoch: 5 Global Step: 109540 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:28:46,761-Speed 2499.14 samples/sec Loss 5.8688 LearningRate 0.000930 Epoch: 5 Global Step: 109550 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:28:54,961-Speed 2497.98 samples/sec Loss 6.1728 LearningRate 0.000930 Epoch: 5 Global Step: 109560 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:29:03,107-Speed 2514.50 samples/sec Loss 5.9764 LearningRate 0.000930 Epoch: 5 Global Step: 109570 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:29:11,306-Speed 2498.72 samples/sec Loss 6.0608 
LearningRate 0.000930 Epoch: 5 Global Step: 109580 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:29:19,505-Speed 2498.53 samples/sec Loss 6.0555 LearningRate 0.000930 Epoch: 5 Global Step: 109590 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:29:27,703-Speed 2498.52 samples/sec Loss 6.0386 LearningRate 0.000930 Epoch: 5 Global Step: 109600 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:29:35,902-Speed 2498.20 samples/sec Loss 6.0956 LearningRate 0.000930 Epoch: 5 Global Step: 109610 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:29:44,102-Speed 2498.07 samples/sec Loss 5.9522 LearningRate 0.000930 Epoch: 5 Global Step: 109620 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:29:52,245-Speed 2515.36 samples/sec Loss 5.9822 LearningRate 0.000930 Epoch: 5 Global Step: 109630 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:30:00,446-Speed 2497.46 samples/sec Loss 5.9230 LearningRate 0.000930 Epoch: 5 Global Step: 109640 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:30:08,646-Speed 2498.14 samples/sec Loss 5.8784 LearningRate 0.000930 Epoch: 5 Global Step: 109650 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:30:16,844-Speed 2498.45 samples/sec Loss 5.9571 LearningRate 0.000930 Epoch: 5 Global Step: 109660 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:30:25,043-Speed 2498.32 samples/sec Loss 5.9254 LearningRate 0.000930 Epoch: 5 Global Step: 109670 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:30:33,249-Speed 2495.98 samples/sec Loss 5.9110 LearningRate 0.000930 Epoch: 5 Global Step: 109680 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:30:41,395-Speed 2514.45 samples/sec Loss 5.9582 LearningRate 0.000930 Epoch: 5 Global Step: 109690 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:30:49,594-Speed 2498.38 samples/sec Loss 5.8841 
LearningRate 0.000930 Epoch: 5 Global Step: 109700 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:30:57,795-Speed 2497.76 samples/sec Loss 5.9987 LearningRate 0.000930 Epoch: 5 Global Step: 109710 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:31:05,994-Speed 2498.05 samples/sec Loss 5.9243 LearningRate 0.000930 Epoch: 5 Global Step: 109720 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:31:14,198-Speed 2497.06 samples/sec Loss 5.9092 LearningRate 0.000930 Epoch: 5 Global Step: 109730 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:31:22,411-Speed 2493.87 samples/sec Loss 5.8704 LearningRate 0.000930 Epoch: 5 Global Step: 109740 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:31:30,559-Speed 2513.89 samples/sec Loss 5.8486 LearningRate 0.000930 Epoch: 5 Global Step: 109750 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:31:38,760-Speed 2497.76 samples/sec Loss 5.9023 LearningRate 0.000929 Epoch: 5 Global Step: 109760 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:31:46,957-Speed 2498.54 samples/sec Loss 5.9320 LearningRate 0.000929 Epoch: 5 Global Step: 109770 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:31:55,157-Speed 2497.99 samples/sec Loss 5.9346 LearningRate 0.000929 Epoch: 5 Global Step: 109780 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:32:03,367-Speed 2494.99 samples/sec Loss 5.9475 LearningRate 0.000929 Epoch: 5 Global Step: 109790 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:32:11,571-Speed 2497.01 samples/sec Loss 5.8601 LearningRate 0.000929 Epoch: 5 Global Step: 109800 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:32:19,720-Speed 2513.62 samples/sec Loss 5.8109 LearningRate 0.000929 Epoch: 5 Global Step: 109810 Fp16 Grad Scale: 32768 Required: 165 hours Training: 2022-07-06 15:32:27,920-Speed 2498.18 samples/sec Loss 5.8846 
LearningRate 0.000929 Epoch: 5 Global Step: 109820 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:32:36,117-Speed 2498.79 samples/sec Loss 5.7834 LearningRate 0.000929 Epoch: 5 Global Step: 109830 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:32:44,312-Speed 2499.37 samples/sec Loss 5.9262 LearningRate 0.000929 Epoch: 5 Global Step: 109840 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:32:52,512-Speed 2498.28 samples/sec Loss 5.8250 LearningRate 0.000929 Epoch: 5 Global Step: 109850 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:33:00,707-Speed 2499.28 samples/sec Loss 5.8213 LearningRate 0.000929 Epoch: 5 Global Step: 109860 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:33:08,863-Speed 2511.51 samples/sec Loss 5.8846 LearningRate 0.000929 Epoch: 5 Global Step: 109870 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:33:17,059-Speed 2499.24 samples/sec Loss 5.9103 LearningRate 0.000929 Epoch: 5 Global Step: 109880 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:33:25,255-Speed 2499.12 samples/sec Loss 5.9584 LearningRate 0.000929 Epoch: 5 Global Step: 109890 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:33:33,454-Speed 2498.46 samples/sec Loss 5.9205 LearningRate 0.000929 Epoch: 5 Global Step: 109900 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:33:41,651-Speed 2498.86 samples/sec Loss 5.9250 LearningRate 0.000929 Epoch: 5 Global Step: 109910 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:33:49,848-Speed 2498.94 samples/sec Loss 5.9205 LearningRate 0.000929 Epoch: 5 Global Step: 109920 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:33:57,993-Speed 2515.00 samples/sec Loss 5.8502 LearningRate 0.000929 Epoch: 5 Global Step: 109930 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:34:06,189-Speed 2499.26 samples/sec Loss 5.8637 
LearningRate 0.000929 Epoch: 5 Global Step: 109940 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:34:14,385-Speed 2499.50 samples/sec Loss 5.8540 LearningRate 0.000929 Epoch: 5 Global Step: 109950 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:34:22,584-Speed 2498.11 samples/sec Loss 5.8983 LearningRate 0.000929 Epoch: 5 Global Step: 109960 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:34:30,779-Speed 2499.19 samples/sec Loss 5.8532 LearningRate 0.000929 Epoch: 5 Global Step: 109970 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:34:38,976-Speed 2499.01 samples/sec Loss 5.7894 LearningRate 0.000929 Epoch: 5 Global Step: 109980 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:34:47,121-Speed 2514.97 samples/sec Loss 5.8220 LearningRate 0.000929 Epoch: 5 Global Step: 109990 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:34:55,319-Speed 2498.48 samples/sec Loss 5.8276 LearningRate 0.000929 Epoch: 5 Global Step: 110000 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:35:03,517-Speed 2498.42 samples/sec Loss 5.8994 LearningRate 0.000929 Epoch: 5 Global Step: 110010 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:35:11,722-Speed 2496.73 samples/sec Loss 5.9419 LearningRate 0.000929 Epoch: 5 Global Step: 110020 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:35:19,926-Speed 2496.70 samples/sec Loss 5.9643 LearningRate 0.000929 Epoch: 5 Global Step: 110030 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:35:28,125-Speed 2498.14 samples/sec Loss 5.9425 LearningRate 0.000929 Epoch: 5 Global Step: 110040 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:35:36,273-Speed 2514.09 samples/sec Loss 5.8845 LearningRate 0.000929 Epoch: 5 Global Step: 110050 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:35:44,489-Speed 2493.34 samples/sec Loss 5.8265 
LearningRate 0.000929 Epoch: 5 Global Step: 110060 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:35:52,694-Speed 2496.40 samples/sec Loss 5.8644 LearningRate 0.000929 Epoch: 5 Global Step: 110070 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:36:00,894-Speed 2498.00 samples/sec Loss 5.7800 LearningRate 0.000929 Epoch: 5 Global Step: 110080 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:36:09,093-Speed 2498.16 samples/sec Loss 5.8009 LearningRate 0.000929 Epoch: 5 Global Step: 110090 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:36:17,294-Speed 2497.71 samples/sec Loss 5.8973 LearningRate 0.000929 Epoch: 5 Global Step: 110100 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:36:25,441-Speed 2514.13 samples/sec Loss 5.8007 LearningRate 0.000929 Epoch: 5 Global Step: 110110 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:36:33,638-Speed 2498.65 samples/sec Loss 5.7330 LearningRate 0.000929 Epoch: 5 Global Step: 110120 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:36:41,835-Speed 2499.02 samples/sec Loss 5.8007 LearningRate 0.000929 Epoch: 5 Global Step: 110130 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:36:50,034-Speed 2498.42 samples/sec Loss 5.8456 LearningRate 0.000929 Epoch: 5 Global Step: 110140 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:36:58,235-Speed 2497.67 samples/sec Loss 5.7359 LearningRate 0.000928 Epoch: 5 Global Step: 110150 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:37:06,434-Speed 2498.34 samples/sec Loss 5.7277 LearningRate 0.000928 Epoch: 5 Global Step: 110160 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:37:14,580-Speed 2514.55 samples/sec Loss 5.8254 LearningRate 0.000928 Epoch: 5 Global Step: 110170 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:37:22,793-Speed 2494.13 samples/sec Loss 5.7292 
LearningRate 0.000928 Epoch: 5 Global Step: 110180 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:37:30,992-Speed 2498.00 samples/sec Loss 5.8223 LearningRate 0.000928 Epoch: 5 Global Step: 110190 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:37:39,192-Speed 2498.04 samples/sec Loss 5.9319 LearningRate 0.000928 Epoch: 5 Global Step: 110200 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:37:47,388-Speed 2499.24 samples/sec Loss 5.9382 LearningRate 0.000928 Epoch: 5 Global Step: 110210 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:37:55,600-Speed 2494.15 samples/sec Loss 5.9495 LearningRate 0.000928 Epoch: 5 Global Step: 110220 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:38:03,749-Speed 2513.76 samples/sec Loss 5.8011 LearningRate 0.000928 Epoch: 5 Global Step: 110230 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:38:11,944-Speed 2499.44 samples/sec Loss 5.8648 LearningRate 0.000928 Epoch: 5 Global Step: 110240 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:38:20,140-Speed 2499.11 samples/sec Loss 5.8630 LearningRate 0.000928 Epoch: 5 Global Step: 110250 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:38:28,341-Speed 2497.79 samples/sec Loss 5.8818 LearningRate 0.000928 Epoch: 5 Global Step: 110260 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:38:36,536-Speed 2499.50 samples/sec Loss 5.8626 LearningRate 0.000928 Epoch: 5 Global Step: 110270 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:38:44,736-Speed 2497.91 samples/sec Loss 5.8615 LearningRate 0.000928 Epoch: 5 Global Step: 110280 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:38:52,877-Speed 2516.02 samples/sec Loss 5.6886 LearningRate 0.000928 Epoch: 5 Global Step: 110290 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:39:01,076-Speed 2498.30 samples/sec Loss 5.9016 
LearningRate 0.000928 Epoch: 5 Global Step: 110300 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:39:09,282-Speed 2495.90 samples/sec Loss 5.8193 LearningRate 0.000928 Epoch: 5 Global Step: 110310 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:39:17,479-Speed 2498.98 samples/sec Loss 5.8747 LearningRate 0.000928 Epoch: 5 Global Step: 110320 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:39:25,689-Speed 2494.95 samples/sec Loss 5.9032 LearningRate 0.000928 Epoch: 5 Global Step: 110330 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:39:33,888-Speed 2498.17 samples/sec Loss 6.0165 LearningRate 0.000928 Epoch: 5 Global Step: 110340 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:39:42,031-Speed 2515.55 samples/sec Loss 5.8424 LearningRate 0.000928 Epoch: 5 Global Step: 110350 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:39:50,226-Speed 2499.53 samples/sec Loss 5.9640 LearningRate 0.000928 Epoch: 5 Global Step: 110360 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:39:58,420-Speed 2499.85 samples/sec Loss 5.8415 LearningRate 0.000928 Epoch: 5 Global Step: 110370 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:40:06,620-Speed 2497.83 samples/sec Loss 5.8401 LearningRate 0.000928 Epoch: 5 Global Step: 110380 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:40:14,819-Speed 2498.34 samples/sec Loss 5.8938 LearningRate 0.000928 Epoch: 5 Global Step: 110390 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:40:23,029-Speed 2495.08 samples/sec Loss 5.9371 LearningRate 0.000928 Epoch: 5 Global Step: 110400 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:40:31,172-Speed 2515.43 samples/sec Loss 5.8707 LearningRate 0.000928 Epoch: 5 Global Step: 110410 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:40:39,368-Speed 2499.16 samples/sec Loss 5.9376 
LearningRate 0.000928 Epoch: 5 Global Step: 110420 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:40:47,564-Speed 2499.01 samples/sec Loss 5.8721 LearningRate 0.000928 Epoch: 5 Global Step: 110430 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:40:55,764-Speed 2498.04 samples/sec Loss 5.8849 LearningRate 0.000928 Epoch: 5 Global Step: 110440 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:41:03,962-Speed 2498.54 samples/sec Loss 5.8580 LearningRate 0.000928 Epoch: 5 Global Step: 110450 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:41:12,160-Speed 2498.59 samples/sec Loss 5.8894 LearningRate 0.000928 Epoch: 5 Global Step: 110460 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:41:20,307-Speed 2514.20 samples/sec Loss 5.7339 LearningRate 0.000928 Epoch: 5 Global Step: 110470 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:41:28,514-Speed 2495.81 samples/sec Loss 5.8757 LearningRate 0.000928 Epoch: 5 Global Step: 110480 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:41:36,716-Speed 2497.40 samples/sec Loss 5.8914 LearningRate 0.000928 Epoch: 5 Global Step: 110490 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:41:44,916-Speed 2498.14 samples/sec Loss 5.8629 LearningRate 0.000928 Epoch: 5 Global Step: 110500 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:41:53,130-Speed 2493.64 samples/sec Loss 5.8225 LearningRate 0.000928 Epoch: 5 Global Step: 110510 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:42:01,331-Speed 2497.69 samples/sec Loss 5.8078 LearningRate 0.000928 Epoch: 5 Global Step: 110520 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:42:09,479-Speed 2513.88 samples/sec Loss 5.8344 LearningRate 0.000928 Epoch: 5 Global Step: 110530 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:42:17,683-Speed 2496.71 samples/sec Loss 5.7733 
LearningRate 0.000927 Epoch: 5 Global Step: 110540 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:42:25,886-Speed 2496.85 samples/sec Loss 5.6853 LearningRate 0.000927 Epoch: 5 Global Step: 110550 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:42:34,086-Speed 2497.90 samples/sec Loss 5.7460 LearningRate 0.000927 Epoch: 5 Global Step: 110560 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:42:42,290-Speed 2496.85 samples/sec Loss 5.7803 LearningRate 0.000927 Epoch: 5 Global Step: 110570 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:42:50,495-Speed 2496.37 samples/sec Loss 5.8626 LearningRate 0.000927 Epoch: 5 Global Step: 110580 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:42:58,645-Speed 2513.23 samples/sec Loss 5.8855 LearningRate 0.000927 Epoch: 5 Global Step: 110590 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:43:06,850-Speed 2496.38 samples/sec Loss 5.8500 LearningRate 0.000927 Epoch: 5 Global Step: 110600 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:43:15,057-Speed 2495.86 samples/sec Loss 5.7773 LearningRate 0.000927 Epoch: 5 Global Step: 110610 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:43:23,263-Speed 2496.17 samples/sec Loss 5.8134 LearningRate 0.000927 Epoch: 5 Global Step: 110620 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:43:31,479-Speed 2493.06 samples/sec Loss 5.9289 LearningRate 0.000927 Epoch: 5 Global Step: 110630 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:43:39,680-Speed 2497.98 samples/sec Loss 5.7073 LearningRate 0.000927 Epoch: 5 Global Step: 110640 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:43:47,828-Speed 2513.83 samples/sec Loss 5.7075 LearningRate 0.000927 Epoch: 5 Global Step: 110650 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:43:56,030-Speed 2497.31 samples/sec Loss 5.8763 
LearningRate 0.000927 Epoch: 5 Global Step: 110660 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:44:04,231-Speed 2497.50 samples/sec Loss 5.7835 LearningRate 0.000927 Epoch: 5 Global Step: 110670 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:44:12,433-Speed 2498.90 samples/sec Loss 5.8595 LearningRate 0.000927 Epoch: 5 Global Step: 110680 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:44:20,632-Speed 2498.06 samples/sec Loss 5.8480 LearningRate 0.000927 Epoch: 5 Global Step: 110690 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:44:28,832-Speed 2498.39 samples/sec Loss 5.7976 LearningRate 0.000927 Epoch: 5 Global Step: 110700 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:44:36,978-Speed 2514.45 samples/sec Loss 5.7477 LearningRate 0.000927 Epoch: 5 Global Step: 110710 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:44:45,184-Speed 2495.99 samples/sec Loss 5.7323 LearningRate 0.000927 Epoch: 5 Global Step: 110720 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:44:53,384-Speed 2497.97 samples/sec Loss 5.7769 LearningRate 0.000927 Epoch: 5 Global Step: 110730 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:45:01,582-Speed 2498.65 samples/sec Loss 5.8381 LearningRate 0.000927 Epoch: 5 Global Step: 110740 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:45:09,781-Speed 2498.31 samples/sec Loss 5.7917 LearningRate 0.000927 Epoch: 5 Global Step: 110750 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:45:17,989-Speed 2495.56 samples/sec Loss 5.8120 LearningRate 0.000927 Epoch: 5 Global Step: 110760 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:45:26,142-Speed 2512.35 samples/sec Loss 5.8081 LearningRate 0.000927 Epoch: 5 Global Step: 110770 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:45:34,361-Speed 2492.29 samples/sec Loss 5.7752 
LearningRate 0.000927 Epoch: 5 Global Step: 110780 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:45:42,561-Speed 2497.88 samples/sec Loss 5.8130 LearningRate 0.000927 Epoch: 5 Global Step: 110790 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:45:50,773-Speed 2494.24 samples/sec Loss 5.7981 LearningRate 0.000927 Epoch: 5 Global Step: 110800 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:45:58,982-Speed 2495.40 samples/sec Loss 5.8198 LearningRate 0.000927 Epoch: 5 Global Step: 110810 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:46:07,182-Speed 2497.74 samples/sec Loss 5.8499 LearningRate 0.000927 Epoch: 5 Global Step: 110820 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:46:15,327-Speed 2515.05 samples/sec Loss 5.8808 LearningRate 0.000927 Epoch: 5 Global Step: 110830 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:46:23,525-Speed 2498.48 samples/sec Loss 5.8864 LearningRate 0.000927 Epoch: 5 Global Step: 110840 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:46:31,735-Speed 2495.00 samples/sec Loss 5.8403 LearningRate 0.000927 Epoch: 5 Global Step: 110850 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:46:39,953-Speed 2492.57 samples/sec Loss 5.7878 LearningRate 0.000927 Epoch: 5 Global Step: 110860 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:46:48,152-Speed 2498.06 samples/sec Loss 5.8267 LearningRate 0.000927 Epoch: 5 Global Step: 110870 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:46:56,353-Speed 2497.57 samples/sec Loss 5.7698 LearningRate 0.000927 Epoch: 5 Global Step: 110880 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:47:04,501-Speed 2514.09 samples/sec Loss 5.7118 LearningRate 0.000927 Epoch: 5 Global Step: 110890 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:47:12,700-Speed 2498.11 samples/sec Loss 5.6768 
LearningRate 0.000927 Epoch: 5 Global Step: 110900 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:47:20,901-Speed 2498.07 samples/sec Loss 5.7560 LearningRate 0.000927 Epoch: 5 Global Step: 110910 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:47:29,118-Speed 2492.63 samples/sec Loss 5.8040 LearningRate 0.000926 Epoch: 5 Global Step: 110920 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:47:37,315-Speed 2499.22 samples/sec Loss 5.7651 LearningRate 0.000926 Epoch: 5 Global Step: 110930 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:47:45,516-Speed 2497.44 samples/sec Loss 5.8698 LearningRate 0.000926 Epoch: 5 Global Step: 110940 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:47:53,673-Speed 2511.16 samples/sec Loss 5.7655 LearningRate 0.000926 Epoch: 5 Global Step: 110950 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:48:01,869-Speed 2499.29 samples/sec Loss 5.9189 LearningRate 0.000926 Epoch: 5 Global Step: 110960 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:48:10,069-Speed 2498.20 samples/sec Loss 5.8916 LearningRate 0.000926 Epoch: 5 Global Step: 110970 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:48:18,263-Speed 2499.78 samples/sec Loss 5.8448 LearningRate 0.000926 Epoch: 5 Global Step: 110980 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:48:26,462-Speed 2498.18 samples/sec Loss 5.9171 LearningRate 0.000926 Epoch: 5 Global Step: 110990 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:48:34,669-Speed 2495.87 samples/sec Loss 5.9331 LearningRate 0.000926 Epoch: 5 Global Step: 111000 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:48:42,812-Speed 2515.44 samples/sec Loss 5.7435 LearningRate 0.000926 Epoch: 5 Global Step: 111010 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:48:51,007-Speed 2499.65 samples/sec Loss 5.8361 
LearningRate 0.000926 Epoch: 5 Global Step: 111020 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:48:59,203-Speed 2498.86 samples/sec Loss 5.8628 LearningRate 0.000926 Epoch: 5 Global Step: 111030 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:49:07,409-Speed 2496.11 samples/sec Loss 5.8136 LearningRate 0.000926 Epoch: 5 Global Step: 111040 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:49:15,609-Speed 2498.19 samples/sec Loss 5.8136 LearningRate 0.000926 Epoch: 5 Global Step: 111050 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:49:23,804-Speed 2499.44 samples/sec Loss 5.8184 LearningRate 0.000926 Epoch: 5 Global Step: 111060 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:49:31,946-Speed 2515.53 samples/sec Loss 5.7609 LearningRate 0.000926 Epoch: 5 Global Step: 111070 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:49:40,141-Speed 2499.66 samples/sec Loss 5.7816 LearningRate 0.000926 Epoch: 5 Global Step: 111080 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:49:48,338-Speed 2499.05 samples/sec Loss 5.8307 LearningRate 0.000926 Epoch: 5 Global Step: 111090 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:49:56,537-Speed 2498.07 samples/sec Loss 5.8977 LearningRate 0.000926 Epoch: 5 Global Step: 111100 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:50:04,736-Speed 2498.52 samples/sec Loss 5.8211 LearningRate 0.000926 Epoch: 5 Global Step: 111110 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:50:12,938-Speed 2497.49 samples/sec Loss 5.8484 LearningRate 0.000926 Epoch: 5 Global Step: 111120 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:50:21,085-Speed 2513.87 samples/sec Loss 5.8138 LearningRate 0.000926 Epoch: 5 Global Step: 111130 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:50:29,287-Speed 2497.65 samples/sec Loss 5.8099 
LearningRate 0.000926 Epoch: 5 Global Step: 111140 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:50:37,484-Speed 2498.92 samples/sec Loss 5.8836 LearningRate 0.000926 Epoch: 5 Global Step: 111150 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:50:45,685-Speed 2497.46 samples/sec Loss 5.8614 LearningRate 0.000926 Epoch: 5 Global Step: 111160 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:50:53,886-Speed 2497.77 samples/sec Loss 5.7639 LearningRate 0.000926 Epoch: 5 Global Step: 111170 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:51:02,081-Speed 2499.60 samples/sec Loss 5.8652 LearningRate 0.000926 Epoch: 5 Global Step: 111180 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:51:10,223-Speed 2515.65 samples/sec Loss 5.7427 LearningRate 0.000926 Epoch: 5 Global Step: 111190 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:51:18,420-Speed 2498.96 samples/sec Loss 5.8195 LearningRate 0.000926 Epoch: 5 Global Step: 111200 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:51:26,620-Speed 2498.04 samples/sec Loss 5.9114 LearningRate 0.000926 Epoch: 5 Global Step: 111210 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:51:34,818-Speed 2498.42 samples/sec Loss 5.8400 LearningRate 0.000926 Epoch: 5 Global Step: 111220 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:51:43,019-Speed 2497.76 samples/sec Loss 5.7473 LearningRate 0.000926 Epoch: 5 Global Step: 111230 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:51:51,228-Speed 2495.11 samples/sec Loss 5.8187 LearningRate 0.000926 Epoch: 5 Global Step: 111240 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:51:59,375-Speed 2514.48 samples/sec Loss 5.8082 LearningRate 0.000926 Epoch: 5 Global Step: 111250 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:52:07,570-Speed 2499.60 samples/sec Loss 5.8090 
LearningRate 0.000926 Epoch: 5 Global Step: 111260 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:52:15,763-Speed 2500.07 samples/sec Loss 5.8088 LearningRate 0.000926 Epoch: 5 Global Step: 111270 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:52:23,958-Speed 2499.57 samples/sec Loss 5.7695 LearningRate 0.000926 Epoch: 5 Global Step: 111280 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:52:32,150-Speed 2500.21 samples/sec Loss 5.7759 LearningRate 0.000926 Epoch: 5 Global Step: 111290 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:52:40,351-Speed 2497.79 samples/sec Loss 5.8585 LearningRate 0.000926 Epoch: 5 Global Step: 111300 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:52:48,495-Speed 2514.96 samples/sec Loss 5.8067 LearningRate 0.000925 Epoch: 5 Global Step: 111310 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:52:56,693-Speed 2498.68 samples/sec Loss 5.8061 LearningRate 0.000925 Epoch: 5 Global Step: 111320 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:53:04,889-Speed 2499.30 samples/sec Loss 5.8878 LearningRate 0.000925 Epoch: 5 Global Step: 111330 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:53:13,089-Speed 2497.66 samples/sec Loss 5.7618 LearningRate 0.000925 Epoch: 5 Global Step: 111340 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:53:21,289-Speed 2498.26 samples/sec Loss 5.7821 LearningRate 0.000925 Epoch: 5 Global Step: 111350 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:53:29,500-Speed 2494.47 samples/sec Loss 5.8232 LearningRate 0.000925 Epoch: 5 Global Step: 111360 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:53:37,667-Speed 2508.19 samples/sec Loss 5.7262 LearningRate 0.000925 Epoch: 5 Global Step: 111370 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:53:45,865-Speed 2498.44 samples/sec Loss 5.7825 
LearningRate 0.000925 Epoch: 5 Global Step: 111380 Fp16 Grad Scale: 131072 Required: 164 hours Training: 2022-07-06 15:53:54,067-Speed 2497.50 samples/sec Loss 5.7136 LearningRate 0.000925 Epoch: 5 Global Step: 111390 Fp16 Grad Scale: 131072 Required: 164 hours Training: 2022-07-06 15:54:02,267-Speed 2497.74 samples/sec Loss 5.7663 LearningRate 0.000925 Epoch: 5 Global Step: 111400 Fp16 Grad Scale: 131072 Required: 164 hours Training: 2022-07-06 15:54:10,469-Speed 2497.48 samples/sec Loss 5.7275 LearningRate 0.000925 Epoch: 5 Global Step: 111410 Fp16 Grad Scale: 131072 Required: 164 hours Training: 2022-07-06 15:54:18,669-Speed 2497.92 samples/sec Loss 5.8005 LearningRate 0.000925 Epoch: 5 Global Step: 111420 Fp16 Grad Scale: 131072 Required: 164 hours Training: 2022-07-06 15:54:26,814-Speed 2515.02 samples/sec Loss 5.7679 LearningRate 0.000925 Epoch: 5 Global Step: 111430 Fp16 Grad Scale: 131072 Required: 164 hours Training: 2022-07-06 15:54:35,009-Speed 2499.33 samples/sec Loss 5.8027 LearningRate 0.000925 Epoch: 5 Global Step: 111440 Fp16 Grad Scale: 131072 Required: 164 hours Training: 2022-07-06 15:54:43,208-Speed 2498.71 samples/sec Loss 5.7599 LearningRate 0.000925 Epoch: 5 Global Step: 111450 Fp16 Grad Scale: 131072 Required: 164 hours Training: 2022-07-06 15:54:51,402-Speed 2499.68 samples/sec Loss 5.8340 LearningRate 0.000925 Epoch: 5 Global Step: 111460 Fp16 Grad Scale: 131072 Required: 164 hours Training: 2022-07-06 15:54:59,609-Speed 2496.02 samples/sec Loss 5.8188 LearningRate 0.000925 Epoch: 5 Global Step: 111470 Fp16 Grad Scale: 131072 Required: 164 hours Training: 2022-07-06 15:55:07,764-Speed 2511.84 samples/sec Loss 5.7920 LearningRate 0.000925 Epoch: 5 Global Step: 111480 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:55:15,909-Speed 2514.84 samples/sec Loss 5.7138 LearningRate 0.000925 Epoch: 5 Global Step: 111490 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:55:24,107-Speed 2498.52 samples/sec Loss 
5.6914 LearningRate 0.000925 Epoch: 5 Global Step: 111500 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:55:32,303-Speed 2499.33 samples/sec Loss 5.7699 LearningRate 0.000925 Epoch: 5 Global Step: 111510 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:55:40,500-Speed 2498.78 samples/sec Loss 5.7305 LearningRate 0.000925 Epoch: 5 Global Step: 111520 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:55:48,696-Speed 2499.33 samples/sec Loss 5.7650 LearningRate 0.000925 Epoch: 5 Global Step: 111530 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:55:56,895-Speed 2498.26 samples/sec Loss 5.7759 LearningRate 0.000925 Epoch: 5 Global Step: 111540 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:56:05,036-Speed 2515.88 samples/sec Loss 5.7399 LearningRate 0.000925 Epoch: 5 Global Step: 111550 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:56:13,232-Speed 2499.41 samples/sec Loss 5.7709 LearningRate 0.000925 Epoch: 5 Global Step: 111560 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:56:21,439-Speed 2495.75 samples/sec Loss 5.7380 LearningRate 0.000925 Epoch: 5 Global Step: 111570 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:56:29,635-Speed 2499.05 samples/sec Loss 5.8490 LearningRate 0.000925 Epoch: 5 Global Step: 111580 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:56:37,829-Speed 2499.90 samples/sec Loss 5.7672 LearningRate 0.000925 Epoch: 5 Global Step: 111590 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:56:46,034-Speed 2496.32 samples/sec Loss 5.6565 LearningRate 0.000925 Epoch: 5 Global Step: 111600 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:56:54,176-Speed 2515.97 samples/sec Loss 5.8157 LearningRate 0.000925 Epoch: 5 Global Step: 111610 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:57:02,376-Speed 2497.74 samples/sec Loss 5.8129 
LearningRate 0.000925 Epoch: 5 Global Step: 111620 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:57:10,582-Speed 2496.08 samples/sec Loss 5.8418 LearningRate 0.000925 Epoch: 5 Global Step: 111630 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:57:18,778-Speed 2499.24 samples/sec Loss 5.7389 LearningRate 0.000925 Epoch: 5 Global Step: 111640 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:57:26,995-Speed 2492.75 samples/sec Loss 5.7234 LearningRate 0.000925 Epoch: 5 Global Step: 111650 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:57:35,192-Speed 2499.03 samples/sec Loss 5.7802 LearningRate 0.000925 Epoch: 5 Global Step: 111660 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:57:43,335-Speed 2515.33 samples/sec Loss 5.7023 LearningRate 0.000925 Epoch: 5 Global Step: 111670 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:57:51,531-Speed 2499.33 samples/sec Loss 5.8034 LearningRate 0.000925 Epoch: 5 Global Step: 111680 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:57:59,727-Speed 2499.23 samples/sec Loss 5.7282 LearningRate 0.000925 Epoch: 5 Global Step: 111690 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:58:07,920-Speed 2499.96 samples/sec Loss 5.8266 LearningRate 0.000924 Epoch: 5 Global Step: 111700 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:58:16,116-Speed 2499.47 samples/sec Loss 5.8622 LearningRate 0.000924 Epoch: 5 Global Step: 111710 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:58:24,312-Speed 2499.25 samples/sec Loss 5.7911 LearningRate 0.000924 Epoch: 5 Global Step: 111720 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:58:32,470-Speed 2510.63 samples/sec Loss 5.6882 LearningRate 0.000924 Epoch: 5 Global Step: 111730 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:58:40,665-Speed 2499.54 samples/sec Loss 5.8357 
LearningRate 0.000924 Epoch: 5 Global Step: 111740 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 15:58:48,825-Speed 2510.75 samples/sec Loss 5.8602 LearningRate 0.000924 Epoch: 5 Global Step: 111750 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:58:57,022-Speed 2498.70 samples/sec Loss 5.7873 LearningRate 0.000924 Epoch: 5 Global Step: 111760 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:59:05,225-Speed 2497.00 samples/sec Loss 5.7613 LearningRate 0.000924 Epoch: 5 Global Step: 111770 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:59:13,424-Speed 2498.25 samples/sec Loss 5.7981 LearningRate 0.000924 Epoch: 5 Global Step: 111780 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:59:21,575-Speed 2513.23 samples/sec Loss 5.7522 LearningRate 0.000924 Epoch: 5 Global Step: 111790 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:59:29,776-Speed 2497.34 samples/sec Loss 5.7397 LearningRate 0.000924 Epoch: 5 Global Step: 111800 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:59:37,977-Speed 2497.69 samples/sec Loss 5.8161 LearningRate 0.000924 Epoch: 5 Global Step: 111810 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:59:46,178-Speed 2497.60 samples/sec Loss 5.8000 LearningRate 0.000924 Epoch: 5 Global Step: 111820 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 15:59:54,376-Speed 2498.55 samples/sec Loss 5.8372 LearningRate 0.000924 Epoch: 5 Global Step: 111830 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:00:02,575-Speed 2498.49 samples/sec Loss 5.9799 LearningRate 0.000924 Epoch: 5 Global Step: 111840 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:00:10,719-Speed 2515.07 samples/sec Loss 5.8331 LearningRate 0.000924 Epoch: 5 Global Step: 111850 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:00:18,910-Speed 2500.71 samples/sec Loss 5.8579 
LearningRate 0.000924 Epoch: 5 Global Step: 111860 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:00:27,106-Speed 2499.20 samples/sec Loss 5.8509 LearningRate 0.000924 Epoch: 5 Global Step: 111870 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:00:35,306-Speed 2498.05 samples/sec Loss 5.8843 LearningRate 0.000924 Epoch: 5 Global Step: 111880 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:00:43,507-Speed 2497.61 samples/sec Loss 5.7825 LearningRate 0.000924 Epoch: 5 Global Step: 111890 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:00:51,704-Speed 2498.67 samples/sec Loss 5.8632 LearningRate 0.000924 Epoch: 5 Global Step: 111900 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:00:59,847-Speed 2515.68 samples/sec Loss 5.8694 LearningRate 0.000924 Epoch: 5 Global Step: 111910 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:01:08,052-Speed 2496.44 samples/sec Loss 5.9619 LearningRate 0.000924 Epoch: 5 Global Step: 111920 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:01:16,250-Speed 2498.50 samples/sec Loss 5.9060 LearningRate 0.000924 Epoch: 5 Global Step: 111930 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:01:24,441-Speed 2500.70 samples/sec Loss 5.9218 LearningRate 0.000924 Epoch: 5 Global Step: 111940 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:01:32,640-Speed 2498.15 samples/sec Loss 5.8710 LearningRate 0.000924 Epoch: 5 Global Step: 111950 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:01:40,837-Speed 2498.96 samples/sec Loss 5.9321 LearningRate 0.000924 Epoch: 5 Global Step: 111960 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:01:48,979-Speed 2515.57 samples/sec Loss 5.8375 LearningRate 0.000924 Epoch: 5 Global Step: 111970 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:01:57,177-Speed 2498.60 samples/sec Loss 5.8905 
LearningRate 0.000924 Epoch: 5 Global Step: 111980 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:02:05,377-Speed 2497.92 samples/sec Loss 5.7864 LearningRate 0.000924 Epoch: 5 Global Step: 111990 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:02:13,578-Speed 2497.74 samples/sec Loss 5.7569 LearningRate 0.000924 Epoch: 5 Global Step: 112000 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:02:21,781-Speed 2497.00 samples/sec Loss 5.8547 LearningRate 0.000924 Epoch: 5 Global Step: 112010 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:02:29,979-Speed 2498.65 samples/sec Loss 5.7829 LearningRate 0.000924 Epoch: 5 Global Step: 112020 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:02:38,125-Speed 2514.60 samples/sec Loss 5.7668 LearningRate 0.000924 Epoch: 5 Global Step: 112030 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:02:46,325-Speed 2498.18 samples/sec Loss 5.7785 LearningRate 0.000924 Epoch: 5 Global Step: 112040 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:02:54,527-Speed 2497.61 samples/sec Loss 5.7463 LearningRate 0.000924 Epoch: 5 Global Step: 112050 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:03:02,728-Speed 2497.59 samples/sec Loss 5.7820 LearningRate 0.000924 Epoch: 5 Global Step: 112060 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:03:10,926-Speed 2498.80 samples/sec Loss 5.7867 LearningRate 0.000924 Epoch: 5 Global Step: 112070 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:03:19,127-Speed 2497.69 samples/sec Loss 5.6745 LearningRate 0.000924 Epoch: 5 Global Step: 112080 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:03:27,273-Speed 2514.40 samples/sec Loss 5.7715 LearningRate 0.000923 Epoch: 5 Global Step: 112090 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:03:35,472-Speed 2498.45 samples/sec Loss 5.7755 
LearningRate 0.000923 Epoch: 5 Global Step: 112100 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:03:43,671-Speed 2498.05 samples/sec Loss 5.7369 LearningRate 0.000923 Epoch: 5 Global Step: 112110 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:03:51,872-Speed 2497.88 samples/sec Loss 5.7785 LearningRate 0.000923 Epoch: 5 Global Step: 112120 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:04:00,070-Speed 2498.37 samples/sec Loss 5.7476 LearningRate 0.000923 Epoch: 5 Global Step: 112130 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:04:08,270-Speed 2498.15 samples/sec Loss 5.8749 LearningRate 0.000923 Epoch: 5 Global Step: 112140 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:04:16,413-Speed 2515.62 samples/sec Loss 5.8055 LearningRate 0.000923 Epoch: 5 Global Step: 112150 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:04:24,610-Speed 2498.76 samples/sec Loss 5.7276 LearningRate 0.000923 Epoch: 5 Global Step: 112160 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:04:32,804-Speed 2499.61 samples/sec Loss 5.8206 LearningRate 0.000923 Epoch: 5 Global Step: 112170 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:04:40,998-Speed 2499.80 samples/sec Loss 5.8541 LearningRate 0.000923 Epoch: 5 Global Step: 112180 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:04:49,195-Speed 2498.94 samples/sec Loss 5.8147 LearningRate 0.000923 Epoch: 5 Global Step: 112190 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:04:57,393-Speed 2498.57 samples/sec Loss 5.7735 LearningRate 0.000923 Epoch: 5 Global Step: 112200 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:05:05,536-Speed 2515.62 samples/sec Loss 5.8767 LearningRate 0.000923 Epoch: 5 Global Step: 112210 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:05:13,732-Speed 2498.87 samples/sec Loss 5.7699 
LearningRate 0.000923 Epoch: 5 Global Step: 112220 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:05:21,929-Speed 2498.93 samples/sec Loss 5.8716 LearningRate 0.000923 Epoch: 5 Global Step: 112230 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:05:30,128-Speed 2498.53 samples/sec Loss 5.8215 LearningRate 0.000923 Epoch: 5 Global Step: 112240 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:05:38,321-Speed 2499.88 samples/sec Loss 5.7472 LearningRate 0.000923 Epoch: 5 Global Step: 112250 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:05:46,515-Speed 2499.74 samples/sec Loss 5.8146 LearningRate 0.000923 Epoch: 5 Global Step: 112260 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:05:54,660-Speed 2514.97 samples/sec Loss 5.8196 LearningRate 0.000923 Epoch: 5 Global Step: 112270 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:06:02,858-Speed 2498.34 samples/sec Loss 5.7193 LearningRate 0.000923 Epoch: 5 Global Step: 112280 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:06:11,062-Speed 2496.87 samples/sec Loss 5.7651 LearningRate 0.000923 Epoch: 5 Global Step: 112290 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:06:19,255-Speed 2500.09 samples/sec Loss 5.6754 LearningRate 0.000923 Epoch: 5 Global Step: 112300 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:06:27,453-Speed 2498.45 samples/sec Loss 5.7728 LearningRate 0.000923 Epoch: 5 Global Step: 112310 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:06:35,661-Speed 2495.70 samples/sec Loss 5.7732 LearningRate 0.000923 Epoch: 5 Global Step: 112320 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:06:43,802-Speed 2516.00 samples/sec Loss 5.7860 LearningRate 0.000923 Epoch: 5 Global Step: 112330 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:06:51,996-Speed 2499.82 samples/sec Loss 5.7908 
LearningRate 0.000923 Epoch: 5 Global Step: 112340 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:07:00,195-Speed 2498.57 samples/sec Loss 5.7108 LearningRate 0.000923 Epoch: 5 Global Step: 112350 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:07:08,391-Speed 2499.12 samples/sec Loss 5.7370 LearningRate 0.000923 Epoch: 5 Global Step: 112360 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:07:16,588-Speed 2498.83 samples/sec Loss 5.7503 LearningRate 0.000923 Epoch: 5 Global Step: 112370 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:07:24,799-Speed 2494.47 samples/sec Loss 5.7778 LearningRate 0.000923 Epoch: 5 Global Step: 112380 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:07:32,944-Speed 2514.81 samples/sec Loss 5.7690 LearningRate 0.000923 Epoch: 5 Global Step: 112390 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:07:41,140-Speed 2499.22 samples/sec Loss 5.8074 LearningRate 0.000923 Epoch: 5 Global Step: 112400 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:07:49,335-Speed 2499.53 samples/sec Loss 5.7858 LearningRate 0.000923 Epoch: 5 Global Step: 112410 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:07:57,533-Speed 2498.48 samples/sec Loss 5.7433 LearningRate 0.000923 Epoch: 5 Global Step: 112420 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:08:05,731-Speed 2498.57 samples/sec Loss 5.7944 LearningRate 0.000923 Epoch: 5 Global Step: 112430 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:08:13,924-Speed 2500.06 samples/sec Loss 5.7613 LearningRate 0.000923 Epoch: 5 Global Step: 112440 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:08:22,067-Speed 2515.54 samples/sec Loss 5.7729 LearningRate 0.000923 Epoch: 5 Global Step: 112450 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:08:30,263-Speed 2499.20 samples/sec Loss 5.7883 
LearningRate 0.000923 Epoch: 5 Global Step: 112460 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:08:38,475-Speed 2494.27 samples/sec Loss 5.7446 LearningRate 0.000923 Epoch: 5 Global Step: 112470 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:08:46,676-Speed 2497.83 samples/sec Loss 5.7305 LearningRate 0.000922 Epoch: 5 Global Step: 112480 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:08:54,874-Speed 2498.55 samples/sec Loss 5.6810 LearningRate 0.000922 Epoch: 5 Global Step: 112490 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:09:03,070-Speed 2499.06 samples/sec Loss 5.7979 LearningRate 0.000922 Epoch: 5 Global Step: 112500 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:09:11,220-Speed 2513.45 samples/sec Loss 5.7317 LearningRate 0.000922 Epoch: 5 Global Step: 112510 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:09:19,418-Speed 2498.56 samples/sec Loss 5.7470 LearningRate 0.000922 Epoch: 5 Global Step: 112520 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:09:27,614-Speed 2499.20 samples/sec Loss 5.7089 LearningRate 0.000922 Epoch: 5 Global Step: 112530 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:09:35,810-Speed 2498.92 samples/sec Loss 5.7533 LearningRate 0.000922 Epoch: 5 Global Step: 112540 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:09:44,006-Speed 2499.28 samples/sec Loss 5.7953 LearningRate 0.000922 Epoch: 5 Global Step: 112550 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:09:52,201-Speed 2499.56 samples/sec Loss 5.7964 LearningRate 0.000922 Epoch: 5 Global Step: 112560 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:10:00,346-Speed 2514.86 samples/sec Loss 5.7059 LearningRate 0.000922 Epoch: 5 Global Step: 112570 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:10:08,540-Speed 2499.86 samples/sec Loss 5.6986 
LearningRate 0.000922 Epoch: 5 Global Step: 112580 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:10:16,735-Speed 2499.47 samples/sec Loss 5.8397 LearningRate 0.000922 Epoch: 5 Global Step: 112590 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:10:24,937-Speed 2497.45 samples/sec Loss 5.7635 LearningRate 0.000922 Epoch: 5 Global Step: 112600 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:10:33,127-Speed 2500.83 samples/sec Loss 5.8123 LearningRate 0.000922 Epoch: 5 Global Step: 112610 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:10:41,322-Speed 2499.55 samples/sec Loss 5.5950 LearningRate 0.000922 Epoch: 5 Global Step: 112620 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:10:49,466-Speed 2515.26 samples/sec Loss 5.6922 LearningRate 0.000922 Epoch: 5 Global Step: 112630 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:10:57,660-Speed 2499.67 samples/sec Loss 5.6651 LearningRate 0.000922 Epoch: 5 Global Step: 112640 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:11:05,865-Speed 2496.49 samples/sec Loss 5.7505 LearningRate 0.000922 Epoch: 5 Global Step: 112650 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:11:14,064-Speed 2498.07 samples/sec Loss 5.7092 LearningRate 0.000922 Epoch: 5 Global Step: 112660 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:11:22,267-Speed 2497.19 samples/sec Loss 5.6699 LearningRate 0.000922 Epoch: 5 Global Step: 112670 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:11:30,463-Speed 2499.10 samples/sec Loss 5.7125 LearningRate 0.000922 Epoch: 5 Global Step: 112680 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:11:38,602-Speed 2516.46 samples/sec Loss 5.8305 LearningRate 0.000922 Epoch: 5 Global Step: 112690 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:11:46,798-Speed 2499.26 samples/sec Loss 5.7334 
LearningRate 0.000922 Epoch: 5 Global Step: 112700 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:11:54,997-Speed 2498.33 samples/sec Loss 5.7827 LearningRate 0.000922 Epoch: 5 Global Step: 112710 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:12:03,193-Speed 2499.34 samples/sec Loss 5.8254 LearningRate 0.000922 Epoch: 5 Global Step: 112720 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:12:11,395-Speed 2497.39 samples/sec Loss 5.8489 LearningRate 0.000922 Epoch: 5 Global Step: 112730 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:12:19,590-Speed 2499.40 samples/sec Loss 5.7797 LearningRate 0.000922 Epoch: 5 Global Step: 112740 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:12:27,733-Speed 2515.47 samples/sec Loss 5.7987 LearningRate 0.000922 Epoch: 5 Global Step: 112750 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:12:35,929-Speed 2499.12 samples/sec Loss 5.7208 LearningRate 0.000922 Epoch: 5 Global Step: 112760 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:12:44,131-Speed 2497.46 samples/sec Loss 5.6422 LearningRate 0.000922 Epoch: 5 Global Step: 112770 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:12:52,329-Speed 2498.64 samples/sec Loss 5.7115 LearningRate 0.000922 Epoch: 5 Global Step: 112780 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:13:00,526-Speed 2499.08 samples/sec Loss 5.6782 LearningRate 0.000922 Epoch: 5 Global Step: 112790 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:13:08,725-Speed 2498.24 samples/sec Loss 5.7599 LearningRate 0.000922 Epoch: 5 Global Step: 112800 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:13:16,872-Speed 2514.04 samples/sec Loss 5.6967 LearningRate 0.000922 Epoch: 5 Global Step: 112810 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:13:25,066-Speed 2499.78 samples/sec Loss 5.6646 
LearningRate 0.000922 Epoch: 5 Global Step: 112820 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:13:33,263-Speed 2499.01 samples/sec Loss 5.7300 LearningRate 0.000922 Epoch: 5 Global Step: 112830 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:13:41,457-Speed 2499.62 samples/sec Loss 5.7070 LearningRate 0.000922 Epoch: 5 Global Step: 112840 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:13:49,653-Speed 2499.06 samples/sec Loss 5.6398 LearningRate 0.000922 Epoch: 5 Global Step: 112850 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:13:57,850-Speed 2498.71 samples/sec Loss 5.7188 LearningRate 0.000922 Epoch: 5 Global Step: 112860 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:14:05,992-Speed 2516.05 samples/sec Loss 5.8153 LearningRate 0.000921 Epoch: 5 Global Step: 112870 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:14:14,188-Speed 2498.94 samples/sec Loss 5.7127 LearningRate 0.000921 Epoch: 5 Global Step: 112880 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:14:22,384-Speed 2499.40 samples/sec Loss 5.6924 LearningRate 0.000921 Epoch: 5 Global Step: 112890 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:14:30,584-Speed 2498.03 samples/sec Loss 5.7792 LearningRate 0.000921 Epoch: 5 Global Step: 112900 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:14:38,778-Speed 2499.66 samples/sec Loss 5.6459 LearningRate 0.000921 Epoch: 5 Global Step: 112910 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:14:46,979-Speed 2497.83 samples/sec Loss 5.6712 LearningRate 0.000921 Epoch: 5 Global Step: 112920 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:14:55,126-Speed 2514.15 samples/sec Loss 5.7807 LearningRate 0.000921 Epoch: 5 Global Step: 112930 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:15:03,321-Speed 2499.70 samples/sec Loss 5.7018 
LearningRate 0.000921 Epoch: 5 Global Step: 112940 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:15:11,515-Speed 2499.63 samples/sec Loss 5.7414 LearningRate 0.000921 Epoch: 5 Global Step: 112950 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:15:19,727-Speed 2494.58 samples/sec Loss 5.6904 LearningRate 0.000921 Epoch: 5 Global Step: 112960 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:15:27,930-Speed 2496.90 samples/sec Loss 5.7430 LearningRate 0.000921 Epoch: 5 Global Step: 112970 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:15:36,128-Speed 2498.83 samples/sec Loss 5.7097 LearningRate 0.000921 Epoch: 5 Global Step: 112980 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:15:44,272-Speed 2515.31 samples/sec Loss 5.7351 LearningRate 0.000921 Epoch: 5 Global Step: 112990 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:15:52,468-Speed 2499.07 samples/sec Loss 5.7957 LearningRate 0.000921 Epoch: 5 Global Step: 113000 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:16:00,661-Speed 2499.97 samples/sec Loss 5.7217 LearningRate 0.000921 Epoch: 5 Global Step: 113010 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:16:08,856-Speed 2499.64 samples/sec Loss 5.7099 LearningRate 0.000921 Epoch: 5 Global Step: 113020 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:16:17,061-Speed 2496.52 samples/sec Loss 5.6944 LearningRate 0.000921 Epoch: 5 Global Step: 113030 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:16:25,257-Speed 2499.24 samples/sec Loss 5.6342 LearningRate 0.000921 Epoch: 5 Global Step: 113040 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:16:33,400-Speed 2515.59 samples/sec Loss 5.8466 LearningRate 0.000921 Epoch: 5 Global Step: 113050 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:16:41,598-Speed 2498.65 samples/sec Loss 5.7528 
LearningRate 0.000921 Epoch: 5 Global Step: 113060 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:16:49,793-Speed 2499.46 samples/sec Loss 5.7715 LearningRate 0.000921 Epoch: 5 Global Step: 113070 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:16:57,991-Speed 2498.57 samples/sec Loss 5.7935 LearningRate 0.000921 Epoch: 5 Global Step: 113080 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:17:06,187-Speed 2499.24 samples/sec Loss 5.7288 LearningRate 0.000921 Epoch: 5 Global Step: 113090 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:17:14,385-Speed 2498.62 samples/sec Loss 5.8654 LearningRate 0.000921 Epoch: 5 Global Step: 113100 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:17:22,530-Speed 2514.73 samples/sec Loss 5.7576 LearningRate 0.000921 Epoch: 5 Global Step: 113110 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:17:30,724-Speed 2499.59 samples/sec Loss 5.8132 LearningRate 0.000921 Epoch: 5 Global Step: 113120 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:17:38,920-Speed 2499.43 samples/sec Loss 5.7401 LearningRate 0.000921 Epoch: 5 Global Step: 113130 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:17:47,114-Speed 2499.55 samples/sec Loss 5.6838 LearningRate 0.000921 Epoch: 5 Global Step: 113140 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:17:55,310-Speed 2499.43 samples/sec Loss 5.6576 LearningRate 0.000921 Epoch: 5 Global Step: 113150 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:18:03,508-Speed 2498.50 samples/sec Loss 5.7878 LearningRate 0.000921 Epoch: 5 Global Step: 113160 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:18:11,652-Speed 2515.23 samples/sec Loss 5.7624 LearningRate 0.000921 Epoch: 5 Global Step: 113170 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:18:19,848-Speed 2499.16 samples/sec Loss 5.7639 
LearningRate 0.000921 Epoch: 5 Global Step: 113180 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:18:28,050-Speed 2497.40 samples/sec Loss 5.7567 LearningRate 0.000921 Epoch: 5 Global Step: 113190 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:18:36,248-Speed 2498.43 samples/sec Loss 5.7346 LearningRate 0.000921 Epoch: 5 Global Step: 113200 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:18:44,446-Speed 2498.78 samples/sec Loss 5.7338 LearningRate 0.000921 Epoch: 5 Global Step: 113210 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:18:52,645-Speed 2498.41 samples/sec Loss 5.6996 LearningRate 0.000921 Epoch: 5 Global Step: 113220 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:19:00,787-Speed 2515.66 samples/sec Loss 5.7113 LearningRate 0.000921 Epoch: 5 Global Step: 113230 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:19:08,982-Speed 2499.44 samples/sec Loss 5.6698 LearningRate 0.000921 Epoch: 5 Global Step: 113240 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:19:17,181-Speed 2498.50 samples/sec Loss 5.6723 LearningRate 0.000920 Epoch: 5 Global Step: 113250 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:19:25,387-Speed 2495.97 samples/sec Loss 5.7308 LearningRate 0.000920 Epoch: 5 Global Step: 113260 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:19:33,582-Speed 2499.41 samples/sec Loss 5.6983 LearningRate 0.000920 Epoch: 5 Global Step: 113270 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:19:41,779-Speed 2499.20 samples/sec Loss 5.6747 LearningRate 0.000920 Epoch: 5 Global Step: 113280 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:19:49,920-Speed 2516.22 samples/sec Loss 5.7771 LearningRate 0.000920 Epoch: 5 Global Step: 113290 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:19:58,120-Speed 2497.90 samples/sec Loss 5.7282 
LearningRate 0.000920 Epoch: 5 Global Step: 113300 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:20:06,317-Speed 2499.04 samples/sec Loss 5.7960 LearningRate 0.000920 Epoch: 5 Global Step: 113310 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:20:14,515-Speed 2498.57 samples/sec Loss 5.7364 LearningRate 0.000920 Epoch: 5 Global Step: 113320 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:20:22,725-Speed 2495.06 samples/sec Loss 5.7211 LearningRate 0.000920 Epoch: 5 Global Step: 113330 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:20:30,922-Speed 2498.96 samples/sec Loss 5.7206 LearningRate 0.000920 Epoch: 5 Global Step: 113340 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:20:39,067-Speed 2514.74 samples/sec Loss 5.8859 LearningRate 0.000920 Epoch: 5 Global Step: 113350 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:20:47,262-Speed 2499.67 samples/sec Loss 5.7045 LearningRate 0.000920 Epoch: 5 Global Step: 113360 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:20:55,455-Speed 2499.92 samples/sec Loss 5.7350 LearningRate 0.000920 Epoch: 5 Global Step: 113370 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:21:03,650-Speed 2499.39 samples/sec Loss 5.7245 LearningRate 0.000920 Epoch: 5 Global Step: 113380 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:21:11,858-Speed 2495.82 samples/sec Loss 5.7289 LearningRate 0.000920 Epoch: 5 Global Step: 113390 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:21:20,055-Speed 2498.72 samples/sec Loss 5.6938 LearningRate 0.000920 Epoch: 5 Global Step: 113400 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:21:28,201-Speed 2514.37 samples/sec Loss 5.8092 LearningRate 0.000920 Epoch: 5 Global Step: 113410 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:21:36,395-Speed 2500.23 samples/sec Loss 5.6848 
LearningRate 0.000920 Epoch: 5 Global Step: 113420 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:21:44,593-Speed 2498.66 samples/sec Loss 5.7326 LearningRate 0.000920 Epoch: 5 Global Step: 113430 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:21:52,792-Speed 2498.30 samples/sec Loss 5.7305 LearningRate 0.000920 Epoch: 5 Global Step: 113440 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:22:00,988-Speed 2499.11 samples/sec Loss 5.7365 LearningRate 0.000920 Epoch: 5 Global Step: 113450 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:22:09,186-Speed 2498.44 samples/sec Loss 5.6916 LearningRate 0.000920 Epoch: 5 Global Step: 113460 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:22:17,335-Speed 2513.99 samples/sec Loss 5.7249 LearningRate 0.000920 Epoch: 5 Global Step: 113470 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:22:25,528-Speed 2499.96 samples/sec Loss 5.6547 LearningRate 0.000920 Epoch: 5 Global Step: 113480 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:22:33,736-Speed 2495.67 samples/sec Loss 5.6033 LearningRate 0.000920 Epoch: 5 Global Step: 113490 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:22:41,932-Speed 2499.20 samples/sec Loss 5.5981 LearningRate 0.000920 Epoch: 5 Global Step: 113500 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:22:50,130-Speed 2498.56 samples/sec Loss 5.6496 LearningRate 0.000920 Epoch: 5 Global Step: 113510 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:22:58,340-Speed 2495.48 samples/sec Loss 5.6764 LearningRate 0.000920 Epoch: 5 Global Step: 113520 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:23:06,483-Speed 2515.30 samples/sec Loss 5.5878 LearningRate 0.000920 Epoch: 5 Global Step: 113530 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:23:14,685-Speed 2497.14 samples/sec Loss 5.6851 
LearningRate 0.000920 Epoch: 5 Global Step: 113540 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:23:22,893-Speed 2495.46 samples/sec Loss 5.6329 LearningRate 0.000920 Epoch: 5 Global Step: 113550 Fp16 Grad Scale: 65536 Required: 164 hours Training: 2022-07-06 16:23:31,061-Speed 2508.04 samples/sec Loss 5.8380 LearningRate 0.000920 Epoch: 5 Global Step: 113560 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:23:39,257-Speed 2499.05 samples/sec Loss 5.7804 LearningRate 0.000920 Epoch: 5 Global Step: 113570 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:23:47,454-Speed 2499.06 samples/sec Loss 5.7513 LearningRate 0.000920 Epoch: 5 Global Step: 113580 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:23:55,600-Speed 2514.37 samples/sec Loss 5.6761 LearningRate 0.000920 Epoch: 5 Global Step: 113590 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:24:03,799-Speed 2498.49 samples/sec Loss 5.7444 LearningRate 0.000920 Epoch: 5 Global Step: 113600 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:24:11,995-Speed 2499.37 samples/sec Loss 5.8964 LearningRate 0.000920 Epoch: 5 Global Step: 113610 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:24:20,203-Speed 2495.58 samples/sec Loss 5.8094 LearningRate 0.000920 Epoch: 5 Global Step: 113620 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:24:28,400-Speed 2499.01 samples/sec Loss 5.6825 LearningRate 0.000920 Epoch: 5 Global Step: 113630 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:24:36,597-Speed 2498.80 samples/sec Loss 5.6989 LearningRate 0.000919 Epoch: 5 Global Step: 113640 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:24:44,739-Speed 2515.66 samples/sec Loss 5.7091 LearningRate 0.000919 Epoch: 5 Global Step: 113650 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:24:52,936-Speed 2498.84 samples/sec Loss 5.6513 
LearningRate 0.000919 Epoch: 5 Global Step: 113660 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:25:01,131-Speed 2499.44 samples/sec Loss 5.6613 LearningRate 0.000919 Epoch: 5 Global Step: 113670 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:25:09,330-Speed 2498.65 samples/sec Loss 5.6479 LearningRate 0.000919 Epoch: 5 Global Step: 113680 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:25:17,528-Speed 2498.44 samples/sec Loss 5.6571 LearningRate 0.000919 Epoch: 5 Global Step: 113690 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:25:25,723-Speed 2500.01 samples/sec Loss 5.7140 LearningRate 0.000919 Epoch: 5 Global Step: 113700 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:25:33,865-Speed 2515.69 samples/sec Loss 5.7038 LearningRate 0.000919 Epoch: 5 Global Step: 113710 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:25:42,060-Speed 2499.62 samples/sec Loss 5.7544 LearningRate 0.000919 Epoch: 5 Global Step: 113720 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:25:50,261-Speed 2497.76 samples/sec Loss 5.7647 LearningRate 0.000919 Epoch: 5 Global Step: 113730 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:25:58,457-Speed 2499.06 samples/sec Loss 5.7839 LearningRate 0.000919 Epoch: 5 Global Step: 113740 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:26:06,657-Speed 2498.01 samples/sec Loss 5.7217 LearningRate 0.000919 Epoch: 5 Global Step: 113750 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:26:14,855-Speed 2498.52 samples/sec Loss 5.6623 LearningRate 0.000919 Epoch: 5 Global Step: 113760 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:26:22,999-Speed 2515.17 samples/sec Loss 5.7160 LearningRate 0.000919 Epoch: 5 Global Step: 113770 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:26:31,203-Speed 2496.52 samples/sec Loss 5.7229 
LearningRate 0.000919 Epoch: 5 Global Step: 113780 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:26:39,401-Speed 2498.68 samples/sec Loss 5.6396 LearningRate 0.000919 Epoch: 5 Global Step: 113790 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:26:47,597-Speed 2499.08 samples/sec Loss 5.6954 LearningRate 0.000919 Epoch: 5 Global Step: 113800 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:26:55,792-Speed 2499.74 samples/sec Loss 5.6758 LearningRate 0.000919 Epoch: 5 Global Step: 113810 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:27:03,989-Speed 2498.86 samples/sec Loss 5.7713 LearningRate 0.000919 Epoch: 5 Global Step: 113820 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:27:12,130-Speed 2515.98 samples/sec Loss 5.6253 LearningRate 0.000919 Epoch: 5 Global Step: 113830 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:27:20,328-Speed 2498.68 samples/sec Loss 5.7547 LearningRate 0.000919 Epoch: 5 Global Step: 113840 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:27:28,524-Speed 2499.33 samples/sec Loss 5.7459 LearningRate 0.000919 Epoch: 5 Global Step: 113850 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:27:36,723-Speed 2498.31 samples/sec Loss 5.7066 LearningRate 0.000919 Epoch: 5 Global Step: 113860 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:27:44,921-Speed 2498.59 samples/sec Loss 5.6261 LearningRate 0.000919 Epoch: 5 Global Step: 113870 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:27:53,117-Speed 2499.00 samples/sec Loss 5.6545 LearningRate 0.000919 Epoch: 5 Global Step: 113880 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:28:01,260-Speed 2515.47 samples/sec Loss 5.6347 LearningRate 0.000919 Epoch: 5 Global Step: 113890 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:28:09,457-Speed 2498.94 samples/sec Loss 5.6128 
LearningRate 0.000919 Epoch: 5 Global Step: 113900 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:28:17,654-Speed 2499.02 samples/sec Loss 5.7286 LearningRate 0.000919 Epoch: 5 Global Step: 113910 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:28:25,853-Speed 2498.28 samples/sec Loss 5.6987 LearningRate 0.000919 Epoch: 5 Global Step: 113920 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:28:34,046-Speed 2500.03 samples/sec Loss 5.7546 LearningRate 0.000919 Epoch: 5 Global Step: 113930 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:28:42,243-Speed 2498.96 samples/sec Loss 5.6978 LearningRate 0.000919 Epoch: 5 Global Step: 113940 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:28:50,388-Speed 2514.70 samples/sec Loss 5.7274 LearningRate 0.000919 Epoch: 5 Global Step: 113950 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:28:58,586-Speed 2498.59 samples/sec Loss 5.5993 LearningRate 0.000919 Epoch: 5 Global Step: 113960 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:29:06,784-Speed 2498.65 samples/sec Loss 5.6339 LearningRate 0.000919 Epoch: 5 Global Step: 113970 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:29:14,982-Speed 2498.66 samples/sec Loss 5.7404 LearningRate 0.000919 Epoch: 5 Global Step: 113980 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:29:23,184-Speed 2497.25 samples/sec Loss 5.7581 LearningRate 0.000919 Epoch: 5 Global Step: 113990 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:29:31,385-Speed 2497.50 samples/sec Loss 5.7548 LearningRate 0.000919 Epoch: 5 Global Step: 114000 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:29:39,534-Speed 2513.60 samples/sec Loss 5.7011 LearningRate 0.000919 Epoch: 5 Global Step: 114010 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:29:47,734-Speed 2498.14 samples/sec Loss 5.7639 
LearningRate 0.000919 Epoch: 5 Global Step: 114020 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:29:55,932-Speed 2498.50 samples/sec Loss 5.6515 LearningRate 0.000918 Epoch: 5 Global Step: 114030 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:30:04,134-Speed 2497.26 samples/sec Loss 5.6262 LearningRate 0.000918 Epoch: 5 Global Step: 114040 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:30:12,336-Speed 2497.41 samples/sec Loss 5.6294 LearningRate 0.000918 Epoch: 5 Global Step: 114050 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:30:20,544-Speed 2495.43 samples/sec Loss 5.6201 LearningRate 0.000918 Epoch: 5 Global Step: 114060 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:30:28,691-Speed 2514.37 samples/sec Loss 5.7039 LearningRate 0.000918 Epoch: 5 Global Step: 114070 Fp16 Grad Scale: 32768 Required: 164 hours Training: 2022-07-06 16:30:36,891-Speed 2498.04 samples/sec Loss 5.6220 LearningRate 0.000918 Epoch: 5 Global Step: 114080 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:30:45,089-Speed 2498.54 samples/sec Loss 5.7375 LearningRate 0.000918 Epoch: 5 Global Step: 114090 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:30:53,288-Speed 2498.12 samples/sec Loss 5.7025 LearningRate 0.000918 Epoch: 5 Global Step: 114100 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:31:01,489-Speed 2498.03 samples/sec Loss 5.6430 LearningRate 0.000918 Epoch: 5 Global Step: 114110 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:31:09,687-Speed 2498.44 samples/sec Loss 5.7311 LearningRate 0.000918 Epoch: 5 Global Step: 114120 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:31:17,835-Speed 2513.78 samples/sec Loss 5.6440 LearningRate 0.000918 Epoch: 5 Global Step: 114130 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:31:26,047-Speed 2494.37 samples/sec Loss 5.7240 
LearningRate 0.000918 Epoch: 5 Global Step: 114140 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:31:34,248-Speed 2497.41 samples/sec Loss 5.6801 LearningRate 0.000918 Epoch: 5 Global Step: 114150 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:31:42,448-Speed 2497.91 samples/sec Loss 5.6623 LearningRate 0.000918 Epoch: 5 Global Step: 114160 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:31:50,644-Speed 2499.02 samples/sec Loss 5.7600 LearningRate 0.000918 Epoch: 5 Global Step: 114170 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:31:58,842-Speed 2498.58 samples/sec Loss 5.6585 LearningRate 0.000918 Epoch: 5 Global Step: 114180 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:32:06,986-Speed 2515.15 samples/sec Loss 5.7204 LearningRate 0.000918 Epoch: 5 Global Step: 114190 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:32:15,185-Speed 2498.32 samples/sec Loss 5.6546 LearningRate 0.000918 Epoch: 5 Global Step: 114200 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:32:23,384-Speed 2498.44 samples/sec Loss 5.6784 LearningRate 0.000918 Epoch: 5 Global Step: 114210 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:32:31,585-Speed 2497.43 samples/sec Loss 5.6872 LearningRate 0.000918 Epoch: 5 Global Step: 114220 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:32:39,784-Speed 2498.46 samples/sec Loss 5.6559 LearningRate 0.000918 Epoch: 5 Global Step: 114230 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:32:47,982-Speed 2498.43 samples/sec Loss 5.6757 LearningRate 0.000918 Epoch: 5 Global Step: 114240 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:32:56,126-Speed 2515.18 samples/sec Loss 5.7258 LearningRate 0.000918 Epoch: 5 Global Step: 114250 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:33:04,332-Speed 2496.21 samples/sec Loss 5.6083 
LearningRate 0.000918 Epoch: 5 Global Step: 114260 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:33:12,524-Speed 2500.38 samples/sec Loss 5.6963 LearningRate 0.000918 Epoch: 5 Global Step: 114270 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:33:20,726-Speed 2497.50 samples/sec Loss 5.7397 LearningRate 0.000918 Epoch: 5 Global Step: 114280 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:33:28,921-Speed 2499.32 samples/sec Loss 5.7248 LearningRate 0.000918 Epoch: 5 Global Step: 114290 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:33:37,130-Speed 2495.15 samples/sec Loss 5.6614 LearningRate 0.000918 Epoch: 5 Global Step: 114300 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:33:45,273-Speed 2515.48 samples/sec Loss 5.6612 LearningRate 0.000918 Epoch: 5 Global Step: 114310 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:33:53,470-Speed 2499.09 samples/sec Loss 5.6150 LearningRate 0.000918 Epoch: 5 Global Step: 114320 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:34:01,664-Speed 2499.62 samples/sec Loss 5.6679 LearningRate 0.000918 Epoch: 5 Global Step: 114330 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:34:09,856-Speed 2500.55 samples/sec Loss 5.6542 LearningRate 0.000918 Epoch: 5 Global Step: 114340 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:34:18,049-Speed 2500.03 samples/sec Loss 5.7005 LearningRate 0.000918 Epoch: 5 Global Step: 114350 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:34:26,243-Speed 2499.93 samples/sec Loss 5.7028 LearningRate 0.000918 Epoch: 5 Global Step: 114360 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:34:34,386-Speed 2515.55 samples/sec Loss 5.7391 LearningRate 0.000918 Epoch: 5 Global Step: 114370 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:34:42,582-Speed 2498.95 samples/sec Loss 5.7202 
LearningRate 0.000918 Epoch: 5 Global Step: 114380 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:34:50,778-Speed 2499.55 samples/sec Loss 5.6689 LearningRate 0.000918 Epoch: 5 Global Step: 114390 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:34:58,971-Speed 2499.81 samples/sec Loss 5.7297 LearningRate 0.000918 Epoch: 5 Global Step: 114400 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:35:07,166-Speed 2499.45 samples/sec Loss 5.7480 LearningRate 0.000918 Epoch: 5 Global Step: 114410 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:35:15,372-Speed 2496.34 samples/sec Loss 5.6324 LearningRate 0.000917 Epoch: 5 Global Step: 114420 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:35:23,517-Speed 2514.77 samples/sec Loss 5.7510 LearningRate 0.000917 Epoch: 5 Global Step: 114430 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:35:31,716-Speed 2498.22 samples/sec Loss 5.6822 LearningRate 0.000917 Epoch: 5 Global Step: 114440 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:35:39,911-Speed 2499.62 samples/sec Loss 5.6785 LearningRate 0.000917 Epoch: 5 Global Step: 114450 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:35:48,105-Speed 2499.77 samples/sec Loss 5.6204 LearningRate 0.000917 Epoch: 5 Global Step: 114460 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:35:56,303-Speed 2498.91 samples/sec Loss 5.6362 LearningRate 0.000917 Epoch: 5 Global Step: 114470 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:36:04,504-Speed 2497.56 samples/sec Loss 5.7119 LearningRate 0.000917 Epoch: 5 Global Step: 114480 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:36:12,649-Speed 2514.66 samples/sec Loss 5.6339 LearningRate 0.000917 Epoch: 5 Global Step: 114490 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:36:20,847-Speed 2498.75 samples/sec Loss 5.6462 
LearningRate 0.000917 Epoch: 5 Global Step: 114500 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:36:29,045-Speed 2498.64 samples/sec Loss 5.7215 LearningRate 0.000917 Epoch: 5 Global Step: 114510 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:36:37,243-Speed 2498.33 samples/sec Loss 5.7638 LearningRate 0.000917 Epoch: 5 Global Step: 114520 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:36:45,440-Speed 2498.94 samples/sec Loss 5.6931 LearningRate 0.000917 Epoch: 5 Global Step: 114530 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:36:53,645-Speed 2496.32 samples/sec Loss 5.6388 LearningRate 0.000917 Epoch: 5 Global Step: 114540 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:37:01,792-Speed 2514.41 samples/sec Loss 5.6562 LearningRate 0.000917 Epoch: 5 Global Step: 114550 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:37:09,991-Speed 2498.41 samples/sec Loss 5.5904 LearningRate 0.000917 Epoch: 5 Global Step: 114560 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:37:18,197-Speed 2496.22 samples/sec Loss 5.6503 LearningRate 0.000917 Epoch: 5 Global Step: 114570 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:37:26,399-Speed 2497.36 samples/sec Loss 5.7230 LearningRate 0.000917 Epoch: 5 Global Step: 114580 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:37:34,596-Speed 2498.68 samples/sec Loss 5.6575 LearningRate 0.000917 Epoch: 5 Global Step: 114590 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:37:42,798-Speed 2497.34 samples/sec Loss 5.6704 LearningRate 0.000917 Epoch: 5 Global Step: 114600 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:37:50,945-Speed 2514.18 samples/sec Loss 5.7498 LearningRate 0.000917 Epoch: 5 Global Step: 114610 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:37:59,147-Speed 2497.23 samples/sec Loss 5.7496 
LearningRate 0.000917 Epoch: 5 Global Step: 114620 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:38:07,346-Speed 2498.56 samples/sec Loss 5.6948 LearningRate 0.000917 Epoch: 5 Global Step: 114630 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:38:15,547-Speed 2497.79 samples/sec Loss 5.6461 LearningRate 0.000917 Epoch: 5 Global Step: 114640 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:38:23,761-Speed 2493.48 samples/sec Loss 5.6529 LearningRate 0.000917 Epoch: 5 Global Step: 114650 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:38:31,970-Speed 2495.29 samples/sec Loss 5.7140 LearningRate 0.000917 Epoch: 5 Global Step: 114660 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:38:40,126-Speed 2511.60 samples/sec Loss 5.6013 LearningRate 0.000917 Epoch: 5 Global Step: 114670 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:38:48,326-Speed 2497.79 samples/sec Loss 5.6953 LearningRate 0.000917 Epoch: 5 Global Step: 114680 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:38:56,521-Speed 2499.45 samples/sec Loss 5.6736 LearningRate 0.000917 Epoch: 5 Global Step: 114690 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:39:04,721-Speed 2498.21 samples/sec Loss 5.6269 LearningRate 0.000917 Epoch: 5 Global Step: 114700 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:39:12,921-Speed 2498.18 samples/sec Loss 5.6465 LearningRate 0.000917 Epoch: 5 Global Step: 114710 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:39:21,122-Speed 2497.50 samples/sec Loss 5.6667 LearningRate 0.000917 Epoch: 5 Global Step: 114720 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:39:29,265-Speed 2515.31 samples/sec Loss 5.6371 LearningRate 0.000917 Epoch: 5 Global Step: 114730 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:39:37,460-Speed 2499.53 samples/sec Loss 5.6008 
LearningRate 0.000917 Epoch: 5 Global Step: 114740 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:39:45,661-Speed 2497.49 samples/sec Loss 5.6036 LearningRate 0.000917 Epoch: 5 Global Step: 114750 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:39:53,857-Speed 2499.16 samples/sec Loss 5.6098 LearningRate 0.000917 Epoch: 5 Global Step: 114760 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:40:02,052-Speed 2499.56 samples/sec Loss 5.6542 LearningRate 0.000917 Epoch: 5 Global Step: 114770 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:40:10,247-Speed 2499.65 samples/sec Loss 5.7281 LearningRate 0.000917 Epoch: 5 Global Step: 114780 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:40:18,387-Speed 2516.22 samples/sec Loss 5.7921 LearningRate 0.000917 Epoch: 5 Global Step: 114790 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:40:26,580-Speed 2500.02 samples/sec Loss 5.6626 LearningRate 0.000917 Epoch: 5 Global Step: 114800 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:40:34,789-Speed 2495.44 samples/sec Loss 5.6761 LearningRate 0.000916 Epoch: 5 Global Step: 114810 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:40:42,982-Speed 2499.88 samples/sec Loss 5.6614 LearningRate 0.000916 Epoch: 5 Global Step: 114820 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:40:51,176-Speed 2500.01 samples/sec Loss 5.6542 LearningRate 0.000916 Epoch: 5 Global Step: 114830 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:40:59,373-Speed 2498.75 samples/sec Loss 5.7564 LearningRate 0.000916 Epoch: 5 Global Step: 114840 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:41:07,515-Speed 2515.88 samples/sec Loss 5.6472 LearningRate 0.000916 Epoch: 5 Global Step: 114850 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:41:15,711-Speed 2499.07 samples/sec Loss 5.5881 
LearningRate 0.000916 Epoch: 5 Global Step: 114860 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:41:23,905-Speed 2499.81 samples/sec Loss 5.6025 LearningRate 0.000916 Epoch: 5 Global Step: 114870 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:41:32,108-Speed 2496.78 samples/sec Loss 5.6149 LearningRate 0.000916 Epoch: 5 Global Step: 114880 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:41:40,305-Speed 2498.88 samples/sec Loss 5.6598 LearningRate 0.000916 Epoch: 5 Global Step: 114890 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:41:48,505-Speed 2497.97 samples/sec Loss 5.8112 LearningRate 0.000916 Epoch: 5 Global Step: 114900 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:41:56,651-Speed 2514.70 samples/sec Loss 5.6881 LearningRate 0.000916 Epoch: 5 Global Step: 114910 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:42:04,845-Speed 2499.71 samples/sec Loss 5.6980 LearningRate 0.000916 Epoch: 5 Global Step: 114920 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:42:13,038-Speed 2500.02 samples/sec Loss 5.7172 LearningRate 0.000916 Epoch: 5 Global Step: 114930 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:42:21,235-Speed 2498.93 samples/sec Loss 5.7147 LearningRate 0.000916 Epoch: 5 Global Step: 114940 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:42:29,429-Speed 2499.64 samples/sec Loss 5.6814 LearningRate 0.000916 Epoch: 5 Global Step: 114950 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:42:37,623-Speed 2499.85 samples/sec Loss 5.7352 LearningRate 0.000916 Epoch: 5 Global Step: 114960 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:42:45,770-Speed 2514.46 samples/sec Loss 5.6983 LearningRate 0.000916 Epoch: 5 Global Step: 114970 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:42:53,964-Speed 2499.84 samples/sec Loss 5.7020 
LearningRate 0.000916 Epoch: 5 Global Step: 114980 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:43:02,160-Speed 2499.12 samples/sec Loss 5.6824 LearningRate 0.000916 Epoch: 5 Global Step: 114990 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:43:10,355-Speed 2499.55 samples/sec Loss 5.6162 LearningRate 0.000916 Epoch: 5 Global Step: 115000 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:43:18,549-Speed 2499.90 samples/sec Loss 5.6211 LearningRate 0.000916 Epoch: 5 Global Step: 115010 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:43:26,747-Speed 2498.49 samples/sec Loss 5.7012 LearningRate 0.000916 Epoch: 5 Global Step: 115020 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:43:34,887-Speed 2516.25 samples/sec Loss 5.5788 LearningRate 0.000916 Epoch: 5 Global Step: 115030 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:43:43,079-Speed 2500.29 samples/sec Loss 5.7051 LearningRate 0.000916 Epoch: 5 Global Step: 115040 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:43:51,275-Speed 2499.19 samples/sec Loss 5.7709 LearningRate 0.000916 Epoch: 5 Global Step: 115050 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:43:59,469-Speed 2499.89 samples/sec Loss 5.7734 LearningRate 0.000916 Epoch: 5 Global Step: 115060 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:44:07,664-Speed 2499.51 samples/sec Loss 5.7001 LearningRate 0.000916 Epoch: 5 Global Step: 115070 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:44:15,858-Speed 2499.81 samples/sec Loss 5.6507 LearningRate 0.000916 Epoch: 5 Global Step: 115080 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:44:23,999-Speed 2516.92 samples/sec Loss 5.5981 LearningRate 0.000916 Epoch: 5 Global Step: 115090 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:44:32,197-Speed 2498.71 samples/sec Loss 5.6344 
LearningRate 0.000916 Epoch: 5 Global Step: 115100 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:44:40,391-Speed 2499.78 samples/sec Loss 5.5750 LearningRate 0.000916 Epoch: 5 Global Step: 115110 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:44:48,585-Speed 2499.87 samples/sec Loss 5.6186 LearningRate 0.000916 Epoch: 5 Global Step: 115120 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:44:56,788-Speed 2497.17 samples/sec Loss 5.6153 LearningRate 0.000916 Epoch: 5 Global Step: 115130 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:45:04,986-Speed 2498.21 samples/sec Loss 5.5134 LearningRate 0.000916 Epoch: 5 Global Step: 115140 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:45:13,126-Speed 2516.48 samples/sec Loss 5.5468 LearningRate 0.000916 Epoch: 5 Global Step: 115150 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:45:21,327-Speed 2497.76 samples/sec Loss 5.6138 LearningRate 0.000916 Epoch: 5 Global Step: 115160 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:45:29,521-Speed 2499.60 samples/sec Loss 5.5826 LearningRate 0.000916 Epoch: 5 Global Step: 115170 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:45:37,716-Speed 2499.56 samples/sec Loss 5.6266 LearningRate 0.000916 Epoch: 5 Global Step: 115180 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:45:45,914-Speed 2498.73 samples/sec Loss 5.5850 LearningRate 0.000916 Epoch: 5 Global Step: 115190 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:45:54,111-Speed 2499.06 samples/sec Loss 5.6745 LearningRate 0.000915 Epoch: 5 Global Step: 115200 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:46:02,252-Speed 2516.12 samples/sec Loss 5.6320 LearningRate 0.000915 Epoch: 5 Global Step: 115210 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:46:10,459-Speed 2495.84 samples/sec Loss 5.6491 
LearningRate 0.000915 Epoch: 5 Global Step: 115220 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:46:18,653-Speed 2499.64 samples/sec Loss 5.6086 LearningRate 0.000915 Epoch: 5 Global Step: 115230 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:46:26,848-Speed 2499.69 samples/sec Loss 5.5872 LearningRate 0.000915 Epoch: 5 Global Step: 115240 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:46:35,041-Speed 2499.92 samples/sec Loss 5.6192 LearningRate 0.000915 Epoch: 5 Global Step: 115250 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:46:43,234-Speed 2500.14 samples/sec Loss 5.6257 LearningRate 0.000915 Epoch: 5 Global Step: 115260 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:46:51,374-Speed 2516.50 samples/sec Loss 5.5549 LearningRate 0.000915 Epoch: 5 Global Step: 115270 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:46:59,569-Speed 2499.43 samples/sec Loss 5.6100 LearningRate 0.000915 Epoch: 5 Global Step: 115280 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:47:07,767-Speed 2498.78 samples/sec Loss 5.7587 LearningRate 0.000915 Epoch: 5 Global Step: 115290 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:47:15,963-Speed 2499.04 samples/sec Loss 5.6792 LearningRate 0.000915 Epoch: 5 Global Step: 115300 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:47:24,178-Speed 2493.61 samples/sec Loss 5.7129 LearningRate 0.000915 Epoch: 5 Global Step: 115310 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:47:32,394-Speed 2493.26 samples/sec Loss 5.6962 LearningRate 0.000915 Epoch: 5 Global Step: 115320 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:47:40,536-Speed 2515.68 samples/sec Loss 5.6963 LearningRate 0.000915 Epoch: 5 Global Step: 115330 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:47:48,732-Speed 2499.32 samples/sec Loss 5.5736 
LearningRate 0.000915 Epoch: 5 Global Step: 115340 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:47:56,943-Speed 2494.74 samples/sec Loss 5.6423 LearningRate 0.000915 Epoch: 5 Global Step: 115350 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:48:05,149-Speed 2496.19 samples/sec Loss 5.6400 LearningRate 0.000915 Epoch: 5 Global Step: 115360 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:48:13,349-Speed 2498.08 samples/sec Loss 5.6514 LearningRate 0.000915 Epoch: 5 Global Step: 115370 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:48:21,548-Speed 2498.35 samples/sec Loss 5.5103 LearningRate 0.000915 Epoch: 5 Global Step: 115380 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:48:29,694-Speed 2514.52 samples/sec Loss 5.5470 LearningRate 0.000915 Epoch: 5 Global Step: 115390 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:48:37,893-Speed 2498.30 samples/sec Loss 5.6699 LearningRate 0.000915 Epoch: 5 Global Step: 115400 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:48:46,090-Speed 2498.71 samples/sec Loss 5.6078 LearningRate 0.000915 Epoch: 5 Global Step: 115410 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:48:54,289-Speed 2498.52 samples/sec Loss 5.5968 LearningRate 0.000915 Epoch: 5 Global Step: 115420 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:49:02,491-Speed 2497.24 samples/sec Loss 5.5973 LearningRate 0.000915 Epoch: 5 Global Step: 115430 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:49:10,690-Speed 2498.14 samples/sec Loss 5.6256 LearningRate 0.000915 Epoch: 5 Global Step: 115440 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:49:18,850-Speed 2510.18 samples/sec Loss 5.6094 LearningRate 0.000915 Epoch: 5 Global Step: 115450 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 16:49:27,012-Speed 2509.67 samples/sec Loss 5.5964 
LearningRate 0.000915 Epoch: 5 Global Step: 115460 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:49:35,218-Speed 2496.28 samples/sec Loss 5.6086 LearningRate 0.000915 Epoch: 5 Global Step: 115470 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:49:43,415-Speed 2498.76 samples/sec Loss 5.5383 LearningRate 0.000915 Epoch: 5 Global Step: 115480 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:49:51,612-Speed 2498.84 samples/sec Loss 5.5875 LearningRate 0.000915 Epoch: 5 Global Step: 115490 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:49:59,810-Speed 2498.73 samples/sec Loss 5.6312 LearningRate 0.000915 Epoch: 5 Global Step: 115500 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:50:07,955-Speed 2514.74 samples/sec Loss 5.5325 LearningRate 0.000915 Epoch: 5 Global Step: 115510 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:50:16,152-Speed 2498.95 samples/sec Loss 5.5810 LearningRate 0.000915 Epoch: 5 Global Step: 115520 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:50:24,350-Speed 2498.64 samples/sec Loss 5.5688 LearningRate 0.000915 Epoch: 5 Global Step: 115530 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:50:32,549-Speed 2498.45 samples/sec Loss 5.5592 LearningRate 0.000915 Epoch: 5 Global Step: 115540 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:50:40,748-Speed 2498.27 samples/sec Loss 5.5222 LearningRate 0.000915 Epoch: 5 Global Step: 115550 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:50:48,949-Speed 2497.65 samples/sec Loss 5.6496 LearningRate 0.000915 Epoch: 5 Global Step: 115560 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:50:57,095-Speed 2514.81 samples/sec Loss 5.5135 LearningRate 0.000915 Epoch: 5 Global Step: 115570 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:51:05,300-Speed 2496.43 samples/sec Loss 5.6220 
LearningRate 0.000915 Epoch: 5 Global Step: 115580 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:51:13,512-Speed 2494.22 samples/sec Loss 5.7065 LearningRate 0.000914 Epoch: 5 Global Step: 115590 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:51:21,730-Speed 2492.52 samples/sec Loss 5.6738 LearningRate 0.000914 Epoch: 5 Global Step: 115600 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:51:29,930-Speed 2497.78 samples/sec Loss 5.6137 LearningRate 0.000914 Epoch: 5 Global Step: 115610 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:51:38,127-Speed 2499.20 samples/sec Loss 5.6754 LearningRate 0.000914 Epoch: 5 Global Step: 115620 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:51:46,272-Speed 2514.88 samples/sec Loss 5.5767 LearningRate 0.000914 Epoch: 5 Global Step: 115630 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:51:54,466-Speed 2499.75 samples/sec Loss 5.6917 LearningRate 0.000914 Epoch: 5 Global Step: 115640 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:52:02,667-Speed 2497.77 samples/sec Loss 5.6119 LearningRate 0.000914 Epoch: 5 Global Step: 115650 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:52:10,869-Speed 2497.48 samples/sec Loss 5.6377 LearningRate 0.000914 Epoch: 5 Global Step: 115660 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:52:19,064-Speed 2499.74 samples/sec Loss 5.6346 LearningRate 0.000914 Epoch: 5 Global Step: 115670 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:52:27,259-Speed 2499.27 samples/sec Loss 5.6331 LearningRate 0.000914 Epoch: 5 Global Step: 115680 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:52:35,401-Speed 2515.99 samples/sec Loss 5.7392 LearningRate 0.000914 Epoch: 5 Global Step: 115690 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:52:43,597-Speed 2499.10 samples/sec Loss 5.6331 
LearningRate 0.000914 Epoch: 5 Global Step: 115700 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:52:51,798-Speed 2497.82 samples/sec Loss 5.6363 LearningRate 0.000914 Epoch: 5 Global Step: 115710 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:52:59,991-Speed 2499.97 samples/sec Loss 5.6047 LearningRate 0.000914 Epoch: 5 Global Step: 115720 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:53:08,198-Speed 2496.01 samples/sec Loss 5.6041 LearningRate 0.000914 Epoch: 5 Global Step: 115730 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:53:16,390-Speed 2500.63 samples/sec Loss 5.6666 LearningRate 0.000914 Epoch: 5 Global Step: 115740 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:53:24,535-Speed 2514.71 samples/sec Loss 5.5427 LearningRate 0.000914 Epoch: 5 Global Step: 115750 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:53:32,735-Speed 2498.16 samples/sec Loss 5.5996 LearningRate 0.000914 Epoch: 5 Global Step: 115760 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:53:40,933-Speed 2498.37 samples/sec Loss 5.5747 LearningRate 0.000914 Epoch: 5 Global Step: 115770 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:53:49,126-Speed 2500.02 samples/sec Loss 5.6191 LearningRate 0.000914 Epoch: 5 Global Step: 115780 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:53:57,324-Speed 2498.71 samples/sec Loss 5.7028 LearningRate 0.000914 Epoch: 5 Global Step: 115790 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:54:05,532-Speed 2495.73 samples/sec Loss 5.6637 LearningRate 0.000914 Epoch: 5 Global Step: 115800 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:54:13,672-Speed 2516.32 samples/sec Loss 5.6869 LearningRate 0.000914 Epoch: 5 Global Step: 115810 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:54:21,868-Speed 2499.23 samples/sec Loss 5.6327 
LearningRate 0.000914 Epoch: 5 Global Step: 115820 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:54:30,085-Speed 2492.92 samples/sec Loss 5.6160 LearningRate 0.000914 Epoch: 5 Global Step: 115830 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:54:38,281-Speed 2499.07 samples/sec Loss 5.5964 LearningRate 0.000914 Epoch: 5 Global Step: 115840 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:54:46,477-Speed 2499.44 samples/sec Loss 5.6120 LearningRate 0.000914 Epoch: 5 Global Step: 115850 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:54:54,674-Speed 2498.77 samples/sec Loss 5.5853 LearningRate 0.000914 Epoch: 5 Global Step: 115860 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:55:02,817-Speed 2515.40 samples/sec Loss 5.5991 LearningRate 0.000914 Epoch: 5 Global Step: 115870 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:55:11,014-Speed 2498.90 samples/sec Loss 5.6483 LearningRate 0.000914 Epoch: 5 Global Step: 115880 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:55:19,212-Speed 2498.66 samples/sec Loss 5.5002 LearningRate 0.000914 Epoch: 5 Global Step: 115890 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:55:27,406-Speed 2499.54 samples/sec Loss 5.5778 LearningRate 0.000914 Epoch: 5 Global Step: 115900 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:55:35,601-Speed 2499.54 samples/sec Loss 5.5215 LearningRate 0.000914 Epoch: 5 Global Step: 115910 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:55:43,796-Speed 2499.67 samples/sec Loss 5.5887 LearningRate 0.000914 Epoch: 5 Global Step: 115920 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:55:51,940-Speed 2515.17 samples/sec Loss 5.4634 LearningRate 0.000914 Epoch: 5 Global Step: 115930 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:56:00,138-Speed 2498.69 samples/sec Loss 5.5452 
LearningRate 0.000914 Epoch: 5 Global Step: 115940 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:56:08,341-Speed 2497.20 samples/sec Loss 5.5436 LearningRate 0.000914 Epoch: 5 Global Step: 115950 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:56:16,561-Speed 2491.91 samples/sec Loss 5.5550 LearningRate 0.000914 Epoch: 5 Global Step: 115960 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:56:24,757-Speed 2499.05 samples/sec Loss 5.4994 LearningRate 0.000914 Epoch: 5 Global Step: 115970 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:56:32,953-Speed 2499.39 samples/sec Loss 5.6099 LearningRate 0.000913 Epoch: 5 Global Step: 115980 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:56:41,095-Speed 2515.76 samples/sec Loss 5.6643 LearningRate 0.000913 Epoch: 5 Global Step: 115990 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:56:49,289-Speed 2500.07 samples/sec Loss 5.5611 LearningRate 0.000913 Epoch: 5 Global Step: 116000 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:56:57,485-Speed 2499.31 samples/sec Loss 5.5850 LearningRate 0.000913 Epoch: 5 Global Step: 116010 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:57:05,680-Speed 2499.28 samples/sec Loss 5.5738 LearningRate 0.000913 Epoch: 5 Global Step: 116020 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:57:13,876-Speed 2499.33 samples/sec Loss 5.5537 LearningRate 0.000913 Epoch: 5 Global Step: 116030 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:57:22,072-Speed 2499.18 samples/sec Loss 5.4888 LearningRate 0.000913 Epoch: 5 Global Step: 116040 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:57:30,216-Speed 2515.32 samples/sec Loss 5.5758 LearningRate 0.000913 Epoch: 5 Global Step: 116050 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:57:38,412-Speed 2499.20 samples/sec Loss 5.4995 
LearningRate 0.000913 Epoch: 5 Global Step: 116060 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:57:46,609-Speed 2498.77 samples/sec Loss 5.6336 LearningRate 0.000913 Epoch: 5 Global Step: 116070 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:57:54,806-Speed 2498.91 samples/sec Loss 5.6145 LearningRate 0.000913 Epoch: 5 Global Step: 116080 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:58:03,017-Speed 2494.67 samples/sec Loss 5.6305 LearningRate 0.000913 Epoch: 5 Global Step: 116090 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:58:11,215-Speed 2498.16 samples/sec Loss 5.6773 LearningRate 0.000913 Epoch: 5 Global Step: 116100 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:58:19,364-Speed 2513.86 samples/sec Loss 5.5480 LearningRate 0.000913 Epoch: 5 Global Step: 116110 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:58:27,560-Speed 2499.05 samples/sec Loss 5.5283 LearningRate 0.000913 Epoch: 5 Global Step: 116120 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:58:35,758-Speed 2498.47 samples/sec Loss 5.5506 LearningRate 0.000913 Epoch: 5 Global Step: 116130 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:58:43,960-Speed 2497.46 samples/sec Loss 5.5912 LearningRate 0.000913 Epoch: 5 Global Step: 116140 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:58:52,156-Speed 2499.24 samples/sec Loss 5.5654 LearningRate 0.000913 Epoch: 5 Global Step: 116150 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:59:00,352-Speed 2498.98 samples/sec Loss 5.4714 LearningRate 0.000913 Epoch: 5 Global Step: 116160 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:59:08,498-Speed 2514.63 samples/sec Loss 5.5439 LearningRate 0.000913 Epoch: 5 Global Step: 116170 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:59:16,696-Speed 2498.57 samples/sec Loss 5.4997 
LearningRate 0.000913 Epoch: 5 Global Step: 116180 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:59:24,895-Speed 2498.56 samples/sec Loss 5.6311 LearningRate 0.000913 Epoch: 5 Global Step: 116190 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:59:33,094-Speed 2498.37 samples/sec Loss 5.5811 LearningRate 0.000913 Epoch: 5 Global Step: 116200 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:59:41,294-Speed 2497.83 samples/sec Loss 5.6164 LearningRate 0.000913 Epoch: 5 Global Step: 116210 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:59:49,488-Speed 2499.60 samples/sec Loss 5.5530 LearningRate 0.000913 Epoch: 5 Global Step: 116220 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 16:59:57,636-Speed 2514.04 samples/sec Loss 5.6349 LearningRate 0.000913 Epoch: 5 Global Step: 116230 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:00:05,833-Speed 2499.04 samples/sec Loss 5.5389 LearningRate 0.000913 Epoch: 5 Global Step: 116240 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:00:14,028-Speed 2499.27 samples/sec Loss 5.5634 LearningRate 0.000913 Epoch: 5 Global Step: 116250 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:00:22,236-Speed 2495.78 samples/sec Loss 5.6089 LearningRate 0.000913 Epoch: 5 Global Step: 116260 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:00:30,438-Speed 2497.25 samples/sec Loss 5.6218 LearningRate 0.000913 Epoch: 5 Global Step: 116270 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:00:38,636-Speed 2498.64 samples/sec Loss 5.5723 LearningRate 0.000913 Epoch: 5 Global Step: 116280 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:00:46,794-Speed 2510.97 samples/sec Loss 5.6134 LearningRate 0.000913 Epoch: 5 Global Step: 116290 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:00:54,994-Speed 2497.91 samples/sec Loss 5.5968 
LearningRate 0.000913 Epoch: 5 Global Step: 116300 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:01:03,190-Speed 2499.10 samples/sec Loss 5.6709 LearningRate 0.000913 Epoch: 5 Global Step: 116310 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:01:11,397-Speed 2495.84 samples/sec Loss 5.6678 LearningRate 0.000913 Epoch: 5 Global Step: 116320 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:01:19,592-Speed 2499.37 samples/sec Loss 5.6253 LearningRate 0.000913 Epoch: 5 Global Step: 116330 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:01:27,788-Speed 2499.28 samples/sec Loss 5.5822 LearningRate 0.000913 Epoch: 5 Global Step: 116340 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:01:35,936-Speed 2514.10 samples/sec Loss 5.6195 LearningRate 0.000913 Epoch: 5 Global Step: 116350 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:01:44,136-Speed 2498.12 samples/sec Loss 5.5720 LearningRate 0.000913 Epoch: 5 Global Step: 116360 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:01:52,334-Speed 2498.34 samples/sec Loss 5.5393 LearningRate 0.000912 Epoch: 5 Global Step: 116370 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:02:00,539-Speed 2496.54 samples/sec Loss 5.6334 LearningRate 0.000912 Epoch: 5 Global Step: 116380 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:02:08,739-Speed 2498.13 samples/sec Loss 5.5969 LearningRate 0.000912 Epoch: 5 Global Step: 116390 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:02:16,937-Speed 2498.38 samples/sec Loss 5.6215 LearningRate 0.000912 Epoch: 5 Global Step: 116400 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:02:25,079-Speed 2515.91 samples/sec Loss 5.5610 LearningRate 0.000912 Epoch: 5 Global Step: 116410 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:02:33,278-Speed 2498.22 samples/sec Loss 5.5780 
LearningRate 0.000912 Epoch: 5 Global Step: 116420 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:02:41,477-Speed 2498.38 samples/sec Loss 5.6639 LearningRate 0.000912 Epoch: 5 Global Step: 116430 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:02:49,674-Speed 2498.91 samples/sec Loss 5.5871 LearningRate 0.000912 Epoch: 5 Global Step: 116440 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:02:57,882-Speed 2495.57 samples/sec Loss 5.5561 LearningRate 0.000912 Epoch: 5 Global Step: 116450 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:03:06,082-Speed 2498.15 samples/sec Loss 5.7037 LearningRate 0.000912 Epoch: 5 Global Step: 116460 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:03:14,225-Speed 2515.36 samples/sec Loss 5.5641 LearningRate 0.000912 Epoch: 5 Global Step: 116470 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:03:22,422-Speed 2498.85 samples/sec Loss 5.5221 LearningRate 0.000912 Epoch: 5 Global Step: 116480 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:03:30,622-Speed 2497.88 samples/sec Loss 5.6120 LearningRate 0.000912 Epoch: 5 Global Step: 116490 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:03:38,825-Speed 2497.28 samples/sec Loss 5.6586 LearningRate 0.000912 Epoch: 5 Global Step: 116500 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:03:47,021-Speed 2498.99 samples/sec Loss 5.6317 LearningRate 0.000912 Epoch: 5 Global Step: 116510 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:03:55,219-Speed 2498.54 samples/sec Loss 5.5578 LearningRate 0.000912 Epoch: 5 Global Step: 116520 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:04:03,376-Speed 2511.07 samples/sec Loss 5.6191 LearningRate 0.000912 Epoch: 5 Global Step: 116530 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:04:11,576-Speed 2498.49 samples/sec Loss 5.5317 
LearningRate 0.000912 Epoch: 5 Global Step: 116540 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:04:19,778-Speed 2497.31 samples/sec Loss 5.6283 LearningRate 0.000912 Epoch: 5 Global Step: 116550 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:04:27,986-Speed 2495.32 samples/sec Loss 5.5466 LearningRate 0.000912 Epoch: 5 Global Step: 116560 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:04:36,186-Speed 2498.22 samples/sec Loss 5.5503 LearningRate 0.000912 Epoch: 5 Global Step: 116570 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:04:44,387-Speed 2497.80 samples/sec Loss 5.5525 LearningRate 0.000912 Epoch: 5 Global Step: 116580 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:04:52,543-Speed 2511.43 samples/sec Loss 5.5950 LearningRate 0.000912 Epoch: 5 Global Step: 116590 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:05:00,743-Speed 2497.86 samples/sec Loss 5.6181 LearningRate 0.000912 Epoch: 5 Global Step: 116600 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:05:08,939-Speed 2499.44 samples/sec Loss 5.5350 LearningRate 0.000912 Epoch: 5 Global Step: 116610 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:05:17,136-Speed 2498.67 samples/sec Loss 5.5462 LearningRate 0.000912 Epoch: 5 Global Step: 116620 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:05:25,335-Speed 2498.52 samples/sec Loss 5.5351 LearningRate 0.000912 Epoch: 5 Global Step: 116630 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:05:33,535-Speed 2497.79 samples/sec Loss 5.4985 LearningRate 0.000912 Epoch: 5 Global Step: 116640 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:05:41,682-Speed 2514.35 samples/sec Loss 5.6103 LearningRate 0.000912 Epoch: 5 Global Step: 116650 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:05:49,880-Speed 2498.70 samples/sec Loss 5.4953 
LearningRate 0.000912 Epoch: 5 Global Step: 116660 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:05:58,077-Speed 2498.77 samples/sec Loss 5.5394 LearningRate 0.000912 Epoch: 5 Global Step: 116670 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:06:06,283-Speed 2496.22 samples/sec Loss 5.5604 LearningRate 0.000912 Epoch: 5 Global Step: 116680 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:06:14,490-Speed 2495.84 samples/sec Loss 5.5244 LearningRate 0.000912 Epoch: 5 Global Step: 116690 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:06:22,683-Speed 2500.38 samples/sec Loss 5.5744 LearningRate 0.000912 Epoch: 5 Global Step: 116700 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:06:30,828-Speed 2514.90 samples/sec Loss 5.5385 LearningRate 0.000912 Epoch: 5 Global Step: 116710 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:06:39,023-Speed 2499.41 samples/sec Loss 5.5133 LearningRate 0.000912 Epoch: 5 Global Step: 116720 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:06:47,219-Speed 2499.38 samples/sec Loss 5.4945 LearningRate 0.000912 Epoch: 5 Global Step: 116730 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:06:55,414-Speed 2499.37 samples/sec Loss 5.5408 LearningRate 0.000912 Epoch: 5 Global Step: 116740 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:07:03,611-Speed 2498.90 samples/sec Loss 5.4381 LearningRate 0.000912 Epoch: 5 Global Step: 116750 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:07:11,816-Speed 2496.59 samples/sec Loss 5.5213 LearningRate 0.000911 Epoch: 5 Global Step: 116760 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:07:19,960-Speed 2515.09 samples/sec Loss 5.7307 LearningRate 0.000911 Epoch: 5 Global Step: 116770 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:07:28,161-Speed 2498.02 samples/sec Loss 5.6985 
LearningRate 0.000911 Epoch: 5 Global Step: 116780 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:07:36,362-Speed 2497.61 samples/sec Loss 5.6150 LearningRate 0.000911 Epoch: 5 Global Step: 116790 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:07:44,559-Speed 2498.86 samples/sec Loss 5.5594 LearningRate 0.000911 Epoch: 5 Global Step: 116800 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:07:52,717-Speed 2511.23 samples/sec Loss 5.6439 LearningRate 0.000911 Epoch: 5 Global Step: 116810 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:08:00,917-Speed 2497.80 samples/sec Loss 5.5438 LearningRate 0.000911 Epoch: 5 Global Step: 116820 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:08:09,063-Speed 2514.63 samples/sec Loss 5.4558 LearningRate 0.000911 Epoch: 5 Global Step: 116830 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:08:17,262-Speed 2498.06 samples/sec Loss 5.6110 LearningRate 0.000911 Epoch: 5 Global Step: 116840 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:08:25,459-Speed 2499.06 samples/sec Loss 5.6025 LearningRate 0.000911 Epoch: 5 Global Step: 116850 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:08:33,657-Speed 2498.45 samples/sec Loss 5.6411 LearningRate 0.000911 Epoch: 5 Global Step: 116860 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:08:41,855-Speed 2498.70 samples/sec Loss 5.6748 LearningRate 0.000911 Epoch: 5 Global Step: 116870 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:08:50,052-Speed 2498.74 samples/sec Loss 5.6459 LearningRate 0.000911 Epoch: 5 Global Step: 116880 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:08:58,203-Speed 2512.97 samples/sec Loss 5.5901 LearningRate 0.000911 Epoch: 5 Global Step: 116890 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:09:06,401-Speed 2498.66 samples/sec Loss 5.6014 
LearningRate 0.000911 Epoch: 5 Global Step: 116900 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:09:14,602-Speed 2497.58 samples/sec Loss 5.5407 LearningRate 0.000911 Epoch: 5 Global Step: 116910 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:09:22,817-Speed 2493.55 samples/sec Loss 5.5494 LearningRate 0.000911 Epoch: 5 Global Step: 116920 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:09:31,017-Speed 2497.95 samples/sec Loss 5.5474 LearningRate 0.000911 Epoch: 5 Global Step: 116930 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:09:39,217-Speed 2497.89 samples/sec Loss 5.5219 LearningRate 0.000911 Epoch: 5 Global Step: 116940 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:09:47,368-Speed 2513.04 samples/sec Loss 5.4879 LearningRate 0.000911 Epoch: 5 Global Step: 116950 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:09:55,567-Speed 2498.27 samples/sec Loss 5.4587 LearningRate 0.000911 Epoch: 5 Global Step: 116960 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:10:03,766-Speed 2498.30 samples/sec Loss 5.4768 LearningRate 0.000911 Epoch: 5 Global Step: 116970 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:10:11,964-Speed 2498.63 samples/sec Loss 5.5036 LearningRate 0.000911 Epoch: 5 Global Step: 116980 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:10:20,161-Speed 2498.87 samples/sec Loss 5.5229 LearningRate 0.000911 Epoch: 5 Global Step: 116990 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:10:28,358-Speed 2498.74 samples/sec Loss 5.6572 LearningRate 0.000911 Epoch: 5 Global Step: 117000 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:10:36,504-Speed 2514.70 samples/sec Loss 5.6234 LearningRate 0.000911 Epoch: 5 Global Step: 117010 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:10:44,701-Speed 2498.69 samples/sec Loss 5.6120 
LearningRate 0.000911 Epoch: 5 Global Step: 117020 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:10:52,897-Speed 2499.42 samples/sec Loss 5.5118 LearningRate 0.000911 Epoch: 5 Global Step: 117030 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:11:01,093-Speed 2499.18 samples/sec Loss 5.6588 LearningRate 0.000911 Epoch: 5 Global Step: 117040 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:11:09,292-Speed 2498.57 samples/sec Loss 5.5308 LearningRate 0.000911 Epoch: 5 Global Step: 117050 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:11:17,490-Speed 2498.33 samples/sec Loss 5.7556 LearningRate 0.000911 Epoch: 5 Global Step: 117060 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:11:25,631-Speed 2516.08 samples/sec Loss 5.5446 LearningRate 0.000911 Epoch: 5 Global Step: 117070 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:11:33,830-Speed 2498.40 samples/sec Loss 5.5823 LearningRate 0.000911 Epoch: 5 Global Step: 117080 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:11:42,027-Speed 2498.88 samples/sec Loss 5.5738 LearningRate 0.000911 Epoch: 5 Global Step: 117090 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:11:50,225-Speed 2498.83 samples/sec Loss 5.5390 LearningRate 0.000911 Epoch: 5 Global Step: 117100 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:11:58,421-Speed 2499.08 samples/sec Loss 5.5088 LearningRate 0.000911 Epoch: 5 Global Step: 117110 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:12:06,621-Speed 2497.84 samples/sec Loss 5.5935 LearningRate 0.000911 Epoch: 5 Global Step: 117120 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:12:14,769-Speed 2514.24 samples/sec Loss 5.5106 LearningRate 0.000911 Epoch: 5 Global Step: 117130 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:12:22,966-Speed 2498.73 samples/sec Loss 5.5776 
LearningRate 0.000911 Epoch: 5 Global Step: 117140 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:12:31,164-Speed 2498.56 samples/sec Loss 5.5721 LearningRate 0.000911 Epoch: 5 Global Step: 117150 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:12:39,368-Speed 2496.70 samples/sec Loss 5.5731 LearningRate 0.000910 Epoch: 5 Global Step: 117160 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:12:47,566-Speed 2498.61 samples/sec Loss 5.5380 LearningRate 0.000910 Epoch: 5 Global Step: 117170 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:12:55,765-Speed 2498.34 samples/sec Loss 5.4699 LearningRate 0.000910 Epoch: 5 Global Step: 117180 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:13:03,908-Speed 2515.31 samples/sec Loss 5.6033 LearningRate 0.000910 Epoch: 5 Global Step: 117190 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:13:12,105-Speed 2499.10 samples/sec Loss 5.5386 LearningRate 0.000910 Epoch: 5 Global Step: 117200 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:13:20,303-Speed 2498.43 samples/sec Loss 5.6063 LearningRate 0.000910 Epoch: 5 Global Step: 117210 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:13:28,498-Speed 2499.42 samples/sec Loss 5.5751 LearningRate 0.000910 Epoch: 5 Global Step: 117220 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:13:36,698-Speed 2497.75 samples/sec Loss 5.5253 LearningRate 0.000910 Epoch: 5 Global Step: 117230 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:13:44,897-Speed 2498.43 samples/sec Loss 5.6380 LearningRate 0.000910 Epoch: 5 Global Step: 117240 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:13:53,040-Speed 2515.49 samples/sec Loss 5.6525 LearningRate 0.000910 Epoch: 5 Global Step: 117250 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:14:01,238-Speed 2498.42 samples/sec Loss 5.6385 
LearningRate 0.000910 Epoch: 5 Global Step: 117260 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:14:09,443-Speed 2496.58 samples/sec Loss 5.6878 LearningRate 0.000910 Epoch: 5 Global Step: 117270 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:14:17,639-Speed 2499.03 samples/sec Loss 5.5679 LearningRate 0.000910 Epoch: 5 Global Step: 117280 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:14:25,850-Speed 2494.67 samples/sec Loss 5.6156 LearningRate 0.000910 Epoch: 5 Global Step: 117290 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:14:34,048-Speed 2498.53 samples/sec Loss 5.6018 LearningRate 0.000910 Epoch: 5 Global Step: 117300 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:14:42,194-Speed 2514.43 samples/sec Loss 5.4973 LearningRate 0.000910 Epoch: 5 Global Step: 117310 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:14:50,394-Speed 2498.19 samples/sec Loss 5.5479 LearningRate 0.000910 Epoch: 5 Global Step: 117320 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:14:58,607-Speed 2494.06 samples/sec Loss 5.4285 LearningRate 0.000910 Epoch: 5 Global Step: 117330 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:15:06,806-Speed 2498.26 samples/sec Loss 5.4863 LearningRate 0.000910 Epoch: 5 Global Step: 117340 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:15:15,004-Speed 2498.50 samples/sec Loss 5.4969 LearningRate 0.000910 Epoch: 5 Global Step: 117350 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:15:23,207-Speed 2497.10 samples/sec Loss 5.4446 LearningRate 0.000910 Epoch: 5 Global Step: 117360 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:15:31,352-Speed 2514.91 samples/sec Loss 5.5414 LearningRate 0.000910 Epoch: 5 Global Step: 117370 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:15:39,546-Speed 2499.98 samples/sec Loss 5.5501 
LearningRate 0.000910 Epoch: 5 Global Step: 117380 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:15:47,757-Speed 2494.39 samples/sec Loss 5.5816 LearningRate 0.000910 Epoch: 5 Global Step: 117390 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:15:55,958-Speed 2497.73 samples/sec Loss 5.5896 LearningRate 0.000910 Epoch: 5 Global Step: 117400 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:16:04,154-Speed 2499.13 samples/sec Loss 5.6010 LearningRate 0.000910 Epoch: 5 Global Step: 117410 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:16:12,353-Speed 2498.22 samples/sec Loss 5.5591 LearningRate 0.000910 Epoch: 5 Global Step: 117420 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:16:20,499-Speed 2514.56 samples/sec Loss 5.4206 LearningRate 0.000910 Epoch: 5 Global Step: 117430 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:16:28,697-Speed 2498.58 samples/sec Loss 5.5556 LearningRate 0.000910 Epoch: 5 Global Step: 117440 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:16:36,894-Speed 2498.88 samples/sec Loss 5.5821 LearningRate 0.000910 Epoch: 5 Global Step: 117450 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:16:45,093-Speed 2498.44 samples/sec Loss 5.5368 LearningRate 0.000910 Epoch: 5 Global Step: 117460 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:16:53,291-Speed 2498.48 samples/sec Loss 5.6330 LearningRate 0.000910 Epoch: 5 Global Step: 117470 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:17:01,489-Speed 2498.64 samples/sec Loss 5.5288 LearningRate 0.000910 Epoch: 5 Global Step: 117480 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:17:09,633-Speed 2515.02 samples/sec Loss 5.5083 LearningRate 0.000910 Epoch: 5 Global Step: 117490 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:17:17,831-Speed 2498.63 samples/sec Loss 5.5358 
LearningRate 0.000910 Epoch: 5 Global Step: 117500 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:17:26,033-Speed 2497.23 samples/sec Loss 5.4163 LearningRate 0.000910 Epoch: 5 Global Step: 117510 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:17:34,231-Speed 2498.68 samples/sec Loss 5.5312 LearningRate 0.000910 Epoch: 5 Global Step: 117520 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:17:42,435-Speed 2496.92 samples/sec Loss 5.5519 LearningRate 0.000910 Epoch: 5 Global Step: 117530 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:17:50,631-Speed 2499.20 samples/sec Loss 5.4456 LearningRate 0.000910 Epoch: 5 Global Step: 117540 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:17:58,779-Speed 2513.89 samples/sec Loss 5.5235 LearningRate 0.000909 Epoch: 5 Global Step: 117550 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:18:06,977-Speed 2498.54 samples/sec Loss 5.4618 LearningRate 0.000909 Epoch: 5 Global Step: 117560 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:18:15,174-Speed 2498.81 samples/sec Loss 5.4734 LearningRate 0.000909 Epoch: 5 Global Step: 117570 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:18:23,378-Speed 2496.76 samples/sec Loss 5.5199 LearningRate 0.000909 Epoch: 5 Global Step: 117580 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:18:31,575-Speed 2498.84 samples/sec Loss 5.4910 LearningRate 0.000909 Epoch: 5 Global Step: 117590 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:18:39,773-Speed 2498.41 samples/sec Loss 5.4267 LearningRate 0.000909 Epoch: 5 Global Step: 117600 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:18:47,915-Speed 2516.04 samples/sec Loss 5.4873 LearningRate 0.000909 Epoch: 5 Global Step: 117610 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:18:56,117-Speed 2497.51 samples/sec Loss 5.4540 
LearningRate 0.000909 Epoch: 5 Global Step: 117620 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:19:04,311-Speed 2499.49 samples/sec Loss 5.5123 LearningRate 0.000909 Epoch: 5 Global Step: 117630 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:19:12,508-Speed 2498.89 samples/sec Loss 5.5146 LearningRate 0.000909 Epoch: 5 Global Step: 117640 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:19:20,709-Speed 2497.61 samples/sec Loss 5.5322 LearningRate 0.000909 Epoch: 5 Global Step: 117650 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:19:28,907-Speed 2499.08 samples/sec Loss 5.5097 LearningRate 0.000909 Epoch: 5 Global Step: 117660 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:19:37,064-Speed 2511.16 samples/sec Loss 5.5244 LearningRate 0.000909 Epoch: 5 Global Step: 117670 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:19:45,259-Speed 2499.33 samples/sec Loss 5.4296 LearningRate 0.000909 Epoch: 5 Global Step: 117680 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:19:53,461-Speed 2497.68 samples/sec Loss 5.4747 LearningRate 0.000909 Epoch: 5 Global Step: 117690 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:20:01,656-Speed 2499.50 samples/sec Loss 5.5028 LearningRate 0.000909 Epoch: 5 Global Step: 117700 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:20:09,854-Speed 2498.72 samples/sec Loss 5.4572 LearningRate 0.000909 Epoch: 5 Global Step: 117710 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:20:18,059-Speed 2496.24 samples/sec Loss 5.4563 LearningRate 0.000909 Epoch: 5 Global Step: 117720 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:20:26,202-Speed 2515.36 samples/sec Loss 5.4684 LearningRate 0.000909 Epoch: 5 Global Step: 117730 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:20:34,400-Speed 2498.50 samples/sec Loss 5.5443 
LearningRate 0.000909 Epoch: 5 Global Step: 117740 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:20:42,612-Speed 2494.59 samples/sec Loss 5.4941 LearningRate 0.000909 Epoch: 5 Global Step: 117750 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:20:50,807-Speed 2499.20 samples/sec Loss 5.5173 LearningRate 0.000909 Epoch: 5 Global Step: 117760 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:20:59,008-Speed 2497.84 samples/sec Loss 5.5704 LearningRate 0.000909 Epoch: 5 Global Step: 117770 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:21:07,205-Speed 2498.64 samples/sec Loss 5.5509 LearningRate 0.000909 Epoch: 5 Global Step: 117780 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:21:15,356-Speed 2513.13 samples/sec Loss 5.4727 LearningRate 0.000909 Epoch: 5 Global Step: 117790 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:21:23,552-Speed 2499.13 samples/sec Loss 5.5653 LearningRate 0.000909 Epoch: 5 Global Step: 117800 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:21:31,755-Speed 2497.08 samples/sec Loss 5.5422 LearningRate 0.000909 Epoch: 5 Global Step: 117810 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:21:39,970-Speed 2493.37 samples/sec Loss 5.4773 LearningRate 0.000909 Epoch: 5 Global Step: 117820 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:21:48,168-Speed 2498.62 samples/sec Loss 5.4576 LearningRate 0.000909 Epoch: 5 Global Step: 117830 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:21:56,367-Speed 2498.30 samples/sec Loss 5.5203 LearningRate 0.000909 Epoch: 5 Global Step: 117840 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:22:04,508-Speed 2517.25 samples/sec Loss 5.6141 LearningRate 0.000909 Epoch: 5 Global Step: 117850 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:22:12,704-Speed 2499.00 samples/sec Loss 5.5410 
LearningRate 0.000909 Epoch: 5 Global Step: 117860 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:22:20,904-Speed 2498.23 samples/sec Loss 5.4990 LearningRate 0.000909 Epoch: 5 Global Step: 117870 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:22:29,102-Speed 2498.56 samples/sec Loss 5.5470 LearningRate 0.000909 Epoch: 5 Global Step: 117880 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:22:37,300-Speed 2498.42 samples/sec Loss 5.4891 LearningRate 0.000909 Epoch: 5 Global Step: 117890 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:22:45,497-Speed 2498.94 samples/sec Loss 5.6915 LearningRate 0.000909 Epoch: 5 Global Step: 117900 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:22:53,643-Speed 2514.62 samples/sec Loss 5.5376 LearningRate 0.000909 Epoch: 5 Global Step: 117910 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:23:01,836-Speed 2499.92 samples/sec Loss 5.5619 LearningRate 0.000909 Epoch: 5 Global Step: 117920 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:23:10,032-Speed 2499.33 samples/sec Loss 5.5350 LearningRate 0.000909 Epoch: 5 Global Step: 117930 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:23:18,230-Speed 2498.67 samples/sec Loss 5.5121 LearningRate 0.000908 Epoch: 5 Global Step: 117940 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:23:26,427-Speed 2498.76 samples/sec Loss 5.6062 LearningRate 0.000908 Epoch: 5 Global Step: 117950 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:23:34,622-Speed 2499.55 samples/sec Loss 5.5821 LearningRate 0.000908 Epoch: 5 Global Step: 117960 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:23:42,769-Speed 2514.14 samples/sec Loss 5.5019 LearningRate 0.000908 Epoch: 5 Global Step: 117970 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:23:50,965-Speed 2499.39 samples/sec Loss 5.5428 
LearningRate 0.000908 Epoch: 5 Global Step: 117980 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:23:59,166-Speed 2497.98 samples/sec Loss 5.4223 LearningRate 0.000908 Epoch: 5 Global Step: 117990 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:24:07,360-Speed 2499.97 samples/sec Loss 5.5125 LearningRate 0.000908 Epoch: 5 Global Step: 118000 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:24:15,559-Speed 2498.21 samples/sec Loss 5.4888 LearningRate 0.000908 Epoch: 5 Global Step: 118010 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:24:23,760-Speed 2498.53 samples/sec Loss 5.4300 LearningRate 0.000908 Epoch: 5 Global Step: 118020 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:24:31,904-Speed 2514.95 samples/sec Loss 5.4594 LearningRate 0.000908 Epoch: 5 Global Step: 118030 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:24:40,103-Speed 2498.22 samples/sec Loss 5.4630 LearningRate 0.000908 Epoch: 5 Global Step: 118040 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:24:48,297-Speed 2499.76 samples/sec Loss 5.4811 LearningRate 0.000908 Epoch: 5 Global Step: 118050 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:24:56,494-Speed 2498.87 samples/sec Loss 5.5103 LearningRate 0.000908 Epoch: 5 Global Step: 118060 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:25:04,693-Speed 2498.34 samples/sec Loss 5.5628 LearningRate 0.000908 Epoch: 5 Global Step: 118070 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:25:12,900-Speed 2495.84 samples/sec Loss 5.3882 LearningRate 0.000908 Epoch: 5 Global Step: 118080 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:25:21,046-Speed 2514.31 samples/sec Loss 5.4730 LearningRate 0.000908 Epoch: 5 Global Step: 118090 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:25:29,257-Speed 2494.62 samples/sec Loss 5.4340 
LearningRate 0.000908 Epoch: 5 Global Step: 118100 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:25:37,454-Speed 2498.98 samples/sec Loss 5.5306 LearningRate 0.000908 Epoch: 5 Global Step: 118110 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:25:45,661-Speed 2495.90 samples/sec Loss 5.4689 LearningRate 0.000908 Epoch: 5 Global Step: 118120 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:25:53,857-Speed 2499.30 samples/sec Loss 5.5222 LearningRate 0.000908 Epoch: 5 Global Step: 118130 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:26:02,055-Speed 2498.44 samples/sec Loss 5.4990 LearningRate 0.000908 Epoch: 5 Global Step: 118140 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:26:10,201-Speed 2514.34 samples/sec Loss 5.4191 LearningRate 0.000908 Epoch: 5 Global Step: 118150 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:26:18,398-Speed 2499.09 samples/sec Loss 5.5515 LearningRate 0.000908 Epoch: 5 Global Step: 118160 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:26:26,598-Speed 2497.77 samples/sec Loss 5.4829 LearningRate 0.000908 Epoch: 5 Global Step: 118170 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:26:34,798-Speed 2498.24 samples/sec Loss 5.5007 LearningRate 0.000908 Epoch: 5 Global Step: 118180 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:26:42,995-Speed 2498.96 samples/sec Loss 5.4564 LearningRate 0.000908 Epoch: 5 Global Step: 118190 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:26:51,193-Speed 2498.46 samples/sec Loss 5.5090 LearningRate 0.000908 Epoch: 5 Global Step: 118200 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:26:59,343-Speed 2513.34 samples/sec Loss 5.5703 LearningRate 0.000908 Epoch: 5 Global Step: 118210 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:27:07,541-Speed 2498.69 samples/sec Loss 5.6275 
LearningRate 0.000908 Epoch: 5 Global Step: 118220 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:27:15,737-Speed 2499.68 samples/sec Loss 5.6318 LearningRate 0.000908 Epoch: 5 Global Step: 118230 Fp16 Grad Scale: 65536 Required: 163 hours Training: 2022-07-06 17:27:23,891-Speed 2511.69 samples/sec Loss 5.5518 LearningRate 0.000908 Epoch: 5 Global Step: 118240 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:27:32,101-Speed 2500.86 samples/sec Loss 5.5944 LearningRate 0.000908 Epoch: 5 Global Step: 118250 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:27:40,298-Speed 2499.04 samples/sec Loss 5.5360 LearningRate 0.000908 Epoch: 5 Global Step: 118260 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:27:48,478-Speed 2517.52 samples/sec Loss 5.4202 LearningRate 0.000908 Epoch: 5 Global Step: 118270 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:27:56,720-Speed 2499.62 samples/sec Loss 5.4240 LearningRate 0.000908 Epoch: 5 Global Step: 118280 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:28:04,922-Speed 2497.11 samples/sec Loss 5.4680 LearningRate 0.000908 Epoch: 5 Global Step: 118290 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:28:13,144-Speed 2499.57 samples/sec Loss 5.5259 LearningRate 0.000908 Epoch: 5 Global Step: 118300 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:28:23,872-Speed 2501.60 samples/sec Loss 5.4713 LearningRate 0.000908 Epoch: 5 Global Step: 118310 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:28:32,072-Speed 2498.02 samples/sec Loss 5.5705 LearningRate 0.000908 Epoch: 5 Global Step: 118320 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:28:42,980-Speed 1886.56 samples/sec Loss 5.5964 LearningRate 0.000907 Epoch: 5 Global Step: 118330 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:28:51,183-Speed 2497.06 samples/sec Loss 5.5110 
LearningRate 0.000907 Epoch: 5 Global Step: 118340 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:28:59,441-Speed 2496.95 samples/sec Loss 5.5350 LearningRate 0.000907 Epoch: 5 Global Step: 118350 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:29:08,881-Speed 2169.50 samples/sec Loss 5.5603 LearningRate 0.000907 Epoch: 5 Global Step: 118360 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:29:17,159-Speed 2487.80 samples/sec Loss 5.4746 LearningRate 0.000907 Epoch: 5 Global Step: 118370 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:29:25,511-Speed 2467.37 samples/sec Loss 5.6623 LearningRate 0.000907 Epoch: 5 Global Step: 118380 Fp16 Grad Scale: 32768 Required: 163 hours Training: 2022-07-06 17:29:33,809-Speed 2508.52 samples/sec Loss 5.5565 LearningRate 0.000907 Epoch: 5 Global Step: 118390 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:29:42,028-Speed 2492.04 samples/sec Loss 5.6397 LearningRate 0.000907 Epoch: 5 Global Step: 118400 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:29:50,240-Speed 2494.32 samples/sec Loss 5.5575 LearningRate 0.000907 Epoch: 5 Global Step: 118410 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:29:58,486-Speed 2492.18 samples/sec Loss 5.5412 LearningRate 0.000907 Epoch: 5 Global Step: 118420 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:30:09,220-Speed 1920.33 samples/sec Loss 5.5223 LearningRate 0.000907 Epoch: 5 Global Step: 118430 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:30:17,413-Speed 2499.83 samples/sec Loss 5.5162 LearningRate 0.000907 Epoch: 5 Global Step: 118440 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:30:25,583-Speed 2517.25 samples/sec Loss 5.4865 LearningRate 0.000907 Epoch: 5 Global Step: 118450 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:30:34,564-Speed 2308.80 samples/sec Loss 5.4627 
LearningRate 0.000907 Epoch: 5 Global Step: 118460 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:30:42,759-Speed 2499.19 samples/sec Loss 5.6032 LearningRate 0.000907 Epoch: 5 Global Step: 118470 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:30:51,004-Speed 2501.65 samples/sec Loss 5.5525 LearningRate 0.000907 Epoch: 5 Global Step: 118480 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:30:59,559-Speed 2501.21 samples/sec Loss 5.4772 LearningRate 0.000907 Epoch: 5 Global Step: 118490 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:31:07,758-Speed 2498.36 samples/sec Loss 5.6200 LearningRate 0.000907 Epoch: 5 Global Step: 118500 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:31:15,920-Speed 2516.21 samples/sec Loss 5.5211 LearningRate 0.000907 Epoch: 5 Global Step: 118510 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:31:24,933-Speed 2494.63 samples/sec Loss 5.6004 LearningRate 0.000907 Epoch: 5 Global Step: 118520 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:31:33,157-Speed 2500.63 samples/sec Loss 5.5356 LearningRate 0.000907 Epoch: 5 Global Step: 118530 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:31:41,357-Speed 2497.93 samples/sec Loss 5.4822 LearningRate 0.000907 Epoch: 5 Global Step: 118540 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:31:51,028-Speed 2117.90 samples/sec Loss 5.5142 LearningRate 0.000907 Epoch: 5 Global Step: 118550 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:31:59,243-Speed 2499.44 samples/sec Loss 5.4120 LearningRate 0.000907 Epoch: 5 Global Step: 118560 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:32:07,546-Speed 2512.02 samples/sec Loss 5.4870 LearningRate 0.000907 Epoch: 5 Global Step: 118570 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:32:16,853-Speed 2200.77 samples/sec Loss 5.4992 
LearningRate 0.000907 Epoch: 5 Global Step: 118580 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:32:25,109-Speed 2501.12 samples/sec Loss 5.5127 LearningRate 0.000907 Epoch: 5 Global Step: 118590 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:32:34,943-Speed 2496.97 samples/sec Loss 5.5011 LearningRate 0.000907 Epoch: 5 Global Step: 118600 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:32:43,137-Speed 2499.87 samples/sec Loss 5.4975 LearningRate 0.000907 Epoch: 5 Global Step: 118610 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:32:51,333-Speed 2499.02 samples/sec Loss 5.5232 LearningRate 0.000907 Epoch: 5 Global Step: 118620 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:32:59,475-Speed 2515.85 samples/sec Loss 5.4873 LearningRate 0.000907 Epoch: 5 Global Step: 118630 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:33:07,693-Speed 2500.63 samples/sec Loss 5.4228 LearningRate 0.000907 Epoch: 5 Global Step: 118640 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:33:15,893-Speed 2497.90 samples/sec Loss 5.4440 LearningRate 0.000907 Epoch: 5 Global Step: 118650 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:33:24,094-Speed 2497.85 samples/sec Loss 5.4481 LearningRate 0.000907 Epoch: 5 Global Step: 118660 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:33:32,296-Speed 2497.25 samples/sec Loss 5.4408 LearningRate 0.000907 Epoch: 5 Global Step: 118670 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:33:40,497-Speed 2498.06 samples/sec Loss 5.4640 LearningRate 0.000907 Epoch: 5 Global Step: 118680 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:33:48,653-Speed 2511.52 samples/sec Loss 5.5115 LearningRate 0.000907 Epoch: 5 Global Step: 118690 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:33:56,853-Speed 2497.68 samples/sec Loss 5.4301 
LearningRate 0.000907 Epoch: 5 Global Step: 118700 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:34:05,053-Speed 2497.99 samples/sec Loss 5.4560 LearningRate 0.000907 Epoch: 5 Global Step: 118710 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:34:13,255-Speed 2497.40 samples/sec Loss 5.4596 LearningRate 0.000906 Epoch: 5 Global Step: 118720 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:34:21,454-Speed 2498.13 samples/sec Loss 5.4823 LearningRate 0.000906 Epoch: 5 Global Step: 118730 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:34:29,652-Speed 2498.48 samples/sec Loss 5.4878 LearningRate 0.000906 Epoch: 5 Global Step: 118740 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:34:37,797-Speed 2515.08 samples/sec Loss 5.4806 LearningRate 0.000906 Epoch: 5 Global Step: 118750 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:34:45,996-Speed 2498.18 samples/sec Loss 5.4503 LearningRate 0.000906 Epoch: 5 Global Step: 118760 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:34:54,193-Speed 2498.66 samples/sec Loss 5.4959 LearningRate 0.000906 Epoch: 5 Global Step: 118770 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:35:02,395-Speed 2497.69 samples/sec Loss 5.5392 LearningRate 0.000906 Epoch: 5 Global Step: 118780 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:35:10,594-Speed 2498.25 samples/sec Loss 5.5498 LearningRate 0.000906 Epoch: 5 Global Step: 118790 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:35:18,796-Speed 2497.28 samples/sec Loss 5.3512 LearningRate 0.000906 Epoch: 5 Global Step: 118800 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:35:26,955-Speed 2510.64 samples/sec Loss 5.5722 LearningRate 0.000906 Epoch: 5 Global Step: 118810 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:35:35,156-Speed 2497.75 samples/sec Loss 5.4591 
LearningRate 0.000906 Epoch: 5 Global Step: 118820 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:35:43,355-Speed 2498.16 samples/sec Loss 5.5247 LearningRate 0.000906 Epoch: 5 Global Step: 118830 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:35:51,553-Speed 2498.84 samples/sec Loss 5.5270 LearningRate 0.000906 Epoch: 5 Global Step: 118840 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:35:59,751-Speed 2498.56 samples/sec Loss 5.4487 LearningRate 0.000906 Epoch: 5 Global Step: 118850 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:36:07,954-Speed 2497.11 samples/sec Loss 5.4880 LearningRate 0.000906 Epoch: 5 Global Step: 118860 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:36:16,098-Speed 2515.09 samples/sec Loss 5.5266 LearningRate 0.000906 Epoch: 5 Global Step: 118870 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:36:24,298-Speed 2497.82 samples/sec Loss 5.4193 LearningRate 0.000906 Epoch: 5 Global Step: 118880 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:36:32,497-Speed 2498.21 samples/sec Loss 5.4644 LearningRate 0.000906 Epoch: 5 Global Step: 118890 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:36:40,700-Speed 2497.15 samples/sec Loss 5.4634 LearningRate 0.000906 Epoch: 5 Global Step: 118900 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:36:48,901-Speed 2497.56 samples/sec Loss 5.5241 LearningRate 0.000906 Epoch: 5 Global Step: 118910 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:36:57,121-Speed 2492.06 samples/sec Loss 5.4950 LearningRate 0.000906 Epoch: 5 Global Step: 118920 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:37:05,268-Speed 2514.41 samples/sec Loss 5.4746 LearningRate 0.000906 Epoch: 5 Global Step: 118930 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:37:13,467-Speed 2498.29 samples/sec Loss 5.4076 
LearningRate 0.000906 Epoch: 5 Global Step: 118940 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:37:21,662-Speed 2499.28 samples/sec Loss 5.4889 LearningRate 0.000906 Epoch: 5 Global Step: 118950 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:37:29,859-Speed 2498.93 samples/sec Loss 5.4183 LearningRate 0.000906 Epoch: 5 Global Step: 118960 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:37:38,058-Speed 2498.32 samples/sec Loss 5.5274 LearningRate 0.000906 Epoch: 5 Global Step: 118970 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:37:46,258-Speed 2497.93 samples/sec Loss 5.4124 LearningRate 0.000906 Epoch: 5 Global Step: 118980 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:37:54,404-Speed 2514.51 samples/sec Loss 5.4610 LearningRate 0.000906 Epoch: 5 Global Step: 118990 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:38:02,601-Speed 2498.88 samples/sec Loss 5.5103 LearningRate 0.000906 Epoch: 5 Global Step: 119000 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:38:10,799-Speed 2498.55 samples/sec Loss 5.4517 LearningRate 0.000906 Epoch: 5 Global Step: 119010 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:38:19,000-Speed 2497.57 samples/sec Loss 5.4547 LearningRate 0.000906 Epoch: 5 Global Step: 119020 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:38:27,201-Speed 2497.63 samples/sec Loss 5.4481 LearningRate 0.000906 Epoch: 5 Global Step: 119030 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:38:35,402-Speed 2497.78 samples/sec Loss 5.4524 LearningRate 0.000906 Epoch: 5 Global Step: 119040 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:38:43,553-Speed 2513.24 samples/sec Loss 5.5312 LearningRate 0.000906 Epoch: 5 Global Step: 119050 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:38:51,749-Speed 2499.14 samples/sec Loss 5.4697 
LearningRate 0.000906 Epoch: 5 Global Step: 119060 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:38:59,946-Speed 2498.86 samples/sec Loss 5.4489 LearningRate 0.000906 Epoch: 5 Global Step: 119070 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:39:08,143-Speed 2498.86 samples/sec Loss 5.6199 LearningRate 0.000906 Epoch: 5 Global Step: 119080 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:39:16,349-Speed 2496.40 samples/sec Loss 5.5242 LearningRate 0.000906 Epoch: 5 Global Step: 119090 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:39:24,548-Speed 2498.34 samples/sec Loss 5.5255 LearningRate 0.000906 Epoch: 5 Global Step: 119100 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:39:32,691-Speed 2515.19 samples/sec Loss 5.4368 LearningRate 0.000905 Epoch: 5 Global Step: 119110 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:39:40,890-Speed 2498.63 samples/sec Loss 5.4821 LearningRate 0.000905 Epoch: 5 Global Step: 119120 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:39:49,085-Speed 2499.39 samples/sec Loss 5.4626 LearningRate 0.000905 Epoch: 5 Global Step: 119130 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:39:57,283-Speed 2498.91 samples/sec Loss 5.5143 LearningRate 0.000905 Epoch: 5 Global Step: 119140 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:40:05,479-Speed 2498.96 samples/sec Loss 5.4346 LearningRate 0.000905 Epoch: 5 Global Step: 119150 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:40:13,675-Speed 2499.21 samples/sec Loss 5.4519 LearningRate 0.000905 Epoch: 5 Global Step: 119160 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:40:21,819-Speed 2515.39 samples/sec Loss 5.3940 LearningRate 0.000905 Epoch: 5 Global Step: 119170 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:40:30,015-Speed 2498.96 samples/sec Loss 5.4606 
LearningRate 0.000905 Epoch: 5 Global Step: 119180 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:40:38,211-Speed 2499.35 samples/sec Loss 5.5372 LearningRate 0.000905 Epoch: 5 Global Step: 119190 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:40:46,409-Speed 2498.59 samples/sec Loss 5.4110 LearningRate 0.000905 Epoch: 5 Global Step: 119200 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:40:54,608-Speed 2498.24 samples/sec Loss 5.4012 LearningRate 0.000905 Epoch: 5 Global Step: 119210 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:41:02,804-Speed 2499.05 samples/sec Loss 5.5714 LearningRate 0.000905 Epoch: 5 Global Step: 119220 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:41:10,967-Speed 2509.32 samples/sec Loss 5.4113 LearningRate 0.000905 Epoch: 5 Global Step: 119230 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:41:19,163-Speed 2499.28 samples/sec Loss 5.4052 LearningRate 0.000905 Epoch: 5 Global Step: 119240 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:41:27,367-Speed 2496.95 samples/sec Loss 5.3933 LearningRate 0.000905 Epoch: 5 Global Step: 119250 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:41:35,566-Speed 2498.18 samples/sec Loss 5.4016 LearningRate 0.000905 Epoch: 5 Global Step: 119260 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:41:43,762-Speed 2499.19 samples/sec Loss 5.3597 LearningRate 0.000905 Epoch: 5 Global Step: 119270 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:41:51,962-Speed 2498.25 samples/sec Loss 5.4432 LearningRate 0.000905 Epoch: 5 Global Step: 119280 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:42:00,108-Speed 2514.26 samples/sec Loss 5.5527 LearningRate 0.000905 Epoch: 5 Global Step: 119290 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:42:08,308-Speed 2498.22 samples/sec Loss 5.4144 
LearningRate 0.000905 Epoch: 5 Global Step: 119300 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:42:16,509-Speed 2497.43 samples/sec Loss 5.5221 LearningRate 0.000905 Epoch: 5 Global Step: 119310 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:42:24,710-Speed 2497.81 samples/sec Loss 5.4733 LearningRate 0.000905 Epoch: 5 Global Step: 119320 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:42:32,907-Speed 2499.12 samples/sec Loss 5.4682 LearningRate 0.000905 Epoch: 5 Global Step: 119330 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:42:41,106-Speed 2498.20 samples/sec Loss 5.5067 LearningRate 0.000905 Epoch: 5 Global Step: 119340 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:42:49,251-Speed 2514.97 samples/sec Loss 5.4092 LearningRate 0.000905 Epoch: 5 Global Step: 119350 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:42:57,450-Speed 2498.33 samples/sec Loss 5.4431 LearningRate 0.000905 Epoch: 5 Global Step: 119360 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:43:05,649-Speed 2498.17 samples/sec Loss 5.4394 LearningRate 0.000905 Epoch: 5 Global Step: 119370 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:43:13,859-Speed 2495.30 samples/sec Loss 5.3963 LearningRate 0.000905 Epoch: 5 Global Step: 119380 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:43:22,073-Speed 2494.03 samples/sec Loss 5.3740 LearningRate 0.000905 Epoch: 5 Global Step: 119390 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:43:30,269-Speed 2499.29 samples/sec Loss 5.4189 LearningRate 0.000905 Epoch: 5 Global Step: 119400 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:43:38,417-Speed 2513.90 samples/sec Loss 5.3876 LearningRate 0.000905 Epoch: 5 Global Step: 119410 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:43:46,618-Speed 2497.74 samples/sec Loss 5.4624 
LearningRate 0.000905 Epoch: 5 Global Step: 119420 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:43:54,825-Speed 2495.98 samples/sec Loss 5.3897 LearningRate 0.000905 Epoch: 5 Global Step: 119430 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:44:03,026-Speed 2497.62 samples/sec Loss 5.4404 LearningRate 0.000905 Epoch: 5 Global Step: 119440 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:44:11,224-Speed 2498.39 samples/sec Loss 5.3836 LearningRate 0.000905 Epoch: 5 Global Step: 119450 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:44:19,424-Speed 2497.90 samples/sec Loss 5.4270 LearningRate 0.000905 Epoch: 5 Global Step: 119460 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:44:27,573-Speed 2513.77 samples/sec Loss 5.4995 LearningRate 0.000905 Epoch: 5 Global Step: 119470 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:44:35,771-Speed 2498.56 samples/sec Loss 5.5074 LearningRate 0.000905 Epoch: 5 Global Step: 119480 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:44:43,968-Speed 2498.74 samples/sec Loss 5.3943 LearningRate 0.000905 Epoch: 5 Global Step: 119490 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:44:52,167-Speed 2498.23 samples/sec Loss 5.4147 LearningRate 0.000905 Epoch: 5 Global Step: 119500 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:45:00,369-Speed 2497.51 samples/sec Loss 5.4914 LearningRate 0.000904 Epoch: 5 Global Step: 119510 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:45:08,570-Speed 2497.54 samples/sec Loss 5.4705 LearningRate 0.000904 Epoch: 5 Global Step: 119520 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:45:16,714-Speed 2515.03 samples/sec Loss 5.3694 LearningRate 0.000904 Epoch: 5 Global Step: 119530 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:45:24,914-Speed 2497.88 samples/sec Loss 5.4892 
LearningRate 0.000904 Epoch: 5 Global Step: 119540 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:45:33,113-Speed 2498.51 samples/sec Loss 5.4324 LearningRate 0.000904 Epoch: 5 Global Step: 119550 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:45:41,311-Speed 2498.67 samples/sec Loss 5.4921 LearningRate 0.000904 Epoch: 5 Global Step: 119560 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:45:49,510-Speed 2498.19 samples/sec Loss 5.4263 LearningRate 0.000904 Epoch: 5 Global Step: 119570 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:45:57,706-Speed 2499.15 samples/sec Loss 5.4083 LearningRate 0.000904 Epoch: 5 Global Step: 119580 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:46:05,852-Speed 2514.53 samples/sec Loss 5.4384 LearningRate 0.000904 Epoch: 5 Global Step: 119590 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:46:14,051-Speed 2498.33 samples/sec Loss 5.4221 LearningRate 0.000904 Epoch: 5 Global Step: 119600 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:46:22,246-Speed 2499.57 samples/sec Loss 5.4291 LearningRate 0.000904 Epoch: 5 Global Step: 119610 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:46:30,465-Speed 2492.36 samples/sec Loss 5.4240 LearningRate 0.000904 Epoch: 5 Global Step: 119620 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:46:38,660-Speed 2499.39 samples/sec Loss 5.4026 LearningRate 0.000904 Epoch: 5 Global Step: 119630 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:46:46,852-Speed 2500.67 samples/sec Loss 5.3887 LearningRate 0.000904 Epoch: 5 Global Step: 119640 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:46:54,996-Speed 2515.16 samples/sec Loss 5.4791 LearningRate 0.000904 Epoch: 5 Global Step: 119650 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:47:03,190-Speed 2499.72 samples/sec Loss 5.4125 
LearningRate 0.000904 Epoch: 5 Global Step: 119660 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:47:11,387-Speed 2499.27 samples/sec Loss 5.3609 LearningRate 0.000904 Epoch: 5 Global Step: 119670 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:47:19,587-Speed 2497.82 samples/sec Loss 5.4466 LearningRate 0.000904 Epoch: 5 Global Step: 119680 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:47:27,783-Speed 2499.37 samples/sec Loss 5.4376 LearningRate 0.000904 Epoch: 5 Global Step: 119690 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:47:35,980-Speed 2498.67 samples/sec Loss 5.5354 LearningRate 0.000904 Epoch: 5 Global Step: 119700 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:47:44,125-Speed 2514.85 samples/sec Loss 5.4041 LearningRate 0.000904 Epoch: 5 Global Step: 119710 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:47:52,325-Speed 2497.89 samples/sec Loss 5.4135 LearningRate 0.000904 Epoch: 5 Global Step: 119720 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 17:48:00,483-Speed 2511.00 samples/sec Loss 5.4449 LearningRate 0.000904 Epoch: 5 Global Step: 119730 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:48:08,691-Speed 2495.61 samples/sec Loss 5.4583 LearningRate 0.000904 Epoch: 5 Global Step: 119740 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:48:16,886-Speed 2499.22 samples/sec Loss 5.4893 LearningRate 0.000904 Epoch: 5 Global Step: 119750 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:48:25,085-Speed 2498.44 samples/sec Loss 5.4941 LearningRate 0.000904 Epoch: 5 Global Step: 119760 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:48:33,228-Speed 2515.55 samples/sec Loss 5.5138 LearningRate 0.000904 Epoch: 5 Global Step: 119770 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:48:41,427-Speed 2498.44 samples/sec Loss 5.5450 
LearningRate 0.000904 Epoch: 5 Global Step: 119780 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:48:49,626-Speed 2498.19 samples/sec Loss 5.4023 LearningRate 0.000904 Epoch: 5 Global Step: 119790 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:48:57,826-Speed 2497.71 samples/sec Loss 5.4745 LearningRate 0.000904 Epoch: 5 Global Step: 119800 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:49:06,025-Speed 2498.44 samples/sec Loss 5.4765 LearningRate 0.000904 Epoch: 5 Global Step: 119810 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:49:14,222-Speed 2498.76 samples/sec Loss 5.4459 LearningRate 0.000904 Epoch: 5 Global Step: 119820 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:49:22,366-Speed 2515.03 samples/sec Loss 5.4776 LearningRate 0.000904 Epoch: 5 Global Step: 119830 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:49:30,564-Speed 2499.18 samples/sec Loss 5.4755 LearningRate 0.000904 Epoch: 5 Global Step: 119840 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:49:38,761-Speed 2498.96 samples/sec Loss 5.4457 LearningRate 0.000904 Epoch: 5 Global Step: 119850 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:49:46,961-Speed 2498.40 samples/sec Loss 5.4528 LearningRate 0.000904 Epoch: 5 Global Step: 119860 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:49:55,156-Speed 2499.40 samples/sec Loss 5.4004 LearningRate 0.000904 Epoch: 5 Global Step: 119870 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:50:03,355-Speed 2497.98 samples/sec Loss 5.4217 LearningRate 0.000904 Epoch: 5 Global Step: 119880 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:50:11,498-Speed 2515.58 samples/sec Loss 5.3196 LearningRate 0.000904 Epoch: 5 Global Step: 119890 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:50:19,696-Speed 2498.69 samples/sec Loss 5.4202 
LearningRate 0.000903 Epoch: 5 Global Step: 119900 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:50:27,894-Speed 2498.31 samples/sec Loss 5.3614 LearningRate 0.000903 Epoch: 5 Global Step: 119910 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:50:36,093-Speed 2498.54 samples/sec Loss 5.4972 LearningRate 0.000903 Epoch: 5 Global Step: 119920 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:50:44,293-Speed 2498.19 samples/sec Loss 5.5477 LearningRate 0.000903 Epoch: 5 Global Step: 119930 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:50:52,495-Speed 2497.27 samples/sec Loss 5.3925 LearningRate 0.000903 Epoch: 5 Global Step: 119940 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:51:00,640-Speed 2514.77 samples/sec Loss 5.3578 LearningRate 0.000903 Epoch: 5 Global Step: 119950 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:51:08,837-Speed 2499.12 samples/sec Loss 5.4834 LearningRate 0.000903 Epoch: 5 Global Step: 119960 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:51:17,038-Speed 2497.89 samples/sec Loss 5.3972 LearningRate 0.000903 Epoch: 5 Global Step: 119970 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:51:25,233-Speed 2499.25 samples/sec Loss 5.4337 LearningRate 0.000903 Epoch: 5 Global Step: 119980 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:51:33,435-Speed 2497.43 samples/sec Loss 5.4525 LearningRate 0.000903 Epoch: 5 Global Step: 119990 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:51:41,632-Speed 2498.61 samples/sec Loss 5.4103 LearningRate 0.000903 Epoch: 5 Global Step: 120000 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:51:49,775-Speed 2515.82 samples/sec Loss 5.3393 LearningRate 0.000903 Epoch: 5 Global Step: 120010 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:51:57,973-Speed 2498.53 samples/sec Loss 5.3916 
LearningRate 0.000903 Epoch: 5 Global Step: 120020 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:52:06,176-Speed 2497.18 samples/sec Loss 5.4026 LearningRate 0.000903 Epoch: 5 Global Step: 120030 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:52:14,373-Speed 2498.77 samples/sec Loss 5.4413 LearningRate 0.000903 Epoch: 5 Global Step: 120040 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:52:22,569-Speed 2499.45 samples/sec Loss 5.4578 LearningRate 0.000903 Epoch: 5 Global Step: 120050 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:52:30,772-Speed 2497.10 samples/sec Loss 5.4939 LearningRate 0.000903 Epoch: 5 Global Step: 120060 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:52:38,917-Speed 2514.83 samples/sec Loss 5.4099 LearningRate 0.000903 Epoch: 5 Global Step: 120070 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:52:47,117-Speed 2497.98 samples/sec Loss 5.3456 LearningRate 0.000903 Epoch: 5 Global Step: 120080 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:52:55,317-Speed 2498.27 samples/sec Loss 5.4096 LearningRate 0.000903 Epoch: 5 Global Step: 120090 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:53:03,529-Speed 2493.98 samples/sec Loss 5.3091 LearningRate 0.000903 Epoch: 5 Global Step: 120100 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:53:11,726-Speed 2498.84 samples/sec Loss 5.3876 LearningRate 0.000903 Epoch: 5 Global Step: 120110 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:53:19,925-Speed 2498.41 samples/sec Loss 5.4849 LearningRate 0.000903 Epoch: 5 Global Step: 120120 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:53:28,074-Speed 2513.64 samples/sec Loss 5.5268 LearningRate 0.000903 Epoch: 5 Global Step: 120130 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:53:36,271-Speed 2498.85 samples/sec Loss 5.4048 
LearningRate 0.000903 Epoch: 5 Global Step: 120140 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:53:44,471-Speed 2497.90 samples/sec Loss 5.5258 LearningRate 0.000903 Epoch: 5 Global Step: 120150 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:53:52,678-Speed 2495.87 samples/sec Loss 5.5124 LearningRate 0.000903 Epoch: 5 Global Step: 120160 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:54:00,878-Speed 2498.04 samples/sec Loss 5.4795 LearningRate 0.000903 Epoch: 5 Global Step: 120170 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:54:09,078-Speed 2498.24 samples/sec Loss 5.4963 LearningRate 0.000903 Epoch: 5 Global Step: 120180 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:54:17,219-Speed 2515.84 samples/sec Loss 5.5066 LearningRate 0.000903 Epoch: 5 Global Step: 120190 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:54:25,418-Speed 2498.48 samples/sec Loss 5.4676 LearningRate 0.000903 Epoch: 5 Global Step: 120200 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:54:33,617-Speed 2498.03 samples/sec Loss 5.4042 LearningRate 0.000903 Epoch: 5 Global Step: 120210 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:54:41,815-Speed 2498.81 samples/sec Loss 5.4663 LearningRate 0.000903 Epoch: 5 Global Step: 120220 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:54:50,010-Speed 2499.46 samples/sec Loss 5.4242 LearningRate 0.000903 Epoch: 5 Global Step: 120230 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:54:58,207-Speed 2498.86 samples/sec Loss 5.3966 LearningRate 0.000903 Epoch: 5 Global Step: 120240 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:55:06,348-Speed 2516.25 samples/sec Loss 5.5220 LearningRate 0.000903 Epoch: 5 Global Step: 120250 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:55:14,550-Speed 2497.38 samples/sec Loss 5.4344 
LearningRate 0.000903 Epoch: 5 Global Step: 120260 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:55:22,762-Speed 2494.34 samples/sec Loss 5.4496 LearningRate 0.000903 Epoch: 5 Global Step: 120270 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:55:30,960-Speed 2498.84 samples/sec Loss 5.4322 LearningRate 0.000903 Epoch: 5 Global Step: 120280 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:55:39,162-Speed 2497.43 samples/sec Loss 5.4380 LearningRate 0.000902 Epoch: 5 Global Step: 120290 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:55:47,364-Speed 2497.44 samples/sec Loss 5.3935 LearningRate 0.000902 Epoch: 5 Global Step: 120300 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:55:55,506-Speed 2515.51 samples/sec Loss 5.4018 LearningRate 0.000902 Epoch: 5 Global Step: 120310 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:56:03,698-Speed 2500.62 samples/sec Loss 5.3573 LearningRate 0.000902 Epoch: 5 Global Step: 120320 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:56:11,895-Speed 2498.84 samples/sec Loss 5.3387 LearningRate 0.000902 Epoch: 5 Global Step: 120330 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:56:20,104-Speed 2494.79 samples/sec Loss 5.3636 LearningRate 0.000902 Epoch: 5 Global Step: 120340 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:56:28,298-Speed 2500.15 samples/sec Loss 5.4447 LearningRate 0.000902 Epoch: 5 Global Step: 120350 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:56:36,492-Speed 2499.98 samples/sec Loss 5.4154 LearningRate 0.000902 Epoch: 5 Global Step: 120360 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:56:44,632-Speed 2516.10 samples/sec Loss 5.4240 LearningRate 0.000902 Epoch: 5 Global Step: 120370 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:56:52,830-Speed 2498.99 samples/sec Loss 5.4483 
LearningRate 0.000902 Epoch: 5 Global Step: 120380 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:57:01,028-Speed 2498.66 samples/sec Loss 5.4737 LearningRate 0.000902 Epoch: 5 Global Step: 120390 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:57:09,240-Speed 2494.54 samples/sec Loss 5.4688 LearningRate 0.000902 Epoch: 5 Global Step: 120400 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:57:17,435-Speed 2499.43 samples/sec Loss 5.5405 LearningRate 0.000902 Epoch: 5 Global Step: 120410 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:57:25,630-Speed 2499.59 samples/sec Loss 5.5418 LearningRate 0.000902 Epoch: 5 Global Step: 120420 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:57:33,772-Speed 2515.98 samples/sec Loss 5.5381 LearningRate 0.000902 Epoch: 5 Global Step: 120430 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:57:41,992-Speed 2491.90 samples/sec Loss 5.4382 LearningRate 0.000902 Epoch: 5 Global Step: 120440 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:57:50,191-Speed 2498.48 samples/sec Loss 5.5225 LearningRate 0.000902 Epoch: 5 Global Step: 120450 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:57:58,386-Speed 2499.41 samples/sec Loss 5.4912 LearningRate 0.000902 Epoch: 5 Global Step: 120460 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:58:06,585-Speed 2498.41 samples/sec Loss 5.3486 LearningRate 0.000902 Epoch: 5 Global Step: 120470 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:58:14,782-Speed 2498.96 samples/sec Loss 5.4084 LearningRate 0.000902 Epoch: 5 Global Step: 120480 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:58:22,924-Speed 2515.56 samples/sec Loss 5.4144 LearningRate 0.000902 Epoch: 5 Global Step: 120490 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:58:31,124-Speed 2497.96 samples/sec Loss 5.4283 
LearningRate 0.000902 Epoch: 5 Global Step: 120500 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:58:39,319-Speed 2499.42 samples/sec Loss 5.3672 LearningRate 0.000902 Epoch: 5 Global Step: 120510 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:58:47,516-Speed 2498.93 samples/sec Loss 5.4751 LearningRate 0.000902 Epoch: 5 Global Step: 120520 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:58:55,724-Speed 2495.61 samples/sec Loss 5.4586 LearningRate 0.000902 Epoch: 5 Global Step: 120530 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:59:03,919-Speed 2499.39 samples/sec Loss 5.4051 LearningRate 0.000902 Epoch: 5 Global Step: 120540 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:59:12,062-Speed 2515.33 samples/sec Loss 5.3011 LearningRate 0.000902 Epoch: 5 Global Step: 120550 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:59:20,258-Speed 2499.07 samples/sec Loss 5.4430 LearningRate 0.000902 Epoch: 5 Global Step: 120560 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:59:28,458-Speed 2498.09 samples/sec Loss 5.4179 LearningRate 0.000902 Epoch: 5 Global Step: 120570 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:59:36,664-Speed 2496.05 samples/sec Loss 5.5367 LearningRate 0.000902 Epoch: 5 Global Step: 120580 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:59:44,861-Speed 2498.86 samples/sec Loss 5.3544 LearningRate 0.000902 Epoch: 5 Global Step: 120590 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 17:59:53,060-Speed 2498.35 samples/sec Loss 5.3570 LearningRate 0.000902 Epoch: 5 Global Step: 120600 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:00:01,201-Speed 2515.98 samples/sec Loss 5.4241 LearningRate 0.000902 Epoch: 5 Global Step: 120610 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:00:09,397-Speed 2499.31 samples/sec Loss 5.4181 
LearningRate 0.000902 Epoch: 5 Global Step: 120620 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:00:17,593-Speed 2499.20 samples/sec Loss 5.4273 LearningRate 0.000902 Epoch: 5 Global Step: 120630 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:00:25,789-Speed 2499.27 samples/sec Loss 5.4293 LearningRate 0.000902 Epoch: 5 Global Step: 120640 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:00:33,985-Speed 2499.19 samples/sec Loss 5.3789 LearningRate 0.000902 Epoch: 5 Global Step: 120650 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:00:42,181-Speed 2499.30 samples/sec Loss 5.3462 LearningRate 0.000902 Epoch: 5 Global Step: 120660 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:00:50,323-Speed 2515.52 samples/sec Loss 5.3602 LearningRate 0.000902 Epoch: 5 Global Step: 120670 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:00:58,522-Speed 2498.50 samples/sec Loss 5.3535 LearningRate 0.000902 Epoch: 5 Global Step: 120680 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:01:06,718-Speed 2499.09 samples/sec Loss 5.3884 LearningRate 0.000901 Epoch: 5 Global Step: 120690 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:01:14,913-Speed 2499.23 samples/sec Loss 5.3666 LearningRate 0.000901 Epoch: 5 Global Step: 120700 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:01:23,115-Speed 2497.65 samples/sec Loss 5.3152 LearningRate 0.000901 Epoch: 5 Global Step: 120710 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:01:31,312-Speed 2498.84 samples/sec Loss 5.4068 LearningRate 0.000901 Epoch: 5 Global Step: 120720 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:01:39,455-Speed 2515.68 samples/sec Loss 5.3763 LearningRate 0.000901 Epoch: 5 Global Step: 120730 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:01:47,653-Speed 2498.47 samples/sec Loss 5.3873 
LearningRate 0.000901 Epoch: 5 Global Step: 120740 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:01:55,849-Speed 2499.16 samples/sec Loss 5.5060 LearningRate 0.000901 Epoch: 5 Global Step: 120750 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:02:04,046-Speed 2498.73 samples/sec Loss 5.5176 LearningRate 0.000901 Epoch: 5 Global Step: 120760 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:02:12,241-Speed 2499.78 samples/sec Loss 5.3827 LearningRate 0.000901 Epoch: 5 Global Step: 120770 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:02:20,434-Speed 2499.80 samples/sec Loss 5.4226 LearningRate 0.000901 Epoch: 5 Global Step: 120780 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:02:28,582-Speed 2514.08 samples/sec Loss 5.3912 LearningRate 0.000901 Epoch: 5 Global Step: 120790 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:02:36,779-Speed 2498.77 samples/sec Loss 5.4448 LearningRate 0.000901 Epoch: 5 Global Step: 120800 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:02:44,977-Speed 2498.58 samples/sec Loss 5.4461 LearningRate 0.000901 Epoch: 5 Global Step: 120810 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:02:53,185-Speed 2495.76 samples/sec Loss 5.4400 LearningRate 0.000901 Epoch: 5 Global Step: 120820 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:03:01,379-Speed 2499.58 samples/sec Loss 5.3938 LearningRate 0.000901 Epoch: 5 Global Step: 120830 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:03:09,577-Speed 2498.68 samples/sec Loss 5.3631 LearningRate 0.000901 Epoch: 5 Global Step: 120840 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:03:17,721-Speed 2515.10 samples/sec Loss 5.2889 LearningRate 0.000901 Epoch: 5 Global Step: 120850 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:03:25,931-Speed 2494.91 samples/sec Loss 5.4032 
LearningRate 0.000901 Epoch: 5 Global Step: 120860 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:03:34,131-Speed 2498.30 samples/sec Loss 5.4425 LearningRate 0.000901 Epoch: 5 Global Step: 120870 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:03:42,337-Speed 2496.23 samples/sec Loss 5.3726 LearningRate 0.000901 Epoch: 5 Global Step: 120880 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:03:50,537-Speed 2498.48 samples/sec Loss 5.4617 LearningRate 0.000901 Epoch: 5 Global Step: 120890 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:03:58,736-Speed 2497.98 samples/sec Loss 5.4544 LearningRate 0.000901 Epoch: 5 Global Step: 120900 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:04:06,885-Speed 2513.87 samples/sec Loss 5.4842 LearningRate 0.000901 Epoch: 5 Global Step: 120910 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:04:15,081-Speed 2499.10 samples/sec Loss 5.4469 LearningRate 0.000901 Epoch: 5 Global Step: 120920 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:04:23,278-Speed 2499.15 samples/sec Loss 5.4829 LearningRate 0.000901 Epoch: 5 Global Step: 120930 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:04:31,478-Speed 2498.04 samples/sec Loss 5.3618 LearningRate 0.000901 Epoch: 5 Global Step: 120940 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:04:39,677-Speed 2498.16 samples/sec Loss 5.4077 LearningRate 0.000901 Epoch: 5 Global Step: 120950 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:04:47,875-Speed 2498.42 samples/sec Loss 5.4232 LearningRate 0.000901 Epoch: 5 Global Step: 120960 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:04:56,023-Speed 2513.97 samples/sec Loss 5.3191 LearningRate 0.000901 Epoch: 5 Global Step: 120970 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:05:04,223-Speed 2497.94 samples/sec Loss 5.3028 
LearningRate 0.000901 Epoch: 5 Global Step: 120980 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:05:12,423-Speed 2497.84 samples/sec Loss 5.3514 LearningRate 0.000901 Epoch: 5 Global Step: 120990 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:05:20,631-Speed 2495.84 samples/sec Loss 5.3414 LearningRate 0.000901 Epoch: 5 Global Step: 121000 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:05:28,831-Speed 2497.91 samples/sec Loss 5.3570 LearningRate 0.000901 Epoch: 5 Global Step: 121010 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:05:37,031-Speed 2497.99 samples/sec Loss 5.3336 LearningRate 0.000901 Epoch: 5 Global Step: 121020 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:05:45,180-Speed 2513.35 samples/sec Loss 5.3072 LearningRate 0.000901 Epoch: 5 Global Step: 121030 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:05:53,379-Speed 2498.26 samples/sec Loss 5.4733 LearningRate 0.000901 Epoch: 5 Global Step: 121040 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:06:01,597-Speed 2492.50 samples/sec Loss 5.4645 LearningRate 0.000901 Epoch: 5 Global Step: 121050 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:06:09,798-Speed 2497.58 samples/sec Loss 5.3509 LearningRate 0.000901 Epoch: 5 Global Step: 121060 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:06:17,998-Speed 2497.97 samples/sec Loss 5.4072 LearningRate 0.000901 Epoch: 5 Global Step: 121070 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:06:26,198-Speed 2497.91 samples/sec Loss 5.4061 LearningRate 0.000900 Epoch: 5 Global Step: 121080 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:06:34,345-Speed 2514.32 samples/sec Loss 5.3414 LearningRate 0.000900 Epoch: 5 Global Step: 121090 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:06:42,545-Speed 2497.90 samples/sec Loss 5.3566 
LearningRate 0.000900 Epoch: 5 Global Step: 121100 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:06:50,746-Speed 2497.90 samples/sec Loss 5.2716 LearningRate 0.000900 Epoch: 5 Global Step: 121110 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:06:58,944-Speed 2498.49 samples/sec Loss 5.3433 LearningRate 0.000900 Epoch: 5 Global Step: 121120 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:07:07,147-Speed 2497.07 samples/sec Loss 5.3999 LearningRate 0.000900 Epoch: 5 Global Step: 121130 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:07:15,344-Speed 2498.71 samples/sec Loss 5.3896 LearningRate 0.000900 Epoch: 5 Global Step: 121140 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:07:23,491-Speed 2514.30 samples/sec Loss 5.4019 LearningRate 0.000900 Epoch: 5 Global Step: 121150 Fp16 Grad Scale: 65536 Required: 162 hours Training: 2022-07-06 18:07:31,647-Speed 2511.83 samples/sec Loss 5.4157 LearningRate 0.000900 Epoch: 5 Global Step: 121160 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:07:39,843-Speed 2499.23 samples/sec Loss 5.3913 LearningRate 0.000900 Epoch: 5 Global Step: 121170 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:07:48,043-Speed 2498.02 samples/sec Loss 5.4296 LearningRate 0.000900 Epoch: 5 Global Step: 121180 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:07:56,243-Speed 2497.81 samples/sec Loss 5.3463 LearningRate 0.000900 Epoch: 5 Global Step: 121190 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:08:04,440-Speed 2499.03 samples/sec Loss 5.3689 LearningRate 0.000900 Epoch: 5 Global Step: 121200 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:08:12,601-Speed 2509.99 samples/sec Loss 5.4139 LearningRate 0.000900 Epoch: 5 Global Step: 121210 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:08:20,804-Speed 2497.19 samples/sec Loss 5.3615 
LearningRate 0.000900 Epoch: 5 Global Step: 121220 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:08:29,005-Speed 2497.90 samples/sec Loss 5.3890 LearningRate 0.000900 Epoch: 5 Global Step: 121230 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:08:37,212-Speed 2495.62 samples/sec Loss 5.4598 LearningRate 0.000900 Epoch: 5 Global Step: 121240 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:08:45,415-Speed 2496.86 samples/sec Loss 5.4481 LearningRate 0.000900 Epoch: 5 Global Step: 121250 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:08:53,617-Speed 2497.38 samples/sec Loss 5.4790 LearningRate 0.000900 Epoch: 5 Global Step: 121260 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:09:01,766-Speed 2513.78 samples/sec Loss 5.3361 LearningRate 0.000900 Epoch: 5 Global Step: 121270 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:09:09,967-Speed 2497.48 samples/sec Loss 5.4111 LearningRate 0.000900 Epoch: 5 Global Step: 121280 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:09:18,170-Speed 2497.04 samples/sec Loss 5.3436 LearningRate 0.000900 Epoch: 5 Global Step: 121290 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:09:26,374-Speed 2497.12 samples/sec Loss 5.2991 LearningRate 0.000900 Epoch: 5 Global Step: 121300 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:09:34,582-Speed 2495.66 samples/sec Loss 5.3663 LearningRate 0.000900 Epoch: 5 Global Step: 121310 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:09:42,787-Speed 2496.49 samples/sec Loss 5.3802 LearningRate 0.000900 Epoch: 5 Global Step: 121320 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:09:50,937-Speed 2513.38 samples/sec Loss 5.2665 LearningRate 0.000900 Epoch: 5 Global Step: 121330 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:09:59,138-Speed 2497.77 samples/sec Loss 5.4378 
LearningRate 0.000900 Epoch: 5 Global Step: 121340 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:10:07,341-Speed 2497.29 samples/sec Loss 5.4232 LearningRate 0.000900 Epoch: 5 Global Step: 121350 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:10:15,542-Speed 2497.42 samples/sec Loss 5.3426 LearningRate 0.000900 Epoch: 5 Global Step: 121360 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:10:23,757-Speed 2493.53 samples/sec Loss 5.3352 LearningRate 0.000900 Epoch: 5 Global Step: 121370 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:10:31,957-Speed 2497.99 samples/sec Loss 5.4022 LearningRate 0.000900 Epoch: 5 Global Step: 121380 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:10:40,117-Speed 2510.21 samples/sec Loss 5.2634 LearningRate 0.000900 Epoch: 5 Global Step: 121390 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:10:48,333-Speed 2493.21 samples/sec Loss 5.3110 LearningRate 0.000900 Epoch: 5 Global Step: 121400 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:10:56,533-Speed 2497.95 samples/sec Loss 5.3716 LearningRate 0.000900 Epoch: 5 Global Step: 121410 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:11:04,733-Speed 2498.33 samples/sec Loss 5.3687 LearningRate 0.000900 Epoch: 5 Global Step: 121420 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:11:12,932-Speed 2498.35 samples/sec Loss 5.3860 LearningRate 0.000900 Epoch: 5 Global Step: 121430 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:11:21,138-Speed 2496.04 samples/sec Loss 5.3299 LearningRate 0.000900 Epoch: 5 Global Step: 121440 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:11:29,287-Speed 2513.63 samples/sec Loss 5.3174 LearningRate 0.000900 Epoch: 5 Global Step: 121450 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:11:37,486-Speed 2498.14 samples/sec Loss 5.4236 
LearningRate 0.000900 Epoch: 5 Global Step: 121460 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:11:45,684-Speed 2498.69 samples/sec Loss 5.3970 LearningRate 0.000899 Epoch: 5 Global Step: 121470 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:11:53,881-Speed 2498.70 samples/sec Loss 5.3673 LearningRate 0.000899 Epoch: 5 Global Step: 121480 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:12:02,084-Speed 2497.15 samples/sec Loss 5.4079 LearningRate 0.000899 Epoch: 5 Global Step: 121490 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:12:10,284-Speed 2498.12 samples/sec Loss 5.4998 LearningRate 0.000899 Epoch: 5 Global Step: 121500 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:12:18,429-Speed 2514.73 samples/sec Loss 5.4706 LearningRate 0.000899 Epoch: 5 Global Step: 121510 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:12:26,624-Speed 2499.35 samples/sec Loss 5.4049 LearningRate 0.000899 Epoch: 5 Global Step: 121520 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:12:34,820-Speed 2499.17 samples/sec Loss 5.3975 LearningRate 0.000899 Epoch: 5 Global Step: 121530 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:12:43,016-Speed 2499.37 samples/sec Loss 5.4284 LearningRate 0.000899 Epoch: 5 Global Step: 121540 Fp16 Grad Scale: 32768 Required: 162 hours Training: 2022-07-06 18:12:51,170-Speed 2511.77 samples/sec Loss 5.3359 LearningRate 0.000899 Epoch: 5 Global Step: 121550 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:12:59,369-Speed 2498.60 samples/sec Loss 5.3259 LearningRate 0.000899 Epoch: 5 Global Step: 121560 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:13:07,510-Speed 2515.99 samples/sec Loss 5.4140 LearningRate 0.000899 Epoch: 5 Global Step: 121570 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:13:15,706-Speed 2499.33 samples/sec Loss 5.3117 
LearningRate 0.000899 Epoch: 5 Global Step: 121580 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:13:23,905-Speed 2498.22 samples/sec Loss 5.2564 LearningRate 0.000899 Epoch: 5 Global Step: 121590 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:13:32,120-Speed 2493.49 samples/sec Loss 5.2679 LearningRate 0.000899 Epoch: 5 Global Step: 121600 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:13:40,334-Speed 2493.68 samples/sec Loss 5.3841 LearningRate 0.000899 Epoch: 5 Global Step: 121610 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:13:48,532-Speed 2498.88 samples/sec Loss 5.3650 LearningRate 0.000899 Epoch: 5 Global Step: 121620 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:13:56,681-Speed 2513.70 samples/sec Loss 5.3578 LearningRate 0.000899 Epoch: 5 Global Step: 121630 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:14:04,882-Speed 2497.43 samples/sec Loss 5.3420 LearningRate 0.000899 Epoch: 5 Global Step: 121640 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:14:13,082-Speed 2497.98 samples/sec Loss 5.3694 LearningRate 0.000899 Epoch: 5 Global Step: 121650 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:14:21,286-Speed 2496.67 samples/sec Loss 5.3487 LearningRate 0.000899 Epoch: 5 Global Step: 121660 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:14:29,490-Speed 2496.65 samples/sec Loss 5.4659 LearningRate 0.000899 Epoch: 5 Global Step: 121670 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:14:37,688-Speed 2498.63 samples/sec Loss 5.4808 LearningRate 0.000899 Epoch: 5 Global Step: 121680 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:14:45,835-Speed 2514.48 samples/sec Loss 5.3679 LearningRate 0.000899 Epoch: 5 Global Step: 121690 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:14:54,033-Speed 2498.64 samples/sec Loss 5.3663 
LearningRate 0.000899 Epoch: 5 Global Step: 121700 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:15:02,234-Speed 2497.56 samples/sec Loss 5.3754 LearningRate 0.000899 Epoch: 5 Global Step: 121710 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:15:10,435-Speed 2497.77 samples/sec Loss 5.3237 LearningRate 0.000899 Epoch: 5 Global Step: 121720 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:15:18,643-Speed 2495.60 samples/sec Loss 5.4274 LearningRate 0.000899 Epoch: 5 Global Step: 121730 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:15:26,845-Speed 2497.30 samples/sec Loss 5.2856 LearningRate 0.000899 Epoch: 5 Global Step: 121740 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:15:34,994-Speed 2513.41 samples/sec Loss 5.3184 LearningRate 0.000899 Epoch: 5 Global Step: 121750 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:15:43,193-Speed 2498.38 samples/sec Loss 5.3843 LearningRate 0.000899 Epoch: 5 Global Step: 121760 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:15:51,391-Speed 2498.87 samples/sec Loss 5.3662 LearningRate 0.000899 Epoch: 5 Global Step: 121770 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:15:59,588-Speed 2498.98 samples/sec Loss 5.3099 LearningRate 0.000899 Epoch: 5 Global Step: 121780 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:16:07,786-Speed 2498.39 samples/sec Loss 5.3557 LearningRate 0.000899 Epoch: 5 Global Step: 121790 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:16:15,984-Speed 2498.79 samples/sec Loss 5.3195 LearningRate 0.000899 Epoch: 5 Global Step: 121800 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:16:24,127-Speed 2515.59 samples/sec Loss 5.4334 LearningRate 0.000899 Epoch: 5 Global Step: 121810 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:16:32,324-Speed 2498.85 samples/sec Loss 5.2749 
LearningRate 0.000899 Epoch: 5 Global Step: 121820 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:16:40,518-Speed 2499.76 samples/sec Loss 5.2860 LearningRate 0.000899 Epoch: 5 Global Step: 121830 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:16:48,717-Speed 2498.32 samples/sec Loss 5.3280 LearningRate 0.000899 Epoch: 5 Global Step: 121840 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:16:56,915-Speed 2498.65 samples/sec Loss 5.4348 LearningRate 0.000899 Epoch: 5 Global Step: 121850 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:17:05,113-Speed 2498.58 samples/sec Loss 5.3985 LearningRate 0.000899 Epoch: 5 Global Step: 121860 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:17:13,256-Speed 2515.53 samples/sec Loss 5.3806 LearningRate 0.000898 Epoch: 5 Global Step: 121870 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:17:21,463-Speed 2495.89 samples/sec Loss 5.3831 LearningRate 0.000898 Epoch: 5 Global Step: 121880 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:17:29,661-Speed 2498.51 samples/sec Loss 5.3390 LearningRate 0.000898 Epoch: 5 Global Step: 121890 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:17:37,859-Speed 2498.45 samples/sec Loss 5.3701 LearningRate 0.000898 Epoch: 5 Global Step: 121900 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:17:46,054-Speed 2499.81 samples/sec Loss 5.3999 LearningRate 0.000898 Epoch: 5 Global Step: 121910 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:17:54,249-Speed 2499.39 samples/sec Loss 5.3325 LearningRate 0.000898 Epoch: 5 Global Step: 121920 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:18:02,393-Speed 2515.09 samples/sec Loss 5.3834 LearningRate 0.000898 Epoch: 5 Global Step: 121930 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:18:10,589-Speed 2499.29 samples/sec Loss 5.3946 
LearningRate 0.000898 Epoch: 5 Global Step: 121940 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:18:18,798-Speed 2495.27 samples/sec Loss 5.3079 LearningRate 0.000898 Epoch: 5 Global Step: 121950 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:18:26,996-Speed 2498.59 samples/sec Loss 5.3394 LearningRate 0.000898 Epoch: 5 Global Step: 121960 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:18:35,194-Speed 2498.63 samples/sec Loss 5.3996 LearningRate 0.000898 Epoch: 5 Global Step: 121970 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:18:43,396-Speed 2497.06 samples/sec Loss 5.4484 LearningRate 0.000898 Epoch: 5 Global Step: 121980 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:18:51,541-Speed 2515.17 samples/sec Loss 5.3744 LearningRate 0.000898 Epoch: 5 Global Step: 121990 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:18:59,742-Speed 2497.59 samples/sec Loss 5.3690 LearningRate 0.000898 Epoch: 5 Global Step: 122000 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:19:07,940-Speed 2498.58 samples/sec Loss 5.3587 LearningRate 0.000898 Epoch: 5 Global Step: 122010 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:19:16,136-Speed 2499.05 samples/sec Loss 5.4174 LearningRate 0.000898 Epoch: 5 Global Step: 122020 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:19:24,338-Speed 2497.46 samples/sec Loss 5.3886 LearningRate 0.000898 Epoch: 5 Global Step: 122030 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:19:32,534-Speed 2499.48 samples/sec Loss 5.3612 LearningRate 0.000898 Epoch: 5 Global Step: 122040 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:19:40,681-Speed 2514.30 samples/sec Loss 5.5040 LearningRate 0.000898 Epoch: 5 Global Step: 122050 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:19:48,874-Speed 2500.16 samples/sec Loss 5.4438 
LearningRate 0.000898 Epoch: 5 Global Step: 122060 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:19:57,068-Speed 2500.07 samples/sec Loss 5.4480 LearningRate 0.000898 Epoch: 5 Global Step: 122070 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:20:05,263-Speed 2499.37 samples/sec Loss 5.4199 LearningRate 0.000898 Epoch: 5 Global Step: 122080 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:20:13,460-Speed 2499.05 samples/sec Loss 5.4430 LearningRate 0.000898 Epoch: 5 Global Step: 122090 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:20:21,657-Speed 2498.63 samples/sec Loss 5.3137 LearningRate 0.000898 Epoch: 5 Global Step: 122100 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:20:29,802-Speed 2515.16 samples/sec Loss 5.4296 LearningRate 0.000898 Epoch: 5 Global Step: 122110 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:20:38,004-Speed 2497.24 samples/sec Loss 5.4391 LearningRate 0.000898 Epoch: 5 Global Step: 122120 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:20:46,211-Speed 2495.79 samples/sec Loss 5.3610 LearningRate 0.000898 Epoch: 5 Global Step: 122130 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:20:54,410-Speed 2498.42 samples/sec Loss 5.2949 LearningRate 0.000898 Epoch: 5 Global Step: 122140 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:21:02,607-Speed 2498.99 samples/sec Loss 5.3049 LearningRate 0.000898 Epoch: 5 Global Step: 122150 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:21:10,809-Speed 2497.23 samples/sec Loss 5.2951 LearningRate 0.000898 Epoch: 5 Global Step: 122160 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:21:18,951-Speed 2515.88 samples/sec Loss 5.3646 LearningRate 0.000898 Epoch: 5 Global Step: 122170 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:21:27,152-Speed 2497.64 samples/sec Loss 5.3369 
LearningRate 0.000898 Epoch: 5 Global Step: 122180 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:21:35,346-Speed 2499.51 samples/sec Loss 5.3767 LearningRate 0.000898 Epoch: 5 Global Step: 122190 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:21:43,548-Speed 2497.39 samples/sec Loss 5.3343 LearningRate 0.000898 Epoch: 5 Global Step: 122200 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:21:51,749-Speed 2497.74 samples/sec Loss 5.3940 LearningRate 0.000898 Epoch: 5 Global Step: 122210 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:21:59,946-Speed 2499.10 samples/sec Loss 5.3187 LearningRate 0.000898 Epoch: 5 Global Step: 122220 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:22:08,092-Speed 2514.65 samples/sec Loss 5.4043 LearningRate 0.000898 Epoch: 5 Global Step: 122230 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:22:16,290-Speed 2498.57 samples/sec Loss 5.4472 LearningRate 0.000898 Epoch: 5 Global Step: 122240 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:22:24,482-Speed 2500.33 samples/sec Loss 5.4399 LearningRate 0.000898 Epoch: 5 Global Step: 122250 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:22:32,679-Speed 2498.93 samples/sec Loss 5.3945 LearningRate 0.000897 Epoch: 5 Global Step: 122260 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:22:40,884-Speed 2496.54 samples/sec Loss 5.4204 LearningRate 0.000897 Epoch: 5 Global Step: 122270 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:22:49,078-Speed 2499.68 samples/sec Loss 5.4674 LearningRate 0.000897 Epoch: 5 Global Step: 122280 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:22:57,222-Speed 2515.05 samples/sec Loss 5.3472 LearningRate 0.000897 Epoch: 5 Global Step: 122290 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:23:05,421-Speed 2498.30 samples/sec Loss 5.3745 
LearningRate 0.000897 Epoch: 5 Global Step: 122300 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:23:13,616-Speed 2499.52 samples/sec Loss 5.3153 LearningRate 0.000897 Epoch: 5 Global Step: 122310 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:23:21,835-Speed 2492.22 samples/sec Loss 5.3588 LearningRate 0.000897 Epoch: 5 Global Step: 122320 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:23:30,036-Speed 2497.61 samples/sec Loss 5.3873 LearningRate 0.000897 Epoch: 5 Global Step: 122330 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:23:38,233-Speed 2498.84 samples/sec Loss 5.4135 LearningRate 0.000897 Epoch: 5 Global Step: 122340 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:23:46,379-Speed 2514.81 samples/sec Loss 5.3595 LearningRate 0.000897 Epoch: 5 Global Step: 122350 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:23:54,576-Speed 2498.64 samples/sec Loss 5.3106 LearningRate 0.000897 Epoch: 5 Global Step: 122360 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:24:02,776-Speed 2498.05 samples/sec Loss 5.3257 LearningRate 0.000897 Epoch: 5 Global Step: 122370 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:24:10,977-Speed 2498.05 samples/sec Loss 5.3140 LearningRate 0.000897 Epoch: 5 Global Step: 122380 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:24:19,174-Speed 2499.00 samples/sec Loss 5.3163 LearningRate 0.000897 Epoch: 5 Global Step: 122390 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:24:27,373-Speed 2498.06 samples/sec Loss 5.2820 LearningRate 0.000897 Epoch: 5 Global Step: 122400 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:24:35,524-Speed 2512.93 samples/sec Loss 5.3721 LearningRate 0.000897 Epoch: 5 Global Step: 122410 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:24:43,726-Speed 2497.50 samples/sec Loss 5.2950 
LearningRate 0.000897 Epoch: 5 Global Step: 122420 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:24:51,925-Speed 2498.10 samples/sec Loss 5.3540 LearningRate 0.000897 Epoch: 5 Global Step: 122430 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:25:00,125-Speed 2497.93 samples/sec Loss 5.3188 LearningRate 0.000897 Epoch: 5 Global Step: 122440 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:25:08,326-Speed 2497.71 samples/sec Loss 5.3913 LearningRate 0.000897 Epoch: 5 Global Step: 122450 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:25:16,525-Speed 2498.56 samples/sec Loss 5.2779 LearningRate 0.000897 Epoch: 5 Global Step: 122460 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:25:24,668-Speed 2515.48 samples/sec Loss 5.3711 LearningRate 0.000897 Epoch: 5 Global Step: 122470 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:25:32,863-Speed 2499.52 samples/sec Loss 5.3420 LearningRate 0.000897 Epoch: 5 Global Step: 122480 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:25:41,064-Speed 2497.58 samples/sec Loss 5.3209 LearningRate 0.000897 Epoch: 5 Global Step: 122490 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:25:49,266-Speed 2497.74 samples/sec Loss 5.3506 LearningRate 0.000897 Epoch: 5 Global Step: 122500 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:25:57,463-Speed 2499.06 samples/sec Loss 5.3512 LearningRate 0.000897 Epoch: 5 Global Step: 122510 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:26:05,659-Speed 2499.03 samples/sec Loss 5.3455 LearningRate 0.000897 Epoch: 5 Global Step: 122520 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:26:13,803-Speed 2515.04 samples/sec Loss 5.4325 LearningRate 0.000897 Epoch: 5 Global Step: 122530 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:26:22,008-Speed 2496.51 samples/sec Loss 5.3311 
LearningRate 0.000897 Epoch: 5 Global Step: 122540 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:26:30,204-Speed 2499.21 samples/sec Loss 5.3197 LearningRate 0.000897 Epoch: 5 Global Step: 122550 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:26:38,403-Speed 2498.23 samples/sec Loss 5.4071 LearningRate 0.000897 Epoch: 5 Global Step: 122560 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:26:46,602-Speed 2498.43 samples/sec Loss 5.4132 LearningRate 0.000897 Epoch: 5 Global Step: 122570 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:26:54,807-Speed 2496.26 samples/sec Loss 5.2927 LearningRate 0.000897 Epoch: 5 Global Step: 122580 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:27:02,976-Speed 2507.45 samples/sec Loss 5.3826 LearningRate 0.000897 Epoch: 5 Global Step: 122590 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:27:11,174-Speed 2498.83 samples/sec Loss 5.3057 LearningRate 0.000897 Epoch: 5 Global Step: 122600 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:27:19,372-Speed 2498.72 samples/sec Loss 5.4142 LearningRate 0.000897 Epoch: 5 Global Step: 122610 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:27:27,569-Speed 2498.87 samples/sec Loss 5.2880 LearningRate 0.000897 Epoch: 5 Global Step: 122620 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:27:35,767-Speed 2498.49 samples/sec Loss 5.2194 LearningRate 0.000897 Epoch: 5 Global Step: 122630 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:27:43,966-Speed 2498.15 samples/sec Loss 5.3823 LearningRate 0.000897 Epoch: 5 Global Step: 122640 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:27:52,115-Speed 2513.59 samples/sec Loss 5.2967 LearningRate 0.000896 Epoch: 5 Global Step: 122650 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:28:00,315-Speed 2498.37 samples/sec Loss 5.2653 
LearningRate 0.000896 Epoch: 5 Global Step: 122660 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:28:08,513-Speed 2498.50 samples/sec Loss 5.3662 LearningRate 0.000896 Epoch: 5 Global Step: 122670 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:28:16,707-Speed 2499.74 samples/sec Loss 5.3821 LearningRate 0.000896 Epoch: 5 Global Step: 122680 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:28:24,906-Speed 2498.39 samples/sec Loss 5.3734 LearningRate 0.000896 Epoch: 5 Global Step: 122690 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:28:33,104-Speed 2498.55 samples/sec Loss 5.3184 LearningRate 0.000896 Epoch: 5 Global Step: 122700 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:28:41,247-Speed 2515.30 samples/sec Loss 5.2501 LearningRate 0.000896 Epoch: 5 Global Step: 122710 Fp16 Grad Scale: 16384 Required: 162 hours Training: 2022-07-06 18:28:49,442-Speed 2499.62 samples/sec Loss 5.3058 LearningRate 0.000896 Epoch: 5 Global Step: 122720 Fp16 Grad Scale: 16384 Required: 161 hours Training: 2022-07-06 18:28:57,642-Speed 2498.04 samples/sec Loss 5.3844 LearningRate 0.000896 Epoch: 5 Global Step: 122730 Fp16 Grad Scale: 16384 Required: 161 hours Training: 2022-07-06 18:29:05,839-Speed 2498.72 samples/sec Loss 5.2967 LearningRate 0.000896 Epoch: 5 Global Step: 122740 Fp16 Grad Scale: 16384 Required: 161 hours Training: 2022-07-06 18:29:14,037-Speed 2498.71 samples/sec Loss 5.3935 LearningRate 0.000896 Epoch: 5 Global Step: 122750 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:29:22,240-Speed 2496.82 samples/sec Loss 5.2546 LearningRate 0.000896 Epoch: 5 Global Step: 122760 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:29:30,394-Speed 2512.47 samples/sec Loss 5.3628 LearningRate 0.000896 Epoch: 5 Global Step: 122770 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:29:38,590-Speed 2499.26 samples/sec Loss 5.3329 
LearningRate 0.000896 Epoch: 5 Global Step: 122780 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:29:46,789-Speed 2498.29 samples/sec Loss 5.4164 LearningRate 0.000896 Epoch: 5 Global Step: 122790 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:29:54,997-Speed 2495.76 samples/sec Loss 5.2992 LearningRate 0.000896 Epoch: 5 Global Step: 122800 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:30:03,198-Speed 2497.66 samples/sec Loss 5.2462 LearningRate 0.000896 Epoch: 5 Global Step: 122810 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:30:11,396-Speed 2498.31 samples/sec Loss 5.2795 LearningRate 0.000896 Epoch: 5 Global Step: 122820 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:30:19,562-Speed 2508.35 samples/sec Loss 5.2085 LearningRate 0.000896 Epoch: 5 Global Step: 122830 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:30:27,761-Speed 2498.41 samples/sec Loss 5.2928 LearningRate 0.000896 Epoch: 5 Global Step: 122840 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:30:35,958-Speed 2498.90 samples/sec Loss 5.2940 LearningRate 0.000896 Epoch: 5 Global Step: 122850 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:30:44,154-Speed 2499.23 samples/sec Loss 5.3121 LearningRate 0.000896 Epoch: 5 Global Step: 122860 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:30:52,350-Speed 2499.10 samples/sec Loss 5.3061 LearningRate 0.000896 Epoch: 5 Global Step: 122870 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:31:00,550-Speed 2497.96 samples/sec Loss 5.3567 LearningRate 0.000896 Epoch: 5 Global Step: 122880 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:31:08,690-Speed 2516.50 samples/sec Loss 5.2679 LearningRate 0.000896 Epoch: 5 Global Step: 122890 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:31:16,888-Speed 2498.60 samples/sec Loss 5.3838 
LearningRate 0.000896 Epoch: 5 Global Step: 122900 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:31:25,087-Speed 2498.37 samples/sec Loss 5.2705 LearningRate 0.000896 Epoch: 5 Global Step: 122910 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:31:33,282-Speed 2499.35 samples/sec Loss 5.3242 LearningRate 0.000896 Epoch: 5 Global Step: 122920 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:31:41,478-Speed 2499.17 samples/sec Loss 5.2799 LearningRate 0.000896 Epoch: 5 Global Step: 122930 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:31:49,675-Speed 2498.87 samples/sec Loss 5.2053 LearningRate 0.000896 Epoch: 5 Global Step: 122940 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:31:57,820-Speed 2514.96 samples/sec Loss 5.2137 LearningRate 0.000896 Epoch: 5 Global Step: 122950 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:32:06,016-Speed 2499.07 samples/sec Loss 5.2907 LearningRate 0.000896 Epoch: 5 Global Step: 122960 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:32:14,212-Speed 2499.07 samples/sec Loss 5.3056 LearningRate 0.000896 Epoch: 5 Global Step: 122970 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:32:22,408-Speed 2499.38 samples/sec Loss 5.2797 LearningRate 0.000896 Epoch: 5 Global Step: 122980 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:32:30,608-Speed 2497.70 samples/sec Loss 5.2643 LearningRate 0.000896 Epoch: 5 Global Step: 122990 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:32:38,805-Speed 2498.93 samples/sec Loss 5.2561 LearningRate 0.000896 Epoch: 5 Global Step: 123000 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:32:46,948-Speed 2515.45 samples/sec Loss 5.2824 LearningRate 0.000896 Epoch: 5 Global Step: 123010 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:32:55,144-Speed 2499.00 samples/sec Loss 5.2616 
LearningRate 0.000896 Epoch: 5 Global Step: 123020 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:33:03,345-Speed 2498.02 samples/sec Loss 5.1484 LearningRate 0.000896 Epoch: 5 Global Step: 123030 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:33:11,545-Speed 2497.83 samples/sec Loss 5.2608 LearningRate 0.000896 Epoch: 5 Global Step: 123040 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:33:19,757-Speed 2494.10 samples/sec Loss 5.2827 LearningRate 0.000895 Epoch: 5 Global Step: 123050 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:33:27,962-Speed 2496.38 samples/sec Loss 5.4623 LearningRate 0.000895 Epoch: 5 Global Step: 123060 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:33:36,109-Speed 2514.26 samples/sec Loss 5.2773 LearningRate 0.000895 Epoch: 5 Global Step: 123070 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:33:44,308-Speed 2498.34 samples/sec Loss 5.4552 LearningRate 0.000895 Epoch: 5 Global Step: 123080 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:33:52,522-Speed 2493.64 samples/sec Loss 5.3093 LearningRate 0.000895 Epoch: 5 Global Step: 123090 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:34:00,722-Speed 2498.25 samples/sec Loss 5.3717 LearningRate 0.000895 Epoch: 5 Global Step: 123100 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:34:08,931-Speed 2494.99 samples/sec Loss 5.3903 LearningRate 0.000895 Epoch: 5 Global Step: 123110 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:34:17,129-Speed 2498.55 samples/sec Loss 5.3023 LearningRate 0.000895 Epoch: 5 Global Step: 123120 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:34:25,283-Speed 2512.08 samples/sec Loss 5.2715 LearningRate 0.000895 Epoch: 5 Global Step: 123130 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:34:33,479-Speed 2499.38 samples/sec Loss 5.2465 
LearningRate 0.000895 Epoch: 5 Global Step: 123140 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:34:41,678-Speed 2498.11 samples/sec Loss 5.2495 LearningRate 0.000895 Epoch: 5 Global Step: 123150 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:34:49,876-Speed 2498.53 samples/sec Loss 5.3233 LearningRate 0.000895 Epoch: 5 Global Step: 123160 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:34:58,074-Speed 2498.55 samples/sec Loss 5.3426 LearningRate 0.000895 Epoch: 5 Global Step: 123170 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:35:06,268-Speed 2499.85 samples/sec Loss 5.2659 LearningRate 0.000895 Epoch: 5 Global Step: 123180 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:35:14,415-Speed 2514.40 samples/sec Loss 5.3719 LearningRate 0.000895 Epoch: 5 Global Step: 123190 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:35:22,623-Speed 2495.18 samples/sec Loss 5.4000 LearningRate 0.000895 Epoch: 5 Global Step: 123200 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:35:30,820-Speed 2499.31 samples/sec Loss 5.3332 LearningRate 0.000895 Epoch: 5 Global Step: 123210 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:35:39,016-Speed 2499.23 samples/sec Loss 5.2568 LearningRate 0.000895 Epoch: 5 Global Step: 123220 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:35:47,212-Speed 2499.12 samples/sec Loss 5.3320 LearningRate 0.000895 Epoch: 5 Global Step: 123230 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:35:55,408-Speed 2499.16 samples/sec Loss 5.3357 LearningRate 0.000895 Epoch: 5 Global Step: 123240 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:36:03,556-Speed 2514.09 samples/sec Loss 5.3809 LearningRate 0.000895 Epoch: 5 Global Step: 123250 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:36:11,752-Speed 2499.14 samples/sec Loss 5.3964 
LearningRate 0.000895 Epoch: 5 Global Step: 123260 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:36:19,955-Speed 2497.17 samples/sec Loss 5.3121 LearningRate 0.000895 Epoch: 5 Global Step: 123270 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:36:28,156-Speed 2497.40 samples/sec Loss 5.3454 LearningRate 0.000895 Epoch: 5 Global Step: 123280 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:36:36,354-Speed 2499.50 samples/sec Loss 5.3096 LearningRate 0.000895 Epoch: 5 Global Step: 123290 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:36:44,552-Speed 2498.48 samples/sec Loss 5.3622 LearningRate 0.000895 Epoch: 5 Global Step: 123300 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:36:52,697-Speed 2514.81 samples/sec Loss 5.4050 LearningRate 0.000895 Epoch: 5 Global Step: 123310 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:37:00,892-Speed 2499.51 samples/sec Loss 5.3340 LearningRate 0.000895 Epoch: 5 Global Step: 123320 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:37:09,092-Speed 2498.00 samples/sec Loss 5.2938 LearningRate 0.000895 Epoch: 5 Global Step: 123330 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:37:17,293-Speed 2497.82 samples/sec Loss 5.2938 LearningRate 0.000895 Epoch: 5 Global Step: 123340 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:37:25,491-Speed 2498.80 samples/sec Loss 5.2923 LearningRate 0.000895 Epoch: 5 Global Step: 123350 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:37:33,690-Speed 2498.15 samples/sec Loss 5.3266 LearningRate 0.000895 Epoch: 5 Global Step: 123360 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:37:41,836-Speed 2514.76 samples/sec Loss 5.2664 LearningRate 0.000895 Epoch: 5 Global Step: 123370 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:37:50,034-Speed 2498.51 samples/sec Loss 5.2903 
LearningRate 0.000895 Epoch: 5 Global Step: 123380 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:37:58,237-Speed 2497.20 samples/sec Loss 5.3056 LearningRate 0.000895 Epoch: 5 Global Step: 123390 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:38:06,434-Speed 2498.82 samples/sec Loss 5.2960 LearningRate 0.000895 Epoch: 5 Global Step: 123400 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:38:14,629-Speed 2499.64 samples/sec Loss 5.2779 LearningRate 0.000895 Epoch: 5 Global Step: 123410 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:38:22,829-Speed 2498.37 samples/sec Loss 5.2551 LearningRate 0.000895 Epoch: 5 Global Step: 123420 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:38:30,975-Speed 2514.60 samples/sec Loss 5.3058 LearningRate 0.000895 Epoch: 5 Global Step: 123430 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:38:39,171-Speed 2499.04 samples/sec Loss 5.2528 LearningRate 0.000894 Epoch: 5 Global Step: 123440 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:38:47,374-Speed 2496.91 samples/sec Loss 5.2878 LearningRate 0.000894 Epoch: 5 Global Step: 123450 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:38:55,592-Speed 2492.76 samples/sec Loss 5.2802 LearningRate 0.000894 Epoch: 5 Global Step: 123460 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:39:03,789-Speed 2499.18 samples/sec Loss 5.3571 LearningRate 0.000894 Epoch: 5 Global Step: 123470 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:39:11,994-Speed 2496.35 samples/sec Loss 5.3379 LearningRate 0.000894 Epoch: 5 Global Step: 123480 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:39:20,141-Speed 2514.48 samples/sec Loss 5.3478 LearningRate 0.000894 Epoch: 5 Global Step: 123490 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:39:28,347-Speed 2495.87 samples/sec Loss 5.2813 
LearningRate 0.000894 Epoch: 5 Global Step: 123500 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:39:36,544-Speed 2498.97 samples/sec Loss 5.4097 LearningRate 0.000894 Epoch: 5 Global Step: 123510 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:39:44,744-Speed 2497.87 samples/sec Loss 5.4460 LearningRate 0.000894 Epoch: 5 Global Step: 123520 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:39:52,941-Speed 2498.87 samples/sec Loss 5.4271 LearningRate 0.000894 Epoch: 5 Global Step: 123530 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:40:01,139-Speed 2498.60 samples/sec Loss 5.4241 LearningRate 0.000894 Epoch: 5 Global Step: 123540 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:40:09,284-Speed 2514.83 samples/sec Loss 5.3284 LearningRate 0.000894 Epoch: 5 Global Step: 123550 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:40:17,480-Speed 2499.10 samples/sec Loss 5.3759 LearningRate 0.000894 Epoch: 5 Global Step: 123560 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:40:25,680-Speed 2497.95 samples/sec Loss 5.3068 LearningRate 0.000894 Epoch: 5 Global Step: 123570 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:40:33,878-Speed 2498.78 samples/sec Loss 5.2766 LearningRate 0.000894 Epoch: 5 Global Step: 123580 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:40:42,075-Speed 2498.90 samples/sec Loss 5.2290 LearningRate 0.000894 Epoch: 5 Global Step: 123590 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:40:50,276-Speed 2497.60 samples/sec Loss 5.2844 LearningRate 0.000894 Epoch: 5 Global Step: 123600 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:40:58,422-Speed 2514.42 samples/sec Loss 5.2903 LearningRate 0.000894 Epoch: 5 Global Step: 123610 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:41:06,619-Speed 2498.74 samples/sec Loss 5.3473 
LearningRate 0.000894 Epoch: 5 Global Step: 123620 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:41:14,816-Speed 2499.16 samples/sec Loss 5.2164 LearningRate 0.000894 Epoch: 5 Global Step: 123630 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:41:23,013-Speed 2498.96 samples/sec Loss 5.3050 LearningRate 0.000894 Epoch: 5 Global Step: 123640 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:41:31,215-Speed 2497.35 samples/sec Loss 5.2769 LearningRate 0.000894 Epoch: 5 Global Step: 123650 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:41:39,414-Speed 2498.08 samples/sec Loss 5.2773 LearningRate 0.000894 Epoch: 5 Global Step: 123660 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:41:47,561-Speed 2514.11 samples/sec Loss 5.2451 LearningRate 0.000894 Epoch: 5 Global Step: 123670 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:41:55,760-Speed 2498.37 samples/sec Loss 5.3121 LearningRate 0.000894 Epoch: 5 Global Step: 123680 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:42:03,955-Speed 2499.79 samples/sec Loss 5.2611 LearningRate 0.000894 Epoch: 5 Global Step: 123690 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:42:12,166-Speed 2494.75 samples/sec Loss 5.3522 LearningRate 0.000894 Epoch: 5 Global Step: 123700 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:42:20,363-Speed 2498.93 samples/sec Loss 5.2546 LearningRate 0.000894 Epoch: 5 Global Step: 123710 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:42:28,560-Speed 2498.93 samples/sec Loss 5.3137 LearningRate 0.000894 Epoch: 5 Global Step: 123720 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:42:36,708-Speed 2514.16 samples/sec Loss 5.3445 LearningRate 0.000894 Epoch: 5 Global Step: 123730 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:42:44,914-Speed 2496.15 samples/sec Loss 5.3277 
LearningRate 0.000894 Epoch: 5 Global Step: 123740 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:42:53,114-Speed 2497.96 samples/sec Loss 5.3607 LearningRate 0.000894 Epoch: 5 Global Step: 123750 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:43:01,311-Speed 2498.96 samples/sec Loss 5.2645 LearningRate 0.000894 Epoch: 5 Global Step: 123760 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:43:09,509-Speed 2498.82 samples/sec Loss 5.3133 LearningRate 0.000894 Epoch: 5 Global Step: 123770 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:43:17,710-Speed 2497.75 samples/sec Loss 5.2599 LearningRate 0.000894 Epoch: 5 Global Step: 123780 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:43:25,857-Speed 2513.94 samples/sec Loss 5.2861 LearningRate 0.000894 Epoch: 5 Global Step: 123790 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:43:34,054-Speed 2499.37 samples/sec Loss 5.2352 LearningRate 0.000894 Epoch: 5 Global Step: 123800 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:43:42,251-Speed 2498.79 samples/sec Loss 5.2907 LearningRate 0.000894 Epoch: 5 Global Step: 123810 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:43:50,450-Speed 2498.08 samples/sec Loss 5.3231 LearningRate 0.000894 Epoch: 5 Global Step: 123820 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:43:58,648-Speed 2498.65 samples/sec Loss 5.2519 LearningRate 0.000894 Epoch: 5 Global Step: 123830 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:44:06,858-Speed 2495.12 samples/sec Loss 5.3322 LearningRate 0.000893 Epoch: 5 Global Step: 123840 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:44:15,003-Speed 2514.60 samples/sec Loss 5.2751 LearningRate 0.000893 Epoch: 5 Global Step: 123850 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:44:23,203-Speed 2498.20 samples/sec Loss 5.2315 
LearningRate 0.000893 Epoch: 5 Global Step: 123860 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:44:31,404-Speed 2497.72 samples/sec Loss 5.3128 LearningRate 0.000893 Epoch: 5 Global Step: 123870 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:44:39,597-Speed 2500.12 samples/sec Loss 5.3207 LearningRate 0.000893 Epoch: 5 Global Step: 123880 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:44:47,806-Speed 2495.17 samples/sec Loss 5.2803 LearningRate 0.000893 Epoch: 5 Global Step: 123890 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:44:56,007-Speed 2498.07 samples/sec Loss 5.4065 LearningRate 0.000893 Epoch: 5 Global Step: 123900 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:45:04,150-Speed 2515.16 samples/sec Loss 5.2130 LearningRate 0.000893 Epoch: 5 Global Step: 123910 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:45:12,382-Speed 2488.26 samples/sec Loss 5.3332 LearningRate 0.000893 Epoch: 5 Global Step: 123920 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:45:20,584-Speed 2497.50 samples/sec Loss 5.3073 LearningRate 0.000893 Epoch: 5 Global Step: 123930 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:45:28,785-Speed 2497.47 samples/sec Loss 5.3028 LearningRate 0.000893 Epoch: 5 Global Step: 123940 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:45:36,987-Speed 2497.61 samples/sec Loss 5.2803 LearningRate 0.000893 Epoch: 5 Global Step: 123950 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:45:45,184-Speed 2498.69 samples/sec Loss 5.3846 LearningRate 0.000893 Epoch: 5 Global Step: 123960 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:45:53,338-Speed 2511.88 samples/sec Loss 5.2414 LearningRate 0.000893 Epoch: 5 Global Step: 123970 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:46:01,542-Speed 2496.87 samples/sec Loss 5.2694 
LearningRate 0.000893 Epoch: 5 Global Step: 123980 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:46:09,742-Speed 2498.12 samples/sec Loss 5.2993 LearningRate 0.000893 Epoch: 5 Global Step: 123990 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:46:17,942-Speed 2498.01 samples/sec Loss 5.3605 LearningRate 0.000893 Epoch: 5 Global Step: 124000 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:46:26,141-Speed 2498.10 samples/sec Loss 5.3555 LearningRate 0.000893 Epoch: 5 Global Step: 124010 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:46:34,345-Speed 2496.65 samples/sec Loss 5.2828 LearningRate 0.000893 Epoch: 5 Global Step: 124020 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:46:42,497-Speed 2512.64 samples/sec Loss 5.3484 LearningRate 0.000893 Epoch: 5 Global Step: 124030 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:46:50,702-Speed 2496.30 samples/sec Loss 5.3509 LearningRate 0.000893 Epoch: 5 Global Step: 124040 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:46:58,903-Speed 2497.69 samples/sec Loss 5.2844 LearningRate 0.000893 Epoch: 5 Global Step: 124050 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:47:07,105-Speed 2497.65 samples/sec Loss 5.2155 LearningRate 0.000893 Epoch: 5 Global Step: 124060 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:47:15,304-Speed 2498.20 samples/sec Loss 5.2647 LearningRate 0.000893 Epoch: 5 Global Step: 124070 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:47:23,507-Speed 2497.07 samples/sec Loss 5.3024 LearningRate 0.000893 Epoch: 5 Global Step: 124080 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:47:31,656-Speed 2513.62 samples/sec Loss 5.2364 LearningRate 0.000893 Epoch: 5 Global Step: 124090 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:47:39,860-Speed 2496.77 samples/sec Loss 5.2985 
LearningRate 0.000893 Epoch: 5 Global Step: 124100 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:47:48,065-Speed 2496.89 samples/sec Loss 5.3535 LearningRate 0.000893 Epoch: 5 Global Step: 124110 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:47:56,267-Speed 2497.45 samples/sec Loss 5.2651 LearningRate 0.000893 Epoch: 5 Global Step: 124120 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:48:04,465-Speed 2498.42 samples/sec Loss 5.3106 LearningRate 0.000893 Epoch: 5 Global Step: 124130 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:48:12,664-Speed 2498.29 samples/sec Loss 5.2930 LearningRate 0.000893 Epoch: 5 Global Step: 124140 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:48:20,820-Speed 2511.60 samples/sec Loss 5.3835 LearningRate 0.000893 Epoch: 5 Global Step: 124150 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:48:29,017-Speed 2498.84 samples/sec Loss 5.3627 LearningRate 0.000893 Epoch: 5 Global Step: 124160 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:48:37,214-Speed 2498.88 samples/sec Loss 5.3210 LearningRate 0.000893 Epoch: 5 Global Step: 124170 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:48:45,411-Speed 2498.92 samples/sec Loss 5.2872 LearningRate 0.000893 Epoch: 5 Global Step: 124180 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:48:53,607-Speed 2498.98 samples/sec Loss 5.2891 LearningRate 0.000893 Epoch: 5 Global Step: 124190 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:49:01,804-Speed 2499.04 samples/sec Loss 5.3176 LearningRate 0.000893 Epoch: 5 Global Step: 124200 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:49:09,955-Speed 2513.43 samples/sec Loss 5.2789 LearningRate 0.000893 Epoch: 5 Global Step: 124210 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:49:18,150-Speed 2499.45 samples/sec Loss 5.3027 
LearningRate 0.000893 Epoch: 5 Global Step: 124220 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:49:26,348-Speed 2498.83 samples/sec Loss 5.2829 LearningRate 0.000892 Epoch: 5 Global Step: 124230 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:49:34,545-Speed 2499.33 samples/sec Loss 5.2230 LearningRate 0.000892 Epoch: 5 Global Step: 124240 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:49:42,742-Speed 2498.79 samples/sec Loss 5.2561 LearningRate 0.000892 Epoch: 5 Global Step: 124250 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:49:50,946-Speed 2496.84 samples/sec Loss 5.2714 LearningRate 0.000892 Epoch: 5 Global Step: 124260 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:49:59,103-Speed 2511.43 samples/sec Loss 5.2764 LearningRate 0.000892 Epoch: 5 Global Step: 124270 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:50:07,300-Speed 2498.75 samples/sec Loss 5.3170 LearningRate 0.000892 Epoch: 5 Global Step: 124280 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:50:15,497-Speed 2499.05 samples/sec Loss 5.3357 LearningRate 0.000892 Epoch: 5 Global Step: 124290 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:50:23,694-Speed 2499.07 samples/sec Loss 5.2842 LearningRate 0.000892 Epoch: 5 Global Step: 124300 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:50:31,896-Speed 2497.12 samples/sec Loss 5.2774 LearningRate 0.000892 Epoch: 5 Global Step: 124310 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:50:40,100-Speed 2496.73 samples/sec Loss 5.3153 LearningRate 0.000892 Epoch: 5 Global Step: 124320 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:50:48,244-Speed 2515.40 samples/sec Loss 5.1960 LearningRate 0.000892 Epoch: 5 Global Step: 124330 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:50:56,439-Speed 2499.37 samples/sec Loss 5.2694 
LearningRate 0.000892 Epoch: 5 Global Step: 124340 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:51:04,637-Speed 2498.46 samples/sec Loss 5.1702 LearningRate 0.000892 Epoch: 5 Global Step: 124350 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:51:12,839-Speed 2497.57 samples/sec Loss 5.2296 LearningRate 0.000892 Epoch: 5 Global Step: 124360 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:51:21,041-Speed 2497.36 samples/sec Loss 5.2758 LearningRate 0.000892 Epoch: 5 Global Step: 124370 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:51:29,240-Speed 2498.20 samples/sec Loss 5.2834 LearningRate 0.000892 Epoch: 5 Global Step: 124380 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:51:37,382-Speed 2515.68 samples/sec Loss 5.1553 LearningRate 0.000892 Epoch: 5 Global Step: 124390 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:51:45,589-Speed 2495.83 samples/sec Loss 5.2340 LearningRate 0.000892 Epoch: 5 Global Step: 124400 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:51:53,786-Speed 2499.03 samples/sec Loss 5.2180 LearningRate 0.000892 Epoch: 5 Global Step: 124410 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:52:01,987-Speed 2497.72 samples/sec Loss 5.2139 LearningRate 0.000892 Epoch: 5 Global Step: 124420 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:52:10,182-Speed 2499.71 samples/sec Loss 5.2729 LearningRate 0.000892 Epoch: 5 Global Step: 124430 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:52:20,540-Speed 1977.93 samples/sec Loss 5.2799 LearningRate 0.000892 Epoch: 6 Global Step: 124440 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:52:28,694-Speed 2511.82 samples/sec Loss 5.2938 LearningRate 0.000892 Epoch: 6 Global Step: 124450 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:52:36,890-Speed 2499.70 samples/sec Loss 5.2084 
LearningRate 0.000892 Epoch: 6 Global Step: 124460 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:52:45,088-Speed 2498.65 samples/sec Loss 5.3611 LearningRate 0.000892 Epoch: 6 Global Step: 124470 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:52:53,285-Speed 2498.73 samples/sec Loss 5.3021 LearningRate 0.000892 Epoch: 6 Global Step: 124480 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 18:53:01,444-Speed 2510.61 samples/sec Loss 5.2960 LearningRate 0.000892 Epoch: 6 Global Step: 124490 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:53:09,640-Speed 2499.09 samples/sec Loss 5.2986 LearningRate 0.000892 Epoch: 6 Global Step: 124500 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:53:17,781-Speed 2516.03 samples/sec Loss 5.3160 LearningRate 0.000892 Epoch: 6 Global Step: 124510 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:53:25,992-Speed 2494.80 samples/sec Loss 5.2799 LearningRate 0.000892 Epoch: 6 Global Step: 124520 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:53:34,202-Speed 2494.91 samples/sec Loss 5.2582 LearningRate 0.000892 Epoch: 6 Global Step: 124530 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:53:42,411-Speed 2495.32 samples/sec Loss 5.2865 LearningRate 0.000892 Epoch: 6 Global Step: 124540 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:53:50,607-Speed 2498.96 samples/sec Loss 5.2124 LearningRate 0.000892 Epoch: 6 Global Step: 124550 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:53:58,804-Speed 2499.20 samples/sec Loss 5.2033 LearningRate 0.000892 Epoch: 6 Global Step: 124560 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:54:06,953-Speed 2513.53 samples/sec Loss 5.2915 LearningRate 0.000892 Epoch: 6 Global Step: 124570 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:54:15,150-Speed 2498.77 samples/sec Loss 5.2509 
LearningRate 0.000892 Epoch: 6 Global Step: 124580 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:54:23,354-Speed 2496.99 samples/sec Loss 5.2018 LearningRate 0.000892 Epoch: 6 Global Step: 124590 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:54:31,554-Speed 2497.82 samples/sec Loss 5.2822 LearningRate 0.000892 Epoch: 6 Global Step: 124600 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:54:39,757-Speed 2497.08 samples/sec Loss 5.2941 LearningRate 0.000892 Epoch: 6 Global Step: 124610 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:54:47,958-Speed 2497.54 samples/sec Loss 5.3498 LearningRate 0.000892 Epoch: 6 Global Step: 124620 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:54:56,115-Speed 2511.23 samples/sec Loss 5.2769 LearningRate 0.000891 Epoch: 6 Global Step: 124630 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:55:04,319-Speed 2496.91 samples/sec Loss 5.2076 LearningRate 0.000891 Epoch: 6 Global Step: 124640 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:55:12,520-Speed 2497.75 samples/sec Loss 5.2152 LearningRate 0.000891 Epoch: 6 Global Step: 124650 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:55:20,731-Speed 2494.81 samples/sec Loss 5.1993 LearningRate 0.000891 Epoch: 6 Global Step: 124660 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:55:28,930-Speed 2498.20 samples/sec Loss 5.1702 LearningRate 0.000891 Epoch: 6 Global Step: 124670 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:55:37,127-Speed 2498.77 samples/sec Loss 5.2410 LearningRate 0.000891 Epoch: 6 Global Step: 124680 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:55:45,271-Speed 2515.21 samples/sec Loss 5.1899 LearningRate 0.000891 Epoch: 6 Global Step: 124690 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:55:53,468-Speed 2499.20 samples/sec Loss 5.2529 
LearningRate 0.000891 Epoch: 6 Global Step: 124700 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:56:01,665-Speed 2499.12 samples/sec Loss 5.1867 LearningRate 0.000891 Epoch: 6 Global Step: 124710 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:56:09,861-Speed 2498.99 samples/sec Loss 5.1826 LearningRate 0.000891 Epoch: 6 Global Step: 124720 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:56:18,074-Speed 2494.03 samples/sec Loss 5.1692 LearningRate 0.000891 Epoch: 6 Global Step: 124730 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:56:26,273-Speed 2498.56 samples/sec Loss 5.1848 LearningRate 0.000891 Epoch: 6 Global Step: 124740 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:56:34,418-Speed 2514.62 samples/sec Loss 5.2807 LearningRate 0.000891 Epoch: 6 Global Step: 124750 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:56:42,627-Speed 2495.27 samples/sec Loss 5.3514 LearningRate 0.000891 Epoch: 6 Global Step: 124760 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:56:50,836-Speed 2495.47 samples/sec Loss 5.3345 LearningRate 0.000891 Epoch: 6 Global Step: 124770 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:56:59,032-Speed 2499.21 samples/sec Loss 5.1745 LearningRate 0.000891 Epoch: 6 Global Step: 124780 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:57:07,228-Speed 2499.17 samples/sec Loss 5.2337 LearningRate 0.000891 Epoch: 6 Global Step: 124790 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:57:15,424-Speed 2499.15 samples/sec Loss 5.2030 LearningRate 0.000891 Epoch: 6 Global Step: 124800 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:57:23,569-Speed 2514.80 samples/sec Loss 5.1672 LearningRate 0.000891 Epoch: 6 Global Step: 124810 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:57:31,767-Speed 2498.61 samples/sec Loss 5.2434 
LearningRate 0.000891 Epoch: 6 Global Step: 124820 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:57:39,963-Speed 2499.63 samples/sec Loss 5.1607 LearningRate 0.000891 Epoch: 6 Global Step: 124830 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:57:48,160-Speed 2498.68 samples/sec Loss 5.2379 LearningRate 0.000891 Epoch: 6 Global Step: 124840 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:57:56,395-Speed 2487.26 samples/sec Loss 5.2357 LearningRate 0.000891 Epoch: 6 Global Step: 124850 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:58:04,598-Speed 2497.40 samples/sec Loss 5.3900 LearningRate 0.000891 Epoch: 6 Global Step: 124860 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:58:12,750-Speed 2512.72 samples/sec Loss 5.3120 LearningRate 0.000891 Epoch: 6 Global Step: 124870 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:58:20,949-Speed 2498.14 samples/sec Loss 5.2542 LearningRate 0.000891 Epoch: 6 Global Step: 124880 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:58:29,143-Speed 2499.62 samples/sec Loss 5.2823 LearningRate 0.000891 Epoch: 6 Global Step: 124890 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:58:37,345-Speed 2497.42 samples/sec Loss 5.3135 LearningRate 0.000891 Epoch: 6 Global Step: 124900 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:58:45,540-Speed 2499.62 samples/sec Loss 5.2554 LearningRate 0.000891 Epoch: 6 Global Step: 124910 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:58:53,736-Speed 2498.80 samples/sec Loss 5.2753 LearningRate 0.000891 Epoch: 6 Global Step: 124920 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:59:01,880-Speed 2515.65 samples/sec Loss 5.4821 LearningRate 0.000891 Epoch: 6 Global Step: 124930 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:59:10,079-Speed 2498.38 samples/sec Loss 5.3679 
LearningRate 0.000891 Epoch: 6 Global Step: 124940 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:59:18,280-Speed 2497.77 samples/sec Loss 5.3606 LearningRate 0.000891 Epoch: 6 Global Step: 124950 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:59:26,482-Speed 2497.43 samples/sec Loss 5.2751 LearningRate 0.000891 Epoch: 6 Global Step: 124960 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:59:34,681-Speed 2498.05 samples/sec Loss 5.3487 LearningRate 0.000891 Epoch: 6 Global Step: 124970 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:59:42,884-Speed 2497.03 samples/sec Loss 5.3459 LearningRate 0.000891 Epoch: 6 Global Step: 124980 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:59:51,034-Speed 2513.28 samples/sec Loss 5.1916 LearningRate 0.000891 Epoch: 6 Global Step: 124990 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 18:59:59,239-Speed 2496.84 samples/sec Loss 5.1887 LearningRate 0.000891 Epoch: 6 Global Step: 125000 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:00:07,441-Speed 2497.08 samples/sec Loss 5.2556 LearningRate 0.000891 Epoch: 6 Global Step: 125010 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:00:15,640-Speed 2498.29 samples/sec Loss 5.2265 LearningRate 0.000890 Epoch: 6 Global Step: 125020 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:00:23,847-Speed 2495.89 samples/sec Loss 5.2427 LearningRate 0.000890 Epoch: 6 Global Step: 125030 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:00:32,060-Speed 2494.07 samples/sec Loss 5.2582 LearningRate 0.000890 Epoch: 6 Global Step: 125040 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:00:40,209-Speed 2513.28 samples/sec Loss 5.1953 LearningRate 0.000890 Epoch: 6 Global Step: 125050 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:00:48,413-Speed 2496.92 samples/sec Loss 5.2546 
LearningRate 0.000890 Epoch: 6 Global Step: 125060 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:00:56,612-Speed 2498.47 samples/sec Loss 5.2065 LearningRate 0.000890 Epoch: 6 Global Step: 125070 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:01:04,810-Speed 2498.42 samples/sec Loss 5.2249 LearningRate 0.000890 Epoch: 6 Global Step: 125080 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:01:13,009-Speed 2498.24 samples/sec Loss 5.1301 LearningRate 0.000890 Epoch: 6 Global Step: 125090 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:01:21,208-Speed 2498.32 samples/sec Loss 5.1332 LearningRate 0.000890 Epoch: 6 Global Step: 125100 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:01:29,353-Speed 2514.65 samples/sec Loss 5.2108 LearningRate 0.000890 Epoch: 6 Global Step: 125110 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:01:37,550-Speed 2498.91 samples/sec Loss 5.1695 LearningRate 0.000890 Epoch: 6 Global Step: 125120 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:01:45,750-Speed 2498.29 samples/sec Loss 5.1758 LearningRate 0.000890 Epoch: 6 Global Step: 125130 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:01:53,946-Speed 2498.99 samples/sec Loss 5.2232 LearningRate 0.000890 Epoch: 6 Global Step: 125140 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:02:02,145-Speed 2498.38 samples/sec Loss 5.1690 LearningRate 0.000890 Epoch: 6 Global Step: 125150 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:02:10,343-Speed 2498.53 samples/sec Loss 5.2618 LearningRate 0.000890 Epoch: 6 Global Step: 125160 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:02:18,488-Speed 2514.96 samples/sec Loss 5.3147 LearningRate 0.000890 Epoch: 6 Global Step: 125170 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:02:26,683-Speed 2499.67 samples/sec Loss 5.2367 
LearningRate 0.000890 Epoch: 6 Global Step: 125180 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:02:34,877-Speed 2499.75 samples/sec Loss 5.1783 LearningRate 0.000890 Epoch: 6 Global Step: 125190 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:02:43,076-Speed 2498.28 samples/sec Loss 5.2643 LearningRate 0.000890 Epoch: 6 Global Step: 125200 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:02:51,275-Speed 2498.47 samples/sec Loss 5.2336 LearningRate 0.000890 Epoch: 6 Global Step: 125210 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:02:59,473-Speed 2498.55 samples/sec Loss 5.2285 LearningRate 0.000890 Epoch: 6 Global Step: 125220 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:03:07,622-Speed 2513.55 samples/sec Loss 5.3231 LearningRate 0.000890 Epoch: 6 Global Step: 125230 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:03:15,821-Speed 2498.38 samples/sec Loss 5.1963 LearningRate 0.000890 Epoch: 6 Global Step: 125240 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:03:24,021-Speed 2497.85 samples/sec Loss 5.3433 LearningRate 0.000890 Epoch: 6 Global Step: 125250 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:03:32,218-Speed 2498.77 samples/sec Loss 5.2296 LearningRate 0.000890 Epoch: 6 Global Step: 125260 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:03:40,414-Speed 2499.14 samples/sec Loss 5.1869 LearningRate 0.000890 Epoch: 6 Global Step: 125270 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:03:48,609-Speed 2499.68 samples/sec Loss 5.2305 LearningRate 0.000890 Epoch: 6 Global Step: 125280 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:03:56,754-Speed 2514.78 samples/sec Loss 5.3067 LearningRate 0.000890 Epoch: 6 Global Step: 125290 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:04:04,950-Speed 2499.11 samples/sec Loss 5.3452 
LearningRate 0.000890 Epoch: 6 Global Step: 125300 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:04:13,158-Speed 2495.39 samples/sec Loss 5.2964 LearningRate 0.000890 Epoch: 6 Global Step: 125310 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:04:21,357-Speed 2498.24 samples/sec Loss 5.2795 LearningRate 0.000890 Epoch: 6 Global Step: 125320 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:04:29,557-Speed 2497.97 samples/sec Loss 5.2883 LearningRate 0.000890 Epoch: 6 Global Step: 125330 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:04:37,757-Speed 2498.12 samples/sec Loss 5.2558 LearningRate 0.000890 Epoch: 6 Global Step: 125340 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:04:45,901-Speed 2515.16 samples/sec Loss 5.3109 LearningRate 0.000890 Epoch: 6 Global Step: 125350 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:04:54,099-Speed 2498.66 samples/sec Loss 5.2713 LearningRate 0.000890 Epoch: 6 Global Step: 125360 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:05:02,311-Speed 2494.16 samples/sec Loss 5.2170 LearningRate 0.000890 Epoch: 6 Global Step: 125370 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:05:10,510-Speed 2498.27 samples/sec Loss 5.2100 LearningRate 0.000890 Epoch: 6 Global Step: 125380 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:05:18,707-Speed 2498.82 samples/sec Loss 5.1982 LearningRate 0.000890 Epoch: 6 Global Step: 125390 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:05:26,903-Speed 2499.64 samples/sec Loss 5.2487 LearningRate 0.000890 Epoch: 6 Global Step: 125400 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:05:35,045-Speed 2515.77 samples/sec Loss 5.2289 LearningRate 0.000890 Epoch: 6 Global Step: 125410 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:05:43,241-Speed 2498.88 samples/sec Loss 5.2376 
LearningRate 0.000889 Epoch: 6 Global Step: 125420 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:05:51,440-Speed 2498.55 samples/sec Loss 5.2232 LearningRate 0.000889 Epoch: 6 Global Step: 125430 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:05:59,638-Speed 2498.58 samples/sec Loss 5.2254 LearningRate 0.000889 Epoch: 6 Global Step: 125440 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:06:07,838-Speed 2498.18 samples/sec Loss 5.2322 LearningRate 0.000889 Epoch: 6 Global Step: 125450 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:06:16,040-Speed 2497.49 samples/sec Loss 5.1160 LearningRate 0.000889 Epoch: 6 Global Step: 125460 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:06:24,187-Speed 2514.12 samples/sec Loss 5.2294 LearningRate 0.000889 Epoch: 6 Global Step: 125470 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:06:32,392-Speed 2496.62 samples/sec Loss 5.2301 LearningRate 0.000889 Epoch: 6 Global Step: 125480 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:06:40,594-Speed 2497.27 samples/sec Loss 5.1279 LearningRate 0.000889 Epoch: 6 Global Step: 125490 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:06:48,796-Speed 2497.55 samples/sec Loss 5.2140 LearningRate 0.000889 Epoch: 6 Global Step: 125500 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:06:56,999-Speed 2496.76 samples/sec Loss 5.2173 LearningRate 0.000889 Epoch: 6 Global Step: 125510 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:07:05,209-Speed 2494.90 samples/sec Loss 5.2466 LearningRate 0.000889 Epoch: 6 Global Step: 125520 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:07:13,353-Speed 2515.28 samples/sec Loss 5.2276 LearningRate 0.000889 Epoch: 6 Global Step: 125530 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:07:21,554-Speed 2497.71 samples/sec Loss 5.3068 
LearningRate 0.000889 Epoch: 6 Global Step: 125540 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:07:29,756-Speed 2497.27 samples/sec Loss 5.1860 LearningRate 0.000889 Epoch: 6 Global Step: 125550 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:07:37,958-Speed 2497.45 samples/sec Loss 5.2418 LearningRate 0.000889 Epoch: 6 Global Step: 125560 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:07:46,165-Speed 2495.85 samples/sec Loss 5.2936 LearningRate 0.000889 Epoch: 6 Global Step: 125570 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:07:54,373-Speed 2495.54 samples/sec Loss 5.2968 LearningRate 0.000889 Epoch: 6 Global Step: 125580 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:08:02,513-Speed 2516.36 samples/sec Loss 5.1772 LearningRate 0.000889 Epoch: 6 Global Step: 125590 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:08:10,715-Speed 2497.59 samples/sec Loss 5.2264 LearningRate 0.000889 Epoch: 6 Global Step: 125600 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:08:18,912-Speed 2498.87 samples/sec Loss 5.1534 LearningRate 0.000889 Epoch: 6 Global Step: 125610 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:08:27,112-Speed 2498.01 samples/sec Loss 5.1939 LearningRate 0.000889 Epoch: 6 Global Step: 125620 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:08:35,312-Speed 2497.89 samples/sec Loss 5.2390 LearningRate 0.000889 Epoch: 6 Global Step: 125630 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:08:43,510-Speed 2498.80 samples/sec Loss 5.2126 LearningRate 0.000889 Epoch: 6 Global Step: 125640 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:08:51,655-Speed 2514.84 samples/sec Loss 5.1869 LearningRate 0.000889 Epoch: 6 Global Step: 125650 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:08:59,855-Speed 2497.85 samples/sec Loss 5.1934 
LearningRate 0.000889 Epoch: 6 Global Step: 125660 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:09:08,052-Speed 2498.78 samples/sec Loss 5.2704 LearningRate 0.000889 Epoch: 6 Global Step: 125670 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:09:16,253-Speed 2497.92 samples/sec Loss 5.1872 LearningRate 0.000889 Epoch: 6 Global Step: 125680 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:09:24,456-Speed 2496.93 samples/sec Loss 5.1824 LearningRate 0.000889 Epoch: 6 Global Step: 125690 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:09:32,653-Speed 2498.89 samples/sec Loss 5.2150 LearningRate 0.000889 Epoch: 6 Global Step: 125700 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:09:40,796-Speed 2515.56 samples/sec Loss 5.1269 LearningRate 0.000889 Epoch: 6 Global Step: 125710 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:09:48,996-Speed 2498.15 samples/sec Loss 5.1417 LearningRate 0.000889 Epoch: 6 Global Step: 125720 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:09:57,192-Speed 2499.11 samples/sec Loss 5.1202 LearningRate 0.000889 Epoch: 6 Global Step: 125730 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:10:05,393-Speed 2497.74 samples/sec Loss 5.1722 LearningRate 0.000889 Epoch: 6 Global Step: 125740 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:10:13,589-Speed 2499.33 samples/sec Loss 5.2058 LearningRate 0.000889 Epoch: 6 Global Step: 125750 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:10:21,790-Speed 2497.60 samples/sec Loss 5.1439 LearningRate 0.000889 Epoch: 6 Global Step: 125760 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:10:29,946-Speed 2511.58 samples/sec Loss 5.2422 LearningRate 0.000889 Epoch: 6 Global Step: 125770 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:10:38,145-Speed 2497.93 samples/sec Loss 5.2224 
LearningRate 0.000889 Epoch: 6 Global Step: 125780 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:10:46,341-Speed 2499.47 samples/sec Loss 5.2250 LearningRate 0.000889 Epoch: 6 Global Step: 125790 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:10:54,539-Speed 2498.61 samples/sec Loss 5.2462 LearningRate 0.000889 Epoch: 6 Global Step: 125800 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:11:02,735-Speed 2499.13 samples/sec Loss 5.2504 LearningRate 0.000888 Epoch: 6 Global Step: 125810 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:11:10,938-Speed 2500.48 samples/sec Loss 5.1572 LearningRate 0.000888 Epoch: 6 Global Step: 125820 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:11:19,089-Speed 2513.22 samples/sec Loss 5.2113 LearningRate 0.000888 Epoch: 6 Global Step: 125830 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:11:27,297-Speed 2495.47 samples/sec Loss 5.2297 LearningRate 0.000888 Epoch: 6 Global Step: 125840 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:11:35,493-Speed 2499.12 samples/sec Loss 5.2257 LearningRate 0.000888 Epoch: 6 Global Step: 125850 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:11:43,692-Speed 2498.55 samples/sec Loss 5.1164 LearningRate 0.000888 Epoch: 6 Global Step: 125860 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:11:51,888-Speed 2499.19 samples/sec Loss 5.2496 LearningRate 0.000888 Epoch: 6 Global Step: 125870 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:12:00,095-Speed 2495.86 samples/sec Loss 5.3054 LearningRate 0.000888 Epoch: 6 Global Step: 125880 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:12:08,241-Speed 2514.52 samples/sec Loss 5.2646 LearningRate 0.000888 Epoch: 6 Global Step: 125890 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:12:16,438-Speed 2498.76 samples/sec Loss 5.2799 
LearningRate 0.000888 Epoch: 6 Global Step: 125900 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:12:24,638-Speed 2498.46 samples/sec Loss 5.2014 LearningRate 0.000888 Epoch: 6 Global Step: 125910 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:12:32,840-Speed 2497.36 samples/sec Loss 5.3014 LearningRate 0.000888 Epoch: 6 Global Step: 125920 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:12:41,035-Speed 2499.30 samples/sec Loss 5.2442 LearningRate 0.000888 Epoch: 6 Global Step: 125930 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:12:49,234-Speed 2498.40 samples/sec Loss 5.2215 LearningRate 0.000888 Epoch: 6 Global Step: 125940 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:12:57,379-Speed 2514.85 samples/sec Loss 5.1785 LearningRate 0.000888 Epoch: 6 Global Step: 125950 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:13:05,575-Speed 2499.05 samples/sec Loss 5.1725 LearningRate 0.000888 Epoch: 6 Global Step: 125960 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:13:13,772-Speed 2499.00 samples/sec Loss 5.1977 LearningRate 0.000888 Epoch: 6 Global Step: 125970 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:13:21,972-Speed 2497.92 samples/sec Loss 5.1587 LearningRate 0.000888 Epoch: 6 Global Step: 125980 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:13:30,180-Speed 2495.54 samples/sec Loss 5.1875 LearningRate 0.000888 Epoch: 6 Global Step: 125990 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:13:38,380-Speed 2498.11 samples/sec Loss 5.1621 LearningRate 0.000888 Epoch: 6 Global Step: 126000 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:13:46,528-Speed 2513.77 samples/sec Loss 5.2309 LearningRate 0.000888 Epoch: 6 Global Step: 126010 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:13:54,729-Speed 2498.04 samples/sec Loss 5.1840 
LearningRate 0.000888 Epoch: 6 Global Step: 126020 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:14:02,927-Speed 2498.64 samples/sec Loss 5.2294 LearningRate 0.000888 Epoch: 6 Global Step: 126030 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:14:11,127-Speed 2497.98 samples/sec Loss 5.1625 LearningRate 0.000888 Epoch: 6 Global Step: 126040 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:14:19,339-Speed 2494.36 samples/sec Loss 5.1330 LearningRate 0.000888 Epoch: 6 Global Step: 126050 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:14:27,539-Speed 2497.85 samples/sec Loss 5.1584 LearningRate 0.000888 Epoch: 6 Global Step: 126060 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:14:35,691-Speed 2512.81 samples/sec Loss 5.2490 LearningRate 0.000888 Epoch: 6 Global Step: 126070 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:14:43,900-Speed 2495.25 samples/sec Loss 5.2491 LearningRate 0.000888 Epoch: 6 Global Step: 126080 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:14:52,099-Speed 2498.28 samples/sec Loss 5.2309 LearningRate 0.000888 Epoch: 6 Global Step: 126090 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:15:00,299-Speed 2498.06 samples/sec Loss 5.2305 LearningRate 0.000888 Epoch: 6 Global Step: 126100 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:15:08,498-Speed 2498.29 samples/sec Loss 5.2856 LearningRate 0.000888 Epoch: 6 Global Step: 126110 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:15:16,697-Speed 2498.34 samples/sec Loss 5.1689 LearningRate 0.000888 Epoch: 6 Global Step: 126120 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:15:24,860-Speed 2509.20 samples/sec Loss 5.2094 LearningRate 0.000888 Epoch: 6 Global Step: 126130 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:15:33,056-Speed 2499.00 samples/sec Loss 5.2392 
LearningRate 0.000888 Epoch: 6 Global Step: 126140 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:15:41,256-Speed 2497.99 samples/sec Loss 5.2413 LearningRate 0.000888 Epoch: 6 Global Step: 126150 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:15:49,452-Speed 2499.25 samples/sec Loss 5.1705 LearningRate 0.000888 Epoch: 6 Global Step: 126160 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:15:57,651-Speed 2498.34 samples/sec Loss 5.2115 LearningRate 0.000888 Epoch: 6 Global Step: 126170 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:16:05,862-Speed 2494.82 samples/sec Loss 5.1593 LearningRate 0.000888 Epoch: 6 Global Step: 126180 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:16:14,005-Speed 2516.07 samples/sec Loss 5.1893 LearningRate 0.000888 Epoch: 6 Global Step: 126190 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:16:22,215-Speed 2495.40 samples/sec Loss 5.1922 LearningRate 0.000888 Epoch: 6 Global Step: 126200 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:16:30,415-Speed 2498.01 samples/sec Loss 5.2035 LearningRate 0.000887 Epoch: 6 Global Step: 126210 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:16:38,613-Speed 2498.79 samples/sec Loss 5.2465 LearningRate 0.000887 Epoch: 6 Global Step: 126220 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:16:46,813-Speed 2497.83 samples/sec Loss 5.1678 LearningRate 0.000887 Epoch: 6 Global Step: 126230 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:16:55,010-Speed 2498.75 samples/sec Loss 5.2355 LearningRate 0.000887 Epoch: 6 Global Step: 126240 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:17:03,158-Speed 2514.14 samples/sec Loss 5.2150 LearningRate 0.000887 Epoch: 6 Global Step: 126250 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:17:11,357-Speed 2498.34 samples/sec Loss 5.1998 
LearningRate 0.000887 Epoch: 6 Global Step: 126260 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:17:19,559-Speed 2497.24 samples/sec Loss 5.2095 LearningRate 0.000887 Epoch: 6 Global Step: 126270 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:17:27,757-Speed 2498.70 samples/sec Loss 5.1496 LearningRate 0.000887 Epoch: 6 Global Step: 126280 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:17:35,958-Speed 2497.84 samples/sec Loss 5.1426 LearningRate 0.000887 Epoch: 6 Global Step: 126290 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:17:44,155-Speed 2498.79 samples/sec Loss 5.1461 LearningRate 0.000887 Epoch: 6 Global Step: 126300 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:17:52,298-Speed 2515.50 samples/sec Loss 5.1759 LearningRate 0.000887 Epoch: 6 Global Step: 126310 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:18:00,496-Speed 2498.47 samples/sec Loss 5.2129 LearningRate 0.000887 Epoch: 6 Global Step: 126320 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:18:08,705-Speed 2495.32 samples/sec Loss 5.2259 LearningRate 0.000887 Epoch: 6 Global Step: 126330 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:18:16,901-Speed 2499.20 samples/sec Loss 5.1312 LearningRate 0.000887 Epoch: 6 Global Step: 126340 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:18:25,095-Speed 2499.78 samples/sec Loss 5.1687 LearningRate 0.000887 Epoch: 6 Global Step: 126350 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:18:33,290-Speed 2499.14 samples/sec Loss 5.2245 LearningRate 0.000887 Epoch: 6 Global Step: 126360 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:18:41,433-Speed 2515.69 samples/sec Loss 5.2010 LearningRate 0.000887 Epoch: 6 Global Step: 126370 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:18:49,626-Speed 2500.12 samples/sec Loss 5.1601 
LearningRate 0.000887 Epoch: 6 Global Step: 126380 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:18:57,827-Speed 2497.43 samples/sec Loss 5.2622 LearningRate 0.000887 Epoch: 6 Global Step: 126390 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:19:06,025-Speed 2498.63 samples/sec Loss 5.2853 LearningRate 0.000887 Epoch: 6 Global Step: 126400 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:19:14,225-Speed 2498.04 samples/sec Loss 5.1345 LearningRate 0.000887 Epoch: 6 Global Step: 126410 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:19:22,420-Speed 2499.24 samples/sec Loss 5.1611 LearningRate 0.000887 Epoch: 6 Global Step: 126420 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:19:30,564-Speed 2515.11 samples/sec Loss 5.2346 LearningRate 0.000887 Epoch: 6 Global Step: 126430 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:19:38,760-Speed 2499.17 samples/sec Loss 5.2496 LearningRate 0.000887 Epoch: 6 Global Step: 126440 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:19:46,955-Speed 2500.36 samples/sec Loss 5.1977 LearningRate 0.000887 Epoch: 6 Global Step: 126450 Fp16 Grad Scale: 65536 Required: 161 hours Training: 2022-07-06 19:19:55,108-Speed 2512.47 samples/sec Loss 5.1995 LearningRate 0.000887 Epoch: 6 Global Step: 126460 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:20:03,304-Speed 2499.00 samples/sec Loss 5.2542 LearningRate 0.000887 Epoch: 6 Global Step: 126470 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:20:11,499-Speed 2499.41 samples/sec Loss 5.1100 LearningRate 0.000887 Epoch: 6 Global Step: 126480 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:20:19,655-Speed 2511.52 samples/sec Loss 5.1290 LearningRate 0.000887 Epoch: 6 Global Step: 126490 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:20:27,850-Speed 2499.68 samples/sec Loss 5.1440 
LearningRate 0.000887 Epoch: 6 Global Step: 126500 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:20:36,043-Speed 2500.04 samples/sec Loss 5.2539 LearningRate 0.000887 Epoch: 6 Global Step: 126510 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:20:44,240-Speed 2498.97 samples/sec Loss 5.2806 LearningRate 0.000887 Epoch: 6 Global Step: 126520 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:20:52,435-Speed 2499.58 samples/sec Loss 5.1507 LearningRate 0.000887 Epoch: 6 Global Step: 126530 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:21:00,637-Speed 2498.00 samples/sec Loss 5.2246 LearningRate 0.000887 Epoch: 6 Global Step: 126540 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:21:08,779-Speed 2515.85 samples/sec Loss 5.2070 LearningRate 0.000887 Epoch: 6 Global Step: 126550 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:21:16,977-Speed 2498.69 samples/sec Loss 5.1961 LearningRate 0.000887 Epoch: 6 Global Step: 126560 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:21:25,177-Speed 2497.90 samples/sec Loss 5.2742 LearningRate 0.000887 Epoch: 6 Global Step: 126570 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:21:33,372-Speed 2499.66 samples/sec Loss 5.3074 LearningRate 0.000887 Epoch: 6 Global Step: 126580 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:21:41,567-Speed 2499.55 samples/sec Loss 5.2906 LearningRate 0.000887 Epoch: 6 Global Step: 126590 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:21:49,763-Speed 2498.99 samples/sec Loss 5.2360 LearningRate 0.000887 Epoch: 6 Global Step: 126600 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:21:57,903-Speed 2516.53 samples/sec Loss 5.1727 LearningRate 0.000886 Epoch: 6 Global Step: 126610 Fp16 Grad Scale: 32768 Required: 161 hours Training: 2022-07-06 19:22:06,101-Speed 2498.56 samples/sec Loss 5.1086 
LearningRate 0.000886 Epoch: 6 Global Step: 126620 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:22:14,298-Speed 2498.92 samples/sec Loss 5.1487 LearningRate 0.000886 Epoch: 6 Global Step: 126630 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:22:22,497-Speed 2498.51 samples/sec Loss 5.1798 LearningRate 0.000886 Epoch: 6 Global Step: 126640 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:22:30,693-Speed 2499.16 samples/sec Loss 5.1799 LearningRate 0.000886 Epoch: 6 Global Step: 126650 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:22:38,889-Speed 2499.58 samples/sec Loss 5.1448 LearningRate 0.000886 Epoch: 6 Global Step: 126660 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:22:47,031-Speed 2515.80 samples/sec Loss 5.2014 LearningRate 0.000886 Epoch: 6 Global Step: 126670 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:22:55,226-Speed 2499.57 samples/sec Loss 5.1638 LearningRate 0.000886 Epoch: 6 Global Step: 126680 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:23:03,432-Speed 2496.07 samples/sec Loss 5.2447 LearningRate 0.000886 Epoch: 6 Global Step: 126690 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:23:11,626-Speed 2499.71 samples/sec Loss 5.2350 LearningRate 0.000886 Epoch: 6 Global Step: 126700 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:23:19,819-Speed 2500.05 samples/sec Loss 5.2471 LearningRate 0.000886 Epoch: 6 Global Step: 126710 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:23:28,022-Speed 2497.25 samples/sec Loss 5.1615 LearningRate 0.000886 Epoch: 6 Global Step: 126720 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:23:36,169-Speed 2514.26 samples/sec Loss 5.1750 LearningRate 0.000886 Epoch: 6 Global Step: 126730 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:23:44,369-Speed 2497.94 samples/sec Loss 5.2426 LearningRate 0.000886 Epoch: 6 Global Step: 126740 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:23:52,564-Speed 2499.46 samples/sec Loss 5.2076 LearningRate 0.000886 Epoch: 6 Global Step: 126750 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:24:00,758-Speed 2499.89 samples/sec Loss 5.1781 LearningRate 0.000886 Epoch: 6 Global Step: 126760 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:24:08,956-Speed 2498.71 samples/sec Loss 5.1938 LearningRate 0.000886 Epoch: 6 Global Step: 126770 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:24:17,153-Speed 2498.69 samples/sec Loss 5.2012 LearningRate 0.000886 Epoch: 6 Global Step: 126780 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:24:25,295-Speed 2516.04 samples/sec Loss 5.2264 LearningRate 0.000886 Epoch: 6 Global Step: 126790 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:24:33,492-Speed 2498.58 samples/sec Loss 5.2425 LearningRate 0.000886 Epoch: 6 Global Step: 126800 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:24:41,691-Speed 2498.44 samples/sec Loss 5.1645 LearningRate 0.000886 Epoch: 6 Global Step: 126810 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:24:49,884-Speed 2500.01 samples/sec Loss 5.0879 LearningRate 0.000886 Epoch: 6 Global Step: 126820 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:24:58,081-Speed 2498.85 samples/sec Loss 5.1327 LearningRate 0.000886 Epoch: 6 Global Step: 126830 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:25:06,285-Speed 2496.72 samples/sec Loss 5.1581 LearningRate 0.000886 Epoch: 6 Global Step: 126840 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:25:14,427-Speed 2515.58 samples/sec Loss 5.1459 LearningRate 0.000886 Epoch: 6 Global Step: 126850 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:25:22,624-Speed 2498.95 samples/sec Loss 5.1504 LearningRate 0.000886 Epoch: 6 Global Step: 126860 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:25:30,831-Speed 2496.11 samples/sec Loss 5.1129 LearningRate 0.000886 Epoch: 6 Global Step: 126870 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:25:39,028-Speed 2498.89 samples/sec Loss 5.1755 LearningRate 0.000886 Epoch: 6 Global Step: 126880 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:25:47,222-Speed 2499.92 samples/sec Loss 5.1159 LearningRate 0.000886 Epoch: 6 Global Step: 126890 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:25:55,425-Speed 2496.89 samples/sec Loss 5.1603 LearningRate 0.000886 Epoch: 6 Global Step: 126900 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:26:03,580-Speed 2512.14 samples/sec Loss 5.2232 LearningRate 0.000886 Epoch: 6 Global Step: 126910 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:26:11,774-Speed 2499.60 samples/sec Loss 5.1498 LearningRate 0.000886 Epoch: 6 Global Step: 126920 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:26:19,968-Speed 2499.73 samples/sec Loss 5.2290 LearningRate 0.000886 Epoch: 6 Global Step: 126930 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:26:28,167-Speed 2498.56 samples/sec Loss 5.1429 LearningRate 0.000886 Epoch: 6 Global Step: 126940 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:26:36,360-Speed 2500.26 samples/sec Loss 5.1341 LearningRate 0.000886 Epoch: 6 Global Step: 126950 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:26:44,558-Speed 2498.48 samples/sec Loss 5.1783 LearningRate 0.000886 Epoch: 6 Global Step: 126960 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:26:52,709-Speed 2512.77 samples/sec Loss 5.1795 LearningRate 0.000886 Epoch: 6 Global Step: 126970 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:27:00,905-Speed 2499.28 samples/sec Loss 5.1365 LearningRate 0.000886 Epoch: 6 Global Step: 126980 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:27:09,104-Speed 2498.32 samples/sec Loss 5.1789 LearningRate 0.000886 Epoch: 6 Global Step: 126990 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:27:17,298-Speed 2499.65 samples/sec Loss 5.2369 LearningRate 0.000885 Epoch: 6 Global Step: 127000 Fp16 Grad Scale: 32768 Required: 161 hours
Training: 2022-07-06 19:27:25,494-Speed 2499.43 samples/sec Loss 5.2098 LearningRate 0.000885 Epoch: 6 Global Step: 127010 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:27:33,689-Speed 2499.48 samples/sec Loss 5.1558 LearningRate 0.000885 Epoch: 6 Global Step: 127020 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:27:41,833-Speed 2514.96 samples/sec Loss 5.1177 LearningRate 0.000885 Epoch: 6 Global Step: 127030 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:27:50,030-Speed 2498.98 samples/sec Loss 5.1682 LearningRate 0.000885 Epoch: 6 Global Step: 127040 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:27:58,226-Speed 2499.22 samples/sec Loss 5.2196 LearningRate 0.000885 Epoch: 6 Global Step: 127050 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:28:06,419-Speed 2500.29 samples/sec Loss 5.1773 LearningRate 0.000885 Epoch: 6 Global Step: 127060 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:28:14,614-Speed 2499.33 samples/sec Loss 5.1885 LearningRate 0.000885 Epoch: 6 Global Step: 127070 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:28:22,822-Speed 2495.63 samples/sec Loss 5.2900 LearningRate 0.000885 Epoch: 6 Global Step: 127080 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:28:30,965-Speed 2515.40 samples/sec Loss 5.1602 LearningRate 0.000885 Epoch: 6 Global Step: 127090 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:28:39,163-Speed 2499.06 samples/sec Loss 5.1011 LearningRate 0.000885 Epoch: 6 Global Step: 127100 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:28:47,359-Speed 2498.93 samples/sec Loss 5.1552 LearningRate 0.000885 Epoch: 6 Global Step: 127110 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:28:55,558-Speed 2498.38 samples/sec Loss 5.1297 LearningRate 0.000885 Epoch: 6 Global Step: 127120 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:29:03,761-Speed 2497.08 samples/sec Loss 5.1745 LearningRate 0.000885 Epoch: 6 Global Step: 127130 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:29:11,958-Speed 2498.73 samples/sec Loss 5.1969 LearningRate 0.000885 Epoch: 6 Global Step: 127140 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:29:20,105-Speed 2514.38 samples/sec Loss 5.1543 LearningRate 0.000885 Epoch: 6 Global Step: 127150 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:29:28,310-Speed 2496.36 samples/sec Loss 5.1780 LearningRate 0.000885 Epoch: 6 Global Step: 127160 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:29:36,509-Speed 2498.25 samples/sec Loss 5.1587 LearningRate 0.000885 Epoch: 6 Global Step: 127170 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:29:44,723-Speed 2494.04 samples/sec Loss 5.0845 LearningRate 0.000885 Epoch: 6 Global Step: 127180 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:29:52,918-Speed 2499.34 samples/sec Loss 5.0621 LearningRate 0.000885 Epoch: 6 Global Step: 127190 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:30:01,113-Speed 2499.56 samples/sec Loss 5.2084 LearningRate 0.000885 Epoch: 6 Global Step: 127200 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:30:09,258-Speed 2514.84 samples/sec Loss 5.2235 LearningRate 0.000885 Epoch: 6 Global Step: 127210 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:30:17,455-Speed 2498.91 samples/sec Loss 5.0660 LearningRate 0.000885 Epoch: 6 Global Step: 127220 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:30:25,653-Speed 2498.54 samples/sec Loss 5.1542 LearningRate 0.000885 Epoch: 6 Global Step: 127230 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:30:33,853-Speed 2498.28 samples/sec Loss 5.1872 LearningRate 0.000885 Epoch: 6 Global Step: 127240 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:30:42,058-Speed 2496.47 samples/sec Loss 5.1195 LearningRate 0.000885 Epoch: 6 Global Step: 127250 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:30:50,258-Speed 2497.91 samples/sec Loss 5.1188 LearningRate 0.000885 Epoch: 6 Global Step: 127260 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:30:58,401-Speed 2515.20 samples/sec Loss 5.2163 LearningRate 0.000885 Epoch: 6 Global Step: 127270 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:31:06,597-Speed 2499.10 samples/sec Loss 5.2274 LearningRate 0.000885 Epoch: 6 Global Step: 127280 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:31:14,797-Speed 2498.06 samples/sec Loss 5.1853 LearningRate 0.000885 Epoch: 6 Global Step: 127290 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:31:22,999-Speed 2497.39 samples/sec Loss 5.1507 LearningRate 0.000885 Epoch: 6 Global Step: 127300 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:31:31,197-Speed 2498.49 samples/sec Loss 5.1507 LearningRate 0.000885 Epoch: 6 Global Step: 127310 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:31:39,393-Speed 2499.48 samples/sec Loss 5.1789 LearningRate 0.000885 Epoch: 6 Global Step: 127320 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:31:47,535-Speed 2515.66 samples/sec Loss 5.0579 LearningRate 0.000885 Epoch: 6 Global Step: 127330 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:31:55,733-Speed 2498.48 samples/sec Loss 5.0742 LearningRate 0.000885 Epoch: 6 Global Step: 127340 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:32:03,930-Speed 2499.18 samples/sec Loss 5.1524 LearningRate 0.000885 Epoch: 6 Global Step: 127350 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:32:12,124-Speed 2499.60 samples/sec Loss 5.2199 LearningRate 0.000885 Epoch: 6 Global Step: 127360 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:32:20,321-Speed 2499.12 samples/sec Loss 5.1018 LearningRate 0.000885 Epoch: 6 Global Step: 127370 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:32:28,516-Speed 2499.46 samples/sec Loss 5.1329 LearningRate 0.000885 Epoch: 6 Global Step: 127380 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:32:36,659-Speed 2515.30 samples/sec Loss 5.0648 LearningRate 0.000885 Epoch: 6 Global Step: 127390 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:32:44,858-Speed 2498.66 samples/sec Loss 5.1804 LearningRate 0.000884 Epoch: 6 Global Step: 127400 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:32:53,053-Speed 2499.49 samples/sec Loss 5.1488 LearningRate 0.000884 Epoch: 6 Global Step: 127410 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:33:01,246-Speed 2499.90 samples/sec Loss 5.1288 LearningRate 0.000884 Epoch: 6 Global Step: 127420 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:33:09,443-Speed 2499.16 samples/sec Loss 5.1310 LearningRate 0.000884 Epoch: 6 Global Step: 127430 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:33:17,641-Speed 2498.35 samples/sec Loss 5.1951 LearningRate 0.000884 Epoch: 6 Global Step: 127440 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:33:25,783-Speed 2515.74 samples/sec Loss 5.1808 LearningRate 0.000884 Epoch: 6 Global Step: 127450 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:33:33,977-Speed 2499.69 samples/sec Loss 5.1984 LearningRate 0.000884 Epoch: 6 Global Step: 127460 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:33:42,174-Speed 2499.04 samples/sec Loss 5.1698 LearningRate 0.000884 Epoch: 6 Global Step: 127470 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:33:50,373-Speed 2498.20 samples/sec Loss 5.1421 LearningRate 0.000884 Epoch: 6 Global Step: 127480 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:33:58,569-Speed 2499.07 samples/sec Loss 5.1034 LearningRate 0.000884 Epoch: 6 Global Step: 127490 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:34:06,778-Speed 2495.34 samples/sec Loss 5.1267 LearningRate 0.000884 Epoch: 6 Global Step: 127500 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:34:14,920-Speed 2515.71 samples/sec Loss 5.2032 LearningRate 0.000884 Epoch: 6 Global Step: 127510 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:34:23,135-Speed 2493.73 samples/sec Loss 5.1097 LearningRate 0.000884 Epoch: 6 Global Step: 127520 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:34:31,330-Speed 2499.38 samples/sec Loss 5.1177 LearningRate 0.000884 Epoch: 6 Global Step: 127530 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:34:39,527-Speed 2498.75 samples/sec Loss 5.0785 LearningRate 0.000884 Epoch: 6 Global Step: 127540 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:34:47,744-Speed 2492.94 samples/sec Loss 5.1117 LearningRate 0.000884 Epoch: 6 Global Step: 127550 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:34:58,296-Speed 2501.42 samples/sec Loss 5.2181 LearningRate 0.000884 Epoch: 6 Global Step: 127560 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:35:06,445-Speed 2513.61 samples/sec Loss 5.1434 LearningRate 0.000884 Epoch: 6 Global Step: 127570 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:35:14,649-Speed 2496.73 samples/sec Loss 5.2090 LearningRate 0.000884 Epoch: 6 Global Step: 127580 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:35:23,761-Speed 2491.79 samples/sec Loss 5.2433 LearningRate 0.000884 Epoch: 6 Global Step: 127590 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:35:31,982-Speed 2491.64 samples/sec Loss 5.1217 LearningRate 0.000884 Epoch: 6 Global Step: 127600 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:35:40,206-Speed 2490.64 samples/sec Loss 5.1138 LearningRate 0.000884 Epoch: 6 Global Step: 127610 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:35:48,600-Speed 2493.41 samples/sec Loss 5.2057 LearningRate 0.000884 Epoch: 6 Global Step: 127620 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:35:56,788-Speed 2501.94 samples/sec Loss 5.2911 LearningRate 0.000884 Epoch: 6 Global Step: 127630 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:36:05,018-Speed 2488.80 samples/sec Loss 5.2559 LearningRate 0.000884 Epoch: 6 Global Step: 127640 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:36:13,245-Speed 2489.70 samples/sec Loss 5.2882 LearningRate 0.000884 Epoch: 6 Global Step: 127650 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:36:21,468-Speed 2490.98 samples/sec Loss 5.1934 LearningRate 0.000884 Epoch: 6 Global Step: 127660 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:36:29,688-Speed 2491.95 samples/sec Loss 5.1682 LearningRate 0.000884 Epoch: 6 Global Step: 127670 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:36:37,902-Speed 2493.63 samples/sec Loss 5.2252 LearningRate 0.000884 Epoch: 6 Global Step: 127680 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:36:46,057-Speed 2511.97 samples/sec Loss 5.1476 LearningRate 0.000884 Epoch: 6 Global Step: 127690 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:36:54,262-Speed 2496.55 samples/sec Loss 5.2284 LearningRate 0.000884 Epoch: 6 Global Step: 127700 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:37:02,499-Speed 2486.60 samples/sec Loss 5.2875 LearningRate 0.000884 Epoch: 6 Global Step: 127710 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:37:10,701-Speed 2497.25 samples/sec Loss 5.1868 LearningRate 0.000884 Epoch: 6 Global Step: 127720 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:37:18,900-Speed 2498.38 samples/sec Loss 5.1525 LearningRate 0.000884 Epoch: 6 Global Step: 127730 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:37:27,109-Speed 2495.04 samples/sec Loss 5.1279 LearningRate 0.000884 Epoch: 6 Global Step: 127740 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:37:35,254-Speed 2515.13 samples/sec Loss 5.1123 LearningRate 0.000884 Epoch: 6 Global Step: 127750 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:37:43,456-Speed 2497.32 samples/sec Loss 5.1024 LearningRate 0.000884 Epoch: 6 Global Step: 127760 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:37:51,674-Speed 2492.45 samples/sec Loss 5.1465 LearningRate 0.000884 Epoch: 6 Global Step: 127770 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:37:59,873-Speed 2498.17 samples/sec Loss 5.1351 LearningRate 0.000884 Epoch: 6 Global Step: 127780 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:38:08,081-Speed 2495.75 samples/sec Loss 5.1320 LearningRate 0.000884 Epoch: 6 Global Step: 127790 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:38:16,292-Speed 2494.43 samples/sec Loss 5.1837 LearningRate 0.000883 Epoch: 6 Global Step: 127800 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:38:24,440-Speed 2514.05 samples/sec Loss 5.1586 LearningRate 0.000883 Epoch: 6 Global Step: 127810 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:38:32,603-Speed 2509.05 samples/sec Loss 5.1456 LearningRate 0.000883 Epoch: 6 Global Step: 127820 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:38:40,806-Speed 2497.12 samples/sec Loss 5.1487 LearningRate 0.000883 Epoch: 6 Global Step: 127830 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:38:49,008-Speed 2497.37 samples/sec Loss 5.1346 LearningRate 0.000883 Epoch: 6 Global Step: 127840 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:38:57,215-Speed 2495.94 samples/sec Loss 5.2063 LearningRate 0.000883 Epoch: 6 Global Step: 127850 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:39:05,414-Speed 2498.20 samples/sec Loss 5.1226 LearningRate 0.000883 Epoch: 6 Global Step: 127860 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:39:13,561-Speed 2514.09 samples/sec Loss 5.2865 LearningRate 0.000883 Epoch: 6 Global Step: 127870 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:39:21,771-Speed 2495.04 samples/sec Loss 5.1813 LearningRate 0.000883 Epoch: 6 Global Step: 127880 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:39:29,972-Speed 2497.43 samples/sec Loss 5.2070 LearningRate 0.000883 Epoch: 6 Global Step: 127890 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:39:38,173-Speed 2497.76 samples/sec Loss 5.1275 LearningRate 0.000883 Epoch: 6 Global Step: 127900 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:39:46,377-Speed 2496.93 samples/sec Loss 5.1077 LearningRate 0.000883 Epoch: 6 Global Step: 127910 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:39:54,575-Speed 2498.40 samples/sec Loss 5.1398 LearningRate 0.000883 Epoch: 6 Global Step: 127920 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:40:02,723-Speed 2513.94 samples/sec Loss 5.1747 LearningRate 0.000883 Epoch: 6 Global Step: 127930 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:40:10,921-Speed 2498.71 samples/sec Loss 5.2617 LearningRate 0.000883 Epoch: 6 Global Step: 127940 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:40:19,122-Speed 2497.86 samples/sec Loss 5.2490 LearningRate 0.000883 Epoch: 6 Global Step: 127950 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:40:27,320-Speed 2498.54 samples/sec Loss 5.2028 LearningRate 0.000883 Epoch: 6 Global Step: 127960 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:40:35,523-Speed 2497.14 samples/sec Loss 5.2143 LearningRate 0.000883 Epoch: 6 Global Step: 127970 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:40:43,732-Speed 2499.50 samples/sec Loss 5.1331 LearningRate 0.000883 Epoch: 6 Global Step: 127980 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:40:51,877-Speed 2514.65 samples/sec Loss 5.0998 LearningRate 0.000883 Epoch: 6 Global Step: 127990 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:41:00,104-Speed 2499.63 samples/sec Loss 5.1962 LearningRate 0.000883 Epoch: 6 Global Step: 128000 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:41:13,863-Speed 2500.52 samples/sec Loss 5.2435 LearningRate 0.000883 Epoch: 6 Global Step: 128010 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:41:22,115-Speed 2502.21 samples/sec Loss 5.1957 LearningRate 0.000883 Epoch: 6 Global Step: 128020 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:41:30,311-Speed 2499.16 samples/sec Loss 5.1086 LearningRate 0.000883 Epoch: 6 Global Step: 128030 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:41:38,543-Speed 2499.00 samples/sec Loss 5.1614 LearningRate 0.000883 Epoch: 6 Global Step: 128040 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:41:46,722-Speed 2514.85 samples/sec Loss 5.1029 LearningRate 0.000883 Epoch: 6 Global Step: 128050 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:41:54,924-Speed 2497.40 samples/sec Loss 5.1749 LearningRate 0.000883 Epoch: 6 Global Step: 128060 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:42:03,235-Speed 2501.19 samples/sec Loss 5.1194 LearningRate 0.000883 Epoch: 6 Global Step: 128070 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:42:11,460-Speed 2501.10 samples/sec Loss 5.1750 LearningRate 0.000883 Epoch: 6 Global Step: 128080 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:42:19,686-Speed 2500.15 samples/sec Loss 5.1707 LearningRate 0.000883 Epoch: 6 Global Step: 128090 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:42:27,886-Speed 2497.85 samples/sec Loss 5.1842 LearningRate 0.000883 Epoch: 6 Global Step: 128100 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:42:36,031-Speed 2514.70 samples/sec Loss 5.1563 LearningRate 0.000883 Epoch: 6 Global Step: 128110 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:42:48,543-Speed 1645.90 samples/sec Loss 5.1414 LearningRate 0.000883 Epoch: 6 Global Step: 128120 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:42:56,768-Speed 2501.15 samples/sec Loss 5.1579 LearningRate 0.000883 Epoch: 6 Global Step: 128130 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:43:04,962-Speed 2499.62 samples/sec Loss 5.0955 LearningRate 0.000883 Epoch: 6 Global Step: 128140 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:43:16,986-Speed 2497.51 samples/sec Loss 5.1273 LearningRate 0.000883 Epoch: 6 Global Step: 128150 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:43:25,293-Speed 2501.30 samples/sec Loss 5.1081 LearningRate 0.000883 Epoch: 6 Global Step: 128160 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:43:36,123-Speed 1891.24 samples/sec Loss 5.1560 LearningRate 0.000883 Epoch: 6 Global Step: 128170 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:43:44,323-Speed 2499.87 samples/sec Loss 5.0742 LearningRate 0.000883 Epoch: 6 Global Step: 128180 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:43:52,564-Speed 2500.74 samples/sec Loss 5.1455 LearningRate 0.000883 Epoch: 6 Global Step: 128190 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:44:05,209-Speed 1619.78 samples/sec Loss 5.1743 LearningRate 0.000882 Epoch: 6 Global Step: 128200 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:44:13,440-Speed 2500.45 samples/sec Loss 5.1627 LearningRate 0.000882 Epoch: 6 Global Step: 128210 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:44:25,228-Speed 1741.14 samples/sec Loss 5.1656 LearningRate 0.000882 Epoch: 6 Global Step: 128220 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:44:33,378-Speed 2513.38 samples/sec Loss 5.1597 LearningRate 0.000882 Epoch: 6 Global Step: 128230 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:44:41,602-Speed 2490.69 samples/sec Loss 5.1892 LearningRate 0.000882 Epoch: 6 Global Step: 128240 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:44:54,510-Speed 2487.78 samples/sec Loss 5.2189 LearningRate 0.000882 Epoch: 6 Global Step: 128250 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:45:02,726-Speed 2496.36 samples/sec Loss 5.1036 LearningRate 0.000882 Epoch: 6 Global Step: 128260 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:45:12,168-Speed 2169.43 samples/sec Loss 5.1908 LearningRate 0.000882 Epoch: 6 Global Step: 128270 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:45:20,391-Speed 2490.93 samples/sec Loss 5.2454 LearningRate 0.000882 Epoch: 6 Global Step: 128280 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:45:28,578-Speed 2510.20 samples/sec Loss 5.1927 LearningRate 0.000882 Epoch: 6 Global Step: 128290 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:45:40,458-Speed 1724.11 samples/sec Loss 5.1440 LearningRate 0.000882 Epoch: 6 Global Step: 128300 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:45:48,677-Speed 2498.39 samples/sec Loss 5.1554 LearningRate 0.000882 Epoch: 6 Global Step: 128310 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:45:56,880-Speed 2497.56 samples/sec Loss 5.1419 LearningRate 0.000882 Epoch: 6 Global Step: 128320 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:46:05,093-Speed 2494.05 samples/sec Loss 5.1463 LearningRate 0.000882 Epoch: 6 Global Step: 128330 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:46:13,292-Speed 2498.10 samples/sec Loss 5.1726 LearningRate 0.000882 Epoch: 6 Global Step: 128340 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:46:21,443-Speed 2513.05 samples/sec Loss 5.2052 LearningRate 0.000882 Epoch: 6 Global Step: 128350 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:46:29,725-Speed 2473.40 samples/sec Loss 5.1615 LearningRate 0.000882 Epoch: 6 Global Step: 128360 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:46:37,919-Speed 2499.60 samples/sec Loss 5.1434 LearningRate 0.000882 Epoch: 6 Global Step: 128370 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:46:46,115-Speed 2499.33 samples/sec Loss 5.1566 LearningRate 0.000882 Epoch: 6 Global Step: 128380 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:46:54,320-Speed 2496.71 samples/sec Loss 5.1730 LearningRate 0.000882 Epoch: 6 Global Step: 128390 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:47:02,520-Speed 2497.58 samples/sec Loss 5.1255 LearningRate 0.000882 Epoch: 6 Global Step: 128400 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:47:10,670-Speed 2513.48 samples/sec Loss 5.1462 LearningRate 0.000882 Epoch: 6 Global Step: 128410 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:47:18,872-Speed 2497.61 samples/sec Loss 5.0446 LearningRate 0.000882 Epoch: 6 Global Step: 128420 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:47:27,085-Speed 2493.97 samples/sec Loss 5.1195 LearningRate 0.000882 Epoch: 6 Global Step: 128430 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:47:35,300-Speed 2493.18 samples/sec Loss 5.0897 LearningRate 0.000882 Epoch: 6 Global Step: 128440 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:47:43,506-Speed 2496.22 samples/sec Loss 5.0992 LearningRate 0.000882 Epoch: 6 Global Step: 128450 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:47:51,712-Speed 2496.22 samples/sec Loss 5.1772 LearningRate 0.000882 Epoch: 6 Global Step: 128460 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:47:59,865-Speed 2512.22 samples/sec Loss 5.2216 LearningRate 0.000882 Epoch: 6 Global Step: 128470 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:48:08,071-Speed 2496.22 samples/sec Loss 5.1165 LearningRate 0.000882 Epoch: 6 Global Step: 128480 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:48:16,278-Speed 2495.68 samples/sec Loss 5.2225 LearningRate 0.000882 Epoch: 6 Global Step: 128490 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:48:24,483-Speed 2496.55 samples/sec Loss 5.1747 LearningRate 0.000882 Epoch: 6 Global Step: 128500 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:48:32,688-Speed 2496.36 samples/sec Loss 5.1173 LearningRate 0.000882 Epoch: 6 Global Step: 128510 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:48:40,894-Speed 2496.09 samples/sec Loss 5.1399 LearningRate 0.000882 Epoch: 6 Global Step: 128520 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:48:49,046-Speed 2512.76 samples/sec Loss 5.1325 LearningRate 0.000882 Epoch: 6 Global Step: 128530 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:48:57,258-Speed 2494.46 samples/sec Loss 5.1439 LearningRate 0.000882 Epoch: 6 Global Step: 128540 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:49:05,461-Speed 2497.04 samples/sec Loss 5.2026 LearningRate 0.000882 Epoch: 6 Global Step: 128550 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:49:13,680-Speed 2492.14 samples/sec Loss 5.0912 LearningRate 0.000882 Epoch: 6 Global Step: 128560 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:49:21,884-Speed 2496.84 samples/sec Loss 5.0993 LearningRate 0.000882 Epoch: 6 Global Step: 128570 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:49:30,087-Speed 2497.12 samples/sec Loss 5.1604 LearningRate 0.000882 Epoch: 6 Global Step: 128580 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:49:38,241-Speed 2512.07 samples/sec Loss 5.1527 LearningRate 0.000881 Epoch: 6 Global Step: 128590 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:49:46,446-Speed 2496.52 samples/sec Loss 5.1058 LearningRate 0.000881 Epoch: 6 Global Step: 128600 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:49:54,650-Speed 2496.89 samples/sec Loss 5.1486 LearningRate 0.000881 Epoch: 6 Global Step: 128610 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:50:02,869-Speed 2492.81 samples/sec Loss 5.1703 LearningRate 0.000881 Epoch: 6 Global Step: 128620 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:50:11,072-Speed 2496.97 samples/sec Loss 5.0895 LearningRate 0.000881 Epoch: 6 Global Step: 128630 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:50:19,279-Speed 2496.07 samples/sec Loss 5.0894 LearningRate 0.000881 Epoch: 6 Global Step: 128640 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:50:27,434-Speed 2511.60 samples/sec Loss 5.0474 LearningRate 0.000881 Epoch: 6 Global Step: 128650 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:50:35,641-Speed 2496.18 samples/sec Loss 5.1147 LearningRate 0.000881 Epoch: 6 Global Step: 128660 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:50:43,845-Speed 2496.47 samples/sec Loss 5.0630 LearningRate 0.000881 Epoch: 6 Global Step: 128670 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:50:52,048-Speed 2497.00 samples/sec Loss 5.0582 LearningRate 0.000881 Epoch: 6 Global Step: 128680 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:51:00,252-Speed 2496.84 samples/sec Loss 5.0292 LearningRate 0.000881 Epoch: 6 Global Step: 128690 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:51:08,471-Speed 2492.16 samples/sec Loss 5.0063 LearningRate 0.000881 Epoch: 6 Global Step: 128700 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:51:16,616-Speed 2514.67 samples/sec Loss 5.0382 LearningRate 0.000881 Epoch: 6 Global Step: 128710 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:51:24,822-Speed 2496.11 samples/sec Loss 5.0101 LearningRate 0.000881 Epoch: 6 Global Step: 128720 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:51:33,028-Speed 2496.25 samples/sec Loss 5.0766 LearningRate 0.000881 Epoch: 6 Global Step: 128730 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:51:41,231-Speed 2497.07 samples/sec Loss 5.0942 LearningRate 0.000881 Epoch: 6 Global Step: 128740 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:51:49,436-Speed 2496.63 samples/sec Loss 5.0233 LearningRate 0.000881 Epoch: 6 Global Step: 128750 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:51:57,644-Speed 2495.71 samples/sec Loss 5.0559 LearningRate 0.000881 Epoch: 6 Global Step: 128760 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:52:05,792-Speed 2513.82 samples/sec Loss 5.0927 LearningRate 0.000881 Epoch: 6 Global Step: 128770 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:52:13,999-Speed 2495.86 samples/sec Loss 5.2122 LearningRate 0.000881 Epoch: 6 Global Step: 128780 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:52:22,202-Speed 2497.06 samples/sec Loss 5.1213 LearningRate 0.000881 Epoch: 6 Global Step: 128790 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:52:30,416-Speed 2493.54 samples/sec Loss 5.0627 LearningRate 0.000881 Epoch: 6 Global Step: 128800 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:52:38,620-Speed 2496.90 samples/sec Loss 5.1299 LearningRate 0.000881 Epoch: 6 Global Step: 128810 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:52:46,823-Speed 2497.21 samples/sec Loss 5.1283 LearningRate 0.000881 Epoch: 6 Global Step: 128820 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:52:55,012-Speed 2501.45 samples/sec Loss 5.0335 LearningRate 0.000881 Epoch: 6 Global Step: 128830 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:53:03,213-Speed 2497.67 samples/sec Loss 5.0940 LearningRate 0.000881 Epoch: 6 Global Step: 128840 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:53:11,416-Speed 2497.05 samples/sec Loss 5.0944 LearningRate 0.000881 Epoch: 6 Global Step: 128850 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:53:19,618-Speed 2497.28 samples/sec Loss 5.1685 LearningRate 0.000881 Epoch: 6 Global Step: 128860 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:53:27,825-Speed 2496.14 samples/sec Loss 5.1091 LearningRate 0.000881 Epoch: 6 Global Step: 128870 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:53:36,028-Speed 2496.96 samples/sec Loss 5.0571 LearningRate 0.000881 Epoch: 6 Global Step: 128880 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:53:44,175-Speed 2514.13 samples/sec Loss 5.0671 LearningRate 0.000881 Epoch: 6 Global Step: 128890 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:53:52,377-Speed 2497.47 samples/sec Loss 5.0382 LearningRate 0.000881 Epoch: 6 Global Step: 128900 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:54:00,581-Speed 2496.73 samples/sec Loss 5.1405 LearningRate 0.000881 Epoch: 6 Global Step: 128910 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:54:08,782-Speed 2497.56 samples/sec Loss 5.0702 LearningRate 0.000881 Epoch: 6 Global Step: 128920 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:54:16,984-Speed 2497.55 samples/sec Loss 4.9731 LearningRate 0.000881 Epoch: 6 Global Step: 128930 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:54:25,185-Speed 2497.63 samples/sec Loss 5.1626 LearningRate 0.000881 Epoch: 6 Global Step: 128940 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:54:33,341-Speed 2511.67 samples/sec Loss 5.0444 LearningRate 0.000881 Epoch: 6 Global Step: 128950 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:54:41,542-Speed 2497.51 samples/sec Loss 5.1034 LearningRate 0.000881 Epoch: 6 Global Step: 128960 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:54:49,742-Speed 2497.90 samples/sec Loss 5.1747 LearningRate 0.000881 Epoch: 6 Global Step: 128970 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:54:57,945-Speed 2497.27 samples/sec Loss 5.0570 LearningRate 0.000881 Epoch: 6 Global Step: 128980 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:55:06,153-Speed 2495.48 samples/sec Loss 5.0917 LearningRate 0.000880 Epoch: 6 Global Step: 128990 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:55:14,355-Speed 2497.25 samples/sec Loss 5.1103 LearningRate 0.000880 Epoch: 6 Global Step: 129000 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:55:22,506-Speed 2512.91 samples/sec Loss 5.0741 LearningRate 0.000880 Epoch: 6 Global Step: 129010 Fp16 Grad Scale: 32768 Required: 160 hours
Training: 2022-07-06 19:55:30,711-Speed 2496.59 samples/sec Loss 5.1598 LearningRate 0.000880 Epoch: 6 Global Step: 129020 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:55:38,912-Speed 2497.60 samples/sec Loss 5.0806 LearningRate 0.000880 Epoch: 6 Global Step: 129030 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:55:47,118-Speed 2496.35 samples/sec Loss 5.0059 LearningRate 0.000880 Epoch: 6 Global Step: 129040 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:55:55,321-Speed 2497.11 samples/sec Loss 5.0820 LearningRate 0.000880 Epoch: 6 Global Step: 129050 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:56:03,522-Speed 2497.42 samples/sec Loss 5.0788 LearningRate 0.000880 Epoch: 6 Global Step: 129060 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:56:11,689-Speed 2508.24 samples/sec Loss 5.1732 LearningRate 0.000880 Epoch: 6 Global Step: 129070 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:56:19,891-Speed 2497.16 samples/sec Loss 5.1210 LearningRate 0.000880 Epoch: 6 Global Step: 129080 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:56:28,094-Speed 2497.13 samples/sec Loss 5.0900 LearningRate 0.000880 Epoch: 6 Global Step: 129090 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:56:36,298-Speed 2496.80 samples/sec Loss 5.0834 LearningRate 0.000880 Epoch: 6 Global Step: 129100 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:56:44,504-Speed 2496.15 samples/sec Loss 5.0983 LearningRate 0.000880 Epoch: 6 Global Step: 129110 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:56:52,707-Speed 2497.06 samples/sec Loss 5.0359 LearningRate 0.000880 Epoch: 6 Global Step: 129120 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:57:00,858-Speed 2513.33 samples/sec Loss 5.1042 LearningRate 0.000880 Epoch: 6 Global Step: 129130 Fp16 Grad Scale: 65536 Required: 160 hours
Training: 2022-07-06 19:57:09,058-Speed 2497.93 samples/sec Loss 5.1529
LearningRate 0.000880 Epoch: 6 Global Step: 129140 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:57:17,262-Speed 2496.70 samples/sec Loss 5.0977 LearningRate 0.000880 Epoch: 6 Global Step: 129150 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:57:25,466-Speed 2496.67 samples/sec Loss 5.0835 LearningRate 0.000880 Epoch: 6 Global Step: 129160 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:57:33,676-Speed 2495.02 samples/sec Loss 5.1332 LearningRate 0.000880 Epoch: 6 Global Step: 129170 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:57:41,881-Speed 2497.05 samples/sec Loss 5.2594 LearningRate 0.000880 Epoch: 6 Global Step: 129180 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:57:50,034-Speed 2512.51 samples/sec Loss 5.0687 LearningRate 0.000880 Epoch: 6 Global Step: 129190 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:57:58,235-Speed 2497.66 samples/sec Loss 5.1050 LearningRate 0.000880 Epoch: 6 Global Step: 129200 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:58:06,436-Speed 2497.47 samples/sec Loss 5.0981 LearningRate 0.000880 Epoch: 6 Global Step: 129210 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:58:14,637-Speed 2497.63 samples/sec Loss 5.0963 LearningRate 0.000880 Epoch: 6 Global Step: 129220 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:58:22,839-Speed 2497.61 samples/sec Loss 5.1365 LearningRate 0.000880 Epoch: 6 Global Step: 129230 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:58:31,042-Speed 2496.92 samples/sec Loss 5.1416 LearningRate 0.000880 Epoch: 6 Global Step: 129240 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:58:39,190-Speed 2513.89 samples/sec Loss 5.1026 LearningRate 0.000880 Epoch: 6 Global Step: 129250 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:58:47,395-Speed 2496.62 samples/sec Loss 5.1569 
LearningRate 0.000880 Epoch: 6 Global Step: 129260 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:58:55,595-Speed 2498.00 samples/sec Loss 5.1028 LearningRate 0.000880 Epoch: 6 Global Step: 129270 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:59:03,798-Speed 2497.29 samples/sec Loss 5.1665 LearningRate 0.000880 Epoch: 6 Global Step: 129280 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:59:11,999-Speed 2497.42 samples/sec Loss 5.2064 LearningRate 0.000880 Epoch: 6 Global Step: 129290 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:59:20,199-Speed 2498.60 samples/sec Loss 5.1275 LearningRate 0.000880 Epoch: 6 Global Step: 129300 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:59:28,352-Speed 2512.26 samples/sec Loss 5.1349 LearningRate 0.000880 Epoch: 6 Global Step: 129310 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:59:36,552-Speed 2498.27 samples/sec Loss 5.1446 LearningRate 0.000880 Epoch: 6 Global Step: 129320 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:59:44,751-Speed 2498.16 samples/sec Loss 5.1211 LearningRate 0.000880 Epoch: 6 Global Step: 129330 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 19:59:52,975-Speed 2491.03 samples/sec Loss 5.1014 LearningRate 0.000880 Epoch: 6 Global Step: 129340 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:00:01,176-Speed 2497.71 samples/sec Loss 5.0681 LearningRate 0.000880 Epoch: 6 Global Step: 129350 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:00:09,380-Speed 2496.77 samples/sec Loss 5.0303 LearningRate 0.000880 Epoch: 6 Global Step: 129360 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:00:17,538-Speed 2510.56 samples/sec Loss 5.0366 LearningRate 0.000880 Epoch: 6 Global Step: 129370 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:00:25,738-Speed 2498.03 samples/sec Loss 5.1163 
LearningRate 0.000880 Epoch: 6 Global Step: 129380 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:00:33,940-Speed 2497.47 samples/sec Loss 5.1142 LearningRate 0.000879 Epoch: 6 Global Step: 129390 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:00:42,145-Speed 2496.49 samples/sec Loss 5.1235 LearningRate 0.000879 Epoch: 6 Global Step: 129400 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:00:50,347-Speed 2497.73 samples/sec Loss 5.0911 LearningRate 0.000879 Epoch: 6 Global Step: 129410 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:00:58,552-Speed 2496.31 samples/sec Loss 5.0255 LearningRate 0.000879 Epoch: 6 Global Step: 129420 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:01:06,700-Speed 2514.09 samples/sec Loss 5.0076 LearningRate 0.000879 Epoch: 6 Global Step: 129430 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:01:14,914-Speed 2493.82 samples/sec Loss 5.0630 LearningRate 0.000879 Epoch: 6 Global Step: 129440 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:01:23,127-Speed 2493.98 samples/sec Loss 5.0810 LearningRate 0.000879 Epoch: 6 Global Step: 129450 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:01:31,326-Speed 2498.54 samples/sec Loss 5.1031 LearningRate 0.000879 Epoch: 6 Global Step: 129460 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:01:39,541-Speed 2493.29 samples/sec Loss 5.0597 LearningRate 0.000879 Epoch: 6 Global Step: 129470 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:01:47,749-Speed 2495.43 samples/sec Loss 5.1667 LearningRate 0.000879 Epoch: 6 Global Step: 129480 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:01:55,898-Speed 2513.77 samples/sec Loss 5.1484 LearningRate 0.000879 Epoch: 6 Global Step: 129490 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:02:04,099-Speed 2497.77 samples/sec Loss 5.0832 
LearningRate 0.000879 Epoch: 6 Global Step: 129500 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:02:12,302-Speed 2497.05 samples/sec Loss 5.0174 LearningRate 0.000879 Epoch: 6 Global Step: 129510 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:02:20,517-Speed 2493.53 samples/sec Loss 5.0444 LearningRate 0.000879 Epoch: 6 Global Step: 129520 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:02:28,715-Speed 2498.53 samples/sec Loss 5.1357 LearningRate 0.000879 Epoch: 6 Global Step: 129530 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:02:36,914-Speed 2498.22 samples/sec Loss 5.0798 LearningRate 0.000879 Epoch: 6 Global Step: 129540 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:02:45,062-Speed 2513.99 samples/sec Loss 5.1421 LearningRate 0.000879 Epoch: 6 Global Step: 129550 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:02:53,259-Speed 2498.68 samples/sec Loss 5.1289 LearningRate 0.000879 Epoch: 6 Global Step: 129560 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:03:01,460-Speed 2497.91 samples/sec Loss 5.0623 LearningRate 0.000879 Epoch: 6 Global Step: 129570 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:03:09,664-Speed 2496.61 samples/sec Loss 5.0787 LearningRate 0.000879 Epoch: 6 Global Step: 129580 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:03:17,867-Speed 2496.95 samples/sec Loss 5.1054 LearningRate 0.000879 Epoch: 6 Global Step: 129590 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:03:26,081-Speed 2493.72 samples/sec Loss 5.0958 LearningRate 0.000879 Epoch: 6 Global Step: 129600 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:03:34,229-Speed 2514.06 samples/sec Loss 5.1261 LearningRate 0.000879 Epoch: 6 Global Step: 129610 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:03:42,433-Speed 2496.64 samples/sec Loss 5.1039 
LearningRate 0.000879 Epoch: 6 Global Step: 129620 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:03:50,632-Speed 2498.42 samples/sec Loss 5.0785 LearningRate 0.000879 Epoch: 6 Global Step: 129630 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:03:58,833-Speed 2497.71 samples/sec Loss 5.1217 LearningRate 0.000879 Epoch: 6 Global Step: 129640 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:04:07,033-Speed 2498.07 samples/sec Loss 5.1093 LearningRate 0.000879 Epoch: 6 Global Step: 129650 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:04:15,191-Speed 2510.94 samples/sec Loss 5.1318 LearningRate 0.000879 Epoch: 6 Global Step: 129660 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:04:23,340-Speed 2513.43 samples/sec Loss 5.1271 LearningRate 0.000879 Epoch: 6 Global Step: 129670 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:04:31,541-Speed 2497.67 samples/sec Loss 5.1274 LearningRate 0.000879 Epoch: 6 Global Step: 129680 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:04:39,741-Speed 2497.78 samples/sec Loss 5.1271 LearningRate 0.000879 Epoch: 6 Global Step: 129690 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:04:47,942-Speed 2497.88 samples/sec Loss 5.1536 LearningRate 0.000879 Epoch: 6 Global Step: 129700 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:04:56,141-Speed 2498.04 samples/sec Loss 5.1225 LearningRate 0.000879 Epoch: 6 Global Step: 129710 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:05:04,343-Speed 2497.46 samples/sec Loss 5.1249 LearningRate 0.000879 Epoch: 6 Global Step: 129720 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:05:12,491-Speed 2513.85 samples/sec Loss 5.1352 LearningRate 0.000879 Epoch: 6 Global Step: 129730 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:05:20,696-Speed 2496.39 samples/sec Loss 5.1067 
LearningRate 0.000879 Epoch: 6 Global Step: 129740 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:05:28,908-Speed 2494.26 samples/sec Loss 5.1157 LearningRate 0.000879 Epoch: 6 Global Step: 129750 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:05:37,108-Speed 2497.94 samples/sec Loss 5.0597 LearningRate 0.000879 Epoch: 6 Global Step: 129760 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:05:45,319-Speed 2494.82 samples/sec Loss 5.0547 LearningRate 0.000879 Epoch: 6 Global Step: 129770 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:05:53,529-Speed 2494.73 samples/sec Loss 5.0433 LearningRate 0.000879 Epoch: 6 Global Step: 129780 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:06:01,679-Speed 2513.28 samples/sec Loss 5.0598 LearningRate 0.000878 Epoch: 6 Global Step: 129790 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:06:09,885-Speed 2496.33 samples/sec Loss 5.0793 LearningRate 0.000878 Epoch: 6 Global Step: 129800 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:06:18,086-Speed 2497.56 samples/sec Loss 5.1270 LearningRate 0.000878 Epoch: 6 Global Step: 129810 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:06:26,294-Speed 2495.75 samples/sec Loss 5.0784 LearningRate 0.000878 Epoch: 6 Global Step: 129820 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:06:34,493-Speed 2498.41 samples/sec Loss 5.1171 LearningRate 0.000878 Epoch: 6 Global Step: 129830 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:06:42,693-Speed 2497.85 samples/sec Loss 5.0780 LearningRate 0.000878 Epoch: 6 Global Step: 129840 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:06:50,839-Speed 2514.50 samples/sec Loss 5.0623 LearningRate 0.000878 Epoch: 6 Global Step: 129850 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:06:59,041-Speed 2497.63 samples/sec Loss 5.0775 
LearningRate 0.000878 Epoch: 6 Global Step: 129860 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:07:07,239-Speed 2498.20 samples/sec Loss 5.1943 LearningRate 0.000878 Epoch: 6 Global Step: 129870 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:07:15,438-Speed 2498.55 samples/sec Loss 5.2351 LearningRate 0.000878 Epoch: 6 Global Step: 129880 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:07:23,638-Speed 2497.97 samples/sec Loss 5.1072 LearningRate 0.000878 Epoch: 6 Global Step: 129890 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:07:31,843-Speed 2496.49 samples/sec Loss 5.1493 LearningRate 0.000878 Epoch: 6 Global Step: 129900 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:07:39,989-Speed 2514.41 samples/sec Loss 5.0443 LearningRate 0.000878 Epoch: 6 Global Step: 129910 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:07:48,192-Speed 2497.01 samples/sec Loss 5.1926 LearningRate 0.000878 Epoch: 6 Global Step: 129920 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:07:56,391-Speed 2498.24 samples/sec Loss 5.1653 LearningRate 0.000878 Epoch: 6 Global Step: 129930 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:08:04,591-Speed 2497.91 samples/sec Loss 5.0655 LearningRate 0.000878 Epoch: 6 Global Step: 129940 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:08:12,790-Speed 2498.42 samples/sec Loss 5.0866 LearningRate 0.000878 Epoch: 6 Global Step: 129950 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:08:20,990-Speed 2498.11 samples/sec Loss 4.9731 LearningRate 0.000878 Epoch: 6 Global Step: 129960 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:08:29,142-Speed 2512.29 samples/sec Loss 5.1235 LearningRate 0.000878 Epoch: 6 Global Step: 129970 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:08:37,363-Speed 2491.65 samples/sec Loss 5.1669 
LearningRate 0.000878 Epoch: 6 Global Step: 129980 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:08:45,561-Speed 2498.42 samples/sec Loss 5.1513 LearningRate 0.000878 Epoch: 6 Global Step: 129990 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:08:53,762-Speed 2497.75 samples/sec Loss 5.0844 LearningRate 0.000878 Epoch: 6 Global Step: 130000 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:09:01,965-Speed 2497.07 samples/sec Loss 5.1003 LearningRate 0.000878 Epoch: 6 Global Step: 130010 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:09:10,164-Speed 2498.16 samples/sec Loss 5.0515 LearningRate 0.000878 Epoch: 6 Global Step: 130020 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:09:18,312-Speed 2514.08 samples/sec Loss 5.1070 LearningRate 0.000878 Epoch: 6 Global Step: 130030 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:09:26,515-Speed 2497.10 samples/sec Loss 5.0842 LearningRate 0.000878 Epoch: 6 Global Step: 130040 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:09:34,719-Speed 2496.75 samples/sec Loss 5.0742 LearningRate 0.000878 Epoch: 6 Global Step: 130050 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:09:42,921-Speed 2497.91 samples/sec Loss 5.1093 LearningRate 0.000878 Epoch: 6 Global Step: 130060 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:09:51,119-Speed 2498.44 samples/sec Loss 5.1221 LearningRate 0.000878 Epoch: 6 Global Step: 130070 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:09:59,330-Speed 2494.54 samples/sec Loss 5.1114 LearningRate 0.000878 Epoch: 6 Global Step: 130080 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:10:07,479-Speed 2513.75 samples/sec Loss 5.1711 LearningRate 0.000878 Epoch: 6 Global Step: 130090 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:10:15,685-Speed 2496.02 samples/sec Loss 5.1593 
LearningRate 0.000878 Epoch: 6 Global Step: 130100 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:10:23,885-Speed 2498.27 samples/sec Loss 5.1151 LearningRate 0.000878 Epoch: 6 Global Step: 130110 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:10:32,082-Speed 2498.60 samples/sec Loss 5.1052 LearningRate 0.000878 Epoch: 6 Global Step: 130120 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:10:40,291-Speed 2495.20 samples/sec Loss 5.1007 LearningRate 0.000878 Epoch: 6 Global Step: 130130 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:10:48,491-Speed 2498.20 samples/sec Loss 5.1208 LearningRate 0.000878 Epoch: 6 Global Step: 130140 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:10:56,639-Speed 2513.98 samples/sec Loss 5.0952 LearningRate 0.000878 Epoch: 6 Global Step: 130150 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:11:04,836-Speed 2498.84 samples/sec Loss 5.0844 LearningRate 0.000878 Epoch: 6 Global Step: 130160 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:11:13,044-Speed 2495.44 samples/sec Loss 5.0400 LearningRate 0.000878 Epoch: 6 Global Step: 130170 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:11:21,247-Speed 2496.90 samples/sec Loss 5.1193 LearningRate 0.000877 Epoch: 6 Global Step: 130180 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:11:29,451-Speed 2496.89 samples/sec Loss 5.0438 LearningRate 0.000877 Epoch: 6 Global Step: 130190 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:11:37,651-Speed 2497.99 samples/sec Loss 5.0572 LearningRate 0.000877 Epoch: 6 Global Step: 130200 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:11:45,796-Speed 2514.78 samples/sec Loss 5.0352 LearningRate 0.000877 Epoch: 6 Global Step: 130210 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:11:53,993-Speed 2498.88 samples/sec Loss 5.0723 
LearningRate 0.000877 Epoch: 6 Global Step: 130220 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:12:02,193-Speed 2498.11 samples/sec Loss 5.0738 LearningRate 0.000877 Epoch: 6 Global Step: 130230 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:12:10,396-Speed 2496.89 samples/sec Loss 5.0727 LearningRate 0.000877 Epoch: 6 Global Step: 130240 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:12:18,597-Speed 2497.57 samples/sec Loss 5.1080 LearningRate 0.000877 Epoch: 6 Global Step: 130250 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:12:26,796-Speed 2498.46 samples/sec Loss 5.0475 LearningRate 0.000877 Epoch: 6 Global Step: 130260 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:12:34,948-Speed 2512.53 samples/sec Loss 5.1736 LearningRate 0.000877 Epoch: 6 Global Step: 130270 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:12:43,149-Speed 2497.94 samples/sec Loss 5.0807 LearningRate 0.000877 Epoch: 6 Global Step: 130280 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:12:51,355-Speed 2496.11 samples/sec Loss 5.0747 LearningRate 0.000877 Epoch: 6 Global Step: 130290 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:12:59,552-Speed 2498.73 samples/sec Loss 5.0386 LearningRate 0.000877 Epoch: 6 Global Step: 130300 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:13:07,754-Speed 2497.42 samples/sec Loss 5.0803 LearningRate 0.000877 Epoch: 6 Global Step: 130310 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:13:15,953-Speed 2498.45 samples/sec Loss 5.1620 LearningRate 0.000877 Epoch: 6 Global Step: 130320 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:13:24,102-Speed 2513.45 samples/sec Loss 5.1109 LearningRate 0.000877 Epoch: 6 Global Step: 130330 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:13:32,302-Speed 2497.87 samples/sec Loss 5.0831 
LearningRate 0.000877 Epoch: 6 Global Step: 130340 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:13:40,503-Speed 2497.76 samples/sec Loss 5.0289 LearningRate 0.000877 Epoch: 6 Global Step: 130350 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:13:48,708-Speed 2496.35 samples/sec Loss 4.9963 LearningRate 0.000877 Epoch: 6 Global Step: 130360 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:13:56,909-Speed 2497.69 samples/sec Loss 5.0661 LearningRate 0.000877 Epoch: 6 Global Step: 130370 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:14:05,110-Speed 2497.95 samples/sec Loss 5.0807 LearningRate 0.000877 Epoch: 6 Global Step: 130380 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:14:13,259-Speed 2513.31 samples/sec Loss 5.0098 LearningRate 0.000877 Epoch: 6 Global Step: 130390 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:14:21,463-Speed 2496.92 samples/sec Loss 5.0644 LearningRate 0.000877 Epoch: 6 Global Step: 130400 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:14:29,669-Speed 2496.17 samples/sec Loss 5.0849 LearningRate 0.000877 Epoch: 6 Global Step: 130410 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:14:37,872-Speed 2497.18 samples/sec Loss 5.0673 LearningRate 0.000877 Epoch: 6 Global Step: 130420 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:14:46,074-Speed 2497.42 samples/sec Loss 5.1138 LearningRate 0.000877 Epoch: 6 Global Step: 130430 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:14:54,283-Speed 2495.16 samples/sec Loss 5.0499 LearningRate 0.000877 Epoch: 6 Global Step: 130440 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:15:02,429-Speed 2514.33 samples/sec Loss 5.0453 LearningRate 0.000877 Epoch: 6 Global Step: 130450 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:15:10,633-Speed 2496.88 samples/sec Loss 5.0646 
LearningRate 0.000877 Epoch: 6 Global Step: 130460 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:15:18,833-Speed 2498.06 samples/sec Loss 5.1072 LearningRate 0.000877 Epoch: 6 Global Step: 130470 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:15:27,031-Speed 2498.43 samples/sec Loss 5.0023 LearningRate 0.000877 Epoch: 6 Global Step: 130480 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:15:35,232-Speed 2498.14 samples/sec Loss 5.1475 LearningRate 0.000877 Epoch: 6 Global Step: 130490 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:15:43,438-Speed 2496.15 samples/sec Loss 5.1285 LearningRate 0.000877 Epoch: 6 Global Step: 130500 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:15:51,579-Speed 2515.75 samples/sec Loss 5.1171 LearningRate 0.000877 Epoch: 6 Global Step: 130510 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:15:59,789-Speed 2495.02 samples/sec Loss 5.0419 LearningRate 0.000877 Epoch: 6 Global Step: 130520 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:16:07,988-Speed 2498.43 samples/sec Loss 5.1281 LearningRate 0.000877 Epoch: 6 Global Step: 130530 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:16:16,185-Speed 2498.77 samples/sec Loss 5.1002 LearningRate 0.000877 Epoch: 6 Global Step: 130540 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:16:24,384-Speed 2498.14 samples/sec Loss 4.9856 LearningRate 0.000877 Epoch: 6 Global Step: 130550 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:16:32,599-Speed 2493.38 samples/sec Loss 4.9517 LearningRate 0.000877 Epoch: 6 Global Step: 130560 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:16:40,748-Speed 2513.74 samples/sec Loss 5.0310 LearningRate 0.000877 Epoch: 6 Global Step: 130570 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:16:48,953-Speed 2496.48 samples/sec Loss 5.0689 
LearningRate 0.000876 Epoch: 6 Global Step: 130580 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:16:57,152-Speed 2497.93 samples/sec Loss 5.0689 LearningRate 0.000876 Epoch: 6 Global Step: 130590 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:17:05,351-Speed 2498.49 samples/sec Loss 5.1055 LearningRate 0.000876 Epoch: 6 Global Step: 130600 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:17:13,548-Speed 2498.97 samples/sec Loss 5.0921 LearningRate 0.000876 Epoch: 6 Global Step: 130610 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:17:21,750-Speed 2497.63 samples/sec Loss 5.0894 LearningRate 0.000876 Epoch: 6 Global Step: 130620 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:17:29,894-Speed 2515.03 samples/sec Loss 4.9873 LearningRate 0.000876 Epoch: 6 Global Step: 130630 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:17:38,094-Speed 2497.87 samples/sec Loss 5.0963 LearningRate 0.000876 Epoch: 6 Global Step: 130640 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:17:46,294-Speed 2497.92 samples/sec Loss 5.0594 LearningRate 0.000876 Epoch: 6 Global Step: 130650 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:17:54,492-Speed 2498.71 samples/sec Loss 5.0182 LearningRate 0.000876 Epoch: 6 Global Step: 130660 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:18:02,694-Speed 2497.37 samples/sec Loss 5.0914 LearningRate 0.000876 Epoch: 6 Global Step: 130670 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:18:10,906-Speed 2494.23 samples/sec Loss 5.0294 LearningRate 0.000876 Epoch: 6 Global Step: 130680 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:18:19,055-Speed 2513.71 samples/sec Loss 5.0317 LearningRate 0.000876 Epoch: 6 Global Step: 130690 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:18:27,267-Speed 2494.01 samples/sec Loss 5.0427 
LearningRate 0.000876 Epoch: 6 Global Step: 130700 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:18:35,469-Speed 2497.55 samples/sec Loss 5.0478 LearningRate 0.000876 Epoch: 6 Global Step: 130710 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:18:43,671-Speed 2497.34 samples/sec Loss 5.0423 LearningRate 0.000876 Epoch: 6 Global Step: 130720 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:18:51,873-Speed 2497.41 samples/sec Loss 4.9502 LearningRate 0.000876 Epoch: 6 Global Step: 130730 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:19:00,072-Speed 2498.40 samples/sec Loss 5.0322 LearningRate 0.000876 Epoch: 6 Global Step: 130740 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:19:08,237-Speed 2508.87 samples/sec Loss 4.9942 LearningRate 0.000876 Epoch: 6 Global Step: 130750 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:19:16,435-Speed 2498.59 samples/sec Loss 4.9191 LearningRate 0.000876 Epoch: 6 Global Step: 130760 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:19:24,633-Speed 2498.46 samples/sec Loss 5.0115 LearningRate 0.000876 Epoch: 6 Global Step: 130770 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:19:32,838-Speed 2496.39 samples/sec Loss 5.0000 LearningRate 0.000876 Epoch: 6 Global Step: 130780 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:19:41,038-Speed 2498.16 samples/sec Loss 4.9847 LearningRate 0.000876 Epoch: 6 Global Step: 130790 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:19:49,237-Speed 2498.01 samples/sec Loss 5.1112 LearningRate 0.000876 Epoch: 6 Global Step: 130800 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:19:57,384-Speed 2514.39 samples/sec Loss 5.0524 LearningRate 0.000876 Epoch: 6 Global Step: 130810 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:20:05,584-Speed 2497.81 samples/sec Loss 5.0429 
LearningRate 0.000876 Epoch: 6 Global Step: 130820 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:20:13,785-Speed 2497.66 samples/sec Loss 5.0059 LearningRate 0.000876 Epoch: 6 Global Step: 130830 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:20:21,984-Speed 2498.14 samples/sec Loss 5.0157 LearningRate 0.000876 Epoch: 6 Global Step: 130840 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:20:30,187-Speed 2497.63 samples/sec Loss 5.0482 LearningRate 0.000876 Epoch: 6 Global Step: 130850 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:20:38,385-Speed 2498.76 samples/sec Loss 5.0083 LearningRate 0.000876 Epoch: 6 Global Step: 130860 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:20:46,531-Speed 2514.48 samples/sec Loss 4.9894 LearningRate 0.000876 Epoch: 6 Global Step: 130870 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:20:54,730-Speed 2498.46 samples/sec Loss 5.0366 LearningRate 0.000876 Epoch: 6 Global Step: 130880 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:21:02,930-Speed 2497.88 samples/sec Loss 5.0352 LearningRate 0.000876 Epoch: 6 Global Step: 130890 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:21:11,131-Speed 2498.01 samples/sec Loss 4.9685 LearningRate 0.000876 Epoch: 6 Global Step: 130900 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:21:19,331-Speed 2497.85 samples/sec Loss 5.0135 LearningRate 0.000876 Epoch: 6 Global Step: 130910 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:21:27,534-Speed 2496.93 samples/sec Loss 5.0215 LearningRate 0.000876 Epoch: 6 Global Step: 130920 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:21:35,681-Speed 2514.34 samples/sec Loss 5.0078 LearningRate 0.000876 Epoch: 6 Global Step: 130930 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:21:43,881-Speed 2498.37 samples/sec Loss 4.9963 
LearningRate 0.000876 Epoch: 6 Global Step: 130940 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:21:52,089-Speed 2495.25 samples/sec Loss 5.0729 LearningRate 0.000876 Epoch: 6 Global Step: 130950 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:22:00,309-Speed 2492.30 samples/sec Loss 4.9948 LearningRate 0.000876 Epoch: 6 Global Step: 130960 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:22:08,518-Speed 2495.00 samples/sec Loss 5.0212 LearningRate 0.000876 Epoch: 6 Global Step: 130970 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:22:16,720-Speed 2497.59 samples/sec Loss 5.0479 LearningRate 0.000875 Epoch: 6 Global Step: 130980 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:22:24,870-Speed 2513.07 samples/sec Loss 4.9821 LearningRate 0.000875 Epoch: 6 Global Step: 130990 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:22:33,071-Speed 2497.72 samples/sec Loss 5.0095 LearningRate 0.000875 Epoch: 6 Global Step: 131000 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:22:41,285-Speed 2493.90 samples/sec Loss 5.0096 LearningRate 0.000875 Epoch: 6 Global Step: 131010 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:22:49,484-Speed 2498.14 samples/sec Loss 5.0459 LearningRate 0.000875 Epoch: 6 Global Step: 131020 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:22:57,689-Speed 2496.78 samples/sec Loss 5.0358 LearningRate 0.000875 Epoch: 6 Global Step: 131030 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:23:05,889-Speed 2497.76 samples/sec Loss 4.9814 LearningRate 0.000875 Epoch: 6 Global Step: 131040 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:23:14,037-Speed 2514.22 samples/sec Loss 5.0695 LearningRate 0.000875 Epoch: 6 Global Step: 131050 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:23:22,235-Speed 2498.54 samples/sec Loss 5.0741 
LearningRate 0.000875 Epoch: 6 Global Step: 131060 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:23:30,438-Speed 2497.03 samples/sec Loss 4.9834 LearningRate 0.000875 Epoch: 6 Global Step: 131070 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:23:38,639-Speed 2497.62 samples/sec Loss 5.0363 LearningRate 0.000875 Epoch: 6 Global Step: 131080 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:23:46,838-Speed 2498.27 samples/sec Loss 5.1902 LearningRate 0.000875 Epoch: 6 Global Step: 131090 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:23:55,041-Speed 2497.27 samples/sec Loss 4.9666 LearningRate 0.000875 Epoch: 6 Global Step: 131100 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:24:03,191-Speed 2514.08 samples/sec Loss 5.1321 LearningRate 0.000875 Epoch: 6 Global Step: 131110 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:24:11,392-Speed 2497.40 samples/sec Loss 5.0871 LearningRate 0.000875 Epoch: 6 Global Step: 131120 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:24:19,598-Speed 2496.38 samples/sec Loss 5.1647 LearningRate 0.000875 Epoch: 6 Global Step: 131130 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:24:27,798-Speed 2498.09 samples/sec Loss 5.0296 LearningRate 0.000875 Epoch: 6 Global Step: 131140 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:24:36,000-Speed 2497.32 samples/sec Loss 5.0892 LearningRate 0.000875 Epoch: 6 Global Step: 131150 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:24:44,202-Speed 2497.45 samples/sec Loss 4.9879 LearningRate 0.000875 Epoch: 6 Global Step: 131160 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:24:52,357-Speed 2511.87 samples/sec Loss 5.0007 LearningRate 0.000875 Epoch: 6 Global Step: 131170 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:25:00,557-Speed 2497.97 samples/sec Loss 5.0771 
LearningRate 0.000875 Epoch: 6 Global Step: 131180 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:25:08,758-Speed 2497.89 samples/sec Loss 5.0182 LearningRate 0.000875 Epoch: 6 Global Step: 131190 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:25:16,957-Speed 2498.19 samples/sec Loss 4.9477 LearningRate 0.000875 Epoch: 6 Global Step: 131200 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:25:25,157-Speed 2497.80 samples/sec Loss 5.0348 LearningRate 0.000875 Epoch: 6 Global Step: 131210 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:25:33,355-Speed 2498.46 samples/sec Loss 5.0252 LearningRate 0.000875 Epoch: 6 Global Step: 131220 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:25:41,505-Speed 2513.49 samples/sec Loss 5.0833 LearningRate 0.000875 Epoch: 6 Global Step: 131230 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:25:49,704-Speed 2498.31 samples/sec Loss 5.1115 LearningRate 0.000875 Epoch: 6 Global Step: 131240 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:25:57,901-Speed 2498.68 samples/sec Loss 4.9730 LearningRate 0.000875 Epoch: 6 Global Step: 131250 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:26:06,104-Speed 2497.11 samples/sec Loss 5.0371 LearningRate 0.000875 Epoch: 6 Global Step: 131260 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:26:14,304-Speed 2498.10 samples/sec Loss 5.0037 LearningRate 0.000875 Epoch: 6 Global Step: 131270 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:26:22,505-Speed 2497.57 samples/sec Loss 4.9798 LearningRate 0.000875 Epoch: 6 Global Step: 131280 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:26:30,653-Speed 2514.05 samples/sec Loss 4.9817 LearningRate 0.000875 Epoch: 6 Global Step: 131290 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:26:38,853-Speed 2497.86 samples/sec Loss 4.9945 
LearningRate 0.000875 Epoch: 6 Global Step: 131300 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:26:47,053-Speed 2498.10 samples/sec Loss 5.0792 LearningRate 0.000875 Epoch: 6 Global Step: 131310 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:26:55,255-Speed 2497.27 samples/sec Loss 5.1380 LearningRate 0.000875 Epoch: 6 Global Step: 131320 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:27:03,458-Speed 2497.09 samples/sec Loss 5.0181 LearningRate 0.000875 Epoch: 6 Global Step: 131330 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:27:11,655-Speed 2498.86 samples/sec Loss 5.0415 LearningRate 0.000875 Epoch: 6 Global Step: 131340 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:27:19,801-Speed 2514.54 samples/sec Loss 5.0551 LearningRate 0.000875 Epoch: 6 Global Step: 131350 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:27:28,005-Speed 2497.09 samples/sec Loss 4.9714 LearningRate 0.000875 Epoch: 6 Global Step: 131360 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:27:36,204-Speed 2498.14 samples/sec Loss 4.9874 LearningRate 0.000875 Epoch: 6 Global Step: 131370 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:27:44,406-Speed 2497.41 samples/sec Loss 5.0037 LearningRate 0.000874 Epoch: 6 Global Step: 131380 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:27:52,610-Speed 2496.73 samples/sec Loss 4.9848 LearningRate 0.000874 Epoch: 6 Global Step: 131390 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:28:00,812-Speed 2497.46 samples/sec Loss 4.9744 LearningRate 0.000874 Epoch: 6 Global Step: 131400 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:28:08,954-Speed 2515.86 samples/sec Loss 5.0359 LearningRate 0.000874 Epoch: 6 Global Step: 131410 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:28:17,156-Speed 2497.48 samples/sec Loss 5.0885 
LearningRate 0.000874 Epoch: 6 Global Step: 131420 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:28:25,355-Speed 2498.28 samples/sec Loss 5.1244 LearningRate 0.000874 Epoch: 6 Global Step: 131430 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:28:33,554-Speed 2498.25 samples/sec Loss 5.0012 LearningRate 0.000874 Epoch: 6 Global Step: 131440 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:28:41,753-Speed 2498.20 samples/sec Loss 4.9704 LearningRate 0.000874 Epoch: 6 Global Step: 131450 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:28:49,955-Speed 2497.23 samples/sec Loss 5.0756 LearningRate 0.000874 Epoch: 6 Global Step: 131460 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:28:58,106-Speed 2512.97 samples/sec Loss 4.9361 LearningRate 0.000874 Epoch: 6 Global Step: 131470 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:29:06,307-Speed 2497.49 samples/sec Loss 5.0650 LearningRate 0.000874 Epoch: 6 Global Step: 131480 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:29:14,505-Speed 2498.60 samples/sec Loss 5.0840 LearningRate 0.000874 Epoch: 6 Global Step: 131490 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:29:22,706-Speed 2497.93 samples/sec Loss 5.0229 LearningRate 0.000874 Epoch: 6 Global Step: 131500 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:29:30,906-Speed 2497.99 samples/sec Loss 5.0117 LearningRate 0.000874 Epoch: 6 Global Step: 131510 Fp16 Grad Scale: 65536 Required: 160 hours Training: 2022-07-06 20:29:39,070-Speed 2508.97 samples/sec Loss 4.9975 LearningRate 0.000874 Epoch: 6 Global Step: 131520 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:29:47,217-Speed 2514.19 samples/sec Loss 5.0604 LearningRate 0.000874 Epoch: 6 Global Step: 131530 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:29:55,417-Speed 2497.99 samples/sec Loss 5.0085 
LearningRate 0.000874 Epoch: 6 Global Step: 131540 Fp16 Grad Scale: 32768 Required: 160 hours Training: 2022-07-06 20:30:03,614-Speed 2498.94 samples/sec Loss 4.9761 LearningRate 0.000874 Epoch: 6 Global Step: 131550 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:30:11,817-Speed 2497.16 samples/sec Loss 5.0905 LearningRate 0.000874 Epoch: 6 Global Step: 131560 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:30:20,029-Speed 2494.23 samples/sec Loss 5.0447 LearningRate 0.000874 Epoch: 6 Global Step: 131570 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:30:28,232-Speed 2497.06 samples/sec Loss 5.0698 LearningRate 0.000874 Epoch: 6 Global Step: 131580 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:30:36,376-Speed 2515.14 samples/sec Loss 4.9865 LearningRate 0.000874 Epoch: 6 Global Step: 131590 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:30:44,577-Speed 2497.59 samples/sec Loss 4.9667 LearningRate 0.000874 Epoch: 6 Global Step: 131600 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:30:52,780-Speed 2497.08 samples/sec Loss 4.9277 LearningRate 0.000874 Epoch: 6 Global Step: 131610 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:31:00,981-Speed 2497.62 samples/sec Loss 5.0717 LearningRate 0.000874 Epoch: 6 Global Step: 131620 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:31:09,180-Speed 2498.12 samples/sec Loss 4.9773 LearningRate 0.000874 Epoch: 6 Global Step: 131630 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:31:17,382-Speed 2497.35 samples/sec Loss 5.0611 LearningRate 0.000874 Epoch: 6 Global Step: 131640 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:31:25,542-Speed 2510.37 samples/sec Loss 5.1159 LearningRate 0.000874 Epoch: 6 Global Step: 131650 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:31:33,755-Speed 2494.07 samples/sec Loss 5.0585 
LearningRate 0.000874 Epoch: 6 Global Step: 131660 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:31:41,955-Speed 2497.84 samples/sec Loss 5.0905 LearningRate 0.000874 Epoch: 6 Global Step: 131670 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:31:50,165-Speed 2495.15 samples/sec Loss 5.0597 LearningRate 0.000874 Epoch: 6 Global Step: 131680 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:31:58,369-Speed 2496.64 samples/sec Loss 5.0644 LearningRate 0.000874 Epoch: 6 Global Step: 131690 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:32:06,572-Speed 2497.18 samples/sec Loss 4.9733 LearningRate 0.000874 Epoch: 6 Global Step: 131700 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:32:14,723-Speed 2513.03 samples/sec Loss 5.0293 LearningRate 0.000874 Epoch: 6 Global Step: 131710 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:32:22,924-Speed 2497.66 samples/sec Loss 5.0335 LearningRate 0.000874 Epoch: 6 Global Step: 131720 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:32:31,124-Speed 2497.61 samples/sec Loss 5.0220 LearningRate 0.000874 Epoch: 6 Global Step: 131730 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:32:39,325-Speed 2497.79 samples/sec Loss 5.0011 LearningRate 0.000874 Epoch: 6 Global Step: 131740 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:32:47,525-Speed 2497.79 samples/sec Loss 5.0008 LearningRate 0.000874 Epoch: 6 Global Step: 131750 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:32:55,730-Speed 2496.47 samples/sec Loss 5.0011 LearningRate 0.000874 Epoch: 6 Global Step: 131760 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:33:03,880-Speed 2513.13 samples/sec Loss 4.9529 LearningRate 0.000874 Epoch: 6 Global Step: 131770 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:33:12,078-Speed 2498.91 samples/sec Loss 4.9802 
LearningRate 0.000873 Epoch: 6 Global Step: 131780 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:33:20,279-Speed 2497.57 samples/sec Loss 4.9424 LearningRate 0.000873 Epoch: 6 Global Step: 131790 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:33:28,501-Speed 2491.03 samples/sec Loss 5.0097 LearningRate 0.000873 Epoch: 6 Global Step: 131800 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:33:36,705-Speed 2496.84 samples/sec Loss 4.9858 LearningRate 0.000873 Epoch: 6 Global Step: 131810 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:33:44,905-Speed 2497.96 samples/sec Loss 4.9364 LearningRate 0.000873 Epoch: 6 Global Step: 131820 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:33:53,054-Speed 2513.58 samples/sec Loss 5.0156 LearningRate 0.000873 Epoch: 6 Global Step: 131830 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:34:01,255-Speed 2497.66 samples/sec Loss 5.0031 LearningRate 0.000873 Epoch: 6 Global Step: 131840 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:34:09,461-Speed 2496.08 samples/sec Loss 5.0225 LearningRate 0.000873 Epoch: 6 Global Step: 131850 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:34:17,663-Speed 2497.74 samples/sec Loss 4.9983 LearningRate 0.000873 Epoch: 6 Global Step: 131860 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:34:25,866-Speed 2496.87 samples/sec Loss 5.1233 LearningRate 0.000873 Epoch: 6 Global Step: 131870 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:34:34,086-Speed 2491.75 samples/sec Loss 5.0770 LearningRate 0.000873 Epoch: 6 Global Step: 131880 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:34:42,248-Speed 2510.20 samples/sec Loss 4.9738 LearningRate 0.000873 Epoch: 6 Global Step: 131890 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:34:50,458-Speed 2495.03 samples/sec Loss 5.0248 
LearningRate 0.000873 Epoch: 6 Global Step: 131900 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:34:58,660-Speed 2497.51 samples/sec Loss 5.0046 LearningRate 0.000873 Epoch: 6 Global Step: 131910 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:35:06,865-Speed 2496.39 samples/sec Loss 4.9937 LearningRate 0.000873 Epoch: 6 Global Step: 131920 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:35:15,068-Speed 2497.08 samples/sec Loss 5.0257 LearningRate 0.000873 Epoch: 6 Global Step: 131930 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:35:23,281-Speed 2493.86 samples/sec Loss 5.0224 LearningRate 0.000873 Epoch: 6 Global Step: 131940 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:35:31,438-Speed 2511.06 samples/sec Loss 4.9473 LearningRate 0.000873 Epoch: 6 Global Step: 131950 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:35:39,644-Speed 2496.21 samples/sec Loss 4.9685 LearningRate 0.000873 Epoch: 6 Global Step: 131960 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:35:47,848-Speed 2496.63 samples/sec Loss 4.9850 LearningRate 0.000873 Epoch: 6 Global Step: 131970 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:35:56,062-Speed 2493.71 samples/sec Loss 4.9352 LearningRate 0.000873 Epoch: 6 Global Step: 131980 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:36:04,266-Speed 2496.61 samples/sec Loss 4.9239 LearningRate 0.000873 Epoch: 6 Global Step: 131990 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:36:12,472-Speed 2496.40 samples/sec Loss 4.9978 LearningRate 0.000873 Epoch: 6 Global Step: 132000 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:36:20,622-Speed 2513.23 samples/sec Loss 5.0578 LearningRate 0.000873 Epoch: 6 Global Step: 132010 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:36:28,823-Speed 2497.63 samples/sec Loss 5.1313 
LearningRate 0.000873 Epoch: 6 Global Step: 132020 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:36:37,034-Speed 2494.40 samples/sec Loss 5.1726 LearningRate 0.000873 Epoch: 6 Global Step: 132030 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:36:45,238-Speed 2497.00 samples/sec Loss 5.1223 LearningRate 0.000873 Epoch: 6 Global Step: 132040 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:36:53,442-Speed 2496.89 samples/sec Loss 5.0733 LearningRate 0.000873 Epoch: 6 Global Step: 132050 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:37:01,643-Speed 2497.55 samples/sec Loss 5.0708 LearningRate 0.000873 Epoch: 6 Global Step: 132060 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:37:09,791-Speed 2513.88 samples/sec Loss 5.0126 LearningRate 0.000873 Epoch: 6 Global Step: 132070 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:37:17,995-Speed 2496.70 samples/sec Loss 5.0517 LearningRate 0.000873 Epoch: 6 Global Step: 132080 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:37:26,199-Speed 2496.90 samples/sec Loss 4.9785 LearningRate 0.000873 Epoch: 6 Global Step: 132090 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:37:34,427-Speed 2489.37 samples/sec Loss 4.9783 LearningRate 0.000873 Epoch: 6 Global Step: 132100 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:37:42,628-Speed 2497.78 samples/sec Loss 4.9400 LearningRate 0.000873 Epoch: 6 Global Step: 132110 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:37:50,829-Speed 2497.47 samples/sec Loss 5.0173 LearningRate 0.000873 Epoch: 6 Global Step: 132120 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:37:58,989-Speed 2510.38 samples/sec Loss 5.0531 LearningRate 0.000873 Epoch: 6 Global Step: 132130 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:38:07,191-Speed 2497.36 samples/sec Loss 5.0111 
LearningRate 0.000873 Epoch: 6 Global Step: 132140 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:38:15,398-Speed 2495.99 samples/sec Loss 5.0030 LearningRate 0.000873 Epoch: 6 Global Step: 132150 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:38:23,600-Speed 2497.36 samples/sec Loss 5.0611 LearningRate 0.000873 Epoch: 6 Global Step: 132160 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:38:31,803-Speed 2496.86 samples/sec Loss 5.1012 LearningRate 0.000873 Epoch: 6 Global Step: 132170 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:38:40,004-Speed 2497.81 samples/sec Loss 5.0881 LearningRate 0.000872 Epoch: 6 Global Step: 132180 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:38:48,153-Speed 2513.59 samples/sec Loss 4.9685 LearningRate 0.000872 Epoch: 6 Global Step: 132190 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:38:56,355-Speed 2497.24 samples/sec Loss 5.0632 LearningRate 0.000872 Epoch: 6 Global Step: 132200 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:39:04,558-Speed 2496.83 samples/sec Loss 5.0715 LearningRate 0.000872 Epoch: 6 Global Step: 132210 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:39:12,759-Speed 2497.97 samples/sec Loss 5.0306 LearningRate 0.000872 Epoch: 6 Global Step: 132220 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:39:20,958-Speed 2498.13 samples/sec Loss 5.0530 LearningRate 0.000872 Epoch: 6 Global Step: 132230 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:39:29,162-Speed 2496.82 samples/sec Loss 5.0433 LearningRate 0.000872 Epoch: 6 Global Step: 132240 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:39:37,309-Speed 2514.27 samples/sec Loss 5.0136 LearningRate 0.000872 Epoch: 6 Global Step: 132250 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:39:45,512-Speed 2496.99 samples/sec Loss 4.9050 
LearningRate 0.000872 Epoch: 6 Global Step: 132260 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:39:53,729-Speed 2492.81 samples/sec Loss 5.0979 LearningRate 0.000872 Epoch: 6 Global Step: 132270 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:40:01,931-Speed 2497.31 samples/sec Loss 5.0265 LearningRate 0.000872 Epoch: 6 Global Step: 132280 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:40:10,135-Speed 2496.72 samples/sec Loss 4.9166 LearningRate 0.000872 Epoch: 6 Global Step: 132290 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:40:18,340-Speed 2496.59 samples/sec Loss 5.0216 LearningRate 0.000872 Epoch: 6 Global Step: 132300 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:40:26,488-Speed 2513.80 samples/sec Loss 5.0989 LearningRate 0.000872 Epoch: 6 Global Step: 132310 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:40:34,693-Speed 2496.34 samples/sec Loss 5.0161 LearningRate 0.000872 Epoch: 6 Global Step: 132320 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:40:42,908-Speed 2493.54 samples/sec Loss 5.0333 LearningRate 0.000872 Epoch: 6 Global Step: 132330 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:40:51,110-Speed 2497.54 samples/sec Loss 5.1075 LearningRate 0.000872 Epoch: 6 Global Step: 132340 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:40:59,311-Speed 2497.73 samples/sec Loss 5.1258 LearningRate 0.000872 Epoch: 6 Global Step: 132350 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:41:07,516-Speed 2496.16 samples/sec Loss 5.1460 LearningRate 0.000872 Epoch: 6 Global Step: 132360 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:41:15,678-Speed 2509.88 samples/sec Loss 5.0181 LearningRate 0.000872 Epoch: 6 Global Step: 132370 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:41:23,877-Speed 2498.03 samples/sec Loss 5.0839 
LearningRate 0.000872 Epoch: 6 Global Step: 132380 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:41:32,083-Speed 2496.32 samples/sec Loss 5.0653 LearningRate 0.000872 Epoch: 6 Global Step: 132390 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:41:40,286-Speed 2497.03 samples/sec Loss 4.9606 LearningRate 0.000872 Epoch: 6 Global Step: 132400 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:41:48,490-Speed 2496.74 samples/sec Loss 5.0301 LearningRate 0.000872 Epoch: 6 Global Step: 132410 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:41:56,709-Speed 2492.16 samples/sec Loss 5.1116 LearningRate 0.000872 Epoch: 6 Global Step: 132420 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:42:04,857-Speed 2513.94 samples/sec Loss 5.0319 LearningRate 0.000872 Epoch: 6 Global Step: 132430 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:42:13,056-Speed 2498.27 samples/sec Loss 4.9848 LearningRate 0.000872 Epoch: 6 Global Step: 132440 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:42:21,260-Speed 2496.41 samples/sec Loss 5.0163 LearningRate 0.000872 Epoch: 6 Global Step: 132450 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:42:29,462-Speed 2497.68 samples/sec Loss 5.0637 LearningRate 0.000872 Epoch: 6 Global Step: 132460 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:42:37,665-Speed 2496.89 samples/sec Loss 5.1995 LearningRate 0.000872 Epoch: 6 Global Step: 132470 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:42:45,870-Speed 2496.44 samples/sec Loss 5.0541 LearningRate 0.000872 Epoch: 6 Global Step: 132480 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:42:54,017-Speed 2514.08 samples/sec Loss 5.0230 LearningRate 0.000872 Epoch: 6 Global Step: 132490 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:43:02,219-Speed 2497.45 samples/sec Loss 5.0911 
LearningRate 0.000872 Epoch: 6 Global Step: 132500 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:43:10,423-Speed 2496.95 samples/sec Loss 5.1581 LearningRate 0.000872 Epoch: 6 Global Step: 132510 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:43:18,625-Speed 2497.21 samples/sec Loss 5.0845 LearningRate 0.000872 Epoch: 6 Global Step: 132520 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:43:26,841-Speed 2493.05 samples/sec Loss 5.0226 LearningRate 0.000872 Epoch: 6 Global Step: 132530 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:43:35,042-Speed 2498.02 samples/sec Loss 5.0197 LearningRate 0.000872 Epoch: 6 Global Step: 132540 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:43:43,192-Speed 2513.43 samples/sec Loss 5.0809 LearningRate 0.000872 Epoch: 6 Global Step: 132550 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:43:51,393-Speed 2497.79 samples/sec Loss 5.0120 LearningRate 0.000872 Epoch: 6 Global Step: 132560 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:43:59,597-Speed 2497.09 samples/sec Loss 5.0166 LearningRate 0.000872 Epoch: 6 Global Step: 132570 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:44:07,798-Speed 2497.48 samples/sec Loss 5.0141 LearningRate 0.000871 Epoch: 6 Global Step: 132580 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:44:15,999-Speed 2497.77 samples/sec Loss 5.0111 LearningRate 0.000871 Epoch: 6 Global Step: 132590 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:44:24,204-Speed 2496.27 samples/sec Loss 4.9965 LearningRate 0.000871 Epoch: 6 Global Step: 132600 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:44:32,351-Speed 2514.23 samples/sec Loss 4.9736 LearningRate 0.000871 Epoch: 6 Global Step: 132610 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:44:40,552-Speed 2497.56 samples/sec Loss 5.1159 
LearningRate 0.000871 Epoch: 6 Global Step: 132620 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:44:48,756-Speed 2496.71 samples/sec Loss 5.0029 LearningRate 0.000871 Epoch: 6 Global Step: 132630 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:44:56,956-Speed 2497.78 samples/sec Loss 4.9011 LearningRate 0.000871 Epoch: 6 Global Step: 132640 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:45:05,158-Speed 2497.51 samples/sec Loss 4.9922 LearningRate 0.000871 Epoch: 6 Global Step: 132650 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:45:13,359-Speed 2497.55 samples/sec Loss 4.9631 LearningRate 0.000871 Epoch: 6 Global Step: 132660 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:45:21,514-Speed 2511.62 samples/sec Loss 5.1081 LearningRate 0.000871 Epoch: 6 Global Step: 132670 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:45:29,716-Speed 2497.54 samples/sec Loss 5.0449 LearningRate 0.000871 Epoch: 6 Global Step: 132680 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:45:37,923-Speed 2496.02 samples/sec Loss 4.9969 LearningRate 0.000871 Epoch: 6 Global Step: 132690 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:45:46,123-Speed 2497.82 samples/sec Loss 4.9320 LearningRate 0.000871 Epoch: 6 Global Step: 132700 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:45:54,324-Speed 2497.52 samples/sec Loss 5.0038 LearningRate 0.000871 Epoch: 6 Global Step: 132710 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 20:46:02,525-Speed 2497.70 samples/sec Loss 5.0737 LearningRate 0.000871 Epoch: 6 Global Step: 132720 Fp16 Grad Scale: 65536 Required: 159 hours Training: 2022-07-06 20:46:10,674-Speed 2513.72 samples/sec Loss 5.0297 LearningRate 0.000871 Epoch: 6 Global Step: 132730 Fp16 Grad Scale: 65536 Required: 159 hours Training: 2022-07-06 20:46:18,874-Speed 2497.69 samples/sec Loss 4.9865 
LearningRate 0.000871 Epoch: 6 Global Step: 132740 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:46:27,077-Speed 2497.29 samples/sec Loss 4.9621 LearningRate 0.000871 Epoch: 6 Global Step: 132750 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:46:35,281-Speed 2496.68 samples/sec Loss 5.0161 LearningRate 0.000871 Epoch: 6 Global Step: 132760 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:46:43,489-Speed 2495.64 samples/sec Loss 5.0575 LearningRate 0.000871 Epoch: 6 Global Step: 132770 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:46:51,693-Speed 2496.65 samples/sec Loss 4.9851 LearningRate 0.000871 Epoch: 6 Global Step: 132780 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:46:59,845-Speed 2512.98 samples/sec Loss 4.9012 LearningRate 0.000871 Epoch: 6 Global Step: 132790 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:47:08,048-Speed 2497.13 samples/sec Loss 5.0127 LearningRate 0.000871 Epoch: 6 Global Step: 132800 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:47:16,263-Speed 2493.13 samples/sec Loss 4.9647 LearningRate 0.000871 Epoch: 6 Global Step: 132810 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:47:24,469-Speed 2496.26 samples/sec Loss 5.0203 LearningRate 0.000871 Epoch: 6 Global Step: 132820 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:47:32,670-Speed 2497.68 samples/sec Loss 4.9271 LearningRate 0.000871 Epoch: 6 Global Step: 132830 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:47:40,876-Speed 2496.17 samples/sec Loss 5.0891 LearningRate 0.000871 Epoch: 6 Global Step: 132840 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:47:49,027-Speed 2513.05 samples/sec Loss 4.9731 LearningRate 0.000871 Epoch: 6 Global Step: 132850 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:47:57,229-Speed 2497.23 samples/sec Loss 5.0111 LearningRate 0.000871 Epoch: 6 Global Step: 132860 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:48:05,433-Speed 2496.86 samples/sec Loss 5.0428 LearningRate 0.000871 Epoch: 6 Global Step: 132870 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:48:13,635-Speed 2497.38 samples/sec Loss 5.0386 LearningRate 0.000871 Epoch: 6 Global Step: 132880 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:48:21,835-Speed 2497.94 samples/sec Loss 4.9273 LearningRate 0.000871 Epoch: 6 Global Step: 132890 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:48:30,040-Speed 2496.47 samples/sec Loss 4.8545 LearningRate 0.000871 Epoch: 6 Global Step: 132900 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:48:38,190-Speed 2513.52 samples/sec Loss 4.9729 LearningRate 0.000871 Epoch: 6 Global Step: 132910 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:48:46,389-Speed 2498.32 samples/sec Loss 5.0421 LearningRate 0.000871 Epoch: 6 Global Step: 132920 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:48:54,589-Speed 2498.04 samples/sec Loss 4.9598 LearningRate 0.000871 Epoch: 6 Global Step: 132930 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:49:02,789-Speed 2497.96 samples/sec Loss 5.0038 LearningRate 0.000871 Epoch: 6 Global Step: 132940 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:49:10,990-Speed 2498.02 samples/sec Loss 5.0062 LearningRate 0.000871 Epoch: 6 Global Step: 132950 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:49:19,191-Speed 2497.60 samples/sec Loss 4.9914 LearningRate 0.000871 Epoch: 6 Global Step: 132960 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:49:27,347-Speed 2511.41 samples/sec Loss 5.0365 LearningRate 0.000871 Epoch: 6 Global Step: 132970 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:49:35,546-Speed 2498.48 samples/sec Loss 5.0003 LearningRate 0.000870 Epoch: 6 Global Step: 132980 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:49:43,749-Speed 2496.99 samples/sec Loss 5.0212 LearningRate 0.000870 Epoch: 6 Global Step: 132990 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:49:51,953-Speed 2496.90 samples/sec Loss 4.9737 LearningRate 0.000870 Epoch: 6 Global Step: 133000 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:50:00,152-Speed 2498.18 samples/sec Loss 4.9845 LearningRate 0.000870 Epoch: 6 Global Step: 133010 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:50:08,353-Speed 2497.87 samples/sec Loss 4.9797 LearningRate 0.000870 Epoch: 6 Global Step: 133020 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:50:16,501-Speed 2514.05 samples/sec Loss 5.0516 LearningRate 0.000870 Epoch: 6 Global Step: 133030 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:50:24,700-Speed 2498.27 samples/sec Loss 4.9843 LearningRate 0.000870 Epoch: 6 Global Step: 133040 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:50:32,902-Speed 2497.26 samples/sec Loss 4.9890 LearningRate 0.000870 Epoch: 6 Global Step: 133050 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:50:41,105-Speed 2497.18 samples/sec Loss 4.9966 LearningRate 0.000870 Epoch: 6 Global Step: 133060 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:50:49,317-Speed 2494.41 samples/sec Loss 5.0182 LearningRate 0.000870 Epoch: 6 Global Step: 133070 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:50:57,521-Speed 2496.65 samples/sec Loss 4.9519 LearningRate 0.000870 Epoch: 6 Global Step: 133080 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:51:05,672-Speed 2513.08 samples/sec Loss 4.9345 LearningRate 0.000870 Epoch: 6 Global Step: 133090 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:51:13,872-Speed 2497.96 samples/sec Loss 5.0142 LearningRate 0.000870 Epoch: 6 Global Step: 133100 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:51:22,078-Speed 2496.02 samples/sec Loss 4.9910 LearningRate 0.000870 Epoch: 6 Global Step: 133110 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:51:30,284-Speed 2496.07 samples/sec Loss 4.9799 LearningRate 0.000870 Epoch: 6 Global Step: 133120 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:51:38,486-Speed 2497.40 samples/sec Loss 4.9447 LearningRate 0.000870 Epoch: 6 Global Step: 133130 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:51:46,689-Speed 2497.26 samples/sec Loss 4.9274 LearningRate 0.000870 Epoch: 6 Global Step: 133140 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:51:54,836-Speed 2514.06 samples/sec Loss 4.9939 LearningRate 0.000870 Epoch: 6 Global Step: 133150 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:52:03,038-Speed 2497.36 samples/sec Loss 4.9615 LearningRate 0.000870 Epoch: 6 Global Step: 133160 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:52:11,249-Speed 2494.69 samples/sec Loss 4.9552 LearningRate 0.000870 Epoch: 6 Global Step: 133170 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:52:19,451-Speed 2497.45 samples/sec Loss 4.9240 LearningRate 0.000870 Epoch: 6 Global Step: 133180 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:52:27,654-Speed 2496.89 samples/sec Loss 5.0361 LearningRate 0.000870 Epoch: 6 Global Step: 133190 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:52:35,858-Speed 2497.18 samples/sec Loss 5.0251 LearningRate 0.000870 Epoch: 6 Global Step: 133200 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:52:44,008-Speed 2513.65 samples/sec Loss 5.0136 LearningRate 0.000870 Epoch: 6 Global Step: 133210 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:52:52,207-Speed 2497.99 samples/sec Loss 4.9487 LearningRate 0.000870 Epoch: 6 Global Step: 133220 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:53:00,406-Speed 2498.44 samples/sec Loss 4.9945 LearningRate 0.000870 Epoch: 6 Global Step: 133230 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:53:08,605-Speed 2498.16 samples/sec Loss 4.9574 LearningRate 0.000870 Epoch: 6 Global Step: 133240 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:53:16,818-Speed 2494.20 samples/sec Loss 4.9618 LearningRate 0.000870 Epoch: 6 Global Step: 133250 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:53:25,023-Speed 2496.65 samples/sec Loss 4.8572 LearningRate 0.000870 Epoch: 6 Global Step: 133260 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:53:33,169-Speed 2514.57 samples/sec Loss 5.0036 LearningRate 0.000870 Epoch: 6 Global Step: 133270 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:53:41,372-Speed 2497.07 samples/sec Loss 4.9155 LearningRate 0.000870 Epoch: 6 Global Step: 133280 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:53:49,573-Speed 2497.56 samples/sec Loss 4.9324 LearningRate 0.000870 Epoch: 6 Global Step: 133290 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:53:57,780-Speed 2496.02 samples/sec Loss 4.9391 LearningRate 0.000870 Epoch: 6 Global Step: 133300 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:54:05,983-Speed 2496.86 samples/sec Loss 4.9731 LearningRate 0.000870 Epoch: 6 Global Step: 133310 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:54:14,186-Speed 2496.90 samples/sec Loss 5.0159 LearningRate 0.000870 Epoch: 6 Global Step: 133320 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:54:22,334-Speed 2513.92 samples/sec Loss 4.9664 LearningRate 0.000870 Epoch: 6 Global Step: 133330 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:54:30,537-Speed 2497.25 samples/sec Loss 4.9856 LearningRate 0.000870 Epoch: 6 Global Step: 133340 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:54:38,742-Speed 2496.29 samples/sec Loss 5.0340 LearningRate 0.000870 Epoch: 6 Global Step: 133350 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:54:46,944-Speed 2497.71 samples/sec Loss 4.9511 LearningRate 0.000870 Epoch: 6 Global Step: 133360 Fp16 Grad Scale: 65536 Required: 159 hours
Training: 2022-07-06 20:54:55,117-Speed 2506.17 samples/sec Loss 4.9782 LearningRate 0.000870 Epoch: 6 Global Step: 133370 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:55:03,318-Speed 2497.69 samples/sec Loss 4.9705 LearningRate 0.000869 Epoch: 6 Global Step: 133380 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:55:11,467-Speed 2513.46 samples/sec Loss 5.0742 LearningRate 0.000869 Epoch: 6 Global Step: 133390 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:55:19,669-Speed 2497.39 samples/sec Loss 4.9871 LearningRate 0.000869 Epoch: 6 Global Step: 133400 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:55:27,870-Speed 2497.74 samples/sec Loss 4.9296 LearningRate 0.000869 Epoch: 6 Global Step: 133410 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:55:36,071-Speed 2497.40 samples/sec Loss 4.9094 LearningRate 0.000869 Epoch: 6 Global Step: 133420 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:55:44,270-Speed 2498.26 samples/sec Loss 4.9581 LearningRate 0.000869 Epoch: 6 Global Step: 133430 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:55:52,483-Speed 2494.07 samples/sec Loss 4.9198 LearningRate 0.000869 Epoch: 6 Global Step: 133440 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:56:00,630-Speed 2514.19 samples/sec Loss 4.9539 LearningRate 0.000869 Epoch: 6 Global Step: 133450 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:56:08,829-Speed 2498.35 samples/sec Loss 5.0107 LearningRate 0.000869 Epoch: 6 Global Step: 133460 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:56:17,032-Speed 2497.00 samples/sec Loss 4.9319 LearningRate 0.000869 Epoch: 6 Global Step: 133470 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:56:25,236-Speed 2496.98 samples/sec Loss 4.9177 LearningRate 0.000869 Epoch: 6 Global Step: 133480 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:56:33,443-Speed 2495.69 samples/sec Loss 5.0090 LearningRate 0.000869 Epoch: 6 Global Step: 133490 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:56:41,645-Speed 2497.29 samples/sec Loss 4.9515 LearningRate 0.000869 Epoch: 6 Global Step: 133500 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:56:49,794-Speed 2513.76 samples/sec Loss 5.0150 LearningRate 0.000869 Epoch: 6 Global Step: 133510 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:56:57,996-Speed 2497.33 samples/sec Loss 4.9730 LearningRate 0.000869 Epoch: 6 Global Step: 133520 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:57:06,197-Speed 2497.85 samples/sec Loss 5.0640 LearningRate 0.000869 Epoch: 6 Global Step: 133530 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:57:14,395-Speed 2498.66 samples/sec Loss 4.9704 LearningRate 0.000869 Epoch: 6 Global Step: 133540 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:57:22,597-Speed 2497.27 samples/sec Loss 4.9662 LearningRate 0.000869 Epoch: 6 Global Step: 133550 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:57:30,797-Speed 2498.10 samples/sec Loss 5.0195 LearningRate 0.000869 Epoch: 6 Global Step: 133560 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:57:38,945-Speed 2513.77 samples/sec Loss 4.9062 LearningRate 0.000869 Epoch: 6 Global Step: 133570 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:57:47,149-Speed 2496.94 samples/sec Loss 4.9947 LearningRate 0.000869 Epoch: 6 Global Step: 133580 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:57:55,346-Speed 2498.63 samples/sec Loss 5.0628 LearningRate 0.000869 Epoch: 6 Global Step: 133590 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:58:03,545-Speed 2498.43 samples/sec Loss 5.0068 LearningRate 0.000869 Epoch: 6 Global Step: 133600 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:58:11,744-Speed 2498.24 samples/sec Loss 5.0845 LearningRate 0.000869 Epoch: 6 Global Step: 133610 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:58:19,946-Speed 2497.36 samples/sec Loss 4.9937 LearningRate 0.000869 Epoch: 6 Global Step: 133620 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:58:28,094-Speed 2513.72 samples/sec Loss 4.9848 LearningRate 0.000869 Epoch: 6 Global Step: 133630 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:58:36,294-Speed 2498.08 samples/sec Loss 5.0343 LearningRate 0.000869 Epoch: 6 Global Step: 133640 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:58:44,494-Speed 2498.16 samples/sec Loss 4.9568 LearningRate 0.000869 Epoch: 6 Global Step: 133650 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:58:52,693-Speed 2498.27 samples/sec Loss 4.9261 LearningRate 0.000869 Epoch: 6 Global Step: 133660 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:59:00,891-Speed 2498.44 samples/sec Loss 4.9726 LearningRate 0.000869 Epoch: 6 Global Step: 133670 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:59:09,091-Speed 2498.08 samples/sec Loss 5.0075 LearningRate 0.000869 Epoch: 6 Global Step: 133680 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:59:17,234-Speed 2515.49 samples/sec Loss 4.9949 LearningRate 0.000869 Epoch: 6 Global Step: 133690 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:59:25,432-Speed 2498.56 samples/sec Loss 5.0061 LearningRate 0.000869 Epoch: 6 Global Step: 133700 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:59:33,630-Speed 2499.37 samples/sec Loss 4.9777 LearningRate 0.000869 Epoch: 6 Global Step: 133710 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:59:41,832-Speed 2497.48 samples/sec Loss 5.0145 LearningRate 0.000869 Epoch: 6 Global Step: 133720 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:59:50,031-Speed 2497.97 samples/sec Loss 5.0166 LearningRate 0.000869 Epoch: 6 Global Step: 133730 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 20:59:58,234-Speed 2497.05 samples/sec Loss 5.0107 LearningRate 0.000869 Epoch: 6 Global Step: 133740 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:00:06,386-Speed 2512.71 samples/sec Loss 4.9919 LearningRate 0.000869 Epoch: 6 Global Step: 133750 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:00:14,585-Speed 2498.60 samples/sec Loss 5.0249 LearningRate 0.000869 Epoch: 6 Global Step: 133760 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:00:22,786-Speed 2497.41 samples/sec Loss 4.9342 LearningRate 0.000869 Epoch: 6 Global Step: 133770 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:00:30,987-Speed 2497.92 samples/sec Loss 4.9279 LearningRate 0.000868 Epoch: 6 Global Step: 133780 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:00:39,187-Speed 2498.14 samples/sec Loss 4.9982 LearningRate 0.000868 Epoch: 6 Global Step: 133790 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:00:47,390-Speed 2497.00 samples/sec Loss 4.8982 LearningRate 0.000868 Epoch: 6 Global Step: 133800 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:00:55,537-Speed 2514.16 samples/sec Loss 5.0036 LearningRate 0.000868 Epoch: 6 Global Step: 133810 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:01:03,738-Speed 2497.96 samples/sec Loss 4.9061 LearningRate 0.000868 Epoch: 6 Global Step: 133820 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:01:11,939-Speed 2497.59 samples/sec Loss 4.9927 LearningRate 0.000868 Epoch: 6 Global Step: 133830 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:01:20,145-Speed 2496.42 samples/sec Loss 4.9844 LearningRate 0.000868 Epoch: 6 Global Step: 133840 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:01:28,346-Speed 2497.43 samples/sec Loss 4.9557 LearningRate 0.000868 Epoch: 6 Global Step: 133850 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:01:36,545-Speed 2498.60 samples/sec Loss 4.9801 LearningRate 0.000868 Epoch: 6 Global Step: 133860 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:01:44,695-Speed 2513.41 samples/sec Loss 4.9734 LearningRate 0.000868 Epoch: 6 Global Step: 133870 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:01:52,913-Speed 2492.46 samples/sec Loss 4.9486 LearningRate 0.000868 Epoch: 6 Global Step: 133880 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:02:01,128-Speed 2493.58 samples/sec Loss 4.9203 LearningRate 0.000868 Epoch: 6 Global Step: 133890 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:02:09,328-Speed 2498.05 samples/sec Loss 4.9009 LearningRate 0.000868 Epoch: 6 Global Step: 133900 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:02:17,526-Speed 2498.28 samples/sec Loss 4.9580 LearningRate 0.000868 Epoch: 6 Global Step: 133910 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:02:25,724-Speed 2498.61 samples/sec Loss 4.9685 LearningRate 0.000868 Epoch: 6 Global Step: 133920 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:02:33,871-Speed 2514.53 samples/sec Loss 4.9034 LearningRate 0.000868 Epoch: 6 Global Step: 133930 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:02:42,072-Speed 2497.80 samples/sec Loss 4.9773 LearningRate 0.000868 Epoch: 6 Global Step: 133940 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:02:50,269-Speed 2498.84 samples/sec Loss 5.0003 LearningRate 0.000868 Epoch: 6 Global Step: 133950 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:02:58,483-Speed 2493.71 samples/sec Loss 4.9059 LearningRate 0.000868 Epoch: 6 Global Step: 133960 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:03:06,688-Speed 2496.38 samples/sec Loss 4.9672 LearningRate 0.000868 Epoch: 6 Global Step: 133970 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:03:14,900-Speed 2494.72 samples/sec Loss 4.9871 LearningRate 0.000868 Epoch: 6 Global Step: 133980 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:03:23,048-Speed 2513.87 samples/sec Loss 4.9682 LearningRate 0.000868 Epoch: 6 Global Step: 133990 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:03:31,245-Speed 2498.82 samples/sec Loss 4.9651 LearningRate 0.000868 Epoch: 6 Global Step: 134000 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:03:39,443-Speed 2498.51 samples/sec Loss 4.8807 LearningRate 0.000868 Epoch: 6 Global Step: 134010 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:03:47,643-Speed 2498.17 samples/sec Loss 4.9739 LearningRate 0.000868 Epoch: 6 Global Step: 134020 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:03:55,840-Speed 2498.49 samples/sec Loss 4.9477 LearningRate 0.000868 Epoch: 6 Global Step: 134030 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:04:04,044-Speed 2497.06 samples/sec Loss 4.9059 LearningRate 0.000868 Epoch: 6 Global Step: 134040 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:04:12,190-Speed 2514.25 samples/sec Loss 4.9838 LearningRate 0.000868 Epoch: 6 Global Step: 134050 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:04:20,389-Speed 2498.53 samples/sec Loss 4.9306 LearningRate 0.000868 Epoch: 6 Global Step: 134060 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:04:28,591-Speed 2497.41 samples/sec Loss 4.8793 LearningRate 0.000868 Epoch: 6 Global Step: 134070 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:04:36,791-Speed 2497.94 samples/sec Loss 4.9281 LearningRate 0.000868 Epoch: 6 Global Step: 134080 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:04:44,987-Speed 2499.08 samples/sec Loss 5.0034 LearningRate 0.000868 Epoch: 6 Global Step: 134090 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:04:53,188-Speed 2497.84 samples/sec Loss 4.8532 LearningRate 0.000868 Epoch: 6 Global Step: 134100 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:05:01,334-Speed 2514.62 samples/sec Loss 4.9972 LearningRate 0.000868 Epoch: 6 Global Step: 134110 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:05:09,534-Speed 2497.88 samples/sec Loss 5.0562 LearningRate 0.000868 Epoch: 6 Global Step: 134120 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:05:17,747-Speed 2494.23 samples/sec Loss 5.0144 LearningRate 0.000868 Epoch: 6 Global Step: 134130 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:05:25,973-Speed 2490.06 samples/sec Loss 4.9385 LearningRate 0.000868 Epoch: 6 Global Step: 134140 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:05:34,176-Speed 2496.89 samples/sec Loss 5.0893 LearningRate 0.000868 Epoch: 6 Global Step: 134150 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:05:42,378-Speed 2497.29 samples/sec Loss 5.0826 LearningRate 0.000868 Epoch: 6 Global Step: 134160 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:05:50,527-Speed 2513.79 samples/sec Loss 5.0208 LearningRate 0.000868 Epoch: 6 Global Step: 134170 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:05:58,736-Speed 2495.10 samples/sec Loss 5.0039 LearningRate 0.000867 Epoch: 6 Global Step: 134180 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:06:06,940-Speed 2496.94 samples/sec Loss 4.9880 LearningRate 0.000867 Epoch: 6 Global Step: 134190 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:06:15,143-Speed 2496.92 samples/sec Loss 4.9840 LearningRate 0.000867 Epoch: 6 Global Step: 134200 Fp16 Grad Scale: 32768 Required: 159 hours
Training: 2022-07-06 21:06:23,303-Speed 2510.15 samples/sec Loss 4.9512 LearningRate 0.000867 Epoch: 6 Global Step: 134210 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:06:31,505-Speed 2497.40 samples/sec Loss 5.0376 LearningRate 0.000867 Epoch: 6 Global Step: 134220 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:06:39,652-Speed 2514.22 samples/sec Loss 4.9803 LearningRate 0.000867 Epoch: 6 Global Step: 134230 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:06:47,851-Speed 2498.51 samples/sec Loss 4.9912 LearningRate 0.000867 Epoch: 6 Global Step: 134240 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:06:56,058-Speed 2495.72 samples/sec Loss 4.9427 LearningRate 0.000867 Epoch: 6 Global Step: 134250 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:07:04,261-Speed 2497.16 samples/sec Loss 4.9835 LearningRate 0.000867 Epoch: 6 Global Step: 134260 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:07:12,460-Speed 2498.12 samples/sec Loss 5.0468 LearningRate 0.000867 Epoch: 6 Global Step: 134270 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:07:20,674-Speed 2493.83 samples/sec Loss 4.9769 LearningRate 0.000867 Epoch: 6 Global Step: 134280 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:07:28,824-Speed 2513.25 samples/sec Loss 4.9001 LearningRate 0.000867 Epoch: 6 Global Step: 134290 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:07:37,024-Speed 2497.68 samples/sec Loss 5.0531 LearningRate 0.000867 Epoch: 6 Global Step: 134300 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:07:45,227-Speed 2497.16 samples/sec Loss 5.0317 LearningRate 0.000867 Epoch: 6 Global Step: 134310 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:07:53,423-Speed 2499.47 samples/sec Loss 4.9905 LearningRate 0.000867 Epoch: 6 Global Step: 134320 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:08:01,627-Speed 2496.55 samples/sec Loss 5.0225 LearningRate 0.000867 Epoch: 6 Global Step: 134330 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:08:09,829-Speed 2497.47 samples/sec Loss 5.0041 LearningRate 0.000867 Epoch: 6 Global Step: 134340 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:08:17,975-Speed 2514.67 samples/sec Loss 4.9677 LearningRate 0.000867 Epoch: 6 Global Step: 134350 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:08:26,187-Speed 2494.12 samples/sec Loss 4.9618 LearningRate 0.000867 Epoch: 6 Global Step: 134360 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:08:34,391-Speed 2496.75 samples/sec Loss 4.9817 LearningRate 0.000867 Epoch: 6 Global Step: 134370 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:08:42,590-Speed 2498.45 samples/sec Loss 4.9600 LearningRate 0.000867 Epoch: 6 Global Step: 134380 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:08:50,794-Speed 2496.90 samples/sec Loss 4.9843 LearningRate 0.000867 Epoch: 6 Global Step: 134390 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:08:58,993-Speed 2498.00 samples/sec Loss 4.9164 LearningRate 0.000867 Epoch: 6 Global Step: 134400 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:09:07,144-Speed 2513.09 samples/sec Loss 5.0097 LearningRate 0.000867 Epoch: 6 Global Step: 134410 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:09:15,345-Speed 2497.85 samples/sec Loss 4.9948 LearningRate 0.000867 Epoch: 6 Global Step: 134420 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:09:23,557-Speed 2494.25 samples/sec Loss 4.8816 LearningRate 0.000867 Epoch: 6 Global Step: 134430 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:09:31,759-Speed 2497.60 samples/sec Loss 4.9668 LearningRate 0.000867 Epoch: 6 Global Step: 134440 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:09:39,961-Speed 2497.25 samples/sec Loss 4.8855 LearningRate 0.000867 Epoch: 6 Global Step: 134450 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:09:48,165-Speed 2496.95 samples/sec Loss 5.0017 LearningRate 0.000867 Epoch: 6 Global Step: 134460 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:09:56,316-Speed 2512.80 samples/sec Loss 4.9041 LearningRate 0.000867 Epoch: 6 Global Step: 134470 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:10:04,517-Speed 2497.82 samples/sec Loss 4.8909 LearningRate 0.000867 Epoch: 6 Global Step: 134480 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:10:12,719-Speed 2497.37 samples/sec Loss 4.9171 LearningRate 0.000867 Epoch: 6 Global Step: 134490 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:10:20,916-Speed 2498.93 samples/sec Loss 4.8864 LearningRate 0.000867 Epoch: 6 Global Step: 134500 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:10:29,117-Speed 2497.87 samples/sec Loss 4.8845 LearningRate 0.000867 Epoch: 6 Global Step: 134510 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:10:37,320-Speed 2496.79 samples/sec Loss 4.8790 LearningRate 0.000867 Epoch: 6 Global Step: 134520 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:10:45,468-Speed 2513.80 samples/sec Loss 4.9966 LearningRate 0.000867 Epoch: 6 Global Step: 134530 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:10:53,672-Speed 2497.07 samples/sec Loss 4.9322 LearningRate 0.000867 Epoch: 6 Global Step: 134540 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:11:01,874-Speed 2497.22 samples/sec Loss 4.9947 LearningRate 0.000867 Epoch: 6 Global Step: 134550 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:11:10,076-Speed 2497.32 samples/sec Loss 4.9516 LearningRate 0.000867 Epoch: 6 Global Step: 134560 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:11:18,281-Speed 2496.49 samples/sec Loss 4.9425 LearningRate 0.000867 Epoch: 6 Global Step: 134570 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:11:26,485-Speed 2497.01 samples/sec Loss 4.9576 LearningRate 0.000866 Epoch: 6 Global Step: 134580 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:11:34,634-Speed 2513.22 samples/sec Loss 4.9698 LearningRate 0.000866 Epoch: 6 Global Step: 134590 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:11:42,836-Speed 2497.65 samples/sec Loss 4.9080 LearningRate 0.000866 Epoch: 6 Global Step: 134600 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:11:51,043-Speed 2496.18 samples/sec Loss 4.8845 LearningRate 0.000866 Epoch: 6 Global Step: 134610 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:11:59,245-Speed 2497.68 samples/sec Loss 4.8255 LearningRate 0.000866 Epoch: 6 Global Step: 134620 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:12:07,449-Speed 2496.75 samples/sec Loss 4.9529 LearningRate 0.000866 Epoch: 6 Global Step: 134630 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:12:15,655-Speed 2496.04 samples/sec Loss 4.8749 LearningRate 0.000866 Epoch: 6 Global Step: 134640 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:12:23,802-Speed 2514.34 samples/sec Loss 4.9599 LearningRate 0.000866 Epoch: 6 Global Step: 134650 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:12:32,005-Speed 2496.79 samples/sec Loss 4.8948 LearningRate 0.000866 Epoch: 6 Global Step: 134660 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:12:40,209-Speed 2496.92 samples/sec Loss 4.8824 LearningRate 0.000866 Epoch: 6 Global Step: 134670 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:12:48,409-Speed 2497.68 samples/sec Loss 4.9095 LearningRate 0.000866 Epoch: 6 Global Step: 134680 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:12:56,609-Speed 2498.06 samples/sec Loss 4.9386 LearningRate 0.000866 Epoch: 6 Global Step: 134690 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:13:04,809-Speed 2497.97 samples/sec Loss 4.8504 LearningRate 0.000866 Epoch: 6 Global Step: 134700 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:13:12,957-Speed 2514.05 samples/sec Loss 4.9289 LearningRate 0.000866 Epoch: 6 Global Step: 134710 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:13:21,158-Speed 2497.65 samples/sec Loss 4.9095 LearningRate 0.000866 Epoch: 6 Global Step: 134720 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:13:29,360-Speed 2497.48 samples/sec Loss 4.8787 LearningRate 0.000866 Epoch: 6 Global Step: 134730 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:13:37,560-Speed 2498.06 samples/sec Loss 4.8416 LearningRate 0.000866 Epoch: 6 Global Step: 134740 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:13:45,762-Speed 2497.45 samples/sec Loss 4.8117 LearningRate 0.000866 Epoch: 6 Global Step: 134750 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:13:53,965-Speed 2496.94 samples/sec Loss 4.8525 LearningRate 0.000866 Epoch: 6 Global Step: 134760 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:14:02,112-Speed 2514.13 samples/sec Loss 4.9229 LearningRate 0.000866 Epoch: 6 Global Step: 134770 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:14:10,314-Speed 2497.54 samples/sec Loss 4.8185 LearningRate 0.000866 Epoch: 6 Global Step: 134780 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:14:18,521-Speed 2495.84 samples/sec Loss 4.9817 LearningRate 0.000866 Epoch: 6 Global Step: 134790 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:14:26,723-Speed 2498.04 samples/sec Loss 4.9720 LearningRate 0.000866 Epoch: 6 Global Step: 134800 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:14:34,928-Speed 2496.52 samples/sec Loss 4.9388 LearningRate 0.000866 Epoch: 6 Global Step: 134810 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:14:43,129-Speed 2497.83 samples/sec Loss 4.8547 LearningRate 0.000866 Epoch: 6 Global Step: 134820 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:14:51,277-Speed 2513.99 samples/sec Loss 4.9438 LearningRate 0.000866 Epoch: 6 Global Step: 134830 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:14:59,478-Speed 2497.45 samples/sec Loss 4.9717 LearningRate 0.000866 Epoch: 6 Global Step: 134840 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:15:07,678-Speed 2498.19 samples/sec Loss 4.9353 LearningRate 0.000866 Epoch: 6 Global Step: 134850 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:15:15,880-Speed 2497.38 samples/sec Loss 4.9069 LearningRate 0.000866 Epoch: 6 Global Step: 134860 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:15:24,082-Speed 2497.33 samples/sec Loss 4.9686 LearningRate 0.000866 Epoch: 6 Global Step: 134870 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:15:32,285-Speed 2497.04 samples/sec Loss 4.9212 LearningRate 0.000866 Epoch: 6 Global Step: 134880 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:15:40,434-Speed 2513.73 samples/sec Loss 4.9108 LearningRate 0.000866 Epoch: 6 Global Step: 134890 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:15:48,634-Speed 2497.86 samples/sec Loss 4.9630 LearningRate 0.000866 Epoch: 6 Global Step: 134900 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:15:56,847-Speed 2494.06 samples/sec Loss 4.9627 LearningRate 0.000866 Epoch: 6 Global Step: 134910 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:16:05,047-Speed 2497.94 samples/sec Loss 4.8783 LearningRate 0.000866 Epoch: 6 Global Step: 134920 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:16:13,248-Speed 2497.63 samples/sec Loss 4.8655 LearningRate 0.000866 Epoch: 6 Global Step: 134930 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:16:21,449-Speed 2497.60 samples/sec Loss 4.9034 LearningRate 0.000866 Epoch: 6 Global Step: 134940 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:16:29,595-Speed 2514.44 samples/sec Loss 4.9040 LearningRate 0.000866 Epoch: 6 Global Step: 134950 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:16:37,801-Speed 2496.17 samples/sec Loss 4.9018 LearningRate 0.000866 Epoch: 6 Global Step: 134960 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:16:46,004-Speed 2497.25 samples/sec Loss 4.8816 LearningRate 0.000866 Epoch: 6 Global Step: 134970 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:16:54,203-Speed 2498.12 samples/sec Loss 4.9004 LearningRate 0.000865 Epoch: 6 Global Step: 134980 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:17:02,405-Speed 2497.55 samples/sec Loss 4.8462 LearningRate 0.000865 Epoch: 6 Global Step: 134990 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:17:10,609-Speed 2496.63 samples/sec Loss 4.9810 LearningRate 0.000865 Epoch: 6 Global Step: 135000 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:17:18,760-Speed 2513.06 samples/sec Loss 4.9445 LearningRate 0.000865 Epoch: 6 Global Step: 135010 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:17:26,972-Speed 2494.23 samples/sec Loss 4.9231 LearningRate 0.000865 Epoch: 6 Global Step: 135020 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:17:35,174-Speed 2497.12 samples/sec Loss 4.9689 LearningRate 0.000865 Epoch: 6 Global Step: 135030 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:17:43,375-Speed 2497.80 samples/sec Loss 4.9299 LearningRate 0.000865 Epoch: 6 Global Step: 135040 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:17:51,575-Speed 2497.87 samples/sec Loss 4.9892 LearningRate 0.000865 Epoch: 6 Global Step: 135050 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:17:59,778-Speed 2496.99 samples/sec Loss 4.9277 LearningRate 0.000865 Epoch: 6 Global Step: 135060 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:18:07,933-Speed 2511.90 samples/sec Loss 4.8852 LearningRate 0.000865 Epoch: 6 Global Step: 135070 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:18:16,133-Speed 2498.09 samples/sec Loss 4.9715 LearningRate 0.000865 Epoch: 6 Global Step: 135080 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:18:24,337-Speed 2496.88 samples/sec Loss 4.9282 LearningRate 0.000865 Epoch: 6 Global Step: 135090 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:18:32,538-Speed 2497.72 samples/sec Loss 4.9269 LearningRate 0.000865 Epoch: 6 Global Step: 135100 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:18:40,742-Speed 2496.86 samples/sec Loss 4.8341 LearningRate 0.000865 Epoch: 6 Global Step: 135110 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:18:48,952-Speed 2494.84 samples/sec Loss 4.9420 LearningRate 0.000865 Epoch: 6 Global Step: 135120 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:18:57,099-Speed 2514.37 samples/sec Loss 4.9758 LearningRate 0.000865 Epoch: 6 Global Step: 135130 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:19:05,297-Speed 2498.61 samples/sec Loss 4.9152 LearningRate 0.000865 Epoch: 6 Global Step: 135140 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:19:13,497-Speed 2498.30 samples/sec Loss 4.8740 LearningRate 0.000865 Epoch: 6 Global Step: 135150 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:19:21,702-Speed 2496.80 samples/sec Loss 4.8715 LearningRate 0.000865 Epoch: 6 Global Step: 135160 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:19:29,900-Speed 2498.39 samples/sec Loss 4.9149 LearningRate 0.000865 Epoch: 6 Global Step: 135170 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:19:38,102-Speed 2497.41 samples/sec Loss 4.8879 LearningRate 0.000865 Epoch: 6 Global Step: 135180 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:19:46,250-Speed 2513.83 samples/sec Loss 5.0332 LearningRate 0.000865 Epoch: 6 Global Step: 135190 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:19:54,450-Speed 2497.87 samples/sec Loss 4.8947 LearningRate 0.000865 Epoch: 6 Global Step: 135200 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:20:02,651-Speed 2497.65 samples/sec Loss 4.9129 LearningRate 0.000865 Epoch: 6 Global Step: 135210 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:20:10,852-Speed 2497.67 samples/sec Loss 5.0014 LearningRate 0.000865 Epoch: 6 Global Step: 135220 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:20:19,057-Speed 2496.39 samples/sec Loss 5.0227 LearningRate 0.000865 Epoch: 6 Global Step: 135230 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:20:27,260-Speed 2497.44 samples/sec Loss 4.9391 LearningRate 0.000865 Epoch: 6 Global Step: 135240 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:20:35,405-Speed 2514.72 samples/sec Loss 4.9281 LearningRate 0.000865 Epoch: 6 Global Step: 135250 Fp16 Grad Scale: 16384 Required: 159 hours
Training: 2022-07-06 21:20:43,606-Speed 2497.62 samples/sec Loss 4.9148
LearningRate 0.000865 Epoch: 6 Global Step: 135260 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:20:51,810-Speed 2497.03 samples/sec Loss 4.8384 LearningRate 0.000865 Epoch: 6 Global Step: 135270 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:21:00,013-Speed 2496.95 samples/sec Loss 4.9094 LearningRate 0.000865 Epoch: 6 Global Step: 135280 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:21:08,215-Speed 2497.44 samples/sec Loss 4.9213 LearningRate 0.000865 Epoch: 6 Global Step: 135290 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:21:16,414-Speed 2498.02 samples/sec Loss 4.8744 LearningRate 0.000865 Epoch: 6 Global Step: 135300 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:21:24,565-Speed 2513.11 samples/sec Loss 4.9254 LearningRate 0.000865 Epoch: 6 Global Step: 135310 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:21:32,765-Speed 2497.99 samples/sec Loss 5.0028 LearningRate 0.000865 Epoch: 6 Global Step: 135320 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:21:40,965-Speed 2498.13 samples/sec Loss 4.9789 LearningRate 0.000865 Epoch: 6 Global Step: 135330 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:21:49,165-Speed 2498.03 samples/sec Loss 4.8535 LearningRate 0.000865 Epoch: 6 Global Step: 135340 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:21:57,364-Speed 2498.19 samples/sec Loss 4.8647 LearningRate 0.000865 Epoch: 6 Global Step: 135350 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:22:05,566-Speed 2497.30 samples/sec Loss 4.9479 LearningRate 0.000865 Epoch: 6 Global Step: 135360 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:22:13,714-Speed 2514.04 samples/sec Loss 4.9091 LearningRate 0.000865 Epoch: 6 Global Step: 135370 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:22:21,920-Speed 2496.18 samples/sec Loss 4.9615 
LearningRate 0.000864 Epoch: 6 Global Step: 135380 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:22:30,126-Speed 2496.14 samples/sec Loss 4.9164 LearningRate 0.000864 Epoch: 6 Global Step: 135390 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:22:38,328-Speed 2497.39 samples/sec Loss 4.9388 LearningRate 0.000864 Epoch: 6 Global Step: 135400 Fp16 Grad Scale: 16384 Required: 159 hours Training: 2022-07-06 21:22:46,530-Speed 2497.44 samples/sec Loss 4.9466 LearningRate 0.000864 Epoch: 6 Global Step: 135410 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:22:54,731-Speed 2497.70 samples/sec Loss 4.9770 LearningRate 0.000864 Epoch: 6 Global Step: 135420 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:23:02,878-Speed 2513.97 samples/sec Loss 4.9206 LearningRate 0.000864 Epoch: 6 Global Step: 135430 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:23:11,091-Speed 2494.12 samples/sec Loss 4.8963 LearningRate 0.000864 Epoch: 6 Global Step: 135440 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:23:19,290-Speed 2498.10 samples/sec Loss 4.8941 LearningRate 0.000864 Epoch: 6 Global Step: 135450 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:23:27,493-Speed 2497.19 samples/sec Loss 4.8932 LearningRate 0.000864 Epoch: 6 Global Step: 135460 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:23:35,696-Speed 2496.98 samples/sec Loss 4.9418 LearningRate 0.000864 Epoch: 6 Global Step: 135470 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:23:43,898-Speed 2497.35 samples/sec Loss 4.8712 LearningRate 0.000864 Epoch: 6 Global Step: 135480 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:23:52,051-Speed 2512.30 samples/sec Loss 4.9194 LearningRate 0.000864 Epoch: 6 Global Step: 135490 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:24:00,250-Speed 2498.45 samples/sec Loss 4.8263 
LearningRate 0.000864 Epoch: 6 Global Step: 135500 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:24:08,453-Speed 2497.09 samples/sec Loss 4.8460 LearningRate 0.000864 Epoch: 6 Global Step: 135510 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:24:16,670-Speed 2492.90 samples/sec Loss 4.8519 LearningRate 0.000864 Epoch: 6 Global Step: 135520 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:24:24,874-Speed 2496.95 samples/sec Loss 4.8256 LearningRate 0.000864 Epoch: 6 Global Step: 135530 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:24:33,073-Speed 2498.16 samples/sec Loss 4.9148 LearningRate 0.000864 Epoch: 6 Global Step: 135540 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:24:41,221-Speed 2514.10 samples/sec Loss 4.9278 LearningRate 0.000864 Epoch: 6 Global Step: 135550 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:24:49,424-Speed 2497.25 samples/sec Loss 4.8810 LearningRate 0.000864 Epoch: 6 Global Step: 135560 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:24:57,626-Speed 2497.43 samples/sec Loss 4.9170 LearningRate 0.000864 Epoch: 6 Global Step: 135570 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:25:05,826-Speed 2498.07 samples/sec Loss 4.8151 LearningRate 0.000864 Epoch: 6 Global Step: 135580 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:25:14,027-Speed 2497.83 samples/sec Loss 4.9514 LearningRate 0.000864 Epoch: 6 Global Step: 135590 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:25:22,237-Speed 2494.86 samples/sec Loss 4.8300 LearningRate 0.000864 Epoch: 6 Global Step: 135600 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:25:30,388-Speed 2513.03 samples/sec Loss 4.8427 LearningRate 0.000864 Epoch: 6 Global Step: 135610 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:25:38,590-Speed 2497.48 samples/sec Loss 4.7885 
LearningRate 0.000864 Epoch: 6 Global Step: 135620 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:25:46,792-Speed 2497.34 samples/sec Loss 4.8671 LearningRate 0.000864 Epoch: 6 Global Step: 135630 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:25:54,993-Speed 2497.56 samples/sec Loss 4.8434 LearningRate 0.000864 Epoch: 6 Global Step: 135640 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:26:03,200-Speed 2495.93 samples/sec Loss 4.7814 LearningRate 0.000864 Epoch: 6 Global Step: 135650 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:26:11,408-Speed 2495.45 samples/sec Loss 4.7634 LearningRate 0.000864 Epoch: 6 Global Step: 135660 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:26:19,559-Speed 2513.44 samples/sec Loss 4.8933 LearningRate 0.000864 Epoch: 6 Global Step: 135670 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:26:27,761-Speed 2497.09 samples/sec Loss 4.8491 LearningRate 0.000864 Epoch: 6 Global Step: 135680 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:26:35,970-Speed 2495.53 samples/sec Loss 4.8504 LearningRate 0.000864 Epoch: 6 Global Step: 135690 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:26:44,177-Speed 2495.86 samples/sec Loss 4.8978 LearningRate 0.000864 Epoch: 6 Global Step: 135700 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:26:52,382-Speed 2496.06 samples/sec Loss 4.9389 LearningRate 0.000864 Epoch: 6 Global Step: 135710 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:27:00,586-Speed 2496.96 samples/sec Loss 4.8586 LearningRate 0.000864 Epoch: 6 Global Step: 135720 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:27:08,736-Speed 2513.28 samples/sec Loss 4.9242 LearningRate 0.000864 Epoch: 6 Global Step: 135730 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:27:16,938-Speed 2497.22 samples/sec Loss 4.9388 
LearningRate 0.000864 Epoch: 6 Global Step: 135740 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:27:25,142-Speed 2496.57 samples/sec Loss 4.8275 LearningRate 0.000864 Epoch: 6 Global Step: 135750 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:27:33,350-Speed 2495.88 samples/sec Loss 4.8867 LearningRate 0.000864 Epoch: 6 Global Step: 135760 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:27:41,564-Speed 2493.40 samples/sec Loss 4.8321 LearningRate 0.000864 Epoch: 6 Global Step: 135770 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:27:49,768-Speed 2496.79 samples/sec Loss 4.8798 LearningRate 0.000864 Epoch: 6 Global Step: 135780 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:27:57,924-Speed 2511.71 samples/sec Loss 4.8306 LearningRate 0.000863 Epoch: 6 Global Step: 135790 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:28:06,127-Speed 2497.22 samples/sec Loss 4.8674 LearningRate 0.000863 Epoch: 6 Global Step: 135800 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:28:14,334-Speed 2495.61 samples/sec Loss 4.8814 LearningRate 0.000863 Epoch: 6 Global Step: 135810 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:28:22,535-Speed 2497.79 samples/sec Loss 4.8984 LearningRate 0.000863 Epoch: 6 Global Step: 135820 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:28:30,734-Speed 2498.44 samples/sec Loss 4.8848 LearningRate 0.000863 Epoch: 6 Global Step: 135830 Fp16 Grad Scale: 32768 Required: 159 hours Training: 2022-07-06 21:28:38,939-Speed 2497.29 samples/sec Loss 4.9957 LearningRate 0.000863 Epoch: 6 Global Step: 135840 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:28:47,089-Speed 2513.24 samples/sec Loss 4.8686 LearningRate 0.000863 Epoch: 6 Global Step: 135850 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:28:55,290-Speed 2497.83 samples/sec Loss 4.9203 
LearningRate 0.000863 Epoch: 6 Global Step: 135860 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:29:03,494-Speed 2496.65 samples/sec Loss 4.8415 LearningRate 0.000863 Epoch: 6 Global Step: 135870 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:29:11,697-Speed 2497.18 samples/sec Loss 4.8342 LearningRate 0.000863 Epoch: 6 Global Step: 135880 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:29:19,901-Speed 2496.63 samples/sec Loss 4.9083 LearningRate 0.000863 Epoch: 6 Global Step: 135890 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:29:28,116-Speed 2493.61 samples/sec Loss 4.8880 LearningRate 0.000863 Epoch: 6 Global Step: 135900 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:29:36,260-Speed 2515.16 samples/sec Loss 4.8445 LearningRate 0.000863 Epoch: 6 Global Step: 135910 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:29:44,487-Speed 2489.79 samples/sec Loss 4.9424 LearningRate 0.000863 Epoch: 6 Global Step: 135920 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:29:52,684-Speed 2498.89 samples/sec Loss 5.0330 LearningRate 0.000863 Epoch: 6 Global Step: 135930 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:30:00,889-Speed 2496.25 samples/sec Loss 4.8999 LearningRate 0.000863 Epoch: 6 Global Step: 135940 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:30:09,097-Speed 2495.63 samples/sec Loss 4.8853 LearningRate 0.000863 Epoch: 6 Global Step: 135950 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:30:17,296-Speed 2498.34 samples/sec Loss 4.9374 LearningRate 0.000863 Epoch: 6 Global Step: 135960 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:30:25,443-Speed 2514.21 samples/sec Loss 4.9344 LearningRate 0.000863 Epoch: 6 Global Step: 135970 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:30:33,645-Speed 2497.14 samples/sec Loss 4.9560 
LearningRate 0.000863 Epoch: 6 Global Step: 135980 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:30:41,844-Speed 2498.39 samples/sec Loss 4.8606 LearningRate 0.000863 Epoch: 6 Global Step: 135990 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:30:50,046-Speed 2497.30 samples/sec Loss 4.8588 LearningRate 0.000863 Epoch: 6 Global Step: 136000 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:30:58,251-Speed 2496.45 samples/sec Loss 4.8423 LearningRate 0.000863 Epoch: 6 Global Step: 136010 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:31:06,456-Speed 2496.62 samples/sec Loss 4.8993 LearningRate 0.000863 Epoch: 6 Global Step: 136020 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:31:14,606-Speed 2513.50 samples/sec Loss 4.8115 LearningRate 0.000863 Epoch: 6 Global Step: 136030 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:31:22,807-Speed 2497.50 samples/sec Loss 4.8791 LearningRate 0.000863 Epoch: 6 Global Step: 136040 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:31:31,011-Speed 2497.05 samples/sec Loss 4.8577 LearningRate 0.000863 Epoch: 6 Global Step: 136050 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:31:39,213-Speed 2497.11 samples/sec Loss 4.9108 LearningRate 0.000863 Epoch: 6 Global Step: 136060 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:31:47,415-Speed 2497.26 samples/sec Loss 4.8801 LearningRate 0.000863 Epoch: 6 Global Step: 136070 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:31:55,622-Speed 2496.08 samples/sec Loss 5.0251 LearningRate 0.000863 Epoch: 6 Global Step: 136080 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:32:03,772-Speed 2513.26 samples/sec Loss 4.8097 LearningRate 0.000863 Epoch: 6 Global Step: 136090 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:32:11,984-Speed 2494.26 samples/sec Loss 4.8948 
LearningRate 0.000863 Epoch: 6 Global Step: 136100 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:32:20,184-Speed 2498.01 samples/sec Loss 4.8338 LearningRate 0.000863 Epoch: 6 Global Step: 136110 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:32:28,384-Speed 2498.00 samples/sec Loss 4.9973 LearningRate 0.000863 Epoch: 6 Global Step: 136120 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:32:36,586-Speed 2497.75 samples/sec Loss 5.0048 LearningRate 0.000863 Epoch: 6 Global Step: 136130 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:32:44,801-Speed 2493.26 samples/sec Loss 4.8558 LearningRate 0.000863 Epoch: 6 Global Step: 136140 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:32:52,949-Speed 2513.85 samples/sec Loss 4.8885 LearningRate 0.000863 Epoch: 6 Global Step: 136150 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:33:01,151-Speed 2497.58 samples/sec Loss 4.8758 LearningRate 0.000863 Epoch: 6 Global Step: 136160 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:33:09,354-Speed 2497.22 samples/sec Loss 4.9514 LearningRate 0.000863 Epoch: 6 Global Step: 136170 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:33:17,554-Speed 2497.85 samples/sec Loss 4.8734 LearningRate 0.000863 Epoch: 6 Global Step: 136180 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:33:25,756-Speed 2497.34 samples/sec Loss 4.8786 LearningRate 0.000862 Epoch: 6 Global Step: 136190 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:33:33,959-Speed 2497.48 samples/sec Loss 4.9078 LearningRate 0.000862 Epoch: 6 Global Step: 136200 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:33:42,110-Speed 2512.74 samples/sec Loss 4.8994 LearningRate 0.000862 Epoch: 6 Global Step: 136210 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:33:50,316-Speed 2496.27 samples/sec Loss 4.9377 
LearningRate 0.000862 Epoch: 6 Global Step: 136220 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:33:58,517-Speed 2497.62 samples/sec Loss 4.9476 LearningRate 0.000862 Epoch: 6 Global Step: 136230 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:34:06,720-Speed 2497.52 samples/sec Loss 4.9806 LearningRate 0.000862 Epoch: 6 Global Step: 136240 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:34:14,932-Speed 2494.29 samples/sec Loss 4.9613 LearningRate 0.000862 Epoch: 6 Global Step: 136250 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:34:23,132-Speed 2498.18 samples/sec Loss 4.9200 LearningRate 0.000862 Epoch: 6 Global Step: 136260 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:34:31,285-Speed 2512.03 samples/sec Loss 4.8932 LearningRate 0.000862 Epoch: 6 Global Step: 136270 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:34:39,490-Speed 2496.85 samples/sec Loss 4.8430 LearningRate 0.000862 Epoch: 6 Global Step: 136280 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:34:47,696-Speed 2496.35 samples/sec Loss 4.8780 LearningRate 0.000862 Epoch: 6 Global Step: 136290 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:34:55,899-Speed 2497.05 samples/sec Loss 4.9303 LearningRate 0.000862 Epoch: 6 Global Step: 136300 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:35:04,103-Speed 2496.74 samples/sec Loss 4.8791 LearningRate 0.000862 Epoch: 6 Global Step: 136310 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:35:12,313-Speed 2495.13 samples/sec Loss 4.7925 LearningRate 0.000862 Epoch: 6 Global Step: 136320 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:35:20,465-Speed 2512.74 samples/sec Loss 4.8505 LearningRate 0.000862 Epoch: 6 Global Step: 136330 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:35:28,669-Speed 2497.24 samples/sec Loss 4.9257 
LearningRate 0.000862 Epoch: 6 Global Step: 136340 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:35:36,869-Speed 2497.88 samples/sec Loss 4.9156 LearningRate 0.000862 Epoch: 6 Global Step: 136350 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:35:45,068-Speed 2498.10 samples/sec Loss 4.8718 LearningRate 0.000862 Epoch: 6 Global Step: 136360 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:35:53,272-Speed 2496.72 samples/sec Loss 4.8339 LearningRate 0.000862 Epoch: 6 Global Step: 136370 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:36:01,475-Speed 2497.11 samples/sec Loss 4.9211 LearningRate 0.000862 Epoch: 6 Global Step: 136380 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:36:09,628-Speed 2512.48 samples/sec Loss 4.8378 LearningRate 0.000862 Epoch: 6 Global Step: 136390 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:36:17,833-Speed 2496.39 samples/sec Loss 4.8326 LearningRate 0.000862 Epoch: 6 Global Step: 136400 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:36:26,032-Speed 2498.37 samples/sec Loss 4.8119 LearningRate 0.000862 Epoch: 6 Global Step: 136410 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:36:34,236-Speed 2496.59 samples/sec Loss 4.8007 LearningRate 0.000862 Epoch: 6 Global Step: 136420 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:36:42,440-Speed 2496.84 samples/sec Loss 4.7246 LearningRate 0.000862 Epoch: 6 Global Step: 136430 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:36:50,639-Speed 2498.29 samples/sec Loss 4.8561 LearningRate 0.000862 Epoch: 6 Global Step: 136440 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:36:58,805-Speed 2508.50 samples/sec Loss 4.8764 LearningRate 0.000862 Epoch: 6 Global Step: 136450 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:37:07,010-Speed 2496.55 samples/sec Loss 4.8701 
LearningRate 0.000862 Epoch: 6 Global Step: 136460 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:37:15,215-Speed 2496.22 samples/sec Loss 4.9145 LearningRate 0.000862 Epoch: 6 Global Step: 136470 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:37:23,424-Speed 2495.38 samples/sec Loss 4.8655 LearningRate 0.000862 Epoch: 6 Global Step: 136480 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:37:31,624-Speed 2498.06 samples/sec Loss 4.8954 LearningRate 0.000862 Epoch: 6 Global Step: 136490 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:37:39,828-Speed 2496.62 samples/sec Loss 4.8439 LearningRate 0.000862 Epoch: 6 Global Step: 136500 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:37:47,987-Speed 2510.73 samples/sec Loss 4.8481 LearningRate 0.000862 Epoch: 6 Global Step: 136510 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:37:56,187-Speed 2498.17 samples/sec Loss 4.8990 LearningRate 0.000862 Epoch: 6 Global Step: 136520 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:38:04,390-Speed 2496.99 samples/sec Loss 4.9175 LearningRate 0.000862 Epoch: 6 Global Step: 136530 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:38:12,590-Speed 2498.02 samples/sec Loss 4.9001 LearningRate 0.000862 Epoch: 6 Global Step: 136540 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:38:20,788-Speed 2498.62 samples/sec Loss 4.8423 LearningRate 0.000862 Epoch: 6 Global Step: 136550 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:38:28,991-Speed 2497.17 samples/sec Loss 4.7977 LearningRate 0.000862 Epoch: 6 Global Step: 136560 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:38:37,138-Speed 2514.34 samples/sec Loss 4.8463 LearningRate 0.000862 Epoch: 6 Global Step: 136570 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:38:45,342-Speed 2496.84 samples/sec Loss 4.8200 
LearningRate 0.000862 Epoch: 6 Global Step: 136580 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:38:53,542-Speed 2497.74 samples/sec Loss 4.8308 LearningRate 0.000861 Epoch: 6 Global Step: 136590 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:39:01,743-Speed 2497.92 samples/sec Loss 4.9578 LearningRate 0.000861 Epoch: 6 Global Step: 136600 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:39:09,946-Speed 2496.80 samples/sec Loss 5.0399 LearningRate 0.000861 Epoch: 6 Global Step: 136610 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:39:18,154-Speed 2495.86 samples/sec Loss 4.9062 LearningRate 0.000861 Epoch: 6 Global Step: 136620 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:39:26,299-Speed 2514.75 samples/sec Loss 4.7678 LearningRate 0.000861 Epoch: 6 Global Step: 136630 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:39:34,497-Speed 2498.49 samples/sec Loss 4.8662 LearningRate 0.000861 Epoch: 6 Global Step: 136640 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:39:42,700-Speed 2497.27 samples/sec Loss 4.9102 LearningRate 0.000861 Epoch: 6 Global Step: 136650 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:39:50,903-Speed 2496.81 samples/sec Loss 4.8717 LearningRate 0.000861 Epoch: 6 Global Step: 136660 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:39:59,104-Speed 2497.88 samples/sec Loss 4.9507 LearningRate 0.000861 Epoch: 6 Global Step: 136670 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:40:07,304-Speed 2498.22 samples/sec Loss 5.0037 LearningRate 0.000861 Epoch: 6 Global Step: 136680 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:40:15,465-Speed 2509.86 samples/sec Loss 4.8526 LearningRate 0.000861 Epoch: 6 Global Step: 136690 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:40:23,666-Speed 2497.47 samples/sec Loss 4.9978 
LearningRate 0.000861 Epoch: 6 Global Step: 136700 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:40:31,867-Speed 2497.74 samples/sec Loss 4.8869 LearningRate 0.000861 Epoch: 6 Global Step: 136710 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:40:40,068-Speed 2497.78 samples/sec Loss 4.9302 LearningRate 0.000861 Epoch: 6 Global Step: 136720 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:40:48,268-Speed 2498.10 samples/sec Loss 4.9064 LearningRate 0.000861 Epoch: 6 Global Step: 136730 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:40:56,474-Speed 2495.98 samples/sec Loss 4.8847 LearningRate 0.000861 Epoch: 6 Global Step: 136740 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:41:04,617-Speed 2515.36 samples/sec Loss 4.9204 LearningRate 0.000861 Epoch: 6 Global Step: 136750 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:41:12,819-Speed 2497.40 samples/sec Loss 4.9508 LearningRate 0.000861 Epoch: 6 Global Step: 136760 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:41:21,019-Speed 2498.00 samples/sec Loss 4.9335 LearningRate 0.000861 Epoch: 6 Global Step: 136770 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:41:29,232-Speed 2494.27 samples/sec Loss 4.9485 LearningRate 0.000861 Epoch: 6 Global Step: 136780 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:41:37,445-Speed 2494.08 samples/sec Loss 4.9977 LearningRate 0.000861 Epoch: 6 Global Step: 136790 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:41:45,645-Speed 2497.88 samples/sec Loss 4.8758 LearningRate 0.000861 Epoch: 6 Global Step: 136800 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:41:53,793-Speed 2514.03 samples/sec Loss 4.9076 LearningRate 0.000861 Epoch: 6 Global Step: 136810 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:42:01,954-Speed 2509.84 samples/sec Loss 4.9400 
LearningRate 0.000861 Epoch: 6 Global Step: 136820 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:42:10,162-Speed 2495.46 samples/sec Loss 4.9033 LearningRate 0.000861 Epoch: 6 Global Step: 136830 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:42:18,361-Speed 2498.10 samples/sec Loss 4.7979 LearningRate 0.000861 Epoch: 6 Global Step: 136840 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:42:26,563-Speed 2497.31 samples/sec Loss 4.8382 LearningRate 0.000861 Epoch: 6 Global Step: 136850 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:42:34,773-Speed 2496.00 samples/sec Loss 4.7923 LearningRate 0.000861 Epoch: 6 Global Step: 136860 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:42:42,930-Speed 2511.21 samples/sec Loss 4.8215 LearningRate 0.000861 Epoch: 6 Global Step: 136870 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:42:51,129-Speed 2498.04 samples/sec Loss 4.8892 LearningRate 0.000861 Epoch: 6 Global Step: 136880 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:42:59,330-Speed 2498.07 samples/sec Loss 4.8334 LearningRate 0.000861 Epoch: 6 Global Step: 136890 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:43:07,530-Speed 2497.96 samples/sec Loss 4.8448 LearningRate 0.000861 Epoch: 6 Global Step: 136900 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:43:15,732-Speed 2497.25 samples/sec Loss 4.8169 LearningRate 0.000861 Epoch: 6 Global Step: 136910 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:43:23,933-Speed 2497.60 samples/sec Loss 4.8066 LearningRate 0.000861 Epoch: 6 Global Step: 136920 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:43:32,082-Speed 2513.51 samples/sec Loss 4.7989 LearningRate 0.000861 Epoch: 6 Global Step: 136930 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:43:40,296-Speed 2493.65 samples/sec Loss 4.8765 
LearningRate 0.000861 Epoch: 6 Global Step: 136940 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:43:48,507-Speed 2494.67 samples/sec Loss 4.8647 LearningRate 0.000861 Epoch: 6 Global Step: 136950 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:43:56,707-Speed 2497.94 samples/sec Loss 4.7766 LearningRate 0.000861 Epoch: 6 Global Step: 136960 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:44:04,908-Speed 2497.54 samples/sec Loss 4.8146 LearningRate 0.000861 Epoch: 6 Global Step: 136970 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:44:13,113-Speed 2496.58 samples/sec Loss 4.8029 LearningRate 0.000861 Epoch: 6 Global Step: 136980 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:44:21,261-Speed 2513.90 samples/sec Loss 4.7674 LearningRate 0.000860 Epoch: 6 Global Step: 136990 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:44:29,463-Speed 2497.30 samples/sec Loss 4.8469 LearningRate 0.000860 Epoch: 6 Global Step: 137000 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:44:37,664-Speed 2497.95 samples/sec Loss 4.8632 LearningRate 0.000860 Epoch: 6 Global Step: 137010 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:44:45,868-Speed 2496.79 samples/sec Loss 4.8568 LearningRate 0.000860 Epoch: 6 Global Step: 137020 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:44:54,074-Speed 2496.32 samples/sec Loss 4.8770 LearningRate 0.000860 Epoch: 6 Global Step: 137030 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:45:02,277-Speed 2496.97 samples/sec Loss 4.9465 LearningRate 0.000860 Epoch: 6 Global Step: 137040 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:45:10,429-Speed 2512.49 samples/sec Loss 4.7886 LearningRate 0.000860 Epoch: 6 Global Step: 137050 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:45:18,658-Speed 2489.27 samples/sec Loss 4.8981 
LearningRate 0.000860 Epoch: 6 Global Step: 137060 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:45:26,871-Speed 2494.10 samples/sec Loss 4.8469 LearningRate 0.000860 Epoch: 6 Global Step: 137070 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:45:35,076-Speed 2496.66 samples/sec Loss 4.8263 LearningRate 0.000860 Epoch: 6 Global Step: 137080 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:45:43,285-Speed 2495.45 samples/sec Loss 4.8333 LearningRate 0.000860 Epoch: 6 Global Step: 137090 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:45:51,491-Speed 2496.23 samples/sec Loss 4.8839 LearningRate 0.000860 Epoch: 6 Global Step: 137100 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:45:59,639-Speed 2513.73 samples/sec Loss 4.8818 LearningRate 0.000860 Epoch: 6 Global Step: 137110 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:46:07,839-Speed 2497.99 samples/sec Loss 4.8449 LearningRate 0.000860 Epoch: 6 Global Step: 137120 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:46:16,038-Speed 2498.07 samples/sec Loss 4.9375 LearningRate 0.000860 Epoch: 6 Global Step: 137130 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:46:24,241-Speed 2497.14 samples/sec Loss 4.8301 LearningRate 0.000860 Epoch: 6 Global Step: 137140 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:46:32,443-Speed 2497.40 samples/sec Loss 4.8489 LearningRate 0.000860 Epoch: 6 Global Step: 137150 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:46:40,651-Speed 2495.61 samples/sec Loss 4.7536 LearningRate 0.000860 Epoch: 6 Global Step: 137160 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:46:48,803-Speed 2512.45 samples/sec Loss 4.9440 LearningRate 0.000860 Epoch: 6 Global Step: 137170 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:46:57,006-Speed 2496.94 samples/sec Loss 4.8345 
LearningRate 0.000860 Epoch: 6 Global Step: 137180 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:47:05,221-Speed 2493.81 samples/sec Loss 4.9641 LearningRate 0.000860 Epoch: 6 Global Step: 137190 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:47:13,425-Speed 2496.71 samples/sec Loss 4.9016 LearningRate 0.000860 Epoch: 6 Global Step: 137200 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:47:21,631-Speed 2496.42 samples/sec Loss 4.8781 LearningRate 0.000860 Epoch: 6 Global Step: 137210 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:47:29,838-Speed 2495.84 samples/sec Loss 4.8317 LearningRate 0.000860 Epoch: 6 Global Step: 137220 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:47:37,986-Speed 2513.74 samples/sec Loss 4.9622 LearningRate 0.000860 Epoch: 6 Global Step: 137230 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:47:46,193-Speed 2495.88 samples/sec Loss 4.8746 LearningRate 0.000860 Epoch: 6 Global Step: 137240 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:47:54,395-Speed 2497.50 samples/sec Loss 4.8755 LearningRate 0.000860 Epoch: 6 Global Step: 137250 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:48:02,596-Speed 2497.70 samples/sec Loss 4.8551 LearningRate 0.000860 Epoch: 6 Global Step: 137260 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:48:10,800-Speed 2496.51 samples/sec Loss 4.8822 LearningRate 0.000860 Epoch: 6 Global Step: 137270 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:48:19,001-Speed 2497.64 samples/sec Loss 4.8555 LearningRate 0.000860 Epoch: 6 Global Step: 137280 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:48:27,152-Speed 2513.18 samples/sec Loss 4.7701 LearningRate 0.000860 Epoch: 6 Global Step: 137290 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:48:35,355-Speed 2496.91 samples/sec Loss 4.8331 
LearningRate 0.000860 Epoch: 6 Global Step: 137300 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:48:43,559-Speed 2496.83 samples/sec Loss 4.9046 LearningRate 0.000860 Epoch: 6 Global Step: 137310 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:48:51,759-Speed 2497.77 samples/sec Loss 4.9094 LearningRate 0.000860 Epoch: 6 Global Step: 137320 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:48:59,964-Speed 2496.72 samples/sec Loss 4.9421 LearningRate 0.000860 Epoch: 6 Global Step: 137330 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:49:08,168-Speed 2496.69 samples/sec Loss 4.9337 LearningRate 0.000860 Epoch: 6 Global Step: 137340 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:49:16,312-Speed 2515.01 samples/sec Loss 4.9448 LearningRate 0.000860 Epoch: 6 Global Step: 137350 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:49:24,518-Speed 2496.17 samples/sec Loss 4.8941 LearningRate 0.000860 Epoch: 6 Global Step: 137360 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:49:32,718-Speed 2498.20 samples/sec Loss 4.9000 LearningRate 0.000860 Epoch: 6 Global Step: 137370 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:49:40,919-Speed 2497.63 samples/sec Loss 4.8576 LearningRate 0.000860 Epoch: 6 Global Step: 137380 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:49:49,120-Speed 2497.51 samples/sec Loss 4.8878 LearningRate 0.000860 Epoch: 6 Global Step: 137390 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:49:57,323-Speed 2497.07 samples/sec Loss 4.8484 LearningRate 0.000859 Epoch: 6 Global Step: 137400 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:50:05,468-Speed 2514.78 samples/sec Loss 4.9091 LearningRate 0.000859 Epoch: 6 Global Step: 137410 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:50:13,668-Speed 2497.99 samples/sec Loss 4.8179 
LearningRate 0.000859 Epoch: 6 Global Step: 137420 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:50:21,868-Speed 2497.87 samples/sec Loss 4.8334 LearningRate 0.000859 Epoch: 6 Global Step: 137430 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:50:30,070-Speed 2497.57 samples/sec Loss 4.7731 LearningRate 0.000859 Epoch: 6 Global Step: 137440 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:50:38,276-Speed 2496.21 samples/sec Loss 4.9113 LearningRate 0.000859 Epoch: 6 Global Step: 137450 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:50:46,488-Speed 2494.34 samples/sec Loss 4.8986 LearningRate 0.000859 Epoch: 6 Global Step: 137460 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:50:54,640-Speed 2512.58 samples/sec Loss 4.9375 LearningRate 0.000859 Epoch: 6 Global Step: 137470 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:51:02,841-Speed 2497.68 samples/sec Loss 4.7925 LearningRate 0.000859 Epoch: 6 Global Step: 137480 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:51:11,043-Speed 2497.29 samples/sec Loss 4.8287 LearningRate 0.000859 Epoch: 6 Global Step: 137490 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:51:19,244-Speed 2497.73 samples/sec Loss 4.7748 LearningRate 0.000859 Epoch: 6 Global Step: 137500 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:51:27,446-Speed 2497.18 samples/sec Loss 4.7834 LearningRate 0.000859 Epoch: 6 Global Step: 137510 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:51:35,645-Speed 2498.33 samples/sec Loss 4.8465 LearningRate 0.000859 Epoch: 6 Global Step: 137520 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:51:43,818-Speed 2506.30 samples/sec Loss 4.7832 LearningRate 0.000859 Epoch: 6 Global Step: 137530 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:51:52,018-Speed 2497.83 samples/sec Loss 4.7860 
LearningRate 0.000859 Epoch: 6 Global Step: 137540 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:52:00,216-Speed 2498.69 samples/sec Loss 4.8216 LearningRate 0.000859 Epoch: 6 Global Step: 137550 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:52:08,416-Speed 2498.02 samples/sec Loss 4.8411 LearningRate 0.000859 Epoch: 6 Global Step: 137560 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:52:16,618-Speed 2497.43 samples/sec Loss 4.8786 LearningRate 0.000859 Epoch: 6 Global Step: 137570 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:52:24,818-Speed 2497.94 samples/sec Loss 4.8172 LearningRate 0.000859 Epoch: 6 Global Step: 137580 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:52:32,979-Speed 2509.81 samples/sec Loss 4.8355 LearningRate 0.000859 Epoch: 6 Global Step: 137590 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:52:41,179-Speed 2497.87 samples/sec Loss 4.8576 LearningRate 0.000859 Epoch: 6 Global Step: 137600 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:52:49,445-Speed 2499.45 samples/sec Loss 4.8899 LearningRate 0.000859 Epoch: 6 Global Step: 137610 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:52:57,647-Speed 2497.35 samples/sec Loss 4.8720 LearningRate 0.000859 Epoch: 6 Global Step: 137620 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:53:05,865-Speed 2497.43 samples/sec Loss 4.9253 LearningRate 0.000859 Epoch: 6 Global Step: 137630 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:53:14,092-Speed 2499.08 samples/sec Loss 4.8170 LearningRate 0.000859 Epoch: 6 Global Step: 137640 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:53:22,925-Speed 2518.52 samples/sec Loss 4.8423 LearningRate 0.000859 Epoch: 6 Global Step: 137650 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:53:31,116-Speed 2500.70 samples/sec Loss 4.8179 
LearningRate 0.000859 Epoch: 6 Global Step: 137660 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:53:39,311-Speed 2499.26 samples/sec Loss 4.8054 LearningRate 0.000859 Epoch: 6 Global Step: 137670 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:53:47,552-Speed 2501.45 samples/sec Loss 4.8256 LearningRate 0.000859 Epoch: 6 Global Step: 137680 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:53:58,734-Speed 2003.88 samples/sec Loss 4.9452 LearningRate 0.000859 Epoch: 6 Global Step: 137690 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:54:06,954-Speed 2491.82 samples/sec Loss 4.9994 LearningRate 0.000859 Epoch: 6 Global Step: 137700 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:54:15,084-Speed 2519.35 samples/sec Loss 4.8522 LearningRate 0.000859 Epoch: 6 Global Step: 137710 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:54:26,887-Speed 2499.51 samples/sec Loss 4.8796 LearningRate 0.000859 Epoch: 6 Global Step: 137720 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:54:35,109-Speed 2499.76 samples/sec Loss 4.8846 LearningRate 0.000859 Epoch: 6 Global Step: 137730 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:54:43,303-Speed 2499.69 samples/sec Loss 4.8694 LearningRate 0.000859 Epoch: 6 Global Step: 137740 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:54:51,527-Speed 2499.95 samples/sec Loss 4.8712 LearningRate 0.000859 Epoch: 6 Global Step: 137750 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:54:59,765-Speed 2497.55 samples/sec Loss 4.8405 LearningRate 0.000859 Epoch: 6 Global Step: 137760 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:55:07,920-Speed 2511.63 samples/sec Loss 4.9529 LearningRate 0.000859 Epoch: 6 Global Step: 137770 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:55:16,177-Speed 2493.36 samples/sec Loss 4.8968 
LearningRate 0.000859 Epoch: 6 Global Step: 137780 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:55:25,768-Speed 2165.36 samples/sec Loss 4.9083 LearningRate 0.000859 Epoch: 6 Global Step: 137790 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:55:34,005-Speed 2492.51 samples/sec Loss 4.8387 LearningRate 0.000858 Epoch: 6 Global Step: 137800 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:55:42,210-Speed 2496.11 samples/sec Loss 4.7878 LearningRate 0.000858 Epoch: 6 Global Step: 137810 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:55:50,549-Speed 2456.27 samples/sec Loss 4.8777 LearningRate 0.000858 Epoch: 6 Global Step: 137820 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:55:58,719-Speed 2513.51 samples/sec Loss 4.9706 LearningRate 0.000858 Epoch: 6 Global Step: 137830 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:56:06,954-Speed 2499.07 samples/sec Loss 4.9169 LearningRate 0.000858 Epoch: 6 Global Step: 137840 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:56:15,162-Speed 2495.33 samples/sec Loss 4.9095 LearningRate 0.000858 Epoch: 6 Global Step: 137850 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:56:23,752-Speed 2417.60 samples/sec Loss 4.8353 LearningRate 0.000858 Epoch: 6 Global Step: 137860 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:56:31,952-Speed 2498.09 samples/sec Loss 4.8800 LearningRate 0.000858 Epoch: 6 Global Step: 137870 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:56:40,150-Speed 2498.35 samples/sec Loss 4.8840 LearningRate 0.000858 Epoch: 6 Global Step: 137880 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:56:48,337-Speed 2516.42 samples/sec Loss 4.8921 LearningRate 0.000858 Epoch: 6 Global Step: 137890 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:56:56,551-Speed 2497.83 samples/sec Loss 4.8200 
LearningRate 0.000858 Epoch: 6 Global Step: 137900 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:57:04,760-Speed 2500.20 samples/sec Loss 4.7997 LearningRate 0.000858 Epoch: 6 Global Step: 137910 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:57:12,955-Speed 2499.19 samples/sec Loss 4.8210 LearningRate 0.000858 Epoch: 6 Global Step: 137920 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:57:21,198-Speed 2499.62 samples/sec Loss 4.8309 LearningRate 0.000858 Epoch: 6 Global Step: 137930 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:57:29,667-Speed 2501.08 samples/sec Loss 4.8474 LearningRate 0.000858 Epoch: 6 Global Step: 137940 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:57:39,741-Speed 2098.62 samples/sec Loss 4.7874 LearningRate 0.000858 Epoch: 6 Global Step: 137950 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:57:47,936-Speed 2499.30 samples/sec Loss 4.8899 LearningRate 0.000858 Epoch: 6 Global Step: 137960 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:57:57,649-Speed 2108.70 samples/sec Loss 4.7965 LearningRate 0.000858 Epoch: 6 Global Step: 137970 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:58:06,170-Speed 2502.55 samples/sec Loss 4.8457 LearningRate 0.000858 Epoch: 6 Global Step: 137980 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:58:14,375-Speed 2496.38 samples/sec Loss 4.8135 LearningRate 0.000858 Epoch: 6 Global Step: 137990 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:58:22,572-Speed 2498.92 samples/sec Loss 4.8666 LearningRate 0.000858 Epoch: 6 Global Step: 138000 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:58:30,717-Speed 2514.83 samples/sec Loss 4.8748 LearningRate 0.000858 Epoch: 6 Global Step: 138010 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 21:58:38,914-Speed 2499.00 samples/sec Loss 4.8648 
LearningRate 0.000858 Epoch: 6 Global Step: 138020 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:58:47,108-Speed 2499.47 samples/sec Loss 4.8124 LearningRate 0.000858 Epoch: 6 Global Step: 138030 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:58:55,307-Speed 2498.37 samples/sec Loss 4.7971 LearningRate 0.000858 Epoch: 6 Global Step: 138040 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:59:03,509-Speed 2497.35 samples/sec Loss 4.8354 LearningRate 0.000858 Epoch: 6 Global Step: 138050 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:59:11,710-Speed 2497.82 samples/sec Loss 4.9060 LearningRate 0.000858 Epoch: 6 Global Step: 138060 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:59:19,860-Speed 2513.25 samples/sec Loss 4.8222 LearningRate 0.000858 Epoch: 6 Global Step: 138070 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:59:28,074-Speed 2493.69 samples/sec Loss 4.7686 LearningRate 0.000858 Epoch: 6 Global Step: 138080 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:59:36,278-Speed 2496.88 samples/sec Loss 4.8181 LearningRate 0.000858 Epoch: 6 Global Step: 138090 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:59:44,480-Speed 2497.21 samples/sec Loss 4.8231 LearningRate 0.000858 Epoch: 6 Global Step: 138100 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 21:59:52,679-Speed 2498.25 samples/sec Loss 4.8534 LearningRate 0.000858 Epoch: 6 Global Step: 138110 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:00:00,884-Speed 2496.29 samples/sec Loss 4.9900 LearningRate 0.000858 Epoch: 6 Global Step: 138120 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:00:09,033-Speed 2513.73 samples/sec Loss 4.8600 LearningRate 0.000858 Epoch: 6 Global Step: 138130 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:00:17,234-Speed 2497.84 samples/sec Loss 4.8029 
LearningRate 0.000858 Epoch: 6 Global Step: 138140 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:00:25,435-Speed 2497.57 samples/sec Loss 4.7940 LearningRate 0.000858 Epoch: 6 Global Step: 138150 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:00:33,641-Speed 2496.23 samples/sec Loss 4.8592 LearningRate 0.000858 Epoch: 6 Global Step: 138160 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:00:41,852-Speed 2494.61 samples/sec Loss 4.8245 LearningRate 0.000858 Epoch: 6 Global Step: 138170 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:00:50,048-Speed 2499.10 samples/sec Loss 4.8146 LearningRate 0.000858 Epoch: 6 Global Step: 138180 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:00:58,194-Speed 2514.33 samples/sec Loss 4.7888 LearningRate 0.000858 Epoch: 6 Global Step: 138190 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:01:06,400-Speed 2496.38 samples/sec Loss 4.8916 LearningRate 0.000857 Epoch: 6 Global Step: 138200 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:01:14,597-Speed 2498.68 samples/sec Loss 4.8507 LearningRate 0.000857 Epoch: 6 Global Step: 138210 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:01:22,794-Speed 2498.58 samples/sec Loss 4.9243 LearningRate 0.000857 Epoch: 6 Global Step: 138220 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:01:30,993-Speed 2498.57 samples/sec Loss 4.7630 LearningRate 0.000857 Epoch: 6 Global Step: 138230 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:01:39,200-Speed 2496.36 samples/sec Loss 4.8390 LearningRate 0.000857 Epoch: 6 Global Step: 138240 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:01:47,346-Speed 2514.57 samples/sec Loss 4.7637 LearningRate 0.000857 Epoch: 6 Global Step: 138250 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:01:55,543-Speed 2498.55 samples/sec Loss 4.7961 
LearningRate 0.000857 Epoch: 6 Global Step: 138260 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:02:03,740-Speed 2499.02 samples/sec Loss 4.8665 LearningRate 0.000857 Epoch: 6 Global Step: 138270 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:02:11,947-Speed 2495.63 samples/sec Loss 4.8022 LearningRate 0.000857 Epoch: 6 Global Step: 138280 Fp16 Grad Scale: 65536 Required: 158 hours Training: 2022-07-06 22:02:20,103-Speed 2511.61 samples/sec Loss 4.7738 LearningRate 0.000857 Epoch: 6 Global Step: 138290 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:02:28,309-Speed 2496.21 samples/sec Loss 4.7920 LearningRate 0.000857 Epoch: 6 Global Step: 138300 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:02:36,456-Speed 2514.06 samples/sec Loss 4.8464 LearningRate 0.000857 Epoch: 6 Global Step: 138310 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:02:44,668-Speed 2494.57 samples/sec Loss 4.8512 LearningRate 0.000857 Epoch: 6 Global Step: 138320 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:02:52,869-Speed 2497.77 samples/sec Loss 4.7562 LearningRate 0.000857 Epoch: 6 Global Step: 138330 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:03:01,069-Speed 2497.94 samples/sec Loss 4.7782 LearningRate 0.000857 Epoch: 6 Global Step: 138340 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:03:09,267-Speed 2498.87 samples/sec Loss 4.7717 LearningRate 0.000857 Epoch: 6 Global Step: 138350 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:03:17,469-Speed 2497.24 samples/sec Loss 4.7453 LearningRate 0.000857 Epoch: 6 Global Step: 138360 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:03:25,622-Speed 2512.51 samples/sec Loss 4.9376 LearningRate 0.000857 Epoch: 6 Global Step: 138370 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:03:33,827-Speed 2496.41 samples/sec Loss 4.7842 
LearningRate 0.000857 Epoch: 6 Global Step: 138380 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:03:42,043-Speed 2493.27 samples/sec Loss 4.7375 LearningRate 0.000857 Epoch: 6 Global Step: 138390 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:03:50,244-Speed 2497.78 samples/sec Loss 4.8214 LearningRate 0.000857 Epoch: 6 Global Step: 138400 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:03:58,443-Speed 2498.46 samples/sec Loss 4.8012 LearningRate 0.000857 Epoch: 6 Global Step: 138410 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:04:06,647-Speed 2496.87 samples/sec Loss 4.7720 LearningRate 0.000857 Epoch: 6 Global Step: 138420 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:04:14,794-Speed 2514.60 samples/sec Loss 4.7680 LearningRate 0.000857 Epoch: 6 Global Step: 138430 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:04:22,992-Speed 2498.46 samples/sec Loss 4.8439 LearningRate 0.000857 Epoch: 6 Global Step: 138440 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:04:31,194-Speed 2497.26 samples/sec Loss 4.9417 LearningRate 0.000857 Epoch: 6 Global Step: 138450 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:04:39,394-Speed 2498.17 samples/sec Loss 4.7862 LearningRate 0.000857 Epoch: 6 Global Step: 138460 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:04:47,595-Speed 2497.75 samples/sec Loss 4.8550 LearningRate 0.000857 Epoch: 6 Global Step: 138470 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:04:56,299-Speed 2353.45 samples/sec Loss 4.7934 LearningRate 0.000857 Epoch: 6 Global Step: 138480 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:05:05,277-Speed 2281.29 samples/sec Loss 4.7572 LearningRate 0.000857 Epoch: 6 Global Step: 138490 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:05:13,472-Speed 2499.75 samples/sec Loss 4.8298 
LearningRate 0.000857 Epoch: 6 Global Step: 138500 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:05:21,668-Speed 2499.06 samples/sec Loss 4.8376 LearningRate 0.000857 Epoch: 6 Global Step: 138510 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:05:30,003-Speed 2493.39 samples/sec Loss 4.8234 LearningRate 0.000857 Epoch: 6 Global Step: 138520 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:05:38,199-Speed 2499.22 samples/sec Loss 4.7942 LearningRate 0.000857 Epoch: 6 Global Step: 138530 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:05:46,396-Speed 2498.90 samples/sec Loss 4.7905 LearningRate 0.000857 Epoch: 6 Global Step: 138540 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:05:54,539-Speed 2515.70 samples/sec Loss 4.8080 LearningRate 0.000857 Epoch: 6 Global Step: 138550 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:06:02,695-Speed 2511.48 samples/sec Loss 4.8120 LearningRate 0.000857 Epoch: 6 Global Step: 138560 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:06:10,892-Speed 2498.83 samples/sec Loss 4.8188 LearningRate 0.000857 Epoch: 6 Global Step: 138570 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:06:19,087-Speed 2499.38 samples/sec Loss 4.7922 LearningRate 0.000857 Epoch: 6 Global Step: 138580 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:06:27,284-Speed 2498.89 samples/sec Loss 4.7889 LearningRate 0.000857 Epoch: 6 Global Step: 138590 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:06:35,481-Speed 2499.42 samples/sec Loss 4.7675 LearningRate 0.000856 Epoch: 6 Global Step: 138600 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:06:43,716-Speed 2487.31 samples/sec Loss 4.8356 LearningRate 0.000856 Epoch: 6 Global Step: 138610 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:06:51,913-Speed 2498.89 samples/sec Loss 4.7617 
LearningRate 0.000856 Epoch: 6 Global Step: 138620 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:07:00,115-Speed 2497.30 samples/sec Loss 4.8076 LearningRate 0.000856 Epoch: 6 Global Step: 138630 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:07:08,310-Speed 2499.33 samples/sec Loss 4.8772 LearningRate 0.000856 Epoch: 6 Global Step: 138640 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:07:16,509-Speed 2498.25 samples/sec Loss 4.7581 LearningRate 0.000856 Epoch: 6 Global Step: 138650 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:07:24,725-Speed 2493.18 samples/sec Loss 4.8575 LearningRate 0.000856 Epoch: 6 Global Step: 138660 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:07:32,874-Speed 2513.60 samples/sec Loss 4.7713 LearningRate 0.000856 Epoch: 6 Global Step: 138670 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:07:41,080-Speed 2496.07 samples/sec Loss 4.7492 LearningRate 0.000856 Epoch: 6 Global Step: 138680 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:07:49,279-Speed 2498.32 samples/sec Loss 4.7617 LearningRate 0.000856 Epoch: 6 Global Step: 138690 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:07:57,481-Speed 2497.50 samples/sec Loss 4.8380 LearningRate 0.000856 Epoch: 6 Global Step: 138700 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:08:05,683-Speed 2497.66 samples/sec Loss 4.7506 LearningRate 0.000856 Epoch: 6 Global Step: 138710 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:08:13,884-Speed 2497.41 samples/sec Loss 4.7929 LearningRate 0.000856 Epoch: 6 Global Step: 138720 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:08:22,031-Speed 2514.24 samples/sec Loss 4.9093 LearningRate 0.000856 Epoch: 6 Global Step: 138730 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:08:30,231-Speed 2498.18 samples/sec Loss 4.7757 
LearningRate 0.000856 Epoch: 6 Global Step: 138740 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:08:38,429-Speed 2498.48 samples/sec Loss 4.8536 LearningRate 0.000856 Epoch: 6 Global Step: 138750 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:08:46,631-Speed 2497.54 samples/sec Loss 4.8403 LearningRate 0.000856 Epoch: 6 Global Step: 138760 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:08:54,842-Speed 2494.57 samples/sec Loss 4.8244 LearningRate 0.000856 Epoch: 6 Global Step: 138770 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:09:03,041-Speed 2498.20 samples/sec Loss 4.8208 LearningRate 0.000856 Epoch: 6 Global Step: 138780 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:09:11,183-Speed 2515.65 samples/sec Loss 4.8323 LearningRate 0.000856 Epoch: 6 Global Step: 138790 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:09:19,381-Speed 2498.88 samples/sec Loss 4.8068 LearningRate 0.000856 Epoch: 6 Global Step: 138800 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:09:27,593-Speed 2494.27 samples/sec Loss 4.8075 LearningRate 0.000856 Epoch: 6 Global Step: 138810 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:09:35,793-Speed 2499.20 samples/sec Loss 4.7415 LearningRate 0.000856 Epoch: 6 Global Step: 138820 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:09:43,990-Speed 2498.79 samples/sec Loss 4.8819 LearningRate 0.000856 Epoch: 6 Global Step: 138830 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:09:52,202-Speed 2494.27 samples/sec Loss 4.7645 LearningRate 0.000856 Epoch: 6 Global Step: 138840 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:10:00,345-Speed 2515.30 samples/sec Loss 4.8524 LearningRate 0.000856 Epoch: 6 Global Step: 138850 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:10:08,542-Speed 2498.80 samples/sec Loss 4.8050 
LearningRate 0.000856 Epoch: 6 Global Step: 138860 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:10:16,754-Speed 2494.38 samples/sec Loss 4.8124 LearningRate 0.000856 Epoch: 6 Global Step: 138870 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:10:24,958-Speed 2496.89 samples/sec Loss 4.9228 LearningRate 0.000856 Epoch: 6 Global Step: 138880 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:10:33,163-Speed 2496.32 samples/sec Loss 4.8849 LearningRate 0.000856 Epoch: 6 Global Step: 138890 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:10:41,365-Speed 2497.40 samples/sec Loss 4.8224 LearningRate 0.000856 Epoch: 6 Global Step: 138900 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:10:49,512-Speed 2514.07 samples/sec Loss 4.8213 LearningRate 0.000856 Epoch: 6 Global Step: 138910 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:10:57,711-Speed 2498.50 samples/sec Loss 4.7575 LearningRate 0.000856 Epoch: 6 Global Step: 138920 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:11:05,923-Speed 2494.34 samples/sec Loss 4.8445 LearningRate 0.000856 Epoch: 6 Global Step: 138930 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:11:14,122-Speed 2498.55 samples/sec Loss 4.7263 LearningRate 0.000856 Epoch: 6 Global Step: 138940 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:11:22,325-Speed 2496.76 samples/sec Loss 4.7290 LearningRate 0.000856 Epoch: 6 Global Step: 138950 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:11:30,523-Speed 2498.40 samples/sec Loss 4.7721 LearningRate 0.000856 Epoch: 6 Global Step: 138960 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:11:38,670-Speed 2515.00 samples/sec Loss 4.7742 LearningRate 0.000856 Epoch: 6 Global Step: 138970 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:11:46,869-Speed 2498.45 samples/sec Loss 4.8240 
LearningRate 0.000856 Epoch: 6 Global Step: 138980 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:11:55,067-Speed 2498.53 samples/sec Loss 4.8305 LearningRate 0.000856 Epoch: 6 Global Step: 138990 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:12:03,265-Speed 2498.48 samples/sec Loss 4.8272 LearningRate 0.000856 Epoch: 6 Global Step: 139000 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:12:11,465-Speed 2498.03 samples/sec Loss 4.8202 LearningRate 0.000855 Epoch: 6 Global Step: 139010 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:12:19,668-Speed 2497.33 samples/sec Loss 4.6995 LearningRate 0.000855 Epoch: 6 Global Step: 139020 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:12:27,816-Speed 2513.88 samples/sec Loss 4.7569 LearningRate 0.000855 Epoch: 6 Global Step: 139030 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:12:36,013-Speed 2498.64 samples/sec Loss 4.7967 LearningRate 0.000855 Epoch: 6 Global Step: 139040 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:12:44,214-Speed 2497.84 samples/sec Loss 4.8096 LearningRate 0.000855 Epoch: 6 Global Step: 139050 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:12:52,411-Speed 2498.74 samples/sec Loss 4.7870 LearningRate 0.000855 Epoch: 6 Global Step: 139060 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:13:00,612-Speed 2497.62 samples/sec Loss 4.7933 LearningRate 0.000855 Epoch: 6 Global Step: 139070 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:13:08,812-Speed 2498.26 samples/sec Loss 4.7620 LearningRate 0.000855 Epoch: 6 Global Step: 139080 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:13:16,956-Speed 2515.21 samples/sec Loss 4.9253 LearningRate 0.000855 Epoch: 6 Global Step: 139090 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:13:25,156-Speed 2497.90 samples/sec Loss 4.8082 
LearningRate 0.000855 Epoch: 6 Global Step: 139100 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:13:33,359-Speed 2496.99 samples/sec Loss 4.7996 LearningRate 0.000855 Epoch: 6 Global Step: 139110 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:13:41,563-Speed 2496.85 samples/sec Loss 4.7458 LearningRate 0.000855 Epoch: 6 Global Step: 139120 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:13:49,763-Speed 2497.91 samples/sec Loss 4.7810 LearningRate 0.000855 Epoch: 6 Global Step: 139130 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:13:57,959-Speed 2499.01 samples/sec Loss 4.7723 LearningRate 0.000855 Epoch: 6 Global Step: 139140 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:14:06,104-Speed 2514.86 samples/sec Loss 4.7579 LearningRate 0.000855 Epoch: 6 Global Step: 139150 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:14:14,317-Speed 2493.91 samples/sec Loss 4.8184 LearningRate 0.000855 Epoch: 6 Global Step: 139160 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:14:22,519-Speed 2497.36 samples/sec Loss 4.8333 LearningRate 0.000855 Epoch: 6 Global Step: 139170 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:14:30,725-Speed 2496.31 samples/sec Loss 4.8306 LearningRate 0.000855 Epoch: 6 Global Step: 139180 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:14:38,929-Speed 2496.84 samples/sec Loss 4.8093 LearningRate 0.000855 Epoch: 6 Global Step: 139190 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:14:47,126-Speed 2498.73 samples/sec Loss 4.8603 LearningRate 0.000855 Epoch: 6 Global Step: 139200 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:14:55,273-Speed 2514.47 samples/sec Loss 4.7537 LearningRate 0.000855 Epoch: 6 Global Step: 139210 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:15:03,473-Speed 2498.01 samples/sec Loss 4.7740 
LearningRate 0.000855 Epoch: 6 Global Step: 139220 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:15:11,672-Speed 2498.23 samples/sec Loss 4.8018 LearningRate 0.000855 Epoch: 6 Global Step: 139230 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:15:19,871-Speed 2498.31 samples/sec Loss 4.8096 LearningRate 0.000855 Epoch: 6 Global Step: 139240 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:15:28,082-Speed 2494.56 samples/sec Loss 4.8071 LearningRate 0.000855 Epoch: 6 Global Step: 139250 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:15:36,287-Speed 2496.54 samples/sec Loss 4.8831 LearningRate 0.000855 Epoch: 6 Global Step: 139260 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:15:44,435-Speed 2513.74 samples/sec Loss 4.7453 LearningRate 0.000855 Epoch: 6 Global Step: 139270 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:15:52,635-Speed 2497.98 samples/sec Loss 4.8028 LearningRate 0.000855 Epoch: 6 Global Step: 139280 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:16:00,835-Speed 2497.98 samples/sec Loss 4.8378 LearningRate 0.000855 Epoch: 6 Global Step: 139290 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:16:09,033-Speed 2498.65 samples/sec Loss 4.7878 LearningRate 0.000855 Epoch: 6 Global Step: 139300 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:16:17,233-Speed 2498.06 samples/sec Loss 4.8249 LearningRate 0.000855 Epoch: 6 Global Step: 139310 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:16:25,432-Speed 2498.38 samples/sec Loss 4.8527 LearningRate 0.000855 Epoch: 6 Global Step: 139320 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:16:33,580-Speed 2514.02 samples/sec Loss 4.8035 LearningRate 0.000855 Epoch: 6 Global Step: 139330 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:16:41,778-Speed 2498.76 samples/sec Loss 4.8441 
LearningRate 0.000855 Epoch: 6 Global Step: 139340 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:16:49,974-Speed 2498.88 samples/sec Loss 4.7762 LearningRate 0.000855 Epoch: 6 Global Step: 139350 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:16:58,170-Speed 2499.93 samples/sec Loss 4.7576 LearningRate 0.000855 Epoch: 6 Global Step: 139360 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:17:06,364-Speed 2499.88 samples/sec Loss 4.8583 LearningRate 0.000855 Epoch: 6 Global Step: 139370 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:17:14,578-Speed 2493.93 samples/sec Loss 4.9297 LearningRate 0.000855 Epoch: 6 Global Step: 139380 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:17:22,726-Speed 2514.01 samples/sec Loss 4.8168 LearningRate 0.000855 Epoch: 6 Global Step: 139390 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:17:30,923-Speed 2499.14 samples/sec Loss 4.8941 LearningRate 0.000855 Epoch: 6 Global Step: 139400 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:17:39,134-Speed 2494.49 samples/sec Loss 4.8754 LearningRate 0.000854 Epoch: 6 Global Step: 139410 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:17:47,331-Speed 2498.92 samples/sec Loss 4.7673 LearningRate 0.000854 Epoch: 6 Global Step: 139420 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:17:55,538-Speed 2495.83 samples/sec Loss 4.7504 LearningRate 0.000854 Epoch: 6 Global Step: 139430 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:18:03,734-Speed 2499.05 samples/sec Loss 4.8911 LearningRate 0.000854 Epoch: 6 Global Step: 139440 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:18:11,880-Speed 2514.57 samples/sec Loss 4.8322 LearningRate 0.000854 Epoch: 6 Global Step: 139450 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:18:20,076-Speed 2499.21 samples/sec Loss 4.7520 
LearningRate 0.000854 Epoch: 6 Global Step: 139460 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:18:28,275-Speed 2498.33 samples/sec Loss 4.8330 LearningRate 0.000854 Epoch: 6 Global Step: 139470 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:18:36,499-Speed 2490.72 samples/sec Loss 4.8585 LearningRate 0.000854 Epoch: 6 Global Step: 139480 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:18:44,698-Speed 2498.34 samples/sec Loss 4.8530 LearningRate 0.000854 Epoch: 6 Global Step: 139490 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:18:52,899-Speed 2497.73 samples/sec Loss 4.8159 LearningRate 0.000854 Epoch: 6 Global Step: 139500 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:19:01,039-Speed 2516.42 samples/sec Loss 4.8224 LearningRate 0.000854 Epoch: 6 Global Step: 139510 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:19:09,237-Speed 2498.56 samples/sec Loss 4.8413 LearningRate 0.000854 Epoch: 6 Global Step: 139520 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:19:17,435-Speed 2498.59 samples/sec Loss 4.7758 LearningRate 0.000854 Epoch: 6 Global Step: 139530 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:19:25,634-Speed 2498.16 samples/sec Loss 4.8052 LearningRate 0.000854 Epoch: 6 Global Step: 139540 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:19:33,839-Speed 2496.43 samples/sec Loss 4.8905 LearningRate 0.000854 Epoch: 6 Global Step: 139550 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:19:42,034-Speed 2499.81 samples/sec Loss 4.8486 LearningRate 0.000854 Epoch: 6 Global Step: 139560 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:19:50,177-Speed 2515.39 samples/sec Loss 4.8633 LearningRate 0.000854 Epoch: 6 Global Step: 139570 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:19:58,384-Speed 2496.26 samples/sec Loss 4.8543 
LearningRate 0.000854 Epoch: 6 Global Step: 139580 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:20:06,584-Speed 2498.08 samples/sec Loss 4.8240 LearningRate 0.000854 Epoch: 6 Global Step: 139590 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:20:14,781-Speed 2498.61 samples/sec Loss 4.9100 LearningRate 0.000854 Epoch: 6 Global Step: 139600 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:20:22,982-Speed 2497.95 samples/sec Loss 4.8487 LearningRate 0.000854 Epoch: 6 Global Step: 139610 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:20:31,185-Speed 2496.75 samples/sec Loss 4.8412 LearningRate 0.000854 Epoch: 6 Global Step: 139620 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:20:39,332-Speed 2514.30 samples/sec Loss 4.8161 LearningRate 0.000854 Epoch: 6 Global Step: 139630 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:20:47,529-Speed 2498.80 samples/sec Loss 4.9137 LearningRate 0.000854 Epoch: 6 Global Step: 139640 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:20:55,730-Speed 2497.61 samples/sec Loss 4.7800 LearningRate 0.000854 Epoch: 6 Global Step: 139650 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:21:03,935-Speed 2497.27 samples/sec Loss 4.9191 LearningRate 0.000854 Epoch: 6 Global Step: 139660 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:21:12,131-Speed 2499.63 samples/sec Loss 4.8654 LearningRate 0.000854 Epoch: 6 Global Step: 139670 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:21:20,330-Speed 2498.23 samples/sec Loss 4.8423 LearningRate 0.000854 Epoch: 6 Global Step: 139680 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:21:28,473-Speed 2515.31 samples/sec Loss 4.7213 LearningRate 0.000854 Epoch: 6 Global Step: 139690 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:21:36,671-Speed 2498.64 samples/sec Loss 4.8023 
LearningRate 0.000854 Epoch: 6 Global Step: 139700 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:21:44,872-Speed 2497.67 samples/sec Loss 4.8415 LearningRate 0.000854 Epoch: 6 Global Step: 139710 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:21:53,067-Speed 2499.47 samples/sec Loss 4.8216 LearningRate 0.000854 Epoch: 6 Global Step: 139720 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:22:01,265-Speed 2498.72 samples/sec Loss 4.7432 LearningRate 0.000854 Epoch: 6 Global Step: 139730 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:22:09,466-Speed 2497.49 samples/sec Loss 4.7843 LearningRate 0.000854 Epoch: 6 Global Step: 139740 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:22:17,614-Speed 2514.19 samples/sec Loss 4.7822 LearningRate 0.000854 Epoch: 6 Global Step: 139750 Fp16 Grad Scale: 16384 Required: 158 hours Training: 2022-07-06 22:22:25,811-Speed 2498.69 samples/sec Loss 4.8240 LearningRate 0.000854 Epoch: 6 Global Step: 139760 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:22:34,008-Speed 2498.86 samples/sec Loss 4.8094 LearningRate 0.000854 Epoch: 6 Global Step: 139770 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:22:42,204-Speed 2499.39 samples/sec Loss 4.8144 LearningRate 0.000854 Epoch: 6 Global Step: 139780 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:22:50,415-Speed 2494.38 samples/sec Loss 4.7577 LearningRate 0.000854 Epoch: 6 Global Step: 139790 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:22:58,618-Speed 2497.16 samples/sec Loss 4.7609 LearningRate 0.000854 Epoch: 6 Global Step: 139800 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:23:06,760-Speed 2515.67 samples/sec Loss 4.8449 LearningRate 0.000854 Epoch: 6 Global Step: 139810 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:23:14,960-Speed 2498.00 samples/sec Loss 4.7690 
LearningRate 0.000853 Epoch: 6 Global Step: 139820 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:23:23,162-Speed 2497.56 samples/sec Loss 4.7659 LearningRate 0.000853 Epoch: 6 Global Step: 139830 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:23:31,365-Speed 2497.08 samples/sec Loss 4.8148 LearningRate 0.000853 Epoch: 6 Global Step: 139840 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:23:39,576-Speed 2494.83 samples/sec Loss 4.6989 LearningRate 0.000853 Epoch: 6 Global Step: 139850 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:23:47,776-Speed 2498.22 samples/sec Loss 4.6983 LearningRate 0.000853 Epoch: 6 Global Step: 139860 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:23:55,924-Speed 2514.11 samples/sec Loss 4.6882 LearningRate 0.000853 Epoch: 6 Global Step: 139870 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:24:04,144-Speed 2491.94 samples/sec Loss 4.7202 LearningRate 0.000853 Epoch: 6 Global Step: 139880 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:24:12,342-Speed 2498.63 samples/sec Loss 4.7659 LearningRate 0.000853 Epoch: 6 Global Step: 139890 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:24:20,544-Speed 2497.58 samples/sec Loss 4.8242 LearningRate 0.000853 Epoch: 6 Global Step: 139900 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:24:28,742-Speed 2498.57 samples/sec Loss 4.9027 LearningRate 0.000853 Epoch: 6 Global Step: 139910 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:24:36,941-Speed 2497.94 samples/sec Loss 4.7712 LearningRate 0.000853 Epoch: 6 Global Step: 139920 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:24:45,090-Speed 2513.78 samples/sec Loss 4.7397 LearningRate 0.000853 Epoch: 6 Global Step: 139930 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:24:53,285-Speed 2499.78 samples/sec Loss 4.8202 
LearningRate 0.000853 Epoch: 6 Global Step: 139940 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:25:01,483-Speed 2498.50 samples/sec Loss 4.7427 LearningRate 0.000853 Epoch: 6 Global Step: 139950 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:25:09,687-Speed 2496.76 samples/sec Loss 4.8096 LearningRate 0.000853 Epoch: 6 Global Step: 139960 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:25:17,884-Speed 2498.79 samples/sec Loss 4.7867 LearningRate 0.000853 Epoch: 6 Global Step: 139970 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:25:26,085-Speed 2497.75 samples/sec Loss 4.8658 LearningRate 0.000853 Epoch: 6 Global Step: 139980 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:25:34,231-Speed 2514.56 samples/sec Loss 4.7689 LearningRate 0.000853 Epoch: 6 Global Step: 139990 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:25:42,439-Speed 2495.60 samples/sec Loss 4.7183 LearningRate 0.000853 Epoch: 6 Global Step: 140000 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:25:50,639-Speed 2497.92 samples/sec Loss 4.7038 LearningRate 0.000853 Epoch: 6 Global Step: 140010 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:25:58,832-Speed 2500.20 samples/sec Loss 4.7919 LearningRate 0.000853 Epoch: 6 Global Step: 140020 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:26:07,033-Speed 2497.82 samples/sec Loss 4.7893 LearningRate 0.000853 Epoch: 6 Global Step: 140030 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:26:15,230-Speed 2498.70 samples/sec Loss 4.7146 LearningRate 0.000853 Epoch: 6 Global Step: 140040 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:26:23,391-Speed 2510.07 samples/sec Loss 4.8544 LearningRate 0.000853 Epoch: 6 Global Step: 140050 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:26:31,590-Speed 2498.07 samples/sec Loss 4.6903 
LearningRate 0.000853 Epoch: 6 Global Step: 140060 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:26:39,788-Speed 2499.04 samples/sec Loss 4.7035 LearningRate 0.000853 Epoch: 6 Global Step: 140070 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:26:47,984-Speed 2498.92 samples/sec Loss 4.8066 LearningRate 0.000853 Epoch: 6 Global Step: 140080 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:26:56,179-Speed 2499.74 samples/sec Loss 4.7363 LearningRate 0.000853 Epoch: 6 Global Step: 140090 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:27:04,376-Speed 2498.78 samples/sec Loss 4.7549 LearningRate 0.000853 Epoch: 6 Global Step: 140100 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:27:12,524-Speed 2514.08 samples/sec Loss 4.7250 LearningRate 0.000853 Epoch: 6 Global Step: 140110 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:27:20,723-Speed 2498.22 samples/sec Loss 4.7109 LearningRate 0.000853 Epoch: 6 Global Step: 140120 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:27:28,922-Speed 2498.25 samples/sec Loss 4.7060 LearningRate 0.000853 Epoch: 6 Global Step: 140130 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:27:37,121-Speed 2498.36 samples/sec Loss 4.8067 LearningRate 0.000853 Epoch: 6 Global Step: 140140 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:27:45,318-Speed 2498.90 samples/sec Loss 4.7735 LearningRate 0.000853 Epoch: 6 Global Step: 140150 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:27:53,522-Speed 2496.55 samples/sec Loss 4.8389 LearningRate 0.000853 Epoch: 6 Global Step: 140160 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:28:01,667-Speed 2515.03 samples/sec Loss 4.7975 LearningRate 0.000853 Epoch: 6 Global Step: 140170 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:28:09,869-Speed 2497.42 samples/sec Loss 4.8426 
LearningRate 0.000853 Epoch: 6 Global Step: 140180 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:28:18,069-Speed 2497.96 samples/sec Loss 4.8402 LearningRate 0.000853 Epoch: 6 Global Step: 140190 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:28:26,265-Speed 2498.88 samples/sec Loss 4.8793 LearningRate 0.000853 Epoch: 6 Global Step: 140200 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:28:34,464-Speed 2498.57 samples/sec Loss 4.7976 LearningRate 0.000853 Epoch: 6 Global Step: 140210 Fp16 Grad Scale: 32768 Required: 158 hours Training: 2022-07-06 22:28:42,662-Speed 2498.73 samples/sec Loss 4.7948 LearningRate 0.000852 Epoch: 6 Global Step: 140220 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:28:50,805-Speed 2515.41 samples/sec Loss 4.7335 LearningRate 0.000852 Epoch: 6 Global Step: 140230 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:28:59,015-Speed 2494.92 samples/sec Loss 4.9134 LearningRate 0.000852 Epoch: 6 Global Step: 140240 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:29:07,210-Speed 2499.56 samples/sec Loss 4.8158 LearningRate 0.000852 Epoch: 6 Global Step: 140250 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:29:15,408-Speed 2498.39 samples/sec Loss 4.8047 LearningRate 0.000852 Epoch: 6 Global Step: 140260 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:29:23,607-Speed 2498.11 samples/sec Loss 4.8076 LearningRate 0.000852 Epoch: 6 Global Step: 140270 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:29:31,806-Speed 2498.54 samples/sec Loss 4.7477 LearningRate 0.000852 Epoch: 6 Global Step: 140280 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:29:39,950-Speed 2515.37 samples/sec Loss 4.7266 LearningRate 0.000852 Epoch: 6 Global Step: 140290 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:29:48,149-Speed 2498.08 samples/sec Loss 4.7285 
LearningRate 0.000852 Epoch: 6 Global Step: 140300 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:29:56,346-Speed 2498.92 samples/sec Loss 4.7794 LearningRate 0.000852 Epoch: 6 Global Step: 140310 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:30:04,544-Speed 2498.46 samples/sec Loss 4.7541 LearningRate 0.000852 Epoch: 6 Global Step: 140320 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:30:12,767-Speed 2491.16 samples/sec Loss 4.6905 LearningRate 0.000852 Epoch: 6 Global Step: 140330 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:30:20,964-Speed 2498.87 samples/sec Loss 4.7879 LearningRate 0.000852 Epoch: 6 Global Step: 140340 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:30:29,110-Speed 2514.62 samples/sec Loss 4.8485 LearningRate 0.000852 Epoch: 6 Global Step: 140350 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:30:37,310-Speed 2498.09 samples/sec Loss 4.7915 LearningRate 0.000852 Epoch: 6 Global Step: 140360 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:30:45,516-Speed 2495.95 samples/sec Loss 4.7837 LearningRate 0.000852 Epoch: 6 Global Step: 140370 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:30:53,726-Speed 2494.89 samples/sec Loss 4.7464 LearningRate 0.000852 Epoch: 6 Global Step: 140380 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:31:01,940-Speed 2493.75 samples/sec Loss 4.7551 LearningRate 0.000852 Epoch: 6 Global Step: 140390 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:31:10,147-Speed 2496.14 samples/sec Loss 4.7657 LearningRate 0.000852 Epoch: 6 Global Step: 140400 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:31:18,294-Speed 2514.24 samples/sec Loss 4.7726 LearningRate 0.000852 Epoch: 6 Global Step: 140410 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:31:26,513-Speed 2492.00 samples/sec Loss 4.8073 
LearningRate 0.000852 Epoch: 6 Global Step: 140420 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:31:34,729-Speed 2493.19 samples/sec Loss 4.7849 LearningRate 0.000852 Epoch: 6 Global Step: 140430 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:31:42,929-Speed 2497.94 samples/sec Loss 4.7589 LearningRate 0.000852 Epoch: 6 Global Step: 140440 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:31:51,139-Speed 2495.11 samples/sec Loss 4.8042 LearningRate 0.000852 Epoch: 6 Global Step: 140450 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:31:59,342-Speed 2496.97 samples/sec Loss 4.8349 LearningRate 0.000852 Epoch: 6 Global Step: 140460 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:32:07,491-Speed 2513.73 samples/sec Loss 4.7666 LearningRate 0.000852 Epoch: 6 Global Step: 140470 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:32:15,691-Speed 2497.93 samples/sec Loss 4.8672 LearningRate 0.000852 Epoch: 6 Global Step: 140480 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:32:23,890-Speed 2498.20 samples/sec Loss 4.8329 LearningRate 0.000852 Epoch: 6 Global Step: 140490 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:32:32,092-Speed 2497.60 samples/sec Loss 4.8141 LearningRate 0.000852 Epoch: 6 Global Step: 140500 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:32:40,292-Speed 2497.80 samples/sec Loss 4.7718 LearningRate 0.000852 Epoch: 6 Global Step: 140510 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:32:48,492-Speed 2498.10 samples/sec Loss 4.7947 LearningRate 0.000852 Epoch: 6 Global Step: 140520 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:32:56,639-Speed 2514.37 samples/sec Loss 4.7647 LearningRate 0.000852 Epoch: 6 Global Step: 140530 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:33:04,839-Speed 2497.75 samples/sec Loss 4.8898 
LearningRate 0.000852 Epoch: 6 Global Step: 140540 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:33:13,038-Speed 2498.55 samples/sec Loss 4.8228 LearningRate 0.000852 Epoch: 6 Global Step: 140550 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:33:21,238-Speed 2498.22 samples/sec Loss 4.8600 LearningRate 0.000852 Epoch: 6 Global Step: 140560 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:33:29,444-Speed 2495.88 samples/sec Loss 4.8712 LearningRate 0.000852 Epoch: 6 Global Step: 140570 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:33:37,640-Speed 2499.58 samples/sec Loss 4.8941 LearningRate 0.000852 Epoch: 6 Global Step: 140580 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:33:45,789-Speed 2513.68 samples/sec Loss 4.8140 LearningRate 0.000852 Epoch: 6 Global Step: 140590 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:33:53,986-Speed 2498.88 samples/sec Loss 4.7684 LearningRate 0.000852 Epoch: 6 Global Step: 140600 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:34:02,189-Speed 2497.14 samples/sec Loss 4.7440 LearningRate 0.000852 Epoch: 6 Global Step: 140610 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:34:10,386-Speed 2498.86 samples/sec Loss 4.7968 LearningRate 0.000851 Epoch: 6 Global Step: 140620 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:34:18,585-Speed 2498.37 samples/sec Loss 4.7798 LearningRate 0.000851 Epoch: 6 Global Step: 140630 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:34:26,783-Speed 2498.68 samples/sec Loss 4.7931 LearningRate 0.000851 Epoch: 6 Global Step: 140640 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:34:34,932-Speed 2513.78 samples/sec Loss 4.8408 LearningRate 0.000851 Epoch: 6 Global Step: 140650 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:34:43,128-Speed 2499.11 samples/sec Loss 4.8993 
LearningRate 0.000851 Epoch: 6 Global Step: 140660 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:34:51,331-Speed 2497.31 samples/sec Loss 4.8144 LearningRate 0.000851 Epoch: 6 Global Step: 140670 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:34:59,530-Speed 2498.25 samples/sec Loss 4.8228 LearningRate 0.000851 Epoch: 6 Global Step: 140680 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:35:07,730-Speed 2497.85 samples/sec Loss 4.8379 LearningRate 0.000851 Epoch: 6 Global Step: 140690 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:35:15,929-Speed 2498.35 samples/sec Loss 4.7795 LearningRate 0.000851 Epoch: 6 Global Step: 140700 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:35:24,090-Speed 2510.02 samples/sec Loss 4.8450 LearningRate 0.000851 Epoch: 6 Global Step: 140710 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:35:32,286-Speed 2498.89 samples/sec Loss 4.8475 LearningRate 0.000851 Epoch: 6 Global Step: 140720 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:35:40,486-Speed 2498.36 samples/sec Loss 4.8212 LearningRate 0.000851 Epoch: 6 Global Step: 140730 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:35:48,686-Speed 2497.84 samples/sec Loss 4.7599 LearningRate 0.000851 Epoch: 6 Global Step: 140740 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:35:56,884-Speed 2498.64 samples/sec Loss 4.8297 LearningRate 0.000851 Epoch: 6 Global Step: 140750 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:36:05,082-Speed 2498.60 samples/sec Loss 4.7299 LearningRate 0.000851 Epoch: 6 Global Step: 140760 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:36:13,227-Speed 2514.52 samples/sec Loss 4.7761 LearningRate 0.000851 Epoch: 6 Global Step: 140770 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:36:21,437-Speed 2495.01 samples/sec Loss 4.7471 
LearningRate 0.000851 Epoch: 6 Global Step: 140780 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:36:29,643-Speed 2496.14 samples/sec Loss 4.7367 LearningRate 0.000851 Epoch: 6 Global Step: 140790 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:36:37,840-Speed 2498.67 samples/sec Loss 4.6995 LearningRate 0.000851 Epoch: 6 Global Step: 140800 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:36:46,039-Speed 2498.56 samples/sec Loss 4.6702 LearningRate 0.000851 Epoch: 6 Global Step: 140810 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:36:54,239-Speed 2498.01 samples/sec Loss 4.7715 LearningRate 0.000851 Epoch: 6 Global Step: 140820 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:37:02,387-Speed 2513.99 samples/sec Loss 4.6626 LearningRate 0.000851 Epoch: 6 Global Step: 140830 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:37:10,595-Speed 2495.48 samples/sec Loss 4.6573 LearningRate 0.000851 Epoch: 6 Global Step: 140840 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:37:18,792-Speed 2498.84 samples/sec Loss 4.6469 LearningRate 0.000851 Epoch: 6 Global Step: 140850 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:37:26,993-Speed 2497.61 samples/sec Loss 4.7151 LearningRate 0.000851 Epoch: 6 Global Step: 140860 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:37:35,187-Speed 2499.79 samples/sec Loss 4.7034 LearningRate 0.000851 Epoch: 6 Global Step: 140870 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:37:43,383-Speed 2499.08 samples/sec Loss 4.6954 LearningRate 0.000851 Epoch: 6 Global Step: 140880 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:37:51,859-Speed 2416.74 samples/sec Loss 4.6304 LearningRate 0.000851 Epoch: 6 Global Step: 140890 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:38:00,756-Speed 2501.52 samples/sec Loss 4.6958 
LearningRate 0.000851 Epoch: 6 Global Step: 140900 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:38:08,952-Speed 2499.25 samples/sec Loss 4.7728 LearningRate 0.000851 Epoch: 6 Global Step: 140910 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:38:19,394-Speed 2501.84 samples/sec Loss 4.7071 LearningRate 0.000851 Epoch: 6 Global Step: 140920 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:38:27,586-Speed 2500.33 samples/sec Loss 4.8513 LearningRate 0.000851 Epoch: 6 Global Step: 140930 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:38:35,780-Speed 2500.02 samples/sec Loss 4.7655 LearningRate 0.000851 Epoch: 6 Global Step: 140940 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:38:43,924-Speed 2514.98 samples/sec Loss 4.7496 LearningRate 0.000851 Epoch: 6 Global Step: 140950 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 22:38:52,116-Speed 2500.39 samples/sec Loss 4.7128 LearningRate 0.000851 Epoch: 6 Global Step: 140960 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:39:00,311-Speed 2499.62 samples/sec Loss 4.7262 LearningRate 0.000851 Epoch: 6 Global Step: 140970 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:39:08,507-Speed 2499.21 samples/sec Loss 4.6713 LearningRate 0.000851 Epoch: 6 Global Step: 140980 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:39:16,700-Speed 2500.05 samples/sec Loss 4.7448 LearningRate 0.000851 Epoch: 6 Global Step: 140990 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:39:24,893-Speed 2499.72 samples/sec Loss 4.7046 LearningRate 0.000851 Epoch: 6 Global Step: 141000 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:39:33,035-Speed 2515.89 samples/sec Loss 4.6671 LearningRate 0.000851 Epoch: 6 Global Step: 141010 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:39:41,239-Speed 2496.56 samples/sec Loss 4.6781 
LearningRate 0.000851 Epoch: 6 Global Step: 141020 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:39:49,440-Speed 2497.50 samples/sec Loss 4.6529 LearningRate 0.000850 Epoch: 6 Global Step: 141030 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:39:57,643-Speed 2497.22 samples/sec Loss 4.7552 LearningRate 0.000850 Epoch: 6 Global Step: 141040 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:40:05,843-Speed 2497.97 samples/sec Loss 4.8041 LearningRate 0.000850 Epoch: 6 Global Step: 141050 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:40:14,042-Speed 2498.06 samples/sec Loss 4.6906 LearningRate 0.000850 Epoch: 6 Global Step: 141060 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:40:22,196-Speed 2512.20 samples/sec Loss 4.7519 LearningRate 0.000850 Epoch: 6 Global Step: 141070 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:40:30,395-Speed 2498.43 samples/sec Loss 4.7144 LearningRate 0.000850 Epoch: 6 Global Step: 141080 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:40:38,595-Speed 2497.98 samples/sec Loss 4.7516 LearningRate 0.000850 Epoch: 6 Global Step: 141090 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:40:46,796-Speed 2497.51 samples/sec Loss 4.7667 LearningRate 0.000850 Epoch: 6 Global Step: 141100 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:40:54,992-Speed 2499.15 samples/sec Loss 4.7335 LearningRate 0.000850 Epoch: 6 Global Step: 141110 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:41:03,189-Speed 2499.08 samples/sec Loss 4.8172 LearningRate 0.000850 Epoch: 6 Global Step: 141120 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:41:11,334-Speed 2514.55 samples/sec Loss 4.7286 LearningRate 0.000850 Epoch: 6 Global Step: 141130 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:41:19,536-Speed 2497.36 samples/sec Loss 4.7469 
LearningRate 0.000850 Epoch: 6 Global Step: 141140 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:41:27,732-Speed 2499.78 samples/sec Loss 4.7472 LearningRate 0.000850 Epoch: 6 Global Step: 141150 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:41:35,931-Speed 2498.92 samples/sec Loss 4.7733 LearningRate 0.000850 Epoch: 6 Global Step: 141160 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:41:44,131-Speed 2497.88 samples/sec Loss 4.6866 LearningRate 0.000850 Epoch: 6 Global Step: 141170 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:41:52,347-Speed 2493.07 samples/sec Loss 4.7207 LearningRate 0.000850 Epoch: 6 Global Step: 141180 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:42:00,491-Speed 2515.06 samples/sec Loss 4.8126 LearningRate 0.000850 Epoch: 6 Global Step: 141190 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:42:08,692-Speed 2497.93 samples/sec Loss 4.7636 LearningRate 0.000850 Epoch: 6 Global Step: 141200 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:42:16,899-Speed 2495.84 samples/sec Loss 4.8084 LearningRate 0.000850 Epoch: 6 Global Step: 141210 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:42:25,096-Speed 2498.83 samples/sec Loss 4.8406 LearningRate 0.000850 Epoch: 6 Global Step: 141220 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:42:33,299-Speed 2496.89 samples/sec Loss 4.6948 LearningRate 0.000850 Epoch: 6 Global Step: 141230 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:42:41,499-Speed 2498.19 samples/sec Loss 4.8767 LearningRate 0.000850 Epoch: 6 Global Step: 141240 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:42:49,642-Speed 2515.34 samples/sec Loss 4.8592 LearningRate 0.000850 Epoch: 6 Global Step: 141250 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:42:57,839-Speed 2498.90 samples/sec Loss 4.8457 
LearningRate 0.000850 Epoch: 6 Global Step: 141260 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:43:06,036-Speed 2498.73 samples/sec Loss 4.8308 LearningRate 0.000850 Epoch: 6 Global Step: 141270 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:43:14,234-Speed 2498.60 samples/sec Loss 4.8226 LearningRate 0.000850 Epoch: 6 Global Step: 141280 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:43:22,436-Speed 2497.60 samples/sec Loss 4.7847 LearningRate 0.000850 Epoch: 6 Global Step: 141290 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:43:30,632-Speed 2499.19 samples/sec Loss 4.7625 LearningRate 0.000850 Epoch: 6 Global Step: 141300 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:43:38,779-Speed 2514.05 samples/sec Loss 4.7634 LearningRate 0.000850 Epoch: 6 Global Step: 141310 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:43:46,975-Speed 2499.01 samples/sec Loss 4.7738 LearningRate 0.000850 Epoch: 6 Global Step: 141320 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:43:55,170-Speed 2499.52 samples/sec Loss 4.6948 LearningRate 0.000850 Epoch: 6 Global Step: 141330 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:44:03,369-Speed 2498.23 samples/sec Loss 4.8095 LearningRate 0.000850 Epoch: 6 Global Step: 141340 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:44:11,579-Speed 2495.19 samples/sec Loss 4.7527 LearningRate 0.000850 Epoch: 6 Global Step: 141350 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:44:19,779-Speed 2497.66 samples/sec Loss 4.6890 LearningRate 0.000850 Epoch: 6 Global Step: 141360 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:44:27,924-Speed 2514.81 samples/sec Loss 4.8675 LearningRate 0.000850 Epoch: 6 Global Step: 141370 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:44:36,122-Speed 2499.04 samples/sec Loss 4.7127 
LearningRate 0.000850 Epoch: 6 Global Step: 141380 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:44:44,320-Speed 2498.56 samples/sec Loss 4.7248 LearningRate 0.000850 Epoch: 6 Global Step: 141390 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:44:52,519-Speed 2498.25 samples/sec Loss 4.7734 LearningRate 0.000850 Epoch: 6 Global Step: 141400 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:45:00,712-Speed 2500.20 samples/sec Loss 4.7225 LearningRate 0.000850 Epoch: 6 Global Step: 141410 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:45:08,912-Speed 2498.00 samples/sec Loss 4.8019 LearningRate 0.000850 Epoch: 6 Global Step: 141420 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:45:17,059-Speed 2514.28 samples/sec Loss 4.7478 LearningRate 0.000849 Epoch: 6 Global Step: 141430 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:45:25,266-Speed 2495.86 samples/sec Loss 4.6827 LearningRate 0.000849 Epoch: 6 Global Step: 141440 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:45:33,476-Speed 2494.75 samples/sec Loss 4.7486 LearningRate 0.000849 Epoch: 6 Global Step: 141450 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:45:41,674-Speed 2498.55 samples/sec Loss 4.7596 LearningRate 0.000849 Epoch: 6 Global Step: 141460 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:45:49,872-Speed 2498.66 samples/sec Loss 4.7413 LearningRate 0.000849 Epoch: 6 Global Step: 141470 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:45:58,075-Speed 2496.96 samples/sec Loss 4.7999 LearningRate 0.000849 Epoch: 6 Global Step: 141480 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:46:06,231-Speed 2511.44 samples/sec Loss 4.7911 LearningRate 0.000849 Epoch: 6 Global Step: 141490 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:46:14,437-Speed 2496.28 samples/sec Loss 4.9132 
LearningRate 0.000849 Epoch: 6 Global Step: 141500 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:46:22,636-Speed 2498.42 samples/sec Loss 4.7824 LearningRate 0.000849 Epoch: 6 Global Step: 141510 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:46:30,835-Speed 2498.27 samples/sec Loss 4.8452 LearningRate 0.000849 Epoch: 6 Global Step: 141520 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:46:39,034-Speed 2498.40 samples/sec Loss 4.7226 LearningRate 0.000849 Epoch: 6 Global Step: 141530 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:46:47,231-Speed 2499.09 samples/sec Loss 4.8118 LearningRate 0.000849 Epoch: 6 Global Step: 141540 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:46:55,375-Speed 2515.05 samples/sec Loss 4.7663 LearningRate 0.000849 Epoch: 6 Global Step: 141550 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:47:03,574-Speed 2498.27 samples/sec Loss 4.7867 LearningRate 0.000849 Epoch: 6 Global Step: 141560 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:47:11,773-Speed 2498.03 samples/sec Loss 4.7529 LearningRate 0.000849 Epoch: 6 Global Step: 141570 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:47:19,971-Speed 2498.61 samples/sec Loss 4.8335 LearningRate 0.000849 Epoch: 6 Global Step: 141580 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:47:28,168-Speed 2499.28 samples/sec Loss 4.7342 LearningRate 0.000849 Epoch: 6 Global Step: 141590 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:47:36,367-Speed 2498.05 samples/sec Loss 4.7682 LearningRate 0.000849 Epoch: 6 Global Step: 141600 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:47:44,515-Speed 2514.16 samples/sec Loss 4.7183 LearningRate 0.000849 Epoch: 6 Global Step: 141610 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:47:52,715-Speed 2497.81 samples/sec Loss 4.7741 
LearningRate 0.000849 Epoch: 6 Global Step: 141620 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:48:00,914-Speed 2498.15 samples/sec Loss 4.8149 LearningRate 0.000849 Epoch: 6 Global Step: 141630 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:48:09,111-Speed 2498.88 samples/sec Loss 4.8099 LearningRate 0.000849 Epoch: 6 Global Step: 141640 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:48:17,314-Speed 2497.40 samples/sec Loss 4.7498 LearningRate 0.000849 Epoch: 6 Global Step: 141650 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:48:25,511-Speed 2498.86 samples/sec Loss 4.8198 LearningRate 0.000849 Epoch: 6 Global Step: 141660 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:48:33,656-Speed 2514.76 samples/sec Loss 4.7995 LearningRate 0.000849 Epoch: 6 Global Step: 141670 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:48:41,854-Speed 2498.56 samples/sec Loss 4.8187 LearningRate 0.000849 Epoch: 6 Global Step: 141680 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:48:50,053-Speed 2498.55 samples/sec Loss 4.6899 LearningRate 0.000849 Epoch: 6 Global Step: 141690 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:48:58,255-Speed 2497.20 samples/sec Loss 4.7395 LearningRate 0.000849 Epoch: 6 Global Step: 141700 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:49:06,452-Speed 2498.95 samples/sec Loss 4.7431 LearningRate 0.000849 Epoch: 6 Global Step: 141710 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:49:14,650-Speed 2498.52 samples/sec Loss 4.7958 LearningRate 0.000849 Epoch: 6 Global Step: 141720 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:49:22,794-Speed 2515.40 samples/sec Loss 4.7187 LearningRate 0.000849 Epoch: 6 Global Step: 141730 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:49:30,995-Speed 2497.47 samples/sec Loss 4.7419 
LearningRate 0.000849 Epoch: 6 Global Step: 141740 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:49:39,194-Speed 2498.22 samples/sec Loss 4.7181 LearningRate 0.000849 Epoch: 6 Global Step: 141750 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:49:47,408-Speed 2493.84 samples/sec Loss 4.7347 LearningRate 0.000849 Epoch: 6 Global Step: 141760 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:49:55,606-Speed 2499.49 samples/sec Loss 4.8263 LearningRate 0.000849 Epoch: 6 Global Step: 141770 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:50:03,804-Speed 2498.21 samples/sec Loss 4.6607 LearningRate 0.000849 Epoch: 6 Global Step: 141780 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:50:11,952-Speed 2514.30 samples/sec Loss 4.7412 LearningRate 0.000849 Epoch: 6 Global Step: 141790 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:50:20,149-Speed 2498.85 samples/sec Loss 4.7764 LearningRate 0.000849 Epoch: 6 Global Step: 141800 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:50:28,353-Speed 2496.76 samples/sec Loss 4.7304 LearningRate 0.000849 Epoch: 6 Global Step: 141810 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:50:36,554-Speed 2497.71 samples/sec Loss 4.7063 LearningRate 0.000849 Epoch: 6 Global Step: 141820 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:50:44,752-Speed 2498.84 samples/sec Loss 4.7236 LearningRate 0.000849 Epoch: 6 Global Step: 141830 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:50:52,950-Speed 2498.41 samples/sec Loss 4.7143 LearningRate 0.000848 Epoch: 6 Global Step: 141840 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:51:01,096-Speed 2514.85 samples/sec Loss 4.7394 LearningRate 0.000848 Epoch: 6 Global Step: 141850 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:51:09,294-Speed 2498.50 samples/sec Loss 4.7732 
LearningRate 0.000848 Epoch: 6 Global Step: 141860 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:51:17,493-Speed 2498.49 samples/sec Loss 4.7700 LearningRate 0.000848 Epoch: 6 Global Step: 141870 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:51:25,693-Speed 2497.97 samples/sec Loss 4.7806 LearningRate 0.000848 Epoch: 6 Global Step: 141880 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:51:33,899-Speed 2496.24 samples/sec Loss 4.7838 LearningRate 0.000848 Epoch: 6 Global Step: 141890 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:51:42,098-Speed 2498.24 samples/sec Loss 4.6940 LearningRate 0.000848 Epoch: 6 Global Step: 141900 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:51:50,244-Speed 2514.48 samples/sec Loss 4.6813 LearningRate 0.000848 Epoch: 6 Global Step: 141910 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:51:58,442-Speed 2498.67 samples/sec Loss 4.7270 LearningRate 0.000848 Epoch: 6 Global Step: 141920 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:52:06,640-Speed 2498.57 samples/sec Loss 4.7935 LearningRate 0.000848 Epoch: 6 Global Step: 141930 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:52:14,842-Speed 2497.15 samples/sec Loss 4.6919 LearningRate 0.000848 Epoch: 6 Global Step: 141940 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:52:23,045-Speed 2497.15 samples/sec Loss 4.7689 LearningRate 0.000848 Epoch: 6 Global Step: 141950 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:52:31,245-Speed 2498.00 samples/sec Loss 4.6542 LearningRate 0.000848 Epoch: 6 Global Step: 141960 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:52:39,389-Speed 2514.74 samples/sec Loss 4.7443 LearningRate 0.000848 Epoch: 6 Global Step: 141970 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:52:47,589-Speed 2498.02 samples/sec Loss 4.8049 
LearningRate 0.000848 Epoch: 6 Global Step: 141980 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:52:55,790-Speed 2498.23 samples/sec Loss 4.7502 LearningRate 0.000848 Epoch: 6 Global Step: 141990 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:53:03,990-Speed 2498.03 samples/sec Loss 4.6928 LearningRate 0.000848 Epoch: 6 Global Step: 142000 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:53:12,193-Speed 2496.89 samples/sec Loss 4.7352 LearningRate 0.000848 Epoch: 6 Global Step: 142010 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:53:20,400-Speed 2495.65 samples/sec Loss 4.7563 LearningRate 0.000848 Epoch: 6 Global Step: 142020 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:53:28,552-Speed 2512.63 samples/sec Loss 4.7783 LearningRate 0.000848 Epoch: 6 Global Step: 142030 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:53:36,749-Speed 2499.10 samples/sec Loss 4.6828 LearningRate 0.000848 Epoch: 6 Global Step: 142040 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:53:44,949-Speed 2497.88 samples/sec Loss 4.6786 LearningRate 0.000848 Epoch: 6 Global Step: 142050 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:53:53,151-Speed 2497.27 samples/sec Loss 4.7739 LearningRate 0.000848 Epoch: 6 Global Step: 142060 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:54:01,349-Speed 2498.82 samples/sec Loss 4.8505 LearningRate 0.000848 Epoch: 6 Global Step: 142070 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:54:09,549-Speed 2498.40 samples/sec Loss 4.6413 LearningRate 0.000848 Epoch: 6 Global Step: 142080 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:54:17,693-Speed 2515.21 samples/sec Loss 4.7980 LearningRate 0.000848 Epoch: 6 Global Step: 142090 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:54:25,890-Speed 2498.94 samples/sec Loss 4.6767 
LearningRate 0.000848 Epoch: 6 Global Step: 142100 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:54:34,089-Speed 2498.43 samples/sec Loss 4.6995 LearningRate 0.000848 Epoch: 6 Global Step: 142110 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:54:42,285-Speed 2498.95 samples/sec Loss 4.7075 LearningRate 0.000848 Epoch: 6 Global Step: 142120 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:54:50,501-Speed 2493.29 samples/sec Loss 4.7020 LearningRate 0.000848 Epoch: 6 Global Step: 142130 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:54:58,700-Speed 2498.00 samples/sec Loss 4.6632 LearningRate 0.000848 Epoch: 6 Global Step: 142140 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:55:06,864-Speed 2509.30 samples/sec Loss 4.7104 LearningRate 0.000848 Epoch: 6 Global Step: 142150 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:55:15,070-Speed 2496.24 samples/sec Loss 4.6734 LearningRate 0.000848 Epoch: 6 Global Step: 142160 Fp16 Grad Scale: 131072 Required: 157 hours Training: 2022-07-06 22:55:23,280-Speed 2495.00 samples/sec Loss 4.7066 LearningRate 0.000848 Epoch: 6 Global Step: 142170 Fp16 Grad Scale: 131072 Required: 157 hours Training: 2022-07-06 22:55:31,438-Speed 2510.87 samples/sec Loss 4.7742 LearningRate 0.000848 Epoch: 6 Global Step: 142180 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:55:39,637-Speed 2498.26 samples/sec Loss 4.6733 LearningRate 0.000848 Epoch: 6 Global Step: 142190 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:55:47,841-Speed 2496.93 samples/sec Loss 4.6537 LearningRate 0.000848 Epoch: 6 Global Step: 142200 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:55:55,985-Speed 2515.17 samples/sec Loss 4.7740 LearningRate 0.000848 Epoch: 6 Global Step: 142210 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:56:04,182-Speed 2498.84 samples/sec Loss 4.7747 
LearningRate 0.000848 Epoch: 6 Global Step: 142220 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:56:12,396-Speed 2493.60 samples/sec Loss 4.7596 LearningRate 0.000848 Epoch: 6 Global Step: 142230 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:56:20,598-Speed 2497.56 samples/sec Loss 4.8010 LearningRate 0.000847 Epoch: 6 Global Step: 142240 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:56:28,794-Speed 2498.92 samples/sec Loss 4.8664 LearningRate 0.000847 Epoch: 6 Global Step: 142250 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:56:36,998-Speed 2496.71 samples/sec Loss 4.6581 LearningRate 0.000847 Epoch: 6 Global Step: 142260 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:56:45,151-Speed 2512.44 samples/sec Loss 4.8421 LearningRate 0.000847 Epoch: 6 Global Step: 142270 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:56:53,349-Speed 2498.65 samples/sec Loss 4.8034 LearningRate 0.000847 Epoch: 6 Global Step: 142280 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:57:01,551-Speed 2497.48 samples/sec Loss 4.7776 LearningRate 0.000847 Epoch: 6 Global Step: 142290 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:57:09,762-Speed 2494.63 samples/sec Loss 4.7387 LearningRate 0.000847 Epoch: 6 Global Step: 142300 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:57:17,966-Speed 2496.65 samples/sec Loss 4.7141 LearningRate 0.000847 Epoch: 6 Global Step: 142310 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:57:26,172-Speed 2496.22 samples/sec Loss 4.7001 LearningRate 0.000847 Epoch: 6 Global Step: 142320 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:57:34,321-Speed 2513.43 samples/sec Loss 4.6686 LearningRate 0.000847 Epoch: 6 Global Step: 142330 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:57:42,524-Speed 2497.00 samples/sec Loss 4.6882 
LearningRate 0.000847 Epoch: 6 Global Step: 142340 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:57:50,731-Speed 2496.02 samples/sec Loss 4.7766 LearningRate 0.000847 Epoch: 6 Global Step: 142350 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:57:58,932-Speed 2497.65 samples/sec Loss 4.7093 LearningRate 0.000847 Epoch: 6 Global Step: 142360 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:58:07,134-Speed 2497.06 samples/sec Loss 4.7040 LearningRate 0.000847 Epoch: 6 Global Step: 142370 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:58:15,338-Speed 2497.01 samples/sec Loss 4.6712 LearningRate 0.000847 Epoch: 6 Global Step: 142380 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:58:23,485-Speed 2513.96 samples/sec Loss 4.7513 LearningRate 0.000847 Epoch: 6 Global Step: 142390 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:58:31,691-Speed 2496.31 samples/sec Loss 4.6985 LearningRate 0.000847 Epoch: 6 Global Step: 142400 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:58:39,895-Speed 2496.88 samples/sec Loss 4.7407 LearningRate 0.000847 Epoch: 6 Global Step: 142410 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:58:48,092-Speed 2498.84 samples/sec Loss 4.7696 LearningRate 0.000847 Epoch: 6 Global Step: 142420 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:58:56,291-Speed 2498.32 samples/sec Loss 4.7305 LearningRate 0.000847 Epoch: 6 Global Step: 142430 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:59:04,493-Speed 2497.08 samples/sec Loss 4.7938 LearningRate 0.000847 Epoch: 6 Global Step: 142440 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:59:12,639-Speed 2514.93 samples/sec Loss 4.7655 LearningRate 0.000847 Epoch: 6 Global Step: 142450 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:59:20,837-Speed 2498.63 samples/sec Loss 4.7983 
LearningRate 0.000847 Epoch: 6 Global Step: 142460 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:59:29,035-Speed 2498.55 samples/sec Loss 4.8010 LearningRate 0.000847 Epoch: 6 Global Step: 142470 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:59:37,231-Speed 2499.17 samples/sec Loss 4.7494 LearningRate 0.000847 Epoch: 6 Global Step: 142480 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:59:45,426-Speed 2499.55 samples/sec Loss 4.8232 LearningRate 0.000847 Epoch: 6 Global Step: 142490 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 22:59:53,631-Speed 2496.81 samples/sec Loss 4.8156 LearningRate 0.000847 Epoch: 6 Global Step: 142500 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:00:01,790-Speed 2510.52 samples/sec Loss 4.7549 LearningRate 0.000847 Epoch: 6 Global Step: 142510 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:00:09,990-Speed 2498.15 samples/sec Loss 4.6963 LearningRate 0.000847 Epoch: 6 Global Step: 142520 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:00:18,190-Speed 2498.15 samples/sec Loss 4.6930 LearningRate 0.000847 Epoch: 6 Global Step: 142530 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:00:26,401-Speed 2494.52 samples/sec Loss 4.6196 LearningRate 0.000847 Epoch: 6 Global Step: 142540 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:00:34,602-Speed 2497.43 samples/sec Loss 4.5897 LearningRate 0.000847 Epoch: 6 Global Step: 142550 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:00:42,804-Speed 2497.37 samples/sec Loss 4.7076 LearningRate 0.000847 Epoch: 6 Global Step: 142560 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:00:50,950-Speed 2514.76 samples/sec Loss 4.7095 LearningRate 0.000847 Epoch: 6 Global Step: 142570 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:00:59,149-Speed 2498.40 samples/sec Loss 4.7771 
LearningRate 0.000847 Epoch: 6 Global Step: 142580 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:01:07,352-Speed 2496.66 samples/sec Loss 4.7342 LearningRate 0.000847 Epoch: 6 Global Step: 142590 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:01:15,552-Speed 2498.06 samples/sec Loss 4.7303 LearningRate 0.000847 Epoch: 6 Global Step: 142600 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:01:23,749-Speed 2498.72 samples/sec Loss 4.6619 LearningRate 0.000847 Epoch: 6 Global Step: 142610 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:01:31,949-Speed 2498.13 samples/sec Loss 4.7089 LearningRate 0.000847 Epoch: 6 Global Step: 142620 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:01:40,094-Speed 2514.94 samples/sec Loss 4.6262 LearningRate 0.000847 Epoch: 6 Global Step: 142630 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:01:48,297-Speed 2496.99 samples/sec Loss 4.6669 LearningRate 0.000847 Epoch: 6 Global Step: 142640 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:01:56,500-Speed 2497.36 samples/sec Loss 4.7231 LearningRate 0.000846 Epoch: 6 Global Step: 142650 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:02:04,701-Speed 2497.40 samples/sec Loss 4.6686 LearningRate 0.000846 Epoch: 6 Global Step: 142660 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:02:12,914-Speed 2493.94 samples/sec Loss 4.6668 LearningRate 0.000846 Epoch: 6 Global Step: 142670 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:02:21,129-Speed 2493.66 samples/sec Loss 4.7244 LearningRate 0.000846 Epoch: 6 Global Step: 142680 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:02:29,275-Speed 2514.31 samples/sec Loss 4.6431 LearningRate 0.000846 Epoch: 6 Global Step: 142690 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:02:37,477-Speed 2497.44 samples/sec Loss 4.7202 
LearningRate 0.000846 Epoch: 6 Global Step: 142700 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:02:45,634-Speed 2511.04 samples/sec Loss 4.7686 LearningRate 0.000846 Epoch: 6 Global Step: 142710 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:02:53,852-Speed 2492.82 samples/sec Loss 4.6660 LearningRate 0.000846 Epoch: 6 Global Step: 142720 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:03:02,059-Speed 2495.89 samples/sec Loss 4.7051 LearningRate 0.000846 Epoch: 6 Global Step: 142730 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:03:10,257-Speed 2498.51 samples/sec Loss 4.6709 LearningRate 0.000846 Epoch: 6 Global Step: 142740 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:03:18,405-Speed 2513.94 samples/sec Loss 4.6784 LearningRate 0.000846 Epoch: 6 Global Step: 142750 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:03:26,605-Speed 2498.25 samples/sec Loss 4.9220 LearningRate 0.000846 Epoch: 6 Global Step: 142760 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:03:34,810-Speed 2496.36 samples/sec Loss 4.7527 LearningRate 0.000846 Epoch: 6 Global Step: 142770 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:03:43,025-Speed 2494.16 samples/sec Loss 4.7367 LearningRate 0.000846 Epoch: 6 Global Step: 142780 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:03:51,227-Speed 2497.49 samples/sec Loss 4.7028 LearningRate 0.000846 Epoch: 6 Global Step: 142790 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:03:59,427-Speed 2497.79 samples/sec Loss 4.7742 LearningRate 0.000846 Epoch: 6 Global Step: 142800 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:04:07,589-Speed 2509.59 samples/sec Loss 4.7533 LearningRate 0.000846 Epoch: 6 Global Step: 142810 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:04:15,789-Speed 2498.03 samples/sec Loss 4.6719 
LearningRate 0.000846 Epoch: 6 Global Step: 142820 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:04:23,992-Speed 2496.87 samples/sec Loss 4.6925 LearningRate 0.000846 Epoch: 6 Global Step: 142830 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:04:32,204-Speed 2494.47 samples/sec Loss 4.7025 LearningRate 0.000846 Epoch: 6 Global Step: 142840 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:04:40,402-Speed 2498.41 samples/sec Loss 4.6441 LearningRate 0.000846 Epoch: 6 Global Step: 142850 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:04:48,602-Speed 2498.00 samples/sec Loss 4.7331 LearningRate 0.000846 Epoch: 6 Global Step: 142860 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:04:56,745-Speed 2515.57 samples/sec Loss 4.6600 LearningRate 0.000846 Epoch: 6 Global Step: 142870 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:05:04,948-Speed 2496.87 samples/sec Loss 4.7212 LearningRate 0.000846 Epoch: 6 Global Step: 142880 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:05:13,149-Speed 2497.92 samples/sec Loss 4.7356 LearningRate 0.000846 Epoch: 6 Global Step: 142890 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:05:21,345-Speed 2499.22 samples/sec Loss 4.7005 LearningRate 0.000846 Epoch: 6 Global Step: 142900 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:05:29,546-Speed 2497.59 samples/sec Loss 4.7639 LearningRate 0.000846 Epoch: 6 Global Step: 142910 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:05:37,749-Speed 2497.04 samples/sec Loss 4.7099 LearningRate 0.000846 Epoch: 6 Global Step: 142920 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:05:45,896-Speed 2514.22 samples/sec Loss 4.7065 LearningRate 0.000846 Epoch: 6 Global Step: 142930 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:05:54,096-Speed 2497.77 samples/sec Loss 4.7058 
LearningRate 0.000846 Epoch: 6 Global Step: 142940 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:06:02,297-Speed 2498.03 samples/sec Loss 4.5976 LearningRate 0.000846 Epoch: 6 Global Step: 142950 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:06:10,497-Speed 2498.25 samples/sec Loss 4.7140 LearningRate 0.000846 Epoch: 6 Global Step: 142960 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:06:18,698-Speed 2497.76 samples/sec Loss 4.7449 LearningRate 0.000846 Epoch: 6 Global Step: 142970 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:06:26,901-Speed 2496.94 samples/sec Loss 4.6906 LearningRate 0.000846 Epoch: 6 Global Step: 142980 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:06:35,049-Speed 2513.81 samples/sec Loss 4.7224 LearningRate 0.000846 Epoch: 6 Global Step: 142990 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:06:43,252-Speed 2497.14 samples/sec Loss 4.7310 LearningRate 0.000846 Epoch: 6 Global Step: 143000 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:06:51,453-Speed 2497.66 samples/sec Loss 4.6779 LearningRate 0.000846 Epoch: 6 Global Step: 143010 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:06:59,664-Speed 2494.79 samples/sec Loss 4.7106 LearningRate 0.000846 Epoch: 6 Global Step: 143020 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:07:07,876-Speed 2494.27 samples/sec Loss 4.6829 LearningRate 0.000846 Epoch: 6 Global Step: 143030 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:07:16,080-Speed 2496.65 samples/sec Loss 4.7693 LearningRate 0.000846 Epoch: 6 Global Step: 143040 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:07:24,227-Speed 2514.17 samples/sec Loss 4.7113 LearningRate 0.000846 Epoch: 6 Global Step: 143050 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:07:32,424-Speed 2498.77 samples/sec Loss 4.5779 
LearningRate 0.000845 Epoch: 6 Global Step: 143060 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:07:40,629-Speed 2496.38 samples/sec Loss 4.6962 LearningRate 0.000845 Epoch: 6 Global Step: 143070 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:07:48,831-Speed 2497.37 samples/sec Loss 4.6164 LearningRate 0.000845 Epoch: 6 Global Step: 143080 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:07:57,030-Speed 2498.59 samples/sec Loss 4.7424 LearningRate 0.000845 Epoch: 6 Global Step: 143090 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:08:05,227-Speed 2498.66 samples/sec Loss 4.6473 LearningRate 0.000845 Epoch: 6 Global Step: 143100 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:08:13,374-Speed 2514.26 samples/sec Loss 4.6045 LearningRate 0.000845 Epoch: 6 Global Step: 143110 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:08:21,572-Speed 2498.62 samples/sec Loss 4.6891 LearningRate 0.000845 Epoch: 6 Global Step: 143120 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:08:29,773-Speed 2497.75 samples/sec Loss 4.6509 LearningRate 0.000845 Epoch: 6 Global Step: 143130 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:08:37,976-Speed 2497.24 samples/sec Loss 4.6301 LearningRate 0.000845 Epoch: 6 Global Step: 143140 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:08:46,181-Speed 2496.53 samples/sec Loss 4.7306 LearningRate 0.000845 Epoch: 6 Global Step: 143150 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:08:54,381-Speed 2497.95 samples/sec Loss 4.7978 LearningRate 0.000845 Epoch: 6 Global Step: 143160 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:09:02,530-Speed 2513.42 samples/sec Loss 4.7584 LearningRate 0.000845 Epoch: 6 Global Step: 143170 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:09:10,737-Speed 2495.94 samples/sec Loss 4.7514 
LearningRate 0.000845 Epoch: 6 Global Step: 143180 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:09:18,940-Speed 2497.06 samples/sec Loss 4.6856 LearningRate 0.000845 Epoch: 6 Global Step: 143190 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:09:27,150-Speed 2494.86 samples/sec Loss 4.7057 LearningRate 0.000845 Epoch: 6 Global Step: 143200 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:09:35,353-Speed 2497.10 samples/sec Loss 4.6711 LearningRate 0.000845 Epoch: 6 Global Step: 143210 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:09:43,558-Speed 2496.30 samples/sec Loss 4.7009 LearningRate 0.000845 Epoch: 6 Global Step: 143220 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:09:51,705-Speed 2514.19 samples/sec Loss 4.6582 LearningRate 0.000845 Epoch: 6 Global Step: 143230 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:09:59,909-Speed 2496.91 samples/sec Loss 4.7129 LearningRate 0.000845 Epoch: 6 Global Step: 143240 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:10:08,109-Speed 2497.82 samples/sec Loss 4.6915 LearningRate 0.000845 Epoch: 6 Global Step: 143250 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:10:17,457-Speed 2500.46 samples/sec Loss 4.6891 LearningRate 0.000845 Epoch: 6 Global Step: 143260 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:10:25,658-Speed 2497.61 samples/sec Loss 4.6183 LearningRate 0.000845 Epoch: 6 Global Step: 143270 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:10:34,466-Speed 2325.51 samples/sec Loss 4.6398 LearningRate 0.000845 Epoch: 6 Global Step: 143280 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:10:43,024-Speed 2393.43 samples/sec Loss 4.7343 LearningRate 0.000845 Epoch: 6 Global Step: 143290 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:10:51,237-Speed 2494.06 samples/sec Loss 4.6662 
LearningRate 0.000845 Epoch: 6 Global Step: 143300 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:10:59,439-Speed 2497.28 samples/sec Loss 4.7075 LearningRate 0.000845 Epoch: 6 Global Step: 143310 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:11:07,640-Speed 2497.68 samples/sec Loss 4.6292 LearningRate 0.000845 Epoch: 6 Global Step: 143320 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:11:15,844-Speed 2496.72 samples/sec Loss 4.6164 LearningRate 0.000845 Epoch: 6 Global Step: 143330 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:11:24,058-Speed 2493.64 samples/sec Loss 4.5925 LearningRate 0.000845 Epoch: 6 Global Step: 143340 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:11:32,205-Speed 2514.05 samples/sec Loss 4.6811 LearningRate 0.000845 Epoch: 6 Global Step: 143350 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:11:40,406-Speed 2497.53 samples/sec Loss 4.7838 LearningRate 0.000845 Epoch: 6 Global Step: 143360 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:11:48,606-Speed 2498.17 samples/sec Loss 4.6676 LearningRate 0.000845 Epoch: 6 Global Step: 143370 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:11:56,811-Speed 2496.55 samples/sec Loss 4.6775 LearningRate 0.000845 Epoch: 6 Global Step: 143380 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:12:05,016-Speed 2496.31 samples/sec Loss 4.7004 LearningRate 0.000845 Epoch: 6 Global Step: 143390 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:12:13,217-Speed 2497.59 samples/sec Loss 4.8328 LearningRate 0.000845 Epoch: 6 Global Step: 143400 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:12:21,369-Speed 2512.77 samples/sec Loss 4.6657 LearningRate 0.000845 Epoch: 6 Global Step: 143410 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:12:29,572-Speed 2497.05 samples/sec Loss 4.6532 
LearningRate 0.000845 Epoch: 6 Global Step: 143420 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:12:37,775-Speed 2497.04 samples/sec Loss 4.5998 LearningRate 0.000845 Epoch: 6 Global Step: 143430 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:12:45,979-Speed 2496.91 samples/sec Loss 4.7449 LearningRate 0.000845 Epoch: 6 Global Step: 143440 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:12:54,176-Speed 2498.83 samples/sec Loss 4.7107 LearningRate 0.000845 Epoch: 6 Global Step: 143450 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:13:02,378-Speed 2497.38 samples/sec Loss 4.6800 LearningRate 0.000844 Epoch: 6 Global Step: 143460 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:13:10,536-Speed 2510.61 samples/sec Loss 4.7734 LearningRate 0.000844 Epoch: 6 Global Step: 143470 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:13:18,739-Speed 2497.04 samples/sec Loss 4.6166 LearningRate 0.000844 Epoch: 6 Global Step: 143480 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:13:26,938-Speed 2498.54 samples/sec Loss 4.7338 LearningRate 0.000844 Epoch: 6 Global Step: 143490 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:13:35,149-Speed 2494.59 samples/sec Loss 4.6273 LearningRate 0.000844 Epoch: 6 Global Step: 143500 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:13:43,348-Speed 2497.98 samples/sec Loss 4.7277 LearningRate 0.000844 Epoch: 6 Global Step: 143510 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:13:51,549-Speed 2497.94 samples/sec Loss 4.6414 LearningRate 0.000844 Epoch: 6 Global Step: 143520 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:13:59,690-Speed 2515.94 samples/sec Loss 4.6824 LearningRate 0.000844 Epoch: 6 Global Step: 143530 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:14:07,904-Speed 2493.70 samples/sec Loss 4.6987 
LearningRate 0.000844 Epoch: 6 Global Step: 143540 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:14:16,103-Speed 2498.25 samples/sec Loss 4.6385 LearningRate 0.000844 Epoch: 6 Global Step: 143550 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:14:24,305-Speed 2497.21 samples/sec Loss 4.6890 LearningRate 0.000844 Epoch: 6 Global Step: 143560 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:14:32,514-Speed 2495.35 samples/sec Loss 4.7060 LearningRate 0.000844 Epoch: 6 Global Step: 143570 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:14:40,739-Speed 2490.39 samples/sec Loss 4.7399 LearningRate 0.000844 Epoch: 6 Global Step: 143580 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:14:48,883-Speed 2514.96 samples/sec Loss 4.6002 LearningRate 0.000844 Epoch: 6 Global Step: 143590 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:14:57,079-Speed 2499.16 samples/sec Loss 4.6702 LearningRate 0.000844 Epoch: 6 Global Step: 143600 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:15:05,279-Speed 2498.15 samples/sec Loss 4.7153 LearningRate 0.000844 Epoch: 6 Global Step: 143610 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:15:13,477-Speed 2498.63 samples/sec Loss 4.7219 LearningRate 0.000844 Epoch: 6 Global Step: 143620 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:15:21,674-Speed 2498.81 samples/sec Loss 4.6288 LearningRate 0.000844 Epoch: 6 Global Step: 143630 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:15:29,881-Speed 2496.07 samples/sec Loss 4.7474 LearningRate 0.000844 Epoch: 6 Global Step: 143640 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:15:38,028-Speed 2514.13 samples/sec Loss 4.6646 LearningRate 0.000844 Epoch: 6 Global Step: 143650 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:15:46,226-Speed 2498.70 samples/sec Loss 4.8028 
LearningRate 0.000844 Epoch: 6 Global Step: 143660 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:15:54,424-Speed 2498.53 samples/sec Loss 4.7107 LearningRate 0.000844 Epoch: 6 Global Step: 143670 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:16:02,625-Speed 2497.53 samples/sec Loss 4.7130 LearningRate 0.000844 Epoch: 6 Global Step: 143680 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:16:10,824-Speed 2498.52 samples/sec Loss 4.6536 LearningRate 0.000844 Epoch: 6 Global Step: 143690 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:16:19,023-Speed 2498.06 samples/sec Loss 4.6240 LearningRate 0.000844 Epoch: 6 Global Step: 143700 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:16:27,169-Speed 2515.00 samples/sec Loss 4.6854 LearningRate 0.000844 Epoch: 6 Global Step: 143710 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:16:35,366-Speed 2498.97 samples/sec Loss 4.6366 LearningRate 0.000844 Epoch: 6 Global Step: 143720 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:16:43,581-Speed 2493.21 samples/sec Loss 4.7454 LearningRate 0.000844 Epoch: 6 Global Step: 143730 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:16:51,780-Speed 2498.31 samples/sec Loss 4.7033 LearningRate 0.000844 Epoch: 6 Global Step: 143740 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:16:59,986-Speed 2496.34 samples/sec Loss 4.6825 LearningRate 0.000844 Epoch: 6 Global Step: 143750 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:17:08,191-Speed 2496.37 samples/sec Loss 4.8278 LearningRate 0.000844 Epoch: 6 Global Step: 143760 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:17:16,353-Speed 2509.66 samples/sec Loss 4.7056 LearningRate 0.000844 Epoch: 6 Global Step: 143770 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:17:24,556-Speed 2496.84 samples/sec Loss 4.6149 
LearningRate 0.000844 Epoch: 6 Global Step: 143780 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:17:32,758-Speed 2497.61 samples/sec Loss 4.6431 LearningRate 0.000844 Epoch: 6 Global Step: 143790 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:17:40,962-Speed 2496.86 samples/sec Loss 4.6630 LearningRate 0.000844 Epoch: 6 Global Step: 143800 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:17:49,176-Speed 2493.57 samples/sec Loss 4.7066 LearningRate 0.000844 Epoch: 6 Global Step: 143810 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:17:57,379-Speed 2496.92 samples/sec Loss 4.7195 LearningRate 0.000844 Epoch: 6 Global Step: 143820 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:18:05,527-Speed 2514.12 samples/sec Loss 4.6552 LearningRate 0.000844 Epoch: 6 Global Step: 143830 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:18:13,727-Speed 2497.82 samples/sec Loss 4.8048 LearningRate 0.000844 Epoch: 6 Global Step: 143840 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:18:21,930-Speed 2496.88 samples/sec Loss 4.7161 LearningRate 0.000844 Epoch: 6 Global Step: 143850 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:18:30,132-Speed 2497.36 samples/sec Loss 4.7152 LearningRate 0.000844 Epoch: 6 Global Step: 143860 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:18:38,344-Speed 2494.70 samples/sec Loss 4.6854 LearningRate 0.000843 Epoch: 6 Global Step: 143870 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:18:46,547-Speed 2496.91 samples/sec Loss 4.6837 LearningRate 0.000843 Epoch: 6 Global Step: 143880 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:18:54,695-Speed 2514.07 samples/sec Loss 4.6907 LearningRate 0.000843 Epoch: 6 Global Step: 143890 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:19:02,898-Speed 2496.88 samples/sec Loss 4.6500 
LearningRate 0.000843 Epoch: 6 Global Step: 143900 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:19:11,097-Speed 2498.35 samples/sec Loss 4.7039 LearningRate 0.000843 Epoch: 6 Global Step: 143910 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:19:19,317-Speed 2491.83 samples/sec Loss 4.6685 LearningRate 0.000843 Epoch: 6 Global Step: 143920 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:19:27,511-Speed 2499.51 samples/sec Loss 4.7051 LearningRate 0.000843 Epoch: 6 Global Step: 143930 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:19:35,710-Speed 2498.47 samples/sec Loss 4.7380 LearningRate 0.000843 Epoch: 6 Global Step: 143940 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:19:43,850-Speed 2516.60 samples/sec Loss 4.6988 LearningRate 0.000843 Epoch: 6 Global Step: 143950 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:19:52,049-Speed 2498.18 samples/sec Loss 4.7417 LearningRate 0.000843 Epoch: 6 Global Step: 143960 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:20:00,258-Speed 2495.61 samples/sec Loss 4.6772 LearningRate 0.000843 Epoch: 6 Global Step: 143970 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:20:08,456-Speed 2498.45 samples/sec Loss 4.6388 LearningRate 0.000843 Epoch: 6 Global Step: 143980 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:20:16,658-Speed 2497.29 samples/sec Loss 4.6872 LearningRate 0.000843 Epoch: 6 Global Step: 143990 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:20:24,855-Speed 2498.85 samples/sec Loss 4.6848 LearningRate 0.000843 Epoch: 6 Global Step: 144000 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:20:33,002-Speed 2514.11 samples/sec Loss 4.7107 LearningRate 0.000843 Epoch: 6 Global Step: 144010 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:20:41,208-Speed 2496.11 samples/sec Loss 4.6303 
LearningRate 0.000843 Epoch: 6 Global Step: 144020 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:20:49,411-Speed 2497.11 samples/sec Loss 4.6803 LearningRate 0.000843 Epoch: 6 Global Step: 144030 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:20:57,614-Speed 2497.18 samples/sec Loss 4.6620 LearningRate 0.000843 Epoch: 6 Global Step: 144040 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:21:05,814-Speed 2497.89 samples/sec Loss 4.7680 LearningRate 0.000843 Epoch: 6 Global Step: 144050 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:21:14,014-Speed 2498.03 samples/sec Loss 4.7367 LearningRate 0.000843 Epoch: 6 Global Step: 144060 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:21:22,164-Speed 2513.37 samples/sec Loss 4.7284 LearningRate 0.000843 Epoch: 6 Global Step: 144070 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:21:30,366-Speed 2497.18 samples/sec Loss 4.6819 LearningRate 0.000843 Epoch: 6 Global Step: 144080 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:21:38,571-Speed 2496.66 samples/sec Loss 4.6074 LearningRate 0.000843 Epoch: 6 Global Step: 144090 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:21:46,774-Speed 2496.92 samples/sec Loss 4.6351 LearningRate 0.000843 Epoch: 6 Global Step: 144100 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:21:54,981-Speed 2495.88 samples/sec Loss 4.6050 LearningRate 0.000843 Epoch: 6 Global Step: 144110 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:22:03,180-Speed 2498.64 samples/sec Loss 4.6581 LearningRate 0.000843 Epoch: 6 Global Step: 144120 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:22:11,329-Speed 2513.58 samples/sec Loss 4.7374 LearningRate 0.000843 Epoch: 6 Global Step: 144130 Fp16 Grad Scale: 65536 Required: 157 hours Training: 2022-07-06 23:22:19,488-Speed 2510.58 samples/sec Loss 4.7236 
LearningRate 0.000843 Epoch: 6 Global Step: 144140 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:22:27,685-Speed 2498.76 samples/sec Loss 4.7346 LearningRate 0.000843 Epoch: 6 Global Step: 144150 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:22:35,884-Speed 2498.05 samples/sec Loss 4.6630 LearningRate 0.000843 Epoch: 6 Global Step: 144160 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:22:44,094-Speed 2494.98 samples/sec Loss 4.6404 LearningRate 0.000843 Epoch: 6 Global Step: 144170 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:22:52,290-Speed 2499.02 samples/sec Loss 4.6502 LearningRate 0.000843 Epoch: 6 Global Step: 144180 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:23:00,434-Speed 2515.42 samples/sec Loss 4.6255 LearningRate 0.000843 Epoch: 6 Global Step: 144190 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:23:08,636-Speed 2497.29 samples/sec Loss 4.7045 LearningRate 0.000843 Epoch: 6 Global Step: 144200 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:23:16,844-Speed 2495.69 samples/sec Loss 4.6783 LearningRate 0.000843 Epoch: 6 Global Step: 144210 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:23:25,043-Speed 2498.16 samples/sec Loss 4.7244 LearningRate 0.000843 Epoch: 6 Global Step: 144220 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:23:33,246-Speed 2496.95 samples/sec Loss 4.6101 LearningRate 0.000843 Epoch: 6 Global Step: 144230 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:23:41,449-Speed 2497.13 samples/sec Loss 4.6716 LearningRate 0.000843 Epoch: 6 Global Step: 144240 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:23:49,600-Speed 2513.18 samples/sec Loss 4.6963 LearningRate 0.000843 Epoch: 6 Global Step: 144250 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:23:57,801-Speed 2497.62 samples/sec Loss 4.6231 
LearningRate 0.000843 Epoch: 6 Global Step: 144260 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:24:06,005-Speed 2496.80 samples/sec Loss 4.7025 LearningRate 0.000842 Epoch: 6 Global Step: 144270 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:24:14,209-Speed 2496.51 samples/sec Loss 4.6146 LearningRate 0.000842 Epoch: 6 Global Step: 144280 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:24:22,423-Speed 2493.90 samples/sec Loss 4.6715 LearningRate 0.000842 Epoch: 6 Global Step: 144290 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:24:30,624-Speed 2497.28 samples/sec Loss 4.6689 LearningRate 0.000842 Epoch: 6 Global Step: 144300 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:24:38,774-Speed 2513.55 samples/sec Loss 4.6532 LearningRate 0.000842 Epoch: 6 Global Step: 144310 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:24:46,982-Speed 2495.43 samples/sec Loss 4.6752 LearningRate 0.000842 Epoch: 6 Global Step: 144320 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:24:55,184-Speed 2497.59 samples/sec Loss 4.6322 LearningRate 0.000842 Epoch: 6 Global Step: 144330 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:25:03,382-Speed 2498.68 samples/sec Loss 4.6519 LearningRate 0.000842 Epoch: 6 Global Step: 144340 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:25:11,580-Speed 2498.53 samples/sec Loss 4.6618 LearningRate 0.000842 Epoch: 6 Global Step: 144350 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:25:19,779-Speed 2498.12 samples/sec Loss 4.6858 LearningRate 0.000842 Epoch: 6 Global Step: 144360 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:25:27,924-Speed 2514.97 samples/sec Loss 4.5719 LearningRate 0.000842 Epoch: 6 Global Step: 144370 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:25:36,139-Speed 2493.18 samples/sec Loss 4.7122 
LearningRate 0.000842 Epoch: 6 Global Step: 144380 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:25:44,341-Speed 2497.36 samples/sec Loss 4.7138 LearningRate 0.000842 Epoch: 6 Global Step: 144390 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:25:52,539-Speed 2498.86 samples/sec Loss 4.7265 LearningRate 0.000842 Epoch: 6 Global Step: 144400 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:26:00,733-Speed 2499.58 samples/sec Loss 4.7304 LearningRate 0.000842 Epoch: 6 Global Step: 144410 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:26:08,934-Speed 2497.50 samples/sec Loss 4.7210 LearningRate 0.000842 Epoch: 6 Global Step: 144420 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:26:17,083-Speed 2513.68 samples/sec Loss 4.7912 LearningRate 0.000842 Epoch: 6 Global Step: 144430 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:26:25,282-Speed 2498.32 samples/sec Loss 4.7631 LearningRate 0.000842 Epoch: 6 Global Step: 144440 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:26:33,479-Speed 2498.95 samples/sec Loss 4.7453 LearningRate 0.000842 Epoch: 6 Global Step: 144450 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:26:41,676-Speed 2498.84 samples/sec Loss 4.7171 LearningRate 0.000842 Epoch: 6 Global Step: 144460 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:26:49,876-Speed 2497.87 samples/sec Loss 4.7256 LearningRate 0.000842 Epoch: 6 Global Step: 144470 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:26:58,075-Speed 2498.54 samples/sec Loss 4.7151 LearningRate 0.000842 Epoch: 6 Global Step: 144480 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:27:06,228-Speed 2512.43 samples/sec Loss 4.6844 LearningRate 0.000842 Epoch: 6 Global Step: 144490 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:27:14,426-Speed 2498.54 samples/sec Loss 4.8031 
LearningRate 0.000842 Epoch: 6 Global Step: 144500 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:27:22,625-Speed 2498.24 samples/sec Loss 4.7143 LearningRate 0.000842 Epoch: 6 Global Step: 144510 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:27:30,824-Speed 2498.29 samples/sec Loss 4.7111 LearningRate 0.000842 Epoch: 6 Global Step: 144520 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:27:39,029-Speed 2496.60 samples/sec Loss 4.6268 LearningRate 0.000842 Epoch: 6 Global Step: 144530 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:27:47,229-Speed 2497.89 samples/sec Loss 4.6212 LearningRate 0.000842 Epoch: 6 Global Step: 144540 Fp16 Grad Scale: 32768 Required: 157 hours Training: 2022-07-06 23:27:55,377-Speed 2514.01 samples/sec Loss 4.6480 LearningRate 0.000842 Epoch: 6 Global Step: 144550 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:28:03,575-Speed 2498.62 samples/sec Loss 4.6452 LearningRate 0.000842 Epoch: 6 Global Step: 144560 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:28:11,779-Speed 2496.87 samples/sec Loss 4.6532 LearningRate 0.000842 Epoch: 6 Global Step: 144570 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:28:19,979-Speed 2497.98 samples/sec Loss 4.6584 LearningRate 0.000842 Epoch: 6 Global Step: 144580 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:28:28,177-Speed 2498.48 samples/sec Loss 4.6069 LearningRate 0.000842 Epoch: 6 Global Step: 144590 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:28:36,375-Speed 2498.48 samples/sec Loss 4.6282 LearningRate 0.000842 Epoch: 6 Global Step: 144600 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:28:44,520-Speed 2514.94 samples/sec Loss 4.7147 LearningRate 0.000842 Epoch: 6 Global Step: 144610 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:28:52,718-Speed 2498.52 samples/sec Loss 4.7118 
LearningRate 0.000842 Epoch: 6 Global Step: 144620 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:29:00,922-Speed 2496.71 samples/sec Loss 4.7051 LearningRate 0.000842 Epoch: 6 Global Step: 144630 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:29:09,118-Speed 2499.18 samples/sec Loss 4.6767 LearningRate 0.000842 Epoch: 6 Global Step: 144640 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:29:17,317-Speed 2498.47 samples/sec Loss 4.5845 LearningRate 0.000842 Epoch: 6 Global Step: 144650 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:29:25,514-Speed 2498.69 samples/sec Loss 4.6338 LearningRate 0.000842 Epoch: 6 Global Step: 144660 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:29:33,660-Speed 2514.76 samples/sec Loss 4.6475 LearningRate 0.000842 Epoch: 6 Global Step: 144670 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:29:41,859-Speed 2498.10 samples/sec Loss 4.7073 LearningRate 0.000841 Epoch: 6 Global Step: 144680 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:29:50,057-Speed 2498.54 samples/sec Loss 4.6890 LearningRate 0.000841 Epoch: 6 Global Step: 144690 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:29:58,268-Speed 2494.74 samples/sec Loss 4.6476 LearningRate 0.000841 Epoch: 6 Global Step: 144700 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:30:06,467-Speed 2498.18 samples/sec Loss 4.6982 LearningRate 0.000841 Epoch: 6 Global Step: 144710 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:30:14,681-Speed 2494.09 samples/sec Loss 4.6143 LearningRate 0.000841 Epoch: 6 Global Step: 144720 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:30:22,837-Speed 2511.33 samples/sec Loss 4.6675 LearningRate 0.000841 Epoch: 6 Global Step: 144730 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:30:31,040-Speed 2497.14 samples/sec Loss 4.6347 
LearningRate 0.000841 Epoch: 6 Global Step: 144740 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:30:39,236-Speed 2499.04 samples/sec Loss 4.5915 LearningRate 0.000841 Epoch: 6 Global Step: 144750 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:30:47,438-Speed 2497.32 samples/sec Loss 4.7225 LearningRate 0.000841 Epoch: 6 Global Step: 144760 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:30:55,641-Speed 2497.38 samples/sec Loss 4.6065 LearningRate 0.000841 Epoch: 6 Global Step: 144770 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:31:03,841-Speed 2497.84 samples/sec Loss 4.7471 LearningRate 0.000841 Epoch: 6 Global Step: 144780 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:31:11,991-Speed 2513.26 samples/sec Loss 4.6782 LearningRate 0.000841 Epoch: 6 Global Step: 144790 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:31:20,193-Speed 2497.52 samples/sec Loss 4.6410 LearningRate 0.000841 Epoch: 6 Global Step: 144800 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:31:28,392-Speed 2498.85 samples/sec Loss 4.6239 LearningRate 0.000841 Epoch: 6 Global Step: 144810 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:31:36,592-Speed 2497.97 samples/sec Loss 4.5952 LearningRate 0.000841 Epoch: 6 Global Step: 144820 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:31:44,790-Speed 2498.64 samples/sec Loss 4.6680 LearningRate 0.000841 Epoch: 6 Global Step: 144830 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:31:52,989-Speed 2498.23 samples/sec Loss 4.5714 LearningRate 0.000841 Epoch: 6 Global Step: 144840 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:32:01,137-Speed 2513.55 samples/sec Loss 4.6466 LearningRate 0.000841 Epoch: 6 Global Step: 144850 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:32:09,337-Speed 2498.06 samples/sec Loss 4.5837 
LearningRate 0.000841 Epoch: 6 Global Step: 144860 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:32:17,539-Speed 2497.81 samples/sec Loss 4.6642 LearningRate 0.000841 Epoch: 6 Global Step: 144870 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:32:25,737-Speed 2498.26 samples/sec Loss 4.6049 LearningRate 0.000841 Epoch: 6 Global Step: 144880 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:32:33,940-Speed 2497.20 samples/sec Loss 4.6188 LearningRate 0.000841 Epoch: 6 Global Step: 144890 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:32:42,140-Speed 2498.08 samples/sec Loss 4.6466 LearningRate 0.000841 Epoch: 6 Global Step: 144900 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:32:50,301-Speed 2510.47 samples/sec Loss 4.6536 LearningRate 0.000841 Epoch: 6 Global Step: 144910 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:32:58,502-Speed 2497.67 samples/sec Loss 4.7236 LearningRate 0.000841 Epoch: 6 Global Step: 144920 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:33:06,699-Speed 2498.88 samples/sec Loss 4.6770 LearningRate 0.000841 Epoch: 6 Global Step: 144930 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:33:14,898-Speed 2498.52 samples/sec Loss 4.6068 LearningRate 0.000841 Epoch: 6 Global Step: 144940 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:33:23,111-Speed 2493.98 samples/sec Loss 4.7008 LearningRate 0.000841 Epoch: 6 Global Step: 144950 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:33:31,310-Speed 2498.24 samples/sec Loss 4.6241 LearningRate 0.000841 Epoch: 6 Global Step: 144960 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:33:39,457-Speed 2514.18 samples/sec Loss 4.6263 LearningRate 0.000841 Epoch: 6 Global Step: 144970 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:33:47,660-Speed 2496.94 samples/sec Loss 4.6851 
LearningRate 0.000841 Epoch: 6 Global Step: 144980 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:33:55,860-Speed 2498.07 samples/sec Loss 4.6377 LearningRate 0.000841 Epoch: 6 Global Step: 144990 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:34:04,065-Speed 2496.39 samples/sec Loss 4.6401 LearningRate 0.000841 Epoch: 6 Global Step: 145000 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:34:12,268-Speed 2498.07 samples/sec Loss 4.6884 LearningRate 0.000841 Epoch: 6 Global Step: 145010 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:34:20,471-Speed 2497.23 samples/sec Loss 4.5785 LearningRate 0.000841 Epoch: 6 Global Step: 145020 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:34:28,620-Speed 2513.41 samples/sec Loss 4.6011 LearningRate 0.000841 Epoch: 6 Global Step: 145030 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:34:36,833-Speed 2494.48 samples/sec Loss 4.6740 LearningRate 0.000841 Epoch: 6 Global Step: 145040 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:34:45,045-Speed 2494.14 samples/sec Loss 4.6784 LearningRate 0.000841 Epoch: 6 Global Step: 145050 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:34:53,256-Speed 2494.76 samples/sec Loss 4.6918 LearningRate 0.000841 Epoch: 6 Global Step: 145060 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:35:01,455-Speed 2498.35 samples/sec Loss 4.6132 LearningRate 0.000841 Epoch: 6 Global Step: 145070 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:35:09,657-Speed 2497.19 samples/sec Loss 4.6970 LearningRate 0.000841 Epoch: 6 Global Step: 145080 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:35:17,802-Speed 2514.88 samples/sec Loss 4.7047 LearningRate 0.000840 Epoch: 6 Global Step: 145090 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:35:26,003-Speed 2497.68 samples/sec Loss 4.6362 
LearningRate 0.000840 Epoch: 6 Global Step: 145100 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:35:34,204-Speed 2497.87 samples/sec Loss 4.6336 LearningRate 0.000840 Epoch: 6 Global Step: 145110 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:35:42,402-Speed 2498.72 samples/sec Loss 4.6658 LearningRate 0.000840 Epoch: 6 Global Step: 145120 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:35:50,609-Speed 2495.77 samples/sec Loss 4.6570 LearningRate 0.000840 Epoch: 6 Global Step: 145130 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:35:58,811-Speed 2497.67 samples/sec Loss 4.6482 LearningRate 0.000840 Epoch: 6 Global Step: 145140 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:36:06,957-Speed 2514.76 samples/sec Loss 4.5880 LearningRate 0.000840 Epoch: 6 Global Step: 145150 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:36:15,165-Speed 2495.43 samples/sec Loss 4.6542 LearningRate 0.000840 Epoch: 6 Global Step: 145160 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:36:23,365-Speed 2497.95 samples/sec Loss 4.7139 LearningRate 0.000840 Epoch: 6 Global Step: 145170 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:36:33,804-Speed 1962.35 samples/sec Loss 4.6329 LearningRate 0.000840 Epoch: 7 Global Step: 145180 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:36:42,001-Speed 2498.84 samples/sec Loss 4.6589 LearningRate 0.000840 Epoch: 7 Global Step: 145190 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:36:50,192-Speed 2500.69 samples/sec Loss 4.6709 LearningRate 0.000840 Epoch: 7 Global Step: 145200 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:36:58,335-Speed 2515.57 samples/sec Loss 4.6543 LearningRate 0.000840 Epoch: 7 Global Step: 145210 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:37:06,534-Speed 2498.15 samples/sec Loss 4.7663 
LearningRate 0.000840 Epoch: 7 Global Step: 145220 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:37:14,729-Speed 2499.55 samples/sec Loss 4.7529 LearningRate 0.000840 Epoch: 7 Global Step: 145230 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:37:22,926-Speed 2498.85 samples/sec Loss 4.6896 LearningRate 0.000840 Epoch: 7 Global Step: 145240 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:37:31,123-Speed 2498.71 samples/sec Loss 4.6403 LearningRate 0.000840 Epoch: 7 Global Step: 145250 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:37:39,321-Speed 2498.41 samples/sec Loss 4.6246 LearningRate 0.000840 Epoch: 7 Global Step: 145260 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:37:47,469-Speed 2513.96 samples/sec Loss 4.6174 LearningRate 0.000840 Epoch: 7 Global Step: 145270 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:37:55,668-Speed 2498.37 samples/sec Loss 4.7190 LearningRate 0.000840 Epoch: 7 Global Step: 145280 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:38:03,869-Speed 2497.62 samples/sec Loss 4.7513 LearningRate 0.000840 Epoch: 7 Global Step: 145290 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:38:12,069-Speed 2497.89 samples/sec Loss 4.6708 LearningRate 0.000840 Epoch: 7 Global Step: 145300 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:38:20,265-Speed 2499.05 samples/sec Loss 4.7323 LearningRate 0.000840 Epoch: 7 Global Step: 145310 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:38:28,466-Speed 2497.71 samples/sec Loss 4.5956 LearningRate 0.000840 Epoch: 7 Global Step: 145320 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:38:36,614-Speed 2513.92 samples/sec Loss 4.6857 LearningRate 0.000840 Epoch: 7 Global Step: 145330 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-06 23:38:44,810-Speed 2499.08 samples/sec Loss 4.6781 
LearningRate 0.000840 Epoch: 7 Global Step: 145340 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:38:53,023-Speed 2494.23 samples/sec Loss 4.7098 LearningRate 0.000840 Epoch: 7 Global Step: 145350 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:39:01,219-Speed 2499.21 samples/sec Loss 4.6107 LearningRate 0.000840 Epoch: 7 Global Step: 145360 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:39:09,415-Speed 2499.02 samples/sec Loss 4.6080 LearningRate 0.000840 Epoch: 7 Global Step: 145370 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:39:17,609-Speed 2499.78 samples/sec Loss 4.6409 LearningRate 0.000840 Epoch: 7 Global Step: 145380 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:39:25,752-Speed 2515.42 samples/sec Loss 4.7422 LearningRate 0.000840 Epoch: 7 Global Step: 145390 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:39:33,964-Speed 2494.32 samples/sec Loss 4.6198 LearningRate 0.000840 Epoch: 7 Global Step: 145400 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:39:42,163-Speed 2498.58 samples/sec Loss 4.6352 LearningRate 0.000840 Epoch: 7 Global Step: 145410 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:39:50,365-Speed 2497.30 samples/sec Loss 4.6603 LearningRate 0.000840 Epoch: 7 Global Step: 145420 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:39:58,565-Speed 2497.93 samples/sec Loss 4.6831 LearningRate 0.000840 Epoch: 7 Global Step: 145430 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:40:06,769-Speed 2496.76 samples/sec Loss 4.5728 LearningRate 0.000840 Epoch: 7 Global Step: 145440 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:40:14,915-Speed 2514.39 samples/sec Loss 4.5792 LearningRate 0.000840 Epoch: 7 Global Step: 145450 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:40:23,114-Speed 2498.40 samples/sec Loss 4.5803 
LearningRate 0.000840 Epoch: 7 Global Step: 145460 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:40:31,312-Speed 2499.06 samples/sec Loss 4.6320 LearningRate 0.000840 Epoch: 7 Global Step: 145470 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:40:39,510-Speed 2498.64 samples/sec Loss 4.6358 LearningRate 0.000840 Epoch: 7 Global Step: 145480 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:40:47,708-Speed 2498.47 samples/sec Loss 4.6431 LearningRate 0.000840 Epoch: 7 Global Step: 145490 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:40:55,905-Speed 2498.75 samples/sec Loss 4.6659 LearningRate 0.000839 Epoch: 7 Global Step: 145500 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:41:04,048-Speed 2515.63 samples/sec Loss 4.6999 LearningRate 0.000839 Epoch: 7 Global Step: 145510 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:41:12,245-Speed 2498.85 samples/sec Loss 4.6385 LearningRate 0.000839 Epoch: 7 Global Step: 145520 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:41:20,442-Speed 2498.76 samples/sec Loss 4.6282 LearningRate 0.000839 Epoch: 7 Global Step: 145530 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:41:28,638-Speed 2499.24 samples/sec Loss 4.5848 LearningRate 0.000839 Epoch: 7 Global Step: 145540 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:41:36,835-Speed 2498.81 samples/sec Loss 4.6472 LearningRate 0.000839 Epoch: 7 Global Step: 145550 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:41:45,031-Speed 2499.59 samples/sec Loss 4.6837 LearningRate 0.000839 Epoch: 7 Global Step: 145560 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:41:53,173-Speed 2515.49 samples/sec Loss 4.7122 LearningRate 0.000839 Epoch: 7 Global Step: 145570 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:42:01,371-Speed 2498.80 samples/sec Loss 4.6958 
LearningRate 0.000839 Epoch: 7 Global Step: 145580 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:42:09,567-Speed 2499.11 samples/sec Loss 4.6821 LearningRate 0.000839 Epoch: 7 Global Step: 145590 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:42:17,761-Speed 2499.84 samples/sec Loss 4.7313 LearningRate 0.000839 Epoch: 7 Global Step: 145600 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:42:25,957-Speed 2499.11 samples/sec Loss 4.6757 LearningRate 0.000839 Epoch: 7 Global Step: 145610 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:42:34,158-Speed 2497.86 samples/sec Loss 4.6348 LearningRate 0.000839 Epoch: 7 Global Step: 145620 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:42:42,297-Speed 2516.79 samples/sec Loss 4.7255 LearningRate 0.000839 Epoch: 7 Global Step: 145630 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:42:50,851-Speed 2394.37 samples/sec Loss 4.6032 LearningRate 0.000839 Epoch: 7 Global Step: 145640 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:42:59,049-Speed 2498.54 samples/sec Loss 4.7666 LearningRate 0.000839 Epoch: 7 Global Step: 145650 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:43:07,681-Speed 2492.93 samples/sec Loss 4.6232 LearningRate 0.000839 Epoch: 7 Global Step: 145660 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:43:15,879-Speed 2498.31 samples/sec Loss 4.6625 LearningRate 0.000839 Epoch: 7 Global Step: 145670 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:43:24,082-Speed 2497.08 samples/sec Loss 4.7574 LearningRate 0.000839 Epoch: 7 Global Step: 145680 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:43:32,228-Speed 2514.61 samples/sec Loss 4.6507 LearningRate 0.000839 Epoch: 7 Global Step: 145690 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:43:40,426-Speed 2498.44 samples/sec Loss 4.6769 
LearningRate 0.000839 Epoch: 7 Global Step: 145700 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:43:48,625-Speed 2498.42 samples/sec Loss 4.6777 LearningRate 0.000839 Epoch: 7 Global Step: 145710 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:43:56,826-Speed 2497.65 samples/sec Loss 4.6482 LearningRate 0.000839 Epoch: 7 Global Step: 145720 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:44:05,026-Speed 2497.90 samples/sec Loss 4.7281 LearningRate 0.000839 Epoch: 7 Global Step: 145730 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:44:13,227-Speed 2497.61 samples/sec Loss 4.6415 LearningRate 0.000839 Epoch: 7 Global Step: 145740 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:44:21,373-Speed 2514.43 samples/sec Loss 4.5671 LearningRate 0.000839 Epoch: 7 Global Step: 145750 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:44:29,578-Speed 2496.48 samples/sec Loss 4.5683 LearningRate 0.000839 Epoch: 7 Global Step: 145760 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:44:37,782-Speed 2496.86 samples/sec Loss 4.7519 LearningRate 0.000839 Epoch: 7 Global Step: 145770 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:44:45,988-Speed 2496.41 samples/sec Loss 4.7415 LearningRate 0.000839 Epoch: 7 Global Step: 145780 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:44:54,199-Speed 2494.45 samples/sec Loss 4.6781 LearningRate 0.000839 Epoch: 7 Global Step: 145790 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:45:02,397-Speed 2498.59 samples/sec Loss 4.6555 LearningRate 0.000839 Epoch: 7 Global Step: 145800 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:45:10,543-Speed 2514.56 samples/sec Loss 4.7002 LearningRate 0.000839 Epoch: 7 Global Step: 145810 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:45:18,744-Speed 2497.82 samples/sec Loss 4.6109 
LearningRate 0.000839 Epoch: 7 Global Step: 145820 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:45:26,947-Speed 2497.29 samples/sec Loss 4.6671 LearningRate 0.000839 Epoch: 7 Global Step: 145830 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:45:35,143-Speed 2499.14 samples/sec Loss 4.6564 LearningRate 0.000839 Epoch: 7 Global Step: 145840 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:45:43,341-Speed 2498.85 samples/sec Loss 4.6859 LearningRate 0.000839 Epoch: 7 Global Step: 145850 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:45:51,544-Speed 2496.81 samples/sec Loss 4.6849 LearningRate 0.000839 Epoch: 7 Global Step: 145860 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:45:59,687-Speed 2515.59 samples/sec Loss 4.5372 LearningRate 0.000839 Epoch: 7 Global Step: 145870 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:46:07,901-Speed 2493.95 samples/sec Loss 4.7371 LearningRate 0.000839 Epoch: 7 Global Step: 145880 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:46:16,098-Speed 2498.76 samples/sec Loss 4.7770 LearningRate 0.000839 Epoch: 7 Global Step: 145890 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:46:24,296-Speed 2498.76 samples/sec Loss 4.6322 LearningRate 0.000838 Epoch: 7 Global Step: 145900 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:46:32,493-Speed 2499.04 samples/sec Loss 4.6229 LearningRate 0.000838 Epoch: 7 Global Step: 145910 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:46:40,693-Speed 2497.79 samples/sec Loss 4.6576 LearningRate 0.000838 Epoch: 7 Global Step: 145920 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:46:48,840-Speed 2514.51 samples/sec Loss 4.6865 LearningRate 0.000838 Epoch: 7 Global Step: 145930 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:46:57,036-Speed 2498.92 samples/sec Loss 4.5640 
LearningRate 0.000838 Epoch: 7 Global Step: 145940 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:47:05,233-Speed 2498.71 samples/sec Loss 4.6028 LearningRate 0.000838 Epoch: 7 Global Step: 145950 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:47:13,456-Speed 2491.20 samples/sec Loss 4.5790 LearningRate 0.000838 Epoch: 7 Global Step: 145960 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:47:21,651-Speed 2499.35 samples/sec Loss 4.6518 LearningRate 0.000838 Epoch: 7 Global Step: 145970 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:47:29,863-Speed 2494.32 samples/sec Loss 4.5959 LearningRate 0.000838 Epoch: 7 Global Step: 145980 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:47:38,009-Speed 2514.58 samples/sec Loss 4.6460 LearningRate 0.000838 Epoch: 7 Global Step: 145990 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:47:46,206-Speed 2498.85 samples/sec Loss 4.6373 LearningRate 0.000838 Epoch: 7 Global Step: 146000 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:47:54,402-Speed 2499.14 samples/sec Loss 4.6022 LearningRate 0.000838 Epoch: 7 Global Step: 146010 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:48:02,604-Speed 2497.50 samples/sec Loss 4.6658 LearningRate 0.000838 Epoch: 7 Global Step: 146020 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:48:10,801-Speed 2498.78 samples/sec Loss 4.6174 LearningRate 0.000838 Epoch: 7 Global Step: 146030 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:48:19,001-Speed 2498.29 samples/sec Loss 4.5740 LearningRate 0.000838 Epoch: 7 Global Step: 146040 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:48:27,144-Speed 2515.37 samples/sec Loss 4.6034 LearningRate 0.000838 Epoch: 7 Global Step: 146050 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:48:35,341-Speed 2498.81 samples/sec Loss 4.6308 
LearningRate 0.000838 Epoch: 7 Global Step: 146060 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:48:43,546-Speed 2496.89 samples/sec Loss 4.6739 LearningRate 0.000838 Epoch: 7 Global Step: 146070 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:48:51,742-Speed 2498.92 samples/sec Loss 4.6759 LearningRate 0.000838 Epoch: 7 Global Step: 146080 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:48:59,939-Speed 2498.99 samples/sec Loss 4.6005 LearningRate 0.000838 Epoch: 7 Global Step: 146090 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:49:08,139-Speed 2498.01 samples/sec Loss 4.5992 LearningRate 0.000838 Epoch: 7 Global Step: 146100 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:49:16,285-Speed 2514.59 samples/sec Loss 4.5674 LearningRate 0.000838 Epoch: 7 Global Step: 146110 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:49:24,482-Speed 2498.91 samples/sec Loss 4.5298 LearningRate 0.000838 Epoch: 7 Global Step: 146120 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:49:32,676-Speed 2499.55 samples/sec Loss 4.6262 LearningRate 0.000838 Epoch: 7 Global Step: 146130 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:49:40,870-Speed 2500.00 samples/sec Loss 4.6983 LearningRate 0.000838 Epoch: 7 Global Step: 146140 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:49:49,074-Speed 2496.66 samples/sec Loss 4.6185 LearningRate 0.000838 Epoch: 7 Global Step: 146150 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:49:57,275-Speed 2497.61 samples/sec Loss 4.6349 LearningRate 0.000838 Epoch: 7 Global Step: 146160 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:50:05,421-Speed 2514.54 samples/sec Loss 4.8417 LearningRate 0.000838 Epoch: 7 Global Step: 146170 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:50:13,617-Speed 2499.14 samples/sec Loss 4.5984 
LearningRate 0.000838 Epoch: 7 Global Step: 146180 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:50:21,815-Speed 2498.82 samples/sec Loss 4.7102 LearningRate 0.000838 Epoch: 7 Global Step: 146190 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:50:30,011-Speed 2498.89 samples/sec Loss 4.7982 LearningRate 0.000838 Epoch: 7 Global Step: 146200 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:50:38,208-Speed 2498.97 samples/sec Loss 4.6875 LearningRate 0.000838 Epoch: 7 Global Step: 146210 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:50:46,405-Speed 2498.73 samples/sec Loss 4.7832 LearningRate 0.000838 Epoch: 7 Global Step: 146220 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:50:54,563-Speed 2510.80 samples/sec Loss 4.8111 LearningRate 0.000838 Epoch: 7 Global Step: 146230 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:51:02,763-Speed 2498.39 samples/sec Loss 4.7159 LearningRate 0.000838 Epoch: 7 Global Step: 146240 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:51:10,961-Speed 2498.69 samples/sec Loss 4.6780 LearningRate 0.000838 Epoch: 7 Global Step: 146250 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:51:19,160-Speed 2498.24 samples/sec Loss 4.6238 LearningRate 0.000838 Epoch: 7 Global Step: 146260 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:51:27,372-Speed 2494.11 samples/sec Loss 4.6913 LearningRate 0.000838 Epoch: 7 Global Step: 146270 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:51:35,570-Speed 2498.91 samples/sec Loss 4.6379 LearningRate 0.000838 Epoch: 7 Global Step: 146280 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:51:43,713-Speed 2515.18 samples/sec Loss 4.6048 LearningRate 0.000838 Epoch: 7 Global Step: 146290 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:51:51,910-Speed 2499.11 samples/sec Loss 4.7147 
LearningRate 0.000838 Epoch: 7 Global Step: 146300 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:52:00,111-Speed 2497.77 samples/sec Loss 4.6759 LearningRate 0.000837 Epoch: 7 Global Step: 146310 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:52:08,307-Speed 2499.48 samples/sec Loss 4.7403 LearningRate 0.000837 Epoch: 7 Global Step: 146320 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:52:16,504-Speed 2498.81 samples/sec Loss 4.5723 LearningRate 0.000837 Epoch: 7 Global Step: 146330 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:52:24,702-Speed 2498.43 samples/sec Loss 4.6556 LearningRate 0.000837 Epoch: 7 Global Step: 146340 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:52:32,848-Speed 2514.70 samples/sec Loss 4.5838 LearningRate 0.000837 Epoch: 7 Global Step: 146350 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:52:41,044-Speed 2499.24 samples/sec Loss 4.6525 LearningRate 0.000837 Epoch: 7 Global Step: 146360 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:52:49,240-Speed 2499.26 samples/sec Loss 4.6569 LearningRate 0.000837 Epoch: 7 Global Step: 146370 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:52:57,439-Speed 2498.09 samples/sec Loss 4.6255 LearningRate 0.000837 Epoch: 7 Global Step: 146380 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:53:05,640-Speed 2497.82 samples/sec Loss 4.7238 LearningRate 0.000837 Epoch: 7 Global Step: 146390 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:53:13,849-Speed 2494.96 samples/sec Loss 4.6169 LearningRate 0.000837 Epoch: 7 Global Step: 146400 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:53:21,997-Speed 2514.09 samples/sec Loss 4.6340 LearningRate 0.000837 Epoch: 7 Global Step: 146410 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:53:30,199-Speed 2497.41 samples/sec Loss 4.5971 
LearningRate 0.000837 Epoch: 7 Global Step: 146420 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:53:38,401-Speed 2497.27 samples/sec Loss 4.6402 LearningRate 0.000837 Epoch: 7 Global Step: 146430 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:53:46,598-Speed 2498.68 samples/sec Loss 4.6125 LearningRate 0.000837 Epoch: 7 Global Step: 146440 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:53:54,798-Speed 2498.07 samples/sec Loss 4.6670 LearningRate 0.000837 Epoch: 7 Global Step: 146450 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:54:02,997-Speed 2498.05 samples/sec Loss 4.6838 LearningRate 0.000837 Epoch: 7 Global Step: 146460 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:54:11,144-Speed 2514.45 samples/sec Loss 4.5830 LearningRate 0.000837 Epoch: 7 Global Step: 146470 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:54:19,343-Speed 2498.20 samples/sec Loss 4.6940 LearningRate 0.000837 Epoch: 7 Global Step: 146480 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:54:27,544-Speed 2497.77 samples/sec Loss 4.6463 LearningRate 0.000837 Epoch: 7 Global Step: 146490 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:54:35,739-Speed 2499.48 samples/sec Loss 4.5934 LearningRate 0.000837 Epoch: 7 Global Step: 146500 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:54:43,942-Speed 2497.00 samples/sec Loss 4.5981 LearningRate 0.000837 Epoch: 7 Global Step: 146510 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:54:52,143-Speed 2497.77 samples/sec Loss 4.5946 LearningRate 0.000837 Epoch: 7 Global Step: 146520 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:55:00,296-Speed 2512.34 samples/sec Loss 4.7110 LearningRate 0.000837 Epoch: 7 Global Step: 146530 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:55:08,499-Speed 2497.05 samples/sec Loss 4.6976 
LearningRate 0.000837 Epoch: 7 Global Step: 146540 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:55:16,697-Speed 2498.58 samples/sec Loss 4.6476 LearningRate 0.000837 Epoch: 7 Global Step: 146550 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:55:24,906-Speed 2495.30 samples/sec Loss 4.7190 LearningRate 0.000837 Epoch: 7 Global Step: 146560 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:55:33,108-Speed 2497.28 samples/sec Loss 4.6694 LearningRate 0.000837 Epoch: 7 Global Step: 146570 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:55:41,307-Speed 2497.97 samples/sec Loss 4.6247 LearningRate 0.000837 Epoch: 7 Global Step: 146580 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:55:49,456-Speed 2513.83 samples/sec Loss 4.6142 LearningRate 0.000837 Epoch: 7 Global Step: 146590 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:55:57,659-Speed 2496.95 samples/sec Loss 4.6807 LearningRate 0.000837 Epoch: 7 Global Step: 146600 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:56:05,858-Speed 2498.22 samples/sec Loss 4.6079 LearningRate 0.000837 Epoch: 7 Global Step: 146610 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:56:14,059-Speed 2497.76 samples/sec Loss 4.6218 LearningRate 0.000837 Epoch: 7 Global Step: 146620 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:56:22,258-Speed 2498.21 samples/sec Loss 4.6414 LearningRate 0.000837 Epoch: 7 Global Step: 146630 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:56:30,456-Speed 2498.75 samples/sec Loss 4.6084 LearningRate 0.000837 Epoch: 7 Global Step: 146640 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:56:38,602-Speed 2514.34 samples/sec Loss 4.5859 LearningRate 0.000837 Epoch: 7 Global Step: 146650 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:56:46,799-Speed 2498.99 samples/sec Loss 
4.6090 LearningRate 0.000837 Epoch: 7 Global Step: 146660 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:56:55,008-Speed 2495.31 samples/sec Loss 4.5569 LearningRate 0.000837 Epoch: 7 Global Step: 146670 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:57:03,204-Speed 2499.39 samples/sec Loss 4.6465 LearningRate 0.000837 Epoch: 7 Global Step: 146680 Fp16 Grad Scale: 131072 Required: 156 hours Training: 2022-07-06 23:57:11,357-Speed 2512.29 samples/sec Loss 4.6243 LearningRate 0.000837 Epoch: 7 Global Step: 146690 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:57:19,553-Speed 2499.02 samples/sec Loss 4.5930 LearningRate 0.000837 Epoch: 7 Global Step: 146700 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:57:27,696-Speed 2515.57 samples/sec Loss 4.5883 LearningRate 0.000837 Epoch: 7 Global Step: 146710 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:57:35,895-Speed 2499.10 samples/sec Loss 4.6013 LearningRate 0.000836 Epoch: 7 Global Step: 146720 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:57:44,090-Speed 2499.43 samples/sec Loss 4.6196 LearningRate 0.000836 Epoch: 7 Global Step: 146730 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:57:52,287-Speed 2499.22 samples/sec Loss 4.6917 LearningRate 0.000836 Epoch: 7 Global Step: 146740 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:58:00,488-Speed 2497.64 samples/sec Loss 4.5443 LearningRate 0.000836 Epoch: 7 Global Step: 146750 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:58:08,683-Speed 2499.56 samples/sec Loss 4.5768 LearningRate 0.000836 Epoch: 7 Global Step: 146760 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:58:16,838-Speed 2511.64 samples/sec Loss 4.5445 LearningRate 0.000836 Epoch: 7 Global Step: 146770 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:58:25,036-Speed 2498.92 samples/sec Loss 
4.5925 LearningRate 0.000836 Epoch: 7 Global Step: 146780 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:58:33,236-Speed 2498.04 samples/sec Loss 4.5195 LearningRate 0.000836 Epoch: 7 Global Step: 146790 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:58:41,432-Speed 2499.02 samples/sec Loss 4.5269 LearningRate 0.000836 Epoch: 7 Global Step: 146800 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:58:49,626-Speed 2499.76 samples/sec Loss 4.5943 LearningRate 0.000836 Epoch: 7 Global Step: 146810 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:58:57,823-Speed 2498.86 samples/sec Loss 4.5517 LearningRate 0.000836 Epoch: 7 Global Step: 146820 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:59:05,966-Speed 2515.63 samples/sec Loss 4.5653 LearningRate 0.000836 Epoch: 7 Global Step: 146830 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:59:14,161-Speed 2499.57 samples/sec Loss 4.6282 LearningRate 0.000836 Epoch: 7 Global Step: 146840 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:59:22,367-Speed 2495.94 samples/sec Loss 4.5476 LearningRate 0.000836 Epoch: 7 Global Step: 146850 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:59:30,565-Speed 2498.68 samples/sec Loss 4.6848 LearningRate 0.000836 Epoch: 7 Global Step: 146860 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:59:38,773-Speed 2495.69 samples/sec Loss 4.6294 LearningRate 0.000836 Epoch: 7 Global Step: 146870 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:59:46,972-Speed 2498.30 samples/sec Loss 4.6117 LearningRate 0.000836 Epoch: 7 Global Step: 146880 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-06 23:59:55,117-Speed 2514.87 samples/sec Loss 4.6868 LearningRate 0.000836 Epoch: 7 Global Step: 146890 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:00:03,317-Speed 2497.92 samples/sec Loss 4.6561 
LearningRate 0.000836 Epoch: 7 Global Step: 146900 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:00:11,515-Speed 2498.58 samples/sec Loss 4.6323 LearningRate 0.000836 Epoch: 7 Global Step: 146910 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:00:19,725-Speed 2494.95 samples/sec Loss 4.5641 LearningRate 0.000836 Epoch: 7 Global Step: 146920 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:00:27,924-Speed 2498.21 samples/sec Loss 4.5522 LearningRate 0.000836 Epoch: 7 Global Step: 146930 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:00:36,126-Speed 2497.56 samples/sec Loss 4.5498 LearningRate 0.000836 Epoch: 7 Global Step: 146940 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:00:44,289-Speed 2509.10 samples/sec Loss 4.6430 LearningRate 0.000836 Epoch: 7 Global Step: 146950 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:00:52,489-Speed 2497.92 samples/sec Loss 4.5961 LearningRate 0.000836 Epoch: 7 Global Step: 146960 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:01:00,686-Speed 2498.90 samples/sec Loss 4.5947 LearningRate 0.000836 Epoch: 7 Global Step: 146970 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:01:08,884-Speed 2498.52 samples/sec Loss 4.6014 LearningRate 0.000836 Epoch: 7 Global Step: 146980 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:01:17,081-Speed 2499.30 samples/sec Loss 4.5876 LearningRate 0.000836 Epoch: 7 Global Step: 146990 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:01:25,276-Speed 2499.47 samples/sec Loss 4.5951 LearningRate 0.000836 Epoch: 7 Global Step: 147000 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:01:33,420-Speed 2515.22 samples/sec Loss 4.8166 LearningRate 0.000836 Epoch: 7 Global Step: 147010 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:01:41,625-Speed 2496.32 samples/sec Loss 4.7296 
LearningRate 0.000836 Epoch: 7 Global Step: 147020 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:01:49,819-Speed 2499.62 samples/sec Loss 4.6834 LearningRate 0.000836 Epoch: 7 Global Step: 147030 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:01:58,024-Speed 2496.50 samples/sec Loss 4.6293 LearningRate 0.000836 Epoch: 7 Global Step: 147040 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:02:06,225-Speed 2497.77 samples/sec Loss 4.6567 LearningRate 0.000836 Epoch: 7 Global Step: 147050 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:02:14,435-Speed 2494.55 samples/sec Loss 4.6769 LearningRate 0.000836 Epoch: 7 Global Step: 147060 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:02:22,606-Speed 2517.01 samples/sec Loss 4.6280 LearningRate 0.000836 Epoch: 7 Global Step: 147070 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:02:32,119-Speed 2166.31 samples/sec Loss 4.6781 LearningRate 0.000836 Epoch: 7 Global Step: 147080 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:02:40,318-Speed 2498.54 samples/sec Loss 4.6117 LearningRate 0.000836 Epoch: 7 Global Step: 147090 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:02:48,519-Speed 2497.49 samples/sec Loss 4.5764 LearningRate 0.000836 Epoch: 7 Global Step: 147100 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:02:57,021-Speed 2425.48 samples/sec Loss 4.5998 LearningRate 0.000836 Epoch: 7 Global Step: 147110 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:03:07,679-Speed 2500.54 samples/sec Loss 4.6537 LearningRate 0.000836 Epoch: 7 Global Step: 147120 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:03:15,824-Speed 2514.53 samples/sec Loss 4.6198 LearningRate 0.000835 Epoch: 7 Global Step: 147130 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:03:24,022-Speed 2498.52 samples/sec Loss 4.5680 
LearningRate 0.000835 Epoch: 7 Global Step: 147140 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:03:32,252-Speed 2500.53 samples/sec Loss 4.7135 LearningRate 0.000835 Epoch: 7 Global Step: 147150 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:03:40,490-Speed 2498.33 samples/sec Loss 4.5692 LearningRate 0.000835 Epoch: 7 Global Step: 147160 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:03:48,693-Speed 2496.96 samples/sec Loss 4.5866 LearningRate 0.000835 Epoch: 7 Global Step: 147170 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:03:56,922-Speed 2501.35 samples/sec Loss 4.6876 LearningRate 0.000835 Epoch: 7 Global Step: 147180 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:04:05,114-Speed 2509.94 samples/sec Loss 4.6159 LearningRate 0.000835 Epoch: 7 Global Step: 147190 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:04:13,315-Speed 2499.52 samples/sec Loss 4.5755 LearningRate 0.000835 Epoch: 7 Global Step: 147200 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:04:21,471-Speed 2511.19 samples/sec Loss 4.6023 LearningRate 0.000835 Epoch: 7 Global Step: 147210 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:04:29,681-Speed 2494.73 samples/sec Loss 4.5565 LearningRate 0.000835 Epoch: 7 Global Step: 147220 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:04:37,883-Speed 2498.18 samples/sec Loss 4.6370 LearningRate 0.000835 Epoch: 7 Global Step: 147230 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:04:46,117-Speed 2495.26 samples/sec Loss 4.6313 LearningRate 0.000835 Epoch: 7 Global Step: 147240 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:04:56,091-Speed 2053.55 samples/sec Loss 4.5717 LearningRate 0.000835 Epoch: 7 Global Step: 147250 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:05:04,379-Speed 2496.67 samples/sec Loss 4.6701 
LearningRate 0.000835 Epoch: 7 Global Step: 147260 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:05:12,587-Speed 2501.80 samples/sec Loss 4.5423 LearningRate 0.000835 Epoch: 7 Global Step: 147270 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:05:22,866-Speed 1992.51 samples/sec Loss 4.6322 LearningRate 0.000835 Epoch: 7 Global Step: 147280 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:05:31,087-Speed 2500.59 samples/sec Loss 4.5636 LearningRate 0.000835 Epoch: 7 Global Step: 147290 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:05:39,333-Speed 2499.97 samples/sec Loss 4.5958 LearningRate 0.000835 Epoch: 7 Global Step: 147300 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:05:51,881-Speed 1632.35 samples/sec Loss 4.7032 LearningRate 0.000835 Epoch: 7 Global Step: 147310 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:06:00,072-Speed 2500.33 samples/sec Loss 4.6181 LearningRate 0.000835 Epoch: 7 Global Step: 147320 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:06:08,275-Speed 2502.53 samples/sec Loss 4.6399 LearningRate 0.000835 Epoch: 7 Global Step: 147330 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:06:16,519-Speed 2502.82 samples/sec Loss 4.7227 LearningRate 0.000835 Epoch: 7 Global Step: 147340 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:06:24,715-Speed 2499.36 samples/sec Loss 4.6339 LearningRate 0.000835 Epoch: 7 Global Step: 147350 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:06:34,908-Speed 2019.20 samples/sec Loss 4.7642 LearningRate 0.000835 Epoch: 7 Global Step: 147360 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:06:43,064-Speed 2518.83 samples/sec Loss 4.7053 LearningRate 0.000835 Epoch: 7 Global Step: 147370 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:06:51,269-Speed 2496.41 samples/sec Loss 4.7085 
LearningRate 0.000835 Epoch: 7 Global Step: 147380 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:07:02,956-Speed 1758.29 samples/sec Loss 4.7202 LearningRate 0.000835 Epoch: 7 Global Step: 147390 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:07:13,606-Speed 1935.46 samples/sec Loss 4.7578 LearningRate 0.000835 Epoch: 7 Global Step: 147400 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:07:22,508-Speed 2300.89 samples/sec Loss 4.7993 LearningRate 0.000835 Epoch: 7 Global Step: 147410 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:07:30,702-Speed 2499.91 samples/sec Loss 4.7452 LearningRate 0.000835 Epoch: 7 Global Step: 147420 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:07:38,841-Speed 2516.62 samples/sec Loss 4.6825 LearningRate 0.000835 Epoch: 7 Global Step: 147430 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:07:47,036-Speed 2499.72 samples/sec Loss 4.6522 LearningRate 0.000835 Epoch: 7 Global Step: 147440 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:07:55,232-Speed 2499.86 samples/sec Loss 4.7197 LearningRate 0.000835 Epoch: 7 Global Step: 147450 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:08:03,425-Speed 2500.06 samples/sec Loss 4.7004 LearningRate 0.000835 Epoch: 7 Global Step: 147460 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:08:11,616-Speed 2500.52 samples/sec Loss 4.6558 LearningRate 0.000835 Epoch: 7 Global Step: 147470 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:08:19,813-Speed 2498.94 samples/sec Loss 4.5683 LearningRate 0.000835 Epoch: 7 Global Step: 147480 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:08:27,959-Speed 2514.54 samples/sec Loss 4.6438 LearningRate 0.000835 Epoch: 7 Global Step: 147490 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:08:36,160-Speed 2497.83 samples/sec Loss 4.6124 
LearningRate 0.000835 Epoch: 7 Global Step: 147500 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:08:44,359-Speed 2498.22 samples/sec Loss 4.5601 LearningRate 0.000835 Epoch: 7 Global Step: 147510 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:08:52,558-Speed 2498.30 samples/sec Loss 4.5406 LearningRate 0.000835 Epoch: 7 Global Step: 147520 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:09:00,759-Speed 2497.66 samples/sec Loss 4.5527 LearningRate 0.000835 Epoch: 7 Global Step: 147530 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:09:08,960-Speed 2497.61 samples/sec Loss 4.6047 LearningRate 0.000834 Epoch: 7 Global Step: 147540 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:09:17,118-Speed 2510.70 samples/sec Loss 4.6297 LearningRate 0.000834 Epoch: 7 Global Step: 147550 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:09:25,321-Speed 2496.86 samples/sec Loss 4.5721 LearningRate 0.000834 Epoch: 7 Global Step: 147560 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:09:33,521-Speed 2498.12 samples/sec Loss 4.5909 LearningRate 0.000834 Epoch: 7 Global Step: 147570 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:09:41,722-Speed 2497.76 samples/sec Loss 4.5820 LearningRate 0.000834 Epoch: 7 Global Step: 147580 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:09:49,923-Speed 2497.47 samples/sec Loss 4.5912 LearningRate 0.000834 Epoch: 7 Global Step: 147590 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:09:58,125-Speed 2497.57 samples/sec Loss 4.6159 LearningRate 0.000834 Epoch: 7 Global Step: 147600 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:10:06,270-Speed 2514.82 samples/sec Loss 4.6100 LearningRate 0.000834 Epoch: 7 Global Step: 147610 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:10:14,468-Speed 2498.54 samples/sec Loss 4.5148 
LearningRate 0.000834 Epoch: 7 Global Step: 147620 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:10:22,665-Speed 2498.91 samples/sec Loss 4.5311 LearningRate 0.000834 Epoch: 7 Global Step: 147630 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:10:30,864-Speed 2498.24 samples/sec Loss 4.5657 LearningRate 0.000834 Epoch: 7 Global Step: 147640 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:10:39,060-Speed 2499.12 samples/sec Loss 4.5578 LearningRate 0.000834 Epoch: 7 Global Step: 147650 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:10:47,258-Speed 2498.57 samples/sec Loss 4.6165 LearningRate 0.000834 Epoch: 7 Global Step: 147660 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:10:55,404-Speed 2514.83 samples/sec Loss 4.6277 LearningRate 0.000834 Epoch: 7 Global Step: 147670 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:11:03,603-Speed 2498.21 samples/sec Loss 4.4846 LearningRate 0.000834 Epoch: 7 Global Step: 147680 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:11:11,802-Speed 2498.22 samples/sec Loss 4.6186 LearningRate 0.000834 Epoch: 7 Global Step: 147690 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:11:20,014-Speed 2494.47 samples/sec Loss 4.5055 LearningRate 0.000834 Epoch: 7 Global Step: 147700 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:11:28,211-Speed 2498.73 samples/sec Loss 4.4512 LearningRate 0.000834 Epoch: 7 Global Step: 147710 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:11:36,416-Speed 2496.69 samples/sec Loss 4.5577 LearningRate 0.000834 Epoch: 7 Global Step: 147720 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:11:44,563-Speed 2514.32 samples/sec Loss 4.5442 LearningRate 0.000834 Epoch: 7 Global Step: 147730 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:11:52,768-Speed 2496.38 samples/sec Loss 4.5792 
LearningRate 0.000834 Epoch: 7 Global Step: 147740 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:12:00,966-Speed 2498.64 samples/sec Loss 4.5871 LearningRate 0.000834 Epoch: 7 Global Step: 147750 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:12:09,166-Speed 2497.81 samples/sec Loss 4.4929 LearningRate 0.000834 Epoch: 7 Global Step: 147760 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:12:17,369-Speed 2497.07 samples/sec Loss 4.5624 LearningRate 0.000834 Epoch: 7 Global Step: 147770 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:12:25,570-Speed 2497.58 samples/sec Loss 4.5870 LearningRate 0.000834 Epoch: 7 Global Step: 147780 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:12:33,716-Speed 2514.45 samples/sec Loss 4.5043 LearningRate 0.000834 Epoch: 7 Global Step: 147790 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:12:41,936-Speed 2491.81 samples/sec Loss 4.5078 LearningRate 0.000834 Epoch: 7 Global Step: 147800 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:12:50,134-Speed 2498.55 samples/sec Loss 4.6120 LearningRate 0.000834 Epoch: 7 Global Step: 147810 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:12:58,332-Speed 2498.77 samples/sec Loss 4.5824 LearningRate 0.000834 Epoch: 7 Global Step: 147820 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:13:06,531-Speed 2498.26 samples/sec Loss 4.5273 LearningRate 0.000834 Epoch: 7 Global Step: 147830 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:13:14,729-Speed 2498.33 samples/sec Loss 4.5588 LearningRate 0.000834 Epoch: 7 Global Step: 147840 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:13:22,877-Speed 2514.00 samples/sec Loss 4.5070 LearningRate 0.000834 Epoch: 7 Global Step: 147850 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:13:31,082-Speed 2496.78 samples/sec Loss 4.5614 
LearningRate 0.000834 Epoch: 7 Global Step: 147860 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:13:39,280-Speed 2498.61 samples/sec Loss 4.6077 LearningRate 0.000834 Epoch: 7 Global Step: 147870 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:13:47,486-Speed 2495.97 samples/sec Loss 4.5924 LearningRate 0.000834 Epoch: 7 Global Step: 147880 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:13:55,688-Speed 2497.34 samples/sec Loss 4.5570 LearningRate 0.000834 Epoch: 7 Global Step: 147890 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:14:03,895-Speed 2495.67 samples/sec Loss 4.6480 LearningRate 0.000834 Epoch: 7 Global Step: 147900 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:14:12,040-Speed 2514.90 samples/sec Loss 4.5380 LearningRate 0.000834 Epoch: 7 Global Step: 147910 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:14:20,245-Speed 2496.45 samples/sec Loss 4.5703 LearningRate 0.000834 Epoch: 7 Global Step: 147920 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:14:28,449-Speed 2496.94 samples/sec Loss 4.6125 LearningRate 0.000834 Epoch: 7 Global Step: 147930 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:14:36,649-Speed 2497.95 samples/sec Loss 4.6685 LearningRate 0.000833 Epoch: 7 Global Step: 147940 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:14:44,846-Speed 2498.88 samples/sec Loss 4.5525 LearningRate 0.000833 Epoch: 7 Global Step: 147950 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:14:53,046-Speed 2497.86 samples/sec Loss 4.5590 LearningRate 0.000833 Epoch: 7 Global Step: 147960 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:15:01,191-Speed 2514.80 samples/sec Loss 4.5756 LearningRate 0.000833 Epoch: 7 Global Step: 147970 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:15:09,402-Speed 2494.76 samples/sec Loss 4.5053 
LearningRate 0.000833 Epoch: 7 Global Step: 147980 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:15:17,604-Speed 2497.43 samples/sec Loss 4.5713 LearningRate 0.000833 Epoch: 7 Global Step: 147990 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:15:25,801-Speed 2498.89 samples/sec Loss 4.6292 LearningRate 0.000833 Epoch: 7 Global Step: 148000 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:15:34,009-Speed 2495.70 samples/sec Loss 4.6053 LearningRate 0.000833 Epoch: 7 Global Step: 148010 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:15:42,206-Speed 2498.65 samples/sec Loss 4.5265 LearningRate 0.000833 Epoch: 7 Global Step: 148020 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:15:50,354-Speed 2513.90 samples/sec Loss 4.5428 LearningRate 0.000833 Epoch: 7 Global Step: 148030 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:15:58,551-Speed 2498.67 samples/sec Loss 4.5743 LearningRate 0.000833 Epoch: 7 Global Step: 148040 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:16:06,752-Speed 2497.64 samples/sec Loss 4.5854 LearningRate 0.000833 Epoch: 7 Global Step: 148050 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:16:14,963-Speed 2494.87 samples/sec Loss 4.5873 LearningRate 0.000833 Epoch: 7 Global Step: 148060 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:16:23,164-Speed 2497.55 samples/sec Loss 4.6669 LearningRate 0.000833 Epoch: 7 Global Step: 148070 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:16:31,365-Speed 2497.84 samples/sec Loss 4.5669 LearningRate 0.000833 Epoch: 7 Global Step: 148080 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:16:39,515-Speed 2513.48 samples/sec Loss 4.6045 LearningRate 0.000833 Epoch: 7 Global Step: 148090 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:16:47,717-Speed 2497.49 samples/sec Loss 4.5963 
LearningRate 0.000833 Epoch: 7 Global Step: 148100 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:16:55,914-Speed 2498.68 samples/sec Loss 4.4918 LearningRate 0.000833 Epoch: 7 Global Step: 148110 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:17:04,126-Speed 2494.57 samples/sec Loss 4.5587 LearningRate 0.000833 Epoch: 7 Global Step: 148120 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:17:12,318-Speed 2500.15 samples/sec Loss 4.6172 LearningRate 0.000833 Epoch: 7 Global Step: 148130 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:17:20,519-Speed 2498.19 samples/sec Loss 4.6412 LearningRate 0.000833 Epoch: 7 Global Step: 148140 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:17:28,667-Speed 2513.79 samples/sec Loss 4.5692 LearningRate 0.000833 Epoch: 7 Global Step: 148150 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:17:36,868-Speed 2497.67 samples/sec Loss 4.5084 LearningRate 0.000833 Epoch: 7 Global Step: 148160 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:17:45,068-Speed 2498.20 samples/sec Loss 4.5293 LearningRate 0.000833 Epoch: 7 Global Step: 148170 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:17:53,273-Speed 2496.27 samples/sec Loss 4.5635 LearningRate 0.000833 Epoch: 7 Global Step: 148180 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:18:01,471-Speed 2498.69 samples/sec Loss 4.5994 LearningRate 0.000833 Epoch: 7 Global Step: 148190 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:18:09,672-Speed 2497.75 samples/sec Loss 4.6125 LearningRate 0.000833 Epoch: 7 Global Step: 148200 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:18:17,819-Speed 2514.27 samples/sec Loss 4.4974 LearningRate 0.000833 Epoch: 7 Global Step: 148210 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:18:26,142-Speed 2461.01 samples/sec Loss 4.5284 
LearningRate 0.000833 Epoch: 7 Global Step: 148220 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:18:34,343-Speed 2497.47 samples/sec Loss 4.5738 LearningRate 0.000833 Epoch: 7 Global Step: 148230 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:18:42,764-Speed 2432.56 samples/sec Loss 4.4541 LearningRate 0.000833 Epoch: 7 Global Step: 148240 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:18:51,688-Speed 2382.20 samples/sec Loss 4.5499 LearningRate 0.000833 Epoch: 7 Global Step: 148250 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:18:59,902-Speed 2493.76 samples/sec Loss 4.5385 LearningRate 0.000833 Epoch: 7 Global Step: 148260 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:19:08,050-Speed 2513.79 samples/sec Loss 4.6051 LearningRate 0.000833 Epoch: 7 Global Step: 148270 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:19:16,253-Speed 2497.34 samples/sec Loss 4.5962 LearningRate 0.000833 Epoch: 7 Global Step: 148280 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:19:24,447-Speed 2499.64 samples/sec Loss 4.5604 LearningRate 0.000833 Epoch: 7 Global Step: 148290 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:19:32,663-Speed 2493.17 samples/sec Loss 4.5922 LearningRate 0.000833 Epoch: 7 Global Step: 148300 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:19:40,860-Speed 2498.97 samples/sec Loss 4.5725 LearningRate 0.000833 Epoch: 7 Global Step: 148310 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:19:49,058-Speed 2498.54 samples/sec Loss 4.5716 LearningRate 0.000833 Epoch: 7 Global Step: 148320 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:19:57,210-Speed 2512.92 samples/sec Loss 4.6226 LearningRate 0.000833 Epoch: 7 Global Step: 148330 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:20:05,404-Speed 2499.55 samples/sec Loss 4.5480 
LearningRate 0.000833 Epoch: 7 Global Step: 148340 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:20:13,608-Speed 2496.98 samples/sec Loss 4.6007 LearningRate 0.000832 Epoch: 7 Global Step: 148350 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:20:21,810-Speed 2497.41 samples/sec Loss 4.6417 LearningRate 0.000832 Epoch: 7 Global Step: 148360 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:20:30,008-Speed 2498.75 samples/sec Loss 4.5865 LearningRate 0.000832 Epoch: 7 Global Step: 148370 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:20:38,207-Speed 2498.29 samples/sec Loss 4.5330 LearningRate 0.000832 Epoch: 7 Global Step: 148380 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:20:46,354-Speed 2515.78 samples/sec Loss 4.4988 LearningRate 0.000832 Epoch: 7 Global Step: 148390 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:20:54,551-Speed 2498.76 samples/sec Loss 4.5632 LearningRate 0.000832 Epoch: 7 Global Step: 148400 Fp16 Grad Scale: 32768 Required: 156 hours Training: 2022-07-07 00:21:02,760-Speed 2495.11 samples/sec Loss 4.6716 LearningRate 0.000832 Epoch: 7 Global Step: 148410 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:21:10,954-Speed 2499.80 samples/sec Loss 4.6118 LearningRate 0.000832 Epoch: 7 Global Step: 148420 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:21:19,153-Speed 2498.41 samples/sec Loss 4.6665 LearningRate 0.000832 Epoch: 7 Global Step: 148430 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:21:27,350-Speed 2498.80 samples/sec Loss 4.6264 LearningRate 0.000832 Epoch: 7 Global Step: 148440 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:21:35,502-Speed 2512.77 samples/sec Loss 4.6442 LearningRate 0.000832 Epoch: 7 Global Step: 148450 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:21:43,711-Speed 2495.09 samples/sec Loss 4.6161 
LearningRate 0.000832 Epoch: 7 Global Step: 148460 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:21:51,911-Speed 2498.41 samples/sec Loss 4.6002 LearningRate 0.000832 Epoch: 7 Global Step: 148470 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:22:00,113-Speed 2497.24 samples/sec Loss 4.5108 LearningRate 0.000832 Epoch: 7 Global Step: 148480 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:22:08,324-Speed 2494.66 samples/sec Loss 4.5478 LearningRate 0.000832 Epoch: 7 Global Step: 148490 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:22:16,523-Speed 2498.36 samples/sec Loss 4.6714 LearningRate 0.000832 Epoch: 7 Global Step: 148500 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:22:24,672-Speed 2513.80 samples/sec Loss 4.5516 LearningRate 0.000832 Epoch: 7 Global Step: 148510 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:22:32,877-Speed 2496.45 samples/sec Loss 4.6636 LearningRate 0.000832 Epoch: 7 Global Step: 148520 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:22:41,072-Speed 2499.32 samples/sec Loss 4.6065 LearningRate 0.000832 Epoch: 7 Global Step: 148530 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:22:49,273-Speed 2497.70 samples/sec Loss 4.6570 LearningRate 0.000832 Epoch: 7 Global Step: 148540 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:22:57,472-Speed 2498.58 samples/sec Loss 4.6204 LearningRate 0.000832 Epoch: 7 Global Step: 148550 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:23:05,672-Speed 2497.76 samples/sec Loss 4.6412 LearningRate 0.000832 Epoch: 7 Global Step: 148560 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:23:13,814-Speed 2515.55 samples/sec Loss 4.5915 LearningRate 0.000832 Epoch: 7 Global Step: 148570 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:23:22,011-Speed 2499.09 samples/sec Loss 4.6174 
LearningRate 0.000832 Epoch: 7 Global Step: 148580 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:23:30,218-Speed 2495.92 samples/sec Loss 4.6083 LearningRate 0.000832 Epoch: 7 Global Step: 148590 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:23:38,414-Speed 2499.22 samples/sec Loss 4.6466 LearningRate 0.000832 Epoch: 7 Global Step: 148600 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:23:46,612-Speed 2498.59 samples/sec Loss 4.6237 LearningRate 0.000832 Epoch: 7 Global Step: 148610 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:23:54,809-Speed 2498.83 samples/sec Loss 4.5674 LearningRate 0.000832 Epoch: 7 Global Step: 148620 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:24:02,957-Speed 2513.95 samples/sec Loss 4.6480 LearningRate 0.000832 Epoch: 7 Global Step: 148630 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:24:11,152-Speed 2499.62 samples/sec Loss 4.6272 LearningRate 0.000832 Epoch: 7 Global Step: 148640 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:24:19,350-Speed 2498.44 samples/sec Loss 4.5291 LearningRate 0.000832 Epoch: 7 Global Step: 148650 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:24:27,568-Speed 2492.72 samples/sec Loss 4.5875 LearningRate 0.000832 Epoch: 7 Global Step: 148660 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:24:35,792-Speed 2490.73 samples/sec Loss 4.6099 LearningRate 0.000832 Epoch: 7 Global Step: 148670 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:24:43,990-Speed 2498.47 samples/sec Loss 4.5491 LearningRate 0.000832 Epoch: 7 Global Step: 148680 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:24:52,147-Speed 2511.24 samples/sec Loss 4.5833 LearningRate 0.000832 Epoch: 7 Global Step: 148690 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:25:00,344-Speed 2499.11 samples/sec Loss 4.4522 
LearningRate 0.000832 Epoch: 7 Global Step: 148700 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:25:08,541-Speed 2498.87 samples/sec Loss 4.5035 LearningRate 0.000832 Epoch: 7 Global Step: 148710 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:25:16,756-Speed 2493.22 samples/sec Loss 4.6441 LearningRate 0.000832 Epoch: 7 Global Step: 148720 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:25:24,957-Speed 2497.84 samples/sec Loss 4.5634 LearningRate 0.000832 Epoch: 7 Global Step: 148730 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:25:33,156-Speed 2498.30 samples/sec Loss 4.6495 LearningRate 0.000832 Epoch: 7 Global Step: 148740 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:25:41,311-Speed 2511.69 samples/sec Loss 4.5285 LearningRate 0.000832 Epoch: 7 Global Step: 148750 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:25:49,514-Speed 2496.97 samples/sec Loss 4.5585 LearningRate 0.000831 Epoch: 7 Global Step: 148760 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:25:57,707-Speed 2500.27 samples/sec Loss 4.6111 LearningRate 0.000831 Epoch: 7 Global Step: 148770 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:26:05,904-Speed 2498.86 samples/sec Loss 4.5847 LearningRate 0.000831 Epoch: 7 Global Step: 148780 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:26:14,102-Speed 2498.75 samples/sec Loss 4.5999 LearningRate 0.000831 Epoch: 7 Global Step: 148790 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:26:22,298-Speed 2498.95 samples/sec Loss 4.5411 LearningRate 0.000831 Epoch: 7 Global Step: 148800 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:26:30,441-Speed 2515.32 samples/sec Loss 4.5270 LearningRate 0.000831 Epoch: 7 Global Step: 148810 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:26:38,639-Speed 2499.02 samples/sec Loss 4.5011 
LearningRate 0.000831 Epoch: 7 Global Step: 148820 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:26:46,835-Speed 2499.03 samples/sec Loss 4.4823 LearningRate 0.000831 Epoch: 7 Global Step: 148830 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:26:55,031-Speed 2499.08 samples/sec Loss 4.6114 LearningRate 0.000831 Epoch: 7 Global Step: 148840 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:27:03,230-Speed 2498.30 samples/sec Loss 4.6703 LearningRate 0.000831 Epoch: 7 Global Step: 148850 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:27:11,432-Speed 2497.41 samples/sec Loss 4.5384 LearningRate 0.000831 Epoch: 7 Global Step: 148860 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:27:19,574-Speed 2515.74 samples/sec Loss 4.4608 LearningRate 0.000831 Epoch: 7 Global Step: 148870 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:27:27,773-Speed 2498.57 samples/sec Loss 4.4495 LearningRate 0.000831 Epoch: 7 Global Step: 148880 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:27:35,970-Speed 2498.84 samples/sec Loss 4.5534 LearningRate 0.000831 Epoch: 7 Global Step: 148890 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:27:44,166-Speed 2499.11 samples/sec Loss 4.5662 LearningRate 0.000831 Epoch: 7 Global Step: 148900 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:27:52,362-Speed 2499.33 samples/sec Loss 4.5838 LearningRate 0.000831 Epoch: 7 Global Step: 148910 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:28:00,563-Speed 2497.84 samples/sec Loss 4.5495 LearningRate 0.000831 Epoch: 7 Global Step: 148920 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:28:08,705-Speed 2515.58 samples/sec Loss 4.4923 LearningRate 0.000831 Epoch: 7 Global Step: 148930 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:28:16,919-Speed 2493.45 samples/sec Loss 4.5427 
LearningRate 0.000831 Epoch: 7 Global Step: 148940 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:28:25,117-Speed 2498.64 samples/sec Loss 4.5392 LearningRate 0.000831 Epoch: 7 Global Step: 148950 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:28:33,312-Speed 2499.75 samples/sec Loss 4.5114 LearningRate 0.000831 Epoch: 7 Global Step: 148960 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:28:41,512-Speed 2497.82 samples/sec Loss 4.5817 LearningRate 0.000831 Epoch: 7 Global Step: 148970 Fp16 Grad Scale: 65536 Required: 156 hours Training: 2022-07-07 00:28:49,720-Speed 2495.75 samples/sec Loss 4.6303 LearningRate 0.000831 Epoch: 7 Global Step: 148980 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:28:57,864-Speed 2514.94 samples/sec Loss 4.5214 LearningRate 0.000831 Epoch: 7 Global Step: 148990 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:29:06,062-Speed 2498.71 samples/sec Loss 4.6173 LearningRate 0.000831 Epoch: 7 Global Step: 149000 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:29:14,260-Speed 2498.51 samples/sec Loss 4.5603 LearningRate 0.000831 Epoch: 7 Global Step: 149010 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:29:22,461-Speed 2497.92 samples/sec Loss 4.5578 LearningRate 0.000831 Epoch: 7 Global Step: 149020 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:29:30,659-Speed 2498.42 samples/sec Loss 4.5389 LearningRate 0.000831 Epoch: 7 Global Step: 149030 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:29:38,855-Speed 2499.40 samples/sec Loss 4.4529 LearningRate 0.000831 Epoch: 7 Global Step: 149040 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:29:46,999-Speed 2514.83 samples/sec Loss 4.5994 LearningRate 0.000831 Epoch: 7 Global Step: 149050 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:29:55,199-Speed 2498.09 samples/sec Loss 4.5801 
LearningRate 0.000831 Epoch: 7 Global Step: 149060 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:30:03,402-Speed 2497.14 samples/sec Loss 4.5546 LearningRate 0.000831 Epoch: 7 Global Step: 149070 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:30:11,617-Speed 2493.50 samples/sec Loss 4.5065 LearningRate 0.000831 Epoch: 7 Global Step: 149080 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:30:19,811-Speed 2499.62 samples/sec Loss 4.5928 LearningRate 0.000831 Epoch: 7 Global Step: 149090 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:30:28,010-Speed 2498.53 samples/sec Loss 4.5171 LearningRate 0.000831 Epoch: 7 Global Step: 149100 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:30:36,160-Speed 2513.43 samples/sec Loss 4.5500 LearningRate 0.000831 Epoch: 7 Global Step: 149110 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:30:44,359-Speed 2498.33 samples/sec Loss 4.4977 LearningRate 0.000831 Epoch: 7 Global Step: 149120 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:30:52,557-Speed 2498.61 samples/sec Loss 4.6102 LearningRate 0.000831 Epoch: 7 Global Step: 149130 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:31:00,751-Speed 2500.00 samples/sec Loss 4.5002 LearningRate 0.000831 Epoch: 7 Global Step: 149140 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:31:08,954-Speed 2496.96 samples/sec Loss 4.5783 LearningRate 0.000831 Epoch: 7 Global Step: 149150 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:31:17,151-Speed 2498.90 samples/sec Loss 4.6002 LearningRate 0.000831 Epoch: 7 Global Step: 149160 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:31:25,311-Speed 2510.17 samples/sec Loss 4.6129 LearningRate 0.000830 Epoch: 7 Global Step: 149170 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:31:33,513-Speed 2497.32 samples/sec Loss 4.5947 
LearningRate 0.000830 Epoch: 7 Global Step: 149180 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:31:41,713-Speed 2498.07 samples/sec Loss 4.5011 LearningRate 0.000830 Epoch: 7 Global Step: 149190 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:31:49,909-Speed 2499.38 samples/sec Loss 4.5272 LearningRate 0.000830 Epoch: 7 Global Step: 149200 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:31:58,101-Speed 2500.10 samples/sec Loss 4.5790 LearningRate 0.000830 Epoch: 7 Global Step: 149210 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:32:06,300-Speed 2498.30 samples/sec Loss 4.5667 LearningRate 0.000830 Epoch: 7 Global Step: 149220 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:32:14,446-Speed 2514.49 samples/sec Loss 4.5953 LearningRate 0.000830 Epoch: 7 Global Step: 149230 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:32:22,648-Speed 2497.38 samples/sec Loss 4.5463 LearningRate 0.000830 Epoch: 7 Global Step: 149240 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:32:30,845-Speed 2498.90 samples/sec Loss 4.5784 LearningRate 0.000830 Epoch: 7 Global Step: 149250 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:32:39,041-Speed 2499.15 samples/sec Loss 4.5983 LearningRate 0.000830 Epoch: 7 Global Step: 149260 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:32:47,236-Speed 2499.84 samples/sec Loss 4.5221 LearningRate 0.000830 Epoch: 7 Global Step: 149270 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:32:55,435-Speed 2498.47 samples/sec Loss 4.5468 LearningRate 0.000830 Epoch: 7 Global Step: 149280 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:33:03,583-Speed 2513.81 samples/sec Loss 4.6630 LearningRate 0.000830 Epoch: 7 Global Step: 149290 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:33:11,779-Speed 2499.27 samples/sec Loss 4.6054 
LearningRate 0.000830 Epoch: 7 Global Step: 149300 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:33:19,977-Speed 2498.87 samples/sec Loss 4.6019 LearningRate 0.000830 Epoch: 7 Global Step: 149310 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:33:28,174-Speed 2498.90 samples/sec Loss 4.5034 LearningRate 0.000830 Epoch: 7 Global Step: 149320 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:33:36,383-Speed 2495.24 samples/sec Loss 4.5647 LearningRate 0.000830 Epoch: 7 Global Step: 149330 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:33:44,581-Speed 2498.51 samples/sec Loss 4.5474 LearningRate 0.000830 Epoch: 7 Global Step: 149340 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:33:52,728-Speed 2514.33 samples/sec Loss 4.5800 LearningRate 0.000830 Epoch: 7 Global Step: 149350 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:34:00,929-Speed 2497.77 samples/sec Loss 4.6395 LearningRate 0.000830 Epoch: 7 Global Step: 149360 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:34:09,133-Speed 2496.72 samples/sec Loss 4.6126 LearningRate 0.000830 Epoch: 7 Global Step: 149370 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:34:17,333-Speed 2497.83 samples/sec Loss 4.5451 LearningRate 0.000830 Epoch: 7 Global Step: 149380 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:34:25,533-Speed 2498.13 samples/sec Loss 4.5007 LearningRate 0.000830 Epoch: 7 Global Step: 149390 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:34:33,734-Speed 2497.51 samples/sec Loss 4.5034 LearningRate 0.000830 Epoch: 7 Global Step: 149400 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:34:41,880-Speed 2514.35 samples/sec Loss 4.5412 LearningRate 0.000830 Epoch: 7 Global Step: 149410 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:34:50,081-Speed 2497.95 samples/sec Loss 4.4813 
LearningRate 0.000830 Epoch: 7 Global Step: 149420 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:34:58,283-Speed 2497.36 samples/sec Loss 4.5881 LearningRate 0.000830 Epoch: 7 Global Step: 149430 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:35:06,478-Speed 2499.52 samples/sec Loss 4.5085 LearningRate 0.000830 Epoch: 7 Global Step: 149440 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:35:14,680-Speed 2497.46 samples/sec Loss 4.6017 LearningRate 0.000830 Epoch: 7 Global Step: 149450 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:35:22,879-Speed 2498.52 samples/sec Loss 4.5755 LearningRate 0.000830 Epoch: 7 Global Step: 149460 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:35:31,024-Speed 2514.85 samples/sec Loss 4.5349 LearningRate 0.000830 Epoch: 7 Global Step: 149470 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:35:39,224-Speed 2497.90 samples/sec Loss 4.4672 LearningRate 0.000830 Epoch: 7 Global Step: 149480 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:35:47,415-Speed 2501.02 samples/sec Loss 4.5671 LearningRate 0.000830 Epoch: 7 Global Step: 149490 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:35:55,619-Speed 2496.65 samples/sec Loss 4.5471 LearningRate 0.000830 Epoch: 7 Global Step: 149500 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:36:03,817-Speed 2498.86 samples/sec Loss 4.5135 LearningRate 0.000830 Epoch: 7 Global Step: 149510 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:36:12,015-Speed 2498.65 samples/sec Loss 4.4840 LearningRate 0.000830 Epoch: 7 Global Step: 149520 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:36:20,160-Speed 2514.86 samples/sec Loss 4.4746 LearningRate 0.000830 Epoch: 7 Global Step: 149530 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:36:28,356-Speed 2499.17 samples/sec Loss 4.5034 
LearningRate 0.000830 Epoch: 7 Global Step: 149540 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:36:36,559-Speed 2497.23 samples/sec Loss 4.5348 LearningRate 0.000830 Epoch: 7 Global Step: 149550 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:36:44,761-Speed 2497.39 samples/sec Loss 4.5351 LearningRate 0.000830 Epoch: 7 Global Step: 149560 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:36:52,963-Speed 2497.15 samples/sec Loss 4.5390 LearningRate 0.000830 Epoch: 7 Global Step: 149570 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:37:01,161-Speed 2498.57 samples/sec Loss 4.5439 LearningRate 0.000829 Epoch: 7 Global Step: 149580 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:37:09,307-Speed 2514.28 samples/sec Loss 4.6065 LearningRate 0.000829 Epoch: 7 Global Step: 149590 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:37:17,524-Speed 2492.94 samples/sec Loss 4.5555 LearningRate 0.000829 Epoch: 7 Global Step: 149600 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:37:25,721-Speed 2498.94 samples/sec Loss 4.5282 LearningRate 0.000829 Epoch: 7 Global Step: 149610 Fp16 Grad Scale: 131072 Required: 155 hours Training: 2022-07-07 00:37:33,965-Speed 2484.85 samples/sec Loss 4.5165 LearningRate 0.000829 Epoch: 7 Global Step: 149620 Fp16 Grad Scale: 131072 Required: 155 hours Training: 2022-07-07 00:37:42,166-Speed 2497.70 samples/sec Loss 4.5639 LearningRate 0.000829 Epoch: 7 Global Step: 149630 Fp16 Grad Scale: 131072 Required: 155 hours Training: 2022-07-07 00:37:50,365-Speed 2498.45 samples/sec Loss 4.5201 LearningRate 0.000829 Epoch: 7 Global Step: 149640 Fp16 Grad Scale: 131072 Required: 155 hours Training: 2022-07-07 00:37:58,529-Speed 2509.24 samples/sec Loss 4.5007 LearningRate 0.000829 Epoch: 7 Global Step: 149650 Fp16 Grad Scale: 131072 Required: 155 hours Training: 2022-07-07 00:38:06,741-Speed 2494.76 samples/sec Loss 4.5672 
LearningRate 0.000829 Epoch: 7 Global Step: 149660 Fp16 Grad Scale: 131072 Required: 155 hours Training: 2022-07-07 00:38:14,900-Speed 2510.48 samples/sec Loss 4.5296 LearningRate 0.000829 Epoch: 7 Global Step: 149670 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:38:23,101-Speed 2497.66 samples/sec Loss 4.5267 LearningRate 0.000829 Epoch: 7 Global Step: 149680 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:38:31,324-Speed 2491.12 samples/sec Loss 4.4955 LearningRate 0.000829 Epoch: 7 Global Step: 149690 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:38:39,537-Speed 2493.90 samples/sec Loss 4.5191 LearningRate 0.000829 Epoch: 7 Global Step: 149700 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:38:47,688-Speed 2513.00 samples/sec Loss 4.4452 LearningRate 0.000829 Epoch: 7 Global Step: 149710 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:38:55,893-Speed 2496.52 samples/sec Loss 4.4747 LearningRate 0.000829 Epoch: 7 Global Step: 149720 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:39:04,094-Speed 2497.77 samples/sec Loss 4.5062 LearningRate 0.000829 Epoch: 7 Global Step: 149730 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:39:12,294-Speed 2498.14 samples/sec Loss 4.4568 LearningRate 0.000829 Epoch: 7 Global Step: 149740 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:39:20,504-Speed 2494.93 samples/sec Loss 4.4937 LearningRate 0.000829 Epoch: 7 Global Step: 149750 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:39:28,705-Speed 2497.71 samples/sec Loss 4.5618 LearningRate 0.000829 Epoch: 7 Global Step: 149760 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:39:36,851-Speed 2514.45 samples/sec Loss 4.5694 LearningRate 0.000829 Epoch: 7 Global Step: 149770 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:39:45,056-Speed 2496.71 samples/sec Loss 4.5288 
LearningRate 0.000829 Epoch: 7 Global Step: 149780 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:39:53,255-Speed 2498.27 samples/sec Loss 4.4748 LearningRate 0.000829 Epoch: 7 Global Step: 149790 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:40:01,466-Speed 2494.56 samples/sec Loss 4.5154 LearningRate 0.000829 Epoch: 7 Global Step: 149800 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:40:09,664-Speed 2498.64 samples/sec Loss 4.6597 LearningRate 0.000829 Epoch: 7 Global Step: 149810 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:40:17,862-Speed 2498.46 samples/sec Loss 4.5725 LearningRate 0.000829 Epoch: 7 Global Step: 149820 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:40:26,007-Speed 2514.88 samples/sec Loss 4.5447 LearningRate 0.000829 Epoch: 7 Global Step: 149830 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:40:34,202-Speed 2499.56 samples/sec Loss 4.5772 LearningRate 0.000829 Epoch: 7 Global Step: 149840 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:40:42,400-Speed 2498.79 samples/sec Loss 4.5439 LearningRate 0.000829 Epoch: 7 Global Step: 149850 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:40:50,594-Speed 2499.60 samples/sec Loss 4.4813 LearningRate 0.000829 Epoch: 7 Global Step: 149860 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:40:58,804-Speed 2495.11 samples/sec Loss 4.5714 LearningRate 0.000829 Epoch: 7 Global Step: 149870 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:41:06,999-Speed 2499.76 samples/sec Loss 4.5983 LearningRate 0.000829 Epoch: 7 Global Step: 149880 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:41:15,144-Speed 2514.73 samples/sec Loss 4.5941 LearningRate 0.000829 Epoch: 7 Global Step: 149890 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:41:23,342-Speed 2498.44 samples/sec Loss 4.5385 
LearningRate 0.000829 Epoch: 7 Global Step: 149900 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:41:31,538-Speed 2499.30 samples/sec Loss 4.5538 LearningRate 0.000829 Epoch: 7 Global Step: 149910 Fp16 Grad Scale: 65536 Required: 155 hours Training: 2022-07-07 00:41:39,693-Speed 2511.81 samples/sec Loss 4.5048 LearningRate 0.000829 Epoch: 7 Global Step: 149920 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:41:47,890-Speed 2498.94 samples/sec Loss 4.5079 LearningRate 0.000829 Epoch: 7 Global Step: 149930 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:41:56,094-Speed 2496.82 samples/sec Loss 4.4331 LearningRate 0.000829 Epoch: 7 Global Step: 149940 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:42:04,240-Speed 2514.43 samples/sec Loss 4.5411 LearningRate 0.000829 Epoch: 7 Global Step: 149950 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:42:12,453-Speed 2494.40 samples/sec Loss 4.5172 LearningRate 0.000829 Epoch: 7 Global Step: 149960 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:42:20,658-Speed 2496.29 samples/sec Loss 4.4613 LearningRate 0.000829 Epoch: 7 Global Step: 149970 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:42:28,857-Speed 2498.58 samples/sec Loss 4.5072 LearningRate 0.000829 Epoch: 7 Global Step: 149980 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:42:37,058-Speed 2497.51 samples/sec Loss 4.6024 LearningRate 0.000828 Epoch: 7 Global Step: 149990 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:42:45,265-Speed 2495.87 samples/sec Loss 4.5190 LearningRate 0.000828 Epoch: 7 Global Step: 150000 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:42:53,419-Speed 2511.90 samples/sec Loss 4.5046 LearningRate 0.000828 Epoch: 7 Global Step: 150010 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:43:01,621-Speed 2497.39 samples/sec Loss 4.5232 
LearningRate 0.000828 Epoch: 7 Global Step: 150020 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:43:09,820-Speed 2498.56 samples/sec Loss 4.4908 LearningRate 0.000828 Epoch: 7 Global Step: 150030 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:43:18,024-Speed 2496.90 samples/sec Loss 4.4683 LearningRate 0.000828 Epoch: 7 Global Step: 150040 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:43:26,223-Speed 2498.33 samples/sec Loss 4.5359 LearningRate 0.000828 Epoch: 7 Global Step: 150050 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:43:34,417-Speed 2499.88 samples/sec Loss 4.5058 LearningRate 0.000828 Epoch: 7 Global Step: 150060 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:43:42,573-Speed 2511.35 samples/sec Loss 4.4970 LearningRate 0.000828 Epoch: 7 Global Step: 150070 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:43:50,771-Speed 2498.83 samples/sec Loss 4.5101 LearningRate 0.000828 Epoch: 7 Global Step: 150080 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:43:58,972-Speed 2497.61 samples/sec Loss 4.5791 LearningRate 0.000828 Epoch: 7 Global Step: 150090 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:44:07,170-Speed 2498.53 samples/sec Loss 4.5389 LearningRate 0.000828 Epoch: 7 Global Step: 150100 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:44:15,368-Speed 2498.99 samples/sec Loss 4.5882 LearningRate 0.000828 Epoch: 7 Global Step: 150110 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:44:23,584-Speed 2493.30 samples/sec Loss 4.5812 LearningRate 0.000828 Epoch: 7 Global Step: 150120 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:44:31,727-Speed 2515.38 samples/sec Loss 4.6333 LearningRate 0.000828 Epoch: 7 Global Step: 150130 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:44:39,922-Speed 2499.39 samples/sec Loss 4.5743 
LearningRate 0.000828 Epoch: 7 Global Step: 150140 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:44:48,121-Speed 2498.68 samples/sec Loss 4.5200 LearningRate 0.000828 Epoch: 7 Global Step: 150150 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:44:56,319-Speed 2498.55 samples/sec Loss 4.6136 LearningRate 0.000828 Epoch: 7 Global Step: 150160 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:45:04,518-Speed 2498.17 samples/sec Loss 4.5492 LearningRate 0.000828 Epoch: 7 Global Step: 150170 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:45:12,723-Speed 2496.36 samples/sec Loss 4.6179 LearningRate 0.000828 Epoch: 7 Global Step: 150180 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:45:20,864-Speed 2516.04 samples/sec Loss 4.4826 LearningRate 0.000828 Epoch: 7 Global Step: 150190 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:45:29,067-Speed 2497.36 samples/sec Loss 4.5441 LearningRate 0.000828 Epoch: 7 Global Step: 150200 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:45:37,266-Speed 2498.19 samples/sec Loss 4.6171 LearningRate 0.000828 Epoch: 7 Global Step: 150210 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:45:45,467-Speed 2497.57 samples/sec Loss 4.5814 LearningRate 0.000828 Epoch: 7 Global Step: 150220 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:45:53,668-Speed 2497.73 samples/sec Loss 4.6256 LearningRate 0.000828 Epoch: 7 Global Step: 150230 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:46:01,871-Speed 2497.22 samples/sec Loss 4.5395 LearningRate 0.000828 Epoch: 7 Global Step: 150240 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:46:10,024-Speed 2512.50 samples/sec Loss 4.4842 LearningRate 0.000828 Epoch: 7 Global Step: 150250 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:46:18,223-Speed 2498.25 samples/sec Loss 4.6288 
LearningRate 0.000828 Epoch: 7 Global Step: 150260 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:46:26,422-Speed 2498.19 samples/sec Loss 4.5050 LearningRate 0.000828 Epoch: 7 Global Step: 150270 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:46:34,620-Speed 2498.72 samples/sec Loss 4.5324 LearningRate 0.000828 Epoch: 7 Global Step: 150280 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:46:42,818-Speed 2498.65 samples/sec Loss 4.5952 LearningRate 0.000828 Epoch: 7 Global Step: 150290 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:46:51,018-Speed 2498.10 samples/sec Loss 4.5203 LearningRate 0.000828 Epoch: 7 Global Step: 150300 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:46:59,176-Speed 2511.12 samples/sec Loss 4.5367 LearningRate 0.000828 Epoch: 7 Global Step: 150310 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:47:07,373-Speed 2498.72 samples/sec Loss 4.5615 LearningRate 0.000828 Epoch: 7 Global Step: 150320 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:47:15,571-Speed 2499.04 samples/sec Loss 4.5220 LearningRate 0.000828 Epoch: 7 Global Step: 150330 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:47:23,766-Speed 2499.42 samples/sec Loss 4.5280 LearningRate 0.000828 Epoch: 7 Global Step: 150340 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:47:31,960-Speed 2499.61 samples/sec Loss 4.5972 LearningRate 0.000828 Epoch: 7 Global Step: 150350 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:47:40,157-Speed 2498.78 samples/sec Loss 4.5891 LearningRate 0.000828 Epoch: 7 Global Step: 150360 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:47:48,307-Speed 2513.29 samples/sec Loss 4.5141 LearningRate 0.000828 Epoch: 7 Global Step: 150370 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:47:56,506-Speed 2498.52 samples/sec Loss 4.5202 
LearningRate 0.000828 Epoch: 7 Global Step: 150380 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:48:04,704-Speed 2498.58 samples/sec Loss 4.5046 LearningRate 0.000828 Epoch: 7 Global Step: 150390 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:48:12,902-Speed 2498.51 samples/sec Loss 4.5101 LearningRate 0.000827 Epoch: 7 Global Step: 150400 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:48:21,100-Speed 2498.67 samples/sec Loss 4.5012 LearningRate 0.000827 Epoch: 7 Global Step: 150410 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:48:29,303-Speed 2497.17 samples/sec Loss 4.5547 LearningRate 0.000827 Epoch: 7 Global Step: 150420 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:48:37,451-Speed 2513.65 samples/sec Loss 4.5454 LearningRate 0.000827 Epoch: 7 Global Step: 150430 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:48:45,653-Speed 2497.86 samples/sec Loss 4.4857 LearningRate 0.000827 Epoch: 7 Global Step: 150440 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:48:53,851-Speed 2498.51 samples/sec Loss 4.5360 LearningRate 0.000827 Epoch: 7 Global Step: 150450 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:49:02,053-Speed 2497.29 samples/sec Loss 4.4795 LearningRate 0.000827 Epoch: 7 Global Step: 150460 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:49:10,255-Speed 2497.65 samples/sec Loss 4.5049 LearningRate 0.000827 Epoch: 7 Global Step: 150470 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:49:18,454-Speed 2498.13 samples/sec Loss 4.5931 LearningRate 0.000827 Epoch: 7 Global Step: 150480 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:49:26,605-Speed 2513.18 samples/sec Loss 4.5850 LearningRate 0.000827 Epoch: 7 Global Step: 150490 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:49:34,811-Speed 2496.12 samples/sec Loss 4.5694 
LearningRate 0.000827 Epoch: 7 Global Step: 150500 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:49:43,015-Speed 2496.71 samples/sec Loss 4.5792 LearningRate 0.000827 Epoch: 7 Global Step: 150510 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:49:51,217-Speed 2497.28 samples/sec Loss 4.5061 LearningRate 0.000827 Epoch: 7 Global Step: 150520 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:49:59,418-Speed 2497.97 samples/sec Loss 4.5674 LearningRate 0.000827 Epoch: 7 Global Step: 150530 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:50:07,627-Speed 2495.23 samples/sec Loss 4.5666 LearningRate 0.000827 Epoch: 7 Global Step: 150540 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:50:15,775-Speed 2513.87 samples/sec Loss 4.5546 LearningRate 0.000827 Epoch: 7 Global Step: 150550 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:50:23,974-Speed 2498.09 samples/sec Loss 4.5859 LearningRate 0.000827 Epoch: 7 Global Step: 150560 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:50:32,175-Speed 2497.63 samples/sec Loss 4.5357 LearningRate 0.000827 Epoch: 7 Global Step: 150570 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:50:40,378-Speed 2497.12 samples/sec Loss 4.5516 LearningRate 0.000827 Epoch: 7 Global Step: 150580 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:50:48,585-Speed 2495.98 samples/sec Loss 4.5450 LearningRate 0.000827 Epoch: 7 Global Step: 150590 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:50:56,789-Speed 2496.59 samples/sec Loss 4.5141 LearningRate 0.000827 Epoch: 7 Global Step: 150600 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:51:04,942-Speed 2512.50 samples/sec Loss 4.5462 LearningRate 0.000827 Epoch: 7 Global Step: 150610 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:51:13,143-Speed 2497.51 samples/sec Loss 4.5832 
LearningRate 0.000827 Epoch: 7 Global Step: 150620 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:51:21,351-Speed 2495.51 samples/sec Loss 4.5117 LearningRate 0.000827 Epoch: 7 Global Step: 150630 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:51:29,557-Speed 2496.40 samples/sec Loss 4.5528 LearningRate 0.000827 Epoch: 7 Global Step: 150640 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:51:37,761-Speed 2496.76 samples/sec Loss 4.5303 LearningRate 0.000827 Epoch: 7 Global Step: 150650 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:51:45,964-Speed 2497.08 samples/sec Loss 4.5555 LearningRate 0.000827 Epoch: 7 Global Step: 150660 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:51:54,143-Speed 2504.47 samples/sec Loss 4.5254 LearningRate 0.000827 Epoch: 7 Global Step: 150670 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:52:03,442-Speed 2202.67 samples/sec Loss 4.4427 LearningRate 0.000827 Epoch: 7 Global Step: 150680 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:52:11,692-Speed 2494.19 samples/sec Loss 4.5340 LearningRate 0.000827 Epoch: 7 Global Step: 150690 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:52:20,448-Speed 2498.51 samples/sec Loss 4.5858 LearningRate 0.000827 Epoch: 7 Global Step: 150700 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:52:28,660-Speed 2494.14 samples/sec Loss 4.5959 LearningRate 0.000827 Epoch: 7 Global Step: 150710 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:52:36,874-Speed 2493.86 samples/sec Loss 4.4709 LearningRate 0.000827 Epoch: 7 Global Step: 150720 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:52:45,028-Speed 2511.91 samples/sec Loss 4.5333 LearningRate 0.000827 Epoch: 7 Global Step: 150730 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:52:53,223-Speed 2499.37 samples/sec Loss 4.5488 
LearningRate 0.000827 Epoch: 7 Global Step: 150740 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:53:01,419-Speed 2499.31 samples/sec Loss 4.5306 LearningRate 0.000827 Epoch: 7 Global Step: 150750 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:53:09,619-Speed 2498.16 samples/sec Loss 4.5189 LearningRate 0.000827 Epoch: 7 Global Step: 150760 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:53:17,818-Speed 2498.27 samples/sec Loss 4.5267 LearningRate 0.000827 Epoch: 7 Global Step: 150770 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:53:26,028-Speed 2494.99 samples/sec Loss 4.4680 LearningRate 0.000827 Epoch: 7 Global Step: 150780 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:53:34,185-Speed 2511.12 samples/sec Loss 4.4260 LearningRate 0.000827 Epoch: 7 Global Step: 150790 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:53:42,388-Speed 2497.21 samples/sec Loss 4.4744 LearningRate 0.000827 Epoch: 7 Global Step: 150800 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:53:50,596-Speed 2495.61 samples/sec Loss 4.5171 LearningRate 0.000826 Epoch: 7 Global Step: 150810 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:53:58,796-Speed 2497.69 samples/sec Loss 4.5030 LearningRate 0.000826 Epoch: 7 Global Step: 150820 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:54:06,998-Speed 2497.45 samples/sec Loss 4.4387 LearningRate 0.000826 Epoch: 7 Global Step: 150830 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:54:15,198-Speed 2498.21 samples/sec Loss 4.5222 LearningRate 0.000826 Epoch: 7 Global Step: 150840 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:54:23,346-Speed 2513.64 samples/sec Loss 4.4542 LearningRate 0.000826 Epoch: 7 Global Step: 150850 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:54:31,543-Speed 2498.87 samples/sec Loss 4.4750 
LearningRate 0.000826 Epoch: 7 Global Step: 150860 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:54:39,742-Speed 2498.22 samples/sec Loss 4.5842 LearningRate 0.000826 Epoch: 7 Global Step: 150870 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:54:47,942-Speed 2498.18 samples/sec Loss 4.5908 LearningRate 0.000826 Epoch: 7 Global Step: 150880 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:54:56,137-Speed 2499.22 samples/sec Loss 4.4873 LearningRate 0.000826 Epoch: 7 Global Step: 150890 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:55:04,339-Speed 2497.38 samples/sec Loss 4.5716 LearningRate 0.000826 Epoch: 7 Global Step: 150900 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:55:12,489-Speed 2513.29 samples/sec Loss 4.5988 LearningRate 0.000826 Epoch: 7 Global Step: 150910 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:55:20,690-Speed 2497.76 samples/sec Loss 4.5603 LearningRate 0.000826 Epoch: 7 Global Step: 150920 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:55:28,889-Speed 2498.34 samples/sec Loss 4.5577 LearningRate 0.000826 Epoch: 7 Global Step: 150930 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:55:37,094-Speed 2496.61 samples/sec Loss 4.5600 LearningRate 0.000826 Epoch: 7 Global Step: 150940 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:55:45,300-Speed 2496.27 samples/sec Loss 4.5141 LearningRate 0.000826 Epoch: 7 Global Step: 150950 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:55:53,504-Speed 2496.60 samples/sec Loss 4.5726 LearningRate 0.000826 Epoch: 7 Global Step: 150960 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:56:01,654-Speed 2513.43 samples/sec Loss 4.4723 LearningRate 0.000826 Epoch: 7 Global Step: 150970 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:56:09,854-Speed 2497.95 samples/sec Loss 4.5713 
LearningRate 0.000826 Epoch: 7 Global Step: 150980 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:56:18,055-Speed 2497.77 samples/sec Loss 4.5068 LearningRate 0.000826 Epoch: 7 Global Step: 150990 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:56:26,259-Speed 2496.71 samples/sec Loss 4.5119 LearningRate 0.000826 Epoch: 7 Global Step: 151000 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:56:34,457-Speed 2498.56 samples/sec Loss 4.4406 LearningRate 0.000826 Epoch: 7 Global Step: 151010 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:56:42,672-Speed 2493.14 samples/sec Loss 4.5087 LearningRate 0.000826 Epoch: 7 Global Step: 151020 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:56:50,824-Speed 2512.94 samples/sec Loss 4.4779 LearningRate 0.000826 Epoch: 7 Global Step: 151030 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:56:59,022-Speed 2498.72 samples/sec Loss 4.5012 LearningRate 0.000826 Epoch: 7 Global Step: 151040 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:57:07,221-Speed 2498.31 samples/sec Loss 4.4712 LearningRate 0.000826 Epoch: 7 Global Step: 151050 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:57:15,421-Speed 2498.06 samples/sec Loss 4.4367 LearningRate 0.000826 Epoch: 7 Global Step: 151060 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:57:23,617-Speed 2499.19 samples/sec Loss 4.4969 LearningRate 0.000826 Epoch: 7 Global Step: 151070 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:57:31,817-Speed 2497.87 samples/sec Loss 4.5027 LearningRate 0.000826 Epoch: 7 Global Step: 151080 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:57:39,966-Speed 2513.58 samples/sec Loss 4.4827 LearningRate 0.000826 Epoch: 7 Global Step: 151090 Fp16 Grad Scale: 32768 Required: 155 hours Training: 2022-07-07 00:57:48,166-Speed 2497.82 samples/sec Loss 4.4711 
LearningRate 0.000826 Epoch: 7 Global Step: 151100 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 00:57:56,380-Speed 2493.91 samples/sec Loss 4.5058 LearningRate 0.000826 Epoch: 7 Global Step: 151110 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 00:58:04,584-Speed 2496.68 samples/sec Loss 4.4122 LearningRate 0.000826 Epoch: 7 Global Step: 151120 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 00:58:12,786-Speed 2497.21 samples/sec Loss 4.4652 LearningRate 0.000826 Epoch: 7 Global Step: 151130 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 00:58:20,996-Speed 2494.91 samples/sec Loss 4.4808 LearningRate 0.000826 Epoch: 7 Global Step: 151140 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 00:58:29,143-Speed 2514.25 samples/sec Loss 4.5339 LearningRate 0.000826 Epoch: 7 Global Step: 151150 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 00:58:37,346-Speed 2496.86 samples/sec Loss 4.4779 LearningRate 0.000826 Epoch: 7 Global Step: 151160 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 00:58:45,552-Speed 2496.80 samples/sec Loss 4.5443 LearningRate 0.000826 Epoch: 7 Global Step: 151170 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 00:58:53,709-Speed 2511.03 samples/sec Loss 4.5185 LearningRate 0.000826 Epoch: 7 Global Step: 151180 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 00:59:01,912-Speed 2497.31 samples/sec Loss 4.4909 LearningRate 0.000826 Epoch: 7 Global Step: 151190 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 00:59:10,112-Speed 2498.00 samples/sec Loss 4.5497 LearningRate 0.000826 Epoch: 7 Global Step: 151200 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 00:59:18,260-Speed 2513.85 samples/sec Loss 4.4438 LearningRate 0.000826 Epoch: 7 Global Step: 151210 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 00:59:26,464-Speed 2497.09 samples/sec Loss 4.4688 LearningRate 0.000825 Epoch: 7 Global Step: 151220 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 00:59:34,664-Speed 2497.63 samples/sec Loss 4.4877 LearningRate 0.000825 Epoch: 7 Global Step: 151230 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 00:59:42,863-Speed 2498.50 samples/sec Loss 4.5189 LearningRate 0.000825 Epoch: 7 Global Step: 151240 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 00:59:51,072-Speed 2495.43 samples/sec Loss 4.4445 LearningRate 0.000825 Epoch: 7 Global Step: 151250 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 00:59:59,276-Speed 2497.04 samples/sec Loss 4.5430 LearningRate 0.000825 Epoch: 7 Global Step: 151260 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:00:07,424-Speed 2514.74 samples/sec Loss 4.4568 LearningRate 0.000825 Epoch: 7 Global Step: 151270 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:00:15,650-Speed 2490.25 samples/sec Loss 4.4746 LearningRate 0.000825 Epoch: 7 Global Step: 151280 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:00:23,852-Speed 2497.60 samples/sec Loss 4.4779 LearningRate 0.000825 Epoch: 7 Global Step: 151290 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:00:32,052-Speed 2498.30 samples/sec Loss 4.4831 LearningRate 0.000825 Epoch: 7 Global Step: 151300 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:00:40,250-Speed 2498.32 samples/sec Loss 4.4351 LearningRate 0.000825 Epoch: 7 Global Step: 151310 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:00:48,450-Speed 2498.01 samples/sec Loss 4.4653 LearningRate 0.000825 Epoch: 7 Global Step: 151320 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:00:56,604-Speed 2512.20 samples/sec Loss 4.4927 LearningRate 0.000825 Epoch: 7 Global Step: 151330 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:01:04,802-Speed 2498.49 samples/sec Loss 4.5142 LearningRate 0.000825 Epoch: 7 Global Step: 151340 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:01:13,004-Speed 2497.10 samples/sec Loss 4.5335 LearningRate 0.000825 Epoch: 7 Global Step: 151350 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:01:21,211-Speed 2495.95 samples/sec Loss 4.6489 LearningRate 0.000825 Epoch: 7 Global Step: 151360 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:01:29,419-Speed 2495.37 samples/sec Loss 4.5562 LearningRate 0.000825 Epoch: 7 Global Step: 151370 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:01:37,618-Speed 2498.48 samples/sec Loss 4.6201 LearningRate 0.000825 Epoch: 7 Global Step: 151380 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:01:45,766-Speed 2513.66 samples/sec Loss 4.5655 LearningRate 0.000825 Epoch: 7 Global Step: 151390 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:01:53,961-Speed 2499.61 samples/sec Loss 4.6414 LearningRate 0.000825 Epoch: 7 Global Step: 151400 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:02:02,162-Speed 2497.71 samples/sec Loss 4.4787 LearningRate 0.000825 Epoch: 7 Global Step: 151410 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:02:10,360-Speed 2498.62 samples/sec Loss 4.5278 LearningRate 0.000825 Epoch: 7 Global Step: 151420 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:02:18,566-Speed 2496.05 samples/sec Loss 4.5637 LearningRate 0.000825 Epoch: 7 Global Step: 151430 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:02:26,763-Speed 2499.01 samples/sec Loss 4.6071 LearningRate 0.000825 Epoch: 7 Global Step: 151440 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:02:34,905-Speed 2515.66 samples/sec Loss 4.5271 LearningRate 0.000825 Epoch: 7 Global Step: 151450 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:02:43,099-Speed 2499.68 samples/sec Loss 4.5040 LearningRate 0.000825 Epoch: 7 Global Step: 151460 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:02:51,292-Speed 2500.26 samples/sec Loss 4.4972 LearningRate 0.000825 Epoch: 7 Global Step: 151470 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:02:59,491-Speed 2498.46 samples/sec Loss 4.4253 LearningRate 0.000825 Epoch: 7 Global Step: 151480 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:03:07,688-Speed 2499.10 samples/sec Loss 4.4562 LearningRate 0.000825 Epoch: 7 Global Step: 151490 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:03:15,885-Speed 2499.07 samples/sec Loss 4.5108 LearningRate 0.000825 Epoch: 7 Global Step: 151500 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:03:24,031-Speed 2514.49 samples/sec Loss 4.4473 LearningRate 0.000825 Epoch: 7 Global Step: 151510 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:03:32,237-Speed 2496.02 samples/sec Loss 4.5693 LearningRate 0.000825 Epoch: 7 Global Step: 151520 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:03:40,439-Speed 2497.79 samples/sec Loss 4.4584 LearningRate 0.000825 Epoch: 7 Global Step: 151530 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:03:48,651-Speed 2494.33 samples/sec Loss 4.5096 LearningRate 0.000825 Epoch: 7 Global Step: 151540 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:03:56,842-Speed 2500.86 samples/sec Loss 4.5707 LearningRate 0.000825 Epoch: 7 Global Step: 151550 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:04:05,040-Speed 2498.84 samples/sec Loss 4.4798 LearningRate 0.000825 Epoch: 7 Global Step: 151560 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:04:13,186-Speed 2515.07 samples/sec Loss 4.5065 LearningRate 0.000825 Epoch: 7 Global Step: 151570 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:04:21,380-Speed 2499.62 samples/sec Loss 4.4225 LearningRate 0.000825 Epoch: 7 Global Step: 151580 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:04:29,578-Speed 2498.55 samples/sec Loss 4.5051 LearningRate 0.000825 Epoch: 7 Global Step: 151590 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:04:37,777-Speed 2498.43 samples/sec Loss 4.4972 LearningRate 0.000825 Epoch: 7 Global Step: 151600 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:04:45,980-Speed 2497.02 samples/sec Loss 4.4211 LearningRate 0.000825 Epoch: 7 Global Step: 151610 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:04:54,210-Speed 2488.87 samples/sec Loss 4.5515 LearningRate 0.000825 Epoch: 7 Global Step: 151620 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:05:02,348-Speed 2516.82 samples/sec Loss 4.4934 LearningRate 0.000824 Epoch: 7 Global Step: 151630 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:05:10,547-Speed 2498.30 samples/sec Loss 4.5093 LearningRate 0.000824 Epoch: 7 Global Step: 151640 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:05:18,742-Speed 2499.59 samples/sec Loss 4.4523 LearningRate 0.000824 Epoch: 7 Global Step: 151650 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:05:26,935-Speed 2500.16 samples/sec Loss 4.4575 LearningRate 0.000824 Epoch: 7 Global Step: 151660 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:05:35,132-Speed 2498.75 samples/sec Loss 4.4178 LearningRate 0.000824 Epoch: 7 Global Step: 151670 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:05:43,329-Speed 2499.41 samples/sec Loss 4.4598 LearningRate 0.000824 Epoch: 7 Global Step: 151680 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:05:51,476-Speed 2514.18 samples/sec Loss 4.5053 LearningRate 0.000824 Epoch: 7 Global Step: 151690 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:05:59,692-Speed 2493.28 samples/sec Loss 4.5077 LearningRate 0.000824 Epoch: 7 Global Step: 151700 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:06:07,889-Speed 2498.99 samples/sec Loss 4.4964 LearningRate 0.000824 Epoch: 7 Global Step: 151710 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:06:16,088-Speed 2498.40 samples/sec Loss 4.4664 LearningRate 0.000824 Epoch: 7 Global Step: 151720 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:06:24,281-Speed 2500.10 samples/sec Loss 4.5473 LearningRate 0.000824 Epoch: 7 Global Step: 151730 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:06:32,485-Speed 2496.63 samples/sec Loss 4.4046 LearningRate 0.000824 Epoch: 7 Global Step: 151740 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:06:40,625-Speed 2516.33 samples/sec Loss 4.4847 LearningRate 0.000824 Epoch: 7 Global Step: 151750 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:06:48,824-Speed 2498.39 samples/sec Loss 4.4678 LearningRate 0.000824 Epoch: 7 Global Step: 151760 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:06:57,026-Speed 2497.34 samples/sec Loss 4.4634 LearningRate 0.000824 Epoch: 7 Global Step: 151770 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:07:05,223-Speed 2498.89 samples/sec Loss 4.4871 LearningRate 0.000824 Epoch: 7 Global Step: 151780 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:07:13,420-Speed 2498.93 samples/sec Loss 4.5441 LearningRate 0.000824 Epoch: 7 Global Step: 151790 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:07:21,619-Speed 2498.46 samples/sec Loss 4.4717 LearningRate 0.000824 Epoch: 7 Global Step: 151800 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:07:29,765-Speed 2514.56 samples/sec Loss 4.4378 LearningRate 0.000824 Epoch: 7 Global Step: 151810 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:07:37,965-Speed 2497.72 samples/sec Loss 4.4225 LearningRate 0.000824 Epoch: 7 Global Step: 151820 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:07:46,165-Speed 2498.20 samples/sec Loss 4.4676 LearningRate 0.000824 Epoch: 7 Global Step: 151830 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:07:54,370-Speed 2496.50 samples/sec Loss 4.4661 LearningRate 0.000824 Epoch: 7 Global Step: 151840 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:08:02,570-Speed 2497.70 samples/sec Loss 4.5413 LearningRate 0.000824 Epoch: 7 Global Step: 151850 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:08:10,773-Speed 2497.24 samples/sec Loss 4.5111 LearningRate 0.000824 Epoch: 7 Global Step: 151860 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:08:18,920-Speed 2514.18 samples/sec Loss 4.5730 LearningRate 0.000824 Epoch: 7 Global Step: 151870 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:08:27,121-Speed 2497.56 samples/sec Loss 4.5305 LearningRate 0.000824 Epoch: 7 Global Step: 151880 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:08:35,318-Speed 2498.97 samples/sec Loss 4.4109 LearningRate 0.000824 Epoch: 7 Global Step: 151890 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:08:43,521-Speed 2496.90 samples/sec Loss 4.4468 LearningRate 0.000824 Epoch: 7 Global Step: 151900 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:08:51,727-Speed 2496.38 samples/sec Loss 4.5353 LearningRate 0.000824 Epoch: 7 Global Step: 151910 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:08:59,921-Speed 2499.64 samples/sec Loss 4.4567 LearningRate 0.000824 Epoch: 7 Global Step: 151920 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:09:08,066-Speed 2514.83 samples/sec Loss 4.5227 LearningRate 0.000824 Epoch: 7 Global Step: 151930 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:09:16,262-Speed 2499.20 samples/sec Loss 4.5637 LearningRate 0.000824 Epoch: 7 Global Step: 151940 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:09:24,459-Speed 2499.08 samples/sec Loss 4.4102 LearningRate 0.000824 Epoch: 7 Global Step: 151950 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:09:32,658-Speed 2498.13 samples/sec Loss 4.5603 LearningRate 0.000824 Epoch: 7 Global Step: 151960 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:09:40,856-Speed 2498.64 samples/sec Loss 4.4766 LearningRate 0.000824 Epoch: 7 Global Step: 151970 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:09:49,055-Speed 2498.54 samples/sec Loss 4.4485 LearningRate 0.000824 Epoch: 7 Global Step: 151980 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:09:57,203-Speed 2514.14 samples/sec Loss 4.4484 LearningRate 0.000824 Epoch: 7 Global Step: 151990 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:10:05,399-Speed 2499.04 samples/sec Loss 4.4789 LearningRate 0.000824 Epoch: 7 Global Step: 152000 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:10:13,600-Speed 2497.43 samples/sec Loss 4.4886 LearningRate 0.000824 Epoch: 7 Global Step: 152010 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:10:21,800-Speed 2498.03 samples/sec Loss 4.4880 LearningRate 0.000824 Epoch: 7 Global Step: 152020 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:10:30,002-Speed 2497.31 samples/sec Loss 4.4279 LearningRate 0.000824 Epoch: 7 Global Step: 152030 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:10:38,205-Speed 2496.85 samples/sec Loss 4.4678 LearningRate 0.000824 Epoch: 7 Global Step: 152040 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:10:46,351-Speed 2514.55 samples/sec Loss 4.5164 LearningRate 0.000823 Epoch: 7 Global Step: 152050 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:10:54,553-Speed 2497.71 samples/sec Loss 4.5074 LearningRate 0.000823 Epoch: 7 Global Step: 152060 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:11:02,752-Speed 2498.35 samples/sec Loss 4.5022 LearningRate 0.000823 Epoch: 7 Global Step: 152070 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:11:10,962-Speed 2494.69 samples/sec Loss 4.4147 LearningRate 0.000823 Epoch: 7 Global Step: 152080 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:11:19,178-Speed 2493.28 samples/sec Loss 4.4893 LearningRate 0.000823 Epoch: 7 Global Step: 152090 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:11:27,379-Speed 2497.41 samples/sec Loss 4.5126 LearningRate 0.000823 Epoch: 7 Global Step: 152100 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:11:35,526-Speed 2514.35 samples/sec Loss 4.5042 LearningRate 0.000823 Epoch: 7 Global Step: 152110 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:11:43,723-Speed 2498.87 samples/sec Loss 4.5254 LearningRate 0.000823 Epoch: 7 Global Step: 152120 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:11:51,923-Speed 2497.77 samples/sec Loss 4.5201 LearningRate 0.000823 Epoch: 7 Global Step: 152130 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:12:00,141-Speed 2492.61 samples/sec Loss 4.5790 LearningRate 0.000823 Epoch: 7 Global Step: 152140 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:12:08,340-Speed 2498.18 samples/sec Loss 4.5936 LearningRate 0.000823 Epoch: 7 Global Step: 152150 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:12:16,538-Speed 2498.55 samples/sec Loss 4.4742 LearningRate 0.000823 Epoch: 7 Global Step: 152160 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:12:24,685-Speed 2514.05 samples/sec Loss 4.5320 LearningRate 0.000823 Epoch: 7 Global Step: 152170 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:12:32,881-Speed 2499.22 samples/sec Loss 4.5028 LearningRate 0.000823 Epoch: 7 Global Step: 152180 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:12:41,088-Speed 2495.93 samples/sec Loss 4.5092 LearningRate 0.000823 Epoch: 7 Global Step: 152190 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:12:49,289-Speed 2497.86 samples/sec Loss 4.4853 LearningRate 0.000823 Epoch: 7 Global Step: 152200 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:12:57,501-Speed 2494.15 samples/sec Loss 4.4720 LearningRate 0.000823 Epoch: 7 Global Step: 152210 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:13:05,700-Speed 2498.28 samples/sec Loss 4.5675 LearningRate 0.000823 Epoch: 7 Global Step: 152220 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:13:13,843-Speed 2515.53 samples/sec Loss 4.5393 LearningRate 0.000823 Epoch: 7 Global Step: 152230 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:13:22,040-Speed 2498.78 samples/sec Loss 4.5151 LearningRate 0.000823 Epoch: 7 Global Step: 152240 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:13:30,239-Speed 2498.34 samples/sec Loss 4.4914 LearningRate 0.000823 Epoch: 7 Global Step: 152250 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:13:38,435-Speed 2500.34 samples/sec Loss 4.4295 LearningRate 0.000823 Epoch: 7 Global Step: 152260 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:13:46,650-Speed 2493.24 samples/sec Loss 4.4744 LearningRate 0.000823 Epoch: 7 Global Step: 152270 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:13:54,846-Speed 2499.03 samples/sec Loss 4.4320 LearningRate 0.000823 Epoch: 7 Global Step: 152280 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:14:02,996-Speed 2513.19 samples/sec Loss 4.5144 LearningRate 0.000823 Epoch: 7 Global Step: 152290 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:14:11,199-Speed 2497.06 samples/sec Loss 4.4416 LearningRate 0.000823 Epoch: 7 Global Step: 152300 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:14:19,397-Speed 2498.60 samples/sec Loss 4.4921 LearningRate 0.000823 Epoch: 7 Global Step: 152310 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:14:27,599-Speed 2497.39 samples/sec Loss 4.6008 LearningRate 0.000823 Epoch: 7 Global Step: 152320 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:14:35,798-Speed 2498.38 samples/sec Loss 4.6414 LearningRate 0.000823 Epoch: 7 Global Step: 152330 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:14:44,001-Speed 2497.06 samples/sec Loss 4.4990 LearningRate 0.000823 Epoch: 7 Global Step: 152340 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:14:52,151-Speed 2513.37 samples/sec Loss 4.5131 LearningRate 0.000823 Epoch: 7 Global Step: 152350 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:15:00,348-Speed 2498.78 samples/sec Loss 4.4430 LearningRate 0.000823 Epoch: 7 Global Step: 152360 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:15:08,559-Speed 2494.77 samples/sec Loss 4.6477 LearningRate 0.000823 Epoch: 7 Global Step: 152370 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:15:16,755-Speed 2499.03 samples/sec Loss 4.5632 LearningRate 0.000823 Epoch: 7 Global Step: 152380 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:15:24,951-Speed 2499.20 samples/sec Loss 4.5125 LearningRate 0.000823 Epoch: 7 Global Step: 152390 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:15:33,154-Speed 2497.22 samples/sec Loss 4.7038 LearningRate 0.000823 Epoch: 7 Global Step: 152400 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:15:41,300-Speed 2514.48 samples/sec Loss 4.6025 LearningRate 0.000823 Epoch: 7 Global Step: 152410 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:15:49,504-Speed 2496.74 samples/sec Loss 4.5522 LearningRate 0.000823 Epoch: 7 Global Step: 152420 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:15:57,718-Speed 2493.49 samples/sec Loss 4.6087 LearningRate 0.000823 Epoch: 7 Global Step: 152430 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:16:05,934-Speed 2493.31 samples/sec Loss 4.5331 LearningRate 0.000823 Epoch: 7 Global Step: 152440 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:16:14,133-Speed 2498.58 samples/sec Loss 4.5165 LearningRate 0.000823 Epoch: 7 Global Step: 152450 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:16:22,337-Speed 2496.54 samples/sec Loss 4.5221 LearningRate 0.000822 Epoch: 7 Global Step: 152460 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:16:30,485-Speed 2514.64 samples/sec Loss 4.4607 LearningRate 0.000822 Epoch: 7 Global Step: 152470 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:16:38,689-Speed 2496.91 samples/sec Loss 4.4865 LearningRate 0.000822 Epoch: 7 Global Step: 152480 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:16:46,891-Speed 2497.25 samples/sec Loss 4.4930 LearningRate 0.000822 Epoch: 7 Global Step: 152490 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:16:55,111-Speed 2491.98 samples/sec Loss 4.5861 LearningRate 0.000822 Epoch: 7 Global Step: 152500 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:17:03,311-Speed 2498.11 samples/sec Loss 4.4826 LearningRate 0.000822 Epoch: 7 Global Step: 152510 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:17:11,521-Speed 2495.30 samples/sec Loss 4.5006 LearningRate 0.000822 Epoch: 7 Global Step: 152520 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:17:19,668-Speed 2514.05 samples/sec Loss 4.4599 LearningRate 0.000822 Epoch: 7 Global Step: 152530 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:17:27,880-Speed 2494.41 samples/sec Loss 4.4741 LearningRate 0.000822 Epoch: 7 Global Step: 152540 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:17:36,079-Speed 2498.00 samples/sec Loss 4.4589 LearningRate 0.000822 Epoch: 7 Global Step: 152550 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:17:44,283-Speed 2496.99 samples/sec Loss 4.4800 LearningRate 0.000822 Epoch: 7 Global Step: 152560 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:17:52,485-Speed 2497.26 samples/sec Loss 4.4644 LearningRate 0.000822 Epoch: 7 Global Step: 152570 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:18:00,691-Speed 2496.12 samples/sec Loss 4.5414 LearningRate 0.000822 Epoch: 7 Global Step: 152580 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:18:08,838-Speed 2514.39 samples/sec Loss 4.4735 LearningRate 0.000822 Epoch: 7 Global Step: 152590 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:18:17,037-Speed 2498.16 samples/sec Loss 4.5579 LearningRate 0.000822 Epoch: 7 Global Step: 152600 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:18:25,236-Speed 2498.31 samples/sec Loss 4.4171 LearningRate 0.000822 Epoch: 7 Global Step: 152610 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:18:33,438-Speed 2497.50 samples/sec Loss 4.4226 LearningRate 0.000822 Epoch: 7 Global Step: 152620 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:18:41,648-Speed 2494.87 samples/sec Loss 4.5732 LearningRate 0.000822 Epoch: 7 Global Step: 152630 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:18:49,846-Speed 2498.36 samples/sec Loss 4.4428 LearningRate 0.000822 Epoch: 7 Global Step: 152640 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:18:57,995-Speed 2513.79 samples/sec Loss 4.4341 LearningRate 0.000822 Epoch: 7 Global Step: 152650 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:19:06,192-Speed 2499.00 samples/sec Loss 4.5265 LearningRate 0.000822 Epoch: 7 Global Step: 152660 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:19:14,389-Speed 2499.08 samples/sec Loss 4.4188 LearningRate 0.000822 Epoch: 7 Global Step: 152670 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:19:22,590-Speed 2497.70 samples/sec Loss 4.5004 LearningRate 0.000822 Epoch: 7 Global Step: 152680 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:19:30,787-Speed 2498.86 samples/sec Loss 4.4766 LearningRate 0.000822 Epoch: 7 Global Step: 152690 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:19:38,990-Speed 2496.94 samples/sec Loss 4.4111 LearningRate 0.000822 Epoch: 7 Global Step: 152700 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:19:47,133-Speed 2515.50 samples/sec Loss 4.4096 LearningRate 0.000822 Epoch: 7 Global Step: 152710 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:19:55,337-Speed 2496.91 samples/sec Loss 4.4184 LearningRate 0.000822 Epoch: 7 Global Step: 152720 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:20:03,538-Speed 2497.39 samples/sec Loss 4.4814 LearningRate 0.000822 Epoch: 7 Global Step: 152730 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:20:11,742-Speed 2496.83 samples/sec Loss 4.4320 LearningRate 0.000822 Epoch: 7 Global Step: 152740 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:20:19,944-Speed 2497.51 samples/sec Loss 4.5335 LearningRate 0.000822 Epoch: 7 Global Step: 152750 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:20:28,147-Speed 2496.67 samples/sec Loss 4.4193 LearningRate 0.000822 Epoch: 7 Global Step: 152760 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:20:36,313-Speed 2508.73 samples/sec Loss 4.4325 LearningRate 0.000822 Epoch: 7 Global Step: 152770 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:20:44,515-Speed 2497.59 samples/sec Loss 4.4933 LearningRate 0.000822 Epoch: 7 Global Step: 152780 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:20:52,721-Speed 2495.99 samples/sec Loss 4.4313 LearningRate 0.000822 Epoch: 7 Global Step: 152790 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:21:00,921-Speed 2498.31 samples/sec Loss 4.4509 LearningRate 0.000822 Epoch: 7 Global Step: 152800 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:21:09,125-Speed 2496.54 samples/sec Loss 4.4078 LearningRate 0.000822 Epoch: 7 Global Step: 152810 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:21:17,329-Speed 2497.09 samples/sec Loss 4.4546 LearningRate 0.000822 Epoch: 7 Global Step: 152820 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:21:25,473-Speed 2514.96 samples/sec Loss 4.4550 LearningRate 0.000822 Epoch: 7 Global Step: 152830 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:21:33,680-Speed 2495.89 samples/sec Loss 4.4451 LearningRate 0.000822 Epoch: 7 Global Step: 152840 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:21:41,880-Speed 2497.85 samples/sec Loss 4.4161 LearningRate 0.000822 Epoch: 7 Global Step: 152850 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:21:50,077-Speed 2498.97 samples/sec Loss 4.5842 LearningRate 0.000822 Epoch: 7 Global Step: 152860 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:21:58,276-Speed 2497.95 samples/sec Loss 4.4933 LearningRate 0.000821 Epoch: 7 Global Step: 152870 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:22:06,478-Speed 2497.44 samples/sec Loss 4.4425 LearningRate 0.000821 Epoch: 7 Global Step: 152880 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:22:14,625-Speed 2514.47 samples/sec Loss 4.4388 LearningRate 0.000821 Epoch: 7 Global Step: 152890 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:22:22,828-Speed 2496.90 samples/sec Loss 4.4250 LearningRate 0.000821 Epoch: 7 Global Step: 152900 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:22:31,039-Speed 2494.65 samples/sec Loss 4.3910 LearningRate 0.000821 Epoch: 7 Global Step: 152910 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:22:39,239-Speed 2497.96 samples/sec Loss 4.4550 LearningRate 0.000821 Epoch: 7 Global Step: 152920 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:22:47,447-Speed 2495.49 samples/sec Loss 4.5022 LearningRate 0.000821 Epoch: 7 Global Step: 152930 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:22:55,659-Speed 2494.30 samples/sec Loss 4.4605 LearningRate 0.000821 Epoch: 7 Global Step: 152940 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:23:03,812-Speed 2512.09 samples/sec Loss 4.4482 LearningRate 0.000821 Epoch: 7 Global Step: 152950 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:23:12,028-Speed 2498.74 samples/sec Loss 4.5204 LearningRate 0.000821 Epoch: 7 Global Step: 152960 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:23:20,252-Speed 2495.60 samples/sec Loss 4.5460 LearningRate 0.000821 Epoch: 7 Global Step: 152970 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:23:28,466-Speed 2493.64 samples/sec Loss 4.4673 LearningRate 0.000821 Epoch: 7 Global Step: 152980 Fp16 Grad Scale: 65536 Required: 155 hours
Training: 2022-07-07 01:23:38,654-Speed 2010.70 samples/sec Loss 4.3698 LearningRate 0.000821 Epoch: 7 Global Step: 152990 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:23:46,849-Speed 2499.37 samples/sec Loss 4.4405 LearningRate 0.000821 Epoch: 7 Global Step: 153000 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:23:54,996-Speed 2514.22 samples/sec Loss 4.4385 LearningRate 0.000821 Epoch: 7 Global Step: 153010 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:24:03,194-Speed 2498.58 samples/sec Loss 4.3872 LearningRate 0.000821 Epoch: 7 Global Step: 153020 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:24:11,390-Speed 2499.34 samples/sec Loss 4.4223 LearningRate 0.000821 Epoch: 7 Global Step: 153030 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:24:19,590-Speed 2498.09 samples/sec Loss 4.3508 LearningRate 0.000821 Epoch: 7 Global Step: 153040 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:24:27,788-Speed 2498.24 samples/sec Loss 4.4121 LearningRate 0.000821 Epoch: 7 Global Step: 153050 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:24:35,996-Speed 2496.41 samples/sec Loss 4.4601 LearningRate 0.000821 Epoch: 7 Global Step: 153060 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:24:44,149-Speed 2512.55 samples/sec Loss 4.4231 LearningRate 0.000821 Epoch: 7 Global Step: 153070 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:24:52,366-Speed 2492.73 samples/sec Loss 4.4249 LearningRate 0.000821 Epoch: 7 Global Step: 153080 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:25:00,564-Speed 2498.74 samples/sec Loss 4.4501 LearningRate 0.000821 Epoch: 7 Global Step: 153090 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:25:08,769-Speed 2496.49 samples/sec Loss 4.4673 LearningRate 0.000821 Epoch: 7 Global Step: 153100 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:25:16,968-Speed 2498.09 samples/sec Loss 4.3866 LearningRate 0.000821 Epoch: 7 Global Step: 153110 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:25:25,177-Speed 2495.41 samples/sec Loss 4.4231 LearningRate 0.000821 Epoch: 7 Global Step: 153120 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:25:33,338-Speed 2509.92 samples/sec Loss 4.3647 LearningRate 0.000821 Epoch: 7 Global Step: 153130 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:25:41,543-Speed 2496.44 samples/sec Loss 4.4882 LearningRate 0.000821 Epoch: 7 Global Step: 153140 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:25:49,755-Speed 2494.20 samples/sec Loss 4.4072 LearningRate 0.000821 Epoch: 7 Global Step: 153150 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:25:57,960-Speed 2496.56 samples/sec Loss 4.4676 LearningRate 0.000821 Epoch: 7 Global Step: 153160 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:26:06,166-Speed 2496.07 samples/sec Loss 4.4368 LearningRate 0.000821 Epoch: 7 Global Step: 153170 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:26:14,367-Speed 2497.68 samples/sec Loss 4.4906 LearningRate 0.000821 Epoch: 7 Global Step: 153180 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:26:22,512-Speed 2514.79 samples/sec Loss 4.4443 LearningRate 0.000821 Epoch: 7 Global Step: 153190 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:26:30,719-Speed 2495.71 samples/sec Loss 4.3989 LearningRate 0.000821 Epoch: 7 Global Step: 153200 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:26:38,913-Speed 2499.91 samples/sec Loss 4.4147 LearningRate 0.000821 Epoch: 7 Global Step: 153210 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:26:47,115-Speed 2497.24 samples/sec Loss 4.4258 LearningRate 0.000821 Epoch: 7 Global Step: 153220 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:26:55,312-Speed 2499.14 samples/sec Loss 4.4241 LearningRate 0.000821 Epoch: 7 Global Step: 153230 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:27:03,510-Speed 2498.42 samples/sec Loss 4.3889 LearningRate 0.000821 Epoch: 7 Global Step: 153240 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:27:11,658-Speed 2514.27 samples/sec Loss 4.4838 LearningRate 0.000821 Epoch: 7 Global Step: 153250 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:27:19,858-Speed 2498.04 samples/sec Loss 4.5168 LearningRate 0.000821 Epoch: 7 Global Step: 153260 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:27:28,057-Speed 2498.48 samples/sec Loss 4.4787 LearningRate 0.000821 Epoch: 7 Global Step: 153270 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:27:36,262-Speed 2496.46 samples/sec Loss 4.4364 LearningRate 0.000820 Epoch: 7 Global Step: 153280 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:27:44,471-Speed 2495.41 samples/sec Loss 4.4062 LearningRate 0.000820 Epoch: 7 Global Step: 153290 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:27:52,672-Speed 2497.60 samples/sec Loss 4.4618 LearningRate 0.000820 Epoch: 7 Global Step: 153300 Fp16 Grad Scale: 32768 Required: 155 hours
Training: 2022-07-07 01:28:00,825-Speed 2512.41 samples/sec Loss 4.4445 LearningRate 0.000820 Epoch: 7 Global Step: 153310 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:28:09,040-Speed 2493.42 samples/sec Loss 4.4684 LearningRate 0.000820 Epoch: 7 Global Step: 153320 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:28:17,235-Speed 2499.79 samples/sec Loss 4.4785 LearningRate 0.000820 Epoch: 7 Global Step: 153330 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:28:25,435-Speed 2497.88 samples/sec Loss 4.4404 LearningRate 0.000820 Epoch: 7 Global Step: 153340 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:28:33,645-Speed 2494.83 samples/sec Loss 4.5144 LearningRate 0.000820 Epoch: 7 Global Step: 153350 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:28:41,847-Speed 2497.37 samples/sec Loss 4.4533 LearningRate 0.000820 Epoch: 7 Global Step: 153360 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:28:49,991-Speed 2514.97 samples/sec Loss 4.4902 LearningRate 0.000820 Epoch: 7 Global Step: 153370 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:28:58,191-Speed 2498.30 samples/sec Loss 4.4233 LearningRate 0.000820 Epoch: 7 Global Step: 153380 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:29:06,401-Speed 2494.80 samples/sec Loss 4.4793 LearningRate 0.000820 Epoch: 7 Global Step: 153390 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:29:14,603-Speed 2497.41 samples/sec Loss 4.4212 LearningRate 0.000820 Epoch: 7 Global Step: 153400 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:29:22,801-Speed 2498.26 samples/sec Loss 4.4114 LearningRate 0.000820 Epoch: 7 Global Step: 153410 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:29:31,001-Speed 2498.22 samples/sec Loss 4.5018 LearningRate 0.000820 Epoch: 7 Global Step: 153420 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:29:39,144-Speed 2515.31 samples/sec Loss 4.5258 LearningRate 0.000820 Epoch: 7 Global Step: 153430 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:29:47,355-Speed 2494.60 samples/sec Loss 4.4593 LearningRate 0.000820 Epoch: 7 Global Step: 153440 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:29:55,566-Speed 2494.54 samples/sec Loss 4.4372 LearningRate 0.000820 Epoch: 7 Global Step: 153450 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:30:03,764-Speed 2498.73 samples/sec Loss 4.4385 LearningRate 0.000820 Epoch: 7 Global Step: 153460 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:30:11,964-Speed 2498.02 samples/sec Loss 4.4122 LearningRate 0.000820 Epoch: 7 Global Step: 153470 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:30:20,164-Speed 2497.82 samples/sec Loss 4.3275 LearningRate 0.000820 Epoch: 7 Global Step: 153480 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:30:28,328-Speed 2509.28 samples/sec Loss 4.4149 LearningRate 0.000820 Epoch: 7 Global Step: 153490 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:30:36,527-Speed 2498.34 samples/sec Loss 4.4904 LearningRate 0.000820 Epoch: 7 Global Step: 153500 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:30:44,731-Speed 2496.59 samples/sec Loss 4.4212 LearningRate 0.000820 Epoch: 7 Global Step: 153510 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:30:52,943-Speed 2494.55 samples/sec Loss 4.4548 LearningRate 0.000820 Epoch: 7 Global Step: 153520 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:31:01,141-Speed 2498.71 samples/sec Loss 4.4925 LearningRate 0.000820 Epoch: 7 Global Step: 153530 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:31:09,346-Speed 2496.38 samples/sec Loss 4.5093 LearningRate 0.000820 Epoch: 7 Global Step: 153540 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:31:17,491-Speed 2514.66 samples/sec Loss 4.4602 LearningRate 0.000820 Epoch: 7 Global Step: 153550 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:31:25,690-Speed 2498.19 samples/sec Loss 4.3660 LearningRate 0.000820 Epoch: 7 Global Step: 153560 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:31:33,897-Speed 2496.03 samples/sec Loss 4.3991 LearningRate 0.000820 Epoch: 7 Global Step: 153570 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:31:42,098-Speed 2497.73 samples/sec Loss 4.4837 LearningRate 0.000820 Epoch: 7 Global Step: 153580 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:31:50,300-Speed 2497.46 samples/sec Loss 4.4528 LearningRate 0.000820 Epoch: 7 Global Step: 153590 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:31:58,500-Speed 2497.88 samples/sec Loss 4.4359 LearningRate 0.000820 Epoch: 7 Global Step: 153600 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:32:06,656-Speed 2511.45 samples/sec Loss 4.4212 LearningRate 0.000820 Epoch: 7 Global Step: 153610 Fp16 Grad Scale: 32768 Required: 154 hours
Training: 2022-07-07 01:32:14,858-Speed 2497.35 samples/sec Loss 4.4102
LearningRate 0.000820 Epoch: 7 Global Step: 153620 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:32:23,061-Speed 2496.83 samples/sec Loss 4.4072 LearningRate 0.000820 Epoch: 7 Global Step: 153630 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:32:31,260-Speed 2498.16 samples/sec Loss 4.4093 LearningRate 0.000820 Epoch: 7 Global Step: 153640 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:32:39,460-Speed 2498.09 samples/sec Loss 4.4518 LearningRate 0.000820 Epoch: 7 Global Step: 153650 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:32:47,660-Speed 2498.26 samples/sec Loss 4.3826 LearningRate 0.000820 Epoch: 7 Global Step: 153660 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:32:55,808-Speed 2513.67 samples/sec Loss 4.3542 LearningRate 0.000820 Epoch: 7 Global Step: 153670 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:33:04,005-Speed 2498.86 samples/sec Loss 4.4171 LearningRate 0.000820 Epoch: 7 Global Step: 153680 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:33:12,204-Speed 2498.55 samples/sec Loss 4.3897 LearningRate 0.000819 Epoch: 7 Global Step: 153690 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:33:20,413-Speed 2495.31 samples/sec Loss 4.4912 LearningRate 0.000819 Epoch: 7 Global Step: 153700 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:33:28,614-Speed 2497.48 samples/sec Loss 4.3976 LearningRate 0.000819 Epoch: 7 Global Step: 153710 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:33:36,812-Speed 2498.56 samples/sec Loss 4.3246 LearningRate 0.000819 Epoch: 7 Global Step: 153720 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:33:44,956-Speed 2515.04 samples/sec Loss 4.3826 LearningRate 0.000819 Epoch: 7 Global Step: 153730 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:33:53,159-Speed 2496.99 samples/sec Loss 4.4172 
LearningRate 0.000819 Epoch: 7 Global Step: 153740 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:34:01,360-Speed 2498.11 samples/sec Loss 4.4668 LearningRate 0.000819 Epoch: 7 Global Step: 153750 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:34:09,556-Speed 2499.21 samples/sec Loss 4.5074 LearningRate 0.000819 Epoch: 7 Global Step: 153760 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:34:17,760-Speed 2497.37 samples/sec Loss 4.4631 LearningRate 0.000819 Epoch: 7 Global Step: 153770 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:34:25,965-Speed 2496.31 samples/sec Loss 4.4640 LearningRate 0.000819 Epoch: 7 Global Step: 153780 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:34:34,112-Speed 2514.15 samples/sec Loss 4.5255 LearningRate 0.000819 Epoch: 7 Global Step: 153790 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:34:42,309-Speed 2498.95 samples/sec Loss 4.4219 LearningRate 0.000819 Epoch: 7 Global Step: 153800 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:34:50,505-Speed 2499.24 samples/sec Loss 4.4423 LearningRate 0.000819 Epoch: 7 Global Step: 153810 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:34:58,703-Speed 2498.42 samples/sec Loss 4.4646 LearningRate 0.000819 Epoch: 7 Global Step: 153820 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:35:06,904-Speed 2497.68 samples/sec Loss 4.6006 LearningRate 0.000819 Epoch: 7 Global Step: 153830 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:35:15,103-Speed 2498.15 samples/sec Loss 4.4779 LearningRate 0.000819 Epoch: 7 Global Step: 153840 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:35:23,255-Speed 2512.95 samples/sec Loss 4.5603 LearningRate 0.000819 Epoch: 7 Global Step: 153850 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:35:31,454-Speed 2498.04 samples/sec Loss 4.4847 
LearningRate 0.000819 Epoch: 7 Global Step: 153860 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:35:39,652-Speed 2498.41 samples/sec Loss 4.4499 LearningRate 0.000819 Epoch: 7 Global Step: 153870 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:35:47,857-Speed 2496.68 samples/sec Loss 4.4748 LearningRate 0.000819 Epoch: 7 Global Step: 153880 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:35:56,063-Speed 2496.15 samples/sec Loss 4.4443 LearningRate 0.000819 Epoch: 7 Global Step: 153890 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:36:04,265-Speed 2497.38 samples/sec Loss 4.4547 LearningRate 0.000819 Epoch: 7 Global Step: 153900 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:36:12,409-Speed 2515.09 samples/sec Loss 4.4550 LearningRate 0.000819 Epoch: 7 Global Step: 153910 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:36:20,610-Speed 2497.68 samples/sec Loss 4.4140 LearningRate 0.000819 Epoch: 7 Global Step: 153920 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:36:28,811-Speed 2497.71 samples/sec Loss 4.4213 LearningRate 0.000819 Epoch: 7 Global Step: 153930 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:36:37,008-Speed 2498.88 samples/sec Loss 4.3717 LearningRate 0.000819 Epoch: 7 Global Step: 153940 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:36:45,207-Speed 2498.26 samples/sec Loss 4.3870 LearningRate 0.000819 Epoch: 7 Global Step: 153950 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:36:53,408-Speed 2497.62 samples/sec Loss 4.4211 LearningRate 0.000819 Epoch: 7 Global Step: 153960 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:37:01,553-Speed 2514.74 samples/sec Loss 4.4871 LearningRate 0.000819 Epoch: 7 Global Step: 153970 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:37:09,754-Speed 2497.70 samples/sec Loss 4.4638 
LearningRate 0.000819 Epoch: 7 Global Step: 153980 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:37:17,955-Speed 2497.63 samples/sec Loss 4.5002 LearningRate 0.000819 Epoch: 7 Global Step: 153990 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:37:26,152-Speed 2499.13 samples/sec Loss 4.4872 LearningRate 0.000819 Epoch: 7 Global Step: 154000 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:37:34,353-Speed 2497.49 samples/sec Loss 4.4824 LearningRate 0.000819 Epoch: 7 Global Step: 154010 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:37:42,554-Speed 2497.83 samples/sec Loss 4.4233 LearningRate 0.000819 Epoch: 7 Global Step: 154020 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:37:50,696-Speed 2515.61 samples/sec Loss 4.3795 LearningRate 0.000819 Epoch: 7 Global Step: 154030 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:37:58,889-Speed 2500.08 samples/sec Loss 4.4693 LearningRate 0.000819 Epoch: 7 Global Step: 154040 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:38:07,087-Speed 2498.65 samples/sec Loss 4.4043 LearningRate 0.000819 Epoch: 7 Global Step: 154050 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:38:15,284-Speed 2498.99 samples/sec Loss 4.4864 LearningRate 0.000819 Epoch: 7 Global Step: 154060 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:38:23,486-Speed 2497.41 samples/sec Loss 4.4448 LearningRate 0.000819 Epoch: 7 Global Step: 154070 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:38:31,686-Speed 2497.83 samples/sec Loss 4.4994 LearningRate 0.000819 Epoch: 7 Global Step: 154080 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:38:39,843-Speed 2511.27 samples/sec Loss 4.4424 LearningRate 0.000819 Epoch: 7 Global Step: 154090 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:38:48,039-Speed 2499.29 samples/sec Loss 4.3765 
LearningRate 0.000819 Epoch: 7 Global Step: 154100 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:38:56,237-Speed 2498.41 samples/sec Loss 4.5134 LearningRate 0.000818 Epoch: 7 Global Step: 154110 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:39:04,430-Speed 2500.06 samples/sec Loss 4.4544 LearningRate 0.000818 Epoch: 7 Global Step: 154120 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:39:12,630-Speed 2498.06 samples/sec Loss 4.4452 LearningRate 0.000818 Epoch: 7 Global Step: 154130 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:39:20,830-Speed 2498.05 samples/sec Loss 4.3820 LearningRate 0.000818 Epoch: 7 Global Step: 154140 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:39:28,973-Speed 2515.23 samples/sec Loss 4.4868 LearningRate 0.000818 Epoch: 7 Global Step: 154150 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:39:37,175-Speed 2497.47 samples/sec Loss 4.5208 LearningRate 0.000818 Epoch: 7 Global Step: 154160 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:39:45,374-Speed 2498.18 samples/sec Loss 4.3424 LearningRate 0.000818 Epoch: 7 Global Step: 154170 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:39:53,578-Speed 2496.72 samples/sec Loss 4.4275 LearningRate 0.000818 Epoch: 7 Global Step: 154180 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:40:01,779-Speed 2497.73 samples/sec Loss 4.4324 LearningRate 0.000818 Epoch: 7 Global Step: 154190 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:40:10,013-Speed 2487.96 samples/sec Loss 4.4350 LearningRate 0.000818 Epoch: 7 Global Step: 154200 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:40:18,154-Speed 2515.87 samples/sec Loss 4.3818 LearningRate 0.000818 Epoch: 7 Global Step: 154210 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:40:26,308-Speed 2512.12 samples/sec Loss 4.4745 
LearningRate 0.000818 Epoch: 7 Global Step: 154220 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:40:34,508-Speed 2498.05 samples/sec Loss 4.3616 LearningRate 0.000818 Epoch: 7 Global Step: 154230 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:40:42,712-Speed 2496.60 samples/sec Loss 4.4475 LearningRate 0.000818 Epoch: 7 Global Step: 154240 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:40:50,910-Speed 2498.58 samples/sec Loss 4.4012 LearningRate 0.000818 Epoch: 7 Global Step: 154250 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:40:59,107-Speed 2498.97 samples/sec Loss 4.3116 LearningRate 0.000818 Epoch: 7 Global Step: 154260 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:41:07,248-Speed 2516.09 samples/sec Loss 4.4570 LearningRate 0.000818 Epoch: 7 Global Step: 154270 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:41:15,456-Speed 2495.55 samples/sec Loss 4.4499 LearningRate 0.000818 Epoch: 7 Global Step: 154280 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:41:23,658-Speed 2497.32 samples/sec Loss 4.4417 LearningRate 0.000818 Epoch: 7 Global Step: 154290 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:41:31,861-Speed 2497.21 samples/sec Loss 4.4204 LearningRate 0.000818 Epoch: 7 Global Step: 154300 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:41:40,055-Speed 2499.92 samples/sec Loss 4.4994 LearningRate 0.000818 Epoch: 7 Global Step: 154310 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:41:48,261-Speed 2496.11 samples/sec Loss 4.4190 LearningRate 0.000818 Epoch: 7 Global Step: 154320 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:41:56,404-Speed 2515.39 samples/sec Loss 4.5339 LearningRate 0.000818 Epoch: 7 Global Step: 154330 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:42:04,601-Speed 2498.85 samples/sec Loss 4.4835 
LearningRate 0.000818 Epoch: 7 Global Step: 154340 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:42:12,804-Speed 2497.32 samples/sec Loss 4.4923 LearningRate 0.000818 Epoch: 7 Global Step: 154350 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:42:20,998-Speed 2499.74 samples/sec Loss 4.4745 LearningRate 0.000818 Epoch: 7 Global Step: 154360 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:42:29,193-Speed 2499.21 samples/sec Loss 4.5559 LearningRate 0.000818 Epoch: 7 Global Step: 154370 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:42:37,391-Speed 2498.87 samples/sec Loss 4.4427 LearningRate 0.000818 Epoch: 7 Global Step: 154380 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:42:45,534-Speed 2515.26 samples/sec Loss 4.4557 LearningRate 0.000818 Epoch: 7 Global Step: 154390 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:42:53,732-Speed 2498.75 samples/sec Loss 4.4677 LearningRate 0.000818 Epoch: 7 Global Step: 154400 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:43:01,933-Speed 2497.61 samples/sec Loss 4.4757 LearningRate 0.000818 Epoch: 7 Global Step: 154410 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:43:10,141-Speed 2495.66 samples/sec Loss 4.4913 LearningRate 0.000818 Epoch: 7 Global Step: 154420 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:43:18,337-Speed 2498.81 samples/sec Loss 4.4002 LearningRate 0.000818 Epoch: 7 Global Step: 154430 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:43:26,534-Speed 2499.12 samples/sec Loss 4.4725 LearningRate 0.000818 Epoch: 7 Global Step: 154440 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:43:34,678-Speed 2515.51 samples/sec Loss 4.4157 LearningRate 0.000818 Epoch: 7 Global Step: 154450 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:43:42,878-Speed 2498.17 samples/sec Loss 4.4546 
LearningRate 0.000818 Epoch: 7 Global Step: 154460 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:43:51,075-Speed 2498.91 samples/sec Loss 4.4014 LearningRate 0.000818 Epoch: 7 Global Step: 154470 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:43:59,272-Speed 2498.79 samples/sec Loss 4.4526 LearningRate 0.000818 Epoch: 7 Global Step: 154480 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:44:07,468-Speed 2498.98 samples/sec Loss 4.4180 LearningRate 0.000818 Epoch: 7 Global Step: 154490 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:44:15,667-Speed 2498.56 samples/sec Loss 4.4492 LearningRate 0.000818 Epoch: 7 Global Step: 154500 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:44:23,812-Speed 2514.76 samples/sec Loss 4.4083 LearningRate 0.000818 Epoch: 7 Global Step: 154510 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:44:32,012-Speed 2497.86 samples/sec Loss 4.4516 LearningRate 0.000817 Epoch: 7 Global Step: 154520 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:44:40,222-Speed 2495.03 samples/sec Loss 4.4136 LearningRate 0.000817 Epoch: 7 Global Step: 154530 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:44:48,423-Speed 2497.83 samples/sec Loss 4.3582 LearningRate 0.000817 Epoch: 7 Global Step: 154540 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:44:56,625-Speed 2497.25 samples/sec Loss 4.4396 LearningRate 0.000817 Epoch: 7 Global Step: 154550 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:45:04,825-Speed 2497.99 samples/sec Loss 4.3929 LearningRate 0.000817 Epoch: 7 Global Step: 154560 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:45:12,971-Speed 2514.58 samples/sec Loss 4.5364 LearningRate 0.000817 Epoch: 7 Global Step: 154570 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:45:21,168-Speed 2498.97 samples/sec Loss 4.4251 
LearningRate 0.000817 Epoch: 7 Global Step: 154580 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:45:29,366-Speed 2498.38 samples/sec Loss 4.4560 LearningRate 0.000817 Epoch: 7 Global Step: 154590 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:45:37,568-Speed 2497.25 samples/sec Loss 4.4436 LearningRate 0.000817 Epoch: 7 Global Step: 154600 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:45:45,768-Speed 2498.17 samples/sec Loss 4.4272 LearningRate 0.000817 Epoch: 7 Global Step: 154610 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:45:53,972-Speed 2496.53 samples/sec Loss 4.4254 LearningRate 0.000817 Epoch: 7 Global Step: 154620 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:46:02,119-Speed 2514.38 samples/sec Loss 4.3488 LearningRate 0.000817 Epoch: 7 Global Step: 154630 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:46:10,321-Speed 2497.24 samples/sec Loss 4.3402 LearningRate 0.000817 Epoch: 7 Global Step: 154640 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:46:18,524-Speed 2496.92 samples/sec Loss 4.3603 LearningRate 0.000817 Epoch: 7 Global Step: 154650 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:46:26,721-Speed 2499.00 samples/sec Loss 4.3144 LearningRate 0.000817 Epoch: 7 Global Step: 154660 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:46:34,924-Speed 2497.49 samples/sec Loss 4.4486 LearningRate 0.000817 Epoch: 7 Global Step: 154670 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:46:43,122-Speed 2498.35 samples/sec Loss 4.3878 LearningRate 0.000817 Epoch: 7 Global Step: 154680 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:46:51,266-Speed 2515.74 samples/sec Loss 4.4134 LearningRate 0.000817 Epoch: 7 Global Step: 154690 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:46:59,462-Speed 2499.04 samples/sec Loss 4.3957 
LearningRate 0.000817 Epoch: 7 Global Step: 154700 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:47:07,657-Speed 2499.59 samples/sec Loss 4.3933 LearningRate 0.000817 Epoch: 7 Global Step: 154710 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:47:15,853-Speed 2499.29 samples/sec Loss 4.3006 LearningRate 0.000817 Epoch: 7 Global Step: 154720 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:47:24,051-Speed 2498.44 samples/sec Loss 4.3952 LearningRate 0.000817 Epoch: 7 Global Step: 154730 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:47:32,262-Speed 2494.87 samples/sec Loss 4.4324 LearningRate 0.000817 Epoch: 7 Global Step: 154740 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:47:40,408-Speed 2514.37 samples/sec Loss 4.4645 LearningRate 0.000817 Epoch: 7 Global Step: 154750 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:47:48,607-Speed 2498.52 samples/sec Loss 4.4340 LearningRate 0.000817 Epoch: 7 Global Step: 154760 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:47:56,802-Speed 2499.30 samples/sec Loss 4.4116 LearningRate 0.000817 Epoch: 7 Global Step: 154770 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:48:04,996-Speed 2499.79 samples/sec Loss 4.4639 LearningRate 0.000817 Epoch: 7 Global Step: 154780 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:48:13,192-Speed 2499.37 samples/sec Loss 4.4956 LearningRate 0.000817 Epoch: 7 Global Step: 154790 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:48:21,390-Speed 2498.51 samples/sec Loss 4.3723 LearningRate 0.000817 Epoch: 7 Global Step: 154800 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:48:29,537-Speed 2514.48 samples/sec Loss 4.4296 LearningRate 0.000817 Epoch: 7 Global Step: 154810 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:48:37,737-Speed 2497.93 samples/sec Loss 4.4385 
LearningRate 0.000817 Epoch: 7 Global Step: 154820 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:48:45,935-Speed 2498.55 samples/sec Loss 4.4026 LearningRate 0.000817 Epoch: 7 Global Step: 154830 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:48:54,137-Speed 2497.20 samples/sec Loss 4.4565 LearningRate 0.000817 Epoch: 7 Global Step: 154840 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:49:02,338-Speed 2497.70 samples/sec Loss 4.3850 LearningRate 0.000817 Epoch: 7 Global Step: 154850 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:49:10,537-Speed 2498.35 samples/sec Loss 4.4025 LearningRate 0.000817 Epoch: 7 Global Step: 154860 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:49:18,692-Speed 2511.91 samples/sec Loss 4.5020 LearningRate 0.000817 Epoch: 7 Global Step: 154870 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:49:26,888-Speed 2499.04 samples/sec Loss 4.4762 LearningRate 0.000817 Epoch: 7 Global Step: 154880 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:49:35,087-Speed 2498.44 samples/sec Loss 4.4811 LearningRate 0.000817 Epoch: 7 Global Step: 154890 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:49:43,290-Speed 2497.02 samples/sec Loss 4.5743 LearningRate 0.000817 Epoch: 7 Global Step: 154900 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:49:51,492-Speed 2497.47 samples/sec Loss 4.5085 LearningRate 0.000817 Epoch: 7 Global Step: 154910 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:49:59,694-Speed 2497.42 samples/sec Loss 4.5202 LearningRate 0.000817 Epoch: 7 Global Step: 154920 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:50:07,840-Speed 2514.37 samples/sec Loss 4.4869 LearningRate 0.000816 Epoch: 7 Global Step: 154930 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:50:16,043-Speed 2497.07 samples/sec Loss 4.4176 
LearningRate 0.000816 Epoch: 7 Global Step: 154940 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:50:24,246-Speed 2496.98 samples/sec Loss 4.4471 LearningRate 0.000816 Epoch: 7 Global Step: 154950 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:50:32,458-Speed 2494.26 samples/sec Loss 4.4762 LearningRate 0.000816 Epoch: 7 Global Step: 154960 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:50:40,659-Speed 2497.69 samples/sec Loss 4.4084 LearningRate 0.000816 Epoch: 7 Global Step: 154970 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:50:48,859-Speed 2497.69 samples/sec Loss 4.4755 LearningRate 0.000816 Epoch: 7 Global Step: 154980 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:50:57,008-Speed 2513.78 samples/sec Loss 4.4152 LearningRate 0.000816 Epoch: 7 Global Step: 154990 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:51:05,206-Speed 2498.68 samples/sec Loss 4.4082 LearningRate 0.000816 Epoch: 7 Global Step: 155000 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:51:13,403-Speed 2498.69 samples/sec Loss 4.4246 LearningRate 0.000816 Epoch: 7 Global Step: 155010 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:51:21,603-Speed 2498.10 samples/sec Loss 4.3907 LearningRate 0.000816 Epoch: 7 Global Step: 155020 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:51:29,809-Speed 2496.50 samples/sec Loss 4.4248 LearningRate 0.000816 Epoch: 7 Global Step: 155030 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:51:38,012-Speed 2497.10 samples/sec Loss 4.3383 LearningRate 0.000816 Epoch: 7 Global Step: 155040 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:51:46,170-Speed 2510.72 samples/sec Loss 4.4183 LearningRate 0.000816 Epoch: 7 Global Step: 155050 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:51:54,367-Speed 2498.80 samples/sec Loss 4.4801 
LearningRate 0.000816 Epoch: 7 Global Step: 155060 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:52:02,578-Speed 2494.62 samples/sec Loss 4.4481 LearningRate 0.000816 Epoch: 7 Global Step: 155070 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:52:10,788-Speed 2494.89 samples/sec Loss 4.4240 LearningRate 0.000816 Epoch: 7 Global Step: 155080 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:52:18,986-Speed 2498.75 samples/sec Loss 4.4440 LearningRate 0.000816 Epoch: 7 Global Step: 155090 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:52:27,184-Speed 2498.55 samples/sec Loss 4.4193 LearningRate 0.000816 Epoch: 7 Global Step: 155100 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:52:35,329-Speed 2514.53 samples/sec Loss 4.3896 LearningRate 0.000816 Epoch: 7 Global Step: 155110 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:52:43,526-Speed 2498.84 samples/sec Loss 4.4552 LearningRate 0.000816 Epoch: 7 Global Step: 155120 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:52:51,722-Speed 2499.48 samples/sec Loss 4.3644 LearningRate 0.000816 Epoch: 7 Global Step: 155130 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:52:59,918-Speed 2499.13 samples/sec Loss 4.3637 LearningRate 0.000816 Epoch: 7 Global Step: 155140 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:53:08,131-Speed 2494.04 samples/sec Loss 4.4369 LearningRate 0.000816 Epoch: 7 Global Step: 155150 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:53:16,344-Speed 2494.17 samples/sec Loss 4.2774 LearningRate 0.000816 Epoch: 7 Global Step: 155160 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:53:24,490-Speed 2514.55 samples/sec Loss 4.4694 LearningRate 0.000816 Epoch: 7 Global Step: 155170 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:53:32,685-Speed 2499.38 samples/sec Loss 4.4102 
LearningRate 0.000816 Epoch: 7 Global Step: 155180 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:53:40,887-Speed 2497.38 samples/sec Loss 4.3818 LearningRate 0.000816 Epoch: 7 Global Step: 155190 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:53:49,083-Speed 2499.40 samples/sec Loss 4.3676 LearningRate 0.000816 Epoch: 7 Global Step: 155200 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:53:57,279-Speed 2498.97 samples/sec Loss 4.3785 LearningRate 0.000816 Epoch: 7 Global Step: 155210 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:54:05,477-Speed 2498.67 samples/sec Loss 4.4343 LearningRate 0.000816 Epoch: 7 Global Step: 155220 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:54:13,622-Speed 2515.11 samples/sec Loss 4.4727 LearningRate 0.000816 Epoch: 7 Global Step: 155230 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:54:21,817-Speed 2499.37 samples/sec Loss 4.4254 LearningRate 0.000816 Epoch: 7 Global Step: 155240 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:54:30,015-Speed 2498.70 samples/sec Loss 4.4713 LearningRate 0.000816 Epoch: 7 Global Step: 155250 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:54:38,212-Speed 2498.74 samples/sec Loss 4.3611 LearningRate 0.000816 Epoch: 7 Global Step: 155260 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:54:46,410-Speed 2498.74 samples/sec Loss 4.4505 LearningRate 0.000816 Epoch: 7 Global Step: 155270 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:54:54,608-Speed 2498.55 samples/sec Loss 4.4877 LearningRate 0.000816 Epoch: 7 Global Step: 155280 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:55:02,752-Speed 2515.14 samples/sec Loss 4.3908 LearningRate 0.000816 Epoch: 7 Global Step: 155290 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:55:10,950-Speed 2498.78 samples/sec Loss 4.4330 
LearningRate 0.000816 Epoch: 7 Global Step: 155300 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:55:19,148-Speed 2498.64 samples/sec Loss 4.4587 LearningRate 0.000816 Epoch: 7 Global Step: 155310 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:55:27,362-Speed 2493.64 samples/sec Loss 4.3821 LearningRate 0.000816 Epoch: 7 Global Step: 155320 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:55:35,561-Speed 2498.34 samples/sec Loss 4.3910 LearningRate 0.000816 Epoch: 7 Global Step: 155330 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:55:43,762-Speed 2497.80 samples/sec Loss 4.3972 LearningRate 0.000815 Epoch: 7 Global Step: 155340 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:55:51,908-Speed 2514.64 samples/sec Loss 4.4194 LearningRate 0.000815 Epoch: 7 Global Step: 155350 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:56:00,108-Speed 2497.96 samples/sec Loss 4.4046 LearningRate 0.000815 Epoch: 7 Global Step: 155360 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:56:09,760-Speed 2172.78 samples/sec Loss 4.3746 LearningRate 0.000815 Epoch: 7 Global Step: 155370 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:56:19,641-Speed 2161.62 samples/sec Loss 4.4154 LearningRate 0.000815 Epoch: 7 Global Step: 155380 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:56:27,834-Speed 2499.86 samples/sec Loss 4.4143 LearningRate 0.000815 Epoch: 7 Global Step: 155390 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:56:36,598-Speed 2501.43 samples/sec Loss 4.2984 LearningRate 0.000815 Epoch: 7 Global Step: 155400 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:56:45,399-Speed 2516.85 samples/sec Loss 4.4099 LearningRate 0.000815 Epoch: 7 Global Step: 155410 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 01:56:53,594-Speed 2499.56 samples/sec Loss 4.4279 
LearningRate 0.000815 Epoch: 7 Global Step: 155420 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:57:01,791-Speed 2498.84 samples/sec Loss 4.3837 LearningRate 0.000815 Epoch: 7 Global Step: 155430 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:57:09,991-Speed 2498.03 samples/sec Loss 4.4350 LearningRate 0.000815 Epoch: 7 Global Step: 155440 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:57:18,189-Speed 2498.72 samples/sec Loss 4.3638 LearningRate 0.000815 Epoch: 7 Global Step: 155450 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:57:26,391-Speed 2497.32 samples/sec Loss 4.3814 LearningRate 0.000815 Epoch: 7 Global Step: 155460 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:57:34,539-Speed 2514.04 samples/sec Loss 4.4408 LearningRate 0.000815 Epoch: 7 Global Step: 155470 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:57:42,748-Speed 2495.32 samples/sec Loss 4.3881 LearningRate 0.000815 Epoch: 7 Global Step: 155480 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:57:50,948-Speed 2498.10 samples/sec Loss 4.4173 LearningRate 0.000815 Epoch: 7 Global Step: 155490 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:57:59,146-Speed 2498.58 samples/sec Loss 4.4279 LearningRate 0.000815 Epoch: 7 Global Step: 155500 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:58:07,348-Speed 2497.21 samples/sec Loss 4.4746 LearningRate 0.000815 Epoch: 7 Global Step: 155510 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:58:15,554-Speed 2496.01 samples/sec Loss 4.3971 LearningRate 0.000815 Epoch: 7 Global Step: 155520 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:58:23,701-Speed 2514.41 samples/sec Loss 4.4120 LearningRate 0.000815 Epoch: 7 Global Step: 155530 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:58:31,902-Speed 2497.65 samples/sec Loss 4.3625 
LearningRate 0.000815 Epoch: 7 Global Step: 155540 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:58:40,100-Speed 2498.64 samples/sec Loss 4.4174 LearningRate 0.000815 Epoch: 7 Global Step: 155550 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:58:48,302-Speed 2497.26 samples/sec Loss 4.3271 LearningRate 0.000815 Epoch: 7 Global Step: 155560 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:58:56,500-Speed 2498.70 samples/sec Loss 4.3777 LearningRate 0.000815 Epoch: 7 Global Step: 155570 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:59:04,701-Speed 2497.55 samples/sec Loss 4.4164 LearningRate 0.000815 Epoch: 7 Global Step: 155580 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:59:12,846-Speed 2514.85 samples/sec Loss 4.4369 LearningRate 0.000815 Epoch: 7 Global Step: 155590 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:59:21,054-Speed 2495.56 samples/sec Loss 4.4184 LearningRate 0.000815 Epoch: 7 Global Step: 155600 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:59:29,255-Speed 2497.61 samples/sec Loss 4.3700 LearningRate 0.000815 Epoch: 7 Global Step: 155610 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:59:37,456-Speed 2497.80 samples/sec Loss 4.3465 LearningRate 0.000815 Epoch: 7 Global Step: 155620 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:59:45,653-Speed 2498.83 samples/sec Loss 4.3337 LearningRate 0.000815 Epoch: 7 Global Step: 155630 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 01:59:53,870-Speed 2493.14 samples/sec Loss 4.3849 LearningRate 0.000815 Epoch: 7 Global Step: 155640 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:00:02,018-Speed 2513.88 samples/sec Loss 4.4219 LearningRate 0.000815 Epoch: 7 Global Step: 155650 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:00:10,218-Speed 2498.13 samples/sec Loss 4.4018 
LearningRate 0.000815 Epoch: 7 Global Step: 155660 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:00:18,417-Speed 2498.27 samples/sec Loss 4.4945 LearningRate 0.000815 Epoch: 7 Global Step: 155670 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:00:26,635-Speed 2492.41 samples/sec Loss 4.4387 LearningRate 0.000815 Epoch: 7 Global Step: 155680 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:00:34,835-Speed 2498.00 samples/sec Loss 4.3381 LearningRate 0.000815 Epoch: 7 Global Step: 155690 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:00:43,039-Speed 2496.77 samples/sec Loss 4.2931 LearningRate 0.000815 Epoch: 7 Global Step: 155700 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:00:51,190-Speed 2513.22 samples/sec Loss 4.3924 LearningRate 0.000815 Epoch: 7 Global Step: 155710 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:00:59,389-Speed 2498.09 samples/sec Loss 4.3534 LearningRate 0.000815 Epoch: 7 Global Step: 155720 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:01:07,590-Speed 2497.71 samples/sec Loss 4.4187 LearningRate 0.000815 Epoch: 7 Global Step: 155730 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:01:15,790-Speed 2497.94 samples/sec Loss 4.4004 LearningRate 0.000815 Epoch: 7 Global Step: 155740 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:01:23,997-Speed 2495.98 samples/sec Loss 4.4907 LearningRate 0.000815 Epoch: 7 Global Step: 155750 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:01:32,196-Speed 2498.30 samples/sec Loss 4.3967 LearningRate 0.000814 Epoch: 7 Global Step: 155760 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:01:40,344-Speed 2514.07 samples/sec Loss 4.3190 LearningRate 0.000814 Epoch: 7 Global Step: 155770 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:01:48,544-Speed 2497.84 samples/sec Loss 4.4460 
LearningRate 0.000814 Epoch: 7 Global Step: 155780 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:01:56,741-Speed 2498.87 samples/sec Loss 4.4385 LearningRate 0.000814 Epoch: 7 Global Step: 155790 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:02:04,942-Speed 2498.17 samples/sec Loss 4.3823 LearningRate 0.000814 Epoch: 7 Global Step: 155800 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:02:13,143-Speed 2497.47 samples/sec Loss 4.4159 LearningRate 0.000814 Epoch: 7 Global Step: 155810 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:02:21,356-Speed 2493.99 samples/sec Loss 4.5224 LearningRate 0.000814 Epoch: 7 Global Step: 155820 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:02:29,502-Speed 2514.37 samples/sec Loss 4.2784 LearningRate 0.000814 Epoch: 7 Global Step: 155830 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:02:37,703-Speed 2497.93 samples/sec Loss 4.3939 LearningRate 0.000814 Epoch: 7 Global Step: 155840 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:02:45,901-Speed 2498.64 samples/sec Loss 4.3585 LearningRate 0.000814 Epoch: 7 Global Step: 155850 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:02:54,098-Speed 2498.74 samples/sec Loss 4.4248 LearningRate 0.000814 Epoch: 7 Global Step: 155860 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:03:02,303-Speed 2496.54 samples/sec Loss 4.4384 LearningRate 0.000814 Epoch: 7 Global Step: 155870 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:03:10,501-Speed 2498.85 samples/sec Loss 4.3758 LearningRate 0.000814 Epoch: 7 Global Step: 155880 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:03:18,647-Speed 2514.53 samples/sec Loss 4.4243 LearningRate 0.000814 Epoch: 7 Global Step: 155890 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:03:26,844-Speed 2498.73 samples/sec Loss 4.3600 
LearningRate 0.000814 Epoch: 7 Global Step: 155900 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:03:35,047-Speed 2497.32 samples/sec Loss 4.4200 LearningRate 0.000814 Epoch: 7 Global Step: 155910 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:03:43,246-Speed 2498.33 samples/sec Loss 4.3091 LearningRate 0.000814 Epoch: 7 Global Step: 155920 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:03:51,443-Speed 2498.79 samples/sec Loss 4.3487 LearningRate 0.000814 Epoch: 7 Global Step: 155930 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:03:59,639-Speed 2499.28 samples/sec Loss 4.3234 LearningRate 0.000814 Epoch: 7 Global Step: 155940 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:04:07,783-Speed 2515.28 samples/sec Loss 4.4076 LearningRate 0.000814 Epoch: 7 Global Step: 155950 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:04:15,979-Speed 2498.97 samples/sec Loss 4.3304 LearningRate 0.000814 Epoch: 7 Global Step: 155960 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:04:24,175-Speed 2499.33 samples/sec Loss 4.3069 LearningRate 0.000814 Epoch: 7 Global Step: 155970 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:04:32,371-Speed 2499.57 samples/sec Loss 4.4562 LearningRate 0.000814 Epoch: 7 Global Step: 155980 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:04:40,566-Speed 2499.41 samples/sec Loss 4.3956 LearningRate 0.000814 Epoch: 7 Global Step: 155990 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:04:48,760-Speed 2499.46 samples/sec Loss 4.4684 LearningRate 0.000814 Epoch: 7 Global Step: 156000 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:04:56,902-Speed 2515.93 samples/sec Loss 4.3911 LearningRate 0.000814 Epoch: 7 Global Step: 156010 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:05:05,098-Speed 2499.15 samples/sec Loss 4.4711 
LearningRate 0.000814 Epoch: 7 Global Step: 156020 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:05:13,292-Speed 2499.72 samples/sec Loss 4.5131 LearningRate 0.000814 Epoch: 7 Global Step: 156030 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:05:21,493-Speed 2497.63 samples/sec Loss 4.4649 LearningRate 0.000814 Epoch: 7 Global Step: 156040 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:05:29,689-Speed 2499.06 samples/sec Loss 4.3631 LearningRate 0.000814 Epoch: 7 Global Step: 156050 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:05:37,889-Speed 2498.46 samples/sec Loss 4.3997 LearningRate 0.000814 Epoch: 7 Global Step: 156060 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:05:46,031-Speed 2515.91 samples/sec Loss 4.4092 LearningRate 0.000814 Epoch: 7 Global Step: 156070 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:05:54,231-Speed 2497.98 samples/sec Loss 4.4240 LearningRate 0.000814 Epoch: 7 Global Step: 156080 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:06:02,429-Speed 2498.61 samples/sec Loss 4.4405 LearningRate 0.000814 Epoch: 7 Global Step: 156090 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:06:10,580-Speed 2513.01 samples/sec Loss 4.3968 LearningRate 0.000814 Epoch: 7 Global Step: 156100 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:06:18,775-Speed 2499.31 samples/sec Loss 4.4802 LearningRate 0.000814 Epoch: 7 Global Step: 156110 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:06:26,970-Speed 2499.50 samples/sec Loss 4.3665 LearningRate 0.000814 Epoch: 7 Global Step: 156120 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:06:35,117-Speed 2514.50 samples/sec Loss 4.4278 LearningRate 0.000814 Epoch: 7 Global Step: 156130 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:06:43,318-Speed 2497.71 samples/sec Loss 4.3698 
LearningRate 0.000814 Epoch: 7 Global Step: 156140 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:06:51,518-Speed 2498.02 samples/sec Loss 4.3956 LearningRate 0.000814 Epoch: 7 Global Step: 156150 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:06:59,716-Speed 2498.78 samples/sec Loss 4.3422 LearningRate 0.000814 Epoch: 7 Global Step: 156160 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:07:07,917-Speed 2497.82 samples/sec Loss 4.4017 LearningRate 0.000813 Epoch: 7 Global Step: 156170 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:07:16,116-Speed 2498.16 samples/sec Loss 4.3981 LearningRate 0.000813 Epoch: 7 Global Step: 156180 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:07:24,259-Speed 2515.49 samples/sec Loss 4.3108 LearningRate 0.000813 Epoch: 7 Global Step: 156190 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:07:32,461-Speed 2497.26 samples/sec Loss 4.3824 LearningRate 0.000813 Epoch: 7 Global Step: 156200 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:07:40,662-Speed 2497.78 samples/sec Loss 4.4496 LearningRate 0.000813 Epoch: 7 Global Step: 156210 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:07:48,860-Speed 2498.55 samples/sec Loss 4.4038 LearningRate 0.000813 Epoch: 7 Global Step: 156220 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:07:57,058-Speed 2498.47 samples/sec Loss 4.4832 LearningRate 0.000813 Epoch: 7 Global Step: 156230 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:08:05,257-Speed 2498.07 samples/sec Loss 4.4312 LearningRate 0.000813 Epoch: 7 Global Step: 156240 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:08:13,403-Speed 2514.82 samples/sec Loss 4.5496 LearningRate 0.000813 Epoch: 7 Global Step: 156250 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:08:21,601-Speed 2498.54 samples/sec Loss 4.5106 
LearningRate 0.000813 Epoch: 7 Global Step: 156260 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:08:29,799-Speed 2498.50 samples/sec Loss 4.3978 LearningRate 0.000813 Epoch: 7 Global Step: 156270 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:08:37,999-Speed 2497.96 samples/sec Loss 4.4610 LearningRate 0.000813 Epoch: 7 Global Step: 156280 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:08:46,215-Speed 2492.96 samples/sec Loss 4.4562 LearningRate 0.000813 Epoch: 7 Global Step: 156290 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:08:54,421-Speed 2496.23 samples/sec Loss 4.4360 LearningRate 0.000813 Epoch: 7 Global Step: 156300 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:09:02,575-Speed 2512.20 samples/sec Loss 4.4747 LearningRate 0.000813 Epoch: 7 Global Step: 156310 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:09:10,773-Speed 2498.60 samples/sec Loss 4.4466 LearningRate 0.000813 Epoch: 7 Global Step: 156320 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:09:18,971-Speed 2498.36 samples/sec Loss 4.4253 LearningRate 0.000813 Epoch: 7 Global Step: 156330 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:09:27,181-Speed 2494.88 samples/sec Loss 4.3618 LearningRate 0.000813 Epoch: 7 Global Step: 156340 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:09:35,379-Speed 2498.48 samples/sec Loss 4.3176 LearningRate 0.000813 Epoch: 7 Global Step: 156350 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:09:43,590-Speed 2494.88 samples/sec Loss 4.3367 LearningRate 0.000813 Epoch: 7 Global Step: 156360 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:09:51,735-Speed 2514.71 samples/sec Loss 4.3893 LearningRate 0.000813 Epoch: 7 Global Step: 156370 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:09:59,932-Speed 2499.23 samples/sec Loss 4.3896 
LearningRate 0.000813 Epoch: 7 Global Step: 156380 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:10:08,128-Speed 2499.06 samples/sec Loss 4.3619 LearningRate 0.000813 Epoch: 7 Global Step: 156390 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:10:16,325-Speed 2498.91 samples/sec Loss 4.4006 LearningRate 0.000813 Epoch: 7 Global Step: 156400 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:10:24,529-Speed 2496.48 samples/sec Loss 4.2957 LearningRate 0.000813 Epoch: 7 Global Step: 156410 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:10:32,735-Speed 2496.29 samples/sec Loss 4.3552 LearningRate 0.000813 Epoch: 7 Global Step: 156420 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:10:40,879-Speed 2515.30 samples/sec Loss 4.3821 LearningRate 0.000813 Epoch: 7 Global Step: 156430 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:10:49,088-Speed 2494.96 samples/sec Loss 4.4660 LearningRate 0.000813 Epoch: 7 Global Step: 156440 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:10:57,301-Speed 2494.33 samples/sec Loss 4.3922 LearningRate 0.000813 Epoch: 7 Global Step: 156450 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:11:05,499-Speed 2498.77 samples/sec Loss 4.3240 LearningRate 0.000813 Epoch: 7 Global Step: 156460 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:11:13,698-Speed 2498.28 samples/sec Loss 4.3423 LearningRate 0.000813 Epoch: 7 Global Step: 156470 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:11:21,898-Speed 2497.97 samples/sec Loss 4.3578 LearningRate 0.000813 Epoch: 7 Global Step: 156480 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:11:30,048-Speed 2513.31 samples/sec Loss 4.3102 LearningRate 0.000813 Epoch: 7 Global Step: 156490 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:11:38,249-Speed 2497.61 samples/sec Loss 4.4375 
LearningRate 0.000813 Epoch: 7 Global Step: 156500 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:11:46,452-Speed 2496.98 samples/sec Loss 4.3306 LearningRate 0.000813 Epoch: 7 Global Step: 156510 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:11:54,660-Speed 2495.50 samples/sec Loss 4.3672 LearningRate 0.000813 Epoch: 7 Global Step: 156520 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:12:02,866-Speed 2496.01 samples/sec Loss 4.4021 LearningRate 0.000813 Epoch: 7 Global Step: 156530 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:12:11,068-Speed 2497.57 samples/sec Loss 4.3648 LearningRate 0.000813 Epoch: 7 Global Step: 156540 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:12:19,218-Speed 2513.40 samples/sec Loss 4.4296 LearningRate 0.000813 Epoch: 7 Global Step: 156550 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:12:27,433-Speed 2493.46 samples/sec Loss 4.3532 LearningRate 0.000813 Epoch: 7 Global Step: 156560 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:12:35,639-Speed 2496.19 samples/sec Loss 4.4344 LearningRate 0.000813 Epoch: 7 Global Step: 156570 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:12:43,839-Speed 2497.57 samples/sec Loss 4.3968 LearningRate 0.000813 Epoch: 7 Global Step: 156580 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:12:52,039-Speed 2498.08 samples/sec Loss 4.3183 LearningRate 0.000812 Epoch: 7 Global Step: 156590 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:13:00,238-Speed 2498.31 samples/sec Loss 4.4301 LearningRate 0.000812 Epoch: 7 Global Step: 156600 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:13:08,387-Speed 2513.46 samples/sec Loss 4.3641 LearningRate 0.000812 Epoch: 7 Global Step: 156610 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:13:16,594-Speed 2495.90 samples/sec Loss 4.3594 
LearningRate 0.000812 Epoch: 7 Global Step: 156620 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:13:24,802-Speed 2495.66 samples/sec Loss 4.4112 LearningRate 0.000812 Epoch: 7 Global Step: 156630 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:13:33,006-Speed 2496.89 samples/sec Loss 4.3522 LearningRate 0.000812 Epoch: 7 Global Step: 156640 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:13:41,208-Speed 2497.41 samples/sec Loss 4.3340 LearningRate 0.000812 Epoch: 7 Global Step: 156650 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:13:49,405-Speed 2498.62 samples/sec Loss 4.3796 LearningRate 0.000812 Epoch: 7 Global Step: 156660 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:13:57,556-Speed 2512.88 samples/sec Loss 4.3977 LearningRate 0.000812 Epoch: 7 Global Step: 156670 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:14:05,966-Speed 2500.71 samples/sec Loss 4.4283 LearningRate 0.000812 Epoch: 7 Global Step: 156680 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:14:14,304-Speed 2500.33 samples/sec Loss 4.3458 LearningRate 0.000812 Epoch: 7 Global Step: 156690 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:14:22,506-Speed 2497.14 samples/sec Loss 4.3869 LearningRate 0.000812 Epoch: 7 Global Step: 156700 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:14:30,733-Speed 2496.85 samples/sec Loss 4.3123 LearningRate 0.000812 Epoch: 7 Global Step: 156710 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:14:41,710-Speed 1867.70 samples/sec Loss 4.2600 LearningRate 0.000812 Epoch: 7 Global Step: 156720 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:14:49,855-Speed 2514.77 samples/sec Loss 4.3546 LearningRate 0.000812 Epoch: 7 Global Step: 156730 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:14:58,074-Speed 2499.08 samples/sec Loss 4.3628 
LearningRate 0.000812 Epoch: 7 Global Step: 156740 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:15:06,358-Speed 2498.72 samples/sec Loss 4.3386 LearningRate 0.000812 Epoch: 7 Global Step: 156750 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:15:15,964-Speed 2497.95 samples/sec Loss 4.3415 LearningRate 0.000812 Epoch: 7 Global Step: 156760 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:15:24,180-Speed 2493.08 samples/sec Loss 4.3087 LearningRate 0.000812 Epoch: 7 Global Step: 156770 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:15:32,429-Speed 2495.37 samples/sec Loss 4.4306 LearningRate 0.000812 Epoch: 7 Global Step: 156780 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:15:40,594-Speed 2511.84 samples/sec Loss 4.4216 LearningRate 0.000812 Epoch: 7 Global Step: 156790 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:15:51,882-Speed 1823.43 samples/sec Loss 4.3409 LearningRate 0.000812 Epoch: 7 Global Step: 156800 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:16:00,092-Speed 2494.85 samples/sec Loss 4.4177 LearningRate 0.000812 Epoch: 7 Global Step: 156810 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:16:08,309-Speed 2498.05 samples/sec Loss 4.4659 LearningRate 0.000812 Epoch: 7 Global Step: 156820 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:16:19,695-Speed 2496.12 samples/sec Loss 4.3632 LearningRate 0.000812 Epoch: 7 Global Step: 156830 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:16:27,892-Speed 2498.92 samples/sec Loss 4.3493 LearningRate 0.000812 Epoch: 7 Global Step: 156840 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:16:36,150-Speed 2516.77 samples/sec Loss 4.3912 LearningRate 0.000812 Epoch: 7 Global Step: 156850 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:16:44,367-Speed 2493.19 samples/sec Loss 4.4543 
LearningRate 0.000812 Epoch: 7 Global Step: 156860 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:16:52,581-Speed 2500.88 samples/sec Loss 4.2884 LearningRate 0.000812 Epoch: 7 Global Step: 156870 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:17:00,772-Speed 2500.74 samples/sec Loss 4.3189 LearningRate 0.000812 Epoch: 7 Global Step: 156880 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:17:08,959-Speed 2501.86 samples/sec Loss 4.3125 LearningRate 0.000812 Epoch: 7 Global Step: 156890 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:17:17,187-Speed 2501.63 samples/sec Loss 4.3281 LearningRate 0.000812 Epoch: 7 Global Step: 156900 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:17:25,383-Speed 2516.49 samples/sec Loss 4.3029 LearningRate 0.000812 Epoch: 7 Global Step: 156910 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:17:33,582-Speed 2498.08 samples/sec Loss 4.3285 LearningRate 0.000812 Epoch: 7 Global Step: 156920 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:17:41,803-Speed 2500.54 samples/sec Loss 4.3332 LearningRate 0.000812 Epoch: 7 Global Step: 156930 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:17:50,015-Speed 2500.76 samples/sec Loss 4.3226 LearningRate 0.000812 Epoch: 7 Global Step: 156940 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:18:00,351-Speed 2502.41 samples/sec Loss 4.3471 LearningRate 0.000812 Epoch: 7 Global Step: 156950 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:18:08,551-Speed 2497.72 samples/sec Loss 4.2733 LearningRate 0.000812 Epoch: 7 Global Step: 156960 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:18:20,250-Speed 1759.58 samples/sec Loss 4.3064 LearningRate 0.000812 Epoch: 7 Global Step: 156970 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:18:28,509-Speed 2501.29 samples/sec Loss 4.3539 
LearningRate 0.000812 Epoch: 7 Global Step: 156980 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:18:37,261-Speed 2346.32 samples/sec Loss 4.3843 LearningRate 0.000812 Epoch: 7 Global Step: 156990 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:18:46,228-Speed 2284.22 samples/sec Loss 4.3755 LearningRate 0.000811 Epoch: 7 Global Step: 157000 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:18:54,444-Speed 2500.76 samples/sec Loss 4.3677 LearningRate 0.000811 Epoch: 7 Global Step: 157010 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:19:02,693-Speed 2498.81 samples/sec Loss 4.2711 LearningRate 0.000811 Epoch: 7 Global Step: 157020 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:19:13,812-Speed 1842.07 samples/sec Loss 4.3540 LearningRate 0.000811 Epoch: 7 Global Step: 157030 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:19:23,460-Speed 2502.50 samples/sec Loss 4.2988 LearningRate 0.000811 Epoch: 7 Global Step: 157040 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:19:31,655-Speed 2499.41 samples/sec Loss 4.3242 LearningRate 0.000811 Epoch: 7 Global Step: 157050 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:19:39,854-Speed 2498.46 samples/sec Loss 4.3788 LearningRate 0.000811 Epoch: 7 Global Step: 157060 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:19:48,053-Speed 2498.32 samples/sec Loss 4.3386 LearningRate 0.000811 Epoch: 7 Global Step: 157070 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:19:56,253-Speed 2497.68 samples/sec Loss 4.3539 LearningRate 0.000811 Epoch: 7 Global Step: 157080 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:20:04,403-Speed 2513.51 samples/sec Loss 4.3690 LearningRate 0.000811 Epoch: 7 Global Step: 157090 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:20:12,630-Speed 2489.90 samples/sec Loss 4.3449 
LearningRate 0.000811 Epoch: 7 Global Step: 157100 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:20:20,835-Speed 2496.50 samples/sec Loss 4.4021 LearningRate 0.000811 Epoch: 7 Global Step: 157110 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:20:29,041-Speed 2496.39 samples/sec Loss 4.4056 LearningRate 0.000811 Epoch: 7 Global Step: 157120 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:20:37,255-Speed 2493.80 samples/sec Loss 4.4347 LearningRate 0.000811 Epoch: 7 Global Step: 157130 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:20:45,459-Speed 2496.55 samples/sec Loss 4.4730 LearningRate 0.000811 Epoch: 7 Global Step: 157140 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:20:53,622-Speed 2509.58 samples/sec Loss 4.3456 LearningRate 0.000811 Epoch: 7 Global Step: 157150 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:21:01,820-Speed 2498.31 samples/sec Loss 4.4020 LearningRate 0.000811 Epoch: 7 Global Step: 157160 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:21:10,019-Speed 2498.28 samples/sec Loss 4.3537 LearningRate 0.000811 Epoch: 7 Global Step: 157170 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:21:18,217-Speed 2498.66 samples/sec Loss 4.4297 LearningRate 0.000811 Epoch: 7 Global Step: 157180 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:21:26,417-Speed 2498.03 samples/sec Loss 4.3532 LearningRate 0.000811 Epoch: 7 Global Step: 157190 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:21:34,619-Speed 2497.56 samples/sec Loss 4.3443 LearningRate 0.000811 Epoch: 7 Global Step: 157200 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:21:42,775-Speed 2511.27 samples/sec Loss 4.4560 LearningRate 0.000811 Epoch: 7 Global Step: 157210 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:21:50,971-Speed 2499.31 samples/sec Loss 4.4359 
LearningRate 0.000811 Epoch: 7 Global Step: 157220 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:21:59,178-Speed 2495.88 samples/sec Loss 4.3703 LearningRate 0.000811 Epoch: 7 Global Step: 157230 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:22:07,379-Speed 2497.69 samples/sec Loss 4.3921 LearningRate 0.000811 Epoch: 7 Global Step: 157240 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:22:15,583-Speed 2496.82 samples/sec Loss 4.4114 LearningRate 0.000811 Epoch: 7 Global Step: 157250 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:22:23,781-Speed 2498.42 samples/sec Loss 4.4176 LearningRate 0.000811 Epoch: 7 Global Step: 157260 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:22:31,925-Speed 2514.88 samples/sec Loss 4.4416 LearningRate 0.000811 Epoch: 7 Global Step: 157270 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:22:40,125-Speed 2498.22 samples/sec Loss 4.3561 LearningRate 0.000811 Epoch: 7 Global Step: 157280 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:22:48,325-Speed 2497.71 samples/sec Loss 4.3608 LearningRate 0.000811 Epoch: 7 Global Step: 157290 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:22:56,524-Speed 2498.25 samples/sec Loss 4.3622 LearningRate 0.000811 Epoch: 7 Global Step: 157300 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:23:04,721-Speed 2498.78 samples/sec Loss 4.2815 LearningRate 0.000811 Epoch: 7 Global Step: 157310 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:23:12,922-Speed 2497.76 samples/sec Loss 4.3515 LearningRate 0.000811 Epoch: 7 Global Step: 157320 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:23:21,071-Speed 2513.55 samples/sec Loss 4.3651 LearningRate 0.000811 Epoch: 7 Global Step: 157330 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:23:29,269-Speed 2498.65 samples/sec Loss 4.4398 
LearningRate 0.000811 Epoch: 7 Global Step: 157340 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:23:37,468-Speed 2498.29 samples/sec Loss 4.3902 LearningRate 0.000811 Epoch: 7 Global Step: 157350 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:23:45,666-Speed 2498.66 samples/sec Loss 4.3763 LearningRate 0.000811 Epoch: 7 Global Step: 157360 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:23:53,861-Speed 2499.28 samples/sec Loss 4.2989 LearningRate 0.000811 Epoch: 7 Global Step: 157370 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:24:02,061-Speed 2497.97 samples/sec Loss 4.4183 LearningRate 0.000811 Epoch: 7 Global Step: 157380 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:24:10,204-Speed 2515.71 samples/sec Loss 4.3654 LearningRate 0.000811 Epoch: 7 Global Step: 157390 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:24:18,400-Speed 2499.06 samples/sec Loss 4.3067 LearningRate 0.000811 Epoch: 7 Global Step: 157400 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:24:26,598-Speed 2498.70 samples/sec Loss 4.3193 LearningRate 0.000811 Epoch: 7 Global Step: 157410 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:24:34,806-Speed 2495.27 samples/sec Loss 4.3279 LearningRate 0.000810 Epoch: 7 Global Step: 157420 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:24:43,015-Speed 2495.18 samples/sec Loss 4.2848 LearningRate 0.000810 Epoch: 7 Global Step: 157430 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:24:51,215-Speed 2497.87 samples/sec Loss 4.4030 LearningRate 0.000810 Epoch: 7 Global Step: 157440 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:24:59,363-Speed 2514.08 samples/sec Loss 4.3320 LearningRate 0.000810 Epoch: 7 Global Step: 157450 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:25:07,560-Speed 2498.81 samples/sec Loss 4.3220 
LearningRate 0.000810 Epoch: 7 Global Step: 157460 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:25:15,761-Speed 2497.93 samples/sec Loss 4.3604 LearningRate 0.000810 Epoch: 7 Global Step: 157470 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:25:23,960-Speed 2498.31 samples/sec Loss 4.3839 LearningRate 0.000810 Epoch: 7 Global Step: 157480 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:25:32,177-Speed 2492.54 samples/sec Loss 4.4132 LearningRate 0.000810 Epoch: 7 Global Step: 157490 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:25:40,376-Speed 2498.45 samples/sec Loss 4.4272 LearningRate 0.000810 Epoch: 7 Global Step: 157500 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:25:48,525-Speed 2513.77 samples/sec Loss 4.3323 LearningRate 0.000810 Epoch: 7 Global Step: 157510 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:25:56,723-Speed 2498.49 samples/sec Loss 4.3768 LearningRate 0.000810 Epoch: 7 Global Step: 157520 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:26:04,919-Speed 2499.13 samples/sec Loss 4.3915 LearningRate 0.000810 Epoch: 7 Global Step: 157530 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:26:13,123-Speed 2497.02 samples/sec Loss 4.4351 LearningRate 0.000810 Epoch: 7 Global Step: 157540 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:26:21,324-Speed 2497.63 samples/sec Loss 4.3379 LearningRate 0.000810 Epoch: 7 Global Step: 157550 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:26:29,524-Speed 2497.73 samples/sec Loss 4.3667 LearningRate 0.000810 Epoch: 7 Global Step: 157560 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:26:37,669-Speed 2514.76 samples/sec Loss 4.3682 LearningRate 0.000810 Epoch: 7 Global Step: 157570 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:26:45,866-Speed 2499.32 samples/sec Loss 4.3150 
LearningRate 0.000810 Epoch: 7 Global Step: 157580 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:26:54,065-Speed 2498.30 samples/sec Loss 4.3213 LearningRate 0.000810 Epoch: 7 Global Step: 157590 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:27:02,266-Speed 2497.56 samples/sec Loss 4.3402 LearningRate 0.000810 Epoch: 7 Global Step: 157600 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:27:10,467-Speed 2497.62 samples/sec Loss 4.3547 LearningRate 0.000810 Epoch: 7 Global Step: 157610 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:27:18,679-Speed 2494.12 samples/sec Loss 4.3071 LearningRate 0.000810 Epoch: 7 Global Step: 157620 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:27:26,827-Speed 2513.87 samples/sec Loss 4.3600 LearningRate 0.000810 Epoch: 7 Global Step: 157630 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:27:35,027-Speed 2498.50 samples/sec Loss 4.3842 LearningRate 0.000810 Epoch: 7 Global Step: 157640 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:27:43,224-Speed 2498.63 samples/sec Loss 4.3111 LearningRate 0.000810 Epoch: 7 Global Step: 157650 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:27:51,420-Speed 2499.48 samples/sec Loss 4.3112 LearningRate 0.000810 Epoch: 7 Global Step: 157660 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:27:59,618-Speed 2498.23 samples/sec Loss 4.2966 LearningRate 0.000810 Epoch: 7 Global Step: 157670 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:28:07,816-Speed 2498.59 samples/sec Loss 4.3266 LearningRate 0.000810 Epoch: 7 Global Step: 157680 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:28:15,966-Speed 2513.18 samples/sec Loss 4.3753 LearningRate 0.000810 Epoch: 7 Global Step: 157690 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:28:24,163-Speed 2498.94 samples/sec Loss 4.2965 
LearningRate 0.000810 Epoch: 7 Global Step: 157700 Fp16 Grad Scale: 65536 Required: 154 hours Training: 2022-07-07 02:28:32,316-Speed 2512.60 samples/sec Loss 4.2664 LearningRate 0.000810 Epoch: 7 Global Step: 157710 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:28:40,515-Speed 2497.92 samples/sec Loss 4.3245 LearningRate 0.000810 Epoch: 7 Global Step: 157720 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:28:48,714-Speed 2498.86 samples/sec Loss 4.3445 LearningRate 0.000810 Epoch: 7 Global Step: 157730 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:28:56,921-Speed 2495.86 samples/sec Loss 4.3257 LearningRate 0.000810 Epoch: 7 Global Step: 157740 Fp16 Grad Scale: 32768 Required: 154 hours Training: 2022-07-07 02:29:05,070-Speed 2513.51 samples/sec Loss 4.3311 LearningRate 0.000810 Epoch: 7 Global Step: 157750 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:29:13,271-Speed 2497.61 samples/sec Loss 4.3544 LearningRate 0.000810 Epoch: 7 Global Step: 157760 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:29:21,471-Speed 2498.26 samples/sec Loss 4.3571 LearningRate 0.000810 Epoch: 7 Global Step: 157770 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:29:29,670-Speed 2498.06 samples/sec Loss 4.3792 LearningRate 0.000810 Epoch: 7 Global Step: 157780 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:29:37,874-Speed 2496.77 samples/sec Loss 4.3125 LearningRate 0.000810 Epoch: 7 Global Step: 157790 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:29:46,080-Speed 2495.98 samples/sec Loss 4.3069 LearningRate 0.000810 Epoch: 7 Global Step: 157800 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:29:54,229-Speed 2513.65 samples/sec Loss 4.3718 LearningRate 0.000810 Epoch: 7 Global Step: 157810 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:30:02,429-Speed 2498.01 samples/sec Loss 4.3099 
LearningRate 0.000810 Epoch: 7 Global Step: 157820 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:30:10,628-Speed 2498.06 samples/sec Loss 4.3811 LearningRate 0.000809 Epoch: 7 Global Step: 157830 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:30:18,829-Speed 2497.86 samples/sec Loss 4.4300 LearningRate 0.000809 Epoch: 7 Global Step: 157840 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:30:27,029-Speed 2497.83 samples/sec Loss 4.3488 LearningRate 0.000809 Epoch: 7 Global Step: 157850 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:30:35,235-Speed 2496.01 samples/sec Loss 4.3773 LearningRate 0.000809 Epoch: 7 Global Step: 157860 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:30:43,384-Speed 2513.71 samples/sec Loss 4.2979 LearningRate 0.000809 Epoch: 7 Global Step: 157870 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:30:52,120-Speed 2500.34 samples/sec Loss 4.4021 LearningRate 0.000809 Epoch: 7 Global Step: 157880 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:31:00,814-Speed 2501.08 samples/sec Loss 4.3323 LearningRate 0.000809 Epoch: 7 Global Step: 157890 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:31:09,010-Speed 2499.23 samples/sec Loss 4.4510 LearningRate 0.000809 Epoch: 7 Global Step: 157900 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:31:17,476-Speed 2501.08 samples/sec Loss 4.3607 LearningRate 0.000809 Epoch: 7 Global Step: 157910 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:31:25,673-Speed 2499.01 samples/sec Loss 4.4279 LearningRate 0.000809 Epoch: 7 Global Step: 157920 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:31:33,828-Speed 2511.73 samples/sec Loss 4.3856 LearningRate 0.000809 Epoch: 7 Global Step: 157930 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:31:42,024-Speed 2499.17 samples/sec Loss 4.3787 
LearningRate 0.000809 Epoch: 7 Global Step: 157940 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:31:50,225-Speed 2497.69 samples/sec Loss 4.3184 LearningRate 0.000809 Epoch: 7 Global Step: 157950 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:31:58,425-Speed 2498.12 samples/sec Loss 4.3322 LearningRate 0.000809 Epoch: 7 Global Step: 157960 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:32:06,625-Speed 2498.12 samples/sec Loss 4.3628 LearningRate 0.000809 Epoch: 7 Global Step: 157970 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:32:14,825-Speed 2497.96 samples/sec Loss 4.3628 LearningRate 0.000809 Epoch: 7 Global Step: 157980 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:32:22,969-Speed 2515.12 samples/sec Loss 4.4173 LearningRate 0.000809 Epoch: 7 Global Step: 157990 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:32:31,165-Speed 2499.66 samples/sec Loss 4.3632 LearningRate 0.000809 Epoch: 7 Global Step: 158000 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:32:39,361-Speed 2499.23 samples/sec Loss 4.2959 LearningRate 0.000809 Epoch: 7 Global Step: 158010 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:32:47,556-Speed 2499.22 samples/sec Loss 4.4150 LearningRate 0.000809 Epoch: 7 Global Step: 158020 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:32:55,755-Speed 2498.61 samples/sec Loss 4.3777 LearningRate 0.000809 Epoch: 7 Global Step: 158030 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:33:03,955-Speed 2497.93 samples/sec Loss 4.2945 LearningRate 0.000809 Epoch: 7 Global Step: 158040 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:33:12,098-Speed 2515.47 samples/sec Loss 4.3647 LearningRate 0.000809 Epoch: 7 Global Step: 158050 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:33:20,310-Speed 2494.53 samples/sec Loss 4.4277 
LearningRate 0.000809 Epoch: 7 Global Step: 158060 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:33:28,509-Speed 2498.07 samples/sec Loss 4.3982 LearningRate 0.000809 Epoch: 7 Global Step: 158070 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:33:36,703-Speed 2499.93 samples/sec Loss 4.3283 LearningRate 0.000809 Epoch: 7 Global Step: 158080 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:33:44,906-Speed 2496.96 samples/sec Loss 4.3364 LearningRate 0.000809 Epoch: 7 Global Step: 158090 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:33:53,118-Speed 2494.37 samples/sec Loss 4.3406 LearningRate 0.000809 Epoch: 7 Global Step: 158100 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:34:01,274-Speed 2511.64 samples/sec Loss 4.3166 LearningRate 0.000809 Epoch: 7 Global Step: 158110 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:34:09,478-Speed 2496.77 samples/sec Loss 4.3135 LearningRate 0.000809 Epoch: 7 Global Step: 158120 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:34:17,676-Speed 2498.41 samples/sec Loss 4.3589 LearningRate 0.000809 Epoch: 7 Global Step: 158130 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:34:25,876-Speed 2498.41 samples/sec Loss 4.3133 LearningRate 0.000809 Epoch: 7 Global Step: 158140 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:34:34,090-Speed 2493.97 samples/sec Loss 4.4577 LearningRate 0.000809 Epoch: 7 Global Step: 158150 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:34:42,288-Speed 2498.58 samples/sec Loss 4.2757 LearningRate 0.000809 Epoch: 7 Global Step: 158160 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:34:50,449-Speed 2509.77 samples/sec Loss 4.2876 LearningRate 0.000809 Epoch: 7 Global Step: 158170 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:34:58,647-Speed 2498.58 samples/sec Loss 4.3274 
LearningRate 0.000809 Epoch: 7 Global Step: 158180 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:35:06,847-Speed 2498.05 samples/sec Loss 4.3274 LearningRate 0.000809 Epoch: 7 Global Step: 158190 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:35:15,046-Speed 2498.21 samples/sec Loss 4.3411 LearningRate 0.000809 Epoch: 7 Global Step: 158200 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:35:23,246-Speed 2497.82 samples/sec Loss 4.3490 LearningRate 0.000809 Epoch: 7 Global Step: 158210 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:35:31,457-Speed 2494.55 samples/sec Loss 4.2950 LearningRate 0.000809 Epoch: 7 Global Step: 158220 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:35:39,602-Speed 2515.04 samples/sec Loss 4.2832 LearningRate 0.000809 Epoch: 7 Global Step: 158230 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:35:47,799-Speed 2498.92 samples/sec Loss 4.2882 LearningRate 0.000808 Epoch: 7 Global Step: 158240 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:35:55,997-Speed 2498.28 samples/sec Loss 4.2819 LearningRate 0.000808 Epoch: 7 Global Step: 158250 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:36:04,204-Speed 2496.00 samples/sec Loss 4.3023 LearningRate 0.000808 Epoch: 7 Global Step: 158260 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:36:12,402-Speed 2498.61 samples/sec Loss 4.3164 LearningRate 0.000808 Epoch: 7 Global Step: 158270 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:36:20,619-Speed 2492.74 samples/sec Loss 4.2987 LearningRate 0.000808 Epoch: 7 Global Step: 158280 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:36:28,762-Speed 2515.49 samples/sec Loss 4.3069 LearningRate 0.000808 Epoch: 7 Global Step: 158290 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:36:36,961-Speed 2498.38 samples/sec Loss 4.3747 
LearningRate 0.000808 Epoch: 7 Global Step: 158300 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:36:45,167-Speed 2496.31 samples/sec Loss 4.3821 LearningRate 0.000808 Epoch: 7 Global Step: 158310 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:36:53,369-Speed 2497.37 samples/sec Loss 4.3248 LearningRate 0.000808 Epoch: 7 Global Step: 158320 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:37:01,573-Speed 2496.54 samples/sec Loss 4.3219 LearningRate 0.000808 Epoch: 7 Global Step: 158330 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:37:09,779-Speed 2496.26 samples/sec Loss 4.4175 LearningRate 0.000808 Epoch: 7 Global Step: 158340 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:37:17,924-Speed 2514.89 samples/sec Loss 4.3391 LearningRate 0.000808 Epoch: 7 Global Step: 158350 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:37:26,121-Speed 2498.61 samples/sec Loss 4.2864 LearningRate 0.000808 Epoch: 7 Global Step: 158360 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:37:34,323-Speed 2497.70 samples/sec Loss 4.3528 LearningRate 0.000808 Epoch: 7 Global Step: 158370 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:37:42,533-Speed 2494.71 samples/sec Loss 4.3203 LearningRate 0.000808 Epoch: 7 Global Step: 158380 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:37:50,732-Speed 2498.38 samples/sec Loss 4.3990 LearningRate 0.000808 Epoch: 7 Global Step: 158390 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:37:58,957-Speed 2490.21 samples/sec Loss 4.3203 LearningRate 0.000808 Epoch: 7 Global Step: 158400 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:38:07,105-Speed 2513.94 samples/sec Loss 4.3693 LearningRate 0.000808 Epoch: 7 Global Step: 158410 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:38:15,307-Speed 2497.31 samples/sec Loss 4.3800 
LearningRate 0.000808 Epoch: 7 Global Step: 158420 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:38:23,507-Speed 2498.33 samples/sec Loss 4.3694 LearningRate 0.000808 Epoch: 7 Global Step: 158430 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:38:31,709-Speed 2497.39 samples/sec Loss 4.3973 LearningRate 0.000808 Epoch: 7 Global Step: 158440 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:38:39,911-Speed 2497.50 samples/sec Loss 4.4301 LearningRate 0.000808 Epoch: 7 Global Step: 158450 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:38:48,129-Speed 2492.39 samples/sec Loss 4.4252 LearningRate 0.000808 Epoch: 7 Global Step: 158460 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:38:56,281-Speed 2512.65 samples/sec Loss 4.3896 LearningRate 0.000808 Epoch: 7 Global Step: 158470 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:39:04,484-Speed 2496.69 samples/sec Loss 4.4304 LearningRate 0.000808 Epoch: 7 Global Step: 158480 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:39:12,691-Speed 2495.80 samples/sec Loss 4.3874 LearningRate 0.000808 Epoch: 7 Global Step: 158490 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:39:20,890-Speed 2498.34 samples/sec Loss 4.4636 LearningRate 0.000808 Epoch: 7 Global Step: 158500 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:39:29,090-Speed 2498.19 samples/sec Loss 4.3821 LearningRate 0.000808 Epoch: 7 Global Step: 158510 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:39:37,295-Speed 2496.45 samples/sec Loss 4.4420 LearningRate 0.000808 Epoch: 7 Global Step: 158520 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:39:45,441-Speed 2514.34 samples/sec Loss 4.4730 LearningRate 0.000808 Epoch: 7 Global Step: 158530 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:39:53,642-Speed 2497.60 samples/sec Loss 4.3927 
LearningRate 0.000808 Epoch: 7 Global Step: 158540 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:40:01,842-Speed 2498.02 samples/sec Loss 4.4437 LearningRate 0.000808 Epoch: 7 Global Step: 158550 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:40:10,040-Speed 2498.70 samples/sec Loss 4.4538 LearningRate 0.000808 Epoch: 7 Global Step: 158560 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:40:18,247-Speed 2495.88 samples/sec Loss 4.3651 LearningRate 0.000808 Epoch: 7 Global Step: 158570 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:40:26,442-Speed 2499.52 samples/sec Loss 4.3977 LearningRate 0.000808 Epoch: 7 Global Step: 158580 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:40:34,590-Speed 2513.91 samples/sec Loss 4.3981 LearningRate 0.000808 Epoch: 7 Global Step: 158590 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:40:42,785-Speed 2499.46 samples/sec Loss 4.4495 LearningRate 0.000808 Epoch: 7 Global Step: 158600 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:40:50,981-Speed 2499.24 samples/sec Loss 4.3838 LearningRate 0.000808 Epoch: 7 Global Step: 158610 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:40:59,180-Speed 2498.21 samples/sec Loss 4.3918 LearningRate 0.000808 Epoch: 7 Global Step: 158620 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:41:07,381-Speed 2497.76 samples/sec Loss 4.3362 LearningRate 0.000808 Epoch: 7 Global Step: 158630 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:41:15,578-Speed 2498.70 samples/sec Loss 4.3547 LearningRate 0.000808 Epoch: 7 Global Step: 158640 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:41:23,738-Speed 2510.45 samples/sec Loss 4.3612 LearningRate 0.000808 Epoch: 7 Global Step: 158650 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:41:31,940-Speed 2497.15 samples/sec Loss 4.2877 
LearningRate 0.000807 Epoch: 7 Global Step: 158660 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:41:40,141-Speed 2497.83 samples/sec Loss 4.3312 LearningRate 0.000807 Epoch: 7 Global Step: 158670 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:41:48,343-Speed 2497.30 samples/sec Loss 4.4150 LearningRate 0.000807 Epoch: 7 Global Step: 158680 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:41:56,552-Speed 2495.09 samples/sec Loss 4.4210 LearningRate 0.000807 Epoch: 7 Global Step: 158690 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:42:04,751-Speed 2498.28 samples/sec Loss 4.2939 LearningRate 0.000807 Epoch: 7 Global Step: 158700 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:42:12,903-Speed 2512.77 samples/sec Loss 4.3690 LearningRate 0.000807 Epoch: 7 Global Step: 158710 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:42:21,108-Speed 2496.49 samples/sec Loss 4.4336 LearningRate 0.000807 Epoch: 7 Global Step: 158720 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:42:29,303-Speed 2499.33 samples/sec Loss 4.3399 LearningRate 0.000807 Epoch: 7 Global Step: 158730 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:42:37,502-Speed 2498.41 samples/sec Loss 4.2950 LearningRate 0.000807 Epoch: 7 Global Step: 158740 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:42:45,698-Speed 2498.96 samples/sec Loss 4.2986 LearningRate 0.000807 Epoch: 7 Global Step: 158750 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:42:53,896-Speed 2498.94 samples/sec Loss 4.3488 LearningRate 0.000807 Epoch: 7 Global Step: 158760 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:43:02,052-Speed 2511.31 samples/sec Loss 4.2909 LearningRate 0.000807 Epoch: 7 Global Step: 158770 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:43:10,249-Speed 2498.77 samples/sec Loss 4.2191 
LearningRate 0.000807 Epoch: 7 Global Step: 158780 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:43:18,448-Speed 2498.58 samples/sec Loss 4.2715 LearningRate 0.000807 Epoch: 7 Global Step: 158790 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:43:26,643-Speed 2499.40 samples/sec Loss 4.4317 LearningRate 0.000807 Epoch: 7 Global Step: 158800 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:43:34,853-Speed 2495.11 samples/sec Loss 4.3401 LearningRate 0.000807 Epoch: 7 Global Step: 158810 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:43:43,066-Speed 2493.81 samples/sec Loss 4.2854 LearningRate 0.000807 Epoch: 7 Global Step: 158820 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:43:51,212-Speed 2514.46 samples/sec Loss 4.4929 LearningRate 0.000807 Epoch: 7 Global Step: 158830 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:43:59,410-Speed 2498.72 samples/sec Loss 4.3997 LearningRate 0.000807 Epoch: 7 Global Step: 158840 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:44:07,607-Speed 2498.85 samples/sec Loss 4.3071 LearningRate 0.000807 Epoch: 7 Global Step: 158850 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:44:15,806-Speed 2498.35 samples/sec Loss 4.3352 LearningRate 0.000807 Epoch: 7 Global Step: 158860 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:44:24,003-Speed 2498.96 samples/sec Loss 4.2993 LearningRate 0.000807 Epoch: 7 Global Step: 158870 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:44:32,200-Speed 2498.71 samples/sec Loss 4.3448 LearningRate 0.000807 Epoch: 7 Global Step: 158880 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:44:40,348-Speed 2513.97 samples/sec Loss 4.3409 LearningRate 0.000807 Epoch: 7 Global Step: 158890 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:44:48,543-Speed 2499.54 samples/sec Loss 4.3592 
LearningRate 0.000807 Epoch: 7 Global Step: 158900 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:44:56,740-Speed 2498.81 samples/sec Loss 4.4116 LearningRate 0.000807 Epoch: 7 Global Step: 158910 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 02:45:04,905-Speed 2509.01 samples/sec Loss 4.3766 LearningRate 0.000807 Epoch: 7 Global Step: 158920 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:45:13,103-Speed 2498.33 samples/sec Loss 4.4077 LearningRate 0.000807 Epoch: 7 Global Step: 158930 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:45:21,306-Speed 2497.17 samples/sec Loss 4.2806 LearningRate 0.000807 Epoch: 7 Global Step: 158940 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:45:29,452-Speed 2514.66 samples/sec Loss 4.3658 LearningRate 0.000807 Epoch: 7 Global Step: 158950 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:45:37,651-Speed 2498.36 samples/sec Loss 4.3544 LearningRate 0.000807 Epoch: 7 Global Step: 158960 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:45:45,857-Speed 2496.23 samples/sec Loss 4.3706 LearningRate 0.000807 Epoch: 7 Global Step: 158970 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:45:54,060-Speed 2496.89 samples/sec Loss 4.3173 LearningRate 0.000807 Epoch: 7 Global Step: 158980 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:46:02,264-Speed 2496.97 samples/sec Loss 4.3678 LearningRate 0.000807 Epoch: 7 Global Step: 158990 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:46:10,475-Speed 2494.41 samples/sec Loss 4.5044 LearningRate 0.000807 Epoch: 7 Global Step: 159000 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:46:18,622-Speed 2514.27 samples/sec Loss 4.3871 LearningRate 0.000807 Epoch: 7 Global Step: 159010 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:46:26,834-Speed 2494.44 samples/sec Loss 4.3846 
LearningRate 0.000807 Epoch: 7 Global Step: 159020 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:46:35,033-Speed 2498.17 samples/sec Loss 4.3844 LearningRate 0.000807 Epoch: 7 Global Step: 159030 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:46:43,230-Speed 2498.79 samples/sec Loss 4.4388 LearningRate 0.000807 Epoch: 7 Global Step: 159040 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:46:51,433-Speed 2497.21 samples/sec Loss 4.3954 LearningRate 0.000807 Epoch: 7 Global Step: 159050 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:46:59,632-Speed 2498.38 samples/sec Loss 4.2725 LearningRate 0.000807 Epoch: 7 Global Step: 159060 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:47:07,791-Speed 2510.38 samples/sec Loss 4.4877 LearningRate 0.000807 Epoch: 7 Global Step: 159070 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:47:15,995-Speed 2496.73 samples/sec Loss 4.3480 LearningRate 0.000806 Epoch: 7 Global Step: 159080 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:47:24,193-Speed 2498.74 samples/sec Loss 4.3805 LearningRate 0.000806 Epoch: 7 Global Step: 159090 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:47:32,393-Speed 2497.85 samples/sec Loss 4.4185 LearningRate 0.000806 Epoch: 7 Global Step: 159100 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:47:40,591-Speed 2498.40 samples/sec Loss 4.3217 LearningRate 0.000806 Epoch: 7 Global Step: 159110 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:47:48,793-Speed 2497.66 samples/sec Loss 4.3755 LearningRate 0.000806 Epoch: 7 Global Step: 159120 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:47:56,942-Speed 2513.73 samples/sec Loss 4.3664 LearningRate 0.000806 Epoch: 7 Global Step: 159130 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:48:05,142-Speed 2498.09 samples/sec Loss 4.3949 
LearningRate 0.000806 Epoch: 7 Global Step: 159140 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:48:13,341-Speed 2498.33 samples/sec Loss 4.2941 LearningRate 0.000806 Epoch: 7 Global Step: 159150 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:48:21,544-Speed 2496.96 samples/sec Loss 4.3075 LearningRate 0.000806 Epoch: 7 Global Step: 159160 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:48:29,745-Speed 2497.30 samples/sec Loss 4.2653 LearningRate 0.000806 Epoch: 7 Global Step: 159170 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:48:38,034-Speed 2471.40 samples/sec Loss 4.3022 LearningRate 0.000806 Epoch: 7 Global Step: 159180 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:48:46,183-Speed 2513.40 samples/sec Loss 4.2558 LearningRate 0.000806 Epoch: 7 Global Step: 159190 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:48:54,382-Speed 2498.13 samples/sec Loss 4.3395 LearningRate 0.000806 Epoch: 7 Global Step: 159200 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:49:02,583-Speed 2497.72 samples/sec Loss 4.3187 LearningRate 0.000806 Epoch: 7 Global Step: 159210 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:49:10,785-Speed 2497.38 samples/sec Loss 4.2335 LearningRate 0.000806 Epoch: 7 Global Step: 159220 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:49:18,986-Speed 2497.94 samples/sec Loss 4.3281 LearningRate 0.000806 Epoch: 7 Global Step: 159230 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:49:27,184-Speed 2498.32 samples/sec Loss 4.2867 LearningRate 0.000806 Epoch: 7 Global Step: 159240 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:49:35,334-Speed 2513.24 samples/sec Loss 4.2852 LearningRate 0.000806 Epoch: 7 Global Step: 159250 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:49:43,533-Speed 2498.33 samples/sec Loss 4.3061 
LearningRate 0.000806 Epoch: 7 Global Step: 159260 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:49:51,733-Speed 2498.07 samples/sec Loss 4.2497 LearningRate 0.000806 Epoch: 7 Global Step: 159270 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:49:59,939-Speed 2496.19 samples/sec Loss 4.3384 LearningRate 0.000806 Epoch: 7 Global Step: 159280 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:50:08,146-Speed 2495.93 samples/sec Loss 4.3558 LearningRate 0.000806 Epoch: 7 Global Step: 159290 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:50:16,344-Speed 2498.63 samples/sec Loss 4.3460 LearningRate 0.000806 Epoch: 7 Global Step: 159300 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:50:24,499-Speed 2511.65 samples/sec Loss 4.3029 LearningRate 0.000806 Epoch: 7 Global Step: 159310 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:50:32,699-Speed 2498.14 samples/sec Loss 4.3150 LearningRate 0.000806 Epoch: 7 Global Step: 159320 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:50:40,906-Speed 2495.60 samples/sec Loss 4.3445 LearningRate 0.000806 Epoch: 7 Global Step: 159330 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:50:49,105-Speed 2498.48 samples/sec Loss 4.3553 LearningRate 0.000806 Epoch: 7 Global Step: 159340 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:50:57,304-Speed 2498.24 samples/sec Loss 4.3165 LearningRate 0.000806 Epoch: 7 Global Step: 159350 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:51:05,504-Speed 2497.91 samples/sec Loss 4.3613 LearningRate 0.000806 Epoch: 7 Global Step: 159360 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:51:13,651-Speed 2514.38 samples/sec Loss 4.3978 LearningRate 0.000806 Epoch: 7 Global Step: 159370 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:51:21,851-Speed 2497.83 samples/sec Loss 4.3955 
LearningRate 0.000806 Epoch: 7 Global Step: 159380 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:51:30,060-Speed 2495.34 samples/sec Loss 4.3469 LearningRate 0.000806 Epoch: 7 Global Step: 159390 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:51:38,259-Speed 2498.51 samples/sec Loss 4.3247 LearningRate 0.000806 Epoch: 7 Global Step: 159400 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:51:46,476-Speed 2492.61 samples/sec Loss 4.2638 LearningRate 0.000806 Epoch: 7 Global Step: 159410 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:51:54,671-Speed 2499.58 samples/sec Loss 4.3741 LearningRate 0.000806 Epoch: 7 Global Step: 159420 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:52:02,817-Speed 2514.69 samples/sec Loss 4.4077 LearningRate 0.000806 Epoch: 7 Global Step: 159430 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:52:11,018-Speed 2497.74 samples/sec Loss 4.3735 LearningRate 0.000806 Epoch: 7 Global Step: 159440 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:52:19,218-Speed 2497.69 samples/sec Loss 4.3056 LearningRate 0.000806 Epoch: 7 Global Step: 159450 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:52:27,415-Speed 2498.84 samples/sec Loss 4.3288 LearningRate 0.000806 Epoch: 7 Global Step: 159460 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:52:35,627-Speed 2494.65 samples/sec Loss 4.2974 LearningRate 0.000806 Epoch: 7 Global Step: 159470 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:52:43,824-Speed 2499.03 samples/sec Loss 4.2558 LearningRate 0.000806 Epoch: 7 Global Step: 159480 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:52:51,982-Speed 2510.89 samples/sec Loss 4.2818 LearningRate 0.000805 Epoch: 7 Global Step: 159490 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:53:00,184-Speed 2497.49 samples/sec Loss 4.2909 
LearningRate 0.000805 Epoch: 7 Global Step: 159500 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:53:08,383-Speed 2498.65 samples/sec Loss 4.3171 LearningRate 0.000805 Epoch: 7 Global Step: 159510 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:53:16,579-Speed 2499.08 samples/sec Loss 4.3103 LearningRate 0.000805 Epoch: 7 Global Step: 159520 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:53:24,780-Speed 2497.83 samples/sec Loss 4.2808 LearningRate 0.000805 Epoch: 7 Global Step: 159530 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:53:32,982-Speed 2497.19 samples/sec Loss 4.3407 LearningRate 0.000805 Epoch: 7 Global Step: 159540 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:53:41,129-Speed 2514.29 samples/sec Loss 4.2644 LearningRate 0.000805 Epoch: 7 Global Step: 159550 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:53:49,329-Speed 2497.95 samples/sec Loss 4.2885 LearningRate 0.000805 Epoch: 7 Global Step: 159560 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:53:57,528-Speed 2498.28 samples/sec Loss 4.3011 LearningRate 0.000805 Epoch: 7 Global Step: 159570 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:54:05,729-Speed 2497.55 samples/sec Loss 4.3283 LearningRate 0.000805 Epoch: 7 Global Step: 159580 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:54:13,930-Speed 2497.56 samples/sec Loss 4.2913 LearningRate 0.000805 Epoch: 7 Global Step: 159590 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:54:22,133-Speed 2497.00 samples/sec Loss 4.1991 LearningRate 0.000805 Epoch: 7 Global Step: 159600 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:54:30,276-Speed 2515.42 samples/sec Loss 4.3088 LearningRate 0.000805 Epoch: 7 Global Step: 159610 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:54:38,473-Speed 2498.98 samples/sec Loss 4.2806 
LearningRate 0.000805 Epoch: 7 Global Step: 159620 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:54:46,672-Speed 2498.38 samples/sec Loss 4.2864 LearningRate 0.000805 Epoch: 7 Global Step: 159630 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:54:54,886-Speed 2493.73 samples/sec Loss 4.2978 LearningRate 0.000805 Epoch: 7 Global Step: 159640 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:55:03,078-Speed 2500.39 samples/sec Loss 4.3617 LearningRate 0.000805 Epoch: 7 Global Step: 159650 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:55:11,289-Speed 2494.60 samples/sec Loss 4.3300 LearningRate 0.000805 Epoch: 7 Global Step: 159660 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:55:19,433-Speed 2515.10 samples/sec Loss 4.3944 LearningRate 0.000805 Epoch: 7 Global Step: 159670 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:55:27,639-Speed 2496.15 samples/sec Loss 4.3880 LearningRate 0.000805 Epoch: 7 Global Step: 159680 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:55:35,837-Speed 2498.70 samples/sec Loss 4.4087 LearningRate 0.000805 Epoch: 7 Global Step: 159690 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:55:44,033-Speed 2499.18 samples/sec Loss 4.3819 LearningRate 0.000805 Epoch: 7 Global Step: 159700 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:55:52,235-Speed 2497.22 samples/sec Loss 4.3702 LearningRate 0.000805 Epoch: 7 Global Step: 159710 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:56:00,436-Speed 2497.39 samples/sec Loss 4.3084 LearningRate 0.000805 Epoch: 7 Global Step: 159720 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:56:08,607-Speed 2506.88 samples/sec Loss 4.2569 LearningRate 0.000805 Epoch: 7 Global Step: 159730 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:56:16,809-Speed 2497.64 samples/sec Loss 4.2344 
LearningRate 0.000805 Epoch: 7 Global Step: 159740 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:56:25,011-Speed 2497.26 samples/sec Loss 4.2965 LearningRate 0.000805 Epoch: 7 Global Step: 159750 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:56:33,207-Speed 2499.20 samples/sec Loss 4.2798 LearningRate 0.000805 Epoch: 7 Global Step: 159760 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:56:41,406-Speed 2498.37 samples/sec Loss 4.3121 LearningRate 0.000805 Epoch: 7 Global Step: 159770 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:56:49,605-Speed 2498.18 samples/sec Loss 4.3371 LearningRate 0.000805 Epoch: 7 Global Step: 159780 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:56:57,754-Speed 2513.78 samples/sec Loss 4.2930 LearningRate 0.000805 Epoch: 7 Global Step: 159790 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:57:05,951-Speed 2498.59 samples/sec Loss 4.2908 LearningRate 0.000805 Epoch: 7 Global Step: 159800 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:57:14,151-Speed 2497.93 samples/sec Loss 4.2744 LearningRate 0.000805 Epoch: 7 Global Step: 159810 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:57:22,349-Speed 2498.74 samples/sec Loss 4.3522 LearningRate 0.000805 Epoch: 7 Global Step: 159820 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:57:30,554-Speed 2496.45 samples/sec Loss 4.3554 LearningRate 0.000805 Epoch: 7 Global Step: 159830 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:57:38,752-Speed 2498.73 samples/sec Loss 4.2723 LearningRate 0.000805 Epoch: 7 Global Step: 159840 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:57:46,896-Speed 2515.19 samples/sec Loss 4.3583 LearningRate 0.000805 Epoch: 7 Global Step: 159850 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:57:55,091-Speed 2499.51 samples/sec Loss 4.3086 
LearningRate 0.000805 Epoch: 7 Global Step: 159860 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:58:03,289-Speed 2498.73 samples/sec Loss 4.4063 LearningRate 0.000805 Epoch: 7 Global Step: 159870 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:58:11,486-Speed 2499.18 samples/sec Loss 4.5208 LearningRate 0.000805 Epoch: 7 Global Step: 159880 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:58:19,681-Speed 2499.62 samples/sec Loss 4.5368 LearningRate 0.000805 Epoch: 7 Global Step: 159890 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:58:27,886-Speed 2496.64 samples/sec Loss 4.3506 LearningRate 0.000805 Epoch: 7 Global Step: 159900 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:58:36,041-Speed 2511.40 samples/sec Loss 4.4052 LearningRate 0.000804 Epoch: 7 Global Step: 159910 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:58:44,239-Speed 2498.54 samples/sec Loss 4.3926 LearningRate 0.000804 Epoch: 7 Global Step: 159920 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:58:52,439-Speed 2498.12 samples/sec Loss 4.3866 LearningRate 0.000804 Epoch: 7 Global Step: 159930 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:59:00,636-Speed 2498.73 samples/sec Loss 4.4015 LearningRate 0.000804 Epoch: 7 Global Step: 159940 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:59:08,832-Speed 2499.24 samples/sec Loss 4.3373 LearningRate 0.000804 Epoch: 7 Global Step: 159950 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:59:17,030-Speed 2498.50 samples/sec Loss 4.3300 LearningRate 0.000804 Epoch: 7 Global Step: 159960 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:59:25,178-Speed 2514.05 samples/sec Loss 4.3303 LearningRate 0.000804 Epoch: 7 Global Step: 159970 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:59:33,378-Speed 2498.00 samples/sec Loss 4.3207 
LearningRate 0.000804 Epoch: 7 Global Step: 159980 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:59:41,579-Speed 2497.69 samples/sec Loss 4.2675 LearningRate 0.000804 Epoch: 7 Global Step: 159990 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:59:49,779-Speed 2497.89 samples/sec Loss 4.3181 LearningRate 0.000804 Epoch: 7 Global Step: 160000 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 02:59:57,983-Speed 2496.69 samples/sec Loss 4.3637 LearningRate 0.000804 Epoch: 7 Global Step: 160010 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:00:06,182-Speed 2498.24 samples/sec Loss 4.3360 LearningRate 0.000804 Epoch: 7 Global Step: 160020 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:00:14,325-Speed 2515.65 samples/sec Loss 4.3030 LearningRate 0.000804 Epoch: 7 Global Step: 160030 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:00:22,547-Speed 2491.17 samples/sec Loss 4.3563 LearningRate 0.000804 Epoch: 7 Global Step: 160040 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:00:30,754-Speed 2495.74 samples/sec Loss 4.3156 LearningRate 0.000804 Epoch: 7 Global Step: 160050 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:00:38,956-Speed 2497.50 samples/sec Loss 4.3183 LearningRate 0.000804 Epoch: 7 Global Step: 160060 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:00:47,162-Speed 2495.86 samples/sec Loss 4.2802 LearningRate 0.000804 Epoch: 7 Global Step: 160070 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:00:55,362-Speed 2498.06 samples/sec Loss 4.3168 LearningRate 0.000804 Epoch: 7 Global Step: 160080 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:01:03,512-Speed 2513.52 samples/sec Loss 4.2990 LearningRate 0.000804 Epoch: 7 Global Step: 160090 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:01:11,712-Speed 2497.86 samples/sec Loss 4.3484 
LearningRate 0.000804 Epoch: 7 Global Step: 160100 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:01:19,915-Speed 2497.11 samples/sec Loss 4.3519 LearningRate 0.000804 Epoch: 7 Global Step: 160110 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:01:28,112-Speed 2499.07 samples/sec Loss 4.3659 LearningRate 0.000804 Epoch: 7 Global Step: 160120 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:01:36,306-Speed 2499.79 samples/sec Loss 4.4231 LearningRate 0.000804 Epoch: 7 Global Step: 160130 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:01:44,506-Speed 2497.76 samples/sec Loss 4.3404 LearningRate 0.000804 Epoch: 7 Global Step: 160140 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:01:52,652-Speed 2514.47 samples/sec Loss 4.2713 LearningRate 0.000804 Epoch: 7 Global Step: 160150 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:02:00,846-Speed 2499.97 samples/sec Loss 4.3283 LearningRate 0.000804 Epoch: 7 Global Step: 160160 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:02:09,044-Speed 2498.87 samples/sec Loss 4.2701 LearningRate 0.000804 Epoch: 7 Global Step: 160170 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:02:17,243-Speed 2498.08 samples/sec Loss 4.2751 LearningRate 0.000804 Epoch: 7 Global Step: 160180 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:02:25,436-Speed 2500.60 samples/sec Loss 4.3492 LearningRate 0.000804 Epoch: 7 Global Step: 160190 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:02:33,633-Speed 2498.68 samples/sec Loss 4.3478 LearningRate 0.000804 Epoch: 7 Global Step: 160200 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:02:41,780-Speed 2514.35 samples/sec Loss 4.3470 LearningRate 0.000804 Epoch: 7 Global Step: 160210 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:02:49,988-Speed 2495.55 samples/sec Loss 4.2811 
LearningRate 0.000804 Epoch: 7 Global Step: 160220 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:02:58,198-Speed 2495.18 samples/sec Loss 4.3215 LearningRate 0.000804 Epoch: 7 Global Step: 160230 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:03:06,660-Speed 2498.88 samples/sec Loss 4.2684 LearningRate 0.000804 Epoch: 7 Global Step: 160240 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:03:14,861-Speed 2497.52 samples/sec Loss 4.2482 LearningRate 0.000804 Epoch: 7 Global Step: 160250 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:03:23,086-Speed 2500.79 samples/sec Loss 4.2760 LearningRate 0.000804 Epoch: 7 Global Step: 160260 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:03:31,949-Speed 2516.89 samples/sec Loss 4.3156 LearningRate 0.000804 Epoch: 7 Global Step: 160270 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:03:40,150-Speed 2497.50 samples/sec Loss 4.3014 LearningRate 0.000804 Epoch: 7 Global Step: 160280 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:03:48,349-Speed 2498.53 samples/sec Loss 4.2128 LearningRate 0.000804 Epoch: 7 Global Step: 160290 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:03:56,551-Speed 2497.45 samples/sec Loss 4.2931 LearningRate 0.000804 Epoch: 7 Global Step: 160300 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:04:04,753-Speed 2497.28 samples/sec Loss 4.3367 LearningRate 0.000804 Epoch: 7 Global Step: 160310 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:04:12,951-Speed 2498.52 samples/sec Loss 4.3026 LearningRate 0.000803 Epoch: 7 Global Step: 160320 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:04:21,101-Speed 2513.46 samples/sec Loss 4.2619 LearningRate 0.000803 Epoch: 7 Global Step: 160330 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:04:29,309-Speed 2495.34 samples/sec Loss 4.2642 
LearningRate 0.000803 Epoch: 7 Global Step: 160340 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:04:37,510-Speed 2497.82 samples/sec Loss 4.2923 LearningRate 0.000803 Epoch: 7 Global Step: 160350 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:04:45,708-Speed 2498.58 samples/sec Loss 4.3916 LearningRate 0.000803 Epoch: 7 Global Step: 160360 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:04:53,904-Speed 2499.16 samples/sec Loss 4.3285 LearningRate 0.000803 Epoch: 7 Global Step: 160370 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:05:02,101-Speed 2498.84 samples/sec Loss 4.2861 LearningRate 0.000803 Epoch: 7 Global Step: 160380 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:05:10,246-Speed 2514.96 samples/sec Loss 4.3653 LearningRate 0.000803 Epoch: 7 Global Step: 160390 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:05:18,448-Speed 2497.42 samples/sec Loss 4.2749 LearningRate 0.000803 Epoch: 7 Global Step: 160400 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:05:26,652-Speed 2496.88 samples/sec Loss 4.3704 LearningRate 0.000803 Epoch: 7 Global Step: 160410 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:05:34,874-Speed 2491.36 samples/sec Loss 4.2920 LearningRate 0.000803 Epoch: 7 Global Step: 160420 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:05:43,066-Speed 2500.62 samples/sec Loss 4.3247 LearningRate 0.000803 Epoch: 7 Global Step: 160430 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:05:51,268-Speed 2497.15 samples/sec Loss 4.3474 LearningRate 0.000803 Epoch: 7 Global Step: 160440 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:05:59,424-Speed 2511.47 samples/sec Loss 4.2945 LearningRate 0.000803 Epoch: 7 Global Step: 160450 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:06:07,630-Speed 2496.36 samples/sec Loss 4.3595 
LearningRate 0.000803 Epoch: 7 Global Step: 160460 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:06:15,831-Speed 2497.86 samples/sec Loss 4.3107 LearningRate 0.000803 Epoch: 7 Global Step: 160470 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:06:24,025-Speed 2499.71 samples/sec Loss 4.2754 LearningRate 0.000803 Epoch: 7 Global Step: 160480 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:06:32,226-Speed 2497.83 samples/sec Loss 4.3423 LearningRate 0.000803 Epoch: 7 Global Step: 160490 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:06:40,429-Speed 2497.22 samples/sec Loss 4.3756 LearningRate 0.000803 Epoch: 7 Global Step: 160500 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:06:48,575-Speed 2514.55 samples/sec Loss 4.3633 LearningRate 0.000803 Epoch: 7 Global Step: 160510 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:06:56,778-Speed 2497.10 samples/sec Loss 4.3485 LearningRate 0.000803 Epoch: 7 Global Step: 160520 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:07:04,980-Speed 2497.53 samples/sec Loss 4.2408 LearningRate 0.000803 Epoch: 7 Global Step: 160530 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:07:13,183-Speed 2497.18 samples/sec Loss 4.3881 LearningRate 0.000803 Epoch: 7 Global Step: 160540 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:07:21,380-Speed 2498.94 samples/sec Loss 4.3545 LearningRate 0.000803 Epoch: 7 Global Step: 160550 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:07:29,580-Speed 2497.98 samples/sec Loss 4.3066 LearningRate 0.000803 Epoch: 7 Global Step: 160560 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:07:37,727-Speed 2514.07 samples/sec Loss 4.2939 LearningRate 0.000803 Epoch: 7 Global Step: 160570 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:07:45,923-Speed 2499.23 samples/sec Loss 4.3567 
LearningRate 0.000803 Epoch: 7 Global Step: 160580 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:07:54,127-Speed 2496.77 samples/sec Loss 4.2930 LearningRate 0.000803 Epoch: 7 Global Step: 160590 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:08:02,284-Speed 2511.22 samples/sec Loss 4.3039 LearningRate 0.000803 Epoch: 7 Global Step: 160600 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:08:10,489-Speed 2496.27 samples/sec Loss 4.3254 LearningRate 0.000803 Epoch: 7 Global Step: 160610 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:08:18,688-Speed 2498.24 samples/sec Loss 4.3211 LearningRate 0.000803 Epoch: 7 Global Step: 160620 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:08:26,833-Speed 2514.60 samples/sec Loss 4.2918 LearningRate 0.000803 Epoch: 7 Global Step: 160630 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:08:35,032-Speed 2498.40 samples/sec Loss 4.3306 LearningRate 0.000803 Epoch: 7 Global Step: 160640 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:08:43,234-Speed 2497.72 samples/sec Loss 4.1829 LearningRate 0.000803 Epoch: 7 Global Step: 160650 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:08:51,436-Speed 2497.33 samples/sec Loss 4.2668 LearningRate 0.000803 Epoch: 7 Global Step: 160660 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:08:59,651-Speed 2493.35 samples/sec Loss 4.2398 LearningRate 0.000803 Epoch: 7 Global Step: 160670 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:09:07,851-Speed 2497.86 samples/sec Loss 4.3625 LearningRate 0.000803 Epoch: 7 Global Step: 160680 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:09:15,999-Speed 2513.75 samples/sec Loss 4.2562 LearningRate 0.000803 Epoch: 7 Global Step: 160690 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:09:24,213-Speed 2493.90 samples/sec Loss 4.3838 
LearningRate 0.000803 Epoch: 7 Global Step: 160700 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:09:32,412-Speed 2498.29 samples/sec Loss 4.3788 LearningRate 0.000803 Epoch: 7 Global Step: 160710 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:09:40,619-Speed 2495.73 samples/sec Loss 4.4114 LearningRate 0.000803 Epoch: 7 Global Step: 160720 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:09:48,820-Speed 2497.70 samples/sec Loss 4.3861 LearningRate 0.000803 Epoch: 7 Global Step: 160730 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:09:57,020-Speed 2497.90 samples/sec Loss 4.2818 LearningRate 0.000802 Epoch: 7 Global Step: 160740 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:10:05,166-Speed 2514.53 samples/sec Loss 4.2734 LearningRate 0.000802 Epoch: 7 Global Step: 160750 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:10:13,371-Speed 2496.36 samples/sec Loss 4.2681 LearningRate 0.000802 Epoch: 7 Global Step: 160760 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:10:21,571-Speed 2497.86 samples/sec Loss 4.3629 LearningRate 0.000802 Epoch: 7 Global Step: 160770 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:10:29,773-Speed 2497.56 samples/sec Loss 4.2762 LearningRate 0.000802 Epoch: 7 Global Step: 160780 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:10:37,972-Speed 2498.25 samples/sec Loss 4.3197 LearningRate 0.000802 Epoch: 7 Global Step: 160790 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:10:46,177-Speed 2496.22 samples/sec Loss 4.2292 LearningRate 0.000802 Epoch: 7 Global Step: 160800 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:10:54,339-Speed 2509.76 samples/sec Loss 4.2655 LearningRate 0.000802 Epoch: 7 Global Step: 160810 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:11:02,536-Speed 2498.84 samples/sec Loss 4.2777 
LearningRate 0.000802 Epoch: 7 Global Step: 160820 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:11:10,736-Speed 2498.08 samples/sec Loss 4.2364 LearningRate 0.000802 Epoch: 7 Global Step: 160830 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:11:18,934-Speed 2498.69 samples/sec Loss 4.3028 LearningRate 0.000802 Epoch: 7 Global Step: 160840 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:11:27,136-Speed 2497.31 samples/sec Loss 4.2355 LearningRate 0.000802 Epoch: 7 Global Step: 160850 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:11:35,338-Speed 2497.49 samples/sec Loss 4.2609 LearningRate 0.000802 Epoch: 7 Global Step: 160860 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:11:43,481-Speed 2515.42 samples/sec Loss 4.3404 LearningRate 0.000802 Epoch: 7 Global Step: 160870 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:11:51,684-Speed 2496.96 samples/sec Loss 4.2637 LearningRate 0.000802 Epoch: 7 Global Step: 160880 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:11:59,892-Speed 2495.83 samples/sec Loss 4.3084 LearningRate 0.000802 Epoch: 7 Global Step: 160890 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:12:08,091-Speed 2498.25 samples/sec Loss 4.3118 LearningRate 0.000802 Epoch: 7 Global Step: 160900 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:12:16,292-Speed 2497.68 samples/sec Loss 4.3397 LearningRate 0.000802 Epoch: 7 Global Step: 160910 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:12:24,493-Speed 2497.49 samples/sec Loss 4.3191 LearningRate 0.000802 Epoch: 7 Global Step: 160920 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:12:32,645-Speed 2512.95 samples/sec Loss 4.3195 LearningRate 0.000802 Epoch: 7 Global Step: 160930 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:12:40,846-Speed 2497.70 samples/sec Loss 4.2336 
LearningRate 0.000802 Epoch: 7 Global Step: 160940 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:12:49,041-Speed 2499.77 samples/sec Loss 4.2230 LearningRate 0.000802 Epoch: 7 Global Step: 160950 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:12:57,240-Speed 2498.10 samples/sec Loss 4.2543 LearningRate 0.000802 Epoch: 7 Global Step: 160960 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:13:05,444-Speed 2496.91 samples/sec Loss 4.2848 LearningRate 0.000802 Epoch: 7 Global Step: 160970 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:13:13,641-Speed 2498.79 samples/sec Loss 4.3201 LearningRate 0.000802 Epoch: 7 Global Step: 160980 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:13:21,788-Speed 2514.33 samples/sec Loss 4.3062 LearningRate 0.000802 Epoch: 7 Global Step: 160990 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:13:29,989-Speed 2497.71 samples/sec Loss 4.2378 LearningRate 0.000802 Epoch: 7 Global Step: 161000 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:13:38,195-Speed 2496.02 samples/sec Loss 4.2530 LearningRate 0.000802 Epoch: 7 Global Step: 161010 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:13:46,398-Speed 2497.18 samples/sec Loss 4.2552 LearningRate 0.000802 Epoch: 7 Global Step: 161020 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:13:54,598-Speed 2497.81 samples/sec Loss 4.2629 LearningRate 0.000802 Epoch: 7 Global Step: 161030 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:14:02,811-Speed 2493.95 samples/sec Loss 4.2779 LearningRate 0.000802 Epoch: 7 Global Step: 161040 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:14:10,989-Speed 2504.80 samples/sec Loss 4.3719 LearningRate 0.000802 Epoch: 7 Global Step: 161050 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:14:19,188-Speed 2498.29 samples/sec Loss 4.2539 
LearningRate 0.000802 Epoch: 7 Global Step: 161060 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:14:27,388-Speed 2497.92 samples/sec Loss 4.3287 LearningRate 0.000802 Epoch: 7 Global Step: 161070 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:14:35,587-Speed 2498.88 samples/sec Loss 4.3280 LearningRate 0.000802 Epoch: 7 Global Step: 161080 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:14:43,796-Speed 2495.06 samples/sec Loss 4.3565 LearningRate 0.000802 Epoch: 7 Global Step: 161090 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:14:51,992-Speed 2499.45 samples/sec Loss 4.3931 LearningRate 0.000802 Epoch: 7 Global Step: 161100 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:15:00,141-Speed 2513.61 samples/sec Loss 4.3581 LearningRate 0.000802 Epoch: 7 Global Step: 161110 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:15:08,336-Speed 2499.35 samples/sec Loss 4.2127 LearningRate 0.000802 Epoch: 7 Global Step: 161120 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:15:16,543-Speed 2495.93 samples/sec Loss 4.2378 LearningRate 0.000802 Epoch: 7 Global Step: 161130 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:15:24,735-Speed 2500.43 samples/sec Loss 4.2765 LearningRate 0.000802 Epoch: 7 Global Step: 161140 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:15:32,939-Speed 2496.93 samples/sec Loss 4.2880 LearningRate 0.000802 Epoch: 7 Global Step: 161150 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:15:41,138-Speed 2498.37 samples/sec Loss 4.3266 LearningRate 0.000801 Epoch: 7 Global Step: 161160 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:15:49,285-Speed 2514.03 samples/sec Loss 4.2962 LearningRate 0.000801 Epoch: 7 Global Step: 161170 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:15:57,485-Speed 2498.03 samples/sec Loss 4.3116 
LearningRate 0.000801 Epoch: 7 Global Step: 161180 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:16:05,692-Speed 2496.02 samples/sec Loss 4.2841 LearningRate 0.000801 Epoch: 7 Global Step: 161190 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:16:13,889-Speed 2498.77 samples/sec Loss 4.3137 LearningRate 0.000801 Epoch: 7 Global Step: 161200 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:16:22,102-Speed 2494.24 samples/sec Loss 4.2790 LearningRate 0.000801 Epoch: 7 Global Step: 161210 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:16:30,306-Speed 2496.63 samples/sec Loss 4.3250 LearningRate 0.000801 Epoch: 7 Global Step: 161220 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:16:38,451-Speed 2514.87 samples/sec Loss 4.3642 LearningRate 0.000801 Epoch: 7 Global Step: 161230 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:16:46,651-Speed 2497.85 samples/sec Loss 4.3265 LearningRate 0.000801 Epoch: 7 Global Step: 161240 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:16:54,860-Speed 2495.62 samples/sec Loss 4.5052 LearningRate 0.000801 Epoch: 7 Global Step: 161250 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:17:03,059-Speed 2498.14 samples/sec Loss 4.3097 LearningRate 0.000801 Epoch: 7 Global Step: 161260 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:17:11,257-Speed 2498.54 samples/sec Loss 4.3556 LearningRate 0.000801 Epoch: 7 Global Step: 161270 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:17:19,459-Speed 2497.40 samples/sec Loss 4.3944 LearningRate 0.000801 Epoch: 7 Global Step: 161280 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:17:27,602-Speed 2515.47 samples/sec Loss 4.3038 LearningRate 0.000801 Epoch: 7 Global Step: 161290 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:17:35,804-Speed 2497.51 samples/sec Loss 4.3115 
LearningRate 0.000801 Epoch: 7 Global Step: 161300 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:17:44,008-Speed 2496.64 samples/sec Loss 4.3486 LearningRate 0.000801 Epoch: 7 Global Step: 161310 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:17:52,205-Speed 2498.92 samples/sec Loss 4.3117 LearningRate 0.000801 Epoch: 7 Global Step: 161320 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:18:00,406-Speed 2497.96 samples/sec Loss 4.3573 LearningRate 0.000801 Epoch: 7 Global Step: 161330 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:18:08,603-Speed 2498.73 samples/sec Loss 4.3154 LearningRate 0.000801 Epoch: 7 Global Step: 161340 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:18:16,753-Speed 2513.31 samples/sec Loss 4.3103 LearningRate 0.000801 Epoch: 7 Global Step: 161350 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:18:24,951-Speed 2498.37 samples/sec Loss 4.4136 LearningRate 0.000801 Epoch: 7 Global Step: 161360 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:18:33,156-Speed 2496.61 samples/sec Loss 4.2823 LearningRate 0.000801 Epoch: 7 Global Step: 161370 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:18:41,353-Speed 2498.68 samples/sec Loss 4.3493 LearningRate 0.000801 Epoch: 7 Global Step: 161380 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:18:49,554-Speed 2497.69 samples/sec Loss 4.3202 LearningRate 0.000801 Epoch: 7 Global Step: 161390 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:18:57,756-Speed 2497.66 samples/sec Loss 4.3668 LearningRate 0.000801 Epoch: 7 Global Step: 161400 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:19:05,902-Speed 2514.42 samples/sec Loss 4.2776 LearningRate 0.000801 Epoch: 7 Global Step: 161410 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:19:14,107-Speed 2496.45 samples/sec Loss 4.3597 
LearningRate 0.000801 Epoch: 7 Global Step: 161420 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:19:22,302-Speed 2499.61 samples/sec Loss 4.3040 LearningRate 0.000801 Epoch: 7 Global Step: 161430 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:19:30,505-Speed 2497.05 samples/sec Loss 4.3767 LearningRate 0.000801 Epoch: 7 Global Step: 161440 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:19:38,703-Speed 2498.73 samples/sec Loss 4.2798 LearningRate 0.000801 Epoch: 7 Global Step: 161450 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:19:46,900-Speed 2498.89 samples/sec Loss 4.3431 LearningRate 0.000801 Epoch: 7 Global Step: 161460 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:19:55,050-Speed 2513.20 samples/sec Loss 4.2745 LearningRate 0.000801 Epoch: 7 Global Step: 161470 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:20:03,245-Speed 2499.50 samples/sec Loss 4.2825 LearningRate 0.000801 Epoch: 7 Global Step: 161480 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:20:11,447-Speed 2497.58 samples/sec Loss 4.2953 LearningRate 0.000801 Epoch: 7 Global Step: 161490 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:20:19,649-Speed 2497.24 samples/sec Loss 4.3142 LearningRate 0.000801 Epoch: 7 Global Step: 161500 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:20:27,848-Speed 2498.17 samples/sec Loss 4.2729 LearningRate 0.000801 Epoch: 7 Global Step: 161510 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:20:36,049-Speed 2497.63 samples/sec Loss 4.2600 LearningRate 0.000801 Epoch: 7 Global Step: 161520 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:20:44,193-Speed 2515.17 samples/sec Loss 4.2744 LearningRate 0.000801 Epoch: 7 Global Step: 161530 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:20:52,396-Speed 2497.26 samples/sec Loss 4.2937 
LearningRate 0.000801 Epoch: 7 Global Step: 161540 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:21:00,593-Speed 2498.99 samples/sec Loss 4.2183 LearningRate 0.000801 Epoch: 7 Global Step: 161550 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:21:08,794-Speed 2497.94 samples/sec Loss 4.3407 LearningRate 0.000801 Epoch: 7 Global Step: 161560 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:21:17,002-Speed 2495.48 samples/sec Loss 4.3633 LearningRate 0.000800 Epoch: 7 Global Step: 161570 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:21:25,200-Speed 2498.59 samples/sec Loss 4.2313 LearningRate 0.000800 Epoch: 7 Global Step: 161580 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:21:33,345-Speed 2514.80 samples/sec Loss 4.3148 LearningRate 0.000800 Epoch: 7 Global Step: 161590 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:21:41,565-Speed 2491.98 samples/sec Loss 4.3007 LearningRate 0.000800 Epoch: 7 Global Step: 161600 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:21:49,764-Speed 2498.48 samples/sec Loss 4.3400 LearningRate 0.000800 Epoch: 7 Global Step: 161610 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:21:57,958-Speed 2499.56 samples/sec Loss 4.2952 LearningRate 0.000800 Epoch: 7 Global Step: 161620 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:22:06,157-Speed 2498.46 samples/sec Loss 4.3467 LearningRate 0.000800 Epoch: 7 Global Step: 161630 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:22:14,355-Speed 2498.41 samples/sec Loss 4.3213 LearningRate 0.000800 Epoch: 7 Global Step: 161640 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:22:22,499-Speed 2515.24 samples/sec Loss 4.2985 LearningRate 0.000800 Epoch: 7 Global Step: 161650 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:22:30,698-Speed 2498.23 samples/sec Loss 4.3556 
LearningRate 0.000800 Epoch: 7 Global Step: 161660 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:22:38,899-Speed 2497.76 samples/sec Loss 4.2773 LearningRate 0.000800 Epoch: 7 Global Step: 161670 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:22:47,095-Speed 2499.35 samples/sec Loss 4.2123 LearningRate 0.000800 Epoch: 7 Global Step: 161680 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:22:55,295-Speed 2498.17 samples/sec Loss 4.2333 LearningRate 0.000800 Epoch: 7 Global Step: 161690 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:23:03,493-Speed 2498.67 samples/sec Loss 4.2250 LearningRate 0.000800 Epoch: 7 Global Step: 161700 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:23:11,638-Speed 2514.85 samples/sec Loss 4.2951 LearningRate 0.000800 Epoch: 7 Global Step: 161710 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:23:19,834-Speed 2499.10 samples/sec Loss 4.2835 LearningRate 0.000800 Epoch: 7 Global Step: 161720 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:23:28,033-Speed 2498.56 samples/sec Loss 4.3349 LearningRate 0.000800 Epoch: 7 Global Step: 161730 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:23:36,234-Speed 2497.68 samples/sec Loss 4.2976 LearningRate 0.000800 Epoch: 7 Global Step: 161740 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:23:44,435-Speed 2497.81 samples/sec Loss 4.2474 LearningRate 0.000800 Epoch: 7 Global Step: 161750 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:23:52,632-Speed 2499.01 samples/sec Loss 4.3319 LearningRate 0.000800 Epoch: 7 Global Step: 161760 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:24:00,780-Speed 2514.08 samples/sec Loss 4.2647 LearningRate 0.000800 Epoch: 7 Global Step: 161770 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:24:08,976-Speed 2499.23 samples/sec Loss 4.3916 
LearningRate 0.000800 Epoch: 7 Global Step: 161780 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:24:17,170-Speed 2499.63 samples/sec Loss 4.2580 LearningRate 0.000800 Epoch: 7 Global Step: 161790 Fp16 Grad Scale: 32768 Required: 153 hours Training: 2022-07-07 03:24:25,368-Speed 2498.55 samples/sec Loss 4.2677 LearningRate 0.000800 Epoch: 7 Global Step: 161800 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:24:33,566-Speed 2498.71 samples/sec Loss 4.2960 LearningRate 0.000800 Epoch: 7 Global Step: 161810 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:24:41,763-Speed 2498.71 samples/sec Loss 4.3715 LearningRate 0.000800 Epoch: 7 Global Step: 161820 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:24:49,909-Speed 2514.82 samples/sec Loss 4.3268 LearningRate 0.000800 Epoch: 7 Global Step: 161830 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:24:58,106-Speed 2498.76 samples/sec Loss 4.2995 LearningRate 0.000800 Epoch: 7 Global Step: 161840 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:25:06,307-Speed 2497.73 samples/sec Loss 4.2823 LearningRate 0.000800 Epoch: 7 Global Step: 161850 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:25:14,503-Speed 2499.17 samples/sec Loss 4.2866 LearningRate 0.000800 Epoch: 7 Global Step: 161860 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:25:22,701-Speed 2498.66 samples/sec Loss 4.3138 LearningRate 0.000800 Epoch: 7 Global Step: 161870 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:25:30,902-Speed 2497.50 samples/sec Loss 4.3108 LearningRate 0.000800 Epoch: 7 Global Step: 161880 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:25:39,052-Speed 2513.24 samples/sec Loss 4.2955 LearningRate 0.000800 Epoch: 7 Global Step: 161890 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:25:47,257-Speed 2496.61 samples/sec Loss 4.2284 
LearningRate 0.000800 Epoch: 7 Global Step: 161900 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:25:55,459-Speed 2497.13 samples/sec Loss 4.2286 LearningRate 0.000800 Epoch: 7 Global Step: 161910 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:26:03,658-Speed 2498.36 samples/sec Loss 4.2378 LearningRate 0.000800 Epoch: 7 Global Step: 161920 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:26:11,857-Speed 2498.72 samples/sec Loss 4.1373 LearningRate 0.000800 Epoch: 7 Global Step: 161930 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:26:20,076-Speed 2492.16 samples/sec Loss 4.2621 LearningRate 0.000800 Epoch: 7 Global Step: 161940 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:26:28,240-Speed 2508.91 samples/sec Loss 4.2906 LearningRate 0.000800 Epoch: 7 Global Step: 161950 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:26:36,439-Speed 2498.05 samples/sec Loss 4.2016 LearningRate 0.000800 Epoch: 7 Global Step: 161960 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:26:44,639-Speed 2497.99 samples/sec Loss 4.2271 LearningRate 0.000800 Epoch: 7 Global Step: 161970 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:26:52,839-Speed 2498.26 samples/sec Loss 4.2691 LearningRate 0.000800 Epoch: 7 Global Step: 161980 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:27:01,032-Speed 2499.80 samples/sec Loss 4.2332 LearningRate 0.000799 Epoch: 7 Global Step: 161990 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:27:09,228-Speed 2499.28 samples/sec Loss 4.3165 LearningRate 0.000799 Epoch: 7 Global Step: 162000 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:27:17,374-Speed 2514.57 samples/sec Loss 4.3239 LearningRate 0.000799 Epoch: 7 Global Step: 162010 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:27:25,578-Speed 2497.00 samples/sec Loss 4.2966 
LearningRate 0.000799 Epoch: 7 Global Step: 162020 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:27:33,781-Speed 2497.06 samples/sec Loss 4.2326 LearningRate 0.000799 Epoch: 7 Global Step: 162030 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:27:41,986-Speed 2496.20 samples/sec Loss 4.3246 LearningRate 0.000799 Epoch: 7 Global Step: 162040 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:27:50,190-Speed 2496.64 samples/sec Loss 4.2911 LearningRate 0.000799 Epoch: 7 Global Step: 162050 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:27:58,387-Speed 2499.20 samples/sec Loss 4.3271 LearningRate 0.000799 Epoch: 7 Global Step: 162060 Fp16 Grad Scale: 65536 Required: 153 hours Training: 2022-07-07 03:28:06,534-Speed 2514.16 samples/sec Loss 4.2637 LearningRate 0.000799 Epoch: 7 Global Step: 162070 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:28:14,734-Speed 2498.53 samples/sec Loss 4.2911 LearningRate 0.000799 Epoch: 7 Global Step: 162080 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:28:22,937-Speed 2497.25 samples/sec Loss 4.2817 LearningRate 0.000799 Epoch: 7 Global Step: 162090 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:28:31,135-Speed 2498.70 samples/sec Loss 4.3120 LearningRate 0.000799 Epoch: 7 Global Step: 162100 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:28:39,333-Speed 2498.66 samples/sec Loss 4.3420 LearningRate 0.000799 Epoch: 7 Global Step: 162110 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:28:47,530-Speed 2498.81 samples/sec Loss 4.3304 LearningRate 0.000799 Epoch: 7 Global Step: 162120 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:28:55,679-Speed 2513.46 samples/sec Loss 4.3575 LearningRate 0.000799 Epoch: 7 Global Step: 162130 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:29:03,882-Speed 2497.22 samples/sec Loss 4.2109 
LearningRate 0.000799 Epoch: 7 Global Step: 162140 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:29:12,078-Speed 2498.99 samples/sec Loss 4.1827 LearningRate 0.000799 Epoch: 7 Global Step: 162150 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:29:20,285-Speed 2496.01 samples/sec Loss 4.1904 LearningRate 0.000799 Epoch: 7 Global Step: 162160 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:29:28,484-Speed 2498.21 samples/sec Loss 4.1889 LearningRate 0.000799 Epoch: 7 Global Step: 162170 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:29:36,682-Speed 2498.85 samples/sec Loss 4.2518 LearningRate 0.000799 Epoch: 7 Global Step: 162180 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:29:44,832-Speed 2513.20 samples/sec Loss 4.2611 LearningRate 0.000799 Epoch: 7 Global Step: 162190 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:29:53,034-Speed 2497.03 samples/sec Loss 4.2949 LearningRate 0.000799 Epoch: 7 Global Step: 162200 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:30:01,233-Speed 2498.39 samples/sec Loss 4.2021 LearningRate 0.000799 Epoch: 7 Global Step: 162210 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:30:09,437-Speed 2497.04 samples/sec Loss 4.2429 LearningRate 0.000799 Epoch: 7 Global Step: 162220 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:30:17,637-Speed 2497.75 samples/sec Loss 4.2438 LearningRate 0.000799 Epoch: 7 Global Step: 162230 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:30:25,843-Speed 2496.24 samples/sec Loss 4.2420 LearningRate 0.000799 Epoch: 7 Global Step: 162240 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:30:33,985-Speed 2515.80 samples/sec Loss 4.2698 LearningRate 0.000799 Epoch: 7 Global Step: 162250 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:30:42,191-Speed 2496.56 samples/sec Loss 4.1998 
LearningRate 0.000799 Epoch: 7 Global Step: 162260 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:30:50,389-Speed 2498.44 samples/sec Loss 4.1872 LearningRate 0.000799 Epoch: 7 Global Step: 162270 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:30:58,595-Speed 2496.41 samples/sec Loss 4.2417 LearningRate 0.000799 Epoch: 7 Global Step: 162280 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:31:06,799-Speed 2496.86 samples/sec Loss 4.2335 LearningRate 0.000799 Epoch: 7 Global Step: 162290 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:31:15,010-Speed 2494.59 samples/sec Loss 4.2773 LearningRate 0.000799 Epoch: 7 Global Step: 162300 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:31:23,156-Speed 2514.57 samples/sec Loss 4.2506 LearningRate 0.000799 Epoch: 7 Global Step: 162310 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:31:31,352-Speed 2499.13 samples/sec Loss 4.2219 LearningRate 0.000799 Epoch: 7 Global Step: 162320 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:31:39,554-Speed 2497.49 samples/sec Loss 4.2487 LearningRate 0.000799 Epoch: 7 Global Step: 162330 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:31:47,755-Speed 2497.65 samples/sec Loss 4.3106 LearningRate 0.000799 Epoch: 7 Global Step: 162340 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:31:55,954-Speed 2498.33 samples/sec Loss 4.2552 LearningRate 0.000799 Epoch: 7 Global Step: 162350 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:32:04,157-Speed 2497.07 samples/sec Loss 4.3359 LearningRate 0.000799 Epoch: 7 Global Step: 162360 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:32:12,301-Speed 2515.21 samples/sec Loss 4.2617 LearningRate 0.000799 Epoch: 7 Global Step: 162370 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:32:20,499-Speed 2498.30 samples/sec Loss 4.3069 
LearningRate 0.000799 Epoch: 7 Global Step: 162380 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:32:28,697-Speed 2498.56 samples/sec Loss 4.2254 LearningRate 0.000799 Epoch: 7 Global Step: 162390 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:32:36,898-Speed 2497.87 samples/sec Loss 4.2743 LearningRate 0.000799 Epoch: 7 Global Step: 162400 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:32:45,096-Speed 2498.63 samples/sec Loss 4.2278 LearningRate 0.000798 Epoch: 7 Global Step: 162410 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:32:53,320-Speed 2490.66 samples/sec Loss 4.3309 LearningRate 0.000798 Epoch: 7 Global Step: 162420 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:33:01,466-Speed 2514.83 samples/sec Loss 4.2533 LearningRate 0.000798 Epoch: 7 Global Step: 162430 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:33:09,663-Speed 2499.16 samples/sec Loss 4.2300 LearningRate 0.000798 Epoch: 7 Global Step: 162440 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:33:17,871-Speed 2495.42 samples/sec Loss 4.1745 LearningRate 0.000798 Epoch: 7 Global Step: 162450 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:33:26,075-Speed 2496.77 samples/sec Loss 4.2327 LearningRate 0.000798 Epoch: 7 Global Step: 162460 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:33:34,284-Speed 2495.28 samples/sec Loss 4.2874 LearningRate 0.000798 Epoch: 7 Global Step: 162470 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:33:42,485-Speed 2497.63 samples/sec Loss 4.2238 LearningRate 0.000798 Epoch: 7 Global Step: 162480 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:33:50,641-Speed 2511.44 samples/sec Loss 4.2375 LearningRate 0.000798 Epoch: 7 Global Step: 162490 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:33:58,837-Speed 2499.15 samples/sec Loss 4.2632 
LearningRate 0.000798 Epoch: 7 Global Step: 162500 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:34:07,036-Speed 2498.19 samples/sec Loss 4.3598 LearningRate 0.000798 Epoch: 7 Global Step: 162510 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:34:15,239-Speed 2496.98 samples/sec Loss 4.3042 LearningRate 0.000798 Epoch: 7 Global Step: 162520 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:34:23,436-Speed 2498.97 samples/sec Loss 4.2246 LearningRate 0.000798 Epoch: 7 Global Step: 162530 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:34:31,590-Speed 2511.90 samples/sec Loss 4.2962 LearningRate 0.000798 Epoch: 7 Global Step: 162540 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:34:39,738-Speed 2514.08 samples/sec Loss 4.3819 LearningRate 0.000798 Epoch: 7 Global Step: 162550 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:34:47,934-Speed 2499.01 samples/sec Loss 4.1694 LearningRate 0.000798 Epoch: 7 Global Step: 162560 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:34:56,133-Speed 2498.38 samples/sec Loss 4.2692 LearningRate 0.000798 Epoch: 7 Global Step: 162570 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:35:04,332-Speed 2498.31 samples/sec Loss 4.2288 LearningRate 0.000798 Epoch: 7 Global Step: 162580 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:35:12,654-Speed 2461.14 samples/sec Loss 4.2339 LearningRate 0.000798 Epoch: 7 Global Step: 162590 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:35:23,909-Speed 2165.19 samples/sec Loss 4.2107 LearningRate 0.000798 Epoch: 7 Global Step: 162600 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:35:32,477-Speed 2516.27 samples/sec Loss 4.2234 LearningRate 0.000798 Epoch: 7 Global Step: 162610 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:35:40,802-Speed 2460.44 samples/sec Loss 4.2784 
LearningRate 0.000798 Epoch: 7 Global Step: 162620 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:35:49,025-Speed 2499.65 samples/sec Loss 4.2268 LearningRate 0.000798 Epoch: 7 Global Step: 162630 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:35:57,256-Speed 2488.62 samples/sec Loss 4.2923 LearningRate 0.000798 Epoch: 7 Global Step: 162640 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:36:05,454-Speed 2498.47 samples/sec Loss 4.2786 LearningRate 0.000798 Epoch: 7 Global Step: 162650 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:36:13,654-Speed 2497.97 samples/sec Loss 4.3485 LearningRate 0.000798 Epoch: 7 Global Step: 162660 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:36:21,806-Speed 2512.62 samples/sec Loss 4.2272 LearningRate 0.000798 Epoch: 7 Global Step: 162670 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:36:30,036-Speed 2488.93 samples/sec Loss 4.2167 LearningRate 0.000798 Epoch: 7 Global Step: 162680 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:36:38,243-Speed 2495.77 samples/sec Loss 4.2280 LearningRate 0.000798 Epoch: 7 Global Step: 162690 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:36:46,445-Speed 2497.41 samples/sec Loss 4.1484 LearningRate 0.000798 Epoch: 7 Global Step: 162700 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:36:54,658-Speed 2494.06 samples/sec Loss 4.2445 LearningRate 0.000798 Epoch: 7 Global Step: 162710 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:37:02,869-Speed 2494.51 samples/sec Loss 4.2279 LearningRate 0.000798 Epoch: 7 Global Step: 162720 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:37:11,015-Speed 2514.48 samples/sec Loss 4.2953 LearningRate 0.000798 Epoch: 7 Global Step: 162730 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:37:19,216-Speed 2497.61 samples/sec Loss 4.2258 
LearningRate 0.000798 Epoch: 7 Global Step: 162740 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:37:27,414-Speed 2498.55 samples/sec Loss 4.2988 LearningRate 0.000798 Epoch: 7 Global Step: 162750 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:37:35,613-Speed 2498.15 samples/sec Loss 4.2078 LearningRate 0.000798 Epoch: 7 Global Step: 162760 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:37:43,816-Speed 2497.16 samples/sec Loss 4.2110 LearningRate 0.000798 Epoch: 7 Global Step: 162770 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:37:52,013-Speed 2498.87 samples/sec Loss 4.2085 LearningRate 0.000798 Epoch: 7 Global Step: 162780 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:38:00,157-Speed 2514.94 samples/sec Loss 4.3669 LearningRate 0.000798 Epoch: 7 Global Step: 162790 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:38:08,350-Speed 2499.94 samples/sec Loss 4.2912 LearningRate 0.000798 Epoch: 7 Global Step: 162800 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:38:16,551-Speed 2497.85 samples/sec Loss 4.2932 LearningRate 0.000798 Epoch: 7 Global Step: 162810 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:38:24,748-Speed 2498.81 samples/sec Loss 4.3086 LearningRate 0.000798 Epoch: 7 Global Step: 162820 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:38:32,945-Speed 2499.08 samples/sec Loss 4.2949 LearningRate 0.000797 Epoch: 7 Global Step: 162830 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:38:41,145-Speed 2497.98 samples/sec Loss 4.2651 LearningRate 0.000797 Epoch: 7 Global Step: 162840 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:38:49,290-Speed 2515.95 samples/sec Loss 4.2050 LearningRate 0.000797 Epoch: 7 Global Step: 162850 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:38:57,486-Speed 2499.41 samples/sec Loss 4.2484 
LearningRate 0.000797 Epoch: 7 Global Step: 162860 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:39:05,686-Speed 2498.20 samples/sec Loss 4.2550 LearningRate 0.000797 Epoch: 7 Global Step: 162870 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:39:13,899-Speed 2493.77 samples/sec Loss 4.2756 LearningRate 0.000797 Epoch: 7 Global Step: 162880 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:39:22,099-Speed 2498.58 samples/sec Loss 4.2560 LearningRate 0.000797 Epoch: 7 Global Step: 162890 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:39:30,299-Speed 2497.68 samples/sec Loss 4.2827 LearningRate 0.000797 Epoch: 7 Global Step: 162900 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:39:38,459-Speed 2510.47 samples/sec Loss 4.2083 LearningRate 0.000797 Epoch: 7 Global Step: 162910 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:39:46,655-Speed 2499.24 samples/sec Loss 4.3238 LearningRate 0.000797 Epoch: 7 Global Step: 162920 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:39:54,855-Speed 2497.80 samples/sec Loss 4.3052 LearningRate 0.000797 Epoch: 7 Global Step: 162930 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:40:03,064-Speed 2495.38 samples/sec Loss 4.3398 LearningRate 0.000797 Epoch: 7 Global Step: 162940 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:40:11,266-Speed 2497.24 samples/sec Loss 4.4071 LearningRate 0.000797 Epoch: 7 Global Step: 162950 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:40:19,466-Speed 2498.21 samples/sec Loss 4.3385 LearningRate 0.000797 Epoch: 7 Global Step: 162960 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:40:27,608-Speed 2515.54 samples/sec Loss 4.3369 LearningRate 0.000797 Epoch: 7 Global Step: 162970 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:40:35,802-Speed 2500.17 samples/sec Loss 4.3891 
LearningRate 0.000797 Epoch: 7 Global Step: 162980 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:40:43,998-Speed 2499.38 samples/sec Loss 4.2926 LearningRate 0.000797 Epoch: 7 Global Step: 162990 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:40:52,194-Speed 2498.93 samples/sec Loss 4.3423 LearningRate 0.000797 Epoch: 7 Global Step: 163000 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:41:00,391-Speed 2498.93 samples/sec Loss 4.4133 LearningRate 0.000797 Epoch: 7 Global Step: 163010 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:41:08,592-Speed 2497.63 samples/sec Loss 4.3689 LearningRate 0.000797 Epoch: 7 Global Step: 163020 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:41:16,735-Speed 2515.70 samples/sec Loss 4.3368 LearningRate 0.000797 Epoch: 7 Global Step: 163030 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:41:24,937-Speed 2497.31 samples/sec Loss 4.3483 LearningRate 0.000797 Epoch: 7 Global Step: 163040 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:41:33,132-Speed 2499.61 samples/sec Loss 4.2738 LearningRate 0.000797 Epoch: 7 Global Step: 163050 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:41:41,340-Speed 2495.50 samples/sec Loss 4.3381 LearningRate 0.000797 Epoch: 7 Global Step: 163060 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:41:49,549-Speed 2495.24 samples/sec Loss 4.2590 LearningRate 0.000797 Epoch: 7 Global Step: 163070 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:41:57,751-Speed 2497.61 samples/sec Loss 4.3319 LearningRate 0.000797 Epoch: 7 Global Step: 163080 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:42:05,897-Speed 2514.69 samples/sec Loss 4.2066 LearningRate 0.000797 Epoch: 7 Global Step: 163090 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:42:14,093-Speed 2499.20 samples/sec Loss 4.2762 
LearningRate 0.000797 Epoch: 7 Global Step: 163100 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:42:22,301-Speed 2495.28 samples/sec Loss 4.2656 LearningRate 0.000797 Epoch: 7 Global Step: 163110 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:42:30,500-Speed 2498.42 samples/sec Loss 4.2602 LearningRate 0.000797 Epoch: 7 Global Step: 163120 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:42:38,698-Speed 2498.54 samples/sec Loss 4.2743 LearningRate 0.000797 Epoch: 7 Global Step: 163130 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:42:46,897-Speed 2498.15 samples/sec Loss 4.2441 LearningRate 0.000797 Epoch: 7 Global Step: 163140 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:42:55,060-Speed 2509.43 samples/sec Loss 4.2310 LearningRate 0.000797 Epoch: 7 Global Step: 163150 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:43:03,258-Speed 2498.60 samples/sec Loss 4.2855 LearningRate 0.000797 Epoch: 7 Global Step: 163160 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:43:11,457-Speed 2498.20 samples/sec Loss 4.2211 LearningRate 0.000797 Epoch: 7 Global Step: 163170 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:43:19,657-Speed 2498.27 samples/sec Loss 4.1541 LearningRate 0.000797 Epoch: 7 Global Step: 163180 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:43:27,856-Speed 2498.29 samples/sec Loss 4.2624 LearningRate 0.000797 Epoch: 7 Global Step: 163190 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:43:36,062-Speed 2495.93 samples/sec Loss 4.2715 LearningRate 0.000797 Epoch: 7 Global Step: 163200 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:43:44,224-Speed 2509.87 samples/sec Loss 4.2573 LearningRate 0.000797 Epoch: 7 Global Step: 163210 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:43:52,423-Speed 2498.11 samples/sec Loss 4.2313 
LearningRate 0.000797 Epoch: 7 Global Step: 163220 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:44:00,626-Speed 2497.14 samples/sec Loss 4.3827 LearningRate 0.000797 Epoch: 7 Global Step: 163230 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:44:08,831-Speed 2496.46 samples/sec Loss 4.3760 LearningRate 0.000797 Epoch: 7 Global Step: 163240 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:44:17,036-Speed 2496.32 samples/sec Loss 4.2300 LearningRate 0.000796 Epoch: 7 Global Step: 163250 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:44:25,237-Speed 2497.92 samples/sec Loss 4.3478 LearningRate 0.000796 Epoch: 7 Global Step: 163260 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:44:33,384-Speed 2514.11 samples/sec Loss 4.3754 LearningRate 0.000796 Epoch: 7 Global Step: 163270 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:44:41,582-Speed 2498.66 samples/sec Loss 4.2400 LearningRate 0.000796 Epoch: 7 Global Step: 163280 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:44:49,789-Speed 2495.92 samples/sec Loss 4.2050 LearningRate 0.000796 Epoch: 7 Global Step: 163290 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:44:57,997-Speed 2495.36 samples/sec Loss 4.2613 LearningRate 0.000796 Epoch: 7 Global Step: 163300 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:45:06,195-Speed 2498.65 samples/sec Loss 4.2897 LearningRate 0.000796 Epoch: 7 Global Step: 163310 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:45:14,392-Speed 2498.92 samples/sec Loss 4.2906 LearningRate 0.000796 Epoch: 7 Global Step: 163320 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:45:22,550-Speed 2510.81 samples/sec Loss 4.2373 LearningRate 0.000796 Epoch: 7 Global Step: 163330 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:45:30,752-Speed 2497.53 samples/sec Loss 4.2812 
LearningRate 0.000796 Epoch: 7 Global Step: 163340 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:45:38,949-Speed 2499.06 samples/sec Loss 4.2095 LearningRate 0.000796 Epoch: 7 Global Step: 163350 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:45:47,147-Speed 2498.51 samples/sec Loss 4.3039 LearningRate 0.000796 Epoch: 7 Global Step: 163360 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:45:55,346-Speed 2498.35 samples/sec Loss 4.2674 LearningRate 0.000796 Epoch: 7 Global Step: 163370 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:46:03,562-Speed 2493.14 samples/sec Loss 4.2914 LearningRate 0.000796 Epoch: 7 Global Step: 163380 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:46:11,707-Speed 2514.64 samples/sec Loss 4.2041 LearningRate 0.000796 Epoch: 7 Global Step: 163390 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:46:19,902-Speed 2499.53 samples/sec Loss 4.2195 LearningRate 0.000796 Epoch: 7 Global Step: 163400 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:46:28,097-Speed 2499.45 samples/sec Loss 4.2622 LearningRate 0.000796 Epoch: 7 Global Step: 163410 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:46:36,305-Speed 2495.48 samples/sec Loss 4.1542 LearningRate 0.000796 Epoch: 7 Global Step: 163420 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:46:44,501-Speed 2499.19 samples/sec Loss 4.2286 LearningRate 0.000796 Epoch: 7 Global Step: 163430 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:46:52,699-Speed 2498.46 samples/sec Loss 4.2501 LearningRate 0.000796 Epoch: 7 Global Step: 163440 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:47:00,847-Speed 2514.18 samples/sec Loss 4.2617 LearningRate 0.000796 Epoch: 7 Global Step: 163450 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:47:09,048-Speed 2497.77 samples/sec Loss 4.2918 
LearningRate 0.000796 Epoch: 7 Global Step: 163460 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:47:17,257-Speed 2495.06 samples/sec Loss 4.3256 LearningRate 0.000796 Epoch: 7 Global Step: 163470 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:47:25,452-Speed 2499.41 samples/sec Loss 4.3551 LearningRate 0.000796 Epoch: 7 Global Step: 163480 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:47:33,653-Speed 2497.68 samples/sec Loss 4.2579 LearningRate 0.000796 Epoch: 7 Global Step: 163490 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:47:41,850-Speed 2498.98 samples/sec Loss 4.3036 LearningRate 0.000796 Epoch: 7 Global Step: 163500 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:47:49,993-Speed 2515.38 samples/sec Loss 4.2556 LearningRate 0.000796 Epoch: 7 Global Step: 163510 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:47:58,193-Speed 2498.16 samples/sec Loss 4.3377 LearningRate 0.000796 Epoch: 7 Global Step: 163520 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:48:06,396-Speed 2497.00 samples/sec Loss 4.2424 LearningRate 0.000796 Epoch: 7 Global Step: 163530 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:48:14,596-Speed 2498.01 samples/sec Loss 4.3170 LearningRate 0.000796 Epoch: 7 Global Step: 163540 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:48:22,795-Speed 2498.39 samples/sec Loss 4.3207 LearningRate 0.000796 Epoch: 7 Global Step: 163550 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:48:30,991-Speed 2499.15 samples/sec Loss 4.2537 LearningRate 0.000796 Epoch: 7 Global Step: 163560 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:48:39,137-Speed 2514.51 samples/sec Loss 4.2855 LearningRate 0.000796 Epoch: 7 Global Step: 163570 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:48:47,337-Speed 2498.20 samples/sec Loss 4.2997 
LearningRate 0.000796 Epoch: 7 Global Step: 163580 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:48:55,536-Speed 2498.44 samples/sec Loss 4.2882 LearningRate 0.000796 Epoch: 7 Global Step: 163590 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:49:03,733-Speed 2498.69 samples/sec Loss 4.3236 LearningRate 0.000796 Epoch: 7 Global Step: 163600 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:49:11,933-Speed 2498.03 samples/sec Loss 4.3338 LearningRate 0.000796 Epoch: 7 Global Step: 163610 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:49:20,135-Speed 2497.45 samples/sec Loss 4.2761 LearningRate 0.000796 Epoch: 7 Global Step: 163620 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:49:28,277-Speed 2515.92 samples/sec Loss 4.2610 LearningRate 0.000796 Epoch: 7 Global Step: 163630 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:49:36,471-Speed 2499.79 samples/sec Loss 4.2859 LearningRate 0.000796 Epoch: 7 Global Step: 163640 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:49:44,673-Speed 2497.13 samples/sec Loss 4.2359 LearningRate 0.000796 Epoch: 7 Global Step: 163650 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:49:52,871-Speed 2498.88 samples/sec Loss 4.2180 LearningRate 0.000795 Epoch: 7 Global Step: 163660 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:50:01,066-Speed 2499.54 samples/sec Loss 4.2653 LearningRate 0.000795 Epoch: 7 Global Step: 163670 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:50:09,276-Speed 2494.75 samples/sec Loss 4.2187 LearningRate 0.000795 Epoch: 7 Global Step: 163680 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:50:17,426-Speed 2513.48 samples/sec Loss 4.2406 LearningRate 0.000795 Epoch: 7 Global Step: 163690 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:50:25,624-Speed 2498.44 samples/sec Loss 4.2861 
LearningRate 0.000795 Epoch: 7 Global Step: 163700 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:50:33,820-Speed 2499.25 samples/sec Loss 4.2252 LearningRate 0.000795 Epoch: 7 Global Step: 163710 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:50:42,018-Speed 2498.66 samples/sec Loss 4.2754 LearningRate 0.000795 Epoch: 7 Global Step: 163720 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:50:50,229-Speed 2494.65 samples/sec Loss 4.2408 LearningRate 0.000795 Epoch: 7 Global Step: 163730 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:50:58,428-Speed 2498.26 samples/sec Loss 4.2845 LearningRate 0.000795 Epoch: 7 Global Step: 163740 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:51:06,581-Speed 2512.24 samples/sec Loss 4.2916 LearningRate 0.000795 Epoch: 7 Global Step: 163750 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:51:14,777-Speed 2499.49 samples/sec Loss 4.1971 LearningRate 0.000795 Epoch: 7 Global Step: 163760 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:51:22,974-Speed 2498.94 samples/sec Loss 4.1607 LearningRate 0.000795 Epoch: 7 Global Step: 163770 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:51:31,175-Speed 2497.45 samples/sec Loss 4.2114 LearningRate 0.000795 Epoch: 7 Global Step: 163780 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:51:39,373-Speed 2498.88 samples/sec Loss 4.2186 LearningRate 0.000795 Epoch: 7 Global Step: 163790 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:51:47,575-Speed 2497.28 samples/sec Loss 4.2214 LearningRate 0.000795 Epoch: 7 Global Step: 163800 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:51:55,722-Speed 2514.19 samples/sec Loss 4.2325 LearningRate 0.000795 Epoch: 7 Global Step: 163810 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:52:03,929-Speed 2495.70 samples/sec Loss 4.3070 
LearningRate 0.000795 Epoch: 7 Global Step: 163820 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:52:12,129-Speed 2498.06 samples/sec Loss 4.2337 LearningRate 0.000795 Epoch: 7 Global Step: 163830 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:52:20,326-Speed 2498.90 samples/sec Loss 4.1497 LearningRate 0.000795 Epoch: 7 Global Step: 163840 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:52:28,521-Speed 2499.55 samples/sec Loss 4.2769 LearningRate 0.000795 Epoch: 7 Global Step: 163850 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:52:36,722-Speed 2497.90 samples/sec Loss 4.2888 LearningRate 0.000795 Epoch: 7 Global Step: 163860 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:52:44,874-Speed 2512.82 samples/sec Loss 4.2564 LearningRate 0.000795 Epoch: 7 Global Step: 163870 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:52:53,071-Speed 2498.93 samples/sec Loss 4.2129 LearningRate 0.000795 Epoch: 7 Global Step: 163880 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:53:01,270-Speed 2498.58 samples/sec Loss 4.2705 LearningRate 0.000795 Epoch: 7 Global Step: 163890 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:53:09,465-Speed 2499.36 samples/sec Loss 4.3058 LearningRate 0.000795 Epoch: 7 Global Step: 163900 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:53:17,662-Speed 2499.11 samples/sec Loss 4.2072 LearningRate 0.000795 Epoch: 7 Global Step: 163910 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 03:53:25,817-Speed 2511.65 samples/sec Loss 4.1578 LearningRate 0.000795 Epoch: 7 Global Step: 163920 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:53:33,959-Speed 2515.67 samples/sec Loss 4.2273 LearningRate 0.000795 Epoch: 7 Global Step: 163930 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:53:42,157-Speed 2498.75 samples/sec Loss 4.2521 
LearningRate 0.000795 Epoch: 7 Global Step: 163940 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:53:50,356-Speed 2498.47 samples/sec Loss 4.1932 LearningRate 0.000795 Epoch: 7 Global Step: 163950 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:53:58,563-Speed 2495.71 samples/sec Loss 4.2240 LearningRate 0.000795 Epoch: 7 Global Step: 163960 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:54:06,765-Speed 2497.38 samples/sec Loss 4.2579 LearningRate 0.000795 Epoch: 7 Global Step: 163970 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:54:14,963-Speed 2498.73 samples/sec Loss 4.1693 LearningRate 0.000795 Epoch: 7 Global Step: 163980 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:54:23,110-Speed 2514.26 samples/sec Loss 4.1516 LearningRate 0.000795 Epoch: 7 Global Step: 163990 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:54:31,309-Speed 2498.10 samples/sec Loss 4.2119 LearningRate 0.000795 Epoch: 7 Global Step: 164000 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:54:39,508-Speed 2498.15 samples/sec Loss 4.2887 LearningRate 0.000795 Epoch: 7 Global Step: 164010 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:54:47,708-Speed 2498.42 samples/sec Loss 4.2640 LearningRate 0.000795 Epoch: 7 Global Step: 164020 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:54:55,910-Speed 2497.44 samples/sec Loss 4.3302 LearningRate 0.000795 Epoch: 7 Global Step: 164030 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:55:04,145-Speed 2487.48 samples/sec Loss 4.3238 LearningRate 0.000795 Epoch: 7 Global Step: 164040 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:55:12,294-Speed 2513.50 samples/sec Loss 4.2856 LearningRate 0.000795 Epoch: 7 Global Step: 164050 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:55:20,496-Speed 2497.49 samples/sec Loss 4.1764 
LearningRate 0.000795 Epoch: 7 Global Step: 164060 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:55:28,712-Speed 2493.13 samples/sec Loss 4.1926 LearningRate 0.000795 Epoch: 7 Global Step: 164070 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:55:36,911-Speed 2498.15 samples/sec Loss 4.1993 LearningRate 0.000794 Epoch: 7 Global Step: 164080 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:55:45,110-Speed 2498.77 samples/sec Loss 4.2606 LearningRate 0.000794 Epoch: 7 Global Step: 164090 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:55:53,309-Speed 2498.26 samples/sec Loss 4.1804 LearningRate 0.000794 Epoch: 7 Global Step: 164100 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:56:01,453-Speed 2515.20 samples/sec Loss 4.1915 LearningRate 0.000794 Epoch: 7 Global Step: 164110 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:56:09,657-Speed 2496.71 samples/sec Loss 4.2429 LearningRate 0.000794 Epoch: 7 Global Step: 164120 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:56:17,856-Speed 2498.25 samples/sec Loss 4.2461 LearningRate 0.000794 Epoch: 7 Global Step: 164130 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:56:26,052-Speed 2499.29 samples/sec Loss 4.2477 LearningRate 0.000794 Epoch: 7 Global Step: 164140 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:56:34,247-Speed 2499.46 samples/sec Loss 4.2261 LearningRate 0.000794 Epoch: 7 Global Step: 164150 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:56:42,443-Speed 2499.07 samples/sec Loss 4.2108 LearningRate 0.000794 Epoch: 7 Global Step: 164160 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:56:50,583-Speed 2516.40 samples/sec Loss 4.3115 LearningRate 0.000794 Epoch: 7 Global Step: 164170 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:56:58,795-Speed 2494.21 samples/sec Loss 4.1936 
LearningRate 0.000794 Epoch: 7 Global Step: 164180 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:57:06,993-Speed 2498.67 samples/sec Loss 4.2407 LearningRate 0.000794 Epoch: 7 Global Step: 164190 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:57:15,192-Speed 2498.12 samples/sec Loss 4.2066 LearningRate 0.000794 Epoch: 7 Global Step: 164200 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:57:23,400-Speed 2495.79 samples/sec Loss 4.1601 LearningRate 0.000794 Epoch: 7 Global Step: 164210 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:57:31,597-Speed 2498.65 samples/sec Loss 4.2458 LearningRate 0.000794 Epoch: 7 Global Step: 164220 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:57:39,746-Speed 2513.65 samples/sec Loss 4.3632 LearningRate 0.000794 Epoch: 7 Global Step: 164230 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:57:47,941-Speed 2499.44 samples/sec Loss 4.2822 LearningRate 0.000794 Epoch: 7 Global Step: 164240 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:57:56,153-Speed 2494.34 samples/sec Loss 4.2555 LearningRate 0.000794 Epoch: 7 Global Step: 164250 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:58:04,349-Speed 2499.04 samples/sec Loss 4.2256 LearningRate 0.000794 Epoch: 7 Global Step: 164260 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:58:12,547-Speed 2498.58 samples/sec Loss 4.2094 LearningRate 0.000794 Epoch: 7 Global Step: 164270 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:58:20,742-Speed 2499.47 samples/sec Loss 4.2423 LearningRate 0.000794 Epoch: 7 Global Step: 164280 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:58:28,899-Speed 2511.28 samples/sec Loss 4.2099 LearningRate 0.000794 Epoch: 7 Global Step: 164290 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:58:37,103-Speed 2496.86 samples/sec Loss 4.3206 
LearningRate 0.000794 Epoch: 7 Global Step: 164300 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:58:45,310-Speed 2495.53 samples/sec Loss 4.3361 LearningRate 0.000794 Epoch: 7 Global Step: 164310 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:58:53,505-Speed 2499.67 samples/sec Loss 4.2286 LearningRate 0.000794 Epoch: 7 Global Step: 164320 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:59:01,702-Speed 2498.93 samples/sec Loss 4.2809 LearningRate 0.000794 Epoch: 7 Global Step: 164330 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:59:09,902-Speed 2497.95 samples/sec Loss 4.2943 LearningRate 0.000794 Epoch: 7 Global Step: 164340 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:59:18,053-Speed 2513.04 samples/sec Loss 4.2793 LearningRate 0.000794 Epoch: 7 Global Step: 164350 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:59:26,250-Speed 2498.86 samples/sec Loss 4.2841 LearningRate 0.000794 Epoch: 7 Global Step: 164360 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:59:34,447-Speed 2498.74 samples/sec Loss 4.2308 LearningRate 0.000794 Epoch: 7 Global Step: 164370 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:59:42,669-Speed 2491.30 samples/sec Loss 4.2600 LearningRate 0.000794 Epoch: 7 Global Step: 164380 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:59:50,868-Speed 2498.43 samples/sec Loss 4.2620 LearningRate 0.000794 Epoch: 7 Global Step: 164390 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 03:59:59,071-Speed 2496.98 samples/sec Loss 4.2436 LearningRate 0.000794 Epoch: 7 Global Step: 164400 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:00:07,212-Speed 2516.16 samples/sec Loss 4.2206 LearningRate 0.000794 Epoch: 7 Global Step: 164410 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:00:15,412-Speed 2497.99 samples/sec Loss 4.2034 
LearningRate 0.000794 Epoch: 7 Global Step: 164420 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:00:23,611-Speed 2498.04 samples/sec Loss 4.2091 LearningRate 0.000794 Epoch: 7 Global Step: 164430 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:00:31,809-Speed 2498.62 samples/sec Loss 4.2204 LearningRate 0.000794 Epoch: 7 Global Step: 164440 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:00:40,013-Speed 2497.01 samples/sec Loss 4.2140 LearningRate 0.000794 Epoch: 7 Global Step: 164450 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:00:48,221-Speed 2495.29 samples/sec Loss 4.2044 LearningRate 0.000794 Epoch: 7 Global Step: 164460 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:00:56,369-Speed 2513.91 samples/sec Loss 4.2385 LearningRate 0.000794 Epoch: 7 Global Step: 164470 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:01:04,571-Speed 2497.36 samples/sec Loss 4.2050 LearningRate 0.000794 Epoch: 7 Global Step: 164480 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:01:12,768-Speed 2498.81 samples/sec Loss 4.2530 LearningRate 0.000794 Epoch: 7 Global Step: 164490 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:01:20,981-Speed 2493.94 samples/sec Loss 4.2353 LearningRate 0.000793 Epoch: 7 Global Step: 164500 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:01:29,182-Speed 2498.16 samples/sec Loss 4.2535 LearningRate 0.000793 Epoch: 7 Global Step: 164510 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:01:37,377-Speed 2499.98 samples/sec Loss 4.2372 LearningRate 0.000793 Epoch: 7 Global Step: 164520 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:01:45,527-Speed 2513.39 samples/sec Loss 4.2961 LearningRate 0.000793 Epoch: 7 Global Step: 164530 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:01:53,725-Speed 2498.47 samples/sec Loss 4.2841 
LearningRate 0.000793 Epoch: 7 Global Step: 164540 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:02:01,924-Speed 2498.55 samples/sec Loss 4.2380 LearningRate 0.000793 Epoch: 7 Global Step: 164550 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:02:10,119-Speed 2499.36 samples/sec Loss 4.2690 LearningRate 0.000793 Epoch: 7 Global Step: 164560 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:02:18,319-Speed 2498.06 samples/sec Loss 4.1577 LearningRate 0.000793 Epoch: 7 Global Step: 164570 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:02:26,519-Speed 2497.87 samples/sec Loss 4.2857 LearningRate 0.000793 Epoch: 7 Global Step: 164580 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:02:34,681-Speed 2509.62 samples/sec Loss 4.2876 LearningRate 0.000793 Epoch: 7 Global Step: 164590 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:02:42,879-Speed 2498.56 samples/sec Loss 4.2760 LearningRate 0.000793 Epoch: 7 Global Step: 164600 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:02:51,078-Speed 2498.30 samples/sec Loss 4.2843 LearningRate 0.000793 Epoch: 7 Global Step: 164610 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:02:59,333-Speed 2481.25 samples/sec Loss 4.2110 LearningRate 0.000793 Epoch: 7 Global Step: 164620 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:03:07,533-Speed 2498.16 samples/sec Loss 4.1576 LearningRate 0.000793 Epoch: 7 Global Step: 164630 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:03:15,732-Speed 2498.56 samples/sec Loss 4.2404 LearningRate 0.000793 Epoch: 7 Global Step: 164640 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:03:23,884-Speed 2512.57 samples/sec Loss 4.2990 LearningRate 0.000793 Epoch: 7 Global Step: 164650 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:03:32,086-Speed 2497.52 samples/sec Loss 4.2417 
LearningRate 0.000793 Epoch: 7 Global Step: 164660 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:03:40,286-Speed 2497.99 samples/sec Loss 4.1922 LearningRate 0.000793 Epoch: 7 Global Step: 164670 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:03:48,493-Speed 2495.88 samples/sec Loss 4.2712 LearningRate 0.000793 Epoch: 7 Global Step: 164680 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:03:56,689-Speed 2498.74 samples/sec Loss 4.2438 LearningRate 0.000793 Epoch: 7 Global Step: 164690 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:04:04,888-Speed 2498.35 samples/sec Loss 4.2471 LearningRate 0.000793 Epoch: 7 Global Step: 164700 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:04:13,041-Speed 2512.67 samples/sec Loss 4.1934 LearningRate 0.000793 Epoch: 7 Global Step: 164710 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:04:21,253-Speed 2494.27 samples/sec Loss 4.2446 LearningRate 0.000793 Epoch: 7 Global Step: 164720 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:04:29,458-Speed 2496.69 samples/sec Loss 4.2175 LearningRate 0.000793 Epoch: 7 Global Step: 164730 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:04:37,656-Speed 2498.38 samples/sec Loss 4.1660 LearningRate 0.000793 Epoch: 7 Global Step: 164740 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:04:45,857-Speed 2497.72 samples/sec Loss 4.1950 LearningRate 0.000793 Epoch: 7 Global Step: 164750 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:04:54,057-Speed 2498.05 samples/sec Loss 4.1806 LearningRate 0.000793 Epoch: 7 Global Step: 164760 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:05:02,204-Speed 2513.99 samples/sec Loss 4.1761 LearningRate 0.000793 Epoch: 7 Global Step: 164770 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:05:10,404-Speed 2498.08 samples/sec Loss 4.2578 
LearningRate 0.000793 Epoch: 7 Global Step: 164780 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:05:18,602-Speed 2498.86 samples/sec Loss 4.2337 LearningRate 0.000793 Epoch: 7 Global Step: 164790 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:05:26,798-Speed 2498.90 samples/sec Loss 4.2457 LearningRate 0.000793 Epoch: 7 Global Step: 164800 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:05:34,999-Speed 2497.83 samples/sec Loss 4.3437 LearningRate 0.000793 Epoch: 7 Global Step: 164810 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:05:43,197-Speed 2498.46 samples/sec Loss 4.2670 LearningRate 0.000793 Epoch: 7 Global Step: 164820 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:05:51,342-Speed 2514.94 samples/sec Loss 4.2023 LearningRate 0.000793 Epoch: 7 Global Step: 164830 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:05:59,541-Speed 2498.12 samples/sec Loss 4.2303 LearningRate 0.000793 Epoch: 7 Global Step: 164840 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:06:07,751-Speed 2495.10 samples/sec Loss 4.2844 LearningRate 0.000793 Epoch: 7 Global Step: 164850 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:06:15,946-Speed 2499.43 samples/sec Loss 4.2291 LearningRate 0.000793 Epoch: 7 Global Step: 164860 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:06:24,144-Speed 2498.98 samples/sec Loss 4.2541 LearningRate 0.000793 Epoch: 7 Global Step: 164870 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:06:32,356-Speed 2494.19 samples/sec Loss 4.1973 LearningRate 0.000793 Epoch: 7 Global Step: 164880 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:06:40,500-Speed 2515.18 samples/sec Loss 4.3931 LearningRate 0.000793 Epoch: 7 Global Step: 164890 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:06:48,698-Speed 2498.56 samples/sec Loss 4.3135 
LearningRate 0.000793 Epoch: 7 Global Step: 164900 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:06:56,897-Speed 2498.68 samples/sec Loss 4.2395 LearningRate 0.000793 Epoch: 7 Global Step: 164910 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:07:05,095-Speed 2498.44 samples/sec Loss 4.4378 LearningRate 0.000792 Epoch: 7 Global Step: 164920 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:07:13,290-Speed 2499.55 samples/sec Loss 4.3139 LearningRate 0.000792 Epoch: 7 Global Step: 164930 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:07:21,487-Speed 2499.03 samples/sec Loss 4.2457 LearningRate 0.000792 Epoch: 7 Global Step: 164940 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:07:29,631-Speed 2515.02 samples/sec Loss 4.2280 LearningRate 0.000792 Epoch: 7 Global Step: 164950 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:07:37,831-Speed 2498.05 samples/sec Loss 4.2306 LearningRate 0.000792 Epoch: 7 Global Step: 164960 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:07:46,030-Speed 2498.04 samples/sec Loss 4.2688 LearningRate 0.000792 Epoch: 7 Global Step: 164970 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:07:54,234-Speed 2496.94 samples/sec Loss 4.2483 LearningRate 0.000792 Epoch: 7 Global Step: 164980 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:08:02,429-Speed 2499.56 samples/sec Loss 4.2657 LearningRate 0.000792 Epoch: 7 Global Step: 164990 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:08:10,631-Speed 2497.17 samples/sec Loss 4.2114 LearningRate 0.000792 Epoch: 7 Global Step: 165000 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:08:19,313-Speed 2517.02 samples/sec Loss 4.1567 LearningRate 0.000792 Epoch: 7 Global Step: 165010 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:08:27,511-Speed 2498.52 samples/sec Loss 4.1629 
LearningRate 0.000792 Epoch: 7 Global Step: 165020 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:08:35,706-Speed 2499.54 samples/sec Loss 4.1505 LearningRate 0.000792 Epoch: 7 Global Step: 165030 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:08:43,903-Speed 2498.92 samples/sec Loss 4.1924 LearningRate 0.000792 Epoch: 7 Global Step: 165040 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:08:52,103-Speed 2498.07 samples/sec Loss 4.2394 LearningRate 0.000792 Epoch: 7 Global Step: 165050 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:09:00,305-Speed 2497.25 samples/sec Loss 4.2439 LearningRate 0.000792 Epoch: 7 Global Step: 165060 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:09:08,452-Speed 2514.27 samples/sec Loss 4.1355 LearningRate 0.000792 Epoch: 7 Global Step: 165070 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:09:16,654-Speed 2497.39 samples/sec Loss 4.2129 LearningRate 0.000792 Epoch: 7 Global Step: 165080 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:09:24,851-Speed 2498.77 samples/sec Loss 4.3429 LearningRate 0.000792 Epoch: 7 Global Step: 165090 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:09:33,055-Speed 2496.78 samples/sec Loss 4.2934 LearningRate 0.000792 Epoch: 7 Global Step: 165100 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:09:41,253-Speed 2498.41 samples/sec Loss 4.2848 LearningRate 0.000792 Epoch: 7 Global Step: 165110 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:09:49,453-Speed 2498.09 samples/sec Loss 4.1905 LearningRate 0.000792 Epoch: 7 Global Step: 165120 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:09:57,605-Speed 2512.78 samples/sec Loss 4.1618 LearningRate 0.000792 Epoch: 7 Global Step: 165130 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:10:05,825-Speed 2491.80 samples/sec Loss 4.2911 
LearningRate 0.000792 Epoch: 7 Global Step: 165140 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:10:14,025-Speed 2497.83 samples/sec Loss 4.2856 LearningRate 0.000792 Epoch: 7 Global Step: 165150 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:10:22,230-Speed 2496.68 samples/sec Loss 4.1827 LearningRate 0.000792 Epoch: 7 Global Step: 165160 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:10:30,426-Speed 2498.99 samples/sec Loss 4.2610 LearningRate 0.000792 Epoch: 7 Global Step: 165170 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:10:38,623-Speed 2498.75 samples/sec Loss 4.1564 LearningRate 0.000792 Epoch: 7 Global Step: 165180 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:10:46,775-Speed 2512.91 samples/sec Loss 4.2158 LearningRate 0.000792 Epoch: 7 Global Step: 165190 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:10:54,976-Speed 2498.34 samples/sec Loss 4.2460 LearningRate 0.000792 Epoch: 7 Global Step: 165200 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:11:03,179-Speed 2497.05 samples/sec Loss 4.1835 LearningRate 0.000792 Epoch: 7 Global Step: 165210 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:11:11,377-Speed 2498.66 samples/sec Loss 4.2973 LearningRate 0.000792 Epoch: 7 Global Step: 165220 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:11:19,573-Speed 2499.15 samples/sec Loss 4.2941 LearningRate 0.000792 Epoch: 7 Global Step: 165230 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:11:27,770-Speed 2498.94 samples/sec Loss 4.1274 LearningRate 0.000792 Epoch: 7 Global Step: 165240 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:11:35,924-Speed 2512.45 samples/sec Loss 4.2745 LearningRate 0.000792 Epoch: 7 Global Step: 165250 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:11:44,122-Speed 2498.40 samples/sec Loss 4.2278 
LearningRate 0.000792 Epoch: 7 Global Step: 165260 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:11:52,319-Speed 2498.93 samples/sec Loss 4.2474 LearningRate 0.000792 Epoch: 7 Global Step: 165270 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:12:00,520-Speed 2497.89 samples/sec Loss 4.2194 LearningRate 0.000792 Epoch: 7 Global Step: 165280 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:12:08,713-Speed 2500.26 samples/sec Loss 4.2697 LearningRate 0.000792 Epoch: 7 Global Step: 165290 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:12:16,909-Speed 2498.96 samples/sec Loss 4.1923 LearningRate 0.000792 Epoch: 7 Global Step: 165300 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:12:25,054-Speed 2515.02 samples/sec Loss 4.3062 LearningRate 0.000792 Epoch: 7 Global Step: 165310 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:12:33,252-Speed 2498.62 samples/sec Loss 4.2854 LearningRate 0.000792 Epoch: 7 Global Step: 165320 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:12:41,451-Speed 2498.20 samples/sec Loss 4.2518 LearningRate 0.000792 Epoch: 7 Global Step: 165330 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:12:49,647-Speed 2498.99 samples/sec Loss 4.2227 LearningRate 0.000791 Epoch: 7 Global Step: 165340 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:12:57,856-Speed 2495.25 samples/sec Loss 4.2120 LearningRate 0.000791 Epoch: 7 Global Step: 165350 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:13:06,059-Speed 2497.34 samples/sec Loss 4.2011 LearningRate 0.000791 Epoch: 7 Global Step: 165360 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:13:14,204-Speed 2514.76 samples/sec Loss 4.1862 LearningRate 0.000791 Epoch: 7 Global Step: 165370 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:13:22,404-Speed 2497.85 samples/sec Loss 4.2186 
LearningRate 0.000791 Epoch: 7 Global Step: 165380 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:13:30,607-Speed 2497.16 samples/sec Loss 4.2821 LearningRate 0.000791 Epoch: 7 Global Step: 165390 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:13:38,808-Speed 2497.57 samples/sec Loss 4.3249 LearningRate 0.000791 Epoch: 7 Global Step: 165400 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:13:47,007-Speed 2498.59 samples/sec Loss 4.2081 LearningRate 0.000791 Epoch: 7 Global Step: 165410 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:13:55,211-Speed 2496.71 samples/sec Loss 4.3774 LearningRate 0.000791 Epoch: 7 Global Step: 165420 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:14:03,355-Speed 2515.17 samples/sec Loss 4.2230 LearningRate 0.000791 Epoch: 7 Global Step: 165430 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:14:11,547-Speed 2500.15 samples/sec Loss 4.2547 LearningRate 0.000791 Epoch: 7 Global Step: 165440 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:14:19,743-Speed 2499.31 samples/sec Loss 4.2570 LearningRate 0.000791 Epoch: 7 Global Step: 165450 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:14:27,942-Speed 2498.30 samples/sec Loss 4.2765 LearningRate 0.000791 Epoch: 7 Global Step: 165460 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:14:36,139-Speed 2498.72 samples/sec Loss 4.2133 LearningRate 0.000791 Epoch: 7 Global Step: 165470 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:14:44,336-Speed 2499.05 samples/sec Loss 4.2908 LearningRate 0.000791 Epoch: 7 Global Step: 165480 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:14:52,488-Speed 2512.51 samples/sec Loss 4.2275 LearningRate 0.000791 Epoch: 7 Global Step: 165490 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:15:00,686-Speed 2498.57 samples/sec Loss 4.2955 
LearningRate 0.000791 Epoch: 7 Global Step: 165500 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:15:08,883-Speed 2498.80 samples/sec Loss 4.2533 LearningRate 0.000791 Epoch: 7 Global Step: 165510 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:15:17,081-Speed 2498.78 samples/sec Loss 4.2277 LearningRate 0.000791 Epoch: 7 Global Step: 165520 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:15:25,279-Speed 2498.59 samples/sec Loss 4.2640 LearningRate 0.000791 Epoch: 7 Global Step: 165530 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:15:33,491-Speed 2494.68 samples/sec Loss 4.1827 LearningRate 0.000791 Epoch: 7 Global Step: 165540 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:15:41,637-Speed 2514.30 samples/sec Loss 4.2373 LearningRate 0.000791 Epoch: 7 Global Step: 165550 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:15:49,841-Speed 2496.85 samples/sec Loss 4.2200 LearningRate 0.000791 Epoch: 7 Global Step: 165560 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:15:58,038-Speed 2498.74 samples/sec Loss 4.2195 LearningRate 0.000791 Epoch: 7 Global Step: 165570 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:16:06,238-Speed 2498.11 samples/sec Loss 4.1859 LearningRate 0.000791 Epoch: 7 Global Step: 165580 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:16:14,435-Speed 2498.71 samples/sec Loss 4.2408 LearningRate 0.000791 Epoch: 7 Global Step: 165590 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:16:22,633-Speed 2498.69 samples/sec Loss 4.2000 LearningRate 0.000791 Epoch: 7 Global Step: 165600 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:16:30,778-Speed 2514.77 samples/sec Loss 4.1698 LearningRate 0.000791 Epoch: 7 Global Step: 165610 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:16:38,979-Speed 2497.79 samples/sec Loss 4.2732 
LearningRate 0.000791 Epoch: 7 Global Step: 165620 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:16:47,177-Speed 2498.76 samples/sec Loss 4.2403 LearningRate 0.000791 Epoch: 7 Global Step: 165630 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:16:55,377-Speed 2498.08 samples/sec Loss 4.3192 LearningRate 0.000791 Epoch: 7 Global Step: 165640 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:17:03,575-Speed 2498.50 samples/sec Loss 4.2769 LearningRate 0.000791 Epoch: 7 Global Step: 165650 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:17:11,772-Speed 2499.13 samples/sec Loss 4.2504 LearningRate 0.000791 Epoch: 7 Global Step: 165660 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:17:19,926-Speed 2511.98 samples/sec Loss 4.2407 LearningRate 0.000791 Epoch: 7 Global Step: 165670 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:17:28,125-Speed 2498.31 samples/sec Loss 4.1971 LearningRate 0.000791 Epoch: 7 Global Step: 165680 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:17:36,335-Speed 2494.82 samples/sec Loss 4.3051 LearningRate 0.000791 Epoch: 7 Global Step: 165690 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:17:44,539-Speed 2496.68 samples/sec Loss 4.2067 LearningRate 0.000791 Epoch: 7 Global Step: 165700 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:17:52,738-Speed 2498.39 samples/sec Loss 4.1645 LearningRate 0.000791 Epoch: 7 Global Step: 165710 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:18:00,937-Speed 2498.43 samples/sec Loss 4.2501 LearningRate 0.000791 Epoch: 7 Global Step: 165720 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:18:09,086-Speed 2513.65 samples/sec Loss 4.2877 LearningRate 0.000791 Epoch: 7 Global Step: 165730 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:18:17,285-Speed 2498.37 samples/sec Loss 4.2190 
LearningRate 0.000791 Epoch: 7 Global Step: 165740 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:18:25,517-Speed 2488.25 samples/sec Loss 4.2355 LearningRate 0.000791 Epoch: 7 Global Step: 165750 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:18:33,732-Speed 2493.33 samples/sec Loss 4.2142 LearningRate 0.000790 Epoch: 7 Global Step: 165760 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:18:41,933-Speed 2497.64 samples/sec Loss 4.2559 LearningRate 0.000790 Epoch: 7 Global Step: 165770 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:18:50,134-Speed 2497.64 samples/sec Loss 4.1824 LearningRate 0.000790 Epoch: 7 Global Step: 165780 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:18:58,288-Speed 2512.17 samples/sec Loss 4.2027 LearningRate 0.000790 Epoch: 7 Global Step: 165790 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:19:06,486-Speed 2498.67 samples/sec Loss 4.1839 LearningRate 0.000790 Epoch: 7 Global Step: 165800 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:19:14,692-Speed 2496.02 samples/sec Loss 4.1825 LearningRate 0.000790 Epoch: 7 Global Step: 165810 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:19:22,892-Speed 2497.90 samples/sec Loss 4.2419 LearningRate 0.000790 Epoch: 7 Global Step: 165820 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:19:31,092-Speed 2498.01 samples/sec Loss 4.1692 LearningRate 0.000790 Epoch: 7 Global Step: 165830 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:19:39,292-Speed 2498.05 samples/sec Loss 4.1892 LearningRate 0.000790 Epoch: 7 Global Step: 165840 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:19:47,441-Speed 2513.64 samples/sec Loss 4.2787 LearningRate 0.000790 Epoch: 7 Global Step: 165850 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:19:55,637-Speed 2499.03 samples/sec Loss 4.2197 
LearningRate 0.000790 Epoch: 7 Global Step: 165860 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:20:03,837-Speed 2498.10 samples/sec Loss 4.2136 LearningRate 0.000790 Epoch: 7 Global Step: 165870 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:20:12,032-Speed 2499.34 samples/sec Loss 4.2402 LearningRate 0.000790 Epoch: 7 Global Step: 165880 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:20:20,236-Speed 2496.87 samples/sec Loss 4.2328 LearningRate 0.000790 Epoch: 7 Global Step: 165890 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:20:28,437-Speed 2497.66 samples/sec Loss 4.1498 LearningRate 0.000790 Epoch: 7 Global Step: 165900 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:20:36,583-Speed 2514.54 samples/sec Loss 4.1941 LearningRate 0.000790 Epoch: 7 Global Step: 165910 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:20:46,847-Speed 1995.54 samples/sec Loss 4.1661 LearningRate 0.000790 Epoch: 8 Global Step: 165920 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:20:55,052-Speed 2496.50 samples/sec Loss 4.2110 LearningRate 0.000790 Epoch: 8 Global Step: 165930 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:21:03,246-Speed 2499.80 samples/sec Loss 4.2291 LearningRate 0.000790 Epoch: 8 Global Step: 165940 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:21:11,447-Speed 2498.11 samples/sec Loss 4.1841 LearningRate 0.000790 Epoch: 8 Global Step: 165950 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:21:19,642-Speed 2499.47 samples/sec Loss 4.1765 LearningRate 0.000790 Epoch: 8 Global Step: 165960 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:21:27,785-Speed 2515.38 samples/sec Loss 4.2058 LearningRate 0.000790 Epoch: 8 Global Step: 165970 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:21:35,992-Speed 2496.35 samples/sec Loss 4.1498 
LearningRate 0.000790 Epoch: 8 Global Step: 165980 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:21:44,191-Speed 2498.03 samples/sec Loss 4.1340 LearningRate 0.000790 Epoch: 8 Global Step: 165990 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:21:52,392-Speed 2497.90 samples/sec Loss 4.2178 LearningRate 0.000790 Epoch: 8 Global Step: 166000 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:22:00,597-Speed 2496.51 samples/sec Loss 4.2047 LearningRate 0.000790 Epoch: 8 Global Step: 166010 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:22:08,799-Speed 2497.32 samples/sec Loss 4.2481 LearningRate 0.000790 Epoch: 8 Global Step: 166020 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:22:16,947-Speed 2513.84 samples/sec Loss 4.2040 LearningRate 0.000790 Epoch: 8 Global Step: 166030 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:22:25,147-Speed 2498.08 samples/sec Loss 4.1378 LearningRate 0.000790 Epoch: 8 Global Step: 166040 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:22:33,355-Speed 2495.34 samples/sec Loss 4.1654 LearningRate 0.000790 Epoch: 8 Global Step: 166050 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:22:41,561-Speed 2496.18 samples/sec Loss 4.1641 LearningRate 0.000790 Epoch: 8 Global Step: 166060 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:22:49,759-Speed 2498.70 samples/sec Loss 4.1831 LearningRate 0.000790 Epoch: 8 Global Step: 166070 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:22:57,970-Speed 2494.76 samples/sec Loss 4.2586 LearningRate 0.000790 Epoch: 8 Global Step: 166080 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:23:06,116-Speed 2514.67 samples/sec Loss 4.1152 LearningRate 0.000790 Epoch: 8 Global Step: 166090 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:23:14,315-Speed 2498.19 samples/sec Loss 4.2165 
LearningRate 0.000790 Epoch: 8 Global Step: 166100 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:23:22,511-Speed 2499.32 samples/sec Loss 4.1904 LearningRate 0.000790 Epoch: 8 Global Step: 166110 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:23:30,707-Speed 2499.20 samples/sec Loss 4.2067 LearningRate 0.000790 Epoch: 8 Global Step: 166120 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:23:38,911-Speed 2496.85 samples/sec Loss 4.2117 LearningRate 0.000790 Epoch: 8 Global Step: 166130 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:23:47,107-Speed 2499.23 samples/sec Loss 4.1554 LearningRate 0.000790 Epoch: 8 Global Step: 166140 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:23:55,249-Speed 2515.59 samples/sec Loss 4.1989 LearningRate 0.000790 Epoch: 8 Global Step: 166150 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:24:03,446-Speed 2499.17 samples/sec Loss 4.1982 LearningRate 0.000790 Epoch: 8 Global Step: 166160 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:24:11,648-Speed 2497.50 samples/sec Loss 4.1540 LearningRate 0.000790 Epoch: 8 Global Step: 166170 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:24:19,849-Speed 2497.37 samples/sec Loss 4.1478 LearningRate 0.000789 Epoch: 8 Global Step: 166180 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:24:28,051-Speed 2497.95 samples/sec Loss 4.1608 LearningRate 0.000789 Epoch: 8 Global Step: 166190 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:24:37,331-Speed 2501.78 samples/sec Loss 4.1761 LearningRate 0.000789 Epoch: 8 Global Step: 166200 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:24:45,493-Speed 2509.40 samples/sec Loss 4.1830 LearningRate 0.000789 Epoch: 8 Global Step: 166210 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:24:53,776-Speed 2501.42 samples/sec Loss 4.1747 
LearningRate 0.000789 Epoch: 8 Global Step: 166220 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:25:01,998-Speed 2500.11 samples/sec Loss 4.2065 LearningRate 0.000789 Epoch: 8 Global Step: 166230 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:25:10,197-Speed 2498.28 samples/sec Loss 4.2222 LearningRate 0.000789 Epoch: 8 Global Step: 166240 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:25:18,444-Speed 2495.40 samples/sec Loss 4.2609 LearningRate 0.000789 Epoch: 8 Global Step: 166250 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:25:29,797-Speed 2499.86 samples/sec Loss 4.1357 LearningRate 0.000789 Epoch: 8 Global Step: 166260 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:25:37,954-Speed 2517.84 samples/sec Loss 4.2098 LearningRate 0.000789 Epoch: 8 Global Step: 166270 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:25:46,353-Speed 2438.81 samples/sec Loss 4.1211 LearningRate 0.000789 Epoch: 8 Global Step: 166280 Fp16 Grad Scale: 65536 Required: 152 hours Training: 2022-07-07 04:25:55,096-Speed 2506.72 samples/sec Loss 4.1916 LearningRate 0.000789 Epoch: 8 Global Step: 166290 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:26:03,310-Speed 2501.14 samples/sec Loss 4.1948 LearningRate 0.000789 Epoch: 8 Global Step: 166300 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:26:14,772-Speed 1798.69 samples/sec Loss 4.0961 LearningRate 0.000789 Epoch: 8 Global Step: 166310 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:26:22,975-Speed 2497.07 samples/sec Loss 4.1592 LearningRate 0.000789 Epoch: 8 Global Step: 166320 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:26:31,170-Speed 2516.54 samples/sec Loss 4.2255 LearningRate 0.000789 Epoch: 8 Global Step: 166330 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:26:39,397-Speed 2501.90 samples/sec Loss 4.1205 
LearningRate 0.000789 Epoch: 8 Global Step: 166340 Fp16 Grad Scale: 32768 Required: 152 hours Training: 2022-07-07 04:26:47,549-Speed 2512.89 samples/sec Loss 4.1677 LearningRate 0.000789 Epoch: 8 Global Step: 166350 Fp16 Grad Scale: 16384 Required: 152 hours Training: 2022-07-07 04:26:57,906-Speed 2502.80 samples/sec Loss 4.1009 LearningRate 0.000789 Epoch: 8 Global Step: 166360 Fp16 Grad Scale: 16384 Required: 152 hours Training: 2022-07-07 04:27:06,153-Speed 2502.06 samples/sec Loss 4.2235 LearningRate 0.000789 Epoch: 8 Global Step: 166370 Fp16 Grad Scale: 16384 Required: 152 hours Training: 2022-07-07 04:27:14,564-Speed 2502.49 samples/sec Loss 4.2000 LearningRate 0.000789 Epoch: 8 Global Step: 166380 Fp16 Grad Scale: 16384 Required: 152 hours Training: 2022-07-07 04:27:22,711-Speed 2514.28 samples/sec Loss 4.2341 LearningRate 0.000789 Epoch: 8 Global Step: 166390 Fp16 Grad Scale: 16384 Required: 152 hours Training: 2022-07-07 04:27:30,926-Speed 2499.57 samples/sec Loss 4.1942 LearningRate 0.000789 Epoch: 8 Global Step: 166400 Fp16 Grad Scale: 16384 Required: 152 hours Training: 2022-07-07 04:27:39,178-Speed 2495.88 samples/sec Loss 4.2572 LearningRate 0.000789 Epoch: 8 Global Step: 166410 Fp16 Grad Scale: 16384 Required: 152 hours Training: 2022-07-07 04:27:47,404-Speed 2498.98 samples/sec Loss 4.2293 LearningRate 0.000789 Epoch: 8 Global Step: 166420 Fp16 Grad Scale: 16384 Required: 152 hours Training: 2022-07-07 04:27:56,195-Speed 2329.88 samples/sec Loss 4.2253 LearningRate 0.000789 Epoch: 8 Global Step: 166430 Fp16 Grad Scale: 16384 Required: 152 hours Training: 2022-07-07 04:28:04,421-Speed 2499.75 samples/sec Loss 4.2697 LearningRate 0.000789 Epoch: 8 Global Step: 166440 Fp16 Grad Scale: 16384 Required: 152 hours Training: 2022-07-07 04:28:12,572-Speed 2516.09 samples/sec Loss 4.2118 LearningRate 0.000789 Epoch: 8 Global Step: 166450 Fp16 Grad Scale: 16384 Required: 152 hours Training: 2022-07-07 04:28:23,321-Speed 2501.02 samples/sec Loss 4.2111 
LearningRate 0.000789 Epoch: 8 Global Step: 166460 Fp16 Grad Scale: 16384 Required: 152 hours Training: 2022-07-07 04:28:31,519-Speed 2498.36 samples/sec Loss 4.2124 LearningRate 0.000789 Epoch: 8 Global Step: 166470 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:28:39,932-Speed 2434.49 samples/sec Loss 4.2025 LearningRate 0.000789 Epoch: 8 Global Step: 166480 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:28:48,133-Speed 2497.76 samples/sec Loss 4.2536 LearningRate 0.000789 Epoch: 8 Global Step: 166490 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:28:56,349-Speed 2500.64 samples/sec Loss 4.2280 LearningRate 0.000789 Epoch: 8 Global Step: 166500 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:29:04,495-Speed 2514.53 samples/sec Loss 4.1246 LearningRate 0.000789 Epoch: 8 Global Step: 166510 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:29:12,705-Speed 2499.76 samples/sec Loss 4.1880 LearningRate 0.000789 Epoch: 8 Global Step: 166520 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:29:26,729-Speed 1461.38 samples/sec Loss 4.1558 LearningRate 0.000789 Epoch: 8 Global Step: 166530 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:29:35,253-Speed 2503.74 samples/sec Loss 4.1638 LearningRate 0.000789 Epoch: 8 Global Step: 166540 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:29:43,533-Speed 2496.98 samples/sec Loss 4.2537 LearningRate 0.000789 Epoch: 8 Global Step: 166550 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:29:52,607-Speed 2502.04 samples/sec Loss 4.2206 LearningRate 0.000789 Epoch: 8 Global Step: 166560 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:30:00,748-Speed 2515.89 samples/sec Loss 4.1895 LearningRate 0.000789 Epoch: 8 Global Step: 166570 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:30:08,947-Speed 2498.45 samples/sec Loss 4.2042 
LearningRate 0.000789 Epoch: 8 Global Step: 166580 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:30:17,146-Speed 2498.78 samples/sec Loss 4.2498 LearningRate 0.000789 Epoch: 8 Global Step: 166590 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:30:25,345-Speed 2498.21 samples/sec Loss 4.2131 LearningRate 0.000788 Epoch: 8 Global Step: 166600 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:30:33,545-Speed 2498.01 samples/sec Loss 4.1728 LearningRate 0.000788 Epoch: 8 Global Step: 166610 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:30:41,751-Speed 2496.16 samples/sec Loss 4.2428 LearningRate 0.000788 Epoch: 8 Global Step: 166620 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:30:49,897-Speed 2514.72 samples/sec Loss 4.1204 LearningRate 0.000788 Epoch: 8 Global Step: 166630 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:30:58,100-Speed 2496.81 samples/sec Loss 4.1782 LearningRate 0.000788 Epoch: 8 Global Step: 166640 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:31:06,307-Speed 2495.92 samples/sec Loss 4.1567 LearningRate 0.000788 Epoch: 8 Global Step: 166650 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:31:14,507-Speed 2498.11 samples/sec Loss 4.1293 LearningRate 0.000788 Epoch: 8 Global Step: 166660 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:31:22,716-Speed 2495.28 samples/sec Loss 4.2350 LearningRate 0.000788 Epoch: 8 Global Step: 166670 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:31:30,922-Speed 2495.97 samples/sec Loss 4.2310 LearningRate 0.000788 Epoch: 8 Global Step: 166680 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:31:39,068-Speed 2514.66 samples/sec Loss 4.2618 LearningRate 0.000788 Epoch: 8 Global Step: 166690 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:31:47,268-Speed 2497.83 samples/sec Loss 4.1690 
LearningRate 0.000788 Epoch: 8 Global Step: 166700 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:31:55,464-Speed 2499.42 samples/sec Loss 4.2219 LearningRate 0.000788 Epoch: 8 Global Step: 166710 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:32:03,666-Speed 2497.40 samples/sec Loss 4.1883 LearningRate 0.000788 Epoch: 8 Global Step: 166720 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:32:11,865-Speed 2498.39 samples/sec Loss 4.1798 LearningRate 0.000788 Epoch: 8 Global Step: 166730 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:32:20,062-Speed 2498.62 samples/sec Loss 4.2006 LearningRate 0.000788 Epoch: 8 Global Step: 166740 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:32:28,212-Speed 2513.73 samples/sec Loss 4.2600 LearningRate 0.000788 Epoch: 8 Global Step: 166750 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:32:36,412-Speed 2497.83 samples/sec Loss 4.3186 LearningRate 0.000788 Epoch: 8 Global Step: 166760 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:32:44,612-Speed 2497.80 samples/sec Loss 4.2927 LearningRate 0.000788 Epoch: 8 Global Step: 166770 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:32:52,812-Speed 2498.24 samples/sec Loss 4.1632 LearningRate 0.000788 Epoch: 8 Global Step: 166780 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:33:01,011-Speed 2498.17 samples/sec Loss 4.1804 LearningRate 0.000788 Epoch: 8 Global Step: 166790 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:33:09,210-Speed 2498.16 samples/sec Loss 4.2930 LearningRate 0.000788 Epoch: 8 Global Step: 166800 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:33:17,362-Speed 2512.91 samples/sec Loss 4.1364 LearningRate 0.000788 Epoch: 8 Global Step: 166810 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:33:25,558-Speed 2498.98 samples/sec Loss 4.1518 
LearningRate 0.000788 Epoch: 8 Global Step: 166820 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:33:33,786-Speed 2489.35 samples/sec Loss 4.1985 LearningRate 0.000788 Epoch: 8 Global Step: 166830 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:33:41,988-Speed 2497.33 samples/sec Loss 4.2103 LearningRate 0.000788 Epoch: 8 Global Step: 166840 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:33:50,190-Speed 2497.49 samples/sec Loss 4.1721 LearningRate 0.000788 Epoch: 8 Global Step: 166850 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:33:58,392-Speed 2497.51 samples/sec Loss 4.2576 LearningRate 0.000788 Epoch: 8 Global Step: 166860 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:34:06,538-Speed 2514.35 samples/sec Loss 4.2631 LearningRate 0.000788 Epoch: 8 Global Step: 166870 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:34:14,739-Speed 2497.72 samples/sec Loss 4.2043 LearningRate 0.000788 Epoch: 8 Global Step: 166880 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:34:22,937-Speed 2498.40 samples/sec Loss 4.1984 LearningRate 0.000788 Epoch: 8 Global Step: 166890 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:34:31,135-Speed 2498.78 samples/sec Loss 4.1805 LearningRate 0.000788 Epoch: 8 Global Step: 166900 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:34:39,344-Speed 2495.67 samples/sec Loss 4.2456 LearningRate 0.000788 Epoch: 8 Global Step: 166910 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:34:47,543-Speed 2498.55 samples/sec Loss 4.2361 LearningRate 0.000788 Epoch: 8 Global Step: 166920 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:34:55,684-Speed 2515.83 samples/sec Loss 4.2320 LearningRate 0.000788 Epoch: 8 Global Step: 166930 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:35:03,886-Speed 2497.38 samples/sec Loss 4.2246 
LearningRate 0.000788 Epoch: 8 Global Step: 166940 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:35:12,080-Speed 2499.90 samples/sec Loss 4.1258 LearningRate 0.000788 Epoch: 8 Global Step: 166950 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:35:20,281-Speed 2497.77 samples/sec Loss 4.2041 LearningRate 0.000788 Epoch: 8 Global Step: 166960 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:35:28,492-Speed 2494.66 samples/sec Loss 4.2028 LearningRate 0.000788 Epoch: 8 Global Step: 166970 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:35:36,720-Speed 2489.31 samples/sec Loss 4.1863 LearningRate 0.000788 Epoch: 8 Global Step: 166980 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:35:44,877-Speed 2511.13 samples/sec Loss 4.2580 LearningRate 0.000788 Epoch: 8 Global Step: 166990 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:35:53,075-Speed 2498.70 samples/sec Loss 4.1528 LearningRate 0.000788 Epoch: 8 Global Step: 167000 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:36:01,269-Speed 2499.73 samples/sec Loss 4.1536 LearningRate 0.000788 Epoch: 8 Global Step: 167010 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:36:09,471-Speed 2497.59 samples/sec Loss 4.0849 LearningRate 0.000787 Epoch: 8 Global Step: 167020 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:36:17,671-Speed 2498.34 samples/sec Loss 4.1692 LearningRate 0.000787 Epoch: 8 Global Step: 167030 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:36:25,870-Speed 2498.10 samples/sec Loss 4.1698 LearningRate 0.000787 Epoch: 8 Global Step: 167040 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:36:34,016-Speed 2514.53 samples/sec Loss 4.2319 LearningRate 0.000787 Epoch: 8 Global Step: 167050 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:36:42,216-Speed 2497.65 samples/sec Loss 4.1464 
LearningRate 0.000787 Epoch: 8 Global Step: 167060 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:36:50,417-Speed 2497.78 samples/sec Loss 4.1474 LearningRate 0.000787 Epoch: 8 Global Step: 167070 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:36:58,620-Speed 2497.16 samples/sec Loss 4.1690 LearningRate 0.000787 Epoch: 8 Global Step: 167080 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:37:06,815-Speed 2499.47 samples/sec Loss 4.2255 LearningRate 0.000787 Epoch: 8 Global Step: 167090 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:37:15,015-Speed 2498.21 samples/sec Loss 4.1299 LearningRate 0.000787 Epoch: 8 Global Step: 167100 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:37:23,159-Speed 2515.08 samples/sec Loss 4.1249 LearningRate 0.000787 Epoch: 8 Global Step: 167110 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:37:31,357-Speed 2498.65 samples/sec Loss 4.2409 LearningRate 0.000787 Epoch: 8 Global Step: 167120 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:37:39,556-Speed 2498.32 samples/sec Loss 4.1868 LearningRate 0.000787 Epoch: 8 Global Step: 167130 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:37:47,761-Speed 2496.38 samples/sec Loss 4.2430 LearningRate 0.000787 Epoch: 8 Global Step: 167140 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:37:55,962-Speed 2497.81 samples/sec Loss 4.1968 LearningRate 0.000787 Epoch: 8 Global Step: 167150 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:38:04,169-Speed 2495.89 samples/sec Loss 4.1941 LearningRate 0.000787 Epoch: 8 Global Step: 167160 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:38:12,312-Speed 2515.53 samples/sec Loss 4.1281 LearningRate 0.000787 Epoch: 8 Global Step: 167170 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:38:20,511-Speed 2498.23 samples/sec Loss 4.1581 
LearningRate 0.000787 Epoch: 8 Global Step: 167180 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:38:28,713-Speed 2497.40 samples/sec Loss 4.2259 LearningRate 0.000787 Epoch: 8 Global Step: 167190 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:38:36,910-Speed 2498.60 samples/sec Loss 4.2146 LearningRate 0.000787 Epoch: 8 Global Step: 167200 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:38:45,110-Speed 2498.14 samples/sec Loss 4.2456 LearningRate 0.000787 Epoch: 8 Global Step: 167210 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:38:53,306-Speed 2499.07 samples/sec Loss 4.1800 LearningRate 0.000787 Epoch: 8 Global Step: 167220 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:39:01,449-Speed 2515.43 samples/sec Loss 4.2276 LearningRate 0.000787 Epoch: 8 Global Step: 167230 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:39:09,647-Speed 2498.43 samples/sec Loss 4.1163 LearningRate 0.000787 Epoch: 8 Global Step: 167240 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:39:17,849-Speed 2497.47 samples/sec Loss 4.1284 LearningRate 0.000787 Epoch: 8 Global Step: 167250 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:39:26,048-Speed 2498.32 samples/sec Loss 4.1280 LearningRate 0.000787 Epoch: 8 Global Step: 167260 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:39:34,248-Speed 2497.68 samples/sec Loss 4.1294 LearningRate 0.000787 Epoch: 8 Global Step: 167270 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:39:42,453-Speed 2496.58 samples/sec Loss 4.1931 LearningRate 0.000787 Epoch: 8 Global Step: 167280 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:39:50,597-Speed 2514.95 samples/sec Loss 4.1880 LearningRate 0.000787 Epoch: 8 Global Step: 167290 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:39:58,795-Speed 2498.87 samples/sec Loss 4.1662 
LearningRate 0.000787 Epoch: 8 Global Step: 167300 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:40:06,989-Speed 2499.94 samples/sec Loss 4.2281 LearningRate 0.000787 Epoch: 8 Global Step: 167310 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:40:15,187-Speed 2498.35 samples/sec Loss 4.1826 LearningRate 0.000787 Epoch: 8 Global Step: 167320 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:40:23,384-Speed 2499.05 samples/sec Loss 4.1315 LearningRate 0.000787 Epoch: 8 Global Step: 167330 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:40:31,581-Speed 2499.09 samples/sec Loss 4.1398 LearningRate 0.000787 Epoch: 8 Global Step: 167340 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:40:39,729-Speed 2514.29 samples/sec Loss 4.3394 LearningRate 0.000787 Epoch: 8 Global Step: 167350 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:40:47,927-Speed 2498.46 samples/sec Loss 4.1382 LearningRate 0.000787 Epoch: 8 Global Step: 167360 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:40:56,128-Speed 2498.07 samples/sec Loss 4.2720 LearningRate 0.000787 Epoch: 8 Global Step: 167370 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:41:04,326-Speed 2498.78 samples/sec Loss 4.1824 LearningRate 0.000787 Epoch: 8 Global Step: 167380 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:41:12,526-Speed 2497.91 samples/sec Loss 4.1971 LearningRate 0.000787 Epoch: 8 Global Step: 167390 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:41:20,725-Speed 2498.27 samples/sec Loss 4.1177 LearningRate 0.000787 Epoch: 8 Global Step: 167400 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:41:28,871-Speed 2514.63 samples/sec Loss 4.2616 LearningRate 0.000787 Epoch: 8 Global Step: 167410 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:41:37,071-Speed 2497.79 samples/sec Loss 4.1887 
LearningRate 0.000787 Epoch: 8 Global Step: 167420 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:41:45,270-Speed 2498.41 samples/sec Loss 4.2019 LearningRate 0.000787 Epoch: 8 Global Step: 167430 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:41:53,466-Speed 2499.30 samples/sec Loss 4.1500 LearningRate 0.000786 Epoch: 8 Global Step: 167440 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:42:01,664-Speed 2498.75 samples/sec Loss 4.1902 LearningRate 0.000786 Epoch: 8 Global Step: 167450 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:42:09,863-Speed 2498.46 samples/sec Loss 4.1540 LearningRate 0.000786 Epoch: 8 Global Step: 167460 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:42:18,007-Speed 2515.11 samples/sec Loss 4.1844 LearningRate 0.000786 Epoch: 8 Global Step: 167470 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:42:26,207-Speed 2497.71 samples/sec Loss 4.3460 LearningRate 0.000786 Epoch: 8 Global Step: 167480 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:42:34,405-Speed 2498.55 samples/sec Loss 4.2724 LearningRate 0.000786 Epoch: 8 Global Step: 167490 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:42:42,603-Speed 2498.83 samples/sec Loss 4.2347 LearningRate 0.000786 Epoch: 8 Global Step: 167500 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:42:50,813-Speed 2494.82 samples/sec Loss 4.1788 LearningRate 0.000786 Epoch: 8 Global Step: 167510 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:42:59,012-Speed 2498.28 samples/sec Loss 4.1289 LearningRate 0.000786 Epoch: 8 Global Step: 167520 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:43:07,154-Speed 2515.69 samples/sec Loss 4.2438 LearningRate 0.000786 Epoch: 8 Global Step: 167530 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:43:15,353-Speed 2498.41 samples/sec Loss 4.2099 
LearningRate 0.000786 Epoch: 8 Global Step: 167540 Fp16 Grad Scale: 16384 Required: 151 hours Training: 2022-07-07 04:43:23,552-Speed 2498.13 samples/sec Loss 4.2112 LearningRate 0.000786 Epoch: 8 Global Step: 167550 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:43:31,751-Speed 2498.54 samples/sec Loss 4.2391 LearningRate 0.000786 Epoch: 8 Global Step: 167560 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:43:41,258-Speed 2154.72 samples/sec Loss 4.1939 LearningRate 0.000786 Epoch: 8 Global Step: 167570 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:43:49,454-Speed 2499.19 samples/sec Loss 4.1760 LearningRate 0.000786 Epoch: 8 Global Step: 167580 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:43:57,612-Speed 2510.74 samples/sec Loss 4.1284 LearningRate 0.000786 Epoch: 8 Global Step: 167590 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:44:06,562-Speed 2500.54 samples/sec Loss 4.1877 LearningRate 0.000786 Epoch: 8 Global Step: 167600 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:44:14,759-Speed 2498.87 samples/sec Loss 4.2103 LearningRate 0.000786 Epoch: 8 Global Step: 167610 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:44:23,306-Speed 2396.63 samples/sec Loss 4.1702 LearningRate 0.000786 Epoch: 8 Global Step: 167620 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:44:31,503-Speed 2498.56 samples/sec Loss 4.1397 LearningRate 0.000786 Epoch: 8 Global Step: 167630 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:44:39,701-Speed 2498.57 samples/sec Loss 4.1219 LearningRate 0.000786 Epoch: 8 Global Step: 167640 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:44:47,857-Speed 2511.50 samples/sec Loss 4.1353 LearningRate 0.000786 Epoch: 8 Global Step: 167650 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:44:56,057-Speed 2498.12 samples/sec Loss 4.1226 
LearningRate 0.000786 Epoch: 8 Global Step: 167660 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:45:04,253-Speed 2498.87 samples/sec Loss 4.1177 LearningRate 0.000786 Epoch: 8 Global Step: 167670 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:45:12,449-Speed 2499.31 samples/sec Loss 4.1086 LearningRate 0.000786 Epoch: 8 Global Step: 167680 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:45:20,647-Speed 2499.13 samples/sec Loss 4.0553 LearningRate 0.000786 Epoch: 8 Global Step: 167690 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:45:28,846-Speed 2498.36 samples/sec Loss 4.1344 LearningRate 0.000786 Epoch: 8 Global Step: 167700 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:45:36,995-Speed 2513.93 samples/sec Loss 4.1915 LearningRate 0.000786 Epoch: 8 Global Step: 167710 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:45:45,197-Speed 2497.52 samples/sec Loss 4.1793 LearningRate 0.000786 Epoch: 8 Global Step: 167720 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:45:53,394-Speed 2498.70 samples/sec Loss 4.1261 LearningRate 0.000786 Epoch: 8 Global Step: 167730 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:46:01,591-Speed 2499.01 samples/sec Loss 4.1052 LearningRate 0.000786 Epoch: 8 Global Step: 167740 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:46:09,789-Speed 2498.68 samples/sec Loss 4.1457 LearningRate 0.000786 Epoch: 8 Global Step: 167750 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:46:17,987-Speed 2498.54 samples/sec Loss 4.1605 LearningRate 0.000786 Epoch: 8 Global Step: 167760 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:46:26,127-Speed 2516.21 samples/sec Loss 4.1930 LearningRate 0.000786 Epoch: 8 Global Step: 167770 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:46:34,331-Speed 2496.91 samples/sec Loss 4.1943 
LearningRate 0.000786 Epoch: 8 Global Step: 167780 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:46:42,544-Speed 2494.05 samples/sec Loss 4.1827 LearningRate 0.000786 Epoch: 8 Global Step: 167790 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:46:50,743-Speed 2498.22 samples/sec Loss 4.1838 LearningRate 0.000786 Epoch: 8 Global Step: 167800 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:46:58,953-Speed 2494.95 samples/sec Loss 4.2198 LearningRate 0.000786 Epoch: 8 Global Step: 167810 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:47:07,158-Speed 2496.62 samples/sec Loss 4.1746 LearningRate 0.000786 Epoch: 8 Global Step: 167820 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:47:15,300-Speed 2515.90 samples/sec Loss 4.1616 LearningRate 0.000786 Epoch: 8 Global Step: 167830 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:47:23,503-Speed 2497.30 samples/sec Loss 4.2070 LearningRate 0.000786 Epoch: 8 Global Step: 167840 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:47:31,703-Speed 2497.74 samples/sec Loss 4.1719 LearningRate 0.000786 Epoch: 8 Global Step: 167850 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:47:39,902-Speed 2498.16 samples/sec Loss 4.1482 LearningRate 0.000785 Epoch: 8 Global Step: 167860 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:47:48,109-Speed 2496.30 samples/sec Loss 4.2906 LearningRate 0.000785 Epoch: 8 Global Step: 167870 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:47:56,311-Speed 2497.41 samples/sec Loss 4.2071 LearningRate 0.000785 Epoch: 8 Global Step: 167880 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:48:04,455-Speed 2514.97 samples/sec Loss 4.2270 LearningRate 0.000785 Epoch: 8 Global Step: 167890 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:48:12,653-Speed 2498.66 samples/sec Loss 4.2099 
LearningRate 0.000785 Epoch: 8 Global Step: 167900 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:48:20,856-Speed 2497.18 samples/sec Loss 4.1847 LearningRate 0.000785 Epoch: 8 Global Step: 167910 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:48:29,053-Speed 2498.69 samples/sec Loss 4.2022 LearningRate 0.000785 Epoch: 8 Global Step: 167920 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:48:37,255-Speed 2497.45 samples/sec Loss 4.1598 LearningRate 0.000785 Epoch: 8 Global Step: 167930 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:48:45,454-Speed 2498.21 samples/sec Loss 4.2207 LearningRate 0.000785 Epoch: 8 Global Step: 167940 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:48:53,600-Speed 2514.57 samples/sec Loss 4.2442 LearningRate 0.000785 Epoch: 8 Global Step: 167950 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:49:01,798-Speed 2498.62 samples/sec Loss 4.3337 LearningRate 0.000785 Epoch: 8 Global Step: 167960 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:49:10,014-Speed 2492.86 samples/sec Loss 4.1776 LearningRate 0.000785 Epoch: 8 Global Step: 167970 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:49:18,222-Speed 2495.58 samples/sec Loss 4.1633 LearningRate 0.000785 Epoch: 8 Global Step: 167980 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:49:26,418-Speed 2499.19 samples/sec Loss 4.2416 LearningRate 0.000785 Epoch: 8 Global Step: 167990 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:49:34,626-Speed 2495.69 samples/sec Loss 4.2825 LearningRate 0.000785 Epoch: 8 Global Step: 168000 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:49:42,772-Speed 2514.61 samples/sec Loss 4.2427 LearningRate 0.000785 Epoch: 8 Global Step: 168010 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:49:50,970-Speed 2498.41 samples/sec Loss 4.2175 
LearningRate 0.000785 Epoch: 8 Global Step: 168020 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:49:59,168-Speed 2498.72 samples/sec Loss 4.1454 LearningRate 0.000785 Epoch: 8 Global Step: 168030 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:50:07,362-Speed 2499.64 samples/sec Loss 4.1489 LearningRate 0.000785 Epoch: 8 Global Step: 168040 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:50:15,566-Speed 2496.97 samples/sec Loss 4.2421 LearningRate 0.000785 Epoch: 8 Global Step: 168050 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:50:23,764-Speed 2498.46 samples/sec Loss 4.2080 LearningRate 0.000785 Epoch: 8 Global Step: 168060 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:50:31,911-Speed 2514.34 samples/sec Loss 4.0923 LearningRate 0.000785 Epoch: 8 Global Step: 168070 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:50:40,121-Speed 2494.77 samples/sec Loss 4.1792 LearningRate 0.000785 Epoch: 8 Global Step: 168080 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:50:48,318-Speed 2499.73 samples/sec Loss 4.1698 LearningRate 0.000785 Epoch: 8 Global Step: 168090 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:50:56,520-Speed 2497.91 samples/sec Loss 4.1289 LearningRate 0.000785 Epoch: 8 Global Step: 168100 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:51:04,718-Speed 2498.52 samples/sec Loss 4.1766 LearningRate 0.000785 Epoch: 8 Global Step: 168110 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:51:12,917-Speed 2498.05 samples/sec Loss 4.1193 LearningRate 0.000785 Epoch: 8 Global Step: 168120 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:51:21,075-Speed 2511.09 samples/sec Loss 4.1477 LearningRate 0.000785 Epoch: 8 Global Step: 168130 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:51:29,273-Speed 2498.29 samples/sec Loss 4.1281 
LearningRate 0.000785 Epoch: 8 Global Step: 168140 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:51:37,473-Speed 2498.01 samples/sec Loss 4.0983 LearningRate 0.000785 Epoch: 8 Global Step: 168150 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:51:45,670-Speed 2498.84 samples/sec Loss 4.1498 LearningRate 0.000785 Epoch: 8 Global Step: 168160 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:51:53,868-Speed 2498.80 samples/sec Loss 4.2147 LearningRate 0.000785 Epoch: 8 Global Step: 168170 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:52:02,064-Speed 2498.87 samples/sec Loss 4.3165 LearningRate 0.000785 Epoch: 8 Global Step: 168180 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:52:10,214-Speed 2513.34 samples/sec Loss 4.1330 LearningRate 0.000785 Epoch: 8 Global Step: 168190 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:52:18,415-Speed 2497.80 samples/sec Loss 4.1545 LearningRate 0.000785 Epoch: 8 Global Step: 168200 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:52:26,616-Speed 2497.89 samples/sec Loss 4.1176 LearningRate 0.000785 Epoch: 8 Global Step: 168210 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:52:34,820-Speed 2496.53 samples/sec Loss 4.1816 LearningRate 0.000785 Epoch: 8 Global Step: 168220 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:52:43,021-Speed 2497.73 samples/sec Loss 4.1859 LearningRate 0.000785 Epoch: 8 Global Step: 168230 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:52:51,223-Speed 2497.36 samples/sec Loss 4.1756 LearningRate 0.000785 Epoch: 8 Global Step: 168240 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:52:59,369-Speed 2514.38 samples/sec Loss 4.1267 LearningRate 0.000785 Epoch: 8 Global Step: 168250 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:53:07,569-Speed 2498.08 samples/sec Loss 4.1232 
LearningRate 0.000785 Epoch: 8 Global Step: 168260 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:53:15,766-Speed 2498.54 samples/sec Loss 4.0892 LearningRate 0.000785 Epoch: 8 Global Step: 168270 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:53:23,965-Speed 2498.48 samples/sec Loss 4.1475 LearningRate 0.000784 Epoch: 8 Global Step: 168280 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:53:32,164-Speed 2498.22 samples/sec Loss 4.1113 LearningRate 0.000784 Epoch: 8 Global Step: 168290 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:53:40,368-Speed 2496.78 samples/sec Loss 4.1417 LearningRate 0.000784 Epoch: 8 Global Step: 168300 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:53:48,511-Speed 2515.50 samples/sec Loss 4.1527 LearningRate 0.000784 Epoch: 8 Global Step: 168310 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:53:56,706-Speed 2499.64 samples/sec Loss 4.1438 LearningRate 0.000784 Epoch: 8 Global Step: 168320 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:54:04,904-Speed 2498.57 samples/sec Loss 4.1090 LearningRate 0.000784 Epoch: 8 Global Step: 168330 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:54:13,104-Speed 2498.19 samples/sec Loss 4.1199 LearningRate 0.000784 Epoch: 8 Global Step: 168340 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:54:21,302-Speed 2498.63 samples/sec Loss 4.1922 LearningRate 0.000784 Epoch: 8 Global Step: 168350 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:54:29,501-Speed 2498.19 samples/sec Loss 4.1562 LearningRate 0.000784 Epoch: 8 Global Step: 168360 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:54:37,644-Speed 2516.01 samples/sec Loss 4.1203 LearningRate 0.000784 Epoch: 8 Global Step: 168370 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:54:45,841-Speed 2498.88 samples/sec Loss 4.1997 
LearningRate 0.000784 Epoch: 8 Global Step: 168380 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:54:54,043-Speed 2497.54 samples/sec Loss 4.1243 LearningRate 0.000784 Epoch: 8 Global Step: 168390 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:55:02,244-Speed 2497.55 samples/sec Loss 4.1641 LearningRate 0.000784 Epoch: 8 Global Step: 168400 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:55:10,452-Speed 2495.40 samples/sec Loss 4.1961 LearningRate 0.000784 Epoch: 8 Global Step: 168410 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:55:18,651-Speed 2498.47 samples/sec Loss 4.1510 LearningRate 0.000784 Epoch: 8 Global Step: 168420 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:55:26,799-Speed 2514.39 samples/sec Loss 4.1518 LearningRate 0.000784 Epoch: 8 Global Step: 168430 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:55:34,999-Speed 2498.08 samples/sec Loss 4.1735 LearningRate 0.000784 Epoch: 8 Global Step: 168440 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:55:43,200-Speed 2497.75 samples/sec Loss 4.2143 LearningRate 0.000784 Epoch: 8 Global Step: 168450 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:55:51,397-Speed 2498.98 samples/sec Loss 4.1551 LearningRate 0.000784 Epoch: 8 Global Step: 168460 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:55:59,595-Speed 2498.46 samples/sec Loss 4.1306 LearningRate 0.000784 Epoch: 8 Global Step: 168470 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:56:07,805-Speed 2495.13 samples/sec Loss 4.1687 LearningRate 0.000784 Epoch: 8 Global Step: 168480 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:56:15,949-Speed 2515.19 samples/sec Loss 4.1222 LearningRate 0.000784 Epoch: 8 Global Step: 168490 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:56:24,144-Speed 2499.49 samples/sec Loss 4.0687 
LearningRate 0.000784 Epoch: 8 Global Step: 168500 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:56:32,348-Speed 2496.73 samples/sec Loss 4.0999 LearningRate 0.000784 Epoch: 8 Global Step: 168510 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:56:40,548-Speed 2497.98 samples/sec Loss 4.0933 LearningRate 0.000784 Epoch: 8 Global Step: 168520 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:56:48,750-Speed 2497.30 samples/sec Loss 4.0885 LearningRate 0.000784 Epoch: 8 Global Step: 168530 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:56:56,948-Speed 2498.84 samples/sec Loss 4.0517 LearningRate 0.000784 Epoch: 8 Global Step: 168540 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:57:05,092-Speed 2515.22 samples/sec Loss 4.0997 LearningRate 0.000784 Epoch: 8 Global Step: 168550 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:57:13,296-Speed 2496.86 samples/sec Loss 4.0603 LearningRate 0.000784 Epoch: 8 Global Step: 168560 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:57:21,495-Speed 2498.18 samples/sec Loss 4.1082 LearningRate 0.000784 Epoch: 8 Global Step: 168570 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:57:29,693-Speed 2498.40 samples/sec Loss 4.1268 LearningRate 0.000784 Epoch: 8 Global Step: 168580 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:57:37,891-Speed 2498.75 samples/sec Loss 4.0870 LearningRate 0.000784 Epoch: 8 Global Step: 168590 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:57:46,094-Speed 2496.99 samples/sec Loss 4.0957 LearningRate 0.000784 Epoch: 8 Global Step: 168600 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:57:54,254-Speed 2510.31 samples/sec Loss 4.1347 LearningRate 0.000784 Epoch: 8 Global Step: 168610 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:58:02,451-Speed 2498.85 samples/sec Loss 4.1442 
LearningRate 0.000784 Epoch: 8 Global Step: 168620 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:58:10,667-Speed 2493.03 samples/sec Loss 4.1076 LearningRate 0.000784 Epoch: 8 Global Step: 168630 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:58:18,866-Speed 2498.19 samples/sec Loss 4.0477 LearningRate 0.000784 Epoch: 8 Global Step: 168640 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:58:27,073-Speed 2496.05 samples/sec Loss 4.0765 LearningRate 0.000784 Epoch: 8 Global Step: 168650 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:58:35,271-Speed 2498.76 samples/sec Loss 4.1190 LearningRate 0.000784 Epoch: 8 Global Step: 168660 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:58:43,416-Speed 2514.83 samples/sec Loss 4.1244 LearningRate 0.000784 Epoch: 8 Global Step: 168670 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:58:51,614-Speed 2498.30 samples/sec Loss 4.1704 LearningRate 0.000784 Epoch: 8 Global Step: 168680 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:58:59,823-Speed 2495.34 samples/sec Loss 4.0921 LearningRate 0.000784 Epoch: 8 Global Step: 168690 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:59:08,023-Speed 2497.89 samples/sec Loss 4.1233 LearningRate 0.000784 Epoch: 8 Global Step: 168700 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:59:16,219-Speed 2498.97 samples/sec Loss 4.1691 LearningRate 0.000783 Epoch: 8 Global Step: 168710 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:59:24,417-Speed 2498.72 samples/sec Loss 4.1645 LearningRate 0.000783 Epoch: 8 Global Step: 168720 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:59:32,565-Speed 2513.77 samples/sec Loss 4.1503 LearningRate 0.000783 Epoch: 8 Global Step: 168730 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:59:40,763-Speed 2498.80 samples/sec Loss 4.0492 
LearningRate 0.000783 Epoch: 8 Global Step: 168740 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 04:59:48,965-Speed 2497.18 samples/sec Loss 4.1749 LearningRate 0.000783 Epoch: 8 Global Step: 168750 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 04:59:57,166-Speed 2497.73 samples/sec Loss 4.1320 LearningRate 0.000783 Epoch: 8 Global Step: 168760 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:00:05,378-Speed 2494.32 samples/sec Loss 4.1289 LearningRate 0.000783 Epoch: 8 Global Step: 168770 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:00:13,577-Speed 2498.32 samples/sec Loss 4.2112 LearningRate 0.000783 Epoch: 8 Global Step: 168780 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:00:21,723-Speed 2514.40 samples/sec Loss 4.1879 LearningRate 0.000783 Epoch: 8 Global Step: 168790 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:00:29,921-Speed 2498.51 samples/sec Loss 4.1317 LearningRate 0.000783 Epoch: 8 Global Step: 168800 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:00:38,120-Speed 2498.31 samples/sec Loss 4.2138 LearningRate 0.000783 Epoch: 8 Global Step: 168810 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:00:46,318-Speed 2498.62 samples/sec Loss 4.1196 LearningRate 0.000783 Epoch: 8 Global Step: 168820 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:00:54,516-Speed 2498.77 samples/sec Loss 4.1058 LearningRate 0.000783 Epoch: 8 Global Step: 168830 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:01:02,714-Speed 2498.26 samples/sec Loss 4.1673 LearningRate 0.000783 Epoch: 8 Global Step: 168840 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:01:10,865-Speed 2513.13 samples/sec Loss 4.0605 LearningRate 0.000783 Epoch: 8 Global Step: 168850 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:01:19,063-Speed 2498.39 samples/sec Loss 4.0753 
LearningRate 0.000783 Epoch: 8 Global Step: 168860 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:01:27,263-Speed 2498.11 samples/sec Loss 4.1407 LearningRate 0.000783 Epoch: 8 Global Step: 168870 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:01:35,459-Speed 2499.24 samples/sec Loss 4.1494 LearningRate 0.000783 Epoch: 8 Global Step: 168880 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:01:43,656-Speed 2499.00 samples/sec Loss 4.1180 LearningRate 0.000783 Epoch: 8 Global Step: 168890 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:01:51,852-Speed 2499.12 samples/sec Loss 4.1580 LearningRate 0.000783 Epoch: 8 Global Step: 168900 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:02:00,001-Speed 2513.60 samples/sec Loss 4.0140 LearningRate 0.000783 Epoch: 8 Global Step: 168910 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:02:08,198-Speed 2498.64 samples/sec Loss 4.1305 LearningRate 0.000783 Epoch: 8 Global Step: 168920 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:02:16,357-Speed 2510.55 samples/sec Loss 4.1080 LearningRate 0.000783 Epoch: 8 Global Step: 168930 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:02:24,556-Speed 2498.29 samples/sec Loss 4.0894 LearningRate 0.000783 Epoch: 8 Global Step: 168940 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:02:32,765-Speed 2495.33 samples/sec Loss 4.1020 LearningRate 0.000783 Epoch: 8 Global Step: 168950 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:02:40,976-Speed 2494.60 samples/sec Loss 4.0990 LearningRate 0.000783 Epoch: 8 Global Step: 168960 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:02:49,124-Speed 2514.08 samples/sec Loss 4.1213 LearningRate 0.000783 Epoch: 8 Global Step: 168970 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:02:57,324-Speed 2497.97 samples/sec Loss 4.0810 
LearningRate 0.000783 Epoch: 8 Global Step: 168980 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:03:05,527-Speed 2496.83 samples/sec Loss 4.1846 LearningRate 0.000783 Epoch: 8 Global Step: 168990 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:03:13,729-Speed 2497.69 samples/sec Loss 4.1154 LearningRate 0.000783 Epoch: 8 Global Step: 169000 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:03:21,927-Speed 2498.55 samples/sec Loss 4.0563 LearningRate 0.000783 Epoch: 8 Global Step: 169010 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:03:30,130-Speed 2497.13 samples/sec Loss 4.0326 LearningRate 0.000783 Epoch: 8 Global Step: 169020 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:03:38,279-Speed 2513.65 samples/sec Loss 4.1236 LearningRate 0.000783 Epoch: 8 Global Step: 169030 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:03:46,476-Speed 2498.74 samples/sec Loss 4.0619 LearningRate 0.000783 Epoch: 8 Global Step: 169040 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:03:54,674-Speed 2498.80 samples/sec Loss 4.1094 LearningRate 0.000783 Epoch: 8 Global Step: 169050 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:04:02,871-Speed 2498.99 samples/sec Loss 4.1182 LearningRate 0.000783 Epoch: 8 Global Step: 169060 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:04:11,069-Speed 2498.27 samples/sec Loss 4.0949 LearningRate 0.000783 Epoch: 8 Global Step: 169070 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:04:19,268-Speed 2498.48 samples/sec Loss 4.1296 LearningRate 0.000783 Epoch: 8 Global Step: 169080 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:04:27,415-Speed 2514.45 samples/sec Loss 4.1710 LearningRate 0.000783 Epoch: 8 Global Step: 169090 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:04:35,620-Speed 2496.26 samples/sec Loss 4.1665 
LearningRate 0.000783 Epoch: 8 Global Step: 169100 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:04:43,825-Speed 2496.62 samples/sec Loss 4.1242 LearningRate 0.000783 Epoch: 8 Global Step: 169110 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:04:52,024-Speed 2498.19 samples/sec Loss 4.1088 LearningRate 0.000783 Epoch: 8 Global Step: 169120 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:05:00,226-Speed 2497.50 samples/sec Loss 4.1908 LearningRate 0.000782 Epoch: 8 Global Step: 169130 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:05:08,431-Speed 2496.42 samples/sec Loss 4.1311 LearningRate 0.000782 Epoch: 8 Global Step: 169140 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:05:16,581-Speed 2513.30 samples/sec Loss 4.1195 LearningRate 0.000782 Epoch: 8 Global Step: 169150 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:05:24,788-Speed 2495.84 samples/sec Loss 4.0684 LearningRate 0.000782 Epoch: 8 Global Step: 169160 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:05:32,987-Speed 2498.21 samples/sec Loss 4.1432 LearningRate 0.000782 Epoch: 8 Global Step: 169170 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:05:41,197-Speed 2494.75 samples/sec Loss 3.9700 LearningRate 0.000782 Epoch: 8 Global Step: 169180 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:05:49,397-Speed 2498.05 samples/sec Loss 4.1715 LearningRate 0.000782 Epoch: 8 Global Step: 169190 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:05:57,598-Speed 2497.58 samples/sec Loss 4.0242 LearningRate 0.000782 Epoch: 8 Global Step: 169200 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:06:05,751-Speed 2512.69 samples/sec Loss 4.1180 LearningRate 0.000782 Epoch: 8 Global Step: 169210 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:06:13,958-Speed 2495.76 samples/sec Loss 4.1167 
LearningRate 0.000782 Epoch: 8 Global Step: 169220 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:06:22,159-Speed 2497.57 samples/sec Loss 4.1267 LearningRate 0.000782 Epoch: 8 Global Step: 169230 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:06:30,368-Speed 2495.38 samples/sec Loss 4.1390 LearningRate 0.000782 Epoch: 8 Global Step: 169240 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:06:38,568-Speed 2498.01 samples/sec Loss 4.0587 LearningRate 0.000782 Epoch: 8 Global Step: 169250 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:06:46,773-Speed 2496.55 samples/sec Loss 4.1182 LearningRate 0.000782 Epoch: 8 Global Step: 169260 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:06:54,920-Speed 2514.08 samples/sec Loss 4.0756 LearningRate 0.000782 Epoch: 8 Global Step: 169270 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:07:03,120-Speed 2497.88 samples/sec Loss 4.1026 LearningRate 0.000782 Epoch: 8 Global Step: 169280 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:07:11,328-Speed 2495.78 samples/sec Loss 4.1942 LearningRate 0.000782 Epoch: 8 Global Step: 169290 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:07:19,528-Speed 2498.00 samples/sec Loss 4.2396 LearningRate 0.000782 Epoch: 8 Global Step: 169300 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:07:27,727-Speed 2498.11 samples/sec Loss 4.2529 LearningRate 0.000782 Epoch: 8 Global Step: 169310 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:07:35,926-Speed 2498.35 samples/sec Loss 4.1515 LearningRate 0.000782 Epoch: 8 Global Step: 169320 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:07:44,077-Speed 2512.85 samples/sec Loss 4.1514 LearningRate 0.000782 Epoch: 8 Global Step: 169330 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:07:52,281-Speed 2497.08 samples/sec Loss 4.1173 
LearningRate 0.000782 Epoch: 8 Global Step: 169340 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:08:00,480-Speed 2498.32 samples/sec Loss 4.0850 LearningRate 0.000782 Epoch: 8 Global Step: 169350 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:08:08,678-Speed 2498.61 samples/sec Loss 4.1446 LearningRate 0.000782 Epoch: 8 Global Step: 169360 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:08:16,888-Speed 2494.96 samples/sec Loss 4.1215 LearningRate 0.000782 Epoch: 8 Global Step: 169370 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:08:25,085-Speed 2499.09 samples/sec Loss 4.1317 LearningRate 0.000782 Epoch: 8 Global Step: 169380 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:08:33,230-Speed 2514.60 samples/sec Loss 4.1431 LearningRate 0.000782 Epoch: 8 Global Step: 169390 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:08:41,454-Speed 2490.83 samples/sec Loss 4.1127 LearningRate 0.000782 Epoch: 8 Global Step: 169400 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:08:49,653-Speed 2498.43 samples/sec Loss 4.2173 LearningRate 0.000782 Epoch: 8 Global Step: 169410 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:08:57,856-Speed 2497.07 samples/sec Loss 4.1159 LearningRate 0.000782 Epoch: 8 Global Step: 169420 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:09:06,054-Speed 2498.48 samples/sec Loss 4.1919 LearningRate 0.000782 Epoch: 8 Global Step: 169430 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:09:14,255-Speed 2497.73 samples/sec Loss 4.1218 LearningRate 0.000782 Epoch: 8 Global Step: 169440 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:09:22,402-Speed 2514.18 samples/sec Loss 4.1714 LearningRate 0.000782 Epoch: 8 Global Step: 169450 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:09:30,604-Speed 2497.43 samples/sec Loss 4.1445 
LearningRate 0.000782 Epoch: 8 Global Step: 169460 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:09:38,800-Speed 2499.00 samples/sec Loss 4.1034 LearningRate 0.000782 Epoch: 8 Global Step: 169470 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:09:47,002-Speed 2497.44 samples/sec Loss 4.0841 LearningRate 0.000782 Epoch: 8 Global Step: 169480 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:09:55,199-Speed 2499.07 samples/sec Loss 4.1707 LearningRate 0.000782 Epoch: 8 Global Step: 169490 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:10:03,399-Speed 2497.91 samples/sec Loss 4.1208 LearningRate 0.000782 Epoch: 8 Global Step: 169500 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:10:11,542-Speed 2515.46 samples/sec Loss 4.0922 LearningRate 0.000782 Epoch: 8 Global Step: 169510 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:10:19,751-Speed 2495.09 samples/sec Loss 4.0273 LearningRate 0.000782 Epoch: 8 Global Step: 169520 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:10:27,953-Speed 2497.44 samples/sec Loss 4.1236 LearningRate 0.000782 Epoch: 8 Global Step: 169530 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:10:36,156-Speed 2497.12 samples/sec Loss 4.1123 LearningRate 0.000782 Epoch: 8 Global Step: 169540 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:10:44,357-Speed 2497.87 samples/sec Loss 4.2108 LearningRate 0.000781 Epoch: 8 Global Step: 169550 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:10:52,556-Speed 2498.10 samples/sec Loss 4.1097 LearningRate 0.000781 Epoch: 8 Global Step: 169560 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:11:00,698-Speed 2515.88 samples/sec Loss 4.1060 LearningRate 0.000781 Epoch: 8 Global Step: 169570 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:11:08,901-Speed 2497.12 samples/sec Loss 4.1219 
LearningRate 0.000781 Epoch: 8 Global Step: 169580 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:11:17,100-Speed 2498.02 samples/sec Loss 4.1053 LearningRate 0.000781 Epoch: 8 Global Step: 169590 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:11:25,299-Speed 2498.29 samples/sec Loss 4.1400 LearningRate 0.000781 Epoch: 8 Global Step: 169600 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:11:33,502-Speed 2497.27 samples/sec Loss 4.1872 LearningRate 0.000781 Epoch: 8 Global Step: 169610 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:11:41,704-Speed 2497.36 samples/sec Loss 4.2270 LearningRate 0.000781 Epoch: 8 Global Step: 169620 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:11:49,855-Speed 2513.04 samples/sec Loss 4.0878 LearningRate 0.000781 Epoch: 8 Global Step: 169630 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:11:58,053-Speed 2498.42 samples/sec Loss 4.1850 LearningRate 0.000781 Epoch: 8 Global Step: 169640 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:12:06,253-Speed 2497.93 samples/sec Loss 4.0819 LearningRate 0.000781 Epoch: 8 Global Step: 169650 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:12:14,460-Speed 2496.17 samples/sec Loss 4.2163 LearningRate 0.000781 Epoch: 8 Global Step: 169660 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:12:22,657-Speed 2498.65 samples/sec Loss 4.2709 LearningRate 0.000781 Epoch: 8 Global Step: 169670 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:12:30,856-Speed 2498.19 samples/sec Loss 4.1939 LearningRate 0.000781 Epoch: 8 Global Step: 169680 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:12:39,004-Speed 2514.01 samples/sec Loss 4.2935 LearningRate 0.000781 Epoch: 8 Global Step: 169690 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:12:47,217-Speed 2494.09 samples/sec Loss 4.1986 
LearningRate 0.000781 Epoch: 8 Global Step: 169700 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:12:55,414-Speed 2498.93 samples/sec Loss 4.1363 LearningRate 0.000781 Epoch: 8 Global Step: 169710 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:13:03,611-Speed 2498.79 samples/sec Loss 4.1905 LearningRate 0.000781 Epoch: 8 Global Step: 169720 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:13:11,811-Speed 2497.91 samples/sec Loss 4.1916 LearningRate 0.000781 Epoch: 8 Global Step: 169730 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:13:20,018-Speed 2495.90 samples/sec Loss 4.1802 LearningRate 0.000781 Epoch: 8 Global Step: 169740 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:13:28,163-Speed 2515.05 samples/sec Loss 4.0885 LearningRate 0.000781 Epoch: 8 Global Step: 169750 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:13:36,361-Speed 2498.43 samples/sec Loss 4.2372 LearningRate 0.000781 Epoch: 8 Global Step: 169760 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:13:44,564-Speed 2496.99 samples/sec Loss 4.1855 LearningRate 0.000781 Epoch: 8 Global Step: 169770 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:13:52,763-Speed 2498.48 samples/sec Loss 4.1622 LearningRate 0.000781 Epoch: 8 Global Step: 169780 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:14:00,963-Speed 2497.90 samples/sec Loss 4.1446 LearningRate 0.000781 Epoch: 8 Global Step: 169790 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:14:09,164-Speed 2497.61 samples/sec Loss 4.1620 LearningRate 0.000781 Epoch: 8 Global Step: 169800 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:14:17,310-Speed 2514.65 samples/sec Loss 4.1012 LearningRate 0.000781 Epoch: 8 Global Step: 169810 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:14:25,507-Speed 2498.99 samples/sec Loss 4.1714 
LearningRate 0.000781 Epoch: 8 Global Step: 169820 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:14:33,718-Speed 2494.53 samples/sec Loss 4.2044 LearningRate 0.000781 Epoch: 8 Global Step: 169830 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:14:41,920-Speed 2497.52 samples/sec Loss 4.1591 LearningRate 0.000781 Epoch: 8 Global Step: 169840 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:14:50,119-Speed 2498.26 samples/sec Loss 4.0510 LearningRate 0.000781 Epoch: 8 Global Step: 169850 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:14:58,320-Speed 2497.80 samples/sec Loss 4.0978 LearningRate 0.000781 Epoch: 8 Global Step: 169860 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:15:06,469-Speed 2513.62 samples/sec Loss 4.0707 LearningRate 0.000781 Epoch: 8 Global Step: 169870 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:15:14,666-Speed 2498.97 samples/sec Loss 4.1125 LearningRate 0.000781 Epoch: 8 Global Step: 169880 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:15:22,866-Speed 2497.85 samples/sec Loss 4.1509 LearningRate 0.000781 Epoch: 8 Global Step: 169890 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:15:31,091-Speed 2490.55 samples/sec Loss 4.1911 LearningRate 0.000781 Epoch: 8 Global Step: 169900 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:15:39,288-Speed 2498.84 samples/sec Loss 4.2683 LearningRate 0.000781 Epoch: 8 Global Step: 169910 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:15:47,489-Speed 2497.69 samples/sec Loss 4.2327 LearningRate 0.000781 Epoch: 8 Global Step: 169920 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:15:55,636-Speed 2514.23 samples/sec Loss 4.2832 LearningRate 0.000781 Epoch: 8 Global Step: 169930 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:16:03,836-Speed 2497.80 samples/sec Loss 4.1986 
LearningRate 0.000781 Epoch: 8 Global Step: 169940 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:16:12,041-Speed 2496.68 samples/sec Loss 4.2183 LearningRate 0.000781 Epoch: 8 Global Step: 169950 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:16:20,243-Speed 2497.15 samples/sec Loss 4.0885 LearningRate 0.000781 Epoch: 8 Global Step: 169960 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:16:28,447-Speed 2496.84 samples/sec Loss 4.1474 LearningRate 0.000780 Epoch: 8 Global Step: 169970 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:16:36,650-Speed 2496.87 samples/sec Loss 4.1599 LearningRate 0.000780 Epoch: 8 Global Step: 169980 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:16:44,798-Speed 2513.95 samples/sec Loss 4.1540 LearningRate 0.000780 Epoch: 8 Global Step: 169990 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:16:53,005-Speed 2495.90 samples/sec Loss 4.1079 LearningRate 0.000780 Epoch: 8 Global Step: 170000 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:17:01,223-Speed 2492.55 samples/sec Loss 4.0599 LearningRate 0.000780 Epoch: 8 Global Step: 170010 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:17:09,423-Speed 2498.03 samples/sec Loss 4.0860 LearningRate 0.000780 Epoch: 8 Global Step: 170020 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:17:17,628-Speed 2496.63 samples/sec Loss 4.1134 LearningRate 0.000780 Epoch: 8 Global Step: 170030 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:17:25,843-Speed 2493.23 samples/sec Loss 4.0819 LearningRate 0.000780 Epoch: 8 Global Step: 170040 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:17:34,417-Speed 2516.30 samples/sec Loss 4.1179 LearningRate 0.000780 Epoch: 8 Global Step: 170050 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:17:42,637-Speed 2491.74 samples/sec Loss 4.1654 
LearningRate 0.000780 Epoch: 8 Global Step: 170060 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:17:51,995-Speed 2502.58 samples/sec Loss 4.1230 LearningRate 0.000780 Epoch: 8 Global Step: 170070 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:18:01,432-Speed 2313.80 samples/sec Loss 4.0744 LearningRate 0.000780 Epoch: 8 Global Step: 170080 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:18:09,637-Speed 2496.77 samples/sec Loss 4.0804 LearningRate 0.000780 Epoch: 8 Global Step: 170090 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:18:17,834-Speed 2498.92 samples/sec Loss 4.1008 LearningRate 0.000780 Epoch: 8 Global Step: 170100 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:18:25,991-Speed 2511.11 samples/sec Loss 4.0828 LearningRate 0.000780 Epoch: 8 Global Step: 170110 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:18:34,192-Speed 2497.95 samples/sec Loss 4.1405 LearningRate 0.000780 Epoch: 8 Global Step: 170120 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:18:42,391-Speed 2498.62 samples/sec Loss 4.0854 LearningRate 0.000780 Epoch: 8 Global Step: 170130 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:18:50,594-Speed 2496.78 samples/sec Loss 4.0730 LearningRate 0.000780 Epoch: 8 Global Step: 170140 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:18:58,800-Speed 2496.36 samples/sec Loss 4.0675 LearningRate 0.000780 Epoch: 8 Global Step: 170150 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:19:06,998-Speed 2498.59 samples/sec Loss 4.0586 LearningRate 0.000780 Epoch: 8 Global Step: 170160 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:19:15,144-Speed 2514.52 samples/sec Loss 4.0619 LearningRate 0.000780 Epoch: 8 Global Step: 170170 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:19:23,342-Speed 2498.31 samples/sec Loss 4.1124 
LearningRate 0.000780 Epoch: 8 Global Step: 170180 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:19:31,552-Speed 2495.12 samples/sec Loss 4.1430 LearningRate 0.000780 Epoch: 8 Global Step: 170190 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:19:39,749-Speed 2498.90 samples/sec Loss 4.1383 LearningRate 0.000780 Epoch: 8 Global Step: 170200 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:19:47,946-Speed 2498.77 samples/sec Loss 4.0810 LearningRate 0.000780 Epoch: 8 Global Step: 170210 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:19:56,146-Speed 2497.93 samples/sec Loss 4.0792 LearningRate 0.000780 Epoch: 8 Global Step: 170220 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:20:04,291-Speed 2515.03 samples/sec Loss 4.0953 LearningRate 0.000780 Epoch: 8 Global Step: 170230 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:20:12,488-Speed 2498.91 samples/sec Loss 4.1208 LearningRate 0.000780 Epoch: 8 Global Step: 170240 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:20:20,685-Speed 2498.66 samples/sec Loss 4.1147 LearningRate 0.000780 Epoch: 8 Global Step: 170250 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:20:28,887-Speed 2497.48 samples/sec Loss 4.0801 LearningRate 0.000780 Epoch: 8 Global Step: 170260 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:20:37,088-Speed 2497.71 samples/sec Loss 4.0198 LearningRate 0.000780 Epoch: 8 Global Step: 170270 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:20:45,294-Speed 2496.21 samples/sec Loss 4.0942 LearningRate 0.000780 Epoch: 8 Global Step: 170280 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:20:53,436-Speed 2515.86 samples/sec Loss 4.0458 LearningRate 0.000780 Epoch: 8 Global Step: 170290 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:21:01,636-Speed 2498.27 samples/sec Loss 4.0839 
LearningRate 0.000780 Epoch: 8 Global Step: 170300 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:21:09,833-Speed 2498.88 samples/sec Loss 4.1172 LearningRate 0.000780 Epoch: 8 Global Step: 170310 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:21:18,029-Speed 2499.05 samples/sec Loss 4.0947 LearningRate 0.000780 Epoch: 8 Global Step: 170320 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:21:26,230-Speed 2497.60 samples/sec Loss 4.0742 LearningRate 0.000780 Epoch: 8 Global Step: 170330 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:21:34,430-Speed 2498.08 samples/sec Loss 4.1127 LearningRate 0.000780 Epoch: 8 Global Step: 170340 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:21:42,595-Speed 2508.55 samples/sec Loss 4.1033 LearningRate 0.000780 Epoch: 8 Global Step: 170350 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:21:50,798-Speed 2497.27 samples/sec Loss 4.0910 LearningRate 0.000780 Epoch: 8 Global Step: 170360 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:21:59,001-Speed 2497.03 samples/sec Loss 4.1174 LearningRate 0.000780 Epoch: 8 Global Step: 170370 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:22:07,210-Speed 2495.29 samples/sec Loss 4.1421 LearningRate 0.000780 Epoch: 8 Global Step: 170380 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:22:15,411-Speed 2498.11 samples/sec Loss 4.1170 LearningRate 0.000779 Epoch: 8 Global Step: 170390 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:22:23,612-Speed 2497.64 samples/sec Loss 4.2085 LearningRate 0.000779 Epoch: 8 Global Step: 170400 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:22:31,758-Speed 2514.37 samples/sec Loss 4.1122 LearningRate 0.000779 Epoch: 8 Global Step: 170410 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:22:39,955-Speed 2499.11 samples/sec Loss 4.1680 
LearningRate 0.000779 Epoch: 8 Global Step: 170420 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:22:48,152-Speed 2498.79 samples/sec Loss 4.1746 LearningRate 0.000779 Epoch: 8 Global Step: 170430 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:22:56,351-Speed 2498.36 samples/sec Loss 4.1164 LearningRate 0.000779 Epoch: 8 Global Step: 170440 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:23:04,550-Speed 2498.46 samples/sec Loss 4.0781 LearningRate 0.000779 Epoch: 8 Global Step: 170450 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:23:12,750-Speed 2498.01 samples/sec Loss 4.1016 LearningRate 0.000779 Epoch: 8 Global Step: 170460 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:23:20,894-Speed 2515.31 samples/sec Loss 4.1403 LearningRate 0.000779 Epoch: 8 Global Step: 170470 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:23:29,098-Speed 2496.48 samples/sec Loss 4.1938 LearningRate 0.000779 Epoch: 8 Global Step: 170480 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:23:37,312-Speed 2493.85 samples/sec Loss 4.2066 LearningRate 0.000779 Epoch: 8 Global Step: 170490 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:23:45,538-Speed 2490.04 samples/sec Loss 4.1833 LearningRate 0.000779 Epoch: 8 Global Step: 170500 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:23:53,738-Speed 2498.09 samples/sec Loss 4.1174 LearningRate 0.000779 Epoch: 8 Global Step: 170510 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:24:01,937-Speed 2498.29 samples/sec Loss 4.1794 LearningRate 0.000779 Epoch: 8 Global Step: 170520 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:24:10,085-Speed 2513.84 samples/sec Loss 4.2002 LearningRate 0.000779 Epoch: 8 Global Step: 170530 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:24:18,285-Speed 2498.35 samples/sec Loss 4.1426 
LearningRate 0.000779 Epoch: 8 Global Step: 170540 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:24:26,482-Speed 2498.66 samples/sec Loss 4.1394 LearningRate 0.000779 Epoch: 8 Global Step: 170550 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:24:34,681-Speed 2498.38 samples/sec Loss 4.0877 LearningRate 0.000779 Epoch: 8 Global Step: 170560 Fp16 Grad Scale: 65536 Required: 151 hours Training: 2022-07-07 05:24:42,836-Speed 2511.74 samples/sec Loss 4.1746 LearningRate 0.000779 Epoch: 8 Global Step: 170570 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:24:51,034-Speed 2498.63 samples/sec Loss 4.0789 LearningRate 0.000779 Epoch: 8 Global Step: 170580 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:24:59,187-Speed 2512.73 samples/sec Loss 4.1428 LearningRate 0.000779 Epoch: 8 Global Step: 170590 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:25:07,384-Speed 2498.96 samples/sec Loss 4.1604 LearningRate 0.000779 Epoch: 8 Global Step: 170600 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:25:15,580-Speed 2499.18 samples/sec Loss 4.2033 LearningRate 0.000779 Epoch: 8 Global Step: 170610 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:25:23,778-Speed 2498.56 samples/sec Loss 4.0900 LearningRate 0.000779 Epoch: 8 Global Step: 170620 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:25:31,980-Speed 2497.44 samples/sec Loss 4.0810 LearningRate 0.000779 Epoch: 8 Global Step: 170630 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:25:40,177-Speed 2498.83 samples/sec Loss 4.1727 LearningRate 0.000779 Epoch: 8 Global Step: 170640 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:25:48,321-Speed 2515.28 samples/sec Loss 4.0637 LearningRate 0.000779 Epoch: 8 Global Step: 170650 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:25:56,518-Speed 2498.97 samples/sec Loss 4.0994 
LearningRate 0.000779 Epoch: 8 Global Step: 170660 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:26:04,716-Speed 2498.56 samples/sec Loss 4.1161 LearningRate 0.000779 Epoch: 8 Global Step: 170670 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:26:12,914-Speed 2498.81 samples/sec Loss 4.1603 LearningRate 0.000779 Epoch: 8 Global Step: 170680 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:26:21,118-Speed 2496.58 samples/sec Loss 4.1748 LearningRate 0.000779 Epoch: 8 Global Step: 170690 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:26:29,317-Speed 2498.44 samples/sec Loss 4.0971 LearningRate 0.000779 Epoch: 8 Global Step: 170700 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:26:37,462-Speed 2514.74 samples/sec Loss 4.0671 LearningRate 0.000779 Epoch: 8 Global Step: 170710 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:26:45,662-Speed 2497.91 samples/sec Loss 4.1457 LearningRate 0.000779 Epoch: 8 Global Step: 170720 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:26:53,861-Speed 2498.82 samples/sec Loss 4.1037 LearningRate 0.000779 Epoch: 8 Global Step: 170730 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:27:02,062-Speed 2497.53 samples/sec Loss 4.1166 LearningRate 0.000779 Epoch: 8 Global Step: 170740 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:27:10,264-Speed 2497.48 samples/sec Loss 4.2653 LearningRate 0.000779 Epoch: 8 Global Step: 170750 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:27:18,463-Speed 2498.22 samples/sec Loss 4.0994 LearningRate 0.000779 Epoch: 8 Global Step: 170760 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:27:26,607-Speed 2515.27 samples/sec Loss 4.1381 LearningRate 0.000779 Epoch: 8 Global Step: 170770 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:27:34,804-Speed 2498.68 samples/sec Loss 4.2110 
LearningRate 0.000779 Epoch: 8 Global Step: 170780 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:27:43,007-Speed 2497.14 samples/sec Loss 4.1086 LearningRate 0.000779 Epoch: 8 Global Step: 170790 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:27:51,209-Speed 2497.52 samples/sec Loss 4.0620 LearningRate 0.000779 Epoch: 8 Global Step: 170800 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:27:59,405-Speed 2499.16 samples/sec Loss 4.0790 LearningRate 0.000779 Epoch: 8 Global Step: 170810 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:28:07,603-Speed 2498.35 samples/sec Loss 4.1191 LearningRate 0.000778 Epoch: 8 Global Step: 170820 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:28:15,751-Speed 2514.01 samples/sec Loss 4.1788 LearningRate 0.000778 Epoch: 8 Global Step: 170830 Fp16 Grad Scale: 32768 Required: 151 hours Training: 2022-07-07 05:28:23,946-Speed 2499.55 samples/sec Loss 4.1585 LearningRate 0.000778 Epoch: 8 Global Step: 170840 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:28:32,150-Speed 2496.83 samples/sec Loss 4.0710 LearningRate 0.000778 Epoch: 8 Global Step: 170850 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:28:40,363-Speed 2493.84 samples/sec Loss 4.1408 LearningRate 0.000778 Epoch: 8 Global Step: 170860 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:28:48,564-Speed 2498.11 samples/sec Loss 4.0838 LearningRate 0.000778 Epoch: 8 Global Step: 170870 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:28:56,760-Speed 2499.06 samples/sec Loss 4.1359 LearningRate 0.000778 Epoch: 8 Global Step: 170880 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:29:04,906-Speed 2514.83 samples/sec Loss 4.1852 LearningRate 0.000778 Epoch: 8 Global Step: 170890 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:29:13,116-Speed 2494.81 samples/sec Loss 4.1033 
LearningRate 0.000778 Epoch: 8 Global Step: 170900 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:29:21,312-Speed 2499.17 samples/sec Loss 4.0691 LearningRate 0.000778 Epoch: 8 Global Step: 170910 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:29:29,512-Speed 2498.00 samples/sec Loss 4.1092 LearningRate 0.000778 Epoch: 8 Global Step: 170920 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:29:37,712-Speed 2498.06 samples/sec Loss 4.0124 LearningRate 0.000778 Epoch: 8 Global Step: 170930 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:29:45,907-Speed 2499.47 samples/sec Loss 4.0949 LearningRate 0.000778 Epoch: 8 Global Step: 170940 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:29:54,055-Speed 2513.92 samples/sec Loss 4.1297 LearningRate 0.000778 Epoch: 8 Global Step: 170950 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:30:02,252-Speed 2498.86 samples/sec Loss 4.1132 LearningRate 0.000778 Epoch: 8 Global Step: 170960 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:30:10,453-Speed 2497.62 samples/sec Loss 4.0873 LearningRate 0.000778 Epoch: 8 Global Step: 170970 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:30:18,653-Speed 2498.10 samples/sec Loss 4.1207 LearningRate 0.000778 Epoch: 8 Global Step: 170980 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:30:26,852-Speed 2498.03 samples/sec Loss 4.1450 LearningRate 0.000778 Epoch: 8 Global Step: 170990 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:30:35,059-Speed 2496.02 samples/sec Loss 4.0701 LearningRate 0.000778 Epoch: 8 Global Step: 171000 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:30:43,205-Speed 2514.55 samples/sec Loss 4.0893 LearningRate 0.000778 Epoch: 8 Global Step: 171010 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:30:51,429-Speed 2490.90 samples/sec Loss 4.0493 
LearningRate 0.000778 Epoch: 8 Global Step: 171020 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:30:59,626-Speed 2498.92 samples/sec Loss 4.0810 LearningRate 0.000778 Epoch: 8 Global Step: 171030 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:31:07,823-Speed 2498.65 samples/sec Loss 4.1348 LearningRate 0.000778 Epoch: 8 Global Step: 171040 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:31:16,023-Speed 2498.09 samples/sec Loss 4.1651 LearningRate 0.000778 Epoch: 8 Global Step: 171050 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:31:24,221-Speed 2498.92 samples/sec Loss 4.0557 LearningRate 0.000778 Epoch: 8 Global Step: 171060 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:31:32,368-Speed 2514.36 samples/sec Loss 4.1508 LearningRate 0.000778 Epoch: 8 Global Step: 171070 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:31:40,563-Speed 2499.33 samples/sec Loss 4.0917 LearningRate 0.000778 Epoch: 8 Global Step: 171080 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:31:48,767-Speed 2496.77 samples/sec Loss 4.0961 LearningRate 0.000778 Epoch: 8 Global Step: 171090 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:31:56,965-Speed 2498.65 samples/sec Loss 4.0762 LearningRate 0.000778 Epoch: 8 Global Step: 171100 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:32:05,166-Speed 2497.65 samples/sec Loss 4.1420 LearningRate 0.000778 Epoch: 8 Global Step: 171110 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:32:13,368-Speed 2497.45 samples/sec Loss 4.0779 LearningRate 0.000778 Epoch: 8 Global Step: 171120 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:32:21,518-Speed 2513.43 samples/sec Loss 4.0678 LearningRate 0.000778 Epoch: 8 Global Step: 171130 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:32:29,716-Speed 2498.55 samples/sec Loss 4.0653 
LearningRate 0.000778 Epoch: 8 Global Step: 171140 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:32:37,917-Speed 2497.75 samples/sec Loss 4.1453 LearningRate 0.000778 Epoch: 8 Global Step: 171150 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:32:46,114-Speed 2498.80 samples/sec Loss 4.1096 LearningRate 0.000778 Epoch: 8 Global Step: 171160 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:32:54,313-Speed 2498.17 samples/sec Loss 4.0482 LearningRate 0.000778 Epoch: 8 Global Step: 171170 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:33:02,512-Speed 2498.30 samples/sec Loss 4.1282 LearningRate 0.000778 Epoch: 8 Global Step: 171180 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:33:10,660-Speed 2513.95 samples/sec Loss 4.1186 LearningRate 0.000778 Epoch: 8 Global Step: 171190 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:33:18,858-Speed 2498.58 samples/sec Loss 4.1296 LearningRate 0.000778 Epoch: 8 Global Step: 171200 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:33:27,058-Speed 2497.99 samples/sec Loss 4.1482 LearningRate 0.000778 Epoch: 8 Global Step: 171210 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:33:35,257-Speed 2498.27 samples/sec Loss 4.0963 LearningRate 0.000778 Epoch: 8 Global Step: 171220 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:33:43,459-Speed 2497.38 samples/sec Loss 4.1452 LearningRate 0.000778 Epoch: 8 Global Step: 171230 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:33:51,662-Speed 2497.07 samples/sec Loss 4.1625 LearningRate 0.000777 Epoch: 8 Global Step: 171240 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:33:59,810-Speed 2514.13 samples/sec Loss 4.0220 LearningRate 0.000777 Epoch: 8 Global Step: 171250 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:34:08,020-Speed 2494.64 samples/sec Loss 4.1008 
LearningRate 0.000777 Epoch: 8 Global Step: 171260 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:34:16,222-Speed 2497.36 samples/sec Loss 4.0636 LearningRate 0.000777 Epoch: 8 Global Step: 171270 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:34:24,421-Speed 2498.24 samples/sec Loss 4.0688 LearningRate 0.000777 Epoch: 8 Global Step: 171280 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:34:32,630-Speed 2495.32 samples/sec Loss 4.1179 LearningRate 0.000777 Epoch: 8 Global Step: 171290 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:34:40,828-Speed 2498.79 samples/sec Loss 4.1187 LearningRate 0.000777 Epoch: 8 Global Step: 171300 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:34:48,974-Speed 2514.35 samples/sec Loss 4.0810 LearningRate 0.000777 Epoch: 8 Global Step: 171310 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:34:57,181-Speed 2496.07 samples/sec Loss 4.0587 LearningRate 0.000777 Epoch: 8 Global Step: 171320 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:35:05,374-Speed 2499.79 samples/sec Loss 4.0553 LearningRate 0.000777 Epoch: 8 Global Step: 171330 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:35:13,572-Speed 2498.53 samples/sec Loss 4.0791 LearningRate 0.000777 Epoch: 8 Global Step: 171340 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:35:21,772-Speed 2498.51 samples/sec Loss 4.0779 LearningRate 0.000777 Epoch: 8 Global Step: 171350 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:35:29,972-Speed 2497.99 samples/sec Loss 4.1305 LearningRate 0.000777 Epoch: 8 Global Step: 171360 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:35:38,123-Speed 2512.85 samples/sec Loss 4.0327 LearningRate 0.000777 Epoch: 8 Global Step: 171370 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:35:46,325-Speed 2497.42 samples/sec Loss 4.1055 
LearningRate 0.000777 Epoch: 8 Global Step: 171380 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:35:54,523-Speed 2498.50 samples/sec Loss 4.1373 LearningRate 0.000777 Epoch: 8 Global Step: 171390 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:36:02,744-Speed 2491.58 samples/sec Loss 4.1218 LearningRate 0.000777 Epoch: 8 Global Step: 171400 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:36:10,942-Speed 2498.58 samples/sec Loss 4.0646 LearningRate 0.000777 Epoch: 8 Global Step: 171410 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:36:19,142-Speed 2498.03 samples/sec Loss 4.0266 LearningRate 0.000777 Epoch: 8 Global Step: 171420 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:36:27,284-Speed 2515.74 samples/sec Loss 4.1358 LearningRate 0.000777 Epoch: 8 Global Step: 171430 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:36:35,481-Speed 2498.82 samples/sec Loss 4.0807 LearningRate 0.000777 Epoch: 8 Global Step: 171440 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:36:43,680-Speed 2498.50 samples/sec Loss 4.1301 LearningRate 0.000777 Epoch: 8 Global Step: 171450 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:36:51,875-Speed 2499.57 samples/sec Loss 4.0962 LearningRate 0.000777 Epoch: 8 Global Step: 171460 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:37:00,076-Speed 2497.57 samples/sec Loss 4.0720 LearningRate 0.000777 Epoch: 8 Global Step: 171470 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:37:08,272-Speed 2499.29 samples/sec Loss 4.0825 LearningRate 0.000777 Epoch: 8 Global Step: 171480 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:37:16,417-Speed 2515.11 samples/sec Loss 4.0988 LearningRate 0.000777 Epoch: 8 Global Step: 171490 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:37:24,616-Speed 2498.27 samples/sec Loss 4.0683 
LearningRate 0.000777 Epoch: 8 Global Step: 171500 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:37:32,813-Speed 2498.72 samples/sec Loss 4.0337 LearningRate 0.000777 Epoch: 8 Global Step: 171510 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:37:41,015-Speed 2497.34 samples/sec Loss 4.0574 LearningRate 0.000777 Epoch: 8 Global Step: 171520 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:37:49,214-Speed 2498.43 samples/sec Loss 4.1672 LearningRate 0.000777 Epoch: 8 Global Step: 171530 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:37:57,412-Speed 2498.94 samples/sec Loss 4.0873 LearningRate 0.000777 Epoch: 8 Global Step: 171540 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:38:05,561-Speed 2513.63 samples/sec Loss 4.1181 LearningRate 0.000777 Epoch: 8 Global Step: 171550 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:38:13,757-Speed 2499.01 samples/sec Loss 4.0536 LearningRate 0.000777 Epoch: 8 Global Step: 171560 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:38:21,954-Speed 2499.15 samples/sec Loss 4.0818 LearningRate 0.000777 Epoch: 8 Global Step: 171570 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:38:30,150-Speed 2499.40 samples/sec Loss 4.0746 LearningRate 0.000777 Epoch: 8 Global Step: 171580 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:38:38,350-Speed 2498.11 samples/sec Loss 4.0924 LearningRate 0.000777 Epoch: 8 Global Step: 171590 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:38:46,549-Speed 2498.23 samples/sec Loss 4.0917 LearningRate 0.000777 Epoch: 8 Global Step: 171600 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:38:54,696-Speed 2514.23 samples/sec Loss 4.1243 LearningRate 0.000777 Epoch: 8 Global Step: 171610 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:39:02,902-Speed 2496.04 samples/sec Loss 4.1196 
LearningRate 0.000777 Epoch: 8 Global Step: 171620 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:39:11,104-Speed 2497.23 samples/sec Loss 4.0727 LearningRate 0.000777 Epoch: 8 Global Step: 171630 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:39:19,317-Speed 2494.02 samples/sec Loss 4.1175 LearningRate 0.000777 Epoch: 8 Global Step: 171640 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:39:27,517-Speed 2497.88 samples/sec Loss 4.1958 LearningRate 0.000777 Epoch: 8 Global Step: 171650 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:39:35,727-Speed 2495.50 samples/sec Loss 4.1112 LearningRate 0.000776 Epoch: 8 Global Step: 171660 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:39:43,874-Speed 2514.15 samples/sec Loss 4.0725 LearningRate 0.000776 Epoch: 8 Global Step: 171670 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:39:52,069-Speed 2499.39 samples/sec Loss 4.0938 LearningRate 0.000776 Epoch: 8 Global Step: 171680 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:40:00,272-Speed 2497.38 samples/sec Loss 4.1151 LearningRate 0.000776 Epoch: 8 Global Step: 171690 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:40:08,472-Speed 2497.88 samples/sec Loss 4.0432 LearningRate 0.000776 Epoch: 8 Global Step: 171700 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:40:16,669-Speed 2498.83 samples/sec Loss 4.2220 LearningRate 0.000776 Epoch: 8 Global Step: 171710 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:40:24,867-Speed 2498.87 samples/sec Loss 4.1368 LearningRate 0.000776 Epoch: 8 Global Step: 171720 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:40:33,014-Speed 2514.41 samples/sec Loss 4.0776 LearningRate 0.000776 Epoch: 8 Global Step: 171730 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:40:41,217-Speed 2497.19 samples/sec Loss 4.0865 
LearningRate 0.000776 Epoch: 8 Global Step: 171740 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:40:49,413-Speed 2499.24 samples/sec Loss 4.1699 LearningRate 0.000776 Epoch: 8 Global Step: 171750 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:40:57,612-Speed 2497.88 samples/sec Loss 4.0413 LearningRate 0.000776 Epoch: 8 Global Step: 171760 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:41:05,826-Speed 2493.80 samples/sec Loss 4.0812 LearningRate 0.000776 Epoch: 8 Global Step: 171770 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:41:14,024-Speed 2498.72 samples/sec Loss 4.0861 LearningRate 0.000776 Epoch: 8 Global Step: 171780 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:41:22,175-Speed 2512.89 samples/sec Loss 4.0520 LearningRate 0.000776 Epoch: 8 Global Step: 171790 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:41:30,374-Speed 2498.63 samples/sec Loss 4.1251 LearningRate 0.000776 Epoch: 8 Global Step: 171800 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:41:38,573-Speed 2498.31 samples/sec Loss 4.0648 LearningRate 0.000776 Epoch: 8 Global Step: 171810 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:41:46,772-Speed 2498.19 samples/sec Loss 4.0691 LearningRate 0.000776 Epoch: 8 Global Step: 171820 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:41:54,969-Speed 2499.00 samples/sec Loss 4.0130 LearningRate 0.000776 Epoch: 8 Global Step: 171830 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:42:03,168-Speed 2498.19 samples/sec Loss 4.0895 LearningRate 0.000776 Epoch: 8 Global Step: 171840 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:42:11,311-Speed 2515.58 samples/sec Loss 4.0953 LearningRate 0.000776 Epoch: 8 Global Step: 171850 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:42:19,511-Speed 2497.92 samples/sec Loss 4.0948 
LearningRate 0.000776 Epoch: 8 Global Step: 171860 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:42:27,713-Speed 2497.25 samples/sec Loss 4.0472 LearningRate 0.000776 Epoch: 8 Global Step: 171870 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:42:35,917-Speed 2497.20 samples/sec Loss 4.0250 LearningRate 0.000776 Epoch: 8 Global Step: 171880 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:42:44,130-Speed 2493.80 samples/sec Loss 4.1041 LearningRate 0.000776 Epoch: 8 Global Step: 171890 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:42:52,331-Speed 2497.67 samples/sec Loss 4.1448 LearningRate 0.000776 Epoch: 8 Global Step: 171900 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:43:00,478-Speed 2514.46 samples/sec Loss 4.0497 LearningRate 0.000776 Epoch: 8 Global Step: 171910 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:43:08,679-Speed 2497.71 samples/sec Loss 4.0280 LearningRate 0.000776 Epoch: 8 Global Step: 171920 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:43:16,886-Speed 2495.67 samples/sec Loss 4.0064 LearningRate 0.000776 Epoch: 8 Global Step: 171930 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:43:25,085-Speed 2498.79 samples/sec Loss 4.0843 LearningRate 0.000776 Epoch: 8 Global Step: 171940 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:43:33,283-Speed 2498.53 samples/sec Loss 4.0552 LearningRate 0.000776 Epoch: 8 Global Step: 171950 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:43:41,482-Speed 2498.09 samples/sec Loss 4.0529 LearningRate 0.000776 Epoch: 8 Global Step: 171960 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:43:49,635-Speed 2512.47 samples/sec Loss 4.0949 LearningRate 0.000776 Epoch: 8 Global Step: 171970 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:43:57,835-Speed 2497.96 samples/sec Loss 4.1034 
LearningRate 0.000776 Epoch: 8 Global Step: 171980 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:44:06,032-Speed 2498.68 samples/sec Loss 4.0765 LearningRate 0.000776 Epoch: 8 Global Step: 171990 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:44:14,232-Speed 2498.45 samples/sec Loss 4.0786 LearningRate 0.000776 Epoch: 8 Global Step: 172000 Fp16 Grad Scale: 65536 Required: 150 hours Training: 2022-07-07 05:44:22,390-Speed 2510.80 samples/sec Loss 4.1124 LearningRate 0.000776 Epoch: 8 Global Step: 172010 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:44:30,595-Speed 2496.41 samples/sec Loss 4.0364 LearningRate 0.000776 Epoch: 8 Global Step: 172020 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:44:38,745-Speed 2513.48 samples/sec Loss 4.1523 LearningRate 0.000776 Epoch: 8 Global Step: 172030 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:44:46,940-Speed 2499.56 samples/sec Loss 4.1551 LearningRate 0.000776 Epoch: 8 Global Step: 172040 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:44:55,144-Speed 2496.69 samples/sec Loss 4.0952 LearningRate 0.000776 Epoch: 8 Global Step: 172050 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:45:03,348-Speed 2496.88 samples/sec Loss 4.0514 LearningRate 0.000776 Epoch: 8 Global Step: 172060 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:45:11,548-Speed 2498.05 samples/sec Loss 4.1415 LearningRate 0.000776 Epoch: 8 Global Step: 172070 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:45:19,747-Speed 2498.25 samples/sec Loss 4.0942 LearningRate 0.000776 Epoch: 8 Global Step: 172080 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:45:27,907-Speed 2510.32 samples/sec Loss 4.1506 LearningRate 0.000775 Epoch: 8 Global Step: 172090 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:45:36,110-Speed 2497.16 samples/sec Loss 4.0974 
LearningRate 0.000775 Epoch: 8 Global Step: 172100 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:45:44,310-Speed 2497.67 samples/sec Loss 4.2021 LearningRate 0.000775 Epoch: 8 Global Step: 172110 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:45:52,513-Speed 2497.05 samples/sec Loss 4.2343 LearningRate 0.000775 Epoch: 8 Global Step: 172120 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:46:00,713-Speed 2497.88 samples/sec Loss 4.2159 LearningRate 0.000775 Epoch: 8 Global Step: 172130 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:46:08,919-Speed 2496.30 samples/sec Loss 4.0984 LearningRate 0.000775 Epoch: 8 Global Step: 172140 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:46:17,066-Speed 2514.46 samples/sec Loss 4.0992 LearningRate 0.000775 Epoch: 8 Global Step: 172150 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:46:25,266-Speed 2497.79 samples/sec Loss 4.1322 LearningRate 0.000775 Epoch: 8 Global Step: 172160 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:46:33,476-Speed 2495.08 samples/sec Loss 4.1812 LearningRate 0.000775 Epoch: 8 Global Step: 172170 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:46:41,675-Speed 2498.22 samples/sec Loss 4.1648 LearningRate 0.000775 Epoch: 8 Global Step: 172180 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:46:49,878-Speed 2497.18 samples/sec Loss 4.1988 LearningRate 0.000775 Epoch: 8 Global Step: 172190 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:46:58,085-Speed 2495.80 samples/sec Loss 4.0997 LearningRate 0.000775 Epoch: 8 Global Step: 172200 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:47:06,233-Speed 2513.72 samples/sec Loss 4.0395 LearningRate 0.000775 Epoch: 8 Global Step: 172210 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:47:14,440-Speed 2495.77 samples/sec Loss 4.0467 
LearningRate 0.000775 Epoch: 8 Global Step: 172220 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:47:22,645-Speed 2496.40 samples/sec Loss 4.1019 LearningRate 0.000775 Epoch: 8 Global Step: 172230 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:47:30,853-Speed 2495.60 samples/sec Loss 4.1125 LearningRate 0.000775 Epoch: 8 Global Step: 172240 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:47:39,052-Speed 2498.16 samples/sec Loss 4.1113 LearningRate 0.000775 Epoch: 8 Global Step: 172250 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:47:47,262-Speed 2495.06 samples/sec Loss 4.1407 LearningRate 0.000775 Epoch: 8 Global Step: 172260 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:47:55,406-Speed 2515.08 samples/sec Loss 4.1093 LearningRate 0.000775 Epoch: 8 Global Step: 172270 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:48:03,632-Speed 2490.12 samples/sec Loss 4.0972 LearningRate 0.000775 Epoch: 8 Global Step: 172280 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:48:11,850-Speed 2492.36 samples/sec Loss 4.0113 LearningRate 0.000775 Epoch: 8 Global Step: 172290 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:48:20,048-Speed 2498.83 samples/sec Loss 4.1220 LearningRate 0.000775 Epoch: 8 Global Step: 172300 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:48:28,247-Speed 2498.23 samples/sec Loss 4.1670 LearningRate 0.000775 Epoch: 8 Global Step: 172310 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:48:36,444-Speed 2498.82 samples/sec Loss 4.0715 LearningRate 0.000775 Epoch: 8 Global Step: 172320 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:48:44,594-Speed 2513.16 samples/sec Loss 4.0565 LearningRate 0.000775 Epoch: 8 Global Step: 172330 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:48:52,791-Speed 2498.82 samples/sec Loss 4.0863 
LearningRate 0.000775 Epoch: 8 Global Step: 172340 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:49:00,992-Speed 2497.87 samples/sec Loss 4.0569 LearningRate 0.000775 Epoch: 8 Global Step: 172350 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:49:09,194-Speed 2497.27 samples/sec Loss 4.1012 LearningRate 0.000775 Epoch: 8 Global Step: 172360 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:49:17,390-Speed 2499.16 samples/sec Loss 4.1440 LearningRate 0.000775 Epoch: 8 Global Step: 172370 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:49:25,589-Speed 2498.33 samples/sec Loss 4.1006 LearningRate 0.000775 Epoch: 8 Global Step: 172380 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:49:33,736-Speed 2513.99 samples/sec Loss 4.0701 LearningRate 0.000775 Epoch: 8 Global Step: 172390 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:49:41,935-Speed 2498.65 samples/sec Loss 4.1059 LearningRate 0.000775 Epoch: 8 Global Step: 172400 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:49:50,137-Speed 2497.41 samples/sec Loss 4.0926 LearningRate 0.000775 Epoch: 8 Global Step: 172410 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:49:58,334-Speed 2499.34 samples/sec Loss 4.0358 LearningRate 0.000775 Epoch: 8 Global Step: 172420 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:50:06,552-Speed 2492.43 samples/sec Loss 4.0890 LearningRate 0.000775 Epoch: 8 Global Step: 172430 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:50:14,750-Speed 2498.43 samples/sec Loss 4.0854 LearningRate 0.000775 Epoch: 8 Global Step: 172440 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:50:23,727-Speed 2281.64 samples/sec Loss 4.0757 LearningRate 0.000775 Epoch: 8 Global Step: 172450 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:50:32,771-Speed 2494.28 samples/sec Loss 4.0625 
LearningRate 0.000775 Epoch: 8 Global Step: 172460 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:50:40,966-Speed 2499.52 samples/sec Loss 4.1893 LearningRate 0.000775 Epoch: 8 Global Step: 172470 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:50:49,163-Speed 2498.87 samples/sec Loss 4.1455 LearningRate 0.000775 Epoch: 8 Global Step: 172480 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:50:57,358-Speed 2499.55 samples/sec Loss 4.0919 LearningRate 0.000775 Epoch: 8 Global Step: 172490 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:51:05,557-Speed 2498.46 samples/sec Loss 4.1273 LearningRate 0.000775 Epoch: 8 Global Step: 172500 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:51:13,702-Speed 2514.52 samples/sec Loss 4.0464 LearningRate 0.000774 Epoch: 8 Global Step: 172510 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:51:21,900-Speed 2498.65 samples/sec Loss 4.1255 LearningRate 0.000774 Epoch: 8 Global Step: 172520 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:51:30,094-Speed 2499.55 samples/sec Loss 4.0816 LearningRate 0.000774 Epoch: 8 Global Step: 172530 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:51:38,296-Speed 2497.68 samples/sec Loss 4.1065 LearningRate 0.000774 Epoch: 8 Global Step: 172540 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:51:46,494-Speed 2498.49 samples/sec Loss 4.0856 LearningRate 0.000774 Epoch: 8 Global Step: 172550 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:51:54,693-Speed 2498.59 samples/sec Loss 4.0488 LearningRate 0.000774 Epoch: 8 Global Step: 172560 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:52:02,843-Speed 2513.34 samples/sec Loss 4.0707 LearningRate 0.000774 Epoch: 8 Global Step: 172570 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:52:11,040-Speed 2498.95 samples/sec Loss 4.0243 
LearningRate 0.000774 Epoch: 8 Global Step: 172580 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:52:19,240-Speed 2497.94 samples/sec Loss 4.0379 LearningRate 0.000774 Epoch: 8 Global Step: 172590 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:52:27,438-Speed 2498.63 samples/sec Loss 4.0224 LearningRate 0.000774 Epoch: 8 Global Step: 172600 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:52:35,642-Speed 2496.87 samples/sec Loss 4.0657 LearningRate 0.000774 Epoch: 8 Global Step: 172610 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:52:43,842-Speed 2497.88 samples/sec Loss 4.1358 LearningRate 0.000774 Epoch: 8 Global Step: 172620 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:52:51,984-Speed 2515.54 samples/sec Loss 4.0152 LearningRate 0.000774 Epoch: 8 Global Step: 172630 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:53:00,177-Speed 2500.53 samples/sec Loss 4.0519 LearningRate 0.000774 Epoch: 8 Global Step: 172640 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:53:08,375-Speed 2498.50 samples/sec Loss 4.0803 LearningRate 0.000774 Epoch: 8 Global Step: 172650 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:53:16,569-Speed 2499.83 samples/sec Loss 4.0666 LearningRate 0.000774 Epoch: 8 Global Step: 172660 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:53:24,781-Speed 2494.38 samples/sec Loss 3.9645 LearningRate 0.000774 Epoch: 8 Global Step: 172670 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:53:32,979-Speed 2498.65 samples/sec Loss 4.0436 LearningRate 0.000774 Epoch: 8 Global Step: 172680 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:53:41,123-Speed 2515.17 samples/sec Loss 4.0470 LearningRate 0.000774 Epoch: 8 Global Step: 172690 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:53:49,321-Speed 2498.59 samples/sec Loss 4.1558 
LearningRate 0.000774 Epoch: 8 Global Step: 172700 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:53:57,521-Speed 2498.14 samples/sec Loss 4.0289 LearningRate 0.000774 Epoch: 8 Global Step: 172710 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:54:05,730-Speed 2495.25 samples/sec Loss 4.1256 LearningRate 0.000774 Epoch: 8 Global Step: 172720 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:54:13,930-Speed 2497.95 samples/sec Loss 4.1300 LearningRate 0.000774 Epoch: 8 Global Step: 172730 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:54:22,131-Speed 2497.77 samples/sec Loss 4.1772 LearningRate 0.000774 Epoch: 8 Global Step: 172740 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:54:30,283-Speed 2512.41 samples/sec Loss 4.0333 LearningRate 0.000774 Epoch: 8 Global Step: 172750 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:54:38,482-Speed 2498.25 samples/sec Loss 4.0660 LearningRate 0.000774 Epoch: 8 Global Step: 172760 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:54:46,680-Speed 2498.53 samples/sec Loss 4.0868 LearningRate 0.000774 Epoch: 8 Global Step: 172770 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:54:54,880-Speed 2498.20 samples/sec Loss 4.0524 LearningRate 0.000774 Epoch: 8 Global Step: 172780 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:55:03,081-Speed 2497.54 samples/sec Loss 4.0846 LearningRate 0.000774 Epoch: 8 Global Step: 172790 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:55:11,282-Speed 2497.70 samples/sec Loss 4.0663 LearningRate 0.000774 Epoch: 8 Global Step: 172800 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:55:19,431-Speed 2513.67 samples/sec Loss 4.1063 LearningRate 0.000774 Epoch: 8 Global Step: 172810 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:55:27,645-Speed 2493.72 samples/sec Loss 4.1429 
LearningRate 0.000774 Epoch: 8 Global Step: 172820 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:55:35,848-Speed 2497.13 samples/sec Loss 4.1450 LearningRate 0.000774 Epoch: 8 Global Step: 172830 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:55:44,058-Speed 2494.73 samples/sec Loss 4.0232 LearningRate 0.000774 Epoch: 8 Global Step: 172840 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:55:52,263-Speed 2496.64 samples/sec Loss 4.0893 LearningRate 0.000774 Epoch: 8 Global Step: 172850 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:56:00,463-Speed 2497.69 samples/sec Loss 4.1955 LearningRate 0.000774 Epoch: 8 Global Step: 172860 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:56:08,611-Speed 2513.93 samples/sec Loss 4.1813 LearningRate 0.000774 Epoch: 8 Global Step: 172870 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:56:16,814-Speed 2497.32 samples/sec Loss 4.0867 LearningRate 0.000774 Epoch: 8 Global Step: 172880 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:56:25,013-Speed 2498.12 samples/sec Loss 4.0740 LearningRate 0.000774 Epoch: 8 Global Step: 172890 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:56:33,218-Speed 2496.31 samples/sec Loss 4.0908 LearningRate 0.000774 Epoch: 8 Global Step: 172900 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:56:41,423-Speed 2496.54 samples/sec Loss 4.1405 LearningRate 0.000774 Epoch: 8 Global Step: 172910 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:56:49,621-Speed 2498.73 samples/sec Loss 4.1115 LearningRate 0.000774 Epoch: 8 Global Step: 172920 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:56:57,771-Speed 2513.31 samples/sec Loss 4.0353 LearningRate 0.000774 Epoch: 8 Global Step: 172930 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:57:05,976-Speed 2496.21 samples/sec Loss 4.0996 
LearningRate 0.000773 Epoch: 8 Global Step: 172940 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:57:14,173-Speed 2498.91 samples/sec Loss 4.1219 LearningRate 0.000773 Epoch: 8 Global Step: 172950 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:57:22,371-Speed 2498.81 samples/sec Loss 4.0316 LearningRate 0.000773 Epoch: 8 Global Step: 172960 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:57:30,569-Speed 2498.46 samples/sec Loss 4.1064 LearningRate 0.000773 Epoch: 8 Global Step: 172970 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:57:38,767-Speed 2498.90 samples/sec Loss 4.0405 LearningRate 0.000773 Epoch: 8 Global Step: 172980 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:57:46,911-Speed 2515.05 samples/sec Loss 4.0527 LearningRate 0.000773 Epoch: 8 Global Step: 172990 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:57:55,110-Speed 2498.40 samples/sec Loss 4.1037 LearningRate 0.000773 Epoch: 8 Global Step: 173000 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 05:58:03,264-Speed 2512.11 samples/sec Loss 4.0902 LearningRate 0.000773 Epoch: 8 Global Step: 173010 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 05:58:11,461-Speed 2500.15 samples/sec Loss 4.0150 LearningRate 0.000773 Epoch: 8 Global Step: 173020 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 05:58:19,660-Speed 2498.23 samples/sec Loss 4.0572 LearningRate 0.000773 Epoch: 8 Global Step: 173030 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 05:58:27,859-Speed 2498.33 samples/sec Loss 4.0706 LearningRate 0.000773 Epoch: 8 Global Step: 173040 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 05:58:36,003-Speed 2515.14 samples/sec Loss 4.0497 LearningRate 0.000773 Epoch: 8 Global Step: 173050 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 05:58:44,208-Speed 2496.63 samples/sec Loss 4.1010 
LearningRate 0.000773 Epoch: 8 Global Step: 173060 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 05:58:52,408-Speed 2497.94 samples/sec Loss 4.0922 LearningRate 0.000773 Epoch: 8 Global Step: 173070 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 05:59:00,606-Speed 2498.41 samples/sec Loss 4.0643 LearningRate 0.000773 Epoch: 8 Global Step: 173080 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 05:59:08,809-Speed 2497.18 samples/sec Loss 4.0861 LearningRate 0.000773 Epoch: 8 Global Step: 173090 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 05:59:17,007-Speed 2498.55 samples/sec Loss 4.0076 LearningRate 0.000773 Epoch: 8 Global Step: 173100 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 05:59:25,161-Speed 2512.11 samples/sec Loss 4.1384 LearningRate 0.000773 Epoch: 8 Global Step: 173110 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 05:59:33,358-Speed 2498.86 samples/sec Loss 4.0864 LearningRate 0.000773 Epoch: 8 Global Step: 173120 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 05:59:41,568-Speed 2495.02 samples/sec Loss 4.0513 LearningRate 0.000773 Epoch: 8 Global Step: 173130 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 05:59:49,767-Speed 2498.22 samples/sec Loss 4.0934 LearningRate 0.000773 Epoch: 8 Global Step: 173140 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 05:59:57,966-Speed 2498.53 samples/sec Loss 4.0633 LearningRate 0.000773 Epoch: 8 Global Step: 173150 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:00:06,162-Speed 2498.78 samples/sec Loss 4.1242 LearningRate 0.000773 Epoch: 8 Global Step: 173160 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:00:14,311-Speed 2513.73 samples/sec Loss 4.0318 LearningRate 0.000773 Epoch: 8 Global Step: 173170 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:00:22,505-Speed 2499.87 samples/sec Loss 4.0709 
LearningRate 0.000773 Epoch: 8 Global Step: 173180 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:00:30,703-Speed 2498.51 samples/sec Loss 4.0693 LearningRate 0.000773 Epoch: 8 Global Step: 173190 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:00:38,900-Speed 2499.19 samples/sec Loss 4.1181 LearningRate 0.000773 Epoch: 8 Global Step: 173200 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:00:47,111-Speed 2494.53 samples/sec Loss 4.0899 LearningRate 0.000773 Epoch: 8 Global Step: 173210 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:00:55,308-Speed 2499.07 samples/sec Loss 3.9811 LearningRate 0.000773 Epoch: 8 Global Step: 173220 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:01:03,454-Speed 2514.53 samples/sec Loss 4.1167 LearningRate 0.000773 Epoch: 8 Global Step: 173230 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:01:11,655-Speed 2497.59 samples/sec Loss 4.1296 LearningRate 0.000773 Epoch: 8 Global Step: 173240 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:01:19,860-Speed 2496.89 samples/sec Loss 4.0122 LearningRate 0.000773 Epoch: 8 Global Step: 173250 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:01:28,060-Speed 2498.14 samples/sec Loss 4.1538 LearningRate 0.000773 Epoch: 8 Global Step: 173260 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:01:36,257-Speed 2498.70 samples/sec Loss 4.1688 LearningRate 0.000773 Epoch: 8 Global Step: 173270 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:01:44,464-Speed 2495.81 samples/sec Loss 4.1137 LearningRate 0.000773 Epoch: 8 Global Step: 173280 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:01:52,609-Speed 2514.94 samples/sec Loss 4.0949 LearningRate 0.000773 Epoch: 8 Global Step: 173290 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:02:00,812-Speed 2497.26 samples/sec Loss 4.0865 
LearningRate 0.000773 Epoch: 8 Global Step: 173300 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:02:09,014-Speed 2497.67 samples/sec Loss 4.1069 LearningRate 0.000773 Epoch: 8 Global Step: 173310 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:02:17,211-Speed 2498.78 samples/sec Loss 4.0828 LearningRate 0.000773 Epoch: 8 Global Step: 173320 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:02:25,409-Speed 2499.00 samples/sec Loss 4.1046 LearningRate 0.000773 Epoch: 8 Global Step: 173330 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:02:33,606-Speed 2499.00 samples/sec Loss 4.0587 LearningRate 0.000773 Epoch: 8 Global Step: 173340 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:02:41,748-Speed 2515.56 samples/sec Loss 4.0614 LearningRate 0.000773 Epoch: 8 Global Step: 173350 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:02:49,948-Speed 2498.22 samples/sec Loss 4.0637 LearningRate 0.000772 Epoch: 8 Global Step: 173360 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:02:58,161-Speed 2493.79 samples/sec Loss 4.1653 LearningRate 0.000772 Epoch: 8 Global Step: 173370 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:03:06,357-Speed 2499.37 samples/sec Loss 4.1466 LearningRate 0.000772 Epoch: 8 Global Step: 173380 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:03:14,554-Speed 2498.56 samples/sec Loss 4.0665 LearningRate 0.000772 Epoch: 8 Global Step: 173390 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:03:22,754-Speed 2498.19 samples/sec Loss 4.1159 LearningRate 0.000772 Epoch: 8 Global Step: 173400 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:03:30,898-Speed 2515.01 samples/sec Loss 4.1191 LearningRate 0.000772 Epoch: 8 Global Step: 173410 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:03:39,100-Speed 2497.50 samples/sec Loss 4.0157 
LearningRate 0.000772 Epoch: 8 Global Step: 173420 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:03:47,297-Speed 2498.61 samples/sec Loss 4.0790 LearningRate 0.000772 Epoch: 8 Global Step: 173430 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:03:55,506-Speed 2495.37 samples/sec Loss 4.1109 LearningRate 0.000772 Epoch: 8 Global Step: 173440 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:04:03,718-Speed 2494.58 samples/sec Loss 4.1289 LearningRate 0.000772 Epoch: 8 Global Step: 173450 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:04:11,919-Speed 2497.63 samples/sec Loss 4.0835 LearningRate 0.000772 Epoch: 8 Global Step: 173460 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:04:20,065-Speed 2514.55 samples/sec Loss 4.0337 LearningRate 0.000772 Epoch: 8 Global Step: 173470 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:04:28,263-Speed 2498.72 samples/sec Loss 4.0516 LearningRate 0.000772 Epoch: 8 Global Step: 173480 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:04:36,460-Speed 2498.67 samples/sec Loss 4.0822 LearningRate 0.000772 Epoch: 8 Global Step: 173490 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:04:44,664-Speed 2496.90 samples/sec Loss 4.0917 LearningRate 0.000772 Epoch: 8 Global Step: 173500 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:04:52,866-Speed 2497.31 samples/sec Loss 4.0676 LearningRate 0.000772 Epoch: 8 Global Step: 173510 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:05:01,064-Speed 2498.53 samples/sec Loss 4.0713 LearningRate 0.000772 Epoch: 8 Global Step: 173520 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:05:09,208-Speed 2515.29 samples/sec Loss 4.0549 LearningRate 0.000772 Epoch: 8 Global Step: 173530 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:05:17,403-Speed 2499.21 samples/sec Loss 4.0826 
LearningRate 0.000772 Epoch: 8 Global Step: 173540 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:05:25,603-Speed 2498.14 samples/sec Loss 4.0524 LearningRate 0.000772 Epoch: 8 Global Step: 173550 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:05:33,802-Speed 2498.27 samples/sec Loss 4.1268 LearningRate 0.000772 Epoch: 8 Global Step: 173560 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:05:42,004-Speed 2497.54 samples/sec Loss 4.0805 LearningRate 0.000772 Epoch: 8 Global Step: 173570 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:05:50,213-Speed 2494.90 samples/sec Loss 4.1072 LearningRate 0.000772 Epoch: 8 Global Step: 173580 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:05:58,355-Speed 2515.70 samples/sec Loss 4.0251 LearningRate 0.000772 Epoch: 8 Global Step: 173590 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:06:06,553-Speed 2498.59 samples/sec Loss 4.0204 LearningRate 0.000772 Epoch: 8 Global Step: 173600 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:06:14,761-Speed 2495.73 samples/sec Loss 3.9950 LearningRate 0.000772 Epoch: 8 Global Step: 173610 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:06:22,961-Speed 2498.11 samples/sec Loss 4.0609 LearningRate 0.000772 Epoch: 8 Global Step: 173620 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:06:31,168-Speed 2495.56 samples/sec Loss 4.0768 LearningRate 0.000772 Epoch: 8 Global Step: 173630 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:06:39,369-Speed 2497.74 samples/sec Loss 4.0486 LearningRate 0.000772 Epoch: 8 Global Step: 173640 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:06:47,515-Speed 2514.51 samples/sec Loss 4.2342 LearningRate 0.000772 Epoch: 8 Global Step: 173650 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:06:55,712-Speed 2498.87 samples/sec Loss 4.1149 
LearningRate 0.000772 Epoch: 8 Global Step: 173660 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:07:03,909-Speed 2498.82 samples/sec Loss 4.0689 LearningRate 0.000772 Epoch: 8 Global Step: 173670 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:07:12,107-Speed 2499.00 samples/sec Loss 4.0616 LearningRate 0.000772 Epoch: 8 Global Step: 173680 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:07:20,302-Speed 2499.26 samples/sec Loss 4.0789 LearningRate 0.000772 Epoch: 8 Global Step: 173690 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:07:28,502-Speed 2498.14 samples/sec Loss 4.0985 LearningRate 0.000772 Epoch: 8 Global Step: 173700 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:07:36,646-Speed 2514.92 samples/sec Loss 4.0566 LearningRate 0.000772 Epoch: 8 Global Step: 173710 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:07:44,841-Speed 2499.53 samples/sec Loss 4.0225 LearningRate 0.000772 Epoch: 8 Global Step: 173720 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:07:53,042-Speed 2497.90 samples/sec Loss 4.0168 LearningRate 0.000772 Epoch: 8 Global Step: 173730 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:08:01,253-Speed 2494.51 samples/sec Loss 4.0160 LearningRate 0.000772 Epoch: 8 Global Step: 173740 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:08:09,452-Speed 2498.19 samples/sec Loss 4.0975 LearningRate 0.000772 Epoch: 8 Global Step: 173750 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:08:17,652-Speed 2497.95 samples/sec Loss 4.1398 LearningRate 0.000772 Epoch: 8 Global Step: 173760 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:08:25,798-Speed 2514.70 samples/sec Loss 4.0531 LearningRate 0.000772 Epoch: 8 Global Step: 173770 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:08:33,994-Speed 2499.23 samples/sec Loss 4.0429 
LearningRate 0.000772 Epoch: 8 Global Step: 173780 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:08:42,190-Speed 2499.08 samples/sec Loss 4.0606 LearningRate 0.000771 Epoch: 8 Global Step: 173790 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:08:50,385-Speed 2499.84 samples/sec Loss 4.0726 LearningRate 0.000771 Epoch: 8 Global Step: 173800 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:08:58,587-Speed 2497.50 samples/sec Loss 4.0705 LearningRate 0.000771 Epoch: 8 Global Step: 173810 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:09:06,784-Speed 2498.77 samples/sec Loss 4.0592 LearningRate 0.000771 Epoch: 8 Global Step: 173820 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:09:14,930-Speed 2514.72 samples/sec Loss 4.0351 LearningRate 0.000771 Epoch: 8 Global Step: 173830 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:09:23,143-Speed 2494.13 samples/sec Loss 4.1177 LearningRate 0.000771 Epoch: 8 Global Step: 173840 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:09:31,342-Speed 2498.14 samples/sec Loss 4.1022 LearningRate 0.000771 Epoch: 8 Global Step: 173850 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:09:39,540-Speed 2498.79 samples/sec Loss 4.0737 LearningRate 0.000771 Epoch: 8 Global Step: 173860 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:09:47,750-Speed 2495.16 samples/sec Loss 4.0013 LearningRate 0.000771 Epoch: 8 Global Step: 173870 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:09:55,954-Speed 2496.81 samples/sec Loss 4.0307 LearningRate 0.000771 Epoch: 8 Global Step: 173880 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:10:04,097-Speed 2515.26 samples/sec Loss 4.0852 LearningRate 0.000771 Epoch: 8 Global Step: 173890 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:10:12,299-Speed 2497.58 samples/sec Loss 4.0654 
LearningRate 0.000771 Epoch: 8 Global Step: 173900 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:10:20,495-Speed 2499.17 samples/sec Loss 4.1348 LearningRate 0.000771 Epoch: 8 Global Step: 173910 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:10:28,693-Speed 2498.71 samples/sec Loss 4.0589 LearningRate 0.000771 Epoch: 8 Global Step: 173920 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:10:36,894-Speed 2497.80 samples/sec Loss 4.0460 LearningRate 0.000771 Epoch: 8 Global Step: 173930 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:10:45,098-Speed 2497.09 samples/sec Loss 4.0556 LearningRate 0.000771 Epoch: 8 Global Step: 173940 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:10:53,245-Speed 2514.30 samples/sec Loss 4.0607 LearningRate 0.000771 Epoch: 8 Global Step: 173950 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:11:01,441-Speed 2499.15 samples/sec Loss 4.0887 LearningRate 0.000771 Epoch: 8 Global Step: 173960 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:11:09,638-Speed 2498.84 samples/sec Loss 4.0456 LearningRate 0.000771 Epoch: 8 Global Step: 173970 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:11:17,835-Speed 2498.92 samples/sec Loss 4.1094 LearningRate 0.000771 Epoch: 8 Global Step: 173980 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:11:26,029-Speed 2499.66 samples/sec Loss 4.1377 LearningRate 0.000771 Epoch: 8 Global Step: 173990 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:11:34,225-Speed 2499.21 samples/sec Loss 4.1633 LearningRate 0.000771 Epoch: 8 Global Step: 174000 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:11:42,376-Speed 2512.97 samples/sec Loss 4.0584 LearningRate 0.000771 Epoch: 8 Global Step: 174010 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:11:50,573-Speed 2498.75 samples/sec Loss 4.0126 
LearningRate 0.000771 Epoch: 8 Global Step: 174020 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:11:58,772-Speed 2498.48 samples/sec Loss 4.0795 LearningRate 0.000771 Epoch: 8 Global Step: 174030 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:12:06,969-Speed 2499.04 samples/sec Loss 3.9970 LearningRate 0.000771 Epoch: 8 Global Step: 174040 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:12:15,168-Speed 2498.17 samples/sec Loss 4.0499 LearningRate 0.000771 Epoch: 8 Global Step: 174050 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:12:23,367-Speed 2498.38 samples/sec Loss 4.0880 LearningRate 0.000771 Epoch: 8 Global Step: 174060 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:12:31,517-Speed 2513.41 samples/sec Loss 4.0745 LearningRate 0.000771 Epoch: 8 Global Step: 174070 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:12:39,718-Speed 2497.56 samples/sec Loss 4.0047 LearningRate 0.000771 Epoch: 8 Global Step: 174080 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:12:47,933-Speed 2493.47 samples/sec Loss 4.0187 LearningRate 0.000771 Epoch: 8 Global Step: 174090 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:12:56,134-Speed 2497.64 samples/sec Loss 4.0824 LearningRate 0.000771 Epoch: 8 Global Step: 174100 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:13:04,332-Speed 2498.58 samples/sec Loss 4.0433 LearningRate 0.000771 Epoch: 8 Global Step: 174110 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:13:12,533-Speed 2497.63 samples/sec Loss 4.1128 LearningRate 0.000771 Epoch: 8 Global Step: 174120 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:13:20,689-Speed 2511.32 samples/sec Loss 4.0656 LearningRate 0.000771 Epoch: 8 Global Step: 174130 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:13:28,886-Speed 2498.79 samples/sec Loss 4.0266 
LearningRate 0.000771 Epoch: 8 Global Step: 174140 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:13:37,090-Speed 2497.00 samples/sec Loss 4.0434 LearningRate 0.000771 Epoch: 8 Global Step: 174150 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:13:45,298-Speed 2495.56 samples/sec Loss 4.0613 LearningRate 0.000771 Epoch: 8 Global Step: 174160 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:13:53,495-Speed 2499.12 samples/sec Loss 4.1103 LearningRate 0.000771 Epoch: 8 Global Step: 174170 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:14:01,694-Speed 2498.19 samples/sec Loss 4.0650 LearningRate 0.000771 Epoch: 8 Global Step: 174180 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:14:09,841-Speed 2514.49 samples/sec Loss 4.0904 LearningRate 0.000771 Epoch: 8 Global Step: 174190 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:14:18,038-Speed 2498.62 samples/sec Loss 4.1334 LearningRate 0.000771 Epoch: 8 Global Step: 174200 Fp16 Grad Scale: 16384 Required: 150 hours Training: 2022-07-07 06:14:26,236-Speed 2498.45 samples/sec Loss 4.0592 LearningRate 0.000770 Epoch: 8 Global Step: 174210 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:14:34,433-Speed 2499.13 samples/sec Loss 4.0071 LearningRate 0.000770 Epoch: 8 Global Step: 174220 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:14:42,631-Speed 2498.65 samples/sec Loss 3.9628 LearningRate 0.000770 Epoch: 8 Global Step: 174230 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:14:50,829-Speed 2498.57 samples/sec Loss 3.9828 LearningRate 0.000770 Epoch: 8 Global Step: 174240 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:14:58,975-Speed 2514.51 samples/sec Loss 4.0413 LearningRate 0.000770 Epoch: 8 Global Step: 174250 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:15:07,177-Speed 2497.21 samples/sec Loss 3.9877 
LearningRate 0.000770 Epoch: 8 Global Step: 174260 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:15:15,374-Speed 2498.73 samples/sec Loss 4.0394 LearningRate 0.000770 Epoch: 8 Global Step: 174270 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:15:23,589-Speed 2493.74 samples/sec Loss 4.0006 LearningRate 0.000770 Epoch: 8 Global Step: 174280 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:15:31,791-Speed 2497.41 samples/sec Loss 4.0408 LearningRate 0.000770 Epoch: 8 Global Step: 174290 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:15:39,989-Speed 2498.81 samples/sec Loss 3.9777 LearningRate 0.000770 Epoch: 8 Global Step: 174300 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:15:48,135-Speed 2514.50 samples/sec Loss 4.0149 LearningRate 0.000770 Epoch: 8 Global Step: 174310 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:15:56,333-Speed 2498.59 samples/sec Loss 4.0270 LearningRate 0.000770 Epoch: 8 Global Step: 174320 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:16:04,531-Speed 2498.57 samples/sec Loss 3.9740 LearningRate 0.000770 Epoch: 8 Global Step: 174330 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:16:12,742-Speed 2494.42 samples/sec Loss 4.0661 LearningRate 0.000770 Epoch: 8 Global Step: 174340 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:16:20,945-Speed 2496.97 samples/sec Loss 4.0182 LearningRate 0.000770 Epoch: 8 Global Step: 174350 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:16:29,141-Speed 2499.15 samples/sec Loss 4.0103 LearningRate 0.000770 Epoch: 8 Global Step: 174360 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:16:37,290-Speed 2514.50 samples/sec Loss 4.0933 LearningRate 0.000770 Epoch: 8 Global Step: 174370 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:16:45,493-Speed 2496.91 samples/sec Loss 4.0247 
LearningRate 0.000770 Epoch: 8 Global Step: 174380 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:16:53,702-Speed 2495.21 samples/sec Loss 4.0332 LearningRate 0.000770 Epoch: 8 Global Step: 174390 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:17:01,899-Speed 2498.91 samples/sec Loss 4.0426 LearningRate 0.000770 Epoch: 8 Global Step: 174400 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:17:10,098-Speed 2498.26 samples/sec Loss 4.0683 LearningRate 0.000770 Epoch: 8 Global Step: 174410 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:17:18,299-Speed 2497.75 samples/sec Loss 4.0707 LearningRate 0.000770 Epoch: 8 Global Step: 174420 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:17:26,445-Speed 2514.40 samples/sec Loss 4.1203 LearningRate 0.000770 Epoch: 8 Global Step: 174430 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:17:34,647-Speed 2497.41 samples/sec Loss 4.1216 LearningRate 0.000770 Epoch: 8 Global Step: 174440 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:17:42,850-Speed 2496.96 samples/sec Loss 4.1361 LearningRate 0.000770 Epoch: 8 Global Step: 174450 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:17:51,046-Speed 2499.13 samples/sec Loss 4.0869 LearningRate 0.000770 Epoch: 8 Global Step: 174460 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:17:59,244-Speed 2498.58 samples/sec Loss 4.1101 LearningRate 0.000770 Epoch: 8 Global Step: 174470 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:18:07,453-Speed 2495.14 samples/sec Loss 4.1088 LearningRate 0.000770 Epoch: 8 Global Step: 174480 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:18:15,601-Speed 2514.02 samples/sec Loss 4.0321 LearningRate 0.000770 Epoch: 8 Global Step: 174490 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:18:23,798-Speed 2498.82 samples/sec Loss 4.0936 
LearningRate 0.000770 Epoch: 8 Global Step: 174500 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:18:32,001-Speed 2497.22 samples/sec Loss 4.0971 LearningRate 0.000770 Epoch: 8 Global Step: 174510 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:18:40,202-Speed 2497.85 samples/sec Loss 4.0668 LearningRate 0.000770 Epoch: 8 Global Step: 174520 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:18:48,401-Speed 2498.78 samples/sec Loss 4.0933 LearningRate 0.000770 Epoch: 8 Global Step: 174530 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:18:56,608-Speed 2495.84 samples/sec Loss 4.1647 LearningRate 0.000770 Epoch: 8 Global Step: 174540 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:19:04,752-Speed 2515.13 samples/sec Loss 4.0916 LearningRate 0.000770 Epoch: 8 Global Step: 174550 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:19:12,953-Speed 2497.63 samples/sec Loss 4.0691 LearningRate 0.000770 Epoch: 8 Global Step: 174560 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:19:21,158-Speed 2496.59 samples/sec Loss 4.0599 LearningRate 0.000770 Epoch: 8 Global Step: 174570 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:19:29,382-Speed 2490.93 samples/sec Loss 4.1137 LearningRate 0.000770 Epoch: 8 Global Step: 174580 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:19:37,588-Speed 2496.08 samples/sec Loss 4.0118 LearningRate 0.000770 Epoch: 8 Global Step: 174590 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:19:45,800-Speed 2494.23 samples/sec Loss 4.0573 LearningRate 0.000770 Epoch: 8 Global Step: 174600 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:19:53,948-Speed 2514.08 samples/sec Loss 4.0195 LearningRate 0.000770 Epoch: 8 Global Step: 174610 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:20:02,157-Speed 2495.25 samples/sec Loss 3.9608 
LearningRate 0.000770 Epoch: 8 Global Step: 174620 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:20:10,367-Speed 2494.67 samples/sec Loss 4.0389 LearningRate 0.000770 Epoch: 8 Global Step: 174630 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:20:18,592-Speed 2490.55 samples/sec Loss 3.9993 LearningRate 0.000769 Epoch: 8 Global Step: 174640 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:20:26,794-Speed 2497.35 samples/sec Loss 4.0664 LearningRate 0.000769 Epoch: 8 Global Step: 174650 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:20:34,996-Speed 2497.18 samples/sec Loss 4.0895 LearningRate 0.000769 Epoch: 8 Global Step: 174660 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:20:43,142-Speed 2514.76 samples/sec Loss 3.9558 LearningRate 0.000769 Epoch: 8 Global Step: 174670 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:20:51,340-Speed 2498.64 samples/sec Loss 4.0672 LearningRate 0.000769 Epoch: 8 Global Step: 174680 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:20:59,538-Speed 2498.62 samples/sec Loss 4.0951 LearningRate 0.000769 Epoch: 8 Global Step: 174690 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:21:07,734-Speed 2499.24 samples/sec Loss 4.1133 LearningRate 0.000769 Epoch: 8 Global Step: 174700 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:21:15,930-Speed 2499.48 samples/sec Loss 4.0751 LearningRate 0.000769 Epoch: 8 Global Step: 174710 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:21:24,129-Speed 2498.60 samples/sec Loss 4.1415 LearningRate 0.000769 Epoch: 8 Global Step: 174720 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:21:32,283-Speed 2511.69 samples/sec Loss 4.0263 LearningRate 0.000769 Epoch: 8 Global Step: 174730 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:21:40,983-Speed 2500.41 samples/sec Loss 4.0775 
LearningRate 0.000769 Epoch: 8 Global Step: 174740 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:21:49,654-Speed 2500.42 samples/sec Loss 4.0381 LearningRate 0.000769 Epoch: 8 Global Step: 174750 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:21:58,009-Speed 2497.54 samples/sec Loss 4.0574 LearningRate 0.000769 Epoch: 8 Global Step: 174760 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:22:06,205-Speed 2499.24 samples/sec Loss 4.0290 LearningRate 0.000769 Epoch: 8 Global Step: 174770 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:22:14,401-Speed 2499.23 samples/sec Loss 4.0402 LearningRate 0.000769 Epoch: 8 Global Step: 174780 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:22:22,545-Speed 2515.13 samples/sec Loss 4.0613 LearningRate 0.000769 Epoch: 8 Global Step: 174790 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:22:30,744-Speed 2498.37 samples/sec Loss 3.9746 LearningRate 0.000769 Epoch: 8 Global Step: 174800 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:22:38,959-Speed 2493.49 samples/sec Loss 3.9378 LearningRate 0.000769 Epoch: 8 Global Step: 174810 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:22:47,160-Speed 2497.76 samples/sec Loss 4.0206 LearningRate 0.000769 Epoch: 8 Global Step: 174820 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:22:55,366-Speed 2496.15 samples/sec Loss 4.0273 LearningRate 0.000769 Epoch: 8 Global Step: 174830 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:23:03,568-Speed 2497.71 samples/sec Loss 4.0889 LearningRate 0.000769 Epoch: 8 Global Step: 174840 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:23:11,713-Speed 2514.86 samples/sec Loss 4.0265 LearningRate 0.000769 Epoch: 8 Global Step: 174850 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:23:19,912-Speed 2498.34 samples/sec Loss 3.9949 
LearningRate 0.000769 Epoch: 8 Global Step: 174860 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:23:28,122-Speed 2494.79 samples/sec Loss 4.0873 LearningRate 0.000769 Epoch: 8 Global Step: 174870 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:23:36,319-Speed 2498.93 samples/sec Loss 4.1004 LearningRate 0.000769 Epoch: 8 Global Step: 174880 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:23:44,521-Speed 2497.34 samples/sec Loss 4.1482 LearningRate 0.000769 Epoch: 8 Global Step: 174890 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:23:52,718-Speed 2498.96 samples/sec Loss 4.0078 LearningRate 0.000769 Epoch: 8 Global Step: 174900 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:24:00,860-Speed 2515.76 samples/sec Loss 4.0804 LearningRate 0.000769 Epoch: 8 Global Step: 174910 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:24:09,069-Speed 2495.35 samples/sec Loss 4.0119 LearningRate 0.000769 Epoch: 8 Global Step: 174920 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:24:17,269-Speed 2498.15 samples/sec Loss 4.1267 LearningRate 0.000769 Epoch: 8 Global Step: 174930 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:24:25,466-Speed 2499.08 samples/sec Loss 4.0414 LearningRate 0.000769 Epoch: 8 Global Step: 174940 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:24:33,662-Speed 2499.43 samples/sec Loss 4.0882 LearningRate 0.000769 Epoch: 8 Global Step: 174950 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:24:41,858-Speed 2499.02 samples/sec Loss 4.1292 LearningRate 0.000769 Epoch: 8 Global Step: 174960 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:24:50,007-Speed 2513.70 samples/sec Loss 4.0662 LearningRate 0.000769 Epoch: 8 Global Step: 174970 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:24:58,216-Speed 2495.17 samples/sec Loss 4.0091 
LearningRate 0.000769 Epoch: 8 Global Step: 174980 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:25:06,417-Speed 2497.58 samples/sec Loss 4.0872 LearningRate 0.000769 Epoch: 8 Global Step: 174990 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:25:14,614-Speed 2499.15 samples/sec Loss 4.1218 LearningRate 0.000769 Epoch: 8 Global Step: 175000 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:25:22,813-Speed 2498.31 samples/sec Loss 4.0454 LearningRate 0.000769 Epoch: 8 Global Step: 175010 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:25:31,011-Speed 2498.63 samples/sec Loss 4.0141 LearningRate 0.000769 Epoch: 8 Global Step: 175020 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:25:39,164-Speed 2515.81 samples/sec Loss 4.0278 LearningRate 0.000769 Epoch: 8 Global Step: 175030 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:25:47,368-Speed 2496.91 samples/sec Loss 4.0553 LearningRate 0.000769 Epoch: 8 Global Step: 175040 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:25:55,569-Speed 2497.71 samples/sec Loss 4.0862 LearningRate 0.000769 Epoch: 8 Global Step: 175050 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:26:03,765-Speed 2499.25 samples/sec Loss 4.0750 LearningRate 0.000768 Epoch: 8 Global Step: 175060 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:26:11,984-Speed 2492.22 samples/sec Loss 4.0639 LearningRate 0.000768 Epoch: 8 Global Step: 175070 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:26:20,182-Speed 2498.35 samples/sec Loss 4.0514 LearningRate 0.000768 Epoch: 8 Global Step: 175080 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:26:28,331-Speed 2513.84 samples/sec Loss 4.0102 LearningRate 0.000768 Epoch: 8 Global Step: 175090 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:26:36,529-Speed 2498.43 samples/sec Loss 4.0332 
LearningRate 0.000768 Epoch: 8 Global Step: 175100 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:26:44,732-Speed 2497.05 samples/sec Loss 4.0260 LearningRate 0.000768 Epoch: 8 Global Step: 175110 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:26:52,929-Speed 2498.95 samples/sec Loss 4.0304 LearningRate 0.000768 Epoch: 8 Global Step: 175120 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:27:01,131-Speed 2497.81 samples/sec Loss 4.0525 LearningRate 0.000768 Epoch: 8 Global Step: 175130 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:27:09,332-Speed 2497.45 samples/sec Loss 3.9580 LearningRate 0.000768 Epoch: 8 Global Step: 175140 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:27:17,477-Speed 2514.98 samples/sec Loss 4.0207 LearningRate 0.000768 Epoch: 8 Global Step: 175150 Fp16 Grad Scale: 32768 Required: 150 hours Training: 2022-07-07 06:27:25,676-Speed 2498.31 samples/sec Loss 3.9416 LearningRate 0.000768 Epoch: 8 Global Step: 175160 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:27:33,873-Speed 2498.93 samples/sec Loss 3.9883 LearningRate 0.000768 Epoch: 8 Global Step: 175170 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:27:42,085-Speed 2494.55 samples/sec Loss 4.1041 LearningRate 0.000768 Epoch: 8 Global Step: 175180 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:27:50,295-Speed 2495.02 samples/sec Loss 4.0731 LearningRate 0.000768 Epoch: 8 Global Step: 175190 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:27:58,495-Speed 2498.12 samples/sec Loss 3.9742 LearningRate 0.000768 Epoch: 8 Global Step: 175200 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:28:06,649-Speed 2511.85 samples/sec Loss 4.0948 LearningRate 0.000768 Epoch: 8 Global Step: 175210 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:28:14,847-Speed 2498.50 samples/sec Loss 4.0732 
LearningRate 0.000768 Epoch: 8 Global Step: 175220 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:28:23,043-Speed 2499.55 samples/sec Loss 4.0971 LearningRate 0.000768 Epoch: 8 Global Step: 175230 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:28:31,260-Speed 2492.70 samples/sec Loss 3.9897 LearningRate 0.000768 Epoch: 8 Global Step: 175240 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:28:39,455-Speed 2499.30 samples/sec Loss 4.0198 LearningRate 0.000768 Epoch: 8 Global Step: 175250 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:28:47,653-Speed 2498.54 samples/sec Loss 4.0410 LearningRate 0.000768 Epoch: 8 Global Step: 175260 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:28:55,800-Speed 2514.44 samples/sec Loss 4.0154 LearningRate 0.000768 Epoch: 8 Global Step: 175270 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:29:04,021-Speed 2491.51 samples/sec Loss 4.0686 LearningRate 0.000768 Epoch: 8 Global Step: 175280 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:29:12,219-Speed 2498.79 samples/sec Loss 3.9960 LearningRate 0.000768 Epoch: 8 Global Step: 175290 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:29:20,415-Speed 2499.04 samples/sec Loss 4.0390 LearningRate 0.000768 Epoch: 8 Global Step: 175300 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:29:28,611-Speed 2499.09 samples/sec Loss 3.9871 LearningRate 0.000768 Epoch: 8 Global Step: 175310 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:29:36,811-Speed 2498.06 samples/sec Loss 4.0365 LearningRate 0.000768 Epoch: 8 Global Step: 175320 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:29:44,955-Speed 2514.97 samples/sec Loss 4.0341 LearningRate 0.000768 Epoch: 8 Global Step: 175330 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:29:53,159-Speed 2496.86 samples/sec Loss 4.0506 
LearningRate 0.000768 Epoch: 8 Global Step: 175340 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:30:01,359-Speed 2497.97 samples/sec Loss 4.0007 LearningRate 0.000768 Epoch: 8 Global Step: 175350 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:30:09,561-Speed 2497.46 samples/sec Loss 4.0353 LearningRate 0.000768 Epoch: 8 Global Step: 175360 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:30:17,777-Speed 2493.26 samples/sec Loss 3.9453 LearningRate 0.000768 Epoch: 8 Global Step: 175370 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:30:25,982-Speed 2496.51 samples/sec Loss 4.0314 LearningRate 0.000768 Epoch: 8 Global Step: 175380 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:30:34,133-Speed 2512.94 samples/sec Loss 3.9918 LearningRate 0.000768 Epoch: 8 Global Step: 175390 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:30:42,330-Speed 2498.85 samples/sec Loss 3.9580 LearningRate 0.000768 Epoch: 8 Global Step: 175400 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:30:50,539-Speed 2495.55 samples/sec Loss 4.0381 LearningRate 0.000768 Epoch: 8 Global Step: 175410 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:30:58,736-Speed 2498.83 samples/sec Loss 3.9927 LearningRate 0.000768 Epoch: 8 Global Step: 175420 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:31:06,935-Speed 2498.43 samples/sec Loss 3.9466 LearningRate 0.000768 Epoch: 8 Global Step: 175430 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:31:15,135-Speed 2497.84 samples/sec Loss 4.0144 LearningRate 0.000768 Epoch: 8 Global Step: 175440 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:31:23,282-Speed 2514.29 samples/sec Loss 4.0750 LearningRate 0.000768 Epoch: 8 Global Step: 175450 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:31:31,481-Speed 2498.38 samples/sec Loss 4.0981 
LearningRate 0.000768 Epoch: 8 Global Step: 175460 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:31:39,683-Speed 2497.30 samples/sec Loss 4.0159 LearningRate 0.000768 Epoch: 8 Global Step: 175470 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:31:47,897-Speed 2493.67 samples/sec Loss 3.9705 LearningRate 0.000768 Epoch: 8 Global Step: 175480 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:31:56,097-Speed 2498.33 samples/sec Loss 4.0140 LearningRate 0.000767 Epoch: 8 Global Step: 175490 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:32:04,295-Speed 2498.69 samples/sec Loss 3.9865 LearningRate 0.000767 Epoch: 8 Global Step: 175500 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:32:12,438-Speed 2515.31 samples/sec Loss 4.0234 LearningRate 0.000767 Epoch: 8 Global Step: 175510 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:32:20,634-Speed 2499.20 samples/sec Loss 4.0905 LearningRate 0.000767 Epoch: 8 Global Step: 175520 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:32:28,851-Speed 2492.74 samples/sec Loss 4.0431 LearningRate 0.000767 Epoch: 8 Global Step: 175530 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:32:37,047-Speed 2499.30 samples/sec Loss 4.1111 LearningRate 0.000767 Epoch: 8 Global Step: 175540 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:32:45,246-Speed 2498.34 samples/sec Loss 4.0182 LearningRate 0.000767 Epoch: 8 Global Step: 175550 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:32:53,448-Speed 2497.47 samples/sec Loss 4.0044 LearningRate 0.000767 Epoch: 8 Global Step: 175560 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:33:01,597-Speed 2513.50 samples/sec Loss 4.0633 LearningRate 0.000767 Epoch: 8 Global Step: 175570 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:33:09,756-Speed 2510.50 samples/sec Loss 4.0620 
LearningRate 0.000767 Epoch: 8 Global Step: 175580 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:33:17,960-Speed 2496.70 samples/sec Loss 4.0436 LearningRate 0.000767 Epoch: 8 Global Step: 175590 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:33:26,160-Speed 2498.03 samples/sec Loss 4.0116 LearningRate 0.000767 Epoch: 8 Global Step: 175600 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:33:34,355-Speed 2499.47 samples/sec Loss 3.9874 LearningRate 0.000767 Epoch: 8 Global Step: 175610 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:33:42,554-Speed 2498.34 samples/sec Loss 4.1246 LearningRate 0.000767 Epoch: 8 Global Step: 175620 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:33:50,697-Speed 2515.27 samples/sec Loss 4.0579 LearningRate 0.000767 Epoch: 8 Global Step: 175630 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:33:58,896-Speed 2498.77 samples/sec Loss 3.9922 LearningRate 0.000767 Epoch: 8 Global Step: 175640 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:34:07,095-Speed 2498.40 samples/sec Loss 4.0436 LearningRate 0.000767 Epoch: 8 Global Step: 175650 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:34:15,304-Speed 2495.37 samples/sec Loss 4.0227 LearningRate 0.000767 Epoch: 8 Global Step: 175660 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:34:23,504-Speed 2497.80 samples/sec Loss 3.9738 LearningRate 0.000767 Epoch: 8 Global Step: 175670 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:34:31,702-Speed 2498.66 samples/sec Loss 4.0639 LearningRate 0.000767 Epoch: 8 Global Step: 175680 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:34:40,280-Speed 2517.60 samples/sec Loss 4.1024 LearningRate 0.000767 Epoch: 8 Global Step: 175690 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:34:48,479-Speed 2497.93 samples/sec Loss 4.0534 
LearningRate 0.000767 Epoch: 8 Global Step: 175700 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:34:56,700-Speed 2501.28 samples/sec Loss 4.0434 LearningRate 0.000767 Epoch: 8 Global Step: 175710 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:35:04,922-Speed 2500.50 samples/sec Loss 4.0305 LearningRate 0.000767 Epoch: 8 Global Step: 175720 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:35:13,122-Speed 2497.72 samples/sec Loss 4.1017 LearningRate 0.000767 Epoch: 8 Global Step: 175730 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:35:21,363-Speed 2501.00 samples/sec Loss 4.0245 LearningRate 0.000767 Epoch: 8 Global Step: 175740 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:35:31,093-Speed 2118.79 samples/sec Loss 3.9369 LearningRate 0.000767 Epoch: 8 Global Step: 175750 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:35:39,294-Speed 2497.90 samples/sec Loss 4.0682 LearningRate 0.000767 Epoch: 8 Global Step: 175760 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:35:47,496-Speed 2497.12 samples/sec Loss 4.0595 LearningRate 0.000767 Epoch: 8 Global Step: 175770 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:35:55,702-Speed 2496.14 samples/sec Loss 4.0310 LearningRate 0.000767 Epoch: 8 Global Step: 175780 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:36:03,903-Speed 2497.81 samples/sec Loss 3.9949 LearningRate 0.000767 Epoch: 8 Global Step: 175790 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:36:12,128-Speed 2500.70 samples/sec Loss 4.0585 LearningRate 0.000767 Epoch: 8 Global Step: 175800 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:36:20,273-Speed 2514.99 samples/sec Loss 4.1187 LearningRate 0.000767 Epoch: 8 Global Step: 175810 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:36:28,509-Speed 2500.70 samples/sec Loss 3.9843 
LearningRate 0.000767 Epoch: 8 Global Step: 175820 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:36:36,727-Speed 2499.20 samples/sec Loss 4.0001 LearningRate 0.000767 Epoch: 8 Global Step: 175830 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:36:44,954-Speed 2501.19 samples/sec Loss 4.1138 LearningRate 0.000767 Epoch: 8 Global Step: 175840 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:36:53,152-Speed 2498.74 samples/sec Loss 3.9600 LearningRate 0.000767 Epoch: 8 Global Step: 175850 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:37:01,621-Speed 2418.27 samples/sec Loss 4.0808 LearningRate 0.000767 Epoch: 8 Global Step: 175860 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:37:09,769-Speed 2518.68 samples/sec Loss 4.0263 LearningRate 0.000767 Epoch: 8 Global Step: 175870 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:37:17,986-Speed 2500.74 samples/sec Loss 4.0701 LearningRate 0.000767 Epoch: 8 Global Step: 175880 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:37:26,184-Speed 2498.51 samples/sec Loss 4.0655 LearningRate 0.000767 Epoch: 8 Global Step: 175890 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:37:34,377-Speed 2499.92 samples/sec Loss 4.0258 LearningRate 0.000767 Epoch: 8 Global Step: 175900 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:37:42,662-Speed 2497.12 samples/sec Loss 4.0692 LearningRate 0.000766 Epoch: 8 Global Step: 175910 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:37:50,918-Speed 2500.00 samples/sec Loss 4.0520 LearningRate 0.000766 Epoch: 8 Global Step: 175920 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:38:00,952-Speed 2041.32 samples/sec Loss 4.1008 LearningRate 0.000766 Epoch: 8 Global Step: 175930 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:38:09,161-Speed 2495.30 samples/sec Loss 4.0578 
LearningRate 0.000766 Epoch: 8 Global Step: 175940 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:38:17,376-Speed 2500.67 samples/sec Loss 3.9841 LearningRate 0.000766 Epoch: 8 Global Step: 175950 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:38:52,742-Speed 580.44 samples/sec Loss 3.9705 LearningRate 0.000766 Epoch: 8 Global Step: 175960 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:39:01,555-Speed 2331.22 samples/sec Loss 3.9719 LearningRate 0.000766 Epoch: 8 Global Step: 175970 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:39:14,220-Speed 1617.24 samples/sec Loss 3.9778 LearningRate 0.000766 Epoch: 8 Global Step: 175980 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:39:22,665-Speed 2516.51 samples/sec Loss 4.0082 LearningRate 0.000766 Epoch: 8 Global Step: 175990 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:39:31,272-Speed 2390.18 samples/sec Loss 3.9699 LearningRate 0.000766 Epoch: 8 Global Step: 176000 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:39:39,489-Speed 2492.68 samples/sec Loss 3.9757 LearningRate 0.000766 Epoch: 8 Global Step: 176010 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:39:47,759-Speed 2477.05 samples/sec Loss 3.9898 LearningRate 0.000766 Epoch: 8 Global Step: 176020 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:39:55,967-Speed 2495.51 samples/sec Loss 4.0472 LearningRate 0.000766 Epoch: 8 Global Step: 176030 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:40:04,174-Speed 2495.50 samples/sec Loss 3.9956 LearningRate 0.000766 Epoch: 8 Global Step: 176040 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:40:12,327-Speed 2512.62 samples/sec Loss 4.0073 LearningRate 0.000766 Epoch: 8 Global Step: 176050 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:40:20,532-Speed 2496.38 samples/sec Loss 4.0261 
LearningRate 0.000766 Epoch: 8 Global Step: 176060 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:40:28,737-Speed 2496.49 samples/sec Loss 3.9571 LearningRate 0.000766 Epoch: 8 Global Step: 176070 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:40:36,940-Speed 2497.11 samples/sec Loss 3.9798 LearningRate 0.000766 Epoch: 8 Global Step: 176080 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:40:45,142-Speed 2497.63 samples/sec Loss 3.9942 LearningRate 0.000766 Epoch: 8 Global Step: 176090 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:40:53,345-Speed 2497.00 samples/sec Loss 4.0042 LearningRate 0.000766 Epoch: 8 Global Step: 176100 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:41:01,496-Speed 2513.08 samples/sec Loss 4.0630 LearningRate 0.000766 Epoch: 8 Global Step: 176110 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:41:09,704-Speed 2495.48 samples/sec Loss 4.1396 LearningRate 0.000766 Epoch: 8 Global Step: 176120 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:41:17,905-Speed 2497.63 samples/sec Loss 4.1082 LearningRate 0.000766 Epoch: 8 Global Step: 176130 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:41:26,108-Speed 2497.18 samples/sec Loss 4.0036 LearningRate 0.000766 Epoch: 8 Global Step: 176140 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:41:34,310-Speed 2497.22 samples/sec Loss 4.0422 LearningRate 0.000766 Epoch: 8 Global Step: 176150 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:41:42,511-Speed 2497.71 samples/sec Loss 4.0413 LearningRate 0.000766 Epoch: 8 Global Step: 176160 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:41:50,663-Speed 2513.09 samples/sec Loss 3.9544 LearningRate 0.000766 Epoch: 8 Global Step: 176170 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:41:58,864-Speed 2497.55 samples/sec Loss 4.1101 
LearningRate 0.000766 Epoch: 8 Global Step: 176180 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:42:07,069-Speed 2496.48 samples/sec Loss 4.1344 LearningRate 0.000766 Epoch: 8 Global Step: 176190 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:42:15,289-Speed 2491.93 samples/sec Loss 4.0633 LearningRate 0.000766 Epoch: 8 Global Step: 176200 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:42:23,497-Speed 2495.33 samples/sec Loss 4.0151 LearningRate 0.000766 Epoch: 8 Global Step: 176210 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:42:31,704-Speed 2495.73 samples/sec Loss 4.0476 LearningRate 0.000766 Epoch: 8 Global Step: 176220 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:42:39,854-Speed 2513.25 samples/sec Loss 4.0863 LearningRate 0.000766 Epoch: 8 Global Step: 176230 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:42:48,056-Speed 2497.59 samples/sec Loss 4.0283 LearningRate 0.000766 Epoch: 8 Global Step: 176240 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:42:56,277-Speed 2491.43 samples/sec Loss 4.0132 LearningRate 0.000766 Epoch: 8 Global Step: 176250 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:43:04,476-Speed 2498.13 samples/sec Loss 3.9728 LearningRate 0.000766 Epoch: 8 Global Step: 176260 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:43:12,677-Speed 2497.55 samples/sec Loss 3.9531 LearningRate 0.000766 Epoch: 8 Global Step: 176270 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:43:20,879-Speed 2497.47 samples/sec Loss 3.9676 LearningRate 0.000766 Epoch: 8 Global Step: 176280 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:43:29,039-Speed 2510.28 samples/sec Loss 3.9859 LearningRate 0.000766 Epoch: 8 Global Step: 176290 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:43:37,261-Speed 2491.23 samples/sec Loss 4.0271 
LearningRate 0.000766 Epoch: 8 Global Step: 176300 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:43:45,465-Speed 2496.73 samples/sec Loss 4.0075 LearningRate 0.000766 Epoch: 8 Global Step: 176310 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:43:53,664-Speed 2498.34 samples/sec Loss 4.0113 LearningRate 0.000766 Epoch: 8 Global Step: 176320 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:44:01,865-Speed 2497.73 samples/sec Loss 4.0595 LearningRate 0.000766 Epoch: 8 Global Step: 176330 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:44:10,066-Speed 2497.55 samples/sec Loss 4.0327 LearningRate 0.000765 Epoch: 8 Global Step: 176340 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:44:18,228-Speed 2509.68 samples/sec Loss 4.0930 LearningRate 0.000765 Epoch: 8 Global Step: 176350 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:44:26,427-Speed 2498.07 samples/sec Loss 3.9578 LearningRate 0.000765 Epoch: 8 Global Step: 176360 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:44:34,630-Speed 2497.04 samples/sec Loss 4.0421 LearningRate 0.000765 Epoch: 8 Global Step: 176370 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:44:42,837-Speed 2495.99 samples/sec Loss 4.0433 LearningRate 0.000765 Epoch: 8 Global Step: 176380 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:44:51,041-Speed 2496.71 samples/sec Loss 4.0538 LearningRate 0.000765 Epoch: 8 Global Step: 176390 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:44:59,247-Speed 2496.12 samples/sec Loss 4.0705 LearningRate 0.000765 Epoch: 8 Global Step: 176400 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:45:07,395-Speed 2513.72 samples/sec Loss 4.0539 LearningRate 0.000765 Epoch: 8 Global Step: 176410 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:45:15,609-Speed 2493.76 samples/sec Loss 3.9876 
LearningRate 0.000765 Epoch: 8 Global Step: 176420 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:45:23,813-Speed 2496.89 samples/sec Loss 3.9215 LearningRate 0.000765 Epoch: 8 Global Step: 176430 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:45:32,015-Speed 2497.29 samples/sec Loss 4.0393 LearningRate 0.000765 Epoch: 8 Global Step: 176440 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:45:40,214-Speed 2498.32 samples/sec Loss 3.9968 LearningRate 0.000765 Epoch: 8 Global Step: 176450 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:45:48,418-Speed 2496.61 samples/sec Loss 4.0992 LearningRate 0.000765 Epoch: 8 Global Step: 176460 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:45:56,562-Speed 2515.33 samples/sec Loss 4.1524 LearningRate 0.000765 Epoch: 8 Global Step: 176470 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:46:04,764-Speed 2497.27 samples/sec Loss 3.9981 LearningRate 0.000765 Epoch: 8 Global Step: 176480 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:46:12,963-Speed 2498.25 samples/sec Loss 4.0266 LearningRate 0.000765 Epoch: 8 Global Step: 176490 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:46:21,162-Speed 2498.55 samples/sec Loss 3.9957 LearningRate 0.000765 Epoch: 8 Global Step: 176500 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:46:29,368-Speed 2496.27 samples/sec Loss 4.0502 LearningRate 0.000765 Epoch: 8 Global Step: 176510 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:46:37,567-Speed 2498.22 samples/sec Loss 3.9786 LearningRate 0.000765 Epoch: 8 Global Step: 176520 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:46:45,713-Speed 2514.46 samples/sec Loss 3.9612 LearningRate 0.000765 Epoch: 8 Global Step: 176530 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:46:53,912-Speed 2498.19 samples/sec Loss 4.0560 
LearningRate 0.000765 Epoch: 8 Global Step: 176540 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:47:02,134-Speed 2491.49 samples/sec Loss 3.9726 LearningRate 0.000765 Epoch: 8 Global Step: 176550 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:47:10,335-Speed 2497.64 samples/sec Loss 4.0090 LearningRate 0.000765 Epoch: 8 Global Step: 176560 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:47:18,528-Speed 2500.02 samples/sec Loss 4.1477 LearningRate 0.000765 Epoch: 8 Global Step: 176570 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:47:26,737-Speed 2495.10 samples/sec Loss 4.1161 LearningRate 0.000765 Epoch: 8 Global Step: 176580 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:47:34,895-Speed 2510.96 samples/sec Loss 4.0206 LearningRate 0.000765 Epoch: 8 Global Step: 176590 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:47:43,090-Speed 2499.66 samples/sec Loss 4.0976 LearningRate 0.000765 Epoch: 8 Global Step: 176600 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:47:51,291-Speed 2497.45 samples/sec Loss 4.1556 LearningRate 0.000765 Epoch: 8 Global Step: 176610 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:47:59,488-Speed 2498.98 samples/sec Loss 4.0493 LearningRate 0.000765 Epoch: 8 Global Step: 176620 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:48:07,699-Speed 2494.56 samples/sec Loss 3.9983 LearningRate 0.000765 Epoch: 8 Global Step: 176630 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:48:15,900-Speed 2497.80 samples/sec Loss 4.0728 LearningRate 0.000765 Epoch: 8 Global Step: 176640 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:48:24,046-Speed 2514.39 samples/sec Loss 4.0554 LearningRate 0.000765 Epoch: 8 Global Step: 176650 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:48:32,247-Speed 2497.74 samples/sec Loss 3.9658 
LearningRate 0.000765 Epoch: 8 Global Step: 176660 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:48:40,452-Speed 2496.48 samples/sec Loss 4.0554 LearningRate 0.000765 Epoch: 8 Global Step: 176670 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:48:48,652-Speed 2498.09 samples/sec Loss 4.0004 LearningRate 0.000765 Epoch: 8 Global Step: 176680 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:48:56,848-Speed 2499.05 samples/sec Loss 3.9822 LearningRate 0.000765 Epoch: 8 Global Step: 176690 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:49:05,047-Speed 2498.25 samples/sec Loss 3.9792 LearningRate 0.000765 Epoch: 8 Global Step: 176700 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:49:13,193-Speed 2514.64 samples/sec Loss 4.0332 LearningRate 0.000765 Epoch: 8 Global Step: 176710 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:49:21,407-Speed 2493.67 samples/sec Loss 4.0538 LearningRate 0.000765 Epoch: 8 Global Step: 176720 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:49:29,603-Speed 2499.04 samples/sec Loss 4.0227 LearningRate 0.000765 Epoch: 8 Global Step: 176730 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:49:37,802-Speed 2498.35 samples/sec Loss 3.9916 LearningRate 0.000765 Epoch: 8 Global Step: 176740 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:49:46,000-Speed 2498.39 samples/sec Loss 4.0189 LearningRate 0.000765 Epoch: 8 Global Step: 176750 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:49:54,206-Speed 2496.17 samples/sec Loss 4.0402 LearningRate 0.000765 Epoch: 8 Global Step: 176760 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:50:02,349-Speed 2517.11 samples/sec Loss 4.0234 LearningRate 0.000764 Epoch: 8 Global Step: 176770 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 06:50:10,557-Speed 2495.28 samples/sec Loss 4.0310 
LearningRate 0.000764 Epoch: 8 Global Step: 176780 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:50:18,756-Speed 2498.37 samples/sec Loss 3.9593 LearningRate 0.000764 Epoch: 8 Global Step: 176790 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:50:26,958-Speed 2497.60 samples/sec Loss 3.9865 LearningRate 0.000764 Epoch: 8 Global Step: 176800 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:50:35,171-Speed 2494.08 samples/sec Loss 4.0172 LearningRate 0.000764 Epoch: 8 Global Step: 176810 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:50:43,373-Speed 2497.24 samples/sec Loss 4.0159 LearningRate 0.000764 Epoch: 8 Global Step: 176820 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:50:51,521-Speed 2514.05 samples/sec Loss 3.9659 LearningRate 0.000764 Epoch: 8 Global Step: 176830 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:50:59,725-Speed 2496.56 samples/sec Loss 4.0511 LearningRate 0.000764 Epoch: 8 Global Step: 176840 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:51:07,927-Speed 2497.75 samples/sec Loss 4.0751 LearningRate 0.000764 Epoch: 8 Global Step: 176850 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:51:16,128-Speed 2497.82 samples/sec Loss 4.0700 LearningRate 0.000764 Epoch: 8 Global Step: 176860 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:51:24,330-Speed 2497.26 samples/sec Loss 4.1157 LearningRate 0.000764 Epoch: 8 Global Step: 176870 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:51:32,533-Speed 2497.15 samples/sec Loss 4.1044 LearningRate 0.000764 Epoch: 8 Global Step: 176880 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:51:40,683-Speed 2513.16 samples/sec Loss 3.9695 LearningRate 0.000764 Epoch: 8 Global Step: 176890 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:51:48,883-Speed 2498.11 samples/sec Loss 4.0043 
LearningRate 0.000764 Epoch: 8 Global Step: 176900 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:51:57,091-Speed 2495.55 samples/sec Loss 3.9666 LearningRate 0.000764 Epoch: 8 Global Step: 176910 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:52:05,287-Speed 2498.95 samples/sec Loss 4.0379 LearningRate 0.000764 Epoch: 8 Global Step: 176920 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:52:13,486-Speed 2498.38 samples/sec Loss 4.0057 LearningRate 0.000764 Epoch: 8 Global Step: 176930 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:52:21,684-Speed 2498.58 samples/sec Loss 3.9935 LearningRate 0.000764 Epoch: 8 Global Step: 176940 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:52:29,833-Speed 2513.39 samples/sec Loss 3.9610 LearningRate 0.000764 Epoch: 8 Global Step: 176950 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:52:38,032-Speed 2498.53 samples/sec Loss 4.0426 LearningRate 0.000764 Epoch: 8 Global Step: 176960 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:52:46,230-Speed 2498.26 samples/sec Loss 4.0476 LearningRate 0.000764 Epoch: 8 Global Step: 176970 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:52:54,439-Speed 2495.21 samples/sec Loss 4.0245 LearningRate 0.000764 Epoch: 8 Global Step: 176980 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:53:02,639-Speed 2498.10 samples/sec Loss 4.0391 LearningRate 0.000764 Epoch: 8 Global Step: 176990 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:53:10,838-Speed 2498.09 samples/sec Loss 4.0378 LearningRate 0.000764 Epoch: 8 Global Step: 177000 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:53:18,993-Speed 2511.80 samples/sec Loss 4.0002 LearningRate 0.000764 Epoch: 8 Global Step: 177010 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:53:27,200-Speed 2495.80 samples/sec Loss 3.9664 
LearningRate 0.000764 Epoch: 8 Global Step: 177020 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:53:35,406-Speed 2496.15 samples/sec Loss 3.9747 LearningRate 0.000764 Epoch: 8 Global Step: 177030 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:53:43,611-Speed 2496.63 samples/sec Loss 4.0094 LearningRate 0.000764 Epoch: 8 Global Step: 177040 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:53:51,811-Speed 2498.23 samples/sec Loss 4.0021 LearningRate 0.000764 Epoch: 8 Global Step: 177050 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:54:00,012-Speed 2497.88 samples/sec Loss 4.0052 LearningRate 0.000764 Epoch: 8 Global Step: 177060 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:54:08,160-Speed 2513.79 samples/sec Loss 4.0106 LearningRate 0.000764 Epoch: 8 Global Step: 177070 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:54:16,362-Speed 2499.84 samples/sec Loss 3.9889 LearningRate 0.000764 Epoch: 8 Global Step: 177080 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:54:24,577-Speed 2493.38 samples/sec Loss 4.0329 LearningRate 0.000764 Epoch: 8 Global Step: 177090 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:54:32,774-Speed 2499.09 samples/sec Loss 4.0353 LearningRate 0.000764 Epoch: 8 Global Step: 177100 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:54:40,971-Speed 2498.62 samples/sec Loss 3.9670 LearningRate 0.000764 Epoch: 8 Global Step: 177110 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:54:49,172-Speed 2497.53 samples/sec Loss 4.0222 LearningRate 0.000764 Epoch: 8 Global Step: 177120 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:54:57,317-Speed 2514.96 samples/sec Loss 3.9401 LearningRate 0.000764 Epoch: 8 Global Step: 177130 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:55:05,517-Speed 2497.96 samples/sec Loss 3.9939 
LearningRate 0.000764 Epoch: 8 Global Step: 177140 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:55:13,715-Speed 2498.59 samples/sec Loss 3.9571 LearningRate 0.000764 Epoch: 8 Global Step: 177150 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:55:21,910-Speed 2499.53 samples/sec Loss 3.9770 LearningRate 0.000764 Epoch: 8 Global Step: 177160 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:55:30,107-Speed 2498.78 samples/sec Loss 4.0139 LearningRate 0.000764 Epoch: 8 Global Step: 177170 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:55:38,311-Speed 2496.73 samples/sec Loss 3.9281 LearningRate 0.000764 Epoch: 8 Global Step: 177180 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:55:46,456-Speed 2514.79 samples/sec Loss 3.9508 LearningRate 0.000763 Epoch: 8 Global Step: 177190 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:55:54,663-Speed 2496.09 samples/sec Loss 4.0168 LearningRate 0.000763 Epoch: 8 Global Step: 177200 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:56:02,866-Speed 2497.01 samples/sec Loss 3.9958 LearningRate 0.000763 Epoch: 8 Global Step: 177210 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:56:11,064-Speed 2498.55 samples/sec Loss 4.0375 LearningRate 0.000763 Epoch: 8 Global Step: 177220 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:56:19,267-Speed 2497.10 samples/sec Loss 4.0181 LearningRate 0.000763 Epoch: 8 Global Step: 177230 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:56:30,694-Speed 1792.44 samples/sec Loss 3.9568 LearningRate 0.000763 Epoch: 8 Global Step: 177240 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:56:38,832-Speed 2516.93 samples/sec Loss 4.0668 LearningRate 0.000763 Epoch: 8 Global Step: 177250 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:56:47,040-Speed 2495.40 samples/sec Loss 3.9274 
LearningRate 0.000763 Epoch: 8 Global Step: 177260 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:56:55,298-Speed 2501.45 samples/sec Loss 4.0119 LearningRate 0.000763 Epoch: 8 Global Step: 177270 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:57:03,499-Speed 2497.66 samples/sec Loss 4.1145 LearningRate 0.000763 Epoch: 8 Global Step: 177280 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:57:11,695-Speed 2499.12 samples/sec Loss 4.0328 LearningRate 0.000763 Epoch: 8 Global Step: 177290 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:57:19,892-Speed 2499.32 samples/sec Loss 4.1333 LearningRate 0.000763 Epoch: 8 Global Step: 177300 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:57:28,038-Speed 2514.60 samples/sec Loss 4.0626 LearningRate 0.000763 Epoch: 8 Global Step: 177310 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:57:36,236-Speed 2498.47 samples/sec Loss 4.0366 LearningRate 0.000763 Epoch: 8 Global Step: 177320 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:57:44,438-Speed 2497.43 samples/sec Loss 4.0128 LearningRate 0.000763 Epoch: 8 Global Step: 177330 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:57:52,638-Speed 2497.89 samples/sec Loss 4.0637 LearningRate 0.000763 Epoch: 8 Global Step: 177340 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:58:00,845-Speed 2495.90 samples/sec Loss 3.9768 LearningRate 0.000763 Epoch: 8 Global Step: 177350 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:58:09,044-Speed 2498.18 samples/sec Loss 4.0423 LearningRate 0.000763 Epoch: 8 Global Step: 177360 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:58:17,190-Speed 2514.43 samples/sec Loss 4.0716 LearningRate 0.000763 Epoch: 8 Global Step: 177370 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:58:25,391-Speed 2497.88 samples/sec Loss 4.0162 
LearningRate 0.000763 Epoch: 8 Global Step: 177380 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:58:33,602-Speed 2494.72 samples/sec Loss 3.9605 LearningRate 0.000763 Epoch: 8 Global Step: 177390 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:58:41,806-Speed 2496.75 samples/sec Loss 4.0263 LearningRate 0.000763 Epoch: 8 Global Step: 177400 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:58:50,003-Speed 2498.92 samples/sec Loss 4.0037 LearningRate 0.000763 Epoch: 8 Global Step: 177410 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:58:58,206-Speed 2497.16 samples/sec Loss 4.0178 LearningRate 0.000763 Epoch: 8 Global Step: 177420 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:59:06,350-Speed 2515.32 samples/sec Loss 3.9901 LearningRate 0.000763 Epoch: 8 Global Step: 177430 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:59:14,550-Speed 2498.07 samples/sec Loss 3.9980 LearningRate 0.000763 Epoch: 8 Global Step: 177440 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:59:22,749-Speed 2498.03 samples/sec Loss 3.9829 LearningRate 0.000763 Epoch: 8 Global Step: 177450 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:59:30,965-Speed 2493.00 samples/sec Loss 4.0195 LearningRate 0.000763 Epoch: 8 Global Step: 177460 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:59:39,167-Speed 2497.60 samples/sec Loss 3.9742 LearningRate 0.000763 Epoch: 8 Global Step: 177470 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:59:47,374-Speed 2495.84 samples/sec Loss 3.9409 LearningRate 0.000763 Epoch: 8 Global Step: 177480 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 06:59:55,524-Speed 2513.53 samples/sec Loss 3.9062 LearningRate 0.000763 Epoch: 8 Global Step: 177490 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:00:03,723-Speed 2498.16 samples/sec Loss 4.0034 
LearningRate 0.000763 Epoch: 8 Global Step: 177500 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:00:11,922-Speed 2498.13 samples/sec Loss 4.0672 LearningRate 0.000763 Epoch: 8 Global Step: 177510 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:00:20,120-Speed 2498.63 samples/sec Loss 4.0750 LearningRate 0.000763 Epoch: 8 Global Step: 177520 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:00:28,320-Speed 2498.07 samples/sec Loss 4.0480 LearningRate 0.000763 Epoch: 8 Global Step: 177530 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:00:36,519-Speed 2498.52 samples/sec Loss 4.0322 LearningRate 0.000763 Epoch: 8 Global Step: 177540 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:00:44,672-Speed 2512.51 samples/sec Loss 4.0761 LearningRate 0.000763 Epoch: 8 Global Step: 177550 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:00:52,893-Speed 2491.37 samples/sec Loss 3.9632 LearningRate 0.000763 Epoch: 8 Global Step: 177560 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:01:01,093-Speed 2498.17 samples/sec Loss 4.0976 LearningRate 0.000763 Epoch: 8 Global Step: 177570 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:01:09,291-Speed 2498.67 samples/sec Loss 4.0464 LearningRate 0.000763 Epoch: 8 Global Step: 177580 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:01:17,489-Speed 2498.39 samples/sec Loss 3.9924 LearningRate 0.000763 Epoch: 8 Global Step: 177590 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:01:25,696-Speed 2496.04 samples/sec Loss 4.0062 LearningRate 0.000763 Epoch: 8 Global Step: 177600 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:01:33,844-Speed 2513.70 samples/sec Loss 4.0425 LearningRate 0.000763 Epoch: 8 Global Step: 177610 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:01:42,042-Speed 2498.65 samples/sec Loss 3.9440 
LearningRate 0.000762 Epoch: 8 Global Step: 177620 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:01:50,238-Speed 2499.49 samples/sec Loss 4.0284 LearningRate 0.000762 Epoch: 8 Global Step: 177630 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:01:58,435-Speed 2498.87 samples/sec Loss 4.1121 LearningRate 0.000762 Epoch: 8 Global Step: 177640 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:02:06,647-Speed 2494.37 samples/sec Loss 4.0137 LearningRate 0.000762 Epoch: 8 Global Step: 177650 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:02:14,857-Speed 2494.64 samples/sec Loss 4.1599 LearningRate 0.000762 Epoch: 8 Global Step: 177660 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:02:23,004-Speed 2514.23 samples/sec Loss 4.1506 LearningRate 0.000762 Epoch: 8 Global Step: 177670 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:02:31,210-Speed 2496.28 samples/sec Loss 4.1502 LearningRate 0.000762 Epoch: 8 Global Step: 177680 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:02:39,411-Speed 2497.73 samples/sec Loss 4.1293 LearningRate 0.000762 Epoch: 8 Global Step: 177690 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:02:47,613-Speed 2497.61 samples/sec Loss 4.0290 LearningRate 0.000762 Epoch: 8 Global Step: 177700 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:02:55,814-Speed 2497.43 samples/sec Loss 4.0105 LearningRate 0.000762 Epoch: 8 Global Step: 177710 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:03:04,025-Speed 2494.84 samples/sec Loss 3.9518 LearningRate 0.000762 Epoch: 8 Global Step: 177720 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:03:12,171-Speed 2514.43 samples/sec Loss 4.0015 LearningRate 0.000762 Epoch: 8 Global Step: 177730 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:03:20,374-Speed 2497.18 samples/sec Loss 3.9528 
LearningRate 0.000762 Epoch: 8 Global Step: 177740 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:03:28,573-Speed 2498.06 samples/sec Loss 3.9262 LearningRate 0.000762 Epoch: 8 Global Step: 177750 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:03:36,785-Speed 2494.44 samples/sec Loss 3.9828 LearningRate 0.000762 Epoch: 8 Global Step: 177760 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:03:44,980-Speed 2499.67 samples/sec Loss 3.9265 LearningRate 0.000762 Epoch: 8 Global Step: 177770 Fp16 Grad Scale: 65536 Required: 149 hours Training: 2022-07-07 07:03:53,136-Speed 2511.21 samples/sec Loss 3.9629 LearningRate 0.000762 Epoch: 8 Global Step: 177780 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:04:01,279-Speed 2515.49 samples/sec Loss 3.9612 LearningRate 0.000762 Epoch: 8 Global Step: 177790 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:04:09,484-Speed 2496.45 samples/sec Loss 3.9993 LearningRate 0.000762 Epoch: 8 Global Step: 177800 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:04:17,697-Speed 2494.20 samples/sec Loss 3.9371 LearningRate 0.000762 Epoch: 8 Global Step: 177810 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:04:25,892-Speed 2499.77 samples/sec Loss 3.9127 LearningRate 0.000762 Epoch: 8 Global Step: 177820 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:04:34,089-Speed 2498.91 samples/sec Loss 3.9211 LearningRate 0.000762 Epoch: 8 Global Step: 177830 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:04:42,296-Speed 2495.78 samples/sec Loss 4.0169 LearningRate 0.000762 Epoch: 8 Global Step: 177840 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:04:50,439-Speed 2515.46 samples/sec Loss 3.9890 LearningRate 0.000762 Epoch: 8 Global Step: 177850 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:04:58,644-Speed 2496.81 samples/sec Loss 4.0289 
LearningRate 0.000762 Epoch: 8 Global Step: 177860 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:05:06,843-Speed 2498.18 samples/sec Loss 4.0230 LearningRate 0.000762 Epoch: 8 Global Step: 177870 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:05:15,040-Speed 2498.77 samples/sec Loss 4.0360 LearningRate 0.000762 Epoch: 8 Global Step: 177880 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:05:23,244-Speed 2496.83 samples/sec Loss 3.9777 LearningRate 0.000762 Epoch: 8 Global Step: 177890 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:05:31,449-Speed 2496.31 samples/sec Loss 4.0127 LearningRate 0.000762 Epoch: 8 Global Step: 177900 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:05:39,607-Speed 2510.95 samples/sec Loss 3.9242 LearningRate 0.000762 Epoch: 8 Global Step: 177910 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:05:47,812-Speed 2496.30 samples/sec Loss 3.9337 LearningRate 0.000762 Epoch: 8 Global Step: 177920 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:05:56,009-Speed 2498.76 samples/sec Loss 3.9269 LearningRate 0.000762 Epoch: 8 Global Step: 177930 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:06:04,208-Speed 2498.30 samples/sec Loss 3.9530 LearningRate 0.000762 Epoch: 8 Global Step: 177940 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:06:12,408-Speed 2498.31 samples/sec Loss 4.0102 LearningRate 0.000762 Epoch: 8 Global Step: 177950 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:06:20,605-Speed 2499.13 samples/sec Loss 4.0106 LearningRate 0.000762 Epoch: 8 Global Step: 177960 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:06:28,755-Speed 2513.37 samples/sec Loss 3.9299 LearningRate 0.000762 Epoch: 8 Global Step: 177970 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:06:36,954-Speed 2498.89 samples/sec Loss 3.9350 
LearningRate 0.000762 Epoch: 8 Global Step: 177980 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:06:45,154-Speed 2497.98 samples/sec Loss 3.9781 LearningRate 0.000762 Epoch: 8 Global Step: 177990 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:06:53,353-Speed 2498.26 samples/sec Loss 4.0471 LearningRate 0.000762 Epoch: 8 Global Step: 178000 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:07:01,550-Speed 2498.96 samples/sec Loss 4.0254 LearningRate 0.000762 Epoch: 8 Global Step: 178010 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:07:09,749-Speed 2498.58 samples/sec Loss 3.9797 LearningRate 0.000762 Epoch: 8 Global Step: 178020 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:07:17,901-Speed 2512.72 samples/sec Loss 3.9780 LearningRate 0.000762 Epoch: 8 Global Step: 178030 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:07:26,123-Speed 2491.31 samples/sec Loss 3.9462 LearningRate 0.000762 Epoch: 8 Global Step: 178040 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:07:34,319-Speed 2499.10 samples/sec Loss 3.9398 LearningRate 0.000761 Epoch: 8 Global Step: 178050 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:07:42,521-Speed 2497.35 samples/sec Loss 4.0074 LearningRate 0.000761 Epoch: 8 Global Step: 178060 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:07:50,720-Speed 2498.48 samples/sec Loss 3.9808 LearningRate 0.000761 Epoch: 8 Global Step: 178070 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:07:58,919-Speed 2498.19 samples/sec Loss 3.9718 LearningRate 0.000761 Epoch: 8 Global Step: 178080 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:08:07,065-Speed 2514.71 samples/sec Loss 4.0324 LearningRate 0.000761 Epoch: 8 Global Step: 178090 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:08:15,263-Speed 2498.78 samples/sec Loss 3.9818 
LearningRate 0.000761 Epoch: 8 Global Step: 178100 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:08:23,460-Speed 2498.51 samples/sec Loss 3.9276 LearningRate 0.000761 Epoch: 8 Global Step: 178110 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:08:31,661-Speed 2497.73 samples/sec Loss 3.9293 LearningRate 0.000761 Epoch: 8 Global Step: 178120 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:08:39,859-Speed 2498.72 samples/sec Loss 3.9866 LearningRate 0.000761 Epoch: 8 Global Step: 178130 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:08:48,060-Speed 2497.97 samples/sec Loss 3.9220 LearningRate 0.000761 Epoch: 8 Global Step: 178140 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:08:56,217-Speed 2511.34 samples/sec Loss 3.9567 LearningRate 0.000761 Epoch: 8 Global Step: 178150 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:09:04,416-Speed 2498.33 samples/sec Loss 3.9045 LearningRate 0.000761 Epoch: 8 Global Step: 178160 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:09:12,615-Speed 2498.12 samples/sec Loss 3.9377 LearningRate 0.000761 Epoch: 8 Global Step: 178170 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:09:20,818-Speed 2497.06 samples/sec Loss 3.8386 LearningRate 0.000761 Epoch: 8 Global Step: 178180 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:09:29,017-Speed 2498.32 samples/sec Loss 3.9449 LearningRate 0.000761 Epoch: 8 Global Step: 178190 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:09:37,213-Speed 2499.05 samples/sec Loss 3.9016 LearningRate 0.000761 Epoch: 8 Global Step: 178200 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:09:45,359-Speed 2514.53 samples/sec Loss 3.8908 LearningRate 0.000761 Epoch: 8 Global Step: 178210 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:09:53,559-Speed 2498.00 samples/sec Loss 4.0212 
LearningRate 0.000761 Epoch: 8 Global Step: 178220 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:10:01,758-Speed 2498.62 samples/sec Loss 4.0187 LearningRate 0.000761 Epoch: 8 Global Step: 178230 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:10:09,960-Speed 2497.54 samples/sec Loss 3.9625 LearningRate 0.000761 Epoch: 8 Global Step: 178240 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:10:18,158-Speed 2498.73 samples/sec Loss 3.9402 LearningRate 0.000761 Epoch: 8 Global Step: 178250 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:10:26,358-Speed 2497.84 samples/sec Loss 3.9236 LearningRate 0.000761 Epoch: 8 Global Step: 178260 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:10:34,501-Speed 2515.45 samples/sec Loss 4.0056 LearningRate 0.000761 Epoch: 8 Global Step: 178270 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:10:42,700-Speed 2498.10 samples/sec Loss 4.0821 LearningRate 0.000761 Epoch: 8 Global Step: 178280 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:10:50,897-Speed 2498.85 samples/sec Loss 4.0696 LearningRate 0.000761 Epoch: 8 Global Step: 178290 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:10:59,108-Speed 2494.57 samples/sec Loss 4.0074 LearningRate 0.000761 Epoch: 8 Global Step: 178300 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:11:07,306-Speed 2498.72 samples/sec Loss 4.0650 LearningRate 0.000761 Epoch: 8 Global Step: 178310 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:11:15,503-Speed 2499.07 samples/sec Loss 4.0747 LearningRate 0.000761 Epoch: 8 Global Step: 178320 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:11:23,650-Speed 2514.28 samples/sec Loss 3.9689 LearningRate 0.000761 Epoch: 8 Global Step: 178330 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:11:31,850-Speed 2497.63 samples/sec Loss 4.0570 
LearningRate 0.000761 Epoch: 8 Global Step: 178340 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:11:40,049-Speed 2498.28 samples/sec Loss 3.9899 LearningRate 0.000761 Epoch: 8 Global Step: 178350 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:11:48,250-Speed 2497.56 samples/sec Loss 3.9894 LearningRate 0.000761 Epoch: 8 Global Step: 178360 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:11:56,447-Speed 2498.96 samples/sec Loss 3.9168 LearningRate 0.000761 Epoch: 8 Global Step: 178370 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:12:04,644-Speed 2499.00 samples/sec Loss 3.9623 LearningRate 0.000761 Epoch: 8 Global Step: 178380 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:12:12,788-Speed 2515.27 samples/sec Loss 4.0232 LearningRate 0.000761 Epoch: 8 Global Step: 178390 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:12:20,987-Speed 2498.39 samples/sec Loss 3.9948 LearningRate 0.000761 Epoch: 8 Global Step: 178400 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:12:29,187-Speed 2497.98 samples/sec Loss 4.0084 LearningRate 0.000761 Epoch: 8 Global Step: 178410 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:12:37,384-Speed 2498.61 samples/sec Loss 3.9479 LearningRate 0.000761 Epoch: 8 Global Step: 178420 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:12:45,584-Speed 2498.03 samples/sec Loss 4.0094 LearningRate 0.000761 Epoch: 8 Global Step: 178430 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:12:53,800-Speed 2493.06 samples/sec Loss 3.9868 LearningRate 0.000761 Epoch: 8 Global Step: 178440 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:13:01,946-Speed 2514.72 samples/sec Loss 4.0088 LearningRate 0.000761 Epoch: 8 Global Step: 178450 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:13:10,155-Speed 2495.24 samples/sec Loss 3.9880 
LearningRate 0.000761 Epoch: 8 Global Step: 178460 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:13:18,358-Speed 2497.25 samples/sec Loss 3.9930 LearningRate 0.000761 Epoch: 8 Global Step: 178470 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:13:26,559-Speed 2497.64 samples/sec Loss 4.0118 LearningRate 0.000760 Epoch: 8 Global Step: 178480 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:13:34,757-Speed 2499.16 samples/sec Loss 3.9611 LearningRate 0.000760 Epoch: 8 Global Step: 178490 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:13:42,956-Speed 2498.13 samples/sec Loss 3.9577 LearningRate 0.000760 Epoch: 8 Global Step: 178500 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:13:51,107-Speed 2513.14 samples/sec Loss 3.9750 LearningRate 0.000760 Epoch: 8 Global Step: 178510 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:13:59,305-Speed 2498.60 samples/sec Loss 3.9192 LearningRate 0.000760 Epoch: 8 Global Step: 178520 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:14:07,505-Speed 2497.96 samples/sec Loss 4.0431 LearningRate 0.000760 Epoch: 8 Global Step: 178530 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:14:15,707-Speed 2498.14 samples/sec Loss 4.0084 LearningRate 0.000760 Epoch: 8 Global Step: 178540 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:14:23,910-Speed 2497.20 samples/sec Loss 4.0144 LearningRate 0.000760 Epoch: 8 Global Step: 178550 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:14:32,107-Speed 2498.73 samples/sec Loss 4.1212 LearningRate 0.000760 Epoch: 8 Global Step: 178560 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:14:40,253-Speed 2514.62 samples/sec Loss 3.9968 LearningRate 0.000760 Epoch: 8 Global Step: 178570 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:14:48,458-Speed 2496.53 samples/sec Loss 4.0626 
LearningRate 0.000760 Epoch: 8 Global Step: 178580 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:14:56,655-Speed 2498.93 samples/sec Loss 4.0082 LearningRate 0.000760 Epoch: 8 Global Step: 178590 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:15:04,853-Speed 2498.85 samples/sec Loss 4.0105 LearningRate 0.000760 Epoch: 8 Global Step: 178600 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:15:13,056-Speed 2496.91 samples/sec Loss 3.9904 LearningRate 0.000760 Epoch: 8 Global Step: 178610 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:15:21,253-Speed 2498.91 samples/sec Loss 3.9717 LearningRate 0.000760 Epoch: 8 Global Step: 178620 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:15:29,399-Speed 2514.65 samples/sec Loss 3.9950 LearningRate 0.000760 Epoch: 8 Global Step: 178630 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:15:37,598-Speed 2498.31 samples/sec Loss 3.9778 LearningRate 0.000760 Epoch: 8 Global Step: 178640 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:15:45,795-Speed 2498.90 samples/sec Loss 4.0413 LearningRate 0.000760 Epoch: 8 Global Step: 178650 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:15:53,995-Speed 2497.93 samples/sec Loss 3.9778 LearningRate 0.000760 Epoch: 8 Global Step: 178660 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:16:02,192-Speed 2498.95 samples/sec Loss 4.0233 LearningRate 0.000760 Epoch: 8 Global Step: 178670 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:16:10,391-Speed 2498.30 samples/sec Loss 3.9779 LearningRate 0.000760 Epoch: 8 Global Step: 178680 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:16:18,535-Speed 2515.35 samples/sec Loss 4.0391 LearningRate 0.000760 Epoch: 8 Global Step: 178690 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:16:26,732-Speed 2498.83 samples/sec Loss 3.9781 
LearningRate 0.000760 Epoch: 8 Global Step: 178700 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:16:34,928-Speed 2499.35 samples/sec Loss 4.0240 LearningRate 0.000760 Epoch: 8 Global Step: 178710 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:16:43,124-Speed 2499.02 samples/sec Loss 4.0203 LearningRate 0.000760 Epoch: 8 Global Step: 178720 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:16:51,329-Speed 2496.31 samples/sec Loss 4.0142 LearningRate 0.000760 Epoch: 8 Global Step: 178730 Fp16 Grad Scale: 32768 Required: 149 hours Training: 2022-07-07 07:16:59,482-Speed 2512.58 samples/sec Loss 3.9322 LearningRate 0.000760 Epoch: 8 Global Step: 178740 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:17:07,640-Speed 2510.74 samples/sec Loss 4.0270 LearningRate 0.000760 Epoch: 8 Global Step: 178750 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:17:15,843-Speed 2496.83 samples/sec Loss 4.0132 LearningRate 0.000760 Epoch: 8 Global Step: 178760 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:17:24,038-Speed 2499.52 samples/sec Loss 3.9382 LearningRate 0.000760 Epoch: 8 Global Step: 178770 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:17:32,238-Speed 2497.89 samples/sec Loss 4.0267 LearningRate 0.000760 Epoch: 8 Global Step: 178780 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:17:40,437-Speed 2498.26 samples/sec Loss 3.9770 LearningRate 0.000760 Epoch: 8 Global Step: 178790 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:17:48,636-Speed 2498.64 samples/sec Loss 4.0191 LearningRate 0.000760 Epoch: 8 Global Step: 178800 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:17:56,784-Speed 2513.91 samples/sec Loss 3.9655 LearningRate 0.000760 Epoch: 8 Global Step: 178810 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:18:04,981-Speed 2498.95 samples/sec Loss 4.0914 
LearningRate 0.000760 Epoch: 8 Global Step: 178820 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:18:13,192-Speed 2494.68 samples/sec Loss 4.0439 LearningRate 0.000760 Epoch: 8 Global Step: 178830 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:18:21,390-Speed 2498.35 samples/sec Loss 4.0463 LearningRate 0.000760 Epoch: 8 Global Step: 178840 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:18:29,589-Speed 2498.53 samples/sec Loss 4.1005 LearningRate 0.000760 Epoch: 8 Global Step: 178850 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:18:37,788-Speed 2498.20 samples/sec Loss 4.0395 LearningRate 0.000760 Epoch: 8 Global Step: 178860 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:18:45,938-Speed 2513.27 samples/sec Loss 4.0273 LearningRate 0.000760 Epoch: 8 Global Step: 178870 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:18:54,134-Speed 2499.16 samples/sec Loss 3.9862 LearningRate 0.000760 Epoch: 8 Global Step: 178880 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:19:02,339-Speed 2496.35 samples/sec Loss 3.9777 LearningRate 0.000760 Epoch: 8 Global Step: 178890 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:19:10,534-Speed 2499.33 samples/sec Loss 4.0036 LearningRate 0.000760 Epoch: 8 Global Step: 178900 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:19:18,729-Speed 2499.71 samples/sec Loss 4.0468 LearningRate 0.000759 Epoch: 8 Global Step: 178910 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:19:26,924-Speed 2499.49 samples/sec Loss 4.0048 LearningRate 0.000759 Epoch: 8 Global Step: 178920 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:19:35,070-Speed 2514.56 samples/sec Loss 3.9865 LearningRate 0.000759 Epoch: 8 Global Step: 178930 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:19:43,265-Speed 2499.77 samples/sec Loss 4.0029 
LearningRate 0.000759 Epoch: 8 Global Step: 178940 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:19:51,463-Speed 2498.59 samples/sec Loss 4.0002 LearningRate 0.000759 Epoch: 8 Global Step: 178950 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:19:59,661-Speed 2498.48 samples/sec Loss 3.9819 LearningRate 0.000759 Epoch: 8 Global Step: 178960 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:20:07,859-Speed 2498.57 samples/sec Loss 4.0125 LearningRate 0.000759 Epoch: 8 Global Step: 178970 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:20:16,058-Speed 2498.67 samples/sec Loss 4.0354 LearningRate 0.000759 Epoch: 8 Global Step: 178980 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:20:24,201-Speed 2515.59 samples/sec Loss 4.0303 LearningRate 0.000759 Epoch: 8 Global Step: 178990 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:20:32,396-Speed 2499.51 samples/sec Loss 3.9350 LearningRate 0.000759 Epoch: 8 Global Step: 179000 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:20:40,590-Speed 2499.49 samples/sec Loss 3.9780 LearningRate 0.000759 Epoch: 8 Global Step: 179010 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:20:48,786-Speed 2499.60 samples/sec Loss 3.9978 LearningRate 0.000759 Epoch: 8 Global Step: 179020 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:20:56,982-Speed 2499.46 samples/sec Loss 3.9921 LearningRate 0.000759 Epoch: 8 Global Step: 179030 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:21:05,181-Speed 2498.14 samples/sec Loss 4.0173 LearningRate 0.000759 Epoch: 8 Global Step: 179040 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:21:13,323-Speed 2515.77 samples/sec Loss 3.9936 LearningRate 0.000759 Epoch: 8 Global Step: 179050 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:21:21,516-Speed 2500.36 samples/sec Loss 4.0076 
LearningRate 0.000759 Epoch: 8 Global Step: 179060 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:21:29,714-Speed 2498.55 samples/sec Loss 4.0625 LearningRate 0.000759 Epoch: 8 Global Step: 179070 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:21:37,914-Speed 2498.05 samples/sec Loss 3.9764 LearningRate 0.000759 Epoch: 8 Global Step: 179080 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:21:46,117-Speed 2496.93 samples/sec Loss 3.9824 LearningRate 0.000759 Epoch: 8 Global Step: 179090 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:21:54,316-Speed 2498.22 samples/sec Loss 4.0932 LearningRate 0.000759 Epoch: 8 Global Step: 179100 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:22:02,462-Speed 2514.48 samples/sec Loss 4.0010 LearningRate 0.000759 Epoch: 8 Global Step: 179110 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:22:10,661-Speed 2498.28 samples/sec Loss 4.0279 LearningRate 0.000759 Epoch: 8 Global Step: 179120 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:22:18,861-Speed 2498.00 samples/sec Loss 4.0305 LearningRate 0.000759 Epoch: 8 Global Step: 179130 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:22:27,060-Speed 2498.37 samples/sec Loss 4.0519 LearningRate 0.000759 Epoch: 8 Global Step: 179140 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:22:35,266-Speed 2496.20 samples/sec Loss 4.0005 LearningRate 0.000759 Epoch: 8 Global Step: 179150 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:22:43,465-Speed 2498.23 samples/sec Loss 4.0548 LearningRate 0.000759 Epoch: 8 Global Step: 179160 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:22:51,609-Speed 2515.36 samples/sec Loss 3.9811 LearningRate 0.000759 Epoch: 8 Global Step: 179170 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:22:59,825-Speed 2493.09 samples/sec Loss 3.9917 
LearningRate 0.000759 Epoch: 8 Global Step: 179180 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:23:08,024-Speed 2498.21 samples/sec Loss 4.0156 LearningRate 0.000759 Epoch: 8 Global Step: 179190 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:23:16,218-Speed 2499.65 samples/sec Loss 4.0027 LearningRate 0.000759 Epoch: 8 Global Step: 179200 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:23:24,415-Speed 2499.01 samples/sec Loss 4.0088 LearningRate 0.000759 Epoch: 8 Global Step: 179210 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:23:32,609-Speed 2499.77 samples/sec Loss 3.9669 LearningRate 0.000759 Epoch: 8 Global Step: 179220 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:23:40,752-Speed 2515.32 samples/sec Loss 4.0031 LearningRate 0.000759 Epoch: 8 Global Step: 179230 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:23:48,953-Speed 2497.53 samples/sec Loss 3.9040 LearningRate 0.000759 Epoch: 8 Global Step: 179240 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:23:57,159-Speed 2496.11 samples/sec Loss 3.9530 LearningRate 0.000759 Epoch: 8 Global Step: 179250 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:24:05,366-Speed 2496.04 samples/sec Loss 3.9962 LearningRate 0.000759 Epoch: 8 Global Step: 179260 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:24:13,563-Speed 2498.84 samples/sec Loss 3.9435 LearningRate 0.000759 Epoch: 8 Global Step: 179270 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:24:21,758-Speed 2499.64 samples/sec Loss 4.0318 LearningRate 0.000759 Epoch: 8 Global Step: 179280 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:24:29,901-Speed 2515.41 samples/sec Loss 4.0604 LearningRate 0.000759 Epoch: 8 Global Step: 179290 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:24:38,100-Speed 2498.30 samples/sec Loss 3.9624 
LearningRate 0.000759 Epoch: 8 Global Step: 179300 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:24:46,297-Speed 2498.93 samples/sec Loss 3.9696 LearningRate 0.000759 Epoch: 8 Global Step: 179310 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:24:54,508-Speed 2494.52 samples/sec Loss 3.9640 LearningRate 0.000759 Epoch: 8 Global Step: 179320 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:25:02,705-Speed 2499.21 samples/sec Loss 3.9237 LearningRate 0.000758 Epoch: 8 Global Step: 179330 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:25:10,903-Speed 2498.17 samples/sec Loss 3.9178 LearningRate 0.000758 Epoch: 8 Global Step: 179340 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:25:19,054-Speed 2513.28 samples/sec Loss 3.9498 LearningRate 0.000758 Epoch: 8 Global Step: 179350 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:25:27,255-Speed 2497.68 samples/sec Loss 4.0123 LearningRate 0.000758 Epoch: 8 Global Step: 179360 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:25:35,450-Speed 2499.34 samples/sec Loss 4.0097 LearningRate 0.000758 Epoch: 8 Global Step: 179370 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:25:43,649-Speed 2498.23 samples/sec Loss 3.9441 LearningRate 0.000758 Epoch: 8 Global Step: 179380 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:25:51,856-Speed 2495.98 samples/sec Loss 3.9265 LearningRate 0.000758 Epoch: 8 Global Step: 179390 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:26:00,053-Speed 2498.73 samples/sec Loss 3.9190 LearningRate 0.000758 Epoch: 8 Global Step: 179400 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:26:08,204-Speed 2513.47 samples/sec Loss 3.9699 LearningRate 0.000758 Epoch: 8 Global Step: 179410 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:26:16,410-Speed 2495.91 samples/sec Loss 3.9481 
LearningRate 0.000758 Epoch: 8 Global Step: 179420 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:26:24,607-Speed 2498.92 samples/sec Loss 3.9603 LearningRate 0.000758 Epoch: 8 Global Step: 179430 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:26:32,802-Speed 2499.45 samples/sec Loss 3.9763 LearningRate 0.000758 Epoch: 8 Global Step: 179440 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:26:41,020-Speed 2492.82 samples/sec Loss 3.9723 LearningRate 0.000758 Epoch: 8 Global Step: 179450 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:26:49,216-Speed 2499.00 samples/sec Loss 3.9247 LearningRate 0.000758 Epoch: 8 Global Step: 179460 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:26:57,362-Speed 2514.71 samples/sec Loss 3.8908 LearningRate 0.000758 Epoch: 8 Global Step: 179470 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:27:05,562-Speed 2497.93 samples/sec Loss 3.9301 LearningRate 0.000758 Epoch: 8 Global Step: 179480 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:27:13,758-Speed 2499.00 samples/sec Loss 3.9034 LearningRate 0.000758 Epoch: 8 Global Step: 179490 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:27:21,957-Speed 2498.25 samples/sec Loss 3.9630 LearningRate 0.000758 Epoch: 8 Global Step: 179500 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:27:30,182-Speed 2490.34 samples/sec Loss 3.9628 LearningRate 0.000758 Epoch: 8 Global Step: 179510 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:27:38,380-Speed 2498.58 samples/sec Loss 3.9583 LearningRate 0.000758 Epoch: 8 Global Step: 179520 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:27:46,528-Speed 2513.79 samples/sec Loss 4.0046 LearningRate 0.000758 Epoch: 8 Global Step: 179530 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:27:54,730-Speed 2497.50 samples/sec Loss 3.9961 
LearningRate 0.000758 Epoch: 8 Global Step: 179540 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:28:02,929-Speed 2498.46 samples/sec Loss 3.8852 LearningRate 0.000758 Epoch: 8 Global Step: 179550 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:28:11,132-Speed 2497.21 samples/sec Loss 3.9273 LearningRate 0.000758 Epoch: 8 Global Step: 179560 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:28:19,335-Speed 2497.04 samples/sec Loss 3.9474 LearningRate 0.000758 Epoch: 8 Global Step: 179570 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:28:27,533-Speed 2498.87 samples/sec Loss 3.9548 LearningRate 0.000758 Epoch: 8 Global Step: 179580 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:28:36,370-Speed 2317.74 samples/sec Loss 3.9348 LearningRate 0.000758 Epoch: 8 Global Step: 179590 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:28:44,569-Speed 2498.49 samples/sec Loss 3.8863 LearningRate 0.000758 Epoch: 8 Global Step: 179600 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:28:52,762-Speed 2499.89 samples/sec Loss 3.8977 LearningRate 0.000758 Epoch: 8 Global Step: 179610 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:29:00,962-Speed 2498.11 samples/sec Loss 3.9326 LearningRate 0.000758 Epoch: 8 Global Step: 179620 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:29:09,159-Speed 2498.75 samples/sec Loss 3.9198 LearningRate 0.000758 Epoch: 8 Global Step: 179630 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:29:17,445-Speed 2471.79 samples/sec Loss 4.0237 LearningRate 0.000758 Epoch: 8 Global Step: 179640 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:29:25,590-Speed 2514.99 samples/sec Loss 3.8914 LearningRate 0.000758 Epoch: 8 Global Step: 179650 Fp16 Grad Scale: 16384 Required: 149 hours Training: 2022-07-07 07:29:33,813-Speed 2490.81 samples/sec Loss 3.9230 
LearningRate 0.000758 Epoch: 8 Global Step: 179660 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:29:42,012-Speed 2498.37 samples/sec Loss 3.9780 LearningRate 0.000758 Epoch: 8 Global Step: 179670 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:29:50,209-Speed 2498.77 samples/sec Loss 3.9283 LearningRate 0.000758 Epoch: 8 Global Step: 179680 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:29:58,409-Speed 2498.29 samples/sec Loss 3.9145 LearningRate 0.000758 Epoch: 8 Global Step: 179690 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:30:06,609-Speed 2497.89 samples/sec Loss 3.9161 LearningRate 0.000758 Epoch: 8 Global Step: 179700 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:30:14,757-Speed 2514.00 samples/sec Loss 3.9887 LearningRate 0.000758 Epoch: 8 Global Step: 179710 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:30:22,954-Speed 2498.78 samples/sec Loss 3.9324 LearningRate 0.000758 Epoch: 8 Global Step: 179720 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:30:31,154-Speed 2498.05 samples/sec Loss 3.9924 LearningRate 0.000758 Epoch: 8 Global Step: 179730 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:30:39,354-Speed 2498.01 samples/sec Loss 4.0494 LearningRate 0.000758 Epoch: 8 Global Step: 179740 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:30:47,554-Speed 2498.06 samples/sec Loss 4.0389 LearningRate 0.000758 Epoch: 8 Global Step: 179750 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:30:55,753-Speed 2498.16 samples/sec Loss 3.9923 LearningRate 0.000757 Epoch: 8 Global Step: 179760 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:31:03,899-Speed 2514.56 samples/sec Loss 4.0024 LearningRate 0.000757 Epoch: 8 Global Step: 179770 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:31:12,099-Speed 2497.78 samples/sec Loss 3.9122 
LearningRate 0.000757 Epoch: 8 Global Step: 179780 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:31:20,299-Speed 2498.08 samples/sec Loss 3.9955 LearningRate 0.000757 Epoch: 8 Global Step: 179790 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:31:28,499-Speed 2497.71 samples/sec Loss 4.0069 LearningRate 0.000757 Epoch: 8 Global Step: 179800 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:31:36,707-Speed 2495.64 samples/sec Loss 3.9565 LearningRate 0.000757 Epoch: 8 Global Step: 179810 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:31:44,906-Speed 2498.39 samples/sec Loss 3.9478 LearningRate 0.000757 Epoch: 8 Global Step: 179820 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:31:53,051-Speed 2514.88 samples/sec Loss 4.0026 LearningRate 0.000757 Epoch: 8 Global Step: 179830 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:32:01,250-Speed 2497.98 samples/sec Loss 3.9606 LearningRate 0.000757 Epoch: 8 Global Step: 179840 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:32:09,453-Speed 2497.43 samples/sec Loss 3.9774 LearningRate 0.000757 Epoch: 8 Global Step: 179850 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:32:17,652-Speed 2498.04 samples/sec Loss 3.9894 LearningRate 0.000757 Epoch: 8 Global Step: 179860 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:32:25,853-Speed 2497.79 samples/sec Loss 3.9527 LearningRate 0.000757 Epoch: 8 Global Step: 179870 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:32:34,059-Speed 2496.33 samples/sec Loss 4.0533 LearningRate 0.000757 Epoch: 8 Global Step: 179880 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:32:42,204-Speed 2514.69 samples/sec Loss 3.9623 LearningRate 0.000757 Epoch: 8 Global Step: 179890 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:32:50,408-Speed 2496.74 samples/sec Loss 4.0065 
LearningRate 0.000757 Epoch: 8 Global Step: 179900 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:32:58,616-Speed 2495.88 samples/sec Loss 3.9892 LearningRate 0.000757 Epoch: 8 Global Step: 179910 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:33:06,813-Speed 2498.80 samples/sec Loss 3.9801 LearningRate 0.000757 Epoch: 8 Global Step: 179920 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:33:15,009-Speed 2499.37 samples/sec Loss 4.0280 LearningRate 0.000757 Epoch: 8 Global Step: 179930 Fp16 Grad Scale: 16384 Required: 148 hours Training: 2022-07-07 07:33:23,215-Speed 2496.09 samples/sec Loss 4.0372 LearningRate 0.000757 Epoch: 8 Global Step: 179940 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:33:31,361-Speed 2514.59 samples/sec Loss 3.9721 LearningRate 0.000757 Epoch: 8 Global Step: 179950 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:33:39,560-Speed 2498.09 samples/sec Loss 3.9599 LearningRate 0.000757 Epoch: 8 Global Step: 179960 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:33:47,763-Speed 2497.14 samples/sec Loss 3.9854 LearningRate 0.000757 Epoch: 8 Global Step: 179970 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:33:55,960-Speed 2498.90 samples/sec Loss 3.9848 LearningRate 0.000757 Epoch: 8 Global Step: 179980 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:34:04,159-Speed 2498.23 samples/sec Loss 3.9542 LearningRate 0.000757 Epoch: 8 Global Step: 179990 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:34:12,357-Speed 2498.47 samples/sec Loss 3.9541 LearningRate 0.000757 Epoch: 8 Global Step: 180000 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:34:20,502-Speed 2514.74 samples/sec Loss 3.9486 LearningRate 0.000757 Epoch: 8 Global Step: 180010 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:34:28,704-Speed 2497.96 samples/sec Loss 4.0531 
LearningRate 0.000757 Epoch: 8 Global Step: 180020 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:34:36,907-Speed 2496.94 samples/sec Loss 3.9512 LearningRate 0.000757 Epoch: 8 Global Step: 180030 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:34:45,109-Speed 2497.43 samples/sec Loss 3.9797 LearningRate 0.000757 Epoch: 8 Global Step: 180040 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:34:53,308-Speed 2498.36 samples/sec Loss 3.9309 LearningRate 0.000757 Epoch: 8 Global Step: 180050 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:35:01,509-Speed 2497.78 samples/sec Loss 3.9838 LearningRate 0.000757 Epoch: 8 Global Step: 180060 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:35:09,660-Speed 2512.87 samples/sec Loss 4.0057 LearningRate 0.000757 Epoch: 8 Global Step: 180070 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:35:17,863-Speed 2496.95 samples/sec Loss 3.9575 LearningRate 0.000757 Epoch: 8 Global Step: 180080 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:35:26,064-Speed 2497.71 samples/sec Loss 3.9475 LearningRate 0.000757 Epoch: 8 Global Step: 180090 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:35:34,273-Speed 2495.45 samples/sec Loss 3.9461 LearningRate 0.000757 Epoch: 8 Global Step: 180100 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:35:42,484-Speed 2494.39 samples/sec Loss 3.9710 LearningRate 0.000757 Epoch: 8 Global Step: 180110 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:35:50,684-Speed 2498.17 samples/sec Loss 4.0459 LearningRate 0.000757 Epoch: 8 Global Step: 180120 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:35:58,830-Speed 2514.31 samples/sec Loss 4.0211 LearningRate 0.000757 Epoch: 8 Global Step: 180130 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:36:07,030-Speed 2497.87 samples/sec Loss 3.9324 
LearningRate 0.000757 Epoch: 8 Global Step: 180140 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:36:15,227-Speed 2498.91 samples/sec Loss 3.8340 LearningRate 0.000757 Epoch: 8 Global Step: 180150 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:36:23,424-Speed 2498.98 samples/sec Loss 3.9636 LearningRate 0.000757 Epoch: 8 Global Step: 180160 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:36:31,639-Speed 2493.51 samples/sec Loss 3.9633 LearningRate 0.000757 Epoch: 8 Global Step: 180170 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:36:39,838-Speed 2498.18 samples/sec Loss 3.9758 LearningRate 0.000757 Epoch: 8 Global Step: 180180 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:36:47,985-Speed 2514.04 samples/sec Loss 3.9418 LearningRate 0.000756 Epoch: 8 Global Step: 180190 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:36:56,197-Speed 2494.26 samples/sec Loss 3.9306 LearningRate 0.000756 Epoch: 8 Global Step: 180200 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:37:04,400-Speed 2497.45 samples/sec Loss 3.9466 LearningRate 0.000756 Epoch: 8 Global Step: 180210 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:37:12,598-Speed 2498.50 samples/sec Loss 4.0199 LearningRate 0.000756 Epoch: 8 Global Step: 180220 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:37:20,794-Speed 2499.13 samples/sec Loss 3.9268 LearningRate 0.000756 Epoch: 8 Global Step: 180230 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:37:28,996-Speed 2497.37 samples/sec Loss 3.9376 LearningRate 0.000756 Epoch: 8 Global Step: 180240 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:37:37,141-Speed 2514.83 samples/sec Loss 3.8802 LearningRate 0.000756 Epoch: 8 Global Step: 180250 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:37:45,346-Speed 2496.65 samples/sec Loss 3.9360 
LearningRate 0.000756 Epoch: 8 Global Step: 180260 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:37:53,547-Speed 2497.51 samples/sec Loss 3.9940 LearningRate 0.000756 Epoch: 8 Global Step: 180270 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:38:01,751-Speed 2496.91 samples/sec Loss 3.9801 LearningRate 0.000756 Epoch: 8 Global Step: 180280 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:38:09,946-Speed 2499.54 samples/sec Loss 3.9426 LearningRate 0.000756 Epoch: 8 Global Step: 180290 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:38:18,146-Speed 2498.18 samples/sec Loss 3.9568 LearningRate 0.000756 Epoch: 8 Global Step: 180300 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:38:26,292-Speed 2514.53 samples/sec Loss 3.9463 LearningRate 0.000756 Epoch: 8 Global Step: 180310 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:38:34,491-Speed 2498.41 samples/sec Loss 4.0041 LearningRate 0.000756 Epoch: 8 Global Step: 180320 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:38:42,702-Speed 2494.39 samples/sec Loss 3.9628 LearningRate 0.000756 Epoch: 8 Global Step: 180330 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:38:50,908-Speed 2496.31 samples/sec Loss 4.0217 LearningRate 0.000756 Epoch: 8 Global Step: 180340 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:38:59,109-Speed 2497.59 samples/sec Loss 4.0152 LearningRate 0.000756 Epoch: 8 Global Step: 180350 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:39:07,329-Speed 2491.68 samples/sec Loss 4.0097 LearningRate 0.000756 Epoch: 8 Global Step: 180360 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:39:15,473-Speed 2515.95 samples/sec Loss 3.9358 LearningRate 0.000756 Epoch: 8 Global Step: 180370 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:39:23,676-Speed 2496.78 samples/sec Loss 3.9385 
LearningRate 0.000756 Epoch: 8 Global Step: 180380 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:39:31,874-Speed 2498.48 samples/sec Loss 3.9946 LearningRate 0.000756 Epoch: 8 Global Step: 180390 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:39:40,074-Speed 2498.58 samples/sec Loss 3.9192 LearningRate 0.000756 Epoch: 8 Global Step: 180400 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:39:48,270-Speed 2499.27 samples/sec Loss 3.9165 LearningRate 0.000756 Epoch: 8 Global Step: 180410 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:39:56,475-Speed 2496.58 samples/sec Loss 3.9347 LearningRate 0.000756 Epoch: 8 Global Step: 180420 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:40:04,619-Speed 2514.82 samples/sec Loss 3.9808 LearningRate 0.000756 Epoch: 8 Global Step: 180430 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:40:12,842-Speed 2491.08 samples/sec Loss 4.0162 LearningRate 0.000756 Epoch: 8 Global Step: 180440 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:40:21,044-Speed 2497.74 samples/sec Loss 4.0423 LearningRate 0.000756 Epoch: 8 Global Step: 180450 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:40:29,256-Speed 2494.39 samples/sec Loss 3.9819 LearningRate 0.000756 Epoch: 8 Global Step: 180460 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:40:37,453-Speed 2498.95 samples/sec Loss 3.9463 LearningRate 0.000756 Epoch: 8 Global Step: 180470 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:40:45,660-Speed 2495.95 samples/sec Loss 3.8930 LearningRate 0.000756 Epoch: 8 Global Step: 180480 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:40:53,808-Speed 2513.87 samples/sec Loss 3.9166 LearningRate 0.000756 Epoch: 8 Global Step: 180490 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:41:02,011-Speed 2497.13 samples/sec Loss 3.9708 
LearningRate 0.000756 Epoch: 8 Global Step: 180500 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:41:10,218-Speed 2496.16 samples/sec Loss 3.9337 LearningRate 0.000756 Epoch: 8 Global Step: 180510 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:41:18,427-Speed 2495.55 samples/sec Loss 3.9576 LearningRate 0.000756 Epoch: 8 Global Step: 180520 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:41:26,628-Speed 2497.69 samples/sec Loss 3.9424 LearningRate 0.000756 Epoch: 8 Global Step: 180530 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:41:34,829-Speed 2497.51 samples/sec Loss 3.9904 LearningRate 0.000756 Epoch: 8 Global Step: 180540 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:41:42,978-Speed 2513.91 samples/sec Loss 3.9314 LearningRate 0.000756 Epoch: 8 Global Step: 180550 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:41:51,177-Speed 2498.31 samples/sec Loss 3.9729 LearningRate 0.000756 Epoch: 8 Global Step: 180560 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:41:59,375-Speed 2498.46 samples/sec Loss 3.9463 LearningRate 0.000756 Epoch: 8 Global Step: 180570 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:42:07,576-Speed 2497.81 samples/sec Loss 3.9236 LearningRate 0.000756 Epoch: 8 Global Step: 180580 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:42:15,775-Speed 2498.22 samples/sec Loss 3.9255 LearningRate 0.000756 Epoch: 8 Global Step: 180590 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:42:23,977-Speed 2497.14 samples/sec Loss 3.9165 LearningRate 0.000756 Epoch: 8 Global Step: 180600 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:42:32,121-Speed 2515.12 samples/sec Loss 3.8790 LearningRate 0.000756 Epoch: 8 Global Step: 180610 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:42:40,316-Speed 2499.60 samples/sec Loss 3.9033 
LearningRate 0.000755 Epoch: 8 Global Step: 180620 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:42:48,513-Speed 2498.93 samples/sec Loss 3.9242 LearningRate 0.000755 Epoch: 8 Global Step: 180630 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:42:56,717-Speed 2496.44 samples/sec Loss 3.9501 LearningRate 0.000755 Epoch: 8 Global Step: 180640 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:43:04,917-Speed 2498.15 samples/sec Loss 3.8936 LearningRate 0.000755 Epoch: 8 Global Step: 180650 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:43:13,116-Speed 2498.16 samples/sec Loss 3.9176 LearningRate 0.000755 Epoch: 8 Global Step: 180660 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:43:21,268-Speed 2512.83 samples/sec Loss 3.9864 LearningRate 0.000755 Epoch: 8 Global Step: 180670 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:43:29,472-Speed 2496.47 samples/sec Loss 3.8648 LearningRate 0.000755 Epoch: 8 Global Step: 180680 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:43:37,677-Speed 2496.41 samples/sec Loss 3.9814 LearningRate 0.000755 Epoch: 8 Global Step: 180690 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:43:45,881-Speed 2496.74 samples/sec Loss 3.9437 LearningRate 0.000755 Epoch: 8 Global Step: 180700 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:43:54,092-Speed 2494.89 samples/sec Loss 3.9595 LearningRate 0.000755 Epoch: 8 Global Step: 180710 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:44:02,295-Speed 2496.98 samples/sec Loss 3.9912 LearningRate 0.000755 Epoch: 8 Global Step: 180720 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:44:10,445-Speed 2513.28 samples/sec Loss 3.9593 LearningRate 0.000755 Epoch: 8 Global Step: 180730 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:44:18,647-Speed 2497.36 samples/sec Loss 3.9206 
LearningRate 0.000755 Epoch: 8 Global Step: 180740 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:44:26,851-Speed 2497.05 samples/sec Loss 3.9341 LearningRate 0.000755 Epoch: 8 Global Step: 180750 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:44:35,053-Speed 2497.35 samples/sec Loss 3.9715 LearningRate 0.000755 Epoch: 8 Global Step: 180760 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:44:43,260-Speed 2495.59 samples/sec Loss 3.9919 LearningRate 0.000755 Epoch: 8 Global Step: 180770 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:44:51,462-Speed 2497.47 samples/sec Loss 3.9458 LearningRate 0.000755 Epoch: 8 Global Step: 180780 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:44:59,610-Speed 2513.71 samples/sec Loss 3.9146 LearningRate 0.000755 Epoch: 8 Global Step: 180790 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:45:07,829-Speed 2492.23 samples/sec Loss 3.8948 LearningRate 0.000755 Epoch: 8 Global Step: 180800 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:45:16,030-Speed 2497.65 samples/sec Loss 4.0126 LearningRate 0.000755 Epoch: 8 Global Step: 180810 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:45:24,235-Speed 2496.46 samples/sec Loss 3.9473 LearningRate 0.000755 Epoch: 8 Global Step: 180820 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:45:32,437-Speed 2497.41 samples/sec Loss 3.9083 LearningRate 0.000755 Epoch: 8 Global Step: 180830 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:45:40,645-Speed 2495.52 samples/sec Loss 3.9113 LearningRate 0.000755 Epoch: 8 Global Step: 180840 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:45:48,796-Speed 2512.71 samples/sec Loss 3.9424 LearningRate 0.000755 Epoch: 8 Global Step: 180850 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:45:56,997-Speed 2497.80 samples/sec Loss 3.9785 
LearningRate 0.000755 Epoch: 8 Global Step: 180860 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:46:05,200-Speed 2497.11 samples/sec Loss 4.0341 LearningRate 0.000755 Epoch: 8 Global Step: 180870 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:46:13,404-Speed 2496.64 samples/sec Loss 3.9625 LearningRate 0.000755 Epoch: 8 Global Step: 180880 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:46:21,607-Speed 2497.04 samples/sec Loss 4.0384 LearningRate 0.000755 Epoch: 8 Global Step: 180890 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:46:29,815-Speed 2495.57 samples/sec Loss 4.0051 LearningRate 0.000755 Epoch: 8 Global Step: 180900 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:46:37,971-Speed 2511.78 samples/sec Loss 3.9620 LearningRate 0.000755 Epoch: 8 Global Step: 180910 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:46:46,170-Speed 2498.16 samples/sec Loss 3.8949 LearningRate 0.000755 Epoch: 8 Global Step: 180920 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:46:54,380-Speed 2494.79 samples/sec Loss 3.9299 LearningRate 0.000755 Epoch: 8 Global Step: 180930 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:47:02,579-Speed 2498.31 samples/sec Loss 3.9134 LearningRate 0.000755 Epoch: 8 Global Step: 180940 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:47:10,779-Speed 2497.96 samples/sec Loss 3.8530 LearningRate 0.000755 Epoch: 8 Global Step: 180950 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:47:18,984-Speed 2496.07 samples/sec Loss 3.9578 LearningRate 0.000755 Epoch: 8 Global Step: 180960 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:47:27,131-Speed 2514.47 samples/sec Loss 3.9732 LearningRate 0.000755 Epoch: 8 Global Step: 180970 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:47:35,335-Speed 2496.76 samples/sec Loss 4.0092 
LearningRate 0.000755 Epoch: 8 Global Step: 180980 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:47:43,534-Speed 2498.10 samples/sec Loss 3.9926 LearningRate 0.000755 Epoch: 8 Global Step: 180990 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:47:51,734-Speed 2497.83 samples/sec Loss 4.0585 LearningRate 0.000755 Epoch: 8 Global Step: 181000 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:47:59,936-Speed 2497.32 samples/sec Loss 4.0340 LearningRate 0.000755 Epoch: 8 Global Step: 181010 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:48:08,137-Speed 2497.75 samples/sec Loss 3.9889 LearningRate 0.000755 Epoch: 8 Global Step: 181020 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:48:16,287-Speed 2513.36 samples/sec Loss 3.9952 LearningRate 0.000755 Epoch: 8 Global Step: 181030 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:48:24,485-Speed 2498.36 samples/sec Loss 3.9824 LearningRate 0.000755 Epoch: 8 Global Step: 181040 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:48:32,690-Speed 2496.78 samples/sec Loss 3.9784 LearningRate 0.000754 Epoch: 8 Global Step: 181050 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:48:40,892-Speed 2497.56 samples/sec Loss 3.9305 LearningRate 0.000754 Epoch: 8 Global Step: 181060 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:48:49,088-Speed 2498.96 samples/sec Loss 3.9324 LearningRate 0.000754 Epoch: 8 Global Step: 181070 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:48:57,289-Speed 2497.88 samples/sec Loss 3.9833 LearningRate 0.000754 Epoch: 8 Global Step: 181080 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:49:05,446-Speed 2510.96 samples/sec Loss 3.9007 LearningRate 0.000754 Epoch: 8 Global Step: 181090 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:49:13,647-Speed 2497.81 samples/sec Loss 3.9365 
LearningRate 0.000754 Epoch: 8 Global Step: 181100 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:49:21,861-Speed 2493.55 samples/sec Loss 3.9798 LearningRate 0.000754 Epoch: 8 Global Step: 181110 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:49:30,064-Speed 2497.27 samples/sec Loss 3.9054 LearningRate 0.000754 Epoch: 8 Global Step: 181120 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:49:38,265-Speed 2497.68 samples/sec Loss 3.9979 LearningRate 0.000754 Epoch: 8 Global Step: 181130 Fp16 Grad Scale: 32768 Required: 148 hours Training: 2022-07-07 07:49:46,482-Speed 2492.83 samples/sec Loss 3.9410 LearningRate 0.000754 Epoch: 8 Global Step: 181140 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:49:54,647-Speed 2508.71 samples/sec Loss 3.8914 LearningRate 0.000754 Epoch: 8 Global Step: 181150 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:50:02,848-Speed 2497.73 samples/sec Loss 3.9391 LearningRate 0.000754 Epoch: 8 Global Step: 181160 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:50:11,065-Speed 2493.03 samples/sec Loss 3.9332 LearningRate 0.000754 Epoch: 8 Global Step: 181170 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:50:19,266-Speed 2497.73 samples/sec Loss 3.9070 LearningRate 0.000754 Epoch: 8 Global Step: 181180 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:50:27,468-Speed 2497.22 samples/sec Loss 3.8769 LearningRate 0.000754 Epoch: 8 Global Step: 181190 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:50:35,669-Speed 2497.60 samples/sec Loss 3.9462 LearningRate 0.000754 Epoch: 8 Global Step: 181200 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:50:43,815-Speed 2515.64 samples/sec Loss 3.9387 LearningRate 0.000754 Epoch: 8 Global Step: 181210 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:50:52,016-Speed 2497.76 samples/sec Loss 3.8658 
LearningRate 0.000754 Epoch: 8 Global Step: 181220 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:51:00,215-Speed 2498.22 samples/sec Loss 4.0063 LearningRate 0.000754 Epoch: 8 Global Step: 181230 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:51:08,412-Speed 2498.75 samples/sec Loss 4.0120 LearningRate 0.000754 Epoch: 8 Global Step: 181240 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:51:16,610-Speed 2498.56 samples/sec Loss 3.9499 LearningRate 0.000754 Epoch: 8 Global Step: 181250 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:51:24,807-Speed 2499.28 samples/sec Loss 3.9592 LearningRate 0.000754 Epoch: 8 Global Step: 181260 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:51:32,968-Speed 2509.70 samples/sec Loss 3.9340 LearningRate 0.000754 Epoch: 8 Global Step: 181270 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:51:41,167-Speed 2498.34 samples/sec Loss 4.0343 LearningRate 0.000754 Epoch: 8 Global Step: 181280 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:51:49,378-Speed 2494.96 samples/sec Loss 4.0091 LearningRate 0.000754 Epoch: 8 Global Step: 181290 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:51:57,577-Speed 2497.93 samples/sec Loss 3.9544 LearningRate 0.000754 Epoch: 8 Global Step: 181300 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:52:05,775-Speed 2498.65 samples/sec Loss 3.9813 LearningRate 0.000754 Epoch: 8 Global Step: 181310 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:52:13,972-Speed 2498.93 samples/sec Loss 3.9672 LearningRate 0.000754 Epoch: 8 Global Step: 181320 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:52:22,119-Speed 2514.30 samples/sec Loss 3.9799 LearningRate 0.000754 Epoch: 8 Global Step: 181330 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:52:30,318-Speed 2498.44 samples/sec Loss 3.9774 
LearningRate 0.000754 Epoch: 8 Global Step: 181340 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:52:38,517-Speed 2498.17 samples/sec Loss 3.9106 LearningRate 0.000754 Epoch: 8 Global Step: 181350 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:52:46,719-Speed 2497.57 samples/sec Loss 3.9387 LearningRate 0.000754 Epoch: 8 Global Step: 181360 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:52:54,915-Speed 2499.08 samples/sec Loss 3.9882 LearningRate 0.000754 Epoch: 8 Global Step: 181370 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:53:03,115-Speed 2497.95 samples/sec Loss 3.8460 LearningRate 0.000754 Epoch: 8 Global Step: 181380 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:53:11,257-Speed 2515.74 samples/sec Loss 3.9818 LearningRate 0.000754 Epoch: 8 Global Step: 181390 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:53:19,454-Speed 2498.90 samples/sec Loss 3.9328 LearningRate 0.000754 Epoch: 8 Global Step: 181400 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:53:27,658-Speed 2496.91 samples/sec Loss 3.9382 LearningRate 0.000754 Epoch: 8 Global Step: 181410 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:53:35,862-Speed 2496.95 samples/sec Loss 3.9495 LearningRate 0.000754 Epoch: 8 Global Step: 181420 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:53:44,062-Speed 2497.95 samples/sec Loss 3.9902 LearningRate 0.000754 Epoch: 8 Global Step: 181430 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:53:52,267-Speed 2496.55 samples/sec Loss 3.8884 LearningRate 0.000754 Epoch: 8 Global Step: 181440 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:54:00,416-Speed 2513.47 samples/sec Loss 3.9376 LearningRate 0.000754 Epoch: 8 Global Step: 181450 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:54:08,618-Speed 2497.56 samples/sec Loss 3.9466 
LearningRate 0.000754 Epoch: 8 Global Step: 181460 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:54:16,816-Speed 2498.46 samples/sec Loss 3.9780 LearningRate 0.000754 Epoch: 8 Global Step: 181470 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:54:25,015-Speed 2498.45 samples/sec Loss 3.9679 LearningRate 0.000753 Epoch: 8 Global Step: 181480 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:54:33,214-Speed 2498.11 samples/sec Loss 3.9536 LearningRate 0.000753 Epoch: 8 Global Step: 181490 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:54:41,412-Speed 2498.39 samples/sec Loss 3.8877 LearningRate 0.000753 Epoch: 8 Global Step: 181500 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:54:49,554-Speed 2515.94 samples/sec Loss 3.9601 LearningRate 0.000753 Epoch: 8 Global Step: 181510 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:54:57,754-Speed 2498.69 samples/sec Loss 3.9684 LearningRate 0.000753 Epoch: 8 Global Step: 181520 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:55:05,960-Speed 2495.88 samples/sec Loss 3.9637 LearningRate 0.000753 Epoch: 8 Global Step: 181530 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:55:14,171-Speed 2494.73 samples/sec Loss 3.8443 LearningRate 0.000753 Epoch: 8 Global Step: 181540 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:55:22,381-Speed 2495.25 samples/sec Loss 3.9545 LearningRate 0.000753 Epoch: 8 Global Step: 181550 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:55:30,578-Speed 2499.12 samples/sec Loss 3.8766 LearningRate 0.000753 Epoch: 8 Global Step: 181560 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:55:38,722-Speed 2514.85 samples/sec Loss 3.9970 LearningRate 0.000753 Epoch: 8 Global Step: 181570 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:55:46,919-Speed 2498.75 samples/sec Loss 3.9344 
LearningRate 0.000753 Epoch: 8 Global Step: 181580 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:55:55,114-Speed 2499.55 samples/sec Loss 3.9369 LearningRate 0.000753 Epoch: 8 Global Step: 181590 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:56:03,311-Speed 2499.42 samples/sec Loss 3.8986 LearningRate 0.000753 Epoch: 8 Global Step: 181600 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:56:11,509-Speed 2498.31 samples/sec Loss 3.9688 LearningRate 0.000753 Epoch: 8 Global Step: 181610 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:56:19,710-Speed 2497.76 samples/sec Loss 4.0176 LearningRate 0.000753 Epoch: 8 Global Step: 181620 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:56:27,855-Speed 2514.88 samples/sec Loss 3.9209 LearningRate 0.000753 Epoch: 8 Global Step: 181630 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:56:36,056-Speed 2497.95 samples/sec Loss 3.9540 LearningRate 0.000753 Epoch: 8 Global Step: 181640 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:56:44,256-Speed 2497.77 samples/sec Loss 3.9302 LearningRate 0.000753 Epoch: 8 Global Step: 181650 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:56:52,461-Speed 2496.42 samples/sec Loss 3.8863 LearningRate 0.000753 Epoch: 8 Global Step: 181660 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:57:00,664-Speed 2497.09 samples/sec Loss 3.8877 LearningRate 0.000753 Epoch: 8 Global Step: 181670 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:57:08,865-Speed 2497.86 samples/sec Loss 4.0031 LearningRate 0.000753 Epoch: 8 Global Step: 181680 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:57:17,020-Speed 2511.65 samples/sec Loss 3.9636 LearningRate 0.000753 Epoch: 8 Global Step: 181690 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:57:25,220-Speed 2497.98 samples/sec Loss 3.8651 
LearningRate 0.000753 Epoch: 8 Global Step: 181700 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:57:33,418-Speed 2499.08 samples/sec Loss 3.8779 LearningRate 0.000753 Epoch: 8 Global Step: 181710 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:57:41,617-Speed 2498.36 samples/sec Loss 3.9406 LearningRate 0.000753 Epoch: 8 Global Step: 181720 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:57:49,811-Speed 2499.50 samples/sec Loss 3.9000 LearningRate 0.000753 Epoch: 8 Global Step: 181730 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:57:58,013-Speed 2497.27 samples/sec Loss 3.9418 LearningRate 0.000753 Epoch: 8 Global Step: 181740 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:58:06,156-Speed 2515.69 samples/sec Loss 3.8897 LearningRate 0.000753 Epoch: 8 Global Step: 181750 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:58:14,376-Speed 2492.01 samples/sec Loss 3.8932 LearningRate 0.000753 Epoch: 8 Global Step: 181760 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:58:22,575-Speed 2498.25 samples/sec Loss 3.9565 LearningRate 0.000753 Epoch: 8 Global Step: 181770 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:58:30,770-Speed 2499.30 samples/sec Loss 4.0103 LearningRate 0.000753 Epoch: 8 Global Step: 181780 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:58:38,970-Speed 2498.33 samples/sec Loss 3.8834 LearningRate 0.000753 Epoch: 8 Global Step: 181790 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:58:47,179-Speed 2495.15 samples/sec Loss 3.8608 LearningRate 0.000753 Epoch: 8 Global Step: 181800 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:58:55,333-Speed 2511.93 samples/sec Loss 4.0105 LearningRate 0.000753 Epoch: 8 Global Step: 181810 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:59:03,534-Speed 2497.57 samples/sec Loss 3.8913 
LearningRate 0.000753 Epoch: 8 Global Step: 181820 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:59:11,736-Speed 2497.55 samples/sec Loss 3.8643 LearningRate 0.000753 Epoch: 8 Global Step: 181830 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:59:19,935-Speed 2498.10 samples/sec Loss 3.8901 LearningRate 0.000753 Epoch: 8 Global Step: 181840 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:59:28,137-Speed 2497.48 samples/sec Loss 3.8949 LearningRate 0.000753 Epoch: 8 Global Step: 181850 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:59:36,332-Speed 2499.36 samples/sec Loss 3.9420 LearningRate 0.000753 Epoch: 8 Global Step: 181860 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:59:44,480-Speed 2514.14 samples/sec Loss 3.9519 LearningRate 0.000753 Epoch: 8 Global Step: 181870 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 07:59:52,682-Speed 2497.52 samples/sec Loss 3.8800 LearningRate 0.000753 Epoch: 8 Global Step: 181880 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:00:00,883-Speed 2497.65 samples/sec Loss 3.8895 LearningRate 0.000753 Epoch: 8 Global Step: 181890 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:00:09,083-Speed 2497.96 samples/sec Loss 3.9450 LearningRate 0.000753 Epoch: 8 Global Step: 181900 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:00:17,278-Speed 2499.23 samples/sec Loss 3.9024 LearningRate 0.000752 Epoch: 8 Global Step: 181910 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:00:25,475-Speed 2499.10 samples/sec Loss 3.8687 LearningRate 0.000752 Epoch: 8 Global Step: 181920 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:00:33,619-Speed 2514.90 samples/sec Loss 3.9130 LearningRate 0.000752 Epoch: 8 Global Step: 181930 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:00:41,830-Speed 2494.90 samples/sec Loss 3.9376 
LearningRate 0.000752 Epoch: 8 Global Step: 181940 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:00:50,026-Speed 2499.13 samples/sec Loss 4.0017 LearningRate 0.000752 Epoch: 8 Global Step: 181950 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:00:58,230-Speed 2496.68 samples/sec Loss 3.9941 LearningRate 0.000752 Epoch: 8 Global Step: 181960 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:01:06,431-Speed 2497.89 samples/sec Loss 3.9725 LearningRate 0.000752 Epoch: 8 Global Step: 181970 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:01:14,628-Speed 2498.69 samples/sec Loss 3.9627 LearningRate 0.000752 Epoch: 8 Global Step: 181980 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:01:22,775-Speed 2514.44 samples/sec Loss 3.9725 LearningRate 0.000752 Epoch: 8 Global Step: 181990 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:01:30,975-Speed 2497.84 samples/sec Loss 4.0241 LearningRate 0.000752 Epoch: 8 Global Step: 182000 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:01:39,170-Speed 2499.53 samples/sec Loss 3.8835 LearningRate 0.000752 Epoch: 8 Global Step: 182010 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:01:47,370-Speed 2497.98 samples/sec Loss 3.9036 LearningRate 0.000752 Epoch: 8 Global Step: 182020 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:01:55,568-Speed 2498.72 samples/sec Loss 3.9111 LearningRate 0.000752 Epoch: 8 Global Step: 182030 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:02:03,768-Speed 2497.69 samples/sec Loss 3.9134 LearningRate 0.000752 Epoch: 8 Global Step: 182040 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:02:11,911-Speed 2515.55 samples/sec Loss 3.9661 LearningRate 0.000752 Epoch: 8 Global Step: 182050 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:02:20,978-Speed 2259.18 samples/sec Loss 3.9160 
LearningRate 0.000752 Epoch: 8 Global Step: 182060 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:02:29,173-Speed 2499.46 samples/sec Loss 4.0522 LearningRate 0.000752 Epoch: 8 Global Step: 182070 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:02:37,369-Speed 2499.17 samples/sec Loss 3.9905 LearningRate 0.000752 Epoch: 8 Global Step: 182080 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:02:45,565-Speed 2499.08 samples/sec Loss 3.9176 LearningRate 0.000752 Epoch: 8 Global Step: 182090 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:02:53,776-Speed 2494.59 samples/sec Loss 3.9170 LearningRate 0.000752 Epoch: 8 Global Step: 182100 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:03:01,941-Speed 2508.88 samples/sec Loss 3.9790 LearningRate 0.000752 Epoch: 8 Global Step: 182110 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:03:10,137-Speed 2499.12 samples/sec Loss 3.9173 LearningRate 0.000752 Epoch: 8 Global Step: 182120 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:03:18,334-Speed 2498.97 samples/sec Loss 3.9244 LearningRate 0.000752 Epoch: 8 Global Step: 182130 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:03:26,542-Speed 2495.51 samples/sec Loss 3.9445 LearningRate 0.000752 Epoch: 8 Global Step: 182140 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:03:34,738-Speed 2499.27 samples/sec Loss 3.9601 LearningRate 0.000752 Epoch: 8 Global Step: 182150 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:03:42,933-Speed 2499.53 samples/sec Loss 3.8792 LearningRate 0.000752 Epoch: 8 Global Step: 182160 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:03:51,076-Speed 2515.23 samples/sec Loss 3.8947 LearningRate 0.000752 Epoch: 8 Global Step: 182170 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:03:59,270-Speed 2500.81 samples/sec Loss 3.9243 
LearningRate 0.000752 Epoch: 8 Global Step: 182180 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:04:07,477-Speed 2495.74 samples/sec Loss 3.8470 LearningRate 0.000752 Epoch: 8 Global Step: 182190 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:04:15,677-Speed 2498.10 samples/sec Loss 3.8930 LearningRate 0.000752 Epoch: 8 Global Step: 182200 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:04:23,876-Speed 2498.20 samples/sec Loss 3.9292 LearningRate 0.000752 Epoch: 8 Global Step: 182210 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:04:32,073-Speed 2498.81 samples/sec Loss 3.8805 LearningRate 0.000752 Epoch: 8 Global Step: 182220 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:04:40,234-Speed 2509.87 samples/sec Loss 3.9097 LearningRate 0.000752 Epoch: 8 Global Step: 182230 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:04:48,437-Speed 2497.08 samples/sec Loss 3.9462 LearningRate 0.000752 Epoch: 8 Global Step: 182240 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:04:56,634-Speed 2499.06 samples/sec Loss 3.9849 LearningRate 0.000752 Epoch: 8 Global Step: 182250 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:05:04,834-Speed 2498.11 samples/sec Loss 3.9431 LearningRate 0.000752 Epoch: 8 Global Step: 182260 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:05:13,033-Speed 2498.05 samples/sec Loss 3.9500 LearningRate 0.000752 Epoch: 8 Global Step: 182270 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:05:21,232-Speed 2498.58 samples/sec Loss 3.8827 LearningRate 0.000752 Epoch: 8 Global Step: 182280 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:05:29,382-Speed 2513.35 samples/sec Loss 3.9010 LearningRate 0.000752 Epoch: 8 Global Step: 182290 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:05:37,578-Speed 2499.15 samples/sec Loss 3.9825 
LearningRate 0.000752 Epoch: 8 Global Step: 182300 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:05:45,777-Speed 2498.18 samples/sec Loss 3.9035 LearningRate 0.000752 Epoch: 8 Global Step: 182310 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:05:53,974-Speed 2499.04 samples/sec Loss 3.8583 LearningRate 0.000752 Epoch: 8 Global Step: 182320 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:06:02,172-Speed 2498.54 samples/sec Loss 3.9020 LearningRate 0.000752 Epoch: 8 Global Step: 182330 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:06:10,372-Speed 2497.87 samples/sec Loss 3.9554 LearningRate 0.000751 Epoch: 8 Global Step: 182340 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:06:18,514-Speed 2515.74 samples/sec Loss 3.9732 LearningRate 0.000751 Epoch: 8 Global Step: 182350 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:06:26,710-Speed 2499.38 samples/sec Loss 3.8656 LearningRate 0.000751 Epoch: 8 Global Step: 182360 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:06:34,919-Speed 2495.09 samples/sec Loss 3.8851 LearningRate 0.000751 Epoch: 8 Global Step: 182370 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:06:43,117-Speed 2498.50 samples/sec Loss 3.9064 LearningRate 0.000751 Epoch: 8 Global Step: 182380 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:06:51,321-Speed 2496.69 samples/sec Loss 3.8701 LearningRate 0.000751 Epoch: 8 Global Step: 182390 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:06:59,474-Speed 2512.60 samples/sec Loss 3.9527 LearningRate 0.000751 Epoch: 8 Global Step: 182400 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:07:07,628-Speed 2512.26 samples/sec Loss 3.8662 LearningRate 0.000751 Epoch: 8 Global Step: 182410 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:07:15,824-Speed 2498.93 samples/sec Loss 3.8604 
LearningRate 0.000751 Epoch: 8 Global Step: 182420 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:07:24,022-Speed 2499.18 samples/sec Loss 3.8711 LearningRate 0.000751 Epoch: 8 Global Step: 182430 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:07:32,238-Speed 2493.22 samples/sec Loss 3.8825 LearningRate 0.000751 Epoch: 8 Global Step: 182440 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:07:40,435-Speed 2498.51 samples/sec Loss 3.9005 LearningRate 0.000751 Epoch: 8 Global Step: 182450 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:07:48,632-Speed 2499.04 samples/sec Loss 3.9389 LearningRate 0.000751 Epoch: 8 Global Step: 182460 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:07:56,772-Speed 2516.50 samples/sec Loss 3.9110 LearningRate 0.000751 Epoch: 8 Global Step: 182470 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:08:04,974-Speed 2497.33 samples/sec Loss 3.9213 LearningRate 0.000751 Epoch: 8 Global Step: 182480 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:08:13,172-Speed 2498.58 samples/sec Loss 3.8985 LearningRate 0.000751 Epoch: 8 Global Step: 182490 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:08:21,381-Speed 2495.00 samples/sec Loss 3.8930 LearningRate 0.000751 Epoch: 8 Global Step: 182500 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:08:29,580-Speed 2498.38 samples/sec Loss 3.9122 LearningRate 0.000751 Epoch: 8 Global Step: 182510 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:08:37,775-Speed 2499.38 samples/sec Loss 3.8650 LearningRate 0.000751 Epoch: 8 Global Step: 182520 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:08:45,925-Speed 2513.27 samples/sec Loss 3.9033 LearningRate 0.000751 Epoch: 8 Global Step: 182530 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:08:54,121-Speed 2499.35 samples/sec Loss 3.8687 
LearningRate 0.000751 Epoch: 8 Global Step: 182540 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:09:02,318-Speed 2498.74 samples/sec Loss 3.9283 LearningRate 0.000751 Epoch: 8 Global Step: 182550 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:09:10,511-Speed 2500.31 samples/sec Loss 3.8489 LearningRate 0.000751 Epoch: 8 Global Step: 182560 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:09:18,710-Speed 2498.25 samples/sec Loss 3.8628 LearningRate 0.000751 Epoch: 8 Global Step: 182570 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:09:26,915-Speed 2496.71 samples/sec Loss 3.8438 LearningRate 0.000751 Epoch: 8 Global Step: 182580 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:09:35,055-Speed 2516.18 samples/sec Loss 3.8250 LearningRate 0.000751 Epoch: 8 Global Step: 182590 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:09:43,251-Speed 2499.25 samples/sec Loss 3.8849 LearningRate 0.000751 Epoch: 8 Global Step: 182600 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:09:51,448-Speed 2498.62 samples/sec Loss 3.9531 LearningRate 0.000751 Epoch: 8 Global Step: 182610 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:09:59,649-Speed 2498.08 samples/sec Loss 3.9001 LearningRate 0.000751 Epoch: 8 Global Step: 182620 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:10:07,841-Speed 2500.34 samples/sec Loss 3.8761 LearningRate 0.000751 Epoch: 8 Global Step: 182630 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:10:16,037-Speed 2499.22 samples/sec Loss 3.9059 LearningRate 0.000751 Epoch: 8 Global Step: 182640 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:10:24,184-Speed 2514.37 samples/sec Loss 3.9639 LearningRate 0.000751 Epoch: 8 Global Step: 182650 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:10:32,382-Speed 2498.38 samples/sec Loss 3.8792 
LearningRate 0.000751 Epoch: 8 Global Step: 182660 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:10:40,576-Speed 2499.85 samples/sec Loss 3.9097 LearningRate 0.000751 Epoch: 8 Global Step: 182670 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:10:48,773-Speed 2498.65 samples/sec Loss 3.8919 LearningRate 0.000751 Epoch: 8 Global Step: 182680 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:10:56,972-Speed 2498.33 samples/sec Loss 3.8495 LearningRate 0.000751 Epoch: 8 Global Step: 182690 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:11:05,169-Speed 2499.11 samples/sec Loss 3.9689 LearningRate 0.000751 Epoch: 8 Global Step: 182700 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:11:13,313-Speed 2515.19 samples/sec Loss 3.8659 LearningRate 0.000751 Epoch: 8 Global Step: 182710 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:11:21,509-Speed 2499.22 samples/sec Loss 3.9641 LearningRate 0.000751 Epoch: 8 Global Step: 182720 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:11:29,712-Speed 2496.92 samples/sec Loss 3.9540 LearningRate 0.000751 Epoch: 8 Global Step: 182730 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:11:37,921-Speed 2495.26 samples/sec Loss 3.9264 LearningRate 0.000751 Epoch: 8 Global Step: 182740 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:11:46,119-Speed 2498.69 samples/sec Loss 3.9116 LearningRate 0.000751 Epoch: 8 Global Step: 182750 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:11:54,320-Speed 2497.46 samples/sec Loss 3.9536 LearningRate 0.000751 Epoch: 8 Global Step: 182760 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:12:02,463-Speed 2515.63 samples/sec Loss 3.8925 LearningRate 0.000750 Epoch: 8 Global Step: 182770 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:12:10,675-Speed 2494.40 samples/sec Loss 3.8538 
LearningRate 0.000750 Epoch: 8 Global Step: 182780 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:12:18,885-Speed 2495.01 samples/sec Loss 3.8830 LearningRate 0.000750 Epoch: 8 Global Step: 182790 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:12:27,092-Speed 2495.66 samples/sec Loss 3.9276 LearningRate 0.000750 Epoch: 8 Global Step: 182800 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:12:35,286-Speed 2499.88 samples/sec Loss 3.8703 LearningRate 0.000750 Epoch: 8 Global Step: 182810 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:12:43,485-Speed 2498.29 samples/sec Loss 3.8606 LearningRate 0.000750 Epoch: 8 Global Step: 182820 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:12:51,640-Speed 2511.65 samples/sec Loss 3.8638 LearningRate 0.000750 Epoch: 8 Global Step: 182830 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:12:59,838-Speed 2498.50 samples/sec Loss 3.8131 LearningRate 0.000750 Epoch: 8 Global Step: 182840 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:13:08,035-Speed 2498.65 samples/sec Loss 3.8662 LearningRate 0.000750 Epoch: 8 Global Step: 182850 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:13:16,231-Speed 2499.25 samples/sec Loss 3.9233 LearningRate 0.000750 Epoch: 8 Global Step: 182860 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:13:24,429-Speed 2498.56 samples/sec Loss 3.8999 LearningRate 0.000750 Epoch: 8 Global Step: 182870 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:13:32,642-Speed 2494.08 samples/sec Loss 3.8422 LearningRate 0.000750 Epoch: 8 Global Step: 182880 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:13:40,785-Speed 2515.68 samples/sec Loss 3.8429 LearningRate 0.000750 Epoch: 8 Global Step: 182890 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:13:48,985-Speed 2498.02 samples/sec Loss 3.9086 
LearningRate 0.000750 Epoch: 8 Global Step: 182900 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:13:57,189-Speed 2496.79 samples/sec Loss 3.8613 LearningRate 0.000750 Epoch: 8 Global Step: 182910 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:14:05,383-Speed 2499.57 samples/sec Loss 3.8831 LearningRate 0.000750 Epoch: 8 Global Step: 182920 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:14:13,580-Speed 2499.03 samples/sec Loss 3.9569 LearningRate 0.000750 Epoch: 8 Global Step: 182930 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:14:21,777-Speed 2498.97 samples/sec Loss 3.9166 LearningRate 0.000750 Epoch: 8 Global Step: 182940 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:14:29,922-Speed 2514.89 samples/sec Loss 3.9274 LearningRate 0.000750 Epoch: 8 Global Step: 182950 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:14:38,122-Speed 2497.80 samples/sec Loss 3.8733 LearningRate 0.000750 Epoch: 8 Global Step: 182960 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:14:46,318-Speed 2499.32 samples/sec Loss 3.8789 LearningRate 0.000750 Epoch: 8 Global Step: 182970 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:14:54,519-Speed 2497.40 samples/sec Loss 3.8859 LearningRate 0.000750 Epoch: 8 Global Step: 182980 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:15:02,717-Speed 2498.63 samples/sec Loss 4.0327 LearningRate 0.000750 Epoch: 8 Global Step: 182990 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:15:10,936-Speed 2491.97 samples/sec Loss 3.9306 LearningRate 0.000750 Epoch: 8 Global Step: 183000 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:15:19,085-Speed 2513.72 samples/sec Loss 3.9255 LearningRate 0.000750 Epoch: 8 Global Step: 183010 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:15:27,283-Speed 2498.62 samples/sec Loss 3.9226 
LearningRate 0.000750 Epoch: 8 Global Step: 183020 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:15:35,492-Speed 2495.05 samples/sec Loss 3.9377 LearningRate 0.000750 Epoch: 8 Global Step: 183030 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:15:43,694-Speed 2497.51 samples/sec Loss 3.9043 LearningRate 0.000750 Epoch: 8 Global Step: 183040 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:15:51,891-Speed 2498.72 samples/sec Loss 3.8567 LearningRate 0.000750 Epoch: 8 Global Step: 183050 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:16:00,091-Speed 2498.04 samples/sec Loss 3.8787 LearningRate 0.000750 Epoch: 8 Global Step: 183060 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:16:08,238-Speed 2514.02 samples/sec Loss 4.0395 LearningRate 0.000750 Epoch: 8 Global Step: 183070 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:16:16,443-Speed 2496.68 samples/sec Loss 3.9898 LearningRate 0.000750 Epoch: 8 Global Step: 183080 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:16:24,641-Speed 2498.60 samples/sec Loss 3.9630 LearningRate 0.000750 Epoch: 8 Global Step: 183090 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:16:32,841-Speed 2497.87 samples/sec Loss 4.0520 LearningRate 0.000750 Epoch: 8 Global Step: 183100 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:16:41,041-Speed 2498.01 samples/sec Loss 4.0622 LearningRate 0.000750 Epoch: 8 Global Step: 183110 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:16:49,239-Speed 2498.58 samples/sec Loss 3.9377 LearningRate 0.000750 Epoch: 8 Global Step: 183120 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:16:57,389-Speed 2513.31 samples/sec Loss 3.8776 LearningRate 0.000750 Epoch: 8 Global Step: 183130 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:17:05,592-Speed 2497.06 samples/sec Loss 3.9574 
LearningRate 0.000750 Epoch: 8 Global Step: 183140 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:17:13,791-Speed 2498.38 samples/sec Loss 3.9094 LearningRate 0.000750 Epoch: 8 Global Step: 183150 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:17:21,989-Speed 2498.56 samples/sec Loss 3.9232 LearningRate 0.000750 Epoch: 8 Global Step: 183160 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:17:30,205-Speed 2493.17 samples/sec Loss 3.9509 LearningRate 0.000750 Epoch: 8 Global Step: 183170 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:17:38,403-Speed 2498.93 samples/sec Loss 3.8360 LearningRate 0.000750 Epoch: 8 Global Step: 183180 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:17:46,552-Speed 2513.54 samples/sec Loss 3.9092 LearningRate 0.000750 Epoch: 8 Global Step: 183190 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:17:54,747-Speed 2499.47 samples/sec Loss 3.9228 LearningRate 0.000749 Epoch: 8 Global Step: 183200 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:18:02,948-Speed 2497.72 samples/sec Loss 3.9302 LearningRate 0.000749 Epoch: 8 Global Step: 183210 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:18:11,148-Speed 2497.98 samples/sec Loss 3.9384 LearningRate 0.000749 Epoch: 8 Global Step: 183220 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:18:19,345-Speed 2498.82 samples/sec Loss 3.8907 LearningRate 0.000749 Epoch: 8 Global Step: 183230 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:18:27,547-Speed 2497.46 samples/sec Loss 3.8809 LearningRate 0.000749 Epoch: 8 Global Step: 183240 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:18:35,700-Speed 2512.32 samples/sec Loss 3.8887 LearningRate 0.000749 Epoch: 8 Global Step: 183250 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:18:43,901-Speed 2497.73 samples/sec Loss 3.9305 
LearningRate 0.000749 Epoch: 8 Global Step: 183260 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:18:52,102-Speed 2497.67 samples/sec Loss 4.0284 LearningRate 0.000749 Epoch: 8 Global Step: 183270 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:19:00,302-Speed 2498.32 samples/sec Loss 3.9309 LearningRate 0.000749 Epoch: 8 Global Step: 183280 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:19:08,500-Speed 2498.58 samples/sec Loss 3.8991 LearningRate 0.000749 Epoch: 8 Global Step: 183290 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:19:16,701-Speed 2497.70 samples/sec Loss 3.8444 LearningRate 0.000749 Epoch: 8 Global Step: 183300 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:19:24,850-Speed 2513.45 samples/sec Loss 3.9509 LearningRate 0.000749 Epoch: 8 Global Step: 183310 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:19:33,049-Speed 2498.13 samples/sec Loss 3.8909 LearningRate 0.000749 Epoch: 8 Global Step: 183320 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:19:41,256-Speed 2496.26 samples/sec Loss 3.9255 LearningRate 0.000749 Epoch: 8 Global Step: 183330 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:19:49,454-Speed 2498.63 samples/sec Loss 3.9153 LearningRate 0.000749 Epoch: 8 Global Step: 183340 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:19:57,652-Speed 2498.47 samples/sec Loss 3.9847 LearningRate 0.000749 Epoch: 8 Global Step: 183350 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:20:05,849-Speed 2498.90 samples/sec Loss 3.9751 LearningRate 0.000749 Epoch: 8 Global Step: 183360 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:20:13,993-Speed 2515.08 samples/sec Loss 3.8764 LearningRate 0.000749 Epoch: 8 Global Step: 183370 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:20:22,188-Speed 2499.51 samples/sec Loss 3.9303 
LearningRate 0.000749 Epoch: 8 Global Step: 183380 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:20:30,383-Speed 2499.62 samples/sec Loss 3.8377 LearningRate 0.000749 Epoch: 8 Global Step: 183390 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:20:38,581-Speed 2498.50 samples/sec Loss 3.8828 LearningRate 0.000749 Epoch: 8 Global Step: 183400 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:20:46,783-Speed 2497.40 samples/sec Loss 3.9093 LearningRate 0.000749 Epoch: 8 Global Step: 183410 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:20:54,984-Speed 2497.98 samples/sec Loss 3.9098 LearningRate 0.000749 Epoch: 8 Global Step: 183420 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:21:03,133-Speed 2513.33 samples/sec Loss 3.7942 LearningRate 0.000749 Epoch: 8 Global Step: 183430 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:21:11,328-Speed 2499.92 samples/sec Loss 3.9565 LearningRate 0.000749 Epoch: 8 Global Step: 183440 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:21:19,543-Speed 2493.28 samples/sec Loss 3.8855 LearningRate 0.000749 Epoch: 8 Global Step: 183450 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:21:27,739-Speed 2499.27 samples/sec Loss 3.9344 LearningRate 0.000749 Epoch: 8 Global Step: 183460 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:21:35,948-Speed 2495.22 samples/sec Loss 3.9483 LearningRate 0.000749 Epoch: 8 Global Step: 183470 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:21:44,145-Speed 2498.74 samples/sec Loss 3.9849 LearningRate 0.000749 Epoch: 8 Global Step: 183480 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:21:52,290-Speed 2514.95 samples/sec Loss 3.9869 LearningRate 0.000749 Epoch: 8 Global Step: 183490 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:22:00,488-Speed 2498.96 samples/sec Loss 3.9504 
LearningRate 0.000749 Epoch: 8 Global Step: 183500 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:22:08,682-Speed 2499.76 samples/sec Loss 3.8326 LearningRate 0.000749 Epoch: 8 Global Step: 183510 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:22:16,878-Speed 2499.27 samples/sec Loss 3.8634 LearningRate 0.000749 Epoch: 8 Global Step: 183520 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:22:25,074-Speed 2499.08 samples/sec Loss 3.9134 LearningRate 0.000749 Epoch: 8 Global Step: 183530 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:22:33,270-Speed 2499.05 samples/sec Loss 3.9349 LearningRate 0.000749 Epoch: 8 Global Step: 183540 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:22:41,426-Speed 2511.22 samples/sec Loss 3.8198 LearningRate 0.000749 Epoch: 8 Global Step: 183550 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:22:49,623-Speed 2499.17 samples/sec Loss 3.9013 LearningRate 0.000749 Epoch: 8 Global Step: 183560 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:22:57,823-Speed 2497.93 samples/sec Loss 3.8792 LearningRate 0.000749 Epoch: 8 Global Step: 183570 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:23:06,030-Speed 2495.59 samples/sec Loss 3.8466 LearningRate 0.000749 Epoch: 8 Global Step: 183580 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:23:14,225-Speed 2499.32 samples/sec Loss 3.9686 LearningRate 0.000749 Epoch: 8 Global Step: 183590 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:23:22,421-Speed 2499.43 samples/sec Loss 3.8715 LearningRate 0.000749 Epoch: 8 Global Step: 183600 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:23:30,566-Speed 2514.81 samples/sec Loss 3.8573 LearningRate 0.000749 Epoch: 8 Global Step: 183610 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:23:38,766-Speed 2498.34 samples/sec Loss 3.8000 
LearningRate 0.000749 Epoch: 8 Global Step: 183620 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:23:46,965-Speed 2498.35 samples/sec Loss 3.8525 LearningRate 0.000748 Epoch: 8 Global Step: 183630 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:23:55,165-Speed 2498.01 samples/sec Loss 3.9272 LearningRate 0.000748 Epoch: 8 Global Step: 183640 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:24:03,362-Speed 2498.85 samples/sec Loss 3.9352 LearningRate 0.000748 Epoch: 8 Global Step: 183650 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:24:11,576-Speed 2493.80 samples/sec Loss 3.8298 LearningRate 0.000748 Epoch: 8 Global Step: 183660 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:24:19,725-Speed 2513.69 samples/sec Loss 3.9023 LearningRate 0.000748 Epoch: 8 Global Step: 183670 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:24:27,923-Speed 2498.56 samples/sec Loss 3.9573 LearningRate 0.000748 Epoch: 8 Global Step: 183680 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:24:36,123-Speed 2497.93 samples/sec Loss 3.8827 LearningRate 0.000748 Epoch: 8 Global Step: 183690 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:24:44,323-Speed 2497.78 samples/sec Loss 3.9181 LearningRate 0.000748 Epoch: 8 Global Step: 183700 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:24:52,520-Speed 2498.88 samples/sec Loss 3.9664 LearningRate 0.000748 Epoch: 8 Global Step: 183710 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:25:00,723-Speed 2496.92 samples/sec Loss 3.8594 LearningRate 0.000748 Epoch: 8 Global Step: 183720 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:25:08,867-Speed 2515.32 samples/sec Loss 3.9263 LearningRate 0.000748 Epoch: 8 Global Step: 183730 Fp16 Grad Scale: 131072 Required: 148 hours Training: 2022-07-07 08:25:17,023-Speed 2511.60 samples/sec Loss 3.9208 
LearningRate 0.000748 Epoch: 8 Global Step: 183740 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:25:25,221-Speed 2498.39 samples/sec Loss 3.8851 LearningRate 0.000748 Epoch: 8 Global Step: 183750 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:25:33,417-Speed 2499.20 samples/sec Loss 3.8824 LearningRate 0.000748 Epoch: 8 Global Step: 183760 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:25:41,617-Speed 2497.88 samples/sec Loss 3.8935 LearningRate 0.000748 Epoch: 8 Global Step: 183770 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:25:49,814-Speed 2499.05 samples/sec Loss 3.9122 LearningRate 0.000748 Epoch: 8 Global Step: 183780 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:25:57,957-Speed 2515.13 samples/sec Loss 3.9113 LearningRate 0.000748 Epoch: 8 Global Step: 183790 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:26:06,157-Speed 2498.01 samples/sec Loss 3.8911 LearningRate 0.000748 Epoch: 8 Global Step: 183800 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:26:14,376-Speed 2492.33 samples/sec Loss 3.8756 LearningRate 0.000748 Epoch: 8 Global Step: 183810 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:26:22,571-Speed 2499.30 samples/sec Loss 3.9117 LearningRate 0.000748 Epoch: 8 Global Step: 183820 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:26:30,769-Speed 2498.76 samples/sec Loss 3.9258 LearningRate 0.000748 Epoch: 8 Global Step: 183830 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:26:38,967-Speed 2498.77 samples/sec Loss 3.8479 LearningRate 0.000748 Epoch: 8 Global Step: 183840 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:26:47,112-Speed 2514.72 samples/sec Loss 3.8331 LearningRate 0.000748 Epoch: 8 Global Step: 183850 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:26:55,307-Speed 2499.54 samples/sec Loss 3.8412 
LearningRate 0.000748 Epoch: 8 Global Step: 183860 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:27:03,506-Speed 2498.35 samples/sec Loss 3.8940 LearningRate 0.000748 Epoch: 8 Global Step: 183870 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:27:11,701-Speed 2499.29 samples/sec Loss 3.8805 LearningRate 0.000748 Epoch: 8 Global Step: 183880 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:27:19,901-Speed 2498.22 samples/sec Loss 3.9114 LearningRate 0.000748 Epoch: 8 Global Step: 183890 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:27:28,100-Speed 2498.11 samples/sec Loss 3.8964 LearningRate 0.000748 Epoch: 8 Global Step: 183900 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:27:36,253-Speed 2512.50 samples/sec Loss 3.9594 LearningRate 0.000748 Epoch: 8 Global Step: 183910 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:27:44,449-Speed 2498.98 samples/sec Loss 3.9002 LearningRate 0.000748 Epoch: 8 Global Step: 183920 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:27:52,647-Speed 2498.56 samples/sec Loss 3.9609 LearningRate 0.000748 Epoch: 8 Global Step: 183930 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:28:00,844-Speed 2499.10 samples/sec Loss 3.8435 LearningRate 0.000748 Epoch: 8 Global Step: 183940 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:28:09,046-Speed 2497.32 samples/sec Loss 3.9350 LearningRate 0.000748 Epoch: 8 Global Step: 183950 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:28:17,242-Speed 2499.16 samples/sec Loss 3.9098 LearningRate 0.000748 Epoch: 8 Global Step: 183960 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:28:25,385-Speed 2515.38 samples/sec Loss 3.9235 LearningRate 0.000748 Epoch: 8 Global Step: 183970 Fp16 Grad Scale: 65536 Required: 148 hours Training: 2022-07-07 08:28:33,589-Speed 2497.04 samples/sec Loss 3.8576 
LearningRate 0.000748 Epoch: 8 Global Step: 183980 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:28:41,787-Speed 2498.49 samples/sec Loss 3.8359 LearningRate 0.000748 Epoch: 8 Global Step: 183990 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:28:49,985-Speed 2498.61 samples/sec Loss 3.9107 LearningRate 0.000748 Epoch: 8 Global Step: 184000 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:28:58,201-Speed 2493.20 samples/sec Loss 3.8907 LearningRate 0.000748 Epoch: 8 Global Step: 184010 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:29:06,400-Speed 2498.16 samples/sec Loss 3.8590 LearningRate 0.000748 Epoch: 8 Global Step: 184020 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:29:14,546-Speed 2514.52 samples/sec Loss 3.9395 LearningRate 0.000748 Epoch: 8 Global Step: 184030 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:29:22,748-Speed 2497.49 samples/sec Loss 3.8403 LearningRate 0.000748 Epoch: 8 Global Step: 184040 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:29:30,945-Speed 2498.84 samples/sec Loss 3.8399 LearningRate 0.000748 Epoch: 8 Global Step: 184050 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:29:39,141-Speed 2499.21 samples/sec Loss 3.9289 LearningRate 0.000748 Epoch: 8 Global Step: 184060 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:29:47,301-Speed 2510.18 samples/sec Loss 3.9286 LearningRate 0.000747 Epoch: 8 Global Step: 184070 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:29:55,498-Speed 2499.02 samples/sec Loss 3.9633 LearningRate 0.000747 Epoch: 8 Global Step: 184080 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:30:03,647-Speed 2513.61 samples/sec Loss 3.9370 LearningRate 0.000747 Epoch: 8 Global Step: 184090 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:30:11,845-Speed 2498.70 samples/sec Loss 3.8625 
LearningRate 0.000747 Epoch: 8 Global Step: 184100 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:30:20,042-Speed 2498.74 samples/sec Loss 3.8469 LearningRate 0.000747 Epoch: 8 Global Step: 184110 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:30:28,247-Speed 2496.27 samples/sec Loss 3.9375 LearningRate 0.000747 Epoch: 8 Global Step: 184120 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:30:36,444-Speed 2499.03 samples/sec Loss 3.9679 LearningRate 0.000747 Epoch: 8 Global Step: 184130 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:30:44,643-Speed 2498.11 samples/sec Loss 3.8914 LearningRate 0.000747 Epoch: 8 Global Step: 184140 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:30:52,796-Speed 2512.33 samples/sec Loss 3.9741 LearningRate 0.000747 Epoch: 8 Global Step: 184150 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:31:00,991-Speed 2499.48 samples/sec Loss 3.8617 LearningRate 0.000747 Epoch: 8 Global Step: 184160 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:31:09,190-Speed 2498.47 samples/sec Loss 3.8541 LearningRate 0.000747 Epoch: 8 Global Step: 184170 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:31:17,388-Speed 2498.32 samples/sec Loss 3.8252 LearningRate 0.000747 Epoch: 8 Global Step: 184180 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:31:25,584-Speed 2499.61 samples/sec Loss 3.8459 LearningRate 0.000747 Epoch: 8 Global Step: 184190 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:31:33,783-Speed 2498.17 samples/sec Loss 3.9618 LearningRate 0.000747 Epoch: 8 Global Step: 184200 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:31:41,927-Speed 2515.33 samples/sec Loss 3.9313 LearningRate 0.000747 Epoch: 8 Global Step: 184210 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:31:50,129-Speed 2497.22 samples/sec Loss 3.9245 
LearningRate 0.000747 Epoch: 8 Global Step: 184220 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:31:58,327-Speed 2498.51 samples/sec Loss 3.8499 LearningRate 0.000747 Epoch: 8 Global Step: 184230 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:32:06,532-Speed 2496.49 samples/sec Loss 3.8310 LearningRate 0.000747 Epoch: 8 Global Step: 184240 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:32:14,732-Speed 2498.20 samples/sec Loss 3.8642 LearningRate 0.000747 Epoch: 8 Global Step: 184250 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:32:22,938-Speed 2496.27 samples/sec Loss 3.8942 LearningRate 0.000747 Epoch: 8 Global Step: 184260 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:32:31,083-Speed 2514.99 samples/sec Loss 3.8930 LearningRate 0.000747 Epoch: 8 Global Step: 184270 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:32:39,282-Speed 2498.26 samples/sec Loss 3.8261 LearningRate 0.000747 Epoch: 8 Global Step: 184280 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:32:47,479-Speed 2498.59 samples/sec Loss 3.8569 LearningRate 0.000747 Epoch: 8 Global Step: 184290 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:32:55,683-Speed 2497.03 samples/sec Loss 3.8834 LearningRate 0.000747 Epoch: 8 Global Step: 184300 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:33:03,881-Speed 2498.45 samples/sec Loss 3.8227 LearningRate 0.000747 Epoch: 8 Global Step: 184310 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:33:12,081-Speed 2497.92 samples/sec Loss 3.9314 LearningRate 0.000747 Epoch: 8 Global Step: 184320 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:33:20,233-Speed 2512.91 samples/sec Loss 3.8887 LearningRate 0.000747 Epoch: 8 Global Step: 184330 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:33:28,434-Speed 2497.35 samples/sec Loss 3.9012 
LearningRate 0.000747 Epoch: 8 Global Step: 184340 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:33:36,635-Speed 2498.13 samples/sec Loss 3.8366 LearningRate 0.000747 Epoch: 8 Global Step: 184350 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:33:44,833-Speed 2498.83 samples/sec Loss 3.8566 LearningRate 0.000747 Epoch: 8 Global Step: 184360 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:33:53,032-Speed 2497.90 samples/sec Loss 3.8677 LearningRate 0.000747 Epoch: 8 Global Step: 184370 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:34:01,237-Speed 2496.44 samples/sec Loss 3.9404 LearningRate 0.000747 Epoch: 8 Global Step: 184380 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:34:09,391-Speed 2516.13 samples/sec Loss 3.8786 LearningRate 0.000747 Epoch: 8 Global Step: 184390 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:34:17,730-Speed 2456.41 samples/sec Loss 3.8692 LearningRate 0.000747 Epoch: 8 Global Step: 184400 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:34:25,930-Speed 2498.02 samples/sec Loss 3.8902 LearningRate 0.000747 Epoch: 8 Global Step: 184410 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:34:34,133-Speed 2497.21 samples/sec Loss 3.8818 LearningRate 0.000747 Epoch: 8 Global Step: 184420 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:34:42,332-Speed 2498.29 samples/sec Loss 3.9151 LearningRate 0.000747 Epoch: 8 Global Step: 184430 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:34:50,528-Speed 2499.06 samples/sec Loss 3.8849 LearningRate 0.000747 Epoch: 8 Global Step: 184440 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:34:58,673-Speed 2514.77 samples/sec Loss 3.9123 LearningRate 0.000747 Epoch: 8 Global Step: 184450 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:35:06,872-Speed 2498.23 samples/sec Loss 3.9510 
LearningRate 0.000747 Epoch: 8 Global Step: 184460 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:35:15,080-Speed 2495.57 samples/sec Loss 3.9693 LearningRate 0.000747 Epoch: 8 Global Step: 184470 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:35:23,280-Speed 2497.87 samples/sec Loss 3.8973 LearningRate 0.000747 Epoch: 8 Global Step: 184480 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:35:31,484-Speed 2497.25 samples/sec Loss 3.9163 LearningRate 0.000747 Epoch: 8 Global Step: 184490 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:35:39,680-Speed 2498.97 samples/sec Loss 3.8538 LearningRate 0.000746 Epoch: 8 Global Step: 184500 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:35:47,833-Speed 2512.42 samples/sec Loss 3.8700 LearningRate 0.000746 Epoch: 8 Global Step: 184510 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:35:56,031-Speed 2498.71 samples/sec Loss 3.9628 LearningRate 0.000746 Epoch: 8 Global Step: 184520 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:36:04,228-Speed 2498.90 samples/sec Loss 3.8928 LearningRate 0.000746 Epoch: 8 Global Step: 184530 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:36:12,441-Speed 2493.93 samples/sec Loss 3.8927 LearningRate 0.000746 Epoch: 8 Global Step: 184540 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:36:20,641-Speed 2497.98 samples/sec Loss 3.9231 LearningRate 0.000746 Epoch: 8 Global Step: 184550 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:36:28,840-Speed 2498.32 samples/sec Loss 3.9169 LearningRate 0.000746 Epoch: 8 Global Step: 184560 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:36:36,982-Speed 2515.93 samples/sec Loss 3.9749 LearningRate 0.000746 Epoch: 8 Global Step: 184570 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:36:45,179-Speed 2498.56 samples/sec Loss 3.8367 
LearningRate 0.000746 Epoch: 8 Global Step: 184580 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:36:53,379-Speed 2498.09 samples/sec Loss 3.8882 LearningRate 0.000746 Epoch: 8 Global Step: 184590 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:37:01,574-Speed 2499.76 samples/sec Loss 3.9056 LearningRate 0.000746 Epoch: 8 Global Step: 184600 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:37:09,771-Speed 2498.82 samples/sec Loss 3.9211 LearningRate 0.000746 Epoch: 8 Global Step: 184610 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:37:17,971-Speed 2497.81 samples/sec Loss 3.9011 LearningRate 0.000746 Epoch: 8 Global Step: 184620 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:37:26,116-Speed 2514.98 samples/sec Loss 3.9891 LearningRate 0.000746 Epoch: 8 Global Step: 184630 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:37:34,323-Speed 2495.78 samples/sec Loss 3.9635 LearningRate 0.000746 Epoch: 8 Global Step: 184640 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:37:42,535-Speed 2494.66 samples/sec Loss 3.9239 LearningRate 0.000746 Epoch: 8 Global Step: 184650 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:37:50,730-Speed 2499.66 samples/sec Loss 3.8958 LearningRate 0.000746 Epoch: 8 Global Step: 184660 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:37:58,931-Speed 2497.75 samples/sec Loss 3.8652 LearningRate 0.000746 Epoch: 8 Global Step: 184670 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:38:07,133-Speed 2497.33 samples/sec Loss 3.9178 LearningRate 0.000746 Epoch: 8 Global Step: 184680 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:38:15,285-Speed 2512.62 samples/sec Loss 3.8992 LearningRate 0.000746 Epoch: 8 Global Step: 184690 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:38:23,481-Speed 2499.06 samples/sec Loss 3.9835 
LearningRate 0.000746 Epoch: 8 Global Step: 184700 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:38:31,683-Speed 2497.64 samples/sec Loss 3.8971 LearningRate 0.000746 Epoch: 8 Global Step: 184710 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:38:39,885-Speed 2497.31 samples/sec Loss 3.8818 LearningRate 0.000746 Epoch: 8 Global Step: 184720 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:38:48,091-Speed 2496.10 samples/sec Loss 3.8573 LearningRate 0.000746 Epoch: 8 Global Step: 184730 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:38:56,293-Speed 2497.40 samples/sec Loss 3.9664 LearningRate 0.000746 Epoch: 8 Global Step: 184740 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:39:04,441-Speed 2513.78 samples/sec Loss 3.9449 LearningRate 0.000746 Epoch: 8 Global Step: 184750 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:39:12,639-Speed 2498.64 samples/sec Loss 3.8798 LearningRate 0.000746 Epoch: 8 Global Step: 184760 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:39:20,841-Speed 2497.50 samples/sec Loss 3.8947 LearningRate 0.000746 Epoch: 8 Global Step: 184770 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:39:29,045-Speed 2496.81 samples/sec Loss 3.9815 LearningRate 0.000746 Epoch: 8 Global Step: 184780 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:39:37,256-Speed 2494.40 samples/sec Loss 3.8887 LearningRate 0.000746 Epoch: 8 Global Step: 184790 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:39:45,462-Speed 2496.18 samples/sec Loss 3.8550 LearningRate 0.000746 Epoch: 8 Global Step: 184800 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:39:53,617-Speed 2511.87 samples/sec Loss 3.8192 LearningRate 0.000746 Epoch: 8 Global Step: 184810 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:40:01,814-Speed 2498.97 samples/sec Loss 3.8189 
LearningRate 0.000746 Epoch: 8 Global Step: 184820 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:40:10,014-Speed 2497.76 samples/sec Loss 3.8709 LearningRate 0.000746 Epoch: 8 Global Step: 184830 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:40:18,212-Speed 2498.51 samples/sec Loss 3.8733 LearningRate 0.000746 Epoch: 8 Global Step: 184840 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:40:26,411-Speed 2498.35 samples/sec Loss 3.8745 LearningRate 0.000746 Epoch: 8 Global Step: 184850 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:40:34,610-Speed 2498.38 samples/sec Loss 3.8783 LearningRate 0.000746 Epoch: 8 Global Step: 184860 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:40:42,756-Speed 2514.34 samples/sec Loss 3.8636 LearningRate 0.000746 Epoch: 8 Global Step: 184870 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:40:50,954-Speed 2498.56 samples/sec Loss 3.9035 LearningRate 0.000746 Epoch: 8 Global Step: 184880 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:40:59,154-Speed 2498.02 samples/sec Loss 3.7968 LearningRate 0.000746 Epoch: 8 Global Step: 184890 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:41:07,351-Speed 2498.85 samples/sec Loss 3.9023 LearningRate 0.000746 Epoch: 8 Global Step: 184900 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:41:15,551-Speed 2497.73 samples/sec Loss 3.8584 LearningRate 0.000746 Epoch: 8 Global Step: 184910 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:41:23,758-Speed 2496.13 samples/sec Loss 3.8919 LearningRate 0.000746 Epoch: 8 Global Step: 184920 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:41:31,919-Speed 2509.95 samples/sec Loss 3.9220 LearningRate 0.000745 Epoch: 8 Global Step: 184930 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:41:40,119-Speed 2497.77 samples/sec Loss 3.8843 
LearningRate 0.000745 Epoch: 8 Global Step: 184940 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:41:48,323-Speed 2496.82 samples/sec Loss 3.8781 LearningRate 0.000745 Epoch: 8 Global Step: 184950 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:41:56,519-Speed 2498.95 samples/sec Loss 3.8542 LearningRate 0.000745 Epoch: 8 Global Step: 184960 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:42:04,748-Speed 2489.38 samples/sec Loss 3.8610 LearningRate 0.000745 Epoch: 8 Global Step: 184970 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:42:12,953-Speed 2496.38 samples/sec Loss 3.9654 LearningRate 0.000745 Epoch: 8 Global Step: 184980 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:42:21,097-Speed 2514.89 samples/sec Loss 3.9138 LearningRate 0.000745 Epoch: 8 Global Step: 184990 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:42:29,306-Speed 2495.22 samples/sec Loss 3.9669 LearningRate 0.000745 Epoch: 8 Global Step: 185000 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:42:37,502-Speed 2499.64 samples/sec Loss 3.8903 LearningRate 0.000745 Epoch: 8 Global Step: 185010 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:42:45,699-Speed 2498.71 samples/sec Loss 3.8455 LearningRate 0.000745 Epoch: 8 Global Step: 185020 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:42:53,898-Speed 2498.24 samples/sec Loss 3.8642 LearningRate 0.000745 Epoch: 8 Global Step: 185030 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:43:02,097-Speed 2498.23 samples/sec Loss 3.8069 LearningRate 0.000745 Epoch: 8 Global Step: 185040 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:43:10,244-Speed 2514.30 samples/sec Loss 3.8953 LearningRate 0.000745 Epoch: 8 Global Step: 185050 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:43:18,453-Speed 2495.23 samples/sec Loss 3.8816 
LearningRate 0.000745 Epoch: 8 Global Step: 185060 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:43:26,649-Speed 2498.96 samples/sec Loss 3.8134 LearningRate 0.000745 Epoch: 8 Global Step: 185070 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:43:34,850-Speed 2497.84 samples/sec Loss 3.8488 LearningRate 0.000745 Epoch: 8 Global Step: 185080 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:43:43,045-Speed 2499.27 samples/sec Loss 3.8386 LearningRate 0.000745 Epoch: 8 Global Step: 185090 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:43:51,243-Speed 2498.50 samples/sec Loss 3.7772 LearningRate 0.000745 Epoch: 8 Global Step: 185100 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:43:59,387-Speed 2515.61 samples/sec Loss 3.8735 LearningRate 0.000745 Epoch: 8 Global Step: 185110 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:44:07,586-Speed 2498.31 samples/sec Loss 3.8556 LearningRate 0.000745 Epoch: 8 Global Step: 185120 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:44:15,788-Speed 2497.61 samples/sec Loss 3.8419 LearningRate 0.000745 Epoch: 8 Global Step: 185130 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:44:23,986-Speed 2498.49 samples/sec Loss 3.8626 LearningRate 0.000745 Epoch: 8 Global Step: 185140 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:44:32,190-Speed 2496.62 samples/sec Loss 3.8513 LearningRate 0.000745 Epoch: 8 Global Step: 185150 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:44:40,389-Speed 2498.32 samples/sec Loss 3.8255 LearningRate 0.000745 Epoch: 8 Global Step: 185160 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:44:48,540-Speed 2513.10 samples/sec Loss 3.8599 LearningRate 0.000745 Epoch: 8 Global Step: 185170 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:44:56,739-Speed 2498.02 samples/sec Loss 3.8253 
LearningRate 0.000745 Epoch: 8 Global Step: 185180 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:45:04,944-Speed 2496.40 samples/sec Loss 3.8529 LearningRate 0.000745 Epoch: 8 Global Step: 185190 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:45:13,142-Speed 2498.77 samples/sec Loss 3.8406 LearningRate 0.000745 Epoch: 8 Global Step: 185200 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:45:21,343-Speed 2497.43 samples/sec Loss 3.9064 LearningRate 0.000745 Epoch: 8 Global Step: 185210 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:45:29,543-Speed 2497.86 samples/sec Loss 3.8474 LearningRate 0.000745 Epoch: 8 Global Step: 185220 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:45:37,689-Speed 2514.69 samples/sec Loss 3.8727 LearningRate 0.000745 Epoch: 8 Global Step: 185230 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:45:45,886-Speed 2498.48 samples/sec Loss 3.8153 LearningRate 0.000745 Epoch: 8 Global Step: 185240 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:45:54,086-Speed 2498.18 samples/sec Loss 3.8229 LearningRate 0.000745 Epoch: 8 Global Step: 185250 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:46:02,286-Speed 2497.66 samples/sec Loss 3.8248 LearningRate 0.000745 Epoch: 8 Global Step: 185260 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:46:10,484-Speed 2498.66 samples/sec Loss 3.8237 LearningRate 0.000745 Epoch: 8 Global Step: 185270 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:46:18,680-Speed 2499.05 samples/sec Loss 3.8648 LearningRate 0.000745 Epoch: 8 Global Step: 185280 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:46:26,839-Speed 2515.84 samples/sec Loss 3.9058 LearningRate 0.000745 Epoch: 8 Global Step: 185290 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:46:35,355-Speed 2499.99 samples/sec Loss 3.8142 
LearningRate 0.000745 Epoch: 8 Global Step: 185300 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:46:43,554-Speed 2498.41 samples/sec Loss 3.9330 LearningRate 0.000745 Epoch: 8 Global Step: 185310 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:46:51,845-Speed 2500.25 samples/sec Loss 3.8610 LearningRate 0.000745 Epoch: 8 Global Step: 185320 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:47:00,076-Speed 2501.03 samples/sec Loss 3.7942 LearningRate 0.000745 Epoch: 8 Global Step: 185330 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:47:12,976-Speed 1587.78 samples/sec Loss 3.9543 LearningRate 0.000745 Epoch: 8 Global Step: 185340 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:47:21,108-Speed 2518.83 samples/sec Loss 3.8944 LearningRate 0.000745 Epoch: 8 Global Step: 185350 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:47:29,484-Speed 2502.74 samples/sec Loss 3.8537 LearningRate 0.000744 Epoch: 8 Global Step: 185360 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:47:37,731-Speed 2500.61 samples/sec Loss 3.8554 LearningRate 0.000744 Epoch: 8 Global Step: 185370 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:47:45,926-Speed 2499.37 samples/sec Loss 3.8129 LearningRate 0.000744 Epoch: 8 Global Step: 185380 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:47:54,182-Speed 2501.35 samples/sec Loss 3.9688 LearningRate 0.000744 Epoch: 8 Global Step: 185390 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:48:02,418-Speed 2501.31 samples/sec Loss 3.9284 LearningRate 0.000744 Epoch: 8 Global Step: 185400 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:48:10,566-Speed 2513.80 samples/sec Loss 3.8704 LearningRate 0.000744 Epoch: 8 Global Step: 185410 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:48:18,790-Speed 2500.63 samples/sec Loss 3.7753 
LearningRate 0.000744 Epoch: 8 Global Step: 185420 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:48:27,019-Speed 2500.45 samples/sec Loss 3.9294 LearningRate 0.000744 Epoch: 8 Global Step: 185430 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:48:35,244-Speed 2499.89 samples/sec Loss 3.9290 LearningRate 0.000744 Epoch: 8 Global Step: 185440 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:48:43,451-Speed 2496.01 samples/sec Loss 3.8250 LearningRate 0.000744 Epoch: 8 Global Step: 185450 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:48:51,651-Speed 2497.72 samples/sec Loss 3.9963 LearningRate 0.000744 Epoch: 8 Global Step: 185460 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:48:59,834-Speed 2516.83 samples/sec Loss 3.8569 LearningRate 0.000744 Epoch: 8 Global Step: 185470 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:49:08,063-Speed 2500.26 samples/sec Loss 3.8408 LearningRate 0.000744 Epoch: 8 Global Step: 185480 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:49:16,263-Speed 2497.93 samples/sec Loss 3.8854 LearningRate 0.000744 Epoch: 8 Global Step: 185490 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:49:24,507-Speed 2496.77 samples/sec Loss 3.8991 LearningRate 0.000744 Epoch: 8 Global Step: 185500 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:49:38,199-Speed 2499.57 samples/sec Loss 3.8487 LearningRate 0.000744 Epoch: 8 Global Step: 185510 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:49:46,419-Speed 2500.67 samples/sec Loss 3.9481 LearningRate 0.000744 Epoch: 8 Global Step: 185520 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:49:54,615-Speed 2516.60 samples/sec Loss 3.8461 LearningRate 0.000744 Epoch: 8 Global Step: 185530 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:50:02,841-Speed 2499.95 samples/sec Loss 3.9097 
LearningRate 0.000744 Epoch: 8 Global Step: 185540 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:50:15,903-Speed 1576.76 samples/sec Loss 3.8860 LearningRate 0.000744 Epoch: 8 Global Step: 185550 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:50:24,104-Speed 2498.81 samples/sec Loss 3.9466 LearningRate 0.000744 Epoch: 8 Global Step: 185560 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:50:32,322-Speed 2499.96 samples/sec Loss 3.8693 LearningRate 0.000744 Epoch: 8 Global Step: 185570 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 08:50:45,155-Speed 1621.55 samples/sec Loss 3.8705 LearningRate 0.000744 Epoch: 8 Global Step: 185580 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:50:55,102-Speed 2063.86 samples/sec Loss 3.9052 LearningRate 0.000744 Epoch: 8 Global Step: 185590 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:51:06,070-Speed 1867.51 samples/sec Loss 3.8883 LearningRate 0.000744 Epoch: 8 Global Step: 185600 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:51:14,267-Speed 2498.72 samples/sec Loss 3.9324 LearningRate 0.000744 Epoch: 8 Global Step: 185610 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:51:23,195-Speed 2296.61 samples/sec Loss 3.9164 LearningRate 0.000744 Epoch: 8 Global Step: 185620 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:51:31,760-Speed 2391.14 samples/sec Loss 3.9354 LearningRate 0.000744 Epoch: 8 Global Step: 185630 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:51:39,995-Speed 2499.66 samples/sec Loss 3.8395 LearningRate 0.000744 Epoch: 8 Global Step: 185640 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:51:48,152-Speed 2511.20 samples/sec Loss 3.8469 LearningRate 0.000744 Epoch: 8 Global Step: 185650 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:51:56,361-Speed 2495.04 samples/sec Loss 3.9295 
LearningRate 0.000744 Epoch: 8 Global Step: 185660 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:52:04,571-Speed 2494.91 samples/sec Loss 3.8897 LearningRate 0.000744 Epoch: 8 Global Step: 185670 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:52:12,786-Speed 2493.44 samples/sec Loss 3.7995 LearningRate 0.000744 Epoch: 8 Global Step: 185680 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:52:20,991-Speed 2496.53 samples/sec Loss 3.8101 LearningRate 0.000744 Epoch: 8 Global Step: 185690 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:52:29,195-Speed 2496.77 samples/sec Loss 3.8758 LearningRate 0.000744 Epoch: 8 Global Step: 185700 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:52:37,346-Speed 2513.03 samples/sec Loss 3.8292 LearningRate 0.000744 Epoch: 8 Global Step: 185710 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:52:45,565-Speed 2492.12 samples/sec Loss 3.8511 LearningRate 0.000744 Epoch: 8 Global Step: 185720 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:52:53,771-Speed 2496.34 samples/sec Loss 3.8424 LearningRate 0.000744 Epoch: 8 Global Step: 185730 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:53:01,973-Speed 2497.45 samples/sec Loss 3.8952 LearningRate 0.000744 Epoch: 8 Global Step: 185740 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:53:10,174-Speed 2497.71 samples/sec Loss 3.8126 LearningRate 0.000744 Epoch: 8 Global Step: 185750 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:53:18,378-Speed 2496.79 samples/sec Loss 3.8022 LearningRate 0.000744 Epoch: 8 Global Step: 185760 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:53:26,523-Speed 2514.62 samples/sec Loss 3.8521 LearningRate 0.000744 Epoch: 8 Global Step: 185770 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:53:34,725-Speed 2497.24 samples/sec Loss 3.8557 
LearningRate 0.000744 Epoch: 8 Global Step: 185780 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:53:42,931-Speed 2496.27 samples/sec Loss 3.8321 LearningRate 0.000744 Epoch: 8 Global Step: 185790 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:53:51,133-Speed 2497.72 samples/sec Loss 3.8655 LearningRate 0.000743 Epoch: 8 Global Step: 185800 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:53:59,343-Speed 2494.78 samples/sec Loss 3.8149 LearningRate 0.000743 Epoch: 8 Global Step: 185810 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:54:07,546-Speed 2497.03 samples/sec Loss 3.9218 LearningRate 0.000743 Epoch: 8 Global Step: 185820 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:54:15,702-Speed 2511.52 samples/sec Loss 3.8732 LearningRate 0.000743 Epoch: 8 Global Step: 185830 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:54:23,903-Speed 2497.57 samples/sec Loss 3.8655 LearningRate 0.000743 Epoch: 8 Global Step: 185840 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:54:32,104-Speed 2497.74 samples/sec Loss 3.8067 LearningRate 0.000743 Epoch: 8 Global Step: 185850 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:54:40,304-Speed 2498.09 samples/sec Loss 3.7689 LearningRate 0.000743 Epoch: 8 Global Step: 185860 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:54:48,505-Speed 2497.93 samples/sec Loss 3.8890 LearningRate 0.000743 Epoch: 8 Global Step: 185870 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:54:56,710-Speed 2496.48 samples/sec Loss 3.7593 LearningRate 0.000743 Epoch: 8 Global Step: 185880 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:55:04,857-Speed 2514.27 samples/sec Loss 3.8727 LearningRate 0.000743 Epoch: 8 Global Step: 185890 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:55:13,057-Speed 2498.23 samples/sec Loss 3.8544 
LearningRate 0.000743 Epoch: 8 Global Step: 185900 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:55:21,254-Speed 2498.88 samples/sec Loss 3.8312 LearningRate 0.000743 Epoch: 8 Global Step: 185910 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:55:29,454-Speed 2497.99 samples/sec Loss 3.8363 LearningRate 0.000743 Epoch: 8 Global Step: 185920 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:55:37,653-Speed 2498.00 samples/sec Loss 3.8885 LearningRate 0.000743 Epoch: 8 Global Step: 185930 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:55:45,852-Speed 2498.31 samples/sec Loss 3.8696 LearningRate 0.000743 Epoch: 8 Global Step: 185940 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:55:54,001-Speed 2513.53 samples/sec Loss 3.8532 LearningRate 0.000743 Epoch: 8 Global Step: 185950 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:56:02,204-Speed 2497.17 samples/sec Loss 3.8289 LearningRate 0.000743 Epoch: 8 Global Step: 185960 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:56:10,406-Speed 2497.54 samples/sec Loss 3.9024 LearningRate 0.000743 Epoch: 8 Global Step: 185970 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:56:18,609-Speed 2497.20 samples/sec Loss 3.8797 LearningRate 0.000743 Epoch: 8 Global Step: 185980 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:56:26,808-Speed 2498.46 samples/sec Loss 3.8489 LearningRate 0.000743 Epoch: 8 Global Step: 185990 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:56:35,011-Speed 2497.26 samples/sec Loss 3.8599 LearningRate 0.000743 Epoch: 8 Global Step: 186000 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:56:43,157-Speed 2514.37 samples/sec Loss 3.8200 LearningRate 0.000743 Epoch: 8 Global Step: 186010 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:56:51,359-Speed 2497.49 samples/sec Loss 3.8540 
LearningRate 0.000743 Epoch: 8 Global Step: 186020 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:56:59,558-Speed 2498.16 samples/sec Loss 3.8569 LearningRate 0.000743 Epoch: 8 Global Step: 186030 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:57:07,757-Speed 2498.43 samples/sec Loss 3.8353 LearningRate 0.000743 Epoch: 8 Global Step: 186040 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:57:15,957-Speed 2497.93 samples/sec Loss 3.7786 LearningRate 0.000743 Epoch: 8 Global Step: 186050 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:57:24,158-Speed 2497.92 samples/sec Loss 3.8980 LearningRate 0.000743 Epoch: 8 Global Step: 186060 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:57:32,305-Speed 2514.19 samples/sec Loss 3.8673 LearningRate 0.000743 Epoch: 8 Global Step: 186070 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:57:40,505-Speed 2497.82 samples/sec Loss 3.8814 LearningRate 0.000743 Epoch: 8 Global Step: 186080 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:57:48,729-Speed 2490.72 samples/sec Loss 3.9259 LearningRate 0.000743 Epoch: 8 Global Step: 186090 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:57:56,928-Speed 2498.24 samples/sec Loss 3.9290 LearningRate 0.000743 Epoch: 8 Global Step: 186100 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:58:05,140-Speed 2494.50 samples/sec Loss 3.8590 LearningRate 0.000743 Epoch: 8 Global Step: 186110 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:58:13,339-Speed 2498.18 samples/sec Loss 3.8277 LearningRate 0.000743 Epoch: 8 Global Step: 186120 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:58:21,499-Speed 2510.31 samples/sec Loss 3.8456 LearningRate 0.000743 Epoch: 8 Global Step: 186130 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:58:29,707-Speed 2495.41 samples/sec Loss 3.9039 
LearningRate 0.000743 Epoch: 8 Global Step: 186140 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:58:37,908-Speed 2497.59 samples/sec Loss 3.8535 LearningRate 0.000743 Epoch: 8 Global Step: 186150 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:58:46,109-Speed 2497.61 samples/sec Loss 3.8732 LearningRate 0.000743 Epoch: 8 Global Step: 186160 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:58:54,311-Speed 2497.58 samples/sec Loss 3.8255 LearningRate 0.000743 Epoch: 8 Global Step: 186170 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:59:02,523-Speed 2494.29 samples/sec Loss 3.7936 LearningRate 0.000743 Epoch: 8 Global Step: 186180 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:59:10,674-Speed 2512.85 samples/sec Loss 3.7937 LearningRate 0.000743 Epoch: 8 Global Step: 186190 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:59:18,874-Speed 2497.90 samples/sec Loss 3.8066 LearningRate 0.000743 Epoch: 8 Global Step: 186200 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:59:27,073-Speed 2498.73 samples/sec Loss 3.8311 LearningRate 0.000743 Epoch: 8 Global Step: 186210 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:59:35,281-Speed 2495.41 samples/sec Loss 3.8560 LearningRate 0.000743 Epoch: 8 Global Step: 186220 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:59:43,509-Speed 2489.59 samples/sec Loss 3.8982 LearningRate 0.000742 Epoch: 8 Global Step: 186230 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:59:51,710-Speed 2497.78 samples/sec Loss 3.8458 LearningRate 0.000742 Epoch: 8 Global Step: 186240 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 08:59:59,861-Speed 2512.83 samples/sec Loss 3.8982 LearningRate 0.000742 Epoch: 8 Global Step: 186250 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:00:08,061-Speed 2497.94 samples/sec Loss 3.8177 
LearningRate 0.000742 Epoch: 8 Global Step: 186260 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:00:16,263-Speed 2497.20 samples/sec Loss 3.8553 LearningRate 0.000742 Epoch: 8 Global Step: 186270 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:00:24,463-Speed 2498.19 samples/sec Loss 3.8722 LearningRate 0.000742 Epoch: 8 Global Step: 186280 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:00:32,658-Speed 2499.39 samples/sec Loss 3.8488 LearningRate 0.000742 Epoch: 8 Global Step: 186290 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:00:40,856-Speed 2498.57 samples/sec Loss 3.8572 LearningRate 0.000742 Epoch: 8 Global Step: 186300 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:00:49,002-Speed 2514.53 samples/sec Loss 3.8716 LearningRate 0.000742 Epoch: 8 Global Step: 186310 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:00:57,202-Speed 2497.94 samples/sec Loss 3.8166 LearningRate 0.000742 Epoch: 8 Global Step: 186320 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:01:05,399-Speed 2498.82 samples/sec Loss 3.8208 LearningRate 0.000742 Epoch: 8 Global Step: 186330 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:01:13,597-Speed 2498.64 samples/sec Loss 3.8530 LearningRate 0.000742 Epoch: 8 Global Step: 186340 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:01:21,797-Speed 2498.11 samples/sec Loss 3.9340 LearningRate 0.000742 Epoch: 8 Global Step: 186350 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:01:29,994-Speed 2498.67 samples/sec Loss 3.7944 LearningRate 0.000742 Epoch: 8 Global Step: 186360 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:01:38,142-Speed 2513.63 samples/sec Loss 3.9311 LearningRate 0.000742 Epoch: 8 Global Step: 186370 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:01:46,343-Speed 2497.86 samples/sec Loss 3.8446 
LearningRate 0.000742 Epoch: 8 Global Step: 186380 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:01:54,542-Speed 2498.38 samples/sec Loss 3.8832 LearningRate 0.000742 Epoch: 8 Global Step: 186390 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:02:02,743-Speed 2497.67 samples/sec Loss 3.9060 LearningRate 0.000742 Epoch: 8 Global Step: 186400 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:02:10,943-Speed 2497.87 samples/sec Loss 3.8258 LearningRate 0.000742 Epoch: 8 Global Step: 186410 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:02:19,153-Speed 2494.81 samples/sec Loss 3.8424 LearningRate 0.000742 Epoch: 8 Global Step: 186420 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:02:27,298-Speed 2515.17 samples/sec Loss 3.8197 LearningRate 0.000742 Epoch: 8 Global Step: 186430 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:02:35,500-Speed 2497.20 samples/sec Loss 3.8651 LearningRate 0.000742 Epoch: 8 Global Step: 186440 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:02:43,702-Speed 2497.38 samples/sec Loss 3.7813 LearningRate 0.000742 Epoch: 8 Global Step: 186450 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:02:51,901-Speed 2498.22 samples/sec Loss 3.8467 LearningRate 0.000742 Epoch: 8 Global Step: 186460 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:03:00,097-Speed 2499.21 samples/sec Loss 3.8445 LearningRate 0.000742 Epoch: 8 Global Step: 186470 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:03:08,300-Speed 2497.12 samples/sec Loss 3.8828 LearningRate 0.000742 Epoch: 8 Global Step: 186480 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:03:16,445-Speed 2514.76 samples/sec Loss 3.8298 LearningRate 0.000742 Epoch: 8 Global Step: 186490 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:03:24,674-Speed 2489.24 samples/sec Loss 3.8117 
LearningRate 0.000742 Epoch: 8 Global Step: 186500 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:03:32,879-Speed 2496.60 samples/sec Loss 3.7782 LearningRate 0.000742 Epoch: 8 Global Step: 186510 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:03:41,076-Speed 2498.81 samples/sec Loss 3.8740 LearningRate 0.000742 Epoch: 8 Global Step: 186520 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:03:49,285-Speed 2495.17 samples/sec Loss 3.8077 LearningRate 0.000742 Epoch: 8 Global Step: 186530 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:03:57,497-Speed 2494.16 samples/sec Loss 3.8998 LearningRate 0.000742 Epoch: 8 Global Step: 186540 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:04:05,641-Speed 2515.24 samples/sec Loss 3.8644 LearningRate 0.000742 Epoch: 8 Global Step: 186550 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:04:13,851-Speed 2494.80 samples/sec Loss 3.8680 LearningRate 0.000742 Epoch: 8 Global Step: 186560 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:04:22,053-Speed 2497.57 samples/sec Loss 3.8533 LearningRate 0.000742 Epoch: 8 Global Step: 186570 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:04:30,254-Speed 2498.21 samples/sec Loss 3.9024 LearningRate 0.000742 Epoch: 8 Global Step: 186580 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:04:38,457-Speed 2497.18 samples/sec Loss 3.8775 LearningRate 0.000742 Epoch: 8 Global Step: 186590 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:04:46,658-Speed 2497.57 samples/sec Loss 3.7962 LearningRate 0.000742 Epoch: 8 Global Step: 186600 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:04:54,826-Speed 2507.79 samples/sec Loss 3.9591 LearningRate 0.000742 Epoch: 8 Global Step: 186610 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:05:03,030-Speed 2496.79 samples/sec Loss 3.8588 
LearningRate 0.000742 Epoch: 8 Global Step: 186620 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:05:11,235-Speed 2496.53 samples/sec Loss 3.9486 LearningRate 0.000742 Epoch: 8 Global Step: 186630 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:05:19,435-Speed 2498.15 samples/sec Loss 3.9300 LearningRate 0.000742 Epoch: 8 Global Step: 186640 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:05:27,637-Speed 2497.30 samples/sec Loss 3.8900 LearningRate 0.000742 Epoch: 8 Global Step: 186650 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:05:37,938-Speed 1988.56 samples/sec Loss 3.8926 LearningRate 0.000741 Epoch: 9 Global Step: 186660 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:05:46,083-Speed 2514.78 samples/sec Loss 3.8844 LearningRate 0.000741 Epoch: 9 Global Step: 186670 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:05:54,279-Speed 2499.26 samples/sec Loss 3.9288 LearningRate 0.000741 Epoch: 9 Global Step: 186680 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:06:02,478-Speed 2498.32 samples/sec Loss 3.8506 LearningRate 0.000741 Epoch: 9 Global Step: 186690 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:06:10,675-Speed 2499.16 samples/sec Loss 3.8515 LearningRate 0.000741 Epoch: 9 Global Step: 186700 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:06:18,871-Speed 2499.09 samples/sec Loss 3.8639 LearningRate 0.000741 Epoch: 9 Global Step: 186710 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:06:27,077-Speed 2495.96 samples/sec Loss 3.9653 LearningRate 0.000741 Epoch: 9 Global Step: 186720 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:06:35,239-Speed 2509.86 samples/sec Loss 3.9154 LearningRate 0.000741 Epoch: 9 Global Step: 186730 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:06:43,436-Speed 2498.88 samples/sec Loss 3.8528 
LearningRate 0.000741 Epoch: 9 Global Step: 186740 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:06:51,634-Speed 2498.53 samples/sec Loss 3.9038 LearningRate 0.000741 Epoch: 9 Global Step: 186750 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:06:59,838-Speed 2496.68 samples/sec Loss 3.8919 LearningRate 0.000741 Epoch: 9 Global Step: 186760 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:07:08,036-Speed 2498.74 samples/sec Loss 3.8695 LearningRate 0.000741 Epoch: 9 Global Step: 186770 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:07:16,234-Speed 2498.49 samples/sec Loss 3.8666 LearningRate 0.000741 Epoch: 9 Global Step: 186780 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:07:24,380-Speed 2514.65 samples/sec Loss 3.8358 LearningRate 0.000741 Epoch: 9 Global Step: 186790 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:07:32,576-Speed 2499.09 samples/sec Loss 3.9076 LearningRate 0.000741 Epoch: 9 Global Step: 186800 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:07:40,777-Speed 2497.72 samples/sec Loss 3.8931 LearningRate 0.000741 Epoch: 9 Global Step: 186810 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:07:48,981-Speed 2496.72 samples/sec Loss 3.8011 LearningRate 0.000741 Epoch: 9 Global Step: 186820 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:07:57,181-Speed 2497.86 samples/sec Loss 3.9422 LearningRate 0.000741 Epoch: 9 Global Step: 186830 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:08:05,384-Speed 2496.97 samples/sec Loss 3.8294 LearningRate 0.000741 Epoch: 9 Global Step: 186840 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:08:13,532-Speed 2514.04 samples/sec Loss 3.8692 LearningRate 0.000741 Epoch: 9 Global Step: 186850 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:08:21,733-Speed 2497.66 samples/sec Loss 3.8200 
LearningRate 0.000741 Epoch: 9 Global Step: 186860 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:08:29,932-Speed 2498.32 samples/sec Loss 3.8268 LearningRate 0.000741 Epoch: 9 Global Step: 186870 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:08:38,132-Speed 2497.74 samples/sec Loss 3.8587 LearningRate 0.000741 Epoch: 9 Global Step: 186880 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:08:46,333-Speed 2498.15 samples/sec Loss 3.8500 LearningRate 0.000741 Epoch: 9 Global Step: 186890 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:08:54,535-Speed 2497.35 samples/sec Loss 3.9004 LearningRate 0.000741 Epoch: 9 Global Step: 186900 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:09:02,687-Speed 2512.87 samples/sec Loss 3.8149 LearningRate 0.000741 Epoch: 9 Global Step: 186910 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:09:10,885-Speed 2498.59 samples/sec Loss 3.8351 LearningRate 0.000741 Epoch: 9 Global Step: 186920 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:09:19,083-Speed 2498.51 samples/sec Loss 3.7688 LearningRate 0.000741 Epoch: 9 Global Step: 186930 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:09:27,282-Speed 2498.38 samples/sec Loss 3.8642 LearningRate 0.000741 Epoch: 9 Global Step: 186940 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:09:35,498-Speed 2493.24 samples/sec Loss 3.9005 LearningRate 0.000741 Epoch: 9 Global Step: 186950 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:09:43,696-Speed 2498.41 samples/sec Loss 3.9294 LearningRate 0.000741 Epoch: 9 Global Step: 186960 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:09:51,856-Speed 2510.38 samples/sec Loss 3.8640 LearningRate 0.000741 Epoch: 9 Global Step: 186970 Fp16 Grad Scale: 65536 Required: 147 hours Training: 2022-07-07 09:10:00,011-Speed 2511.86 samples/sec Loss 3.9397 
LearningRate 0.000741 Epoch: 9 Global Step: 186980 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:10:08,209-Speed 2498.27 samples/sec Loss 3.9296 LearningRate 0.000741 Epoch: 9 Global Step: 186990 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:10:16,406-Speed 2498.96 samples/sec Loss 3.9431 LearningRate 0.000741 Epoch: 9 Global Step: 187000 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:10:24,603-Speed 2498.90 samples/sec Loss 3.8911 LearningRate 0.000741 Epoch: 9 Global Step: 187010 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:10:32,805-Speed 2497.25 samples/sec Loss 3.8747 LearningRate 0.000741 Epoch: 9 Global Step: 187020 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:10:40,951-Speed 2514.37 samples/sec Loss 3.8818 LearningRate 0.000741 Epoch: 9 Global Step: 187030 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:10:49,152-Speed 2498.13 samples/sec Loss 3.8449 LearningRate 0.000741 Epoch: 9 Global Step: 187040 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:10:57,349-Speed 2498.87 samples/sec Loss 3.9091 LearningRate 0.000741 Epoch: 9 Global Step: 187050 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:11:05,550-Speed 2497.45 samples/sec Loss 3.8932 LearningRate 0.000741 Epoch: 9 Global Step: 187060 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:11:14,932-Speed 2183.12 samples/sec Loss 3.9162 LearningRate 0.000741 Epoch: 9 Global Step: 187070 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:11:24,233-Speed 2202.43 samples/sec Loss 3.8673 LearningRate 0.000741 Epoch: 9 Global Step: 187080 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:11:32,375-Speed 2516.04 samples/sec Loss 3.9707 LearningRate 0.000741 Epoch: 9 Global Step: 187090 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:11:40,575-Speed 2498.63 samples/sec Loss 3.8862 
LearningRate 0.000740 Epoch: 9 Global Step: 187100 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:11:48,769-Speed 2499.86 samples/sec Loss 3.8804 LearningRate 0.000740 Epoch: 9 Global Step: 187110 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:11:56,968-Speed 2498.33 samples/sec Loss 3.9043 LearningRate 0.000740 Epoch: 9 Global Step: 187120 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:12:05,171-Speed 2496.87 samples/sec Loss 3.8694 LearningRate 0.000740 Epoch: 9 Global Step: 187130 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:12:13,367-Speed 2499.29 samples/sec Loss 3.9419 LearningRate 0.000740 Epoch: 9 Global Step: 187140 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:12:21,511-Speed 2515.13 samples/sec Loss 3.8632 LearningRate 0.000740 Epoch: 9 Global Step: 187150 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:12:29,712-Speed 2497.84 samples/sec Loss 3.9363 LearningRate 0.000740 Epoch: 9 Global Step: 187160 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:12:37,923-Speed 2494.39 samples/sec Loss 3.8743 LearningRate 0.000740 Epoch: 9 Global Step: 187170 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:12:46,124-Speed 2497.70 samples/sec Loss 3.8795 LearningRate 0.000740 Epoch: 9 Global Step: 187180 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:12:54,327-Speed 2497.04 samples/sec Loss 3.7299 LearningRate 0.000740 Epoch: 9 Global Step: 187190 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:13:02,527-Speed 2497.87 samples/sec Loss 3.8536 LearningRate 0.000740 Epoch: 9 Global Step: 187200 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:13:10,676-Speed 2513.68 samples/sec Loss 3.8034 LearningRate 0.000740 Epoch: 9 Global Step: 187210 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:13:18,885-Speed 2495.13 samples/sec Loss 3.8791 
LearningRate 0.000740 Epoch: 9 Global Step: 187220 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:13:27,085-Speed 2498.04 samples/sec Loss 3.9211 LearningRate 0.000740 Epoch: 9 Global Step: 187230 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:13:35,295-Speed 2494.86 samples/sec Loss 3.8612 LearningRate 0.000740 Epoch: 9 Global Step: 187240 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:13:43,492-Speed 2498.95 samples/sec Loss 3.8633 LearningRate 0.000740 Epoch: 9 Global Step: 187250 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:13:51,693-Speed 2497.52 samples/sec Loss 3.8361 LearningRate 0.000740 Epoch: 9 Global Step: 187260 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:13:59,836-Speed 2515.43 samples/sec Loss 3.8668 LearningRate 0.000740 Epoch: 9 Global Step: 187270 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:14:08,034-Speed 2498.66 samples/sec Loss 3.8766 LearningRate 0.000740 Epoch: 9 Global Step: 187280 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:14:16,230-Speed 2499.03 samples/sec Loss 3.8320 LearningRate 0.000740 Epoch: 9 Global Step: 187290 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:14:24,427-Speed 2498.84 samples/sec Loss 3.8942 LearningRate 0.000740 Epoch: 9 Global Step: 187300 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:14:32,625-Speed 2498.76 samples/sec Loss 3.8516 LearningRate 0.000740 Epoch: 9 Global Step: 187310 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:14:40,822-Speed 2498.63 samples/sec Loss 3.8893 LearningRate 0.000740 Epoch: 9 Global Step: 187320 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:14:48,967-Speed 2514.99 samples/sec Loss 3.7450 LearningRate 0.000740 Epoch: 9 Global Step: 187330 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:14:57,160-Speed 2500.05 samples/sec Loss 3.8628 
LearningRate 0.000740 Epoch: 9 Global Step: 187340 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:15:05,373-Speed 2493.78 samples/sec Loss 3.8129 LearningRate 0.000740 Epoch: 9 Global Step: 187350 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:15:13,570-Speed 2498.85 samples/sec Loss 3.8443 LearningRate 0.000740 Epoch: 9 Global Step: 187360 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:15:21,780-Speed 2494.95 samples/sec Loss 3.7991 LearningRate 0.000740 Epoch: 9 Global Step: 187370 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:15:29,981-Speed 2497.91 samples/sec Loss 3.8199 LearningRate 0.000740 Epoch: 9 Global Step: 187380 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:15:38,132-Speed 2512.86 samples/sec Loss 3.8612 LearningRate 0.000740 Epoch: 9 Global Step: 187390 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:15:46,332-Speed 2497.98 samples/sec Loss 3.8459 LearningRate 0.000740 Epoch: 9 Global Step: 187400 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:15:54,529-Speed 2498.74 samples/sec Loss 3.8264 LearningRate 0.000740 Epoch: 9 Global Step: 187410 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:16:02,731-Speed 2497.34 samples/sec Loss 3.8725 LearningRate 0.000740 Epoch: 9 Global Step: 187420 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:16:10,931-Speed 2498.09 samples/sec Loss 3.8415 LearningRate 0.000740 Epoch: 9 Global Step: 187430 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:16:19,130-Speed 2498.14 samples/sec Loss 3.7876 LearningRate 0.000740 Epoch: 9 Global Step: 187440 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:16:27,280-Speed 2513.29 samples/sec Loss 3.8509 LearningRate 0.000740 Epoch: 9 Global Step: 187450 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:16:35,490-Speed 2494.82 samples/sec Loss 3.7771 
LearningRate 0.000740 Epoch: 9 Global Step: 187460 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:16:43,691-Speed 2497.89 samples/sec Loss 3.8390 LearningRate 0.000740 Epoch: 9 Global Step: 187470 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:16:51,895-Speed 2496.64 samples/sec Loss 3.8327 LearningRate 0.000740 Epoch: 9 Global Step: 187480 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:17:00,096-Speed 2497.71 samples/sec Loss 3.9799 LearningRate 0.000740 Epoch: 9 Global Step: 187490 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:17:08,296-Speed 2497.83 samples/sec Loss 3.8969 LearningRate 0.000740 Epoch: 9 Global Step: 187500 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:17:16,444-Speed 2513.98 samples/sec Loss 3.8715 LearningRate 0.000740 Epoch: 9 Global Step: 187510 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:17:24,642-Speed 2498.68 samples/sec Loss 3.9265 LearningRate 0.000740 Epoch: 9 Global Step: 187520 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:17:32,841-Speed 2498.34 samples/sec Loss 3.9047 LearningRate 0.000739 Epoch: 9 Global Step: 187530 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:17:41,046-Speed 2496.19 samples/sec Loss 3.8669 LearningRate 0.000739 Epoch: 9 Global Step: 187540 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:17:49,253-Speed 2495.99 samples/sec Loss 3.9075 LearningRate 0.000739 Epoch: 9 Global Step: 187550 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:17:57,449-Speed 2499.04 samples/sec Loss 3.9077 LearningRate 0.000739 Epoch: 9 Global Step: 187560 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:18:05,610-Speed 2510.05 samples/sec Loss 3.8947 LearningRate 0.000739 Epoch: 9 Global Step: 187570 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:18:13,808-Speed 2498.86 samples/sec Loss 3.8461 
LearningRate 0.000739 Epoch: 9 Global Step: 187580 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:18:22,008-Speed 2497.81 samples/sec Loss 3.8294 LearningRate 0.000739 Epoch: 9 Global Step: 187590 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:18:30,207-Speed 2498.42 samples/sec Loss 3.8872 LearningRate 0.000739 Epoch: 9 Global Step: 187600 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:18:38,404-Speed 2498.85 samples/sec Loss 3.8982 LearningRate 0.000739 Epoch: 9 Global Step: 187610 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:18:46,606-Speed 2497.61 samples/sec Loss 3.8299 LearningRate 0.000739 Epoch: 9 Global Step: 187620 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:18:54,768-Speed 2509.41 samples/sec Loss 3.7579 LearningRate 0.000739 Epoch: 9 Global Step: 187630 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:19:02,966-Speed 2498.93 samples/sec Loss 3.8721 LearningRate 0.000739 Epoch: 9 Global Step: 187640 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:19:11,168-Speed 2497.41 samples/sec Loss 3.8109 LearningRate 0.000739 Epoch: 9 Global Step: 187650 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:19:19,373-Speed 2496.64 samples/sec Loss 3.8774 LearningRate 0.000739 Epoch: 9 Global Step: 187660 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:19:27,574-Speed 2497.53 samples/sec Loss 3.8078 LearningRate 0.000739 Epoch: 9 Global Step: 187670 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:19:35,773-Speed 2498.16 samples/sec Loss 3.7995 LearningRate 0.000739 Epoch: 9 Global Step: 187680 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:19:43,921-Speed 2513.79 samples/sec Loss 3.8259 LearningRate 0.000739 Epoch: 9 Global Step: 187690 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:19:52,122-Speed 2497.67 samples/sec Loss 3.8641 
LearningRate 0.000739 Epoch: 9 Global Step: 187700 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:20:00,319-Speed 2498.79 samples/sec Loss 3.8616 LearningRate 0.000739 Epoch: 9 Global Step: 187710 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:20:08,524-Speed 2496.43 samples/sec Loss 3.8497 LearningRate 0.000739 Epoch: 9 Global Step: 187720 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:20:16,736-Speed 2494.37 samples/sec Loss 3.9000 LearningRate 0.000739 Epoch: 9 Global Step: 187730 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:20:24,934-Speed 2498.66 samples/sec Loss 3.8335 LearningRate 0.000739 Epoch: 9 Global Step: 187740 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:20:33,080-Speed 2514.23 samples/sec Loss 3.8050 LearningRate 0.000739 Epoch: 9 Global Step: 187750 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:20:41,284-Speed 2496.89 samples/sec Loss 3.7811 LearningRate 0.000739 Epoch: 9 Global Step: 187760 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:20:49,482-Speed 2498.38 samples/sec Loss 3.8102 LearningRate 0.000739 Epoch: 9 Global Step: 187770 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:20:57,701-Speed 2492.27 samples/sec Loss 3.8069 LearningRate 0.000739 Epoch: 9 Global Step: 187780 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:21:05,899-Speed 2498.72 samples/sec Loss 3.8477 LearningRate 0.000739 Epoch: 9 Global Step: 187790 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:21:14,102-Speed 2496.90 samples/sec Loss 3.8076 LearningRate 0.000739 Epoch: 9 Global Step: 187800 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:21:22,249-Speed 2514.42 samples/sec Loss 3.7825 LearningRate 0.000739 Epoch: 9 Global Step: 187810 Fp16 Grad Scale: 32768 Required: 147 hours Training: 2022-07-07 09:21:30,447-Speed 2498.38 samples/sec Loss 3.8703 
LearningRate 0.000739 Epoch: 9 Global Step: 187820 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:21:38,651-Speed 2496.92 samples/sec Loss 3.8229 LearningRate 0.000739 Epoch: 9 Global Step: 187830 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:21:46,848-Speed 2498.91 samples/sec Loss 3.8150 LearningRate 0.000739 Epoch: 9 Global Step: 187840 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:21:55,045-Speed 2498.88 samples/sec Loss 3.8781 LearningRate 0.000739 Epoch: 9 Global Step: 187850 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:22:03,244-Speed 2498.23 samples/sec Loss 3.8019 LearningRate 0.000739 Epoch: 9 Global Step: 187860 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:22:11,389-Speed 2514.77 samples/sec Loss 3.8933 LearningRate 0.000739 Epoch: 9 Global Step: 187870 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:22:19,590-Speed 2498.16 samples/sec Loss 3.8219 LearningRate 0.000739 Epoch: 9 Global Step: 187880 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:22:27,789-Speed 2498.21 samples/sec Loss 3.8540 LearningRate 0.000739 Epoch: 9 Global Step: 187890 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:22:35,999-Speed 2495.11 samples/sec Loss 3.8798 LearningRate 0.000739 Epoch: 9 Global Step: 187900 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:22:44,199-Speed 2497.97 samples/sec Loss 3.8380 LearningRate 0.000739 Epoch: 9 Global Step: 187910 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:22:52,399-Speed 2498.05 samples/sec Loss 3.8809 LearningRate 0.000739 Epoch: 9 Global Step: 187920 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:23:00,541-Speed 2515.81 samples/sec Loss 3.8525 LearningRate 0.000739 Epoch: 9 Global Step: 187930 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:23:08,739-Speed 2498.49 samples/sec Loss 3.9095 LearningRate 0.000739 Epoch: 9 Global Step: 187940 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:23:16,935-Speed 2499.12 samples/sec Loss 3.8720 LearningRate 0.000739 Epoch: 9 Global Step: 187950 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:23:25,134-Speed 2498.43 samples/sec Loss 3.8504 LearningRate 0.000738 Epoch: 9 Global Step: 187960 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:23:33,337-Speed 2496.96 samples/sec Loss 3.8274 LearningRate 0.000738 Epoch: 9 Global Step: 187970 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:23:41,537-Speed 2498.10 samples/sec Loss 3.8287 LearningRate 0.000738 Epoch: 9 Global Step: 187980 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:23:49,683-Speed 2514.46 samples/sec Loss 3.7731 LearningRate 0.000738 Epoch: 9 Global Step: 187990 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:23:57,887-Speed 2496.72 samples/sec Loss 3.7940 LearningRate 0.000738 Epoch: 9 Global Step: 188000 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:24:06,084-Speed 2498.83 samples/sec Loss 3.8194 LearningRate 0.000738 Epoch: 9 Global Step: 188010 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:24:14,306-Speed 2491.33 samples/sec Loss 3.7683 LearningRate 0.000738 Epoch: 9 Global Step: 188020 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:24:22,517-Speed 2494.67 samples/sec Loss 3.7722 LearningRate 0.000738 Epoch: 9 Global Step: 188030 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:24:30,715-Speed 2498.50 samples/sec Loss 3.8541 LearningRate 0.000738 Epoch: 9 Global Step: 188040 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:24:38,866-Speed 2512.89 samples/sec Loss 3.7998 LearningRate 0.000738 Epoch: 9 Global Step: 188050 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:24:47,066-Speed 2498.02 samples/sec Loss 3.8267 LearningRate 0.000738 Epoch: 9 Global Step: 188060 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:24:55,265-Speed 2498.27 samples/sec Loss 3.8003 LearningRate 0.000738 Epoch: 9 Global Step: 188070 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:25:03,471-Speed 2496.16 samples/sec Loss 3.8310 LearningRate 0.000738 Epoch: 9 Global Step: 188080 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:25:11,667-Speed 2499.13 samples/sec Loss 3.8806 LearningRate 0.000738 Epoch: 9 Global Step: 188090 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:25:19,864-Speed 2499.01 samples/sec Loss 3.8762 LearningRate 0.000738 Epoch: 9 Global Step: 188100 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:25:28,013-Speed 2513.42 samples/sec Loss 3.9187 LearningRate 0.000738 Epoch: 9 Global Step: 188110 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:25:36,212-Speed 2498.34 samples/sec Loss 3.9144 LearningRate 0.000738 Epoch: 9 Global Step: 188120 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:25:44,410-Speed 2498.36 samples/sec Loss 3.8005 LearningRate 0.000738 Epoch: 9 Global Step: 188130 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:25:52,608-Speed 2498.86 samples/sec Loss 3.9124 LearningRate 0.000738 Epoch: 9 Global Step: 188140 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:26:00,810-Speed 2497.35 samples/sec Loss 3.8536 LearningRate 0.000738 Epoch: 9 Global Step: 188150 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:26:09,010-Speed 2497.85 samples/sec Loss 3.8299 LearningRate 0.000738 Epoch: 9 Global Step: 188160 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:26:17,153-Speed 2515.70 samples/sec Loss 3.8144 LearningRate 0.000738 Epoch: 9 Global Step: 188170 Fp16 Grad Scale: 32768 Required: 147 hours
Training: 2022-07-07 09:26:25,358-Speed 2496.45 samples/sec Loss 3.7839 LearningRate 0.000738 Epoch: 9 Global Step: 188180 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:26:33,563-Speed 2496.68 samples/sec Loss 3.7725 LearningRate 0.000738 Epoch: 9 Global Step: 188190 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:26:41,757-Speed 2499.82 samples/sec Loss 3.8576 LearningRate 0.000738 Epoch: 9 Global Step: 188200 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:26:49,957-Speed 2498.07 samples/sec Loss 3.7773 LearningRate 0.000738 Epoch: 9 Global Step: 188210 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:26:58,155-Speed 2498.82 samples/sec Loss 3.8501 LearningRate 0.000738 Epoch: 9 Global Step: 188220 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:27:06,300-Speed 2514.96 samples/sec Loss 3.7764 LearningRate 0.000738 Epoch: 9 Global Step: 188230 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:27:14,496-Speed 2499.25 samples/sec Loss 3.8279 LearningRate 0.000738 Epoch: 9 Global Step: 188240 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:27:22,697-Speed 2497.42 samples/sec Loss 3.8313 LearningRate 0.000738 Epoch: 9 Global Step: 188250 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:27:30,897-Speed 2497.95 samples/sec Loss 3.8763 LearningRate 0.000738 Epoch: 9 Global Step: 188260 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:27:39,094-Speed 2499.28 samples/sec Loss 3.8238 LearningRate 0.000738 Epoch: 9 Global Step: 188270 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:27:47,303-Speed 2495.57 samples/sec Loss 3.8034 LearningRate 0.000738 Epoch: 9 Global Step: 188280 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:27:55,447-Speed 2515.03 samples/sec Loss 3.9012 LearningRate 0.000738 Epoch: 9 Global Step: 188290 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:28:03,645-Speed 2498.62 samples/sec Loss 3.9619 LearningRate 0.000738 Epoch: 9 Global Step: 188300 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:28:11,843-Speed 2498.68 samples/sec Loss 3.8448 LearningRate 0.000738 Epoch: 9 Global Step: 188310 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:28:20,039-Speed 2499.16 samples/sec Loss 3.8115 LearningRate 0.000738 Epoch: 9 Global Step: 188320 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:28:28,237-Speed 2498.58 samples/sec Loss 3.8463 LearningRate 0.000738 Epoch: 9 Global Step: 188330 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:28:36,433-Speed 2500.07 samples/sec Loss 3.7923 LearningRate 0.000738 Epoch: 9 Global Step: 188340 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:28:44,578-Speed 2515.08 samples/sec Loss 3.7622 LearningRate 0.000738 Epoch: 9 Global Step: 188350 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:28:52,785-Speed 2495.72 samples/sec Loss 3.7600 LearningRate 0.000738 Epoch: 9 Global Step: 188360 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:29:00,980-Speed 2499.31 samples/sec Loss 3.7966 LearningRate 0.000738 Epoch: 9 Global Step: 188370 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:29:09,182-Speed 2497.50 samples/sec Loss 3.8707 LearningRate 0.000738 Epoch: 9 Global Step: 188380 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:29:17,379-Speed 2498.87 samples/sec Loss 3.9324 LearningRate 0.000738 Epoch: 9 Global Step: 188390 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:29:25,580-Speed 2497.64 samples/sec Loss 3.8064 LearningRate 0.000737 Epoch: 9 Global Step: 188400 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:29:33,733-Speed 2512.25 samples/sec Loss 3.8847 LearningRate 0.000737 Epoch: 9 Global Step: 188410 Fp16 Grad Scale: 65536 Required: 147 hours
Training: 2022-07-07 09:29:41,945-Speed 2494.57 samples/sec Loss 3.8250 LearningRate 0.000737 Epoch: 9 Global Step: 188420 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:29:50,144-Speed 2498.26 samples/sec Loss 3.7781 LearningRate 0.000737 Epoch: 9 Global Step: 188430 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:29:58,347-Speed 2497.33 samples/sec Loss 3.7680 LearningRate 0.000737 Epoch: 9 Global Step: 188440 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:30:06,547-Speed 2498.17 samples/sec Loss 3.8228 LearningRate 0.000737 Epoch: 9 Global Step: 188450 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:30:14,746-Speed 2498.30 samples/sec Loss 3.7798 LearningRate 0.000737 Epoch: 9 Global Step: 188460 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:30:22,895-Speed 2513.58 samples/sec Loss 3.7777 LearningRate 0.000737 Epoch: 9 Global Step: 188470 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:30:31,104-Speed 2495.03 samples/sec Loss 3.8874 LearningRate 0.000737 Epoch: 9 Global Step: 188480 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:30:39,309-Speed 2496.71 samples/sec Loss 3.8328 LearningRate 0.000737 Epoch: 9 Global Step: 188490 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:30:47,531-Speed 2491.19 samples/sec Loss 3.7742 LearningRate 0.000737 Epoch: 9 Global Step: 188500 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:30:55,728-Speed 2498.92 samples/sec Loss 3.8649 LearningRate 0.000737 Epoch: 9 Global Step: 188510 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:31:03,926-Speed 2498.54 samples/sec Loss 3.8399 LearningRate 0.000737 Epoch: 9 Global Step: 188520 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:31:12,075-Speed 2514.12 samples/sec Loss 3.7699 LearningRate 0.000737 Epoch: 9 Global Step: 188530 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:31:20,272-Speed 2498.86 samples/sec Loss 3.8306 LearningRate 0.000737 Epoch: 9 Global Step: 188540 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:31:28,467-Speed 2499.42 samples/sec Loss 3.8580 LearningRate 0.000737 Epoch: 9 Global Step: 188550 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:31:36,670-Speed 2496.92 samples/sec Loss 3.8605 LearningRate 0.000737 Epoch: 9 Global Step: 188560 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:31:44,875-Speed 2496.62 samples/sec Loss 3.8131 LearningRate 0.000737 Epoch: 9 Global Step: 188570 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:31:53,077-Speed 2497.30 samples/sec Loss 3.7945 LearningRate 0.000737 Epoch: 9 Global Step: 188580 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:32:01,224-Speed 2514.29 samples/sec Loss 3.8933 LearningRate 0.000737 Epoch: 9 Global Step: 188590 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:32:09,421-Speed 2498.96 samples/sec Loss 3.8906 LearningRate 0.000737 Epoch: 9 Global Step: 188600 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:32:17,620-Speed 2498.05 samples/sec Loss 3.8269 LearningRate 0.000737 Epoch: 9 Global Step: 188610 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:32:25,817-Speed 2498.87 samples/sec Loss 3.8679 LearningRate 0.000737 Epoch: 9 Global Step: 188620 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:32:34,016-Speed 2498.47 samples/sec Loss 3.8012 LearningRate 0.000737 Epoch: 9 Global Step: 188630 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:32:42,214-Speed 2498.51 samples/sec Loss 3.8698 LearningRate 0.000737 Epoch: 9 Global Step: 188640 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:32:50,359-Speed 2515.07 samples/sec Loss 3.8232 LearningRate 0.000737 Epoch: 9 Global Step: 188650 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:32:58,553-Speed 2499.69 samples/sec Loss 3.7310 LearningRate 0.000737 Epoch: 9 Global Step: 188660 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:33:06,765-Speed 2494.22 samples/sec Loss 3.8379 LearningRate 0.000737 Epoch: 9 Global Step: 188670 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:33:14,962-Speed 2499.06 samples/sec Loss 3.8522 LearningRate 0.000737 Epoch: 9 Global Step: 188680 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:33:23,160-Speed 2498.59 samples/sec Loss 3.8640 LearningRate 0.000737 Epoch: 9 Global Step: 188690 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:33:31,365-Speed 2496.88 samples/sec Loss 3.8175 LearningRate 0.000737 Epoch: 9 Global Step: 188700 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:33:39,514-Speed 2513.47 samples/sec Loss 3.7830 LearningRate 0.000737 Epoch: 9 Global Step: 188710 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:33:47,712-Speed 2498.59 samples/sec Loss 3.8601 LearningRate 0.000737 Epoch: 9 Global Step: 188720 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:33:55,915-Speed 2496.93 samples/sec Loss 3.8355 LearningRate 0.000737 Epoch: 9 Global Step: 188730 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:34:04,114-Speed 2498.66 samples/sec Loss 3.8063 LearningRate 0.000737 Epoch: 9 Global Step: 188740 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:34:12,315-Speed 2497.36 samples/sec Loss 3.7960 LearningRate 0.000737 Epoch: 9 Global Step: 188750 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:34:20,514-Speed 2499.25 samples/sec Loss 3.8157 LearningRate 0.000737 Epoch: 9 Global Step: 188760 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:34:28,661-Speed 2514.27 samples/sec Loss 3.8735 LearningRate 0.000737 Epoch: 9 Global Step: 188770 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:34:36,858-Speed 2498.94 samples/sec Loss 3.8325 LearningRate 0.000737 Epoch: 9 Global Step: 188780 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:34:45,055-Speed 2498.94 samples/sec Loss 3.8216 LearningRate 0.000737 Epoch: 9 Global Step: 188790 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:34:53,266-Speed 2494.74 samples/sec Loss 3.7989 LearningRate 0.000737 Epoch: 9 Global Step: 188800 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:35:01,475-Speed 2495.67 samples/sec Loss 3.8095 LearningRate 0.000737 Epoch: 9 Global Step: 188810 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:35:09,673-Speed 2498.46 samples/sec Loss 3.8517 LearningRate 0.000737 Epoch: 9 Global Step: 188820 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:35:17,820-Speed 2514.22 samples/sec Loss 3.8871 LearningRate 0.000736 Epoch: 9 Global Step: 188830 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:35:26,021-Speed 2497.85 samples/sec Loss 3.7778 LearningRate 0.000736 Epoch: 9 Global Step: 188840 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:35:34,216-Speed 2499.44 samples/sec Loss 3.8154 LearningRate 0.000736 Epoch: 9 Global Step: 188850 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:35:42,415-Speed 2498.21 samples/sec Loss 3.7921 LearningRate 0.000736 Epoch: 9 Global Step: 188860 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:35:50,613-Speed 2498.58 samples/sec Loss 3.7251 LearningRate 0.000736 Epoch: 9 Global Step: 188870 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:35:58,812-Speed 2498.08 samples/sec Loss 3.8239 LearningRate 0.000736 Epoch: 9 Global Step: 188880 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:36:06,957-Speed 2514.92 samples/sec Loss 3.7461 LearningRate 0.000736 Epoch: 9 Global Step: 188890 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:36:15,157-Speed 2498.09 samples/sec Loss 3.7600 LearningRate 0.000736 Epoch: 9 Global Step: 188900 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:36:23,356-Speed 2498.14 samples/sec Loss 3.7599 LearningRate 0.000736 Epoch: 9 Global Step: 188910 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:36:31,559-Speed 2497.23 samples/sec Loss 3.8022 LearningRate 0.000736 Epoch: 9 Global Step: 188920 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:36:39,778-Speed 2491.97 samples/sec Loss 3.7894 LearningRate 0.000736 Epoch: 9 Global Step: 188930 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:36:47,977-Speed 2498.14 samples/sec Loss 3.7583 LearningRate 0.000736 Epoch: 9 Global Step: 188940 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:36:56,122-Speed 2514.91 samples/sec Loss 3.8497 LearningRate 0.000736 Epoch: 9 Global Step: 188950 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:37:04,319-Speed 2499.08 samples/sec Loss 3.8136 LearningRate 0.000736 Epoch: 9 Global Step: 188960 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:37:12,518-Speed 2498.03 samples/sec Loss 3.7708 LearningRate 0.000736 Epoch: 9 Global Step: 188970 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:37:20,715-Speed 2498.84 samples/sec Loss 3.8270 LearningRate 0.000736 Epoch: 9 Global Step: 188980 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:37:28,925-Speed 2494.84 samples/sec Loss 3.8328 LearningRate 0.000736 Epoch: 9 Global Step: 188990 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:37:37,135-Speed 2495.08 samples/sec Loss 3.8169 LearningRate 0.000736 Epoch: 9 Global Step: 189000 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:37:45,285-Speed 2513.01 samples/sec Loss 3.8948 LearningRate 0.000736 Epoch: 9 Global Step: 189010 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:37:53,485-Speed 2498.14 samples/sec Loss 3.8232 LearningRate 0.000736 Epoch: 9 Global Step: 189020 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:38:01,681-Speed 2499.28 samples/sec Loss 3.7913 LearningRate 0.000736 Epoch: 9 Global Step: 189030 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:38:09,879-Speed 2500.04 samples/sec Loss 3.8278 LearningRate 0.000736 Epoch: 9 Global Step: 189040 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:38:18,088-Speed 2494.95 samples/sec Loss 3.8503 LearningRate 0.000736 Epoch: 9 Global Step: 189050 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:38:26,286-Speed 2498.55 samples/sec Loss 3.8226 LearningRate 0.000736 Epoch: 9 Global Step: 189060 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:38:34,435-Speed 2513.74 samples/sec Loss 3.8420 LearningRate 0.000736 Epoch: 9 Global Step: 189070 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:38:42,633-Speed 2498.69 samples/sec Loss 3.8392 LearningRate 0.000736 Epoch: 9 Global Step: 189080 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:38:50,832-Speed 2498.26 samples/sec Loss 3.8007 LearningRate 0.000736 Epoch: 9 Global Step: 189090 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:38:59,033-Speed 2497.70 samples/sec Loss 3.8359 LearningRate 0.000736 Epoch: 9 Global Step: 189100 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:39:07,233-Speed 2498.05 samples/sec Loss 3.8402 LearningRate 0.000736 Epoch: 9 Global Step: 189110 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:39:15,433-Speed 2498.00 samples/sec Loss 3.8525 LearningRate 0.000736 Epoch: 9 Global Step: 189120 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:39:23,576-Speed 2515.16 samples/sec Loss 3.8447 LearningRate 0.000736 Epoch: 9 Global Step: 189130 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:39:31,775-Speed 2498.32 samples/sec Loss 3.8356 LearningRate 0.000736 Epoch: 9 Global Step: 189140 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:39:39,975-Speed 2497.93 samples/sec Loss 3.7932 LearningRate 0.000736 Epoch: 9 Global Step: 189150 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:39:48,175-Speed 2498.34 samples/sec Loss 3.8188 LearningRate 0.000736 Epoch: 9 Global Step: 189160 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:39:56,375-Speed 2497.95 samples/sec Loss 3.7749 LearningRate 0.000736 Epoch: 9 Global Step: 189170 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:40:04,577-Speed 2497.29 samples/sec Loss 3.8098 LearningRate 0.000736 Epoch: 9 Global Step: 189180 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:40:12,722-Speed 2514.80 samples/sec Loss 3.8096 LearningRate 0.000736 Epoch: 9 Global Step: 189190 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:40:20,931-Speed 2495.18 samples/sec Loss 3.7784 LearningRate 0.000736 Epoch: 9 Global Step: 189200 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:40:29,133-Speed 2497.46 samples/sec Loss 3.6744 LearningRate 0.000736 Epoch: 9 Global Step: 189210 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:40:37,331-Speed 2498.69 samples/sec Loss 3.7769 LearningRate 0.000736 Epoch: 9 Global Step: 189220 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:40:45,528-Speed 2499.14 samples/sec Loss 3.8375 LearningRate 0.000736 Epoch: 9 Global Step: 189230 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:40:53,727-Speed 2498.04 samples/sec Loss 3.7646 LearningRate 0.000736 Epoch: 9 Global Step: 189240 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:41:01,871-Speed 2515.34 samples/sec Loss 3.7756 LearningRate 0.000736 Epoch: 9 Global Step: 189250 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:41:10,072-Speed 2497.88 samples/sec Loss 3.8379 LearningRate 0.000736 Epoch: 9 Global Step: 189260 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:41:18,281-Speed 2495.02 samples/sec Loss 3.7791 LearningRate 0.000735 Epoch: 9 Global Step: 189270 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:41:26,490-Speed 2495.29 samples/sec Loss 3.8057 LearningRate 0.000735 Epoch: 9 Global Step: 189280 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:41:34,692-Speed 2497.35 samples/sec Loss 3.7655 LearningRate 0.000735 Epoch: 9 Global Step: 189290 Fp16 Grad Scale: 65536 Required: 146 hours
Training: 2022-07-07 09:41:42,848-Speed 2511.57 samples/sec Loss 3.7500 LearningRate 0.000735 Epoch: 9 Global Step: 189300 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:41:51,001-Speed 2512.50 samples/sec Loss 3.8094 LearningRate 0.000735 Epoch: 9 Global Step: 189310 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:41:59,209-Speed 2495.70 samples/sec Loss 3.8591 LearningRate 0.000735 Epoch: 9 Global Step: 189320 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:42:07,409-Speed 2497.98 samples/sec Loss 3.7614 LearningRate 0.000735 Epoch: 9 Global Step: 189330 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:42:15,618-Speed 2495.06 samples/sec Loss 3.7668 LearningRate 0.000735 Epoch: 9 Global Step: 189340 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:42:23,814-Speed 2499.22 samples/sec Loss 3.7488 LearningRate 0.000735 Epoch: 9 Global Step: 189350 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:42:32,017-Speed 2497.26 samples/sec Loss 3.8640 LearningRate 0.000735 Epoch: 9 Global Step: 189360 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:42:40,161-Speed 2515.02 samples/sec Loss 3.8275 LearningRate 0.000735 Epoch: 9 Global Step: 189370 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:42:48,364-Speed 2497.39 samples/sec Loss 3.8157 LearningRate 0.000735 Epoch: 9 Global Step: 189380 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:42:56,563-Speed 2498.11 samples/sec Loss 3.8061 LearningRate 0.000735 Epoch: 9 Global Step: 189390 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:43:04,764-Speed 2497.84 samples/sec Loss 3.8624 LearningRate 0.000735 Epoch: 9 Global Step: 189400 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:43:12,961-Speed 2498.86 samples/sec Loss 3.8733 LearningRate 0.000735 Epoch: 9 Global Step: 189410 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:43:21,163-Speed 2497.54 samples/sec Loss 3.8407 LearningRate 0.000735 Epoch: 9 Global Step: 189420 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:43:29,306-Speed 2515.30 samples/sec Loss 3.8283 LearningRate 0.000735 Epoch: 9 Global Step: 189430 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:43:37,505-Speed 2498.17 samples/sec Loss 3.8334 LearningRate 0.000735 Epoch: 9 Global Step: 189440 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:43:45,702-Speed 2499.06 samples/sec Loss 3.8068 LearningRate 0.000735 Epoch: 9 Global Step: 189450 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:43:53,904-Speed 2497.18 samples/sec Loss 3.9247 LearningRate 0.000735 Epoch: 9 Global Step: 189460 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:44:02,120-Speed 2493.09 samples/sec Loss 3.8646 LearningRate 0.000735 Epoch: 9 Global Step: 189470 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:44:10,345-Speed 2490.53 samples/sec Loss 3.8460 LearningRate 0.000735 Epoch: 9 Global Step: 189480 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:44:18,503-Speed 2510.84 samples/sec Loss 3.7772 LearningRate 0.000735 Epoch: 9 Global Step: 189490 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:44:26,761-Speed 2500.44 samples/sec Loss 3.8219 LearningRate 0.000735 Epoch: 9 Global Step: 189500 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:44:34,965-Speed 2496.90 samples/sec Loss 3.7785 LearningRate 0.000735 Epoch: 9 Global Step: 189510 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:44:43,163-Speed 2498.59 samples/sec Loss 3.7873 LearningRate 0.000735 Epoch: 9 Global Step: 189520 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:44:51,359-Speed 2498.99 samples/sec Loss 3.8747 LearningRate 0.000735 Epoch: 9 Global Step: 189530 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:44:59,558-Speed 2498.43 samples/sec Loss 3.8050 LearningRate 0.000735 Epoch: 9 Global Step: 189540 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:45:07,703-Speed 2515.14 samples/sec Loss 3.7927 LearningRate 0.000735 Epoch: 9 Global Step: 189550 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:45:15,899-Speed 2498.87 samples/sec Loss 3.8077 LearningRate 0.000735 Epoch: 9 Global Step: 189560 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:45:24,106-Speed 2496.57 samples/sec Loss 3.7796 LearningRate 0.000735 Epoch: 9 Global Step: 189570 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:45:32,314-Speed 2495.48 samples/sec Loss 3.8363 LearningRate 0.000735 Epoch: 9 Global Step: 189580 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:45:40,517-Speed 2496.93 samples/sec Loss 3.9058 LearningRate 0.000735 Epoch: 9 Global Step: 189590 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:45:48,719-Speed 2497.32 samples/sec Loss 3.7971 LearningRate 0.000735 Epoch: 9 Global Step: 189600 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:45:56,874-Speed 2511.92 samples/sec Loss 3.8199 LearningRate 0.000735 Epoch: 9 Global Step: 189610 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:46:05,071-Speed 2498.70 samples/sec Loss 3.7857 LearningRate 0.000735 Epoch: 9 Global Step: 189620 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:46:13,270-Speed 2498.36 samples/sec Loss 3.8320 LearningRate 0.000735 Epoch: 9 Global Step: 189630 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:46:21,469-Speed 2498.32 samples/sec Loss 3.8378 LearningRate 0.000735 Epoch: 9 Global Step: 189640 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:46:29,680-Speed 2494.66 samples/sec Loss 3.7817 LearningRate 0.000735 Epoch: 9 Global Step: 189650 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:46:37,879-Speed 2498.37 samples/sec Loss 3.7539 LearningRate 0.000735 Epoch: 9 Global Step: 189660 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:46:46,027-Speed 2513.85 samples/sec Loss 3.8691 LearningRate 0.000735 Epoch: 9 Global Step: 189670 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:46:54,230-Speed 2497.03 samples/sec Loss 3.7548 LearningRate 0.000735 Epoch: 9 Global Step: 189680 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:47:02,425-Speed 2499.54 samples/sec Loss 3.8042 LearningRate 0.000735 Epoch: 9 Global Step: 189690 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:47:10,632-Speed 2495.83 samples/sec Loss 3.8430 LearningRate 0.000734 Epoch: 9 Global Step: 189700 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:47:18,829-Speed 2498.84 samples/sec Loss 3.8535 LearningRate 0.000734 Epoch: 9 Global Step: 189710 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:47:27,023-Speed 2500.06 samples/sec Loss 3.7960 LearningRate 0.000734 Epoch: 9 Global Step: 189720 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:47:35,170-Speed 2514.67 samples/sec Loss 3.7851 LearningRate 0.000734 Epoch: 9 Global Step: 189730 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:47:43,367-Speed 2498.77 samples/sec Loss 3.8333 LearningRate 0.000734 Epoch: 9 Global Step: 189740 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:47:51,567-Speed 2498.19 samples/sec Loss 3.7723 LearningRate 0.000734 Epoch: 9 Global Step: 189750 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:47:59,765-Speed 2498.59 samples/sec Loss 3.8071 LearningRate 0.000734 Epoch: 9 Global Step: 189760 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:48:07,965-Speed 2498.01 samples/sec Loss 3.7878 LearningRate 0.000734 Epoch: 9 Global Step: 189770 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:48:16,166-Speed 2497.76 samples/sec Loss 3.7670 LearningRate 0.000734 Epoch: 9 Global Step: 189780 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:48:24,314-Speed 2513.88 samples/sec Loss 3.7218 LearningRate 0.000734 Epoch: 9 Global Step: 189790 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:48:32,522-Speed 2495.35 samples/sec Loss 3.7552 LearningRate 0.000734 Epoch: 9 Global Step: 189800 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:48:40,720-Speed 2498.67 samples/sec Loss 3.8387 LearningRate 0.000734 Epoch: 9 Global Step: 189810 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:48:48,917-Speed 2498.97 samples/sec Loss 3.8365 LearningRate 0.000734 Epoch: 9 Global Step: 189820 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:48:57,112-Speed 2499.34 samples/sec Loss 3.8252 LearningRate 0.000734 Epoch: 9 Global Step: 189830 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:49:05,309-Speed 2498.84 samples/sec Loss 3.7637 LearningRate 0.000734 Epoch: 9 Global Step: 189840 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:49:13,455-Speed 2514.84 samples/sec Loss 3.8480 LearningRate 0.000734 Epoch: 9 Global Step: 189850 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:49:21,665-Speed 2494.87 samples/sec Loss 3.7944 LearningRate 0.000734 Epoch: 9 Global Step: 189860 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:49:29,863-Speed 2498.60 samples/sec Loss 3.7999 LearningRate 0.000734 Epoch: 9 Global Step: 189870 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:49:38,057-Speed 2499.69 samples/sec Loss 3.8040 LearningRate 0.000734 Epoch: 9 Global Step: 189880 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:49:46,263-Speed 2496.13 samples/sec Loss 3.8381 LearningRate 0.000734 Epoch: 9 Global Step: 189890 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:49:54,460-Speed 2498.89 samples/sec Loss 3.7285 LearningRate 0.000734 Epoch: 9 Global Step: 189900 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:50:02,607-Speed 2514.11 samples/sec Loss 3.7599 LearningRate 0.000734 Epoch: 9 Global Step: 189910 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:50:10,803-Speed 2499.13 samples/sec Loss 3.7797 LearningRate 0.000734 Epoch: 9 Global Step: 189920 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:50:19,001-Speed 2498.56 samples/sec Loss 3.8075 LearningRate 0.000734 Epoch: 9 Global Step: 189930 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:50:27,202-Speed 2497.41 samples/sec Loss 3.8119 LearningRate 0.000734 Epoch: 9 Global Step: 189940 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:50:35,402-Speed 2498.06 samples/sec Loss 3.8660 LearningRate 0.000734 Epoch: 9 Global Step: 189950 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:50:43,602-Speed 2498.09 samples/sec Loss 3.8511 LearningRate 0.000734 Epoch: 9 Global Step: 189960 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:50:51,750-Speed 2513.97 samples/sec Loss 3.6909 LearningRate 0.000734 Epoch: 9 Global Step: 189970 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:50:59,948-Speed 2498.49 samples/sec Loss 3.8230 LearningRate 0.000734 Epoch: 9 Global Step: 189980 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:51:08,151-Speed 2497.24 samples/sec Loss 3.8034 LearningRate 0.000734 Epoch: 9 Global Step: 189990 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:51:16,353-Speed 2497.60 samples/sec Loss 3.7682 LearningRate 0.000734 Epoch: 9 Global Step: 190000 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:51:24,551-Speed 2498.68 samples/sec Loss 3.7893 LearningRate 0.000734 Epoch: 9 Global Step: 190010 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:51:32,750-Speed 2497.95 samples/sec Loss 3.8248 LearningRate 0.000734 Epoch: 9 Global Step: 190020 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:51:40,900-Speed 2514.01 samples/sec Loss 3.8207 LearningRate 0.000734 Epoch: 9 Global Step: 190030 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:51:49,100-Speed 2498.14 samples/sec Loss 3.7259 LearningRate 0.000734 Epoch: 9 Global Step: 190040 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:51:57,306-Speed 2495.99 samples/sec Loss 3.7614 LearningRate 0.000734 Epoch: 9 Global Step: 190050 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:52:05,508-Speed 2497.45 samples/sec Loss 3.7860 LearningRate 0.000734 Epoch: 9 Global Step: 190060 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:52:13,705-Speed 2498.78 samples/sec Loss 3.8489 LearningRate 0.000734 Epoch: 9 Global Step: 190070 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:52:21,905-Speed 2498.29 samples/sec Loss 3.8797 LearningRate 0.000734 Epoch: 9 Global Step: 190080 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:52:30,049-Speed 2515.19 samples/sec Loss 3.8216 LearningRate 0.000734 Epoch: 9 Global Step: 190090 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:52:38,251-Speed 2497.46 samples/sec Loss 3.8394 LearningRate 0.000734 Epoch: 9 Global Step: 190100 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:52:46,449-Speed 2498.70 samples/sec Loss 3.7488 LearningRate 0.000734 Epoch: 9 Global Step: 190110 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:52:54,651-Speed 2497.17 samples/sec Loss 3.7440 LearningRate 0.000734 Epoch: 9 Global Step: 190120 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:53:02,855-Speed 2496.82 samples/sec Loss 3.8102 LearningRate 0.000734 Epoch: 9 Global Step: 190130 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:53:11,056-Speed 2497.78 samples/sec Loss 3.8107 LearningRate 0.000733 Epoch: 9 Global Step: 190140 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:53:19,201-Speed 2515.05 samples/sec Loss 3.7358 LearningRate 0.000733 Epoch: 9 Global Step: 190150 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:53:27,404-Speed 2497.00 samples/sec Loss 3.8055 LearningRate 0.000733 Epoch: 9 Global Step: 190160 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:53:35,605-Speed 2497.69 samples/sec Loss 3.7618 LearningRate 0.000733 Epoch: 9 Global Step: 190170 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:53:43,807-Speed 2497.25 samples/sec Loss 3.8135 LearningRate 0.000733 Epoch: 9 Global Step: 190180 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:53:52,005-Speed 2498.59 samples/sec Loss 3.8565 LearningRate 0.000733 Epoch: 9 Global Step: 190190 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:54:00,204-Speed 2498.25 samples/sec Loss 3.7693 LearningRate 0.000733 Epoch: 9 Global Step: 190200 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:54:08,363-Speed 2510.72 samples/sec Loss 3.8481 LearningRate 0.000733 Epoch: 9 Global Step: 190210 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:54:16,562-Speed 2498.21 samples/sec Loss 3.8138 LearningRate 0.000733 Epoch: 9 Global Step: 190220 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:54:24,765-Speed 2497.04 samples/sec Loss 3.7785 LearningRate 0.000733 Epoch: 9 Global Step: 190230 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:54:32,973-Speed 2495.68 samples/sec Loss 3.7973 LearningRate 0.000733 Epoch: 9 Global Step: 190240 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:54:41,171-Speed 2498.46 samples/sec Loss 3.7181 LearningRate 0.000733 Epoch: 9 Global Step: 190250 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:54:49,377-Speed 2496.62 samples/sec Loss 3.8141 LearningRate 0.000733 Epoch: 9 Global Step: 190260 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:54:57,533-Speed 2511.24 samples/sec Loss 3.8269 LearningRate 0.000733 Epoch: 9 Global Step: 190270 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:55:05,732-Speed 2498.21 samples/sec Loss 3.8395 LearningRate 0.000733 Epoch: 9 Global Step: 190280 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:55:13,931-Speed 2498.32 samples/sec Loss 3.7799 LearningRate 0.000733 Epoch: 9 Global Step: 190290 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:55:22,142-Speed 2495.10 samples/sec Loss 3.7736 LearningRate 0.000733 Epoch: 9 Global Step: 190300 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:55:30,341-Speed 2498.15 samples/sec Loss 3.7877 LearningRate 0.000733 Epoch: 9 Global Step: 190310 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:55:38,541-Speed 2497.88 samples/sec Loss 3.8259 LearningRate 0.000733 Epoch: 9 Global Step: 190320 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:55:46,688-Speed 2514.34 samples/sec Loss 3.7674 LearningRate 0.000733 Epoch: 9 Global Step: 190330 Fp16 Grad Scale: 32768 Required: 146 hours
Training: 2022-07-07 09:55:54,887-Speed 2498.28 samples/sec Loss 3.7862
LearningRate 0.000733 Epoch: 9 Global Step: 190340 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:56:03,084-Speed 2498.87 samples/sec Loss 3.8188 LearningRate 0.000733 Epoch: 9 Global Step: 190350 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:56:11,283-Speed 2498.19 samples/sec Loss 3.8175 LearningRate 0.000733 Epoch: 9 Global Step: 190360 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:56:19,482-Speed 2498.35 samples/sec Loss 3.7677 LearningRate 0.000733 Epoch: 9 Global Step: 190370 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:56:27,683-Speed 2497.66 samples/sec Loss 3.7323 LearningRate 0.000733 Epoch: 9 Global Step: 190380 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:56:35,830-Speed 2514.43 samples/sec Loss 3.7624 LearningRate 0.000733 Epoch: 9 Global Step: 190390 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:56:44,026-Speed 2498.93 samples/sec Loss 3.7462 LearningRate 0.000733 Epoch: 9 Global Step: 190400 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:56:52,227-Speed 2497.97 samples/sec Loss 3.7642 LearningRate 0.000733 Epoch: 9 Global Step: 190410 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:57:00,426-Speed 2498.48 samples/sec Loss 3.8318 LearningRate 0.000733 Epoch: 9 Global Step: 190420 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:57:08,628-Speed 2497.43 samples/sec Loss 3.7882 LearningRate 0.000733 Epoch: 9 Global Step: 190430 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:57:16,827-Speed 2498.06 samples/sec Loss 3.7430 LearningRate 0.000733 Epoch: 9 Global Step: 190440 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:57:24,973-Speed 2514.49 samples/sec Loss 3.8030 LearningRate 0.000733 Epoch: 9 Global Step: 190450 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:57:33,183-Speed 2495.03 samples/sec Loss 3.7399 
LearningRate 0.000733 Epoch: 9 Global Step: 190460 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:57:41,391-Speed 2495.57 samples/sec Loss 3.7814 LearningRate 0.000733 Epoch: 9 Global Step: 190470 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:57:49,590-Speed 2498.33 samples/sec Loss 3.7238 LearningRate 0.000733 Epoch: 9 Global Step: 190480 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:57:57,784-Speed 2499.70 samples/sec Loss 3.7975 LearningRate 0.000733 Epoch: 9 Global Step: 190490 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 09:58:05,983-Speed 2498.22 samples/sec Loss 3.7861 LearningRate 0.000733 Epoch: 9 Global Step: 190500 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 09:58:14,127-Speed 2515.04 samples/sec Loss 3.8366 LearningRate 0.000733 Epoch: 9 Global Step: 190510 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 09:58:22,326-Speed 2498.83 samples/sec Loss 3.8005 LearningRate 0.000733 Epoch: 9 Global Step: 190520 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 09:58:30,523-Speed 2498.82 samples/sec Loss 3.7568 LearningRate 0.000733 Epoch: 9 Global Step: 190530 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 09:58:38,720-Speed 2499.22 samples/sec Loss 3.7045 LearningRate 0.000733 Epoch: 9 Global Step: 190540 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 09:58:46,918-Speed 2498.50 samples/sec Loss 3.7462 LearningRate 0.000733 Epoch: 9 Global Step: 190550 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 09:58:55,116-Speed 2498.49 samples/sec Loss 3.8110 LearningRate 0.000733 Epoch: 9 Global Step: 190560 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 09:59:03,266-Speed 2513.11 samples/sec Loss 3.9596 LearningRate 0.000733 Epoch: 9 Global Step: 190570 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 09:59:11,465-Speed 2498.60 samples/sec Loss 3.8456 
LearningRate 0.000732 Epoch: 9 Global Step: 190580 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 09:59:19,666-Speed 2497.76 samples/sec Loss 3.8470 LearningRate 0.000732 Epoch: 9 Global Step: 190590 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 09:59:27,874-Speed 2495.75 samples/sec Loss 3.8134 LearningRate 0.000732 Epoch: 9 Global Step: 190600 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 09:59:36,091-Speed 2492.73 samples/sec Loss 3.8240 LearningRate 0.000732 Epoch: 9 Global Step: 190610 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 09:59:44,291-Speed 2497.92 samples/sec Loss 3.9047 LearningRate 0.000732 Epoch: 9 Global Step: 190620 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 09:59:52,443-Speed 2512.86 samples/sec Loss 3.8135 LearningRate 0.000732 Epoch: 9 Global Step: 190630 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:00:00,653-Speed 2494.79 samples/sec Loss 3.8364 LearningRate 0.000732 Epoch: 9 Global Step: 190640 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:00:08,855-Speed 2497.48 samples/sec Loss 3.8345 LearningRate 0.000732 Epoch: 9 Global Step: 190650 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:00:17,057-Speed 2497.24 samples/sec Loss 3.8332 LearningRate 0.000732 Epoch: 9 Global Step: 190660 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:00:25,256-Speed 2498.24 samples/sec Loss 3.7638 LearningRate 0.000732 Epoch: 9 Global Step: 190670 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:00:33,466-Speed 2494.71 samples/sec Loss 3.7820 LearningRate 0.000732 Epoch: 9 Global Step: 190680 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:00:41,614-Speed 2514.01 samples/sec Loss 3.7282 LearningRate 0.000732 Epoch: 9 Global Step: 190690 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:00:49,814-Speed 2498.01 samples/sec Loss 3.8338 
LearningRate 0.000732 Epoch: 9 Global Step: 190700 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:00:58,014-Speed 2498.17 samples/sec Loss 3.8978 LearningRate 0.000732 Epoch: 9 Global Step: 190710 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:01:06,211-Speed 2498.80 samples/sec Loss 3.8587 LearningRate 0.000732 Epoch: 9 Global Step: 190720 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:01:14,410-Speed 2498.17 samples/sec Loss 3.8193 LearningRate 0.000732 Epoch: 9 Global Step: 190730 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:01:22,608-Speed 2498.49 samples/sec Loss 3.7835 LearningRate 0.000732 Epoch: 9 Global Step: 190740 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:01:30,753-Speed 2515.12 samples/sec Loss 3.8400 LearningRate 0.000732 Epoch: 9 Global Step: 190750 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:01:38,910-Speed 2511.04 samples/sec Loss 3.8128 LearningRate 0.000732 Epoch: 9 Global Step: 190760 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:01:47,107-Speed 2499.10 samples/sec Loss 3.8354 LearningRate 0.000732 Epoch: 9 Global Step: 190770 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:01:55,306-Speed 2498.24 samples/sec Loss 3.8005 LearningRate 0.000732 Epoch: 9 Global Step: 190780 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:02:03,505-Speed 2498.47 samples/sec Loss 3.8133 LearningRate 0.000732 Epoch: 9 Global Step: 190790 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:02:11,703-Speed 2498.74 samples/sec Loss 3.7673 LearningRate 0.000732 Epoch: 9 Global Step: 190800 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:02:19,859-Speed 2511.32 samples/sec Loss 3.7937 LearningRate 0.000732 Epoch: 9 Global Step: 190810 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:02:28,057-Speed 2498.47 samples/sec Loss 3.8819 
LearningRate 0.000732 Epoch: 9 Global Step: 190820 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:02:36,256-Speed 2498.48 samples/sec Loss 3.7961 LearningRate 0.000732 Epoch: 9 Global Step: 190830 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:02:44,454-Speed 2498.34 samples/sec Loss 3.8170 LearningRate 0.000732 Epoch: 9 Global Step: 190840 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:02:52,656-Speed 2497.84 samples/sec Loss 3.8476 LearningRate 0.000732 Epoch: 9 Global Step: 190850 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:03:00,855-Speed 2498.13 samples/sec Loss 3.7611 LearningRate 0.000732 Epoch: 9 Global Step: 190860 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:03:09,002-Speed 2514.27 samples/sec Loss 3.7720 LearningRate 0.000732 Epoch: 9 Global Step: 190870 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:03:17,200-Speed 2498.47 samples/sec Loss 3.7086 LearningRate 0.000732 Epoch: 9 Global Step: 190880 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:03:25,420-Speed 2491.93 samples/sec Loss 3.7583 LearningRate 0.000732 Epoch: 9 Global Step: 190890 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:03:33,634-Speed 2493.93 samples/sec Loss 3.7214 LearningRate 0.000732 Epoch: 9 Global Step: 190900 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:03:41,829-Speed 2499.31 samples/sec Loss 3.7725 LearningRate 0.000732 Epoch: 9 Global Step: 190910 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:03:50,027-Speed 2498.52 samples/sec Loss 3.8550 LearningRate 0.000732 Epoch: 9 Global Step: 190920 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:03:58,178-Speed 2512.94 samples/sec Loss 3.6882 LearningRate 0.000732 Epoch: 9 Global Step: 190930 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:04:06,379-Speed 2497.57 samples/sec Loss 3.7886 
LearningRate 0.000732 Epoch: 9 Global Step: 190940 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:04:14,576-Speed 2498.87 samples/sec Loss 3.7386 LearningRate 0.000732 Epoch: 9 Global Step: 190950 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:04:22,776-Speed 2498.09 samples/sec Loss 3.7220 LearningRate 0.000732 Epoch: 9 Global Step: 190960 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:04:30,976-Speed 2498.03 samples/sec Loss 3.8566 LearningRate 0.000732 Epoch: 9 Global Step: 190970 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:04:39,176-Speed 2497.85 samples/sec Loss 3.8719 LearningRate 0.000732 Epoch: 9 Global Step: 190980 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:04:47,324-Speed 2514.23 samples/sec Loss 3.8492 LearningRate 0.000732 Epoch: 9 Global Step: 190990 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:04:55,530-Speed 2496.16 samples/sec Loss 3.7803 LearningRate 0.000732 Epoch: 9 Global Step: 191000 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:05:03,726-Speed 2499.23 samples/sec Loss 3.7903 LearningRate 0.000731 Epoch: 9 Global Step: 191010 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:05:11,925-Speed 2498.15 samples/sec Loss 3.7734 LearningRate 0.000731 Epoch: 9 Global Step: 191020 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:05:20,123-Speed 2498.77 samples/sec Loss 3.8150 LearningRate 0.000731 Epoch: 9 Global Step: 191030 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:05:28,318-Speed 2499.31 samples/sec Loss 3.8570 LearningRate 0.000731 Epoch: 9 Global Step: 191040 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:05:36,464-Speed 2514.52 samples/sec Loss 3.8012 LearningRate 0.000731 Epoch: 9 Global Step: 191050 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:05:44,676-Speed 2494.34 samples/sec Loss 3.7922 
LearningRate 0.000731 Epoch: 9 Global Step: 191060 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:05:52,877-Speed 2497.93 samples/sec Loss 3.7845 LearningRate 0.000731 Epoch: 9 Global Step: 191070 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:06:01,072-Speed 2499.60 samples/sec Loss 3.8206 LearningRate 0.000731 Epoch: 9 Global Step: 191080 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:06:09,285-Speed 2494.20 samples/sec Loss 3.7985 LearningRate 0.000731 Epoch: 9 Global Step: 191090 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:06:17,483-Speed 2498.31 samples/sec Loss 3.8045 LearningRate 0.000731 Epoch: 9 Global Step: 191100 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:06:25,624-Speed 2516.15 samples/sec Loss 3.7774 LearningRate 0.000731 Epoch: 9 Global Step: 191110 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:06:33,823-Speed 2498.16 samples/sec Loss 3.8441 LearningRate 0.000731 Epoch: 9 Global Step: 191120 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:06:42,022-Speed 2498.49 samples/sec Loss 3.7627 LearningRate 0.000731 Epoch: 9 Global Step: 191130 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:06:50,218-Speed 2499.11 samples/sec Loss 3.7844 LearningRate 0.000731 Epoch: 9 Global Step: 191140 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:06:58,413-Speed 2499.66 samples/sec Loss 3.7857 LearningRate 0.000731 Epoch: 9 Global Step: 191150 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:07:06,612-Speed 2498.38 samples/sec Loss 3.6564 LearningRate 0.000731 Epoch: 9 Global Step: 191160 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:07:14,751-Speed 2516.55 samples/sec Loss 3.8481 LearningRate 0.000731 Epoch: 9 Global Step: 191170 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:07:22,949-Speed 2498.53 samples/sec Loss 3.9055 
LearningRate 0.000731 Epoch: 9 Global Step: 191180 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:07:31,147-Speed 2498.66 samples/sec Loss 3.8122 LearningRate 0.000731 Epoch: 9 Global Step: 191190 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:07:39,347-Speed 2498.20 samples/sec Loss 3.8212 LearningRate 0.000731 Epoch: 9 Global Step: 191200 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:07:47,558-Speed 2494.73 samples/sec Loss 3.8313 LearningRate 0.000731 Epoch: 9 Global Step: 191210 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:07:55,757-Speed 2498.25 samples/sec Loss 3.7850 LearningRate 0.000731 Epoch: 9 Global Step: 191220 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:08:03,901-Speed 2515.11 samples/sec Loss 3.9155 LearningRate 0.000731 Epoch: 9 Global Step: 191230 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:08:12,098-Speed 2499.04 samples/sec Loss 3.7435 LearningRate 0.000731 Epoch: 9 Global Step: 191240 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:08:20,292-Speed 2499.83 samples/sec Loss 3.8274 LearningRate 0.000731 Epoch: 9 Global Step: 191250 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:08:28,495-Speed 2497.38 samples/sec Loss 3.8255 LearningRate 0.000731 Epoch: 9 Global Step: 191260 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:08:36,690-Speed 2499.42 samples/sec Loss 3.7670 LearningRate 0.000731 Epoch: 9 Global Step: 191270 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:08:44,885-Speed 2499.54 samples/sec Loss 3.7610 LearningRate 0.000731 Epoch: 9 Global Step: 191280 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:08:53,034-Speed 2513.60 samples/sec Loss 3.7377 LearningRate 0.000731 Epoch: 9 Global Step: 191290 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:09:01,235-Speed 2497.69 samples/sec Loss 3.7776 
LearningRate 0.000731 Epoch: 9 Global Step: 191300 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:09:09,434-Speed 2498.66 samples/sec Loss 3.8392 LearningRate 0.000731 Epoch: 9 Global Step: 191310 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:09:17,629-Speed 2499.51 samples/sec Loss 3.8333 LearningRate 0.000731 Epoch: 9 Global Step: 191320 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:09:25,825-Speed 2499.28 samples/sec Loss 3.8610 LearningRate 0.000731 Epoch: 9 Global Step: 191330 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:09:34,025-Speed 2497.69 samples/sec Loss 3.8041 LearningRate 0.000731 Epoch: 9 Global Step: 191340 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:09:42,171-Speed 2514.74 samples/sec Loss 3.9296 LearningRate 0.000731 Epoch: 9 Global Step: 191350 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:09:50,368-Speed 2498.77 samples/sec Loss 3.9038 LearningRate 0.000731 Epoch: 9 Global Step: 191360 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:09:58,565-Speed 2498.87 samples/sec Loss 3.7836 LearningRate 0.000731 Epoch: 9 Global Step: 191370 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:10:06,759-Speed 2499.97 samples/sec Loss 3.8646 LearningRate 0.000731 Epoch: 9 Global Step: 191380 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:10:14,973-Speed 2493.81 samples/sec Loss 3.8868 LearningRate 0.000731 Epoch: 9 Global Step: 191390 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:10:23,175-Speed 2497.53 samples/sec Loss 3.8236 LearningRate 0.000731 Epoch: 9 Global Step: 191400 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:10:31,313-Speed 2517.17 samples/sec Loss 3.8732 LearningRate 0.000731 Epoch: 9 Global Step: 191410 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:10:39,509-Speed 2499.33 samples/sec Loss 3.7946 
LearningRate 0.000731 Epoch: 9 Global Step: 191420 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:10:47,705-Speed 2499.41 samples/sec Loss 3.8579 LearningRate 0.000731 Epoch: 9 Global Step: 191430 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:10:55,903-Speed 2498.51 samples/sec Loss 3.8765 LearningRate 0.000731 Epoch: 9 Global Step: 191440 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:11:04,100-Speed 2498.70 samples/sec Loss 3.8554 LearningRate 0.000730 Epoch: 9 Global Step: 191450 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:11:12,298-Speed 2498.83 samples/sec Loss 3.7487 LearningRate 0.000730 Epoch: 9 Global Step: 191460 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:11:20,449-Speed 2512.94 samples/sec Loss 3.8460 LearningRate 0.000730 Epoch: 9 Global Step: 191470 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:11:28,646-Speed 2498.83 samples/sec Loss 3.7971 LearningRate 0.000730 Epoch: 9 Global Step: 191480 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:11:36,846-Speed 2498.18 samples/sec Loss 3.7702 LearningRate 0.000730 Epoch: 9 Global Step: 191490 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:11:45,047-Speed 2497.85 samples/sec Loss 3.7743 LearningRate 0.000730 Epoch: 9 Global Step: 191500 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:11:53,244-Speed 2498.56 samples/sec Loss 3.7847 LearningRate 0.000730 Epoch: 9 Global Step: 191510 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:12:01,444-Speed 2498.27 samples/sec Loss 3.7583 LearningRate 0.000730 Epoch: 9 Global Step: 191520 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:12:09,590-Speed 2514.65 samples/sec Loss 3.7751 LearningRate 0.000730 Epoch: 9 Global Step: 191530 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:12:17,788-Speed 2498.76 samples/sec Loss 3.8360 
LearningRate 0.000730 Epoch: 9 Global Step: 191540 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:12:25,987-Speed 2498.18 samples/sec Loss 3.8056 LearningRate 0.000730 Epoch: 9 Global Step: 191550 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:12:34,200-Speed 2494.15 samples/sec Loss 3.7354 LearningRate 0.000730 Epoch: 9 Global Step: 191560 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:12:42,397-Speed 2499.13 samples/sec Loss 3.7411 LearningRate 0.000730 Epoch: 9 Global Step: 191570 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:12:50,595-Speed 2498.54 samples/sec Loss 3.8475 LearningRate 0.000730 Epoch: 9 Global Step: 191580 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:12:58,751-Speed 2511.48 samples/sec Loss 3.7707 LearningRate 0.000730 Epoch: 9 Global Step: 191590 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:13:06,951-Speed 2498.00 samples/sec Loss 3.7665 LearningRate 0.000730 Epoch: 9 Global Step: 191600 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:13:15,146-Speed 2499.50 samples/sec Loss 3.7744 LearningRate 0.000730 Epoch: 9 Global Step: 191610 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:13:23,371-Speed 2494.08 samples/sec Loss 3.8136 LearningRate 0.000730 Epoch: 9 Global Step: 191620 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:13:31,569-Speed 2498.26 samples/sec Loss 3.8235 LearningRate 0.000730 Epoch: 9 Global Step: 191630 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:13:39,768-Speed 2498.48 samples/sec Loss 3.9585 LearningRate 0.000730 Epoch: 9 Global Step: 191640 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:13:47,915-Speed 2514.11 samples/sec Loss 3.7989 LearningRate 0.000730 Epoch: 9 Global Step: 191650 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:13:56,112-Speed 2498.89 samples/sec Loss 3.7819 
LearningRate 0.000730 Epoch: 9 Global Step: 191660 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:14:04,310-Speed 2498.81 samples/sec Loss 3.7882 LearningRate 0.000730 Epoch: 9 Global Step: 191670 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:14:12,511-Speed 2497.76 samples/sec Loss 3.8033 LearningRate 0.000730 Epoch: 9 Global Step: 191680 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:14:20,712-Speed 2497.42 samples/sec Loss 3.8147 LearningRate 0.000730 Epoch: 9 Global Step: 191690 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:14:28,915-Speed 2497.42 samples/sec Loss 3.8066 LearningRate 0.000730 Epoch: 9 Global Step: 191700 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:14:37,060-Speed 2514.72 samples/sec Loss 3.7934 LearningRate 0.000730 Epoch: 9 Global Step: 191710 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:14:45,274-Speed 2499.53 samples/sec Loss 3.8077 LearningRate 0.000730 Epoch: 9 Global Step: 191720 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:14:53,473-Speed 2498.18 samples/sec Loss 3.7791 LearningRate 0.000730 Epoch: 9 Global Step: 191730 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:15:01,673-Speed 2497.81 samples/sec Loss 3.7611 LearningRate 0.000730 Epoch: 9 Global Step: 191740 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:15:09,869-Speed 2499.09 samples/sec Loss 3.7648 LearningRate 0.000730 Epoch: 9 Global Step: 191750 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:15:18,070-Speed 2497.64 samples/sec Loss 3.8159 LearningRate 0.000730 Epoch: 9 Global Step: 191760 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:15:26,224-Speed 2512.20 samples/sec Loss 3.8202 LearningRate 0.000730 Epoch: 9 Global Step: 191770 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:15:34,430-Speed 2496.15 samples/sec Loss 3.8164 
LearningRate 0.000730 Epoch: 9 Global Step: 191780 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:15:42,631-Speed 2497.56 samples/sec Loss 3.8217 LearningRate 0.000730 Epoch: 9 Global Step: 191790 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:15:50,835-Speed 2496.77 samples/sec Loss 3.8133 LearningRate 0.000730 Epoch: 9 Global Step: 191800 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:15:59,035-Speed 2497.91 samples/sec Loss 3.8691 LearningRate 0.000730 Epoch: 9 Global Step: 191810 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:16:07,235-Speed 2497.69 samples/sec Loss 3.8765 LearningRate 0.000730 Epoch: 9 Global Step: 191820 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:16:15,386-Speed 2513.29 samples/sec Loss 3.8100 LearningRate 0.000730 Epoch: 9 Global Step: 191830 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:16:23,585-Speed 2497.95 samples/sec Loss 3.8699 LearningRate 0.000730 Epoch: 9 Global Step: 191840 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:16:31,790-Speed 2496.65 samples/sec Loss 3.7858 LearningRate 0.000730 Epoch: 9 Global Step: 191850 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:16:39,984-Speed 2499.80 samples/sec Loss 3.8283 LearningRate 0.000730 Epoch: 9 Global Step: 191860 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:16:48,183-Speed 2498.33 samples/sec Loss 3.7988 LearningRate 0.000730 Epoch: 9 Global Step: 191870 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:16:57,359-Speed 2499.10 samples/sec Loss 3.8145 LearningRate 0.000730 Epoch: 9 Global Step: 191880 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:17:06,975-Speed 2129.97 samples/sec Loss 3.8012 LearningRate 0.000729 Epoch: 9 Global Step: 191890 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:17:15,172-Speed 2499.01 samples/sec Loss 3.8283 
LearningRate 0.000729 Epoch: 9 Global Step: 191900 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:17:24,423-Speed 2500.99 samples/sec Loss 3.7891 LearningRate 0.000729 Epoch: 9 Global Step: 191910 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:17:32,742-Speed 2462.21 samples/sec Loss 3.8296 LearningRate 0.000729 Epoch: 9 Global Step: 191920 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:17:40,943-Speed 2497.49 samples/sec Loss 3.7438 LearningRate 0.000729 Epoch: 9 Global Step: 191930 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:17:49,145-Speed 2497.20 samples/sec Loss 3.7746 LearningRate 0.000729 Epoch: 9 Global Step: 191940 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:17:57,299-Speed 2512.13 samples/sec Loss 3.7773 LearningRate 0.000729 Epoch: 9 Global Step: 191950 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:18:05,503-Speed 2496.52 samples/sec Loss 3.7605 LearningRate 0.000729 Epoch: 9 Global Step: 191960 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:18:13,717-Speed 2493.79 samples/sec Loss 3.7577 LearningRate 0.000729 Epoch: 9 Global Step: 191970 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:18:21,922-Speed 2496.69 samples/sec Loss 3.7730 LearningRate 0.000729 Epoch: 9 Global Step: 191980 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:18:30,143-Speed 2491.27 samples/sec Loss 3.7295 LearningRate 0.000729 Epoch: 9 Global Step: 191990 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:18:38,361-Speed 2492.50 samples/sec Loss 3.8278 LearningRate 0.000729 Epoch: 9 Global Step: 192000 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:18:46,515-Speed 2511.96 samples/sec Loss 3.8086 LearningRate 0.000729 Epoch: 9 Global Step: 192010 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:18:54,721-Speed 2496.23 samples/sec Loss 3.8082 
LearningRate 0.000729 Epoch: 9 Global Step: 192020 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:19:02,923-Speed 2497.44 samples/sec Loss 3.7513 LearningRate 0.000729 Epoch: 9 Global Step: 192030 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:19:11,125-Speed 2497.46 samples/sec Loss 3.6936 LearningRate 0.000729 Epoch: 9 Global Step: 192040 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:19:19,327-Speed 2497.24 samples/sec Loss 3.8171 LearningRate 0.000729 Epoch: 9 Global Step: 192050 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:19:27,525-Speed 2498.81 samples/sec Loss 3.7724 LearningRate 0.000729 Epoch: 9 Global Step: 192060 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:19:35,669-Speed 2515.01 samples/sec Loss 3.7371 LearningRate 0.000729 Epoch: 9 Global Step: 192070 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:19:43,864-Speed 2499.19 samples/sec Loss 3.7796 LearningRate 0.000729 Epoch: 9 Global Step: 192080 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:19:52,061-Speed 2499.13 samples/sec Loss 3.8181 LearningRate 0.000729 Epoch: 9 Global Step: 192090 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:20:00,257-Speed 2499.13 samples/sec Loss 3.7202 LearningRate 0.000729 Epoch: 9 Global Step: 192100 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:20:08,477-Speed 2491.75 samples/sec Loss 3.8196 LearningRate 0.000729 Epoch: 9 Global Step: 192110 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:20:16,682-Speed 2496.50 samples/sec Loss 3.7881 LearningRate 0.000729 Epoch: 9 Global Step: 192120 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:20:24,821-Speed 2516.60 samples/sec Loss 3.8672 LearningRate 0.000729 Epoch: 9 Global Step: 192130 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:20:33,024-Speed 2497.18 samples/sec Loss 3.7400 
LearningRate 0.000729 Epoch: 9 Global Step: 192140 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:20:41,223-Speed 2498.23 samples/sec Loss 3.7959 LearningRate 0.000729 Epoch: 9 Global Step: 192150 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:20:49,426-Speed 2496.93 samples/sec Loss 3.7546 LearningRate 0.000729 Epoch: 9 Global Step: 192160 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:20:57,633-Speed 2495.97 samples/sec Loss 3.7536 LearningRate 0.000729 Epoch: 9 Global Step: 192170 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:21:05,830-Speed 2498.92 samples/sec Loss 3.7737 LearningRate 0.000729 Epoch: 9 Global Step: 192180 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:21:13,975-Speed 2514.68 samples/sec Loss 3.8176 LearningRate 0.000729 Epoch: 9 Global Step: 192190 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:21:22,179-Speed 2496.95 samples/sec Loss 3.7177 LearningRate 0.000729 Epoch: 9 Global Step: 192200 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:21:30,376-Speed 2498.78 samples/sec Loss 3.7866 LearningRate 0.000729 Epoch: 9 Global Step: 192210 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:21:38,578-Speed 2497.89 samples/sec Loss 3.8096 LearningRate 0.000729 Epoch: 9 Global Step: 192220 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:21:46,784-Speed 2496.04 samples/sec Loss 3.7772 LearningRate 0.000729 Epoch: 9 Global Step: 192230 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:21:54,984-Speed 2498.01 samples/sec Loss 3.8005 LearningRate 0.000729 Epoch: 9 Global Step: 192240 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:22:03,144-Speed 2510.23 samples/sec Loss 3.8283 LearningRate 0.000729 Epoch: 9 Global Step: 192250 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:22:11,343-Speed 2498.14 samples/sec Loss 3.8374 
LearningRate 0.000729 Epoch: 9 Global Step: 192260 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:22:19,541-Speed 2498.60 samples/sec Loss 3.7890 LearningRate 0.000729 Epoch: 9 Global Step: 192270 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:22:27,741-Speed 2497.97 samples/sec Loss 3.8238 LearningRate 0.000729 Epoch: 9 Global Step: 192280 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:22:35,943-Speed 2497.44 samples/sec Loss 3.7572 LearningRate 0.000729 Epoch: 9 Global Step: 192290 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:22:44,141-Speed 2498.40 samples/sec Loss 3.8095 LearningRate 0.000729 Epoch: 9 Global Step: 192300 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:22:52,288-Speed 2514.38 samples/sec Loss 3.8065 LearningRate 0.000729 Epoch: 9 Global Step: 192310 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:23:00,486-Speed 2498.52 samples/sec Loss 3.7882 LearningRate 0.000728 Epoch: 9 Global Step: 192320 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:23:08,701-Speed 2493.35 samples/sec Loss 3.7591 LearningRate 0.000728 Epoch: 9 Global Step: 192330 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:23:16,901-Speed 2498.04 samples/sec Loss 3.7998 LearningRate 0.000728 Epoch: 9 Global Step: 192340 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:23:25,097-Speed 2499.20 samples/sec Loss 3.8087 LearningRate 0.000728 Epoch: 9 Global Step: 192350 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:23:33,296-Speed 2498.18 samples/sec Loss 3.7679 LearningRate 0.000728 Epoch: 9 Global Step: 192360 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:23:41,446-Speed 2513.16 samples/sec Loss 3.8009 LearningRate 0.000728 Epoch: 9 Global Step: 192370 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:23:49,641-Speed 2499.75 samples/sec Loss 3.7603 
LearningRate 0.000728 Epoch: 9 Global Step: 192380 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:23:57,835-Speed 2499.82 samples/sec Loss 3.7789 LearningRate 0.000728 Epoch: 9 Global Step: 192390 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:24:06,032-Speed 2498.91 samples/sec Loss 3.7888 LearningRate 0.000728 Epoch: 9 Global Step: 192400 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:24:14,230-Speed 2498.69 samples/sec Loss 3.7875 LearningRate 0.000728 Epoch: 9 Global Step: 192410 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:24:22,433-Speed 2496.96 samples/sec Loss 3.7475 LearningRate 0.000728 Epoch: 9 Global Step: 192420 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:24:30,583-Speed 2513.36 samples/sec Loss 3.8556 LearningRate 0.000728 Epoch: 9 Global Step: 192430 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:24:38,797-Speed 2493.77 samples/sec Loss 3.7798 LearningRate 0.000728 Epoch: 9 Global Step: 192440 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:24:46,995-Speed 2498.73 samples/sec Loss 3.7538 LearningRate 0.000728 Epoch: 9 Global Step: 192450 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:24:55,197-Speed 2497.35 samples/sec Loss 3.7721 LearningRate 0.000728 Epoch: 9 Global Step: 192460 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:25:03,395-Speed 2498.92 samples/sec Loss 3.7669 LearningRate 0.000728 Epoch: 9 Global Step: 192470 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:25:11,606-Speed 2494.55 samples/sec Loss 3.7177 LearningRate 0.000728 Epoch: 9 Global Step: 192480 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:25:19,753-Speed 2514.08 samples/sec Loss 3.8003 LearningRate 0.000728 Epoch: 9 Global Step: 192490 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:25:27,962-Speed 2495.29 samples/sec Loss 3.7734 
LearningRate 0.000728 Epoch: 9 Global Step: 192500 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:25:36,166-Speed 2496.74 samples/sec Loss 3.7557 LearningRate 0.000728 Epoch: 9 Global Step: 192510 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:25:44,374-Speed 2495.60 samples/sec Loss 3.7675 LearningRate 0.000728 Epoch: 9 Global Step: 192520 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:25:52,576-Speed 2497.36 samples/sec Loss 3.8060 LearningRate 0.000728 Epoch: 9 Global Step: 192530 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:26:00,778-Speed 2497.21 samples/sec Loss 3.8154 LearningRate 0.000728 Epoch: 9 Global Step: 192540 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:26:08,923-Speed 2514.76 samples/sec Loss 3.8036 LearningRate 0.000728 Epoch: 9 Global Step: 192550 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:26:17,124-Speed 2497.87 samples/sec Loss 3.7645 LearningRate 0.000728 Epoch: 9 Global Step: 192560 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:26:25,323-Speed 2498.11 samples/sec Loss 3.7556 LearningRate 0.000728 Epoch: 9 Global Step: 192570 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:26:33,534-Speed 2494.68 samples/sec Loss 3.7613 LearningRate 0.000728 Epoch: 9 Global Step: 192580 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:26:41,735-Speed 2497.65 samples/sec Loss 3.8217 LearningRate 0.000728 Epoch: 9 Global Step: 192590 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:26:49,937-Speed 2497.29 samples/sec Loss 3.8147 LearningRate 0.000728 Epoch: 9 Global Step: 192600 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:26:58,091-Speed 2512.21 samples/sec Loss 3.8108 LearningRate 0.000728 Epoch: 9 Global Step: 192610 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:27:06,301-Speed 2494.79 samples/sec Loss 3.8339 
LearningRate 0.000728 Epoch: 9 Global Step: 192620 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:27:14,502-Speed 2497.86 samples/sec Loss 3.8244 LearningRate 0.000728 Epoch: 9 Global Step: 192630 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:27:22,707-Speed 2496.34 samples/sec Loss 3.8669 LearningRate 0.000728 Epoch: 9 Global Step: 192640 Fp16 Grad Scale: 65536 Required: 146 hours Training: 2022-07-07 10:27:30,867-Speed 2510.11 samples/sec Loss 3.8760 LearningRate 0.000728 Epoch: 9 Global Step: 192650 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:27:39,073-Speed 2496.32 samples/sec Loss 3.8667 LearningRate 0.000728 Epoch: 9 Global Step: 192660 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:27:47,235-Speed 2509.64 samples/sec Loss 3.8387 LearningRate 0.000728 Epoch: 9 Global Step: 192670 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:27:55,437-Speed 2497.59 samples/sec Loss 3.8597 LearningRate 0.000728 Epoch: 9 Global Step: 192680 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:28:03,641-Speed 2496.45 samples/sec Loss 3.8069 LearningRate 0.000728 Epoch: 9 Global Step: 192690 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:28:11,845-Speed 2497.00 samples/sec Loss 3.7582 LearningRate 0.000728 Epoch: 9 Global Step: 192700 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:28:20,045-Speed 2497.79 samples/sec Loss 3.8335 LearningRate 0.000728 Epoch: 9 Global Step: 192710 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:28:28,246-Speed 2497.94 samples/sec Loss 3.8135 LearningRate 0.000728 Epoch: 9 Global Step: 192720 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:28:36,393-Speed 2514.30 samples/sec Loss 3.8015 LearningRate 0.000728 Epoch: 9 Global Step: 192730 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:28:44,592-Speed 2498.17 samples/sec Loss 3.7525 
LearningRate 0.000728 Epoch: 9 Global Step: 192740 Fp16 Grad Scale: 32768 Required: 146 hours Training: 2022-07-07 10:28:52,793-Speed 2497.65 samples/sec Loss 3.8046 LearningRate 0.000728 Epoch: 9 Global Step: 192750 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:29:00,994-Speed 2497.74 samples/sec Loss 3.8091 LearningRate 0.000727 Epoch: 9 Global Step: 192760 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:29:09,193-Speed 2498.38 samples/sec Loss 3.7480 LearningRate 0.000727 Epoch: 9 Global Step: 192770 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:29:17,404-Speed 2494.58 samples/sec Loss 3.7906 LearningRate 0.000727 Epoch: 9 Global Step: 192780 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:29:25,559-Speed 2511.70 samples/sec Loss 3.7974 LearningRate 0.000727 Epoch: 9 Global Step: 192790 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:29:33,757-Speed 2498.51 samples/sec Loss 3.8399 LearningRate 0.000727 Epoch: 9 Global Step: 192800 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:29:41,954-Speed 2498.81 samples/sec Loss 3.7917 LearningRate 0.000727 Epoch: 9 Global Step: 192810 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:29:50,155-Speed 2497.68 samples/sec Loss 3.7482 LearningRate 0.000727 Epoch: 9 Global Step: 192820 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:29:58,355-Speed 2497.76 samples/sec Loss 3.7933 LearningRate 0.000727 Epoch: 9 Global Step: 192830 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:30:06,557-Speed 2497.69 samples/sec Loss 3.8844 LearningRate 0.000727 Epoch: 9 Global Step: 192840 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:30:14,707-Speed 2513.15 samples/sec Loss 3.8083 LearningRate 0.000727 Epoch: 9 Global Step: 192850 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:30:22,907-Speed 2497.81 samples/sec Loss 3.7830 
LearningRate 0.000727 Epoch: 9 Global Step: 192860 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:30:31,109-Speed 2497.27 samples/sec Loss 3.8082 LearningRate 0.000727 Epoch: 9 Global Step: 192870 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:30:39,310-Speed 2497.83 samples/sec Loss 3.7797 LearningRate 0.000727 Epoch: 9 Global Step: 192880 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:30:47,509-Speed 2498.08 samples/sec Loss 3.7376 LearningRate 0.000727 Epoch: 9 Global Step: 192890 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:30:55,708-Speed 2498.38 samples/sec Loss 3.8012 LearningRate 0.000727 Epoch: 9 Global Step: 192900 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:31:03,867-Speed 2510.59 samples/sec Loss 3.7612 LearningRate 0.000727 Epoch: 9 Global Step: 192910 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:31:12,063-Speed 2499.02 samples/sec Loss 3.7172 LearningRate 0.000727 Epoch: 9 Global Step: 192920 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:31:20,259-Speed 2499.07 samples/sec Loss 3.7546 LearningRate 0.000727 Epoch: 9 Global Step: 192930 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:31:28,467-Speed 2495.68 samples/sec Loss 3.7780 LearningRate 0.000727 Epoch: 9 Global Step: 192940 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:31:36,666-Speed 2498.03 samples/sec Loss 3.7297 LearningRate 0.000727 Epoch: 9 Global Step: 192950 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:31:44,878-Speed 2494.34 samples/sec Loss 3.7403 LearningRate 0.000727 Epoch: 9 Global Step: 192960 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:31:53,039-Speed 2509.81 samples/sec Loss 3.7441 LearningRate 0.000727 Epoch: 9 Global Step: 192970 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:32:01,240-Speed 2497.81 samples/sec Loss 3.8047 
LearningRate 0.000727 Epoch: 9 Global Step: 192980 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:32:09,443-Speed 2497.17 samples/sec Loss 3.8308 LearningRate 0.000727 Epoch: 9 Global Step: 192990 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:32:17,640-Speed 2498.61 samples/sec Loss 3.7304 LearningRate 0.000727 Epoch: 9 Global Step: 193000 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:32:25,837-Speed 2499.17 samples/sec Loss 3.7708 LearningRate 0.000727 Epoch: 9 Global Step: 193010 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:32:34,048-Speed 2494.44 samples/sec Loss 3.6958 LearningRate 0.000727 Epoch: 9 Global Step: 193020 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:32:42,194-Speed 2514.63 samples/sec Loss 3.7509 LearningRate 0.000727 Epoch: 9 Global Step: 193030 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:32:50,393-Speed 2498.33 samples/sec Loss 3.7603 LearningRate 0.000727 Epoch: 9 Global Step: 193040 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:32:58,591-Speed 2498.42 samples/sec Loss 3.7396 LearningRate 0.000727 Epoch: 9 Global Step: 193050 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:33:06,793-Speed 2500.46 samples/sec Loss 3.8343 LearningRate 0.000727 Epoch: 9 Global Step: 193060 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:33:14,993-Speed 2498.03 samples/sec Loss 3.8146 LearningRate 0.000727 Epoch: 9 Global Step: 193070 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:33:23,192-Speed 2498.25 samples/sec Loss 3.7168 LearningRate 0.000727 Epoch: 9 Global Step: 193080 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:33:31,339-Speed 2514.23 samples/sec Loss 3.8454 LearningRate 0.000727 Epoch: 9 Global Step: 193090 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:33:39,544-Speed 2496.89 samples/sec Loss 3.7776 
LearningRate 0.000727 Epoch: 9 Global Step: 193100 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:33:47,744-Speed 2498.11 samples/sec Loss 3.7273 LearningRate 0.000727 Epoch: 9 Global Step: 193110 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:33:55,944-Speed 2497.92 samples/sec Loss 3.7790 LearningRate 0.000727 Epoch: 9 Global Step: 193120 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:34:04,141-Speed 2498.95 samples/sec Loss 3.7516 LearningRate 0.000727 Epoch: 9 Global Step: 193130 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:34:12,341-Speed 2497.93 samples/sec Loss 3.8166 LearningRate 0.000727 Epoch: 9 Global Step: 193140 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:34:20,500-Speed 2510.66 samples/sec Loss 3.6894 LearningRate 0.000727 Epoch: 9 Global Step: 193150 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:34:28,703-Speed 2496.97 samples/sec Loss 3.7813 LearningRate 0.000727 Epoch: 9 Global Step: 193160 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:34:36,900-Speed 2498.76 samples/sec Loss 3.7387 LearningRate 0.000727 Epoch: 9 Global Step: 193170 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:34:45,099-Speed 2498.48 samples/sec Loss 3.8310 LearningRate 0.000727 Epoch: 9 Global Step: 193180 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:34:53,295-Speed 2499.27 samples/sec Loss 3.9186 LearningRate 0.000727 Epoch: 9 Global Step: 193190 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:35:01,495-Speed 2497.77 samples/sec Loss 3.9509 LearningRate 0.000726 Epoch: 9 Global Step: 193200 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:35:09,647-Speed 2513.06 samples/sec Loss 3.8750 LearningRate 0.000726 Epoch: 9 Global Step: 193210 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:35:17,849-Speed 2497.28 samples/sec Loss 3.8697 
LearningRate 0.000726 Epoch: 9 Global Step: 193220 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:35:26,047-Speed 2498.54 samples/sec Loss 3.8434 LearningRate 0.000726 Epoch: 9 Global Step: 193230 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:35:34,249-Speed 2497.64 samples/sec Loss 3.8388 LearningRate 0.000726 Epoch: 9 Global Step: 193240 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:35:42,451-Speed 2497.36 samples/sec Loss 3.8377 LearningRate 0.000726 Epoch: 9 Global Step: 193250 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:35:50,654-Speed 2496.98 samples/sec Loss 3.8102 LearningRate 0.000726 Epoch: 9 Global Step: 193260 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:35:58,806-Speed 2512.81 samples/sec Loss 3.7644 LearningRate 0.000726 Epoch: 9 Global Step: 193270 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:36:07,013-Speed 2495.77 samples/sec Loss 3.8111 LearningRate 0.000726 Epoch: 9 Global Step: 193280 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:36:15,215-Speed 2497.36 samples/sec Loss 3.7706 LearningRate 0.000726 Epoch: 9 Global Step: 193290 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:36:23,417-Speed 2497.19 samples/sec Loss 3.7365 LearningRate 0.000726 Epoch: 9 Global Step: 193300 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:36:31,618-Speed 2497.72 samples/sec Loss 3.7496 LearningRate 0.000726 Epoch: 9 Global Step: 193310 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:36:39,818-Speed 2497.71 samples/sec Loss 3.6574 LearningRate 0.000726 Epoch: 9 Global Step: 193320 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:36:47,970-Speed 2512.83 samples/sec Loss 3.8218 LearningRate 0.000726 Epoch: 9 Global Step: 193330 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:36:56,168-Speed 2498.60 samples/sec Loss 3.7848 
LearningRate 0.000726 Epoch: 9 Global Step: 193340 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:37:04,371-Speed 2497.38 samples/sec Loss 3.7699 LearningRate 0.000726 Epoch: 9 Global Step: 193350 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:37:12,572-Speed 2497.43 samples/sec Loss 3.7181 LearningRate 0.000726 Epoch: 9 Global Step: 193360 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:37:20,770-Speed 2498.95 samples/sec Loss 3.7616 LearningRate 0.000726 Epoch: 9 Global Step: 193370 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:37:28,968-Speed 2498.45 samples/sec Loss 3.7652 LearningRate 0.000726 Epoch: 9 Global Step: 193380 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:37:37,117-Speed 2513.82 samples/sec Loss 3.7259 LearningRate 0.000726 Epoch: 9 Global Step: 193390 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:37:45,317-Speed 2497.83 samples/sec Loss 3.7350 LearningRate 0.000726 Epoch: 9 Global Step: 193400 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:37:53,515-Speed 2498.50 samples/sec Loss 3.7403 LearningRate 0.000726 Epoch: 9 Global Step: 193410 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:38:01,712-Speed 2499.23 samples/sec Loss 3.6731 LearningRate 0.000726 Epoch: 9 Global Step: 193420 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:38:09,910-Speed 2498.42 samples/sec Loss 3.7323 LearningRate 0.000726 Epoch: 9 Global Step: 193430 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:38:18,113-Speed 2497.11 samples/sec Loss 3.7969 LearningRate 0.000726 Epoch: 9 Global Step: 193440 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:38:26,263-Speed 2513.46 samples/sec Loss 3.7237 LearningRate 0.000726 Epoch: 9 Global Step: 193450 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:38:34,460-Speed 2498.80 samples/sec Loss 3.8177 
LearningRate 0.000726 Epoch: 9 Global Step: 193460 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:38:42,657-Speed 2498.80 samples/sec Loss 3.7415 LearningRate 0.000726 Epoch: 9 Global Step: 193470 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:38:50,859-Speed 2497.46 samples/sec Loss 3.8268 LearningRate 0.000726 Epoch: 9 Global Step: 193480 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:38:59,059-Speed 2497.86 samples/sec Loss 3.8630 LearningRate 0.000726 Epoch: 9 Global Step: 193490 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:39:07,258-Speed 2498.33 samples/sec Loss 3.8517 LearningRate 0.000726 Epoch: 9 Global Step: 193500 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:39:15,405-Speed 2514.20 samples/sec Loss 3.8107 LearningRate 0.000726 Epoch: 9 Global Step: 193510 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:39:23,608-Speed 2497.28 samples/sec Loss 3.7655 LearningRate 0.000726 Epoch: 9 Global Step: 193520 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:39:31,809-Speed 2497.73 samples/sec Loss 3.7894 LearningRate 0.000726 Epoch: 9 Global Step: 193530 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:39:40,012-Speed 2497.04 samples/sec Loss 3.7102 LearningRate 0.000726 Epoch: 9 Global Step: 193540 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:39:48,209-Speed 2498.87 samples/sec Loss 3.7732 LearningRate 0.000726 Epoch: 9 Global Step: 193550 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:39:56,411-Speed 2497.59 samples/sec Loss 3.7854 LearningRate 0.000726 Epoch: 9 Global Step: 193560 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:40:04,557-Speed 2514.22 samples/sec Loss 3.7432 LearningRate 0.000726 Epoch: 9 Global Step: 193570 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:40:12,768-Speed 2494.77 samples/sec Loss 3.7503 
LearningRate 0.000726 Epoch: 9 Global Step: 193580 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:40:20,975-Speed 2496.37 samples/sec Loss 3.7752 LearningRate 0.000726 Epoch: 9 Global Step: 193590 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:40:29,173-Speed 2498.56 samples/sec Loss 3.7805 LearningRate 0.000726 Epoch: 9 Global Step: 193600 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:40:37,369-Speed 2499.16 samples/sec Loss 3.7854 LearningRate 0.000726 Epoch: 9 Global Step: 193610 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:40:45,575-Speed 2496.15 samples/sec Loss 3.8657 LearningRate 0.000726 Epoch: 9 Global Step: 193620 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:40:53,723-Speed 2513.80 samples/sec Loss 3.7987 LearningRate 0.000726 Epoch: 9 Global Step: 193630 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:41:01,920-Speed 2498.97 samples/sec Loss 3.7599 LearningRate 0.000725 Epoch: 9 Global Step: 193640 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:41:10,122-Speed 2497.87 samples/sec Loss 3.7265 LearningRate 0.000725 Epoch: 9 Global Step: 193650 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:41:18,323-Speed 2497.53 samples/sec Loss 3.8481 LearningRate 0.000725 Epoch: 9 Global Step: 193660 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:41:26,527-Speed 2496.91 samples/sec Loss 3.7753 LearningRate 0.000725 Epoch: 9 Global Step: 193670 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:41:34,728-Speed 2497.63 samples/sec Loss 3.7626 LearningRate 0.000725 Epoch: 9 Global Step: 193680 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:41:42,880-Speed 2512.51 samples/sec Loss 3.7333 LearningRate 0.000725 Epoch: 9 Global Step: 193690 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:41:51,081-Speed 2497.91 samples/sec Loss 3.8310 
LearningRate 0.000725 Epoch: 9 Global Step: 193700 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:41:59,280-Speed 2498.31 samples/sec Loss 3.8338 LearningRate 0.000725 Epoch: 9 Global Step: 193710 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:42:07,480-Speed 2497.94 samples/sec Loss 3.7766 LearningRate 0.000725 Epoch: 9 Global Step: 193720 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:42:15,678-Speed 2498.54 samples/sec Loss 3.8464 LearningRate 0.000725 Epoch: 9 Global Step: 193730 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:42:23,875-Speed 2498.56 samples/sec Loss 3.7506 LearningRate 0.000725 Epoch: 9 Global Step: 193740 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:42:32,026-Speed 2513.33 samples/sec Loss 3.8696 LearningRate 0.000725 Epoch: 9 Global Step: 193750 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:42:40,226-Speed 2497.92 samples/sec Loss 3.7668 LearningRate 0.000725 Epoch: 9 Global Step: 193760 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:42:48,426-Speed 2497.82 samples/sec Loss 3.7237 LearningRate 0.000725 Epoch: 9 Global Step: 193770 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:42:56,625-Speed 2498.44 samples/sec Loss 3.7923 LearningRate 0.000725 Epoch: 9 Global Step: 193780 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:43:04,826-Speed 2497.66 samples/sec Loss 3.8044 LearningRate 0.000725 Epoch: 9 Global Step: 193790 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:43:13,036-Speed 2495.06 samples/sec Loss 3.8009 LearningRate 0.000725 Epoch: 9 Global Step: 193800 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:43:21,183-Speed 2514.08 samples/sec Loss 3.7633 LearningRate 0.000725 Epoch: 9 Global Step: 193810 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:43:29,380-Speed 2499.03 samples/sec Loss 3.6870 
LearningRate 0.000725 Epoch: 9 Global Step: 193820 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:43:37,579-Speed 2498.23 samples/sec Loss 3.8060 LearningRate 0.000725 Epoch: 9 Global Step: 193830 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:43:45,777-Speed 2498.69 samples/sec Loss 3.7580 LearningRate 0.000725 Epoch: 9 Global Step: 193840 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:43:53,973-Speed 2498.98 samples/sec Loss 3.7496 LearningRate 0.000725 Epoch: 9 Global Step: 193850 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:44:02,172-Speed 2498.37 samples/sec Loss 3.8304 LearningRate 0.000725 Epoch: 9 Global Step: 193860 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:44:10,321-Speed 2513.72 samples/sec Loss 3.7750 LearningRate 0.000725 Epoch: 9 Global Step: 193870 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:44:18,519-Speed 2498.47 samples/sec Loss 3.7244 LearningRate 0.000725 Epoch: 9 Global Step: 193880 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:44:26,729-Speed 2494.83 samples/sec Loss 3.7891 LearningRate 0.000725 Epoch: 9 Global Step: 193890 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:44:34,924-Speed 2499.51 samples/sec Loss 3.7445 LearningRate 0.000725 Epoch: 9 Global Step: 193900 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:44:43,122-Speed 2498.69 samples/sec Loss 3.7553 LearningRate 0.000725 Epoch: 9 Global Step: 193910 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:44:51,322-Speed 2497.78 samples/sec Loss 3.7625 LearningRate 0.000725 Epoch: 9 Global Step: 193920 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:44:59,468-Speed 2514.54 samples/sec Loss 3.7535 LearningRate 0.000725 Epoch: 9 Global Step: 193930 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:45:07,677-Speed 2495.13 samples/sec Loss 3.7208 
LearningRate 0.000725 Epoch: 9 Global Step: 193940 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:45:15,873-Speed 2499.33 samples/sec Loss 3.8344 LearningRate 0.000725 Epoch: 9 Global Step: 193950 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:45:24,073-Speed 2498.15 samples/sec Loss 3.8284 LearningRate 0.000725 Epoch: 9 Global Step: 193960 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:45:32,272-Speed 2498.09 samples/sec Loss 3.7779 LearningRate 0.000725 Epoch: 9 Global Step: 193970 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:45:40,475-Speed 2496.94 samples/sec Loss 3.8601 LearningRate 0.000725 Epoch: 9 Global Step: 193980 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:45:48,624-Speed 2513.76 samples/sec Loss 3.8435 LearningRate 0.000725 Epoch: 9 Global Step: 193990 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:45:56,823-Speed 2498.10 samples/sec Loss 3.7830 LearningRate 0.000725 Epoch: 9 Global Step: 194000 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:46:05,020-Speed 2498.89 samples/sec Loss 3.6820 LearningRate 0.000725 Epoch: 9 Global Step: 194010 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 10:46:13,177-Speed 2511.15 samples/sec Loss 3.7662 LearningRate 0.000725 Epoch: 9 Global Step: 194020 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:46:21,375-Speed 2498.65 samples/sec Loss 3.7593 LearningRate 0.000725 Epoch: 9 Global Step: 194030 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:46:29,573-Speed 2498.45 samples/sec Loss 3.7096 LearningRate 0.000725 Epoch: 9 Global Step: 194040 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:46:37,718-Speed 2514.91 samples/sec Loss 3.7424 LearningRate 0.000725 Epoch: 9 Global Step: 194050 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:46:45,919-Speed 2497.78 samples/sec Loss 3.8408 
LearningRate 0.000725 Epoch: 9 Global Step: 194060 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:46:54,121-Speed 2497.46 samples/sec Loss 3.8126 LearningRate 0.000724 Epoch: 9 Global Step: 194070 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:47:02,321-Speed 2497.71 samples/sec Loss 3.7554 LearningRate 0.000724 Epoch: 9 Global Step: 194080 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:47:10,524-Speed 2497.31 samples/sec Loss 3.8014 LearningRate 0.000724 Epoch: 9 Global Step: 194090 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:47:18,723-Speed 2498.10 samples/sec Loss 3.7144 LearningRate 0.000724 Epoch: 9 Global Step: 194100 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:47:26,869-Speed 2514.80 samples/sec Loss 3.7777 LearningRate 0.000724 Epoch: 9 Global Step: 194110 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:47:35,066-Speed 2498.77 samples/sec Loss 3.7892 LearningRate 0.000724 Epoch: 9 Global Step: 194120 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:47:43,267-Speed 2497.55 samples/sec Loss 3.6802 LearningRate 0.000724 Epoch: 9 Global Step: 194130 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:47:51,468-Speed 2497.94 samples/sec Loss 3.7073 LearningRate 0.000724 Epoch: 9 Global Step: 194140 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:47:59,665-Speed 2498.65 samples/sec Loss 3.7098 LearningRate 0.000724 Epoch: 9 Global Step: 194150 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:48:07,867-Speed 2497.61 samples/sec Loss 3.7376 LearningRate 0.000724 Epoch: 9 Global Step: 194160 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:48:16,008-Speed 2515.88 samples/sec Loss 3.7705 LearningRate 0.000724 Epoch: 9 Global Step: 194170 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:48:24,206-Speed 2499.03 samples/sec Loss 3.7693 
LearningRate 0.000724 Epoch: 9 Global Step: 194180 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:48:32,403-Speed 2498.67 samples/sec Loss 3.7465 LearningRate 0.000724 Epoch: 9 Global Step: 194190 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:48:40,613-Speed 2494.91 samples/sec Loss 3.8378 LearningRate 0.000724 Epoch: 9 Global Step: 194200 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:48:48,814-Speed 2497.82 samples/sec Loss 3.8128 LearningRate 0.000724 Epoch: 9 Global Step: 194210 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:48:57,018-Speed 2496.44 samples/sec Loss 3.8400 LearningRate 0.000724 Epoch: 9 Global Step: 194220 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:49:05,167-Speed 2513.95 samples/sec Loss 3.7545 LearningRate 0.000724 Epoch: 9 Global Step: 194230 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:49:13,478-Speed 2464.34 samples/sec Loss 3.7202 LearningRate 0.000724 Epoch: 9 Global Step: 194240 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:49:21,676-Speed 2498.40 samples/sec Loss 3.8073 LearningRate 0.000724 Epoch: 9 Global Step: 194250 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:49:30,018-Speed 2500.01 samples/sec Loss 3.7449 LearningRate 0.000724 Epoch: 9 Global Step: 194260 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:49:39,656-Speed 2125.29 samples/sec Loss 3.7910 LearningRate 0.000724 Epoch: 9 Global Step: 194270 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:49:47,848-Speed 2500.40 samples/sec Loss 3.6968 LearningRate 0.000724 Epoch: 9 Global Step: 194280 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:49:55,991-Speed 2515.42 samples/sec Loss 3.7772 LearningRate 0.000724 Epoch: 9 Global Step: 194290 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:50:04,190-Speed 2498.33 samples/sec Loss 3.7538 
LearningRate 0.000724 Epoch: 9 Global Step: 194300 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:50:12,391-Speed 2497.58 samples/sec Loss 3.7673 LearningRate 0.000724 Epoch: 9 Global Step: 194310 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:50:20,590-Speed 2498.22 samples/sec Loss 3.7297 LearningRate 0.000724 Epoch: 9 Global Step: 194320 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:50:28,807-Speed 2493.15 samples/sec Loss 3.7871 LearningRate 0.000724 Epoch: 9 Global Step: 194330 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:50:37,003-Speed 2498.95 samples/sec Loss 3.6693 LearningRate 0.000724 Epoch: 9 Global Step: 194340 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:50:45,145-Speed 2515.61 samples/sec Loss 3.7334 LearningRate 0.000724 Epoch: 9 Global Step: 194350 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:50:53,353-Speed 2495.55 samples/sec Loss 3.7093 LearningRate 0.000724 Epoch: 9 Global Step: 194360 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:51:01,563-Speed 2495.29 samples/sec Loss 3.7372 LearningRate 0.000724 Epoch: 9 Global Step: 194370 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:51:09,764-Speed 2497.58 samples/sec Loss 3.6925 LearningRate 0.000724 Epoch: 9 Global Step: 194380 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:51:17,964-Speed 2497.73 samples/sec Loss 3.7252 LearningRate 0.000724 Epoch: 9 Global Step: 194390 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:51:26,171-Speed 2496.09 samples/sec Loss 3.7819 LearningRate 0.000724 Epoch: 9 Global Step: 194400 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:51:34,316-Speed 2514.72 samples/sec Loss 3.7546 LearningRate 0.000724 Epoch: 9 Global Step: 194410 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:51:42,514-Speed 2498.39 samples/sec Loss 3.7221 
LearningRate 0.000724 Epoch: 9 Global Step: 194420 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:51:50,715-Speed 2497.82 samples/sec Loss 3.6578 LearningRate 0.000724 Epoch: 9 Global Step: 194430 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:51:58,912-Speed 2499.00 samples/sec Loss 3.7183 LearningRate 0.000724 Epoch: 9 Global Step: 194440 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:52:07,128-Speed 2493.01 samples/sec Loss 3.7298 LearningRate 0.000724 Epoch: 9 Global Step: 194450 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:52:15,329-Speed 2497.62 samples/sec Loss 3.7959 LearningRate 0.000724 Epoch: 9 Global Step: 194460 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:52:23,473-Speed 2515.20 samples/sec Loss 3.7683 LearningRate 0.000724 Epoch: 9 Global Step: 194470 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:52:31,675-Speed 2497.28 samples/sec Loss 3.6973 LearningRate 0.000724 Epoch: 9 Global Step: 194480 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:52:39,875-Speed 2497.88 samples/sec Loss 3.7487 LearningRate 0.000724 Epoch: 9 Global Step: 194490 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:52:48,073-Speed 2498.81 samples/sec Loss 3.7339 LearningRate 0.000724 Epoch: 9 Global Step: 194500 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:52:56,285-Speed 2494.31 samples/sec Loss 3.7254 LearningRate 0.000723 Epoch: 9 Global Step: 194510 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:53:04,483-Speed 2498.61 samples/sec Loss 3.7672 LearningRate 0.000723 Epoch: 9 Global Step: 194520 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:53:12,633-Speed 2513.71 samples/sec Loss 3.7508 LearningRate 0.000723 Epoch: 9 Global Step: 194530 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:53:20,829-Speed 2499.19 samples/sec Loss 3.7179 
LearningRate 0.000723 Epoch: 9 Global Step: 194540 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:53:29,024-Speed 2499.59 samples/sec Loss 3.8055 LearningRate 0.000723 Epoch: 9 Global Step: 194550 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:53:37,238-Speed 2493.51 samples/sec Loss 3.7275 LearningRate 0.000723 Epoch: 9 Global Step: 194560 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:53:45,443-Speed 2496.69 samples/sec Loss 3.7055 LearningRate 0.000723 Epoch: 9 Global Step: 194570 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:53:53,641-Speed 2498.61 samples/sec Loss 3.8123 LearningRate 0.000723 Epoch: 9 Global Step: 194580 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:54:01,787-Speed 2514.50 samples/sec Loss 3.8370 LearningRate 0.000723 Epoch: 9 Global Step: 194590 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:54:09,989-Speed 2497.12 samples/sec Loss 3.7528 LearningRate 0.000723 Epoch: 9 Global Step: 194600 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:54:18,201-Speed 2494.72 samples/sec Loss 3.7342 LearningRate 0.000723 Epoch: 9 Global Step: 194610 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:54:26,409-Speed 2495.51 samples/sec Loss 3.7806 LearningRate 0.000723 Epoch: 9 Global Step: 194620 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:54:34,621-Speed 2494.54 samples/sec Loss 3.7593 LearningRate 0.000723 Epoch: 9 Global Step: 194630 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:54:42,818-Speed 2498.87 samples/sec Loss 3.6985 LearningRate 0.000723 Epoch: 9 Global Step: 194640 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:54:50,963-Speed 2514.87 samples/sec Loss 3.8743 LearningRate 0.000723 Epoch: 9 Global Step: 194650 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:54:59,160-Speed 2498.73 samples/sec Loss 3.7495 
LearningRate 0.000723 Epoch: 9 Global Step: 194660 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:55:07,355-Speed 2499.55 samples/sec Loss 3.7773 LearningRate 0.000723 Epoch: 9 Global Step: 194670 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:55:15,551-Speed 2499.11 samples/sec Loss 3.8068 LearningRate 0.000723 Epoch: 9 Global Step: 194680 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:55:23,753-Speed 2497.36 samples/sec Loss 3.8607 LearningRate 0.000723 Epoch: 9 Global Step: 194690 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:55:31,963-Speed 2494.91 samples/sec Loss 3.8655 LearningRate 0.000723 Epoch: 9 Global Step: 194700 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:55:40,109-Speed 2514.33 samples/sec Loss 3.8029 LearningRate 0.000723 Epoch: 9 Global Step: 194710 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:55:48,306-Speed 2499.31 samples/sec Loss 3.6718 LearningRate 0.000723 Epoch: 9 Global Step: 194720 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:55:56,502-Speed 2499.22 samples/sec Loss 3.7679 LearningRate 0.000723 Epoch: 9 Global Step: 194730 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:56:04,701-Speed 2498.29 samples/sec Loss 3.7374 LearningRate 0.000723 Epoch: 9 Global Step: 194740 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:56:12,904-Speed 2497.29 samples/sec Loss 3.7095 LearningRate 0.000723 Epoch: 9 Global Step: 194750 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:56:21,103-Speed 2498.26 samples/sec Loss 3.7549 LearningRate 0.000723 Epoch: 9 Global Step: 194760 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:56:29,256-Speed 2512.15 samples/sec Loss 3.6646 LearningRate 0.000723 Epoch: 9 Global Step: 194770 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:56:37,481-Speed 2490.39 samples/sec Loss 3.7594 
LearningRate 0.000723 Epoch: 9 Global Step: 194780 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:56:57,983-Speed 1353.20 samples/sec Loss 3.7739 LearningRate 0.000723 Epoch: 9 Global Step: 194790 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:57:06,210-Speed 2503.01 samples/sec Loss 3.8131 LearningRate 0.000723 Epoch: 9 Global Step: 194800 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:57:19,415-Speed 2505.71 samples/sec Loss 3.7541 LearningRate 0.000723 Epoch: 9 Global Step: 194810 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:57:27,612-Speed 2504.67 samples/sec Loss 3.8276 LearningRate 0.000723 Epoch: 9 Global Step: 194820 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:57:41,227-Speed 1504.33 samples/sec Loss 3.7341 LearningRate 0.000723 Epoch: 9 Global Step: 194830 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:57:50,922-Speed 2505.83 samples/sec Loss 3.8001 LearningRate 0.000723 Epoch: 9 Global Step: 194840 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:57:59,142-Speed 2500.48 samples/sec Loss 3.8142 LearningRate 0.000723 Epoch: 9 Global Step: 194850 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:58:11,765-Speed 1622.66 samples/sec Loss 3.7691 LearningRate 0.000723 Epoch: 9 Global Step: 194860 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:58:19,999-Speed 2499.16 samples/sec Loss 3.7516 LearningRate 0.000723 Epoch: 9 Global Step: 194870 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:58:28,225-Speed 2496.12 samples/sec Loss 3.7054 LearningRate 0.000723 Epoch: 9 Global Step: 194880 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:58:41,612-Speed 1529.86 samples/sec Loss 3.7299 LearningRate 0.000723 Epoch: 9 Global Step: 194890 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:58:49,851-Speed 2494.41 samples/sec Loss 3.7495 
LearningRate 0.000723 Epoch: 9 Global Step: 194900 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:58:58,126-Speed 2492.64 samples/sec Loss 3.7410 LearningRate 0.000723 Epoch: 9 Global Step: 194910 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:59:06,359-Speed 2488.16 samples/sec Loss 3.7178 LearningRate 0.000723 Epoch: 9 Global Step: 194920 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:59:14,634-Speed 2495.75 samples/sec Loss 3.7483 LearningRate 0.000723 Epoch: 9 Global Step: 194930 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:59:25,106-Speed 2497.62 samples/sec Loss 3.6975 LearningRate 0.000723 Epoch: 9 Global Step: 194940 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:59:33,267-Speed 2509.66 samples/sec Loss 3.7072 LearningRate 0.000722 Epoch: 9 Global Step: 194950 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:59:41,544-Speed 2498.46 samples/sec Loss 3.7544 LearningRate 0.000722 Epoch: 9 Global Step: 194960 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:59:49,745-Speed 2498.03 samples/sec Loss 3.7333 LearningRate 0.000722 Epoch: 9 Global Step: 194970 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 10:59:57,978-Speed 2495.99 samples/sec Loss 3.7355 LearningRate 0.000722 Epoch: 9 Global Step: 194980 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:00:06,198-Speed 2500.33 samples/sec Loss 3.7148 LearningRate 0.000722 Epoch: 9 Global Step: 194990 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:00:14,401-Speed 2497.09 samples/sec Loss 3.7029 LearningRate 0.000722 Epoch: 9 Global Step: 195000 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:00:25,809-Speed 2514.06 samples/sec Loss 3.6774 LearningRate 0.000722 Epoch: 9 Global Step: 195010 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:00:34,031-Speed 2499.07 samples/sec Loss 3.7393 
LearningRate 0.000722 Epoch: 9 Global Step: 195020 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:00:45,461-Speed 1791.88 samples/sec Loss 3.6815 LearningRate 0.000722 Epoch: 9 Global Step: 195030 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:00:53,709-Speed 2498.94 samples/sec Loss 3.6588 LearningRate 0.000722 Epoch: 9 Global Step: 195040 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:01:03,464-Speed 2114.22 samples/sec Loss 3.7536 LearningRate 0.000722 Epoch: 9 Global Step: 195050 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:01:11,672-Speed 2495.54 samples/sec Loss 3.7213 LearningRate 0.000722 Epoch: 9 Global Step: 195060 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:01:19,842-Speed 2507.00 samples/sec Loss 3.7890 LearningRate 0.000722 Epoch: 9 Global Step: 195070 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:01:28,051-Speed 2495.26 samples/sec Loss 3.6911 LearningRate 0.000722 Epoch: 9 Global Step: 195080 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:01:36,541-Speed 2419.68 samples/sec Loss 3.6678 LearningRate 0.000722 Epoch: 9 Global Step: 195090 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:01:44,752-Speed 2494.26 samples/sec Loss 3.7394 LearningRate 0.000722 Epoch: 9 Global Step: 195100 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:01:53,979-Speed 2498.84 samples/sec Loss 3.8786 LearningRate 0.000722 Epoch: 9 Global Step: 195110 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:02:02,879-Speed 2498.47 samples/sec Loss 3.7179 LearningRate 0.000722 Epoch: 9 Global Step: 195120 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:02:11,045-Speed 2508.39 samples/sec Loss 3.7062 LearningRate 0.000722 Epoch: 9 Global Step: 195130 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:02:19,254-Speed 2495.22 samples/sec Loss 3.6830 
LearningRate 0.000722 Epoch: 9 Global Step: 195140 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:02:27,468-Speed 2493.89 samples/sec Loss 3.7363 LearningRate 0.000722 Epoch: 9 Global Step: 195150 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:02:35,675-Speed 2495.86 samples/sec Loss 3.7324 LearningRate 0.000722 Epoch: 9 Global Step: 195160 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:02:43,878-Speed 2496.90 samples/sec Loss 3.7537 LearningRate 0.000722 Epoch: 9 Global Step: 195170 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:02:52,096-Speed 2492.43 samples/sec Loss 3.7723 LearningRate 0.000722 Epoch: 9 Global Step: 195180 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:03:00,247-Speed 2513.21 samples/sec Loss 3.8157 LearningRate 0.000722 Epoch: 9 Global Step: 195190 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:03:08,452-Speed 2496.50 samples/sec Loss 3.7587 LearningRate 0.000722 Epoch: 9 Global Step: 195200 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:03:16,657-Speed 2496.47 samples/sec Loss 3.8187 LearningRate 0.000722 Epoch: 9 Global Step: 195210 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:03:24,881-Speed 2490.51 samples/sec Loss 3.8123 LearningRate 0.000722 Epoch: 9 Global Step: 195220 Fp16 Grad Scale: 65536 Required: 145 hours Training: 2022-07-07 11:03:33,045-Speed 2508.97 samples/sec Loss 3.7399 LearningRate 0.000722 Epoch: 9 Global Step: 195230 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:03:41,250-Speed 2496.78 samples/sec Loss 3.7568 LearningRate 0.000722 Epoch: 9 Global Step: 195240 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:03:49,405-Speed 2511.60 samples/sec Loss 3.7458 LearningRate 0.000722 Epoch: 9 Global Step: 195250 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:03:57,612-Speed 2495.96 samples/sec Loss 3.7568 
LearningRate 0.000722 Epoch: 9 Global Step: 195260 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:04:05,821-Speed 2495.17 samples/sec Loss 3.6627 LearningRate 0.000722 Epoch: 9 Global Step: 195270 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:04:14,030-Speed 2495.27 samples/sec Loss 3.7095 LearningRate 0.000722 Epoch: 9 Global Step: 195280 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:04:22,235-Speed 2496.51 samples/sec Loss 3.7056 LearningRate 0.000722 Epoch: 9 Global Step: 195290 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:04:30,440-Speed 2496.48 samples/sec Loss 3.7049 LearningRate 0.000722 Epoch: 9 Global Step: 195300 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:04:38,591-Speed 2513.16 samples/sec Loss 3.7119 LearningRate 0.000722 Epoch: 9 Global Step: 195310 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:04:46,806-Speed 2493.49 samples/sec Loss 3.7063 LearningRate 0.000722 Epoch: 9 Global Step: 195320 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:04:55,010-Speed 2496.48 samples/sec Loss 3.6669 LearningRate 0.000722 Epoch: 9 Global Step: 195330 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:05:03,211-Speed 2497.92 samples/sec Loss 3.7275 LearningRate 0.000722 Epoch: 9 Global Step: 195340 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:05:11,415-Speed 2496.83 samples/sec Loss 3.7461 LearningRate 0.000722 Epoch: 9 Global Step: 195350 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:05:19,635-Speed 2491.82 samples/sec Loss 3.7706 LearningRate 0.000722 Epoch: 9 Global Step: 195360 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:05:27,785-Speed 2513.94 samples/sec Loss 3.7273 LearningRate 0.000722 Epoch: 9 Global Step: 195370 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:05:35,994-Speed 2495.64 samples/sec Loss 3.7143 
LearningRate 0.000722 Epoch: 9 Global Step: 195380 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:05:44,203-Speed 2494.97 samples/sec Loss 3.7257 LearningRate 0.000721 Epoch: 9 Global Step: 195390 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:05:52,409-Speed 2496.22 samples/sec Loss 3.7302 LearningRate 0.000721 Epoch: 9 Global Step: 195400 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:06:00,613-Speed 2496.59 samples/sec Loss 3.7769 LearningRate 0.000721 Epoch: 9 Global Step: 195410 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:06:08,817-Speed 2496.82 samples/sec Loss 3.6924 LearningRate 0.000721 Epoch: 9 Global Step: 195420 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:06:16,967-Speed 2513.31 samples/sec Loss 3.7838 LearningRate 0.000721 Epoch: 9 Global Step: 195430 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:06:25,171-Speed 2496.87 samples/sec Loss 3.7774 LearningRate 0.000721 Epoch: 9 Global Step: 195440 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:06:33,377-Speed 2495.98 samples/sec Loss 3.7030 LearningRate 0.000721 Epoch: 9 Global Step: 195450 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:06:41,580-Speed 2497.06 samples/sec Loss 3.7609 LearningRate 0.000721 Epoch: 9 Global Step: 195460 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:06:49,783-Speed 2497.16 samples/sec Loss 3.8463 LearningRate 0.000721 Epoch: 9 Global Step: 195470 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:06:58,001-Speed 2492.67 samples/sec Loss 3.7624 LearningRate 0.000721 Epoch: 9 Global Step: 195480 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:07:06,150-Speed 2513.51 samples/sec Loss 3.6939 LearningRate 0.000721 Epoch: 9 Global Step: 195490 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:07:14,359-Speed 2495.31 samples/sec Loss 3.7551 
LearningRate 0.000721 Epoch: 9 Global Step: 195500 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:07:22,562-Speed 2497.21 samples/sec Loss 3.7636 LearningRate 0.000721 Epoch: 9 Global Step: 195510 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:07:30,767-Speed 2496.57 samples/sec Loss 3.7629 LearningRate 0.000721 Epoch: 9 Global Step: 195520 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:07:38,988-Speed 2491.62 samples/sec Loss 3.7096 LearningRate 0.000721 Epoch: 9 Global Step: 195530 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:07:47,194-Speed 2496.27 samples/sec Loss 3.6654 LearningRate 0.000721 Epoch: 9 Global Step: 195540 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:07:55,341-Speed 2514.48 samples/sec Loss 3.7595 LearningRate 0.000721 Epoch: 9 Global Step: 195550 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:08:03,545-Speed 2496.92 samples/sec Loss 3.8020 LearningRate 0.000721 Epoch: 9 Global Step: 195560 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:08:11,742-Speed 2498.75 samples/sec Loss 3.7279 LearningRate 0.000721 Epoch: 9 Global Step: 195570 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:08:19,941-Speed 2498.42 samples/sec Loss 3.6858 LearningRate 0.000721 Epoch: 9 Global Step: 195580 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:08:28,145-Speed 2496.81 samples/sec Loss 3.7905 LearningRate 0.000721 Epoch: 9 Global Step: 195590 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:08:36,344-Speed 2498.05 samples/sec Loss 3.7333 LearningRate 0.000721 Epoch: 9 Global Step: 195600 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:08:44,496-Speed 2512.65 samples/sec Loss 3.7316 LearningRate 0.000721 Epoch: 9 Global Step: 195610 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:08:52,699-Speed 2497.21 samples/sec Loss 3.6837 
LearningRate 0.000721 Epoch: 9 Global Step: 195620 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:09:00,898-Speed 2498.22 samples/sec Loss 3.6842 LearningRate 0.000721 Epoch: 9 Global Step: 195630 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:09:09,101-Speed 2496.90 samples/sec Loss 3.7353 LearningRate 0.000721 Epoch: 9 Global Step: 195640 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:09:17,300-Speed 2498.42 samples/sec Loss 3.7366 LearningRate 0.000721 Epoch: 9 Global Step: 195650 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:09:25,502-Speed 2497.38 samples/sec Loss 3.6924 LearningRate 0.000721 Epoch: 9 Global Step: 195660 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:09:33,649-Speed 2514.10 samples/sec Loss 3.8260 LearningRate 0.000721 Epoch: 9 Global Step: 195670 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:09:41,852-Speed 2497.02 samples/sec Loss 3.7766 LearningRate 0.000721 Epoch: 9 Global Step: 195680 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:09:50,058-Speed 2496.04 samples/sec Loss 3.7733 LearningRate 0.000721 Epoch: 9 Global Step: 195690 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:09:58,258-Speed 2497.86 samples/sec Loss 3.8305 LearningRate 0.000721 Epoch: 9 Global Step: 195700 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:10:06,460-Speed 2497.45 samples/sec Loss 3.7637 LearningRate 0.000721 Epoch: 9 Global Step: 195710 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:10:14,660-Speed 2498.34 samples/sec Loss 3.7700 LearningRate 0.000721 Epoch: 9 Global Step: 195720 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:10:22,806-Speed 2514.56 samples/sec Loss 3.7750 LearningRate 0.000721 Epoch: 9 Global Step: 195730 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:10:31,006-Speed 2497.98 samples/sec Loss 3.7189 
LearningRate 0.000721 Epoch: 9 Global Step: 195740 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:10:39,206-Speed 2497.74 samples/sec Loss 3.7383 LearningRate 0.000721 Epoch: 9 Global Step: 195750 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:10:47,408-Speed 2497.59 samples/sec Loss 3.7807 LearningRate 0.000721 Epoch: 9 Global Step: 195760 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:10:55,610-Speed 2497.58 samples/sec Loss 3.7530 LearningRate 0.000721 Epoch: 9 Global Step: 195770 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:11:03,811-Speed 2497.48 samples/sec Loss 3.7198 LearningRate 0.000721 Epoch: 9 Global Step: 195780 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:11:11,959-Speed 2514.05 samples/sec Loss 3.7070 LearningRate 0.000721 Epoch: 9 Global Step: 195790 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:11:20,162-Speed 2497.08 samples/sec Loss 3.6576 LearningRate 0.000721 Epoch: 9 Global Step: 195800 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:11:28,365-Speed 2496.86 samples/sec Loss 3.7386 LearningRate 0.000721 Epoch: 9 Global Step: 195810 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:11:36,567-Speed 2497.29 samples/sec Loss 3.7215 LearningRate 0.000721 Epoch: 9 Global Step: 195820 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:11:44,775-Speed 2495.57 samples/sec Loss 3.7402 LearningRate 0.000720 Epoch: 9 Global Step: 195830 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:11:52,980-Speed 2496.40 samples/sec Loss 3.8234 LearningRate 0.000720 Epoch: 9 Global Step: 195840 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:12:01,125-Speed 2514.95 samples/sec Loss 3.7496 LearningRate 0.000720 Epoch: 9 Global Step: 195850 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:12:09,326-Speed 2498.06 samples/sec Loss 3.7521 
LearningRate 0.000720 Epoch: 9 Global Step: 195860 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:12:17,534-Speed 2495.37 samples/sec Loss 3.6900 LearningRate 0.000720 Epoch: 9 Global Step: 195870 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:12:25,733-Speed 2498.47 samples/sec Loss 3.7125 LearningRate 0.000720 Epoch: 9 Global Step: 195880 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:12:33,930-Speed 2498.90 samples/sec Loss 3.8232 LearningRate 0.000720 Epoch: 9 Global Step: 195890 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:12:42,128-Speed 2498.56 samples/sec Loss 3.6595 LearningRate 0.000720 Epoch: 9 Global Step: 195900 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:12:50,277-Speed 2513.50 samples/sec Loss 3.7276 LearningRate 0.000720 Epoch: 9 Global Step: 195910 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:12:58,478-Speed 2497.85 samples/sec Loss 3.7069 LearningRate 0.000720 Epoch: 9 Global Step: 195920 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:13:06,678-Speed 2497.94 samples/sec Loss 3.7018 LearningRate 0.000720 Epoch: 9 Global Step: 195930 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:13:14,876-Speed 2498.66 samples/sec Loss 3.6930 LearningRate 0.000720 Epoch: 9 Global Step: 195940 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:13:23,074-Speed 2498.58 samples/sec Loss 3.6543 LearningRate 0.000720 Epoch: 9 Global Step: 195950 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:13:31,275-Speed 2497.73 samples/sec Loss 3.6860 LearningRate 0.000720 Epoch: 9 Global Step: 195960 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:13:39,420-Speed 2514.90 samples/sec Loss 3.6796 LearningRate 0.000720 Epoch: 9 Global Step: 195970 Fp16 Grad Scale: 32768 Required: 145 hours Training: 2022-07-07 11:13:47,618-Speed 2498.42 samples/sec Loss 3.6846 
LearningRate 0.000720 Epoch: 9 Global Step: 195980 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:13:55,828-Speed 2495.10 samples/sec Loss 3.7549 LearningRate 0.000720 Epoch: 9 Global Step: 195990 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:14:04,038-Speed 2494.77 samples/sec Loss 3.7127 LearningRate 0.000720 Epoch: 9 Global Step: 196000 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:14:12,240-Speed 2497.58 samples/sec Loss 3.7710 LearningRate 0.000720 Epoch: 9 Global Step: 196010 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:14:20,439-Speed 2498.23 samples/sec Loss 3.7746 LearningRate 0.000720 Epoch: 9 Global Step: 196020 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:14:28,586-Speed 2514.46 samples/sec Loss 3.6828 LearningRate 0.000720 Epoch: 9 Global Step: 196030 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:14:36,787-Speed 2497.64 samples/sec Loss 3.6506 LearningRate 0.000720 Epoch: 9 Global Step: 196040 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:14:44,985-Speed 2498.38 samples/sec Loss 3.7443 LearningRate 0.000720 Epoch: 9 Global Step: 196050 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:14:53,184-Speed 2498.33 samples/sec Loss 3.7118 LearningRate 0.000720 Epoch: 9 Global Step: 196060 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:15:01,384-Speed 2498.07 samples/sec Loss 3.8126 LearningRate 0.000720 Epoch: 9 Global Step: 196070 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:15:09,584-Speed 2497.93 samples/sec Loss 3.6778 LearningRate 0.000720 Epoch: 9 Global Step: 196080 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:15:17,737-Speed 2512.44 samples/sec Loss 3.6779 LearningRate 0.000720 Epoch: 9 Global Step: 196090 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:15:25,935-Speed 2498.45 samples/sec Loss 3.6519 LearningRate 0.000720 Epoch: 9 Global Step: 196100 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:15:34,132-Speed 2498.85 samples/sec Loss 3.7137 LearningRate 0.000720 Epoch: 9 Global Step: 196110 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:15:42,330-Speed 2498.72 samples/sec Loss 3.7525 LearningRate 0.000720 Epoch: 9 Global Step: 196120 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:15:50,524-Speed 2499.71 samples/sec Loss 3.8124 LearningRate 0.000720 Epoch: 9 Global Step: 196130 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:15:58,728-Speed 2496.63 samples/sec Loss 3.7646 LearningRate 0.000720 Epoch: 9 Global Step: 196140 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:16:06,882-Speed 2512.30 samples/sec Loss 3.7686 LearningRate 0.000720 Epoch: 9 Global Step: 196150 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:16:15,079-Speed 2498.90 samples/sec Loss 3.6906 LearningRate 0.000720 Epoch: 9 Global Step: 196160 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:16:23,280-Speed 2497.77 samples/sec Loss 3.7266 LearningRate 0.000720 Epoch: 9 Global Step: 196170 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:16:31,474-Speed 2499.57 samples/sec Loss 3.7605 LearningRate 0.000720 Epoch: 9 Global Step: 196180 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:16:39,669-Speed 2499.69 samples/sec Loss 3.7351 LearningRate 0.000720 Epoch: 9 Global Step: 196190 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:16:47,868-Speed 2498.29 samples/sec Loss 3.7257 LearningRate 0.000720 Epoch: 9 Global Step: 196200 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:16:56,016-Speed 2513.92 samples/sec Loss 3.6892 LearningRate 0.000720 Epoch: 9 Global Step: 196210 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:17:04,215-Speed 2498.30 samples/sec Loss 3.7531 LearningRate 0.000720 Epoch: 9 Global Step: 196220 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:17:12,414-Speed 2498.29 samples/sec Loss 3.7042 LearningRate 0.000720 Epoch: 9 Global Step: 196230 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:17:20,611-Speed 2498.76 samples/sec Loss 3.7096 LearningRate 0.000720 Epoch: 9 Global Step: 196240 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:17:28,811-Speed 2497.96 samples/sec Loss 3.8133 LearningRate 0.000720 Epoch: 9 Global Step: 196250 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:17:37,010-Speed 2498.24 samples/sec Loss 3.7720 LearningRate 0.000720 Epoch: 9 Global Step: 196260 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:17:45,155-Speed 2514.65 samples/sec Loss 3.7054 LearningRate 0.000719 Epoch: 9 Global Step: 196270 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:17:53,351-Speed 2499.43 samples/sec Loss 3.8002 LearningRate 0.000719 Epoch: 9 Global Step: 196280 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:18:01,545-Speed 2499.69 samples/sec Loss 3.7532 LearningRate 0.000719 Epoch: 9 Global Step: 196290 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:18:09,746-Speed 2497.69 samples/sec Loss 3.6670 LearningRate 0.000719 Epoch: 9 Global Step: 196300 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:18:17,944-Speed 2498.52 samples/sec Loss 3.7011 LearningRate 0.000719 Epoch: 9 Global Step: 196310 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:18:26,146-Speed 2497.40 samples/sec Loss 3.6034 LearningRate 0.000719 Epoch: 9 Global Step: 196320 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:18:34,293-Speed 2514.40 samples/sec Loss 3.7215 LearningRate 0.000719 Epoch: 9 Global Step: 196330 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:18:42,489-Speed 2499.23 samples/sec Loss 3.7120 LearningRate 0.000719 Epoch: 9 Global Step: 196340 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:18:50,686-Speed 2498.96 samples/sec Loss 3.7335 LearningRate 0.000719 Epoch: 9 Global Step: 196350 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:18:58,884-Speed 2498.58 samples/sec Loss 3.7157 LearningRate 0.000719 Epoch: 9 Global Step: 196360 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:19:07,082-Speed 2498.48 samples/sec Loss 3.7301 LearningRate 0.000719 Epoch: 9 Global Step: 196370 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:19:15,282-Speed 2497.90 samples/sec Loss 3.7498 LearningRate 0.000719 Epoch: 9 Global Step: 196380 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:19:23,429-Speed 2514.12 samples/sec Loss 3.6557 LearningRate 0.000719 Epoch: 9 Global Step: 196390 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:19:31,643-Speed 2493.62 samples/sec Loss 3.6915 LearningRate 0.000719 Epoch: 9 Global Step: 196400 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:19:39,854-Speed 2494.89 samples/sec Loss 3.6521 LearningRate 0.000719 Epoch: 9 Global Step: 196410 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:19:48,059-Speed 2496.45 samples/sec Loss 3.7419 LearningRate 0.000719 Epoch: 9 Global Step: 196420 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:19:56,264-Speed 2496.30 samples/sec Loss 3.6782 LearningRate 0.000719 Epoch: 9 Global Step: 196430 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:20:04,468-Speed 2496.50 samples/sec Loss 3.6656 LearningRate 0.000719 Epoch: 9 Global Step: 196440 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:20:12,616-Speed 2514.01 samples/sec Loss 3.8145 LearningRate 0.000719 Epoch: 9 Global Step: 196450 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:20:20,828-Speed 2494.51 samples/sec Loss 3.6241 LearningRate 0.000719 Epoch: 9 Global Step: 196460 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:20:29,030-Speed 2497.34 samples/sec Loss 3.6846 LearningRate 0.000719 Epoch: 9 Global Step: 196470 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:20:37,232-Speed 2497.20 samples/sec Loss 3.7639 LearningRate 0.000719 Epoch: 9 Global Step: 196480 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:20:45,436-Speed 2496.95 samples/sec Loss 3.6902 LearningRate 0.000719 Epoch: 9 Global Step: 196490 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:20:53,642-Speed 2496.23 samples/sec Loss 3.7492 LearningRate 0.000719 Epoch: 9 Global Step: 196500 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:21:01,793-Speed 2512.89 samples/sec Loss 3.7215 LearningRate 0.000719 Epoch: 9 Global Step: 196510 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:21:09,997-Speed 2496.94 samples/sec Loss 3.7887 LearningRate 0.000719 Epoch: 9 Global Step: 196520 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:21:18,213-Speed 2492.97 samples/sec Loss 3.7084 LearningRate 0.000719 Epoch: 9 Global Step: 196530 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:21:26,418-Speed 2496.58 samples/sec Loss 3.6816 LearningRate 0.000719 Epoch: 9 Global Step: 196540 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:21:34,626-Speed 2495.31 samples/sec Loss 3.6797 LearningRate 0.000719 Epoch: 9 Global Step: 196550 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:21:42,829-Speed 2497.52 samples/sec Loss 3.7539 LearningRate 0.000719 Epoch: 9 Global Step: 196560 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:21:50,982-Speed 2512.41 samples/sec Loss 3.6680 LearningRate 0.000719 Epoch: 9 Global Step: 196570 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:21:59,184-Speed 2497.27 samples/sec Loss 3.7571 LearningRate 0.000719 Epoch: 9 Global Step: 196580 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:22:07,417-Speed 2487.98 samples/sec Loss 3.7615 LearningRate 0.000719 Epoch: 9 Global Step: 196590 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:22:15,620-Speed 2496.97 samples/sec Loss 3.7937 LearningRate 0.000719 Epoch: 9 Global Step: 196600 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:22:23,824-Speed 2496.81 samples/sec Loss 3.7012 LearningRate 0.000719 Epoch: 9 Global Step: 196610 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:22:32,029-Speed 2496.40 samples/sec Loss 3.6989 LearningRate 0.000719 Epoch: 9 Global Step: 196620 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:22:40,189-Speed 2510.07 samples/sec Loss 3.6993 LearningRate 0.000719 Epoch: 9 Global Step: 196630 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:22:48,395-Speed 2496.35 samples/sec Loss 3.7376 LearningRate 0.000719 Epoch: 9 Global Step: 196640 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:22:56,599-Speed 2496.72 samples/sec Loss 3.7211 LearningRate 0.000719 Epoch: 9 Global Step: 196650 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:23:04,801-Speed 2497.68 samples/sec Loss 3.7568 LearningRate 0.000719 Epoch: 9 Global Step: 196660 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:23:13,009-Speed 2495.92 samples/sec Loss 3.7901 LearningRate 0.000719 Epoch: 9 Global Step: 196670 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:23:21,218-Speed 2495.13 samples/sec Loss 3.7143 LearningRate 0.000719 Epoch: 9 Global Step: 196680 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:23:29,367-Speed 2513.74 samples/sec Loss 3.6817 LearningRate 0.000719 Epoch: 9 Global Step: 196690 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:23:37,571-Speed 2496.70 samples/sec Loss 3.8173 LearningRate 0.000719 Epoch: 9 Global Step: 196700 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:23:45,770-Speed 2498.00 samples/sec Loss 3.7624 LearningRate 0.000718 Epoch: 9 Global Step: 196710 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:23:53,971-Speed 2498.08 samples/sec Loss 3.7268 LearningRate 0.000718 Epoch: 9 Global Step: 196720 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:24:02,174-Speed 2496.91 samples/sec Loss 3.6825 LearningRate 0.000718 Epoch: 9 Global Step: 196730 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:24:10,371-Speed 2498.93 samples/sec Loss 3.7585 LearningRate 0.000718 Epoch: 9 Global Step: 196740 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:24:18,520-Speed 2513.81 samples/sec Loss 3.7114 LearningRate 0.000718 Epoch: 9 Global Step: 196750 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:24:26,716-Speed 2499.12 samples/sec Loss 3.6697 LearningRate 0.000718 Epoch: 9 Global Step: 196760 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:24:34,919-Speed 2497.26 samples/sec Loss 3.6668 LearningRate 0.000718 Epoch: 9 Global Step: 196770 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:24:43,119-Speed 2498.08 samples/sec Loss 3.6207 LearningRate 0.000718 Epoch: 9 Global Step: 196780 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:24:51,316-Speed 2498.71 samples/sec Loss 3.7258 LearningRate 0.000718 Epoch: 9 Global Step: 196790 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:24:59,514-Speed 2498.64 samples/sec Loss 3.6664 LearningRate 0.000718 Epoch: 9 Global Step: 196800 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:25:08,600-Speed 2513.12 samples/sec Loss 3.6823 LearningRate 0.000718 Epoch: 9 Global Step: 196810 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:25:16,799-Speed 2498.33 samples/sec Loss 3.7138 LearningRate 0.000718 Epoch: 9 Global Step: 196820 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:25:25,582-Speed 2500.13 samples/sec Loss 3.7040 LearningRate 0.000718 Epoch: 9 Global Step: 196830 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:25:33,782-Speed 2497.91 samples/sec Loss 3.6814 LearningRate 0.000718 Epoch: 9 Global Step: 196840 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:25:41,978-Speed 2499.24 samples/sec Loss 3.6909 LearningRate 0.000718 Epoch: 9 Global Step: 196850 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:25:50,176-Speed 2498.79 samples/sec Loss 3.7255 LearningRate 0.000718 Epoch: 9 Global Step: 196860 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:25:58,321-Speed 2514.93 samples/sec Loss 3.6871 LearningRate 0.000718 Epoch: 9 Global Step: 196870 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:26:06,519-Speed 2498.52 samples/sec Loss 3.6909 LearningRate 0.000718 Epoch: 9 Global Step: 196880 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:26:14,717-Speed 2498.52 samples/sec Loss 3.6555 LearningRate 0.000718 Epoch: 9 Global Step: 196890 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:26:22,915-Speed 2498.60 samples/sec Loss 3.7035 LearningRate 0.000718 Epoch: 9 Global Step: 196900 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:26:31,112-Speed 2499.09 samples/sec Loss 3.6628 LearningRate 0.000718 Epoch: 9 Global Step: 196910 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:26:39,320-Speed 2495.35 samples/sec Loss 3.6462 LearningRate 0.000718 Epoch: 9 Global Step: 196920 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:26:47,470-Speed 2514.24 samples/sec Loss 3.6420 LearningRate 0.000718 Epoch: 9 Global Step: 196930 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:26:55,672-Speed 2497.15 samples/sec Loss 3.6692 LearningRate 0.000718 Epoch: 9 Global Step: 196940 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:27:03,870-Speed 2498.59 samples/sec Loss 3.7656 LearningRate 0.000718 Epoch: 9 Global Step: 196950 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:27:12,069-Speed 2498.42 samples/sec Loss 3.7569 LearningRate 0.000718 Epoch: 9 Global Step: 196960 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:27:20,265-Speed 2499.14 samples/sec Loss 3.7593 LearningRate 0.000718 Epoch: 9 Global Step: 196970 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:27:28,461-Speed 2499.29 samples/sec Loss 3.6564 LearningRate 0.000718 Epoch: 9 Global Step: 196980 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:27:36,608-Speed 2514.01 samples/sec Loss 3.6314 LearningRate 0.000718 Epoch: 9 Global Step: 196990 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:27:44,805-Speed 2498.70 samples/sec Loss 3.6182 LearningRate 0.000718 Epoch: 9 Global Step: 197000 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:27:53,003-Speed 2498.87 samples/sec Loss 3.7502 LearningRate 0.000718 Epoch: 9 Global Step: 197010 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:28:01,205-Speed 2497.61 samples/sec Loss 3.6926 LearningRate 0.000718 Epoch: 9 Global Step: 197020 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:28:09,401-Speed 2499.02 samples/sec Loss 3.7417 LearningRate 0.000718 Epoch: 9 Global Step: 197030 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:28:17,603-Speed 2497.23 samples/sec Loss 3.8261 LearningRate 0.000718 Epoch: 9 Global Step: 197040 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:28:25,747-Speed 2515.42 samples/sec Loss 3.6932 LearningRate 0.000718 Epoch: 9 Global Step: 197050 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:28:33,949-Speed 2497.03 samples/sec Loss 3.7537 LearningRate 0.000718 Epoch: 9 Global Step: 197060 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:28:42,159-Speed 2495.05 samples/sec Loss 3.6860 LearningRate 0.000718 Epoch: 9 Global Step: 197070 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:28:50,363-Speed 2496.65 samples/sec Loss 3.7402 LearningRate 0.000718 Epoch: 9 Global Step: 197080 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:28:58,583-Speed 2491.72 samples/sec Loss 3.7357 LearningRate 0.000718 Epoch: 9 Global Step: 197090 Fp16 Grad Scale: 65536 Required: 145 hours
Training: 2022-07-07 11:29:06,738-Speed 2511.72 samples/sec Loss 3.7431 LearningRate 0.000718 Epoch: 9 Global Step: 197100 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:29:14,897-Speed 2510.44 samples/sec Loss 3.6419 LearningRate 0.000718 Epoch: 9 Global Step: 197110 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:29:23,099-Speed 2497.35 samples/sec Loss 3.7006 LearningRate 0.000718 Epoch: 9 Global Step: 197120 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:29:31,299-Speed 2498.14 samples/sec Loss 3.7518 LearningRate 0.000718 Epoch: 9 Global Step: 197130 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:29:39,505-Speed 2496.00 samples/sec Loss 3.7752 LearningRate 0.000718 Epoch: 9 Global Step: 197140 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:29:47,708-Speed 2497.17 samples/sec Loss 3.7201 LearningRate 0.000717 Epoch: 9 Global Step: 197150 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:29:55,909-Speed 2497.54 samples/sec Loss 3.6826 LearningRate 0.000717 Epoch: 9 Global Step: 197160 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:30:04,054-Speed 2514.71 samples/sec Loss 3.6113 LearningRate 0.000717 Epoch: 9 Global Step: 197170 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:30:12,254-Speed 2498.16 samples/sec Loss 3.6955 LearningRate 0.000717 Epoch: 9 Global Step: 197180 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:30:20,455-Speed 2497.53 samples/sec Loss 3.7105 LearningRate 0.000717 Epoch: 9 Global Step: 197190 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:30:28,660-Speed 2496.75 samples/sec Loss 3.6837 LearningRate 0.000717 Epoch: 9 Global Step: 197200 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:30:36,867-Speed 2495.77 samples/sec Loss 3.7298 LearningRate 0.000717 Epoch: 9 Global Step: 197210 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:30:45,069-Speed 2497.19 samples/sec Loss 3.6954 LearningRate 0.000717 Epoch: 9 Global Step: 197220 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:30:53,228-Speed 2510.59 samples/sec Loss 3.7016 LearningRate 0.000717 Epoch: 9 Global Step: 197230 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:31:01,430-Speed 2497.59 samples/sec Loss 3.6733 LearningRate 0.000717 Epoch: 9 Global Step: 197240 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:31:09,624-Speed 2499.98 samples/sec Loss 3.6768 LearningRate 0.000717 Epoch: 9 Global Step: 197250 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:31:17,827-Speed 2497.20 samples/sec Loss 3.7002 LearningRate 0.000717 Epoch: 9 Global Step: 197260 Fp16 Grad Scale: 32768 Required: 145 hours
Training: 2022-07-07 11:31:26,030-Speed 2496.95 samples/sec Loss 3.6638 LearningRate 0.000717 Epoch: 9 Global Step: 197270 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:31:34,227-Speed 2498.81 samples/sec Loss 3.6961 LearningRate 0.000717 Epoch: 9 Global Step: 197280 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:31:42,371-Speed 2515.08 samples/sec Loss 3.7176 LearningRate 0.000717 Epoch: 9 Global Step: 197290 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:31:50,569-Speed 2498.79 samples/sec Loss 3.7480 LearningRate 0.000717 Epoch: 9 Global Step: 197300 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:31:58,768-Speed 2498.05 samples/sec Loss 3.7124 LearningRate 0.000717 Epoch: 9 Global Step: 197310 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:32:06,977-Speed 2495.65 samples/sec Loss 3.7120 LearningRate 0.000717 Epoch: 9 Global Step: 197320 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:32:15,174-Speed 2498.70 samples/sec Loss 3.6922 LearningRate 0.000717 Epoch: 9 Global Step: 197330 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:32:23,374-Speed 2497.95 samples/sec Loss 3.7265 LearningRate 0.000717 Epoch: 9 Global Step: 197340 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:32:31,519-Speed 2514.98 samples/sec Loss 3.7087 LearningRate 0.000717 Epoch: 9 Global Step: 197350 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:32:39,716-Speed 2499.11 samples/sec Loss 3.7112 LearningRate 0.000717 Epoch: 9 Global Step: 197360 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:32:47,928-Speed 2494.21 samples/sec Loss 3.6574 LearningRate 0.000717 Epoch: 9 Global Step: 197370 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:32:56,125-Speed 2498.80 samples/sec Loss 3.6888 LearningRate 0.000717 Epoch: 9 Global Step: 197380 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:33:04,322-Speed 2499.02 samples/sec Loss 3.7260 LearningRate 0.000717 Epoch: 9 Global Step: 197390 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:33:12,520-Speed 2498.71 samples/sec Loss 3.8202 LearningRate 0.000717 Epoch: 9 Global Step: 197400 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:33:20,668-Speed 2513.87 samples/sec Loss 3.7306 LearningRate 0.000717 Epoch: 9 Global Step: 197410 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:33:28,868-Speed 2497.91 samples/sec Loss 3.8209 LearningRate 0.000717 Epoch: 9 Global Step: 197420 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:33:37,062-Speed 2499.89 samples/sec Loss 3.7906 LearningRate 0.000717 Epoch: 9 Global Step: 197430 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:33:45,264-Speed 2497.49 samples/sec Loss 3.7208 LearningRate 0.000717 Epoch: 9 Global Step: 197440 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:33:53,474-Speed 2494.88 samples/sec Loss 3.6849 LearningRate 0.000717 Epoch: 9 Global Step: 197450 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:34:01,674-Speed 2497.77 samples/sec Loss 3.6831 LearningRate 0.000717 Epoch: 9 Global Step: 197460 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:34:09,823-Speed 2513.85 samples/sec Loss 3.6842 LearningRate 0.000717 Epoch: 9 Global Step: 197470 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:34:18,020-Speed 2498.80 samples/sec Loss 3.6772 LearningRate 0.000717 Epoch: 9 Global Step: 197480 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:34:26,220-Speed 2497.82 samples/sec Loss 3.7223 LearningRate 0.000717 Epoch: 9 Global Step: 197490 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:34:34,422-Speed 2497.53 samples/sec Loss 3.7033 LearningRate 0.000717 Epoch: 9 Global Step: 197500 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:34:42,620-Speed 2498.60 samples/sec Loss 3.7862 LearningRate 0.000717 Epoch: 9 Global Step: 197510 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:34:50,821-Speed 2497.67 samples/sec Loss 3.6729 LearningRate 0.000717 Epoch: 9 Global Step: 197520 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:34:58,979-Speed 2510.40 samples/sec Loss 3.6356 LearningRate 0.000717 Epoch: 9 Global Step: 197530 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:35:07,180-Speed 2497.76 samples/sec Loss 3.7251 LearningRate 0.000717 Epoch: 9 Global Step: 197540 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:35:15,395-Speed 2493.54 samples/sec Loss 3.7523 LearningRate 0.000717 Epoch: 9 Global Step: 197550 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:35:23,594-Speed 2498.22 samples/sec Loss 3.6640 LearningRate 0.000717 Epoch: 9 Global Step: 197560 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:35:31,802-Speed 2495.53 samples/sec Loss 3.6932 LearningRate 0.000717 Epoch: 9 Global Step: 197570 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:35:40,010-Speed 2495.79 samples/sec Loss 3.6991 LearningRate 0.000717 Epoch: 9 Global Step: 197580 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:35:48,160-Speed 2513.07 samples/sec Loss 3.7583 LearningRate 0.000716 Epoch: 9 Global Step: 197590 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:35:56,363-Speed 2497.32 samples/sec Loss 3.6151 LearningRate 0.000716 Epoch: 9 Global Step: 197600 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:36:04,562-Speed 2498.25 samples/sec Loss 3.7557 LearningRate 0.000716 Epoch: 9 Global Step: 197610 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:36:12,761-Speed 2498.26 samples/sec Loss 3.6777 LearningRate 0.000716 Epoch: 9 Global Step: 197620 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:36:20,959-Speed 2498.51 samples/sec Loss 3.7240 LearningRate 0.000716 Epoch: 9 Global Step: 197630 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:36:29,160-Speed 2497.61 samples/sec Loss 3.6655 LearningRate 0.000716 Epoch: 9 Global Step: 197640 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:36:37,305-Speed 2514.94 samples/sec Loss 3.7658 LearningRate 0.000716 Epoch: 9 Global Step: 197650 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:36:45,503-Speed 2498.66 samples/sec Loss 3.6785 LearningRate 0.000716 Epoch: 9 Global Step: 197660 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:36:53,713-Speed 2494.60 samples/sec Loss 3.6528 LearningRate 0.000716 Epoch: 9 Global Step: 197670 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:37:01,911-Speed 2498.66 samples/sec Loss 3.7321 LearningRate 0.000716 Epoch: 9 Global Step: 197680 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:37:10,107-Speed 2499.22 samples/sec Loss 3.7225 LearningRate 0.000716 Epoch: 9 Global Step: 197690 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:37:18,314-Speed 2495.93 samples/sec Loss 3.6320 LearningRate 0.000716 Epoch: 9 Global Step: 197700 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:37:26,458-Speed 2515.13 samples/sec Loss 3.6316 LearningRate 0.000716 Epoch: 9 Global Step: 197710 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:37:34,673-Speed 2493.17 samples/sec Loss 3.7196 LearningRate 0.000716 Epoch: 9 Global Step: 197720 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:37:42,872-Speed 2498.26 samples/sec Loss 3.7545 LearningRate 0.000716 Epoch: 9 Global Step: 197730 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:37:51,073-Speed 2497.78 samples/sec Loss 3.7513 LearningRate 0.000716 Epoch: 9 Global Step: 197740 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:37:59,275-Speed 2497.45 samples/sec Loss 3.6956 LearningRate 0.000716 Epoch: 9 Global Step: 197750 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:38:07,474-Speed 2498.04 samples/sec Loss 3.8137 LearningRate 0.000716 Epoch: 9 Global Step: 197760 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:38:15,621-Speed 2514.37 samples/sec Loss 3.7401 LearningRate 0.000716 Epoch: 9 Global Step: 197770 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:38:23,818-Speed 2498.89 samples/sec Loss 3.7786 LearningRate 0.000716 Epoch: 9 Global Step: 197780 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:38:32,014-Speed 2499.12 samples/sec Loss 3.7681 LearningRate 0.000716 Epoch: 9 Global Step: 197790 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:38:40,213-Speed 2498.38 samples/sec Loss 3.6999 LearningRate 0.000716 Epoch: 9 Global Step: 197800 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:38:48,415-Speed 2497.34 samples/sec Loss 3.7461 LearningRate 0.000716 Epoch: 9 Global Step: 197810 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:38:56,622-Speed 2495.97 samples/sec Loss 3.7200 LearningRate 0.000716 Epoch: 9 Global Step: 197820 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:39:04,769-Speed 2514.21 samples/sec Loss 3.6787 LearningRate 0.000716 Epoch: 9 Global Step: 197830 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:39:12,967-Speed 2498.55 samples/sec Loss 3.6797 LearningRate 0.000716 Epoch: 9 Global Step: 197840 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:39:21,176-Speed 2495.34 samples/sec Loss 3.7464 LearningRate 0.000716 Epoch: 9 Global Step: 197850 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:39:29,387-Speed 2494.71 samples/sec Loss 3.7473 LearningRate 0.000716 Epoch: 9 Global Step: 197860 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:39:37,588-Speed 2497.56 samples/sec Loss 3.7167 LearningRate 0.000716 Epoch: 9 Global Step: 197870 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:39:45,792-Speed 2497.21 samples/sec Loss 3.7222 LearningRate 0.000716 Epoch: 9 Global Step: 197880 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:39:53,938-Speed 2514.52 samples/sec Loss 3.6336 LearningRate 0.000716 Epoch: 9 Global Step: 197890 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:40:02,138-Speed 2497.97 samples/sec Loss 3.8191 LearningRate 0.000716 Epoch: 9 Global Step: 197900 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:40:10,337-Speed 2498.36 samples/sec Loss 3.7132 LearningRate 0.000716 Epoch: 9 Global Step: 197910 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:40:18,532-Speed 2499.32 samples/sec Loss 3.7294 LearningRate 0.000716 Epoch: 9 Global Step: 197920 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:40:26,730-Speed 2498.66 samples/sec Loss 3.8376 LearningRate 0.000716 Epoch: 9 Global Step: 197930 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:40:34,929-Speed 2498.42 samples/sec Loss 3.7504 LearningRate 0.000716 Epoch: 9 Global Step: 197940 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:40:43,072-Speed 2515.36 samples/sec Loss 3.7872 LearningRate 0.000716 Epoch: 9 Global Step: 197950 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:40:51,271-Speed 2498.23 samples/sec Loss 3.7429 LearningRate 0.000716 Epoch: 9 Global Step: 197960 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:40:59,469-Speed 2499.19 samples/sec Loss 3.7453 LearningRate 0.000716 Epoch: 9 Global Step: 197970 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:41:07,667-Speed 2498.33 samples/sec Loss 3.7762 LearningRate 0.000716 Epoch: 9 Global Step: 197980 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:41:15,868-Speed 2497.69 samples/sec Loss 3.6994 LearningRate 0.000716 Epoch: 9 Global Step: 197990 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:41:24,066-Speed 2498.49 samples/sec Loss 3.6524 LearningRate 0.000716 Epoch: 9 Global Step: 198000 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:41:32,220-Speed 2512.32 samples/sec Loss 3.7304 LearningRate 0.000716 Epoch: 9 Global Step: 198010 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:41:40,420-Speed 2497.76 samples/sec Loss 3.7190 LearningRate 0.000716 Epoch: 9 Global Step: 198020 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:41:48,617-Speed 2498.54 samples/sec Loss 3.7489 LearningRate 0.000715 Epoch: 9 Global Step: 198030 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:41:56,819-Speed 2497.34 samples/sec Loss 3.7607 LearningRate 0.000715 Epoch: 9 Global Step: 198040 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:42:05,019-Speed 2498.33 samples/sec Loss 3.6677 LearningRate 0.000715 Epoch: 9 Global Step: 198050 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:42:13,220-Speed 2497.36 samples/sec Loss 3.7124 LearningRate 0.000715 Epoch: 9 Global Step: 198060 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:42:21,366-Speed 2514.62 samples/sec Loss 3.6415 LearningRate 0.000715 Epoch: 9 Global Step: 198070 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:42:29,576-Speed 2494.76 samples/sec Loss 3.6813 LearningRate 0.000715 Epoch: 9 Global Step: 198080 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:42:37,776-Speed 2497.99 samples/sec Loss 3.6601 LearningRate 0.000715 Epoch: 9 Global Step: 198090 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:42:45,979-Speed 2497.20 samples/sec Loss 3.7055 LearningRate 0.000715 Epoch: 9 Global Step: 198100 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:42:54,177-Speed 2498.51 samples/sec Loss 3.6596 LearningRate 0.000715 Epoch: 9 Global Step: 198110 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:43:02,375-Speed 2498.47 samples/sec Loss 3.8147 LearningRate 0.000715 Epoch: 9 Global Step: 198120 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:43:10,519-Speed 2515.09 samples/sec Loss 3.7173 LearningRate 0.000715 Epoch: 9 Global Step: 198130 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:43:18,723-Speed 2496.95 samples/sec Loss 3.7026 LearningRate 0.000715 Epoch: 9 Global Step: 198140 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:43:26,921-Speed 2498.67 samples/sec Loss 3.6425 LearningRate 0.000715 Epoch: 9 Global Step: 198150 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:43:35,118-Speed 2498.79 samples/sec Loss 3.6664 LearningRate 0.000715 Epoch: 9 Global Step: 198160 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:43:43,318-Speed 2498.05 samples/sec Loss 3.7471 LearningRate 0.000715 Epoch: 9 Global Step: 198170 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:43:51,518-Speed 2497.65 samples/sec Loss 3.7165 LearningRate 0.000715 Epoch: 9 Global Step: 198180 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:43:59,664-Speed 2514.59 samples/sec Loss 3.7899 LearningRate 0.000715 Epoch: 9 Global Step: 198190 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:44:07,863-Speed 2498.54 samples/sec Loss 3.7572 LearningRate 0.000715 Epoch: 9 Global Step: 198200 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:44:16,059-Speed 2498.92 samples/sec Loss 3.6610 LearningRate 0.000715 Epoch: 9 Global Step: 198210 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:44:24,256-Speed 2499.26 samples/sec Loss 3.7163 LearningRate 0.000715 Epoch: 9 Global Step: 198220 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:44:32,455-Speed 2498.32 samples/sec Loss 3.6596 LearningRate 0.000715 Epoch: 9 Global Step: 198230 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:44:40,657-Speed 2497.45 samples/sec Loss 3.7435 LearningRate 0.000715 Epoch: 9 Global Step: 198240 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:44:48,803-Speed 2515.11 samples/sec Loss 3.7119 LearningRate 0.000715 Epoch: 9 Global Step: 198250 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:44:57,005-Speed 2497.18 samples/sec Loss 3.6759 LearningRate 0.000715 Epoch: 9 Global Step: 198260 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:45:05,203-Speed 2498.83 samples/sec Loss 3.6762 LearningRate 0.000715 Epoch: 9 Global Step: 198270 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:45:13,401-Speed 2498.59 samples/sec Loss 3.6719 LearningRate 0.000715 Epoch: 9 Global Step: 198280 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:45:21,597-Speed 2499.41 samples/sec Loss 3.7287 LearningRate 0.000715 Epoch: 9 Global Step: 198290 Fp16 Grad Scale: 32768 Required: 144 hours
Training: 2022-07-07 11:45:29,794-Speed 2498.57 samples/sec Loss 3.7359 LearningRate 0.000715 Epoch: 9 Global Step: 198300 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:45:37,941-Speed 2514.54 samples/sec Loss 3.7256 LearningRate 0.000715 Epoch: 9 Global Step: 198310 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:45:46,142-Speed 2497.81 samples/sec Loss 3.7089 LearningRate 0.000715 Epoch: 9 Global Step: 198320 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:45:54,343-Speed 2497.78 samples/sec Loss 3.6402 LearningRate 0.000715 Epoch: 9 Global Step: 198330 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:46:02,539-Speed 2499.12 samples/sec Loss 3.7259 LearningRate 0.000715 Epoch: 9 Global Step: 198340 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:46:10,740-Speed 2497.91 samples/sec Loss 3.6944 LearningRate 0.000715 Epoch: 9 Global Step: 198350 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:46:18,937-Speed 2498.81 samples/sec Loss 3.6835 LearningRate 0.000715 Epoch: 9 Global Step: 198360 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:46:27,080-Speed 2515.59 samples/sec Loss 3.7508 LearningRate 0.000715 Epoch: 9 Global Step: 198370 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:46:35,277-Speed 2498.62 samples/sec Loss 3.7941 LearningRate 0.000715 Epoch: 9 Global Step: 198380 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:46:43,472-Speed 2499.46 samples/sec Loss 3.6691 LearningRate 0.000715 Epoch: 9 Global Step: 198390 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:46:51,669-Speed 2498.69 samples/sec Loss 3.7339 LearningRate 0.000715 Epoch: 9 Global Step: 198400 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:46:59,870-Speed 2497.89 samples/sec Loss 3.7839 LearningRate 0.000715 Epoch: 9 Global Step: 198410 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:47:08,063-Speed 2500.05 samples/sec Loss 3.7431 LearningRate 0.000715 Epoch: 9 Global Step: 198420 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:47:16,210-Speed 2514.59 samples/sec Loss 3.7497 LearningRate 0.000715 Epoch: 9 Global Step: 198430 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:47:24,407-Speed 2498.72 samples/sec Loss 3.6806 LearningRate 0.000715 Epoch: 9 Global Step: 198440 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:47:32,622-Speed 2494.41 samples/sec Loss 3.7457 LearningRate 0.000715 Epoch: 9 Global Step: 198450 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:47:40,825-Speed 2496.76 samples/sec Loss 3.7270 LearningRate 0.000715 Epoch: 9 Global Step: 198460 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:47:49,025-Speed 2498.12 samples/sec Loss 3.7170 LearningRate 0.000715 Epoch: 9 Global Step: 198470 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:47:57,223-Speed 2498.59 samples/sec Loss 3.6746 LearningRate 0.000714 Epoch: 9 Global Step: 198480 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:48:05,372-Speed 2513.48 samples/sec Loss 3.7349 LearningRate 0.000714 Epoch: 9 Global Step: 198490 Fp16 Grad Scale: 65536 Required: 144 hours
Training: 2022-07-07 11:48:13,571-Speed 2498.77 samples/sec Loss 3.8628
LearningRate 0.000714 Epoch: 9 Global Step: 198500 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:48:21,768-Speed 2498.81 samples/sec Loss 3.7740 LearningRate 0.000714 Epoch: 9 Global Step: 198510 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:48:29,968-Speed 2498.10 samples/sec Loss 3.7575 LearningRate 0.000714 Epoch: 9 Global Step: 198520 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:48:38,161-Speed 2499.88 samples/sec Loss 3.6552 LearningRate 0.000714 Epoch: 9 Global Step: 198530 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:48:46,365-Speed 2496.97 samples/sec Loss 3.8282 LearningRate 0.000714 Epoch: 9 Global Step: 198540 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:48:54,547-Speed 2503.57 samples/sec Loss 3.6210 LearningRate 0.000714 Epoch: 9 Global Step: 198550 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:49:02,747-Speed 2498.07 samples/sec Loss 3.7609 LearningRate 0.000714 Epoch: 9 Global Step: 198560 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:49:10,953-Speed 2496.12 samples/sec Loss 3.6939 LearningRate 0.000714 Epoch: 9 Global Step: 198570 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:49:19,154-Speed 2498.05 samples/sec Loss 3.6818 LearningRate 0.000714 Epoch: 9 Global Step: 198580 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:49:27,349-Speed 2499.34 samples/sec Loss 3.6521 LearningRate 0.000714 Epoch: 9 Global Step: 198590 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:49:35,548-Speed 2498.19 samples/sec Loss 3.6984 LearningRate 0.000714 Epoch: 9 Global Step: 198600 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:49:43,697-Speed 2513.60 samples/sec Loss 3.6279 LearningRate 0.000714 Epoch: 9 Global Step: 198610 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:49:51,894-Speed 2498.80 samples/sec Loss 3.7214 
LearningRate 0.000714 Epoch: 9 Global Step: 198620 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:50:00,090-Speed 2499.11 samples/sec Loss 3.6668 LearningRate 0.000714 Epoch: 9 Global Step: 198630 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:50:08,291-Speed 2497.79 samples/sec Loss 3.6854 LearningRate 0.000714 Epoch: 9 Global Step: 198640 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:50:16,491-Speed 2497.93 samples/sec Loss 3.6392 LearningRate 0.000714 Epoch: 9 Global Step: 198650 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:50:24,692-Speed 2497.77 samples/sec Loss 3.6865 LearningRate 0.000714 Epoch: 9 Global Step: 198660 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:50:32,838-Speed 2514.22 samples/sec Loss 3.7199 LearningRate 0.000714 Epoch: 9 Global Step: 198670 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:50:41,034-Speed 2499.30 samples/sec Loss 3.7594 LearningRate 0.000714 Epoch: 9 Global Step: 198680 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:50:49,232-Speed 2498.69 samples/sec Loss 3.6863 LearningRate 0.000714 Epoch: 9 Global Step: 198690 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:50:57,441-Speed 2494.96 samples/sec Loss 3.7198 LearningRate 0.000714 Epoch: 9 Global Step: 198700 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:51:05,641-Speed 2498.11 samples/sec Loss 3.7573 LearningRate 0.000714 Epoch: 9 Global Step: 198710 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:51:13,840-Speed 2498.32 samples/sec Loss 3.7656 LearningRate 0.000714 Epoch: 9 Global Step: 198720 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:51:21,991-Speed 2512.98 samples/sec Loss 3.6814 LearningRate 0.000714 Epoch: 9 Global Step: 198730 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:51:30,191-Speed 2498.10 samples/sec Loss 3.7477 
LearningRate 0.000714 Epoch: 9 Global Step: 198740 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:51:38,389-Speed 2498.68 samples/sec Loss 3.8118 LearningRate 0.000714 Epoch: 9 Global Step: 198750 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:51:46,591-Speed 2497.09 samples/sec Loss 3.6677 LearningRate 0.000714 Epoch: 9 Global Step: 198760 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:51:54,791-Speed 2498.11 samples/sec Loss 3.7199 LearningRate 0.000714 Epoch: 9 Global Step: 198770 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:52:02,999-Speed 2495.85 samples/sec Loss 3.7754 LearningRate 0.000714 Epoch: 9 Global Step: 198780 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:52:11,145-Speed 2514.38 samples/sec Loss 3.7054 LearningRate 0.000714 Epoch: 9 Global Step: 198790 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:52:19,346-Speed 2497.78 samples/sec Loss 3.7279 LearningRate 0.000714 Epoch: 9 Global Step: 198800 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:52:27,543-Speed 2498.90 samples/sec Loss 3.6771 LearningRate 0.000714 Epoch: 9 Global Step: 198810 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:52:35,741-Speed 2499.55 samples/sec Loss 3.8028 LearningRate 0.000714 Epoch: 9 Global Step: 198820 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:52:43,939-Speed 2498.61 samples/sec Loss 3.7790 LearningRate 0.000714 Epoch: 9 Global Step: 198830 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:52:52,134-Speed 2499.18 samples/sec Loss 3.8227 LearningRate 0.000714 Epoch: 9 Global Step: 198840 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:53:00,281-Speed 2514.36 samples/sec Loss 3.7404 LearningRate 0.000714 Epoch: 9 Global Step: 198850 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:53:08,479-Speed 2498.77 samples/sec Loss 3.6574 
LearningRate 0.000714 Epoch: 9 Global Step: 198860 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:53:16,680-Speed 2497.46 samples/sec Loss 3.7190 LearningRate 0.000714 Epoch: 9 Global Step: 198870 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:53:24,879-Speed 2498.33 samples/sec Loss 3.7026 LearningRate 0.000714 Epoch: 9 Global Step: 198880 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:53:33,079-Speed 2497.91 samples/sec Loss 3.6935 LearningRate 0.000714 Epoch: 9 Global Step: 198890 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:53:41,279-Speed 2498.07 samples/sec Loss 3.8348 LearningRate 0.000714 Epoch: 9 Global Step: 198900 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:53:49,424-Speed 2514.84 samples/sec Loss 3.6961 LearningRate 0.000714 Epoch: 9 Global Step: 198910 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 11:53:57,578-Speed 2512.11 samples/sec Loss 3.7234 LearningRate 0.000713 Epoch: 9 Global Step: 198920 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:54:05,775-Speed 2499.10 samples/sec Loss 3.7107 LearningRate 0.000713 Epoch: 9 Global Step: 198930 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:54:13,974-Speed 2498.15 samples/sec Loss 3.7106 LearningRate 0.000713 Epoch: 9 Global Step: 198940 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:54:22,174-Speed 2497.96 samples/sec Loss 3.7555 LearningRate 0.000713 Epoch: 9 Global Step: 198950 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:54:30,378-Speed 2496.85 samples/sec Loss 3.8087 LearningRate 0.000713 Epoch: 9 Global Step: 198960 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:54:38,535-Speed 2511.15 samples/sec Loss 3.7349 LearningRate 0.000713 Epoch: 9 Global Step: 198970 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:54:46,745-Speed 2494.72 samples/sec Loss 3.7152 
LearningRate 0.000713 Epoch: 9 Global Step: 198980 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:54:54,946-Speed 2497.65 samples/sec Loss 3.7158 LearningRate 0.000713 Epoch: 9 Global Step: 198990 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:55:03,145-Speed 2498.39 samples/sec Loss 3.6973 LearningRate 0.000713 Epoch: 9 Global Step: 199000 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:55:11,343-Speed 2498.56 samples/sec Loss 3.7264 LearningRate 0.000713 Epoch: 9 Global Step: 199010 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:55:19,542-Speed 2498.22 samples/sec Loss 3.7032 LearningRate 0.000713 Epoch: 9 Global Step: 199020 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:55:27,685-Speed 2515.50 samples/sec Loss 3.6410 LearningRate 0.000713 Epoch: 9 Global Step: 199030 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:55:35,883-Speed 2498.51 samples/sec Loss 3.7569 LearningRate 0.000713 Epoch: 9 Global Step: 199040 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:55:44,092-Speed 2495.22 samples/sec Loss 3.7084 LearningRate 0.000713 Epoch: 9 Global Step: 199050 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:55:52,290-Speed 2498.43 samples/sec Loss 3.7112 LearningRate 0.000713 Epoch: 9 Global Step: 199060 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:56:00,489-Speed 2498.12 samples/sec Loss 3.7164 LearningRate 0.000713 Epoch: 9 Global Step: 199070 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:56:08,704-Speed 2493.51 samples/sec Loss 3.7113 LearningRate 0.000713 Epoch: 9 Global Step: 199080 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:56:16,851-Speed 2514.30 samples/sec Loss 3.7210 LearningRate 0.000713 Epoch: 9 Global Step: 199090 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:56:25,050-Speed 2498.11 samples/sec Loss 3.7936 
LearningRate 0.000713 Epoch: 9 Global Step: 199100 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:56:33,250-Speed 2497.95 samples/sec Loss 3.7536 LearningRate 0.000713 Epoch: 9 Global Step: 199110 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:56:41,449-Speed 2498.43 samples/sec Loss 3.7327 LearningRate 0.000713 Epoch: 9 Global Step: 199120 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:56:49,647-Speed 2498.76 samples/sec Loss 3.7147 LearningRate 0.000713 Epoch: 9 Global Step: 199130 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:56:57,972-Speed 2498.38 samples/sec Loss 3.6985 LearningRate 0.000713 Epoch: 9 Global Step: 199140 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:57:07,494-Speed 2150.92 samples/sec Loss 3.7556 LearningRate 0.000713 Epoch: 9 Global Step: 199150 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:57:19,308-Speed 1762.39 samples/sec Loss 3.6831 LearningRate 0.000713 Epoch: 9 Global Step: 199160 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:57:28,261-Speed 2287.68 samples/sec Loss 3.6502 LearningRate 0.000713 Epoch: 9 Global Step: 199170 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:57:36,457-Speed 2499.11 samples/sec Loss 3.7413 LearningRate 0.000713 Epoch: 9 Global Step: 199180 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:57:44,653-Speed 2499.22 samples/sec Loss 3.7443 LearningRate 0.000713 Epoch: 9 Global Step: 199190 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:57:52,845-Speed 2500.32 samples/sec Loss 3.6066 LearningRate 0.000713 Epoch: 9 Global Step: 199200 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:58:00,986-Speed 2516.05 samples/sec Loss 3.6450 LearningRate 0.000713 Epoch: 9 Global Step: 199210 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:58:09,189-Speed 2496.93 samples/sec Loss 3.6725 
LearningRate 0.000713 Epoch: 9 Global Step: 199220 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:58:17,389-Speed 2498.09 samples/sec Loss 3.6750 LearningRate 0.000713 Epoch: 9 Global Step: 199230 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:58:25,586-Speed 2498.62 samples/sec Loss 3.6602 LearningRate 0.000713 Epoch: 9 Global Step: 199240 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:58:33,787-Speed 2497.56 samples/sec Loss 3.6433 LearningRate 0.000713 Epoch: 9 Global Step: 199250 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:58:41,991-Speed 2496.75 samples/sec Loss 3.6436 LearningRate 0.000713 Epoch: 9 Global Step: 199260 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:58:50,137-Speed 2514.68 samples/sec Loss 3.7819 LearningRate 0.000713 Epoch: 9 Global Step: 199270 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:58:58,340-Speed 2496.96 samples/sec Loss 3.7895 LearningRate 0.000713 Epoch: 9 Global Step: 199280 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:59:06,541-Speed 2497.45 samples/sec Loss 3.6531 LearningRate 0.000713 Epoch: 9 Global Step: 199290 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:59:14,740-Speed 2498.54 samples/sec Loss 3.6839 LearningRate 0.000713 Epoch: 9 Global Step: 199300 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:59:22,937-Speed 2498.99 samples/sec Loss 3.7193 LearningRate 0.000713 Epoch: 9 Global Step: 199310 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:59:31,151-Speed 2493.74 samples/sec Loss 3.6257 LearningRate 0.000713 Epoch: 9 Global Step: 199320 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:59:39,296-Speed 2514.80 samples/sec Loss 3.6629 LearningRate 0.000713 Epoch: 9 Global Step: 199330 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:59:47,505-Speed 2495.02 samples/sec Loss 3.7484 
LearningRate 0.000713 Epoch: 9 Global Step: 199340 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 11:59:55,720-Speed 2493.70 samples/sec Loss 3.7305 LearningRate 0.000713 Epoch: 9 Global Step: 199350 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:00:03,915-Speed 2499.26 samples/sec Loss 3.6642 LearningRate 0.000712 Epoch: 9 Global Step: 199360 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:00:12,117-Speed 2497.34 samples/sec Loss 3.6421 LearningRate 0.000712 Epoch: 9 Global Step: 199370 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:00:20,318-Speed 2497.82 samples/sec Loss 3.6745 LearningRate 0.000712 Epoch: 9 Global Step: 199380 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:00:28,462-Speed 2515.03 samples/sec Loss 3.7800 LearningRate 0.000712 Epoch: 9 Global Step: 199390 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:00:36,669-Speed 2495.52 samples/sec Loss 3.6834 LearningRate 0.000712 Epoch: 9 Global Step: 199400 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:00:44,869-Speed 2497.96 samples/sec Loss 3.7081 LearningRate 0.000712 Epoch: 9 Global Step: 199410 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:00:53,068-Speed 2498.20 samples/sec Loss 3.6549 LearningRate 0.000712 Epoch: 9 Global Step: 199420 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:01:01,267-Speed 2498.34 samples/sec Loss 3.6915 LearningRate 0.000712 Epoch: 9 Global Step: 199430 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:01:09,484-Speed 2492.54 samples/sec Loss 3.6892 LearningRate 0.000712 Epoch: 9 Global Step: 199440 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:01:17,631-Speed 2514.38 samples/sec Loss 3.7374 LearningRate 0.000712 Epoch: 9 Global Step: 199450 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:01:25,830-Speed 2498.43 samples/sec Loss 3.7378 
LearningRate 0.000712 Epoch: 9 Global Step: 199460 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:01:34,042-Speed 2494.21 samples/sec Loss 3.7485 LearningRate 0.000712 Epoch: 9 Global Step: 199470 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:01:42,239-Speed 2498.91 samples/sec Loss 3.6831 LearningRate 0.000712 Epoch: 9 Global Step: 199480 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:01:50,436-Speed 2498.98 samples/sec Loss 3.7576 LearningRate 0.000712 Epoch: 9 Global Step: 199490 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:01:58,636-Speed 2497.68 samples/sec Loss 3.7355 LearningRate 0.000712 Epoch: 9 Global Step: 199500 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:02:06,791-Speed 2511.80 samples/sec Loss 3.7381 LearningRate 0.000712 Epoch: 9 Global Step: 199510 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:02:14,991-Speed 2498.37 samples/sec Loss 3.7207 LearningRate 0.000712 Epoch: 9 Global Step: 199520 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:02:23,188-Speed 2498.58 samples/sec Loss 3.7082 LearningRate 0.000712 Epoch: 9 Global Step: 199530 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:02:31,385-Speed 2498.81 samples/sec Loss 3.6118 LearningRate 0.000712 Epoch: 9 Global Step: 199540 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:02:39,594-Speed 2495.60 samples/sec Loss 3.6146 LearningRate 0.000712 Epoch: 9 Global Step: 199550 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:02:47,805-Speed 2494.61 samples/sec Loss 3.6114 LearningRate 0.000712 Epoch: 9 Global Step: 199560 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:02:55,956-Speed 2512.95 samples/sec Loss 3.6498 LearningRate 0.000712 Epoch: 9 Global Step: 199570 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:03:04,153-Speed 2499.04 samples/sec Loss 3.7815 
LearningRate 0.000712 Epoch: 9 Global Step: 199580 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:03:12,352-Speed 2498.27 samples/sec Loss 3.7296 LearningRate 0.000712 Epoch: 9 Global Step: 199590 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:03:20,564-Speed 2494.50 samples/sec Loss 3.6511 LearningRate 0.000712 Epoch: 9 Global Step: 199600 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:03:28,781-Speed 2492.77 samples/sec Loss 3.7032 LearningRate 0.000712 Epoch: 9 Global Step: 199610 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:03:36,984-Speed 2497.17 samples/sec Loss 3.6939 LearningRate 0.000712 Epoch: 9 Global Step: 199620 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:03:45,133-Speed 2513.53 samples/sec Loss 3.6829 LearningRate 0.000712 Epoch: 9 Global Step: 199630 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:03:53,330-Speed 2498.99 samples/sec Loss 3.6886 LearningRate 0.000712 Epoch: 9 Global Step: 199640 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:04:01,527-Speed 2498.91 samples/sec Loss 3.6690 LearningRate 0.000712 Epoch: 9 Global Step: 199650 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:04:09,727-Speed 2497.78 samples/sec Loss 3.6868 LearningRate 0.000712 Epoch: 9 Global Step: 199660 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:04:17,927-Speed 2497.98 samples/sec Loss 3.6926 LearningRate 0.000712 Epoch: 9 Global Step: 199670 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:04:26,137-Speed 2495.23 samples/sec Loss 3.7240 LearningRate 0.000712 Epoch: 9 Global Step: 199680 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:04:34,284-Speed 2514.18 samples/sec Loss 3.7176 LearningRate 0.000712 Epoch: 9 Global Step: 199690 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:04:42,487-Speed 2496.76 samples/sec Loss 3.6957 
LearningRate 0.000712 Epoch: 9 Global Step: 199700 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:04:50,693-Speed 2496.21 samples/sec Loss 3.6777 LearningRate 0.000712 Epoch: 9 Global Step: 199710 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:04:58,890-Speed 2498.92 samples/sec Loss 3.6356 LearningRate 0.000712 Epoch: 9 Global Step: 199720 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:05:07,094-Speed 2497.21 samples/sec Loss 3.6771 LearningRate 0.000712 Epoch: 9 Global Step: 199730 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:05:15,303-Speed 2495.28 samples/sec Loss 3.5925 LearningRate 0.000712 Epoch: 9 Global Step: 199740 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:05:23,446-Speed 2515.30 samples/sec Loss 3.6559 LearningRate 0.000712 Epoch: 9 Global Step: 199750 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:05:31,643-Speed 2498.77 samples/sec Loss 3.6128 LearningRate 0.000712 Epoch: 9 Global Step: 199760 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:05:39,853-Speed 2495.16 samples/sec Loss 3.6596 LearningRate 0.000712 Epoch: 9 Global Step: 199770 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:05:48,048-Speed 2499.29 samples/sec Loss 3.6348 LearningRate 0.000712 Epoch: 9 Global Step: 199780 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:05:56,245-Speed 2499.21 samples/sec Loss 3.6967 LearningRate 0.000712 Epoch: 9 Global Step: 199790 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:06:04,444-Speed 2498.21 samples/sec Loss 3.5905 LearningRate 0.000711 Epoch: 9 Global Step: 199800 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:06:12,599-Speed 2511.58 samples/sec Loss 3.6893 LearningRate 0.000711 Epoch: 9 Global Step: 199810 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:06:20,798-Speed 2498.17 samples/sec Loss 3.7137 
LearningRate 0.000711 Epoch: 9 Global Step: 199820 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:06:28,995-Speed 2498.89 samples/sec Loss 3.6672 LearningRate 0.000711 Epoch: 9 Global Step: 199830 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:06:37,214-Speed 2492.28 samples/sec Loss 3.7312 LearningRate 0.000711 Epoch: 9 Global Step: 199840 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:06:45,414-Speed 2498.01 samples/sec Loss 3.8263 LearningRate 0.000711 Epoch: 9 Global Step: 199850 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:06:53,618-Speed 2496.65 samples/sec Loss 3.7250 LearningRate 0.000711 Epoch: 9 Global Step: 199860 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:07:01,768-Speed 2513.38 samples/sec Loss 3.7792 LearningRate 0.000711 Epoch: 9 Global Step: 199870 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:07:09,965-Speed 2499.12 samples/sec Loss 3.7278 LearningRate 0.000711 Epoch: 9 Global Step: 199880 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:07:18,167-Speed 2497.13 samples/sec Loss 3.7218 LearningRate 0.000711 Epoch: 9 Global Step: 199890 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:07:26,364-Speed 2498.74 samples/sec Loss 3.7597 LearningRate 0.000711 Epoch: 9 Global Step: 199900 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:07:34,566-Speed 2497.41 samples/sec Loss 3.7625 LearningRate 0.000711 Epoch: 9 Global Step: 199910 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:07:42,767-Speed 2497.71 samples/sec Loss 3.7006 LearningRate 0.000711 Epoch: 9 Global Step: 199920 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:07:50,914-Speed 2514.37 samples/sec Loss 3.6957 LearningRate 0.000711 Epoch: 9 Global Step: 199930 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:07:59,112-Speed 2498.53 samples/sec Loss 3.6771 
LearningRate 0.000711 Epoch: 9 Global Step: 199940 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:08:07,323-Speed 2494.34 samples/sec Loss 3.6890 LearningRate 0.000711 Epoch: 9 Global Step: 199950 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:08:15,519-Speed 2499.66 samples/sec Loss 3.6595 LearningRate 0.000711 Epoch: 9 Global Step: 199960 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:08:23,713-Speed 2499.68 samples/sec Loss 3.7430 LearningRate 0.000711 Epoch: 9 Global Step: 199970 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:08:31,923-Speed 2495.04 samples/sec Loss 3.6126 LearningRate 0.000711 Epoch: 9 Global Step: 199980 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:08:40,082-Speed 2510.54 samples/sec Loss 3.6632 LearningRate 0.000711 Epoch: 9 Global Step: 199990 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:08:48,281-Speed 2498.27 samples/sec Loss 3.6754 LearningRate 0.000711 Epoch: 9 Global Step: 200000 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:08:56,484-Speed 2497.15 samples/sec Loss 3.7065 LearningRate 0.000711 Epoch: 9 Global Step: 200010 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:09:04,686-Speed 2497.19 samples/sec Loss 3.7382 LearningRate 0.000711 Epoch: 9 Global Step: 200020 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:09:12,890-Speed 2496.84 samples/sec Loss 3.6781 LearningRate 0.000711 Epoch: 9 Global Step: 200030 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:09:21,096-Speed 2496.19 samples/sec Loss 3.6859 LearningRate 0.000711 Epoch: 9 Global Step: 200040 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:09:29,248-Speed 2512.68 samples/sec Loss 3.7079 LearningRate 0.000711 Epoch: 9 Global Step: 200050 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:09:37,451-Speed 2496.82 samples/sec Loss 3.6380 
LearningRate 0.000711 Epoch: 9 Global Step: 200060 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:09:45,657-Speed 2496.16 samples/sec Loss 3.7547 LearningRate 0.000711 Epoch: 9 Global Step: 200070 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:09:53,858-Speed 2497.75 samples/sec Loss 3.6571 LearningRate 0.000711 Epoch: 9 Global Step: 200080 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:10:02,059-Speed 2497.74 samples/sec Loss 3.7380 LearningRate 0.000711 Epoch: 9 Global Step: 200090 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:10:10,347-Speed 2471.57 samples/sec Loss 3.7234 LearningRate 0.000711 Epoch: 9 Global Step: 200100 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:10:18,492-Speed 2514.87 samples/sec Loss 3.6556 LearningRate 0.000711 Epoch: 9 Global Step: 200110 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:10:26,695-Speed 2496.82 samples/sec Loss 3.7213 LearningRate 0.000711 Epoch: 9 Global Step: 200120 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:10:34,895-Speed 2497.99 samples/sec Loss 3.7189 LearningRate 0.000711 Epoch: 9 Global Step: 200130 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:10:43,092-Speed 2498.73 samples/sec Loss 3.7224 LearningRate 0.000711 Epoch: 9 Global Step: 200140 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:10:51,290-Speed 2498.67 samples/sec Loss 3.7262 LearningRate 0.000711 Epoch: 9 Global Step: 200150 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:10:59,513-Speed 2491.01 samples/sec Loss 3.7031 LearningRate 0.000711 Epoch: 9 Global Step: 200160 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:11:07,660-Speed 2514.15 samples/sec Loss 3.6952 LearningRate 0.000711 Epoch: 9 Global Step: 200170 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:11:15,855-Speed 2499.26 samples/sec Loss 3.6785 
LearningRate 0.000711 Epoch: 9 Global Step: 200180 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:11:24,053-Speed 2498.64 samples/sec Loss 3.6806 LearningRate 0.000711 Epoch: 9 Global Step: 200190 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:11:32,252-Speed 2498.20 samples/sec Loss 3.6839 LearningRate 0.000711 Epoch: 9 Global Step: 200200 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:11:40,453-Speed 2497.91 samples/sec Loss 3.7617 LearningRate 0.000711 Epoch: 9 Global Step: 200210 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:11:48,651-Speed 2498.74 samples/sec Loss 3.6779 LearningRate 0.000711 Epoch: 9 Global Step: 200220 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:11:56,794-Speed 2515.28 samples/sec Loss 3.7092 LearningRate 0.000711 Epoch: 9 Global Step: 200230 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:12:04,997-Speed 2497.05 samples/sec Loss 3.6295 LearningRate 0.000710 Epoch: 9 Global Step: 200240 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:12:13,194-Speed 2499.23 samples/sec Loss 3.7089 LearningRate 0.000710 Epoch: 9 Global Step: 200250 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:12:21,392-Speed 2498.70 samples/sec Loss 3.6689 LearningRate 0.000710 Epoch: 9 Global Step: 200260 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:12:29,590-Speed 2498.40 samples/sec Loss 3.7885 LearningRate 0.000710 Epoch: 9 Global Step: 200270 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:12:37,792-Speed 2497.61 samples/sec Loss 3.8135 LearningRate 0.000710 Epoch: 9 Global Step: 200280 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:12:45,932-Speed 2516.28 samples/sec Loss 3.6898 LearningRate 0.000710 Epoch: 9 Global Step: 200290 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:12:54,130-Speed 2498.71 samples/sec Loss 3.7278 
LearningRate 0.000710 Epoch: 9 Global Step: 200300 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:13:02,331-Speed 2497.63 samples/sec Loss 3.7443 LearningRate 0.000710 Epoch: 9 Global Step: 200310 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:13:10,529-Speed 2498.60 samples/sec Loss 3.6721 LearningRate 0.000710 Epoch: 9 Global Step: 200320 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:13:18,729-Speed 2498.16 samples/sec Loss 3.6586 LearningRate 0.000710 Epoch: 9 Global Step: 200330 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:13:26,928-Speed 2498.48 samples/sec Loss 3.6992 LearningRate 0.000710 Epoch: 9 Global Step: 200340 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:13:35,068-Speed 2516.18 samples/sec Loss 3.6423 LearningRate 0.000710 Epoch: 9 Global Step: 200350 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:13:43,267-Speed 2498.59 samples/sec Loss 3.6444 LearningRate 0.000710 Epoch: 9 Global Step: 200360 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:13:51,465-Speed 2498.67 samples/sec Loss 3.7123 LearningRate 0.000710 Epoch: 9 Global Step: 200370 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:13:59,663-Speed 2498.53 samples/sec Loss 3.5906 LearningRate 0.000710 Epoch: 9 Global Step: 200380 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:14:07,865-Speed 2497.24 samples/sec Loss 3.6298 LearningRate 0.000710 Epoch: 9 Global Step: 200390 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:14:16,070-Speed 2496.32 samples/sec Loss 3.6915 LearningRate 0.000710 Epoch: 9 Global Step: 200400 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:14:24,227-Speed 2511.18 samples/sec Loss 3.7304 LearningRate 0.000710 Epoch: 9 Global Step: 200410 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:14:32,426-Speed 2498.19 samples/sec Loss 3.6457 
LearningRate 0.000710 Epoch: 9 Global Step: 200420 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:14:40,621-Speed 2499.42 samples/sec Loss 3.6786 LearningRate 0.000710 Epoch: 9 Global Step: 200430 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:14:48,822-Speed 2497.95 samples/sec Loss 3.7220 LearningRate 0.000710 Epoch: 9 Global Step: 200440 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:14:57,024-Speed 2497.47 samples/sec Loss 3.7207 LearningRate 0.000710 Epoch: 9 Global Step: 200450 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:15:05,224-Speed 2497.86 samples/sec Loss 3.7149 LearningRate 0.000710 Epoch: 9 Global Step: 200460 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:15:13,371-Speed 2514.10 samples/sec Loss 3.6953 LearningRate 0.000710 Epoch: 9 Global Step: 200470 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:15:21,571-Speed 2498.26 samples/sec Loss 3.6643 LearningRate 0.000710 Epoch: 9 Global Step: 200480 Fp16 Grad Scale: 65536 Required: 144 hours Training: 2022-07-07 12:15:29,728-Speed 2511.25 samples/sec Loss 3.6113 LearningRate 0.000710 Epoch: 9 Global Step: 200490 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:15:37,926-Speed 2498.68 samples/sec Loss 3.6732 LearningRate 0.000710 Epoch: 9 Global Step: 200500 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:15:46,132-Speed 2496.40 samples/sec Loss 3.6826 LearningRate 0.000710 Epoch: 9 Global Step: 200510 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:15:54,333-Speed 2497.86 samples/sec Loss 3.7071 LearningRate 0.000710 Epoch: 9 Global Step: 200520 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:16:02,481-Speed 2513.71 samples/sec Loss 3.7129 LearningRate 0.000710 Epoch: 9 Global Step: 200530 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:16:10,685-Speed 2496.67 samples/sec Loss 3.6695 
LearningRate 0.000710 Epoch: 9 Global Step: 200540 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:16:18,886-Speed 2498.00 samples/sec Loss 3.6518 LearningRate 0.000710 Epoch: 9 Global Step: 200550 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:16:27,086-Speed 2498.07 samples/sec Loss 3.7626 LearningRate 0.000710 Epoch: 9 Global Step: 200560 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:16:35,286-Speed 2497.75 samples/sec Loss 3.7579 LearningRate 0.000710 Epoch: 9 Global Step: 200570 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:16:43,489-Speed 2497.18 samples/sec Loss 3.6449 LearningRate 0.000710 Epoch: 9 Global Step: 200580 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:16:51,631-Speed 2515.67 samples/sec Loss 3.6572 LearningRate 0.000710 Epoch: 9 Global Step: 200590 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:16:59,849-Speed 2492.50 samples/sec Loss 3.6871 LearningRate 0.000710 Epoch: 9 Global Step: 200600 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:17:08,046-Speed 2499.14 samples/sec Loss 3.6882 LearningRate 0.000710 Epoch: 9 Global Step: 200610 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:17:16,254-Speed 2495.68 samples/sec Loss 3.7521 LearningRate 0.000710 Epoch: 9 Global Step: 200620 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:17:24,453-Speed 2498.35 samples/sec Loss 3.6320 LearningRate 0.000710 Epoch: 9 Global Step: 200630 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:17:32,659-Speed 2496.03 samples/sec Loss 3.5854 LearningRate 0.000710 Epoch: 9 Global Step: 200640 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:17:40,815-Speed 2511.52 samples/sec Loss 3.6909 LearningRate 0.000710 Epoch: 9 Global Step: 200650 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:17:49,012-Speed 2498.95 samples/sec Loss 3.6946 
LearningRate 0.000710 Epoch: 9 Global Step: 200660 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:17:57,208-Speed 2499.24 samples/sec Loss 3.6694 LearningRate 0.000710 Epoch: 9 Global Step: 200670 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:18:05,412-Speed 2496.71 samples/sec Loss 3.6275 LearningRate 0.000710 Epoch: 9 Global Step: 200680 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:18:13,612-Speed 2497.74 samples/sec Loss 3.7213 LearningRate 0.000709 Epoch: 9 Global Step: 200690 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:18:21,809-Speed 2499.36 samples/sec Loss 3.7351 LearningRate 0.000709 Epoch: 9 Global Step: 200700 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:18:29,973-Speed 2508.80 samples/sec Loss 3.6695 LearningRate 0.000709 Epoch: 9 Global Step: 200710 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:18:38,179-Speed 2496.32 samples/sec Loss 3.6687 LearningRate 0.000709 Epoch: 9 Global Step: 200720 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:18:46,377-Speed 2498.45 samples/sec Loss 3.6446 LearningRate 0.000709 Epoch: 9 Global Step: 200730 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:18:54,580-Speed 2496.89 samples/sec Loss 3.6984 LearningRate 0.000709 Epoch: 9 Global Step: 200740 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:19:02,776-Speed 2499.39 samples/sec Loss 3.6437 LearningRate 0.000709 Epoch: 9 Global Step: 200750 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:19:10,974-Speed 2498.61 samples/sec Loss 3.6497 LearningRate 0.000709 Epoch: 9 Global Step: 200760 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:19:19,119-Speed 2514.88 samples/sec Loss 3.7223 LearningRate 0.000709 Epoch: 9 Global Step: 200770 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:19:27,319-Speed 2498.01 samples/sec Loss 3.6708 
LearningRate 0.000709 Epoch: 9 Global Step: 200780 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:19:35,531-Speed 2494.29 samples/sec Loss 3.7131 LearningRate 0.000709 Epoch: 9 Global Step: 200790 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:19:43,730-Speed 2498.37 samples/sec Loss 3.6613 LearningRate 0.000709 Epoch: 9 Global Step: 200800 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:19:51,929-Speed 2498.29 samples/sec Loss 3.6861 LearningRate 0.000709 Epoch: 9 Global Step: 200810 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:20:00,125-Speed 2499.38 samples/sec Loss 3.7352 LearningRate 0.000709 Epoch: 9 Global Step: 200820 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:20:08,269-Speed 2515.23 samples/sec Loss 3.6672 LearningRate 0.000709 Epoch: 9 Global Step: 200830 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:20:16,468-Speed 2498.31 samples/sec Loss 3.6847 LearningRate 0.000709 Epoch: 9 Global Step: 200840 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:20:24,668-Speed 2498.07 samples/sec Loss 3.6945 LearningRate 0.000709 Epoch: 9 Global Step: 200850 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:20:32,867-Speed 2498.26 samples/sec Loss 3.7293 LearningRate 0.000709 Epoch: 9 Global Step: 200860 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:20:41,068-Speed 2497.73 samples/sec Loss 3.6506 LearningRate 0.000709 Epoch: 9 Global Step: 200870 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:20:49,266-Speed 2498.52 samples/sec Loss 3.6523 LearningRate 0.000709 Epoch: 9 Global Step: 200880 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:20:57,426-Speed 2510.25 samples/sec Loss 3.6463 LearningRate 0.000709 Epoch: 9 Global Step: 200890 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:21:05,628-Speed 2497.66 samples/sec Loss 3.6802 
LearningRate 0.000709 Epoch: 9 Global Step: 200900 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:21:13,826-Speed 2498.54 samples/sec Loss 3.7270 LearningRate 0.000709 Epoch: 9 Global Step: 200910 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:21:22,028-Speed 2497.42 samples/sec Loss 3.6918 LearningRate 0.000709 Epoch: 9 Global Step: 200920 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:21:30,226-Speed 2498.79 samples/sec Loss 3.6331 LearningRate 0.000709 Epoch: 9 Global Step: 200930 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:21:38,436-Speed 2494.81 samples/sec Loss 3.6473 LearningRate 0.000709 Epoch: 9 Global Step: 200940 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:21:46,584-Speed 2513.85 samples/sec Loss 3.6436 LearningRate 0.000709 Epoch: 9 Global Step: 200950 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:21:54,783-Speed 2498.64 samples/sec Loss 3.6500 LearningRate 0.000709 Epoch: 9 Global Step: 200960 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:22:02,987-Speed 2496.92 samples/sec Loss 3.7097 LearningRate 0.000709 Epoch: 9 Global Step: 200970 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:22:11,187-Speed 2498.17 samples/sec Loss 3.7272 LearningRate 0.000709 Epoch: 9 Global Step: 200980 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:22:19,384-Speed 2498.88 samples/sec Loss 3.6555 LearningRate 0.000709 Epoch: 9 Global Step: 200990 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:22:27,584-Speed 2497.78 samples/sec Loss 3.7043 LearningRate 0.000709 Epoch: 9 Global Step: 201000 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:22:35,733-Speed 2513.66 samples/sec Loss 3.6714 LearningRate 0.000709 Epoch: 9 Global Step: 201010 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:22:43,932-Speed 2498.23 samples/sec Loss 3.6424 
LearningRate 0.000709 Epoch: 9 Global Step: 201020 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:22:52,141-Speed 2495.18 samples/sec Loss 3.6238 LearningRate 0.000709 Epoch: 9 Global Step: 201030 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:23:00,339-Speed 2498.78 samples/sec Loss 3.6525 LearningRate 0.000709 Epoch: 9 Global Step: 201040 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:23:08,535-Speed 2499.15 samples/sec Loss 3.6252 LearningRate 0.000709 Epoch: 9 Global Step: 201050 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:23:16,744-Speed 2495.43 samples/sec Loss 3.6475 LearningRate 0.000709 Epoch: 9 Global Step: 201060 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:23:24,896-Speed 2512.53 samples/sec Loss 3.7051 LearningRate 0.000709 Epoch: 9 Global Step: 201070 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:23:33,122-Speed 2489.93 samples/sec Loss 3.7269 LearningRate 0.000709 Epoch: 9 Global Step: 201080 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:23:41,334-Speed 2494.82 samples/sec Loss 3.7786 LearningRate 0.000709 Epoch: 9 Global Step: 201090 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:23:49,530-Speed 2499.21 samples/sec Loss 3.6998 LearningRate 0.000709 Epoch: 9 Global Step: 201100 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:23:57,732-Speed 2497.22 samples/sec Loss 3.6628 LearningRate 0.000709 Epoch: 9 Global Step: 201110 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:24:05,929-Speed 2499.05 samples/sec Loss 3.6439 LearningRate 0.000709 Epoch: 9 Global Step: 201120 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:24:14,074-Speed 2514.65 samples/sec Loss 3.5993 LearningRate 0.000708 Epoch: 9 Global Step: 201130 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:24:22,271-Speed 2498.86 samples/sec Loss 3.6963 
LearningRate 0.000708 Epoch: 9 Global Step: 201140 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:24:30,486-Speed 2493.40 samples/sec Loss 3.6175 LearningRate 0.000708 Epoch: 9 Global Step: 201150 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:24:38,705-Speed 2492.21 samples/sec Loss 3.6376 LearningRate 0.000708 Epoch: 9 Global Step: 201160 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:24:46,902-Speed 2498.84 samples/sec Loss 3.7141 LearningRate 0.000708 Epoch: 9 Global Step: 201170 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:24:55,103-Speed 2497.58 samples/sec Loss 3.6808 LearningRate 0.000708 Epoch: 9 Global Step: 201180 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:25:03,250-Speed 2514.20 samples/sec Loss 3.6134 LearningRate 0.000708 Epoch: 9 Global Step: 201190 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:25:11,447-Speed 2498.92 samples/sec Loss 3.6855 LearningRate 0.000708 Epoch: 9 Global Step: 201200 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:25:19,656-Speed 2495.47 samples/sec Loss 3.6663 LearningRate 0.000708 Epoch: 9 Global Step: 201210 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:25:27,858-Speed 2497.22 samples/sec Loss 3.7322 LearningRate 0.000708 Epoch: 9 Global Step: 201220 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:25:36,063-Speed 2496.54 samples/sec Loss 3.6518 LearningRate 0.000708 Epoch: 9 Global Step: 201230 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:25:44,261-Speed 2498.64 samples/sec Loss 3.6150 LearningRate 0.000708 Epoch: 9 Global Step: 201240 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:25:52,418-Speed 2510.98 samples/sec Loss 3.6752 LearningRate 0.000708 Epoch: 9 Global Step: 201250 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:26:00,615-Speed 2498.81 samples/sec Loss 3.6533 
LearningRate 0.000708 Epoch: 9 Global Step: 201260 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:26:08,816-Speed 2497.99 samples/sec Loss 3.6369 LearningRate 0.000708 Epoch: 9 Global Step: 201270 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:26:17,012-Speed 2499.13 samples/sec Loss 3.6534 LearningRate 0.000708 Epoch: 9 Global Step: 201280 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:26:25,211-Speed 2498.30 samples/sec Loss 3.6038 LearningRate 0.000708 Epoch: 9 Global Step: 201290 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:26:33,412-Speed 2497.77 samples/sec Loss 3.6876 LearningRate 0.000708 Epoch: 9 Global Step: 201300 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:26:41,562-Speed 2513.09 samples/sec Loss 3.6659 LearningRate 0.000708 Epoch: 9 Global Step: 201310 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:26:49,759-Speed 2498.82 samples/sec Loss 3.6478 LearningRate 0.000708 Epoch: 9 Global Step: 201320 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:26:57,954-Speed 2499.44 samples/sec Loss 3.6177 LearningRate 0.000708 Epoch: 9 Global Step: 201330 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:27:06,153-Speed 2498.35 samples/sec Loss 3.6755 LearningRate 0.000708 Epoch: 9 Global Step: 201340 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:27:14,351-Speed 2498.40 samples/sec Loss 3.6059 LearningRate 0.000708 Epoch: 9 Global Step: 201350 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:27:22,552-Speed 2497.81 samples/sec Loss 3.5987 LearningRate 0.000708 Epoch: 9 Global Step: 201360 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:27:30,698-Speed 2514.43 samples/sec Loss 3.5924 LearningRate 0.000708 Epoch: 9 Global Step: 201370 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:27:38,893-Speed 2499.92 samples/sec Loss 3.6097 
LearningRate 0.000708 Epoch: 9 Global Step: 201380 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:27:47,091-Speed 2498.69 samples/sec Loss 3.6130 LearningRate 0.000708 Epoch: 9 Global Step: 201390 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:27:55,289-Speed 2498.40 samples/sec Loss 3.6137 LearningRate 0.000708 Epoch: 9 Global Step: 201400 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:28:03,487-Speed 2498.52 samples/sec Loss 3.6256 LearningRate 0.000708 Epoch: 9 Global Step: 201410 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:28:11,683-Speed 2499.23 samples/sec Loss 3.6343 LearningRate 0.000708 Epoch: 9 Global Step: 201420 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:28:19,832-Speed 2513.74 samples/sec Loss 3.6024 LearningRate 0.000708 Epoch: 9 Global Step: 201430 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:28:28,027-Speed 2499.42 samples/sec Loss 3.6549 LearningRate 0.000708 Epoch: 9 Global Step: 201440 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:28:36,226-Speed 2498.43 samples/sec Loss 3.5992 LearningRate 0.000708 Epoch: 9 Global Step: 201450 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:28:44,426-Speed 2497.64 samples/sec Loss 3.6068 LearningRate 0.000708 Epoch: 9 Global Step: 201460 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:28:52,627-Speed 2498.17 samples/sec Loss 3.6265 LearningRate 0.000708 Epoch: 9 Global Step: 201470 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:29:00,825-Speed 2498.49 samples/sec Loss 3.6519 LearningRate 0.000708 Epoch: 9 Global Step: 201480 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:29:08,972-Speed 2514.15 samples/sec Loss 3.6549 LearningRate 0.000708 Epoch: 9 Global Step: 201490 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:29:17,170-Speed 2498.58 samples/sec Loss 3.7006 
LearningRate 0.000708 Epoch: 9 Global Step: 201500 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:29:25,368-Speed 2498.69 samples/sec Loss 3.7192 LearningRate 0.000708 Epoch: 9 Global Step: 201510 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:29:33,577-Speed 2495.00 samples/sec Loss 3.6968 LearningRate 0.000708 Epoch: 9 Global Step: 201520 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:29:41,776-Speed 2498.22 samples/sec Loss 3.7049 LearningRate 0.000708 Epoch: 9 Global Step: 201530 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:29:49,972-Speed 2499.29 samples/sec Loss 3.7394 LearningRate 0.000708 Epoch: 9 Global Step: 201540 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:29:58,472-Speed 2511.29 samples/sec Loss 3.6421 LearningRate 0.000708 Epoch: 9 Global Step: 201550 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:30:07,041-Speed 2501.06 samples/sec Loss 3.6410 LearningRate 0.000708 Epoch: 9 Global Step: 201560 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:30:15,238-Speed 2498.83 samples/sec Loss 3.6157 LearningRate 0.000707 Epoch: 9 Global Step: 201570 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:30:23,433-Speed 2499.65 samples/sec Loss 3.7130 LearningRate 0.000707 Epoch: 9 Global Step: 201580 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:30:31,653-Speed 2491.78 samples/sec Loss 3.6232 LearningRate 0.000707 Epoch: 9 Global Step: 201590 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:30:39,853-Speed 2498.05 samples/sec Loss 3.6039 LearningRate 0.000707 Epoch: 9 Global Step: 201600 Fp16 Grad Scale: 32768 Required: 144 hours Training: 2022-07-07 12:30:47,994-Speed 2515.81 samples/sec Loss 3.6667 LearningRate 0.000707 Epoch: 9 Global Step: 201610 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:30:56,196-Speed 2497.57 samples/sec Loss 3.6339 
LearningRate 0.000707 Epoch: 9 Global Step: 201620 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:31:04,394-Speed 2498.34 samples/sec Loss 3.5810 LearningRate 0.000707 Epoch: 9 Global Step: 201630 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:31:12,590-Speed 2499.29 samples/sec Loss 3.6176 LearningRate 0.000707 Epoch: 9 Global Step: 201640 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:31:20,789-Speed 2498.20 samples/sec Loss 3.6258 LearningRate 0.000707 Epoch: 9 Global Step: 201650 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:31:28,989-Speed 2497.89 samples/sec Loss 3.6567 LearningRate 0.000707 Epoch: 9 Global Step: 201660 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:31:37,136-Speed 2514.59 samples/sec Loss 3.7679 LearningRate 0.000707 Epoch: 9 Global Step: 201670 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:31:45,338-Speed 2497.02 samples/sec Loss 3.6967 LearningRate 0.000707 Epoch: 9 Global Step: 201680 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:31:53,540-Speed 2497.37 samples/sec Loss 3.7147 LearningRate 0.000707 Epoch: 9 Global Step: 201690 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:32:01,738-Speed 2498.76 samples/sec Loss 3.6868 LearningRate 0.000707 Epoch: 9 Global Step: 201700 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:32:09,933-Speed 2499.53 samples/sec Loss 3.6956 LearningRate 0.000707 Epoch: 9 Global Step: 201710 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:32:18,130-Speed 2498.86 samples/sec Loss 3.6485 LearningRate 0.000707 Epoch: 9 Global Step: 201720 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:32:26,274-Speed 2515.15 samples/sec Loss 3.7177 LearningRate 0.000707 Epoch: 9 Global Step: 201730 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:32:34,475-Speed 2497.64 samples/sec Loss 3.7016 
LearningRate 0.000707 Epoch: 9 Global Step: 201740 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:32:42,671-Speed 2499.23 samples/sec Loss 3.6954 LearningRate 0.000707 Epoch: 9 Global Step: 201750 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:32:50,868-Speed 2498.92 samples/sec Loss 3.5979 LearningRate 0.000707 Epoch: 9 Global Step: 201760 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:32:59,069-Speed 2497.49 samples/sec Loss 3.6655 LearningRate 0.000707 Epoch: 9 Global Step: 201770 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:33:07,268-Speed 2498.28 samples/sec Loss 3.5608 LearningRate 0.000707 Epoch: 9 Global Step: 201780 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:33:15,425-Speed 2511.12 samples/sec Loss 3.6228 LearningRate 0.000707 Epoch: 9 Global Step: 201790 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:33:23,626-Speed 2497.66 samples/sec Loss 3.6471 LearningRate 0.000707 Epoch: 9 Global Step: 201800 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:33:31,824-Speed 2498.54 samples/sec Loss 3.6399 LearningRate 0.000707 Epoch: 9 Global Step: 201810 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:33:40,023-Speed 2498.26 samples/sec Loss 3.6011 LearningRate 0.000707 Epoch: 9 Global Step: 201820 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:33:48,229-Speed 2496.21 samples/sec Loss 3.6768 LearningRate 0.000707 Epoch: 9 Global Step: 201830 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:33:56,428-Speed 2498.22 samples/sec Loss 3.7099 LearningRate 0.000707 Epoch: 9 Global Step: 201840 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:34:04,576-Speed 2514.05 samples/sec Loss 3.6459 LearningRate 0.000707 Epoch: 9 Global Step: 201850 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:34:12,772-Speed 2499.25 samples/sec Loss 3.6395 
LearningRate 0.000707 Epoch: 9 Global Step: 201860 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:34:20,968-Speed 2499.25 samples/sec Loss 3.6855 LearningRate 0.000707 Epoch: 9 Global Step: 201870 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:34:29,166-Speed 2498.35 samples/sec Loss 3.6357 LearningRate 0.000707 Epoch: 9 Global Step: 201880 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:34:37,363-Speed 2499.01 samples/sec Loss 3.6315 LearningRate 0.000707 Epoch: 9 Global Step: 201890 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:34:45,562-Speed 2498.37 samples/sec Loss 3.6463 LearningRate 0.000707 Epoch: 9 Global Step: 201900 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:34:53,707-Speed 2514.64 samples/sec Loss 3.6270 LearningRate 0.000707 Epoch: 9 Global Step: 201910 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:35:01,904-Speed 2498.88 samples/sec Loss 3.6586 LearningRate 0.000707 Epoch: 9 Global Step: 201920 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:35:10,101-Speed 2498.85 samples/sec Loss 3.6538 LearningRate 0.000707 Epoch: 9 Global Step: 201930 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:35:18,298-Speed 2498.96 samples/sec Loss 3.6819 LearningRate 0.000707 Epoch: 9 Global Step: 201940 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:35:26,492-Speed 2499.94 samples/sec Loss 3.6750 LearningRate 0.000707 Epoch: 9 Global Step: 201950 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:35:34,699-Speed 2496.03 samples/sec Loss 3.6465 LearningRate 0.000707 Epoch: 9 Global Step: 201960 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:35:42,843-Speed 2514.89 samples/sec Loss 3.6932 LearningRate 0.000707 Epoch: 9 Global Step: 201970 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:35:51,077-Speed 2487.74 samples/sec Loss 3.6545 
LearningRate 0.000707 Epoch: 9 Global Step: 201980 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:35:59,273-Speed 2499.00 samples/sec Loss 3.7291 LearningRate 0.000707 Epoch: 9 Global Step: 201990 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:36:07,476-Speed 2497.17 samples/sec Loss 3.6811 LearningRate 0.000707 Epoch: 9 Global Step: 202000 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:36:15,675-Speed 2498.26 samples/sec Loss 3.6723 LearningRate 0.000707 Epoch: 9 Global Step: 202010 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:36:23,873-Speed 2498.46 samples/sec Loss 3.6133 LearningRate 0.000706 Epoch: 9 Global Step: 202020 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:36:32,020-Speed 2514.33 samples/sec Loss 3.6380 LearningRate 0.000706 Epoch: 9 Global Step: 202030 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:36:40,220-Speed 2497.99 samples/sec Loss 3.5972 LearningRate 0.000706 Epoch: 9 Global Step: 202040 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:36:48,416-Speed 2499.12 samples/sec Loss 3.6079 LearningRate 0.000706 Epoch: 9 Global Step: 202050 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:36:56,615-Speed 2498.27 samples/sec Loss 3.6164 LearningRate 0.000706 Epoch: 9 Global Step: 202060 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:37:04,814-Speed 2498.18 samples/sec Loss 3.6162 LearningRate 0.000706 Epoch: 9 Global Step: 202070 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:37:13,013-Speed 2498.46 samples/sec Loss 3.6202 LearningRate 0.000706 Epoch: 9 Global Step: 202080 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:37:21,158-Speed 2515.04 samples/sec Loss 3.6533 LearningRate 0.000706 Epoch: 9 Global Step: 202090 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:37:29,355-Speed 2498.72 samples/sec Loss 3.7045 
LearningRate 0.000706 Epoch: 9 Global Step: 202100 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:37:37,551-Speed 2499.28 samples/sec Loss 3.6694 LearningRate 0.000706 Epoch: 9 Global Step: 202110 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:37:45,752-Speed 2497.53 samples/sec Loss 3.6611 LearningRate 0.000706 Epoch: 9 Global Step: 202120 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:37:53,949-Speed 2498.95 samples/sec Loss 3.7745 LearningRate 0.000706 Epoch: 9 Global Step: 202130 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:38:02,148-Speed 2499.17 samples/sec Loss 3.7140 LearningRate 0.000706 Epoch: 9 Global Step: 202140 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:38:10,301-Speed 2512.26 samples/sec Loss 3.6722 LearningRate 0.000706 Epoch: 9 Global Step: 202150 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:38:18,516-Speed 2493.19 samples/sec Loss 3.6606 LearningRate 0.000706 Epoch: 9 Global Step: 202160 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:38:26,716-Speed 2498.41 samples/sec Loss 3.6873 LearningRate 0.000706 Epoch: 9 Global Step: 202170 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:38:34,914-Speed 2498.51 samples/sec Loss 3.6052 LearningRate 0.000706 Epoch: 9 Global Step: 202180 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:38:43,113-Speed 2498.22 samples/sec Loss 3.6778 LearningRate 0.000706 Epoch: 9 Global Step: 202190 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:38:51,323-Speed 2498.41 samples/sec Loss 3.6573 LearningRate 0.000706 Epoch: 9 Global Step: 202200 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:38:59,463-Speed 2516.49 samples/sec Loss 3.6343 LearningRate 0.000706 Epoch: 9 Global Step: 202210 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:39:07,673-Speed 2494.77 samples/sec Loss 3.6353 
LearningRate 0.000706 Epoch: 9 Global Step: 202220 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:39:15,872-Speed 2498.43 samples/sec Loss 3.6219 LearningRate 0.000706 Epoch: 9 Global Step: 202230 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:39:24,070-Speed 2498.63 samples/sec Loss 3.5371 LearningRate 0.000706 Epoch: 9 Global Step: 202240 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:39:32,280-Speed 2495.08 samples/sec Loss 3.6607 LearningRate 0.000706 Epoch: 9 Global Step: 202250 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:39:40,478-Speed 2498.30 samples/sec Loss 3.6336 LearningRate 0.000706 Epoch: 9 Global Step: 202260 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:39:48,626-Speed 2513.80 samples/sec Loss 3.6861 LearningRate 0.000706 Epoch: 9 Global Step: 202270 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:39:56,824-Speed 2498.66 samples/sec Loss 3.6256 LearningRate 0.000706 Epoch: 9 Global Step: 202280 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 12:40:04,978-Speed 2512.30 samples/sec Loss 3.6142 LearningRate 0.000706 Epoch: 9 Global Step: 202290 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:40:13,187-Speed 2495.12 samples/sec Loss 3.6554 LearningRate 0.000706 Epoch: 9 Global Step: 202300 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:40:21,386-Speed 2498.27 samples/sec Loss 3.7379 LearningRate 0.000706 Epoch: 9 Global Step: 202310 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:40:29,588-Speed 2497.79 samples/sec Loss 3.7121 LearningRate 0.000706 Epoch: 9 Global Step: 202320 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:40:37,738-Speed 2513.23 samples/sec Loss 3.6330 LearningRate 0.000706 Epoch: 9 Global Step: 202330 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:40:45,936-Speed 2498.51 samples/sec Loss 3.6889 
LearningRate 0.000706 Epoch: 9 Global Step: 202340 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:40:54,134-Speed 2498.51 samples/sec Loss 3.6420 LearningRate 0.000706 Epoch: 9 Global Step: 202350 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:41:02,334-Speed 2498.20 samples/sec Loss 3.6752 LearningRate 0.000706 Epoch: 9 Global Step: 202360 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:41:10,531-Speed 2498.67 samples/sec Loss 3.6190 LearningRate 0.000706 Epoch: 9 Global Step: 202370 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:41:18,728-Speed 2498.81 samples/sec Loss 3.5936 LearningRate 0.000706 Epoch: 9 Global Step: 202380 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:41:26,873-Speed 2515.11 samples/sec Loss 3.7135 LearningRate 0.000706 Epoch: 9 Global Step: 202390 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:41:35,072-Speed 2498.57 samples/sec Loss 3.6514 LearningRate 0.000706 Epoch: 9 Global Step: 202400 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:41:43,270-Speed 2498.43 samples/sec Loss 3.6467 LearningRate 0.000706 Epoch: 9 Global Step: 202410 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:41:51,471-Speed 2497.46 samples/sec Loss 3.6179 LearningRate 0.000706 Epoch: 9 Global Step: 202420 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:41:59,672-Speed 2497.93 samples/sec Loss 3.6409 LearningRate 0.000706 Epoch: 9 Global Step: 202430 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:42:07,872-Speed 2497.87 samples/sec Loss 3.6329 LearningRate 0.000706 Epoch: 9 Global Step: 202440 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:42:16,016-Speed 2515.23 samples/sec Loss 3.6904 LearningRate 0.000706 Epoch: 9 Global Step: 202450 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:42:24,217-Speed 2497.61 samples/sec Loss 3.6249 
LearningRate 0.000705 Epoch: 9 Global Step: 202460 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:42:32,443-Speed 2490.28 samples/sec Loss 3.6098 LearningRate 0.000705 Epoch: 9 Global Step: 202470 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:42:40,643-Speed 2497.62 samples/sec Loss 3.6563 LearningRate 0.000705 Epoch: 9 Global Step: 202480 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:42:48,843-Speed 2497.92 samples/sec Loss 3.6244 LearningRate 0.000705 Epoch: 9 Global Step: 202490 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:42:57,044-Speed 2497.92 samples/sec Loss 3.6258 LearningRate 0.000705 Epoch: 9 Global Step: 202500 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:43:05,203-Speed 2510.38 samples/sec Loss 3.6214 LearningRate 0.000705 Epoch: 9 Global Step: 202510 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:43:13,403-Speed 2497.80 samples/sec Loss 3.6119 LearningRate 0.000705 Epoch: 9 Global Step: 202520 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:43:21,601-Speed 2498.86 samples/sec Loss 3.6443 LearningRate 0.000705 Epoch: 9 Global Step: 202530 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:43:29,797-Speed 2499.00 samples/sec Loss 3.5663 LearningRate 0.000705 Epoch: 9 Global Step: 202540 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:43:38,004-Speed 2495.88 samples/sec Loss 3.6365 LearningRate 0.000705 Epoch: 9 Global Step: 202550 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:43:46,212-Speed 2495.48 samples/sec Loss 3.5958 LearningRate 0.000705 Epoch: 9 Global Step: 202560 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:43:54,378-Speed 2508.31 samples/sec Loss 3.6221 LearningRate 0.000705 Epoch: 9 Global Step: 202570 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:44:02,572-Speed 2499.77 samples/sec Loss 3.6402 
LearningRate 0.000705 Epoch: 9 Global Step: 202580 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:44:10,788-Speed 2493.50 samples/sec Loss 3.6452 LearningRate 0.000705 Epoch: 9 Global Step: 202590 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:44:18,988-Speed 2497.65 samples/sec Loss 3.6341 LearningRate 0.000705 Epoch: 9 Global Step: 202600 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:44:27,188-Speed 2498.25 samples/sec Loss 3.6878 LearningRate 0.000705 Epoch: 9 Global Step: 202610 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:44:35,390-Speed 2497.33 samples/sec Loss 3.6161 LearningRate 0.000705 Epoch: 9 Global Step: 202620 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:44:43,549-Speed 2510.72 samples/sec Loss 3.6316 LearningRate 0.000705 Epoch: 9 Global Step: 202630 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:44:51,778-Speed 2489.22 samples/sec Loss 3.6522 LearningRate 0.000705 Epoch: 9 Global Step: 202640 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:44:59,977-Speed 2497.98 samples/sec Loss 3.6103 LearningRate 0.000705 Epoch: 9 Global Step: 202650 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:45:08,175-Speed 2498.86 samples/sec Loss 3.6037 LearningRate 0.000705 Epoch: 9 Global Step: 202660 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:45:16,373-Speed 2498.62 samples/sec Loss 3.6300 LearningRate 0.000705 Epoch: 9 Global Step: 202670 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:45:24,583-Speed 2495.02 samples/sec Loss 3.6853 LearningRate 0.000705 Epoch: 9 Global Step: 202680 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:45:32,733-Speed 2513.25 samples/sec Loss 3.6454 LearningRate 0.000705 Epoch: 9 Global Step: 202690 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:45:40,933-Speed 2498.06 samples/sec Loss 3.6642 
LearningRate 0.000705 Epoch: 9 Global Step: 202700 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:45:49,132-Speed 2498.32 samples/sec Loss 3.6019 LearningRate 0.000705 Epoch: 9 Global Step: 202710 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:45:57,330-Speed 2498.76 samples/sec Loss 3.6939 LearningRate 0.000705 Epoch: 9 Global Step: 202720 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:46:05,526-Speed 2499.06 samples/sec Loss 3.6690 LearningRate 0.000705 Epoch: 9 Global Step: 202730 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:46:13,725-Speed 2498.25 samples/sec Loss 3.6536 LearningRate 0.000705 Epoch: 9 Global Step: 202740 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:46:21,869-Speed 2515.04 samples/sec Loss 3.6789 LearningRate 0.000705 Epoch: 9 Global Step: 202750 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:46:30,071-Speed 2497.45 samples/sec Loss 3.6351 LearningRate 0.000705 Epoch: 9 Global Step: 202760 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:46:38,276-Speed 2496.39 samples/sec Loss 3.6726 LearningRate 0.000705 Epoch: 9 Global Step: 202770 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:46:46,484-Speed 2495.61 samples/sec Loss 3.5933 LearningRate 0.000705 Epoch: 9 Global Step: 202780 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:46:54,684-Speed 2498.01 samples/sec Loss 3.6080 LearningRate 0.000705 Epoch: 9 Global Step: 202790 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:47:02,881-Speed 2498.77 samples/sec Loss 3.6400 LearningRate 0.000705 Epoch: 9 Global Step: 202800 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:47:11,027-Speed 2514.62 samples/sec Loss 3.6228 LearningRate 0.000705 Epoch: 9 Global Step: 202810 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:47:19,227-Speed 2498.03 samples/sec Loss 3.6684 
LearningRate 0.000705 Epoch: 9 Global Step: 202820 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:47:27,426-Speed 2497.97 samples/sec Loss 3.6207 LearningRate 0.000705 Epoch: 9 Global Step: 202830 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:47:35,625-Speed 2498.42 samples/sec Loss 3.6156 LearningRate 0.000705 Epoch: 9 Global Step: 202840 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:47:43,835-Speed 2495.11 samples/sec Loss 3.6500 LearningRate 0.000705 Epoch: 9 Global Step: 202850 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:47:52,032-Speed 2498.87 samples/sec Loss 3.7826 LearningRate 0.000705 Epoch: 9 Global Step: 202860 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:48:00,189-Speed 2511.19 samples/sec Loss 3.5632 LearningRate 0.000705 Epoch: 9 Global Step: 202870 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:48:08,403-Speed 2493.54 samples/sec Loss 3.6409 LearningRate 0.000705 Epoch: 9 Global Step: 202880 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:48:16,607-Speed 2496.72 samples/sec Loss 3.6180 LearningRate 0.000705 Epoch: 9 Global Step: 202890 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:48:24,805-Speed 2498.51 samples/sec Loss 3.6282 LearningRate 0.000705 Epoch: 9 Global Step: 202900 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:48:33,016-Speed 2494.87 samples/sec Loss 3.6795 LearningRate 0.000704 Epoch: 9 Global Step: 202910 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:48:41,214-Speed 2498.56 samples/sec Loss 3.7066 LearningRate 0.000704 Epoch: 9 Global Step: 202920 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:48:49,358-Speed 2515.08 samples/sec Loss 3.7673 LearningRate 0.000704 Epoch: 9 Global Step: 202930 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:48:57,556-Speed 2498.67 samples/sec Loss 3.7553 
LearningRate 0.000704 Epoch: 9 Global Step: 202940 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:49:05,755-Speed 2498.07 samples/sec Loss 3.7191 LearningRate 0.000704 Epoch: 9 Global Step: 202950 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:49:13,951-Speed 2499.01 samples/sec Loss 3.6822 LearningRate 0.000704 Epoch: 9 Global Step: 202960 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:49:22,148-Speed 2498.93 samples/sec Loss 3.7915 LearningRate 0.000704 Epoch: 9 Global Step: 202970 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:49:30,347-Speed 2498.23 samples/sec Loss 3.6493 LearningRate 0.000704 Epoch: 9 Global Step: 202980 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:49:38,491-Speed 2515.18 samples/sec Loss 3.7424 LearningRate 0.000704 Epoch: 9 Global Step: 202990 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:49:46,690-Speed 2498.18 samples/sec Loss 3.7249 LearningRate 0.000704 Epoch: 9 Global Step: 203000 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:49:54,892-Speed 2497.49 samples/sec Loss 3.6255 LearningRate 0.000704 Epoch: 9 Global Step: 203010 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:50:03,105-Speed 2493.76 samples/sec Loss 3.6410 LearningRate 0.000704 Epoch: 9 Global Step: 203020 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:50:11,309-Speed 2496.97 samples/sec Loss 3.6871 LearningRate 0.000704 Epoch: 9 Global Step: 203030 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:50:19,505-Speed 2499.02 samples/sec Loss 3.6366 LearningRate 0.000704 Epoch: 9 Global Step: 203040 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:50:27,651-Speed 2514.39 samples/sec Loss 3.6807 LearningRate 0.000704 Epoch: 9 Global Step: 203050 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:50:35,850-Speed 2498.61 samples/sec Loss 3.6178 
LearningRate 0.000704 Epoch: 9 Global Step: 203060 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:50:44,046-Speed 2498.86 samples/sec Loss 3.6439 LearningRate 0.000704 Epoch: 9 Global Step: 203070 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:50:52,242-Speed 2499.38 samples/sec Loss 3.6125 LearningRate 0.000704 Epoch: 9 Global Step: 203080 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 12:51:00,413-Speed 2507.20 samples/sec Loss 3.6899 LearningRate 0.000704 Epoch: 9 Global Step: 203090 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:51:08,612-Speed 2498.08 samples/sec Loss 3.6204 LearningRate 0.000704 Epoch: 9 Global Step: 203100 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:51:16,758-Speed 2514.61 samples/sec Loss 3.7408 LearningRate 0.000704 Epoch: 9 Global Step: 203110 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:51:24,964-Speed 2496.54 samples/sec Loss 3.6797 LearningRate 0.000704 Epoch: 9 Global Step: 203120 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:51:33,163-Speed 2498.09 samples/sec Loss 3.6673 LearningRate 0.000704 Epoch: 9 Global Step: 203130 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:51:41,367-Speed 2497.20 samples/sec Loss 3.6397 LearningRate 0.000704 Epoch: 9 Global Step: 203140 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:51:49,568-Speed 2497.69 samples/sec Loss 3.6770 LearningRate 0.000704 Epoch: 9 Global Step: 203150 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:51:57,769-Speed 2497.54 samples/sec Loss 3.7083 LearningRate 0.000704 Epoch: 9 Global Step: 203160 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:52:05,919-Speed 2513.25 samples/sec Loss 3.7084 LearningRate 0.000704 Epoch: 9 Global Step: 203170 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:52:14,116-Speed 2498.88 samples/sec Loss 3.6547 
LearningRate 0.000704 Epoch: 9 Global Step: 203180 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:52:22,321-Speed 2496.42 samples/sec Loss 3.6176 LearningRate 0.000704 Epoch: 9 Global Step: 203190 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:52:30,519-Speed 2498.71 samples/sec Loss 3.6904 LearningRate 0.000704 Epoch: 9 Global Step: 203200 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:52:38,718-Speed 2498.18 samples/sec Loss 3.6978 LearningRate 0.000704 Epoch: 9 Global Step: 203210 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:52:46,920-Speed 2497.19 samples/sec Loss 3.6407 LearningRate 0.000704 Epoch: 9 Global Step: 203220 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:52:55,065-Speed 2514.85 samples/sec Loss 3.5865 LearningRate 0.000704 Epoch: 9 Global Step: 203230 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:53:03,269-Speed 2496.92 samples/sec Loss 3.6813 LearningRate 0.000704 Epoch: 9 Global Step: 203240 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:53:11,471-Speed 2497.24 samples/sec Loss 3.6169 LearningRate 0.000704 Epoch: 9 Global Step: 203250 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:53:19,670-Speed 2498.28 samples/sec Loss 3.6135 LearningRate 0.000704 Epoch: 9 Global Step: 203260 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:53:27,883-Speed 2494.03 samples/sec Loss 3.6412 LearningRate 0.000704 Epoch: 9 Global Step: 203270 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:53:36,088-Speed 2496.54 samples/sec Loss 3.6629 LearningRate 0.000704 Epoch: 9 Global Step: 203280 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:53:44,235-Speed 2514.29 samples/sec Loss 3.5721 LearningRate 0.000704 Epoch: 9 Global Step: 203290 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:53:52,433-Speed 2498.44 samples/sec Loss 3.6369 
LearningRate 0.000704 Epoch: 9 Global Step: 203300 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:54:00,628-Speed 2499.61 samples/sec Loss 3.6402 LearningRate 0.000704 Epoch: 9 Global Step: 203310 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:54:08,824-Speed 2499.08 samples/sec Loss 3.5785 LearningRate 0.000704 Epoch: 9 Global Step: 203320 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:54:17,024-Speed 2497.86 samples/sec Loss 3.6951 LearningRate 0.000704 Epoch: 9 Global Step: 203330 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:54:25,221-Speed 2498.83 samples/sec Loss 3.6008 LearningRate 0.000704 Epoch: 9 Global Step: 203340 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:54:33,367-Speed 2514.55 samples/sec Loss 3.6124 LearningRate 0.000703 Epoch: 9 Global Step: 203350 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:54:41,565-Speed 2498.69 samples/sec Loss 3.5882 LearningRate 0.000703 Epoch: 9 Global Step: 203360 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:54:49,762-Speed 2498.67 samples/sec Loss 3.6369 LearningRate 0.000703 Epoch: 9 Global Step: 203370 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:54:57,960-Speed 2498.92 samples/sec Loss 3.6375 LearningRate 0.000703 Epoch: 9 Global Step: 203380 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:55:06,157-Speed 2499.29 samples/sec Loss 3.6057 LearningRate 0.000703 Epoch: 9 Global Step: 203390 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:55:14,355-Speed 2498.70 samples/sec Loss 3.6440 LearningRate 0.000703 Epoch: 9 Global Step: 203400 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:55:22,498-Speed 2515.24 samples/sec Loss 3.6885 LearningRate 0.000703 Epoch: 9 Global Step: 203410 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:55:30,696-Speed 2498.60 samples/sec Loss 3.5961 
LearningRate 0.000703 Epoch: 9 Global Step: 203420 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:55:38,896-Speed 2498.05 samples/sec Loss 3.6636 LearningRate 0.000703 Epoch: 9 Global Step: 203430 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:55:47,098-Speed 2497.51 samples/sec Loss 3.7130 LearningRate 0.000703 Epoch: 9 Global Step: 203440 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:55:55,298-Speed 2498.17 samples/sec Loss 3.6030 LearningRate 0.000703 Epoch: 9 Global Step: 203450 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:56:03,496-Speed 2498.56 samples/sec Loss 3.6474 LearningRate 0.000703 Epoch: 9 Global Step: 203460 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:56:11,638-Speed 2515.60 samples/sec Loss 3.6190 LearningRate 0.000703 Epoch: 9 Global Step: 203470 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:56:19,837-Speed 2498.38 samples/sec Loss 3.6279 LearningRate 0.000703 Epoch: 9 Global Step: 203480 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:56:28,034-Speed 2499.07 samples/sec Loss 3.6418 LearningRate 0.000703 Epoch: 9 Global Step: 203490 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:56:36,233-Speed 2498.09 samples/sec Loss 3.5855 LearningRate 0.000703 Epoch: 9 Global Step: 203500 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:56:44,433-Speed 2498.13 samples/sec Loss 3.6197 LearningRate 0.000703 Epoch: 9 Global Step: 203510 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:56:52,632-Speed 2498.24 samples/sec Loss 3.6269 LearningRate 0.000703 Epoch: 9 Global Step: 203520 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:57:00,780-Speed 2513.91 samples/sec Loss 3.6187 LearningRate 0.000703 Epoch: 9 Global Step: 203530 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:57:08,980-Speed 2497.99 samples/sec Loss 3.5824 
LearningRate 0.000703 Epoch: 9 Global Step: 203540 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:57:17,176-Speed 2499.25 samples/sec Loss 3.6675 LearningRate 0.000703 Epoch: 9 Global Step: 203550 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:57:25,374-Speed 2498.64 samples/sec Loss 3.6258 LearningRate 0.000703 Epoch: 9 Global Step: 203560 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:57:33,580-Speed 2496.37 samples/sec Loss 3.6482 LearningRate 0.000703 Epoch: 9 Global Step: 203570 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:57:41,782-Speed 2497.42 samples/sec Loss 3.6052 LearningRate 0.000703 Epoch: 9 Global Step: 203580 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:57:49,928-Speed 2514.36 samples/sec Loss 3.6826 LearningRate 0.000703 Epoch: 9 Global Step: 203590 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:57:58,132-Speed 2497.27 samples/sec Loss 3.5964 LearningRate 0.000703 Epoch: 9 Global Step: 203600 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:58:06,337-Speed 2496.66 samples/sec Loss 3.5884 LearningRate 0.000703 Epoch: 9 Global Step: 203610 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:58:14,539-Speed 2497.41 samples/sec Loss 3.5730 LearningRate 0.000703 Epoch: 9 Global Step: 203620 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:58:22,739-Speed 2497.85 samples/sec Loss 3.6269 LearningRate 0.000703 Epoch: 9 Global Step: 203630 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:58:30,941-Speed 2497.29 samples/sec Loss 3.5832 LearningRate 0.000703 Epoch: 9 Global Step: 203640 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:58:39,091-Speed 2513.57 samples/sec Loss 3.5849 LearningRate 0.000703 Epoch: 9 Global Step: 203650 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:58:47,292-Speed 2497.79 samples/sec Loss 3.6403 
LearningRate 0.000703 Epoch: 9 Global Step: 203660 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:58:55,500-Speed 2495.41 samples/sec Loss 3.6545 LearningRate 0.000703 Epoch: 9 Global Step: 203670 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:59:03,702-Speed 2497.17 samples/sec Loss 3.7018 LearningRate 0.000703 Epoch: 9 Global Step: 203680 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:59:11,916-Speed 2493.89 samples/sec Loss 3.7322 LearningRate 0.000703 Epoch: 9 Global Step: 203690 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:59:20,119-Speed 2497.11 samples/sec Loss 3.6543 LearningRate 0.000703 Epoch: 9 Global Step: 203700 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:59:28,270-Speed 2513.28 samples/sec Loss 3.7205 LearningRate 0.000703 Epoch: 9 Global Step: 203710 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:59:36,470-Speed 2498.05 samples/sec Loss 3.6380 LearningRate 0.000703 Epoch: 9 Global Step: 203720 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:59:44,670-Speed 2497.90 samples/sec Loss 3.6931 LearningRate 0.000703 Epoch: 9 Global Step: 203730 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 12:59:52,874-Speed 2496.84 samples/sec Loss 3.5946 LearningRate 0.000703 Epoch: 9 Global Step: 203740 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:00:01,073-Speed 2498.29 samples/sec Loss 3.5627 LearningRate 0.000703 Epoch: 9 Global Step: 203750 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:00:09,277-Speed 2496.69 samples/sec Loss 3.7573 LearningRate 0.000703 Epoch: 9 Global Step: 203760 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:00:17,421-Speed 2515.16 samples/sec Loss 3.6975 LearningRate 0.000703 Epoch: 9 Global Step: 203770 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:00:25,625-Speed 2497.03 samples/sec Loss 3.5952 
LearningRate 0.000703 Epoch: 9 Global Step: 203780 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:00:33,824-Speed 2498.37 samples/sec Loss 3.6549 LearningRate 0.000703 Epoch: 9 Global Step: 203790 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:00:42,021-Speed 2498.68 samples/sec Loss 3.6437 LearningRate 0.000702 Epoch: 9 Global Step: 203800 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:00:50,231-Speed 2495.16 samples/sec Loss 3.7183 LearningRate 0.000702 Epoch: 9 Global Step: 203810 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:00:58,430-Speed 2498.27 samples/sec Loss 3.6478 LearningRate 0.000702 Epoch: 9 Global Step: 203820 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:01:06,683-Speed 2516.27 samples/sec Loss 3.6330 LearningRate 0.000702 Epoch: 9 Global Step: 203830 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:01:15,216-Speed 2500.52 samples/sec Loss 3.5763 LearningRate 0.000702 Epoch: 9 Global Step: 203840 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:01:28,302-Speed 1589.51 samples/sec Loss 3.6774 LearningRate 0.000702 Epoch: 9 Global Step: 203850 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:01:36,502-Speed 2498.12 samples/sec Loss 3.6612 LearningRate 0.000702 Epoch: 9 Global Step: 203860 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:01:44,701-Speed 2498.17 samples/sec Loss 3.6073 LearningRate 0.000702 Epoch: 9 Global Step: 203870 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:01:52,909-Speed 2495.75 samples/sec Loss 3.6784 LearningRate 0.000702 Epoch: 9 Global Step: 203880 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:02:01,052-Speed 2515.32 samples/sec Loss 3.6124 LearningRate 0.000702 Epoch: 9 Global Step: 203890 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:02:09,248-Speed 2499.32 samples/sec Loss 3.6037 
LearningRate 0.000702 Epoch: 9 Global Step: 203900 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:02:17,441-Speed 2499.95 samples/sec Loss 3.5737 LearningRate 0.000702 Epoch: 9 Global Step: 203910 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:02:25,642-Speed 2497.94 samples/sec Loss 3.6355 LearningRate 0.000702 Epoch: 9 Global Step: 203920 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:02:33,839-Speed 2498.69 samples/sec Loss 3.6397 LearningRate 0.000702 Epoch: 9 Global Step: 203930 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:02:42,038-Speed 2498.29 samples/sec Loss 3.6143 LearningRate 0.000702 Epoch: 9 Global Step: 203940 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:02:50,187-Speed 2513.55 samples/sec Loss 3.5467 LearningRate 0.000702 Epoch: 9 Global Step: 203950 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:02:58,389-Speed 2497.60 samples/sec Loss 3.5746 LearningRate 0.000702 Epoch: 9 Global Step: 203960 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:03:06,588-Speed 2498.33 samples/sec Loss 3.5578 LearningRate 0.000702 Epoch: 9 Global Step: 203970 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:03:14,786-Speed 2498.46 samples/sec Loss 3.6147 LearningRate 0.000702 Epoch: 9 Global Step: 203980 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:03:22,986-Speed 2498.28 samples/sec Loss 3.5736 LearningRate 0.000702 Epoch: 9 Global Step: 203990 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:03:31,189-Speed 2496.89 samples/sec Loss 3.6559 LearningRate 0.000702 Epoch: 9 Global Step: 204000 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:03:39,344-Speed 2511.89 samples/sec Loss 3.4889 LearningRate 0.000702 Epoch: 9 Global Step: 204010 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:03:47,545-Speed 2497.55 samples/sec Loss 3.5860 
LearningRate 0.000702 Epoch: 9 Global Step: 204020 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:03:55,743-Speed 2498.71 samples/sec Loss 3.5984 LearningRate 0.000702 Epoch: 9 Global Step: 204030 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:04:03,937-Speed 2499.82 samples/sec Loss 3.5904 LearningRate 0.000702 Epoch: 9 Global Step: 204040 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:04:12,135-Speed 2498.42 samples/sec Loss 3.5907 LearningRate 0.000702 Epoch: 9 Global Step: 204050 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:04:20,334-Speed 2498.47 samples/sec Loss 3.5978 LearningRate 0.000702 Epoch: 9 Global Step: 204060 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:04:28,478-Speed 2515.13 samples/sec Loss 3.5490 LearningRate 0.000702 Epoch: 9 Global Step: 204070 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:04:36,675-Speed 2499.12 samples/sec Loss 3.6346 LearningRate 0.000702 Epoch: 9 Global Step: 204080 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:04:44,871-Speed 2499.12 samples/sec Loss 3.5872 LearningRate 0.000702 Epoch: 9 Global Step: 204090 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:04:53,070-Speed 2498.02 samples/sec Loss 3.6354 LearningRate 0.000702 Epoch: 9 Global Step: 204100 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:05:01,271-Speed 2497.76 samples/sec Loss 3.6210 LearningRate 0.000702 Epoch: 9 Global Step: 204110 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:05:09,492-Speed 2496.41 samples/sec Loss 3.5290 LearningRate 0.000702 Epoch: 9 Global Step: 204120 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:05:18,150-Speed 2515.33 samples/sec Loss 3.6291 LearningRate 0.000702 Epoch: 9 Global Step: 204130 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:05:26,349-Speed 2498.42 samples/sec Loss 3.6589 
LearningRate 0.000702 Epoch: 9 Global Step: 204140 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:05:34,593-Speed 2498.06 samples/sec Loss 3.6451 LearningRate 0.000702 Epoch: 9 Global Step: 204150 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:05:42,836-Speed 2499.92 samples/sec Loss 3.6711 LearningRate 0.000702 Epoch: 9 Global Step: 204160 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:05:51,046-Speed 2494.69 samples/sec Loss 3.5604 LearningRate 0.000702 Epoch: 9 Global Step: 204170 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:06:01,826-Speed 2500.70 samples/sec Loss 3.7039 LearningRate 0.000702 Epoch: 9 Global Step: 204180 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:06:10,000-Speed 2517.24 samples/sec Loss 3.6197 LearningRate 0.000702 Epoch: 9 Global Step: 204190 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:06:21,321-Speed 1816.46 samples/sec Loss 3.6286 LearningRate 0.000702 Epoch: 9 Global Step: 204200 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:06:29,511-Speed 2500.84 samples/sec Loss 3.5900 LearningRate 0.000702 Epoch: 9 Global Step: 204210 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:06:37,754-Speed 2501.54 samples/sec Loss 3.5959 LearningRate 0.000702 Epoch: 9 Global Step: 204220 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:06:46,372-Speed 2502.27 samples/sec Loss 3.6151 LearningRate 0.000702 Epoch: 9 Global Step: 204230 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:06:56,739-Speed 1975.71 samples/sec Loss 3.5673 LearningRate 0.000701 Epoch: 9 Global Step: 204240 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:07:04,907-Speed 2516.89 samples/sec Loss 3.6171 LearningRate 0.000701 Epoch: 9 Global Step: 204250 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:07:13,129-Speed 2501.71 samples/sec Loss 3.6347 
LearningRate 0.000701 Epoch: 9 Global Step: 204260 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:07:25,462-Speed 1672.07 samples/sec Loss 3.5962 LearningRate 0.000701 Epoch: 9 Global Step: 204270 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:07:33,690-Speed 2499.04 samples/sec Loss 3.6228 LearningRate 0.000701 Epoch: 9 Global Step: 204280 Fp16 Grad Scale: 16384 Required: 143 hours Training: 2022-07-07 13:07:41,928-Speed 2499.37 samples/sec Loss 3.6292 LearningRate 0.000701 Epoch: 9 Global Step: 204290 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:07:50,679-Speed 2495.53 samples/sec Loss 3.6497 LearningRate 0.000701 Epoch: 9 Global Step: 204300 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:07:58,843-Speed 2516.35 samples/sec Loss 3.6262 LearningRate 0.000701 Epoch: 9 Global Step: 204310 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:08:07,042-Speed 2498.22 samples/sec Loss 3.6724 LearningRate 0.000701 Epoch: 9 Global Step: 204320 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:08:15,276-Speed 2499.46 samples/sec Loss 3.6181 LearningRate 0.000701 Epoch: 9 Global Step: 204330 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:08:23,583-Speed 2499.14 samples/sec Loss 3.6085 LearningRate 0.000701 Epoch: 9 Global Step: 204340 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:08:33,664-Speed 2031.83 samples/sec Loss 3.6323 LearningRate 0.000701 Epoch: 9 Global Step: 204350 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:08:41,863-Speed 2498.12 samples/sec Loss 3.6607 LearningRate 0.000701 Epoch: 9 Global Step: 204360 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:08:50,062-Speed 2516.36 samples/sec Loss 3.6715 LearningRate 0.000701 Epoch: 9 Global Step: 204370 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:08:59,655-Speed 2492.37 samples/sec Loss 3.6587 
LearningRate 0.000701 Epoch: 9 Global Step: 204380 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:09:07,864-Speed 2495.02 samples/sec Loss 3.6220 LearningRate 0.000701 Epoch: 9 Global Step: 204390 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:09:16,120-Speed 2499.34 samples/sec Loss 3.6515 LearningRate 0.000701 Epoch: 9 Global Step: 204400 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:09:24,384-Speed 2500.54 samples/sec Loss 3.6329 LearningRate 0.000701 Epoch: 9 Global Step: 204410 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:09:32,584-Speed 2497.92 samples/sec Loss 3.7331 LearningRate 0.000701 Epoch: 9 Global Step: 204420 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:09:43,408-Speed 1904.27 samples/sec Loss 3.6290 LearningRate 0.000701 Epoch: 9 Global Step: 204430 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:09:51,861-Speed 2500.67 samples/sec Loss 3.6792 LearningRate 0.000701 Epoch: 9 Global Step: 204440 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:10:00,078-Speed 2499.89 samples/sec Loss 3.6290 LearningRate 0.000701 Epoch: 9 Global Step: 204450 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:10:08,279-Speed 2497.68 samples/sec Loss 3.6652 LearningRate 0.000701 Epoch: 9 Global Step: 204460 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:10:19,377-Speed 1854.63 samples/sec Loss 3.6599 LearningRate 0.000701 Epoch: 9 Global Step: 204470 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:10:27,931-Speed 2394.70 samples/sec Loss 3.6344 LearningRate 0.000701 Epoch: 9 Global Step: 204480 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:10:36,069-Speed 2517.08 samples/sec Loss 3.6036 LearningRate 0.000701 Epoch: 9 Global Step: 204490 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:10:44,265-Speed 2499.35 samples/sec Loss 3.5772 
LearningRate 0.000701 Epoch: 9 Global Step: 204500 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:10:52,457-Speed 2500.45 samples/sec Loss 3.5576 LearningRate 0.000701 Epoch: 9 Global Step: 204510 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:11:00,654-Speed 2498.92 samples/sec Loss 3.5983 LearningRate 0.000701 Epoch: 9 Global Step: 204520 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:11:08,855-Speed 2497.66 samples/sec Loss 3.6530 LearningRate 0.000701 Epoch: 9 Global Step: 204530 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:11:17,054-Speed 2498.20 samples/sec Loss 3.6189 LearningRate 0.000701 Epoch: 9 Global Step: 204540 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:11:25,197-Speed 2515.48 samples/sec Loss 3.6392 LearningRate 0.000701 Epoch: 9 Global Step: 204550 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:11:33,402-Speed 2496.58 samples/sec Loss 3.5568 LearningRate 0.000701 Epoch: 9 Global Step: 204560 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:11:41,604-Speed 2497.29 samples/sec Loss 3.6308 LearningRate 0.000701 Epoch: 9 Global Step: 204570 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:11:49,802-Speed 2498.49 samples/sec Loss 3.5710 LearningRate 0.000701 Epoch: 9 Global Step: 204580 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:11:58,000-Speed 2498.96 samples/sec Loss 3.6232 LearningRate 0.000701 Epoch: 9 Global Step: 204590 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:12:06,200-Speed 2497.96 samples/sec Loss 3.6852 LearningRate 0.000701 Epoch: 9 Global Step: 204600 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:12:14,377-Speed 2505.02 samples/sec Loss 3.6616 LearningRate 0.000701 Epoch: 9 Global Step: 204610 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:12:22,576-Speed 2498.05 samples/sec Loss 3.6148 
LearningRate 0.000701 Epoch: 9 Global Step: 204620 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:12:30,776-Speed 2498.15 samples/sec Loss 3.6554 LearningRate 0.000701 Epoch: 9 Global Step: 204630 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:12:38,973-Speed 2498.73 samples/sec Loss 3.6764 LearningRate 0.000701 Epoch: 9 Global Step: 204640 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:12:47,177-Speed 2496.96 samples/sec Loss 3.6042 LearningRate 0.000701 Epoch: 9 Global Step: 204650 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:12:55,377-Speed 2497.74 samples/sec Loss 3.7950 LearningRate 0.000701 Epoch: 9 Global Step: 204660 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:13:03,523-Speed 2514.75 samples/sec Loss 3.6576 LearningRate 0.000701 Epoch: 9 Global Step: 204670 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:13:11,721-Speed 2498.37 samples/sec Loss 3.5720 LearningRate 0.000701 Epoch: 9 Global Step: 204680 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:13:19,932-Speed 2494.56 samples/sec Loss 3.6372 LearningRate 0.000700 Epoch: 9 Global Step: 204690 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:13:28,132-Speed 2498.01 samples/sec Loss 3.6817 LearningRate 0.000700 Epoch: 9 Global Step: 204700 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:13:36,328-Speed 2499.31 samples/sec Loss 3.5757 LearningRate 0.000700 Epoch: 9 Global Step: 204710 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:13:44,544-Speed 2493.29 samples/sec Loss 3.5990 LearningRate 0.000700 Epoch: 9 Global Step: 204720 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:13:52,692-Speed 2513.83 samples/sec Loss 3.7018 LearningRate 0.000700 Epoch: 9 Global Step: 204730 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:14:00,891-Speed 2498.10 samples/sec Loss 3.6454 
LearningRate 0.000700 Epoch: 9 Global Step: 204740 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:14:09,096-Speed 2496.44 samples/sec Loss 3.5468 LearningRate 0.000700 Epoch: 9 Global Step: 204750 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:14:17,293-Speed 2499.07 samples/sec Loss 3.6071 LearningRate 0.000700 Epoch: 9 Global Step: 204760 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:14:25,489-Speed 2499.50 samples/sec Loss 3.6589 LearningRate 0.000700 Epoch: 9 Global Step: 204770 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:14:33,694-Speed 2496.29 samples/sec Loss 3.5016 LearningRate 0.000700 Epoch: 9 Global Step: 204780 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:14:41,839-Speed 2515.06 samples/sec Loss 3.5450 LearningRate 0.000700 Epoch: 9 Global Step: 204790 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:14:50,035-Speed 2499.14 samples/sec Loss 3.5500 LearningRate 0.000700 Epoch: 9 Global Step: 204800 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:14:58,236-Speed 2497.63 samples/sec Loss 3.6424 LearningRate 0.000700 Epoch: 9 Global Step: 204810 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:15:06,438-Speed 2497.49 samples/sec Loss 3.6071 LearningRate 0.000700 Epoch: 9 Global Step: 204820 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:15:14,638-Speed 2498.08 samples/sec Loss 3.6603 LearningRate 0.000700 Epoch: 9 Global Step: 204830 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:15:22,840-Speed 2497.07 samples/sec Loss 3.6770 LearningRate 0.000700 Epoch: 9 Global Step: 204840 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:15:31,001-Speed 2509.82 samples/sec Loss 3.6051 LearningRate 0.000700 Epoch: 9 Global Step: 204850 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:15:39,208-Speed 2496.01 samples/sec Loss 3.6574 
LearningRate 0.000700 Epoch: 9 Global Step: 204860 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:15:47,413-Speed 2496.31 samples/sec Loss 3.6004 LearningRate 0.000700 Epoch: 9 Global Step: 204870 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:15:55,615-Speed 2497.72 samples/sec Loss 3.6174 LearningRate 0.000700 Epoch: 9 Global Step: 204880 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:16:03,816-Speed 2497.59 samples/sec Loss 3.7106 LearningRate 0.000700 Epoch: 9 Global Step: 204890 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:16:12,015-Speed 2498.34 samples/sec Loss 3.6289 LearningRate 0.000700 Epoch: 9 Global Step: 204900 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:16:20,169-Speed 2511.79 samples/sec Loss 3.6630 LearningRate 0.000700 Epoch: 9 Global Step: 204910 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:16:28,370-Speed 2497.84 samples/sec Loss 3.5700 LearningRate 0.000700 Epoch: 9 Global Step: 204920 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:16:36,572-Speed 2497.25 samples/sec Loss 3.6527 LearningRate 0.000700 Epoch: 9 Global Step: 204930 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:16:44,784-Speed 2494.29 samples/sec Loss 3.6624 LearningRate 0.000700 Epoch: 9 Global Step: 204940 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:16:52,985-Speed 2497.60 samples/sec Loss 3.6951 LearningRate 0.000700 Epoch: 9 Global Step: 204950 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:17:01,198-Speed 2493.87 samples/sec Loss 3.6191 LearningRate 0.000700 Epoch: 9 Global Step: 204960 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:17:09,345-Speed 2514.37 samples/sec Loss 3.6705 LearningRate 0.000700 Epoch: 9 Global Step: 204970 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:17:17,543-Speed 2498.54 samples/sec Loss 3.6181 
LearningRate 0.000700 Epoch: 9 Global Step: 204980 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:17:25,744-Speed 2497.95 samples/sec Loss 3.6459 LearningRate 0.000700 Epoch: 9 Global Step: 204990 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:17:33,949-Speed 2496.65 samples/sec Loss 3.7038 LearningRate 0.000700 Epoch: 9 Global Step: 205000 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:17:42,149-Speed 2497.82 samples/sec Loss 3.6995 LearningRate 0.000700 Epoch: 9 Global Step: 205010 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:17:50,351-Speed 2497.44 samples/sec Loss 3.7489 LearningRate 0.000700 Epoch: 9 Global Step: 205020 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:17:58,499-Speed 2513.79 samples/sec Loss 3.7039 LearningRate 0.000700 Epoch: 9 Global Step: 205030 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:18:06,711-Speed 2494.13 samples/sec Loss 3.6736 LearningRate 0.000700 Epoch: 9 Global Step: 205040 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:18:14,912-Speed 2497.92 samples/sec Loss 3.7529 LearningRate 0.000700 Epoch: 9 Global Step: 205050 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:18:23,111-Speed 2498.33 samples/sec Loss 3.6978 LearningRate 0.000700 Epoch: 9 Global Step: 205060 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:18:31,318-Speed 2495.78 samples/sec Loss 3.6261 LearningRate 0.000700 Epoch: 9 Global Step: 205070 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:18:39,524-Speed 2496.03 samples/sec Loss 3.6391 LearningRate 0.000700 Epoch: 9 Global Step: 205080 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:18:47,674-Speed 2513.45 samples/sec Loss 3.7073 LearningRate 0.000700 Epoch: 9 Global Step: 205090 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:18:55,888-Speed 2493.85 samples/sec Loss 3.6438 
LearningRate 0.000700 Epoch: 9 Global Step: 205100 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:19:04,089-Speed 2497.55 samples/sec Loss 3.6784 LearningRate 0.000700 Epoch: 9 Global Step: 205110 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:19:12,291-Speed 2497.35 samples/sec Loss 3.6544 LearningRate 0.000700 Epoch: 9 Global Step: 205120 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:19:20,500-Speed 2495.09 samples/sec Loss 3.6720 LearningRate 0.000700 Epoch: 9 Global Step: 205130 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:19:28,697-Speed 2498.97 samples/sec Loss 3.6898 LearningRate 0.000699 Epoch: 9 Global Step: 205140 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:19:36,851-Speed 2512.06 samples/sec Loss 3.5842 LearningRate 0.000699 Epoch: 9 Global Step: 205150 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:19:45,061-Speed 2494.59 samples/sec Loss 3.6773 LearningRate 0.000699 Epoch: 9 Global Step: 205160 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:19:53,260-Speed 2498.48 samples/sec Loss 3.6344 LearningRate 0.000699 Epoch: 9 Global Step: 205170 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:20:01,458-Speed 2498.56 samples/sec Loss 3.6315 LearningRate 0.000699 Epoch: 9 Global Step: 205180 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:20:09,659-Speed 2497.63 samples/sec Loss 3.5953 LearningRate 0.000699 Epoch: 9 Global Step: 205190 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:20:17,862-Speed 2497.54 samples/sec Loss 3.5299 LearningRate 0.000699 Epoch: 9 Global Step: 205200 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:20:26,006-Speed 2515.35 samples/sec Loss 3.6328 LearningRate 0.000699 Epoch: 9 Global Step: 205210 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:20:34,205-Speed 2498.48 samples/sec Loss 3.6196 
LearningRate 0.000699 Epoch: 9 Global Step: 205220 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:20:42,404-Speed 2498.38 samples/sec Loss 3.5653 LearningRate 0.000699 Epoch: 9 Global Step: 205230 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:20:50,601-Speed 2498.84 samples/sec Loss 3.6370 LearningRate 0.000699 Epoch: 9 Global Step: 205240 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:20:58,803-Speed 2497.48 samples/sec Loss 3.6159 LearningRate 0.000699 Epoch: 9 Global Step: 205250 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:21:06,998-Speed 2499.21 samples/sec Loss 3.6259 LearningRate 0.000699 Epoch: 9 Global Step: 205260 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:21:15,145-Speed 2514.10 samples/sec Loss 3.6278 LearningRate 0.000699 Epoch: 9 Global Step: 205270 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:21:23,348-Speed 2497.25 samples/sec Loss 3.6021 LearningRate 0.000699 Epoch: 9 Global Step: 205280 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:21:31,546-Speed 2498.75 samples/sec Loss 3.6128 LearningRate 0.000699 Epoch: 9 Global Step: 205290 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:21:39,747-Speed 2497.84 samples/sec Loss 3.6087 LearningRate 0.000699 Epoch: 9 Global Step: 205300 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:21:47,952-Speed 2496.47 samples/sec Loss 3.6216 LearningRate 0.000699 Epoch: 9 Global Step: 205310 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:21:56,154-Speed 2497.55 samples/sec Loss 3.6048 LearningRate 0.000699 Epoch: 9 Global Step: 205320 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:22:04,302-Speed 2513.78 samples/sec Loss 3.6828 LearningRate 0.000699 Epoch: 9 Global Step: 205330 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:22:12,504-Speed 2497.31 samples/sec Loss 3.5722 
LearningRate 0.000699 Epoch: 9 Global Step: 205340 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:22:20,700-Speed 2499.37 samples/sec Loss 3.6658 LearningRate 0.000699 Epoch: 9 Global Step: 205350 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:22:28,898-Speed 2498.64 samples/sec Loss 3.5968 LearningRate 0.000699 Epoch: 9 Global Step: 205360 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:22:37,097-Speed 2498.29 samples/sec Loss 3.5175 LearningRate 0.000699 Epoch: 9 Global Step: 205370 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:22:45,296-Speed 2498.07 samples/sec Loss 3.5822 LearningRate 0.000699 Epoch: 9 Global Step: 205380 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:22:53,439-Speed 2515.56 samples/sec Loss 3.5881 LearningRate 0.000699 Epoch: 9 Global Step: 205390 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:23:01,634-Speed 2499.62 samples/sec Loss 3.5936 LearningRate 0.000699 Epoch: 9 Global Step: 205400 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:23:09,830-Speed 2499.03 samples/sec Loss 3.5863 LearningRate 0.000699 Epoch: 9 Global Step: 205410 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:23:18,029-Speed 2498.42 samples/sec Loss 3.6295 LearningRate 0.000699 Epoch: 9 Global Step: 205420 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:23:26,226-Speed 2498.81 samples/sec Loss 3.5906 LearningRate 0.000699 Epoch: 9 Global Step: 205430 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:23:34,427-Speed 2497.50 samples/sec Loss 3.5105 LearningRate 0.000699 Epoch: 9 Global Step: 205440 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:23:42,572-Speed 2515.13 samples/sec Loss 3.5924 LearningRate 0.000699 Epoch: 9 Global Step: 205450 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:23:50,769-Speed 2499.14 samples/sec Loss 3.5152 
LearningRate 0.000699 Epoch: 9 Global Step: 205460 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:23:58,970-Speed 2497.66 samples/sec Loss 3.5963 LearningRate 0.000699 Epoch: 9 Global Step: 205470 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:24:07,168-Speed 2498.30 samples/sec Loss 3.5310 LearningRate 0.000699 Epoch: 9 Global Step: 205480 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:24:15,365-Speed 2498.80 samples/sec Loss 3.6043 LearningRate 0.000699 Epoch: 9 Global Step: 205490 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:24:23,564-Speed 2498.46 samples/sec Loss 3.6453 LearningRate 0.000699 Epoch: 9 Global Step: 205500 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:24:31,709-Speed 2514.92 samples/sec Loss 3.5958 LearningRate 0.000699 Epoch: 9 Global Step: 205510 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:24:39,912-Speed 2496.86 samples/sec Loss 3.6631 LearningRate 0.000699 Epoch: 9 Global Step: 205520 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:24:48,113-Speed 2497.65 samples/sec Loss 3.6398 LearningRate 0.000699 Epoch: 9 Global Step: 205530 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:24:56,314-Speed 2497.53 samples/sec Loss 3.5795 LearningRate 0.000699 Epoch: 9 Global Step: 205540 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:25:04,515-Speed 2497.97 samples/sec Loss 3.6328 LearningRate 0.000699 Epoch: 9 Global Step: 205550 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:25:12,714-Speed 2498.23 samples/sec Loss 3.6473 LearningRate 0.000699 Epoch: 9 Global Step: 205560 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:25:20,860-Speed 2514.53 samples/sec Loss 3.5701 LearningRate 0.000699 Epoch: 9 Global Step: 205570 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:25:29,061-Speed 2498.05 samples/sec Loss 3.5727 
LearningRate 0.000698 Epoch: 9 Global Step: 205580 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:25:37,260-Speed 2498.18 samples/sec Loss 3.6049 LearningRate 0.000698 Epoch: 9 Global Step: 205590 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:25:45,466-Speed 2495.94 samples/sec Loss 3.6370 LearningRate 0.000698 Epoch: 9 Global Step: 205600 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:25:53,666-Speed 2497.96 samples/sec Loss 3.6017 LearningRate 0.000698 Epoch: 9 Global Step: 205610 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:26:01,863-Speed 2498.89 samples/sec Loss 3.5446 LearningRate 0.000698 Epoch: 9 Global Step: 205620 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:26:10,027-Speed 2509.03 samples/sec Loss 3.7237 LearningRate 0.000698 Epoch: 9 Global Step: 205630 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:26:18,227-Speed 2498.10 samples/sec Loss 3.5835 LearningRate 0.000698 Epoch: 9 Global Step: 205640 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:26:26,424-Speed 2498.83 samples/sec Loss 3.5180 LearningRate 0.000698 Epoch: 9 Global Step: 205650 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:26:34,621-Speed 2498.71 samples/sec Loss 3.5960 LearningRate 0.000698 Epoch: 9 Global Step: 205660 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:26:42,823-Speed 2497.68 samples/sec Loss 3.6492 LearningRate 0.000698 Epoch: 9 Global Step: 205670 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:26:51,025-Speed 2497.12 samples/sec Loss 3.5243 LearningRate 0.000698 Epoch: 9 Global Step: 205680 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:26:59,173-Speed 2514.14 samples/sec Loss 3.6366 LearningRate 0.000698 Epoch: 9 Global Step: 205690 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:27:07,377-Speed 2496.75 samples/sec Loss 3.5478 
LearningRate 0.000698 Epoch: 9 Global Step: 205700 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:27:15,578-Speed 2497.88 samples/sec Loss 3.5555 LearningRate 0.000698 Epoch: 9 Global Step: 205710 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:27:23,777-Speed 2498.01 samples/sec Loss 3.5744 LearningRate 0.000698 Epoch: 9 Global Step: 205720 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:27:31,987-Speed 2494.74 samples/sec Loss 3.6154 LearningRate 0.000698 Epoch: 9 Global Step: 205730 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:27:40,194-Speed 2496.20 samples/sec Loss 3.5714 LearningRate 0.000698 Epoch: 9 Global Step: 205740 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:27:48,342-Speed 2513.94 samples/sec Loss 3.5536 LearningRate 0.000698 Epoch: 9 Global Step: 205750 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:27:56,537-Speed 2499.70 samples/sec Loss 3.5625 LearningRate 0.000698 Epoch: 9 Global Step: 205760 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:28:04,738-Speed 2497.69 samples/sec Loss 3.6302 LearningRate 0.000698 Epoch: 9 Global Step: 205770 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:28:12,935-Speed 2498.68 samples/sec Loss 3.4849 LearningRate 0.000698 Epoch: 9 Global Step: 205780 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:28:21,135-Speed 2498.11 samples/sec Loss 3.5771 LearningRate 0.000698 Epoch: 9 Global Step: 205790 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:28:29,335-Speed 2497.84 samples/sec Loss 3.6326 LearningRate 0.000698 Epoch: 9 Global Step: 205800 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:28:37,482-Speed 2514.26 samples/sec Loss 3.5363 LearningRate 0.000698 Epoch: 9 Global Step: 205810 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:28:45,689-Speed 2496.02 samples/sec Loss 3.6719 
LearningRate 0.000698 Epoch: 9 Global Step: 205820 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:28:53,889-Speed 2497.63 samples/sec Loss 3.6774 LearningRate 0.000698 Epoch: 9 Global Step: 205830 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:29:02,087-Speed 2498.64 samples/sec Loss 3.6559 LearningRate 0.000698 Epoch: 9 Global Step: 205840 Fp16 Grad Scale: 65536 Required: 143 hours Training: 2022-07-07 13:29:10,240-Speed 2512.56 samples/sec Loss 3.6207 LearningRate 0.000698 Epoch: 9 Global Step: 205850 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:29:18,437-Speed 2498.71 samples/sec Loss 3.6357 LearningRate 0.000698 Epoch: 9 Global Step: 205860 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:29:26,583-Speed 2514.49 samples/sec Loss 3.6004 LearningRate 0.000698 Epoch: 9 Global Step: 205870 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:29:34,783-Speed 2498.04 samples/sec Loss 3.6354 LearningRate 0.000698 Epoch: 9 Global Step: 205880 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:29:42,984-Speed 2497.85 samples/sec Loss 3.6065 LearningRate 0.000698 Epoch: 9 Global Step: 205890 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:29:51,182-Speed 2498.37 samples/sec Loss 3.5439 LearningRate 0.000698 Epoch: 9 Global Step: 205900 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:29:59,382-Speed 2498.51 samples/sec Loss 3.5656 LearningRate 0.000698 Epoch: 9 Global Step: 205910 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:30:07,577-Speed 2499.25 samples/sec Loss 3.6000 LearningRate 0.000698 Epoch: 9 Global Step: 205920 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:30:15,726-Speed 2513.87 samples/sec Loss 3.5457 LearningRate 0.000698 Epoch: 9 Global Step: 205930 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:30:23,924-Speed 2498.48 samples/sec Loss 3.6141 
LearningRate 0.000698 Epoch: 9 Global Step: 205940 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:30:32,127-Speed 2497.05 samples/sec Loss 3.6424 LearningRate 0.000698 Epoch: 9 Global Step: 205950 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:30:40,334-Speed 2495.96 samples/sec Loss 3.6176 LearningRate 0.000698 Epoch: 9 Global Step: 205960 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:30:48,535-Speed 2497.64 samples/sec Loss 3.5628 LearningRate 0.000698 Epoch: 9 Global Step: 205970 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:30:56,741-Speed 2496.26 samples/sec Loss 3.5887 LearningRate 0.000698 Epoch: 9 Global Step: 205980 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:31:04,893-Speed 2512.51 samples/sec Loss 3.5932 LearningRate 0.000698 Epoch: 9 Global Step: 205990 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:31:13,096-Speed 2497.09 samples/sec Loss 3.5765 LearningRate 0.000698 Epoch: 9 Global Step: 206000 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:31:21,298-Speed 2497.22 samples/sec Loss 3.5141 LearningRate 0.000698 Epoch: 9 Global Step: 206010 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:31:29,498-Speed 2498.03 samples/sec Loss 3.5681 LearningRate 0.000698 Epoch: 9 Global Step: 206020 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:31:37,700-Speed 2497.18 samples/sec Loss 3.5681 LearningRate 0.000697 Epoch: 9 Global Step: 206030 Fp16 Grad Scale: 32768 Required: 143 hours Training: 2022-07-07 13:31:45,900-Speed 2498.12 samples/sec Loss 3.6208 LearningRate 0.000697 Epoch: 9 Global Step: 206040 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:31:54,050-Speed 2513.36 samples/sec Loss 3.6079 LearningRate 0.000697 Epoch: 9 Global Step: 206050 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:32:02,250-Speed 2497.73 samples/sec Loss 3.6624 
LearningRate 0.000697 Epoch: 9 Global Step: 206060 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:32:10,453-Speed 2497.28 samples/sec Loss 3.6009 LearningRate 0.000697 Epoch: 9 Global Step: 206070 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:32:18,656-Speed 2496.94 samples/sec Loss 3.5648 LearningRate 0.000697 Epoch: 9 Global Step: 206080 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:32:26,856-Speed 2498.25 samples/sec Loss 3.5736 LearningRate 0.000697 Epoch: 9 Global Step: 206090 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:32:35,056-Speed 2497.94 samples/sec Loss 3.5695 LearningRate 0.000697 Epoch: 9 Global Step: 206100 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:32:43,207-Speed 2513.17 samples/sec Loss 3.5959 LearningRate 0.000697 Epoch: 9 Global Step: 206110 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:32:51,406-Speed 2498.33 samples/sec Loss 3.5514 LearningRate 0.000697 Epoch: 9 Global Step: 206120 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:32:59,610-Speed 2496.77 samples/sec Loss 3.5522 LearningRate 0.000697 Epoch: 9 Global Step: 206130 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:33:07,809-Speed 2498.28 samples/sec Loss 3.6101 LearningRate 0.000697 Epoch: 9 Global Step: 206140 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:33:16,008-Speed 2498.62 samples/sec Loss 3.5797 LearningRate 0.000697 Epoch: 9 Global Step: 206150 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:33:24,204-Speed 2499.01 samples/sec Loss 3.5859 LearningRate 0.000697 Epoch: 9 Global Step: 206160 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:33:32,353-Speed 2513.58 samples/sec Loss 3.5870 LearningRate 0.000697 Epoch: 9 Global Step: 206170 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:33:40,552-Speed 2498.22 samples/sec Loss 3.6207 
LearningRate 0.000697 Epoch: 9 Global Step: 206180 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:33:48,750-Speed 2498.53 samples/sec Loss 3.6073 LearningRate 0.000697 Epoch: 9 Global Step: 206190 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:33:56,950-Speed 2498.08 samples/sec Loss 3.6957 LearningRate 0.000697 Epoch: 9 Global Step: 206200 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:34:05,148-Speed 2498.40 samples/sec Loss 3.5788 LearningRate 0.000697 Epoch: 9 Global Step: 206210 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:34:13,348-Speed 2497.99 samples/sec Loss 3.5318 LearningRate 0.000697 Epoch: 9 Global Step: 206220 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:34:21,496-Speed 2514.07 samples/sec Loss 3.5671 LearningRate 0.000697 Epoch: 9 Global Step: 206230 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:34:29,695-Speed 2498.33 samples/sec Loss 3.6265 LearningRate 0.000697 Epoch: 9 Global Step: 206240 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:34:37,892-Speed 2498.58 samples/sec Loss 3.6033 LearningRate 0.000697 Epoch: 9 Global Step: 206250 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:34:46,092-Speed 2498.14 samples/sec Loss 3.6508 LearningRate 0.000697 Epoch: 9 Global Step: 206260 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:34:54,295-Speed 2497.39 samples/sec Loss 3.6167 LearningRate 0.000697 Epoch: 9 Global Step: 206270 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:35:02,500-Speed 2496.49 samples/sec Loss 3.6043 LearningRate 0.000697 Epoch: 9 Global Step: 206280 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:35:10,646-Speed 2514.44 samples/sec Loss 3.5299 LearningRate 0.000697 Epoch: 9 Global Step: 206290 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:35:18,847-Speed 2497.50 samples/sec Loss 3.6437 
LearningRate 0.000697 Epoch: 9 Global Step: 206300 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:35:27,043-Speed 2499.24 samples/sec Loss 3.6783 LearningRate 0.000697 Epoch: 9 Global Step: 206310 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:35:35,245-Speed 2497.51 samples/sec Loss 3.6597 LearningRate 0.000697 Epoch: 9 Global Step: 206320 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:35:43,455-Speed 2494.52 samples/sec Loss 3.5757 LearningRate 0.000697 Epoch: 9 Global Step: 206330 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:35:51,651-Speed 2499.20 samples/sec Loss 3.6648 LearningRate 0.000697 Epoch: 9 Global Step: 206340 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:35:59,803-Speed 2512.70 samples/sec Loss 3.6251 LearningRate 0.000697 Epoch: 9 Global Step: 206350 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:36:08,002-Speed 2498.32 samples/sec Loss 3.6800 LearningRate 0.000697 Epoch: 9 Global Step: 206360 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:36:16,202-Speed 2497.82 samples/sec Loss 3.6468 LearningRate 0.000697 Epoch: 9 Global Step: 206370 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:36:24,401-Speed 2498.14 samples/sec Loss 3.7055 LearningRate 0.000697 Epoch: 9 Global Step: 206380 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:36:32,602-Speed 2497.68 samples/sec Loss 3.5874 LearningRate 0.000697 Epoch: 9 Global Step: 206390 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:36:40,856-Speed 2481.57 samples/sec Loss 3.6558 LearningRate 0.000697 Epoch: 9 Global Step: 206400 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:36:48,999-Speed 2515.28 samples/sec Loss 3.6001 LearningRate 0.000697 Epoch: 9 Global Step: 206410 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:36:57,539-Speed 2499.49 samples/sec Loss 3.6132 
LearningRate 0.000697 Epoch: 9 Global Step: 206420 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:37:05,739-Speed 2497.89 samples/sec Loss 3.5805 LearningRate 0.000697 Epoch: 9 Global Step: 206430 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:37:14,329-Speed 2384.49 samples/sec Loss 3.6018 LearningRate 0.000697 Epoch: 9 Global Step: 206440 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:37:22,528-Speed 2498.41 samples/sec Loss 3.6160 LearningRate 0.000697 Epoch: 9 Global Step: 206450 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:37:30,727-Speed 2498.37 samples/sec Loss 3.5541 LearningRate 0.000697 Epoch: 9 Global Step: 206460 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:37:38,874-Speed 2514.27 samples/sec Loss 3.5776 LearningRate 0.000697 Epoch: 9 Global Step: 206470 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:37:47,071-Speed 2498.77 samples/sec Loss 3.6322 LearningRate 0.000696 Epoch: 9 Global Step: 206480 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:37:55,267-Speed 2499.46 samples/sec Loss 3.5774 LearningRate 0.000696 Epoch: 9 Global Step: 206490 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:38:03,468-Speed 2497.61 samples/sec Loss 3.6072 LearningRate 0.000696 Epoch: 9 Global Step: 206500 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:38:11,667-Speed 2498.16 samples/sec Loss 3.6292 LearningRate 0.000696 Epoch: 9 Global Step: 206510 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:38:19,866-Speed 2498.34 samples/sec Loss 3.6489 LearningRate 0.000696 Epoch: 9 Global Step: 206520 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:38:28,009-Speed 2515.34 samples/sec Loss 3.6009 LearningRate 0.000696 Epoch: 9 Global Step: 206530 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:38:36,206-Speed 2499.08 samples/sec Loss 3.5796 
LearningRate 0.000696 Epoch: 9 Global Step: 206540 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:38:44,405-Speed 2498.15 samples/sec Loss 3.5730 LearningRate 0.000696 Epoch: 9 Global Step: 206550 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:38:52,605-Speed 2498.05 samples/sec Loss 3.5537 LearningRate 0.000696 Epoch: 9 Global Step: 206560 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:39:00,809-Speed 2496.89 samples/sec Loss 3.5526 LearningRate 0.000696 Epoch: 9 Global Step: 206570 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:39:09,012-Speed 2497.12 samples/sec Loss 3.5797 LearningRate 0.000696 Epoch: 9 Global Step: 206580 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:39:17,157-Speed 2514.85 samples/sec Loss 3.5270 LearningRate 0.000696 Epoch: 9 Global Step: 206590 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:39:25,359-Speed 2497.63 samples/sec Loss 3.6427 LearningRate 0.000696 Epoch: 9 Global Step: 206600 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:39:33,558-Speed 2498.31 samples/sec Loss 3.6243 LearningRate 0.000696 Epoch: 9 Global Step: 206610 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:39:41,755-Speed 2498.73 samples/sec Loss 3.6136 LearningRate 0.000696 Epoch: 9 Global Step: 206620 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:39:49,953-Speed 2498.58 samples/sec Loss 3.6329 LearningRate 0.000696 Epoch: 9 Global Step: 206630 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:39:58,151-Speed 2498.78 samples/sec Loss 3.6272 LearningRate 0.000696 Epoch: 9 Global Step: 206640 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:40:06,296-Speed 2515.01 samples/sec Loss 3.6195 LearningRate 0.000696 Epoch: 9 Global Step: 206650 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:40:14,493-Speed 2498.50 samples/sec Loss 3.6059 
LearningRate 0.000696 Epoch: 9 Global Step: 206660 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:40:22,693-Speed 2498.01 samples/sec Loss 3.6137 LearningRate 0.000696 Epoch: 9 Global Step: 206670 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:40:30,888-Speed 2499.75 samples/sec Loss 3.5868 LearningRate 0.000696 Epoch: 9 Global Step: 206680 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:40:39,089-Speed 2497.70 samples/sec Loss 3.6617 LearningRate 0.000696 Epoch: 9 Global Step: 206690 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:40:47,285-Speed 2498.98 samples/sec Loss 3.6060 LearningRate 0.000696 Epoch: 9 Global Step: 206700 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:40:55,429-Speed 2515.28 samples/sec Loss 3.6048 LearningRate 0.000696 Epoch: 9 Global Step: 206710 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:41:03,631-Speed 2497.53 samples/sec Loss 3.5895 LearningRate 0.000696 Epoch: 9 Global Step: 206720 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:41:11,826-Speed 2499.52 samples/sec Loss 3.6045 LearningRate 0.000696 Epoch: 9 Global Step: 206730 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:41:20,025-Speed 2497.93 samples/sec Loss 3.6012 LearningRate 0.000696 Epoch: 9 Global Step: 206740 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:41:28,230-Speed 2496.71 samples/sec Loss 3.6194 LearningRate 0.000696 Epoch: 9 Global Step: 206750 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:41:36,441-Speed 2494.49 samples/sec Loss 3.5598 LearningRate 0.000696 Epoch: 9 Global Step: 206760 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:41:44,595-Speed 2512.03 samples/sec Loss 3.5828 LearningRate 0.000696 Epoch: 9 Global Step: 206770 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:41:52,788-Speed 2500.01 samples/sec Loss 3.6247 
LearningRate 0.000696 Epoch: 9 Global Step: 206780 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:42:00,983-Speed 2499.19 samples/sec Loss 3.5697 LearningRate 0.000696 Epoch: 9 Global Step: 206790 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:42:09,190-Speed 2496.19 samples/sec Loss 3.5725 LearningRate 0.000696 Epoch: 9 Global Step: 206800 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:42:17,385-Speed 2499.31 samples/sec Loss 3.5780 LearningRate 0.000696 Epoch: 9 Global Step: 206810 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:42:25,595-Speed 2494.87 samples/sec Loss 3.5873 LearningRate 0.000696 Epoch: 9 Global Step: 206820 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:42:33,739-Speed 2515.27 samples/sec Loss 3.5527 LearningRate 0.000696 Epoch: 9 Global Step: 206830 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:42:41,934-Speed 2499.48 samples/sec Loss 3.6206 LearningRate 0.000696 Epoch: 9 Global Step: 206840 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:42:50,134-Speed 2497.94 samples/sec Loss 3.5987 LearningRate 0.000696 Epoch: 9 Global Step: 206850 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:42:58,332-Speed 2498.66 samples/sec Loss 3.6427 LearningRate 0.000696 Epoch: 9 Global Step: 206860 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:43:06,530-Speed 2498.55 samples/sec Loss 3.6013 LearningRate 0.000696 Epoch: 9 Global Step: 206870 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:43:14,728-Speed 2498.46 samples/sec Loss 3.6153 LearningRate 0.000696 Epoch: 9 Global Step: 206880 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:43:22,886-Speed 2510.65 samples/sec Loss 3.5377 LearningRate 0.000696 Epoch: 9 Global Step: 206890 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:43:31,085-Speed 2498.25 samples/sec Loss 3.6685 
LearningRate 0.000696 Epoch: 9 Global Step: 206900 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:43:39,286-Speed 2497.61 samples/sec Loss 3.5723 LearningRate 0.000696 Epoch: 9 Global Step: 206910 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:43:47,482-Speed 2499.02 samples/sec Loss 3.5343 LearningRate 0.000695 Epoch: 9 Global Step: 206920 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:43:55,683-Speed 2497.71 samples/sec Loss 3.5274 LearningRate 0.000695 Epoch: 9 Global Step: 206930 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:44:03,882-Speed 2498.25 samples/sec Loss 3.5653 LearningRate 0.000695 Epoch: 9 Global Step: 206940 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:44:12,029-Speed 2514.28 samples/sec Loss 3.5447 LearningRate 0.000695 Epoch: 9 Global Step: 206950 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:44:20,230-Speed 2497.48 samples/sec Loss 3.5357 LearningRate 0.000695 Epoch: 9 Global Step: 206960 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:44:28,451-Speed 2491.65 samples/sec Loss 3.4817 LearningRate 0.000695 Epoch: 9 Global Step: 206970 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:44:36,650-Speed 2498.06 samples/sec Loss 3.5599 LearningRate 0.000695 Epoch: 9 Global Step: 206980 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:44:44,859-Speed 2495.28 samples/sec Loss 3.5787 LearningRate 0.000695 Epoch: 9 Global Step: 206990 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:44:53,058-Speed 2498.35 samples/sec Loss 3.5514 LearningRate 0.000695 Epoch: 9 Global Step: 207000 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:45:01,208-Speed 2513.16 samples/sec Loss 3.6117 LearningRate 0.000695 Epoch: 9 Global Step: 207010 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:45:09,406-Speed 2498.70 samples/sec Loss 3.4818 
LearningRate 0.000695 Epoch: 9 Global Step: 207020 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:45:17,604-Speed 2498.63 samples/sec Loss 3.5905 LearningRate 0.000695 Epoch: 9 Global Step: 207030 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:45:25,800-Speed 2499.02 samples/sec Loss 3.6734 LearningRate 0.000695 Epoch: 9 Global Step: 207040 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:45:33,996-Speed 2499.33 samples/sec Loss 3.6349 LearningRate 0.000695 Epoch: 9 Global Step: 207050 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:45:42,192-Speed 2499.39 samples/sec Loss 3.6221 LearningRate 0.000695 Epoch: 9 Global Step: 207060 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:45:50,336-Speed 2514.93 samples/sec Loss 3.6505 LearningRate 0.000695 Epoch: 9 Global Step: 207070 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:45:58,533-Speed 2498.83 samples/sec Loss 3.6580 LearningRate 0.000695 Epoch: 9 Global Step: 207080 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:46:06,739-Speed 2496.05 samples/sec Loss 3.5833 LearningRate 0.000695 Epoch: 9 Global Step: 207090 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:46:14,936-Speed 2498.70 samples/sec Loss 3.5825 LearningRate 0.000695 Epoch: 9 Global Step: 207100 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:46:23,135-Speed 2498.90 samples/sec Loss 3.6229 LearningRate 0.000695 Epoch: 9 Global Step: 207110 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:46:31,341-Speed 2496.12 samples/sec Loss 3.5919 LearningRate 0.000695 Epoch: 9 Global Step: 207120 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:46:39,486-Speed 2514.60 samples/sec Loss 3.5736 LearningRate 0.000695 Epoch: 9 Global Step: 207130 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:46:47,689-Speed 2497.28 samples/sec Loss 3.5697 
LearningRate 0.000695 Epoch: 9 Global Step: 207140 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:46:55,891-Speed 2497.40 samples/sec Loss 3.5463 LearningRate 0.000695 Epoch: 9 Global Step: 207150 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:47:04,089-Speed 2498.52 samples/sec Loss 3.6804 LearningRate 0.000695 Epoch: 9 Global Step: 207160 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:47:12,285-Speed 2499.28 samples/sec Loss 3.7570 LearningRate 0.000695 Epoch: 9 Global Step: 207170 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:47:20,485-Speed 2498.13 samples/sec Loss 3.5833 LearningRate 0.000695 Epoch: 9 Global Step: 207180 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:47:28,629-Speed 2515.04 samples/sec Loss 3.7046 LearningRate 0.000695 Epoch: 9 Global Step: 207190 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:47:36,824-Speed 2499.54 samples/sec Loss 3.6410 LearningRate 0.000695 Epoch: 9 Global Step: 207200 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 13:47:44,978-Speed 2511.84 samples/sec Loss 3.6339 LearningRate 0.000695 Epoch: 9 Global Step: 207210 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:47:53,177-Speed 2498.21 samples/sec Loss 3.6576 LearningRate 0.000695 Epoch: 9 Global Step: 207220 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:48:01,373-Speed 2499.27 samples/sec Loss 3.7211 LearningRate 0.000695 Epoch: 9 Global Step: 207230 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:48:09,571-Speed 2498.49 samples/sec Loss 3.6054 LearningRate 0.000695 Epoch: 9 Global Step: 207240 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:48:17,715-Speed 2514.90 samples/sec Loss 3.6894 LearningRate 0.000695 Epoch: 9 Global Step: 207250 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:48:25,912-Speed 2499.12 samples/sec Loss 3.6442 
LearningRate 0.000695 Epoch: 9 Global Step: 207260 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:48:34,109-Speed 2499.03 samples/sec Loss 3.6744 LearningRate 0.000695 Epoch: 9 Global Step: 207270 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:48:42,309-Speed 2497.73 samples/sec Loss 3.5953 LearningRate 0.000695 Epoch: 9 Global Step: 207280 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:48:50,507-Speed 2498.97 samples/sec Loss 3.6271 LearningRate 0.000695 Epoch: 9 Global Step: 207290 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:48:58,719-Speed 2494.34 samples/sec Loss 3.6180 LearningRate 0.000695 Epoch: 9 Global Step: 207300 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:49:06,862-Speed 2515.33 samples/sec Loss 3.6553 LearningRate 0.000695 Epoch: 9 Global Step: 207310 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:49:15,059-Speed 2498.87 samples/sec Loss 3.5730 LearningRate 0.000695 Epoch: 9 Global Step: 207320 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:49:23,256-Speed 2498.89 samples/sec Loss 3.5964 LearningRate 0.000695 Epoch: 9 Global Step: 207330 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:49:31,460-Speed 2496.83 samples/sec Loss 3.5917 LearningRate 0.000695 Epoch: 9 Global Step: 207340 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:49:39,656-Speed 2499.39 samples/sec Loss 3.6175 LearningRate 0.000695 Epoch: 9 Global Step: 207350 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:49:47,870-Speed 2493.60 samples/sec Loss 3.5657 LearningRate 0.000695 Epoch: 9 Global Step: 207360 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:49:56,014-Speed 2515.13 samples/sec Loss 3.6703 LearningRate 0.000694 Epoch: 9 Global Step: 207370 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:50:04,214-Speed 2498.02 samples/sec Loss 3.6327 
LearningRate 0.000694 Epoch: 9 Global Step: 207380 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:50:12,411-Speed 2498.93 samples/sec Loss 3.6344 LearningRate 0.000694 Epoch: 9 Global Step: 207390 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:50:22,753-Speed 1980.43 samples/sec Loss 3.5434 LearningRate 0.000694 Epoch: 10 Global Step: 207400 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:50:30,952-Speed 2498.39 samples/sec Loss 3.6516 LearningRate 0.000694 Epoch: 10 Global Step: 207410 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:50:39,146-Speed 2499.62 samples/sec Loss 3.6558 LearningRate 0.000694 Epoch: 10 Global Step: 207420 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:50:47,290-Speed 2515.31 samples/sec Loss 3.5934 LearningRate 0.000694 Epoch: 10 Global Step: 207430 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:50:55,494-Speed 2496.67 samples/sec Loss 3.6130 LearningRate 0.000694 Epoch: 10 Global Step: 207440 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:51:03,704-Speed 2494.97 samples/sec Loss 3.6211 LearningRate 0.000694 Epoch: 10 Global Step: 207450 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:51:11,902-Speed 2498.58 samples/sec Loss 3.5371 LearningRate 0.000694 Epoch: 10 Global Step: 207460 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:51:20,102-Speed 2497.85 samples/sec Loss 3.5552 LearningRate 0.000694 Epoch: 10 Global Step: 207470 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:51:28,304-Speed 2497.60 samples/sec Loss 3.5477 LearningRate 0.000694 Epoch: 10 Global Step: 207480 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:51:36,459-Speed 2511.62 samples/sec Loss 3.5812 LearningRate 0.000694 Epoch: 10 Global Step: 207490 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:51:44,671-Speed 2494.41 samples/sec Loss 
3.5829 LearningRate 0.000694 Epoch: 10 Global Step: 207500 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:51:52,870-Speed 2497.90 samples/sec Loss 3.6155 LearningRate 0.000694 Epoch: 10 Global Step: 207510 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:52:01,070-Speed 2498.49 samples/sec Loss 3.6018 LearningRate 0.000694 Epoch: 10 Global Step: 207520 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:52:09,270-Speed 2497.71 samples/sec Loss 3.6310 LearningRate 0.000694 Epoch: 10 Global Step: 207530 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:52:17,473-Speed 2497.12 samples/sec Loss 3.6231 LearningRate 0.000694 Epoch: 10 Global Step: 207540 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:52:25,621-Speed 2513.78 samples/sec Loss 3.5473 LearningRate 0.000694 Epoch: 10 Global Step: 207550 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:52:33,820-Speed 2498.30 samples/sec Loss 3.6299 LearningRate 0.000694 Epoch: 10 Global Step: 207560 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:52:42,033-Speed 2494.50 samples/sec Loss 3.5754 LearningRate 0.000694 Epoch: 10 Global Step: 207570 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:52:50,229-Speed 2498.82 samples/sec Loss 3.5387 LearningRate 0.000694 Epoch: 10 Global Step: 207580 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:52:58,432-Speed 2497.24 samples/sec Loss 3.6258 LearningRate 0.000694 Epoch: 10 Global Step: 207590 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:53:06,630-Speed 2498.49 samples/sec Loss 3.5888 LearningRate 0.000694 Epoch: 10 Global Step: 207600 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:53:14,775-Speed 2514.95 samples/sec Loss 3.6176 LearningRate 0.000694 Epoch: 10 Global Step: 207610 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:53:22,974-Speed 2498.40 samples/sec 
Loss 3.6015 LearningRate 0.000694 Epoch: 10 Global Step: 207620 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:53:31,169-Speed 2499.27 samples/sec Loss 3.5812 LearningRate 0.000694 Epoch: 10 Global Step: 207630 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:53:39,369-Speed 2497.99 samples/sec Loss 3.6176 LearningRate 0.000694 Epoch: 10 Global Step: 207640 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:53:47,572-Speed 2497.35 samples/sec Loss 3.5785 LearningRate 0.000694 Epoch: 10 Global Step: 207650 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:53:55,768-Speed 2499.02 samples/sec Loss 3.6111 LearningRate 0.000694 Epoch: 10 Global Step: 207660 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:54:03,926-Speed 2511.04 samples/sec Loss 3.5456 LearningRate 0.000694 Epoch: 10 Global Step: 207670 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:54:12,125-Speed 2498.59 samples/sec Loss 3.5430 LearningRate 0.000694 Epoch: 10 Global Step: 207680 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:54:20,325-Speed 2498.04 samples/sec Loss 3.6045 LearningRate 0.000694 Epoch: 10 Global Step: 207690 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:54:28,536-Speed 2494.40 samples/sec Loss 3.5504 LearningRate 0.000694 Epoch: 10 Global Step: 207700 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:54:36,733-Speed 2498.99 samples/sec Loss 3.6050 LearningRate 0.000694 Epoch: 10 Global Step: 207710 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:54:44,928-Speed 2499.35 samples/sec Loss 3.4888 LearningRate 0.000694 Epoch: 10 Global Step: 207720 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:54:53,077-Speed 2513.78 samples/sec Loss 3.5431 LearningRate 0.000694 Epoch: 10 Global Step: 207730 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:55:01,271-Speed 2499.59 
samples/sec Loss 3.6268 LearningRate 0.000694 Epoch: 10 Global Step: 207740 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:55:09,474-Speed 2497.31 samples/sec Loss 3.4999 LearningRate 0.000694 Epoch: 10 Global Step: 207750 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:55:17,675-Speed 2497.50 samples/sec Loss 3.5985 LearningRate 0.000694 Epoch: 10 Global Step: 207760 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:55:25,875-Speed 2498.12 samples/sec Loss 3.6190 LearningRate 0.000694 Epoch: 10 Global Step: 207770 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:55:34,085-Speed 2494.96 samples/sec Loss 3.6809 LearningRate 0.000694 Epoch: 10 Global Step: 207780 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:55:42,233-Speed 2513.88 samples/sec Loss 3.5414 LearningRate 0.000694 Epoch: 10 Global Step: 207790 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:55:50,429-Speed 2499.25 samples/sec Loss 3.5880 LearningRate 0.000694 Epoch: 10 Global Step: 207800 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:55:58,627-Speed 2498.42 samples/sec Loss 3.5504 LearningRate 0.000694 Epoch: 10 Global Step: 207810 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:56:06,828-Speed 2497.77 samples/sec Loss 3.6080 LearningRate 0.000693 Epoch: 10 Global Step: 207820 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:56:15,025-Speed 2498.78 samples/sec Loss 3.5314 LearningRate 0.000693 Epoch: 10 Global Step: 207830 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:56:23,228-Speed 2496.79 samples/sec Loss 3.6404 LearningRate 0.000693 Epoch: 10 Global Step: 207840 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:56:31,391-Speed 2509.50 samples/sec Loss 3.5297 LearningRate 0.000693 Epoch: 10 Global Step: 207850 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:56:39,592-Speed 
2497.42 samples/sec Loss 3.6192 LearningRate 0.000693 Epoch: 10 Global Step: 207860 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:56:47,789-Speed 2499.04 samples/sec Loss 3.6062 LearningRate 0.000693 Epoch: 10 Global Step: 207870 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:56:55,989-Speed 2497.92 samples/sec Loss 3.5674 LearningRate 0.000693 Epoch: 10 Global Step: 207880 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:57:04,186-Speed 2498.79 samples/sec Loss 3.5489 LearningRate 0.000693 Epoch: 10 Global Step: 207890 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:57:12,386-Speed 2498.04 samples/sec Loss 3.5285 LearningRate 0.000693 Epoch: 10 Global Step: 207900 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:57:20,543-Speed 2511.36 samples/sec Loss 3.5285 LearningRate 0.000693 Epoch: 10 Global Step: 207910 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:57:28,744-Speed 2497.40 samples/sec Loss 3.5516 LearningRate 0.000693 Epoch: 10 Global Step: 207920 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:57:36,942-Speed 2498.59 samples/sec Loss 3.5708 LearningRate 0.000693 Epoch: 10 Global Step: 207930 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:57:45,148-Speed 2496.09 samples/sec Loss 3.6041 LearningRate 0.000693 Epoch: 10 Global Step: 207940 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:57:53,350-Speed 2497.29 samples/sec Loss 3.5830 LearningRate 0.000693 Epoch: 10 Global Step: 207950 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:58:01,550-Speed 2498.03 samples/sec Loss 3.6545 LearningRate 0.000693 Epoch: 10 Global Step: 207960 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:58:09,700-Speed 2513.32 samples/sec Loss 3.6546 LearningRate 0.000693 Epoch: 10 Global Step: 207970 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 
13:58:17,898-Speed 2498.67 samples/sec Loss 3.5521 LearningRate 0.000693 Epoch: 10 Global Step: 207980 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:58:26,094-Speed 2499.19 samples/sec Loss 3.5484 LearningRate 0.000693 Epoch: 10 Global Step: 207990 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:58:34,296-Speed 2497.72 samples/sec Loss 3.6631 LearningRate 0.000693 Epoch: 10 Global Step: 208000 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:58:42,506-Speed 2495.01 samples/sec Loss 3.6283 LearningRate 0.000693 Epoch: 10 Global Step: 208010 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:58:50,704-Speed 2498.54 samples/sec Loss 3.6224 LearningRate 0.000693 Epoch: 10 Global Step: 208020 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:58:58,850-Speed 2514.29 samples/sec Loss 3.6011 LearningRate 0.000693 Epoch: 10 Global Step: 208030 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:59:07,049-Speed 2498.38 samples/sec Loss 3.5919 LearningRate 0.000693 Epoch: 10 Global Step: 208040 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:59:15,248-Speed 2498.37 samples/sec Loss 3.6107 LearningRate 0.000693 Epoch: 10 Global Step: 208050 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:59:23,444-Speed 2499.15 samples/sec Loss 3.5655 LearningRate 0.000693 Epoch: 10 Global Step: 208060 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:59:31,640-Speed 2499.39 samples/sec Loss 3.6529 LearningRate 0.000693 Epoch: 10 Global Step: 208070 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:59:39,857-Speed 2492.83 samples/sec Loss 3.6094 LearningRate 0.000693 Epoch: 10 Global Step: 208080 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 13:59:48,002-Speed 2514.68 samples/sec Loss 3.6327 LearningRate 0.000693 Epoch: 10 Global Step: 208090 Fp16 Grad Scale: 32768 Required: 142 hours Training: 
2022-07-07 13:59:56,200-Speed 2498.72 samples/sec Loss 3.5702 LearningRate 0.000693 Epoch: 10 Global Step: 208100 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:00:04,404-Speed 2496.72 samples/sec Loss 3.5420 LearningRate 0.000693 Epoch: 10 Global Step: 208110 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:00:12,601-Speed 2498.73 samples/sec Loss 3.5065 LearningRate 0.000693 Epoch: 10 Global Step: 208120 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:00:20,805-Speed 2496.66 samples/sec Loss 3.6530 LearningRate 0.000693 Epoch: 10 Global Step: 208130 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:00:29,007-Speed 2497.61 samples/sec Loss 3.6126 LearningRate 0.000693 Epoch: 10 Global Step: 208140 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:00:37,155-Speed 2514.19 samples/sec Loss 3.7421 LearningRate 0.000693 Epoch: 10 Global Step: 208150 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:00:45,355-Speed 2497.73 samples/sec Loss 3.6216 LearningRate 0.000693 Epoch: 10 Global Step: 208160 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:00:53,556-Speed 2497.51 samples/sec Loss 3.5982 LearningRate 0.000693 Epoch: 10 Global Step: 208170 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:01:01,757-Speed 2498.01 samples/sec Loss 3.6436 LearningRate 0.000693 Epoch: 10 Global Step: 208180 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:01:09,956-Speed 2498.30 samples/sec Loss 3.6643 LearningRate 0.000693 Epoch: 10 Global Step: 208190 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:01:18,179-Speed 2490.68 samples/sec Loss 3.5689 LearningRate 0.000693 Epoch: 10 Global Step: 208200 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:01:26,327-Speed 2514.12 samples/sec Loss 3.6354 LearningRate 0.000693 Epoch: 10 Global Step: 208210 Fp16 Grad Scale: 32768 Required: 142 hours 
Training: 2022-07-07 14:01:34,527-Speed 2498.19 samples/sec Loss 3.5901 LearningRate 0.000693 Epoch: 10 Global Step: 208220 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:01:42,730-Speed 2496.84 samples/sec Loss 3.6913 LearningRate 0.000693 Epoch: 10 Global Step: 208230 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:01:50,925-Speed 2499.51 samples/sec Loss 3.6700 LearningRate 0.000693 Epoch: 10 Global Step: 208240 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:01:59,135-Speed 2494.98 samples/sec Loss 3.6007 LearningRate 0.000693 Epoch: 10 Global Step: 208250 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:02:07,329-Speed 2499.91 samples/sec Loss 3.6764 LearningRate 0.000693 Epoch: 10 Global Step: 208260 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:02:15,473-Speed 2514.86 samples/sec Loss 3.6343 LearningRate 0.000692 Epoch: 10 Global Step: 208270 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:02:23,681-Speed 2497.11 samples/sec Loss 3.6384 LearningRate 0.000692 Epoch: 10 Global Step: 208280 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:02:31,879-Speed 2498.59 samples/sec Loss 3.6342 LearningRate 0.000692 Epoch: 10 Global Step: 208290 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:02:40,076-Speed 2498.81 samples/sec Loss 3.5842 LearningRate 0.000692 Epoch: 10 Global Step: 208300 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:02:48,272-Speed 2499.39 samples/sec Loss 3.5684 LearningRate 0.000692 Epoch: 10 Global Step: 208310 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:02:56,473-Speed 2497.35 samples/sec Loss 3.6058 LearningRate 0.000692 Epoch: 10 Global Step: 208320 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:03:04,619-Speed 2514.66 samples/sec Loss 3.5703 LearningRate 0.000692 Epoch: 10 Global Step: 208330 Fp16 Grad Scale: 32768 Required: 142 
hours Training: 2022-07-07 14:03:12,818-Speed 2498.65 samples/sec Loss 3.6084 LearningRate 0.000692 Epoch: 10 Global Step: 208340 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:03:21,031-Speed 2493.71 samples/sec Loss 3.5745 LearningRate 0.000692 Epoch: 10 Global Step: 208350 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:03:29,233-Speed 2497.74 samples/sec Loss 3.6360 LearningRate 0.000692 Epoch: 10 Global Step: 208360 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:03:37,433-Speed 2498.15 samples/sec Loss 3.5954 LearningRate 0.000692 Epoch: 10 Global Step: 208370 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:03:45,629-Speed 2498.93 samples/sec Loss 3.6007 LearningRate 0.000692 Epoch: 10 Global Step: 208380 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:03:53,773-Speed 2515.21 samples/sec Loss 3.5948 LearningRate 0.000692 Epoch: 10 Global Step: 208390 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:04:01,974-Speed 2497.59 samples/sec Loss 3.5139 LearningRate 0.000692 Epoch: 10 Global Step: 208400 Fp16 Grad Scale: 32768 Required: 142 hours Training: 2022-07-07 14:04:10,171-Speed 2498.78 samples/sec Loss 3.5755 LearningRate 0.000692 Epoch: 10 Global Step: 208410 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:04:18,368-Speed 2499.11 samples/sec Loss 3.6197 LearningRate 0.000692 Epoch: 10 Global Step: 208420 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:04:26,563-Speed 2499.14 samples/sec Loss 3.6680 LearningRate 0.000692 Epoch: 10 Global Step: 208430 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:04:34,769-Speed 2496.18 samples/sec Loss 3.5807 LearningRate 0.000692 Epoch: 10 Global Step: 208440 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:04:42,917-Speed 2514.15 samples/sec Loss 3.5965 LearningRate 0.000692 Epoch: 10 Global Step: 208450 Fp16 Grad Scale: 65536 Required: 
142 hours Training: 2022-07-07 14:04:51,114-Speed 2498.76 samples/sec Loss 3.5568 LearningRate 0.000692 Epoch: 10 Global Step: 208460 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:04:59,314-Speed 2497.96 samples/sec Loss 3.5919 LearningRate 0.000692 Epoch: 10 Global Step: 208470 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:05:07,512-Speed 2498.69 samples/sec Loss 3.5944 LearningRate 0.000692 Epoch: 10 Global Step: 208480 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:05:15,709-Speed 2498.80 samples/sec Loss 3.6208 LearningRate 0.000692 Epoch: 10 Global Step: 208490 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:05:23,908-Speed 2498.20 samples/sec Loss 3.6028 LearningRate 0.000692 Epoch: 10 Global Step: 208500 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:05:32,051-Speed 2515.44 samples/sec Loss 3.5806 LearningRate 0.000692 Epoch: 10 Global Step: 208510 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:05:40,252-Speed 2497.61 samples/sec Loss 3.5025 LearningRate 0.000692 Epoch: 10 Global Step: 208520 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:05:48,447-Speed 2499.42 samples/sec Loss 3.5553 LearningRate 0.000692 Epoch: 10 Global Step: 208530 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:05:56,645-Speed 2498.67 samples/sec Loss 3.6049 LearningRate 0.000692 Epoch: 10 Global Step: 208540 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:06:04,843-Speed 2498.39 samples/sec Loss 3.5721 LearningRate 0.000692 Epoch: 10 Global Step: 208550 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:06:13,049-Speed 2496.19 samples/sec Loss 3.5359 LearningRate 0.000692 Epoch: 10 Global Step: 208560 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:06:21,194-Speed 2514.82 samples/sec Loss 3.5279 LearningRate 0.000692 Epoch: 10 Global Step: 208570 Fp16 Grad Scale: 65536 
Required: 142 hours Training: 2022-07-07 14:06:29,393-Speed 2498.11 samples/sec Loss 3.5317 LearningRate 0.000692 Epoch: 10 Global Step: 208580 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:06:37,593-Speed 2499.43 samples/sec Loss 3.6062 LearningRate 0.000692 Epoch: 10 Global Step: 208590 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:06:45,791-Speed 2498.67 samples/sec Loss 3.5659 LearningRate 0.000692 Epoch: 10 Global Step: 208600 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:06:53,995-Speed 2496.66 samples/sec Loss 3.5228 LearningRate 0.000692 Epoch: 10 Global Step: 208610 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:07:02,198-Speed 2496.86 samples/sec Loss 3.5261 LearningRate 0.000692 Epoch: 10 Global Step: 208620 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:07:10,350-Speed 2512.66 samples/sec Loss 3.6086 LearningRate 0.000692 Epoch: 10 Global Step: 208630 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:07:18,549-Speed 2498.41 samples/sec Loss 3.6539 LearningRate 0.000692 Epoch: 10 Global Step: 208640 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:07:26,761-Speed 2494.04 samples/sec Loss 3.5724 LearningRate 0.000692 Epoch: 10 Global Step: 208650 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:07:34,963-Speed 2497.16 samples/sec Loss 3.5291 LearningRate 0.000692 Epoch: 10 Global Step: 208660 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:07:43,161-Speed 2498.65 samples/sec Loss 3.5333 LearningRate 0.000692 Epoch: 10 Global Step: 208670 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:07:51,361-Speed 2498.15 samples/sec Loss 3.5237 LearningRate 0.000692 Epoch: 10 Global Step: 208680 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:07:59,508-Speed 2514.08 samples/sec Loss 3.5345 LearningRate 0.000692 Epoch: 10 Global Step: 208690 Fp16 Grad Scale: 
65536 Required: 142 hours Training: 2022-07-07 14:08:07,708-Speed 2497.95 samples/sec Loss 3.5129 LearningRate 0.000692 Epoch: 10 Global Step: 208700 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:08:15,908-Speed 2498.56 samples/sec Loss 3.6312 LearningRate 0.000692 Epoch: 10 Global Step: 208710 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:08:24,105-Speed 2498.90 samples/sec Loss 3.5100 LearningRate 0.000691 Epoch: 10 Global Step: 208720 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:08:32,306-Speed 2497.56 samples/sec Loss 3.4903 LearningRate 0.000691 Epoch: 10 Global Step: 208730 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:08:40,510-Speed 2496.78 samples/sec Loss 3.5027 LearningRate 0.000691 Epoch: 10 Global Step: 208740 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:08:48,658-Speed 2513.81 samples/sec Loss 3.5088 LearningRate 0.000691 Epoch: 10 Global Step: 208750 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:08:56,871-Speed 2493.96 samples/sec Loss 3.6119 LearningRate 0.000691 Epoch: 10 Global Step: 208760 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:09:05,073-Speed 2497.54 samples/sec Loss 3.5831 LearningRate 0.000691 Epoch: 10 Global Step: 208770 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:09:13,280-Speed 2495.92 samples/sec Loss 3.5581 LearningRate 0.000691 Epoch: 10 Global Step: 208780 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:09:21,479-Speed 2498.17 samples/sec Loss 3.5866 LearningRate 0.000691 Epoch: 10 Global Step: 208790 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:09:29,677-Speed 2498.60 samples/sec Loss 3.5512 LearningRate 0.000691 Epoch: 10 Global Step: 208800 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:09:37,828-Speed 2513.22 samples/sec Loss 3.5863 LearningRate 0.000691 Epoch: 10 Global Step: 208810 Fp16 Grad 
Scale: 65536 Required: 142 hours Training: 2022-07-07 14:09:47,436-Speed 2131.84 samples/sec Loss 3.5921 LearningRate 0.000691 Epoch: 10 Global Step: 208820 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:09:58,949-Speed 2502.81 samples/sec Loss 3.6215 LearningRate 0.000691 Epoch: 10 Global Step: 208830 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:10:08,211-Speed 2211.34 samples/sec Loss 3.5802 LearningRate 0.000691 Epoch: 10 Global Step: 208840 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:10:16,402-Speed 2500.79 samples/sec Loss 3.6674 LearningRate 0.000691 Epoch: 10 Global Step: 208850 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:10:24,608-Speed 2495.98 samples/sec Loss 3.5208 LearningRate 0.000691 Epoch: 10 Global Step: 208860 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:10:32,755-Speed 2514.31 samples/sec Loss 3.5272 LearningRate 0.000691 Epoch: 10 Global Step: 208870 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:10:40,956-Speed 2497.51 samples/sec Loss 3.6010 LearningRate 0.000691 Epoch: 10 Global Step: 208880 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:10:49,155-Speed 2498.42 samples/sec Loss 3.5459 LearningRate 0.000691 Epoch: 10 Global Step: 208890 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:10:57,360-Speed 2496.49 samples/sec Loss 3.6093 LearningRate 0.000691 Epoch: 10 Global Step: 208900 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:11:05,560-Speed 2497.89 samples/sec Loss 3.5623 LearningRate 0.000691 Epoch: 10 Global Step: 208910 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:11:13,759-Speed 2498.33 samples/sec Loss 3.5584 LearningRate 0.000691 Epoch: 10 Global Step: 208920 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:11:21,914-Speed 2511.84 samples/sec Loss 3.6477 LearningRate 0.000691 Epoch: 10 Global Step: 208930 Fp16 
Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:11:30,114-Speed 2497.88 samples/sec Loss 3.5907 LearningRate 0.000691 Epoch: 10 Global Step: 208940 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:11:38,318-Speed 2497.05 samples/sec Loss 3.5628 LearningRate 0.000691 Epoch: 10 Global Step: 208950 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:11:46,519-Speed 2497.62 samples/sec Loss 3.5400 LearningRate 0.000691 Epoch: 10 Global Step: 208960 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:11:54,720-Speed 2497.61 samples/sec Loss 3.5606 LearningRate 0.000691 Epoch: 10 Global Step: 208970 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:12:02,924-Speed 2496.62 samples/sec Loss 3.5260 LearningRate 0.000691 Epoch: 10 Global Step: 208980 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:12:11,071-Speed 2514.15 samples/sec Loss 3.6138 LearningRate 0.000691 Epoch: 10 Global Step: 208990 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:12:19,269-Speed 2498.48 samples/sec Loss 3.5372 LearningRate 0.000691 Epoch: 10 Global Step: 209000 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:12:27,467-Speed 2498.64 samples/sec Loss 3.4724 LearningRate 0.000691 Epoch: 10 Global Step: 209010 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:12:35,664-Speed 2498.85 samples/sec Loss 3.5489 LearningRate 0.000691 Epoch: 10 Global Step: 209020 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:12:43,861-Speed 2498.91 samples/sec Loss 3.6247 LearningRate 0.000691 Epoch: 10 Global Step: 209030 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:12:52,062-Speed 2497.74 samples/sec Loss 3.5168 LearningRate 0.000691 Epoch: 10 Global Step: 209040 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:13:00,207-Speed 2514.72 samples/sec Loss 3.5571 LearningRate 0.000691 Epoch: 10 Global Step: 209050 
Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:13:08,408-Speed 2497.84 samples/sec Loss 3.5840 LearningRate 0.000691 Epoch: 10 Global Step: 209060 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:13:16,606-Speed 2498.63 samples/sec Loss 3.5919 LearningRate 0.000691 Epoch: 10 Global Step: 209070 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:13:24,805-Speed 2498.20 samples/sec Loss 3.5564 LearningRate 0.000691 Epoch: 10 Global Step: 209080 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:13:33,005-Speed 2497.99 samples/sec Loss 3.5462 LearningRate 0.000691 Epoch: 10 Global Step: 209090 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:13:41,206-Speed 2497.57 samples/sec Loss 3.5273 LearningRate 0.000691 Epoch: 10 Global Step: 209100 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:13:49,351-Speed 2514.74 samples/sec Loss 3.5822 LearningRate 0.000691 Epoch: 10 Global Step: 209110 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:13:57,551-Speed 2497.99 samples/sec Loss 3.5710 LearningRate 0.000691 Epoch: 10 Global Step: 209120 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:14:05,750-Speed 2498.24 samples/sec Loss 3.6243 LearningRate 0.000691 Epoch: 10 Global Step: 209130 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:14:13,952-Speed 2497.37 samples/sec Loss 3.5621 LearningRate 0.000691 Epoch: 10 Global Step: 209140 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:14:22,152-Speed 2497.94 samples/sec Loss 3.5669 LearningRate 0.000691 Epoch: 10 Global Step: 209150 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:14:30,362-Speed 2494.77 samples/sec Loss 3.5656 LearningRate 0.000691 Epoch: 10 Global Step: 209160 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:14:38,509-Speed 2514.30 samples/sec Loss 3.5068 LearningRate 0.000690 Epoch: 10 Global Step: 
209170 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:14:46,709-Speed 2498.02 samples/sec Loss 3.5237 LearningRate 0.000690 Epoch: 10 Global Step: 209180 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:14:54,910-Speed 2497.75 samples/sec Loss 3.5500 LearningRate 0.000690 Epoch: 10 Global Step: 209190 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:15:03,115-Speed 2496.39 samples/sec Loss 3.4828 LearningRate 0.000690 Epoch: 10 Global Step: 209200 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:15:11,315-Speed 2498.22 samples/sec Loss 3.4972 LearningRate 0.000690 Epoch: 10 Global Step: 209210 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:15:19,513-Speed 2498.49 samples/sec Loss 3.5365 LearningRate 0.000690 Epoch: 10 Global Step: 209220 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:15:27,659-Speed 2514.52 samples/sec Loss 3.5775 LearningRate 0.000690 Epoch: 10 Global Step: 209230 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:15:35,856-Speed 2498.91 samples/sec Loss 3.5451 LearningRate 0.000690 Epoch: 10 Global Step: 209240 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:15:44,056-Speed 2497.95 samples/sec Loss 3.5446 LearningRate 0.000690 Epoch: 10 Global Step: 209250 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:15:52,268-Speed 2494.41 samples/sec Loss 3.4700 LearningRate 0.000690 Epoch: 10 Global Step: 209260 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:16:00,471-Speed 2497.04 samples/sec Loss 3.5201 LearningRate 0.000690 Epoch: 10 Global Step: 209270 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:16:08,675-Speed 2496.69 samples/sec Loss 3.5648 LearningRate 0.000690 Epoch: 10 Global Step: 209280 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:16:16,820-Speed 2514.88 samples/sec Loss 3.6190 LearningRate 0.000690 Epoch: 10 Global 
Step: 209290 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:16:25,024-Speed 2496.65 samples/sec Loss 3.5380 LearningRate 0.000690 Epoch: 10 Global Step: 209300 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:16:33,246-Speed 2491.30 samples/sec Loss 3.5626 LearningRate 0.000690 Epoch: 10 Global Step: 209310 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:16:41,446-Speed 2498.13 samples/sec Loss 3.5385 LearningRate 0.000690 Epoch: 10 Global Step: 209320 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:16:49,648-Speed 2497.23 samples/sec Loss 3.5151 LearningRate 0.000690 Epoch: 10 Global Step: 209330 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:16:57,848-Speed 2498.00 samples/sec Loss 3.5395 LearningRate 0.000690 Epoch: 10 Global Step: 209340 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:17:05,996-Speed 2513.98 samples/sec Loss 3.5205 LearningRate 0.000690 Epoch: 10 Global Step: 209350 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:17:14,200-Speed 2496.67 samples/sec Loss 3.6028 LearningRate 0.000690 Epoch: 10 Global Step: 209360 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:17:22,397-Speed 2498.93 samples/sec Loss 3.5773 LearningRate 0.000690 Epoch: 10 Global Step: 209370 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:17:30,597-Speed 2497.93 samples/sec Loss 3.5557 LearningRate 0.000690 Epoch: 10 Global Step: 209380 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:17:38,800-Speed 2496.99 samples/sec Loss 3.5799 LearningRate 0.000690 Epoch: 10 Global Step: 209390 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:17:46,999-Speed 2498.17 samples/sec Loss 3.6272 LearningRate 0.000690 Epoch: 10 Global Step: 209400 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:17:55,145-Speed 2514.56 samples/sec Loss 3.6693 LearningRate 0.000690 Epoch: 10 
Global Step: 209410 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:18:03,347-Speed 2497.14 samples/sec Loss 3.6377 LearningRate 0.000690 Epoch: 10 Global Step: 209420 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:18:11,549-Speed 2497.92 samples/sec Loss 3.5870 LearningRate 0.000690 Epoch: 10 Global Step: 209430 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:18:19,746-Speed 2498.65 samples/sec Loss 3.6430 LearningRate 0.000690 Epoch: 10 Global Step: 209440 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:18:27,947-Speed 2497.59 samples/sec Loss 3.6115 LearningRate 0.000690 Epoch: 10 Global Step: 209450 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:18:36,154-Speed 2495.99 samples/sec Loss 3.6266 LearningRate 0.000690 Epoch: 10 Global Step: 209460 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:18:44,299-Speed 2514.83 samples/sec Loss 3.5217 LearningRate 0.000690 Epoch: 10 Global Step: 209470 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:18:52,497-Speed 2498.61 samples/sec Loss 3.6547 LearningRate 0.000690 Epoch: 10 Global Step: 209480 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:19:00,696-Speed 2498.30 samples/sec Loss 3.6008 LearningRate 0.000690 Epoch: 10 Global Step: 209490 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:19:08,902-Speed 2495.87 samples/sec Loss 3.5481 LearningRate 0.000690 Epoch: 10 Global Step: 209500 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:19:17,102-Speed 2498.02 samples/sec Loss 3.5382 LearningRate 0.000690 Epoch: 10 Global Step: 209510 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:19:25,301-Speed 2498.55 samples/sec Loss 3.6087 LearningRate 0.000690 Epoch: 10 Global Step: 209520 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:19:33,452-Speed 2512.89 samples/sec Loss 3.5662 LearningRate 0.000690 
Epoch: 10 Global Step: 209530 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:19:41,665-Speed 2493.86 samples/sec Loss 3.4978 LearningRate 0.000690 Epoch: 10 Global Step: 209540 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:19:49,865-Speed 2498.49 samples/sec Loss 3.5497 LearningRate 0.000690 Epoch: 10 Global Step: 209550 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:19:58,063-Speed 2498.58 samples/sec Loss 3.5644 LearningRate 0.000690 Epoch: 10 Global Step: 209560 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:20:06,264-Speed 2497.62 samples/sec Loss 3.5108 LearningRate 0.000690 Epoch: 10 Global Step: 209570 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:20:14,462-Speed 2498.29 samples/sec Loss 3.5032 LearningRate 0.000690 Epoch: 10 Global Step: 209580 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:20:22,621-Speed 2510.65 samples/sec Loss 3.5628 LearningRate 0.000690 Epoch: 10 Global Step: 209590 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:20:30,819-Speed 2498.48 samples/sec Loss 3.6101 LearningRate 0.000690 Epoch: 10 Global Step: 209600 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:20:39,017-Speed 2498.77 samples/sec Loss 3.6219 LearningRate 0.000689 Epoch: 10 Global Step: 209610 Fp16 Grad Scale: 131072 Required: 142 hours Training: 2022-07-07 14:20:47,216-Speed 2498.01 samples/sec Loss 3.4789 LearningRate 0.000689 Epoch: 10 Global Step: 209620 Fp16 Grad Scale: 131072 Required: 142 hours Training: 2022-07-07 14:20:55,371-Speed 2511.68 samples/sec Loss 3.5613 LearningRate 0.000689 Epoch: 10 Global Step: 209630 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:21:03,574-Speed 2497.06 samples/sec Loss 3.6186 LearningRate 0.000689 Epoch: 10 Global Step: 209640 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:21:11,734-Speed 2510.27 samples/sec Loss 3.5936 LearningRate 
0.000689 Epoch: 10 Global Step: 209650 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:21:19,930-Speed 2499.03 samples/sec Loss 3.5974 LearningRate 0.000689 Epoch: 10 Global Step: 209660 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:21:28,128-Speed 2498.47 samples/sec Loss 3.5628 LearningRate 0.000689 Epoch: 10 Global Step: 209670 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:21:36,333-Speed 2496.43 samples/sec Loss 3.4843 LearningRate 0.000689 Epoch: 10 Global Step: 209680 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:21:44,533-Speed 2497.90 samples/sec Loss 3.5454 LearningRate 0.000689 Epoch: 10 Global Step: 209690 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:21:52,733-Speed 2497.79 samples/sec Loss 3.5958 LearningRate 0.000689 Epoch: 10 Global Step: 209700 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:22:00,882-Speed 2513.79 samples/sec Loss 3.6299 LearningRate 0.000689 Epoch: 10 Global Step: 209710 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:22:09,082-Speed 2497.79 samples/sec Loss 3.5818 LearningRate 0.000689 Epoch: 10 Global Step: 209720 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:22:17,289-Speed 2495.88 samples/sec Loss 3.5907 LearningRate 0.000689 Epoch: 10 Global Step: 209730 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:22:25,487-Speed 2498.66 samples/sec Loss 3.5720 LearningRate 0.000689 Epoch: 10 Global Step: 209740 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:22:33,681-Speed 2499.94 samples/sec Loss 3.5365 LearningRate 0.000689 Epoch: 10 Global Step: 209750 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:22:41,881-Speed 2497.81 samples/sec Loss 3.5283 LearningRate 0.000689 Epoch: 10 Global Step: 209760 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:22:50,028-Speed 2514.40 samples/sec Loss 3.5620 
LearningRate 0.000689 Epoch: 10 Global Step: 209770 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:22:58,240-Speed 2494.22 samples/sec Loss 3.5991 LearningRate 0.000689 Epoch: 10 Global Step: 209780 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:23:06,440-Speed 2497.79 samples/sec Loss 3.6202 LearningRate 0.000689 Epoch: 10 Global Step: 209790 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:23:14,638-Speed 2498.74 samples/sec Loss 3.6086 LearningRate 0.000689 Epoch: 10 Global Step: 209800 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:23:22,840-Speed 2497.30 samples/sec Loss 3.6519 LearningRate 0.000689 Epoch: 10 Global Step: 209810 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:23:31,038-Speed 2498.53 samples/sec Loss 3.6384 LearningRate 0.000689 Epoch: 10 Global Step: 209820 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:23:39,183-Speed 2514.69 samples/sec Loss 3.7589 LearningRate 0.000689 Epoch: 10 Global Step: 209830 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:23:47,383-Speed 2497.80 samples/sec Loss 3.6213 LearningRate 0.000689 Epoch: 10 Global Step: 209840 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:23:55,582-Speed 2498.21 samples/sec Loss 3.6098 LearningRate 0.000689 Epoch: 10 Global Step: 209850 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:24:03,782-Speed 2498.09 samples/sec Loss 3.6452 LearningRate 0.000689 Epoch: 10 Global Step: 209860 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:24:11,986-Speed 2496.69 samples/sec Loss 3.6167 LearningRate 0.000689 Epoch: 10 Global Step: 209870 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:24:20,187-Speed 2497.57 samples/sec Loss 3.6178 LearningRate 0.000689 Epoch: 10 Global Step: 209880 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:24:28,333-Speed 2514.32 samples/sec Loss 
3.5063 LearningRate 0.000689 Epoch: 10 Global Step: 209890 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:24:36,532-Speed 2498.36 samples/sec Loss 3.5674 LearningRate 0.000689 Epoch: 10 Global Step: 209900 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:24:44,734-Speed 2497.18 samples/sec Loss 3.5888 LearningRate 0.000689 Epoch: 10 Global Step: 209910 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:24:52,934-Speed 2498.09 samples/sec Loss 3.5478 LearningRate 0.000689 Epoch: 10 Global Step: 209920 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:25:01,136-Speed 2497.61 samples/sec Loss 3.4982 LearningRate 0.000689 Epoch: 10 Global Step: 209930 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:25:09,346-Speed 2494.81 samples/sec Loss 3.6387 LearningRate 0.000689 Epoch: 10 Global Step: 209940 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:25:17,498-Speed 2512.55 samples/sec Loss 3.5750 LearningRate 0.000689 Epoch: 10 Global Step: 209950 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:25:25,700-Speed 2497.58 samples/sec Loss 3.5213 LearningRate 0.000689 Epoch: 10 Global Step: 209960 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:25:33,899-Speed 2498.25 samples/sec Loss 3.6231 LearningRate 0.000689 Epoch: 10 Global Step: 209970 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:25:42,102-Speed 2496.77 samples/sec Loss 3.5778 LearningRate 0.000689 Epoch: 10 Global Step: 209980 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:25:50,312-Speed 2495.12 samples/sec Loss 3.4865 LearningRate 0.000689 Epoch: 10 Global Step: 209990 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:25:58,511-Speed 2498.21 samples/sec Loss 3.5820 LearningRate 0.000689 Epoch: 10 Global Step: 210000 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:26:06,659-Speed 2513.82 samples/sec 
Loss 3.5329 LearningRate 0.000689 Epoch: 10 Global Step: 210010 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:26:14,874-Speed 2493.49 samples/sec Loss 3.5954 LearningRate 0.000689 Epoch: 10 Global Step: 210020 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:26:23,087-Speed 2494.00 samples/sec Loss 3.5661 LearningRate 0.000689 Epoch: 10 Global Step: 210030 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:26:31,289-Speed 2497.37 samples/sec Loss 3.5613 LearningRate 0.000689 Epoch: 10 Global Step: 210040 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:26:39,487-Speed 2498.25 samples/sec Loss 3.5432 LearningRate 0.000689 Epoch: 10 Global Step: 210050 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:26:47,688-Speed 2497.67 samples/sec Loss 3.5298 LearningRate 0.000688 Epoch: 10 Global Step: 210060 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:26:55,838-Speed 2513.51 samples/sec Loss 3.6121 LearningRate 0.000688 Epoch: 10 Global Step: 210070 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:27:04,040-Speed 2497.16 samples/sec Loss 3.5696 LearningRate 0.000688 Epoch: 10 Global Step: 210080 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:27:12,239-Speed 2498.44 samples/sec Loss 3.5601 LearningRate 0.000688 Epoch: 10 Global Step: 210090 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:27:20,437-Speed 2498.33 samples/sec Loss 3.5794 LearningRate 0.000688 Epoch: 10 Global Step: 210100 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:27:28,638-Speed 2497.79 samples/sec Loss 3.5180 LearningRate 0.000688 Epoch: 10 Global Step: 210110 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:27:36,835-Speed 2498.89 samples/sec Loss 3.5690 LearningRate 0.000688 Epoch: 10 Global Step: 210120 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:27:44,980-Speed 2514.82 
samples/sec Loss 3.5536 LearningRate 0.000688 Epoch: 10 Global Step: 210130 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:27:53,177-Speed 2498.71 samples/sec Loss 3.5331 LearningRate 0.000688 Epoch: 10 Global Step: 210140 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:28:01,377-Speed 2498.13 samples/sec Loss 3.5101 LearningRate 0.000688 Epoch: 10 Global Step: 210150 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:28:09,578-Speed 2497.64 samples/sec Loss 3.5827 LearningRate 0.000688 Epoch: 10 Global Step: 210160 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:28:17,779-Speed 2498.16 samples/sec Loss 3.5478 LearningRate 0.000688 Epoch: 10 Global Step: 210170 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:28:25,978-Speed 2498.29 samples/sec Loss 3.5134 LearningRate 0.000688 Epoch: 10 Global Step: 210180 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:28:34,125-Speed 2514.22 samples/sec Loss 3.6025 LearningRate 0.000688 Epoch: 10 Global Step: 210190 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:28:42,325-Speed 2498.02 samples/sec Loss 3.6212 LearningRate 0.000688 Epoch: 10 Global Step: 210200 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:28:50,535-Speed 2494.77 samples/sec Loss 3.6299 LearningRate 0.000688 Epoch: 10 Global Step: 210210 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:28:58,736-Speed 2497.52 samples/sec Loss 3.6091 LearningRate 0.000688 Epoch: 10 Global Step: 210220 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:29:06,935-Speed 2498.34 samples/sec Loss 3.6137 LearningRate 0.000688 Epoch: 10 Global Step: 210230 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:29:15,138-Speed 2497.64 samples/sec Loss 3.5757 LearningRate 0.000688 Epoch: 10 Global Step: 210240 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:29:23,286-Speed 
2513.96 samples/sec Loss 3.5663 LearningRate 0.000688 Epoch: 10 Global Step: 210250 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:29:31,499-Speed 2493.98 samples/sec Loss 3.4934 LearningRate 0.000688 Epoch: 10 Global Step: 210260 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:29:39,698-Speed 2498.23 samples/sec Loss 3.4790 LearningRate 0.000688 Epoch: 10 Global Step: 210270 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:29:47,911-Speed 2494.24 samples/sec Loss 3.5233 LearningRate 0.000688 Epoch: 10 Global Step: 210280 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:29:56,115-Speed 2496.58 samples/sec Loss 3.5194 LearningRate 0.000688 Epoch: 10 Global Step: 210290 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:30:04,315-Speed 2498.06 samples/sec Loss 3.4897 LearningRate 0.000688 Epoch: 10 Global Step: 210300 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:30:12,559-Speed 2484.72 samples/sec Loss 3.5232 LearningRate 0.000688 Epoch: 10 Global Step: 210310 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:30:20,759-Speed 2497.87 samples/sec Loss 3.5789 LearningRate 0.000688 Epoch: 10 Global Step: 210320 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:30:28,957-Speed 2498.55 samples/sec Loss 3.6021 LearningRate 0.000688 Epoch: 10 Global Step: 210330 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:30:37,154-Speed 2498.80 samples/sec Loss 3.5462 LearningRate 0.000688 Epoch: 10 Global Step: 210340 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:30:45,365-Speed 2494.58 samples/sec Loss 3.5387 LearningRate 0.000688 Epoch: 10 Global Step: 210350 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:30:53,565-Speed 2498.03 samples/sec Loss 3.5509 LearningRate 0.000688 Epoch: 10 Global Step: 210360 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 
14:31:01,715-Speed 2513.05 samples/sec Loss 3.5647 LearningRate 0.000688 Epoch: 10 Global Step: 210370 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:31:09,913-Speed 2499.03 samples/sec Loss 3.5550 LearningRate 0.000688 Epoch: 10 Global Step: 210380 Fp16 Grad Scale: 65536 Required: 142 hours Training: 2022-07-07 14:31:18,117-Speed 2497.10 samples/sec Loss 3.5547 LearningRate 0.000688 Epoch: 10 Global Step: 210390 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:31:26,315-Speed 2498.82 samples/sec Loss 3.5492 LearningRate 0.000688 Epoch: 10 Global Step: 210400 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:31:34,513-Speed 2498.41 samples/sec Loss 3.5224 LearningRate 0.000688 Epoch: 10 Global Step: 210410 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:31:42,711-Speed 2498.72 samples/sec Loss 3.5192 LearningRate 0.000688 Epoch: 10 Global Step: 210420 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:31:50,857-Speed 2514.47 samples/sec Loss 3.6074 LearningRate 0.000688 Epoch: 10 Global Step: 210430 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:31:59,057-Speed 2498.02 samples/sec Loss 3.4897 LearningRate 0.000688 Epoch: 10 Global Step: 210440 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:32:07,257-Speed 2498.01 samples/sec Loss 3.5685 LearningRate 0.000688 Epoch: 10 Global Step: 210450 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:32:15,458-Speed 2497.75 samples/sec Loss 3.6457 LearningRate 0.000688 Epoch: 10 Global Step: 210460 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:32:23,658-Speed 2497.92 samples/sec Loss 3.5478 LearningRate 0.000688 Epoch: 10 Global Step: 210470 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:32:31,859-Speed 2497.55 samples/sec Loss 3.5256 LearningRate 0.000688 Epoch: 10 Global Step: 210480 Fp16 Grad Scale: 65536 Required: 141 hours Training: 
2022-07-07 14:32:40,003-Speed 2515.15 samples/sec Loss 3.6621 LearningRate 0.000688 Epoch: 10 Global Step: 210490 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:32:48,203-Speed 2497.93 samples/sec Loss 3.5486 LearningRate 0.000688 Epoch: 10 Global Step: 210500 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:32:56,405-Speed 2497.51 samples/sec Loss 3.6329 LearningRate 0.000687 Epoch: 10 Global Step: 210510 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:33:04,614-Speed 2495.12 samples/sec Loss 3.5569 LearningRate 0.000687 Epoch: 10 Global Step: 210520 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:33:12,814-Speed 2498.00 samples/sec Loss 3.5545 LearningRate 0.000687 Epoch: 10 Global Step: 210530 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:33:21,018-Speed 2496.86 samples/sec Loss 3.6309 LearningRate 0.000687 Epoch: 10 Global Step: 210540 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:33:29,165-Speed 2514.16 samples/sec Loss 3.5174 LearningRate 0.000687 Epoch: 10 Global Step: 210550 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:33:37,366-Speed 2497.54 samples/sec Loss 3.5498 LearningRate 0.000687 Epoch: 10 Global Step: 210560 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:33:45,564-Speed 2498.75 samples/sec Loss 3.5467 LearningRate 0.000687 Epoch: 10 Global Step: 210570 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:33:53,769-Speed 2496.37 samples/sec Loss 3.5472 LearningRate 0.000687 Epoch: 10 Global Step: 210580 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:34:01,968-Speed 2498.02 samples/sec Loss 3.5722 LearningRate 0.000687 Epoch: 10 Global Step: 210590 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:34:10,180-Speed 2494.28 samples/sec Loss 3.5165 LearningRate 0.000687 Epoch: 10 Global Step: 210600 Fp16 Grad Scale: 65536 Required: 141 hours 
Training: 2022-07-07 14:34:18,331-Speed 2513.16 samples/sec Loss 3.5231 LearningRate 0.000687 Epoch: 10 Global Step: 210610 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:34:26,533-Speed 2497.54 samples/sec Loss 3.5756 LearningRate 0.000687 Epoch: 10 Global Step: 210620 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:34:34,732-Speed 2498.29 samples/sec Loss 3.6140 LearningRate 0.000687 Epoch: 10 Global Step: 210630 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:34:42,936-Speed 2496.74 samples/sec Loss 3.5204 LearningRate 0.000687 Epoch: 10 Global Step: 210640 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:34:51,137-Speed 2497.44 samples/sec Loss 3.5527 LearningRate 0.000687 Epoch: 10 Global Step: 210650 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:34:59,339-Speed 2497.76 samples/sec Loss 3.5698 LearningRate 0.000687 Epoch: 10 Global Step: 210660 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:35:07,486-Speed 2514.14 samples/sec Loss 3.6024 LearningRate 0.000687 Epoch: 10 Global Step: 210670 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:35:15,685-Speed 2498.00 samples/sec Loss 3.5186 LearningRate 0.000687 Epoch: 10 Global Step: 210680 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:35:23,901-Speed 2493.00 samples/sec Loss 3.5553 LearningRate 0.000687 Epoch: 10 Global Step: 210690 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:35:32,100-Speed 2498.33 samples/sec Loss 3.5101 LearningRate 0.000687 Epoch: 10 Global Step: 210700 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:35:40,299-Speed 2498.55 samples/sec Loss 3.4873 LearningRate 0.000687 Epoch: 10 Global Step: 210710 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:35:48,496-Speed 2498.69 samples/sec Loss 3.5441 LearningRate 0.000687 Epoch: 10 Global Step: 210720 Fp16 Grad Scale: 65536 Required: 141 
hours Training: 2022-07-07 14:35:56,643-Speed 2514.22 samples/sec Loss 3.4713 LearningRate 0.000687 Epoch: 10 Global Step: 210730 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:36:04,842-Speed 2498.36 samples/sec Loss 3.5653 LearningRate 0.000687 Epoch: 10 Global Step: 210740 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:36:13,051-Speed 2495.35 samples/sec Loss 3.5975 LearningRate 0.000687 Epoch: 10 Global Step: 210750 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:36:21,257-Speed 2496.25 samples/sec Loss 3.5548 LearningRate 0.000687 Epoch: 10 Global Step: 210760 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:36:29,470-Speed 2493.90 samples/sec Loss 3.5062 LearningRate 0.000687 Epoch: 10 Global Step: 210770 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:36:37,675-Speed 2496.43 samples/sec Loss 3.4826 LearningRate 0.000687 Epoch: 10 Global Step: 210780 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:36:45,825-Speed 2513.39 samples/sec Loss 3.5476 LearningRate 0.000687 Epoch: 10 Global Step: 210790 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:36:54,036-Speed 2494.51 samples/sec Loss 3.5042 LearningRate 0.000687 Epoch: 10 Global Step: 210800 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:37:02,234-Speed 2498.70 samples/sec Loss 3.6109 LearningRate 0.000687 Epoch: 10 Global Step: 210810 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:37:10,440-Speed 2496.09 samples/sec Loss 3.6035 LearningRate 0.000687 Epoch: 10 Global Step: 210820 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:37:18,645-Speed 2496.28 samples/sec Loss 3.5421 LearningRate 0.000687 Epoch: 10 Global Step: 210830 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 14:37:26,850-Speed 2496.42 samples/sec Loss 3.5943 LearningRate 0.000687 Epoch: 10 Global Step: 210840 Fp16 Grad Scale: 131072 
Required: 141 hours Training: 2022-07-07 14:37:34,999-Speed 2513.69 samples/sec Loss 3.5343 LearningRate 0.000687 Epoch: 10 Global Step: 210850 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 14:37:43,196-Speed 2498.62 samples/sec Loss 3.5729 LearningRate 0.000687 Epoch: 10 Global Step: 210860 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 14:37:51,398-Speed 2498.06 samples/sec Loss 3.5475 LearningRate 0.000687 Epoch: 10 Global Step: 210870 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 14:37:59,595-Speed 2498.83 samples/sec Loss 3.5661 LearningRate 0.000687 Epoch: 10 Global Step: 210880 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 14:38:07,793-Speed 2498.31 samples/sec Loss 3.5351 LearningRate 0.000687 Epoch: 10 Global Step: 210890 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 14:38:15,994-Speed 2497.71 samples/sec Loss 3.5176 LearningRate 0.000687 Epoch: 10 Global Step: 210900 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 14:38:24,149-Speed 2511.60 samples/sec Loss 3.5546 LearningRate 0.000687 Epoch: 10 Global Step: 210910 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 14:38:32,351-Speed 2497.60 samples/sec Loss 3.5978 LearningRate 0.000687 Epoch: 10 Global Step: 210920 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 14:38:40,551-Speed 2497.93 samples/sec Loss 3.5781 LearningRate 0.000687 Epoch: 10 Global Step: 210930 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 14:38:48,749-Speed 2498.46 samples/sec Loss 3.5883 LearningRate 0.000687 Epoch: 10 Global Step: 210940 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 14:38:56,955-Speed 2496.26 samples/sec Loss 3.5453 LearningRate 0.000687 Epoch: 10 Global Step: 210950 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 14:39:05,110-Speed 2511.47 samples/sec Loss 3.5767 LearningRate 0.000687 Epoch: 10 Global Step: 210960 Fp16 
Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:39:13,254-Speed 2514.96 samples/sec Loss 3.5161 LearningRate 0.000686 Epoch: 10 Global Step: 210970 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:39:21,459-Speed 2496.69 samples/sec Loss 3.5654 LearningRate 0.000686 Epoch: 10 Global Step: 210980 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:39:29,659-Speed 2497.86 samples/sec Loss 3.5789 LearningRate 0.000686 Epoch: 10 Global Step: 210990 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:39:37,858-Speed 2498.34 samples/sec Loss 3.5542 LearningRate 0.000686 Epoch: 10 Global Step: 211000 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:39:46,061-Speed 2496.86 samples/sec Loss 3.5101 LearningRate 0.000686 Epoch: 10 Global Step: 211010 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:39:54,261-Speed 2498.13 samples/sec Loss 3.5398 LearningRate 0.000686 Epoch: 10 Global Step: 211020 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:40:02,411-Speed 2514.00 samples/sec Loss 3.6399 LearningRate 0.000686 Epoch: 10 Global Step: 211030 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:40:10,620-Speed 2495.25 samples/sec Loss 3.5716 LearningRate 0.000686 Epoch: 10 Global Step: 211040 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:40:18,818-Speed 2498.72 samples/sec Loss 3.5843 LearningRate 0.000686 Epoch: 10 Global Step: 211050 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:40:27,017-Speed 2498.70 samples/sec Loss 3.5470 LearningRate 0.000686 Epoch: 10 Global Step: 211060 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:40:35,216-Speed 2498.02 samples/sec Loss 3.5798 LearningRate 0.000686 Epoch: 10 Global Step: 211070 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:40:43,416-Speed 2498.08 samples/sec Loss 3.5426 LearningRate 0.000686 Epoch: 10 Global Step: 211080 
Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:40:51,570-Speed 2512.02 samples/sec Loss 3.4888 LearningRate 0.000686 Epoch: 10 Global Step: 211090 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:40:59,785-Speed 2493.50 samples/sec Loss 3.5169 LearningRate 0.000686 Epoch: 10 Global Step: 211100 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:41:07,985-Speed 2497.98 samples/sec Loss 3.5115 LearningRate 0.000686 Epoch: 10 Global Step: 211110 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:41:16,184-Speed 2498.33 samples/sec Loss 3.5381 LearningRate 0.000686 Epoch: 10 Global Step: 211120 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:41:24,383-Speed 2498.21 samples/sec Loss 3.5035 LearningRate 0.000686 Epoch: 10 Global Step: 211130 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:41:32,583-Speed 2497.92 samples/sec Loss 3.5626 LearningRate 0.000686 Epoch: 10 Global Step: 211140 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:41:40,738-Speed 2511.61 samples/sec Loss 3.5053 LearningRate 0.000686 Epoch: 10 Global Step: 211150 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:41:48,936-Speed 2498.58 samples/sec Loss 3.5727 LearningRate 0.000686 Epoch: 10 Global Step: 211160 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:41:57,424-Speed 2500.47 samples/sec Loss 3.5363 LearningRate 0.000686 Epoch: 10 Global Step: 211170 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:42:07,075-Speed 2122.32 samples/sec Loss 3.6080 LearningRate 0.000686 Epoch: 10 Global Step: 211180 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:42:15,270-Speed 2499.58 samples/sec Loss 3.5915 LearningRate 0.000686 Epoch: 10 Global Step: 211190 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:42:24,135-Speed 2310.53 samples/sec Loss 3.5855 LearningRate 0.000686 Epoch: 10 Global Step: 
211200 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:42:32,684-Speed 2517.51 samples/sec Loss 3.6197 LearningRate 0.000686 Epoch: 10 Global Step: 211210 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:42:40,884-Speed 2498.26 samples/sec Loss 3.6472 LearningRate 0.000686 Epoch: 10 Global Step: 211220 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:42:49,091-Speed 2495.68 samples/sec Loss 3.6405 LearningRate 0.000686 Epoch: 10 Global Step: 211230 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:42:57,291-Speed 2497.95 samples/sec Loss 3.5684 LearningRate 0.000686 Epoch: 10 Global Step: 211240 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:43:05,490-Speed 2498.22 samples/sec Loss 3.5625 LearningRate 0.000686 Epoch: 10 Global Step: 211250 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:43:13,687-Speed 2499.06 samples/sec Loss 3.5289 LearningRate 0.000686 Epoch: 10 Global Step: 211260 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:43:21,834-Speed 2514.02 samples/sec Loss 3.5159 LearningRate 0.000686 Epoch: 10 Global Step: 211270 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:43:30,032-Speed 2498.45 samples/sec Loss 3.5242 LearningRate 0.000686 Epoch: 10 Global Step: 211280 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:43:38,232-Speed 2498.00 samples/sec Loss 3.5178 LearningRate 0.000686 Epoch: 10 Global Step: 211290 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:43:46,437-Speed 2496.55 samples/sec Loss 3.5109 LearningRate 0.000686 Epoch: 10 Global Step: 211300 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:43:54,640-Speed 2497.12 samples/sec Loss 3.5825 LearningRate 0.000686 Epoch: 10 Global Step: 211310 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:44:02,840-Speed 2497.86 samples/sec Loss 3.5795 LearningRate 0.000686 Epoch: 10 Global 
Step: 211320 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:44:10,985-Speed 2514.89 samples/sec Loss 3.5954 LearningRate 0.000686 Epoch: 10 Global Step: 211330 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:44:19,182-Speed 2498.52 samples/sec Loss 3.5122 LearningRate 0.000686 Epoch: 10 Global Step: 211340 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:44:27,381-Speed 2498.30 samples/sec Loss 3.5876 LearningRate 0.000686 Epoch: 10 Global Step: 211350 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:44:35,590-Speed 2495.39 samples/sec Loss 3.5702 LearningRate 0.000686 Epoch: 10 Global Step: 211360 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:44:43,785-Speed 2499.63 samples/sec Loss 3.5901 LearningRate 0.000686 Epoch: 10 Global Step: 211370 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:44:51,983-Speed 2498.41 samples/sec Loss 3.5578 LearningRate 0.000686 Epoch: 10 Global Step: 211380 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:45:00,124-Speed 2515.99 samples/sec Loss 3.5759 LearningRate 0.000686 Epoch: 10 Global Step: 211390 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:45:08,323-Speed 2498.54 samples/sec Loss 3.5834 LearningRate 0.000686 Epoch: 10 Global Step: 211400 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 14:45:16,478-Speed 2511.72 samples/sec Loss 3.4866 LearningRate 0.000686 Epoch: 10 Global Step: 211410 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:45:24,687-Speed 2495.04 samples/sec Loss 3.5146 LearningRate 0.000685 Epoch: 10 Global Step: 211420 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:45:32,885-Speed 2498.81 samples/sec Loss 3.5239 LearningRate 0.000685 Epoch: 10 Global Step: 211430 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:45:41,082-Speed 2498.53 samples/sec Loss 3.5110 LearningRate 0.000685 Epoch: 10 
Global Step: 211440 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:45:49,233-Speed 2513.23 samples/sec Loss 3.5477 LearningRate 0.000685 Epoch: 10 Global Step: 211450 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:45:57,432-Speed 2498.21 samples/sec Loss 3.5738 LearningRate 0.000685 Epoch: 10 Global Step: 211460 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:46:05,631-Speed 2498.13 samples/sec Loss 3.5427 LearningRate 0.000685 Epoch: 10 Global Step: 211470 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:46:13,832-Speed 2497.68 samples/sec Loss 3.5615 LearningRate 0.000685 Epoch: 10 Global Step: 211480 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:46:22,057-Speed 2490.26 samples/sec Loss 3.5457 LearningRate 0.000685 Epoch: 10 Global Step: 211490 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:46:30,260-Speed 2497.00 samples/sec Loss 3.6731 LearningRate 0.000685 Epoch: 10 Global Step: 211500 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:46:38,416-Speed 2511.16 samples/sec Loss 3.5240 LearningRate 0.000685 Epoch: 10 Global Step: 211510 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:46:46,626-Speed 2494.88 samples/sec Loss 3.5385 LearningRate 0.000685 Epoch: 10 Global Step: 211520 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:46:54,822-Speed 2499.19 samples/sec Loss 3.6679 LearningRate 0.000685 Epoch: 10 Global Step: 211530 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:47:03,021-Speed 2498.23 samples/sec Loss 3.6421 LearningRate 0.000685 Epoch: 10 Global Step: 211540 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:47:11,223-Speed 2497.47 samples/sec Loss 3.6151 LearningRate 0.000685 Epoch: 10 Global Step: 211550 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:47:19,420-Speed 2498.78 samples/sec Loss 3.6183 LearningRate 0.000685 
Epoch: 10 Global Step: 211560 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:47:27,577-Speed 2511.25 samples/sec Loss 3.5217 LearningRate 0.000685 Epoch: 10 Global Step: 211570 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:47:35,775-Speed 2498.32 samples/sec Loss 3.5337 LearningRate 0.000685 Epoch: 10 Global Step: 211580 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:47:43,985-Speed 2495.31 samples/sec Loss 3.5558 LearningRate 0.000685 Epoch: 10 Global Step: 211590 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:47:52,190-Speed 2496.51 samples/sec Loss 3.5114 LearningRate 0.000685 Epoch: 10 Global Step: 211600 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:48:00,391-Speed 2497.69 samples/sec Loss 3.6176 LearningRate 0.000685 Epoch: 10 Global Step: 211610 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:48:08,591-Speed 2497.86 samples/sec Loss 3.5852 LearningRate 0.000685 Epoch: 10 Global Step: 211620 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:48:16,737-Speed 2514.73 samples/sec Loss 3.5718 LearningRate 0.000685 Epoch: 10 Global Step: 211630 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:48:24,939-Speed 2497.28 samples/sec Loss 3.6041 LearningRate 0.000685 Epoch: 10 Global Step: 211640 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:48:33,143-Speed 2496.83 samples/sec Loss 3.5405 LearningRate 0.000685 Epoch: 10 Global Step: 211650 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:48:41,341-Speed 2498.59 samples/sec Loss 3.5558 LearningRate 0.000685 Epoch: 10 Global Step: 211660 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:48:49,540-Speed 2498.15 samples/sec Loss 3.5826 LearningRate 0.000685 Epoch: 10 Global Step: 211670 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:48:57,755-Speed 2493.41 samples/sec Loss 3.5723 LearningRate 
0.000685 Epoch: 10 Global Step: 211680 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:49:05,897-Speed 2515.82 samples/sec Loss 3.5435 LearningRate 0.000685 Epoch: 10 Global Step: 211690 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:49:14,096-Speed 2498.18 samples/sec Loss 3.4913 LearningRate 0.000685 Epoch: 10 Global Step: 211700 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:49:22,295-Speed 2498.31 samples/sec Loss 3.5735 LearningRate 0.000685 Epoch: 10 Global Step: 211710 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:49:30,496-Speed 2497.65 samples/sec Loss 3.4986 LearningRate 0.000685 Epoch: 10 Global Step: 211720 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:49:38,697-Speed 2497.68 samples/sec Loss 3.4838 LearningRate 0.000685 Epoch: 10 Global Step: 211730 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:49:46,897-Speed 2498.17 samples/sec Loss 3.5164 LearningRate 0.000685 Epoch: 10 Global Step: 211740 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:49:55,044-Speed 2514.35 samples/sec Loss 3.5563 LearningRate 0.000685 Epoch: 10 Global Step: 211750 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:50:03,244-Speed 2497.84 samples/sec Loss 3.4890 LearningRate 0.000685 Epoch: 10 Global Step: 211760 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:50:11,442-Speed 2498.59 samples/sec Loss 3.5335 LearningRate 0.000685 Epoch: 10 Global Step: 211770 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:50:19,652-Speed 2494.89 samples/sec Loss 3.5464 LearningRate 0.000685 Epoch: 10 Global Step: 211780 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:50:27,849-Speed 2498.86 samples/sec Loss 3.4496 LearningRate 0.000685 Epoch: 10 Global Step: 211790 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:50:36,049-Speed 2498.06 samples/sec Loss 3.5305 
LearningRate 0.000685 Epoch: 10 Global Step: 211800 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:50:44,194-Speed 2514.73 samples/sec Loss 3.5261 LearningRate 0.000685 Epoch: 10 Global Step: 211810 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:50:52,400-Speed 2496.05 samples/sec Loss 3.5794 LearningRate 0.000685 Epoch: 10 Global Step: 211820 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:51:00,604-Speed 2497.00 samples/sec Loss 3.5386 LearningRate 0.000685 Epoch: 10 Global Step: 211830 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:51:08,801-Speed 2498.65 samples/sec Loss 3.5640 LearningRate 0.000685 Epoch: 10 Global Step: 211840 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:51:17,001-Speed 2497.97 samples/sec Loss 3.5415 LearningRate 0.000685 Epoch: 10 Global Step: 211850 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:51:25,208-Speed 2495.98 samples/sec Loss 3.5831 LearningRate 0.000685 Epoch: 10 Global Step: 211860 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:51:33,354-Speed 2514.50 samples/sec Loss 3.5742 LearningRate 0.000684 Epoch: 10 Global Step: 211870 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:51:41,551-Speed 2498.87 samples/sec Loss 3.5813 LearningRate 0.000684 Epoch: 10 Global Step: 211880 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:51:49,750-Speed 2498.51 samples/sec Loss 3.5539 LearningRate 0.000684 Epoch: 10 Global Step: 211890 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:51:57,952-Speed 2497.13 samples/sec Loss 3.5689 LearningRate 0.000684 Epoch: 10 Global Step: 211900 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:52:06,153-Speed 2497.93 samples/sec Loss 3.5852 LearningRate 0.000684 Epoch: 10 Global Step: 211910 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:52:14,369-Speed 2493.03 samples/sec Loss 
3.5587 LearningRate 0.000684 Epoch: 10 Global Step: 211920 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:52:22,515-Speed 2514.59 samples/sec Loss 3.5491 LearningRate 0.000684 Epoch: 10 Global Step: 211930 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:52:30,717-Speed 2497.74 samples/sec Loss 3.5502 LearningRate 0.000684 Epoch: 10 Global Step: 211940 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:52:38,930-Speed 2494.26 samples/sec Loss 3.5350 LearningRate 0.000684 Epoch: 10 Global Step: 211950 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:52:47,130-Speed 2498.01 samples/sec Loss 3.4867 LearningRate 0.000684 Epoch: 10 Global Step: 211960 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:52:55,337-Speed 2495.87 samples/sec Loss 3.5101 LearningRate 0.000684 Epoch: 10 Global Step: 211970 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:53:03,544-Speed 2495.69 samples/sec Loss 3.4721 LearningRate 0.000684 Epoch: 10 Global Step: 211980 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:53:11,690-Speed 2514.51 samples/sec Loss 3.5270 LearningRate 0.000684 Epoch: 10 Global Step: 211990 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:53:19,891-Speed 2497.63 samples/sec Loss 3.5529 LearningRate 0.000684 Epoch: 10 Global Step: 212000 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:53:28,094-Speed 2497.00 samples/sec Loss 3.4764 LearningRate 0.000684 Epoch: 10 Global Step: 212010 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:53:36,297-Speed 2497.01 samples/sec Loss 3.4888 LearningRate 0.000684 Epoch: 10 Global Step: 212020 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:53:44,496-Speed 2498.45 samples/sec Loss 3.4482 LearningRate 0.000684 Epoch: 10 Global Step: 212030 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:53:52,693-Speed 2498.83 samples/sec 
Loss 3.5214 LearningRate 0.000684 Epoch: 10 Global Step: 212040 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:54:00,841-Speed 2513.82 samples/sec Loss 3.4308 LearningRate 0.000684 Epoch: 10 Global Step: 212050 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:54:09,039-Speed 2498.59 samples/sec Loss 3.4641 LearningRate 0.000684 Epoch: 10 Global Step: 212060 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:54:17,238-Speed 2498.14 samples/sec Loss 3.5076 LearningRate 0.000684 Epoch: 10 Global Step: 212070 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:54:25,433-Speed 2499.90 samples/sec Loss 3.5548 LearningRate 0.000684 Epoch: 10 Global Step: 212080 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:54:33,631-Speed 2498.69 samples/sec Loss 3.5187 LearningRate 0.000684 Epoch: 10 Global Step: 212090 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:54:41,827-Speed 2499.33 samples/sec Loss 3.5422 LearningRate 0.000684 Epoch: 10 Global Step: 212100 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:54:49,974-Speed 2514.32 samples/sec Loss 3.5384 LearningRate 0.000684 Epoch: 10 Global Step: 212110 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:54:58,171-Speed 2498.83 samples/sec Loss 3.4621 LearningRate 0.000684 Epoch: 10 Global Step: 212120 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:55:06,370-Speed 2498.38 samples/sec Loss 3.5158 LearningRate 0.000684 Epoch: 10 Global Step: 212130 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:55:14,569-Speed 2498.32 samples/sec Loss 3.5628 LearningRate 0.000684 Epoch: 10 Global Step: 212140 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:55:22,768-Speed 2498.28 samples/sec Loss 3.5498 LearningRate 0.000684 Epoch: 10 Global Step: 212150 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:55:30,967-Speed 2498.19 
samples/sec Loss 3.5140 LearningRate 0.000684 Epoch: 10 Global Step: 212160 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:55:39,121-Speed 2512.08 samples/sec Loss 3.4602 LearningRate 0.000684 Epoch: 10 Global Step: 212170 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:55:47,320-Speed 2498.63 samples/sec Loss 3.4986 LearningRate 0.000684 Epoch: 10 Global Step: 212180 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:55:55,519-Speed 2498.09 samples/sec Loss 3.5466 LearningRate 0.000684 Epoch: 10 Global Step: 212190 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:56:03,717-Speed 2498.75 samples/sec Loss 3.5096 LearningRate 0.000684 Epoch: 10 Global Step: 212200 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:56:11,919-Speed 2497.55 samples/sec Loss 3.5014 LearningRate 0.000684 Epoch: 10 Global Step: 212210 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:56:20,116-Speed 2498.93 samples/sec Loss 3.5338 LearningRate 0.000684 Epoch: 10 Global Step: 212220 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:56:28,261-Speed 2514.58 samples/sec Loss 3.5291 LearningRate 0.000684 Epoch: 10 Global Step: 212230 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:56:36,462-Speed 2497.90 samples/sec Loss 3.5986 LearningRate 0.000684 Epoch: 10 Global Step: 212240 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:56:44,659-Speed 2498.88 samples/sec Loss 3.5601 LearningRate 0.000684 Epoch: 10 Global Step: 212250 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:56:52,860-Speed 2497.71 samples/sec Loss 3.5476 LearningRate 0.000684 Epoch: 10 Global Step: 212260 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:57:01,072-Speed 2494.14 samples/sec Loss 3.5428 LearningRate 0.000684 Epoch: 10 Global Step: 212270 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:57:09,272-Speed 
2497.95 samples/sec Loss 3.6218 LearningRate 0.000684 Epoch: 10 Global Step: 212280 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:57:17,417-Speed 2514.91 samples/sec Loss 3.5741 LearningRate 0.000684 Epoch: 10 Global Step: 212290 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:57:25,616-Speed 2498.21 samples/sec Loss 3.5643 LearningRate 0.000684 Epoch: 10 Global Step: 212300 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:57:33,836-Speed 2491.84 samples/sec Loss 3.5388 LearningRate 0.000684 Epoch: 10 Global Step: 212310 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:57:42,034-Speed 2498.86 samples/sec Loss 3.5737 LearningRate 0.000683 Epoch: 10 Global Step: 212320 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:57:50,233-Speed 2498.20 samples/sec Loss 3.6385 LearningRate 0.000683 Epoch: 10 Global Step: 212330 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:57:58,435-Speed 2497.34 samples/sec Loss 3.4829 LearningRate 0.000683 Epoch: 10 Global Step: 212340 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:58:06,592-Speed 2511.17 samples/sec Loss 3.4753 LearningRate 0.000683 Epoch: 10 Global Step: 212350 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:58:14,788-Speed 2498.88 samples/sec Loss 3.6020 LearningRate 0.000683 Epoch: 10 Global Step: 212360 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:58:22,986-Speed 2498.75 samples/sec Loss 3.5101 LearningRate 0.000683 Epoch: 10 Global Step: 212370 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:58:31,185-Speed 2498.49 samples/sec Loss 3.4930 LearningRate 0.000683 Epoch: 10 Global Step: 212380 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:58:39,383-Speed 2498.55 samples/sec Loss 3.4463 LearningRate 0.000683 Epoch: 10 Global Step: 212390 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 
14:58:47,582-Speed 2498.20 samples/sec Loss 3.5460 LearningRate 0.000683 Epoch: 10 Global Step: 212400 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:58:55,725-Speed 2515.40 samples/sec Loss 3.5042 LearningRate 0.000683 Epoch: 10 Global Step: 212410 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:59:03,927-Speed 2497.23 samples/sec Loss 3.5333 LearningRate 0.000683 Epoch: 10 Global Step: 212420 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:59:12,126-Speed 2498.57 samples/sec Loss 3.4769 LearningRate 0.000683 Epoch: 10 Global Step: 212430 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:59:20,325-Speed 2498.23 samples/sec Loss 3.5284 LearningRate 0.000683 Epoch: 10 Global Step: 212440 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:59:28,524-Speed 2498.53 samples/sec Loss 3.5965 LearningRate 0.000683 Epoch: 10 Global Step: 212450 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:59:36,723-Speed 2498.29 samples/sec Loss 3.5220 LearningRate 0.000683 Epoch: 10 Global Step: 212460 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:59:44,871-Speed 2513.78 samples/sec Loss 3.5183 LearningRate 0.000683 Epoch: 10 Global Step: 212470 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 14:59:53,073-Speed 2497.47 samples/sec Loss 3.5930 LearningRate 0.000683 Epoch: 10 Global Step: 212480 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:00:01,272-Speed 2498.10 samples/sec Loss 3.5744 LearningRate 0.000683 Epoch: 10 Global Step: 212490 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:00:09,483-Speed 2494.69 samples/sec Loss 3.6181 LearningRate 0.000683 Epoch: 10 Global Step: 212500 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:00:17,684-Speed 2497.68 samples/sec Loss 3.5387 LearningRate 0.000683 Epoch: 10 Global Step: 212510 Fp16 Grad Scale: 32768 Required: 141 hours Training: 
2022-07-07 15:00:25,882-Speed 2498.56 samples/sec Loss 3.6056 LearningRate 0.000683 Epoch: 10 Global Step: 212520 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:00:34,026-Speed 2514.90 samples/sec Loss 3.5362 LearningRate 0.000683 Epoch: 10 Global Step: 212530 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:00:42,234-Speed 2495.50 samples/sec Loss 3.5181 LearningRate 0.000683 Epoch: 10 Global Step: 212540 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:00:50,436-Speed 2497.89 samples/sec Loss 3.5958 LearningRate 0.000683 Epoch: 10 Global Step: 212550 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:00:58,636-Speed 2498.29 samples/sec Loss 3.6151 LearningRate 0.000683 Epoch: 10 Global Step: 212560 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:01:06,833-Speed 2498.64 samples/sec Loss 3.5255 LearningRate 0.000683 Epoch: 10 Global Step: 212570 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:01:15,033-Speed 2497.95 samples/sec Loss 3.5568 LearningRate 0.000683 Epoch: 10 Global Step: 212580 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:01:23,182-Speed 2513.82 samples/sec Loss 3.5879 LearningRate 0.000683 Epoch: 10 Global Step: 212590 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:01:31,392-Speed 2494.71 samples/sec Loss 3.5461 LearningRate 0.000683 Epoch: 10 Global Step: 212600 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:01:39,592-Speed 2498.22 samples/sec Loss 3.5340 LearningRate 0.000683 Epoch: 10 Global Step: 212610 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:01:47,789-Speed 2498.63 samples/sec Loss 3.4606 LearningRate 0.000683 Epoch: 10 Global Step: 212620 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:01:55,997-Speed 2495.66 samples/sec Loss 3.5555 LearningRate 0.000683 Epoch: 10 Global Step: 212630 Fp16 Grad Scale: 65536 Required: 141 hours 
Training: 2022-07-07 15:02:04,192-Speed 2499.78 samples/sec Loss 3.5640 LearningRate 0.000683 Epoch: 10 Global Step: 212640 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:02:12,342-Speed 2513.11 samples/sec Loss 3.5219 LearningRate 0.000683 Epoch: 10 Global Step: 212650 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:02:20,546-Speed 2496.73 samples/sec Loss 3.5956 LearningRate 0.000683 Epoch: 10 Global Step: 212660 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:02:28,755-Speed 2495.50 samples/sec Loss 3.5253 LearningRate 0.000683 Epoch: 10 Global Step: 212670 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:02:36,952-Speed 2498.84 samples/sec Loss 3.5814 LearningRate 0.000683 Epoch: 10 Global Step: 212680 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:02:45,147-Speed 2499.55 samples/sec Loss 3.6525 LearningRate 0.000683 Epoch: 10 Global Step: 212690 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:02:53,345-Speed 2498.39 samples/sec Loss 3.5956 LearningRate 0.000683 Epoch: 10 Global Step: 212700 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:03:01,499-Speed 2512.25 samples/sec Loss 3.6831 LearningRate 0.000683 Epoch: 10 Global Step: 212710 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:03:09,694-Speed 2499.58 samples/sec Loss 3.6270 LearningRate 0.000683 Epoch: 10 Global Step: 212720 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:03:17,894-Speed 2497.80 samples/sec Loss 3.6294 LearningRate 0.000683 Epoch: 10 Global Step: 212730 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:03:26,092-Speed 2498.59 samples/sec Loss 3.5558 LearningRate 0.000683 Epoch: 10 Global Step: 212740 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:03:34,292-Speed 2498.13 samples/sec Loss 3.5732 LearningRate 0.000683 Epoch: 10 Global Step: 212750 Fp16 Grad Scale: 65536 Required: 141 
hours Training: 2022-07-07 15:03:42,506-Speed 2493.71 samples/sec Loss 3.6121 LearningRate 0.000683 Epoch: 10 Global Step: 212760 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:03:50,646-Speed 2516.41 samples/sec Loss 3.5739 LearningRate 0.000682 Epoch: 10 Global Step: 212770 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:03:58,843-Speed 2499.82 samples/sec Loss 3.5449 LearningRate 0.000682 Epoch: 10 Global Step: 212780 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:04:07,042-Speed 2498.25 samples/sec Loss 3.5741 LearningRate 0.000682 Epoch: 10 Global Step: 212790 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:04:15,238-Speed 2498.91 samples/sec Loss 3.5677 LearningRate 0.000682 Epoch: 10 Global Step: 212800 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:04:23,437-Speed 2498.37 samples/sec Loss 3.5683 LearningRate 0.000682 Epoch: 10 Global Step: 212810 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:04:31,635-Speed 2498.41 samples/sec Loss 3.5184 LearningRate 0.000682 Epoch: 10 Global Step: 212820 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:04:39,799-Speed 2509.38 samples/sec Loss 3.5470 LearningRate 0.000682 Epoch: 10 Global Step: 212830 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:04:48,000-Speed 2497.86 samples/sec Loss 3.5601 LearningRate 0.000682 Epoch: 10 Global Step: 212840 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:04:56,196-Speed 2499.08 samples/sec Loss 3.5012 LearningRate 0.000682 Epoch: 10 Global Step: 212850 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:05:04,393-Speed 2499.03 samples/sec Loss 3.5360 LearningRate 0.000682 Epoch: 10 Global Step: 212860 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:05:12,609-Speed 2493.13 samples/sec Loss 3.4978 LearningRate 0.000682 Epoch: 10 Global Step: 212870 Fp16 Grad Scale: 65536 Required: 
141 hours Training: 2022-07-07 15:05:20,809-Speed 2497.73 samples/sec Loss 3.5062 LearningRate 0.000682 Epoch: 10 Global Step: 212880 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:05:28,955-Speed 2514.44 samples/sec Loss 3.5334 LearningRate 0.000682 Epoch: 10 Global Step: 212890 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:05:37,150-Speed 2499.66 samples/sec Loss 3.5301 LearningRate 0.000682 Epoch: 10 Global Step: 212900 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:05:45,353-Speed 2496.98 samples/sec Loss 3.5825 LearningRate 0.000682 Epoch: 10 Global Step: 212910 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:05:53,547-Speed 2499.75 samples/sec Loss 3.4602 LearningRate 0.000682 Epoch: 10 Global Step: 212920 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:06:01,742-Speed 2499.49 samples/sec Loss 3.4814 LearningRate 0.000682 Epoch: 10 Global Step: 212930 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:06:09,946-Speed 2497.06 samples/sec Loss 3.5355 LearningRate 0.000682 Epoch: 10 Global Step: 212940 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:06:18,090-Speed 2515.14 samples/sec Loss 3.6030 LearningRate 0.000682 Epoch: 10 Global Step: 212950 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:06:26,288-Speed 2498.55 samples/sec Loss 3.5614 LearningRate 0.000682 Epoch: 10 Global Step: 212960 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:06:34,487-Speed 2498.25 samples/sec Loss 3.4979 LearningRate 0.000682 Epoch: 10 Global Step: 212970 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:06:42,684-Speed 2499.10 samples/sec Loss 3.5532 LearningRate 0.000682 Epoch: 10 Global Step: 212980 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:06:50,882-Speed 2498.45 samples/sec Loss 3.4620 LearningRate 0.000682 Epoch: 10 Global Step: 212990 Fp16 Grad Scale: 65536 
Required: 141 hours Training: 2022-07-07 15:06:59,081-Speed 2498.67 samples/sec Loss 3.5027 LearningRate 0.000682 Epoch: 10 Global Step: 213000 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:07:07,236-Speed 2511.68 samples/sec Loss 3.4806 LearningRate 0.000682 Epoch: 10 Global Step: 213010 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:07:15,440-Speed 2497.12 samples/sec Loss 3.5075 LearningRate 0.000682 Epoch: 10 Global Step: 213020 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:07:23,631-Speed 2500.45 samples/sec Loss 3.4875 LearningRate 0.000682 Epoch: 10 Global Step: 213030 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:07:31,828-Speed 2499.09 samples/sec Loss 3.5387 LearningRate 0.000682 Epoch: 10 Global Step: 213040 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:07:40,031-Speed 2497.02 samples/sec Loss 3.5986 LearningRate 0.000682 Epoch: 10 Global Step: 213050 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:07:48,229-Speed 2498.79 samples/sec Loss 3.5549 LearningRate 0.000682 Epoch: 10 Global Step: 213060 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:07:56,373-Speed 2514.99 samples/sec Loss 3.5065 LearningRate 0.000682 Epoch: 10 Global Step: 213070 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:08:04,571-Speed 2498.49 samples/sec Loss 3.4956 LearningRate 0.000682 Epoch: 10 Global Step: 213080 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:08:12,769-Speed 2498.56 samples/sec Loss 3.5575 LearningRate 0.000682 Epoch: 10 Global Step: 213090 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:08:20,967-Speed 2498.78 samples/sec Loss 3.5908 LearningRate 0.000682 Epoch: 10 Global Step: 213100 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:08:29,163-Speed 2499.16 samples/sec Loss 3.4931 LearningRate 0.000682 Epoch: 10 Global Step: 213110 Fp16 Grad Scale: 
65536 Required: 141 hours Training: 2022-07-07 15:08:37,365-Speed 2497.18 samples/sec Loss 3.4605 LearningRate 0.000682 Epoch: 10 Global Step: 213120 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:08:45,512-Speed 2514.28 samples/sec Loss 3.5798 LearningRate 0.000682 Epoch: 10 Global Step: 213130 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:08:53,711-Speed 2498.50 samples/sec Loss 3.5409 LearningRate 0.000682 Epoch: 10 Global Step: 213140 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:09:01,908-Speed 2498.85 samples/sec Loss 3.5226 LearningRate 0.000682 Epoch: 10 Global Step: 213150 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:09:10,103-Speed 2499.42 samples/sec Loss 3.5448 LearningRate 0.000682 Epoch: 10 Global Step: 213160 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:09:18,305-Speed 2497.45 samples/sec Loss 3.5374 LearningRate 0.000682 Epoch: 10 Global Step: 213170 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:09:26,504-Speed 2498.17 samples/sec Loss 3.5218 LearningRate 0.000682 Epoch: 10 Global Step: 213180 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:09:34,649-Speed 2514.87 samples/sec Loss 3.6015 LearningRate 0.000682 Epoch: 10 Global Step: 213190 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:09:42,846-Speed 2498.79 samples/sec Loss 3.5432 LearningRate 0.000682 Epoch: 10 Global Step: 213200 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:09:51,040-Speed 2499.91 samples/sec Loss 3.5580 LearningRate 0.000682 Epoch: 10 Global Step: 213210 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:09:59,239-Speed 2498.31 samples/sec Loss 3.5114 LearningRate 0.000681 Epoch: 10 Global Step: 213220 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:10:07,435-Speed 2499.07 samples/sec Loss 3.5243 LearningRate 0.000681 Epoch: 10 Global Step: 213230 Fp16 Grad 
Scale: 65536 Required: 141 hours Training: 2022-07-07 15:10:15,633-Speed 2498.49 samples/sec Loss 3.5044 LearningRate 0.000681 Epoch: 10 Global Step: 213240 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:10:23,775-Speed 2515.74 samples/sec Loss 3.5128 LearningRate 0.000681 Epoch: 10 Global Step: 213250 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:10:31,977-Speed 2497.33 samples/sec Loss 3.5547 LearningRate 0.000681 Epoch: 10 Global Step: 213260 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:10:40,175-Speed 2498.74 samples/sec Loss 3.5141 LearningRate 0.000681 Epoch: 10 Global Step: 213270 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:10:48,372-Speed 2498.80 samples/sec Loss 3.5229 LearningRate 0.000681 Epoch: 10 Global Step: 213280 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:10:56,570-Speed 2498.56 samples/sec Loss 3.5278 LearningRate 0.000681 Epoch: 10 Global Step: 213290 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:11:04,765-Speed 2499.29 samples/sec Loss 3.4666 LearningRate 0.000681 Epoch: 10 Global Step: 213300 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:11:12,926-Speed 2510.02 samples/sec Loss 3.6487 LearningRate 0.000681 Epoch: 10 Global Step: 213310 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:11:21,134-Speed 2495.61 samples/sec Loss 3.5717 LearningRate 0.000681 Epoch: 10 Global Step: 213320 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:11:29,331-Speed 2498.85 samples/sec Loss 3.5168 LearningRate 0.000681 Epoch: 10 Global Step: 213330 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:11:37,538-Speed 2495.81 samples/sec Loss 3.5018 LearningRate 0.000681 Epoch: 10 Global Step: 213340 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:11:45,733-Speed 2499.71 samples/sec Loss 3.5050 LearningRate 0.000681 Epoch: 10 Global Step: 213350 Fp16 
Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:11:53,930-Speed 2498.99 samples/sec Loss 3.5782 LearningRate 0.000681 Epoch: 10 Global Step: 213360 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:12:02,075-Speed 2514.88 samples/sec Loss 3.5706 LearningRate 0.000681 Epoch: 10 Global Step: 213370 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:12:10,278-Speed 2496.80 samples/sec Loss 3.5824 LearningRate 0.000681 Epoch: 10 Global Step: 213380 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:12:18,493-Speed 2493.23 samples/sec Loss 3.4797 LearningRate 0.000681 Epoch: 10 Global Step: 213390 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:12:26,697-Speed 2497.39 samples/sec Loss 3.5243 LearningRate 0.000681 Epoch: 10 Global Step: 213400 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:12:34,895-Speed 2498.35 samples/sec Loss 3.5572 LearningRate 0.000681 Epoch: 10 Global Step: 213410 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:12:43,120-Speed 2490.25 samples/sec Loss 3.5367 LearningRate 0.000681 Epoch: 10 Global Step: 213420 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:12:51,265-Speed 2515.57 samples/sec Loss 3.4547 LearningRate 0.000681 Epoch: 10 Global Step: 213430 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:12:59,463-Speed 2498.38 samples/sec Loss 3.5400 LearningRate 0.000681 Epoch: 10 Global Step: 213440 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:13:07,661-Speed 2498.60 samples/sec Loss 3.5322 LearningRate 0.000681 Epoch: 10 Global Step: 213450 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:13:15,858-Speed 2498.96 samples/sec Loss 3.5225 LearningRate 0.000681 Epoch: 10 Global Step: 213460 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:13:24,058-Speed 2498.34 samples/sec Loss 3.4716 LearningRate 0.000681 Epoch: 10 Global Step: 213470 
Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:13:32,255-Speed 2498.57 samples/sec Loss 3.5384 LearningRate 0.000681 Epoch: 10 Global Step: 213480 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:13:40,395-Speed 2516.40 samples/sec Loss 3.5028 LearningRate 0.000681 Epoch: 10 Global Step: 213490 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:13:48,592-Speed 2498.92 samples/sec Loss 3.5230 LearningRate 0.000681 Epoch: 10 Global Step: 213500 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:13:56,785-Speed 2500.15 samples/sec Loss 3.4812 LearningRate 0.000681 Epoch: 10 Global Step: 213510 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:14:05,336-Speed 2500.51 samples/sec Loss 3.4744 LearningRate 0.000681 Epoch: 10 Global Step: 213520 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:14:17,442-Speed 1697.10 samples/sec Loss 3.5554 LearningRate 0.000681 Epoch: 10 Global Step: 213530 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:14:25,635-Speed 2500.00 samples/sec Loss 3.5296 LearningRate 0.000681 Epoch: 10 Global Step: 213540 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:14:33,797-Speed 2517.96 samples/sec Loss 3.5448 LearningRate 0.000681 Epoch: 10 Global Step: 213550 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:14:42,008-Speed 2501.74 samples/sec Loss 3.5973 LearningRate 0.000681 Epoch: 10 Global Step: 213560 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:14:50,202-Speed 2499.58 samples/sec Loss 3.5699 LearningRate 0.000681 Epoch: 10 Global Step: 213570 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:14:58,454-Speed 2501.18 samples/sec Loss 3.4741 LearningRate 0.000681 Epoch: 10 Global Step: 213580 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:15:07,953-Speed 2503.10 samples/sec Loss 3.5259 LearningRate 0.000681 Epoch: 10 Global Step: 
213590 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:15:16,147-Speed 2499.90 samples/sec Loss 3.5529 LearningRate 0.000681 Epoch: 10 Global Step: 213600 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:15:24,323-Speed 2516.96 samples/sec Loss 3.5127 LearningRate 0.000681 Epoch: 10 Global Step: 213610 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:15:32,571-Speed 2501.81 samples/sec Loss 3.5176 LearningRate 0.000681 Epoch: 10 Global Step: 213620 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:15:42,320-Speed 2116.85 samples/sec Loss 3.4810 LearningRate 0.000681 Epoch: 10 Global Step: 213630 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:15:50,514-Speed 2499.89 samples/sec Loss 3.5422 LearningRate 0.000681 Epoch: 10 Global Step: 213640 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:15:58,731-Speed 2501.08 samples/sec Loss 3.5023 LearningRate 0.000681 Epoch: 10 Global Step: 213650 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:16:10,020-Speed 1831.26 samples/sec Loss 3.5215 LearningRate 0.000681 Epoch: 10 Global Step: 213660 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:16:18,242-Speed 2518.48 samples/sec Loss 3.5122 LearningRate 0.000680 Epoch: 10 Global Step: 213670 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:16:26,437-Speed 2499.36 samples/sec Loss 3.5500 LearningRate 0.000680 Epoch: 10 Global Step: 213680 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:16:37,594-Speed 2501.36 samples/sec Loss 3.4932 LearningRate 0.000680 Epoch: 10 Global Step: 213690 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:16:45,814-Speed 2501.51 samples/sec Loss 3.5585 LearningRate 0.000680 Epoch: 10 Global Step: 213700 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:16:54,008-Speed 2499.46 samples/sec Loss 3.5833 LearningRate 0.000680 Epoch: 10 Global 
Step: 213710 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:17:02,210-Speed 2497.47 samples/sec Loss 3.5798 LearningRate 0.000680 Epoch: 10 Global Step: 213720 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:17:10,389-Speed 2515.27 samples/sec Loss 3.5201 LearningRate 0.000680 Epoch: 10 Global Step: 213730 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:17:21,605-Speed 1833.67 samples/sec Loss 3.5236 LearningRate 0.000680 Epoch: 10 Global Step: 213740 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:17:29,804-Speed 2498.12 samples/sec Loss 3.5177 LearningRate 0.000680 Epoch: 10 Global Step: 213750 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:17:38,046-Speed 2500.90 samples/sec Loss 3.5399 LearningRate 0.000680 Epoch: 10 Global Step: 213760 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:17:49,186-Speed 2482.88 samples/sec Loss 3.5205 LearningRate 0.000680 Epoch: 10 Global Step: 213770 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:17:57,384-Speed 2498.68 samples/sec Loss 3.5159 LearningRate 0.000680 Epoch: 10 Global Step: 213780 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:18:05,619-Speed 2512.50 samples/sec Loss 3.5483 LearningRate 0.000680 Epoch: 10 Global Step: 213790 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:18:13,843-Speed 2499.42 samples/sec Loss 3.5234 LearningRate 0.000680 Epoch: 10 Global Step: 213800 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:18:22,048-Speed 2496.54 samples/sec Loss 3.5406 LearningRate 0.000680 Epoch: 10 Global Step: 213810 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:18:30,274-Speed 2496.78 samples/sec Loss 3.4648 LearningRate 0.000680 Epoch: 10 Global Step: 213820 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:18:38,499-Speed 2499.74 samples/sec Loss 3.5505 LearningRate 0.000680 Epoch: 10 
Global Step: 213830 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:18:46,743-Speed 2498.58 samples/sec Loss 3.4549 LearningRate 0.000680 Epoch: 10 Global Step: 213840 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:18:54,898-Speed 2511.66 samples/sec Loss 3.5233 LearningRate 0.000680 Epoch: 10 Global Step: 213850 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:19:07,449-Speed 1639.46 samples/sec Loss 3.5219 LearningRate 0.000680 Epoch: 10 Global Step: 213860 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:19:15,695-Speed 2499.22 samples/sec Loss 3.4784 LearningRate 0.000680 Epoch: 10 Global Step: 213870 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:19:23,927-Speed 2495.00 samples/sec Loss 3.4879 LearningRate 0.000680 Epoch: 10 Global Step: 213880 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:19:32,128-Speed 2497.50 samples/sec Loss 3.4707 LearningRate 0.000680 Epoch: 10 Global Step: 213890 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:19:40,542-Speed 2464.08 samples/sec Loss 3.4954 LearningRate 0.000680 Epoch: 10 Global Step: 213900 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:19:48,705-Speed 2509.59 samples/sec Loss 3.5016 LearningRate 0.000680 Epoch: 10 Global Step: 213910 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:19:56,910-Speed 2496.34 samples/sec Loss 3.5418 LearningRate 0.000680 Epoch: 10 Global Step: 213920 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:20:05,114-Speed 2496.77 samples/sec Loss 3.5355 LearningRate 0.000680 Epoch: 10 Global Step: 213930 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:20:13,319-Speed 2496.38 samples/sec Loss 3.5352 LearningRate 0.000680 Epoch: 10 Global Step: 213940 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:20:21,525-Speed 2496.28 samples/sec Loss 3.4819 LearningRate 
0.000680 Epoch: 10 Global Step: 213950 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:20:29,750-Speed 2490.24 samples/sec Loss 3.4509 LearningRate 0.000680 Epoch: 10 Global Step: 213960 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:20:37,901-Speed 2512.89 samples/sec Loss 3.4204 LearningRate 0.000680 Epoch: 10 Global Step: 213970 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:20:46,106-Speed 2496.57 samples/sec Loss 3.5512 LearningRate 0.000680 Epoch: 10 Global Step: 213980 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:20:54,313-Speed 2495.89 samples/sec Loss 3.5105 LearningRate 0.000680 Epoch: 10 Global Step: 213990 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:21:02,517-Speed 2496.36 samples/sec Loss 3.5204 LearningRate 0.000680 Epoch: 10 Global Step: 214000 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:21:10,720-Speed 2497.16 samples/sec Loss 3.4951 LearningRate 0.000680 Epoch: 10 Global Step: 214010 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:21:18,931-Speed 2494.83 samples/sec Loss 3.5090 LearningRate 0.000680 Epoch: 10 Global Step: 214020 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:21:27,085-Speed 2511.97 samples/sec Loss 3.4939 LearningRate 0.000680 Epoch: 10 Global Step: 214030 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:21:35,295-Speed 2494.89 samples/sec Loss 3.5303 LearningRate 0.000680 Epoch: 10 Global Step: 214040 Fp16 Grad Scale: 131072 Required: 141 hours Training: 2022-07-07 15:21:43,456-Speed 2510.18 samples/sec Loss 3.5726 LearningRate 0.000680 Epoch: 10 Global Step: 214050 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:21:51,658-Speed 2497.14 samples/sec Loss 3.4786 LearningRate 0.000680 Epoch: 10 Global Step: 214060 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:21:59,863-Speed 2496.34 samples/sec Loss 
3.5460 LearningRate 0.000680 Epoch: 10 Global Step: 214070 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:22:08,070-Speed 2496.12 samples/sec Loss 3.5505 LearningRate 0.000680 Epoch: 10 Global Step: 214080 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:22:16,226-Speed 2511.40 samples/sec Loss 3.5338 LearningRate 0.000680 Epoch: 10 Global Step: 214090 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:22:24,426-Speed 2498.30 samples/sec Loss 3.5463 LearningRate 0.000680 Epoch: 10 Global Step: 214100 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:22:32,635-Speed 2495.30 samples/sec Loss 3.5282 LearningRate 0.000680 Epoch: 10 Global Step: 214110 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:22:40,843-Speed 2495.35 samples/sec Loss 3.5730 LearningRate 0.000680 Epoch: 10 Global Step: 214120 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:22:49,047-Speed 2496.86 samples/sec Loss 3.5506 LearningRate 0.000679 Epoch: 10 Global Step: 214130 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:22:57,254-Speed 2495.83 samples/sec Loss 3.5818 LearningRate 0.000679 Epoch: 10 Global Step: 214140 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:23:05,400-Speed 2514.27 samples/sec Loss 3.4722 LearningRate 0.000679 Epoch: 10 Global Step: 214150 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:23:13,603-Speed 2497.53 samples/sec Loss 3.6162 LearningRate 0.000679 Epoch: 10 Global Step: 214160 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:23:21,806-Speed 2496.98 samples/sec Loss 3.5484 LearningRate 0.000679 Epoch: 10 Global Step: 214170 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:23:30,013-Speed 2495.78 samples/sec Loss 3.5226 LearningRate 0.000679 Epoch: 10 Global Step: 214180 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:23:38,215-Speed 2497.12 samples/sec 
Loss 3.5829 LearningRate 0.000679 Epoch: 10 Global Step: 214190 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:23:46,416-Speed 2497.94 samples/sec Loss 3.5489 LearningRate 0.000679 Epoch: 10 Global Step: 214200 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:23:54,561-Speed 2515.17 samples/sec Loss 3.6224 LearningRate 0.000679 Epoch: 10 Global Step: 214210 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:24:02,762-Speed 2497.50 samples/sec Loss 3.5613 LearningRate 0.000679 Epoch: 10 Global Step: 214220 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:24:10,961-Speed 2498.06 samples/sec Loss 3.5570 LearningRate 0.000679 Epoch: 10 Global Step: 214230 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:24:19,165-Speed 2496.95 samples/sec Loss 3.5337 LearningRate 0.000679 Epoch: 10 Global Step: 214240 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:24:27,364-Speed 2498.18 samples/sec Loss 3.4927 LearningRate 0.000679 Epoch: 10 Global Step: 214250 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:24:35,565-Speed 2497.61 samples/sec Loss 3.5386 LearningRate 0.000679 Epoch: 10 Global Step: 214260 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:24:43,707-Speed 2515.77 samples/sec Loss 3.5628 LearningRate 0.000679 Epoch: 10 Global Step: 214270 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:24:51,907-Speed 2498.06 samples/sec Loss 3.5238 LearningRate 0.000679 Epoch: 10 Global Step: 214280 Fp16 Grad Scale: 65536 Required: 141 hours Training: 2022-07-07 15:25:00,061-Speed 2512.13 samples/sec Loss 3.5393 LearningRate 0.000679 Epoch: 10 Global Step: 214290 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:25:08,257-Speed 2499.03 samples/sec Loss 3.5089 LearningRate 0.000679 Epoch: 10 Global Step: 214300 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:25:16,453-Speed 2499.28 
samples/sec Loss 3.5339 LearningRate 0.000679 Epoch: 10 Global Step: 214310 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:25:24,649-Speed 2499.13 samples/sec Loss 3.4946 LearningRate 0.000679 Epoch: 10 Global Step: 214320 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:25:32,792-Speed 2515.32 samples/sec Loss 3.4837 LearningRate 0.000679 Epoch: 10 Global Step: 214330 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:25:40,989-Speed 2498.97 samples/sec Loss 3.4439 LearningRate 0.000679 Epoch: 10 Global Step: 214340 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:25:49,192-Speed 2497.22 samples/sec Loss 3.5759 LearningRate 0.000679 Epoch: 10 Global Step: 214350 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:25:57,392-Speed 2498.02 samples/sec Loss 3.4920 LearningRate 0.000679 Epoch: 10 Global Step: 214360 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:26:05,591-Speed 2498.21 samples/sec Loss 3.5176 LearningRate 0.000679 Epoch: 10 Global Step: 214370 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:26:13,791-Speed 2498.06 samples/sec Loss 3.4662 LearningRate 0.000679 Epoch: 10 Global Step: 214380 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:26:21,936-Speed 2515.13 samples/sec Loss 3.5469 LearningRate 0.000679 Epoch: 10 Global Step: 214390 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:26:30,136-Speed 2497.82 samples/sec Loss 3.4919 LearningRate 0.000679 Epoch: 10 Global Step: 214400 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:26:38,340-Speed 2496.86 samples/sec Loss 3.5326 LearningRate 0.000679 Epoch: 10 Global Step: 214410 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:26:46,549-Speed 2494.96 samples/sec Loss 3.4742 LearningRate 0.000679 Epoch: 10 Global Step: 214420 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:26:54,746-Speed 
2499.35 samples/sec Loss 3.5110 LearningRate 0.000679 Epoch: 10 Global Step: 214430 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:27:02,945-Speed 2498.10 samples/sec Loss 3.4475 LearningRate 0.000679 Epoch: 10 Global Step: 214440 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:27:11,105-Speed 2510.26 samples/sec Loss 3.4246 LearningRate 0.000679 Epoch: 10 Global Step: 214450 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:27:19,307-Speed 2497.38 samples/sec Loss 3.4742 LearningRate 0.000679 Epoch: 10 Global Step: 214460 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:27:27,505-Speed 2498.43 samples/sec Loss 3.4804 LearningRate 0.000679 Epoch: 10 Global Step: 214470 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:27:35,704-Speed 2498.18 samples/sec Loss 3.5126 LearningRate 0.000679 Epoch: 10 Global Step: 214480 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:27:43,931-Speed 2489.79 samples/sec Loss 3.5520 LearningRate 0.000679 Epoch: 10 Global Step: 214490 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:27:52,129-Speed 2498.45 samples/sec Loss 3.5523 LearningRate 0.000679 Epoch: 10 Global Step: 214500 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:28:00,278-Speed 2513.85 samples/sec Loss 3.5170 LearningRate 0.000679 Epoch: 10 Global Step: 214510 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:28:08,480-Speed 2497.20 samples/sec Loss 3.4614 LearningRate 0.000679 Epoch: 10 Global Step: 214520 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:28:16,682-Speed 2497.43 samples/sec Loss 3.4919 LearningRate 0.000679 Epoch: 10 Global Step: 214530 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:28:24,879-Speed 2499.53 samples/sec Loss 3.4760 LearningRate 0.000679 Epoch: 10 Global Step: 214540 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 
15:28:33,090-Speed 2494.57 samples/sec Loss 3.5429 LearningRate 0.000679 Epoch: 10 Global Step: 214550 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:28:41,300-Speed 2494.65 samples/sec Loss 3.5052 LearningRate 0.000679 Epoch: 10 Global Step: 214560 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:28:49,453-Speed 2512.26 samples/sec Loss 3.5035 LearningRate 0.000679 Epoch: 10 Global Step: 214570 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:28:57,655-Speed 2497.66 samples/sec Loss 3.5356 LearningRate 0.000678 Epoch: 10 Global Step: 214580 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:29:05,854-Speed 2498.33 samples/sec Loss 3.5164 LearningRate 0.000678 Epoch: 10 Global Step: 214590 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:29:14,052-Speed 2498.30 samples/sec Loss 3.5646 LearningRate 0.000678 Epoch: 10 Global Step: 214600 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:29:22,253-Speed 2497.99 samples/sec Loss 3.5803 LearningRate 0.000678 Epoch: 10 Global Step: 214610 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:29:30,459-Speed 2496.38 samples/sec Loss 3.6949 LearningRate 0.000678 Epoch: 10 Global Step: 214620 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:29:38,605-Speed 2514.50 samples/sec Loss 3.5740 LearningRate 0.000678 Epoch: 10 Global Step: 214630 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:29:46,800-Speed 2499.94 samples/sec Loss 3.5195 LearningRate 0.000678 Epoch: 10 Global Step: 214640 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:29:54,998-Speed 2498.60 samples/sec Loss 3.6835 LearningRate 0.000678 Epoch: 10 Global Step: 214650 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:30:03,197-Speed 2498.22 samples/sec Loss 3.5857 LearningRate 0.000678 Epoch: 10 Global Step: 214660 Fp16 Grad Scale: 32768 Required: 141 hours Training: 
2022-07-07 15:30:11,396-Speed 2498.25 samples/sec Loss 3.5762 LearningRate 0.000678 Epoch: 10 Global Step: 214670 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:30:19,595-Speed 2498.32 samples/sec Loss 3.5337 LearningRate 0.000678 Epoch: 10 Global Step: 214680 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:30:27,760-Speed 2508.45 samples/sec Loss 3.6110 LearningRate 0.000678 Epoch: 10 Global Step: 214690 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:30:35,971-Speed 2494.59 samples/sec Loss 3.5493 LearningRate 0.000678 Epoch: 10 Global Step: 214700 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:30:44,171-Speed 2498.05 samples/sec Loss 3.5662 LearningRate 0.000678 Epoch: 10 Global Step: 214710 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:30:52,369-Speed 2498.47 samples/sec Loss 3.5201 LearningRate 0.000678 Epoch: 10 Global Step: 214720 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:31:00,572-Speed 2496.99 samples/sec Loss 3.5138 LearningRate 0.000678 Epoch: 10 Global Step: 214730 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:31:08,782-Speed 2495.19 samples/sec Loss 3.6107 LearningRate 0.000678 Epoch: 10 Global Step: 214740 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:31:16,946-Speed 2508.81 samples/sec Loss 3.6001 LearningRate 0.000678 Epoch: 10 Global Step: 214750 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:31:25,148-Speed 2497.30 samples/sec Loss 3.5405 LearningRate 0.000678 Epoch: 10 Global Step: 214760 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:31:33,352-Speed 2496.94 samples/sec Loss 3.5513 LearningRate 0.000678 Epoch: 10 Global Step: 214770 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:31:41,550-Speed 2498.53 samples/sec Loss 3.5364 LearningRate 0.000678 Epoch: 10 Global Step: 214780 Fp16 Grad Scale: 32768 Required: 141 hours 
Training: 2022-07-07 15:31:49,752-Speed 2497.10 samples/sec Loss 3.4749 LearningRate 0.000678 Epoch: 10 Global Step: 214790 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:31:57,955-Speed 2497.05 samples/sec Loss 3.4648 LearningRate 0.000678 Epoch: 10 Global Step: 214800 Fp16 Grad Scale: 32768 Required: 141 hours Training: 2022-07-07 15:32:06,103-Speed 2513.72 samples/sec Loss 3.5520 LearningRate 0.000678 Epoch: 10 Global Step: 214810 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:32:14,308-Speed 2497.53 samples/sec Loss 3.5147 LearningRate 0.000678 Epoch: 10 Global Step: 214820 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:32:22,503-Speed 2499.31 samples/sec Loss 3.4430 LearningRate 0.000678 Epoch: 10 Global Step: 214830 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:32:30,708-Speed 2496.60 samples/sec Loss 3.5352 LearningRate 0.000678 Epoch: 10 Global Step: 214840 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:32:38,907-Speed 2498.37 samples/sec Loss 3.4447 LearningRate 0.000678 Epoch: 10 Global Step: 214850 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:32:47,107-Speed 2497.91 samples/sec Loss 3.5374 LearningRate 0.000678 Epoch: 10 Global Step: 214860 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:32:55,251-Speed 2515.08 samples/sec Loss 3.5430 LearningRate 0.000678 Epoch: 10 Global Step: 214870 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:33:03,458-Speed 2495.71 samples/sec Loss 3.5585 LearningRate 0.000678 Epoch: 10 Global Step: 214880 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:33:11,656-Speed 2498.73 samples/sec Loss 3.5152 LearningRate 0.000678 Epoch: 10 Global Step: 214890 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:33:19,860-Speed 2496.86 samples/sec Loss 3.5373 LearningRate 0.000678 Epoch: 10 Global Step: 214900 Fp16 Grad Scale: 32768 Required: 140 
hours Training: 2022-07-07 15:33:28,071-Speed 2494.48 samples/sec Loss 3.4798 LearningRate 0.000678 Epoch: 10 Global Step: 214910 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:33:36,269-Speed 2498.67 samples/sec Loss 3.4790 LearningRate 0.000678 Epoch: 10 Global Step: 214920 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:33:44,418-Speed 2513.51 samples/sec Loss 3.5125 LearningRate 0.000678 Epoch: 10 Global Step: 214930 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:33:52,617-Speed 2498.32 samples/sec Loss 3.4158 LearningRate 0.000678 Epoch: 10 Global Step: 214940 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:34:00,818-Speed 2497.63 samples/sec Loss 3.5125 LearningRate 0.000678 Epoch: 10 Global Step: 214950 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:34:09,018-Speed 2498.05 samples/sec Loss 3.4871 LearningRate 0.000678 Epoch: 10 Global Step: 214960 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:34:17,228-Speed 2494.61 samples/sec Loss 3.4852 LearningRate 0.000678 Epoch: 10 Global Step: 214970 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:34:25,429-Speed 2498.10 samples/sec Loss 3.4246 LearningRate 0.000678 Epoch: 10 Global Step: 214980 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:34:33,571-Speed 2515.71 samples/sec Loss 3.5935 LearningRate 0.000678 Epoch: 10 Global Step: 214990 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:34:41,769-Speed 2498.46 samples/sec Loss 3.5005 LearningRate 0.000678 Epoch: 10 Global Step: 215000 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:34:49,968-Speed 2498.32 samples/sec Loss 3.5090 LearningRate 0.000678 Epoch: 10 Global Step: 215010 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:34:58,167-Speed 2498.31 samples/sec Loss 3.4572 LearningRate 0.000678 Epoch: 10 Global Step: 215020 Fp16 Grad Scale: 32768 Required: 
140 hours Training: 2022-07-07 15:35:06,362-Speed 2499.29 samples/sec Loss 3.5192 LearningRate 0.000677 Epoch: 10 Global Step: 215030 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:35:14,558-Speed 2499.27 samples/sec Loss 3.4815 LearningRate 0.000677 Epoch: 10 Global Step: 215040 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:35:22,711-Speed 2512.48 samples/sec Loss 3.4325 LearningRate 0.000677 Epoch: 10 Global Step: 215050 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:35:30,917-Speed 2496.32 samples/sec Loss 3.5659 LearningRate 0.000677 Epoch: 10 Global Step: 215060 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:35:39,117-Speed 2497.74 samples/sec Loss 3.4992 LearningRate 0.000677 Epoch: 10 Global Step: 215070 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:35:47,320-Speed 2497.25 samples/sec Loss 3.5551 LearningRate 0.000677 Epoch: 10 Global Step: 215080 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:35:55,529-Speed 2494.96 samples/sec Loss 3.5095 LearningRate 0.000677 Epoch: 10 Global Step: 215090 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:36:03,727-Speed 2498.62 samples/sec Loss 3.5023 LearningRate 0.000677 Epoch: 10 Global Step: 215100 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:36:11,872-Speed 2515.08 samples/sec Loss 3.4958 LearningRate 0.000677 Epoch: 10 Global Step: 215110 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:36:20,070-Speed 2498.37 samples/sec Loss 3.4733 LearningRate 0.000677 Epoch: 10 Global Step: 215120 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:36:28,270-Speed 2497.88 samples/sec Loss 3.5107 LearningRate 0.000677 Epoch: 10 Global Step: 215130 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:36:36,472-Speed 2497.46 samples/sec Loss 3.5092 LearningRate 0.000677 Epoch: 10 Global Step: 215140 Fp16 Grad Scale: 32768 
Required: 140 hours Training: 2022-07-07 15:36:44,670-Speed 2498.43 samples/sec Loss 3.4827 LearningRate 0.000677 Epoch: 10 Global Step: 215150 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:36:52,879-Speed 2495.31 samples/sec Loss 3.5384 LearningRate 0.000677 Epoch: 10 Global Step: 215160 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:37:01,031-Speed 2512.62 samples/sec Loss 3.4338 LearningRate 0.000677 Epoch: 10 Global Step: 215170 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:37:09,232-Speed 2497.86 samples/sec Loss 3.5138 LearningRate 0.000677 Epoch: 10 Global Step: 215180 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:37:17,432-Speed 2497.99 samples/sec Loss 3.5358 LearningRate 0.000677 Epoch: 10 Global Step: 215190 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:37:25,634-Speed 2497.34 samples/sec Loss 3.5406 LearningRate 0.000677 Epoch: 10 Global Step: 215200 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:37:33,830-Speed 2498.93 samples/sec Loss 3.5216 LearningRate 0.000677 Epoch: 10 Global Step: 215210 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:37:42,035-Speed 2496.70 samples/sec Loss 3.5676 LearningRate 0.000677 Epoch: 10 Global Step: 215220 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:37:50,178-Speed 2515.40 samples/sec Loss 3.4196 LearningRate 0.000677 Epoch: 10 Global Step: 215230 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:37:58,378-Speed 2497.67 samples/sec Loss 3.4550 LearningRate 0.000677 Epoch: 10 Global Step: 215240 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:38:06,575-Speed 2498.92 samples/sec Loss 3.5239 LearningRate 0.000677 Epoch: 10 Global Step: 215250 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:38:14,780-Speed 2496.44 samples/sec Loss 3.5017 LearningRate 0.000677 Epoch: 10 Global Step: 215260 Fp16 Grad Scale: 
32768 Required: 140 hours Training: 2022-07-07 15:38:22,979-Speed 2497.98 samples/sec Loss 3.5214 LearningRate 0.000677 Epoch: 10 Global Step: 215270 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:38:31,177-Speed 2498.52 samples/sec Loss 3.5271 LearningRate 0.000677 Epoch: 10 Global Step: 215280 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:38:39,321-Speed 2515.15 samples/sec Loss 3.4340 LearningRate 0.000677 Epoch: 10 Global Step: 215290 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:38:47,533-Speed 2494.30 samples/sec Loss 3.4823 LearningRate 0.000677 Epoch: 10 Global Step: 215300 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:38:55,733-Speed 2497.94 samples/sec Loss 3.5823 LearningRate 0.000677 Epoch: 10 Global Step: 215310 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:39:03,928-Speed 2499.53 samples/sec Loss 3.4735 LearningRate 0.000677 Epoch: 10 Global Step: 215320 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:39:12,127-Speed 2498.43 samples/sec Loss 3.4438 LearningRate 0.000677 Epoch: 10 Global Step: 215330 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:39:20,322-Speed 2499.34 samples/sec Loss 3.4926 LearningRate 0.000677 Epoch: 10 Global Step: 215340 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:39:28,465-Speed 2515.27 samples/sec Loss 3.4564 LearningRate 0.000677 Epoch: 10 Global Step: 215350 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:39:36,661-Speed 2499.19 samples/sec Loss 3.5061 LearningRate 0.000677 Epoch: 10 Global Step: 215360 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:39:44,863-Speed 2497.61 samples/sec Loss 3.4125 LearningRate 0.000677 Epoch: 10 Global Step: 215370 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:39:53,064-Speed 2497.80 samples/sec Loss 3.5045 LearningRate 0.000677 Epoch: 10 Global Step: 215380 Fp16 Grad 
Scale: 32768 Required: 140 hours Training: 2022-07-07 15:40:01,270-Speed 2496.06 samples/sec Loss 3.4671 LearningRate 0.000677 Epoch: 10 Global Step: 215390 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:40:09,469-Speed 2498.09 samples/sec Loss 3.5472 LearningRate 0.000677 Epoch: 10 Global Step: 215400 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:40:17,613-Speed 2515.80 samples/sec Loss 3.5216 LearningRate 0.000677 Epoch: 10 Global Step: 215410 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:40:25,812-Speed 2498.27 samples/sec Loss 3.5387 LearningRate 0.000677 Epoch: 10 Global Step: 215420 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:40:34,011-Speed 2498.15 samples/sec Loss 3.4735 LearningRate 0.000677 Epoch: 10 Global Step: 215430 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:40:42,206-Speed 2499.35 samples/sec Loss 3.5081 LearningRate 0.000677 Epoch: 10 Global Step: 215440 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:40:50,405-Speed 2498.53 samples/sec Loss 3.4462 LearningRate 0.000677 Epoch: 10 Global Step: 215450 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:40:58,607-Speed 2497.62 samples/sec Loss 3.4751 LearningRate 0.000677 Epoch: 10 Global Step: 215460 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:41:06,754-Speed 2514.12 samples/sec Loss 3.4757 LearningRate 0.000677 Epoch: 10 Global Step: 215470 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:41:14,953-Speed 2498.35 samples/sec Loss 3.4628 LearningRate 0.000677 Epoch: 10 Global Step: 215480 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:41:23,154-Speed 2497.79 samples/sec Loss 3.5270 LearningRate 0.000676 Epoch: 10 Global Step: 215490 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:41:31,351-Speed 2498.79 samples/sec Loss 3.4484 LearningRate 0.000676 Epoch: 10 Global Step: 215500 Fp16 
Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:41:39,549-Speed 2498.68 samples/sec Loss 3.4745 LearningRate 0.000676 Epoch: 10 Global Step: 215510 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:41:47,749-Speed 2498.03 samples/sec Loss 3.5441 LearningRate 0.000676 Epoch: 10 Global Step: 215520 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:41:55,893-Speed 2515.30 samples/sec Loss 3.5742 LearningRate 0.000676 Epoch: 10 Global Step: 215530 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:42:04,090-Speed 2498.53 samples/sec Loss 3.5369 LearningRate 0.000676 Epoch: 10 Global Step: 215540 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:42:12,288-Speed 2498.74 samples/sec Loss 3.4532 LearningRate 0.000676 Epoch: 10 Global Step: 215550 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:42:20,485-Speed 2499.03 samples/sec Loss 3.5746 LearningRate 0.000676 Epoch: 10 Global Step: 215560 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:42:28,689-Speed 2496.63 samples/sec Loss 3.5248 LearningRate 0.000676 Epoch: 10 Global Step: 215570 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:42:36,886-Speed 2498.98 samples/sec Loss 3.4513 LearningRate 0.000676 Epoch: 10 Global Step: 215580 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:42:45,044-Speed 2510.69 samples/sec Loss 3.5905 LearningRate 0.000676 Epoch: 10 Global Step: 215590 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:42:53,242-Speed 2498.69 samples/sec Loss 3.5090 LearningRate 0.000676 Epoch: 10 Global Step: 215600 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:43:01,443-Speed 2497.62 samples/sec Loss 3.5111 LearningRate 0.000676 Epoch: 10 Global Step: 215610 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:43:09,643-Speed 2498.14 samples/sec Loss 3.4855 LearningRate 0.000676 Epoch: 10 Global Step: 215620 
Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:43:17,841-Speed 2498.54 samples/sec Loss 3.4399 LearningRate 0.000676 Epoch: 10 Global Step: 215630 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:43:26,040-Speed 2498.71 samples/sec Loss 3.5030 LearningRate 0.000676 Epoch: 10 Global Step: 215640 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:43:34,184-Speed 2515.13 samples/sec Loss 3.4651 LearningRate 0.000676 Epoch: 10 Global Step: 215650 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:43:42,381-Speed 2498.74 samples/sec Loss 3.6440 LearningRate 0.000676 Epoch: 10 Global Step: 215660 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:43:50,580-Speed 2498.50 samples/sec Loss 3.5755 LearningRate 0.000676 Epoch: 10 Global Step: 215670 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:43:58,785-Speed 2496.38 samples/sec Loss 3.6164 LearningRate 0.000676 Epoch: 10 Global Step: 215680 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:44:06,989-Speed 2497.02 samples/sec Loss 3.6784 LearningRate 0.000676 Epoch: 10 Global Step: 215690 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:44:15,189-Speed 2497.70 samples/sec Loss 3.6557 LearningRate 0.000676 Epoch: 10 Global Step: 215700 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:44:23,336-Speed 2514.08 samples/sec Loss 3.6297 LearningRate 0.000676 Epoch: 10 Global Step: 215710 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:44:31,535-Speed 2498.51 samples/sec Loss 3.5990 LearningRate 0.000676 Epoch: 10 Global Step: 215720 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:44:39,735-Speed 2498.49 samples/sec Loss 3.5793 LearningRate 0.000676 Epoch: 10 Global Step: 215730 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:44:47,930-Speed 2499.33 samples/sec Loss 3.5951 LearningRate 0.000676 Epoch: 10 Global Step: 
215740 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:44:56,128-Speed 2498.77 samples/sec Loss 3.5611 LearningRate 0.000676 Epoch: 10 Global Step: 215750 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:45:04,326-Speed 2498.82 samples/sec Loss 3.5394 LearningRate 0.000676 Epoch: 10 Global Step: 215760 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:45:12,481-Speed 2511.57 samples/sec Loss 3.5231 LearningRate 0.000676 Epoch: 10 Global Step: 215770 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:45:20,681-Speed 2498.06 samples/sec Loss 3.4421 LearningRate 0.000676 Epoch: 10 Global Step: 215780 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:45:28,892-Speed 2494.76 samples/sec Loss 3.5066 LearningRate 0.000676 Epoch: 10 Global Step: 215790 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:45:37,090-Speed 2498.88 samples/sec Loss 3.5091 LearningRate 0.000676 Epoch: 10 Global Step: 215800 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:45:45,303-Speed 2493.91 samples/sec Loss 3.5226 LearningRate 0.000676 Epoch: 10 Global Step: 215810 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:45:53,500-Speed 2498.92 samples/sec Loss 3.4772 LearningRate 0.000676 Epoch: 10 Global Step: 215820 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:46:01,647-Speed 2514.09 samples/sec Loss 3.4511 LearningRate 0.000676 Epoch: 10 Global Step: 215830 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:46:09,840-Speed 2500.06 samples/sec Loss 3.5729 LearningRate 0.000676 Epoch: 10 Global Step: 215840 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:46:18,050-Speed 2494.98 samples/sec Loss 3.5821 LearningRate 0.000676 Epoch: 10 Global Step: 215850 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:46:26,252-Speed 2497.08 samples/sec Loss 3.5199 LearningRate 0.000676 Epoch: 10 Global 
Step: 215860 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:46:34,456-Speed 2496.94 samples/sec Loss 3.5733 LearningRate 0.000676 Epoch: 10 Global Step: 215870 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:46:42,655-Speed 2498.24 samples/sec Loss 3.5295 LearningRate 0.000676 Epoch: 10 Global Step: 215880 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:46:50,800-Speed 2514.77 samples/sec Loss 3.5437 LearningRate 0.000676 Epoch: 10 Global Step: 215890 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:46:58,996-Speed 2499.40 samples/sec Loss 3.5704 LearningRate 0.000676 Epoch: 10 Global Step: 215900 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:47:07,192-Speed 2499.16 samples/sec Loss 3.4523 LearningRate 0.000676 Epoch: 10 Global Step: 215910 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:47:15,392-Speed 2498.14 samples/sec Loss 3.4778 LearningRate 0.000676 Epoch: 10 Global Step: 215920 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:47:23,592-Speed 2497.74 samples/sec Loss 3.4594 LearningRate 0.000676 Epoch: 10 Global Step: 215930 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:47:31,792-Speed 2497.91 samples/sec Loss 3.4808 LearningRate 0.000675 Epoch: 10 Global Step: 215940 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:47:39,936-Speed 2515.29 samples/sec Loss 3.5114 LearningRate 0.000675 Epoch: 10 Global Step: 215950 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:47:48,136-Speed 2497.90 samples/sec Loss 3.4149 LearningRate 0.000675 Epoch: 10 Global Step: 215960 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:47:56,334-Speed 2498.55 samples/sec Loss 3.5122 LearningRate 0.000675 Epoch: 10 Global Step: 215970 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:48:04,532-Speed 2498.71 samples/sec Loss 3.5833 LearningRate 0.000675 Epoch: 10 
Global Step: 215980 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:48:12,731-Speed 2498.67 samples/sec Loss 3.4485 LearningRate 0.000675 Epoch: 10 Global Step: 215990 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:48:20,944-Speed 2493.77 samples/sec Loss 3.4985 LearningRate 0.000675 Epoch: 10 Global Step: 216000 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:48:29,093-Speed 2513.82 samples/sec Loss 3.4401 LearningRate 0.000675 Epoch: 10 Global Step: 216010 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:48:37,293-Speed 2498.11 samples/sec Loss 3.4091 LearningRate 0.000675 Epoch: 10 Global Step: 216020 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:48:45,503-Speed 2494.91 samples/sec Loss 3.4397 LearningRate 0.000675 Epoch: 10 Global Step: 216030 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:48:53,706-Speed 2497.28 samples/sec Loss 3.5039 LearningRate 0.000675 Epoch: 10 Global Step: 216040 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:49:01,903-Speed 2498.74 samples/sec Loss 3.5322 LearningRate 0.000675 Epoch: 10 Global Step: 216050 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:49:10,113-Speed 2494.93 samples/sec Loss 3.4935 LearningRate 0.000675 Epoch: 10 Global Step: 216060 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:49:18,263-Speed 2513.48 samples/sec Loss 3.5937 LearningRate 0.000675 Epoch: 10 Global Step: 216070 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:49:26,459-Speed 2499.01 samples/sec Loss 3.5481 LearningRate 0.000675 Epoch: 10 Global Step: 216080 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:49:34,662-Speed 2496.87 samples/sec Loss 3.4961 LearningRate 0.000675 Epoch: 10 Global Step: 216090 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:49:42,868-Speed 2496.26 samples/sec Loss 3.4848 LearningRate 0.000675 
Epoch: 10 Global Step: 216100 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:49:51,064-Speed 2499.22 samples/sec Loss 3.4752 LearningRate 0.000675 Epoch: 10 Global Step: 216110 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:49:59,263-Speed 2498.26 samples/sec Loss 3.5404 LearningRate 0.000675 Epoch: 10 Global Step: 216120 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:50:07,407-Speed 2514.92 samples/sec Loss 3.5189 LearningRate 0.000675 Epoch: 10 Global Step: 216130 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:50:15,605-Speed 2498.57 samples/sec Loss 3.5281 LearningRate 0.000675 Epoch: 10 Global Step: 216140 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:50:23,817-Speed 2494.57 samples/sec Loss 3.4561 LearningRate 0.000675 Epoch: 10 Global Step: 216150 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:50:32,037-Speed 2491.61 samples/sec Loss 3.4635 LearningRate 0.000675 Epoch: 10 Global Step: 216160 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:50:40,233-Speed 2499.06 samples/sec Loss 3.4869 LearningRate 0.000675 Epoch: 10 Global Step: 216170 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:50:48,436-Speed 2497.20 samples/sec Loss 3.5030 LearningRate 0.000675 Epoch: 10 Global Step: 216180 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:50:56,584-Speed 2514.07 samples/sec Loss 3.5006 LearningRate 0.000675 Epoch: 10 Global Step: 216190 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:51:04,783-Speed 2498.12 samples/sec Loss 3.4364 LearningRate 0.000675 Epoch: 10 Global Step: 216200 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:51:12,982-Speed 2498.16 samples/sec Loss 3.4896 LearningRate 0.000675 Epoch: 10 Global Step: 216210 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:51:21,180-Speed 2498.79 samples/sec Loss 3.4722 LearningRate 
0.000675 Epoch: 10 Global Step: 216220 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:51:29,383-Speed 2496.95 samples/sec Loss 3.4036 LearningRate 0.000675 Epoch: 10 Global Step: 216230 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:51:37,582-Speed 2498.35 samples/sec Loss 3.4639 LearningRate 0.000675 Epoch: 10 Global Step: 216240 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:51:45,738-Speed 2511.90 samples/sec Loss 3.4132 LearningRate 0.000675 Epoch: 10 Global Step: 216250 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:51:53,935-Speed 2498.72 samples/sec Loss 3.3875 LearningRate 0.000675 Epoch: 10 Global Step: 216260 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:52:02,132-Speed 2498.83 samples/sec Loss 3.4662 LearningRate 0.000675 Epoch: 10 Global Step: 216270 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:52:10,330-Speed 2498.75 samples/sec Loss 3.3968 LearningRate 0.000675 Epoch: 10 Global Step: 216280 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:52:18,530-Speed 2497.99 samples/sec Loss 3.4188 LearningRate 0.000675 Epoch: 10 Global Step: 216290 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:52:26,737-Speed 2496.10 samples/sec Loss 3.4067 LearningRate 0.000675 Epoch: 10 Global Step: 216300 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:52:34,895-Speed 2510.59 samples/sec Loss 3.3888 LearningRate 0.000675 Epoch: 10 Global Step: 216310 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:52:43,093-Speed 2498.57 samples/sec Loss 3.4575 LearningRate 0.000675 Epoch: 10 Global Step: 216320 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:52:51,291-Speed 2498.75 samples/sec Loss 3.4620 LearningRate 0.000675 Epoch: 10 Global Step: 216330 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:52:59,496-Speed 2496.81 samples/sec Loss 3.4490 
LearningRate 0.000675 Epoch: 10 Global Step: 216340 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:53:07,708-Speed 2494.31 samples/sec Loss 3.5206 LearningRate 0.000675 Epoch: 10 Global Step: 216350 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:53:15,917-Speed 2495.34 samples/sec Loss 3.4387 LearningRate 0.000675 Epoch: 10 Global Step: 216360 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:53:24,064-Speed 2514.52 samples/sec Loss 3.4278 LearningRate 0.000675 Epoch: 10 Global Step: 216370 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:53:32,288-Speed 2490.36 samples/sec Loss 3.4541 LearningRate 0.000675 Epoch: 10 Global Step: 216380 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:53:40,488-Speed 2497.96 samples/sec Loss 3.4381 LearningRate 0.000675 Epoch: 10 Global Step: 216390 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:53:48,693-Speed 2496.65 samples/sec Loss 3.5182 LearningRate 0.000674 Epoch: 10 Global Step: 216400 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:53:56,890-Speed 2498.81 samples/sec Loss 3.4470 LearningRate 0.000674 Epoch: 10 Global Step: 216410 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:54:05,087-Speed 2498.76 samples/sec Loss 3.5029 LearningRate 0.000674 Epoch: 10 Global Step: 216420 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:54:13,233-Speed 2514.50 samples/sec Loss 3.4704 LearningRate 0.000674 Epoch: 10 Global Step: 216430 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:54:21,469-Speed 2487.00 samples/sec Loss 3.4981 LearningRate 0.000674 Epoch: 10 Global Step: 216440 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 15:54:29,638-Speed 2507.35 samples/sec Loss 3.5207 LearningRate 0.000674 Epoch: 10 Global Step: 216450 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:54:37,837-Speed 2498.10 samples/sec Loss 
3.5147 LearningRate 0.000674 Epoch: 10 Global Step: 216460 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:54:46,035-Speed 2498.76 samples/sec Loss 3.4778 LearningRate 0.000674 Epoch: 10 Global Step: 216470 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:54:54,232-Speed 2498.91 samples/sec Loss 3.4630 LearningRate 0.000674 Epoch: 10 Global Step: 216480 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:55:02,385-Speed 2512.32 samples/sec Loss 3.4274 LearningRate 0.000674 Epoch: 10 Global Step: 216490 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:55:10,583-Speed 2498.35 samples/sec Loss 3.4766 LearningRate 0.000674 Epoch: 10 Global Step: 216500 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:55:18,781-Speed 2498.48 samples/sec Loss 3.5255 LearningRate 0.000674 Epoch: 10 Global Step: 216510 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:55:26,980-Speed 2498.27 samples/sec Loss 3.4849 LearningRate 0.000674 Epoch: 10 Global Step: 216520 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:55:35,181-Speed 2497.56 samples/sec Loss 3.4490 LearningRate 0.000674 Epoch: 10 Global Step: 216530 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:55:43,379-Speed 2498.62 samples/sec Loss 3.4719 LearningRate 0.000674 Epoch: 10 Global Step: 216540 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:55:51,525-Speed 2514.58 samples/sec Loss 3.4401 LearningRate 0.000674 Epoch: 10 Global Step: 216550 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:55:59,721-Speed 2499.01 samples/sec Loss 3.4714 LearningRate 0.000674 Epoch: 10 Global Step: 216560 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:56:07,921-Speed 2498.03 samples/sec Loss 3.4803 LearningRate 0.000674 Epoch: 10 Global Step: 216570 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:56:16,118-Speed 2498.97 samples/sec 
Loss 3.4305 LearningRate 0.000674 Epoch: 10 Global Step: 216580 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:56:24,315-Speed 2498.63 samples/sec Loss 3.4404 LearningRate 0.000674 Epoch: 10 Global Step: 216590 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:56:32,514-Speed 2498.29 samples/sec Loss 3.4127 LearningRate 0.000674 Epoch: 10 Global Step: 216600 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:56:40,659-Speed 2514.78 samples/sec Loss 3.5216 LearningRate 0.000674 Epoch: 10 Global Step: 216610 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:56:48,856-Speed 2499.08 samples/sec Loss 3.4974 LearningRate 0.000674 Epoch: 10 Global Step: 216620 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:56:57,053-Speed 2498.87 samples/sec Loss 3.4335 LearningRate 0.000674 Epoch: 10 Global Step: 216630 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:57:05,251-Speed 2498.63 samples/sec Loss 3.5435 LearningRate 0.000674 Epoch: 10 Global Step: 216640 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:57:13,448-Speed 2498.92 samples/sec Loss 3.4792 LearningRate 0.000674 Epoch: 10 Global Step: 216650 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:57:21,646-Speed 2498.47 samples/sec Loss 3.3868 LearningRate 0.000674 Epoch: 10 Global Step: 216660 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:57:29,792-Speed 2514.60 samples/sec Loss 3.5092 LearningRate 0.000674 Epoch: 10 Global Step: 216670 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:57:37,997-Speed 2496.96 samples/sec Loss 3.6151 LearningRate 0.000674 Epoch: 10 Global Step: 216680 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:57:46,192-Speed 2499.41 samples/sec Loss 3.5014 LearningRate 0.000674 Epoch: 10 Global Step: 216690 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:57:54,389-Speed 2499.00 
samples/sec Loss 3.5056 LearningRate 0.000674 Epoch: 10 Global Step: 216700 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:58:02,593-Speed 2496.63 samples/sec Loss 3.5551 LearningRate 0.000674 Epoch: 10 Global Step: 216710 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:58:10,797-Speed 2496.67 samples/sec Loss 3.4279 LearningRate 0.000674 Epoch: 10 Global Step: 216720 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:58:18,940-Speed 2515.30 samples/sec Loss 3.5327 LearningRate 0.000674 Epoch: 10 Global Step: 216730 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:58:27,152-Speed 2494.40 samples/sec Loss 3.4572 LearningRate 0.000674 Epoch: 10 Global Step: 216740 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:58:35,355-Speed 2497.04 samples/sec Loss 3.4205 LearningRate 0.000674 Epoch: 10 Global Step: 216750 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:58:43,555-Speed 2498.08 samples/sec Loss 3.5232 LearningRate 0.000674 Epoch: 10 Global Step: 216760 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:58:51,754-Speed 2498.05 samples/sec Loss 3.4631 LearningRate 0.000674 Epoch: 10 Global Step: 216770 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:58:59,953-Speed 2498.33 samples/sec Loss 3.5040 LearningRate 0.000674 Epoch: 10 Global Step: 216780 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:59:08,096-Speed 2515.47 samples/sec Loss 3.4547 LearningRate 0.000674 Epoch: 10 Global Step: 216790 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:59:16,290-Speed 2499.47 samples/sec Loss 3.4649 LearningRate 0.000674 Epoch: 10 Global Step: 216800 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:59:24,487-Speed 2499.20 samples/sec Loss 3.4617 LearningRate 0.000674 Epoch: 10 Global Step: 216810 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:59:32,683-Speed 
2499.29 samples/sec Loss 3.4552 LearningRate 0.000674 Epoch: 10 Global Step: 216820 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:59:40,894-Speed 2494.63 samples/sec Loss 3.4879 LearningRate 0.000674 Epoch: 10 Global Step: 216830 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:59:49,106-Speed 2494.41 samples/sec Loss 3.4524 LearningRate 0.000674 Epoch: 10 Global Step: 216840 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 15:59:57,247-Speed 2515.97 samples/sec Loss 3.4556 LearningRate 0.000673 Epoch: 10 Global Step: 216850 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:00:05,448-Speed 2497.83 samples/sec Loss 3.4261 LearningRate 0.000673 Epoch: 10 Global Step: 216860 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:00:13,644-Speed 2499.31 samples/sec Loss 3.4590 LearningRate 0.000673 Epoch: 10 Global Step: 216870 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:00:21,838-Speed 2499.92 samples/sec Loss 3.4385 LearningRate 0.000673 Epoch: 10 Global Step: 216880 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:00:30,033-Speed 2499.66 samples/sec Loss 3.4516 LearningRate 0.000673 Epoch: 10 Global Step: 216890 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:00:38,231-Speed 2498.58 samples/sec Loss 3.4101 LearningRate 0.000673 Epoch: 10 Global Step: 216900 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:00:46,375-Speed 2515.17 samples/sec Loss 3.4668 LearningRate 0.000673 Epoch: 10 Global Step: 216910 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:00:54,575-Speed 2497.84 samples/sec Loss 3.4528 LearningRate 0.000673 Epoch: 10 Global Step: 216920 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:01:02,774-Speed 2498.42 samples/sec Loss 3.5259 LearningRate 0.000673 Epoch: 10 Global Step: 216930 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 
16:01:10,973-Speed 2498.40 samples/sec Loss 3.4412 LearningRate 0.000673 Epoch: 10 Global Step: 216940 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:01:19,169-Speed 2499.08 samples/sec Loss 3.5345 LearningRate 0.000673 Epoch: 10 Global Step: 216950 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:01:27,366-Speed 2499.03 samples/sec Loss 3.4202 LearningRate 0.000673 Epoch: 10 Global Step: 216960 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:01:35,510-Speed 2515.07 samples/sec Loss 3.5170 LearningRate 0.000673 Epoch: 10 Global Step: 216970 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:01:43,707-Speed 2499.05 samples/sec Loss 3.5147 LearningRate 0.000673 Epoch: 10 Global Step: 216980 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:01:51,906-Speed 2498.16 samples/sec Loss 3.4486 LearningRate 0.000673 Epoch: 10 Global Step: 216990 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:02:00,103-Speed 2498.76 samples/sec Loss 3.5069 LearningRate 0.000673 Epoch: 10 Global Step: 217000 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:02:08,299-Speed 2499.22 samples/sec Loss 3.4925 LearningRate 0.000673 Epoch: 10 Global Step: 217010 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:02:16,498-Speed 2498.47 samples/sec Loss 3.5299 LearningRate 0.000673 Epoch: 10 Global Step: 217020 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:02:24,641-Speed 2515.45 samples/sec Loss 3.4804 LearningRate 0.000673 Epoch: 10 Global Step: 217030 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:02:32,838-Speed 2498.90 samples/sec Loss 3.5833 LearningRate 0.000673 Epoch: 10 Global Step: 217040 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:02:41,037-Speed 2498.55 samples/sec Loss 3.5700 LearningRate 0.000673 Epoch: 10 Global Step: 217050 Fp16 Grad Scale: 32768 Required: 140 hours Training: 
2022-07-07 16:02:49,247-Speed 2494.78 samples/sec Loss 3.4854 LearningRate 0.000673 Epoch: 10 Global Step: 217060 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:02:57,450-Speed 2496.90 samples/sec Loss 3.5040 LearningRate 0.000673 Epoch: 10 Global Step: 217070 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:03:05,663-Speed 2494.21 samples/sec Loss 3.4002 LearningRate 0.000673 Epoch: 10 Global Step: 217080 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:03:13,808-Speed 2514.89 samples/sec Loss 3.4471 LearningRate 0.000673 Epoch: 10 Global Step: 217090 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:03:22,008-Speed 2498.08 samples/sec Loss 3.4358 LearningRate 0.000673 Epoch: 10 Global Step: 217100 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:03:30,204-Speed 2499.28 samples/sec Loss 3.4400 LearningRate 0.000673 Epoch: 10 Global Step: 217110 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:03:38,404-Speed 2497.98 samples/sec Loss 3.5003 LearningRate 0.000673 Epoch: 10 Global Step: 217120 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:03:46,607-Speed 2497.12 samples/sec Loss 3.4295 LearningRate 0.000673 Epoch: 10 Global Step: 217130 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:03:54,811-Speed 2496.82 samples/sec Loss 3.4538 LearningRate 0.000673 Epoch: 10 Global Step: 217140 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:04:02,957-Speed 2514.77 samples/sec Loss 3.5027 LearningRate 0.000673 Epoch: 10 Global Step: 217150 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:04:11,157-Speed 2498.04 samples/sec Loss 3.5125 LearningRate 0.000673 Epoch: 10 Global Step: 217160 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:04:19,362-Speed 2496.36 samples/sec Loss 3.4603 LearningRate 0.000673 Epoch: 10 Global Step: 217170 Fp16 Grad Scale: 32768 Required: 140 hours 
Training: 2022-07-07 16:04:27,562-Speed 2497.97 samples/sec Loss 3.4987 LearningRate 0.000673 Epoch: 10 Global Step: 217180 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:04:35,759-Speed 2498.75 samples/sec Loss 3.4882 LearningRate 0.000673 Epoch: 10 Global Step: 217190 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:04:43,960-Speed 2497.66 samples/sec Loss 3.5036 LearningRate 0.000673 Epoch: 10 Global Step: 217200 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:04:52,104-Speed 2514.98 samples/sec Loss 3.4463 LearningRate 0.000673 Epoch: 10 Global Step: 217210 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:05:00,314-Speed 2495.18 samples/sec Loss 3.4661 LearningRate 0.000673 Epoch: 10 Global Step: 217220 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:05:08,522-Speed 2495.47 samples/sec Loss 3.4872 LearningRate 0.000673 Epoch: 10 Global Step: 217230 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:05:16,736-Speed 2493.69 samples/sec Loss 3.4619 LearningRate 0.000673 Epoch: 10 Global Step: 217240 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:05:24,935-Speed 2498.34 samples/sec Loss 3.4655 LearningRate 0.000673 Epoch: 10 Global Step: 217250 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:05:33,132-Speed 2498.76 samples/sec Loss 3.4759 LearningRate 0.000673 Epoch: 10 Global Step: 217260 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:05:41,277-Speed 2514.90 samples/sec Loss 3.4833 LearningRate 0.000673 Epoch: 10 Global Step: 217270 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:05:49,475-Speed 2498.61 samples/sec Loss 3.4574 LearningRate 0.000673 Epoch: 10 Global Step: 217280 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:05:57,672-Speed 2498.97 samples/sec Loss 3.4631 LearningRate 0.000673 Epoch: 10 Global Step: 217290 Fp16 Grad Scale: 32768 Required: 140 
hours Training: 2022-07-07 16:06:05,878-Speed 2496.58 samples/sec Loss 3.4893 LearningRate 0.000673 Epoch: 10 Global Step: 217300 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:06:14,079-Speed 2497.34 samples/sec Loss 3.4974 LearningRate 0.000672 Epoch: 10 Global Step: 217310 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:06:22,327-Speed 2483.55 samples/sec Loss 3.4453 LearningRate 0.000672 Epoch: 10 Global Step: 217320 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:06:30,476-Speed 2513.87 samples/sec Loss 3.4728 LearningRate 0.000672 Epoch: 10 Global Step: 217330 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:06:38,696-Speed 2491.94 samples/sec Loss 3.4927 LearningRate 0.000672 Epoch: 10 Global Step: 217340 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:06:46,909-Speed 2493.92 samples/sec Loss 3.4875 LearningRate 0.000672 Epoch: 10 Global Step: 217350 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:06:55,112-Speed 2497.08 samples/sec Loss 3.4364 LearningRate 0.000672 Epoch: 10 Global Step: 217360 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:07:03,314-Speed 2497.79 samples/sec Loss 3.3955 LearningRate 0.000672 Epoch: 10 Global Step: 217370 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:07:11,529-Speed 2493.17 samples/sec Loss 3.4025 LearningRate 0.000672 Epoch: 10 Global Step: 217380 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:07:19,674-Speed 2514.88 samples/sec Loss 3.4446 LearningRate 0.000672 Epoch: 10 Global Step: 217390 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:07:27,876-Speed 2497.74 samples/sec Loss 3.4223 LearningRate 0.000672 Epoch: 10 Global Step: 217400 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:07:36,080-Speed 2496.59 samples/sec Loss 3.4626 LearningRate 0.000672 Epoch: 10 Global Step: 217410 Fp16 Grad Scale: 32768 Required: 
140 hours Training: 2022-07-07 16:07:44,290-Speed 2494.99 samples/sec Loss 3.4532 LearningRate 0.000672 Epoch: 10 Global Step: 217420 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:07:52,488-Speed 2498.40 samples/sec Loss 3.4659 LearningRate 0.000672 Epoch: 10 Global Step: 217430 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:08:00,688-Speed 2498.06 samples/sec Loss 3.5055 LearningRate 0.000672 Epoch: 10 Global Step: 217440 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:08:08,832-Speed 2515.13 samples/sec Loss 3.3956 LearningRate 0.000672 Epoch: 10 Global Step: 217450 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:08:17,032-Speed 2497.99 samples/sec Loss 3.4497 LearningRate 0.000672 Epoch: 10 Global Step: 217460 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:08:25,230-Speed 2498.81 samples/sec Loss 3.4520 LearningRate 0.000672 Epoch: 10 Global Step: 217470 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:08:33,431-Speed 2497.41 samples/sec Loss 3.3754 LearningRate 0.000672 Epoch: 10 Global Step: 217480 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:08:41,624-Speed 2500.15 samples/sec Loss 3.4477 LearningRate 0.000672 Epoch: 10 Global Step: 217490 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:08:49,823-Speed 2498.30 samples/sec Loss 3.4601 LearningRate 0.000672 Epoch: 10 Global Step: 217500 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:08:57,964-Speed 2516.21 samples/sec Loss 3.4403 LearningRate 0.000672 Epoch: 10 Global Step: 217510 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:09:06,159-Speed 2499.42 samples/sec Loss 3.4025 LearningRate 0.000672 Epoch: 10 Global Step: 217520 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:09:14,368-Speed 2495.40 samples/sec Loss 3.4594 LearningRate 0.000672 Epoch: 10 Global Step: 217530 Fp16 Grad Scale: 32768 
Required: 140 hours Training: 2022-07-07 16:09:22,580-Speed 2494.42 samples/sec Loss 3.5465 LearningRate 0.000672 Epoch: 10 Global Step: 217540 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:09:30,777-Speed 2498.63 samples/sec Loss 3.5070 LearningRate 0.000672 Epoch: 10 Global Step: 217550 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:09:38,980-Speed 2497.31 samples/sec Loss 3.4688 LearningRate 0.000672 Epoch: 10 Global Step: 217560 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:09:47,125-Speed 2514.92 samples/sec Loss 3.4137 LearningRate 0.000672 Epoch: 10 Global Step: 217570 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:09:55,321-Speed 2498.97 samples/sec Loss 3.5334 LearningRate 0.000672 Epoch: 10 Global Step: 217580 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:10:03,533-Speed 2494.82 samples/sec Loss 3.4950 LearningRate 0.000672 Epoch: 10 Global Step: 217590 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:10:11,744-Speed 2494.75 samples/sec Loss 3.4742 LearningRate 0.000672 Epoch: 10 Global Step: 217600 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:10:19,942-Speed 2498.58 samples/sec Loss 3.4146 LearningRate 0.000672 Epoch: 10 Global Step: 217610 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:10:28,139-Speed 2498.92 samples/sec Loss 3.4603 LearningRate 0.000672 Epoch: 10 Global Step: 217620 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:10:36,284-Speed 2514.67 samples/sec Loss 3.4180 LearningRate 0.000672 Epoch: 10 Global Step: 217630 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:10:44,484-Speed 2498.42 samples/sec Loss 3.4932 LearningRate 0.000672 Epoch: 10 Global Step: 217640 Fp16 Grad Scale: 32768 Required: 140 hours Training: 2022-07-07 16:10:52,680-Speed 2499.11 samples/sec Loss 3.4718 LearningRate 0.000672 Epoch: 10 Global Step: 217650 Fp16 Grad Scale: 
65536 Required: 140 hours Training: 2022-07-07 16:11:00,878-Speed 2498.63 samples/sec Loss 3.4640 LearningRate 0.000672 Epoch: 10 Global Step: 217660 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:11:09,078-Speed 2498.00 samples/sec Loss 3.4520 LearningRate 0.000672 Epoch: 10 Global Step: 217670 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:11:17,279-Speed 2497.72 samples/sec Loss 3.4043 LearningRate 0.000672 Epoch: 10 Global Step: 217680 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:11:25,424-Speed 2514.75 samples/sec Loss 3.4662 LearningRate 0.000672 Epoch: 10 Global Step: 217690 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:11:33,634-Speed 2494.90 samples/sec Loss 3.4692 LearningRate 0.000672 Epoch: 10 Global Step: 217700 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:11:41,834-Speed 2498.10 samples/sec Loss 3.5191 LearningRate 0.000672 Epoch: 10 Global Step: 217710 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:11:50,031-Speed 2498.95 samples/sec Loss 3.5362 LearningRate 0.000672 Epoch: 10 Global Step: 217720 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:11:58,240-Speed 2495.02 samples/sec Loss 3.4218 LearningRate 0.000672 Epoch: 10 Global Step: 217730 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:12:06,441-Speed 2497.76 samples/sec Loss 3.5118 LearningRate 0.000672 Epoch: 10 Global Step: 217740 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:12:14,588-Speed 2514.34 samples/sec Loss 3.4770 LearningRate 0.000672 Epoch: 10 Global Step: 217750 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:12:22,789-Speed 2497.75 samples/sec Loss 3.5171 LearningRate 0.000671 Epoch: 10 Global Step: 217760 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:12:30,988-Speed 2498.32 samples/sec Loss 3.5386 LearningRate 0.000671 Epoch: 10 Global Step: 217770 Fp16 Grad 
Scale: 65536 Required: 140 hours Training: 2022-07-07 16:12:39,183-Speed 2499.44 samples/sec Loss 3.5011 LearningRate 0.000671 Epoch: 10 Global Step: 217780 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:12:47,382-Speed 2498.77 samples/sec Loss 3.4668 LearningRate 0.000671 Epoch: 10 Global Step: 217790 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:12:55,577-Speed 2499.33 samples/sec Loss 3.4847 LearningRate 0.000671 Epoch: 10 Global Step: 217800 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:13:03,722-Speed 2514.99 samples/sec Loss 3.5527 LearningRate 0.000671 Epoch: 10 Global Step: 217810 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:13:11,919-Speed 2499.07 samples/sec Loss 3.5467 LearningRate 0.000671 Epoch: 10 Global Step: 217820 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:13:20,115-Speed 2499.26 samples/sec Loss 3.5938 LearningRate 0.000671 Epoch: 10 Global Step: 217830 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:13:28,313-Speed 2498.51 samples/sec Loss 3.5692 LearningRate 0.000671 Epoch: 10 Global Step: 217840 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:13:36,510-Speed 2498.77 samples/sec Loss 3.4903 LearningRate 0.000671 Epoch: 10 Global Step: 217850 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:13:44,708-Speed 2498.83 samples/sec Loss 3.4699 LearningRate 0.000671 Epoch: 10 Global Step: 217860 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:13:52,855-Speed 2514.15 samples/sec Loss 3.5413 LearningRate 0.000671 Epoch: 10 Global Step: 217870 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:14:01,055-Speed 2498.13 samples/sec Loss 3.5019 LearningRate 0.000671 Epoch: 10 Global Step: 217880 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:14:09,253-Speed 2498.43 samples/sec Loss 3.4730 LearningRate 0.000671 Epoch: 10 Global Step: 217890 Fp16 
Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:14:17,454-Speed 2497.97 samples/sec Loss 3.4527 LearningRate 0.000671 Epoch: 10 Global Step: 217900 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:14:25,651-Speed 2498.79 samples/sec Loss 3.4942 LearningRate 0.000671 Epoch: 10 Global Step: 217910 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:14:33,852-Speed 2497.56 samples/sec Loss 3.4608 LearningRate 0.000671 Epoch: 10 Global Step: 217920 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:14:42,007-Speed 2511.94 samples/sec Loss 3.3984 LearningRate 0.000671 Epoch: 10 Global Step: 217930 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:14:50,206-Speed 2498.05 samples/sec Loss 3.3930 LearningRate 0.000671 Epoch: 10 Global Step: 217940 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:14:58,405-Speed 2498.24 samples/sec Loss 3.4147 LearningRate 0.000671 Epoch: 10 Global Step: 217950 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:15:06,604-Speed 2498.32 samples/sec Loss 3.4035 LearningRate 0.000671 Epoch: 10 Global Step: 217960 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:15:14,808-Speed 2496.93 samples/sec Loss 3.4497 LearningRate 0.000671 Epoch: 10 Global Step: 217970 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:15:23,010-Speed 2497.36 samples/sec Loss 3.4254 LearningRate 0.000671 Epoch: 10 Global Step: 217980 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:15:31,174-Speed 2508.86 samples/sec Loss 3.3765 LearningRate 0.000671 Epoch: 10 Global Step: 217990 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:15:39,387-Speed 2494.09 samples/sec Loss 3.3653 LearningRate 0.000671 Epoch: 10 Global Step: 218000 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:15:47,583-Speed 2499.02 samples/sec Loss 3.4491 LearningRate 0.000671 Epoch: 10 Global Step: 218010 
Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:15:55,778-Speed 2499.45 samples/sec Loss 3.4877 LearningRate 0.000671 Epoch: 10 Global Step: 218020 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:16:03,986-Speed 2495.71 samples/sec Loss 3.4134 LearningRate 0.000671 Epoch: 10 Global Step: 218030 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:16:12,184-Speed 2498.59 samples/sec Loss 3.3939 LearningRate 0.000671 Epoch: 10 Global Step: 218040 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:16:20,329-Speed 2514.87 samples/sec Loss 3.4138 LearningRate 0.000671 Epoch: 10 Global Step: 218050 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:16:28,532-Speed 2497.06 samples/sec Loss 3.4728 LearningRate 0.000671 Epoch: 10 Global Step: 218060 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:16:36,729-Speed 2498.81 samples/sec Loss 3.4452 LearningRate 0.000671 Epoch: 10 Global Step: 218070 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:16:44,933-Speed 2496.76 samples/sec Loss 3.4664 LearningRate 0.000671 Epoch: 10 Global Step: 218080 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:16:53,136-Speed 2497.09 samples/sec Loss 3.4640 LearningRate 0.000671 Epoch: 10 Global Step: 218090 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:17:01,343-Speed 2495.82 samples/sec Loss 3.4644 LearningRate 0.000671 Epoch: 10 Global Step: 218100 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:17:09,488-Speed 2515.00 samples/sec Loss 3.5468 LearningRate 0.000671 Epoch: 10 Global Step: 218110 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:17:17,685-Speed 2498.85 samples/sec Loss 3.4245 LearningRate 0.000671 Epoch: 10 Global Step: 218120 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:17:25,886-Speed 2497.47 samples/sec Loss 3.4399 LearningRate 0.000671 Epoch: 10 Global Step: 
218130 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:17:34,089-Speed 2497.34 samples/sec Loss 3.4944 LearningRate 0.000671 Epoch: 10 Global Step: 218140 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:17:42,290-Speed 2497.57 samples/sec Loss 3.4133 LearningRate 0.000671 Epoch: 10 Global Step: 218150 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:17:50,485-Speed 2499.46 samples/sec Loss 3.5136 LearningRate 0.000671 Epoch: 10 Global Step: 218160 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:17:58,630-Speed 2515.39 samples/sec Loss 3.4202 LearningRate 0.000671 Epoch: 10 Global Step: 218170 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:18:06,830-Speed 2497.99 samples/sec Loss 3.4352 LearningRate 0.000671 Epoch: 10 Global Step: 218180 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:18:15,027-Speed 2498.76 samples/sec Loss 3.4196 LearningRate 0.000671 Epoch: 10 Global Step: 218190 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:18:23,240-Speed 2494.00 samples/sec Loss 3.4208 LearningRate 0.000671 Epoch: 10 Global Step: 218200 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:18:31,441-Speed 2497.89 samples/sec Loss 3.4510 LearningRate 0.000671 Epoch: 10 Global Step: 218210 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:18:39,639-Speed 2498.50 samples/sec Loss 3.5034 LearningRate 0.000670 Epoch: 10 Global Step: 218220 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:18:47,786-Speed 2514.09 samples/sec Loss 3.4452 LearningRate 0.000670 Epoch: 10 Global Step: 218230 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:18:55,986-Speed 2497.88 samples/sec Loss 3.4878 LearningRate 0.000670 Epoch: 10 Global Step: 218240 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:19:04,183-Speed 2499.06 samples/sec Loss 3.4715 LearningRate 0.000670 Epoch: 10 Global 
Step: 218250 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:19:12,383-Speed 2498.02 samples/sec Loss 3.5290 LearningRate 0.000670 Epoch: 10 Global Step: 218260 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:19:20,582-Speed 2498.12 samples/sec Loss 3.5137 LearningRate 0.000670 Epoch: 10 Global Step: 218270 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:19:28,790-Speed 2495.49 samples/sec Loss 3.4467 LearningRate 0.000670 Epoch: 10 Global Step: 218280 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:19:36,943-Speed 2512.32 samples/sec Loss 3.4633 LearningRate 0.000670 Epoch: 10 Global Step: 218290 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:19:45,145-Speed 2497.71 samples/sec Loss 3.4638 LearningRate 0.000670 Epoch: 10 Global Step: 218300 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:19:53,343-Speed 2498.49 samples/sec Loss 3.5054 LearningRate 0.000670 Epoch: 10 Global Step: 218310 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:20:01,543-Speed 2497.93 samples/sec Loss 3.4640 LearningRate 0.000670 Epoch: 10 Global Step: 218320 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:20:09,740-Speed 2498.76 samples/sec Loss 3.4967 LearningRate 0.000670 Epoch: 10 Global Step: 218330 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:20:17,940-Speed 2497.87 samples/sec Loss 3.4523 LearningRate 0.000670 Epoch: 10 Global Step: 218340 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:20:26,083-Speed 2515.52 samples/sec Loss 3.4886 LearningRate 0.000670 Epoch: 10 Global Step: 218350 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:20:34,282-Speed 2498.19 samples/sec Loss 3.4957 LearningRate 0.000670 Epoch: 10 Global Step: 218360 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:20:42,480-Speed 2498.54 samples/sec Loss 3.4889 LearningRate 0.000670 Epoch: 10 
Global Step: 218370 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:20:50,679-Speed 2498.57 samples/sec Loss 3.5388 LearningRate 0.000670 Epoch: 10 Global Step: 218380 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:20:58,876-Speed 2498.72 samples/sec Loss 3.5335 LearningRate 0.000670 Epoch: 10 Global Step: 218390 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:21:07,086-Speed 2495.06 samples/sec Loss 3.5036 LearningRate 0.000670 Epoch: 10 Global Step: 218400 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:21:15,233-Speed 2514.34 samples/sec Loss 3.5288 LearningRate 0.000670 Epoch: 10 Global Step: 218410 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:21:23,430-Speed 2498.78 samples/sec Loss 3.4812 LearningRate 0.000670 Epoch: 10 Global Step: 218420 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:21:31,631-Speed 2497.69 samples/sec Loss 3.4845 LearningRate 0.000670 Epoch: 10 Global Step: 218430 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:21:39,845-Speed 2493.59 samples/sec Loss 3.4952 LearningRate 0.000670 Epoch: 10 Global Step: 218440 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:21:48,044-Speed 2498.30 samples/sec Loss 3.4491 LearningRate 0.000670 Epoch: 10 Global Step: 218450 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:21:56,240-Speed 2499.07 samples/sec Loss 3.5525 LearningRate 0.000670 Epoch: 10 Global Step: 218460 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:22:04,386-Speed 2514.36 samples/sec Loss 3.4204 LearningRate 0.000670 Epoch: 10 Global Step: 218470 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:22:12,581-Speed 2501.33 samples/sec Loss 3.5014 LearningRate 0.000670 Epoch: 10 Global Step: 218480 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:22:20,786-Speed 2496.56 samples/sec Loss 3.4273 LearningRate 0.000670 
Epoch: 10 Global Step: 218490 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:22:28,997-Speed 2494.70 samples/sec Loss 3.4485 LearningRate 0.000670 Epoch: 10 Global Step: 218500 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:22:37,195-Speed 2498.43 samples/sec Loss 3.4899 LearningRate 0.000670 Epoch: 10 Global Step: 218510 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:22:45,391-Speed 2499.01 samples/sec Loss 3.4502 LearningRate 0.000670 Epoch: 10 Global Step: 218520 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:22:53,536-Speed 2515.11 samples/sec Loss 3.4934 LearningRate 0.000670 Epoch: 10 Global Step: 218530 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:23:01,736-Speed 2498.04 samples/sec Loss 3.4760 LearningRate 0.000670 Epoch: 10 Global Step: 218540 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:23:09,932-Speed 2499.09 samples/sec Loss 3.4856 LearningRate 0.000670 Epoch: 10 Global Step: 218550 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:23:18,128-Speed 2499.33 samples/sec Loss 3.5685 LearningRate 0.000670 Epoch: 10 Global Step: 218560 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:23:26,323-Speed 2499.35 samples/sec Loss 3.5121 LearningRate 0.000670 Epoch: 10 Global Step: 218570 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:23:34,522-Speed 2498.19 samples/sec Loss 3.5025 LearningRate 0.000670 Epoch: 10 Global Step: 218580 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:23:42,679-Speed 2511.26 samples/sec Loss 3.4642 LearningRate 0.000670 Epoch: 10 Global Step: 218590 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:23:50,874-Speed 2499.41 samples/sec Loss 3.4689 LearningRate 0.000670 Epoch: 10 Global Step: 218600 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:23:59,070-Speed 2499.28 samples/sec Loss 3.4450 LearningRate 
0.000670 Epoch: 10 Global Step: 218610 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:24:07,268-Speed 2498.61 samples/sec Loss 3.4359 LearningRate 0.000670 Epoch: 10 Global Step: 218620 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:24:15,465-Speed 2499.03 samples/sec Loss 3.4849 LearningRate 0.000670 Epoch: 10 Global Step: 218630 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:24:23,675-Speed 2494.96 samples/sec Loss 3.4423 LearningRate 0.000670 Epoch: 10 Global Step: 218640 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:24:31,818-Speed 2515.50 samples/sec Loss 3.4555 LearningRate 0.000670 Epoch: 10 Global Step: 218650 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:24:40,020-Speed 2497.42 samples/sec Loss 3.5146 LearningRate 0.000670 Epoch: 10 Global Step: 218660 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:24:48,216-Speed 2499.01 samples/sec Loss 3.4202 LearningRate 0.000669 Epoch: 10 Global Step: 218670 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:24:56,418-Speed 2497.57 samples/sec Loss 3.4509 LearningRate 0.000669 Epoch: 10 Global Step: 218680 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:25:04,619-Speed 2497.66 samples/sec Loss 3.4937 LearningRate 0.000669 Epoch: 10 Global Step: 218690 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:25:12,817-Speed 2498.54 samples/sec Loss 3.5035 LearningRate 0.000669 Epoch: 10 Global Step: 218700 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:25:20,976-Speed 2510.46 samples/sec Loss 3.4767 LearningRate 0.000669 Epoch: 10 Global Step: 218710 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:25:29,172-Speed 2499.42 samples/sec Loss 3.4679 LearningRate 0.000669 Epoch: 10 Global Step: 218720 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:25:37,371-Speed 2498.33 samples/sec Loss 3.4702 
LearningRate 0.000669 Epoch: 10 Global Step: 218730 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:25:45,570-Speed 2498.29 samples/sec Loss 3.4598 LearningRate 0.000669 Epoch: 10 Global Step: 218740 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:25:53,774-Speed 2496.71 samples/sec Loss 3.4823 LearningRate 0.000669 Epoch: 10 Global Step: 218750 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:26:01,975-Speed 2498.20 samples/sec Loss 3.4753 LearningRate 0.000669 Epoch: 10 Global Step: 218760 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:26:10,123-Speed 2513.68 samples/sec Loss 3.4259 LearningRate 0.000669 Epoch: 10 Global Step: 218770 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:26:18,323-Speed 2498.00 samples/sec Loss 3.4709 LearningRate 0.000669 Epoch: 10 Global Step: 218780 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:26:26,532-Speed 2495.02 samples/sec Loss 3.4900 LearningRate 0.000669 Epoch: 10 Global Step: 218790 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:26:34,733-Speed 2497.74 samples/sec Loss 3.4790 LearningRate 0.000669 Epoch: 10 Global Step: 218800 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:26:42,934-Speed 2497.62 samples/sec Loss 3.5301 LearningRate 0.000669 Epoch: 10 Global Step: 218810 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:26:51,130-Speed 2499.18 samples/sec Loss 3.5528 LearningRate 0.000669 Epoch: 10 Global Step: 218820 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:26:59,275-Speed 2515.43 samples/sec Loss 3.4757 LearningRate 0.000669 Epoch: 10 Global Step: 218830 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:27:07,473-Speed 2498.70 samples/sec Loss 3.4931 LearningRate 0.000669 Epoch: 10 Global Step: 218840 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:27:15,662-Speed 2501.26 samples/sec Loss 
3.4536 LearningRate 0.000669 Epoch: 10 Global Step: 218850 Fp16 Grad Scale: 131072 Required: 140 hours Training: 2022-07-07 16:27:23,859-Speed 2498.67 samples/sec Loss 3.4494 LearningRate 0.000669 Epoch: 10 Global Step: 218860 Fp16 Grad Scale: 131072 Required: 140 hours Training: 2022-07-07 16:27:32,057-Speed 2498.84 samples/sec Loss 3.4662 LearningRate 0.000669 Epoch: 10 Global Step: 218870 Fp16 Grad Scale: 131072 Required: 140 hours Training: 2022-07-07 16:27:40,254-Speed 2498.97 samples/sec Loss 3.3878 LearningRate 0.000669 Epoch: 10 Global Step: 218880 Fp16 Grad Scale: 131072 Required: 140 hours Training: 2022-07-07 16:27:48,398-Speed 2514.99 samples/sec Loss 3.4348 LearningRate 0.000669 Epoch: 10 Global Step: 218890 Fp16 Grad Scale: 131072 Required: 140 hours Training: 2022-07-07 16:27:56,600-Speed 2497.19 samples/sec Loss 3.4359 LearningRate 0.000669 Epoch: 10 Global Step: 218900 Fp16 Grad Scale: 131072 Required: 140 hours Training: 2022-07-07 16:28:04,760-Speed 2510.65 samples/sec Loss 3.4744 LearningRate 0.000669 Epoch: 10 Global Step: 218910 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:28:12,964-Speed 2496.69 samples/sec Loss 3.3876 LearningRate 0.000669 Epoch: 10 Global Step: 218920 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:28:21,165-Speed 2497.78 samples/sec Loss 3.4158 LearningRate 0.000669 Epoch: 10 Global Step: 218930 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:28:29,372-Speed 2495.84 samples/sec Loss 3.4536 LearningRate 0.000669 Epoch: 10 Global Step: 218940 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:28:37,515-Speed 2515.23 samples/sec Loss 3.3936 LearningRate 0.000669 Epoch: 10 Global Step: 218950 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:28:45,714-Speed 2498.39 samples/sec Loss 3.3703 LearningRate 0.000669 Epoch: 10 Global Step: 218960 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:28:53,920-Speed 2496.32 
samples/sec Loss 3.4087 LearningRate 0.000669 Epoch: 10 Global Step: 218970 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:29:02,116-Speed 2499.16 samples/sec Loss 3.3284 LearningRate 0.000669 Epoch: 10 Global Step: 218980 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:29:10,312-Speed 2499.07 samples/sec Loss 3.4474 LearningRate 0.000669 Epoch: 10 Global Step: 218990 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:29:18,512-Speed 2498.10 samples/sec Loss 3.4195 LearningRate 0.000669 Epoch: 10 Global Step: 219000 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:29:26,658-Speed 2514.41 samples/sec Loss 3.4210 LearningRate 0.000669 Epoch: 10 Global Step: 219010 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:29:34,869-Speed 2494.76 samples/sec Loss 3.4061 LearningRate 0.000669 Epoch: 10 Global Step: 219020 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:29:43,081-Speed 2494.17 samples/sec Loss 3.3388 LearningRate 0.000669 Epoch: 10 Global Step: 219030 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:29:51,276-Speed 2499.63 samples/sec Loss 3.4186 LearningRate 0.000669 Epoch: 10 Global Step: 219040 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:29:59,469-Speed 2499.98 samples/sec Loss 3.4982 LearningRate 0.000669 Epoch: 10 Global Step: 219050 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:30:07,666-Speed 2498.97 samples/sec Loss 3.4222 LearningRate 0.000669 Epoch: 10 Global Step: 219060 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:30:15,813-Speed 2514.13 samples/sec Loss 3.4315 LearningRate 0.000669 Epoch: 10 Global Step: 219070 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:30:24,022-Speed 2495.29 samples/sec Loss 3.5038 LearningRate 0.000669 Epoch: 10 Global Step: 219080 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:30:32,214-Speed 
2500.15 samples/sec Loss 3.4594 LearningRate 0.000669 Epoch: 10 Global Step: 219090 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:30:40,411-Speed 2498.94 samples/sec Loss 3.4370 LearningRate 0.000669 Epoch: 10 Global Step: 219100 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:30:48,607-Speed 2499.13 samples/sec Loss 3.4712 LearningRate 0.000669 Epoch: 10 Global Step: 219110 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:30:56,804-Speed 2498.94 samples/sec Loss 3.3730 LearningRate 0.000669 Epoch: 10 Global Step: 219120 Fp16 Grad Scale: 65536 Required: 140 hours Training: 2022-07-07 16:31:04,949-Speed 2514.78 samples/sec Loss 3.5032 LearningRate 0.000668 Epoch: 10 Global Step: 219130 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:31:13,146-Speed 2498.87 samples/sec Loss 3.5361 LearningRate 0.000668 Epoch: 10 Global Step: 219140 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:31:21,340-Speed 2499.64 samples/sec Loss 3.5384 LearningRate 0.000668 Epoch: 10 Global Step: 219150 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:31:29,539-Speed 2498.26 samples/sec Loss 3.5806 LearningRate 0.000668 Epoch: 10 Global Step: 219160 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:31:37,749-Speed 2495.23 samples/sec Loss 3.5799 LearningRate 0.000668 Epoch: 10 Global Step: 219170 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:31:45,946-Speed 2498.64 samples/sec Loss 3.5100 LearningRate 0.000668 Epoch: 10 Global Step: 219180 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:31:54,091-Speed 2514.85 samples/sec Loss 3.5401 LearningRate 0.000668 Epoch: 10 Global Step: 219190 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:32:02,288-Speed 2498.92 samples/sec Loss 3.4686 LearningRate 0.000668 Epoch: 10 Global Step: 219200 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 
16:32:10,495-Speed 2495.97 samples/sec Loss 3.5239 LearningRate 0.000668 Epoch: 10 Global Step: 219210 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:32:18,694-Speed 2498.12 samples/sec Loss 3.5159 LearningRate 0.000668 Epoch: 10 Global Step: 219220 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:32:26,892-Speed 2498.40 samples/sec Loss 3.4149 LearningRate 0.000668 Epoch: 10 Global Step: 219230 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:32:35,089-Speed 2499.12 samples/sec Loss 3.5313 LearningRate 0.000668 Epoch: 10 Global Step: 219240 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:32:43,234-Speed 2514.65 samples/sec Loss 3.5177 LearningRate 0.000668 Epoch: 10 Global Step: 219250 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:32:51,434-Speed 2498.12 samples/sec Loss 3.4588 LearningRate 0.000668 Epoch: 10 Global Step: 219260 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:32:59,629-Speed 2499.28 samples/sec Loss 3.4745 LearningRate 0.000668 Epoch: 10 Global Step: 219270 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:33:07,825-Speed 2499.14 samples/sec Loss 3.3801 LearningRate 0.000668 Epoch: 10 Global Step: 219280 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:33:16,025-Speed 2498.17 samples/sec Loss 3.4384 LearningRate 0.000668 Epoch: 10 Global Step: 219290 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:33:24,226-Speed 2497.94 samples/sec Loss 3.4004 LearningRate 0.000668 Epoch: 10 Global Step: 219300 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:33:32,372-Speed 2514.49 samples/sec Loss 3.4619 LearningRate 0.000668 Epoch: 10 Global Step: 219310 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:33:40,568-Speed 2499.16 samples/sec Loss 3.4428 LearningRate 0.000668 Epoch: 10 Global Step: 219320 Fp16 Grad Scale: 65536 Required: 139 hours Training: 
2022-07-07 16:33:48,770-Speed 2497.55 samples/sec Loss 3.4038 LearningRate 0.000668 Epoch: 10 Global Step: 219330 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:33:56,967-Speed 2498.60 samples/sec Loss 3.4287 LearningRate 0.000668 Epoch: 10 Global Step: 219340 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:34:05,168-Speed 2497.77 samples/sec Loss 3.4215 LearningRate 0.000668 Epoch: 10 Global Step: 219350 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:34:13,369-Speed 2497.68 samples/sec Loss 3.3857 LearningRate 0.000668 Epoch: 10 Global Step: 219360 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:34:21,515-Speed 2514.43 samples/sec Loss 3.4127 LearningRate 0.000668 Epoch: 10 Global Step: 219370 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:34:29,713-Speed 2498.69 samples/sec Loss 3.3936 LearningRate 0.000668 Epoch: 10 Global Step: 219380 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:34:37,911-Speed 2498.38 samples/sec Loss 3.4384 LearningRate 0.000668 Epoch: 10 Global Step: 219390 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:34:46,066-Speed 2511.84 samples/sec Loss 3.3675 LearningRate 0.000668 Epoch: 10 Global Step: 219400 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:34:54,264-Speed 2498.63 samples/sec Loss 3.4554 LearningRate 0.000668 Epoch: 10 Global Step: 219410 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:35:02,470-Speed 2496.06 samples/sec Loss 3.4679 LearningRate 0.000668 Epoch: 10 Global Step: 219420 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:35:10,613-Speed 2515.54 samples/sec Loss 3.4541 LearningRate 0.000668 Epoch: 10 Global Step: 219430 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:35:18,807-Speed 2499.86 samples/sec Loss 3.4374 LearningRate 0.000668 Epoch: 10 Global Step: 219440 Fp16 Grad Scale: 32768 Required: 139 hours 
Training: 2022-07-07 16:35:27,005-Speed 2498.41 samples/sec Loss 3.4881 LearningRate 0.000668 Epoch: 10 Global Step: 219450 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:35:35,203-Speed 2498.57 samples/sec Loss 3.4674 LearningRate 0.000668 Epoch: 10 Global Step: 219460 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:35:43,411-Speed 2495.73 samples/sec Loss 3.5307 LearningRate 0.000668 Epoch: 10 Global Step: 219470 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:35:51,610-Speed 2498.25 samples/sec Loss 3.4907 LearningRate 0.000668 Epoch: 10 Global Step: 219480 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:35:59,753-Speed 2515.44 samples/sec Loss 3.4721 LearningRate 0.000668 Epoch: 10 Global Step: 219490 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:36:07,951-Speed 2498.52 samples/sec Loss 3.4977 LearningRate 0.000668 Epoch: 10 Global Step: 219500 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:36:16,148-Speed 2498.77 samples/sec Loss 3.5472 LearningRate 0.000668 Epoch: 10 Global Step: 219510 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:36:24,350-Speed 2497.69 samples/sec Loss 3.4571 LearningRate 0.000668 Epoch: 10 Global Step: 219520 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:36:32,548-Speed 2498.40 samples/sec Loss 3.4505 LearningRate 0.000668 Epoch: 10 Global Step: 219530 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:36:40,746-Speed 2498.44 samples/sec Loss 3.5539 LearningRate 0.000668 Epoch: 10 Global Step: 219540 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:36:48,897-Speed 2513.05 samples/sec Loss 3.4272 LearningRate 0.000668 Epoch: 10 Global Step: 219550 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:36:57,093-Speed 2499.09 samples/sec Loss 3.4639 LearningRate 0.000668 Epoch: 10 Global Step: 219560 Fp16 Grad Scale: 32768 Required: 139 
hours Training: 2022-07-07 16:37:05,291-Speed 2498.68 samples/sec Loss 3.4696 LearningRate 0.000668 Epoch: 10 Global Step: 219570 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:37:13,487-Speed 2499.19 samples/sec Loss 3.4792 LearningRate 0.000668 Epoch: 10 Global Step: 219580 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:37:21,690-Speed 2496.90 samples/sec Loss 3.4493 LearningRate 0.000667 Epoch: 10 Global Step: 219590 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:37:29,889-Speed 2498.45 samples/sec Loss 3.3891 LearningRate 0.000667 Epoch: 10 Global Step: 219600 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:37:38,051-Speed 2509.50 samples/sec Loss 3.4509 LearningRate 0.000667 Epoch: 10 Global Step: 219610 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:37:46,253-Speed 2497.47 samples/sec Loss 3.4139 LearningRate 0.000667 Epoch: 10 Global Step: 219620 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:37:54,452-Speed 2498.28 samples/sec Loss 3.4140 LearningRate 0.000667 Epoch: 10 Global Step: 219630 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:38:02,650-Speed 2498.68 samples/sec Loss 3.3606 LearningRate 0.000667 Epoch: 10 Global Step: 219640 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:38:10,847-Speed 2498.86 samples/sec Loss 3.4414 LearningRate 0.000667 Epoch: 10 Global Step: 219650 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:38:19,048-Speed 2497.70 samples/sec Loss 3.4047 LearningRate 0.000667 Epoch: 10 Global Step: 219660 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:38:27,194-Speed 2514.63 samples/sec Loss 3.4739 LearningRate 0.000667 Epoch: 10 Global Step: 219670 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:38:35,392-Speed 2498.55 samples/sec Loss 3.5265 LearningRate 0.000667 Epoch: 10 Global Step: 219680 Fp16 Grad Scale: 32768 Required: 
139 hours Training: 2022-07-07 16:38:43,595-Speed 2496.86 samples/sec Loss 3.4217 LearningRate 0.000667 Epoch: 10 Global Step: 219690 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:38:51,799-Speed 2497.07 samples/sec Loss 3.3641 LearningRate 0.000667 Epoch: 10 Global Step: 219700 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:38:59,996-Speed 2498.80 samples/sec Loss 3.3824 LearningRate 0.000667 Epoch: 10 Global Step: 219710 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:39:08,195-Speed 2498.40 samples/sec Loss 3.4186 LearningRate 0.000667 Epoch: 10 Global Step: 219720 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:39:16,342-Speed 2514.02 samples/sec Loss 3.4091 LearningRate 0.000667 Epoch: 10 Global Step: 219730 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:39:24,537-Speed 2499.48 samples/sec Loss 3.3851 LearningRate 0.000667 Epoch: 10 Global Step: 219740 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:39:32,737-Speed 2498.01 samples/sec Loss 3.5045 LearningRate 0.000667 Epoch: 10 Global Step: 219750 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:39:40,941-Speed 2496.90 samples/sec Loss 3.4443 LearningRate 0.000667 Epoch: 10 Global Step: 219760 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:39:49,137-Speed 2498.96 samples/sec Loss 3.4739 LearningRate 0.000667 Epoch: 10 Global Step: 219770 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:39:57,337-Speed 2498.05 samples/sec Loss 3.4508 LearningRate 0.000667 Epoch: 10 Global Step: 219780 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:40:05,482-Speed 2514.97 samples/sec Loss 3.3664 LearningRate 0.000667 Epoch: 10 Global Step: 219790 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:40:13,685-Speed 2496.95 samples/sec Loss 3.3780 LearningRate 0.000667 Epoch: 10 Global Step: 219800 Fp16 Grad Scale: 32768 
Required: 139 hours Training: 2022-07-07 16:40:21,884-Speed 2498.32 samples/sec Loss 3.4779 LearningRate 0.000667 Epoch: 10 Global Step: 219810 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:40:30,078-Speed 2499.43 samples/sec Loss 3.3813 LearningRate 0.000667 Epoch: 10 Global Step: 219820 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:40:38,276-Speed 2498.82 samples/sec Loss 3.4458 LearningRate 0.000667 Epoch: 10 Global Step: 219830 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:40:46,479-Speed 2496.83 samples/sec Loss 3.3433 LearningRate 0.000667 Epoch: 10 Global Step: 219840 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:40:54,625-Speed 2514.68 samples/sec Loss 3.3676 LearningRate 0.000667 Epoch: 10 Global Step: 219850 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:41:02,831-Speed 2496.11 samples/sec Loss 3.4374 LearningRate 0.000667 Epoch: 10 Global Step: 219860 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:41:11,025-Speed 2499.61 samples/sec Loss 3.4287 LearningRate 0.000667 Epoch: 10 Global Step: 219870 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:41:19,223-Speed 2498.60 samples/sec Loss 3.4775 LearningRate 0.000667 Epoch: 10 Global Step: 219880 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:41:27,427-Speed 2496.91 samples/sec Loss 3.4828 LearningRate 0.000667 Epoch: 10 Global Step: 219890 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:41:35,625-Speed 2498.71 samples/sec Loss 3.5590 LearningRate 0.000667 Epoch: 10 Global Step: 219900 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:41:43,781-Speed 2511.59 samples/sec Loss 3.4015 LearningRate 0.000667 Epoch: 10 Global Step: 219910 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:41:51,989-Speed 2495.18 samples/sec Loss 3.4306 LearningRate 0.000667 Epoch: 10 Global Step: 219920 Fp16 Grad Scale: 
32768 Required: 139 hours Training: 2022-07-07 16:42:00,185-Speed 2499.14 samples/sec Loss 3.4611 LearningRate 0.000667 Epoch: 10 Global Step: 219930 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:42:08,387-Speed 2497.51 samples/sec Loss 3.5167 LearningRate 0.000667 Epoch: 10 Global Step: 219940 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:42:16,601-Speed 2493.82 samples/sec Loss 3.4378 LearningRate 0.000667 Epoch: 10 Global Step: 219950 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:42:24,795-Speed 2499.60 samples/sec Loss 3.5334 LearningRate 0.000667 Epoch: 10 Global Step: 219960 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:42:32,950-Speed 2511.61 samples/sec Loss 3.4434 LearningRate 0.000667 Epoch: 10 Global Step: 219970 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:42:41,148-Speed 2498.84 samples/sec Loss 3.4191 LearningRate 0.000667 Epoch: 10 Global Step: 219980 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:42:49,349-Speed 2497.65 samples/sec Loss 3.4224 LearningRate 0.000667 Epoch: 10 Global Step: 219990 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:42:57,550-Speed 2497.34 samples/sec Loss 3.4337 LearningRate 0.000667 Epoch: 10 Global Step: 220000 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:43:05,743-Speed 2500.27 samples/sec Loss 3.4936 LearningRate 0.000667 Epoch: 10 Global Step: 220010 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:43:13,947-Speed 2496.70 samples/sec Loss 3.4155 LearningRate 0.000667 Epoch: 10 Global Step: 220020 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:43:22,090-Speed 2515.47 samples/sec Loss 3.4944 LearningRate 0.000667 Epoch: 10 Global Step: 220030 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:43:30,291-Speed 2497.72 samples/sec Loss 3.5272 LearningRate 0.000666 Epoch: 10 Global Step: 220040 Fp16 Grad 
Scale: 32768 Required: 139 hours Training: 2022-07-07 16:43:38,492-Speed 2497.69 samples/sec Loss 3.5352 LearningRate 0.000666 Epoch: 10 Global Step: 220050 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:43:46,692-Speed 2497.89 samples/sec Loss 3.4643 LearningRate 0.000666 Epoch: 10 Global Step: 220060 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:43:54,888-Speed 2499.10 samples/sec Loss 3.4063 LearningRate 0.000666 Epoch: 10 Global Step: 220070 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:44:03,089-Speed 2497.55 samples/sec Loss 3.4933 LearningRate 0.000666 Epoch: 10 Global Step: 220080 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:44:11,235-Speed 2514.61 samples/sec Loss 3.5408 LearningRate 0.000666 Epoch: 10 Global Step: 220090 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:44:19,431-Speed 2499.56 samples/sec Loss 3.4578 LearningRate 0.000666 Epoch: 10 Global Step: 220100 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:44:27,630-Speed 2498.05 samples/sec Loss 3.4730 LearningRate 0.000666 Epoch: 10 Global Step: 220110 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:44:35,823-Speed 2500.19 samples/sec Loss 3.4274 LearningRate 0.000666 Epoch: 10 Global Step: 220120 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:44:44,033-Speed 2494.89 samples/sec Loss 3.4827 LearningRate 0.000666 Epoch: 10 Global Step: 220130 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:44:52,233-Speed 2498.57 samples/sec Loss 3.4696 LearningRate 0.000666 Epoch: 10 Global Step: 220140 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:45:00,377-Speed 2515.09 samples/sec Loss 3.4790 LearningRate 0.000666 Epoch: 10 Global Step: 220150 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:45:08,577-Speed 2498.05 samples/sec Loss 3.4614 LearningRate 0.000666 Epoch: 10 Global Step: 220160 Fp16 
Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:45:16,775-Speed 2498.75 samples/sec Loss 3.4263 LearningRate 0.000666 Epoch: 10 Global Step: 220170 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:45:24,971-Speed 2499.02 samples/sec Loss 3.4356 LearningRate 0.000666 Epoch: 10 Global Step: 220180 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:45:33,167-Speed 2499.28 samples/sec Loss 3.4047 LearningRate 0.000666 Epoch: 10 Global Step: 220190 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:45:41,369-Speed 2497.27 samples/sec Loss 3.3508 LearningRate 0.000666 Epoch: 10 Global Step: 220200 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:45:49,513-Speed 2515.07 samples/sec Loss 3.3925 LearningRate 0.000666 Epoch: 10 Global Step: 220210 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:45:57,710-Speed 2499.31 samples/sec Loss 3.4289 LearningRate 0.000666 Epoch: 10 Global Step: 220220 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:46:05,919-Speed 2495.20 samples/sec Loss 3.4323 LearningRate 0.000666 Epoch: 10 Global Step: 220230 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:46:14,121-Speed 2497.37 samples/sec Loss 3.4426 LearningRate 0.000666 Epoch: 10 Global Step: 220240 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:46:22,327-Speed 2496.09 samples/sec Loss 3.3911 LearningRate 0.000666 Epoch: 10 Global Step: 220250 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:46:30,528-Speed 2497.90 samples/sec Loss 3.5007 LearningRate 0.000666 Epoch: 10 Global Step: 220260 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:46:38,671-Speed 2515.26 samples/sec Loss 3.5306 LearningRate 0.000666 Epoch: 10 Global Step: 220270 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:46:46,868-Speed 2498.76 samples/sec Loss 3.3825 LearningRate 0.000666 Epoch: 10 Global Step: 220280 
Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:46:55,066-Speed 2498.65 samples/sec Loss 3.4090 LearningRate 0.000666 Epoch: 10 Global Step: 220290 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:47:03,286-Speed 2491.94 samples/sec Loss 3.3918 LearningRate 0.000666 Epoch: 10 Global Step: 220300 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:47:11,483-Speed 2498.67 samples/sec Loss 3.4602 LearningRate 0.000666 Epoch: 10 Global Step: 220310 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:47:19,683-Speed 2497.74 samples/sec Loss 3.4874 LearningRate 0.000666 Epoch: 10 Global Step: 220320 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:47:27,832-Speed 2513.82 samples/sec Loss 3.5700 LearningRate 0.000666 Epoch: 10 Global Step: 220330 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:47:36,030-Speed 2498.62 samples/sec Loss 3.4484 LearningRate 0.000666 Epoch: 10 Global Step: 220340 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:47:44,237-Speed 2495.59 samples/sec Loss 3.4554 LearningRate 0.000666 Epoch: 10 Global Step: 220350 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:47:52,447-Speed 2494.96 samples/sec Loss 3.5470 LearningRate 0.000666 Epoch: 10 Global Step: 220360 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:48:00,644-Speed 2498.84 samples/sec Loss 3.3998 LearningRate 0.000666 Epoch: 10 Global Step: 220370 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:48:08,837-Speed 2499.96 samples/sec Loss 3.4554 LearningRate 0.000666 Epoch: 10 Global Step: 220380 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:48:16,994-Speed 2511.65 samples/sec Loss 3.5075 LearningRate 0.000666 Epoch: 10 Global Step: 220390 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:48:25,189-Speed 2499.31 samples/sec Loss 3.4746 LearningRate 0.000666 Epoch: 10 Global Step: 
220400 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:48:33,385-Speed 2499.33 samples/sec Loss 3.4449 LearningRate 0.000666 Epoch: 10 Global Step: 220410 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:48:41,582-Speed 2499.04 samples/sec Loss 3.4580 LearningRate 0.000666 Epoch: 10 Global Step: 220420 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:48:49,777-Speed 2499.39 samples/sec Loss 3.4846 LearningRate 0.000666 Epoch: 10 Global Step: 220430 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:48:57,986-Speed 2495.12 samples/sec Loss 3.3941 LearningRate 0.000666 Epoch: 10 Global Step: 220440 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:49:06,135-Speed 2513.68 samples/sec Loss 3.4199 LearningRate 0.000666 Epoch: 10 Global Step: 220450 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:49:14,331-Speed 2499.10 samples/sec Loss 3.4098 LearningRate 0.000666 Epoch: 10 Global Step: 220460 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:49:22,533-Speed 2497.43 samples/sec Loss 3.4358 LearningRate 0.000666 Epoch: 10 Global Step: 220470 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:49:30,734-Speed 2497.88 samples/sec Loss 3.4402 LearningRate 0.000666 Epoch: 10 Global Step: 220480 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:49:38,932-Speed 2498.74 samples/sec Loss 3.4436 LearningRate 0.000666 Epoch: 10 Global Step: 220490 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:49:47,131-Speed 2497.87 samples/sec Loss 3.3921 LearningRate 0.000665 Epoch: 10 Global Step: 220500 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:49:55,277-Speed 2514.63 samples/sec Loss 3.3783 LearningRate 0.000665 Epoch: 10 Global Step: 220510 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:50:03,473-Speed 2499.31 samples/sec Loss 3.5016 LearningRate 0.000665 Epoch: 10 Global 
Step: 220520 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:50:11,672-Speed 2498.33 samples/sec Loss 3.4786 LearningRate 0.000665 Epoch: 10 Global Step: 220530 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:50:19,867-Speed 2499.39 samples/sec Loss 3.5097 LearningRate 0.000665 Epoch: 10 Global Step: 220540 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:50:28,067-Speed 2498.17 samples/sec Loss 3.4039 LearningRate 0.000665 Epoch: 10 Global Step: 220550 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:50:36,348-Speed 2473.30 samples/sec Loss 3.4323 LearningRate 0.000665 Epoch: 10 Global Step: 220560 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:50:44,493-Speed 2514.92 samples/sec Loss 3.4635 LearningRate 0.000665 Epoch: 10 Global Step: 220570 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:50:52,692-Speed 2498.30 samples/sec Loss 3.5497 LearningRate 0.000665 Epoch: 10 Global Step: 220580 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:51:00,890-Speed 2498.57 samples/sec Loss 3.4310 LearningRate 0.000665 Epoch: 10 Global Step: 220590 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:51:09,088-Speed 2498.63 samples/sec Loss 3.4076 LearningRate 0.000665 Epoch: 10 Global Step: 220600 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:51:17,289-Speed 2497.69 samples/sec Loss 3.4442 LearningRate 0.000665 Epoch: 10 Global Step: 220610 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:51:25,484-Speed 2499.46 samples/sec Loss 3.5657 LearningRate 0.000665 Epoch: 10 Global Step: 220620 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:51:33,630-Speed 2514.61 samples/sec Loss 3.4696 LearningRate 0.000665 Epoch: 10 Global Step: 220630 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 16:51:41,786-Speed 2511.32 samples/sec Loss 3.4455 LearningRate 0.000665 Epoch: 10 
Global Step: 220640 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:51:49,994-Speed 2495.64 samples/sec Loss 3.4817 LearningRate 0.000665 Epoch: 10 Global Step: 220650 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:51:58,191-Speed 2498.75 samples/sec Loss 3.4187 LearningRate 0.000665 Epoch: 10 Global Step: 220660 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:52:06,388-Speed 2499.04 samples/sec Loss 3.4210 LearningRate 0.000665 Epoch: 10 Global Step: 220670 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:52:14,584-Speed 2498.92 samples/sec Loss 3.4688 LearningRate 0.000665 Epoch: 10 Global Step: 220680 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:52:22,727-Speed 2515.68 samples/sec Loss 3.4312 LearningRate 0.000665 Epoch: 10 Global Step: 220690 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:52:30,925-Speed 2498.44 samples/sec Loss 3.4850 LearningRate 0.000665 Epoch: 10 Global Step: 220700 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:52:39,125-Speed 2498.09 samples/sec Loss 3.4607 LearningRate 0.000665 Epoch: 10 Global Step: 220710 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:52:47,333-Speed 2495.41 samples/sec Loss 3.4988 LearningRate 0.000665 Epoch: 10 Global Step: 220720 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:52:55,530-Speed 2498.86 samples/sec Loss 3.4485 LearningRate 0.000665 Epoch: 10 Global Step: 220730 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:53:03,731-Speed 2497.80 samples/sec Loss 3.4617 LearningRate 0.000665 Epoch: 10 Global Step: 220740 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:53:11,876-Speed 2514.81 samples/sec Loss 3.4607 LearningRate 0.000665 Epoch: 10 Global Step: 220750 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:53:20,076-Speed 2498.15 samples/sec Loss 3.3788 LearningRate 0.000665 
Epoch: 10 Global Step: 220760 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:53:28,278-Speed 2497.21 samples/sec Loss 3.3811 LearningRate 0.000665 Epoch: 10 Global Step: 220770 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:53:36,485-Speed 2495.98 samples/sec Loss 3.4037 LearningRate 0.000665 Epoch: 10 Global Step: 220780 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:53:44,679-Speed 2499.66 samples/sec Loss 3.4869 LearningRate 0.000665 Epoch: 10 Global Step: 220790 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:53:52,877-Speed 2498.60 samples/sec Loss 3.3865 LearningRate 0.000665 Epoch: 10 Global Step: 220800 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:54:01,035-Speed 2510.98 samples/sec Loss 3.4828 LearningRate 0.000665 Epoch: 10 Global Step: 220810 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:54:09,237-Speed 2497.61 samples/sec Loss 3.4888 LearningRate 0.000665 Epoch: 10 Global Step: 220820 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:54:17,439-Speed 2497.26 samples/sec Loss 3.4494 LearningRate 0.000665 Epoch: 10 Global Step: 220830 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:54:25,636-Speed 2499.01 samples/sec Loss 3.4241 LearningRate 0.000665 Epoch: 10 Global Step: 220840 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:54:33,835-Speed 2498.05 samples/sec Loss 3.3793 LearningRate 0.000665 Epoch: 10 Global Step: 220850 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:54:42,033-Speed 2498.48 samples/sec Loss 3.4576 LearningRate 0.000665 Epoch: 10 Global Step: 220860 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:54:50,176-Speed 2515.72 samples/sec Loss 3.4399 LearningRate 0.000665 Epoch: 10 Global Step: 220870 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:54:58,379-Speed 2497.85 samples/sec Loss 3.5179 LearningRate 
0.000665 Epoch: 10 Global Step: 220880 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:55:06,577-Speed 2498.27 samples/sec Loss 3.5002 LearningRate 0.000665 Epoch: 10 Global Step: 220890 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:55:14,777-Speed 2498.27 samples/sec Loss 3.4945 LearningRate 0.000665 Epoch: 10 Global Step: 220900 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:55:22,973-Speed 2499.30 samples/sec Loss 3.4589 LearningRate 0.000665 Epoch: 10 Global Step: 220910 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:55:31,171-Speed 2498.53 samples/sec Loss 3.4901 LearningRate 0.000665 Epoch: 10 Global Step: 220920 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:55:39,319-Speed 2513.90 samples/sec Loss 3.4489 LearningRate 0.000665 Epoch: 10 Global Step: 220930 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:55:47,518-Speed 2498.30 samples/sec Loss 3.5013 LearningRate 0.000665 Epoch: 10 Global Step: 220940 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:55:55,721-Speed 2497.07 samples/sec Loss 3.4309 LearningRate 0.000665 Epoch: 10 Global Step: 220950 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:56:03,919-Speed 2498.60 samples/sec Loss 3.4582 LearningRate 0.000664 Epoch: 10 Global Step: 220960 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:56:12,120-Speed 2497.71 samples/sec Loss 3.3944 LearningRate 0.000664 Epoch: 10 Global Step: 220970 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:56:20,324-Speed 2496.80 samples/sec Loss 3.3850 LearningRate 0.000664 Epoch: 10 Global Step: 220980 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:56:28,465-Speed 2516.07 samples/sec Loss 3.4716 LearningRate 0.000664 Epoch: 10 Global Step: 220990 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:56:36,662-Speed 2498.57 samples/sec Loss 3.4543 
LearningRate 0.000664 Epoch: 10 Global Step: 221000 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:56:44,861-Speed 2498.40 samples/sec Loss 3.4548 LearningRate 0.000664 Epoch: 10 Global Step: 221010 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:56:53,057-Speed 2499.25 samples/sec Loss 3.4844 LearningRate 0.000664 Epoch: 10 Global Step: 221020 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:57:01,255-Speed 2498.60 samples/sec Loss 3.4834 LearningRate 0.000664 Epoch: 10 Global Step: 221030 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:57:09,456-Speed 2497.72 samples/sec Loss 3.4964 LearningRate 0.000664 Epoch: 10 Global Step: 221040 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:57:17,603-Speed 2514.42 samples/sec Loss 3.4087 LearningRate 0.000664 Epoch: 10 Global Step: 221050 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:57:25,801-Speed 2498.54 samples/sec Loss 3.4213 LearningRate 0.000664 Epoch: 10 Global Step: 221060 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:57:33,999-Speed 2498.12 samples/sec Loss 3.4555 LearningRate 0.000664 Epoch: 10 Global Step: 221070 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:57:42,200-Speed 2497.85 samples/sec Loss 3.4318 LearningRate 0.000664 Epoch: 10 Global Step: 221080 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:57:50,402-Speed 2497.36 samples/sec Loss 3.3925 LearningRate 0.000664 Epoch: 10 Global Step: 221090 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:57:58,607-Speed 2496.16 samples/sec Loss 3.3831 LearningRate 0.000664 Epoch: 10 Global Step: 221100 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:58:06,757-Speed 2513.59 samples/sec Loss 3.4422 LearningRate 0.000664 Epoch: 10 Global Step: 221110 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:58:14,954-Speed 2498.62 samples/sec Loss 
3.4501 LearningRate 0.000664 Epoch: 10 Global Step: 221120 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:58:23,151-Speed 2499.20 samples/sec Loss 3.4868 LearningRate 0.000664 Epoch: 10 Global Step: 221130 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:58:31,349-Speed 2498.57 samples/sec Loss 3.4617 LearningRate 0.000664 Epoch: 10 Global Step: 221140 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:58:39,549-Speed 2497.86 samples/sec Loss 3.4678 LearningRate 0.000664 Epoch: 10 Global Step: 221150 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:58:47,755-Speed 2496.12 samples/sec Loss 3.4822 LearningRate 0.000664 Epoch: 10 Global Step: 221160 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:58:55,901-Speed 2514.46 samples/sec Loss 3.3828 LearningRate 0.000664 Epoch: 10 Global Step: 221170 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:59:04,102-Speed 2498.19 samples/sec Loss 3.4699 LearningRate 0.000664 Epoch: 10 Global Step: 221180 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:59:12,302-Speed 2497.92 samples/sec Loss 3.5368 LearningRate 0.000664 Epoch: 10 Global Step: 221190 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:59:20,502-Speed 2498.23 samples/sec Loss 3.5190 LearningRate 0.000664 Epoch: 10 Global Step: 221200 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:59:28,703-Speed 2497.72 samples/sec Loss 3.5485 LearningRate 0.000664 Epoch: 10 Global Step: 221210 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:59:36,902-Speed 2498.19 samples/sec Loss 3.4080 LearningRate 0.000664 Epoch: 10 Global Step: 221220 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:59:45,046-Speed 2515.11 samples/sec Loss 3.4553 LearningRate 0.000664 Epoch: 10 Global Step: 221230 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 16:59:53,243-Speed 2498.73 samples/sec 
Loss 3.5033 LearningRate 0.000664 Epoch: 10 Global Step: 221240 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:00:01,440-Speed 2498.89 samples/sec Loss 3.4781 LearningRate 0.000664 Epoch: 10 Global Step: 221250 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:00:09,641-Speed 2497.83 samples/sec Loss 3.4371 LearningRate 0.000664 Epoch: 10 Global Step: 221260 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:00:17,841-Speed 2497.89 samples/sec Loss 3.4787 LearningRate 0.000664 Epoch: 10 Global Step: 221270 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:00:26,038-Speed 2498.84 samples/sec Loss 3.4508 LearningRate 0.000664 Epoch: 10 Global Step: 221280 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:00:34,183-Speed 2514.85 samples/sec Loss 3.4132 LearningRate 0.000664 Epoch: 10 Global Step: 221290 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:00:42,389-Speed 2496.25 samples/sec Loss 3.4959 LearningRate 0.000664 Epoch: 10 Global Step: 221300 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:00:50,588-Speed 2498.34 samples/sec Loss 3.4533 LearningRate 0.000664 Epoch: 10 Global Step: 221310 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:00:58,786-Speed 2498.42 samples/sec Loss 3.5228 LearningRate 0.000664 Epoch: 10 Global Step: 221320 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:01:06,980-Speed 2499.71 samples/sec Loss 3.4919 LearningRate 0.000664 Epoch: 10 Global Step: 221330 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:01:15,179-Speed 2498.39 samples/sec Loss 3.5040 LearningRate 0.000664 Epoch: 10 Global Step: 221340 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:01:23,334-Speed 2511.80 samples/sec Loss 3.5607 LearningRate 0.000664 Epoch: 10 Global Step: 221350 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:01:31,527-Speed 2499.97 
samples/sec Loss 3.4331 LearningRate 0.000664 Epoch: 10 Global Step: 221360 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:01:39,729-Speed 2497.34 samples/sec Loss 3.4924 LearningRate 0.000664 Epoch: 10 Global Step: 221370 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:01:47,928-Speed 2498.26 samples/sec Loss 3.4803 LearningRate 0.000664 Epoch: 10 Global Step: 221380 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:01:56,121-Speed 2500.17 samples/sec Loss 3.4389 LearningRate 0.000664 Epoch: 10 Global Step: 221390 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:02:04,320-Speed 2498.36 samples/sec Loss 3.3739 LearningRate 0.000664 Epoch: 10 Global Step: 221400 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:02:12,480-Speed 2510.04 samples/sec Loss 3.4948 LearningRate 0.000664 Epoch: 10 Global Step: 221410 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:02:20,679-Speed 2498.30 samples/sec Loss 3.4642 LearningRate 0.000663 Epoch: 10 Global Step: 221420 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:02:28,879-Speed 2497.88 samples/sec Loss 3.4482 LearningRate 0.000663 Epoch: 10 Global Step: 221430 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:02:37,077-Speed 2498.63 samples/sec Loss 3.4794 LearningRate 0.000663 Epoch: 10 Global Step: 221440 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:02:45,273-Speed 2499.21 samples/sec Loss 3.4238 LearningRate 0.000663 Epoch: 10 Global Step: 221450 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:02:53,473-Speed 2497.93 samples/sec Loss 3.4604 LearningRate 0.000663 Epoch: 10 Global Step: 221460 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:03:01,617-Speed 2515.16 samples/sec Loss 3.4841 LearningRate 0.000663 Epoch: 10 Global Step: 221470 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:03:09,815-Speed 
2498.48 samples/sec Loss 3.4187 LearningRate 0.000663 Epoch: 10 Global Step: 221480 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:03:18,012-Speed 2498.88 samples/sec Loss 3.4629 LearningRate 0.000663 Epoch: 10 Global Step: 221490 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:03:26,210-Speed 2498.58 samples/sec Loss 3.4595 LearningRate 0.000663 Epoch: 10 Global Step: 221500 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:03:34,411-Speed 2497.74 samples/sec Loss 3.4329 LearningRate 0.000663 Epoch: 10 Global Step: 221510 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:03:42,614-Speed 2497.46 samples/sec Loss 3.4960 LearningRate 0.000663 Epoch: 10 Global Step: 221520 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:03:50,757-Speed 2515.45 samples/sec Loss 3.5267 LearningRate 0.000663 Epoch: 10 Global Step: 221530 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:03:58,958-Speed 2497.50 samples/sec Loss 3.4744 LearningRate 0.000663 Epoch: 10 Global Step: 221540 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:04:07,157-Speed 2498.32 samples/sec Loss 3.4450 LearningRate 0.000663 Epoch: 10 Global Step: 221550 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:04:15,359-Speed 2497.48 samples/sec Loss 3.3514 LearningRate 0.000663 Epoch: 10 Global Step: 221560 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:04:23,561-Speed 2497.65 samples/sec Loss 3.4046 LearningRate 0.000663 Epoch: 10 Global Step: 221570 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:04:31,765-Speed 2496.67 samples/sec Loss 3.4182 LearningRate 0.000663 Epoch: 10 Global Step: 221580 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:04:39,919-Speed 2512.37 samples/sec Loss 3.4083 LearningRate 0.000663 Epoch: 10 Global Step: 221590 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 
17:04:48,121-Speed 2497.35 samples/sec Loss 3.4593 LearningRate 0.000663 Epoch: 10 Global Step: 221600 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:04:56,342-Speed 2491.41 samples/sec Loss 3.4523 LearningRate 0.000663 Epoch: 10 Global Step: 221610 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:05:04,546-Speed 2496.94 samples/sec Loss 3.4403 LearningRate 0.000663 Epoch: 10 Global Step: 221620 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:05:12,752-Speed 2496.11 samples/sec Loss 3.4085 LearningRate 0.000663 Epoch: 10 Global Step: 221630 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:05:20,956-Speed 2496.74 samples/sec Loss 3.4267 LearningRate 0.000663 Epoch: 10 Global Step: 221640 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:05:29,104-Speed 2514.01 samples/sec Loss 3.3508 LearningRate 0.000663 Epoch: 10 Global Step: 221650 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:05:37,306-Speed 2497.36 samples/sec Loss 3.3700 LearningRate 0.000663 Epoch: 10 Global Step: 221660 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:05:45,508-Speed 2497.31 samples/sec Loss 3.3820 LearningRate 0.000663 Epoch: 10 Global Step: 221670 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:05:53,737-Speed 2489.10 samples/sec Loss 3.3873 LearningRate 0.000663 Epoch: 10 Global Step: 221680 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:06:01,938-Speed 2497.58 samples/sec Loss 3.3654 LearningRate 0.000663 Epoch: 10 Global Step: 221690 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:06:10,138-Speed 2498.03 samples/sec Loss 3.4372 LearningRate 0.000663 Epoch: 10 Global Step: 221700 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:06:18,291-Speed 2512.40 samples/sec Loss 3.3918 LearningRate 0.000663 Epoch: 10 Global Step: 221710 Fp16 Grad Scale: 32768 Required: 139 hours Training: 
2022-07-07 17:06:26,488-Speed 2498.97 samples/sec Loss 3.4366 LearningRate 0.000663 Epoch: 10 Global Step: 221720 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:06:34,686-Speed 2498.51 samples/sec Loss 3.4450 LearningRate 0.000663 Epoch: 10 Global Step: 221730 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:06:42,885-Speed 2498.45 samples/sec Loss 3.3834 LearningRate 0.000663 Epoch: 10 Global Step: 221740 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:06:51,093-Speed 2495.62 samples/sec Loss 3.4380 LearningRate 0.000663 Epoch: 10 Global Step: 221750 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:06:59,289-Speed 2499.06 samples/sec Loss 3.4266 LearningRate 0.000663 Epoch: 10 Global Step: 221760 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:07:07,443-Speed 2512.25 samples/sec Loss 3.4077 LearningRate 0.000663 Epoch: 10 Global Step: 221770 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:07:15,651-Speed 2495.47 samples/sec Loss 3.4635 LearningRate 0.000663 Epoch: 10 Global Step: 221780 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:07:23,853-Speed 2497.41 samples/sec Loss 3.3869 LearningRate 0.000663 Epoch: 10 Global Step: 221790 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:07:32,052-Speed 2498.15 samples/sec Loss 3.4214 LearningRate 0.000663 Epoch: 10 Global Step: 221800 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:07:40,249-Speed 2498.99 samples/sec Loss 3.3757 LearningRate 0.000663 Epoch: 10 Global Step: 221810 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:07:48,463-Speed 2493.48 samples/sec Loss 3.4171 LearningRate 0.000663 Epoch: 10 Global Step: 221820 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:07:56,606-Speed 2515.38 samples/sec Loss 3.3919 LearningRate 0.000663 Epoch: 10 Global Step: 221830 Fp16 Grad Scale: 32768 Required: 139 hours 
Training: 2022-07-07 17:08:04,809-Speed 2497.17 samples/sec Loss 3.4076 LearningRate 0.000663 Epoch: 10 Global Step: 221840 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:08:13,012-Speed 2497.02 samples/sec Loss 3.4136 LearningRate 0.000663 Epoch: 10 Global Step: 221850 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:08:21,209-Speed 2499.09 samples/sec Loss 3.4657 LearningRate 0.000663 Epoch: 10 Global Step: 221860 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:08:29,409-Speed 2497.93 samples/sec Loss 3.3766 LearningRate 0.000662 Epoch: 10 Global Step: 221870 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:08:37,607-Speed 2498.57 samples/sec Loss 3.3815 LearningRate 0.000662 Epoch: 10 Global Step: 221880 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:08:45,753-Speed 2514.65 samples/sec Loss 3.4457 LearningRate 0.000662 Epoch: 10 Global Step: 221890 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:08:53,959-Speed 2496.11 samples/sec Loss 3.4623 LearningRate 0.000662 Epoch: 10 Global Step: 221900 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:09:02,162-Speed 2497.12 samples/sec Loss 3.4649 LearningRate 0.000662 Epoch: 10 Global Step: 221910 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:09:10,364-Speed 2497.13 samples/sec Loss 3.4189 LearningRate 0.000662 Epoch: 10 Global Step: 221920 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:09:18,559-Speed 2499.40 samples/sec Loss 3.4080 LearningRate 0.000662 Epoch: 10 Global Step: 221930 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:09:26,758-Speed 2498.57 samples/sec Loss 3.3951 LearningRate 0.000662 Epoch: 10 Global Step: 221940 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:09:34,908-Speed 2513.28 samples/sec Loss 3.4744 LearningRate 0.000662 Epoch: 10 Global Step: 221950 Fp16 Grad Scale: 65536 Required: 139 
hours Training: 2022-07-07 17:09:43,108-Speed 2497.98 samples/sec Loss 3.3287 LearningRate 0.000662 Epoch: 10 Global Step: 221960 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:09:51,307-Speed 2498.20 samples/sec Loss 3.3623 LearningRate 0.000662 Epoch: 10 Global Step: 221970 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:09:59,504-Speed 2498.99 samples/sec Loss 3.3774 LearningRate 0.000662 Epoch: 10 Global Step: 221980 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:10:07,709-Speed 2496.22 samples/sec Loss 3.4177 LearningRate 0.000662 Epoch: 10 Global Step: 221990 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:10:15,910-Speed 2497.70 samples/sec Loss 3.4433 LearningRate 0.000662 Epoch: 10 Global Step: 222000 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:10:24,059-Speed 2514.72 samples/sec Loss 3.4078 LearningRate 0.000662 Epoch: 10 Global Step: 222010 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:10:32,267-Speed 2495.37 samples/sec Loss 3.3987 LearningRate 0.000662 Epoch: 10 Global Step: 222020 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:10:40,466-Speed 2498.25 samples/sec Loss 3.4301 LearningRate 0.000662 Epoch: 10 Global Step: 222030 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:10:48,663-Speed 2498.76 samples/sec Loss 3.3978 LearningRate 0.000662 Epoch: 10 Global Step: 222040 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:10:56,860-Speed 2498.94 samples/sec Loss 3.3435 LearningRate 0.000662 Epoch: 10 Global Step: 222050 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:11:05,062-Speed 2497.38 samples/sec Loss 3.4263 LearningRate 0.000662 Epoch: 10 Global Step: 222060 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:11:13,208-Speed 2514.50 samples/sec Loss 3.4031 LearningRate 0.000662 Epoch: 10 Global Step: 222070 Fp16 Grad Scale: 65536 Required: 
139 hours Training: 2022-07-07 17:11:21,409-Speed 2499.13 samples/sec Loss 3.4033 LearningRate 0.000662 Epoch: 10 Global Step: 222080 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:11:29,613-Speed 2496.53 samples/sec Loss 3.3639 LearningRate 0.000662 Epoch: 10 Global Step: 222090 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:11:37,814-Speed 2497.70 samples/sec Loss 3.3871 LearningRate 0.000662 Epoch: 10 Global Step: 222100 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:11:46,012-Speed 2498.63 samples/sec Loss 3.5102 LearningRate 0.000662 Epoch: 10 Global Step: 222110 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:11:54,223-Speed 2494.70 samples/sec Loss 3.4081 LearningRate 0.000662 Epoch: 10 Global Step: 222120 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:12:02,381-Speed 2510.88 samples/sec Loss 3.4154 LearningRate 0.000662 Epoch: 10 Global Step: 222130 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:12:10,580-Speed 2498.31 samples/sec Loss 3.3998 LearningRate 0.000662 Epoch: 10 Global Step: 222140 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:12:18,782-Speed 2497.01 samples/sec Loss 3.3776 LearningRate 0.000662 Epoch: 10 Global Step: 222150 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:12:26,985-Speed 2497.19 samples/sec Loss 3.4609 LearningRate 0.000662 Epoch: 10 Global Step: 222160 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:12:35,195-Speed 2494.88 samples/sec Loss 3.3785 LearningRate 0.000662 Epoch: 10 Global Step: 222170 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:12:43,399-Speed 2496.88 samples/sec Loss 3.3650 LearningRate 0.000662 Epoch: 10 Global Step: 222180 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:12:51,542-Speed 2515.35 samples/sec Loss 3.4013 LearningRate 0.000662 Epoch: 10 Global Step: 222190 Fp16 Grad Scale: 65536 
Required: 139 hours Training: 2022-07-07 17:12:59,753-Speed 2494.23 samples/sec Loss 3.4204 LearningRate 0.000662 Epoch: 10 Global Step: 222200 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:13:07,949-Speed 2499.50 samples/sec Loss 3.4057 LearningRate 0.000662 Epoch: 10 Global Step: 222210 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:13:16,148-Speed 2498.31 samples/sec Loss 3.3587 LearningRate 0.000662 Epoch: 10 Global Step: 222220 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:13:24,358-Speed 2494.95 samples/sec Loss 3.3738 LearningRate 0.000662 Epoch: 10 Global Step: 222230 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:13:32,558-Speed 2498.08 samples/sec Loss 3.3601 LearningRate 0.000662 Epoch: 10 Global Step: 222240 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:13:40,700-Speed 2515.69 samples/sec Loss 3.4611 LearningRate 0.000662 Epoch: 10 Global Step: 222250 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:13:48,899-Speed 2498.10 samples/sec Loss 3.4154 LearningRate 0.000662 Epoch: 10 Global Step: 222260 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:13:57,120-Speed 2491.83 samples/sec Loss 3.4129 LearningRate 0.000662 Epoch: 10 Global Step: 222270 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:14:05,319-Speed 2498.21 samples/sec Loss 3.4398 LearningRate 0.000662 Epoch: 10 Global Step: 222280 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:14:13,518-Speed 2498.02 samples/sec Loss 3.4185 LearningRate 0.000662 Epoch: 10 Global Step: 222290 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:14:21,717-Speed 2498.32 samples/sec Loss 3.4674 LearningRate 0.000662 Epoch: 10 Global Step: 222300 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:14:29,864-Speed 2514.12 samples/sec Loss 3.4160 LearningRate 0.000662 Epoch: 10 Global Step: 222310 Fp16 Grad Scale: 
65536 Required: 139 hours Training: 2022-07-07 17:14:38,062-Speed 2498.55 samples/sec Loss 3.3912 LearningRate 0.000662 Epoch: 10 Global Step: 222320 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:14:46,264-Speed 2497.66 samples/sec Loss 3.3922 LearningRate 0.000661 Epoch: 10 Global Step: 222330 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:14:54,464-Speed 2497.77 samples/sec Loss 3.4555 LearningRate 0.000661 Epoch: 10 Global Step: 222340 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:15:02,662-Speed 2498.75 samples/sec Loss 3.3932 LearningRate 0.000661 Epoch: 10 Global Step: 222350 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:15:10,860-Speed 2498.90 samples/sec Loss 3.4028 LearningRate 0.000661 Epoch: 10 Global Step: 222360 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:15:19,006-Speed 2514.49 samples/sec Loss 3.4858 LearningRate 0.000661 Epoch: 10 Global Step: 222370 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:15:27,203-Speed 2498.97 samples/sec Loss 3.4096 LearningRate 0.000661 Epoch: 10 Global Step: 222380 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:15:35,401-Speed 2498.49 samples/sec Loss 3.3918 LearningRate 0.000661 Epoch: 10 Global Step: 222390 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:15:43,603-Speed 2497.46 samples/sec Loss 3.3941 LearningRate 0.000661 Epoch: 10 Global Step: 222400 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:15:51,798-Speed 2499.29 samples/sec Loss 3.4157 LearningRate 0.000661 Epoch: 10 Global Step: 222410 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:16:00,000-Speed 2497.50 samples/sec Loss 3.4687 LearningRate 0.000661 Epoch: 10 Global Step: 222420 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:16:08,147-Speed 2514.10 samples/sec Loss 3.3847 LearningRate 0.000661 Epoch: 10 Global Step: 222430 Fp16 Grad 
Scale: 65536 Required: 139 hours Training: 2022-07-07 17:16:16,349-Speed 2497.45 samples/sec Loss 3.4748 LearningRate 0.000661 Epoch: 10 Global Step: 222440 Fp16 Grad Scale: 65536 Required: 139 hours Training: 2022-07-07 17:16:24,506-Speed 2510.94 samples/sec Loss 3.4812 LearningRate 0.000661 Epoch: 10 Global Step: 222450 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:16:32,720-Speed 2493.98 samples/sec Loss 3.4295 LearningRate 0.000661 Epoch: 10 Global Step: 222460 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:16:40,920-Speed 2497.93 samples/sec Loss 3.4948 LearningRate 0.000661 Epoch: 10 Global Step: 222470 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:16:49,120-Speed 2497.81 samples/sec Loss 3.4254 LearningRate 0.000661 Epoch: 10 Global Step: 222480 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:16:57,265-Speed 2514.99 samples/sec Loss 3.4261 LearningRate 0.000661 Epoch: 10 Global Step: 222490 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:17:05,466-Speed 2497.64 samples/sec Loss 3.4225 LearningRate 0.000661 Epoch: 10 Global Step: 222500 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:17:13,664-Speed 2498.32 samples/sec Loss 3.4949 LearningRate 0.000661 Epoch: 10 Global Step: 222510 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:17:21,863-Speed 2498.31 samples/sec Loss 3.4084 LearningRate 0.000661 Epoch: 10 Global Step: 222520 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:17:30,081-Speed 2492.50 samples/sec Loss 3.4086 LearningRate 0.000661 Epoch: 10 Global Step: 222530 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:17:38,275-Speed 2499.65 samples/sec Loss 3.3220 LearningRate 0.000661 Epoch: 10 Global Step: 222540 Fp16 Grad Scale: 32768 Required: 139 hours Training: 2022-07-07 17:17:46,419-Speed 2515.40 samples/sec Loss 3.4382 LearningRate 0.000661 Epoch: 10 Global Step: 222550 Fp16 
Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:17:54,615-Speed 2499.12 samples/sec Loss 3.4251 LearningRate 0.000661 Epoch: 10 Global Step: 222560 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:18:02,811-Speed 2499.17 samples/sec Loss 3.3763 LearningRate 0.000661 Epoch: 10 Global Step: 222570 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:18:11,007-Speed 2499.10 samples/sec Loss 3.3755 LearningRate 0.000661 Epoch: 10 Global Step: 222580 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:18:19,204-Speed 2498.79 samples/sec Loss 3.4303 LearningRate 0.000661 Epoch: 10 Global Step: 222590 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:18:27,414-Speed 2495.40 samples/sec Loss 3.3902 LearningRate 0.000661 Epoch: 10 Global Step: 222600 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:18:35,559-Speed 2514.96 samples/sec Loss 3.4281 LearningRate 0.000661 Epoch: 10 Global Step: 222610 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:18:43,754-Speed 2499.17 samples/sec Loss 3.4074 LearningRate 0.000661 Epoch: 10 Global Step: 222620 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:18:51,975-Speed 2491.64 samples/sec Loss 3.4432 LearningRate 0.000661 Epoch: 10 Global Step: 222630 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:19:00,173-Speed 2498.64 samples/sec Loss 3.4558 LearningRate 0.000661 Epoch: 10 Global Step: 222640 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:19:08,368-Speed 2499.39 samples/sec Loss 3.3006 LearningRate 0.000661 Epoch: 10 Global Step: 222650 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:19:16,567-Speed 2498.41 samples/sec Loss 3.4633 LearningRate 0.000661 Epoch: 10 Global Step: 222660 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:19:24,710-Speed 2515.52 samples/sec Loss 3.4206 LearningRate 0.000661 Epoch: 10 Global Step: 222670 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:19:32,907-Speed 2498.62 samples/sec Loss 3.4414 LearningRate 0.000661 Epoch: 10 Global Step: 222680 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:19:41,107-Speed 2498.14 samples/sec Loss 3.4278 LearningRate 0.000661 Epoch: 10 Global Step: 222690 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:19:49,307-Speed 2498.14 samples/sec Loss 3.4432 LearningRate 0.000661 Epoch: 10 Global Step: 222700 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:19:57,519-Speed 2494.29 samples/sec Loss 3.3970 LearningRate 0.000661 Epoch: 10 Global Step: 222710 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:20:05,716-Speed 2498.79 samples/sec Loss 3.4213 LearningRate 0.000661 Epoch: 10 Global Step: 222720 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:20:13,866-Speed 2513.40 samples/sec Loss 3.3913 LearningRate 0.000661 Epoch: 10 Global Step: 222730 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:20:22,067-Speed 2497.60 samples/sec Loss 3.4541 LearningRate 0.000661 Epoch: 10 Global Step: 222740 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:20:30,268-Speed 2498.05 samples/sec Loss 3.3795 LearningRate 0.000661 Epoch: 10 Global Step: 222750 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:20:38,469-Speed 2497.58 samples/sec Loss 3.4235 LearningRate 0.000661 Epoch: 10 Global Step: 222760 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:20:46,670-Speed 2497.62 samples/sec Loss 3.3379 LearningRate 0.000661 Epoch: 10 Global Step: 222770 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:20:54,868-Speed 2498.61 samples/sec Loss 3.3892 LearningRate 0.000661 Epoch: 10 Global Step: 222780 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:21:03,011-Speed 2515.47 samples/sec Loss 3.4360 LearningRate 0.000660 Epoch: 10 Global Step: 222790 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:21:11,208-Speed 2498.72 samples/sec Loss 3.4047 LearningRate 0.000660 Epoch: 10 Global Step: 222800 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:21:19,407-Speed 2498.53 samples/sec Loss 3.3805 LearningRate 0.000660 Epoch: 10 Global Step: 222810 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:21:27,606-Speed 2498.45 samples/sec Loss 3.4093 LearningRate 0.000660 Epoch: 10 Global Step: 222820 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:21:35,809-Speed 2497.18 samples/sec Loss 3.4226 LearningRate 0.000660 Epoch: 10 Global Step: 222830 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:21:44,006-Speed 2498.69 samples/sec Loss 3.4069 LearningRate 0.000660 Epoch: 10 Global Step: 222840 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:21:52,151-Speed 2515.11 samples/sec Loss 3.4461 LearningRate 0.000660 Epoch: 10 Global Step: 222850 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:22:00,351-Speed 2497.91 samples/sec Loss 3.4267 LearningRate 0.000660 Epoch: 10 Global Step: 222860 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:22:08,548-Speed 2498.77 samples/sec Loss 3.4663 LearningRate 0.000660 Epoch: 10 Global Step: 222870 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:22:16,749-Speed 2497.60 samples/sec Loss 3.4797 LearningRate 0.000660 Epoch: 10 Global Step: 222880 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:22:25,524-Speed 2500.38 samples/sec Loss 3.4435 LearningRate 0.000660 Epoch: 10 Global Step: 222890 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:22:33,726-Speed 2497.30 samples/sec Loss 3.4557 LearningRate 0.000660 Epoch: 10 Global Step: 222900 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:22:41,869-Speed 2515.34 samples/sec Loss 3.3950 LearningRate 0.000660 Epoch: 10 Global Step: 222910 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:22:50,081-Speed 2494.40 samples/sec Loss 3.4298 LearningRate 0.000660 Epoch: 10 Global Step: 222920 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:22:58,291-Speed 2494.88 samples/sec Loss 3.4357 LearningRate 0.000660 Epoch: 10 Global Step: 222930 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:23:06,489-Speed 2498.66 samples/sec Loss 3.4625 LearningRate 0.000660 Epoch: 10 Global Step: 222940 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:23:14,686-Speed 2498.58 samples/sec Loss 3.5162 LearningRate 0.000660 Epoch: 10 Global Step: 222950 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:23:22,885-Speed 2498.43 samples/sec Loss 3.4049 LearningRate 0.000660 Epoch: 10 Global Step: 222960 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:23:31,029-Speed 2515.52 samples/sec Loss 3.4526 LearningRate 0.000660 Epoch: 10 Global Step: 222970 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:23:39,229-Speed 2497.78 samples/sec Loss 3.4777 LearningRate 0.000660 Epoch: 10 Global Step: 222980 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:23:47,425-Speed 2499.31 samples/sec Loss 3.4398 LearningRate 0.000660 Epoch: 10 Global Step: 222990 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:23:55,635-Speed 2494.90 samples/sec Loss 3.4222 LearningRate 0.000660 Epoch: 10 Global Step: 223000 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:24:03,832-Speed 2498.68 samples/sec Loss 3.4697 LearningRate 0.000660 Epoch: 10 Global Step: 223010 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:24:12,033-Speed 2497.63 samples/sec Loss 3.4558 LearningRate 0.000660 Epoch: 10 Global Step: 223020 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:24:20,183-Speed 2513.50 samples/sec Loss 3.4256 LearningRate 0.000660 Epoch: 10 Global Step: 223030 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:24:28,381-Speed 2498.37 samples/sec Loss 3.4545 LearningRate 0.000660 Epoch: 10 Global Step: 223040 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:24:36,580-Speed 2498.59 samples/sec Loss 3.4612 LearningRate 0.000660 Epoch: 10 Global Step: 223050 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:24:44,777-Speed 2498.83 samples/sec Loss 3.4540 LearningRate 0.000660 Epoch: 10 Global Step: 223060 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:24:52,976-Speed 2498.19 samples/sec Loss 3.3931 LearningRate 0.000660 Epoch: 10 Global Step: 223070 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:25:01,174-Speed 2498.59 samples/sec Loss 3.5920 LearningRate 0.000660 Epoch: 10 Global Step: 223080 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:25:09,327-Speed 2512.22 samples/sec Loss 3.3910 LearningRate 0.000660 Epoch: 10 Global Step: 223090 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:25:18,507-Speed 2492.03 samples/sec Loss 3.4486 LearningRate 0.000660 Epoch: 10 Global Step: 223100 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:25:26,705-Speed 2498.49 samples/sec Loss 3.5030 LearningRate 0.000660 Epoch: 10 Global Step: 223110 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:25:37,385-Speed 1920.99 samples/sec Loss 3.4648 LearningRate 0.000660 Epoch: 10 Global Step: 223120 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:25:45,673-Speed 2502.26 samples/sec Loss 3.5856 LearningRate 0.000660 Epoch: 10 Global Step: 223130 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:25:53,871-Speed 2498.44 samples/sec Loss 3.4563 LearningRate 0.000660 Epoch: 10 Global Step: 223140 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:26:02,015-Speed 2514.99 samples/sec Loss 3.4238 LearningRate 0.000660 Epoch: 10 Global Step: 223150 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:26:13,906-Speed 1740.74 samples/sec Loss 3.4044 LearningRate 0.000660 Epoch: 10 Global Step: 223160 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:26:22,115-Speed 2501.72 samples/sec Loss 3.4333 LearningRate 0.000660 Epoch: 10 Global Step: 223170 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:26:30,328-Speed 2494.00 samples/sec Loss 3.3933 LearningRate 0.000660 Epoch: 10 Global Step: 223180 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:26:41,655-Speed 1821.67 samples/sec Loss 3.4589 LearningRate 0.000660 Epoch: 10 Global Step: 223190 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:26:49,884-Speed 2502.32 samples/sec Loss 3.4572 LearningRate 0.000660 Epoch: 10 Global Step: 223200 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:26:58,026-Speed 2515.67 samples/sec Loss 3.4663 LearningRate 0.000660 Epoch: 10 Global Step: 223210 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:27:06,260-Speed 2501.13 samples/sec Loss 3.4807 LearningRate 0.000660 Epoch: 10 Global Step: 223220 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:27:14,489-Speed 2500.06 samples/sec Loss 3.4357 LearningRate 0.000660 Epoch: 10 Global Step: 223230 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:27:26,553-Speed 2499.74 samples/sec Loss 3.4627 LearningRate 0.000660 Epoch: 10 Global Step: 223240 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:27:34,791-Speed 2502.55 samples/sec Loss 3.4421 LearningRate 0.000659 Epoch: 10 Global Step: 223250 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:27:46,087-Speed 1818.93 samples/sec Loss 3.4835 LearningRate 0.000659 Epoch: 10 Global Step: 223260 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:27:54,231-Speed 2518.91 samples/sec Loss 3.4772 LearningRate 0.000659 Epoch: 10 Global Step: 223270 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:28:02,421-Speed 2500.88 samples/sec Loss 3.4482 LearningRate 0.000659 Epoch: 10 Global Step: 223280 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:28:10,622-Speed 2500.63 samples/sec Loss 3.5158 LearningRate 0.000659 Epoch: 10 Global Step: 223290 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:28:18,879-Speed 2498.35 samples/sec Loss 3.4387 LearningRate 0.000659 Epoch: 10 Global Step: 223300 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:28:27,475-Speed 2501.03 samples/sec Loss 3.4378 LearningRate 0.000659 Epoch: 10 Global Step: 223310 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:28:38,814-Speed 1806.40 samples/sec Loss 3.4959 LearningRate 0.000659 Epoch: 10 Global Step: 223320 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:28:47,000-Speed 2517.57 samples/sec Loss 3.4171 LearningRate 0.000659 Epoch: 10 Global Step: 223330 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:28:55,214-Speed 2500.09 samples/sec Loss 3.4104 LearningRate 0.000659 Epoch: 10 Global Step: 223340 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:29:03,415-Speed 2497.74 samples/sec Loss 3.4077 LearningRate 0.000659 Epoch: 10 Global Step: 223350 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:29:16,073-Speed 2502.17 samples/sec Loss 3.3351 LearningRate 0.000659 Epoch: 10 Global Step: 223360 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:29:24,312-Speed 2501.58 samples/sec Loss 3.3027 LearningRate 0.000659 Epoch: 10 Global Step: 223370 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:29:32,553-Speed 2501.25 samples/sec Loss 3.3585 LearningRate 0.000659 Epoch: 10 Global Step: 223380 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:29:44,643-Speed 1693.99 samples/sec Loss 3.3522 LearningRate 0.000659 Epoch: 10 Global Step: 223390 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:29:54,051-Speed 2359.62 samples/sec Loss 3.3668 LearningRate 0.000659 Epoch: 10 Global Step: 223400 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:30:02,248-Speed 2500.83 samples/sec Loss 3.3095 LearningRate 0.000659 Epoch: 10 Global Step: 223410 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:30:10,449-Speed 2497.54 samples/sec Loss 3.4031 LearningRate 0.000659 Epoch: 10 Global Step: 223420 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:30:20,382-Speed 2182.27 samples/sec Loss 3.4061 LearningRate 0.000659 Epoch: 10 Global Step: 223430 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:30:28,770-Speed 2497.27 samples/sec Loss 3.3627 LearningRate 0.000659 Epoch: 10 Global Step: 223440 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:30:37,202-Speed 2515.54 samples/sec Loss 3.4233 LearningRate 0.000659 Epoch: 10 Global Step: 223450 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:30:45,411-Speed 2495.18 samples/sec Loss 3.4037 LearningRate 0.000659 Epoch: 10 Global Step: 223460 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:30:53,611-Speed 2497.91 samples/sec Loss 3.3430 LearningRate 0.000659 Epoch: 10 Global Step: 223470 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:31:01,824-Speed 2494.18 samples/sec Loss 3.4131 LearningRate 0.000659 Epoch: 10 Global Step: 223480 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:31:10,028-Speed 2496.74 samples/sec Loss 3.4171 LearningRate 0.000659 Epoch: 10 Global Step: 223490 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:31:18,227-Speed 2498.46 samples/sec Loss 3.3074 LearningRate 0.000659 Epoch: 10 Global Step: 223500 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:31:26,388-Speed 2510.00 samples/sec Loss 3.4178 LearningRate 0.000659 Epoch: 10 Global Step: 223510 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:31:34,602-Speed 2493.68 samples/sec Loss 3.3851 LearningRate 0.000659 Epoch: 10 Global Step: 223520 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:31:42,810-Speed 2495.50 samples/sec Loss 3.2845 LearningRate 0.000659 Epoch: 10 Global Step: 223530 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:31:51,012-Speed 2497.43 samples/sec Loss 3.3260 LearningRate 0.000659 Epoch: 10 Global Step: 223540 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:31:59,215-Speed 2496.97 samples/sec Loss 3.4013 LearningRate 0.000659 Epoch: 10 Global Step: 223550 Fp16 Grad Scale: 32768 Required: 139 hours
Training: 2022-07-07 17:32:07,418-Speed 2497.14 samples/sec Loss 3.3087 LearningRate 0.000659 Epoch: 10 Global Step: 223560 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:32:15,565-Speed 2514.17 samples/sec Loss 3.3299 LearningRate 0.000659 Epoch: 10 Global Step: 223570 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:32:23,768-Speed 2496.76 samples/sec Loss 3.3270 LearningRate 0.000659 Epoch: 10 Global Step: 223580 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:32:31,969-Speed 2497.83 samples/sec Loss 3.2983 LearningRate 0.000659 Epoch: 10 Global Step: 223590 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:32:40,168-Speed 2498.19 samples/sec Loss 3.3976 LearningRate 0.000659 Epoch: 10 Global Step: 223600 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:32:48,370-Speed 2497.41 samples/sec Loss 3.4611 LearningRate 0.000659 Epoch: 10 Global Step: 223610 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:32:56,571-Speed 2497.70 samples/sec Loss 3.3712 LearningRate 0.000659 Epoch: 10 Global Step: 223620 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:33:04,720-Speed 2513.75 samples/sec Loss 3.3927 LearningRate 0.000659 Epoch: 10 Global Step: 223630 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:33:12,919-Speed 2498.36 samples/sec Loss 3.4640 LearningRate 0.000659 Epoch: 10 Global Step: 223640 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:33:21,122-Speed 2497.10 samples/sec Loss 3.4201 LearningRate 0.000659 Epoch: 10 Global Step: 223650 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:33:29,322-Speed 2497.94 samples/sec Loss 3.3693 LearningRate 0.000659 Epoch: 10 Global Step: 223660 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:33:37,529-Speed 2496.02 samples/sec Loss 3.3967 LearningRate 0.000659 Epoch: 10 Global Step: 223670 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:33:45,729-Speed 2497.94 samples/sec Loss 3.4698 LearningRate 0.000659 Epoch: 10 Global Step: 223680 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:33:53,888-Speed 2510.32 samples/sec Loss 3.3866 LearningRate 0.000659 Epoch: 10 Global Step: 223690 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:34:02,092-Speed 2496.70 samples/sec Loss 3.4847 LearningRate 0.000659 Epoch: 10 Global Step: 223700 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:34:10,296-Speed 2496.98 samples/sec Loss 3.4131 LearningRate 0.000658 Epoch: 10 Global Step: 223710 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:34:18,496-Speed 2497.92 samples/sec Loss 3.4685 LearningRate 0.000658 Epoch: 10 Global Step: 223720 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:34:26,695-Speed 2498.20 samples/sec Loss 3.3750 LearningRate 0.000658 Epoch: 10 Global Step: 223730 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:34:34,895-Speed 2497.99 samples/sec Loss 3.4345 LearningRate 0.000658 Epoch: 10 Global Step: 223740 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:34:43,045-Speed 2513.39 samples/sec Loss 3.4297 LearningRate 0.000658 Epoch: 10 Global Step: 223750 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:34:51,244-Speed 2498.13 samples/sec Loss 3.3830 LearningRate 0.000658 Epoch: 10 Global Step: 223760 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:34:59,457-Speed 2494.23 samples/sec Loss 3.5189 LearningRate 0.000658 Epoch: 10 Global Step: 223770 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:35:07,659-Speed 2496.99 samples/sec Loss 3.4437 LearningRate 0.000658 Epoch: 10 Global Step: 223780 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:35:15,859-Speed 2498.20 samples/sec Loss 3.4025 LearningRate 0.000658 Epoch: 10 Global Step: 223790 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:35:24,059-Speed 2498.00 samples/sec Loss 3.3935 LearningRate 0.000658 Epoch: 10 Global Step: 223800 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:35:32,206-Speed 2514.11 samples/sec Loss 3.4616 LearningRate 0.000658 Epoch: 10 Global Step: 223810 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:35:40,410-Speed 2496.82 samples/sec Loss 3.3610 LearningRate 0.000658 Epoch: 10 Global Step: 223820 Fp16 Grad Scale: 65536 Required: 138 hours
Training: 2022-07-07 17:35:48,570-Speed 2510.38 samples/sec Loss 3.4052 LearningRate 0.000658 Epoch: 10 Global Step: 223830 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:35:56,768-Speed 2498.47 samples/sec Loss 3.4168 LearningRate 0.000658 Epoch: 10 Global Step: 223840 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:36:04,971-Speed 2497.41 samples/sec Loss 3.4745 LearningRate 0.000658 Epoch: 10 Global Step: 223850 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:36:13,170-Speed 2498.18 samples/sec Loss 3.3691 LearningRate 0.000658 Epoch: 10 Global Step: 223860 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:36:21,317-Speed 2514.31 samples/sec Loss 3.4309 LearningRate 0.000658 Epoch: 10 Global Step: 223870 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:36:29,516-Speed 2498.10 samples/sec Loss 3.3959 LearningRate 0.000658 Epoch: 10 Global Step: 223880 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:36:37,717-Speed 2497.81 samples/sec Loss 3.4782 LearningRate 0.000658 Epoch: 10 Global Step: 223890 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:36:45,918-Speed 2497.69 samples/sec Loss 3.3976 LearningRate 0.000658 Epoch: 10 Global Step: 223900 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:36:54,117-Speed 2498.37 samples/sec Loss 3.3914 LearningRate 0.000658 Epoch: 10 Global Step: 223910 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:37:02,316-Speed 2498.39 samples/sec Loss 3.4307 LearningRate 0.000658 Epoch: 10 Global Step: 223920 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:37:10,462-Speed 2514.47 samples/sec Loss 3.4116 LearningRate 0.000658 Epoch: 10 Global Step: 223930 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:37:18,665-Speed 2497.19 samples/sec Loss 3.3713 LearningRate 0.000658 Epoch: 10 Global Step: 223940 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:37:26,866-Speed 2497.74 samples/sec Loss 3.4542 LearningRate 0.000658 Epoch: 10 Global Step: 223950 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:37:35,065-Speed 2498.08 samples/sec Loss 3.3846 LearningRate 0.000658 Epoch: 10 Global Step: 223960 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:37:43,265-Speed 2498.12 samples/sec Loss 3.4259 LearningRate 0.000658 Epoch: 10 Global Step: 223970 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:37:51,463-Speed 2499.17 samples/sec Loss 3.4609 LearningRate 0.000658 Epoch: 10 Global Step: 223980 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:37:59,621-Speed 2510.71 samples/sec Loss 3.4224 LearningRate 0.000658 Epoch: 10 Global Step: 223990 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:38:07,820-Speed 2498.33 samples/sec Loss 3.4804 LearningRate 0.000658 Epoch: 10 Global Step: 224000 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:38:16,023-Speed 2497.10 samples/sec Loss 3.4748 LearningRate 0.000658 Epoch: 10 Global Step: 224010 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:38:24,224-Speed 2497.87 samples/sec Loss 3.4200 LearningRate 0.000658 Epoch: 10 Global Step: 224020 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:38:32,424-Speed 2497.89 samples/sec Loss 3.3843 LearningRate 0.000658 Epoch: 10 Global Step: 224030 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:38:40,623-Speed 2498.26 samples/sec Loss 3.4485 LearningRate 0.000658 Epoch: 10 Global Step: 224040 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:38:48,771-Speed 2513.81 samples/sec Loss 3.4484 LearningRate 0.000658 Epoch: 10 Global Step: 224050 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:38:56,975-Speed 2496.94 samples/sec Loss 3.4833 LearningRate 0.000658 Epoch: 10 Global Step: 224060 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:39:05,175-Speed 2497.96 samples/sec Loss 3.4311 LearningRate 0.000658 Epoch: 10 Global Step: 224070 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:39:13,373-Speed 2498.55 samples/sec Loss 3.4239 LearningRate 0.000658 Epoch: 10 Global Step: 224080 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:39:21,574-Speed 2497.85 samples/sec Loss 3.4618 LearningRate 0.000658 Epoch: 10 Global Step: 224090 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:39:29,775-Speed 2497.65 samples/sec Loss 3.4170 LearningRate 0.000658 Epoch: 10 Global Step: 224100 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:39:37,935-Speed 2510.02 samples/sec Loss 3.4357 LearningRate 0.000658 Epoch: 10 Global Step: 224110 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:39:46,136-Speed 2497.76 samples/sec Loss 3.3686 LearningRate 0.000658 Epoch: 10 Global Step: 224120 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:39:54,334-Speed 2498.33 samples/sec Loss 3.4209 LearningRate 0.000658 Epoch: 10 Global Step: 224130 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:40:02,532-Speed 2498.76 samples/sec Loss 3.4614 LearningRate 0.000658 Epoch: 10 Global Step: 224140 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:40:10,732-Speed 2498.11 samples/sec Loss 3.4351 LearningRate 0.000658 Epoch: 10 Global Step: 224150 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:40:18,930-Speed 2498.52 samples/sec Loss 3.3226 LearningRate 0.000658 Epoch: 10 Global Step: 224160 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:40:27,076-Speed 2514.68 samples/sec Loss 3.3868 LearningRate 0.000657 Epoch: 10 Global Step: 224170 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:40:35,271-Speed 2499.88 samples/sec Loss 3.3669 LearningRate 0.000657 Epoch: 10 Global Step: 224180 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:40:43,467-Speed 2499.11 samples/sec Loss 3.3381 LearningRate 0.000657 Epoch: 10 Global Step: 224190 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:40:51,666-Speed 2498.13 samples/sec Loss 3.3888 LearningRate 0.000657 Epoch: 10 Global Step: 224200 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:40:59,866-Speed 2497.97 samples/sec Loss 3.3659 LearningRate 0.000657 Epoch: 10 Global Step: 224210 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:41:08,066-Speed 2498.07 samples/sec Loss 3.3803 LearningRate 0.000657 Epoch: 10 Global Step: 224220 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:41:16,211-Speed 2514.95 samples/sec Loss 3.3563 LearningRate 0.000657 Epoch: 10 Global Step: 224230 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:41:24,407-Speed 2499.22 samples/sec Loss 3.4059 LearningRate 0.000657 Epoch: 10 Global Step: 224240 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:41:32,603-Speed 2498.97 samples/sec Loss 3.3697 LearningRate 0.000657 Epoch: 10 Global Step: 224250 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:41:40,803-Speed 2498.23 samples/sec Loss 3.3515 LearningRate 0.000657 Epoch: 10 Global Step: 224260 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:41:49,002-Speed 2497.98 samples/sec Loss 3.3547 LearningRate 0.000657 Epoch: 10 Global Step: 224270 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:41:57,199-Speed 2498.81 samples/sec Loss 3.4075 LearningRate 0.000657 Epoch: 10 Global Step: 224280 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:42:05,345-Speed 2514.89 samples/sec Loss 3.3472 LearningRate 0.000657 Epoch: 10 Global Step: 224290 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:42:13,543-Speed 2498.39 samples/sec Loss 3.3261 LearningRate 0.000657 Epoch: 10 Global Step: 224300 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:42:21,744-Speed 2497.76 samples/sec Loss 3.4313 LearningRate 0.000657 Epoch: 10 Global Step: 224310 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:42:29,945-Speed 2498.02 samples/sec Loss 3.4159 LearningRate 0.000657 Epoch: 10 Global Step: 224320 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:42:38,150-Speed 2496.32 samples/sec Loss 3.3862 LearningRate 0.000657 Epoch: 10 Global Step: 224330 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:42:46,350-Speed 2498.14 samples/sec Loss 3.3550 LearningRate 0.000657 Epoch: 10 Global Step: 224340 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:42:54,502-Speed 2512.55 samples/sec Loss 3.3134 LearningRate 0.000657 Epoch: 10 Global Step: 224350 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:43:02,706-Speed 2496.93 samples/sec Loss 3.3640 LearningRate 0.000657 Epoch: 10 Global Step: 224360 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:43:10,905-Speed 2498.21 samples/sec Loss 3.4675 LearningRate 0.000657 Epoch: 10 Global Step: 224370 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:43:19,110-Speed 2496.87 samples/sec Loss 3.4740 LearningRate 0.000657 Epoch: 10 Global Step: 224380 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:43:27,309-Speed 2498.26 samples/sec Loss 3.4667 LearningRate 0.000657 Epoch: 10 Global Step: 224390 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:43:35,514-Speed 2496.35 samples/sec Loss 3.5561 LearningRate 0.000657 Epoch: 10 Global Step: 224400 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:43:43,661-Speed 2514.32 samples/sec Loss 3.4410 LearningRate 0.000657 Epoch: 10 Global Step: 224410 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:43:51,867-Speed 2496.23 samples/sec Loss 3.3888 LearningRate 0.000657 Epoch: 10 Global Step: 224420 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:44:00,066-Speed 2498.35 samples/sec Loss 3.4666 LearningRate 0.000657 Epoch: 10 Global Step: 224430 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:44:08,261-Speed 2499.23 samples/sec Loss 3.4262 LearningRate 0.000657 Epoch: 10 Global Step: 224440 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:44:16,459-Speed 2498.88 samples/sec Loss 3.4027 LearningRate 0.000657 Epoch: 10 Global Step: 224450 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:44:24,657-Speed 2498.38 samples/sec Loss 3.4379 LearningRate 0.000657 Epoch: 10 Global Step: 224460 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:44:32,804-Speed 2514.01 samples/sec Loss 3.3626 LearningRate 0.000657 Epoch: 10 Global Step: 224470 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:44:41,004-Speed 2498.07 samples/sec Loss 3.3875 LearningRate 0.000657 Epoch: 10 Global Step: 224480 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:44:49,206-Speed 2497.42 samples/sec Loss 3.3428 LearningRate 0.000657 Epoch: 10 Global Step: 224490 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:44:57,415-Speed 2495.00 samples/sec Loss 3.3746 LearningRate 0.000657 Epoch: 10 Global Step: 224500 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:45:05,617-Speed 2497.45 samples/sec Loss 3.4277 LearningRate 0.000657 Epoch: 10 Global Step: 224510 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:45:13,819-Speed 2497.39 samples/sec Loss 3.3761 LearningRate 0.000657 Epoch: 10 Global Step: 224520 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:45:21,964-Speed 2514.89 samples/sec Loss 3.3590 LearningRate 0.000657 Epoch: 10 Global Step: 224530 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:45:30,164-Speed 2497.89 samples/sec Loss 3.3928 LearningRate 0.000657 Epoch: 10 Global Step: 224540 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:45:38,364-Speed 2498.13 samples/sec Loss 3.4504 LearningRate 0.000657 Epoch: 10 Global Step: 224550 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:45:46,570-Speed 2496.20 samples/sec Loss 3.3443 LearningRate 0.000657 Epoch: 10 Global Step: 224560 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:45:54,768-Speed 2498.50 samples/sec Loss 3.3901 LearningRate 0.000657 Epoch: 10 Global Step: 224570 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:46:02,967-Speed 2498.28 samples/sec Loss 3.3293 LearningRate 0.000657 Epoch: 10 Global Step: 224580 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:46:11,115-Speed 2514.13 samples/sec Loss 3.3940 LearningRate 0.000657 Epoch: 10 Global Step: 224590 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:46:19,316-Speed 2497.63 samples/sec Loss 3.3909 LearningRate 0.000657 Epoch: 10 Global Step: 224600 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:46:27,517-Speed 2497.80 samples/sec Loss 3.4161 LearningRate 0.000657 Epoch: 10 Global Step: 224610 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:46:35,715-Speed 2498.29 samples/sec Loss 3.4062 LearningRate 0.000657 Epoch: 10 Global Step: 224620 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:46:43,926-Speed 2494.77 samples/sec Loss 3.4030 LearningRate 0.000656 Epoch: 10 Global Step: 224630 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:46:52,136-Speed 2494.99 samples/sec Loss 3.4199 LearningRate 0.000656 Epoch: 10 Global Step: 224640 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:47:00,285-Speed 2513.42 samples/sec Loss 3.3822 LearningRate 0.000656 Epoch: 10 Global Step: 224650 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:47:08,489-Speed 2496.92 samples/sec Loss 3.3771 LearningRate 0.000656 Epoch: 10 Global Step: 224660 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:47:16,695-Speed 2496.06 samples/sec Loss 3.3663 LearningRate 0.000656 Epoch: 10 Global Step: 224670 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:47:24,899-Speed 2496.93 samples/sec Loss 3.3509 LearningRate 0.000656 Epoch: 10 Global Step: 224680 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:47:33,099-Speed 2497.91 samples/sec Loss 3.4125 LearningRate 0.000656 Epoch: 10 Global Step: 224690 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:47:41,306-Speed 2495.68 samples/sec Loss 3.4169 LearningRate 0.000656 Epoch: 10 Global Step: 224700 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:47:49,452-Speed 2514.38 samples/sec Loss 3.3439 LearningRate 0.000656 Epoch: 10 Global Step: 224710 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:47:57,662-Speed 2495.40 samples/sec Loss 3.4345 LearningRate 0.000656 Epoch: 10 Global Step: 224720 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:48:05,864-Speed 2497.34 samples/sec Loss 3.4157 LearningRate 0.000656 Epoch: 10 Global Step: 224730 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:48:14,069-Speed 2496.32 samples/sec Loss 3.4122 LearningRate 0.000656 Epoch: 10 Global Step: 224740 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:48:22,275-Speed 2496.68 samples/sec Loss 3.3990 LearningRate 0.000656 Epoch: 10 Global Step: 224750 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:48:30,480-Speed 2496.07 samples/sec Loss 3.5047 LearningRate 0.000656 Epoch: 10 Global Step: 224760 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:48:38,629-Speed 2513.65 samples/sec Loss 3.4181 LearningRate 0.000656 Epoch: 10 Global Step: 224770 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:48:46,834-Speed 2496.34 samples/sec Loss 3.3322 LearningRate 0.000656 Epoch: 10 Global Step: 224780 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:48:55,039-Speed 2496.69 samples/sec Loss 3.4151 LearningRate 0.000656 Epoch: 10 Global Step: 224790 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:49:03,243-Speed 2496.71 samples/sec Loss 3.3533 LearningRate 0.000656 Epoch: 10 Global Step: 224800 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:49:11,446-Speed 2497.23 samples/sec Loss 3.3413 LearningRate 0.000656 Epoch: 10 Global Step: 224810 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:49:19,646-Speed 2497.74 samples/sec Loss 3.3503 LearningRate 0.000656 Epoch: 10 Global Step: 224820 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:49:27,799-Speed 2512.54 samples/sec Loss 3.3063 LearningRate 0.000656 Epoch: 10 Global Step: 224830 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:49:36,003-Speed 2496.85 samples/sec Loss 3.4501 LearningRate 0.000656 Epoch: 10 Global Step: 224840 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:49:44,207-Speed 2496.69 samples/sec Loss 3.3459 LearningRate 0.000656 Epoch: 10 Global Step: 224850 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:49:52,413-Speed 2496.18 samples/sec Loss 3.4154 LearningRate 0.000656 Epoch: 10 Global Step: 224860 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:50:00,613-Speed 2497.80 samples/sec Loss 3.3924 LearningRate 0.000656 Epoch: 10 Global Step: 224870 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:50:08,817-Speed 2496.47 samples/sec Loss 3.4196 LearningRate 0.000656 Epoch: 10 Global Step: 224880 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:50:16,966-Speed 2513.81 samples/sec Loss 3.3726 LearningRate 0.000656 Epoch: 10 Global Step: 224890 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:50:25,172-Speed 2496.32 samples/sec Loss 3.3790 LearningRate 0.000656 Epoch: 10 Global Step: 224900 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:50:33,380-Speed 2495.31 samples/sec Loss 3.3787 LearningRate 0.000656 Epoch: 10 Global Step: 224910 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:50:41,582-Speed 2497.41 samples/sec Loss 3.3915 LearningRate 0.000656 Epoch: 10 Global Step: 224920 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:50:49,785-Speed 2497.16 samples/sec Loss 3.4341 LearningRate 0.000656 Epoch: 10 Global Step: 224930 Fp16 Grad Scale: 32768 Required: 138 hours
Training: 2022-07-07 17:50:57,987-Speed 2497.39 samples/sec Loss 3.4598 LearningRate 0.000656 Epoch: 10 Global Step: 224940 Fp16
Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:51:06,137-Speed 2513.41 samples/sec Loss 3.3934 LearningRate 0.000656 Epoch: 10 Global Step: 224950 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:51:14,343-Speed 2495.89 samples/sec Loss 3.3584 LearningRate 0.000656 Epoch: 10 Global Step: 224960 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:51:22,550-Speed 2496.12 samples/sec Loss 3.3435 LearningRate 0.000656 Epoch: 10 Global Step: 224970 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:51:30,750-Speed 2497.88 samples/sec Loss 3.4116 LearningRate 0.000656 Epoch: 10 Global Step: 224980 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:51:38,954-Speed 2496.89 samples/sec Loss 3.4944 LearningRate 0.000656 Epoch: 10 Global Step: 224990 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:51:47,158-Speed 2496.80 samples/sec Loss 3.4159 LearningRate 0.000656 Epoch: 10 Global Step: 225000 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:51:55,309-Speed 2513.27 samples/sec Loss 3.5313 LearningRate 0.000656 Epoch: 10 Global Step: 225010 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:52:03,517-Speed 2495.27 samples/sec Loss 3.4420 LearningRate 0.000656 Epoch: 10 Global Step: 225020 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:52:11,719-Speed 2497.47 samples/sec Loss 3.4920 LearningRate 0.000656 Epoch: 10 Global Step: 225030 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 17:52:19,920-Speed 2497.80 samples/sec Loss 3.4167 LearningRate 0.000656 Epoch: 10 Global Step: 225040 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 17:52:28,120-Speed 2497.74 samples/sec Loss 3.4478 LearningRate 0.000656 Epoch: 10 Global Step: 225050 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 17:52:36,324-Speed 2496.88 samples/sec Loss 3.4316 LearningRate 0.000656 Epoch: 10 Global Step: 225060 
Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 17:52:44,473-Speed 2513.46 samples/sec Loss 3.3776 LearningRate 0.000656 Epoch: 10 Global Step: 225070 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 17:52:52,671-Speed 2498.38 samples/sec Loss 3.3883 LearningRate 0.000656 Epoch: 10 Global Step: 225080 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 17:53:00,875-Speed 2496.83 samples/sec Loss 3.3662 LearningRate 0.000655 Epoch: 10 Global Step: 225090 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 17:53:09,034-Speed 2510.65 samples/sec Loss 3.3745 LearningRate 0.000655 Epoch: 10 Global Step: 225100 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:53:17,234-Speed 2497.98 samples/sec Loss 3.4330 LearningRate 0.000655 Epoch: 10 Global Step: 225110 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:53:25,435-Speed 2497.50 samples/sec Loss 3.3714 LearningRate 0.000655 Epoch: 10 Global Step: 225120 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:53:33,587-Speed 2512.83 samples/sec Loss 3.4347 LearningRate 0.000655 Epoch: 10 Global Step: 225130 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:53:41,789-Speed 2497.06 samples/sec Loss 3.3719 LearningRate 0.000655 Epoch: 10 Global Step: 225140 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:53:49,993-Speed 2497.24 samples/sec Loss 3.4000 LearningRate 0.000655 Epoch: 10 Global Step: 225150 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:53:58,193-Speed 2498.03 samples/sec Loss 3.3841 LearningRate 0.000655 Epoch: 10 Global Step: 225160 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:54:06,399-Speed 2496.12 samples/sec Loss 3.3868 LearningRate 0.000655 Epoch: 10 Global Step: 225170 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:54:14,603-Speed 2496.77 samples/sec Loss 3.4229 LearningRate 0.000655 Epoch: 10 Global Step: 
225180 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:54:22,755-Speed 2512.63 samples/sec Loss 3.3995 LearningRate 0.000655 Epoch: 10 Global Step: 225190 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:54:30,962-Speed 2496.15 samples/sec Loss 3.4409 LearningRate 0.000655 Epoch: 10 Global Step: 225200 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:54:39,168-Speed 2496.10 samples/sec Loss 3.4103 LearningRate 0.000655 Epoch: 10 Global Step: 225210 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:54:47,376-Speed 2495.43 samples/sec Loss 3.3587 LearningRate 0.000655 Epoch: 10 Global Step: 225220 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:54:55,577-Speed 2497.52 samples/sec Loss 3.4207 LearningRate 0.000655 Epoch: 10 Global Step: 225230 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:55:03,779-Speed 2497.15 samples/sec Loss 3.4758 LearningRate 0.000655 Epoch: 10 Global Step: 225240 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:55:11,929-Speed 2513.40 samples/sec Loss 3.4537 LearningRate 0.000655 Epoch: 10 Global Step: 225250 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:55:20,134-Speed 2496.48 samples/sec Loss 3.3797 LearningRate 0.000655 Epoch: 10 Global Step: 225260 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:55:28,334-Speed 2497.89 samples/sec Loss 3.3898 LearningRate 0.000655 Epoch: 10 Global Step: 225270 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:55:36,564-Speed 2489.06 samples/sec Loss 3.3737 LearningRate 0.000655 Epoch: 10 Global Step: 225280 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:55:44,763-Speed 2498.15 samples/sec Loss 3.3785 LearningRate 0.000655 Epoch: 10 Global Step: 225290 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:55:52,969-Speed 2496.22 samples/sec Loss 3.3765 LearningRate 0.000655 Epoch: 10 Global 
Step: 225300 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:56:01,116-Speed 2514.20 samples/sec Loss 3.3865 LearningRate 0.000655 Epoch: 10 Global Step: 225310 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:56:09,341-Speed 2490.49 samples/sec Loss 3.3448 LearningRate 0.000655 Epoch: 10 Global Step: 225320 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:56:17,545-Speed 2496.60 samples/sec Loss 3.4480 LearningRate 0.000655 Epoch: 10 Global Step: 225330 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:56:25,748-Speed 2497.29 samples/sec Loss 3.3768 LearningRate 0.000655 Epoch: 10 Global Step: 225340 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:56:33,952-Speed 2496.52 samples/sec Loss 3.3710 LearningRate 0.000655 Epoch: 10 Global Step: 225350 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:56:42,152-Speed 2497.94 samples/sec Loss 3.3501 LearningRate 0.000655 Epoch: 10 Global Step: 225360 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:56:50,297-Speed 2514.81 samples/sec Loss 3.3891 LearningRate 0.000655 Epoch: 10 Global Step: 225370 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:56:58,500-Speed 2497.03 samples/sec Loss 3.4013 LearningRate 0.000655 Epoch: 10 Global Step: 225380 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:57:06,701-Speed 2497.86 samples/sec Loss 3.4046 LearningRate 0.000655 Epoch: 10 Global Step: 225390 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:57:14,899-Speed 2498.58 samples/sec Loss 3.3755 LearningRate 0.000655 Epoch: 10 Global Step: 225400 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:57:23,098-Speed 2498.14 samples/sec Loss 3.3742 LearningRate 0.000655 Epoch: 10 Global Step: 225410 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:57:31,297-Speed 2498.21 samples/sec Loss 3.4354 LearningRate 0.000655 Epoch: 10 
Global Step: 225420 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:57:39,440-Speed 2515.64 samples/sec Loss 3.4030 LearningRate 0.000655 Epoch: 10 Global Step: 225430 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:57:47,639-Speed 2498.46 samples/sec Loss 3.3949 LearningRate 0.000655 Epoch: 10 Global Step: 225440 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:57:55,837-Speed 2498.67 samples/sec Loss 3.3865 LearningRate 0.000655 Epoch: 10 Global Step: 225450 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:58:04,037-Speed 2498.19 samples/sec Loss 3.4000 LearningRate 0.000655 Epoch: 10 Global Step: 225460 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:58:12,235-Speed 2498.63 samples/sec Loss 3.3769 LearningRate 0.000655 Epoch: 10 Global Step: 225470 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:58:20,436-Speed 2497.80 samples/sec Loss 3.3628 LearningRate 0.000655 Epoch: 10 Global Step: 225480 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:58:28,581-Speed 2514.68 samples/sec Loss 3.4742 LearningRate 0.000655 Epoch: 10 Global Step: 225490 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:58:36,780-Speed 2498.26 samples/sec Loss 3.3854 LearningRate 0.000655 Epoch: 10 Global Step: 225500 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:58:44,980-Speed 2498.10 samples/sec Loss 3.3248 LearningRate 0.000655 Epoch: 10 Global Step: 225510 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:58:53,176-Speed 2499.16 samples/sec Loss 3.3126 LearningRate 0.000655 Epoch: 10 Global Step: 225520 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:59:01,386-Speed 2494.96 samples/sec Loss 3.3923 LearningRate 0.000655 Epoch: 10 Global Step: 225530 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:59:09,584-Speed 2498.48 samples/sec Loss 3.4083 LearningRate 0.000655 
Epoch: 10 Global Step: 225540 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:59:17,741-Speed 2511.02 samples/sec Loss 3.2740 LearningRate 0.000654 Epoch: 10 Global Step: 225550 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:59:25,957-Speed 2493.52 samples/sec Loss 3.3275 LearningRate 0.000654 Epoch: 10 Global Step: 225560 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:59:34,159-Speed 2497.35 samples/sec Loss 3.3416 LearningRate 0.000654 Epoch: 10 Global Step: 225570 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:59:42,358-Speed 2498.22 samples/sec Loss 3.3763 LearningRate 0.000654 Epoch: 10 Global Step: 225580 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:59:50,554-Speed 2498.99 samples/sec Loss 3.4517 LearningRate 0.000654 Epoch: 10 Global Step: 225590 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 17:59:58,753-Speed 2498.32 samples/sec Loss 3.4571 LearningRate 0.000654 Epoch: 10 Global Step: 225600 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:00:06,910-Speed 2511.08 samples/sec Loss 3.3457 LearningRate 0.000654 Epoch: 10 Global Step: 225610 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:00:15,108-Speed 2498.51 samples/sec Loss 3.4436 LearningRate 0.000654 Epoch: 10 Global Step: 225620 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:00:23,305-Speed 2499.06 samples/sec Loss 3.4156 LearningRate 0.000654 Epoch: 10 Global Step: 225630 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:00:31,500-Speed 2499.48 samples/sec Loss 3.3794 LearningRate 0.000654 Epoch: 10 Global Step: 225640 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:00:39,701-Speed 2497.88 samples/sec Loss 3.4051 LearningRate 0.000654 Epoch: 10 Global Step: 225650 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:00:47,902-Speed 2497.80 samples/sec Loss 3.3455 LearningRate 
0.000654 Epoch: 10 Global Step: 225660 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:00:56,048-Speed 2514.48 samples/sec Loss 3.3770 LearningRate 0.000654 Epoch: 10 Global Step: 225670 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:01:04,251-Speed 2497.21 samples/sec Loss 3.3798 LearningRate 0.000654 Epoch: 10 Global Step: 225680 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:01:12,450-Speed 2498.11 samples/sec Loss 3.4141 LearningRate 0.000654 Epoch: 10 Global Step: 225690 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:01:20,648-Speed 2498.51 samples/sec Loss 3.4380 LearningRate 0.000654 Epoch: 10 Global Step: 225700 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:01:28,850-Speed 2497.37 samples/sec Loss 3.4382 LearningRate 0.000654 Epoch: 10 Global Step: 225710 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:01:37,052-Speed 2497.54 samples/sec Loss 3.3350 LearningRate 0.000654 Epoch: 10 Global Step: 225720 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:01:45,197-Speed 2514.94 samples/sec Loss 3.3867 LearningRate 0.000654 Epoch: 10 Global Step: 225730 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:01:53,401-Speed 2496.74 samples/sec Loss 3.4161 LearningRate 0.000654 Epoch: 10 Global Step: 225740 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:02:01,599-Speed 2498.45 samples/sec Loss 3.3643 LearningRate 0.000654 Epoch: 10 Global Step: 225750 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:02:09,808-Speed 2495.25 samples/sec Loss 3.3731 LearningRate 0.000654 Epoch: 10 Global Step: 225760 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:02:18,007-Speed 2498.19 samples/sec Loss 3.3962 LearningRate 0.000654 Epoch: 10 Global Step: 225770 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:02:26,205-Speed 2498.67 samples/sec Loss 3.3842 
LearningRate 0.000654 Epoch: 10 Global Step: 225780 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:02:34,353-Speed 2514.02 samples/sec Loss 3.4103 LearningRate 0.000654 Epoch: 10 Global Step: 225790 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:02:42,552-Speed 2498.15 samples/sec Loss 3.4339 LearningRate 0.000654 Epoch: 10 Global Step: 225800 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:02:50,749-Speed 2498.79 samples/sec Loss 3.4143 LearningRate 0.000654 Epoch: 10 Global Step: 225810 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:02:58,951-Speed 2497.66 samples/sec Loss 3.3307 LearningRate 0.000654 Epoch: 10 Global Step: 225820 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:03:07,151-Speed 2498.10 samples/sec Loss 3.3713 LearningRate 0.000654 Epoch: 10 Global Step: 225830 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:03:15,364-Speed 2494.02 samples/sec Loss 3.4205 LearningRate 0.000654 Epoch: 10 Global Step: 225840 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:03:23,509-Speed 2514.89 samples/sec Loss 3.4325 LearningRate 0.000654 Epoch: 10 Global Step: 225850 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:03:31,710-Speed 2497.52 samples/sec Loss 3.4799 LearningRate 0.000654 Epoch: 10 Global Step: 225860 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:03:39,911-Speed 2497.75 samples/sec Loss 3.4741 LearningRate 0.000654 Epoch: 10 Global Step: 225870 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:03:48,115-Speed 2496.88 samples/sec Loss 3.4548 LearningRate 0.000654 Epoch: 10 Global Step: 225880 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:03:56,313-Speed 2498.72 samples/sec Loss 3.4263 LearningRate 0.000654 Epoch: 10 Global Step: 225890 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:04:04,512-Speed 2498.02 samples/sec Loss 
3.4809 LearningRate 0.000654 Epoch: 10 Global Step: 225900 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:04:12,660-Speed 2514.05 samples/sec Loss 3.4480 LearningRate 0.000654 Epoch: 10 Global Step: 225910 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:04:20,860-Speed 2497.81 samples/sec Loss 3.4658 LearningRate 0.000654 Epoch: 10 Global Step: 225920 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:04:29,059-Speed 2498.51 samples/sec Loss 3.4635 LearningRate 0.000654 Epoch: 10 Global Step: 225930 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:04:37,255-Speed 2499.26 samples/sec Loss 3.4547 LearningRate 0.000654 Epoch: 10 Global Step: 225940 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:04:45,454-Speed 2498.42 samples/sec Loss 3.4684 LearningRate 0.000654 Epoch: 10 Global Step: 225950 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:04:53,653-Speed 2498.49 samples/sec Loss 3.4362 LearningRate 0.000654 Epoch: 10 Global Step: 225960 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:05:01,800-Speed 2513.99 samples/sec Loss 3.4297 LearningRate 0.000654 Epoch: 10 Global Step: 225970 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:05:10,001-Speed 2497.88 samples/sec Loss 3.3811 LearningRate 0.000654 Epoch: 10 Global Step: 225980 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:05:18,200-Speed 2498.51 samples/sec Loss 3.3359 LearningRate 0.000654 Epoch: 10 Global Step: 225990 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:05:26,398-Speed 2498.31 samples/sec Loss 3.3552 LearningRate 0.000654 Epoch: 10 Global Step: 226000 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:05:34,599-Speed 2497.89 samples/sec Loss 3.4137 LearningRate 0.000654 Epoch: 10 Global Step: 226010 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:05:42,802-Speed 2496.86 samples/sec 
Loss 3.3858 LearningRate 0.000653 Epoch: 10 Global Step: 226020 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:05:50,960-Speed 2510.94 samples/sec Loss 3.3631 LearningRate 0.000653 Epoch: 10 Global Step: 226030 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:05:59,158-Speed 2498.74 samples/sec Loss 3.3966 LearningRate 0.000653 Epoch: 10 Global Step: 226040 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:06:07,361-Speed 2497.18 samples/sec Loss 3.3813 LearningRate 0.000653 Epoch: 10 Global Step: 226050 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:06:15,570-Speed 2495.21 samples/sec Loss 3.3889 LearningRate 0.000653 Epoch: 10 Global Step: 226060 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:06:23,768-Speed 2498.38 samples/sec Loss 3.4279 LearningRate 0.000653 Epoch: 10 Global Step: 226070 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:06:31,973-Speed 2496.63 samples/sec Loss 3.4300 LearningRate 0.000653 Epoch: 10 Global Step: 226080 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:06:40,116-Speed 2515.31 samples/sec Loss 3.3373 LearningRate 0.000653 Epoch: 10 Global Step: 226090 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:06:48,315-Speed 2498.27 samples/sec Loss 3.3448 LearningRate 0.000653 Epoch: 10 Global Step: 226100 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:06:56,514-Speed 2498.36 samples/sec Loss 3.3225 LearningRate 0.000653 Epoch: 10 Global Step: 226110 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:07:04,713-Speed 2498.43 samples/sec Loss 3.4384 LearningRate 0.000653 Epoch: 10 Global Step: 226120 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:07:12,913-Speed 2497.72 samples/sec Loss 3.3582 LearningRate 0.000653 Epoch: 10 Global Step: 226130 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:07:21,114-Speed 2497.65 
samples/sec Loss 3.3772 LearningRate 0.000653 Epoch: 10 Global Step: 226140 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:07:29,268-Speed 2512.10 samples/sec Loss 3.3642 LearningRate 0.000653 Epoch: 10 Global Step: 226150 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:07:37,465-Speed 2498.88 samples/sec Loss 3.3352 LearningRate 0.000653 Epoch: 10 Global Step: 226160 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:07:45,665-Speed 2498.08 samples/sec Loss 3.3323 LearningRate 0.000653 Epoch: 10 Global Step: 226170 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:07:53,865-Speed 2498.02 samples/sec Loss 3.3664 LearningRate 0.000653 Epoch: 10 Global Step: 226180 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:08:02,066-Speed 2497.55 samples/sec Loss 3.4121 LearningRate 0.000653 Epoch: 10 Global Step: 226190 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:08:10,263-Speed 2498.92 samples/sec Loss 3.4368 LearningRate 0.000653 Epoch: 10 Global Step: 226200 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:08:18,408-Speed 2514.66 samples/sec Loss 3.4657 LearningRate 0.000653 Epoch: 10 Global Step: 226210 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:08:26,606-Speed 2498.48 samples/sec Loss 3.4090 LearningRate 0.000653 Epoch: 10 Global Step: 226220 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:08:34,806-Speed 2497.86 samples/sec Loss 3.3113 LearningRate 0.000653 Epoch: 10 Global Step: 226230 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:08:43,016-Speed 2494.86 samples/sec Loss 3.3891 LearningRate 0.000653 Epoch: 10 Global Step: 226240 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:08:51,217-Speed 2497.86 samples/sec Loss 3.3648 LearningRate 0.000653 Epoch: 10 Global Step: 226250 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:08:59,415-Speed 
2498.44 samples/sec Loss 3.3591 LearningRate 0.000653 Epoch: 10 Global Step: 226260 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:09:07,561-Speed 2514.89 samples/sec Loss 3.4247 LearningRate 0.000653 Epoch: 10 Global Step: 226270 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:09:15,759-Speed 2498.96 samples/sec Loss 3.3429 LearningRate 0.000653 Epoch: 10 Global Step: 226280 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:09:23,961-Speed 2497.10 samples/sec Loss 3.4180 LearningRate 0.000653 Epoch: 10 Global Step: 226290 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:09:32,167-Speed 2496.10 samples/sec Loss 3.3604 LearningRate 0.000653 Epoch: 10 Global Step: 226300 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:09:40,362-Speed 2499.86 samples/sec Loss 3.4412 LearningRate 0.000653 Epoch: 10 Global Step: 226310 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:09:48,564-Speed 2497.31 samples/sec Loss 3.4185 LearningRate 0.000653 Epoch: 10 Global Step: 226320 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:09:56,708-Speed 2515.33 samples/sec Loss 3.3879 LearningRate 0.000653 Epoch: 10 Global Step: 226330 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:10:04,906-Speed 2498.48 samples/sec Loss 3.4020 LearningRate 0.000653 Epoch: 10 Global Step: 226340 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:10:13,107-Speed 2497.60 samples/sec Loss 3.3600 LearningRate 0.000653 Epoch: 10 Global Step: 226350 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:10:21,307-Speed 2498.22 samples/sec Loss 3.3704 LearningRate 0.000653 Epoch: 10 Global Step: 226360 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:10:29,509-Speed 2497.30 samples/sec Loss 3.3697 LearningRate 0.000653 Epoch: 10 Global Step: 226370 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 
18:10:37,713-Speed 2496.78 samples/sec Loss 3.4209 LearningRate 0.000653 Epoch: 10 Global Step: 226380 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:10:45,863-Speed 2513.50 samples/sec Loss 3.3234 LearningRate 0.000653 Epoch: 10 Global Step: 226390 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:10:54,063-Speed 2497.79 samples/sec Loss 3.2989 LearningRate 0.000653 Epoch: 10 Global Step: 226400 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:11:02,264-Speed 2497.79 samples/sec Loss 3.3543 LearningRate 0.000653 Epoch: 10 Global Step: 226410 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:11:10,463-Speed 2498.43 samples/sec Loss 3.3316 LearningRate 0.000653 Epoch: 10 Global Step: 226420 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:11:18,660-Speed 2498.67 samples/sec Loss 3.3963 LearningRate 0.000653 Epoch: 10 Global Step: 226430 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:11:26,861-Speed 2497.72 samples/sec Loss 3.3682 LearningRate 0.000653 Epoch: 10 Global Step: 226440 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:11:35,007-Speed 2515.25 samples/sec Loss 3.3419 LearningRate 0.000653 Epoch: 10 Global Step: 226450 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:11:43,210-Speed 2496.70 samples/sec Loss 3.3452 LearningRate 0.000653 Epoch: 10 Global Step: 226460 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:11:51,410-Speed 2498.24 samples/sec Loss 3.3891 LearningRate 0.000653 Epoch: 10 Global Step: 226470 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:11:59,611-Speed 2497.74 samples/sec Loss 3.3589 LearningRate 0.000652 Epoch: 10 Global Step: 226480 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:12:07,809-Speed 2498.39 samples/sec Loss 3.4040 LearningRate 0.000652 Epoch: 10 Global Step: 226490 Fp16 Grad Scale: 65536 Required: 138 hours Training: 
2022-07-07 18:12:16,008-Speed 2498.27 samples/sec Loss 3.3924 LearningRate 0.000652 Epoch: 10 Global Step: 226500 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:12:24,155-Speed 2514.13 samples/sec Loss 3.3878 LearningRate 0.000652 Epoch: 10 Global Step: 226510 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:12:32,352-Speed 2498.99 samples/sec Loss 3.3033 LearningRate 0.000652 Epoch: 10 Global Step: 226520 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:12:40,563-Speed 2494.69 samples/sec Loss 3.3517 LearningRate 0.000652 Epoch: 10 Global Step: 226530 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:12:48,760-Speed 2498.85 samples/sec Loss 3.3700 LearningRate 0.000652 Epoch: 10 Global Step: 226540 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:12:56,959-Speed 2498.08 samples/sec Loss 3.3390 LearningRate 0.000652 Epoch: 10 Global Step: 226550 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:13:05,158-Speed 2498.46 samples/sec Loss 3.3020 LearningRate 0.000652 Epoch: 10 Global Step: 226560 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:13:13,304-Speed 2514.31 samples/sec Loss 3.3571 LearningRate 0.000652 Epoch: 10 Global Step: 226570 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:13:21,502-Speed 2498.65 samples/sec Loss 3.3389 LearningRate 0.000652 Epoch: 10 Global Step: 226580 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:13:29,709-Speed 2495.98 samples/sec Loss 3.3929 LearningRate 0.000652 Epoch: 10 Global Step: 226590 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:13:37,908-Speed 2498.04 samples/sec Loss 3.3919 LearningRate 0.000652 Epoch: 10 Global Step: 226600 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:13:46,114-Speed 2496.00 samples/sec Loss 3.4091 LearningRate 0.000652 Epoch: 10 Global Step: 226610 Fp16 Grad Scale: 65536 Required: 138 hours 
Training: 2022-07-07 18:13:54,320-Speed 2496.39 samples/sec Loss 3.3739 LearningRate 0.000652 Epoch: 10 Global Step: 226620 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:14:02,466-Speed 2514.67 samples/sec Loss 3.3294 LearningRate 0.000652 Epoch: 10 Global Step: 226630 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:14:10,665-Speed 2498.32 samples/sec Loss 3.4135 LearningRate 0.000652 Epoch: 10 Global Step: 226640 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:14:18,866-Speed 2497.63 samples/sec Loss 3.3951 LearningRate 0.000652 Epoch: 10 Global Step: 226650 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:14:27,067-Speed 2497.47 samples/sec Loss 3.3249 LearningRate 0.000652 Epoch: 10 Global Step: 226660 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:14:35,267-Speed 2497.94 samples/sec Loss 3.4144 LearningRate 0.000652 Epoch: 10 Global Step: 226670 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:14:43,468-Speed 2497.70 samples/sec Loss 3.3910 LearningRate 0.000652 Epoch: 10 Global Step: 226680 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:14:51,612-Speed 2515.09 samples/sec Loss 3.3653 LearningRate 0.000652 Epoch: 10 Global Step: 226690 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:14:59,812-Speed 2497.79 samples/sec Loss 3.4202 LearningRate 0.000652 Epoch: 10 Global Step: 226700 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:15:08,012-Speed 2498.17 samples/sec Loss 3.3815 LearningRate 0.000652 Epoch: 10 Global Step: 226710 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:15:16,208-Speed 2499.05 samples/sec Loss 3.3375 LearningRate 0.000652 Epoch: 10 Global Step: 226720 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:15:24,404-Speed 2499.48 samples/sec Loss 3.4148 LearningRate 0.000652 Epoch: 10 Global Step: 226730 Fp16 Grad Scale: 65536 Required: 138 
hours Training: 2022-07-07 18:15:32,600-Speed 2498.94 samples/sec Loss 3.4633 LearningRate 0.000652 Epoch: 10 Global Step: 226740 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:15:40,752-Speed 2512.87 samples/sec Loss 3.4281 LearningRate 0.000652 Epoch: 10 Global Step: 226750 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:15:48,949-Speed 2498.75 samples/sec Loss 3.4038 LearningRate 0.000652 Epoch: 10 Global Step: 226760 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:15:57,147-Speed 2498.62 samples/sec Loss 3.3423 LearningRate 0.000652 Epoch: 10 Global Step: 226770 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:16:05,345-Speed 2498.54 samples/sec Loss 3.3412 LearningRate 0.000652 Epoch: 10 Global Step: 226780 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:16:13,545-Speed 2497.97 samples/sec Loss 3.3484 LearningRate 0.000652 Epoch: 10 Global Step: 226790 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:16:21,742-Speed 2498.63 samples/sec Loss 3.3757 LearningRate 0.000652 Epoch: 10 Global Step: 226800 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:16:29,889-Speed 2514.22 samples/sec Loss 3.3155 LearningRate 0.000652 Epoch: 10 Global Step: 226810 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:16:38,089-Speed 2498.02 samples/sec Loss 3.4032 LearningRate 0.000652 Epoch: 10 Global Step: 226820 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:16:46,291-Speed 2497.22 samples/sec Loss 3.4314 LearningRate 0.000652 Epoch: 10 Global Step: 226830 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:16:54,496-Speed 2496.67 samples/sec Loss 3.4054 LearningRate 0.000652 Epoch: 10 Global Step: 226840 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:17:02,700-Speed 2496.74 samples/sec Loss 3.3363 LearningRate 0.000652 Epoch: 10 Global Step: 226850 Fp16 Grad Scale: 65536 Required: 
138 hours Training: 2022-07-07 18:17:10,901-Speed 2497.79 samples/sec Loss 3.3528 LearningRate 0.000652 Epoch: 10 Global Step: 226860 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:17:19,048-Speed 2513.93 samples/sec Loss 3.3783 LearningRate 0.000652 Epoch: 10 Global Step: 226870 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:17:27,249-Speed 2498.04 samples/sec Loss 3.3883 LearningRate 0.000652 Epoch: 10 Global Step: 226880 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:17:35,451-Speed 2497.32 samples/sec Loss 3.3743 LearningRate 0.000652 Epoch: 10 Global Step: 226890 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:17:43,657-Speed 2496.38 samples/sec Loss 3.3268 LearningRate 0.000652 Epoch: 10 Global Step: 226900 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:17:51,862-Speed 2496.63 samples/sec Loss 3.3913 LearningRate 0.000652 Epoch: 10 Global Step: 226910 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:18:00,062-Speed 2497.81 samples/sec Loss 3.4149 LearningRate 0.000652 Epoch: 10 Global Step: 226920 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:18:08,212-Speed 2513.29 samples/sec Loss 3.3262 LearningRate 0.000652 Epoch: 10 Global Step: 226930 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:18:16,410-Speed 2498.74 samples/sec Loss 3.3528 LearningRate 0.000651 Epoch: 10 Global Step: 226940 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:18:24,609-Speed 2498.24 samples/sec Loss 3.3888 LearningRate 0.000651 Epoch: 10 Global Step: 226950 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:18:32,808-Speed 2498.40 samples/sec Loss 3.3976 LearningRate 0.000651 Epoch: 10 Global Step: 226960 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:18:41,008-Speed 2497.89 samples/sec Loss 3.3503 LearningRate 0.000651 Epoch: 10 Global Step: 226970 Fp16 Grad Scale: 65536 
Required: 138 hours Training: 2022-07-07 18:18:49,207-Speed 2498.34 samples/sec Loss 3.3830 LearningRate 0.000651 Epoch: 10 Global Step: 226980 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:18:57,352-Speed 2514.79 samples/sec Loss 3.3644 LearningRate 0.000651 Epoch: 10 Global Step: 226990 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:19:05,550-Speed 2498.46 samples/sec Loss 3.4100 LearningRate 0.000651 Epoch: 10 Global Step: 227000 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:19:13,747-Speed 2499.17 samples/sec Loss 3.3280 LearningRate 0.000651 Epoch: 10 Global Step: 227010 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:19:21,945-Speed 2498.54 samples/sec Loss 3.3463 LearningRate 0.000651 Epoch: 10 Global Step: 227020 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:19:30,142-Speed 2498.53 samples/sec Loss 3.3867 LearningRate 0.000651 Epoch: 10 Global Step: 227030 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:19:38,344-Speed 2497.65 samples/sec Loss 3.4845 LearningRate 0.000651 Epoch: 10 Global Step: 227040 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:19:46,488-Speed 2515.58 samples/sec Loss 3.3135 LearningRate 0.000651 Epoch: 10 Global Step: 227050 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:19:54,699-Speed 2494.45 samples/sec Loss 3.3917 LearningRate 0.000651 Epoch: 10 Global Step: 227060 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:20:02,897-Speed 2498.75 samples/sec Loss 3.4463 LearningRate 0.000651 Epoch: 10 Global Step: 227070 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:20:11,107-Speed 2494.72 samples/sec Loss 3.4016 LearningRate 0.000651 Epoch: 10 Global Step: 227080 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:20:19,303-Speed 2499.76 samples/sec Loss 3.3654 LearningRate 0.000651 Epoch: 10 Global Step: 227090 Fp16 Grad Scale: 
65536 Required: 138 hours Training: 2022-07-07 18:20:27,499-Speed 2498.91 samples/sec Loss 3.4145 LearningRate 0.000651 Epoch: 10 Global Step: 227100 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:20:35,644-Speed 2514.93 samples/sec Loss 3.5376 LearningRate 0.000651 Epoch: 10 Global Step: 227110 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:20:43,843-Speed 2498.15 samples/sec Loss 3.4651 LearningRate 0.000651 Epoch: 10 Global Step: 227120 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:20:52,043-Speed 2498.06 samples/sec Loss 3.5037 LearningRate 0.000651 Epoch: 10 Global Step: 227130 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:21:00,242-Speed 2498.16 samples/sec Loss 3.5502 LearningRate 0.000651 Epoch: 10 Global Step: 227140 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:21:08,439-Speed 2499.03 samples/sec Loss 3.4593 LearningRate 0.000651 Epoch: 10 Global Step: 227150 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:21:16,636-Speed 2498.91 samples/sec Loss 3.5150 LearningRate 0.000651 Epoch: 10 Global Step: 227160 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:21:24,790-Speed 2512.36 samples/sec Loss 3.4761 LearningRate 0.000651 Epoch: 10 Global Step: 227170 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:21:32,992-Speed 2497.22 samples/sec Loss 3.4342 LearningRate 0.000651 Epoch: 10 Global Step: 227180 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:21:41,207-Speed 2493.47 samples/sec Loss 3.4941 LearningRate 0.000651 Epoch: 10 Global Step: 227190 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:21:49,405-Speed 2498.54 samples/sec Loss 3.4378 LearningRate 0.000651 Epoch: 10 Global Step: 227200 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:21:57,609-Speed 2496.73 samples/sec Loss 3.3968 LearningRate 0.000651 Epoch: 10 Global Step: 227210 Fp16 Grad 
Scale: 65536 Required: 138 hours Training: 2022-07-07 18:22:05,808-Speed 2498.27 samples/sec Loss 3.4112 LearningRate 0.000651 Epoch: 10 Global Step: 227220 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:22:13,953-Speed 2514.80 samples/sec Loss 3.3572 LearningRate 0.000651 Epoch: 10 Global Step: 227230 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:22:22,151-Speed 2498.54 samples/sec Loss 3.4259 LearningRate 0.000651 Epoch: 10 Global Step: 227240 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:22:30,352-Speed 2497.63 samples/sec Loss 3.4115 LearningRate 0.000651 Epoch: 10 Global Step: 227250 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:22:38,562-Speed 2494.98 samples/sec Loss 3.3502 LearningRate 0.000651 Epoch: 10 Global Step: 227260 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:22:46,763-Speed 2497.89 samples/sec Loss 3.3383 LearningRate 0.000651 Epoch: 10 Global Step: 227270 Fp16 Grad Scale: 65536 Required: 138 hours Training: 2022-07-07 18:22:54,931-Speed 2507.60 samples/sec Loss 3.3869 LearningRate 0.000651 Epoch: 10 Global Step: 227280 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:23:03,078-Speed 2514.18 samples/sec Loss 3.3606 LearningRate 0.000651 Epoch: 10 Global Step: 227290 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:23:11,276-Speed 2498.51 samples/sec Loss 3.3827 LearningRate 0.000651 Epoch: 10 Global Step: 227300 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:23:19,501-Speed 2490.99 samples/sec Loss 3.3027 LearningRate 0.000651 Epoch: 10 Global Step: 227310 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:23:27,701-Speed 2497.86 samples/sec Loss 3.3769 LearningRate 0.000651 Epoch: 10 Global Step: 227320 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:23:35,901-Speed 2497.79 samples/sec Loss 3.3607 LearningRate 0.000651 Epoch: 10 Global Step: 227330 Fp16 
Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:23:44,102-Speed 2497.59 samples/sec Loss 3.4089 LearningRate 0.000651 Epoch: 10 Global Step: 227340 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:23:52,248-Speed 2514.65 samples/sec Loss 3.4083 LearningRate 0.000651 Epoch: 10 Global Step: 227350 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:24:00,446-Speed 2498.64 samples/sec Loss 3.3711 LearningRate 0.000651 Epoch: 10 Global Step: 227360 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:24:08,648-Speed 2497.50 samples/sec Loss 3.3126 LearningRate 0.000651 Epoch: 10 Global Step: 227370 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:24:16,845-Speed 2498.96 samples/sec Loss 3.4379 LearningRate 0.000651 Epoch: 10 Global Step: 227380 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:24:25,046-Speed 2497.67 samples/sec Loss 3.3899 LearningRate 0.000651 Epoch: 10 Global Step: 227390 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:24:33,244-Speed 2498.52 samples/sec Loss 3.3607 LearningRate 0.000650 Epoch: 10 Global Step: 227400 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:24:41,392-Speed 2514.73 samples/sec Loss 3.3890 LearningRate 0.000650 Epoch: 10 Global Step: 227410 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:24:49,591-Speed 2498.34 samples/sec Loss 3.3931 LearningRate 0.000650 Epoch: 10 Global Step: 227420 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:24:57,787-Speed 2499.14 samples/sec Loss 3.3341 LearningRate 0.000650 Epoch: 10 Global Step: 227430 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:25:05,985-Speed 2498.58 samples/sec Loss 3.3531 LearningRate 0.000650 Epoch: 10 Global Step: 227440 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:25:14,185-Speed 2497.87 samples/sec Loss 3.3959 LearningRate 0.000650 Epoch: 10 Global Step: 227450 
Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:25:22,387-Speed 2497.40 samples/sec Loss 3.4114 LearningRate 0.000650 Epoch: 10 Global Step: 227460 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:25:30,544-Speed 2511.10 samples/sec Loss 3.3307 LearningRate 0.000650 Epoch: 10 Global Step: 227470 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:25:38,742-Speed 2498.54 samples/sec Loss 3.3579 LearningRate 0.000650 Epoch: 10 Global Step: 227480 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:25:46,940-Speed 2498.43 samples/sec Loss 3.4024 LearningRate 0.000650 Epoch: 10 Global Step: 227490 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:25:55,140-Speed 2498.05 samples/sec Loss 3.3198 LearningRate 0.000650 Epoch: 10 Global Step: 227500 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:26:03,340-Speed 2498.26 samples/sec Loss 3.3541 LearningRate 0.000650 Epoch: 10 Global Step: 227510 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:26:11,538-Speed 2498.33 samples/sec Loss 3.3685 LearningRate 0.000650 Epoch: 10 Global Step: 227520 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:26:19,681-Speed 2515.60 samples/sec Loss 3.3787 LearningRate 0.000650 Epoch: 10 Global Step: 227530 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:26:27,878-Speed 2498.75 samples/sec Loss 3.4169 LearningRate 0.000650 Epoch: 10 Global Step: 227540 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:26:36,078-Speed 2497.96 samples/sec Loss 3.4665 LearningRate 0.000650 Epoch: 10 Global Step: 227550 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:26:44,277-Speed 2498.26 samples/sec Loss 3.4236 LearningRate 0.000650 Epoch: 10 Global Step: 227560 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:26:52,477-Speed 2498.00 samples/sec Loss 3.3763 LearningRate 0.000650 Epoch: 10 Global Step: 
227570 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:27:00,676-Speed 2498.35 samples/sec Loss 3.4609 LearningRate 0.000650 Epoch: 10 Global Step: 227580 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:27:08,826-Speed 2513.35 samples/sec Loss 3.3486 LearningRate 0.000650 Epoch: 10 Global Step: 227590 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:27:17,024-Speed 2498.46 samples/sec Loss 3.3594 LearningRate 0.000650 Epoch: 10 Global Step: 227600 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:27:25,221-Speed 2498.97 samples/sec Loss 3.4375 LearningRate 0.000650 Epoch: 10 Global Step: 227610 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:27:33,416-Speed 2499.19 samples/sec Loss 3.4576 LearningRate 0.000650 Epoch: 10 Global Step: 227620 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:27:41,613-Speed 2498.94 samples/sec Loss 3.3648 LearningRate 0.000650 Epoch: 10 Global Step: 227630 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:27:49,812-Speed 2498.32 samples/sec Loss 3.3140 LearningRate 0.000650 Epoch: 10 Global Step: 227640 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:27:57,959-Speed 2514.06 samples/sec Loss 3.3414 LearningRate 0.000650 Epoch: 10 Global Step: 227650 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:28:06,158-Speed 2498.44 samples/sec Loss 3.3824 LearningRate 0.000650 Epoch: 10 Global Step: 227660 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:28:14,357-Speed 2498.27 samples/sec Loss 3.4123 LearningRate 0.000650 Epoch: 10 Global Step: 227670 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:28:22,559-Speed 2497.46 samples/sec Loss 3.3285 LearningRate 0.000650 Epoch: 10 Global Step: 227680 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:28:30,757-Speed 2498.44 samples/sec Loss 3.3900 LearningRate 0.000650 Epoch: 10 Global 
Step: 227690 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:28:38,956-Speed 2498.18 samples/sec Loss 3.3504 LearningRate 0.000650 Epoch: 10 Global Step: 227700 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:28:47,104-Speed 2513.88 samples/sec Loss 3.3943 LearningRate 0.000650 Epoch: 10 Global Step: 227710 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:28:55,300-Speed 2498.99 samples/sec Loss 3.3068 LearningRate 0.000650 Epoch: 10 Global Step: 227720 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:29:03,501-Speed 2497.75 samples/sec Loss 3.3476 LearningRate 0.000650 Epoch: 10 Global Step: 227730 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:29:11,700-Speed 2498.34 samples/sec Loss 3.3839 LearningRate 0.000650 Epoch: 10 Global Step: 227740 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:29:19,896-Speed 2499.15 samples/sec Loss 3.3734 LearningRate 0.000650 Epoch: 10 Global Step: 227750 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:29:28,097-Speed 2497.65 samples/sec Loss 3.3063 LearningRate 0.000650 Epoch: 10 Global Step: 227760 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:29:36,245-Speed 2514.04 samples/sec Loss 3.3535 LearningRate 0.000650 Epoch: 10 Global Step: 227770 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:29:44,445-Speed 2498.09 samples/sec Loss 3.4127 LearningRate 0.000650 Epoch: 10 Global Step: 227780 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:29:52,641-Speed 2499.06 samples/sec Loss 3.3157 LearningRate 0.000650 Epoch: 10 Global Step: 227790 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:30:00,845-Speed 2496.63 samples/sec Loss 3.3542 LearningRate 0.000650 Epoch: 10 Global Step: 227800 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:30:09,053-Speed 2495.60 samples/sec Loss 3.3603 LearningRate 0.000650 Epoch: 10 
Global Step: 227810 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:30:17,254-Speed 2497.63 samples/sec Loss 3.3720 LearningRate 0.000650 Epoch: 10 Global Step: 227820 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:30:25,399-Speed 2514.74 samples/sec Loss 3.3284 LearningRate 0.000650 Epoch: 10 Global Step: 227830 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:30:33,616-Speed 2492.81 samples/sec Loss 3.4587 LearningRate 0.000650 Epoch: 10 Global Step: 227840 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:30:41,828-Speed 2494.21 samples/sec Loss 3.4081 LearningRate 0.000650 Epoch: 10 Global Step: 227850 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:30:50,041-Speed 2494.03 samples/sec Loss 3.3939 LearningRate 0.000650 Epoch: 10 Global Step: 227860 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:30:58,243-Speed 2497.30 samples/sec Loss 3.3883 LearningRate 0.000649 Epoch: 10 Global Step: 227870 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:31:06,446-Speed 2497.07 samples/sec Loss 3.3963 LearningRate 0.000649 Epoch: 10 Global Step: 227880 Fp16 Grad Scale: 32768 Required: 138 hours Training: 2022-07-07 18:31:14,594-Speed 2513.77 samples/sec Loss 3.4534 LearningRate 0.000649 Epoch: 10 Global Step: 227890 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:31:22,796-Speed 2497.50 samples/sec Loss 3.3718 LearningRate 0.000649 Epoch: 10 Global Step: 227900 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:31:30,997-Speed 2497.66 samples/sec Loss 3.3801 LearningRate 0.000649 Epoch: 10 Global Step: 227910 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:31:39,198-Speed 2497.91 samples/sec Loss 3.3102 LearningRate 0.000649 Epoch: 10 Global Step: 227920 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:31:47,397-Speed 2498.26 samples/sec Loss 3.3636 LearningRate 0.000649 
Epoch: 10 Global Step: 227930 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:31:55,597-Speed 2498.17 samples/sec Loss 3.3927 LearningRate 0.000649 Epoch: 10 Global Step: 227940 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:32:03,743-Speed 2514.12 samples/sec Loss 3.3468 LearningRate 0.000649 Epoch: 10 Global Step: 227950 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:32:11,947-Speed 2496.93 samples/sec Loss 3.3768 LearningRate 0.000649 Epoch: 10 Global Step: 227960 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:32:20,146-Speed 2498.28 samples/sec Loss 3.4058 LearningRate 0.000649 Epoch: 10 Global Step: 227970 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:32:28,359-Speed 2494.07 samples/sec Loss 3.3652 LearningRate 0.000649 Epoch: 10 Global Step: 227980 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:32:36,558-Speed 2498.41 samples/sec Loss 3.3327 LearningRate 0.000649 Epoch: 10 Global Step: 227990 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:32:44,761-Speed 2497.26 samples/sec Loss 3.3929 LearningRate 0.000649 Epoch: 10 Global Step: 228000 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:32:52,907-Speed 2514.65 samples/sec Loss 3.3632 LearningRate 0.000649 Epoch: 10 Global Step: 228010 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:33:01,106-Speed 2498.24 samples/sec Loss 3.4618 LearningRate 0.000649 Epoch: 10 Global Step: 228020 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:33:09,306-Speed 2498.11 samples/sec Loss 3.3886 LearningRate 0.000649 Epoch: 10 Global Step: 228030 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:33:17,520-Speed 2493.76 samples/sec Loss 3.4394 LearningRate 0.000649 Epoch: 10 Global Step: 228040 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:33:25,720-Speed 2498.08 samples/sec Loss 3.4269 LearningRate 
0.000649 Epoch: 10 Global Step: 228050 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:33:33,875-Speed 2512.08 samples/sec Loss 3.3881 LearningRate 0.000649 Epoch: 10 Global Step: 228060 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:33:42,020-Speed 2514.84 samples/sec Loss 3.3808 LearningRate 0.000649 Epoch: 10 Global Step: 228070 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:33:50,222-Speed 2497.54 samples/sec Loss 3.4007 LearningRate 0.000649 Epoch: 10 Global Step: 228080 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:33:58,419-Speed 2498.80 samples/sec Loss 3.3425 LearningRate 0.000649 Epoch: 10 Global Step: 228090 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:34:06,632-Speed 2493.97 samples/sec Loss 3.3476 LearningRate 0.000649 Epoch: 10 Global Step: 228100 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:34:14,832-Speed 2498.08 samples/sec Loss 3.3386 LearningRate 0.000649 Epoch: 10 Global Step: 228110 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:34:23,035-Speed 2497.28 samples/sec Loss 3.4950 LearningRate 0.000649 Epoch: 10 Global Step: 228120 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:34:31,189-Speed 2511.74 samples/sec Loss 3.3918 LearningRate 0.000649 Epoch: 10 Global Step: 228130 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:34:41,626-Speed 1962.59 samples/sec Loss 3.3787 LearningRate 0.000649 Epoch: 11 Global Step: 228140 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:34:49,824-Speed 2498.60 samples/sec Loss 3.3848 LearningRate 0.000649 Epoch: 11 Global Step: 228150 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:34:58,022-Speed 2498.66 samples/sec Loss 3.4108 LearningRate 0.000649 Epoch: 11 Global Step: 228160 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:35:06,216-Speed 2499.87 samples/sec Loss 3.3734 
LearningRate 0.000649 Epoch: 11 Global Step: 228170 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:35:14,417-Speed 2498.16 samples/sec Loss 3.3462 LearningRate 0.000649 Epoch: 11 Global Step: 228180 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:35:22,558-Speed 2515.81 samples/sec Loss 3.3963 LearningRate 0.000649 Epoch: 11 Global Step: 228190 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:35:30,775-Speed 2492.78 samples/sec Loss 3.3603 LearningRate 0.000649 Epoch: 11 Global Step: 228200 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:35:38,977-Speed 2497.59 samples/sec Loss 3.3057 LearningRate 0.000649 Epoch: 11 Global Step: 228210 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:35:47,187-Speed 2494.99 samples/sec Loss 3.3846 LearningRate 0.000649 Epoch: 11 Global Step: 228220 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:35:55,388-Speed 2497.41 samples/sec Loss 3.3597 LearningRate 0.000649 Epoch: 11 Global Step: 228230 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:36:03,604-Speed 2493.23 samples/sec Loss 3.4023 LearningRate 0.000649 Epoch: 11 Global Step: 228240 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:36:11,763-Speed 2510.67 samples/sec Loss 3.3642 LearningRate 0.000649 Epoch: 11 Global Step: 228250 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:36:19,962-Speed 2498.36 samples/sec Loss 3.4158 LearningRate 0.000649 Epoch: 11 Global Step: 228260 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:36:28,160-Speed 2498.57 samples/sec Loss 3.4043 LearningRate 0.000649 Epoch: 11 Global Step: 228270 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:36:36,363-Speed 2497.42 samples/sec Loss 3.3702 LearningRate 0.000649 Epoch: 11 Global Step: 228280 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:36:44,560-Speed 2498.93 samples/sec Loss 
3.4499 LearningRate 0.000649 Epoch: 11 Global Step: 228290 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:36:52,755-Speed 2499.37 samples/sec Loss 3.3460 LearningRate 0.000649 Epoch: 11 Global Step: 228300 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:37:00,901-Speed 2514.38 samples/sec Loss 3.2712 LearningRate 0.000649 Epoch: 11 Global Step: 228310 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:37:09,108-Speed 2495.95 samples/sec Loss 3.3427 LearningRate 0.000649 Epoch: 11 Global Step: 228320 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:37:17,308-Speed 2498.10 samples/sec Loss 3.3818 LearningRate 0.000648 Epoch: 11 Global Step: 228330 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:37:25,506-Speed 2498.59 samples/sec Loss 3.3424 LearningRate 0.000648 Epoch: 11 Global Step: 228340 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:37:33,704-Speed 2498.70 samples/sec Loss 3.3474 LearningRate 0.000648 Epoch: 11 Global Step: 228350 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:37:41,918-Speed 2493.42 samples/sec Loss 3.4941 LearningRate 0.000648 Epoch: 11 Global Step: 228360 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:37:50,063-Speed 2514.84 samples/sec Loss 3.3684 LearningRate 0.000648 Epoch: 11 Global Step: 228370 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:37:58,261-Speed 2498.57 samples/sec Loss 3.3368 LearningRate 0.000648 Epoch: 11 Global Step: 228380 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:38:06,458-Speed 2498.93 samples/sec Loss 3.3151 LearningRate 0.000648 Epoch: 11 Global Step: 228390 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:38:14,656-Speed 2498.97 samples/sec Loss 3.3527 LearningRate 0.000648 Epoch: 11 Global Step: 228400 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:38:22,854-Speed 2498.33 samples/sec 
Loss 3.4226 LearningRate 0.000648 Epoch: 11 Global Step: 228410 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:38:31,054-Speed 2497.69 samples/sec Loss 3.3617 LearningRate 0.000648 Epoch: 11 Global Step: 228420 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:38:39,200-Speed 2514.68 samples/sec Loss 3.3720 LearningRate 0.000648 Epoch: 11 Global Step: 228430 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:38:47,400-Speed 2498.40 samples/sec Loss 3.3820 LearningRate 0.000648 Epoch: 11 Global Step: 228440 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:38:55,605-Speed 2496.35 samples/sec Loss 3.3357 LearningRate 0.000648 Epoch: 11 Global Step: 228450 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:39:03,805-Speed 2498.05 samples/sec Loss 3.3139 LearningRate 0.000648 Epoch: 11 Global Step: 228460 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:39:12,007-Speed 2497.26 samples/sec Loss 3.3103 LearningRate 0.000648 Epoch: 11 Global Step: 228470 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:39:20,207-Speed 2498.06 samples/sec Loss 3.3432 LearningRate 0.000648 Epoch: 11 Global Step: 228480 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:39:28,359-Speed 2512.72 samples/sec Loss 3.3174 LearningRate 0.000648 Epoch: 11 Global Step: 228490 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:39:36,562-Speed 2497.29 samples/sec Loss 3.3392 LearningRate 0.000648 Epoch: 11 Global Step: 228500 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:39:44,768-Speed 2496.19 samples/sec Loss 3.3510 LearningRate 0.000648 Epoch: 11 Global Step: 228510 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:39:52,968-Speed 2498.04 samples/sec Loss 3.3152 LearningRate 0.000648 Epoch: 11 Global Step: 228520 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:40:01,171-Speed 2496.95 
samples/sec Loss 3.2753 LearningRate 0.000648 Epoch: 11 Global Step: 228530 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:40:09,372-Speed 2497.70 samples/sec Loss 3.3629 LearningRate 0.000648 Epoch: 11 Global Step: 228540 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:40:17,519-Speed 2514.45 samples/sec Loss 3.2792 LearningRate 0.000648 Epoch: 11 Global Step: 228550 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:40:25,714-Speed 2499.59 samples/sec Loss 3.3497 LearningRate 0.000648 Epoch: 11 Global Step: 228560 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:40:33,914-Speed 2497.78 samples/sec Loss 3.3564 LearningRate 0.000648 Epoch: 11 Global Step: 228570 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:40:42,111-Speed 2498.89 samples/sec Loss 3.3377 LearningRate 0.000648 Epoch: 11 Global Step: 228580 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:40:50,309-Speed 2498.93 samples/sec Loss 3.3199 LearningRate 0.000648 Epoch: 11 Global Step: 228590 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:40:58,505-Speed 2498.91 samples/sec Loss 3.3231 LearningRate 0.000648 Epoch: 11 Global Step: 228600 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:41:06,652-Speed 2514.44 samples/sec Loss 3.3451 LearningRate 0.000648 Epoch: 11 Global Step: 228610 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:41:14,848-Speed 2499.06 samples/sec Loss 3.2538 LearningRate 0.000648 Epoch: 11 Global Step: 228620 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:41:23,046-Speed 2498.72 samples/sec Loss 3.3260 LearningRate 0.000648 Epoch: 11 Global Step: 228630 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:41:31,245-Speed 2498.30 samples/sec Loss 3.3776 LearningRate 0.000648 Epoch: 11 Global Step: 228640 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:41:39,443-Speed 
2498.65 samples/sec Loss 3.3494 LearningRate 0.000648 Epoch: 11 Global Step: 228650 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:41:47,649-Speed 2495.87 samples/sec Loss 3.3780 LearningRate 0.000648 Epoch: 11 Global Step: 228660 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:41:55,805-Speed 2511.99 samples/sec Loss 3.3351 LearningRate 0.000648 Epoch: 11 Global Step: 228670 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:42:04,005-Speed 2497.87 samples/sec Loss 3.3784 LearningRate 0.000648 Epoch: 11 Global Step: 228680 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:42:12,208-Speed 2497.06 samples/sec Loss 3.3728 LearningRate 0.000648 Epoch: 11 Global Step: 228690 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:42:20,420-Speed 2494.22 samples/sec Loss 3.3464 LearningRate 0.000648 Epoch: 11 Global Step: 228700 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:42:28,624-Speed 2496.63 samples/sec Loss 3.2936 LearningRate 0.000648 Epoch: 11 Global Step: 228710 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:42:36,826-Speed 2497.55 samples/sec Loss 3.3474 LearningRate 0.000648 Epoch: 11 Global Step: 228720 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:42:44,974-Speed 2513.76 samples/sec Loss 3.3697 LearningRate 0.000648 Epoch: 11 Global Step: 228730 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:42:53,176-Speed 2497.79 samples/sec Loss 3.3418 LearningRate 0.000648 Epoch: 11 Global Step: 228740 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:43:01,378-Speed 2497.19 samples/sec Loss 3.3558 LearningRate 0.000648 Epoch: 11 Global Step: 228750 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:43:09,581-Speed 2496.78 samples/sec Loss 3.3579 LearningRate 0.000648 Epoch: 11 Global Step: 228760 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 
18:43:17,782-Speed 2497.70 samples/sec Loss 3.3231 LearningRate 0.000648 Epoch: 11 Global Step: 228770 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:43:25,980-Speed 2498.53 samples/sec Loss 3.3123 LearningRate 0.000648 Epoch: 11 Global Step: 228780 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:43:34,125-Speed 2514.70 samples/sec Loss 3.4631 LearningRate 0.000647 Epoch: 11 Global Step: 228790 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:43:42,322-Speed 2498.78 samples/sec Loss 3.3870 LearningRate 0.000647 Epoch: 11 Global Step: 228800 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:43:50,524-Speed 2497.50 samples/sec Loss 3.3382 LearningRate 0.000647 Epoch: 11 Global Step: 228810 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:43:58,728-Speed 2496.93 samples/sec Loss 3.3277 LearningRate 0.000647 Epoch: 11 Global Step: 228820 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:44:06,926-Speed 2498.44 samples/sec Loss 3.2848 LearningRate 0.000647 Epoch: 11 Global Step: 228830 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:44:15,125-Speed 2498.34 samples/sec Loss 3.3055 LearningRate 0.000647 Epoch: 11 Global Step: 228840 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:44:23,270-Speed 2514.89 samples/sec Loss 3.3642 LearningRate 0.000647 Epoch: 11 Global Step: 228850 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:44:31,475-Speed 2496.53 samples/sec Loss 3.4002 LearningRate 0.000647 Epoch: 11 Global Step: 228860 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:44:39,672-Speed 2498.81 samples/sec Loss 3.3669 LearningRate 0.000647 Epoch: 11 Global Step: 228870 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:44:47,868-Speed 2499.10 samples/sec Loss 3.3186 LearningRate 0.000647 Epoch: 11 Global Step: 228880 Fp16 Grad Scale: 16384 Required: 137 hours Training: 
2022-07-07 18:44:56,067-Speed 2498.38 samples/sec Loss 3.3106 LearningRate 0.000647 Epoch: 11 Global Step: 228890 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:45:04,265-Speed 2498.60 samples/sec Loss 3.3683 LearningRate 0.000647 Epoch: 11 Global Step: 228900 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:45:12,424-Speed 2510.42 samples/sec Loss 3.3596 LearningRate 0.000647 Epoch: 11 Global Step: 228910 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:45:20,623-Speed 2498.39 samples/sec Loss 3.3657 LearningRate 0.000647 Epoch: 11 Global Step: 228920 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:45:28,821-Speed 2498.50 samples/sec Loss 3.3412 LearningRate 0.000647 Epoch: 11 Global Step: 228930 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:45:37,021-Speed 2497.89 samples/sec Loss 3.3965 LearningRate 0.000647 Epoch: 11 Global Step: 228940 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:45:45,223-Speed 2497.52 samples/sec Loss 3.3544 LearningRate 0.000647 Epoch: 11 Global Step: 228950 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:45:53,418-Speed 2499.38 samples/sec Loss 3.3443 LearningRate 0.000647 Epoch: 11 Global Step: 228960 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:46:01,564-Speed 2514.74 samples/sec Loss 3.3110 LearningRate 0.000647 Epoch: 11 Global Step: 228970 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:46:09,762-Speed 2498.60 samples/sec Loss 3.3526 LearningRate 0.000647 Epoch: 11 Global Step: 228980 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:46:17,958-Speed 2499.14 samples/sec Loss 3.3296 LearningRate 0.000647 Epoch: 11 Global Step: 228990 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:46:26,159-Speed 2497.75 samples/sec Loss 3.3322 LearningRate 0.000647 Epoch: 11 Global Step: 229000 Fp16 Grad Scale: 16384 Required: 137 hours 
Training: 2022-07-07 18:46:34,358-Speed 2498.13 samples/sec Loss 3.3898 LearningRate 0.000647 Epoch: 11 Global Step: 229010 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:46:42,557-Speed 2498.26 samples/sec Loss 3.3280 LearningRate 0.000647 Epoch: 11 Global Step: 229020 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:46:50,702-Speed 2514.80 samples/sec Loss 3.3609 LearningRate 0.000647 Epoch: 11 Global Step: 229030 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:46:58,901-Speed 2498.52 samples/sec Loss 3.3359 LearningRate 0.000647 Epoch: 11 Global Step: 229040 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:47:07,104-Speed 2497.18 samples/sec Loss 3.3447 LearningRate 0.000647 Epoch: 11 Global Step: 229050 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:47:15,303-Speed 2498.04 samples/sec Loss 3.3226 LearningRate 0.000647 Epoch: 11 Global Step: 229060 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:47:23,506-Speed 2497.27 samples/sec Loss 3.3399 LearningRate 0.000647 Epoch: 11 Global Step: 229070 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:47:31,703-Speed 2498.72 samples/sec Loss 3.3624 LearningRate 0.000647 Epoch: 11 Global Step: 229080 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:47:39,862-Speed 2510.81 samples/sec Loss 3.3325 LearningRate 0.000647 Epoch: 11 Global Step: 229090 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:47:48,063-Speed 2497.34 samples/sec Loss 3.3612 LearningRate 0.000647 Epoch: 11 Global Step: 229100 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:47:56,263-Speed 2498.07 samples/sec Loss 3.3775 LearningRate 0.000647 Epoch: 11 Global Step: 229110 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:48:04,461-Speed 2498.67 samples/sec Loss 3.4167 LearningRate 0.000647 Epoch: 11 Global Step: 229120 Fp16 Grad Scale: 16384 Required: 137 
hours Training: 2022-07-07 18:48:12,663-Speed 2497.30 samples/sec Loss 3.3923 LearningRate 0.000647 Epoch: 11 Global Step: 229130 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:48:20,859-Speed 2499.19 samples/sec Loss 3.3910 LearningRate 0.000647 Epoch: 11 Global Step: 229140 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:48:29,017-Speed 2510.95 samples/sec Loss 3.4294 LearningRate 0.000647 Epoch: 11 Global Step: 229150 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:48:37,217-Speed 2498.01 samples/sec Loss 3.3203 LearningRate 0.000647 Epoch: 11 Global Step: 229160 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:48:45,415-Speed 2498.36 samples/sec Loss 3.3703 LearningRate 0.000647 Epoch: 11 Global Step: 229170 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:48:53,615-Speed 2498.10 samples/sec Loss 3.2686 LearningRate 0.000647 Epoch: 11 Global Step: 229180 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:49:01,813-Speed 2498.67 samples/sec Loss 3.3248 LearningRate 0.000647 Epoch: 11 Global Step: 229190 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:49:10,017-Speed 2496.78 samples/sec Loss 3.3210 LearningRate 0.000647 Epoch: 11 Global Step: 229200 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:49:18,165-Speed 2513.88 samples/sec Loss 3.3817 LearningRate 0.000647 Epoch: 11 Global Step: 229210 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:49:26,375-Speed 2494.89 samples/sec Loss 3.3662 LearningRate 0.000647 Epoch: 11 Global Step: 229220 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:49:34,577-Speed 2497.29 samples/sec Loss 3.3757 LearningRate 0.000647 Epoch: 11 Global Step: 229230 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:49:42,790-Speed 2494.14 samples/sec Loss 3.2373 LearningRate 0.000647 Epoch: 11 Global Step: 229240 Fp16 Grad Scale: 16384 Required: 
137 hours Training: 2022-07-07 18:49:50,991-Speed 2497.82 samples/sec Loss 3.3399 LearningRate 0.000647 Epoch: 11 Global Step: 229250 Fp16 Grad Scale: 16384 Required: 137 hours Training: 2022-07-07 18:49:59,194-Speed 2497.13 samples/sec Loss 3.2729 LearningRate 0.000646 Epoch: 11 Global Step: 229260 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:50:07,343-Speed 2513.57 samples/sec Loss 3.4975 LearningRate 0.000646 Epoch: 11 Global Step: 229270 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:50:15,545-Speed 2497.38 samples/sec Loss 3.3496 LearningRate 0.000646 Epoch: 11 Global Step: 229280 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:50:23,754-Speed 2495.34 samples/sec Loss 3.4032 LearningRate 0.000646 Epoch: 11 Global Step: 229290 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:50:31,957-Speed 2497.18 samples/sec Loss 3.4675 LearningRate 0.000646 Epoch: 11 Global Step: 229300 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:50:40,162-Speed 2496.40 samples/sec Loss 3.5062 LearningRate 0.000646 Epoch: 11 Global Step: 229310 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:50:48,367-Speed 2496.76 samples/sec Loss 3.4207 LearningRate 0.000646 Epoch: 11 Global Step: 229320 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:50:56,517-Speed 2512.97 samples/sec Loss 3.4280 LearningRate 0.000646 Epoch: 11 Global Step: 229330 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:51:04,722-Speed 2496.46 samples/sec Loss 3.3362 LearningRate 0.000646 Epoch: 11 Global Step: 229340 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:51:12,922-Speed 2497.97 samples/sec Loss 3.3540 LearningRate 0.000646 Epoch: 11 Global Step: 229350 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:51:21,144-Speed 2491.28 samples/sec Loss 3.4010 LearningRate 0.000646 Epoch: 11 Global Step: 229360 Fp16 Grad Scale: 32768 
Required: 137 hours Training: 2022-07-07 18:51:29,346-Speed 2497.27 samples/sec Loss 3.4159 LearningRate 0.000646 Epoch: 11 Global Step: 229370 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:51:37,547-Speed 2497.66 samples/sec Loss 3.3769 LearningRate 0.000646 Epoch: 11 Global Step: 229380 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:51:45,701-Speed 2512.02 samples/sec Loss 3.3740 LearningRate 0.000646 Epoch: 11 Global Step: 229390 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:51:53,898-Speed 2498.71 samples/sec Loss 3.3263 LearningRate 0.000646 Epoch: 11 Global Step: 229400 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:52:02,093-Speed 2499.61 samples/sec Loss 3.3067 LearningRate 0.000646 Epoch: 11 Global Step: 229410 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:52:10,303-Speed 2494.94 samples/sec Loss 3.3010 LearningRate 0.000646 Epoch: 11 Global Step: 229420 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:52:18,502-Speed 2498.59 samples/sec Loss 3.3788 LearningRate 0.000646 Epoch: 11 Global Step: 229430 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:52:26,704-Speed 2497.40 samples/sec Loss 3.3728 LearningRate 0.000646 Epoch: 11 Global Step: 229440 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:52:34,849-Speed 2514.77 samples/sec Loss 3.3460 LearningRate 0.000646 Epoch: 11 Global Step: 229450 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:52:43,059-Speed 2494.87 samples/sec Loss 3.3775 LearningRate 0.000646 Epoch: 11 Global Step: 229460 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:52:51,261-Speed 2497.58 samples/sec Loss 3.2799 LearningRate 0.000646 Epoch: 11 Global Step: 229470 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:52:59,464-Speed 2496.80 samples/sec Loss 3.3641 LearningRate 0.000646 Epoch: 11 Global Step: 229480 Fp16 Grad Scale: 
32768 Required: 137 hours Training: 2022-07-07 18:53:07,665-Speed 2497.79 samples/sec Loss 3.3933 LearningRate 0.000646 Epoch: 11 Global Step: 229490 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:53:15,868-Speed 2496.92 samples/sec Loss 3.3552 LearningRate 0.000646 Epoch: 11 Global Step: 229500 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:53:24,017-Speed 2513.73 samples/sec Loss 3.4038 LearningRate 0.000646 Epoch: 11 Global Step: 229510 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:53:32,217-Speed 2497.58 samples/sec Loss 3.2879 LearningRate 0.000646 Epoch: 11 Global Step: 229520 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:53:40,419-Speed 2497.43 samples/sec Loss 3.3649 LearningRate 0.000646 Epoch: 11 Global Step: 229530 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:53:48,627-Speed 2495.71 samples/sec Loss 3.3601 LearningRate 0.000646 Epoch: 11 Global Step: 229540 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:53:56,824-Speed 2498.75 samples/sec Loss 3.3032 LearningRate 0.000646 Epoch: 11 Global Step: 229550 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:54:05,022-Speed 2498.58 samples/sec Loss 3.3256 LearningRate 0.000646 Epoch: 11 Global Step: 229560 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:54:13,166-Speed 2515.26 samples/sec Loss 3.2608 LearningRate 0.000646 Epoch: 11 Global Step: 229570 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:54:21,368-Speed 2497.37 samples/sec Loss 3.3319 LearningRate 0.000646 Epoch: 11 Global Step: 229580 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:54:29,565-Speed 2498.79 samples/sec Loss 3.3161 LearningRate 0.000646 Epoch: 11 Global Step: 229590 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:54:37,761-Speed 2499.14 samples/sec Loss 3.3153 LearningRate 0.000646 Epoch: 11 Global Step: 229600 Fp16 Grad 
Scale: 32768 Required: 137 hours Training: 2022-07-07 18:54:45,959-Speed 2498.76 samples/sec Loss 3.3652 LearningRate 0.000646 Epoch: 11 Global Step: 229610 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:54:54,163-Speed 2496.73 samples/sec Loss 3.4288 LearningRate 0.000646 Epoch: 11 Global Step: 229620 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:55:02,307-Speed 2515.02 samples/sec Loss 3.3786 LearningRate 0.000646 Epoch: 11 Global Step: 229630 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:55:10,507-Speed 2499.23 samples/sec Loss 3.3608 LearningRate 0.000646 Epoch: 11 Global Step: 229640 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:55:18,708-Speed 2497.84 samples/sec Loss 3.3295 LearningRate 0.000646 Epoch: 11 Global Step: 229650 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:55:26,907-Speed 2498.03 samples/sec Loss 3.3461 LearningRate 0.000646 Epoch: 11 Global Step: 229660 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:55:35,118-Speed 2494.73 samples/sec Loss 3.3668 LearningRate 0.000646 Epoch: 11 Global Step: 229670 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:55:43,322-Speed 2496.66 samples/sec Loss 3.2908 LearningRate 0.000646 Epoch: 11 Global Step: 229680 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:55:51,470-Speed 2514.02 samples/sec Loss 3.3469 LearningRate 0.000646 Epoch: 11 Global Step: 229690 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:55:59,668-Speed 2498.55 samples/sec Loss 3.3772 LearningRate 0.000646 Epoch: 11 Global Step: 229700 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:56:07,866-Speed 2498.50 samples/sec Loss 3.2652 LearningRate 0.000646 Epoch: 11 Global Step: 229710 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:56:16,066-Speed 2497.95 samples/sec Loss 3.3322 LearningRate 0.000645 Epoch: 11 Global Step: 229720 Fp16 
Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:56:24,264-Speed 2498.80 samples/sec Loss 3.3382 LearningRate 0.000645 Epoch: 11 Global Step: 229730 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:56:32,464-Speed 2497.97 samples/sec Loss 3.3441 LearningRate 0.000645 Epoch: 11 Global Step: 229740 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:56:40,608-Speed 2514.96 samples/sec Loss 3.3596 LearningRate 0.000645 Epoch: 11 Global Step: 229750 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:56:48,805-Speed 2498.95 samples/sec Loss 3.3178 LearningRate 0.000645 Epoch: 11 Global Step: 229760 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:56:57,006-Speed 2497.72 samples/sec Loss 3.3883 LearningRate 0.000645 Epoch: 11 Global Step: 229770 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:57:05,206-Speed 2497.90 samples/sec Loss 3.3249 LearningRate 0.000645 Epoch: 11 Global Step: 229780 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:57:13,409-Speed 2497.06 samples/sec Loss 3.3514 LearningRate 0.000645 Epoch: 11 Global Step: 229790 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:57:21,607-Speed 2498.41 samples/sec Loss 3.4686 LearningRate 0.000645 Epoch: 11 Global Step: 229800 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:57:29,755-Speed 2513.92 samples/sec Loss 3.4139 LearningRate 0.000645 Epoch: 11 Global Step: 229810 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:57:37,958-Speed 2496.99 samples/sec Loss 3.3570 LearningRate 0.000645 Epoch: 11 Global Step: 229820 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:57:46,164-Speed 2496.29 samples/sec Loss 3.4144 LearningRate 0.000645 Epoch: 11 Global Step: 229830 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:57:54,362-Speed 2498.49 samples/sec Loss 3.4219 LearningRate 0.000645 Epoch: 11 Global Step: 229840 
Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:58:02,562-Speed 2498.10 samples/sec Loss 3.4143 LearningRate 0.000645 Epoch: 11 Global Step: 229850 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:58:10,766-Speed 2496.80 samples/sec Loss 3.3220 LearningRate 0.000645 Epoch: 11 Global Step: 229860 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:58:18,909-Speed 2515.25 samples/sec Loss 3.3440 LearningRate 0.000645 Epoch: 11 Global Step: 229870 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:58:27,110-Speed 2497.75 samples/sec Loss 3.3951 LearningRate 0.000645 Epoch: 11 Global Step: 229880 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:58:35,309-Speed 2498.62 samples/sec Loss 3.4556 LearningRate 0.000645 Epoch: 11 Global Step: 229890 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:58:43,503-Speed 2499.93 samples/sec Loss 3.3795 LearningRate 0.000645 Epoch: 11 Global Step: 229900 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:58:51,699-Speed 2498.93 samples/sec Loss 3.3785 LearningRate 0.000645 Epoch: 11 Global Step: 229910 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:58:59,898-Speed 2498.38 samples/sec Loss 3.3812 LearningRate 0.000645 Epoch: 11 Global Step: 229920 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:59:08,047-Speed 2513.63 samples/sec Loss 3.3942 LearningRate 0.000645 Epoch: 11 Global Step: 229930 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:59:16,254-Speed 2495.62 samples/sec Loss 3.3792 LearningRate 0.000645 Epoch: 11 Global Step: 229940 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:59:24,449-Speed 2499.70 samples/sec Loss 3.3889 LearningRate 0.000645 Epoch: 11 Global Step: 229950 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:59:32,658-Speed 2495.33 samples/sec Loss 3.3338 LearningRate 0.000645 Epoch: 11 Global Step: 
229960 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:59:40,857-Speed 2498.37 samples/sec Loss 3.3590 LearningRate 0.000645 Epoch: 11 Global Step: 229970 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:59:49,055-Speed 2498.55 samples/sec Loss 3.3231 LearningRate 0.000645 Epoch: 11 Global Step: 229980 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 18:59:57,201-Speed 2514.45 samples/sec Loss 3.3960 LearningRate 0.000645 Epoch: 11 Global Step: 229990 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:00:05,400-Speed 2498.41 samples/sec Loss 3.3398 LearningRate 0.000645 Epoch: 11 Global Step: 230000 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:00:13,596-Speed 2499.00 samples/sec Loss 3.3727 LearningRate 0.000645 Epoch: 11 Global Step: 230010 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:00:21,799-Speed 2497.35 samples/sec Loss 3.3715 LearningRate 0.000645 Epoch: 11 Global Step: 230020 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:00:29,999-Speed 2497.99 samples/sec Loss 3.3712 LearningRate 0.000645 Epoch: 11 Global Step: 230030 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:00:38,197-Speed 2498.55 samples/sec Loss 3.3184 LearningRate 0.000645 Epoch: 11 Global Step: 230040 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:00:46,341-Speed 2515.21 samples/sec Loss 3.2630 LearningRate 0.000645 Epoch: 11 Global Step: 230050 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:00:54,547-Speed 2495.92 samples/sec Loss 3.3427 LearningRate 0.000645 Epoch: 11 Global Step: 230060 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:01:02,747-Speed 2498.22 samples/sec Loss 3.3428 LearningRate 0.000645 Epoch: 11 Global Step: 230070 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:01:10,946-Speed 2498.26 samples/sec Loss 3.2864 LearningRate 0.000645 Epoch: 11 Global 
Step: 230080 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:01:19,144-Speed 2498.73 samples/sec Loss 3.3118 LearningRate 0.000645 Epoch: 11 Global Step: 230090 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:01:27,342-Speed 2498.35 samples/sec Loss 3.4422 LearningRate 0.000645 Epoch: 11 Global Step: 230100 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:01:35,488-Speed 2514.75 samples/sec Loss 3.3443 LearningRate 0.000645 Epoch: 11 Global Step: 230110 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:01:43,698-Speed 2494.75 samples/sec Loss 3.3839 LearningRate 0.000645 Epoch: 11 Global Step: 230120 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:01:51,896-Speed 2498.69 samples/sec Loss 3.3404 LearningRate 0.000645 Epoch: 11 Global Step: 230130 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:02:00,095-Speed 2498.15 samples/sec Loss 3.3790 LearningRate 0.000645 Epoch: 11 Global Step: 230140 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:02:08,293-Speed 2498.63 samples/sec Loss 3.4465 LearningRate 0.000645 Epoch: 11 Global Step: 230150 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:02:16,489-Speed 2499.23 samples/sec Loss 3.3906 LearningRate 0.000645 Epoch: 11 Global Step: 230160 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:02:24,636-Speed 2514.19 samples/sec Loss 3.4144 LearningRate 0.000645 Epoch: 11 Global Step: 230170 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:02:32,836-Speed 2497.74 samples/sec Loss 3.3881 LearningRate 0.000645 Epoch: 11 Global Step: 230180 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:02:41,036-Speed 2498.26 samples/sec Loss 3.3218 LearningRate 0.000644 Epoch: 11 Global Step: 230190 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:02:49,240-Speed 2497.49 samples/sec Loss 3.4256 LearningRate 0.000644 Epoch: 11 
Global Step: 230200 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:02:57,441-Speed 2497.62 samples/sec Loss 3.3546 LearningRate 0.000644 Epoch: 11 Global Step: 230210 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:03:05,654-Speed 2494.09 samples/sec Loss 3.3244 LearningRate 0.000644 Epoch: 11 Global Step: 230220 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:03:13,799-Speed 2514.65 samples/sec Loss 3.3644 LearningRate 0.000644 Epoch: 11 Global Step: 230230 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:03:22,000-Speed 2497.98 samples/sec Loss 3.3533 LearningRate 0.000644 Epoch: 11 Global Step: 230240 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:03:30,199-Speed 2498.09 samples/sec Loss 3.3247 LearningRate 0.000644 Epoch: 11 Global Step: 230250 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:03:38,402-Speed 2497.09 samples/sec Loss 3.4207 LearningRate 0.000644 Epoch: 11 Global Step: 230260 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:03:46,640-Speed 2486.69 samples/sec Loss 3.3830 LearningRate 0.000644 Epoch: 11 Global Step: 230270 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:03:54,843-Speed 2496.80 samples/sec Loss 3.3704 LearningRate 0.000644 Epoch: 11 Global Step: 230280 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:04:02,997-Speed 2512.25 samples/sec Loss 3.3245 LearningRate 0.000644 Epoch: 11 Global Step: 230290 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:04:11,210-Speed 2493.85 samples/sec Loss 3.4012 LearningRate 0.000644 Epoch: 11 Global Step: 230300 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:04:19,408-Speed 2498.65 samples/sec Loss 3.2957 LearningRate 0.000644 Epoch: 11 Global Step: 230310 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:04:27,608-Speed 2497.95 samples/sec Loss 3.3380 LearningRate 0.000644 
Epoch: 11 Global Step: 230320 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:04:35,811-Speed 2496.96 samples/sec Loss 3.3427 LearningRate 0.000644 Epoch: 11 Global Step: 230330 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:04:44,010-Speed 2498.11 samples/sec Loss 3.3733 LearningRate 0.000644 Epoch: 11 Global Step: 230340 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:04:52,156-Speed 2514.50 samples/sec Loss 3.3906 LearningRate 0.000644 Epoch: 11 Global Step: 230350 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:05:00,354-Speed 2498.66 samples/sec Loss 3.3435 LearningRate 0.000644 Epoch: 11 Global Step: 230360 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:05:08,562-Speed 2495.63 samples/sec Loss 3.4411 LearningRate 0.000644 Epoch: 11 Global Step: 230370 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:05:16,766-Speed 2496.56 samples/sec Loss 3.4006 LearningRate 0.000644 Epoch: 11 Global Step: 230380 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:05:24,970-Speed 2496.89 samples/sec Loss 3.3512 LearningRate 0.000644 Epoch: 11 Global Step: 230390 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:05:33,173-Speed 2497.00 samples/sec Loss 3.3378 LearningRate 0.000644 Epoch: 11 Global Step: 230400 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:05:41,323-Speed 2513.16 samples/sec Loss 3.4212 LearningRate 0.000644 Epoch: 11 Global Step: 230410 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:05:49,528-Speed 2496.66 samples/sec Loss 3.3155 LearningRate 0.000644 Epoch: 11 Global Step: 230420 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:05:57,732-Speed 2496.51 samples/sec Loss 3.3264 LearningRate 0.000644 Epoch: 11 Global Step: 230430 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:06:05,932-Speed 2498.17 samples/sec Loss 3.3811 LearningRate 
0.000644 Epoch: 11 Global Step: 230440 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:06:14,150-Speed 2492.33 samples/sec Loss 3.3714 LearningRate 0.000644 Epoch: 11 Global Step: 230450 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:06:22,353-Speed 2497.27 samples/sec Loss 3.3536 LearningRate 0.000644 Epoch: 11 Global Step: 230460 Fp16 Grad Scale: 65536 Required: 137 hours Training: 2022-07-07 19:06:30,498-Speed 2514.97 samples/sec Loss 3.3435 LearningRate 0.000644 Epoch: 11 Global Step: 230470 Fp16 Grad Scale: 65536 Required: 137 hours Training: 2022-07-07 19:06:38,697-Speed 2498.22 samples/sec Loss 3.3063 LearningRate 0.000644 Epoch: 11 Global Step: 230480 Fp16 Grad Scale: 65536 Required: 137 hours Training: 2022-07-07 19:06:46,906-Speed 2494.99 samples/sec Loss 3.3310 LearningRate 0.000644 Epoch: 11 Global Step: 230490 Fp16 Grad Scale: 65536 Required: 137 hours Training: 2022-07-07 19:06:55,110-Speed 2496.82 samples/sec Loss 3.3932 LearningRate 0.000644 Epoch: 11 Global Step: 230500 Fp16 Grad Scale: 65536 Required: 137 hours Training: 2022-07-07 19:07:03,309-Speed 2498.25 samples/sec Loss 3.3552 LearningRate 0.000644 Epoch: 11 Global Step: 230510 Fp16 Grad Scale: 65536 Required: 137 hours Training: 2022-07-07 19:07:11,506-Speed 2498.73 samples/sec Loss 3.4716 LearningRate 0.000644 Epoch: 11 Global Step: 230520 Fp16 Grad Scale: 65536 Required: 137 hours Training: 2022-07-07 19:07:19,653-Speed 2514.40 samples/sec Loss 3.2984 LearningRate 0.000644 Epoch: 11 Global Step: 230530 Fp16 Grad Scale: 65536 Required: 137 hours Training: 2022-07-07 19:07:27,866-Speed 2493.97 samples/sec Loss 3.3785 LearningRate 0.000644 Epoch: 11 Global Step: 230540 Fp16 Grad Scale: 65536 Required: 137 hours Training: 2022-07-07 19:07:36,080-Speed 2494.16 samples/sec Loss 3.3615 LearningRate 0.000644 Epoch: 11 Global Step: 230550 Fp16 Grad Scale: 65536 Required: 137 hours Training: 2022-07-07 19:07:44,284-Speed 2496.62 samples/sec Loss 3.3127 
LearningRate 0.000644 Epoch: 11 Global Step: 230560 Fp16 Grad Scale: 65536 Required: 137 hours Training: 2022-07-07 19:07:52,498-Speed 2493.62 samples/sec Loss 3.3175 LearningRate 0.000644 Epoch: 11 Global Step: 230570 Fp16 Grad Scale: 65536 Required: 137 hours Training: 2022-07-07 19:08:00,655-Speed 2511.31 samples/sec Loss 3.3579 LearningRate 0.000644 Epoch: 11 Global Step: 230580 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:08:08,801-Speed 2514.45 samples/sec Loss 3.2814 LearningRate 0.000644 Epoch: 11 Global Step: 230590 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:08:17,000-Speed 2498.21 samples/sec Loss 3.2898 LearningRate 0.000644 Epoch: 11 Global Step: 230600 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:08:25,200-Speed 2498.05 samples/sec Loss 3.4190 LearningRate 0.000644 Epoch: 11 Global Step: 230610 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:08:33,400-Speed 2498.05 samples/sec Loss 3.3756 LearningRate 0.000644 Epoch: 11 Global Step: 230620 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:08:41,599-Speed 2498.20 samples/sec Loss 3.4173 LearningRate 0.000644 Epoch: 11 Global Step: 230630 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:08:49,798-Speed 2498.10 samples/sec Loss 3.4214 LearningRate 0.000644 Epoch: 11 Global Step: 230640 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:08:57,945-Speed 2514.61 samples/sec Loss 3.2749 LearningRate 0.000643 Epoch: 11 Global Step: 230650 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:09:06,148-Speed 2497.13 samples/sec Loss 3.3844 LearningRate 0.000643 Epoch: 11 Global Step: 230660 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:09:14,349-Speed 2497.91 samples/sec Loss 3.3754 LearningRate 0.000643 Epoch: 11 Global Step: 230670 Fp16 Grad Scale: 32768 Required: 137 hours Training: 2022-07-07 19:09:22,550-Speed 2497.79 samples/sec Loss 
3.4344 LearningRate 0.000643 Epoch: 11 Global Step: 230680 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:09:30,751-Speed 2497.56 samples/sec Loss 3.3721 LearningRate 0.000643 Epoch: 11 Global Step: 230690 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:09:38,949-Speed 2498.54 samples/sec Loss 3.3139 LearningRate 0.000643 Epoch: 11 Global Step: 230700 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:09:47,099-Speed 2513.23 samples/sec Loss 3.3383 LearningRate 0.000643 Epoch: 11 Global Step: 230710 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:09:55,295-Speed 2499.33 samples/sec Loss 3.3269 LearningRate 0.000643 Epoch: 11 Global Step: 230720 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:10:03,490-Speed 2499.46 samples/sec Loss 3.2822 LearningRate 0.000643 Epoch: 11 Global Step: 230730 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:10:11,688-Speed 2498.65 samples/sec Loss 3.3548 LearningRate 0.000643 Epoch: 11 Global Step: 230740 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:10:19,883-Speed 2499.50 samples/sec Loss 3.4195 LearningRate 0.000643 Epoch: 11 Global Step: 230750 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:10:28,078-Speed 2499.46 samples/sec Loss 3.3945 LearningRate 0.000643 Epoch: 11 Global Step: 230760 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:10:36,221-Speed 2515.44 samples/sec Loss 3.3504 LearningRate 0.000643 Epoch: 11 Global Step: 230770 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:10:44,416-Speed 2499.39 samples/sec Loss 3.3243 LearningRate 0.000643 Epoch: 11 Global Step: 230780 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:10:52,616-Speed 2498.01 samples/sec Loss 3.2980 LearningRate 0.000643 Epoch: 11 Global Step: 230790 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:11:00,813-Speed 2498.71 samples/sec Loss 3.3208 LearningRate 0.000643 Epoch: 11 Global Step: 230800 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:11:09,011-Speed 2498.85 samples/sec Loss 3.3860 LearningRate 0.000643 Epoch: 11 Global Step: 230810 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:11:17,208-Speed 2498.83 samples/sec Loss 3.3079 LearningRate 0.000643 Epoch: 11 Global Step: 230820 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:11:25,356-Speed 2513.82 samples/sec Loss 3.3832 LearningRate 0.000643 Epoch: 11 Global Step: 230830 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:11:33,550-Speed 2499.71 samples/sec Loss 3.3310 LearningRate 0.000643 Epoch: 11 Global Step: 230840 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:11:41,746-Speed 2499.13 samples/sec Loss 3.3391 LearningRate 0.000643 Epoch: 11 Global Step: 230850 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:11:49,943-Speed 2498.92 samples/sec Loss 3.3977 LearningRate 0.000643 Epoch: 11 Global Step: 230860 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:11:58,140-Speed 2498.76 samples/sec Loss 3.3238 LearningRate 0.000643 Epoch: 11 Global Step: 230870 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:12:06,346-Speed 2496.46 samples/sec Loss 3.3684 LearningRate 0.000643 Epoch: 11 Global Step: 230880 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:12:14,484-Speed 2517.36 samples/sec Loss 3.3496 LearningRate 0.000643 Epoch: 11 Global Step: 230890 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:12:22,686-Speed 2497.26 samples/sec Loss 3.2675 LearningRate 0.000643 Epoch: 11 Global Step: 230900 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:12:30,884-Speed 2498.44 samples/sec Loss 3.3274 LearningRate 0.000643 Epoch: 11 Global Step: 230910 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:12:39,078-Speed 2499.85 samples/sec Loss 3.2819 LearningRate 0.000643 Epoch: 11 Global Step: 230920 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:12:47,276-Speed 2498.43 samples/sec Loss 3.3742 LearningRate 0.000643 Epoch: 11 Global Step: 230930 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:12:55,473-Speed 2498.63 samples/sec Loss 3.3822 LearningRate 0.000643 Epoch: 11 Global Step: 230940 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:13:03,628-Speed 2511.79 samples/sec Loss 3.3702 LearningRate 0.000643 Epoch: 11 Global Step: 230950 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:13:11,830-Speed 2497.62 samples/sec Loss 3.3453 LearningRate 0.000643 Epoch: 11 Global Step: 230960 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:13:20,026-Speed 2499.06 samples/sec Loss 3.3437 LearningRate 0.000643 Epoch: 11 Global Step: 230970 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:13:28,225-Speed 2498.24 samples/sec Loss 3.3698 LearningRate 0.000643 Epoch: 11 Global Step: 230980 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:13:36,433-Speed 2495.51 samples/sec Loss 3.3637 LearningRate 0.000643 Epoch: 11 Global Step: 230990 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:13:44,632-Speed 2498.33 samples/sec Loss 3.3571 LearningRate 0.000643 Epoch: 11 Global Step: 231000 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:13:52,780-Speed 2514.05 samples/sec Loss 3.3418 LearningRate 0.000643 Epoch: 11 Global Step: 231010 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:14:00,980-Speed 2497.86 samples/sec Loss 3.3254 LearningRate 0.000643 Epoch: 11 Global Step: 231020 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:14:09,180-Speed 2497.99 samples/sec Loss 3.2757 LearningRate 0.000643 Epoch: 11 Global Step: 231030 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:14:17,380-Speed 2497.93 samples/sec Loss 3.3425 LearningRate 0.000643 Epoch: 11 Global Step: 231040 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:14:25,577-Speed 2498.72 samples/sec Loss 3.3533 LearningRate 0.000643 Epoch: 11 Global Step: 231050 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:14:33,776-Speed 2498.53 samples/sec Loss 3.3362 LearningRate 0.000643 Epoch: 11 Global Step: 231060 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:14:41,918-Speed 2515.50 samples/sec Loss 3.3216 LearningRate 0.000643 Epoch: 11 Global Step: 231070 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:14:50,113-Speed 2499.46 samples/sec Loss 3.3735 LearningRate 0.000643 Epoch: 11 Global Step: 231080 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:14:58,311-Speed 2498.77 samples/sec Loss 3.3284 LearningRate 0.000643 Epoch: 11 Global Step: 231090 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:15:06,511-Speed 2498.07 samples/sec Loss 3.2892 LearningRate 0.000643 Epoch: 11 Global Step: 231100 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:15:14,720-Speed 2495.61 samples/sec Loss 3.2733 LearningRate 0.000643 Epoch: 11 Global Step: 231110 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:15:22,916-Speed 2499.13 samples/sec Loss 3.3036 LearningRate 0.000642 Epoch: 11 Global Step: 231120 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:15:31,057-Speed 2516.04 samples/sec Loss 3.2901 LearningRate 0.000642 Epoch: 11 Global Step: 231130 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:15:39,251-Speed 2499.60 samples/sec Loss 3.3321 LearningRate 0.000642 Epoch: 11 Global Step: 231140 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:15:47,448-Speed 2498.81 samples/sec Loss 3.3501 LearningRate 0.000642 Epoch: 11 Global Step: 231150 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:15:55,642-Speed 2499.86 samples/sec Loss 3.3135 LearningRate 0.000642 Epoch: 11 Global Step: 231160 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:16:03,838-Speed 2499.33 samples/sec Loss 3.3171 LearningRate 0.000642 Epoch: 11 Global Step: 231170 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:16:12,034-Speed 2499.13 samples/sec Loss 3.3299 LearningRate 0.000642 Epoch: 11 Global Step: 231180 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:16:20,176-Speed 2515.78 samples/sec Loss 3.2794 LearningRate 0.000642 Epoch: 11 Global Step: 231190 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:16:28,375-Speed 2498.16 samples/sec Loss 3.3122 LearningRate 0.000642 Epoch: 11 Global Step: 231200 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:16:36,571-Speed 2499.07 samples/sec Loss 3.2857 LearningRate 0.000642 Epoch: 11 Global Step: 231210 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:16:44,770-Speed 2498.62 samples/sec Loss 3.3069 LearningRate 0.000642 Epoch: 11 Global Step: 231220 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:16:52,967-Speed 2499.08 samples/sec Loss 3.3601 LearningRate 0.000642 Epoch: 11 Global Step: 231230 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:17:01,163-Speed 2498.91 samples/sec Loss 3.2833 LearningRate 0.000642 Epoch: 11 Global Step: 231240 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:17:09,309-Speed 2514.82 samples/sec Loss 3.3222 LearningRate 0.000642 Epoch: 11 Global Step: 231250 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:17:17,508-Speed 2498.01 samples/sec Loss 3.4023 LearningRate 0.000642 Epoch: 11 Global Step: 231260 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:17:25,721-Speed 2494.06 samples/sec Loss 3.3572 LearningRate 0.000642 Epoch: 11 Global Step: 231270 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:17:33,920-Speed 2498.36 samples/sec Loss 3.3190 LearningRate 0.000642 Epoch: 11 Global Step: 231280 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:17:42,130-Speed 2494.77 samples/sec Loss 3.3987 LearningRate 0.000642 Epoch: 11 Global Step: 231290 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:17:50,329-Speed 2498.21 samples/sec Loss 3.3505 LearningRate 0.000642 Epoch: 11 Global Step: 231300 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:17:58,474-Speed 2514.95 samples/sec Loss 3.3698 LearningRate 0.000642 Epoch: 11 Global Step: 231310 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:18:06,669-Speed 2499.52 samples/sec Loss 3.2275 LearningRate 0.000642 Epoch: 11 Global Step: 231320 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:18:14,869-Speed 2497.90 samples/sec Loss 3.2884 LearningRate 0.000642 Epoch: 11 Global Step: 231330 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:18:23,067-Speed 2498.77 samples/sec Loss 3.2880 LearningRate 0.000642 Epoch: 11 Global Step: 231340 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:18:31,262-Speed 2499.51 samples/sec Loss 3.2952 LearningRate 0.000642 Epoch: 11 Global Step: 231350 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:18:39,457-Speed 2499.47 samples/sec Loss 3.3462 LearningRate 0.000642 Epoch: 11 Global Step: 231360 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:18:47,600-Speed 2515.49 samples/sec Loss 3.3697 LearningRate 0.000642 Epoch: 11 Global Step: 231370 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:18:55,795-Speed 2499.38 samples/sec Loss 3.3619 LearningRate 0.000642 Epoch: 11 Global Step: 231380 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:19:03,993-Speed 2498.68 samples/sec Loss 3.3371 LearningRate 0.000642 Epoch: 11 Global Step: 231390 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:19:12,186-Speed 2499.88 samples/sec Loss 3.3605 LearningRate 0.000642 Epoch: 11 Global Step: 231400 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:19:20,385-Speed 2498.18 samples/sec Loss 3.3157 LearningRate 0.000642 Epoch: 11 Global Step: 231410 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:19:28,585-Speed 2497.94 samples/sec Loss 3.3158 LearningRate 0.000642 Epoch: 11 Global Step: 231420 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:19:36,745-Speed 2510.48 samples/sec Loss 3.3236 LearningRate 0.000642 Epoch: 11 Global Step: 231430 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:19:44,945-Speed 2497.70 samples/sec Loss 3.2765 LearningRate 0.000642 Epoch: 11 Global Step: 231440 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:19:53,146-Speed 2497.78 samples/sec Loss 3.3233 LearningRate 0.000642 Epoch: 11 Global Step: 231450 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:20:01,344-Speed 2498.46 samples/sec Loss 3.3218 LearningRate 0.000642 Epoch: 11 Global Step: 231460 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:20:09,546-Speed 2497.68 samples/sec Loss 3.3421 LearningRate 0.000642 Epoch: 11 Global Step: 231470 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:20:17,746-Speed 2497.62 samples/sec Loss 3.2926 LearningRate 0.000642 Epoch: 11 Global Step: 231480 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:20:25,893-Speed 2514.67 samples/sec Loss 3.3333 LearningRate 0.000642 Epoch: 11 Global Step: 231490 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:20:34,096-Speed 2497.12 samples/sec Loss 3.2668 LearningRate 0.000642 Epoch: 11 Global Step: 231500 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:20:42,296-Speed 2498.30 samples/sec Loss 3.3647 LearningRate 0.000642 Epoch: 11 Global Step: 231510 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:20:50,495-Speed 2498.11 samples/sec Loss 3.2939 LearningRate 0.000642 Epoch: 11 Global Step: 231520 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:20:58,699-Speed 2496.56 samples/sec Loss 3.2608 LearningRate 0.000642 Epoch: 11 Global Step: 231530 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:21:06,903-Speed 2497.08 samples/sec Loss 3.3110 LearningRate 0.000642 Epoch: 11 Global Step: 231540 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:21:15,050-Speed 2514.19 samples/sec Loss 3.2770 LearningRate 0.000642 Epoch: 11 Global Step: 231550 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:21:23,252-Speed 2496.92 samples/sec Loss 3.3286 LearningRate 0.000642 Epoch: 11 Global Step: 231560 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:21:31,455-Speed 2497.05 samples/sec Loss 3.3865 LearningRate 0.000642 Epoch: 11 Global Step: 231570 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:21:39,658-Speed 2497.02 samples/sec Loss 3.3283 LearningRate 0.000641 Epoch: 11 Global Step: 231580 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:21:47,857-Speed 2498.36 samples/sec Loss 3.2725 LearningRate 0.000641 Epoch: 11 Global Step: 231590 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:21:56,055-Speed 2498.45 samples/sec Loss 3.2764 LearningRate 0.000641 Epoch: 11 Global Step: 231600 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:22:04,201-Speed 2514.54 samples/sec Loss 3.3398 LearningRate 0.000641 Epoch: 11 Global Step: 231610 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:22:12,420-Speed 2492.31 samples/sec Loss 3.3148 LearningRate 0.000641 Epoch: 11 Global Step: 231620 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:22:20,631-Speed 2494.54 samples/sec Loss 3.3072 LearningRate 0.000641 Epoch: 11 Global Step: 231630 Fp16 Grad Scale: 32768 Required: 137 hours
Training: 2022-07-07 19:22:28,790-Speed 2510.68 samples/sec Loss 3.3067 LearningRate 0.000641 Epoch: 11 Global Step: 231640 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:22:36,989-Speed 2498.17 samples/sec Loss 3.3307 LearningRate 0.000641 Epoch: 11 Global Step: 231650 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:22:45,187-Speed 2498.68 samples/sec Loss 3.3098 LearningRate 0.000641 Epoch: 11 Global Step: 231660 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:22:53,343-Speed 2511.22 samples/sec Loss 3.3976 LearningRate 0.000641 Epoch: 11 Global Step: 231670 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:23:01,542-Speed 2498.45 samples/sec Loss 3.3604 LearningRate 0.000641 Epoch: 11 Global Step: 231680 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:23:09,742-Speed 2498.07 samples/sec Loss 3.3624 LearningRate 0.000641 Epoch: 11 Global Step: 231690 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:23:17,941-Speed 2498.22 samples/sec Loss 3.3484 LearningRate 0.000641 Epoch: 11 Global Step: 231700 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:23:26,137-Speed 2499.01 samples/sec Loss 3.3802 LearningRate 0.000641 Epoch: 11 Global Step: 231710 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:23:34,336-Speed 2498.29 samples/sec Loss 3.4013 LearningRate 0.000641 Epoch: 11 Global Step: 231720 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:23:42,493-Speed 2511.01 samples/sec Loss 3.3191 LearningRate 0.000641 Epoch: 11 Global Step: 231730 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:23:50,691-Speed 2498.83 samples/sec Loss 3.3198 LearningRate 0.000641 Epoch: 11 Global Step: 231740 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:23:58,892-Speed 2497.41 samples/sec Loss 3.3429 LearningRate 0.000641 Epoch: 11 Global Step: 231750 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:24:07,088-Speed 2499.49 samples/sec Loss 3.2975 LearningRate 0.000641 Epoch: 11 Global Step: 231760 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:24:15,286-Speed 2498.51 samples/sec Loss 3.3528 LearningRate 0.000641 Epoch: 11 Global Step: 231770 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:24:23,482-Speed 2498.94 samples/sec Loss 3.3255 LearningRate 0.000641 Epoch: 11 Global Step: 231780 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:24:31,628-Speed 2514.78 samples/sec Loss 3.2900 LearningRate 0.000641 Epoch: 11 Global Step: 231790 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:24:39,825-Speed 2498.71 samples/sec Loss 3.3840 LearningRate 0.000641 Epoch: 11 Global Step: 231800 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:24:48,024-Speed 2498.32 samples/sec Loss 3.3441 LearningRate 0.000641 Epoch: 11 Global Step: 231810 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:24:56,222-Speed 2498.59 samples/sec Loss 3.3390 LearningRate 0.000641 Epoch: 11 Global Step: 231820 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:25:04,421-Speed 2498.26 samples/sec Loss 3.3359 LearningRate 0.000641 Epoch: 11 Global Step: 231830 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:25:12,619-Speed 2498.47 samples/sec Loss 3.2694 LearningRate 0.000641 Epoch: 11 Global Step: 231840 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:25:20,762-Speed 2515.43 samples/sec Loss 3.2949 LearningRate 0.000641 Epoch: 11 Global Step: 231850 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:25:28,960-Speed 2498.72 samples/sec Loss 3.2733 LearningRate 0.000641 Epoch: 11 Global Step: 231860 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:25:37,163-Speed 2496.92 samples/sec Loss 3.3475 LearningRate 0.000641 Epoch: 11 Global Step: 231870 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:25:45,361-Speed 2498.52 samples/sec Loss 3.2960 LearningRate 0.000641 Epoch: 11 Global Step: 231880 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:25:53,565-Speed 2499.50 samples/sec Loss 3.2514 LearningRate 0.000641 Epoch: 11 Global Step: 231890 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:26:01,767-Speed 2497.19 samples/sec Loss 3.2678 LearningRate 0.000641 Epoch: 11 Global Step: 231900 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:26:09,912-Speed 2514.88 samples/sec Loss 3.3087 LearningRate 0.000641 Epoch: 11 Global Step: 231910 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:26:18,110-Speed 2498.58 samples/sec Loss 3.2763 LearningRate 0.000641 Epoch: 11 Global Step: 231920 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:26:26,312-Speed 2497.41 samples/sec Loss 3.3378 LearningRate 0.000641 Epoch: 11 Global Step: 231930 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:26:34,519-Speed 2495.80 samples/sec Loss 3.3587 LearningRate 0.000641 Epoch: 11 Global Step: 231940 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:26:42,719-Speed 2497.93 samples/sec Loss 3.2909 LearningRate 0.000641 Epoch: 11 Global Step: 231950 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:26:50,917-Speed 2498.60 samples/sec Loss 3.2822 LearningRate 0.000641 Epoch: 11 Global Step: 231960 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:26:59,064-Speed 2514.03 samples/sec Loss 3.3067 LearningRate 0.000641 Epoch: 11 Global Step: 231970 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:27:07,265-Speed 2497.68 samples/sec Loss 3.3510 LearningRate 0.000641 Epoch: 11 Global Step: 231980 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:27:15,463-Speed 2498.67 samples/sec Loss 3.3642 LearningRate 0.000641 Epoch: 11 Global Step: 231990 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:27:23,661-Speed 2498.49 samples/sec Loss 3.3375 LearningRate 0.000641 Epoch: 11 Global Step: 232000 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:27:31,861-Speed 2497.90 samples/sec Loss 3.3423 LearningRate 0.000641 Epoch: 11 Global Step: 232010 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:27:40,058-Speed 2499.01 samples/sec Loss 3.2788 LearningRate 0.000641 Epoch: 11 Global Step: 232020 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:27:48,200-Speed 2515.56 samples/sec Loss 3.2916 LearningRate 0.000641 Epoch: 11 Global Step: 232030 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:27:56,409-Speed 2495.22 samples/sec Loss 3.3786 LearningRate 0.000641 Epoch: 11 Global Step: 232040 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:28:04,609-Speed 2498.08 samples/sec Loss 3.3870 LearningRate 0.000640 Epoch: 11 Global Step: 232050 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:28:12,821-Speed 2494.14 samples/sec Loss 3.3629 LearningRate 0.000640 Epoch: 11 Global Step: 232060 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:28:21,016-Speed 2499.31 samples/sec Loss 3.3231 LearningRate 0.000640 Epoch: 11 Global Step: 232070 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:28:29,214-Speed 2498.65 samples/sec Loss 3.2941 LearningRate 0.000640 Epoch: 11 Global Step: 232080 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:28:37,358-Speed 2515.52 samples/sec Loss 3.3402 LearningRate 0.000640 Epoch: 11 Global Step: 232090 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:28:45,557-Speed 2498.23 samples/sec Loss 3.3217 LearningRate 0.000640 Epoch: 11 Global Step: 232100 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:28:53,754-Speed 2498.74 samples/sec Loss 3.3602 LearningRate 0.000640 Epoch: 11 Global Step: 232110 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:29:01,955-Speed 2498.00 samples/sec Loss 3.3471 LearningRate 0.000640 Epoch: 11 Global Step: 232120 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:29:10,156-Speed 2497.64 samples/sec Loss 3.4194 LearningRate 0.000640 Epoch: 11 Global Step: 232130 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:29:18,359-Speed 2497.21 samples/sec Loss 3.3505 LearningRate 0.000640 Epoch: 11 Global Step: 232140 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:29:26,505-Speed 2514.59 samples/sec Loss 3.4059 LearningRate 0.000640 Epoch: 11 Global Step: 232150 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:29:34,722-Speed 2493.07 samples/sec Loss 3.4237 LearningRate 0.000640 Epoch: 11 Global Step: 232160 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:29:42,919-Speed 2498.61 samples/sec Loss 3.3217 LearningRate 0.000640 Epoch: 11 Global Step: 232170 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:29:51,117-Speed 2498.67 samples/sec Loss 3.3558 LearningRate 0.000640 Epoch: 11 Global Step: 232180 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:29:59,314-Speed 2499.06 samples/sec Loss 3.3608 LearningRate 0.000640 Epoch: 11 Global Step: 232190 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:30:07,510-Speed 2499.27 samples/sec Loss 3.3508 LearningRate 0.000640 Epoch: 11 Global Step: 232200 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:30:15,650-Speed 2516.11 samples/sec Loss 3.3036 LearningRate 0.000640 Epoch: 11 Global Step: 232210 Fp16 Grad Scale: 16384 Required: 137 hours
Training: 2022-07-07 19:30:23,846-Speed 2499.20 samples/sec Loss 3.3527 LearningRate 0.000640 Epoch: 11 Global Step: 232220 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:30:32,047-Speed 2497.57 samples/sec Loss 3.3745 LearningRate 0.000640 Epoch: 11 Global Step: 232230 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:30:40,244-Speed 2498.91 samples/sec Loss 3.3542 LearningRate 0.000640 Epoch: 11 Global Step: 232240 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:30:48,441-Speed 2498.76 samples/sec Loss 3.3258 LearningRate 0.000640 Epoch: 11 Global Step: 232250 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:30:56,637-Speed 2499.50 samples/sec Loss 3.3248 LearningRate 0.000640 Epoch: 11 Global Step: 232260 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:31:04,782-Speed 2514.86 samples/sec Loss 3.3158 LearningRate 0.000640 Epoch: 11 Global Step: 232270 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:31:12,978-Speed 2499.22 samples/sec Loss 3.3098 LearningRate 0.000640 Epoch: 11 Global Step: 232280 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:31:21,172-Speed 2499.53 samples/sec Loss 3.3511 LearningRate 0.000640 Epoch: 11 Global Step: 232290 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:31:29,368-Speed 2499.47 samples/sec Loss 3.2943 LearningRate 0.000640 Epoch: 11 Global Step: 232300 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:31:37,566-Speed 2498.44 samples/sec Loss 3.2977 LearningRate 0.000640 Epoch: 11 Global Step: 232310 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:31:45,766-Speed 2497.97 samples/sec Loss 3.3181 LearningRate 0.000640 Epoch: 11 Global Step: 232320 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:31:53,924-Speed 2510.56 samples/sec Loss 3.3825 LearningRate 0.000640 Epoch: 11 Global Step: 232330 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:32:02,128-Speed 2496.85 samples/sec Loss 3.2867 LearningRate 0.000640 Epoch: 11 Global Step: 232340 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:32:10,325-Speed 2499.08 samples/sec Loss 3.3546 LearningRate 0.000640 Epoch: 11 Global Step: 232350 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:32:18,526-Speed 2497.61 samples/sec Loss 3.3407 LearningRate 0.000640 Epoch: 11 Global Step: 232360 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:32:26,720-Speed 2499.56 samples/sec Loss 3.3531 LearningRate 0.000640 Epoch: 11 Global Step: 232370 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:32:34,915-Speed 2499.42 samples/sec Loss 3.2329 LearningRate 0.000640 Epoch: 11 Global Step: 232380 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:32:43,059-Speed 2515.19 samples/sec Loss 3.3681 LearningRate 0.000640 Epoch: 11 Global Step: 232390 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:32:51,253-Speed 2499.68 samples/sec Loss 3.3162 LearningRate 0.000640 Epoch: 11 Global Step: 232400 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:32:59,450-Speed 2499.09 samples/sec Loss 3.3022 LearningRate 0.000640 Epoch: 11 Global Step: 232410 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:33:07,647-Speed 2498.75 samples/sec Loss 3.3157 LearningRate 0.000640 Epoch: 11 Global Step: 232420 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:33:15,842-Speed 2499.60 samples/sec Loss 3.3467 LearningRate 0.000640 Epoch: 11 Global Step: 232430 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:33:24,036-Speed 2500.04 samples/sec Loss 3.2484 LearningRate 0.000640 Epoch: 11 Global Step: 232440 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:33:32,179-Speed 2515.25 samples/sec Loss 3.3133 LearningRate 0.000640 Epoch: 11 Global Step: 232450 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:33:40,374-Speed 2499.54 samples/sec Loss 3.2799 LearningRate 0.000640 Epoch: 11 Global Step: 232460 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:33:48,580-Speed 2496.35 samples/sec Loss 3.3469 LearningRate 0.000640 Epoch: 11 Global Step: 232470 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:33:56,775-Speed 2499.24 samples/sec Loss 3.3323 LearningRate 0.000640 Epoch: 11 Global Step: 232480 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:34:04,981-Speed 2496.09 samples/sec Loss 3.3035 LearningRate 0.000640 Epoch: 11 Global Step: 232490 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:34:13,177-Speed 2499.39 samples/sec Loss 3.3432 LearningRate 0.000640 Epoch: 11 Global Step: 232500 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:34:21,318-Speed 2516.06 samples/sec Loss 3.3578 LearningRate 0.000640 Epoch: 11 Global Step: 232510 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:34:29,513-Speed 2499.38 samples/sec Loss 3.3144 LearningRate 0.000639 Epoch: 11 Global Step: 232520 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:34:37,709-Speed 2499.07 samples/sec Loss 3.2675 LearningRate 0.000639 Epoch: 11 Global Step: 232530 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:34:45,903-Speed 2500.22 samples/sec Loss 3.3134 LearningRate 0.000639 Epoch: 11 Global Step: 232540 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:34:54,100-Speed 2498.73 samples/sec Loss 3.2746 LearningRate 0.000639 Epoch: 11 Global Step: 232550 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:35:02,296-Speed 2499.12 samples/sec Loss 3.4009 LearningRate 0.000639 Epoch: 11 Global Step: 232560 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:35:10,438-Speed 2515.90 samples/sec Loss 3.3197 LearningRate 0.000639 Epoch: 11 Global Step: 232570 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:35:18,637-Speed 2498.56 samples/sec Loss 3.3342 LearningRate 0.000639 Epoch: 11 Global Step: 232580 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:35:26,852-Speed 2493.42 samples/sec Loss 3.2766 LearningRate 0.000639 Epoch: 11 Global Step: 232590 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:35:35,050-Speed 2498.52 samples/sec Loss 3.2793 LearningRate 0.000639 Epoch: 11 Global Step: 232600 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:35:43,247-Speed 2499.00 samples/sec Loss 3.3000 LearningRate 0.000639 Epoch: 11 Global Step: 232610 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:35:51,452-Speed 2496.35 samples/sec Loss 3.2713 LearningRate 0.000639 Epoch: 11 Global Step: 232620 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:35:59,591-Speed 2516.58 samples/sec Loss 3.2911 LearningRate 0.000639 Epoch: 11 Global Step: 232630 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:36:07,788-Speed 2498.75 samples/sec Loss 3.3614 LearningRate 0.000639 Epoch: 11 Global Step: 232640 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:36:15,984-Speed 2499.37 samples/sec Loss 3.4443 LearningRate 0.000639 Epoch: 11 Global Step: 232650 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:36:24,185-Speed 2497.56 samples/sec Loss 3.3480 LearningRate 0.000639 Epoch: 11 Global Step: 232660 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:36:32,386-Speed 2497.75 samples/sec Loss 3.3696 LearningRate 0.000639 Epoch: 11 Global Step: 232670 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:36:40,582-Speed 2499.33 samples/sec Loss 3.4100 LearningRate 0.000639 Epoch: 11 Global Step: 232680 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:36:48,724-Speed 2515.60 samples/sec Loss 3.3519 LearningRate 0.000639 Epoch: 11 Global Step: 232690 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:36:56,916-Speed 2500.36 samples/sec Loss 3.3425 LearningRate 0.000639 Epoch: 11 Global Step: 232700 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:37:05,110-Speed 2499.58 samples/sec Loss 3.3517 LearningRate 0.000639 Epoch: 11 Global Step: 232710 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:37:13,305-Speed 2499.73 samples/sec Loss 3.3372 LearningRate 0.000639 Epoch: 11 Global Step: 232720 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:37:21,503-Speed 2498.68 samples/sec Loss 3.3438 LearningRate 0.000639 Epoch: 11 Global Step: 232730 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:37:29,702-Speed 2498.00 samples/sec Loss 3.3688 LearningRate 0.000639 Epoch: 11 Global Step: 232740 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:37:37,846-Speed 2515.23 samples/sec Loss 3.3195 LearningRate 0.000639 Epoch: 11 Global Step: 232750 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:37:46,063-Speed 2492.77 samples/sec Loss 3.3004 LearningRate 0.000639 Epoch: 11 Global Step: 232760 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:37:54,261-Speed 2498.94 samples/sec Loss 3.2807 LearningRate 0.000639 Epoch: 11 Global Step: 232770 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:38:02,456-Speed 2499.29 samples/sec Loss 3.2870 LearningRate 0.000639 Epoch: 11 Global Step: 232780 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:38:10,654-Speed 2498.98 samples/sec Loss 3.3399 LearningRate 0.000639 Epoch: 11 Global Step: 232790 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:38:18,847-Speed 2500.18 samples/sec Loss 3.3460 LearningRate 0.000639 Epoch: 11 Global Step: 232800 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:38:26,990-Speed 2515.48 samples/sec Loss 3.3418 LearningRate 0.000639 Epoch: 11 Global Step: 232810 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:38:35,193-Speed 2497.15 samples/sec Loss 3.2500 LearningRate 0.000639 Epoch: 11 Global Step: 232820 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:38:43,392-Speed 2498.30 samples/sec Loss 3.2902 LearningRate 0.000639 Epoch: 11 Global Step: 232830 Fp16 Grad Scale: 16384 Required: 136 hours
Training: 2022-07-07 19:38:51,590-Speed 2498.56 samples/sec Loss 3.3356 LearningRate 0.000639 Epoch: 11 Global Step: 232840 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:38:59,789-Speed 2498.32 samples/sec Loss 3.2640 LearningRate 0.000639 Epoch: 11 Global Step: 232850 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:39:07,989-Speed 2497.95 samples/sec Loss 3.3184 LearningRate 0.000639 Epoch: 11 Global Step: 232860 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:39:16,135-Speed 2514.36 samples/sec Loss 3.4029 LearningRate 0.000639 Epoch: 11 Global Step: 232870 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:39:24,336-Speed 2497.74 samples/sec Loss 3.3107 LearningRate 0.000639 Epoch: 11 Global Step: 232880 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:39:32,535-Speed 2498.09 samples/sec Loss 3.3145 LearningRate 0.000639 Epoch: 11 Global Step: 232890 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:39:40,733-Speed 2498.51 samples/sec Loss 3.4379 LearningRate 0.000639 Epoch: 11 Global Step: 232900 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:39:48,929-Speed 2499.10 samples/sec Loss 3.3115 LearningRate 0.000639 Epoch: 11 Global Step: 232910 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:39:57,127-Speed 2498.79 samples/sec Loss 3.3188 LearningRate 0.000639 Epoch: 11 Global Step: 232920 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:40:05,275-Speed 2513.94 samples/sec Loss 3.3681 LearningRate 0.000639 Epoch: 11 Global Step: 232930 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:40:13,473-Speed 2498.52 samples/sec Loss 3.2981 LearningRate 0.000639 Epoch: 11 Global Step: 232940 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:40:21,674-Speed 2497.97 samples/sec Loss 3.3533 LearningRate 0.000639 Epoch: 11 Global Step: 232950 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:40:29,868-Speed 2499.48 samples/sec Loss 3.3362 LearningRate 0.000639 Epoch: 11 Global Step: 232960 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:40:38,065-Speed 2499.06 samples/sec Loss 3.3161 LearningRate 0.000639 Epoch: 11 Global Step: 232970 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:40:46,260-Speed 2499.38 samples/sec Loss 3.2430 LearningRate 0.000638 Epoch: 11 Global Step: 232980 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:40:54,400-Speed 2516.27 samples/sec Loss 3.3044 LearningRate 0.000638 Epoch: 11 Global Step: 232990 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:41:02,604-Speed 2496.97 samples/sec Loss 3.3437 LearningRate 0.000638 Epoch: 11 Global Step: 233000 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:41:10,813-Speed 2495.08 samples/sec Loss 3.3429 LearningRate 0.000638 Epoch: 11 Global Step: 233010 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:41:19,008-Speed 2499.56 samples/sec Loss 3.3388 LearningRate 0.000638 Epoch: 11 Global Step: 233020 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:41:27,206-Speed 2498.60 samples/sec Loss 3.3738 LearningRate 0.000638 Epoch: 11 Global Step: 233030 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:41:35,405-Speed 2498.36 samples/sec Loss 3.3213 LearningRate 0.000638 Epoch: 11 Global Step: 233040 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:41:43,548-Speed 2515.37 samples/sec Loss 3.3154 LearningRate 0.000638 Epoch: 11 Global Step: 233050 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:41:51,752-Speed 2497.04 samples/sec Loss 3.3928 LearningRate 0.000638 Epoch: 11 Global Step: 233060 Fp16 Grad Scale: 32768 Required: 136 hours
Training: 2022-07-07 19:41:59,948-Speed 2499.28 samples/sec Loss
3.2693 LearningRate 0.000638 Epoch: 11 Global Step: 233070 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:42:08,146-Speed 2498.24 samples/sec Loss 3.2979 LearningRate 0.000638 Epoch: 11 Global Step: 233080 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:42:16,345-Speed 2498.24 samples/sec Loss 3.3397 LearningRate 0.000638 Epoch: 11 Global Step: 233090 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:42:24,544-Speed 2498.28 samples/sec Loss 3.2566 LearningRate 0.000638 Epoch: 11 Global Step: 233100 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:42:32,693-Speed 2513.76 samples/sec Loss 3.2984 LearningRate 0.000638 Epoch: 11 Global Step: 233110 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:42:40,899-Speed 2496.10 samples/sec Loss 3.2800 LearningRate 0.000638 Epoch: 11 Global Step: 233120 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:42:49,105-Speed 2496.22 samples/sec Loss 3.2950 LearningRate 0.000638 Epoch: 11 Global Step: 233130 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:42:57,306-Speed 2497.76 samples/sec Loss 3.2660 LearningRate 0.000638 Epoch: 11 Global Step: 233140 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:43:05,505-Speed 2498.07 samples/sec Loss 3.2025 LearningRate 0.000638 Epoch: 11 Global Step: 233150 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:43:13,706-Speed 2497.68 samples/sec Loss 3.2273 LearningRate 0.000638 Epoch: 11 Global Step: 233160 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:43:21,850-Speed 2514.99 samples/sec Loss 3.2439 LearningRate 0.000638 Epoch: 11 Global Step: 233170 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:43:30,050-Speed 2498.27 samples/sec Loss 3.1744 LearningRate 0.000638 Epoch: 11 Global Step: 233180 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:43:38,250-Speed 2497.83 samples/sec 
Loss 3.2326 LearningRate 0.000638 Epoch: 11 Global Step: 233190 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:43:46,456-Speed 2496.17 samples/sec Loss 3.2501 LearningRate 0.000638 Epoch: 11 Global Step: 233200 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:43:54,659-Speed 2497.15 samples/sec Loss 3.3663 LearningRate 0.000638 Epoch: 11 Global Step: 233210 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:44:02,857-Speed 2498.71 samples/sec Loss 3.3002 LearningRate 0.000638 Epoch: 11 Global Step: 233220 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:44:11,007-Speed 2513.53 samples/sec Loss 3.2557 LearningRate 0.000638 Epoch: 11 Global Step: 233230 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:44:19,208-Speed 2497.80 samples/sec Loss 3.3456 LearningRate 0.000638 Epoch: 11 Global Step: 233240 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:44:27,404-Speed 2499.15 samples/sec Loss 3.3171 LearningRate 0.000638 Epoch: 11 Global Step: 233250 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:44:35,605-Speed 2497.91 samples/sec Loss 3.3318 LearningRate 0.000638 Epoch: 11 Global Step: 233260 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:44:43,821-Speed 2493.11 samples/sec Loss 3.2195 LearningRate 0.000638 Epoch: 11 Global Step: 233270 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:44:52,017-Speed 2499.24 samples/sec Loss 3.2307 LearningRate 0.000638 Epoch: 11 Global Step: 233280 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:45:00,161-Speed 2515.04 samples/sec Loss 3.2923 LearningRate 0.000638 Epoch: 11 Global Step: 233290 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:45:08,361-Speed 2497.95 samples/sec Loss 3.2884 LearningRate 0.000638 Epoch: 11 Global Step: 233300 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:45:16,575-Speed 2493.87 
samples/sec Loss 3.2856 LearningRate 0.000638 Epoch: 11 Global Step: 233310 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:45:24,773-Speed 2498.39 samples/sec Loss 3.2610 LearningRate 0.000638 Epoch: 11 Global Step: 233320 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:45:32,969-Speed 2499.29 samples/sec Loss 3.3912 LearningRate 0.000638 Epoch: 11 Global Step: 233330 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:45:41,170-Speed 2497.62 samples/sec Loss 3.3148 LearningRate 0.000638 Epoch: 11 Global Step: 233340 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:45:49,315-Speed 2515.00 samples/sec Loss 3.4330 LearningRate 0.000638 Epoch: 11 Global Step: 233350 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:45:57,515-Speed 2497.99 samples/sec Loss 3.4345 LearningRate 0.000638 Epoch: 11 Global Step: 233360 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:46:05,717-Speed 2497.09 samples/sec Loss 3.3452 LearningRate 0.000638 Epoch: 11 Global Step: 233370 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:46:13,918-Speed 2497.79 samples/sec Loss 3.3802 LearningRate 0.000638 Epoch: 11 Global Step: 233380 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:46:22,121-Speed 2497.02 samples/sec Loss 3.4164 LearningRate 0.000638 Epoch: 11 Global Step: 233390 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:46:30,319-Speed 2498.51 samples/sec Loss 3.3028 LearningRate 0.000638 Epoch: 11 Global Step: 233400 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:46:38,465-Speed 2514.62 samples/sec Loss 3.3797 LearningRate 0.000638 Epoch: 11 Global Step: 233410 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:46:46,663-Speed 2498.55 samples/sec Loss 3.3416 LearningRate 0.000638 Epoch: 11 Global Step: 233420 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:46:54,859-Speed 
2498.90 samples/sec Loss 3.3204 LearningRate 0.000638 Epoch: 11 Global Step: 233430 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:47:03,061-Speed 2497.43 samples/sec Loss 3.3395 LearningRate 0.000638 Epoch: 11 Global Step: 233440 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:47:11,271-Speed 2494.96 samples/sec Loss 3.3206 LearningRate 0.000637 Epoch: 11 Global Step: 233450 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:47:19,474-Speed 2496.94 samples/sec Loss 3.3128 LearningRate 0.000637 Epoch: 11 Global Step: 233460 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:47:27,620-Speed 2514.62 samples/sec Loss 3.3365 LearningRate 0.000637 Epoch: 11 Global Step: 233470 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:47:35,825-Speed 2496.40 samples/sec Loss 3.3387 LearningRate 0.000637 Epoch: 11 Global Step: 233480 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:47:44,022-Speed 2498.90 samples/sec Loss 3.3520 LearningRate 0.000637 Epoch: 11 Global Step: 233490 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:47:52,223-Speed 2497.76 samples/sec Loss 3.3285 LearningRate 0.000637 Epoch: 11 Global Step: 233500 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:48:00,422-Speed 2498.63 samples/sec Loss 3.3462 LearningRate 0.000637 Epoch: 11 Global Step: 233510 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:48:08,620-Speed 2498.15 samples/sec Loss 3.3054 LearningRate 0.000637 Epoch: 11 Global Step: 233520 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:48:16,771-Speed 2513.18 samples/sec Loss 3.2698 LearningRate 0.000637 Epoch: 11 Global Step: 233530 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:48:24,988-Speed 2494.13 samples/sec Loss 3.3124 LearningRate 0.000637 Epoch: 11 Global Step: 233540 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 
19:48:33,187-Speed 2498.25 samples/sec Loss 3.3265 LearningRate 0.000637 Epoch: 11 Global Step: 233550 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:48:41,382-Speed 2499.47 samples/sec Loss 3.2802 LearningRate 0.000637 Epoch: 11 Global Step: 233560 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:48:49,580-Speed 2498.51 samples/sec Loss 3.3281 LearningRate 0.000637 Epoch: 11 Global Step: 233570 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:48:57,780-Speed 2497.97 samples/sec Loss 3.3747 LearningRate 0.000637 Epoch: 11 Global Step: 233580 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:49:05,922-Speed 2515.76 samples/sec Loss 3.2603 LearningRate 0.000637 Epoch: 11 Global Step: 233590 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:49:14,125-Speed 2497.37 samples/sec Loss 3.3335 LearningRate 0.000637 Epoch: 11 Global Step: 233600 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:49:22,323-Speed 2498.66 samples/sec Loss 3.3277 LearningRate 0.000637 Epoch: 11 Global Step: 233610 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:49:30,519-Speed 2499.14 samples/sec Loss 3.2592 LearningRate 0.000637 Epoch: 11 Global Step: 233620 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:49:38,722-Speed 2497.07 samples/sec Loss 3.3041 LearningRate 0.000637 Epoch: 11 Global Step: 233630 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:49:46,921-Speed 2498.24 samples/sec Loss 3.3010 LearningRate 0.000637 Epoch: 11 Global Step: 233640 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:49:55,067-Speed 2514.58 samples/sec Loss 3.2761 LearningRate 0.000637 Epoch: 11 Global Step: 233650 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:50:03,266-Speed 2498.15 samples/sec Loss 3.2701 LearningRate 0.000637 Epoch: 11 Global Step: 233660 Fp16 Grad Scale: 32768 Required: 136 hours Training: 
2022-07-07 19:50:11,467-Speed 2497.73 samples/sec Loss 3.3302 LearningRate 0.000637 Epoch: 11 Global Step: 233670 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:50:19,679-Speed 2494.23 samples/sec Loss 3.2878 LearningRate 0.000637 Epoch: 11 Global Step: 233680 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:50:27,880-Speed 2497.77 samples/sec Loss 3.2866 LearningRate 0.000637 Epoch: 11 Global Step: 233690 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:50:36,081-Speed 2497.62 samples/sec Loss 3.2613 LearningRate 0.000637 Epoch: 11 Global Step: 233700 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:50:44,230-Speed 2513.65 samples/sec Loss 3.3337 LearningRate 0.000637 Epoch: 11 Global Step: 233710 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:50:52,429-Speed 2498.05 samples/sec Loss 3.2977 LearningRate 0.000637 Epoch: 11 Global Step: 233720 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:51:00,627-Speed 2498.56 samples/sec Loss 3.2944 LearningRate 0.000637 Epoch: 11 Global Step: 233730 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:51:08,823-Speed 2499.23 samples/sec Loss 3.2959 LearningRate 0.000637 Epoch: 11 Global Step: 233740 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:51:17,021-Speed 2498.53 samples/sec Loss 3.2719 LearningRate 0.000637 Epoch: 11 Global Step: 233750 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:51:25,219-Speed 2498.46 samples/sec Loss 3.2906 LearningRate 0.000637 Epoch: 11 Global Step: 233760 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:51:33,366-Speed 2514.26 samples/sec Loss 3.2910 LearningRate 0.000637 Epoch: 11 Global Step: 233770 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:51:41,576-Speed 2495.05 samples/sec Loss 3.3355 LearningRate 0.000637 Epoch: 11 Global Step: 233780 Fp16 Grad Scale: 32768 Required: 136 hours 
Training: 2022-07-07 19:51:49,776-Speed 2497.82 samples/sec Loss 3.3109 LearningRate 0.000637 Epoch: 11 Global Step: 233790 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:51:57,973-Speed 2498.79 samples/sec Loss 3.3045 LearningRate 0.000637 Epoch: 11 Global Step: 233800 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:52:06,175-Speed 2497.30 samples/sec Loss 3.4053 LearningRate 0.000637 Epoch: 11 Global Step: 233810 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:52:14,376-Speed 2497.92 samples/sec Loss 3.3842 LearningRate 0.000637 Epoch: 11 Global Step: 233820 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:52:22,521-Speed 2514.65 samples/sec Loss 3.3387 LearningRate 0.000637 Epoch: 11 Global Step: 233830 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:52:30,716-Speed 2499.53 samples/sec Loss 3.2850 LearningRate 0.000637 Epoch: 11 Global Step: 233840 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:52:38,919-Speed 2497.15 samples/sec Loss 3.2915 LearningRate 0.000637 Epoch: 11 Global Step: 233850 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:52:47,125-Speed 2495.87 samples/sec Loss 3.2705 LearningRate 0.000637 Epoch: 11 Global Step: 233860 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:52:55,331-Speed 2496.53 samples/sec Loss 3.3185 LearningRate 0.000637 Epoch: 11 Global Step: 233870 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:53:03,531-Speed 2497.91 samples/sec Loss 3.2694 LearningRate 0.000637 Epoch: 11 Global Step: 233880 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:53:11,683-Speed 2512.46 samples/sec Loss 3.2520 LearningRate 0.000637 Epoch: 11 Global Step: 233890 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:53:19,884-Speed 2497.82 samples/sec Loss 3.2848 LearningRate 0.000637 Epoch: 11 Global Step: 233900 Fp16 Grad Scale: 32768 Required: 136 
hours Training: 2022-07-07 19:53:28,085-Speed 2497.72 samples/sec Loss 3.3505 LearningRate 0.000637 Epoch: 11 Global Step: 233910 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:53:36,282-Speed 2498.75 samples/sec Loss 3.2926 LearningRate 0.000636 Epoch: 11 Global Step: 233920 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:53:44,484-Speed 2497.30 samples/sec Loss 3.2743 LearningRate 0.000636 Epoch: 11 Global Step: 233930 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:53:52,684-Speed 2497.88 samples/sec Loss 3.2918 LearningRate 0.000636 Epoch: 11 Global Step: 233940 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:54:00,826-Speed 2515.97 samples/sec Loss 3.2569 LearningRate 0.000636 Epoch: 11 Global Step: 233950 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:54:09,039-Speed 2493.87 samples/sec Loss 3.2090 LearningRate 0.000636 Epoch: 11 Global Step: 233960 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:54:17,238-Speed 2498.42 samples/sec Loss 3.3182 LearningRate 0.000636 Epoch: 11 Global Step: 233970 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:54:25,435-Speed 2498.87 samples/sec Loss 3.3161 LearningRate 0.000636 Epoch: 11 Global Step: 233980 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:54:33,631-Speed 2499.09 samples/sec Loss 3.3440 LearningRate 0.000636 Epoch: 11 Global Step: 233990 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:54:41,828-Speed 2498.82 samples/sec Loss 3.3061 LearningRate 0.000636 Epoch: 11 Global Step: 234000 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:54:49,974-Speed 2514.64 samples/sec Loss 3.3379 LearningRate 0.000636 Epoch: 11 Global Step: 234010 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:54:58,173-Speed 2498.07 samples/sec Loss 3.3598 LearningRate 0.000636 Epoch: 11 Global Step: 234020 Fp16 Grad Scale: 32768 Required: 
136 hours Training: 2022-07-07 19:55:06,371-Speed 2498.76 samples/sec Loss 3.3301 LearningRate 0.000636 Epoch: 11 Global Step: 234030 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:55:14,571-Speed 2497.82 samples/sec Loss 3.3306 LearningRate 0.000636 Epoch: 11 Global Step: 234040 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 19:55:22,769-Speed 2498.71 samples/sec Loss 3.3258 LearningRate 0.000636 Epoch: 11 Global Step: 234050 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 19:55:30,964-Speed 2499.50 samples/sec Loss 3.3069 LearningRate 0.000636 Epoch: 11 Global Step: 234060 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 19:55:39,108-Speed 2515.21 samples/sec Loss 3.3560 LearningRate 0.000636 Epoch: 11 Global Step: 234070 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 19:55:47,309-Speed 2497.45 samples/sec Loss 3.2925 LearningRate 0.000636 Epoch: 11 Global Step: 234080 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 19:55:55,512-Speed 2496.84 samples/sec Loss 3.3697 LearningRate 0.000636 Epoch: 11 Global Step: 234090 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 19:56:03,717-Speed 2496.50 samples/sec Loss 3.3095 LearningRate 0.000636 Epoch: 11 Global Step: 234100 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 19:56:11,920-Speed 2497.26 samples/sec Loss 3.3808 LearningRate 0.000636 Epoch: 11 Global Step: 234110 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 19:56:20,123-Speed 2496.75 samples/sec Loss 3.4093 LearningRate 0.000636 Epoch: 11 Global Step: 234120 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 19:56:28,277-Speed 2512.13 samples/sec Loss 3.2534 LearningRate 0.000636 Epoch: 11 Global Step: 234130 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 19:56:36,486-Speed 2495.41 samples/sec Loss 3.3997 LearningRate 0.000636 Epoch: 11 Global Step: 234140 Fp16 Grad Scale: 65536 
Required: 136 hours Training: 2022-07-07 19:56:44,703-Speed 2492.90 samples/sec Loss 3.3306 LearningRate 0.000636 Epoch: 11 Global Step: 234150 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 19:56:52,900-Speed 2498.87 samples/sec Loss 3.3159 LearningRate 0.000636 Epoch: 11 Global Step: 234160 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 19:57:01,055-Speed 2511.89 samples/sec Loss 3.2704 LearningRate 0.000636 Epoch: 11 Global Step: 234170 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:57:09,271-Speed 2493.17 samples/sec Loss 3.3351 LearningRate 0.000636 Epoch: 11 Global Step: 234180 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:57:17,417-Speed 2514.49 samples/sec Loss 3.2975 LearningRate 0.000636 Epoch: 11 Global Step: 234190 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:57:25,617-Speed 2498.16 samples/sec Loss 3.2366 LearningRate 0.000636 Epoch: 11 Global Step: 234200 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:57:33,817-Speed 2497.88 samples/sec Loss 3.2464 LearningRate 0.000636 Epoch: 11 Global Step: 234210 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:57:42,033-Speed 2493.27 samples/sec Loss 3.2422 LearningRate 0.000636 Epoch: 11 Global Step: 234220 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:57:50,236-Speed 2497.14 samples/sec Loss 3.2982 LearningRate 0.000636 Epoch: 11 Global Step: 234230 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:57:58,447-Speed 2494.41 samples/sec Loss 3.2913 LearningRate 0.000636 Epoch: 11 Global Step: 234240 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:58:06,600-Speed 2512.38 samples/sec Loss 3.3283 LearningRate 0.000636 Epoch: 11 Global Step: 234250 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:58:14,798-Speed 2498.66 samples/sec Loss 3.2670 LearningRate 0.000636 Epoch: 11 Global Step: 234260 Fp16 Grad Scale: 
32768 Required: 136 hours Training: 2022-07-07 19:58:23,008-Speed 2494.95 samples/sec Loss 3.3049 LearningRate 0.000636 Epoch: 11 Global Step: 234270 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:58:31,206-Speed 2498.37 samples/sec Loss 3.2990 LearningRate 0.000636 Epoch: 11 Global Step: 234280 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:58:39,405-Speed 2498.42 samples/sec Loss 3.2744 LearningRate 0.000636 Epoch: 11 Global Step: 234290 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:58:47,604-Speed 2498.27 samples/sec Loss 3.3235 LearningRate 0.000636 Epoch: 11 Global Step: 234300 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:58:55,752-Speed 2513.85 samples/sec Loss 3.2980 LearningRate 0.000636 Epoch: 11 Global Step: 234310 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:59:03,949-Speed 2498.82 samples/sec Loss 3.4098 LearningRate 0.000636 Epoch: 11 Global Step: 234320 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:59:12,149-Speed 2498.07 samples/sec Loss 3.2510 LearningRate 0.000636 Epoch: 11 Global Step: 234330 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:59:20,349-Speed 2498.02 samples/sec Loss 3.2750 LearningRate 0.000636 Epoch: 11 Global Step: 234340 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:59:28,547-Speed 2498.46 samples/sec Loss 3.3458 LearningRate 0.000636 Epoch: 11 Global Step: 234350 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:59:36,744-Speed 2498.70 samples/sec Loss 3.3123 LearningRate 0.000636 Epoch: 11 Global Step: 234360 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:59:44,891-Speed 2514.49 samples/sec Loss 3.2580 LearningRate 0.000636 Epoch: 11 Global Step: 234370 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 19:59:53,092-Speed 2497.62 samples/sec Loss 3.2825 LearningRate 0.000636 Epoch: 11 Global Step: 234380 Fp16 Grad 
Scale: 32768 Required: 136 hours Training: 2022-07-07 20:00:01,302-Speed 2494.73 samples/sec Loss 3.3014 LearningRate 0.000635 Epoch: 11 Global Step: 234390 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:00:09,499-Speed 2498.83 samples/sec Loss 3.3317 LearningRate 0.000635 Epoch: 11 Global Step: 234400 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:00:17,701-Speed 2497.65 samples/sec Loss 3.3535 LearningRate 0.000635 Epoch: 11 Global Step: 234410 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:00:25,898-Speed 2498.55 samples/sec Loss 3.3536 LearningRate 0.000635 Epoch: 11 Global Step: 234420 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:00:34,052-Speed 2512.08 samples/sec Loss 3.4110 LearningRate 0.000635 Epoch: 11 Global Step: 234430 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:00:42,251-Speed 2498.21 samples/sec Loss 3.2373 LearningRate 0.000635 Epoch: 11 Global Step: 234440 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:00:50,459-Speed 2495.55 samples/sec Loss 3.3703 LearningRate 0.000635 Epoch: 11 Global Step: 234450 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:00:58,659-Speed 2498.00 samples/sec Loss 3.3267 LearningRate 0.000635 Epoch: 11 Global Step: 234460 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:01:06,859-Speed 2498.07 samples/sec Loss 3.2820 LearningRate 0.000635 Epoch: 11 Global Step: 234470 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:01:15,057-Speed 2498.47 samples/sec Loss 3.3605 LearningRate 0.000635 Epoch: 11 Global Step: 234480 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:01:23,203-Speed 2514.99 samples/sec Loss 3.3242 LearningRate 0.000635 Epoch: 11 Global Step: 234490 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:01:31,400-Speed 2498.54 samples/sec Loss 3.3022 LearningRate 0.000635 Epoch: 11 Global Step: 234500 Fp16 
Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:01:39,602-Speed 2497.44 samples/sec Loss 3.2978 LearningRate 0.000635 Epoch: 11 Global Step: 234510 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:01:47,815-Speed 2494.11 samples/sec Loss 3.2995 LearningRate 0.000635 Epoch: 11 Global Step: 234520 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:01:56,016-Speed 2498.08 samples/sec Loss 3.2895 LearningRate 0.000635 Epoch: 11 Global Step: 234530 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:02:04,217-Speed 2497.73 samples/sec Loss 3.2747 LearningRate 0.000635 Epoch: 11 Global Step: 234540 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:02:12,363-Speed 2514.49 samples/sec Loss 3.2455 LearningRate 0.000635 Epoch: 11 Global Step: 234550 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:02:20,562-Speed 2498.03 samples/sec Loss 3.3125 LearningRate 0.000635 Epoch: 11 Global Step: 234560 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:02:28,762-Speed 2498.24 samples/sec Loss 3.2165 LearningRate 0.000635 Epoch: 11 Global Step: 234570 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:02:36,957-Speed 2499.16 samples/sec Loss 3.2268 LearningRate 0.000635 Epoch: 11 Global Step: 234580 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:02:45,156-Speed 2498.36 samples/sec Loss 3.2629 LearningRate 0.000635 Epoch: 11 Global Step: 234590 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:02:53,359-Speed 2497.98 samples/sec Loss 3.2380 LearningRate 0.000635 Epoch: 11 Global Step: 234600 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:03:01,503-Speed 2515.01 samples/sec Loss 3.2666 LearningRate 0.000635 Epoch: 11 Global Step: 234610 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:03:09,702-Speed 2498.35 samples/sec Loss 3.2144 LearningRate 0.000635 Epoch: 11 Global Step: 234620 
Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:03:17,903-Speed 2497.51 samples/sec Loss 3.3459 LearningRate 0.000635 Epoch: 11 Global Step: 234630 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:03:26,117-Speed 2493.72 samples/sec Loss 3.3266 LearningRate 0.000635 Epoch: 11 Global Step: 234640 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:03:34,318-Speed 2497.56 samples/sec Loss 3.2894 LearningRate 0.000635 Epoch: 11 Global Step: 234650 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:03:42,517-Speed 2498.57 samples/sec Loss 3.3005 LearningRate 0.000635 Epoch: 11 Global Step: 234660 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:03:50,666-Speed 2513.40 samples/sec Loss 3.2785 LearningRate 0.000635 Epoch: 11 Global Step: 234670 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:03:58,868-Speed 2497.45 samples/sec Loss 3.2835 LearningRate 0.000635 Epoch: 11 Global Step: 234680 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:04:07,068-Speed 2497.87 samples/sec Loss 3.3299 LearningRate 0.000635 Epoch: 11 Global Step: 234690 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:04:15,266-Speed 2498.73 samples/sec Loss 3.3342 LearningRate 0.000635 Epoch: 11 Global Step: 234700 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:04:23,468-Speed 2497.45 samples/sec Loss 3.3135 LearningRate 0.000635 Epoch: 11 Global Step: 234710 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:04:31,669-Speed 2497.59 samples/sec Loss 3.2584 LearningRate 0.000635 Epoch: 11 Global Step: 234720 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:04:39,819-Speed 2513.26 samples/sec Loss 3.2440 LearningRate 0.000635 Epoch: 11 Global Step: 234730 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:04:48,019-Speed 2497.93 samples/sec Loss 3.2939 LearningRate 0.000635 Epoch: 11 Global Step: 
234740 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:04:56,218-Speed 2498.19 samples/sec Loss 3.3201 LearningRate 0.000635 Epoch: 11 Global Step: 234750 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:05:04,431-Speed 2494.30 samples/sec Loss 3.3363 LearningRate 0.000635 Epoch: 11 Global Step: 234760 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:05:12,631-Speed 2498.00 samples/sec Loss 3.3364 LearningRate 0.000635 Epoch: 11 Global Step: 234770 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:05:20,833-Speed 2497.12 samples/sec Loss 3.2993 LearningRate 0.000635 Epoch: 11 Global Step: 234780 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:05:28,988-Speed 2511.83 samples/sec Loss 3.3798 LearningRate 0.000635 Epoch: 11 Global Step: 234790 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:05:37,194-Speed 2496.20 samples/sec Loss 3.3350 LearningRate 0.000635 Epoch: 11 Global Step: 234800 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:05:45,396-Speed 2497.40 samples/sec Loss 3.3091 LearningRate 0.000635 Epoch: 11 Global Step: 234810 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:05:53,597-Speed 2497.84 samples/sec Loss 3.3225 LearningRate 0.000635 Epoch: 11 Global Step: 234820 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:06:01,797-Speed 2497.83 samples/sec Loss 3.3319 LearningRate 0.000635 Epoch: 11 Global Step: 234830 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:06:09,997-Speed 2497.97 samples/sec Loss 3.3677 LearningRate 0.000635 Epoch: 11 Global Step: 234840 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:06:18,146-Speed 2513.71 samples/sec Loss 3.2911 LearningRate 0.000634 Epoch: 11 Global Step: 234850 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:06:26,345-Speed 2498.27 samples/sec Loss 3.3624 LearningRate 0.000634 Epoch: 11 Global 
Step: 234860 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:06:34,550-Speed 2496.58 samples/sec Loss 3.3395 LearningRate 0.000634 Epoch: 11 Global Step: 234870 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:06:42,750-Speed 2497.75 samples/sec Loss 3.3287 LearningRate 0.000634 Epoch: 11 Global Step: 234880 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:06:50,951-Speed 2497.80 samples/sec Loss 3.2991 LearningRate 0.000634 Epoch: 11 Global Step: 234890 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:06:59,155-Speed 2496.70 samples/sec Loss 3.3581 LearningRate 0.000634 Epoch: 11 Global Step: 234900 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:07:07,300-Speed 2515.29 samples/sec Loss 3.2884 LearningRate 0.000634 Epoch: 11 Global Step: 234910 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:07:15,498-Speed 2498.62 samples/sec Loss 3.3500 LearningRate 0.000634 Epoch: 11 Global Step: 234920 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:07:23,696-Speed 2498.31 samples/sec Loss 3.2961 LearningRate 0.000634 Epoch: 11 Global Step: 234930 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:07:31,895-Speed 2498.33 samples/sec Loss 3.3312 LearningRate 0.000634 Epoch: 11 Global Step: 234940 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:07:40,094-Speed 2498.36 samples/sec Loss 3.3183 LearningRate 0.000634 Epoch: 11 Global Step: 234950 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:07:48,307-Speed 2494.22 samples/sec Loss 3.3792 LearningRate 0.000634 Epoch: 11 Global Step: 234960 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:07:56,452-Speed 2514.64 samples/sec Loss 3.2824 LearningRate 0.000634 Epoch: 11 Global Step: 234970 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:08:04,653-Speed 2497.68 samples/sec Loss 3.3728 LearningRate 0.000634 Epoch: 11 
Global Step: 234980 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:08:12,850-Speed 2498.99 samples/sec Loss 3.3936 LearningRate 0.000634 Epoch: 11 Global Step: 234990 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:08:21,046-Speed 2499.03 samples/sec Loss 3.3383 LearningRate 0.000634 Epoch: 11 Global Step: 235000 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:08:29,247-Speed 2497.90 samples/sec Loss 3.3822 LearningRate 0.000634 Epoch: 11 Global Step: 235010 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:08:37,447-Speed 2497.88 samples/sec Loss 3.3944 LearningRate 0.000634 Epoch: 11 Global Step: 235020 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:08:45,600-Speed 2512.29 samples/sec Loss 3.3507 LearningRate 0.000634 Epoch: 11 Global Step: 235030 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:08:53,794-Speed 2499.74 samples/sec Loss 3.3105 LearningRate 0.000634 Epoch: 11 Global Step: 235040 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:09:01,991-Speed 2499.13 samples/sec Loss 3.3296 LearningRate 0.000634 Epoch: 11 Global Step: 235050 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:09:10,190-Speed 2498.50 samples/sec Loss 3.4132 LearningRate 0.000634 Epoch: 11 Global Step: 235060 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:09:18,385-Speed 2499.09 samples/sec Loss 3.3344 LearningRate 0.000634 Epoch: 11 Global Step: 235070 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:09:26,581-Speed 2499.23 samples/sec Loss 3.3209 LearningRate 0.000634 Epoch: 11 Global Step: 235080 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:09:34,721-Speed 2516.39 samples/sec Loss 3.3176 LearningRate 0.000634 Epoch: 11 Global Step: 235090 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:09:42,922-Speed 2497.45 samples/sec Loss 3.3310 LearningRate 0.000634 
Epoch: 11 Global Step: 235100 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:09:51,120-Speed 2498.68 samples/sec Loss 3.3554 LearningRate 0.000634 Epoch: 11 Global Step: 235110 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:09:59,325-Speed 2496.56 samples/sec Loss 3.3303 LearningRate 0.000634 Epoch: 11 Global Step: 235120 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:10:07,522-Speed 2498.94 samples/sec Loss 3.2802 LearningRate 0.000634 Epoch: 11 Global Step: 235130 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:10:15,724-Speed 2497.04 samples/sec Loss 3.3179 LearningRate 0.000634 Epoch: 11 Global Step: 235140 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:10:23,869-Speed 2514.86 samples/sec Loss 3.3115 LearningRate 0.000634 Epoch: 11 Global Step: 235150 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:10:32,067-Speed 2498.51 samples/sec Loss 3.3144 LearningRate 0.000634 Epoch: 11 Global Step: 235160 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:10:40,272-Speed 2496.64 samples/sec Loss 3.3036 LearningRate 0.000634 Epoch: 11 Global Step: 235170 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:10:48,469-Speed 2498.54 samples/sec Loss 3.3091 LearningRate 0.000634 Epoch: 11 Global Step: 235180 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:10:56,666-Speed 2498.90 samples/sec Loss 3.2577 LearningRate 0.000634 Epoch: 11 Global Step: 235190 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:11:04,863-Speed 2499.01 samples/sec Loss 3.3581 LearningRate 0.000634 Epoch: 11 Global Step: 235200 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:11:13,006-Speed 2515.45 samples/sec Loss 3.3399 LearningRate 0.000634 Epoch: 11 Global Step: 235210 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:11:21,207-Speed 2497.84 samples/sec Loss 3.3185 LearningRate 
0.000634 Epoch: 11 Global Step: 235220 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:11:29,405-Speed 2498.29 samples/sec Loss 3.3318 LearningRate 0.000634 Epoch: 11 Global Step: 235230 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:11:37,614-Speed 2495.45 samples/sec Loss 3.3054 LearningRate 0.000634 Epoch: 11 Global Step: 235240 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:11:45,809-Speed 2499.22 samples/sec Loss 3.2959 LearningRate 0.000634 Epoch: 11 Global Step: 235250 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:11:54,014-Speed 2496.60 samples/sec Loss 3.2907 LearningRate 0.000634 Epoch: 11 Global Step: 235260 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:12:02,158-Speed 2515.24 samples/sec Loss 3.3150 LearningRate 0.000634 Epoch: 11 Global Step: 235270 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:12:10,357-Speed 2498.15 samples/sec Loss 3.2641 LearningRate 0.000634 Epoch: 11 Global Step: 235280 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:12:18,559-Speed 2497.29 samples/sec Loss 3.2966 LearningRate 0.000634 Epoch: 11 Global Step: 235290 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:12:26,756-Speed 2499.24 samples/sec Loss 3.2649 LearningRate 0.000634 Epoch: 11 Global Step: 235300 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:12:34,952-Speed 2499.09 samples/sec Loss 3.3399 LearningRate 0.000634 Epoch: 11 Global Step: 235310 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:12:43,151-Speed 2498.50 samples/sec Loss 3.2928 LearningRate 0.000633 Epoch: 11 Global Step: 235320 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:12:51,295-Speed 2515.03 samples/sec Loss 3.2931 LearningRate 0.000633 Epoch: 11 Global Step: 235330 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:12:59,494-Speed 2498.39 samples/sec Loss 3.2792 
LearningRate 0.000633 Epoch: 11 Global Step: 235340 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:13:07,708-Speed 2493.60 samples/sec Loss 3.2618 LearningRate 0.000633 Epoch: 11 Global Step: 235350 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:13:15,906-Speed 2498.75 samples/sec Loss 3.3090 LearningRate 0.000633 Epoch: 11 Global Step: 235360 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:13:24,105-Speed 2498.22 samples/sec Loss 3.3522 LearningRate 0.000633 Epoch: 11 Global Step: 235370 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:13:32,303-Speed 2498.33 samples/sec Loss 3.2429 LearningRate 0.000633 Epoch: 11 Global Step: 235380 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:13:40,448-Speed 2515.07 samples/sec Loss 3.3821 LearningRate 0.000633 Epoch: 11 Global Step: 235390 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:13:48,646-Speed 2498.45 samples/sec Loss 3.3868 LearningRate 0.000633 Epoch: 11 Global Step: 235400 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:13:56,844-Speed 2498.69 samples/sec Loss 3.3043 LearningRate 0.000633 Epoch: 11 Global Step: 235410 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:14:05,045-Speed 2497.90 samples/sec Loss 3.3975 LearningRate 0.000633 Epoch: 11 Global Step: 235420 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:14:13,247-Speed 2497.27 samples/sec Loss 3.3782 LearningRate 0.000633 Epoch: 11 Global Step: 235430 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:14:21,450-Speed 2497.32 samples/sec Loss 3.2564 LearningRate 0.000633 Epoch: 11 Global Step: 235440 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:14:29,591-Speed 2515.87 samples/sec Loss 3.3462 LearningRate 0.000633 Epoch: 11 Global Step: 235450 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:14:37,790-Speed 2498.35 samples/sec Loss 
3.3315 LearningRate 0.000633 Epoch: 11 Global Step: 235460 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:14:45,995-Speed 2496.46 samples/sec Loss 3.2617 LearningRate 0.000633 Epoch: 11 Global Step: 235470 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:14:54,190-Speed 2499.66 samples/sec Loss 3.2513 LearningRate 0.000633 Epoch: 11 Global Step: 235480 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:15:02,389-Speed 2498.06 samples/sec Loss 3.2837 LearningRate 0.000633 Epoch: 11 Global Step: 235490 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:15:10,587-Speed 2498.68 samples/sec Loss 3.2853 LearningRate 0.000633 Epoch: 11 Global Step: 235500 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:15:18,733-Speed 2514.47 samples/sec Loss 3.2818 LearningRate 0.000633 Epoch: 11 Global Step: 235510 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:15:26,928-Speed 2499.25 samples/sec Loss 3.3203 LearningRate 0.000633 Epoch: 11 Global Step: 235520 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:15:35,124-Speed 2499.11 samples/sec Loss 3.2507 LearningRate 0.000633 Epoch: 11 Global Step: 235530 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:15:43,326-Speed 2497.52 samples/sec Loss 3.2481 LearningRate 0.000633 Epoch: 11 Global Step: 235540 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:15:51,522-Speed 2499.19 samples/sec Loss 3.1691 LearningRate 0.000633 Epoch: 11 Global Step: 235550 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:15:59,718-Speed 2499.08 samples/sec Loss 3.2004 LearningRate 0.000633 Epoch: 11 Global Step: 235560 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:16:07,865-Speed 2514.31 samples/sec Loss 3.2812 LearningRate 0.000633 Epoch: 11 Global Step: 235570 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:16:16,064-Speed 2498.14 samples/sec 
Loss 3.2740 LearningRate 0.000633 Epoch: 11 Global Step: 235580 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:16:24,261-Speed 2498.93 samples/sec Loss 3.2273 LearningRate 0.000633 Epoch: 11 Global Step: 235590 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:16:32,460-Speed 2498.56 samples/sec Loss 3.3299 LearningRate 0.000633 Epoch: 11 Global Step: 235600 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:16:40,659-Speed 2497.98 samples/sec Loss 3.3529 LearningRate 0.000633 Epoch: 11 Global Step: 235610 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:16:48,857-Speed 2498.77 samples/sec Loss 3.3006 LearningRate 0.000633 Epoch: 11 Global Step: 235620 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:16:57,000-Speed 2515.51 samples/sec Loss 3.2521 LearningRate 0.000633 Epoch: 11 Global Step: 235630 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:17:05,204-Speed 2496.74 samples/sec Loss 3.3324 LearningRate 0.000633 Epoch: 11 Global Step: 235640 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:17:13,403-Speed 2498.21 samples/sec Loss 3.3259 LearningRate 0.000633 Epoch: 11 Global Step: 235650 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:17:21,601-Speed 2498.41 samples/sec Loss 3.3410 LearningRate 0.000633 Epoch: 11 Global Step: 235660 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:17:29,800-Speed 2498.36 samples/sec Loss 3.2708 LearningRate 0.000633 Epoch: 11 Global Step: 235670 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:17:38,000-Speed 2497.95 samples/sec Loss 3.3350 LearningRate 0.000633 Epoch: 11 Global Step: 235680 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:17:46,150-Speed 2513.22 samples/sec Loss 3.3827 LearningRate 0.000633 Epoch: 11 Global Step: 235690 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:17:54,349-Speed 2498.34 
samples/sec Loss 3.2415 LearningRate 0.000633 Epoch: 11 Global Step: 235700 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:18:02,545-Speed 2499.20 samples/sec Loss 3.2603 LearningRate 0.000633 Epoch: 11 Global Step: 235710 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:18:10,745-Speed 2497.87 samples/sec Loss 3.2988 LearningRate 0.000633 Epoch: 11 Global Step: 235720 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:18:18,942-Speed 2498.94 samples/sec Loss 3.3874 LearningRate 0.000633 Epoch: 11 Global Step: 235730 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:18:27,154-Speed 2494.20 samples/sec Loss 3.2906 LearningRate 0.000633 Epoch: 11 Global Step: 235740 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:18:35,305-Speed 2513.00 samples/sec Loss 3.3982 LearningRate 0.000633 Epoch: 11 Global Step: 235750 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:18:43,501-Speed 2499.12 samples/sec Loss 3.3248 LearningRate 0.000633 Epoch: 11 Global Step: 235760 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:18:51,698-Speed 2498.89 samples/sec Loss 3.3306 LearningRate 0.000633 Epoch: 11 Global Step: 235770 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:18:59,910-Speed 2494.11 samples/sec Loss 3.3327 LearningRate 0.000633 Epoch: 11 Global Step: 235780 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:19:08,105-Speed 2499.24 samples/sec Loss 3.2727 LearningRate 0.000632 Epoch: 11 Global Step: 235790 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:19:16,301-Speed 2499.25 samples/sec Loss 3.2581 LearningRate 0.000632 Epoch: 11 Global Step: 235800 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:19:24,444-Speed 2515.54 samples/sec Loss 3.3030 LearningRate 0.000632 Epoch: 11 Global Step: 235810 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:19:32,640-Speed 
2499.08 samples/sec Loss 3.3172 LearningRate 0.000632 Epoch: 11 Global Step: 235820 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:19:40,835-Speed 2500.06 samples/sec Loss 3.2882 LearningRate 0.000632 Epoch: 11 Global Step: 235830 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:19:49,045-Speed 2494.72 samples/sec Loss 3.3076 LearningRate 0.000632 Epoch: 11 Global Step: 235840 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:19:57,241-Speed 2500.12 samples/sec Loss 3.2872 LearningRate 0.000632 Epoch: 11 Global Step: 235850 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:20:05,458-Speed 2492.68 samples/sec Loss 3.2439 LearningRate 0.000632 Epoch: 11 Global Step: 235860 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:20:13,605-Speed 2514.37 samples/sec Loss 3.2094 LearningRate 0.000632 Epoch: 11 Global Step: 235870 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:20:21,801-Speed 2498.87 samples/sec Loss 3.2407 LearningRate 0.000632 Epoch: 11 Global Step: 235880 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:20:30,001-Speed 2498.20 samples/sec Loss 3.2607 LearningRate 0.000632 Epoch: 11 Global Step: 235890 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:20:38,199-Speed 2498.56 samples/sec Loss 3.2989 LearningRate 0.000632 Epoch: 11 Global Step: 235900 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:20:46,401-Speed 2497.32 samples/sec Loss 3.2288 LearningRate 0.000632 Epoch: 11 Global Step: 235910 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:20:54,600-Speed 2498.26 samples/sec Loss 3.2329 LearningRate 0.000632 Epoch: 11 Global Step: 235920 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:21:02,749-Speed 2513.93 samples/sec Loss 3.2927 LearningRate 0.000632 Epoch: 11 Global Step: 235930 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 
20:21:10,950-Speed 2497.62 samples/sec Loss 3.2712 LearningRate 0.000632 Epoch: 11 Global Step: 235940 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:21:19,158-Speed 2495.27 samples/sec Loss 3.2171 LearningRate 0.000632 Epoch: 11 Global Step: 235950 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:21:27,360-Speed 2497.24 samples/sec Loss 3.2845 LearningRate 0.000632 Epoch: 11 Global Step: 235960 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:21:35,567-Speed 2496.02 samples/sec Loss 3.2580 LearningRate 0.000632 Epoch: 11 Global Step: 235970 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:21:43,767-Speed 2497.77 samples/sec Loss 3.2020 LearningRate 0.000632 Epoch: 11 Global Step: 235980 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:21:51,918-Speed 2513.19 samples/sec Loss 3.2476 LearningRate 0.000632 Epoch: 11 Global Step: 235990 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:22:00,122-Speed 2496.63 samples/sec Loss 3.3049 LearningRate 0.000632 Epoch: 11 Global Step: 236000 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:22:08,327-Speed 2496.42 samples/sec Loss 3.2961 LearningRate 0.000632 Epoch: 11 Global Step: 236010 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:22:16,526-Speed 2498.16 samples/sec Loss 3.2823 LearningRate 0.000632 Epoch: 11 Global Step: 236020 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:22:24,724-Speed 2498.52 samples/sec Loss 3.2719 LearningRate 0.000632 Epoch: 11 Global Step: 236030 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:22:32,929-Speed 2496.38 samples/sec Loss 3.2756 LearningRate 0.000632 Epoch: 11 Global Step: 236040 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:22:41,078-Speed 2513.82 samples/sec Loss 3.3186 LearningRate 0.000632 Epoch: 11 Global Step: 236050 Fp16 Grad Scale: 65536 Required: 136 hours Training: 
2022-07-07 20:22:49,273-Speed 2499.14 samples/sec Loss 3.2770 LearningRate 0.000632 Epoch: 11 Global Step: 236060 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:22:57,472-Speed 2498.42 samples/sec Loss 3.2205 LearningRate 0.000632 Epoch: 11 Global Step: 236070 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:23:05,669-Speed 2498.93 samples/sec Loss 3.2750 LearningRate 0.000632 Epoch: 11 Global Step: 236080 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:23:13,870-Speed 2498.02 samples/sec Loss 3.3279 LearningRate 0.000632 Epoch: 11 Global Step: 236090 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:23:22,069-Speed 2497.90 samples/sec Loss 3.3402 LearningRate 0.000632 Epoch: 11 Global Step: 236100 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:23:30,210-Speed 2516.15 samples/sec Loss 3.2792 LearningRate 0.000632 Epoch: 11 Global Step: 236110 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:23:38,411-Speed 2497.90 samples/sec Loss 3.3103 LearningRate 0.000632 Epoch: 11 Global Step: 236120 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:23:46,607-Speed 2499.20 samples/sec Loss 3.3129 LearningRate 0.000632 Epoch: 11 Global Step: 236130 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:23:54,802-Speed 2499.57 samples/sec Loss 3.3680 LearningRate 0.000632 Epoch: 11 Global Step: 236140 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:24:02,996-Speed 2499.67 samples/sec Loss 3.3016 LearningRate 0.000632 Epoch: 11 Global Step: 236150 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:24:11,196-Speed 2497.99 samples/sec Loss 3.3667 LearningRate 0.000632 Epoch: 11 Global Step: 236160 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:24:19,341-Speed 2514.80 samples/sec Loss 3.3228 LearningRate 0.000632 Epoch: 11 Global Step: 236170 Fp16 Grad Scale: 65536 Required: 136 hours 
Training: 2022-07-07 20:24:27,541-Speed 2497.92 samples/sec Loss 3.2994 LearningRate 0.000632 Epoch: 11 Global Step: 236180 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:24:35,739-Speed 2498.69 samples/sec Loss 3.3566 LearningRate 0.000632 Epoch: 11 Global Step: 236190 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:24:43,940-Speed 2497.64 samples/sec Loss 3.2637 LearningRate 0.000632 Epoch: 11 Global Step: 236200 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:24:52,140-Speed 2498.23 samples/sec Loss 3.2676 LearningRate 0.000632 Epoch: 11 Global Step: 236210 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:25:00,343-Speed 2497.12 samples/sec Loss 3.2525 LearningRate 0.000632 Epoch: 11 Global Step: 236220 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:25:08,499-Speed 2511.46 samples/sec Loss 3.2729 LearningRate 0.000632 Epoch: 11 Global Step: 236230 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:25:16,697-Speed 2498.57 samples/sec Loss 3.2727 LearningRate 0.000632 Epoch: 11 Global Step: 236240 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:25:24,900-Speed 2497.14 samples/sec Loss 3.2646 LearningRate 0.000632 Epoch: 11 Global Step: 236250 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:25:33,098-Speed 2498.56 samples/sec Loss 3.2969 LearningRate 0.000631 Epoch: 11 Global Step: 236260 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:25:41,299-Speed 2497.70 samples/sec Loss 3.2486 LearningRate 0.000631 Epoch: 11 Global Step: 236270 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:25:49,498-Speed 2498.11 samples/sec Loss 3.2855 LearningRate 0.000631 Epoch: 11 Global Step: 236280 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:25:57,639-Speed 2516.30 samples/sec Loss 3.2753 LearningRate 0.000631 Epoch: 11 Global Step: 236290 Fp16 Grad Scale: 65536 Required: 136 
hours Training: 2022-07-07 20:26:05,842-Speed 2497.11 samples/sec Loss 3.3136 LearningRate 0.000631 Epoch: 11 Global Step: 236300 Fp16 Grad Scale: 65536 Required: 136 hours Training: 2022-07-07 20:26:13,997-Speed 2511.86 samples/sec Loss 3.3366 LearningRate 0.000631 Epoch: 11 Global Step: 236310 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:26:22,195-Speed 2498.51 samples/sec Loss 3.3307 LearningRate 0.000631 Epoch: 11 Global Step: 236320 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:26:30,404-Speed 2494.99 samples/sec Loss 3.3085 LearningRate 0.000631 Epoch: 11 Global Step: 236330 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:26:38,602-Speed 2498.71 samples/sec Loss 3.2841 LearningRate 0.000631 Epoch: 11 Global Step: 236340 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:26:46,747-Speed 2515.03 samples/sec Loss 3.2632 LearningRate 0.000631 Epoch: 11 Global Step: 236350 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:26:54,957-Speed 2494.83 samples/sec Loss 3.2429 LearningRate 0.000631 Epoch: 11 Global Step: 236360 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:27:03,160-Speed 2497.32 samples/sec Loss 3.2266 LearningRate 0.000631 Epoch: 11 Global Step: 236370 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:27:11,361-Speed 2497.47 samples/sec Loss 3.2294 LearningRate 0.000631 Epoch: 11 Global Step: 236380 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:27:19,564-Speed 2497.16 samples/sec Loss 3.2521 LearningRate 0.000631 Epoch: 11 Global Step: 236390 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:27:27,761-Speed 2498.91 samples/sec Loss 3.2416 LearningRate 0.000631 Epoch: 11 Global Step: 236400 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:27:35,908-Speed 2514.20 samples/sec Loss 3.2340 LearningRate 0.000631 Epoch: 11 Global Step: 236410 Fp16 Grad Scale: 32768 Required: 
136 hours Training: 2022-07-07 20:27:44,107-Speed 2498.14 samples/sec Loss 3.3196 LearningRate 0.000631 Epoch: 11 Global Step: 236420 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:27:52,306-Speed 2498.54 samples/sec Loss 3.3612 LearningRate 0.000631 Epoch: 11 Global Step: 236430 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:28:00,506-Speed 2497.88 samples/sec Loss 3.3485 LearningRate 0.000631 Epoch: 11 Global Step: 236440 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:28:08,708-Speed 2497.63 samples/sec Loss 3.2747 LearningRate 0.000631 Epoch: 11 Global Step: 236450 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:28:16,906-Speed 2498.92 samples/sec Loss 3.4893 LearningRate 0.000631 Epoch: 11 Global Step: 236460 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:28:25,056-Speed 2513.20 samples/sec Loss 3.3026 LearningRate 0.000631 Epoch: 11 Global Step: 236470 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:28:33,264-Speed 2495.33 samples/sec Loss 3.4020 LearningRate 0.000631 Epoch: 11 Global Step: 236480 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:28:41,491-Speed 2490.06 samples/sec Loss 3.3163 LearningRate 0.000631 Epoch: 11 Global Step: 236490 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:28:49,693-Speed 2497.41 samples/sec Loss 3.2862 LearningRate 0.000631 Epoch: 11 Global Step: 236500 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:28:57,891-Speed 2498.66 samples/sec Loss 3.2802 LearningRate 0.000631 Epoch: 11 Global Step: 236510 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:29:06,089-Speed 2498.49 samples/sec Loss 3.2358 LearningRate 0.000631 Epoch: 11 Global Step: 236520 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:29:14,241-Speed 2512.47 samples/sec Loss 3.2797 LearningRate 0.000631 Epoch: 11 Global Step: 236530 Fp16 Grad Scale: 32768 
Required: 136 hours Training: 2022-07-07 20:29:22,452-Speed 2494.81 samples/sec Loss 3.2526 LearningRate 0.000631 Epoch: 11 Global Step: 236540 Fp16 Grad Scale: 32768 Required: 136 hours Training: 2022-07-07 20:29:30,653-Speed 2497.95 samples/sec Loss 3.2725 LearningRate 0.000631 Epoch: 11 Global Step: 236550 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:29:38,856-Speed 2497.01 samples/sec Loss 3.3431 LearningRate 0.000631 Epoch: 11 Global Step: 236560 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:29:47,063-Speed 2495.91 samples/sec Loss 3.2627 LearningRate 0.000631 Epoch: 11 Global Step: 236570 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:29:55,264-Speed 2497.69 samples/sec Loss 3.3107 LearningRate 0.000631 Epoch: 11 Global Step: 236580 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:30:03,412-Speed 2514.00 samples/sec Loss 3.2489 LearningRate 0.000631 Epoch: 11 Global Step: 236590 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:30:11,613-Speed 2497.73 samples/sec Loss 3.2951 LearningRate 0.000631 Epoch: 11 Global Step: 236600 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:30:19,812-Speed 2498.13 samples/sec Loss 3.2880 LearningRate 0.000631 Epoch: 11 Global Step: 236610 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:30:28,014-Speed 2497.44 samples/sec Loss 3.2856 LearningRate 0.000631 Epoch: 11 Global Step: 236620 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:30:36,216-Speed 2497.35 samples/sec Loss 3.3161 LearningRate 0.000631 Epoch: 11 Global Step: 236630 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:30:44,420-Speed 2496.78 samples/sec Loss 3.3413 LearningRate 0.000631 Epoch: 11 Global Step: 236640 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:30:52,569-Speed 2513.72 samples/sec Loss 3.3684 LearningRate 0.000631 Epoch: 11 Global Step: 236650 Fp16 Grad Scale: 
32768 Required: 135 hours Training: 2022-07-07 20:31:00,766-Speed 2498.74 samples/sec Loss 3.2675 LearningRate 0.000631 Epoch: 11 Global Step: 236660 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:31:08,964-Speed 2498.52 samples/sec Loss 3.3385 LearningRate 0.000631 Epoch: 11 Global Step: 236670 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:31:17,164-Speed 2498.03 samples/sec Loss 3.3246 LearningRate 0.000631 Epoch: 11 Global Step: 236680 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:31:25,362-Speed 2498.95 samples/sec Loss 3.3042 LearningRate 0.000631 Epoch: 11 Global Step: 236690 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:31:33,559-Speed 2498.59 samples/sec Loss 3.3102 LearningRate 0.000631 Epoch: 11 Global Step: 236700 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:31:41,707-Speed 2513.93 samples/sec Loss 3.2783 LearningRate 0.000631 Epoch: 11 Global Step: 236710 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:31:49,903-Speed 2499.06 samples/sec Loss 3.3342 LearningRate 0.000631 Epoch: 11 Global Step: 236720 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:31:58,105-Speed 2497.44 samples/sec Loss 3.2692 LearningRate 0.000630 Epoch: 11 Global Step: 236730 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:32:06,310-Speed 2496.34 samples/sec Loss 3.2332 LearningRate 0.000630 Epoch: 11 Global Step: 236740 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:32:14,517-Speed 2495.88 samples/sec Loss 3.2674 LearningRate 0.000630 Epoch: 11 Global Step: 236750 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:32:22,722-Speed 2496.38 samples/sec Loss 3.2128 LearningRate 0.000630 Epoch: 11 Global Step: 236760 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:32:30,869-Speed 2514.47 samples/sec Loss 3.2916 LearningRate 0.000630 Epoch: 11 Global Step: 236770 Fp16 Grad 
Scale: 32768 Required: 135 hours Training: 2022-07-07 20:32:39,071-Speed 2497.13 samples/sec Loss 3.2927 LearningRate 0.000630 Epoch: 11 Global Step: 236780 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:32:47,272-Speed 2498.01 samples/sec Loss 3.2190 LearningRate 0.000630 Epoch: 11 Global Step: 236790 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:32:55,471-Speed 2498.04 samples/sec Loss 3.2536 LearningRate 0.000630 Epoch: 11 Global Step: 236800 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:33:03,670-Speed 2498.40 samples/sec Loss 3.2638 LearningRate 0.000630 Epoch: 11 Global Step: 236810 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:33:11,882-Speed 2494.41 samples/sec Loss 3.2248 LearningRate 0.000630 Epoch: 11 Global Step: 236820 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:33:20,036-Speed 2511.94 samples/sec Loss 3.2479 LearningRate 0.000630 Epoch: 11 Global Step: 236830 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:33:28,236-Speed 2498.00 samples/sec Loss 3.2218 LearningRate 0.000630 Epoch: 11 Global Step: 236840 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:33:36,438-Speed 2497.52 samples/sec Loss 3.2748 LearningRate 0.000630 Epoch: 11 Global Step: 236850 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:33:44,639-Speed 2497.76 samples/sec Loss 3.2796 LearningRate 0.000630 Epoch: 11 Global Step: 236860 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:33:52,853-Speed 2493.71 samples/sec Loss 3.2723 LearningRate 0.000630 Epoch: 11 Global Step: 236870 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:34:01,051-Speed 2498.67 samples/sec Loss 3.2223 LearningRate 0.000630 Epoch: 11 Global Step: 236880 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:34:09,212-Speed 2509.87 samples/sec Loss 3.3133 LearningRate 0.000630 Epoch: 11 Global Step: 236890 Fp16 
Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:34:17,417-Speed 2496.68 samples/sec Loss 3.2165 LearningRate 0.000630 Epoch: 11 Global Step: 236900 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:34:25,612-Speed 2499.39 samples/sec Loss 3.2213 LearningRate 0.000630 Epoch: 11 Global Step: 236910 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:34:33,808-Speed 2499.32 samples/sec Loss 3.2730 LearningRate 0.000630 Epoch: 11 Global Step: 236920 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:34:42,005-Speed 2498.83 samples/sec Loss 3.2108 LearningRate 0.000630 Epoch: 11 Global Step: 236930 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:34:50,205-Speed 2497.88 samples/sec Loss 3.2255 LearningRate 0.000630 Epoch: 11 Global Step: 236940 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:34:58,365-Speed 2510.49 samples/sec Loss 3.3374 LearningRate 0.000630 Epoch: 11 Global Step: 236950 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:35:06,558-Speed 2499.90 samples/sec Loss 3.2586 LearningRate 0.000630 Epoch: 11 Global Step: 236960 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:35:14,757-Speed 2498.47 samples/sec Loss 3.2532 LearningRate 0.000630 Epoch: 11 Global Step: 236970 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:35:22,957-Speed 2498.06 samples/sec Loss 3.2598 LearningRate 0.000630 Epoch: 11 Global Step: 236980 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:35:31,152-Speed 2499.25 samples/sec Loss 3.2327 LearningRate 0.000630 Epoch: 11 Global Step: 236990 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:35:39,349-Speed 2498.79 samples/sec Loss 3.2269 LearningRate 0.000630 Epoch: 11 Global Step: 237000 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:35:47,513-Speed 2509.27 samples/sec Loss 3.2456 LearningRate 0.000630 Epoch: 11 Global Step: 237010 
Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:35:55,711-Speed 2498.52 samples/sec Loss 3.2731 LearningRate 0.000630 Epoch: 11 Global Step: 237020 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:36:03,909-Speed 2498.65 samples/sec Loss 3.2219 LearningRate 0.000630 Epoch: 11 Global Step: 237030 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:36:12,104-Speed 2499.25 samples/sec Loss 3.2949 LearningRate 0.000630 Epoch: 11 Global Step: 237040 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:36:20,314-Speed 2495.08 samples/sec Loss 3.3081 LearningRate 0.000630 Epoch: 11 Global Step: 237050 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:36:28,516-Speed 2497.16 samples/sec Loss 3.2330 LearningRate 0.000630 Epoch: 11 Global Step: 237060 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:36:36,660-Speed 2515.07 samples/sec Loss 3.2750 LearningRate 0.000630 Epoch: 11 Global Step: 237070 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:36:44,857-Speed 2499.09 samples/sec Loss 3.2554 LearningRate 0.000630 Epoch: 11 Global Step: 237080 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:36:53,053-Speed 2499.02 samples/sec Loss 3.2629 LearningRate 0.000630 Epoch: 11 Global Step: 237090 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:37:01,250-Speed 2498.99 samples/sec Loss 3.2851 LearningRate 0.000630 Epoch: 11 Global Step: 237100 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:37:09,450-Speed 2497.81 samples/sec Loss 3.2836 LearningRate 0.000630 Epoch: 11 Global Step: 237110 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:37:17,649-Speed 2498.27 samples/sec Loss 3.2737 LearningRate 0.000630 Epoch: 11 Global Step: 237120 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:37:25,793-Speed 2515.24 samples/sec Loss 3.2867 LearningRate 0.000630 Epoch: 11 Global Step: 
237130 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:37:33,989-Speed 2499.28 samples/sec Loss 3.2814 LearningRate 0.000630 Epoch: 11 Global Step: 237140 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:37:42,183-Speed 2499.78 samples/sec Loss 3.2458 LearningRate 0.000630 Epoch: 11 Global Step: 237150 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:37:50,409-Speed 2490.14 samples/sec Loss 3.2445 LearningRate 0.000630 Epoch: 11 Global Step: 237160 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:37:58,617-Speed 2495.58 samples/sec Loss 3.2701 LearningRate 0.000630 Epoch: 11 Global Step: 237170 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:38:06,813-Speed 2499.02 samples/sec Loss 3.2891 LearningRate 0.000630 Epoch: 11 Global Step: 237180 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:38:14,964-Speed 2513.05 samples/sec Loss 3.2689 LearningRate 0.000630 Epoch: 11 Global Step: 237190 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:38:23,160-Speed 2499.40 samples/sec Loss 3.2830 LearningRate 0.000629 Epoch: 11 Global Step: 237200 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:38:31,365-Speed 2496.39 samples/sec Loss 3.2365 LearningRate 0.000629 Epoch: 11 Global Step: 237210 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:38:39,565-Speed 2498.13 samples/sec Loss 3.3094 LearningRate 0.000629 Epoch: 11 Global Step: 237220 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:38:47,762-Speed 2499.23 samples/sec Loss 3.2846 LearningRate 0.000629 Epoch: 11 Global Step: 237230 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:38:55,957-Speed 2499.64 samples/sec Loss 3.2093 LearningRate 0.000629 Epoch: 11 Global Step: 237240 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:39:04,102-Speed 2514.86 samples/sec Loss 3.2456 LearningRate 0.000629 Epoch: 11 Global 
Step: 237250 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:39:12,301-Speed 2498.34 samples/sec Loss 3.1963 LearningRate 0.000629 Epoch: 11 Global Step: 237260 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:39:20,511-Speed 2494.90 samples/sec Loss 3.2556 LearningRate 0.000629 Epoch: 11 Global Step: 237270 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:39:28,710-Speed 2498.59 samples/sec Loss 3.2537 LearningRate 0.000629 Epoch: 11 Global Step: 237280 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:39:36,909-Speed 2498.17 samples/sec Loss 3.2519 LearningRate 0.000629 Epoch: 11 Global Step: 237290 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:39:45,108-Speed 2498.63 samples/sec Loss 3.3070 LearningRate 0.000629 Epoch: 11 Global Step: 237300 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:39:53,249-Speed 2516.32 samples/sec Loss 3.2355 LearningRate 0.000629 Epoch: 11 Global Step: 237310 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:40:01,446-Speed 2498.91 samples/sec Loss 3.2594 LearningRate 0.000629 Epoch: 11 Global Step: 237320 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:40:09,643-Speed 2499.01 samples/sec Loss 3.3053 LearningRate 0.000629 Epoch: 11 Global Step: 237330 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:40:17,839-Speed 2499.22 samples/sec Loss 3.2717 LearningRate 0.000629 Epoch: 11 Global Step: 237340 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:40:26,035-Speed 2499.07 samples/sec Loss 3.2130 LearningRate 0.000629 Epoch: 11 Global Step: 237350 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:40:34,233-Speed 2498.74 samples/sec Loss 3.3012 LearningRate 0.000629 Epoch: 11 Global Step: 237360 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:40:42,382-Speed 2513.75 samples/sec Loss 3.2170 LearningRate 0.000629 Epoch: 11 
Global Step: 237370 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:40:50,582-Speed 2497.90 samples/sec Loss 3.2659 LearningRate 0.000629 Epoch: 11 Global Step: 237380 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:40:58,780-Speed 2498.83 samples/sec Loss 3.2469 LearningRate 0.000629 Epoch: 11 Global Step: 237390 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:41:06,979-Speed 2498.11 samples/sec Loss 3.2585 LearningRate 0.000629 Epoch: 11 Global Step: 237400 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:41:15,175-Speed 2499.29 samples/sec Loss 3.2126 LearningRate 0.000629 Epoch: 11 Global Step: 237410 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:41:23,369-Speed 2499.55 samples/sec Loss 3.2596 LearningRate 0.000629 Epoch: 11 Global Step: 237420 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:41:31,515-Speed 2514.98 samples/sec Loss 3.2359 LearningRate 0.000629 Epoch: 11 Global Step: 237430 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:41:39,712-Speed 2498.89 samples/sec Loss 3.2195 LearningRate 0.000629 Epoch: 11 Global Step: 237440 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:41:47,909-Speed 2498.49 samples/sec Loss 3.2530 LearningRate 0.000629 Epoch: 11 Global Step: 237450 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:41:56,113-Speed 2496.92 samples/sec Loss 3.3205 LearningRate 0.000629 Epoch: 11 Global Step: 237460 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:42:04,311-Speed 2498.66 samples/sec Loss 3.3696 LearningRate 0.000629 Epoch: 11 Global Step: 237470 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:42:12,511-Speed 2497.81 samples/sec Loss 3.2619 LearningRate 0.000629 Epoch: 11 Global Step: 237480 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:42:20,656-Speed 2514.88 samples/sec Loss 3.2585 LearningRate 0.000629 
Epoch: 11 Global Step: 237490 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:42:28,854-Speed 2498.60 samples/sec Loss 3.2700 LearningRate 0.000629 Epoch: 11 Global Step: 237500 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:42:37,052-Speed 2498.72 samples/sec Loss 3.2464 LearningRate 0.000629 Epoch: 11 Global Step: 237510 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 20:42:45,257-Speed 2496.24 samples/sec Loss 3.2525 LearningRate 0.000629 Epoch: 11 Global Step: 237520 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 20:42:53,454-Speed 2498.83 samples/sec Loss 3.3212 LearningRate 0.000629 Epoch: 11 Global Step: 237530 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 20:43:01,653-Speed 2498.54 samples/sec Loss 3.2782 LearningRate 0.000629 Epoch: 11 Global Step: 237540 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 20:43:09,801-Speed 2513.89 samples/sec Loss 3.2436 LearningRate 0.000629 Epoch: 11 Global Step: 237550 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 20:43:17,997-Speed 2499.00 samples/sec Loss 3.2182 LearningRate 0.000629 Epoch: 11 Global Step: 237560 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 20:43:26,194-Speed 2498.83 samples/sec Loss 3.2495 LearningRate 0.000629 Epoch: 11 Global Step: 237570 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 20:43:34,354-Speed 2510.30 samples/sec Loss 3.2632 LearningRate 0.000629 Epoch: 11 Global Step: 237580 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:43:42,555-Speed 2497.79 samples/sec Loss 3.2569 LearningRate 0.000629 Epoch: 11 Global Step: 237590 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:43:50,753-Speed 2498.34 samples/sec Loss 3.2267 LearningRate 0.000629 Epoch: 11 Global Step: 237600 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:43:58,904-Speed 2513.36 samples/sec Loss 3.2351 LearningRate 
0.000629 Epoch: 11 Global Step: 237610 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:44:07,100-Speed 2499.03 samples/sec Loss 3.2353 LearningRate 0.000629 Epoch: 11 Global Step: 237620 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:44:15,311-Speed 2494.55 samples/sec Loss 3.2704 LearningRate 0.000629 Epoch: 11 Global Step: 237630 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:44:23,508-Speed 2499.00 samples/sec Loss 3.2010 LearningRate 0.000629 Epoch: 11 Global Step: 237640 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:44:31,711-Speed 2496.87 samples/sec Loss 3.2112 LearningRate 0.000629 Epoch: 11 Global Step: 237650 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:44:39,907-Speed 2499.15 samples/sec Loss 3.3247 LearningRate 0.000629 Epoch: 11 Global Step: 237660 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:44:48,053-Speed 2514.33 samples/sec Loss 3.2613 LearningRate 0.000628 Epoch: 11 Global Step: 237670 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:44:56,249-Speed 2499.72 samples/sec Loss 3.2793 LearningRate 0.000628 Epoch: 11 Global Step: 237680 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:45:04,450-Speed 2498.12 samples/sec Loss 3.2491 LearningRate 0.000628 Epoch: 11 Global Step: 237690 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:45:12,646-Speed 2499.03 samples/sec Loss 3.2639 LearningRate 0.000628 Epoch: 11 Global Step: 237700 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:45:20,842-Speed 2499.38 samples/sec Loss 3.2238 LearningRate 0.000628 Epoch: 11 Global Step: 237710 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:45:29,040-Speed 2498.35 samples/sec Loss 3.3659 LearningRate 0.000628 Epoch: 11 Global Step: 237720 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:45:37,185-Speed 2514.95 samples/sec Loss 3.1790 
LearningRate 0.000628 Epoch: 11 Global Step: 237730 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:45:45,380-Speed 2499.24 samples/sec Loss 3.2960 LearningRate 0.000628 Epoch: 11 Global Step: 237740 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:45:53,577-Speed 2499.08 samples/sec Loss 3.3085 LearningRate 0.000628 Epoch: 11 Global Step: 237750 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:46:01,780-Speed 2497.13 samples/sec Loss 3.2248 LearningRate 0.000628 Epoch: 11 Global Step: 237760 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:46:09,985-Speed 2496.34 samples/sec Loss 3.2595 LearningRate 0.000628 Epoch: 11 Global Step: 237770 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:46:18,186-Speed 2497.65 samples/sec Loss 3.2575 LearningRate 0.000628 Epoch: 11 Global Step: 237780 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:46:26,329-Speed 2515.25 samples/sec Loss 3.2656 LearningRate 0.000628 Epoch: 11 Global Step: 237790 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:46:34,526-Speed 2499.13 samples/sec Loss 3.2690 LearningRate 0.000628 Epoch: 11 Global Step: 237800 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:46:42,723-Speed 2498.92 samples/sec Loss 3.2588 LearningRate 0.000628 Epoch: 11 Global Step: 237810 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:46:50,927-Speed 2496.95 samples/sec Loss 3.3241 LearningRate 0.000628 Epoch: 11 Global Step: 237820 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:46:59,126-Speed 2498.20 samples/sec Loss 3.2454 LearningRate 0.000628 Epoch: 11 Global Step: 237830 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:47:07,325-Speed 2498.38 samples/sec Loss 3.3341 LearningRate 0.000628 Epoch: 11 Global Step: 237840 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:47:15,474-Speed 2513.59 samples/sec Loss 
3.2707 LearningRate 0.000628 Epoch: 11 Global Step: 237850 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:47:23,672-Speed 2498.56 samples/sec Loss 3.2960 LearningRate 0.000628 Epoch: 11 Global Step: 237860 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:47:31,873-Speed 2497.84 samples/sec Loss 3.2525 LearningRate 0.000628 Epoch: 11 Global Step: 237870 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:47:40,073-Speed 2498.10 samples/sec Loss 3.2654 LearningRate 0.000628 Epoch: 11 Global Step: 237880 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:47:48,291-Speed 2492.44 samples/sec Loss 3.2273 LearningRate 0.000628 Epoch: 11 Global Step: 237890 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:47:56,487-Speed 2499.42 samples/sec Loss 3.2208 LearningRate 0.000628 Epoch: 11 Global Step: 237900 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:48:04,632-Speed 2514.81 samples/sec Loss 3.2277 LearningRate 0.000628 Epoch: 11 Global Step: 237910 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:48:12,829-Speed 2498.77 samples/sec Loss 3.2463 LearningRate 0.000628 Epoch: 11 Global Step: 237920 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:48:21,028-Speed 2498.54 samples/sec Loss 3.2512 LearningRate 0.000628 Epoch: 11 Global Step: 237930 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:48:29,225-Speed 2498.84 samples/sec Loss 3.2264 LearningRate 0.000628 Epoch: 11 Global Step: 237940 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:48:37,423-Speed 2498.52 samples/sec Loss 3.2447 LearningRate 0.000628 Epoch: 11 Global Step: 237950 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:48:45,623-Speed 2498.02 samples/sec Loss 3.3400 LearningRate 0.000628 Epoch: 11 Global Step: 237960 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:48:53,783-Speed 2510.46 samples/sec 
Loss 3.2745 LearningRate 0.000628 Epoch: 11 Global Step: 237970 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:49:01,980-Speed 2498.69 samples/sec Loss 3.3146 LearningRate 0.000628 Epoch: 11 Global Step: 237980 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:49:10,181-Speed 2497.68 samples/sec Loss 3.3493 LearningRate 0.000628 Epoch: 11 Global Step: 237990 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:49:18,376-Speed 2499.59 samples/sec Loss 3.2627 LearningRate 0.000628 Epoch: 11 Global Step: 238000 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:49:26,572-Speed 2499.28 samples/sec Loss 3.2751 LearningRate 0.000628 Epoch: 11 Global Step: 238010 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:49:34,768-Speed 2499.05 samples/sec Loss 3.3238 LearningRate 0.000628 Epoch: 11 Global Step: 238020 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:49:42,915-Speed 2514.14 samples/sec Loss 3.3162 LearningRate 0.000628 Epoch: 11 Global Step: 238030 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:49:51,112-Speed 2498.88 samples/sec Loss 3.3086 LearningRate 0.000628 Epoch: 11 Global Step: 238040 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:49:59,310-Speed 2498.61 samples/sec Loss 3.3040 LearningRate 0.000628 Epoch: 11 Global Step: 238050 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:50:07,511-Speed 2497.82 samples/sec Loss 3.2268 LearningRate 0.000628 Epoch: 11 Global Step: 238060 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:50:15,722-Speed 2494.57 samples/sec Loss 3.2746 LearningRate 0.000628 Epoch: 11 Global Step: 238070 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:50:23,922-Speed 2498.08 samples/sec Loss 3.2950 LearningRate 0.000628 Epoch: 11 Global Step: 238080 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:50:32,072-Speed 2513.33 
samples/sec Loss 3.2565 LearningRate 0.000628 Epoch: 11 Global Step: 238090 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:50:40,273-Speed 2497.93 samples/sec Loss 3.2644 LearningRate 0.000628 Epoch: 11 Global Step: 238100 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:50:48,473-Speed 2497.94 samples/sec Loss 3.2992 LearningRate 0.000628 Epoch: 11 Global Step: 238110 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:50:56,676-Speed 2497.10 samples/sec Loss 3.2669 LearningRate 0.000628 Epoch: 11 Global Step: 238120 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:51:04,874-Speed 2498.59 samples/sec Loss 3.2859 LearningRate 0.000628 Epoch: 11 Global Step: 238130 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:51:13,071-Speed 2499.58 samples/sec Loss 3.2969 LearningRate 0.000627 Epoch: 11 Global Step: 238140 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:51:21,215-Speed 2515.23 samples/sec Loss 3.2781 LearningRate 0.000627 Epoch: 11 Global Step: 238150 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:51:29,413-Speed 2498.37 samples/sec Loss 3.3416 LearningRate 0.000627 Epoch: 11 Global Step: 238160 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:51:37,613-Speed 2498.08 samples/sec Loss 3.3080 LearningRate 0.000627 Epoch: 11 Global Step: 238170 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:51:45,810-Speed 2499.38 samples/sec Loss 3.2645 LearningRate 0.000627 Epoch: 11 Global Step: 238180 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:51:54,007-Speed 2498.95 samples/sec Loss 3.2446 LearningRate 0.000627 Epoch: 11 Global Step: 238190 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:52:02,203-Speed 2498.92 samples/sec Loss 3.1950 LearningRate 0.000627 Epoch: 11 Global Step: 238200 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:52:10,348-Speed 
2514.73 samples/sec Loss 3.2838 LearningRate 0.000627 Epoch: 11 Global Step: 238210 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:52:18,545-Speed 2499.13 samples/sec Loss 3.2689 LearningRate 0.000627 Epoch: 11 Global Step: 238220 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:52:26,743-Speed 2498.50 samples/sec Loss 3.3190 LearningRate 0.000627 Epoch: 11 Global Step: 238230 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:52:34,941-Speed 2498.61 samples/sec Loss 3.2538 LearningRate 0.000627 Epoch: 11 Global Step: 238240 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:52:43,139-Speed 2498.47 samples/sec Loss 3.2767 LearningRate 0.000627 Epoch: 11 Global Step: 238250 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:52:51,336-Speed 2498.80 samples/sec Loss 3.2739 LearningRate 0.000627 Epoch: 11 Global Step: 238260 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:52:59,483-Speed 2514.40 samples/sec Loss 3.3102 LearningRate 0.000627 Epoch: 11 Global Step: 238270 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:53:07,684-Speed 2497.53 samples/sec Loss 3.2205 LearningRate 0.000627 Epoch: 11 Global Step: 238280 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:53:15,881-Speed 2498.87 samples/sec Loss 3.1924 LearningRate 0.000627 Epoch: 11 Global Step: 238290 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:53:24,079-Speed 2498.44 samples/sec Loss 3.2742 LearningRate 0.000627 Epoch: 11 Global Step: 238300 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:53:32,277-Speed 2498.67 samples/sec Loss 3.3084 LearningRate 0.000627 Epoch: 11 Global Step: 238310 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:53:40,477-Speed 2497.83 samples/sec Loss 3.3253 LearningRate 0.000627 Epoch: 11 Global Step: 238320 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 
20:53:48,627-Speed 2513.14 samples/sec Loss 3.2579 LearningRate 0.000627 Epoch: 11 Global Step: 238330 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:53:56,825-Speed 2498.57 samples/sec Loss 3.3259 LearningRate 0.000627 Epoch: 11 Global Step: 238340 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:54:05,032-Speed 2496.10 samples/sec Loss 3.2508 LearningRate 0.000627 Epoch: 11 Global Step: 238350 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:54:13,231-Speed 2498.69 samples/sec Loss 3.2947 LearningRate 0.000627 Epoch: 11 Global Step: 238360 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:54:21,426-Speed 2499.29 samples/sec Loss 3.2881 LearningRate 0.000627 Epoch: 11 Global Step: 238370 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:54:29,623-Speed 2498.99 samples/sec Loss 3.3506 LearningRate 0.000627 Epoch: 11 Global Step: 238380 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:54:37,768-Speed 2514.69 samples/sec Loss 3.2142 LearningRate 0.000627 Epoch: 11 Global Step: 238390 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:54:45,967-Speed 2498.59 samples/sec Loss 3.3156 LearningRate 0.000627 Epoch: 11 Global Step: 238400 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:54:54,165-Speed 2498.55 samples/sec Loss 3.2721 LearningRate 0.000627 Epoch: 11 Global Step: 238410 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:55:02,366-Speed 2497.86 samples/sec Loss 3.3019 LearningRate 0.000627 Epoch: 11 Global Step: 238420 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:55:10,563-Speed 2499.04 samples/sec Loss 3.3048 LearningRate 0.000627 Epoch: 11 Global Step: 238430 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:55:18,758-Speed 2499.73 samples/sec Loss 3.2865 LearningRate 0.000627 Epoch: 11 Global Step: 238440 Fp16 Grad Scale: 32768 Required: 135 hours Training: 
2022-07-07 20:55:26,902-Speed 2515.03 samples/sec Loss 3.2489 LearningRate 0.000627 Epoch: 11 Global Step: 238450 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:55:35,102-Speed 2497.83 samples/sec Loss 3.2393 LearningRate 0.000627 Epoch: 11 Global Step: 238460 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:55:43,309-Speed 2495.73 samples/sec Loss 3.1908 LearningRate 0.000627 Epoch: 11 Global Step: 238470 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:55:51,507-Speed 2498.57 samples/sec Loss 3.2252 LearningRate 0.000627 Epoch: 11 Global Step: 238480 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:55:59,704-Speed 2499.09 samples/sec Loss 3.2258 LearningRate 0.000627 Epoch: 11 Global Step: 238490 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:56:07,906-Speed 2497.54 samples/sec Loss 3.2503 LearningRate 0.000627 Epoch: 11 Global Step: 238500 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:56:16,049-Speed 2515.40 samples/sec Loss 3.2756 LearningRate 0.000627 Epoch: 11 Global Step: 238510 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:56:24,252-Speed 2497.53 samples/sec Loss 3.3058 LearningRate 0.000627 Epoch: 11 Global Step: 238520 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:56:32,445-Speed 2500.04 samples/sec Loss 3.2623 LearningRate 0.000627 Epoch: 11 Global Step: 238530 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:56:40,645-Speed 2497.96 samples/sec Loss 3.2357 LearningRate 0.000627 Epoch: 11 Global Step: 238540 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:56:48,844-Speed 2498.34 samples/sec Loss 3.2431 LearningRate 0.000627 Epoch: 11 Global Step: 238550 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:56:57,042-Speed 2498.24 samples/sec Loss 3.3068 LearningRate 0.000627 Epoch: 11 Global Step: 238560 Fp16 Grad Scale: 32768 Required: 135 hours 
Training: 2022-07-07 20:57:05,187-Speed 2514.95 samples/sec Loss 3.1857 LearningRate 0.000627 Epoch: 11 Global Step: 238570 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:57:13,381-Speed 2500.10 samples/sec Loss 3.3566 LearningRate 0.000627 Epoch: 11 Global Step: 238580 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:57:21,583-Speed 2497.16 samples/sec Loss 3.3087 LearningRate 0.000627 Epoch: 11 Global Step: 238590 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:57:29,782-Speed 2498.33 samples/sec Loss 3.2838 LearningRate 0.000627 Epoch: 11 Global Step: 238600 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:57:37,978-Speed 2499.03 samples/sec Loss 3.3125 LearningRate 0.000627 Epoch: 11 Global Step: 238610 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:57:46,178-Speed 2497.86 samples/sec Loss 3.2964 LearningRate 0.000626 Epoch: 11 Global Step: 238620 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:57:54,322-Speed 2515.22 samples/sec Loss 3.2555 LearningRate 0.000626 Epoch: 11 Global Step: 238630 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:58:02,523-Speed 2497.74 samples/sec Loss 3.3286 LearningRate 0.000626 Epoch: 11 Global Step: 238640 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:58:10,733-Speed 2495.03 samples/sec Loss 3.3264 LearningRate 0.000626 Epoch: 11 Global Step: 238650 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:58:18,931-Speed 2498.39 samples/sec Loss 3.2744 LearningRate 0.000626 Epoch: 11 Global Step: 238660 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:58:27,131-Speed 2498.04 samples/sec Loss 3.2449 LearningRate 0.000626 Epoch: 11 Global Step: 238670 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:58:35,333-Speed 2497.28 samples/sec Loss 3.2960 LearningRate 0.000626 Epoch: 11 Global Step: 238680 Fp16 Grad Scale: 32768 Required: 135 
hours Training: 2022-07-07 20:58:43,479-Speed 2514.39 samples/sec Loss 3.2064 LearningRate 0.000626 Epoch: 11 Global Step: 238690 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:58:51,682-Speed 2497.06 samples/sec Loss 3.2382 LearningRate 0.000626 Epoch: 11 Global Step: 238700 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:58:59,885-Speed 2496.87 samples/sec Loss 3.2959 LearningRate 0.000626 Epoch: 11 Global Step: 238710 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:59:08,083-Speed 2498.79 samples/sec Loss 3.2380 LearningRate 0.000626 Epoch: 11 Global Step: 238720 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:59:16,282-Speed 2498.37 samples/sec Loss 3.2198 LearningRate 0.000626 Epoch: 11 Global Step: 238730 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:59:27,193-Speed 1877.16 samples/sec Loss 3.2340 LearningRate 0.000626 Epoch: 11 Global Step: 238740 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:59:35,332-Speed 2516.58 samples/sec Loss 3.2262 LearningRate 0.000626 Epoch: 11 Global Step: 238750 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:59:43,590-Speed 2501.36 samples/sec Loss 3.2571 LearningRate 0.000626 Epoch: 11 Global Step: 238760 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 20:59:57,232-Speed 2490.21 samples/sec Loss 3.2470 LearningRate 0.000626 Epoch: 11 Global Step: 238770 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:00:05,487-Speed 2502.80 samples/sec Loss 3.2744 LearningRate 0.000626 Epoch: 11 Global Step: 238780 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:00:13,712-Speed 2501.23 samples/sec Loss 3.2661 LearningRate 0.000626 Epoch: 11 Global Step: 238790 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:00:21,973-Speed 2500.05 samples/sec Loss 3.3409 LearningRate 0.000626 Epoch: 11 Global Step: 238800 Fp16 Grad Scale: 65536 Required: 
135 hours Training: 2022-07-07 21:00:30,121-Speed 2513.97 samples/sec Loss 3.2495 LearningRate 0.000626 Epoch: 11 Global Step: 238810 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:00:38,374-Speed 2498.56 samples/sec Loss 3.2530 LearningRate 0.000626 Epoch: 11 Global Step: 238820 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:00:52,291-Speed 2496.78 samples/sec Loss 3.2003 LearningRate 0.000626 Epoch: 11 Global Step: 238830 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:01:00,544-Speed 2497.50 samples/sec Loss 3.2245 LearningRate 0.000626 Epoch: 11 Global Step: 238840 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:01:08,860-Speed 2463.17 samples/sec Loss 3.4175 LearningRate 0.000626 Epoch: 11 Global Step: 238850 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:01:22,485-Speed 2501.00 samples/sec Loss 3.2989 LearningRate 0.000626 Epoch: 11 Global Step: 238860 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:01:30,715-Speed 2517.76 samples/sec Loss 3.2558 LearningRate 0.000626 Epoch: 11 Global Step: 238870 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:01:38,916-Speed 2497.46 samples/sec Loss 3.2437 LearningRate 0.000626 Epoch: 11 Global Step: 238880 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:01:47,171-Speed 2498.27 samples/sec Loss 3.2664 LearningRate 0.000626 Epoch: 11 Global Step: 238890 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:01:59,000-Speed 2500.64 samples/sec Loss 3.2247 LearningRate 0.000626 Epoch: 11 Global Step: 238900 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:02:07,198-Speed 2498.46 samples/sec Loss 3.2090 LearningRate 0.000626 Epoch: 11 Global Step: 238910 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:02:16,814-Speed 2129.95 samples/sec Loss 3.2601 LearningRate 0.000626 Epoch: 11 Global Step: 238920 Fp16 Grad Scale: 65536 
Required: 135 hours Training: 2022-07-07 21:02:24,978-Speed 2518.70 samples/sec Loss 3.2929 LearningRate 0.000626 Epoch: 11 Global Step: 238930 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:02:33,256-Speed 2499.99 samples/sec Loss 3.2195 LearningRate 0.000626 Epoch: 11 Global Step: 238940 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:02:45,116-Speed 1726.93 samples/sec Loss 3.2837 LearningRate 0.000626 Epoch: 11 Global Step: 238950 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:02:53,341-Speed 2501.99 samples/sec Loss 3.2610 LearningRate 0.000626 Epoch: 11 Global Step: 238960 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:03:01,554-Speed 2513.71 samples/sec Loss 3.2403 LearningRate 0.000626 Epoch: 11 Global Step: 238970 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:03:09,754-Speed 2498.01 samples/sec Loss 3.2476 LearningRate 0.000626 Epoch: 11 Global Step: 238980 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:03:20,094-Speed 2516.13 samples/sec Loss 3.2372 LearningRate 0.000626 Epoch: 11 Global Step: 238990 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:03:28,291-Speed 2500.09 samples/sec Loss 3.3045 LearningRate 0.000626 Epoch: 11 Global Step: 239000 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:03:37,676-Speed 2267.02 samples/sec Loss 3.3256 LearningRate 0.000626 Epoch: 11 Global Step: 239010 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:03:45,880-Speed 2496.78 samples/sec Loss 3.3076 LearningRate 0.000626 Epoch: 11 Global Step: 239020 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:03:54,156-Speed 2499.82 samples/sec Loss 3.2771 LearningRate 0.000626 Epoch: 11 Global Step: 239030 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:04:02,412-Speed 2499.58 samples/sec Loss 3.3044 LearningRate 0.000626 Epoch: 11 Global Step: 239040 Fp16 Grad Scale: 
32768 Required: 135 hours Training: 2022-07-07 21:04:10,584-Speed 2514.88 samples/sec Loss 3.2508 LearningRate 0.000626 Epoch: 11 Global Step: 239050 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:04:18,786-Speed 2497.04 samples/sec Loss 3.2899 LearningRate 0.000626 Epoch: 11 Global Step: 239060 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:06:21,764-Speed 166.54 samples/sec Loss 3.2917 LearningRate 0.000626 Epoch: 11 Global Step: 239070 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:06:29,969-Speed 2513.05 samples/sec Loss 3.2856 LearningRate 0.000626 Epoch: 11 Global Step: 239080 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:06:38,175-Speed 2512.17 samples/sec Loss 3.2768 LearningRate 0.000625 Epoch: 11 Global Step: 239090 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:06:46,344-Speed 2507.46 samples/sec Loss 3.2745 LearningRate 0.000625 Epoch: 11 Global Step: 239100 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:06:54,494-Speed 2524.85 samples/sec Loss 3.3485 LearningRate 0.000625 Epoch: 11 Global Step: 239110 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:07:02,657-Speed 2509.07 samples/sec Loss 3.2393 LearningRate 0.000625 Epoch: 11 Global Step: 239120 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:07:10,859-Speed 2510.09 samples/sec Loss 3.2577 LearningRate 0.000625 Epoch: 11 Global Step: 239130 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:07:19,091-Speed 2507.62 samples/sec Loss 3.2733 LearningRate 0.000625 Epoch: 11 Global Step: 239140 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:07:27,277-Speed 2501.98 samples/sec Loss 3.3565 LearningRate 0.000625 Epoch: 11 Global Step: 239150 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:07:35,449-Speed 2506.53 samples/sec Loss 3.3078 LearningRate 0.000625 Epoch: 11 Global Step: 239160 Fp16 Grad 
Scale: 32768 Required: 135 hours Training: 2022-07-07 21:07:43,615-Speed 2519.67 samples/sec Loss 3.2513 LearningRate 0.000625 Epoch: 11 Global Step: 239170 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:07:51,848-Speed 2508.18 samples/sec Loss 3.3101 LearningRate 0.000625 Epoch: 11 Global Step: 239180 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:08:01,994-Speed 2018.69 samples/sec Loss 3.3152 LearningRate 0.000625 Epoch: 11 Global Step: 239190 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:08:10,173-Speed 2504.73 samples/sec Loss 3.3391 LearningRate 0.000625 Epoch: 11 Global Step: 239200 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:08:18,367-Speed 2499.80 samples/sec Loss 3.3048 LearningRate 0.000625 Epoch: 11 Global Step: 239210 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:08:26,550-Speed 2502.88 samples/sec Loss 3.2587 LearningRate 0.000625 Epoch: 11 Global Step: 239220 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:08:34,680-Speed 2519.65 samples/sec Loss 3.2017 LearningRate 0.000625 Epoch: 11 Global Step: 239230 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:08:42,867-Speed 2501.90 samples/sec Loss 3.2404 LearningRate 0.000625 Epoch: 11 Global Step: 239240 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:08:51,047-Speed 2504.05 samples/sec Loss 3.2429 LearningRate 0.000625 Epoch: 11 Global Step: 239250 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:08:59,230-Speed 2502.96 samples/sec Loss 3.1852 LearningRate 0.000625 Epoch: 11 Global Step: 239260 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:09:07,414-Speed 2502.97 samples/sec Loss 3.2757 LearningRate 0.000625 Epoch: 11 Global Step: 239270 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:09:15,610-Speed 2499.25 samples/sec Loss 3.1994 LearningRate 0.000625 Epoch: 11 Global Step: 239280 Fp16 
Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:09:23,757-Speed 2514.36 samples/sec Loss 3.2845 LearningRate 0.000625 Epoch: 11 Global Step: 239290 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:09:31,944-Speed 2501.76 samples/sec Loss 3.2833 LearningRate 0.000625 Epoch: 11 Global Step: 239300 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:09:40,127-Speed 2503.31 samples/sec Loss 3.2984 LearningRate 0.000625 Epoch: 11 Global Step: 239310 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:09:48,308-Speed 2503.74 samples/sec Loss 3.1959 LearningRate 0.000625 Epoch: 11 Global Step: 239320 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:09:56,488-Speed 2504.18 samples/sec Loss 3.2548 LearningRate 0.000625 Epoch: 11 Global Step: 239330 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:10:04,676-Speed 2501.61 samples/sec Loss 3.3444 LearningRate 0.000625 Epoch: 11 Global Step: 239340 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:10:12,799-Speed 2521.55 samples/sec Loss 3.2136 LearningRate 0.000625 Epoch: 11 Global Step: 239350 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:10:20,978-Speed 2504.23 samples/sec Loss 3.2627 LearningRate 0.000625 Epoch: 11 Global Step: 239360 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:10:29,162-Speed 2502.84 samples/sec Loss 3.3179 LearningRate 0.000625 Epoch: 11 Global Step: 239370 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:10:37,343-Speed 2503.69 samples/sec Loss 3.2286 LearningRate 0.000625 Epoch: 11 Global Step: 239380 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:10:45,525-Speed 2503.65 samples/sec Loss 3.2718 LearningRate 0.000625 Epoch: 11 Global Step: 239390 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:10:53,707-Speed 2503.45 samples/sec Loss 3.3173 LearningRate 0.000625 Epoch: 11 Global Step: 239400 
Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:11:01,835-Speed 2520.07 samples/sec Loss 3.2260 LearningRate 0.000625 Epoch: 11 Global Step: 239410 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:11:10,020-Speed 2502.37 samples/sec Loss 3.2776 LearningRate 0.000625 Epoch: 11 Global Step: 239420 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:11:18,205-Speed 2502.58 samples/sec Loss 3.3267 LearningRate 0.000625 Epoch: 11 Global Step: 239430 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:11:26,391-Speed 2502.26 samples/sec Loss 3.2974 LearningRate 0.000625 Epoch: 11 Global Step: 239440 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:11:34,575-Speed 2502.84 samples/sec Loss 3.2994 LearningRate 0.000625 Epoch: 11 Global Step: 239450 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:11:42,773-Speed 2498.51 samples/sec Loss 3.2945 LearningRate 0.000625 Epoch: 11 Global Step: 239460 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:11:50,905-Speed 2518.81 samples/sec Loss 3.2718 LearningRate 0.000625 Epoch: 11 Global Step: 239470 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:11:59,093-Speed 2501.82 samples/sec Loss 3.2638 LearningRate 0.000625 Epoch: 11 Global Step: 239480 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:12:07,274-Speed 2503.59 samples/sec Loss 3.3722 LearningRate 0.000625 Epoch: 11 Global Step: 239490 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:12:15,473-Speed 2498.10 samples/sec Loss 3.3498 LearningRate 0.000625 Epoch: 11 Global Step: 239500 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:12:23,661-Speed 2501.54 samples/sec Loss 3.2956 LearningRate 0.000625 Epoch: 11 Global Step: 239510 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:12:31,849-Speed 2501.90 samples/sec Loss 3.3047 LearningRate 0.000625 Epoch: 11 Global Step: 
239520 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:12:39,990-Speed 2516.03 samples/sec Loss 3.2835 LearningRate 0.000625 Epoch: 11 Global Step: 239530 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:12:48,179-Speed 2501.10 samples/sec Loss 3.2841 LearningRate 0.000625 Epoch: 11 Global Step: 239540 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:12:56,369-Speed 2501.15 samples/sec Loss 3.2711 LearningRate 0.000625 Epoch: 11 Global Step: 239550 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:13:04,558-Speed 2501.48 samples/sec Loss 3.2939 LearningRate 0.000624 Epoch: 11 Global Step: 239560 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:13:12,744-Speed 2502.24 samples/sec Loss 3.3266 LearningRate 0.000624 Epoch: 11 Global Step: 239570 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:13:20,931-Speed 2501.99 samples/sec Loss 3.2837 LearningRate 0.000624 Epoch: 11 Global Step: 239580 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:13:29,065-Speed 2518.24 samples/sec Loss 3.3573 LearningRate 0.000624 Epoch: 11 Global Step: 239590 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:13:37,266-Speed 2497.44 samples/sec Loss 3.2972 LearningRate 0.000624 Epoch: 11 Global Step: 239600 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:13:45,459-Speed 2500.40 samples/sec Loss 3.3127 LearningRate 0.000624 Epoch: 11 Global Step: 239610 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:13:53,642-Speed 2502.96 samples/sec Loss 3.3224 LearningRate 0.000624 Epoch: 11 Global Step: 239620 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:14:01,825-Speed 2503.35 samples/sec Loss 3.2899 LearningRate 0.000624 Epoch: 11 Global Step: 239630 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:14:10,013-Speed 2501.39 samples/sec Loss 3.2614 LearningRate 0.000624 Epoch: 11 Global 
Step: 239640 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:14:18,150-Speed 2517.50 samples/sec Loss 3.2809 LearningRate 0.000624 Epoch: 11 Global Step: 239650 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:14:26,336-Speed 2502.32 samples/sec Loss 3.2917 LearningRate 0.000624 Epoch: 11 Global Step: 239660 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:14:34,525-Speed 2501.06 samples/sec Loss 3.2113 LearningRate 0.000624 Epoch: 11 Global Step: 239670 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:14:42,710-Speed 2502.71 samples/sec Loss 3.2618 LearningRate 0.000624 Epoch: 11 Global Step: 239680 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:14:50,897-Speed 2501.90 samples/sec Loss 3.2085 LearningRate 0.000624 Epoch: 11 Global Step: 239690 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:14:59,088-Speed 2500.49 samples/sec Loss 3.2340 LearningRate 0.000624 Epoch: 11 Global Step: 239700 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:15:07,223-Speed 2518.20 samples/sec Loss 3.2163 LearningRate 0.000624 Epoch: 11 Global Step: 239710 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:15:15,413-Speed 2501.02 samples/sec Loss 3.2593 LearningRate 0.000624 Epoch: 11 Global Step: 239720 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:15:23,599-Speed 2502.26 samples/sec Loss 3.2025 LearningRate 0.000624 Epoch: 11 Global Step: 239730 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:15:31,789-Speed 2501.03 samples/sec Loss 3.2559 LearningRate 0.000624 Epoch: 11 Global Step: 239740 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:15:39,980-Speed 2500.63 samples/sec Loss 3.2067 LearningRate 0.000624 Epoch: 11 Global Step: 239750 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:15:48,184-Speed 2496.84 samples/sec Loss 3.3021 LearningRate 0.000624 Epoch: 11 
Global Step: 239760 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:15:56,322-Speed 2516.88 samples/sec Loss 3.2627 LearningRate 0.000624 Epoch: 11 Global Step: 239770 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:16:04,530-Speed 2495.70 samples/sec Loss 3.2838 LearningRate 0.000624 Epoch: 11 Global Step: 239780 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:16:12,718-Speed 2501.80 samples/sec Loss 3.2910 LearningRate 0.000624 Epoch: 11 Global Step: 239790 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:16:20,909-Speed 2500.56 samples/sec Loss 3.2401 LearningRate 0.000624 Epoch: 11 Global Step: 239800 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:16:29,104-Speed 2499.62 samples/sec Loss 3.2360 LearningRate 0.000624 Epoch: 11 Global Step: 239810 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:16:37,293-Speed 2501.29 samples/sec Loss 3.2116 LearningRate 0.000624 Epoch: 11 Global Step: 239820 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:16:45,437-Speed 2515.17 samples/sec Loss 3.2705 LearningRate 0.000624 Epoch: 11 Global Step: 239830 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:16:53,627-Speed 2501.13 samples/sec Loss 3.2293 LearningRate 0.000624 Epoch: 11 Global Step: 239840 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:17:01,821-Speed 2499.76 samples/sec Loss 3.2373 LearningRate 0.000624 Epoch: 11 Global Step: 239850 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:17:10,012-Speed 2500.65 samples/sec Loss 3.2387 LearningRate 0.000624 Epoch: 11 Global Step: 239860 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:17:18,204-Speed 2500.37 samples/sec Loss 3.2441 LearningRate 0.000624 Epoch: 11 Global Step: 239870 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:17:26,399-Speed 2499.37 samples/sec Loss 3.2726 LearningRate 0.000624 
Epoch: 11 Global Step: 239880 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:17:34,540-Speed 2516.22 samples/sec Loss 3.2289 LearningRate 0.000624 Epoch: 11 Global Step: 239890 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:17:42,736-Speed 2499.36 samples/sec Loss 3.2411 LearningRate 0.000624 Epoch: 11 Global Step: 239900 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:17:50,935-Speed 2498.16 samples/sec Loss 3.2549 LearningRate 0.000624 Epoch: 11 Global Step: 239910 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:17:59,129-Speed 2499.73 samples/sec Loss 3.2473 LearningRate 0.000624 Epoch: 11 Global Step: 239920 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:18:07,325-Speed 2499.16 samples/sec Loss 3.2587 LearningRate 0.000624 Epoch: 11 Global Step: 239930 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:18:15,517-Speed 2500.36 samples/sec Loss 3.2407 LearningRate 0.000624 Epoch: 11 Global Step: 239940 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:18:23,659-Speed 2515.80 samples/sec Loss 3.2550 LearningRate 0.000624 Epoch: 11 Global Step: 239950 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:18:31,855-Speed 2499.20 samples/sec Loss 3.2904 LearningRate 0.000624 Epoch: 11 Global Step: 239960 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:18:40,052-Speed 2498.77 samples/sec Loss 3.3008 LearningRate 0.000624 Epoch: 11 Global Step: 239970 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:18:48,247-Speed 2499.60 samples/sec Loss 3.3285 LearningRate 0.000624 Epoch: 11 Global Step: 239980 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:18:56,443-Speed 2499.21 samples/sec Loss 3.3251 LearningRate 0.000624 Epoch: 11 Global Step: 239990 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:19:04,639-Speed 2499.06 samples/sec Loss 3.3417 LearningRate 
0.000624 Epoch: 11 Global Step: 240000 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:19:12,784-Speed 2515.60 samples/sec Loss 3.2723 LearningRate 0.000624 Epoch: 11 Global Step: 240010 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:19:20,981-Speed 2499.19 samples/sec Loss 3.2559 LearningRate 0.000624 Epoch: 11 Global Step: 240020 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:19:29,180-Speed 2498.20 samples/sec Loss 3.2926 LearningRate 0.000623 Epoch: 11 Global Step: 240030 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:19:37,376-Speed 2499.14 samples/sec Loss 3.2495 LearningRate 0.000623 Epoch: 11 Global Step: 240040 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:19:45,571-Speed 2499.52 samples/sec Loss 3.2560 LearningRate 0.000623 Epoch: 11 Global Step: 240050 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:19:53,768-Speed 2498.74 samples/sec Loss 3.1927 LearningRate 0.000623 Epoch: 11 Global Step: 240060 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:20:01,911-Speed 2515.45 samples/sec Loss 3.2239 LearningRate 0.000623 Epoch: 11 Global Step: 240070 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:20:10,128-Speed 2492.82 samples/sec Loss 3.2097 LearningRate 0.000623 Epoch: 11 Global Step: 240080 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:20:18,327-Speed 2498.24 samples/sec Loss 3.2303 LearningRate 0.000623 Epoch: 11 Global Step: 240090 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:20:26,522-Speed 2499.70 samples/sec Loss 3.2751 LearningRate 0.000623 Epoch: 11 Global Step: 240100 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:20:34,721-Speed 2498.13 samples/sec Loss 3.1947 LearningRate 0.000623 Epoch: 11 Global Step: 240110 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:20:42,919-Speed 2498.67 samples/sec Loss 3.2579 
LearningRate 0.000623 Epoch: 11 Global Step: 240120 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:20:51,062-Speed 2515.61 samples/sec Loss 3.2973 LearningRate 0.000623 Epoch: 11 Global Step: 240130 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:20:59,262-Speed 2497.81 samples/sec Loss 3.3049 LearningRate 0.000623 Epoch: 11 Global Step: 240140 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:21:07,464-Speed 2497.42 samples/sec Loss 3.3080 LearningRate 0.000623 Epoch: 11 Global Step: 240150 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:21:15,656-Speed 2500.34 samples/sec Loss 3.2874 LearningRate 0.000623 Epoch: 11 Global Step: 240160 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:21:23,853-Speed 2498.92 samples/sec Loss 3.3262 LearningRate 0.000623 Epoch: 11 Global Step: 240170 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:21:32,048-Speed 2499.33 samples/sec Loss 3.3291 LearningRate 0.000623 Epoch: 11 Global Step: 240180 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:21:40,191-Speed 2515.68 samples/sec Loss 3.2847 LearningRate 0.000623 Epoch: 11 Global Step: 240190 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:21:48,390-Speed 2498.39 samples/sec Loss 3.1897 LearningRate 0.000623 Epoch: 11 Global Step: 240200 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:21:56,597-Speed 2495.95 samples/sec Loss 3.2485 LearningRate 0.000623 Epoch: 11 Global Step: 240210 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:22:04,790-Speed 2500.28 samples/sec Loss 3.2223 LearningRate 0.000623 Epoch: 11 Global Step: 240220 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:22:12,994-Speed 2496.82 samples/sec Loss 3.1974 LearningRate 0.000623 Epoch: 11 Global Step: 240230 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:22:21,190-Speed 2499.20 samples/sec Loss 
3.2494 LearningRate 0.000623 Epoch: 11 Global Step: 240240 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:22:29,338-Speed 2514.38 samples/sec Loss 3.2292 LearningRate 0.000623 Epoch: 11 Global Step: 240250 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:22:37,533-Speed 2499.22 samples/sec Loss 3.2865 LearningRate 0.000623 Epoch: 11 Global Step: 240260 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:22:45,731-Speed 2498.60 samples/sec Loss 3.2314 LearningRate 0.000623 Epoch: 11 Global Step: 240270 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:22:53,925-Speed 2499.75 samples/sec Loss 3.2012 LearningRate 0.000623 Epoch: 11 Global Step: 240280 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:23:02,120-Speed 2499.43 samples/sec Loss 3.2524 LearningRate 0.000623 Epoch: 11 Global Step: 240290 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:23:10,316-Speed 2499.47 samples/sec Loss 3.3049 LearningRate 0.000623 Epoch: 11 Global Step: 240300 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:23:18,456-Speed 2516.37 samples/sec Loss 3.2407 LearningRate 0.000623 Epoch: 11 Global Step: 240310 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:23:26,652-Speed 2499.36 samples/sec Loss 3.2933 LearningRate 0.000623 Epoch: 11 Global Step: 240320 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:23:34,860-Speed 2495.52 samples/sec Loss 3.2985 LearningRate 0.000623 Epoch: 11 Global Step: 240330 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:23:43,054-Speed 2499.58 samples/sec Loss 3.3070 LearningRate 0.000623 Epoch: 11 Global Step: 240340 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:23:51,255-Speed 2497.85 samples/sec Loss 3.2780 LearningRate 0.000623 Epoch: 11 Global Step: 240350 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:23:59,453-Speed 2498.69 samples/sec 
Loss 3.2368 LearningRate 0.000623 Epoch: 11 Global Step: 240360 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:24:07,595-Speed 2516.19 samples/sec Loss 3.2038 LearningRate 0.000623 Epoch: 11 Global Step: 240370 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:24:15,791-Speed 2499.07 samples/sec Loss 3.2326 LearningRate 0.000623 Epoch: 11 Global Step: 240380 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:24:23,993-Speed 2497.47 samples/sec Loss 3.2396 LearningRate 0.000623 Epoch: 11 Global Step: 240390 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:24:32,193-Speed 2497.96 samples/sec Loss 3.2184 LearningRate 0.000623 Epoch: 11 Global Step: 240400 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:24:40,386-Speed 2500.17 samples/sec Loss 3.2390 LearningRate 0.000623 Epoch: 11 Global Step: 240410 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:24:48,583-Speed 2499.14 samples/sec Loss 3.2547 LearningRate 0.000623 Epoch: 11 Global Step: 240420 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:24:56,738-Speed 2511.70 samples/sec Loss 3.2618 LearningRate 0.000623 Epoch: 11 Global Step: 240430 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:25:04,934-Speed 2499.38 samples/sec Loss 3.3212 LearningRate 0.000623 Epoch: 11 Global Step: 240440 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:25:13,126-Speed 2500.32 samples/sec Loss 3.2577 LearningRate 0.000623 Epoch: 11 Global Step: 240450 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:25:21,322-Speed 2499.19 samples/sec Loss 3.2360 LearningRate 0.000623 Epoch: 11 Global Step: 240460 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:25:29,519-Speed 2499.04 samples/sec Loss 3.1725 LearningRate 0.000623 Epoch: 11 Global Step: 240470 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:25:37,714-Speed 2499.73 
samples/sec Loss 3.2524 LearningRate 0.000623 Epoch: 11 Global Step: 240480 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:25:45,858-Speed 2515.27 samples/sec Loss 3.2389 LearningRate 0.000623 Epoch: 11 Global Step: 240490 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:25:54,055-Speed 2499.03 samples/sec Loss 3.2549 LearningRate 0.000623 Epoch: 11 Global Step: 240500 Fp16 Grad Scale: 65536 Required: 135 hours Training: 2022-07-07 21:26:02,220-Speed 2508.82 samples/sec Loss 3.2562 LearningRate 0.000622 Epoch: 11 Global Step: 240510 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:26:10,418-Speed 2498.62 samples/sec Loss 3.1916 LearningRate 0.000622 Epoch: 11 Global Step: 240520 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:26:18,626-Speed 2495.44 samples/sec Loss 3.2264 LearningRate 0.000622 Epoch: 11 Global Step: 240530 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:26:26,818-Speed 2500.35 samples/sec Loss 3.2322 LearningRate 0.000622 Epoch: 11 Global Step: 240540 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:26:34,959-Speed 2516.23 samples/sec Loss 3.2290 LearningRate 0.000622 Epoch: 11 Global Step: 240550 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:26:43,153-Speed 2499.81 samples/sec Loss 3.2719 LearningRate 0.000622 Epoch: 11 Global Step: 240560 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:26:51,345-Speed 2500.19 samples/sec Loss 3.2764 LearningRate 0.000622 Epoch: 11 Global Step: 240570 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:26:59,543-Speed 2498.84 samples/sec Loss 3.2566 LearningRate 0.000622 Epoch: 11 Global Step: 240580 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:27:07,740-Speed 2498.89 samples/sec Loss 3.2171 LearningRate 0.000622 Epoch: 11 Global Step: 240590 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:27:15,936-Speed 
2499.51 samples/sec Loss 3.2772 LearningRate 0.000622 Epoch: 11 Global Step: 240600 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:27:24,079-Speed 2515.41 samples/sec Loss 3.2193 LearningRate 0.000622 Epoch: 11 Global Step: 240610 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:27:32,282-Speed 2496.86 samples/sec Loss 3.2319 LearningRate 0.000622 Epoch: 11 Global Step: 240620 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:27:40,480-Speed 2498.67 samples/sec Loss 3.2022 LearningRate 0.000622 Epoch: 11 Global Step: 240630 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:27:48,690-Speed 2494.83 samples/sec Loss 3.2507 LearningRate 0.000622 Epoch: 11 Global Step: 240640 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:27:56,887-Speed 2499.01 samples/sec Loss 3.2358 LearningRate 0.000622 Epoch: 11 Global Step: 240650 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:28:05,085-Speed 2498.68 samples/sec Loss 3.2266 LearningRate 0.000622 Epoch: 11 Global Step: 240660 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:28:13,233-Speed 2514.09 samples/sec Loss 3.2585 LearningRate 0.000622 Epoch: 11 Global Step: 240670 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:28:21,427-Speed 2499.57 samples/sec Loss 3.2373 LearningRate 0.000622 Epoch: 11 Global Step: 240680 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:28:29,628-Speed 2497.72 samples/sec Loss 3.2378 LearningRate 0.000622 Epoch: 11 Global Step: 240690 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:28:37,826-Speed 2498.50 samples/sec Loss 3.3307 LearningRate 0.000622 Epoch: 11 Global Step: 240700 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:28:46,023-Speed 2498.96 samples/sec Loss 3.2213 LearningRate 0.000622 Epoch: 11 Global Step: 240710 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 
21:28:54,220-Speed 2498.99 samples/sec Loss 3.2671 LearningRate 0.000622 Epoch: 11 Global Step: 240720 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:29:02,364-Speed 2515.02 samples/sec Loss 3.2624 LearningRate 0.000622 Epoch: 11 Global Step: 240730 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:29:10,562-Speed 2498.67 samples/sec Loss 3.2635 LearningRate 0.000622 Epoch: 11 Global Step: 240740 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:29:18,761-Speed 2498.10 samples/sec Loss 3.2253 LearningRate 0.000622 Epoch: 11 Global Step: 240750 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:29:26,959-Speed 2498.88 samples/sec Loss 3.2504 LearningRate 0.000622 Epoch: 11 Global Step: 240760 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:29:35,156-Speed 2498.57 samples/sec Loss 3.2744 LearningRate 0.000622 Epoch: 11 Global Step: 240770 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:29:43,361-Speed 2496.64 samples/sec Loss 3.2883 LearningRate 0.000622 Epoch: 11 Global Step: 240780 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:29:51,503-Speed 2516.27 samples/sec Loss 3.2538 LearningRate 0.000622 Epoch: 11 Global Step: 240790 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:29:59,703-Speed 2498.10 samples/sec Loss 3.2146 LearningRate 0.000622 Epoch: 11 Global Step: 240800 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:30:07,905-Speed 2497.26 samples/sec Loss 3.2150 LearningRate 0.000622 Epoch: 11 Global Step: 240810 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:30:16,098-Speed 2500.14 samples/sec Loss 3.1638 LearningRate 0.000622 Epoch: 11 Global Step: 240820 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:30:24,295-Speed 2499.17 samples/sec Loss 3.2383 LearningRate 0.000622 Epoch: 11 Global Step: 240830 Fp16 Grad Scale: 32768 Required: 135 hours Training: 
2022-07-07 21:30:32,497-Speed 2497.16 samples/sec Loss 3.1801 LearningRate 0.000622 Epoch: 11 Global Step: 240840 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:30:40,634-Speed 2517.27 samples/sec Loss 3.2425 LearningRate 0.000622 Epoch: 11 Global Step: 240850 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:30:48,832-Speed 2498.37 samples/sec Loss 3.2398 LearningRate 0.000622 Epoch: 11 Global Step: 240860 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:30:57,038-Speed 2496.25 samples/sec Loss 3.2140 LearningRate 0.000622 Epoch: 11 Global Step: 240870 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:31:05,235-Speed 2498.62 samples/sec Loss 3.2430 LearningRate 0.000622 Epoch: 11 Global Step: 240880 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:31:13,446-Speed 2494.85 samples/sec Loss 3.2160 LearningRate 0.000622 Epoch: 11 Global Step: 240890 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:31:21,641-Speed 2499.71 samples/sec Loss 3.2352 LearningRate 0.000622 Epoch: 11 Global Step: 240900 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:31:29,790-Speed 2513.53 samples/sec Loss 3.2354 LearningRate 0.000622 Epoch: 11 Global Step: 240910 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:31:37,985-Speed 2499.43 samples/sec Loss 3.2613 LearningRate 0.000622 Epoch: 11 Global Step: 240920 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:31:46,180-Speed 2499.47 samples/sec Loss 3.2615 LearningRate 0.000622 Epoch: 11 Global Step: 240930 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:31:54,378-Speed 2498.76 samples/sec Loss 3.1937 LearningRate 0.000622 Epoch: 11 Global Step: 240940 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:32:02,571-Speed 2500.13 samples/sec Loss 3.2113 LearningRate 0.000622 Epoch: 11 Global Step: 240950 Fp16 Grad Scale: 32768 Required: 135 hours 
Training: 2022-07-07 21:32:10,766-Speed 2499.44 samples/sec Loss 3.2399 LearningRate 0.000622 Epoch: 11 Global Step: 240960 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:32:18,910-Speed 2515.11 samples/sec Loss 3.2365 LearningRate 0.000622 Epoch: 11 Global Step: 240970 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:32:27,107-Speed 2499.21 samples/sec Loss 3.2398 LearningRate 0.000621 Epoch: 11 Global Step: 240980 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:32:35,306-Speed 2498.17 samples/sec Loss 3.2089 LearningRate 0.000621 Epoch: 11 Global Step: 240990 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:32:43,504-Speed 2498.72 samples/sec Loss 3.2249 LearningRate 0.000621 Epoch: 11 Global Step: 241000 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:32:51,701-Speed 2499.24 samples/sec Loss 3.2743 LearningRate 0.000621 Epoch: 11 Global Step: 241010 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:32:59,913-Speed 2494.59 samples/sec Loss 3.2483 LearningRate 0.000621 Epoch: 11 Global Step: 241020 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:33:08,058-Speed 2514.80 samples/sec Loss 3.2049 LearningRate 0.000621 Epoch: 11 Global Step: 241030 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:33:16,259-Speed 2497.61 samples/sec Loss 3.2173 LearningRate 0.000621 Epoch: 11 Global Step: 241040 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:33:24,459-Speed 2498.11 samples/sec Loss 3.1944 LearningRate 0.000621 Epoch: 11 Global Step: 241050 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:33:32,669-Speed 2494.86 samples/sec Loss 3.1706 LearningRate 0.000621 Epoch: 11 Global Step: 241060 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:33:40,873-Speed 2496.74 samples/sec Loss 3.2604 LearningRate 0.000621 Epoch: 11 Global Step: 241070 Fp16 Grad Scale: 32768 Required: 135 
hours Training: 2022-07-07 21:33:49,076-Speed 2496.79 samples/sec Loss 3.2975 LearningRate 0.000621 Epoch: 11 Global Step: 241080 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:33:57,219-Speed 2515.55 samples/sec Loss 3.2974 LearningRate 0.000621 Epoch: 11 Global Step: 241090 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:34:05,418-Speed 2498.38 samples/sec Loss 3.2395 LearningRate 0.000621 Epoch: 11 Global Step: 241100 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:34:13,615-Speed 2498.80 samples/sec Loss 3.2230 LearningRate 0.000621 Epoch: 11 Global Step: 241110 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:34:21,812-Speed 2498.85 samples/sec Loss 3.3011 LearningRate 0.000621 Epoch: 11 Global Step: 241120 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:34:30,104-Speed 2470.08 samples/sec Loss 3.2011 LearningRate 0.000621 Epoch: 11 Global Step: 241130 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:34:38,303-Speed 2498.36 samples/sec Loss 3.2335 LearningRate 0.000621 Epoch: 11 Global Step: 241140 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:34:46,447-Speed 2514.98 samples/sec Loss 3.1776 LearningRate 0.000621 Epoch: 11 Global Step: 241150 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:34:54,645-Speed 2498.65 samples/sec Loss 3.2890 LearningRate 0.000621 Epoch: 11 Global Step: 241160 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:35:02,852-Speed 2496.25 samples/sec Loss 3.2854 LearningRate 0.000621 Epoch: 11 Global Step: 241170 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:35:11,050-Speed 2498.94 samples/sec Loss 3.2258 LearningRate 0.000621 Epoch: 11 Global Step: 241180 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:35:19,245-Speed 2499.34 samples/sec Loss 3.2844 LearningRate 0.000621 Epoch: 11 Global Step: 241190 Fp16 Grad Scale: 32768 Required: 
135 hours Training: 2022-07-07 21:35:27,444-Speed 2498.81 samples/sec Loss 3.3555 LearningRate 0.000621 Epoch: 11 Global Step: 241200 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:35:35,589-Speed 2515.07 samples/sec Loss 3.2373 LearningRate 0.000621 Epoch: 11 Global Step: 241210 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:35:43,787-Speed 2498.42 samples/sec Loss 3.2235 LearningRate 0.000621 Epoch: 11 Global Step: 241220 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:35:51,995-Speed 2495.69 samples/sec Loss 3.1762 LearningRate 0.000621 Epoch: 11 Global Step: 241230 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:36:00,196-Speed 2497.64 samples/sec Loss 3.2219 LearningRate 0.000621 Epoch: 11 Global Step: 241240 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:36:08,397-Speed 2497.48 samples/sec Loss 3.1914 LearningRate 0.000621 Epoch: 11 Global Step: 241250 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:36:16,595-Speed 2498.52 samples/sec Loss 3.2106 LearningRate 0.000621 Epoch: 11 Global Step: 241260 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:36:24,739-Speed 2515.24 samples/sec Loss 3.2578 LearningRate 0.000621 Epoch: 11 Global Step: 241270 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:36:32,935-Speed 2499.31 samples/sec Loss 3.2328 LearningRate 0.000621 Epoch: 11 Global Step: 241280 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:36:41,136-Speed 2497.51 samples/sec Loss 3.2694 LearningRate 0.000621 Epoch: 11 Global Step: 241290 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:36:49,336-Speed 2498.00 samples/sec Loss 3.2612 LearningRate 0.000621 Epoch: 11 Global Step: 241300 Fp16 Grad Scale: 32768 Required: 135 hours Training: 2022-07-07 21:36:57,535-Speed 2498.29 samples/sec Loss 3.2445 LearningRate 0.000621 Epoch: 11 Global Step: 241310 Fp16 Grad Scale: 32768 
Required: 135 hours Training: 2022-07-07 21:37:05,732-Speed 2498.72 samples/sec Loss 3.3080 LearningRate 0.000621 Epoch: 11 Global Step: 241320 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:37:13,879-Speed 2514.31 samples/sec Loss 3.2799 LearningRate 0.000621 Epoch: 11 Global Step: 241330 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:37:22,077-Speed 2498.69 samples/sec Loss 3.2728 LearningRate 0.000621 Epoch: 11 Global Step: 241340 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:37:30,273-Speed 2498.97 samples/sec Loss 3.3113 LearningRate 0.000621 Epoch: 11 Global Step: 241350 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:37:38,472-Speed 2498.51 samples/sec Loss 3.2769 LearningRate 0.000621 Epoch: 11 Global Step: 241360 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:37:46,670-Speed 2498.43 samples/sec Loss 3.2326 LearningRate 0.000621 Epoch: 11 Global Step: 241370 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:37:54,878-Speed 2495.42 samples/sec Loss 3.2847 LearningRate 0.000621 Epoch: 11 Global Step: 241380 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:38:03,025-Speed 2514.14 samples/sec Loss 3.2498 LearningRate 0.000621 Epoch: 11 Global Step: 241390 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:38:11,225-Speed 2498.21 samples/sec Loss 3.2509 LearningRate 0.000621 Epoch: 11 Global Step: 241400 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:38:19,440-Speed 2493.31 samples/sec Loss 3.2332 LearningRate 0.000621 Epoch: 11 Global Step: 241410 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:38:27,641-Speed 2497.64 samples/sec Loss 3.2621 LearningRate 0.000621 Epoch: 11 Global Step: 241420 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:38:35,837-Speed 2499.26 samples/sec Loss 3.2349 LearningRate 0.000621 Epoch: 11 Global Step: 241430 Fp16 Grad Scale: 
32768 Required: 134 hours Training: 2022-07-07 21:38:43,992-Speed 2511.78 samples/sec Loss 3.2130 LearningRate 0.000621 Epoch: 11 Global Step: 241440 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:38:52,139-Speed 2514.20 samples/sec Loss 3.2311 LearningRate 0.000620 Epoch: 11 Global Step: 241450 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:39:00,337-Speed 2498.42 samples/sec Loss 3.2789 LearningRate 0.000620 Epoch: 11 Global Step: 241460 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:39:08,533-Speed 2499.16 samples/sec Loss 3.1829 LearningRate 0.000620 Epoch: 11 Global Step: 241470 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:39:16,728-Speed 2499.83 samples/sec Loss 3.1981 LearningRate 0.000620 Epoch: 11 Global Step: 241480 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:39:24,925-Speed 2498.95 samples/sec Loss 3.1561 LearningRate 0.000620 Epoch: 11 Global Step: 241490 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:39:33,130-Speed 2496.35 samples/sec Loss 3.3078 LearningRate 0.000620 Epoch: 11 Global Step: 241500 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:39:41,275-Speed 2515.08 samples/sec Loss 3.2591 LearningRate 0.000620 Epoch: 11 Global Step: 241510 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:39:49,472-Speed 2498.98 samples/sec Loss 3.2718 LearningRate 0.000620 Epoch: 11 Global Step: 241520 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:39:57,666-Speed 2499.74 samples/sec Loss 3.2097 LearningRate 0.000620 Epoch: 11 Global Step: 241530 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:40:05,863-Speed 2498.79 samples/sec Loss 3.2190 LearningRate 0.000620 Epoch: 11 Global Step: 241540 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:40:14,059-Speed 2499.21 samples/sec Loss 3.2563 LearningRate 0.000620 Epoch: 11 Global Step: 241550 Fp16 Grad 
Scale: 16384 Required: 134 hours Training: 2022-07-07 21:40:22,258-Speed 2498.60 samples/sec Loss 3.2331 LearningRate 0.000620 Epoch: 11 Global Step: 241560 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:40:30,401-Speed 2515.31 samples/sec Loss 3.3026 LearningRate 0.000620 Epoch: 11 Global Step: 241570 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:40:38,596-Speed 2499.67 samples/sec Loss 3.2288 LearningRate 0.000620 Epoch: 11 Global Step: 241580 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:40:46,805-Speed 2494.99 samples/sec Loss 3.2337 LearningRate 0.000620 Epoch: 11 Global Step: 241590 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:40:55,005-Speed 2497.96 samples/sec Loss 3.2496 LearningRate 0.000620 Epoch: 11 Global Step: 241600 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:41:03,199-Speed 2499.69 samples/sec Loss 3.2039 LearningRate 0.000620 Epoch: 11 Global Step: 241610 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:41:11,393-Speed 2499.69 samples/sec Loss 3.1951 LearningRate 0.000620 Epoch: 11 Global Step: 241620 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:41:19,549-Speed 2511.49 samples/sec Loss 3.2143 LearningRate 0.000620 Epoch: 11 Global Step: 241630 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:41:27,744-Speed 2499.65 samples/sec Loss 3.2306 LearningRate 0.000620 Epoch: 11 Global Step: 241640 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:41:35,945-Speed 2497.46 samples/sec Loss 3.1657 LearningRate 0.000620 Epoch: 11 Global Step: 241650 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:41:44,154-Speed 2495.38 samples/sec Loss 3.2291 LearningRate 0.000620 Epoch: 11 Global Step: 241660 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:41:52,356-Speed 2497.30 samples/sec Loss 3.2679 LearningRate 0.000620 Epoch: 11 Global Step: 241670 Fp16 
Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:42:00,556-Speed 2498.07 samples/sec Loss 3.1630 LearningRate 0.000620 Epoch: 11 Global Step: 241680 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:42:08,698-Speed 2515.42 samples/sec Loss 3.1962 LearningRate 0.000620 Epoch: 11 Global Step: 241690 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:42:16,899-Speed 2497.81 samples/sec Loss 3.2193 LearningRate 0.000620 Epoch: 11 Global Step: 241700 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:42:25,097-Speed 2498.80 samples/sec Loss 3.2523 LearningRate 0.000620 Epoch: 11 Global Step: 241710 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:42:33,299-Speed 2497.29 samples/sec Loss 3.2090 LearningRate 0.000620 Epoch: 11 Global Step: 241720 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:42:41,497-Speed 2498.52 samples/sec Loss 3.2152 LearningRate 0.000620 Epoch: 11 Global Step: 241730 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:42:49,693-Speed 2499.67 samples/sec Loss 3.2881 LearningRate 0.000620 Epoch: 11 Global Step: 241740 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:42:57,834-Speed 2516.00 samples/sec Loss 3.2036 LearningRate 0.000620 Epoch: 11 Global Step: 241750 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:43:06,039-Speed 2496.64 samples/sec Loss 3.2116 LearningRate 0.000620 Epoch: 11 Global Step: 241760 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:43:14,239-Speed 2498.26 samples/sec Loss 3.2031 LearningRate 0.000620 Epoch: 11 Global Step: 241770 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:43:22,435-Speed 2499.12 samples/sec Loss 3.2753 LearningRate 0.000620 Epoch: 11 Global Step: 241780 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:43:30,642-Speed 2495.60 samples/sec Loss 3.2355 LearningRate 0.000620 Epoch: 11 Global Step: 241790 
Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:43:38,851-Speed 2495.30 samples/sec Loss 3.2479 LearningRate 0.000620 Epoch: 11 Global Step: 241800 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:43:46,997-Speed 2514.63 samples/sec Loss 3.2840 LearningRate 0.000620 Epoch: 11 Global Step: 241810 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:43:55,191-Speed 2499.66 samples/sec Loss 3.2715 LearningRate 0.000620 Epoch: 11 Global Step: 241820 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:44:03,386-Speed 2499.51 samples/sec Loss 3.2613 LearningRate 0.000620 Epoch: 11 Global Step: 241830 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:44:11,582-Speed 2499.06 samples/sec Loss 3.3094 LearningRate 0.000620 Epoch: 11 Global Step: 241840 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:44:19,777-Speed 2499.90 samples/sec Loss 3.3000 LearningRate 0.000620 Epoch: 11 Global Step: 241850 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:44:27,970-Speed 2499.93 samples/sec Loss 3.2674 LearningRate 0.000620 Epoch: 11 Global Step: 241860 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:44:36,113-Speed 2515.53 samples/sec Loss 3.2953 LearningRate 0.000620 Epoch: 11 Global Step: 241870 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:44:44,308-Speed 2499.48 samples/sec Loss 3.2290 LearningRate 0.000620 Epoch: 11 Global Step: 241880 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:44:52,514-Speed 2496.01 samples/sec Loss 3.2522 LearningRate 0.000620 Epoch: 11 Global Step: 241890 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:45:00,712-Speed 2498.75 samples/sec Loss 3.2702 LearningRate 0.000620 Epoch: 11 Global Step: 241900 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:45:08,931-Speed 2492.12 samples/sec Loss 3.2948 LearningRate 0.000620 Epoch: 11 Global Step: 
241910 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:45:17,128-Speed 2498.66 samples/sec Loss 3.2150 LearningRate 0.000620 Epoch: 11 Global Step: 241920 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:45:25,269-Speed 2516.31 samples/sec Loss 3.2434 LearningRate 0.000619 Epoch: 11 Global Step: 241930 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:45:33,489-Speed 2491.87 samples/sec Loss 3.2075 LearningRate 0.000619 Epoch: 11 Global Step: 241940 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:45:41,694-Speed 2496.23 samples/sec Loss 3.2457 LearningRate 0.000619 Epoch: 11 Global Step: 241950 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:45:49,895-Speed 2497.85 samples/sec Loss 3.1752 LearningRate 0.000619 Epoch: 11 Global Step: 241960 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:45:58,093-Speed 2498.56 samples/sec Loss 3.2134 LearningRate 0.000619 Epoch: 11 Global Step: 241970 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:46:06,293-Speed 2498.14 samples/sec Loss 3.1814 LearningRate 0.000619 Epoch: 11 Global Step: 241980 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:46:14,436-Speed 2515.42 samples/sec Loss 3.1906 LearningRate 0.000619 Epoch: 11 Global Step: 241990 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:46:22,630-Speed 2499.57 samples/sec Loss 3.1954 LearningRate 0.000619 Epoch: 11 Global Step: 242000 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:46:30,826-Speed 2499.04 samples/sec Loss 3.2125 LearningRate 0.000619 Epoch: 11 Global Step: 242010 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:46:39,027-Speed 2497.74 samples/sec Loss 3.2471 LearningRate 0.000619 Epoch: 11 Global Step: 242020 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:46:47,254-Speed 2489.68 samples/sec Loss 3.1545 LearningRate 0.000619 Epoch: 11 Global 
Step: 242030 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:46:55,452-Speed 2498.61 samples/sec Loss 3.2642 LearningRate 0.000619 Epoch: 11 Global Step: 242040 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:47:03,597-Speed 2514.97 samples/sec Loss 3.3009 LearningRate 0.000619 Epoch: 11 Global Step: 242050 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:47:11,796-Speed 2498.28 samples/sec Loss 3.2177 LearningRate 0.000619 Epoch: 11 Global Step: 242060 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:47:19,999-Speed 2497.11 samples/sec Loss 3.2052 LearningRate 0.000619 Epoch: 11 Global Step: 242070 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:47:28,207-Speed 2495.85 samples/sec Loss 3.2608 LearningRate 0.000619 Epoch: 11 Global Step: 242080 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:47:36,406-Speed 2498.00 samples/sec Loss 3.2870 LearningRate 0.000619 Epoch: 11 Global Step: 242090 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:47:44,606-Speed 2498.03 samples/sec Loss 3.2299 LearningRate 0.000619 Epoch: 11 Global Step: 242100 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:47:52,754-Speed 2513.79 samples/sec Loss 3.2053 LearningRate 0.000619 Epoch: 11 Global Step: 242110 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:48:00,953-Speed 2498.18 samples/sec Loss 3.2701 LearningRate 0.000619 Epoch: 11 Global Step: 242120 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:48:09,151-Speed 2498.74 samples/sec Loss 3.2198 LearningRate 0.000619 Epoch: 11 Global Step: 242130 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:48:17,348-Speed 2498.85 samples/sec Loss 3.2370 LearningRate 0.000619 Epoch: 11 Global Step: 242140 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:48:25,544-Speed 2498.95 samples/sec Loss 3.2080 LearningRate 0.000619 Epoch: 11 
Global Step: 242150 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:48:33,746-Speed 2497.80 samples/sec Loss 3.2241 LearningRate 0.000619 Epoch: 11 Global Step: 242160 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:48:41,894-Speed 2513.94 samples/sec Loss 3.3202 LearningRate 0.000619 Epoch: 11 Global Step: 242170 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:48:50,099-Speed 2496.45 samples/sec Loss 3.2600 LearningRate 0.000619 Epoch: 11 Global Step: 242180 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:48:58,308-Speed 2495.15 samples/sec Loss 3.3485 LearningRate 0.000619 Epoch: 11 Global Step: 242190 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:49:06,513-Speed 2496.45 samples/sec Loss 3.2797 LearningRate 0.000619 Epoch: 11 Global Step: 242200 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:49:14,721-Speed 2495.72 samples/sec Loss 3.2300 LearningRate 0.000619 Epoch: 11 Global Step: 242210 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:49:22,920-Speed 2498.03 samples/sec Loss 3.2842 LearningRate 0.000619 Epoch: 11 Global Step: 242220 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:49:31,065-Speed 2514.91 samples/sec Loss 3.2559 LearningRate 0.000619 Epoch: 11 Global Step: 242230 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:49:39,265-Speed 2497.92 samples/sec Loss 3.2460 LearningRate 0.000619 Epoch: 11 Global Step: 242240 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:49:47,473-Speed 2495.86 samples/sec Loss 3.2606 LearningRate 0.000619 Epoch: 11 Global Step: 242250 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:49:55,671-Speed 2498.43 samples/sec Loss 3.1672 LearningRate 0.000619 Epoch: 11 Global Step: 242260 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:50:03,870-Speed 2498.16 samples/sec Loss 3.2135 LearningRate 0.000619 
Epoch: 11 Global Step: 242270 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:50:12,070-Speed 2498.32 samples/sec Loss 3.1963 LearningRate 0.000619 Epoch: 11 Global Step: 242280 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:50:20,209-Speed 2516.80 samples/sec Loss 3.2027 LearningRate 0.000619 Epoch: 11 Global Step: 242290 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:50:28,403-Speed 2499.47 samples/sec Loss 3.1997 LearningRate 0.000619 Epoch: 11 Global Step: 242300 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:50:36,599-Speed 2499.43 samples/sec Loss 3.2312 LearningRate 0.000619 Epoch: 11 Global Step: 242310 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:50:44,793-Speed 2499.73 samples/sec Loss 3.2331 LearningRate 0.000619 Epoch: 11 Global Step: 242320 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:50:52,986-Speed 2500.16 samples/sec Loss 3.2279 LearningRate 0.000619 Epoch: 11 Global Step: 242330 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:51:01,185-Speed 2498.08 samples/sec Loss 3.1408 LearningRate 0.000619 Epoch: 11 Global Step: 242340 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:51:09,342-Speed 2511.30 samples/sec Loss 3.1783 LearningRate 0.000619 Epoch: 11 Global Step: 242350 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:51:17,540-Speed 2498.76 samples/sec Loss 3.1620 LearningRate 0.000619 Epoch: 11 Global Step: 242360 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:51:25,739-Speed 2498.32 samples/sec Loss 3.2194 LearningRate 0.000619 Epoch: 11 Global Step: 242370 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:51:33,937-Speed 2498.34 samples/sec Loss 3.2484 LearningRate 0.000619 Epoch: 11 Global Step: 242380 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:51:42,141-Speed 2496.94 samples/sec Loss 3.1846 LearningRate 
0.000619 Epoch: 11 Global Step: 242390 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:51:50,342-Speed 2497.53 samples/sec Loss 3.2387 LearningRate 0.000618 Epoch: 11 Global Step: 242400 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:51:58,489-Speed 2514.40 samples/sec Loss 3.1964 LearningRate 0.000618 Epoch: 11 Global Step: 242410 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:52:06,688-Speed 2498.21 samples/sec Loss 3.1893 LearningRate 0.000618 Epoch: 11 Global Step: 242420 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:52:14,888-Speed 2498.38 samples/sec Loss 3.1845 LearningRate 0.000618 Epoch: 11 Global Step: 242430 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:52:23,091-Speed 2497.13 samples/sec Loss 3.2591 LearningRate 0.000618 Epoch: 11 Global Step: 242440 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:52:31,289-Speed 2498.42 samples/sec Loss 3.2824 LearningRate 0.000618 Epoch: 11 Global Step: 242450 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:52:39,488-Speed 2498.38 samples/sec Loss 3.2114 LearningRate 0.000618 Epoch: 11 Global Step: 242460 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:52:47,637-Speed 2513.59 samples/sec Loss 3.2354 LearningRate 0.000618 Epoch: 11 Global Step: 242470 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:52:55,841-Speed 2496.74 samples/sec Loss 3.1347 LearningRate 0.000618 Epoch: 11 Global Step: 242480 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:53:04,040-Speed 2498.31 samples/sec Loss 3.2455 LearningRate 0.000618 Epoch: 11 Global Step: 242490 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:53:12,242-Speed 2497.69 samples/sec Loss 3.1513 LearningRate 0.000618 Epoch: 11 Global Step: 242500 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:53:20,466-Speed 2490.59 samples/sec Loss 3.1800 
LearningRate 0.000618 Epoch: 11 Global Step: 242510 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:53:28,663-Speed 2498.62 samples/sec Loss 3.2006 LearningRate 0.000618 Epoch: 11 Global Step: 242520 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:53:36,812-Speed 2513.86 samples/sec Loss 3.2444 LearningRate 0.000618 Epoch: 11 Global Step: 242530 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:53:45,016-Speed 2496.98 samples/sec Loss 3.2180 LearningRate 0.000618 Epoch: 11 Global Step: 242540 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:53:53,217-Speed 2497.53 samples/sec Loss 3.2100 LearningRate 0.000618 Epoch: 11 Global Step: 242550 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:54:01,412-Speed 2499.40 samples/sec Loss 3.1886 LearningRate 0.000618 Epoch: 11 Global Step: 242560 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:54:09,612-Speed 2497.94 samples/sec Loss 3.1731 LearningRate 0.000618 Epoch: 11 Global Step: 242570 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:54:17,809-Speed 2498.88 samples/sec Loss 3.2755 LearningRate 0.000618 Epoch: 11 Global Step: 242580 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:54:25,953-Speed 2515.16 samples/sec Loss 3.1867 LearningRate 0.000618 Epoch: 11 Global Step: 242590 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:54:34,153-Speed 2497.92 samples/sec Loss 3.2061 LearningRate 0.000618 Epoch: 11 Global Step: 242600 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:54:42,355-Speed 2497.24 samples/sec Loss 3.2005 LearningRate 0.000618 Epoch: 11 Global Step: 242610 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:54:50,555-Speed 2497.99 samples/sec Loss 3.1771 LearningRate 0.000618 Epoch: 11 Global Step: 242620 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:54:58,752-Speed 2498.97 samples/sec Loss 
3.2723 LearningRate 0.000618 Epoch: 11 Global Step: 242630 Fp16 Grad Scale: 16384 Required: 134 hours Training: 2022-07-07 21:55:06,965-Speed 2493.94 samples/sec Loss 3.2417 LearningRate 0.000618 Epoch: 11 Global Step: 242640 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:55:15,112-Speed 2514.12 samples/sec Loss 3.2004 LearningRate 0.000618 Epoch: 11 Global Step: 242650 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:55:23,313-Speed 2497.68 samples/sec Loss 3.2494 LearningRate 0.000618 Epoch: 11 Global Step: 242660 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:55:31,519-Speed 2496.26 samples/sec Loss 3.1849 LearningRate 0.000618 Epoch: 11 Global Step: 242670 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:55:39,718-Speed 2498.20 samples/sec Loss 3.2038 LearningRate 0.000618 Epoch: 11 Global Step: 242680 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:55:47,924-Speed 2495.99 samples/sec Loss 3.1613 LearningRate 0.000618 Epoch: 11 Global Step: 242690 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:55:56,124-Speed 2497.90 samples/sec Loss 3.2233 LearningRate 0.000618 Epoch: 11 Global Step: 242700 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:56:04,270-Speed 2514.92 samples/sec Loss 3.1791 LearningRate 0.000618 Epoch: 11 Global Step: 242710 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:56:12,484-Speed 2493.87 samples/sec Loss 3.1464 LearningRate 0.000618 Epoch: 11 Global Step: 242720 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:56:20,683-Speed 2498.19 samples/sec Loss 3.1541 LearningRate 0.000618 Epoch: 11 Global Step: 242730 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:56:28,892-Speed 2495.16 samples/sec Loss 3.2273 LearningRate 0.000618 Epoch: 11 Global Step: 242740 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:56:37,095-Speed 2497.45 samples/sec 
Loss 3.2143 LearningRate 0.000618 Epoch: 11 Global Step: 242750 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:56:45,294-Speed 2498.08 samples/sec Loss 3.2322 LearningRate 0.000618 Epoch: 11 Global Step: 242760 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:56:53,441-Speed 2514.01 samples/sec Loss 3.2391 LearningRate 0.000618 Epoch: 11 Global Step: 242770 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:57:01,643-Speed 2497.41 samples/sec Loss 3.1922 LearningRate 0.000618 Epoch: 11 Global Step: 242780 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:57:09,844-Speed 2497.64 samples/sec Loss 3.2092 LearningRate 0.000618 Epoch: 11 Global Step: 242790 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:57:18,049-Speed 2496.45 samples/sec Loss 3.1892 LearningRate 0.000618 Epoch: 11 Global Step: 242800 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:57:26,246-Speed 2498.78 samples/sec Loss 3.2055 LearningRate 0.000618 Epoch: 11 Global Step: 242810 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:57:34,447-Speed 2497.93 samples/sec Loss 3.1483 LearningRate 0.000618 Epoch: 11 Global Step: 242820 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:57:42,611-Speed 2508.98 samples/sec Loss 3.1782 LearningRate 0.000618 Epoch: 11 Global Step: 242830 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:57:50,811-Speed 2497.91 samples/sec Loss 3.2163 LearningRate 0.000618 Epoch: 11 Global Step: 242840 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:57:59,024-Speed 2494.15 samples/sec Loss 3.2109 LearningRate 0.000618 Epoch: 11 Global Step: 242850 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:58:07,223-Speed 2498.32 samples/sec Loss 3.2502 LearningRate 0.000618 Epoch: 11 Global Step: 242860 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:58:15,433-Speed 2494.97 
samples/sec Loss 3.2556 LearningRate 0.000618 Epoch: 11 Global Step: 242870 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:58:23,633-Speed 2498.12 samples/sec Loss 3.3877 LearningRate 0.000617 Epoch: 11 Global Step: 242880 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:58:31,776-Speed 2515.22 samples/sec Loss 3.3632 LearningRate 0.000617 Epoch: 11 Global Step: 242890 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:58:39,975-Speed 2498.23 samples/sec Loss 3.3165 LearningRate 0.000617 Epoch: 11 Global Step: 242900 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:58:48,173-Speed 2498.44 samples/sec Loss 3.3449 LearningRate 0.000617 Epoch: 11 Global Step: 242910 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:58:56,369-Speed 2499.30 samples/sec Loss 3.2330 LearningRate 0.000617 Epoch: 11 Global Step: 242920 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:59:04,567-Speed 2498.71 samples/sec Loss 3.2433 LearningRate 0.000617 Epoch: 11 Global Step: 242930 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:59:12,766-Speed 2498.34 samples/sec Loss 3.2959 LearningRate 0.000617 Epoch: 11 Global Step: 242940 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:59:20,910-Speed 2515.03 samples/sec Loss 3.2482 LearningRate 0.000617 Epoch: 11 Global Step: 242950 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:59:29,109-Speed 2498.11 samples/sec Loss 3.2533 LearningRate 0.000617 Epoch: 11 Global Step: 242960 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:59:37,308-Speed 2498.38 samples/sec Loss 3.2309 LearningRate 0.000617 Epoch: 11 Global Step: 242970 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:59:45,507-Speed 2498.71 samples/sec Loss 3.2277 LearningRate 0.000617 Epoch: 11 Global Step: 242980 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 21:59:53,706-Speed 
2498.14 samples/sec Loss 3.2209 LearningRate 0.000617 Epoch: 11 Global Step: 242990 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:00:01,906-Speed 2497.82 samples/sec Loss 3.1787 LearningRate 0.000617 Epoch: 11 Global Step: 243000 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:00:10,065-Speed 2510.68 samples/sec Loss 3.2567 LearningRate 0.000617 Epoch: 11 Global Step: 243010 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:00:18,264-Speed 2498.59 samples/sec Loss 3.2285 LearningRate 0.000617 Epoch: 11 Global Step: 243020 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:00:26,462-Speed 2498.43 samples/sec Loss 3.3060 LearningRate 0.000617 Epoch: 11 Global Step: 243030 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:00:34,660-Speed 2498.73 samples/sec Loss 3.2721 LearningRate 0.000617 Epoch: 11 Global Step: 243040 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:00:42,856-Speed 2498.91 samples/sec Loss 3.2197 LearningRate 0.000617 Epoch: 11 Global Step: 243050 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:00:51,059-Speed 2497.46 samples/sec Loss 3.2557 LearningRate 0.000617 Epoch: 11 Global Step: 243060 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:00:59,204-Speed 2514.60 samples/sec Loss 3.2521 LearningRate 0.000617 Epoch: 11 Global Step: 243070 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:01:07,403-Speed 2498.30 samples/sec Loss 3.2254 LearningRate 0.000617 Epoch: 11 Global Step: 243080 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:01:15,603-Speed 2498.09 samples/sec Loss 3.1991 LearningRate 0.000617 Epoch: 11 Global Step: 243090 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:01:23,806-Speed 2496.97 samples/sec Loss 3.3288 LearningRate 0.000617 Epoch: 11 Global Step: 243100 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 
22:01:32,001-Speed 2499.45 samples/sec Loss 3.3670 LearningRate 0.000617 Epoch: 11 Global Step: 243110 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:01:40,204-Speed 2496.87 samples/sec Loss 3.3099 LearningRate 0.000617 Epoch: 11 Global Step: 243120 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:01:48,347-Speed 2515.41 samples/sec Loss 3.3061 LearningRate 0.000617 Epoch: 11 Global Step: 243130 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:01:56,547-Speed 2497.90 samples/sec Loss 3.2775 LearningRate 0.000617 Epoch: 11 Global Step: 243140 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:02:04,747-Speed 2497.93 samples/sec Loss 3.2635 LearningRate 0.000617 Epoch: 11 Global Step: 243150 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:02:12,944-Speed 2498.63 samples/sec Loss 3.2577 LearningRate 0.000617 Epoch: 11 Global Step: 243160 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:02:21,146-Speed 2497.32 samples/sec Loss 3.3069 LearningRate 0.000617 Epoch: 11 Global Step: 243170 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:02:29,350-Speed 2496.88 samples/sec Loss 3.2590 LearningRate 0.000617 Epoch: 11 Global Step: 243180 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:02:37,496-Speed 2514.53 samples/sec Loss 3.2808 LearningRate 0.000617 Epoch: 11 Global Step: 243190 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:02:45,696-Speed 2497.90 samples/sec Loss 3.1703 LearningRate 0.000617 Epoch: 11 Global Step: 243200 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:02:53,895-Speed 2498.32 samples/sec Loss 3.2314 LearningRate 0.000617 Epoch: 11 Global Step: 243210 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:03:02,091-Speed 2498.92 samples/sec Loss 3.2691 LearningRate 0.000617 Epoch: 11 Global Step: 243220 Fp16 Grad Scale: 32768 Required: 134 hours Training: 
2022-07-07 22:03:10,288-Speed 2498.87 samples/sec Loss 3.2692 LearningRate 0.000617 Epoch: 11 Global Step: 243230 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:03:18,485-Speed 2498.86 samples/sec Loss 3.2464 LearningRate 0.000617 Epoch: 11 Global Step: 243240 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:03:26,654-Speed 2507.50 samples/sec Loss 3.1906 LearningRate 0.000617 Epoch: 11 Global Step: 243250 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:03:34,853-Speed 2498.03 samples/sec Loss 3.2356 LearningRate 0.000617 Epoch: 11 Global Step: 243260 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:03:43,051-Speed 2498.52 samples/sec Loss 3.2289 LearningRate 0.000617 Epoch: 11 Global Step: 243270 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:03:51,253-Speed 2497.41 samples/sec Loss 3.2647 LearningRate 0.000617 Epoch: 11 Global Step: 243280 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:03:59,449-Speed 2499.27 samples/sec Loss 3.2223 LearningRate 0.000617 Epoch: 11 Global Step: 243290 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:04:07,648-Speed 2498.30 samples/sec Loss 3.2052 LearningRate 0.000617 Epoch: 11 Global Step: 243300 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:04:15,795-Speed 2514.12 samples/sec Loss 3.2116 LearningRate 0.000617 Epoch: 11 Global Step: 243310 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:04:23,995-Speed 2497.89 samples/sec Loss 3.2054 LearningRate 0.000617 Epoch: 11 Global Step: 243320 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:04:32,197-Speed 2497.64 samples/sec Loss 3.2237 LearningRate 0.000617 Epoch: 11 Global Step: 243330 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:04:40,396-Speed 2498.17 samples/sec Loss 3.2355 LearningRate 0.000617 Epoch: 11 Global Step: 243340 Fp16 Grad Scale: 32768 Required: 134 hours 
Training: 2022-07-07 22:04:48,597-Speed 2498.48 samples/sec Loss 3.2159 LearningRate 0.000616 Epoch: 11 Global Step: 243350 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:04:56,799-Speed 2497.41 samples/sec Loss 3.2119 LearningRate 0.000616 Epoch: 11 Global Step: 243360 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:05:04,951-Speed 2512.55 samples/sec Loss 3.2074 LearningRate 0.000616 Epoch: 11 Global Step: 243370 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:05:13,162-Speed 2494.72 samples/sec Loss 3.2344 LearningRate 0.000616 Epoch: 11 Global Step: 243380 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:05:21,364-Speed 2497.32 samples/sec Loss 3.1882 LearningRate 0.000616 Epoch: 11 Global Step: 243390 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:05:29,565-Speed 2497.55 samples/sec Loss 3.1391 LearningRate 0.000616 Epoch: 11 Global Step: 243400 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:05:37,765-Speed 2498.22 samples/sec Loss 3.1371 LearningRate 0.000616 Epoch: 11 Global Step: 243410 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:05:45,963-Speed 2498.70 samples/sec Loss 3.2186 LearningRate 0.000616 Epoch: 11 Global Step: 243420 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:05:54,129-Speed 2508.46 samples/sec Loss 3.2104 LearningRate 0.000616 Epoch: 11 Global Step: 243430 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:06:02,332-Speed 2497.01 samples/sec Loss 3.2344 LearningRate 0.000616 Epoch: 11 Global Step: 243440 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:06:10,531-Speed 2498.36 samples/sec Loss 3.1879 LearningRate 0.000616 Epoch: 11 Global Step: 243450 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:06:18,730-Speed 2498.28 samples/sec Loss 3.1851 LearningRate 0.000616 Epoch: 11 Global Step: 243460 Fp16 Grad Scale: 32768 Required: 134 
hours Training: 2022-07-07 22:06:26,942-Speed 2494.12 samples/sec Loss 3.1935 LearningRate 0.000616 Epoch: 11 Global Step: 243470 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:06:35,137-Speed 2499.50 samples/sec Loss 3.1737 LearningRate 0.000616 Epoch: 11 Global Step: 243480 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:06:43,286-Speed 2513.93 samples/sec Loss 3.1948 LearningRate 0.000616 Epoch: 11 Global Step: 243490 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:06:51,483-Speed 2498.76 samples/sec Loss 3.1791 LearningRate 0.000616 Epoch: 11 Global Step: 243500 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:06:59,698-Speed 2493.18 samples/sec Loss 3.2428 LearningRate 0.000616 Epoch: 11 Global Step: 243510 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:07:07,908-Speed 2494.88 samples/sec Loss 3.2180 LearningRate 0.000616 Epoch: 11 Global Step: 243520 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:07:16,107-Speed 2498.53 samples/sec Loss 3.1741 LearningRate 0.000616 Epoch: 11 Global Step: 243530 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:07:24,306-Speed 2497.92 samples/sec Loss 3.1954 LearningRate 0.000616 Epoch: 11 Global Step: 243540 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:07:32,455-Speed 2513.63 samples/sec Loss 3.1666 LearningRate 0.000616 Epoch: 11 Global Step: 243550 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:07:40,653-Speed 2498.59 samples/sec Loss 3.1831 LearningRate 0.000616 Epoch: 11 Global Step: 243560 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:07:48,864-Speed 2494.60 samples/sec Loss 3.1866 LearningRate 0.000616 Epoch: 11 Global Step: 243570 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:07:57,084-Speed 2491.66 samples/sec Loss 3.2127 LearningRate 0.000616 Epoch: 11 Global Step: 243580 Fp16 Grad Scale: 32768 Required: 
134 hours Training: 2022-07-07 22:08:05,285-Speed 2497.60 samples/sec Loss 3.1791 LearningRate 0.000616 Epoch: 11 Global Step: 243590 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:08:13,489-Speed 2497.10 samples/sec Loss 3.1607 LearningRate 0.000616 Epoch: 11 Global Step: 243600 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:08:21,635-Speed 2514.49 samples/sec Loss 3.2080 LearningRate 0.000616 Epoch: 11 Global Step: 243610 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:08:29,831-Speed 2499.04 samples/sec Loss 3.1797 LearningRate 0.000616 Epoch: 11 Global Step: 243620 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:08:38,029-Speed 2498.64 samples/sec Loss 3.2023 LearningRate 0.000616 Epoch: 11 Global Step: 243630 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:08:46,227-Speed 2498.59 samples/sec Loss 3.1755 LearningRate 0.000616 Epoch: 11 Global Step: 243640 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:08:54,424-Speed 2498.92 samples/sec Loss 3.1670 LearningRate 0.000616 Epoch: 11 Global Step: 243650 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:09:02,623-Speed 2498.14 samples/sec Loss 3.1805 LearningRate 0.000616 Epoch: 11 Global Step: 243660 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:09:10,771-Speed 2514.07 samples/sec Loss 3.2148 LearningRate 0.000616 Epoch: 11 Global Step: 243670 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:09:18,980-Speed 2495.25 samples/sec Loss 3.2377 LearningRate 0.000616 Epoch: 11 Global Step: 243680 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:09:27,177-Speed 2498.81 samples/sec Loss 3.2077 LearningRate 0.000616 Epoch: 11 Global Step: 243690 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:09:35,374-Speed 2498.78 samples/sec Loss 3.1470 LearningRate 0.000616 Epoch: 11 Global Step: 243700 Fp16 Grad Scale: 32768 
Required: 134 hours Training: 2022-07-07 22:09:43,572-Speed 2498.64 samples/sec Loss 3.1576 LearningRate 0.000616 Epoch: 11 Global Step: 243710 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:09:51,787-Speed 2493.00 samples/sec Loss 3.1633 LearningRate 0.000616 Epoch: 11 Global Step: 243720 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:09:59,933-Speed 2514.99 samples/sec Loss 3.2032 LearningRate 0.000616 Epoch: 11 Global Step: 243730 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:10:08,133-Speed 2497.76 samples/sec Loss 3.1584 LearningRate 0.000616 Epoch: 11 Global Step: 243740 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:10:16,333-Speed 2497.99 samples/sec Loss 3.1782 LearningRate 0.000616 Epoch: 11 Global Step: 243750 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:10:24,532-Speed 2498.48 samples/sec Loss 3.2509 LearningRate 0.000616 Epoch: 11 Global Step: 243760 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:10:32,731-Speed 2498.24 samples/sec Loss 3.2199 LearningRate 0.000616 Epoch: 11 Global Step: 243770 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:10:40,929-Speed 2498.93 samples/sec Loss 3.2176 LearningRate 0.000616 Epoch: 11 Global Step: 243780 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:10:49,081-Speed 2512.71 samples/sec Loss 3.2332 LearningRate 0.000616 Epoch: 11 Global Step: 243790 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:10:57,280-Speed 2498.41 samples/sec Loss 3.2632 LearningRate 0.000616 Epoch: 11 Global Step: 243800 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:11:05,480-Speed 2497.95 samples/sec Loss 3.2016 LearningRate 0.000616 Epoch: 11 Global Step: 243810 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:11:13,676-Speed 2499.45 samples/sec Loss 3.1665 LearningRate 0.000616 Epoch: 11 Global Step: 243820 Fp16 Grad Scale: 
32768 Required: 134 hours Training: 2022-07-07 22:11:21,876-Speed 2498.09 samples/sec Loss 3.2210 LearningRate 0.000615 Epoch: 11 Global Step: 243830 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:11:30,074-Speed 2498.52 samples/sec Loss 3.2850 LearningRate 0.000615 Epoch: 11 Global Step: 243840 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:11:38,219-Speed 2514.70 samples/sec Loss 3.1944 LearningRate 0.000615 Epoch: 11 Global Step: 243850 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:11:46,416-Speed 2498.80 samples/sec Loss 3.2637 LearningRate 0.000615 Epoch: 11 Global Step: 243860 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:11:54,612-Speed 2499.26 samples/sec Loss 3.3098 LearningRate 0.000615 Epoch: 11 Global Step: 243870 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:12:02,813-Speed 2497.68 samples/sec Loss 3.3089 LearningRate 0.000615 Epoch: 11 Global Step: 243880 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:12:11,033-Speed 2492.11 samples/sec Loss 3.3294 LearningRate 0.000615 Epoch: 11 Global Step: 243890 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:12:19,231-Speed 2498.52 samples/sec Loss 3.2194 LearningRate 0.000615 Epoch: 11 Global Step: 243900 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:12:27,376-Speed 2514.78 samples/sec Loss 3.2479 LearningRate 0.000615 Epoch: 11 Global Step: 243910 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:12:35,572-Speed 2499.27 samples/sec Loss 3.3031 LearningRate 0.000615 Epoch: 11 Global Step: 243920 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:12:43,771-Speed 2498.42 samples/sec Loss 3.2867 LearningRate 0.000615 Epoch: 11 Global Step: 243930 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:12:51,972-Speed 2497.49 samples/sec Loss 3.2103 LearningRate 0.000615 Epoch: 11 Global Step: 243940 Fp16 Grad 
Scale: 65536 Required: 134 hours Training: 2022-07-07 22:13:00,174-Speed 2497.27 samples/sec Loss 3.3037 LearningRate 0.000615 Epoch: 11 Global Step: 243950 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:13:08,372-Speed 2498.60 samples/sec Loss 3.2523 LearningRate 0.000615 Epoch: 11 Global Step: 243960 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:13:16,516-Speed 2514.93 samples/sec Loss 3.1821 LearningRate 0.000615 Epoch: 11 Global Step: 243970 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:13:24,670-Speed 2512.19 samples/sec Loss 3.2007 LearningRate 0.000615 Epoch: 11 Global Step: 243980 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:13:32,868-Speed 2498.52 samples/sec Loss 3.2331 LearningRate 0.000615 Epoch: 11 Global Step: 243990 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:13:41,073-Speed 2496.51 samples/sec Loss 3.1725 LearningRate 0.000615 Epoch: 11 Global Step: 244000 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:13:49,273-Speed 2497.81 samples/sec Loss 3.1742 LearningRate 0.000615 Epoch: 11 Global Step: 244010 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:13:57,478-Speed 2496.78 samples/sec Loss 3.2977 LearningRate 0.000615 Epoch: 11 Global Step: 244020 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:14:05,623-Speed 2514.63 samples/sec Loss 3.2831 LearningRate 0.000615 Epoch: 11 Global Step: 244030 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:14:13,823-Speed 2497.92 samples/sec Loss 3.2298 LearningRate 0.000615 Epoch: 11 Global Step: 244040 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:14:22,021-Speed 2498.99 samples/sec Loss 3.2396 LearningRate 0.000615 Epoch: 11 Global Step: 244050 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:14:30,218-Speed 2498.75 samples/sec Loss 3.2701 LearningRate 0.000615 Epoch: 11 Global Step: 244060 Fp16 
Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:14:38,425-Speed 2495.90 samples/sec Loss 3.2069 LearningRate 0.000615 Epoch: 11 Global Step: 244070 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:14:46,625-Speed 2497.82 samples/sec Loss 3.2448 LearningRate 0.000615 Epoch: 11 Global Step: 244080 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:14:54,774-Speed 2513.75 samples/sec Loss 3.1827 LearningRate 0.000615 Epoch: 11 Global Step: 244090 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:15:02,977-Speed 2497.12 samples/sec Loss 3.1826 LearningRate 0.000615 Epoch: 11 Global Step: 244100 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:15:11,175-Speed 2498.23 samples/sec Loss 3.1837 LearningRate 0.000615 Epoch: 11 Global Step: 244110 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:15:19,378-Speed 2497.15 samples/sec Loss 3.1367 LearningRate 0.000615 Epoch: 11 Global Step: 244120 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:15:27,588-Speed 2495.14 samples/sec Loss 3.1791 LearningRate 0.000615 Epoch: 11 Global Step: 244130 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:15:35,790-Speed 2497.19 samples/sec Loss 3.2218 LearningRate 0.000615 Epoch: 11 Global Step: 244140 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:15:43,937-Speed 2514.44 samples/sec Loss 3.2072 LearningRate 0.000615 Epoch: 11 Global Step: 244150 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:15:52,136-Speed 2497.95 samples/sec Loss 3.1487 LearningRate 0.000615 Epoch: 11 Global Step: 244160 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:16:00,335-Speed 2498.31 samples/sec Loss 3.1882 LearningRate 0.000615 Epoch: 11 Global Step: 244170 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:16:08,536-Speed 2498.00 samples/sec Loss 3.1950 LearningRate 0.000615 Epoch: 11 Global Step: 244180 
Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:16:16,734-Speed 2498.54 samples/sec Loss 3.1778 LearningRate 0.000615 Epoch: 11 Global Step: 244190 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:16:24,931-Speed 2498.67 samples/sec Loss 3.2792 LearningRate 0.000615 Epoch: 11 Global Step: 244200 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:16:33,090-Speed 2510.78 samples/sec Loss 3.2688 LearningRate 0.000615 Epoch: 11 Global Step: 244210 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:16:41,288-Speed 2498.78 samples/sec Loss 3.2719 LearningRate 0.000615 Epoch: 11 Global Step: 244220 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:16:49,487-Speed 2498.29 samples/sec Loss 3.2740 LearningRate 0.000615 Epoch: 11 Global Step: 244230 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:16:57,686-Speed 2498.44 samples/sec Loss 3.2870 LearningRate 0.000615 Epoch: 11 Global Step: 244240 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:17:05,886-Speed 2498.07 samples/sec Loss 3.2471 LearningRate 0.000615 Epoch: 11 Global Step: 244250 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:17:14,086-Speed 2497.83 samples/sec Loss 3.2131 LearningRate 0.000615 Epoch: 11 Global Step: 244260 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:17:22,232-Speed 2514.61 samples/sec Loss 3.2273 LearningRate 0.000615 Epoch: 11 Global Step: 244270 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:17:30,435-Speed 2497.08 samples/sec Loss 3.2446 LearningRate 0.000615 Epoch: 11 Global Step: 244280 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:17:38,633-Speed 2498.59 samples/sec Loss 3.2190 LearningRate 0.000615 Epoch: 11 Global Step: 244290 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:17:46,833-Speed 2498.05 samples/sec Loss 3.2543 LearningRate 0.000614 Epoch: 11 Global Step: 
244300 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:17:55,035-Speed 2497.14 samples/sec Loss 3.2332 LearningRate 0.000614 Epoch: 11 Global Step: 244310 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:18:03,238-Speed 2496.95 samples/sec Loss 3.1456 LearningRate 0.000614 Epoch: 11 Global Step: 244320 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:18:11,384-Speed 2514.48 samples/sec Loss 3.2774 LearningRate 0.000614 Epoch: 11 Global Step: 244330 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:18:19,589-Speed 2496.44 samples/sec Loss 3.2288 LearningRate 0.000614 Epoch: 11 Global Step: 244340 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:18:27,790-Speed 2497.73 samples/sec Loss 3.2314 LearningRate 0.000614 Epoch: 11 Global Step: 244350 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:18:35,990-Speed 2497.93 samples/sec Loss 3.2698 LearningRate 0.000614 Epoch: 11 Global Step: 244360 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:18:44,189-Speed 2498.27 samples/sec Loss 3.1938 LearningRate 0.000614 Epoch: 11 Global Step: 244370 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:18:52,387-Speed 2498.49 samples/sec Loss 3.2450 LearningRate 0.000614 Epoch: 11 Global Step: 244380 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:19:00,532-Speed 2514.79 samples/sec Loss 3.2324 LearningRate 0.000614 Epoch: 11 Global Step: 244390 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:19:08,735-Speed 2497.31 samples/sec Loss 3.1874 LearningRate 0.000614 Epoch: 11 Global Step: 244400 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:19:16,933-Speed 2498.44 samples/sec Loss 3.2193 LearningRate 0.000614 Epoch: 11 Global Step: 244410 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:19:25,134-Speed 2497.82 samples/sec Loss 3.2546 LearningRate 0.000614 Epoch: 11 Global 
Step: 244420 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:19:33,333-Speed 2498.09 samples/sec Loss 3.2362 LearningRate 0.000614 Epoch: 11 Global Step: 244430 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:19:41,535-Speed 2497.55 samples/sec Loss 3.3322 LearningRate 0.000614 Epoch: 11 Global Step: 244440 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:19:49,679-Speed 2514.90 samples/sec Loss 3.2321 LearningRate 0.000614 Epoch: 11 Global Step: 244450 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:19:57,881-Speed 2497.30 samples/sec Loss 3.1871 LearningRate 0.000614 Epoch: 11 Global Step: 244460 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:20:06,079-Speed 2498.54 samples/sec Loss 3.2223 LearningRate 0.000614 Epoch: 11 Global Step: 244470 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:20:14,280-Speed 2497.92 samples/sec Loss 3.2513 LearningRate 0.000614 Epoch: 11 Global Step: 244480 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:20:22,478-Speed 2498.28 samples/sec Loss 3.1976 LearningRate 0.000614 Epoch: 11 Global Step: 244490 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:20:30,675-Speed 2498.89 samples/sec Loss 3.2129 LearningRate 0.000614 Epoch: 11 Global Step: 244500 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:20:38,823-Speed 2514.07 samples/sec Loss 3.2152 LearningRate 0.000614 Epoch: 11 Global Step: 244510 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:20:47,032-Speed 2495.12 samples/sec Loss 3.2605 LearningRate 0.000614 Epoch: 11 Global Step: 244520 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:20:55,231-Speed 2498.38 samples/sec Loss 3.2118 LearningRate 0.000614 Epoch: 11 Global Step: 244530 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:21:03,433-Speed 2497.45 samples/sec Loss 3.1996 LearningRate 0.000614 Epoch: 11 
Global Step: 244540 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:21:11,630-Speed 2498.82 samples/sec Loss 3.1245 LearningRate 0.000614 Epoch: 11 Global Step: 244550 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:21:19,829-Speed 2498.33 samples/sec Loss 3.2772 LearningRate 0.000614 Epoch: 11 Global Step: 244560 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:21:27,970-Speed 2515.97 samples/sec Loss 3.2498 LearningRate 0.000614 Epoch: 11 Global Step: 244570 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:21:36,170-Speed 2497.87 samples/sec Loss 3.1917 LearningRate 0.000614 Epoch: 11 Global Step: 244580 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:21:44,392-Speed 2491.53 samples/sec Loss 3.2596 LearningRate 0.000614 Epoch: 11 Global Step: 244590 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:21:52,590-Speed 2498.39 samples/sec Loss 3.1958 LearningRate 0.000614 Epoch: 11 Global Step: 244600 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:22:00,785-Speed 2499.34 samples/sec Loss 3.2788 LearningRate 0.000614 Epoch: 11 Global Step: 244610 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:22:08,986-Speed 2497.88 samples/sec Loss 3.1867 LearningRate 0.000614 Epoch: 11 Global Step: 244620 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:22:17,135-Speed 2513.86 samples/sec Loss 3.2058 LearningRate 0.000614 Epoch: 11 Global Step: 244630 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:22:25,345-Speed 2494.75 samples/sec Loss 3.1563 LearningRate 0.000614 Epoch: 11 Global Step: 244640 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:22:33,544-Speed 2498.14 samples/sec Loss 3.2496 LearningRate 0.000614 Epoch: 11 Global Step: 244650 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:22:41,754-Speed 2495.06 samples/sec Loss 3.2129 LearningRate 0.000614 
Epoch: 11 Global Step: 244660 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:22:49,957-Speed 2497.07 samples/sec Loss 3.2102 LearningRate 0.000614 Epoch: 11 Global Step: 244670 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:22:58,161-Speed 2496.76 samples/sec Loss 3.2437 LearningRate 0.000614 Epoch: 11 Global Step: 244680 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:23:06,312-Speed 2512.95 samples/sec Loss 3.2169 LearningRate 0.000614 Epoch: 11 Global Step: 244690 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:23:14,525-Speed 2493.94 samples/sec Loss 3.2485 LearningRate 0.000614 Epoch: 11 Global Step: 244700 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:23:22,727-Speed 2497.47 samples/sec Loss 3.2392 LearningRate 0.000614 Epoch: 11 Global Step: 244710 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:23:30,926-Speed 2498.11 samples/sec Loss 3.1651 LearningRate 0.000614 Epoch: 11 Global Step: 244720 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:23:39,123-Speed 2499.01 samples/sec Loss 3.1903 LearningRate 0.000614 Epoch: 11 Global Step: 244730 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:23:47,329-Speed 2495.83 samples/sec Loss 3.2345 LearningRate 0.000614 Epoch: 11 Global Step: 244740 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:23:55,480-Speed 2513.30 samples/sec Loss 3.2377 LearningRate 0.000614 Epoch: 11 Global Step: 244750 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:24:03,692-Speed 2494.34 samples/sec Loss 3.1750 LearningRate 0.000614 Epoch: 11 Global Step: 244760 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:24:11,895-Speed 2496.99 samples/sec Loss 3.1552 LearningRate 0.000614 Epoch: 11 Global Step: 244770 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:24:20,094-Speed 2498.33 samples/sec Loss 3.2309 LearningRate 
0.000613 Epoch: 11 Global Step: 244780 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:24:28,293-Speed 2498.23 samples/sec Loss 3.2158 LearningRate 0.000613 Epoch: 11 Global Step: 244790 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:24:36,495-Speed 2496.94 samples/sec Loss 3.2613 LearningRate 0.000613 Epoch: 11 Global Step: 244800 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:24:44,652-Speed 2511.17 samples/sec Loss 3.2395 LearningRate 0.000613 Epoch: 11 Global Step: 244810 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:24:52,854-Speed 2497.51 samples/sec Loss 3.2373 LearningRate 0.000613 Epoch: 11 Global Step: 244820 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:25:01,053-Speed 2498.51 samples/sec Loss 3.2379 LearningRate 0.000613 Epoch: 11 Global Step: 244830 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:25:09,253-Speed 2497.64 samples/sec Loss 3.1361 LearningRate 0.000613 Epoch: 11 Global Step: 244840 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:25:17,455-Speed 2497.51 samples/sec Loss 3.1579 LearningRate 0.000613 Epoch: 11 Global Step: 244850 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:25:25,651-Speed 2499.30 samples/sec Loss 3.2676 LearningRate 0.000613 Epoch: 11 Global Step: 244860 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:25:33,797-Speed 2514.49 samples/sec Loss 3.2114 LearningRate 0.000613 Epoch: 11 Global Step: 244870 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:25:42,001-Speed 2496.56 samples/sec Loss 3.2311 LearningRate 0.000613 Epoch: 11 Global Step: 244880 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:25:50,204-Speed 2497.22 samples/sec Loss 3.2067 LearningRate 0.000613 Epoch: 11 Global Step: 244890 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:25:58,398-Speed 2499.83 samples/sec Loss 3.2653 
LearningRate 0.000613 Epoch: 11 Global Step: 244900 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:26:06,593-Speed 2499.44 samples/sec Loss 3.2511 LearningRate 0.000613 Epoch: 11 Global Step: 244910 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:26:14,791-Speed 2498.61 samples/sec Loss 3.2015 LearningRate 0.000613 Epoch: 11 Global Step: 244920 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:26:22,946-Speed 2512.08 samples/sec Loss 3.2543 LearningRate 0.000613 Epoch: 11 Global Step: 244930 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:26:31,144-Speed 2498.63 samples/sec Loss 3.2203 LearningRate 0.000613 Epoch: 11 Global Step: 244940 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:26:39,343-Speed 2498.03 samples/sec Loss 3.2505 LearningRate 0.000613 Epoch: 11 Global Step: 244950 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:26:47,542-Speed 2498.09 samples/sec Loss 3.2003 LearningRate 0.000613 Epoch: 11 Global Step: 244960 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:26:55,753-Speed 2494.59 samples/sec Loss 3.2188 LearningRate 0.000613 Epoch: 11 Global Step: 244970 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:27:03,952-Speed 2498.36 samples/sec Loss 3.1623 LearningRate 0.000613 Epoch: 11 Global Step: 244980 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:27:12,098-Speed 2514.51 samples/sec Loss 3.1914 LearningRate 0.000613 Epoch: 11 Global Step: 244990 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:27:20,295-Speed 2499.15 samples/sec Loss 3.1849 LearningRate 0.000613 Epoch: 11 Global Step: 245000 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:27:28,493-Speed 2498.62 samples/sec Loss 3.1931 LearningRate 0.000613 Epoch: 11 Global Step: 245010 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:27:36,689-Speed 2499.02 samples/sec Loss 
3.2155 LearningRate 0.000613 Epoch: 11 Global Step: 245020 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:27:44,884-Speed 2499.60 samples/sec Loss 3.1862 LearningRate 0.000613 Epoch: 11 Global Step: 245030 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:27:53,081-Speed 2499.07 samples/sec Loss 3.2094 LearningRate 0.000613 Epoch: 11 Global Step: 245040 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:28:01,228-Speed 2514.25 samples/sec Loss 3.2525 LearningRate 0.000613 Epoch: 11 Global Step: 245050 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:28:09,426-Speed 2498.45 samples/sec Loss 3.2208 LearningRate 0.000613 Epoch: 11 Global Step: 245060 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:28:17,624-Speed 2498.39 samples/sec Loss 3.1906 LearningRate 0.000613 Epoch: 11 Global Step: 245070 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:28:25,821-Speed 2498.87 samples/sec Loss 3.2530 LearningRate 0.000613 Epoch: 11 Global Step: 245080 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:28:34,034-Speed 2494.27 samples/sec Loss 3.1922 LearningRate 0.000613 Epoch: 11 Global Step: 245090 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:28:42,233-Speed 2498.04 samples/sec Loss 3.2042 LearningRate 0.000613 Epoch: 11 Global Step: 245100 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:28:50,377-Speed 2515.08 samples/sec Loss 3.1486 LearningRate 0.000613 Epoch: 11 Global Step: 245110 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:28:58,573-Speed 2499.15 samples/sec Loss 3.1823 LearningRate 0.000613 Epoch: 11 Global Step: 245120 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:29:06,784-Speed 2494.69 samples/sec Loss 3.2083 LearningRate 0.000613 Epoch: 11 Global Step: 245130 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:29:14,982-Speed 2498.54 samples/sec 
Loss 3.2060 LearningRate 0.000613 Epoch: 11 Global Step: 245140 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:29:23,183-Speed 2497.84 samples/sec Loss 3.1906 LearningRate 0.000613 Epoch: 11 Global Step: 245150 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:29:31,386-Speed 2497.08 samples/sec Loss 3.2356 LearningRate 0.000613 Epoch: 11 Global Step: 245160 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:29:39,537-Speed 2512.89 samples/sec Loss 3.1582 LearningRate 0.000613 Epoch: 11 Global Step: 245170 Fp16 Grad Scale: 32768 Required: 134 hours Training: 2022-07-07 22:29:47,746-Speed 2495.18 samples/sec Loss 3.1672 LearningRate 0.000613 Epoch: 11 Global Step: 245180 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:29:55,947-Speed 2497.89 samples/sec Loss 3.1705 LearningRate 0.000613 Epoch: 11 Global Step: 245190 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:30:04,145-Speed 2498.33 samples/sec Loss 3.1609 LearningRate 0.000613 Epoch: 11 Global Step: 245200 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:30:12,352-Speed 2495.91 samples/sec Loss 3.1569 LearningRate 0.000613 Epoch: 11 Global Step: 245210 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:30:20,553-Speed 2497.47 samples/sec Loss 3.1915 LearningRate 0.000613 Epoch: 11 Global Step: 245220 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:30:28,712-Speed 2510.49 samples/sec Loss 3.1464 LearningRate 0.000613 Epoch: 11 Global Step: 245230 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:30:36,916-Speed 2497.20 samples/sec Loss 3.1836 LearningRate 0.000613 Epoch: 11 Global Step: 245240 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:30:45,127-Speed 2494.84 samples/sec Loss 3.1998 LearningRate 0.000613 Epoch: 11 Global Step: 245250 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:30:53,329-Speed 2497.32 
samples/sec Loss 3.2316 LearningRate 0.000612 Epoch: 11 Global Step: 245260 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:31:01,543-Speed 2493.65 samples/sec Loss 3.1928 LearningRate 0.000612 Epoch: 11 Global Step: 245270 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:31:09,747-Speed 2496.66 samples/sec Loss 3.2160 LearningRate 0.000612 Epoch: 11 Global Step: 245280 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:31:17,894-Speed 2514.32 samples/sec Loss 3.1891 LearningRate 0.000612 Epoch: 11 Global Step: 245290 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:31:26,100-Speed 2495.93 samples/sec Loss 3.1924 LearningRate 0.000612 Epoch: 11 Global Step: 245300 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:31:34,303-Speed 2497.21 samples/sec Loss 3.2885 LearningRate 0.000612 Epoch: 11 Global Step: 245310 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:31:42,506-Speed 2496.97 samples/sec Loss 3.1489 LearningRate 0.000612 Epoch: 11 Global Step: 245320 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:31:50,702-Speed 2499.20 samples/sec Loss 3.1549 LearningRate 0.000612 Epoch: 11 Global Step: 245330 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:31:58,910-Speed 2495.55 samples/sec Loss 3.2529 LearningRate 0.000612 Epoch: 11 Global Step: 245340 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:32:07,061-Speed 2512.98 samples/sec Loss 3.2095 LearningRate 0.000612 Epoch: 11 Global Step: 245350 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:32:15,259-Speed 2498.51 samples/sec Loss 3.2911 LearningRate 0.000612 Epoch: 11 Global Step: 245360 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:32:23,461-Speed 2497.18 samples/sec Loss 3.2290 LearningRate 0.000612 Epoch: 11 Global Step: 245370 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:32:31,660-Speed 
2498.55 samples/sec Loss 3.1754 LearningRate 0.000612 Epoch: 11 Global Step: 245380 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:32:39,859-Speed 2498.44 samples/sec Loss 3.2564 LearningRate 0.000612 Epoch: 11 Global Step: 245390 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:32:48,060-Speed 2498.06 samples/sec Loss 3.2765 LearningRate 0.000612 Epoch: 11 Global Step: 245400 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:32:56,202-Speed 2515.67 samples/sec Loss 3.1701 LearningRate 0.000612 Epoch: 11 Global Step: 245410 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:33:04,414-Speed 2494.04 samples/sec Loss 3.1827 LearningRate 0.000612 Epoch: 11 Global Step: 245420 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:33:12,614-Speed 2498.15 samples/sec Loss 3.1978 LearningRate 0.000612 Epoch: 11 Global Step: 245430 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:33:20,811-Speed 2498.78 samples/sec Loss 3.2010 LearningRate 0.000612 Epoch: 11 Global Step: 245440 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:33:29,012-Speed 2497.77 samples/sec Loss 3.2828 LearningRate 0.000612 Epoch: 11 Global Step: 245450 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:33:37,210-Speed 2498.40 samples/sec Loss 3.1697 LearningRate 0.000612 Epoch: 11 Global Step: 245460 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:33:45,360-Speed 2513.42 samples/sec Loss 3.2410 LearningRate 0.000612 Epoch: 11 Global Step: 245470 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:33:53,556-Speed 2499.03 samples/sec Loss 3.1174 LearningRate 0.000612 Epoch: 11 Global Step: 245480 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:34:01,755-Speed 2498.38 samples/sec Loss 3.2142 LearningRate 0.000612 Epoch: 11 Global Step: 245490 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 
22:34:09,955-Speed 2498.27 samples/sec Loss 3.1543 LearningRate 0.000612 Epoch: 11 Global Step: 245500 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:34:18,166-Speed 2494.88 samples/sec Loss 3.2273 LearningRate 0.000612 Epoch: 11 Global Step: 245510 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:34:26,364-Speed 2498.45 samples/sec Loss 3.1736 LearningRate 0.000612 Epoch: 11 Global Step: 245520 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:34:34,511-Speed 2514.62 samples/sec Loss 3.1588 LearningRate 0.000612 Epoch: 11 Global Step: 245530 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:34:42,705-Speed 2499.83 samples/sec Loss 3.1721 LearningRate 0.000612 Epoch: 11 Global Step: 245540 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:34:50,903-Speed 2498.48 samples/sec Loss 3.1809 LearningRate 0.000612 Epoch: 11 Global Step: 245550 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:34:59,101-Speed 2498.58 samples/sec Loss 3.1871 LearningRate 0.000612 Epoch: 11 Global Step: 245560 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:35:07,299-Speed 2498.70 samples/sec Loss 3.2033 LearningRate 0.000612 Epoch: 11 Global Step: 245570 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:35:15,496-Speed 2498.66 samples/sec Loss 3.1915 LearningRate 0.000612 Epoch: 11 Global Step: 245580 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:35:23,639-Speed 2515.48 samples/sec Loss 3.1966 LearningRate 0.000612 Epoch: 11 Global Step: 245590 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:35:31,835-Speed 2499.02 samples/sec Loss 3.1957 LearningRate 0.000612 Epoch: 11 Global Step: 245600 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:35:40,032-Speed 2498.96 samples/sec Loss 3.1821 LearningRate 0.000612 Epoch: 11 Global Step: 245610 Fp16 Grad Scale: 65536 Required: 134 hours Training: 
2022-07-07 22:35:48,228-Speed 2499.42 samples/sec Loss 3.2372 LearningRate 0.000612 Epoch: 11 Global Step: 245620 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:35:56,425-Speed 2498.91 samples/sec Loss 3.1955 LearningRate 0.000612 Epoch: 11 Global Step: 245630 Fp16 Grad Scale: 65536 Required: 134 hours Training: 2022-07-07 22:36:04,632-Speed 2495.65 samples/sec Loss 3.1688 LearningRate 0.000612 Epoch: 11 Global Step: 245640 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:36:12,775-Speed 2515.41 samples/sec Loss 3.1280 LearningRate 0.000612 Epoch: 11 Global Step: 245650 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:36:20,974-Speed 2498.42 samples/sec Loss 3.1834 LearningRate 0.000612 Epoch: 11 Global Step: 245660 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:36:29,169-Speed 2499.36 samples/sec Loss 3.1735 LearningRate 0.000612 Epoch: 11 Global Step: 245670 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:36:37,366-Speed 2499.09 samples/sec Loss 3.1863 LearningRate 0.000612 Epoch: 11 Global Step: 245680 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:36:45,567-Speed 2497.76 samples/sec Loss 3.1966 LearningRate 0.000612 Epoch: 11 Global Step: 245690 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:36:53,763-Speed 2499.08 samples/sec Loss 3.1929 LearningRate 0.000612 Epoch: 11 Global Step: 245700 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:37:01,907-Speed 2515.23 samples/sec Loss 3.2172 LearningRate 0.000612 Epoch: 11 Global Step: 245710 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:37:10,102-Speed 2499.52 samples/sec Loss 3.1925 LearningRate 0.000612 Epoch: 11 Global Step: 245720 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:37:18,297-Speed 2499.65 samples/sec Loss 3.1692 LearningRate 0.000611 Epoch: 11 Global Step: 245730 Fp16 Grad Scale: 65536 Required: 133 hours 
Training: 2022-07-07 22:37:26,495-Speed 2498.59 samples/sec Loss 3.1624 LearningRate 0.000611 Epoch: 11 Global Step: 245740 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:37:34,693-Speed 2498.76 samples/sec Loss 3.1919 LearningRate 0.000611 Epoch: 11 Global Step: 245750 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:37:42,890-Speed 2498.70 samples/sec Loss 3.1336 LearningRate 0.000611 Epoch: 11 Global Step: 245760 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:37:51,048-Speed 2510.70 samples/sec Loss 3.2011 LearningRate 0.000611 Epoch: 11 Global Step: 245770 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:37:59,251-Speed 2497.33 samples/sec Loss 3.1712 LearningRate 0.000611 Epoch: 11 Global Step: 245780 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:38:07,461-Speed 2494.81 samples/sec Loss 3.1851 LearningRate 0.000611 Epoch: 11 Global Step: 245790 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:38:15,664-Speed 2496.86 samples/sec Loss 3.1411 LearningRate 0.000611 Epoch: 11 Global Step: 245800 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:38:24,319-Speed 2498.11 samples/sec Loss 3.2453 LearningRate 0.000611 Epoch: 11 Global Step: 245810 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:38:32,561-Speed 2500.67 samples/sec Loss 3.2653 LearningRate 0.000611 Epoch: 11 Global Step: 245820 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:38:40,704-Speed 2515.53 samples/sec Loss 3.1665 LearningRate 0.000611 Epoch: 11 Global Step: 245830 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:38:48,902-Speed 2498.40 samples/sec Loss 3.2270 LearningRate 0.000611 Epoch: 11 Global Step: 245840 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:38:58,275-Speed 2198.39 samples/sec Loss 3.1487 LearningRate 0.000611 Epoch: 11 Global Step: 245850 Fp16 Grad Scale: 65536 Required: 133 
hours Training: 2022-07-07 22:39:06,495-Speed 2501.15 samples/sec Loss 3.1947 LearningRate 0.000611 Epoch: 11 Global Step: 245860 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:39:14,691-Speed 2499.13 samples/sec Loss 3.1754 LearningRate 0.000611 Epoch: 11 Global Step: 245870 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:39:23,513-Speed 2500.52 samples/sec Loss 3.2099 LearningRate 0.000611 Epoch: 11 Global Step: 245880 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:39:31,680-Speed 2517.09 samples/sec Loss 3.2456 LearningRate 0.000611 Epoch: 11 Global Step: 245890 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:39:39,881-Speed 2497.20 samples/sec Loss 3.2020 LearningRate 0.000611 Epoch: 11 Global Step: 245900 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:39:48,132-Speed 2493.52 samples/sec Loss 3.2046 LearningRate 0.000611 Epoch: 11 Global Step: 245910 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:39:56,395-Speed 2499.73 samples/sec Loss 3.2305 LearningRate 0.000611 Epoch: 11 Global Step: 245920 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:40:04,635-Speed 2498.30 samples/sec Loss 3.1913 LearningRate 0.000611 Epoch: 11 Global Step: 245930 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:40:12,841-Speed 2496.00 samples/sec Loss 3.2206 LearningRate 0.000611 Epoch: 11 Global Step: 245940 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:40:21,010-Speed 2507.45 samples/sec Loss 3.2533 LearningRate 0.000611 Epoch: 11 Global Step: 245950 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:40:29,248-Speed 2499.63 samples/sec Loss 3.2227 LearningRate 0.000611 Epoch: 11 Global Step: 245960 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:40:37,464-Speed 2499.66 samples/sec Loss 3.1488 LearningRate 0.000611 Epoch: 11 Global Step: 245970 Fp16 Grad Scale: 65536 Required: 
133 hours Training: 2022-07-07 22:40:45,663-Speed 2498.16 samples/sec Loss 3.2039 LearningRate 0.000611 Epoch: 11 Global Step: 245980 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:40:54,551-Speed 2500.48 samples/sec Loss 3.2220 LearningRate 0.000611 Epoch: 11 Global Step: 245990 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:41:02,805-Speed 2497.96 samples/sec Loss 3.1968 LearningRate 0.000611 Epoch: 11 Global Step: 246000 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:41:10,961-Speed 2511.30 samples/sec Loss 3.2198 LearningRate 0.000611 Epoch: 11 Global Step: 246010 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:41:19,549-Speed 2500.63 samples/sec Loss 3.2019 LearningRate 0.000611 Epoch: 11 Global Step: 246020 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:41:27,762-Speed 2500.18 samples/sec Loss 3.2075 LearningRate 0.000611 Epoch: 11 Global Step: 246030 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:41:36,022-Speed 2499.99 samples/sec Loss 3.2400 LearningRate 0.000611 Epoch: 11 Global Step: 246040 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:41:44,219-Speed 2498.95 samples/sec Loss 3.1633 LearningRate 0.000611 Epoch: 11 Global Step: 246050 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:41:53,572-Speed 2500.86 samples/sec Loss 3.1950 LearningRate 0.000611 Epoch: 11 Global Step: 246060 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:42:01,759-Speed 2516.83 samples/sec Loss 3.2482 LearningRate 0.000611 Epoch: 11 Global Step: 246070 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:42:10,006-Speed 2500.11 samples/sec Loss 3.2409 LearningRate 0.000611 Epoch: 11 Global Step: 246080 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:42:19,423-Speed 2174.95 samples/sec Loss 3.2642 LearningRate 0.000611 Epoch: 11 Global Step: 246090 Fp16 Grad Scale: 65536 
Required: 133 hours Training: 2022-07-07 22:42:27,692-Speed 2500.37 samples/sec Loss 3.2365 LearningRate 0.000611 Epoch: 11 Global Step: 246100 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:42:37,304-Speed 2137.39 samples/sec Loss 3.1558 LearningRate 0.000611 Epoch: 11 Global Step: 246110 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:42:45,524-Speed 2498.91 samples/sec Loss 3.2021 LearningRate 0.000611 Epoch: 11 Global Step: 246120 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:42:53,666-Speed 2515.72 samples/sec Loss 3.1819 LearningRate 0.000611 Epoch: 11 Global Step: 246130 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:43:01,863-Speed 2498.98 samples/sec Loss 3.2126 LearningRate 0.000611 Epoch: 11 Global Step: 246140 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:43:10,058-Speed 2499.28 samples/sec Loss 3.2079 LearningRate 0.000611 Epoch: 11 Global Step: 246150 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:43:18,260-Speed 2497.36 samples/sec Loss 3.2084 LearningRate 0.000611 Epoch: 11 Global Step: 246160 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:43:26,457-Speed 2499.04 samples/sec Loss 3.1590 LearningRate 0.000611 Epoch: 11 Global Step: 246170 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:43:34,654-Speed 2498.95 samples/sec Loss 3.1605 LearningRate 0.000611 Epoch: 11 Global Step: 246180 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:43:42,797-Speed 2515.29 samples/sec Loss 3.1849 LearningRate 0.000611 Epoch: 11 Global Step: 246190 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:43:50,996-Speed 2498.35 samples/sec Loss 3.2261 LearningRate 0.000611 Epoch: 11 Global Step: 246200 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:43:59,207-Speed 2494.32 samples/sec Loss 3.1909 LearningRate 0.000610 Epoch: 11 Global Step: 246210 Fp16 Grad Scale: 
65536 Required: 133 hours Training: 2022-07-07 22:44:07,403-Speed 2499.28 samples/sec Loss 3.1803 LearningRate 0.000610 Epoch: 11 Global Step: 246220 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:44:15,604-Speed 2497.68 samples/sec Loss 3.1437 LearningRate 0.000610 Epoch: 11 Global Step: 246230 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:44:23,801-Speed 2498.82 samples/sec Loss 3.1917 LearningRate 0.000610 Epoch: 11 Global Step: 246240 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:44:31,945-Speed 2515.00 samples/sec Loss 3.2541 LearningRate 0.000610 Epoch: 11 Global Step: 246250 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:44:40,145-Speed 2498.34 samples/sec Loss 3.1741 LearningRate 0.000610 Epoch: 11 Global Step: 246260 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:44:48,343-Speed 2498.38 samples/sec Loss 3.2188 LearningRate 0.000610 Epoch: 11 Global Step: 246270 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:44:56,547-Speed 2496.67 samples/sec Loss 3.2484 LearningRate 0.000610 Epoch: 11 Global Step: 246280 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:45:04,749-Speed 2497.55 samples/sec Loss 3.2007 LearningRate 0.000610 Epoch: 11 Global Step: 246290 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:45:12,947-Speed 2498.36 samples/sec Loss 3.1876 LearningRate 0.000610 Epoch: 11 Global Step: 246300 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:45:21,091-Speed 2515.12 samples/sec Loss 3.1576 LearningRate 0.000610 Epoch: 11 Global Step: 246310 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:45:29,291-Speed 2498.10 samples/sec Loss 3.2047 LearningRate 0.000610 Epoch: 11 Global Step: 246320 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:45:37,490-Speed 2498.30 samples/sec Loss 3.2156 LearningRate 0.000610 Epoch: 11 Global Step: 246330 Fp16 Grad 
Scale: 65536 Required: 133 hours Training: 2022-07-07 22:45:45,694-Speed 2496.64 samples/sec Loss 3.1553 LearningRate 0.000610 Epoch: 11 Global Step: 246340 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:45:53,892-Speed 2498.64 samples/sec Loss 3.1656 LearningRate 0.000610 Epoch: 11 Global Step: 246350 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:46:02,092-Speed 2498.08 samples/sec Loss 3.1518 LearningRate 0.000610 Epoch: 11 Global Step: 246360 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:46:10,237-Speed 2514.57 samples/sec Loss 3.1672 LearningRate 0.000610 Epoch: 11 Global Step: 246370 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:46:18,435-Speed 2498.59 samples/sec Loss 3.1908 LearningRate 0.000610 Epoch: 11 Global Step: 246380 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 22:46:26,641-Speed 2496.27 samples/sec Loss 3.2160 LearningRate 0.000610 Epoch: 11 Global Step: 246390 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 22:46:34,841-Speed 2497.75 samples/sec Loss 3.1851 LearningRate 0.000610 Epoch: 11 Global Step: 246400 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 22:46:43,045-Speed 2496.89 samples/sec Loss 3.1719 LearningRate 0.000610 Epoch: 11 Global Step: 246410 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 22:46:51,245-Speed 2497.98 samples/sec Loss 3.2138 LearningRate 0.000610 Epoch: 11 Global Step: 246420 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 22:46:59,390-Speed 2514.65 samples/sec Loss 3.1789 LearningRate 0.000610 Epoch: 11 Global Step: 246430 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 22:47:07,594-Speed 2496.83 samples/sec Loss 3.1985 LearningRate 0.000610 Epoch: 11 Global Step: 246440 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 22:47:15,789-Speed 2499.31 samples/sec Loss 3.1424 LearningRate 0.000610 Epoch: 11 Global Step: 
246450 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 22:47:23,988-Speed 2498.44 samples/sec Loss 3.2301 LearningRate 0.000610 Epoch: 11 Global Step: 246460 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 22:47:32,186-Speed 2498.38 samples/sec Loss 3.1985 LearningRate 0.000610 Epoch: 11 Global Step: 246470 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 22:47:40,386-Speed 2498.01 samples/sec Loss 3.1303 LearningRate 0.000610 Epoch: 11 Global Step: 246480 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 22:47:48,532-Speed 2514.26 samples/sec Loss 3.1546 LearningRate 0.000610 Epoch: 11 Global Step: 246490 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 22:47:56,695-Speed 2509.48 samples/sec Loss 3.1829 LearningRate 0.000610 Epoch: 11 Global Step: 246500 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:48:04,893-Speed 2498.74 samples/sec Loss 3.1920 LearningRate 0.000610 Epoch: 11 Global Step: 246510 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:48:13,094-Speed 2497.50 samples/sec Loss 3.2021 LearningRate 0.000610 Epoch: 11 Global Step: 246520 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:48:21,291-Speed 2498.92 samples/sec Loss 3.2216 LearningRate 0.000610 Epoch: 11 Global Step: 246530 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:48:29,490-Speed 2498.35 samples/sec Loss 3.1618 LearningRate 0.000610 Epoch: 11 Global Step: 246540 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:48:37,636-Speed 2514.49 samples/sec Loss 3.1447 LearningRate 0.000610 Epoch: 11 Global Step: 246550 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:48:45,834-Speed 2498.70 samples/sec Loss 3.1580 LearningRate 0.000610 Epoch: 11 Global Step: 246560 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:48:54,038-Speed 2497.21 samples/sec Loss 3.1414 LearningRate 0.000610 Epoch: 11 
Global Step: 246570 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:49:02,240-Speed 2497.10 samples/sec Loss 3.0963 LearningRate 0.000610 Epoch: 11 Global Step: 246580 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:49:10,438-Speed 2498.58 samples/sec Loss 3.2076 LearningRate 0.000610 Epoch: 11 Global Step: 246590 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:49:18,641-Speed 2497.32 samples/sec Loss 3.1004 LearningRate 0.000610 Epoch: 11 Global Step: 246600 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:49:26,786-Speed 2514.67 samples/sec Loss 3.1766 LearningRate 0.000610 Epoch: 11 Global Step: 246610 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:49:34,986-Speed 2498.06 samples/sec Loss 3.1153 LearningRate 0.000610 Epoch: 11 Global Step: 246620 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:49:43,184-Speed 2498.29 samples/sec Loss 3.2067 LearningRate 0.000610 Epoch: 11 Global Step: 246630 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:49:51,383-Speed 2498.23 samples/sec Loss 3.1345 LearningRate 0.000610 Epoch: 11 Global Step: 246640 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:49:59,582-Speed 2498.36 samples/sec Loss 3.1108 LearningRate 0.000610 Epoch: 11 Global Step: 246650 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:50:07,786-Speed 2496.82 samples/sec Loss 3.2024 LearningRate 0.000610 Epoch: 11 Global Step: 246660 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:50:15,933-Speed 2514.48 samples/sec Loss 3.2076 LearningRate 0.000610 Epoch: 11 Global Step: 246670 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:50:24,151-Speed 2492.45 samples/sec Loss 3.2085 LearningRate 0.000610 Epoch: 11 Global Step: 246680 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:50:32,366-Speed 2493.59 samples/sec Loss 3.2299 LearningRate 0.000609 
Epoch: 11 Global Step: 246690 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:50:40,576-Speed 2495.02 samples/sec Loss 3.1192 LearningRate 0.000609 Epoch: 11 Global Step: 246700 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:50:48,779-Speed 2497.89 samples/sec Loss 3.1187 LearningRate 0.000609 Epoch: 11 Global Step: 246710 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:50:56,978-Speed 2498.20 samples/sec Loss 3.2317 LearningRate 0.000609 Epoch: 11 Global Step: 246720 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:51:05,121-Speed 2515.26 samples/sec Loss 3.1311 LearningRate 0.000609 Epoch: 11 Global Step: 246730 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:51:13,323-Speed 2497.55 samples/sec Loss 3.1551 LearningRate 0.000609 Epoch: 11 Global Step: 246740 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:51:21,524-Speed 2497.37 samples/sec Loss 3.1903 LearningRate 0.000609 Epoch: 11 Global Step: 246750 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:51:29,728-Speed 2496.75 samples/sec Loss 3.2276 LearningRate 0.000609 Epoch: 11 Global Step: 246760 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:51:37,927-Speed 2498.31 samples/sec Loss 3.2132 LearningRate 0.000609 Epoch: 11 Global Step: 246770 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:51:46,125-Speed 2498.54 samples/sec Loss 3.1959 LearningRate 0.000609 Epoch: 11 Global Step: 246780 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:51:54,270-Speed 2514.89 samples/sec Loss 3.2042 LearningRate 0.000609 Epoch: 11 Global Step: 246790 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:52:02,466-Speed 2499.08 samples/sec Loss 3.2052 LearningRate 0.000609 Epoch: 11 Global Step: 246800 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:52:10,663-Speed 2499.42 samples/sec Loss 3.1856 LearningRate 
0.000609 Epoch: 11 Global Step: 246810 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:52:18,864-Speed 2497.42 samples/sec Loss 3.2232 LearningRate 0.000609 Epoch: 11 Global Step: 246820 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:52:27,066-Speed 2497.49 samples/sec Loss 3.1865 LearningRate 0.000609 Epoch: 11 Global Step: 246830 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:52:35,274-Speed 2495.57 samples/sec Loss 3.1842 LearningRate 0.000609 Epoch: 11 Global Step: 246840 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:52:43,420-Speed 2514.57 samples/sec Loss 3.2123 LearningRate 0.000609 Epoch: 11 Global Step: 246850 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:52:51,636-Speed 2493.21 samples/sec Loss 3.1996 LearningRate 0.000609 Epoch: 11 Global Step: 246860 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:52:59,838-Speed 2497.26 samples/sec Loss 3.1962 LearningRate 0.000609 Epoch: 11 Global Step: 246870 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:53:08,038-Speed 2497.98 samples/sec Loss 3.2590 LearningRate 0.000609 Epoch: 11 Global Step: 246880 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:53:16,237-Speed 2498.40 samples/sec Loss 3.1976 LearningRate 0.000609 Epoch: 11 Global Step: 246890 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:53:24,434-Speed 2498.83 samples/sec Loss 3.1088 LearningRate 0.000609 Epoch: 11 Global Step: 246900 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:53:32,578-Speed 2515.19 samples/sec Loss 3.1577 LearningRate 0.000609 Epoch: 11 Global Step: 246910 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:53:40,779-Speed 2497.85 samples/sec Loss 3.2282 LearningRate 0.000609 Epoch: 11 Global Step: 246920 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:53:48,985-Speed 2496.08 samples/sec Loss 3.2723 
LearningRate 0.000609 Epoch: 11 Global Step: 246930 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:53:57,181-Speed 2499.33 samples/sec Loss 3.2314 LearningRate 0.000609 Epoch: 11 Global Step: 246940 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:54:05,376-Speed 2499.34 samples/sec Loss 3.1597 LearningRate 0.000609 Epoch: 11 Global Step: 246950 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:54:13,575-Speed 2498.19 samples/sec Loss 3.1431 LearningRate 0.000609 Epoch: 11 Global Step: 246960 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:54:21,720-Speed 2514.72 samples/sec Loss 3.2524 LearningRate 0.000609 Epoch: 11 Global Step: 246970 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:54:29,923-Speed 2497.03 samples/sec Loss 3.2098 LearningRate 0.000609 Epoch: 11 Global Step: 246980 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:54:38,130-Speed 2495.57 samples/sec Loss 3.1759 LearningRate 0.000609 Epoch: 11 Global Step: 246990 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:54:46,352-Speed 2491.67 samples/sec Loss 3.1673 LearningRate 0.000609 Epoch: 11 Global Step: 247000 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:54:54,556-Speed 2496.79 samples/sec Loss 3.1750 LearningRate 0.000609 Epoch: 11 Global Step: 247010 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:55:02,765-Speed 2495.11 samples/sec Loss 3.1419 LearningRate 0.000609 Epoch: 11 Global Step: 247020 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:55:10,917-Speed 2512.32 samples/sec Loss 3.1689 LearningRate 0.000609 Epoch: 11 Global Step: 247030 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:55:19,125-Speed 2495.78 samples/sec Loss 3.1095 LearningRate 0.000609 Epoch: 11 Global Step: 247040 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:55:27,330-Speed 2496.68 samples/sec Loss 
3.1773 LearningRate 0.000609 Epoch: 11 Global Step: 247050 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:55:35,532-Speed 2497.04 samples/sec Loss 3.1479 LearningRate 0.000609 Epoch: 11 Global Step: 247060 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:55:43,737-Speed 2496.46 samples/sec Loss 3.2274 LearningRate 0.000609 Epoch: 11 Global Step: 247070 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:55:51,940-Speed 2497.15 samples/sec Loss 3.1647 LearningRate 0.000609 Epoch: 11 Global Step: 247080 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:56:00,089-Speed 2513.68 samples/sec Loss 3.1969 LearningRate 0.000609 Epoch: 11 Global Step: 247090 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:56:08,306-Speed 2492.65 samples/sec Loss 3.1183 LearningRate 0.000609 Epoch: 11 Global Step: 247100 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:56:16,507-Speed 2497.49 samples/sec Loss 3.1863 LearningRate 0.000609 Epoch: 11 Global Step: 247110 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:56:24,707-Speed 2498.25 samples/sec Loss 3.1886 LearningRate 0.000609 Epoch: 11 Global Step: 247120 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:56:32,911-Speed 2496.59 samples/sec Loss 3.2313 LearningRate 0.000609 Epoch: 11 Global Step: 247130 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:56:41,112-Speed 2497.57 samples/sec Loss 3.1737 LearningRate 0.000609 Epoch: 11 Global Step: 247140 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:56:49,261-Speed 2513.52 samples/sec Loss 3.1762 LearningRate 0.000609 Epoch: 11 Global Step: 247150 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:56:57,475-Speed 2493.63 samples/sec Loss 3.2134 LearningRate 0.000609 Epoch: 11 Global Step: 247160 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:57:05,676-Speed 2497.73 samples/sec 
Loss 3.1676 LearningRate 0.000608 Epoch: 11 Global Step: 247170 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:57:13,875-Speed 2498.28 samples/sec Loss 3.1646 LearningRate 0.000608 Epoch: 11 Global Step: 247180 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:57:22,076-Speed 2497.60 samples/sec Loss 3.1880 LearningRate 0.000608 Epoch: 11 Global Step: 247190 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:57:30,280-Speed 2496.94 samples/sec Loss 3.1677 LearningRate 0.000608 Epoch: 11 Global Step: 247200 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:57:38,424-Speed 2515.13 samples/sec Loss 3.1788 LearningRate 0.000608 Epoch: 11 Global Step: 247210 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:57:46,636-Speed 2494.01 samples/sec Loss 3.1566 LearningRate 0.000608 Epoch: 11 Global Step: 247220 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:57:54,846-Speed 2494.96 samples/sec Loss 3.1779 LearningRate 0.000608 Epoch: 11 Global Step: 247230 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:58:03,046-Speed 2497.94 samples/sec Loss 3.1002 LearningRate 0.000608 Epoch: 11 Global Step: 247240 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:58:11,244-Speed 2498.52 samples/sec Loss 3.1286 LearningRate 0.000608 Epoch: 11 Global Step: 247250 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:58:19,446-Speed 2497.35 samples/sec Loss 3.1369 LearningRate 0.000608 Epoch: 11 Global Step: 247260 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:58:27,591-Speed 2514.80 samples/sec Loss 3.1733 LearningRate 0.000608 Epoch: 11 Global Step: 247270 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:58:35,788-Speed 2499.08 samples/sec Loss 3.1430 LearningRate 0.000608 Epoch: 11 Global Step: 247280 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:58:43,986-Speed 2498.58 
samples/sec Loss 3.1364 LearningRate 0.000608 Epoch: 11 Global Step: 247290 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:58:52,189-Speed 2496.75 samples/sec Loss 3.1547 LearningRate 0.000608 Epoch: 11 Global Step: 247300 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:59:00,387-Speed 2498.72 samples/sec Loss 3.1950 LearningRate 0.000608 Epoch: 11 Global Step: 247310 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:59:08,589-Speed 2497.54 samples/sec Loss 3.1975 LearningRate 0.000608 Epoch: 11 Global Step: 247320 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:59:16,748-Speed 2510.46 samples/sec Loss 3.1554 LearningRate 0.000608 Epoch: 11 Global Step: 247330 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:59:24,951-Speed 2497.08 samples/sec Loss 3.1585 LearningRate 0.000608 Epoch: 11 Global Step: 247340 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:59:33,152-Speed 2497.86 samples/sec Loss 3.1517 LearningRate 0.000608 Epoch: 11 Global Step: 247350 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:59:41,349-Speed 2498.81 samples/sec Loss 3.0953 LearningRate 0.000608 Epoch: 11 Global Step: 247360 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:59:49,550-Speed 2497.78 samples/sec Loss 3.1292 LearningRate 0.000608 Epoch: 11 Global Step: 247370 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 22:59:57,761-Speed 2494.79 samples/sec Loss 3.1975 LearningRate 0.000608 Epoch: 11 Global Step: 247380 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:00:05,907-Speed 2514.39 samples/sec Loss 3.1550 LearningRate 0.000608 Epoch: 11 Global Step: 247390 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:00:14,107-Speed 2497.88 samples/sec Loss 3.2049 LearningRate 0.000608 Epoch: 11 Global Step: 247400 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:00:22,316-Speed 
2495.24 samples/sec Loss 3.1801 LearningRate 0.000608 Epoch: 11 Global Step: 247410 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:00:30,514-Speed 2498.52 samples/sec Loss 3.1777 LearningRate 0.000608 Epoch: 11 Global Step: 247420 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:00:38,712-Speed 2498.82 samples/sec Loss 3.1454 LearningRate 0.000608 Epoch: 11 Global Step: 247430 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:00:46,908-Speed 2499.06 samples/sec Loss 3.2517 LearningRate 0.000608 Epoch: 11 Global Step: 247440 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:00:55,064-Speed 2511.33 samples/sec Loss 3.2319 LearningRate 0.000608 Epoch: 11 Global Step: 247450 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:01:03,262-Speed 2498.95 samples/sec Loss 3.2457 LearningRate 0.000608 Epoch: 11 Global Step: 247460 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:01:11,465-Speed 2497.01 samples/sec Loss 3.2217 LearningRate 0.000608 Epoch: 11 Global Step: 247470 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:01:19,665-Speed 2497.81 samples/sec Loss 3.2255 LearningRate 0.000608 Epoch: 11 Global Step: 247480 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:01:27,862-Speed 2499.16 samples/sec Loss 3.2225 LearningRate 0.000608 Epoch: 11 Global Step: 247490 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:01:36,061-Speed 2498.24 samples/sec Loss 3.2119 LearningRate 0.000608 Epoch: 11 Global Step: 247500 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:01:44,205-Speed 2515.02 samples/sec Loss 3.1744 LearningRate 0.000608 Epoch: 11 Global Step: 247510 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:01:52,403-Speed 2498.65 samples/sec Loss 3.1395 LearningRate 0.000608 Epoch: 11 Global Step: 247520 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 
23:02:00,600-Speed 2498.96 samples/sec Loss 3.2079 LearningRate 0.000608 Epoch: 11 Global Step: 247530 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:02:08,803-Speed 2496.92 samples/sec Loss 3.2198 LearningRate 0.000608 Epoch: 11 Global Step: 247540 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:02:17,002-Speed 2498.81 samples/sec Loss 3.2790 LearningRate 0.000608 Epoch: 11 Global Step: 247550 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:02:25,202-Speed 2497.80 samples/sec Loss 3.2590 LearningRate 0.000608 Epoch: 11 Global Step: 247560 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:02:33,347-Speed 2514.82 samples/sec Loss 3.2516 LearningRate 0.000608 Epoch: 11 Global Step: 247570 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:02:41,567-Speed 2491.96 samples/sec Loss 3.1918 LearningRate 0.000608 Epoch: 11 Global Step: 247580 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:02:49,763-Speed 2498.93 samples/sec Loss 3.1692 LearningRate 0.000608 Epoch: 11 Global Step: 247590 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:02:57,963-Speed 2498.28 samples/sec Loss 3.1509 LearningRate 0.000608 Epoch: 11 Global Step: 247600 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:03:06,165-Speed 2497.18 samples/sec Loss 3.1397 LearningRate 0.000608 Epoch: 11 Global Step: 247610 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:03:14,363-Speed 2498.38 samples/sec Loss 3.1194 LearningRate 0.000608 Epoch: 11 Global Step: 247620 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:03:22,514-Speed 2512.99 samples/sec Loss 3.2716 LearningRate 0.000608 Epoch: 11 Global Step: 247630 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:03:30,709-Speed 2499.47 samples/sec Loss 3.2047 LearningRate 0.000608 Epoch: 11 Global Step: 247640 Fp16 Grad Scale: 65536 Required: 133 hours Training: 
2022-07-07 23:03:38,918-Speed 2495.19 samples/sec Loss 3.2445 LearningRate 0.000607 Epoch: 11 Global Step: 247650 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:03:47,134-Speed 2493.13 samples/sec Loss 3.2192 LearningRate 0.000607 Epoch: 11 Global Step: 247660 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:03:55,331-Speed 2498.82 samples/sec Loss 3.2688 LearningRate 0.000607 Epoch: 11 Global Step: 247670 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:04:03,529-Speed 2498.74 samples/sec Loss 3.1582 LearningRate 0.000607 Epoch: 11 Global Step: 247680 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:04:11,672-Speed 2515.57 samples/sec Loss 3.1710 LearningRate 0.000607 Epoch: 11 Global Step: 247690 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:04:19,885-Speed 2494.05 samples/sec Loss 3.1666 LearningRate 0.000607 Epoch: 11 Global Step: 247700 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 23:04:28,085-Speed 2498.02 samples/sec Loss 3.1632 LearningRate 0.000607 Epoch: 11 Global Step: 247710 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 23:04:36,284-Speed 2498.22 samples/sec Loss 3.1477 LearningRate 0.000607 Epoch: 11 Global Step: 247720 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 23:04:44,486-Speed 2497.59 samples/sec Loss 3.1889 LearningRate 0.000607 Epoch: 11 Global Step: 247730 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 23:04:52,688-Speed 2497.09 samples/sec Loss 3.2118 LearningRate 0.000607 Epoch: 11 Global Step: 247740 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 23:05:00,836-Speed 2514.20 samples/sec Loss 3.1869 LearningRate 0.000607 Epoch: 11 Global Step: 247750 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 23:05:09,040-Speed 2496.85 samples/sec Loss 3.2047 LearningRate 0.000607 Epoch: 11 Global Step: 247760 Fp16 Grad Scale: 131072 Required: 133 
hours Training: 2022-07-07 23:05:17,239-Speed 2498.28 samples/sec Loss 3.1921 LearningRate 0.000607 Epoch: 11 Global Step: 247770 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 23:05:25,452-Speed 2493.95 samples/sec Loss 3.1756 LearningRate 0.000607 Epoch: 11 Global Step: 247780 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 23:05:33,652-Speed 2497.81 samples/sec Loss 3.1937 LearningRate 0.000607 Epoch: 11 Global Step: 247790 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 23:05:41,852-Speed 2498.07 samples/sec Loss 3.1863 LearningRate 0.000607 Epoch: 11 Global Step: 247800 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 23:05:50,001-Speed 2513.64 samples/sec Loss 3.1592 LearningRate 0.000607 Epoch: 11 Global Step: 247810 Fp16 Grad Scale: 131072 Required: 133 hours Training: 2022-07-07 23:05:58,155-Speed 2511.94 samples/sec Loss 3.2130 LearningRate 0.000607 Epoch: 11 Global Step: 247820 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:06:06,350-Speed 2499.45 samples/sec Loss 3.2302 LearningRate 0.000607 Epoch: 11 Global Step: 247830 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:06:14,552-Speed 2497.41 samples/sec Loss 3.2085 LearningRate 0.000607 Epoch: 11 Global Step: 247840 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:06:22,753-Speed 2497.80 samples/sec Loss 3.1784 LearningRate 0.000607 Epoch: 11 Global Step: 247850 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:06:30,956-Speed 2497.19 samples/sec Loss 3.1431 LearningRate 0.000607 Epoch: 11 Global Step: 247860 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:06:39,102-Speed 2514.41 samples/sec Loss 3.0649 LearningRate 0.000607 Epoch: 11 Global Step: 247870 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:06:47,300-Speed 2498.53 samples/sec Loss 3.1762 LearningRate 0.000607 Epoch: 11 Global Step: 247880 Fp16 Grad Scale: 65536 
Required: 133 hours Training: 2022-07-07 23:06:55,500-Speed 2497.80 samples/sec Loss 3.1559 LearningRate 0.000607 Epoch: 11 Global Step: 247890 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:07:03,700-Speed 2498.23 samples/sec Loss 3.2188 LearningRate 0.000607 Epoch: 11 Global Step: 247900 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:07:11,901-Speed 2497.61 samples/sec Loss 3.1995 LearningRate 0.000607 Epoch: 11 Global Step: 247910 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:07:20,103-Speed 2497.45 samples/sec Loss 3.1657 LearningRate 0.000607 Epoch: 11 Global Step: 247920 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:07:28,246-Speed 2515.15 samples/sec Loss 3.1634 LearningRate 0.000607 Epoch: 11 Global Step: 247930 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:07:36,445-Speed 2498.18 samples/sec Loss 3.1642 LearningRate 0.000607 Epoch: 11 Global Step: 247940 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:07:44,648-Speed 2497.19 samples/sec Loss 3.1740 LearningRate 0.000607 Epoch: 11 Global Step: 247950 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:07:52,844-Speed 2499.08 samples/sec Loss 3.1735 LearningRate 0.000607 Epoch: 11 Global Step: 247960 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:08:01,047-Speed 2497.20 samples/sec Loss 3.1206 LearningRate 0.000607 Epoch: 11 Global Step: 247970 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:08:09,245-Speed 2498.68 samples/sec Loss 3.1451 LearningRate 0.000607 Epoch: 11 Global Step: 247980 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:08:17,401-Speed 2511.50 samples/sec Loss 3.2800 LearningRate 0.000607 Epoch: 11 Global Step: 247990 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:08:25,599-Speed 2498.61 samples/sec Loss 3.1425 LearningRate 0.000607 Epoch: 11 Global Step: 248000 Fp16 Grad Scale: 
65536 Required: 133 hours Training: 2022-07-07 23:08:33,801-Speed 2497.19 samples/sec Loss 3.1930 LearningRate 0.000607 Epoch: 11 Global Step: 248010 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:08:41,999-Speed 2498.52 samples/sec Loss 3.1750 LearningRate 0.000607 Epoch: 11 Global Step: 248020 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:08:50,199-Speed 2497.92 samples/sec Loss 3.2670 LearningRate 0.000607 Epoch: 11 Global Step: 248030 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:08:58,399-Speed 2498.07 samples/sec Loss 3.1964 LearningRate 0.000607 Epoch: 11 Global Step: 248040 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:09:06,547-Speed 2513.95 samples/sec Loss 3.1122 LearningRate 0.000607 Epoch: 11 Global Step: 248050 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:09:14,748-Speed 2497.78 samples/sec Loss 3.2114 LearningRate 0.000607 Epoch: 11 Global Step: 248060 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:09:22,954-Speed 2496.30 samples/sec Loss 3.2828 LearningRate 0.000607 Epoch: 11 Global Step: 248070 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:09:31,153-Speed 2498.26 samples/sec Loss 3.1633 LearningRate 0.000607 Epoch: 11 Global Step: 248080 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:09:39,351-Speed 2498.44 samples/sec Loss 3.0995 LearningRate 0.000607 Epoch: 11 Global Step: 248090 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:09:47,548-Speed 2498.72 samples/sec Loss 3.1692 LearningRate 0.000607 Epoch: 11 Global Step: 248100 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:09:55,695-Speed 2514.30 samples/sec Loss 3.2457 LearningRate 0.000607 Epoch: 11 Global Step: 248110 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:10:03,894-Speed 2498.38 samples/sec Loss 3.1548 LearningRate 0.000606 Epoch: 11 Global Step: 248120 Fp16 Grad 
Scale: 65536 Required: 133 hours Training: 2022-07-07 23:10:12,094-Speed 2497.76 samples/sec Loss 3.2146 LearningRate 0.000606 Epoch: 11 Global Step: 248130 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:10:20,301-Speed 2496.07 samples/sec Loss 3.2969 LearningRate 0.000606 Epoch: 11 Global Step: 248140 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:10:28,504-Speed 2496.96 samples/sec Loss 3.2073 LearningRate 0.000606 Epoch: 11 Global Step: 248150 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:10:36,715-Speed 2494.50 samples/sec Loss 3.2224 LearningRate 0.000606 Epoch: 11 Global Step: 248160 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:10:44,862-Speed 2514.51 samples/sec Loss 3.1399 LearningRate 0.000606 Epoch: 11 Global Step: 248170 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:10:53,062-Speed 2497.92 samples/sec Loss 3.1711 LearningRate 0.000606 Epoch: 11 Global Step: 248180 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:11:01,264-Speed 2497.16 samples/sec Loss 3.1902 LearningRate 0.000606 Epoch: 11 Global Step: 248190 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:11:09,464-Speed 2498.15 samples/sec Loss 3.1304 LearningRate 0.000606 Epoch: 11 Global Step: 248200 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:11:17,662-Speed 2498.28 samples/sec Loss 3.1607 LearningRate 0.000606 Epoch: 11 Global Step: 248210 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:11:25,861-Speed 2498.11 samples/sec Loss 3.1448 LearningRate 0.000606 Epoch: 11 Global Step: 248220 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:11:34,008-Speed 2514.31 samples/sec Loss 3.1336 LearningRate 0.000606 Epoch: 11 Global Step: 248230 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:11:42,211-Speed 2497.35 samples/sec Loss 3.1739 LearningRate 0.000606 Epoch: 11 Global Step: 248240 Fp16 
Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:11:50,414-Speed 2497.07 samples/sec Loss 3.1513 LearningRate 0.000606 Epoch: 11 Global Step: 248250 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:11:58,613-Speed 2498.20 samples/sec Loss 3.1543 LearningRate 0.000606 Epoch: 11 Global Step: 248260 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:12:06,813-Speed 2497.96 samples/sec Loss 3.1388 LearningRate 0.000606 Epoch: 11 Global Step: 248270 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:12:15,011-Speed 2498.67 samples/sec Loss 3.1637 LearningRate 0.000606 Epoch: 11 Global Step: 248280 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:12:23,159-Speed 2514.01 samples/sec Loss 3.1514 LearningRate 0.000606 Epoch: 11 Global Step: 248290 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:12:31,354-Speed 2499.42 samples/sec Loss 3.1696 LearningRate 0.000606 Epoch: 11 Global Step: 248300 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:12:39,549-Speed 2499.24 samples/sec Loss 3.2165 LearningRate 0.000606 Epoch: 11 Global Step: 248310 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:12:47,744-Speed 2499.47 samples/sec Loss 3.1227 LearningRate 0.000606 Epoch: 11 Global Step: 248320 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:12:55,944-Speed 2497.89 samples/sec Loss 3.2041 LearningRate 0.000606 Epoch: 11 Global Step: 248330 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:13:04,140-Speed 2499.11 samples/sec Loss 3.2073 LearningRate 0.000606 Epoch: 11 Global Step: 248340 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:13:12,289-Speed 2513.64 samples/sec Loss 3.1844 LearningRate 0.000606 Epoch: 11 Global Step: 248350 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:13:20,487-Speed 2498.64 samples/sec Loss 3.1634 LearningRate 0.000606 Epoch: 11 Global Step: 248360 
Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:13:28,682-Speed 2499.37 samples/sec Loss 3.2107 LearningRate 0.000606 Epoch: 11 Global Step: 248370 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:13:36,897-Speed 2493.43 samples/sec Loss 3.2109 LearningRate 0.000606 Epoch: 11 Global Step: 248380 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:13:45,097-Speed 2497.97 samples/sec Loss 3.1905 LearningRate 0.000606 Epoch: 11 Global Step: 248390 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:13:53,294-Speed 2498.74 samples/sec Loss 3.1302 LearningRate 0.000606 Epoch: 11 Global Step: 248400 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:14:01,436-Speed 2515.68 samples/sec Loss 3.1831 LearningRate 0.000606 Epoch: 11 Global Step: 248410 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:14:09,642-Speed 2496.26 samples/sec Loss 3.1783 LearningRate 0.000606 Epoch: 11 Global Step: 248420 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:14:17,842-Speed 2497.97 samples/sec Loss 3.1396 LearningRate 0.000606 Epoch: 11 Global Step: 248430 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:14:26,042-Speed 2498.11 samples/sec Loss 3.1479 LearningRate 0.000606 Epoch: 11 Global Step: 248440 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:14:34,239-Speed 2498.93 samples/sec Loss 3.1678 LearningRate 0.000606 Epoch: 11 Global Step: 248450 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:14:42,435-Speed 2498.99 samples/sec Loss 3.1665 LearningRate 0.000606 Epoch: 11 Global Step: 248460 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:14:50,581-Speed 2514.86 samples/sec Loss 3.1904 LearningRate 0.000606 Epoch: 11 Global Step: 248470 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:14:58,778-Speed 2498.87 samples/sec Loss 3.1747 LearningRate 0.000606 Epoch: 11 Global Step: 
248480 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:15:06,977-Speed 2498.21 samples/sec Loss 3.1036 LearningRate 0.000606 Epoch: 11 Global Step: 248490 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:15:15,173-Speed 2499.27 samples/sec Loss 3.1749 LearningRate 0.000606 Epoch: 11 Global Step: 248500 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:15:23,372-Speed 2498.24 samples/sec Loss 3.1811 LearningRate 0.000606 Epoch: 11 Global Step: 248510 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:15:31,571-Speed 2498.21 samples/sec Loss 3.1993 LearningRate 0.000606 Epoch: 11 Global Step: 248520 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:15:39,727-Speed 2511.51 samples/sec Loss 3.1761 LearningRate 0.000606 Epoch: 11 Global Step: 248530 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:15:47,935-Speed 2495.72 samples/sec Loss 3.1892 LearningRate 0.000606 Epoch: 11 Global Step: 248540 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:15:56,131-Speed 2498.98 samples/sec Loss 3.2236 LearningRate 0.000606 Epoch: 11 Global Step: 248550 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:16:04,325-Speed 2499.79 samples/sec Loss 3.2189 LearningRate 0.000606 Epoch: 11 Global Step: 248560 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:16:12,522-Speed 2499.12 samples/sec Loss 3.1941 LearningRate 0.000606 Epoch: 11 Global Step: 248570 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:16:20,721-Speed 2498.32 samples/sec Loss 3.1824 LearningRate 0.000606 Epoch: 11 Global Step: 248580 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:16:28,866-Speed 2514.75 samples/sec Loss 3.2114 LearningRate 0.000606 Epoch: 11 Global Step: 248590 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:16:37,063-Speed 2498.71 samples/sec Loss 3.1416 LearningRate 0.000605 Epoch: 11 Global 
Step: 248600 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:16:45,259-Speed 2499.34 samples/sec Loss 3.1292 LearningRate 0.000605 Epoch: 11 Global Step: 248610 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:16:53,459-Speed 2497.91 samples/sec Loss 3.1978 LearningRate 0.000605 Epoch: 11 Global Step: 248620 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:17:01,655-Speed 2499.32 samples/sec Loss 3.1204 LearningRate 0.000605 Epoch: 11 Global Step: 248630 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:17:09,855-Speed 2497.97 samples/sec Loss 3.1810 LearningRate 0.000605 Epoch: 11 Global Step: 248640 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:17:18,000-Speed 2514.67 samples/sec Loss 3.1791 LearningRate 0.000605 Epoch: 11 Global Step: 248650 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:17:26,198-Speed 2498.54 samples/sec Loss 3.2093 LearningRate 0.000605 Epoch: 11 Global Step: 248660 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:17:34,363-Speed 2508.80 samples/sec Loss 3.1926 LearningRate 0.000605 Epoch: 11 Global Step: 248670 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:17:42,560-Speed 2498.62 samples/sec Loss 3.2043 LearningRate 0.000605 Epoch: 11 Global Step: 248680 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:17:50,756-Speed 2499.34 samples/sec Loss 3.2372 LearningRate 0.000605 Epoch: 11 Global Step: 248690 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:17:58,953-Speed 2499.53 samples/sec Loss 3.1928 LearningRate 0.000605 Epoch: 11 Global Step: 248700 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:18:07,096-Speed 2515.34 samples/sec Loss 3.3474 LearningRate 0.000605 Epoch: 11 Global Step: 248710 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:18:15,297-Speed 2498.26 samples/sec Loss 3.2164 LearningRate 0.000605 Epoch: 11 
Global Step: 248720 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:18:23,495-Speed 2498.62 samples/sec Loss 3.1978 LearningRate 0.000605 Epoch: 11 Global Step: 248730 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:18:31,691-Speed 2499.03 samples/sec Loss 3.1726 LearningRate 0.000605 Epoch: 11 Global Step: 248740 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:18:39,893-Speed 2497.29 samples/sec Loss 3.2128 LearningRate 0.000605 Epoch: 11 Global Step: 248750 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:18:48,092-Speed 2498.64 samples/sec Loss 3.2085 LearningRate 0.000605 Epoch: 11 Global Step: 248760 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:18:56,239-Speed 2514.09 samples/sec Loss 3.2014 LearningRate 0.000605 Epoch: 11 Global Step: 248770 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:19:04,437-Speed 2498.53 samples/sec Loss 3.1974 LearningRate 0.000605 Epoch: 11 Global Step: 248780 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:19:12,639-Speed 2497.39 samples/sec Loss 3.2097 LearningRate 0.000605 Epoch: 11 Global Step: 248790 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:19:20,839-Speed 2498.07 samples/sec Loss 3.2299 LearningRate 0.000605 Epoch: 11 Global Step: 248800 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:19:29,041-Speed 2497.34 samples/sec Loss 3.2498 LearningRate 0.000605 Epoch: 11 Global Step: 248810 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:19:37,244-Speed 2497.11 samples/sec Loss 3.1506 LearningRate 0.000605 Epoch: 11 Global Step: 248820 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:19:45,388-Speed 2515.24 samples/sec Loss 3.2129 LearningRate 0.000605 Epoch: 11 Global Step: 248830 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:19:53,587-Speed 2497.97 samples/sec Loss 3.1804 LearningRate 0.000605 
Epoch: 11 Global Step: 248840 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:20:01,788-Speed 2497.88 samples/sec Loss 3.2227 LearningRate 0.000605 Epoch: 11 Global Step: 248850 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:20:09,989-Speed 2497.98 samples/sec Loss 3.1656 LearningRate 0.000605 Epoch: 11 Global Step: 248860 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:20:18,185-Speed 2499.23 samples/sec Loss 3.1797 LearningRate 0.000605 Epoch: 11 Global Step: 248870 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:20:28,438-Speed 1997.71 samples/sec Loss 3.1687 LearningRate 0.000605 Epoch: 12 Global Step: 248880 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:20:36,580-Speed 2515.88 samples/sec Loss 3.1330 LearningRate 0.000605 Epoch: 12 Global Step: 248890 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:20:44,790-Speed 2494.66 samples/sec Loss 3.1790 LearningRate 0.000605 Epoch: 12 Global Step: 248900 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:20:53,002-Speed 2494.56 samples/sec Loss 3.1597 LearningRate 0.000605 Epoch: 12 Global Step: 248910 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:21:01,209-Speed 2495.82 samples/sec Loss 3.1487 LearningRate 0.000605 Epoch: 12 Global Step: 248920 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:21:09,412-Speed 2497.09 samples/sec Loss 3.1272 LearningRate 0.000605 Epoch: 12 Global Step: 248930 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:21:17,620-Speed 2495.43 samples/sec Loss 3.1309 LearningRate 0.000605 Epoch: 12 Global Step: 248940 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:21:25,776-Speed 2511.29 samples/sec Loss 3.1952 LearningRate 0.000605 Epoch: 12 Global Step: 248950 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:21:33,993-Speed 2492.95 samples/sec Loss 3.1945 LearningRate 
0.000605 Epoch: 12 Global Step: 248960 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:21:42,205-Speed 2494.35 samples/sec Loss 3.1606 LearningRate 0.000605 Epoch: 12 Global Step: 248970 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:21:50,418-Speed 2493.95 samples/sec Loss 3.2137 LearningRate 0.000605 Epoch: 12 Global Step: 248980 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:21:58,627-Speed 2495.17 samples/sec Loss 3.1163 LearningRate 0.000605 Epoch: 12 Global Step: 248990 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:22:06,838-Speed 2494.76 samples/sec Loss 3.2614 LearningRate 0.000605 Epoch: 12 Global Step: 249000 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:22:14,994-Speed 2511.30 samples/sec Loss 3.2639 LearningRate 0.000605 Epoch: 12 Global Step: 249010 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:22:23,199-Speed 2496.64 samples/sec Loss 3.1994 LearningRate 0.000605 Epoch: 12 Global Step: 249020 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:22:31,414-Speed 2493.72 samples/sec Loss 3.1776 LearningRate 0.000605 Epoch: 12 Global Step: 249030 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:22:39,615-Speed 2497.62 samples/sec Loss 3.1861 LearningRate 0.000605 Epoch: 12 Global Step: 249040 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:22:47,816-Speed 2497.74 samples/sec Loss 3.1355 LearningRate 0.000605 Epoch: 12 Global Step: 249050 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:22:56,012-Speed 2499.25 samples/sec Loss 3.2070 LearningRate 0.000605 Epoch: 12 Global Step: 249060 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:23:04,156-Speed 2515.24 samples/sec Loss 3.1583 LearningRate 0.000605 Epoch: 12 Global Step: 249070 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:23:12,355-Speed 2498.22 samples/sec Loss 3.1342 
LearningRate 0.000604 Epoch: 12 Global Step: 249080 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:23:20,557-Speed 2497.53 samples/sec Loss 3.1390 LearningRate 0.000604 Epoch: 12 Global Step: 249090 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:23:28,757-Speed 2497.82 samples/sec Loss 3.1462 LearningRate 0.000604 Epoch: 12 Global Step: 249100 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:23:36,962-Speed 2496.49 samples/sec Loss 3.1106 LearningRate 0.000604 Epoch: 12 Global Step: 249110 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:23:45,164-Speed 2497.23 samples/sec Loss 3.1943 LearningRate 0.000604 Epoch: 12 Global Step: 249120 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:23:53,325-Speed 2510.29 samples/sec Loss 3.1289 LearningRate 0.000604 Epoch: 12 Global Step: 249130 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:24:01,531-Speed 2496.07 samples/sec Loss 3.1916 LearningRate 0.000604 Epoch: 12 Global Step: 249140 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:24:09,736-Speed 2496.40 samples/sec Loss 3.2600 LearningRate 0.000604 Epoch: 12 Global Step: 249150 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:24:17,941-Speed 2496.33 samples/sec Loss 3.2227 LearningRate 0.000604 Epoch: 12 Global Step: 249160 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:24:26,142-Speed 2497.83 samples/sec Loss 3.2380 LearningRate 0.000604 Epoch: 12 Global Step: 249170 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:24:34,343-Speed 2497.49 samples/sec Loss 3.1825 LearningRate 0.000604 Epoch: 12 Global Step: 249180 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:24:42,509-Speed 2508.57 samples/sec Loss 3.2277 LearningRate 0.000604 Epoch: 12 Global Step: 249190 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:24:50,714-Speed 2496.57 samples/sec Loss 
3.1617 LearningRate 0.000604 Epoch: 12 Global Step: 249200 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:24:58,931-Speed 2493.01 samples/sec Loss 3.1224 LearningRate 0.000604 Epoch: 12 Global Step: 249210 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:25:07,130-Speed 2498.25 samples/sec Loss 3.1298 LearningRate 0.000604 Epoch: 12 Global Step: 249220 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:25:15,329-Speed 2498.37 samples/sec Loss 3.1456 LearningRate 0.000604 Epoch: 12 Global Step: 249230 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:25:23,531-Speed 2497.14 samples/sec Loss 3.1905 LearningRate 0.000604 Epoch: 12 Global Step: 249240 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:25:31,677-Speed 2514.63 samples/sec Loss 3.2023 LearningRate 0.000604 Epoch: 12 Global Step: 249250 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:25:39,883-Speed 2496.05 samples/sec Loss 3.1696 LearningRate 0.000604 Epoch: 12 Global Step: 249260 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:25:48,090-Speed 2495.66 samples/sec Loss 3.0935 LearningRate 0.000604 Epoch: 12 Global Step: 249270 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:25:56,291-Speed 2497.72 samples/sec Loss 3.1608 LearningRate 0.000604 Epoch: 12 Global Step: 249280 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:26:04,490-Speed 2498.02 samples/sec Loss 3.0946 LearningRate 0.000604 Epoch: 12 Global Step: 249290 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:26:12,692-Speed 2497.30 samples/sec Loss 3.1616 LearningRate 0.000604 Epoch: 12 Global Step: 249300 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:26:20,838-Speed 2514.65 samples/sec Loss 3.1982 LearningRate 0.000604 Epoch: 12 Global Step: 249310 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:26:29,042-Speed 2496.52 samples/sec 
Loss 3.1725 LearningRate 0.000604 Epoch: 12 Global Step: 249320 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:26:37,244-Speed 2497.37 samples/sec Loss 3.0631 LearningRate 0.000604 Epoch: 12 Global Step: 249330 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:26:45,446-Speed 2497.45 samples/sec Loss 3.1993 LearningRate 0.000604 Epoch: 12 Global Step: 249340 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:26:53,646-Speed 2497.92 samples/sec Loss 3.2050 LearningRate 0.000604 Epoch: 12 Global Step: 249350 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:27:01,846-Speed 2497.95 samples/sec Loss 3.1001 LearningRate 0.000604 Epoch: 12 Global Step: 249360 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:27:09,994-Speed 2513.89 samples/sec Loss 3.1881 LearningRate 0.000604 Epoch: 12 Global Step: 249370 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:27:18,194-Speed 2497.99 samples/sec Loss 3.1199 LearningRate 0.000604 Epoch: 12 Global Step: 249380 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:27:26,408-Speed 2493.70 samples/sec Loss 3.0712 LearningRate 0.000604 Epoch: 12 Global Step: 249390 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:27:34,607-Speed 2498.15 samples/sec Loss 3.1107 LearningRate 0.000604 Epoch: 12 Global Step: 249400 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:27:42,805-Speed 2498.63 samples/sec Loss 3.1335 LearningRate 0.000604 Epoch: 12 Global Step: 249410 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:27:51,008-Speed 2497.06 samples/sec Loss 3.1441 LearningRate 0.000604 Epoch: 12 Global Step: 249420 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:27:59,161-Speed 2512.32 samples/sec Loss 3.1629 LearningRate 0.000604 Epoch: 12 Global Step: 249430 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:28:07,365-Speed 2496.72 
samples/sec Loss 3.2283 LearningRate 0.000604 Epoch: 12 Global Step: 249440 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:28:15,578-Speed 2494.00 samples/sec Loss 3.1541 LearningRate 0.000604 Epoch: 12 Global Step: 249450 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:28:23,779-Speed 2497.86 samples/sec Loss 3.1140 LearningRate 0.000604 Epoch: 12 Global Step: 249460 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:28:31,976-Speed 2498.67 samples/sec Loss 3.2596 LearningRate 0.000604 Epoch: 12 Global Step: 249470 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:28:40,190-Speed 2493.79 samples/sec Loss 3.1208 LearningRate 0.000604 Epoch: 12 Global Step: 249480 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:28:48,337-Speed 2514.58 samples/sec Loss 3.1634 LearningRate 0.000604 Epoch: 12 Global Step: 249490 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:28:56,539-Speed 2497.43 samples/sec Loss 3.1635 LearningRate 0.000604 Epoch: 12 Global Step: 249500 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:29:04,740-Speed 2497.66 samples/sec Loss 3.1865 LearningRate 0.000604 Epoch: 12 Global Step: 249510 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:29:12,948-Speed 2495.49 samples/sec Loss 3.1486 LearningRate 0.000604 Epoch: 12 Global Step: 249520 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:29:21,146-Speed 2498.45 samples/sec Loss 3.1710 LearningRate 0.000604 Epoch: 12 Global Step: 249530 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:29:29,348-Speed 2497.40 samples/sec Loss 3.1638 LearningRate 0.000604 Epoch: 12 Global Step: 249540 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:29:37,494-Speed 2514.62 samples/sec Loss 3.1801 LearningRate 0.000604 Epoch: 12 Global Step: 249550 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:29:45,713-Speed 
2492.10 samples/sec Loss 3.1418 LearningRate 0.000603 Epoch: 12 Global Step: 249560 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:29:53,921-Speed 2495.57 samples/sec Loss 3.1224 LearningRate 0.000603 Epoch: 12 Global Step: 249570 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:30:02,123-Speed 2497.53 samples/sec Loss 3.1339 LearningRate 0.000603 Epoch: 12 Global Step: 249580 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:30:10,333-Speed 2494.90 samples/sec Loss 3.1682 LearningRate 0.000603 Epoch: 12 Global Step: 249590 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:30:18,535-Speed 2497.08 samples/sec Loss 3.3162 LearningRate 0.000603 Epoch: 12 Global Step: 249600 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:30:26,684-Speed 2513.80 samples/sec Loss 3.1773 LearningRate 0.000603 Epoch: 12 Global Step: 249610 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:30:34,892-Speed 2495.66 samples/sec Loss 3.3673 LearningRate 0.000603 Epoch: 12 Global Step: 249620 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:30:43,092-Speed 2497.96 samples/sec Loss 3.2955 LearningRate 0.000603 Epoch: 12 Global Step: 249630 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:30:51,305-Speed 2494.10 samples/sec Loss 3.2602 LearningRate 0.000603 Epoch: 12 Global Step: 249640 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:30:59,503-Speed 2498.35 samples/sec Loss 3.2367 LearningRate 0.000603 Epoch: 12 Global Step: 249650 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:31:07,707-Speed 2496.94 samples/sec Loss 3.2881 LearningRate 0.000603 Epoch: 12 Global Step: 249660 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:31:15,853-Speed 2514.33 samples/sec Loss 3.2830 LearningRate 0.000603 Epoch: 12 Global Step: 249670 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 
23:31:24,055-Speed 2497.36 samples/sec Loss 3.1695 LearningRate 0.000603 Epoch: 12 Global Step: 249680 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:31:32,255-Speed 2498.06 samples/sec Loss 3.2214 LearningRate 0.000603 Epoch: 12 Global Step: 249690 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:31:40,456-Speed 2497.85 samples/sec Loss 3.2089 LearningRate 0.000603 Epoch: 12 Global Step: 249700 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:31:48,658-Speed 2500.07 samples/sec Loss 3.1791 LearningRate 0.000603 Epoch: 12 Global Step: 249710 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:31:56,858-Speed 2498.06 samples/sec Loss 3.2064 LearningRate 0.000603 Epoch: 12 Global Step: 249720 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:32:05,003-Speed 2514.92 samples/sec Loss 3.1427 LearningRate 0.000603 Epoch: 12 Global Step: 249730 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:32:13,203-Speed 2498.07 samples/sec Loss 3.1806 LearningRate 0.000603 Epoch: 12 Global Step: 249740 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:32:21,404-Speed 2497.70 samples/sec Loss 3.1588 LearningRate 0.000603 Epoch: 12 Global Step: 249750 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:32:29,602-Speed 2498.46 samples/sec Loss 3.2370 LearningRate 0.000603 Epoch: 12 Global Step: 249760 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:32:37,800-Speed 2498.86 samples/sec Loss 3.1567 LearningRate 0.000603 Epoch: 12 Global Step: 249770 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:32:46,013-Speed 2493.88 samples/sec Loss 3.1778 LearningRate 0.000603 Epoch: 12 Global Step: 249780 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:32:54,167-Speed 2511.95 samples/sec Loss 3.1995 LearningRate 0.000603 Epoch: 12 Global Step: 249790 Fp16 Grad Scale: 32768 Required: 133 hours Training: 
2022-07-07 23:33:02,368-Speed 2497.84 samples/sec Loss 3.1941 LearningRate 0.000603 Epoch: 12 Global Step: 249800 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:33:10,565-Speed 2499.06 samples/sec Loss 3.1696 LearningRate 0.000603 Epoch: 12 Global Step: 249810 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:33:18,765-Speed 2497.84 samples/sec Loss 3.1882 LearningRate 0.000603 Epoch: 12 Global Step: 249820 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:33:26,965-Speed 2498.09 samples/sec Loss 3.1709 LearningRate 0.000603 Epoch: 12 Global Step: 249830 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:33:35,163-Speed 2498.57 samples/sec Loss 3.1653 LearningRate 0.000603 Epoch: 12 Global Step: 249840 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:33:43,311-Speed 2514.19 samples/sec Loss 3.1776 LearningRate 0.000603 Epoch: 12 Global Step: 249850 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:33:51,510-Speed 2498.10 samples/sec Loss 3.1940 LearningRate 0.000603 Epoch: 12 Global Step: 249860 Fp16 Grad Scale: 32768 Required: 133 hours Training: 2022-07-07 23:33:59,708-Speed 2498.45 samples/sec Loss 3.2523 LearningRate 0.000603 Epoch: 12 Global Step: 249870 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:34:07,919-Speed 2494.80 samples/sec Loss 3.1356 LearningRate 0.000603 Epoch: 12 Global Step: 249880 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:34:16,131-Speed 2494.16 samples/sec Loss 3.2089 LearningRate 0.000603 Epoch: 12 Global Step: 249890 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:34:24,329-Speed 2498.74 samples/sec Loss 3.1792 LearningRate 0.000603 Epoch: 12 Global Step: 249900 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:34:32,473-Speed 2514.97 samples/sec Loss 3.1676 LearningRate 0.000603 Epoch: 12 Global Step: 249910 Fp16 Grad Scale: 65536 Required: 133 hours 
Training: 2022-07-07 23:34:40,673-Speed 2497.90 samples/sec Loss 3.1764 LearningRate 0.000603 Epoch: 12 Global Step: 249920 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:34:48,873-Speed 2498.10 samples/sec Loss 3.2178 LearningRate 0.000603 Epoch: 12 Global Step: 249930 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:34:57,084-Speed 2494.59 samples/sec Loss 3.1357 LearningRate 0.000603 Epoch: 12 Global Step: 249940 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:35:05,296-Speed 2494.30 samples/sec Loss 3.1611 LearningRate 0.000603 Epoch: 12 Global Step: 249950 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:35:13,495-Speed 2498.42 samples/sec Loss 3.1679 LearningRate 0.000603 Epoch: 12 Global Step: 249960 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:35:21,650-Speed 2511.86 samples/sec Loss 3.1673 LearningRate 0.000603 Epoch: 12 Global Step: 249970 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:35:29,852-Speed 2497.18 samples/sec Loss 3.1516 LearningRate 0.000603 Epoch: 12 Global Step: 249980 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:35:38,053-Speed 2497.61 samples/sec Loss 3.2279 LearningRate 0.000603 Epoch: 12 Global Step: 249990 Fp16 Grad Scale: 65536 Required: 133 hours Training: 2022-07-07 23:35:46,254-Speed 2497.81 samples/sec Loss 3.1902 LearningRate 0.000603 Epoch: 12 Global Step: 250000 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:35:54,454-Speed 2497.78 samples/sec Loss 3.1688 LearningRate 0.000603 Epoch: 12 Global Step: 250010 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:36:02,656-Speed 2497.41 samples/sec Loss 3.1183 LearningRate 0.000603 Epoch: 12 Global Step: 250020 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:36:10,809-Speed 2512.36 samples/sec Loss 3.1340 LearningRate 0.000603 Epoch: 12 Global Step: 250030 Fp16 Grad Scale: 65536 Required: 132 
hours Training: 2022-07-07 23:36:19,014-Speed 2496.63 samples/sec Loss 3.1292 LearningRate 0.000603 Epoch: 12 Global Step: 250040 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:36:27,225-Speed 2494.53 samples/sec Loss 3.1162 LearningRate 0.000602 Epoch: 12 Global Step: 250050 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:36:35,426-Speed 2497.84 samples/sec Loss 3.1003 LearningRate 0.000602 Epoch: 12 Global Step: 250060 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:36:43,595-Speed 2507.66 samples/sec Loss 3.1963 LearningRate 0.000602 Epoch: 12 Global Step: 250070 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:36:51,795-Speed 2497.79 samples/sec Loss 3.1021 LearningRate 0.000602 Epoch: 12 Global Step: 250080 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:36:59,944-Speed 2513.64 samples/sec Loss 3.2053 LearningRate 0.000602 Epoch: 12 Global Step: 250090 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:37:08,144-Speed 2497.90 samples/sec Loss 3.1521 LearningRate 0.000602 Epoch: 12 Global Step: 250100 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:37:16,345-Speed 2497.53 samples/sec Loss 3.1015 LearningRate 0.000602 Epoch: 12 Global Step: 250110 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:37:24,544-Speed 2498.38 samples/sec Loss 3.1701 LearningRate 0.000602 Epoch: 12 Global Step: 250120 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:37:32,744-Speed 2498.01 samples/sec Loss 3.1221 LearningRate 0.000602 Epoch: 12 Global Step: 250130 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:37:40,951-Speed 2495.71 samples/sec Loss 3.1481 LearningRate 0.000602 Epoch: 12 Global Step: 250140 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:37:49,101-Speed 2513.43 samples/sec Loss 3.1634 LearningRate 0.000602 Epoch: 12 Global Step: 250150 Fp16 Grad Scale: 32768 Required: 
132 hours Training: 2022-07-07 23:37:57,296-Speed 2499.49 samples/sec Loss 3.1674 LearningRate 0.000602 Epoch: 12 Global Step: 250160 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:38:05,496-Speed 2498.03 samples/sec Loss 3.1275 LearningRate 0.000602 Epoch: 12 Global Step: 250170 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:38:13,702-Speed 2496.07 samples/sec Loss 3.2118 LearningRate 0.000602 Epoch: 12 Global Step: 250180 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:38:21,901-Speed 2498.31 samples/sec Loss 3.1682 LearningRate 0.000602 Epoch: 12 Global Step: 250190 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:38:30,099-Speed 2498.37 samples/sec Loss 3.0602 LearningRate 0.000602 Epoch: 12 Global Step: 250200 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:38:38,243-Speed 2515.54 samples/sec Loss 3.1460 LearningRate 0.000602 Epoch: 12 Global Step: 250210 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:38:46,443-Speed 2497.69 samples/sec Loss 3.1770 LearningRate 0.000602 Epoch: 12 Global Step: 250220 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:38:54,640-Speed 2498.88 samples/sec Loss 3.1620 LearningRate 0.000602 Epoch: 12 Global Step: 250230 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:39:02,840-Speed 2497.78 samples/sec Loss 3.1577 LearningRate 0.000602 Epoch: 12 Global Step: 250240 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:39:11,040-Speed 2498.18 samples/sec Loss 3.2048 LearningRate 0.000602 Epoch: 12 Global Step: 250250 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:39:19,238-Speed 2498.63 samples/sec Loss 3.1260 LearningRate 0.000602 Epoch: 12 Global Step: 250260 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:39:27,383-Speed 2514.82 samples/sec Loss 3.1687 LearningRate 0.000602 Epoch: 12 Global Step: 250270 Fp16 Grad Scale: 32768 
Required: 132 hours Training: 2022-07-07 23:39:35,582-Speed 2498.12 samples/sec Loss 3.0965 LearningRate 0.000602 Epoch: 12 Global Step: 250280 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:39:43,782-Speed 2497.98 samples/sec Loss 3.1067 LearningRate 0.000602 Epoch: 12 Global Step: 250290 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:39:51,998-Speed 2493.20 samples/sec Loss 3.1321 LearningRate 0.000602 Epoch: 12 Global Step: 250300 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:40:00,209-Speed 2494.61 samples/sec Loss 3.1279 LearningRate 0.000602 Epoch: 12 Global Step: 250310 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:40:08,408-Speed 2498.29 samples/sec Loss 3.1410 LearningRate 0.000602 Epoch: 12 Global Step: 250320 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:40:16,552-Speed 2515.13 samples/sec Loss 3.1298 LearningRate 0.000602 Epoch: 12 Global Step: 250330 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:40:24,749-Speed 2499.07 samples/sec Loss 3.2094 LearningRate 0.000602 Epoch: 12 Global Step: 250340 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:40:32,948-Speed 2498.28 samples/sec Loss 3.1780 LearningRate 0.000602 Epoch: 12 Global Step: 250350 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:40:41,145-Speed 2498.58 samples/sec Loss 3.1430 LearningRate 0.000602 Epoch: 12 Global Step: 250360 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:40:49,345-Speed 2498.36 samples/sec Loss 3.1033 LearningRate 0.000602 Epoch: 12 Global Step: 250370 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:40:57,543-Speed 2498.27 samples/sec Loss 3.1601 LearningRate 0.000602 Epoch: 12 Global Step: 250380 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:41:05,687-Speed 2515.25 samples/sec Loss 3.1893 LearningRate 0.000602 Epoch: 12 Global Step: 250390 Fp16 Grad Scale: 
32768 Required: 132 hours Training: 2022-07-07 23:41:13,884-Speed 2499.13 samples/sec Loss 3.1062 LearningRate 0.000602 Epoch: 12 Global Step: 250400 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:41:22,081-Speed 2499.19 samples/sec Loss 3.1055 LearningRate 0.000602 Epoch: 12 Global Step: 250410 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:41:30,288-Speed 2495.73 samples/sec Loss 3.1234 LearningRate 0.000602 Epoch: 12 Global Step: 250420 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:41:38,484-Speed 2499.15 samples/sec Loss 3.1352 LearningRate 0.000602 Epoch: 12 Global Step: 250430 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:41:46,686-Speed 2497.90 samples/sec Loss 3.0439 LearningRate 0.000602 Epoch: 12 Global Step: 250440 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:41:54,834-Speed 2513.77 samples/sec Loss 3.1546 LearningRate 0.000602 Epoch: 12 Global Step: 250450 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:42:03,032-Speed 2498.76 samples/sec Loss 3.1844 LearningRate 0.000602 Epoch: 12 Global Step: 250460 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:42:11,232-Speed 2498.06 samples/sec Loss 3.1116 LearningRate 0.000602 Epoch: 12 Global Step: 250470 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:42:19,428-Speed 2499.12 samples/sec Loss 3.1051 LearningRate 0.000602 Epoch: 12 Global Step: 250480 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:42:27,623-Speed 2499.39 samples/sec Loss 3.1594 LearningRate 0.000602 Epoch: 12 Global Step: 250490 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:42:35,821-Speed 2498.50 samples/sec Loss 3.1971 LearningRate 0.000602 Epoch: 12 Global Step: 250500 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:42:43,964-Speed 2515.59 samples/sec Loss 3.1964 LearningRate 0.000602 Epoch: 12 Global Step: 250510 Fp16 Grad 
Scale: 32768 Required: 132 hours Training: 2022-07-07 23:42:52,160-Speed 2499.02 samples/sec Loss 3.1374 LearningRate 0.000602 Epoch: 12 Global Step: 250520 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:43:00,358-Speed 2499.02 samples/sec Loss 3.1822 LearningRate 0.000601 Epoch: 12 Global Step: 250530 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:43:08,560-Speed 2497.29 samples/sec Loss 3.2086 LearningRate 0.000601 Epoch: 12 Global Step: 250540 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:43:16,770-Speed 2495.07 samples/sec Loss 3.2027 LearningRate 0.000601 Epoch: 12 Global Step: 250550 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:43:24,968-Speed 2498.52 samples/sec Loss 3.1517 LearningRate 0.000601 Epoch: 12 Global Step: 250560 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:43:33,117-Speed 2513.78 samples/sec Loss 3.1177 LearningRate 0.000601 Epoch: 12 Global Step: 250570 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:43:41,313-Speed 2499.39 samples/sec Loss 3.0617 LearningRate 0.000601 Epoch: 12 Global Step: 250580 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:43:49,524-Speed 2494.46 samples/sec Loss 3.1637 LearningRate 0.000601 Epoch: 12 Global Step: 250590 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:43:57,719-Speed 2499.50 samples/sec Loss 3.1295 LearningRate 0.000601 Epoch: 12 Global Step: 250600 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:44:05,917-Speed 2498.57 samples/sec Loss 3.1305 LearningRate 0.000601 Epoch: 12 Global Step: 250610 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:44:14,116-Speed 2498.02 samples/sec Loss 3.1075 LearningRate 0.000601 Epoch: 12 Global Step: 250620 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:44:22,258-Speed 2515.92 samples/sec Loss 3.1265 LearningRate 0.000601 Epoch: 12 Global Step: 250630 Fp16 
Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:44:30,456-Speed 2498.71 samples/sec Loss 3.1689 LearningRate 0.000601 Epoch: 12 Global Step: 250640 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:44:38,655-Speed 2498.23 samples/sec Loss 3.2031 LearningRate 0.000601 Epoch: 12 Global Step: 250650 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:44:46,854-Speed 2498.43 samples/sec Loss 3.1627 LearningRate 0.000601 Epoch: 12 Global Step: 250660 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:44:55,053-Speed 2498.21 samples/sec Loss 3.1281 LearningRate 0.000601 Epoch: 12 Global Step: 250670 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:45:03,251-Speed 2498.34 samples/sec Loss 3.1997 LearningRate 0.000601 Epoch: 12 Global Step: 250680 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:45:11,399-Speed 2514.26 samples/sec Loss 3.1301 LearningRate 0.000601 Epoch: 12 Global Step: 250690 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:45:19,596-Speed 2498.67 samples/sec Loss 3.1247 LearningRate 0.000601 Epoch: 12 Global Step: 250700 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:45:27,794-Speed 2498.55 samples/sec Loss 3.1897 LearningRate 0.000601 Epoch: 12 Global Step: 250710 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:45:35,989-Speed 2499.63 samples/sec Loss 3.1465 LearningRate 0.000601 Epoch: 12 Global Step: 250720 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:45:44,190-Speed 2497.52 samples/sec Loss 3.1534 LearningRate 0.000601 Epoch: 12 Global Step: 250730 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:45:52,386-Speed 2499.22 samples/sec Loss 3.2437 LearningRate 0.000601 Epoch: 12 Global Step: 250740 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:46:00,531-Speed 2514.92 samples/sec Loss 3.1768 LearningRate 0.000601 Epoch: 12 Global Step: 250750 
Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:46:08,727-Speed 2498.91 samples/sec Loss 3.2007 LearningRate 0.000601 Epoch: 12 Global Step: 250760 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:46:16,940-Speed 2494.02 samples/sec Loss 3.1261 LearningRate 0.000601 Epoch: 12 Global Step: 250770 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:46:25,139-Speed 2498.20 samples/sec Loss 3.1841 LearningRate 0.000601 Epoch: 12 Global Step: 250780 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:46:33,336-Speed 2498.90 samples/sec Loss 3.1887 LearningRate 0.000601 Epoch: 12 Global Step: 250790 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:46:41,544-Speed 2495.65 samples/sec Loss 3.1178 LearningRate 0.000601 Epoch: 12 Global Step: 250800 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:46:49,689-Speed 2514.98 samples/sec Loss 3.1415 LearningRate 0.000601 Epoch: 12 Global Step: 250810 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:46:57,887-Speed 2498.62 samples/sec Loss 3.1641 LearningRate 0.000601 Epoch: 12 Global Step: 250820 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:47:06,085-Speed 2498.43 samples/sec Loss 3.1361 LearningRate 0.000601 Epoch: 12 Global Step: 250830 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:47:14,283-Speed 2498.54 samples/sec Loss 3.1243 LearningRate 0.000601 Epoch: 12 Global Step: 250840 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:47:22,484-Speed 2497.60 samples/sec Loss 3.1876 LearningRate 0.000601 Epoch: 12 Global Step: 250850 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:47:30,694-Speed 2494.77 samples/sec Loss 3.1694 LearningRate 0.000601 Epoch: 12 Global Step: 250860 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:47:38,837-Speed 2515.52 samples/sec Loss 3.1674 LearningRate 0.000601 Epoch: 12 Global Step: 
250870 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:47:47,041-Speed 2496.62 samples/sec Loss 3.1325 LearningRate 0.000601 Epoch: 12 Global Step: 250880 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:47:55,237-Speed 2499.11 samples/sec Loss 3.0962 LearningRate 0.000601 Epoch: 12 Global Step: 250890 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:48:03,434-Speed 2499.16 samples/sec Loss 3.1448 LearningRate 0.000601 Epoch: 12 Global Step: 250900 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:48:11,632-Speed 2498.67 samples/sec Loss 3.1228 LearningRate 0.000601 Epoch: 12 Global Step: 250910 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:48:19,834-Speed 2497.41 samples/sec Loss 3.1566 LearningRate 0.000601 Epoch: 12 Global Step: 250920 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:48:27,973-Speed 2516.41 samples/sec Loss 3.1308 LearningRate 0.000601 Epoch: 12 Global Step: 250930 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:48:36,171-Speed 2498.39 samples/sec Loss 3.1282 LearningRate 0.000601 Epoch: 12 Global Step: 250940 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:48:44,368-Speed 2499.12 samples/sec Loss 3.1702 LearningRate 0.000601 Epoch: 12 Global Step: 250950 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:48:52,574-Speed 2496.05 samples/sec Loss 3.1636 LearningRate 0.000601 Epoch: 12 Global Step: 250960 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:49:00,773-Speed 2498.08 samples/sec Loss 3.1969 LearningRate 0.000601 Epoch: 12 Global Step: 250970 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:49:08,972-Speed 2498.43 samples/sec Loss 3.1954 LearningRate 0.000601 Epoch: 12 Global Step: 250980 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:49:17,114-Speed 2515.56 samples/sec Loss 3.1244 LearningRate 0.000601 Epoch: 12 Global 
Step: 250990 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:49:25,313-Speed 2498.46 samples/sec Loss 3.1889 LearningRate 0.000601 Epoch: 12 Global Step: 251000 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:49:33,513-Speed 2497.82 samples/sec Loss 3.1782 LearningRate 0.000600 Epoch: 12 Global Step: 251010 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:49:41,715-Speed 2497.41 samples/sec Loss 3.1490 LearningRate 0.000600 Epoch: 12 Global Step: 251020 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:49:49,910-Speed 2499.50 samples/sec Loss 3.2609 LearningRate 0.000600 Epoch: 12 Global Step: 251030 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:49:58,112-Speed 2497.42 samples/sec Loss 3.1846 LearningRate 0.000600 Epoch: 12 Global Step: 251040 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:50:06,255-Speed 2515.46 samples/sec Loss 3.2546 LearningRate 0.000600 Epoch: 12 Global Step: 251050 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:50:14,460-Speed 2496.39 samples/sec Loss 3.2589 LearningRate 0.000600 Epoch: 12 Global Step: 251060 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:50:22,657-Speed 2498.95 samples/sec Loss 3.2497 LearningRate 0.000600 Epoch: 12 Global Step: 251070 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:50:30,883-Speed 2490.01 samples/sec Loss 3.2618 LearningRate 0.000600 Epoch: 12 Global Step: 251080 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:50:39,079-Speed 2499.33 samples/sec Loss 3.2355 LearningRate 0.000600 Epoch: 12 Global Step: 251090 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:50:47,279-Speed 2498.09 samples/sec Loss 3.2946 LearningRate 0.000600 Epoch: 12 Global Step: 251100 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:50:55,425-Speed 2514.53 samples/sec Loss 3.2500 LearningRate 0.000600 Epoch: 12 
Global Step: 251110 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:51:03,617-Speed 2500.22 samples/sec Loss 3.1748 LearningRate 0.000600 Epoch: 12 Global Step: 251120 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:51:11,811-Speed 2499.87 samples/sec Loss 3.1666 LearningRate 0.000600 Epoch: 12 Global Step: 251130 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:51:20,009-Speed 2498.66 samples/sec Loss 3.1878 LearningRate 0.000600 Epoch: 12 Global Step: 251140 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:51:28,205-Speed 2499.11 samples/sec Loss 3.1301 LearningRate 0.000600 Epoch: 12 Global Step: 251150 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:51:36,403-Speed 2498.85 samples/sec Loss 3.1629 LearningRate 0.000600 Epoch: 12 Global Step: 251160 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:51:44,548-Speed 2514.97 samples/sec Loss 3.1681 LearningRate 0.000600 Epoch: 12 Global Step: 251170 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:51:52,743-Speed 2499.60 samples/sec Loss 3.1703 LearningRate 0.000600 Epoch: 12 Global Step: 251180 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:52:00,939-Speed 2498.95 samples/sec Loss 3.1569 LearningRate 0.000600 Epoch: 12 Global Step: 251190 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:52:09,135-Speed 2499.35 samples/sec Loss 3.1637 LearningRate 0.000600 Epoch: 12 Global Step: 251200 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:52:17,347-Speed 2494.09 samples/sec Loss 3.2029 LearningRate 0.000600 Epoch: 12 Global Step: 251210 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:52:25,544-Speed 2498.96 samples/sec Loss 3.1253 LearningRate 0.000600 Epoch: 12 Global Step: 251220 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:52:33,687-Speed 2515.45 samples/sec Loss 3.2131 LearningRate 0.000600 
Epoch: 12 Global Step: 251230 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:52:41,886-Speed 2498.27 samples/sec Loss 3.1637 LearningRate 0.000600 Epoch: 12 Global Step: 251240 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:52:50,082-Speed 2499.22 samples/sec Loss 3.1908 LearningRate 0.000600 Epoch: 12 Global Step: 251250 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:52:58,276-Speed 2499.62 samples/sec Loss 3.1392 LearningRate 0.000600 Epoch: 12 Global Step: 251260 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:53:06,474-Speed 2498.70 samples/sec Loss 3.1497 LearningRate 0.000600 Epoch: 12 Global Step: 251270 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:53:14,679-Speed 2496.15 samples/sec Loss 3.1890 LearningRate 0.000600 Epoch: 12 Global Step: 251280 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:53:22,830-Speed 2513.33 samples/sec Loss 3.2792 LearningRate 0.000600 Epoch: 12 Global Step: 251290 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:53:31,027-Speed 2498.96 samples/sec Loss 3.1693 LearningRate 0.000600 Epoch: 12 Global Step: 251300 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:53:39,233-Speed 2495.99 samples/sec Loss 3.1870 LearningRate 0.000600 Epoch: 12 Global Step: 251310 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:53:47,433-Speed 2498.12 samples/sec Loss 3.1505 LearningRate 0.000600 Epoch: 12 Global Step: 251320 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:53:55,634-Speed 2497.63 samples/sec Loss 3.2034 LearningRate 0.000600 Epoch: 12 Global Step: 251330 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:54:03,833-Speed 2498.27 samples/sec Loss 3.2131 LearningRate 0.000600 Epoch: 12 Global Step: 251340 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:54:11,979-Speed 2514.51 samples/sec Loss 3.1612 LearningRate 
0.000600 Epoch: 12 Global Step: 251350 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:54:20,184-Speed 2496.34 samples/sec Loss 3.1813 LearningRate 0.000600 Epoch: 12 Global Step: 251360 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:54:28,382-Speed 2498.61 samples/sec Loss 3.1726 LearningRate 0.000600 Epoch: 12 Global Step: 251370 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:54:36,583-Speed 2497.46 samples/sec Loss 3.1276 LearningRate 0.000600 Epoch: 12 Global Step: 251380 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:54:44,784-Speed 2497.77 samples/sec Loss 3.1199 LearningRate 0.000600 Epoch: 12 Global Step: 251390 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:54:52,981-Speed 2498.67 samples/sec Loss 3.1216 LearningRate 0.000600 Epoch: 12 Global Step: 251400 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:55:01,127-Speed 2514.73 samples/sec Loss 3.1276 LearningRate 0.000600 Epoch: 12 Global Step: 251410 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:55:09,322-Speed 2499.59 samples/sec Loss 3.1434 LearningRate 0.000600 Epoch: 12 Global Step: 251420 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:55:17,528-Speed 2496.11 samples/sec Loss 3.1663 LearningRate 0.000600 Epoch: 12 Global Step: 251430 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:55:25,725-Speed 2498.74 samples/sec Loss 3.1655 LearningRate 0.000600 Epoch: 12 Global Step: 251440 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:55:33,945-Speed 2492.04 samples/sec Loss 3.1692 LearningRate 0.000600 Epoch: 12 Global Step: 251450 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:55:42,149-Speed 2496.46 samples/sec Loss 3.1506 LearningRate 0.000600 Epoch: 12 Global Step: 251460 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:55:50,301-Speed 2512.86 samples/sec Loss 3.1947 
LearningRate 0.000600 Epoch: 12 Global Step: 251470 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:55:58,506-Speed 2496.43 samples/sec Loss 3.1065 LearningRate 0.000600 Epoch: 12 Global Step: 251480 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:56:06,704-Speed 2498.51 samples/sec Loss 3.0891 LearningRate 0.000599 Epoch: 12 Global Step: 251490 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:56:14,905-Speed 2497.92 samples/sec Loss 3.1304 LearningRate 0.000599 Epoch: 12 Global Step: 251500 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:56:23,113-Speed 2495.63 samples/sec Loss 3.1131 LearningRate 0.000599 Epoch: 12 Global Step: 251510 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:56:31,315-Speed 2497.37 samples/sec Loss 3.1067 LearningRate 0.000599 Epoch: 12 Global Step: 251520 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:56:39,460-Speed 2515.05 samples/sec Loss 3.0933 LearningRate 0.000599 Epoch: 12 Global Step: 251530 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:56:47,656-Speed 2499.16 samples/sec Loss 3.1623 LearningRate 0.000599 Epoch: 12 Global Step: 251540 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:56:55,853-Speed 2498.69 samples/sec Loss 3.0377 LearningRate 0.000599 Epoch: 12 Global Step: 251550 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:57:04,063-Speed 2495.20 samples/sec Loss 3.0743 LearningRate 0.000599 Epoch: 12 Global Step: 251560 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:57:12,262-Speed 2498.41 samples/sec Loss 3.1306 LearningRate 0.000599 Epoch: 12 Global Step: 251570 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:57:20,460-Speed 2498.52 samples/sec Loss 3.1122 LearningRate 0.000599 Epoch: 12 Global Step: 251580 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:57:28,604-Speed 2515.15 samples/sec Loss 
3.1073 LearningRate 0.000599 Epoch: 12 Global Step: 251590 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:57:36,799-Speed 2499.43 samples/sec Loss 3.1512 LearningRate 0.000599 Epoch: 12 Global Step: 251600 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:57:44,993-Speed 2499.80 samples/sec Loss 3.0805 LearningRate 0.000599 Epoch: 12 Global Step: 251610 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:57:53,192-Speed 2498.27 samples/sec Loss 3.1430 LearningRate 0.000599 Epoch: 12 Global Step: 251620 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:58:01,384-Speed 2500.31 samples/sec Loss 3.1305 LearningRate 0.000599 Epoch: 12 Global Step: 251630 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:58:09,581-Speed 2498.93 samples/sec Loss 3.1768 LearningRate 0.000599 Epoch: 12 Global Step: 251640 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:58:17,722-Speed 2515.98 samples/sec Loss 3.1455 LearningRate 0.000599 Epoch: 12 Global Step: 251650 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:58:25,915-Speed 2500.18 samples/sec Loss 3.1076 LearningRate 0.000599 Epoch: 12 Global Step: 251660 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:58:34,115-Speed 2497.88 samples/sec Loss 3.1069 LearningRate 0.000599 Epoch: 12 Global Step: 251670 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:58:42,311-Speed 2499.38 samples/sec Loss 3.1208 LearningRate 0.000599 Epoch: 12 Global Step: 251680 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:58:50,507-Speed 2498.89 samples/sec Loss 3.1076 LearningRate 0.000599 Epoch: 12 Global Step: 251690 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:58:58,709-Speed 2497.32 samples/sec Loss 3.1760 LearningRate 0.000599 Epoch: 12 Global Step: 251700 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:59:06,853-Speed 2515.26 samples/sec 
Loss 3.1538 LearningRate 0.000599 Epoch: 12 Global Step: 251710 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:59:15,053-Speed 2497.87 samples/sec Loss 3.1838 LearningRate 0.000599 Epoch: 12 Global Step: 251720 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:59:23,249-Speed 2499.10 samples/sec Loss 3.2049 LearningRate 0.000599 Epoch: 12 Global Step: 251730 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-07 23:59:31,408-Speed 2510.53 samples/sec Loss 3.1786 LearningRate 0.000599 Epoch: 12 Global Step: 251740 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:59:39,610-Speed 2497.53 samples/sec Loss 3.2156 LearningRate 0.000599 Epoch: 12 Global Step: 251750 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:59:47,819-Speed 2495.26 samples/sec Loss 3.0813 LearningRate 0.000599 Epoch: 12 Global Step: 251760 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-07 23:59:55,966-Speed 2514.29 samples/sec Loss 3.2239 LearningRate 0.000599 Epoch: 12 Global Step: 251770 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:00:04,173-Speed 2495.89 samples/sec Loss 3.1258 LearningRate 0.000599 Epoch: 12 Global Step: 251780 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:00:12,372-Speed 2498.45 samples/sec Loss 3.1062 LearningRate 0.000599 Epoch: 12 Global Step: 251790 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:00:20,580-Speed 2495.70 samples/sec Loss 3.1198 LearningRate 0.000599 Epoch: 12 Global Step: 251800 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:00:28,778-Speed 2498.49 samples/sec Loss 3.0656 LearningRate 0.000599 Epoch: 12 Global Step: 251810 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:00:36,976-Speed 2498.72 samples/sec Loss 3.1195 LearningRate 0.000599 Epoch: 12 Global Step: 251820 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:00:45,117-Speed 2515.97 
samples/sec Loss 3.0654 LearningRate 0.000599 Epoch: 12 Global Step: 251830 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:00:53,310-Speed 2500.01 samples/sec Loss 3.0713 LearningRate 0.000599 Epoch: 12 Global Step: 251840 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:01:01,503-Speed 2500.20 samples/sec Loss 3.1793 LearningRate 0.000599 Epoch: 12 Global Step: 251850 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:01:09,697-Speed 2499.77 samples/sec Loss 3.1586 LearningRate 0.000599 Epoch: 12 Global Step: 251860 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:01:17,893-Speed 2499.42 samples/sec Loss 3.2009 LearningRate 0.000599 Epoch: 12 Global Step: 251870 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:01:26,089-Speed 2499.24 samples/sec Loss 3.2116 LearningRate 0.000599 Epoch: 12 Global Step: 251880 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:01:34,229-Speed 2516.37 samples/sec Loss 3.1569 LearningRate 0.000599 Epoch: 12 Global Step: 251890 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:01:42,438-Speed 2495.54 samples/sec Loss 3.1426 LearningRate 0.000599 Epoch: 12 Global Step: 251900 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:01:50,635-Speed 2498.63 samples/sec Loss 3.1125 LearningRate 0.000599 Epoch: 12 Global Step: 251910 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:01:58,832-Speed 2498.86 samples/sec Loss 3.1339 LearningRate 0.000599 Epoch: 12 Global Step: 251920 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:02:07,028-Speed 2499.04 samples/sec Loss 3.1553 LearningRate 0.000599 Epoch: 12 Global Step: 251930 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:02:15,227-Speed 2498.30 samples/sec Loss 3.1065 LearningRate 0.000599 Epoch: 12 Global Step: 251940 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:02:23,382-Speed 
2512.00 samples/sec Loss 3.0649 LearningRate 0.000599 Epoch: 12 Global Step: 251950 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:02:31,582-Speed 2497.71 samples/sec Loss 3.1284 LearningRate 0.000599 Epoch: 12 Global Step: 251960 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:02:39,780-Speed 2498.67 samples/sec Loss 3.1261 LearningRate 0.000598 Epoch: 12 Global Step: 251970 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:02:47,978-Speed 2498.73 samples/sec Loss 3.1781 LearningRate 0.000598 Epoch: 12 Global Step: 251980 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:02:56,177-Speed 2498.14 samples/sec Loss 3.1785 LearningRate 0.000598 Epoch: 12 Global Step: 251990 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:03:04,382-Speed 2496.56 samples/sec Loss 3.1110 LearningRate 0.000598 Epoch: 12 Global Step: 252000 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:03:12,529-Speed 2513.88 samples/sec Loss 3.1182 LearningRate 0.000598 Epoch: 12 Global Step: 252010 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:03:20,731-Speed 2497.75 samples/sec Loss 3.0840 LearningRate 0.000598 Epoch: 12 Global Step: 252020 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:03:28,928-Speed 2498.75 samples/sec Loss 3.1429 LearningRate 0.000598 Epoch: 12 Global Step: 252030 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:03:37,128-Speed 2497.75 samples/sec Loss 3.2085 LearningRate 0.000598 Epoch: 12 Global Step: 252040 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:03:45,350-Speed 2491.37 samples/sec Loss 3.0998 LearningRate 0.000598 Epoch: 12 Global Step: 252050 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:03:53,549-Speed 2498.14 samples/sec Loss 3.1946 LearningRate 0.000598 Epoch: 12 Global Step: 252060 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 
00:04:01,697-Speed 2513.98 samples/sec Loss 3.1339 LearningRate 0.000598 Epoch: 12 Global Step: 252070 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:04:09,908-Speed 2494.53 samples/sec Loss 3.1489 LearningRate 0.000598 Epoch: 12 Global Step: 252080 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:04:18,109-Speed 2497.69 samples/sec Loss 3.0863 LearningRate 0.000598 Epoch: 12 Global Step: 252090 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:04:26,304-Speed 2499.64 samples/sec Loss 3.1310 LearningRate 0.000598 Epoch: 12 Global Step: 252100 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:04:34,508-Speed 2496.81 samples/sec Loss 3.1507 LearningRate 0.000598 Epoch: 12 Global Step: 252110 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:04:42,707-Speed 2497.94 samples/sec Loss 3.1417 LearningRate 0.000598 Epoch: 12 Global Step: 252120 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:04:50,853-Speed 2514.61 samples/sec Loss 3.1840 LearningRate 0.000598 Epoch: 12 Global Step: 252130 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:04:59,054-Speed 2497.76 samples/sec Loss 3.1425 LearningRate 0.000598 Epoch: 12 Global Step: 252140 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:05:07,253-Speed 2498.39 samples/sec Loss 3.1828 LearningRate 0.000598 Epoch: 12 Global Step: 252150 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:05:15,446-Speed 2499.77 samples/sec Loss 3.1665 LearningRate 0.000598 Epoch: 12 Global Step: 252160 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:05:23,659-Speed 2494.22 samples/sec Loss 3.0956 LearningRate 0.000598 Epoch: 12 Global Step: 252170 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:05:31,854-Speed 2499.36 samples/sec Loss 3.1102 LearningRate 0.000598 Epoch: 12 Global Step: 252180 Fp16 Grad Scale: 32768 Required: 132 hours Training: 
2022-07-08 00:05:39,995-Speed 2515.90 samples/sec Loss 3.1511 LearningRate 0.000598 Epoch: 12 Global Step: 252190 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:05:48,197-Speed 2497.59 samples/sec Loss 3.1434 LearningRate 0.000598 Epoch: 12 Global Step: 252200 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:05:56,396-Speed 2498.33 samples/sec Loss 3.1165 LearningRate 0.000598 Epoch: 12 Global Step: 252210 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:06:04,596-Speed 2498.21 samples/sec Loss 3.1296 LearningRate 0.000598 Epoch: 12 Global Step: 252220 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:06:12,793-Speed 2498.70 samples/sec Loss 3.1746 LearningRate 0.000598 Epoch: 12 Global Step: 252230 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:06:21,002-Speed 2495.33 samples/sec Loss 3.2534 LearningRate 0.000598 Epoch: 12 Global Step: 252240 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:06:29,147-Speed 2514.75 samples/sec Loss 3.0927 LearningRate 0.000598 Epoch: 12 Global Step: 252250 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:06:37,345-Speed 2498.82 samples/sec Loss 3.1592 LearningRate 0.000598 Epoch: 12 Global Step: 252260 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:06:45,555-Speed 2494.74 samples/sec Loss 3.1901 LearningRate 0.000598 Epoch: 12 Global Step: 252270 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:06:53,754-Speed 2498.20 samples/sec Loss 3.1685 LearningRate 0.000598 Epoch: 12 Global Step: 252280 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:07:01,952-Speed 2498.78 samples/sec Loss 3.2377 LearningRate 0.000598 Epoch: 12 Global Step: 252290 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:07:10,162-Speed 2495.02 samples/sec Loss 3.1424 LearningRate 0.000598 Epoch: 12 Global Step: 252300 Fp16 Grad Scale: 32768 Required: 132 hours 
Training: 2022-07-08 00:07:18,306-Speed 2514.85 samples/sec Loss 3.1655 LearningRate 0.000598 Epoch: 12 Global Step: 252310 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:07:26,504-Speed 2498.66 samples/sec Loss 3.2163 LearningRate 0.000598 Epoch: 12 Global Step: 252320 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:07:34,709-Speed 2496.62 samples/sec Loss 3.1285 LearningRate 0.000598 Epoch: 12 Global Step: 252330 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:07:42,917-Speed 2495.54 samples/sec Loss 3.1670 LearningRate 0.000598 Epoch: 12 Global Step: 252340 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:07:51,112-Speed 2499.29 samples/sec Loss 3.1994 LearningRate 0.000598 Epoch: 12 Global Step: 252350 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:07:59,328-Speed 2493.23 samples/sec Loss 3.1893 LearningRate 0.000598 Epoch: 12 Global Step: 252360 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:08:07,469-Speed 2516.10 samples/sec Loss 3.2235 LearningRate 0.000598 Epoch: 12 Global Step: 252370 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:08:15,671-Speed 2497.66 samples/sec Loss 3.1672 LearningRate 0.000598 Epoch: 12 Global Step: 252380 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:08:23,869-Speed 2498.64 samples/sec Loss 3.1261 LearningRate 0.000598 Epoch: 12 Global Step: 252390 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:08:32,071-Speed 2497.38 samples/sec Loss 3.1591 LearningRate 0.000598 Epoch: 12 Global Step: 252400 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:08:40,284-Speed 2493.85 samples/sec Loss 3.1769 LearningRate 0.000598 Epoch: 12 Global Step: 252410 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:08:48,487-Speed 2497.07 samples/sec Loss 3.1608 LearningRate 0.000598 Epoch: 12 Global Step: 252420 Fp16 Grad Scale: 32768 Required: 132 
hours Training: 2022-07-08 00:08:56,636-Speed 2513.38 samples/sec Loss 3.0987 LearningRate 0.000598 Epoch: 12 Global Step: 252430 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:09:04,842-Speed 2496.34 samples/sec Loss 3.1552 LearningRate 0.000598 Epoch: 12 Global Step: 252440 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:09:13,043-Speed 2497.65 samples/sec Loss 3.1701 LearningRate 0.000598 Epoch: 12 Global Step: 252450 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:09:21,243-Speed 2497.83 samples/sec Loss 3.2041 LearningRate 0.000597 Epoch: 12 Global Step: 252460 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:09:29,447-Speed 2497.02 samples/sec Loss 3.1856 LearningRate 0.000597 Epoch: 12 Global Step: 252470 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:09:37,657-Speed 2495.06 samples/sec Loss 3.1471 LearningRate 0.000597 Epoch: 12 Global Step: 252480 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:09:45,809-Speed 2512.54 samples/sec Loss 3.1981 LearningRate 0.000597 Epoch: 12 Global Step: 252490 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:09:54,005-Speed 2498.97 samples/sec Loss 3.1887 LearningRate 0.000597 Epoch: 12 Global Step: 252500 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:10:02,207-Speed 2497.47 samples/sec Loss 3.1576 LearningRate 0.000597 Epoch: 12 Global Step: 252510 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:10:10,405-Speed 2498.72 samples/sec Loss 3.1524 LearningRate 0.000597 Epoch: 12 Global Step: 252520 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:10:18,603-Speed 2498.37 samples/sec Loss 3.1862 LearningRate 0.000597 Epoch: 12 Global Step: 252530 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:10:26,803-Speed 2498.16 samples/sec Loss 3.2087 LearningRate 0.000597 Epoch: 12 Global Step: 252540 Fp16 Grad Scale: 32768 Required: 
132 hours Training: 2022-07-08 00:10:34,950-Speed 2514.17 samples/sec Loss 3.1302 LearningRate 0.000597 Epoch: 12 Global Step: 252550 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:10:43,148-Speed 2498.55 samples/sec Loss 3.1321 LearningRate 0.000597 Epoch: 12 Global Step: 252560 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:10:51,347-Speed 2498.11 samples/sec Loss 3.1367 LearningRate 0.000597 Epoch: 12 Global Step: 252570 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:10:59,555-Speed 2495.50 samples/sec Loss 3.1616 LearningRate 0.000597 Epoch: 12 Global Step: 252580 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:11:07,755-Speed 2497.98 samples/sec Loss 3.1728 LearningRate 0.000597 Epoch: 12 Global Step: 252590 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:11:15,961-Speed 2496.02 samples/sec Loss 3.2438 LearningRate 0.000597 Epoch: 12 Global Step: 252600 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:11:24,112-Speed 2513.28 samples/sec Loss 3.1643 LearningRate 0.000597 Epoch: 12 Global Step: 252610 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:11:32,315-Speed 2497.11 samples/sec Loss 3.1891 LearningRate 0.000597 Epoch: 12 Global Step: 252620 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:11:40,516-Speed 2497.70 samples/sec Loss 3.1688 LearningRate 0.000597 Epoch: 12 Global Step: 252630 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:11:48,714-Speed 2498.52 samples/sec Loss 3.1364 LearningRate 0.000597 Epoch: 12 Global Step: 252640 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:11:56,914-Speed 2497.99 samples/sec Loss 3.2053 LearningRate 0.000597 Epoch: 12 Global Step: 252650 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:12:05,127-Speed 2493.93 samples/sec Loss 3.0991 LearningRate 0.000597 Epoch: 12 Global Step: 252660 Fp16 Grad Scale: 32768 
Required: 132 hours Training: 2022-07-08 00:12:13,269-Speed 2515.79 samples/sec Loss 3.2121 LearningRate 0.000597 Epoch: 12 Global Step: 252670 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:12:21,469-Speed 2497.97 samples/sec Loss 3.1516 LearningRate 0.000597 Epoch: 12 Global Step: 252680 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:12:30,282-Speed 2500.08 samples/sec Loss 3.1019 LearningRate 0.000597 Epoch: 12 Global Step: 252690 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:12:38,482-Speed 2501.20 samples/sec Loss 3.1620 LearningRate 0.000597 Epoch: 12 Global Step: 252700 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:12:46,710-Speed 2500.51 samples/sec Loss 3.0931 LearningRate 0.000597 Epoch: 12 Global Step: 252710 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:12:54,912-Speed 2497.32 samples/sec Loss 3.1251 LearningRate 0.000597 Epoch: 12 Global Step: 252720 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:13:03,111-Speed 2518.04 samples/sec Loss 3.1048 LearningRate 0.000597 Epoch: 12 Global Step: 252730 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:13:14,389-Speed 1816.12 samples/sec Loss 3.1115 LearningRate 0.000597 Epoch: 12 Global Step: 252740 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:13:22,650-Speed 2501.21 samples/sec Loss 3.1350 LearningRate 0.000597 Epoch: 12 Global Step: 252750 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:13:30,882-Speed 2501.89 samples/sec Loss 3.0982 LearningRate 0.000597 Epoch: 12 Global Step: 252760 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:14:41,587-Speed 289.66 samples/sec Loss 3.1306 LearningRate 0.000597 Epoch: 12 Global Step: 252770 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:14:49,794-Speed 2510.87 samples/sec Loss 3.1492 LearningRate 0.000597 Epoch: 12 Global Step: 252780 Fp16 Grad Scale: 
32768 Required: 132 hours Training: 2022-07-08 00:14:57,992-Speed 2524.36 samples/sec Loss 3.1696 LearningRate 0.000597 Epoch: 12 Global Step: 252790 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:15:06,676-Speed 2358.73 samples/sec Loss 3.0907 LearningRate 0.000597 Epoch: 12 Global Step: 252800 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:15:14,887-Speed 2505.99 samples/sec Loss 3.2029 LearningRate 0.000597 Epoch: 12 Global Step: 252810 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:15:23,099-Speed 2502.73 samples/sec Loss 3.1644 LearningRate 0.000597 Epoch: 12 Global Step: 252820 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:15:36,147-Speed 2498.84 samples/sec Loss 3.2299 LearningRate 0.000597 Epoch: 12 Global Step: 252830 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:15:44,382-Speed 2498.40 samples/sec Loss 3.1370 LearningRate 0.000597 Epoch: 12 Global Step: 252840 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:15:52,537-Speed 2511.74 samples/sec Loss 3.1851 LearningRate 0.000597 Epoch: 12 Global Step: 252850 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:16:04,169-Speed 1765.20 samples/sec Loss 3.0510 LearningRate 0.000597 Epoch: 12 Global Step: 252860 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:16:12,373-Speed 2497.86 samples/sec Loss 3.1741 LearningRate 0.000597 Epoch: 12 Global Step: 252870 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:16:20,576-Speed 2497.22 samples/sec Loss 3.1544 LearningRate 0.000597 Epoch: 12 Global Step: 252880 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:16:30,365-Speed 2500.60 samples/sec Loss 3.2085 LearningRate 0.000597 Epoch: 12 Global Step: 252890 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:16:39,312-Speed 2297.27 samples/sec Loss 3.1520 LearningRate 0.000597 Epoch: 12 Global Step: 252900 Fp16 Grad 
Scale: 32768 Required: 132 hours Training: 2022-07-08 00:16:49,179-Speed 2075.68 samples/sec Loss 3.1450 LearningRate 0.000597 Epoch: 12 Global Step: 252910 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:16:57,374-Speed 2499.65 samples/sec Loss 3.1167 LearningRate 0.000597 Epoch: 12 Global Step: 252920 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:17:05,570-Speed 2499.42 samples/sec Loss 3.1332 LearningRate 0.000597 Epoch: 12 Global Step: 252930 Fp16 Grad Scale: 32768 Required: 132 hours Training: 2022-07-08 00:17:13,770-Speed 2497.82 samples/sec Loss 3.1314 LearningRate 0.000596 Epoch: 12 Global Step: 252940 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:17:21,971-Speed 2497.73 samples/sec Loss 3.1185 LearningRate 0.000596 Epoch: 12 Global Step: 252950 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:17:30,180-Speed 2495.26 samples/sec Loss 3.1367 LearningRate 0.000596 Epoch: 12 Global Step: 252960 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:17:38,332-Speed 2512.78 samples/sec Loss 3.1213 LearningRate 0.000596 Epoch: 12 Global Step: 252970 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:17:46,537-Speed 2496.38 samples/sec Loss 3.0836 LearningRate 0.000596 Epoch: 12 Global Step: 252980 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:17:54,740-Speed 2496.94 samples/sec Loss 3.1751 LearningRate 0.000596 Epoch: 12 Global Step: 252990 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:18:02,944-Speed 2496.83 samples/sec Loss 3.1389 LearningRate 0.000596 Epoch: 12 Global Step: 253000 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:18:11,145-Speed 2497.72 samples/sec Loss 3.1118 LearningRate 0.000596 Epoch: 12 Global Step: 253010 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:18:19,347-Speed 2497.46 samples/sec Loss 3.0774 LearningRate 0.000596 Epoch: 12 Global Step: 253020 Fp16 
Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:18:27,499-Speed 2512.43 samples/sec Loss 3.0573 LearningRate 0.000596 Epoch: 12 Global Step: 253030 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:18:35,710-Speed 2494.94 samples/sec Loss 3.1656 LearningRate 0.000596 Epoch: 12 Global Step: 253040 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:18:43,913-Speed 2497.34 samples/sec Loss 3.1627 LearningRate 0.000596 Epoch: 12 Global Step: 253050 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:18:52,115-Speed 2497.16 samples/sec Loss 3.0491 LearningRate 0.000596 Epoch: 12 Global Step: 253060 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:19:00,317-Speed 2497.54 samples/sec Loss 3.0988 LearningRate 0.000596 Epoch: 12 Global Step: 253070 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:19:08,528-Speed 2494.82 samples/sec Loss 3.1040 LearningRate 0.000596 Epoch: 12 Global Step: 253080 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:19:16,678-Speed 2513.23 samples/sec Loss 3.0388 LearningRate 0.000596 Epoch: 12 Global Step: 253090 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:19:24,880-Speed 2497.29 samples/sec Loss 3.1481 LearningRate 0.000596 Epoch: 12 Global Step: 253100 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:19:33,083-Speed 2497.14 samples/sec Loss 3.1077 LearningRate 0.000596 Epoch: 12 Global Step: 253110 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:19:41,293-Speed 2494.98 samples/sec Loss 3.0718 LearningRate 0.000596 Epoch: 12 Global Step: 253120 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:19:49,495-Speed 2497.29 samples/sec Loss 3.0720 LearningRate 0.000596 Epoch: 12 Global Step: 253130 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:19:57,696-Speed 2497.49 samples/sec Loss 3.0775 LearningRate 0.000596 Epoch: 12 Global Step: 253140 
Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:20:05,848-Speed 2513.04 samples/sec Loss 3.1110 LearningRate 0.000596 Epoch: 12 Global Step: 253150 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:20:14,070-Speed 2491.12 samples/sec Loss 3.0984 LearningRate 0.000596 Epoch: 12 Global Step: 253160 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:20:22,275-Speed 2496.40 samples/sec Loss 3.1151 LearningRate 0.000596 Epoch: 12 Global Step: 253170 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:20:30,481-Speed 2496.26 samples/sec Loss 3.1332 LearningRate 0.000596 Epoch: 12 Global Step: 253180 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:20:38,683-Speed 2497.46 samples/sec Loss 3.1148 LearningRate 0.000596 Epoch: 12 Global Step: 253190 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:20:46,887-Speed 2496.66 samples/sec Loss 3.0429 LearningRate 0.000596 Epoch: 12 Global Step: 253200 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:20:55,038-Speed 2513.05 samples/sec Loss 3.1372 LearningRate 0.000596 Epoch: 12 Global Step: 253210 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:21:03,242-Speed 2496.81 samples/sec Loss 3.0842 LearningRate 0.000596 Epoch: 12 Global Step: 253220 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:21:11,444-Speed 2497.32 samples/sec Loss 3.0939 LearningRate 0.000596 Epoch: 12 Global Step: 253230 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:21:19,644-Speed 2497.94 samples/sec Loss 3.1075 LearningRate 0.000596 Epoch: 12 Global Step: 253240 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:21:27,844-Speed 2497.94 samples/sec Loss 3.1315 LearningRate 0.000596 Epoch: 12 Global Step: 253250 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:21:36,045-Speed 2497.59 samples/sec Loss 3.1412 LearningRate 0.000596 Epoch: 12 Global Step: 
253260 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:21:44,191-Speed 2514.37 samples/sec Loss 3.0502 LearningRate 0.000596 Epoch: 12 Global Step: 253270 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:21:52,396-Speed 2496.57 samples/sec Loss 3.1544 LearningRate 0.000596 Epoch: 12 Global Step: 253280 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:22:00,596-Speed 2498.11 samples/sec Loss 3.1245 LearningRate 0.000596 Epoch: 12 Global Step: 253290 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:22:08,802-Speed 2496.38 samples/sec Loss 3.1263 LearningRate 0.000596 Epoch: 12 Global Step: 253300 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:22:17,002-Speed 2498.02 samples/sec Loss 3.1679 LearningRate 0.000596 Epoch: 12 Global Step: 253310 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:22:25,200-Speed 2498.78 samples/sec Loss 3.1087 LearningRate 0.000596 Epoch: 12 Global Step: 253320 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:22:33,347-Speed 2514.26 samples/sec Loss 3.1309 LearningRate 0.000596 Epoch: 12 Global Step: 253330 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:22:41,549-Speed 2497.39 samples/sec Loss 3.1298 LearningRate 0.000596 Epoch: 12 Global Step: 253340 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:22:49,750-Speed 2497.75 samples/sec Loss 3.1625 LearningRate 0.000596 Epoch: 12 Global Step: 253350 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:22:57,961-Speed 2494.59 samples/sec Loss 3.1469 LearningRate 0.000596 Epoch: 12 Global Step: 253360 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:23:06,161-Speed 2497.90 samples/sec Loss 3.1080 LearningRate 0.000596 Epoch: 12 Global Step: 253370 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:23:14,361-Speed 2498.05 samples/sec Loss 3.1937 LearningRate 0.000596 Epoch: 12 Global 
Step: 253380 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:23:22,507-Speed 2514.60 samples/sec Loss 3.2023 LearningRate 0.000596 Epoch: 12 Global Step: 253390 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:23:30,708-Speed 2497.95 samples/sec Loss 3.2436 LearningRate 0.000596 Epoch: 12 Global Step: 253400 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:23:38,915-Speed 2495.86 samples/sec Loss 3.1855 LearningRate 0.000596 Epoch: 12 Global Step: 253410 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:23:47,128-Speed 2494.05 samples/sec Loss 3.1693 LearningRate 0.000595 Epoch: 12 Global Step: 253420 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:23:55,336-Speed 2496.12 samples/sec Loss 3.1766 LearningRate 0.000595 Epoch: 12 Global Step: 253430 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:24:03,535-Speed 2498.06 samples/sec Loss 3.1762 LearningRate 0.000595 Epoch: 12 Global Step: 253440 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:24:11,681-Speed 2514.43 samples/sec Loss 3.1692 LearningRate 0.000595 Epoch: 12 Global Step: 253450 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:24:19,879-Speed 2498.51 samples/sec Loss 3.1374 LearningRate 0.000595 Epoch: 12 Global Step: 253460 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:24:28,082-Speed 2497.24 samples/sec Loss 3.1079 LearningRate 0.000595 Epoch: 12 Global Step: 253470 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:24:36,286-Speed 2497.07 samples/sec Loss 3.0840 LearningRate 0.000595 Epoch: 12 Global Step: 253480 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:24:44,496-Speed 2494.85 samples/sec Loss 3.1249 LearningRate 0.000595 Epoch: 12 Global Step: 253490 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:24:52,702-Speed 2495.90 samples/sec Loss 3.0943 LearningRate 0.000595 Epoch: 12 
Global Step: 253500 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:25:00,852-Speed 2513.65 samples/sec Loss 3.1751 LearningRate 0.000595 Epoch: 12 Global Step: 253510 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:25:09,058-Speed 2495.89 samples/sec Loss 3.1290 LearningRate 0.000595 Epoch: 12 Global Step: 253520 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:25:17,258-Speed 2497.99 samples/sec Loss 3.1084 LearningRate 0.000595 Epoch: 12 Global Step: 253530 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:25:25,459-Speed 2497.62 samples/sec Loss 3.1089 LearningRate 0.000595 Epoch: 12 Global Step: 253540 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:25:33,660-Speed 2497.91 samples/sec Loss 3.1493 LearningRate 0.000595 Epoch: 12 Global Step: 253550 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:25:41,865-Speed 2496.14 samples/sec Loss 3.1501 LearningRate 0.000595 Epoch: 12 Global Step: 253560 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:25:50,011-Speed 2514.50 samples/sec Loss 3.1755 LearningRate 0.000595 Epoch: 12 Global Step: 253570 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:25:58,210-Speed 2498.28 samples/sec Loss 3.1345 LearningRate 0.000595 Epoch: 12 Global Step: 253580 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:26:06,408-Speed 2498.71 samples/sec Loss 3.1270 LearningRate 0.000595 Epoch: 12 Global Step: 253590 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:26:14,621-Speed 2494.03 samples/sec Loss 3.0904 LearningRate 0.000595 Epoch: 12 Global Step: 253600 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:26:22,821-Speed 2498.18 samples/sec Loss 3.1288 LearningRate 0.000595 Epoch: 12 Global Step: 253610 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:26:31,018-Speed 2498.76 samples/sec Loss 3.1300 LearningRate 0.000595 
Epoch: 12 Global Step: 253620 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:26:39,186-Speed 2507.84 samples/sec Loss 3.0794 LearningRate 0.000595 Epoch: 12 Global Step: 253630 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:26:47,380-Speed 2499.86 samples/sec Loss 3.1317 LearningRate 0.000595 Epoch: 12 Global Step: 253640 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:26:55,581-Speed 2497.77 samples/sec Loss 3.1070 LearningRate 0.000595 Epoch: 12 Global Step: 253650 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:27:03,778-Speed 2499.07 samples/sec Loss 3.2139 LearningRate 0.000595 Epoch: 12 Global Step: 253660 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:27:11,977-Speed 2498.74 samples/sec Loss 3.1206 LearningRate 0.000595 Epoch: 12 Global Step: 253670 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:27:20,178-Speed 2497.57 samples/sec Loss 3.0247 LearningRate 0.000595 Epoch: 12 Global Step: 253680 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:27:28,323-Speed 2515.01 samples/sec Loss 3.0560 LearningRate 0.000595 Epoch: 12 Global Step: 253690 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:27:36,518-Speed 2499.54 samples/sec Loss 3.1257 LearningRate 0.000595 Epoch: 12 Global Step: 253700 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:27:44,720-Speed 2497.52 samples/sec Loss 3.1193 LearningRate 0.000595 Epoch: 12 Global Step: 253710 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:27:52,920-Speed 2497.78 samples/sec Loss 3.1156 LearningRate 0.000595 Epoch: 12 Global Step: 253720 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:28:01,124-Speed 2496.80 samples/sec Loss 3.1042 LearningRate 0.000595 Epoch: 12 Global Step: 253730 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:28:09,323-Speed 2498.34 samples/sec Loss 3.1195 LearningRate 
0.000595 Epoch: 12 Global Step: 253740 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:28:17,468-Speed 2514.86 samples/sec Loss 3.0960 LearningRate 0.000595 Epoch: 12 Global Step: 253750 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:28:25,668-Speed 2498.04 samples/sec Loss 3.0993 LearningRate 0.000595 Epoch: 12 Global Step: 253760 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:28:33,869-Speed 2497.74 samples/sec Loss 3.1382 LearningRate 0.000595 Epoch: 12 Global Step: 253770 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:28:42,070-Speed 2497.99 samples/sec Loss 3.2000 LearningRate 0.000595 Epoch: 12 Global Step: 253780 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:28:50,270-Speed 2497.77 samples/sec Loss 3.1773 LearningRate 0.000595 Epoch: 12 Global Step: 253790 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:28:58,472-Speed 2497.87 samples/sec Loss 3.2474 LearningRate 0.000595 Epoch: 12 Global Step: 253800 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:29:06,618-Speed 2514.30 samples/sec Loss 3.1987 LearningRate 0.000595 Epoch: 12 Global Step: 253810 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:29:14,829-Speed 2494.72 samples/sec Loss 3.1533 LearningRate 0.000595 Epoch: 12 Global Step: 253820 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:29:23,025-Speed 2499.09 samples/sec Loss 3.0939 LearningRate 0.000595 Epoch: 12 Global Step: 253830 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:29:31,226-Speed 2497.88 samples/sec Loss 3.1448 LearningRate 0.000595 Epoch: 12 Global Step: 253840 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:29:39,426-Speed 2497.94 samples/sec Loss 3.1556 LearningRate 0.000595 Epoch: 12 Global Step: 253850 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:29:47,629-Speed 2496.93 samples/sec Loss 3.1505 
LearningRate 0.000595 Epoch: 12 Global Step: 253860 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:29:55,772-Speed 2515.44 samples/sec Loss 3.1770 LearningRate 0.000595 Epoch: 12 Global Step: 253870 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:30:03,973-Speed 2497.64 samples/sec Loss 3.1097 LearningRate 0.000595 Epoch: 12 Global Step: 253880 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:30:12,171-Speed 2498.73 samples/sec Loss 3.0898 LearningRate 0.000595 Epoch: 12 Global Step: 253890 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:30:20,369-Speed 2498.67 samples/sec Loss 3.1453 LearningRate 0.000595 Epoch: 12 Global Step: 253900 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:30:28,588-Speed 2492.25 samples/sec Loss 3.1195 LearningRate 0.000594 Epoch: 12 Global Step: 253910 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:30:36,786-Speed 2498.58 samples/sec Loss 3.1169 LearningRate 0.000594 Epoch: 12 Global Step: 253920 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:30:44,930-Speed 2515.56 samples/sec Loss 3.0963 LearningRate 0.000594 Epoch: 12 Global Step: 253930 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:30:53,127-Speed 2498.91 samples/sec Loss 3.0959 LearningRate 0.000594 Epoch: 12 Global Step: 253940 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:31:01,327-Speed 2497.82 samples/sec Loss 3.0320 LearningRate 0.000594 Epoch: 12 Global Step: 253950 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:31:09,526-Speed 2498.43 samples/sec Loss 3.0786 LearningRate 0.000594 Epoch: 12 Global Step: 253960 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:31:17,726-Speed 2498.10 samples/sec Loss 3.1332 LearningRate 0.000594 Epoch: 12 Global Step: 253970 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:31:25,925-Speed 2498.22 samples/sec Loss 
3.1408 LearningRate 0.000594 Epoch: 12 Global Step: 253980 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:31:34,076-Speed 2513.00 samples/sec Loss 3.3140 LearningRate 0.000594 Epoch: 12 Global Step: 253990 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:31:42,277-Speed 2497.53 samples/sec Loss 3.1630 LearningRate 0.000594 Epoch: 12 Global Step: 254000 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:31:50,484-Speed 2495.91 samples/sec Loss 3.1501 LearningRate 0.000594 Epoch: 12 Global Step: 254010 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:31:58,695-Speed 2494.48 samples/sec Loss 3.1400 LearningRate 0.000594 Epoch: 12 Global Step: 254020 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:32:06,894-Speed 2498.49 samples/sec Loss 3.1350 LearningRate 0.000594 Epoch: 12 Global Step: 254030 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:32:15,096-Speed 2497.41 samples/sec Loss 3.2024 LearningRate 0.000594 Epoch: 12 Global Step: 254040 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:32:23,241-Speed 2514.81 samples/sec Loss 3.1578 LearningRate 0.000594 Epoch: 12 Global Step: 254050 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:32:31,441-Speed 2498.31 samples/sec Loss 3.1592 LearningRate 0.000594 Epoch: 12 Global Step: 254060 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:32:39,638-Speed 2498.76 samples/sec Loss 3.1762 LearningRate 0.000594 Epoch: 12 Global Step: 254070 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:32:47,839-Speed 2497.70 samples/sec Loss 3.1874 LearningRate 0.000594 Epoch: 12 Global Step: 254080 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:32:56,038-Speed 2498.42 samples/sec Loss 3.2231 LearningRate 0.000594 Epoch: 12 Global Step: 254090 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:33:04,252-Speed 2493.82 samples/sec 
Loss 3.3169 LearningRate 0.000594 Epoch: 12 Global Step: 254100 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:33:12,396-Speed 2515.27 samples/sec Loss 3.1474 LearningRate 0.000594 Epoch: 12 Global Step: 254110 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:33:20,594-Speed 2498.63 samples/sec Loss 3.3092 LearningRate 0.000594 Epoch: 12 Global Step: 254120 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:33:28,791-Speed 2498.78 samples/sec Loss 3.1856 LearningRate 0.000594 Epoch: 12 Global Step: 254130 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:33:36,989-Speed 2498.48 samples/sec Loss 3.2346 LearningRate 0.000594 Epoch: 12 Global Step: 254140 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:33:45,188-Speed 2498.54 samples/sec Loss 3.2707 LearningRate 0.000594 Epoch: 12 Global Step: 254150 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:33:53,390-Speed 2497.34 samples/sec Loss 3.2040 LearningRate 0.000594 Epoch: 12 Global Step: 254160 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:34:01,533-Speed 2515.26 samples/sec Loss 3.1519 LearningRate 0.000594 Epoch: 12 Global Step: 254170 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:34:09,730-Speed 2499.04 samples/sec Loss 3.1343 LearningRate 0.000594 Epoch: 12 Global Step: 254180 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:34:17,932-Speed 2497.09 samples/sec Loss 3.1537 LearningRate 0.000594 Epoch: 12 Global Step: 254190 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:34:26,131-Speed 2498.55 samples/sec Loss 3.1800 LearningRate 0.000594 Epoch: 12 Global Step: 254200 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:34:34,331-Speed 2497.95 samples/sec Loss 3.1982 LearningRate 0.000594 Epoch: 12 Global Step: 254210 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:34:42,530-Speed 2498.63 
samples/sec Loss 3.1418 LearningRate 0.000594 Epoch: 12 Global Step: 254220 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:34:50,675-Speed 2514.70 samples/sec Loss 3.2142 LearningRate 0.000594 Epoch: 12 Global Step: 254230 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:34:58,875-Speed 2498.17 samples/sec Loss 3.0861 LearningRate 0.000594 Epoch: 12 Global Step: 254240 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:35:07,095-Speed 2491.92 samples/sec Loss 3.1389 LearningRate 0.000594 Epoch: 12 Global Step: 254250 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:35:15,310-Speed 2493.23 samples/sec Loss 3.1450 LearningRate 0.000594 Epoch: 12 Global Step: 254260 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:35:23,511-Speed 2497.76 samples/sec Loss 3.1076 LearningRate 0.000594 Epoch: 12 Global Step: 254270 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:35:31,711-Speed 2497.86 samples/sec Loss 3.1015 LearningRate 0.000594 Epoch: 12 Global Step: 254280 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:35:39,857-Speed 2514.74 samples/sec Loss 3.1075 LearningRate 0.000594 Epoch: 12 Global Step: 254290 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:35:48,054-Speed 2498.82 samples/sec Loss 3.2353 LearningRate 0.000594 Epoch: 12 Global Step: 254300 Fp16 Grad Scale: 131072 Required: 132 hours Training: 2022-07-08 00:35:56,211-Speed 2511.17 samples/sec Loss 3.2301 LearningRate 0.000594 Epoch: 12 Global Step: 254310 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:36:04,411-Speed 2498.27 samples/sec Loss 3.2129 LearningRate 0.000594 Epoch: 12 Global Step: 254320 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:36:12,608-Speed 2498.75 samples/sec Loss 3.1281 LearningRate 0.000594 Epoch: 12 Global Step: 254330 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 
00:36:20,806-Speed 2498.40 samples/sec Loss 3.1495 LearningRate 0.000594 Epoch: 12 Global Step: 254340 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:36:28,951-Speed 2514.83 samples/sec Loss 3.1426 LearningRate 0.000594 Epoch: 12 Global Step: 254350 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:36:37,149-Speed 2498.69 samples/sec Loss 3.1861 LearningRate 0.000594 Epoch: 12 Global Step: 254360 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:36:45,367-Speed 2492.57 samples/sec Loss 3.1022 LearningRate 0.000594 Epoch: 12 Global Step: 254370 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:36:53,565-Speed 2498.57 samples/sec Loss 3.1578 LearningRate 0.000594 Epoch: 12 Global Step: 254380 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:37:01,763-Speed 2498.62 samples/sec Loss 3.1289 LearningRate 0.000593 Epoch: 12 Global Step: 254390 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:37:09,965-Speed 2497.11 samples/sec Loss 3.1834 LearningRate 0.000593 Epoch: 12 Global Step: 254400 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:37:18,115-Speed 2513.39 samples/sec Loss 3.0974 LearningRate 0.000593 Epoch: 12 Global Step: 254410 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:37:26,314-Speed 2498.43 samples/sec Loss 3.1636 LearningRate 0.000593 Epoch: 12 Global Step: 254420 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:37:34,512-Speed 2498.95 samples/sec Loss 3.0888 LearningRate 0.000593 Epoch: 12 Global Step: 254430 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:37:42,721-Speed 2495.09 samples/sec Loss 3.1721 LearningRate 0.000593 Epoch: 12 Global Step: 254440 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:37:50,922-Speed 2497.80 samples/sec Loss 3.1637 LearningRate 0.000593 Epoch: 12 Global Step: 254450 Fp16 Grad Scale: 65536 Required: 132 hours Training: 
2022-07-08 00:37:59,123-Speed 2497.68 samples/sec Loss 3.0758 LearningRate 0.000593 Epoch: 12 Global Step: 254460 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:38:07,271-Speed 2513.96 samples/sec Loss 3.0827 LearningRate 0.000593 Epoch: 12 Global Step: 254470 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:38:15,484-Speed 2494.17 samples/sec Loss 3.0679 LearningRate 0.000593 Epoch: 12 Global Step: 254480 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:38:23,682-Speed 2498.71 samples/sec Loss 3.1743 LearningRate 0.000593 Epoch: 12 Global Step: 254490 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:38:31,883-Speed 2497.90 samples/sec Loss 3.0823 LearningRate 0.000593 Epoch: 12 Global Step: 254500 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:38:40,081-Speed 2498.40 samples/sec Loss 3.1090 LearningRate 0.000593 Epoch: 12 Global Step: 254510 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:38:48,284-Speed 2497.18 samples/sec Loss 2.9827 LearningRate 0.000593 Epoch: 12 Global Step: 254520 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:38:56,432-Speed 2513.90 samples/sec Loss 3.0014 LearningRate 0.000593 Epoch: 12 Global Step: 254530 Fp16 Grad Scale: 65536 Required: 132 hours Training: 2022-07-08 00:39:04,630-Speed 2498.76 samples/sec Loss 3.1185 LearningRate 0.000593 Epoch: 12 Global Step: 254540 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:39:12,827-Speed 2498.74 samples/sec Loss 3.0132 LearningRate 0.000593 Epoch: 12 Global Step: 254550 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:39:21,031-Speed 2496.60 samples/sec Loss 3.0488 LearningRate 0.000593 Epoch: 12 Global Step: 254560 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:39:29,229-Speed 2498.75 samples/sec Loss 3.0749 LearningRate 0.000593 Epoch: 12 Global Step: 254570 Fp16 Grad Scale: 65536 Required: 131 hours 
Training: 2022-07-08 00:39:37,431-Speed 2497.19 samples/sec Loss 3.0836 LearningRate 0.000593 Epoch: 12 Global Step: 254580 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:39:45,581-Speed 2513.47 samples/sec Loss 3.0483 LearningRate 0.000593 Epoch: 12 Global Step: 254590 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:39:53,779-Speed 2498.43 samples/sec Loss 3.0862 LearningRate 0.000593 Epoch: 12 Global Step: 254600 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:40:01,980-Speed 2498.01 samples/sec Loss 3.1385 LearningRate 0.000593 Epoch: 12 Global Step: 254610 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:40:10,176-Speed 2499.16 samples/sec Loss 3.0814 LearningRate 0.000593 Epoch: 12 Global Step: 254620 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:40:18,375-Speed 2498.38 samples/sec Loss 3.0693 LearningRate 0.000593 Epoch: 12 Global Step: 254630 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:40:26,589-Speed 2493.55 samples/sec Loss 3.0860 LearningRate 0.000593 Epoch: 12 Global Step: 254640 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:40:34,733-Speed 2515.23 samples/sec Loss 3.1280 LearningRate 0.000593 Epoch: 12 Global Step: 254650 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:40:42,935-Speed 2497.25 samples/sec Loss 3.0869 LearningRate 0.000593 Epoch: 12 Global Step: 254660 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:40:51,133-Speed 2498.69 samples/sec Loss 3.0686 LearningRate 0.000593 Epoch: 12 Global Step: 254670 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:40:59,332-Speed 2498.08 samples/sec Loss 3.1064 LearningRate 0.000593 Epoch: 12 Global Step: 254680 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:41:07,531-Speed 2498.25 samples/sec Loss 3.0774 LearningRate 0.000593 Epoch: 12 Global Step: 254690 Fp16 Grad Scale: 65536 Required: 131 
hours Training: 2022-07-08 00:41:15,728-Speed 2498.96 samples/sec Loss 3.1069 LearningRate 0.000593 Epoch: 12 Global Step: 254700 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:41:23,873-Speed 2514.85 samples/sec Loss 3.1366 LearningRate 0.000593 Epoch: 12 Global Step: 254710 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:41:32,072-Speed 2498.14 samples/sec Loss 3.0989 LearningRate 0.000593 Epoch: 12 Global Step: 254720 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:41:40,277-Speed 2496.84 samples/sec Loss 3.0995 LearningRate 0.000593 Epoch: 12 Global Step: 254730 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:41:48,436-Speed 2510.66 samples/sec Loss 3.1836 LearningRate 0.000593 Epoch: 12 Global Step: 254740 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:41:56,634-Speed 2498.53 samples/sec Loss 3.0945 LearningRate 0.000593 Epoch: 12 Global Step: 254750 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:42:04,830-Speed 2499.15 samples/sec Loss 3.1432 LearningRate 0.000593 Epoch: 12 Global Step: 254760 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:42:12,975-Speed 2514.93 samples/sec Loss 3.1933 LearningRate 0.000593 Epoch: 12 Global Step: 254770 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:42:21,174-Speed 2498.22 samples/sec Loss 3.1541 LearningRate 0.000593 Epoch: 12 Global Step: 254780 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:42:29,374-Speed 2498.03 samples/sec Loss 3.1934 LearningRate 0.000593 Epoch: 12 Global Step: 254790 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:42:37,571-Speed 2498.94 samples/sec Loss 3.1611 LearningRate 0.000593 Epoch: 12 Global Step: 254800 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:42:45,779-Speed 2495.52 samples/sec Loss 3.1241 LearningRate 0.000593 Epoch: 12 Global Step: 254810 Fp16 Grad Scale: 32768 Required: 
131 hours Training: 2022-07-08 00:42:53,985-Speed 2496.18 samples/sec Loss 3.1245 LearningRate 0.000593 Epoch: 12 Global Step: 254820 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:43:02,127-Speed 2515.88 samples/sec Loss 3.1299 LearningRate 0.000593 Epoch: 12 Global Step: 254830 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:43:10,331-Speed 2496.77 samples/sec Loss 3.1810 LearningRate 0.000593 Epoch: 12 Global Step: 254840 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:43:18,531-Speed 2497.96 samples/sec Loss 3.1197 LearningRate 0.000593 Epoch: 12 Global Step: 254850 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:43:26,731-Speed 2498.11 samples/sec Loss 3.2304 LearningRate 0.000593 Epoch: 12 Global Step: 254860 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:43:34,933-Speed 2497.03 samples/sec Loss 3.2111 LearningRate 0.000592 Epoch: 12 Global Step: 254870 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:43:43,131-Speed 2498.62 samples/sec Loss 3.1573 LearningRate 0.000592 Epoch: 12 Global Step: 254880 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:43:51,279-Speed 2513.92 samples/sec Loss 3.1143 LearningRate 0.000592 Epoch: 12 Global Step: 254890 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:43:59,481-Speed 2497.40 samples/sec Loss 3.2463 LearningRate 0.000592 Epoch: 12 Global Step: 254900 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:44:07,681-Speed 2497.99 samples/sec Loss 3.1532 LearningRate 0.000592 Epoch: 12 Global Step: 254910 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:44:15,883-Speed 2497.40 samples/sec Loss 3.1831 LearningRate 0.000592 Epoch: 12 Global Step: 254920 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:44:24,083-Speed 2497.74 samples/sec Loss 3.1743 LearningRate 0.000592 Epoch: 12 Global Step: 254930 Fp16 Grad Scale: 32768 
Required: 131 hours Training: 2022-07-08 00:44:32,288-Speed 2496.56 samples/sec Loss 3.1239 LearningRate 0.000592 Epoch: 12 Global Step: 254940 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:44:40,435-Speed 2514.39 samples/sec Loss 3.2116 LearningRate 0.000592 Epoch: 12 Global Step: 254950 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:44:48,643-Speed 2495.23 samples/sec Loss 3.2016 LearningRate 0.000592 Epoch: 12 Global Step: 254960 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:44:56,845-Speed 2497.37 samples/sec Loss 3.1727 LearningRate 0.000592 Epoch: 12 Global Step: 254970 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:45:05,047-Speed 2497.37 samples/sec Loss 3.1096 LearningRate 0.000592 Epoch: 12 Global Step: 254980 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:45:13,245-Speed 2498.43 samples/sec Loss 3.1230 LearningRate 0.000592 Epoch: 12 Global Step: 254990 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:45:21,444-Speed 2498.36 samples/sec Loss 3.1165 LearningRate 0.000592 Epoch: 12 Global Step: 255000 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:45:29,590-Speed 2514.58 samples/sec Loss 3.1600 LearningRate 0.000592 Epoch: 12 Global Step: 255010 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:45:37,786-Speed 2499.23 samples/sec Loss 3.1787 LearningRate 0.000592 Epoch: 12 Global Step: 255020 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:45:45,983-Speed 2498.98 samples/sec Loss 3.1128 LearningRate 0.000592 Epoch: 12 Global Step: 255030 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:45:54,182-Speed 2498.40 samples/sec Loss 3.1394 LearningRate 0.000592 Epoch: 12 Global Step: 255040 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:46:02,380-Speed 2498.63 samples/sec Loss 3.1457 LearningRate 0.000592 Epoch: 12 Global Step: 255050 Fp16 Grad Scale: 
32768 Required: 131 hours Training: 2022-07-08 00:46:10,579-Speed 2498.39 samples/sec Loss 3.1465 LearningRate 0.000592 Epoch: 12 Global Step: 255060 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:46:18,724-Speed 2514.88 samples/sec Loss 3.1341 LearningRate 0.000592 Epoch: 12 Global Step: 255070 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:46:26,923-Speed 2498.25 samples/sec Loss 3.1296 LearningRate 0.000592 Epoch: 12 Global Step: 255080 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:46:35,122-Speed 2498.24 samples/sec Loss 3.1107 LearningRate 0.000592 Epoch: 12 Global Step: 255090 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:46:43,324-Speed 2497.26 samples/sec Loss 3.1419 LearningRate 0.000592 Epoch: 12 Global Step: 255100 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:46:51,533-Speed 2495.28 samples/sec Loss 3.1509 LearningRate 0.000592 Epoch: 12 Global Step: 255110 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:46:59,728-Speed 2499.24 samples/sec Loss 3.0853 LearningRate 0.000592 Epoch: 12 Global Step: 255120 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:47:07,874-Speed 2514.57 samples/sec Loss 3.1513 LearningRate 0.000592 Epoch: 12 Global Step: 255130 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:47:16,075-Speed 2497.82 samples/sec Loss 3.1084 LearningRate 0.000592 Epoch: 12 Global Step: 255140 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:47:24,269-Speed 2499.92 samples/sec Loss 3.2056 LearningRate 0.000592 Epoch: 12 Global Step: 255150 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:47:32,465-Speed 2499.02 samples/sec Loss 3.1915 LearningRate 0.000592 Epoch: 12 Global Step: 255160 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:47:40,664-Speed 2498.63 samples/sec Loss 3.2353 LearningRate 0.000592 Epoch: 12 Global Step: 255170 Fp16 Grad 
Scale: 32768 Required: 131 hours Training: 2022-07-08 00:47:48,868-Speed 2496.86 samples/sec Loss 3.0702 LearningRate 0.000592 Epoch: 12 Global Step: 255180 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:47:57,014-Speed 2514.77 samples/sec Loss 3.2270 LearningRate 0.000592 Epoch: 12 Global Step: 255190 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:48:05,213-Speed 2498.12 samples/sec Loss 3.1344 LearningRate 0.000592 Epoch: 12 Global Step: 255200 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:48:13,414-Speed 2497.97 samples/sec Loss 3.2364 LearningRate 0.000592 Epoch: 12 Global Step: 255210 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:48:21,614-Speed 2497.84 samples/sec Loss 3.2938 LearningRate 0.000592 Epoch: 12 Global Step: 255220 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:48:29,811-Speed 2498.65 samples/sec Loss 3.2555 LearningRate 0.000592 Epoch: 12 Global Step: 255230 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:48:38,008-Speed 2498.97 samples/sec Loss 3.2824 LearningRate 0.000592 Epoch: 12 Global Step: 255240 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:48:46,156-Speed 2513.88 samples/sec Loss 3.3252 LearningRate 0.000592 Epoch: 12 Global Step: 255250 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:48:54,357-Speed 2497.79 samples/sec Loss 3.2936 LearningRate 0.000592 Epoch: 12 Global Step: 255260 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:49:02,552-Speed 2499.22 samples/sec Loss 3.2088 LearningRate 0.000592 Epoch: 12 Global Step: 255270 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:49:10,753-Speed 2497.63 samples/sec Loss 3.2282 LearningRate 0.000592 Epoch: 12 Global Step: 255280 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:49:18,953-Speed 2498.08 samples/sec Loss 3.2231 LearningRate 0.000592 Epoch: 12 Global Step: 255290 Fp16 
Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:49:27,151-Speed 2498.32 samples/sec Loss 3.1541 LearningRate 0.000592 Epoch: 12 Global Step: 255300 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:49:35,303-Speed 2512.62 samples/sec Loss 3.1316 LearningRate 0.000592 Epoch: 12 Global Step: 255310 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:49:43,501-Speed 2498.64 samples/sec Loss 3.1920 LearningRate 0.000592 Epoch: 12 Global Step: 255320 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:49:51,711-Speed 2494.69 samples/sec Loss 3.1745 LearningRate 0.000592 Epoch: 12 Global Step: 255330 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:49:59,910-Speed 2498.27 samples/sec Loss 3.1713 LearningRate 0.000592 Epoch: 12 Global Step: 255340 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:50:08,109-Speed 2498.50 samples/sec Loss 3.2389 LearningRate 0.000592 Epoch: 12 Global Step: 255350 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:50:16,306-Speed 2498.66 samples/sec Loss 3.1465 LearningRate 0.000591 Epoch: 12 Global Step: 255360 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:50:24,454-Speed 2514.05 samples/sec Loss 3.0897 LearningRate 0.000591 Epoch: 12 Global Step: 255370 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:50:32,665-Speed 2494.58 samples/sec Loss 3.1696 LearningRate 0.000591 Epoch: 12 Global Step: 255380 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:50:40,866-Speed 2498.01 samples/sec Loss 3.1289 LearningRate 0.000591 Epoch: 12 Global Step: 255390 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:50:49,064-Speed 2498.58 samples/sec Loss 3.1034 LearningRate 0.000591 Epoch: 12 Global Step: 255400 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:50:57,260-Speed 2499.33 samples/sec Loss 3.0833 LearningRate 0.000591 Epoch: 12 Global Step: 255410 
Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:51:05,461-Speed 2497.78 samples/sec Loss 3.1067 LearningRate 0.000591 Epoch: 12 Global Step: 255420 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:51:13,609-Speed 2514.06 samples/sec Loss 3.1308 LearningRate 0.000591 Epoch: 12 Global Step: 255430 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:51:21,804-Speed 2499.65 samples/sec Loss 3.1660 LearningRate 0.000591 Epoch: 12 Global Step: 255440 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:51:30,020-Speed 2492.85 samples/sec Loss 3.0904 LearningRate 0.000591 Epoch: 12 Global Step: 255450 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:51:38,222-Speed 2497.58 samples/sec Loss 3.0633 LearningRate 0.000591 Epoch: 12 Global Step: 255460 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:51:46,432-Speed 2495.08 samples/sec Loss 3.0860 LearningRate 0.000591 Epoch: 12 Global Step: 255470 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:51:54,629-Speed 2498.69 samples/sec Loss 3.0638 LearningRate 0.000591 Epoch: 12 Global Step: 255480 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:52:02,771-Speed 2515.87 samples/sec Loss 3.0874 LearningRate 0.000591 Epoch: 12 Global Step: 255490 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:52:10,972-Speed 2497.70 samples/sec Loss 3.1111 LearningRate 0.000591 Epoch: 12 Global Step: 255500 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:52:19,172-Speed 2498.02 samples/sec Loss 3.1253 LearningRate 0.000591 Epoch: 12 Global Step: 255510 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:52:27,371-Speed 2498.49 samples/sec Loss 3.0481 LearningRate 0.000591 Epoch: 12 Global Step: 255520 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:52:35,572-Speed 2497.62 samples/sec Loss 3.1296 LearningRate 0.000591 Epoch: 12 Global Step: 
255530 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:52:43,775-Speed 2496.91 samples/sec Loss 3.1515 LearningRate 0.000591 Epoch: 12 Global Step: 255540 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:52:51,918-Speed 2515.38 samples/sec Loss 3.1377 LearningRate 0.000591 Epoch: 12 Global Step: 255550 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:53:00,125-Speed 2495.93 samples/sec Loss 3.1597 LearningRate 0.000591 Epoch: 12 Global Step: 255560 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:53:08,322-Speed 2498.80 samples/sec Loss 3.1929 LearningRate 0.000591 Epoch: 12 Global Step: 255570 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:53:16,526-Speed 2496.84 samples/sec Loss 3.1618 LearningRate 0.000591 Epoch: 12 Global Step: 255580 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:53:24,729-Speed 2496.96 samples/sec Loss 3.1059 LearningRate 0.000591 Epoch: 12 Global Step: 255590 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:53:32,929-Speed 2498.18 samples/sec Loss 3.1448 LearningRate 0.000591 Epoch: 12 Global Step: 255600 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:53:41,074-Speed 2514.78 samples/sec Loss 3.1402 LearningRate 0.000591 Epoch: 12 Global Step: 255610 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:53:49,276-Speed 2497.28 samples/sec Loss 3.1318 LearningRate 0.000591 Epoch: 12 Global Step: 255620 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:53:57,475-Speed 2498.48 samples/sec Loss 3.1326 LearningRate 0.000591 Epoch: 12 Global Step: 255630 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:54:05,672-Speed 2499.03 samples/sec Loss 3.1121 LearningRate 0.000591 Epoch: 12 Global Step: 255640 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:54:13,871-Speed 2498.46 samples/sec Loss 3.0776 LearningRate 0.000591 Epoch: 12 Global 
Step: 255650 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:54:22,070-Speed 2498.26 samples/sec Loss 3.1777 LearningRate 0.000591 Epoch: 12 Global Step: 255660 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:54:30,215-Speed 2514.77 samples/sec Loss 3.0870 LearningRate 0.000591 Epoch: 12 Global Step: 255670 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:54:38,412-Speed 2499.02 samples/sec Loss 3.0455 LearningRate 0.000591 Epoch: 12 Global Step: 255680 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:54:46,609-Speed 2498.78 samples/sec Loss 3.1272 LearningRate 0.000591 Epoch: 12 Global Step: 255690 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:54:54,803-Speed 2499.63 samples/sec Loss 3.2424 LearningRate 0.000591 Epoch: 12 Global Step: 255700 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:55:03,003-Speed 2498.12 samples/sec Loss 3.1479 LearningRate 0.000591 Epoch: 12 Global Step: 255710 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:55:11,201-Speed 2498.31 samples/sec Loss 3.1024 LearningRate 0.000591 Epoch: 12 Global Step: 255720 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:55:19,352-Speed 2513.14 samples/sec Loss 3.1487 LearningRate 0.000591 Epoch: 12 Global Step: 255730 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:55:27,551-Speed 2498.37 samples/sec Loss 3.1396 LearningRate 0.000591 Epoch: 12 Global Step: 255740 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:55:35,751-Speed 2497.98 samples/sec Loss 3.1285 LearningRate 0.000591 Epoch: 12 Global Step: 255750 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:55:43,946-Speed 2499.44 samples/sec Loss 3.1886 LearningRate 0.000591 Epoch: 12 Global Step: 255760 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:55:52,143-Speed 2499.03 samples/sec Loss 3.1364 LearningRate 0.000591 Epoch: 12 
Global Step: 255770 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:56:00,342-Speed 2498.26 samples/sec Loss 3.0715 LearningRate 0.000591 Epoch: 12 Global Step: 255780 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:56:08,488-Speed 2514.67 samples/sec Loss 3.1002 LearningRate 0.000591 Epoch: 12 Global Step: 255790 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:56:16,684-Speed 2498.98 samples/sec Loss 3.0383 LearningRate 0.000591 Epoch: 12 Global Step: 255800 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:56:24,895-Speed 2494.60 samples/sec Loss 3.1209 LearningRate 0.000591 Epoch: 12 Global Step: 255810 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:56:33,095-Speed 2498.06 samples/sec Loss 3.1019 LearningRate 0.000591 Epoch: 12 Global Step: 255820 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:56:41,298-Speed 2497.07 samples/sec Loss 3.1386 LearningRate 0.000591 Epoch: 12 Global Step: 255830 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:56:49,497-Speed 2498.58 samples/sec Loss 3.0676 LearningRate 0.000591 Epoch: 12 Global Step: 255840 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:56:57,643-Speed 2514.16 samples/sec Loss 3.2419 LearningRate 0.000590 Epoch: 12 Global Step: 255850 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:57:05,842-Speed 2498.81 samples/sec Loss 3.1710 LearningRate 0.000590 Epoch: 12 Global Step: 255860 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:57:14,037-Speed 2499.56 samples/sec Loss 3.2220 LearningRate 0.000590 Epoch: 12 Global Step: 255870 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:57:22,235-Speed 2498.94 samples/sec Loss 3.2508 LearningRate 0.000590 Epoch: 12 Global Step: 255880 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:57:30,432-Speed 2498.77 samples/sec Loss 3.2713 LearningRate 0.000590 
Epoch: 12 Global Step: 255890 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:57:38,630-Speed 2498.83 samples/sec Loss 3.1967 LearningRate 0.000590 Epoch: 12 Global Step: 255900 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:57:46,773-Speed 2515.23 samples/sec Loss 3.1453 LearningRate 0.000590 Epoch: 12 Global Step: 255910 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:57:54,972-Speed 2498.28 samples/sec Loss 3.1229 LearningRate 0.000590 Epoch: 12 Global Step: 255920 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:58:03,170-Speed 2498.57 samples/sec Loss 3.1517 LearningRate 0.000590 Epoch: 12 Global Step: 255930 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:58:11,370-Speed 2498.00 samples/sec Loss 3.1150 LearningRate 0.000590 Epoch: 12 Global Step: 255940 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:58:19,578-Speed 2495.39 samples/sec Loss 3.0618 LearningRate 0.000590 Epoch: 12 Global Step: 255950 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 00:58:27,734-Speed 2511.98 samples/sec Loss 3.0229 LearningRate 0.000590 Epoch: 12 Global Step: 255960 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:58:35,877-Speed 2515.42 samples/sec Loss 3.0233 LearningRate 0.000590 Epoch: 12 Global Step: 255970 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:58:44,078-Speed 2497.71 samples/sec Loss 3.0612 LearningRate 0.000590 Epoch: 12 Global Step: 255980 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:58:52,279-Speed 2497.69 samples/sec Loss 3.0928 LearningRate 0.000590 Epoch: 12 Global Step: 255990 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:59:00,479-Speed 2498.29 samples/sec Loss 3.2224 LearningRate 0.000590 Epoch: 12 Global Step: 256000 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:59:08,691-Speed 2494.22 samples/sec Loss 3.0637 LearningRate 
0.000590 Epoch: 12 Global Step: 256010 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:59:16,888-Speed 2498.72 samples/sec Loss 3.0602 LearningRate 0.000590 Epoch: 12 Global Step: 256020 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:59:25,052-Speed 2509.11 samples/sec Loss 3.1042 LearningRate 0.000590 Epoch: 12 Global Step: 256030 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:59:33,264-Speed 2494.57 samples/sec Loss 3.1061 LearningRate 0.000590 Epoch: 12 Global Step: 256040 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:59:41,460-Speed 2499.10 samples/sec Loss 3.1132 LearningRate 0.000590 Epoch: 12 Global Step: 256050 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:59:49,660-Speed 2497.95 samples/sec Loss 3.0810 LearningRate 0.000590 Epoch: 12 Global Step: 256060 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 00:59:57,866-Speed 2496.23 samples/sec Loss 3.0957 LearningRate 0.000590 Epoch: 12 Global Step: 256070 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:00:06,065-Speed 2498.25 samples/sec Loss 3.0700 LearningRate 0.000590 Epoch: 12 Global Step: 256080 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:00:14,212-Speed 2514.15 samples/sec Loss 3.1034 LearningRate 0.000590 Epoch: 12 Global Step: 256090 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:00:22,418-Speed 2496.12 samples/sec Loss 3.1036 LearningRate 0.000590 Epoch: 12 Global Step: 256100 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:00:30,629-Speed 2494.88 samples/sec Loss 3.1809 LearningRate 0.000590 Epoch: 12 Global Step: 256110 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:00:38,842-Speed 2494.01 samples/sec Loss 3.0973 LearningRate 0.000590 Epoch: 12 Global Step: 256120 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:00:47,043-Speed 2497.78 samples/sec Loss 3.0907 
LearningRate 0.000590 Epoch: 12 Global Step: 256130 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:00:55,258-Speed 2493.29 samples/sec Loss 3.1242 LearningRate 0.000590 Epoch: 12 Global Step: 256140 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:01:03,406-Speed 2513.92 samples/sec Loss 3.1367 LearningRate 0.000590 Epoch: 12 Global Step: 256150 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:01:11,607-Speed 2497.95 samples/sec Loss 3.0743 LearningRate 0.000590 Epoch: 12 Global Step: 256160 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:01:19,807-Speed 2497.65 samples/sec Loss 3.0984 LearningRate 0.000590 Epoch: 12 Global Step: 256170 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:01:28,011-Speed 2497.03 samples/sec Loss 3.1013 LearningRate 0.000590 Epoch: 12 Global Step: 256180 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:01:36,236-Speed 2490.09 samples/sec Loss 3.0694 LearningRate 0.000590 Epoch: 12 Global Step: 256190 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:01:44,443-Speed 2495.93 samples/sec Loss 3.1318 LearningRate 0.000590 Epoch: 12 Global Step: 256200 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:01:52,589-Speed 2514.35 samples/sec Loss 3.1065 LearningRate 0.000590 Epoch: 12 Global Step: 256210 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:02:00,787-Speed 2498.54 samples/sec Loss 3.0757 LearningRate 0.000590 Epoch: 12 Global Step: 256220 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:02:08,990-Speed 2497.14 samples/sec Loss 3.0587 LearningRate 0.000590 Epoch: 12 Global Step: 256230 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:02:17,205-Speed 2493.42 samples/sec Loss 3.0845 LearningRate 0.000590 Epoch: 12 Global Step: 256240 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:02:25,404-Speed 2498.05 samples/sec Loss 
3.1044 LearningRate 0.000590 Epoch: 12 Global Step: 256250 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:02:33,613-Speed 2495.53 samples/sec Loss 3.0656 LearningRate 0.000590 Epoch: 12 Global Step: 256260 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:02:41,759-Speed 2514.58 samples/sec Loss 3.1038 LearningRate 0.000590 Epoch: 12 Global Step: 256270 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:02:49,957-Speed 2498.28 samples/sec Loss 3.1202 LearningRate 0.000590 Epoch: 12 Global Step: 256280 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:02:58,154-Speed 2498.85 samples/sec Loss 3.1247 LearningRate 0.000590 Epoch: 12 Global Step: 256290 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:03:06,355-Speed 2497.88 samples/sec Loss 3.1280 LearningRate 0.000590 Epoch: 12 Global Step: 256300 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:03:14,564-Speed 2495.05 samples/sec Loss 3.0842 LearningRate 0.000590 Epoch: 12 Global Step: 256310 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:03:22,764-Speed 2497.92 samples/sec Loss 3.1404 LearningRate 0.000590 Epoch: 12 Global Step: 256320 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:03:30,909-Speed 2515.06 samples/sec Loss 3.1289 LearningRate 0.000589 Epoch: 12 Global Step: 256330 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:03:39,124-Speed 2493.44 samples/sec Loss 3.1200 LearningRate 0.000589 Epoch: 12 Global Step: 256340 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:03:47,323-Speed 2498.07 samples/sec Loss 3.1376 LearningRate 0.000589 Epoch: 12 Global Step: 256350 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:03:55,523-Speed 2498.18 samples/sec Loss 3.0808 LearningRate 0.000589 Epoch: 12 Global Step: 256360 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:04:03,738-Speed 2493.23 samples/sec 
Loss 3.0487 LearningRate 0.000589 Epoch: 12 Global Step: 256370 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:04:11,932-Speed 2499.77 samples/sec Loss 3.1004 LearningRate 0.000589 Epoch: 12 Global Step: 256380 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:04:20,079-Speed 2514.43 samples/sec Loss 3.1353 LearningRate 0.000589 Epoch: 12 Global Step: 256390 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:04:28,279-Speed 2498.00 samples/sec Loss 3.0480 LearningRate 0.000589 Epoch: 12 Global Step: 256400 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:04:36,480-Speed 2497.51 samples/sec Loss 3.0425 LearningRate 0.000589 Epoch: 12 Global Step: 256410 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:04:44,681-Speed 2497.75 samples/sec Loss 3.1163 LearningRate 0.000589 Epoch: 12 Global Step: 256420 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:04:52,883-Speed 2497.11 samples/sec Loss 3.0788 LearningRate 0.000589 Epoch: 12 Global Step: 256430 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:05:01,085-Speed 2497.39 samples/sec Loss 3.0843 LearningRate 0.000589 Epoch: 12 Global Step: 256440 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:05:09,238-Speed 2512.55 samples/sec Loss 3.1393 LearningRate 0.000589 Epoch: 12 Global Step: 256450 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:05:17,440-Speed 2497.15 samples/sec Loss 3.0949 LearningRate 0.000589 Epoch: 12 Global Step: 256460 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:05:25,643-Speed 2497.13 samples/sec Loss 3.1535 LearningRate 0.000589 Epoch: 12 Global Step: 256470 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:05:33,843-Speed 2498.03 samples/sec Loss 3.1087 LearningRate 0.000589 Epoch: 12 Global Step: 256480 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:05:42,045-Speed 2497.09 
samples/sec Loss 3.0646 LearningRate 0.000589 Epoch: 12 Global Step: 256490 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:05:50,248-Speed 2497.25 samples/sec Loss 3.0630 LearningRate 0.000589 Epoch: 12 Global Step: 256500 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:05:58,393-Speed 2514.59 samples/sec Loss 3.1092 LearningRate 0.000589 Epoch: 12 Global Step: 256510 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:06:06,591-Speed 2498.50 samples/sec Loss 3.1511 LearningRate 0.000589 Epoch: 12 Global Step: 256520 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:06:14,799-Speed 2495.67 samples/sec Loss 3.0896 LearningRate 0.000589 Epoch: 12 Global Step: 256530 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:06:23,007-Speed 2495.69 samples/sec Loss 3.1564 LearningRate 0.000589 Epoch: 12 Global Step: 256540 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:06:31,208-Speed 2497.67 samples/sec Loss 3.0866 LearningRate 0.000589 Epoch: 12 Global Step: 256550 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:06:39,412-Speed 2496.85 samples/sec Loss 3.1541 LearningRate 0.000589 Epoch: 12 Global Step: 256560 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:06:47,556-Speed 2515.22 samples/sec Loss 3.0622 LearningRate 0.000589 Epoch: 12 Global Step: 256570 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:06:55,753-Speed 2498.93 samples/sec Loss 3.0290 LearningRate 0.000589 Epoch: 12 Global Step: 256580 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:07:03,951-Speed 2498.91 samples/sec Loss 3.0862 LearningRate 0.000589 Epoch: 12 Global Step: 256590 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:07:12,150-Speed 2498.38 samples/sec Loss 3.0391 LearningRate 0.000589 Epoch: 12 Global Step: 256600 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:07:20,352-Speed 
2497.25 samples/sec Loss 3.1488 LearningRate 0.000589 Epoch: 12 Global Step: 256610 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:07:28,550-Speed 2498.28 samples/sec Loss 3.0712 LearningRate 0.000589 Epoch: 12 Global Step: 256620 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:07:36,695-Speed 2514.86 samples/sec Loss 3.0415 LearningRate 0.000589 Epoch: 12 Global Step: 256630 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:07:44,895-Speed 2497.99 samples/sec Loss 3.0986 LearningRate 0.000589 Epoch: 12 Global Step: 256640 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:07:53,095-Speed 2498.35 samples/sec Loss 3.1298 LearningRate 0.000589 Epoch: 12 Global Step: 256650 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:08:01,292-Speed 2498.49 samples/sec Loss 3.0756 LearningRate 0.000589 Epoch: 12 Global Step: 256660 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:08:09,493-Speed 2497.84 samples/sec Loss 3.0780 LearningRate 0.000589 Epoch: 12 Global Step: 256670 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:08:17,695-Speed 2497.33 samples/sec Loss 3.0807 LearningRate 0.000589 Epoch: 12 Global Step: 256680 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:08:25,839-Speed 2515.35 samples/sec Loss 3.1512 LearningRate 0.000589 Epoch: 12 Global Step: 256690 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:08:34,038-Speed 2498.38 samples/sec Loss 3.0657 LearningRate 0.000589 Epoch: 12 Global Step: 256700 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:08:42,235-Speed 2498.78 samples/sec Loss 3.1043 LearningRate 0.000589 Epoch: 12 Global Step: 256710 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:08:50,434-Speed 2498.31 samples/sec Loss 3.1706 LearningRate 0.000589 Epoch: 12 Global Step: 256720 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 
01:08:58,640-Speed 2496.06 samples/sec Loss 3.1134 LearningRate 0.000589 Epoch: 12 Global Step: 256730 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:09:06,843-Speed 2497.16 samples/sec Loss 3.1408 LearningRate 0.000589 Epoch: 12 Global Step: 256740 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:09:14,989-Speed 2514.72 samples/sec Loss 3.1148 LearningRate 0.000589 Epoch: 12 Global Step: 256750 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:09:23,184-Speed 2499.43 samples/sec Loss 3.0810 LearningRate 0.000589 Epoch: 12 Global Step: 256760 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:09:31,385-Speed 2497.69 samples/sec Loss 3.0954 LearningRate 0.000589 Epoch: 12 Global Step: 256770 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:09:39,584-Speed 2498.19 samples/sec Loss 3.1035 LearningRate 0.000589 Epoch: 12 Global Step: 256780 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:09:47,783-Speed 2498.63 samples/sec Loss 3.0904 LearningRate 0.000589 Epoch: 12 Global Step: 256790 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:09:55,982-Speed 2498.53 samples/sec Loss 3.1475 LearningRate 0.000589 Epoch: 12 Global Step: 256800 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:10:04,136-Speed 2511.79 samples/sec Loss 3.1073 LearningRate 0.000589 Epoch: 12 Global Step: 256810 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:10:12,339-Speed 2497.38 samples/sec Loss 3.1344 LearningRate 0.000588 Epoch: 12 Global Step: 256820 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:10:20,542-Speed 2497.10 samples/sec Loss 3.0770 LearningRate 0.000588 Epoch: 12 Global Step: 256830 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:10:28,752-Speed 2494.92 samples/sec Loss 3.0868 LearningRate 0.000588 Epoch: 12 Global Step: 256840 Fp16 Grad Scale: 32768 Required: 131 hours Training: 
2022-07-08 01:10:36,951-Speed 2498.44 samples/sec Loss 3.0439 LearningRate 0.000588 Epoch: 12 Global Step: 256850 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:10:45,151-Speed 2497.84 samples/sec Loss 3.0578 LearningRate 0.000588 Epoch: 12 Global Step: 256860 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:10:53,298-Speed 2514.10 samples/sec Loss 3.1073 LearningRate 0.000588 Epoch: 12 Global Step: 256870 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:11:01,509-Speed 2494.94 samples/sec Loss 3.0797 LearningRate 0.000588 Epoch: 12 Global Step: 256880 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:11:09,706-Speed 2498.73 samples/sec Loss 3.1853 LearningRate 0.000588 Epoch: 12 Global Step: 256890 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:11:17,904-Speed 2498.49 samples/sec Loss 3.0958 LearningRate 0.000588 Epoch: 12 Global Step: 256900 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:11:26,103-Speed 2498.33 samples/sec Loss 3.1311 LearningRate 0.000588 Epoch: 12 Global Step: 256910 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:11:34,301-Speed 2498.45 samples/sec Loss 3.1706 LearningRate 0.000588 Epoch: 12 Global Step: 256920 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:11:42,447-Speed 2514.47 samples/sec Loss 3.0876 LearningRate 0.000588 Epoch: 12 Global Step: 256930 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:11:50,646-Speed 2498.57 samples/sec Loss 3.0698 LearningRate 0.000588 Epoch: 12 Global Step: 256940 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:11:58,860-Speed 2493.67 samples/sec Loss 3.0858 LearningRate 0.000588 Epoch: 12 Global Step: 256950 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:12:07,057-Speed 2499.03 samples/sec Loss 3.0921 LearningRate 0.000588 Epoch: 12 Global Step: 256960 Fp16 Grad Scale: 32768 Required: 131 hours 
Training: 2022-07-08 01:12:15,258-Speed 2497.72 samples/sec Loss 3.0731 LearningRate 0.000588 Epoch: 12 Global Step: 256970 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:12:23,455-Speed 2498.71 samples/sec Loss 3.1058 LearningRate 0.000588 Epoch: 12 Global Step: 256980 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:12:31,600-Speed 2514.78 samples/sec Loss 3.1345 LearningRate 0.000588 Epoch: 12 Global Step: 256990 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:12:39,799-Speed 2498.28 samples/sec Loss 3.1296 LearningRate 0.000588 Epoch: 12 Global Step: 257000 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:12:47,999-Speed 2497.87 samples/sec Loss 3.0953 LearningRate 0.000588 Epoch: 12 Global Step: 257010 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:12:56,200-Speed 2497.69 samples/sec Loss 3.0932 LearningRate 0.000588 Epoch: 12 Global Step: 257020 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:13:04,400-Speed 2498.16 samples/sec Loss 3.0918 LearningRate 0.000588 Epoch: 12 Global Step: 257030 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:13:12,601-Speed 2497.51 samples/sec Loss 3.0752 LearningRate 0.000588 Epoch: 12 Global Step: 257040 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:13:20,752-Speed 2513.19 samples/sec Loss 3.0992 LearningRate 0.000588 Epoch: 12 Global Step: 257050 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:13:28,956-Speed 2496.61 samples/sec Loss 3.0844 LearningRate 0.000588 Epoch: 12 Global Step: 257060 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:13:37,157-Speed 2497.79 samples/sec Loss 3.1524 LearningRate 0.000588 Epoch: 12 Global Step: 257070 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:13:45,358-Speed 2497.69 samples/sec Loss 3.1288 LearningRate 0.000588 Epoch: 12 Global Step: 257080 Fp16 Grad Scale: 32768 Required: 131 
hours Training: 2022-07-08 01:13:53,569-Speed 2494.56 samples/sec Loss 3.1158 LearningRate 0.000588 Epoch: 12 Global Step: 257090 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:14:01,773-Speed 2496.79 samples/sec Loss 3.1276 LearningRate 0.000588 Epoch: 12 Global Step: 257100 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:14:09,923-Speed 2513.53 samples/sec Loss 3.1437 LearningRate 0.000588 Epoch: 12 Global Step: 257110 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:14:18,128-Speed 2496.33 samples/sec Loss 3.1087 LearningRate 0.000588 Epoch: 12 Global Step: 257120 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:14:26,328-Speed 2497.89 samples/sec Loss 3.0817 LearningRate 0.000588 Epoch: 12 Global Step: 257130 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:14:34,525-Speed 2499.08 samples/sec Loss 3.1191 LearningRate 0.000588 Epoch: 12 Global Step: 257140 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:14:42,737-Speed 2494.40 samples/sec Loss 3.0709 LearningRate 0.000588 Epoch: 12 Global Step: 257150 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:14:50,944-Speed 2495.88 samples/sec Loss 3.0744 LearningRate 0.000588 Epoch: 12 Global Step: 257160 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:14:59,087-Speed 2515.45 samples/sec Loss 3.0462 LearningRate 0.000588 Epoch: 12 Global Step: 257170 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:15:07,289-Speed 2497.47 samples/sec Loss 3.0707 LearningRate 0.000588 Epoch: 12 Global Step: 257180 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:15:15,489-Speed 2497.94 samples/sec Loss 3.0821 LearningRate 0.000588 Epoch: 12 Global Step: 257190 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:15:23,689-Speed 2497.97 samples/sec Loss 3.0384 LearningRate 0.000588 Epoch: 12 Global Step: 257200 Fp16 Grad Scale: 65536 Required: 
131 hours Training: 2022-07-08 01:15:31,888-Speed 2498.15 samples/sec Loss 3.0582 LearningRate 0.000588 Epoch: 12 Global Step: 257210 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:15:40,089-Speed 2497.75 samples/sec Loss 3.0268 LearningRate 0.000588 Epoch: 12 Global Step: 257220 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:15:48,236-Speed 2514.64 samples/sec Loss 3.1109 LearningRate 0.000588 Epoch: 12 Global Step: 257230 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:15:56,435-Speed 2498.34 samples/sec Loss 3.0830 LearningRate 0.000588 Epoch: 12 Global Step: 257240 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:16:04,636-Speed 2497.70 samples/sec Loss 3.0892 LearningRate 0.000588 Epoch: 12 Global Step: 257250 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:16:12,836-Speed 2498.32 samples/sec Loss 3.1295 LearningRate 0.000588 Epoch: 12 Global Step: 257260 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:16:21,048-Speed 2494.48 samples/sec Loss 3.1432 LearningRate 0.000588 Epoch: 12 Global Step: 257270 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:16:29,251-Speed 2497.06 samples/sec Loss 3.0707 LearningRate 0.000588 Epoch: 12 Global Step: 257280 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:16:37,398-Speed 2514.06 samples/sec Loss 3.1153 LearningRate 0.000588 Epoch: 12 Global Step: 257290 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:16:45,604-Speed 2496.28 samples/sec Loss 3.1075 LearningRate 0.000587 Epoch: 12 Global Step: 257300 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:16:53,805-Speed 2497.64 samples/sec Loss 3.1000 LearningRate 0.000587 Epoch: 12 Global Step: 257310 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:17:02,010-Speed 2496.41 samples/sec Loss 3.0967 LearningRate 0.000587 Epoch: 12 Global Step: 257320 Fp16 Grad Scale: 65536 
Required: 131 hours Training: 2022-07-08 01:17:10,208-Speed 2498.62 samples/sec Loss 3.1123 LearningRate 0.000587 Epoch: 12 Global Step: 257330 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:17:18,410-Speed 2497.66 samples/sec Loss 3.0173 LearningRate 0.000587 Epoch: 12 Global Step: 257340 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:17:26,556-Speed 2514.57 samples/sec Loss 3.0624 LearningRate 0.000587 Epoch: 12 Global Step: 257350 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:17:34,753-Speed 2498.56 samples/sec Loss 3.1024 LearningRate 0.000587 Epoch: 12 Global Step: 257360 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:17:42,957-Speed 2497.06 samples/sec Loss 3.1049 LearningRate 0.000587 Epoch: 12 Global Step: 257370 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:17:51,156-Speed 2498.41 samples/sec Loss 3.1088 LearningRate 0.000587 Epoch: 12 Global Step: 257380 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:17:59,358-Speed 2497.51 samples/sec Loss 3.1175 LearningRate 0.000587 Epoch: 12 Global Step: 257390 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:18:07,556-Speed 2498.66 samples/sec Loss 3.0824 LearningRate 0.000587 Epoch: 12 Global Step: 257400 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:18:15,711-Speed 2511.62 samples/sec Loss 3.1117 LearningRate 0.000587 Epoch: 12 Global Step: 257410 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:18:23,916-Speed 2496.52 samples/sec Loss 3.1516 LearningRate 0.000587 Epoch: 12 Global Step: 257420 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:18:32,129-Speed 2493.98 samples/sec Loss 3.1071 LearningRate 0.000587 Epoch: 12 Global Step: 257430 Fp16 Grad Scale: 65536 Required: 131 hours Training: 2022-07-08 01:18:40,343-Speed 2493.80 samples/sec Loss 3.0632 LearningRate 0.000587 Epoch: 12 Global Step: 257440 Fp16 Grad Scale: 
65536 Required: 131 hours Training: 2022-07-08 01:18:48,495-Speed 2512.57 samples/sec Loss 3.0702 LearningRate 0.000587 Epoch: 12 Global Step: 257450 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:18:56,698-Speed 2497.12 samples/sec Loss 3.1269 LearningRate 0.000587 Epoch: 12 Global Step: 257460 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:19:04,847-Speed 2513.63 samples/sec Loss 3.0734 LearningRate 0.000587 Epoch: 12 Global Step: 257470 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:19:13,044-Speed 2498.84 samples/sec Loss 3.0410 LearningRate 0.000587 Epoch: 12 Global Step: 257480 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:19:21,246-Speed 2497.48 samples/sec Loss 3.0669 LearningRate 0.000587 Epoch: 12 Global Step: 257490 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:19:29,449-Speed 2496.93 samples/sec Loss 3.0580 LearningRate 0.000587 Epoch: 12 Global Step: 257500 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:19:37,650-Speed 2497.85 samples/sec Loss 3.0319 LearningRate 0.000587 Epoch: 12 Global Step: 257510 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:19:45,850-Speed 2497.86 samples/sec Loss 3.0362 LearningRate 0.000587 Epoch: 12 Global Step: 257520 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:19:53,995-Speed 2514.88 samples/sec Loss 3.0679 LearningRate 0.000587 Epoch: 12 Global Step: 257530 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:20:02,193-Speed 2498.57 samples/sec Loss 3.0206 LearningRate 0.000587 Epoch: 12 Global Step: 257540 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:20:10,391-Speed 2498.55 samples/sec Loss 3.1282 LearningRate 0.000587 Epoch: 12 Global Step: 257550 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:20:18,592-Speed 2497.43 samples/sec Loss 3.0336 LearningRate 0.000587 Epoch: 12 Global Step: 257560 Fp16 Grad 
Scale: 32768 Required: 131 hours Training: 2022-07-08 01:20:26,789-Speed 2498.91 samples/sec Loss 3.1375 LearningRate 0.000587 Epoch: 12 Global Step: 257570 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:20:34,988-Speed 2498.33 samples/sec Loss 3.1631 LearningRate 0.000587 Epoch: 12 Global Step: 257580 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:20:43,132-Speed 2514.94 samples/sec Loss 3.1279 LearningRate 0.000587 Epoch: 12 Global Step: 257590 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:20:51,331-Speed 2498.47 samples/sec Loss 3.0969 LearningRate 0.000587 Epoch: 12 Global Step: 257600 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:20:59,530-Speed 2498.39 samples/sec Loss 3.1320 LearningRate 0.000587 Epoch: 12 Global Step: 257610 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:21:07,730-Speed 2497.72 samples/sec Loss 3.1391 LearningRate 0.000587 Epoch: 12 Global Step: 257620 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:21:15,929-Speed 2498.27 samples/sec Loss 3.1797 LearningRate 0.000587 Epoch: 12 Global Step: 257630 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:21:24,125-Speed 2499.44 samples/sec Loss 3.1722 LearningRate 0.000587 Epoch: 12 Global Step: 257640 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:21:32,276-Speed 2512.79 samples/sec Loss 3.1057 LearningRate 0.000587 Epoch: 12 Global Step: 257650 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:21:40,474-Speed 2498.50 samples/sec Loss 3.0845 LearningRate 0.000587 Epoch: 12 Global Step: 257660 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:21:48,676-Speed 2497.47 samples/sec Loss 3.1518 LearningRate 0.000587 Epoch: 12 Global Step: 257670 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:21:56,873-Speed 2498.72 samples/sec Loss 3.1484 LearningRate 0.000587 Epoch: 12 Global Step: 257680 Fp16 
Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:22:05,071-Speed 2498.98 samples/sec Loss 3.0927 LearningRate 0.000587 Epoch: 12 Global Step: 257690 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:22:13,270-Speed 2498.44 samples/sec Loss 3.1749 LearningRate 0.000587 Epoch: 12 Global Step: 257700 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:22:21,417-Speed 2514.06 samples/sec Loss 3.1289 LearningRate 0.000587 Epoch: 12 Global Step: 257710 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:22:29,612-Speed 2499.34 samples/sec Loss 3.0917 LearningRate 0.000587 Epoch: 12 Global Step: 257720 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:22:37,814-Speed 2497.56 samples/sec Loss 3.1991 LearningRate 0.000587 Epoch: 12 Global Step: 257730 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:22:46,022-Speed 2495.33 samples/sec Loss 3.1062 LearningRate 0.000587 Epoch: 12 Global Step: 257740 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:22:54,224-Speed 2497.44 samples/sec Loss 3.0714 LearningRate 0.000587 Epoch: 12 Global Step: 257750 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:23:02,425-Speed 2497.55 samples/sec Loss 3.0858 LearningRate 0.000587 Epoch: 12 Global Step: 257760 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:23:10,571-Speed 2514.73 samples/sec Loss 3.0832 LearningRate 0.000587 Epoch: 12 Global Step: 257770 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:23:18,784-Speed 2493.74 samples/sec Loss 3.0528 LearningRate 0.000587 Epoch: 12 Global Step: 257780 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:23:26,989-Speed 2496.50 samples/sec Loss 3.1043 LearningRate 0.000586 Epoch: 12 Global Step: 257790 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:23:35,185-Speed 2499.17 samples/sec Loss 3.1055 LearningRate 0.000586 Epoch: 12 Global Step: 257800 
Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:23:43,385-Speed 2498.06 samples/sec Loss 3.0970 LearningRate 0.000586 Epoch: 12 Global Step: 257810 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:23:51,582-Speed 2498.84 samples/sec Loss 3.1829 LearningRate 0.000586 Epoch: 12 Global Step: 257820 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:23:59,731-Speed 2513.84 samples/sec Loss 3.0568 LearningRate 0.000586 Epoch: 12 Global Step: 257830 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:24:07,928-Speed 2498.77 samples/sec Loss 3.1247 LearningRate 0.000586 Epoch: 12 Global Step: 257840 Fp16 Grad Scale: 32768 Required: 131 hours Training: 2022-07-08 01:24:16,081-Speed 2512.47 samples/sec Loss 3.1382 LearningRate 0.000586 Epoch: 12 Global Step: 257850 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:24:24,282-Speed 2497.73 samples/sec Loss 3.1587 LearningRate 0.000586 Epoch: 12 Global Step: 257860 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:24:32,482-Speed 2497.90 samples/sec Loss 3.0436 LearningRate 0.000586 Epoch: 12 Global Step: 257870 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:24:40,678-Speed 2499.03 samples/sec Loss 3.0928 LearningRate 0.000586 Epoch: 12 Global Step: 257880 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:24:48,823-Speed 2514.74 samples/sec Loss 3.0882 LearningRate 0.000586 Epoch: 12 Global Step: 257890 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:24:57,054-Speed 2488.65 samples/sec Loss 3.0951 LearningRate 0.000586 Epoch: 12 Global Step: 257900 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:25:05,252-Speed 2498.76 samples/sec Loss 3.0770 LearningRate 0.000586 Epoch: 12 Global Step: 257910 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:25:13,449-Speed 2498.68 samples/sec Loss 3.0535 LearningRate 0.000586 Epoch: 12 Global Step: 
257920 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:25:21,647-Speed 2499.15 samples/sec Loss 3.0702 LearningRate 0.000586 Epoch: 12 Global Step: 257930 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:25:29,846-Speed 2498.25 samples/sec Loss 3.0689 LearningRate 0.000586 Epoch: 12 Global Step: 257940 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:25:37,992-Speed 2514.40 samples/sec Loss 3.0519 LearningRate 0.000586 Epoch: 12 Global Step: 257950 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:25:46,193-Speed 2497.90 samples/sec Loss 3.0199 LearningRate 0.000586 Epoch: 12 Global Step: 257960 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:25:54,412-Speed 2492.09 samples/sec Loss 3.0349 LearningRate 0.000586 Epoch: 12 Global Step: 257970 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:26:02,609-Speed 2498.91 samples/sec Loss 3.1017 LearningRate 0.000586 Epoch: 12 Global Step: 257980 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:26:10,819-Speed 2495.19 samples/sec Loss 3.0457 LearningRate 0.000586 Epoch: 12 Global Step: 257990 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:26:19,017-Speed 2498.66 samples/sec Loss 3.0703 LearningRate 0.000586 Epoch: 12 Global Step: 258000 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:26:27,161-Speed 2515.04 samples/sec Loss 3.0429 LearningRate 0.000586 Epoch: 12 Global Step: 258010 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:26:35,372-Speed 2494.74 samples/sec Loss 3.0674 LearningRate 0.000586 Epoch: 12 Global Step: 258020 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:26:43,591-Speed 2492.15 samples/sec Loss 3.0321 LearningRate 0.000586 Epoch: 12 Global Step: 258030 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:26:51,795-Speed 2496.74 samples/sec Loss 3.0505 LearningRate 0.000586 Epoch: 12 Global 
Step: 258040 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:26:59,993-Speed 2498.76 samples/sec Loss 3.1103 LearningRate 0.000586 Epoch: 12 Global Step: 258050 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:27:08,226-Speed 2487.74 samples/sec Loss 3.0700 LearningRate 0.000586 Epoch: 12 Global Step: 258060 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:27:16,373-Speed 2514.22 samples/sec Loss 3.0456 LearningRate 0.000586 Epoch: 12 Global Step: 258070 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:27:24,576-Speed 2497.07 samples/sec Loss 3.0997 LearningRate 0.000586 Epoch: 12 Global Step: 258080 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:27:32,789-Speed 2494.06 samples/sec Loss 3.1098 LearningRate 0.000586 Epoch: 12 Global Step: 258090 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:27:40,987-Speed 2498.56 samples/sec Loss 3.0510 LearningRate 0.000586 Epoch: 12 Global Step: 258100 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:27:49,194-Speed 2495.92 samples/sec Loss 3.1182 LearningRate 0.000586 Epoch: 12 Global Step: 258110 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:27:57,395-Speed 2497.89 samples/sec Loss 3.1945 LearningRate 0.000586 Epoch: 12 Global Step: 258120 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:28:05,541-Speed 2514.46 samples/sec Loss 3.0946 LearningRate 0.000586 Epoch: 12 Global Step: 258130 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:28:13,739-Speed 2498.70 samples/sec Loss 3.1090 LearningRate 0.000586 Epoch: 12 Global Step: 258140 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:28:21,953-Speed 2493.62 samples/sec Loss 3.1183 LearningRate 0.000586 Epoch: 12 Global Step: 258150 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:28:30,156-Speed 2496.99 samples/sec Loss 3.1666 LearningRate 0.000586 Epoch: 12 
Global Step: 258160 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:28:38,362-Speed 2495.93 samples/sec Loss 3.0696 LearningRate 0.000586 Epoch: 12 Global Step: 258170 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:28:46,558-Speed 2499.05 samples/sec Loss 3.0730 LearningRate 0.000586 Epoch: 12 Global Step: 258180 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:28:54,703-Speed 2515.01 samples/sec Loss 3.0716 LearningRate 0.000586 Epoch: 12 Global Step: 258190 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:29:02,907-Speed 2496.65 samples/sec Loss 3.1420 LearningRate 0.000586 Epoch: 12 Global Step: 258200 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:29:11,107-Speed 2498.07 samples/sec Loss 3.0720 LearningRate 0.000586 Epoch: 12 Global Step: 258210 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:29:19,317-Speed 2494.91 samples/sec Loss 3.0992 LearningRate 0.000586 Epoch: 12 Global Step: 258220 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:29:27,517-Speed 2498.11 samples/sec Loss 3.0689 LearningRate 0.000586 Epoch: 12 Global Step: 258230 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:29:35,717-Speed 2498.13 samples/sec Loss 3.0509 LearningRate 0.000586 Epoch: 12 Global Step: 258240 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:29:43,865-Speed 2513.82 samples/sec Loss 3.0810 LearningRate 0.000586 Epoch: 12 Global Step: 258250 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:29:52,068-Speed 2497.07 samples/sec Loss 3.1329 LearningRate 0.000586 Epoch: 12 Global Step: 258260 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:30:00,269-Speed 2497.83 samples/sec Loss 3.1166 LearningRate 0.000586 Epoch: 12 Global Step: 258270 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:30:08,470-Speed 2497.63 samples/sec Loss 3.1087 LearningRate 0.000585 
Epoch: 12 Global Step: 258280 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:30:16,679-Speed 2495.01 samples/sec Loss 3.1539 LearningRate 0.000585 Epoch: 12 Global Step: 258290 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:30:24,878-Speed 2498.40 samples/sec Loss 3.0901 LearningRate 0.000585 Epoch: 12 Global Step: 258300 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:30:33,023-Speed 2514.84 samples/sec Loss 3.0875 LearningRate 0.000585 Epoch: 12 Global Step: 258310 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:30:41,228-Speed 2496.53 samples/sec Loss 3.1183 LearningRate 0.000585 Epoch: 12 Global Step: 258320 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:30:49,431-Speed 2497.12 samples/sec Loss 3.0600 LearningRate 0.000585 Epoch: 12 Global Step: 258330 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:30:57,638-Speed 2495.67 samples/sec Loss 3.0630 LearningRate 0.000585 Epoch: 12 Global Step: 258340 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:31:05,842-Speed 2497.13 samples/sec Loss 3.0486 LearningRate 0.000585 Epoch: 12 Global Step: 258350 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:31:14,042-Speed 2497.76 samples/sec Loss 3.0974 LearningRate 0.000585 Epoch: 12 Global Step: 258360 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:31:22,195-Speed 2512.19 samples/sec Loss 3.0716 LearningRate 0.000585 Epoch: 12 Global Step: 258370 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:31:30,423-Speed 2489.70 samples/sec Loss 3.1169 LearningRate 0.000585 Epoch: 12 Global Step: 258380 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:31:38,628-Speed 2496.39 samples/sec Loss 3.1253 LearningRate 0.000585 Epoch: 12 Global Step: 258390 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:31:46,828-Speed 2498.04 samples/sec Loss 3.0534 LearningRate 
0.000585 Epoch: 12 Global Step: 258400 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:31:55,028-Speed 2498.01 samples/sec Loss 3.0618 LearningRate 0.000585 Epoch: 12 Global Step: 258410 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:32:03,229-Speed 2497.49 samples/sec Loss 3.1230 LearningRate 0.000585 Epoch: 12 Global Step: 258420 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:32:11,382-Speed 2512.77 samples/sec Loss 3.0796 LearningRate 0.000585 Epoch: 12 Global Step: 258430 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:32:19,583-Speed 2497.67 samples/sec Loss 3.0104 LearningRate 0.000585 Epoch: 12 Global Step: 258440 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:32:27,799-Speed 2493.20 samples/sec Loss 3.0798 LearningRate 0.000585 Epoch: 12 Global Step: 258450 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:32:36,005-Speed 2496.21 samples/sec Loss 3.0746 LearningRate 0.000585 Epoch: 12 Global Step: 258460 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:32:44,209-Speed 2496.84 samples/sec Loss 3.0578 LearningRate 0.000585 Epoch: 12 Global Step: 258470 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:32:52,411-Speed 2497.36 samples/sec Loss 3.1745 LearningRate 0.000585 Epoch: 12 Global Step: 258480 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:33:00,561-Speed 2513.17 samples/sec Loss 3.0976 LearningRate 0.000585 Epoch: 12 Global Step: 258490 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:33:08,775-Speed 2493.68 samples/sec Loss 3.1061 LearningRate 0.000585 Epoch: 12 Global Step: 258500 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:33:16,977-Speed 2497.18 samples/sec Loss 3.1311 LearningRate 0.000585 Epoch: 12 Global Step: 258510 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:33:25,178-Speed 2497.54 samples/sec Loss 3.0321 
LearningRate 0.000585 Epoch: 12 Global Step: 258520 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:33:33,384-Speed 2496.13 samples/sec Loss 3.1456 LearningRate 0.000585 Epoch: 12 Global Step: 258530 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:33:41,586-Speed 2497.41 samples/sec Loss 3.1439 LearningRate 0.000585 Epoch: 12 Global Step: 258540 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:33:49,741-Speed 2511.88 samples/sec Loss 3.0598 LearningRate 0.000585 Epoch: 12 Global Step: 258550 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:33:57,943-Speed 2497.32 samples/sec Loss 3.1121 LearningRate 0.000585 Epoch: 12 Global Step: 258560 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:34:06,139-Speed 2499.23 samples/sec Loss 3.1043 LearningRate 0.000585 Epoch: 12 Global Step: 258570 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:34:14,341-Speed 2497.27 samples/sec Loss 3.0752 LearningRate 0.000585 Epoch: 12 Global Step: 258580 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:34:22,541-Speed 2498.08 samples/sec Loss 3.0479 LearningRate 0.000585 Epoch: 12 Global Step: 258590 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:34:30,744-Speed 2497.36 samples/sec Loss 3.1012 LearningRate 0.000585 Epoch: 12 Global Step: 258600 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:34:38,921-Speed 2504.88 samples/sec Loss 3.1010 LearningRate 0.000585 Epoch: 12 Global Step: 258610 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:34:47,119-Speed 2498.87 samples/sec Loss 3.0824 LearningRate 0.000585 Epoch: 12 Global Step: 258620 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:34:55,343-Speed 2490.58 samples/sec Loss 3.0846 LearningRate 0.000585 Epoch: 12 Global Step: 258630 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:35:03,544-Speed 2497.51 samples/sec Loss 
3.0972 LearningRate 0.000585 Epoch: 12 Global Step: 258640 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:35:11,743-Speed 2498.38 samples/sec Loss 3.0704 LearningRate 0.000585 Epoch: 12 Global Step: 258650 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:35:19,946-Speed 2496.94 samples/sec Loss 3.1292 LearningRate 0.000585 Epoch: 12 Global Step: 258660 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:35:28,104-Speed 2511.04 samples/sec Loss 3.1640 LearningRate 0.000585 Epoch: 12 Global Step: 258670 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:35:36,303-Speed 2498.21 samples/sec Loss 3.1334 LearningRate 0.000585 Epoch: 12 Global Step: 258680 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:35:44,503-Speed 2498.04 samples/sec Loss 3.1127 LearningRate 0.000585 Epoch: 12 Global Step: 258690 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:35:52,711-Speed 2495.43 samples/sec Loss 3.1268 LearningRate 0.000585 Epoch: 12 Global Step: 258700 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:36:00,914-Speed 2496.90 samples/sec Loss 3.1336 LearningRate 0.000585 Epoch: 12 Global Step: 258710 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:36:09,115-Speed 2497.75 samples/sec Loss 3.0312 LearningRate 0.000585 Epoch: 12 Global Step: 258720 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:36:17,259-Speed 2515.08 samples/sec Loss 3.1298 LearningRate 0.000585 Epoch: 12 Global Step: 258730 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:36:25,456-Speed 2499.20 samples/sec Loss 3.0857 LearningRate 0.000585 Epoch: 12 Global Step: 258740 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:36:33,663-Speed 2495.94 samples/sec Loss 3.1409 LearningRate 0.000585 Epoch: 12 Global Step: 258750 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:36:41,862-Speed 2498.26 samples/sec 
Loss 3.0298 LearningRate 0.000585 Epoch: 12 Global Step: 258760 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:36:50,059-Speed 2498.98 samples/sec Loss 3.0650 LearningRate 0.000584 Epoch: 12 Global Step: 258770 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:36:58,284-Speed 2490.53 samples/sec Loss 3.0576 LearningRate 0.000584 Epoch: 12 Global Step: 258780 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:37:06,431-Speed 2514.13 samples/sec Loss 3.0393 LearningRate 0.000584 Epoch: 12 Global Step: 258790 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:37:14,633-Speed 2497.53 samples/sec Loss 3.0066 LearningRate 0.000584 Epoch: 12 Global Step: 258800 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:37:22,835-Speed 2497.55 samples/sec Loss 3.0151 LearningRate 0.000584 Epoch: 12 Global Step: 258810 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:37:31,037-Speed 2497.47 samples/sec Loss 3.0743 LearningRate 0.000584 Epoch: 12 Global Step: 258820 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:37:39,239-Speed 2497.46 samples/sec Loss 3.0633 LearningRate 0.000584 Epoch: 12 Global Step: 258830 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:37:47,435-Speed 2499.31 samples/sec Loss 3.0316 LearningRate 0.000584 Epoch: 12 Global Step: 258840 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:37:55,583-Speed 2514.03 samples/sec Loss 3.1190 LearningRate 0.000584 Epoch: 12 Global Step: 258850 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:38:03,780-Speed 2498.63 samples/sec Loss 3.0433 LearningRate 0.000584 Epoch: 12 Global Step: 258860 Fp16 Grad Scale: 16384 Required: 131 hours Training: 2022-07-08 01:38:11,979-Speed 2498.41 samples/sec Loss 3.1515 LearningRate 0.000584 Epoch: 12 Global Step: 258870 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:38:20,182-Speed 2496.83 
samples/sec Loss 3.0464 LearningRate 0.000584 Epoch: 12 Global Step: 258880 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:38:28,384-Speed 2497.62 samples/sec Loss 3.1088 LearningRate 0.000584 Epoch: 12 Global Step: 258890 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:38:36,588-Speed 2496.58 samples/sec Loss 3.0565 LearningRate 0.000584 Epoch: 12 Global Step: 258900 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:38:44,736-Speed 2514.23 samples/sec Loss 3.0724 LearningRate 0.000584 Epoch: 12 Global Step: 258910 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:38:52,935-Speed 2498.34 samples/sec Loss 3.0964 LearningRate 0.000584 Epoch: 12 Global Step: 258920 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:39:01,139-Speed 2496.76 samples/sec Loss 3.0175 LearningRate 0.000584 Epoch: 12 Global Step: 258930 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:39:09,343-Speed 2497.38 samples/sec Loss 3.0691 LearningRate 0.000584 Epoch: 12 Global Step: 258940 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:39:17,545-Speed 2497.51 samples/sec Loss 3.1315 LearningRate 0.000584 Epoch: 12 Global Step: 258950 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:39:25,749-Speed 2496.44 samples/sec Loss 3.0805 LearningRate 0.000584 Epoch: 12 Global Step: 258960 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:39:33,901-Speed 2512.76 samples/sec Loss 3.1599 LearningRate 0.000584 Epoch: 12 Global Step: 258970 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:39:42,101-Speed 2498.07 samples/sec Loss 3.0390 LearningRate 0.000584 Epoch: 12 Global Step: 258980 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:39:50,300-Speed 2498.31 samples/sec Loss 3.0533 LearningRate 0.000584 Epoch: 12 Global Step: 258990 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:39:58,499-Speed 
2498.39 samples/sec Loss 3.1022 LearningRate 0.000584 Epoch: 12 Global Step: 259000 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:40:06,696-Speed 2498.83 samples/sec Loss 3.0334 LearningRate 0.000584 Epoch: 12 Global Step: 259010 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:40:14,895-Speed 2498.41 samples/sec Loss 3.0692 LearningRate 0.000584 Epoch: 12 Global Step: 259020 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:40:23,038-Speed 2515.25 samples/sec Loss 3.0442 LearningRate 0.000584 Epoch: 12 Global Step: 259030 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:40:31,235-Speed 2498.83 samples/sec Loss 3.0428 LearningRate 0.000584 Epoch: 12 Global Step: 259040 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 01:40:39,434-Speed 2498.30 samples/sec Loss 3.0823 LearningRate 0.000584 Epoch: 12 Global Step: 259050 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:40:47,636-Speed 2497.37 samples/sec Loss 3.0656 LearningRate 0.000584 Epoch: 12 Global Step: 259060 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:40:55,837-Speed 2497.67 samples/sec Loss 3.1119 LearningRate 0.000584 Epoch: 12 Global Step: 259070 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:41:04,032-Speed 2499.52 samples/sec Loss 3.1477 LearningRate 0.000584 Epoch: 12 Global Step: 259080 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:41:12,178-Speed 2514.34 samples/sec Loss 3.0374 LearningRate 0.000584 Epoch: 12 Global Step: 259090 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:41:20,375-Speed 2498.81 samples/sec Loss 3.0902 LearningRate 0.000584 Epoch: 12 Global Step: 259100 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:41:28,580-Speed 2496.90 samples/sec Loss 3.0630 LearningRate 0.000584 Epoch: 12 Global Step: 259110 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 
01:41:36,781-Speed 2497.74 samples/sec Loss 3.1455 LearningRate 0.000584 Epoch: 12 Global Step: 259120 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:41:44,979-Speed 2498.38 samples/sec Loss 3.1278 LearningRate 0.000584 Epoch: 12 Global Step: 259130 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:41:53,180-Speed 2497.74 samples/sec Loss 3.0858 LearningRate 0.000584 Epoch: 12 Global Step: 259140 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:42:01,332-Speed 2512.93 samples/sec Loss 3.0514 LearningRate 0.000584 Epoch: 12 Global Step: 259150 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:42:09,532-Speed 2497.79 samples/sec Loss 3.0756 LearningRate 0.000584 Epoch: 12 Global Step: 259160 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:42:17,734-Speed 2497.28 samples/sec Loss 3.1230 LearningRate 0.000584 Epoch: 12 Global Step: 259170 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:42:25,931-Speed 2498.83 samples/sec Loss 3.0797 LearningRate 0.000584 Epoch: 12 Global Step: 259180 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:42:34,135-Speed 2496.77 samples/sec Loss 3.1118 LearningRate 0.000584 Epoch: 12 Global Step: 259190 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:42:42,334-Speed 2498.43 samples/sec Loss 3.1163 LearningRate 0.000584 Epoch: 12 Global Step: 259200 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:42:50,478-Speed 2515.08 samples/sec Loss 3.1471 LearningRate 0.000584 Epoch: 12 Global Step: 259210 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:42:58,676-Speed 2498.45 samples/sec Loss 3.1007 LearningRate 0.000584 Epoch: 12 Global Step: 259220 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:43:06,890-Speed 2493.65 samples/sec Loss 3.1048 LearningRate 0.000584 Epoch: 12 Global Step: 259230 Fp16 Grad Scale: 32768 Required: 130 hours Training: 
2022-07-08 01:43:15,086-Speed 2499.37 samples/sec Loss 3.0768 LearningRate 0.000584 Epoch: 12 Global Step: 259240 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:43:23,283-Speed 2498.80 samples/sec Loss 3.1460 LearningRate 0.000584 Epoch: 12 Global Step: 259250 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:43:31,495-Speed 2494.23 samples/sec Loss 3.1010 LearningRate 0.000583 Epoch: 12 Global Step: 259260 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:43:39,644-Speed 2513.84 samples/sec Loss 3.0502 LearningRate 0.000583 Epoch: 12 Global Step: 259270 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:43:47,843-Speed 2498.03 samples/sec Loss 3.0919 LearningRate 0.000583 Epoch: 12 Global Step: 259280 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:43:56,046-Speed 2497.11 samples/sec Loss 3.1369 LearningRate 0.000583 Epoch: 12 Global Step: 259290 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:44:04,246-Speed 2497.89 samples/sec Loss 3.0506 LearningRate 0.000583 Epoch: 12 Global Step: 259300 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:44:12,446-Speed 2498.15 samples/sec Loss 3.0520 LearningRate 0.000583 Epoch: 12 Global Step: 259310 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:44:20,647-Speed 2497.42 samples/sec Loss 3.1260 LearningRate 0.000583 Epoch: 12 Global Step: 259320 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:44:28,791-Speed 2515.10 samples/sec Loss 3.0510 LearningRate 0.000583 Epoch: 12 Global Step: 259330 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:44:36,987-Speed 2499.12 samples/sec Loss 3.0342 LearningRate 0.000583 Epoch: 12 Global Step: 259340 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:44:45,186-Speed 2498.39 samples/sec Loss 3.0910 LearningRate 0.000583 Epoch: 12 Global Step: 259350 Fp16 Grad Scale: 32768 Required: 130 hours 
Training: 2022-07-08 01:44:53,385-Speed 2498.17 samples/sec Loss 3.0083 LearningRate 0.000583 Epoch: 12 Global Step: 259360 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:45:01,584-Speed 2498.50 samples/sec Loss 3.0709 LearningRate 0.000583 Epoch: 12 Global Step: 259370 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:45:09,781-Speed 2498.93 samples/sec Loss 3.0780 LearningRate 0.000583 Epoch: 12 Global Step: 259380 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:45:17,928-Speed 2514.22 samples/sec Loss 3.0125 LearningRate 0.000583 Epoch: 12 Global Step: 259390 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:45:26,131-Speed 2497.14 samples/sec Loss 3.0077 LearningRate 0.000583 Epoch: 12 Global Step: 259400 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:45:34,333-Speed 2497.48 samples/sec Loss 3.0693 LearningRate 0.000583 Epoch: 12 Global Step: 259410 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:45:42,527-Speed 2499.51 samples/sec Loss 2.9763 LearningRate 0.000583 Epoch: 12 Global Step: 259420 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:45:50,727-Speed 2498.04 samples/sec Loss 3.0382 LearningRate 0.000583 Epoch: 12 Global Step: 259430 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:45:58,923-Speed 2499.21 samples/sec Loss 3.0932 LearningRate 0.000583 Epoch: 12 Global Step: 259440 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:46:07,076-Speed 2512.12 samples/sec Loss 3.0212 LearningRate 0.000583 Epoch: 12 Global Step: 259450 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:46:15,276-Speed 2498.06 samples/sec Loss 3.1004 LearningRate 0.000583 Epoch: 12 Global Step: 259460 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:46:23,473-Speed 2498.82 samples/sec Loss 3.0525 LearningRate 0.000583 Epoch: 12 Global Step: 259470 Fp16 Grad Scale: 32768 Required: 130 
hours Training: 2022-07-08 01:46:31,684-Speed 2494.61 samples/sec Loss 3.0729 LearningRate 0.000583 Epoch: 12 Global Step: 259480 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:46:40,002-Speed 2498.03 samples/sec Loss 3.0247 LearningRate 0.000583 Epoch: 12 Global Step: 259490 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:46:48,265-Speed 2500.38 samples/sec Loss 3.0443 LearningRate 0.000583 Epoch: 12 Global Step: 259500 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:46:56,462-Speed 2517.15 samples/sec Loss 3.1570 LearningRate 0.000583 Epoch: 12 Global Step: 259510 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:47:04,661-Speed 2498.10 samples/sec Loss 3.0793 LearningRate 0.000583 Epoch: 12 Global Step: 259520 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:47:12,898-Speed 2499.99 samples/sec Loss 3.0492 LearningRate 0.000583 Epoch: 12 Global Step: 259530 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:47:21,153-Speed 2497.63 samples/sec Loss 2.9888 LearningRate 0.000583 Epoch: 12 Global Step: 259540 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:47:31,820-Speed 1933.23 samples/sec Loss 3.0881 LearningRate 0.000583 Epoch: 12 Global Step: 259550 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:47:40,017-Speed 2498.63 samples/sec Loss 3.0206 LearningRate 0.000583 Epoch: 12 Global Step: 259560 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:47:48,210-Speed 2515.57 samples/sec Loss 3.0601 LearningRate 0.000583 Epoch: 12 Global Step: 259570 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:48:02,271-Speed 2500.66 samples/sec Loss 3.0744 LearningRate 0.000583 Epoch: 12 Global Step: 259580 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:48:10,513-Speed 2501.38 samples/sec Loss 3.0467 LearningRate 0.000583 Epoch: 12 Global Step: 259590 Fp16 Grad Scale: 32768 Required: 
130 hours Training: 2022-07-08 01:48:18,708-Speed 2499.47 samples/sec Loss 3.1322 LearningRate 0.000583 Epoch: 12 Global Step: 259600 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:48:26,952-Speed 2500.12 samples/sec Loss 3.0782 LearningRate 0.000583 Epoch: 12 Global Step: 259610 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:48:35,171-Speed 2500.26 samples/sec Loss 3.0433 LearningRate 0.000583 Epoch: 12 Global Step: 259620 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:48:43,325-Speed 2512.01 samples/sec Loss 3.0934 LearningRate 0.000583 Epoch: 12 Global Step: 259630 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:48:51,525-Speed 2497.86 samples/sec Loss 3.1577 LearningRate 0.000583 Epoch: 12 Global Step: 259640 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:48:59,780-Speed 2499.09 samples/sec Loss 3.1205 LearningRate 0.000583 Epoch: 12 Global Step: 259650 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:49:08,051-Speed 2499.08 samples/sec Loss 3.1120 LearningRate 0.000583 Epoch: 12 Global Step: 259660 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:49:16,261-Speed 2494.83 samples/sec Loss 3.1010 LearningRate 0.000583 Epoch: 12 Global Step: 259670 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:49:28,113-Speed 2495.21 samples/sec Loss 3.1467 LearningRate 0.000583 Epoch: 12 Global Step: 259680 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:49:36,299-Speed 2515.36 samples/sec Loss 3.0653 LearningRate 0.000583 Epoch: 12 Global Step: 259690 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:49:44,508-Speed 2495.13 samples/sec Loss 3.1723 LearningRate 0.000583 Epoch: 12 Global Step: 259700 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:49:52,760-Speed 2496.36 samples/sec Loss 3.0705 LearningRate 0.000583 Epoch: 12 Global Step: 259710 Fp16 Grad Scale: 32768 
Required: 130 hours Training: 2022-07-08 01:50:05,707-Speed 1581.96 samples/sec Loss 3.1432 LearningRate 0.000583 Epoch: 12 Global Step: 259720 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:50:13,952-Speed 2497.44 samples/sec Loss 3.1723 LearningRate 0.000583 Epoch: 12 Global Step: 259730 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:50:22,216-Speed 2497.61 samples/sec Loss 3.0428 LearningRate 0.000583 Epoch: 12 Global Step: 259740 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:50:34,485-Speed 1669.50 samples/sec Loss 3.1438 LearningRate 0.000582 Epoch: 12 Global Step: 259750 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:50:47,041-Speed 1762.07 samples/sec Loss 3.1162 LearningRate 0.000582 Epoch: 12 Global Step: 259760 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:50:55,278-Speed 2486.84 samples/sec Loss 3.1935 LearningRate 0.000582 Epoch: 12 Global Step: 259770 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:51:03,500-Speed 2491.36 samples/sec Loss 3.0474 LearningRate 0.000582 Epoch: 12 Global Step: 259780 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:51:11,719-Speed 2492.15 samples/sec Loss 3.1069 LearningRate 0.000582 Epoch: 12 Global Step: 259790 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:51:19,939-Speed 2491.93 samples/sec Loss 3.1103 LearningRate 0.000582 Epoch: 12 Global Step: 259800 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:51:28,107-Speed 2507.84 samples/sec Loss 3.1341 LearningRate 0.000582 Epoch: 12 Global Step: 259810 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:51:36,331-Speed 2490.83 samples/sec Loss 3.0893 LearningRate 0.000582 Epoch: 12 Global Step: 259820 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:51:44,563-Speed 2488.26 samples/sec Loss 3.1099 LearningRate 0.000582 Epoch: 12 Global Step: 259830 Fp16 Grad Scale: 
32768 Required: 130 hours Training: 2022-07-08 01:51:52,787-Speed 2490.29 samples/sec Loss 3.0568 LearningRate 0.000582 Epoch: 12 Global Step: 259840 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:52:01,010-Speed 2490.90 samples/sec Loss 3.0239 LearningRate 0.000582 Epoch: 12 Global Step: 259850 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:52:09,228-Speed 2492.48 samples/sec Loss 3.0643 LearningRate 0.000582 Epoch: 12 Global Step: 259860 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:52:17,388-Speed 2510.37 samples/sec Loss 3.0453 LearningRate 0.000582 Epoch: 12 Global Step: 259870 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:52:25,597-Speed 2494.94 samples/sec Loss 3.0820 LearningRate 0.000582 Epoch: 12 Global Step: 259880 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:52:33,803-Speed 2496.24 samples/sec Loss 3.0330 LearningRate 0.000582 Epoch: 12 Global Step: 259890 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:52:42,021-Speed 2492.35 samples/sec Loss 3.0600 LearningRate 0.000582 Epoch: 12 Global Step: 259900 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:52:50,226-Speed 2496.77 samples/sec Loss 3.0714 LearningRate 0.000582 Epoch: 12 Global Step: 259910 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:52:58,441-Speed 2493.43 samples/sec Loss 3.0417 LearningRate 0.000582 Epoch: 12 Global Step: 259920 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:53:06,602-Speed 2509.86 samples/sec Loss 3.0141 LearningRate 0.000582 Epoch: 12 Global Step: 259930 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:53:14,806-Speed 2496.96 samples/sec Loss 3.0987 LearningRate 0.000582 Epoch: 12 Global Step: 259940 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:53:23,021-Speed 2493.22 samples/sec Loss 3.1142 LearningRate 0.000582 Epoch: 12 Global Step: 259950 Fp16 Grad 
Scale: 32768 Required: 130 hours Training: 2022-07-08 01:53:31,238-Speed 2492.86 samples/sec Loss 3.0739 LearningRate 0.000582 Epoch: 12 Global Step: 259960 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:53:39,442-Speed 2496.51 samples/sec Loss 3.0540 LearningRate 0.000582 Epoch: 12 Global Step: 259970 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:53:47,648-Speed 2496.13 samples/sec Loss 3.0219 LearningRate 0.000582 Epoch: 12 Global Step: 259980 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:53:55,803-Speed 2511.73 samples/sec Loss 3.1166 LearningRate 0.000582 Epoch: 12 Global Step: 259990 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:54:04,015-Speed 2494.37 samples/sec Loss 3.1150 LearningRate 0.000582 Epoch: 12 Global Step: 260000 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:54:12,225-Speed 2495.00 samples/sec Loss 3.0708 LearningRate 0.000582 Epoch: 12 Global Step: 260010 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:54:20,430-Speed 2496.31 samples/sec Loss 3.0202 LearningRate 0.000582 Epoch: 12 Global Step: 260020 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:54:28,640-Speed 2494.82 samples/sec Loss 3.0750 LearningRate 0.000582 Epoch: 12 Global Step: 260030 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:54:36,852-Speed 2494.52 samples/sec Loss 3.0602 LearningRate 0.000582 Epoch: 12 Global Step: 260040 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:54:45,007-Speed 2511.54 samples/sec Loss 3.1642 LearningRate 0.000582 Epoch: 12 Global Step: 260050 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:54:53,212-Speed 2496.53 samples/sec Loss 3.0981 LearningRate 0.000582 Epoch: 12 Global Step: 260060 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:55:01,417-Speed 2496.43 samples/sec Loss 3.1202 LearningRate 0.000582 Epoch: 12 Global Step: 260070 Fp16 
Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:55:09,626-Speed 2495.32 samples/sec Loss 3.1458 LearningRate 0.000582 Epoch: 12 Global Step: 260080 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:55:17,839-Speed 2494.00 samples/sec Loss 3.1330 LearningRate 0.000582 Epoch: 12 Global Step: 260090 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:55:26,046-Speed 2495.81 samples/sec Loss 3.1208 LearningRate 0.000582 Epoch: 12 Global Step: 260100 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:55:34,210-Speed 2509.11 samples/sec Loss 3.1927 LearningRate 0.000582 Epoch: 12 Global Step: 260110 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:55:42,415-Speed 2496.18 samples/sec Loss 3.0983 LearningRate 0.000582 Epoch: 12 Global Step: 260120 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:55:50,622-Speed 2495.82 samples/sec Loss 3.1223 LearningRate 0.000582 Epoch: 12 Global Step: 260130 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:55:58,827-Speed 2496.74 samples/sec Loss 3.0922 LearningRate 0.000582 Epoch: 12 Global Step: 260140 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:56:07,031-Speed 2496.74 samples/sec Loss 3.0954 LearningRate 0.000582 Epoch: 12 Global Step: 260150 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:56:15,237-Speed 2495.93 samples/sec Loss 3.1170 LearningRate 0.000582 Epoch: 12 Global Step: 260160 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:56:23,390-Speed 2512.44 samples/sec Loss 3.1419 LearningRate 0.000582 Epoch: 12 Global Step: 260170 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:56:31,596-Speed 2496.41 samples/sec Loss 3.1228 LearningRate 0.000582 Epoch: 12 Global Step: 260180 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:56:39,799-Speed 2496.86 samples/sec Loss 3.0819 LearningRate 0.000582 Epoch: 12 Global Step: 260190 
Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:56:48,004-Speed 2496.48 samples/sec Loss 3.0953 LearningRate 0.000582 Epoch: 12 Global Step: 260200 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:56:56,207-Speed 2497.05 samples/sec Loss 3.1151 LearningRate 0.000582 Epoch: 12 Global Step: 260210 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:57:04,413-Speed 2496.26 samples/sec Loss 3.0832 LearningRate 0.000582 Epoch: 12 Global Step: 260220 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:57:12,571-Speed 2510.66 samples/sec Loss 3.0319 LearningRate 0.000581 Epoch: 12 Global Step: 260230 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:57:20,780-Speed 2495.36 samples/sec Loss 3.0775 LearningRate 0.000581 Epoch: 12 Global Step: 260240 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 01:57:29,004-Speed 2490.53 samples/sec Loss 3.0619 LearningRate 0.000581 Epoch: 12 Global Step: 260250 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:57:37,209-Speed 2496.56 samples/sec Loss 2.9973 LearningRate 0.000581 Epoch: 12 Global Step: 260260 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:57:45,413-Speed 2496.70 samples/sec Loss 3.0599 LearningRate 0.000581 Epoch: 12 Global Step: 260270 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:57:53,619-Speed 2496.22 samples/sec Loss 3.0235 LearningRate 0.000581 Epoch: 12 Global Step: 260280 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:58:01,775-Speed 2511.47 samples/sec Loss 3.0214 LearningRate 0.000581 Epoch: 12 Global Step: 260290 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:58:09,980-Speed 2496.68 samples/sec Loss 3.0284 LearningRate 0.000581 Epoch: 12 Global Step: 260300 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:58:18,183-Speed 2496.70 samples/sec Loss 3.1429 LearningRate 0.000581 Epoch: 12 Global Step: 
260310 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:58:26,386-Speed 2497.04 samples/sec Loss 3.1222 LearningRate 0.000581 Epoch: 12 Global Step: 260320 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:58:34,601-Speed 2493.29 samples/sec Loss 3.0369 LearningRate 0.000581 Epoch: 12 Global Step: 260330 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:58:42,806-Speed 2496.49 samples/sec Loss 3.1528 LearningRate 0.000581 Epoch: 12 Global Step: 260340 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:58:50,959-Speed 2512.29 samples/sec Loss 3.1111 LearningRate 0.000581 Epoch: 12 Global Step: 260350 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:58:59,166-Speed 2495.68 samples/sec Loss 3.0999 LearningRate 0.000581 Epoch: 12 Global Step: 260360 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:59:07,368-Speed 2497.59 samples/sec Loss 3.0665 LearningRate 0.000581 Epoch: 12 Global Step: 260370 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:59:15,578-Speed 2495.01 samples/sec Loss 3.0664 LearningRate 0.000581 Epoch: 12 Global Step: 260380 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:59:23,782-Speed 2496.39 samples/sec Loss 3.0558 LearningRate 0.000581 Epoch: 12 Global Step: 260390 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:59:31,998-Speed 2493.11 samples/sec Loss 3.0548 LearningRate 0.000581 Epoch: 12 Global Step: 260400 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:59:40,148-Speed 2513.48 samples/sec Loss 2.9731 LearningRate 0.000581 Epoch: 12 Global Step: 260410 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:59:48,363-Speed 2493.32 samples/sec Loss 3.0304 LearningRate 0.000581 Epoch: 12 Global Step: 260420 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 01:59:56,564-Speed 2497.61 samples/sec Loss 3.0588 LearningRate 0.000581 Epoch: 12 Global 
Step: 260430 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:00:04,770-Speed 2496.01 samples/sec Loss 3.0595 LearningRate 0.000581 Epoch: 12 Global Step: 260440 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:00:12,973-Speed 2497.20 samples/sec Loss 3.0376 LearningRate 0.000581 Epoch: 12 Global Step: 260450 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:00:21,183-Speed 2494.86 samples/sec Loss 3.0712 LearningRate 0.000581 Epoch: 12 Global Step: 260460 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:00:29,334-Speed 2512.98 samples/sec Loss 3.0215 LearningRate 0.000581 Epoch: 12 Global Step: 260470 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:00:37,538-Speed 2496.88 samples/sec Loss 3.0755 LearningRate 0.000581 Epoch: 12 Global Step: 260480 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:00:45,743-Speed 2496.28 samples/sec Loss 3.0806 LearningRate 0.000581 Epoch: 12 Global Step: 260490 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:00:53,954-Speed 2494.61 samples/sec Loss 3.0841 LearningRate 0.000581 Epoch: 12 Global Step: 260500 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:01:02,156-Speed 2497.23 samples/sec Loss 3.0948 LearningRate 0.000581 Epoch: 12 Global Step: 260510 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:01:10,360-Speed 2496.87 samples/sec Loss 3.0591 LearningRate 0.000581 Epoch: 12 Global Step: 260520 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:01:18,510-Speed 2513.19 samples/sec Loss 3.1068 LearningRate 0.000581 Epoch: 12 Global Step: 260530 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:01:26,718-Speed 2495.56 samples/sec Loss 3.1296 LearningRate 0.000581 Epoch: 12 Global Step: 260540 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:01:34,922-Speed 2496.82 samples/sec Loss 3.1270 LearningRate 0.000581 Epoch: 12 
Global Step: 260550 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:01:43,122-Speed 2497.75 samples/sec Loss 3.1061 LearningRate 0.000581 Epoch: 12 Global Step: 260560 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:01:51,332-Speed 2495.24 samples/sec Loss 3.0628 LearningRate 0.000581 Epoch: 12 Global Step: 260570 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:01:59,533-Speed 2497.36 samples/sec Loss 3.0506 LearningRate 0.000581 Epoch: 12 Global Step: 260580 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:02:07,682-Speed 2513.84 samples/sec Loss 3.1716 LearningRate 0.000581 Epoch: 12 Global Step: 260590 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:02:15,890-Speed 2495.81 samples/sec Loss 3.0351 LearningRate 0.000581 Epoch: 12 Global Step: 260600 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:02:24,093-Speed 2497.27 samples/sec Loss 3.0895 LearningRate 0.000581 Epoch: 12 Global Step: 260610 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:02:32,301-Speed 2495.47 samples/sec Loss 3.0302 LearningRate 0.000581 Epoch: 12 Global Step: 260620 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:02:40,502-Speed 2497.40 samples/sec Loss 3.0278 LearningRate 0.000581 Epoch: 12 Global Step: 260630 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:02:48,705-Speed 2497.24 samples/sec Loss 3.1309 LearningRate 0.000581 Epoch: 12 Global Step: 260640 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:02:56,860-Speed 2511.73 samples/sec Loss 3.0227 LearningRate 0.000581 Epoch: 12 Global Step: 260650 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:03:05,062-Speed 2497.42 samples/sec Loss 3.1008 LearningRate 0.000581 Epoch: 12 Global Step: 260660 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:03:13,266-Speed 2496.96 samples/sec Loss 3.0959 LearningRate 0.000581 
Epoch: 12 Global Step: 260670 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:03:21,479-Speed 2494.03 samples/sec Loss 3.0856 LearningRate 0.000581 Epoch: 12 Global Step: 260680 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:03:29,682-Speed 2496.77 samples/sec Loss 3.0434 LearningRate 0.000581 Epoch: 12 Global Step: 260690 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:03:37,887-Speed 2496.34 samples/sec Loss 3.0311 LearningRate 0.000581 Epoch: 12 Global Step: 260700 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:03:46,039-Speed 2512.86 samples/sec Loss 3.0327 LearningRate 0.000581 Epoch: 12 Global Step: 260710 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:03:54,243-Speed 2497.04 samples/sec Loss 3.0708 LearningRate 0.000580 Epoch: 12 Global Step: 260720 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:04:02,445-Speed 2497.31 samples/sec Loss 3.0355 LearningRate 0.000580 Epoch: 12 Global Step: 260730 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:04:10,646-Speed 2497.88 samples/sec Loss 3.0445 LearningRate 0.000580 Epoch: 12 Global Step: 260740 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:04:18,855-Speed 2495.27 samples/sec Loss 3.1206 LearningRate 0.000580 Epoch: 12 Global Step: 260750 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:04:27,066-Speed 2494.60 samples/sec Loss 3.1157 LearningRate 0.000580 Epoch: 12 Global Step: 260760 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:04:35,217-Speed 2512.93 samples/sec Loss 3.0791 LearningRate 0.000580 Epoch: 12 Global Step: 260770 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:04:43,428-Speed 2494.70 samples/sec Loss 3.0575 LearningRate 0.000580 Epoch: 12 Global Step: 260780 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:04:51,630-Speed 2497.38 samples/sec Loss 3.0302 LearningRate 
0.000580 Epoch: 12 Global Step: 260790 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:04:59,836-Speed 2496.39 samples/sec Loss 3.0217 LearningRate 0.000580 Epoch: 12 Global Step: 260800 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:05:08,037-Speed 2497.62 samples/sec Loss 3.1031 LearningRate 0.000580 Epoch: 12 Global Step: 260810 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:05:16,242-Speed 2496.27 samples/sec Loss 3.0752 LearningRate 0.000580 Epoch: 12 Global Step: 260820 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:05:24,395-Speed 2512.49 samples/sec Loss 3.1513 LearningRate 0.000580 Epoch: 12 Global Step: 260830 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:05:32,598-Speed 2497.00 samples/sec Loss 3.1338 LearningRate 0.000580 Epoch: 12 Global Step: 260840 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:05:40,806-Speed 2495.64 samples/sec Loss 3.0919 LearningRate 0.000580 Epoch: 12 Global Step: 260850 Fp16 Grad Scale: 65536 Required: 130 hours Training: 2022-07-08 02:05:48,967-Speed 2509.81 samples/sec Loss 3.0071 LearningRate 0.000580 Epoch: 12 Global Step: 260860 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:05:57,171-Speed 2496.77 samples/sec Loss 3.0667 LearningRate 0.000580 Epoch: 12 Global Step: 260870 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:06:05,329-Speed 2510.79 samples/sec Loss 3.0660 LearningRate 0.000580 Epoch: 12 Global Step: 260880 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:06:13,479-Speed 2513.26 samples/sec Loss 3.0524 LearningRate 0.000580 Epoch: 12 Global Step: 260890 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:06:21,681-Speed 2497.41 samples/sec Loss 3.0713 LearningRate 0.000580 Epoch: 12 Global Step: 260900 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:06:29,883-Speed 2497.37 samples/sec Loss 3.0679 
LearningRate 0.000580 Epoch: 12 Global Step: 260910 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:06:38,088-Speed 2496.50 samples/sec Loss 3.1436 LearningRate 0.000580 Epoch: 12 Global Step: 260920 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:06:46,292-Speed 2496.68 samples/sec Loss 2.9940 LearningRate 0.000580 Epoch: 12 Global Step: 260930 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:06:54,498-Speed 2496.23 samples/sec Loss 3.0732 LearningRate 0.000580 Epoch: 12 Global Step: 260940 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:07:02,653-Speed 2511.75 samples/sec Loss 3.0751 LearningRate 0.000580 Epoch: 12 Global Step: 260950 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:07:10,854-Speed 2497.36 samples/sec Loss 3.0794 LearningRate 0.000580 Epoch: 12 Global Step: 260960 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:07:19,061-Speed 2495.96 samples/sec Loss 3.0322 LearningRate 0.000580 Epoch: 12 Global Step: 260970 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:07:27,264-Speed 2497.05 samples/sec Loss 3.0698 LearningRate 0.000580 Epoch: 12 Global Step: 260980 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:07:35,472-Speed 2495.69 samples/sec Loss 3.1306 LearningRate 0.000580 Epoch: 12 Global Step: 260990 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:07:43,673-Speed 2497.53 samples/sec Loss 3.1063 LearningRate 0.000580 Epoch: 12 Global Step: 261000 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:07:51,820-Speed 2514.34 samples/sec Loss 3.0722 LearningRate 0.000580 Epoch: 12 Global Step: 261010 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:08:00,024-Speed 2496.77 samples/sec Loss 3.0438 LearningRate 0.000580 Epoch: 12 Global Step: 261020 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:08:08,229-Speed 2496.59 samples/sec Loss 
3.0475 LearningRate 0.000580 Epoch: 12 Global Step: 261030 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:08:16,429-Speed 2497.67 samples/sec Loss 3.0785 LearningRate 0.000580 Epoch: 12 Global Step: 261040 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:08:24,648-Speed 2492.60 samples/sec Loss 3.0758 LearningRate 0.000580 Epoch: 12 Global Step: 261050 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:08:32,861-Speed 2494.03 samples/sec Loss 3.0768 LearningRate 0.000580 Epoch: 12 Global Step: 261060 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:08:41,012-Speed 2513.03 samples/sec Loss 3.0271 LearningRate 0.000580 Epoch: 12 Global Step: 261070 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:08:49,217-Speed 2496.53 samples/sec Loss 3.0302 LearningRate 0.000580 Epoch: 12 Global Step: 261080 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:08:57,418-Speed 2497.72 samples/sec Loss 3.0071 LearningRate 0.000580 Epoch: 12 Global Step: 261090 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:09:05,621-Speed 2496.84 samples/sec Loss 3.0431 LearningRate 0.000580 Epoch: 12 Global Step: 261100 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:09:13,823-Speed 2497.27 samples/sec Loss 3.0888 LearningRate 0.000580 Epoch: 12 Global Step: 261110 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:09:22,026-Speed 2496.95 samples/sec Loss 3.1084 LearningRate 0.000580 Epoch: 12 Global Step: 261120 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:09:30,182-Speed 2512.01 samples/sec Loss 3.0853 LearningRate 0.000580 Epoch: 12 Global Step: 261130 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:09:38,387-Speed 2496.39 samples/sec Loss 3.1105 LearningRate 0.000580 Epoch: 12 Global Step: 261140 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:09:46,603-Speed 2492.98 samples/sec 
Loss 3.0667 LearningRate 0.000580 Epoch: 12 Global Step: 261150 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:09:54,805-Speed 2497.54 samples/sec Loss 3.0760 LearningRate 0.000580 Epoch: 12 Global Step: 261160 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:10:03,006-Speed 2497.67 samples/sec Loss 3.0584 LearningRate 0.000580 Epoch: 12 Global Step: 261170 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:10:11,214-Speed 2495.47 samples/sec Loss 3.0755 LearningRate 0.000580 Epoch: 12 Global Step: 261180 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:10:19,363-Speed 2513.59 samples/sec Loss 3.0681 LearningRate 0.000580 Epoch: 12 Global Step: 261190 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:10:27,564-Speed 2497.84 samples/sec Loss 3.0748 LearningRate 0.000580 Epoch: 12 Global Step: 261200 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:10:35,767-Speed 2496.98 samples/sec Loss 3.0453 LearningRate 0.000579 Epoch: 12 Global Step: 261210 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:10:43,969-Speed 2497.39 samples/sec Loss 2.9969 LearningRate 0.000579 Epoch: 12 Global Step: 261220 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:10:52,172-Speed 2496.89 samples/sec Loss 3.0395 LearningRate 0.000579 Epoch: 12 Global Step: 261230 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:11:00,374-Speed 2497.58 samples/sec Loss 3.0666 LearningRate 0.000579 Epoch: 12 Global Step: 261240 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:11:08,525-Speed 2512.78 samples/sec Loss 3.0724 LearningRate 0.000579 Epoch: 12 Global Step: 261250 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:11:16,732-Speed 2496.08 samples/sec Loss 3.0365 LearningRate 0.000579 Epoch: 12 Global Step: 261260 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:11:24,937-Speed 2496.70 
samples/sec Loss 3.0967 LearningRate 0.000579 Epoch: 12 Global Step: 261270 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:11:33,141-Speed 2496.56 samples/sec Loss 3.0886 LearningRate 0.000579 Epoch: 12 Global Step: 261280 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:11:41,344-Speed 2497.04 samples/sec Loss 3.0125 LearningRate 0.000579 Epoch: 12 Global Step: 261290 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:11:49,547-Speed 2497.31 samples/sec Loss 3.1278 LearningRate 0.000579 Epoch: 12 Global Step: 261300 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:11:57,713-Speed 2508.26 samples/sec Loss 3.0642 LearningRate 0.000579 Epoch: 12 Global Step: 261310 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:12:05,917-Speed 2496.54 samples/sec Loss 3.1370 LearningRate 0.000579 Epoch: 12 Global Step: 261320 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:12:14,121-Speed 2496.81 samples/sec Loss 3.0506 LearningRate 0.000579 Epoch: 12 Global Step: 261330 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:12:22,325-Speed 2496.83 samples/sec Loss 3.0420 LearningRate 0.000579 Epoch: 12 Global Step: 261340 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:12:30,526-Speed 2497.42 samples/sec Loss 3.0274 LearningRate 0.000579 Epoch: 12 Global Step: 261350 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:12:38,730-Speed 2496.86 samples/sec Loss 3.0515 LearningRate 0.000579 Epoch: 12 Global Step: 261360 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:12:46,881-Speed 2513.05 samples/sec Loss 3.0604 LearningRate 0.000579 Epoch: 12 Global Step: 261370 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:12:55,084-Speed 2497.06 samples/sec Loss 3.0796 LearningRate 0.000579 Epoch: 12 Global Step: 261380 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:13:03,289-Speed 
2496.16 samples/sec Loss 3.0449 LearningRate 0.000579 Epoch: 12 Global Step: 261390 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:13:11,496-Speed 2495.89 samples/sec Loss 2.9958 LearningRate 0.000579 Epoch: 12 Global Step: 261400 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:13:19,695-Speed 2498.33 samples/sec Loss 3.0322 LearningRate 0.000579 Epoch: 12 Global Step: 261410 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:13:27,901-Speed 2495.85 samples/sec Loss 3.0348 LearningRate 0.000579 Epoch: 12 Global Step: 261420 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:13:36,061-Speed 2510.10 samples/sec Loss 3.0504 LearningRate 0.000579 Epoch: 12 Global Step: 261430 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:13:44,265-Speed 2496.89 samples/sec Loss 3.0657 LearningRate 0.000579 Epoch: 12 Global Step: 261440 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:13:52,478-Speed 2493.88 samples/sec Loss 3.0233 LearningRate 0.000579 Epoch: 12 Global Step: 261450 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:14:00,683-Speed 2496.27 samples/sec Loss 3.0584 LearningRate 0.000579 Epoch: 12 Global Step: 261460 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:14:08,905-Speed 2491.54 samples/sec Loss 2.9914 LearningRate 0.000579 Epoch: 12 Global Step: 261470 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:14:17,120-Speed 2493.29 samples/sec Loss 3.0441 LearningRate 0.000579 Epoch: 12 Global Step: 261480 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:14:25,270-Speed 2513.44 samples/sec Loss 3.0187 LearningRate 0.000579 Epoch: 12 Global Step: 261490 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:14:33,472-Speed 2496.97 samples/sec Loss 3.0930 LearningRate 0.000579 Epoch: 12 Global Step: 261500 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 
02:14:41,675-Speed 2496.96 samples/sec Loss 3.1146 LearningRate 0.000579 Epoch: 12 Global Step: 261510 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:14:49,881-Speed 2496.57 samples/sec Loss 3.1220 LearningRate 0.000579 Epoch: 12 Global Step: 261520 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:14:58,083-Speed 2497.19 samples/sec Loss 3.0960 LearningRate 0.000579 Epoch: 12 Global Step: 261530 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:15:06,285-Speed 2497.25 samples/sec Loss 3.0212 LearningRate 0.000579 Epoch: 12 Global Step: 261540 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:15:14,436-Speed 2512.99 samples/sec Loss 3.0585 LearningRate 0.000579 Epoch: 12 Global Step: 261550 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:15:22,638-Speed 2497.55 samples/sec Loss 3.1598 LearningRate 0.000579 Epoch: 12 Global Step: 261560 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:15:30,841-Speed 2496.86 samples/sec Loss 3.1250 LearningRate 0.000579 Epoch: 12 Global Step: 261570 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:15:39,044-Speed 2497.09 samples/sec Loss 3.0891 LearningRate 0.000579 Epoch: 12 Global Step: 261580 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:15:47,248-Speed 2496.95 samples/sec Loss 3.1357 LearningRate 0.000579 Epoch: 12 Global Step: 261590 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:15:55,449-Speed 2497.42 samples/sec Loss 3.1130 LearningRate 0.000579 Epoch: 12 Global Step: 261600 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:16:03,600-Speed 2513.05 samples/sec Loss 3.1101 LearningRate 0.000579 Epoch: 12 Global Step: 261610 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:16:11,803-Speed 2497.32 samples/sec Loss 3.0319 LearningRate 0.000579 Epoch: 12 Global Step: 261620 Fp16 Grad Scale: 16384 Required: 130 hours Training: 
2022-07-08 02:16:20,007-Speed 2496.74 samples/sec Loss 3.0791 LearningRate 0.000579 Epoch: 12 Global Step: 261630 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:16:28,211-Speed 2496.79 samples/sec Loss 3.1129 LearningRate 0.000579 Epoch: 12 Global Step: 261640 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:16:36,418-Speed 2495.99 samples/sec Loss 3.1218 LearningRate 0.000579 Epoch: 12 Global Step: 261650 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:16:44,622-Speed 2496.60 samples/sec Loss 3.0940 LearningRate 0.000579 Epoch: 12 Global Step: 261660 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:16:52,772-Speed 2513.03 samples/sec Loss 3.1092 LearningRate 0.000579 Epoch: 12 Global Step: 261670 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:17:00,977-Speed 2496.54 samples/sec Loss 3.0538 LearningRate 0.000579 Epoch: 12 Global Step: 261680 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:17:09,185-Speed 2495.60 samples/sec Loss 3.0637 LearningRate 0.000579 Epoch: 12 Global Step: 261690 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:17:17,389-Speed 2496.64 samples/sec Loss 3.0500 LearningRate 0.000579 Epoch: 12 Global Step: 261700 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:17:25,597-Speed 2495.66 samples/sec Loss 3.0816 LearningRate 0.000578 Epoch: 12 Global Step: 261710 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:17:33,800-Speed 2496.81 samples/sec Loss 3.0508 LearningRate 0.000578 Epoch: 12 Global Step: 261720 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:17:41,958-Speed 2511.11 samples/sec Loss 3.0339 LearningRate 0.000578 Epoch: 12 Global Step: 261730 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:17:50,163-Speed 2496.23 samples/sec Loss 3.1040 LearningRate 0.000578 Epoch: 12 Global Step: 261740 Fp16 Grad Scale: 16384 Required: 130 hours 
Training: 2022-07-08 02:17:58,368-Speed 2496.69 samples/sec Loss 3.0768 LearningRate 0.000578 Epoch: 12 Global Step: 261750 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:18:06,575-Speed 2495.78 samples/sec Loss 3.0373 LearningRate 0.000578 Epoch: 12 Global Step: 261760 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:18:14,781-Speed 2496.10 samples/sec Loss 3.0723 LearningRate 0.000578 Epoch: 12 Global Step: 261770 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:18:22,985-Speed 2496.60 samples/sec Loss 2.9987 LearningRate 0.000578 Epoch: 12 Global Step: 261780 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:18:31,139-Speed 2512.12 samples/sec Loss 3.0373 LearningRate 0.000578 Epoch: 12 Global Step: 261790 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:18:39,344-Speed 2496.37 samples/sec Loss 3.0249 LearningRate 0.000578 Epoch: 12 Global Step: 261800 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:18:47,557-Speed 2493.91 samples/sec Loss 3.0441 LearningRate 0.000578 Epoch: 12 Global Step: 261810 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:18:55,760-Speed 2497.33 samples/sec Loss 3.0515 LearningRate 0.000578 Epoch: 12 Global Step: 261820 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:19:03,962-Speed 2497.36 samples/sec Loss 3.0298 LearningRate 0.000578 Epoch: 12 Global Step: 261830 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:19:12,164-Speed 2497.41 samples/sec Loss 2.9550 LearningRate 0.000578 Epoch: 12 Global Step: 261840 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:19:20,313-Speed 2513.46 samples/sec Loss 3.0665 LearningRate 0.000578 Epoch: 12 Global Step: 261850 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:19:28,516-Speed 2496.94 samples/sec Loss 3.0822 LearningRate 0.000578 Epoch: 12 Global Step: 261860 Fp16 Grad Scale: 16384 Required: 130 
hours Training: 2022-07-08 02:19:36,735-Speed 2492.29 samples/sec Loss 3.0272 LearningRate 0.000578 Epoch: 12 Global Step: 261870 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:19:44,941-Speed 2496.09 samples/sec Loss 2.9864 LearningRate 0.000578 Epoch: 12 Global Step: 261880 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:19:53,144-Speed 2496.80 samples/sec Loss 3.0857 LearningRate 0.000578 Epoch: 12 Global Step: 261890 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:20:01,347-Speed 2496.91 samples/sec Loss 3.0052 LearningRate 0.000578 Epoch: 12 Global Step: 261900 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:20:09,512-Speed 2508.94 samples/sec Loss 3.0159 LearningRate 0.000578 Epoch: 12 Global Step: 261910 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:20:17,715-Speed 2496.92 samples/sec Loss 3.0809 LearningRate 0.000578 Epoch: 12 Global Step: 261920 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:20:25,916-Speed 2497.72 samples/sec Loss 3.1343 LearningRate 0.000578 Epoch: 12 Global Step: 261930 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:20:34,120-Speed 2496.35 samples/sec Loss 3.0790 LearningRate 0.000578 Epoch: 12 Global Step: 261940 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:20:42,324-Speed 2497.20 samples/sec Loss 3.0631 LearningRate 0.000578 Epoch: 12 Global Step: 261950 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:20:50,526-Speed 2497.38 samples/sec Loss 2.9961 LearningRate 0.000578 Epoch: 12 Global Step: 261960 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:20:58,675-Speed 2513.65 samples/sec Loss 3.0626 LearningRate 0.000578 Epoch: 12 Global Step: 261970 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:21:06,892-Speed 2492.79 samples/sec Loss 3.0560 LearningRate 0.000578 Epoch: 12 Global Step: 261980 Fp16 Grad Scale: 16384 Required: 
130 hours Training: 2022-07-08 02:21:15,096-Speed 2496.89 samples/sec Loss 3.0890 LearningRate 0.000578 Epoch: 12 Global Step: 261990 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:21:23,302-Speed 2495.84 samples/sec Loss 3.0967 LearningRate 0.000578 Epoch: 12 Global Step: 262000 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:21:31,513-Speed 2494.94 samples/sec Loss 3.0656 LearningRate 0.000578 Epoch: 12 Global Step: 262010 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:21:39,715-Speed 2497.32 samples/sec Loss 3.0336 LearningRate 0.000578 Epoch: 12 Global Step: 262020 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:21:47,877-Speed 2509.64 samples/sec Loss 3.0637 LearningRate 0.000578 Epoch: 12 Global Step: 262030 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:21:56,079-Speed 2497.39 samples/sec Loss 3.0655 LearningRate 0.000578 Epoch: 12 Global Step: 262040 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:22:04,281-Speed 2497.36 samples/sec Loss 3.0568 LearningRate 0.000578 Epoch: 12 Global Step: 262050 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:22:12,487-Speed 2496.03 samples/sec Loss 3.0250 LearningRate 0.000578 Epoch: 12 Global Step: 262060 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:22:20,691-Speed 2496.79 samples/sec Loss 3.0748 LearningRate 0.000578 Epoch: 12 Global Step: 262070 Fp16 Grad Scale: 16384 Required: 130 hours Training: 2022-07-08 02:22:28,898-Speed 2495.85 samples/sec Loss 3.0416 LearningRate 0.000578 Epoch: 12 Global Step: 262080 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:22:37,049-Speed 2513.26 samples/sec Loss 3.0637 LearningRate 0.000578 Epoch: 12 Global Step: 262090 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:22:45,249-Speed 2497.80 samples/sec Loss 3.0508 LearningRate 0.000578 Epoch: 12 Global Step: 262100 Fp16 Grad Scale: 32768 
Required: 130 hours Training: 2022-07-08 02:22:53,452-Speed 2496.85 samples/sec Loss 3.0911 LearningRate 0.000578 Epoch: 12 Global Step: 262110 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:23:01,659-Speed 2496.17 samples/sec Loss 3.0680 LearningRate 0.000578 Epoch: 12 Global Step: 262120 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:23:09,865-Speed 2496.00 samples/sec Loss 3.0904 LearningRate 0.000578 Epoch: 12 Global Step: 262130 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:23:18,069-Speed 2496.89 samples/sec Loss 3.0251 LearningRate 0.000578 Epoch: 12 Global Step: 262140 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:23:26,217-Speed 2513.95 samples/sec Loss 3.0190 LearningRate 0.000578 Epoch: 12 Global Step: 262150 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:23:34,430-Speed 2493.76 samples/sec Loss 2.9954 LearningRate 0.000578 Epoch: 12 Global Step: 262160 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:23:42,640-Speed 2495.08 samples/sec Loss 3.0076 LearningRate 0.000578 Epoch: 12 Global Step: 262170 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:23:50,846-Speed 2496.10 samples/sec Loss 3.0147 LearningRate 0.000578 Epoch: 12 Global Step: 262180 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:23:59,050-Speed 2496.52 samples/sec Loss 3.0622 LearningRate 0.000578 Epoch: 12 Global Step: 262190 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:24:07,258-Speed 2495.55 samples/sec Loss 3.0716 LearningRate 0.000577 Epoch: 12 Global Step: 262200 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:24:15,409-Speed 2513.29 samples/sec Loss 3.0467 LearningRate 0.000577 Epoch: 12 Global Step: 262210 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:24:23,610-Speed 2497.52 samples/sec Loss 3.0781 LearningRate 0.000577 Epoch: 12 Global Step: 262220 Fp16 Grad Scale: 
32768 Required: 130 hours Training: 2022-07-08 02:24:31,819-Speed 2495.05 samples/sec Loss 3.0262 LearningRate 0.000577 Epoch: 12 Global Step: 262230 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:24:40,034-Speed 2493.38 samples/sec Loss 3.0884 LearningRate 0.000577 Epoch: 12 Global Step: 262240 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:24:48,260-Speed 2490.15 samples/sec Loss 3.0773 LearningRate 0.000577 Epoch: 12 Global Step: 262250 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:24:56,477-Speed 2493.02 samples/sec Loss 3.0648 LearningRate 0.000577 Epoch: 12 Global Step: 262260 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:25:04,626-Speed 2513.50 samples/sec Loss 3.0681 LearningRate 0.000577 Epoch: 12 Global Step: 262270 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:25:12,830-Speed 2496.73 samples/sec Loss 3.0251 LearningRate 0.000577 Epoch: 12 Global Step: 262280 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:25:21,031-Speed 2497.74 samples/sec Loss 3.0918 LearningRate 0.000577 Epoch: 12 Global Step: 262290 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:25:29,236-Speed 2496.60 samples/sec Loss 3.0083 LearningRate 0.000577 Epoch: 12 Global Step: 262300 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:25:37,439-Speed 2496.94 samples/sec Loss 3.0336 LearningRate 0.000577 Epoch: 12 Global Step: 262310 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:25:45,642-Speed 2496.97 samples/sec Loss 3.0264 LearningRate 0.000577 Epoch: 12 Global Step: 262320 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:25:53,793-Speed 2513.09 samples/sec Loss 3.0881 LearningRate 0.000577 Epoch: 12 Global Step: 262330 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:26:01,999-Speed 2496.23 samples/sec Loss 3.0730 LearningRate 0.000577 Epoch: 12 Global Step: 262340 Fp16 Grad 
Scale: 32768 Required: 130 hours Training: 2022-07-08 02:26:10,202-Speed 2496.84 samples/sec Loss 3.0596 LearningRate 0.000577 Epoch: 12 Global Step: 262350 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:26:18,405-Speed 2496.98 samples/sec Loss 3.1113 LearningRate 0.000577 Epoch: 12 Global Step: 262360 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:26:26,607-Speed 2497.71 samples/sec Loss 3.0242 LearningRate 0.000577 Epoch: 12 Global Step: 262370 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:26:34,810-Speed 2497.07 samples/sec Loss 3.0700 LearningRate 0.000577 Epoch: 12 Global Step: 262380 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:26:42,960-Speed 2513.27 samples/sec Loss 3.0269 LearningRate 0.000577 Epoch: 12 Global Step: 262390 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:26:51,160-Speed 2497.83 samples/sec Loss 3.0866 LearningRate 0.000577 Epoch: 12 Global Step: 262400 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:26:59,362-Speed 2497.29 samples/sec Loss 3.0258 LearningRate 0.000577 Epoch: 12 Global Step: 262410 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:27:07,563-Speed 2497.44 samples/sec Loss 3.0859 LearningRate 0.000577 Epoch: 12 Global Step: 262420 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:27:15,765-Speed 2497.41 samples/sec Loss 3.0515 LearningRate 0.000577 Epoch: 12 Global Step: 262430 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:27:23,965-Speed 2498.33 samples/sec Loss 3.0830 LearningRate 0.000577 Epoch: 12 Global Step: 262440 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:27:32,116-Speed 2513.20 samples/sec Loss 3.1292 LearningRate 0.000577 Epoch: 12 Global Step: 262450 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:27:40,316-Speed 2497.78 samples/sec Loss 3.0419 LearningRate 0.000577 Epoch: 12 Global Step: 262460 Fp16 
Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:27:48,519-Speed 2497.02 samples/sec Loss 3.0606 LearningRate 0.000577 Epoch: 12 Global Step: 262470 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:27:56,721-Speed 2497.47 samples/sec Loss 3.0332 LearningRate 0.000577 Epoch: 12 Global Step: 262480 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:28:04,925-Speed 2496.56 samples/sec Loss 3.0524 LearningRate 0.000577 Epoch: 12 Global Step: 262490 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:28:13,127-Speed 2497.35 samples/sec Loss 3.0552 LearningRate 0.000577 Epoch: 12 Global Step: 262500 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:28:21,280-Speed 2512.75 samples/sec Loss 3.0200 LearningRate 0.000577 Epoch: 12 Global Step: 262510 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:28:29,482-Speed 2497.11 samples/sec Loss 3.0823 LearningRate 0.000577 Epoch: 12 Global Step: 262520 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:28:37,698-Speed 2493.15 samples/sec Loss 3.0794 LearningRate 0.000577 Epoch: 12 Global Step: 262530 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:28:45,903-Speed 2496.61 samples/sec Loss 2.9977 LearningRate 0.000577 Epoch: 12 Global Step: 262540 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:28:54,109-Speed 2496.00 samples/sec Loss 2.9916 LearningRate 0.000577 Epoch: 12 Global Step: 262550 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:29:02,314-Speed 2496.45 samples/sec Loss 3.0668 LearningRate 0.000577 Epoch: 12 Global Step: 262560 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:29:10,471-Speed 2511.26 samples/sec Loss 3.0829 LearningRate 0.000577 Epoch: 12 Global Step: 262570 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:29:18,674-Speed 2497.10 samples/sec Loss 3.0312 LearningRate 0.000577 Epoch: 12 Global Step: 262580 
Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:29:26,876-Speed 2497.40 samples/sec Loss 3.0225 LearningRate 0.000577 Epoch: 12 Global Step: 262590 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:29:35,081-Speed 2496.48 samples/sec Loss 3.0632 LearningRate 0.000577 Epoch: 12 Global Step: 262600 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:29:43,305-Speed 2490.73 samples/sec Loss 3.1072 LearningRate 0.000577 Epoch: 12 Global Step: 262610 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:29:51,509-Speed 2496.52 samples/sec Loss 3.0845 LearningRate 0.000577 Epoch: 12 Global Step: 262620 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:29:59,656-Speed 2514.16 samples/sec Loss 3.0727 LearningRate 0.000577 Epoch: 12 Global Step: 262630 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:30:07,870-Speed 2493.95 samples/sec Loss 3.0721 LearningRate 0.000577 Epoch: 12 Global Step: 262640 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:30:16,070-Speed 2497.83 samples/sec Loss 3.0108 LearningRate 0.000577 Epoch: 12 Global Step: 262650 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:30:24,272-Speed 2497.37 samples/sec Loss 3.0744 LearningRate 0.000577 Epoch: 12 Global Step: 262660 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:30:32,474-Speed 2497.27 samples/sec Loss 3.0792 LearningRate 0.000577 Epoch: 12 Global Step: 262670 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:30:40,676-Speed 2497.33 samples/sec Loss 2.9906 LearningRate 0.000577 Epoch: 12 Global Step: 262680 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:30:48,827-Speed 2513.12 samples/sec Loss 3.0250 LearningRate 0.000576 Epoch: 12 Global Step: 262690 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:30:57,028-Speed 2497.60 samples/sec Loss 3.1077 LearningRate 0.000576 Epoch: 12 Global Step: 
262700 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:31:05,233-Speed 2496.41 samples/sec Loss 3.0304 LearningRate 0.000576 Epoch: 12 Global Step: 262710 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:31:13,435-Speed 2497.20 samples/sec Loss 3.1485 LearningRate 0.000576 Epoch: 12 Global Step: 262720 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:31:21,637-Speed 2497.33 samples/sec Loss 3.1565 LearningRate 0.000576 Epoch: 12 Global Step: 262730 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:31:29,837-Speed 2497.97 samples/sec Loss 3.1448 LearningRate 0.000576 Epoch: 12 Global Step: 262740 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:31:37,987-Speed 2513.22 samples/sec Loss 3.1267 LearningRate 0.000576 Epoch: 12 Global Step: 262750 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:31:46,191-Speed 2496.60 samples/sec Loss 3.1046 LearningRate 0.000576 Epoch: 12 Global Step: 262760 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:31:54,396-Speed 2496.45 samples/sec Loss 3.1028 LearningRate 0.000576 Epoch: 12 Global Step: 262770 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:32:02,597-Speed 2497.76 samples/sec Loss 3.0263 LearningRate 0.000576 Epoch: 12 Global Step: 262780 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:32:10,801-Speed 2496.74 samples/sec Loss 3.0387 LearningRate 0.000576 Epoch: 12 Global Step: 262790 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:32:19,005-Speed 2496.86 samples/sec Loss 3.0744 LearningRate 0.000576 Epoch: 12 Global Step: 262800 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:32:27,155-Speed 2513.20 samples/sec Loss 3.0291 LearningRate 0.000576 Epoch: 12 Global Step: 262810 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:32:35,358-Speed 2497.07 samples/sec Loss 3.0944 LearningRate 0.000576 Epoch: 12 Global 
Step: 262820 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:32:43,561-Speed 2497.02 samples/sec Loss 3.1011 LearningRate 0.000576 Epoch: 12 Global Step: 262830 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:32:51,762-Speed 2497.56 samples/sec Loss 3.0957 LearningRate 0.000576 Epoch: 12 Global Step: 262840 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:32:59,965-Speed 2497.18 samples/sec Loss 3.0476 LearningRate 0.000576 Epoch: 12 Global Step: 262850 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:33:08,167-Speed 2497.23 samples/sec Loss 3.0451 LearningRate 0.000576 Epoch: 12 Global Step: 262860 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:33:16,316-Speed 2513.79 samples/sec Loss 3.0158 LearningRate 0.000576 Epoch: 12 Global Step: 262870 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:33:24,520-Speed 2496.86 samples/sec Loss 3.0900 LearningRate 0.000576 Epoch: 12 Global Step: 262880 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:33:32,725-Speed 2496.61 samples/sec Loss 3.0796 LearningRate 0.000576 Epoch: 12 Global Step: 262890 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:33:40,943-Speed 2492.45 samples/sec Loss 3.0658 LearningRate 0.000576 Epoch: 12 Global Step: 262900 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:33:49,145-Speed 2497.45 samples/sec Loss 3.1310 LearningRate 0.000576 Epoch: 12 Global Step: 262910 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:33:57,345-Speed 2497.95 samples/sec Loss 2.9998 LearningRate 0.000576 Epoch: 12 Global Step: 262920 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:34:05,514-Speed 2507.44 samples/sec Loss 3.1055 LearningRate 0.000576 Epoch: 12 Global Step: 262930 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:34:13,718-Speed 2496.81 samples/sec Loss 3.0782 LearningRate 0.000576 Epoch: 12 
Global Step: 262940 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:34:21,919-Speed 2497.99 samples/sec Loss 3.0330 LearningRate 0.000576 Epoch: 12 Global Step: 262950 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:34:30,127-Speed 2495.43 samples/sec Loss 3.0560 LearningRate 0.000576 Epoch: 12 Global Step: 262960 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:34:38,329-Speed 2497.47 samples/sec Loss 3.0433 LearningRate 0.000576 Epoch: 12 Global Step: 262970 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:34:46,531-Speed 2497.35 samples/sec Loss 3.0176 LearningRate 0.000576 Epoch: 12 Global Step: 262980 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:34:54,680-Speed 2513.72 samples/sec Loss 3.0967 LearningRate 0.000576 Epoch: 12 Global Step: 262990 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:35:02,882-Speed 2497.26 samples/sec Loss 3.0674 LearningRate 0.000576 Epoch: 12 Global Step: 263000 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:35:11,082-Speed 2497.92 samples/sec Loss 2.9968 LearningRate 0.000576 Epoch: 12 Global Step: 263010 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:35:19,287-Speed 2496.30 samples/sec Loss 3.0032 LearningRate 0.000576 Epoch: 12 Global Step: 263020 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:35:27,486-Speed 2498.59 samples/sec Loss 2.9919 LearningRate 0.000576 Epoch: 12 Global Step: 263030 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:35:35,687-Speed 2497.40 samples/sec Loss 2.9389 LearningRate 0.000576 Epoch: 12 Global Step: 263040 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:35:43,843-Speed 2511.70 samples/sec Loss 3.0620 LearningRate 0.000576 Epoch: 12 Global Step: 263050 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:35:52,060-Speed 2492.56 samples/sec Loss 2.9980 LearningRate 0.000576 
Epoch: 12 Global Step: 263060 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:36:00,263-Speed 2497.16 samples/sec Loss 3.0038 LearningRate 0.000576 Epoch: 12 Global Step: 263070 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:36:08,480-Speed 2492.91 samples/sec Loss 3.0467 LearningRate 0.000576 Epoch: 12 Global Step: 263080 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:36:16,681-Speed 2497.67 samples/sec Loss 3.0500 LearningRate 0.000576 Epoch: 12 Global Step: 263090 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:36:24,890-Speed 2495.23 samples/sec Loss 3.0131 LearningRate 0.000576 Epoch: 12 Global Step: 263100 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:36:33,041-Speed 2512.96 samples/sec Loss 2.9972 LearningRate 0.000576 Epoch: 12 Global Step: 263110 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:36:41,245-Speed 2496.51 samples/sec Loss 3.0236 LearningRate 0.000576 Epoch: 12 Global Step: 263120 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:36:49,453-Speed 2495.61 samples/sec Loss 2.9951 LearningRate 0.000576 Epoch: 12 Global Step: 263130 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:36:57,657-Speed 2496.81 samples/sec Loss 3.1067 LearningRate 0.000576 Epoch: 12 Global Step: 263140 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:37:05,861-Speed 2497.01 samples/sec Loss 3.1225 LearningRate 0.000576 Epoch: 12 Global Step: 263150 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:37:14,072-Speed 2494.67 samples/sec Loss 3.0253 LearningRate 0.000576 Epoch: 12 Global Step: 263160 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:37:22,226-Speed 2512.14 samples/sec Loss 3.0650 LearningRate 0.000576 Epoch: 12 Global Step: 263170 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:37:30,431-Speed 2496.49 samples/sec Loss 3.0513 LearningRate 
0.000575 Epoch: 12 Global Step: 263180 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:37:38,638-Speed 2495.86 samples/sec Loss 3.0630 LearningRate 0.000575 Epoch: 12 Global Step: 263190 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:37:46,840-Speed 2497.09 samples/sec Loss 3.0770 LearningRate 0.000575 Epoch: 12 Global Step: 263200 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:37:55,044-Speed 2496.60 samples/sec Loss 3.0625 LearningRate 0.000575 Epoch: 12 Global Step: 263210 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:38:03,250-Speed 2496.39 samples/sec Loss 3.0237 LearningRate 0.000575 Epoch: 12 Global Step: 263220 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:38:11,401-Speed 2513.49 samples/sec Loss 3.0392 LearningRate 0.000575 Epoch: 12 Global Step: 263230 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:38:19,602-Speed 2497.41 samples/sec Loss 3.0035 LearningRate 0.000575 Epoch: 12 Global Step: 263240 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:38:27,805-Speed 2497.17 samples/sec Loss 3.0513 LearningRate 0.000575 Epoch: 12 Global Step: 263250 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:38:36,010-Speed 2496.43 samples/sec Loss 3.0553 LearningRate 0.000575 Epoch: 12 Global Step: 263260 Fp16 Grad Scale: 32768 Required: 130 hours Training: 2022-07-08 02:38:44,209-Speed 2498.08 samples/sec Loss 3.0098 LearningRate 0.000575 Epoch: 12 Global Step: 263270 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:38:52,411-Speed 2497.76 samples/sec Loss 3.0430 LearningRate 0.000575 Epoch: 12 Global Step: 263280 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:39:00,563-Speed 2512.76 samples/sec Loss 3.0217 LearningRate 0.000575 Epoch: 12 Global Step: 263290 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:39:08,769-Speed 2496.10 samples/sec Loss 3.0694 
LearningRate 0.000575 Epoch: 12 Global Step: 263300 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:39:16,973-Speed 2496.58 samples/sec Loss 3.0660 LearningRate 0.000575 Epoch: 12 Global Step: 263310 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:39:25,186-Speed 2494.08 samples/sec Loss 3.1027 LearningRate 0.000575 Epoch: 12 Global Step: 263320 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:39:33,389-Speed 2497.19 samples/sec Loss 3.0136 LearningRate 0.000575 Epoch: 12 Global Step: 263330 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:39:41,591-Speed 2497.24 samples/sec Loss 3.0189 LearningRate 0.000575 Epoch: 12 Global Step: 263340 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:39:49,738-Speed 2514.10 samples/sec Loss 3.0700 LearningRate 0.000575 Epoch: 12 Global Step: 263350 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:39:57,943-Speed 2496.69 samples/sec Loss 3.0590 LearningRate 0.000575 Epoch: 12 Global Step: 263360 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:40:06,145-Speed 2497.25 samples/sec Loss 3.0795 LearningRate 0.000575 Epoch: 12 Global Step: 263370 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:40:14,351-Speed 2496.35 samples/sec Loss 3.0711 LearningRate 0.000575 Epoch: 12 Global Step: 263380 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:40:22,551-Speed 2498.62 samples/sec Loss 3.0407 LearningRate 0.000575 Epoch: 12 Global Step: 263390 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:40:30,753-Speed 2497.46 samples/sec Loss 3.0273 LearningRate 0.000575 Epoch: 12 Global Step: 263400 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:40:38,905-Speed 2512.59 samples/sec Loss 3.1022 LearningRate 0.000575 Epoch: 12 Global Step: 263410 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:40:47,114-Speed 2495.21 samples/sec Loss 
3.0678 LearningRate 0.000575 Epoch: 12 Global Step: 263420 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:40:55,319-Speed 2496.73 samples/sec Loss 3.0534 LearningRate 0.000575 Epoch: 12 Global Step: 263430 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:41:03,518-Speed 2498.26 samples/sec Loss 3.0206 LearningRate 0.000575 Epoch: 12 Global Step: 263440 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:41:11,721-Speed 2497.03 samples/sec Loss 3.0679 LearningRate 0.000575 Epoch: 12 Global Step: 263450 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:41:19,924-Speed 2496.96 samples/sec Loss 3.0221 LearningRate 0.000575 Epoch: 12 Global Step: 263460 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:41:28,072-Speed 2514.10 samples/sec Loss 3.0750 LearningRate 0.000575 Epoch: 12 Global Step: 263470 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:41:36,271-Speed 2498.11 samples/sec Loss 2.9916 LearningRate 0.000575 Epoch: 12 Global Step: 263480 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:41:44,474-Speed 2497.11 samples/sec Loss 3.0787 LearningRate 0.000575 Epoch: 12 Global Step: 263490 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:41:52,681-Speed 2495.93 samples/sec Loss 3.1118 LearningRate 0.000575 Epoch: 12 Global Step: 263500 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:42:00,884-Speed 2497.15 samples/sec Loss 3.0849 LearningRate 0.000575 Epoch: 12 Global Step: 263510 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:42:09,088-Speed 2496.63 samples/sec Loss 3.2033 LearningRate 0.000575 Epoch: 12 Global Step: 263520 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:42:17,236-Speed 2513.81 samples/sec Loss 3.0970 LearningRate 0.000575 Epoch: 12 Global Step: 263530 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:42:25,438-Speed 2497.55 samples/sec 
Loss 3.0973 LearningRate 0.000575 Epoch: 12 Global Step: 263540 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:42:33,641-Speed 2496.92 samples/sec Loss 3.1439 LearningRate 0.000575 Epoch: 12 Global Step: 263550 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:42:41,843-Speed 2497.27 samples/sec Loss 3.0738 LearningRate 0.000575 Epoch: 12 Global Step: 263560 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:42:50,048-Speed 2496.67 samples/sec Loss 3.0738 LearningRate 0.000575 Epoch: 12 Global Step: 263570 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:42:58,251-Speed 2497.03 samples/sec Loss 3.0740 LearningRate 0.000575 Epoch: 12 Global Step: 263580 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:43:06,400-Speed 2513.55 samples/sec Loss 3.0917 LearningRate 0.000575 Epoch: 12 Global Step: 263590 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:43:14,604-Speed 2496.92 samples/sec Loss 3.0922 LearningRate 0.000575 Epoch: 12 Global Step: 263600 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:43:22,809-Speed 2496.63 samples/sec Loss 3.0678 LearningRate 0.000575 Epoch: 12 Global Step: 263610 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:43:31,027-Speed 2492.65 samples/sec Loss 3.0342 LearningRate 0.000575 Epoch: 12 Global Step: 263620 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:43:39,231-Speed 2496.74 samples/sec Loss 2.9978 LearningRate 0.000575 Epoch: 12 Global Step: 263630 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:43:47,440-Speed 2495.33 samples/sec Loss 3.0048 LearningRate 0.000575 Epoch: 12 Global Step: 263640 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:43:55,593-Speed 2512.43 samples/sec Loss 3.0389 LearningRate 0.000575 Epoch: 12 Global Step: 263650 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:44:03,796-Speed 2497.00 
samples/sec Loss 2.9963 LearningRate 0.000575 Epoch: 12 Global Step: 263660 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 02:44:11,955-Speed 2510.36 samples/sec Loss 3.0406 LearningRate 0.000574 Epoch: 12 Global Step: 263670 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:44:20,186-Speed 2488.58 samples/sec Loss 3.0575 LearningRate 0.000574 Epoch: 12 Global Step: 263680 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:44:28,388-Speed 2497.41 samples/sec Loss 3.0376 LearningRate 0.000574 Epoch: 12 Global Step: 263690 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:44:36,588-Speed 2497.79 samples/sec Loss 3.1349 LearningRate 0.000574 Epoch: 12 Global Step: 263700 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:44:44,743-Speed 2511.84 samples/sec Loss 3.0195 LearningRate 0.000574 Epoch: 12 Global Step: 263710 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:44:52,946-Speed 2496.88 samples/sec Loss 3.1239 LearningRate 0.000574 Epoch: 12 Global Step: 263720 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:45:01,146-Speed 2497.87 samples/sec Loss 3.0929 LearningRate 0.000574 Epoch: 12 Global Step: 263730 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:45:09,363-Speed 2492.74 samples/sec Loss 2.9667 LearningRate 0.000574 Epoch: 12 Global Step: 263740 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:45:17,563-Speed 2497.82 samples/sec Loss 3.0696 LearningRate 0.000574 Epoch: 12 Global Step: 263750 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:45:25,767-Speed 2496.88 samples/sec Loss 3.0458 LearningRate 0.000574 Epoch: 12 Global Step: 263760 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:45:33,927-Speed 2510.18 samples/sec Loss 3.0606 LearningRate 0.000574 Epoch: 12 Global Step: 263770 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:45:42,127-Speed 
2497.78 samples/sec Loss 3.0601 LearningRate 0.000574 Epoch: 12 Global Step: 263780 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:45:50,328-Speed 2497.63 samples/sec Loss 3.0120 LearningRate 0.000574 Epoch: 12 Global Step: 263790 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:45:58,537-Speed 2495.07 samples/sec Loss 3.0670 LearningRate 0.000574 Epoch: 12 Global Step: 263800 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:46:06,767-Speed 2488.95 samples/sec Loss 3.0736 LearningRate 0.000574 Epoch: 12 Global Step: 263810 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:46:14,973-Speed 2496.22 samples/sec Loss 3.0076 LearningRate 0.000574 Epoch: 12 Global Step: 263820 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:46:23,122-Speed 2513.61 samples/sec Loss 3.0883 LearningRate 0.000574 Epoch: 12 Global Step: 263830 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:46:31,324-Speed 2497.25 samples/sec Loss 3.0979 LearningRate 0.000574 Epoch: 12 Global Step: 263840 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:46:39,527-Speed 2497.31 samples/sec Loss 3.0586 LearningRate 0.000574 Epoch: 12 Global Step: 263850 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:46:47,743-Speed 2492.83 samples/sec Loss 3.0456 LearningRate 0.000574 Epoch: 12 Global Step: 263860 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:46:55,948-Speed 2496.52 samples/sec Loss 3.0786 LearningRate 0.000574 Epoch: 12 Global Step: 263870 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:47:04,150-Speed 2497.22 samples/sec Loss 3.0768 LearningRate 0.000574 Epoch: 12 Global Step: 263880 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:47:12,299-Speed 2513.42 samples/sec Loss 3.1059 LearningRate 0.000574 Epoch: 12 Global Step: 263890 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 
02:47:20,503-Speed 2496.89 samples/sec Loss 3.0893 LearningRate 0.000574 Epoch: 12 Global Step: 263900 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:47:28,712-Speed 2495.04 samples/sec Loss 3.0370 LearningRate 0.000574 Epoch: 12 Global Step: 263910 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:47:36,917-Speed 2496.51 samples/sec Loss 3.0709 LearningRate 0.000574 Epoch: 12 Global Step: 263920 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:47:45,120-Speed 2496.82 samples/sec Loss 3.0994 LearningRate 0.000574 Epoch: 12 Global Step: 263930 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:47:53,322-Speed 2497.68 samples/sec Loss 3.0470 LearningRate 0.000574 Epoch: 12 Global Step: 263940 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:48:01,471-Speed 2513.77 samples/sec Loss 3.0430 LearningRate 0.000574 Epoch: 12 Global Step: 263950 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:48:09,674-Speed 2496.87 samples/sec Loss 3.0632 LearningRate 0.000574 Epoch: 12 Global Step: 263960 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:48:17,876-Speed 2497.11 samples/sec Loss 3.0432 LearningRate 0.000574 Epoch: 12 Global Step: 263970 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:48:26,079-Speed 2497.01 samples/sec Loss 2.9828 LearningRate 0.000574 Epoch: 12 Global Step: 263980 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:48:34,282-Speed 2497.04 samples/sec Loss 3.0135 LearningRate 0.000574 Epoch: 12 Global Step: 263990 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:48:42,487-Speed 2496.91 samples/sec Loss 3.0347 LearningRate 0.000574 Epoch: 12 Global Step: 264000 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:48:50,635-Speed 2513.68 samples/sec Loss 3.0161 LearningRate 0.000574 Epoch: 12 Global Step: 264010 Fp16 Grad Scale: 32768 Required: 129 hours Training: 
2022-07-08 02:48:58,837-Speed 2497.47 samples/sec Loss 3.0231 LearningRate 0.000574 Epoch: 12 Global Step: 264020 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:49:07,045-Speed 2495.41 samples/sec Loss 3.0101 LearningRate 0.000574 Epoch: 12 Global Step: 264030 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:49:15,249-Speed 2496.98 samples/sec Loss 2.9857 LearningRate 0.000574 Epoch: 12 Global Step: 264040 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:49:23,450-Speed 2497.61 samples/sec Loss 3.0381 LearningRate 0.000574 Epoch: 12 Global Step: 264050 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:49:31,656-Speed 2496.31 samples/sec Loss 3.0390 LearningRate 0.000574 Epoch: 12 Global Step: 264060 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:49:39,806-Speed 2513.30 samples/sec Loss 3.0609 LearningRate 0.000574 Epoch: 12 Global Step: 264070 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:49:48,012-Speed 2496.04 samples/sec Loss 3.0374 LearningRate 0.000574 Epoch: 12 Global Step: 264080 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:49:56,217-Speed 2496.75 samples/sec Loss 3.0400 LearningRate 0.000574 Epoch: 12 Global Step: 264090 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:50:04,421-Speed 2496.50 samples/sec Loss 3.0776 LearningRate 0.000574 Epoch: 12 Global Step: 264100 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:50:12,631-Speed 2494.91 samples/sec Loss 3.0281 LearningRate 0.000574 Epoch: 12 Global Step: 264110 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:50:20,834-Speed 2497.33 samples/sec Loss 3.0777 LearningRate 0.000574 Epoch: 12 Global Step: 264120 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:50:28,986-Speed 2512.76 samples/sec Loss 2.9838 LearningRate 0.000574 Epoch: 12 Global Step: 264130 Fp16 Grad Scale: 32768 Required: 129 hours 
Training: 2022-07-08 02:50:37,189-Speed 2496.89 samples/sec Loss 2.9796 LearningRate 0.000574 Epoch: 12 Global Step: 264140 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:50:45,405-Speed 2493.24 samples/sec Loss 3.0679 LearningRate 0.000574 Epoch: 12 Global Step: 264150 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:50:53,605-Speed 2497.79 samples/sec Loss 3.0007 LearningRate 0.000573 Epoch: 12 Global Step: 264160 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:51:01,809-Speed 2496.86 samples/sec Loss 3.0365 LearningRate 0.000573 Epoch: 12 Global Step: 264170 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:51:10,011-Speed 2497.29 samples/sec Loss 3.0862 LearningRate 0.000573 Epoch: 12 Global Step: 264180 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:51:18,162-Speed 2513.14 samples/sec Loss 3.0314 LearningRate 0.000573 Epoch: 12 Global Step: 264190 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:51:26,361-Speed 2498.27 samples/sec Loss 3.0461 LearningRate 0.000573 Epoch: 12 Global Step: 264200 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:51:34,565-Speed 2496.50 samples/sec Loss 3.0227 LearningRate 0.000573 Epoch: 12 Global Step: 264210 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:51:42,772-Speed 2496.08 samples/sec Loss 3.0471 LearningRate 0.000573 Epoch: 12 Global Step: 264220 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:51:50,976-Speed 2496.93 samples/sec Loss 3.0833 LearningRate 0.000573 Epoch: 12 Global Step: 264230 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:51:59,178-Speed 2497.05 samples/sec Loss 3.0516 LearningRate 0.000573 Epoch: 12 Global Step: 264240 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:52:07,331-Speed 2512.42 samples/sec Loss 3.0778 LearningRate 0.000573 Epoch: 12 Global Step: 264250 Fp16 Grad Scale: 32768 Required: 129 
hours Training: 2022-07-08 02:52:15,532-Speed 2497.90 samples/sec Loss 3.0627 LearningRate 0.000573 Epoch: 12 Global Step: 264260 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:52:23,733-Speed 2497.46 samples/sec Loss 3.0049 LearningRate 0.000573 Epoch: 12 Global Step: 264270 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:52:31,937-Speed 2496.80 samples/sec Loss 3.0054 LearningRate 0.000573 Epoch: 12 Global Step: 264280 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:52:40,160-Speed 2491.04 samples/sec Loss 2.9804 LearningRate 0.000573 Epoch: 12 Global Step: 264290 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:52:48,369-Speed 2495.45 samples/sec Loss 2.9851 LearningRate 0.000573 Epoch: 12 Global Step: 264300 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:52:56,518-Speed 2513.70 samples/sec Loss 3.0414 LearningRate 0.000573 Epoch: 12 Global Step: 264310 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:53:04,722-Speed 2496.59 samples/sec Loss 3.0337 LearningRate 0.000573 Epoch: 12 Global Step: 264320 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:53:12,928-Speed 2495.95 samples/sec Loss 3.0764 LearningRate 0.000573 Epoch: 12 Global Step: 264330 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:53:21,137-Speed 2495.42 samples/sec Loss 3.0427 LearningRate 0.000573 Epoch: 12 Global Step: 264340 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:53:29,343-Speed 2496.04 samples/sec Loss 3.0032 LearningRate 0.000573 Epoch: 12 Global Step: 264350 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:53:37,546-Speed 2497.29 samples/sec Loss 3.0487 LearningRate 0.000573 Epoch: 12 Global Step: 264360 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:53:45,707-Speed 2510.32 samples/sec Loss 3.0258 LearningRate 0.000573 Epoch: 12 Global Step: 264370 Fp16 Grad Scale: 32768 Required: 
129 hours Training: 2022-07-08 02:53:53,907-Speed 2497.70 samples/sec Loss 2.9570 LearningRate 0.000573 Epoch: 12 Global Step: 264380 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:54:02,110-Speed 2497.12 samples/sec Loss 3.0076 LearningRate 0.000573 Epoch: 12 Global Step: 264390 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:54:10,311-Speed 2497.64 samples/sec Loss 3.0748 LearningRate 0.000573 Epoch: 12 Global Step: 264400 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:54:18,511-Speed 2498.24 samples/sec Loss 2.9953 LearningRate 0.000573 Epoch: 12 Global Step: 264410 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:54:26,713-Speed 2497.38 samples/sec Loss 3.0209 LearningRate 0.000573 Epoch: 12 Global Step: 264420 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:54:34,864-Speed 2512.92 samples/sec Loss 3.0020 LearningRate 0.000573 Epoch: 12 Global Step: 264430 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:54:43,066-Speed 2497.23 samples/sec Loss 3.0879 LearningRate 0.000573 Epoch: 12 Global Step: 264440 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:54:51,269-Speed 2496.96 samples/sec Loss 3.0369 LearningRate 0.000573 Epoch: 12 Global Step: 264450 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:54:59,474-Speed 2496.48 samples/sec Loss 3.0629 LearningRate 0.000573 Epoch: 12 Global Step: 264460 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:55:07,676-Speed 2497.38 samples/sec Loss 3.0310 LearningRate 0.000573 Epoch: 12 Global Step: 264470 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:55:15,879-Speed 2496.83 samples/sec Loss 3.0149 LearningRate 0.000573 Epoch: 12 Global Step: 264480 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:55:24,035-Speed 2511.55 samples/sec Loss 3.0579 LearningRate 0.000573 Epoch: 12 Global Step: 264490 Fp16 Grad Scale: 32768 
Required: 129 hours Training: 2022-07-08 02:55:32,235-Speed 2497.78 samples/sec Loss 3.0551 LearningRate 0.000573 Epoch: 12 Global Step: 264500 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:55:40,440-Speed 2496.53 samples/sec Loss 2.9948 LearningRate 0.000573 Epoch: 12 Global Step: 264510 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:55:48,642-Speed 2497.46 samples/sec Loss 3.0204 LearningRate 0.000573 Epoch: 12 Global Step: 264520 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:55:56,844-Speed 2497.25 samples/sec Loss 2.9475 LearningRate 0.000573 Epoch: 12 Global Step: 264530 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:56:05,055-Speed 2494.47 samples/sec Loss 3.0183 LearningRate 0.000573 Epoch: 12 Global Step: 264540 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:56:13,206-Speed 2513.06 samples/sec Loss 3.0545 LearningRate 0.000573 Epoch: 12 Global Step: 264550 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:56:21,410-Speed 2496.60 samples/sec Loss 3.0587 LearningRate 0.000573 Epoch: 12 Global Step: 264560 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:56:29,616-Speed 2496.39 samples/sec Loss 3.0263 LearningRate 0.000573 Epoch: 12 Global Step: 264570 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:56:37,818-Speed 2497.29 samples/sec Loss 3.0273 LearningRate 0.000573 Epoch: 12 Global Step: 264580 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:56:46,023-Speed 2496.46 samples/sec Loss 3.0061 LearningRate 0.000573 Epoch: 12 Global Step: 264590 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:56:54,237-Speed 2493.77 samples/sec Loss 3.0531 LearningRate 0.000573 Epoch: 12 Global Step: 264600 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:57:02,406-Speed 2507.49 samples/sec Loss 2.9794 LearningRate 0.000573 Epoch: 12 Global Step: 264610 Fp16 Grad Scale: 
32768 Required: 129 hours Training: 2022-07-08 02:57:10,608-Speed 2497.31 samples/sec Loss 3.0399 LearningRate 0.000573 Epoch: 12 Global Step: 264620 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:57:18,805-Speed 2498.66 samples/sec Loss 3.0498 LearningRate 0.000573 Epoch: 12 Global Step: 264630 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:57:27,002-Speed 2498.89 samples/sec Loss 3.0795 LearningRate 0.000573 Epoch: 12 Global Step: 264640 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:57:35,203-Speed 2497.64 samples/sec Loss 3.0512 LearningRate 0.000573 Epoch: 12 Global Step: 264650 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:57:43,404-Speed 2497.56 samples/sec Loss 3.0784 LearningRate 0.000572 Epoch: 12 Global Step: 264660 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:57:51,558-Speed 2512.08 samples/sec Loss 3.0604 LearningRate 0.000572 Epoch: 12 Global Step: 264670 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:57:59,761-Speed 2497.18 samples/sec Loss 3.0361 LearningRate 0.000572 Epoch: 12 Global Step: 264680 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:58:07,982-Speed 2491.63 samples/sec Loss 3.0877 LearningRate 0.000572 Epoch: 12 Global Step: 264690 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:58:16,184-Speed 2497.24 samples/sec Loss 3.0902 LearningRate 0.000572 Epoch: 12 Global Step: 264700 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:58:24,385-Speed 2497.80 samples/sec Loss 3.0245 LearningRate 0.000572 Epoch: 12 Global Step: 264710 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:58:32,586-Speed 2497.58 samples/sec Loss 3.0856 LearningRate 0.000572 Epoch: 12 Global Step: 264720 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:58:40,741-Speed 2511.74 samples/sec Loss 3.0495 LearningRate 0.000572 Epoch: 12 Global Step: 264730 Fp16 Grad 
Scale: 32768 Required: 129 hours Training: 2022-07-08 02:58:48,939-Speed 2498.53 samples/sec Loss 3.0599 LearningRate 0.000572 Epoch: 12 Global Step: 264740 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:58:57,143-Speed 2496.66 samples/sec Loss 3.1055 LearningRate 0.000572 Epoch: 12 Global Step: 264750 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:59:05,345-Speed 2497.73 samples/sec Loss 3.0356 LearningRate 0.000572 Epoch: 12 Global Step: 264760 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:59:13,549-Speed 2496.78 samples/sec Loss 3.0271 LearningRate 0.000572 Epoch: 12 Global Step: 264770 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:59:21,747-Speed 2498.84 samples/sec Loss 3.0773 LearningRate 0.000572 Epoch: 12 Global Step: 264780 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:59:29,893-Speed 2514.79 samples/sec Loss 3.0865 LearningRate 0.000572 Epoch: 12 Global Step: 264790 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:59:38,092-Speed 2498.42 samples/sec Loss 3.0416 LearningRate 0.000572 Epoch: 12 Global Step: 264800 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:59:46,298-Speed 2496.23 samples/sec Loss 3.0414 LearningRate 0.000572 Epoch: 12 Global Step: 264810 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 02:59:54,497-Speed 2498.09 samples/sec Loss 3.1163 LearningRate 0.000572 Epoch: 12 Global Step: 264820 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:00:02,693-Speed 2499.24 samples/sec Loss 3.0377 LearningRate 0.000572 Epoch: 12 Global Step: 264830 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:00:10,899-Speed 2496.50 samples/sec Loss 3.0697 LearningRate 0.000572 Epoch: 12 Global Step: 264840 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:00:19,042-Speed 2515.20 samples/sec Loss 3.0270 LearningRate 0.000572 Epoch: 12 Global Step: 264850 Fp16 
Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:00:27,241-Speed 2498.52 samples/sec Loss 3.0473 LearningRate 0.000572 Epoch: 12 Global Step: 264860 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:00:35,439-Speed 2498.38 samples/sec Loss 3.0652 LearningRate 0.000572 Epoch: 12 Global Step: 264870 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:00:43,639-Speed 2498.03 samples/sec Loss 3.1006 LearningRate 0.000572 Epoch: 12 Global Step: 264880 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:00:51,836-Speed 2498.93 samples/sec Loss 3.0409 LearningRate 0.000572 Epoch: 12 Global Step: 264890 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:01:00,033-Speed 2499.01 samples/sec Loss 3.0447 LearningRate 0.000572 Epoch: 12 Global Step: 264900 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:01:08,179-Speed 2514.24 samples/sec Loss 3.0529 LearningRate 0.000572 Epoch: 12 Global Step: 264910 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:01:16,380-Speed 2497.93 samples/sec Loss 3.0428 LearningRate 0.000572 Epoch: 12 Global Step: 264920 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:01:24,584-Speed 2496.71 samples/sec Loss 3.0108 LearningRate 0.000572 Epoch: 12 Global Step: 264930 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:01:32,782-Speed 2498.30 samples/sec Loss 3.0291 LearningRate 0.000572 Epoch: 12 Global Step: 264940 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:01:40,982-Speed 2498.14 samples/sec Loss 3.0314 LearningRate 0.000572 Epoch: 12 Global Step: 264950 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:01:49,181-Speed 2498.17 samples/sec Loss 3.0639 LearningRate 0.000572 Epoch: 12 Global Step: 264960 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:01:57,335-Speed 2512.04 samples/sec Loss 3.0387 LearningRate 0.000572 Epoch: 12 Global Step: 264970 
Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:02:05,531-Speed 2499.33 samples/sec Loss 3.0463 LearningRate 0.000572 Epoch: 12 Global Step: 264980 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:02:13,730-Speed 2498.40 samples/sec Loss 3.0395 LearningRate 0.000572 Epoch: 12 Global Step: 264990 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:02:21,931-Speed 2497.81 samples/sec Loss 2.9803 LearningRate 0.000572 Epoch: 12 Global Step: 265000 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:02:30,130-Speed 2498.09 samples/sec Loss 3.0091 LearningRate 0.000572 Epoch: 12 Global Step: 265010 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:02:38,329-Speed 2498.31 samples/sec Loss 3.0182 LearningRate 0.000572 Epoch: 12 Global Step: 265020 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:02:46,476-Speed 2514.26 samples/sec Loss 3.0575 LearningRate 0.000572 Epoch: 12 Global Step: 265030 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:02:54,672-Speed 2499.18 samples/sec Loss 3.0199 LearningRate 0.000572 Epoch: 12 Global Step: 265040 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:03:02,873-Speed 2497.77 samples/sec Loss 3.0661 LearningRate 0.000572 Epoch: 12 Global Step: 265050 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:03:11,071-Speed 2498.54 samples/sec Loss 3.0572 LearningRate 0.000572 Epoch: 12 Global Step: 265060 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:03:19,267-Speed 2499.21 samples/sec Loss 3.0213 LearningRate 0.000572 Epoch: 12 Global Step: 265070 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:03:27,469-Speed 2497.24 samples/sec Loss 2.9946 LearningRate 0.000572 Epoch: 12 Global Step: 265080 Fp16 Grad Scale: 65536 Required: 129 hours Training: 2022-07-08 03:03:35,615-Speed 2514.85 samples/sec Loss 3.0038 LearningRate 0.000572 Epoch: 12 Global Step: 
265090 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:03:43,812-Speed 2498.85 samples/sec Loss 3.0353 LearningRate 0.000572 Epoch: 12 Global Step: 265100 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:03:52,009-Speed 2499.17 samples/sec Loss 2.9865 LearningRate 0.000572 Epoch: 12 Global Step: 265110 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:04:00,208-Speed 2498.01 samples/sec Loss 2.9365 LearningRate 0.000572 Epoch: 12 Global Step: 265120 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:04:08,410-Speed 2497.57 samples/sec Loss 3.0035 LearningRate 0.000572 Epoch: 12 Global Step: 265130 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:04:16,614-Speed 2496.79 samples/sec Loss 2.9658 LearningRate 0.000572 Epoch: 12 Global Step: 265140 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:04:24,760-Speed 2514.87 samples/sec Loss 2.9924 LearningRate 0.000571 Epoch: 12 Global Step: 265150 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:04:32,958-Speed 2498.52 samples/sec Loss 3.0759 LearningRate 0.000571 Epoch: 12 Global Step: 265160 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:04:41,153-Speed 2499.38 samples/sec Loss 3.0465 LearningRate 0.000571 Epoch: 12 Global Step: 265170 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:04:49,354-Speed 2497.81 samples/sec Loss 2.9865 LearningRate 0.000571 Epoch: 12 Global Step: 265180 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:04:57,552-Speed 2498.56 samples/sec Loss 3.1110 LearningRate 0.000571 Epoch: 12 Global Step: 265190 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:05:05,716-Speed 2508.86 samples/sec Loss 3.1115 LearningRate 0.000571 Epoch: 12 Global Step: 265200 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:05:13,866-Speed 2513.09 samples/sec Loss 3.0255 LearningRate 0.000571 Epoch: 12 Global Step: 265210 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:05:22,063-Speed 2498.94 samples/sec Loss 3.0177 LearningRate 0.000571 Epoch: 12 Global Step: 265220 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:05:30,280-Speed 2492.85 samples/sec Loss 3.0069 LearningRate 0.000571 Epoch: 12 Global Step: 265230 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:05:38,480-Speed 2497.82 samples/sec Loss 3.0530 LearningRate 0.000571 Epoch: 12 Global Step: 265240 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:05:46,699-Speed 2492.56 samples/sec Loss 3.0075 LearningRate 0.000571 Epoch: 12 Global Step: 265250 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:05:54,905-Speed 2496.01 samples/sec Loss 3.0407 LearningRate 0.000571 Epoch: 12 Global Step: 265260 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:06:03,052-Speed 2514.26 samples/sec Loss 2.9415 LearningRate 0.000571 Epoch: 12 Global Step: 265270 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:06:11,249-Speed 2498.85 samples/sec Loss 3.0370 LearningRate 0.000571 Epoch: 12 Global Step: 265280 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:06:19,450-Speed 2497.58 samples/sec Loss 3.0078 LearningRate 0.000571 Epoch: 12 Global Step: 265290 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:06:27,648-Speed 2498.71 samples/sec Loss 2.9462 LearningRate 0.000571 Epoch: 12 Global Step: 265300 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:06:35,847-Speed 2499.62 samples/sec Loss 2.9814 LearningRate 0.000571 Epoch: 12 Global Step: 265310 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:06:44,054-Speed 2496.06 samples/sec Loss 2.9787 LearningRate 0.000571 Epoch: 12 Global Step: 265320 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:06:52,205-Speed 2512.97 samples/sec Loss 2.9887 LearningRate 0.000571 Epoch: 12 Global Step: 265330 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:07:00,404-Speed 2498.42 samples/sec Loss 2.9745 LearningRate 0.000571 Epoch: 12 Global Step: 265340 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:07:08,606-Speed 2497.38 samples/sec Loss 3.0012 LearningRate 0.000571 Epoch: 12 Global Step: 265350 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:07:16,804-Speed 2498.69 samples/sec Loss 3.0500 LearningRate 0.000571 Epoch: 12 Global Step: 265360 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:07:25,003-Speed 2498.39 samples/sec Loss 3.0690 LearningRate 0.000571 Epoch: 12 Global Step: 265370 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:07:33,202-Speed 2498.00 samples/sec Loss 3.0430 LearningRate 0.000571 Epoch: 12 Global Step: 265380 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:07:41,353-Speed 2513.46 samples/sec Loss 3.0605 LearningRate 0.000571 Epoch: 12 Global Step: 265390 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:07:49,559-Speed 2496.22 samples/sec Loss 3.0255 LearningRate 0.000571 Epoch: 12 Global Step: 265400 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:07:57,776-Speed 2492.57 samples/sec Loss 3.0090 LearningRate 0.000571 Epoch: 12 Global Step: 265410 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:08:05,980-Speed 2497.23 samples/sec Loss 3.0260 LearningRate 0.000571 Epoch: 12 Global Step: 265420 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:08:14,180-Speed 2497.82 samples/sec Loss 2.9781 LearningRate 0.000571 Epoch: 12 Global Step: 265430 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:08:22,378-Speed 2498.69 samples/sec Loss 2.9405 LearningRate 0.000571 Epoch: 12 Global Step: 265440 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:08:30,535-Speed 2510.97 samples/sec Loss 3.0151 LearningRate 0.000571 Epoch: 12 Global Step: 265450 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:08:38,732-Speed 2499.11 samples/sec Loss 3.0042 LearningRate 0.000571 Epoch: 12 Global Step: 265460 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:08:46,930-Speed 2498.53 samples/sec Loss 3.0825 LearningRate 0.000571 Epoch: 12 Global Step: 265470 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:08:55,129-Speed 2498.04 samples/sec Loss 2.9965 LearningRate 0.000571 Epoch: 12 Global Step: 265480 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:09:03,326-Speed 2499.03 samples/sec Loss 3.0750 LearningRate 0.000571 Epoch: 12 Global Step: 265490 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:09:11,543-Speed 2492.77 samples/sec Loss 3.1002 LearningRate 0.000571 Epoch: 12 Global Step: 265500 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:09:19,701-Speed 2510.64 samples/sec Loss 2.9923 LearningRate 0.000571 Epoch: 12 Global Step: 265510 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:09:27,899-Speed 2498.61 samples/sec Loss 3.0128 LearningRate 0.000571 Epoch: 12 Global Step: 265520 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:09:36,112-Speed 2494.16 samples/sec Loss 2.9972 LearningRate 0.000571 Epoch: 12 Global Step: 265530 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:09:44,312-Speed 2497.92 samples/sec Loss 3.1000 LearningRate 0.000571 Epoch: 12 Global Step: 265540 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:09:52,512-Speed 2498.01 samples/sec Loss 3.0725 LearningRate 0.000571 Epoch: 12 Global Step: 265550 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:10:00,717-Speed 2496.52 samples/sec Loss 3.0180 LearningRate 0.000571 Epoch: 12 Global Step: 265560 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:10:08,869-Speed 2512.61 samples/sec Loss 3.0552 LearningRate 0.000571 Epoch: 12 Global Step: 265570 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:10:17,070-Speed 2497.58 samples/sec Loss 3.0277 LearningRate 0.000571 Epoch: 12 Global Step: 265580 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:10:25,305-Speed 2487.07 samples/sec Loss 2.9959 LearningRate 0.000571 Epoch: 12 Global Step: 265590 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:10:33,504-Speed 2498.41 samples/sec Loss 3.0892 LearningRate 0.000571 Epoch: 12 Global Step: 265600 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:10:41,707-Speed 2497.24 samples/sec Loss 3.0628 LearningRate 0.000571 Epoch: 12 Global Step: 265610 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:10:49,910-Speed 2496.76 samples/sec Loss 3.0142 LearningRate 0.000571 Epoch: 12 Global Step: 265620 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:10:58,056-Speed 2514.72 samples/sec Loss 2.9980 LearningRate 0.000571 Epoch: 12 Global Step: 265630 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:11:06,254-Speed 2498.47 samples/sec Loss 3.0176 LearningRate 0.000571 Epoch: 12 Global Step: 265640 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:11:14,458-Speed 2496.69 samples/sec Loss 3.0607 LearningRate 0.000570 Epoch: 12 Global Step: 265650 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:11:22,661-Speed 2497.27 samples/sec Loss 3.1089 LearningRate 0.000570 Epoch: 12 Global Step: 265660 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:11:30,860-Speed 2498.31 samples/sec Loss 3.0428 LearningRate 0.000570 Epoch: 12 Global Step: 265670 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:11:39,060-Speed 2497.92 samples/sec Loss 3.0356 LearningRate 0.000570 Epoch: 12 Global Step: 265680 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:11:47,206-Speed 2514.29 samples/sec Loss 3.0659 LearningRate 0.000570 Epoch: 12 Global Step: 265690 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:11:55,403-Speed 2499.06 samples/sec Loss 3.0401 LearningRate 0.000570 Epoch: 12 Global Step: 265700 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:12:03,604-Speed 2497.64 samples/sec Loss 3.0735 LearningRate 0.000570 Epoch: 12 Global Step: 265710 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:12:11,806-Speed 2497.38 samples/sec Loss 3.0585 LearningRate 0.000570 Epoch: 12 Global Step: 265720 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:12:20,008-Speed 2497.22 samples/sec Loss 3.1070 LearningRate 0.000570 Epoch: 12 Global Step: 265730 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:12:28,216-Speed 2495.57 samples/sec Loss 3.0231 LearningRate 0.000570 Epoch: 12 Global Step: 265740 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:12:36,366-Speed 2513.32 samples/sec Loss 3.0191 LearningRate 0.000570 Epoch: 12 Global Step: 265750 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:12:44,571-Speed 2496.69 samples/sec Loss 3.0033 LearningRate 0.000570 Epoch: 12 Global Step: 265760 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:12:52,777-Speed 2496.42 samples/sec Loss 3.0626 LearningRate 0.000570 Epoch: 12 Global Step: 265770 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:13:00,987-Speed 2495.15 samples/sec Loss 3.0864 LearningRate 0.000570 Epoch: 12 Global Step: 265780 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:13:09,220-Speed 2487.96 samples/sec Loss 3.0198 LearningRate 0.000570 Epoch: 12 Global Step: 265790 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:13:17,430-Speed 2495.00 samples/sec Loss 3.0566 LearningRate 0.000570 Epoch: 12 Global Step: 265800 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:13:25,579-Speed 2513.41 samples/sec Loss 3.0135 LearningRate 0.000570 Epoch: 12 Global Step: 265810 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:13:33,783-Speed 2496.92 samples/sec Loss 3.0779 LearningRate 0.000570 Epoch: 12 Global Step: 265820 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:13:41,988-Speed 2496.25 samples/sec Loss 3.0800 LearningRate 0.000570 Epoch: 12 Global Step: 265830 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:13:50,186-Speed 2498.72 samples/sec Loss 3.0715 LearningRate 0.000570 Epoch: 12 Global Step: 265840 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:13:58,407-Speed 2491.48 samples/sec Loss 3.0381 LearningRate 0.000570 Epoch: 12 Global Step: 265850 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:14:06,613-Speed 2496.24 samples/sec Loss 3.0881 LearningRate 0.000570 Epoch: 12 Global Step: 265860 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:14:14,761-Speed 2513.81 samples/sec Loss 3.0079 LearningRate 0.000570 Epoch: 12 Global Step: 265870 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:14:22,967-Speed 2496.15 samples/sec Loss 3.0279 LearningRate 0.000570 Epoch: 12 Global Step: 265880 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:14:31,168-Speed 2497.59 samples/sec Loss 2.9784 LearningRate 0.000570 Epoch: 12 Global Step: 265890 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:14:39,369-Speed 2497.66 samples/sec Loss 3.0684 LearningRate 0.000570 Epoch: 12 Global Step: 265900 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:14:47,572-Speed 2497.06 samples/sec Loss 3.0268 LearningRate 0.000570 Epoch: 12 Global Step: 265910 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:14:55,770-Speed 2498.73 samples/sec Loss 2.9850 LearningRate 0.000570 Epoch: 12 Global Step: 265920 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:15:03,918-Speed 2513.64 samples/sec Loss 3.0374 LearningRate 0.000570 Epoch: 12 Global Step: 265930 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:15:12,126-Speed 2495.50 samples/sec Loss 3.0050 LearningRate 0.000570 Epoch: 12 Global Step: 265940 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:15:20,336-Speed 2494.94 samples/sec Loss 2.9916 LearningRate 0.000570 Epoch: 12 Global Step: 265950 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:15:28,541-Speed 2496.35 samples/sec Loss 3.0430 LearningRate 0.000570 Epoch: 12 Global Step: 265960 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:15:36,753-Speed 2494.17 samples/sec Loss 2.9967 LearningRate 0.000570 Epoch: 12 Global Step: 265970 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:15:44,968-Speed 2493.43 samples/sec Loss 2.9804 LearningRate 0.000570 Epoch: 12 Global Step: 265980 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:15:53,122-Speed 2512.08 samples/sec Loss 3.0245 LearningRate 0.000570 Epoch: 12 Global Step: 265990 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:16:01,319-Speed 2499.18 samples/sec Loss 2.9530 LearningRate 0.000570 Epoch: 12 Global Step: 266000 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:16:09,530-Speed 2494.56 samples/sec Loss 3.0207 LearningRate 0.000570 Epoch: 12 Global Step: 266010 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:16:17,730-Speed 2498.17 samples/sec Loss 3.0053 LearningRate 0.000570 Epoch: 12 Global Step: 266020 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:16:25,930-Speed 2497.87 samples/sec Loss 2.9765 LearningRate 0.000570 Epoch: 12 Global Step: 266030 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:16:34,133-Speed 2496.86 samples/sec Loss 2.9572 LearningRate 0.000570 Epoch: 12 Global Step: 266040 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:16:42,282-Speed 2513.77 samples/sec Loss 3.0690 LearningRate 0.000570 Epoch: 12 Global Step: 266050 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:16:50,486-Speed 2496.74 samples/sec Loss 2.9526 LearningRate 0.000570 Epoch: 12 Global Step: 266060 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:16:58,684-Speed 2498.42 samples/sec Loss 3.0076 LearningRate 0.000570 Epoch: 12 Global Step: 266070 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:17:06,881-Speed 2498.87 samples/sec Loss 3.0300 LearningRate 0.000570 Epoch: 12 Global Step: 266080 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:17:15,080-Speed 2498.26 samples/sec Loss 3.0570 LearningRate 0.000570 Epoch: 12 Global Step: 266090 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:17:23,278-Speed 2498.57 samples/sec Loss 3.0160 LearningRate 0.000570 Epoch: 12 Global Step: 266100 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:17:31,425-Speed 2514.34 samples/sec Loss 3.1001 LearningRate 0.000570 Epoch: 12 Global Step: 266110 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:17:39,625-Speed 2498.16 samples/sec Loss 3.0570 LearningRate 0.000570 Epoch: 12 Global Step: 266120 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:17:47,822-Speed 2499.01 samples/sec Loss 3.0880 LearningRate 0.000570 Epoch: 12 Global Step: 266130 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:17:56,024-Speed 2497.38 samples/sec Loss 3.0116 LearningRate 0.000569 Epoch: 12 Global Step: 266140 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:18:04,221-Speed 2498.79 samples/sec Loss 3.0273 LearningRate 0.000569 Epoch: 12 Global Step: 266150 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:18:12,423-Speed 2497.47 samples/sec Loss 3.0708 LearningRate 0.000569 Epoch: 12 Global Step: 266160 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:18:20,571-Speed 2513.87 samples/sec Loss 3.0239 LearningRate 0.000569 Epoch: 12 Global Step: 266170 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:18:28,767-Speed 2499.16 samples/sec Loss 3.0108 LearningRate 0.000569 Epoch: 12 Global Step: 266180 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:18:36,969-Speed 2497.41 samples/sec Loss 3.0243 LearningRate 0.000569 Epoch: 12 Global Step: 266190 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:18:45,169-Speed 2498.03 samples/sec Loss 3.0091 LearningRate 0.000569 Epoch: 12 Global Step: 266200 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:18:53,369-Speed 2497.93 samples/sec Loss 3.0157 LearningRate 0.000569 Epoch: 12 Global Step: 266210 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:19:01,568-Speed 2498.32 samples/sec Loss 2.9637 LearningRate 0.000569 Epoch: 12 Global Step: 266220 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:19:09,713-Speed 2514.62 samples/sec Loss 2.9593 LearningRate 0.000569 Epoch: 12 Global Step: 266230 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:19:17,910-Speed 2499.01 samples/sec Loss 2.9592 LearningRate 0.000569 Epoch: 12 Global Step: 266240 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:19:26,114-Speed 2496.77 samples/sec Loss 3.0137 LearningRate 0.000569 Epoch: 12 Global Step: 266250 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:19:34,313-Speed 2498.30 samples/sec Loss 2.9848 LearningRate 0.000569 Epoch: 12 Global Step: 266260 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:19:42,511-Speed 2498.62 samples/sec Loss 3.0335 LearningRate 0.000569 Epoch: 12 Global Step: 266270 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:19:50,707-Speed 2499.25 samples/sec Loss 3.0913 LearningRate 0.000569 Epoch: 12 Global Step: 266280 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:19:58,863-Speed 2512.03 samples/sec Loss 3.0634 LearningRate 0.000569 Epoch: 12 Global Step: 266290 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:20:07,063-Speed 2498.00 samples/sec Loss 3.0137 LearningRate 0.000569 Epoch: 12 Global Step: 266300 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:20:15,261-Speed 2498.65 samples/sec Loss 3.0170 LearningRate 0.000569 Epoch: 12 Global Step: 266310 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:20:23,474-Speed 2493.92 samples/sec Loss 3.0698 LearningRate 0.000569 Epoch: 12 Global Step: 266320 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:20:31,680-Speed 2496.30 samples/sec Loss 3.0299 LearningRate 0.000569 Epoch: 12 Global Step: 266330 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:20:39,890-Speed 2494.87 samples/sec Loss 3.1473 LearningRate 0.000569 Epoch: 12 Global Step: 266340 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:20:48,038-Speed 2513.96 samples/sec Loss 3.0394 LearningRate 0.000569 Epoch: 12 Global Step: 266350 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:20:56,268-Speed 2499.99 samples/sec Loss 3.0892 LearningRate 0.000569 Epoch: 12 Global Step: 266360 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:21:04,490-Speed 2491.10 samples/sec Loss 3.0705 LearningRate 0.000569 Epoch: 12 Global Step: 266370 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:21:13,156-Speed 2496.27 samples/sec Loss 3.1021 LearningRate 0.000569 Epoch: 12 Global Step: 266380 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:21:21,402-Speed 2499.49 samples/sec Loss 3.1291 LearningRate 0.000569 Epoch: 12 Global Step: 266390 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:21:29,662-Speed 2500.59 samples/sec Loss 3.1202 LearningRate 0.000569 Epoch: 12 Global Step: 266400 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:21:43,680-Speed 2513.93 samples/sec Loss 3.0384 LearningRate 0.000569 Epoch: 12 Global Step: 266410 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:21:52,071-Speed 2497.93 samples/sec Loss 3.0806 LearningRate 0.000569 Epoch: 12 Global Step: 266420 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:22:03,489-Speed 1800.56 samples/sec Loss 3.0210 LearningRate 0.000569 Epoch: 12 Global Step: 266430 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:22:11,773-Speed 2495.98 samples/sec Loss 3.0562 LearningRate 0.000569 Epoch: 12 Global Step: 266440 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:22:19,987-Speed 2493.53 samples/sec Loss 3.0280 LearningRate 0.000569 Epoch: 12 Global Step: 266450 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:22:33,046-Speed 2492.72 samples/sec Loss 3.0543 LearningRate 0.000569 Epoch: 12 Global Step: 266460 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:22:42,980-Speed 2509.07 samples/sec Loss 3.1514 LearningRate 0.000569 Epoch: 12 Global Step: 266470 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:22:51,198-Speed 2492.47 samples/sec Loss 3.0662 LearningRate 0.000569 Epoch: 12 Global Step: 266480 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:23:04,379-Speed 1563.29 samples/sec Loss 3.0237 LearningRate 0.000569 Epoch: 12 Global Step: 266490 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:23:12,641-Speed 2498.74 samples/sec Loss 3.0270 LearningRate 0.000569 Epoch: 12 Global Step: 266500 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:23:23,874-Speed 1823.34 samples/sec Loss 2.9908 LearningRate 0.000569 Epoch: 12 Global Step: 266510 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:23:32,072-Speed 2499.52 samples/sec Loss 3.0546 LearningRate 0.000569 Epoch: 12 Global Step: 266520 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:23:40,254-Speed 2517.54 samples/sec Loss 3.1668 LearningRate 0.000569 Epoch: 12 Global Step: 266530 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:23:48,540-Speed 2471.95 samples/sec Loss 3.0363 LearningRate 0.000569 Epoch: 12 Global Step: 266540 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:24:01,828-Speed 1541.44 samples/sec Loss 3.0017 LearningRate 0.000569 Epoch: 12 Global Step: 266550 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:24:10,016-Speed 2502.61 samples/sec Loss 3.1086 LearningRate 0.000569 Epoch: 12 Global Step: 266560 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:24:18,273-Speed 2501.02 samples/sec Loss 3.0346 LearningRate 0.000569 Epoch: 12 Global Step: 266570 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:24:29,642-Speed 1801.54 samples/sec Loss 3.0132 LearningRate 0.000569 Epoch: 12 Global Step: 266580 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:24:37,805-Speed 2519.12 samples/sec Loss 3.0395 LearningRate 0.000569 Epoch: 12 Global Step: 266590 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:24:47,240-Speed 2195.64 samples/sec Loss 3.0129 LearningRate 0.000569 Epoch: 12 Global Step: 266600 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:24:55,781-Speed 2499.18 samples/sec Loss 3.0415 LearningRate 0.000569 Epoch: 12 Global Step: 266610 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:25:04,090-Speed 2495.58 samples/sec Loss 2.9999 LearningRate 0.000569 Epoch: 12 Global Step: 266620 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:25:13,204-Speed 2498.29 samples/sec Loss 2.9681 LearningRate 0.000568 Epoch: 12 Global Step: 266630 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:25:21,405-Speed 2497.44 samples/sec Loss 3.0467 LearningRate 0.000568 Epoch: 12 Global Step: 266640 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:25:29,552-Speed 2514.08 samples/sec Loss 3.0455 LearningRate 0.000568 Epoch: 12 Global Step: 266650 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:25:42,637-Speed 1592.37 samples/sec Loss 3.0316 LearningRate 0.000568 Epoch: 12 Global Step: 266660 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:25:50,880-Speed 2502.34 samples/sec Loss 3.0406 LearningRate 0.000568 Epoch: 12 Global Step: 266670 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:25:59,095-Speed 2493.41 samples/sec Loss 3.0186 LearningRate 0.000568 Epoch: 12 Global Step: 266680 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:26:10,993-Speed 2499.91 samples/sec Loss 3.0051 LearningRate 0.000568 Epoch: 12 Global Step: 266690 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:26:19,265-Speed 2499.10 samples/sec Loss 2.9890 LearningRate 0.000568 Epoch: 12 Global Step: 266700 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:26:30,985-Speed 1747.61 samples/sec Loss 3.0220 LearningRate 0.000568 Epoch: 12 Global Step: 266710 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:26:39,214-Speed 2500.33 samples/sec Loss 3.0431 LearningRate 0.000568 Epoch: 12 Global Step: 266720 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:26:52,535-Speed 1537.56 samples/sec Loss 2.9994 LearningRate 0.000568 Epoch: 12 Global Step: 266730 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:27:01,627-Speed 2496.21 samples/sec Loss 3.0025 LearningRate 0.000568 Epoch: 12 Global Step: 266740 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:27:09,889-Speed 2498.02 samples/sec Loss 3.0615 LearningRate 0.000568 Epoch: 12 Global Step: 266750 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:27:18,082-Speed 2499.73 samples/sec Loss 3.0084 LearningRate 0.000568 Epoch: 12 Global Step: 266760 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:27:31,099-Speed 2517.40 samples/sec Loss 3.0309 LearningRate 0.000568 Epoch: 12 Global Step: 266770 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:27:39,330-Speed 2501.12 samples/sec Loss 3.0218 LearningRate 0.000568 Epoch: 12 Global Step: 266780 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:27:47,529-Speed 2498.25 samples/sec Loss 3.0129 LearningRate 0.000568 Epoch: 12 Global Step: 266790 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:28:01,949-Speed 2495.44 samples/sec Loss 2.9908 LearningRate 0.000568 Epoch: 12 Global Step: 266800 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:28:10,174-Speed 2498.16 samples/sec Loss 2.9955 LearningRate 0.000568 Epoch: 12 Global Step: 266810 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:28:18,409-Speed 2497.44 samples/sec Loss 2.9698 LearningRate 0.000568 Epoch: 12 Global Step: 266820 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:28:28,736-Speed 1983.28 samples/sec Loss 3.0081 LearningRate 0.000568 Epoch: 12 Global Step: 266830 Fp16 Grad Scale: 65536 Required: 129 hours
Training: 2022-07-08 03:28:36,981-Speed 2499.05 samples/sec Loss 3.0061 LearningRate 0.000568 Epoch: 12 Global Step: 266840 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:28:45,309-Speed 2491.30 samples/sec Loss 2.9559 LearningRate 0.000568 Epoch: 12 Global Step: 266850 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:28:59,144-Speed 1480.37 samples/sec Loss 2.9857 LearningRate 0.000568 Epoch: 12 Global Step: 266860 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:29:07,727-Speed 2497.77 samples/sec Loss 3.0221 LearningRate 0.000568 Epoch: 12 Global Step: 266870 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:29:15,944-Speed 2499.63 samples/sec Loss 3.0017 LearningRate 0.000568 Epoch: 12 Global Step: 266880 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:29:24,115-Speed 2515.71 samples/sec Loss 2.9925 LearningRate 0.000568 Epoch: 12 Global Step: 266890 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:29:32,332-Speed 2492.86 samples/sec Loss 2.9626 LearningRate 0.000568 Epoch: 12 Global Step: 266900 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:29:40,531-Speed 2498.24 samples/sec Loss 2.9994 LearningRate 0.000568 Epoch: 12 Global Step: 266910 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:29:48,731-Speed 2498.39 samples/sec Loss 2.9999 LearningRate 0.000568 Epoch: 12 Global Step: 266920 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:29:56,931-Speed 2497.97 samples/sec Loss 3.0396 LearningRate 0.000568 Epoch: 12 Global Step: 266930 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:30:05,133-Speed 2497.16 samples/sec Loss 3.0633 LearningRate 0.000568 Epoch: 12 Global Step: 266940 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:30:13,281-Speed 2514.53 samples/sec Loss 3.0306 LearningRate 0.000568 Epoch: 12 Global Step: 266950 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:30:21,482-Speed 2498.83 samples/sec Loss 2.9954 LearningRate 0.000568 Epoch: 12 Global Step: 266960 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:30:29,688-Speed 2496.01 samples/sec Loss 2.9860 LearningRate 0.000568 Epoch: 12 Global Step: 266970 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:30:37,902-Speed 2493.97 samples/sec Loss 3.0292 LearningRate 0.000568 Epoch: 12 Global Step: 266980 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:30:46,103-Speed 2497.47 samples/sec Loss 2.9806 LearningRate 0.000568 Epoch: 12 Global Step: 266990 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:30:54,313-Speed 2495.18 samples/sec Loss 2.9045 LearningRate 0.000568 Epoch: 12 Global Step: 267000 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:31:02,465-Speed 2512.66 samples/sec Loss 3.0141 LearningRate 0.000568 Epoch: 12 Global Step: 267010 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:31:10,667-Speed 2497.34 samples/sec Loss 2.9699 LearningRate 0.000568 Epoch: 12 Global Step: 267020 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:31:18,873-Speed 2495.99 samples/sec Loss 3.0300 LearningRate 0.000568 Epoch: 12 Global Step: 267030 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:31:27,075-Speed 2497.57 samples/sec Loss 2.9821 LearningRate 0.000568 Epoch: 12 Global Step: 267040 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:31:35,281-Speed 2495.84 samples/sec Loss 2.9951 LearningRate 0.000568 Epoch: 12 Global Step: 267050 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:31:43,487-Speed 2496.10 samples/sec Loss 2.9891 LearningRate 0.000568 Epoch: 12 Global Step: 267060 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:31:51,645-Speed 2510.84 samples/sec Loss 2.9824 LearningRate 0.000568 Epoch: 12 Global Step: 267070 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:31:59,847-Speed 2497.34 samples/sec Loss 3.0510 LearningRate 0.000568 Epoch: 12 Global Step: 267080 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:32:08,048-Speed 2497.70 samples/sec Loss 3.0249 LearningRate 0.000568 Epoch: 12 Global Step: 267090 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:32:16,248-Speed 2497.70 samples/sec Loss 2.9948 LearningRate 0.000568 Epoch: 12 Global Step: 267100 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:32:24,456-Speed 2496.05 samples/sec Loss 3.0186 LearningRate 0.000568 Epoch: 12 Global Step: 267110 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:32:32,658-Speed 2498.05 samples/sec Loss 3.0521 LearningRate 0.000568 Epoch: 12 Global Step: 267120 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:32:40,820-Speed 2509.61 samples/sec Loss 2.9684 LearningRate 0.000567 Epoch: 12 Global Step: 267130 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:32:49,021-Speed 2497.90 samples/sec Loss 2.9804 LearningRate 0.000567 Epoch: 12 Global Step: 267140 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:32:57,236-Speed 2493.32 samples/sec Loss 3.0257 LearningRate 0.000567 Epoch: 12 Global Step: 267150 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:33:05,438-Speed 2497.18 samples/sec Loss 2.9886 LearningRate 0.000567 Epoch: 12 Global Step: 267160 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:33:13,638-Speed 2498.21 samples/sec Loss 3.0415 LearningRate 0.000567 Epoch: 12 Global Step: 267170 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:33:21,840-Speed 2497.07 samples/sec Loss 3.0764 LearningRate 0.000567 Epoch: 12 Global Step: 267180 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:33:29,987-Speed 2514.52 samples/sec Loss 2.9946 LearningRate 0.000567 Epoch: 12 Global Step: 267190 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:33:38,188-Speed 2497.89 samples/sec Loss 3.0788 LearningRate 0.000567 Epoch: 12 Global Step: 267200 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:33:46,391-Speed 2497.09 samples/sec Loss 3.0545 LearningRate 0.000567 Epoch: 12 Global Step: 267210 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:33:54,596-Speed 2496.33 samples/sec Loss 3.1004 LearningRate 0.000567 Epoch: 12 Global Step: 267220 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:34:02,796-Speed 2498.19 samples/sec Loss 3.1451 LearningRate 0.000567 Epoch: 12 Global Step: 267230 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:34:11,001-Speed 2496.46 samples/sec Loss 3.1367 LearningRate 0.000567 Epoch: 12 Global Step: 267240 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:34:19,154-Speed 2512.22 samples/sec Loss 3.2035 LearningRate 0.000567 Epoch: 12 Global Step: 267250 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:34:27,354-Speed 2497.69 samples/sec Loss 3.1359 LearningRate 0.000567 Epoch: 12 Global Step: 267260 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:34:35,574-Speed 2491.82 samples/sec Loss 3.1581 LearningRate 0.000567 Epoch: 12 Global Step: 267270 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:34:43,774-Speed 2498.38 samples/sec Loss 3.1076 LearningRate 0.000567 Epoch: 12 Global Step: 267280 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:34:51,976-Speed 2497.51 samples/sec Loss 3.1285 LearningRate 0.000567 Epoch: 12 Global Step: 267290 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:35:00,183-Speed 2495.71 samples/sec Loss 3.0908 LearningRate 0.000567 Epoch: 12 Global Step: 267300 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:35:08,336-Speed 2512.49 samples/sec Loss 3.0934 LearningRate 0.000567 Epoch: 12 Global Step: 267310 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:35:16,539-Speed 2497.16 samples/sec Loss 3.0672 LearningRate 0.000567 Epoch: 12 Global Step: 267320 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:35:24,738-Speed 2498.45 samples/sec Loss 3.0216 LearningRate 0.000567 Epoch: 12 Global Step: 267330 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:35:32,943-Speed 2496.67 samples/sec Loss 3.0586 LearningRate 0.000567 Epoch: 12 Global Step: 267340 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:35:41,145-Speed 2497.30 samples/sec Loss 2.9943 LearningRate 0.000567 Epoch: 12 Global Step: 267350 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:35:49,347-Speed 2497.28 samples/sec Loss 2.9743 LearningRate 0.000567 Epoch: 12 Global Step: 267360 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:35:57,496-Speed 2513.48 samples/sec Loss 2.9915 LearningRate 0.000567 Epoch: 12 Global Step: 267370 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:36:05,696-Speed 2498.39 samples/sec Loss 3.0430 LearningRate 0.000567 Epoch: 12 Global Step: 267380 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:36:13,896-Speed 2498.05 samples/sec Loss 3.0820 LearningRate 0.000567 Epoch: 12 Global Step: 267390 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:36:22,103-Speed 2495.86 samples/sec Loss 2.9877 LearningRate 0.000567 Epoch: 12 Global Step: 267400 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:36:30,301-Speed 2498.30 samples/sec Loss 2.9765 LearningRate 0.000567 Epoch: 12 Global Step: 267410 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:36:38,500-Speed 2498.43 samples/sec Loss 3.0001 LearningRate 0.000567 Epoch: 12 Global Step: 267420 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:36:46,667-Speed 2508.08 samples/sec Loss 3.0285 LearningRate 0.000567 Epoch: 12 Global Step: 267430 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:36:54,874-Speed 2495.85 samples/sec Loss 3.0286 LearningRate 0.000567 Epoch: 12 Global Step: 267440 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:37:03,079-Speed 2496.48 samples/sec Loss 3.0139 LearningRate 0.000567 Epoch: 12 Global Step: 267450 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:37:11,280-Speed 2497.70 samples/sec Loss 3.0213 LearningRate 0.000567 Epoch: 12 Global Step: 267460 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:37:19,478-Speed 2498.47 samples/sec Loss 3.0119 LearningRate 0.000567 Epoch: 12 Global Step: 267470 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:37:27,682-Speed 2496.87 samples/sec Loss 2.9261 LearningRate 0.000567 Epoch: 12 Global Step: 267480 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:37:35,831-Speed 2513.67 samples/sec Loss 2.9700 LearningRate 0.000567 Epoch: 12 Global Step: 267490 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:37:44,037-Speed 2495.91 samples/sec Loss 2.9781 LearningRate 0.000567 Epoch: 12 Global Step: 267500 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:37:52,253-Speed 2493.15 samples/sec Loss 3.0499 LearningRate 0.000567 Epoch: 12 Global Step: 267510 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:38:00,466-Speed 2494.17 samples/sec Loss 3.0466 LearningRate 0.000567 Epoch: 12 Global Step: 267520 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:38:08,667-Speed 2497.69 samples/sec Loss 3.0394 LearningRate 0.000567 Epoch: 12 Global Step: 267530 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:38:16,867-Speed 2498.24 samples/sec Loss 3.0422 LearningRate 0.000567 Epoch: 12 Global Step: 267540 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:38:25,015-Speed 2514.20 samples/sec Loss 3.0218 LearningRate 0.000567 Epoch: 12 Global Step: 267550 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:38:33,217-Speed 2497.09 samples/sec Loss 3.0768 LearningRate 0.000567 Epoch: 12 Global Step: 267560 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:38:41,431-Speed 2493.93 samples/sec Loss 3.0170 LearningRate 0.000567 Epoch: 12 Global Step: 267570 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:38:49,635-Speed 2496.73 samples/sec Loss 2.9869 LearningRate 0.000567 Epoch: 12 Global Step: 267580 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:38:57,833-Speed 2498.34 samples/sec Loss 3.0710 LearningRate 0.000567 Epoch: 12 Global Step: 267590 Fp16 Grad Scale: 32768 Required: 129 hours
Training: 2022-07-08 03:39:06,033-Speed 2498.10 samples/sec Loss 3.0661 LearningRate 0.000567 Epoch: 12 Global 
Step: 267600 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:39:14,181-Speed 2513.92 samples/sec Loss 3.0215 LearningRate 0.000567 Epoch: 12 Global Step: 267610 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:39:22,379-Speed 2498.70 samples/sec Loss 3.1046 LearningRate 0.000567 Epoch: 12 Global Step: 267620 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:39:30,579-Speed 2498.21 samples/sec Loss 3.0458 LearningRate 0.000566 Epoch: 12 Global Step: 267630 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:39:38,779-Speed 2497.65 samples/sec Loss 3.0526 LearningRate 0.000566 Epoch: 12 Global Step: 267640 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:39:46,977-Speed 2498.82 samples/sec Loss 3.0175 LearningRate 0.000566 Epoch: 12 Global Step: 267650 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:39:55,178-Speed 2497.66 samples/sec Loss 2.9860 LearningRate 0.000566 Epoch: 12 Global Step: 267660 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:40:03,326-Speed 2514.11 samples/sec Loss 2.9966 LearningRate 0.000566 Epoch: 12 Global Step: 267670 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:40:11,529-Speed 2497.04 samples/sec Loss 3.0337 LearningRate 0.000566 Epoch: 12 Global Step: 267680 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:40:19,728-Speed 2498.22 samples/sec Loss 2.9779 LearningRate 0.000566 Epoch: 12 Global Step: 267690 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:40:27,927-Speed 2498.42 samples/sec Loss 3.0216 LearningRate 0.000566 Epoch: 12 Global Step: 267700 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:40:36,128-Speed 2497.64 samples/sec Loss 3.0044 LearningRate 0.000566 Epoch: 12 Global Step: 267710 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:40:44,359-Speed 2488.44 samples/sec Loss 2.9789 LearningRate 0.000566 Epoch: 12 
Global Step: 267720 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:40:52,505-Speed 2514.66 samples/sec Loss 3.0242 LearningRate 0.000566 Epoch: 12 Global Step: 267730 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:41:00,706-Speed 2497.58 samples/sec Loss 3.0036 LearningRate 0.000566 Epoch: 12 Global Step: 267740 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:41:08,914-Speed 2495.54 samples/sec Loss 2.9851 LearningRate 0.000566 Epoch: 12 Global Step: 267750 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:41:17,119-Speed 2496.64 samples/sec Loss 2.9703 LearningRate 0.000566 Epoch: 12 Global Step: 267760 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:41:25,326-Speed 2495.67 samples/sec Loss 3.0268 LearningRate 0.000566 Epoch: 12 Global Step: 267770 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:41:33,528-Speed 2497.56 samples/sec Loss 3.0147 LearningRate 0.000566 Epoch: 12 Global Step: 267780 Fp16 Grad Scale: 32768 Required: 129 hours Training: 2022-07-08 03:41:41,672-Speed 2515.01 samples/sec Loss 2.9808 LearningRate 0.000566 Epoch: 12 Global Step: 267790 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:41:49,874-Speed 2497.38 samples/sec Loss 3.0268 LearningRate 0.000566 Epoch: 12 Global Step: 267800 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:41:58,108-Speed 2487.67 samples/sec Loss 3.0259 LearningRate 0.000566 Epoch: 12 Global Step: 267810 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:42:06,305-Speed 2498.99 samples/sec Loss 3.0269 LearningRate 0.000566 Epoch: 12 Global Step: 267820 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:42:14,518-Speed 2494.11 samples/sec Loss 2.9928 LearningRate 0.000566 Epoch: 12 Global Step: 267830 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:42:22,721-Speed 2496.95 samples/sec Loss 3.0478 LearningRate 0.000566 
Epoch: 12 Global Step: 267840 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:42:30,867-Speed 2514.56 samples/sec Loss 2.9467 LearningRate 0.000566 Epoch: 12 Global Step: 267850 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:42:39,070-Speed 2497.17 samples/sec Loss 2.9550 LearningRate 0.000566 Epoch: 12 Global Step: 267860 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:42:47,280-Speed 2494.68 samples/sec Loss 3.0268 LearningRate 0.000566 Epoch: 12 Global Step: 267870 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:42:55,486-Speed 2496.17 samples/sec Loss 3.0577 LearningRate 0.000566 Epoch: 12 Global Step: 267880 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:43:03,686-Speed 2497.82 samples/sec Loss 2.9748 LearningRate 0.000566 Epoch: 12 Global Step: 267890 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:43:11,892-Speed 2496.03 samples/sec Loss 3.0310 LearningRate 0.000566 Epoch: 12 Global Step: 267900 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:43:20,042-Speed 2513.37 samples/sec Loss 2.9745 LearningRate 0.000566 Epoch: 12 Global Step: 267910 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:43:28,242-Speed 2498.12 samples/sec Loss 3.0169 LearningRate 0.000566 Epoch: 12 Global Step: 267920 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:43:36,444-Speed 2497.55 samples/sec Loss 2.9985 LearningRate 0.000566 Epoch: 12 Global Step: 267930 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:43:44,648-Speed 2496.51 samples/sec Loss 2.9919 LearningRate 0.000566 Epoch: 12 Global Step: 267940 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:43:52,849-Speed 2497.74 samples/sec Loss 3.0123 LearningRate 0.000566 Epoch: 12 Global Step: 267950 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:44:01,049-Speed 2498.14 samples/sec Loss 3.0013 LearningRate 
0.000566 Epoch: 12 Global Step: 267960 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:44:09,199-Speed 2513.13 samples/sec Loss 2.9640 LearningRate 0.000566 Epoch: 12 Global Step: 267970 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:44:17,401-Speed 2497.23 samples/sec Loss 2.9320 LearningRate 0.000566 Epoch: 12 Global Step: 267980 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:44:25,602-Speed 2498.13 samples/sec Loss 2.9853 LearningRate 0.000566 Epoch: 12 Global Step: 267990 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:44:33,810-Speed 2495.30 samples/sec Loss 3.1096 LearningRate 0.000566 Epoch: 12 Global Step: 268000 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:44:42,012-Speed 2497.51 samples/sec Loss 3.0930 LearningRate 0.000566 Epoch: 12 Global Step: 268010 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:44:50,214-Speed 2497.42 samples/sec Loss 2.9593 LearningRate 0.000566 Epoch: 12 Global Step: 268020 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:44:58,360-Speed 2514.15 samples/sec Loss 3.0628 LearningRate 0.000566 Epoch: 12 Global Step: 268030 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:45:06,560-Speed 2498.32 samples/sec Loss 3.0252 LearningRate 0.000566 Epoch: 12 Global Step: 268040 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:45:14,766-Speed 2496.05 samples/sec Loss 3.0171 LearningRate 0.000566 Epoch: 12 Global Step: 268050 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:45:22,968-Speed 2497.33 samples/sec Loss 2.9929 LearningRate 0.000566 Epoch: 12 Global Step: 268060 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:45:31,198-Speed 2488.99 samples/sec Loss 3.0247 LearningRate 0.000566 Epoch: 12 Global Step: 268070 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:45:39,396-Speed 2498.54 samples/sec Loss 3.0940 
LearningRate 0.000566 Epoch: 12 Global Step: 268080 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:45:47,539-Speed 2515.27 samples/sec Loss 3.0181 LearningRate 0.000566 Epoch: 12 Global Step: 268090 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:45:55,738-Speed 2498.17 samples/sec Loss 3.0368 LearningRate 0.000566 Epoch: 12 Global Step: 268100 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:46:03,934-Speed 2499.90 samples/sec Loss 2.9870 LearningRate 0.000566 Epoch: 12 Global Step: 268110 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:46:12,135-Speed 2498.03 samples/sec Loss 2.9828 LearningRate 0.000565 Epoch: 12 Global Step: 268120 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:46:20,348-Speed 2493.96 samples/sec Loss 3.0100 LearningRate 0.000565 Epoch: 12 Global Step: 268130 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:46:28,553-Speed 2496.46 samples/sec Loss 3.0832 LearningRate 0.000565 Epoch: 12 Global Step: 268140 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:46:36,700-Speed 2514.40 samples/sec Loss 3.0021 LearningRate 0.000565 Epoch: 12 Global Step: 268150 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:46:44,913-Speed 2494.03 samples/sec Loss 3.0769 LearningRate 0.000565 Epoch: 12 Global Step: 268160 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:46:53,113-Speed 2497.69 samples/sec Loss 3.0389 LearningRate 0.000565 Epoch: 12 Global Step: 268170 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:47:01,317-Speed 2496.81 samples/sec Loss 3.0209 LearningRate 0.000565 Epoch: 12 Global Step: 268180 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:47:09,541-Speed 2490.77 samples/sec Loss 3.0249 LearningRate 0.000565 Epoch: 12 Global Step: 268190 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 03:47:17,702-Speed 2509.96 samples/sec Loss 
3.0408 LearningRate 0.000565 Epoch: 12 Global Step: 268200 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:47:25,849-Speed 2514.30 samples/sec Loss 3.0104 LearningRate 0.000565 Epoch: 12 Global Step: 268210 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:47:34,049-Speed 2497.94 samples/sec Loss 3.0209 LearningRate 0.000565 Epoch: 12 Global Step: 268220 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:47:42,249-Speed 2497.78 samples/sec Loss 3.0306 LearningRate 0.000565 Epoch: 12 Global Step: 268230 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:47:50,455-Speed 2496.50 samples/sec Loss 3.0663 LearningRate 0.000565 Epoch: 12 Global Step: 268240 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:47:58,668-Speed 2494.08 samples/sec Loss 2.9949 LearningRate 0.000565 Epoch: 12 Global Step: 268250 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:48:06,869-Speed 2497.61 samples/sec Loss 3.0241 LearningRate 0.000565 Epoch: 12 Global Step: 268260 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:48:15,015-Speed 2514.60 samples/sec Loss 2.9926 LearningRate 0.000565 Epoch: 12 Global Step: 268270 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:48:23,216-Speed 2497.67 samples/sec Loss 3.0708 LearningRate 0.000565 Epoch: 12 Global Step: 268280 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:48:31,415-Speed 2498.19 samples/sec Loss 2.9333 LearningRate 0.000565 Epoch: 12 Global Step: 268290 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:48:39,616-Speed 2497.72 samples/sec Loss 3.0000 LearningRate 0.000565 Epoch: 12 Global Step: 268300 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:48:47,819-Speed 2497.29 samples/sec Loss 3.0015 LearningRate 0.000565 Epoch: 12 Global Step: 268310 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:48:56,018-Speed 2498.34 samples/sec 
Loss 2.9630 LearningRate 0.000565 Epoch: 12 Global Step: 268320 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:49:04,163-Speed 2514.77 samples/sec Loss 2.9913 LearningRate 0.000565 Epoch: 12 Global Step: 268330 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:49:12,387-Speed 2490.52 samples/sec Loss 2.9602 LearningRate 0.000565 Epoch: 12 Global Step: 268340 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:49:20,590-Speed 2497.18 samples/sec Loss 3.0178 LearningRate 0.000565 Epoch: 12 Global Step: 268350 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:49:28,792-Speed 2497.25 samples/sec Loss 2.9588 LearningRate 0.000565 Epoch: 12 Global Step: 268360 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:49:36,994-Speed 2497.34 samples/sec Loss 2.9322 LearningRate 0.000565 Epoch: 12 Global Step: 268370 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:49:45,193-Speed 2498.04 samples/sec Loss 3.0163 LearningRate 0.000565 Epoch: 12 Global Step: 268380 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:49:53,340-Speed 2514.45 samples/sec Loss 2.9928 LearningRate 0.000565 Epoch: 12 Global Step: 268390 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:50:01,550-Speed 2494.72 samples/sec Loss 2.9437 LearningRate 0.000565 Epoch: 12 Global Step: 268400 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:50:09,753-Speed 2497.07 samples/sec Loss 3.0379 LearningRate 0.000565 Epoch: 12 Global Step: 268410 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:50:17,953-Speed 2497.86 samples/sec Loss 3.0615 LearningRate 0.000565 Epoch: 12 Global Step: 268420 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:50:26,152-Speed 2499.26 samples/sec Loss 3.0230 LearningRate 0.000565 Epoch: 12 Global Step: 268430 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:50:34,367-Speed 2493.13 
samples/sec Loss 3.1071 LearningRate 0.000565 Epoch: 12 Global Step: 268440 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:50:42,512-Speed 2514.70 samples/sec Loss 2.9334 LearningRate 0.000565 Epoch: 12 Global Step: 268450 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:50:50,724-Speed 2494.47 samples/sec Loss 3.0754 LearningRate 0.000565 Epoch: 12 Global Step: 268460 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:50:58,935-Speed 2494.45 samples/sec Loss 3.1133 LearningRate 0.000565 Epoch: 12 Global Step: 268470 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:51:07,134-Speed 2498.23 samples/sec Loss 3.0915 LearningRate 0.000565 Epoch: 12 Global Step: 268480 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:51:15,344-Speed 2494.93 samples/sec Loss 3.1280 LearningRate 0.000565 Epoch: 12 Global Step: 268490 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:51:23,552-Speed 2495.55 samples/sec Loss 3.0562 LearningRate 0.000565 Epoch: 12 Global Step: 268500 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:51:31,698-Speed 2514.61 samples/sec Loss 3.0534 LearningRate 0.000565 Epoch: 12 Global Step: 268510 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:51:39,894-Speed 2499.14 samples/sec Loss 3.0372 LearningRate 0.000565 Epoch: 12 Global Step: 268520 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:51:48,100-Speed 2496.42 samples/sec Loss 3.0602 LearningRate 0.000565 Epoch: 12 Global Step: 268530 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:51:56,299-Speed 2498.23 samples/sec Loss 3.0646 LearningRate 0.000565 Epoch: 12 Global Step: 268540 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:52:04,496-Speed 2498.62 samples/sec Loss 3.0437 LearningRate 0.000565 Epoch: 12 Global Step: 268550 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:52:12,692-Speed 
2499.40 samples/sec Loss 3.0381 LearningRate 0.000565 Epoch: 12 Global Step: 268560 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:52:20,834-Speed 2515.72 samples/sec Loss 3.0983 LearningRate 0.000565 Epoch: 12 Global Step: 268570 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:52:29,058-Speed 2490.78 samples/sec Loss 2.9974 LearningRate 0.000565 Epoch: 12 Global Step: 268580 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:52:37,255-Speed 2498.85 samples/sec Loss 3.0510 LearningRate 0.000565 Epoch: 12 Global Step: 268590 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:52:45,462-Speed 2495.88 samples/sec Loss 3.0456 LearningRate 0.000565 Epoch: 12 Global Step: 268600 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:52:53,662-Speed 2498.06 samples/sec Loss 3.0079 LearningRate 0.000565 Epoch: 12 Global Step: 268610 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:53:01,862-Speed 2498.21 samples/sec Loss 3.0092 LearningRate 0.000564 Epoch: 12 Global Step: 268620 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:53:10,009-Speed 2514.39 samples/sec Loss 2.9809 LearningRate 0.000564 Epoch: 12 Global Step: 268630 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:53:18,207-Speed 2498.30 samples/sec Loss 2.9945 LearningRate 0.000564 Epoch: 12 Global Step: 268640 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:53:26,402-Speed 2499.42 samples/sec Loss 3.0401 LearningRate 0.000564 Epoch: 12 Global Step: 268650 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:53:34,612-Speed 2494.78 samples/sec Loss 2.9843 LearningRate 0.000564 Epoch: 12 Global Step: 268660 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:53:42,813-Speed 2497.86 samples/sec Loss 3.0752 LearningRate 0.000564 Epoch: 12 Global Step: 268670 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 
03:53:51,008-Speed 2499.30 samples/sec Loss 2.9558 LearningRate 0.000564 Epoch: 12 Global Step: 268680 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:53:59,154-Speed 2514.32 samples/sec Loss 3.0185 LearningRate 0.000564 Epoch: 12 Global Step: 268690 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:54:07,369-Speed 2493.76 samples/sec Loss 2.9500 LearningRate 0.000564 Epoch: 12 Global Step: 268700 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:54:15,569-Speed 2497.79 samples/sec Loss 2.9787 LearningRate 0.000564 Epoch: 12 Global Step: 268710 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:54:23,769-Speed 2498.05 samples/sec Loss 3.0173 LearningRate 0.000564 Epoch: 12 Global Step: 268720 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:54:31,969-Speed 2498.26 samples/sec Loss 3.0012 LearningRate 0.000564 Epoch: 12 Global Step: 268730 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:54:40,171-Speed 2497.23 samples/sec Loss 2.9315 LearningRate 0.000564 Epoch: 12 Global Step: 268740 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:54:48,313-Speed 2515.59 samples/sec Loss 3.0028 LearningRate 0.000564 Epoch: 12 Global Step: 268750 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:54:56,510-Speed 2498.93 samples/sec Loss 3.0392 LearningRate 0.000564 Epoch: 12 Global Step: 268760 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:55:04,707-Speed 2498.60 samples/sec Loss 2.9860 LearningRate 0.000564 Epoch: 12 Global Step: 268770 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:55:12,906-Speed 2498.20 samples/sec Loss 3.0141 LearningRate 0.000564 Epoch: 12 Global Step: 268780 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:55:21,105-Speed 2498.27 samples/sec Loss 2.9900 LearningRate 0.000564 Epoch: 12 Global Step: 268790 Fp16 Grad Scale: 32768 Required: 128 hours Training: 
2022-07-08 03:55:29,307-Speed 2497.23 samples/sec Loss 3.0558 LearningRate 0.000564 Epoch: 12 Global Step: 268800 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:55:37,451-Speed 2515.19 samples/sec Loss 2.9735 LearningRate 0.000564 Epoch: 12 Global Step: 268810 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:55:45,656-Speed 2496.55 samples/sec Loss 2.9676 LearningRate 0.000564 Epoch: 12 Global Step: 268820 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:55:53,853-Speed 2498.86 samples/sec Loss 2.9713 LearningRate 0.000564 Epoch: 12 Global Step: 268830 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:56:02,059-Speed 2496.27 samples/sec Loss 2.9517 LearningRate 0.000564 Epoch: 12 Global Step: 268840 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:56:10,274-Speed 2493.45 samples/sec Loss 2.9475 LearningRate 0.000564 Epoch: 12 Global Step: 268850 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:56:18,470-Speed 2498.95 samples/sec Loss 2.9683 LearningRate 0.000564 Epoch: 12 Global Step: 268860 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:56:26,619-Speed 2513.58 samples/sec Loss 2.9854 LearningRate 0.000564 Epoch: 12 Global Step: 268870 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:56:34,819-Speed 2497.94 samples/sec Loss 2.9430 LearningRate 0.000564 Epoch: 12 Global Step: 268880 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:56:43,022-Speed 2497.19 samples/sec Loss 2.9747 LearningRate 0.000564 Epoch: 12 Global Step: 268890 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:56:51,224-Speed 2497.25 samples/sec Loss 2.9892 LearningRate 0.000564 Epoch: 12 Global Step: 268900 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:56:59,422-Speed 2498.65 samples/sec Loss 3.0031 LearningRate 0.000564 Epoch: 12 Global Step: 268910 Fp16 Grad Scale: 32768 Required: 128 hours 
Training: 2022-07-08 03:57:07,620-Speed 2498.83 samples/sec Loss 2.9723 LearningRate 0.000564 Epoch: 12 Global Step: 268920 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:57:15,769-Speed 2513.70 samples/sec Loss 3.0067 LearningRate 0.000564 Epoch: 12 Global Step: 268930 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:57:23,971-Speed 2497.41 samples/sec Loss 2.9941 LearningRate 0.000564 Epoch: 12 Global Step: 268940 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:57:32,171-Speed 2497.85 samples/sec Loss 2.9445 LearningRate 0.000564 Epoch: 12 Global Step: 268950 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:57:40,383-Speed 2494.41 samples/sec Loss 2.9913 LearningRate 0.000564 Epoch: 12 Global Step: 268960 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:57:48,582-Speed 2498.44 samples/sec Loss 3.0183 LearningRate 0.000564 Epoch: 12 Global Step: 268970 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:57:56,795-Speed 2493.86 samples/sec Loss 2.9936 LearningRate 0.000564 Epoch: 12 Global Step: 268980 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:58:04,941-Speed 2514.67 samples/sec Loss 3.0267 LearningRate 0.000564 Epoch: 12 Global Step: 268990 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:58:13,153-Speed 2494.29 samples/sec Loss 3.0052 LearningRate 0.000564 Epoch: 12 Global Step: 269000 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:58:21,350-Speed 2498.82 samples/sec Loss 3.0538 LearningRate 0.000564 Epoch: 12 Global Step: 269010 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:58:29,557-Speed 2495.88 samples/sec Loss 3.0649 LearningRate 0.000564 Epoch: 12 Global Step: 269020 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:58:37,757-Speed 2497.74 samples/sec Loss 3.0065 LearningRate 0.000564 Epoch: 12 Global Step: 269030 Fp16 Grad Scale: 32768 Required: 128 
hours Training: 2022-07-08 03:58:45,960-Speed 2497.47 samples/sec Loss 2.9813 LearningRate 0.000564 Epoch: 12 Global Step: 269040 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:58:54,107-Speed 2514.10 samples/sec Loss 2.9648 LearningRate 0.000564 Epoch: 12 Global Step: 269050 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:59:02,309-Speed 2497.41 samples/sec Loss 2.9675 LearningRate 0.000564 Epoch: 12 Global Step: 269060 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:59:10,511-Speed 2497.59 samples/sec Loss 3.0627 LearningRate 0.000564 Epoch: 12 Global Step: 269070 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:59:18,709-Speed 2498.53 samples/sec Loss 2.9401 LearningRate 0.000564 Epoch: 12 Global Step: 269080 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:59:26,905-Speed 2499.16 samples/sec Loss 2.9639 LearningRate 0.000564 Epoch: 12 Global Step: 269090 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:59:35,118-Speed 2493.90 samples/sec Loss 2.9690 LearningRate 0.000564 Epoch: 12 Global Step: 269100 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:59:43,261-Speed 2515.39 samples/sec Loss 2.9659 LearningRate 0.000564 Epoch: 12 Global Step: 269110 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:59:51,457-Speed 2499.19 samples/sec Loss 3.1396 LearningRate 0.000563 Epoch: 12 Global Step: 269120 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 03:59:59,653-Speed 2499.40 samples/sec Loss 3.0280 LearningRate 0.000563 Epoch: 12 Global Step: 269130 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:00:07,853-Speed 2497.72 samples/sec Loss 3.0073 LearningRate 0.000563 Epoch: 12 Global Step: 269140 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:00:16,056-Speed 2497.23 samples/sec Loss 3.0452 LearningRate 0.000563 Epoch: 12 Global Step: 269150 Fp16 Grad Scale: 32768 Required: 
128 hours Training: 2022-07-08 04:00:24,253-Speed 2498.91 samples/sec Loss 3.0602 LearningRate 0.000563 Epoch: 12 Global Step: 269160 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:00:32,402-Speed 2513.43 samples/sec Loss 3.0185 LearningRate 0.000563 Epoch: 12 Global Step: 269170 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:00:40,600-Speed 2498.90 samples/sec Loss 3.0437 LearningRate 0.000563 Epoch: 12 Global Step: 269180 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:00:48,800-Speed 2497.85 samples/sec Loss 3.0356 LearningRate 0.000563 Epoch: 12 Global Step: 269190 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:00:56,998-Speed 2498.64 samples/sec Loss 3.0053 LearningRate 0.000563 Epoch: 12 Global Step: 269200 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:01:05,201-Speed 2497.33 samples/sec Loss 3.0123 LearningRate 0.000563 Epoch: 12 Global Step: 269210 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:01:13,403-Speed 2497.32 samples/sec Loss 2.9949 LearningRate 0.000563 Epoch: 12 Global Step: 269220 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:01:21,550-Speed 2514.04 samples/sec Loss 2.9359 LearningRate 0.000563 Epoch: 12 Global Step: 269230 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:01:29,748-Speed 2498.66 samples/sec Loss 3.0017 LearningRate 0.000563 Epoch: 12 Global Step: 269240 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:01:37,946-Speed 2498.38 samples/sec Loss 3.0109 LearningRate 0.000563 Epoch: 12 Global Step: 269250 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:01:46,145-Speed 2498.27 samples/sec Loss 2.9964 LearningRate 0.000563 Epoch: 12 Global Step: 269260 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:01:54,344-Speed 2498.37 samples/sec Loss 3.0027 LearningRate 0.000563 Epoch: 12 Global Step: 269270 Fp16 Grad Scale: 32768 
Required: 128 hours Training: 2022-07-08 04:02:02,543-Speed 2498.24 samples/sec Loss 3.0519 LearningRate 0.000563 Epoch: 12 Global Step: 269280 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:02:10,690-Speed 2514.09 samples/sec Loss 2.9923 LearningRate 0.000563 Epoch: 12 Global Step: 269290 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:02:18,898-Speed 2495.69 samples/sec Loss 3.0524 LearningRate 0.000563 Epoch: 12 Global Step: 269300 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:02:27,096-Speed 2498.62 samples/sec Loss 3.0712 LearningRate 0.000563 Epoch: 12 Global Step: 269310 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:02:35,297-Speed 2497.44 samples/sec Loss 3.0296 LearningRate 0.000563 Epoch: 12 Global Step: 269320 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:02:43,499-Speed 2497.31 samples/sec Loss 3.0245 LearningRate 0.000563 Epoch: 12 Global Step: 269330 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:02:51,711-Speed 2494.46 samples/sec Loss 2.9888 LearningRate 0.000563 Epoch: 12 Global Step: 269340 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:02:59,854-Speed 2515.58 samples/sec Loss 3.0337 LearningRate 0.000563 Epoch: 12 Global Step: 269350 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:03:08,060-Speed 2496.19 samples/sec Loss 3.0293 LearningRate 0.000563 Epoch: 12 Global Step: 269360 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:03:16,256-Speed 2499.09 samples/sec Loss 3.0250 LearningRate 0.000563 Epoch: 12 Global Step: 269370 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:03:24,456-Speed 2497.78 samples/sec Loss 2.9945 LearningRate 0.000563 Epoch: 12 Global Step: 269380 Fp16 Grad Scale: 32768 Required: 128 hours Training: 2022-07-08 04:03:32,653-Speed 2498.89 samples/sec Loss 3.0171 LearningRate 0.000563 Epoch: 12 Global Step: 269390 Fp16 Grad Scale: 
32768 Required: 128 hours Training: 2022-07-08 04:03:40,861-Speed 2495.62 samples/sec Loss 3.0325 LearningRate 0.000563 Epoch: 12 Global Step: 269400 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:03:49,010-Speed 2513.68 samples/sec Loss 3.0602 LearningRate 0.000563 Epoch: 12 Global Step: 269410 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:03:57,214-Speed 2496.85 samples/sec Loss 3.0164 LearningRate 0.000563 Epoch: 12 Global Step: 269420 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:04:05,411-Speed 2498.64 samples/sec Loss 3.0166 LearningRate 0.000563 Epoch: 12 Global Step: 269430 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:04:13,621-Speed 2495.03 samples/sec Loss 3.0031 LearningRate 0.000563 Epoch: 12 Global Step: 269440 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:04:21,817-Speed 2499.02 samples/sec Loss 3.0404 LearningRate 0.000563 Epoch: 12 Global Step: 269450 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:04:30,020-Speed 2497.08 samples/sec Loss 3.0056 LearningRate 0.000563 Epoch: 12 Global Step: 269460 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:04:38,166-Speed 2514.58 samples/sec Loss 3.0393 LearningRate 0.000563 Epoch: 12 Global Step: 269470 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:04:46,375-Speed 2495.08 samples/sec Loss 3.0523 LearningRate 0.000563 Epoch: 12 Global Step: 269480 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:04:54,574-Speed 2498.23 samples/sec Loss 3.0504 LearningRate 0.000563 Epoch: 12 Global Step: 269490 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:05:02,775-Speed 2498.20 samples/sec Loss 3.0277 LearningRate 0.000563 Epoch: 12 Global Step: 269500 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:05:10,976-Speed 2497.71 samples/sec Loss 2.9937 LearningRate 0.000563 Epoch: 12 Global Step: 269510 Fp16 Grad 
Scale: 65536 Required: 128 hours Training: 2022-07-08 04:05:19,178-Speed 2497.38 samples/sec Loss 3.0251 LearningRate 0.000563 Epoch: 12 Global Step: 269520 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:05:27,327-Speed 2513.36 samples/sec Loss 3.0581 LearningRate 0.000563 Epoch: 12 Global Step: 269530 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:05:35,527-Speed 2498.02 samples/sec Loss 2.9677 LearningRate 0.000563 Epoch: 12 Global Step: 269540 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:05:43,739-Speed 2494.49 samples/sec Loss 3.0078 LearningRate 0.000563 Epoch: 12 Global Step: 269550 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:05:51,947-Speed 2495.32 samples/sec Loss 3.0193 LearningRate 0.000563 Epoch: 12 Global Step: 269560 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:06:00,144-Speed 2498.77 samples/sec Loss 2.9782 LearningRate 0.000563 Epoch: 12 Global Step: 269570 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:06:08,359-Speed 2493.88 samples/sec Loss 2.9849 LearningRate 0.000563 Epoch: 12 Global Step: 269580 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:06:16,510-Speed 2513.16 samples/sec Loss 3.0461 LearningRate 0.000563 Epoch: 12 Global Step: 269590 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:06:24,714-Speed 2496.78 samples/sec Loss 3.0196 LearningRate 0.000563 Epoch: 12 Global Step: 269600 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:06:32,909-Speed 2499.46 samples/sec Loss 3.0383 LearningRate 0.000562 Epoch: 12 Global Step: 269610 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:06:43,438-Speed 1945.45 samples/sec Loss 3.0775 LearningRate 0.000562 Epoch: 13 Global Step: 269620 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:06:51,635-Speed 2498.73 samples/sec Loss 3.0221 LearningRate 0.000562 Epoch: 13 Global Step: 269630 Fp16 
Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:06:59,829-Speed 2499.67 samples/sec Loss 3.0112 LearningRate 0.000562 Epoch: 13 Global Step: 269640 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:07:07,988-Speed 2510.40 samples/sec Loss 2.9417 LearningRate 0.000562 Epoch: 13 Global Step: 269650 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:07:16,187-Speed 2498.62 samples/sec Loss 3.0409 LearningRate 0.000562 Epoch: 13 Global Step: 269660 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:07:24,387-Speed 2497.73 samples/sec Loss 2.9854 LearningRate 0.000562 Epoch: 13 Global Step: 269670 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:07:32,588-Speed 2497.76 samples/sec Loss 2.9994 LearningRate 0.000562 Epoch: 13 Global Step: 269680 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:07:40,820-Speed 2488.36 samples/sec Loss 3.0005 LearningRate 0.000562 Epoch: 13 Global Step: 269690 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:07:49,026-Speed 2495.90 samples/sec Loss 3.0104 LearningRate 0.000562 Epoch: 13 Global Step: 269700 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:07:57,179-Speed 2512.52 samples/sec Loss 3.0236 LearningRate 0.000562 Epoch: 13 Global Step: 269710 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:08:05,383-Speed 2496.59 samples/sec Loss 3.0356 LearningRate 0.000562 Epoch: 13 Global Step: 269720 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:08:13,585-Speed 2497.38 samples/sec Loss 2.9952 LearningRate 0.000562 Epoch: 13 Global Step: 269730 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:08:21,787-Speed 2497.39 samples/sec Loss 3.0109 LearningRate 0.000562 Epoch: 13 Global Step: 269740 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:08:29,988-Speed 2497.71 samples/sec Loss 3.0508 LearningRate 0.000562 Epoch: 13 Global Step: 269750 
Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:08:38,186-Speed 2498.77 samples/sec Loss 3.0457 LearningRate 0.000562 Epoch: 13 Global Step: 269760 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:08:46,330-Speed 2515.08 samples/sec Loss 2.9727 LearningRate 0.000562 Epoch: 13 Global Step: 269770 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:08:54,534-Speed 2496.81 samples/sec Loss 2.9863 LearningRate 0.000562 Epoch: 13 Global Step: 269780 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:09:02,732-Speed 2498.41 samples/sec Loss 3.0213 LearningRate 0.000562 Epoch: 13 Global Step: 269790 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:09:10,945-Speed 2494.22 samples/sec Loss 2.9796 LearningRate 0.000562 Epoch: 13 Global Step: 269800 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:09:19,141-Speed 2498.85 samples/sec Loss 3.0087 LearningRate 0.000562 Epoch: 13 Global Step: 269810 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:09:27,340-Speed 2498.34 samples/sec Loss 2.9369 LearningRate 0.000562 Epoch: 13 Global Step: 269820 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:09:35,487-Speed 2514.25 samples/sec Loss 3.0108 LearningRate 0.000562 Epoch: 13 Global Step: 269830 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:09:43,688-Speed 2497.78 samples/sec Loss 3.0125 LearningRate 0.000562 Epoch: 13 Global Step: 269840 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:09:51,894-Speed 2496.21 samples/sec Loss 3.0598 LearningRate 0.000562 Epoch: 13 Global Step: 269850 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:10:00,095-Speed 2497.49 samples/sec Loss 3.0137 LearningRate 0.000562 Epoch: 13 Global Step: 269860 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:10:08,295-Speed 2497.97 samples/sec Loss 2.9363 LearningRate 0.000562 Epoch: 13 Global Step: 
269870 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:10:16,498-Speed 2496.94 samples/sec Loss 2.9269 LearningRate 0.000562 Epoch: 13 Global Step: 269880 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:10:24,643-Speed 2514.76 samples/sec Loss 2.9438 LearningRate 0.000562 Epoch: 13 Global Step: 269890 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:10:32,850-Speed 2495.77 samples/sec Loss 2.9662 LearningRate 0.000562 Epoch: 13 Global Step: 269900 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:10:41,048-Speed 2498.74 samples/sec Loss 3.0422 LearningRate 0.000562 Epoch: 13 Global Step: 269910 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:10:49,265-Speed 2492.83 samples/sec Loss 2.9740 LearningRate 0.000562 Epoch: 13 Global Step: 269920 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:10:57,467-Speed 2497.28 samples/sec Loss 2.9762 LearningRate 0.000562 Epoch: 13 Global Step: 269930 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:11:05,669-Speed 2497.34 samples/sec Loss 3.0187 LearningRate 0.000562 Epoch: 13 Global Step: 269940 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:11:13,813-Speed 2515.45 samples/sec Loss 2.9801 LearningRate 0.000562 Epoch: 13 Global Step: 269950 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:11:22,017-Speed 2496.83 samples/sec Loss 2.9911 LearningRate 0.000562 Epoch: 13 Global Step: 269960 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:11:30,213-Speed 2498.91 samples/sec Loss 2.9788 LearningRate 0.000562 Epoch: 13 Global Step: 269970 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:11:38,423-Speed 2495.14 samples/sec Loss 3.0314 LearningRate 0.000562 Epoch: 13 Global Step: 269980 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:11:46,622-Speed 2498.41 samples/sec Loss 2.9601 LearningRate 0.000562 Epoch: 13 Global 
Step: 269990 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:11:54,821-Speed 2498.07 samples/sec Loss 2.9831 LearningRate 0.000562 Epoch: 13 Global Step: 270000 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:12:02,963-Speed 2515.68 samples/sec Loss 2.9516 LearningRate 0.000562 Epoch: 13 Global Step: 270010 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:12:11,164-Speed 2497.80 samples/sec Loss 2.9696 LearningRate 0.000562 Epoch: 13 Global Step: 270020 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:12:19,363-Speed 2498.16 samples/sec Loss 2.9847 LearningRate 0.000562 Epoch: 13 Global Step: 270030 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:12:27,563-Speed 2498.11 samples/sec Loss 3.0075 LearningRate 0.000562 Epoch: 13 Global Step: 270040 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:12:35,766-Speed 2496.92 samples/sec Loss 2.9626 LearningRate 0.000562 Epoch: 13 Global Step: 270050 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:12:43,967-Speed 2497.61 samples/sec Loss 2.9384 LearningRate 0.000562 Epoch: 13 Global Step: 270060 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:12:52,111-Speed 2515.10 samples/sec Loss 2.9343 LearningRate 0.000562 Epoch: 13 Global Step: 270070 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:13:00,314-Speed 2497.03 samples/sec Loss 2.9686 LearningRate 0.000562 Epoch: 13 Global Step: 270080 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:13:08,513-Speed 2498.32 samples/sec Loss 2.8606 LearningRate 0.000562 Epoch: 13 Global Step: 270090 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:13:16,707-Speed 2499.70 samples/sec Loss 2.9018 LearningRate 0.000562 Epoch: 13 Global Step: 270100 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:13:24,920-Speed 2494.14 samples/sec Loss 2.9951 LearningRate 0.000561 Epoch: 13 
Global Step: 270110 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:13:33,122-Speed 2497.43 samples/sec Loss 3.0000 LearningRate 0.000561 Epoch: 13 Global Step: 270120 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:13:41,269-Speed 2514.05 samples/sec Loss 3.0052 LearningRate 0.000561 Epoch: 13 Global Step: 270130 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:13:49,470-Speed 2497.72 samples/sec Loss 3.0107 LearningRate 0.000561 Epoch: 13 Global Step: 270140 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:13:57,670-Speed 2497.98 samples/sec Loss 2.9407 LearningRate 0.000561 Epoch: 13 Global Step: 270150 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:14:05,871-Speed 2497.89 samples/sec Loss 2.9997 LearningRate 0.000561 Epoch: 13 Global Step: 270160 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:14:14,074-Speed 2496.88 samples/sec Loss 2.9980 LearningRate 0.000561 Epoch: 13 Global Step: 270170 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:14:22,274-Speed 2498.21 samples/sec Loss 2.9495 LearningRate 0.000561 Epoch: 13 Global Step: 270180 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:14:30,423-Speed 2513.42 samples/sec Loss 3.0059 LearningRate 0.000561 Epoch: 13 Global Step: 270190 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:14:38,625-Speed 2497.24 samples/sec Loss 2.9940 LearningRate 0.000561 Epoch: 13 Global Step: 270200 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:14:46,841-Speed 2493.32 samples/sec Loss 2.9838 LearningRate 0.000561 Epoch: 13 Global Step: 270210 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:14:55,046-Speed 2496.58 samples/sec Loss 2.9502 LearningRate 0.000561 Epoch: 13 Global Step: 270220 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:15:03,244-Speed 2498.52 samples/sec Loss 2.9630 LearningRate 0.000561 
Epoch: 13 Global Step: 270230 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:15:11,462-Speed 2492.42 samples/sec Loss 2.9694 LearningRate 0.000561 Epoch: 13 Global Step: 270240 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:15:19,604-Speed 2516.08 samples/sec Loss 2.9866 LearningRate 0.000561 Epoch: 13 Global Step: 270250 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:15:27,803-Speed 2498.14 samples/sec Loss 3.0152 LearningRate 0.000561 Epoch: 13 Global Step: 270260 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:15:36,010-Speed 2495.88 samples/sec Loss 2.9314 LearningRate 0.000561 Epoch: 13 Global Step: 270270 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:15:44,207-Speed 2498.88 samples/sec Loss 2.9643 LearningRate 0.000561 Epoch: 13 Global Step: 270280 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:15:52,409-Speed 2497.14 samples/sec Loss 2.9625 LearningRate 0.000561 Epoch: 13 Global Step: 270290 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:16:00,610-Speed 2497.86 samples/sec Loss 3.0210 LearningRate 0.000561 Epoch: 13 Global Step: 270300 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:16:08,760-Speed 2513.14 samples/sec Loss 3.0591 LearningRate 0.000561 Epoch: 13 Global Step: 270310 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:16:16,962-Speed 2497.27 samples/sec Loss 3.0562 LearningRate 0.000561 Epoch: 13 Global Step: 270320 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:16:25,166-Speed 2496.69 samples/sec Loss 2.9865 LearningRate 0.000561 Epoch: 13 Global Step: 270330 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:16:33,371-Speed 2496.61 samples/sec Loss 3.0150 LearningRate 0.000561 Epoch: 13 Global Step: 270340 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:16:41,576-Speed 2496.29 samples/sec Loss 2.9463 LearningRate 
0.000561 Epoch: 13 Global Step: 270350 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:16:49,782-Speed 2496.26 samples/sec Loss 2.9365 LearningRate 0.000561 Epoch: 13 Global Step: 270360 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:16:57,928-Speed 2514.50 samples/sec Loss 2.9790 LearningRate 0.000561 Epoch: 13 Global Step: 270370 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:17:06,135-Speed 2496.13 samples/sec Loss 3.0195 LearningRate 0.000561 Epoch: 13 Global Step: 270380 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:17:14,335-Speed 2497.86 samples/sec Loss 3.0351 LearningRate 0.000561 Epoch: 13 Global Step: 270390 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:17:22,537-Speed 2497.33 samples/sec Loss 2.9998 LearningRate 0.000561 Epoch: 13 Global Step: 270400 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:17:30,737-Speed 2498.15 samples/sec Loss 3.0176 LearningRate 0.000561 Epoch: 13 Global Step: 270410 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:17:38,944-Speed 2496.09 samples/sec Loss 3.0189 LearningRate 0.000561 Epoch: 13 Global Step: 270420 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:17:47,094-Speed 2513.21 samples/sec Loss 2.9983 LearningRate 0.000561 Epoch: 13 Global Step: 270430 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:17:55,291-Speed 2498.73 samples/sec Loss 3.0167 LearningRate 0.000561 Epoch: 13 Global Step: 270440 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:18:03,493-Speed 2497.49 samples/sec Loss 3.0194 LearningRate 0.000561 Epoch: 13 Global Step: 270450 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:18:11,690-Speed 2498.77 samples/sec Loss 2.9978 LearningRate 0.000561 Epoch: 13 Global Step: 270460 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:18:19,887-Speed 2499.01 samples/sec Loss 2.9561 
LearningRate 0.000561 Epoch: 13 Global Step: 270470 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:18:28,094-Speed 2496.07 samples/sec Loss 2.9466 LearningRate 0.000561 Epoch: 13 Global Step: 270480 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:18:36,241-Speed 2514.31 samples/sec Loss 3.0201 LearningRate 0.000561 Epoch: 13 Global Step: 270490 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:18:44,444-Speed 2497.11 samples/sec Loss 2.9541 LearningRate 0.000561 Epoch: 13 Global Step: 270500 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:18:52,653-Speed 2495.41 samples/sec Loss 2.9916 LearningRate 0.000561 Epoch: 13 Global Step: 270510 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:19:00,852-Speed 2498.12 samples/sec Loss 2.9800 LearningRate 0.000561 Epoch: 13 Global Step: 270520 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:19:09,049-Speed 2498.99 samples/sec Loss 2.9644 LearningRate 0.000561 Epoch: 13 Global Step: 270530 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:19:17,250-Speed 2497.62 samples/sec Loss 3.0531 LearningRate 0.000561 Epoch: 13 Global Step: 270540 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:19:25,391-Speed 2515.96 samples/sec Loss 2.9837 LearningRate 0.000561 Epoch: 13 Global Step: 270550 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:19:33,588-Speed 2499.21 samples/sec Loss 3.0462 LearningRate 0.000561 Epoch: 13 Global Step: 270560 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:19:41,789-Speed 2497.51 samples/sec Loss 2.9762 LearningRate 0.000561 Epoch: 13 Global Step: 270570 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:19:49,984-Speed 2499.46 samples/sec Loss 2.9595 LearningRate 0.000561 Epoch: 13 Global Step: 270580 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:19:58,185-Speed 2497.66 samples/sec Loss 
3.0530 LearningRate 0.000561 Epoch: 13 Global Step: 270590 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:20:06,382-Speed 2498.78 samples/sec Loss 2.9988 LearningRate 0.000561 Epoch: 13 Global Step: 270600 Fp16 Grad Scale: 131072 Required: 128 hours Training: 2022-07-08 04:20:14,523-Speed 2516.02 samples/sec Loss 3.0840 LearningRate 0.000560 Epoch: 13 Global Step: 270610 Fp16 Grad Scale: 131072 Required: 128 hours Training: 2022-07-08 04:20:22,720-Speed 2498.80 samples/sec Loss 3.0137 LearningRate 0.000560 Epoch: 13 Global Step: 270620 Fp16 Grad Scale: 131072 Required: 128 hours Training: 2022-07-08 04:20:30,871-Speed 2513.03 samples/sec Loss 3.0632 LearningRate 0.000560 Epoch: 13 Global Step: 270630 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:20:39,065-Speed 2499.86 samples/sec Loss 3.0437 LearningRate 0.000560 Epoch: 13 Global Step: 270640 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:20:47,265-Speed 2498.23 samples/sec Loss 3.0089 LearningRate 0.000560 Epoch: 13 Global Step: 270650 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:20:55,460-Speed 2499.16 samples/sec Loss 3.0463 LearningRate 0.000560 Epoch: 13 Global Step: 270660 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:21:03,605-Speed 2515.00 samples/sec Loss 2.9740 LearningRate 0.000560 Epoch: 13 Global Step: 270670 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:21:11,804-Speed 2498.07 samples/sec Loss 3.0195 LearningRate 0.000560 Epoch: 13 Global Step: 270680 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:21:20,000-Speed 2499.20 samples/sec Loss 3.0712 LearningRate 0.000560 Epoch: 13 Global Step: 270690 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:21:28,211-Speed 2494.73 samples/sec Loss 3.0347 LearningRate 0.000560 Epoch: 13 Global Step: 270700 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:21:36,408-Speed 2499.21 
samples/sec Loss 2.9623 LearningRate 0.000560 Epoch: 13 Global Step: 270710 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:21:44,606-Speed 2498.56 samples/sec Loss 2.9598 LearningRate 0.000560 Epoch: 13 Global Step: 270720 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:21:52,748-Speed 2515.53 samples/sec Loss 2.9678 LearningRate 0.000560 Epoch: 13 Global Step: 270730 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:22:00,953-Speed 2496.98 samples/sec Loss 3.0409 LearningRate 0.000560 Epoch: 13 Global Step: 270740 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:22:09,162-Speed 2494.99 samples/sec Loss 3.0420 LearningRate 0.000560 Epoch: 13 Global Step: 270750 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:22:17,357-Speed 2499.88 samples/sec Loss 3.0528 LearningRate 0.000560 Epoch: 13 Global Step: 270760 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:22:25,552-Speed 2499.90 samples/sec Loss 2.9752 LearningRate 0.000560 Epoch: 13 Global Step: 270770 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:22:33,748-Speed 2498.98 samples/sec Loss 2.9678 LearningRate 0.000560 Epoch: 13 Global Step: 270780 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:22:41,903-Speed 2512.38 samples/sec Loss 3.0357 LearningRate 0.000560 Epoch: 13 Global Step: 270790 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:22:50,103-Speed 2498.11 samples/sec Loss 3.0817 LearningRate 0.000560 Epoch: 13 Global Step: 270800 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:22:58,296-Speed 2499.87 samples/sec Loss 2.9609 LearningRate 0.000560 Epoch: 13 Global Step: 270810 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:23:06,505-Speed 2495.27 samples/sec Loss 3.0345 LearningRate 0.000560 Epoch: 13 Global Step: 270820 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:23:14,703-Speed 
2498.70 samples/sec Loss 2.9749 LearningRate 0.000560 Epoch: 13 Global Step: 270830 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:23:22,902-Speed 2498.46 samples/sec Loss 3.0795 LearningRate 0.000560 Epoch: 13 Global Step: 270840 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:23:31,077-Speed 2505.55 samples/sec Loss 2.9960 LearningRate 0.000560 Epoch: 13 Global Step: 270850 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:23:39,298-Speed 2491.69 samples/sec Loss 2.9823 LearningRate 0.000560 Epoch: 13 Global Step: 270860 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:23:47,505-Speed 2495.89 samples/sec Loss 3.0196 LearningRate 0.000560 Epoch: 13 Global Step: 270870 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:23:55,727-Speed 2491.23 samples/sec Loss 2.9434 LearningRate 0.000560 Epoch: 13 Global Step: 270880 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:24:03,923-Speed 2498.94 samples/sec Loss 2.9386 LearningRate 0.000560 Epoch: 13 Global Step: 270890 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:24:12,120-Speed 2498.86 samples/sec Loss 2.9497 LearningRate 0.000560 Epoch: 13 Global Step: 270900 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:24:20,264-Speed 2515.40 samples/sec Loss 3.0050 LearningRate 0.000560 Epoch: 13 Global Step: 270910 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:24:28,459-Speed 2499.20 samples/sec Loss 2.9112 LearningRate 0.000560 Epoch: 13 Global Step: 270920 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:24:36,664-Speed 2496.91 samples/sec Loss 2.9842 LearningRate 0.000560 Epoch: 13 Global Step: 270930 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:24:44,866-Speed 2497.09 samples/sec Loss 3.0309 LearningRate 0.000560 Epoch: 13 Global Step: 270940 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 
04:24:53,063-Speed 2498.88 samples/sec Loss 3.0134 LearningRate 0.000560 Epoch: 13 Global Step: 270950 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:25:01,266-Speed 2496.99 samples/sec Loss 3.0765 LearningRate 0.000560 Epoch: 13 Global Step: 270960 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:25:09,411-Speed 2514.86 samples/sec Loss 3.0890 LearningRate 0.000560 Epoch: 13 Global Step: 270970 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:25:17,613-Speed 2497.57 samples/sec Loss 3.0515 LearningRate 0.000560 Epoch: 13 Global Step: 270980 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:25:25,833-Speed 2491.78 samples/sec Loss 3.0515 LearningRate 0.000560 Epoch: 13 Global Step: 270990 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:25:34,033-Speed 2497.83 samples/sec Loss 3.0184 LearningRate 0.000560 Epoch: 13 Global Step: 271000 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:25:42,239-Speed 2496.02 samples/sec Loss 2.9781 LearningRate 0.000560 Epoch: 13 Global Step: 271010 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:25:50,443-Speed 2496.74 samples/sec Loss 3.0565 LearningRate 0.000560 Epoch: 13 Global Step: 271020 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:25:58,593-Speed 2513.47 samples/sec Loss 2.9927 LearningRate 0.000560 Epoch: 13 Global Step: 271030 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:26:06,801-Speed 2495.37 samples/sec Loss 2.9534 LearningRate 0.000560 Epoch: 13 Global Step: 271040 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:26:15,007-Speed 2496.18 samples/sec Loss 2.9609 LearningRate 0.000560 Epoch: 13 Global Step: 271050 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:26:23,208-Speed 2498.00 samples/sec Loss 2.9337 LearningRate 0.000560 Epoch: 13 Global Step: 271060 Fp16 Grad Scale: 65536 Required: 128 hours Training: 
2022-07-08 04:26:31,407-Speed 2498.21 samples/sec Loss 2.9853 LearningRate 0.000560 Epoch: 13 Global Step: 271070 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:26:39,608-Speed 2497.80 samples/sec Loss 2.9537 LearningRate 0.000560 Epoch: 13 Global Step: 271080 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:26:47,753-Speed 2514.76 samples/sec Loss 2.9650 LearningRate 0.000560 Epoch: 13 Global Step: 271090 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:26:55,956-Speed 2497.08 samples/sec Loss 3.0075 LearningRate 0.000560 Epoch: 13 Global Step: 271100 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:27:04,155-Speed 2498.42 samples/sec Loss 3.0309 LearningRate 0.000559 Epoch: 13 Global Step: 271110 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:27:12,352-Speed 2498.82 samples/sec Loss 2.9147 LearningRate 0.000559 Epoch: 13 Global Step: 271120 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:27:20,553-Speed 2497.55 samples/sec Loss 2.9829 LearningRate 0.000559 Epoch: 13 Global Step: 271130 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:27:28,752-Speed 2498.47 samples/sec Loss 2.9428 LearningRate 0.000559 Epoch: 13 Global Step: 271140 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:27:36,901-Speed 2513.56 samples/sec Loss 3.0122 LearningRate 0.000559 Epoch: 13 Global Step: 271150 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:27:45,101-Speed 2497.99 samples/sec Loss 2.9631 LearningRate 0.000559 Epoch: 13 Global Step: 271160 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:27:53,304-Speed 2497.31 samples/sec Loss 2.9622 LearningRate 0.000559 Epoch: 13 Global Step: 271170 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:28:01,503-Speed 2498.21 samples/sec Loss 2.9067 LearningRate 0.000559 Epoch: 13 Global Step: 271180 Fp16 Grad Scale: 65536 Required: 128 hours 
Training: 2022-07-08 04:28:09,701-Speed 2498.36 samples/sec Loss 2.9780 LearningRate 0.000559 Epoch: 13 Global Step: 271190 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:28:17,897-Speed 2499.25 samples/sec Loss 2.9362 LearningRate 0.000559 Epoch: 13 Global Step: 271200 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:28:26,039-Speed 2515.65 samples/sec Loss 3.0002 LearningRate 0.000559 Epoch: 13 Global Step: 271210 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:28:34,234-Speed 2499.72 samples/sec Loss 3.0464 LearningRate 0.000559 Epoch: 13 Global Step: 271220 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:28:42,428-Speed 2499.66 samples/sec Loss 2.9597 LearningRate 0.000559 Epoch: 13 Global Step: 271230 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:28:50,631-Speed 2497.17 samples/sec Loss 2.9527 LearningRate 0.000559 Epoch: 13 Global Step: 271240 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:28:58,828-Speed 2499.22 samples/sec Loss 2.9797 LearningRate 0.000559 Epoch: 13 Global Step: 271250 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:29:07,027-Speed 2498.28 samples/sec Loss 2.9397 LearningRate 0.000559 Epoch: 13 Global Step: 271260 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:29:15,174-Speed 2513.97 samples/sec Loss 3.0148 LearningRate 0.000559 Epoch: 13 Global Step: 271270 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:29:23,373-Speed 2498.49 samples/sec Loss 3.0135 LearningRate 0.000559 Epoch: 13 Global Step: 271280 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:29:31,571-Speed 2498.76 samples/sec Loss 3.0030 LearningRate 0.000559 Epoch: 13 Global Step: 271290 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:29:39,773-Speed 2497.39 samples/sec Loss 3.0230 LearningRate 0.000559 Epoch: 13 Global Step: 271300 Fp16 Grad Scale: 65536 Required: 128 
hours Training: 2022-07-08 04:29:47,971-Speed 2498.44 samples/sec Loss 3.0684 LearningRate 0.000559 Epoch: 13 Global Step: 271310 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:29:56,167-Speed 2498.94 samples/sec Loss 3.0394 LearningRate 0.000559 Epoch: 13 Global Step: 271320 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:30:04,312-Speed 2514.90 samples/sec Loss 2.9684 LearningRate 0.000559 Epoch: 13 Global Step: 271330 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:30:12,512-Speed 2497.94 samples/sec Loss 2.9234 LearningRate 0.000559 Epoch: 13 Global Step: 271340 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:30:20,710-Speed 2498.46 samples/sec Loss 3.0026 LearningRate 0.000559 Epoch: 13 Global Step: 271350 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:30:28,913-Speed 2497.12 samples/sec Loss 3.0082 LearningRate 0.000559 Epoch: 13 Global Step: 271360 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:30:37,115-Speed 2497.59 samples/sec Loss 2.9801 LearningRate 0.000559 Epoch: 13 Global Step: 271370 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:30:45,321-Speed 2496.03 samples/sec Loss 2.9268 LearningRate 0.000559 Epoch: 13 Global Step: 271380 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:30:53,468-Speed 2514.50 samples/sec Loss 2.9640 LearningRate 0.000559 Epoch: 13 Global Step: 271390 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:31:01,669-Speed 2497.58 samples/sec Loss 2.9462 LearningRate 0.000559 Epoch: 13 Global Step: 271400 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:31:09,870-Speed 2497.43 samples/sec Loss 3.0069 LearningRate 0.000559 Epoch: 13 Global Step: 271410 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:31:18,073-Speed 2497.13 samples/sec Loss 3.0135 LearningRate 0.000559 Epoch: 13 Global Step: 271420 Fp16 Grad Scale: 65536 Required: 
128 hours Training: 2022-07-08 04:31:26,270-Speed 2498.90 samples/sec Loss 2.9671 LearningRate 0.000559 Epoch: 13 Global Step: 271430 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:31:34,472-Speed 2497.35 samples/sec Loss 3.0059 LearningRate 0.000559 Epoch: 13 Global Step: 271440 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:31:42,616-Speed 2515.20 samples/sec Loss 2.9351 LearningRate 0.000559 Epoch: 13 Global Step: 271450 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:31:50,820-Speed 2496.66 samples/sec Loss 2.9601 LearningRate 0.000559 Epoch: 13 Global Step: 271460 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:31:59,018-Speed 2498.41 samples/sec Loss 3.0365 LearningRate 0.000559 Epoch: 13 Global Step: 271470 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:32:07,222-Speed 2496.89 samples/sec Loss 2.9599 LearningRate 0.000559 Epoch: 13 Global Step: 271480 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:32:15,449-Speed 2489.63 samples/sec Loss 2.9685 LearningRate 0.000559 Epoch: 13 Global Step: 271490 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:32:23,659-Speed 2494.79 samples/sec Loss 3.0407 LearningRate 0.000559 Epoch: 13 Global Step: 271500 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:32:31,811-Speed 2512.74 samples/sec Loss 3.0113 LearningRate 0.000559 Epoch: 13 Global Step: 271510 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:32:40,008-Speed 2498.92 samples/sec Loss 2.9581 LearningRate 0.000559 Epoch: 13 Global Step: 271520 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:32:48,212-Speed 2496.95 samples/sec Loss 3.0293 LearningRate 0.000559 Epoch: 13 Global Step: 271530 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:32:56,409-Speed 2498.83 samples/sec Loss 3.0491 LearningRate 0.000559 Epoch: 13 Global Step: 271540 Fp16 Grad Scale: 65536 
Required: 128 hours Training: 2022-07-08 04:33:04,620-Speed 2494.43 samples/sec Loss 2.9564 LearningRate 0.000559 Epoch: 13 Global Step: 271550 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:33:12,820-Speed 2497.98 samples/sec Loss 3.0446 LearningRate 0.000559 Epoch: 13 Global Step: 271560 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:33:20,972-Speed 2512.65 samples/sec Loss 3.0100 LearningRate 0.000559 Epoch: 13 Global Step: 271570 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:33:29,178-Speed 2496.10 samples/sec Loss 3.0054 LearningRate 0.000559 Epoch: 13 Global Step: 271580 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:33:37,381-Speed 2497.22 samples/sec Loss 3.0076 LearningRate 0.000559 Epoch: 13 Global Step: 271590 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:33:45,584-Speed 2497.02 samples/sec Loss 2.9842 LearningRate 0.000559 Epoch: 13 Global Step: 271600 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:33:53,783-Speed 2498.16 samples/sec Loss 2.9704 LearningRate 0.000558 Epoch: 13 Global Step: 271610 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:34:01,985-Speed 2497.56 samples/sec Loss 2.9716 LearningRate 0.000558 Epoch: 13 Global Step: 271620 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:34:10,142-Speed 2511.03 samples/sec Loss 2.9988 LearningRate 0.000558 Epoch: 13 Global Step: 271630 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:34:18,341-Speed 2498.32 samples/sec Loss 2.9831 LearningRate 0.000558 Epoch: 13 Global Step: 271640 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:34:26,546-Speed 2496.46 samples/sec Loss 2.9642 LearningRate 0.000558 Epoch: 13 Global Step: 271650 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:34:34,743-Speed 2498.78 samples/sec Loss 3.0431 LearningRate 0.000558 Epoch: 13 Global Step: 271660 Fp16 Grad Scale: 
65536 Required: 128 hours Training: 2022-07-08 04:34:42,941-Speed 2498.73 samples/sec Loss 3.0039 LearningRate 0.000558 Epoch: 13 Global Step: 271670 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:34:51,149-Speed 2495.32 samples/sec Loss 3.0048 LearningRate 0.000558 Epoch: 13 Global Step: 271680 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:34:59,297-Speed 2514.06 samples/sec Loss 2.9312 LearningRate 0.000558 Epoch: 13 Global Step: 271690 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:35:07,501-Speed 2496.48 samples/sec Loss 2.9980 LearningRate 0.000558 Epoch: 13 Global Step: 271700 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:35:15,705-Speed 2496.81 samples/sec Loss 2.9612 LearningRate 0.000558 Epoch: 13 Global Step: 271710 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:35:23,902-Speed 2499.12 samples/sec Loss 2.9815 LearningRate 0.000558 Epoch: 13 Global Step: 271720 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:35:32,101-Speed 2497.85 samples/sec Loss 2.9601 LearningRate 0.000558 Epoch: 13 Global Step: 271730 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:35:40,316-Speed 2493.63 samples/sec Loss 2.9739 LearningRate 0.000558 Epoch: 13 Global Step: 271740 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:35:48,465-Speed 2513.68 samples/sec Loss 2.9963 LearningRate 0.000558 Epoch: 13 Global Step: 271750 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:35:56,660-Speed 2499.43 samples/sec Loss 3.0284 LearningRate 0.000558 Epoch: 13 Global Step: 271760 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:36:04,857-Speed 2498.95 samples/sec Loss 2.9872 LearningRate 0.000558 Epoch: 13 Global Step: 271770 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:36:13,053-Speed 2498.90 samples/sec Loss 2.9438 LearningRate 0.000558 Epoch: 13 Global Step: 271780 Fp16 Grad 
Scale: 65536 Required: 128 hours Training: 2022-07-08 04:36:21,249-Speed 2499.20 samples/sec Loss 2.9914 LearningRate 0.000558 Epoch: 13 Global Step: 271790 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:36:29,446-Speed 2498.85 samples/sec Loss 3.0472 LearningRate 0.000558 Epoch: 13 Global Step: 271800 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:36:37,590-Speed 2515.03 samples/sec Loss 3.0755 LearningRate 0.000558 Epoch: 13 Global Step: 271810 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:36:45,785-Speed 2499.61 samples/sec Loss 2.9869 LearningRate 0.000558 Epoch: 13 Global Step: 271820 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:36:53,991-Speed 2495.98 samples/sec Loss 3.0494 LearningRate 0.000558 Epoch: 13 Global Step: 271830 Fp16 Grad Scale: 131072 Required: 128 hours Training: 2022-07-08 04:37:02,198-Speed 2495.72 samples/sec Loss 2.9928 LearningRate 0.000558 Epoch: 13 Global Step: 271840 Fp16 Grad Scale: 131072 Required: 128 hours Training: 2022-07-08 04:37:10,395-Speed 2499.05 samples/sec Loss 2.9744 LearningRate 0.000558 Epoch: 13 Global Step: 271850 Fp16 Grad Scale: 131072 Required: 128 hours Training: 2022-07-08 04:37:18,592-Speed 2498.64 samples/sec Loss 2.9418 LearningRate 0.000558 Epoch: 13 Global Step: 271860 Fp16 Grad Scale: 131072 Required: 128 hours Training: 2022-07-08 04:37:26,744-Speed 2513.07 samples/sec Loss 2.9555 LearningRate 0.000558 Epoch: 13 Global Step: 271870 Fp16 Grad Scale: 131072 Required: 128 hours Training: 2022-07-08 04:37:34,940-Speed 2498.93 samples/sec Loss 2.9714 LearningRate 0.000558 Epoch: 13 Global Step: 271880 Fp16 Grad Scale: 131072 Required: 128 hours Training: 2022-07-08 04:37:43,138-Speed 2498.57 samples/sec Loss 3.0313 LearningRate 0.000558 Epoch: 13 Global Step: 271890 Fp16 Grad Scale: 131072 Required: 128 hours Training: 2022-07-08 04:37:51,297-Speed 2510.74 samples/sec Loss 2.9903 LearningRate 0.000558 Epoch: 13 Global Step: 
271900 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:37:59,493-Speed 2499.54 samples/sec Loss 3.0275 LearningRate 0.000558 Epoch: 13 Global Step: 271910 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:38:07,689-Speed 2499.05 samples/sec Loss 3.0101 LearningRate 0.000558 Epoch: 13 Global Step: 271920 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:38:15,828-Speed 2516.52 samples/sec Loss 2.9634 LearningRate 0.000558 Epoch: 13 Global Step: 271930 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:38:24,024-Speed 2499.66 samples/sec Loss 2.9989 LearningRate 0.000558 Epoch: 13 Global Step: 271940 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:38:32,221-Speed 2498.71 samples/sec Loss 2.9218 LearningRate 0.000558 Epoch: 13 Global Step: 271950 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:38:40,425-Speed 2496.79 samples/sec Loss 2.9641 LearningRate 0.000558 Epoch: 13 Global Step: 271960 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:38:48,629-Speed 2497.08 samples/sec Loss 2.9603 LearningRate 0.000558 Epoch: 13 Global Step: 271970 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:38:56,826-Speed 2499.01 samples/sec Loss 2.9408 LearningRate 0.000558 Epoch: 13 Global Step: 271980 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:39:04,968-Speed 2515.60 samples/sec Loss 3.0329 LearningRate 0.000558 Epoch: 13 Global Step: 271990 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:39:13,167-Speed 2498.15 samples/sec Loss 2.9549 LearningRate 0.000558 Epoch: 13 Global Step: 272000 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:39:21,364-Speed 2498.83 samples/sec Loss 2.9830 LearningRate 0.000558 Epoch: 13 Global Step: 272010 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:39:29,568-Speed 2497.04 samples/sec Loss 2.9376 LearningRate 0.000558 Epoch: 13 Global 
Step: 272020 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:39:37,762-Speed 2499.82 samples/sec Loss 2.9108 LearningRate 0.000558 Epoch: 13 Global Step: 272030 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:39:45,959-Speed 2498.78 samples/sec Loss 2.9605 LearningRate 0.000558 Epoch: 13 Global Step: 272040 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:39:54,106-Speed 2514.28 samples/sec Loss 3.0346 LearningRate 0.000558 Epoch: 13 Global Step: 272050 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:40:02,308-Speed 2497.25 samples/sec Loss 3.0168 LearningRate 0.000558 Epoch: 13 Global Step: 272060 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:40:10,516-Speed 2495.86 samples/sec Loss 2.9715 LearningRate 0.000558 Epoch: 13 Global Step: 272070 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:40:18,719-Speed 2497.24 samples/sec Loss 3.0041 LearningRate 0.000558 Epoch: 13 Global Step: 272080 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:40:26,930-Speed 2494.51 samples/sec Loss 3.0193 LearningRate 0.000558 Epoch: 13 Global Step: 272090 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:40:35,129-Speed 2498.24 samples/sec Loss 2.9821 LearningRate 0.000558 Epoch: 13 Global Step: 272100 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:40:43,278-Speed 2513.58 samples/sec Loss 2.9919 LearningRate 0.000557 Epoch: 13 Global Step: 272110 Fp16 Grad Scale: 65536 Required: 128 hours Training: 2022-07-08 04:40:51,489-Speed 2494.54 samples/sec Loss 2.9613 LearningRate 0.000557 Epoch: 13 Global Step: 272120 Fp16 Grad Scale: 65536 Required: 127 hours Training: 2022-07-08 04:40:59,789-Speed 2468.00 samples/sec Loss 2.9571 LearningRate 0.000557 Epoch: 13 Global Step: 272130 Fp16 Grad Scale: 65536 Required: 127 hours Training: 2022-07-08 04:41:07,997-Speed 2495.66 samples/sec Loss 2.9983 LearningRate 0.000557 Epoch: 13 
Global Step: 272140 Fp16 Grad Scale: 65536 Required: 127 hours Training: 2022-07-08 04:41:16,194-Speed 2498.88 samples/sec Loss 2.9710 LearningRate 0.000557 Epoch: 13 Global Step: 272150 Fp16 Grad Scale: 65536 Required: 127 hours Training: 2022-07-08 04:41:24,392-Speed 2498.37 samples/sec Loss 2.9377 LearningRate 0.000557 Epoch: 13 Global Step: 272160 Fp16 Grad Scale: 65536 Required: 127 hours Training: 2022-07-08 04:41:32,538-Speed 2514.66 samples/sec Loss 2.9331 LearningRate 0.000557 Epoch: 13 Global Step: 272170 Fp16 Grad Scale: 65536 Required: 127 hours Training: 2022-07-08 04:41:40,736-Speed 2498.48 samples/sec Loss 2.9786 LearningRate 0.000557 Epoch: 13 Global Step: 272180 Fp16 Grad Scale: 65536 Required: 127 hours Training: 2022-07-08 04:41:48,931-Speed 2499.53 samples/sec Loss 2.9696 LearningRate 0.000557 Epoch: 13 Global Step: 272190 Fp16 Grad Scale: 65536 Required: 127 hours Training: 2022-07-08 04:41:57,134-Speed 2497.22 samples/sec Loss 3.0029 LearningRate 0.000557 Epoch: 13 Global Step: 272200 Fp16 Grad Scale: 65536 Required: 127 hours Training: 2022-07-08 04:42:05,333-Speed 2498.64 samples/sec Loss 2.9531 LearningRate 0.000557 Epoch: 13 Global Step: 272210 Fp16 Grad Scale: 65536 Required: 127 hours Training: 2022-07-08 04:42:13,534-Speed 2497.64 samples/sec Loss 2.9466 LearningRate 0.000557 Epoch: 13 Global Step: 272220 Fp16 Grad Scale: 65536 Required: 127 hours Training: 2022-07-08 04:42:21,679-Speed 2514.68 samples/sec Loss 3.0570 LearningRate 0.000557 Epoch: 13 Global Step: 272230 Fp16 Grad Scale: 65536 Required: 127 hours Training: 2022-07-08 04:42:29,879-Speed 2498.06 samples/sec Loss 2.9420 LearningRate 0.000557 Epoch: 13 Global Step: 272240 Fp16 Grad Scale: 65536 Required: 127 hours Training: 2022-07-08 04:42:38,079-Speed 2497.83 samples/sec Loss 3.0802 LearningRate 0.000557 Epoch: 13 Global Step: 272250 Fp16 Grad Scale: 65536 Required: 127 hours Training: 2022-07-08 04:42:46,241-Speed 2509.64 samples/sec Loss 3.0712 LearningRate 0.000557 
Epoch: 13 Global Step: 272260 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:42:54,442-Speed 2497.64 samples/sec Loss 3.1181 LearningRate 0.000557 Epoch: 13 Global Step: 272270 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:43:02,646-Speed 2496.79 samples/sec Loss 2.9954 LearningRate 0.000557 Epoch: 13 Global Step: 272280 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:43:10,803-Speed 2511.22 samples/sec Loss 3.0665 LearningRate 0.000557 Epoch: 13 Global Step: 272290 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:43:19,000-Speed 2498.68 samples/sec Loss 2.9891 LearningRate 0.000557 Epoch: 13 Global Step: 272300 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:43:27,197-Speed 2498.84 samples/sec Loss 2.9805 LearningRate 0.000557 Epoch: 13 Global Step: 272310 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:43:35,396-Speed 2498.65 samples/sec Loss 2.9859 LearningRate 0.000557 Epoch: 13 Global Step: 272320 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:43:43,593-Speed 2498.87 samples/sec Loss 2.9937 LearningRate 0.000557 Epoch: 13 Global Step: 272330 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:43:51,800-Speed 2495.80 samples/sec Loss 2.9749 LearningRate 0.000557 Epoch: 13 Global Step: 272340 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:43:59,951-Speed 2512.88 samples/sec Loss 3.0358 LearningRate 0.000557 Epoch: 13 Global Step: 272350 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:44:08,151-Speed 2497.96 samples/sec Loss 3.0102 LearningRate 0.000557 Epoch: 13 Global Step: 272360 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:44:16,349-Speed 2498.59 samples/sec Loss 2.9893 LearningRate 0.000557 Epoch: 13 Global Step: 272370 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:44:24,558-Speed 2495.00 samples/sec Loss 2.9894 LearningRate 
0.000557 Epoch: 13 Global Step: 272380 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:44:32,769-Speed 2494.56 samples/sec Loss 3.0152 LearningRate 0.000557 Epoch: 13 Global Step: 272390 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:44:40,972-Speed 2497.53 samples/sec Loss 2.9522 LearningRate 0.000557 Epoch: 13 Global Step: 272400 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:44:49,128-Speed 2511.35 samples/sec Loss 2.9821 LearningRate 0.000557 Epoch: 13 Global Step: 272410 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:44:57,329-Speed 2497.80 samples/sec Loss 2.9623 LearningRate 0.000557 Epoch: 13 Global Step: 272420 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:45:05,527-Speed 2498.58 samples/sec Loss 2.9783 LearningRate 0.000557 Epoch: 13 Global Step: 272430 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:45:13,737-Speed 2495.14 samples/sec Loss 2.9771 LearningRate 0.000557 Epoch: 13 Global Step: 272440 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:45:21,938-Speed 2497.81 samples/sec Loss 3.0230 LearningRate 0.000557 Epoch: 13 Global Step: 272450 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:45:30,142-Speed 2496.57 samples/sec Loss 3.0194 LearningRate 0.000557 Epoch: 13 Global Step: 272460 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:45:38,303-Speed 2509.77 samples/sec Loss 2.9646 LearningRate 0.000557 Epoch: 13 Global Step: 272470 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:45:46,506-Speed 2497.23 samples/sec Loss 2.9788 LearningRate 0.000557 Epoch: 13 Global Step: 272480 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:45:54,704-Speed 2498.48 samples/sec Loss 3.0158 LearningRate 0.000557 Epoch: 13 Global Step: 272490 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:46:02,903-Speed 2498.77 samples/sec Loss 2.9903 
LearningRate 0.000557 Epoch: 13 Global Step: 272500 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:46:11,100-Speed 2498.86 samples/sec Loss 2.9709 LearningRate 0.000557 Epoch: 13 Global Step: 272510 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:46:19,314-Speed 2493.81 samples/sec Loss 3.0055 LearningRate 0.000557 Epoch: 13 Global Step: 272520 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:46:27,459-Speed 2514.63 samples/sec Loss 2.9224 LearningRate 0.000557 Epoch: 13 Global Step: 272530 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:46:35,659-Speed 2498.27 samples/sec Loss 2.9910 LearningRate 0.000557 Epoch: 13 Global Step: 272540 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:46:43,858-Speed 2498.12 samples/sec Loss 2.9726 LearningRate 0.000557 Epoch: 13 Global Step: 272550 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:46:52,063-Speed 2496.29 samples/sec Loss 2.9927 LearningRate 0.000557 Epoch: 13 Global Step: 272560 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:47:00,261-Speed 2499.05 samples/sec Loss 3.0066 LearningRate 0.000557 Epoch: 13 Global Step: 272570 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:47:08,461-Speed 2497.81 samples/sec Loss 2.9793 LearningRate 0.000557 Epoch: 13 Global Step: 272580 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:47:16,610-Speed 2513.70 samples/sec Loss 2.9493 LearningRate 0.000557 Epoch: 13 Global Step: 272590 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:47:24,811-Speed 2497.63 samples/sec Loss 2.9955 LearningRate 0.000557 Epoch: 13 Global Step: 272600 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:47:33,014-Speed 2497.07 samples/sec Loss 2.9400 LearningRate 0.000556 Epoch: 13 Global Step: 272610 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:47:41,217-Speed 2496.93 samples/sec Loss 
2.9700 LearningRate 0.000556 Epoch: 13 Global Step: 272620 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:47:49,415-Speed 2499.04 samples/sec Loss 2.9477 LearningRate 0.000556 Epoch: 13 Global Step: 272630 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:47:57,612-Speed 2498.58 samples/sec Loss 2.9470 LearningRate 0.000556 Epoch: 13 Global Step: 272640 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:48:05,763-Speed 2513.14 samples/sec Loss 3.0581 LearningRate 0.000556 Epoch: 13 Global Step: 272650 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:48:13,965-Speed 2497.60 samples/sec Loss 2.9943 LearningRate 0.000556 Epoch: 13 Global Step: 272660 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:48:22,174-Speed 2495.00 samples/sec Loss 2.9752 LearningRate 0.000556 Epoch: 13 Global Step: 272670 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:48:30,377-Speed 2497.10 samples/sec Loss 3.0074 LearningRate 0.000556 Epoch: 13 Global Step: 272680 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:48:38,575-Speed 2498.63 samples/sec Loss 2.9896 LearningRate 0.000556 Epoch: 13 Global Step: 272690 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:48:46,777-Speed 2497.10 samples/sec Loss 3.0241 LearningRate 0.000556 Epoch: 13 Global Step: 272700 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:48:54,921-Speed 2515.30 samples/sec Loss 2.9463 LearningRate 0.000556 Epoch: 13 Global Step: 272710 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:49:03,127-Speed 2496.07 samples/sec Loss 3.0404 LearningRate 0.000556 Epoch: 13 Global Step: 272720 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:49:11,326-Speed 2498.44 samples/sec Loss 3.0366 LearningRate 0.000556 Epoch: 13 Global Step: 272730 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:49:19,524-Speed 2498.71 samples/sec 
Loss 2.9845 LearningRate 0.000556 Epoch: 13 Global Step: 272740 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:49:27,719-Speed 2499.54 samples/sec Loss 3.0043 LearningRate 0.000556 Epoch: 13 Global Step: 272750 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:49:35,919-Speed 2497.86 samples/sec Loss 3.0799 LearningRate 0.000556 Epoch: 13 Global Step: 272760 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:49:44,073-Speed 2512.24 samples/sec Loss 2.9679 LearningRate 0.000556 Epoch: 13 Global Step: 272770 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:49:52,274-Speed 2497.61 samples/sec Loss 3.0215 LearningRate 0.000556 Epoch: 13 Global Step: 272780 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:50:00,472-Speed 2498.57 samples/sec Loss 2.9905 LearningRate 0.000556 Epoch: 13 Global Step: 272790 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:50:08,674-Speed 2497.53 samples/sec Loss 2.9646 LearningRate 0.000556 Epoch: 13 Global Step: 272800 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:50:16,877-Speed 2496.96 samples/sec Loss 3.0381 LearningRate 0.000556 Epoch: 13 Global Step: 272810 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:50:25,078-Speed 2497.71 samples/sec Loss 2.9276 LearningRate 0.000556 Epoch: 13 Global Step: 272820 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:50:33,237-Speed 2510.56 samples/sec Loss 2.9785 LearningRate 0.000556 Epoch: 13 Global Step: 272830 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:50:41,441-Speed 2496.82 samples/sec Loss 3.0167 LearningRate 0.000556 Epoch: 13 Global Step: 272840 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:50:49,638-Speed 2498.78 samples/sec Loss 3.0291 LearningRate 0.000556 Epoch: 13 Global Step: 272850 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:50:57,839-Speed 2497.76 
samples/sec Loss 3.0007 LearningRate 0.000556 Epoch: 13 Global Step: 272860 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:51:06,040-Speed 2497.59 samples/sec Loss 2.9845 LearningRate 0.000556 Epoch: 13 Global Step: 272870 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:51:14,248-Speed 2495.55 samples/sec Loss 3.0085 LearningRate 0.000556 Epoch: 13 Global Step: 272880 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:51:22,422-Speed 2506.01 samples/sec Loss 3.0404 LearningRate 0.000556 Epoch: 13 Global Step: 272890 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:51:30,622-Speed 2497.88 samples/sec Loss 2.9459 LearningRate 0.000556 Epoch: 13 Global Step: 272900 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:51:38,824-Speed 2497.32 samples/sec Loss 2.9722 LearningRate 0.000556 Epoch: 13 Global Step: 272910 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:51:47,025-Speed 2497.69 samples/sec Loss 2.9859 LearningRate 0.000556 Epoch: 13 Global Step: 272920 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:51:55,233-Speed 2495.34 samples/sec Loss 2.9999 LearningRate 0.000556 Epoch: 13 Global Step: 272930 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:52:03,432-Speed 2498.31 samples/sec Loss 2.9513 LearningRate 0.000556 Epoch: 13 Global Step: 272940 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:52:11,578-Speed 2514.69 samples/sec Loss 2.9343 LearningRate 0.000556 Epoch: 13 Global Step: 272950 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:52:19,772-Speed 2500.05 samples/sec Loss 3.0108 LearningRate 0.000556 Epoch: 13 Global Step: 272960 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:52:27,969-Speed 2498.75 samples/sec Loss 2.9673 LearningRate 0.000556 Epoch: 13 Global Step: 272970 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:52:36,163-Speed 
2500.17 samples/sec Loss 3.0394 LearningRate 0.000556 Epoch: 13 Global Step: 272980 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:52:44,369-Speed 2496.03 samples/sec Loss 2.9936 LearningRate 0.000556 Epoch: 13 Global Step: 272990 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:52:52,565-Speed 2499.13 samples/sec Loss 2.9812 LearningRate 0.000556 Epoch: 13 Global Step: 273000 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:53:00,710-Speed 2514.80 samples/sec Loss 3.0045 LearningRate 0.000556 Epoch: 13 Global Step: 273010 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:53:08,907-Speed 2498.83 samples/sec Loss 2.9458 LearningRate 0.000556 Epoch: 13 Global Step: 273020 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:53:17,109-Speed 2497.29 samples/sec Loss 2.9772 LearningRate 0.000556 Epoch: 13 Global Step: 273030 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:53:25,319-Speed 2494.89 samples/sec Loss 3.0003 LearningRate 0.000556 Epoch: 13 Global Step: 273040 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:53:33,520-Speed 2497.75 samples/sec Loss 2.9371 LearningRate 0.000556 Epoch: 13 Global Step: 273050 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:53:41,716-Speed 2499.09 samples/sec Loss 2.9720 LearningRate 0.000556 Epoch: 13 Global Step: 273060 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:53:49,861-Speed 2514.86 samples/sec Loss 2.9295 LearningRate 0.000556 Epoch: 13 Global Step: 273070 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:53:58,082-Speed 2491.55 samples/sec Loss 2.9470 LearningRate 0.000556 Epoch: 13 Global Step: 273080 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:54:06,275-Speed 2500.27 samples/sec Loss 2.9536 LearningRate 0.000556 Epoch: 13 Global Step: 273090 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 
04:54:14,481-Speed 2495.91 samples/sec Loss 3.0299 LearningRate 0.000556 Epoch: 13 Global Step: 273100 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:54:22,680-Speed 2498.52 samples/sec Loss 2.9543 LearningRate 0.000555 Epoch: 13 Global Step: 273110 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:54:30,890-Speed 2495.03 samples/sec Loss 2.8854 LearningRate 0.000555 Epoch: 13 Global Step: 273120 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:54:39,038-Speed 2513.73 samples/sec Loss 3.0165 LearningRate 0.000555 Epoch: 13 Global Step: 273130 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:54:47,247-Speed 2495.28 samples/sec Loss 2.9529 LearningRate 0.000555 Epoch: 13 Global Step: 273140 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:54:55,442-Speed 2499.56 samples/sec Loss 2.9182 LearningRate 0.000555 Epoch: 13 Global Step: 273150 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:55:03,644-Speed 2497.14 samples/sec Loss 2.9445 LearningRate 0.000555 Epoch: 13 Global Step: 273160 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:55:11,844-Speed 2497.96 samples/sec Loss 2.9557 LearningRate 0.000555 Epoch: 13 Global Step: 273170 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:55:20,049-Speed 2496.66 samples/sec Loss 2.9213 LearningRate 0.000555 Epoch: 13 Global Step: 273180 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:55:28,193-Speed 2515.06 samples/sec Loss 2.9631 LearningRate 0.000555 Epoch: 13 Global Step: 273190 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:55:36,389-Speed 2499.24 samples/sec Loss 2.9723 LearningRate 0.000555 Epoch: 13 Global Step: 273200 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:55:44,584-Speed 2499.49 samples/sec Loss 2.9260 LearningRate 0.000555 Epoch: 13 Global Step: 273210 Fp16 Grad Scale: 32768 Required: 127 hours Training: 
2022-07-08 04:55:52,780-Speed 2499.11 samples/sec Loss 3.1006 LearningRate 0.000555 Epoch: 13 Global Step: 273220 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:56:00,982-Speed 2497.78 samples/sec Loss 3.1076 LearningRate 0.000555 Epoch: 13 Global Step: 273230 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:56:09,181-Speed 2498.06 samples/sec Loss 3.0258 LearningRate 0.000555 Epoch: 13 Global Step: 273240 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:56:17,330-Speed 2513.71 samples/sec Loss 2.9138 LearningRate 0.000555 Epoch: 13 Global Step: 273250 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:56:25,527-Speed 2498.69 samples/sec Loss 2.9978 LearningRate 0.000555 Epoch: 13 Global Step: 273260 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:56:33,725-Speed 2498.72 samples/sec Loss 3.0261 LearningRate 0.000555 Epoch: 13 Global Step: 273270 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:56:41,919-Speed 2499.76 samples/sec Loss 2.9360 LearningRate 0.000555 Epoch: 13 Global Step: 273280 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:56:50,112-Speed 2500.00 samples/sec Loss 3.0168 LearningRate 0.000555 Epoch: 13 Global Step: 273290 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:56:58,308-Speed 2499.34 samples/sec Loss 2.9485 LearningRate 0.000555 Epoch: 13 Global Step: 273300 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:57:06,451-Speed 2515.47 samples/sec Loss 2.9560 LearningRate 0.000555 Epoch: 13 Global Step: 273310 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:57:14,645-Speed 2499.69 samples/sec Loss 2.9820 LearningRate 0.000555 Epoch: 13 Global Step: 273320 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 04:57:22,811-Speed 2508.35 samples/sec Loss 3.0763 LearningRate 0.000555 Epoch: 13 Global Step: 273330 Fp16 Grad Scale: 16384 Required: 127 hours 
Training: 2022-07-08 04:57:31,006-Speed 2499.55 samples/sec Loss 3.0019 LearningRate 0.000555 Epoch: 13 Global Step: 273340 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:57:39,203-Speed 2498.83 samples/sec Loss 2.9567 LearningRate 0.000555 Epoch: 13 Global Step: 273350 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:57:47,401-Speed 2498.62 samples/sec Loss 2.9166 LearningRate 0.000555 Epoch: 13 Global Step: 273360 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:57:55,556-Speed 2511.67 samples/sec Loss 3.0168 LearningRate 0.000555 Epoch: 13 Global Step: 273370 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:58:03,751-Speed 2499.53 samples/sec Loss 2.9138 LearningRate 0.000555 Epoch: 13 Global Step: 273380 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:58:11,945-Speed 2500.08 samples/sec Loss 2.9941 LearningRate 0.000555 Epoch: 13 Global Step: 273390 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:58:20,158-Speed 2494.02 samples/sec Loss 3.0185 LearningRate 0.000555 Epoch: 13 Global Step: 273400 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:58:28,363-Speed 2496.33 samples/sec Loss 3.0153 LearningRate 0.000555 Epoch: 13 Global Step: 273410 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:58:36,555-Speed 2500.79 samples/sec Loss 2.9907 LearningRate 0.000555 Epoch: 13 Global Step: 273420 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:58:44,697-Speed 2515.71 samples/sec Loss 3.0312 LearningRate 0.000555 Epoch: 13 Global Step: 273430 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:58:52,900-Speed 2496.85 samples/sec Loss 2.9907 LearningRate 0.000555 Epoch: 13 Global Step: 273440 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:59:01,113-Speed 2494.32 samples/sec Loss 2.9660 LearningRate 0.000555 Epoch: 13 Global Step: 273450 Fp16 Grad Scale: 16384 Required: 127 
hours Training: 2022-07-08 04:59:09,305-Speed 2500.36 samples/sec Loss 2.9596 LearningRate 0.000555 Epoch: 13 Global Step: 273460 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:59:17,499-Speed 2499.69 samples/sec Loss 2.9792 LearningRate 0.000555 Epoch: 13 Global Step: 273470 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:59:25,694-Speed 2499.45 samples/sec Loss 2.9643 LearningRate 0.000555 Epoch: 13 Global Step: 273480 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:59:33,835-Speed 2516.03 samples/sec Loss 2.8928 LearningRate 0.000555 Epoch: 13 Global Step: 273490 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:59:42,086-Speed 2499.62 samples/sec Loss 2.8958 LearningRate 0.000555 Epoch: 13 Global Step: 273500 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:59:50,324-Speed 2499.29 samples/sec Loss 2.9996 LearningRate 0.000555 Epoch: 13 Global Step: 273510 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 04:59:58,531-Speed 2495.71 samples/sec Loss 2.9282 LearningRate 0.000555 Epoch: 13 Global Step: 273520 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:00:06,782-Speed 2497.30 samples/sec Loss 2.9321 LearningRate 0.000555 Epoch: 13 Global Step: 273530 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:00:15,045-Speed 2501.22 samples/sec Loss 2.9922 LearningRate 0.000555 Epoch: 13 Global Step: 273540 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:00:23,276-Speed 2516.16 samples/sec Loss 2.9337 LearningRate 0.000555 Epoch: 13 Global Step: 273550 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:00:32,534-Speed 2212.55 samples/sec Loss 2.9177 LearningRate 0.000555 Epoch: 13 Global Step: 273560 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:00:40,791-Speed 2501.45 samples/sec Loss 2.9812 LearningRate 0.000555 Epoch: 13 Global Step: 273570 Fp16 Grad Scale: 16384 Required: 
127 hours Training: 2022-07-08 05:00:49,040-Speed 2498.79 samples/sec Loss 2.9921 LearningRate 0.000555 Epoch: 13 Global Step: 273580 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:00:57,275-Speed 2497.22 samples/sec Loss 2.9622 LearningRate 0.000555 Epoch: 13 Global Step: 273590 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:01:08,338-Speed 1851.28 samples/sec Loss 3.0247 LearningRate 0.000555 Epoch: 13 Global Step: 273600 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:01:16,550-Speed 2516.86 samples/sec Loss 2.9667 LearningRate 0.000554 Epoch: 13 Global Step: 273610 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:01:28,147-Speed 1774.85 samples/sec Loss 2.9627 LearningRate 0.000554 Epoch: 13 Global Step: 273620 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:01:36,353-Speed 2496.04 samples/sec Loss 2.9791 LearningRate 0.000554 Epoch: 13 Global Step: 273630 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:01:44,659-Speed 2489.48 samples/sec Loss 2.9219 LearningRate 0.000554 Epoch: 13 Global Step: 273640 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:01:52,879-Speed 2501.28 samples/sec Loss 2.9617 LearningRate 0.000554 Epoch: 13 Global Step: 273650 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:02:04,760-Speed 1723.87 samples/sec Loss 2.9744 LearningRate 0.000554 Epoch: 13 Global Step: 273660 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:02:12,945-Speed 2515.73 samples/sec Loss 2.9471 LearningRate 0.000554 Epoch: 13 Global Step: 273670 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:02:21,787-Speed 2501.70 samples/sec Loss 2.9706 LearningRate 0.000554 Epoch: 13 Global Step: 273680 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:02:30,090-Speed 2496.64 samples/sec Loss 2.9618 LearningRate 0.000554 Epoch: 13 Global Step: 273690 Fp16 Grad Scale: 16384 
Required: 127 hours Training: 2022-07-08 05:02:38,286-Speed 2499.01 samples/sec Loss 2.9368 LearningRate 0.000554 Epoch: 13 Global Step: 273700 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:02:46,531-Speed 2500.17 samples/sec Loss 3.0282 LearningRate 0.000554 Epoch: 13 Global Step: 273710 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:02:59,229-Speed 2474.73 samples/sec Loss 3.0015 LearningRate 0.000554 Epoch: 13 Global Step: 273720 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:03:07,395-Speed 2517.60 samples/sec Loss 2.9993 LearningRate 0.000554 Epoch: 13 Global Step: 273730 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:03:15,637-Speed 2499.86 samples/sec Loss 3.0369 LearningRate 0.000554 Epoch: 13 Global Step: 273740 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:03:28,075-Speed 1656.38 samples/sec Loss 3.0065 LearningRate 0.000554 Epoch: 13 Global Step: 273750 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:03:37,029-Speed 2356.14 samples/sec Loss 2.9902 LearningRate 0.000554 Epoch: 13 Global Step: 273760 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:03:45,560-Speed 2410.66 samples/sec Loss 3.0033 LearningRate 0.000554 Epoch: 13 Global Step: 273770 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:03:53,768-Speed 2495.67 samples/sec Loss 2.9828 LearningRate 0.000554 Epoch: 13 Global Step: 273780 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:04:01,915-Speed 2514.11 samples/sec Loss 2.9501 LearningRate 0.000554 Epoch: 13 Global Step: 273790 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:04:10,118-Speed 2496.92 samples/sec Loss 2.9577 LearningRate 0.000554 Epoch: 13 Global Step: 273800 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:04:18,316-Speed 2498.55 samples/sec Loss 2.9189 LearningRate 0.000554 Epoch: 13 Global Step: 273810 Fp16 Grad Scale: 
16384 Required: 127 hours Training: 2022-07-08 05:04:26,514-Speed 2498.69 samples/sec Loss 2.9792 LearningRate 0.000554 Epoch: 13 Global Step: 273820 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:04:34,713-Speed 2498.34 samples/sec Loss 2.9533 LearningRate 0.000554 Epoch: 13 Global Step: 273830 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:04:42,916-Speed 2497.11 samples/sec Loss 2.8694 LearningRate 0.000554 Epoch: 13 Global Step: 273840 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:04:51,059-Speed 2515.72 samples/sec Loss 2.9149 LearningRate 0.000554 Epoch: 13 Global Step: 273850 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:04:59,254-Speed 2499.19 samples/sec Loss 3.0068 LearningRate 0.000554 Epoch: 13 Global Step: 273860 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:05:07,453-Speed 2498.30 samples/sec Loss 2.9686 LearningRate 0.000554 Epoch: 13 Global Step: 273870 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:05:15,654-Speed 2497.83 samples/sec Loss 2.9483 LearningRate 0.000554 Epoch: 13 Global Step: 273880 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:05:23,854-Speed 2497.77 samples/sec Loss 2.9606 LearningRate 0.000554 Epoch: 13 Global Step: 273890 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:05:32,054-Speed 2498.03 samples/sec Loss 2.9493 LearningRate 0.000554 Epoch: 13 Global Step: 273900 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:05:40,201-Speed 2514.34 samples/sec Loss 2.9760 LearningRate 0.000554 Epoch: 13 Global Step: 273910 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:05:48,401-Speed 2498.05 samples/sec Loss 2.9039 LearningRate 0.000554 Epoch: 13 Global Step: 273920 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:05:56,601-Speed 2498.03 samples/sec Loss 3.0220 LearningRate 0.000554 Epoch: 13 Global Step: 273930 Fp16 Grad 
Scale: 16384 Required: 127 hours Training: 2022-07-08 05:06:04,812-Speed 2494.64 samples/sec Loss 2.9688 LearningRate 0.000554 Epoch: 13 Global Step: 273940 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:06:13,009-Speed 2498.97 samples/sec Loss 2.9285 LearningRate 0.000554 Epoch: 13 Global Step: 273950 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:06:21,219-Speed 2495.00 samples/sec Loss 2.9640 LearningRate 0.000554 Epoch: 13 Global Step: 273960 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:06:29,368-Speed 2513.64 samples/sec Loss 2.9678 LearningRate 0.000554 Epoch: 13 Global Step: 273970 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:06:37,568-Speed 2498.04 samples/sec Loss 2.9972 LearningRate 0.000554 Epoch: 13 Global Step: 273980 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:06:45,766-Speed 2498.72 samples/sec Loss 3.0168 LearningRate 0.000554 Epoch: 13 Global Step: 273990 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:06:53,969-Speed 2496.94 samples/sec Loss 3.0359 LearningRate 0.000554 Epoch: 13 Global Step: 274000 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:07:02,169-Speed 2497.91 samples/sec Loss 2.9383 LearningRate 0.000554 Epoch: 13 Global Step: 274010 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:07:10,369-Speed 2498.15 samples/sec Loss 2.9492 LearningRate 0.000554 Epoch: 13 Global Step: 274020 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:07:18,516-Speed 2514.08 samples/sec Loss 2.9503 LearningRate 0.000554 Epoch: 13 Global Step: 274030 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:07:26,720-Speed 2496.57 samples/sec Loss 3.0356 LearningRate 0.000554 Epoch: 13 Global Step: 274040 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:07:34,927-Speed 2495.92 samples/sec Loss 2.9824 LearningRate 0.000554 Epoch: 13 Global Step: 274050 Fp16 
Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:07:43,123-Speed 2499.20 samples/sec Loss 2.9885 LearningRate 0.000554 Epoch: 13 Global Step: 274060 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:07:51,320-Speed 2498.97 samples/sec Loss 2.9113 LearningRate 0.000554 Epoch: 13 Global Step: 274070 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:07:59,516-Speed 2499.32 samples/sec Loss 2.9493 LearningRate 0.000554 Epoch: 13 Global Step: 274080 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:08:07,661-Speed 2514.85 samples/sec Loss 2.8922 LearningRate 0.000554 Epoch: 13 Global Step: 274090 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:08:15,859-Speed 2498.68 samples/sec Loss 2.9386 LearningRate 0.000554 Epoch: 13 Global Step: 274100 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:08:24,065-Speed 2496.33 samples/sec Loss 2.8952 LearningRate 0.000553 Epoch: 13 Global Step: 274110 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:08:32,273-Speed 2495.54 samples/sec Loss 2.9508 LearningRate 0.000553 Epoch: 13 Global Step: 274120 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:08:40,468-Speed 2499.25 samples/sec Loss 2.9524 LearningRate 0.000553 Epoch: 13 Global Step: 274130 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:08:48,667-Speed 2498.21 samples/sec Loss 2.9745 LearningRate 0.000553 Epoch: 13 Global Step: 274140 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:08:56,817-Speed 2513.62 samples/sec Loss 2.9827 LearningRate 0.000553 Epoch: 13 Global Step: 274150 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:09:05,019-Speed 2497.33 samples/sec Loss 2.9632 LearningRate 0.000553 Epoch: 13 Global Step: 274160 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:09:13,214-Speed 2499.81 samples/sec Loss 2.9943 LearningRate 0.000553 Epoch: 13 Global Step: 274170 
Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:09:21,412-Speed 2498.76 samples/sec Loss 2.9298 LearningRate 0.000553 Epoch: 13 Global Step: 274180 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:09:29,610-Speed 2498.37 samples/sec Loss 2.9932 LearningRate 0.000553 Epoch: 13 Global Step: 274190 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:09:37,811-Speed 2497.69 samples/sec Loss 2.9360 LearningRate 0.000553 Epoch: 13 Global Step: 274200 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:09:45,958-Speed 2514.30 samples/sec Loss 3.0230 LearningRate 0.000553 Epoch: 13 Global Step: 274210 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:09:54,160-Speed 2497.22 samples/sec Loss 3.0482 LearningRate 0.000553 Epoch: 13 Global Step: 274220 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:10:02,361-Speed 2497.82 samples/sec Loss 3.0162 LearningRate 0.000553 Epoch: 13 Global Step: 274230 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:10:10,569-Speed 2495.75 samples/sec Loss 3.0410 LearningRate 0.000553 Epoch: 13 Global Step: 274240 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:10:18,768-Speed 2498.42 samples/sec Loss 2.9271 LearningRate 0.000553 Epoch: 13 Global Step: 274250 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:10:26,976-Speed 2495.50 samples/sec Loss 2.9853 LearningRate 0.000553 Epoch: 13 Global Step: 274260 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:10:35,117-Speed 2515.92 samples/sec Loss 2.8828 LearningRate 0.000553 Epoch: 13 Global Step: 274270 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:10:43,317-Speed 2498.07 samples/sec Loss 2.9398 LearningRate 0.000553 Epoch: 13 Global Step: 274280 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:10:51,520-Speed 2497.47 samples/sec Loss 2.9191 LearningRate 0.000553 Epoch: 13 Global Step: 
274290 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:10:59,719-Speed 2498.33 samples/sec Loss 2.9080 LearningRate 0.000553 Epoch: 13 Global Step: 274300 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:11:07,920-Speed 2497.76 samples/sec Loss 2.9733 LearningRate 0.000553 Epoch: 13 Global Step: 274310 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:11:16,129-Speed 2495.13 samples/sec Loss 2.9612 LearningRate 0.000553 Epoch: 13 Global Step: 274320 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:11:24,290-Speed 2509.96 samples/sec Loss 2.9189 LearningRate 0.000553 Epoch: 13 Global Step: 274330 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:11:32,500-Speed 2494.72 samples/sec Loss 2.9759 LearningRate 0.000553 Epoch: 13 Global Step: 274340 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:11:40,705-Speed 2496.44 samples/sec Loss 2.9295 LearningRate 0.000553 Epoch: 13 Global Step: 274350 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:11:48,904-Speed 2498.51 samples/sec Loss 2.9394 LearningRate 0.000553 Epoch: 13 Global Step: 274360 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:11:57,101-Speed 2498.67 samples/sec Loss 2.9739 LearningRate 0.000553 Epoch: 13 Global Step: 274370 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:12:05,329-Speed 2489.52 samples/sec Loss 2.8852 LearningRate 0.000553 Epoch: 13 Global Step: 274380 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:12:13,474-Speed 2514.54 samples/sec Loss 2.9429 LearningRate 0.000553 Epoch: 13 Global Step: 274390 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:12:21,676-Speed 2497.41 samples/sec Loss 2.8686 LearningRate 0.000553 Epoch: 13 Global Step: 274400 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:12:29,875-Speed 2498.20 samples/sec Loss 2.9148 LearningRate 0.000553 Epoch: 13 Global 
Step: 274410 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:12:38,073-Speed 2498.65 samples/sec Loss 2.8994 LearningRate 0.000553 Epoch: 13 Global Step: 274420 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:12:46,279-Speed 2496.37 samples/sec Loss 2.9643 LearningRate 0.000553 Epoch: 13 Global Step: 274430 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:12:54,491-Speed 2494.24 samples/sec Loss 2.9481 LearningRate 0.000553 Epoch: 13 Global Step: 274440 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:13:02,637-Speed 2514.52 samples/sec Loss 2.9776 LearningRate 0.000553 Epoch: 13 Global Step: 274450 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:13:10,835-Speed 2498.61 samples/sec Loss 2.9073 LearningRate 0.000553 Epoch: 13 Global Step: 274460 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:13:19,031-Speed 2499.12 samples/sec Loss 2.9066 LearningRate 0.000553 Epoch: 13 Global Step: 274470 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:13:27,227-Speed 2499.12 samples/sec Loss 2.9516 LearningRate 0.000553 Epoch: 13 Global Step: 274480 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:13:35,424-Speed 2498.94 samples/sec Loss 2.9090 LearningRate 0.000553 Epoch: 13 Global Step: 274490 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:13:43,636-Speed 2494.39 samples/sec Loss 2.9468 LearningRate 0.000553 Epoch: 13 Global Step: 274500 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:13:51,781-Speed 2514.84 samples/sec Loss 3.0290 LearningRate 0.000553 Epoch: 13 Global Step: 274510 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:13:59,981-Speed 2498.15 samples/sec Loss 2.9753 LearningRate 0.000553 Epoch: 13 Global Step: 274520 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:14:08,181-Speed 2498.06 samples/sec Loss 2.9077 LearningRate 0.000553 Epoch: 13 
Global Step: 274530 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:14:16,381-Speed 2497.67 samples/sec Loss 2.9180 LearningRate 0.000553 Epoch: 13 Global Step: 274540 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:14:24,582-Speed 2497.59 samples/sec Loss 2.9618 LearningRate 0.000553 Epoch: 13 Global Step: 274550 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:14:32,784-Speed 2497.49 samples/sec Loss 3.0472 LearningRate 0.000553 Epoch: 13 Global Step: 274560 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:14:40,931-Speed 2514.04 samples/sec Loss 2.8977 LearningRate 0.000553 Epoch: 13 Global Step: 274570 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:14:49,147-Speed 2493.14 samples/sec Loss 2.9851 LearningRate 0.000553 Epoch: 13 Global Step: 274580 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:14:57,353-Speed 2495.90 samples/sec Loss 2.9927 LearningRate 0.000553 Epoch: 13 Global Step: 274590 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:15:05,554-Speed 2497.60 samples/sec Loss 3.0181 LearningRate 0.000553 Epoch: 13 Global Step: 274600 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:15:13,754-Speed 2498.24 samples/sec Loss 2.9076 LearningRate 0.000552 Epoch: 13 Global Step: 274610 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:15:21,957-Speed 2496.95 samples/sec Loss 2.9698 LearningRate 0.000552 Epoch: 13 Global Step: 274620 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:15:30,104-Speed 2514.15 samples/sec Loss 2.9685 LearningRate 0.000552 Epoch: 13 Global Step: 274630 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:15:38,299-Speed 2499.65 samples/sec Loss 2.9100 LearningRate 0.000552 Epoch: 13 Global Step: 274640 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:15:46,499-Speed 2498.19 samples/sec Loss 2.9630 LearningRate 0.000552 
Epoch: 13 Global Step: 274650 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:15:54,697-Speed 2498.47 samples/sec Loss 2.9284 LearningRate 0.000552 Epoch: 13 Global Step: 274660 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:16:02,895-Speed 2498.48 samples/sec Loss 2.9197 LearningRate 0.000552 Epoch: 13 Global Step: 274670 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:16:11,097-Speed 2497.28 samples/sec Loss 2.9461 LearningRate 0.000552 Epoch: 13 Global Step: 274680 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:16:19,250-Speed 2512.53 samples/sec Loss 2.9482 LearningRate 0.000552 Epoch: 13 Global Step: 274690 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:16:27,457-Speed 2495.94 samples/sec Loss 2.9635 LearningRate 0.000552 Epoch: 13 Global Step: 274700 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:16:35,659-Speed 2497.21 samples/sec Loss 2.9925 LearningRate 0.000552 Epoch: 13 Global Step: 274710 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:16:43,864-Speed 2496.47 samples/sec Loss 2.9377 LearningRate 0.000552 Epoch: 13 Global Step: 274720 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:16:52,064-Speed 2497.85 samples/sec Loss 2.9384 LearningRate 0.000552 Epoch: 13 Global Step: 274730 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:17:00,267-Speed 2497.26 samples/sec Loss 2.9820 LearningRate 0.000552 Epoch: 13 Global Step: 274740 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:17:08,418-Speed 2513.01 samples/sec Loss 2.9834 LearningRate 0.000552 Epoch: 13 Global Step: 274750 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:17:16,613-Speed 2499.33 samples/sec Loss 2.9470 LearningRate 0.000552 Epoch: 13 Global Step: 274760 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:17:24,820-Speed 2496.04 samples/sec Loss 2.9894 LearningRate 
0.000552 Epoch: 13 Global Step: 274770 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:17:33,018-Speed 2498.51 samples/sec Loss 2.9767 LearningRate 0.000552 Epoch: 13 Global Step: 274780 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:17:41,211-Speed 2499.83 samples/sec Loss 2.9994 LearningRate 0.000552 Epoch: 13 Global Step: 274790 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:17:49,409-Speed 2498.96 samples/sec Loss 3.0126 LearningRate 0.000552 Epoch: 13 Global Step: 274800 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:17:57,553-Speed 2514.99 samples/sec Loss 2.9863 LearningRate 0.000552 Epoch: 13 Global Step: 274810 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:18:05,751-Speed 2498.60 samples/sec Loss 2.9765 LearningRate 0.000552 Epoch: 13 Global Step: 274820 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:18:13,955-Speed 2496.89 samples/sec Loss 3.0154 LearningRate 0.000552 Epoch: 13 Global Step: 274830 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:18:22,153-Speed 2498.43 samples/sec Loss 2.9439 LearningRate 0.000552 Epoch: 13 Global Step: 274840 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:18:30,354-Speed 2497.76 samples/sec Loss 2.9254 LearningRate 0.000552 Epoch: 13 Global Step: 274850 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:18:38,551-Speed 2498.99 samples/sec Loss 2.9310 LearningRate 0.000552 Epoch: 13 Global Step: 274860 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:18:46,697-Speed 2514.92 samples/sec Loss 2.9485 LearningRate 0.000552 Epoch: 13 Global Step: 274870 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:18:54,905-Speed 2495.55 samples/sec Loss 2.9548 LearningRate 0.000552 Epoch: 13 Global Step: 274880 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:19:03,119-Speed 2493.71 samples/sec Loss 2.8870 
LearningRate 0.000552 Epoch: 13 Global Step: 274890 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:19:11,325-Speed 2496.28 samples/sec Loss 2.9339 LearningRate 0.000552 Epoch: 13 Global Step: 274900 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:19:19,528-Speed 2496.79 samples/sec Loss 2.9085 LearningRate 0.000552 Epoch: 13 Global Step: 274910 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:19:27,726-Speed 2498.87 samples/sec Loss 2.9385 LearningRate 0.000552 Epoch: 13 Global Step: 274920 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:19:35,873-Speed 2514.37 samples/sec Loss 2.9572 LearningRate 0.000552 Epoch: 13 Global Step: 274930 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:19:44,073-Speed 2497.90 samples/sec Loss 2.9188 LearningRate 0.000552 Epoch: 13 Global Step: 274940 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:19:52,276-Speed 2496.88 samples/sec Loss 2.9232 LearningRate 0.000552 Epoch: 13 Global Step: 274950 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:20:00,508-Speed 2488.48 samples/sec Loss 2.9427 LearningRate 0.000552 Epoch: 13 Global Step: 274960 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:20:08,719-Speed 2494.64 samples/sec Loss 2.9528 LearningRate 0.000552 Epoch: 13 Global Step: 274970 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:20:16,915-Speed 2498.86 samples/sec Loss 2.8595 LearningRate 0.000552 Epoch: 13 Global Step: 274980 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:20:25,066-Speed 2513.24 samples/sec Loss 2.9308 LearningRate 0.000552 Epoch: 13 Global Step: 274990 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:20:33,262-Speed 2499.07 samples/sec Loss 2.9817 LearningRate 0.000552 Epoch: 13 Global Step: 275000 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:20:41,458-Speed 2499.03 samples/sec Loss 
2.9472 LearningRate 0.000552 Epoch: 13 Global Step: 275010 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:20:49,663-Speed 2496.47 samples/sec Loss 2.9075 LearningRate 0.000552 Epoch: 13 Global Step: 275020 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:20:57,859-Speed 2499.31 samples/sec Loss 2.9396 LearningRate 0.000552 Epoch: 13 Global Step: 275030 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:21:06,067-Speed 2495.37 samples/sec Loss 2.9051 LearningRate 0.000552 Epoch: 13 Global Step: 275040 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:21:14,214-Speed 2514.15 samples/sec Loss 2.9519 LearningRate 0.000552 Epoch: 13 Global Step: 275050 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:21:22,412-Speed 2498.50 samples/sec Loss 2.9416 LearningRate 0.000552 Epoch: 13 Global Step: 275060 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:21:30,610-Speed 2498.83 samples/sec Loss 2.8921 LearningRate 0.000552 Epoch: 13 Global Step: 275070 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:21:38,813-Speed 2496.90 samples/sec Loss 2.9116 LearningRate 0.000552 Epoch: 13 Global Step: 275080 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:21:47,007-Speed 2499.85 samples/sec Loss 2.9795 LearningRate 0.000552 Epoch: 13 Global Step: 275090 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:21:55,209-Speed 2497.38 samples/sec Loss 3.0285 LearningRate 0.000552 Epoch: 13 Global Step: 275100 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:22:03,352-Speed 2515.26 samples/sec Loss 2.9292 LearningRate 0.000552 Epoch: 13 Global Step: 275110 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:22:11,552-Speed 2498.02 samples/sec Loss 2.9676 LearningRate 0.000551 Epoch: 13 Global Step: 275120 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:22:19,749-Speed 2498.88 samples/sec 
Loss 2.9843 LearningRate 0.000551 Epoch: 13 Global Step: 275130 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:22:27,973-Speed 2490.88 samples/sec Loss 2.9832 LearningRate 0.000551 Epoch: 13 Global Step: 275140 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:22:36,173-Speed 2497.85 samples/sec Loss 3.0139 LearningRate 0.000551 Epoch: 13 Global Step: 275150 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:22:44,384-Speed 2494.73 samples/sec Loss 3.0049 LearningRate 0.000551 Epoch: 13 Global Step: 275160 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:22:52,527-Speed 2515.35 samples/sec Loss 2.9914 LearningRate 0.000551 Epoch: 13 Global Step: 275170 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:23:00,721-Speed 2499.71 samples/sec Loss 2.9813 LearningRate 0.000551 Epoch: 13 Global Step: 275180 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:23:08,922-Speed 2498.12 samples/sec Loss 2.9923 LearningRate 0.000551 Epoch: 13 Global Step: 275190 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:23:17,121-Speed 2498.07 samples/sec Loss 3.0407 LearningRate 0.000551 Epoch: 13 Global Step: 275200 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:23:25,329-Speed 2495.47 samples/sec Loss 3.1035 LearningRate 0.000551 Epoch: 13 Global Step: 275210 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:23:33,526-Speed 2499.32 samples/sec Loss 3.0039 LearningRate 0.000551 Epoch: 13 Global Step: 275220 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:23:41,683-Speed 2511.27 samples/sec Loss 3.0511 LearningRate 0.000551 Epoch: 13 Global Step: 275230 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:23:49,878-Speed 2499.51 samples/sec Loss 2.9695 LearningRate 0.000551 Epoch: 13 Global Step: 275240 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:23:58,079-Speed 2497.59 
samples/sec Loss 2.9479 LearningRate 0.000551 Epoch: 13 Global Step: 275250 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:24:06,287-Speed 2495.53 samples/sec Loss 2.9262 LearningRate 0.000551 Epoch: 13 Global Step: 275260 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:24:14,495-Speed 2495.73 samples/sec Loss 2.9156 LearningRate 0.000551 Epoch: 13 Global Step: 275270 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:24:22,696-Speed 2497.54 samples/sec Loss 2.9491 LearningRate 0.000551 Epoch: 13 Global Step: 275280 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:24:30,849-Speed 2512.39 samples/sec Loss 2.9572 LearningRate 0.000551 Epoch: 13 Global Step: 275290 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:24:39,056-Speed 2496.11 samples/sec Loss 2.9852 LearningRate 0.000551 Epoch: 13 Global Step: 275300 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:24:47,263-Speed 2495.80 samples/sec Loss 3.0055 LearningRate 0.000551 Epoch: 13 Global Step: 275310 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:24:55,463-Speed 2498.21 samples/sec Loss 2.9805 LearningRate 0.000551 Epoch: 13 Global Step: 275320 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:25:03,662-Speed 2498.31 samples/sec Loss 2.9293 LearningRate 0.000551 Epoch: 13 Global Step: 275330 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:25:11,863-Speed 2497.43 samples/sec Loss 2.9678 LearningRate 0.000551 Epoch: 13 Global Step: 275340 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:25:20,009-Speed 2514.72 samples/sec Loss 2.9444 LearningRate 0.000551 Epoch: 13 Global Step: 275350 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:25:28,224-Speed 2493.49 samples/sec Loss 2.9568 LearningRate 0.000551 Epoch: 13 Global Step: 275360 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:25:36,431-Speed 
2495.50 samples/sec Loss 2.9482 LearningRate 0.000551 Epoch: 13 Global Step: 275370 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:25:44,637-Speed 2496.36 samples/sec Loss 2.9531 LearningRate 0.000551 Epoch: 13 Global Step: 275380 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:25:52,841-Speed 2496.55 samples/sec Loss 2.9354 LearningRate 0.000551 Epoch: 13 Global Step: 275390 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:26:01,044-Speed 2497.12 samples/sec Loss 2.9291 LearningRate 0.000551 Epoch: 13 Global Step: 275400 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:26:09,195-Speed 2512.90 samples/sec Loss 2.9827 LearningRate 0.000551 Epoch: 13 Global Step: 275410 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:26:17,394-Speed 2498.78 samples/sec Loss 3.0068 LearningRate 0.000551 Epoch: 13 Global Step: 275420 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:26:25,591-Speed 2498.70 samples/sec Loss 3.0251 LearningRate 0.000551 Epoch: 13 Global Step: 275430 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:26:33,797-Speed 2496.28 samples/sec Loss 2.9847 LearningRate 0.000551 Epoch: 13 Global Step: 275440 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:26:41,995-Speed 2498.49 samples/sec Loss 2.9207 LearningRate 0.000551 Epoch: 13 Global Step: 275450 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:26:50,195-Speed 2497.79 samples/sec Loss 2.9475 LearningRate 0.000551 Epoch: 13 Global Step: 275460 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:26:58,346-Speed 2512.83 samples/sec Loss 3.0103 LearningRate 0.000551 Epoch: 13 Global Step: 275470 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:27:06,555-Speed 2495.26 samples/sec Loss 2.9579 LearningRate 0.000551 Epoch: 13 Global Step: 275480 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 
05:27:14,752-Speed 2498.93 samples/sec Loss 2.9977 LearningRate 0.000551 Epoch: 13 Global Step: 275490 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:27:22,953-Speed 2497.59 samples/sec Loss 2.9552 LearningRate 0.000551 Epoch: 13 Global Step: 275500 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:27:31,154-Speed 2497.65 samples/sec Loss 2.9473 LearningRate 0.000551 Epoch: 13 Global Step: 275510 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:27:39,356-Speed 2497.36 samples/sec Loss 2.9938 LearningRate 0.000551 Epoch: 13 Global Step: 275520 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:27:47,502-Speed 2514.50 samples/sec Loss 3.0121 LearningRate 0.000551 Epoch: 13 Global Step: 275530 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:27:55,713-Speed 2494.55 samples/sec Loss 2.9973 LearningRate 0.000551 Epoch: 13 Global Step: 275540 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:28:03,911-Speed 2498.71 samples/sec Loss 2.9683 LearningRate 0.000551 Epoch: 13 Global Step: 275550 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:28:12,109-Speed 2498.51 samples/sec Loss 2.9822 LearningRate 0.000551 Epoch: 13 Global Step: 275560 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:28:20,316-Speed 2495.77 samples/sec Loss 2.9384 LearningRate 0.000551 Epoch: 13 Global Step: 275570 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:28:28,514-Speed 2498.52 samples/sec Loss 2.9996 LearningRate 0.000551 Epoch: 13 Global Step: 275580 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:28:36,657-Speed 2515.87 samples/sec Loss 3.0299 LearningRate 0.000551 Epoch: 13 Global Step: 275590 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:28:44,869-Speed 2494.28 samples/sec Loss 2.9325 LearningRate 0.000551 Epoch: 13 Global Step: 275600 Fp16 Grad Scale: 32768 Required: 127 hours Training: 
2022-07-08 05:28:53,066-Speed 2499.02 samples/sec Loss 3.0187 LearningRate 0.000551 Epoch: 13 Global Step: 275610 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:29:01,263-Speed 2498.97 samples/sec Loss 3.0020 LearningRate 0.000550 Epoch: 13 Global Step: 275620 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:29:09,457-Speed 2499.53 samples/sec Loss 3.0070 LearningRate 0.000550 Epoch: 13 Global Step: 275630 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:29:17,657-Speed 2498.50 samples/sec Loss 3.0604 LearningRate 0.000550 Epoch: 13 Global Step: 275640 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:29:25,806-Speed 2513.62 samples/sec Loss 3.0116 LearningRate 0.000550 Epoch: 13 Global Step: 275650 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:29:34,005-Speed 2498.20 samples/sec Loss 2.9836 LearningRate 0.000550 Epoch: 13 Global Step: 275660 Fp16 Grad Scale: 32768 Required: 127 hours Training: 2022-07-08 05:29:42,163-Speed 2510.67 samples/sec Loss 2.9977 LearningRate 0.000550 Epoch: 13 Global Step: 275670 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:29:50,369-Speed 2496.13 samples/sec Loss 2.9571 LearningRate 0.000550 Epoch: 13 Global Step: 275680 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:29:58,567-Speed 2498.37 samples/sec Loss 2.9729 LearningRate 0.000550 Epoch: 13 Global Step: 275690 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:30:06,763-Speed 2499.34 samples/sec Loss 2.9323 LearningRate 0.000550 Epoch: 13 Global Step: 275700 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:30:14,913-Speed 2512.94 samples/sec Loss 2.9222 LearningRate 0.000550 Epoch: 13 Global Step: 275710 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:30:23,108-Speed 2499.51 samples/sec Loss 2.9337 LearningRate 0.000550 Epoch: 13 Global Step: 275720 Fp16 Grad Scale: 16384 Required: 127 hours 
Training: 2022-07-08 05:30:31,309-Speed 2497.66 samples/sec Loss 2.9432 LearningRate 0.000550 Epoch: 13 Global Step: 275730 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:30:39,505-Speed 2499.18 samples/sec Loss 2.9141 LearningRate 0.000550 Epoch: 13 Global Step: 275740 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:30:47,713-Speed 2496.16 samples/sec Loss 2.9068 LearningRate 0.000550 Epoch: 13 Global Step: 275750 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:30:55,909-Speed 2499.16 samples/sec Loss 2.9739 LearningRate 0.000550 Epoch: 13 Global Step: 275760 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:31:04,054-Speed 2514.52 samples/sec Loss 2.9780 LearningRate 0.000550 Epoch: 13 Global Step: 275770 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:31:12,262-Speed 2495.91 samples/sec Loss 2.9482 LearningRate 0.000550 Epoch: 13 Global Step: 275780 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:31:20,458-Speed 2498.99 samples/sec Loss 2.9323 LearningRate 0.000550 Epoch: 13 Global Step: 275790 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:31:28,660-Speed 2497.47 samples/sec Loss 2.9685 LearningRate 0.000550 Epoch: 13 Global Step: 275800 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:31:36,854-Speed 2499.83 samples/sec Loss 2.9772 LearningRate 0.000550 Epoch: 13 Global Step: 275810 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:31:45,063-Speed 2495.15 samples/sec Loss 2.9605 LearningRate 0.000550 Epoch: 13 Global Step: 275820 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:31:53,207-Speed 2515.23 samples/sec Loss 2.9128 LearningRate 0.000550 Epoch: 13 Global Step: 275830 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:32:01,408-Speed 2497.73 samples/sec Loss 2.9342 LearningRate 0.000550 Epoch: 13 Global Step: 275840 Fp16 Grad Scale: 16384 Required: 127 
hours Training: 2022-07-08 05:32:09,616-Speed 2495.26 samples/sec Loss 2.9662 LearningRate 0.000550 Epoch: 13 Global Step: 275850 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:32:17,814-Speed 2498.46 samples/sec Loss 2.9786 LearningRate 0.000550 Epoch: 13 Global Step: 275860 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:32:26,010-Speed 2499.15 samples/sec Loss 2.9779 LearningRate 0.000550 Epoch: 13 Global Step: 275870 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:32:34,206-Speed 2499.36 samples/sec Loss 2.9313 LearningRate 0.000550 Epoch: 13 Global Step: 275880 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:32:42,351-Speed 2514.95 samples/sec Loss 2.9853 LearningRate 0.000550 Epoch: 13 Global Step: 275890 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:32:50,554-Speed 2496.79 samples/sec Loss 2.9090 LearningRate 0.000550 Epoch: 13 Global Step: 275900 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:32:58,764-Speed 2495.06 samples/sec Loss 2.9334 LearningRate 0.000550 Epoch: 13 Global Step: 275910 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:33:06,960-Speed 2499.14 samples/sec Loss 2.9624 LearningRate 0.000550 Epoch: 13 Global Step: 275920 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:33:15,158-Speed 2498.64 samples/sec Loss 2.9421 LearningRate 0.000550 Epoch: 13 Global Step: 275930 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:33:23,356-Speed 2498.51 samples/sec Loss 2.9658 LearningRate 0.000550 Epoch: 13 Global Step: 275940 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:33:31,504-Speed 2513.97 samples/sec Loss 2.9291 LearningRate 0.000550 Epoch: 13 Global Step: 275950 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:33:39,703-Speed 2498.36 samples/sec Loss 2.9758 LearningRate 0.000550 Epoch: 13 Global Step: 275960 Fp16 Grad Scale: 16384 Required: 
127 hours Training: 2022-07-08 05:33:47,900-Speed 2498.79 samples/sec Loss 2.9757 LearningRate 0.000550 Epoch: 13 Global Step: 275970 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:33:56,098-Speed 2498.55 samples/sec Loss 3.0108 LearningRate 0.000550 Epoch: 13 Global Step: 275980 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:34:04,299-Speed 2497.95 samples/sec Loss 2.9675 LearningRate 0.000550 Epoch: 13 Global Step: 275990 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:34:12,499-Speed 2497.97 samples/sec Loss 2.9733 LearningRate 0.000550 Epoch: 13 Global Step: 276000 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:34:20,642-Speed 2515.22 samples/sec Loss 2.9318 LearningRate 0.000550 Epoch: 13 Global Step: 276010 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:34:28,840-Speed 2498.83 samples/sec Loss 2.9428 LearningRate 0.000550 Epoch: 13 Global Step: 276020 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:34:37,035-Speed 2499.49 samples/sec Loss 2.9910 LearningRate 0.000550 Epoch: 13 Global Step: 276030 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:34:45,233-Speed 2498.16 samples/sec Loss 2.9164 LearningRate 0.000550 Epoch: 13 Global Step: 276040 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:34:53,428-Speed 2499.75 samples/sec Loss 2.8942 LearningRate 0.000550 Epoch: 13 Global Step: 276050 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:35:01,627-Speed 2498.04 samples/sec Loss 2.9245 LearningRate 0.000550 Epoch: 13 Global Step: 276060 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:35:09,776-Speed 2513.66 samples/sec Loss 2.9882 LearningRate 0.000550 Epoch: 13 Global Step: 276070 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:35:17,987-Speed 2494.75 samples/sec Loss 2.9694 LearningRate 0.000550 Epoch: 13 Global Step: 276080 Fp16 Grad Scale: 16384 
Required: 127 hours Training: 2022-07-08 05:35:26,191-Speed 2496.74 samples/sec Loss 2.9092 LearningRate 0.000550 Epoch: 13 Global Step: 276090 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:35:34,390-Speed 2498.40 samples/sec Loss 2.9392 LearningRate 0.000550 Epoch: 13 Global Step: 276100 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:35:42,602-Speed 2494.50 samples/sec Loss 2.9850 LearningRate 0.000550 Epoch: 13 Global Step: 276110 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:35:50,799-Speed 2498.64 samples/sec Loss 2.9396 LearningRate 0.000549 Epoch: 13 Global Step: 276120 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:35:58,944-Speed 2514.68 samples/sec Loss 2.9369 LearningRate 0.000549 Epoch: 13 Global Step: 276130 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:36:07,142-Speed 2498.79 samples/sec Loss 3.0569 LearningRate 0.000549 Epoch: 13 Global Step: 276140 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:36:15,338-Speed 2499.18 samples/sec Loss 2.9605 LearningRate 0.000549 Epoch: 13 Global Step: 276150 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:36:23,536-Speed 2498.56 samples/sec Loss 2.9258 LearningRate 0.000549 Epoch: 13 Global Step: 276160 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:36:31,740-Speed 2496.54 samples/sec Loss 2.9849 LearningRate 0.000549 Epoch: 13 Global Step: 276170 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:36:39,940-Speed 2498.67 samples/sec Loss 2.9707 LearningRate 0.000549 Epoch: 13 Global Step: 276180 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:36:48,085-Speed 2514.88 samples/sec Loss 2.9931 LearningRate 0.000549 Epoch: 13 Global Step: 276190 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:36:56,283-Speed 2498.76 samples/sec Loss 2.9586 LearningRate 0.000549 Epoch: 13 Global Step: 276200 Fp16 Grad Scale: 
16384 Required: 127 hours Training: 2022-07-08 05:37:04,490-Speed 2495.63 samples/sec Loss 2.9536 LearningRate 0.000549 Epoch: 13 Global Step: 276210 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:37:12,688-Speed 2498.92 samples/sec Loss 2.9475 LearningRate 0.000549 Epoch: 13 Global Step: 276220 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:37:20,885-Speed 2498.85 samples/sec Loss 2.9242 LearningRate 0.000549 Epoch: 13 Global Step: 276230 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:37:29,087-Speed 2497.33 samples/sec Loss 2.9710 LearningRate 0.000549 Epoch: 13 Global Step: 276240 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:37:37,231-Speed 2515.07 samples/sec Loss 2.9517 LearningRate 0.000549 Epoch: 13 Global Step: 276250 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:37:45,431-Speed 2498.20 samples/sec Loss 3.0012 LearningRate 0.000549 Epoch: 13 Global Step: 276260 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:37:53,627-Speed 2499.24 samples/sec Loss 2.9343 LearningRate 0.000549 Epoch: 13 Global Step: 276270 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:38:01,825-Speed 2498.47 samples/sec Loss 2.9134 LearningRate 0.000549 Epoch: 13 Global Step: 276280 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:38:10,024-Speed 2498.05 samples/sec Loss 2.9312 LearningRate 0.000549 Epoch: 13 Global Step: 276290 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:38:18,222-Speed 2498.81 samples/sec Loss 2.9126 LearningRate 0.000549 Epoch: 13 Global Step: 276300 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:38:26,364-Speed 2516.09 samples/sec Loss 2.9681 LearningRate 0.000549 Epoch: 13 Global Step: 276310 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:38:34,564-Speed 2498.10 samples/sec Loss 2.9277 LearningRate 0.000549 Epoch: 13 Global Step: 276320 Fp16 Grad 
Scale: 16384 Required: 127 hours Training: 2022-07-08 05:38:42,774-Speed 2495.06 samples/sec Loss 3.0010 LearningRate 0.000549 Epoch: 13 Global Step: 276330 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:38:50,971-Speed 2498.63 samples/sec Loss 2.9728 LearningRate 0.000549 Epoch: 13 Global Step: 276340 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:38:59,176-Speed 2496.43 samples/sec Loss 2.9894 LearningRate 0.000549 Epoch: 13 Global Step: 276350 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:39:07,379-Speed 2497.07 samples/sec Loss 2.9135 LearningRate 0.000549 Epoch: 13 Global Step: 276360 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:39:15,528-Speed 2513.68 samples/sec Loss 2.9850 LearningRate 0.000549 Epoch: 13 Global Step: 276370 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:39:23,727-Speed 2498.19 samples/sec Loss 2.9773 LearningRate 0.000549 Epoch: 13 Global Step: 276380 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:39:31,933-Speed 2496.18 samples/sec Loss 2.9073 LearningRate 0.000549 Epoch: 13 Global Step: 276390 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:39:40,129-Speed 2499.23 samples/sec Loss 2.9569 LearningRate 0.000549 Epoch: 13 Global Step: 276400 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:39:48,326-Speed 2498.87 samples/sec Loss 3.0126 LearningRate 0.000549 Epoch: 13 Global Step: 276410 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:39:56,523-Speed 2498.96 samples/sec Loss 2.9022 LearningRate 0.000549 Epoch: 13 Global Step: 276420 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:40:04,677-Speed 2511.98 samples/sec Loss 2.9242 LearningRate 0.000549 Epoch: 13 Global Step: 276430 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:40:12,882-Speed 2496.62 samples/sec Loss 2.9529 LearningRate 0.000549 Epoch: 13 Global Step: 276440 Fp16 
Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:40:21,090-Speed 2495.85 samples/sec Loss 2.9602 LearningRate 0.000549 Epoch: 13 Global Step: 276450 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:40:29,300-Speed 2494.54 samples/sec Loss 2.9922 LearningRate 0.000549 Epoch: 13 Global Step: 276460 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:40:37,500-Speed 2498.08 samples/sec Loss 2.9643 LearningRate 0.000549 Epoch: 13 Global Step: 276470 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:40:45,699-Speed 2498.32 samples/sec Loss 2.9727 LearningRate 0.000549 Epoch: 13 Global Step: 276480 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:40:53,851-Speed 2512.74 samples/sec Loss 2.9172 LearningRate 0.000549 Epoch: 13 Global Step: 276490 Fp16 Grad Scale: 16384 Required: 127 hours Training: 2022-07-08 05:41:02,050-Speed 2498.22 samples/sec Loss 2.9284 LearningRate 0.000549 Epoch: 13 Global Step: 276500 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:41:10,253-Speed 2496.89 samples/sec Loss 2.9830 LearningRate 0.000549 Epoch: 13 Global Step: 276510 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:41:18,449-Speed 2499.20 samples/sec Loss 3.0117 LearningRate 0.000549 Epoch: 13 Global Step: 276520 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:41:26,649-Speed 2497.81 samples/sec Loss 2.9502 LearningRate 0.000549 Epoch: 13 Global Step: 276530 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:41:34,847-Speed 2498.78 samples/sec Loss 2.9475 LearningRate 0.000549 Epoch: 13 Global Step: 276540 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:41:42,996-Speed 2513.61 samples/sec Loss 2.9131 LearningRate 0.000549 Epoch: 13 Global Step: 276550 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:41:51,200-Speed 2497.01 samples/sec Loss 2.9421 LearningRate 0.000549 Epoch: 13 Global Step: 276560 
Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:41:59,392-Speed 2500.20 samples/sec Loss 3.0232 LearningRate 0.000549 Epoch: 13 Global Step: 276570 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:42:07,590-Speed 2498.57 samples/sec Loss 2.9567 LearningRate 0.000549 Epoch: 13 Global Step: 276580 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:42:15,797-Speed 2496.00 samples/sec Loss 2.9878 LearningRate 0.000549 Epoch: 13 Global Step: 276590 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:42:23,998-Speed 2497.78 samples/sec Loss 2.9873 LearningRate 0.000549 Epoch: 13 Global Step: 276600 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:42:32,139-Speed 2515.84 samples/sec Loss 3.0169 LearningRate 0.000549 Epoch: 13 Global Step: 276610 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:42:40,334-Speed 2499.51 samples/sec Loss 2.9712 LearningRate 0.000549 Epoch: 13 Global Step: 276620 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:42:48,534-Speed 2497.88 samples/sec Loss 2.9198 LearningRate 0.000548 Epoch: 13 Global Step: 276630 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:42:56,730-Speed 2499.22 samples/sec Loss 2.9792 LearningRate 0.000548 Epoch: 13 Global Step: 276640 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:43:04,934-Speed 2496.79 samples/sec Loss 2.9434 LearningRate 0.000548 Epoch: 13 Global Step: 276650 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:43:13,130-Speed 2499.13 samples/sec Loss 2.9473 LearningRate 0.000548 Epoch: 13 Global Step: 276660 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:43:21,277-Speed 2514.18 samples/sec Loss 2.9623 LearningRate 0.000548 Epoch: 13 Global Step: 276670 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:43:29,479-Speed 2497.55 samples/sec Loss 2.9557 LearningRate 0.000548 Epoch: 13 Global Step: 
276680 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:43:37,676-Speed 2498.69 samples/sec Loss 2.9514 LearningRate 0.000548 Epoch: 13 Global Step: 276690 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:43:45,875-Speed 2498.57 samples/sec Loss 2.9203 LearningRate 0.000548 Epoch: 13 Global Step: 276700 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:43:54,097-Speed 2491.29 samples/sec Loss 2.9659 LearningRate 0.000548 Epoch: 13 Global Step: 276710 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:44:02,291-Speed 2499.62 samples/sec Loss 2.9892 LearningRate 0.000548 Epoch: 13 Global Step: 276720 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:44:10,440-Speed 2513.72 samples/sec Loss 3.0869 LearningRate 0.000548 Epoch: 13 Global Step: 276730 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:44:18,645-Speed 2496.70 samples/sec Loss 2.9400 LearningRate 0.000548 Epoch: 13 Global Step: 276740 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:44:26,841-Speed 2499.14 samples/sec Loss 2.9695 LearningRate 0.000548 Epoch: 13 Global Step: 276750 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:44:35,051-Speed 2494.80 samples/sec Loss 2.9936 LearningRate 0.000548 Epoch: 13 Global Step: 276760 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:44:43,247-Speed 2499.36 samples/sec Loss 3.0014 LearningRate 0.000548 Epoch: 13 Global Step: 276770 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:44:51,444-Speed 2498.88 samples/sec Loss 2.9610 LearningRate 0.000548 Epoch: 13 Global Step: 276780 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:44:59,606-Speed 2509.52 samples/sec Loss 2.9424 LearningRate 0.000548 Epoch: 13 Global Step: 276790 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:45:07,803-Speed 2498.83 samples/sec Loss 2.9590 LearningRate 0.000548 Epoch: 13 Global 
Step: 276800 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:45:16,014-Speed 2494.64 samples/sec Loss 3.0328 LearningRate 0.000548 Epoch: 13 Global Step: 276810 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:45:24,211-Speed 2498.57 samples/sec Loss 2.9757 LearningRate 0.000548 Epoch: 13 Global Step: 276820 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:45:32,410-Speed 2498.36 samples/sec Loss 2.9475 LearningRate 0.000548 Epoch: 13 Global Step: 276830 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:45:40,610-Speed 2498.11 samples/sec Loss 3.0404 LearningRate 0.000548 Epoch: 13 Global Step: 276840 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:45:48,757-Speed 2514.38 samples/sec Loss 3.0375 LearningRate 0.000548 Epoch: 13 Global Step: 276850 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:45:56,956-Speed 2498.33 samples/sec Loss 3.0512 LearningRate 0.000548 Epoch: 13 Global Step: 276860 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:46:05,153-Speed 2498.85 samples/sec Loss 3.0718 LearningRate 0.000548 Epoch: 13 Global Step: 276870 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:46:13,353-Speed 2497.99 samples/sec Loss 3.0432 LearningRate 0.000548 Epoch: 13 Global Step: 276880 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:46:21,552-Speed 2498.09 samples/sec Loss 3.1049 LearningRate 0.000548 Epoch: 13 Global Step: 276890 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:46:29,749-Speed 2498.70 samples/sec Loss 3.1265 LearningRate 0.000548 Epoch: 13 Global Step: 276900 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:46:37,894-Speed 2514.99 samples/sec Loss 3.0853 LearningRate 0.000548 Epoch: 13 Global Step: 276910 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:46:46,111-Speed 2492.90 samples/sec Loss 3.0667 LearningRate 0.000548 Epoch: 13 
Global Step: 276920 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:46:54,306-Speed 2499.67 samples/sec Loss 3.0651 LearningRate 0.000548 Epoch: 13 Global Step: 276930 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:47:02,504-Speed 2498.87 samples/sec Loss 3.0517 LearningRate 0.000548 Epoch: 13 Global Step: 276940 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:47:10,705-Speed 2497.67 samples/sec Loss 2.9718 LearningRate 0.000548 Epoch: 13 Global Step: 276950 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:47:18,902-Speed 2498.59 samples/sec Loss 2.9736 LearningRate 0.000548 Epoch: 13 Global Step: 276960 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:47:27,046-Speed 2515.37 samples/sec Loss 2.9981 LearningRate 0.000548 Epoch: 13 Global Step: 276970 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:47:35,251-Speed 2496.47 samples/sec Loss 2.9457 LearningRate 0.000548 Epoch: 13 Global Step: 276980 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:47:43,448-Speed 2498.72 samples/sec Loss 2.9708 LearningRate 0.000548 Epoch: 13 Global Step: 276990 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:47:51,651-Speed 2497.07 samples/sec Loss 2.9222 LearningRate 0.000548 Epoch: 13 Global Step: 277000 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:47:59,875-Speed 2490.70 samples/sec Loss 2.9631 LearningRate 0.000548 Epoch: 13 Global Step: 277010 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:48:08,073-Speed 2498.53 samples/sec Loss 2.9371 LearningRate 0.000548 Epoch: 13 Global Step: 277020 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:48:16,219-Speed 2514.25 samples/sec Loss 2.9686 LearningRate 0.000548 Epoch: 13 Global Step: 277030 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:48:24,417-Speed 2498.57 samples/sec Loss 2.9764 LearningRate 0.000548 
Epoch: 13 Global Step: 277040 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:48:32,615-Speed 2498.65 samples/sec Loss 2.9868 LearningRate 0.000548 Epoch: 13 Global Step: 277050 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:48:40,816-Speed 2497.80 samples/sec Loss 2.9203 LearningRate 0.000548 Epoch: 13 Global Step: 277060 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:48:49,012-Speed 2499.06 samples/sec Loss 2.9270 LearningRate 0.000548 Epoch: 13 Global Step: 277070 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:48:57,213-Speed 2497.50 samples/sec Loss 2.9528 LearningRate 0.000548 Epoch: 13 Global Step: 277080 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:49:05,356-Speed 2515.43 samples/sec Loss 2.9531 LearningRate 0.000548 Epoch: 13 Global Step: 277090 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:49:13,584-Speed 2489.33 samples/sec Loss 2.9978 LearningRate 0.000548 Epoch: 13 Global Step: 277100 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:49:21,799-Speed 2493.55 samples/sec Loss 2.9275 LearningRate 0.000548 Epoch: 13 Global Step: 277110 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:49:30,000-Speed 2497.78 samples/sec Loss 2.9374 LearningRate 0.000548 Epoch: 13 Global Step: 277120 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:49:38,197-Speed 2498.73 samples/sec Loss 2.9286 LearningRate 0.000547 Epoch: 13 Global Step: 277130 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:49:46,397-Speed 2498.64 samples/sec Loss 2.9138 LearningRate 0.000547 Epoch: 13 Global Step: 277140 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:49:54,539-Speed 2515.76 samples/sec Loss 2.9714 LearningRate 0.000547 Epoch: 13 Global Step: 277150 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:50:02,737-Speed 2498.56 samples/sec Loss 2.9582 LearningRate 
0.000547 Epoch: 13 Global Step: 277160 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:50:10,934-Speed 2498.70 samples/sec Loss 2.9265 LearningRate 0.000547 Epoch: 13 Global Step: 277170 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:50:19,147-Speed 2494.20 samples/sec Loss 2.9493 LearningRate 0.000547 Epoch: 13 Global Step: 277180 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:50:27,345-Speed 2498.45 samples/sec Loss 2.9254 LearningRate 0.000547 Epoch: 13 Global Step: 277190 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:50:35,544-Speed 2498.12 samples/sec Loss 2.8897 LearningRate 0.000547 Epoch: 13 Global Step: 277200 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:50:43,691-Speed 2514.35 samples/sec Loss 2.9761 LearningRate 0.000547 Epoch: 13 Global Step: 277210 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:50:51,890-Speed 2498.34 samples/sec Loss 2.9686 LearningRate 0.000547 Epoch: 13 Global Step: 277220 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:51:00,086-Speed 2499.03 samples/sec Loss 2.9769 LearningRate 0.000547 Epoch: 13 Global Step: 277230 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:51:08,286-Speed 2498.08 samples/sec Loss 2.9433 LearningRate 0.000547 Epoch: 13 Global Step: 277240 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:51:16,483-Speed 2498.69 samples/sec Loss 2.8983 LearningRate 0.000547 Epoch: 13 Global Step: 277250 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:51:24,682-Speed 2498.25 samples/sec Loss 2.8871 LearningRate 0.000547 Epoch: 13 Global Step: 277260 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:51:32,834-Speed 2512.74 samples/sec Loss 2.9975 LearningRate 0.000547 Epoch: 13 Global Step: 277270 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:51:41,032-Speed 2498.70 samples/sec Loss 2.9392 
LearningRate 0.000547 Epoch: 13 Global Step: 277280 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:51:49,229-Speed 2498.64 samples/sec Loss 2.9476 LearningRate 0.000547 Epoch: 13 Global Step: 277290 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:51:57,429-Speed 2498.07 samples/sec Loss 2.9436 LearningRate 0.000547 Epoch: 13 Global Step: 277300 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:52:05,652-Speed 2490.81 samples/sec Loss 2.9317 LearningRate 0.000547 Epoch: 13 Global Step: 277310 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:52:13,845-Speed 2499.83 samples/sec Loss 2.8991 LearningRate 0.000547 Epoch: 13 Global Step: 277320 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:52:21,994-Speed 2513.88 samples/sec Loss 3.0126 LearningRate 0.000547 Epoch: 13 Global Step: 277330 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:52:30,193-Speed 2498.43 samples/sec Loss 2.8987 LearningRate 0.000547 Epoch: 13 Global Step: 277340 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:52:38,394-Speed 2497.35 samples/sec Loss 2.9162 LearningRate 0.000547 Epoch: 13 Global Step: 277350 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:52:46,592-Speed 2499.56 samples/sec Loss 2.9701 LearningRate 0.000547 Epoch: 13 Global Step: 277360 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:52:54,788-Speed 2499.23 samples/sec Loss 2.9605 LearningRate 0.000547 Epoch: 13 Global Step: 277370 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:53:02,999-Speed 2494.51 samples/sec Loss 2.9085 LearningRate 0.000547 Epoch: 13 Global Step: 277380 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:53:11,142-Speed 2515.60 samples/sec Loss 2.8906 LearningRate 0.000547 Epoch: 13 Global Step: 277390 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:53:19,341-Speed 2498.22 samples/sec Loss 
2.9466 LearningRate 0.000547 Epoch: 13 Global Step: 277400 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:53:27,534-Speed 2500.00 samples/sec Loss 2.9975 LearningRate 0.000547 Epoch: 13 Global Step: 277410 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:53:35,728-Speed 2499.57 samples/sec Loss 2.9388 LearningRate 0.000547 Epoch: 13 Global Step: 277420 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:53:43,946-Speed 2492.57 samples/sec Loss 2.9494 LearningRate 0.000547 Epoch: 13 Global Step: 277430 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:53:52,146-Speed 2497.93 samples/sec Loss 2.9378 LearningRate 0.000547 Epoch: 13 Global Step: 277440 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:54:00,305-Speed 2510.74 samples/sec Loss 2.9332 LearningRate 0.000547 Epoch: 13 Global Step: 277450 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:54:08,507-Speed 2497.32 samples/sec Loss 2.9370 LearningRate 0.000547 Epoch: 13 Global Step: 277460 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:54:16,712-Speed 2496.32 samples/sec Loss 3.0131 LearningRate 0.000547 Epoch: 13 Global Step: 277470 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:54:24,908-Speed 2499.38 samples/sec Loss 2.9600 LearningRate 0.000547 Epoch: 13 Global Step: 277480 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:54:33,109-Speed 2497.86 samples/sec Loss 2.9199 LearningRate 0.000547 Epoch: 13 Global Step: 277490 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:54:41,305-Speed 2499.06 samples/sec Loss 2.9118 LearningRate 0.000547 Epoch: 13 Global Step: 277500 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:54:49,459-Speed 2511.89 samples/sec Loss 2.9326 LearningRate 0.000547 Epoch: 13 Global Step: 277510 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:54:57,657-Speed 2498.48 samples/sec 
Loss 2.9597 LearningRate 0.000547 Epoch: 13 Global Step: 277520 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:55:05,855-Speed 2498.69 samples/sec Loss 2.9684 LearningRate 0.000547 Epoch: 13 Global Step: 277530 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:55:14,055-Speed 2497.89 samples/sec Loss 2.9391 LearningRate 0.000547 Epoch: 13 Global Step: 277540 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:55:22,254-Speed 2498.10 samples/sec Loss 2.9543 LearningRate 0.000547 Epoch: 13 Global Step: 277550 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:55:30,455-Speed 2497.74 samples/sec Loss 2.8664 LearningRate 0.000547 Epoch: 13 Global Step: 277560 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:55:38,602-Speed 2514.25 samples/sec Loss 2.9178 LearningRate 0.000547 Epoch: 13 Global Step: 277570 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:55:46,797-Speed 2499.35 samples/sec Loss 2.8885 LearningRate 0.000547 Epoch: 13 Global Step: 277580 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:55:54,994-Speed 2498.83 samples/sec Loss 2.9224 LearningRate 0.000547 Epoch: 13 Global Step: 277590 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:56:03,194-Speed 2498.48 samples/sec Loss 2.9102 LearningRate 0.000547 Epoch: 13 Global Step: 277600 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:56:11,391-Speed 2498.65 samples/sec Loss 2.8724 LearningRate 0.000547 Epoch: 13 Global Step: 277610 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:56:19,590-Speed 2498.54 samples/sec Loss 2.9178 LearningRate 0.000547 Epoch: 13 Global Step: 277620 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:56:27,736-Speed 2514.47 samples/sec Loss 2.9428 LearningRate 0.000546 Epoch: 13 Global Step: 277630 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:56:35,938-Speed 2497.51 
samples/sec Loss 2.8578 LearningRate 0.000546 Epoch: 13 Global Step: 277640 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:56:44,135-Speed 2498.59 samples/sec Loss 2.9900 LearningRate 0.000546 Epoch: 13 Global Step: 277650 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:56:52,332-Speed 2499.09 samples/sec Loss 2.9460 LearningRate 0.000546 Epoch: 13 Global Step: 277660 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:57:00,526-Speed 2499.75 samples/sec Loss 2.9628 LearningRate 0.000546 Epoch: 13 Global Step: 277670 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:57:08,737-Speed 2494.59 samples/sec Loss 2.9646 LearningRate 0.000546 Epoch: 13 Global Step: 277680 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:57:16,901-Speed 2508.94 samples/sec Loss 3.0007 LearningRate 0.000546 Epoch: 13 Global Step: 277690 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:57:25,101-Speed 2498.06 samples/sec Loss 2.9390 LearningRate 0.000546 Epoch: 13 Global Step: 277700 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:57:33,300-Speed 2498.45 samples/sec Loss 2.9030 LearningRate 0.000546 Epoch: 13 Global Step: 277710 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:57:41,515-Speed 2493.23 samples/sec Loss 2.9380 LearningRate 0.000546 Epoch: 13 Global Step: 277720 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:57:49,726-Speed 2494.64 samples/sec Loss 2.9531 LearningRate 0.000546 Epoch: 13 Global Step: 277730 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:57:57,927-Speed 2497.80 samples/sec Loss 3.0086 LearningRate 0.000546 Epoch: 13 Global Step: 277740 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:58:06,073-Speed 2514.48 samples/sec Loss 2.9801 LearningRate 0.000546 Epoch: 13 Global Step: 277750 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:58:14,272-Speed 
2498.45 samples/sec Loss 2.9592 LearningRate 0.000546 Epoch: 13 Global Step: 277760 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:58:22,478-Speed 2495.90 samples/sec Loss 2.9583 LearningRate 0.000546 Epoch: 13 Global Step: 277770 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:58:30,678-Speed 2497.95 samples/sec Loss 2.9330 LearningRate 0.000546 Epoch: 13 Global Step: 277780 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 05:58:38,833-Speed 2511.70 samples/sec Loss 2.9281 LearningRate 0.000546 Epoch: 13 Global Step: 277790 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:58:47,033-Speed 2498.18 samples/sec Loss 2.9054 LearningRate 0.000546 Epoch: 13 Global Step: 277800 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:58:55,176-Speed 2515.52 samples/sec Loss 2.9642 LearningRate 0.000546 Epoch: 13 Global Step: 277810 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:59:03,369-Speed 2500.14 samples/sec Loss 2.9790 LearningRate 0.000546 Epoch: 13 Global Step: 277820 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:59:11,570-Speed 2497.65 samples/sec Loss 2.9611 LearningRate 0.000546 Epoch: 13 Global Step: 277830 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:59:19,766-Speed 2499.10 samples/sec Loss 2.9737 LearningRate 0.000546 Epoch: 13 Global Step: 277840 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:59:27,968-Speed 2497.71 samples/sec Loss 2.9482 LearningRate 0.000546 Epoch: 13 Global Step: 277850 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:59:36,172-Speed 2497.02 samples/sec Loss 2.9892 LearningRate 0.000546 Epoch: 13 Global Step: 277860 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 05:59:44,322-Speed 2513.20 samples/sec Loss 2.9318 LearningRate 0.000546 Epoch: 13 Global Step: 277870 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 
05:59:52,520-Speed 2498.65 samples/sec Loss 2.9515 LearningRate 0.000546 Epoch: 13 Global Step: 277880 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:00:00,731-Speed 2494.59 samples/sec Loss 2.9185 LearningRate 0.000546 Epoch: 13 Global Step: 277890 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:00:08,929-Speed 2498.31 samples/sec Loss 2.9610 LearningRate 0.000546 Epoch: 13 Global Step: 277900 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:00:17,129-Speed 2498.27 samples/sec Loss 2.9393 LearningRate 0.000546 Epoch: 13 Global Step: 277910 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:00:25,326-Speed 2499.05 samples/sec Loss 2.8876 LearningRate 0.000546 Epoch: 13 Global Step: 277920 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:00:33,474-Speed 2513.83 samples/sec Loss 2.9393 LearningRate 0.000546 Epoch: 13 Global Step: 277930 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:00:41,689-Speed 2493.52 samples/sec Loss 2.9364 LearningRate 0.000546 Epoch: 13 Global Step: 277940 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:00:49,887-Speed 2498.59 samples/sec Loss 2.9317 LearningRate 0.000546 Epoch: 13 Global Step: 277950 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:00:58,082-Speed 2499.45 samples/sec Loss 2.9395 LearningRate 0.000546 Epoch: 13 Global Step: 277960 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:01:06,281-Speed 2498.11 samples/sec Loss 2.9103 LearningRate 0.000546 Epoch: 13 Global Step: 277970 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:01:14,480-Speed 2498.51 samples/sec Loss 2.9194 LearningRate 0.000546 Epoch: 13 Global Step: 277980 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:01:22,626-Speed 2514.30 samples/sec Loss 2.9439 LearningRate 0.000546 Epoch: 13 Global Step: 277990 Fp16 Grad Scale: 16384 Required: 126 hours Training: 
2022-07-08 06:01:30,836-Speed 2495.19 samples/sec Loss 2.9642 LearningRate 0.000546 Epoch: 13 Global Step: 278000 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:01:39,046-Speed 2494.96 samples/sec Loss 2.9334 LearningRate 0.000546 Epoch: 13 Global Step: 278010 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:01:47,245-Speed 2498.05 samples/sec Loss 2.9910 LearningRate 0.000546 Epoch: 13 Global Step: 278020 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:01:55,446-Speed 2497.75 samples/sec Loss 2.8670 LearningRate 0.000546 Epoch: 13 Global Step: 278030 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:02:03,645-Speed 2498.43 samples/sec Loss 2.9694 LearningRate 0.000546 Epoch: 13 Global Step: 278040 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:02:11,806-Speed 2509.98 samples/sec Loss 2.9503 LearningRate 0.000546 Epoch: 13 Global Step: 278050 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:02:20,005-Speed 2498.24 samples/sec Loss 2.9641 LearningRate 0.000546 Epoch: 13 Global Step: 278060 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:02:28,204-Speed 2498.12 samples/sec Loss 2.9495 LearningRate 0.000546 Epoch: 13 Global Step: 278070 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:02:36,402-Speed 2498.63 samples/sec Loss 2.9952 LearningRate 0.000546 Epoch: 13 Global Step: 278080 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:02:44,605-Speed 2497.28 samples/sec Loss 2.9336 LearningRate 0.000546 Epoch: 13 Global Step: 278090 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:02:52,808-Speed 2496.94 samples/sec Loss 2.9683 LearningRate 0.000546 Epoch: 13 Global Step: 278100 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:03:00,968-Speed 2510.10 samples/sec Loss 3.0070 LearningRate 0.000546 Epoch: 13 Global Step: 278110 Fp16 Grad Scale: 16384 Required: 126 hours 
Training: 2022-07-08 06:03:09,171-Speed 2497.07 samples/sec Loss 2.9517 LearningRate 0.000546 Epoch: 13 Global Step: 278120 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:03:17,371-Speed 2497.90 samples/sec Loss 3.0319 LearningRate 0.000546 Epoch: 13 Global Step: 278130 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:03:25,573-Speed 2497.44 samples/sec Loss 2.9167 LearningRate 0.000545 Epoch: 13 Global Step: 278140 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:03:33,775-Speed 2497.60 samples/sec Loss 2.9513 LearningRate 0.000545 Epoch: 13 Global Step: 278150 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:03:41,972-Speed 2498.73 samples/sec Loss 2.9332 LearningRate 0.000545 Epoch: 13 Global Step: 278160 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:03:50,115-Speed 2515.42 samples/sec Loss 2.9015 LearningRate 0.000545 Epoch: 13 Global Step: 278170 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:03:58,318-Speed 2497.29 samples/sec Loss 2.9802 LearningRate 0.000545 Epoch: 13 Global Step: 278180 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:04:06,521-Speed 2496.95 samples/sec Loss 2.9790 LearningRate 0.000545 Epoch: 13 Global Step: 278190 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:04:14,717-Speed 2499.08 samples/sec Loss 2.9274 LearningRate 0.000545 Epoch: 13 Global Step: 278200 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:04:22,916-Speed 2498.30 samples/sec Loss 2.9713 LearningRate 0.000545 Epoch: 13 Global Step: 278210 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:04:31,125-Speed 2495.43 samples/sec Loss 2.8871 LearningRate 0.000545 Epoch: 13 Global Step: 278220 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:04:39,265-Speed 2516.20 samples/sec Loss 2.9642 LearningRate 0.000545 Epoch: 13 Global Step: 278230 Fp16 Grad Scale: 16384 Required: 126 
hours Training: 2022-07-08 06:04:47,463-Speed 2498.80 samples/sec Loss 2.9327 LearningRate 0.000545 Epoch: 13 Global Step: 278240 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:04:55,663-Speed 2497.80 samples/sec Loss 2.9460 LearningRate 0.000545 Epoch: 13 Global Step: 278250 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:05:03,861-Speed 2498.68 samples/sec Loss 2.9713 LearningRate 0.000545 Epoch: 13 Global Step: 278260 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:05:12,058-Speed 2498.71 samples/sec Loss 2.9635 LearningRate 0.000545 Epoch: 13 Global Step: 278270 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:05:20,257-Speed 2498.45 samples/sec Loss 2.9099 LearningRate 0.000545 Epoch: 13 Global Step: 278280 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:05:28,406-Speed 2513.78 samples/sec Loss 2.9656 LearningRate 0.000545 Epoch: 13 Global Step: 278290 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:05:36,605-Speed 2497.96 samples/sec Loss 2.9519 LearningRate 0.000545 Epoch: 13 Global Step: 278300 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:05:44,805-Speed 2498.11 samples/sec Loss 2.9429 LearningRate 0.000545 Epoch: 13 Global Step: 278310 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:05:53,005-Speed 2497.79 samples/sec Loss 2.9692 LearningRate 0.000545 Epoch: 13 Global Step: 278320 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:06:01,203-Speed 2498.93 samples/sec Loss 3.0135 LearningRate 0.000545 Epoch: 13 Global Step: 278330 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:06:09,399-Speed 2499.24 samples/sec Loss 2.9184 LearningRate 0.000545 Epoch: 13 Global Step: 278340 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:06:17,543-Speed 2515.08 samples/sec Loss 2.9848 LearningRate 0.000545 Epoch: 13 Global Step: 278350 Fp16 Grad Scale: 16384 Required: 
126 hours Training: 2022-07-08 06:06:25,739-Speed 2499.11 samples/sec Loss 3.0290 LearningRate 0.000545 Epoch: 13 Global Step: 278360 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:06:33,939-Speed 2497.94 samples/sec Loss 2.9690 LearningRate 0.000545 Epoch: 13 Global Step: 278370 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:06:42,137-Speed 2498.77 samples/sec Loss 2.9241 LearningRate 0.000545 Epoch: 13 Global Step: 278380 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:06:50,336-Speed 2498.33 samples/sec Loss 2.9481 LearningRate 0.000545 Epoch: 13 Global Step: 278390 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:06:58,532-Speed 2499.37 samples/sec Loss 2.9748 LearningRate 0.000545 Epoch: 13 Global Step: 278400 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:07:06,673-Speed 2516.01 samples/sec Loss 2.9435 LearningRate 0.000545 Epoch: 13 Global Step: 278410 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:07:14,873-Speed 2498.02 samples/sec Loss 2.9261 LearningRate 0.000545 Epoch: 13 Global Step: 278420 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:07:23,068-Speed 2499.43 samples/sec Loss 2.8869 LearningRate 0.000545 Epoch: 13 Global Step: 278430 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:07:31,280-Speed 2494.51 samples/sec Loss 2.9150 LearningRate 0.000545 Epoch: 13 Global Step: 278440 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:07:39,478-Speed 2498.67 samples/sec Loss 2.9378 LearningRate 0.000545 Epoch: 13 Global Step: 278450 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:07:47,684-Speed 2495.91 samples/sec Loss 2.9596 LearningRate 0.000545 Epoch: 13 Global Step: 278460 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:07:55,828-Speed 2515.08 samples/sec Loss 2.9393 LearningRate 0.000545 Epoch: 13 Global Step: 278470 Fp16 Grad Scale: 16384 
Required: 126 hours Training: 2022-07-08 06:08:04,027-Speed 2498.41 samples/sec Loss 2.9009 LearningRate 0.000545 Epoch: 13 Global Step: 278480 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:08:12,228-Speed 2497.60 samples/sec Loss 2.9310 LearningRate 0.000545 Epoch: 13 Global Step: 278490 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:08:20,436-Speed 2495.59 samples/sec Loss 2.9518 LearningRate 0.000545 Epoch: 13 Global Step: 278500 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:08:28,651-Speed 2493.41 samples/sec Loss 3.0002 LearningRate 0.000545 Epoch: 13 Global Step: 278510 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:08:36,852-Speed 2497.79 samples/sec Loss 2.9265 LearningRate 0.000545 Epoch: 13 Global Step: 278520 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:08:45,004-Speed 2512.57 samples/sec Loss 3.0116 LearningRate 0.000545 Epoch: 13 Global Step: 278530 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:08:53,205-Speed 2497.67 samples/sec Loss 2.9918 LearningRate 0.000545 Epoch: 13 Global Step: 278540 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:09:01,407-Speed 2497.36 samples/sec Loss 2.9107 LearningRate 0.000545 Epoch: 13 Global Step: 278550 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:09:09,634-Speed 2489.91 samples/sec Loss 2.9887 LearningRate 0.000545 Epoch: 13 Global Step: 278560 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:09:17,836-Speed 2497.20 samples/sec Loss 2.9175 LearningRate 0.000545 Epoch: 13 Global Step: 278570 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:09:26,036-Speed 2498.00 samples/sec Loss 2.9181 LearningRate 0.000545 Epoch: 13 Global Step: 278580 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:09:34,191-Speed 2511.71 samples/sec Loss 2.9739 LearningRate 0.000545 Epoch: 13 Global Step: 278590 Fp16 Grad Scale: 
16384 Required: 126 hours Training: 2022-07-08 06:09:42,421-Speed 2489.06 samples/sec Loss 2.9254 LearningRate 0.000545 Epoch: 13 Global Step: 278600 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:09:50,621-Speed 2498.03 samples/sec Loss 2.8586 LearningRate 0.000545 Epoch: 13 Global Step: 278610 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:09:58,818-Speed 2498.60 samples/sec Loss 2.9118 LearningRate 0.000545 Epoch: 13 Global Step: 278620 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:10:07,023-Speed 2497.06 samples/sec Loss 2.9394 LearningRate 0.000545 Epoch: 13 Global Step: 278630 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:10:15,232-Speed 2495.51 samples/sec Loss 2.9246 LearningRate 0.000545 Epoch: 13 Global Step: 278640 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:10:23,381-Speed 2513.40 samples/sec Loss 2.9501 LearningRate 0.000544 Epoch: 13 Global Step: 278650 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:10:31,577-Speed 2499.30 samples/sec Loss 2.9246 LearningRate 0.000544 Epoch: 13 Global Step: 278660 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:10:39,780-Speed 2496.94 samples/sec Loss 2.9613 LearningRate 0.000544 Epoch: 13 Global Step: 278670 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:10:47,979-Speed 2498.34 samples/sec Loss 2.9593 LearningRate 0.000544 Epoch: 13 Global Step: 278680 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:10:56,174-Speed 2499.21 samples/sec Loss 2.8970 LearningRate 0.000544 Epoch: 13 Global Step: 278690 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:11:04,378-Speed 2496.81 samples/sec Loss 2.9595 LearningRate 0.000544 Epoch: 13 Global Step: 278700 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:11:12,523-Speed 2514.67 samples/sec Loss 2.9717 LearningRate 0.000544 Epoch: 13 Global Step: 278710 Fp16 Grad 
Scale: 16384 Required: 126 hours Training: 2022-07-08 06:11:20,724-Speed 2497.70 samples/sec Loss 2.9241 LearningRate 0.000544 Epoch: 13 Global Step: 278720 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:11:28,921-Speed 2498.83 samples/sec Loss 2.8815 LearningRate 0.000544 Epoch: 13 Global Step: 278730 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:11:37,114-Speed 2500.16 samples/sec Loss 2.9213 LearningRate 0.000544 Epoch: 13 Global Step: 278740 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:11:45,314-Speed 2497.90 samples/sec Loss 2.9173 LearningRate 0.000544 Epoch: 13 Global Step: 278750 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:11:53,514-Speed 2497.93 samples/sec Loss 2.9132 LearningRate 0.000544 Epoch: 13 Global Step: 278760 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:12:01,664-Speed 2513.18 samples/sec Loss 2.9146 LearningRate 0.000544 Epoch: 13 Global Step: 278770 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:12:09,861-Speed 2498.96 samples/sec Loss 3.0385 LearningRate 0.000544 Epoch: 13 Global Step: 278780 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:12:18,060-Speed 2498.24 samples/sec Loss 2.9385 LearningRate 0.000544 Epoch: 13 Global Step: 278790 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:12:26,260-Speed 2497.96 samples/sec Loss 2.9298 LearningRate 0.000544 Epoch: 13 Global Step: 278800 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:12:34,458-Speed 2498.49 samples/sec Loss 2.8937 LearningRate 0.000544 Epoch: 13 Global Step: 278810 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:12:42,657-Speed 2498.46 samples/sec Loss 2.9783 LearningRate 0.000544 Epoch: 13 Global Step: 278820 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:12:50,803-Speed 2514.24 samples/sec Loss 2.9526 LearningRate 0.000544 Epoch: 13 Global Step: 278830 Fp16 
Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:12:59,002-Speed 2498.32 samples/sec Loss 2.9354 LearningRate 0.000544 Epoch: 13 Global Step: 278840 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:13:07,203-Speed 2497.63 samples/sec Loss 2.9210 LearningRate 0.000544 Epoch: 13 Global Step: 278850 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:13:15,405-Speed 2497.38 samples/sec Loss 2.9216 LearningRate 0.000544 Epoch: 13 Global Step: 278860 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:13:23,611-Speed 2496.19 samples/sec Loss 2.8941 LearningRate 0.000544 Epoch: 13 Global Step: 278870 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:13:31,818-Speed 2495.63 samples/sec Loss 2.8973 LearningRate 0.000544 Epoch: 13 Global Step: 278880 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:13:39,965-Speed 2514.57 samples/sec Loss 2.9409 LearningRate 0.000544 Epoch: 13 Global Step: 278890 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:13:48,163-Speed 2498.76 samples/sec Loss 2.9183 LearningRate 0.000544 Epoch: 13 Global Step: 278900 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:13:56,361-Speed 2498.59 samples/sec Loss 2.8885 LearningRate 0.000544 Epoch: 13 Global Step: 278910 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:14:04,558-Speed 2498.88 samples/sec Loss 2.9141 LearningRate 0.000544 Epoch: 13 Global Step: 278920 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:14:12,754-Speed 2499.13 samples/sec Loss 2.9845 LearningRate 0.000544 Epoch: 13 Global Step: 278930 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:14:20,959-Speed 2496.32 samples/sec Loss 2.9069 LearningRate 0.000544 Epoch: 13 Global Step: 278940 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:14:29,104-Speed 2514.76 samples/sec Loss 2.8947 LearningRate 0.000544 Epoch: 13 Global Step: 278950 
Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:14:37,302-Speed 2498.47 samples/sec Loss 2.9202 LearningRate 0.000544 Epoch: 13 Global Step: 278960 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:14:45,511-Speed 2495.69 samples/sec Loss 2.9361 LearningRate 0.000544 Epoch: 13 Global Step: 278970 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:14:53,712-Speed 2497.41 samples/sec Loss 2.9597 LearningRate 0.000544 Epoch: 13 Global Step: 278980 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:15:01,913-Speed 2497.80 samples/sec Loss 2.9563 LearningRate 0.000544 Epoch: 13 Global Step: 278990 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:15:10,112-Speed 2498.39 samples/sec Loss 3.0031 LearningRate 0.000544 Epoch: 13 Global Step: 279000 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:15:18,258-Speed 2514.78 samples/sec Loss 2.9776 LearningRate 0.000544 Epoch: 13 Global Step: 279010 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:15:26,459-Speed 2497.40 samples/sec Loss 2.9420 LearningRate 0.000544 Epoch: 13 Global Step: 279020 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:15:34,658-Speed 2498.39 samples/sec Loss 2.9660 LearningRate 0.000544 Epoch: 13 Global Step: 279030 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:15:42,862-Speed 2496.94 samples/sec Loss 2.9846 LearningRate 0.000544 Epoch: 13 Global Step: 279040 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:15:51,056-Speed 2500.66 samples/sec Loss 2.9101 LearningRate 0.000544 Epoch: 13 Global Step: 279050 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:15:59,265-Speed 2495.11 samples/sec Loss 2.9308 LearningRate 0.000544 Epoch: 13 Global Step: 279060 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:16:07,408-Speed 2515.44 samples/sec Loss 2.9749 LearningRate 0.000544 Epoch: 13 Global Step: 
279070 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:16:15,609-Speed 2497.85 samples/sec Loss 2.9069 LearningRate 0.000544 Epoch: 13 Global Step: 279080 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:16:23,813-Speed 2496.71 samples/sec Loss 2.9619 LearningRate 0.000544 Epoch: 13 Global Step: 279090 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:16:32,015-Speed 2497.30 samples/sec Loss 2.9282 LearningRate 0.000544 Epoch: 13 Global Step: 279100 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:16:40,213-Speed 2498.55 samples/sec Loss 2.8869 LearningRate 0.000544 Epoch: 13 Global Step: 279110 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:16:48,415-Speed 2497.56 samples/sec Loss 2.9708 LearningRate 0.000544 Epoch: 13 Global Step: 279120 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:16:56,561-Speed 2514.44 samples/sec Loss 2.8749 LearningRate 0.000544 Epoch: 13 Global Step: 279130 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:17:04,756-Speed 2499.66 samples/sec Loss 2.9377 LearningRate 0.000544 Epoch: 13 Global Step: 279140 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:17:12,954-Speed 2498.37 samples/sec Loss 2.9679 LearningRate 0.000543 Epoch: 13 Global Step: 279150 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:17:21,154-Speed 2498.01 samples/sec Loss 2.8812 LearningRate 0.000543 Epoch: 13 Global Step: 279160 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:17:29,354-Speed 2498.01 samples/sec Loss 2.9356 LearningRate 0.000543 Epoch: 13 Global Step: 279170 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:17:37,552-Speed 2498.51 samples/sec Loss 2.8672 LearningRate 0.000543 Epoch: 13 Global Step: 279180 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:17:45,698-Speed 2514.54 samples/sec Loss 2.9509 LearningRate 0.000543 Epoch: 13 Global 
Step: 279190 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:17:53,907-Speed 2495.34 samples/sec Loss 2.9154 LearningRate 0.000543 Epoch: 13 Global Step: 279200 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:18:02,108-Speed 2497.85 samples/sec Loss 2.9100 LearningRate 0.000543 Epoch: 13 Global Step: 279210 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:18:10,305-Speed 2498.81 samples/sec Loss 2.9364 LearningRate 0.000543 Epoch: 13 Global Step: 279220 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:18:18,509-Speed 2496.65 samples/sec Loss 2.8958 LearningRate 0.000543 Epoch: 13 Global Step: 279230 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:18:26,716-Speed 2495.79 samples/sec Loss 2.9302 LearningRate 0.000543 Epoch: 13 Global Step: 279240 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:18:34,866-Speed 2513.40 samples/sec Loss 2.9288 LearningRate 0.000543 Epoch: 13 Global Step: 279250 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:18:43,075-Speed 2495.29 samples/sec Loss 2.9656 LearningRate 0.000543 Epoch: 13 Global Step: 279260 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:18:51,279-Speed 2496.68 samples/sec Loss 2.9396 LearningRate 0.000543 Epoch: 13 Global Step: 279270 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:18:59,476-Speed 2498.61 samples/sec Loss 2.9292 LearningRate 0.000543 Epoch: 13 Global Step: 279280 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:19:07,676-Speed 2498.16 samples/sec Loss 2.9301 LearningRate 0.000543 Epoch: 13 Global Step: 279290 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:19:15,881-Speed 2496.16 samples/sec Loss 2.8936 LearningRate 0.000543 Epoch: 13 Global Step: 279300 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:19:24,024-Speed 2515.58 samples/sec Loss 2.9286 LearningRate 0.000543 Epoch: 13 
Global Step: 279310 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:19:32,221-Speed 2499.01 samples/sec Loss 2.8928 LearningRate 0.000543 Epoch: 13 Global Step: 279320 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:19:40,422-Speed 2497.42 samples/sec Loss 2.9518 LearningRate 0.000543 Epoch: 13 Global Step: 279330 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:19:48,623-Speed 2498.17 samples/sec Loss 2.9509 LearningRate 0.000543 Epoch: 13 Global Step: 279340 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:19:56,837-Speed 2493.85 samples/sec Loss 2.9025 LearningRate 0.000543 Epoch: 13 Global Step: 279350 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:20:05,035-Speed 2498.60 samples/sec Loss 2.9478 LearningRate 0.000543 Epoch: 13 Global Step: 279360 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:20:13,180-Speed 2514.73 samples/sec Loss 2.8821 LearningRate 0.000543 Epoch: 13 Global Step: 279370 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:20:21,381-Speed 2497.65 samples/sec Loss 2.9417 LearningRate 0.000543 Epoch: 13 Global Step: 279380 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:20:29,582-Speed 2497.77 samples/sec Loss 2.9215 LearningRate 0.000543 Epoch: 13 Global Step: 279390 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:20:37,779-Speed 2498.82 samples/sec Loss 2.9674 LearningRate 0.000543 Epoch: 13 Global Step: 279400 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:20:45,983-Speed 2496.65 samples/sec Loss 2.9253 LearningRate 0.000543 Epoch: 13 Global Step: 279410 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:20:54,182-Speed 2498.35 samples/sec Loss 2.9332 LearningRate 0.000543 Epoch: 13 Global Step: 279420 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:21:02,326-Speed 2515.53 samples/sec Loss 2.9842 LearningRate 0.000543 
Epoch: 13 Global Step: 279430 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:21:10,529-Speed 2497.00 samples/sec Loss 2.9537 LearningRate 0.000543 Epoch: 13 Global Step: 279440 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:21:18,729-Speed 2497.85 samples/sec Loss 2.9302 LearningRate 0.000543 Epoch: 13 Global Step: 279450 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:21:26,928-Speed 2498.52 samples/sec Loss 2.9554 LearningRate 0.000543 Epoch: 13 Global Step: 279460 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:21:35,125-Speed 2498.77 samples/sec Loss 2.9862 LearningRate 0.000543 Epoch: 13 Global Step: 279470 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:21:43,326-Speed 2497.60 samples/sec Loss 2.9534 LearningRate 0.000543 Epoch: 13 Global Step: 279480 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:21:51,472-Speed 2514.87 samples/sec Loss 2.9708 LearningRate 0.000543 Epoch: 13 Global Step: 279490 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:21:59,666-Speed 2499.57 samples/sec Loss 2.8989 LearningRate 0.000543 Epoch: 13 Global Step: 279500 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:22:07,863-Speed 2499.25 samples/sec Loss 2.9429 LearningRate 0.000543 Epoch: 13 Global Step: 279510 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:22:16,063-Speed 2497.83 samples/sec Loss 2.9459 LearningRate 0.000543 Epoch: 13 Global Step: 279520 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:22:24,265-Speed 2497.40 samples/sec Loss 2.9230 LearningRate 0.000543 Epoch: 13 Global Step: 279530 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:22:32,462-Speed 2499.04 samples/sec Loss 2.8955 LearningRate 0.000543 Epoch: 13 Global Step: 279540 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:22:40,619-Speed 2511.23 samples/sec Loss 2.9422 LearningRate 
0.000543 Epoch: 13 Global Step: 279550 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:22:48,813-Speed 2499.59 samples/sec Loss 2.9682 LearningRate 0.000543 Epoch: 13 Global Step: 279560 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:22:57,014-Speed 2497.73 samples/sec Loss 2.9412 LearningRate 0.000543 Epoch: 13 Global Step: 279570 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:23:05,225-Speed 2494.86 samples/sec Loss 2.9089 LearningRate 0.000543 Epoch: 13 Global Step: 279580 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:23:13,423-Speed 2498.57 samples/sec Loss 2.8942 LearningRate 0.000543 Epoch: 13 Global Step: 279590 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:23:21,623-Speed 2497.92 samples/sec Loss 2.9182 LearningRate 0.000543 Epoch: 13 Global Step: 279600 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:23:29,772-Speed 2513.78 samples/sec Loss 2.8817 LearningRate 0.000543 Epoch: 13 Global Step: 279610 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:23:37,972-Speed 2498.01 samples/sec Loss 2.9686 LearningRate 0.000543 Epoch: 13 Global Step: 279620 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:23:46,171-Speed 2498.58 samples/sec Loss 2.9343 LearningRate 0.000543 Epoch: 13 Global Step: 279630 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:23:54,367-Speed 2498.89 samples/sec Loss 2.8726 LearningRate 0.000543 Epoch: 13 Global Step: 279640 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:24:02,566-Speed 2498.44 samples/sec Loss 3.0059 LearningRate 0.000543 Epoch: 13 Global Step: 279650 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:24:10,763-Speed 2498.70 samples/sec Loss 2.8635 LearningRate 0.000542 Epoch: 13 Global Step: 279660 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:24:18,914-Speed 2513.03 samples/sec Loss 2.9373 
LearningRate 0.000542 Epoch: 13 Global Step: 279670 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:24:27,114-Speed 2497.84 samples/sec Loss 2.9359 LearningRate 0.000542 Epoch: 13 Global Step: 279680 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:24:35,327-Speed 2494.25 samples/sec Loss 2.9156 LearningRate 0.000542 Epoch: 13 Global Step: 279690 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:24:43,530-Speed 2497.08 samples/sec Loss 2.8943 LearningRate 0.000542 Epoch: 13 Global Step: 279700 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:24:51,750-Speed 2491.81 samples/sec Loss 2.8592 LearningRate 0.000542 Epoch: 13 Global Step: 279710 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:24:59,949-Speed 2498.65 samples/sec Loss 2.9480 LearningRate 0.000542 Epoch: 13 Global Step: 279720 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:25:08,093-Speed 2514.94 samples/sec Loss 2.9361 LearningRate 0.000542 Epoch: 13 Global Step: 279730 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:25:16,295-Speed 2497.55 samples/sec Loss 2.9707 LearningRate 0.000542 Epoch: 13 Global Step: 279740 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:25:24,493-Speed 2498.65 samples/sec Loss 2.9265 LearningRate 0.000542 Epoch: 13 Global Step: 279750 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:25:32,698-Speed 2496.54 samples/sec Loss 2.9117 LearningRate 0.000542 Epoch: 13 Global Step: 279760 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:25:40,896-Speed 2498.81 samples/sec Loss 2.8806 LearningRate 0.000542 Epoch: 13 Global Step: 279770 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:25:49,094-Speed 2499.56 samples/sec Loss 2.9840 LearningRate 0.000542 Epoch: 13 Global Step: 279780 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:25:57,239-Speed 2514.91 samples/sec Loss 
2.8944 LearningRate 0.000542 Epoch: 13 Global Step: 279790 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:26:05,437-Speed 2498.56 samples/sec Loss 2.9625 LearningRate 0.000542 Epoch: 13 Global Step: 279800 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:26:13,636-Speed 2498.53 samples/sec Loss 2.9559 LearningRate 0.000542 Epoch: 13 Global Step: 279810 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:26:21,831-Speed 2499.44 samples/sec Loss 2.9429 LearningRate 0.000542 Epoch: 13 Global Step: 279820 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:26:30,032-Speed 2497.82 samples/sec Loss 2.9216 LearningRate 0.000542 Epoch: 13 Global Step: 279830 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:26:38,234-Speed 2497.11 samples/sec Loss 2.8648 LearningRate 0.000542 Epoch: 13 Global Step: 279840 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:26:46,379-Speed 2515.03 samples/sec Loss 2.9302 LearningRate 0.000542 Epoch: 13 Global Step: 279850 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:26:54,577-Speed 2498.37 samples/sec Loss 2.9083 LearningRate 0.000542 Epoch: 13 Global Step: 279860 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:27:02,788-Speed 2494.74 samples/sec Loss 2.8817 LearningRate 0.000542 Epoch: 13 Global Step: 279870 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:27:10,984-Speed 2499.83 samples/sec Loss 2.8846 LearningRate 0.000542 Epoch: 13 Global Step: 279880 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:27:19,181-Speed 2498.72 samples/sec Loss 2.9136 LearningRate 0.000542 Epoch: 13 Global Step: 279890 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:27:27,382-Speed 2497.64 samples/sec Loss 2.9035 LearningRate 0.000542 Epoch: 13 Global Step: 279900 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:27:35,527-Speed 2514.99 samples/sec 
Loss 2.9245 LearningRate 0.000542 Epoch: 13 Global Step: 279910 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:27:43,729-Speed 2497.53 samples/sec Loss 2.9791 LearningRate 0.000542 Epoch: 13 Global Step: 279920 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:27:51,932-Speed 2496.99 samples/sec Loss 2.8819 LearningRate 0.000542 Epoch: 13 Global Step: 279930 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:28:00,129-Speed 2498.97 samples/sec Loss 2.8719 LearningRate 0.000542 Epoch: 13 Global Step: 279940 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:28:08,349-Speed 2491.91 samples/sec Loss 2.8509 LearningRate 0.000542 Epoch: 13 Global Step: 279950 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:28:16,547-Speed 2498.52 samples/sec Loss 2.8778 LearningRate 0.000542 Epoch: 13 Global Step: 279960 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:28:24,692-Speed 2514.92 samples/sec Loss 2.9014 LearningRate 0.000542 Epoch: 13 Global Step: 279970 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:28:32,890-Speed 2498.54 samples/sec Loss 2.9173 LearningRate 0.000542 Epoch: 13 Global Step: 279980 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:28:41,089-Speed 2498.07 samples/sec Loss 2.9670 LearningRate 0.000542 Epoch: 13 Global Step: 279990 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:28:49,293-Speed 2497.14 samples/sec Loss 2.9614 LearningRate 0.000542 Epoch: 13 Global Step: 280000 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:28:57,500-Speed 2496.16 samples/sec Loss 2.8719 LearningRate 0.000542 Epoch: 13 Global Step: 280010 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:29:05,703-Speed 2497.02 samples/sec Loss 2.9314 LearningRate 0.000542 Epoch: 13 Global Step: 280020 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:29:13,847-Speed 2515.07 
samples/sec Loss 2.9422 LearningRate 0.000542 Epoch: 13 Global Step: 280030 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:29:22,048-Speed 2497.55 samples/sec Loss 2.9386 LearningRate 0.000542 Epoch: 13 Global Step: 280040 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:29:30,246-Speed 2498.58 samples/sec Loss 2.8768 LearningRate 0.000542 Epoch: 13 Global Step: 280050 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:29:38,458-Speed 2494.35 samples/sec Loss 2.9804 LearningRate 0.000542 Epoch: 13 Global Step: 280060 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:29:46,655-Speed 2498.91 samples/sec Loss 2.9376 LearningRate 0.000542 Epoch: 13 Global Step: 280070 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:29:54,865-Speed 2495.05 samples/sec Loss 2.8717 LearningRate 0.000542 Epoch: 13 Global Step: 280080 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:30:03,012-Speed 2514.14 samples/sec Loss 2.8773 LearningRate 0.000542 Epoch: 13 Global Step: 280090 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:30:11,227-Speed 2493.35 samples/sec Loss 2.8759 LearningRate 0.000542 Epoch: 13 Global Step: 280100 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:30:19,427-Speed 2498.19 samples/sec Loss 2.9192 LearningRate 0.000542 Epoch: 13 Global Step: 280110 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:30:27,624-Speed 2498.93 samples/sec Loss 2.9183 LearningRate 0.000542 Epoch: 13 Global Step: 280120 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:30:35,835-Speed 2494.46 samples/sec Loss 2.9218 LearningRate 0.000542 Epoch: 13 Global Step: 280130 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:30:44,038-Speed 2496.95 samples/sec Loss 2.8327 LearningRate 0.000542 Epoch: 13 Global Step: 280140 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:30:52,183-Speed 
2515.22 samples/sec Loss 2.9831 LearningRate 0.000542 Epoch: 13 Global Step: 280150 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:31:00,386-Speed 2497.12 samples/sec Loss 2.9105 LearningRate 0.000541 Epoch: 13 Global Step: 280160 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:31:08,581-Speed 2499.33 samples/sec Loss 2.9695 LearningRate 0.000541 Epoch: 13 Global Step: 280170 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:31:16,778-Speed 2498.75 samples/sec Loss 2.9548 LearningRate 0.000541 Epoch: 13 Global Step: 280180 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:31:24,988-Speed 2495.36 samples/sec Loss 2.8779 LearningRate 0.000541 Epoch: 13 Global Step: 280190 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:31:33,198-Speed 2494.82 samples/sec Loss 2.9440 LearningRate 0.000541 Epoch: 13 Global Step: 280200 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:31:41,346-Speed 2513.74 samples/sec Loss 2.9389 LearningRate 0.000541 Epoch: 13 Global Step: 280210 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:31:49,546-Speed 2498.17 samples/sec Loss 2.9369 LearningRate 0.000541 Epoch: 13 Global Step: 280220 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:31:57,761-Speed 2493.52 samples/sec Loss 2.9203 LearningRate 0.000541 Epoch: 13 Global Step: 280230 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:32:05,961-Speed 2497.97 samples/sec Loss 2.8625 LearningRate 0.000541 Epoch: 13 Global Step: 280240 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:32:14,173-Speed 2494.18 samples/sec Loss 2.9264 LearningRate 0.000541 Epoch: 13 Global Step: 280250 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:32:22,372-Speed 2498.51 samples/sec Loss 2.9407 LearningRate 0.000541 Epoch: 13 Global Step: 280260 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 
06:32:30,521-Speed 2513.77 samples/sec Loss 2.9738 LearningRate 0.000541 Epoch: 13 Global Step: 280270 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:32:38,720-Speed 2498.24 samples/sec Loss 2.9328 LearningRate 0.000541 Epoch: 13 Global Step: 280280 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:32:46,931-Speed 2495.27 samples/sec Loss 2.9917 LearningRate 0.000541 Epoch: 13 Global Step: 280290 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:32:55,134-Speed 2497.31 samples/sec Loss 2.9113 LearningRate 0.000541 Epoch: 13 Global Step: 280300 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:33:03,339-Speed 2496.39 samples/sec Loss 2.9214 LearningRate 0.000541 Epoch: 13 Global Step: 280310 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:33:11,540-Speed 2497.53 samples/sec Loss 2.8897 LearningRate 0.000541 Epoch: 13 Global Step: 280320 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:33:19,686-Speed 2514.39 samples/sec Loss 2.9902 LearningRate 0.000541 Epoch: 13 Global Step: 280330 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:33:27,886-Speed 2498.60 samples/sec Loss 2.9299 LearningRate 0.000541 Epoch: 13 Global Step: 280340 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:33:36,120-Speed 2498.63 samples/sec Loss 2.9413 LearningRate 0.000541 Epoch: 13 Global Step: 280350 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:33:44,320-Speed 2497.88 samples/sec Loss 2.9617 LearningRate 0.000541 Epoch: 13 Global Step: 280360 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:33:52,542-Speed 2500.42 samples/sec Loss 2.9190 LearningRate 0.000541 Epoch: 13 Global Step: 280370 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:34:00,771-Speed 2499.58 samples/sec Loss 3.0099 LearningRate 0.000541 Epoch: 13 Global Step: 280380 Fp16 Grad Scale: 65536 Required: 126 hours Training: 
2022-07-08 06:34:08,985-Speed 2515.89 samples/sec Loss 2.9502 LearningRate 0.000541 Epoch: 13 Global Step: 280390 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:34:19,416-Speed 1963.59 samples/sec Loss 2.9165 LearningRate 0.000541 Epoch: 13 Global Step: 280400 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:34:27,650-Speed 2500.70 samples/sec Loss 2.9224 LearningRate 0.000541 Epoch: 13 Global Step: 280410 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:34:35,908-Speed 2499.91 samples/sec Loss 2.9256 LearningRate 0.000541 Epoch: 13 Global Step: 280420 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:34:48,496-Speed 1627.05 samples/sec Loss 2.9568 LearningRate 0.000541 Epoch: 13 Global Step: 280430 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:34:56,686-Speed 2500.82 samples/sec Loss 2.9305 LearningRate 0.000541 Epoch: 13 Global Step: 280440 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:35:04,862-Speed 2512.62 samples/sec Loss 2.9196 LearningRate 0.000541 Epoch: 13 Global Step: 280450 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:35:17,972-Speed 1565.71 samples/sec Loss 2.9202 LearningRate 0.000541 Epoch: 13 Global Step: 280460 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:35:26,208-Speed 2504.02 samples/sec Loss 2.8991 LearningRate 0.000541 Epoch: 13 Global Step: 280470 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:35:34,437-Speed 2500.91 samples/sec Loss 2.9563 LearningRate 0.000541 Epoch: 13 Global Step: 280480 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:35:44,525-Speed 2045.22 samples/sec Loss 2.9499 LearningRate 0.000541 Epoch: 13 Global Step: 280490 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:35:52,718-Speed 2501.12 samples/sec Loss 2.9284 LearningRate 0.000541 Epoch: 13 Global Step: 280500 Fp16 Grad Scale: 65536 Required: 126 hours 
Training: 2022-07-08 06:36:00,859-Speed 2515.92 samples/sec Loss 2.9111 LearningRate 0.000541 Epoch: 13 Global Step: 280510 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:36:12,224-Speed 2185.33 samples/sec Loss 2.9470 LearningRate 0.000541 Epoch: 13 Global Step: 280520 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:36:20,431-Speed 2496.90 samples/sec Loss 2.9383 LearningRate 0.000541 Epoch: 13 Global Step: 280530 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:36:28,627-Speed 2499.10 samples/sec Loss 2.9708 LearningRate 0.000541 Epoch: 13 Global Step: 280540 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:36:40,203-Speed 1780.41 samples/sec Loss 2.9506 LearningRate 0.000541 Epoch: 13 Global Step: 280550 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:36:49,261-Speed 2502.45 samples/sec Loss 2.9622 LearningRate 0.000541 Epoch: 13 Global Step: 280560 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:36:57,425-Speed 2517.78 samples/sec Loss 2.9557 LearningRate 0.000541 Epoch: 13 Global Step: 280570 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:37:08,076-Speed 1923.15 samples/sec Loss 2.8901 LearningRate 0.000541 Epoch: 13 Global Step: 280580 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:37:16,271-Speed 2499.21 samples/sec Loss 2.9671 LearningRate 0.000541 Epoch: 13 Global Step: 280590 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:37:24,537-Speed 2499.34 samples/sec Loss 2.9499 LearningRate 0.000541 Epoch: 13 Global Step: 280600 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:37:35,606-Speed 1855.09 samples/sec Loss 3.0076 LearningRate 0.000541 Epoch: 13 Global Step: 280610 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:37:43,803-Speed 2498.83 samples/sec Loss 2.9532 LearningRate 0.000541 Epoch: 13 Global Step: 280620 Fp16 Grad Scale: 65536 Required: 126 
hours Training: 2022-07-08 06:37:52,784-Speed 2517.02 samples/sec Loss 2.8891 LearningRate 0.000541 Epoch: 13 Global Step: 280630 Fp16 Grad Scale: 65536 Required: 126 hours Training: 2022-07-08 06:38:00,940-Speed 2511.65 samples/sec Loss 2.8571 LearningRate 0.000541 Epoch: 13 Global Step: 280640 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:38:09,141-Speed 2497.32 samples/sec Loss 2.9044 LearningRate 0.000541 Epoch: 13 Global Step: 280650 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:38:17,341-Speed 2497.91 samples/sec Loss 2.9270 LearningRate 0.000541 Epoch: 13 Global Step: 280660 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:38:25,540-Speed 2498.22 samples/sec Loss 2.8756 LearningRate 0.000540 Epoch: 13 Global Step: 280670 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:38:33,739-Speed 2498.27 samples/sec Loss 2.9317 LearningRate 0.000540 Epoch: 13 Global Step: 280680 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:38:41,888-Speed 2513.40 samples/sec Loss 2.8930 LearningRate 0.000540 Epoch: 13 Global Step: 280690 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:38:50,092-Speed 2497.00 samples/sec Loss 2.8923 LearningRate 0.000540 Epoch: 13 Global Step: 280700 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:38:58,291-Speed 2498.23 samples/sec Loss 2.8746 LearningRate 0.000540 Epoch: 13 Global Step: 280710 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:39:06,490-Speed 2498.23 samples/sec Loss 2.9062 LearningRate 0.000540 Epoch: 13 Global Step: 280720 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:39:14,691-Speed 2498.06 samples/sec Loss 3.0038 LearningRate 0.000540 Epoch: 13 Global Step: 280730 Fp16 Grad Scale: 32768 Required: 126 hours Training: 2022-07-08 06:39:22,843-Speed 2512.75 samples/sec Loss 2.9263 LearningRate 0.000540 Epoch: 13 Global Step: 280740 Fp16 Grad Scale: 16384 Required: 
126 hours Training: 2022-07-08 06:39:30,989-Speed 2514.34 samples/sec Loss 2.8913 LearningRate 0.000540 Epoch: 13 Global Step: 280750 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:39:39,186-Speed 2498.85 samples/sec Loss 2.9366 LearningRate 0.000540 Epoch: 13 Global Step: 280760 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:39:47,383-Speed 2498.79 samples/sec Loss 2.9419 LearningRate 0.000540 Epoch: 13 Global Step: 280770 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:39:55,579-Speed 2499.19 samples/sec Loss 2.8970 LearningRate 0.000540 Epoch: 13 Global Step: 280780 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:40:03,777-Speed 2498.50 samples/sec Loss 2.9477 LearningRate 0.000540 Epoch: 13 Global Step: 280790 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:40:11,974-Speed 2498.71 samples/sec Loss 2.9746 LearningRate 0.000540 Epoch: 13 Global Step: 280800 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:40:20,121-Speed 2514.47 samples/sec Loss 2.9225 LearningRate 0.000540 Epoch: 13 Global Step: 280810 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:40:28,321-Speed 2497.98 samples/sec Loss 2.9073 LearningRate 0.000540 Epoch: 13 Global Step: 280820 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:40:36,519-Speed 2498.86 samples/sec Loss 2.9659 LearningRate 0.000540 Epoch: 13 Global Step: 280830 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:40:44,720-Speed 2497.63 samples/sec Loss 2.9305 LearningRate 0.000540 Epoch: 13 Global Step: 280840 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:40:52,921-Speed 2497.71 samples/sec Loss 2.9242 LearningRate 0.000540 Epoch: 13 Global Step: 280850 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:41:01,120-Speed 2498.17 samples/sec Loss 2.9261 LearningRate 0.000540 Epoch: 13 Global Step: 280860 Fp16 Grad Scale: 16384 
Required: 126 hours Training: 2022-07-08 06:41:09,265-Speed 2515.00 samples/sec Loss 3.0241 LearningRate 0.000540 Epoch: 13 Global Step: 280870 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:41:17,464-Speed 2498.17 samples/sec Loss 2.9067 LearningRate 0.000540 Epoch: 13 Global Step: 280880 Fp16 Grad Scale: 16384 Required: 126 hours Training: 2022-07-08 06:41:25,672-Speed 2495.22 samples/sec Loss 2.9500 LearningRate 0.000540 Epoch: 13 Global Step: 280890 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:41:33,869-Speed 2499.09 samples/sec Loss 2.9527 LearningRate 0.000540 Epoch: 13 Global Step: 280900 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:41:42,070-Speed 2497.69 samples/sec Loss 2.8930 LearningRate 0.000540 Epoch: 13 Global Step: 280910 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:41:50,271-Speed 2497.71 samples/sec Loss 2.9265 LearningRate 0.000540 Epoch: 13 Global Step: 280920 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:41:58,418-Speed 2514.11 samples/sec Loss 2.8959 LearningRate 0.000540 Epoch: 13 Global Step: 280930 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:42:06,627-Speed 2495.34 samples/sec Loss 2.9314 LearningRate 0.000540 Epoch: 13 Global Step: 280940 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:42:14,824-Speed 2498.84 samples/sec Loss 2.8977 LearningRate 0.000540 Epoch: 13 Global Step: 280950 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:42:23,027-Speed 2497.02 samples/sec Loss 2.9350 LearningRate 0.000540 Epoch: 13 Global Step: 280960 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:42:31,232-Speed 2496.32 samples/sec Loss 2.8603 LearningRate 0.000540 Epoch: 13 Global Step: 280970 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:42:39,436-Speed 2496.84 samples/sec Loss 2.8869 LearningRate 0.000540 Epoch: 13 Global Step: 280980 Fp16 Grad Scale: 
16384 Required: 125 hours Training: 2022-07-08 06:42:47,582-Speed 2514.55 samples/sec Loss 2.9573 LearningRate 0.000540 Epoch: 13 Global Step: 280990 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:42:55,784-Speed 2497.20 samples/sec Loss 2.8938 LearningRate 0.000540 Epoch: 13 Global Step: 281000 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:43:04,011-Speed 2489.72 samples/sec Loss 2.8563 LearningRate 0.000540 Epoch: 13 Global Step: 281010 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:43:12,225-Speed 2493.78 samples/sec Loss 2.8271 LearningRate 0.000540 Epoch: 13 Global Step: 281020 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:43:20,428-Speed 2496.99 samples/sec Loss 2.8880 LearningRate 0.000540 Epoch: 13 Global Step: 281030 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:43:28,625-Speed 2498.55 samples/sec Loss 2.8993 LearningRate 0.000540 Epoch: 13 Global Step: 281040 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:43:36,775-Speed 2514.16 samples/sec Loss 2.9201 LearningRate 0.000540 Epoch: 13 Global Step: 281050 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:43:44,969-Speed 2499.64 samples/sec Loss 2.8801 LearningRate 0.000540 Epoch: 13 Global Step: 281060 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:43:53,172-Speed 2497.30 samples/sec Loss 2.9064 LearningRate 0.000540 Epoch: 13 Global Step: 281070 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:44:01,368-Speed 2499.18 samples/sec Loss 2.8621 LearningRate 0.000540 Epoch: 13 Global Step: 281080 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:44:09,563-Speed 2499.31 samples/sec Loss 2.8756 LearningRate 0.000540 Epoch: 13 Global Step: 281090 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:44:17,765-Speed 2497.62 samples/sec Loss 2.8343 LearningRate 0.000540 Epoch: 13 Global Step: 281100 Fp16 Grad 
Scale: 16384 Required: 125 hours Training: 2022-07-08 06:44:25,906-Speed 2516.01 samples/sec Loss 2.8407 LearningRate 0.000540 Epoch: 13 Global Step: 281110 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:44:34,121-Speed 2493.24 samples/sec Loss 2.8802 LearningRate 0.000540 Epoch: 13 Global Step: 281120 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:44:42,320-Speed 2498.27 samples/sec Loss 2.8885 LearningRate 0.000540 Epoch: 13 Global Step: 281130 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:44:50,520-Speed 2498.08 samples/sec Loss 2.8526 LearningRate 0.000540 Epoch: 13 Global Step: 281140 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:44:58,721-Speed 2497.57 samples/sec Loss 2.8613 LearningRate 0.000540 Epoch: 13 Global Step: 281150 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:45:06,921-Speed 2498.02 samples/sec Loss 2.8838 LearningRate 0.000540 Epoch: 13 Global Step: 281160 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:45:15,069-Speed 2513.98 samples/sec Loss 3.0565 LearningRate 0.000540 Epoch: 13 Global Step: 281170 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:45:23,268-Speed 2498.41 samples/sec Loss 2.9040 LearningRate 0.000539 Epoch: 13 Global Step: 281180 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:45:31,467-Speed 2498.39 samples/sec Loss 2.9340 LearningRate 0.000539 Epoch: 13 Global Step: 281190 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:45:39,664-Speed 2498.83 samples/sec Loss 2.9863 LearningRate 0.000539 Epoch: 13 Global Step: 281200 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 06:45:47,833-Speed 2507.45 samples/sec Loss 2.9614 LearningRate 0.000539 Epoch: 13 Global Step: 281210 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:45:56,026-Speed 2499.77 samples/sec Loss 2.9802 LearningRate 0.000539 Epoch: 13 Global Step: 281220 Fp16 
Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:46:04,171-Speed 2515.04 samples/sec Loss 2.9055 LearningRate 0.000539 Epoch: 13 Global Step: 281230 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:46:12,366-Speed 2499.47 samples/sec Loss 2.9583 LearningRate 0.000539 Epoch: 13 Global Step: 281240 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:46:20,566-Speed 2497.98 samples/sec Loss 2.9155 LearningRate 0.000539 Epoch: 13 Global Step: 281250 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:46:28,779-Speed 2493.78 samples/sec Loss 2.9190 LearningRate 0.000539 Epoch: 13 Global Step: 281260 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:46:36,982-Speed 2497.09 samples/sec Loss 2.9148 LearningRate 0.000539 Epoch: 13 Global Step: 281270 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:46:45,182-Speed 2498.16 samples/sec Loss 2.9240 LearningRate 0.000539 Epoch: 13 Global Step: 281280 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:46:53,332-Speed 2513.37 samples/sec Loss 2.8854 LearningRate 0.000539 Epoch: 13 Global Step: 281290 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:47:01,533-Speed 2497.43 samples/sec Loss 2.9165 LearningRate 0.000539 Epoch: 13 Global Step: 281300 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:47:09,732-Speed 2498.48 samples/sec Loss 2.9154 LearningRate 0.000539 Epoch: 13 Global Step: 281310 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:47:17,934-Speed 2497.23 samples/sec Loss 2.8971 LearningRate 0.000539 Epoch: 13 Global Step: 281320 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:47:26,135-Speed 2497.75 samples/sec Loss 2.8423 LearningRate 0.000539 Epoch: 13 Global Step: 281330 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:47:34,337-Speed 2497.27 samples/sec Loss 2.9398 LearningRate 0.000539 Epoch: 13 Global Step: 281340 Fp16 Grad 
Scale: 8192 Required: 125 hours Training: 2022-07-08 06:47:42,479-Speed 2515.63 samples/sec Loss 2.9930 LearningRate 0.000539 Epoch: 13 Global Step: 281350 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:47:50,682-Speed 2497.39 samples/sec Loss 2.8586 LearningRate 0.000539 Epoch: 13 Global Step: 281360 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:47:58,881-Speed 2498.17 samples/sec Loss 2.9137 LearningRate 0.000539 Epoch: 13 Global Step: 281370 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:48:07,080-Speed 2498.34 samples/sec Loss 2.9219 LearningRate 0.000539 Epoch: 13 Global Step: 281380 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:48:15,274-Speed 2499.89 samples/sec Loss 2.9276 LearningRate 0.000539 Epoch: 13 Global Step: 281390 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:48:23,468-Speed 2499.73 samples/sec Loss 2.9529 LearningRate 0.000539 Epoch: 13 Global Step: 281400 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:48:31,612-Speed 2515.16 samples/sec Loss 2.9149 LearningRate 0.000539 Epoch: 13 Global Step: 281410 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:48:39,812-Speed 2498.17 samples/sec Loss 2.9271 LearningRate 0.000539 Epoch: 13 Global Step: 281420 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:48:48,010-Speed 2498.63 samples/sec Loss 2.8895 LearningRate 0.000539 Epoch: 13 Global Step: 281430 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:48:56,212-Speed 2497.53 samples/sec Loss 2.8891 LearningRate 0.000539 Epoch: 13 Global Step: 281440 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:49:04,420-Speed 2495.43 samples/sec Loss 2.9202 LearningRate 0.000539 Epoch: 13 Global Step: 281450 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:49:12,616-Speed 2499.38 samples/sec Loss 2.9024 LearningRate 0.000539 Epoch: 13 Global Step: 281460 Fp16 Grad Scale: 
8192 Required: 125 hours Training: 2022-07-08 06:49:20,759-Speed 2515.48 samples/sec Loss 2.8819 LearningRate 0.000539 Epoch: 13 Global Step: 281470 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:49:28,964-Speed 2496.23 samples/sec Loss 2.8717 LearningRate 0.000539 Epoch: 13 Global Step: 281480 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:49:37,159-Speed 2499.81 samples/sec Loss 2.9679 LearningRate 0.000539 Epoch: 13 Global Step: 281490 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:49:45,358-Speed 2498.53 samples/sec Loss 2.8900 LearningRate 0.000539 Epoch: 13 Global Step: 281500 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:49:53,568-Speed 2494.92 samples/sec Loss 2.9503 LearningRate 0.000539 Epoch: 13 Global Step: 281510 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:50:01,764-Speed 2498.91 samples/sec Loss 2.9443 LearningRate 0.000539 Epoch: 13 Global Step: 281520 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:50:09,922-Speed 2510.80 samples/sec Loss 2.9813 LearningRate 0.000539 Epoch: 13 Global Step: 281530 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:50:18,118-Speed 2499.02 samples/sec Loss 2.9335 LearningRate 0.000539 Epoch: 13 Global Step: 281540 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:50:26,314-Speed 2499.13 samples/sec Loss 2.9861 LearningRate 0.000539 Epoch: 13 Global Step: 281550 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:50:34,519-Speed 2496.56 samples/sec Loss 2.9392 LearningRate 0.000539 Epoch: 13 Global Step: 281560 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:50:42,714-Speed 2499.58 samples/sec Loss 2.9668 LearningRate 0.000539 Epoch: 13 Global Step: 281570 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:50:50,915-Speed 2497.76 samples/sec Loss 3.0012 LearningRate 0.000539 Epoch: 13 Global Step: 281580 Fp16 Grad Scale: 8192 
Required: 125 hours Training: 2022-07-08 06:50:59,057-Speed 2516.16 samples/sec Loss 2.9084 LearningRate 0.000539 Epoch: 13 Global Step: 281590 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:51:07,265-Speed 2495.55 samples/sec Loss 2.9355 LearningRate 0.000539 Epoch: 13 Global Step: 281600 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:51:15,471-Speed 2496.12 samples/sec Loss 3.0034 LearningRate 0.000539 Epoch: 13 Global Step: 281610 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:51:23,675-Speed 2496.99 samples/sec Loss 2.9782 LearningRate 0.000539 Epoch: 13 Global Step: 281620 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:51:31,874-Speed 2498.59 samples/sec Loss 2.9670 LearningRate 0.000539 Epoch: 13 Global Step: 281630 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:51:40,085-Speed 2494.35 samples/sec Loss 3.0031 LearningRate 0.000539 Epoch: 13 Global Step: 281640 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:51:48,230-Speed 2514.87 samples/sec Loss 2.9159 LearningRate 0.000539 Epoch: 13 Global Step: 281650 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:51:56,428-Speed 2498.59 samples/sec Loss 2.9455 LearningRate 0.000539 Epoch: 13 Global Step: 281660 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:52:04,625-Speed 2498.84 samples/sec Loss 2.9634 LearningRate 0.000539 Epoch: 13 Global Step: 281670 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:52:12,827-Speed 2497.54 samples/sec Loss 2.9027 LearningRate 0.000539 Epoch: 13 Global Step: 281680 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:52:21,028-Speed 2497.73 samples/sec Loss 2.9974 LearningRate 0.000538 Epoch: 13 Global Step: 281690 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:52:29,227-Speed 2498.14 samples/sec Loss 2.9142 LearningRate 0.000538 Epoch: 13 Global Step: 281700 Fp16 Grad Scale: 8192 
Required: 125 hours Training: 2022-07-08 06:52:37,371-Speed 2515.18 samples/sec Loss 2.9546 LearningRate 0.000538 Epoch: 13 Global Step: 281710 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:52:45,567-Speed 2499.28 samples/sec Loss 2.8980 LearningRate 0.000538 Epoch: 13 Global Step: 281720 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:52:53,766-Speed 2498.29 samples/sec Loss 2.9452 LearningRate 0.000538 Epoch: 13 Global Step: 281730 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:53:01,964-Speed 2498.66 samples/sec Loss 2.9209 LearningRate 0.000538 Epoch: 13 Global Step: 281740 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:53:10,161-Speed 2498.65 samples/sec Loss 2.9177 LearningRate 0.000538 Epoch: 13 Global Step: 281750 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:53:18,365-Speed 2496.96 samples/sec Loss 2.9189 LearningRate 0.000538 Epoch: 13 Global Step: 281760 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:53:26,513-Speed 2513.78 samples/sec Loss 2.9987 LearningRate 0.000538 Epoch: 13 Global Step: 281770 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:53:34,710-Speed 2500.13 samples/sec Loss 2.9300 LearningRate 0.000538 Epoch: 13 Global Step: 281780 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:53:42,909-Speed 2498.53 samples/sec Loss 2.9680 LearningRate 0.000538 Epoch: 13 Global Step: 281790 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:53:51,117-Speed 2495.38 samples/sec Loss 2.9175 LearningRate 0.000538 Epoch: 13 Global Step: 281800 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:53:59,317-Speed 2498.26 samples/sec Loss 2.9437 LearningRate 0.000538 Epoch: 13 Global Step: 281810 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:54:07,520-Speed 2497.43 samples/sec Loss 2.9777 LearningRate 0.000538 Epoch: 13 Global Step: 281820 Fp16 Grad Scale: 8192 
Required: 125 hours Training: 2022-07-08 06:54:15,669-Speed 2513.34 samples/sec Loss 2.9585 LearningRate 0.000538 Epoch: 13 Global Step: 281830 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:54:23,871-Speed 2497.38 samples/sec Loss 2.9280 LearningRate 0.000538 Epoch: 13 Global Step: 281840 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:54:32,074-Speed 2497.06 samples/sec Loss 2.9350 LearningRate 0.000538 Epoch: 13 Global Step: 281850 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:54:40,276-Speed 2497.80 samples/sec Loss 2.8790 LearningRate 0.000538 Epoch: 13 Global Step: 281860 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:54:48,481-Speed 2496.14 samples/sec Loss 2.8838 LearningRate 0.000538 Epoch: 13 Global Step: 281870 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:54:56,686-Speed 2496.59 samples/sec Loss 2.8494 LearningRate 0.000538 Epoch: 13 Global Step: 281880 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:55:04,831-Speed 2514.93 samples/sec Loss 2.8940 LearningRate 0.000538 Epoch: 13 Global Step: 281890 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:55:13,035-Speed 2496.79 samples/sec Loss 2.9383 LearningRate 0.000538 Epoch: 13 Global Step: 281900 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:55:21,234-Speed 2498.30 samples/sec Loss 2.9177 LearningRate 0.000538 Epoch: 13 Global Step: 281910 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:55:29,432-Speed 2498.51 samples/sec Loss 2.8755 LearningRate 0.000538 Epoch: 13 Global Step: 281920 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:55:37,645-Speed 2494.38 samples/sec Loss 2.8602 LearningRate 0.000538 Epoch: 13 Global Step: 281930 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:55:45,848-Speed 2496.92 samples/sec Loss 2.8857 LearningRate 0.000538 Epoch: 13 Global Step: 281940 Fp16 Grad Scale: 8192 
Required: 125 hours Training: 2022-07-08 06:55:53,996-Speed 2514.38 samples/sec Loss 2.9030 LearningRate 0.000538 Epoch: 13 Global Step: 281950 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:56:02,194-Speed 2498.73 samples/sec Loss 2.9606 LearningRate 0.000538 Epoch: 13 Global Step: 281960 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:56:10,397-Speed 2497.06 samples/sec Loss 2.9294 LearningRate 0.000538 Epoch: 13 Global Step: 281970 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:56:18,610-Speed 2494.14 samples/sec Loss 2.8304 LearningRate 0.000538 Epoch: 13 Global Step: 281980 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:56:26,806-Speed 2499.10 samples/sec Loss 2.8735 LearningRate 0.000538 Epoch: 13 Global Step: 281990 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:56:35,010-Speed 2496.73 samples/sec Loss 2.8814 LearningRate 0.000538 Epoch: 13 Global Step: 282000 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:56:43,157-Speed 2514.24 samples/sec Loss 2.8523 LearningRate 0.000538 Epoch: 13 Global Step: 282010 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:56:51,359-Speed 2497.32 samples/sec Loss 2.9148 LearningRate 0.000538 Epoch: 13 Global Step: 282020 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:56:59,562-Speed 2497.05 samples/sec Loss 2.8688 LearningRate 0.000538 Epoch: 13 Global Step: 282030 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:57:07,776-Speed 2493.64 samples/sec Loss 2.8683 LearningRate 0.000538 Epoch: 13 Global Step: 282040 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:57:15,973-Speed 2498.92 samples/sec Loss 2.8825 LearningRate 0.000538 Epoch: 13 Global Step: 282050 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:57:24,173-Speed 2497.71 samples/sec Loss 2.8575 LearningRate 0.000538 Epoch: 13 Global Step: 282060 Fp16 Grad Scale: 8192 
Required: 125 hours Training: 2022-07-08 06:57:32,316-Speed 2515.36 samples/sec Loss 2.8580 LearningRate 0.000538 Epoch: 13 Global Step: 282070 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:57:40,531-Speed 2493.49 samples/sec Loss 2.9134 LearningRate 0.000538 Epoch: 13 Global Step: 282080 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:57:48,732-Speed 2498.62 samples/sec Loss 2.8783 LearningRate 0.000538 Epoch: 13 Global Step: 282090 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:57:56,933-Speed 2497.66 samples/sec Loss 2.9256 LearningRate 0.000538 Epoch: 13 Global Step: 282100 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:58:05,130-Speed 2498.98 samples/sec Loss 2.8411 LearningRate 0.000538 Epoch: 13 Global Step: 282110 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:58:13,324-Speed 2499.57 samples/sec Loss 2.9755 LearningRate 0.000538 Epoch: 13 Global Step: 282120 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:58:21,467-Speed 2515.70 samples/sec Loss 2.9272 LearningRate 0.000538 Epoch: 13 Global Step: 282130 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:58:29,665-Speed 2498.43 samples/sec Loss 2.9068 LearningRate 0.000538 Epoch: 13 Global Step: 282140 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:58:37,860-Speed 2499.34 samples/sec Loss 2.9241 LearningRate 0.000538 Epoch: 13 Global Step: 282150 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:58:46,061-Speed 2498.12 samples/sec Loss 2.8557 LearningRate 0.000538 Epoch: 13 Global Step: 282160 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:58:54,262-Speed 2497.63 samples/sec Loss 2.9192 LearningRate 0.000538 Epoch: 13 Global Step: 282170 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:59:02,459-Speed 2498.60 samples/sec Loss 2.9169 LearningRate 0.000538 Epoch: 13 Global Step: 282180 Fp16 Grad Scale: 8192 
Required: 125 hours Training: 2022-07-08 06:59:10,614-Speed 2511.97 samples/sec Loss 2.8923 LearningRate 0.000538 Epoch: 13 Global Step: 282190 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:59:18,811-Speed 2498.77 samples/sec Loss 2.8811 LearningRate 0.000537 Epoch: 13 Global Step: 282200 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:59:27,009-Speed 2498.61 samples/sec Loss 2.8824 LearningRate 0.000537 Epoch: 13 Global Step: 282210 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:59:35,213-Speed 2496.57 samples/sec Loss 2.8803 LearningRate 0.000537 Epoch: 13 Global Step: 282220 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:59:43,410-Speed 2498.81 samples/sec Loss 2.8835 LearningRate 0.000537 Epoch: 13 Global Step: 282230 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:59:51,610-Speed 2497.91 samples/sec Loss 2.8934 LearningRate 0.000537 Epoch: 13 Global Step: 282240 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 06:59:59,758-Speed 2513.89 samples/sec Loss 2.9043 LearningRate 0.000537 Epoch: 13 Global Step: 282250 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:00:07,959-Speed 2497.82 samples/sec Loss 2.8947 LearningRate 0.000537 Epoch: 13 Global Step: 282260 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:00:16,171-Speed 2494.32 samples/sec Loss 2.8514 LearningRate 0.000537 Epoch: 13 Global Step: 282270 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:00:24,372-Speed 2497.76 samples/sec Loss 2.9033 LearningRate 0.000537 Epoch: 13 Global Step: 282280 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:00:32,571-Speed 2497.98 samples/sec Loss 2.8930 LearningRate 0.000537 Epoch: 13 Global Step: 282290 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:00:40,771-Speed 2498.21 samples/sec Loss 2.8933 LearningRate 0.000537 Epoch: 13 Global Step: 282300 Fp16 Grad Scale: 8192 
Required: 125 hours Training: 2022-07-08 07:00:48,918-Speed 2514.32 samples/sec Loss 2.9046 LearningRate 0.000537 Epoch: 13 Global Step: 282310 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:00:57,121-Speed 2497.14 samples/sec Loss 2.8336 LearningRate 0.000537 Epoch: 13 Global Step: 282320 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:01:05,332-Speed 2494.57 samples/sec Loss 2.9035 LearningRate 0.000537 Epoch: 13 Global Step: 282330 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:01:13,527-Speed 2499.37 samples/sec Loss 2.8608 LearningRate 0.000537 Epoch: 13 Global Step: 282340 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:01:21,728-Speed 2497.67 samples/sec Loss 2.8761 LearningRate 0.000537 Epoch: 13 Global Step: 282350 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:01:29,931-Speed 2496.94 samples/sec Loss 2.8490 LearningRate 0.000537 Epoch: 13 Global Step: 282360 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:01:38,091-Speed 2510.21 samples/sec Loss 2.8806 LearningRate 0.000537 Epoch: 13 Global Step: 282370 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:01:46,379-Speed 2471.30 samples/sec Loss 2.8887 LearningRate 0.000537 Epoch: 13 Global Step: 282380 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:01:54,576-Speed 2499.19 samples/sec Loss 2.8579 LearningRate 0.000537 Epoch: 13 Global Step: 282390 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:02:02,772-Speed 2499.35 samples/sec Loss 2.8747 LearningRate 0.000537 Epoch: 13 Global Step: 282400 Fp16 Grad Scale: 8192 Required: 125 hours Training: 2022-07-08 07:02:10,973-Speed 2497.49 samples/sec Loss 2.9262 LearningRate 0.000537 Epoch: 13 Global Step: 282410 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:02:19,176-Speed 2497.18 samples/sec Loss 2.9185 LearningRate 0.000537 Epoch: 13 Global Step: 282420 Fp16 Grad Scale: 16384 
Required: 125 hours Training: 2022-07-08 07:02:27,316-Speed 2516.08 samples/sec Loss 2.8920 LearningRate 0.000537 Epoch: 13 Global Step: 282430 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:02:35,515-Speed 2498.47 samples/sec Loss 2.8752 LearningRate 0.000537 Epoch: 13 Global Step: 282440 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:02:43,709-Speed 2499.62 samples/sec Loss 2.8576 LearningRate 0.000537 Epoch: 13 Global Step: 282450 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:02:51,911-Speed 2497.41 samples/sec Loss 2.8521 LearningRate 0.000537 Epoch: 13 Global Step: 282460 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:03:00,111-Speed 2498.30 samples/sec Loss 2.8395 LearningRate 0.000537 Epoch: 13 Global Step: 282470 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:03:08,310-Speed 2498.22 samples/sec Loss 2.7910 LearningRate 0.000537 Epoch: 13 Global Step: 282480 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:03:16,465-Speed 2511.72 samples/sec Loss 2.8396 LearningRate 0.000537 Epoch: 13 Global Step: 282490 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:03:24,661-Speed 2499.66 samples/sec Loss 2.8561 LearningRate 0.000537 Epoch: 13 Global Step: 282500 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:03:32,862-Speed 2497.54 samples/sec Loss 2.9198 LearningRate 0.000537 Epoch: 13 Global Step: 282510 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:03:41,059-Speed 2498.88 samples/sec Loss 2.8913 LearningRate 0.000537 Epoch: 13 Global Step: 282520 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:03:49,255-Speed 2499.22 samples/sec Loss 2.8655 LearningRate 0.000537 Epoch: 13 Global Step: 282530 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:03:57,467-Speed 2494.27 samples/sec Loss 2.8533 LearningRate 0.000537 Epoch: 13 Global Step: 282540 Fp16 Grad Scale: 
16384 Required: 125 hours Training: 2022-07-08 07:04:05,615-Speed 2513.74 samples/sec Loss 2.8693 LearningRate 0.000537 Epoch: 13 Global Step: 282550 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:04:13,820-Speed 2496.50 samples/sec Loss 2.9036 LearningRate 0.000537 Epoch: 13 Global Step: 282560 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:04:22,021-Speed 2497.74 samples/sec Loss 2.8943 LearningRate 0.000537 Epoch: 13 Global Step: 282570 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:04:30,231-Speed 2495.03 samples/sec Loss 2.8696 LearningRate 0.000537 Epoch: 13 Global Step: 282580 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:04:38,433-Speed 2497.31 samples/sec Loss 2.9395 LearningRate 0.000537 Epoch: 13 Global Step: 282590 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:04:46,639-Speed 2496.11 samples/sec Loss 2.9184 LearningRate 0.000537 Epoch: 13 Global Step: 282600 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:04:54,786-Speed 2514.34 samples/sec Loss 2.8845 LearningRate 0.000537 Epoch: 13 Global Step: 282610 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:05:02,988-Speed 2497.19 samples/sec Loss 2.9702 LearningRate 0.000537 Epoch: 13 Global Step: 282620 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:05:11,188-Speed 2498.08 samples/sec Loss 2.9312 LearningRate 0.000537 Epoch: 13 Global Step: 282630 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:05:19,393-Speed 2496.42 samples/sec Loss 2.9272 LearningRate 0.000537 Epoch: 13 Global Step: 282640 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:05:27,595-Speed 2497.38 samples/sec Loss 3.0030 LearningRate 0.000537 Epoch: 13 Global Step: 282650 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:05:35,794-Speed 2498.05 samples/sec Loss 2.9241 LearningRate 0.000537 Epoch: 13 Global Step: 282660 Fp16 Grad 
Scale: 16384 Required: 125 hours Training: 2022-07-08 07:05:43,941-Speed 2514.13 samples/sec Loss 2.9316 LearningRate 0.000537 Epoch: 13 Global Step: 282670 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:05:52,141-Speed 2498.08 samples/sec Loss 2.9551 LearningRate 0.000537 Epoch: 13 Global Step: 282680 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:06:00,338-Speed 2498.77 samples/sec Loss 2.8993 LearningRate 0.000537 Epoch: 13 Global Step: 282690 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:06:08,538-Speed 2497.93 samples/sec Loss 2.9313 LearningRate 0.000537 Epoch: 13 Global Step: 282700 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:06:16,736-Speed 2498.96 samples/sec Loss 2.9122 LearningRate 0.000536 Epoch: 13 Global Step: 282710 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:06:24,935-Speed 2498.02 samples/sec Loss 2.9430 LearningRate 0.000536 Epoch: 13 Global Step: 282720 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:06:33,090-Speed 2511.72 samples/sec Loss 2.8999 LearningRate 0.000536 Epoch: 13 Global Step: 282730 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:06:41,290-Speed 2498.21 samples/sec Loss 2.8900 LearningRate 0.000536 Epoch: 13 Global Step: 282740 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:06:49,487-Speed 2498.81 samples/sec Loss 2.9254 LearningRate 0.000536 Epoch: 13 Global Step: 282750 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:06:57,689-Speed 2497.25 samples/sec Loss 3.0203 LearningRate 0.000536 Epoch: 13 Global Step: 282760 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:07:05,884-Speed 2499.49 samples/sec Loss 2.8731 LearningRate 0.000536 Epoch: 13 Global Step: 282770 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:07:14,089-Speed 2496.41 samples/sec Loss 2.9505 LearningRate 0.000536 Epoch: 13 Global Step: 282780 Fp16 
Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:07:22,248-Speed 2510.40 samples/sec Loss 2.9221 LearningRate 0.000536 Epoch: 13 Global Step: 282790 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:07:30,449-Speed 2497.59 samples/sec Loss 2.9272 LearningRate 0.000536 Epoch: 13 Global Step: 282800 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:07:38,654-Speed 2496.54 samples/sec Loss 2.9188 LearningRate 0.000536 Epoch: 13 Global Step: 282810 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:07:46,853-Speed 2498.26 samples/sec Loss 2.8805 LearningRate 0.000536 Epoch: 13 Global Step: 282820 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:07:55,052-Speed 2498.51 samples/sec Loss 2.9358 LearningRate 0.000536 Epoch: 13 Global Step: 282830 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:08:03,249-Speed 2498.65 samples/sec Loss 2.9143 LearningRate 0.000536 Epoch: 13 Global Step: 282840 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:08:11,395-Speed 2514.62 samples/sec Loss 2.9198 LearningRate 0.000536 Epoch: 13 Global Step: 282850 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:08:19,609-Speed 2493.85 samples/sec Loss 2.9025 LearningRate 0.000536 Epoch: 13 Global Step: 282860 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:08:27,804-Speed 2499.32 samples/sec Loss 2.8611 LearningRate 0.000536 Epoch: 13 Global Step: 282870 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:08:36,022-Speed 2492.56 samples/sec Loss 2.9396 LearningRate 0.000536 Epoch: 13 Global Step: 282880 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:08:44,220-Speed 2498.72 samples/sec Loss 2.9078 LearningRate 0.000536 Epoch: 13 Global Step: 282890 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:08:52,426-Speed 2496.39 samples/sec Loss 2.9277 LearningRate 0.000536 Epoch: 13 Global Step: 282900 
Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:09:00,580-Speed 2512.16 samples/sec Loss 2.9189 LearningRate 0.000536 Epoch: 13 Global Step: 282910 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:09:08,780-Speed 2497.86 samples/sec Loss 2.8974 LearningRate 0.000536 Epoch: 13 Global Step: 282920 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:09:16,979-Speed 2498.23 samples/sec Loss 2.9109 LearningRate 0.000536 Epoch: 13 Global Step: 282930 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:09:25,180-Speed 2497.52 samples/sec Loss 2.9057 LearningRate 0.000536 Epoch: 13 Global Step: 282940 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:09:33,383-Speed 2497.30 samples/sec Loss 2.8807 LearningRate 0.000536 Epoch: 13 Global Step: 282950 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:09:41,584-Speed 2497.80 samples/sec Loss 2.8822 LearningRate 0.000536 Epoch: 13 Global Step: 282960 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:09:49,736-Speed 2512.50 samples/sec Loss 2.9533 LearningRate 0.000536 Epoch: 13 Global Step: 282970 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:09:57,935-Speed 2498.42 samples/sec Loss 2.9155 LearningRate 0.000536 Epoch: 13 Global Step: 282980 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:10:06,139-Speed 2496.54 samples/sec Loss 2.8907 LearningRate 0.000536 Epoch: 13 Global Step: 282990 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:10:14,340-Speed 2497.68 samples/sec Loss 2.9056 LearningRate 0.000536 Epoch: 13 Global Step: 283000 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:10:22,538-Speed 2498.86 samples/sec Loss 2.8995 LearningRate 0.000536 Epoch: 13 Global Step: 283010 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:10:30,740-Speed 2497.42 samples/sec Loss 2.8767 LearningRate 0.000536 Epoch: 13 Global Step: 
283020 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:10:38,887-Speed 2514.32 samples/sec Loss 2.9175 LearningRate 0.000536 Epoch: 13 Global Step: 283030 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:10:47,087-Speed 2497.77 samples/sec Loss 2.8141 LearningRate 0.000536 Epoch: 13 Global Step: 283040 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:10:55,295-Speed 2495.71 samples/sec Loss 2.9428 LearningRate 0.000536 Epoch: 13 Global Step: 283050 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:11:03,494-Speed 2498.15 samples/sec Loss 2.8971 LearningRate 0.000536 Epoch: 13 Global Step: 283060 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:11:11,699-Speed 2496.43 samples/sec Loss 2.8720 LearningRate 0.000536 Epoch: 13 Global Step: 283070 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:11:19,913-Speed 2493.63 samples/sec Loss 2.8715 LearningRate 0.000536 Epoch: 13 Global Step: 283080 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:11:28,063-Speed 2513.54 samples/sec Loss 2.9442 LearningRate 0.000536 Epoch: 13 Global Step: 283090 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:11:36,276-Speed 2493.83 samples/sec Loss 2.9313 LearningRate 0.000536 Epoch: 13 Global Step: 283100 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:11:44,478-Speed 2497.38 samples/sec Loss 2.8615 LearningRate 0.000536 Epoch: 13 Global Step: 283110 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:11:52,681-Speed 2497.20 samples/sec Loss 2.8858 LearningRate 0.000536 Epoch: 13 Global Step: 283120 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:12:00,891-Speed 2495.02 samples/sec Loss 2.9179 LearningRate 0.000536 Epoch: 13 Global Step: 283130 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:12:09,092-Speed 2497.67 samples/sec Loss 2.9032 LearningRate 0.000536 Epoch: 13 Global 
Step: 283140 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:12:17,246-Speed 2511.89 samples/sec Loss 2.8463 LearningRate 0.000536 Epoch: 13 Global Step: 283150 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:12:25,451-Speed 2496.30 samples/sec Loss 2.9911 LearningRate 0.000536 Epoch: 13 Global Step: 283160 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:12:33,655-Speed 2496.89 samples/sec Loss 3.0064 LearningRate 0.000536 Epoch: 13 Global Step: 283170 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:12:41,853-Speed 2498.91 samples/sec Loss 2.9744 LearningRate 0.000536 Epoch: 13 Global Step: 283180 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:12:50,051-Speed 2498.62 samples/sec Loss 2.9565 LearningRate 0.000536 Epoch: 13 Global Step: 283190 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:12:58,253-Speed 2497.50 samples/sec Loss 2.9718 LearningRate 0.000536 Epoch: 13 Global Step: 283200 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:13:06,396-Speed 2515.25 samples/sec Loss 2.9689 LearningRate 0.000536 Epoch: 13 Global Step: 283210 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:13:14,593-Speed 2498.76 samples/sec Loss 2.8709 LearningRate 0.000535 Epoch: 13 Global Step: 283220 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:13:22,789-Speed 2499.27 samples/sec Loss 2.8832 LearningRate 0.000535 Epoch: 13 Global Step: 283230 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:13:30,992-Speed 2497.07 samples/sec Loss 2.9336 LearningRate 0.000535 Epoch: 13 Global Step: 283240 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:13:39,194-Speed 2497.43 samples/sec Loss 2.8714 LearningRate 0.000535 Epoch: 13 Global Step: 283250 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:13:47,392-Speed 2498.47 samples/sec Loss 2.8784 LearningRate 0.000535 Epoch: 13 
Global Step: 283260 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:13:55,539-Speed 2514.26 samples/sec Loss 2.8355 LearningRate 0.000535 Epoch: 13 Global Step: 283270 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:14:03,738-Speed 2498.13 samples/sec Loss 2.8292 LearningRate 0.000535 Epoch: 13 Global Step: 283280 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:14:11,937-Speed 2498.32 samples/sec Loss 2.8388 LearningRate 0.000535 Epoch: 13 Global Step: 283290 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:14:20,141-Speed 2496.80 samples/sec Loss 2.8522 LearningRate 0.000535 Epoch: 13 Global Step: 283300 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:14:28,356-Speed 2493.72 samples/sec Loss 2.8431 LearningRate 0.000535 Epoch: 13 Global Step: 283310 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:14:36,552-Speed 2499.25 samples/sec Loss 2.8187 LearningRate 0.000535 Epoch: 13 Global Step: 283320 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:14:44,697-Speed 2514.82 samples/sec Loss 2.8549 LearningRate 0.000535 Epoch: 13 Global Step: 283330 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:14:52,891-Speed 2499.71 samples/sec Loss 2.8616 LearningRate 0.000535 Epoch: 13 Global Step: 283340 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:15:01,084-Speed 2500.20 samples/sec Loss 2.8767 LearningRate 0.000535 Epoch: 13 Global Step: 283350 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:15:09,279-Speed 2499.64 samples/sec Loss 2.8317 LearningRate 0.000535 Epoch: 13 Global Step: 283360 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:15:17,475-Speed 2498.86 samples/sec Loss 2.8963 LearningRate 0.000535 Epoch: 13 Global Step: 283370 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:15:25,672-Speed 2498.91 samples/sec Loss 2.9111 LearningRate 0.000535 
Epoch: 13 Global Step: 283380 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:15:33,815-Speed 2515.68 samples/sec Loss 2.8683 LearningRate 0.000535 Epoch: 13 Global Step: 283390 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:15:42,012-Speed 2499.17 samples/sec Loss 2.8595 LearningRate 0.000535 Epoch: 13 Global Step: 283400 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:15:50,207-Speed 2499.34 samples/sec Loss 2.8234 LearningRate 0.000535 Epoch: 13 Global Step: 283410 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:15:58,407-Speed 2498.07 samples/sec Loss 2.8980 LearningRate 0.000535 Epoch: 13 Global Step: 283420 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:16:06,602-Speed 2499.45 samples/sec Loss 2.9151 LearningRate 0.000535 Epoch: 13 Global Step: 283430 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:16:14,804-Speed 2497.49 samples/sec Loss 2.9232 LearningRate 0.000535 Epoch: 13 Global Step: 283440 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:16:22,951-Speed 2514.22 samples/sec Loss 2.9297 LearningRate 0.000535 Epoch: 13 Global Step: 283450 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:16:31,149-Speed 2498.45 samples/sec Loss 2.9208 LearningRate 0.000535 Epoch: 13 Global Step: 283460 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:16:39,349-Speed 2498.15 samples/sec Loss 2.8667 LearningRate 0.000535 Epoch: 13 Global Step: 283470 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:16:47,549-Speed 2497.99 samples/sec Loss 2.8986 LearningRate 0.000535 Epoch: 13 Global Step: 283480 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:16:55,754-Speed 2496.45 samples/sec Loss 2.8539 LearningRate 0.000535 Epoch: 13 Global Step: 283490 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:17:03,953-Speed 2498.24 samples/sec Loss 2.8766 LearningRate 
0.000535 Epoch: 13 Global Step: 283500 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:17:12,100-Speed 2514.49 samples/sec Loss 2.8758 LearningRate 0.000535 Epoch: 13 Global Step: 283510 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:17:20,294-Speed 2499.59 samples/sec Loss 2.8653 LearningRate 0.000535 Epoch: 13 Global Step: 283520 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:17:28,496-Speed 2497.57 samples/sec Loss 2.8241 LearningRate 0.000535 Epoch: 13 Global Step: 283530 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:17:36,697-Speed 2497.89 samples/sec Loss 2.8543 LearningRate 0.000535 Epoch: 13 Global Step: 283540 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:17:44,894-Speed 2498.61 samples/sec Loss 2.8977 LearningRate 0.000535 Epoch: 13 Global Step: 283550 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:17:53,100-Speed 2496.07 samples/sec Loss 2.9371 LearningRate 0.000535 Epoch: 13 Global Step: 283560 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:18:01,243-Speed 2515.68 samples/sec Loss 2.9536 LearningRate 0.000535 Epoch: 13 Global Step: 283570 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:18:09,446-Speed 2497.16 samples/sec Loss 2.9664 LearningRate 0.000535 Epoch: 13 Global Step: 283580 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:18:17,646-Speed 2498.08 samples/sec Loss 2.8905 LearningRate 0.000535 Epoch: 13 Global Step: 283590 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:18:25,848-Speed 2497.26 samples/sec Loss 2.9499 LearningRate 0.000535 Epoch: 13 Global Step: 283600 Fp16 Grad Scale: 16384 Required: 125 hours Training: 2022-07-08 07:18:34,048-Speed 2497.88 samples/sec Loss 2.8623 LearningRate 0.000535 Epoch: 13 Global Step: 283610 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:18:42,248-Speed 2497.83 samples/sec Loss 2.9149 
LearningRate 0.000535 Epoch: 13 Global Step: 283620 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:18:50,394-Speed 2514.86 samples/sec Loss 2.8851 LearningRate 0.000535 Epoch: 13 Global Step: 283630 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:18:58,608-Speed 2493.76 samples/sec Loss 2.8783 LearningRate 0.000535 Epoch: 13 Global Step: 283640 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:19:06,807-Speed 2498.44 samples/sec Loss 2.9676 LearningRate 0.000535 Epoch: 13 Global Step: 283650 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:19:15,002-Speed 2499.30 samples/sec Loss 2.9274 LearningRate 0.000535 Epoch: 13 Global Step: 283660 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:19:23,226-Speed 2491.08 samples/sec Loss 2.9659 LearningRate 0.000535 Epoch: 13 Global Step: 283670 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:19:31,427-Speed 2497.86 samples/sec Loss 2.9799 LearningRate 0.000535 Epoch: 13 Global Step: 283680 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:19:39,572-Speed 2514.88 samples/sec Loss 2.9171 LearningRate 0.000535 Epoch: 13 Global Step: 283690 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:19:47,770-Speed 2498.63 samples/sec Loss 2.8880 LearningRate 0.000535 Epoch: 13 Global Step: 283700 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:19:55,980-Speed 2494.88 samples/sec Loss 2.9198 LearningRate 0.000535 Epoch: 13 Global Step: 283710 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:20:04,177-Speed 2498.95 samples/sec Loss 2.8918 LearningRate 0.000535 Epoch: 13 Global Step: 283720 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:20:12,378-Speed 2497.45 samples/sec Loss 2.9058 LearningRate 0.000534 Epoch: 13 Global Step: 283730 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:20:20,576-Speed 2498.66 samples/sec Loss 
2.9145 LearningRate 0.000534 Epoch: 13 Global Step: 283740 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:20:28,720-Speed 2515.08 samples/sec Loss 2.9735 LearningRate 0.000534 Epoch: 13 Global Step: 283750 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:20:36,916-Speed 2499.27 samples/sec Loss 2.8931 LearningRate 0.000534 Epoch: 13 Global Step: 283760 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:20:45,113-Speed 2498.95 samples/sec Loss 2.9277 LearningRate 0.000534 Epoch: 13 Global Step: 283770 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:20:53,308-Speed 2499.61 samples/sec Loss 2.8592 LearningRate 0.000534 Epoch: 13 Global Step: 283780 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:21:01,500-Speed 2500.51 samples/sec Loss 2.8185 LearningRate 0.000534 Epoch: 13 Global Step: 283790 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:21:09,697-Speed 2499.00 samples/sec Loss 2.8793 LearningRate 0.000534 Epoch: 13 Global Step: 283800 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:21:17,838-Speed 2516.40 samples/sec Loss 2.8583 LearningRate 0.000534 Epoch: 13 Global Step: 283810 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:21:26,042-Speed 2496.90 samples/sec Loss 2.8587 LearningRate 0.000534 Epoch: 13 Global Step: 283820 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:21:34,237-Speed 2499.45 samples/sec Loss 2.9047 LearningRate 0.000534 Epoch: 13 Global Step: 283830 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:21:42,434-Speed 2498.63 samples/sec Loss 2.8240 LearningRate 0.000534 Epoch: 13 Global Step: 283840 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:21:50,632-Speed 2498.75 samples/sec Loss 2.9139 LearningRate 0.000534 Epoch: 13 Global Step: 283850 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:21:58,850-Speed 2492.60 samples/sec 
Loss 2.8457 LearningRate 0.000534 Epoch: 13 Global Step: 283860 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:22:06,988-Speed 2516.73 samples/sec Loss 2.8675 LearningRate 0.000534 Epoch: 13 Global Step: 283870 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:22:15,184-Speed 2499.14 samples/sec Loss 2.8590 LearningRate 0.000534 Epoch: 13 Global Step: 283880 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:22:23,381-Speed 2498.89 samples/sec Loss 2.8908 LearningRate 0.000534 Epoch: 13 Global Step: 283890 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:22:31,578-Speed 2498.97 samples/sec Loss 2.8308 LearningRate 0.000534 Epoch: 13 Global Step: 283900 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:22:39,779-Speed 2497.67 samples/sec Loss 2.7998 LearningRate 0.000534 Epoch: 13 Global Step: 283910 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:22:47,978-Speed 2498.21 samples/sec Loss 2.8530 LearningRate 0.000534 Epoch: 13 Global Step: 283920 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:22:56,130-Speed 2512.78 samples/sec Loss 2.8730 LearningRate 0.000534 Epoch: 13 Global Step: 283930 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:23:04,334-Speed 2496.89 samples/sec Loss 2.9264 LearningRate 0.000534 Epoch: 13 Global Step: 283940 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:23:12,532-Speed 2498.51 samples/sec Loss 2.8809 LearningRate 0.000534 Epoch: 13 Global Step: 283950 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:23:20,743-Speed 2494.46 samples/sec Loss 2.8842 LearningRate 0.000534 Epoch: 13 Global Step: 283960 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:23:28,953-Speed 2494.85 samples/sec Loss 2.9536 LearningRate 0.000534 Epoch: 13 Global Step: 283970 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:23:37,155-Speed 2497.37 
samples/sec Loss 2.8518 LearningRate 0.000534 Epoch: 13 Global Step: 283980 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:23:45,298-Speed 2515.10 samples/sec Loss 2.8574 LearningRate 0.000534 Epoch: 13 Global Step: 283990 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:23:53,497-Speed 2498.63 samples/sec Loss 2.8435 LearningRate 0.000534 Epoch: 13 Global Step: 284000 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:24:01,706-Speed 2494.97 samples/sec Loss 2.8471 LearningRate 0.000534 Epoch: 13 Global Step: 284010 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:24:09,908-Speed 2497.28 samples/sec Loss 2.8784 LearningRate 0.000534 Epoch: 13 Global Step: 284020 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:24:18,106-Speed 2498.78 samples/sec Loss 2.8950 LearningRate 0.000534 Epoch: 13 Global Step: 284030 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:24:26,303-Speed 2498.58 samples/sec Loss 2.8876 LearningRate 0.000534 Epoch: 13 Global Step: 284040 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:24:34,455-Speed 2512.72 samples/sec Loss 2.8668 LearningRate 0.000534 Epoch: 13 Global Step: 284050 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:24:42,652-Speed 2498.88 samples/sec Loss 2.9122 LearningRate 0.000534 Epoch: 13 Global Step: 284060 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:24:50,849-Speed 2498.81 samples/sec Loss 2.8522 LearningRate 0.000534 Epoch: 13 Global Step: 284070 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:24:59,049-Speed 2498.09 samples/sec Loss 2.8439 LearningRate 0.000534 Epoch: 13 Global Step: 284080 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:25:07,257-Speed 2495.37 samples/sec Loss 2.8809 LearningRate 0.000534 Epoch: 13 Global Step: 284090 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:25:15,455-Speed 
2498.80 samples/sec Loss 2.8978 LearningRate 0.000534 Epoch: 13 Global Step: 284100 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:25:23,601-Speed 2514.34 samples/sec Loss 2.8623 LearningRate 0.000534 Epoch: 13 Global Step: 284110 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:25:31,799-Speed 2498.52 samples/sec Loss 2.8518 LearningRate 0.000534 Epoch: 13 Global Step: 284120 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:25:39,997-Speed 2498.61 samples/sec Loss 2.8777 LearningRate 0.000534 Epoch: 13 Global Step: 284130 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:25:48,200-Speed 2497.08 samples/sec Loss 2.8983 LearningRate 0.000534 Epoch: 13 Global Step: 284140 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:25:56,399-Speed 2498.05 samples/sec Loss 2.9433 LearningRate 0.000534 Epoch: 13 Global Step: 284150 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:26:04,602-Speed 2497.32 samples/sec Loss 2.8325 LearningRate 0.000534 Epoch: 13 Global Step: 284160 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:26:12,751-Speed 2513.49 samples/sec Loss 2.8722 LearningRate 0.000534 Epoch: 13 Global Step: 284170 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:26:20,949-Speed 2498.59 samples/sec Loss 2.9518 LearningRate 0.000534 Epoch: 13 Global Step: 284180 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:26:29,149-Speed 2498.29 samples/sec Loss 2.8772 LearningRate 0.000534 Epoch: 13 Global Step: 284190 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:26:37,350-Speed 2497.57 samples/sec Loss 2.8745 LearningRate 0.000534 Epoch: 13 Global Step: 284200 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:26:45,566-Speed 2493.21 samples/sec Loss 2.9110 LearningRate 0.000534 Epoch: 13 Global Step: 284210 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 
07:26:53,763-Speed 2498.79 samples/sec Loss 2.9151 LearningRate 0.000534 Epoch: 13 Global Step: 284220 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:27:01,910-Speed 2514.53 samples/sec Loss 2.8553 LearningRate 0.000534 Epoch: 13 Global Step: 284230 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:27:10,112-Speed 2497.38 samples/sec Loss 2.9726 LearningRate 0.000533 Epoch: 13 Global Step: 284240 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:27:18,315-Speed 2496.99 samples/sec Loss 2.8364 LearningRate 0.000533 Epoch: 13 Global Step: 284250 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:27:26,524-Speed 2495.20 samples/sec Loss 2.8797 LearningRate 0.000533 Epoch: 13 Global Step: 284260 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:27:34,724-Speed 2498.06 samples/sec Loss 2.9400 LearningRate 0.000533 Epoch: 13 Global Step: 284270 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:27:42,918-Speed 2499.82 samples/sec Loss 2.8626 LearningRate 0.000533 Epoch: 13 Global Step: 284280 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:27:51,065-Speed 2514.10 samples/sec Loss 2.8995 LearningRate 0.000533 Epoch: 13 Global Step: 284290 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:27:59,271-Speed 2496.14 samples/sec Loss 2.9593 LearningRate 0.000533 Epoch: 13 Global Step: 284300 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:28:07,470-Speed 2498.39 samples/sec Loss 2.9687 LearningRate 0.000533 Epoch: 13 Global Step: 284310 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:28:15,672-Speed 2497.32 samples/sec Loss 2.9247 LearningRate 0.000533 Epoch: 13 Global Step: 284320 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:28:23,871-Speed 2498.17 samples/sec Loss 2.9025 LearningRate 0.000533 Epoch: 13 Global Step: 284330 Fp16 Grad Scale: 32768 Required: 125 hours Training: 
2022-07-08 07:28:32,066-Speed 2499.33 samples/sec Loss 2.9355 LearningRate 0.000533 Epoch: 13 Global Step: 284340 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:28:40,210-Speed 2515.61 samples/sec Loss 2.8914 LearningRate 0.000533 Epoch: 13 Global Step: 284350 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:28:48,410-Speed 2497.98 samples/sec Loss 2.9017 LearningRate 0.000533 Epoch: 13 Global Step: 284360 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:28:56,607-Speed 2498.81 samples/sec Loss 2.9196 LearningRate 0.000533 Epoch: 13 Global Step: 284370 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:29:04,803-Speed 2498.95 samples/sec Loss 2.9384 LearningRate 0.000533 Epoch: 13 Global Step: 284380 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:29:13,001-Speed 2499.06 samples/sec Loss 2.9391 LearningRate 0.000533 Epoch: 13 Global Step: 284390 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:29:21,201-Speed 2497.88 samples/sec Loss 2.8743 LearningRate 0.000533 Epoch: 13 Global Step: 284400 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:29:29,345-Speed 2515.08 samples/sec Loss 2.8728 LearningRate 0.000533 Epoch: 13 Global Step: 284410 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:29:37,542-Speed 2498.75 samples/sec Loss 2.9704 LearningRate 0.000533 Epoch: 13 Global Step: 284420 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:29:45,736-Speed 2499.86 samples/sec Loss 2.9004 LearningRate 0.000533 Epoch: 13 Global Step: 284430 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:29:53,943-Speed 2495.81 samples/sec Loss 2.9190 LearningRate 0.000533 Epoch: 13 Global Step: 284440 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:30:02,138-Speed 2499.52 samples/sec Loss 2.9275 LearningRate 0.000533 Epoch: 13 Global Step: 284450 Fp16 Grad Scale: 32768 Required: 125 hours 
Training: 2022-07-08 07:30:10,340-Speed 2497.70 samples/sec Loss 2.8944 LearningRate 0.000533 Epoch: 13 Global Step: 284460 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:30:18,481-Speed 2516.05 samples/sec Loss 2.9017 LearningRate 0.000533 Epoch: 13 Global Step: 284470 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:30:26,680-Speed 2498.34 samples/sec Loss 2.8391 LearningRate 0.000533 Epoch: 13 Global Step: 284480 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:30:34,875-Speed 2499.52 samples/sec Loss 2.8228 LearningRate 0.000533 Epoch: 13 Global Step: 284490 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:30:43,083-Speed 2495.56 samples/sec Loss 2.9374 LearningRate 0.000533 Epoch: 13 Global Step: 284500 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:30:51,286-Speed 2496.94 samples/sec Loss 2.8690 LearningRate 0.000533 Epoch: 13 Global Step: 284510 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:30:59,485-Speed 2498.37 samples/sec Loss 2.8621 LearningRate 0.000533 Epoch: 13 Global Step: 284520 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:31:07,629-Speed 2515.02 samples/sec Loss 2.8202 LearningRate 0.000533 Epoch: 13 Global Step: 284530 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:31:15,827-Speed 2498.27 samples/sec Loss 2.8425 LearningRate 0.000533 Epoch: 13 Global Step: 284540 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:31:24,024-Speed 2499.19 samples/sec Loss 2.8667 LearningRate 0.000533 Epoch: 13 Global Step: 284550 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:31:32,225-Speed 2497.70 samples/sec Loss 2.8827 LearningRate 0.000533 Epoch: 13 Global Step: 284560 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:31:40,425-Speed 2497.69 samples/sec Loss 2.8626 LearningRate 0.000533 Epoch: 13 Global Step: 284570 Fp16 Grad Scale: 32768 Required: 125 
hours Training: 2022-07-08 07:31:48,625-Speed 2497.93 samples/sec Loss 2.9113 LearningRate 0.000533 Epoch: 13 Global Step: 284580 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:31:56,777-Speed 2512.70 samples/sec Loss 2.8692 LearningRate 0.000533 Epoch: 13 Global Step: 284590 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:32:04,983-Speed 2496.19 samples/sec Loss 2.8663 LearningRate 0.000533 Epoch: 13 Global Step: 284600 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:32:13,183-Speed 2498.10 samples/sec Loss 2.8528 LearningRate 0.000533 Epoch: 13 Global Step: 284610 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:32:21,385-Speed 2497.24 samples/sec Loss 2.8543 LearningRate 0.000533 Epoch: 13 Global Step: 284620 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:32:29,584-Speed 2498.37 samples/sec Loss 2.8579 LearningRate 0.000533 Epoch: 13 Global Step: 284630 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:32:37,780-Speed 2499.13 samples/sec Loss 2.8977 LearningRate 0.000533 Epoch: 13 Global Step: 284640 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:32:45,933-Speed 2512.33 samples/sec Loss 2.9019 LearningRate 0.000533 Epoch: 13 Global Step: 284650 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:32:54,131-Speed 2498.66 samples/sec Loss 2.9599 LearningRate 0.000533 Epoch: 13 Global Step: 284660 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:33:02,330-Speed 2498.28 samples/sec Loss 2.9697 LearningRate 0.000533 Epoch: 13 Global Step: 284670 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:33:10,532-Speed 2497.37 samples/sec Loss 2.9488 LearningRate 0.000533 Epoch: 13 Global Step: 284680 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:33:18,730-Speed 2498.49 samples/sec Loss 2.9459 LearningRate 0.000533 Epoch: 13 Global Step: 284690 Fp16 Grad Scale: 32768 Required: 
125 hours Training: 2022-07-08 07:33:26,933-Speed 2496.89 samples/sec Loss 2.8722 LearningRate 0.000533 Epoch: 13 Global Step: 284700 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:33:35,094-Speed 2509.85 samples/sec Loss 2.8640 LearningRate 0.000533 Epoch: 13 Global Step: 284710 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:33:43,306-Speed 2494.35 samples/sec Loss 2.8883 LearningRate 0.000533 Epoch: 13 Global Step: 284720 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:33:51,506-Speed 2498.23 samples/sec Loss 2.8898 LearningRate 0.000533 Epoch: 13 Global Step: 284730 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:33:59,702-Speed 2499.15 samples/sec Loss 2.9533 LearningRate 0.000533 Epoch: 13 Global Step: 284740 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:34:07,906-Speed 2496.87 samples/sec Loss 2.8462 LearningRate 0.000532 Epoch: 13 Global Step: 284750 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:34:16,103-Speed 2498.74 samples/sec Loss 2.8951 LearningRate 0.000532 Epoch: 13 Global Step: 284760 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:34:24,244-Speed 2516.19 samples/sec Loss 2.8757 LearningRate 0.000532 Epoch: 13 Global Step: 284770 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:34:32,450-Speed 2496.16 samples/sec Loss 2.8997 LearningRate 0.000532 Epoch: 13 Global Step: 284780 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:34:40,655-Speed 2496.58 samples/sec Loss 2.8666 LearningRate 0.000532 Epoch: 13 Global Step: 284790 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:34:48,858-Speed 2496.93 samples/sec Loss 2.8539 LearningRate 0.000532 Epoch: 13 Global Step: 284800 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:34:57,068-Speed 2494.82 samples/sec Loss 2.8815 LearningRate 0.000532 Epoch: 13 Global Step: 284810 Fp16 Grad Scale: 65536 
Required: 125 hours Training: 2022-07-08 07:35:05,280-Speed 2494.40 samples/sec Loss 2.9315 LearningRate 0.000532 Epoch: 13 Global Step: 284820 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:35:13,424-Speed 2515.15 samples/sec Loss 2.8942 LearningRate 0.000532 Epoch: 13 Global Step: 284830 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:35:21,619-Speed 2499.43 samples/sec Loss 2.8685 LearningRate 0.000532 Epoch: 13 Global Step: 284840 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:35:29,820-Speed 2497.70 samples/sec Loss 2.8310 LearningRate 0.000532 Epoch: 13 Global Step: 284850 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:35:38,029-Speed 2495.68 samples/sec Loss 2.8677 LearningRate 0.000532 Epoch: 13 Global Step: 284860 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:35:46,231-Speed 2497.25 samples/sec Loss 2.8582 LearningRate 0.000532 Epoch: 13 Global Step: 284870 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:35:54,429-Speed 2498.51 samples/sec Loss 2.8259 LearningRate 0.000532 Epoch: 13 Global Step: 284880 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:36:02,575-Speed 2514.50 samples/sec Loss 2.8418 LearningRate 0.000532 Epoch: 13 Global Step: 284890 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:36:10,782-Speed 2495.88 samples/sec Loss 2.8380 LearningRate 0.000532 Epoch: 13 Global Step: 284900 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:36:18,984-Speed 2497.15 samples/sec Loss 2.8302 LearningRate 0.000532 Epoch: 13 Global Step: 284910 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:36:27,180-Speed 2498.99 samples/sec Loss 2.9410 LearningRate 0.000532 Epoch: 13 Global Step: 284920 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:36:35,382-Speed 2497.51 samples/sec Loss 2.9571 LearningRate 0.000532 Epoch: 13 Global Step: 284930 Fp16 Grad Scale: 
65536 Required: 125 hours Training: 2022-07-08 07:36:43,596-Speed 2493.68 samples/sec Loss 2.8280 LearningRate 0.000532 Epoch: 13 Global Step: 284940 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:36:51,742-Speed 2514.64 samples/sec Loss 2.8555 LearningRate 0.000532 Epoch: 13 Global Step: 284950 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:36:59,937-Speed 2499.47 samples/sec Loss 2.8739 LearningRate 0.000532 Epoch: 13 Global Step: 284960 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:37:08,136-Speed 2498.40 samples/sec Loss 2.9280 LearningRate 0.000532 Epoch: 13 Global Step: 284970 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:37:16,337-Speed 2497.65 samples/sec Loss 2.8891 LearningRate 0.000532 Epoch: 13 Global Step: 284980 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:37:24,537-Speed 2498.00 samples/sec Loss 2.8756 LearningRate 0.000532 Epoch: 13 Global Step: 284990 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:37:32,735-Speed 2498.48 samples/sec Loss 2.8644 LearningRate 0.000532 Epoch: 13 Global Step: 285000 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:37:40,885-Speed 2513.23 samples/sec Loss 2.8836 LearningRate 0.000532 Epoch: 13 Global Step: 285010 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:37:49,096-Speed 2494.45 samples/sec Loss 2.8342 LearningRate 0.000532 Epoch: 13 Global Step: 285020 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:37:57,291-Speed 2499.64 samples/sec Loss 2.8865 LearningRate 0.000532 Epoch: 13 Global Step: 285030 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:38:05,487-Speed 2499.04 samples/sec Loss 2.9123 LearningRate 0.000532 Epoch: 13 Global Step: 285040 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:38:13,686-Speed 2498.30 samples/sec Loss 2.8578 LearningRate 0.000532 Epoch: 13 Global Step: 285050 Fp16 Grad 
Scale: 65536 Required: 125 hours Training: 2022-07-08 07:38:21,883-Speed 2498.76 samples/sec Loss 2.8236 LearningRate 0.000532 Epoch: 13 Global Step: 285060 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:38:30,032-Speed 2513.49 samples/sec Loss 2.9063 LearningRate 0.000532 Epoch: 13 Global Step: 285070 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:38:38,230-Speed 2498.69 samples/sec Loss 2.8318 LearningRate 0.000532 Epoch: 13 Global Step: 285080 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:38:46,424-Speed 2499.83 samples/sec Loss 2.8643 LearningRate 0.000532 Epoch: 13 Global Step: 285090 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:38:54,624-Speed 2498.00 samples/sec Loss 2.8612 LearningRate 0.000532 Epoch: 13 Global Step: 285100 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:39:02,822-Speed 2498.49 samples/sec Loss 2.8620 LearningRate 0.000532 Epoch: 13 Global Step: 285110 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:39:11,022-Speed 2498.08 samples/sec Loss 2.8270 LearningRate 0.000532 Epoch: 13 Global Step: 285120 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:39:19,166-Speed 2515.06 samples/sec Loss 2.9069 LearningRate 0.000532 Epoch: 13 Global Step: 285130 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:39:27,360-Speed 2499.61 samples/sec Loss 2.8678 LearningRate 0.000532 Epoch: 13 Global Step: 285140 Fp16 Grad Scale: 65536 Required: 125 hours Training: 2022-07-08 07:39:35,514-Speed 2512.23 samples/sec Loss 2.8766 LearningRate 0.000532 Epoch: 13 Global Step: 285150 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:39:43,710-Speed 2499.22 samples/sec Loss 2.8936 LearningRate 0.000532 Epoch: 13 Global Step: 285160 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:39:51,907-Speed 2498.95 samples/sec Loss 2.8805 LearningRate 0.000532 Epoch: 13 Global Step: 285170 Fp16 
Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:40:00,107-Speed 2497.90 samples/sec Loss 2.8311 LearningRate 0.000532 Epoch: 13 Global Step: 285180 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:40:08,248-Speed 2516.04 samples/sec Loss 2.8598 LearningRate 0.000532 Epoch: 13 Global Step: 285190 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:40:16,450-Speed 2497.22 samples/sec Loss 2.9186 LearningRate 0.000532 Epoch: 13 Global Step: 285200 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:40:24,648-Speed 2498.81 samples/sec Loss 2.9908 LearningRate 0.000532 Epoch: 13 Global Step: 285210 Fp16 Grad Scale: 32768 Required: 125 hours Training: 2022-07-08 07:40:32,857-Speed 2495.21 samples/sec Loss 2.9543 LearningRate 0.000532 Epoch: 13 Global Step: 285220 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:40:41,057-Speed 2498.07 samples/sec Loss 2.8205 LearningRate 0.000532 Epoch: 13 Global Step: 285230 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:40:49,255-Speed 2498.53 samples/sec Loss 2.9048 LearningRate 0.000532 Epoch: 13 Global Step: 285240 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:40:57,399-Speed 2515.15 samples/sec Loss 2.8804 LearningRate 0.000532 Epoch: 13 Global Step: 285250 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:41:05,606-Speed 2495.95 samples/sec Loss 2.8557 LearningRate 0.000531 Epoch: 13 Global Step: 285260 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:41:13,805-Speed 2498.21 samples/sec Loss 2.8013 LearningRate 0.000531 Epoch: 13 Global Step: 285270 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:41:22,006-Speed 2497.88 samples/sec Loss 2.8416 LearningRate 0.000531 Epoch: 13 Global Step: 285280 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:41:30,202-Speed 2499.07 samples/sec Loss 2.8485 LearningRate 0.000531 Epoch: 13 Global Step: 285290 
Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:41:38,402-Speed 2498.08 samples/sec Loss 2.8435 LearningRate 0.000531 Epoch: 13 Global Step: 285300 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:41:46,546-Speed 2514.92 samples/sec Loss 2.8980 LearningRate 0.000531 Epoch: 13 Global Step: 285310 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:41:54,765-Speed 2492.13 samples/sec Loss 2.9033 LearningRate 0.000531 Epoch: 13 Global Step: 285320 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:42:02,959-Speed 2500.00 samples/sec Loss 2.8638 LearningRate 0.000531 Epoch: 13 Global Step: 285330 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:42:11,165-Speed 2496.42 samples/sec Loss 2.7959 LearningRate 0.000531 Epoch: 13 Global Step: 285340 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:42:19,365-Speed 2497.98 samples/sec Loss 2.9054 LearningRate 0.000531 Epoch: 13 Global Step: 285350 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:42:27,562-Speed 2498.78 samples/sec Loss 2.8515 LearningRate 0.000531 Epoch: 13 Global Step: 285360 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:42:35,708-Speed 2514.92 samples/sec Loss 2.7908 LearningRate 0.000531 Epoch: 13 Global Step: 285370 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:42:43,906-Speed 2498.70 samples/sec Loss 2.8134 LearningRate 0.000531 Epoch: 13 Global Step: 285380 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:42:52,107-Speed 2497.51 samples/sec Loss 2.8520 LearningRate 0.000531 Epoch: 13 Global Step: 285390 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:43:00,318-Speed 2494.43 samples/sec Loss 2.8554 LearningRate 0.000531 Epoch: 13 Global Step: 285400 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:43:08,519-Speed 2497.62 samples/sec Loss 2.8461 LearningRate 0.000531 Epoch: 13 Global Step: 
285410 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:43:16,723-Speed 2496.74 samples/sec Loss 2.9307 LearningRate 0.000531 Epoch: 13 Global Step: 285420 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:43:24,872-Speed 2513.67 samples/sec Loss 2.8586 LearningRate 0.000531 Epoch: 13 Global Step: 285430 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:43:33,076-Speed 2496.66 samples/sec Loss 2.8347 LearningRate 0.000531 Epoch: 13 Global Step: 285440 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:43:41,279-Speed 2497.21 samples/sec Loss 2.8520 LearningRate 0.000531 Epoch: 13 Global Step: 285450 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:43:49,489-Speed 2495.03 samples/sec Loss 2.8962 LearningRate 0.000531 Epoch: 13 Global Step: 285460 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:43:57,689-Speed 2498.01 samples/sec Loss 2.8464 LearningRate 0.000531 Epoch: 13 Global Step: 285470 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:44:05,894-Speed 2496.13 samples/sec Loss 2.9337 LearningRate 0.000531 Epoch: 13 Global Step: 285480 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:44:14,039-Speed 2514.80 samples/sec Loss 2.8886 LearningRate 0.000531 Epoch: 13 Global Step: 285490 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:44:22,237-Speed 2498.92 samples/sec Loss 2.9023 LearningRate 0.000531 Epoch: 13 Global Step: 285500 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:44:30,444-Speed 2495.74 samples/sec Loss 2.8843 LearningRate 0.000531 Epoch: 13 Global Step: 285510 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:44:38,643-Speed 2498.11 samples/sec Loss 2.8441 LearningRate 0.000531 Epoch: 13 Global Step: 285520 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:44:46,850-Speed 2495.99 samples/sec Loss 2.9442 LearningRate 0.000531 Epoch: 13 Global 
Step: 285530 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:44:55,048-Speed 2498.57 samples/sec Loss 2.9126 LearningRate 0.000531 Epoch: 13 Global Step: 285540 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:45:03,196-Speed 2513.80 samples/sec Loss 2.9279 LearningRate 0.000531 Epoch: 13 Global Step: 285550 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:45:11,397-Speed 2497.75 samples/sec Loss 2.8694 LearningRate 0.000531 Epoch: 13 Global Step: 285560 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:45:19,602-Speed 2496.47 samples/sec Loss 2.9067 LearningRate 0.000531 Epoch: 13 Global Step: 285570 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:45:27,800-Speed 2498.46 samples/sec Loss 2.9234 LearningRate 0.000531 Epoch: 13 Global Step: 285580 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:45:35,998-Speed 2498.52 samples/sec Loss 2.9325 LearningRate 0.000531 Epoch: 13 Global Step: 285590 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:45:44,200-Speed 2497.33 samples/sec Loss 2.9567 LearningRate 0.000531 Epoch: 13 Global Step: 285600 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:45:52,357-Speed 2511.25 samples/sec Loss 2.9246 LearningRate 0.000531 Epoch: 13 Global Step: 285610 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:46:00,569-Speed 2494.12 samples/sec Loss 2.8986 LearningRate 0.000531 Epoch: 13 Global Step: 285620 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:46:08,766-Speed 2498.99 samples/sec Loss 2.8889 LearningRate 0.000531 Epoch: 13 Global Step: 285630 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:46:16,966-Speed 2497.84 samples/sec Loss 2.8928 LearningRate 0.000531 Epoch: 13 Global Step: 285640 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:46:25,165-Speed 2498.50 samples/sec Loss 2.8940 LearningRate 0.000531 Epoch: 13 
Global Step: 285650 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:46:33,368-Speed 2497.28 samples/sec Loss 2.8652 LearningRate 0.000531 Epoch: 13 Global Step: 285660 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:46:41,510-Speed 2515.77 samples/sec Loss 2.8823 LearningRate 0.000531 Epoch: 13 Global Step: 285670 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:46:49,710-Speed 2497.94 samples/sec Loss 2.8520 LearningRate 0.000531 Epoch: 13 Global Step: 285680 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:46:57,906-Speed 2498.96 samples/sec Loss 2.8563 LearningRate 0.000531 Epoch: 13 Global Step: 285690 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:47:06,103-Speed 2499.05 samples/sec Loss 2.8703 LearningRate 0.000531 Epoch: 13 Global Step: 285700 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:47:14,304-Speed 2497.62 samples/sec Loss 2.9031 LearningRate 0.000531 Epoch: 13 Global Step: 285710 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:47:22,501-Speed 2499.09 samples/sec Loss 2.9028 LearningRate 0.000531 Epoch: 13 Global Step: 285720 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:47:30,639-Speed 2517.15 samples/sec Loss 2.8868 LearningRate 0.000531 Epoch: 13 Global Step: 285730 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:47:38,835-Speed 2499.01 samples/sec Loss 2.8897 LearningRate 0.000531 Epoch: 13 Global Step: 285740 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:47:47,031-Speed 2499.33 samples/sec Loss 2.9069 LearningRate 0.000531 Epoch: 13 Global Step: 285750 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:47:55,233-Speed 2497.12 samples/sec Loss 2.8748 LearningRate 0.000531 Epoch: 13 Global Step: 285760 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:48:03,441-Speed 2495.90 samples/sec Loss 2.9087 LearningRate 0.000530 
Epoch: 13 Global Step: 285770 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:48:11,641-Speed 2497.62 samples/sec Loss 2.9444 LearningRate 0.000530 Epoch: 13 Global Step: 285780 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:48:19,790-Speed 2513.73 samples/sec Loss 2.8702 LearningRate 0.000530 Epoch: 13 Global Step: 285790 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:48:27,991-Speed 2497.71 samples/sec Loss 2.9034 LearningRate 0.000530 Epoch: 13 Global Step: 285800 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:48:36,191-Speed 2498.51 samples/sec Loss 2.8829 LearningRate 0.000530 Epoch: 13 Global Step: 285810 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:48:44,392-Speed 2497.68 samples/sec Loss 2.8678 LearningRate 0.000530 Epoch: 13 Global Step: 285820 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:48:52,590-Speed 2498.53 samples/sec Loss 2.8076 LearningRate 0.000530 Epoch: 13 Global Step: 285830 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:49:00,799-Speed 2495.09 samples/sec Loss 2.8868 LearningRate 0.000530 Epoch: 13 Global Step: 285840 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:49:08,948-Speed 2513.62 samples/sec Loss 2.8550 LearningRate 0.000530 Epoch: 13 Global Step: 285850 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:49:17,144-Speed 2499.37 samples/sec Loss 2.8619 LearningRate 0.000530 Epoch: 13 Global Step: 285860 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:49:25,342-Speed 2498.69 samples/sec Loss 2.8502 LearningRate 0.000530 Epoch: 13 Global Step: 285870 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:49:33,540-Speed 2498.88 samples/sec Loss 2.8984 LearningRate 0.000530 Epoch: 13 Global Step: 285880 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:49:41,738-Speed 2498.40 samples/sec Loss 2.8683 LearningRate 
0.000530 Epoch: 13 Global Step: 285890 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:49:49,940-Speed 2497.54 samples/sec Loss 2.9244 LearningRate 0.000530 Epoch: 13 Global Step: 285900 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:49:58,089-Speed 2513.37 samples/sec Loss 2.8639 LearningRate 0.000530 Epoch: 13 Global Step: 285910 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:50:06,289-Speed 2498.20 samples/sec Loss 2.8309 LearningRate 0.000530 Epoch: 13 Global Step: 285920 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:50:14,488-Speed 2498.13 samples/sec Loss 2.8646 LearningRate 0.000530 Epoch: 13 Global Step: 285930 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:50:22,686-Speed 2498.48 samples/sec Loss 2.8729 LearningRate 0.000530 Epoch: 13 Global Step: 285940 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:50:30,886-Speed 2497.99 samples/sec Loss 2.9225 LearningRate 0.000530 Epoch: 13 Global Step: 285950 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:50:39,092-Speed 2496.22 samples/sec Loss 2.9291 LearningRate 0.000530 Epoch: 13 Global Step: 285960 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:50:47,244-Speed 2512.59 samples/sec Loss 2.8785 LearningRate 0.000530 Epoch: 13 Global Step: 285970 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:50:55,441-Speed 2498.87 samples/sec Loss 2.9257 LearningRate 0.000530 Epoch: 13 Global Step: 285980 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:51:03,657-Speed 2493.12 samples/sec Loss 2.8993 LearningRate 0.000530 Epoch: 13 Global Step: 285990 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:51:11,854-Speed 2499.26 samples/sec Loss 2.8178 LearningRate 0.000530 Epoch: 13 Global Step: 286000 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:51:20,050-Speed 2499.15 samples/sec Loss 2.9218 
LearningRate 0.000530 Epoch: 13 Global Step: 286010 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:51:28,245-Speed 2499.38 samples/sec Loss 2.8751 LearningRate 0.000530 Epoch: 13 Global Step: 286020 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:51:36,390-Speed 2514.83 samples/sec Loss 2.8822 LearningRate 0.000530 Epoch: 13 Global Step: 286030 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:51:44,589-Speed 2498.05 samples/sec Loss 2.9326 LearningRate 0.000530 Epoch: 13 Global Step: 286040 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:51:52,788-Speed 2498.24 samples/sec Loss 2.8931 LearningRate 0.000530 Epoch: 13 Global Step: 286050 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:52:00,990-Speed 2497.56 samples/sec Loss 2.8180 LearningRate 0.000530 Epoch: 13 Global Step: 286060 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:52:09,187-Speed 2498.90 samples/sec Loss 2.8257 LearningRate 0.000530 Epoch: 13 Global Step: 286070 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:52:17,386-Speed 2497.95 samples/sec Loss 2.8827 LearningRate 0.000530 Epoch: 13 Global Step: 286080 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:52:25,538-Speed 2513.00 samples/sec Loss 2.9354 LearningRate 0.000530 Epoch: 13 Global Step: 286090 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:52:33,736-Speed 2498.50 samples/sec Loss 2.8489 LearningRate 0.000530 Epoch: 13 Global Step: 286100 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:52:41,936-Speed 2498.06 samples/sec Loss 2.8956 LearningRate 0.000530 Epoch: 13 Global Step: 286110 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:52:50,134-Speed 2498.63 samples/sec Loss 3.0159 LearningRate 0.000530 Epoch: 13 Global Step: 286120 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:52:58,339-Speed 2496.46 samples/sec Loss 
2.9113 LearningRate 0.000530 Epoch: 13 Global Step: 286130 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:53:06,537-Speed 2498.44 samples/sec Loss 2.8749 LearningRate 0.000530 Epoch: 13 Global Step: 286140 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:53:14,680-Speed 2515.57 samples/sec Loss 2.9060 LearningRate 0.000530 Epoch: 13 Global Step: 286150 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:53:22,890-Speed 2495.16 samples/sec Loss 2.9251 LearningRate 0.000530 Epoch: 13 Global Step: 286160 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:53:31,095-Speed 2496.35 samples/sec Loss 2.8893 LearningRate 0.000530 Epoch: 13 Global Step: 286170 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:53:39,292-Speed 2498.95 samples/sec Loss 2.8776 LearningRate 0.000530 Epoch: 13 Global Step: 286180 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:53:47,488-Speed 2499.28 samples/sec Loss 2.9222 LearningRate 0.000530 Epoch: 13 Global Step: 286190 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:53:55,685-Speed 2499.13 samples/sec Loss 2.8993 LearningRate 0.000530 Epoch: 13 Global Step: 286200 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:54:03,838-Speed 2512.35 samples/sec Loss 2.9030 LearningRate 0.000530 Epoch: 13 Global Step: 286210 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:54:12,031-Speed 2499.94 samples/sec Loss 2.9651 LearningRate 0.000530 Epoch: 13 Global Step: 286220 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:54:20,236-Speed 2496.68 samples/sec Loss 2.9346 LearningRate 0.000530 Epoch: 13 Global Step: 286230 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:54:28,438-Speed 2497.35 samples/sec Loss 2.8647 LearningRate 0.000530 Epoch: 13 Global Step: 286240 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:54:36,638-Speed 2497.82 samples/sec 
Loss 2.9033 LearningRate 0.000530 Epoch: 13 Global Step: 286250 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:54:44,835-Speed 2498.94 samples/sec Loss 2.9400 LearningRate 0.000530 Epoch: 13 Global Step: 286260 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:54:52,979-Speed 2515.03 samples/sec Loss 2.9659 LearningRate 0.000530 Epoch: 13 Global Step: 286270 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:55:01,176-Speed 2499.09 samples/sec Loss 2.8872 LearningRate 0.000530 Epoch: 13 Global Step: 286280 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:55:09,387-Speed 2494.51 samples/sec Loss 2.8516 LearningRate 0.000529 Epoch: 13 Global Step: 286290 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:55:17,586-Speed 2498.14 samples/sec Loss 2.8824 LearningRate 0.000529 Epoch: 13 Global Step: 286300 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:55:25,784-Speed 2498.47 samples/sec Loss 2.8792 LearningRate 0.000529 Epoch: 13 Global Step: 286310 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:55:33,982-Speed 2498.51 samples/sec Loss 2.8898 LearningRate 0.000529 Epoch: 13 Global Step: 286320 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:55:42,135-Speed 2512.54 samples/sec Loss 2.9011 LearningRate 0.000529 Epoch: 13 Global Step: 286330 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:55:50,334-Speed 2498.45 samples/sec Loss 2.8735 LearningRate 0.000529 Epoch: 13 Global Step: 286340 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:55:58,536-Speed 2497.13 samples/sec Loss 2.8961 LearningRate 0.000529 Epoch: 13 Global Step: 286350 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:56:06,737-Speed 2497.97 samples/sec Loss 2.9084 LearningRate 0.000529 Epoch: 13 Global Step: 286360 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:56:14,943-Speed 2496.16 
samples/sec Loss 2.9321 LearningRate 0.000529 Epoch: 13 Global Step: 286370 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:56:23,148-Speed 2496.62 samples/sec Loss 2.9955 LearningRate 0.000529 Epoch: 13 Global Step: 286380 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:56:31,311-Speed 2509.21 samples/sec Loss 2.8872 LearningRate 0.000529 Epoch: 13 Global Step: 286390 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:56:39,508-Speed 2498.73 samples/sec Loss 2.9255 LearningRate 0.000529 Epoch: 13 Global Step: 286400 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:56:47,706-Speed 2498.45 samples/sec Loss 2.9143 LearningRate 0.000529 Epoch: 13 Global Step: 286410 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:56:55,906-Speed 2498.23 samples/sec Loss 2.8916 LearningRate 0.000529 Epoch: 13 Global Step: 286420 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:57:04,118-Speed 2494.11 samples/sec Loss 2.9141 LearningRate 0.000529 Epoch: 13 Global Step: 286430 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:57:12,321-Speed 2497.13 samples/sec Loss 2.9270 LearningRate 0.000529 Epoch: 13 Global Step: 286440 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:57:20,465-Speed 2515.09 samples/sec Loss 2.8722 LearningRate 0.000529 Epoch: 13 Global Step: 286450 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:57:28,670-Speed 2496.57 samples/sec Loss 2.8561 LearningRate 0.000529 Epoch: 13 Global Step: 286460 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:57:36,870-Speed 2498.14 samples/sec Loss 2.9048 LearningRate 0.000529 Epoch: 13 Global Step: 286470 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:57:45,069-Speed 2498.16 samples/sec Loss 2.8448 LearningRate 0.000529 Epoch: 13 Global Step: 286480 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:57:53,267-Speed 
2498.56 samples/sec Loss 2.8851 LearningRate 0.000529 Epoch: 13 Global Step: 286490 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:58:01,479-Speed 2494.34 samples/sec Loss 2.8729 LearningRate 0.000529 Epoch: 13 Global Step: 286500 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:58:09,623-Speed 2515.10 samples/sec Loss 2.8965 LearningRate 0.000529 Epoch: 13 Global Step: 286510 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:58:17,820-Speed 2498.84 samples/sec Loss 2.8658 LearningRate 0.000529 Epoch: 13 Global Step: 286520 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:58:26,018-Speed 2498.67 samples/sec Loss 2.8443 LearningRate 0.000529 Epoch: 13 Global Step: 286530 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:58:34,219-Speed 2497.60 samples/sec Loss 2.8873 LearningRate 0.000529 Epoch: 13 Global Step: 286540 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:58:42,420-Speed 2497.78 samples/sec Loss 2.8595 LearningRate 0.000529 Epoch: 13 Global Step: 286550 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:58:50,618-Speed 2498.61 samples/sec Loss 2.8528 LearningRate 0.000529 Epoch: 13 Global Step: 286560 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:58:58,769-Speed 2512.94 samples/sec Loss 2.8434 LearningRate 0.000529 Epoch: 13 Global Step: 286570 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:59:06,967-Speed 2498.64 samples/sec Loss 2.8386 LearningRate 0.000529 Epoch: 13 Global Step: 286580 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 07:59:15,125-Speed 2510.95 samples/sec Loss 2.8937 LearningRate 0.000529 Epoch: 13 Global Step: 286590 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:59:23,331-Speed 2495.99 samples/sec Loss 2.8801 LearningRate 0.000529 Epoch: 13 Global Step: 286600 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 
07:59:31,527-Speed 2499.05 samples/sec Loss 2.8832 LearningRate 0.000529 Epoch: 13 Global Step: 286610 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:59:39,741-Speed 2494.13 samples/sec Loss 2.9075 LearningRate 0.000529 Epoch: 13 Global Step: 286620 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:59:47,896-Speed 2511.62 samples/sec Loss 2.9165 LearningRate 0.000529 Epoch: 13 Global Step: 286630 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 07:59:56,092-Speed 2499.11 samples/sec Loss 2.8485 LearningRate 0.000529 Epoch: 13 Global Step: 286640 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:00:04,304-Speed 2494.47 samples/sec Loss 2.8740 LearningRate 0.000529 Epoch: 13 Global Step: 286650 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:00:12,500-Speed 2499.13 samples/sec Loss 2.8534 LearningRate 0.000529 Epoch: 13 Global Step: 286660 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:00:20,695-Speed 2499.72 samples/sec Loss 2.7988 LearningRate 0.000529 Epoch: 13 Global Step: 286670 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:00:28,891-Speed 2499.15 samples/sec Loss 2.9090 LearningRate 0.000529 Epoch: 13 Global Step: 286680 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:00:37,032-Speed 2515.91 samples/sec Loss 2.8576 LearningRate 0.000529 Epoch: 13 Global Step: 286690 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:00:45,230-Speed 2498.63 samples/sec Loss 2.8439 LearningRate 0.000529 Epoch: 13 Global Step: 286700 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:00:53,427-Speed 2498.59 samples/sec Loss 2.8863 LearningRate 0.000529 Epoch: 13 Global Step: 286710 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:01:01,639-Speed 2494.56 samples/sec Loss 2.8092 LearningRate 0.000529 Epoch: 13 Global Step: 286720 Fp16 Grad Scale: 32768 Required: 124 hours Training: 
2022-07-08 08:01:09,835-Speed 2499.54 samples/sec Loss 2.9074 LearningRate 0.000529 Epoch: 13 Global Step: 286730 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:01:18,042-Speed 2495.69 samples/sec Loss 2.9170 LearningRate 0.000529 Epoch: 13 Global Step: 286740 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:01:26,186-Speed 2515.09 samples/sec Loss 2.8777 LearningRate 0.000529 Epoch: 13 Global Step: 286750 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:01:34,383-Speed 2498.72 samples/sec Loss 2.8879 LearningRate 0.000529 Epoch: 13 Global Step: 286760 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:01:42,605-Speed 2491.27 samples/sec Loss 2.8330 LearningRate 0.000529 Epoch: 13 Global Step: 286770 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:01:50,816-Speed 2494.57 samples/sec Loss 2.9230 LearningRate 0.000529 Epoch: 13 Global Step: 286780 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:01:59,021-Speed 2496.60 samples/sec Loss 2.9014 LearningRate 0.000529 Epoch: 13 Global Step: 286790 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:02:07,221-Speed 2498.03 samples/sec Loss 2.9489 LearningRate 0.000528 Epoch: 13 Global Step: 286800 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:02:15,369-Speed 2513.56 samples/sec Loss 2.8771 LearningRate 0.000528 Epoch: 13 Global Step: 286810 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:02:23,574-Speed 2496.68 samples/sec Loss 2.8619 LearningRate 0.000528 Epoch: 13 Global Step: 286820 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:02:31,780-Speed 2496.18 samples/sec Loss 2.9153 LearningRate 0.000528 Epoch: 13 Global Step: 286830 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:02:39,978-Speed 2498.50 samples/sec Loss 2.9363 LearningRate 0.000528 Epoch: 13 Global Step: 286840 Fp16 Grad Scale: 32768 Required: 124 hours 
Training: 2022-07-08 08:02:48,176-Speed 2498.58 samples/sec Loss 2.8482 LearningRate 0.000528 Epoch: 13 Global Step: 286850 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:02:56,373-Speed 2498.86 samples/sec Loss 2.9025 LearningRate 0.000528 Epoch: 13 Global Step: 286860 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:03:04,520-Speed 2514.24 samples/sec Loss 2.8780 LearningRate 0.000528 Epoch: 13 Global Step: 286870 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:03:12,718-Speed 2498.50 samples/sec Loss 2.8448 LearningRate 0.000528 Epoch: 13 Global Step: 286880 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:03:20,919-Speed 2497.84 samples/sec Loss 2.8979 LearningRate 0.000528 Epoch: 13 Global Step: 286890 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:03:29,115-Speed 2498.98 samples/sec Loss 2.8982 LearningRate 0.000528 Epoch: 13 Global Step: 286900 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:03:37,315-Speed 2497.98 samples/sec Loss 2.8612 LearningRate 0.000528 Epoch: 13 Global Step: 286910 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:03:45,518-Speed 2497.04 samples/sec Loss 2.8024 LearningRate 0.000528 Epoch: 13 Global Step: 286920 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:03:53,660-Speed 2515.81 samples/sec Loss 2.8641 LearningRate 0.000528 Epoch: 13 Global Step: 286930 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:04:01,857-Speed 2498.74 samples/sec Loss 2.8780 LearningRate 0.000528 Epoch: 13 Global Step: 286940 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:04:10,062-Speed 2496.54 samples/sec Loss 2.8485 LearningRate 0.000528 Epoch: 13 Global Step: 286950 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:04:18,260-Speed 2498.74 samples/sec Loss 2.8393 LearningRate 0.000528 Epoch: 13 Global Step: 286960 Fp16 Grad Scale: 32768 Required: 124 
hours Training: 2022-07-08 08:04:26,459-Speed 2498.13 samples/sec Loss 2.8678 LearningRate 0.000528 Epoch: 13 Global Step: 286970 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:04:34,659-Speed 2497.83 samples/sec Loss 2.8687 LearningRate 0.000528 Epoch: 13 Global Step: 286980 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:04:42,811-Speed 2512.69 samples/sec Loss 2.8767 LearningRate 0.000528 Epoch: 13 Global Step: 286990 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:04:51,014-Speed 2497.04 samples/sec Loss 2.9109 LearningRate 0.000528 Epoch: 13 Global Step: 287000 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:04:59,212-Speed 2498.74 samples/sec Loss 2.8040 LearningRate 0.000528 Epoch: 13 Global Step: 287010 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:05:07,411-Speed 2498.21 samples/sec Loss 2.8542 LearningRate 0.000528 Epoch: 13 Global Step: 287020 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:05:15,616-Speed 2496.36 samples/sec Loss 2.8638 LearningRate 0.000528 Epoch: 13 Global Step: 287030 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:05:23,817-Speed 2497.55 samples/sec Loss 2.9014 LearningRate 0.000528 Epoch: 13 Global Step: 287040 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:05:31,961-Speed 2515.24 samples/sec Loss 2.8273 LearningRate 0.000528 Epoch: 13 Global Step: 287050 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:05:40,161-Speed 2497.79 samples/sec Loss 2.8705 LearningRate 0.000528 Epoch: 13 Global Step: 287060 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:05:48,365-Speed 2496.95 samples/sec Loss 2.8832 LearningRate 0.000528 Epoch: 13 Global Step: 287070 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:05:56,567-Speed 2497.26 samples/sec Loss 2.8824 LearningRate 0.000528 Epoch: 13 Global Step: 287080 Fp16 Grad Scale: 32768 Required: 
124 hours Training: 2022-07-08 08:06:04,790-Speed 2490.76 samples/sec Loss 2.8972 LearningRate 0.000528 Epoch: 13 Global Step: 287090 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:06:12,998-Speed 2495.53 samples/sec Loss 2.8150 LearningRate 0.000528 Epoch: 13 Global Step: 287100 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:06:21,149-Speed 2512.98 samples/sec Loss 2.8243 LearningRate 0.000528 Epoch: 13 Global Step: 287110 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:06:29,354-Speed 2496.30 samples/sec Loss 2.9074 LearningRate 0.000528 Epoch: 13 Global Step: 287120 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:06:37,556-Speed 2497.37 samples/sec Loss 2.8640 LearningRate 0.000528 Epoch: 13 Global Step: 287130 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:06:45,777-Speed 2491.56 samples/sec Loss 2.8273 LearningRate 0.000528 Epoch: 13 Global Step: 287140 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:06:53,991-Speed 2493.97 samples/sec Loss 2.8896 LearningRate 0.000528 Epoch: 13 Global Step: 287150 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:07:02,196-Speed 2496.30 samples/sec Loss 2.8325 LearningRate 0.000528 Epoch: 13 Global Step: 287160 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:07:10,346-Speed 2513.15 samples/sec Loss 2.8519 LearningRate 0.000528 Epoch: 13 Global Step: 287170 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:07:18,545-Speed 2498.42 samples/sec Loss 2.8353 LearningRate 0.000528 Epoch: 13 Global Step: 287180 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:07:26,746-Speed 2497.51 samples/sec Loss 2.7944 LearningRate 0.000528 Epoch: 13 Global Step: 287190 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:07:34,951-Speed 2496.46 samples/sec Loss 2.8089 LearningRate 0.000528 Epoch: 13 Global Step: 287200 Fp16 Grad Scale: 32768 
Required: 124 hours Training: 2022-07-08 08:07:43,209-Speed 2499.68 samples/sec Loss 2.7908 LearningRate 0.000528 Epoch: 13 Global Step: 287210 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:07:51,569-Speed 2499.44 samples/sec Loss 2.7806 LearningRate 0.000528 Epoch: 13 Global Step: 287220 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:07:59,716-Speed 2514.19 samples/sec Loss 2.8006 LearningRate 0.000528 Epoch: 13 Global Step: 287230 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:08:12,014-Speed 1743.92 samples/sec Loss 2.8513 LearningRate 0.000528 Epoch: 13 Global Step: 287240 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:08:20,212-Speed 2499.87 samples/sec Loss 2.8789 LearningRate 0.000528 Epoch: 13 Global Step: 287250 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:08:28,473-Speed 2497.01 samples/sec Loss 2.8636 LearningRate 0.000528 Epoch: 13 Global Step: 287260 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:08:42,404-Speed 1470.23 samples/sec Loss 2.8262 LearningRate 0.000528 Epoch: 13 Global Step: 287270 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:08:50,607-Speed 2498.47 samples/sec Loss 2.8165 LearningRate 0.000528 Epoch: 13 Global Step: 287280 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:08:58,805-Speed 2516.59 samples/sec Loss 2.7722 LearningRate 0.000528 Epoch: 13 Global Step: 287290 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:09:10,682-Speed 1724.47 samples/sec Loss 2.8280 LearningRate 0.000528 Epoch: 13 Global Step: 287300 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:09:18,937-Speed 2500.39 samples/sec Loss 2.7877 LearningRate 0.000527 Epoch: 13 Global Step: 287310 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:09:27,149-Speed 2498.21 samples/sec Loss 2.8054 LearningRate 0.000527 Epoch: 13 Global Step: 287320 Fp16 Grad Scale: 
32768 Required: 124 hours Training: 2022-07-08 08:09:40,263-Speed 1567.29 samples/sec Loss 2.8136 LearningRate 0.000527 Epoch: 13 Global Step: 287330 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:09:50,263-Speed 2502.42 samples/sec Loss 2.8273 LearningRate 0.000527 Epoch: 13 Global Step: 287340 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:09:58,575-Speed 2518.61 samples/sec Loss 2.8346 LearningRate 0.000527 Epoch: 13 Global Step: 287350 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:10:11,292-Speed 1618.16 samples/sec Loss 2.8112 LearningRate 0.000527 Epoch: 13 Global Step: 287360 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:10:19,514-Speed 2503.78 samples/sec Loss 2.8036 LearningRate 0.000527 Epoch: 13 Global Step: 287370 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:10:27,758-Speed 2500.52 samples/sec Loss 2.8302 LearningRate 0.000527 Epoch: 13 Global Step: 287380 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:10:41,490-Speed 1515.29 samples/sec Loss 2.7960 LearningRate 0.000527 Epoch: 13 Global Step: 287390 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:10:49,693-Speed 2501.63 samples/sec Loss 2.9007 LearningRate 0.000527 Epoch: 13 Global Step: 287400 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:10:57,831-Speed 2517.21 samples/sec Loss 2.8792 LearningRate 0.000527 Epoch: 13 Global Step: 287410 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:11:06,075-Speed 2499.97 samples/sec Loss 2.8251 LearningRate 0.000527 Epoch: 13 Global Step: 287420 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:11:18,518-Speed 1732.47 samples/sec Loss 2.8509 LearningRate 0.000527 Epoch: 13 Global Step: 287430 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:11:26,712-Speed 2501.08 samples/sec Loss 2.8819 LearningRate 0.000527 Epoch: 13 Global Step: 287440 Fp16 Grad 
Scale: 32768 Required: 124 hours Training: 2022-07-08 08:11:35,700-Speed 2485.10 samples/sec Loss 2.8309 LearningRate 0.000527 Epoch: 13 Global Step: 287450 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:11:43,918-Speed 2500.38 samples/sec Loss 2.8579 LearningRate 0.000527 Epoch: 13 Global Step: 287460 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:11:54,553-Speed 1925.87 samples/sec Loss 2.8942 LearningRate 0.000527 Epoch: 13 Global Step: 287470 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:12:02,749-Speed 2498.94 samples/sec Loss 2.8817 LearningRate 0.000527 Epoch: 13 Global Step: 287480 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:12:10,948-Speed 2498.28 samples/sec Loss 2.8799 LearningRate 0.000527 Epoch: 13 Global Step: 287490 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:12:19,150-Speed 2497.46 samples/sec Loss 2.8271 LearningRate 0.000527 Epoch: 13 Global Step: 287500 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:12:27,352-Speed 2497.33 samples/sec Loss 2.8124 LearningRate 0.000527 Epoch: 13 Global Step: 287510 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:12:35,558-Speed 2496.15 samples/sec Loss 2.8214 LearningRate 0.000527 Epoch: 13 Global Step: 287520 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:12:43,705-Speed 2514.22 samples/sec Loss 2.8567 LearningRate 0.000527 Epoch: 13 Global Step: 287530 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:12:51,924-Speed 2492.28 samples/sec Loss 2.8157 LearningRate 0.000527 Epoch: 13 Global Step: 287540 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:13:00,138-Speed 2493.89 samples/sec Loss 2.8487 LearningRate 0.000527 Epoch: 13 Global Step: 287550 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:13:08,351-Speed 2493.84 samples/sec Loss 2.7908 LearningRate 0.000527 Epoch: 13 Global Step: 287560 Fp16 
Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:13:16,569-Speed 2492.56 samples/sec Loss 2.8265 LearningRate 0.000527 Epoch: 13 Global Step: 287570 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:13:24,772-Speed 2496.86 samples/sec Loss 2.8071 LearningRate 0.000527 Epoch: 13 Global Step: 287580 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:13:32,923-Speed 2513.09 samples/sec Loss 2.9655 LearningRate 0.000527 Epoch: 13 Global Step: 287590 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:13:41,126-Speed 2497.09 samples/sec Loss 2.8746 LearningRate 0.000527 Epoch: 13 Global Step: 287600 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:13:49,339-Speed 2493.72 samples/sec Loss 2.8222 LearningRate 0.000527 Epoch: 13 Global Step: 287610 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:13:57,553-Speed 2494.08 samples/sec Loss 2.8514 LearningRate 0.000527 Epoch: 13 Global Step: 287620 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:14:05,782-Speed 2489.32 samples/sec Loss 2.8612 LearningRate 0.000527 Epoch: 13 Global Step: 287630 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:14:13,986-Speed 2496.69 samples/sec Loss 2.8809 LearningRate 0.000527 Epoch: 13 Global Step: 287640 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:14:22,144-Speed 2510.82 samples/sec Loss 2.8067 LearningRate 0.000527 Epoch: 13 Global Step: 287650 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:14:30,366-Speed 2491.20 samples/sec Loss 2.8563 LearningRate 0.000527 Epoch: 13 Global Step: 287660 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:14:38,569-Speed 2497.11 samples/sec Loss 2.9454 LearningRate 0.000527 Epoch: 13 Global Step: 287670 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:14:46,779-Speed 2494.77 samples/sec Loss 2.8446 LearningRate 0.000527 Epoch: 13 Global Step: 287680 
Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:14:54,980-Speed 2497.76 samples/sec Loss 2.8797 LearningRate 0.000527 Epoch: 13 Global Step: 287690 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:15:03,183-Speed 2497.04 samples/sec Loss 2.9101 LearningRate 0.000527 Epoch: 13 Global Step: 287700 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:15:11,332-Speed 2513.22 samples/sec Loss 2.8704 LearningRate 0.000527 Epoch: 13 Global Step: 287710 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:15:19,547-Speed 2493.70 samples/sec Loss 2.8998 LearningRate 0.000527 Epoch: 13 Global Step: 287720 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:15:27,750-Speed 2496.97 samples/sec Loss 2.8809 LearningRate 0.000527 Epoch: 13 Global Step: 287730 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:15:35,968-Speed 2492.57 samples/sec Loss 2.8500 LearningRate 0.000527 Epoch: 13 Global Step: 287740 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:15:44,184-Speed 2493.16 samples/sec Loss 2.8701 LearningRate 0.000527 Epoch: 13 Global Step: 287750 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:15:52,396-Speed 2494.27 samples/sec Loss 2.8906 LearningRate 0.000527 Epoch: 13 Global Step: 287760 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:16:00,546-Speed 2513.19 samples/sec Loss 2.8682 LearningRate 0.000527 Epoch: 13 Global Step: 287770 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:16:08,746-Speed 2498.07 samples/sec Loss 2.8459 LearningRate 0.000527 Epoch: 13 Global Step: 287780 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:16:16,949-Speed 2496.79 samples/sec Loss 2.8408 LearningRate 0.000527 Epoch: 13 Global Step: 287790 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:16:25,152-Speed 2497.12 samples/sec Loss 2.8827 LearningRate 0.000527 Epoch: 13 Global Step: 
287800 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:16:33,378-Speed 2489.97 samples/sec Loss 2.8434 LearningRate 0.000527 Epoch: 13 Global Step: 287810 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:16:41,580-Speed 2497.62 samples/sec Loss 2.8521 LearningRate 0.000527 Epoch: 13 Global Step: 287820 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:16:49,744-Speed 2509.20 samples/sec Loss 2.8295 LearningRate 0.000526 Epoch: 13 Global Step: 287830 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:16:57,943-Speed 2498.43 samples/sec Loss 2.8182 LearningRate 0.000526 Epoch: 13 Global Step: 287840 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:17:06,144-Speed 2497.62 samples/sec Loss 2.8450 LearningRate 0.000526 Epoch: 13 Global Step: 287850 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:17:14,346-Speed 2497.09 samples/sec Loss 2.8646 LearningRate 0.000526 Epoch: 13 Global Step: 287860 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:17:22,563-Speed 2492.74 samples/sec Loss 2.8419 LearningRate 0.000526 Epoch: 13 Global Step: 287870 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:17:30,765-Speed 2497.42 samples/sec Loss 2.8687 LearningRate 0.000526 Epoch: 13 Global Step: 287880 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:17:38,911-Speed 2515.15 samples/sec Loss 2.7873 LearningRate 0.000526 Epoch: 13 Global Step: 287890 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:17:47,111-Speed 2497.71 samples/sec Loss 2.9242 LearningRate 0.000526 Epoch: 13 Global Step: 287900 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:17:55,315-Speed 2496.93 samples/sec Loss 2.8695 LearningRate 0.000526 Epoch: 13 Global Step: 287910 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:18:03,516-Speed 2497.64 samples/sec Loss 2.9123 LearningRate 0.000526 Epoch: 13 Global 
Step: 287920 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:18:11,723-Speed 2495.77 samples/sec Loss 2.8861 LearningRate 0.000526 Epoch: 13 Global Step: 287930 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:18:19,925-Speed 2497.26 samples/sec Loss 2.9052 LearningRate 0.000526 Epoch: 13 Global Step: 287940 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:18:28,078-Speed 2512.60 samples/sec Loss 2.9132 LearningRate 0.000526 Epoch: 13 Global Step: 287950 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:18:36,277-Speed 2498.28 samples/sec Loss 2.8830 LearningRate 0.000526 Epoch: 13 Global Step: 287960 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:18:44,484-Speed 2495.52 samples/sec Loss 2.8988 LearningRate 0.000526 Epoch: 13 Global Step: 287970 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:18:52,682-Speed 2498.75 samples/sec Loss 2.8510 LearningRate 0.000526 Epoch: 13 Global Step: 287980 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:19:00,886-Speed 2496.77 samples/sec Loss 2.8911 LearningRate 0.000526 Epoch: 13 Global Step: 287990 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:19:09,090-Speed 2496.55 samples/sec Loss 2.9154 LearningRate 0.000526 Epoch: 13 Global Step: 288000 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:19:17,237-Speed 2514.41 samples/sec Loss 2.8771 LearningRate 0.000526 Epoch: 13 Global Step: 288010 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:19:25,447-Speed 2495.24 samples/sec Loss 2.8322 LearningRate 0.000526 Epoch: 13 Global Step: 288020 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:19:33,650-Speed 2497.11 samples/sec Loss 2.8199 LearningRate 0.000526 Epoch: 13 Global Step: 288030 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:19:41,849-Speed 2498.04 samples/sec Loss 2.8069 LearningRate 0.000526 Epoch: 13 
Global Step: 288040 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:19:50,050-Speed 2497.56 samples/sec Loss 2.8428 LearningRate 0.000526 Epoch: 13 Global Step: 288050 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:19:58,251-Speed 2497.81 samples/sec Loss 2.8883 LearningRate 0.000526 Epoch: 13 Global Step: 288060 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:20:06,398-Speed 2514.25 samples/sec Loss 2.8336 LearningRate 0.000526 Epoch: 13 Global Step: 288070 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:20:14,604-Speed 2496.20 samples/sec Loss 2.8480 LearningRate 0.000526 Epoch: 13 Global Step: 288080 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:20:22,807-Speed 2497.03 samples/sec Loss 2.8857 LearningRate 0.000526 Epoch: 13 Global Step: 288090 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:20:31,010-Speed 2496.89 samples/sec Loss 2.8022 LearningRate 0.000526 Epoch: 13 Global Step: 288100 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:20:39,212-Speed 2497.53 samples/sec Loss 2.8838 LearningRate 0.000526 Epoch: 13 Global Step: 288110 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:20:47,416-Speed 2496.89 samples/sec Loss 2.8589 LearningRate 0.000526 Epoch: 13 Global Step: 288120 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:20:55,565-Speed 2513.53 samples/sec Loss 2.8019 LearningRate 0.000526 Epoch: 13 Global Step: 288130 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:21:03,762-Speed 2499.07 samples/sec Loss 2.7894 LearningRate 0.000526 Epoch: 13 Global Step: 288140 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:21:11,965-Speed 2497.06 samples/sec Loss 2.8565 LearningRate 0.000526 Epoch: 13 Global Step: 288150 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:21:20,166-Speed 2497.72 samples/sec Loss 2.8699 LearningRate 0.000526 
Epoch: 13 Global Step: 288160 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:21:28,367-Speed 2497.99 samples/sec Loss 2.8395 LearningRate 0.000526 Epoch: 13 Global Step: 288170 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:21:36,575-Speed 2495.56 samples/sec Loss 2.7978 LearningRate 0.000526 Epoch: 13 Global Step: 288180 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:21:44,716-Speed 2516.15 samples/sec Loss 2.9856 LearningRate 0.000526 Epoch: 13 Global Step: 288190 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:21:52,916-Speed 2497.90 samples/sec Loss 2.8648 LearningRate 0.000526 Epoch: 13 Global Step: 288200 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:22:01,124-Speed 2495.61 samples/sec Loss 2.8550 LearningRate 0.000526 Epoch: 13 Global Step: 288210 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:22:09,327-Speed 2497.08 samples/sec Loss 2.7989 LearningRate 0.000526 Epoch: 13 Global Step: 288220 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:22:17,529-Speed 2497.05 samples/sec Loss 2.9297 LearningRate 0.000526 Epoch: 13 Global Step: 288230 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:22:25,730-Speed 2497.60 samples/sec Loss 2.8850 LearningRate 0.000526 Epoch: 13 Global Step: 288240 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:22:33,879-Speed 2513.79 samples/sec Loss 2.9003 LearningRate 0.000526 Epoch: 13 Global Step: 288250 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:22:42,085-Speed 2496.51 samples/sec Loss 2.8277 LearningRate 0.000526 Epoch: 13 Global Step: 288260 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:22:50,295-Speed 2494.84 samples/sec Loss 2.8288 LearningRate 0.000526 Epoch: 13 Global Step: 288270 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:22:58,495-Speed 2497.84 samples/sec Loss 2.9097 LearningRate 
0.000526 Epoch: 13 Global Step: 288280 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:23:06,693-Speed 2498.82 samples/sec Loss 2.8905 LearningRate 0.000526 Epoch: 13 Global Step: 288290 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:23:14,896-Speed 2497.03 samples/sec Loss 2.8396 LearningRate 0.000526 Epoch: 13 Global Step: 288300 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:23:23,043-Speed 2514.09 samples/sec Loss 2.8944 LearningRate 0.000526 Epoch: 13 Global Step: 288310 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:23:31,244-Speed 2497.48 samples/sec Loss 2.8884 LearningRate 0.000526 Epoch: 13 Global Step: 288320 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:23:39,446-Speed 2497.34 samples/sec Loss 2.8145 LearningRate 0.000526 Epoch: 13 Global Step: 288330 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:23:47,604-Speed 2510.77 samples/sec Loss 2.8247 LearningRate 0.000525 Epoch: 13 Global Step: 288340 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:23:55,807-Speed 2497.03 samples/sec Loss 2.8534 LearningRate 0.000525 Epoch: 13 Global Step: 288350 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:24:04,008-Speed 2497.73 samples/sec Loss 2.8355 LearningRate 0.000525 Epoch: 13 Global Step: 288360 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:24:12,156-Speed 2513.94 samples/sec Loss 2.8505 LearningRate 0.000525 Epoch: 13 Global Step: 288370 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:24:20,364-Speed 2495.49 samples/sec Loss 2.8245 LearningRate 0.000525 Epoch: 13 Global Step: 288380 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:24:28,567-Speed 2496.87 samples/sec Loss 2.8572 LearningRate 0.000525 Epoch: 13 Global Step: 288390 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:24:36,767-Speed 2497.75 samples/sec Loss 2.8605 
LearningRate 0.000525 Epoch: 13 Global Step: 288400 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:24:44,984-Speed 2492.94 samples/sec Loss 2.8643 LearningRate 0.000525 Epoch: 13 Global Step: 288410 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:24:53,185-Speed 2497.59 samples/sec Loss 2.8585 LearningRate 0.000525 Epoch: 13 Global Step: 288420 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:25:01,333-Speed 2513.86 samples/sec Loss 2.7957 LearningRate 0.000525 Epoch: 13 Global Step: 288430 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:25:09,544-Speed 2494.60 samples/sec Loss 2.8279 LearningRate 0.000525 Epoch: 13 Global Step: 288440 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:25:17,756-Speed 2494.33 samples/sec Loss 2.8004 LearningRate 0.000525 Epoch: 13 Global Step: 288450 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:25:25,959-Speed 2497.33 samples/sec Loss 2.8201 LearningRate 0.000525 Epoch: 13 Global Step: 288460 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:25:34,166-Speed 2495.74 samples/sec Loss 2.8108 LearningRate 0.000525 Epoch: 13 Global Step: 288470 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:25:42,374-Speed 2495.35 samples/sec Loss 2.8591 LearningRate 0.000525 Epoch: 13 Global Step: 288480 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:25:50,522-Speed 2514.05 samples/sec Loss 2.8985 LearningRate 0.000525 Epoch: 13 Global Step: 288490 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:25:58,725-Speed 2497.05 samples/sec Loss 2.9388 LearningRate 0.000525 Epoch: 13 Global Step: 288500 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:26:06,928-Speed 2497.15 samples/sec Loss 2.8388 LearningRate 0.000525 Epoch: 13 Global Step: 288510 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:26:15,130-Speed 2497.44 samples/sec Loss 
2.8879 LearningRate 0.000525 Epoch: 13 Global Step: 288520 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:26:23,332-Speed 2497.62 samples/sec Loss 2.8607 LearningRate 0.000525 Epoch: 13 Global Step: 288530 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:26:31,531-Speed 2498.11 samples/sec Loss 2.9571 LearningRate 0.000525 Epoch: 13 Global Step: 288540 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:26:39,681-Speed 2513.30 samples/sec Loss 2.8665 LearningRate 0.000525 Epoch: 13 Global Step: 288550 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:26:47,888-Speed 2495.90 samples/sec Loss 2.9904 LearningRate 0.000525 Epoch: 13 Global Step: 288560 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:26:56,086-Speed 2498.53 samples/sec Loss 2.9482 LearningRate 0.000525 Epoch: 13 Global Step: 288570 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:27:04,292-Speed 2496.23 samples/sec Loss 2.9566 LearningRate 0.000525 Epoch: 13 Global Step: 288580 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:27:12,495-Speed 2497.09 samples/sec Loss 2.9764 LearningRate 0.000525 Epoch: 13 Global Step: 288590 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:27:20,701-Speed 2496.29 samples/sec Loss 2.9832 LearningRate 0.000525 Epoch: 13 Global Step: 288600 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:27:28,854-Speed 2512.39 samples/sec Loss 2.9807 LearningRate 0.000525 Epoch: 13 Global Step: 288610 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:27:37,055-Speed 2497.61 samples/sec Loss 2.9269 LearningRate 0.000525 Epoch: 13 Global Step: 288620 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:27:45,259-Speed 2496.82 samples/sec Loss 2.8947 LearningRate 0.000525 Epoch: 13 Global Step: 288630 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:27:53,460-Speed 2497.60 samples/sec 
Loss 2.9110 LearningRate 0.000525 Epoch: 13 Global Step: 288640 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:28:01,661-Speed 2497.66 samples/sec Loss 2.8772 LearningRate 0.000525 Epoch: 13 Global Step: 288650 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:28:09,860-Speed 2498.16 samples/sec Loss 2.8980 LearningRate 0.000525 Epoch: 13 Global Step: 288660 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:28:18,009-Speed 2513.64 samples/sec Loss 2.8685 LearningRate 0.000525 Epoch: 13 Global Step: 288670 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:28:26,212-Speed 2497.09 samples/sec Loss 2.8867 LearningRate 0.000525 Epoch: 13 Global Step: 288680 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:28:34,411-Speed 2498.45 samples/sec Loss 2.9017 LearningRate 0.000525 Epoch: 13 Global Step: 288690 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:28:42,610-Speed 2498.23 samples/sec Loss 2.8834 LearningRate 0.000525 Epoch: 13 Global Step: 288700 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:28:50,807-Speed 2498.83 samples/sec Loss 2.8370 LearningRate 0.000525 Epoch: 13 Global Step: 288710 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:28:59,016-Speed 2495.38 samples/sec Loss 2.9040 LearningRate 0.000525 Epoch: 13 Global Step: 288720 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:29:07,173-Speed 2510.89 samples/sec Loss 2.8714 LearningRate 0.000525 Epoch: 13 Global Step: 288730 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:29:15,377-Speed 2496.92 samples/sec Loss 2.8645 LearningRate 0.000525 Epoch: 13 Global Step: 288740 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:29:23,578-Speed 2497.79 samples/sec Loss 2.9149 LearningRate 0.000525 Epoch: 13 Global Step: 288750 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:29:31,777-Speed 2498.38 
samples/sec Loss 2.8131 LearningRate 0.000525 Epoch: 13 Global Step: 288760 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:29:39,974-Speed 2498.88 samples/sec Loss 2.8729 LearningRate 0.000525 Epoch: 13 Global Step: 288770 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:29:48,172-Speed 2498.47 samples/sec Loss 2.9053 LearningRate 0.000525 Epoch: 13 Global Step: 288780 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:29:56,319-Speed 2514.46 samples/sec Loss 2.8627 LearningRate 0.000525 Epoch: 13 Global Step: 288790 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:30:04,521-Speed 2497.55 samples/sec Loss 2.8866 LearningRate 0.000525 Epoch: 13 Global Step: 288800 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:30:12,719-Speed 2498.68 samples/sec Loss 2.8829 LearningRate 0.000525 Epoch: 13 Global Step: 288810 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:30:20,932-Speed 2493.74 samples/sec Loss 2.8785 LearningRate 0.000525 Epoch: 13 Global Step: 288820 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:30:29,132-Speed 2497.93 samples/sec Loss 2.8243 LearningRate 0.000525 Epoch: 13 Global Step: 288830 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:30:37,335-Speed 2497.05 samples/sec Loss 2.8633 LearningRate 0.000525 Epoch: 13 Global Step: 288840 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:30:45,499-Speed 2509.11 samples/sec Loss 2.8736 LearningRate 0.000525 Epoch: 13 Global Step: 288850 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:30:53,702-Speed 2497.00 samples/sec Loss 2.8745 LearningRate 0.000524 Epoch: 13 Global Step: 288860 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:31:01,905-Speed 2496.94 samples/sec Loss 2.8764 LearningRate 0.000524 Epoch: 13 Global Step: 288870 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:31:10,105-Speed 
2498.29 samples/sec Loss 2.9195 LearningRate 0.000524 Epoch: 13 Global Step: 288880 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:31:18,319-Speed 2493.53 samples/sec Loss 2.8684 LearningRate 0.000524 Epoch: 13 Global Step: 288890 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:31:26,516-Speed 2498.78 samples/sec Loss 2.8686 LearningRate 0.000524 Epoch: 13 Global Step: 288900 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:31:34,664-Speed 2514.01 samples/sec Loss 2.8912 LearningRate 0.000524 Epoch: 13 Global Step: 288910 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:31:42,862-Speed 2498.85 samples/sec Loss 2.8324 LearningRate 0.000524 Epoch: 13 Global Step: 288920 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:31:51,062-Speed 2497.95 samples/sec Loss 2.9135 LearningRate 0.000524 Epoch: 13 Global Step: 288930 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:31:59,267-Speed 2496.26 samples/sec Loss 2.9099 LearningRate 0.000524 Epoch: 13 Global Step: 288940 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:32:07,465-Speed 2498.78 samples/sec Loss 2.9427 LearningRate 0.000524 Epoch: 13 Global Step: 288950 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:32:15,665-Speed 2497.86 samples/sec Loss 2.9777 LearningRate 0.000524 Epoch: 13 Global Step: 288960 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:32:23,822-Speed 2510.96 samples/sec Loss 2.8687 LearningRate 0.000524 Epoch: 13 Global Step: 288970 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:32:32,023-Speed 2497.79 samples/sec Loss 2.9221 LearningRate 0.000524 Epoch: 13 Global Step: 288980 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:32:40,225-Speed 2497.54 samples/sec Loss 2.9146 LearningRate 0.000524 Epoch: 13 Global Step: 288990 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 
08:32:48,426-Speed 2497.45 samples/sec Loss 2.9094 LearningRate 0.000524 Epoch: 13 Global Step: 289000 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:32:56,627-Speed 2497.89 samples/sec Loss 2.9480 LearningRate 0.000524 Epoch: 13 Global Step: 289010 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:33:04,826-Speed 2498.15 samples/sec Loss 2.9825 LearningRate 0.000524 Epoch: 13 Global Step: 289020 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:33:12,972-Speed 2514.69 samples/sec Loss 2.9081 LearningRate 0.000524 Epoch: 13 Global Step: 289030 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:33:21,174-Speed 2497.37 samples/sec Loss 2.9227 LearningRate 0.000524 Epoch: 13 Global Step: 289040 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:33:29,374-Speed 2497.75 samples/sec Loss 2.8366 LearningRate 0.000524 Epoch: 13 Global Step: 289050 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:33:37,573-Speed 2498.60 samples/sec Loss 2.9077 LearningRate 0.000524 Epoch: 13 Global Step: 289060 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:33:45,777-Speed 2496.86 samples/sec Loss 2.8685 LearningRate 0.000524 Epoch: 13 Global Step: 289070 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:33:53,974-Speed 2498.68 samples/sec Loss 2.8028 LearningRate 0.000524 Epoch: 13 Global Step: 289080 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:34:02,121-Speed 2514.46 samples/sec Loss 2.8396 LearningRate 0.000524 Epoch: 13 Global Step: 289090 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:34:10,326-Speed 2496.24 samples/sec Loss 2.8462 LearningRate 0.000524 Epoch: 13 Global Step: 289100 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:34:18,543-Speed 2493.09 samples/sec Loss 2.8148 LearningRate 0.000524 Epoch: 13 Global Step: 289110 Fp16 Grad Scale: 32768 Required: 124 hours Training: 
2022-07-08 08:34:26,744-Speed 2497.52 samples/sec Loss 2.7757 LearningRate 0.000524 Epoch: 13 Global Step: 289120 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:34:34,948-Speed 2496.78 samples/sec Loss 2.7715 LearningRate 0.000524 Epoch: 13 Global Step: 289130 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:34:43,154-Speed 2495.83 samples/sec Loss 2.8498 LearningRate 0.000524 Epoch: 13 Global Step: 289140 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:34:51,303-Speed 2513.70 samples/sec Loss 2.8499 LearningRate 0.000524 Epoch: 13 Global Step: 289150 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:34:59,512-Speed 2495.34 samples/sec Loss 2.7676 LearningRate 0.000524 Epoch: 13 Global Step: 289160 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:35:07,718-Speed 2495.99 samples/sec Loss 2.8205 LearningRate 0.000524 Epoch: 13 Global Step: 289170 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:35:15,925-Speed 2496.05 samples/sec Loss 2.8656 LearningRate 0.000524 Epoch: 13 Global Step: 289180 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:35:24,133-Speed 2496.44 samples/sec Loss 2.8550 LearningRate 0.000524 Epoch: 13 Global Step: 289190 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:35:32,356-Speed 2491.01 samples/sec Loss 2.8085 LearningRate 0.000524 Epoch: 13 Global Step: 289200 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:35:40,504-Speed 2513.76 samples/sec Loss 2.8406 LearningRate 0.000524 Epoch: 13 Global Step: 289210 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:35:48,708-Speed 2496.69 samples/sec Loss 2.8415 LearningRate 0.000524 Epoch: 13 Global Step: 289220 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:35:56,906-Speed 2498.53 samples/sec Loss 2.8846 LearningRate 0.000524 Epoch: 13 Global Step: 289230 Fp16 Grad Scale: 32768 Required: 124 hours 
Training: 2022-07-08 08:36:05,111-Speed 2496.36 samples/sec Loss 2.8566 LearningRate 0.000524 Epoch: 13 Global Step: 289240 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:36:13,312-Speed 2497.76 samples/sec Loss 2.8394 LearningRate 0.000524 Epoch: 13 Global Step: 289250 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:36:21,511-Speed 2498.07 samples/sec Loss 2.8520 LearningRate 0.000524 Epoch: 13 Global Step: 289260 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:36:29,656-Speed 2515.01 samples/sec Loss 2.8582 LearningRate 0.000524 Epoch: 13 Global Step: 289270 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:36:37,859-Speed 2497.04 samples/sec Loss 2.8890 LearningRate 0.000524 Epoch: 13 Global Step: 289280 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:36:46,055-Speed 2499.09 samples/sec Loss 2.8380 LearningRate 0.000524 Epoch: 13 Global Step: 289290 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:36:54,268-Speed 2494.10 samples/sec Loss 2.8036 LearningRate 0.000524 Epoch: 13 Global Step: 289300 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:37:02,474-Speed 2496.21 samples/sec Loss 2.9555 LearningRate 0.000524 Epoch: 13 Global Step: 289310 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:37:10,671-Speed 2498.74 samples/sec Loss 2.9043 LearningRate 0.000524 Epoch: 13 Global Step: 289320 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:37:18,817-Speed 2514.35 samples/sec Loss 2.8427 LearningRate 0.000524 Epoch: 13 Global Step: 289330 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:37:27,019-Speed 2497.61 samples/sec Loss 2.8787 LearningRate 0.000524 Epoch: 13 Global Step: 289340 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:37:35,230-Speed 2494.45 samples/sec Loss 2.9014 LearningRate 0.000524 Epoch: 13 Global Step: 289350 Fp16 Grad Scale: 32768 Required: 124 
hours Training: 2022-07-08 08:37:43,428-Speed 2498.40 samples/sec Loss 2.8407 LearningRate 0.000524 Epoch: 13 Global Step: 289360 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:37:51,632-Speed 2496.96 samples/sec Loss 2.9101 LearningRate 0.000523 Epoch: 13 Global Step: 289370 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:37:59,833-Speed 2497.88 samples/sec Loss 2.8060 LearningRate 0.000523 Epoch: 13 Global Step: 289380 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:38:07,978-Speed 2514.84 samples/sec Loss 2.8698 LearningRate 0.000523 Epoch: 13 Global Step: 289390 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:38:16,179-Speed 2497.85 samples/sec Loss 2.8787 LearningRate 0.000523 Epoch: 13 Global Step: 289400 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:38:24,382-Speed 2496.88 samples/sec Loss 2.8237 LearningRate 0.000523 Epoch: 13 Global Step: 289410 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:38:32,591-Speed 2495.17 samples/sec Loss 2.8703 LearningRate 0.000523 Epoch: 13 Global Step: 289420 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:38:40,793-Speed 2497.45 samples/sec Loss 2.8322 LearningRate 0.000523 Epoch: 13 Global Step: 289430 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:38:48,998-Speed 2496.39 samples/sec Loss 2.8175 LearningRate 0.000523 Epoch: 13 Global Step: 289440 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:38:57,148-Speed 2513.49 samples/sec Loss 2.8383 LearningRate 0.000523 Epoch: 13 Global Step: 289450 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:39:05,348-Speed 2497.76 samples/sec Loss 2.8287 LearningRate 0.000523 Epoch: 13 Global Step: 289460 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:39:13,557-Speed 2495.14 samples/sec Loss 2.8781 LearningRate 0.000523 Epoch: 13 Global Step: 289470 Fp16 Grad Scale: 32768 Required: 
124 hours Training: 2022-07-08 08:39:21,766-Speed 2495.31 samples/sec Loss 2.8291 LearningRate 0.000523 Epoch: 13 Global Step: 289480 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:39:29,981-Speed 2493.27 samples/sec Loss 2.8355 LearningRate 0.000523 Epoch: 13 Global Step: 289490 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:39:38,186-Speed 2496.37 samples/sec Loss 2.8334 LearningRate 0.000523 Epoch: 13 Global Step: 289500 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:39:46,339-Speed 2512.61 samples/sec Loss 2.8178 LearningRate 0.000523 Epoch: 13 Global Step: 289510 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:39:54,545-Speed 2496.20 samples/sec Loss 2.8438 LearningRate 0.000523 Epoch: 13 Global Step: 289520 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:40:02,750-Speed 2496.54 samples/sec Loss 2.8124 LearningRate 0.000523 Epoch: 13 Global Step: 289530 Fp16 Grad Scale: 32768 Required: 124 hours Training: 2022-07-08 08:40:10,978-Speed 2489.15 samples/sec Loss 2.8374 LearningRate 0.000523 Epoch: 13 Global Step: 289540 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:40:19,197-Speed 2492.32 samples/sec Loss 2.8531 LearningRate 0.000523 Epoch: 13 Global Step: 289550 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:40:27,407-Speed 2494.86 samples/sec Loss 2.9685 LearningRate 0.000523 Epoch: 13 Global Step: 289560 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:40:35,556-Speed 2513.67 samples/sec Loss 2.8586 LearningRate 0.000523 Epoch: 13 Global Step: 289570 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:40:43,756-Speed 2498.07 samples/sec Loss 2.9186 LearningRate 0.000523 Epoch: 13 Global Step: 289580 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:40:51,960-Speed 2496.98 samples/sec Loss 2.8702 LearningRate 0.000523 Epoch: 13 Global Step: 289590 Fp16 Grad Scale: 65536 
Required: 124 hours Training: 2022-07-08 08:41:00,164-Speed 2496.64 samples/sec Loss 2.8635 LearningRate 0.000523 Epoch: 13 Global Step: 289600 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:41:08,370-Speed 2496.22 samples/sec Loss 2.9227 LearningRate 0.000523 Epoch: 13 Global Step: 289610 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:41:16,571-Speed 2497.88 samples/sec Loss 2.8690 LearningRate 0.000523 Epoch: 13 Global Step: 289620 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:41:24,722-Speed 2513.05 samples/sec Loss 2.8600 LearningRate 0.000523 Epoch: 13 Global Step: 289630 Fp16 Grad Scale: 65536 Required: 124 hours Training: 2022-07-08 08:41:32,927-Speed 2496.45 samples/sec Loss 2.8677 LearningRate 0.000523 Epoch: 13 Global Step: 289640 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:41:41,133-Speed 2496.24 samples/sec Loss 2.8761 LearningRate 0.000523 Epoch: 13 Global Step: 289650 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:41:49,336-Speed 2496.74 samples/sec Loss 2.8390 LearningRate 0.000523 Epoch: 13 Global Step: 289660 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:41:57,542-Speed 2496.65 samples/sec Loss 2.8946 LearningRate 0.000523 Epoch: 13 Global Step: 289670 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:42:05,745-Speed 2497.16 samples/sec Loss 2.8345 LearningRate 0.000523 Epoch: 13 Global Step: 289680 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:42:13,891-Speed 2514.48 samples/sec Loss 2.8815 LearningRate 0.000523 Epoch: 13 Global Step: 289690 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:42:22,092-Speed 2497.58 samples/sec Loss 2.9441 LearningRate 0.000523 Epoch: 13 Global Step: 289700 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:42:30,293-Speed 2497.55 samples/sec Loss 2.8710 LearningRate 0.000523 Epoch: 13 Global Step: 289710 Fp16 Grad Scale: 
65536 Required: 123 hours Training: 2022-07-08 08:42:38,493-Speed 2498.23 samples/sec Loss 2.8586 LearningRate 0.000523 Epoch: 13 Global Step: 289720 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:42:46,693-Speed 2498.05 samples/sec Loss 2.7786 LearningRate 0.000523 Epoch: 13 Global Step: 289730 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:42:54,892-Speed 2498.12 samples/sec Loss 2.8278 LearningRate 0.000523 Epoch: 13 Global Step: 289740 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:43:03,052-Speed 2510.41 samples/sec Loss 2.8757 LearningRate 0.000523 Epoch: 13 Global Step: 289750 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:43:11,248-Speed 2500.06 samples/sec Loss 2.8672 LearningRate 0.000523 Epoch: 13 Global Step: 289760 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:43:19,459-Speed 2494.45 samples/sec Loss 2.8597 LearningRate 0.000523 Epoch: 13 Global Step: 289770 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:43:27,653-Speed 2499.84 samples/sec Loss 2.8820 LearningRate 0.000523 Epoch: 13 Global Step: 289780 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:43:35,874-Speed 2491.72 samples/sec Loss 2.9360 LearningRate 0.000523 Epoch: 13 Global Step: 289790 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:43:44,075-Speed 2497.93 samples/sec Loss 2.8625 LearningRate 0.000523 Epoch: 13 Global Step: 289800 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:43:52,218-Speed 2515.14 samples/sec Loss 2.8909 LearningRate 0.000523 Epoch: 13 Global Step: 289810 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:44:00,419-Speed 2497.67 samples/sec Loss 2.8936 LearningRate 0.000523 Epoch: 13 Global Step: 289820 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:44:08,615-Speed 2499.24 samples/sec Loss 2.8643 LearningRate 0.000523 Epoch: 13 Global Step: 289830 Fp16 Grad 
Scale: 65536 Required: 123 hours Training: 2022-07-08 08:44:16,828-Speed 2494.20 samples/sec Loss 2.8480 LearningRate 0.000523 Epoch: 13 Global Step: 289840 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:44:25,035-Speed 2495.86 samples/sec Loss 2.8698 LearningRate 0.000523 Epoch: 13 Global Step: 289850 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:44:33,239-Speed 2496.79 samples/sec Loss 2.8942 LearningRate 0.000523 Epoch: 13 Global Step: 289860 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:44:41,400-Speed 2509.97 samples/sec Loss 2.7409 LearningRate 0.000523 Epoch: 13 Global Step: 289870 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:44:49,607-Speed 2495.68 samples/sec Loss 2.8782 LearningRate 0.000523 Epoch: 13 Global Step: 289880 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 08:44:57,768-Speed 2509.87 samples/sec Loss 2.8291 LearningRate 0.000522 Epoch: 13 Global Step: 289890 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:45:05,974-Speed 2495.92 samples/sec Loss 2.8317 LearningRate 0.000522 Epoch: 13 Global Step: 289900 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:45:14,177-Speed 2497.28 samples/sec Loss 2.7818 LearningRate 0.000522 Epoch: 13 Global Step: 289910 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:45:22,375-Speed 2498.39 samples/sec Loss 2.8832 LearningRate 0.000522 Epoch: 13 Global Step: 289920 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:45:30,521-Speed 2514.50 samples/sec Loss 2.9027 LearningRate 0.000522 Epoch: 13 Global Step: 289930 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:45:38,725-Speed 2496.65 samples/sec Loss 2.8534 LearningRate 0.000522 Epoch: 13 Global Step: 289940 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:45:46,929-Speed 2497.37 samples/sec Loss 2.8224 LearningRate 0.000522 Epoch: 13 Global Step: 289950 Fp16 
Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:45:55,136-Speed 2495.51 samples/sec Loss 2.8125 LearningRate 0.000522 Epoch: 13 Global Step: 289960 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:46:03,340-Speed 2496.70 samples/sec Loss 2.8603 LearningRate 0.000522 Epoch: 13 Global Step: 289970 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:46:11,544-Speed 2496.87 samples/sec Loss 2.7889 LearningRate 0.000522 Epoch: 13 Global Step: 289980 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:46:19,692-Speed 2514.03 samples/sec Loss 2.8408 LearningRate 0.000522 Epoch: 13 Global Step: 289990 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:46:27,889-Speed 2498.69 samples/sec Loss 2.7848 LearningRate 0.000522 Epoch: 13 Global Step: 290000 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:46:36,089-Speed 2497.95 samples/sec Loss 2.8096 LearningRate 0.000522 Epoch: 13 Global Step: 290010 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:46:44,287-Speed 2498.64 samples/sec Loss 2.8704 LearningRate 0.000522 Epoch: 13 Global Step: 290020 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:46:52,486-Speed 2498.41 samples/sec Loss 2.8491 LearningRate 0.000522 Epoch: 13 Global Step: 290030 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:47:00,699-Speed 2493.87 samples/sec Loss 2.7441 LearningRate 0.000522 Epoch: 13 Global Step: 290040 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:47:08,842-Speed 2515.39 samples/sec Loss 2.8339 LearningRate 0.000522 Epoch: 13 Global Step: 290050 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:47:17,046-Speed 2496.75 samples/sec Loss 2.8631 LearningRate 0.000522 Epoch: 13 Global Step: 290060 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:47:25,244-Speed 2498.57 samples/sec Loss 2.8025 LearningRate 0.000522 Epoch: 13 Global Step: 290070 
Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:47:33,443-Speed 2498.39 samples/sec Loss 2.7788 LearningRate 0.000522 Epoch: 13 Global Step: 290080 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:47:41,642-Speed 2498.01 samples/sec Loss 2.9009 LearningRate 0.000522 Epoch: 13 Global Step: 290090 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:47:49,839-Speed 2498.95 samples/sec Loss 2.8905 LearningRate 0.000522 Epoch: 13 Global Step: 290100 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:47:57,987-Speed 2513.92 samples/sec Loss 2.8900 LearningRate 0.000522 Epoch: 13 Global Step: 290110 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:48:06,189-Speed 2497.24 samples/sec Loss 2.8510 LearningRate 0.000522 Epoch: 13 Global Step: 290120 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:48:14,390-Speed 2497.61 samples/sec Loss 2.8465 LearningRate 0.000522 Epoch: 13 Global Step: 290130 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:48:22,594-Speed 2497.25 samples/sec Loss 2.8371 LearningRate 0.000522 Epoch: 13 Global Step: 290140 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:48:30,790-Speed 2499.03 samples/sec Loss 2.8299 LearningRate 0.000522 Epoch: 13 Global Step: 290150 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:48:38,991-Speed 2497.67 samples/sec Loss 2.8878 LearningRate 0.000522 Epoch: 13 Global Step: 290160 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:48:47,144-Speed 2512.42 samples/sec Loss 2.8448 LearningRate 0.000522 Epoch: 13 Global Step: 290170 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:48:55,342-Speed 2498.59 samples/sec Loss 2.8114 LearningRate 0.000522 Epoch: 13 Global Step: 290180 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:49:03,543-Speed 2497.57 samples/sec Loss 2.9141 LearningRate 0.000522 Epoch: 13 Global Step: 
290190 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:49:11,744-Speed 2497.72 samples/sec Loss 2.8291 LearningRate 0.000522 Epoch: 13 Global Step: 290200 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:49:19,939-Speed 2499.42 samples/sec Loss 2.8557 LearningRate 0.000522 Epoch: 13 Global Step: 290210 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:49:28,143-Speed 2496.95 samples/sec Loss 2.7773 LearningRate 0.000522 Epoch: 13 Global Step: 290220 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:49:36,296-Speed 2512.08 samples/sec Loss 2.8814 LearningRate 0.000522 Epoch: 13 Global Step: 290230 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:49:44,497-Speed 2497.68 samples/sec Loss 2.8637 LearningRate 0.000522 Epoch: 13 Global Step: 290240 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:49:52,695-Speed 2498.75 samples/sec Loss 2.8684 LearningRate 0.000522 Epoch: 13 Global Step: 290250 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:50:00,895-Speed 2498.03 samples/sec Loss 2.8354 LearningRate 0.000522 Epoch: 13 Global Step: 290260 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 08:50:09,051-Speed 2511.53 samples/sec Loss 2.8905 LearningRate 0.000522 Epoch: 13 Global Step: 290270 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:50:17,249-Speed 2498.54 samples/sec Loss 2.9090 LearningRate 0.000522 Epoch: 13 Global Step: 290280 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:50:25,395-Speed 2514.47 samples/sec Loss 2.8116 LearningRate 0.000522 Epoch: 13 Global Step: 290290 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:50:33,597-Speed 2497.49 samples/sec Loss 2.9635 LearningRate 0.000522 Epoch: 13 Global Step: 290300 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:50:41,808-Speed 2494.55 samples/sec Loss 2.8733 LearningRate 0.000522 Epoch: 13 Global 
Step: 290310 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:50:50,007-Speed 2498.22 samples/sec Loss 2.8707 LearningRate 0.000522 Epoch: 13 Global Step: 290320 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:50:58,209-Speed 2497.20 samples/sec Loss 2.9027 LearningRate 0.000522 Epoch: 13 Global Step: 290330 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:51:06,409-Speed 2497.78 samples/sec Loss 2.8509 LearningRate 0.000522 Epoch: 13 Global Step: 290340 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:51:14,568-Speed 2510.57 samples/sec Loss 2.8540 LearningRate 0.000522 Epoch: 13 Global Step: 290350 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:51:24,883-Speed 1985.70 samples/sec Loss 2.8737 LearningRate 0.000522 Epoch: 14 Global Step: 290360 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:51:33,081-Speed 2498.97 samples/sec Loss 2.8878 LearningRate 0.000522 Epoch: 14 Global Step: 290370 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:51:41,275-Speed 2499.56 samples/sec Loss 2.8772 LearningRate 0.000522 Epoch: 14 Global Step: 290380 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:51:49,476-Speed 2498.16 samples/sec Loss 2.8602 LearningRate 0.000522 Epoch: 14 Global Step: 290390 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:51:57,678-Speed 2497.36 samples/sec Loss 2.8251 LearningRate 0.000522 Epoch: 14 Global Step: 290400 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:52:05,823-Speed 2514.60 samples/sec Loss 2.8630 LearningRate 0.000521 Epoch: 14 Global Step: 290410 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:52:14,021-Speed 2498.74 samples/sec Loss 2.8243 LearningRate 0.000521 Epoch: 14 Global Step: 290420 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:52:22,224-Speed 2497.10 samples/sec Loss 2.8241 LearningRate 0.000521 Epoch: 14 
Global Step: 290430 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:52:30,430-Speed 2496.02 samples/sec Loss 2.8876 LearningRate 0.000521 Epoch: 14 Global Step: 290440 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:52:38,626-Speed 2499.24 samples/sec Loss 2.8796 LearningRate 0.000521 Epoch: 14 Global Step: 290450 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:52:46,828-Speed 2497.28 samples/sec Loss 2.8886 LearningRate 0.000521 Epoch: 14 Global Step: 290460 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:52:54,977-Speed 2513.67 samples/sec Loss 2.8520 LearningRate 0.000521 Epoch: 14 Global Step: 290470 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:53:03,180-Speed 2497.19 samples/sec Loss 2.8768 LearningRate 0.000521 Epoch: 14 Global Step: 290480 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:53:11,379-Speed 2498.38 samples/sec Loss 2.8573 LearningRate 0.000521 Epoch: 14 Global Step: 290490 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:53:19,582-Speed 2497.09 samples/sec Loss 2.8665 LearningRate 0.000521 Epoch: 14 Global Step: 290500 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:53:27,780-Speed 2498.63 samples/sec Loss 2.8663 LearningRate 0.000521 Epoch: 14 Global Step: 290510 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:53:35,979-Speed 2498.33 samples/sec Loss 2.8125 LearningRate 0.000521 Epoch: 14 Global Step: 290520 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:53:44,126-Speed 2514.16 samples/sec Loss 2.8564 LearningRate 0.000521 Epoch: 14 Global Step: 290530 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:53:52,331-Speed 2496.50 samples/sec Loss 2.8119 LearningRate 0.000521 Epoch: 14 Global Step: 290540 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:54:00,533-Speed 2497.06 samples/sec Loss 2.8390 LearningRate 0.000521 
Epoch: 14 Global Step: 290550 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:54:08,736-Speed 2497.26 samples/sec Loss 2.8131 LearningRate 0.000521 Epoch: 14 Global Step: 290560 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:54:16,941-Speed 2496.14 samples/sec Loss 2.8083 LearningRate 0.000521 Epoch: 14 Global Step: 290570 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:54:25,141-Speed 2497.91 samples/sec Loss 2.8013 LearningRate 0.000521 Epoch: 14 Global Step: 290580 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:54:33,293-Speed 2513.06 samples/sec Loss 2.8248 LearningRate 0.000521 Epoch: 14 Global Step: 290590 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:54:41,494-Speed 2497.75 samples/sec Loss 2.7915 LearningRate 0.000521 Epoch: 14 Global Step: 290600 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:54:49,693-Speed 2498.15 samples/sec Loss 2.8089 LearningRate 0.000521 Epoch: 14 Global Step: 290610 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:54:57,900-Speed 2496.33 samples/sec Loss 2.7724 LearningRate 0.000521 Epoch: 14 Global Step: 290620 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:55:06,105-Speed 2496.41 samples/sec Loss 2.7689 LearningRate 0.000521 Epoch: 14 Global Step: 290630 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:55:14,303-Speed 2498.63 samples/sec Loss 2.7735 LearningRate 0.000521 Epoch: 14 Global Step: 290640 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:55:22,455-Speed 2513.04 samples/sec Loss 2.7138 LearningRate 0.000521 Epoch: 14 Global Step: 290650 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:55:30,656-Speed 2497.67 samples/sec Loss 2.8221 LearningRate 0.000521 Epoch: 14 Global Step: 290660 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:55:38,863-Speed 2495.92 samples/sec Loss 2.8540 LearningRate 
0.000521 Epoch: 14 Global Step: 290670 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:55:47,064-Speed 2497.68 samples/sec Loss 2.7864 LearningRate 0.000521 Epoch: 14 Global Step: 290680 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:55:55,265-Speed 2497.70 samples/sec Loss 2.8248 LearningRate 0.000521 Epoch: 14 Global Step: 290690 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:56:03,465-Speed 2498.03 samples/sec Loss 2.8338 LearningRate 0.000521 Epoch: 14 Global Step: 290700 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:56:11,612-Speed 2513.88 samples/sec Loss 2.8221 LearningRate 0.000521 Epoch: 14 Global Step: 290710 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:56:19,817-Speed 2496.65 samples/sec Loss 2.8549 LearningRate 0.000521 Epoch: 14 Global Step: 290720 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:56:28,026-Speed 2495.19 samples/sec Loss 2.8605 LearningRate 0.000521 Epoch: 14 Global Step: 290730 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:56:36,227-Speed 2497.91 samples/sec Loss 2.8332 LearningRate 0.000521 Epoch: 14 Global Step: 290740 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:56:44,432-Speed 2496.54 samples/sec Loss 2.7841 LearningRate 0.000521 Epoch: 14 Global Step: 290750 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:56:52,632-Speed 2497.87 samples/sec Loss 2.7992 LearningRate 0.000521 Epoch: 14 Global Step: 290760 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:57:00,776-Speed 2515.13 samples/sec Loss 2.8717 LearningRate 0.000521 Epoch: 14 Global Step: 290770 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:57:08,975-Speed 2498.14 samples/sec Loss 2.8714 LearningRate 0.000521 Epoch: 14 Global Step: 290780 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:57:17,176-Speed 2497.89 samples/sec Loss 2.7931 
LearningRate 0.000521 Epoch: 14 Global Step: 290790 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:57:25,374-Speed 2498.39 samples/sec Loss 2.7721 LearningRate 0.000521 Epoch: 14 Global Step: 290800 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:57:33,577-Speed 2496.90 samples/sec Loss 2.8128 LearningRate 0.000521 Epoch: 14 Global Step: 290810 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:57:41,774-Speed 2498.92 samples/sec Loss 2.8317 LearningRate 0.000521 Epoch: 14 Global Step: 290820 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:57:49,918-Speed 2515.26 samples/sec Loss 2.8057 LearningRate 0.000521 Epoch: 14 Global Step: 290830 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:57:58,120-Speed 2496.99 samples/sec Loss 2.8910 LearningRate 0.000521 Epoch: 14 Global Step: 290840 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:58:06,320-Speed 2498.14 samples/sec Loss 2.8079 LearningRate 0.000521 Epoch: 14 Global Step: 290850 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:58:14,523-Speed 2497.37 samples/sec Loss 2.8626 LearningRate 0.000521 Epoch: 14 Global Step: 290860 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:58:22,727-Speed 2496.60 samples/sec Loss 2.8022 LearningRate 0.000521 Epoch: 14 Global Step: 290870 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:58:30,929-Speed 2497.37 samples/sec Loss 2.7955 LearningRate 0.000521 Epoch: 14 Global Step: 290880 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:58:39,075-Speed 2514.49 samples/sec Loss 2.8305 LearningRate 0.000521 Epoch: 14 Global Step: 290890 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:58:47,273-Speed 2499.36 samples/sec Loss 2.8558 LearningRate 0.000521 Epoch: 14 Global Step: 290900 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:58:55,475-Speed 2497.13 samples/sec Loss 
2.8273 LearningRate 0.000521 Epoch: 14 Global Step: 290910 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:59:03,672-Speed 2499.05 samples/sec Loss 2.8167 LearningRate 0.000520 Epoch: 14 Global Step: 290920 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:59:11,873-Speed 2497.87 samples/sec Loss 2.8296 LearningRate 0.000520 Epoch: 14 Global Step: 290930 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:59:20,066-Speed 2499.97 samples/sec Loss 2.8322 LearningRate 0.000520 Epoch: 14 Global Step: 290940 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:59:28,211-Speed 2514.70 samples/sec Loss 2.8610 LearningRate 0.000520 Epoch: 14 Global Step: 290950 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:59:36,416-Speed 2496.27 samples/sec Loss 2.7548 LearningRate 0.000520 Epoch: 14 Global Step: 290960 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:59:44,626-Speed 2495.16 samples/sec Loss 2.8062 LearningRate 0.000520 Epoch: 14 Global Step: 290970 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 08:59:52,826-Speed 2497.98 samples/sec Loss 2.7740 LearningRate 0.000520 Epoch: 14 Global Step: 290980 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:00:01,041-Speed 2493.32 samples/sec Loss 2.8184 LearningRate 0.000520 Epoch: 14 Global Step: 290990 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:00:09,245-Speed 2496.83 samples/sec Loss 2.7226 LearningRate 0.000520 Epoch: 14 Global Step: 291000 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:00:17,393-Speed 2513.90 samples/sec Loss 2.7745 LearningRate 0.000520 Epoch: 14 Global Step: 291010 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:00:25,592-Speed 2498.36 samples/sec Loss 2.8040 LearningRate 0.000520 Epoch: 14 Global Step: 291020 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:00:33,807-Speed 2493.45 samples/sec 
Loss 2.7169 LearningRate 0.000520 Epoch: 14 Global Step: 291030 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:00:42,002-Speed 2499.29 samples/sec Loss 2.8111 LearningRate 0.000520 Epoch: 14 Global Step: 291040 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:00:50,199-Speed 2498.83 samples/sec Loss 2.8039 LearningRate 0.000520 Epoch: 14 Global Step: 291050 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:00:58,402-Speed 2497.30 samples/sec Loss 2.7713 LearningRate 0.000520 Epoch: 14 Global Step: 291060 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:01:06,550-Speed 2514.17 samples/sec Loss 2.8358 LearningRate 0.000520 Epoch: 14 Global Step: 291070 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:01:14,752-Speed 2497.34 samples/sec Loss 2.8507 LearningRate 0.000520 Epoch: 14 Global Step: 291080 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:01:22,950-Speed 2498.50 samples/sec Loss 2.8001 LearningRate 0.000520 Epoch: 14 Global Step: 291090 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:01:31,153-Speed 2497.15 samples/sec Loss 2.8423 LearningRate 0.000520 Epoch: 14 Global Step: 291100 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:01:39,351-Speed 2498.60 samples/sec Loss 2.8421 LearningRate 0.000520 Epoch: 14 Global Step: 291110 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:01:47,554-Speed 2497.25 samples/sec Loss 2.8589 LearningRate 0.000520 Epoch: 14 Global Step: 291120 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:01:55,703-Speed 2513.40 samples/sec Loss 2.8752 LearningRate 0.000520 Epoch: 14 Global Step: 291130 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:02:03,905-Speed 2497.59 samples/sec Loss 2.8729 LearningRate 0.000520 Epoch: 14 Global Step: 291140 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:02:12,101-Speed 2499.06 
samples/sec Loss 2.8666 LearningRate 0.000520 Epoch: 14 Global Step: 291150 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:02:20,299-Speed 2498.38 samples/sec Loss 2.8367 LearningRate 0.000520 Epoch: 14 Global Step: 291160 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:02:28,497-Speed 2498.77 samples/sec Loss 2.8389 LearningRate 0.000520 Epoch: 14 Global Step: 291170 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:02:36,696-Speed 2498.21 samples/sec Loss 2.8407 LearningRate 0.000520 Epoch: 14 Global Step: 291180 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:02:44,852-Speed 2511.47 samples/sec Loss 2.8580 LearningRate 0.000520 Epoch: 14 Global Step: 291190 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:02:53,050-Speed 2498.63 samples/sec Loss 2.8386 LearningRate 0.000520 Epoch: 14 Global Step: 291200 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:03:01,253-Speed 2497.05 samples/sec Loss 2.8921 LearningRate 0.000520 Epoch: 14 Global Step: 291210 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:03:09,456-Speed 2496.99 samples/sec Loss 2.8679 LearningRate 0.000520 Epoch: 14 Global Step: 291220 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:03:17,654-Speed 2499.12 samples/sec Loss 2.7812 LearningRate 0.000520 Epoch: 14 Global Step: 291230 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:03:25,853-Speed 2498.22 samples/sec Loss 2.8069 LearningRate 0.000520 Epoch: 14 Global Step: 291240 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:03:34,001-Speed 2514.26 samples/sec Loss 2.8488 LearningRate 0.000520 Epoch: 14 Global Step: 291250 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:03:42,205-Speed 2496.63 samples/sec Loss 2.7928 LearningRate 0.000520 Epoch: 14 Global Step: 291260 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:03:50,407-Speed 
2497.58 samples/sec Loss 2.8162 LearningRate 0.000520 Epoch: 14 Global Step: 291270 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:03:58,607-Speed 2497.77 samples/sec Loss 2.8559 LearningRate 0.000520 Epoch: 14 Global Step: 291280 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:04:06,805-Speed 2498.73 samples/sec Loss 2.8687 LearningRate 0.000520 Epoch: 14 Global Step: 291290 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:04:15,006-Speed 2497.83 samples/sec Loss 2.7945 LearningRate 0.000520 Epoch: 14 Global Step: 291300 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:04:23,152-Speed 2514.37 samples/sec Loss 2.8658 LearningRate 0.000520 Epoch: 14 Global Step: 291310 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:04:31,353-Speed 2497.68 samples/sec Loss 2.8676 LearningRate 0.000520 Epoch: 14 Global Step: 291320 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:04:39,550-Speed 2498.99 samples/sec Loss 2.8699 LearningRate 0.000520 Epoch: 14 Global Step: 291330 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:04:47,748-Speed 2498.43 samples/sec Loss 2.8367 LearningRate 0.000520 Epoch: 14 Global Step: 291340 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:04:55,954-Speed 2496.16 samples/sec Loss 2.8647 LearningRate 0.000520 Epoch: 14 Global Step: 291350 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:05:04,157-Speed 2496.99 samples/sec Loss 2.8483 LearningRate 0.000520 Epoch: 14 Global Step: 291360 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:05:12,314-Speed 2511.37 samples/sec Loss 2.9378 LearningRate 0.000520 Epoch: 14 Global Step: 291370 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:05:20,515-Speed 2497.64 samples/sec Loss 2.8534 LearningRate 0.000520 Epoch: 14 Global Step: 291380 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 
09:05:28,719-Speed 2496.37 samples/sec Loss 2.9231 LearningRate 0.000520 Epoch: 14 Global Step: 291390 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:05:36,924-Speed 2496.74 samples/sec Loss 2.8948 LearningRate 0.000520 Epoch: 14 Global Step: 291400 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:05:45,126-Speed 2497.24 samples/sec Loss 2.9068 LearningRate 0.000520 Epoch: 14 Global Step: 291410 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:05:53,340-Speed 2493.50 samples/sec Loss 2.9311 LearningRate 0.000520 Epoch: 14 Global Step: 291420 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:06:01,491-Speed 2513.13 samples/sec Loss 2.8606 LearningRate 0.000520 Epoch: 14 Global Step: 291430 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:06:09,694-Speed 2497.01 samples/sec Loss 2.8639 LearningRate 0.000519 Epoch: 14 Global Step: 291440 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:06:17,900-Speed 2496.17 samples/sec Loss 2.8581 LearningRate 0.000519 Epoch: 14 Global Step: 291450 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:06:26,104-Speed 2496.74 samples/sec Loss 2.8621 LearningRate 0.000519 Epoch: 14 Global Step: 291460 Fp16 Grad Scale: 16384 Required: 123 hours Training: 2022-07-08 09:06:34,306-Speed 2497.42 samples/sec Loss 2.8747 LearningRate 0.000519 Epoch: 14 Global Step: 291470 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:06:42,510-Speed 2496.65 samples/sec Loss 2.8664 LearningRate 0.000519 Epoch: 14 Global Step: 291480 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:06:50,657-Speed 2514.05 samples/sec Loss 2.7898 LearningRate 0.000519 Epoch: 14 Global Step: 291490 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:06:58,861-Speed 2496.83 samples/sec Loss 2.8702 LearningRate 0.000519 Epoch: 14 Global Step: 291500 Fp16 Grad Scale: 32768 Required: 123 hours Training: 
2022-07-08 09:07:07,075-Speed 2493.83 samples/sec Loss 2.8175 LearningRate 0.000519 Epoch: 14 Global Step: 291510 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:07:15,278-Speed 2497.11 samples/sec Loss 2.8296 LearningRate 0.000519 Epoch: 14 Global Step: 291520 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:07:23,481-Speed 2497.01 samples/sec Loss 2.9011 LearningRate 0.000519 Epoch: 14 Global Step: 291530 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:07:31,684-Speed 2496.92 samples/sec Loss 2.9118 LearningRate 0.000519 Epoch: 14 Global Step: 291540 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:07:39,840-Speed 2511.42 samples/sec Loss 2.8685 LearningRate 0.000519 Epoch: 14 Global Step: 291550 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:07:48,044-Speed 2496.71 samples/sec Loss 2.8601 LearningRate 0.000519 Epoch: 14 Global Step: 291560 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:07:56,246-Speed 2497.40 samples/sec Loss 2.8192 LearningRate 0.000519 Epoch: 14 Global Step: 291570 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:08:04,448-Speed 2497.29 samples/sec Loss 2.8309 LearningRate 0.000519 Epoch: 14 Global Step: 291580 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:08:12,651-Speed 2497.19 samples/sec Loss 2.8685 LearningRate 0.000519 Epoch: 14 Global Step: 291590 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:08:20,851-Speed 2497.89 samples/sec Loss 2.8228 LearningRate 0.000519 Epoch: 14 Global Step: 291600 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:08:29,002-Speed 2514.21 samples/sec Loss 2.8620 LearningRate 0.000519 Epoch: 14 Global Step: 291610 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:08:37,223-Speed 2491.56 samples/sec Loss 2.8111 LearningRate 0.000519 Epoch: 14 Global Step: 291620 Fp16 Grad Scale: 32768 Required: 123 hours 
Training: 2022-07-08 09:08:45,426-Speed 2496.92 samples/sec Loss 2.7689 LearningRate 0.000519 Epoch: 14 Global Step: 291630 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:08:53,643-Speed 2492.97 samples/sec Loss 2.7913 LearningRate 0.000519 Epoch: 14 Global Step: 291640 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:09:01,842-Speed 2498.26 samples/sec Loss 2.8881 LearningRate 0.000519 Epoch: 14 Global Step: 291650 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:09:10,045-Speed 2496.96 samples/sec Loss 2.7944 LearningRate 0.000519 Epoch: 14 Global Step: 291660 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:09:18,190-Speed 2514.77 samples/sec Loss 2.8307 LearningRate 0.000519 Epoch: 14 Global Step: 291670 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:09:26,391-Speed 2497.81 samples/sec Loss 2.8222 LearningRate 0.000519 Epoch: 14 Global Step: 291680 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:09:34,590-Speed 2498.26 samples/sec Loss 2.7635 LearningRate 0.000519 Epoch: 14 Global Step: 291690 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:09:42,788-Speed 2498.67 samples/sec Loss 2.8339 LearningRate 0.000519 Epoch: 14 Global Step: 291700 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:09:50,984-Speed 2499.20 samples/sec Loss 2.8984 LearningRate 0.000519 Epoch: 14 Global Step: 291710 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:09:59,186-Speed 2497.25 samples/sec Loss 2.7776 LearningRate 0.000519 Epoch: 14 Global Step: 291720 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:10:07,337-Speed 2512.89 samples/sec Loss 2.8304 LearningRate 0.000519 Epoch: 14 Global Step: 291730 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:10:15,543-Speed 2496.17 samples/sec Loss 2.8153 LearningRate 0.000519 Epoch: 14 Global Step: 291740 Fp16 Grad Scale: 32768 Required: 123 
hours Training: 2022-07-08 09:10:23,747-Speed 2496.98 samples/sec Loss 2.8066 LearningRate 0.000519 Epoch: 14 Global Step: 291750 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:10:31,948-Speed 2497.70 samples/sec Loss 2.8974 LearningRate 0.000519 Epoch: 14 Global Step: 291760 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:10:40,144-Speed 2498.88 samples/sec Loss 2.8928 LearningRate 0.000519 Epoch: 14 Global Step: 291770 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:10:48,349-Speed 2496.60 samples/sec Loss 2.8258 LearningRate 0.000519 Epoch: 14 Global Step: 291780 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:10:56,494-Speed 2514.77 samples/sec Loss 2.9080 LearningRate 0.000519 Epoch: 14 Global Step: 291790 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:11:04,698-Speed 2496.60 samples/sec Loss 2.8430 LearningRate 0.000519 Epoch: 14 Global Step: 291800 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:11:12,921-Speed 2491.18 samples/sec Loss 2.7712 LearningRate 0.000519 Epoch: 14 Global Step: 291810 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:11:21,122-Speed 2497.66 samples/sec Loss 2.8388 LearningRate 0.000519 Epoch: 14 Global Step: 291820 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:11:29,322-Speed 2497.93 samples/sec Loss 2.8912 LearningRate 0.000519 Epoch: 14 Global Step: 291830 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:11:37,524-Speed 2497.51 samples/sec Loss 2.8355 LearningRate 0.000519 Epoch: 14 Global Step: 291840 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:11:45,673-Speed 2513.66 samples/sec Loss 2.8826 LearningRate 0.000519 Epoch: 14 Global Step: 291850 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:11:53,872-Speed 2498.28 samples/sec Loss 2.8339 LearningRate 0.000519 Epoch: 14 Global Step: 291860 Fp16 Grad Scale: 32768 Required: 
123 hours Training: 2022-07-08 09:12:02,069-Speed 2498.89 samples/sec Loss 2.7783 LearningRate 0.000519 Epoch: 14 Global Step: 291870 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:12:10,281-Speed 2494.16 samples/sec Loss 2.8427 LearningRate 0.000519 Epoch: 14 Global Step: 291880 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:12:18,490-Speed 2495.37 samples/sec Loss 2.8019 LearningRate 0.000519 Epoch: 14 Global Step: 291890 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:12:26,690-Speed 2497.97 samples/sec Loss 2.8397 LearningRate 0.000519 Epoch: 14 Global Step: 291900 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:12:34,841-Speed 2513.10 samples/sec Loss 2.8526 LearningRate 0.000519 Epoch: 14 Global Step: 291910 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:12:43,054-Speed 2493.78 samples/sec Loss 2.8984 LearningRate 0.000519 Epoch: 14 Global Step: 291920 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:12:51,254-Speed 2498.51 samples/sec Loss 2.8071 LearningRate 0.000519 Epoch: 14 Global Step: 291930 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:12:59,462-Speed 2495.52 samples/sec Loss 2.7836 LearningRate 0.000519 Epoch: 14 Global Step: 291940 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:13:07,659-Speed 2498.61 samples/sec Loss 2.8097 LearningRate 0.000519 Epoch: 14 Global Step: 291950 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:13:15,863-Speed 2497.35 samples/sec Loss 2.8512 LearningRate 0.000518 Epoch: 14 Global Step: 291960 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:13:24,010-Speed 2514.16 samples/sec Loss 2.8643 LearningRate 0.000518 Epoch: 14 Global Step: 291970 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:13:32,209-Speed 2498.31 samples/sec Loss 2.8386 LearningRate 0.000518 Epoch: 14 Global Step: 291980 Fp16 Grad Scale: 32768 
Required: 123 hours Training: 2022-07-08 09:13:40,418-Speed 2495.25 samples/sec Loss 2.8324 LearningRate 0.000518 Epoch: 14 Global Step: 291990 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:13:48,627-Speed 2495.48 samples/sec Loss 2.8409 LearningRate 0.000518 Epoch: 14 Global Step: 292000 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:13:56,829-Speed 2497.14 samples/sec Loss 2.8064 LearningRate 0.000518 Epoch: 14 Global Step: 292010 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:14:05,030-Speed 2497.73 samples/sec Loss 2.8102 LearningRate 0.000518 Epoch: 14 Global Step: 292020 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:14:13,175-Speed 2514.84 samples/sec Loss 2.8267 LearningRate 0.000518 Epoch: 14 Global Step: 292030 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:14:21,377-Speed 2497.39 samples/sec Loss 2.8910 LearningRate 0.000518 Epoch: 14 Global Step: 292040 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:14:29,578-Speed 2497.53 samples/sec Loss 2.8347 LearningRate 0.000518 Epoch: 14 Global Step: 292050 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:14:37,785-Speed 2495.92 samples/sec Loss 2.9030 LearningRate 0.000518 Epoch: 14 Global Step: 292060 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:14:45,996-Speed 2494.57 samples/sec Loss 2.8409 LearningRate 0.000518 Epoch: 14 Global Step: 292070 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:14:54,195-Speed 2498.22 samples/sec Loss 2.8903 LearningRate 0.000518 Epoch: 14 Global Step: 292080 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:15:02,344-Speed 2513.55 samples/sec Loss 2.7911 LearningRate 0.000518 Epoch: 14 Global Step: 292090 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:15:10,547-Speed 2496.99 samples/sec Loss 2.8407 LearningRate 0.000518 Epoch: 14 Global Step: 292100 Fp16 Grad Scale: 
32768 Required: 123 hours Training: 2022-07-08 09:15:18,746-Speed 2498.36 samples/sec Loss 2.8550 LearningRate 0.000518 Epoch: 14 Global Step: 292110 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:15:26,959-Speed 2493.89 samples/sec Loss 2.7998 LearningRate 0.000518 Epoch: 14 Global Step: 292120 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:15:35,161-Speed 2497.94 samples/sec Loss 2.8285 LearningRate 0.000518 Epoch: 14 Global Step: 292130 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:15:43,359-Speed 2498.41 samples/sec Loss 2.8833 LearningRate 0.000518 Epoch: 14 Global Step: 292140 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:15:51,507-Speed 2513.85 samples/sec Loss 2.9524 LearningRate 0.000518 Epoch: 14 Global Step: 292150 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:15:59,708-Speed 2497.71 samples/sec Loss 2.9181 LearningRate 0.000518 Epoch: 14 Global Step: 292160 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:16:07,911-Speed 2497.08 samples/sec Loss 2.7986 LearningRate 0.000518 Epoch: 14 Global Step: 292170 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:16:16,113-Speed 2497.31 samples/sec Loss 2.8592 LearningRate 0.000518 Epoch: 14 Global Step: 292180 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:16:24,313-Speed 2498.03 samples/sec Loss 2.8618 LearningRate 0.000518 Epoch: 14 Global Step: 292190 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:16:32,526-Speed 2493.77 samples/sec Loss 2.8363 LearningRate 0.000518 Epoch: 14 Global Step: 292200 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:16:40,673-Speed 2514.38 samples/sec Loss 2.8931 LearningRate 0.000518 Epoch: 14 Global Step: 292210 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:16:48,889-Speed 2493.05 samples/sec Loss 2.8699 LearningRate 0.000518 Epoch: 14 Global Step: 292220 Fp16 Grad 
Scale: 32768 Required: 123 hours Training: 2022-07-08 09:16:57,091-Speed 2497.31 samples/sec Loss 2.8591 LearningRate 0.000518 Epoch: 14 Global Step: 292230 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:17:05,293-Speed 2497.39 samples/sec Loss 2.8234 LearningRate 0.000518 Epoch: 14 Global Step: 292240 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:17:13,493-Speed 2497.93 samples/sec Loss 2.7906 LearningRate 0.000518 Epoch: 14 Global Step: 292250 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:17:21,692-Speed 2498.28 samples/sec Loss 2.8742 LearningRate 0.000518 Epoch: 14 Global Step: 292260 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:17:29,835-Speed 2515.36 samples/sec Loss 2.8480 LearningRate 0.000518 Epoch: 14 Global Step: 292270 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:17:38,033-Speed 2498.81 samples/sec Loss 2.8184 LearningRate 0.000518 Epoch: 14 Global Step: 292280 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:17:46,232-Speed 2498.35 samples/sec Loss 2.8749 LearningRate 0.000518 Epoch: 14 Global Step: 292290 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:17:54,442-Speed 2494.74 samples/sec Loss 2.8455 LearningRate 0.000518 Epoch: 14 Global Step: 292300 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:18:02,640-Speed 2498.77 samples/sec Loss 2.8198 LearningRate 0.000518 Epoch: 14 Global Step: 292310 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:18:10,842-Speed 2497.34 samples/sec Loss 2.8143 LearningRate 0.000518 Epoch: 14 Global Step: 292320 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:18:18,986-Speed 2515.13 samples/sec Loss 2.9149 LearningRate 0.000518 Epoch: 14 Global Step: 292330 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:18:27,187-Speed 2497.53 samples/sec Loss 2.9089 LearningRate 0.000518 Epoch: 14 Global Step: 292340 Fp16 
Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:18:35,390-Speed 2497.35 samples/sec Loss 2.8944 LearningRate 0.000518 Epoch: 14 Global Step: 292350 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:18:43,590-Speed 2498.00 samples/sec Loss 2.8011 LearningRate 0.000518 Epoch: 14 Global Step: 292360 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:18:51,787-Speed 2498.80 samples/sec Loss 2.8594 LearningRate 0.000518 Epoch: 14 Global Step: 292370 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:18:59,988-Speed 2497.64 samples/sec Loss 2.8229 LearningRate 0.000518 Epoch: 14 Global Step: 292380 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:19:08,131-Speed 2515.80 samples/sec Loss 2.8833 LearningRate 0.000518 Epoch: 14 Global Step: 292390 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:19:16,351-Speed 2491.73 samples/sec Loss 2.8471 LearningRate 0.000518 Epoch: 14 Global Step: 292400 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:19:24,550-Speed 2498.70 samples/sec Loss 2.8096 LearningRate 0.000518 Epoch: 14 Global Step: 292410 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:19:32,751-Speed 2497.92 samples/sec Loss 2.8599 LearningRate 0.000518 Epoch: 14 Global Step: 292420 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:19:40,949-Speed 2498.36 samples/sec Loss 2.8820 LearningRate 0.000518 Epoch: 14 Global Step: 292430 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:19:49,152-Speed 2497.14 samples/sec Loss 2.8788 LearningRate 0.000518 Epoch: 14 Global Step: 292440 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:19:57,299-Speed 2514.17 samples/sec Loss 2.8401 LearningRate 0.000518 Epoch: 14 Global Step: 292450 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:20:05,499-Speed 2497.96 samples/sec Loss 2.8424 LearningRate 0.000518 Epoch: 14 Global Step: 292460 
Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:20:13,706-Speed 2495.98 samples/sec Loss 2.8309 LearningRate 0.000518 Epoch: 14 Global Step: 292470 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:20:21,908-Speed 2497.24 samples/sec Loss 2.8522 LearningRate 0.000517 Epoch: 14 Global Step: 292480 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:20:30,110-Speed 2497.47 samples/sec Loss 2.8099 LearningRate 0.000517 Epoch: 14 Global Step: 292490 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:20:38,309-Speed 2498.33 samples/sec Loss 2.8838 LearningRate 0.000517 Epoch: 14 Global Step: 292500 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:20:46,453-Speed 2515.17 samples/sec Loss 2.8436 LearningRate 0.000517 Epoch: 14 Global Step: 292510 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:20:54,655-Speed 2497.38 samples/sec Loss 2.8362 LearningRate 0.000517 Epoch: 14 Global Step: 292520 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:21:02,853-Speed 2498.82 samples/sec Loss 2.8578 LearningRate 0.000517 Epoch: 14 Global Step: 292530 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:21:11,056-Speed 2496.97 samples/sec Loss 2.8970 LearningRate 0.000517 Epoch: 14 Global Step: 292540 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:21:19,265-Speed 2495.34 samples/sec Loss 2.9067 LearningRate 0.000517 Epoch: 14 Global Step: 292550 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:21:27,465-Speed 2498.04 samples/sec Loss 2.8846 LearningRate 0.000517 Epoch: 14 Global Step: 292560 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:21:35,613-Speed 2513.87 samples/sec Loss 2.8775 LearningRate 0.000517 Epoch: 14 Global Step: 292570 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:21:43,812-Speed 2498.29 samples/sec Loss 2.8132 LearningRate 0.000517 Epoch: 14 Global Step: 
292580 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:21:52,015-Speed 2496.98 samples/sec Loss 2.8485 LearningRate 0.000517 Epoch: 14 Global Step: 292590 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:22:00,321-Speed 2466.18 samples/sec Loss 2.8677 LearningRate 0.000517 Epoch: 14 Global Step: 292600 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:22:08,521-Speed 2498.03 samples/sec Loss 2.8476 LearningRate 0.000517 Epoch: 14 Global Step: 292610 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:22:16,723-Speed 2497.06 samples/sec Loss 2.8651 LearningRate 0.000517 Epoch: 14 Global Step: 292620 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:22:24,874-Speed 2513.18 samples/sec Loss 2.7953 LearningRate 0.000517 Epoch: 14 Global Step: 292630 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:22:33,077-Speed 2497.14 samples/sec Loss 2.8761 LearningRate 0.000517 Epoch: 14 Global Step: 292640 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:22:41,279-Speed 2497.35 samples/sec Loss 2.8729 LearningRate 0.000517 Epoch: 14 Global Step: 292650 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:22:49,478-Speed 2498.38 samples/sec Loss 2.8300 LearningRate 0.000517 Epoch: 14 Global Step: 292660 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:22:57,681-Speed 2497.15 samples/sec Loss 2.9283 LearningRate 0.000517 Epoch: 14 Global Step: 292670 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:23:05,887-Speed 2496.17 samples/sec Loss 2.8505 LearningRate 0.000517 Epoch: 14 Global Step: 292680 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:23:14,032-Speed 2514.75 samples/sec Loss 2.8424 LearningRate 0.000517 Epoch: 14 Global Step: 292690 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:23:22,232-Speed 2498.03 samples/sec Loss 2.8278 LearningRate 0.000517 Epoch: 14 Global 
Step: 292700 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:23:30,433-Speed 2497.62 samples/sec Loss 2.8070 LearningRate 0.000517 Epoch: 14 Global Step: 292710 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:23:38,635-Speed 2497.37 samples/sec Loss 2.8652 LearningRate 0.000517 Epoch: 14 Global Step: 292720 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:23:46,833-Speed 2498.66 samples/sec Loss 2.8910 LearningRate 0.000517 Epoch: 14 Global Step: 292730 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:23:55,036-Speed 2497.14 samples/sec Loss 2.8425 LearningRate 0.000517 Epoch: 14 Global Step: 292740 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:24:03,180-Speed 2515.29 samples/sec Loss 2.8882 LearningRate 0.000517 Epoch: 14 Global Step: 292750 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:24:11,384-Speed 2496.45 samples/sec Loss 2.8057 LearningRate 0.000517 Epoch: 14 Global Step: 292760 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:24:19,581-Speed 2498.83 samples/sec Loss 2.8644 LearningRate 0.000517 Epoch: 14 Global Step: 292770 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:24:27,779-Speed 2498.56 samples/sec Loss 2.9016 LearningRate 0.000517 Epoch: 14 Global Step: 292780 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:24:35,983-Speed 2497.08 samples/sec Loss 2.8458 LearningRate 0.000517 Epoch: 14 Global Step: 292790 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:24:44,184-Speed 2497.75 samples/sec Loss 2.8924 LearningRate 0.000517 Epoch: 14 Global Step: 292800 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:24:52,330-Speed 2514.38 samples/sec Loss 2.7580 LearningRate 0.000517 Epoch: 14 Global Step: 292810 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:25:00,527-Speed 2498.78 samples/sec Loss 2.9442 LearningRate 0.000517 Epoch: 14 
Global Step: 292820 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:25:08,727-Speed 2498.09 samples/sec Loss 2.9088 LearningRate 0.000517 Epoch: 14 Global Step: 292830 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:25:16,929-Speed 2497.18 samples/sec Loss 2.9152 LearningRate 0.000517 Epoch: 14 Global Step: 292840 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:25:25,129-Speed 2498.01 samples/sec Loss 2.9666 LearningRate 0.000517 Epoch: 14 Global Step: 292850 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:25:33,328-Speed 2498.25 samples/sec Loss 2.9164 LearningRate 0.000517 Epoch: 14 Global Step: 292860 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:25:41,473-Speed 2514.92 samples/sec Loss 2.8867 LearningRate 0.000517 Epoch: 14 Global Step: 292870 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:25:49,672-Speed 2498.35 samples/sec Loss 2.9189 LearningRate 0.000517 Epoch: 14 Global Step: 292880 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:25:57,884-Speed 2494.21 samples/sec Loss 2.8198 LearningRate 0.000517 Epoch: 14 Global Step: 292890 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:26:06,085-Speed 2497.83 samples/sec Loss 2.9008 LearningRate 0.000517 Epoch: 14 Global Step: 292900 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:26:14,284-Speed 2498.23 samples/sec Loss 2.8906 LearningRate 0.000517 Epoch: 14 Global Step: 292910 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:26:22,485-Speed 2497.94 samples/sec Loss 2.8001 LearningRate 0.000517 Epoch: 14 Global Step: 292920 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:26:30,644-Speed 2510.24 samples/sec Loss 2.8477 LearningRate 0.000517 Epoch: 14 Global Step: 292930 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:26:38,849-Speed 2496.54 samples/sec Loss 2.8892 LearningRate 0.000517 
Epoch: 14 Global Step: 292940 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:26:47,059-Speed 2495.24 samples/sec Loss 2.8469 LearningRate 0.000517 Epoch: 14 Global Step: 292950 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:26:55,260-Speed 2497.50 samples/sec Loss 2.8013 LearningRate 0.000517 Epoch: 14 Global Step: 292960 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:27:03,460-Speed 2498.01 samples/sec Loss 2.8689 LearningRate 0.000517 Epoch: 14 Global Step: 292970 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:27:11,657-Speed 2498.77 samples/sec Loss 2.8561 LearningRate 0.000517 Epoch: 14 Global Step: 292980 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:27:19,808-Speed 2513.15 samples/sec Loss 2.8351 LearningRate 0.000517 Epoch: 14 Global Step: 292990 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:27:28,009-Speed 2497.69 samples/sec Loss 2.8358 LearningRate 0.000516 Epoch: 14 Global Step: 293000 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:27:36,206-Speed 2498.78 samples/sec Loss 2.8466 LearningRate 0.000516 Epoch: 14 Global Step: 293010 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:27:44,405-Speed 2498.45 samples/sec Loss 2.8045 LearningRate 0.000516 Epoch: 14 Global Step: 293020 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:27:52,604-Speed 2498.14 samples/sec Loss 2.8074 LearningRate 0.000516 Epoch: 14 Global Step: 293030 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:28:00,804-Speed 2498.11 samples/sec Loss 2.8044 LearningRate 0.000516 Epoch: 14 Global Step: 293040 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:28:08,953-Speed 2513.31 samples/sec Loss 2.8325 LearningRate 0.000516 Epoch: 14 Global Step: 293050 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:28:17,153-Speed 2498.05 samples/sec Loss 2.7562 LearningRate 
0.000516 Epoch: 14 Global Step: 293060 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:28:25,351-Speed 2498.51 samples/sec Loss 2.8054 LearningRate 0.000516 Epoch: 14 Global Step: 293070 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:28:33,553-Speed 2497.38 samples/sec Loss 2.8193 LearningRate 0.000516 Epoch: 14 Global Step: 293080 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:28:41,751-Speed 2498.47 samples/sec Loss 2.8000 LearningRate 0.000516 Epoch: 14 Global Step: 293090 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:28:49,956-Speed 2496.51 samples/sec Loss 2.8320 LearningRate 0.000516 Epoch: 14 Global Step: 293100 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:28:58,109-Speed 2512.48 samples/sec Loss 2.7917 LearningRate 0.000516 Epoch: 14 Global Step: 293110 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:29:06,310-Speed 2497.68 samples/sec Loss 2.8212 LearningRate 0.000516 Epoch: 14 Global Step: 293120 Fp16 Grad Scale: 65536 Required: 123 hours Training: 2022-07-08 09:29:14,467-Speed 2510.86 samples/sec Loss 2.7493 LearningRate 0.000516 Epoch: 14 Global Step: 293130 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:29:22,669-Speed 2497.36 samples/sec Loss 2.8027 LearningRate 0.000516 Epoch: 14 Global Step: 293140 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:29:30,871-Speed 2497.55 samples/sec Loss 2.8035 LearningRate 0.000516 Epoch: 14 Global Step: 293150 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:29:39,073-Speed 2497.28 samples/sec Loss 2.8208 LearningRate 0.000516 Epoch: 14 Global Step: 293160 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:29:47,225-Speed 2512.36 samples/sec Loss 2.8518 LearningRate 0.000516 Epoch: 14 Global Step: 293170 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:29:55,428-Speed 2497.26 samples/sec Loss 2.7746 
LearningRate 0.000516 Epoch: 14 Global Step: 293180 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:30:03,628-Speed 2497.94 samples/sec Loss 2.8071 LearningRate 0.000516 Epoch: 14 Global Step: 293190 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:30:11,835-Speed 2495.70 samples/sec Loss 2.8698 LearningRate 0.000516 Epoch: 14 Global Step: 293200 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:30:20,045-Speed 2494.86 samples/sec Loss 2.7653 LearningRate 0.000516 Epoch: 14 Global Step: 293210 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:30:28,247-Speed 2497.36 samples/sec Loss 2.8087 LearningRate 0.000516 Epoch: 14 Global Step: 293220 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:30:36,395-Speed 2513.88 samples/sec Loss 2.7954 LearningRate 0.000516 Epoch: 14 Global Step: 293230 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:30:44,595-Speed 2497.80 samples/sec Loss 2.7998 LearningRate 0.000516 Epoch: 14 Global Step: 293240 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:30:52,796-Speed 2497.62 samples/sec Loss 2.7478 LearningRate 0.000516 Epoch: 14 Global Step: 293250 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:31:01,003-Speed 2496.23 samples/sec Loss 2.8305 LearningRate 0.000516 Epoch: 14 Global Step: 293260 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:31:09,202-Speed 2498.35 samples/sec Loss 2.8080 LearningRate 0.000516 Epoch: 14 Global Step: 293270 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:31:17,404-Speed 2497.22 samples/sec Loss 2.8017 LearningRate 0.000516 Epoch: 14 Global Step: 293280 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:31:25,547-Speed 2515.40 samples/sec Loss 2.7545 LearningRate 0.000516 Epoch: 14 Global Step: 293290 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:31:33,751-Speed 2496.82 samples/sec Loss 
2.8273 LearningRate 0.000516 Epoch: 14 Global Step: 293300 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:31:41,950-Speed 2498.30 samples/sec Loss 2.8266 LearningRate 0.000516 Epoch: 14 Global Step: 293310 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:31:50,148-Speed 2498.30 samples/sec Loss 2.8154 LearningRate 0.000516 Epoch: 14 Global Step: 293320 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:31:58,357-Speed 2495.31 samples/sec Loss 2.7998 LearningRate 0.000516 Epoch: 14 Global Step: 293330 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:32:06,563-Speed 2495.90 samples/sec Loss 2.7884 LearningRate 0.000516 Epoch: 14 Global Step: 293340 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:32:14,711-Speed 2513.94 samples/sec Loss 2.7960 LearningRate 0.000516 Epoch: 14 Global Step: 293350 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:32:22,913-Speed 2497.35 samples/sec Loss 2.8442 LearningRate 0.000516 Epoch: 14 Global Step: 293360 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:32:31,111-Speed 2498.67 samples/sec Loss 2.7886 LearningRate 0.000516 Epoch: 14 Global Step: 293370 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:32:39,311-Speed 2497.86 samples/sec Loss 2.8924 LearningRate 0.000516 Epoch: 14 Global Step: 293380 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:32:47,512-Speed 2497.63 samples/sec Loss 2.9175 LearningRate 0.000516 Epoch: 14 Global Step: 293390 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:32:55,715-Speed 2497.13 samples/sec Loss 2.8410 LearningRate 0.000516 Epoch: 14 Global Step: 293400 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:33:03,868-Speed 2512.40 samples/sec Loss 3.0034 LearningRate 0.000516 Epoch: 14 Global Step: 293410 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:33:12,078-Speed 2495.06 samples/sec 
Loss 2.8663 LearningRate 0.000516 Epoch: 14 Global Step: 293420 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:33:20,291-Speed 2494.10 samples/sec Loss 2.8946 LearningRate 0.000516 Epoch: 14 Global Step: 293430 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:33:28,489-Speed 2498.43 samples/sec Loss 2.8957 LearningRate 0.000516 Epoch: 14 Global Step: 293440 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:33:36,686-Speed 2498.95 samples/sec Loss 2.8482 LearningRate 0.000516 Epoch: 14 Global Step: 293450 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:33:44,885-Speed 2498.11 samples/sec Loss 2.8100 LearningRate 0.000516 Epoch: 14 Global Step: 293460 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:33:53,035-Speed 2513.24 samples/sec Loss 2.8724 LearningRate 0.000516 Epoch: 14 Global Step: 293470 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:34:01,233-Speed 2498.68 samples/sec Loss 2.8474 LearningRate 0.000516 Epoch: 14 Global Step: 293480 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:34:09,437-Speed 2496.75 samples/sec Loss 2.8529 LearningRate 0.000516 Epoch: 14 Global Step: 293490 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:34:17,637-Speed 2498.13 samples/sec Loss 2.8198 LearningRate 0.000516 Epoch: 14 Global Step: 293500 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:34:25,843-Speed 2496.25 samples/sec Loss 2.7775 LearningRate 0.000516 Epoch: 14 Global Step: 293510 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:34:34,045-Speed 2497.39 samples/sec Loss 2.7867 LearningRate 0.000515 Epoch: 14 Global Step: 293520 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:34:42,201-Speed 2511.39 samples/sec Loss 2.8007 LearningRate 0.000515 Epoch: 14 Global Step: 293530 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:34:50,406-Speed 2496.59 
samples/sec Loss 2.8597 LearningRate 0.000515 Epoch: 14 Global Step: 293540 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:34:58,610-Speed 2496.77 samples/sec Loss 2.7729 LearningRate 0.000515 Epoch: 14 Global Step: 293550 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:35:06,814-Speed 2496.60 samples/sec Loss 2.8473 LearningRate 0.000515 Epoch: 14 Global Step: 293560 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:35:15,019-Speed 2496.65 samples/sec Loss 2.8823 LearningRate 0.000515 Epoch: 14 Global Step: 293570 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:35:23,221-Speed 2497.34 samples/sec Loss 2.8570 LearningRate 0.000515 Epoch: 14 Global Step: 293580 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:35:31,368-Speed 2514.19 samples/sec Loss 2.8510 LearningRate 0.000515 Epoch: 14 Global Step: 293590 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:35:39,573-Speed 2496.41 samples/sec Loss 2.7841 LearningRate 0.000515 Epoch: 14 Global Step: 293600 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:35:47,773-Speed 2497.78 samples/sec Loss 2.8006 LearningRate 0.000515 Epoch: 14 Global Step: 293610 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:35:55,974-Speed 2497.82 samples/sec Loss 2.8041 LearningRate 0.000515 Epoch: 14 Global Step: 293620 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:36:04,170-Speed 2499.11 samples/sec Loss 2.8420 LearningRate 0.000515 Epoch: 14 Global Step: 293630 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:36:12,393-Speed 2491.04 samples/sec Loss 2.8570 LearningRate 0.000515 Epoch: 14 Global Step: 293640 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:36:20,538-Speed 2514.70 samples/sec Loss 2.7931 LearningRate 0.000515 Epoch: 14 Global Step: 293650 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:36:28,739-Speed 
2497.88 samples/sec Loss 2.8532 LearningRate 0.000515 Epoch: 14 Global Step: 293660 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:36:36,940-Speed 2497.73 samples/sec Loss 2.9070 LearningRate 0.000515 Epoch: 14 Global Step: 293670 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:36:45,142-Speed 2497.40 samples/sec Loss 2.9247 LearningRate 0.000515 Epoch: 14 Global Step: 293680 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:36:53,343-Speed 2497.53 samples/sec Loss 2.8374 LearningRate 0.000515 Epoch: 14 Global Step: 293690 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:37:01,537-Speed 2499.71 samples/sec Loss 2.8274 LearningRate 0.000515 Epoch: 14 Global Step: 293700 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:37:09,682-Speed 2515.02 samples/sec Loss 2.8002 LearningRate 0.000515 Epoch: 14 Global Step: 293710 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:37:17,879-Speed 2498.72 samples/sec Loss 2.8518 LearningRate 0.000515 Epoch: 14 Global Step: 293720 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:37:26,076-Speed 2498.91 samples/sec Loss 2.7724 LearningRate 0.000515 Epoch: 14 Global Step: 293730 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:37:34,275-Speed 2498.33 samples/sec Loss 2.8009 LearningRate 0.000515 Epoch: 14 Global Step: 293740 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:37:42,475-Speed 2497.98 samples/sec Loss 2.8180 LearningRate 0.000515 Epoch: 14 Global Step: 293750 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:37:50,675-Speed 2497.91 samples/sec Loss 2.7939 LearningRate 0.000515 Epoch: 14 Global Step: 293760 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:37:58,825-Speed 2513.62 samples/sec Loss 2.8028 LearningRate 0.000515 Epoch: 14 Global Step: 293770 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 
09:38:07,024-Speed 2498.11 samples/sec Loss 2.9010 LearningRate 0.000515 Epoch: 14 Global Step: 293780 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:38:15,225-Speed 2497.75 samples/sec Loss 2.7553 LearningRate 0.000515 Epoch: 14 Global Step: 293790 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:38:23,430-Speed 2496.52 samples/sec Loss 2.8343 LearningRate 0.000515 Epoch: 14 Global Step: 293800 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:38:31,631-Speed 2497.50 samples/sec Loss 2.8446 LearningRate 0.000515 Epoch: 14 Global Step: 293810 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:38:39,831-Speed 2498.26 samples/sec Loss 2.8251 LearningRate 0.000515 Epoch: 14 Global Step: 293820 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:38:47,986-Speed 2511.93 samples/sec Loss 2.7578 LearningRate 0.000515 Epoch: 14 Global Step: 293830 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:38:56,184-Speed 2498.58 samples/sec Loss 2.8340 LearningRate 0.000515 Epoch: 14 Global Step: 293840 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:39:04,379-Speed 2499.22 samples/sec Loss 2.8580 LearningRate 0.000515 Epoch: 14 Global Step: 293850 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:39:12,579-Speed 2498.12 samples/sec Loss 2.7902 LearningRate 0.000515 Epoch: 14 Global Step: 293860 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:39:20,778-Speed 2498.18 samples/sec Loss 2.7613 LearningRate 0.000515 Epoch: 14 Global Step: 293870 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:39:28,988-Speed 2495.08 samples/sec Loss 2.7956 LearningRate 0.000515 Epoch: 14 Global Step: 293880 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:39:37,137-Speed 2513.62 samples/sec Loss 2.7993 LearningRate 0.000515 Epoch: 14 Global Step: 293890 Fp16 Grad Scale: 32768 Required: 123 hours Training: 
2022-07-08 09:39:45,343-Speed 2496.01 samples/sec Loss 2.8144 LearningRate 0.000515 Epoch: 14 Global Step: 293900 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:39:53,545-Speed 2497.38 samples/sec Loss 2.8548 LearningRate 0.000515 Epoch: 14 Global Step: 293910 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:40:01,748-Speed 2496.91 samples/sec Loss 2.8342 LearningRate 0.000515 Epoch: 14 Global Step: 293920 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:40:09,953-Speed 2496.53 samples/sec Loss 2.7367 LearningRate 0.000515 Epoch: 14 Global Step: 293930 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:40:18,156-Speed 2497.05 samples/sec Loss 2.7847 LearningRate 0.000515 Epoch: 14 Global Step: 293940 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:40:26,301-Speed 2514.99 samples/sec Loss 2.8583 LearningRate 0.000515 Epoch: 14 Global Step: 293950 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:40:34,501-Speed 2498.00 samples/sec Loss 2.8595 LearningRate 0.000515 Epoch: 14 Global Step: 293960 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:40:42,699-Speed 2498.38 samples/sec Loss 2.8401 LearningRate 0.000515 Epoch: 14 Global Step: 293970 Fp16 Grad Scale: 32768 Required: 123 hours Training: 2022-07-08 09:40:50,904-Speed 2499.57 samples/sec Loss 2.8248 LearningRate 0.000515 Epoch: 14 Global Step: 293980 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:40:59,102-Speed 2498.63 samples/sec Loss 2.8116 LearningRate 0.000515 Epoch: 14 Global Step: 293990 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:41:07,305-Speed 2497.05 samples/sec Loss 2.8572 LearningRate 0.000515 Epoch: 14 Global Step: 294000 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:41:15,462-Speed 2510.95 samples/sec Loss 2.8232 LearningRate 0.000515 Epoch: 14 Global Step: 294010 Fp16 Grad Scale: 32768 Required: 122 hours 
Training: 2022-07-08 09:41:23,666-Speed 2497.02 samples/sec Loss 2.8759 LearningRate 0.000515 Epoch: 14 Global Step: 294020 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:41:31,869-Speed 2496.93 samples/sec Loss 2.8780 LearningRate 0.000515 Epoch: 14 Global Step: 294030 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:41:40,071-Speed 2497.64 samples/sec Loss 2.8026 LearningRate 0.000514 Epoch: 14 Global Step: 294040 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:41:48,276-Speed 2496.73 samples/sec Loss 2.8168 LearningRate 0.000514 Epoch: 14 Global Step: 294050 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:41:56,477-Speed 2497.29 samples/sec Loss 2.8470 LearningRate 0.000514 Epoch: 14 Global Step: 294060 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:42:04,630-Speed 2512.60 samples/sec Loss 2.7767 LearningRate 0.000514 Epoch: 14 Global Step: 294070 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:42:12,859-Speed 2489.14 samples/sec Loss 2.7786 LearningRate 0.000514 Epoch: 14 Global Step: 294080 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:42:21,059-Speed 2497.71 samples/sec Loss 2.8425 LearningRate 0.000514 Epoch: 14 Global Step: 294090 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:42:29,344-Speed 2499.41 samples/sec Loss 2.8527 LearningRate 0.000514 Epoch: 14 Global Step: 294100 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:42:37,599-Speed 2497.17 samples/sec Loss 2.7925 LearningRate 0.000514 Epoch: 14 Global Step: 294110 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:42:45,833-Speed 2499.36 samples/sec Loss 2.7814 LearningRate 0.000514 Epoch: 14 Global Step: 294120 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:42:53,988-Speed 2511.79 samples/sec Loss 2.8581 LearningRate 0.000514 Epoch: 14 Global Step: 294130 Fp16 Grad Scale: 32768 Required: 122 
hours Training: 2022-07-08 09:43:04,576-Speed 1934.46 samples/sec Loss 2.7962 LearningRate 0.000514 Epoch: 14 Global Step: 294140 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:43:12,792-Speed 2499.43 samples/sec Loss 2.8206 LearningRate 0.000514 Epoch: 14 Global Step: 294150 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:43:21,048-Speed 2488.43 samples/sec Loss 2.7682 LearningRate 0.000514 Epoch: 14 Global Step: 294160 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:43:33,119-Speed 1696.79 samples/sec Loss 2.8421 LearningRate 0.000514 Epoch: 14 Global Step: 294170 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:43:43,324-Speed 2502.77 samples/sec Loss 2.8631 LearningRate 0.000514 Epoch: 14 Global Step: 294180 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:43:51,502-Speed 2515.92 samples/sec Loss 2.8222 LearningRate 0.000514 Epoch: 14 Global Step: 294190 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:43:59,705-Speed 2496.88 samples/sec Loss 2.8026 LearningRate 0.000514 Epoch: 14 Global Step: 294200 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:44:12,791-Speed 2502.91 samples/sec Loss 2.8216 LearningRate 0.000514 Epoch: 14 Global Step: 294210 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:44:21,008-Speed 2502.99 samples/sec Loss 2.7858 LearningRate 0.000514 Epoch: 14 Global Step: 294220 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:44:29,208-Speed 2497.96 samples/sec Loss 2.8220 LearningRate 0.000514 Epoch: 14 Global Step: 294230 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:44:43,902-Speed 1393.88 samples/sec Loss 2.7896 LearningRate 0.000514 Epoch: 14 Global Step: 294240 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:44:52,117-Speed 2511.27 samples/sec Loss 2.7785 LearningRate 0.000514 Epoch: 14 Global Step: 294250 Fp16 Grad Scale: 32768 Required: 
122 hours Training: 2022-07-08 09:45:00,359-Speed 2502.40 samples/sec Loss 2.7811 LearningRate 0.000514 Epoch: 14 Global Step: 294260 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:45:10,709-Speed 1978.94 samples/sec Loss 2.8175 LearningRate 0.000514 Epoch: 14 Global Step: 294270 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:45:18,903-Speed 2500.61 samples/sec Loss 2.7963 LearningRate 0.000514 Epoch: 14 Global Step: 294280 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:45:27,172-Speed 2501.62 samples/sec Loss 2.8460 LearningRate 0.000514 Epoch: 14 Global Step: 294290 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:45:40,710-Speed 1519.21 samples/sec Loss 2.9099 LearningRate 0.000514 Epoch: 14 Global Step: 294300 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:45:49,958-Speed 2224.09 samples/sec Loss 2.9180 LearningRate 0.000514 Epoch: 14 Global Step: 294310 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:46:01,983-Speed 1707.60 samples/sec Loss 2.8305 LearningRate 0.000514 Epoch: 14 Global Step: 294320 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:46:11,118-Speed 2503.18 samples/sec Loss 2.8313 LearningRate 0.000514 Epoch: 14 Global Step: 294330 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 09:46:19,763-Speed 2369.21 samples/sec Loss 2.7771 LearningRate 0.000514 Epoch: 14 Global Step: 294340 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 09:46:31,174-Speed 1937.44 samples/sec Loss 2.8296 LearningRate 0.000514 Epoch: 14 Global Step: 294350 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 09:46:39,372-Speed 2498.58 samples/sec Loss 2.8269 LearningRate 0.000514 Epoch: 14 Global Step: 294360 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 09:46:47,514-Speed 2515.66 samples/sec Loss 2.8057 LearningRate 0.000514 Epoch: 14 Global Step: 294370 Fp16 Grad Scale: 65536 
Required: 122 hours Training: 2022-07-08 09:46:55,714-Speed 2498.05 samples/sec Loss 2.8548 LearningRate 0.000514 Epoch: 14 Global Step: 294380 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 09:47:03,918-Speed 2496.71 samples/sec Loss 2.8019 LearningRate 0.000514 Epoch: 14 Global Step: 294390 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 09:47:12,132-Speed 2493.78 samples/sec Loss 2.8515 LearningRate 0.000514 Epoch: 14 Global Step: 294400 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 09:47:20,344-Speed 2494.29 samples/sec Loss 2.8494 LearningRate 0.000514 Epoch: 14 Global Step: 294410 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 09:47:28,522-Speed 2504.57 samples/sec Loss 2.7497 LearningRate 0.000514 Epoch: 14 Global Step: 294420 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:47:36,687-Speed 2509.18 samples/sec Loss 2.8007 LearningRate 0.000514 Epoch: 14 Global Step: 294430 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:47:44,897-Speed 2494.85 samples/sec Loss 2.8257 LearningRate 0.000514 Epoch: 14 Global Step: 294440 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:47:53,110-Speed 2494.01 samples/sec Loss 2.8099 LearningRate 0.000514 Epoch: 14 Global Step: 294450 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:48:01,314-Speed 2497.21 samples/sec Loss 2.8927 LearningRate 0.000514 Epoch: 14 Global Step: 294460 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:48:09,534-Speed 2492.17 samples/sec Loss 2.8576 LearningRate 0.000514 Epoch: 14 Global Step: 294470 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:48:17,742-Speed 2495.29 samples/sec Loss 2.7902 LearningRate 0.000514 Epoch: 14 Global Step: 294480 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:48:25,895-Speed 2512.34 samples/sec Loss 2.8723 LearningRate 0.000514 Epoch: 14 Global Step: 294490 Fp16 Grad Scale: 
32768 Required: 122 hours Training: 2022-07-08 09:48:34,103-Speed 2495.66 samples/sec Loss 2.7924 LearningRate 0.000514 Epoch: 14 Global Step: 294500 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:48:42,302-Speed 2498.23 samples/sec Loss 2.7808 LearningRate 0.000514 Epoch: 14 Global Step: 294510 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:48:50,510-Speed 2495.60 samples/sec Loss 2.8252 LearningRate 0.000514 Epoch: 14 Global Step: 294520 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:48:58,709-Speed 2498.33 samples/sec Loss 2.8328 LearningRate 0.000514 Epoch: 14 Global Step: 294530 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:49:06,909-Speed 2498.03 samples/sec Loss 2.7932 LearningRate 0.000514 Epoch: 14 Global Step: 294540 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:49:15,066-Speed 2511.20 samples/sec Loss 2.8040 LearningRate 0.000514 Epoch: 14 Global Step: 294550 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:49:23,270-Speed 2496.73 samples/sec Loss 2.7829 LearningRate 0.000513 Epoch: 14 Global Step: 294560 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:49:31,473-Speed 2497.17 samples/sec Loss 2.7810 LearningRate 0.000513 Epoch: 14 Global Step: 294570 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:49:39,680-Speed 2495.65 samples/sec Loss 2.8392 LearningRate 0.000513 Epoch: 14 Global Step: 294580 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:49:47,895-Speed 2493.56 samples/sec Loss 2.7806 LearningRate 0.000513 Epoch: 14 Global Step: 294590 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:49:56,097-Speed 2497.44 samples/sec Loss 2.8523 LearningRate 0.000513 Epoch: 14 Global Step: 294600 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:50:04,245-Speed 2513.78 samples/sec Loss 2.8423 LearningRate 0.000513 Epoch: 14 Global Step: 294610 Fp16 Grad 
Scale: 32768 Required: 122 hours Training: 2022-07-08 09:50:12,453-Speed 2495.76 samples/sec Loss 2.8349 LearningRate 0.000513 Epoch: 14 Global Step: 294620 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:50:20,656-Speed 2496.89 samples/sec Loss 2.8166 LearningRate 0.000513 Epoch: 14 Global Step: 294630 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:50:28,855-Speed 2498.21 samples/sec Loss 2.8749 LearningRate 0.000513 Epoch: 14 Global Step: 294640 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:50:37,059-Speed 2496.85 samples/sec Loss 2.8263 LearningRate 0.000513 Epoch: 14 Global Step: 294650 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:50:45,273-Speed 2493.91 samples/sec Loss 2.8010 LearningRate 0.000513 Epoch: 14 Global Step: 294660 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:50:53,423-Speed 2513.39 samples/sec Loss 2.8414 LearningRate 0.000513 Epoch: 14 Global Step: 294670 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:51:01,627-Speed 2496.53 samples/sec Loss 2.8479 LearningRate 0.000513 Epoch: 14 Global Step: 294680 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:51:09,827-Speed 2497.93 samples/sec Loss 2.8481 LearningRate 0.000513 Epoch: 14 Global Step: 294690 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:51:18,025-Speed 2498.54 samples/sec Loss 2.8016 LearningRate 0.000513 Epoch: 14 Global Step: 294700 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:51:26,228-Speed 2497.25 samples/sec Loss 2.8541 LearningRate 0.000513 Epoch: 14 Global Step: 294710 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:51:34,434-Speed 2496.01 samples/sec Loss 2.8362 LearningRate 0.000513 Epoch: 14 Global Step: 294720 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:51:42,584-Speed 2513.21 samples/sec Loss 2.7857 LearningRate 0.000513 Epoch: 14 Global Step: 294730 Fp16 
Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:51:50,790-Speed 2496.36 samples/sec Loss 2.8552 LearningRate 0.000513 Epoch: 14 Global Step: 294740 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:51:58,990-Speed 2497.84 samples/sec Loss 2.8439 LearningRate 0.000513 Epoch: 14 Global Step: 294750 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:52:07,195-Speed 2496.27 samples/sec Loss 2.7721 LearningRate 0.000513 Epoch: 14 Global Step: 294760 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:52:15,402-Speed 2496.26 samples/sec Loss 2.8775 LearningRate 0.000513 Epoch: 14 Global Step: 294770 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:52:23,601-Speed 2498.11 samples/sec Loss 2.8249 LearningRate 0.000513 Epoch: 14 Global Step: 294780 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:52:31,753-Speed 2512.59 samples/sec Loss 2.8169 LearningRate 0.000513 Epoch: 14 Global Step: 294790 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:52:39,954-Speed 2497.89 samples/sec Loss 2.7959 LearningRate 0.000513 Epoch: 14 Global Step: 294800 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:52:48,167-Speed 2494.31 samples/sec Loss 2.8469 LearningRate 0.000513 Epoch: 14 Global Step: 294810 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:52:56,368-Speed 2497.43 samples/sec Loss 2.8161 LearningRate 0.000513 Epoch: 14 Global Step: 294820 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:53:04,574-Speed 2496.20 samples/sec Loss 2.7860 LearningRate 0.000513 Epoch: 14 Global Step: 294830 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:53:12,797-Speed 2491.02 samples/sec Loss 2.8486 LearningRate 0.000513 Epoch: 14 Global Step: 294840 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:53:20,952-Speed 2511.88 samples/sec Loss 2.8168 LearningRate 0.000513 Epoch: 14 Global Step: 294850 
Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:53:29,149-Speed 2498.86 samples/sec Loss 2.8710 LearningRate 0.000513 Epoch: 14 Global Step: 294860 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:53:37,353-Speed 2496.85 samples/sec Loss 2.7669 LearningRate 0.000513 Epoch: 14 Global Step: 294870 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:53:45,555-Speed 2497.18 samples/sec Loss 2.8017 LearningRate 0.000513 Epoch: 14 Global Step: 294880 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:53:53,763-Speed 2495.55 samples/sec Loss 2.8357 LearningRate 0.000513 Epoch: 14 Global Step: 294890 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:54:01,965-Speed 2497.56 samples/sec Loss 2.9024 LearningRate 0.000513 Epoch: 14 Global Step: 294900 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:54:10,125-Speed 2510.04 samples/sec Loss 2.8585 LearningRate 0.000513 Epoch: 14 Global Step: 294910 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:54:18,327-Speed 2497.17 samples/sec Loss 2.8748 LearningRate 0.000513 Epoch: 14 Global Step: 294920 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:54:26,557-Speed 2489.40 samples/sec Loss 2.8886 LearningRate 0.000513 Epoch: 14 Global Step: 294930 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:54:34,757-Speed 2498.13 samples/sec Loss 2.8240 LearningRate 0.000513 Epoch: 14 Global Step: 294940 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:54:42,963-Speed 2496.22 samples/sec Loss 2.8080 LearningRate 0.000513 Epoch: 14 Global Step: 294950 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:54:51,168-Speed 2496.63 samples/sec Loss 2.7838 LearningRate 0.000513 Epoch: 14 Global Step: 294960 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:54:59,316-Speed 2513.93 samples/sec Loss 2.8122 LearningRate 0.000513 Epoch: 14 Global Step: 
294970 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:55:07,518-Speed 2497.22 samples/sec Loss 2.8285 LearningRate 0.000513 Epoch: 14 Global Step: 294980 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:55:15,718-Speed 2498.05 samples/sec Loss 2.8539 LearningRate 0.000513 Epoch: 14 Global Step: 294990 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:55:23,931-Speed 2493.94 samples/sec Loss 2.7866 LearningRate 0.000513 Epoch: 14 Global Step: 295000 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:55:32,131-Speed 2497.96 samples/sec Loss 2.8172 LearningRate 0.000513 Epoch: 14 Global Step: 295010 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:55:40,330-Speed 2497.98 samples/sec Loss 2.8087 LearningRate 0.000513 Epoch: 14 Global Step: 295020 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:55:48,480-Speed 2513.43 samples/sec Loss 2.8350 LearningRate 0.000513 Epoch: 14 Global Step: 295030 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:55:56,678-Speed 2498.36 samples/sec Loss 2.8499 LearningRate 0.000513 Epoch: 14 Global Step: 295040 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:56:04,878-Speed 2498.03 samples/sec Loss 2.8683 LearningRate 0.000513 Epoch: 14 Global Step: 295050 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:56:13,078-Speed 2497.91 samples/sec Loss 2.8744 LearningRate 0.000513 Epoch: 14 Global Step: 295060 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:56:21,278-Speed 2497.84 samples/sec Loss 2.8026 LearningRate 0.000513 Epoch: 14 Global Step: 295070 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:56:29,479-Speed 2497.60 samples/sec Loss 2.8604 LearningRate 0.000512 Epoch: 14 Global Step: 295080 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:56:37,630-Speed 2513.05 samples/sec Loss 2.8437 LearningRate 0.000512 Epoch: 14 Global 
Step: 295090 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:56:45,832-Speed 2497.28 samples/sec Loss 2.8703 LearningRate 0.000512 Epoch: 14 Global Step: 295100 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:56:54,043-Speed 2494.80 samples/sec Loss 2.8580 LearningRate 0.000512 Epoch: 14 Global Step: 295110 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:57:02,243-Speed 2498.10 samples/sec Loss 2.8339 LearningRate 0.000512 Epoch: 14 Global Step: 295120 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:57:10,446-Speed 2496.85 samples/sec Loss 2.9625 LearningRate 0.000512 Epoch: 14 Global Step: 295130 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:57:18,648-Speed 2497.44 samples/sec Loss 2.8732 LearningRate 0.000512 Epoch: 14 Global Step: 295140 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:57:26,805-Speed 2511.14 samples/sec Loss 2.8932 LearningRate 0.000512 Epoch: 14 Global Step: 295150 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:57:35,004-Speed 2498.23 samples/sec Loss 2.8636 LearningRate 0.000512 Epoch: 14 Global Step: 295160 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:57:43,219-Speed 2493.46 samples/sec Loss 2.8092 LearningRate 0.000512 Epoch: 14 Global Step: 295170 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:57:51,420-Speed 2497.56 samples/sec Loss 2.8660 LearningRate 0.000512 Epoch: 14 Global Step: 295180 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:57:59,621-Speed 2497.62 samples/sec Loss 2.8837 LearningRate 0.000512 Epoch: 14 Global Step: 295190 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:58:07,824-Speed 2497.12 samples/sec Loss 2.7555 LearningRate 0.000512 Epoch: 14 Global Step: 295200 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:58:15,981-Speed 2511.01 samples/sec Loss 2.8099 LearningRate 0.000512 Epoch: 14 
Global Step: 295210 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:58:24,195-Speed 2493.79 samples/sec Loss 2.7716 LearningRate 0.000512 Epoch: 14 Global Step: 295220 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:58:32,410-Speed 2493.21 samples/sec Loss 2.8244 LearningRate 0.000512 Epoch: 14 Global Step: 295230 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:58:40,611-Speed 2498.04 samples/sec Loss 2.8380 LearningRate 0.000512 Epoch: 14 Global Step: 295240 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:58:48,811-Speed 2497.83 samples/sec Loss 2.7794 LearningRate 0.000512 Epoch: 14 Global Step: 295250 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:58:57,018-Speed 2495.73 samples/sec Loss 2.8272 LearningRate 0.000512 Epoch: 14 Global Step: 295260 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:59:05,171-Speed 2512.56 samples/sec Loss 2.8130 LearningRate 0.000512 Epoch: 14 Global Step: 295270 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:59:13,371-Speed 2497.93 samples/sec Loss 2.8385 LearningRate 0.000512 Epoch: 14 Global Step: 295280 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:59:21,587-Speed 2493.11 samples/sec Loss 2.8370 LearningRate 0.000512 Epoch: 14 Global Step: 295290 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:59:29,803-Speed 2492.92 samples/sec Loss 2.8611 LearningRate 0.000512 Epoch: 14 Global Step: 295300 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:59:38,001-Speed 2498.85 samples/sec Loss 2.8547 LearningRate 0.000512 Epoch: 14 Global Step: 295310 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:59:46,209-Speed 2495.24 samples/sec Loss 2.7943 LearningRate 0.000512 Epoch: 14 Global Step: 295320 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 09:59:54,353-Speed 2515.10 samples/sec Loss 2.8360 LearningRate 0.000512 
Epoch: 14 Global Step: 295330 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:00:02,567-Speed 2494.11 samples/sec Loss 2.8332 LearningRate 0.000512 Epoch: 14 Global Step: 295340 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:00:10,768-Speed 2497.86 samples/sec Loss 2.8114 LearningRate 0.000512 Epoch: 14 Global Step: 295350 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:00:18,965-Speed 2498.71 samples/sec Loss 2.8454 LearningRate 0.000512 Epoch: 14 Global Step: 295360 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:00:27,163-Speed 2498.54 samples/sec Loss 2.7994 LearningRate 0.000512 Epoch: 14 Global Step: 295370 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:00:35,364-Speed 2497.55 samples/sec Loss 2.7608 LearningRate 0.000512 Epoch: 14 Global Step: 295380 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:00:43,521-Speed 2511.28 samples/sec Loss 2.7801 LearningRate 0.000512 Epoch: 14 Global Step: 295390 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:00:51,722-Speed 2497.67 samples/sec Loss 2.8175 LearningRate 0.000512 Epoch: 14 Global Step: 295400 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:00:59,929-Speed 2495.84 samples/sec Loss 2.8079 LearningRate 0.000512 Epoch: 14 Global Step: 295410 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:01:08,135-Speed 2496.08 samples/sec Loss 2.7798 LearningRate 0.000512 Epoch: 14 Global Step: 295420 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:01:16,335-Speed 2498.00 samples/sec Loss 2.8169 LearningRate 0.000512 Epoch: 14 Global Step: 295430 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:01:24,534-Speed 2498.22 samples/sec Loss 2.7482 LearningRate 0.000512 Epoch: 14 Global Step: 295440 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:01:32,691-Speed 2511.23 samples/sec Loss 2.7746 LearningRate 
0.000512 Epoch: 14 Global Step: 295450 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:01:40,892-Speed 2497.70 samples/sec Loss 2.8148 LearningRate 0.000512 Epoch: 14 Global Step: 295460 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:01:49,092-Speed 2497.75 samples/sec Loss 2.8466 LearningRate 0.000512 Epoch: 14 Global Step: 295470 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:01:57,291-Speed 2498.37 samples/sec Loss 2.7877 LearningRate 0.000512 Epoch: 14 Global Step: 295480 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:02:05,489-Speed 2498.48 samples/sec Loss 2.7664 LearningRate 0.000512 Epoch: 14 Global Step: 295490 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:02:13,693-Speed 2496.85 samples/sec Loss 2.8110 LearningRate 0.000512 Epoch: 14 Global Step: 295500 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:02:21,838-Speed 2514.99 samples/sec Loss 2.7724 LearningRate 0.000512 Epoch: 14 Global Step: 295510 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:02:30,037-Speed 2498.18 samples/sec Loss 2.7469 LearningRate 0.000512 Epoch: 14 Global Step: 295520 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:02:38,235-Speed 2498.69 samples/sec Loss 2.8339 LearningRate 0.000512 Epoch: 14 Global Step: 295530 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:02:46,439-Speed 2496.89 samples/sec Loss 2.8749 LearningRate 0.000512 Epoch: 14 Global Step: 295540 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:02:54,637-Speed 2498.48 samples/sec Loss 2.8729 LearningRate 0.000512 Epoch: 14 Global Step: 295550 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:03:02,838-Speed 2497.71 samples/sec Loss 2.9426 LearningRate 0.000512 Epoch: 14 Global Step: 295560 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:03:10,987-Speed 2513.56 samples/sec Loss 2.8959 
LearningRate 0.000512 Epoch: 14 Global Step: 295570 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:03:19,191-Speed 2496.73 samples/sec Loss 2.8710 LearningRate 0.000512 Epoch: 14 Global Step: 295580 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:03:27,396-Speed 2496.40 samples/sec Loss 2.8849 LearningRate 0.000512 Epoch: 14 Global Step: 295590 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:03:35,598-Speed 2497.43 samples/sec Loss 2.8519 LearningRate 0.000511 Epoch: 14 Global Step: 295600 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:03:43,796-Speed 2498.29 samples/sec Loss 2.8663 LearningRate 0.000511 Epoch: 14 Global Step: 295610 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:03:51,995-Speed 2498.39 samples/sec Loss 2.8185 LearningRate 0.000511 Epoch: 14 Global Step: 295620 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 10:04:00,160-Speed 2508.63 samples/sec Loss 2.7902 LearningRate 0.000511 Epoch: 14 Global Step: 295630 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 10:04:08,374-Speed 2493.65 samples/sec Loss 2.8086 LearningRate 0.000511 Epoch: 14 Global Step: 295640 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 10:04:16,574-Speed 2498.17 samples/sec Loss 2.8247 LearningRate 0.000511 Epoch: 14 Global Step: 295650 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 10:04:24,772-Speed 2498.56 samples/sec Loss 2.8144 LearningRate 0.000511 Epoch: 14 Global Step: 295660 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 10:04:32,996-Speed 2490.61 samples/sec Loss 2.8098 LearningRate 0.000511 Epoch: 14 Global Step: 295670 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 10:04:41,206-Speed 2494.88 samples/sec Loss 2.8375 LearningRate 0.000511 Epoch: 14 Global Step: 295680 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 10:04:49,357-Speed 2513.02 samples/sec Loss 
2.7733 LearningRate 0.000511 Epoch: 14 Global Step: 295690 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 10:04:57,558-Speed 2497.87 samples/sec Loss 2.7984 LearningRate 0.000511 Epoch: 14 Global Step: 295700 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 10:05:05,754-Speed 2499.02 samples/sec Loss 2.7926 LearningRate 0.000511 Epoch: 14 Global Step: 295710 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 10:05:13,955-Speed 2497.60 samples/sec Loss 2.7986 LearningRate 0.000511 Epoch: 14 Global Step: 295720 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 10:05:22,154-Speed 2498.51 samples/sec Loss 2.8631 LearningRate 0.000511 Epoch: 14 Global Step: 295730 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 10:05:30,366-Speed 2494.31 samples/sec Loss 2.8057 LearningRate 0.000511 Epoch: 14 Global Step: 295740 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 10:05:38,510-Speed 2515.18 samples/sec Loss 2.8334 LearningRate 0.000511 Epoch: 14 Global Step: 295750 Fp16 Grad Scale: 65536 Required: 122 hours Training: 2022-07-08 10:05:46,668-Speed 2510.73 samples/sec Loss 2.7672 LearningRate 0.000511 Epoch: 14 Global Step: 295760 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:05:54,869-Speed 2497.92 samples/sec Loss 2.8260 LearningRate 0.000511 Epoch: 14 Global Step: 295770 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:06:03,081-Speed 2494.25 samples/sec Loss 2.8566 LearningRate 0.000511 Epoch: 14 Global Step: 295780 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:06:11,287-Speed 2496.15 samples/sec Loss 2.7637 LearningRate 0.000511 Epoch: 14 Global Step: 295790 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:06:19,488-Speed 2497.60 samples/sec Loss 2.8691 LearningRate 0.000511 Epoch: 14 Global Step: 295800 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:06:27,633-Speed 2515.18 samples/sec 
Loss 2.7831 LearningRate 0.000511 Epoch: 14 Global Step: 295810 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:06:35,830-Speed 2498.63 samples/sec Loss 2.7684 LearningRate 0.000511 Epoch: 14 Global Step: 295820 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:06:44,031-Speed 2497.59 samples/sec Loss 2.8277 LearningRate 0.000511 Epoch: 14 Global Step: 295830 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:06:52,229-Speed 2498.56 samples/sec Loss 2.8520 LearningRate 0.000511 Epoch: 14 Global Step: 295840 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:07:00,427-Speed 2498.46 samples/sec Loss 2.7900 LearningRate 0.000511 Epoch: 14 Global Step: 295850 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:07:08,629-Speed 2497.58 samples/sec Loss 2.8014 LearningRate 0.000511 Epoch: 14 Global Step: 295860 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:07:16,774-Speed 2514.76 samples/sec Loss 2.7989 LearningRate 0.000511 Epoch: 14 Global Step: 295870 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:07:24,975-Speed 2497.53 samples/sec Loss 2.7616 LearningRate 0.000511 Epoch: 14 Global Step: 295880 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:07:33,193-Speed 2492.61 samples/sec Loss 2.8232 LearningRate 0.000511 Epoch: 14 Global Step: 295890 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:07:41,396-Speed 2497.11 samples/sec Loss 2.8316 LearningRate 0.000511 Epoch: 14 Global Step: 295900 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:07:49,602-Speed 2496.39 samples/sec Loss 2.8083 LearningRate 0.000511 Epoch: 14 Global Step: 295910 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:07:57,808-Speed 2496.10 samples/sec Loss 2.7830 LearningRate 0.000511 Epoch: 14 Global Step: 295920 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:08:05,960-Speed 2512.51 
samples/sec Loss 2.7849 LearningRate 0.000511 Epoch: 14 Global Step: 295930 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:08:14,164-Speed 2496.82 samples/sec Loss 2.8274 LearningRate 0.000511 Epoch: 14 Global Step: 295940 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:08:22,364-Speed 2497.97 samples/sec Loss 2.8046 LearningRate 0.000511 Epoch: 14 Global Step: 295950 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:08:30,568-Speed 2496.88 samples/sec Loss 2.8252 LearningRate 0.000511 Epoch: 14 Global Step: 295960 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:08:38,770-Speed 2497.42 samples/sec Loss 2.8530 LearningRate 0.000511 Epoch: 14 Global Step: 295970 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:08:46,980-Speed 2494.84 samples/sec Loss 2.7829 LearningRate 0.000511 Epoch: 14 Global Step: 295980 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:08:55,125-Speed 2515.45 samples/sec Loss 2.8301 LearningRate 0.000511 Epoch: 14 Global Step: 295990 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:09:03,325-Speed 2497.83 samples/sec Loss 2.8146 LearningRate 0.000511 Epoch: 14 Global Step: 296000 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:09:11,525-Speed 2498.08 samples/sec Loss 2.8012 LearningRate 0.000511 Epoch: 14 Global Step: 296010 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:09:19,726-Speed 2497.51 samples/sec Loss 2.7770 LearningRate 0.000511 Epoch: 14 Global Step: 296020 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:09:27,925-Speed 2498.39 samples/sec Loss 2.8270 LearningRate 0.000511 Epoch: 14 Global Step: 296030 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:09:36,128-Speed 2496.97 samples/sec Loss 2.7948 LearningRate 0.000511 Epoch: 14 Global Step: 296040 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:09:44,270-Speed 
2515.61 samples/sec Loss 2.8367 LearningRate 0.000511 Epoch: 14 Global Step: 296050 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:09:52,467-Speed 2498.88 samples/sec Loss 2.7800 LearningRate 0.000511 Epoch: 14 Global Step: 296060 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:10:00,666-Speed 2498.33 samples/sec Loss 2.8011 LearningRate 0.000511 Epoch: 14 Global Step: 296070 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:10:08,824-Speed 2510.93 samples/sec Loss 2.7981 LearningRate 0.000511 Epoch: 14 Global Step: 296080 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:10:17,034-Speed 2494.79 samples/sec Loss 2.8537 LearningRate 0.000511 Epoch: 14 Global Step: 296090 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:10:25,233-Speed 2498.46 samples/sec Loss 2.8368 LearningRate 0.000511 Epoch: 14 Global Step: 296100 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:10:33,384-Speed 2513.03 samples/sec Loss 2.8009 LearningRate 0.000511 Epoch: 14 Global Step: 296110 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:10:41,587-Speed 2496.85 samples/sec Loss 2.7949 LearningRate 0.000510 Epoch: 14 Global Step: 296120 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:10:49,792-Speed 2496.62 samples/sec Loss 2.7943 LearningRate 0.000510 Epoch: 14 Global Step: 296130 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:10:57,994-Speed 2497.27 samples/sec Loss 2.8136 LearningRate 0.000510 Epoch: 14 Global Step: 296140 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:11:06,198-Speed 2496.92 samples/sec Loss 2.8597 LearningRate 0.000510 Epoch: 14 Global Step: 296150 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:11:14,397-Speed 2498.18 samples/sec Loss 2.8134 LearningRate 0.000510 Epoch: 14 Global Step: 296160 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 
10:11:22,544-Speed 2514.07 samples/sec Loss 2.8062 LearningRate 0.000510 Epoch: 14 Global Step: 296170 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:11:30,746-Speed 2497.39 samples/sec Loss 2.8381 LearningRate 0.000510 Epoch: 14 Global Step: 296180 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:11:38,946-Speed 2498.25 samples/sec Loss 2.8953 LearningRate 0.000510 Epoch: 14 Global Step: 296190 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:11:47,144-Speed 2498.45 samples/sec Loss 2.7918 LearningRate 0.000510 Epoch: 14 Global Step: 296200 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:11:55,350-Speed 2495.98 samples/sec Loss 2.7746 LearningRate 0.000510 Epoch: 14 Global Step: 296210 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:12:03,550-Speed 2498.03 samples/sec Loss 2.8312 LearningRate 0.000510 Epoch: 14 Global Step: 296220 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:12:11,700-Speed 2513.34 samples/sec Loss 2.8987 LearningRate 0.000510 Epoch: 14 Global Step: 296230 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:12:19,900-Speed 2498.06 samples/sec Loss 2.8340 LearningRate 0.000510 Epoch: 14 Global Step: 296240 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:12:28,101-Speed 2497.71 samples/sec Loss 2.8250 LearningRate 0.000510 Epoch: 14 Global Step: 296250 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:12:36,302-Speed 2497.57 samples/sec Loss 2.7930 LearningRate 0.000510 Epoch: 14 Global Step: 296260 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:12:44,507-Speed 2496.60 samples/sec Loss 2.8705 LearningRate 0.000510 Epoch: 14 Global Step: 296270 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:12:52,708-Speed 2497.58 samples/sec Loss 2.8578 LearningRate 0.000510 Epoch: 14 Global Step: 296280 Fp16 Grad Scale: 16384 Required: 122 hours Training: 
2022-07-08 10:13:00,852-Speed 2515.29 samples/sec Loss 2.7863 LearningRate 0.000510 Epoch: 14 Global Step: 296290 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:13:09,052-Speed 2497.98 samples/sec Loss 2.8340 LearningRate 0.000510 Epoch: 14 Global Step: 296300 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:13:17,250-Speed 2498.54 samples/sec Loss 2.7491 LearningRate 0.000510 Epoch: 14 Global Step: 296310 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:13:25,446-Speed 2499.02 samples/sec Loss 2.8595 LearningRate 0.000510 Epoch: 14 Global Step: 296320 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:13:33,667-Speed 2491.53 samples/sec Loss 2.8745 LearningRate 0.000510 Epoch: 14 Global Step: 296330 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:13:41,867-Speed 2497.85 samples/sec Loss 2.9556 LearningRate 0.000510 Epoch: 14 Global Step: 296340 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:13:50,032-Speed 2508.91 samples/sec Loss 2.8247 LearningRate 0.000510 Epoch: 14 Global Step: 296350 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:13:58,253-Speed 2491.71 samples/sec Loss 2.9414 LearningRate 0.000510 Epoch: 14 Global Step: 296360 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:14:06,455-Speed 2497.34 samples/sec Loss 2.8951 LearningRate 0.000510 Epoch: 14 Global Step: 296370 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:14:14,670-Speed 2493.51 samples/sec Loss 2.9273 LearningRate 0.000510 Epoch: 14 Global Step: 296380 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:14:22,877-Speed 2495.61 samples/sec Loss 2.8565 LearningRate 0.000510 Epoch: 14 Global Step: 296390 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:14:31,077-Speed 2498.03 samples/sec Loss 2.8735 LearningRate 0.000510 Epoch: 14 Global Step: 296400 Fp16 Grad Scale: 16384 Required: 122 hours 
Training: 2022-07-08 10:14:39,237-Speed 2510.04 samples/sec Loss 2.8979 LearningRate 0.000510 Epoch: 14 Global Step: 296410 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:14:47,449-Speed 2494.61 samples/sec Loss 2.9005 LearningRate 0.000510 Epoch: 14 Global Step: 296420 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:14:55,651-Speed 2497.36 samples/sec Loss 2.8809 LearningRate 0.000510 Epoch: 14 Global Step: 296430 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:15:03,866-Speed 2493.24 samples/sec Loss 2.8020 LearningRate 0.000510 Epoch: 14 Global Step: 296440 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:15:12,074-Speed 2495.77 samples/sec Loss 2.8542 LearningRate 0.000510 Epoch: 14 Global Step: 296450 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:15:20,279-Speed 2496.23 samples/sec Loss 2.7926 LearningRate 0.000510 Epoch: 14 Global Step: 296460 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:15:28,428-Speed 2513.44 samples/sec Loss 2.8168 LearningRate 0.000510 Epoch: 14 Global Step: 296470 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:15:36,640-Speed 2494.53 samples/sec Loss 2.8054 LearningRate 0.000510 Epoch: 14 Global Step: 296480 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:15:44,863-Speed 2491.08 samples/sec Loss 2.7744 LearningRate 0.000510 Epoch: 14 Global Step: 296490 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:15:53,065-Speed 2497.73 samples/sec Loss 2.8199 LearningRate 0.000510 Epoch: 14 Global Step: 296500 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:16:01,262-Speed 2498.62 samples/sec Loss 2.8614 LearningRate 0.000510 Epoch: 14 Global Step: 296510 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:16:09,467-Speed 2496.34 samples/sec Loss 2.7464 LearningRate 0.000510 Epoch: 14 Global Step: 296520 Fp16 Grad Scale: 16384 Required: 122 
hours Training: 2022-07-08 10:16:17,609-Speed 2516.00 samples/sec Loss 2.8266 LearningRate 0.000510 Epoch: 14 Global Step: 296530 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:16:25,815-Speed 2496.73 samples/sec Loss 2.7904 LearningRate 0.000510 Epoch: 14 Global Step: 296540 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:16:34,041-Speed 2490.17 samples/sec Loss 2.7490 LearningRate 0.000510 Epoch: 14 Global Step: 296550 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:16:42,247-Speed 2496.13 samples/sec Loss 2.7546 LearningRate 0.000510 Epoch: 14 Global Step: 296560 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:16:50,447-Speed 2498.16 samples/sec Loss 2.7794 LearningRate 0.000510 Epoch: 14 Global Step: 296570 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:16:58,653-Speed 2495.92 samples/sec Loss 2.7931 LearningRate 0.000510 Epoch: 14 Global Step: 296580 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:17:06,808-Speed 2512.23 samples/sec Loss 2.8941 LearningRate 0.000510 Epoch: 14 Global Step: 296590 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:17:15,008-Speed 2497.95 samples/sec Loss 2.7684 LearningRate 0.000510 Epoch: 14 Global Step: 296600 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:17:23,203-Speed 2499.63 samples/sec Loss 2.7236 LearningRate 0.000510 Epoch: 14 Global Step: 296610 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:17:31,401-Speed 2498.52 samples/sec Loss 2.7723 LearningRate 0.000510 Epoch: 14 Global Step: 296620 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:17:39,599-Speed 2498.49 samples/sec Loss 2.7835 LearningRate 0.000510 Epoch: 14 Global Step: 296630 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:17:47,801-Speed 2497.22 samples/sec Loss 2.8017 LearningRate 0.000510 Epoch: 14 Global Step: 296640 Fp16 Grad Scale: 16384 Required: 
122 hours Training: 2022-07-08 10:17:55,951-Speed 2513.34 samples/sec Loss 2.7787 LearningRate 0.000509 Epoch: 14 Global Step: 296650 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:18:04,147-Speed 2499.27 samples/sec Loss 2.8535 LearningRate 0.000509 Epoch: 14 Global Step: 296660 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:18:12,344-Speed 2498.86 samples/sec Loss 2.8331 LearningRate 0.000509 Epoch: 14 Global Step: 296670 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:18:20,542-Speed 2498.46 samples/sec Loss 2.8315 LearningRate 0.000509 Epoch: 14 Global Step: 296680 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:18:28,744-Speed 2497.53 samples/sec Loss 2.7892 LearningRate 0.000509 Epoch: 14 Global Step: 296690 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:18:36,941-Speed 2499.15 samples/sec Loss 2.7809 LearningRate 0.000509 Epoch: 14 Global Step: 296700 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:18:45,096-Speed 2511.66 samples/sec Loss 2.8046 LearningRate 0.000509 Epoch: 14 Global Step: 296710 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:18:53,307-Speed 2494.73 samples/sec Loss 2.8489 LearningRate 0.000509 Epoch: 14 Global Step: 296720 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:19:01,525-Speed 2492.43 samples/sec Loss 2.8634 LearningRate 0.000509 Epoch: 14 Global Step: 296730 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:19:09,730-Speed 2496.45 samples/sec Loss 2.8313 LearningRate 0.000509 Epoch: 14 Global Step: 296740 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:19:17,926-Speed 2499.36 samples/sec Loss 2.7898 LearningRate 0.000509 Epoch: 14 Global Step: 296750 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:19:26,125-Speed 2498.28 samples/sec Loss 2.7709 LearningRate 0.000509 Epoch: 14 Global Step: 296760 Fp16 Grad Scale: 16384 
Required: 122 hours Training: 2022-07-08 10:19:34,281-Speed 2511.35 samples/sec Loss 2.8569 LearningRate 0.000509 Epoch: 14 Global Step: 296770 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:19:42,485-Speed 2496.80 samples/sec Loss 2.7907 LearningRate 0.000509 Epoch: 14 Global Step: 296780 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:19:50,686-Speed 2497.40 samples/sec Loss 2.7654 LearningRate 0.000509 Epoch: 14 Global Step: 296790 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:19:58,901-Speed 2493.96 samples/sec Loss 2.7894 LearningRate 0.000509 Epoch: 14 Global Step: 296800 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:20:07,102-Speed 2497.87 samples/sec Loss 2.8251 LearningRate 0.000509 Epoch: 14 Global Step: 296810 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:20:15,313-Speed 2494.41 samples/sec Loss 2.8654 LearningRate 0.000509 Epoch: 14 Global Step: 296820 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:20:23,462-Speed 2513.82 samples/sec Loss 2.7426 LearningRate 0.000509 Epoch: 14 Global Step: 296830 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:20:31,663-Speed 2497.89 samples/sec Loss 2.8383 LearningRate 0.000509 Epoch: 14 Global Step: 296840 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:20:39,868-Speed 2496.12 samples/sec Loss 2.8069 LearningRate 0.000509 Epoch: 14 Global Step: 296850 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:20:48,070-Speed 2497.92 samples/sec Loss 2.7975 LearningRate 0.000509 Epoch: 14 Global Step: 296860 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:20:56,272-Speed 2497.32 samples/sec Loss 2.7870 LearningRate 0.000509 Epoch: 14 Global Step: 296870 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:21:04,474-Speed 2497.81 samples/sec Loss 2.8044 LearningRate 0.000509 Epoch: 14 Global Step: 296880 Fp16 Grad Scale: 
16384 Required: 122 hours Training: 2022-07-08 10:21:12,624-Speed 2513.15 samples/sec Loss 2.7954 LearningRate 0.000509 Epoch: 14 Global Step: 296890 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:21:20,822-Speed 2498.76 samples/sec Loss 2.7799 LearningRate 0.000509 Epoch: 14 Global Step: 296900 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:21:29,021-Speed 2498.14 samples/sec Loss 2.8318 LearningRate 0.000509 Epoch: 14 Global Step: 296910 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:21:37,237-Speed 2493.22 samples/sec Loss 2.8023 LearningRate 0.000509 Epoch: 14 Global Step: 296920 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:21:45,435-Speed 2498.33 samples/sec Loss 2.8374 LearningRate 0.000509 Epoch: 14 Global Step: 296930 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:21:53,635-Speed 2498.04 samples/sec Loss 2.8081 LearningRate 0.000509 Epoch: 14 Global Step: 296940 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:22:01,781-Speed 2514.48 samples/sec Loss 2.8001 LearningRate 0.000509 Epoch: 14 Global Step: 296950 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:22:09,979-Speed 2498.45 samples/sec Loss 2.7929 LearningRate 0.000509 Epoch: 14 Global Step: 296960 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:22:18,192-Speed 2494.06 samples/sec Loss 2.8416 LearningRate 0.000509 Epoch: 14 Global Step: 296970 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:22:26,390-Speed 2498.67 samples/sec Loss 2.8044 LearningRate 0.000509 Epoch: 14 Global Step: 296980 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:22:34,591-Speed 2497.62 samples/sec Loss 2.7490 LearningRate 0.000509 Epoch: 14 Global Step: 296990 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:22:42,791-Speed 2497.83 samples/sec Loss 2.7792 LearningRate 0.000509 Epoch: 14 Global Step: 297000 Fp16 Grad 
Scale: 16384 Required: 122 hours Training: 2022-07-08 10:22:50,950-Speed 2510.54 samples/sec Loss 2.8167 LearningRate 0.000509 Epoch: 14 Global Step: 297010 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:22:59,149-Speed 2498.30 samples/sec Loss 2.8442 LearningRate 0.000509 Epoch: 14 Global Step: 297020 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:23:07,349-Speed 2497.97 samples/sec Loss 2.7834 LearningRate 0.000509 Epoch: 14 Global Step: 297030 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:23:15,549-Speed 2497.94 samples/sec Loss 2.7849 LearningRate 0.000509 Epoch: 14 Global Step: 297040 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:23:23,749-Speed 2498.12 samples/sec Loss 2.8419 LearningRate 0.000509 Epoch: 14 Global Step: 297050 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:23:31,949-Speed 2497.78 samples/sec Loss 2.8038 LearningRate 0.000509 Epoch: 14 Global Step: 297060 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:23:40,095-Speed 2514.43 samples/sec Loss 2.7807 LearningRate 0.000509 Epoch: 14 Global Step: 297070 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:23:48,296-Speed 2497.90 samples/sec Loss 2.8476 LearningRate 0.000509 Epoch: 14 Global Step: 297080 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:23:56,496-Speed 2497.76 samples/sec Loss 2.8288 LearningRate 0.000509 Epoch: 14 Global Step: 297090 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:24:04,695-Speed 2498.56 samples/sec Loss 2.8472 LearningRate 0.000509 Epoch: 14 Global Step: 297100 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:24:12,897-Speed 2497.52 samples/sec Loss 2.8635 LearningRate 0.000509 Epoch: 14 Global Step: 297110 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:24:21,095-Speed 2498.60 samples/sec Loss 2.9079 LearningRate 0.000509 Epoch: 14 Global Step: 297120 Fp16 
Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:24:29,245-Speed 2513.41 samples/sec Loss 2.8105 LearningRate 0.000509 Epoch: 14 Global Step: 297130 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:24:37,443-Speed 2498.32 samples/sec Loss 2.8406 LearningRate 0.000509 Epoch: 14 Global Step: 297140 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:24:45,646-Speed 2497.22 samples/sec Loss 2.7868 LearningRate 0.000509 Epoch: 14 Global Step: 297150 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:24:53,870-Speed 2490.82 samples/sec Loss 2.8304 LearningRate 0.000509 Epoch: 14 Global Step: 297160 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:25:02,068-Speed 2498.36 samples/sec Loss 2.8141 LearningRate 0.000508 Epoch: 14 Global Step: 297170 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:25:10,277-Speed 2495.32 samples/sec Loss 2.7858 LearningRate 0.000508 Epoch: 14 Global Step: 297180 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:25:18,435-Speed 2510.81 samples/sec Loss 2.7683 LearningRate 0.000508 Epoch: 14 Global Step: 297190 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:25:26,635-Speed 2497.99 samples/sec Loss 2.7537 LearningRate 0.000508 Epoch: 14 Global Step: 297200 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:25:34,833-Speed 2498.74 samples/sec Loss 2.7595 LearningRate 0.000508 Epoch: 14 Global Step: 297210 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:25:43,034-Speed 2497.69 samples/sec Loss 2.7502 LearningRate 0.000508 Epoch: 14 Global Step: 297220 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:25:51,236-Speed 2497.63 samples/sec Loss 2.7232 LearningRate 0.000508 Epoch: 14 Global Step: 297230 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:25:59,438-Speed 2497.18 samples/sec Loss 2.8495 LearningRate 0.000508 Epoch: 14 Global Step: 297240 
Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:26:07,582-Speed 2515.21 samples/sec Loss 2.7969 LearningRate 0.000508 Epoch: 14 Global Step: 297250 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:26:15,781-Speed 2498.28 samples/sec Loss 2.7953 LearningRate 0.000508 Epoch: 14 Global Step: 297260 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:26:23,979-Speed 2498.75 samples/sec Loss 2.7913 LearningRate 0.000508 Epoch: 14 Global Step: 297270 Fp16 Grad Scale: 16384 Required: 122 hours Training: 2022-07-08 10:26:32,180-Speed 2497.74 samples/sec Loss 2.8093 LearningRate 0.000508 Epoch: 14 Global Step: 297280 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:26:40,384-Speed 2496.66 samples/sec Loss 2.8282 LearningRate 0.000508 Epoch: 14 Global Step: 297290 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:26:48,590-Speed 2496.09 samples/sec Loss 2.8090 LearningRate 0.000508 Epoch: 14 Global Step: 297300 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:26:56,736-Speed 2514.46 samples/sec Loss 2.8276 LearningRate 0.000508 Epoch: 14 Global Step: 297310 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:27:04,936-Speed 2498.12 samples/sec Loss 2.8443 LearningRate 0.000508 Epoch: 14 Global Step: 297320 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:27:13,150-Speed 2493.76 samples/sec Loss 2.7856 LearningRate 0.000508 Epoch: 14 Global Step: 297330 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:27:21,352-Speed 2497.44 samples/sec Loss 2.7779 LearningRate 0.000508 Epoch: 14 Global Step: 297340 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:27:29,562-Speed 2495.02 samples/sec Loss 2.7568 LearningRate 0.000508 Epoch: 14 Global Step: 297350 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:27:37,764-Speed 2497.11 samples/sec Loss 2.7819 LearningRate 0.000508 Epoch: 14 Global Step: 
297360 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:27:45,922-Speed 2510.87 samples/sec Loss 2.7923 LearningRate 0.000508 Epoch: 14 Global Step: 297370 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:27:54,120-Speed 2498.66 samples/sec Loss 2.7676 LearningRate 0.000508 Epoch: 14 Global Step: 297380 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:28:02,323-Speed 2496.86 samples/sec Loss 2.7822 LearningRate 0.000508 Epoch: 14 Global Step: 297390 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:28:10,523-Speed 2498.35 samples/sec Loss 2.8586 LearningRate 0.000508 Epoch: 14 Global Step: 297400 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:28:18,723-Speed 2498.07 samples/sec Loss 2.8236 LearningRate 0.000508 Epoch: 14 Global Step: 297410 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:28:26,925-Speed 2497.56 samples/sec Loss 2.8044 LearningRate 0.000508 Epoch: 14 Global Step: 297420 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:28:35,073-Speed 2514.11 samples/sec Loss 2.9001 LearningRate 0.000508 Epoch: 14 Global Step: 297430 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:28:43,279-Speed 2496.26 samples/sec Loss 2.8587 LearningRate 0.000508 Epoch: 14 Global Step: 297440 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:28:51,483-Speed 2496.88 samples/sec Loss 2.9239 LearningRate 0.000508 Epoch: 14 Global Step: 297450 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:28:59,686-Speed 2497.12 samples/sec Loss 2.9915 LearningRate 0.000508 Epoch: 14 Global Step: 297460 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:29:07,899-Speed 2493.79 samples/sec Loss 2.9068 LearningRate 0.000508 Epoch: 14 Global Step: 297470 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:29:16,098-Speed 2498.30 samples/sec Loss 2.8914 LearningRate 0.000508 Epoch: 14 Global 
Step: 297480 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:29:24,239-Speed 2516.11 samples/sec Loss 2.8756 LearningRate 0.000508 Epoch: 14 Global Step: 297490 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:29:32,434-Speed 2499.48 samples/sec Loss 2.8183 LearningRate 0.000508 Epoch: 14 Global Step: 297500 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:29:40,630-Speed 2499.14 samples/sec Loss 2.8365 LearningRate 0.000508 Epoch: 14 Global Step: 297510 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:29:48,829-Speed 2498.32 samples/sec Loss 2.8673 LearningRate 0.000508 Epoch: 14 Global Step: 297520 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:29:57,038-Speed 2495.31 samples/sec Loss 2.8837 LearningRate 0.000508 Epoch: 14 Global Step: 297530 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:30:05,233-Speed 2499.31 samples/sec Loss 2.8173 LearningRate 0.000508 Epoch: 14 Global Step: 297540 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:30:13,391-Speed 2510.92 samples/sec Loss 2.7897 LearningRate 0.000508 Epoch: 14 Global Step: 297550 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:30:21,585-Speed 2499.65 samples/sec Loss 2.7792 LearningRate 0.000508 Epoch: 14 Global Step: 297560 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:30:29,784-Speed 2498.52 samples/sec Loss 2.7901 LearningRate 0.000508 Epoch: 14 Global Step: 297570 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:30:37,983-Speed 2498.95 samples/sec Loss 2.8298 LearningRate 0.000508 Epoch: 14 Global Step: 297580 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:30:46,192-Speed 2495.22 samples/sec Loss 2.8254 LearningRate 0.000508 Epoch: 14 Global Step: 297590 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:30:54,390-Speed 2498.17 samples/sec Loss 2.7877 LearningRate 0.000508 Epoch: 14 
Global Step: 297600 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:31:02,538-Speed 2514.06 samples/sec Loss 2.8018 LearningRate 0.000508 Epoch: 14 Global Step: 297610 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:31:10,747-Speed 2495.17 samples/sec Loss 2.8304 LearningRate 0.000508 Epoch: 14 Global Step: 297620 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:31:18,946-Speed 2498.45 samples/sec Loss 2.8170 LearningRate 0.000508 Epoch: 14 Global Step: 297630 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:31:27,143-Speed 2498.64 samples/sec Loss 2.7702 LearningRate 0.000508 Epoch: 14 Global Step: 297640 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:31:35,353-Speed 2495.49 samples/sec Loss 2.7739 LearningRate 0.000508 Epoch: 14 Global Step: 297650 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:31:43,551-Speed 2498.45 samples/sec Loss 2.8402 LearningRate 0.000508 Epoch: 14 Global Step: 297660 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:31:51,704-Speed 2512.48 samples/sec Loss 2.7169 LearningRate 0.000508 Epoch: 14 Global Step: 297670 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:31:59,901-Speed 2498.96 samples/sec Loss 2.7996 LearningRate 0.000508 Epoch: 14 Global Step: 297680 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:32:08,103-Speed 2497.31 samples/sec Loss 2.7607 LearningRate 0.000507 Epoch: 14 Global Step: 297690 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:32:16,315-Speed 2494.31 samples/sec Loss 2.7831 LearningRate 0.000507 Epoch: 14 Global Step: 297700 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:32:24,521-Speed 2495.96 samples/sec Loss 2.8272 LearningRate 0.000507 Epoch: 14 Global Step: 297710 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:32:32,721-Speed 2497.89 samples/sec Loss 2.8321 LearningRate 0.000507 
Epoch: 14 Global Step: 297720 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:32:40,870-Speed 2513.72 samples/sec Loss 2.8097 LearningRate 0.000507 Epoch: 14 Global Step: 297730 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:32:49,075-Speed 2496.35 samples/sec Loss 2.7925 LearningRate 0.000507 Epoch: 14 Global Step: 297740 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:32:57,275-Speed 2498.08 samples/sec Loss 2.7527 LearningRate 0.000507 Epoch: 14 Global Step: 297750 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:33:05,472-Speed 2498.86 samples/sec Loss 2.7440 LearningRate 0.000507 Epoch: 14 Global Step: 297760 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:33:13,684-Speed 2494.98 samples/sec Loss 2.8034 LearningRate 0.000507 Epoch: 14 Global Step: 297770 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:33:21,885-Speed 2497.69 samples/sec Loss 2.7753 LearningRate 0.000507 Epoch: 14 Global Step: 297780 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:33:30,027-Speed 2515.73 samples/sec Loss 2.8795 LearningRate 0.000507 Epoch: 14 Global Step: 297790 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:33:38,225-Speed 2498.56 samples/sec Loss 2.7944 LearningRate 0.000507 Epoch: 14 Global Step: 297800 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:33:46,425-Speed 2498.26 samples/sec Loss 2.7353 LearningRate 0.000507 Epoch: 14 Global Step: 297810 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:33:54,623-Speed 2498.28 samples/sec Loss 2.7868 LearningRate 0.000507 Epoch: 14 Global Step: 297820 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:34:02,824-Speed 2497.70 samples/sec Loss 2.7963 LearningRate 0.000507 Epoch: 14 Global Step: 297830 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:34:11,031-Speed 2495.98 samples/sec Loss 2.8010 LearningRate 
0.000507 Epoch: 14 Global Step: 297840 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:34:19,175-Speed 2515.11 samples/sec Loss 2.8012 LearningRate 0.000507 Epoch: 14 Global Step: 297850 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:34:27,379-Speed 2496.52 samples/sec Loss 2.7747 LearningRate 0.000507 Epoch: 14 Global Step: 297860 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:34:35,583-Speed 2496.86 samples/sec Loss 2.8111 LearningRate 0.000507 Epoch: 14 Global Step: 297870 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:34:43,787-Speed 2496.77 samples/sec Loss 2.8245 LearningRate 0.000507 Epoch: 14 Global Step: 297880 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:34:51,990-Speed 2497.13 samples/sec Loss 2.8258 LearningRate 0.000507 Epoch: 14 Global Step: 297890 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:35:00,190-Speed 2497.97 samples/sec Loss 2.8107 LearningRate 0.000507 Epoch: 14 Global Step: 297900 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:35:08,344-Speed 2512.09 samples/sec Loss 2.7238 LearningRate 0.000507 Epoch: 14 Global Step: 297910 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:35:16,544-Speed 2498.22 samples/sec Loss 2.8041 LearningRate 0.000507 Epoch: 14 Global Step: 297920 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:35:24,743-Speed 2498.09 samples/sec Loss 2.7862 LearningRate 0.000507 Epoch: 14 Global Step: 297930 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:35:32,953-Speed 2494.87 samples/sec Loss 2.8091 LearningRate 0.000507 Epoch: 14 Global Step: 297940 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:35:41,164-Speed 2494.82 samples/sec Loss 2.7830 LearningRate 0.000507 Epoch: 14 Global Step: 297950 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:35:49,371-Speed 2495.73 samples/sec Loss 2.8067 
LearningRate 0.000507 Epoch: 14 Global Step: 297960 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:35:57,518-Speed 2514.16 samples/sec Loss 2.7984 LearningRate 0.000507 Epoch: 14 Global Step: 297970 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:36:05,727-Speed 2495.31 samples/sec Loss 2.8098 LearningRate 0.000507 Epoch: 14 Global Step: 297980 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:36:13,939-Speed 2494.40 samples/sec Loss 2.8529 LearningRate 0.000507 Epoch: 14 Global Step: 297990 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:36:22,141-Speed 2497.41 samples/sec Loss 2.7844 LearningRate 0.000507 Epoch: 14 Global Step: 298000 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:36:30,344-Speed 2496.93 samples/sec Loss 2.7677 LearningRate 0.000507 Epoch: 14 Global Step: 298010 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:36:38,549-Speed 2496.58 samples/sec Loss 2.8004 LearningRate 0.000507 Epoch: 14 Global Step: 298020 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:36:46,698-Speed 2513.86 samples/sec Loss 2.8014 LearningRate 0.000507 Epoch: 14 Global Step: 298030 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:36:54,905-Speed 2495.65 samples/sec Loss 2.7894 LearningRate 0.000507 Epoch: 14 Global Step: 298040 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:37:03,112-Speed 2496.16 samples/sec Loss 2.7914 LearningRate 0.000507 Epoch: 14 Global Step: 298050 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:37:11,322-Speed 2495.15 samples/sec Loss 2.7618 LearningRate 0.000507 Epoch: 14 Global Step: 298060 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:37:19,526-Speed 2496.81 samples/sec Loss 2.7498 LearningRate 0.000507 Epoch: 14 Global Step: 298070 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:37:27,730-Speed 2496.49 samples/sec Loss 
2.8786 LearningRate 0.000507 Epoch: 14 Global Step: 298080 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:37:35,880-Speed 2513.56 samples/sec Loss 2.8469 LearningRate 0.000507 Epoch: 14 Global Step: 298090 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:37:44,085-Speed 2496.44 samples/sec Loss 2.8368 LearningRate 0.000507 Epoch: 14 Global Step: 298100 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:37:52,288-Speed 2496.89 samples/sec Loss 2.8729 LearningRate 0.000507 Epoch: 14 Global Step: 298110 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:38:00,490-Speed 2497.56 samples/sec Loss 2.8225 LearningRate 0.000507 Epoch: 14 Global Step: 298120 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:38:08,687-Speed 2498.76 samples/sec Loss 2.8490 LearningRate 0.000507 Epoch: 14 Global Step: 298130 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:38:16,892-Speed 2496.59 samples/sec Loss 2.7875 LearningRate 0.000507 Epoch: 14 Global Step: 298140 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:38:25,041-Speed 2513.69 samples/sec Loss 2.7891 LearningRate 0.000507 Epoch: 14 Global Step: 298150 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:38:33,250-Speed 2495.35 samples/sec Loss 2.7750 LearningRate 0.000507 Epoch: 14 Global Step: 298160 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:38:41,457-Speed 2495.75 samples/sec Loss 2.8202 LearningRate 0.000507 Epoch: 14 Global Step: 298170 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:38:49,659-Speed 2497.37 samples/sec Loss 2.7194 LearningRate 0.000507 Epoch: 14 Global Step: 298180 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:38:57,874-Speed 2493.35 samples/sec Loss 2.7759 LearningRate 0.000507 Epoch: 14 Global Step: 298190 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:39:06,073-Speed 2497.92 samples/sec 
Loss 2.8170 LearningRate 0.000507 Epoch: 14 Global Step: 298200 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:39:14,226-Speed 2512.44 samples/sec Loss 2.8225 LearningRate 0.000507 Epoch: 14 Global Step: 298210 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:39:22,426-Speed 2498.11 samples/sec Loss 2.7553 LearningRate 0.000506 Epoch: 14 Global Step: 298220 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:39:30,626-Speed 2498.08 samples/sec Loss 2.7661 LearningRate 0.000506 Epoch: 14 Global Step: 298230 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:39:38,825-Speed 2498.12 samples/sec Loss 2.7947 LearningRate 0.000506 Epoch: 14 Global Step: 298240 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:39:47,031-Speed 2496.23 samples/sec Loss 2.7705 LearningRate 0.000506 Epoch: 14 Global Step: 298250 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:39:55,234-Speed 2497.07 samples/sec Loss 2.8100 LearningRate 0.000506 Epoch: 14 Global Step: 298260 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:40:03,381-Speed 2514.24 samples/sec Loss 2.8628 LearningRate 0.000506 Epoch: 14 Global Step: 298270 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:40:11,581-Speed 2498.01 samples/sec Loss 2.8053 LearningRate 0.000506 Epoch: 14 Global Step: 298280 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:40:19,789-Speed 2495.51 samples/sec Loss 2.8193 LearningRate 0.000506 Epoch: 14 Global Step: 298290 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:40:27,987-Speed 2498.78 samples/sec Loss 2.7919 LearningRate 0.000506 Epoch: 14 Global Step: 298300 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:40:36,190-Speed 2496.91 samples/sec Loss 2.8106 LearningRate 0.000506 Epoch: 14 Global Step: 298310 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:40:44,397-Speed 2495.85 
samples/sec Loss 2.8081 LearningRate 0.000506 Epoch: 14 Global Step: 298320 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:40:52,539-Speed 2515.80 samples/sec Loss 2.8100 LearningRate 0.000506 Epoch: 14 Global Step: 298330 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:41:00,738-Speed 2498.64 samples/sec Loss 2.8021 LearningRate 0.000506 Epoch: 14 Global Step: 298340 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:41:08,940-Speed 2497.24 samples/sec Loss 2.8539 LearningRate 0.000506 Epoch: 14 Global Step: 298350 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:41:17,141-Speed 2497.46 samples/sec Loss 2.8507 LearningRate 0.000506 Epoch: 14 Global Step: 298360 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:41:25,342-Speed 2498.02 samples/sec Loss 2.8153 LearningRate 0.000506 Epoch: 14 Global Step: 298370 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:41:33,545-Speed 2497.14 samples/sec Loss 2.7976 LearningRate 0.000506 Epoch: 14 Global Step: 298380 Fp16 Grad Scale: 32768 Required: 122 hours Training: 2022-07-08 10:41:41,701-Speed 2511.47 samples/sec Loss 2.7697 LearningRate 0.000506 Epoch: 14 Global Step: 298390 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:41:49,901-Speed 2497.73 samples/sec Loss 2.8468 LearningRate 0.000506 Epoch: 14 Global Step: 298400 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:41:58,108-Speed 2495.84 samples/sec Loss 2.8619 LearningRate 0.000506 Epoch: 14 Global Step: 298410 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:42:06,317-Speed 2495.52 samples/sec Loss 2.8256 LearningRate 0.000506 Epoch: 14 Global Step: 298420 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:42:14,520-Speed 2497.12 samples/sec Loss 2.7067 LearningRate 0.000506 Epoch: 14 Global Step: 298430 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:42:22,719-Speed 
2498.11 samples/sec Loss 2.8846 LearningRate 0.000506 Epoch: 14 Global Step: 298440 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:42:30,866-Speed 2514.54 samples/sec Loss 2.8048 LearningRate 0.000506 Epoch: 14 Global Step: 298450 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:42:39,074-Speed 2495.41 samples/sec Loss 2.8162 LearningRate 0.000506 Epoch: 14 Global Step: 298460 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:42:47,292-Speed 2492.53 samples/sec Loss 2.7729 LearningRate 0.000506 Epoch: 14 Global Step: 298470 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:42:55,492-Speed 2497.87 samples/sec Loss 2.8097 LearningRate 0.000506 Epoch: 14 Global Step: 298480 Fp16 Grad Scale: 65536 Required: 121 hours Training: 2022-07-08 10:43:03,695-Speed 2497.34 samples/sec Loss 2.7793 LearningRate 0.000506 Epoch: 14 Global Step: 298490 Fp16 Grad Scale: 65536 Required: 121 hours Training: 2022-07-08 10:43:11,896-Speed 2497.66 samples/sec Loss 2.7421 LearningRate 0.000506 Epoch: 14 Global Step: 298500 Fp16 Grad Scale: 65536 Required: 121 hours Training: 2022-07-08 10:43:20,039-Speed 2515.44 samples/sec Loss 2.7922 LearningRate 0.000506 Epoch: 14 Global Step: 298510 Fp16 Grad Scale: 65536 Required: 121 hours Training: 2022-07-08 10:43:28,193-Speed 2512.17 samples/sec Loss 2.8152 LearningRate 0.000506 Epoch: 14 Global Step: 298520 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:43:36,390-Speed 2498.78 samples/sec Loss 2.8237 LearningRate 0.000506 Epoch: 14 Global Step: 298530 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:43:44,584-Speed 2499.72 samples/sec Loss 2.8067 LearningRate 0.000506 Epoch: 14 Global Step: 298540 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:43:52,784-Speed 2498.10 samples/sec Loss 2.8556 LearningRate 0.000506 Epoch: 14 Global Step: 298550 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 
10:44:00,978-Speed 2499.70 samples/sec Loss 2.8298 LearningRate 0.000506 Epoch: 14 Global Step: 298560 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:44:09,126-Speed 2514.19 samples/sec Loss 2.8117 LearningRate 0.000506 Epoch: 14 Global Step: 298570 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:44:17,321-Speed 2499.55 samples/sec Loss 2.7773 LearningRate 0.000506 Epoch: 14 Global Step: 298580 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:44:25,522-Speed 2497.62 samples/sec Loss 2.8146 LearningRate 0.000506 Epoch: 14 Global Step: 298590 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:44:33,717-Speed 2499.47 samples/sec Loss 2.8147 LearningRate 0.000506 Epoch: 14 Global Step: 298600 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:44:41,915-Speed 2498.64 samples/sec Loss 2.7839 LearningRate 0.000506 Epoch: 14 Global Step: 298610 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:44:50,111-Speed 2498.95 samples/sec Loss 2.8128 LearningRate 0.000506 Epoch: 14 Global Step: 298620 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:44:58,262-Speed 2513.03 samples/sec Loss 2.7590 LearningRate 0.000506 Epoch: 14 Global Step: 298630 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:45:06,459-Speed 2499.26 samples/sec Loss 2.8173 LearningRate 0.000506 Epoch: 14 Global Step: 298640 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:45:14,655-Speed 2499.10 samples/sec Loss 2.7844 LearningRate 0.000506 Epoch: 14 Global Step: 298650 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:45:22,853-Speed 2498.61 samples/sec Loss 2.7343 LearningRate 0.000506 Epoch: 14 Global Step: 298660 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:45:31,054-Speed 2497.86 samples/sec Loss 2.7529 LearningRate 0.000506 Epoch: 14 Global Step: 298670 Fp16 Grad Scale: 32768 Required: 121 hours Training: 
2022-07-08 10:45:39,262-Speed 2495.62 samples/sec Loss 2.8346 LearningRate 0.000506 Epoch: 14 Global Step: 298680 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:45:47,407-Speed 2514.81 samples/sec Loss 2.7508 LearningRate 0.000506 Epoch: 14 Global Step: 298690 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:45:55,607-Speed 2497.92 samples/sec Loss 2.7855 LearningRate 0.000506 Epoch: 14 Global Step: 298700 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:46:03,809-Speed 2497.66 samples/sec Loss 2.7916 LearningRate 0.000506 Epoch: 14 Global Step: 298710 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:46:12,008-Speed 2498.16 samples/sec Loss 2.7916 LearningRate 0.000506 Epoch: 14 Global Step: 298720 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:46:20,212-Speed 2496.84 samples/sec Loss 2.8913 LearningRate 0.000506 Epoch: 14 Global Step: 298730 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:46:28,428-Speed 2493.30 samples/sec Loss 2.8806 LearningRate 0.000505 Epoch: 14 Global Step: 298740 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:46:36,573-Speed 2514.85 samples/sec Loss 2.8418 LearningRate 0.000505 Epoch: 14 Global Step: 298750 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:46:44,781-Speed 2495.36 samples/sec Loss 2.8518 LearningRate 0.000505 Epoch: 14 Global Step: 298760 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:46:52,983-Speed 2497.47 samples/sec Loss 2.8235 LearningRate 0.000505 Epoch: 14 Global Step: 298770 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:47:01,190-Speed 2495.81 samples/sec Loss 2.8408 LearningRate 0.000505 Epoch: 14 Global Step: 298780 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:47:09,389-Speed 2498.37 samples/sec Loss 2.7870 LearningRate 0.000505 Epoch: 14 Global Step: 298790 Fp16 Grad Scale: 32768 Required: 121 hours 
Training: 2022-07-08 10:47:17,587-Speed 2498.80 samples/sec Loss 2.7873 LearningRate 0.000505 Epoch: 14 Global Step: 298800 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:47:25,747-Speed 2510.20 samples/sec Loss 2.8067 LearningRate 0.000505 Epoch: 14 Global Step: 298810 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:47:33,946-Speed 2498.14 samples/sec Loss 2.7737 LearningRate 0.000505 Epoch: 14 Global Step: 298820 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:47:42,149-Speed 2496.98 samples/sec Loss 2.7847 LearningRate 0.000505 Epoch: 14 Global Step: 298830 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:47:50,354-Speed 2496.47 samples/sec Loss 2.7488 LearningRate 0.000505 Epoch: 14 Global Step: 298840 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:47:58,559-Speed 2496.46 samples/sec Loss 2.7971 LearningRate 0.000505 Epoch: 14 Global Step: 298850 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:48:06,758-Speed 2498.08 samples/sec Loss 2.7892 LearningRate 0.000505 Epoch: 14 Global Step: 298860 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:48:14,911-Speed 2512.34 samples/sec Loss 2.7734 LearningRate 0.000505 Epoch: 14 Global Step: 298870 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:48:23,116-Speed 2496.65 samples/sec Loss 2.7431 LearningRate 0.000505 Epoch: 14 Global Step: 298880 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:48:31,320-Speed 2496.87 samples/sec Loss 2.8285 LearningRate 0.000505 Epoch: 14 Global Step: 298890 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:48:39,519-Speed 2498.37 samples/sec Loss 2.7694 LearningRate 0.000505 Epoch: 14 Global Step: 298900 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:48:47,718-Speed 2498.09 samples/sec Loss 2.7569 LearningRate 0.000505 Epoch: 14 Global Step: 298910 Fp16 Grad Scale: 32768 Required: 121 
hours Training: 2022-07-08 10:48:55,923-Speed 2496.80 samples/sec Loss 2.8077 LearningRate 0.000505 Epoch: 14 Global Step: 298920 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:49:04,073-Speed 2513.38 samples/sec Loss 2.8697 LearningRate 0.000505 Epoch: 14 Global Step: 298930 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:49:12,279-Speed 2496.18 samples/sec Loss 2.7687 LearningRate 0.000505 Epoch: 14 Global Step: 298940 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:49:20,486-Speed 2495.78 samples/sec Loss 2.7484 LearningRate 0.000505 Epoch: 14 Global Step: 298950 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:49:28,684-Speed 2498.73 samples/sec Loss 2.8043 LearningRate 0.000505 Epoch: 14 Global Step: 298960 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:49:36,882-Speed 2498.55 samples/sec Loss 2.7887 LearningRate 0.000505 Epoch: 14 Global Step: 298970 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:49:45,080-Speed 2498.62 samples/sec Loss 2.7291 LearningRate 0.000505 Epoch: 14 Global Step: 298980 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:49:53,227-Speed 2514.28 samples/sec Loss 2.7661 LearningRate 0.000505 Epoch: 14 Global Step: 298990 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:50:01,424-Speed 2498.70 samples/sec Loss 2.8759 LearningRate 0.000505 Epoch: 14 Global Step: 299000 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:50:09,623-Speed 2498.31 samples/sec Loss 2.7914 LearningRate 0.000505 Epoch: 14 Global Step: 299010 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:50:17,823-Speed 2498.01 samples/sec Loss 2.8141 LearningRate 0.000505 Epoch: 14 Global Step: 299020 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:50:26,021-Speed 2498.69 samples/sec Loss 2.7595 LearningRate 0.000505 Epoch: 14 Global Step: 299030 Fp16 Grad Scale: 32768 Required: 
121 hours Training: 2022-07-08 10:50:34,222-Speed 2497.61 samples/sec Loss 2.7588 LearningRate 0.000505 Epoch: 14 Global Step: 299040 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:50:42,381-Speed 2510.63 samples/sec Loss 2.7009 LearningRate 0.000505 Epoch: 14 Global Step: 299050 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:50:50,580-Speed 2498.31 samples/sec Loss 2.7506 LearningRate 0.000505 Epoch: 14 Global Step: 299060 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:50:58,785-Speed 2496.53 samples/sec Loss 2.8114 LearningRate 0.000505 Epoch: 14 Global Step: 299070 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:51:06,985-Speed 2497.92 samples/sec Loss 2.7984 LearningRate 0.000505 Epoch: 14 Global Step: 299080 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:51:15,185-Speed 2498.06 samples/sec Loss 2.7979 LearningRate 0.000505 Epoch: 14 Global Step: 299090 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:51:23,388-Speed 2496.86 samples/sec Loss 2.7854 LearningRate 0.000505 Epoch: 14 Global Step: 299100 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:51:31,534-Speed 2514.50 samples/sec Loss 2.7163 LearningRate 0.000505 Epoch: 14 Global Step: 299110 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:51:39,734-Speed 2498.12 samples/sec Loss 2.7785 LearningRate 0.000505 Epoch: 14 Global Step: 299120 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:51:47,936-Speed 2497.27 samples/sec Loss 2.7568 LearningRate 0.000505 Epoch: 14 Global Step: 299130 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:51:56,139-Speed 2496.99 samples/sec Loss 2.7151 LearningRate 0.000505 Epoch: 14 Global Step: 299140 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:52:04,339-Speed 2497.63 samples/sec Loss 2.7576 LearningRate 0.000505 Epoch: 14 Global Step: 299150 Fp16 Grad Scale: 32768 
Required: 121 hours Training: 2022-07-08 10:52:12,539-Speed 2498.03 samples/sec Loss 2.7712 LearningRate 0.000505 Epoch: 14 Global Step: 299160 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:52:20,686-Speed 2514.20 samples/sec Loss 2.7390 LearningRate 0.000505 Epoch: 14 Global Step: 299170 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:52:28,886-Speed 2498.16 samples/sec Loss 2.7807 LearningRate 0.000505 Epoch: 14 Global Step: 299180 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:52:37,086-Speed 2498.03 samples/sec Loss 2.7174 LearningRate 0.000505 Epoch: 14 Global Step: 299190 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:52:45,285-Speed 2498.35 samples/sec Loss 2.7484 LearningRate 0.000505 Epoch: 14 Global Step: 299200 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:52:53,487-Speed 2497.25 samples/sec Loss 2.8098 LearningRate 0.000505 Epoch: 14 Global Step: 299210 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:53:01,706-Speed 2492.47 samples/sec Loss 2.7561 LearningRate 0.000505 Epoch: 14 Global Step: 299220 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:53:09,850-Speed 2514.97 samples/sec Loss 2.7657 LearningRate 0.000505 Epoch: 14 Global Step: 299230 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:53:18,054-Speed 2496.85 samples/sec Loss 2.7790 LearningRate 0.000505 Epoch: 14 Global Step: 299240 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:53:26,250-Speed 2499.39 samples/sec Loss 2.7617 LearningRate 0.000505 Epoch: 14 Global Step: 299250 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:53:34,454-Speed 2496.67 samples/sec Loss 2.7890 LearningRate 0.000505 Epoch: 14 Global Step: 299260 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:53:42,655-Speed 2497.74 samples/sec Loss 2.7669 LearningRate 0.000504 Epoch: 14 Global Step: 299270 Fp16 Grad Scale: 
32768 Required: 121 hours Training: 2022-07-08 10:53:50,853-Speed 2499.42 samples/sec Loss 2.8103 LearningRate 0.000504 Epoch: 14 Global Step: 299280 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:53:59,000-Speed 2513.98 samples/sec Loss 2.7698 LearningRate 0.000504 Epoch: 14 Global Step: 299290 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:54:07,202-Speed 2497.55 samples/sec Loss 2.7777 LearningRate 0.000504 Epoch: 14 Global Step: 299300 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:54:15,401-Speed 2498.17 samples/sec Loss 2.7878 LearningRate 0.000504 Epoch: 14 Global Step: 299310 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:54:23,601-Speed 2497.76 samples/sec Loss 2.7915 LearningRate 0.000504 Epoch: 14 Global Step: 299320 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:54:31,814-Speed 2494.34 samples/sec Loss 2.7356 LearningRate 0.000504 Epoch: 14 Global Step: 299330 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:54:40,024-Speed 2494.79 samples/sec Loss 2.7805 LearningRate 0.000504 Epoch: 14 Global Step: 299340 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:54:48,175-Speed 2512.96 samples/sec Loss 2.8076 LearningRate 0.000504 Epoch: 14 Global Step: 299350 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:54:56,385-Speed 2494.72 samples/sec Loss 2.7836 LearningRate 0.000504 Epoch: 14 Global Step: 299360 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:55:04,583-Speed 2498.53 samples/sec Loss 2.7992 LearningRate 0.000504 Epoch: 14 Global Step: 299370 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:55:12,787-Speed 2497.16 samples/sec Loss 2.8314 LearningRate 0.000504 Epoch: 14 Global Step: 299380 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:55:20,990-Speed 2497.11 samples/sec Loss 2.8164 LearningRate 0.000504 Epoch: 14 Global Step: 299390 Fp16 Grad 
Scale: 32768 Required: 121 hours Training: 2022-07-08 10:55:29,189-Speed 2498.26 samples/sec Loss 2.7884 LearningRate 0.000504 Epoch: 14 Global Step: 299400 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:55:37,332-Speed 2515.32 samples/sec Loss 2.7532 LearningRate 0.000504 Epoch: 14 Global Step: 299410 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:55:45,542-Speed 2494.91 samples/sec Loss 2.7249 LearningRate 0.000504 Epoch: 14 Global Step: 299420 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:55:53,748-Speed 2496.18 samples/sec Loss 2.8250 LearningRate 0.000504 Epoch: 14 Global Step: 299430 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:56:01,957-Speed 2495.38 samples/sec Loss 2.8296 LearningRate 0.000504 Epoch: 14 Global Step: 299440 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:56:10,163-Speed 2496.42 samples/sec Loss 2.7881 LearningRate 0.000504 Epoch: 14 Global Step: 299450 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:56:18,363-Speed 2497.72 samples/sec Loss 2.8235 LearningRate 0.000504 Epoch: 14 Global Step: 299460 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:56:26,507-Speed 2515.25 samples/sec Loss 2.8019 LearningRate 0.000504 Epoch: 14 Global Step: 299470 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:56:34,710-Speed 2497.45 samples/sec Loss 2.7636 LearningRate 0.000504 Epoch: 14 Global Step: 299480 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:56:42,908-Speed 2498.41 samples/sec Loss 2.8100 LearningRate 0.000504 Epoch: 14 Global Step: 299490 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:56:51,106-Speed 2498.81 samples/sec Loss 2.8207 LearningRate 0.000504 Epoch: 14 Global Step: 299500 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:56:59,309-Speed 2497.11 samples/sec Loss 2.8065 LearningRate 0.000504 Epoch: 14 Global Step: 299510 Fp16 
Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:57:07,512-Speed 2497.13 samples/sec Loss 2.8364 LearningRate 0.000504 Epoch: 14 Global Step: 299520 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:57:15,657-Speed 2514.50 samples/sec Loss 2.7508 LearningRate 0.000504 Epoch: 14 Global Step: 299530 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:57:23,864-Speed 2495.90 samples/sec Loss 2.7559 LearningRate 0.000504 Epoch: 14 Global Step: 299540 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:57:32,075-Speed 2494.71 samples/sec Loss 2.7249 LearningRate 0.000504 Epoch: 14 Global Step: 299550 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:57:40,273-Speed 2498.61 samples/sec Loss 2.6792 LearningRate 0.000504 Epoch: 14 Global Step: 299560 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:57:48,483-Speed 2494.74 samples/sec Loss 2.7577 LearningRate 0.000504 Epoch: 14 Global Step: 299570 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:57:56,687-Speed 2496.87 samples/sec Loss 2.7274 LearningRate 0.000504 Epoch: 14 Global Step: 299580 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:58:04,848-Speed 2509.86 samples/sec Loss 2.7994 LearningRate 0.000504 Epoch: 14 Global Step: 299590 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:58:13,051-Speed 2496.94 samples/sec Loss 2.7306 LearningRate 0.000504 Epoch: 14 Global Step: 299600 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:58:21,262-Speed 2494.61 samples/sec Loss 2.7984 LearningRate 0.000504 Epoch: 14 Global Step: 299610 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:58:29,464-Speed 2497.50 samples/sec Loss 2.8504 LearningRate 0.000504 Epoch: 14 Global Step: 299620 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 10:58:37,667-Speed 2497.14 samples/sec Loss 2.8133 LearningRate 0.000504 Epoch: 14 Global Step: 299630 
Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 10:58:45,869-Speed 2497.22 samples/sec Loss 2.7287 LearningRate 0.000504 Epoch: 14 Global Step: 299640 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 10:58:54,034-Speed 2508.72 samples/sec Loss 2.7689 LearningRate 0.000504 Epoch: 14 Global Step: 299650 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 10:59:02,236-Speed 2497.10 samples/sec Loss 2.7765 LearningRate 0.000504 Epoch: 14 Global Step: 299660 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 10:59:10,436-Speed 2498.19 samples/sec Loss 2.7164 LearningRate 0.000504 Epoch: 14 Global Step: 299670 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 10:59:18,633-Speed 2499.25 samples/sec Loss 2.7386 LearningRate 0.000504 Epoch: 14 Global Step: 299680 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 10:59:26,833-Speed 2497.87 samples/sec Loss 2.7313 LearningRate 0.000504 Epoch: 14 Global Step: 299690 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 10:59:35,036-Speed 2497.09 samples/sec Loss 2.7851 LearningRate 0.000504 Epoch: 14 Global Step: 299700 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 10:59:43,174-Speed 2517.50 samples/sec Loss 2.7563 LearningRate 0.000504 Epoch: 14 Global Step: 299710 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 10:59:51,380-Speed 2496.01 samples/sec Loss 2.8200 LearningRate 0.000504 Epoch: 14 Global Step: 299720 Fp16 Grad Scale: 65536 Required: 121 hours
Training: 2022-07-08 10:59:59,583-Speed 2497.17 samples/sec Loss 2.7956 LearningRate 0.000504 Epoch: 14 Global Step: 299730 Fp16 Grad Scale: 65536 Required: 121 hours
Training: 2022-07-08 11:00:07,738-Speed 2511.97 samples/sec Loss 2.7941 LearningRate 0.000504 Epoch: 14 Global Step: 299740 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:00:15,940-Speed 2497.32 samples/sec Loss 2.8420 LearningRate 0.000504 Epoch: 14 Global Step: 299750 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:00:24,141-Speed 2497.44 samples/sec Loss 2.7795 LearningRate 0.000504 Epoch: 14 Global Step: 299760 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:00:32,293-Speed 2512.79 samples/sec Loss 2.7851 LearningRate 0.000504 Epoch: 14 Global Step: 299770 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:00:40,497-Speed 2496.74 samples/sec Loss 2.7635 LearningRate 0.000504 Epoch: 14 Global Step: 299780 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:00:48,699-Speed 2497.36 samples/sec Loss 2.6997 LearningRate 0.000503 Epoch: 14 Global Step: 299790 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:00:56,904-Speed 2496.57 samples/sec Loss 2.7688 LearningRate 0.000503 Epoch: 14 Global Step: 299800 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:01:05,105-Speed 2497.58 samples/sec Loss 2.6966 LearningRate 0.000503 Epoch: 14 Global Step: 299810 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:01:13,308-Speed 2497.22 samples/sec Loss 2.7277 LearningRate 0.000503 Epoch: 14 Global Step: 299820 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:01:21,467-Speed 2510.82 samples/sec Loss 2.7428 LearningRate 0.000503 Epoch: 14 Global Step: 299830 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:01:29,667-Speed 2497.92 samples/sec Loss 2.7146 LearningRate 0.000503 Epoch: 14 Global Step: 299840 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:01:37,880-Speed 2493.89 samples/sec Loss 2.7641 LearningRate 0.000503 Epoch: 14 Global Step: 299850 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:01:46,084-Speed 2496.75 samples/sec Loss 2.7993 LearningRate 0.000503 Epoch: 14 Global Step: 299860 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:01:54,287-Speed 2496.97 samples/sec Loss 2.7242 LearningRate 0.000503 Epoch: 14 Global Step: 299870 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:02:02,452-Speed 2509.78 samples/sec Loss 2.7285 LearningRate 0.000503 Epoch: 14 Global Step: 299880 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:02:10,600-Speed 2513.79 samples/sec Loss 2.7248 LearningRate 0.000503 Epoch: 14 Global Step: 299890 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:02:18,801-Speed 2497.84 samples/sec Loss 2.7361 LearningRate 0.000503 Epoch: 14 Global Step: 299900 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:02:27,003-Speed 2497.35 samples/sec Loss 2.7034 LearningRate 0.000503 Epoch: 14 Global Step: 299910 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:02:35,202-Speed 2498.25 samples/sec Loss 2.7772 LearningRate 0.000503 Epoch: 14 Global Step: 299920 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:02:43,399-Speed 2499.06 samples/sec Loss 2.8524 LearningRate 0.000503 Epoch: 14 Global Step: 299930 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:02:51,605-Speed 2496.05 samples/sec Loss 2.8258 LearningRate 0.000503 Epoch: 14 Global Step: 299940 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:02:59,748-Speed 2515.53 samples/sec Loss 2.7968 LearningRate 0.000503 Epoch: 14 Global Step: 299950 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:03:07,952-Speed 2496.69 samples/sec Loss 2.7821 LearningRate 0.000503 Epoch: 14 Global Step: 299960 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:03:16,149-Speed 2499.02 samples/sec Loss 2.7941 LearningRate 0.000503 Epoch: 14 Global Step: 299970 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:03:24,347-Speed 2498.59 samples/sec Loss 2.8054 LearningRate 0.000503 Epoch: 14 Global Step: 299980 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:03:32,542-Speed 2499.34 samples/sec Loss 2.7884 LearningRate 0.000503 Epoch: 14 Global Step: 299990 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:03:40,740-Speed 2498.61 samples/sec Loss 2.7143 LearningRate 0.000503 Epoch: 14 Global Step: 300000 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:03:48,886-Speed 2514.50 samples/sec Loss 2.7981 LearningRate 0.000503 Epoch: 14 Global Step: 300010 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:03:57,083-Speed 2498.93 samples/sec Loss 2.8037 LearningRate 0.000503 Epoch: 14 Global Step: 300020 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:04:05,285-Speed 2497.20 samples/sec Loss 2.7335 LearningRate 0.000503 Epoch: 14 Global Step: 300030 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:04:13,486-Speed 2497.73 samples/sec Loss 2.8074 LearningRate 0.000503 Epoch: 14 Global Step: 300040 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:04:21,686-Speed 2497.90 samples/sec Loss 2.7837 LearningRate 0.000503 Epoch: 14 Global Step: 300050 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:04:29,889-Speed 2497.09 samples/sec Loss 2.8809 LearningRate 0.000503 Epoch: 14 Global Step: 300060 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:04:38,033-Speed 2515.12 samples/sec Loss 2.7451 LearningRate 0.000503 Epoch: 14 Global Step: 300070 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:04:46,231-Speed 2498.66 samples/sec Loss 2.8029 LearningRate 0.000503 Epoch: 14 Global Step: 300080 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:04:54,430-Speed 2498.44 samples/sec Loss 2.7799 LearningRate 0.000503 Epoch: 14 Global Step: 300090 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:05:02,633-Speed 2496.90 samples/sec Loss 2.8206 LearningRate 0.000503 Epoch: 14 Global Step: 300100 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:05:10,831-Speed 2499.05 samples/sec Loss 2.8346 LearningRate 0.000503 Epoch: 14 Global Step: 300110 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:05:19,032-Speed 2497.39 samples/sec Loss 2.8049 LearningRate 0.000503 Epoch: 14 Global Step: 300120 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:05:27,174-Speed 2515.84 samples/sec Loss 2.7995 LearningRate 0.000503 Epoch: 14 Global Step: 300130 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:05:35,377-Speed 2497.15 samples/sec Loss 2.8002 LearningRate 0.000503 Epoch: 14 Global Step: 300140 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:05:43,574-Speed 2498.87 samples/sec Loss 2.7513 LearningRate 0.000503 Epoch: 14 Global Step: 300150 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:05:51,773-Speed 2498.35 samples/sec Loss 2.7611 LearningRate 0.000503 Epoch: 14 Global Step: 300160 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:05:59,973-Speed 2497.90 samples/sec Loss 2.7435 LearningRate 0.000503 Epoch: 14 Global Step: 300170 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:06:08,184-Speed 2494.57 samples/sec Loss 2.8030 LearningRate 0.000503 Epoch: 14 Global Step: 300180 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:06:16,327-Speed 2515.28 samples/sec Loss 2.7395 LearningRate 0.000503 Epoch: 14 Global Step: 300190 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:06:24,526-Speed 2498.24 samples/sec Loss 2.8355 LearningRate 0.000503 Epoch: 14 Global Step: 300200 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:06:32,725-Speed 2498.50 samples/sec Loss 2.7750 LearningRate 0.000503 Epoch: 14 Global Step: 300210 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:06:40,922-Speed 2498.82 samples/sec Loss 2.8167 LearningRate 0.000503 Epoch: 14 Global Step: 300220 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:06:49,118-Speed 2499.18 samples/sec Loss 2.7724 LearningRate 0.000503 Epoch: 14 Global Step: 300230 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:06:57,335-Speed 2492.66 samples/sec Loss 2.8059 LearningRate 0.000503 Epoch: 14 Global Step: 300240 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:07:05,484-Speed 2513.74 samples/sec Loss 2.7580 LearningRate 0.000503 Epoch: 14 Global Step: 300250 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:07:13,683-Speed 2498.17 samples/sec Loss 2.7588 LearningRate 0.000503 Epoch: 14 Global Step: 300260 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:07:21,885-Speed 2498.39 samples/sec Loss 2.6859 LearningRate 0.000503 Epoch: 14 Global Step: 300270 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:07:30,091-Speed 2496.18 samples/sec Loss 2.7502 LearningRate 0.000503 Epoch: 14 Global Step: 300280 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:07:38,296-Speed 2496.43 samples/sec Loss 2.7322 LearningRate 0.000503 Epoch: 14 Global Step: 300290 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:07:46,507-Speed 2494.52 samples/sec Loss 2.7266 LearningRate 0.000503 Epoch: 14 Global Step: 300300 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:07:54,654-Speed 2514.21 samples/sec Loss 2.7020 LearningRate 0.000503 Epoch: 14 Global Step: 300310 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:08:02,853-Speed 2498.56 samples/sec Loss 2.7072 LearningRate 0.000502 Epoch: 14 Global Step: 300320 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:08:11,049-Speed 2499.17 samples/sec Loss 2.7155 LearningRate 0.000502 Epoch: 14 Global Step: 300330 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:08:19,249-Speed 2498.08 samples/sec Loss 2.6998 LearningRate 0.000502 Epoch: 14 Global Step: 300340 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:08:27,448-Speed 2498.17 samples/sec Loss 2.7371 LearningRate 0.000502 Epoch: 14 Global Step: 300350 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:08:35,647-Speed 2498.24 samples/sec Loss 2.6961 LearningRate 0.000502 Epoch: 14 Global Step: 300360 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:08:43,797-Speed 2513.47 samples/sec Loss 2.6941 LearningRate 0.000502 Epoch: 14 Global Step: 300370 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:08:51,995-Speed 2498.69 samples/sec Loss 2.6759 LearningRate 0.000502 Epoch: 14 Global Step: 300380 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:09:00,197-Speed 2497.30 samples/sec Loss 2.6857 LearningRate 0.000502 Epoch: 14 Global Step: 300390 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:09:08,399-Speed 2497.20 samples/sec Loss 2.7102 LearningRate 0.000502 Epoch: 14 Global Step: 300400 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:09:16,600-Speed 2497.85 samples/sec Loss 2.7504 LearningRate 0.000502 Epoch: 14 Global Step: 300410 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:09:24,799-Speed 2498.20 samples/sec Loss 2.7378 LearningRate 0.000502 Epoch: 14 Global Step: 300420 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:09:32,948-Speed 2513.74 samples/sec Loss 2.7020 LearningRate 0.000502 Epoch: 14 Global Step: 300430 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:09:41,153-Speed 2496.32 samples/sec Loss 2.7855 LearningRate 0.000502 Epoch: 14 Global Step: 300440 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:09:49,352-Speed 2498.05 samples/sec Loss 2.7474 LearningRate 0.000502 Epoch: 14 Global Step: 300450 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:09:57,550-Speed 2498.69 samples/sec Loss 2.7727 LearningRate 0.000502 Epoch: 14 Global Step: 300460 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:10:05,754-Speed 2496.86 samples/sec Loss 2.7280 LearningRate 0.000502 Epoch: 14 Global Step: 300470 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:10:13,952-Speed 2498.35 samples/sec Loss 2.7897 LearningRate 0.000502 Epoch: 14 Global Step: 300480 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:10:22,099-Speed 2514.24 samples/sec Loss 2.7623 LearningRate 0.000502 Epoch: 14 Global Step: 300490 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:10:30,300-Speed 2497.59 samples/sec Loss 2.7713 LearningRate 0.000502 Epoch: 14 Global Step: 300500 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:10:38,500-Speed 2498.08 samples/sec Loss 2.7178 LearningRate 0.000502 Epoch: 14 Global Step: 300510 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:10:46,701-Speed 2497.56 samples/sec Loss 2.7596 LearningRate 0.000502 Epoch: 14 Global Step: 300520 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:10:54,902-Speed 2497.64 samples/sec Loss 2.6991 LearningRate 0.000502 Epoch: 14 Global Step: 300530 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:11:03,114-Speed 2494.39 samples/sec Loss 2.7324 LearningRate 0.000502 Epoch: 14 Global Step: 300540 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:11:11,269-Speed 2512.00 samples/sec Loss 2.7005 LearningRate 0.000502 Epoch: 14 Global Step: 300550 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:11:19,466-Speed 2498.73 samples/sec Loss 2.7467 LearningRate 0.000502 Epoch: 14 Global Step: 300560 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:11:27,671-Speed 2496.60 samples/sec Loss 2.7167 LearningRate 0.000502 Epoch: 14 Global Step: 300570 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:11:35,897-Speed 2489.93 samples/sec Loss 2.7205 LearningRate 0.000502 Epoch: 14 Global Step: 300580 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:11:44,096-Speed 2498.31 samples/sec Loss 2.6706 LearningRate 0.000502 Epoch: 14 Global Step: 300590 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:11:52,300-Speed 2496.68 samples/sec Loss 2.7036 LearningRate 0.000502 Epoch: 14 Global Step: 300600 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:12:00,446-Speed 2514.34 samples/sec Loss 2.7580 LearningRate 0.000502 Epoch: 14 Global Step: 300610 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:12:08,646-Speed 2498.31 samples/sec Loss 2.7400 LearningRate 0.000502 Epoch: 14 Global Step: 300620 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:12:16,849-Speed 2496.97 samples/sec Loss 2.7326 LearningRate 0.000502 Epoch: 14 Global Step: 300630 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:12:25,051-Speed 2497.43 samples/sec Loss 2.7605 LearningRate 0.000502 Epoch: 14 Global Step: 300640 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:12:33,249-Speed 2498.55 samples/sec Loss 2.7064 LearningRate 0.000502 Epoch: 14 Global Step: 300650 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:12:41,456-Speed 2496.05 samples/sec Loss 2.7243 LearningRate 0.000502 Epoch: 14 Global Step: 300660 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:12:49,616-Speed 2510.26 samples/sec Loss 2.7852 LearningRate 0.000502 Epoch: 14 Global Step: 300670 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:12:57,833-Speed 2492.71 samples/sec Loss 2.7248 LearningRate 0.000502 Epoch: 14 Global Step: 300680 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:13:06,039-Speed 2495.82 samples/sec Loss 2.7964 LearningRate 0.000502 Epoch: 14 Global Step: 300690 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:13:14,243-Speed 2496.90 samples/sec Loss 2.8157 LearningRate 0.000502 Epoch: 14 Global Step: 300700 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:13:22,440-Speed 2499.00 samples/sec Loss 2.7866 LearningRate 0.000502 Epoch: 14 Global Step: 300710 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:13:30,639-Speed 2498.22 samples/sec Loss 2.6994 LearningRate 0.000502 Epoch: 14 Global Step: 300720 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:13:38,786-Speed 2514.48 samples/sec Loss 2.7696 LearningRate 0.000502 Epoch: 14 Global Step: 300730 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:13:46,989-Speed 2497.33 samples/sec Loss 2.7483 LearningRate 0.000502 Epoch: 14 Global Step: 300740 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:13:55,192-Speed 2497.06 samples/sec Loss 2.7297 LearningRate 0.000502 Epoch: 14 Global Step: 300750 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:14:03,393-Speed 2497.74 samples/sec Loss 2.7261 LearningRate 0.000502 Epoch: 14 Global Step: 300760 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:14:11,594-Speed 2497.73 samples/sec Loss 2.7705 LearningRate 0.000502 Epoch: 14 Global Step: 300770 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:14:19,799-Speed 2496.38 samples/sec Loss 2.7188 LearningRate 0.000502 Epoch: 14 Global Step: 300780 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:14:27,951-Speed 2512.65 samples/sec Loss 2.7767 LearningRate 0.000502 Epoch: 14 Global Step: 300790 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:14:36,151-Speed 2497.92 samples/sec Loss 2.7224 LearningRate 0.000502 Epoch: 14 Global Step: 300800 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:14:44,352-Speed 2498.03 samples/sec Loss 2.7697 LearningRate 0.000502 Epoch: 14 Global Step: 300810 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:14:52,549-Speed 2498.81 samples/sec Loss 2.8080 LearningRate 0.000502 Epoch: 14 Global Step: 300820 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:15:00,750-Speed 2497.62 samples/sec Loss 2.8115 LearningRate 0.000502 Epoch: 14 Global Step: 300830 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:15:08,951-Speed 2497.38 samples/sec Loss 2.7849 LearningRate 0.000502 Epoch: 14 Global Step: 300840 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:15:17,118-Speed 2508.02 samples/sec Loss 2.7750 LearningRate 0.000501 Epoch: 14 Global Step: 300850 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:15:25,320-Speed 2497.73 samples/sec Loss 2.7434 LearningRate 0.000501 Epoch: 14 Global Step: 300860 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:15:33,520-Speed 2497.81 samples/sec Loss 2.7295 LearningRate 0.000501 Epoch: 14 Global Step: 300870 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:15:41,722-Speed 2497.52 samples/sec Loss 2.7476 LearningRate 0.000501 Epoch: 14 Global Step: 300880 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:15:49,923-Speed 2497.65 samples/sec Loss 2.7713 LearningRate 0.000501 Epoch: 14 Global Step: 300890 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:15:58,124-Speed 2497.75 samples/sec Loss 2.7600 LearningRate 0.000501 Epoch: 14 Global Step: 300900 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:16:06,273-Speed 2513.47 samples/sec Loss 2.7831 LearningRate 0.000501 Epoch: 14 Global Step: 300910 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:16:14,477-Speed 2496.77 samples/sec Loss 2.7965 LearningRate 0.000501 Epoch: 14 Global Step: 300920 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:16:22,681-Speed 2496.71 samples/sec Loss 2.7211 LearningRate 0.000501 Epoch: 14 Global Step: 300930 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:16:30,881-Speed 2497.87 samples/sec Loss 2.7127 LearningRate 0.000501 Epoch: 14 Global Step: 300940 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:16:39,084-Speed 2497.27 samples/sec Loss 2.7631 LearningRate 0.000501 Epoch: 14 Global Step: 300950 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:16:47,289-Speed 2496.31 samples/sec Loss 2.7452 LearningRate 0.000501 Epoch: 14 Global Step: 300960 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:16:55,452-Speed 2509.53 samples/sec Loss 2.7885 LearningRate 0.000501 Epoch: 14 Global Step: 300970 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:17:03,653-Speed 2497.69 samples/sec Loss 2.7319 LearningRate 0.000501 Epoch: 14 Global Step: 300980 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:17:11,856-Speed 2496.95 samples/sec Loss 2.7687 LearningRate 0.000501 Epoch: 14 Global Step: 300990 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:17:23,921-Speed 1859.46 samples/sec Loss 2.7761 LearningRate 0.000501 Epoch: 14 Global Step: 301000 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:17:32,189-Speed 2500.59 samples/sec Loss 2.8125 LearningRate 0.000501 Epoch: 14 Global Step: 301010 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:17:40,430-Speed 2499.95 samples/sec Loss 2.7957 LearningRate 0.000501 Epoch: 14 Global Step: 301020 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:17:50,115-Speed 2114.91 samples/sec Loss 2.7565 LearningRate 0.000501 Epoch: 14 Global Step: 301030 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:17:58,340-Speed 2501.74 samples/sec Loss 2.7550 LearningRate 0.000501 Epoch: 14 Global Step: 301040 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:18:06,556-Speed 2500.61 samples/sec Loss 2.7483 LearningRate 0.000501 Epoch: 14 Global Step: 301050 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:18:14,776-Speed 2491.93 samples/sec Loss 2.8307 LearningRate 0.000501 Epoch: 14 Global Step: 301060 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:18:23,020-Speed 2500.83 samples/sec Loss 2.8213 LearningRate 0.000501 Epoch: 14 Global Step: 301070 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:18:31,258-Speed 2499.71 samples/sec Loss 2.7840 LearningRate 0.000501 Epoch: 14 Global Step: 301080 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:18:43,793-Speed 1637.54 samples/sec Loss 2.7952 LearningRate 0.000501 Epoch: 14 Global Step: 301090 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:18:52,030-Speed 2501.17 samples/sec Loss 2.7604 LearningRate 0.000501 Epoch: 14 Global Step: 301100 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:19:00,263-Speed 2500.81 samples/sec Loss 2.8162 LearningRate 0.000501 Epoch: 14 Global Step: 301110 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:19:08,501-Speed 2498.44 samples/sec Loss 2.7781 LearningRate 0.000501 Epoch: 14 Global Step: 301120 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:19:20,658-Speed 1684.78 samples/sec Loss 2.7786 LearningRate 0.000501 Epoch: 14 Global Step: 301130 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:19:28,982-Speed 2502.49 samples/sec Loss 2.7285 LearningRate 0.000501 Epoch: 14 Global Step: 301140 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:19:37,163-Speed 2514.30 samples/sec Loss 2.7393 LearningRate 0.000501 Epoch: 14 Global Step: 301150 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:19:48,248-Speed 1847.64 samples/sec Loss 2.7772 LearningRate 0.000501 Epoch: 14 Global Step: 301160 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:19:56,442-Speed 2499.68 samples/sec Loss 2.7743 LearningRate 0.000501 Epoch: 14 Global Step: 301170 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:20:08,095-Speed 1770.20 samples/sec Loss 2.7857 LearningRate 0.000501 Epoch: 14 Global Step: 301180 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:20:16,304-Speed 2502.94 samples/sec Loss 2.7857 LearningRate 0.000501 Epoch: 14 Global Step: 301190 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:20:24,502-Speed 2498.57 samples/sec Loss 2.8243 LearningRate 0.000501 Epoch: 14 Global Step: 301200 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:20:36,231-Speed 1795.55 samples/sec Loss 2.8676 LearningRate 0.000501 Epoch: 14 Global Step: 301210 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:20:44,455-Speed 2500.05 samples/sec Loss 2.8020 LearningRate 0.000501 Epoch: 14 Global Step: 301220 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:20:52,660-Speed 2496.19 samples/sec Loss 2.8087 LearningRate 0.000501 Epoch: 14 Global Step: 301230 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:21:03,934-Speed 2420.35 samples/sec Loss 2.8724 LearningRate 0.000501 Epoch: 14 Global Step: 301240 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:21:13,363-Speed 2492.47 samples/sec Loss 2.8343 LearningRate 0.000501 Epoch: 14 Global Step: 301250 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:21:26,367-Speed 2497.84 samples/sec Loss 2.7533 LearningRate 0.000501 Epoch: 14 Global Step: 301260 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:21:34,999-Speed 2516.85 samples/sec Loss 2.7554 LearningRate 0.000501 Epoch: 14 Global Step: 301270 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:21:44,824-Speed 2316.43 samples/sec Loss 2.7791 LearningRate 0.000501 Epoch: 14 Global Step: 301280 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:21:53,277-Speed 2495.59 samples/sec Loss 2.7934 LearningRate 0.000501 Epoch: 14 Global Step: 301290 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:22:01,477-Speed 2497.97 samples/sec Loss 2.7576 LearningRate 0.000501 Epoch: 14 Global Step: 301300 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:22:09,685-Speed 2495.16 samples/sec Loss 2.8191 LearningRate 0.000501 Epoch: 14 Global Step: 301310 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:22:22,120-Speed 1723.69 samples/sec Loss 2.8286 LearningRate 0.000501 Epoch: 14 Global Step: 301320 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:22:30,486-Speed 2517.65 samples/sec Loss 2.7682 LearningRate 0.000501 Epoch: 14 Global Step: 301330 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:22:38,689-Speed 2497.17 samples/sec Loss 2.7899 LearningRate 0.000501 Epoch: 14 Global Step: 301340 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:22:52,656-Speed 2498.72 samples/sec Loss 2.8577 LearningRate 0.000501 Epoch: 14 Global Step: 301350 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:23:00,913-Speed 2500.15 samples/sec Loss 2.8655 LearningRate 0.000501 Epoch: 14 Global Step: 301360 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:23:13,634-Speed 1610.07 samples/sec Loss 2.7824 LearningRate 0.000500 Epoch: 14 Global Step: 301370 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:23:21,894-Speed 2496.12 samples/sec Loss 2.8418 LearningRate 0.000500 Epoch: 14 Global Step: 301380 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:23:30,052-Speed 2515.44 samples/sec Loss 2.8668 LearningRate 0.000500 Epoch: 14 Global Step: 301390 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:23:43,449-Speed 1532.48 samples/sec Loss 2.8143 LearningRate 0.000500 Epoch: 14 Global Step: 301400 Fp16 Grad Scale: 32768 Required: 121 hours
Training: 2022-07-08 11:23:52,201-Speed 2510.99 samples/sec Loss 2.8467 LearningRate 0.000500 Epoch: 14 Global Step: 301410 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:24:00,472-Speed 2500.14 samples/sec Loss 2.8769 LearningRate 0.000500 Epoch: 14 Global Step: 301420 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:24:11,193-Speed 2108.60 samples/sec Loss 2.8241 LearningRate 0.000500 Epoch: 14 Global Step: 301430 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:24:19,396-Speed 2496.86 samples/sec Loss 2.7620 LearningRate 0.000500 Epoch: 14 Global Step: 301440 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:24:27,721-Speed 2513.14 samples/sec Loss 2.8348 LearningRate 0.000500 Epoch: 14 Global Step: 301450 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:24:36,002-Speed 2498.34 samples/sec Loss 2.7911 LearningRate 0.000500 Epoch: 14 Global Step: 301460 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:25:32,279-Speed 363.93 samples/sec Loss 2.7852 LearningRate 0.000500 Epoch: 14 Global Step: 301470 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:25:40,512-Speed 2502.83 samples/sec Loss 2.7840 LearningRate 0.000500 Epoch: 14 Global Step: 301480 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:25:49,024-Speed 2420.00 samples/sec Loss 2.8115 LearningRate 0.000500 Epoch: 14 Global Step: 301490 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:25:57,510-Speed 2413.57 samples/sec Loss 2.8345 LearningRate 0.000500 Epoch: 14 Global Step: 301500 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:26:05,659-Speed 2513.82 samples/sec Loss 2.7995 LearningRate 0.000500 Epoch: 14 Global Step: 301510 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:26:13,878-Speed 2492.13 samples/sec Loss 2.7665 LearningRate 0.000500 Epoch: 14 Global Step: 301520 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:26:22,100-Speed 2491.54 samples/sec Loss 2.7872 LearningRate 0.000500 Epoch: 14 Global Step: 301530 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:26:30,329-Speed 2489.28 samples/sec Loss 2.7706 LearningRate 0.000500 Epoch: 14 Global Step: 301540 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:26:38,541-Speed 2494.40 samples/sec Loss 2.7702 LearningRate 0.000500 Epoch: 14 Global Step: 301550 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:26:46,751-Speed 2494.84 samples/sec Loss 2.7761 LearningRate 0.000500 Epoch: 14 Global Step: 301560 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:26:54,911-Speed 2510.28 samples/sec Loss 2.7477 LearningRate 0.000500 Epoch: 14 Global Step: 301570 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:27:03,120-Speed 2495.18 samples/sec Loss 2.7617 LearningRate 0.000500 Epoch: 14 Global Step: 301580 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:27:11,323-Speed 2496.93 samples/sec Loss 2.8017 LearningRate 0.000500 Epoch: 14 Global Step: 301590 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:27:19,527-Speed 2496.96 samples/sec Loss 2.7702 LearningRate 0.000500 Epoch: 14 Global Step: 301600 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:27:27,726-Speed 2498.34 samples/sec Loss 2.7533 LearningRate 0.000500 Epoch: 14 Global Step: 301610 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:27:35,939-Speed 2493.88 samples/sec Loss 2.7295 LearningRate 0.000500 Epoch: 14 Global Step: 301620 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:27:44,088-Speed 2513.58 samples/sec Loss 2.7477 LearningRate 0.000500 Epoch: 14 Global Step: 301630 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:27:52,294-Speed 2496.06 samples/sec Loss 2.7552 LearningRate 0.000500 Epoch: 14 Global Step: 301640 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:28:00,500-Speed 2496.16 samples/sec Loss 2.8584 LearningRate 0.000500 Epoch: 14 Global Step: 301650 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:28:08,721-Speed 2491.60 samples/sec Loss 2.8031 LearningRate 0.000500 Epoch: 14 Global Step: 301660 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:28:16,929-Speed 2495.60 samples/sec Loss 2.7985 LearningRate 0.000500 Epoch: 14 Global Step: 301670 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:28:25,134-Speed 2496.63 samples/sec Loss 2.7608 LearningRate 0.000500 Epoch: 14 Global Step: 301680 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:28:33,288-Speed 2512.08 samples/sec Loss 2.7991 LearningRate 0.000500 Epoch: 14 Global Step: 301690 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:28:41,495-Speed 2495.92 samples/sec Loss 2.7921 LearningRate 0.000500 Epoch: 14 Global Step: 301700 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:28:49,702-Speed 2495.93 samples/sec Loss 2.7613 LearningRate 0.000500 Epoch: 14 Global Step: 301710 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:28:57,913-Speed 2494.47 samples/sec Loss 2.7923 LearningRate 0.000500 Epoch: 14 Global Step: 301720 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:29:06,119-Speed 2496.31 samples/sec Loss 2.7824 LearningRate 0.000500 Epoch: 14 Global Step: 301730 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:29:14,327-Speed 2495.37 samples/sec Loss 2.7576 LearningRate 0.000500 Epoch: 14 Global Step: 301740 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:29:22,486-Speed 2510.70 samples/sec Loss 2.8352 LearningRate 0.000500 Epoch: 14 Global Step: 301750 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:29:30,694-Speed 2495.37 samples/sec Loss 2.7223 LearningRate 0.000500 Epoch: 14 Global Step: 301760 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:29:38,901-Speed 2496.16 samples/sec Loss 2.7744 LearningRate 0.000500 Epoch: 14 Global Step: 301770 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:29:47,116-Speed 2493.58 samples/sec Loss 2.7907 LearningRate 0.000500 Epoch: 14 Global Step: 301780 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:29:55,325-Speed 2495.25 samples/sec Loss 2.7854 LearningRate 0.000500 Epoch: 14 Global Step: 301790 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:30:03,534-Speed 2494.99 samples/sec Loss 2.7645 LearningRate 0.000500 Epoch: 14 Global Step: 301800 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:30:11,688-Speed 2512.30 samples/sec Loss 2.7725 LearningRate 0.000500 Epoch: 14 Global Step: 301810 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:30:19,908-Speed 2491.75 samples/sec Loss 2.8336 LearningRate 0.000500 Epoch: 14 Global Step: 301820 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:30:28,111-Speed 2497.19 samples/sec Loss 2.7387 LearningRate 0.000500 Epoch: 14 Global Step: 301830 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:30:36,322-Speed 2494.58 samples/sec Loss 2.7895 LearningRate 0.000500 Epoch: 14 Global Step: 301840 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:30:44,525-Speed 2497.28 samples/sec Loss 2.7635 LearningRate 0.000500 Epoch: 14 Global Step: 301850 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:30:52,728-Speed 2496.89 samples/sec Loss 2.7656 LearningRate 0.000500 Epoch: 14 Global Step: 301860 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:31:00,880-Speed 2512.86 samples/sec Loss 2.7884 LearningRate 0.000500 Epoch: 14 Global Step: 301870 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:31:09,111-Speed 2488.57 samples/sec Loss 2.7373 LearningRate 0.000500 Epoch: 14 Global Step: 301880 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:31:17,317-Speed 2496.21 samples/sec Loss 2.7541 LearningRate 0.000500 Epoch: 14 Global Step: 301890 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:31:25,527-Speed 2494.75 samples/sec Loss 2.7530 LearningRate 0.000499 Epoch: 14 Global Step: 301900 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:31:33,732-Speed 2497.07 samples/sec Loss 2.7397 LearningRate 0.000499 Epoch: 14 Global Step: 301910 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:31:41,936-Speed 2496.52 samples/sec Loss 2.7372 LearningRate 0.000499 Epoch: 14 Global Step: 301920 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:31:50,098-Speed 2509.71 samples/sec Loss 2.7494 LearningRate 0.000499 Epoch: 14 Global Step: 301930 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:31:58,301-Speed 2497.02 samples/sec Loss 2.7382 LearningRate 0.000499 Epoch: 14 Global Step: 301940 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:32:06,515-Speed 2493.92 samples/sec Loss 2.7113 LearningRate 0.000499 Epoch: 14 Global Step: 301950 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:32:14,718-Speed 2496.79 samples/sec Loss 2.8260 LearningRate 0.000499 Epoch: 14 Global Step: 301960 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:32:22,940-Speed 2491.30 samples/sec Loss 2.7462 LearningRate 0.000499 Epoch: 14 Global Step: 301970 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:32:31,146-Speed 2496.01 samples/sec Loss 2.7709 LearningRate 0.000499 Epoch: 14 Global Step: 301980 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:32:39,298-Speed 2513.04 samples/sec Loss 2.7400 LearningRate 0.000499 Epoch: 14 Global Step: 301990 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:32:47,516-Speed 2492.32 samples/sec Loss 2.7209 LearningRate 0.000499 Epoch: 14 Global Step: 302000 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:32:55,740-Speed 2490.67 samples/sec Loss 2.7724 LearningRate 0.000499 Epoch: 14 Global Step: 302010 Fp16 Grad Scale: 16384 Required: 121 hours
Training: 2022-07-08 11:33:03,942-Speed 2497.61 samples/sec Loss 2.7917 LearningRate 0.000499 Epoch: 14 Global Step: 302020
Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:33:12,147-Speed 2496.55 samples/sec Loss 2.7807 LearningRate 0.000499 Epoch: 14 Global Step: 302030 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:33:20,350-Speed 2497.05 samples/sec Loss 2.7437 LearningRate 0.000499 Epoch: 14 Global Step: 302040 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:33:28,501-Speed 2512.83 samples/sec Loss 2.7988 LearningRate 0.000499 Epoch: 14 Global Step: 302050 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:33:36,705-Speed 2496.83 samples/sec Loss 2.7280 LearningRate 0.000499 Epoch: 14 Global Step: 302060 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:33:44,914-Speed 2495.34 samples/sec Loss 2.7396 LearningRate 0.000499 Epoch: 14 Global Step: 302070 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:33:53,124-Speed 2494.69 samples/sec Loss 2.8164 LearningRate 0.000499 Epoch: 14 Global Step: 302080 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:34:01,329-Speed 2496.45 samples/sec Loss 2.8352 LearningRate 0.000499 Epoch: 14 Global Step: 302090 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:34:09,535-Speed 2496.56 samples/sec Loss 2.8357 LearningRate 0.000499 Epoch: 14 Global Step: 302100 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:34:17,683-Speed 2513.74 samples/sec Loss 2.7357 LearningRate 0.000499 Epoch: 14 Global Step: 302110 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:34:25,887-Speed 2497.08 samples/sec Loss 2.7734 LearningRate 0.000499 Epoch: 14 Global Step: 302120 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:34:34,095-Speed 2495.49 samples/sec Loss 2.7863 LearningRate 0.000499 Epoch: 14 Global Step: 302130 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:34:42,302-Speed 2495.92 samples/sec Loss 2.7465 LearningRate 0.000499 Epoch: 14 Global Step: 
302140 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:34:50,506-Speed 2496.76 samples/sec Loss 2.7334 LearningRate 0.000499 Epoch: 14 Global Step: 302150 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:34:58,710-Speed 2496.84 samples/sec Loss 2.7847 LearningRate 0.000499 Epoch: 14 Global Step: 302160 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:35:06,862-Speed 2512.70 samples/sec Loss 2.8043 LearningRate 0.000499 Epoch: 14 Global Step: 302170 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:35:15,066-Speed 2496.70 samples/sec Loss 2.8027 LearningRate 0.000499 Epoch: 14 Global Step: 302180 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:35:23,272-Speed 2496.30 samples/sec Loss 2.8386 LearningRate 0.000499 Epoch: 14 Global Step: 302190 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:35:31,476-Speed 2496.67 samples/sec Loss 2.8735 LearningRate 0.000499 Epoch: 14 Global Step: 302200 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:35:39,678-Speed 2497.78 samples/sec Loss 2.7950 LearningRate 0.000499 Epoch: 14 Global Step: 302210 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:35:47,879-Speed 2498.22 samples/sec Loss 2.7884 LearningRate 0.000499 Epoch: 14 Global Step: 302220 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:35:56,031-Speed 2512.73 samples/sec Loss 2.8036 LearningRate 0.000499 Epoch: 14 Global Step: 302230 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:36:04,231-Speed 2498.02 samples/sec Loss 2.8230 LearningRate 0.000499 Epoch: 14 Global Step: 302240 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:36:12,439-Speed 2495.67 samples/sec Loss 2.7400 LearningRate 0.000499 Epoch: 14 Global Step: 302250 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:36:20,639-Speed 2497.96 samples/sec Loss 2.7027 LearningRate 0.000499 Epoch: 14 Global 
Step: 302260 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:36:28,857-Speed 2492.51 samples/sec Loss 2.7690 LearningRate 0.000499 Epoch: 14 Global Step: 302270 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:36:37,062-Speed 2496.41 samples/sec Loss 2.7241 LearningRate 0.000499 Epoch: 14 Global Step: 302280 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:36:45,213-Speed 2513.06 samples/sec Loss 2.7744 LearningRate 0.000499 Epoch: 14 Global Step: 302290 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:36:53,418-Speed 2496.32 samples/sec Loss 2.7984 LearningRate 0.000499 Epoch: 14 Global Step: 302300 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:37:01,626-Speed 2495.41 samples/sec Loss 2.7647 LearningRate 0.000499 Epoch: 14 Global Step: 302310 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:37:09,831-Speed 2496.71 samples/sec Loss 2.7126 LearningRate 0.000499 Epoch: 14 Global Step: 302320 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:37:18,033-Speed 2497.09 samples/sec Loss 2.7878 LearningRate 0.000499 Epoch: 14 Global Step: 302330 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:37:26,238-Speed 2496.47 samples/sec Loss 2.7213 LearningRate 0.000499 Epoch: 14 Global Step: 302340 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:37:34,388-Speed 2513.23 samples/sec Loss 2.7204 LearningRate 0.000499 Epoch: 14 Global Step: 302350 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:37:42,597-Speed 2495.16 samples/sec Loss 2.7851 LearningRate 0.000499 Epoch: 14 Global Step: 302360 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:37:50,800-Speed 2497.34 samples/sec Loss 2.7790 LearningRate 0.000499 Epoch: 14 Global Step: 302370 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:37:59,005-Speed 2496.24 samples/sec Loss 2.8274 LearningRate 0.000499 Epoch: 14 
Global Step: 302380 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:38:07,211-Speed 2496.23 samples/sec Loss 2.7667 LearningRate 0.000499 Epoch: 14 Global Step: 302390 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:38:15,434-Speed 2491.09 samples/sec Loss 2.7731 LearningRate 0.000499 Epoch: 14 Global Step: 302400 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:38:23,596-Speed 2509.62 samples/sec Loss 2.7522 LearningRate 0.000499 Epoch: 14 Global Step: 302410 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:38:31,810-Speed 2493.84 samples/sec Loss 2.7639 LearningRate 0.000499 Epoch: 14 Global Step: 302420 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:38:40,015-Speed 2496.51 samples/sec Loss 2.7616 LearningRate 0.000498 Epoch: 14 Global Step: 302430 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:38:48,221-Speed 2495.98 samples/sec Loss 2.7254 LearningRate 0.000498 Epoch: 14 Global Step: 302440 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:38:56,424-Speed 2497.26 samples/sec Loss 2.7668 LearningRate 0.000498 Epoch: 14 Global Step: 302450 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:39:04,633-Speed 2495.13 samples/sec Loss 2.6917 LearningRate 0.000498 Epoch: 14 Global Step: 302460 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:39:12,786-Speed 2512.20 samples/sec Loss 2.7648 LearningRate 0.000498 Epoch: 14 Global Step: 302470 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:39:20,988-Speed 2497.57 samples/sec Loss 2.7282 LearningRate 0.000498 Epoch: 14 Global Step: 302480 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:39:29,191-Speed 2497.18 samples/sec Loss 2.7323 LearningRate 0.000498 Epoch: 14 Global Step: 302490 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:39:37,401-Speed 2494.86 samples/sec Loss 2.7514 LearningRate 0.000498 
Epoch: 14 Global Step: 302500 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:39:45,603-Speed 2497.27 samples/sec Loss 2.7394 LearningRate 0.000498 Epoch: 14 Global Step: 302510 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:39:53,807-Speed 2496.89 samples/sec Loss 2.7320 LearningRate 0.000498 Epoch: 14 Global Step: 302520 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:40:01,957-Speed 2515.06 samples/sec Loss 2.7697 LearningRate 0.000498 Epoch: 14 Global Step: 302530 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:40:10,162-Speed 2496.37 samples/sec Loss 2.7123 LearningRate 0.000498 Epoch: 14 Global Step: 302540 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:40:18,365-Speed 2497.14 samples/sec Loss 2.6876 LearningRate 0.000498 Epoch: 14 Global Step: 302550 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:40:26,568-Speed 2497.29 samples/sec Loss 2.7111 LearningRate 0.000498 Epoch: 14 Global Step: 302560 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:40:34,778-Speed 2494.75 samples/sec Loss 2.6984 LearningRate 0.000498 Epoch: 14 Global Step: 302570 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:40:42,984-Speed 2496.10 samples/sec Loss 2.8065 LearningRate 0.000498 Epoch: 14 Global Step: 302580 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:40:51,144-Speed 2511.81 samples/sec Loss 2.7871 LearningRate 0.000498 Epoch: 14 Global Step: 302590 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:40:59,375-Speed 2488.60 samples/sec Loss 2.7267 LearningRate 0.000498 Epoch: 14 Global Step: 302600 Fp16 Grad Scale: 16384 Required: 121 hours Training: 2022-07-08 11:41:07,582-Speed 2495.67 samples/sec Loss 2.7895 LearningRate 0.000498 Epoch: 14 Global Step: 302610 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:41:15,786-Speed 2496.60 samples/sec Loss 2.7849 LearningRate 
0.000498 Epoch: 14 Global Step: 302620 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:41:24,001-Speed 2493.59 samples/sec Loss 2.7722 LearningRate 0.000498 Epoch: 14 Global Step: 302630 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:41:32,202-Speed 2497.74 samples/sec Loss 2.7000 LearningRate 0.000498 Epoch: 14 Global Step: 302640 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:41:40,354-Speed 2512.49 samples/sec Loss 2.7448 LearningRate 0.000498 Epoch: 14 Global Step: 302650 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:41:48,558-Speed 2496.77 samples/sec Loss 2.6746 LearningRate 0.000498 Epoch: 14 Global Step: 302660 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:41:56,760-Speed 2497.49 samples/sec Loss 2.7582 LearningRate 0.000498 Epoch: 14 Global Step: 302670 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:42:04,968-Speed 2495.72 samples/sec Loss 2.7363 LearningRate 0.000498 Epoch: 14 Global Step: 302680 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:42:13,171-Speed 2497.03 samples/sec Loss 2.7831 LearningRate 0.000498 Epoch: 14 Global Step: 302690 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:42:21,376-Speed 2496.56 samples/sec Loss 2.7832 LearningRate 0.000498 Epoch: 14 Global Step: 302700 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:42:29,534-Speed 2511.33 samples/sec Loss 2.7798 LearningRate 0.000498 Epoch: 14 Global Step: 302710 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:42:37,740-Speed 2495.93 samples/sec Loss 2.7508 LearningRate 0.000498 Epoch: 14 Global Step: 302720 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:42:45,946-Speed 2496.04 samples/sec Loss 2.7795 LearningRate 0.000498 Epoch: 14 Global Step: 302730 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:42:54,162-Speed 2493.26 samples/sec Loss 2.7055 
LearningRate 0.000498 Epoch: 14 Global Step: 302740 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:43:02,367-Speed 2496.49 samples/sec Loss 2.7486 LearningRate 0.000498 Epoch: 14 Global Step: 302750 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:43:10,573-Speed 2495.99 samples/sec Loss 2.7387 LearningRate 0.000498 Epoch: 14 Global Step: 302760 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:43:18,726-Speed 2512.22 samples/sec Loss 2.7918 LearningRate 0.000498 Epoch: 14 Global Step: 302770 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:43:26,958-Speed 2488.25 samples/sec Loss 2.8030 LearningRate 0.000498 Epoch: 14 Global Step: 302780 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:43:35,166-Speed 2495.51 samples/sec Loss 2.7486 LearningRate 0.000498 Epoch: 14 Global Step: 302790 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:43:43,367-Speed 2497.58 samples/sec Loss 2.7803 LearningRate 0.000498 Epoch: 14 Global Step: 302800 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:43:51,572-Speed 2496.35 samples/sec Loss 2.7119 LearningRate 0.000498 Epoch: 14 Global Step: 302810 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:43:59,785-Speed 2494.04 samples/sec Loss 2.7007 LearningRate 0.000498 Epoch: 14 Global Step: 302820 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:44:07,941-Speed 2511.32 samples/sec Loss 2.7178 LearningRate 0.000498 Epoch: 14 Global Step: 302830 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:44:16,234-Speed 2469.99 samples/sec Loss 2.7628 LearningRate 0.000498 Epoch: 14 Global Step: 302840 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:44:24,435-Speed 2497.59 samples/sec Loss 2.7757 LearningRate 0.000498 Epoch: 14 Global Step: 302850 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:44:32,639-Speed 2496.66 samples/sec Loss 
2.7004 LearningRate 0.000498 Epoch: 14 Global Step: 302860 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:44:40,846-Speed 2496.09 samples/sec Loss 2.7118 LearningRate 0.000498 Epoch: 14 Global Step: 302870 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:44:49,051-Speed 2496.53 samples/sec Loss 2.7496 LearningRate 0.000498 Epoch: 14 Global Step: 302880 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:44:57,209-Speed 2510.94 samples/sec Loss 2.7922 LearningRate 0.000498 Epoch: 14 Global Step: 302890 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:45:05,410-Speed 2497.36 samples/sec Loss 2.7972 LearningRate 0.000498 Epoch: 14 Global Step: 302900 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:45:13,630-Speed 2492.05 samples/sec Loss 2.7240 LearningRate 0.000498 Epoch: 14 Global Step: 302910 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:45:21,845-Speed 2493.49 samples/sec Loss 2.7929 LearningRate 0.000498 Epoch: 14 Global Step: 302920 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:45:30,045-Speed 2497.78 samples/sec Loss 2.7903 LearningRate 0.000498 Epoch: 14 Global Step: 302930 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:45:38,248-Speed 2497.29 samples/sec Loss 2.8240 LearningRate 0.000498 Epoch: 14 Global Step: 302940 Fp16 Grad Scale: 32768 Required: 121 hours Training: 2022-07-08 11:45:46,406-Speed 2511.02 samples/sec Loss 2.7900 LearningRate 0.000498 Epoch: 14 Global Step: 302950 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:45:54,607-Speed 2497.60 samples/sec Loss 2.7945 LearningRate 0.000497 Epoch: 14 Global Step: 302960 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:46:02,808-Speed 2497.39 samples/sec Loss 2.7393 LearningRate 0.000497 Epoch: 14 Global Step: 302970 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:46:11,013-Speed 2496.52 samples/sec 
Loss 2.6919 LearningRate 0.000497 Epoch: 14 Global Step: 302980 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:46:19,215-Speed 2497.48 samples/sec Loss 2.7875 LearningRate 0.000497 Epoch: 14 Global Step: 302990 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:46:27,418-Speed 2496.79 samples/sec Loss 2.7877 LearningRate 0.000497 Epoch: 14 Global Step: 303000 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:46:35,567-Speed 2513.72 samples/sec Loss 2.7774 LearningRate 0.000497 Epoch: 14 Global Step: 303010 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:46:43,775-Speed 2495.68 samples/sec Loss 2.7003 LearningRate 0.000497 Epoch: 14 Global Step: 303020 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:46:51,975-Speed 2497.79 samples/sec Loss 2.7292 LearningRate 0.000497 Epoch: 14 Global Step: 303030 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:47:00,193-Speed 2492.49 samples/sec Loss 2.7452 LearningRate 0.000497 Epoch: 14 Global Step: 303040 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:47:08,397-Speed 2496.79 samples/sec Loss 2.7588 LearningRate 0.000497 Epoch: 14 Global Step: 303050 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:47:16,598-Speed 2497.59 samples/sec Loss 2.7474 LearningRate 0.000497 Epoch: 14 Global Step: 303060 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:47:24,758-Speed 2510.00 samples/sec Loss 2.7240 LearningRate 0.000497 Epoch: 14 Global Step: 303070 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:47:32,960-Speed 2497.48 samples/sec Loss 2.7211 LearningRate 0.000497 Epoch: 14 Global Step: 303080 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:47:41,159-Speed 2498.14 samples/sec Loss 2.7182 LearningRate 0.000497 Epoch: 14 Global Step: 303090 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:47:49,360-Speed 2497.85 
samples/sec Loss 2.7684 LearningRate 0.000497 Epoch: 14 Global Step: 303100 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:47:57,561-Speed 2497.79 samples/sec Loss 2.7816 LearningRate 0.000497 Epoch: 14 Global Step: 303110 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:48:05,766-Speed 2496.30 samples/sec Loss 2.7760 LearningRate 0.000497 Epoch: 14 Global Step: 303120 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:48:13,912-Speed 2514.66 samples/sec Loss 2.7743 LearningRate 0.000497 Epoch: 14 Global Step: 303130 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:48:22,115-Speed 2497.08 samples/sec Loss 2.7585 LearningRate 0.000497 Epoch: 14 Global Step: 303140 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:48:30,320-Speed 2496.22 samples/sec Loss 2.7388 LearningRate 0.000497 Epoch: 14 Global Step: 303150 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 11:48:38,481-Speed 2509.78 samples/sec Loss 2.7591 LearningRate 0.000497 Epoch: 14 Global Step: 303160 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:48:46,698-Speed 2493.12 samples/sec Loss 2.8006 LearningRate 0.000497 Epoch: 14 Global Step: 303170 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:48:54,913-Speed 2493.48 samples/sec Loss 2.7767 LearningRate 0.000497 Epoch: 14 Global Step: 303180 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:49:03,071-Speed 2510.85 samples/sec Loss 2.7240 LearningRate 0.000497 Epoch: 14 Global Step: 303190 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:49:11,274-Speed 2496.98 samples/sec Loss 2.8594 LearningRate 0.000497 Epoch: 14 Global Step: 303200 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:49:19,474-Speed 2498.14 samples/sec Loss 2.7815 LearningRate 0.000497 Epoch: 14 Global Step: 303210 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:49:27,677-Speed 
2497.29 samples/sec Loss 2.7375 LearningRate 0.000497 Epoch: 14 Global Step: 303220 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:49:35,877-Speed 2497.65 samples/sec Loss 2.8297 LearningRate 0.000497 Epoch: 14 Global Step: 303230 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:49:44,080-Speed 2497.28 samples/sec Loss 2.7292 LearningRate 0.000497 Epoch: 14 Global Step: 303240 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:49:52,227-Speed 2514.09 samples/sec Loss 2.8332 LearningRate 0.000497 Epoch: 14 Global Step: 303250 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:50:00,434-Speed 2495.99 samples/sec Loss 2.7783 LearningRate 0.000497 Epoch: 14 Global Step: 303260 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:50:08,640-Speed 2496.35 samples/sec Loss 2.7882 LearningRate 0.000497 Epoch: 14 Global Step: 303270 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:50:16,855-Speed 2493.33 samples/sec Loss 2.8039 LearningRate 0.000497 Epoch: 14 Global Step: 303280 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:50:25,079-Speed 2490.78 samples/sec Loss 2.7262 LearningRate 0.000497 Epoch: 14 Global Step: 303290 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:50:33,293-Speed 2493.56 samples/sec Loss 2.7957 LearningRate 0.000497 Epoch: 14 Global Step: 303300 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:50:41,442-Speed 2513.52 samples/sec Loss 2.8018 LearningRate 0.000497 Epoch: 14 Global Step: 303310 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:50:49,644-Speed 2497.35 samples/sec Loss 2.7683 LearningRate 0.000497 Epoch: 14 Global Step: 303320 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:50:57,846-Speed 2497.51 samples/sec Loss 2.7429 LearningRate 0.000497 Epoch: 14 Global Step: 303330 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 
11:51:06,052-Speed 2496.20 samples/sec Loss 2.7172 LearningRate 0.000497 Epoch: 14 Global Step: 303340 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:51:14,258-Speed 2496.23 samples/sec Loss 2.7552 LearningRate 0.000497 Epoch: 14 Global Step: 303350 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:51:22,462-Speed 2496.57 samples/sec Loss 2.7890 LearningRate 0.000497 Epoch: 14 Global Step: 303360 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:51:30,612-Speed 2513.26 samples/sec Loss 2.7127 LearningRate 0.000497 Epoch: 14 Global Step: 303370 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:51:38,812-Speed 2497.91 samples/sec Loss 2.7557 LearningRate 0.000497 Epoch: 14 Global Step: 303380 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:51:47,021-Speed 2495.15 samples/sec Loss 2.7759 LearningRate 0.000497 Epoch: 14 Global Step: 303390 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:51:55,222-Speed 2497.89 samples/sec Loss 2.7590 LearningRate 0.000497 Epoch: 14 Global Step: 303400 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:52:03,443-Speed 2491.82 samples/sec Loss 2.7644 LearningRate 0.000497 Epoch: 14 Global Step: 303410 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:52:11,643-Speed 2498.15 samples/sec Loss 2.7286 LearningRate 0.000497 Epoch: 14 Global Step: 303420 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:52:19,785-Speed 2515.67 samples/sec Loss 2.7229 LearningRate 0.000497 Epoch: 14 Global Step: 303430 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:52:27,985-Speed 2498.06 samples/sec Loss 2.6993 LearningRate 0.000497 Epoch: 14 Global Step: 303440 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:52:36,187-Speed 2497.35 samples/sec Loss 2.7118 LearningRate 0.000497 Epoch: 14 Global Step: 303450 Fp16 Grad Scale: 16384 Required: 120 hours Training: 
2022-07-08 11:52:44,386-Speed 2498.53 samples/sec Loss 2.7644 LearningRate 0.000497 Epoch: 14 Global Step: 303460 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:52:52,591-Speed 2496.22 samples/sec Loss 2.7557 LearningRate 0.000497 Epoch: 14 Global Step: 303470 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:53:00,788-Speed 2498.97 samples/sec Loss 2.7375 LearningRate 0.000497 Epoch: 14 Global Step: 303480 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:53:08,932-Speed 2515.42 samples/sec Loss 2.7890 LearningRate 0.000496 Epoch: 14 Global Step: 303490 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:53:17,131-Speed 2498.16 samples/sec Loss 2.7292 LearningRate 0.000496 Epoch: 14 Global Step: 303500 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:53:25,336-Speed 2496.58 samples/sec Loss 2.7781 LearningRate 0.000496 Epoch: 14 Global Step: 303510 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:53:33,533-Speed 2498.65 samples/sec Loss 2.8032 LearningRate 0.000496 Epoch: 14 Global Step: 303520 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:53:41,747-Speed 2493.98 samples/sec Loss 2.7329 LearningRate 0.000496 Epoch: 14 Global Step: 303530 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:53:49,948-Speed 2497.85 samples/sec Loss 2.8439 LearningRate 0.000496 Epoch: 14 Global Step: 303540 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:53:58,092-Speed 2514.86 samples/sec Loss 2.7939 LearningRate 0.000496 Epoch: 14 Global Step: 303550 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:54:06,292-Speed 2498.18 samples/sec Loss 2.7261 LearningRate 0.000496 Epoch: 14 Global Step: 303560 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:54:14,492-Speed 2497.94 samples/sec Loss 2.7385 LearningRate 0.000496 Epoch: 14 Global Step: 303570 Fp16 Grad Scale: 16384 Required: 120 hours 
Training: 2022-07-08 11:54:22,693-Speed 2497.66 samples/sec Loss 2.7827 LearningRate 0.000496 Epoch: 14 Global Step: 303580 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:54:30,892-Speed 2498.28 samples/sec Loss 2.7344 LearningRate 0.000496 Epoch: 14 Global Step: 303590 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:54:39,103-Speed 2494.77 samples/sec Loss 2.7751 LearningRate 0.000496 Epoch: 14 Global Step: 303600 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:54:47,253-Speed 2513.13 samples/sec Loss 2.7319 LearningRate 0.000496 Epoch: 14 Global Step: 303610 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:54:55,455-Speed 2497.34 samples/sec Loss 2.7575 LearningRate 0.000496 Epoch: 14 Global Step: 303620 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:55:03,658-Speed 2497.18 samples/sec Loss 2.7844 LearningRate 0.000496 Epoch: 14 Global Step: 303630 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:55:11,854-Speed 2499.00 samples/sec Loss 2.7426 LearningRate 0.000496 Epoch: 14 Global Step: 303640 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:55:20,054-Speed 2498.08 samples/sec Loss 2.7901 LearningRate 0.000496 Epoch: 14 Global Step: 303650 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:55:28,250-Speed 2499.16 samples/sec Loss 2.7820 LearningRate 0.000496 Epoch: 14 Global Step: 303660 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:55:36,393-Speed 2515.48 samples/sec Loss 2.7812 LearningRate 0.000496 Epoch: 14 Global Step: 303670 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:55:44,592-Speed 2498.29 samples/sec Loss 2.8428 LearningRate 0.000496 Epoch: 14 Global Step: 303680 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:55:52,793-Speed 2497.69 samples/sec Loss 2.7806 LearningRate 0.000496 Epoch: 14 Global Step: 303690 Fp16 Grad Scale: 16384 Required: 120 
hours Training: 2022-07-08 11:56:00,990-Speed 2498.87 samples/sec Loss 2.7421 LearningRate 0.000496 Epoch: 14 Global Step: 303700 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:56:09,191-Speed 2497.64 samples/sec Loss 2.7340 LearningRate 0.000496 Epoch: 14 Global Step: 303710 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:56:17,397-Speed 2496.34 samples/sec Loss 2.7724 LearningRate 0.000496 Epoch: 14 Global Step: 303720 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:56:25,541-Speed 2514.95 samples/sec Loss 2.7307 LearningRate 0.000496 Epoch: 14 Global Step: 303730 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:56:33,741-Speed 2498.09 samples/sec Loss 2.7700 LearningRate 0.000496 Epoch: 14 Global Step: 303740 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:56:41,944-Speed 2497.06 samples/sec Loss 2.7368 LearningRate 0.000496 Epoch: 14 Global Step: 303750 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:56:50,144-Speed 2497.76 samples/sec Loss 2.7147 LearningRate 0.000496 Epoch: 14 Global Step: 303760 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:56:58,343-Speed 2498.38 samples/sec Loss 2.7449 LearningRate 0.000496 Epoch: 14 Global Step: 303770 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:57:06,539-Speed 2499.15 samples/sec Loss 2.7594 LearningRate 0.000496 Epoch: 14 Global Step: 303780 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:57:14,686-Speed 2514.26 samples/sec Loss 2.7988 LearningRate 0.000496 Epoch: 14 Global Step: 303790 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:57:22,885-Speed 2499.25 samples/sec Loss 2.7849 LearningRate 0.000496 Epoch: 14 Global Step: 303800 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:57:31,083-Speed 2498.66 samples/sec Loss 2.7641 LearningRate 0.000496 Epoch: 14 Global Step: 303810 Fp16 Grad Scale: 16384 Required: 
120 hours Training: 2022-07-08 11:57:39,282-Speed 2498.12 samples/sec Loss 2.7457 LearningRate 0.000496 Epoch: 14 Global Step: 303820 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:57:47,477-Speed 2499.77 samples/sec Loss 2.7432 LearningRate 0.000496 Epoch: 14 Global Step: 303830 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:57:55,680-Speed 2497.06 samples/sec Loss 2.7242 LearningRate 0.000496 Epoch: 14 Global Step: 303840 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:58:03,821-Speed 2515.99 samples/sec Loss 2.7729 LearningRate 0.000496 Epoch: 14 Global Step: 303850 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:58:12,018-Speed 2498.87 samples/sec Loss 2.7618 LearningRate 0.000496 Epoch: 14 Global Step: 303860 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:58:20,217-Speed 2498.35 samples/sec Loss 2.7320 LearningRate 0.000496 Epoch: 14 Global Step: 303870 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:58:28,417-Speed 2498.06 samples/sec Loss 2.7622 LearningRate 0.000496 Epoch: 14 Global Step: 303880 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:58:36,612-Speed 2499.56 samples/sec Loss 2.7617 LearningRate 0.000496 Epoch: 14 Global Step: 303890 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:58:44,811-Speed 2498.28 samples/sec Loss 2.8331 LearningRate 0.000496 Epoch: 14 Global Step: 303900 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:58:52,970-Speed 2510.43 samples/sec Loss 2.7543 LearningRate 0.000496 Epoch: 14 Global Step: 303910 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:59:01,183-Speed 2494.02 samples/sec Loss 2.8462 LearningRate 0.000496 Epoch: 14 Global Step: 303920 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:59:09,380-Speed 2499.15 samples/sec Loss 2.7950 LearningRate 0.000496 Epoch: 14 Global Step: 303930 Fp16 Grad Scale: 16384 
Required: 120 hours Training: 2022-07-08 11:59:17,575-Speed 2499.46 samples/sec Loss 2.8059 LearningRate 0.000496 Epoch: 14 Global Step: 303940 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:59:25,773-Speed 2498.46 samples/sec Loss 2.8416 LearningRate 0.000496 Epoch: 14 Global Step: 303950 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:59:33,972-Speed 2498.10 samples/sec Loss 2.8075 LearningRate 0.000496 Epoch: 14 Global Step: 303960 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:59:42,121-Speed 2513.82 samples/sec Loss 2.6974 LearningRate 0.000496 Epoch: 14 Global Step: 303970 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:59:50,320-Speed 2498.19 samples/sec Loss 2.7694 LearningRate 0.000496 Epoch: 14 Global Step: 303980 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 11:59:58,525-Speed 2496.50 samples/sec Loss 2.7830 LearningRate 0.000496 Epoch: 14 Global Step: 303990 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:00:06,723-Speed 2498.45 samples/sec Loss 2.8193 LearningRate 0.000496 Epoch: 14 Global Step: 304000 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:00:14,926-Speed 2497.11 samples/sec Loss 2.7920 LearningRate 0.000496 Epoch: 14 Global Step: 304010 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:00:23,125-Speed 2498.72 samples/sec Loss 2.7495 LearningRate 0.000495 Epoch: 14 Global Step: 304020 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:00:31,268-Speed 2515.45 samples/sec Loss 2.7830 LearningRate 0.000495 Epoch: 14 Global Step: 304030 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:00:39,468-Speed 2498.15 samples/sec Loss 2.7539 LearningRate 0.000495 Epoch: 14 Global Step: 304040 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:00:47,675-Speed 2495.81 samples/sec Loss 2.7269 LearningRate 0.000495 Epoch: 14 Global Step: 304050 Fp16 Grad Scale: 
16384 Required: 120 hours Training: 2022-07-08 12:00:55,870-Speed 2499.54 samples/sec Loss 2.8207 LearningRate 0.000495 Epoch: 14 Global Step: 304060 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:01:04,069-Speed 2498.05 samples/sec Loss 2.7381 LearningRate 0.000495 Epoch: 14 Global Step: 304070 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:01:12,268-Speed 2498.19 samples/sec Loss 2.7994 LearningRate 0.000495 Epoch: 14 Global Step: 304080 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:01:20,414-Speed 2514.58 samples/sec Loss 2.7564 LearningRate 0.000495 Epoch: 14 Global Step: 304090 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:01:28,619-Speed 2496.62 samples/sec Loss 2.7596 LearningRate 0.000495 Epoch: 14 Global Step: 304100 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:01:36,824-Speed 2496.41 samples/sec Loss 2.8051 LearningRate 0.000495 Epoch: 14 Global Step: 304110 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:01:45,025-Speed 2497.66 samples/sec Loss 2.7620 LearningRate 0.000495 Epoch: 14 Global Step: 304120 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:01:53,224-Speed 2498.31 samples/sec Loss 2.7772 LearningRate 0.000495 Epoch: 14 Global Step: 304130 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:02:01,423-Speed 2498.30 samples/sec Loss 2.7797 LearningRate 0.000495 Epoch: 14 Global Step: 304140 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:02:09,567-Speed 2515.37 samples/sec Loss 2.7851 LearningRate 0.000495 Epoch: 14 Global Step: 304150 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:02:17,767-Speed 2497.95 samples/sec Loss 2.7411 LearningRate 0.000495 Epoch: 14 Global Step: 304160 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:02:25,973-Speed 2496.05 samples/sec Loss 2.7229 LearningRate 0.000495 Epoch: 14 Global Step: 304170 Fp16 Grad 
Scale: 16384 Required: 120 hours Training: 2022-07-08 12:02:34,173-Speed 2498.16 samples/sec Loss 2.7386 LearningRate 0.000495 Epoch: 14 Global Step: 304180 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:02:42,376-Speed 2496.75 samples/sec Loss 2.7824 LearningRate 0.000495 Epoch: 14 Global Step: 304190 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:02:50,585-Speed 2495.36 samples/sec Loss 2.7516 LearningRate 0.000495 Epoch: 14 Global Step: 304200 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:02:58,732-Speed 2514.04 samples/sec Loss 2.7906 LearningRate 0.000495 Epoch: 14 Global Step: 304210 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:03:06,932-Speed 2498.15 samples/sec Loss 2.6847 LearningRate 0.000495 Epoch: 14 Global Step: 304220 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:03:15,131-Speed 2498.29 samples/sec Loss 2.7343 LearningRate 0.000495 Epoch: 14 Global Step: 304230 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:03:23,328-Speed 2498.78 samples/sec Loss 2.7377 LearningRate 0.000495 Epoch: 14 Global Step: 304240 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:03:31,531-Speed 2497.27 samples/sec Loss 2.7711 LearningRate 0.000495 Epoch: 14 Global Step: 304250 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:03:39,731-Speed 2497.89 samples/sec Loss 2.7256 LearningRate 0.000495 Epoch: 14 Global Step: 304260 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:03:47,875-Speed 2514.95 samples/sec Loss 2.7059 LearningRate 0.000495 Epoch: 14 Global Step: 304270 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:03:56,074-Speed 2498.26 samples/sec Loss 2.7816 LearningRate 0.000495 Epoch: 14 Global Step: 304280 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:04:04,276-Speed 2497.56 samples/sec Loss 2.7918 LearningRate 0.000495 Epoch: 14 Global Step: 304290 Fp16 
Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:04:12,477-Speed 2497.94 samples/sec Loss 2.7283 LearningRate 0.000495 Epoch: 14 Global Step: 304300 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:04:20,676-Speed 2498.33 samples/sec Loss 2.7859 LearningRate 0.000495 Epoch: 14 Global Step: 304310 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:04:28,876-Speed 2497.73 samples/sec Loss 2.7269 LearningRate 0.000495 Epoch: 14 Global Step: 304320 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:04:37,020-Speed 2515.27 samples/sec Loss 2.7013 LearningRate 0.000495 Epoch: 14 Global Step: 304330 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:04:45,219-Speed 2498.19 samples/sec Loss 2.7960 LearningRate 0.000495 Epoch: 14 Global Step: 304340 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:04:53,417-Speed 2498.59 samples/sec Loss 2.7672 LearningRate 0.000495 Epoch: 14 Global Step: 304350 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:05:01,621-Speed 2496.73 samples/sec Loss 2.7239 LearningRate 0.000495 Epoch: 14 Global Step: 304360 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:05:09,821-Speed 2497.97 samples/sec Loss 2.8103 LearningRate 0.000495 Epoch: 14 Global Step: 304370 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:05:18,019-Speed 2498.46 samples/sec Loss 2.7728 LearningRate 0.000495 Epoch: 14 Global Step: 304380 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:05:26,173-Speed 2511.98 samples/sec Loss 2.7587 LearningRate 0.000495 Epoch: 14 Global Step: 304390 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:05:34,370-Speed 2499.03 samples/sec Loss 2.7812 LearningRate 0.000495 Epoch: 14 Global Step: 304400 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:05:42,568-Speed 2498.52 samples/sec Loss 2.7756 LearningRate 0.000495 Epoch: 14 Global Step: 304410 
Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:05:50,767-Speed 2497.97 samples/sec Loss 2.7658 LearningRate 0.000495 Epoch: 14 Global Step: 304420 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:05:58,969-Speed 2497.59 samples/sec Loss 2.8324 LearningRate 0.000495 Epoch: 14 Global Step: 304430 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:06:07,166-Speed 2498.86 samples/sec Loss 2.7328 LearningRate 0.000495 Epoch: 14 Global Step: 304440 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:06:15,314-Speed 2513.64 samples/sec Loss 2.7992 LearningRate 0.000495 Epoch: 14 Global Step: 304450 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:06:23,520-Speed 2496.46 samples/sec Loss 2.7589 LearningRate 0.000495 Epoch: 14 Global Step: 304460 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:06:31,715-Speed 2499.50 samples/sec Loss 2.7608 LearningRate 0.000495 Epoch: 14 Global Step: 304470 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:06:39,914-Speed 2498.67 samples/sec Loss 2.7864 LearningRate 0.000495 Epoch: 14 Global Step: 304480 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:06:48,117-Speed 2496.84 samples/sec Loss 2.7553 LearningRate 0.000495 Epoch: 14 Global Step: 304490 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:06:56,316-Speed 2498.44 samples/sec Loss 2.7768 LearningRate 0.000495 Epoch: 14 Global Step: 304500 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:07:04,460-Speed 2514.99 samples/sec Loss 2.7598 LearningRate 0.000495 Epoch: 14 Global Step: 304510 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:07:12,666-Speed 2496.10 samples/sec Loss 2.7377 LearningRate 0.000495 Epoch: 14 Global Step: 304520 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:07:20,873-Speed 2495.93 samples/sec Loss 2.7262 LearningRate 0.000495 Epoch: 14 Global Step: 
304530 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:07:29,073-Speed 2497.98 samples/sec Loss 2.7721 LearningRate 0.000495 Epoch: 14 Global Step: 304540 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:07:37,269-Speed 2499.15 samples/sec Loss 2.7300 LearningRate 0.000494 Epoch: 14 Global Step: 304550 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:07:45,466-Speed 2498.62 samples/sec Loss 2.8178 LearningRate 0.000494 Epoch: 14 Global Step: 304560 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:07:53,621-Speed 2512.04 samples/sec Loss 2.7476 LearningRate 0.000494 Epoch: 14 Global Step: 304570 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:08:01,827-Speed 2496.04 samples/sec Loss 2.6715 LearningRate 0.000494 Epoch: 14 Global Step: 304580 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:08:10,029-Speed 2497.58 samples/sec Loss 2.7313 LearningRate 0.000494 Epoch: 14 Global Step: 304590 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:08:18,225-Speed 2499.27 samples/sec Loss 2.7787 LearningRate 0.000494 Epoch: 14 Global Step: 304600 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:08:26,425-Speed 2497.87 samples/sec Loss 2.7696 LearningRate 0.000494 Epoch: 14 Global Step: 304610 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:08:34,623-Speed 2498.52 samples/sec Loss 2.7547 LearningRate 0.000494 Epoch: 14 Global Step: 304620 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:08:42,770-Speed 2514.20 samples/sec Loss 2.6645 LearningRate 0.000494 Epoch: 14 Global Step: 304630 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:08:50,964-Speed 2499.88 samples/sec Loss 2.7500 LearningRate 0.000494 Epoch: 14 Global Step: 304640 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:08:59,178-Speed 2493.58 samples/sec Loss 2.7552 LearningRate 0.000494 Epoch: 14 Global 
Step: 304650 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:09:07,384-Speed 2496.36 samples/sec Loss 2.6692 LearningRate 0.000494 Epoch: 14 Global Step: 304660 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:09:15,592-Speed 2495.29 samples/sec Loss 2.6857 LearningRate 0.000494 Epoch: 14 Global Step: 304670 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:09:23,794-Speed 2497.42 samples/sec Loss 2.7469 LearningRate 0.000494 Epoch: 14 Global Step: 304680 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:09:31,944-Speed 2513.36 samples/sec Loss 2.7429 LearningRate 0.000494 Epoch: 14 Global Step: 304690 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:09:40,147-Speed 2496.76 samples/sec Loss 2.7392 LearningRate 0.000494 Epoch: 14 Global Step: 304700 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:09:48,349-Speed 2497.65 samples/sec Loss 2.7804 LearningRate 0.000494 Epoch: 14 Global Step: 304710 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:09:56,555-Speed 2495.89 samples/sec Loss 2.7306 LearningRate 0.000494 Epoch: 14 Global Step: 304720 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:10:04,758-Speed 2497.07 samples/sec Loss 2.7811 LearningRate 0.000494 Epoch: 14 Global Step: 304730 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:10:12,955-Speed 2498.68 samples/sec Loss 2.7619 LearningRate 0.000494 Epoch: 14 Global Step: 304740 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:10:21,102-Speed 2514.28 samples/sec Loss 2.8135 LearningRate 0.000494 Epoch: 14 Global Step: 304750 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:10:29,304-Speed 2497.37 samples/sec Loss 2.7571 LearningRate 0.000494 Epoch: 14 Global Step: 304760 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:10:37,507-Speed 2496.82 samples/sec Loss 2.7227 LearningRate 0.000494 Epoch: 14 
Global Step: 304770 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:10:45,714-Speed 2495.85 samples/sec Loss 2.7336 LearningRate 0.000494 Epoch: 14 Global Step: 304780 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:10:53,914-Speed 2498.19 samples/sec Loss 2.7045 LearningRate 0.000494 Epoch: 14 Global Step: 304790 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:11:02,114-Speed 2497.88 samples/sec Loss 2.6896 LearningRate 0.000494 Epoch: 14 Global Step: 304800 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:11:10,262-Speed 2513.71 samples/sec Loss 2.7341 LearningRate 0.000494 Epoch: 14 Global Step: 304810 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:11:18,462-Speed 2498.13 samples/sec Loss 2.6802 LearningRate 0.000494 Epoch: 14 Global Step: 304820 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:11:26,656-Speed 2499.75 samples/sec Loss 2.7566 LearningRate 0.000494 Epoch: 14 Global Step: 304830 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:11:34,870-Speed 2493.62 samples/sec Loss 2.7492 LearningRate 0.000494 Epoch: 14 Global Step: 304840 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:11:43,067-Speed 2498.72 samples/sec Loss 2.7263 LearningRate 0.000494 Epoch: 14 Global Step: 304850 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:11:51,264-Speed 2499.16 samples/sec Loss 2.7317 LearningRate 0.000494 Epoch: 14 Global Step: 304860 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:11:59,410-Speed 2515.18 samples/sec Loss 2.7235 LearningRate 0.000494 Epoch: 14 Global Step: 304870 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:12:07,609-Speed 2498.32 samples/sec Loss 2.7675 LearningRate 0.000494 Epoch: 14 Global Step: 304880 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:12:15,823-Speed 2493.71 samples/sec Loss 2.7498 LearningRate 0.000494 
Epoch: 14 Global Step: 304890 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:12:24,021-Speed 2498.79 samples/sec Loss 2.7005 LearningRate 0.000494 Epoch: 14 Global Step: 304900 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:12:32,217-Speed 2498.87 samples/sec Loss 2.6839 LearningRate 0.000494 Epoch: 14 Global Step: 304910 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:12:40,418-Speed 2497.70 samples/sec Loss 2.6811 LearningRate 0.000494 Epoch: 14 Global Step: 304920 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:12:48,562-Speed 2514.96 samples/sec Loss 2.6479 LearningRate 0.000494 Epoch: 14 Global Step: 304930 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:12:56,765-Speed 2496.90 samples/sec Loss 2.7234 LearningRate 0.000494 Epoch: 14 Global Step: 304940 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:13:04,966-Speed 2497.95 samples/sec Loss 2.6938 LearningRate 0.000494 Epoch: 14 Global Step: 304950 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:13:13,164-Speed 2498.84 samples/sec Loss 2.7173 LearningRate 0.000494 Epoch: 14 Global Step: 304960 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:13:21,366-Speed 2497.39 samples/sec Loss 2.7664 LearningRate 0.000494 Epoch: 14 Global Step: 304970 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:13:29,566-Speed 2497.98 samples/sec Loss 2.7212 LearningRate 0.000494 Epoch: 14 Global Step: 304980 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:13:37,712-Speed 2514.57 samples/sec Loss 2.8269 LearningRate 0.000494 Epoch: 14 Global Step: 304990 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:13:45,913-Speed 2497.71 samples/sec Loss 2.7818 LearningRate 0.000494 Epoch: 14 Global Step: 305000 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:13:54,117-Speed 2496.80 samples/sec Loss 2.7779 LearningRate 
0.000494 Epoch: 14 Global Step: 305010 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:14:02,315-Speed 2498.48 samples/sec Loss 2.8107 LearningRate 0.000494 Epoch: 14 Global Step: 305020 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:14:10,515-Speed 2498.08 samples/sec Loss 2.7908 LearningRate 0.000494 Epoch: 14 Global Step: 305030 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:14:18,713-Speed 2498.60 samples/sec Loss 2.7516 LearningRate 0.000494 Epoch: 14 Global Step: 305040 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:14:26,859-Speed 2514.46 samples/sec Loss 2.7630 LearningRate 0.000494 Epoch: 14 Global Step: 305050 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:14:35,056-Speed 2498.93 samples/sec Loss 2.7916 LearningRate 0.000494 Epoch: 14 Global Step: 305060 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:14:43,253-Speed 2498.70 samples/sec Loss 2.7718 LearningRate 0.000494 Epoch: 14 Global Step: 305070 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:14:51,453-Speed 2497.81 samples/sec Loss 2.7740 LearningRate 0.000493 Epoch: 14 Global Step: 305080 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:14:59,654-Speed 2497.82 samples/sec Loss 2.7861 LearningRate 0.000493 Epoch: 14 Global Step: 305090 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:15:07,849-Speed 2499.63 samples/sec Loss 2.7026 LearningRate 0.000493 Epoch: 14 Global Step: 305100 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:15:16,000-Speed 2513.11 samples/sec Loss 2.8145 LearningRate 0.000493 Epoch: 14 Global Step: 305110 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:15:24,198-Speed 2498.56 samples/sec Loss 2.7414 LearningRate 0.000493 Epoch: 14 Global Step: 305120 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:15:32,397-Speed 2498.30 samples/sec Loss 2.7255 
LearningRate 0.000493 Epoch: 14 Global Step: 305130 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:15:40,596-Speed 2498.19 samples/sec Loss 2.7290 LearningRate 0.000493 Epoch: 14 Global Step: 305140 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:15:48,788-Speed 2500.47 samples/sec Loss 2.7290 LearningRate 0.000493 Epoch: 14 Global Step: 305150 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:15:56,991-Speed 2496.97 samples/sec Loss 2.7415 LearningRate 0.000493 Epoch: 14 Global Step: 305160 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:16:05,139-Speed 2513.91 samples/sec Loss 2.7535 LearningRate 0.000493 Epoch: 14 Global Step: 305170 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:16:13,348-Speed 2495.28 samples/sec Loss 2.7971 LearningRate 0.000493 Epoch: 14 Global Step: 305180 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:16:21,548-Speed 2498.15 samples/sec Loss 2.7447 LearningRate 0.000493 Epoch: 14 Global Step: 305190 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:16:29,756-Speed 2495.47 samples/sec Loss 2.7335 LearningRate 0.000493 Epoch: 14 Global Step: 305200 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:16:37,959-Speed 2497.53 samples/sec Loss 2.7186 LearningRate 0.000493 Epoch: 14 Global Step: 305210 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:16:46,163-Speed 2496.58 samples/sec Loss 2.7299 LearningRate 0.000493 Epoch: 14 Global Step: 305220 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:16:54,307-Speed 2515.04 samples/sec Loss 2.7802 LearningRate 0.000493 Epoch: 14 Global Step: 305230 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:17:02,505-Speed 2498.71 samples/sec Loss 2.7874 LearningRate 0.000493 Epoch: 14 Global Step: 305240 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:17:10,701-Speed 2498.95 samples/sec Loss 
2.7795 LearningRate 0.000493 Epoch: 14 Global Step: 305250 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:17:18,899-Speed 2499.03 samples/sec Loss 2.7564 LearningRate 0.000493 Epoch: 14 Global Step: 305260 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:17:27,109-Speed 2495.69 samples/sec Loss 2.7268 LearningRate 0.000493 Epoch: 14 Global Step: 305270 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:17:35,312-Speed 2496.96 samples/sec Loss 2.7786 LearningRate 0.000493 Epoch: 14 Global Step: 305280 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:17:43,472-Speed 2510.37 samples/sec Loss 2.7338 LearningRate 0.000493 Epoch: 14 Global Step: 305290 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:17:51,665-Speed 2499.96 samples/sec Loss 2.7432 LearningRate 0.000493 Epoch: 14 Global Step: 305300 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:17:59,860-Speed 2499.16 samples/sec Loss 2.7572 LearningRate 0.000493 Epoch: 14 Global Step: 305310 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:18:08,057-Speed 2498.95 samples/sec Loss 2.7490 LearningRate 0.000493 Epoch: 14 Global Step: 305320 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:18:16,252-Speed 2499.38 samples/sec Loss 2.7246 LearningRate 0.000493 Epoch: 14 Global Step: 305330 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:18:24,452-Speed 2497.97 samples/sec Loss 2.7798 LearningRate 0.000493 Epoch: 14 Global Step: 305340 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:18:32,609-Speed 2511.14 samples/sec Loss 2.8074 LearningRate 0.000493 Epoch: 14 Global Step: 305350 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:18:40,807-Speed 2498.53 samples/sec Loss 2.7676 LearningRate 0.000493 Epoch: 14 Global Step: 305360 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:18:49,010-Speed 2497.34 samples/sec 
Loss 2.7774 LearningRate 0.000493 Epoch: 14 Global Step: 305370 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:18:57,216-Speed 2496.26 samples/sec Loss 2.7730 LearningRate 0.000493 Epoch: 14 Global Step: 305380 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:19:05,418-Speed 2497.13 samples/sec Loss 2.7902 LearningRate 0.000493 Epoch: 14 Global Step: 305390 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:19:13,623-Speed 2496.41 samples/sec Loss 2.7784 LearningRate 0.000493 Epoch: 14 Global Step: 305400 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:19:21,783-Speed 2510.33 samples/sec Loss 2.7622 LearningRate 0.000493 Epoch: 14 Global Step: 305410 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:19:29,991-Speed 2495.38 samples/sec Loss 2.7978 LearningRate 0.000493 Epoch: 14 Global Step: 305420 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:19:38,194-Speed 2497.27 samples/sec Loss 2.7387 LearningRate 0.000493 Epoch: 14 Global Step: 305430 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:19:46,399-Speed 2496.68 samples/sec Loss 2.7360 LearningRate 0.000493 Epoch: 14 Global Step: 305440 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:19:54,599-Speed 2497.93 samples/sec Loss 2.7104 LearningRate 0.000493 Epoch: 14 Global Step: 305450 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:20:02,809-Speed 2494.86 samples/sec Loss 2.7278 LearningRate 0.000493 Epoch: 14 Global Step: 305460 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:20:10,955-Speed 2514.52 samples/sec Loss 2.7015 LearningRate 0.000493 Epoch: 14 Global Step: 305470 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:20:19,159-Speed 2496.67 samples/sec Loss 2.7266 LearningRate 0.000493 Epoch: 14 Global Step: 305480 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:20:27,359-Speed 2497.97 
samples/sec Loss 2.7239 LearningRate 0.000493 Epoch: 14 Global Step: 305490 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:20:35,557-Speed 2498.59 samples/sec Loss 2.7772 LearningRate 0.000493 Epoch: 14 Global Step: 305500 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:20:43,764-Speed 2496.09 samples/sec Loss 2.7551 LearningRate 0.000493 Epoch: 14 Global Step: 305510 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:20:51,966-Speed 2497.46 samples/sec Loss 2.7268 LearningRate 0.000493 Epoch: 14 Global Step: 305520 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:21:00,115-Speed 2513.47 samples/sec Loss 2.7286 LearningRate 0.000493 Epoch: 14 Global Step: 305530 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:21:08,314-Speed 2498.50 samples/sec Loss 2.7596 LearningRate 0.000493 Epoch: 14 Global Step: 305540 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:21:16,516-Speed 2497.14 samples/sec Loss 2.7262 LearningRate 0.000493 Epoch: 14 Global Step: 305550 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:21:24,717-Speed 2498.07 samples/sec Loss 2.6785 LearningRate 0.000493 Epoch: 14 Global Step: 305560 Fp16 Grad Scale: 65536 Required: 120 hours Training: 2022-07-08 12:21:32,917-Speed 2497.96 samples/sec Loss 2.7115 LearningRate 0.000493 Epoch: 14 Global Step: 305570 Fp16 Grad Scale: 65536 Required: 120 hours Training: 2022-07-08 12:21:41,076-Speed 2510.41 samples/sec Loss 2.7478 LearningRate 0.000493 Epoch: 14 Global Step: 305580 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:21:49,223-Speed 2514.04 samples/sec Loss 2.7360 LearningRate 0.000493 Epoch: 14 Global Step: 305590 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:21:57,427-Speed 2496.79 samples/sec Loss 2.7455 LearningRate 0.000493 Epoch: 14 Global Step: 305600 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:22:05,639-Speed 
2494.27 samples/sec Loss 2.7663 LearningRate 0.000492 Epoch: 14 Global Step: 305610 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:22:13,839-Speed 2498.02 samples/sec Loss 2.8665 LearningRate 0.000492 Epoch: 14 Global Step: 305620 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:22:22,039-Speed 2497.74 samples/sec Loss 2.7330 LearningRate 0.000492 Epoch: 14 Global Step: 305630 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:22:30,241-Speed 2497.26 samples/sec Loss 2.7678 LearningRate 0.000492 Epoch: 14 Global Step: 305640 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:22:38,389-Speed 2514.09 samples/sec Loss 2.7887 LearningRate 0.000492 Epoch: 14 Global Step: 305650 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:22:46,595-Speed 2496.04 samples/sec Loss 2.7792 LearningRate 0.000492 Epoch: 14 Global Step: 305660 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:22:54,817-Speed 2491.34 samples/sec Loss 2.7066 LearningRate 0.000492 Epoch: 14 Global Step: 305670 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:23:03,024-Speed 2495.74 samples/sec Loss 2.7080 LearningRate 0.000492 Epoch: 14 Global Step: 305680 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:23:11,223-Speed 2498.36 samples/sec Loss 2.7181 LearningRate 0.000492 Epoch: 14 Global Step: 305690 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:23:19,424-Speed 2497.68 samples/sec Loss 2.7988 LearningRate 0.000492 Epoch: 14 Global Step: 305700 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:23:27,572-Speed 2513.88 samples/sec Loss 2.7494 LearningRate 0.000492 Epoch: 14 Global Step: 305710 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:23:35,774-Speed 2497.42 samples/sec Loss 2.7624 LearningRate 0.000492 Epoch: 14 Global Step: 305720 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 
12:23:43,974-Speed 2497.90 samples/sec Loss 2.7275 LearningRate 0.000492 Epoch: 14 Global Step: 305730 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:23:52,187-Speed 2493.91 samples/sec Loss 2.7384 LearningRate 0.000492 Epoch: 14 Global Step: 305740 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:24:00,390-Speed 2497.43 samples/sec Loss 2.7588 LearningRate 0.000492 Epoch: 14 Global Step: 305750 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:24:08,592-Speed 2497.23 samples/sec Loss 2.7244 LearningRate 0.000492 Epoch: 14 Global Step: 305760 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:24:16,744-Speed 2512.62 samples/sec Loss 2.7457 LearningRate 0.000492 Epoch: 14 Global Step: 305770 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:24:24,944-Speed 2497.88 samples/sec Loss 2.7434 LearningRate 0.000492 Epoch: 14 Global Step: 305780 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:24:33,155-Speed 2494.94 samples/sec Loss 2.6641 LearningRate 0.000492 Epoch: 14 Global Step: 305790 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:24:41,355-Speed 2497.73 samples/sec Loss 2.7599 LearningRate 0.000492 Epoch: 14 Global Step: 305800 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:24:49,561-Speed 2496.28 samples/sec Loss 2.7620 LearningRate 0.000492 Epoch: 14 Global Step: 305810 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:24:57,719-Speed 2510.68 samples/sec Loss 2.7207 LearningRate 0.000492 Epoch: 14 Global Step: 305820 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:25:05,880-Speed 2510.12 samples/sec Loss 2.7947 LearningRate 0.000492 Epoch: 14 Global Step: 305830 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:25:14,079-Speed 2498.18 samples/sec Loss 2.8071 LearningRate 0.000492 Epoch: 14 Global Step: 305840 Fp16 Grad Scale: 16384 Required: 120 hours Training: 
2022-07-08 12:25:22,278-Speed 2498.27 samples/sec Loss 2.7462 LearningRate 0.000492 Epoch: 14 Global Step: 305850 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:25:30,478-Speed 2497.92 samples/sec Loss 2.7697 LearningRate 0.000492 Epoch: 14 Global Step: 305860 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:25:38,680-Speed 2497.19 samples/sec Loss 2.7565 LearningRate 0.000492 Epoch: 14 Global Step: 305870 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:25:46,886-Speed 2496.00 samples/sec Loss 2.8047 LearningRate 0.000492 Epoch: 14 Global Step: 305880 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:25:55,034-Speed 2513.98 samples/sec Loss 2.7253 LearningRate 0.000492 Epoch: 14 Global Step: 305890 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:26:03,235-Speed 2497.85 samples/sec Loss 2.8545 LearningRate 0.000492 Epoch: 14 Global Step: 305900 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:26:11,431-Speed 2499.07 samples/sec Loss 2.7436 LearningRate 0.000492 Epoch: 14 Global Step: 305910 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:26:19,640-Speed 2495.24 samples/sec Loss 2.7299 LearningRate 0.000492 Epoch: 14 Global Step: 305920 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:26:27,841-Speed 2497.74 samples/sec Loss 2.7888 LearningRate 0.000492 Epoch: 14 Global Step: 305930 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:26:36,041-Speed 2498.18 samples/sec Loss 2.7022 LearningRate 0.000492 Epoch: 14 Global Step: 305940 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:26:44,188-Speed 2514.36 samples/sec Loss 2.7581 LearningRate 0.000492 Epoch: 14 Global Step: 305950 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:26:52,388-Speed 2498.01 samples/sec Loss 2.7766 LearningRate 0.000492 Epoch: 14 Global Step: 305960 Fp16 Grad Scale: 16384 Required: 120 hours 
Training: 2022-07-08 12:27:00,602-Speed 2493.63 samples/sec Loss 2.7219 LearningRate 0.000492 Epoch: 14 Global Step: 305970 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:27:08,804-Speed 2497.16 samples/sec Loss 2.7674 LearningRate 0.000492 Epoch: 14 Global Step: 305980 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:27:17,008-Speed 2497.03 samples/sec Loss 2.7040 LearningRate 0.000492 Epoch: 14 Global Step: 305990 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:27:25,206-Speed 2498.33 samples/sec Loss 2.6981 LearningRate 0.000492 Epoch: 14 Global Step: 306000 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:27:33,363-Speed 2511.02 samples/sec Loss 2.7322 LearningRate 0.000492 Epoch: 14 Global Step: 306010 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:27:41,569-Speed 2496.49 samples/sec Loss 2.7902 LearningRate 0.000492 Epoch: 14 Global Step: 306020 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:27:49,793-Speed 2490.86 samples/sec Loss 2.7572 LearningRate 0.000492 Epoch: 14 Global Step: 306030 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:27:57,992-Speed 2498.16 samples/sec Loss 2.7431 LearningRate 0.000492 Epoch: 14 Global Step: 306040 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:28:06,205-Speed 2494.11 samples/sec Loss 2.8255 LearningRate 0.000492 Epoch: 14 Global Step: 306050 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:28:14,406-Speed 2497.79 samples/sec Loss 2.7495 LearningRate 0.000492 Epoch: 14 Global Step: 306060 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:28:22,549-Speed 2515.58 samples/sec Loss 2.7096 LearningRate 0.000492 Epoch: 14 Global Step: 306070 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:28:30,748-Speed 2498.19 samples/sec Loss 2.7238 LearningRate 0.000492 Epoch: 14 Global Step: 306080 Fp16 Grad Scale: 16384 Required: 120 
hours Training: 2022-07-08 12:28:38,949-Speed 2497.83 samples/sec Loss 2.7517 LearningRate 0.000492 Epoch: 14 Global Step: 306090 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:28:47,154-Speed 2496.58 samples/sec Loss 2.7388 LearningRate 0.000492 Epoch: 14 Global Step: 306100 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:28:55,356-Speed 2497.24 samples/sec Loss 2.7708 LearningRate 0.000492 Epoch: 14 Global Step: 306110 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:29:03,555-Speed 2498.19 samples/sec Loss 2.6841 LearningRate 0.000492 Epoch: 14 Global Step: 306120 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:29:11,708-Speed 2512.32 samples/sec Loss 2.7246 LearningRate 0.000492 Epoch: 14 Global Step: 306130 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:29:19,908-Speed 2497.96 samples/sec Loss 2.7356 LearningRate 0.000491 Epoch: 14 Global Step: 306140 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:29:28,115-Speed 2495.90 samples/sec Loss 2.7297 LearningRate 0.000491 Epoch: 14 Global Step: 306150 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:29:36,313-Speed 2498.23 samples/sec Loss 2.6979 LearningRate 0.000491 Epoch: 14 Global Step: 306160 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:29:44,526-Speed 2494.01 samples/sec Loss 2.6727 LearningRate 0.000491 Epoch: 14 Global Step: 306170 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:29:52,728-Speed 2497.45 samples/sec Loss 2.6798 LearningRate 0.000491 Epoch: 14 Global Step: 306180 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:30:00,878-Speed 2513.62 samples/sec Loss 2.6920 LearningRate 0.000491 Epoch: 14 Global Step: 306190 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:30:09,077-Speed 2497.99 samples/sec Loss 2.6529 LearningRate 0.000491 Epoch: 14 Global Step: 306200 Fp16 Grad Scale: 16384 Required: 
120 hours Training: 2022-07-08 12:30:17,278-Speed 2497.98 samples/sec Loss 2.6636 LearningRate 0.000491 Epoch: 14 Global Step: 306210 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:30:25,478-Speed 2498.18 samples/sec Loss 2.7088 LearningRate 0.000491 Epoch: 14 Global Step: 306220 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:30:33,676-Speed 2498.41 samples/sec Loss 2.6837 LearningRate 0.000491 Epoch: 14 Global Step: 306230 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:30:41,877-Speed 2497.60 samples/sec Loss 2.6699 LearningRate 0.000491 Epoch: 14 Global Step: 306240 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:30:50,025-Speed 2513.91 samples/sec Loss 2.6523 LearningRate 0.000491 Epoch: 14 Global Step: 306250 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:30:58,226-Speed 2497.84 samples/sec Loss 2.6638 LearningRate 0.000491 Epoch: 14 Global Step: 306260 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:31:06,429-Speed 2497.01 samples/sec Loss 2.7214 LearningRate 0.000491 Epoch: 14 Global Step: 306270 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:31:14,631-Speed 2497.57 samples/sec Loss 2.7744 LearningRate 0.000491 Epoch: 14 Global Step: 306280 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:31:22,836-Speed 2496.24 samples/sec Loss 2.8014 LearningRate 0.000491 Epoch: 14 Global Step: 306290 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:31:31,034-Speed 2498.72 samples/sec Loss 2.7177 LearningRate 0.000491 Epoch: 14 Global Step: 306300 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:31:39,182-Speed 2513.84 samples/sec Loss 2.7012 LearningRate 0.000491 Epoch: 14 Global Step: 306310 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:31:47,386-Speed 2496.43 samples/sec Loss 2.7408 LearningRate 0.000491 Epoch: 14 Global Step: 306320 Fp16 Grad Scale: 16384 
Required: 120 hours Training: 2022-07-08 12:31:55,583-Speed 2499.03 samples/sec Loss 2.6950 LearningRate 0.000491 Epoch: 14 Global Step: 306330 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:32:03,782-Speed 2498.27 samples/sec Loss 2.7103 LearningRate 0.000491 Epoch: 14 Global Step: 306340 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:32:11,981-Speed 2498.32 samples/sec Loss 2.7098 LearningRate 0.000491 Epoch: 14 Global Step: 306350 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:32:20,196-Speed 2493.21 samples/sec Loss 2.6866 LearningRate 0.000491 Epoch: 14 Global Step: 306360 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:32:28,350-Speed 2512.19 samples/sec Loss 2.7181 LearningRate 0.000491 Epoch: 14 Global Step: 306370 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:32:36,549-Speed 2498.10 samples/sec Loss 2.7421 LearningRate 0.000491 Epoch: 14 Global Step: 306380 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:32:44,751-Speed 2497.58 samples/sec Loss 2.7394 LearningRate 0.000491 Epoch: 14 Global Step: 306390 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:32:52,954-Speed 2496.97 samples/sec Loss 2.7691 LearningRate 0.000491 Epoch: 14 Global Step: 306400 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:33:01,158-Speed 2496.68 samples/sec Loss 2.8138 LearningRate 0.000491 Epoch: 14 Global Step: 306410 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:33:09,377-Speed 2492.29 samples/sec Loss 2.7124 LearningRate 0.000491 Epoch: 14 Global Step: 306420 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:33:17,527-Speed 2512.98 samples/sec Loss 2.7653 LearningRate 0.000491 Epoch: 14 Global Step: 306430 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:33:25,742-Speed 2493.63 samples/sec Loss 2.7623 LearningRate 0.000491 Epoch: 14 Global Step: 306440 Fp16 Grad Scale: 
16384 Required: 120 hours Training: 2022-07-08 12:33:33,963-Speed 2491.67 samples/sec Loss 2.7073 LearningRate 0.000491 Epoch: 14 Global Step: 306450 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:33:42,163-Speed 2497.98 samples/sec Loss 2.7408 LearningRate 0.000491 Epoch: 14 Global Step: 306460 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:33:50,364-Speed 2497.51 samples/sec Loss 2.7422 LearningRate 0.000491 Epoch: 14 Global Step: 306470 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:33:58,565-Speed 2497.79 samples/sec Loss 2.7304 LearningRate 0.000491 Epoch: 14 Global Step: 306480 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:34:06,716-Speed 2512.96 samples/sec Loss 2.7761 LearningRate 0.000491 Epoch: 14 Global Step: 306490 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:34:14,917-Speed 2497.52 samples/sec Loss 2.7496 LearningRate 0.000491 Epoch: 14 Global Step: 306500 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:34:23,117-Speed 2498.04 samples/sec Loss 2.7178 LearningRate 0.000491 Epoch: 14 Global Step: 306510 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:34:31,321-Speed 2496.78 samples/sec Loss 2.6676 LearningRate 0.000491 Epoch: 14 Global Step: 306520 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:34:39,531-Speed 2494.90 samples/sec Loss 2.7775 LearningRate 0.000491 Epoch: 14 Global Step: 306530 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:34:47,731-Speed 2497.68 samples/sec Loss 2.7577 LearningRate 0.000491 Epoch: 14 Global Step: 306540 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:34:55,879-Speed 2514.07 samples/sec Loss 2.7498 LearningRate 0.000491 Epoch: 14 Global Step: 306550 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:35:04,093-Speed 2493.79 samples/sec Loss 2.7609 LearningRate 0.000491 Epoch: 14 Global Step: 306560 Fp16 Grad 
Scale: 16384 Required: 120 hours Training: 2022-07-08 12:35:12,295-Speed 2497.23 samples/sec Loss 2.7328 LearningRate 0.000491 Epoch: 14 Global Step: 306570 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:35:20,496-Speed 2497.49 samples/sec Loss 2.7083 LearningRate 0.000491 Epoch: 14 Global Step: 306580 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:35:28,699-Speed 2497.34 samples/sec Loss 2.7599 LearningRate 0.000491 Epoch: 14 Global Step: 306590 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:35:36,905-Speed 2496.08 samples/sec Loss 2.7327 LearningRate 0.000491 Epoch: 14 Global Step: 306600 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:35:45,053-Speed 2513.89 samples/sec Loss 2.7078 LearningRate 0.000491 Epoch: 14 Global Step: 306610 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:35:53,256-Speed 2497.04 samples/sec Loss 2.7340 LearningRate 0.000491 Epoch: 14 Global Step: 306620 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:36:01,457-Speed 2497.60 samples/sec Loss 2.7332 LearningRate 0.000491 Epoch: 14 Global Step: 306630 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:36:09,655-Speed 2498.60 samples/sec Loss 2.7502 LearningRate 0.000491 Epoch: 14 Global Step: 306640 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:36:17,854-Speed 2498.20 samples/sec Loss 2.7094 LearningRate 0.000491 Epoch: 14 Global Step: 306650 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:36:26,054-Speed 2497.93 samples/sec Loss 2.7496 LearningRate 0.000491 Epoch: 14 Global Step: 306660 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:36:34,201-Speed 2514.27 samples/sec Loss 2.7591 LearningRate 0.000491 Epoch: 14 Global Step: 306670 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:36:42,405-Speed 2496.96 samples/sec Loss 2.8039 LearningRate 0.000490 Epoch: 14 Global Step: 306680 Fp16 
Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:36:50,605-Speed 2497.84 samples/sec Loss 2.7799 LearningRate 0.000490 Epoch: 14 Global Step: 306690 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:36:58,808-Speed 2498.23 samples/sec Loss 2.7889 LearningRate 0.000490 Epoch: 14 Global Step: 306700 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:37:07,008-Speed 2498.01 samples/sec Loss 2.7981 LearningRate 0.000490 Epoch: 14 Global Step: 306710 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:37:15,224-Speed 2493.22 samples/sec Loss 2.7817 LearningRate 0.000490 Epoch: 14 Global Step: 306720 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:37:23,370-Speed 2514.28 samples/sec Loss 2.7259 LearningRate 0.000490 Epoch: 14 Global Step: 306730 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:37:31,573-Speed 2497.07 samples/sec Loss 2.7844 LearningRate 0.000490 Epoch: 14 Global Step: 306740 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:37:39,772-Speed 2498.33 samples/sec Loss 2.7751 LearningRate 0.000490 Epoch: 14 Global Step: 306750 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:37:47,971-Speed 2498.14 samples/sec Loss 2.7723 LearningRate 0.000490 Epoch: 14 Global Step: 306760 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:37:56,171-Speed 2497.95 samples/sec Loss 2.8236 LearningRate 0.000490 Epoch: 14 Global Step: 306770 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:38:04,370-Speed 2498.20 samples/sec Loss 2.7603 LearningRate 0.000490 Epoch: 14 Global Step: 306780 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:38:12,516-Speed 2514.68 samples/sec Loss 2.6933 LearningRate 0.000490 Epoch: 14 Global Step: 306790 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:38:20,740-Speed 2490.47 samples/sec Loss 2.7307 LearningRate 0.000490 Epoch: 14 Global Step: 306800 
Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:38:28,942-Speed 2497.39 samples/sec Loss 2.7209 LearningRate 0.000490 Epoch: 14 Global Step: 306810 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:38:37,140-Speed 2498.65 samples/sec Loss 2.7289 LearningRate 0.000490 Epoch: 14 Global Step: 306820 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:38:45,351-Speed 2494.52 samples/sec Loss 2.7785 LearningRate 0.000490 Epoch: 14 Global Step: 306830 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:38:53,551-Speed 2497.76 samples/sec Loss 2.7428 LearningRate 0.000490 Epoch: 14 Global Step: 306840 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:39:01,696-Speed 2515.15 samples/sec Loss 2.6953 LearningRate 0.000490 Epoch: 14 Global Step: 306850 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:39:09,894-Speed 2498.56 samples/sec Loss 2.7260 LearningRate 0.000490 Epoch: 14 Global Step: 306860 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:39:18,094-Speed 2497.71 samples/sec Loss 2.6844 LearningRate 0.000490 Epoch: 14 Global Step: 306870 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:39:26,307-Speed 2494.43 samples/sec Loss 2.6849 LearningRate 0.000490 Epoch: 14 Global Step: 306880 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:39:34,503-Speed 2498.90 samples/sec Loss 2.7281 LearningRate 0.000490 Epoch: 14 Global Step: 306890 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:39:42,703-Speed 2498.23 samples/sec Loss 2.6820 LearningRate 0.000490 Epoch: 14 Global Step: 306900 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:39:50,850-Speed 2514.26 samples/sec Loss 2.7295 LearningRate 0.000490 Epoch: 14 Global Step: 306910 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:39:59,052-Speed 2497.25 samples/sec Loss 2.6626 LearningRate 0.000490 Epoch: 14 Global Step: 
306920 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:40:07,252-Speed 2497.95 samples/sec Loss 2.6542 LearningRate 0.000490 Epoch: 14 Global Step: 306930 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:40:15,460-Speed 2495.28 samples/sec Loss 2.6420 LearningRate 0.000490 Epoch: 14 Global Step: 306940 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:40:23,669-Speed 2495.41 samples/sec Loss 2.6715 LearningRate 0.000490 Epoch: 14 Global Step: 306950 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:40:31,869-Speed 2497.83 samples/sec Loss 2.7064 LearningRate 0.000490 Epoch: 14 Global Step: 306960 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:40:40,016-Speed 2514.31 samples/sec Loss 2.6794 LearningRate 0.000490 Epoch: 14 Global Step: 306970 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:40:48,222-Speed 2496.22 samples/sec Loss 2.7712 LearningRate 0.000490 Epoch: 14 Global Step: 306980 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:40:56,421-Speed 2498.27 samples/sec Loss 2.7134 LearningRate 0.000490 Epoch: 14 Global Step: 306990 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:41:04,618-Speed 2498.92 samples/sec Loss 2.7694 LearningRate 0.000490 Epoch: 14 Global Step: 307000 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:41:12,816-Speed 2498.57 samples/sec Loss 2.7140 LearningRate 0.000490 Epoch: 14 Global Step: 307010 Fp16 Grad Scale: 16384 Required: 120 hours Training: 2022-07-08 12:41:21,015-Speed 2498.32 samples/sec Loss 2.7598 LearningRate 0.000490 Epoch: 14 Global Step: 307020 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:41:29,161-Speed 2514.41 samples/sec Loss 2.7314 LearningRate 0.000490 Epoch: 14 Global Step: 307030 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:41:37,364-Speed 2497.10 samples/sec Loss 2.7748 LearningRate 0.000490 Epoch: 14 Global 
Step: 307040 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:41:45,561-Speed 2498.89 samples/sec Loss 2.7251 LearningRate 0.000490 Epoch: 14 Global Step: 307050 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:41:53,757-Speed 2499.15 samples/sec Loss 2.7181 LearningRate 0.000490 Epoch: 14 Global Step: 307060 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:42:01,954-Speed 2498.68 samples/sec Loss 2.7105 LearningRate 0.000490 Epoch: 14 Global Step: 307070 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:42:10,157-Speed 2497.12 samples/sec Loss 2.7229 LearningRate 0.000490 Epoch: 14 Global Step: 307080 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:42:18,299-Speed 2515.45 samples/sec Loss 2.7125 LearningRate 0.000490 Epoch: 14 Global Step: 307090 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:42:26,512-Speed 2494.31 samples/sec Loss 2.7444 LearningRate 0.000490 Epoch: 14 Global Step: 307100 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:42:34,711-Speed 2498.01 samples/sec Loss 2.7724 LearningRate 0.000490 Epoch: 14 Global Step: 307110 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:42:42,908-Speed 2499.11 samples/sec Loss 2.8449 LearningRate 0.000490 Epoch: 14 Global Step: 307120 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:42:51,103-Speed 2499.23 samples/sec Loss 2.7484 LearningRate 0.000490 Epoch: 14 Global Step: 307130 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:42:59,299-Speed 2499.40 samples/sec Loss 2.7756 LearningRate 0.000490 Epoch: 14 Global Step: 307140 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:43:07,441-Speed 2515.74 samples/sec Loss 2.7047 LearningRate 0.000490 Epoch: 14 Global Step: 307150 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:43:15,638-Speed 2498.93 samples/sec Loss 2.8097 LearningRate 0.000490 Epoch: 14 
Global Step: 307160 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:43:23,838-Speed 2498.07 samples/sec Loss 2.8133 LearningRate 0.000490 Epoch: 14 Global Step: 307170 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:43:32,036-Speed 2498.59 samples/sec Loss 2.7429 LearningRate 0.000490 Epoch: 14 Global Step: 307180 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:43:40,248-Speed 2494.18 samples/sec Loss 2.7221 LearningRate 0.000490 Epoch: 14 Global Step: 307190 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:43:48,443-Speed 2499.12 samples/sec Loss 2.7331 LearningRate 0.000490 Epoch: 14 Global Step: 307200 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:43:56,592-Speed 2513.91 samples/sec Loss 2.7028 LearningRate 0.000489 Epoch: 14 Global Step: 307210 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:44:04,790-Speed 2498.56 samples/sec Loss 2.8305 LearningRate 0.000489 Epoch: 14 Global Step: 307220 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:44:12,993-Speed 2496.92 samples/sec Loss 2.7878 LearningRate 0.000489 Epoch: 14 Global Step: 307230 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:44:21,193-Speed 2497.91 samples/sec Loss 2.7380 LearningRate 0.000489 Epoch: 14 Global Step: 307240 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:44:29,398-Speed 2496.73 samples/sec Loss 2.6923 LearningRate 0.000489 Epoch: 14 Global Step: 307250 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:44:37,605-Speed 2495.81 samples/sec Loss 2.7709 LearningRate 0.000489 Epoch: 14 Global Step: 307260 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:44:45,755-Speed 2513.31 samples/sec Loss 2.7870 LearningRate 0.000489 Epoch: 14 Global Step: 307270 Fp16 Grad Scale: 32768 Required: 120 hours Training: 2022-07-08 12:44:53,955-Speed 2497.80 samples/sec Loss 2.7728 LearningRate 0.000489 
Epoch: 14 Global Step: 307280 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:45:02,154-Speed 2498.23 samples/sec Loss 2.7660 LearningRate 0.000489 Epoch: 14 Global Step: 307290 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:45:10,366-Speed 2494.30 samples/sec Loss 2.7541 LearningRate 0.000489 Epoch: 14 Global Step: 307300 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:45:18,566-Speed 2497.98 samples/sec Loss 2.7620 LearningRate 0.000489 Epoch: 14 Global Step: 307310 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:45:26,775-Speed 2495.13 samples/sec Loss 2.7680 LearningRate 0.000489 Epoch: 14 Global Step: 307320 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:45:34,935-Speed 2510.43 samples/sec Loss 2.7559 LearningRate 0.000489 Epoch: 14 Global Step: 307330 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:45:43,147-Speed 2494.12 samples/sec Loss 2.8065 LearningRate 0.000489 Epoch: 14 Global Step: 307340 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:45:51,352-Speed 2496.58 samples/sec Loss 2.7652 LearningRate 0.000489 Epoch: 14 Global Step: 307350 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:45:59,556-Speed 2496.75 samples/sec Loss 2.7305 LearningRate 0.000489 Epoch: 14 Global Step: 307360 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:46:07,760-Speed 2496.82 samples/sec Loss 2.7883 LearningRate 0.000489 Epoch: 14 Global Step: 307370 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:46:15,960-Speed 2497.66 samples/sec Loss 2.8339 LearningRate 0.000489 Epoch: 14 Global Step: 307380 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:46:24,112-Speed 2512.70 samples/sec Loss 2.7554 LearningRate 0.000489 Epoch: 14 Global Step: 307390 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:46:32,319-Speed 2495.88 samples/sec Loss 2.6935 LearningRate 
0.000489 Epoch: 14 Global Step: 307400 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:46:40,518-Speed 2498.20 samples/sec Loss 2.6542 LearningRate 0.000489 Epoch: 14 Global Step: 307410 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:46:48,717-Speed 2498.32 samples/sec Loss 2.7319 LearningRate 0.000489 Epoch: 14 Global Step: 307420 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:46:56,925-Speed 2495.65 samples/sec Loss 2.7695 LearningRate 0.000489 Epoch: 14 Global Step: 307430 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:47:05,140-Speed 2493.68 samples/sec Loss 2.7596 LearningRate 0.000489 Epoch: 14 Global Step: 307440 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:47:13,287-Speed 2514.63 samples/sec Loss 2.7489 LearningRate 0.000489 Epoch: 14 Global Step: 307450 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:47:21,491-Speed 2496.76 samples/sec Loss 2.7169 LearningRate 0.000489 Epoch: 14 Global Step: 307460 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:47:29,699-Speed 2495.42 samples/sec Loss 2.6966 LearningRate 0.000489 Epoch: 14 Global Step: 307470 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:47:37,903-Speed 2496.86 samples/sec Loss 2.7362 LearningRate 0.000489 Epoch: 14 Global Step: 307480 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:47:46,105-Speed 2497.45 samples/sec Loss 2.7645 LearningRate 0.000489 Epoch: 14 Global Step: 307490 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:47:54,306-Speed 2497.60 samples/sec Loss 2.7374 LearningRate 0.000489 Epoch: 14 Global Step: 307500 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:48:02,470-Speed 2508.91 samples/sec Loss 2.6986 LearningRate 0.000489 Epoch: 14 Global Step: 307510 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:48:10,671-Speed 2497.66 samples/sec Loss 2.7336 
LearningRate 0.000489 Epoch: 14 Global Step: 307520 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:48:18,871-Speed 2497.97 samples/sec Loss 2.8158 LearningRate 0.000489 Epoch: 14 Global Step: 307530 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:48:27,071-Speed 2498.05 samples/sec Loss 2.8045 LearningRate 0.000489 Epoch: 14 Global Step: 307540 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:48:35,273-Speed 2497.30 samples/sec Loss 2.8089 LearningRate 0.000489 Epoch: 14 Global Step: 307550 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:48:43,474-Speed 2497.81 samples/sec Loss 2.7681 LearningRate 0.000489 Epoch: 14 Global Step: 307560 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:48:51,620-Speed 2514.25 samples/sec Loss 2.8085 LearningRate 0.000489 Epoch: 14 Global Step: 307570 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:48:59,821-Speed 2497.61 samples/sec Loss 2.6974 LearningRate 0.000489 Epoch: 14 Global Step: 307580 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:49:08,029-Speed 2495.61 samples/sec Loss 2.7558 LearningRate 0.000489 Epoch: 14 Global Step: 307590 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:49:16,227-Speed 2498.55 samples/sec Loss 2.7514 LearningRate 0.000489 Epoch: 14 Global Step: 307600 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:49:24,427-Speed 2498.08 samples/sec Loss 2.7141 LearningRate 0.000489 Epoch: 14 Global Step: 307610 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:49:32,631-Speed 2496.85 samples/sec Loss 2.6846 LearningRate 0.000489 Epoch: 14 Global Step: 307620 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:49:40,777-Speed 2514.34 samples/sec Loss 2.6909 LearningRate 0.000489 Epoch: 14 Global Step: 307630 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:49:48,977-Speed 2498.17 samples/sec Loss 
2.7293 LearningRate 0.000489 Epoch: 14 Global Step: 307640 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:49:57,181-Speed 2496.49 samples/sec Loss 2.7129 LearningRate 0.000489 Epoch: 14 Global Step: 307650 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:50:05,381-Speed 2498.18 samples/sec Loss 2.7243 LearningRate 0.000489 Epoch: 14 Global Step: 307660 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:50:13,579-Speed 2498.42 samples/sec Loss 2.7273 LearningRate 0.000489 Epoch: 14 Global Step: 307670 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:50:21,791-Speed 2494.30 samples/sec Loss 2.7351 LearningRate 0.000489 Epoch: 14 Global Step: 307680 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:50:29,934-Speed 2515.44 samples/sec Loss 2.7730 LearningRate 0.000489 Epoch: 14 Global Step: 307690 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:50:38,136-Speed 2497.27 samples/sec Loss 2.6988 LearningRate 0.000489 Epoch: 14 Global Step: 307700 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:50:46,336-Speed 2498.39 samples/sec Loss 2.6963 LearningRate 0.000489 Epoch: 14 Global Step: 307710 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:50:54,542-Speed 2496.49 samples/sec Loss 2.7584 LearningRate 0.000489 Epoch: 14 Global Step: 307720 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:51:02,746-Speed 2496.50 samples/sec Loss 2.7152 LearningRate 0.000489 Epoch: 14 Global Step: 307730 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:51:10,947-Speed 2497.83 samples/sec Loss 2.6890 LearningRate 0.000488 Epoch: 14 Global Step: 307740 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:51:19,106-Speed 2510.39 samples/sec Loss 2.7159 LearningRate 0.000488 Epoch: 14 Global Step: 307750 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:51:27,308-Speed 2497.38 samples/sec 
Loss 2.7292 LearningRate 0.000488 Epoch: 14 Global Step: 307760 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:51:35,511-Speed 2496.99 samples/sec Loss 2.7018 LearningRate 0.000488 Epoch: 14 Global Step: 307770 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:51:43,714-Speed 2496.99 samples/sec Loss 2.7349 LearningRate 0.000488 Epoch: 14 Global Step: 307780 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:51:51,916-Speed 2497.60 samples/sec Loss 2.7075 LearningRate 0.000488 Epoch: 14 Global Step: 307790 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:52:00,116-Speed 2498.03 samples/sec Loss 2.7118 LearningRate 0.000488 Epoch: 14 Global Step: 307800 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:52:08,270-Speed 2512.14 samples/sec Loss 2.7546 LearningRate 0.000488 Epoch: 14 Global Step: 307810 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:52:16,467-Speed 2498.56 samples/sec Loss 2.7115 LearningRate 0.000488 Epoch: 14 Global Step: 307820 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 12:52:24,627-Speed 2510.21 samples/sec Loss 2.6893 LearningRate 0.000488 Epoch: 14 Global Step: 307830 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:52:32,833-Speed 2496.40 samples/sec Loss 2.7258 LearningRate 0.000488 Epoch: 14 Global Step: 307840 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:52:41,039-Speed 2495.95 samples/sec Loss 2.7205 LearningRate 0.000488 Epoch: 14 Global Step: 307850 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:52:49,241-Speed 2497.77 samples/sec Loss 2.7373 LearningRate 0.000488 Epoch: 14 Global Step: 307860 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:52:57,398-Speed 2511.03 samples/sec Loss 2.7216 LearningRate 0.000488 Epoch: 14 Global Step: 307870 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:53:05,601-Speed 2497.02 
samples/sec Loss 2.6962 LearningRate 0.000488 Epoch: 14 Global Step: 307880 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:53:13,799-Speed 2498.76 samples/sec Loss 2.7217 LearningRate 0.000488 Epoch: 14 Global Step: 307890 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:53:22,000-Speed 2497.51 samples/sec Loss 2.7034 LearningRate 0.000488 Epoch: 14 Global Step: 307900 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:53:30,204-Speed 2496.87 samples/sec Loss 2.7255 LearningRate 0.000488 Epoch: 14 Global Step: 307910 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:53:38,402-Speed 2498.47 samples/sec Loss 2.7608 LearningRate 0.000488 Epoch: 14 Global Step: 307920 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:53:46,576-Speed 2506.08 samples/sec Loss 2.7031 LearningRate 0.000488 Epoch: 14 Global Step: 307930 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:53:54,776-Speed 2498.06 samples/sec Loss 2.6430 LearningRate 0.000488 Epoch: 14 Global Step: 307940 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:54:02,985-Speed 2495.43 samples/sec Loss 2.7305 LearningRate 0.000488 Epoch: 14 Global Step: 307950 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:54:11,184-Speed 2498.20 samples/sec Loss 2.7783 LearningRate 0.000488 Epoch: 14 Global Step: 307960 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:54:19,406-Speed 2491.19 samples/sec Loss 2.6987 LearningRate 0.000488 Epoch: 14 Global Step: 307970 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:54:27,611-Speed 2496.70 samples/sec Loss 2.7508 LearningRate 0.000488 Epoch: 14 Global Step: 307980 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:54:35,760-Speed 2513.40 samples/sec Loss 2.6833 LearningRate 0.000488 Epoch: 14 Global Step: 307990 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:54:43,972-Speed 
2494.24 samples/sec Loss 2.7375 LearningRate 0.000488 Epoch: 14 Global Step: 308000 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:54:52,174-Speed 2498.04 samples/sec Loss 2.7062 LearningRate 0.000488 Epoch: 14 Global Step: 308010 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:55:00,383-Speed 2495.31 samples/sec Loss 2.6880 LearningRate 0.000488 Epoch: 14 Global Step: 308020 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:55:08,585-Speed 2497.15 samples/sec Loss 2.7142 LearningRate 0.000488 Epoch: 14 Global Step: 308030 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:55:16,795-Speed 2495.36 samples/sec Loss 2.7086 LearningRate 0.000488 Epoch: 14 Global Step: 308040 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:55:24,947-Speed 2512.66 samples/sec Loss 2.6876 LearningRate 0.000488 Epoch: 14 Global Step: 308050 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:55:33,152-Speed 2496.69 samples/sec Loss 2.6746 LearningRate 0.000488 Epoch: 14 Global Step: 308060 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:55:41,355-Speed 2496.86 samples/sec Loss 2.7125 LearningRate 0.000488 Epoch: 14 Global Step: 308070 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:55:49,568-Speed 2493.92 samples/sec Loss 2.7067 LearningRate 0.000488 Epoch: 14 Global Step: 308080 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:55:57,769-Speed 2497.83 samples/sec Loss 2.7125 LearningRate 0.000488 Epoch: 14 Global Step: 308090 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:56:05,984-Speed 2493.32 samples/sec Loss 2.7425 LearningRate 0.000488 Epoch: 14 Global Step: 308100 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:56:14,136-Speed 2512.71 samples/sec Loss 2.7431 LearningRate 0.000488 Epoch: 14 Global Step: 308110 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 
12:56:22,339-Speed 2497.11 samples/sec Loss 2.6872 LearningRate 0.000488 Epoch: 14 Global Step: 308120 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:56:30,541-Speed 2497.39 samples/sec Loss 2.7479 LearningRate 0.000488 Epoch: 14 Global Step: 308130 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:56:38,755-Speed 2493.82 samples/sec Loss 2.6511 LearningRate 0.000488 Epoch: 14 Global Step: 308140 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:56:46,957-Speed 2497.09 samples/sec Loss 2.7102 LearningRate 0.000488 Epoch: 14 Global Step: 308150 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:56:55,160-Speed 2496.81 samples/sec Loss 2.6931 LearningRate 0.000488 Epoch: 14 Global Step: 308160 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:57:03,348-Speed 2514.75 samples/sec Loss 2.7411 LearningRate 0.000488 Epoch: 14 Global Step: 308170 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:59:01,274-Speed 173.71 samples/sec Loss 2.7012 LearningRate 0.000488 Epoch: 14 Global Step: 308180 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:59:09,545-Speed 2495.91 samples/sec Loss 2.7776 LearningRate 0.000488 Epoch: 14 Global Step: 308190 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:59:22,563-Speed 1576.53 samples/sec Loss 2.6685 LearningRate 0.000488 Epoch: 14 Global Step: 308200 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:59:30,762-Speed 2503.18 samples/sec Loss 2.6696 LearningRate 0.000488 Epoch: 14 Global Step: 308210 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:59:38,960-Speed 2498.65 samples/sec Loss 2.7236 LearningRate 0.000488 Epoch: 14 Global Step: 308220 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 12:59:47,130-Speed 2516.24 samples/sec Loss 2.7430 LearningRate 0.000488 Epoch: 14 Global Step: 308230 Fp16 Grad Scale: 16384 Required: 119 hours Training: 
2022-07-08 12:59:55,564-Speed 2499.60 samples/sec Loss 2.7120 LearningRate 0.000488 Epoch: 14 Global Step: 308240 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:00:03,919-Speed 2451.65 samples/sec Loss 2.7067 LearningRate 0.000488 Epoch: 14 Global Step: 308250 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:00:15,977-Speed 1728.24 samples/sec Loss 2.6603 LearningRate 0.000488 Epoch: 14 Global Step: 308260 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:00:24,202-Speed 2502.76 samples/sec Loss 2.6980 LearningRate 0.000488 Epoch: 14 Global Step: 308270 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:00:32,499-Speed 2501.46 samples/sec Loss 2.7278 LearningRate 0.000487 Epoch: 14 Global Step: 308280 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:00:44,044-Speed 1774.10 samples/sec Loss 2.7095 LearningRate 0.000487 Epoch: 14 Global Step: 308290 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:00:57,294-Speed 2493.11 samples/sec Loss 2.7075 LearningRate 0.000487 Epoch: 14 Global Step: 308300 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:01:05,968-Speed 2361.22 samples/sec Loss 2.6587 LearningRate 0.000487 Epoch: 14 Global Step: 308310 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:01:14,168-Speed 2498.07 samples/sec Loss 2.7112 LearningRate 0.000487 Epoch: 14 Global Step: 308320 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:01:22,371-Speed 2496.77 samples/sec Loss 2.7430 LearningRate 0.000487 Epoch: 14 Global Step: 308330 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:01:30,600-Speed 2489.30 samples/sec Loss 2.7474 LearningRate 0.000487 Epoch: 14 Global Step: 308340 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:01:38,763-Speed 2509.25 samples/sec Loss 2.6600 LearningRate 0.000487 Epoch: 14 Global Step: 308350 Fp16 Grad Scale: 16384 Required: 119 hours 
Training: 2022-07-08 13:01:46,984-Speed 2491.66 samples/sec Loss 2.7169 LearningRate 0.000487 Epoch: 14 Global Step: 308360 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:01:55,208-Speed 2490.50 samples/sec Loss 2.6938 LearningRate 0.000487 Epoch: 14 Global Step: 308370 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:02:03,435-Speed 2489.94 samples/sec Loss 2.6942 LearningRate 0.000487 Epoch: 14 Global Step: 308380 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:02:11,666-Speed 2488.51 samples/sec Loss 2.6904 LearningRate 0.000487 Epoch: 14 Global Step: 308390 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:02:19,902-Speed 2486.87 samples/sec Loss 2.6341 LearningRate 0.000487 Epoch: 14 Global Step: 308400 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:02:28,074-Speed 2506.79 samples/sec Loss 2.6562 LearningRate 0.000487 Epoch: 14 Global Step: 308410 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:02:36,291-Speed 2492.84 samples/sec Loss 2.7158 LearningRate 0.000487 Epoch: 14 Global Step: 308420 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:02:44,511-Speed 2491.95 samples/sec Loss 2.7777 LearningRate 0.000487 Epoch: 14 Global Step: 308430 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:02:52,716-Speed 2496.32 samples/sec Loss 2.7670 LearningRate 0.000487 Epoch: 14 Global Step: 308440 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:03:00,945-Speed 2489.09 samples/sec Loss 2.7427 LearningRate 0.000487 Epoch: 14 Global Step: 308450 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:03:09,179-Speed 2487.91 samples/sec Loss 2.7061 LearningRate 0.000487 Epoch: 14 Global Step: 308460 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:03:17,326-Speed 2514.24 samples/sec Loss 2.6863 LearningRate 0.000487 Epoch: 14 Global Step: 308470 Fp16 Grad Scale: 16384 Required: 119 
hours Training: 2022-07-08 13:03:25,534-Speed 2495.63 samples/sec Loss 2.7098 LearningRate 0.000487 Epoch: 14 Global Step: 308480 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:03:33,736-Speed 2497.24 samples/sec Loss 2.7019 LearningRate 0.000487 Epoch: 14 Global Step: 308490 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:03:41,953-Speed 2493.15 samples/sec Loss 2.6686 LearningRate 0.000487 Epoch: 14 Global Step: 308500 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:03:50,156-Speed 2497.04 samples/sec Loss 2.6771 LearningRate 0.000487 Epoch: 14 Global Step: 308510 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:03:58,364-Speed 2495.25 samples/sec Loss 2.7063 LearningRate 0.000487 Epoch: 14 Global Step: 308520 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:04:06,535-Speed 2507.01 samples/sec Loss 2.7454 LearningRate 0.000487 Epoch: 14 Global Step: 308530 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:04:14,745-Speed 2495.13 samples/sec Loss 2.6869 LearningRate 0.000487 Epoch: 14 Global Step: 308540 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:04:22,954-Speed 2494.81 samples/sec Loss 2.7531 LearningRate 0.000487 Epoch: 14 Global Step: 308550 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:04:31,161-Speed 2495.88 samples/sec Loss 2.7836 LearningRate 0.000487 Epoch: 14 Global Step: 308560 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:04:39,369-Speed 2495.51 samples/sec Loss 2.6954 LearningRate 0.000487 Epoch: 14 Global Step: 308570 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:04:47,590-Speed 2491.55 samples/sec Loss 2.7691 LearningRate 0.000487 Epoch: 14 Global Step: 308580 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:04:55,764-Speed 2505.98 samples/sec Loss 2.6928 LearningRate 0.000487 Epoch: 14 Global Step: 308590 Fp16 Grad Scale: 16384 Required: 
119 hours Training: 2022-07-08 13:05:03,971-Speed 2495.75 samples/sec Loss 2.6944 LearningRate 0.000487 Epoch: 14 Global Step: 308600 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:05:12,178-Speed 2496.25 samples/sec Loss 2.7073 LearningRate 0.000487 Epoch: 14 Global Step: 308610 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:05:20,384-Speed 2496.09 samples/sec Loss 2.6976 LearningRate 0.000487 Epoch: 14 Global Step: 308620 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:05:28,587-Speed 2496.80 samples/sec Loss 2.7532 LearningRate 0.000487 Epoch: 14 Global Step: 308630 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:05:36,796-Speed 2495.18 samples/sec Loss 2.7054 LearningRate 0.000487 Epoch: 14 Global Step: 308640 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:05:44,951-Speed 2511.99 samples/sec Loss 2.7850 LearningRate 0.000487 Epoch: 14 Global Step: 308650 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:05:53,163-Speed 2494.35 samples/sec Loss 2.6929 LearningRate 0.000487 Epoch: 14 Global Step: 308660 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:06:01,368-Speed 2496.26 samples/sec Loss 2.7060 LearningRate 0.000487 Epoch: 14 Global Step: 308670 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:06:09,578-Speed 2494.93 samples/sec Loss 2.6820 LearningRate 0.000487 Epoch: 14 Global Step: 308680 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:06:17,792-Speed 2493.76 samples/sec Loss 2.6701 LearningRate 0.000487 Epoch: 14 Global Step: 308690 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:06:26,002-Speed 2495.09 samples/sec Loss 2.7036 LearningRate 0.000487 Epoch: 14 Global Step: 308700 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:06:34,157-Speed 2511.61 samples/sec Loss 2.7350 LearningRate 0.000487 Epoch: 14 Global Step: 308710 Fp16 Grad Scale: 16384 
Required: 119 hours Training: 2022-07-08 13:06:42,359-Speed 2497.28 samples/sec Loss 2.7541 LearningRate 0.000487 Epoch: 14 Global Step: 308720 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:06:50,567-Speed 2495.71 samples/sec Loss 2.6527 LearningRate 0.000487 Epoch: 14 Global Step: 308730 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:06:58,786-Speed 2492.12 samples/sec Loss 2.7201 LearningRate 0.000487 Epoch: 14 Global Step: 308740 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:07:06,992-Speed 2496.18 samples/sec Loss 2.7168 LearningRate 0.000487 Epoch: 14 Global Step: 308750 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:07:15,217-Speed 2490.41 samples/sec Loss 2.6834 LearningRate 0.000487 Epoch: 14 Global Step: 308760 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:07:23,371-Speed 2512.09 samples/sec Loss 2.7228 LearningRate 0.000487 Epoch: 14 Global Step: 308770 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:07:31,582-Speed 2494.69 samples/sec Loss 2.7487 LearningRate 0.000487 Epoch: 14 Global Step: 308780 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:07:39,788-Speed 2495.93 samples/sec Loss 2.7300 LearningRate 0.000487 Epoch: 14 Global Step: 308790 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:07:47,995-Speed 2496.06 samples/sec Loss 2.7479 LearningRate 0.000487 Epoch: 14 Global Step: 308800 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:07:56,207-Speed 2494.23 samples/sec Loss 2.6772 LearningRate 0.000486 Epoch: 14 Global Step: 308810 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:08:04,412-Speed 2496.39 samples/sec Loss 2.7233 LearningRate 0.000486 Epoch: 14 Global Step: 308820 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:08:12,567-Speed 2512.03 samples/sec Loss 2.7304 LearningRate 0.000486 Epoch: 14 Global Step: 308830 Fp16 Grad Scale: 
16384 Required: 119 hours Training: 2022-07-08 13:08:20,776-Speed 2495.06 samples/sec Loss 2.6774 LearningRate 0.000486 Epoch: 14 Global Step: 308840 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:08:28,982-Speed 2496.11 samples/sec Loss 2.6764 LearningRate 0.000486 Epoch: 14 Global Step: 308850 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:08:37,186-Speed 2496.70 samples/sec Loss 2.7092 LearningRate 0.000486 Epoch: 14 Global Step: 308860 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:08:45,400-Speed 2493.49 samples/sec Loss 2.7366 LearningRate 0.000486 Epoch: 14 Global Step: 308870 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:08:53,608-Speed 2495.52 samples/sec Loss 2.6655 LearningRate 0.000486 Epoch: 14 Global Step: 308880 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:09:01,762-Speed 2512.12 samples/sec Loss 2.7500 LearningRate 0.000486 Epoch: 14 Global Step: 308890 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:09:09,970-Speed 2495.57 samples/sec Loss 2.6773 LearningRate 0.000486 Epoch: 14 Global Step: 308900 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:09:18,177-Speed 2495.81 samples/sec Loss 2.7240 LearningRate 0.000486 Epoch: 14 Global Step: 308910 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:09:26,382-Speed 2496.42 samples/sec Loss 2.7245 LearningRate 0.000486 Epoch: 14 Global Step: 308920 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:09:34,586-Speed 2496.89 samples/sec Loss 2.7066 LearningRate 0.000486 Epoch: 14 Global Step: 308930 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:09:42,818-Speed 2488.25 samples/sec Loss 2.7243 LearningRate 0.000486 Epoch: 14 Global Step: 308940 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:09:50,972-Speed 2512.05 samples/sec Loss 2.6912 LearningRate 0.000486 Epoch: 14 Global Step: 308950 Fp16 Grad 
Scale: 16384 Required: 119 hours Training: 2022-07-08 13:09:59,192-Speed 2491.81 samples/sec Loss 2.7071 LearningRate 0.000486 Epoch: 14 Global Step: 308960 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:10:07,395-Speed 2497.24 samples/sec Loss 2.6954 LearningRate 0.000486 Epoch: 14 Global Step: 308970 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:10:15,604-Speed 2495.19 samples/sec Loss 2.7070 LearningRate 0.000486 Epoch: 14 Global Step: 308980 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:10:23,809-Speed 2496.34 samples/sec Loss 2.7133 LearningRate 0.000486 Epoch: 14 Global Step: 308990 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:10:32,022-Speed 2494.00 samples/sec Loss 2.7261 LearningRate 0.000486 Epoch: 14 Global Step: 309000 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:10:40,177-Speed 2511.54 samples/sec Loss 2.7238 LearningRate 0.000486 Epoch: 14 Global Step: 309010 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:10:48,380-Speed 2497.19 samples/sec Loss 2.7118 LearningRate 0.000486 Epoch: 14 Global Step: 309020 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:10:56,584-Speed 2496.63 samples/sec Loss 2.7082 LearningRate 0.000486 Epoch: 14 Global Step: 309030 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:11:04,788-Speed 2496.61 samples/sec Loss 2.6395 LearningRate 0.000486 Epoch: 14 Global Step: 309040 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:11:12,991-Speed 2496.98 samples/sec Loss 2.7209 LearningRate 0.000486 Epoch: 14 Global Step: 309050 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:11:21,195-Speed 2496.66 samples/sec Loss 2.7227 LearningRate 0.000486 Epoch: 14 Global Step: 309060 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:11:29,361-Speed 2508.45 samples/sec Loss 2.7577 LearningRate 0.000486 Epoch: 14 Global Step: 309070 Fp16 
Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:11:37,566-Speed 2496.48 samples/sec Loss 2.7457 LearningRate 0.000486 Epoch: 14 Global Step: 309080 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:11:45,769-Speed 2496.92 samples/sec Loss 2.7225 LearningRate 0.000486 Epoch: 14 Global Step: 309090 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:11:53,974-Speed 2496.51 samples/sec Loss 2.7286 LearningRate 0.000486 Epoch: 14 Global Step: 309100 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:12:02,181-Speed 2495.80 samples/sec Loss 2.7109 LearningRate 0.000486 Epoch: 14 Global Step: 309110 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:12:10,399-Speed 2492.26 samples/sec Loss 2.6893 LearningRate 0.000486 Epoch: 14 Global Step: 309120 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:12:18,552-Speed 2512.48 samples/sec Loss 2.7490 LearningRate 0.000486 Epoch: 14 Global Step: 309130 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:12:26,754-Speed 2497.26 samples/sec Loss 2.7028 LearningRate 0.000486 Epoch: 14 Global Step: 309140 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:12:34,960-Speed 2496.26 samples/sec Loss 2.7213 LearningRate 0.000486 Epoch: 14 Global Step: 309150 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:12:43,173-Speed 2494.11 samples/sec Loss 2.6353 LearningRate 0.000486 Epoch: 14 Global Step: 309160 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:12:51,377-Speed 2496.42 samples/sec Loss 2.7243 LearningRate 0.000486 Epoch: 14 Global Step: 309170 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:12:59,580-Speed 2497.25 samples/sec Loss 2.6752 LearningRate 0.000486 Epoch: 14 Global Step: 309180 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:13:07,734-Speed 2511.81 samples/sec Loss 2.7555 LearningRate 0.000486 Epoch: 14 Global Step: 309190 
Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:13:15,971-Speed 2486.85 samples/sec Loss 2.7954 LearningRate 0.000486 Epoch: 14 Global Step: 309200 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:13:24,174-Speed 2496.71 samples/sec Loss 2.7411 LearningRate 0.000486 Epoch: 14 Global Step: 309210 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:13:32,393-Speed 2492.45 samples/sec Loss 2.8166 LearningRate 0.000486 Epoch: 14 Global Step: 309220 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:13:40,597-Speed 2496.89 samples/sec Loss 2.7183 LearningRate 0.000486 Epoch: 14 Global Step: 309230 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:13:48,803-Speed 2496.13 samples/sec Loss 2.7939 LearningRate 0.000486 Epoch: 14 Global Step: 309240 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:13:56,960-Speed 2511.01 samples/sec Loss 2.7168 LearningRate 0.000486 Epoch: 14 Global Step: 309250 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:14:05,166-Speed 2496.34 samples/sec Loss 2.6719 LearningRate 0.000486 Epoch: 14 Global Step: 309260 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:14:13,373-Speed 2496.00 samples/sec Loss 2.7070 LearningRate 0.000486 Epoch: 14 Global Step: 309270 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:14:21,579-Speed 2496.03 samples/sec Loss 2.6812 LearningRate 0.000486 Epoch: 14 Global Step: 309280 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:14:29,783-Speed 2496.74 samples/sec Loss 2.6932 LearningRate 0.000486 Epoch: 14 Global Step: 309290 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:14:37,991-Speed 2495.50 samples/sec Loss 2.7208 LearningRate 0.000486 Epoch: 14 Global Step: 309300 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:14:46,165-Speed 2505.78 samples/sec Loss 2.6880 LearningRate 0.000486 Epoch: 14 Global Step: 
309310 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:14:54,369-Speed 2496.79 samples/sec Loss 2.7509 LearningRate 0.000486 Epoch: 14 Global Step: 309320 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:15:02,575-Speed 2495.87 samples/sec Loss 2.7812 LearningRate 0.000486 Epoch: 14 Global Step: 309330 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:15:10,779-Speed 2496.80 samples/sec Loss 2.7558 LearningRate 0.000486 Epoch: 14 Global Step: 309340 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:15:18,988-Speed 2495.43 samples/sec Loss 2.7189 LearningRate 0.000485 Epoch: 14 Global Step: 309350 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:15:27,193-Speed 2496.05 samples/sec Loss 2.6845 LearningRate 0.000485 Epoch: 14 Global Step: 309360 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:15:35,343-Speed 2513.22 samples/sec Loss 2.7332 LearningRate 0.000485 Epoch: 14 Global Step: 309370 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:15:43,556-Speed 2494.00 samples/sec Loss 2.6963 LearningRate 0.000485 Epoch: 14 Global Step: 309380 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:15:51,759-Speed 2497.41 samples/sec Loss 2.7733 LearningRate 0.000485 Epoch: 14 Global Step: 309390 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:16:00,003-Speed 2484.51 samples/sec Loss 2.7112 LearningRate 0.000485 Epoch: 14 Global Step: 309400 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:16:08,210-Speed 2495.76 samples/sec Loss 2.7497 LearningRate 0.000485 Epoch: 14 Global Step: 309410 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:16:16,417-Speed 2495.63 samples/sec Loss 2.6507 LearningRate 0.000485 Epoch: 14 Global Step: 309420 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:16:24,580-Speed 2514.34 samples/sec Loss 2.6309 LearningRate 0.000485 Epoch: 14 Global 
Step: 309430 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:16:32,785-Speed 2496.03 samples/sec Loss 2.7316 LearningRate 0.000485 Epoch: 14 Global Step: 309440 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:16:41,003-Speed 2492.50 samples/sec Loss 2.6974 LearningRate 0.000485 Epoch: 14 Global Step: 309450 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:16:49,222-Speed 2492.02 samples/sec Loss 2.7485 LearningRate 0.000485 Epoch: 14 Global Step: 309460 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:16:57,428-Speed 2496.14 samples/sec Loss 2.7328 LearningRate 0.000485 Epoch: 14 Global Step: 309470 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:17:05,632-Speed 2496.71 samples/sec Loss 2.6826 LearningRate 0.000485 Epoch: 14 Global Step: 309480 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:17:13,786-Speed 2511.90 samples/sec Loss 2.6742 LearningRate 0.000485 Epoch: 14 Global Step: 309490 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:17:21,995-Speed 2495.24 samples/sec Loss 2.6399 LearningRate 0.000485 Epoch: 14 Global Step: 309500 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:17:30,211-Speed 2493.16 samples/sec Loss 2.6714 LearningRate 0.000485 Epoch: 14 Global Step: 309510 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:17:38,415-Speed 2496.59 samples/sec Loss 2.6659 LearningRate 0.000485 Epoch: 14 Global Step: 309520 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:17:46,620-Speed 2496.76 samples/sec Loss 2.6958 LearningRate 0.000485 Epoch: 14 Global Step: 309530 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:17:54,823-Speed 2496.87 samples/sec Loss 2.7318 LearningRate 0.000485 Epoch: 14 Global Step: 309540 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:18:02,974-Speed 2512.98 samples/sec Loss 2.7473 LearningRate 0.000485 Epoch: 14 
Global Step: 309550 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:18:11,175-Speed 2497.97 samples/sec Loss 2.7296 LearningRate 0.000485 Epoch: 14 Global Step: 309560 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:18:19,377-Speed 2497.63 samples/sec Loss 2.7828 LearningRate 0.000485 Epoch: 14 Global Step: 309570 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:18:27,583-Speed 2496.05 samples/sec Loss 2.7467 LearningRate 0.000485 Epoch: 14 Global Step: 309580 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:18:35,786-Speed 2497.08 samples/sec Loss 2.7053 LearningRate 0.000485 Epoch: 14 Global Step: 309590 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:18:44,004-Speed 2492.36 samples/sec Loss 2.7147 LearningRate 0.000485 Epoch: 14 Global Step: 309600 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:18:52,154-Speed 2513.51 samples/sec Loss 2.7229 LearningRate 0.000485 Epoch: 14 Global Step: 309610 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:19:00,327-Speed 2506.25 samples/sec Loss 2.7638 LearningRate 0.000485 Epoch: 14 Global Step: 309620 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:19:08,533-Speed 2495.95 samples/sec Loss 2.7132 LearningRate 0.000485 Epoch: 14 Global Step: 309630 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:19:16,738-Speed 2496.77 samples/sec Loss 2.7870 LearningRate 0.000485 Epoch: 14 Global Step: 309640 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:19:24,960-Speed 2491.15 samples/sec Loss 2.7151 LearningRate 0.000485 Epoch: 14 Global Step: 309650 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:19:33,162-Speed 2497.45 samples/sec Loss 2.7043 LearningRate 0.000485 Epoch: 14 Global Step: 309660 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:19:41,311-Speed 2513.35 samples/sec Loss 2.7430 LearningRate 0.000485 
Epoch: 14 Global Step: 309670 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:19:49,515-Speed 2496.80 samples/sec Loss 2.7477 LearningRate 0.000485 Epoch: 14 Global Step: 309680 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:19:57,726-Speed 2494.35 samples/sec Loss 2.6861 LearningRate 0.000485 Epoch: 14 Global Step: 309690 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:20:05,932-Speed 2496.17 samples/sec Loss 2.7358 LearningRate 0.000485 Epoch: 14 Global Step: 309700 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:20:14,135-Speed 2497.45 samples/sec Loss 2.7667 LearningRate 0.000485 Epoch: 14 Global Step: 309710 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:20:22,354-Speed 2491.97 samples/sec Loss 2.7067 LearningRate 0.000485 Epoch: 14 Global Step: 309720 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:20:30,507-Speed 2512.23 samples/sec Loss 2.6613 LearningRate 0.000485 Epoch: 14 Global Step: 309730 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:20:38,710-Speed 2496.97 samples/sec Loss 2.7619 LearningRate 0.000485 Epoch: 14 Global Step: 309740 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:20:46,919-Speed 2495.53 samples/sec Loss 2.7350 LearningRate 0.000485 Epoch: 14 Global Step: 309750 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:20:55,120-Speed 2497.55 samples/sec Loss 2.7260 LearningRate 0.000485 Epoch: 14 Global Step: 309760 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:21:03,323-Speed 2496.90 samples/sec Loss 2.7429 LearningRate 0.000485 Epoch: 14 Global Step: 309770 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:21:11,524-Speed 2497.68 samples/sec Loss 2.6918 LearningRate 0.000485 Epoch: 14 Global Step: 309780 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:21:19,680-Speed 2511.24 samples/sec Loss 2.7209 LearningRate 
0.000485 Epoch: 14 Global Step: 309790 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:21:27,886-Speed 2496.18 samples/sec Loss 2.7047 LearningRate 0.000485 Epoch: 14 Global Step: 309800 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:21:36,092-Speed 2496.33 samples/sec Loss 2.7366 LearningRate 0.000485 Epoch: 14 Global Step: 309810 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:21:44,308-Speed 2492.87 samples/sec Loss 2.7843 LearningRate 0.000485 Epoch: 14 Global Step: 309820 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:21:52,511-Speed 2496.97 samples/sec Loss 2.7164 LearningRate 0.000485 Epoch: 14 Global Step: 309830 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:22:00,721-Speed 2494.98 samples/sec Loss 2.8205 LearningRate 0.000485 Epoch: 14 Global Step: 309840 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:22:08,868-Speed 2514.17 samples/sec Loss 2.7156 LearningRate 0.000485 Epoch: 14 Global Step: 309850 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:22:17,070-Speed 2497.36 samples/sec Loss 2.7421 LearningRate 0.000485 Epoch: 14 Global Step: 309860 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:22:25,275-Speed 2496.36 samples/sec Loss 2.7129 LearningRate 0.000485 Epoch: 14 Global Step: 309870 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:22:33,475-Speed 2498.02 samples/sec Loss 2.7365 LearningRate 0.000484 Epoch: 14 Global Step: 309880 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:22:41,680-Speed 2496.52 samples/sec Loss 2.7706 LearningRate 0.000484 Epoch: 14 Global Step: 309890 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:22:49,882-Speed 2497.39 samples/sec Loss 2.7404 LearningRate 0.000484 Epoch: 14 Global Step: 309900 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:22:58,032-Speed 2513.20 samples/sec Loss 2.7300 
LearningRate 0.000484 Epoch: 14 Global Step: 309910 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:23:06,239-Speed 2495.89 samples/sec Loss 2.6857 LearningRate 0.000484 Epoch: 14 Global Step: 309920 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:23:14,447-Speed 2495.46 samples/sec Loss 2.7245 LearningRate 0.000484 Epoch: 14 Global Step: 309930 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:23:22,651-Speed 2496.52 samples/sec Loss 2.7117 LearningRate 0.000484 Epoch: 14 Global Step: 309940 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:23:30,858-Speed 2495.92 samples/sec Loss 2.6877 LearningRate 0.000484 Epoch: 14 Global Step: 309950 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:23:39,060-Speed 2497.24 samples/sec Loss 2.7425 LearningRate 0.000484 Epoch: 14 Global Step: 309960 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:23:47,211-Speed 2512.86 samples/sec Loss 2.6807 LearningRate 0.000484 Epoch: 14 Global Step: 309970 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:23:55,421-Speed 2494.91 samples/sec Loss 2.7677 LearningRate 0.000484 Epoch: 14 Global Step: 309980 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:24:03,630-Speed 2495.34 samples/sec Loss 2.7024 LearningRate 0.000484 Epoch: 14 Global Step: 309990 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:24:11,833-Speed 2497.46 samples/sec Loss 2.7070 LearningRate 0.000484 Epoch: 14 Global Step: 310000 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:24:20,039-Speed 2496.19 samples/sec Loss 2.6968 LearningRate 0.000484 Epoch: 14 Global Step: 310010 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:24:28,241-Speed 2497.16 samples/sec Loss 2.7020 LearningRate 0.000484 Epoch: 14 Global Step: 310020 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:24:36,394-Speed 2512.47 samples/sec Loss 
2.7625 LearningRate 0.000484 Epoch: 14 Global Step: 310030 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:24:44,595-Speed 2497.63 samples/sec Loss 2.7184 LearningRate 0.000484 Epoch: 14 Global Step: 310040 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:24:52,798-Speed 2497.07 samples/sec Loss 2.7542 LearningRate 0.000484 Epoch: 14 Global Step: 310050 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:25:01,002-Speed 2496.70 samples/sec Loss 2.7831 LearningRate 0.000484 Epoch: 14 Global Step: 310060 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:25:09,205-Speed 2497.29 samples/sec Loss 2.6921 LearningRate 0.000484 Epoch: 14 Global Step: 310070 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:25:17,420-Speed 2493.35 samples/sec Loss 2.6650 LearningRate 0.000484 Epoch: 14 Global Step: 310080 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:25:25,570-Speed 2513.06 samples/sec Loss 2.7273 LearningRate 0.000484 Epoch: 14 Global Step: 310090 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:25:33,776-Speed 2496.30 samples/sec Loss 2.7284 LearningRate 0.000484 Epoch: 14 Global Step: 310100 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:25:41,984-Speed 2495.49 samples/sec Loss 2.6890 LearningRate 0.000484 Epoch: 14 Global Step: 310110 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:25:50,189-Speed 2496.56 samples/sec Loss 2.6936 LearningRate 0.000484 Epoch: 14 Global Step: 310120 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:25:58,395-Speed 2496.08 samples/sec Loss 2.7043 LearningRate 0.000484 Epoch: 14 Global Step: 310130 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:26:06,605-Speed 2494.92 samples/sec Loss 2.7656 LearningRate 0.000484 Epoch: 14 Global Step: 310140 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:26:14,758-Speed 2512.58 samples/sec 
Loss 2.7587 LearningRate 0.000484 Epoch: 14 Global Step: 310150 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:26:22,963-Speed 2496.19 samples/sec Loss 2.7119 LearningRate 0.000484 Epoch: 14 Global Step: 310160 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:26:31,170-Speed 2495.85 samples/sec Loss 2.7100 LearningRate 0.000484 Epoch: 14 Global Step: 310170 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:26:39,381-Speed 2494.50 samples/sec Loss 2.7388 LearningRate 0.000484 Epoch: 14 Global Step: 310180 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:26:47,586-Speed 2496.64 samples/sec Loss 2.6558 LearningRate 0.000484 Epoch: 14 Global Step: 310190 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:26:55,791-Speed 2496.27 samples/sec Loss 2.7642 LearningRate 0.000484 Epoch: 14 Global Step: 310200 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:27:03,954-Speed 2509.20 samples/sec Loss 2.7149 LearningRate 0.000484 Epoch: 14 Global Step: 310210 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:27:12,159-Speed 2496.44 samples/sec Loss 2.7189 LearningRate 0.000484 Epoch: 14 Global Step: 310220 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:27:20,365-Speed 2496.16 samples/sec Loss 2.7153 LearningRate 0.000484 Epoch: 14 Global Step: 310230 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:27:28,570-Speed 2496.35 samples/sec Loss 2.7202 LearningRate 0.000484 Epoch: 14 Global Step: 310240 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:27:36,776-Speed 2496.01 samples/sec Loss 2.7079 LearningRate 0.000484 Epoch: 14 Global Step: 310250 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:27:44,989-Speed 2494.19 samples/sec Loss 2.6894 LearningRate 0.000484 Epoch: 14 Global Step: 310260 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:27:53,147-Speed 2510.88 
samples/sec Loss 2.7439 LearningRate 0.000484 Epoch: 14 Global Step: 310270 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:28:01,349-Speed 2497.35 samples/sec Loss 2.7004 LearningRate 0.000484 Epoch: 14 Global Step: 310280 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:28:09,567-Speed 2492.44 samples/sec Loss 2.7219 LearningRate 0.000484 Epoch: 14 Global Step: 310290 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:28:17,771-Speed 2497.09 samples/sec Loss 2.6990 LearningRate 0.000484 Epoch: 14 Global Step: 310300 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:28:25,972-Speed 2497.54 samples/sec Loss 2.7585 LearningRate 0.000484 Epoch: 14 Global Step: 310310 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:28:34,197-Speed 2490.27 samples/sec Loss 2.6291 LearningRate 0.000484 Epoch: 14 Global Step: 310320 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:28:42,345-Speed 2513.93 samples/sec Loss 2.7226 LearningRate 0.000484 Epoch: 14 Global Step: 310330 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:28:50,549-Speed 2496.97 samples/sec Loss 2.6417 LearningRate 0.000484 Epoch: 14 Global Step: 310340 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:28:58,752-Speed 2496.87 samples/sec Loss 2.6745 LearningRate 0.000484 Epoch: 14 Global Step: 310350 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:29:06,977-Speed 2490.45 samples/sec Loss 2.7184 LearningRate 0.000484 Epoch: 14 Global Step: 310360 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:29:15,188-Speed 2494.75 samples/sec Loss 2.6714 LearningRate 0.000484 Epoch: 14 Global Step: 310370 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:29:23,396-Speed 2495.30 samples/sec Loss 2.7678 LearningRate 0.000484 Epoch: 14 Global Step: 310380 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:29:31,545-Speed 
2513.71 samples/sec Loss 2.7045 LearningRate 0.000484 Epoch: 14 Global Step: 310390 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:29:39,766-Speed 2491.77 samples/sec Loss 2.6799 LearningRate 0.000484 Epoch: 14 Global Step: 310400 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:29:47,972-Speed 2496.08 samples/sec Loss 2.7165 LearningRate 0.000484 Epoch: 14 Global Step: 310410 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:29:56,178-Speed 2496.13 samples/sec Loss 2.7106 LearningRate 0.000483 Epoch: 14 Global Step: 310420 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:30:04,389-Speed 2494.43 samples/sec Loss 2.6634 LearningRate 0.000483 Epoch: 14 Global Step: 310430 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:30:12,593-Speed 2497.07 samples/sec Loss 2.7461 LearningRate 0.000483 Epoch: 14 Global Step: 310440 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:30:20,742-Speed 2513.21 samples/sec Loss 2.7888 LearningRate 0.000483 Epoch: 14 Global Step: 310450 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:30:28,948-Speed 2496.49 samples/sec Loss 2.7490 LearningRate 0.000483 Epoch: 14 Global Step: 310460 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:30:37,150-Speed 2497.33 samples/sec Loss 2.7339 LearningRate 0.000483 Epoch: 14 Global Step: 310470 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:30:45,354-Speed 2496.75 samples/sec Loss 2.6822 LearningRate 0.000483 Epoch: 14 Global Step: 310480 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:30:53,555-Speed 2497.55 samples/sec Loss 2.7879 LearningRate 0.000483 Epoch: 14 Global Step: 310490 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:31:01,759-Speed 2496.53 samples/sec Loss 2.7192 LearningRate 0.000483 Epoch: 14 Global Step: 310500 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 
13:31:09,911-Speed 2513.01 samples/sec Loss 2.6955 LearningRate 0.000483 Epoch: 14 Global Step: 310510 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:31:18,114-Speed 2496.92 samples/sec Loss 2.7164 LearningRate 0.000483 Epoch: 14 Global Step: 310520 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:31:26,315-Speed 2497.64 samples/sec Loss 2.7247 LearningRate 0.000483 Epoch: 14 Global Step: 310530 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:31:34,519-Speed 2496.91 samples/sec Loss 2.7272 LearningRate 0.000483 Epoch: 14 Global Step: 310540 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:31:42,722-Speed 2496.98 samples/sec Loss 2.7329 LearningRate 0.000483 Epoch: 14 Global Step: 310550 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:31:50,933-Speed 2494.49 samples/sec Loss 2.7762 LearningRate 0.000483 Epoch: 14 Global Step: 310560 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:31:59,091-Speed 2510.97 samples/sec Loss 2.7367 LearningRate 0.000483 Epoch: 14 Global Step: 310570 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:32:07,298-Speed 2495.78 samples/sec Loss 2.7370 LearningRate 0.000483 Epoch: 14 Global Step: 310580 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:32:15,503-Speed 2496.25 samples/sec Loss 2.6485 LearningRate 0.000483 Epoch: 14 Global Step: 310590 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:32:23,704-Speed 2497.83 samples/sec Loss 2.6964 LearningRate 0.000483 Epoch: 14 Global Step: 310600 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:32:31,908-Speed 2496.93 samples/sec Loss 2.6915 LearningRate 0.000483 Epoch: 14 Global Step: 310610 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:32:40,119-Speed 2494.69 samples/sec Loss 2.7577 LearningRate 0.000483 Epoch: 14 Global Step: 310620 Fp16 Grad Scale: 16384 Required: 119 hours Training: 
2022-07-08 13:32:48,284-Speed 2508.52 samples/sec Loss 2.7071 LearningRate 0.000483 Epoch: 14 Global Step: 310630 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:32:56,493-Speed 2495.36 samples/sec Loss 2.8091 LearningRate 0.000483 Epoch: 14 Global Step: 310640 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:33:04,698-Speed 2496.42 samples/sec Loss 2.8404 LearningRate 0.000483 Epoch: 14 Global Step: 310650 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:33:12,906-Speed 2495.52 samples/sec Loss 2.7297 LearningRate 0.000483 Epoch: 14 Global Step: 310660 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:33:21,114-Speed 2495.56 samples/sec Loss 2.7695 LearningRate 0.000483 Epoch: 14 Global Step: 310670 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:33:29,319-Speed 2497.71 samples/sec Loss 2.6738 LearningRate 0.000483 Epoch: 14 Global Step: 310680 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:33:37,465-Speed 2515.26 samples/sec Loss 2.7558 LearningRate 0.000483 Epoch: 14 Global Step: 310690 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:33:45,668-Speed 2497.05 samples/sec Loss 2.6952 LearningRate 0.000483 Epoch: 14 Global Step: 310700 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:33:53,874-Speed 2496.01 samples/sec Loss 2.8012 LearningRate 0.000483 Epoch: 14 Global Step: 310710 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:34:02,080-Speed 2496.16 samples/sec Loss 2.7613 LearningRate 0.000483 Epoch: 14 Global Step: 310720 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:34:10,285-Speed 2496.48 samples/sec Loss 2.7064 LearningRate 0.000483 Epoch: 14 Global Step: 310730 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:34:18,496-Speed 2494.54 samples/sec Loss 2.7156 LearningRate 0.000483 Epoch: 14 Global Step: 310740 Fp16 Grad Scale: 16384 Required: 119 hours 
Training: 2022-07-08 13:34:26,644-Speed 2513.74 samples/sec Loss 2.6850 LearningRate 0.000483 Epoch: 14 Global Step: 310750 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:34:34,854-Speed 2494.84 samples/sec Loss 2.8040 LearningRate 0.000483 Epoch: 14 Global Step: 310760 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:34:43,087-Speed 2488.16 samples/sec Loss 2.7355 LearningRate 0.000483 Epoch: 14 Global Step: 310770 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:34:51,288-Speed 2497.72 samples/sec Loss 2.6508 LearningRate 0.000483 Epoch: 14 Global Step: 310780 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:34:59,494-Speed 2495.82 samples/sec Loss 2.7576 LearningRate 0.000483 Epoch: 14 Global Step: 310790 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:35:07,698-Speed 2496.97 samples/sec Loss 2.7720 LearningRate 0.000483 Epoch: 14 Global Step: 310800 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:35:15,841-Speed 2515.33 samples/sec Loss 2.7775 LearningRate 0.000483 Epoch: 14 Global Step: 310810 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:35:24,050-Speed 2495.05 samples/sec Loss 2.7716 LearningRate 0.000483 Epoch: 14 Global Step: 310820 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:35:32,255-Speed 2496.45 samples/sec Loss 2.7678 LearningRate 0.000483 Epoch: 14 Global Step: 310830 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:35:40,455-Speed 2498.05 samples/sec Loss 2.7275 LearningRate 0.000483 Epoch: 14 Global Step: 310840 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:35:48,654-Speed 2498.15 samples/sec Loss 2.7771 LearningRate 0.000483 Epoch: 14 Global Step: 310850 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:35:56,857-Speed 2496.93 samples/sec Loss 2.8605 LearningRate 0.000483 Epoch: 14 Global Step: 310860 Fp16 Grad Scale: 32768 Required: 119 
hours Training: 2022-07-08 13:36:05,006-Speed 2513.71 samples/sec Loss 2.6990 LearningRate 0.000483 Epoch: 14 Global Step: 310870 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:36:13,205-Speed 2498.09 samples/sec Loss 2.7656 LearningRate 0.000483 Epoch: 14 Global Step: 310880 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:36:21,414-Speed 2495.29 samples/sec Loss 2.7432 LearningRate 0.000483 Epoch: 14 Global Step: 310890 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:36:29,621-Speed 2496.05 samples/sec Loss 2.7194 LearningRate 0.000483 Epoch: 14 Global Step: 310900 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:36:37,825-Speed 2496.83 samples/sec Loss 2.7577 LearningRate 0.000483 Epoch: 14 Global Step: 310910 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:36:46,030-Speed 2496.81 samples/sec Loss 2.7218 LearningRate 0.000483 Epoch: 14 Global Step: 310920 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:36:54,179-Speed 2513.31 samples/sec Loss 2.7078 LearningRate 0.000483 Epoch: 14 Global Step: 310930 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:37:02,381-Speed 2497.39 samples/sec Loss 2.7405 LearningRate 0.000483 Epoch: 14 Global Step: 310940 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:37:10,582-Speed 2497.71 samples/sec Loss 2.7171 LearningRate 0.000483 Epoch: 14 Global Step: 310950 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:37:18,781-Speed 2498.39 samples/sec Loss 2.7787 LearningRate 0.000482 Epoch: 14 Global Step: 310960 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:37:26,983-Speed 2497.25 samples/sec Loss 2.8027 LearningRate 0.000482 Epoch: 14 Global Step: 310970 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:37:35,182-Speed 2497.98 samples/sec Loss 2.7713 LearningRate 0.000482 Epoch: 14 Global Step: 310980 Fp16 Grad Scale: 32768 Required: 
119 hours Training: 2022-07-08 13:37:43,331-Speed 2513.86 samples/sec Loss 2.7315 LearningRate 0.000482 Epoch: 14 Global Step: 310990 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:37:51,533-Speed 2497.47 samples/sec Loss 2.6972 LearningRate 0.000482 Epoch: 14 Global Step: 311000 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:37:59,735-Speed 2497.36 samples/sec Loss 2.7431 LearningRate 0.000482 Epoch: 14 Global Step: 311010 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:38:07,936-Speed 2497.66 samples/sec Loss 2.7077 LearningRate 0.000482 Epoch: 14 Global Step: 311020 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:38:16,137-Speed 2497.66 samples/sec Loss 2.7342 LearningRate 0.000482 Epoch: 14 Global Step: 311030 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:38:24,345-Speed 2495.51 samples/sec Loss 2.7302 LearningRate 0.000482 Epoch: 14 Global Step: 311040 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:38:32,491-Speed 2514.74 samples/sec Loss 2.7292 LearningRate 0.000482 Epoch: 14 Global Step: 311050 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:38:40,683-Speed 2500.56 samples/sec Loss 2.7364 LearningRate 0.000482 Epoch: 14 Global Step: 311060 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:38:48,885-Speed 2497.36 samples/sec Loss 2.7255 LearningRate 0.000482 Epoch: 14 Global Step: 311070 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:38:57,080-Speed 2499.47 samples/sec Loss 2.7521 LearningRate 0.000482 Epoch: 14 Global Step: 311080 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:39:05,279-Speed 2498.08 samples/sec Loss 2.7287 LearningRate 0.000482 Epoch: 14 Global Step: 311090 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:39:15,408-Speed 2022.27 samples/sec Loss 2.7627 LearningRate 0.000482 Epoch: 15 Global Step: 311100 Fp16 Grad Scale: 32768 
Required: 119 hours Training: 2022-07-08 13:39:23,554-Speed 2514.74 samples/sec Loss 2.7187 LearningRate 0.000482 Epoch: 15 Global Step: 311110 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:39:31,752-Speed 2498.36 samples/sec Loss 2.7234 LearningRate 0.000482 Epoch: 15 Global Step: 311120 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:39:39,955-Speed 2497.01 samples/sec Loss 2.7955 LearningRate 0.000482 Epoch: 15 Global Step: 311130 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:39:48,155-Speed 2498.53 samples/sec Loss 2.7934 LearningRate 0.000482 Epoch: 15 Global Step: 311140 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:39:56,356-Speed 2497.41 samples/sec Loss 2.7255 LearningRate 0.000482 Epoch: 15 Global Step: 311150 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:40:04,559-Speed 2497.16 samples/sec Loss 2.7203 LearningRate 0.000482 Epoch: 15 Global Step: 311160 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:40:12,712-Speed 2512.52 samples/sec Loss 2.6893 LearningRate 0.000482 Epoch: 15 Global Step: 311170 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:40:20,917-Speed 2496.25 samples/sec Loss 2.6839 LearningRate 0.000482 Epoch: 15 Global Step: 311180 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:40:29,120-Speed 2497.09 samples/sec Loss 2.6732 LearningRate 0.000482 Epoch: 15 Global Step: 311190 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:40:37,318-Speed 2498.58 samples/sec Loss 2.6910 LearningRate 0.000482 Epoch: 15 Global Step: 311200 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:40:45,521-Speed 2496.94 samples/sec Loss 2.6973 LearningRate 0.000482 Epoch: 15 Global Step: 311210 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:40:53,723-Speed 2497.64 samples/sec Loss 2.6740 LearningRate 0.000482 Epoch: 15 Global Step: 311220 Fp16 Grad Scale: 
32768 Required: 119 hours Training: 2022-07-08 13:41:01,870-Speed 2514.18 samples/sec Loss 2.7001 LearningRate 0.000482 Epoch: 15 Global Step: 311230 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:41:10,094-Speed 2490.69 samples/sec Loss 2.6529 LearningRate 0.000482 Epoch: 15 Global Step: 311240 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:41:18,297-Speed 2496.86 samples/sec Loss 2.6823 LearningRate 0.000482 Epoch: 15 Global Step: 311250 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:41:26,505-Speed 2495.49 samples/sec Loss 2.6891 LearningRate 0.000482 Epoch: 15 Global Step: 311260 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:41:34,705-Speed 2497.86 samples/sec Loss 2.6829 LearningRate 0.000482 Epoch: 15 Global Step: 311270 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:41:42,917-Speed 2494.29 samples/sec Loss 2.7305 LearningRate 0.000482 Epoch: 15 Global Step: 311280 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:41:51,076-Speed 2510.62 samples/sec Loss 2.7487 LearningRate 0.000482 Epoch: 15 Global Step: 311290 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:41:59,281-Speed 2496.20 samples/sec Loss 2.6789 LearningRate 0.000482 Epoch: 15 Global Step: 311300 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:42:07,480-Speed 2498.45 samples/sec Loss 2.6527 LearningRate 0.000482 Epoch: 15 Global Step: 311310 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:42:15,679-Speed 2498.02 samples/sec Loss 2.6786 LearningRate 0.000482 Epoch: 15 Global Step: 311320 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:42:23,879-Speed 2498.09 samples/sec Loss 2.7039 LearningRate 0.000482 Epoch: 15 Global Step: 311330 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:42:32,084-Speed 2496.32 samples/sec Loss 2.6928 LearningRate 0.000482 Epoch: 15 Global Step: 311340 Fp16 Grad 
Scale: 32768 Required: 119 hours Training: 2022-07-08 13:42:40,234-Speed 2513.33 samples/sec Loss 2.6811 LearningRate 0.000482 Epoch: 15 Global Step: 311350 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:42:48,436-Speed 2497.53 samples/sec Loss 2.6618 LearningRate 0.000482 Epoch: 15 Global Step: 311360 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:42:56,648-Speed 2494.52 samples/sec Loss 2.6201 LearningRate 0.000482 Epoch: 15 Global Step: 311370 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:43:04,848-Speed 2498.17 samples/sec Loss 2.6894 LearningRate 0.000482 Epoch: 15 Global Step: 311380 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:43:13,047-Speed 2498.08 samples/sec Loss 2.7462 LearningRate 0.000482 Epoch: 15 Global Step: 311390 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:43:21,254-Speed 2495.73 samples/sec Loss 2.6766 LearningRate 0.000482 Epoch: 15 Global Step: 311400 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:43:29,421-Speed 2508.29 samples/sec Loss 2.6442 LearningRate 0.000482 Epoch: 15 Global Step: 311410 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:43:37,620-Speed 2498.21 samples/sec Loss 2.6938 LearningRate 0.000482 Epoch: 15 Global Step: 311420 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:43:45,830-Speed 2494.76 samples/sec Loss 2.7104 LearningRate 0.000482 Epoch: 15 Global Step: 311430 Fp16 Grad Scale: 32768 Required: 119 hours Training: 2022-07-08 13:43:53,986-Speed 2511.56 samples/sec Loss 2.7009 LearningRate 0.000482 Epoch: 15 Global Step: 311440 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:44:02,194-Speed 2495.48 samples/sec Loss 2.7454 LearningRate 0.000482 Epoch: 15 Global Step: 311450 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:44:10,399-Speed 2496.34 samples/sec Loss 2.6974 LearningRate 0.000482 Epoch: 15 Global Step: 311460 Fp16 
Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:44:18,551-Speed 2512.49 samples/sec Loss 2.7625 LearningRate 0.000482 Epoch: 15 Global Step: 311470 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:44:26,752-Speed 2497.88 samples/sec Loss 2.6496 LearningRate 0.000482 Epoch: 15 Global Step: 311480 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:44:34,951-Speed 2498.16 samples/sec Loss 2.7259 LearningRate 0.000482 Epoch: 15 Global Step: 311490 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:44:43,151-Speed 2497.89 samples/sec Loss 2.7325 LearningRate 0.000481 Epoch: 15 Global Step: 311500 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:44:51,346-Speed 2499.67 samples/sec Loss 2.7219 LearningRate 0.000481 Epoch: 15 Global Step: 311510 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:44:59,560-Speed 2493.77 samples/sec Loss 2.6847 LearningRate 0.000481 Epoch: 15 Global Step: 311520 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:45:07,716-Speed 2511.28 samples/sec Loss 2.6622 LearningRate 0.000481 Epoch: 15 Global Step: 311530 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:45:15,914-Speed 2498.58 samples/sec Loss 2.7147 LearningRate 0.000481 Epoch: 15 Global Step: 311540 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:45:24,121-Speed 2495.90 samples/sec Loss 2.7222 LearningRate 0.000481 Epoch: 15 Global Step: 311550 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:45:32,318-Speed 2498.78 samples/sec Loss 2.7440 LearningRate 0.000481 Epoch: 15 Global Step: 311560 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:45:40,517-Speed 2498.33 samples/sec Loss 2.7086 LearningRate 0.000481 Epoch: 15 Global Step: 311570 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:45:48,723-Speed 2496.01 samples/sec Loss 2.7730 LearningRate 0.000481 Epoch: 15 Global Step: 311580 
Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:45:56,867-Speed 2515.08 samples/sec Loss 2.6713 LearningRate 0.000481 Epoch: 15 Global Step: 311590 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:46:05,065-Speed 2498.59 samples/sec Loss 2.8009 LearningRate 0.000481 Epoch: 15 Global Step: 311600 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:46:13,271-Speed 2496.15 samples/sec Loss 2.7228 LearningRate 0.000481 Epoch: 15 Global Step: 311610 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:46:21,482-Speed 2494.45 samples/sec Loss 2.7283 LearningRate 0.000481 Epoch: 15 Global Step: 311620 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:46:29,682-Speed 2498.00 samples/sec Loss 2.7463 LearningRate 0.000481 Epoch: 15 Global Step: 311630 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:46:37,886-Speed 2496.86 samples/sec Loss 2.6975 LearningRate 0.000481 Epoch: 15 Global Step: 311640 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:46:46,037-Speed 2513.19 samples/sec Loss 2.6662 LearningRate 0.000481 Epoch: 15 Global Step: 311650 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:46:54,249-Speed 2494.04 samples/sec Loss 2.6826 LearningRate 0.000481 Epoch: 15 Global Step: 311660 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:47:02,451-Speed 2497.82 samples/sec Loss 2.7208 LearningRate 0.000481 Epoch: 15 Global Step: 311670 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:47:10,667-Speed 2493.01 samples/sec Loss 2.6404 LearningRate 0.000481 Epoch: 15 Global Step: 311680 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:47:18,867-Speed 2497.70 samples/sec Loss 2.7041 LearningRate 0.000481 Epoch: 15 Global Step: 311690 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:47:27,067-Speed 2498.17 samples/sec Loss 2.6803 LearningRate 0.000481 Epoch: 15 Global Step: 
311700 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:47:35,212-Speed 2514.79 samples/sec Loss 2.7222 LearningRate 0.000481 Epoch: 15 Global Step: 311710 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:47:43,426-Speed 2493.91 samples/sec Loss 2.7435 LearningRate 0.000481 Epoch: 15 Global Step: 311720 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:47:51,626-Speed 2497.82 samples/sec Loss 2.7110 LearningRate 0.000481 Epoch: 15 Global Step: 311730 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:47:59,839-Speed 2494.20 samples/sec Loss 2.7164 LearningRate 0.000481 Epoch: 15 Global Step: 311740 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:48:08,045-Speed 2495.95 samples/sec Loss 2.7178 LearningRate 0.000481 Epoch: 15 Global Step: 311750 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:48:16,245-Speed 2498.13 samples/sec Loss 2.6794 LearningRate 0.000481 Epoch: 15 Global Step: 311760 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:48:24,393-Speed 2513.74 samples/sec Loss 2.6593 LearningRate 0.000481 Epoch: 15 Global Step: 311770 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:48:32,606-Speed 2494.19 samples/sec Loss 2.6739 LearningRate 0.000481 Epoch: 15 Global Step: 311780 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:48:40,806-Speed 2497.94 samples/sec Loss 2.6460 LearningRate 0.000481 Epoch: 15 Global Step: 311790 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:48:49,007-Speed 2497.50 samples/sec Loss 2.7083 LearningRate 0.000481 Epoch: 15 Global Step: 311800 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:48:57,206-Speed 2498.46 samples/sec Loss 2.6454 LearningRate 0.000481 Epoch: 15 Global Step: 311810 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:49:05,421-Speed 2493.24 samples/sec Loss 2.6962 LearningRate 0.000481 Epoch: 15 Global 
Step: 311820 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:49:13,568-Speed 2514.36 samples/sec Loss 2.6471 LearningRate 0.000481 Epoch: 15 Global Step: 311830 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:49:21,780-Speed 2494.04 samples/sec Loss 2.7819 LearningRate 0.000481 Epoch: 15 Global Step: 311840 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:49:29,983-Speed 2497.71 samples/sec Loss 2.6902 LearningRate 0.000481 Epoch: 15 Global Step: 311850 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:49:38,184-Speed 2497.63 samples/sec Loss 2.7026 LearningRate 0.000481 Epoch: 15 Global Step: 311860 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:49:46,388-Speed 2496.97 samples/sec Loss 2.7286 LearningRate 0.000481 Epoch: 15 Global Step: 311870 Fp16 Grad Scale: 16384 Required: 119 hours Training: 2022-07-08 13:49:54,595-Speed 2495.78 samples/sec Loss 2.7034 LearningRate 0.000481 Epoch: 15 Global Step: 311880 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:50:02,743-Speed 2513.70 samples/sec Loss 2.7515 LearningRate 0.000481 Epoch: 15 Global Step: 311890 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:50:10,952-Speed 2495.47 samples/sec Loss 2.7174 LearningRate 0.000481 Epoch: 15 Global Step: 311900 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:50:19,155-Speed 2497.35 samples/sec Loss 2.6917 LearningRate 0.000481 Epoch: 15 Global Step: 311910 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:50:27,354-Speed 2498.05 samples/sec Loss 2.7392 LearningRate 0.000481 Epoch: 15 Global Step: 311920 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:50:35,555-Speed 2497.61 samples/sec Loss 2.6888 LearningRate 0.000481 Epoch: 15 Global Step: 311930 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:50:43,756-Speed 2498.03 samples/sec Loss 2.7294 LearningRate 0.000481 Epoch: 15 
Global Step: 311940 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:50:51,910-Speed 2511.88 samples/sec Loss 2.7157 LearningRate 0.000481 Epoch: 15 Global Step: 311950 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:51:00,110-Speed 2498.04 samples/sec Loss 2.6902 LearningRate 0.000481 Epoch: 15 Global Step: 311960 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:51:08,310-Speed 2498.04 samples/sec Loss 2.7550 LearningRate 0.000481 Epoch: 15 Global Step: 311970 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:51:16,512-Speed 2497.31 samples/sec Loss 2.6957 LearningRate 0.000481 Epoch: 15 Global Step: 311980 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:51:24,715-Speed 2496.90 samples/sec Loss 2.6972 LearningRate 0.000481 Epoch: 15 Global Step: 311990 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:51:32,913-Speed 2498.37 samples/sec Loss 2.6791 LearningRate 0.000481 Epoch: 15 Global Step: 312000 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:51:41,061-Speed 2513.99 samples/sec Loss 2.6889 LearningRate 0.000481 Epoch: 15 Global Step: 312010 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:51:49,260-Speed 2498.34 samples/sec Loss 2.6654 LearningRate 0.000481 Epoch: 15 Global Step: 312020 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:51:57,462-Speed 2497.18 samples/sec Loss 2.7613 LearningRate 0.000480 Epoch: 15 Global Step: 312030 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:52:05,660-Speed 2498.63 samples/sec Loss 2.7181 LearningRate 0.000480 Epoch: 15 Global Step: 312040 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:52:13,862-Speed 2497.47 samples/sec Loss 2.6751 LearningRate 0.000480 Epoch: 15 Global Step: 312050 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:52:22,061-Speed 2498.01 samples/sec Loss 2.7274 LearningRate 0.000480 
Epoch: 15 Global Step: 312060 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:52:30,205-Speed 2515.19 samples/sec Loss 2.7333 LearningRate 0.000480 Epoch: 15 Global Step: 312070 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:52:38,431-Speed 2490.10 samples/sec Loss 2.7036 LearningRate 0.000480 Epoch: 15 Global Step: 312080 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:52:46,636-Speed 2496.75 samples/sec Loss 2.6840 LearningRate 0.000480 Epoch: 15 Global Step: 312090 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:52:54,834-Speed 2498.29 samples/sec Loss 2.6794 LearningRate 0.000480 Epoch: 15 Global Step: 312100 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:53:03,030-Speed 2499.14 samples/sec Loss 2.6715 LearningRate 0.000480 Epoch: 15 Global Step: 312110 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:53:11,244-Speed 2493.72 samples/sec Loss 2.6828 LearningRate 0.000480 Epoch: 15 Global Step: 312120 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:53:19,399-Speed 2512.01 samples/sec Loss 2.6820 LearningRate 0.000480 Epoch: 15 Global Step: 312130 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:53:27,594-Speed 2499.23 samples/sec Loss 2.7156 LearningRate 0.000480 Epoch: 15 Global Step: 312140 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:53:35,795-Speed 2497.94 samples/sec Loss 2.7080 LearningRate 0.000480 Epoch: 15 Global Step: 312150 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:53:43,993-Speed 2498.88 samples/sec Loss 2.7063 LearningRate 0.000480 Epoch: 15 Global Step: 312160 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:53:52,195-Speed 2497.34 samples/sec Loss 2.6785 LearningRate 0.000480 Epoch: 15 Global Step: 312170 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:54:00,392-Speed 2498.68 samples/sec Loss 2.6531 LearningRate 
0.000480 Epoch: 15 Global Step: 312180 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:54:08,538-Speed 2514.47 samples/sec Loss 2.7854 LearningRate 0.000480 Epoch: 15 Global Step: 312190 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:54:16,751-Speed 2494.12 samples/sec Loss 2.7455 LearningRate 0.000480 Epoch: 15 Global Step: 312200 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:54:24,951-Speed 2498.10 samples/sec Loss 2.8177 LearningRate 0.000480 Epoch: 15 Global Step: 312210 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:54:33,158-Speed 2495.62 samples/sec Loss 2.7866 LearningRate 0.000480 Epoch: 15 Global Step: 312220 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:54:41,358-Speed 2498.00 samples/sec Loss 2.7180 LearningRate 0.000480 Epoch: 15 Global Step: 312230 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:54:49,565-Speed 2495.95 samples/sec Loss 2.7207 LearningRate 0.000480 Epoch: 15 Global Step: 312240 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:54:57,712-Speed 2514.15 samples/sec Loss 2.7139 LearningRate 0.000480 Epoch: 15 Global Step: 312250 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:55:05,924-Speed 2494.32 samples/sec Loss 2.6738 LearningRate 0.000480 Epoch: 15 Global Step: 312260 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:55:14,125-Speed 2498.82 samples/sec Loss 2.6750 LearningRate 0.000480 Epoch: 15 Global Step: 312270 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:55:22,327-Speed 2497.37 samples/sec Loss 2.7604 LearningRate 0.000480 Epoch: 15 Global Step: 312280 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:55:30,532-Speed 2496.44 samples/sec Loss 2.7271 LearningRate 0.000480 Epoch: 15 Global Step: 312290 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:55:38,731-Speed 2498.45 samples/sec Loss 2.6571 
LearningRate 0.000480 Epoch: 15 Global Step: 312300 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:55:46,880-Speed 2513.55 samples/sec Loss 2.7313 LearningRate 0.000480 Epoch: 15 Global Step: 312310 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:55:55,079-Speed 2498.23 samples/sec Loss 2.7523 LearningRate 0.000480 Epoch: 15 Global Step: 312320 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:56:03,280-Speed 2497.54 samples/sec Loss 2.7214 LearningRate 0.000480 Epoch: 15 Global Step: 312330 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:56:11,479-Speed 2498.26 samples/sec Loss 2.7283 LearningRate 0.000480 Epoch: 15 Global Step: 312340 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:56:19,678-Speed 2498.22 samples/sec Loss 2.6968 LearningRate 0.000480 Epoch: 15 Global Step: 312350 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:56:27,879-Speed 2497.85 samples/sec Loss 2.6860 LearningRate 0.000480 Epoch: 15 Global Step: 312360 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:56:36,025-Speed 2514.54 samples/sec Loss 2.6604 LearningRate 0.000480 Epoch: 15 Global Step: 312370 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:56:44,236-Speed 2494.51 samples/sec Loss 2.6825 LearningRate 0.000480 Epoch: 15 Global Step: 312380 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:56:52,432-Speed 2499.17 samples/sec Loss 2.6404 LearningRate 0.000480 Epoch: 15 Global Step: 312390 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:57:00,628-Speed 2499.21 samples/sec Loss 2.6724 LearningRate 0.000480 Epoch: 15 Global Step: 312400 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:57:08,827-Speed 2498.37 samples/sec Loss 2.6886 LearningRate 0.000480 Epoch: 15 Global Step: 312410 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:57:17,026-Speed 2498.28 samples/sec Loss 
2.6681 LearningRate 0.000480 Epoch: 15 Global Step: 312420 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:57:25,173-Speed 2514.11 samples/sec Loss 2.6103 LearningRate 0.000480 Epoch: 15 Global Step: 312430 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:57:33,376-Speed 2497.35 samples/sec Loss 2.5812 LearningRate 0.000480 Epoch: 15 Global Step: 312440 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:57:41,581-Speed 2496.31 samples/sec Loss 2.6486 LearningRate 0.000480 Epoch: 15 Global Step: 312450 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:57:49,780-Speed 2498.13 samples/sec Loss 2.6733 LearningRate 0.000480 Epoch: 15 Global Step: 312460 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:57:57,983-Speed 2497.09 samples/sec Loss 2.7088 LearningRate 0.000480 Epoch: 15 Global Step: 312470 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:58:06,190-Speed 2496.06 samples/sec Loss 2.7282 LearningRate 0.000480 Epoch: 15 Global Step: 312480 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:58:14,346-Speed 2511.48 samples/sec Loss 2.6918 LearningRate 0.000480 Epoch: 15 Global Step: 312490 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:58:22,549-Speed 2496.86 samples/sec Loss 2.6592 LearningRate 0.000480 Epoch: 15 Global Step: 312500 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:58:30,751-Speed 2497.29 samples/sec Loss 2.6304 LearningRate 0.000480 Epoch: 15 Global Step: 312510 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:58:38,953-Speed 2497.82 samples/sec Loss 2.6486 LearningRate 0.000480 Epoch: 15 Global Step: 312520 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:58:47,158-Speed 2496.29 samples/sec Loss 2.6248 LearningRate 0.000480 Epoch: 15 Global Step: 312530 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:58:55,361-Speed 2497.34 samples/sec 
Loss 2.6474 LearningRate 0.000480 Epoch: 15 Global Step: 312540 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:59:03,509-Speed 2514.16 samples/sec Loss 2.6955 LearningRate 0.000480 Epoch: 15 Global Step: 312550 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:59:11,711-Speed 2497.25 samples/sec Loss 2.6934 LearningRate 0.000480 Epoch: 15 Global Step: 312560 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:59:19,919-Speed 2495.63 samples/sec Loss 2.6728 LearningRate 0.000479 Epoch: 15 Global Step: 312570 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:59:28,124-Speed 2496.55 samples/sec Loss 2.6145 LearningRate 0.000479 Epoch: 15 Global Step: 312580 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:59:36,327-Speed 2497.27 samples/sec Loss 2.6688 LearningRate 0.000479 Epoch: 15 Global Step: 312590 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:59:44,531-Speed 2496.71 samples/sec Loss 2.6516 LearningRate 0.000479 Epoch: 15 Global Step: 312600 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 13:59:52,677-Speed 2514.47 samples/sec Loss 2.6058 LearningRate 0.000479 Epoch: 15 Global Step: 312610 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:00:00,880-Speed 2497.24 samples/sec Loss 2.7838 LearningRate 0.000479 Epoch: 15 Global Step: 312620 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:00:09,083-Speed 2497.12 samples/sec Loss 2.6880 LearningRate 0.000479 Epoch: 15 Global Step: 312630 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:00:17,282-Speed 2498.19 samples/sec Loss 2.7139 LearningRate 0.000479 Epoch: 15 Global Step: 312640 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:00:25,487-Speed 2496.57 samples/sec Loss 2.7090 LearningRate 0.000479 Epoch: 15 Global Step: 312650 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:00:33,689-Speed 2497.34 
samples/sec Loss 2.7187 LearningRate 0.000479 Epoch: 15 Global Step: 312660 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:00:41,838-Speed 2513.85 samples/sec Loss 2.7036 LearningRate 0.000479 Epoch: 15 Global Step: 312670 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:00:50,046-Speed 2495.35 samples/sec Loss 2.6699 LearningRate 0.000479 Epoch: 15 Global Step: 312680 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:00:58,252-Speed 2496.31 samples/sec Loss 2.6691 LearningRate 0.000479 Epoch: 15 Global Step: 312690 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:01:06,452-Speed 2497.84 samples/sec Loss 2.6669 LearningRate 0.000479 Epoch: 15 Global Step: 312700 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:01:14,657-Speed 2496.49 samples/sec Loss 2.6865 LearningRate 0.000479 Epoch: 15 Global Step: 312710 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:01:22,862-Speed 2496.34 samples/sec Loss 2.7577 LearningRate 0.000479 Epoch: 15 Global Step: 312720 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:01:31,010-Speed 2514.09 samples/sec Loss 2.7307 LearningRate 0.000479 Epoch: 15 Global Step: 312730 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:01:39,213-Speed 2497.04 samples/sec Loss 2.7029 LearningRate 0.000479 Epoch: 15 Global Step: 312740 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:01:47,413-Speed 2498.86 samples/sec Loss 2.7682 LearningRate 0.000479 Epoch: 15 Global Step: 312750 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:01:55,615-Speed 2497.52 samples/sec Loss 2.7491 LearningRate 0.000479 Epoch: 15 Global Step: 312760 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:02:03,827-Speed 2494.03 samples/sec Loss 2.7088 LearningRate 0.000479 Epoch: 15 Global Step: 312770 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:02:12,033-Speed 
2496.30 samples/sec Loss 2.7134 LearningRate 0.000479 Epoch: 15 Global Step: 312780 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:02:20,183-Speed 2513.22 samples/sec Loss 2.6978 LearningRate 0.000479 Epoch: 15 Global Step: 312790 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:02:28,387-Speed 2496.59 samples/sec Loss 2.6517 LearningRate 0.000479 Epoch: 15 Global Step: 312800 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:02:36,588-Speed 2497.82 samples/sec Loss 2.6726 LearningRate 0.000479 Epoch: 15 Global Step: 312810 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:02:44,792-Speed 2496.78 samples/sec Loss 2.6681 LearningRate 0.000479 Epoch: 15 Global Step: 312820 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:02:52,996-Speed 2496.92 samples/sec Loss 2.6827 LearningRate 0.000479 Epoch: 15 Global Step: 312830 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:03:01,198-Speed 2497.12 samples/sec Loss 2.6505 LearningRate 0.000479 Epoch: 15 Global Step: 312840 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:03:09,344-Speed 2514.48 samples/sec Loss 2.6561 LearningRate 0.000479 Epoch: 15 Global Step: 312850 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:03:17,552-Speed 2495.89 samples/sec Loss 2.6819 LearningRate 0.000479 Epoch: 15 Global Step: 312860 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:03:25,753-Speed 2497.63 samples/sec Loss 2.6892 LearningRate 0.000479 Epoch: 15 Global Step: 312870 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:03:33,953-Speed 2497.91 samples/sec Loss 2.6437 LearningRate 0.000479 Epoch: 15 Global Step: 312880 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:03:42,153-Speed 2497.77 samples/sec Loss 2.6471 LearningRate 0.000479 Epoch: 15 Global Step: 312890 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 
14:03:50,355-Speed 2497.48 samples/sec Loss 2.6562 LearningRate 0.000479 Epoch: 15 Global Step: 312900 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:03:58,506-Speed 2513.06 samples/sec Loss 2.6937 LearningRate 0.000479 Epoch: 15 Global Step: 312910 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:04:06,707-Speed 2497.59 samples/sec Loss 2.7059 LearningRate 0.000479 Epoch: 15 Global Step: 312920 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:04:14,905-Speed 2498.61 samples/sec Loss 2.6950 LearningRate 0.000479 Epoch: 15 Global Step: 312930 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:04:23,121-Speed 2492.94 samples/sec Loss 2.6525 LearningRate 0.000479 Epoch: 15 Global Step: 312940 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:04:31,320-Speed 2499.05 samples/sec Loss 2.7081 LearningRate 0.000479 Epoch: 15 Global Step: 312950 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:04:39,518-Speed 2498.48 samples/sec Loss 2.6043 LearningRate 0.000479 Epoch: 15 Global Step: 312960 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:04:47,665-Speed 2514.32 samples/sec Loss 2.6885 LearningRate 0.000479 Epoch: 15 Global Step: 312970 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:04:55,868-Speed 2496.92 samples/sec Loss 2.6635 LearningRate 0.000479 Epoch: 15 Global Step: 312980 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:05:04,070-Speed 2497.24 samples/sec Loss 2.6624 LearningRate 0.000479 Epoch: 15 Global Step: 312990 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:05:12,276-Speed 2496.41 samples/sec Loss 2.6928 LearningRate 0.000479 Epoch: 15 Global Step: 313000 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:05:20,476-Speed 2497.89 samples/sec Loss 2.6857 LearningRate 0.000479 Epoch: 15 Global Step: 313010 Fp16 Grad Scale: 32768 Required: 118 hours Training: 
2022-07-08 14:05:28,676-Speed 2497.81 samples/sec Loss 2.7205 LearningRate 0.000479 Epoch: 15 Global Step: 313020 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:05:36,825-Speed 2513.65 samples/sec Loss 2.6975 LearningRate 0.000479 Epoch: 15 Global Step: 313030 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:05:45,033-Speed 2495.55 samples/sec Loss 2.6537 LearningRate 0.000479 Epoch: 15 Global Step: 313040 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:05:53,239-Speed 2496.48 samples/sec Loss 2.6897 LearningRate 0.000479 Epoch: 15 Global Step: 313050 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:06:01,436-Speed 2498.60 samples/sec Loss 2.7016 LearningRate 0.000479 Epoch: 15 Global Step: 313060 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:06:09,635-Speed 2498.07 samples/sec Loss 2.7338 LearningRate 0.000479 Epoch: 15 Global Step: 313070 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:06:17,850-Speed 2493.38 samples/sec Loss 2.7275 LearningRate 0.000479 Epoch: 15 Global Step: 313080 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:06:26,000-Speed 2513.61 samples/sec Loss 2.6793 LearningRate 0.000479 Epoch: 15 Global Step: 313090 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:06:34,204-Speed 2496.59 samples/sec Loss 2.6774 LearningRate 0.000479 Epoch: 15 Global Step: 313100 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:06:42,403-Speed 2498.34 samples/sec Loss 2.6849 LearningRate 0.000478 Epoch: 15 Global Step: 313110 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:06:50,605-Speed 2497.34 samples/sec Loss 2.6725 LearningRate 0.000478 Epoch: 15 Global Step: 313120 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:06:58,808-Speed 2497.22 samples/sec Loss 2.6916 LearningRate 0.000478 Epoch: 15 Global Step: 313130 Fp16 Grad Scale: 32768 Required: 118 hours 
Training: 2022-07-08 14:07:07,009-Speed 2497.63 samples/sec Loss 2.6590 LearningRate 0.000478 Epoch: 15 Global Step: 313140 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:07:15,156-Speed 2514.22 samples/sec Loss 2.6515 LearningRate 0.000478 Epoch: 15 Global Step: 313150 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:07:23,355-Speed 2498.13 samples/sec Loss 2.7274 LearningRate 0.000478 Epoch: 15 Global Step: 313160 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:07:31,568-Speed 2494.18 samples/sec Loss 2.7196 LearningRate 0.000478 Epoch: 15 Global Step: 313170 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:07:39,789-Speed 2491.36 samples/sec Loss 2.7608 LearningRate 0.000478 Epoch: 15 Global Step: 313180 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:07:48,000-Speed 2494.77 samples/sec Loss 2.6951 LearningRate 0.000478 Epoch: 15 Global Step: 313190 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:07:56,205-Speed 2496.42 samples/sec Loss 2.6745 LearningRate 0.000478 Epoch: 15 Global Step: 313200 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:08:04,350-Speed 2514.97 samples/sec Loss 2.7147 LearningRate 0.000478 Epoch: 15 Global Step: 313210 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:08:12,552-Speed 2497.46 samples/sec Loss 2.6859 LearningRate 0.000478 Epoch: 15 Global Step: 313220 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:08:20,754-Speed 2497.52 samples/sec Loss 2.7489 LearningRate 0.000478 Epoch: 15 Global Step: 313230 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:08:28,953-Speed 2498.27 samples/sec Loss 2.7840 LearningRate 0.000478 Epoch: 15 Global Step: 313240 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:08:37,111-Speed 2510.63 samples/sec Loss 2.7235 LearningRate 0.000478 Epoch: 15 Global Step: 313250 Fp16 Grad Scale: 16384 Required: 118 
hours Training: 2022-07-08 14:08:45,310-Speed 2498.24 samples/sec Loss 2.7306 LearningRate 0.000478 Epoch: 15 Global Step: 313260 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:08:53,456-Speed 2514.64 samples/sec Loss 2.7200 LearningRate 0.000478 Epoch: 15 Global Step: 313270 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:09:01,654-Speed 2498.38 samples/sec Loss 2.7313 LearningRate 0.000478 Epoch: 15 Global Step: 313280 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:09:09,850-Speed 2499.41 samples/sec Loss 2.6894 LearningRate 0.000478 Epoch: 15 Global Step: 313290 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:09:18,052-Speed 2497.33 samples/sec Loss 2.7305 LearningRate 0.000478 Epoch: 15 Global Step: 313300 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:09:26,251-Speed 2497.97 samples/sec Loss 2.6774 LearningRate 0.000478 Epoch: 15 Global Step: 313310 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:09:34,450-Speed 2498.70 samples/sec Loss 2.7161 LearningRate 0.000478 Epoch: 15 Global Step: 313320 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:09:42,594-Speed 2515.10 samples/sec Loss 2.7718 LearningRate 0.000478 Epoch: 15 Global Step: 313330 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:09:50,803-Speed 2495.22 samples/sec Loss 2.6841 LearningRate 0.000478 Epoch: 15 Global Step: 313340 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:09:59,002-Speed 2498.25 samples/sec Loss 2.6760 LearningRate 0.000478 Epoch: 15 Global Step: 313350 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:10:07,198-Speed 2499.18 samples/sec Loss 2.7491 LearningRate 0.000478 Epoch: 15 Global Step: 313360 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:10:15,412-Speed 2493.76 samples/sec Loss 2.6867 LearningRate 0.000478 Epoch: 15 Global Step: 313370 Fp16 Grad Scale: 16384 Required: 
118 hours Training: 2022-07-08 14:10:23,618-Speed 2495.99 samples/sec Loss 2.7026 LearningRate 0.000478 Epoch: 15 Global Step: 313380 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:10:31,762-Speed 2515.14 samples/sec Loss 2.6777 LearningRate 0.000478 Epoch: 15 Global Step: 313390 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:10:39,963-Speed 2497.92 samples/sec Loss 2.7133 LearningRate 0.000478 Epoch: 15 Global Step: 313400 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:10:48,161-Speed 2498.54 samples/sec Loss 2.7140 LearningRate 0.000478 Epoch: 15 Global Step: 313410 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:10:56,361-Speed 2497.76 samples/sec Loss 2.7080 LearningRate 0.000478 Epoch: 15 Global Step: 313420 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:11:04,561-Speed 2498.13 samples/sec Loss 2.7033 LearningRate 0.000478 Epoch: 15 Global Step: 313430 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:11:12,775-Speed 2494.01 samples/sec Loss 2.6601 LearningRate 0.000478 Epoch: 15 Global Step: 313440 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:11:20,919-Speed 2514.88 samples/sec Loss 2.6763 LearningRate 0.000478 Epoch: 15 Global Step: 313450 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:11:29,121-Speed 2497.29 samples/sec Loss 2.7321 LearningRate 0.000478 Epoch: 15 Global Step: 313460 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:11:37,333-Speed 2494.31 samples/sec Loss 2.6945 LearningRate 0.000478 Epoch: 15 Global Step: 313470 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:11:45,544-Speed 2494.67 samples/sec Loss 2.6822 LearningRate 0.000478 Epoch: 15 Global Step: 313480 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:11:53,751-Speed 2495.80 samples/sec Loss 2.6770 LearningRate 0.000478 Epoch: 15 Global Step: 313490 Fp16 Grad Scale: 16384 
Required: 118 hours Training: 2022-07-08 14:12:01,954-Speed 2497.07 samples/sec Loss 2.7066 LearningRate 0.000478 Epoch: 15 Global Step: 313500 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:12:10,105-Speed 2513.01 samples/sec Loss 2.6909 LearningRate 0.000478 Epoch: 15 Global Step: 313510 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:12:18,311-Speed 2496.07 samples/sec Loss 2.6642 LearningRate 0.000478 Epoch: 15 Global Step: 313520 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:12:26,513-Speed 2497.43 samples/sec Loss 2.7037 LearningRate 0.000478 Epoch: 15 Global Step: 313530 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:12:34,715-Speed 2497.45 samples/sec Loss 2.7037 LearningRate 0.000478 Epoch: 15 Global Step: 313540 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:12:42,915-Speed 2498.00 samples/sec Loss 2.7867 LearningRate 0.000478 Epoch: 15 Global Step: 313550 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:12:51,118-Speed 2496.73 samples/sec Loss 2.7252 LearningRate 0.000478 Epoch: 15 Global Step: 313560 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:12:59,265-Speed 2514.23 samples/sec Loss 2.7131 LearningRate 0.000478 Epoch: 15 Global Step: 313570 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:13:07,475-Speed 2495.09 samples/sec Loss 2.7155 LearningRate 0.000478 Epoch: 15 Global Step: 313580 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:13:15,677-Speed 2497.37 samples/sec Loss 2.6421 LearningRate 0.000478 Epoch: 15 Global Step: 313590 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:13:23,876-Speed 2498.31 samples/sec Loss 2.6925 LearningRate 0.000478 Epoch: 15 Global Step: 313600 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:13:32,075-Speed 2498.16 samples/sec Loss 2.7058 LearningRate 0.000478 Epoch: 15 Global Step: 313610 Fp16 Grad Scale: 
16384 Required: 118 hours Training: 2022-07-08 14:13:40,273-Speed 2498.57 samples/sec Loss 2.7512 LearningRate 0.000478 Epoch: 15 Global Step: 313620 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:13:48,421-Speed 2513.98 samples/sec Loss 2.6421 LearningRate 0.000478 Epoch: 15 Global Step: 313630 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:13:56,631-Speed 2494.57 samples/sec Loss 2.6504 LearningRate 0.000478 Epoch: 15 Global Step: 313640 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:14:04,841-Speed 2494.96 samples/sec Loss 2.6916 LearningRate 0.000477 Epoch: 15 Global Step: 313650 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:14:13,041-Speed 2498.13 samples/sec Loss 2.6512 LearningRate 0.000477 Epoch: 15 Global Step: 313660 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:14:21,243-Speed 2497.60 samples/sec Loss 2.6845 LearningRate 0.000477 Epoch: 15 Global Step: 313670 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:14:29,453-Speed 2494.78 samples/sec Loss 2.7013 LearningRate 0.000477 Epoch: 15 Global Step: 313680 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:14:37,615-Speed 2509.65 samples/sec Loss 2.7161 LearningRate 0.000477 Epoch: 15 Global Step: 313690 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:14:45,815-Speed 2498.03 samples/sec Loss 2.6476 LearningRate 0.000477 Epoch: 15 Global Step: 313700 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:14:54,030-Speed 2493.32 samples/sec Loss 2.6564 LearningRate 0.000477 Epoch: 15 Global Step: 313710 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:15:02,229-Speed 2498.42 samples/sec Loss 2.6750 LearningRate 0.000477 Epoch: 15 Global Step: 313720 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:15:10,434-Speed 2496.66 samples/sec Loss 2.6989 LearningRate 0.000477 Epoch: 15 Global Step: 313730 Fp16 Grad 
Scale: 16384 Required: 118 hours Training: 2022-07-08 14:15:18,640-Speed 2496.43 samples/sec Loss 2.6581 LearningRate 0.000477 Epoch: 15 Global Step: 313740 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:15:26,786-Speed 2514.38 samples/sec Loss 2.6931 LearningRate 0.000477 Epoch: 15 Global Step: 313750 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:15:34,990-Speed 2496.72 samples/sec Loss 2.6787 LearningRate 0.000477 Epoch: 15 Global Step: 313760 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:15:43,189-Speed 2498.19 samples/sec Loss 2.6864 LearningRate 0.000477 Epoch: 15 Global Step: 313770 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:15:51,393-Speed 2496.90 samples/sec Loss 2.7361 LearningRate 0.000477 Epoch: 15 Global Step: 313780 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:15:59,589-Speed 2499.13 samples/sec Loss 2.7904 LearningRate 0.000477 Epoch: 15 Global Step: 313790 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:16:07,793-Speed 2496.79 samples/sec Loss 2.7149 LearningRate 0.000477 Epoch: 15 Global Step: 313800 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:16:15,938-Speed 2514.89 samples/sec Loss 2.7016 LearningRate 0.000477 Epoch: 15 Global Step: 313810 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:16:24,143-Speed 2496.31 samples/sec Loss 2.7496 LearningRate 0.000477 Epoch: 15 Global Step: 313820 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:16:32,356-Speed 2493.94 samples/sec Loss 2.7031 LearningRate 0.000477 Epoch: 15 Global Step: 313830 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:16:40,555-Speed 2498.46 samples/sec Loss 2.6794 LearningRate 0.000477 Epoch: 15 Global Step: 313840 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:16:48,755-Speed 2497.95 samples/sec Loss 2.6939 LearningRate 0.000477 Epoch: 15 Global Step: 313850 Fp16 
Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:16:56,968-Speed 2494.03 samples/sec Loss 2.6528 LearningRate 0.000477 Epoch: 15 Global Step: 313860 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:17:05,113-Speed 2514.71 samples/sec Loss 2.8149 LearningRate 0.000477 Epoch: 15 Global Step: 313870 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:17:13,308-Speed 2499.50 samples/sec Loss 2.7279 LearningRate 0.000477 Epoch: 15 Global Step: 313880 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:17:21,515-Speed 2496.00 samples/sec Loss 2.6781 LearningRate 0.000477 Epoch: 15 Global Step: 313890 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:17:29,714-Speed 2498.08 samples/sec Loss 2.7049 LearningRate 0.000477 Epoch: 15 Global Step: 313900 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:17:37,911-Speed 2498.94 samples/sec Loss 2.7009 LearningRate 0.000477 Epoch: 15 Global Step: 313910 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:17:46,106-Speed 2499.60 samples/sec Loss 2.7430 LearningRate 0.000477 Epoch: 15 Global Step: 313920 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:17:54,251-Speed 2514.52 samples/sec Loss 2.7012 LearningRate 0.000477 Epoch: 15 Global Step: 313930 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:18:02,446-Speed 2499.62 samples/sec Loss 2.7238 LearningRate 0.000477 Epoch: 15 Global Step: 313940 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:18:10,643-Speed 2499.14 samples/sec Loss 2.7188 LearningRate 0.000477 Epoch: 15 Global Step: 313950 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:18:18,841-Speed 2498.49 samples/sec Loss 2.6826 LearningRate 0.000477 Epoch: 15 Global Step: 313960 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:18:27,038-Speed 2498.80 samples/sec Loss 2.7110 LearningRate 0.000477 Epoch: 15 Global Step: 313970 
Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:18:35,234-Speed 2499.78 samples/sec Loss 2.6754 LearningRate 0.000477 Epoch: 15 Global Step: 313980 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:18:43,381-Speed 2513.98 samples/sec Loss 2.6704 LearningRate 0.000477 Epoch: 15 Global Step: 313990 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:18:51,579-Speed 2498.99 samples/sec Loss 2.7194 LearningRate 0.000477 Epoch: 15 Global Step: 314000 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:18:59,775-Speed 2499.46 samples/sec Loss 2.6989 LearningRate 0.000477 Epoch: 15 Global Step: 314010 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:19:07,978-Speed 2497.02 samples/sec Loss 2.7249 LearningRate 0.000477 Epoch: 15 Global Step: 314020 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:19:16,177-Speed 2498.43 samples/sec Loss 2.7231 LearningRate 0.000477 Epoch: 15 Global Step: 314030 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:19:24,383-Speed 2496.22 samples/sec Loss 2.5991 LearningRate 0.000477 Epoch: 15 Global Step: 314040 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:19:32,528-Speed 2514.66 samples/sec Loss 2.7413 LearningRate 0.000477 Epoch: 15 Global Step: 314050 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:19:40,726-Speed 2499.00 samples/sec Loss 2.7173 LearningRate 0.000477 Epoch: 15 Global Step: 314060 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:19:48,923-Speed 2498.79 samples/sec Loss 2.6677 LearningRate 0.000477 Epoch: 15 Global Step: 314070 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:19:57,137-Speed 2493.81 samples/sec Loss 2.6720 LearningRate 0.000477 Epoch: 15 Global Step: 314080 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:20:05,335-Speed 2498.30 samples/sec Loss 2.6588 LearningRate 0.000477 Epoch: 15 Global Step: 
314090 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:20:13,533-Speed 2498.70 samples/sec Loss 2.6976 LearningRate 0.000477 Epoch: 15 Global Step: 314100 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:20:21,684-Speed 2513.13 samples/sec Loss 2.6571 LearningRate 0.000477 Epoch: 15 Global Step: 314110 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:20:29,884-Speed 2498.12 samples/sec Loss 2.6395 LearningRate 0.000477 Epoch: 15 Global Step: 314120 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:20:38,083-Speed 2498.17 samples/sec Loss 2.6882 LearningRate 0.000477 Epoch: 15 Global Step: 314130 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:20:46,283-Speed 2497.80 samples/sec Loss 2.6373 LearningRate 0.000477 Epoch: 15 Global Step: 314140 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:20:54,483-Speed 2497.99 samples/sec Loss 2.6613 LearningRate 0.000477 Epoch: 15 Global Step: 314150 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:21:02,686-Speed 2497.14 samples/sec Loss 2.6202 LearningRate 0.000477 Epoch: 15 Global Step: 314160 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:21:10,832-Speed 2515.10 samples/sec Loss 2.7711 LearningRate 0.000477 Epoch: 15 Global Step: 314170 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:21:19,035-Speed 2497.31 samples/sec Loss 2.7059 LearningRate 0.000477 Epoch: 15 Global Step: 314180 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:21:27,237-Speed 2497.09 samples/sec Loss 2.6982 LearningRate 0.000476 Epoch: 15 Global Step: 314190 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:21:35,438-Speed 2497.82 samples/sec Loss 2.7647 LearningRate 0.000476 Epoch: 15 Global Step: 314200 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:21:43,640-Speed 2497.36 samples/sec Loss 2.7023 LearningRate 0.000476 Epoch: 15 Global 
Step: 314210 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:21:51,853-Speed 2494.20 samples/sec Loss 2.7194 LearningRate 0.000476 Epoch: 15 Global Step: 314220 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:21:59,998-Speed 2514.57 samples/sec Loss 2.7213 LearningRate 0.000476 Epoch: 15 Global Step: 314230 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:22:08,200-Speed 2497.57 samples/sec Loss 2.7758 LearningRate 0.000476 Epoch: 15 Global Step: 314240 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:22:16,399-Speed 2498.17 samples/sec Loss 2.7443 LearningRate 0.000476 Epoch: 15 Global Step: 314250 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:22:24,600-Speed 2497.59 samples/sec Loss 2.8150 LearningRate 0.000476 Epoch: 15 Global Step: 314260 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:22:32,795-Speed 2499.59 samples/sec Loss 2.8005 LearningRate 0.000476 Epoch: 15 Global Step: 314270 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:22:40,994-Speed 2497.86 samples/sec Loss 2.7977 LearningRate 0.000476 Epoch: 15 Global Step: 314280 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:22:49,138-Speed 2515.10 samples/sec Loss 2.7541 LearningRate 0.000476 Epoch: 15 Global Step: 314290 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:22:57,348-Speed 2495.01 samples/sec Loss 2.7846 LearningRate 0.000476 Epoch: 15 Global Step: 314300 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:23:05,546-Speed 2498.88 samples/sec Loss 2.7291 LearningRate 0.000476 Epoch: 15 Global Step: 314310 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:23:13,745-Speed 2498.32 samples/sec Loss 2.6953 LearningRate 0.000476 Epoch: 15 Global Step: 314320 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:23:21,942-Speed 2498.64 samples/sec Loss 2.7492 LearningRate 0.000476 Epoch: 15 
Global Step: 314330 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:23:30,142-Speed 2498.19 samples/sec Loss 2.7521 LearningRate 0.000476 Epoch: 15 Global Step: 314340 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:23:38,287-Speed 2515.49 samples/sec Loss 2.6701 LearningRate 0.000476 Epoch: 15 Global Step: 314350 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:23:46,496-Speed 2495.15 samples/sec Loss 2.7013 LearningRate 0.000476 Epoch: 15 Global Step: 314360 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:23:54,703-Speed 2496.10 samples/sec Loss 2.6092 LearningRate 0.000476 Epoch: 15 Global Step: 314370 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:24:02,903-Speed 2497.97 samples/sec Loss 2.6945 LearningRate 0.000476 Epoch: 15 Global Step: 314380 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:24:11,104-Speed 2497.65 samples/sec Loss 2.6640 LearningRate 0.000476 Epoch: 15 Global Step: 314390 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:24:19,308-Speed 2496.64 samples/sec Loss 2.6281 LearningRate 0.000476 Epoch: 15 Global Step: 314400 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:24:27,454-Speed 2514.69 samples/sec Loss 2.6220 LearningRate 0.000476 Epoch: 15 Global Step: 314410 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:24:35,657-Speed 2496.96 samples/sec Loss 2.6445 LearningRate 0.000476 Epoch: 15 Global Step: 314420 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:24:43,860-Speed 2497.18 samples/sec Loss 2.6691 LearningRate 0.000476 Epoch: 15 Global Step: 314430 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:24:52,060-Speed 2497.94 samples/sec Loss 2.6909 LearningRate 0.000476 Epoch: 15 Global Step: 314440 Fp16 Grad Scale: 16384 Required: 118 hours Training: 2022-07-08 14:25:00,271-Speed 2494.54 samples/sec Loss 2.6254 LearningRate 0.000476 
Epoch: 15 Global Step: 314450 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:25:08,474-Speed 2496.94 samples/sec Loss 2.6306 LearningRate 0.000476 Epoch: 15 Global Step: 314460 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:25:16,626-Speed 2512.72 samples/sec Loss 2.6329 LearningRate 0.000476 Epoch: 15 Global Step: 314470 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:25:24,826-Speed 2497.92 samples/sec Loss 2.6263 LearningRate 0.000476 Epoch: 15 Global Step: 314480 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:25:33,036-Speed 2495.24 samples/sec Loss 2.6005 LearningRate 0.000476 Epoch: 15 Global Step: 314490 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:25:41,248-Speed 2494.40 samples/sec Loss 2.6787 LearningRate 0.000476 Epoch: 15 Global Step: 314500 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:25:49,447-Speed 2498.21 samples/sec Loss 2.6257 LearningRate 0.000476 Epoch: 15 Global Step: 314510 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:25:57,646-Speed 2498.29 samples/sec Loss 2.6155 LearningRate 0.000476 Epoch: 15 Global Step: 314520 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:26:05,794-Speed 2513.88 samples/sec Loss 2.5927 LearningRate 0.000476 Epoch: 15 Global Step: 314530 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:26:14,001-Speed 2495.94 samples/sec Loss 2.6038 LearningRate 0.000476 Epoch: 15 Global Step: 314540 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:26:22,204-Speed 2496.88 samples/sec Loss 2.6554 LearningRate 0.000476 Epoch: 15 Global Step: 314550 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:26:30,410-Speed 2496.23 samples/sec Loss 2.7393 LearningRate 0.000476 Epoch: 15 Global Step: 314560 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:26:38,619-Speed 2495.18 samples/sec Loss 2.7172 LearningRate 
0.000476 Epoch: 15 Global Step: 314570 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:26:46,819-Speed 2497.85 samples/sec Loss 2.7081 LearningRate 0.000476 Epoch: 15 Global Step: 314580 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:26:54,966-Speed 2514.22 samples/sec Loss 2.6245 LearningRate 0.000476 Epoch: 15 Global Step: 314590 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:27:03,167-Speed 2497.65 samples/sec Loss 2.7858 LearningRate 0.000476 Epoch: 15 Global Step: 314600 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:27:11,368-Speed 2497.67 samples/sec Loss 2.6819 LearningRate 0.000476 Epoch: 15 Global Step: 314610 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:27:19,565-Speed 2498.90 samples/sec Loss 2.7044 LearningRate 0.000476 Epoch: 15 Global Step: 314620 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:27:27,766-Speed 2497.80 samples/sec Loss 2.7482 LearningRate 0.000476 Epoch: 15 Global Step: 314630 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:27:35,967-Speed 2497.52 samples/sec Loss 2.6806 LearningRate 0.000476 Epoch: 15 Global Step: 314640 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:27:44,115-Speed 2514.09 samples/sec Loss 2.7236 LearningRate 0.000476 Epoch: 15 Global Step: 314650 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:27:52,313-Speed 2498.17 samples/sec Loss 2.6720 LearningRate 0.000476 Epoch: 15 Global Step: 314660 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:28:00,515-Speed 2497.59 samples/sec Loss 2.7312 LearningRate 0.000476 Epoch: 15 Global Step: 314670 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:28:08,716-Speed 2497.63 samples/sec Loss 2.6963 LearningRate 0.000476 Epoch: 15 Global Step: 314680 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:28:16,914-Speed 2498.65 samples/sec Loss 2.7388 
LearningRate 0.000476 Epoch: 15 Global Step: 314690 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:28:25,112-Speed 2498.65 samples/sec Loss 2.6899 LearningRate 0.000476 Epoch: 15 Global Step: 314700 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:28:33,264-Speed 2512.83 samples/sec Loss 2.7148 LearningRate 0.000476 Epoch: 15 Global Step: 314710 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:28:41,479-Speed 2493.25 samples/sec Loss 2.6614 LearningRate 0.000476 Epoch: 15 Global Step: 314720 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:28:49,677-Speed 2498.87 samples/sec Loss 2.6463 LearningRate 0.000475 Epoch: 15 Global Step: 314730 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:28:57,884-Speed 2495.51 samples/sec Loss 2.6986 LearningRate 0.000475 Epoch: 15 Global Step: 314740 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:29:06,082-Speed 2498.74 samples/sec Loss 2.6937 LearningRate 0.000475 Epoch: 15 Global Step: 314750 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:29:14,284-Speed 2497.58 samples/sec Loss 2.7331 LearningRate 0.000475 Epoch: 15 Global Step: 314760 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:29:22,435-Speed 2512.96 samples/sec Loss 2.8029 LearningRate 0.000475 Epoch: 15 Global Step: 314770 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:29:30,632-Speed 2498.70 samples/sec Loss 2.6781 LearningRate 0.000475 Epoch: 15 Global Step: 314780 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:29:38,834-Speed 2497.47 samples/sec Loss 2.7171 LearningRate 0.000475 Epoch: 15 Global Step: 314790 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:29:47,035-Speed 2498.24 samples/sec Loss 2.6650 LearningRate 0.000475 Epoch: 15 Global Step: 314800 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:29:55,232-Speed 2498.82 samples/sec Loss 
2.6774 LearningRate 0.000475 Epoch: 15 Global Step: 314810 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:30:03,430-Speed 2498.45 samples/sec Loss 2.7513 LearningRate 0.000475 Epoch: 15 Global Step: 314820 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:30:11,581-Speed 2513.06 samples/sec Loss 2.6576 LearningRate 0.000475 Epoch: 15 Global Step: 314830 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:30:19,806-Speed 2490.42 samples/sec Loss 2.7355 LearningRate 0.000475 Epoch: 15 Global Step: 314840 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:30:28,012-Speed 2496.19 samples/sec Loss 2.6799 LearningRate 0.000475 Epoch: 15 Global Step: 314850 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:30:36,223-Speed 2494.45 samples/sec Loss 2.6694 LearningRate 0.000475 Epoch: 15 Global Step: 314860 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:30:44,424-Speed 2497.85 samples/sec Loss 2.6473 LearningRate 0.000475 Epoch: 15 Global Step: 314870 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:30:52,627-Speed 2497.14 samples/sec Loss 2.6344 LearningRate 0.000475 Epoch: 15 Global Step: 314880 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:31:00,778-Speed 2513.08 samples/sec Loss 2.6335 LearningRate 0.000475 Epoch: 15 Global Step: 314890 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:31:08,981-Speed 2496.90 samples/sec Loss 2.7174 LearningRate 0.000475 Epoch: 15 Global Step: 314900 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:31:17,177-Speed 2499.34 samples/sec Loss 2.6690 LearningRate 0.000475 Epoch: 15 Global Step: 314910 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:31:25,373-Speed 2499.06 samples/sec Loss 2.6932 LearningRate 0.000475 Epoch: 15 Global Step: 314920 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:31:33,571-Speed 2498.61 samples/sec 
Loss 2.7092 LearningRate 0.000475 Epoch: 15 Global Step: 314930 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:31:41,770-Speed 2498.29 samples/sec Loss 2.7213 LearningRate 0.000475 Epoch: 15 Global Step: 314940 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:31:49,919-Speed 2513.81 samples/sec Loss 2.6475 LearningRate 0.000475 Epoch: 15 Global Step: 314950 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:31:58,131-Speed 2494.15 samples/sec Loss 2.6326 LearningRate 0.000475 Epoch: 15 Global Step: 314960 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:32:06,327-Speed 2499.14 samples/sec Loss 2.6539 LearningRate 0.000475 Epoch: 15 Global Step: 314970 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:32:14,532-Speed 2496.38 samples/sec Loss 2.6823 LearningRate 0.000475 Epoch: 15 Global Step: 314980 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:32:22,811-Speed 2497.83 samples/sec Loss 2.7130 LearningRate 0.000475 Epoch: 15 Global Step: 314990 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:32:34,674-Speed 1726.52 samples/sec Loss 2.6992 LearningRate 0.000475 Epoch: 15 Global Step: 315000 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:32:42,821-Speed 2514.08 samples/sec Loss 2.6511 LearningRate 0.000475 Epoch: 15 Global Step: 315010 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:32:51,075-Speed 2499.56 samples/sec Loss 2.6650 LearningRate 0.000475 Epoch: 15 Global Step: 315020 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:32:59,369-Speed 2500.20 samples/sec Loss 2.6538 LearningRate 0.000475 Epoch: 15 Global Step: 315030 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:33:07,567-Speed 2498.18 samples/sec Loss 2.6813 LearningRate 0.000475 Epoch: 15 Global Step: 315040 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:33:21,522-Speed 2500.28 
samples/sec Loss 2.6684 LearningRate 0.000475 Epoch: 15 Global Step: 315050 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:33:29,779-Speed 2499.07 samples/sec Loss 2.6500 LearningRate 0.000475 Epoch: 15 Global Step: 315060 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:33:42,275-Speed 1639.21 samples/sec Loss 2.7196 LearningRate 0.000475 Epoch: 15 Global Step: 315070 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:33:50,531-Speed 2504.21 samples/sec Loss 2.7128 LearningRate 0.000475 Epoch: 15 Global Step: 315080 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:34:05,414-Speed 2488.88 samples/sec Loss 2.6981 LearningRate 0.000475 Epoch: 15 Global Step: 315090 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:34:13,691-Speed 2504.01 samples/sec Loss 2.6803 LearningRate 0.000475 Epoch: 15 Global Step: 315100 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:34:21,883-Speed 2500.33 samples/sec Loss 2.6687 LearningRate 0.000475 Epoch: 15 Global Step: 315110 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:34:34,526-Speed 1624.24 samples/sec Loss 2.7398 LearningRate 0.000475 Epoch: 15 Global Step: 315120 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:34:43,254-Speed 2356.08 samples/sec Loss 2.7062 LearningRate 0.000475 Epoch: 15 Global Step: 315130 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:34:51,445-Speed 2500.58 samples/sec Loss 2.7081 LearningRate 0.000475 Epoch: 15 Global Step: 315140 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:34:59,761-Speed 2502.07 samples/sec Loss 2.6952 LearningRate 0.000475 Epoch: 15 Global Step: 315150 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:35:07,972-Speed 2501.48 samples/sec Loss 2.7302 LearningRate 0.000475 Epoch: 15 Global Step: 315160 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:35:16,841-Speed 
2501.08 samples/sec Loss 2.6739 LearningRate 0.000475 Epoch: 15 Global Step: 315170 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:35:27,975-Speed 1853.99 samples/sec Loss 2.7049 LearningRate 0.000475 Epoch: 15 Global Step: 315180 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:35:36,119-Speed 2515.21 samples/sec Loss 2.7384 LearningRate 0.000475 Epoch: 15 Global Step: 315190 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:35:44,520-Speed 2496.74 samples/sec Loss 2.7230 LearningRate 0.000475 Epoch: 15 Global Step: 315200 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:35:54,023-Speed 2500.33 samples/sec Loss 2.6803 LearningRate 0.000475 Epoch: 15 Global Step: 315210 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:36:03,434-Speed 2176.35 samples/sec Loss 2.6809 LearningRate 0.000475 Epoch: 15 Global Step: 315220 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:36:11,638-Speed 2498.74 samples/sec Loss 2.7239 LearningRate 0.000475 Epoch: 15 Global Step: 315230 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:36:19,918-Speed 2485.04 samples/sec Loss 2.6424 LearningRate 0.000475 Epoch: 15 Global Step: 315240 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:36:28,093-Speed 2507.31 samples/sec Loss 2.6935 LearningRate 0.000475 Epoch: 15 Global Step: 315250 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:36:37,612-Speed 2151.76 samples/sec Loss 2.6856 LearningRate 0.000475 Epoch: 15 Global Step: 315260 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:36:45,830-Speed 2492.39 samples/sec Loss 2.7109 LearningRate 0.000475 Epoch: 15 Global Step: 315270 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:36:54,034-Speed 2496.73 samples/sec Loss 2.7129 LearningRate 0.000474 Epoch: 15 Global Step: 315280 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 
14:37:02,241-Speed 2496.20 samples/sec Loss 2.6825 LearningRate 0.000474 Epoch: 15 Global Step: 315290 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:37:10,457-Speed 2492.98 samples/sec Loss 2.6629 LearningRate 0.000474 Epoch: 15 Global Step: 315300 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:37:18,608-Speed 2512.93 samples/sec Loss 2.6646 LearningRate 0.000474 Epoch: 15 Global Step: 315310 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:37:26,829-Speed 2491.83 samples/sec Loss 2.6733 LearningRate 0.000474 Epoch: 15 Global Step: 315320 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:37:35,043-Speed 2493.88 samples/sec Loss 2.6203 LearningRate 0.000474 Epoch: 15 Global Step: 315330 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:37:43,247-Speed 2496.79 samples/sec Loss 2.6769 LearningRate 0.000474 Epoch: 15 Global Step: 315340 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:37:51,449-Speed 2497.29 samples/sec Loss 2.7112 LearningRate 0.000474 Epoch: 15 Global Step: 315350 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:37:59,652-Speed 2497.44 samples/sec Loss 2.6437 LearningRate 0.000474 Epoch: 15 Global Step: 315360 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:38:07,803-Speed 2513.05 samples/sec Loss 2.6821 LearningRate 0.000474 Epoch: 15 Global Step: 315370 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:38:16,004-Speed 2497.58 samples/sec Loss 2.7063 LearningRate 0.000474 Epoch: 15 Global Step: 315380 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:38:24,206-Speed 2497.60 samples/sec Loss 2.6256 LearningRate 0.000474 Epoch: 15 Global Step: 315390 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:38:32,416-Speed 2495.07 samples/sec Loss 2.6676 LearningRate 0.000474 Epoch: 15 Global Step: 315400 Fp16 Grad Scale: 32768 Required: 118 hours Training: 
2022-07-08 14:38:40,619-Speed 2497.07 samples/sec Loss 2.6850 LearningRate 0.000474 Epoch: 15 Global Step: 315410 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:38:48,821-Speed 2497.36 samples/sec Loss 2.7080 LearningRate 0.000474 Epoch: 15 Global Step: 315420 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:38:56,967-Speed 2514.55 samples/sec Loss 2.6964 LearningRate 0.000474 Epoch: 15 Global Step: 315430 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:39:05,166-Speed 2498.22 samples/sec Loss 2.7032 LearningRate 0.000474 Epoch: 15 Global Step: 315440 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:39:13,376-Speed 2495.02 samples/sec Loss 2.6629 LearningRate 0.000474 Epoch: 15 Global Step: 315450 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:39:21,579-Speed 2497.07 samples/sec Loss 2.6564 LearningRate 0.000474 Epoch: 15 Global Step: 315460 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:39:29,789-Speed 2494.93 samples/sec Loss 2.6204 LearningRate 0.000474 Epoch: 15 Global Step: 315470 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:39:38,005-Speed 2492.97 samples/sec Loss 2.6234 LearningRate 0.000474 Epoch: 15 Global Step: 315480 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:39:46,149-Speed 2515.43 samples/sec Loss 2.6426 LearningRate 0.000474 Epoch: 15 Global Step: 315490 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:39:54,350-Speed 2497.39 samples/sec Loss 2.6711 LearningRate 0.000474 Epoch: 15 Global Step: 315500 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:40:02,558-Speed 2495.44 samples/sec Loss 2.6808 LearningRate 0.000474 Epoch: 15 Global Step: 315510 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:40:10,759-Speed 2497.80 samples/sec Loss 2.6700 LearningRate 0.000474 Epoch: 15 Global Step: 315520 Fp16 Grad Scale: 32768 Required: 118 hours 
Training: 2022-07-08 14:40:18,967-Speed 2495.54 samples/sec Loss 2.6355 LearningRate 0.000474 Epoch: 15 Global Step: 315530 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:40:27,169-Speed 2497.08 samples/sec Loss 2.7036 LearningRate 0.000474 Epoch: 15 Global Step: 315540 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:40:35,317-Speed 2514.12 samples/sec Loss 2.6295 LearningRate 0.000474 Epoch: 15 Global Step: 315550 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:40:43,521-Speed 2496.58 samples/sec Loss 2.6299 LearningRate 0.000474 Epoch: 15 Global Step: 315560 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:40:51,724-Speed 2497.27 samples/sec Loss 2.6694 LearningRate 0.000474 Epoch: 15 Global Step: 315570 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:40:59,924-Speed 2498.01 samples/sec Loss 2.6627 LearningRate 0.000474 Epoch: 15 Global Step: 315580 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:41:08,126-Speed 2497.27 samples/sec Loss 2.6666 LearningRate 0.000474 Epoch: 15 Global Step: 315590 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:41:16,325-Speed 2498.54 samples/sec Loss 2.6487 LearningRate 0.000474 Epoch: 15 Global Step: 315600 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:41:24,474-Speed 2513.54 samples/sec Loss 2.6207 LearningRate 0.000474 Epoch: 15 Global Step: 315610 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:41:32,686-Speed 2494.39 samples/sec Loss 2.6754 LearningRate 0.000474 Epoch: 15 Global Step: 315620 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:41:40,888-Speed 2497.47 samples/sec Loss 2.6586 LearningRate 0.000474 Epoch: 15 Global Step: 315630 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:41:49,091-Speed 2497.06 samples/sec Loss 2.6391 LearningRate 0.000474 Epoch: 15 Global Step: 315640 Fp16 Grad Scale: 32768 Required: 118 
hours Training: 2022-07-08 14:41:57,292-Speed 2497.74 samples/sec Loss 2.6411 LearningRate 0.000474 Epoch: 15 Global Step: 315650 Fp16 Grad Scale: 65536 Required: 118 hours Training: 2022-07-08 14:42:05,491-Speed 2498.38 samples/sec Loss 2.6682 LearningRate 0.000474 Epoch: 15 Global Step: 315660 Fp16 Grad Scale: 65536 Required: 118 hours Training: 2022-07-08 14:42:13,638-Speed 2514.03 samples/sec Loss 2.6901 LearningRate 0.000474 Epoch: 15 Global Step: 315670 Fp16 Grad Scale: 65536 Required: 118 hours Training: 2022-07-08 14:42:21,806-Speed 2507.81 samples/sec Loss 2.6594 LearningRate 0.000474 Epoch: 15 Global Step: 315680 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:42:30,008-Speed 2497.54 samples/sec Loss 2.6589 LearningRate 0.000474 Epoch: 15 Global Step: 315690 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:42:38,235-Speed 2489.54 samples/sec Loss 2.6460 LearningRate 0.000474 Epoch: 15 Global Step: 315700 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:42:46,436-Speed 2497.78 samples/sec Loss 2.6239 LearningRate 0.000474 Epoch: 15 Global Step: 315710 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:42:54,640-Speed 2496.66 samples/sec Loss 2.6479 LearningRate 0.000474 Epoch: 15 Global Step: 315720 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:43:02,790-Speed 2513.47 samples/sec Loss 2.6472 LearningRate 0.000474 Epoch: 15 Global Step: 315730 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:43:10,990-Speed 2497.76 samples/sec Loss 2.6818 LearningRate 0.000474 Epoch: 15 Global Step: 315740 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:43:19,198-Speed 2495.53 samples/sec Loss 2.6634 LearningRate 0.000474 Epoch: 15 Global Step: 315750 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:43:27,411-Speed 2493.99 samples/sec Loss 2.6397 LearningRate 0.000474 Epoch: 15 Global Step: 315760 Fp16 Grad Scale: 32768 Required: 
118 hours Training: 2022-07-08 14:43:35,615-Speed 2496.72 samples/sec Loss 2.6219 LearningRate 0.000474 Epoch: 15 Global Step: 315770 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:43:43,816-Speed 2497.67 samples/sec Loss 2.6243 LearningRate 0.000474 Epoch: 15 Global Step: 315780 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:43:51,962-Speed 2514.73 samples/sec Loss 2.6304 LearningRate 0.000474 Epoch: 15 Global Step: 315790 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:44:00,165-Speed 2497.13 samples/sec Loss 2.7230 LearningRate 0.000474 Epoch: 15 Global Step: 315800 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:44:08,371-Speed 2495.91 samples/sec Loss 2.6607 LearningRate 0.000474 Epoch: 15 Global Step: 315810 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:44:16,584-Speed 2494.54 samples/sec Loss 2.6709 LearningRate 0.000473 Epoch: 15 Global Step: 315820 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:44:24,800-Speed 2493.27 samples/sec Loss 2.7320 LearningRate 0.000473 Epoch: 15 Global Step: 315830 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:44:33,002-Speed 2496.98 samples/sec Loss 2.7010 LearningRate 0.000473 Epoch: 15 Global Step: 315840 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:44:41,152-Speed 2513.18 samples/sec Loss 2.6967 LearningRate 0.000473 Epoch: 15 Global Step: 315850 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:44:49,363-Speed 2495.01 samples/sec Loss 2.6201 LearningRate 0.000473 Epoch: 15 Global Step: 315860 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:44:57,569-Speed 2496.01 samples/sec Loss 2.6589 LearningRate 0.000473 Epoch: 15 Global Step: 315870 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:45:05,768-Speed 2498.09 samples/sec Loss 2.6529 LearningRate 0.000473 Epoch: 15 Global Step: 315880 Fp16 Grad Scale: 32768 
Required: 118 hours Training: 2022-07-08 14:45:13,977-Speed 2495.42 samples/sec Loss 2.6740 LearningRate 0.000473 Epoch: 15 Global Step: 315890 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:45:22,189-Speed 2494.34 samples/sec Loss 2.6408 LearningRate 0.000473 Epoch: 15 Global Step: 315900 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:45:30,339-Speed 2513.38 samples/sec Loss 2.6694 LearningRate 0.000473 Epoch: 15 Global Step: 315910 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:45:38,540-Speed 2497.44 samples/sec Loss 2.6213 LearningRate 0.000473 Epoch: 15 Global Step: 315920 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:45:46,739-Speed 2498.10 samples/sec Loss 2.6440 LearningRate 0.000473 Epoch: 15 Global Step: 315930 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:45:54,938-Speed 2498.40 samples/sec Loss 2.6428 LearningRate 0.000473 Epoch: 15 Global Step: 315940 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:46:03,137-Speed 2498.26 samples/sec Loss 2.6320 LearningRate 0.000473 Epoch: 15 Global Step: 315950 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:46:11,348-Speed 2494.76 samples/sec Loss 2.6426 LearningRate 0.000473 Epoch: 15 Global Step: 315960 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:46:19,494-Speed 2514.48 samples/sec Loss 2.6123 LearningRate 0.000473 Epoch: 15 Global Step: 315970 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:46:27,709-Speed 2493.28 samples/sec Loss 2.6422 LearningRate 0.000473 Epoch: 15 Global Step: 315980 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:46:35,910-Speed 2497.77 samples/sec Loss 2.6681 LearningRate 0.000473 Epoch: 15 Global Step: 315990 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:46:44,119-Speed 2495.41 samples/sec Loss 2.6605 LearningRate 0.000473 Epoch: 15 Global Step: 316000 Fp16 Grad Scale: 
32768 Required: 118 hours Training: 2022-07-08 14:46:52,321-Speed 2497.35 samples/sec Loss 2.6575 LearningRate 0.000473 Epoch: 15 Global Step: 316010 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:47:00,521-Speed 2498.04 samples/sec Loss 2.6142 LearningRate 0.000473 Epoch: 15 Global Step: 316020 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:47:08,669-Speed 2513.63 samples/sec Loss 2.6593 LearningRate 0.000473 Epoch: 15 Global Step: 316030 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:47:16,885-Speed 2493.06 samples/sec Loss 2.6506 LearningRate 0.000473 Epoch: 15 Global Step: 316040 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:47:25,086-Speed 2498.11 samples/sec Loss 2.6367 LearningRate 0.000473 Epoch: 15 Global Step: 316050 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:47:33,288-Speed 2497.19 samples/sec Loss 2.6425 LearningRate 0.000473 Epoch: 15 Global Step: 316060 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:47:41,491-Speed 2497.03 samples/sec Loss 2.6798 LearningRate 0.000473 Epoch: 15 Global Step: 316070 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:47:49,691-Speed 2498.40 samples/sec Loss 2.6724 LearningRate 0.000473 Epoch: 15 Global Step: 316080 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:47:57,842-Speed 2512.93 samples/sec Loss 2.7752 LearningRate 0.000473 Epoch: 15 Global Step: 316090 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:48:06,039-Speed 2498.79 samples/sec Loss 2.6908 LearningRate 0.000473 Epoch: 15 Global Step: 316100 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:48:14,245-Speed 2496.32 samples/sec Loss 2.6851 LearningRate 0.000473 Epoch: 15 Global Step: 316110 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:48:22,460-Speed 2493.48 samples/sec Loss 2.6855 LearningRate 0.000473 Epoch: 15 Global Step: 316120 Fp16 Grad 
Scale: 32768 Required: 118 hours Training: 2022-07-08 14:48:30,658-Speed 2498.36 samples/sec Loss 2.6701 LearningRate 0.000473 Epoch: 15 Global Step: 316130 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:48:38,858-Speed 2497.99 samples/sec Loss 2.7094 LearningRate 0.000473 Epoch: 15 Global Step: 316140 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:48:47,007-Speed 2513.46 samples/sec Loss 2.7066 LearningRate 0.000473 Epoch: 15 Global Step: 316150 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:48:55,204-Speed 2499.09 samples/sec Loss 2.6616 LearningRate 0.000473 Epoch: 15 Global Step: 316160 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:49:03,405-Speed 2497.54 samples/sec Loss 2.6501 LearningRate 0.000473 Epoch: 15 Global Step: 316170 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:49:11,605-Speed 2497.87 samples/sec Loss 2.6162 LearningRate 0.000473 Epoch: 15 Global Step: 316180 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:49:19,802-Speed 2499.26 samples/sec Loss 2.7199 LearningRate 0.000473 Epoch: 15 Global Step: 316190 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:49:28,001-Speed 2498.36 samples/sec Loss 2.6536 LearningRate 0.000473 Epoch: 15 Global Step: 316200 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:49:36,147-Speed 2514.53 samples/sec Loss 2.6510 LearningRate 0.000473 Epoch: 15 Global Step: 316210 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:49:44,350-Speed 2497.02 samples/sec Loss 2.6728 LearningRate 0.000473 Epoch: 15 Global Step: 316220 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:49:52,552-Speed 2497.28 samples/sec Loss 2.6646 LearningRate 0.000473 Epoch: 15 Global Step: 316230 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:50:00,759-Speed 2495.86 samples/sec Loss 2.6795 LearningRate 0.000473 Epoch: 15 Global Step: 316240 Fp16 
Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:50:08,956-Speed 2498.88 samples/sec Loss 2.6014 LearningRate 0.000473 Epoch: 15 Global Step: 316250 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:50:17,165-Speed 2495.17 samples/sec Loss 2.6854 LearningRate 0.000473 Epoch: 15 Global Step: 316260 Fp16 Grad Scale: 32768 Required: 118 hours Training: 2022-07-08 14:50:25,321-Speed 2511.52 samples/sec Loss 2.6774 LearningRate 0.000473 Epoch: 15 Global Step: 316270 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:50:33,523-Speed 2497.43 samples/sec Loss 2.7716 LearningRate 0.000473 Epoch: 15 Global Step: 316280 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:50:41,720-Speed 2498.90 samples/sec Loss 2.7007 LearningRate 0.000473 Epoch: 15 Global Step: 316290 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:50:49,925-Speed 2496.46 samples/sec Loss 2.7777 LearningRate 0.000473 Epoch: 15 Global Step: 316300 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:50:58,124-Speed 2498.23 samples/sec Loss 2.8256 LearningRate 0.000473 Epoch: 15 Global Step: 316310 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:51:06,327-Speed 2496.98 samples/sec Loss 2.7550 LearningRate 0.000473 Epoch: 15 Global Step: 316320 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:51:14,474-Speed 2514.12 samples/sec Loss 2.8629 LearningRate 0.000473 Epoch: 15 Global Step: 316330 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:51:22,678-Speed 2496.71 samples/sec Loss 2.7113 LearningRate 0.000473 Epoch: 15 Global Step: 316340 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:51:30,878-Speed 2498.14 samples/sec Loss 2.7454 LearningRate 0.000473 Epoch: 15 Global Step: 316350 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:51:39,078-Speed 2498.13 samples/sec Loss 2.7563 LearningRate 0.000472 Epoch: 15 Global Step: 316360 
Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:51:47,279-Speed 2497.72 samples/sec Loss 2.7854 LearningRate 0.000472 Epoch: 15 Global Step: 316370 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:51:55,477-Speed 2498.45 samples/sec Loss 2.7170 LearningRate 0.000472 Epoch: 15 Global Step: 316380 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:52:03,643-Speed 2508.34 samples/sec Loss 2.7523 LearningRate 0.000472 Epoch: 15 Global Step: 316390 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:52:11,846-Speed 2497.04 samples/sec Loss 2.7116 LearningRate 0.000472 Epoch: 15 Global Step: 316400 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:52:20,045-Speed 2498.01 samples/sec Loss 2.7367 LearningRate 0.000472 Epoch: 15 Global Step: 316410 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:52:28,246-Speed 2497.71 samples/sec Loss 2.7174 LearningRate 0.000472 Epoch: 15 Global Step: 316420 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:52:36,449-Speed 2497.23 samples/sec Loss 2.6702 LearningRate 0.000472 Epoch: 15 Global Step: 316430 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:52:44,649-Speed 2497.84 samples/sec Loss 2.6656 LearningRate 0.000472 Epoch: 15 Global Step: 316440 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:52:52,792-Speed 2515.34 samples/sec Loss 2.6233 LearningRate 0.000472 Epoch: 15 Global Step: 316450 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:53:00,997-Speed 2496.97 samples/sec Loss 2.6471 LearningRate 0.000472 Epoch: 15 Global Step: 316460 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:53:09,201-Speed 2496.53 samples/sec Loss 2.6581 LearningRate 0.000472 Epoch: 15 Global Step: 316470 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:53:17,410-Speed 2495.24 samples/sec Loss 2.6744 LearningRate 0.000472 Epoch: 15 Global Step: 
316480 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:53:25,608-Speed 2498.62 samples/sec Loss 2.6497 LearningRate 0.000472 Epoch: 15 Global Step: 316490 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:53:33,808-Speed 2498.02 samples/sec Loss 2.6961 LearningRate 0.000472 Epoch: 15 Global Step: 316500 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:53:41,961-Speed 2512.21 samples/sec Loss 2.6672 LearningRate 0.000472 Epoch: 15 Global Step: 316510 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:53:50,159-Speed 2498.96 samples/sec Loss 2.6836 LearningRate 0.000472 Epoch: 15 Global Step: 316520 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:53:58,371-Speed 2494.05 samples/sec Loss 2.7258 LearningRate 0.000472 Epoch: 15 Global Step: 316530 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:54:06,613-Speed 2485.12 samples/sec Loss 2.7170 LearningRate 0.000472 Epoch: 15 Global Step: 316540 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:54:14,813-Speed 2497.88 samples/sec Loss 2.7172 LearningRate 0.000472 Epoch: 15 Global Step: 316550 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:54:23,014-Speed 2498.28 samples/sec Loss 2.7356 LearningRate 0.000472 Epoch: 15 Global Step: 316560 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:54:31,159-Speed 2514.59 samples/sec Loss 2.6951 LearningRate 0.000472 Epoch: 15 Global Step: 316570 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:54:39,357-Speed 2498.49 samples/sec Loss 2.6954 LearningRate 0.000472 Epoch: 15 Global Step: 316580 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:54:47,557-Speed 2498.02 samples/sec Loss 2.6916 LearningRate 0.000472 Epoch: 15 Global Step: 316590 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:54:55,757-Speed 2497.90 samples/sec Loss 2.6490 LearningRate 0.000472 Epoch: 15 Global 
Step: 316600 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:55:03,981-Speed 2490.48 samples/sec Loss 2.7347 LearningRate 0.000472 Epoch: 15 Global Step: 316610 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:55:12,183-Speed 2497.57 samples/sec Loss 2.7129 LearningRate 0.000472 Epoch: 15 Global Step: 316620 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:55:20,335-Speed 2512.58 samples/sec Loss 2.6877 LearningRate 0.000472 Epoch: 15 Global Step: 316630 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:55:28,538-Speed 2496.85 samples/sec Loss 2.7035 LearningRate 0.000472 Epoch: 15 Global Step: 316640 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:55:36,749-Speed 2494.86 samples/sec Loss 2.6539 LearningRate 0.000472 Epoch: 15 Global Step: 316650 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:55:44,948-Speed 2498.32 samples/sec Loss 2.7183 LearningRate 0.000472 Epoch: 15 Global Step: 316660 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:55:53,147-Speed 2498.38 samples/sec Loss 2.6999 LearningRate 0.000472 Epoch: 15 Global Step: 316670 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:56:01,343-Speed 2499.01 samples/sec Loss 2.6913 LearningRate 0.000472 Epoch: 15 Global Step: 316680 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:56:09,493-Speed 2513.29 samples/sec Loss 2.6724 LearningRate 0.000472 Epoch: 15 Global Step: 316690 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:56:17,694-Speed 2498.02 samples/sec Loss 2.7242 LearningRate 0.000472 Epoch: 15 Global Step: 316700 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:56:25,892-Speed 2498.63 samples/sec Loss 2.7181 LearningRate 0.000472 Epoch: 15 Global Step: 316710 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:56:34,092-Speed 2497.74 samples/sec Loss 2.6988 LearningRate 0.000472 Epoch: 15 
Global Step: 316720 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:56:42,290-Speed 2498.35 samples/sec Loss 2.7293 LearningRate 0.000472 Epoch: 15 Global Step: 316730 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:56:50,493-Speed 2497.22 samples/sec Loss 2.6474 LearningRate 0.000472 Epoch: 15 Global Step: 316740 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:56:58,647-Speed 2512.22 samples/sec Loss 2.7412 LearningRate 0.000472 Epoch: 15 Global Step: 316750 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:57:06,848-Speed 2497.48 samples/sec Loss 2.7828 LearningRate 0.000472 Epoch: 15 Global Step: 316760 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:57:15,055-Speed 2496.11 samples/sec Loss 2.7377 LearningRate 0.000472 Epoch: 15 Global Step: 316770 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:57:23,253-Speed 2498.89 samples/sec Loss 2.7851 LearningRate 0.000472 Epoch: 15 Global Step: 316780 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:57:31,452-Speed 2498.11 samples/sec Loss 2.7741 LearningRate 0.000472 Epoch: 15 Global Step: 316790 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:57:39,652-Speed 2498.11 samples/sec Loss 2.6737 LearningRate 0.000472 Epoch: 15 Global Step: 316800 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:57:47,801-Speed 2513.89 samples/sec Loss 2.6770 LearningRate 0.000472 Epoch: 15 Global Step: 316810 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:57:56,002-Speed 2497.72 samples/sec Loss 2.6609 LearningRate 0.000472 Epoch: 15 Global Step: 316820 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:58:04,209-Speed 2495.83 samples/sec Loss 2.7244 LearningRate 0.000472 Epoch: 15 Global Step: 316830 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:58:12,407-Speed 2498.40 samples/sec Loss 2.7071 LearningRate 0.000472 
Epoch: 15 Global Step: 316840 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:58:20,609-Speed 2497.45 samples/sec Loss 2.6502 LearningRate 0.000472 Epoch: 15 Global Step: 316850 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:58:28,812-Speed 2497.08 samples/sec Loss 2.6692 LearningRate 0.000472 Epoch: 15 Global Step: 316860 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:58:36,963-Speed 2513.11 samples/sec Loss 2.7278 LearningRate 0.000472 Epoch: 15 Global Step: 316870 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 14:58:45,176-Speed 2493.89 samples/sec Loss 2.6277 LearningRate 0.000472 Epoch: 15 Global Step: 316880 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 14:58:53,373-Speed 2499.34 samples/sec Loss 2.6237 LearningRate 0.000472 Epoch: 15 Global Step: 316890 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 14:59:01,599-Speed 2490.01 samples/sec Loss 2.6099 LearningRate 0.000471 Epoch: 15 Global Step: 316900 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 14:59:09,796-Speed 2498.77 samples/sec Loss 2.6765 LearningRate 0.000471 Epoch: 15 Global Step: 316910 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 14:59:17,995-Speed 2498.20 samples/sec Loss 2.6323 LearningRate 0.000471 Epoch: 15 Global Step: 316920 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 14:59:26,143-Speed 2514.11 samples/sec Loss 2.6715 LearningRate 0.000471 Epoch: 15 Global Step: 316930 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 14:59:34,339-Speed 2499.21 samples/sec Loss 2.7270 LearningRate 0.000471 Epoch: 15 Global Step: 316940 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 14:59:42,538-Speed 2498.05 samples/sec Loss 2.7386 LearningRate 0.000471 Epoch: 15 Global Step: 316950 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 14:59:50,737-Speed 2498.37 samples/sec Loss 2.6904 LearningRate 
0.000471 Epoch: 15 Global Step: 316960 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 14:59:58,936-Speed 2498.29 samples/sec Loss 2.6674 LearningRate 0.000471 Epoch: 15 Global Step: 316970 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:00:07,136-Speed 2498.02 samples/sec Loss 2.6044 LearningRate 0.000471 Epoch: 15 Global Step: 316980 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:00:15,279-Speed 2515.45 samples/sec Loss 2.6838 LearningRate 0.000471 Epoch: 15 Global Step: 316990 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:00:23,484-Speed 2496.34 samples/sec Loss 2.6197 LearningRate 0.000471 Epoch: 15 Global Step: 317000 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:00:31,684-Speed 2498.03 samples/sec Loss 2.6845 LearningRate 0.000471 Epoch: 15 Global Step: 317010 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:00:39,887-Speed 2497.15 samples/sec Loss 2.6608 LearningRate 0.000471 Epoch: 15 Global Step: 317020 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:00:48,084-Speed 2499.06 samples/sec Loss 2.6177 LearningRate 0.000471 Epoch: 15 Global Step: 317030 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:00:56,295-Speed 2494.55 samples/sec Loss 2.6895 LearningRate 0.000471 Epoch: 15 Global Step: 317040 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:01:04,440-Speed 2514.66 samples/sec Loss 2.7045 LearningRate 0.000471 Epoch: 15 Global Step: 317050 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:01:12,639-Speed 2498.44 samples/sec Loss 2.7087 LearningRate 0.000471 Epoch: 15 Global Step: 317060 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:01:20,838-Speed 2497.96 samples/sec Loss 2.6849 LearningRate 0.000471 Epoch: 15 Global Step: 317070 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:01:29,036-Speed 2498.76 samples/sec Loss 2.7036 
LearningRate 0.000471 Epoch: 15 Global Step: 317080 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:01:37,231-Speed 2499.45 samples/sec Loss 2.6823 LearningRate 0.000471 Epoch: 15 Global Step: 317090 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:01:45,433-Speed 2497.40 samples/sec Loss 2.6858 LearningRate 0.000471 Epoch: 15 Global Step: 317100 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:01:53,589-Speed 2511.70 samples/sec Loss 2.7115 LearningRate 0.000471 Epoch: 15 Global Step: 317110 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:02:01,787-Speed 2498.40 samples/sec Loss 2.6167 LearningRate 0.000471 Epoch: 15 Global Step: 317120 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:02:09,991-Speed 2496.83 samples/sec Loss 2.6126 LearningRate 0.000471 Epoch: 15 Global Step: 317130 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:02:18,192-Speed 2497.93 samples/sec Loss 2.6758 LearningRate 0.000471 Epoch: 15 Global Step: 317140 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:02:26,390-Speed 2498.26 samples/sec Loss 2.6818 LearningRate 0.000471 Epoch: 15 Global Step: 317150 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:02:34,587-Speed 2499.12 samples/sec Loss 2.6717 LearningRate 0.000471 Epoch: 15 Global Step: 317160 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:02:42,743-Speed 2511.29 samples/sec Loss 2.6628 LearningRate 0.000471 Epoch: 15 Global Step: 317170 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:02:50,945-Speed 2497.25 samples/sec Loss 2.6464 LearningRate 0.000471 Epoch: 15 Global Step: 317180 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:02:59,169-Speed 2490.86 samples/sec Loss 2.7017 LearningRate 0.000471 Epoch: 15 Global Step: 317190 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:03:07,371-Speed 2497.49 samples/sec Loss 
2.6932 LearningRate 0.000471 Epoch: 15 Global Step: 317200 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:03:15,574-Speed 2497.13 samples/sec Loss 2.6345 LearningRate 0.000471 Epoch: 15 Global Step: 317210 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:03:23,779-Speed 2496.41 samples/sec Loss 2.6486 LearningRate 0.000471 Epoch: 15 Global Step: 317220 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:03:31,932-Speed 2512.35 samples/sec Loss 2.6713 LearningRate 0.000471 Epoch: 15 Global Step: 317230 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:03:40,130-Speed 2498.37 samples/sec Loss 2.6226 LearningRate 0.000471 Epoch: 15 Global Step: 317240 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:03:48,331-Speed 2497.86 samples/sec Loss 2.6473 LearningRate 0.000471 Epoch: 15 Global Step: 317250 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:03:56,528-Speed 2498.74 samples/sec Loss 2.7337 LearningRate 0.000471 Epoch: 15 Global Step: 317260 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:04:04,687-Speed 2510.69 samples/sec Loss 2.6753 LearningRate 0.000471 Epoch: 15 Global Step: 317270 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:04:12,896-Speed 2495.17 samples/sec Loss 2.6741 LearningRate 0.000471 Epoch: 15 Global Step: 317280 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:04:21,039-Speed 2515.32 samples/sec Loss 2.6901 LearningRate 0.000471 Epoch: 15 Global Step: 317290 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:04:29,240-Speed 2497.67 samples/sec Loss 2.6423 LearningRate 0.000471 Epoch: 15 Global Step: 317300 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:04:37,443-Speed 2496.99 samples/sec Loss 2.6565 LearningRate 0.000471 Epoch: 15 Global Step: 317310 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:04:45,656-Speed 2494.34 samples/sec 
Loss 2.6544 LearningRate 0.000471 Epoch: 15 Global Step: 317320 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:04:53,857-Speed 2497.96 samples/sec Loss 2.7009 LearningRate 0.000471 Epoch: 15 Global Step: 317330 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:05:02,057-Speed 2498.07 samples/sec Loss 2.6511 LearningRate 0.000471 Epoch: 15 Global Step: 317340 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:05:10,208-Speed 2512.93 samples/sec Loss 2.6621 LearningRate 0.000471 Epoch: 15 Global Step: 317350 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:05:18,409-Speed 2497.67 samples/sec Loss 2.5918 LearningRate 0.000471 Epoch: 15 Global Step: 317360 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:05:26,605-Speed 2499.26 samples/sec Loss 2.5907 LearningRate 0.000471 Epoch: 15 Global Step: 317370 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:05:34,804-Speed 2498.44 samples/sec Loss 2.6849 LearningRate 0.000471 Epoch: 15 Global Step: 317380 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:05:43,002-Speed 2498.72 samples/sec Loss 2.5745 LearningRate 0.000471 Epoch: 15 Global Step: 317390 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:05:51,206-Speed 2496.73 samples/sec Loss 2.6409 LearningRate 0.000471 Epoch: 15 Global Step: 317400 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:05:59,351-Speed 2514.68 samples/sec Loss 2.6165 LearningRate 0.000471 Epoch: 15 Global Step: 317410 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:06:07,551-Speed 2498.15 samples/sec Loss 2.6525 LearningRate 0.000471 Epoch: 15 Global Step: 317420 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:06:15,758-Speed 2495.65 samples/sec Loss 2.6285 LearningRate 0.000471 Epoch: 15 Global Step: 317430 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:06:23,968-Speed 2495.29 
samples/sec Loss 2.6076 LearningRate 0.000471 Epoch: 15 Global Step: 317440 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:06:32,179-Speed 2494.42 samples/sec Loss 2.6385 LearningRate 0.000470 Epoch: 15 Global Step: 317450 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:06:40,381-Speed 2497.43 samples/sec Loss 2.6676 LearningRate 0.000470 Epoch: 15 Global Step: 317460 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:06:48,526-Speed 2514.77 samples/sec Loss 2.6280 LearningRate 0.000470 Epoch: 15 Global Step: 317470 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:06:56,747-Speed 2491.80 samples/sec Loss 2.6164 LearningRate 0.000470 Epoch: 15 Global Step: 317480 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:07:04,949-Speed 2497.52 samples/sec Loss 2.6194 LearningRate 0.000470 Epoch: 15 Global Step: 317490 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:07:13,146-Speed 2498.89 samples/sec Loss 2.6324 LearningRate 0.000470 Epoch: 15 Global Step: 317500 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:07:21,344-Speed 2498.65 samples/sec Loss 2.6310 LearningRate 0.000470 Epoch: 15 Global Step: 317510 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:07:29,547-Speed 2497.17 samples/sec Loss 2.5798 LearningRate 0.000470 Epoch: 15 Global Step: 317520 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:07:37,692-Speed 2514.63 samples/sec Loss 2.5812 LearningRate 0.000470 Epoch: 15 Global Step: 317530 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:07:45,898-Speed 2496.74 samples/sec Loss 2.6330 LearningRate 0.000470 Epoch: 15 Global Step: 317540 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:07:54,103-Speed 2496.50 samples/sec Loss 2.6768 LearningRate 0.000470 Epoch: 15 Global Step: 317550 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:08:02,304-Speed 
2497.70 samples/sec Loss 2.6468 LearningRate 0.000470 Epoch: 15 Global Step: 317560 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:08:10,506-Speed 2497.52 samples/sec Loss 2.6452 LearningRate 0.000470 Epoch: 15 Global Step: 317570 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:08:18,703-Speed 2498.93 samples/sec Loss 2.6242 LearningRate 0.000470 Epoch: 15 Global Step: 317580 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:08:26,848-Speed 2515.07 samples/sec Loss 2.6732 LearningRate 0.000470 Epoch: 15 Global Step: 317590 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:08:35,047-Speed 2498.30 samples/sec Loss 2.6643 LearningRate 0.000470 Epoch: 15 Global Step: 317600 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:08:43,251-Speed 2496.52 samples/sec Loss 2.6277 LearningRate 0.000470 Epoch: 15 Global Step: 317610 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:08:51,456-Speed 2496.42 samples/sec Loss 2.6373 LearningRate 0.000470 Epoch: 15 Global Step: 317620 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:08:59,672-Speed 2493.35 samples/sec Loss 2.6619 LearningRate 0.000470 Epoch: 15 Global Step: 317630 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:09:07,876-Speed 2496.54 samples/sec Loss 2.6774 LearningRate 0.000470 Epoch: 15 Global Step: 317640 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:09:16,029-Speed 2512.59 samples/sec Loss 2.6106 LearningRate 0.000470 Epoch: 15 Global Step: 317650 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:09:24,229-Speed 2497.75 samples/sec Loss 2.6718 LearningRate 0.000470 Epoch: 15 Global Step: 317660 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:09:32,433-Speed 2496.74 samples/sec Loss 2.6282 LearningRate 0.000470 Epoch: 15 Global Step: 317670 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 
15:09:40,633-Speed 2497.90 samples/sec Loss 2.6481 LearningRate 0.000470 Epoch: 15 Global Step: 317680 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:09:48,834-Speed 2497.75 samples/sec Loss 2.6976 LearningRate 0.000470 Epoch: 15 Global Step: 317690 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:09:57,037-Speed 2497.04 samples/sec Loss 2.6932 LearningRate 0.000470 Epoch: 15 Global Step: 317700 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:10:05,183-Speed 2515.37 samples/sec Loss 2.7133 LearningRate 0.000470 Epoch: 15 Global Step: 317710 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:10:13,387-Speed 2496.80 samples/sec Loss 2.7015 LearningRate 0.000470 Epoch: 15 Global Step: 317720 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:10:21,587-Speed 2497.74 samples/sec Loss 2.6530 LearningRate 0.000470 Epoch: 15 Global Step: 317730 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:10:29,787-Speed 2498.26 samples/sec Loss 2.6335 LearningRate 0.000470 Epoch: 15 Global Step: 317740 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:10:37,989-Speed 2497.40 samples/sec Loss 2.7114 LearningRate 0.000470 Epoch: 15 Global Step: 317750 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:10:46,197-Speed 2495.55 samples/sec Loss 2.6536 LearningRate 0.000470 Epoch: 15 Global Step: 317760 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:10:54,358-Speed 2510.03 samples/sec Loss 2.6824 LearningRate 0.000470 Epoch: 15 Global Step: 317770 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:11:02,557-Speed 2498.44 samples/sec Loss 2.6768 LearningRate 0.000470 Epoch: 15 Global Step: 317780 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:11:10,755-Speed 2498.64 samples/sec Loss 2.6877 LearningRate 0.000470 Epoch: 15 Global Step: 317790 Fp16 Grad Scale: 32768 Required: 117 hours Training: 
2022-07-08 15:11:18,956-Speed 2497.86 samples/sec Loss 2.6105 LearningRate 0.000470 Epoch: 15 Global Step: 317800 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:11:27,160-Speed 2496.55 samples/sec Loss 2.6590 LearningRate 0.000470 Epoch: 15 Global Step: 317810 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:11:35,359-Speed 2498.23 samples/sec Loss 2.6345 LearningRate 0.000470 Epoch: 15 Global Step: 317820 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:11:43,511-Speed 2512.73 samples/sec Loss 2.6909 LearningRate 0.000470 Epoch: 15 Global Step: 317830 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:11:51,710-Speed 2498.22 samples/sec Loss 2.6095 LearningRate 0.000470 Epoch: 15 Global Step: 317840 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:11:59,914-Speed 2496.96 samples/sec Loss 2.6288 LearningRate 0.000470 Epoch: 15 Global Step: 317850 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:12:08,122-Speed 2495.38 samples/sec Loss 2.6602 LearningRate 0.000470 Epoch: 15 Global Step: 317860 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:12:16,324-Speed 2497.41 samples/sec Loss 2.6755 LearningRate 0.000470 Epoch: 15 Global Step: 317870 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:12:24,527-Speed 2496.95 samples/sec Loss 2.6484 LearningRate 0.000470 Epoch: 15 Global Step: 317880 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:12:32,675-Speed 2513.90 samples/sec Loss 2.5739 LearningRate 0.000470 Epoch: 15 Global Step: 317890 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:12:40,875-Speed 2498.00 samples/sec Loss 2.6018 LearningRate 0.000470 Epoch: 15 Global Step: 317900 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:12:49,082-Speed 2496.40 samples/sec Loss 2.6235 LearningRate 0.000470 Epoch: 15 Global Step: 317910 Fp16 Grad Scale: 32768 Required: 117 hours 
Training: 2022-07-08 15:12:57,283-Speed 2497.47 samples/sec Loss 2.6612 LearningRate 0.000470 Epoch: 15 Global Step: 317920 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:13:05,495-Speed 2494.67 samples/sec Loss 2.6483 LearningRate 0.000470 Epoch: 15 Global Step: 317930 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:13:13,695-Speed 2497.95 samples/sec Loss 2.6535 LearningRate 0.000470 Epoch: 15 Global Step: 317940 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:13:21,853-Speed 2510.90 samples/sec Loss 2.6686 LearningRate 0.000470 Epoch: 15 Global Step: 317950 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:13:30,063-Speed 2494.67 samples/sec Loss 2.6782 LearningRate 0.000470 Epoch: 15 Global Step: 317960 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:13:38,269-Speed 2496.27 samples/sec Loss 2.6721 LearningRate 0.000470 Epoch: 15 Global Step: 317970 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:13:46,469-Speed 2498.10 samples/sec Loss 2.6663 LearningRate 0.000470 Epoch: 15 Global Step: 317980 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:13:54,678-Speed 2495.28 samples/sec Loss 2.6600 LearningRate 0.000469 Epoch: 15 Global Step: 317990 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:14:02,887-Speed 2495.37 samples/sec Loss 2.6312 LearningRate 0.000469 Epoch: 15 Global Step: 318000 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:14:11,029-Speed 2515.82 samples/sec Loss 2.6865 LearningRate 0.000469 Epoch: 15 Global Step: 318010 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:14:19,230-Speed 2497.45 samples/sec Loss 2.7302 LearningRate 0.000469 Epoch: 15 Global Step: 318020 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:14:27,430-Speed 2498.08 samples/sec Loss 2.7514 LearningRate 0.000469 Epoch: 15 Global Step: 318030 Fp16 Grad Scale: 32768 Required: 117 
hours Training: 2022-07-08 15:14:35,632-Speed 2497.61 samples/sec Loss 2.6500 LearningRate 0.000469 Epoch: 15 Global Step: 318040 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:14:43,843-Speed 2494.38 samples/sec Loss 2.7262 LearningRate 0.000469 Epoch: 15 Global Step: 318050 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:14:52,048-Speed 2496.45 samples/sec Loss 2.6225 LearningRate 0.000469 Epoch: 15 Global Step: 318060 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:15:00,207-Speed 2510.37 samples/sec Loss 2.6572 LearningRate 0.000469 Epoch: 15 Global Step: 318070 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:15:08,408-Speed 2497.79 samples/sec Loss 2.6574 LearningRate 0.000469 Epoch: 15 Global Step: 318080 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:15:16,607-Speed 2498.10 samples/sec Loss 2.6509 LearningRate 0.000469 Epoch: 15 Global Step: 318090 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:15:24,807-Speed 2498.11 samples/sec Loss 2.6727 LearningRate 0.000469 Epoch: 15 Global Step: 318100 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:15:33,006-Speed 2498.26 samples/sec Loss 2.7176 LearningRate 0.000469 Epoch: 15 Global Step: 318110 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:15:41,206-Speed 2497.80 samples/sec Loss 2.6411 LearningRate 0.000469 Epoch: 15 Global Step: 318120 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:15:49,356-Speed 2513.37 samples/sec Loss 2.6195 LearningRate 0.000469 Epoch: 15 Global Step: 318130 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:15:57,565-Speed 2495.39 samples/sec Loss 2.6214 LearningRate 0.000469 Epoch: 15 Global Step: 318140 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:16:05,773-Speed 2495.56 samples/sec Loss 2.6319 LearningRate 0.000469 Epoch: 15 Global Step: 318150 Fp16 Grad Scale: 32768 Required: 
117 hours Training: 2022-07-08 15:16:13,971-Speed 2498.24 samples/sec Loss 2.6764 LearningRate 0.000469 Epoch: 15 Global Step: 318160 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:16:22,175-Speed 2496.88 samples/sec Loss 2.6825 LearningRate 0.000469 Epoch: 15 Global Step: 318170 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:16:30,370-Speed 2499.38 samples/sec Loss 2.6507 LearningRate 0.000469 Epoch: 15 Global Step: 318180 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:16:38,518-Speed 2514.25 samples/sec Loss 2.6487 LearningRate 0.000469 Epoch: 15 Global Step: 318190 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:16:46,716-Speed 2498.63 samples/sec Loss 2.7063 LearningRate 0.000469 Epoch: 15 Global Step: 318200 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:16:54,921-Speed 2496.58 samples/sec Loss 2.7199 LearningRate 0.000469 Epoch: 15 Global Step: 318210 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:17:03,117-Speed 2499.20 samples/sec Loss 2.6671 LearningRate 0.000469 Epoch: 15 Global Step: 318220 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:17:11,319-Speed 2497.26 samples/sec Loss 2.7595 LearningRate 0.000469 Epoch: 15 Global Step: 318230 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:17:19,518-Speed 2498.41 samples/sec Loss 2.7279 LearningRate 0.000469 Epoch: 15 Global Step: 318240 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:17:27,676-Speed 2510.70 samples/sec Loss 2.6776 LearningRate 0.000469 Epoch: 15 Global Step: 318250 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:17:35,875-Speed 2498.26 samples/sec Loss 2.7078 LearningRate 0.000469 Epoch: 15 Global Step: 318260 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:17:44,075-Speed 2498.05 samples/sec Loss 2.7140 LearningRate 0.000469 Epoch: 15 Global Step: 318270 Fp16 Grad Scale: 32768 
Required: 117 hours Training: 2022-07-08 15:17:52,276-Speed 2497.66 samples/sec Loss 2.7058 LearningRate 0.000469 Epoch: 15 Global Step: 318280 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:18:00,474-Speed 2498.65 samples/sec Loss 2.6632 LearningRate 0.000469 Epoch: 15 Global Step: 318290 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:18:08,674-Speed 2498.14 samples/sec Loss 2.6344 LearningRate 0.000469 Epoch: 15 Global Step: 318300 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:18:16,825-Speed 2512.95 samples/sec Loss 2.6685 LearningRate 0.000469 Epoch: 15 Global Step: 318310 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:18:25,024-Speed 2498.44 samples/sec Loss 2.6442 LearningRate 0.000469 Epoch: 15 Global Step: 318320 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:18:33,255-Speed 2488.50 samples/sec Loss 2.6919 LearningRate 0.000469 Epoch: 15 Global Step: 318330 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:18:41,460-Speed 2496.36 samples/sec Loss 2.6749 LearningRate 0.000469 Epoch: 15 Global Step: 318340 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:18:49,658-Speed 2498.59 samples/sec Loss 2.6937 LearningRate 0.000469 Epoch: 15 Global Step: 318350 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:18:57,859-Speed 2497.61 samples/sec Loss 2.7043 LearningRate 0.000469 Epoch: 15 Global Step: 318360 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:19:06,009-Speed 2513.79 samples/sec Loss 2.6576 LearningRate 0.000469 Epoch: 15 Global Step: 318370 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:19:14,215-Speed 2496.15 samples/sec Loss 2.6514 LearningRate 0.000469 Epoch: 15 Global Step: 318380 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:19:22,412-Speed 2498.75 samples/sec Loss 2.6966 LearningRate 0.000469 Epoch: 15 Global Step: 318390 Fp16 Grad Scale: 
32768 Required: 117 hours Training: 2022-07-08 15:19:30,618-Speed 2496.20 samples/sec Loss 2.6683 LearningRate 0.000469 Epoch: 15 Global Step: 318400 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:19:38,812-Speed 2499.89 samples/sec Loss 2.6561 LearningRate 0.000469 Epoch: 15 Global Step: 318410 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:19:47,012-Speed 2497.78 samples/sec Loss 2.6477 LearningRate 0.000469 Epoch: 15 Global Step: 318420 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:19:55,157-Speed 2514.62 samples/sec Loss 2.6281 LearningRate 0.000469 Epoch: 15 Global Step: 318430 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:20:03,354-Speed 2498.88 samples/sec Loss 2.6577 LearningRate 0.000469 Epoch: 15 Global Step: 318440 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:20:11,563-Speed 2495.24 samples/sec Loss 2.6684 LearningRate 0.000469 Epoch: 15 Global Step: 318450 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:20:19,771-Speed 2495.51 samples/sec Loss 2.6431 LearningRate 0.000469 Epoch: 15 Global Step: 318460 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:20:27,968-Speed 2498.93 samples/sec Loss 2.6865 LearningRate 0.000469 Epoch: 15 Global Step: 318470 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:20:36,168-Speed 2497.86 samples/sec Loss 2.6611 LearningRate 0.000469 Epoch: 15 Global Step: 318480 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:20:44,311-Speed 2515.55 samples/sec Loss 2.6631 LearningRate 0.000469 Epoch: 15 Global Step: 318490 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:20:52,527-Speed 2493.24 samples/sec Loss 2.6636 LearningRate 0.000469 Epoch: 15 Global Step: 318500 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:21:00,726-Speed 2498.01 samples/sec Loss 2.7268 LearningRate 0.000469 Epoch: 15 Global Step: 318510 Fp16 Grad 
Scale: 65536 Required: 117 hours Training: 2022-07-08 15:21:08,934-Speed 2495.43 samples/sec Loss 2.7008 LearningRate 0.000469 Epoch: 15 Global Step: 318520 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:21:17,134-Speed 2498.04 samples/sec Loss 2.6118 LearningRate 0.000469 Epoch: 15 Global Step: 318530 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:21:25,331-Speed 2498.75 samples/sec Loss 2.6982 LearningRate 0.000468 Epoch: 15 Global Step: 318540 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:21:33,476-Speed 2514.75 samples/sec Loss 2.6483 LearningRate 0.000468 Epoch: 15 Global Step: 318550 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:21:41,634-Speed 2510.99 samples/sec Loss 2.6566 LearningRate 0.000468 Epoch: 15 Global Step: 318560 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:21:49,837-Speed 2496.73 samples/sec Loss 2.6604 LearningRate 0.000468 Epoch: 15 Global Step: 318570 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:21:58,036-Speed 2498.38 samples/sec Loss 2.6397 LearningRate 0.000468 Epoch: 15 Global Step: 318580 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:22:06,244-Speed 2495.62 samples/sec Loss 2.6511 LearningRate 0.000468 Epoch: 15 Global Step: 318590 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:22:14,446-Speed 2497.39 samples/sec Loss 2.6730 LearningRate 0.000468 Epoch: 15 Global Step: 318600 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:22:22,597-Speed 2513.33 samples/sec Loss 2.5786 LearningRate 0.000468 Epoch: 15 Global Step: 318610 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:22:30,800-Speed 2496.78 samples/sec Loss 2.6128 LearningRate 0.000468 Epoch: 15 Global Step: 318620 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:22:38,999-Speed 2498.33 samples/sec Loss 2.6305 LearningRate 0.000468 Epoch: 15 Global Step: 318630 Fp16 
Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:22:47,197-Speed 2498.78 samples/sec Loss 2.5949 LearningRate 0.000468 Epoch: 15 Global Step: 318640 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:22:55,399-Speed 2497.03 samples/sec Loss 2.6563 LearningRate 0.000468 Epoch: 15 Global Step: 318650 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:23:03,609-Speed 2495.48 samples/sec Loss 2.6247 LearningRate 0.000468 Epoch: 15 Global Step: 318660 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:23:11,755-Speed 2514.36 samples/sec Loss 2.6570 LearningRate 0.000468 Epoch: 15 Global Step: 318670 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:23:19,953-Speed 2498.91 samples/sec Loss 2.6618 LearningRate 0.000468 Epoch: 15 Global Step: 318680 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:23:28,150-Speed 2499.01 samples/sec Loss 2.7600 LearningRate 0.000468 Epoch: 15 Global Step: 318690 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:23:36,352-Speed 2497.55 samples/sec Loss 2.7419 LearningRate 0.000468 Epoch: 15 Global Step: 318700 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:23:44,553-Speed 2497.38 samples/sec Loss 2.6678 LearningRate 0.000468 Epoch: 15 Global Step: 318710 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:23:52,760-Speed 2499.73 samples/sec Loss 2.6654 LearningRate 0.000468 Epoch: 15 Global Step: 318720 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:24:00,930-Speed 2507.06 samples/sec Loss 2.6289 LearningRate 0.000468 Epoch: 15 Global Step: 318730 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:24:09,137-Speed 2495.83 samples/sec Loss 2.6589 LearningRate 0.000468 Epoch: 15 Global Step: 318740 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:24:17,336-Speed 2498.36 samples/sec Loss 2.7034 LearningRate 0.000468 Epoch: 15 Global Step: 318750 
Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:24:25,531-Speed 2499.58 samples/sec Loss 2.7094 LearningRate 0.000468 Epoch: 15 Global Step: 318760 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:24:33,728-Speed 2498.95 samples/sec Loss 2.6540 LearningRate 0.000468 Epoch: 15 Global Step: 318770 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:24:41,938-Speed 2494.61 samples/sec Loss 2.6526 LearningRate 0.000468 Epoch: 15 Global Step: 318780 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:24:50,086-Speed 2514.16 samples/sec Loss 2.7128 LearningRate 0.000468 Epoch: 15 Global Step: 318790 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:24:58,292-Speed 2496.20 samples/sec Loss 2.6682 LearningRate 0.000468 Epoch: 15 Global Step: 318800 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:25:06,491-Speed 2498.08 samples/sec Loss 2.6154 LearningRate 0.000468 Epoch: 15 Global Step: 318810 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:25:14,692-Speed 2497.75 samples/sec Loss 2.6398 LearningRate 0.000468 Epoch: 15 Global Step: 318820 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:25:22,890-Speed 2498.42 samples/sec Loss 2.6641 LearningRate 0.000468 Epoch: 15 Global Step: 318830 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:25:31,092-Speed 2497.38 samples/sec Loss 2.6402 LearningRate 0.000468 Epoch: 15 Global Step: 318840 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:25:39,237-Speed 2514.87 samples/sec Loss 2.6386 LearningRate 0.000468 Epoch: 15 Global Step: 318850 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:25:47,442-Speed 2496.51 samples/sec Loss 2.6775 LearningRate 0.000468 Epoch: 15 Global Step: 318860 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:25:55,639-Speed 2498.59 samples/sec Loss 2.6003 LearningRate 0.000468 Epoch: 15 Global Step: 
318870 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:26:03,845-Speed 2496.16 samples/sec Loss 2.6449 LearningRate 0.000468 Epoch: 15 Global Step: 318880 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:26:12,048-Speed 2496.99 samples/sec Loss 2.6293 LearningRate 0.000468 Epoch: 15 Global Step: 318890 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:26:20,245-Speed 2498.76 samples/sec Loss 2.6383 LearningRate 0.000468 Epoch: 15 Global Step: 318900 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:26:28,409-Speed 2509.13 samples/sec Loss 2.6527 LearningRate 0.000468 Epoch: 15 Global Step: 318910 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:26:36,608-Speed 2497.98 samples/sec Loss 2.6265 LearningRate 0.000468 Epoch: 15 Global Step: 318920 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:26:44,825-Speed 2493.01 samples/sec Loss 2.7224 LearningRate 0.000468 Epoch: 15 Global Step: 318930 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:26:53,029-Speed 2496.54 samples/sec Loss 2.6162 LearningRate 0.000468 Epoch: 15 Global Step: 318940 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:27:01,234-Speed 2496.84 samples/sec Loss 2.6192 LearningRate 0.000468 Epoch: 15 Global Step: 318950 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:27:09,440-Speed 2496.27 samples/sec Loss 2.6042 LearningRate 0.000468 Epoch: 15 Global Step: 318960 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:27:17,586-Speed 2514.77 samples/sec Loss 2.6361 LearningRate 0.000468 Epoch: 15 Global Step: 318970 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:27:25,801-Speed 2493.48 samples/sec Loss 2.6471 LearningRate 0.000468 Epoch: 15 Global Step: 318980 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:27:34,004-Speed 2497.09 samples/sec Loss 2.5655 LearningRate 0.000468 Epoch: 15 Global 
Step: 318990 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:27:42,202-Speed 2498.30 samples/sec Loss 2.6276 LearningRate 0.000468 Epoch: 15 Global Step: 319000 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:27:50,401-Speed 2498.30 samples/sec Loss 2.6321 LearningRate 0.000468 Epoch: 15 Global Step: 319010 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:27:58,614-Speed 2494.10 samples/sec Loss 2.6036 LearningRate 0.000468 Epoch: 15 Global Step: 319020 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:28:06,762-Speed 2514.01 samples/sec Loss 2.6577 LearningRate 0.000468 Epoch: 15 Global Step: 319030 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:28:14,961-Speed 2498.38 samples/sec Loss 2.6911 LearningRate 0.000468 Epoch: 15 Global Step: 319040 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:28:23,166-Speed 2496.51 samples/sec Loss 2.6680 LearningRate 0.000468 Epoch: 15 Global Step: 319050 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:28:31,363-Speed 2498.71 samples/sec Loss 2.6670 LearningRate 0.000468 Epoch: 15 Global Step: 319060 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:28:39,563-Speed 2498.21 samples/sec Loss 2.6162 LearningRate 0.000468 Epoch: 15 Global Step: 319070 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:28:47,765-Speed 2497.12 samples/sec Loss 2.6367 LearningRate 0.000467 Epoch: 15 Global Step: 319080 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:28:55,911-Speed 2514.71 samples/sec Loss 2.7216 LearningRate 0.000467 Epoch: 15 Global Step: 319090 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:29:04,111-Speed 2498.03 samples/sec Loss 2.6733 LearningRate 0.000467 Epoch: 15 Global Step: 319100 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:29:12,309-Speed 2498.39 samples/sec Loss 2.6660 LearningRate 0.000467 Epoch: 15 
Global Step: 319110 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:29:20,520-Speed 2494.89 samples/sec Loss 2.6451 LearningRate 0.000467 Epoch: 15 Global Step: 319120 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:29:28,718-Speed 2498.70 samples/sec Loss 2.6742 LearningRate 0.000467 Epoch: 15 Global Step: 319130 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:29:36,919-Speed 2498.11 samples/sec Loss 2.6801 LearningRate 0.000467 Epoch: 15 Global Step: 319140 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:29:45,068-Speed 2513.32 samples/sec Loss 2.6143 LearningRate 0.000467 Epoch: 15 Global Step: 319150 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:29:53,262-Speed 2499.94 samples/sec Loss 2.6660 LearningRate 0.000467 Epoch: 15 Global Step: 319160 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:30:01,462-Speed 2498.01 samples/sec Loss 2.6306 LearningRate 0.000467 Epoch: 15 Global Step: 319170 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:30:09,667-Speed 2496.46 samples/sec Loss 2.6488 LearningRate 0.000467 Epoch: 15 Global Step: 319180 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:30:17,867-Speed 2497.85 samples/sec Loss 2.6793 LearningRate 0.000467 Epoch: 15 Global Step: 319190 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:30:26,071-Speed 2496.81 samples/sec Loss 2.7238 LearningRate 0.000467 Epoch: 15 Global Step: 319200 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:30:34,216-Speed 2514.80 samples/sec Loss 2.6568 LearningRate 0.000467 Epoch: 15 Global Step: 319210 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:30:42,415-Speed 2498.20 samples/sec Loss 2.6644 LearningRate 0.000467 Epoch: 15 Global Step: 319220 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:30:50,618-Speed 2497.18 samples/sec Loss 2.6863 LearningRate 0.000467 
Epoch: 15 Global Step: 319230 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:30:58,825-Speed 2495.79 samples/sec Loss 2.6513 LearningRate 0.000467 Epoch: 15 Global Step: 319240 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:31:07,024-Speed 2498.22 samples/sec Loss 2.6634 LearningRate 0.000467 Epoch: 15 Global Step: 319250 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:31:15,224-Speed 2498.25 samples/sec Loss 2.5970 LearningRate 0.000467 Epoch: 15 Global Step: 319260 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:31:23,374-Speed 2513.42 samples/sec Loss 2.6794 LearningRate 0.000467 Epoch: 15 Global Step: 319270 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:31:31,578-Speed 2496.72 samples/sec Loss 2.6556 LearningRate 0.000467 Epoch: 15 Global Step: 319280 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:31:39,776-Speed 2498.41 samples/sec Loss 2.6880 LearningRate 0.000467 Epoch: 15 Global Step: 319290 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:31:47,976-Speed 2498.08 samples/sec Loss 2.7015 LearningRate 0.000467 Epoch: 15 Global Step: 319300 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:31:56,188-Speed 2494.21 samples/sec Loss 2.7100 LearningRate 0.000467 Epoch: 15 Global Step: 319310 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:32:04,383-Speed 2499.57 samples/sec Loss 2.7453 LearningRate 0.000467 Epoch: 15 Global Step: 319320 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:32:12,529-Speed 2514.53 samples/sec Loss 2.6187 LearningRate 0.000467 Epoch: 15 Global Step: 319330 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:32:20,726-Speed 2498.80 samples/sec Loss 2.7111 LearningRate 0.000467 Epoch: 15 Global Step: 319340 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:32:28,920-Speed 2500.04 samples/sec Loss 2.6435 LearningRate 
0.000467 Epoch: 15 Global Step: 319350 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:32:37,119-Speed 2498.41 samples/sec Loss 2.6540 LearningRate 0.000467 Epoch: 15 Global Step: 319360 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:32:45,320-Speed 2497.86 samples/sec Loss 2.7052 LearningRate 0.000467 Epoch: 15 Global Step: 319370 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:32:53,518-Speed 2498.46 samples/sec Loss 2.7398 LearningRate 0.000467 Epoch: 15 Global Step: 319380 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:33:01,663-Speed 2514.78 samples/sec Loss 2.6633 LearningRate 0.000467 Epoch: 15 Global Step: 319390 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:33:09,865-Speed 2497.18 samples/sec Loss 2.6518 LearningRate 0.000467 Epoch: 15 Global Step: 319400 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:33:18,061-Speed 2499.24 samples/sec Loss 2.6767 LearningRate 0.000467 Epoch: 15 Global Step: 319410 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:33:26,259-Speed 2498.58 samples/sec Loss 2.6479 LearningRate 0.000467 Epoch: 15 Global Step: 319420 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:33:34,457-Speed 2498.82 samples/sec Loss 2.6504 LearningRate 0.000467 Epoch: 15 Global Step: 319430 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:33:42,659-Speed 2497.38 samples/sec Loss 2.6692 LearningRate 0.000467 Epoch: 15 Global Step: 319440 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:33:50,809-Speed 2513.22 samples/sec Loss 2.6786 LearningRate 0.000467 Epoch: 15 Global Step: 319450 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:33:59,011-Speed 2497.39 samples/sec Loss 2.6291 LearningRate 0.000467 Epoch: 15 Global Step: 319460 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:34:07,223-Speed 2494.44 samples/sec Loss 2.6103 
LearningRate 0.000467 Epoch: 15 Global Step: 319470 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:34:15,419-Speed 2499.25 samples/sec Loss 2.5970 LearningRate 0.000467 Epoch: 15 Global Step: 319480 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:34:23,617-Speed 2498.68 samples/sec Loss 2.6501 LearningRate 0.000467 Epoch: 15 Global Step: 319490 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:34:31,815-Speed 2498.52 samples/sec Loss 2.6120 LearningRate 0.000467 Epoch: 15 Global Step: 319500 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:34:39,957-Speed 2516.03 samples/sec Loss 2.6587 LearningRate 0.000467 Epoch: 15 Global Step: 319510 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:34:48,158-Speed 2497.72 samples/sec Loss 2.5904 LearningRate 0.000467 Epoch: 15 Global Step: 319520 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:34:56,354-Speed 2499.22 samples/sec Loss 2.5956 LearningRate 0.000467 Epoch: 15 Global Step: 319530 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:35:04,552-Speed 2498.73 samples/sec Loss 2.6805 LearningRate 0.000467 Epoch: 15 Global Step: 319540 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:35:12,752-Speed 2497.96 samples/sec Loss 2.6408 LearningRate 0.000467 Epoch: 15 Global Step: 319550 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:35:20,954-Speed 2497.68 samples/sec Loss 2.6370 LearningRate 0.000467 Epoch: 15 Global Step: 319560 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:35:29,098-Speed 2514.88 samples/sec Loss 2.7201 LearningRate 0.000467 Epoch: 15 Global Step: 319570 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:35:37,299-Speed 2497.67 samples/sec Loss 2.6229 LearningRate 0.000467 Epoch: 15 Global Step: 319580 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:35:45,494-Speed 2499.57 samples/sec Loss 
2.6650 LearningRate 0.000467 Epoch: 15 Global Step: 319590 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:35:53,691-Speed 2498.82 samples/sec Loss 2.6347 LearningRate 0.000467 Epoch: 15 Global Step: 319600 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:36:01,891-Speed 2498.15 samples/sec Loss 2.6760 LearningRate 0.000467 Epoch: 15 Global Step: 319610 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:36:10,093-Speed 2497.32 samples/sec Loss 2.6574 LearningRate 0.000467 Epoch: 15 Global Step: 319620 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:36:18,238-Speed 2514.85 samples/sec Loss 2.5908 LearningRate 0.000466 Epoch: 15 Global Step: 319630 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:36:26,436-Speed 2498.44 samples/sec Loss 2.6823 LearningRate 0.000466 Epoch: 15 Global Step: 319640 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:36:34,631-Speed 2499.45 samples/sec Loss 2.6856 LearningRate 0.000466 Epoch: 15 Global Step: 319650 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:36:42,833-Speed 2497.30 samples/sec Loss 2.6285 LearningRate 0.000466 Epoch: 15 Global Step: 319660 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:36:51,029-Speed 2499.40 samples/sec Loss 2.6165 LearningRate 0.000466 Epoch: 15 Global Step: 319670 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:36:59,232-Speed 2496.87 samples/sec Loss 2.6763 LearningRate 0.000466 Epoch: 15 Global Step: 319680 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:37:07,377-Speed 2514.93 samples/sec Loss 2.6875 LearningRate 0.000466 Epoch: 15 Global Step: 319690 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:37:15,577-Speed 2498.00 samples/sec Loss 2.6261 LearningRate 0.000466 Epoch: 15 Global Step: 319700 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:37:23,777-Speed 2498.01 samples/sec 
Loss 2.6644 LearningRate 0.000466 Epoch: 15 Global Step: 319710 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:37:31,975-Speed 2498.59 samples/sec Loss 2.6452 LearningRate 0.000466 Epoch: 15 Global Step: 319720 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:37:40,172-Speed 2498.93 samples/sec Loss 2.6576 LearningRate 0.000466 Epoch: 15 Global Step: 319730 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:37:48,372-Speed 2497.81 samples/sec Loss 2.5885 LearningRate 0.000466 Epoch: 15 Global Step: 319740 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:37:56,516-Speed 2515.15 samples/sec Loss 2.6385 LearningRate 0.000466 Epoch: 15 Global Step: 319750 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:38:04,717-Speed 2497.54 samples/sec Loss 2.6144 LearningRate 0.000466 Epoch: 15 Global Step: 319760 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:38:12,919-Speed 2497.30 samples/sec Loss 2.6531 LearningRate 0.000466 Epoch: 15 Global Step: 319770 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:38:21,118-Speed 2498.49 samples/sec Loss 2.6375 LearningRate 0.000466 Epoch: 15 Global Step: 319780 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:38:29,323-Speed 2496.40 samples/sec Loss 2.6703 LearningRate 0.000466 Epoch: 15 Global Step: 319790 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:38:37,522-Speed 2498.10 samples/sec Loss 2.6405 LearningRate 0.000466 Epoch: 15 Global Step: 319800 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:38:45,672-Speed 2513.57 samples/sec Loss 2.6471 LearningRate 0.000466 Epoch: 15 Global Step: 319810 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:38:53,878-Speed 2496.11 samples/sec Loss 2.6914 LearningRate 0.000466 Epoch: 15 Global Step: 319820 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:39:02,077-Speed 2498.35 
samples/sec Loss 2.6476 LearningRate 0.000466 Epoch: 15 Global Step: 319830 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:39:10,275-Speed 2498.81 samples/sec Loss 2.6781 LearningRate 0.000466 Epoch: 15 Global Step: 319840 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:39:18,516-Speed 2485.51 samples/sec Loss 2.5854 LearningRate 0.000466 Epoch: 15 Global Step: 319850 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:39:26,713-Speed 2498.82 samples/sec Loss 2.6599 LearningRate 0.000466 Epoch: 15 Global Step: 319860 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:39:34,861-Speed 2514.58 samples/sec Loss 2.6247 LearningRate 0.000466 Epoch: 15 Global Step: 319870 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:39:43,065-Speed 2496.67 samples/sec Loss 2.6600 LearningRate 0.000466 Epoch: 15 Global Step: 319880 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:39:51,265-Speed 2498.22 samples/sec Loss 2.6501 LearningRate 0.000466 Epoch: 15 Global Step: 319890 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:39:59,469-Speed 2496.58 samples/sec Loss 2.5758 LearningRate 0.000466 Epoch: 15 Global Step: 319900 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:40:07,669-Speed 2498.10 samples/sec Loss 2.6185 LearningRate 0.000466 Epoch: 15 Global Step: 319910 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:40:15,880-Speed 2494.51 samples/sec Loss 2.5526 LearningRate 0.000466 Epoch: 15 Global Step: 319920 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:40:24,026-Speed 2514.28 samples/sec Loss 2.6080 LearningRate 0.000466 Epoch: 15 Global Step: 319930 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:40:32,229-Speed 2497.30 samples/sec Loss 2.6791 LearningRate 0.000466 Epoch: 15 Global Step: 319940 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:40:40,431-Speed 
2497.40 samples/sec Loss 2.6104 LearningRate 0.000466 Epoch: 15 Global Step: 319950 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:40:48,629-Speed 2498.25 samples/sec Loss 2.6275 LearningRate 0.000466 Epoch: 15 Global Step: 319960 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:40:56,826-Speed 2499.01 samples/sec Loss 2.6553 LearningRate 0.000466 Epoch: 15 Global Step: 319970 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:41:05,048-Speed 2491.41 samples/sec Loss 2.6857 LearningRate 0.000466 Epoch: 15 Global Step: 319980 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:41:13,195-Speed 2514.11 samples/sec Loss 2.6352 LearningRate 0.000466 Epoch: 15 Global Step: 319990 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:41:21,398-Speed 2497.17 samples/sec Loss 2.6284 LearningRate 0.000466 Epoch: 15 Global Step: 320000 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:41:29,595-Speed 2498.57 samples/sec Loss 2.6141 LearningRate 0.000466 Epoch: 15 Global Step: 320010 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:41:37,817-Speed 2491.78 samples/sec Loss 2.6314 LearningRate 0.000466 Epoch: 15 Global Step: 320020 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:41:46,015-Speed 2498.37 samples/sec Loss 2.6470 LearningRate 0.000466 Epoch: 15 Global Step: 320030 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:41:54,213-Speed 2498.49 samples/sec Loss 2.6122 LearningRate 0.000466 Epoch: 15 Global Step: 320040 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:42:02,355-Speed 2515.84 samples/sec Loss 2.6340 LearningRate 0.000466 Epoch: 15 Global Step: 320050 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:42:10,555-Speed 2498.09 samples/sec Loss 2.7346 LearningRate 0.000466 Epoch: 15 Global Step: 320060 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 
15:42:18,755-Speed 2498.15 samples/sec Loss 2.6338 LearningRate 0.000466 Epoch: 15 Global Step: 320070 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:42:26,953-Speed 2498.46 samples/sec Loss 2.6207 LearningRate 0.000466 Epoch: 15 Global Step: 320080 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:42:35,155-Speed 2497.41 samples/sec Loss 2.6635 LearningRate 0.000466 Epoch: 15 Global Step: 320090 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:42:43,358-Speed 2497.12 samples/sec Loss 2.6045 LearningRate 0.000466 Epoch: 15 Global Step: 320100 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:42:51,508-Speed 2513.44 samples/sec Loss 2.6729 LearningRate 0.000466 Epoch: 15 Global Step: 320110 Fp16 Grad Scale: 65536 Required: 117 hours Training: 2022-07-08 15:42:59,662-Speed 2512.16 samples/sec Loss 2.6296 LearningRate 0.000466 Epoch: 15 Global Step: 320120 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:43:07,864-Speed 2497.06 samples/sec Loss 2.6810 LearningRate 0.000466 Epoch: 15 Global Step: 320130 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:43:16,078-Speed 2493.67 samples/sec Loss 2.6863 LearningRate 0.000466 Epoch: 15 Global Step: 320140 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:43:24,279-Speed 2497.87 samples/sec Loss 2.7046 LearningRate 0.000466 Epoch: 15 Global Step: 320150 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:43:32,481-Speed 2497.14 samples/sec Loss 2.6222 LearningRate 0.000466 Epoch: 15 Global Step: 320160 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:43:40,626-Speed 2514.81 samples/sec Loss 2.6426 LearningRate 0.000466 Epoch: 15 Global Step: 320170 Fp16 Grad Scale: 32768 Required: 117 hours Training: 2022-07-08 15:43:48,793-Speed 2508.30 samples/sec Loss 2.6532 LearningRate 0.000465 Epoch: 15 Global Step: 320180 Fp16 Grad Scale: 16384 Required: 117 hours Training: 
2022-07-08 15:43:56,991-Speed 2498.46 samples/sec Loss 2.6271 LearningRate 0.000465 Epoch: 15 Global Step: 320190 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:44:05,193-Speed 2497.29 samples/sec Loss 2.7110 LearningRate 0.000465 Epoch: 15 Global Step: 320200 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:44:13,392-Speed 2498.32 samples/sec Loss 2.6507 LearningRate 0.000465 Epoch: 15 Global Step: 320210 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:44:21,596-Speed 2496.72 samples/sec Loss 2.6656 LearningRate 0.000465 Epoch: 15 Global Step: 320220 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:44:29,742-Speed 2514.55 samples/sec Loss 2.6619 LearningRate 0.000465 Epoch: 15 Global Step: 320230 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:44:37,941-Speed 2498.32 samples/sec Loss 2.6468 LearningRate 0.000465 Epoch: 15 Global Step: 320240 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:44:46,146-Speed 2496.49 samples/sec Loss 2.6223 LearningRate 0.000465 Epoch: 15 Global Step: 320250 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:44:54,358-Speed 2494.10 samples/sec Loss 2.6140 LearningRate 0.000465 Epoch: 15 Global Step: 320260 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:45:02,560-Speed 2497.54 samples/sec Loss 2.7127 LearningRate 0.000465 Epoch: 15 Global Step: 320270 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:45:10,760-Speed 2498.20 samples/sec Loss 2.6243 LearningRate 0.000465 Epoch: 15 Global Step: 320280 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:45:18,904-Speed 2515.13 samples/sec Loss 2.7162 LearningRate 0.000465 Epoch: 15 Global Step: 320290 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:45:27,104-Speed 2497.76 samples/sec Loss 2.6132 LearningRate 0.000465 Epoch: 15 Global Step: 320300 Fp16 Grad Scale: 16384 Required: 117 hours 
Training: 2022-07-08 15:45:35,312-Speed 2495.75 samples/sec Loss 2.6379 LearningRate 0.000465 Epoch: 15 Global Step: 320310 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:45:43,512-Speed 2498.08 samples/sec Loss 2.6660 LearningRate 0.000465 Epoch: 15 Global Step: 320320 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:45:51,720-Speed 2495.69 samples/sec Loss 2.6702 LearningRate 0.000465 Epoch: 15 Global Step: 320330 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:45:59,923-Speed 2496.96 samples/sec Loss 2.6242 LearningRate 0.000465 Epoch: 15 Global Step: 320340 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:46:08,070-Speed 2514.12 samples/sec Loss 2.6450 LearningRate 0.000465 Epoch: 15 Global Step: 320350 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:46:16,291-Speed 2491.59 samples/sec Loss 2.6274 LearningRate 0.000465 Epoch: 15 Global Step: 320360 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:46:24,488-Speed 2498.77 samples/sec Loss 2.6767 LearningRate 0.000465 Epoch: 15 Global Step: 320370 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:46:32,691-Speed 2496.99 samples/sec Loss 2.5991 LearningRate 0.000465 Epoch: 15 Global Step: 320380 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:46:40,888-Speed 2498.92 samples/sec Loss 2.6387 LearningRate 0.000465 Epoch: 15 Global Step: 320390 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:46:49,098-Speed 2494.99 samples/sec Loss 2.6717 LearningRate 0.000465 Epoch: 15 Global Step: 320400 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:46:57,244-Speed 2514.89 samples/sec Loss 2.6756 LearningRate 0.000465 Epoch: 15 Global Step: 320410 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:47:05,445-Speed 2497.70 samples/sec Loss 2.6047 LearningRate 0.000465 Epoch: 15 Global Step: 320420 Fp16 Grad Scale: 16384 Required: 117 
hours Training: 2022-07-08 15:47:13,648-Speed 2497.08 samples/sec Loss 2.6529 LearningRate 0.000465 Epoch: 15 Global Step: 320430 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:47:21,850-Speed 2497.59 samples/sec Loss 2.6337 LearningRate 0.000465 Epoch: 15 Global Step: 320440 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:47:30,050-Speed 2497.93 samples/sec Loss 2.6010 LearningRate 0.000465 Epoch: 15 Global Step: 320450 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:47:38,259-Speed 2495.15 samples/sec Loss 2.6156 LearningRate 0.000465 Epoch: 15 Global Step: 320460 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:47:46,405-Speed 2514.38 samples/sec Loss 2.6320 LearningRate 0.000465 Epoch: 15 Global Step: 320470 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:47:54,605-Speed 2498.12 samples/sec Loss 2.6707 LearningRate 0.000465 Epoch: 15 Global Step: 320480 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:48:02,802-Speed 2498.89 samples/sec Loss 2.6942 LearningRate 0.000465 Epoch: 15 Global Step: 320490 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:48:11,004-Speed 2497.43 samples/sec Loss 2.6607 LearningRate 0.000465 Epoch: 15 Global Step: 320500 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:48:19,210-Speed 2495.96 samples/sec Loss 2.6491 LearningRate 0.000465 Epoch: 15 Global Step: 320510 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:48:27,415-Speed 2496.54 samples/sec Loss 2.6849 LearningRate 0.000465 Epoch: 15 Global Step: 320520 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:48:35,564-Speed 2513.45 samples/sec Loss 2.7240 LearningRate 0.000465 Epoch: 15 Global Step: 320530 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:48:43,763-Speed 2498.42 samples/sec Loss 2.6758 LearningRate 0.000465 Epoch: 15 Global Step: 320540 Fp16 Grad Scale: 16384 Required: 
117 hours Training: 2022-07-08 15:48:51,959-Speed 2499.13 samples/sec Loss 2.7387 LearningRate 0.000465 Epoch: 15 Global Step: 320550 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:49:00,162-Speed 2497.35 samples/sec Loss 2.6980 LearningRate 0.000465 Epoch: 15 Global Step: 320560 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:49:08,364-Speed 2497.36 samples/sec Loss 2.6995 LearningRate 0.000465 Epoch: 15 Global Step: 320570 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:49:16,561-Speed 2498.94 samples/sec Loss 2.7003 LearningRate 0.000465 Epoch: 15 Global Step: 320580 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:49:24,706-Speed 2514.59 samples/sec Loss 2.7042 LearningRate 0.000465 Epoch: 15 Global Step: 320590 Fp16 Grad Scale: 16384 Required: 117 hours Training: 2022-07-08 15:49:32,905-Speed 2498.46 samples/sec Loss 2.6964 LearningRate 0.000465 Epoch: 15 Global Step: 320600 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:49:41,114-Speed 2495.04 samples/sec Loss 2.7115 LearningRate 0.000465 Epoch: 15 Global Step: 320610 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:49:49,311-Speed 2498.85 samples/sec Loss 2.6822 LearningRate 0.000465 Epoch: 15 Global Step: 320620 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:49:57,511-Speed 2498.51 samples/sec Loss 2.6474 LearningRate 0.000465 Epoch: 15 Global Step: 320630 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:50:05,722-Speed 2494.50 samples/sec Loss 2.6803 LearningRate 0.000465 Epoch: 15 Global Step: 320640 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:50:13,867-Speed 2514.77 samples/sec Loss 2.6789 LearningRate 0.000465 Epoch: 15 Global Step: 320650 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:50:22,074-Speed 2496.03 samples/sec Loss 2.6645 LearningRate 0.000465 Epoch: 15 Global Step: 320660 Fp16 Grad Scale: 16384 
Required: 116 hours Training: 2022-07-08 15:50:30,271-Speed 2498.96 samples/sec Loss 2.6445 LearningRate 0.000465 Epoch: 15 Global Step: 320670 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:50:38,464-Speed 2499.87 samples/sec Loss 2.6196 LearningRate 0.000465 Epoch: 15 Global Step: 320680 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:50:46,666-Speed 2497.28 samples/sec Loss 2.6913 LearningRate 0.000465 Epoch: 15 Global Step: 320690 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:50:54,866-Speed 2497.86 samples/sec Loss 2.7045 LearningRate 0.000465 Epoch: 15 Global Step: 320700 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:51:03,012-Speed 2515.03 samples/sec Loss 2.7101 LearningRate 0.000465 Epoch: 15 Global Step: 320710 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:51:11,222-Speed 2494.84 samples/sec Loss 2.7040 LearningRate 0.000464 Epoch: 15 Global Step: 320720 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:51:19,423-Speed 2497.48 samples/sec Loss 2.6436 LearningRate 0.000464 Epoch: 15 Global Step: 320730 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:51:27,620-Speed 2498.99 samples/sec Loss 2.6904 LearningRate 0.000464 Epoch: 15 Global Step: 320740 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:51:35,836-Speed 2493.17 samples/sec Loss 2.6802 LearningRate 0.000464 Epoch: 15 Global Step: 320750 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:51:44,033-Speed 2498.66 samples/sec Loss 2.6505 LearningRate 0.000464 Epoch: 15 Global Step: 320760 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:51:52,179-Speed 2514.68 samples/sec Loss 2.6223 LearningRate 0.000464 Epoch: 15 Global Step: 320770 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:52:00,376-Speed 2499.10 samples/sec Loss 2.6519 LearningRate 0.000464 Epoch: 15 Global Step: 320780 Fp16 Grad Scale: 
16384 Required: 116 hours Training: 2022-07-08 15:52:08,588-Speed 2494.61 samples/sec Loss 2.6574 LearningRate 0.000464 Epoch: 15 Global Step: 320790 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:52:16,784-Speed 2498.96 samples/sec Loss 2.6706 LearningRate 0.000464 Epoch: 15 Global Step: 320800 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:52:24,988-Speed 2496.98 samples/sec Loss 2.6411 LearningRate 0.000464 Epoch: 15 Global Step: 320810 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:52:33,199-Speed 2494.44 samples/sec Loss 2.6062 LearningRate 0.000464 Epoch: 15 Global Step: 320820 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:52:41,347-Speed 2514.05 samples/sec Loss 2.6162 LearningRate 0.000464 Epoch: 15 Global Step: 320830 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:52:49,544-Speed 2498.94 samples/sec Loss 2.6784 LearningRate 0.000464 Epoch: 15 Global Step: 320840 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:52:57,738-Speed 2499.54 samples/sec Loss 2.6167 LearningRate 0.000464 Epoch: 15 Global Step: 320850 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:53:05,934-Speed 2499.31 samples/sec Loss 2.6351 LearningRate 0.000464 Epoch: 15 Global Step: 320860 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:53:14,133-Speed 2498.41 samples/sec Loss 2.6540 LearningRate 0.000464 Epoch: 15 Global Step: 320870 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:53:22,332-Speed 2497.96 samples/sec Loss 2.6538 LearningRate 0.000464 Epoch: 15 Global Step: 320880 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:53:30,478-Speed 2515.23 samples/sec Loss 2.6900 LearningRate 0.000464 Epoch: 15 Global Step: 320890 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:53:38,691-Speed 2494.05 samples/sec Loss 2.6740 LearningRate 0.000464 Epoch: 15 Global Step: 320900 Fp16 Grad 
Scale: 16384 Required: 116 hours Training: 2022-07-08 15:53:46,891-Speed 2497.84 samples/sec Loss 2.6629 LearningRate 0.000464 Epoch: 15 Global Step: 320910 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:53:55,089-Speed 2498.81 samples/sec Loss 2.6309 LearningRate 0.000464 Epoch: 15 Global Step: 320920 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:54:03,292-Speed 2497.01 samples/sec Loss 2.6632 LearningRate 0.000464 Epoch: 15 Global Step: 320930 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:54:11,494-Speed 2497.18 samples/sec Loss 2.6879 LearningRate 0.000464 Epoch: 15 Global Step: 320940 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:54:19,650-Speed 2511.58 samples/sec Loss 2.6830 LearningRate 0.000464 Epoch: 15 Global Step: 320950 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:54:27,852-Speed 2497.12 samples/sec Loss 2.6529 LearningRate 0.000464 Epoch: 15 Global Step: 320960 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:54:36,065-Speed 2494.10 samples/sec Loss 2.6935 LearningRate 0.000464 Epoch: 15 Global Step: 320970 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:54:44,263-Speed 2498.53 samples/sec Loss 2.6435 LearningRate 0.000464 Epoch: 15 Global Step: 320980 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:54:52,463-Speed 2497.83 samples/sec Loss 2.6232 LearningRate 0.000464 Epoch: 15 Global Step: 320990 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:55:00,664-Speed 2497.92 samples/sec Loss 2.7214 LearningRate 0.000464 Epoch: 15 Global Step: 321000 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:55:08,810-Speed 2514.52 samples/sec Loss 2.5927 LearningRate 0.000464 Epoch: 15 Global Step: 321010 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:55:17,031-Speed 2491.67 samples/sec Loss 2.6723 LearningRate 0.000464 Epoch: 15 Global Step: 321020 Fp16 
Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:55:25,237-Speed 2496.05 samples/sec Loss 2.6328 LearningRate 0.000464 Epoch: 15 Global Step: 321030 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:55:33,433-Speed 2499.20 samples/sec Loss 2.6581 LearningRate 0.000464 Epoch: 15 Global Step: 321040 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:55:41,644-Speed 2494.45 samples/sec Loss 2.6362 LearningRate 0.000464 Epoch: 15 Global Step: 321050 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:55:49,843-Speed 2498.20 samples/sec Loss 2.6554 LearningRate 0.000464 Epoch: 15 Global Step: 321060 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:55:57,991-Speed 2513.99 samples/sec Loss 2.6748 LearningRate 0.000464 Epoch: 15 Global Step: 321070 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:56:06,188-Speed 2498.85 samples/sec Loss 2.6698 LearningRate 0.000464 Epoch: 15 Global Step: 321080 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:56:14,386-Speed 2498.53 samples/sec Loss 2.6492 LearningRate 0.000464 Epoch: 15 Global Step: 321090 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:56:22,583-Speed 2499.40 samples/sec Loss 2.6335 LearningRate 0.000464 Epoch: 15 Global Step: 321100 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:56:30,782-Speed 2498.19 samples/sec Loss 2.6313 LearningRate 0.000464 Epoch: 15 Global Step: 321110 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:56:38,985-Speed 2497.44 samples/sec Loss 2.6974 LearningRate 0.000464 Epoch: 15 Global Step: 321120 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:56:47,138-Speed 2512.03 samples/sec Loss 2.6717 LearningRate 0.000464 Epoch: 15 Global Step: 321130 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:56:55,343-Speed 2497.02 samples/sec Loss 2.6093 LearningRate 0.000464 Epoch: 15 Global Step: 321140 
Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:57:03,545-Speed 2497.38 samples/sec Loss 2.6532 LearningRate 0.000464 Epoch: 15 Global Step: 321150 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:57:11,751-Speed 2496.04 samples/sec Loss 2.6568 LearningRate 0.000464 Epoch: 15 Global Step: 321160 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:57:19,948-Speed 2498.83 samples/sec Loss 2.6983 LearningRate 0.000464 Epoch: 15 Global Step: 321170 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:57:28,146-Speed 2498.52 samples/sec Loss 2.6628 LearningRate 0.000464 Epoch: 15 Global Step: 321180 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:57:36,307-Speed 2510.01 samples/sec Loss 2.6383 LearningRate 0.000464 Epoch: 15 Global Step: 321190 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:57:44,514-Speed 2496.19 samples/sec Loss 2.6842 LearningRate 0.000464 Epoch: 15 Global Step: 321200 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:57:52,713-Speed 2498.32 samples/sec Loss 2.6250 LearningRate 0.000464 Epoch: 15 Global Step: 321210 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:58:00,911-Speed 2498.45 samples/sec Loss 2.6362 LearningRate 0.000464 Epoch: 15 Global Step: 321220 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:58:09,112-Speed 2497.56 samples/sec Loss 2.6031 LearningRate 0.000464 Epoch: 15 Global Step: 321230 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:58:17,309-Speed 2498.79 samples/sec Loss 2.6389 LearningRate 0.000464 Epoch: 15 Global Step: 321240 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:58:25,454-Speed 2515.00 samples/sec Loss 2.6952 LearningRate 0.000464 Epoch: 15 Global Step: 321250 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:58:33,653-Speed 2498.22 samples/sec Loss 2.6206 LearningRate 0.000464 Epoch: 15 Global Step: 
321260 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:58:41,857-Speed 2496.84 samples/sec Loss 2.6319 LearningRate 0.000463 Epoch: 15 Global Step: 321270 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:58:50,080-Speed 2491.04 samples/sec Loss 2.6273 LearningRate 0.000463 Epoch: 15 Global Step: 321280 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:58:58,276-Speed 2499.02 samples/sec Loss 2.6623 LearningRate 0.000463 Epoch: 15 Global Step: 321290 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:59:06,472-Speed 2499.18 samples/sec Loss 2.6619 LearningRate 0.000463 Epoch: 15 Global Step: 321300 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:59:14,619-Speed 2514.15 samples/sec Loss 2.5803 LearningRate 0.000463 Epoch: 15 Global Step: 321310 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:59:22,820-Speed 2497.75 samples/sec Loss 2.6566 LearningRate 0.000463 Epoch: 15 Global Step: 321320 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:59:31,027-Speed 2496.12 samples/sec Loss 2.6783 LearningRate 0.000463 Epoch: 15 Global Step: 321330 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:59:39,228-Speed 2497.46 samples/sec Loss 2.6589 LearningRate 0.000463 Epoch: 15 Global Step: 321340 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:59:47,433-Speed 2496.99 samples/sec Loss 2.6588 LearningRate 0.000463 Epoch: 15 Global Step: 321350 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 15:59:55,635-Speed 2497.20 samples/sec Loss 2.6328 LearningRate 0.000463 Epoch: 15 Global Step: 321360 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:00:03,783-Speed 2513.89 samples/sec Loss 2.7207 LearningRate 0.000463 Epoch: 15 Global Step: 321370 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:00:11,988-Speed 2496.34 samples/sec Loss 2.6120 LearningRate 0.000463 Epoch: 15 Global 
Step: 321380 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:00:20,190-Speed 2497.51 samples/sec Loss 2.6507 LearningRate 0.000463 Epoch: 15 Global Step: 321390 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:00:28,397-Speed 2496.03 samples/sec Loss 2.7116 LearningRate 0.000463 Epoch: 15 Global Step: 321400 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:00:36,608-Speed 2494.73 samples/sec Loss 2.6286 LearningRate 0.000463 Epoch: 15 Global Step: 321410 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:00:44,810-Speed 2497.39 samples/sec Loss 2.6427 LearningRate 0.000463 Epoch: 15 Global Step: 321420 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:00:52,958-Speed 2513.86 samples/sec Loss 2.5936 LearningRate 0.000463 Epoch: 15 Global Step: 321430 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:01:01,162-Speed 2496.81 samples/sec Loss 2.6127 LearningRate 0.000463 Epoch: 15 Global Step: 321440 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:01:09,372-Speed 2495.08 samples/sec Loss 2.6425 LearningRate 0.000463 Epoch: 15 Global Step: 321450 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:01:17,572-Speed 2498.28 samples/sec Loss 2.6273 LearningRate 0.000463 Epoch: 15 Global Step: 321460 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:01:25,772-Speed 2497.94 samples/sec Loss 2.6863 LearningRate 0.000463 Epoch: 15 Global Step: 321470 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:01:33,974-Speed 2497.34 samples/sec Loss 2.6839 LearningRate 0.000463 Epoch: 15 Global Step: 321480 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:01:42,121-Speed 2513.96 samples/sec Loss 2.7201 LearningRate 0.000463 Epoch: 15 Global Step: 321490 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:01:50,325-Speed 2496.78 samples/sec Loss 2.6641 LearningRate 0.000463 Epoch: 15 
Global Step: 321500 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:01:58,524-Speed 2498.25 samples/sec Loss 2.6704 LearningRate 0.000463 Epoch: 15 Global Step: 321510 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:02:06,722-Speed 2498.40 samples/sec Loss 2.6390 LearningRate 0.000463 Epoch: 15 Global Step: 321520 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:02:14,925-Speed 2497.08 samples/sec Loss 2.6062 LearningRate 0.000463 Epoch: 15 Global Step: 321530 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:02:23,125-Speed 2498.07 samples/sec Loss 2.7422 LearningRate 0.000463 Epoch: 15 Global Step: 321540 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:02:31,273-Speed 2514.04 samples/sec Loss 2.6520 LearningRate 0.000463 Epoch: 15 Global Step: 321550 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:02:39,472-Speed 2498.44 samples/sec Loss 2.6228 LearningRate 0.000463 Epoch: 15 Global Step: 321560 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:02:47,675-Speed 2496.84 samples/sec Loss 2.7087 LearningRate 0.000463 Epoch: 15 Global Step: 321570 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:02:55,877-Speed 2497.42 samples/sec Loss 2.6807 LearningRate 0.000463 Epoch: 15 Global Step: 321580 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:03:04,078-Speed 2497.60 samples/sec Loss 2.7215 LearningRate 0.000463 Epoch: 15 Global Step: 321590 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:03:12,279-Speed 2497.52 samples/sec Loss 2.6493 LearningRate 0.000463 Epoch: 15 Global Step: 321600 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:03:20,426-Speed 2514.19 samples/sec Loss 2.5994 LearningRate 0.000463 Epoch: 15 Global Step: 321610 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:03:28,630-Speed 2496.92 samples/sec Loss 2.6690 LearningRate 0.000463 
Epoch: 15 Global Step: 321620 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:03:36,831-Speed 2497.71 samples/sec Loss 2.6348 LearningRate 0.000463 Epoch: 15 Global Step: 321630 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:03:45,034-Speed 2497.01 samples/sec Loss 2.5891 LearningRate 0.000463 Epoch: 15 Global Step: 321640 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:03:53,246-Speed 2494.06 samples/sec Loss 2.6352 LearningRate 0.000463 Epoch: 15 Global Step: 321650 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:04:01,447-Speed 2497.70 samples/sec Loss 2.6314 LearningRate 0.000463 Epoch: 15 Global Step: 321660 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:04:09,595-Speed 2513.77 samples/sec Loss 2.6495 LearningRate 0.000463 Epoch: 15 Global Step: 321670 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:04:17,801-Speed 2496.15 samples/sec Loss 2.6446 LearningRate 0.000463 Epoch: 15 Global Step: 321680 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:04:26,011-Speed 2494.94 samples/sec Loss 2.6370 LearningRate 0.000463 Epoch: 15 Global Step: 321690 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:04:34,212-Speed 2497.60 samples/sec Loss 2.6377 LearningRate 0.000463 Epoch: 15 Global Step: 321700 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:04:42,417-Speed 2496.56 samples/sec Loss 2.6390 LearningRate 0.000463 Epoch: 15 Global Step: 321710 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:04:50,618-Speed 2497.42 samples/sec Loss 2.6062 LearningRate 0.000463 Epoch: 15 Global Step: 321720 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:04:58,767-Speed 2513.51 samples/sec Loss 2.6096 LearningRate 0.000463 Epoch: 15 Global Step: 321730 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:05:06,975-Speed 2495.70 samples/sec Loss 2.6158 LearningRate 
0.000463 Epoch: 15 Global Step: 321740 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:05:15,180-Speed 2496.45 samples/sec Loss 2.6364 LearningRate 0.000463 Epoch: 15 Global Step: 321750 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:05:23,384-Speed 2496.86 samples/sec Loss 2.6415 LearningRate 0.000463 Epoch: 15 Global Step: 321760 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:05:31,589-Speed 2496.35 samples/sec Loss 2.5971 LearningRate 0.000463 Epoch: 15 Global Step: 321770 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:05:39,805-Speed 2492.98 samples/sec Loss 2.6175 LearningRate 0.000463 Epoch: 15 Global Step: 321780 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:05:47,958-Speed 2512.64 samples/sec Loss 2.6133 LearningRate 0.000463 Epoch: 15 Global Step: 321790 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:05:56,163-Speed 2496.30 samples/sec Loss 2.6305 LearningRate 0.000463 Epoch: 15 Global Step: 321800 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:06:04,367-Speed 2496.87 samples/sec Loss 2.6619 LearningRate 0.000463 Epoch: 15 Global Step: 321810 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:06:12,568-Speed 2497.84 samples/sec Loss 2.6311 LearningRate 0.000462 Epoch: 15 Global Step: 321820 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:06:20,769-Speed 2497.55 samples/sec Loss 2.6198 LearningRate 0.000462 Epoch: 15 Global Step: 321830 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:06:28,968-Speed 2498.21 samples/sec Loss 2.6652 LearningRate 0.000462 Epoch: 15 Global Step: 321840 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:06:37,127-Speed 2510.37 samples/sec Loss 2.6938 LearningRate 0.000462 Epoch: 15 Global Step: 321850 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:06:45,325-Speed 2498.65 samples/sec Loss 2.6478 
LearningRate 0.000462 Epoch: 15 Global Step: 321860 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:06:53,531-Speed 2495.87 samples/sec Loss 2.6435 LearningRate 0.000462 Epoch: 15 Global Step: 321870 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:07:01,735-Speed 2496.77 samples/sec Loss 2.6349 LearningRate 0.000462 Epoch: 15 Global Step: 321880 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:07:09,936-Speed 2497.80 samples/sec Loss 2.6205 LearningRate 0.000462 Epoch: 15 Global Step: 321890 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:07:18,136-Speed 2497.89 samples/sec Loss 2.6111 LearningRate 0.000462 Epoch: 15 Global Step: 321900 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:07:26,299-Speed 2509.34 samples/sec Loss 2.6176 LearningRate 0.000462 Epoch: 15 Global Step: 321910 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:07:34,504-Speed 2496.21 samples/sec Loss 2.6518 LearningRate 0.000462 Epoch: 15 Global Step: 321920 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:07:42,787-Speed 2499.11 samples/sec Loss 2.6237 LearningRate 0.000462 Epoch: 15 Global Step: 321930 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:07:50,998-Speed 2494.51 samples/sec Loss 2.6345 LearningRate 0.000462 Epoch: 15 Global Step: 321940 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:07:59,239-Speed 2499.05 samples/sec Loss 2.5962 LearningRate 0.000462 Epoch: 15 Global Step: 321950 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:08:07,497-Speed 2497.25 samples/sec Loss 2.5628 LearningRate 0.000462 Epoch: 15 Global Step: 321960 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:08:15,735-Speed 2514.67 samples/sec Loss 2.6409 LearningRate 0.000462 Epoch: 15 Global Step: 321970 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:08:23,946-Speed 2494.55 samples/sec Loss 
2.6354 LearningRate 0.000462 Epoch: 15 Global Step: 321980 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:08:32,195-Speed 2499.85 samples/sec Loss 2.6726 LearningRate 0.000462 Epoch: 15 Global Step: 321990 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:08:44,527-Speed 1668.29 samples/sec Loss 2.6532 LearningRate 0.000462 Epoch: 15 Global Step: 322000 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:08:52,759-Speed 2502.14 samples/sec Loss 2.6468 LearningRate 0.000462 Epoch: 15 Global Step: 322010 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:09:00,952-Speed 2500.02 samples/sec Loss 2.6291 LearningRate 0.000462 Epoch: 15 Global Step: 322020 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:09:14,257-Speed 2517.88 samples/sec Loss 2.6417 LearningRate 0.000462 Epoch: 15 Global Step: 322030 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:09:22,502-Speed 2502.48 samples/sec Loss 2.6879 LearningRate 0.000462 Epoch: 15 Global Step: 322040 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:09:30,697-Speed 2499.38 samples/sec Loss 2.6698 LearningRate 0.000462 Epoch: 15 Global Step: 322050 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:09:38,944-Speed 2500.83 samples/sec Loss 2.6351 LearningRate 0.000462 Epoch: 15 Global Step: 322060 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:09:47,166-Speed 2500.04 samples/sec Loss 2.6164 LearningRate 0.000462 Epoch: 15 Global Step: 322070 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:09:55,413-Speed 2499.22 samples/sec Loss 2.6826 LearningRate 0.000462 Epoch: 15 Global Step: 322080 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:10:03,560-Speed 2514.04 samples/sec Loss 2.6233 LearningRate 0.000462 Epoch: 15 Global Step: 322090 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:10:15,087-Speed 2490.73 samples/sec 
Loss 2.6895 LearningRate 0.000462 Epoch: 15 Global Step: 322100 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:10:23,336-Speed 2502.22 samples/sec Loss 2.6858 LearningRate 0.000462 Epoch: 15 Global Step: 322110 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:10:31,582-Speed 2499.49 samples/sec Loss 2.6832 LearningRate 0.000462 Epoch: 15 Global Step: 322120 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:10:39,779-Speed 2498.47 samples/sec Loss 2.6947 LearningRate 0.000462 Epoch: 15 Global Step: 322130 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:10:47,990-Speed 2494.81 samples/sec Loss 2.6653 LearningRate 0.000462 Epoch: 15 Global Step: 322140 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:10:56,171-Speed 2517.21 samples/sec Loss 2.7133 LearningRate 0.000462 Epoch: 15 Global Step: 322150 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:11:04,418-Speed 2500.16 samples/sec Loss 2.6486 LearningRate 0.000462 Epoch: 15 Global Step: 322160 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:11:15,643-Speed 1824.67 samples/sec Loss 2.6973 LearningRate 0.000462 Epoch: 15 Global Step: 322170 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:11:23,876-Speed 2500.66 samples/sec Loss 2.6954 LearningRate 0.000462 Epoch: 15 Global Step: 322180 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:11:32,123-Speed 2500.21 samples/sec Loss 2.6191 LearningRate 0.000462 Epoch: 15 Global Step: 322190 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:11:41,855-Speed 2104.56 samples/sec Loss 2.6534 LearningRate 0.000462 Epoch: 15 Global Step: 322200 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:11:50,694-Speed 2343.09 samples/sec Loss 2.6878 LearningRate 0.000462 Epoch: 15 Global Step: 322210 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:11:58,895-Speed 2497.67 
samples/sec Loss 2.6236 LearningRate 0.000462 Epoch: 15 Global Step: 322220 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:12:07,105-Speed 2495.11 samples/sec Loss 2.6350 LearningRate 0.000462 Epoch: 15 Global Step: 322230 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:12:15,309-Speed 2496.84 samples/sec Loss 2.6626 LearningRate 0.000462 Epoch: 15 Global Step: 322240 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:12:23,515-Speed 2496.17 samples/sec Loss 2.6202 LearningRate 0.000462 Epoch: 15 Global Step: 322250 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:12:31,717-Speed 2497.37 samples/sec Loss 2.6487 LearningRate 0.000462 Epoch: 15 Global Step: 322260 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:12:39,868-Speed 2512.85 samples/sec Loss 2.6069 LearningRate 0.000462 Epoch: 15 Global Step: 322270 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:12:48,074-Speed 2496.25 samples/sec Loss 2.6202 LearningRate 0.000462 Epoch: 15 Global Step: 322280 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:12:56,274-Speed 2497.96 samples/sec Loss 2.6254 LearningRate 0.000462 Epoch: 15 Global Step: 322290 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:13:04,480-Speed 2496.72 samples/sec Loss 2.5945 LearningRate 0.000462 Epoch: 15 Global Step: 322300 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:13:12,717-Speed 2486.69 samples/sec Loss 2.6103 LearningRate 0.000462 Epoch: 15 Global Step: 322310 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:13:20,914-Speed 2498.81 samples/sec Loss 2.6140 LearningRate 0.000462 Epoch: 15 Global Step: 322320 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:13:29,060-Speed 2514.63 samples/sec Loss 2.6311 LearningRate 0.000462 Epoch: 15 Global Step: 322330 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:13:37,271-Speed 
2494.76 samples/sec Loss 2.6165 LearningRate 0.000462 Epoch: 15 Global Step: 322340 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:13:45,476-Speed 2496.38 samples/sec Loss 2.6970 LearningRate 0.000462 Epoch: 15 Global Step: 322350 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:13:53,677-Speed 2497.48 samples/sec Loss 2.6009 LearningRate 0.000462 Epoch: 15 Global Step: 322360 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:14:01,879-Speed 2497.50 samples/sec Loss 2.5849 LearningRate 0.000461 Epoch: 15 Global Step: 322370 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:14:10,080-Speed 2497.47 samples/sec Loss 2.6664 LearningRate 0.000461 Epoch: 15 Global Step: 322380 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:14:18,229-Speed 2513.49 samples/sec Loss 2.5838 LearningRate 0.000461 Epoch: 15 Global Step: 322390 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:14:26,436-Speed 2495.82 samples/sec Loss 2.5966 LearningRate 0.000461 Epoch: 15 Global Step: 322400 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:14:34,639-Speed 2497.17 samples/sec Loss 2.6323 LearningRate 0.000461 Epoch: 15 Global Step: 322410 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:14:42,843-Speed 2496.82 samples/sec Loss 2.6542 LearningRate 0.000461 Epoch: 15 Global Step: 322420 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:14:51,045-Speed 2497.33 samples/sec Loss 2.6210 LearningRate 0.000461 Epoch: 15 Global Step: 322430 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:14:59,249-Speed 2496.58 samples/sec Loss 2.6058 LearningRate 0.000461 Epoch: 15 Global Step: 322440 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:15:07,400-Speed 2513.26 samples/sec Loss 2.6280 LearningRate 0.000461 Epoch: 15 Global Step: 322450 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 
16:15:15,608-Speed 2495.35 samples/sec Loss 2.6061 LearningRate 0.000461 Epoch: 15 Global Step: 322460 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:15:23,805-Speed 2498.72 samples/sec Loss 2.7017 LearningRate 0.000461 Epoch: 15 Global Step: 322470 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:15:32,012-Speed 2496.12 samples/sec Loss 2.6926 LearningRate 0.000461 Epoch: 15 Global Step: 322480 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:15:40,228-Speed 2492.95 samples/sec Loss 2.6732 LearningRate 0.000461 Epoch: 15 Global Step: 322490 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:15:48,427-Speed 2498.27 samples/sec Loss 2.6547 LearningRate 0.000461 Epoch: 15 Global Step: 322500 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:15:56,578-Speed 2513.12 samples/sec Loss 2.6149 LearningRate 0.000461 Epoch: 15 Global Step: 322510 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:16:04,778-Speed 2498.12 samples/sec Loss 2.6549 LearningRate 0.000461 Epoch: 15 Global Step: 322520 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:16:12,981-Speed 2496.85 samples/sec Loss 2.6799 LearningRate 0.000461 Epoch: 15 Global Step: 322530 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:16:21,182-Speed 2497.56 samples/sec Loss 2.6670 LearningRate 0.000461 Epoch: 15 Global Step: 322540 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:16:29,384-Speed 2497.90 samples/sec Loss 2.6662 LearningRate 0.000461 Epoch: 15 Global Step: 322550 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:16:37,591-Speed 2495.79 samples/sec Loss 2.6508 LearningRate 0.000461 Epoch: 15 Global Step: 322560 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:16:45,742-Speed 2512.82 samples/sec Loss 2.6166 LearningRate 0.000461 Epoch: 15 Global Step: 322570 Fp16 Grad Scale: 32768 Required: 116 hours Training: 
2022-07-08 16:16:53,940-Speed 2498.59 samples/sec Loss 2.6340 LearningRate 0.000461 Epoch: 15 Global Step: 322580 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:17:02,143-Speed 2496.91 samples/sec Loss 2.6447 LearningRate 0.000461 Epoch: 15 Global Step: 322590 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:17:10,347-Speed 2497.03 samples/sec Loss 2.6640 LearningRate 0.000461 Epoch: 15 Global Step: 322600 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:17:18,551-Speed 2496.87 samples/sec Loss 2.6313 LearningRate 0.000461 Epoch: 15 Global Step: 322610 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:17:26,756-Speed 2496.18 samples/sec Loss 2.6534 LearningRate 0.000461 Epoch: 15 Global Step: 322620 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:17:34,906-Speed 2513.31 samples/sec Loss 2.6923 LearningRate 0.000461 Epoch: 15 Global Step: 322630 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:17:43,114-Speed 2495.83 samples/sec Loss 2.6238 LearningRate 0.000461 Epoch: 15 Global Step: 322640 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:17:51,335-Speed 2491.41 samples/sec Loss 2.6366 LearningRate 0.000461 Epoch: 15 Global Step: 322650 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:17:59,537-Speed 2497.37 samples/sec Loss 2.6757 LearningRate 0.000461 Epoch: 15 Global Step: 322660 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:18:07,752-Speed 2493.61 samples/sec Loss 2.5747 LearningRate 0.000461 Epoch: 15 Global Step: 322670 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:18:15,952-Speed 2497.78 samples/sec Loss 2.5951 LearningRate 0.000461 Epoch: 15 Global Step: 322680 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:18:24,099-Speed 2514.35 samples/sec Loss 2.6377 LearningRate 0.000461 Epoch: 15 Global Step: 322690 Fp16 Grad Scale: 65536 Required: 116 hours 
Training: 2022-07-08 16:18:32,301-Speed 2497.24 samples/sec Loss 2.6200 LearningRate 0.000461 Epoch: 15 Global Step: 322700 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:18:40,517-Speed 2493.37 samples/sec Loss 2.6504 LearningRate 0.000461 Epoch: 15 Global Step: 322710 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:18:48,723-Speed 2496.21 samples/sec Loss 2.5802 LearningRate 0.000461 Epoch: 15 Global Step: 322720 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:18:56,938-Speed 2493.29 samples/sec Loss 2.6357 LearningRate 0.000461 Epoch: 15 Global Step: 322730 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:19:05,140-Speed 2497.22 samples/sec Loss 2.6260 LearningRate 0.000461 Epoch: 15 Global Step: 322740 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:19:13,289-Speed 2513.76 samples/sec Loss 2.6133 LearningRate 0.000461 Epoch: 15 Global Step: 322750 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:19:21,505-Speed 2492.99 samples/sec Loss 2.6459 LearningRate 0.000461 Epoch: 15 Global Step: 322760 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:19:29,707-Speed 2497.22 samples/sec Loss 2.6535 LearningRate 0.000461 Epoch: 15 Global Step: 322770 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:19:37,907-Speed 2498.05 samples/sec Loss 2.5932 LearningRate 0.000461 Epoch: 15 Global Step: 322780 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:19:46,129-Speed 2491.21 samples/sec Loss 2.6723 LearningRate 0.000461 Epoch: 15 Global Step: 322790 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:19:54,343-Speed 2493.75 samples/sec Loss 2.6048 LearningRate 0.000461 Epoch: 15 Global Step: 322800 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:20:02,490-Speed 2513.84 samples/sec Loss 2.5745 LearningRate 0.000461 Epoch: 15 Global Step: 322810 Fp16 Grad Scale: 65536 Required: 116 
hours Training: 2022-07-08 16:20:10,694-Speed 2496.91 samples/sec Loss 2.5945 LearningRate 0.000461 Epoch: 15 Global Step: 322820 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:20:18,893-Speed 2498.44 samples/sec Loss 2.6044 LearningRate 0.000461 Epoch: 15 Global Step: 322830 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:20:27,095-Speed 2497.44 samples/sec Loss 2.5722 LearningRate 0.000461 Epoch: 15 Global Step: 322840 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:20:35,296-Speed 2497.71 samples/sec Loss 2.5615 LearningRate 0.000461 Epoch: 15 Global Step: 322850 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:20:43,497-Speed 2497.49 samples/sec Loss 2.5555 LearningRate 0.000461 Epoch: 15 Global Step: 322860 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:20:51,682-Speed 2502.82 samples/sec Loss 2.5718 LearningRate 0.000461 Epoch: 15 Global Step: 322870 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:20:59,921-Speed 2486.08 samples/sec Loss 2.5826 LearningRate 0.000461 Epoch: 15 Global Step: 322880 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:21:08,122-Speed 2497.62 samples/sec Loss 2.5966 LearningRate 0.000461 Epoch: 15 Global Step: 322890 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:21:16,323-Speed 2497.91 samples/sec Loss 2.6413 LearningRate 0.000461 Epoch: 15 Global Step: 322900 Fp16 Grad Scale: 65536 Required: 116 hours Training: 2022-07-08 16:21:24,494-Speed 2506.87 samples/sec Loss 2.6066 LearningRate 0.000461 Epoch: 15 Global Step: 322910 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:21:32,696-Speed 2497.41 samples/sec Loss 2.7025 LearningRate 0.000460 Epoch: 15 Global Step: 322920 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:21:40,841-Speed 2514.73 samples/sec Loss 2.6537 LearningRate 0.000460 Epoch: 15 Global Step: 322930 Fp16 Grad Scale: 32768 Required: 
116 hours Training: 2022-07-08 16:21:49,046-Speed 2496.34 samples/sec Loss 2.6276 LearningRate 0.000460 Epoch: 15 Global Step: 322940 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:21:57,248-Speed 2497.48 samples/sec Loss 2.6200 LearningRate 0.000460 Epoch: 15 Global Step: 322950 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:22:05,460-Speed 2494.54 samples/sec Loss 2.6440 LearningRate 0.000460 Epoch: 15 Global Step: 322960 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:22:13,664-Speed 2496.56 samples/sec Loss 2.6271 LearningRate 0.000460 Epoch: 15 Global Step: 322970 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:22:21,865-Speed 2497.66 samples/sec Loss 2.6228 LearningRate 0.000460 Epoch: 15 Global Step: 322980 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:22:30,014-Speed 2513.62 samples/sec Loss 2.6808 LearningRate 0.000460 Epoch: 15 Global Step: 322990 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:22:38,219-Speed 2496.35 samples/sec Loss 2.5637 LearningRate 0.000460 Epoch: 15 Global Step: 323000 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:22:46,421-Speed 2497.38 samples/sec Loss 2.6838 LearningRate 0.000460 Epoch: 15 Global Step: 323010 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:22:54,628-Speed 2496.12 samples/sec Loss 2.6838 LearningRate 0.000460 Epoch: 15 Global Step: 323020 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:23:02,828-Speed 2497.69 samples/sec Loss 2.5673 LearningRate 0.000460 Epoch: 15 Global Step: 323030 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:23:11,031-Speed 2497.23 samples/sec Loss 2.5757 LearningRate 0.000460 Epoch: 15 Global Step: 323040 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:23:19,185-Speed 2512.24 samples/sec Loss 2.6729 LearningRate 0.000460 Epoch: 15 Global Step: 323050 Fp16 Grad Scale: 32768 
Required: 116 hours Training: 2022-07-08 16:23:27,386-Speed 2497.51 samples/sec Loss 2.6608 LearningRate 0.000460 Epoch: 15 Global Step: 323060 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:23:35,593-Speed 2495.90 samples/sec Loss 2.6318 LearningRate 0.000460 Epoch: 15 Global Step: 323070 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:23:43,793-Speed 2497.91 samples/sec Loss 2.6514 LearningRate 0.000460 Epoch: 15 Global Step: 323080 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:23:51,997-Speed 2496.80 samples/sec Loss 2.5972 LearningRate 0.000460 Epoch: 15 Global Step: 323090 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:24:00,209-Speed 2494.30 samples/sec Loss 2.6210 LearningRate 0.000460 Epoch: 15 Global Step: 323100 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:24:08,370-Speed 2509.93 samples/sec Loss 2.6223 LearningRate 0.000460 Epoch: 15 Global Step: 323110 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:24:16,585-Speed 2493.44 samples/sec Loss 2.6260 LearningRate 0.000460 Epoch: 15 Global Step: 323120 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:24:24,791-Speed 2496.37 samples/sec Loss 2.6518 LearningRate 0.000460 Epoch: 15 Global Step: 323130 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:24:32,998-Speed 2495.74 samples/sec Loss 2.5486 LearningRate 0.000460 Epoch: 15 Global Step: 323140 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:24:41,198-Speed 2497.70 samples/sec Loss 2.6355 LearningRate 0.000460 Epoch: 15 Global Step: 323150 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:24:49,401-Speed 2497.05 samples/sec Loss 2.5969 LearningRate 0.000460 Epoch: 15 Global Step: 323160 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:24:57,557-Speed 2511.48 samples/sec Loss 2.6054 LearningRate 0.000460 Epoch: 15 Global Step: 323170 Fp16 Grad Scale: 
32768 Required: 116 hours Training: 2022-07-08 16:25:05,848-Speed 2470.47 samples/sec Loss 2.5966 LearningRate 0.000460 Epoch: 15 Global Step: 323180 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:25:14,061-Speed 2494.17 samples/sec Loss 2.6363 LearningRate 0.000460 Epoch: 15 Global Step: 323190 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:25:22,280-Speed 2492.37 samples/sec Loss 2.6129 LearningRate 0.000460 Epoch: 15 Global Step: 323200 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:25:30,484-Speed 2496.63 samples/sec Loss 2.6026 LearningRate 0.000460 Epoch: 15 Global Step: 323210 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:25:38,686-Speed 2497.20 samples/sec Loss 2.5580 LearningRate 0.000460 Epoch: 15 Global Step: 323220 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:25:46,839-Speed 2512.46 samples/sec Loss 2.5673 LearningRate 0.000460 Epoch: 15 Global Step: 323230 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:25:55,041-Speed 2497.40 samples/sec Loss 2.6309 LearningRate 0.000460 Epoch: 15 Global Step: 323240 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:26:03,244-Speed 2497.15 samples/sec Loss 2.5481 LearningRate 0.000460 Epoch: 15 Global Step: 323250 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:26:11,444-Speed 2498.09 samples/sec Loss 2.5954 LearningRate 0.000460 Epoch: 15 Global Step: 323260 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:26:19,654-Speed 2494.95 samples/sec Loss 2.5852 LearningRate 0.000460 Epoch: 15 Global Step: 323270 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:26:27,865-Speed 2494.48 samples/sec Loss 2.6018 LearningRate 0.000460 Epoch: 15 Global Step: 323280 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:26:36,015-Speed 2513.30 samples/sec Loss 2.6729 LearningRate 0.000460 Epoch: 15 Global Step: 323290 Fp16 Grad 
Scale: 32768 Required: 116 hours Training: 2022-07-08 16:26:44,215-Speed 2497.86 samples/sec Loss 2.7093 LearningRate 0.000460 Epoch: 15 Global Step: 323300 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:26:52,414-Speed 2498.28 samples/sec Loss 2.6304 LearningRate 0.000460 Epoch: 15 Global Step: 323310 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:27:00,618-Speed 2496.77 samples/sec Loss 2.6450 LearningRate 0.000460 Epoch: 15 Global Step: 323320 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:27:08,818-Speed 2497.94 samples/sec Loss 2.6853 LearningRate 0.000460 Epoch: 15 Global Step: 323330 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:27:17,021-Speed 2497.04 samples/sec Loss 2.7591 LearningRate 0.000460 Epoch: 15 Global Step: 323340 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:27:25,168-Speed 2514.05 samples/sec Loss 2.6542 LearningRate 0.000460 Epoch: 15 Global Step: 323350 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:27:33,368-Speed 2498.07 samples/sec Loss 2.7957 LearningRate 0.000460 Epoch: 15 Global Step: 323360 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:27:41,591-Speed 2490.79 samples/sec Loss 2.7136 LearningRate 0.000460 Epoch: 15 Global Step: 323370 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:27:49,790-Speed 2498.49 samples/sec Loss 2.7572 LearningRate 0.000460 Epoch: 15 Global Step: 323380 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:27:57,991-Speed 2497.76 samples/sec Loss 2.7582 LearningRate 0.000460 Epoch: 15 Global Step: 323390 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:28:06,197-Speed 2496.15 samples/sec Loss 2.7384 LearningRate 0.000460 Epoch: 15 Global Step: 323400 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:28:14,345-Speed 2513.74 samples/sec Loss 2.7103 LearningRate 0.000460 Epoch: 15 Global Step: 323410 Fp16 
Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:28:22,571-Speed 2490.28 samples/sec Loss 2.6873 LearningRate 0.000460 Epoch: 15 Global Step: 323420 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:28:30,776-Speed 2496.39 samples/sec Loss 2.6772 LearningRate 0.000460 Epoch: 15 Global Step: 323430 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:28:38,974-Speed 2498.59 samples/sec Loss 2.6985 LearningRate 0.000460 Epoch: 15 Global Step: 323440 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:28:47,173-Speed 2497.99 samples/sec Loss 2.7198 LearningRate 0.000460 Epoch: 15 Global Step: 323450 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:28:55,385-Speed 2494.47 samples/sec Loss 2.6624 LearningRate 0.000460 Epoch: 15 Global Step: 323460 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:29:03,534-Speed 2513.73 samples/sec Loss 2.6483 LearningRate 0.000459 Epoch: 15 Global Step: 323470 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:29:11,702-Speed 2507.72 samples/sec Loss 2.6288 LearningRate 0.000459 Epoch: 15 Global Step: 323480 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:29:19,900-Speed 2498.52 samples/sec Loss 2.6129 LearningRate 0.000459 Epoch: 15 Global Step: 323490 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:29:28,109-Speed 2495.24 samples/sec Loss 2.5721 LearningRate 0.000459 Epoch: 15 Global Step: 323500 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:29:36,311-Speed 2497.17 samples/sec Loss 2.6105 LearningRate 0.000459 Epoch: 15 Global Step: 323510 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:29:44,515-Speed 2496.66 samples/sec Loss 2.6294 LearningRate 0.000459 Epoch: 15 Global Step: 323520 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:29:52,665-Speed 2513.37 samples/sec Loss 2.5995 LearningRate 0.000459 Epoch: 15 Global Step: 323530 
Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:30:00,871-Speed 2496.47 samples/sec Loss 2.5769 LearningRate 0.000459 Epoch: 15 Global Step: 323540 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:30:09,075-Speed 2496.64 samples/sec Loss 2.6978 LearningRate 0.000459 Epoch: 15 Global Step: 323550 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:30:17,275-Speed 2497.77 samples/sec Loss 2.6006 LearningRate 0.000459 Epoch: 15 Global Step: 323560 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:30:25,472-Speed 2498.87 samples/sec Loss 2.6140 LearningRate 0.000459 Epoch: 15 Global Step: 323570 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:30:33,678-Speed 2496.18 samples/sec Loss 2.5347 LearningRate 0.000459 Epoch: 15 Global Step: 323580 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:30:41,829-Speed 2513.03 samples/sec Loss 2.5763 LearningRate 0.000459 Epoch: 15 Global Step: 323590 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:30:50,031-Speed 2497.47 samples/sec Loss 2.6117 LearningRate 0.000459 Epoch: 15 Global Step: 323600 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:30:58,237-Speed 2496.34 samples/sec Loss 2.5768 LearningRate 0.000459 Epoch: 15 Global Step: 323610 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:31:06,435-Speed 2498.53 samples/sec Loss 2.5958 LearningRate 0.000459 Epoch: 15 Global Step: 323620 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:31:14,656-Speed 2491.70 samples/sec Loss 2.6502 LearningRate 0.000459 Epoch: 15 Global Step: 323630 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:31:22,857-Speed 2497.49 samples/sec Loss 2.6494 LearningRate 0.000459 Epoch: 15 Global Step: 323640 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:31:31,008-Speed 2513.06 samples/sec Loss 2.5990 LearningRate 0.000459 Epoch: 15 Global Step: 
323650 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:31:39,216-Speed 2495.55 samples/sec Loss 2.6637 LearningRate 0.000459 Epoch: 15 Global Step: 323660 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:31:47,419-Speed 2497.17 samples/sec Loss 2.6474 LearningRate 0.000459 Epoch: 15 Global Step: 323670 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:31:55,619-Speed 2498.01 samples/sec Loss 2.6430 LearningRate 0.000459 Epoch: 15 Global Step: 323680 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:32:03,829-Speed 2495.13 samples/sec Loss 2.6496 LearningRate 0.000459 Epoch: 15 Global Step: 323690 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:32:12,031-Speed 2497.29 samples/sec Loss 2.6752 LearningRate 0.000459 Epoch: 15 Global Step: 323700 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:32:20,179-Speed 2513.79 samples/sec Loss 2.6103 LearningRate 0.000459 Epoch: 15 Global Step: 323710 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:32:28,383-Speed 2496.82 samples/sec Loss 2.6854 LearningRate 0.000459 Epoch: 15 Global Step: 323720 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:32:36,587-Speed 2496.63 samples/sec Loss 2.5848 LearningRate 0.000459 Epoch: 15 Global Step: 323730 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:32:44,791-Speed 2497.01 samples/sec Loss 2.6518 LearningRate 0.000459 Epoch: 15 Global Step: 323740 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:32:52,992-Speed 2497.70 samples/sec Loss 2.6328 LearningRate 0.000459 Epoch: 15 Global Step: 323750 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:33:01,203-Speed 2494.75 samples/sec Loss 2.6612 LearningRate 0.000459 Epoch: 15 Global Step: 323760 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:33:09,352-Speed 2513.63 samples/sec Loss 2.6474 LearningRate 0.000459 Epoch: 15 Global 
Step: 323770 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:33:17,556-Speed 2496.90 samples/sec Loss 2.6217 LearningRate 0.000459 Epoch: 15 Global Step: 323780 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:33:25,757-Speed 2497.51 samples/sec Loss 2.6650 LearningRate 0.000459 Epoch: 15 Global Step: 323790 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:33:33,958-Speed 2497.80 samples/sec Loss 2.6621 LearningRate 0.000459 Epoch: 15 Global Step: 323800 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:33:42,158-Speed 2498.02 samples/sec Loss 2.6397 LearningRate 0.000459 Epoch: 15 Global Step: 323810 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:33:50,372-Speed 2493.64 samples/sec Loss 2.6236 LearningRate 0.000459 Epoch: 15 Global Step: 323820 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:33:58,522-Speed 2513.37 samples/sec Loss 2.6299 LearningRate 0.000459 Epoch: 15 Global Step: 323830 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:34:06,729-Speed 2495.74 samples/sec Loss 2.6209 LearningRate 0.000459 Epoch: 15 Global Step: 323840 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:34:14,933-Speed 2496.83 samples/sec Loss 2.6207 LearningRate 0.000459 Epoch: 15 Global Step: 323850 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:34:23,131-Speed 2498.45 samples/sec Loss 2.6575 LearningRate 0.000459 Epoch: 15 Global Step: 323860 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:34:31,329-Speed 2499.10 samples/sec Loss 2.6599 LearningRate 0.000459 Epoch: 15 Global Step: 323870 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:34:39,529-Speed 2498.05 samples/sec Loss 2.6626 LearningRate 0.000459 Epoch: 15 Global Step: 323880 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:34:47,677-Speed 2513.99 samples/sec Loss 2.6438 LearningRate 0.000459 Epoch: 15 
Global Step: 323890 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:34:55,896-Speed 2491.89 samples/sec Loss 2.6773 LearningRate 0.000459 Epoch: 15 Global Step: 323900 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:35:04,094-Speed 2498.67 samples/sec Loss 2.6486 LearningRate 0.000459 Epoch: 15 Global Step: 323910 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:35:12,295-Speed 2497.63 samples/sec Loss 2.6093 LearningRate 0.000459 Epoch: 15 Global Step: 323920 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:35:20,494-Speed 2498.44 samples/sec Loss 2.6175 LearningRate 0.000459 Epoch: 15 Global Step: 323930 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:35:28,698-Speed 2496.72 samples/sec Loss 2.6237 LearningRate 0.000459 Epoch: 15 Global Step: 323940 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:35:36,842-Speed 2514.95 samples/sec Loss 2.6435 LearningRate 0.000459 Epoch: 15 Global Step: 323950 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:35:45,050-Speed 2495.62 samples/sec Loss 2.6112 LearningRate 0.000459 Epoch: 15 Global Step: 323960 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:35:53,254-Speed 2496.69 samples/sec Loss 2.5957 LearningRate 0.000459 Epoch: 15 Global Step: 323970 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:36:01,459-Speed 2496.51 samples/sec Loss 2.5547 LearningRate 0.000459 Epoch: 15 Global Step: 323980 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:36:09,664-Speed 2496.38 samples/sec Loss 2.6647 LearningRate 0.000459 Epoch: 15 Global Step: 323990 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:36:17,869-Speed 2496.69 samples/sec Loss 2.6201 LearningRate 0.000459 Epoch: 15 Global Step: 324000 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:36:26,020-Speed 2513.21 samples/sec Loss 2.6068 LearningRate 0.000459 
Epoch: 15 Global Step: 324010 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:36:34,221-Speed 2497.46 samples/sec Loss 2.6547 LearningRate 0.000458 Epoch: 15 Global Step: 324020 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:36:42,437-Speed 2493.16 samples/sec Loss 2.6148 LearningRate 0.000458 Epoch: 15 Global Step: 324030 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:36:50,638-Speed 2497.59 samples/sec Loss 2.6528 LearningRate 0.000458 Epoch: 15 Global Step: 324040 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:36:58,850-Speed 2494.42 samples/sec Loss 2.6413 LearningRate 0.000458 Epoch: 15 Global Step: 324050 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:37:07,063-Speed 2493.81 samples/sec Loss 2.6028 LearningRate 0.000458 Epoch: 15 Global Step: 324060 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:37:15,214-Speed 2513.14 samples/sec Loss 2.6035 LearningRate 0.000458 Epoch: 15 Global Step: 324070 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:37:23,414-Speed 2498.16 samples/sec Loss 2.6030 LearningRate 0.000458 Epoch: 15 Global Step: 324080 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:37:31,611-Speed 2498.91 samples/sec Loss 2.6166 LearningRate 0.000458 Epoch: 15 Global Step: 324090 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:37:39,812-Speed 2497.61 samples/sec Loss 2.6574 LearningRate 0.000458 Epoch: 15 Global Step: 324100 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:37:48,012-Speed 2498.25 samples/sec Loss 2.6507 LearningRate 0.000458 Epoch: 15 Global Step: 324110 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:37:56,222-Speed 2494.69 samples/sec Loss 2.6362 LearningRate 0.000458 Epoch: 15 Global Step: 324120 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:38:04,365-Speed 2515.52 samples/sec Loss 2.6237 LearningRate 
0.000458 Epoch: 15 Global Step: 324130 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:38:12,565-Speed 2497.80 samples/sec Loss 2.5613 LearningRate 0.000458 Epoch: 15 Global Step: 324140 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:38:20,766-Speed 2497.95 samples/sec Loss 2.6171 LearningRate 0.000458 Epoch: 15 Global Step: 324150 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:38:28,967-Speed 2497.45 samples/sec Loss 2.6288 LearningRate 0.000458 Epoch: 15 Global Step: 324160 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:38:37,172-Speed 2496.65 samples/sec Loss 2.5963 LearningRate 0.000458 Epoch: 15 Global Step: 324170 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:38:45,372-Speed 2497.74 samples/sec Loss 2.5430 LearningRate 0.000458 Epoch: 15 Global Step: 324180 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:38:53,518-Speed 2514.51 samples/sec Loss 2.6259 LearningRate 0.000458 Epoch: 15 Global Step: 324190 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:39:01,724-Speed 2496.32 samples/sec Loss 2.5565 LearningRate 0.000458 Epoch: 15 Global Step: 324200 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:39:09,924-Speed 2497.89 samples/sec Loss 2.5889 LearningRate 0.000458 Epoch: 15 Global Step: 324210 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:39:18,127-Speed 2497.03 samples/sec Loss 2.6036 LearningRate 0.000458 Epoch: 15 Global Step: 324220 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:39:26,337-Speed 2494.89 samples/sec Loss 2.5852 LearningRate 0.000458 Epoch: 15 Global Step: 324230 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:39:34,545-Speed 2495.83 samples/sec Loss 2.6180 LearningRate 0.000458 Epoch: 15 Global Step: 324240 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:39:42,698-Speed 2512.22 samples/sec Loss 2.6087 
LearningRate 0.000458 Epoch: 15 Global Step: 324250 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:39:50,902-Speed 2497.01 samples/sec Loss 2.6345 LearningRate 0.000458 Epoch: 15 Global Step: 324260 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:39:59,107-Speed 2496.33 samples/sec Loss 2.6056 LearningRate 0.000458 Epoch: 15 Global Step: 324270 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:40:07,321-Speed 2493.75 samples/sec Loss 2.6584 LearningRate 0.000458 Epoch: 15 Global Step: 324280 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:40:15,524-Speed 2497.14 samples/sec Loss 2.6599 LearningRate 0.000458 Epoch: 15 Global Step: 324290 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:40:23,722-Speed 2498.34 samples/sec Loss 2.6070 LearningRate 0.000458 Epoch: 15 Global Step: 324300 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:40:31,871-Speed 2513.75 samples/sec Loss 2.6374 LearningRate 0.000458 Epoch: 15 Global Step: 324310 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:40:40,079-Speed 2495.67 samples/sec Loss 2.5255 LearningRate 0.000458 Epoch: 15 Global Step: 324320 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:40:48,282-Speed 2496.78 samples/sec Loss 2.6094 LearningRate 0.000458 Epoch: 15 Global Step: 324330 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:40:56,486-Speed 2496.90 samples/sec Loss 2.6226 LearningRate 0.000458 Epoch: 15 Global Step: 324340 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:41:04,689-Speed 2497.04 samples/sec Loss 2.5438 LearningRate 0.000458 Epoch: 15 Global Step: 324350 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:41:12,892-Speed 2497.23 samples/sec Loss 2.5667 LearningRate 0.000458 Epoch: 15 Global Step: 324360 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:41:21,042-Speed 2513.05 samples/sec Loss 
2.6060 LearningRate 0.000458 Epoch: 15 Global Step: 324370 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:41:29,243-Speed 2497.73 samples/sec Loss 2.5898 LearningRate 0.000458 Epoch: 15 Global Step: 324380 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:41:37,445-Speed 2497.39 samples/sec Loss 2.5892 LearningRate 0.000458 Epoch: 15 Global Step: 324390 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:41:45,646-Speed 2497.63 samples/sec Loss 2.5812 LearningRate 0.000458 Epoch: 15 Global Step: 324400 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:41:53,848-Speed 2497.22 samples/sec Loss 2.5929 LearningRate 0.000458 Epoch: 15 Global Step: 324410 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:42:02,047-Speed 2498.56 samples/sec Loss 2.6444 LearningRate 0.000458 Epoch: 15 Global Step: 324420 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:42:10,207-Speed 2510.24 samples/sec Loss 2.5947 LearningRate 0.000458 Epoch: 15 Global Step: 324430 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:42:18,406-Speed 2498.20 samples/sec Loss 2.5743 LearningRate 0.000458 Epoch: 15 Global Step: 324440 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:42:26,604-Speed 2498.69 samples/sec Loss 2.7085 LearningRate 0.000458 Epoch: 15 Global Step: 324450 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:42:34,801-Speed 2498.97 samples/sec Loss 2.6498 LearningRate 0.000458 Epoch: 15 Global Step: 324460 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:42:43,004-Speed 2496.99 samples/sec Loss 2.6301 LearningRate 0.000458 Epoch: 15 Global Step: 324470 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:42:51,208-Speed 2496.73 samples/sec Loss 2.6469 LearningRate 0.000458 Epoch: 15 Global Step: 324480 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:42:59,360-Speed 2512.49 samples/sec 
Loss 2.6425 LearningRate 0.000458 Epoch: 15 Global Step: 324490 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:43:07,561-Speed 2497.89 samples/sec Loss 2.6351 LearningRate 0.000458 Epoch: 15 Global Step: 324500 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:43:15,770-Speed 2495.05 samples/sec Loss 2.5992 LearningRate 0.000458 Epoch: 15 Global Step: 324510 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:43:23,970-Speed 2497.78 samples/sec Loss 2.6130 LearningRate 0.000458 Epoch: 15 Global Step: 324520 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:43:32,175-Speed 2496.49 samples/sec Loss 2.6008 LearningRate 0.000458 Epoch: 15 Global Step: 324530 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:43:40,375-Speed 2498.09 samples/sec Loss 2.5455 LearningRate 0.000458 Epoch: 15 Global Step: 324540 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:43:48,524-Speed 2513.42 samples/sec Loss 2.6416 LearningRate 0.000458 Epoch: 15 Global Step: 324550 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:43:56,723-Speed 2498.22 samples/sec Loss 2.5817 LearningRate 0.000458 Epoch: 15 Global Step: 324560 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:44:04,927-Speed 2496.66 samples/sec Loss 2.5690 LearningRate 0.000457 Epoch: 15 Global Step: 324570 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:44:13,131-Speed 2496.84 samples/sec Loss 2.5891 LearningRate 0.000457 Epoch: 15 Global Step: 324580 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:44:21,336-Speed 2496.51 samples/sec Loss 2.6342 LearningRate 0.000457 Epoch: 15 Global Step: 324590 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:44:29,537-Speed 2497.57 samples/sec Loss 2.5434 LearningRate 0.000457 Epoch: 15 Global Step: 324600 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:44:37,687-Speed 2513.13 
samples/sec Loss 2.5796 LearningRate 0.000457 Epoch: 15 Global Step: 324610 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:44:45,894-Speed 2496.05 samples/sec Loss 2.6281 LearningRate 0.000457 Epoch: 15 Global Step: 324620 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:44:54,098-Speed 2496.50 samples/sec Loss 2.5921 LearningRate 0.000457 Epoch: 15 Global Step: 324630 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:45:02,302-Speed 2496.68 samples/sec Loss 2.6112 LearningRate 0.000457 Epoch: 15 Global Step: 324640 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:45:10,507-Speed 2496.59 samples/sec Loss 2.5816 LearningRate 0.000457 Epoch: 15 Global Step: 324650 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:45:18,708-Speed 2497.92 samples/sec Loss 2.6691 LearningRate 0.000457 Epoch: 15 Global Step: 324660 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:45:26,861-Speed 2512.12 samples/sec Loss 2.5988 LearningRate 0.000457 Epoch: 15 Global Step: 324670 Fp16 Grad Scale: 16384 Required: 116 hours Training: 2022-07-08 16:45:35,074-Speed 2494.11 samples/sec Loss 2.6456 LearningRate 0.000457 Epoch: 15 Global Step: 324680 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:45:43,287-Speed 2494.28 samples/sec Loss 2.6454 LearningRate 0.000457 Epoch: 15 Global Step: 324690 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:45:51,491-Speed 2496.54 samples/sec Loss 2.6094 LearningRate 0.000457 Epoch: 15 Global Step: 324700 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:45:59,708-Speed 2492.81 samples/sec Loss 2.5630 LearningRate 0.000457 Epoch: 15 Global Step: 324710 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:46:07,910-Speed 2497.60 samples/sec Loss 2.6401 LearningRate 0.000457 Epoch: 15 Global Step: 324720 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:46:16,056-Speed 
2514.44 samples/sec Loss 2.5545 LearningRate 0.000457 Epoch: 15 Global Step: 324730 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:46:24,262-Speed 2496.25 samples/sec Loss 2.6103 LearningRate 0.000457 Epoch: 15 Global Step: 324740 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:46:32,465-Speed 2496.96 samples/sec Loss 2.5975 LearningRate 0.000457 Epoch: 15 Global Step: 324750 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:46:40,673-Speed 2495.54 samples/sec Loss 2.6285 LearningRate 0.000457 Epoch: 15 Global Step: 324760 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:46:48,875-Speed 2497.35 samples/sec Loss 2.5689 LearningRate 0.000457 Epoch: 15 Global Step: 324770 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:46:57,073-Speed 2498.50 samples/sec Loss 2.5969 LearningRate 0.000457 Epoch: 15 Global Step: 324780 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:47:05,223-Speed 2513.10 samples/sec Loss 2.6002 LearningRate 0.000457 Epoch: 15 Global Step: 324790 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:47:13,425-Speed 2497.64 samples/sec Loss 2.6331 LearningRate 0.000457 Epoch: 15 Global Step: 324800 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:47:21,633-Speed 2495.61 samples/sec Loss 2.6072 LearningRate 0.000457 Epoch: 15 Global Step: 324810 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:47:29,835-Speed 2497.16 samples/sec Loss 2.6214 LearningRate 0.000457 Epoch: 15 Global Step: 324820 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:47:38,033-Speed 2498.57 samples/sec Loss 2.6312 LearningRate 0.000457 Epoch: 15 Global Step: 324830 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:47:46,241-Speed 2495.55 samples/sec Loss 2.6089 LearningRate 0.000457 Epoch: 15 Global Step: 324840 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 
16:47:54,402-Speed 2509.90 samples/sec Loss 2.6155 LearningRate 0.000457 Epoch: 15 Global Step: 324850 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:48:02,612-Speed 2494.85 samples/sec Loss 2.5551 LearningRate 0.000457 Epoch: 15 Global Step: 324860 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:48:10,816-Speed 2496.93 samples/sec Loss 2.6428 LearningRate 0.000457 Epoch: 15 Global Step: 324870 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:48:19,019-Speed 2497.02 samples/sec Loss 2.5926 LearningRate 0.000457 Epoch: 15 Global Step: 324880 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:48:27,221-Speed 2497.41 samples/sec Loss 2.5472 LearningRate 0.000457 Epoch: 15 Global Step: 324890 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:48:35,422-Speed 2497.40 samples/sec Loss 2.6196 LearningRate 0.000457 Epoch: 15 Global Step: 324900 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:48:43,569-Speed 2514.18 samples/sec Loss 2.6222 LearningRate 0.000457 Epoch: 15 Global Step: 324910 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:48:51,773-Speed 2496.73 samples/sec Loss 2.6053 LearningRate 0.000457 Epoch: 15 Global Step: 324920 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:48:59,988-Speed 2493.64 samples/sec Loss 2.6047 LearningRate 0.000457 Epoch: 15 Global Step: 324930 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:49:08,199-Speed 2494.32 samples/sec Loss 2.6424 LearningRate 0.000457 Epoch: 15 Global Step: 324940 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:49:16,398-Speed 2498.38 samples/sec Loss 2.6104 LearningRate 0.000457 Epoch: 15 Global Step: 324950 Fp16 Grad Scale: 32768 Required: 116 hours Training: 2022-07-08 16:49:24,603-Speed 2496.36 samples/sec Loss 2.5937 LearningRate 0.000457 Epoch: 15 Global Step: 324960 Fp16 Grad Scale: 32768 Required: 116 hours Training: 
2022-07-08 16:49:32,751-Speed 2513.85 samples/sec Loss 2.5458 LearningRate 0.000457 Epoch: 15 Global Step: 324970 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:49:40,955-Speed 2496.74 samples/sec Loss 2.5464 LearningRate 0.000457 Epoch: 15 Global Step: 324980 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:49:49,157-Speed 2497.47 samples/sec Loss 2.6012 LearningRate 0.000457 Epoch: 15 Global Step: 324990 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:49:57,356-Speed 2498.08 samples/sec Loss 2.5748 LearningRate 0.000457 Epoch: 15 Global Step: 325000 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:50:05,561-Speed 2496.40 samples/sec Loss 2.5850 LearningRate 0.000457 Epoch: 15 Global Step: 325010 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:50:13,765-Speed 2496.79 samples/sec Loss 2.6595 LearningRate 0.000457 Epoch: 15 Global Step: 325020 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:50:21,915-Speed 2513.46 samples/sec Loss 2.6127 LearningRate 0.000457 Epoch: 15 Global Step: 325030 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:50:30,124-Speed 2495.36 samples/sec Loss 2.6513 LearningRate 0.000457 Epoch: 15 Global Step: 325040 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:50:38,324-Speed 2497.74 samples/sec Loss 2.6675 LearningRate 0.000457 Epoch: 15 Global Step: 325050 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:50:46,529-Speed 2496.43 samples/sec Loss 2.6043 LearningRate 0.000457 Epoch: 15 Global Step: 325060 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:50:54,728-Speed 2498.40 samples/sec Loss 2.6194 LearningRate 0.000457 Epoch: 15 Global Step: 325070 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:51:02,929-Speed 2497.99 samples/sec Loss 2.6168 LearningRate 0.000457 Epoch: 15 Global Step: 325080 Fp16 Grad Scale: 32768 Required: 115 hours 
Training: 2022-07-08 16:51:11,089-Speed 2510.12 samples/sec Loss 2.6600 LearningRate 0.000457 Epoch: 15 Global Step: 325090 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:51:19,286-Speed 2498.95 samples/sec Loss 2.6113 LearningRate 0.000457 Epoch: 15 Global Step: 325100 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:51:27,490-Speed 2496.85 samples/sec Loss 2.6460 LearningRate 0.000457 Epoch: 15 Global Step: 325110 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:51:35,692-Speed 2497.44 samples/sec Loss 2.5573 LearningRate 0.000456 Epoch: 15 Global Step: 325120 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:51:43,891-Speed 2498.13 samples/sec Loss 2.6363 LearningRate 0.000456 Epoch: 15 Global Step: 325130 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:51:52,092-Speed 2498.03 samples/sec Loss 2.6295 LearningRate 0.000456 Epoch: 15 Global Step: 325140 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:52:00,234-Speed 2515.79 samples/sec Loss 2.6933 LearningRate 0.000456 Epoch: 15 Global Step: 325150 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:52:08,431-Speed 2498.57 samples/sec Loss 2.5755 LearningRate 0.000456 Epoch: 15 Global Step: 325160 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:52:16,630-Speed 2498.27 samples/sec Loss 2.6308 LearningRate 0.000456 Epoch: 15 Global Step: 325170 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:52:24,832-Speed 2497.37 samples/sec Loss 2.6116 LearningRate 0.000456 Epoch: 15 Global Step: 325180 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:52:33,036-Speed 2497.00 samples/sec Loss 2.6381 LearningRate 0.000456 Epoch: 15 Global Step: 325190 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:52:41,235-Speed 2498.10 samples/sec Loss 2.6229 LearningRate 0.000456 Epoch: 15 Global Step: 325200 Fp16 Grad Scale: 32768 Required: 115 
hours Training: 2022-07-08 16:52:49,385-Speed 2513.33 samples/sec Loss 2.6404 LearningRate 0.000456 Epoch: 15 Global Step: 325210 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:52:57,588-Speed 2497.25 samples/sec Loss 2.6438 LearningRate 0.000456 Epoch: 15 Global Step: 325220 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:53:05,787-Speed 2498.29 samples/sec Loss 2.6123 LearningRate 0.000456 Epoch: 15 Global Step: 325230 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:53:13,986-Speed 2498.42 samples/sec Loss 2.6303 LearningRate 0.000456 Epoch: 15 Global Step: 325240 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:53:22,187-Speed 2497.90 samples/sec Loss 2.6435 LearningRate 0.000456 Epoch: 15 Global Step: 325250 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:53:30,394-Speed 2495.92 samples/sec Loss 2.5798 LearningRate 0.000456 Epoch: 15 Global Step: 325260 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:53:38,544-Speed 2513.02 samples/sec Loss 2.6202 LearningRate 0.000456 Epoch: 15 Global Step: 325270 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:53:46,747-Speed 2497.41 samples/sec Loss 2.6155 LearningRate 0.000456 Epoch: 15 Global Step: 325280 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:53:54,947-Speed 2497.83 samples/sec Loss 2.6505 LearningRate 0.000456 Epoch: 15 Global Step: 325290 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:54:03,144-Speed 2499.10 samples/sec Loss 2.6223 LearningRate 0.000456 Epoch: 15 Global Step: 325300 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:54:11,350-Speed 2496.14 samples/sec Loss 2.5930 LearningRate 0.000456 Epoch: 15 Global Step: 325310 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:54:19,555-Speed 2496.38 samples/sec Loss 2.6051 LearningRate 0.000456 Epoch: 15 Global Step: 325320 Fp16 Grad Scale: 32768 Required: 
115 hours Training: 2022-07-08 16:54:27,713-Speed 2510.92 samples/sec Loss 2.5521 LearningRate 0.000456 Epoch: 15 Global Step: 325330 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:54:35,914-Speed 2497.70 samples/sec Loss 2.5899 LearningRate 0.000456 Epoch: 15 Global Step: 325340 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:54:44,113-Speed 2498.36 samples/sec Loss 2.5702 LearningRate 0.000456 Epoch: 15 Global Step: 325350 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:54:52,310-Speed 2498.83 samples/sec Loss 2.6683 LearningRate 0.000456 Epoch: 15 Global Step: 325360 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:55:00,509-Speed 2498.69 samples/sec Loss 2.6027 LearningRate 0.000456 Epoch: 15 Global Step: 325370 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:55:08,711-Speed 2497.33 samples/sec Loss 2.5848 LearningRate 0.000456 Epoch: 15 Global Step: 325380 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:55:16,860-Speed 2513.73 samples/sec Loss 2.4978 LearningRate 0.000456 Epoch: 15 Global Step: 325390 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:55:25,065-Speed 2496.29 samples/sec Loss 2.5490 LearningRate 0.000456 Epoch: 15 Global Step: 325400 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:55:33,264-Speed 2498.07 samples/sec Loss 2.5843 LearningRate 0.000456 Epoch: 15 Global Step: 325410 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:55:41,465-Speed 2497.82 samples/sec Loss 2.5369 LearningRate 0.000456 Epoch: 15 Global Step: 325420 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:55:49,664-Speed 2498.40 samples/sec Loss 2.6042 LearningRate 0.000456 Epoch: 15 Global Step: 325430 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:55:57,863-Speed 2498.21 samples/sec Loss 2.5727 LearningRate 0.000456 Epoch: 15 Global Step: 325440 Fp16 Grad Scale: 32768 
Required: 115 hours Training: 2022-07-08 16:56:06,008-Speed 2515.02 samples/sec Loss 2.5977 LearningRate 0.000456 Epoch: 15 Global Step: 325450 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:56:14,211-Speed 2497.47 samples/sec Loss 2.5835 LearningRate 0.000456 Epoch: 15 Global Step: 325460 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:56:22,409-Speed 2498.27 samples/sec Loss 2.6326 LearningRate 0.000456 Epoch: 15 Global Step: 325470 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:56:30,609-Speed 2498.02 samples/sec Loss 2.6830 LearningRate 0.000456 Epoch: 15 Global Step: 325480 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:56:38,824-Speed 2493.22 samples/sec Loss 2.6715 LearningRate 0.000456 Epoch: 15 Global Step: 325490 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:56:47,032-Speed 2495.70 samples/sec Loss 2.5705 LearningRate 0.000456 Epoch: 15 Global Step: 325500 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:56:55,180-Speed 2513.93 samples/sec Loss 2.6608 LearningRate 0.000456 Epoch: 15 Global Step: 325510 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:57:03,386-Speed 2496.01 samples/sec Loss 2.6412 LearningRate 0.000456 Epoch: 15 Global Step: 325520 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:57:11,587-Speed 2497.67 samples/sec Loss 2.5830 LearningRate 0.000456 Epoch: 15 Global Step: 325530 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:57:19,784-Speed 2498.91 samples/sec Loss 2.6216 LearningRate 0.000456 Epoch: 15 Global Step: 325540 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:57:27,982-Speed 2498.63 samples/sec Loss 2.5744 LearningRate 0.000456 Epoch: 15 Global Step: 325550 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:57:36,184-Speed 2497.48 samples/sec Loss 2.6122 LearningRate 0.000456 Epoch: 15 Global Step: 325560 Fp16 Grad Scale: 
32768 Required: 115 hours Training: 2022-07-08 16:57:44,337-Speed 2512.60 samples/sec Loss 2.6346 LearningRate 0.000456 Epoch: 15 Global Step: 325570 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:57:52,537-Speed 2497.56 samples/sec Loss 2.6725 LearningRate 0.000456 Epoch: 15 Global Step: 325580 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:58:00,740-Speed 2497.57 samples/sec Loss 2.6268 LearningRate 0.000456 Epoch: 15 Global Step: 325590 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:58:08,939-Speed 2498.18 samples/sec Loss 2.6368 LearningRate 0.000456 Epoch: 15 Global Step: 325600 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:58:17,150-Speed 2494.65 samples/sec Loss 2.6028 LearningRate 0.000456 Epoch: 15 Global Step: 325610 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:58:25,348-Speed 2498.35 samples/sec Loss 2.6216 LearningRate 0.000456 Epoch: 15 Global Step: 325620 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:58:33,493-Speed 2514.94 samples/sec Loss 2.5944 LearningRate 0.000456 Epoch: 15 Global Step: 325630 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:58:41,703-Speed 2494.91 samples/sec Loss 2.6194 LearningRate 0.000456 Epoch: 15 Global Step: 325640 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:58:49,904-Speed 2497.80 samples/sec Loss 2.6582 LearningRate 0.000456 Epoch: 15 Global Step: 325650 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:58:58,116-Speed 2494.24 samples/sec Loss 2.5904 LearningRate 0.000456 Epoch: 15 Global Step: 325660 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:59:06,314-Speed 2498.39 samples/sec Loss 2.6367 LearningRate 0.000456 Epoch: 15 Global Step: 325670 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:59:14,516-Speed 2497.54 samples/sec Loss 2.5896 LearningRate 0.000455 Epoch: 15 Global Step: 325680 Fp16 Grad 
Scale: 32768 Required: 115 hours Training: 2022-07-08 16:59:22,657-Speed 2516.13 samples/sec Loss 2.5812 LearningRate 0.000455 Epoch: 15 Global Step: 325690 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:59:30,857-Speed 2497.89 samples/sec Loss 2.5852 LearningRate 0.000455 Epoch: 15 Global Step: 325700 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:59:39,056-Speed 2498.57 samples/sec Loss 2.5862 LearningRate 0.000455 Epoch: 15 Global Step: 325710 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:59:47,260-Speed 2497.00 samples/sec Loss 2.6370 LearningRate 0.000455 Epoch: 15 Global Step: 325720 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 16:59:55,462-Speed 2497.45 samples/sec Loss 2.5810 LearningRate 0.000455 Epoch: 15 Global Step: 325730 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:00:03,664-Speed 2497.26 samples/sec Loss 2.7637 LearningRate 0.000455 Epoch: 15 Global Step: 325740 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:00:11,815-Speed 2512.92 samples/sec Loss 2.5955 LearningRate 0.000455 Epoch: 15 Global Step: 325750 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:00:20,021-Speed 2496.23 samples/sec Loss 2.5874 LearningRate 0.000455 Epoch: 15 Global Step: 325760 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:00:28,223-Speed 2497.25 samples/sec Loss 2.6691 LearningRate 0.000455 Epoch: 15 Global Step: 325770 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:00:36,437-Speed 2493.50 samples/sec Loss 2.6257 LearningRate 0.000455 Epoch: 15 Global Step: 325780 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:00:44,642-Speed 2496.36 samples/sec Loss 2.6673 LearningRate 0.000455 Epoch: 15 Global Step: 325790 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:00:52,844-Speed 2497.50 samples/sec Loss 2.6436 LearningRate 0.000455 Epoch: 15 Global Step: 325800 Fp16 
Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:01:00,997-Speed 2512.22 samples/sec Loss 2.5836 LearningRate 0.000455 Epoch: 15 Global Step: 325810 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:01:09,214-Speed 2492.73 samples/sec Loss 2.6362 LearningRate 0.000455 Epoch: 15 Global Step: 325820 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:01:17,419-Speed 2496.33 samples/sec Loss 2.6643 LearningRate 0.000455 Epoch: 15 Global Step: 325830 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:01:25,619-Speed 2498.39 samples/sec Loss 2.6932 LearningRate 0.000455 Epoch: 15 Global Step: 325840 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:01:33,826-Speed 2495.74 samples/sec Loss 2.7021 LearningRate 0.000455 Epoch: 15 Global Step: 325850 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:01:42,027-Speed 2497.56 samples/sec Loss 2.6542 LearningRate 0.000455 Epoch: 15 Global Step: 325860 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:01:50,171-Speed 2515.11 samples/sec Loss 2.6979 LearningRate 0.000455 Epoch: 15 Global Step: 325870 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:01:58,370-Speed 2498.43 samples/sec Loss 2.6399 LearningRate 0.000455 Epoch: 15 Global Step: 325880 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:02:06,573-Speed 2497.24 samples/sec Loss 2.6746 LearningRate 0.000455 Epoch: 15 Global Step: 325890 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:02:14,771-Speed 2498.40 samples/sec Loss 2.6551 LearningRate 0.000455 Epoch: 15 Global Step: 325900 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:02:22,926-Speed 2511.64 samples/sec Loss 2.6369 LearningRate 0.000455 Epoch: 15 Global Step: 325910 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:02:31,125-Speed 2498.55 samples/sec Loss 2.6639 LearningRate 0.000455 Epoch: 15 Global Step: 325920 
Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:02:39,277-Speed 2512.48 samples/sec Loss 2.6563 LearningRate 0.000455 Epoch: 15 Global Step: 325930 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:02:47,476-Speed 2498.31 samples/sec Loss 2.6744 LearningRate 0.000455 Epoch: 15 Global Step: 325940 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:02:55,675-Speed 2498.40 samples/sec Loss 2.6073 LearningRate 0.000455 Epoch: 15 Global Step: 325950 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:03:03,875-Speed 2497.66 samples/sec Loss 2.6192 LearningRate 0.000455 Epoch: 15 Global Step: 325960 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:03:12,074-Speed 2498.34 samples/sec Loss 2.5909 LearningRate 0.000455 Epoch: 15 Global Step: 325970 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:03:20,284-Speed 2494.81 samples/sec Loss 2.6199 LearningRate 0.000455 Epoch: 15 Global Step: 325980 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:03:28,426-Speed 2515.82 samples/sec Loss 2.6490 LearningRate 0.000455 Epoch: 15 Global Step: 325990 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:03:36,623-Speed 2498.88 samples/sec Loss 2.5299 LearningRate 0.000455 Epoch: 15 Global Step: 326000 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:03:44,820-Speed 2498.77 samples/sec Loss 2.6496 LearningRate 0.000455 Epoch: 15 Global Step: 326010 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:03:53,018-Speed 2498.96 samples/sec Loss 2.6050 LearningRate 0.000455 Epoch: 15 Global Step: 326020 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:04:01,215-Speed 2499.02 samples/sec Loss 2.6674 LearningRate 0.000455 Epoch: 15 Global Step: 326030 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:04:09,415-Speed 2497.71 samples/sec Loss 2.6727 LearningRate 0.000455 Epoch: 15 Global Step: 
326040 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:04:17,559-Speed 2515.35 samples/sec Loss 2.6512 LearningRate 0.000455 Epoch: 15 Global Step: 326050 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:04:25,755-Speed 2499.36 samples/sec Loss 2.6277 LearningRate 0.000455 Epoch: 15 Global Step: 326060 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:04:33,955-Speed 2498.22 samples/sec Loss 2.6557 LearningRate 0.000455 Epoch: 15 Global Step: 326070 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:04:42,154-Speed 2498.19 samples/sec Loss 2.6645 LearningRate 0.000455 Epoch: 15 Global Step: 326080 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:04:50,355-Speed 2497.59 samples/sec Loss 2.6419 LearningRate 0.000455 Epoch: 15 Global Step: 326090 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:04:58,563-Speed 2495.51 samples/sec Loss 2.6542 LearningRate 0.000455 Epoch: 15 Global Step: 326100 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:05:06,711-Speed 2513.97 samples/sec Loss 2.5955 LearningRate 0.000455 Epoch: 15 Global Step: 326110 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:05:14,908-Speed 2498.83 samples/sec Loss 2.6224 LearningRate 0.000455 Epoch: 15 Global Step: 326120 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:05:23,122-Speed 2493.73 samples/sec Loss 2.5755 LearningRate 0.000455 Epoch: 15 Global Step: 326130 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:05:31,329-Speed 2495.75 samples/sec Loss 2.5612 LearningRate 0.000455 Epoch: 15 Global Step: 326140 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:05:39,529-Speed 2498.06 samples/sec Loss 2.6653 LearningRate 0.000455 Epoch: 15 Global Step: 326150 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:05:47,738-Speed 2495.20 samples/sec Loss 2.5715 LearningRate 0.000455 Epoch: 15 Global 
Step: 326160 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:05:55,883-Speed 2514.82 samples/sec Loss 2.5475 LearningRate 0.000455 Epoch: 15 Global Step: 326170 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:06:04,099-Speed 2493.29 samples/sec Loss 2.5983 LearningRate 0.000455 Epoch: 15 Global Step: 326180 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:06:12,296-Speed 2498.69 samples/sec Loss 2.6261 LearningRate 0.000455 Epoch: 15 Global Step: 326190 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:06:20,497-Speed 2497.72 samples/sec Loss 2.6261 LearningRate 0.000455 Epoch: 15 Global Step: 326200 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:06:28,698-Speed 2497.38 samples/sec Loss 2.6077 LearningRate 0.000455 Epoch: 15 Global Step: 326210 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:06:36,902-Speed 2496.96 samples/sec Loss 2.6067 LearningRate 0.000455 Epoch: 15 Global Step: 326220 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:06:45,060-Speed 2510.73 samples/sec Loss 2.6458 LearningRate 0.000454 Epoch: 15 Global Step: 326230 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:06:53,258-Speed 2498.46 samples/sec Loss 2.6328 LearningRate 0.000454 Epoch: 15 Global Step: 326240 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:07:01,460-Speed 2497.31 samples/sec Loss 2.6300 LearningRate 0.000454 Epoch: 15 Global Step: 326250 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:07:09,674-Speed 2493.88 samples/sec Loss 2.6484 LearningRate 0.000454 Epoch: 15 Global Step: 326260 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:07:17,874-Speed 2497.65 samples/sec Loss 2.6018 LearningRate 0.000454 Epoch: 15 Global Step: 326270 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:07:26,075-Speed 2497.87 samples/sec Loss 2.7063 LearningRate 0.000454 Epoch: 15 
Global Step: 326280 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:07:34,220-Speed 2514.93 samples/sec Loss 2.5867 LearningRate 0.000454 Epoch: 15 Global Step: 326290 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:07:42,422-Speed 2497.27 samples/sec Loss 2.6523 LearningRate 0.000454 Epoch: 15 Global Step: 326300 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:07:50,622-Speed 2497.89 samples/sec Loss 2.6055 LearningRate 0.000454 Epoch: 15 Global Step: 326310 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:07:58,821-Speed 2498.29 samples/sec Loss 2.6476 LearningRate 0.000454 Epoch: 15 Global Step: 326320 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:08:07,018-Speed 2498.76 samples/sec Loss 2.6603 LearningRate 0.000454 Epoch: 15 Global Step: 326330 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:08:15,220-Speed 2497.44 samples/sec Loss 2.5873 LearningRate 0.000454 Epoch: 15 Global Step: 326340 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:08:23,367-Speed 2514.23 samples/sec Loss 2.6316 LearningRate 0.000454 Epoch: 15 Global Step: 326350 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:08:31,568-Speed 2497.41 samples/sec Loss 2.6308 LearningRate 0.000454 Epoch: 15 Global Step: 326360 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:08:39,768-Speed 2498.08 samples/sec Loss 2.6237 LearningRate 0.000454 Epoch: 15 Global Step: 326370 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:08:47,969-Speed 2498.23 samples/sec Loss 2.6228 LearningRate 0.000454 Epoch: 15 Global Step: 326380 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:08:56,171-Speed 2497.24 samples/sec Loss 2.6404 LearningRate 0.000454 Epoch: 15 Global Step: 326390 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:09:04,371-Speed 2497.99 samples/sec Loss 2.6167 LearningRate 0.000454 
Epoch: 15 Global Step: 326400 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:09:12,515-Speed 2515.06 samples/sec Loss 2.6535 LearningRate 0.000454 Epoch: 15 Global Step: 326410 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:09:20,717-Speed 2497.31 samples/sec Loss 2.6324 LearningRate 0.000454 Epoch: 15 Global Step: 326420 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:09:28,916-Speed 2498.35 samples/sec Loss 2.6114 LearningRate 0.000454 Epoch: 15 Global Step: 326430 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:09:37,117-Speed 2498.08 samples/sec Loss 2.6355 LearningRate 0.000454 Epoch: 15 Global Step: 326440 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:09:45,316-Speed 2498.15 samples/sec Loss 2.6424 LearningRate 0.000454 Epoch: 15 Global Step: 326450 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:09:53,515-Speed 2498.27 samples/sec Loss 2.6106 LearningRate 0.000454 Epoch: 15 Global Step: 326460 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:10:01,661-Speed 2514.63 samples/sec Loss 2.5899 LearningRate 0.000454 Epoch: 15 Global Step: 326470 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:10:09,864-Speed 2496.72 samples/sec Loss 2.6623 LearningRate 0.000454 Epoch: 15 Global Step: 326480 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:10:18,069-Speed 2496.47 samples/sec Loss 2.6141 LearningRate 0.000454 Epoch: 15 Global Step: 326490 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:10:26,266-Speed 2499.18 samples/sec Loss 2.5974 LearningRate 0.000454 Epoch: 15 Global Step: 326500 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:10:34,466-Speed 2497.81 samples/sec Loss 2.6012 LearningRate 0.000454 Epoch: 15 Global Step: 326510 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:10:42,676-Speed 2495.25 samples/sec Loss 2.6672 LearningRate 
0.000454 Epoch: 15 Global Step: 326520 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:10:50,821-Speed 2514.88 samples/sec Loss 2.6230 LearningRate 0.000454 Epoch: 15 Global Step: 326530 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:10:59,021-Speed 2498.06 samples/sec Loss 2.6279 LearningRate 0.000454 Epoch: 15 Global Step: 326540 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:11:07,226-Speed 2496.27 samples/sec Loss 2.6240 LearningRate 0.000454 Epoch: 15 Global Step: 326550 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:11:15,425-Speed 2498.27 samples/sec Loss 2.6927 LearningRate 0.000454 Epoch: 15 Global Step: 326560 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:11:23,635-Speed 2495.42 samples/sec Loss 2.6400 LearningRate 0.000454 Epoch: 15 Global Step: 326570 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:11:31,831-Speed 2499.28 samples/sec Loss 2.6444 LearningRate 0.000454 Epoch: 15 Global Step: 326580 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:11:39,980-Speed 2513.59 samples/sec Loss 2.5817 LearningRate 0.000454 Epoch: 15 Global Step: 326590 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:11:48,179-Speed 2498.11 samples/sec Loss 2.6091 LearningRate 0.000454 Epoch: 15 Global Step: 326600 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:11:56,381-Speed 2497.70 samples/sec Loss 2.6139 LearningRate 0.000454 Epoch: 15 Global Step: 326610 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:12:04,578-Speed 2498.69 samples/sec Loss 2.6431 LearningRate 0.000454 Epoch: 15 Global Step: 326620 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:12:12,774-Speed 2499.26 samples/sec Loss 2.5927 LearningRate 0.000454 Epoch: 15 Global Step: 326630 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:12:20,974-Speed 2498.12 samples/sec Loss 2.6198 
LearningRate 0.000454 Epoch: 15 Global Step: 326640 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:12:29,122-Speed 2514.13 samples/sec Loss 2.5649 LearningRate 0.000454 Epoch: 15 Global Step: 326650 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:12:37,338-Speed 2492.70 samples/sec Loss 2.6587 LearningRate 0.000454 Epoch: 15 Global Step: 326660 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:12:45,547-Speed 2495.07 samples/sec Loss 2.6336 LearningRate 0.000454 Epoch: 15 Global Step: 326670 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:12:53,765-Speed 2492.75 samples/sec Loss 2.6104 LearningRate 0.000454 Epoch: 15 Global Step: 326680 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:13:01,966-Speed 2497.74 samples/sec Loss 2.6425 LearningRate 0.000454 Epoch: 15 Global Step: 326690 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:13:10,170-Speed 2496.60 samples/sec Loss 2.6220 LearningRate 0.000454 Epoch: 15 Global Step: 326700 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:13:18,316-Speed 2514.67 samples/sec Loss 2.6089 LearningRate 0.000454 Epoch: 15 Global Step: 326710 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:13:26,514-Speed 2498.45 samples/sec Loss 2.5855 LearningRate 0.000454 Epoch: 15 Global Step: 326720 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:13:34,715-Speed 2497.64 samples/sec Loss 2.6002 LearningRate 0.000454 Epoch: 15 Global Step: 326730 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:13:42,946-Speed 2488.53 samples/sec Loss 2.5899 LearningRate 0.000454 Epoch: 15 Global Step: 326740 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:13:51,147-Speed 2497.85 samples/sec Loss 2.6080 LearningRate 0.000454 Epoch: 15 Global Step: 326750 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:13:59,351-Speed 2496.77 samples/sec Loss 
2.5942 LearningRate 0.000454 Epoch: 15 Global Step: 326760 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:14:07,506-Speed 2511.86 samples/sec Loss 2.5660 LearningRate 0.000454 Epoch: 15 Global Step: 326770 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:14:15,704-Speed 2498.51 samples/sec Loss 2.5779 LearningRate 0.000453 Epoch: 15 Global Step: 326780 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:14:23,905-Speed 2497.71 samples/sec Loss 2.6342 LearningRate 0.000453 Epoch: 15 Global Step: 326790 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:14:32,112-Speed 2495.79 samples/sec Loss 2.6529 LearningRate 0.000453 Epoch: 15 Global Step: 326800 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:14:40,313-Speed 2497.72 samples/sec Loss 2.6279 LearningRate 0.000453 Epoch: 15 Global Step: 326810 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:14:48,514-Speed 2497.70 samples/sec Loss 2.6231 LearningRate 0.000453 Epoch: 15 Global Step: 326820 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:14:56,664-Speed 2513.36 samples/sec Loss 2.5988 LearningRate 0.000453 Epoch: 15 Global Step: 326830 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:15:04,863-Speed 2498.23 samples/sec Loss 2.6589 LearningRate 0.000453 Epoch: 15 Global Step: 326840 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:15:13,073-Speed 2494.95 samples/sec Loss 2.6321 LearningRate 0.000453 Epoch: 15 Global Step: 326850 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:15:21,280-Speed 2495.64 samples/sec Loss 2.5641 LearningRate 0.000453 Epoch: 15 Global Step: 326860 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:15:29,480-Speed 2498.17 samples/sec Loss 2.5933 LearningRate 0.000453 Epoch: 15 Global Step: 326870 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:15:37,685-Speed 2496.19 samples/sec 
Loss 2.5446 LearningRate 0.000453 Epoch: 15 Global Step: 326880 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:15:45,831-Speed 2514.53 samples/sec Loss 2.6335 LearningRate 0.000453 Epoch: 15 Global Step: 326890 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:15:54,037-Speed 2496.19 samples/sec Loss 2.6090 LearningRate 0.000453 Epoch: 15 Global Step: 326900 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:16:02,240-Speed 2497.11 samples/sec Loss 2.5704 LearningRate 0.000453 Epoch: 15 Global Step: 326910 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:16:10,443-Speed 2496.91 samples/sec Loss 2.6256 LearningRate 0.000453 Epoch: 15 Global Step: 326920 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:16:18,644-Speed 2497.82 samples/sec Loss 2.6159 LearningRate 0.000453 Epoch: 15 Global Step: 326930 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:16:26,854-Speed 2494.73 samples/sec Loss 2.5932 LearningRate 0.000453 Epoch: 15 Global Step: 326940 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:16:35,008-Speed 2512.12 samples/sec Loss 2.6243 LearningRate 0.000453 Epoch: 15 Global Step: 326950 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:16:43,215-Speed 2495.99 samples/sec Loss 2.7046 LearningRate 0.000453 Epoch: 15 Global Step: 326960 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:16:51,420-Speed 2496.35 samples/sec Loss 2.6192 LearningRate 0.000453 Epoch: 15 Global Step: 326970 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:16:59,634-Speed 2493.76 samples/sec Loss 2.6381 LearningRate 0.000453 Epoch: 15 Global Step: 326980 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:17:07,836-Speed 2497.29 samples/sec Loss 2.6144 LearningRate 0.000453 Epoch: 15 Global Step: 326990 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:17:16,038-Speed 2497.55 
samples/sec Loss 2.6506 LearningRate 0.000453 Epoch: 15 Global Step: 327000 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:17:24,186-Speed 2513.79 samples/sec Loss 2.6020 LearningRate 0.000453 Epoch: 15 Global Step: 327010 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:17:32,384-Speed 2498.69 samples/sec Loss 2.6222 LearningRate 0.000453 Epoch: 15 Global Step: 327020 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:17:40,602-Speed 2492.61 samples/sec Loss 2.5885 LearningRate 0.000453 Epoch: 15 Global Step: 327030 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:17:48,801-Speed 2498.01 samples/sec Loss 2.6565 LearningRate 0.000453 Epoch: 15 Global Step: 327040 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:17:57,002-Speed 2497.79 samples/sec Loss 2.6516 LearningRate 0.000453 Epoch: 15 Global Step: 327050 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:18:05,209-Speed 2495.77 samples/sec Loss 2.6307 LearningRate 0.000453 Epoch: 15 Global Step: 327060 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:18:13,356-Speed 2514.29 samples/sec Loss 2.6014 LearningRate 0.000453 Epoch: 15 Global Step: 327070 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:18:21,571-Speed 2493.61 samples/sec Loss 2.6331 LearningRate 0.000453 Epoch: 15 Global Step: 327080 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:18:29,770-Speed 2498.22 samples/sec Loss 2.6537 LearningRate 0.000453 Epoch: 15 Global Step: 327090 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:18:37,966-Speed 2499.04 samples/sec Loss 2.6503 LearningRate 0.000453 Epoch: 15 Global Step: 327100 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:18:46,168-Speed 2497.59 samples/sec Loss 2.6439 LearningRate 0.000453 Epoch: 15 Global Step: 327110 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:18:54,368-Speed 
2498.00 samples/sec Loss 2.6443 LearningRate 0.000453 Epoch: 15 Global Step: 327120 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:19:02,511-Speed 2515.17 samples/sec Loss 2.5663 LearningRate 0.000453 Epoch: 15 Global Step: 327130 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:19:10,715-Speed 2497.05 samples/sec Loss 2.6340 LearningRate 0.000453 Epoch: 15 Global Step: 327140 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:19:18,912-Speed 2498.80 samples/sec Loss 2.6596 LearningRate 0.000453 Epoch: 15 Global Step: 327150 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:19:27,114-Speed 2497.31 samples/sec Loss 2.6551 LearningRate 0.000453 Epoch: 15 Global Step: 327160 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:19:35,270-Speed 2511.30 samples/sec Loss 2.6630 LearningRate 0.000453 Epoch: 15 Global Step: 327170 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:19:43,479-Speed 2495.32 samples/sec Loss 2.6245 LearningRate 0.000453 Epoch: 15 Global Step: 327180 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:19:51,626-Speed 2514.61 samples/sec Loss 2.6153 LearningRate 0.000453 Epoch: 15 Global Step: 327190 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:19:59,832-Speed 2496.02 samples/sec Loss 2.6372 LearningRate 0.000453 Epoch: 15 Global Step: 327200 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:20:08,032-Speed 2498.17 samples/sec Loss 2.6406 LearningRate 0.000453 Epoch: 15 Global Step: 327210 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:20:16,227-Speed 2499.24 samples/sec Loss 2.6829 LearningRate 0.000453 Epoch: 15 Global Step: 327220 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:20:24,428-Speed 2497.78 samples/sec Loss 2.6209 LearningRate 0.000453 Epoch: 15 Global Step: 327230 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 
17:20:32,637-Speed 2495.48 samples/sec Loss 2.6168 LearningRate 0.000453 Epoch: 15 Global Step: 327240 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:20:40,781-Speed 2515.17 samples/sec Loss 2.5688 LearningRate 0.000453 Epoch: 15 Global Step: 327250 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:20:48,996-Speed 2493.35 samples/sec Loss 2.6416 LearningRate 0.000453 Epoch: 15 Global Step: 327260 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:20:57,196-Speed 2497.94 samples/sec Loss 2.6189 LearningRate 0.000453 Epoch: 15 Global Step: 327270 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:21:05,401-Speed 2496.55 samples/sec Loss 2.5988 LearningRate 0.000453 Epoch: 15 Global Step: 327280 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:21:13,599-Speed 2498.59 samples/sec Loss 2.6312 LearningRate 0.000453 Epoch: 15 Global Step: 327290 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:21:21,799-Speed 2498.03 samples/sec Loss 2.5901 LearningRate 0.000453 Epoch: 15 Global Step: 327300 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:21:29,947-Speed 2513.85 samples/sec Loss 2.6067 LearningRate 0.000453 Epoch: 15 Global Step: 327310 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:21:38,143-Speed 2499.13 samples/sec Loss 2.6459 LearningRate 0.000453 Epoch: 15 Global Step: 327320 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:21:46,342-Speed 2498.19 samples/sec Loss 2.6979 LearningRate 0.000453 Epoch: 15 Global Step: 327330 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:21:54,544-Speed 2497.93 samples/sec Loss 2.6273 LearningRate 0.000452 Epoch: 15 Global Step: 327340 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:22:02,741-Speed 2498.74 samples/sec Loss 2.5803 LearningRate 0.000452 Epoch: 15 Global Step: 327350 Fp16 Grad Scale: 32768 Required: 115 hours Training: 
2022-07-08 17:22:10,951-Speed 2495.29 samples/sec Loss 2.6251 LearningRate 0.000452 Epoch: 15 Global Step: 327360 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:22:19,099-Speed 2513.76 samples/sec Loss 2.6418 LearningRate 0.000452 Epoch: 15 Global Step: 327370 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:22:27,304-Speed 2496.67 samples/sec Loss 2.5751 LearningRate 0.000452 Epoch: 15 Global Step: 327380 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:22:35,502-Speed 2498.50 samples/sec Loss 2.5339 LearningRate 0.000452 Epoch: 15 Global Step: 327390 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:22:43,704-Speed 2497.13 samples/sec Loss 2.5685 LearningRate 0.000452 Epoch: 15 Global Step: 327400 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:22:51,914-Speed 2495.08 samples/sec Loss 2.6346 LearningRate 0.000452 Epoch: 15 Global Step: 327410 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:23:00,113-Speed 2498.30 samples/sec Loss 2.5758 LearningRate 0.000452 Epoch: 15 Global Step: 327420 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:23:08,258-Speed 2514.84 samples/sec Loss 2.5520 LearningRate 0.000452 Epoch: 15 Global Step: 327430 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:23:16,461-Speed 2497.05 samples/sec Loss 2.6839 LearningRate 0.000452 Epoch: 15 Global Step: 327440 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:23:24,670-Speed 2495.28 samples/sec Loss 2.5752 LearningRate 0.000452 Epoch: 15 Global Step: 327450 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:23:32,879-Speed 2495.26 samples/sec Loss 2.6093 LearningRate 0.000452 Epoch: 15 Global Step: 327460 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:23:41,094-Speed 2493.45 samples/sec Loss 2.5636 LearningRate 0.000452 Epoch: 15 Global Step: 327470 Fp16 Grad Scale: 32768 Required: 115 hours 
Training: 2022-07-08 17:23:49,298-Speed 2496.72 samples/sec Loss 2.6096 LearningRate 0.000452 Epoch: 15 Global Step: 327480 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:23:57,452-Speed 2511.97 samples/sec Loss 2.5624 LearningRate 0.000452 Epoch: 15 Global Step: 327490 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:24:05,650-Speed 2498.51 samples/sec Loss 2.5908 LearningRate 0.000452 Epoch: 15 Global Step: 327500 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:24:13,855-Speed 2496.80 samples/sec Loss 2.5828 LearningRate 0.000452 Epoch: 15 Global Step: 327510 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:24:22,057-Speed 2497.37 samples/sec Loss 2.5569 LearningRate 0.000452 Epoch: 15 Global Step: 327520 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:24:30,261-Speed 2497.02 samples/sec Loss 2.6249 LearningRate 0.000452 Epoch: 15 Global Step: 327530 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:24:38,465-Speed 2496.69 samples/sec Loss 2.5757 LearningRate 0.000452 Epoch: 15 Global Step: 327540 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:24:46,620-Speed 2511.71 samples/sec Loss 2.5568 LearningRate 0.000452 Epoch: 15 Global Step: 327550 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:24:54,822-Speed 2497.26 samples/sec Loss 2.6666 LearningRate 0.000452 Epoch: 15 Global Step: 327560 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:25:03,024-Speed 2497.38 samples/sec Loss 2.6017 LearningRate 0.000452 Epoch: 15 Global Step: 327570 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:25:11,227-Speed 2497.36 samples/sec Loss 2.5738 LearningRate 0.000452 Epoch: 15 Global Step: 327580 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:25:19,431-Speed 2496.91 samples/sec Loss 2.5920 LearningRate 0.000452 Epoch: 15 Global Step: 327590 Fp16 Grad Scale: 32768 Required: 115 
hours Training: 2022-07-08 17:25:27,633-Speed 2497.06 samples/sec Loss 2.5597 LearningRate 0.000452 Epoch: 15 Global Step: 327600 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:25:35,781-Speed 2514.03 samples/sec Loss 2.6203 LearningRate 0.000452 Epoch: 15 Global Step: 327610 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:25:43,985-Speed 2496.94 samples/sec Loss 2.6375 LearningRate 0.000452 Epoch: 15 Global Step: 327620 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:25:52,188-Speed 2497.01 samples/sec Loss 2.6211 LearningRate 0.000452 Epoch: 15 Global Step: 327630 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:26:00,401-Speed 2494.31 samples/sec Loss 2.5804 LearningRate 0.000452 Epoch: 15 Global Step: 327640 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:26:08,604-Speed 2497.20 samples/sec Loss 2.6379 LearningRate 0.000452 Epoch: 15 Global Step: 327650 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:26:16,805-Speed 2497.55 samples/sec Loss 2.6313 LearningRate 0.000452 Epoch: 15 Global Step: 327660 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:26:24,965-Speed 2510.19 samples/sec Loss 2.6055 LearningRate 0.000452 Epoch: 15 Global Step: 327670 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:26:33,179-Speed 2493.62 samples/sec Loss 2.6487 LearningRate 0.000452 Epoch: 15 Global Step: 327680 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:26:41,380-Speed 2497.82 samples/sec Loss 2.6389 LearningRate 0.000452 Epoch: 15 Global Step: 327690 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:26:49,583-Speed 2496.98 samples/sec Loss 2.6192 LearningRate 0.000452 Epoch: 15 Global Step: 327700 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:26:57,788-Speed 2496.58 samples/sec Loss 2.6346 LearningRate 0.000452 Epoch: 15 Global Step: 327710 Fp16 Grad Scale: 32768 Required: 
115 hours Training: 2022-07-08 17:27:05,986-Speed 2498.59 samples/sec Loss 2.6126 LearningRate 0.000452 Epoch: 15 Global Step: 327720 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:27:14,131-Speed 2515.01 samples/sec Loss 2.5576 LearningRate 0.000452 Epoch: 15 Global Step: 327730 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:27:22,329-Speed 2498.50 samples/sec Loss 2.5736 LearningRate 0.000452 Epoch: 15 Global Step: 327740 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:27:30,527-Speed 2498.90 samples/sec Loss 2.6599 LearningRate 0.000452 Epoch: 15 Global Step: 327750 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:27:38,732-Speed 2496.45 samples/sec Loss 2.5857 LearningRate 0.000452 Epoch: 15 Global Step: 327760 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:27:46,929-Speed 2498.95 samples/sec Loss 2.5673 LearningRate 0.000452 Epoch: 15 Global Step: 327770 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:27:55,129-Speed 2498.35 samples/sec Loss 2.5897 LearningRate 0.000452 Epoch: 15 Global Step: 327780 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:28:03,279-Speed 2513.17 samples/sec Loss 2.6706 LearningRate 0.000452 Epoch: 15 Global Step: 327790 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:28:11,484-Speed 2496.50 samples/sec Loss 2.6360 LearningRate 0.000452 Epoch: 15 Global Step: 327800 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:28:19,686-Speed 2497.53 samples/sec Loss 2.5635 LearningRate 0.000452 Epoch: 15 Global Step: 327810 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:28:27,887-Speed 2497.66 samples/sec Loss 2.5874 LearningRate 0.000452 Epoch: 15 Global Step: 327820 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:28:36,088-Speed 2497.89 samples/sec Loss 2.6217 LearningRate 0.000452 Epoch: 15 Global Step: 327830 Fp16 Grad Scale: 32768 
Required: 115 hours Training: 2022-07-08 17:28:44,296-Speed 2495.39 samples/sec Loss 2.5981 LearningRate 0.000452 Epoch: 15 Global Step: 327840 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:28:52,446-Speed 2513.30 samples/sec Loss 2.5778 LearningRate 0.000452 Epoch: 15 Global Step: 327850 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:29:00,656-Speed 2495.07 samples/sec Loss 2.5960 LearningRate 0.000452 Epoch: 15 Global Step: 327860 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:29:08,863-Speed 2495.68 samples/sec Loss 2.5645 LearningRate 0.000452 Epoch: 15 Global Step: 327870 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:29:17,062-Speed 2498.22 samples/sec Loss 2.6172 LearningRate 0.000452 Epoch: 15 Global Step: 327880 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:29:25,267-Speed 2496.66 samples/sec Loss 2.6044 LearningRate 0.000451 Epoch: 15 Global Step: 327890 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:29:33,466-Speed 2498.20 samples/sec Loss 2.5934 LearningRate 0.000451 Epoch: 15 Global Step: 327900 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:29:41,630-Speed 2509.24 samples/sec Loss 2.6103 LearningRate 0.000451 Epoch: 15 Global Step: 327910 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:29:49,831-Speed 2497.66 samples/sec Loss 2.6185 LearningRate 0.000451 Epoch: 15 Global Step: 327920 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:29:58,031-Speed 2497.91 samples/sec Loss 2.6296 LearningRate 0.000451 Epoch: 15 Global Step: 327930 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:30:06,234-Speed 2497.16 samples/sec Loss 2.6212 LearningRate 0.000451 Epoch: 15 Global Step: 327940 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:30:14,435-Speed 2497.67 samples/sec Loss 2.6749 LearningRate 0.000451 Epoch: 15 Global Step: 327950 Fp16 Grad Scale: 
32768 Required: 115 hours Training: 2022-07-08 17:30:22,645-Speed 2494.69 samples/sec Loss 2.7390 LearningRate 0.000451 Epoch: 15 Global Step: 327960 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:30:30,819-Speed 2505.93 samples/sec Loss 2.6005 LearningRate 0.000451 Epoch: 15 Global Step: 327970 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:30:39,052-Speed 2487.90 samples/sec Loss 2.6355 LearningRate 0.000451 Epoch: 15 Global Step: 327980 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:30:47,254-Speed 2497.41 samples/sec Loss 2.5461 LearningRate 0.000451 Epoch: 15 Global Step: 327990 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:30:55,454-Speed 2497.82 samples/sec Loss 2.5602 LearningRate 0.000451 Epoch: 15 Global Step: 328000 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:31:03,658-Speed 2496.82 samples/sec Loss 2.5612 LearningRate 0.000451 Epoch: 15 Global Step: 328010 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:31:11,858-Speed 2497.81 samples/sec Loss 2.5665 LearningRate 0.000451 Epoch: 15 Global Step: 328020 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:31:20,016-Speed 2511.25 samples/sec Loss 2.5749 LearningRate 0.000451 Epoch: 15 Global Step: 328030 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:31:28,218-Speed 2497.14 samples/sec Loss 2.5965 LearningRate 0.000451 Epoch: 15 Global Step: 328040 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:31:36,416-Speed 2498.57 samples/sec Loss 2.5662 LearningRate 0.000451 Epoch: 15 Global Step: 328050 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:31:44,614-Speed 2498.26 samples/sec Loss 2.5649 LearningRate 0.000451 Epoch: 15 Global Step: 328060 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:31:52,824-Speed 2495.14 samples/sec Loss 2.5784 LearningRate 0.000451 Epoch: 15 Global Step: 328070 Fp16 Grad 
Scale: 32768 Required: 115 hours Training: 2022-07-08 17:32:01,022-Speed 2498.69 samples/sec Loss 2.5590 LearningRate 0.000451 Epoch: 15 Global Step: 328080 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:32:09,171-Speed 2513.31 samples/sec Loss 2.5855 LearningRate 0.000451 Epoch: 15 Global Step: 328090 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:32:17,371-Speed 2498.14 samples/sec Loss 2.5467 LearningRate 0.000451 Epoch: 15 Global Step: 328100 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:32:25,571-Speed 2498.10 samples/sec Loss 2.6159 LearningRate 0.000451 Epoch: 15 Global Step: 328110 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:32:33,768-Speed 2498.70 samples/sec Loss 2.6224 LearningRate 0.000451 Epoch: 15 Global Step: 328120 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:32:41,968-Speed 2497.96 samples/sec Loss 2.5977 LearningRate 0.000451 Epoch: 15 Global Step: 328130 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:32:50,171-Speed 2497.31 samples/sec Loss 2.6230 LearningRate 0.000451 Epoch: 15 Global Step: 328140 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:32:58,328-Speed 2511.19 samples/sec Loss 2.6124 LearningRate 0.000451 Epoch: 15 Global Step: 328150 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:33:06,525-Speed 2498.86 samples/sec Loss 2.6063 LearningRate 0.000451 Epoch: 15 Global Step: 328160 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:33:14,726-Speed 2497.60 samples/sec Loss 2.6161 LearningRate 0.000451 Epoch: 15 Global Step: 328170 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:33:22,926-Speed 2497.75 samples/sec Loss 2.5673 LearningRate 0.000451 Epoch: 15 Global Step: 328180 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:33:31,125-Speed 2498.54 samples/sec Loss 2.5818 LearningRate 0.000451 Epoch: 15 Global Step: 328190 Fp16 
Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:33:39,328-Speed 2497.02 samples/sec Loss 2.5787 LearningRate 0.000451 Epoch: 15 Global Step: 328200 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:33:47,487-Speed 2510.55 samples/sec Loss 2.5639 LearningRate 0.000451 Epoch: 15 Global Step: 328210 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:33:55,687-Speed 2497.86 samples/sec Loss 2.6012 LearningRate 0.000451 Epoch: 15 Global Step: 328220 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:34:03,884-Speed 2499.22 samples/sec Loss 2.6175 LearningRate 0.000451 Epoch: 15 Global Step: 328230 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:34:12,081-Speed 2498.65 samples/sec Loss 2.5907 LearningRate 0.000451 Epoch: 15 Global Step: 328240 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:34:20,278-Speed 2498.83 samples/sec Loss 2.5775 LearningRate 0.000451 Epoch: 15 Global Step: 328250 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:34:28,486-Speed 2495.89 samples/sec Loss 2.5855 LearningRate 0.000451 Epoch: 15 Global Step: 328260 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:34:36,628-Speed 2515.68 samples/sec Loss 2.5726 LearningRate 0.000451 Epoch: 15 Global Step: 328270 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:34:44,833-Speed 2496.19 samples/sec Loss 2.5186 LearningRate 0.000451 Epoch: 15 Global Step: 328280 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:34:53,031-Speed 2498.61 samples/sec Loss 2.6113 LearningRate 0.000451 Epoch: 15 Global Step: 328290 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:35:01,239-Speed 2495.71 samples/sec Loss 2.6107 LearningRate 0.000451 Epoch: 15 Global Step: 328300 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:35:09,436-Speed 2498.84 samples/sec Loss 2.5755 LearningRate 0.000451 Epoch: 15 Global Step: 328310 
Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:35:17,637-Speed 2497.70 samples/sec Loss 2.5942 LearningRate 0.000451 Epoch: 15 Global Step: 328320 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:35:25,784-Speed 2514.11 samples/sec Loss 2.5642 LearningRate 0.000451 Epoch: 15 Global Step: 328330 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:35:33,983-Speed 2498.36 samples/sec Loss 2.6180 LearningRate 0.000451 Epoch: 15 Global Step: 328340 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:35:42,182-Speed 2498.25 samples/sec Loss 2.6035 LearningRate 0.000451 Epoch: 15 Global Step: 328350 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:35:50,387-Speed 2496.41 samples/sec Loss 2.6276 LearningRate 0.000451 Epoch: 15 Global Step: 328360 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:35:58,586-Speed 2498.37 samples/sec Loss 2.6388 LearningRate 0.000451 Epoch: 15 Global Step: 328370 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:36:06,786-Speed 2498.12 samples/sec Loss 2.6039 LearningRate 0.000451 Epoch: 15 Global Step: 328380 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:36:14,931-Speed 2514.62 samples/sec Loss 2.6160 LearningRate 0.000451 Epoch: 15 Global Step: 328390 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:36:23,149-Speed 2492.73 samples/sec Loss 2.6174 LearningRate 0.000451 Epoch: 15 Global Step: 328400 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:36:31,351-Speed 2497.40 samples/sec Loss 2.5865 LearningRate 0.000451 Epoch: 15 Global Step: 328410 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:36:39,551-Speed 2497.93 samples/sec Loss 2.6396 LearningRate 0.000451 Epoch: 15 Global Step: 328420 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:36:47,752-Speed 2497.76 samples/sec Loss 2.5763 LearningRate 0.000451 Epoch: 15 Global Step: 
328430 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:36:55,955-Speed 2496.69 samples/sec Loss 2.6745 LearningRate 0.000451 Epoch: 15 Global Step: 328440 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:37:04,099-Speed 2515.40 samples/sec Loss 2.6224 LearningRate 0.000450 Epoch: 15 Global Step: 328450 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:37:12,324-Speed 2490.42 samples/sec Loss 2.6513 LearningRate 0.000450 Epoch: 15 Global Step: 328460 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:37:20,538-Speed 2493.63 samples/sec Loss 2.6754 LearningRate 0.000450 Epoch: 15 Global Step: 328470 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:37:28,745-Speed 2495.85 samples/sec Loss 2.6814 LearningRate 0.000450 Epoch: 15 Global Step: 328480 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:37:36,956-Speed 2494.85 samples/sec Loss 2.6557 LearningRate 0.000450 Epoch: 15 Global Step: 328490 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:37:45,155-Speed 2498.17 samples/sec Loss 2.6615 LearningRate 0.000450 Epoch: 15 Global Step: 328500 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:37:53,305-Speed 2513.55 samples/sec Loss 2.6072 LearningRate 0.000450 Epoch: 15 Global Step: 328510 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:38:01,504-Speed 2498.11 samples/sec Loss 2.6427 LearningRate 0.000450 Epoch: 15 Global Step: 328520 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:38:09,707-Speed 2497.92 samples/sec Loss 2.5849 LearningRate 0.000450 Epoch: 15 Global Step: 328530 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:38:17,910-Speed 2496.99 samples/sec Loss 2.6443 LearningRate 0.000450 Epoch: 15 Global Step: 328540 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:38:26,109-Speed 2498.31 samples/sec Loss 2.6118 LearningRate 0.000450 Epoch: 15 Global 
Step: 328550 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:38:34,310-Speed 2498.15 samples/sec Loss 2.6011 LearningRate 0.000450 Epoch: 15 Global Step: 328560 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:38:42,455-Speed 2515.29 samples/sec Loss 2.6081 LearningRate 0.000450 Epoch: 15 Global Step: 328570 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:38:50,655-Speed 2498.03 samples/sec Loss 2.6022 LearningRate 0.000450 Epoch: 15 Global Step: 328580 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:38:58,870-Speed 2493.67 samples/sec Loss 2.6351 LearningRate 0.000450 Epoch: 15 Global Step: 328590 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:39:07,068-Speed 2498.70 samples/sec Loss 2.5714 LearningRate 0.000450 Epoch: 15 Global Step: 328600 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:39:15,270-Speed 2497.22 samples/sec Loss 2.6199 LearningRate 0.000450 Epoch: 15 Global Step: 328610 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:39:23,469-Speed 2498.44 samples/sec Loss 2.5325 LearningRate 0.000450 Epoch: 15 Global Step: 328620 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:39:31,613-Speed 2515.04 samples/sec Loss 2.6447 LearningRate 0.000450 Epoch: 15 Global Step: 328630 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:39:39,811-Speed 2498.73 samples/sec Loss 2.5940 LearningRate 0.000450 Epoch: 15 Global Step: 328640 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:39:48,011-Speed 2497.85 samples/sec Loss 2.5517 LearningRate 0.000450 Epoch: 15 Global Step: 328650 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:39:56,218-Speed 2495.76 samples/sec Loss 2.6017 LearningRate 0.000450 Epoch: 15 Global Step: 328660 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:40:04,415-Speed 2498.85 samples/sec Loss 2.6038 LearningRate 0.000450 Epoch: 15 
Global Step: 328670 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:40:12,616-Speed 2498.11 samples/sec Loss 2.6112 LearningRate 0.000450 Epoch: 15 Global Step: 328680 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:40:20,777-Speed 2509.80 samples/sec Loss 2.6160 LearningRate 0.000450 Epoch: 15 Global Step: 328690 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:40:28,986-Speed 2495.12 samples/sec Loss 2.6469 LearningRate 0.000450 Epoch: 15 Global Step: 328700 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:40:37,191-Speed 2496.60 samples/sec Loss 2.6183 LearningRate 0.000450 Epoch: 15 Global Step: 328710 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:40:45,403-Speed 2494.53 samples/sec Loss 2.6393 LearningRate 0.000450 Epoch: 15 Global Step: 328720 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:40:53,605-Speed 2497.11 samples/sec Loss 2.5554 LearningRate 0.000450 Epoch: 15 Global Step: 328730 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:41:01,803-Speed 2498.46 samples/sec Loss 2.6110 LearningRate 0.000450 Epoch: 15 Global Step: 328740 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:41:09,949-Speed 2514.67 samples/sec Loss 2.6772 LearningRate 0.000450 Epoch: 15 Global Step: 328750 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:41:18,147-Speed 2498.51 samples/sec Loss 2.6399 LearningRate 0.000450 Epoch: 15 Global Step: 328760 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:41:26,355-Speed 2495.69 samples/sec Loss 2.6235 LearningRate 0.000450 Epoch: 15 Global Step: 328770 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:41:34,557-Speed 2497.14 samples/sec Loss 2.5819 LearningRate 0.000450 Epoch: 15 Global Step: 328780 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:41:42,763-Speed 2496.17 samples/sec Loss 2.5714 LearningRate 0.000450 
Epoch: 15 Global Step: 328790 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:41:50,960-Speed 2499.17 samples/sec Loss 2.5835 LearningRate 0.000450 Epoch: 15 Global Step: 328800 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:41:59,110-Speed 2513.32 samples/sec Loss 2.6300 LearningRate 0.000450 Epoch: 15 Global Step: 328810 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:42:07,309-Speed 2498.12 samples/sec Loss 2.6497 LearningRate 0.000450 Epoch: 15 Global Step: 328820 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:42:15,508-Speed 2498.30 samples/sec Loss 2.6103 LearningRate 0.000450 Epoch: 15 Global Step: 328830 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:42:23,718-Speed 2494.92 samples/sec Loss 2.5893 LearningRate 0.000450 Epoch: 15 Global Step: 328840 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:42:31,915-Speed 2498.71 samples/sec Loss 2.5471 LearningRate 0.000450 Epoch: 15 Global Step: 328850 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:42:40,116-Speed 2497.71 samples/sec Loss 2.6543 LearningRate 0.000450 Epoch: 15 Global Step: 328860 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:42:48,263-Speed 2514.76 samples/sec Loss 2.5804 LearningRate 0.000450 Epoch: 15 Global Step: 328870 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:42:56,460-Speed 2498.59 samples/sec Loss 2.6425 LearningRate 0.000450 Epoch: 15 Global Step: 328880 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:43:04,660-Speed 2497.88 samples/sec Loss 2.6262 LearningRate 0.000450 Epoch: 15 Global Step: 328890 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:43:12,858-Speed 2498.55 samples/sec Loss 2.6311 LearningRate 0.000450 Epoch: 15 Global Step: 328900 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:43:21,054-Speed 2499.42 samples/sec Loss 2.5939 LearningRate 
0.000450 Epoch: 15 Global Step: 328910 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:43:29,252-Speed 2498.46 samples/sec Loss 2.5863 LearningRate 0.000450 Epoch: 15 Global Step: 328920 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:43:37,400-Speed 2514.05 samples/sec Loss 2.6388 LearningRate 0.000450 Epoch: 15 Global Step: 328930 Fp16 Grad Scale: 65536 Required: 115 hours Training: 2022-07-08 17:43:45,617-Speed 2513.26 samples/sec Loss 2.6356 LearningRate 0.000450 Epoch: 15 Global Step: 328940 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:43:53,846-Speed 2498.77 samples/sec Loss 2.6380 LearningRate 0.000450 Epoch: 15 Global Step: 328950 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:44:02,057-Speed 2494.47 samples/sec Loss 2.5770 LearningRate 0.000450 Epoch: 15 Global Step: 328960 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:44:15,189-Speed 1574.52 samples/sec Loss 2.5346 LearningRate 0.000450 Epoch: 15 Global Step: 328970 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:44:23,449-Speed 2503.02 samples/sec Loss 2.5902 LearningRate 0.000450 Epoch: 15 Global Step: 328980 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:44:31,631-Speed 2519.24 samples/sec Loss 2.5775 LearningRate 0.000450 Epoch: 15 Global Step: 328990 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:44:43,101-Speed 1785.74 samples/sec Loss 2.6363 LearningRate 0.000450 Epoch: 15 Global Step: 329000 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:44:51,312-Speed 2502.14 samples/sec Loss 2.5858 LearningRate 0.000449 Epoch: 15 Global Step: 329010 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:45:04,058-Speed 1618.13 samples/sec Loss 2.6365 LearningRate 0.000449 Epoch: 15 Global Step: 329020 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:45:12,265-Speed 2495.77 samples/sec Loss 2.6058 
LearningRate 0.000449 Epoch: 15 Global Step: 329030 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:45:20,504-Speed 2503.39 samples/sec Loss 2.5775 LearningRate 0.000449 Epoch: 15 Global Step: 329040 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:45:28,671-Speed 2518.07 samples/sec Loss 2.5986 LearningRate 0.000449 Epoch: 15 Global Step: 329050 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:45:41,009-Speed 1660.10 samples/sec Loss 2.6280 LearningRate 0.000449 Epoch: 15 Global Step: 329060 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:45:49,245-Speed 2504.27 samples/sec Loss 2.5693 LearningRate 0.000449 Epoch: 15 Global Step: 329070 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:45:57,492-Speed 2499.38 samples/sec Loss 2.6192 LearningRate 0.000449 Epoch: 15 Global Step: 329080 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:46:05,693-Speed 2497.40 samples/sec Loss 2.5636 LearningRate 0.000449 Epoch: 15 Global Step: 329090 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:46:17,921-Speed 1714.95 samples/sec Loss 2.5620 LearningRate 0.000449 Epoch: 15 Global Step: 329100 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:46:26,138-Speed 2514.47 samples/sec Loss 2.5644 LearningRate 0.000449 Epoch: 15 Global Step: 329110 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:46:34,670-Speed 2497.53 samples/sec Loss 2.5853 LearningRate 0.000449 Epoch: 15 Global Step: 329120 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:46:47,494-Speed 1597.10 samples/sec Loss 2.5918 LearningRate 0.000449 Epoch: 15 Global Step: 329130 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:46:55,921-Speed 2491.56 samples/sec Loss 2.5913 LearningRate 0.000449 Epoch: 15 Global Step: 329140 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:47:04,174-Speed 2497.03 samples/sec Loss 
2.6073 LearningRate 0.000449 Epoch: 15 Global Step: 329150 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:47:12,381-Speed 2496.44 samples/sec Loss 2.6007 LearningRate 0.000449 Epoch: 15 Global Step: 329160 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:47:20,559-Speed 2514.19 samples/sec Loss 2.6002 LearningRate 0.000449 Epoch: 15 Global Step: 329170 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:47:28,983-Speed 2498.20 samples/sec Loss 2.6050 LearningRate 0.000449 Epoch: 15 Global Step: 329180 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:47:38,414-Speed 2171.72 samples/sec Loss 2.5767 LearningRate 0.000449 Epoch: 15 Global Step: 329190 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:47:47,079-Speed 2363.94 samples/sec Loss 2.5327 LearningRate 0.000449 Epoch: 15 Global Step: 329200 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:47:55,972-Speed 2308.01 samples/sec Loss 2.5671 LearningRate 0.000449 Epoch: 15 Global Step: 329210 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:48:04,189-Speed 2493.17 samples/sec Loss 2.6232 LearningRate 0.000449 Epoch: 15 Global Step: 329220 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:48:12,338-Speed 2513.49 samples/sec Loss 2.5713 LearningRate 0.000449 Epoch: 15 Global Step: 329230 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:48:20,537-Speed 2498.19 samples/sec Loss 2.5987 LearningRate 0.000449 Epoch: 15 Global Step: 329240 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:48:28,751-Speed 2493.99 samples/sec Loss 2.5889 LearningRate 0.000449 Epoch: 15 Global Step: 329250 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:48:36,956-Speed 2496.14 samples/sec Loss 2.5442 LearningRate 0.000449 Epoch: 15 Global Step: 329260 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:48:45,160-Speed 2496.81 samples/sec 
Loss 2.5444 LearningRate 0.000449 Epoch: 15 Global Step: 329270 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:48:53,376-Speed 2493.12 samples/sec Loss 2.6009 LearningRate 0.000449 Epoch: 15 Global Step: 329280 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:49:01,526-Speed 2513.24 samples/sec Loss 2.6342 LearningRate 0.000449 Epoch: 15 Global Step: 329290 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:49:09,730-Speed 2496.91 samples/sec Loss 2.6135 LearningRate 0.000449 Epoch: 15 Global Step: 329300 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:49:17,935-Speed 2496.50 samples/sec Loss 2.5988 LearningRate 0.000449 Epoch: 15 Global Step: 329310 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:49:26,140-Speed 2496.43 samples/sec Loss 2.5760 LearningRate 0.000449 Epoch: 15 Global Step: 329320 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:49:34,354-Speed 2493.64 samples/sec Loss 2.5960 LearningRate 0.000449 Epoch: 15 Global Step: 329330 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:49:42,556-Speed 2497.33 samples/sec Loss 2.6061 LearningRate 0.000449 Epoch: 15 Global Step: 329340 Fp16 Grad Scale: 32768 Required: 115 hours Training: 2022-07-08 17:49:50,705-Speed 2513.68 samples/sec Loss 2.5542 LearningRate 0.000449 Epoch: 15 Global Step: 329350 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:49:58,909-Speed 2496.58 samples/sec Loss 2.6171 LearningRate 0.000449 Epoch: 15 Global Step: 329360 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:50:07,113-Speed 2496.73 samples/sec Loss 2.6122 LearningRate 0.000449 Epoch: 15 Global Step: 329370 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:50:15,314-Speed 2497.78 samples/sec Loss 2.6252 LearningRate 0.000449 Epoch: 15 Global Step: 329380 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:50:23,517-Speed 2497.21 
samples/sec Loss 2.5866 LearningRate 0.000449 Epoch: 15 Global Step: 329390 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:50:31,721-Speed 2496.80 samples/sec Loss 2.6389 LearningRate 0.000449 Epoch: 15 Global Step: 329400 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:50:39,873-Speed 2512.69 samples/sec Loss 2.6422 LearningRate 0.000449 Epoch: 15 Global Step: 329410 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:50:48,077-Speed 2496.67 samples/sec Loss 2.6103 LearningRate 0.000449 Epoch: 15 Global Step: 329420 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:50:56,277-Speed 2498.09 samples/sec Loss 2.6168 LearningRate 0.000449 Epoch: 15 Global Step: 329430 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:51:04,493-Speed 2492.94 samples/sec Loss 2.6279 LearningRate 0.000449 Epoch: 15 Global Step: 329440 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:51:12,693-Speed 2498.17 samples/sec Loss 2.6186 LearningRate 0.000449 Epoch: 15 Global Step: 329450 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:51:20,896-Speed 2497.10 samples/sec Loss 2.5989 LearningRate 0.000449 Epoch: 15 Global Step: 329460 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:51:29,052-Speed 2511.29 samples/sec Loss 2.5422 LearningRate 0.000449 Epoch: 15 Global Step: 329470 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:51:37,254-Speed 2497.33 samples/sec Loss 2.6224 LearningRate 0.000449 Epoch: 15 Global Step: 329480 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:51:45,455-Speed 2497.74 samples/sec Loss 2.5681 LearningRate 0.000449 Epoch: 15 Global Step: 329490 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:51:53,658-Speed 2497.04 samples/sec Loss 2.5698 LearningRate 0.000449 Epoch: 15 Global Step: 329500 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:52:01,857-Speed 
2498.06 samples/sec Loss 2.6523 LearningRate 0.000449 Epoch: 15 Global Step: 329510 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:52:10,067-Speed 2495.11 samples/sec Loss 2.5778 LearningRate 0.000449 Epoch: 15 Global Step: 329520 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:52:18,228-Speed 2509.77 samples/sec Loss 2.6508 LearningRate 0.000449 Epoch: 15 Global Step: 329530 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:52:26,434-Speed 2496.69 samples/sec Loss 2.5451 LearningRate 0.000449 Epoch: 15 Global Step: 329540 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:52:34,656-Speed 2491.20 samples/sec Loss 2.5452 LearningRate 0.000449 Epoch: 15 Global Step: 329550 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:52:42,855-Speed 2498.12 samples/sec Loss 2.5490 LearningRate 0.000448 Epoch: 15 Global Step: 329560 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:52:51,062-Speed 2495.85 samples/sec Loss 2.5615 LearningRate 0.000448 Epoch: 15 Global Step: 329570 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:52:59,270-Speed 2495.45 samples/sec Loss 2.5307 LearningRate 0.000448 Epoch: 15 Global Step: 329580 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:53:07,417-Speed 2514.39 samples/sec Loss 2.5354 LearningRate 0.000448 Epoch: 15 Global Step: 329590 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:53:15,615-Speed 2498.63 samples/sec Loss 2.5273 LearningRate 0.000448 Epoch: 15 Global Step: 329600 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:53:23,814-Speed 2498.34 samples/sec Loss 2.5534 LearningRate 0.000448 Epoch: 15 Global Step: 329610 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:53:32,015-Speed 2497.48 samples/sec Loss 2.5395 LearningRate 0.000448 Epoch: 15 Global Step: 329620 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 
17:53:40,224-Speed 2495.28 samples/sec Loss 2.6289 LearningRate 0.000448 Epoch: 15 Global Step: 329630 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:53:48,427-Speed 2497.11 samples/sec Loss 2.6132 LearningRate 0.000448 Epoch: 15 Global Step: 329640 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:53:56,578-Speed 2513.08 samples/sec Loss 2.5570 LearningRate 0.000448 Epoch: 15 Global Step: 329650 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:54:04,774-Speed 2499.23 samples/sec Loss 2.5445 LearningRate 0.000448 Epoch: 15 Global Step: 329660 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:54:12,995-Speed 2491.49 samples/sec Loss 2.5777 LearningRate 0.000448 Epoch: 15 Global Step: 329670 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:54:21,203-Speed 2495.69 samples/sec Loss 2.6223 LearningRate 0.000448 Epoch: 15 Global Step: 329680 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:54:29,417-Speed 2493.85 samples/sec Loss 2.6555 LearningRate 0.000448 Epoch: 15 Global Step: 329690 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:54:37,629-Speed 2494.13 samples/sec Loss 2.5382 LearningRate 0.000448 Epoch: 15 Global Step: 329700 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:54:45,776-Speed 2514.07 samples/sec Loss 2.5919 LearningRate 0.000448 Epoch: 15 Global Step: 329710 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:54:53,979-Speed 2497.25 samples/sec Loss 2.5739 LearningRate 0.000448 Epoch: 15 Global Step: 329720 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:55:02,181-Speed 2497.19 samples/sec Loss 2.6220 LearningRate 0.000448 Epoch: 15 Global Step: 329730 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:55:10,383-Speed 2497.29 samples/sec Loss 2.6743 LearningRate 0.000448 Epoch: 15 Global Step: 329740 Fp16 Grad Scale: 32768 Required: 114 hours Training: 
2022-07-08 17:55:18,582-Speed 2498.09 samples/sec Loss 2.5915 LearningRate 0.000448 Epoch: 15 Global Step: 329750 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:55:26,780-Speed 2498.73 samples/sec Loss 2.5339 LearningRate 0.000448 Epoch: 15 Global Step: 329760 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:55:34,930-Speed 2513.17 samples/sec Loss 2.5444 LearningRate 0.000448 Epoch: 15 Global Step: 329770 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:55:43,145-Speed 2493.29 samples/sec Loss 2.5321 LearningRate 0.000448 Epoch: 15 Global Step: 329780 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:55:51,346-Speed 2497.76 samples/sec Loss 2.6181 LearningRate 0.000448 Epoch: 15 Global Step: 329790 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:55:59,543-Speed 2499.03 samples/sec Loss 2.5528 LearningRate 0.000448 Epoch: 15 Global Step: 329800 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:56:07,745-Speed 2497.14 samples/sec Loss 2.6246 LearningRate 0.000448 Epoch: 15 Global Step: 329810 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:56:15,946-Speed 2497.55 samples/sec Loss 2.6535 LearningRate 0.000448 Epoch: 15 Global Step: 329820 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:56:24,091-Speed 2514.99 samples/sec Loss 2.5598 LearningRate 0.000448 Epoch: 15 Global Step: 329830 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:56:32,288-Speed 2498.83 samples/sec Loss 2.6554 LearningRate 0.000448 Epoch: 15 Global Step: 329840 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:56:40,505-Speed 2492.90 samples/sec Loss 2.5954 LearningRate 0.000448 Epoch: 15 Global Step: 329850 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:56:48,711-Speed 2495.97 samples/sec Loss 2.6224 LearningRate 0.000448 Epoch: 15 Global Step: 329860 Fp16 Grad Scale: 32768 Required: 114 hours 
Training: 2022-07-08 17:56:56,907-Speed 2499.21 samples/sec Loss 2.5972 LearningRate 0.000448 Epoch: 15 Global Step: 329870 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:57:05,118-Speed 2494.49 samples/sec Loss 2.5895 LearningRate 0.000448 Epoch: 15 Global Step: 329880 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:57:13,264-Speed 2514.68 samples/sec Loss 2.6327 LearningRate 0.000448 Epoch: 15 Global Step: 329890 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:57:21,470-Speed 2495.98 samples/sec Loss 2.5995 LearningRate 0.000448 Epoch: 15 Global Step: 329900 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:57:29,670-Speed 2498.43 samples/sec Loss 2.6081 LearningRate 0.000448 Epoch: 15 Global Step: 329910 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:57:37,884-Speed 2493.62 samples/sec Loss 2.6244 LearningRate 0.000448 Epoch: 15 Global Step: 329920 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:57:46,083-Speed 2498.17 samples/sec Loss 2.6136 LearningRate 0.000448 Epoch: 15 Global Step: 329930 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:57:54,288-Speed 2496.32 samples/sec Loss 2.6363 LearningRate 0.000448 Epoch: 15 Global Step: 329940 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:58:02,439-Speed 2513.21 samples/sec Loss 2.5926 LearningRate 0.000448 Epoch: 15 Global Step: 329950 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:58:10,641-Speed 2497.50 samples/sec Loss 2.5693 LearningRate 0.000448 Epoch: 15 Global Step: 329960 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:58:18,840-Speed 2498.52 samples/sec Loss 2.6449 LearningRate 0.000448 Epoch: 15 Global Step: 329970 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:58:27,043-Speed 2496.92 samples/sec Loss 2.5257 LearningRate 0.000448 Epoch: 15 Global Step: 329980 Fp16 Grad Scale: 32768 Required: 114 
hours Training: 2022-07-08 17:58:35,246-Speed 2497.02 samples/sec Loss 2.5909 LearningRate 0.000448 Epoch: 15 Global Step: 329990 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:58:43,445-Speed 2498.17 samples/sec Loss 2.5809 LearningRate 0.000448 Epoch: 15 Global Step: 330000 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:58:51,592-Speed 2514.30 samples/sec Loss 2.5937 LearningRate 0.000448 Epoch: 15 Global Step: 330010 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:58:59,802-Speed 2495.09 samples/sec Loss 2.5840 LearningRate 0.000448 Epoch: 15 Global Step: 330020 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:59:08,006-Speed 2496.75 samples/sec Loss 2.6064 LearningRate 0.000448 Epoch: 15 Global Step: 330030 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:59:16,219-Speed 2494.07 samples/sec Loss 2.5600 LearningRate 0.000448 Epoch: 15 Global Step: 330040 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:59:24,420-Speed 2497.65 samples/sec Loss 2.6100 LearningRate 0.000448 Epoch: 15 Global Step: 330050 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:59:32,619-Speed 2498.21 samples/sec Loss 2.7237 LearningRate 0.000448 Epoch: 15 Global Step: 330060 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:59:40,766-Speed 2514.18 samples/sec Loss 2.5740 LearningRate 0.000448 Epoch: 15 Global Step: 330070 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:59:48,971-Speed 2496.73 samples/sec Loss 2.5882 LearningRate 0.000448 Epoch: 15 Global Step: 330080 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 17:59:57,176-Speed 2496.23 samples/sec Loss 2.6028 LearningRate 0.000448 Epoch: 15 Global Step: 330090 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:00:05,379-Speed 2497.02 samples/sec Loss 2.5702 LearningRate 0.000448 Epoch: 15 Global Step: 330100 Fp16 Grad Scale: 32768 Required: 
114 hours Training: 2022-07-08 18:00:13,581-Speed 2497.31 samples/sec Loss 2.5429 LearningRate 0.000448 Epoch: 15 Global Step: 330110 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:00:21,782-Speed 2497.91 samples/sec Loss 2.5667 LearningRate 0.000447 Epoch: 15 Global Step: 330120 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:00:29,931-Speed 2513.66 samples/sec Loss 2.5159 LearningRate 0.000447 Epoch: 15 Global Step: 330130 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:00:38,134-Speed 2497.12 samples/sec Loss 2.5438 LearningRate 0.000447 Epoch: 15 Global Step: 330140 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:00:46,335-Speed 2497.51 samples/sec Loss 2.5611 LearningRate 0.000447 Epoch: 15 Global Step: 330150 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:00:54,540-Speed 2496.28 samples/sec Loss 2.5907 LearningRate 0.000447 Epoch: 15 Global Step: 330160 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:01:02,745-Speed 2496.56 samples/sec Loss 2.5568 LearningRate 0.000447 Epoch: 15 Global Step: 330170 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:01:10,948-Speed 2497.49 samples/sec Loss 2.5834 LearningRate 0.000447 Epoch: 15 Global Step: 330180 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:01:19,093-Speed 2514.64 samples/sec Loss 2.5747 LearningRate 0.000447 Epoch: 15 Global Step: 330190 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:01:27,291-Speed 2498.46 samples/sec Loss 2.5451 LearningRate 0.000447 Epoch: 15 Global Step: 330200 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:01:35,488-Speed 2499.16 samples/sec Loss 2.5859 LearningRate 0.000447 Epoch: 15 Global Step: 330210 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:01:43,684-Speed 2499.12 samples/sec Loss 2.5407 LearningRate 0.000447 Epoch: 15 Global Step: 330220 Fp16 Grad Scale: 65536 
Required: 114 hours Training: 2022-07-08 18:01:51,882-Speed 2498.61 samples/sec Loss 2.6315 LearningRate 0.000447 Epoch: 15 Global Step: 330230 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:02:00,082-Speed 2497.95 samples/sec Loss 2.5781 LearningRate 0.000447 Epoch: 15 Global Step: 330240 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:02:08,228-Speed 2514.51 samples/sec Loss 2.6081 LearningRate 0.000447 Epoch: 15 Global Step: 330250 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:02:16,438-Speed 2495.00 samples/sec Loss 2.6241 LearningRate 0.000447 Epoch: 15 Global Step: 330260 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:02:24,635-Speed 2498.79 samples/sec Loss 2.5916 LearningRate 0.000447 Epoch: 15 Global Step: 330270 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:02:32,837-Speed 2497.61 samples/sec Loss 2.6045 LearningRate 0.000447 Epoch: 15 Global Step: 330280 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:02:41,036-Speed 2498.49 samples/sec Loss 2.5771 LearningRate 0.000447 Epoch: 15 Global Step: 330290 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:02:49,235-Speed 2497.99 samples/sec Loss 2.6577 LearningRate 0.000447 Epoch: 15 Global Step: 330300 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:02:57,393-Speed 2511.00 samples/sec Loss 2.6560 LearningRate 0.000447 Epoch: 15 Global Step: 330310 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:03:05,590-Speed 2498.80 samples/sec Loss 2.6034 LearningRate 0.000447 Epoch: 15 Global Step: 330320 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:03:13,792-Speed 2497.56 samples/sec Loss 2.6213 LearningRate 0.000447 Epoch: 15 Global Step: 330330 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:03:21,992-Speed 2498.16 samples/sec Loss 2.5697 LearningRate 0.000447 Epoch: 15 Global Step: 330340 Fp16 Grad Scale: 
65536 Required: 114 hours Training: 2022-07-08 18:03:30,205-Speed 2493.87 samples/sec Loss 2.6671 LearningRate 0.000447 Epoch: 15 Global Step: 330350 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:03:38,409-Speed 2496.77 samples/sec Loss 2.5613 LearningRate 0.000447 Epoch: 15 Global Step: 330360 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:03:46,553-Speed 2515.17 samples/sec Loss 2.5812 LearningRate 0.000447 Epoch: 15 Global Step: 330370 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:03:54,752-Speed 2498.40 samples/sec Loss 2.5697 LearningRate 0.000447 Epoch: 15 Global Step: 330380 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:04:02,955-Speed 2497.04 samples/sec Loss 2.6057 LearningRate 0.000447 Epoch: 15 Global Step: 330390 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:04:11,155-Speed 2497.98 samples/sec Loss 2.5950 LearningRate 0.000447 Epoch: 15 Global Step: 330400 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:04:19,354-Speed 2498.20 samples/sec Loss 2.6014 LearningRate 0.000447 Epoch: 15 Global Step: 330410 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:04:27,555-Speed 2497.49 samples/sec Loss 2.5747 LearningRate 0.000447 Epoch: 15 Global Step: 330420 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:04:35,705-Speed 2513.41 samples/sec Loss 2.6029 LearningRate 0.000447 Epoch: 15 Global Step: 330430 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:04:43,902-Speed 2498.78 samples/sec Loss 2.5603 LearningRate 0.000447 Epoch: 15 Global Step: 330440 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:04:52,106-Speed 2496.70 samples/sec Loss 2.6054 LearningRate 0.000447 Epoch: 15 Global Step: 330450 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:05:00,309-Speed 2497.11 samples/sec Loss 2.5659 LearningRate 0.000447 Epoch: 15 Global Step: 330460 Fp16 Grad 
Scale: 65536 Required: 114 hours Training: 2022-07-08 18:05:08,508-Speed 2498.06 samples/sec Loss 2.5401 LearningRate 0.000447 Epoch: 15 Global Step: 330470 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:05:16,707-Speed 2498.86 samples/sec Loss 2.5762 LearningRate 0.000447 Epoch: 15 Global Step: 330480 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:05:24,858-Speed 2512.92 samples/sec Loss 2.5556 LearningRate 0.000447 Epoch: 15 Global Step: 330490 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:05:33,061-Speed 2497.15 samples/sec Loss 2.5597 LearningRate 0.000447 Epoch: 15 Global Step: 330500 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:05:41,262-Speed 2497.46 samples/sec Loss 2.5199 LearningRate 0.000447 Epoch: 15 Global Step: 330510 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:05:49,459-Speed 2498.89 samples/sec Loss 2.5530 LearningRate 0.000447 Epoch: 15 Global Step: 330520 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:05:57,660-Speed 2497.45 samples/sec Loss 2.5551 LearningRate 0.000447 Epoch: 15 Global Step: 330530 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:06:05,858-Speed 2498.61 samples/sec Loss 2.5459 LearningRate 0.000447 Epoch: 15 Global Step: 330540 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:06:14,003-Speed 2514.83 samples/sec Loss 2.5743 LearningRate 0.000447 Epoch: 15 Global Step: 330550 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:06:22,204-Speed 2497.80 samples/sec Loss 2.5886 LearningRate 0.000447 Epoch: 15 Global Step: 330560 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:06:30,406-Speed 2497.36 samples/sec Loss 2.5740 LearningRate 0.000447 Epoch: 15 Global Step: 330570 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:06:38,604-Speed 2498.39 samples/sec Loss 2.5531 LearningRate 0.000447 Epoch: 15 Global Step: 330580 Fp16 
Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:06:46,805-Speed 2497.81 samples/sec Loss 2.5795 LearningRate 0.000447 Epoch: 15 Global Step: 330590 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:06:55,002-Speed 2499.00 samples/sec Loss 2.6273 LearningRate 0.000447 Epoch: 15 Global Step: 330600 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:07:03,151-Speed 2513.68 samples/sec Loss 2.5636 LearningRate 0.000447 Epoch: 15 Global Step: 330610 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:07:11,352-Speed 2497.65 samples/sec Loss 2.6506 LearningRate 0.000447 Epoch: 15 Global Step: 330620 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:07:19,553-Speed 2497.81 samples/sec Loss 2.5736 LearningRate 0.000447 Epoch: 15 Global Step: 330630 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:07:27,712-Speed 2510.52 samples/sec Loss 2.5341 LearningRate 0.000447 Epoch: 15 Global Step: 330640 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:07:35,911-Speed 2498.21 samples/sec Loss 2.5706 LearningRate 0.000447 Epoch: 15 Global Step: 330650 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:07:44,108-Speed 2498.96 samples/sec Loss 2.6078 LearningRate 0.000447 Epoch: 15 Global Step: 330660 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:07:52,255-Speed 2514.21 samples/sec Loss 2.5581 LearningRate 0.000447 Epoch: 15 Global Step: 330670 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:08:00,465-Speed 2494.77 samples/sec Loss 2.5710 LearningRate 0.000446 Epoch: 15 Global Step: 330680 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:08:08,663-Speed 2498.61 samples/sec Loss 2.5520 LearningRate 0.000446 Epoch: 15 Global Step: 330690 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:08:16,868-Speed 2496.37 samples/sec Loss 2.5978 LearningRate 0.000446 Epoch: 15 Global Step: 330700 
Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:08:25,065-Speed 2499.01 samples/sec Loss 2.5501 LearningRate 0.000446 Epoch: 15 Global Step: 330710 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:08:33,279-Speed 2493.62 samples/sec Loss 2.5805 LearningRate 0.000446 Epoch: 15 Global Step: 330720 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:08:41,424-Speed 2515.06 samples/sec Loss 2.5422 LearningRate 0.000446 Epoch: 15 Global Step: 330730 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:08:49,622-Speed 2498.44 samples/sec Loss 2.5465 LearningRate 0.000446 Epoch: 15 Global Step: 330740 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:08:57,820-Speed 2498.74 samples/sec Loss 2.5350 LearningRate 0.000446 Epoch: 15 Global Step: 330750 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:09:06,020-Speed 2498.00 samples/sec Loss 2.5638 LearningRate 0.000446 Epoch: 15 Global Step: 330760 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:09:14,229-Speed 2495.14 samples/sec Loss 2.5569 LearningRate 0.000446 Epoch: 15 Global Step: 330770 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:09:22,430-Speed 2497.69 samples/sec Loss 2.5633 LearningRate 0.000446 Epoch: 15 Global Step: 330780 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:09:30,589-Speed 2510.54 samples/sec Loss 2.6231 LearningRate 0.000446 Epoch: 15 Global Step: 330790 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:09:38,790-Speed 2497.67 samples/sec Loss 2.6002 LearningRate 0.000446 Epoch: 15 Global Step: 330800 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:09:46,991-Speed 2497.69 samples/sec Loss 2.5973 LearningRate 0.000446 Epoch: 15 Global Step: 330810 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:09:55,194-Speed 2497.12 samples/sec Loss 2.5875 LearningRate 0.000446 Epoch: 15 Global Step: 
330820 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:10:03,393-Speed 2498.05 samples/sec Loss 2.4828 LearningRate 0.000446 Epoch: 15 Global Step: 330830 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:10:11,596-Speed 2498.00 samples/sec Loss 2.5462 LearningRate 0.000446 Epoch: 15 Global Step: 330840 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:10:19,744-Speed 2513.75 samples/sec Loss 2.5479 LearningRate 0.000446 Epoch: 15 Global Step: 330850 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:10:27,948-Speed 2497.00 samples/sec Loss 2.6149 LearningRate 0.000446 Epoch: 15 Global Step: 330860 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:10:36,147-Speed 2498.06 samples/sec Loss 2.5601 LearningRate 0.000446 Epoch: 15 Global Step: 330870 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:10:44,346-Speed 2498.52 samples/sec Loss 2.5421 LearningRate 0.000446 Epoch: 15 Global Step: 330880 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:10:52,547-Speed 2497.81 samples/sec Loss 2.6121 LearningRate 0.000446 Epoch: 15 Global Step: 330890 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:11:00,744-Speed 2498.62 samples/sec Loss 2.6164 LearningRate 0.000446 Epoch: 15 Global Step: 330900 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:11:08,895-Speed 2513.24 samples/sec Loss 2.5584 LearningRate 0.000446 Epoch: 15 Global Step: 330910 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:11:17,097-Speed 2497.15 samples/sec Loss 2.6224 LearningRate 0.000446 Epoch: 15 Global Step: 330920 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:11:25,314-Speed 2492.77 samples/sec Loss 2.5753 LearningRate 0.000446 Epoch: 15 Global Step: 330930 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:11:33,520-Speed 2496.11 samples/sec Loss 2.5065 LearningRate 0.000446 Epoch: 15 Global 
Step: 330940 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:11:41,732-Speed 2494.45 samples/sec Loss 2.5110 LearningRate 0.000446 Epoch: 15 Global Step: 330950 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:11:49,933-Speed 2498.46 samples/sec Loss 2.6499 LearningRate 0.000446 Epoch: 15 Global Step: 330960 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:11:58,090-Speed 2511.03 samples/sec Loss 2.5387 LearningRate 0.000446 Epoch: 15 Global Step: 330970 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:12:06,291-Speed 2497.88 samples/sec Loss 2.6032 LearningRate 0.000446 Epoch: 15 Global Step: 330980 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:12:14,495-Speed 2496.76 samples/sec Loss 2.6056 LearningRate 0.000446 Epoch: 15 Global Step: 330990 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:12:22,695-Speed 2497.74 samples/sec Loss 2.5949 LearningRate 0.000446 Epoch: 15 Global Step: 331000 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:12:30,893-Speed 2498.72 samples/sec Loss 2.5851 LearningRate 0.000446 Epoch: 15 Global Step: 331010 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:12:39,101-Speed 2495.37 samples/sec Loss 2.5274 LearningRate 0.000446 Epoch: 15 Global Step: 331020 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:12:47,250-Speed 2513.56 samples/sec Loss 2.5729 LearningRate 0.000446 Epoch: 15 Global Step: 331030 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:12:55,448-Speed 2498.83 samples/sec Loss 2.5459 LearningRate 0.000446 Epoch: 15 Global Step: 331040 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:13:03,646-Speed 2498.37 samples/sec Loss 2.5617 LearningRate 0.000446 Epoch: 15 Global Step: 331050 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:13:11,844-Speed 2498.99 samples/sec Loss 2.5765 LearningRate 0.000446 Epoch: 15 
Global Step: 331060 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:13:20,043-Speed 2498.49 samples/sec Loss 2.5701 LearningRate 0.000446 Epoch: 15 Global Step: 331070 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:13:28,248-Speed 2496.44 samples/sec Loss 2.6054 LearningRate 0.000446 Epoch: 15 Global Step: 331080 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:13:36,391-Speed 2515.25 samples/sec Loss 2.5896 LearningRate 0.000446 Epoch: 15 Global Step: 331090 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:13:44,591-Speed 2498.22 samples/sec Loss 2.5408 LearningRate 0.000446 Epoch: 15 Global Step: 331100 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:13:52,796-Speed 2496.47 samples/sec Loss 2.5729 LearningRate 0.000446 Epoch: 15 Global Step: 331110 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:14:00,996-Speed 2498.14 samples/sec Loss 2.5291 LearningRate 0.000446 Epoch: 15 Global Step: 331120 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:14:09,193-Speed 2498.69 samples/sec Loss 2.5521 LearningRate 0.000446 Epoch: 15 Global Step: 331130 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:14:17,390-Speed 2498.77 samples/sec Loss 2.5295 LearningRate 0.000446 Epoch: 15 Global Step: 331140 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:14:25,548-Speed 2511.00 samples/sec Loss 2.5274 LearningRate 0.000446 Epoch: 15 Global Step: 331150 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:14:33,746-Speed 2498.52 samples/sec Loss 2.5630 LearningRate 0.000446 Epoch: 15 Global Step: 331160 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:14:41,946-Speed 2498.14 samples/sec Loss 2.5611 LearningRate 0.000446 Epoch: 15 Global Step: 331170 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:14:50,145-Speed 2498.34 samples/sec Loss 2.5685 LearningRate 0.000446 
Epoch: 15 Global Step: 331180 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:14:58,342-Speed 2499.22 samples/sec Loss 2.5937 LearningRate 0.000446 Epoch: 15 Global Step: 331190 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:15:06,540-Speed 2498.25 samples/sec Loss 2.5793 LearningRate 0.000446 Epoch: 15 Global Step: 331200 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:15:14,691-Speed 2513.07 samples/sec Loss 2.5769 LearningRate 0.000446 Epoch: 15 Global Step: 331210 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:15:22,889-Speed 2498.54 samples/sec Loss 2.5564 LearningRate 0.000446 Epoch: 15 Global Step: 331220 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:15:31,101-Speed 2494.14 samples/sec Loss 2.5706 LearningRate 0.000446 Epoch: 15 Global Step: 331230 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:15:39,304-Speed 2497.43 samples/sec Loss 2.5573 LearningRate 0.000445 Epoch: 15 Global Step: 331240 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:15:47,507-Speed 2497.05 samples/sec Loss 2.5509 LearningRate 0.000445 Epoch: 15 Global Step: 331250 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:15:55,708-Speed 2497.61 samples/sec Loss 2.5228 LearningRate 0.000445 Epoch: 15 Global Step: 331260 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:16:03,854-Speed 2514.70 samples/sec Loss 2.5070 LearningRate 0.000445 Epoch: 15 Global Step: 331270 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:16:12,055-Speed 2497.73 samples/sec Loss 2.6087 LearningRate 0.000445 Epoch: 15 Global Step: 331280 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:16:20,258-Speed 2497.03 samples/sec Loss 2.6275 LearningRate 0.000445 Epoch: 15 Global Step: 331290 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:16:28,456-Speed 2498.50 samples/sec Loss 2.5974 LearningRate 
0.000445 Epoch: 15 Global Step: 331300 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:16:36,668-Speed 2494.27 samples/sec Loss 2.5934 LearningRate 0.000445 Epoch: 15 Global Step: 331310 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:16:44,889-Speed 2491.66 samples/sec Loss 2.6734 LearningRate 0.000445 Epoch: 15 Global Step: 331320 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:16:53,035-Speed 2514.55 samples/sec Loss 2.6281 LearningRate 0.000445 Epoch: 15 Global Step: 331330 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:17:01,243-Speed 2495.56 samples/sec Loss 2.5794 LearningRate 0.000445 Epoch: 15 Global Step: 331340 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:17:09,441-Speed 2498.68 samples/sec Loss 2.5505 LearningRate 0.000445 Epoch: 15 Global Step: 331350 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:17:17,642-Speed 2497.59 samples/sec Loss 2.5933 LearningRate 0.000445 Epoch: 15 Global Step: 331360 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:17:25,843-Speed 2497.91 samples/sec Loss 2.5703 LearningRate 0.000445 Epoch: 15 Global Step: 331370 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:17:34,043-Speed 2497.98 samples/sec Loss 2.6421 LearningRate 0.000445 Epoch: 15 Global Step: 331380 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:17:42,189-Speed 2514.45 samples/sec Loss 2.5518 LearningRate 0.000445 Epoch: 15 Global Step: 331390 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:17:50,393-Speed 2497.03 samples/sec Loss 2.7095 LearningRate 0.000445 Epoch: 15 Global Step: 331400 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:17:58,597-Speed 2497.05 samples/sec Loss 2.6534 LearningRate 0.000445 Epoch: 15 Global Step: 331410 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:18:06,805-Speed 2495.53 samples/sec Loss 2.6829 
LearningRate 0.000445 Epoch: 15 Global Step: 331420 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:18:15,006-Speed 2497.47 samples/sec Loss 2.6885 LearningRate 0.000445 Epoch: 15 Global Step: 331430 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:18:23,207-Speed 2497.86 samples/sec Loss 2.6623 LearningRate 0.000445 Epoch: 15 Global Step: 331440 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:18:31,362-Speed 2511.49 samples/sec Loss 2.6849 LearningRate 0.000445 Epoch: 15 Global Step: 331450 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:18:39,566-Speed 2497.30 samples/sec Loss 2.6406 LearningRate 0.000445 Epoch: 15 Global Step: 331460 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:18:47,776-Speed 2495.00 samples/sec Loss 2.6566 LearningRate 0.000445 Epoch: 15 Global Step: 331470 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:18:55,975-Speed 2498.11 samples/sec Loss 2.6709 LearningRate 0.000445 Epoch: 15 Global Step: 331480 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:19:04,173-Speed 2498.41 samples/sec Loss 2.5828 LearningRate 0.000445 Epoch: 15 Global Step: 331490 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:19:12,389-Speed 2493.24 samples/sec Loss 2.6191 LearningRate 0.000445 Epoch: 15 Global Step: 331500 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:19:20,541-Speed 2512.75 samples/sec Loss 2.6322 LearningRate 0.000445 Epoch: 15 Global Step: 331510 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:19:28,755-Speed 2493.55 samples/sec Loss 2.5935 LearningRate 0.000445 Epoch: 15 Global Step: 331520 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:19:36,964-Speed 2495.25 samples/sec Loss 2.5520 LearningRate 0.000445 Epoch: 15 Global Step: 331530 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:19:45,167-Speed 2497.16 samples/sec Loss 
2.5861 LearningRate 0.000445 Epoch: 15 Global Step: 331540 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:19:53,370-Speed 2496.74 samples/sec Loss 2.5766 LearningRate 0.000445 Epoch: 15 Global Step: 331550 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:20:01,571-Speed 2497.68 samples/sec Loss 2.5428 LearningRate 0.000445 Epoch: 15 Global Step: 331560 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:20:09,720-Speed 2513.69 samples/sec Loss 2.5378 LearningRate 0.000445 Epoch: 15 Global Step: 331570 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:20:17,923-Speed 2496.95 samples/sec Loss 2.5679 LearningRate 0.000445 Epoch: 15 Global Step: 331580 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:20:26,126-Speed 2497.27 samples/sec Loss 2.5452 LearningRate 0.000445 Epoch: 15 Global Step: 331590 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:20:34,344-Speed 2492.29 samples/sec Loss 2.5719 LearningRate 0.000445 Epoch: 15 Global Step: 331600 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:20:42,561-Speed 2492.92 samples/sec Loss 2.5744 LearningRate 0.000445 Epoch: 15 Global Step: 331610 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:20:50,761-Speed 2497.92 samples/sec Loss 2.5607 LearningRate 0.000445 Epoch: 15 Global Step: 331620 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:20:58,921-Speed 2510.36 samples/sec Loss 2.6089 LearningRate 0.000445 Epoch: 15 Global Step: 331630 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:21:07,120-Speed 2498.15 samples/sec Loss 2.5917 LearningRate 0.000445 Epoch: 15 Global Step: 331640 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:21:15,317-Speed 2499.02 samples/sec Loss 2.6167 LearningRate 0.000445 Epoch: 15 Global Step: 331650 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:21:23,522-Speed 2496.20 samples/sec 
Loss 2.5726 LearningRate 0.000445 Epoch: 15 Global Step: 331660 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:21:31,722-Speed 2497.98 samples/sec Loss 2.5749 LearningRate 0.000445 Epoch: 15 Global Step: 331670 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:21:39,923-Speed 2497.65 samples/sec Loss 2.5526 LearningRate 0.000445 Epoch: 15 Global Step: 331680 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:21:48,072-Speed 2513.87 samples/sec Loss 2.6101 LearningRate 0.000445 Epoch: 15 Global Step: 331690 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:21:56,272-Speed 2497.83 samples/sec Loss 2.5635 LearningRate 0.000445 Epoch: 15 Global Step: 331700 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:22:04,479-Speed 2495.89 samples/sec Loss 2.5717 LearningRate 0.000445 Epoch: 15 Global Step: 331710 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:22:12,681-Speed 2497.14 samples/sec Loss 2.5720 LearningRate 0.000445 Epoch: 15 Global Step: 331720 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:22:20,882-Speed 2497.61 samples/sec Loss 2.5939 LearningRate 0.000445 Epoch: 15 Global Step: 331730 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:22:29,085-Speed 2497.10 samples/sec Loss 2.5189 LearningRate 0.000445 Epoch: 15 Global Step: 331740 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:22:37,236-Speed 2513.01 samples/sec Loss 2.5912 LearningRate 0.000445 Epoch: 15 Global Step: 331750 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:22:45,436-Speed 2497.83 samples/sec Loss 2.5591 LearningRate 0.000445 Epoch: 15 Global Step: 331760 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:22:53,639-Speed 2497.33 samples/sec Loss 2.5736 LearningRate 0.000445 Epoch: 15 Global Step: 331770 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:23:01,840-Speed 2497.48 
samples/sec Loss 2.5676 LearningRate 0.000445 Epoch: 15 Global Step: 331780 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:23:10,040-Speed 2498.19 samples/sec Loss 2.5629 LearningRate 0.000445 Epoch: 15 Global Step: 331790 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:23:18,257-Speed 2492.83 samples/sec Loss 2.5593 LearningRate 0.000444 Epoch: 15 Global Step: 331800 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:23:26,399-Speed 2515.57 samples/sec Loss 2.6100 LearningRate 0.000444 Epoch: 15 Global Step: 331810 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:23:34,598-Speed 2498.34 samples/sec Loss 2.5425 LearningRate 0.000444 Epoch: 15 Global Step: 331820 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:23:42,795-Speed 2498.69 samples/sec Loss 2.5917 LearningRate 0.000444 Epoch: 15 Global Step: 331830 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:23:53,067-Speed 1993.94 samples/sec Loss 2.6295 LearningRate 0.000444 Epoch: 16 Global Step: 331840 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:24:01,266-Speed 2498.55 samples/sec Loss 2.5745 LearningRate 0.000444 Epoch: 16 Global Step: 331850 Fp16 Grad Scale: 65536 Required: 114 hours Training: 2022-07-08 18:24:09,418-Speed 2512.48 samples/sec Loss 2.5815 LearningRate 0.000444 Epoch: 16 Global Step: 331860 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:24:17,569-Speed 2513.34 samples/sec Loss 2.5527 LearningRate 0.000444 Epoch: 16 Global Step: 331870 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:24:25,765-Speed 2499.19 samples/sec Loss 2.6077 LearningRate 0.000444 Epoch: 16 Global Step: 331880 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:24:33,960-Speed 2499.66 samples/sec Loss 2.6074 LearningRate 0.000444 Epoch: 16 Global Step: 331890 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:24:42,173-Speed 
2493.85 samples/sec Loss 2.5322 LearningRate 0.000444 Epoch: 16 Global Step: 331900 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:24:50,370-Speed 2499.24 samples/sec Loss 2.6345 LearningRate 0.000444 Epoch: 16 Global Step: 331910 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:24:58,567-Speed 2498.90 samples/sec Loss 2.5504 LearningRate 0.000444 Epoch: 16 Global Step: 331920 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:25:06,716-Speed 2513.68 samples/sec Loss 2.5618 LearningRate 0.000444 Epoch: 16 Global Step: 331930 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:25:14,910-Speed 2499.70 samples/sec Loss 2.6302 LearningRate 0.000444 Epoch: 16 Global Step: 331940 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:25:23,107-Speed 2498.85 samples/sec Loss 2.5871 LearningRate 0.000444 Epoch: 16 Global Step: 331950 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:25:31,302-Speed 2499.65 samples/sec Loss 2.5018 LearningRate 0.000444 Epoch: 16 Global Step: 331960 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:25:39,504-Speed 2497.31 samples/sec Loss 2.5888 LearningRate 0.000444 Epoch: 16 Global Step: 331970 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:25:47,709-Speed 2496.47 samples/sec Loss 2.5567 LearningRate 0.000444 Epoch: 16 Global Step: 331980 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:25:55,853-Speed 2515.27 samples/sec Loss 2.5563 LearningRate 0.000444 Epoch: 16 Global Step: 331990 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:26:04,053-Speed 2498.31 samples/sec Loss 2.5838 LearningRate 0.000444 Epoch: 16 Global Step: 332000 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:26:12,265-Speed 2494.07 samples/sec Loss 2.5538 LearningRate 0.000444 Epoch: 16 Global Step: 332010 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 
18:26:20,465-Speed 2498.30 samples/sec Loss 2.5331 LearningRate 0.000444 Epoch: 16 Global Step: 332020 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:26:28,664-Speed 2498.27 samples/sec Loss 2.5850 LearningRate 0.000444 Epoch: 16 Global Step: 332030 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:26:36,868-Speed 2496.75 samples/sec Loss 2.5755 LearningRate 0.000444 Epoch: 16 Global Step: 332040 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:26:45,020-Speed 2512.71 samples/sec Loss 2.6173 LearningRate 0.000444 Epoch: 16 Global Step: 332050 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:26:53,217-Speed 2498.88 samples/sec Loss 2.5943 LearningRate 0.000444 Epoch: 16 Global Step: 332060 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:27:01,416-Speed 2498.33 samples/sec Loss 2.6044 LearningRate 0.000444 Epoch: 16 Global Step: 332070 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:27:09,616-Speed 2498.31 samples/sec Loss 2.6039 LearningRate 0.000444 Epoch: 16 Global Step: 332080 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:27:17,830-Speed 2493.67 samples/sec Loss 2.5361 LearningRate 0.000444 Epoch: 16 Global Step: 332090 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:27:26,034-Speed 2496.61 samples/sec Loss 2.5976 LearningRate 0.000444 Epoch: 16 Global Step: 332100 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:27:34,186-Speed 2512.58 samples/sec Loss 2.5792 LearningRate 0.000444 Epoch: 16 Global Step: 332110 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:27:42,399-Speed 2494.65 samples/sec Loss 2.6191 LearningRate 0.000444 Epoch: 16 Global Step: 332120 Fp16 Grad Scale: 32768 Required: 114 hours Training: 2022-07-08 18:27:50,599-Speed 2497.86 samples/sec Loss 2.6036 LearningRate 0.000444 Epoch: 16 Global Step: 332130 Fp16 Grad Scale: 32768 Required: 114 hours Training: 
2022-07-08 18:27:58,798-Speed 2498.43 samples/sec Loss 2.5985 LearningRate 0.000444 Epoch: 16 Global Step: 332140 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:28:07,014-Speed 2493.30 samples/sec Loss 2.5846 LearningRate 0.000444 Epoch: 16 Global Step: 332150 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:28:15,214-Speed 2497.79 samples/sec Loss 2.5551 LearningRate 0.000444 Epoch: 16 Global Step: 332160 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:28:23,374-Speed 2510.24 samples/sec Loss 2.5942 LearningRate 0.000444 Epoch: 16 Global Step: 332170 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:28:31,579-Speed 2496.62 samples/sec Loss 2.5810 LearningRate 0.000444 Epoch: 16 Global Step: 332180 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:28:39,777-Speed 2498.28 samples/sec Loss 2.5591 LearningRate 0.000444 Epoch: 16 Global Step: 332190 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:28:47,973-Speed 2499.34 samples/sec Loss 2.5555 LearningRate 0.000444 Epoch: 16 Global Step: 332200 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:28:56,174-Speed 2497.62 samples/sec Loss 2.6204 LearningRate 0.000444 Epoch: 16 Global Step: 332210 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:29:04,373-Speed 2498.09 samples/sec Loss 2.6026 LearningRate 0.000444 Epoch: 16 Global Step: 332220 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:29:12,520-Speed 2514.23 samples/sec Loss 2.5944 LearningRate 0.000444 Epoch: 16 Global Step: 332230 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:29:20,719-Speed 2498.16 samples/sec Loss 2.5507 LearningRate 0.000444 Epoch: 16 Global Step: 332240 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:29:28,924-Speed 2496.42 samples/sec Loss 2.5662 LearningRate 0.000444 Epoch: 16 Global Step: 332250 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:29:37,138-Speed 2493.63 samples/sec Loss 2.5662 LearningRate 0.000444 Epoch: 16 Global Step: 332260 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:29:45,338-Speed 2497.94 samples/sec Loss 2.5713 LearningRate 0.000444 Epoch: 16 Global Step: 332270 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:29:53,535-Speed 2498.99 samples/sec Loss 2.5469 LearningRate 0.000444 Epoch: 16 Global Step: 332280 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:30:01,685-Speed 2513.44 samples/sec Loss 2.5464 LearningRate 0.000444 Epoch: 16 Global Step: 332290 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:30:09,884-Speed 2498.14 samples/sec Loss 2.6072 LearningRate 0.000444 Epoch: 16 Global Step: 332300 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:30:18,083-Speed 2498.09 samples/sec Loss 2.5642 LearningRate 0.000444 Epoch: 16 Global Step: 332310 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:30:26,285-Speed 2497.42 samples/sec Loss 2.5391 LearningRate 0.000444 Epoch: 16 Global Step: 332320 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:30:34,483-Speed 2498.60 samples/sec Loss 2.5027 LearningRate 0.000444 Epoch: 16 Global Step: 332330 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:30:42,688-Speed 2496.85 samples/sec Loss 2.5496 LearningRate 0.000444 Epoch: 16 Global Step: 332340 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:30:50,831-Speed 2515.44 samples/sec Loss 2.5568 LearningRate 0.000444 Epoch: 16 Global Step: 332350 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:30:59,031-Speed 2497.85 samples/sec Loss 2.5689 LearningRate 0.000443 Epoch: 16 Global Step: 332360 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:31:07,229-Speed 2498.85 samples/sec Loss 2.5602 LearningRate 0.000443 Epoch: 16 Global Step: 332370 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:31:15,428-Speed 2498.28 samples/sec Loss 2.4830 LearningRate 0.000443 Epoch: 16 Global Step: 332380 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:31:23,635-Speed 2495.78 samples/sec Loss 2.5504 LearningRate 0.000443 Epoch: 16 Global Step: 332390 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:31:31,848-Speed 2493.87 samples/sec Loss 2.6040 LearningRate 0.000443 Epoch: 16 Global Step: 332400 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:31:39,999-Speed 2513.15 samples/sec Loss 2.5458 LearningRate 0.000443 Epoch: 16 Global Step: 332410 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:31:48,197-Speed 2498.35 samples/sec Loss 2.5877 LearningRate 0.000443 Epoch: 16 Global Step: 332420 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:31:56,403-Speed 2496.20 samples/sec Loss 2.6057 LearningRate 0.000443 Epoch: 16 Global Step: 332430 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:32:04,602-Speed 2498.05 samples/sec Loss 2.5819 LearningRate 0.000443 Epoch: 16 Global Step: 332440 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:32:12,806-Speed 2496.65 samples/sec Loss 2.5395 LearningRate 0.000443 Epoch: 16 Global Step: 332450 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:32:21,004-Speed 2498.51 samples/sec Loss 2.5493 LearningRate 0.000443 Epoch: 16 Global Step: 332460 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:32:29,163-Speed 2510.44 samples/sec Loss 2.5930 LearningRate 0.000443 Epoch: 16 Global Step: 332470 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:32:37,362-Speed 2498.39 samples/sec Loss 2.5831 LearningRate 0.000443 Epoch: 16 Global Step: 332480 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:32:45,574-Speed 2494.54 samples/sec Loss 2.5687 LearningRate 0.000443 Epoch: 16 Global Step: 332490 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:32:53,773-Speed 2498.00 samples/sec Loss 2.5261 LearningRate 0.000443 Epoch: 16 Global Step: 332500 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:33:01,974-Speed 2497.75 samples/sec Loss 2.5618 LearningRate 0.000443 Epoch: 16 Global Step: 332510 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:33:10,177-Speed 2497.15 samples/sec Loss 2.5895 LearningRate 0.000443 Epoch: 16 Global Step: 332520 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:33:18,319-Speed 2515.99 samples/sec Loss 2.5465 LearningRate 0.000443 Epoch: 16 Global Step: 332530 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:33:26,518-Speed 2498.20 samples/sec Loss 2.6477 LearningRate 0.000443 Epoch: 16 Global Step: 332540 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:33:34,721-Speed 2497.22 samples/sec Loss 2.5526 LearningRate 0.000443 Epoch: 16 Global Step: 332550 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:33:42,925-Speed 2496.94 samples/sec Loss 2.5942 LearningRate 0.000443 Epoch: 16 Global Step: 332560 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:33:51,122-Speed 2498.64 samples/sec Loss 2.5952 LearningRate 0.000443 Epoch: 16 Global Step: 332570 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:33:59,318-Speed 2499.08 samples/sec Loss 2.5547 LearningRate 0.000443 Epoch: 16 Global Step: 332580 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:34:07,457-Speed 2516.60 samples/sec Loss 2.5531 LearningRate 0.000443 Epoch: 16 Global Step: 332590 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:34:15,668-Speed 2494.81 samples/sec Loss 2.5363 LearningRate 0.000443 Epoch: 16 Global Step: 332600 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:34:23,867-Speed 2498.14 samples/sec Loss 2.5436 LearningRate 0.000443 Epoch: 16 Global Step: 332610 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:34:32,064-Speed 2498.89 samples/sec Loss 2.5523 LearningRate 0.000443 Epoch: 16 Global Step: 332620 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:34:40,267-Speed 2496.98 samples/sec Loss 2.5447 LearningRate 0.000443 Epoch: 16 Global Step: 332630 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:34:48,467-Speed 2498.17 samples/sec Loss 2.5914 LearningRate 0.000443 Epoch: 16 Global Step: 332640 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:34:56,613-Speed 2514.40 samples/sec Loss 2.5198 LearningRate 0.000443 Epoch: 16 Global Step: 332650 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:35:04,815-Speed 2497.57 samples/sec Loss 2.5388 LearningRate 0.000443 Epoch: 16 Global Step: 332660 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:35:13,017-Speed 2497.28 samples/sec Loss 2.5319 LearningRate 0.000443 Epoch: 16 Global Step: 332670 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:35:21,235-Speed 2492.63 samples/sec Loss 2.5447 LearningRate 0.000443 Epoch: 16 Global Step: 332680 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:35:29,436-Speed 2497.63 samples/sec Loss 2.5319 LearningRate 0.000443 Epoch: 16 Global Step: 332690 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:35:37,633-Speed 2498.68 samples/sec Loss 2.5477 LearningRate 0.000443 Epoch: 16 Global Step: 332700 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:35:45,778-Speed 2514.84 samples/sec Loss 2.5471 LearningRate 0.000443 Epoch: 16 Global Step: 332710 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:35:54,000-Speed 2491.39 samples/sec Loss 2.5599 LearningRate 0.000443 Epoch: 16 Global Step: 332720 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:36:02,207-Speed 2495.81 samples/sec Loss 2.5755 LearningRate 0.000443 Epoch: 16 Global Step: 332730 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:36:10,405-Speed 2498.36 samples/sec Loss 2.6013 LearningRate 0.000443 Epoch: 16 Global Step: 332740 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:36:18,604-Speed 2498.45 samples/sec Loss 2.5593 LearningRate 0.000443 Epoch: 16 Global Step: 332750 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:36:26,807-Speed 2496.85 samples/sec Loss 2.6183 LearningRate 0.000443 Epoch: 16 Global Step: 332760 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:36:34,955-Speed 2514.18 samples/sec Loss 2.6377 LearningRate 0.000443 Epoch: 16 Global Step: 332770 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:36:43,149-Speed 2499.79 samples/sec Loss 2.5662 LearningRate 0.000443 Epoch: 16 Global Step: 332780 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:36:51,346-Speed 2498.79 samples/sec Loss 2.5811 LearningRate 0.000443 Epoch: 16 Global Step: 332790 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:36:59,546-Speed 2498.04 samples/sec Loss 2.5711 LearningRate 0.000443 Epoch: 16 Global Step: 332800 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:37:07,739-Speed 2499.85 samples/sec Loss 2.5718 LearningRate 0.000443 Epoch: 16 Global Step: 332810 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:37:15,938-Speed 2498.33 samples/sec Loss 2.5459 LearningRate 0.000443 Epoch: 16 Global Step: 332820 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:37:24,092-Speed 2512.98 samples/sec Loss 2.5949 LearningRate 0.000443 Epoch: 16 Global Step: 332830 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:37:32,290-Speed 2498.30 samples/sec Loss 2.5554 LearningRate 0.000443 Epoch: 16 Global Step: 332840 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:37:40,490-Speed 2497.97 samples/sec Loss 2.5997 LearningRate 0.000443 Epoch: 16 Global Step: 332850 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:37:48,690-Speed 2498.06 samples/sec Loss 2.5705 LearningRate 0.000443 Epoch: 16 Global Step: 332860 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:37:56,889-Speed 2498.41 samples/sec Loss 2.5452 LearningRate 0.000443 Epoch: 16 Global Step: 332870 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:38:05,093-Speed 2496.60 samples/sec Loss 2.6016 LearningRate 0.000443 Epoch: 16 Global Step: 332880 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:38:13,238-Speed 2514.99 samples/sec Loss 2.6214 LearningRate 0.000443 Epoch: 16 Global Step: 332890 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:38:21,439-Speed 2497.58 samples/sec Loss 2.5922 LearningRate 0.000443 Epoch: 16 Global Step: 332900 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:38:29,641-Speed 2497.41 samples/sec Loss 2.5846 LearningRate 0.000443 Epoch: 16 Global Step: 332910 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:38:37,843-Speed 2497.14 samples/sec Loss 2.5828 LearningRate 0.000442 Epoch: 16 Global Step: 332920 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:38:46,042-Speed 2498.51 samples/sec Loss 2.5962 LearningRate 0.000442 Epoch: 16 Global Step: 332930 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:38:54,243-Speed 2497.71 samples/sec Loss 2.6049 LearningRate 0.000442 Epoch: 16 Global Step: 332940 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:39:02,386-Speed 2515.30 samples/sec Loss 2.5800 LearningRate 0.000442 Epoch: 16 Global Step: 332950 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:39:10,590-Speed 2496.91 samples/sec Loss 2.5145 LearningRate 0.000442 Epoch: 16 Global Step: 332960 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:39:18,801-Speed 2494.40 samples/sec Loss 2.5339 LearningRate 0.000442 Epoch: 16 Global Step: 332970 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:39:27,004-Speed 2496.96 samples/sec Loss 2.6083 LearningRate 0.000442 Epoch: 16 Global Step: 332980 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:39:35,212-Speed 2495.79 samples/sec Loss 2.5281 LearningRate 0.000442 Epoch: 16 Global Step: 332990 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:39:43,418-Speed 2496.18 samples/sec Loss 2.5760 LearningRate 0.000442 Epoch: 16 Global Step: 333000 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:39:51,566-Speed 2513.73 samples/sec Loss 2.4845 LearningRate 0.000442 Epoch: 16 Global Step: 333010 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:39:59,761-Speed 2499.37 samples/sec Loss 2.5944 LearningRate 0.000442 Epoch: 16 Global Step: 333020 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:40:07,974-Speed 2494.18 samples/sec Loss 2.5300 LearningRate 0.000442 Epoch: 16 Global Step: 333030 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:40:16,176-Speed 2497.22 samples/sec Loss 2.5211 LearningRate 0.000442 Epoch: 16 Global Step: 333040 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:40:24,382-Speed 2495.94 samples/sec Loss 2.5539 LearningRate 0.000442 Epoch: 16 Global Step: 333050 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:40:32,594-Speed 2494.38 samples/sec Loss 2.5487 LearningRate 0.000442 Epoch: 16 Global Step: 333060 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:40:40,734-Speed 2516.54 samples/sec Loss 2.5290 LearningRate 0.000442 Epoch: 16 Global Step: 333070 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:40:48,932-Speed 2498.40 samples/sec Loss 2.5313 LearningRate 0.000442 Epoch: 16 Global Step: 333080 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:40:57,140-Speed 2495.57 samples/sec Loss 2.6045 LearningRate 0.000442 Epoch: 16 Global Step: 333090 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:41:05,357-Speed 2493.12 samples/sec Loss 2.5293 LearningRate 0.000442 Epoch: 16 Global Step: 333100 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:41:13,553-Speed 2499.16 samples/sec Loss 2.5296 LearningRate 0.000442 Epoch: 16 Global Step: 333110 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:41:21,750-Speed 2498.61 samples/sec Loss 2.5733 LearningRate 0.000442 Epoch: 16 Global Step: 333120 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:41:29,896-Speed 2514.57 samples/sec Loss 2.5518 LearningRate 0.000442 Epoch: 16 Global Step: 333130 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:41:38,097-Speed 2497.84 samples/sec Loss 2.5891 LearningRate 0.000442 Epoch: 16 Global Step: 333140 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:41:46,291-Speed 2499.62 samples/sec Loss 2.5809 LearningRate 0.000442 Epoch: 16 Global Step: 333150 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:41:54,490-Speed 2498.55 samples/sec Loss 2.5268 LearningRate 0.000442 Epoch: 16 Global Step: 333160 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:42:02,706-Speed 2493.55 samples/sec Loss 2.5550 LearningRate 0.000442 Epoch: 16 Global Step: 333170 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:42:10,903-Speed 2498.80 samples/sec Loss 2.5396 LearningRate 0.000442 Epoch: 16 Global Step: 333180 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:42:19,046-Speed 2515.25 samples/sec Loss 2.5527 LearningRate 0.000442 Epoch: 16 Global Step: 333190 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:42:27,249-Speed 2496.83 samples/sec Loss 2.5370 LearningRate 0.000442 Epoch: 16 Global Step: 333200 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:42:35,447-Speed 2498.79 samples/sec Loss 2.5615 LearningRate 0.000442 Epoch: 16 Global Step: 333210 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:42:43,649-Speed 2497.34 samples/sec Loss 2.5184 LearningRate 0.000442 Epoch: 16 Global Step: 333220 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:42:51,856-Speed 2495.96 samples/sec Loss 2.5221 LearningRate 0.000442 Epoch: 16 Global Step: 333230 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:43:00,063-Speed 2496.14 samples/sec Loss 2.5465 LearningRate 0.000442 Epoch: 16 Global Step: 333240 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:43:08,215-Speed 2512.53 samples/sec Loss 2.5128 LearningRate 0.000442 Epoch: 16 Global Step: 333250 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:43:16,418-Speed 2497.27 samples/sec Loss 2.5654 LearningRate 0.000442 Epoch: 16 Global Step: 333260 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:43:24,615-Speed 2498.60 samples/sec Loss 2.5438 LearningRate 0.000442 Epoch: 16 Global Step: 333270 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:43:32,830-Speed 2493.50 samples/sec Loss 2.5829 LearningRate 0.000442 Epoch: 16 Global Step: 333280 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:43:41,028-Speed 2498.66 samples/sec Loss 2.5813 LearningRate 0.000442 Epoch: 16 Global Step: 333290 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:43:49,234-Speed 2496.02 samples/sec Loss 2.5439 LearningRate 0.000442 Epoch: 16 Global Step: 333300 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:43:57,377-Speed 2515.40 samples/sec Loss 2.5421 LearningRate 0.000442 Epoch: 16 Global Step: 333310 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:44:05,575-Speed 2498.49 samples/sec Loss 2.5766 LearningRate 0.000442 Epoch: 16 Global Step: 333320 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:44:13,772-Speed 2499.28 samples/sec Loss 2.5963 LearningRate 0.000442 Epoch: 16 Global Step: 333330 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:44:21,966-Speed 2499.76 samples/sec Loss 2.5631 LearningRate 0.000442 Epoch: 16 Global Step: 333340 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:44:30,161-Speed 2499.35 samples/sec Loss 2.5676 LearningRate 0.000442 Epoch: 16 Global Step: 333350 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:44:38,359-Speed 2498.49 samples/sec Loss 2.5070 LearningRate 0.000442 Epoch: 16 Global Step: 333360 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:44:46,507-Speed 2514.44 samples/sec Loss 2.5539 LearningRate 0.000442 Epoch: 16 Global Step: 333370 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:44:54,707-Speed 2498.04 samples/sec Loss 2.5948 LearningRate 0.000442 Epoch: 16 Global Step: 333380 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:45:02,991-Speed 2472.45 samples/sec Loss 2.5739 LearningRate 0.000442 Epoch: 16 Global Step: 333390 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:45:11,191-Speed 2498.24 samples/sec Loss 2.5310 LearningRate 0.000442 Epoch: 16 Global Step: 333400 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:45:19,388-Speed 2499.25 samples/sec Loss 2.6133 LearningRate 0.000442 Epoch: 16 Global Step: 333410 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:45:27,583-Speed 2499.27 samples/sec Loss 2.5421 LearningRate 0.000442 Epoch: 16 Global Step: 333420 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:45:35,735-Speed 2512.78 samples/sec Loss 2.6264 LearningRate 0.000442 Epoch: 16 Global Step: 333430 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:45:43,930-Speed 2499.47 samples/sec Loss 2.5972 LearningRate 0.000442 Epoch: 16 Global Step: 333440 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:45:52,135-Speed 2496.38 samples/sec Loss 2.5827 LearningRate 0.000442 Epoch: 16 Global Step: 333450 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:46:00,338-Speed 2497.20 samples/sec Loss 2.5940 LearningRate 0.000442 Epoch: 16 Global Step: 333460 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:46:08,536-Speed 2498.42 samples/sec Loss 2.5704 LearningRate 0.000442 Epoch: 16 Global Step: 333470 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:46:16,737-Speed 2497.92 samples/sec Loss 2.6330 LearningRate 0.000441 Epoch: 16 Global Step: 333480 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:46:24,882-Speed 2514.74 samples/sec Loss 2.5704 LearningRate 0.000441 Epoch: 16 Global Step: 333490 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:46:33,096-Speed 2493.46 samples/sec Loss 2.5425 LearningRate 0.000441 Epoch: 16 Global Step: 333500 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:46:41,295-Speed 2498.44 samples/sec Loss 2.5590 LearningRate 0.000441 Epoch: 16 Global Step: 333510 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:46:49,497-Speed 2497.63 samples/sec Loss 2.5421 LearningRate 0.000441 Epoch: 16 Global Step: 333520 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:46:57,694-Speed 2498.65 samples/sec Loss 2.5932 LearningRate 0.000441 Epoch: 16 Global Step: 333530 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:47:05,893-Speed 2498.07 samples/sec Loss 2.5723 LearningRate 0.000441 Epoch: 16 Global Step: 333540 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:47:14,037-Speed 2515.34 samples/sec Loss 2.5803 LearningRate 0.000441 Epoch: 16 Global Step: 333550 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:47:22,237-Speed 2498.00 samples/sec Loss 2.5419 LearningRate 0.000441 Epoch: 16 Global Step: 333560 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:47:30,438-Speed 2497.66 samples/sec Loss 2.5898 LearningRate 0.000441 Epoch: 16 Global Step: 333570 Fp16 Grad Scale: 65536 Required: 114 hours
Training: 2022-07-08 18:47:38,594-Speed 2511.20 samples/sec Loss 2.5503 LearningRate 0.000441 Epoch: 16 Global Step: 333580 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:47:46,809-Speed 2493.47 samples/sec Loss 2.5631 LearningRate 0.000441 Epoch: 16 Global Step: 333590 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:47:55,011-Speed 2498.04 samples/sec Loss 2.5148 LearningRate 0.000441 Epoch: 16 Global Step: 333600 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:48:03,158-Speed 2514.12 samples/sec Loss 2.5910 LearningRate 0.000441 Epoch: 16 Global Step: 333610 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:48:11,364-Speed 2496.20 samples/sec Loss 2.5559 LearningRate 0.000441 Epoch: 16 Global Step: 333620 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:48:19,581-Speed 2492.87 samples/sec Loss 2.4941 LearningRate 0.000441 Epoch: 16 Global Step: 333630 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:48:27,790-Speed 2495.27 samples/sec Loss 2.5740 LearningRate 0.000441 Epoch: 16 Global Step: 333640 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:48:35,991-Speed 2497.59 samples/sec Loss 2.5884 LearningRate 0.000441 Epoch: 16 Global Step: 333650 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:48:44,191-Speed 2498.13 samples/sec Loss 2.5303 LearningRate 0.000441 Epoch: 16 Global Step: 333660 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:48:52,348-Speed 2511.05 samples/sec Loss 2.5802 LearningRate 0.000441 Epoch: 16 Global Step: 333670 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:49:00,550-Speed 2500.07 samples/sec Loss 2.5499 LearningRate 0.000441 Epoch: 16 Global Step: 333680 Fp16 Grad Scale: 32768 Required: 114 hours
Training: 2022-07-08 18:49:08,755-Speed 2496.51 samples/sec Loss 2.5647 LearningRate 0.000441 Epoch: 16 Global Step: 333690 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:49:16,966-Speed 2494.59 samples/sec Loss 2.5837 LearningRate 0.000441 Epoch: 16 Global Step: 333700 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:49:25,164-Speed 2498.61 samples/sec Loss 2.5475 LearningRate 0.000441 Epoch: 16 Global Step: 333710 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:49:33,361-Speed 2498.90 samples/sec Loss 2.5997 LearningRate 0.000441 Epoch: 16 Global Step: 333720 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:49:41,508-Speed 2514.44 samples/sec Loss 2.5803 LearningRate 0.000441 Epoch: 16 Global Step: 333730 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:49:49,707-Speed 2498.10 samples/sec Loss 2.5474 LearningRate 0.000441 Epoch: 16 Global Step: 333740 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:49:57,909-Speed 2497.27 samples/sec Loss 2.5825 LearningRate 0.000441 Epoch: 16 Global Step: 333750 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:50:06,124-Speed 2493.27 samples/sec Loss 2.5706 LearningRate 0.000441 Epoch: 16 Global Step: 333760 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:50:14,324-Speed 2498.04 samples/sec Loss 2.5841 LearningRate 0.000441 Epoch: 16 Global Step: 333770 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:50:22,534-Speed 2494.90 samples/sec Loss 2.6158 LearningRate 0.000441 Epoch: 16 Global Step: 333780 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:50:30,682-Speed 2513.90 samples/sec Loss 2.5400 LearningRate 0.000441 Epoch: 16 Global Step: 333790 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:50:38,884-Speed 2497.74 samples/sec Loss 2.5989 LearningRate 0.000441 Epoch: 16 Global Step: 333800 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:50:47,085-Speed 2497.63 samples/sec Loss 2.5776 LearningRate 0.000441 Epoch: 16 Global Step: 333810 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:50:55,286-Speed 2497.84 samples/sec Loss 2.6346 LearningRate 0.000441 Epoch: 16 Global Step: 333820 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:51:03,486-Speed 2498.13 samples/sec Loss 2.5365 LearningRate 0.000441 Epoch: 16 Global Step: 333830 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:51:11,689-Speed 2497.14 samples/sec Loss 2.5313 LearningRate 0.000441 Epoch: 16 Global Step: 333840 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:51:19,836-Speed 2514.22 samples/sec Loss 2.5257 LearningRate 0.000441 Epoch: 16 Global Step: 333850 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:51:28,048-Speed 2494.48 samples/sec Loss 2.5336 LearningRate 0.000441 Epoch: 16 Global Step: 333860 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:51:36,247-Speed 2498.04 samples/sec Loss 2.5102 LearningRate 0.000441 Epoch: 16 Global Step: 333870 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:51:44,454-Speed 2495.93 samples/sec Loss 2.5886 LearningRate 0.000441 Epoch: 16 Global Step: 333880 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:51:52,690-Speed 2486.93 samples/sec Loss 2.6161 LearningRate 0.000441 Epoch: 16 Global Step: 333890 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:52:00,892-Speed 2497.35 samples/sec Loss 2.5965 LearningRate 0.000441 Epoch: 16 Global Step: 333900 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:52:09,034-Speed 2515.68 samples/sec Loss 2.5899 LearningRate 0.000441 Epoch: 16 Global Step: 333910 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:52:17,266-Speed 2488.47 samples/sec Loss 2.5501 LearningRate 0.000441 Epoch: 16 Global Step: 333920 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:52:25,466-Speed 2497.84 samples/sec Loss 2.5294 LearningRate 0.000441 Epoch: 16 Global Step: 333930 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:52:33,668-Speed 2497.25 samples/sec Loss 2.5800 LearningRate 0.000441 Epoch: 16 Global Step: 333940 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:52:41,875-Speed 2496.00 samples/sec Loss 2.6021 LearningRate 0.000441 Epoch: 16 Global Step: 333950 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:52:50,091-Speed 2493.07 samples/sec Loss 2.6454 LearningRate 0.000441 Epoch: 16 Global Step: 333960 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:52:58,254-Speed 2509.25 samples/sec Loss 2.5688 LearningRate 0.000441 Epoch: 16 Global Step: 333970 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:53:06,460-Speed 2496.09 samples/sec Loss 2.5490 LearningRate 0.000441 Epoch: 16 Global Step: 333980 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:53:14,657-Speed 2498.93 samples/sec Loss 2.6336 LearningRate 0.000441 Epoch: 16 Global Step: 333990 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:53:22,876-Speed 2492.35 samples/sec Loss 2.5660 LearningRate 0.000441 Epoch: 16 Global Step: 334000 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:53:31,078-Speed 2497.44 samples/sec Loss 2.5787 LearningRate 0.000441 Epoch: 16 Global Step: 334010 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:53:39,278-Speed 2497.83 samples/sec Loss 2.6093 LearningRate 0.000441 Epoch: 16 Global Step: 334020 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:53:47,425-Speed 2514.10 samples/sec Loss 2.5855 LearningRate 0.000441 Epoch: 16 Global Step: 334030 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:53:55,628-Speed 2497.06 samples/sec Loss 2.5629 LearningRate 0.000440 Epoch: 16 Global Step: 334040 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:54:03,838-Speed 2495.10 samples/sec Loss 2.5583 LearningRate 0.000440 Epoch: 16 Global Step: 334050 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:54:12,041-Speed 2497.07 samples/sec Loss 2.5903 LearningRate 0.000440 Epoch: 16 Global Step: 334060 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:54:20,260-Speed 2492.30 samples/sec Loss 2.5803 LearningRate 0.000440 Epoch: 16 Global Step: 334070 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:54:28,462-Speed 2497.08 samples/sec Loss 2.5007 LearningRate 0.000440 Epoch: 16 Global Step: 334080 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:54:36,615-Speed 2512.64 samples/sec Loss 2.5749 LearningRate 0.000440 Epoch: 16 Global Step: 334090 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:54:44,828-Speed 2493.82 samples/sec Loss 2.5890 LearningRate 0.000440 Epoch: 16 Global Step: 334100 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:54:53,028-Speed 2498.10 samples/sec Loss 2.5689 LearningRate 0.000440 Epoch: 16 Global Step: 334110 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:55:01,226-Speed 2498.37 samples/sec Loss 2.6157 LearningRate 0.000440 Epoch: 16 Global Step: 334120 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:55:09,424-Speed 2498.60 samples/sec Loss 2.5301 LearningRate 0.000440 Epoch: 16 Global Step: 334130 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:55:17,622-Speed 2498.35 samples/sec Loss 2.5412 LearningRate 0.000440 Epoch: 16 Global Step: 334140 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:55:25,769-Speed 2514.73 samples/sec Loss 2.5879 LearningRate 0.000440 Epoch: 16 Global Step: 334150 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:55:33,967-Speed 2498.41 samples/sec Loss 2.5786 LearningRate 0.000440 Epoch: 16 Global Step: 334160 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:55:42,170-Speed 2497.18 samples/sec Loss 2.5434 LearningRate 0.000440 Epoch: 16 Global Step: 334170 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:55:50,372-Speed 2497.54 samples/sec Loss 2.5935 LearningRate 0.000440 Epoch: 16 Global Step: 334180 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:55:58,572-Speed 2498.17 samples/sec Loss 2.5869 LearningRate 0.000440 Epoch: 16 Global Step: 334190 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:56:06,783-Speed 2494.29 samples/sec Loss 2.5550 LearningRate 0.000440 Epoch: 16 Global Step: 334200 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:56:14,933-Speed 2513.48 samples/sec Loss 2.5790 LearningRate 0.000440 Epoch: 16 Global Step: 334210 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:56:23,133-Speed 2497.77 samples/sec Loss 2.5920 LearningRate 0.000440 Epoch: 16 Global Step: 334220 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:56:31,343-Speed 2495.57 samples/sec Loss 2.5693 LearningRate 0.000440 Epoch: 16 Global Step: 334230 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:56:39,549-Speed 2496.15 samples/sec Loss 2.6155 LearningRate 0.000440 Epoch: 16 Global Step: 334240 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:56:47,752-Speed 2497.06 samples/sec Loss 2.5403 LearningRate 0.000440 Epoch: 16 Global Step: 334250 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:56:55,954-Speed 2497.50 samples/sec Loss 2.5404 LearningRate 0.000440 Epoch: 16 Global Step: 334260 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:57:04,100-Speed 2514.93 samples/sec Loss 2.5887 LearningRate 0.000440 Epoch: 16 Global Step: 334270 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:57:12,302-Speed 2497.35 samples/sec Loss 2.5806 LearningRate 0.000440 Epoch: 16 Global Step: 334280 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:57:20,501-Speed 2498.26 samples/sec Loss 2.6008 LearningRate 0.000440 Epoch: 16 Global Step: 334290 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:57:28,706-Speed 2496.57 samples/sec Loss 2.5327 LearningRate 0.000440 Epoch: 16 Global Step: 334300 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:57:36,922-Speed 2493.41 samples/sec Loss 2.5229 LearningRate 0.000440 Epoch: 16 Global Step: 334310 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:57:45,120-Speed 2498.32 samples/sec Loss 2.6383 LearningRate 0.000440 Epoch: 16 Global Step: 334320 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:57:53,266-Speed 2514.62 samples/sec Loss 2.5608 LearningRate 0.000440 Epoch: 16 Global Step: 334330 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:58:01,470-Speed 2496.90 samples/sec Loss 2.5315 LearningRate 0.000440 Epoch: 16 Global Step: 334340 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:58:09,672-Speed 2497.25 samples/sec Loss 2.4978 LearningRate 0.000440 Epoch: 16 Global Step: 334350 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:58:17,888-Speed 2492.87 samples/sec Loss 2.5568 LearningRate 0.000440 Epoch: 16 Global Step: 334360 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:58:26,091-Speed 2497.12 samples/sec Loss 2.5595 LearningRate 0.000440 Epoch: 16 Global Step: 334370 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:58:34,291-Speed 2498.34 samples/sec Loss 2.5454 LearningRate 0.000440 Epoch: 16 Global Step: 334380 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:58:42,438-Speed 2514.05 samples/sec Loss 2.5580 LearningRate 0.000440 Epoch: 16 Global Step: 334390 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:58:50,641-Speed 2497.46 samples/sec Loss 2.5794 LearningRate 0.000440 Epoch: 16 Global Step: 334400 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:58:58,839-Speed 2499.50 samples/sec Loss 2.5082 LearningRate 0.000440 Epoch: 16 Global Step: 334410 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:59:07,038-Speed 2498.29 samples/sec Loss 2.5494 LearningRate 0.000440 Epoch: 16 Global Step: 334420 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:59:15,237-Speed 2497.98 samples/sec Loss 2.5322 LearningRate 0.000440 Epoch: 16 Global Step: 334430 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:59:23,439-Speed 2497.34 samples/sec Loss 2.5510 LearningRate 0.000440 Epoch: 16 Global Step: 334440 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:59:31,584-Speed 2514.92 samples/sec Loss 2.5452 LearningRate 0.000440 Epoch: 16 Global Step: 334450 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:59:39,780-Speed 2499.08 samples/sec Loss 2.5234 LearningRate 0.000440 Epoch: 16 Global Step: 334460 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:59:47,983-Speed 2497.27 samples/sec Loss 2.5699 LearningRate 0.000440 Epoch: 16 Global Step: 334470 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 18:59:56,186-Speed 2497.03 samples/sec Loss 2.5524 LearningRate 0.000440 Epoch: 16 Global Step: 334480 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:00:04,388-Speed 2497.51 samples/sec Loss 2.5323 LearningRate 0.000440 Epoch: 16 Global Step: 334490 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:00:12,589-Speed 2497.68 samples/sec Loss 2.5011 LearningRate 0.000440 Epoch: 16 Global Step: 334500 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:00:20,735-Speed 2514.44 samples/sec Loss 2.5138 LearningRate 0.000440 Epoch: 16 Global Step: 334510 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:00:28,937-Speed 2497.40 samples/sec Loss 2.5231 LearningRate 0.000440 Epoch: 16 Global Step: 334520 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:00:37,143-Speed 2495.94 samples/sec Loss 2.5797 LearningRate 0.000440 Epoch: 16 Global Step: 334530 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:00:45,350-Speed 2496.18 samples/sec Loss 2.5613 LearningRate 0.000440 Epoch: 16 Global Step: 334540 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:00:53,551-Speed 2497.66 samples/sec Loss 2.5554 LearningRate 0.000440 Epoch: 16 Global Step: 334550 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:01:01,749-Speed 2498.49 samples/sec Loss 2.5181 LearningRate 0.000440 Epoch: 16 Global Step: 334560 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:01:09,895-Speed 2514.54 samples/sec Loss 2.5705 LearningRate 0.000440 Epoch: 16 Global Step: 334570 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:01:18,096-Speed 2497.61 samples/sec Loss 2.5754 LearningRate 0.000440 Epoch: 16 Global Step: 334580 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:01:26,299-Speed 2497.25 samples/sec Loss 2.5432 LearningRate 0.000440 Epoch: 16 Global Step: 334590 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:01:34,500-Speed 2497.63 samples/sec Loss 2.5038 LearningRate 0.000440 Epoch: 16 Global Step: 334600 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:01:42,710-Speed 2495.01 samples/sec Loss 2.6228 LearningRate 0.000439 Epoch: 16 Global Step: 334610 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:01:50,908-Speed 2498.61 samples/sec Loss 2.5952 LearningRate 0.000439 Epoch: 16 Global Step: 334620 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:01:59,055-Speed 2514.28 samples/sec Loss 2.5998 LearningRate 0.000439 Epoch: 16 Global Step: 334630 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:02:07,252-Speed 2499.22 samples/sec Loss 2.5650 LearningRate 0.000439 Epoch: 16 Global Step: 334640 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:02:15,457-Speed 2496.44 samples/sec Loss 2.6207 LearningRate 0.000439 Epoch: 16 Global Step: 334650 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:02:23,658-Speed 2497.78 samples/sec Loss 2.5890 LearningRate 0.000439 Epoch: 16 Global Step: 334660 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:02:31,871-Speed 2493.84 samples/sec Loss 2.5601 LearningRate 0.000439 Epoch: 16 Global Step: 334670 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:02:40,070-Speed 2498.13 samples/sec Loss 2.5712 LearningRate 0.000439 Epoch: 16 Global Step: 334680 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:02:48,220-Speed 2513.28 samples/sec Loss 2.5248 LearningRate 0.000439 Epoch: 16 Global Step: 334690 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:02:56,427-Speed 2495.84 samples/sec Loss 2.5154 LearningRate 0.000439 Epoch: 16 Global Step: 334700 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:03:04,629-Speed 2497.19 samples/sec Loss 2.5991 LearningRate 0.000439 Epoch: 16 Global Step: 334710 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:03:12,828-Speed 2498.51 samples/sec Loss 2.5197 LearningRate 0.000439 Epoch: 16 Global Step: 334720 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:03:21,030-Speed 2497.40 samples/sec Loss 2.5539 LearningRate 0.000439 Epoch: 16 Global Step: 334730 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:03:29,231-Speed 2497.31 samples/sec Loss 2.5022 LearningRate 0.000439 Epoch: 16 Global Step: 334740 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:03:37,373-Speed 2515.87 samples/sec Loss 2.6123 LearningRate 0.000439 Epoch: 16 Global Step: 334750 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:03:45,570-Speed 2499.13 samples/sec Loss 2.6394 LearningRate 0.000439 Epoch: 16 Global Step: 334760 Fp16 Grad Scale: 32768 Required: 113 
hours Training: 2022-07-08 19:03:53,776-Speed 2496.19 samples/sec Loss 2.5598 LearningRate 0.000439 Epoch: 16 Global Step: 334770 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:04:01,978-Speed 2497.41 samples/sec Loss 2.6068 LearningRate 0.000439 Epoch: 16 Global Step: 334780 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:04:10,174-Speed 2499.22 samples/sec Loss 2.6919 LearningRate 0.000439 Epoch: 16 Global Step: 334790 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:04:18,370-Speed 2499.41 samples/sec Loss 2.5742 LearningRate 0.000439 Epoch: 16 Global Step: 334800 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:04:26,515-Speed 2514.67 samples/sec Loss 2.6656 LearningRate 0.000439 Epoch: 16 Global Step: 334810 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:04:34,707-Speed 2500.21 samples/sec Loss 2.6129 LearningRate 0.000439 Epoch: 16 Global Step: 334820 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:04:42,918-Speed 2494.61 samples/sec Loss 2.5944 LearningRate 0.000439 Epoch: 16 Global Step: 334830 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:04:51,115-Speed 2498.89 samples/sec Loss 2.5676 LearningRate 0.000439 Epoch: 16 Global Step: 334840 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:04:59,314-Speed 2498.38 samples/sec Loss 2.5674 LearningRate 0.000439 Epoch: 16 Global Step: 334850 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:05:07,510-Speed 2499.09 samples/sec Loss 2.5674 LearningRate 0.000439 Epoch: 16 Global Step: 334860 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:05:15,657-Speed 2514.20 samples/sec Loss 2.5361 LearningRate 0.000439 Epoch: 16 Global Step: 334870 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:05:23,852-Speed 2499.46 samples/sec Loss 2.5814 LearningRate 0.000439 Epoch: 16 Global Step: 334880 Fp16 Grad Scale: 65536 Required: 
113 hours Training: 2022-07-08 19:05:32,050-Speed 2498.73 samples/sec Loss 2.6041 LearningRate 0.000439 Epoch: 16 Global Step: 334890 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:05:40,254-Speed 2496.77 samples/sec Loss 2.5415 LearningRate 0.000439 Epoch: 16 Global Step: 334900 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:05:48,466-Speed 2494.66 samples/sec Loss 2.5414 LearningRate 0.000439 Epoch: 16 Global Step: 334910 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:05:56,664-Speed 2498.28 samples/sec Loss 2.5549 LearningRate 0.000439 Epoch: 16 Global Step: 334920 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:06:04,813-Speed 2513.65 samples/sec Loss 2.5370 LearningRate 0.000439 Epoch: 16 Global Step: 334930 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:06:13,025-Speed 2494.37 samples/sec Loss 2.5584 LearningRate 0.000439 Epoch: 16 Global Step: 334940 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:06:21,225-Speed 2497.98 samples/sec Loss 2.4995 LearningRate 0.000439 Epoch: 16 Global Step: 334950 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:06:29,424-Speed 2498.28 samples/sec Loss 2.5961 LearningRate 0.000439 Epoch: 16 Global Step: 334960 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:06:37,625-Speed 2497.71 samples/sec Loss 2.5183 LearningRate 0.000439 Epoch: 16 Global Step: 334970 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:06:45,827-Speed 2497.11 samples/sec Loss 2.5487 LearningRate 0.000439 Epoch: 16 Global Step: 334980 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:06:53,968-Speed 2516.00 samples/sec Loss 2.5913 LearningRate 0.000439 Epoch: 16 Global Step: 334990 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:07:02,167-Speed 2498.64 samples/sec Loss 2.5531 LearningRate 0.000439 Epoch: 16 Global Step: 335000 Fp16 Grad Scale: 65536 
Required: 113 hours Training: 2022-07-08 19:07:10,367-Speed 2497.88 samples/sec Loss 2.5435 LearningRate 0.000439 Epoch: 16 Global Step: 335010 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:07:18,564-Speed 2498.96 samples/sec Loss 2.5609 LearningRate 0.000439 Epoch: 16 Global Step: 335020 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:07:26,762-Speed 2498.39 samples/sec Loss 2.5369 LearningRate 0.000439 Epoch: 16 Global Step: 335030 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:07:34,961-Speed 2498.33 samples/sec Loss 2.5630 LearningRate 0.000439 Epoch: 16 Global Step: 335040 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:07:43,101-Speed 2516.31 samples/sec Loss 2.5387 LearningRate 0.000439 Epoch: 16 Global Step: 335050 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:07:51,301-Speed 2497.88 samples/sec Loss 2.5789 LearningRate 0.000439 Epoch: 16 Global Step: 335060 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:07:59,514-Speed 2493.94 samples/sec Loss 2.6201 LearningRate 0.000439 Epoch: 16 Global Step: 335070 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:08:07,713-Speed 2498.50 samples/sec Loss 2.6043 LearningRate 0.000439 Epoch: 16 Global Step: 335080 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:08:15,914-Speed 2497.43 samples/sec Loss 2.5741 LearningRate 0.000439 Epoch: 16 Global Step: 335090 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:08:24,117-Speed 2497.16 samples/sec Loss 2.5957 LearningRate 0.000439 Epoch: 16 Global Step: 335100 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:08:32,263-Speed 2514.41 samples/sec Loss 2.5679 LearningRate 0.000439 Epoch: 16 Global Step: 335110 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:08:40,463-Speed 2498.15 samples/sec Loss 2.5420 LearningRate 0.000439 Epoch: 16 Global Step: 335120 Fp16 Grad Scale: 
65536 Required: 113 hours Training: 2022-07-08 19:08:48,659-Speed 2499.11 samples/sec Loss 2.5635 LearningRate 0.000439 Epoch: 16 Global Step: 335130 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:08:56,859-Speed 2498.11 samples/sec Loss 2.5458 LearningRate 0.000439 Epoch: 16 Global Step: 335140 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:09:05,057-Speed 2498.52 samples/sec Loss 2.5467 LearningRate 0.000439 Epoch: 16 Global Step: 335150 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:09:13,256-Speed 2498.22 samples/sec Loss 2.5508 LearningRate 0.000439 Epoch: 16 Global Step: 335160 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:09:21,415-Speed 2510.60 samples/sec Loss 2.5490 LearningRate 0.000438 Epoch: 16 Global Step: 335170 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:09:29,611-Speed 2499.34 samples/sec Loss 2.5257 LearningRate 0.000438 Epoch: 16 Global Step: 335180 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:09:37,810-Speed 2498.86 samples/sec Loss 2.5201 LearningRate 0.000438 Epoch: 16 Global Step: 335190 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:09:46,011-Speed 2497.46 samples/sec Loss 2.5091 LearningRate 0.000438 Epoch: 16 Global Step: 335200 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:09:54,210-Speed 2498.36 samples/sec Loss 2.5196 LearningRate 0.000438 Epoch: 16 Global Step: 335210 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:10:02,406-Speed 2499.26 samples/sec Loss 2.5349 LearningRate 0.000438 Epoch: 16 Global Step: 335220 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:10:10,554-Speed 2514.19 samples/sec Loss 2.5224 LearningRate 0.000438 Epoch: 16 Global Step: 335230 Fp16 Grad Scale: 65536 Required: 113 hours Training: 2022-07-08 19:10:18,711-Speed 2511.08 samples/sec Loss 2.4699 LearningRate 0.000438 Epoch: 16 Global Step: 335240 Fp16 Grad 
Scale: 32768 Required: 113 hours Training: 2022-07-08 19:10:26,912-Speed 2497.76 samples/sec Loss 2.5018 LearningRate 0.000438 Epoch: 16 Global Step: 335250 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:10:35,108-Speed 2499.10 samples/sec Loss 2.5128 LearningRate 0.000438 Epoch: 16 Global Step: 335260 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:10:43,310-Speed 2497.63 samples/sec Loss 2.5763 LearningRate 0.000438 Epoch: 16 Global Step: 335270 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:10:51,511-Speed 2497.67 samples/sec Loss 2.5376 LearningRate 0.000438 Epoch: 16 Global Step: 335280 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:10:59,655-Speed 2515.00 samples/sec Loss 2.5863 LearningRate 0.000438 Epoch: 16 Global Step: 335290 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:11:07,849-Speed 2499.84 samples/sec Loss 2.4870 LearningRate 0.000438 Epoch: 16 Global Step: 335300 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:11:16,051-Speed 2497.51 samples/sec Loss 2.5357 LearningRate 0.000438 Epoch: 16 Global Step: 335310 Fp16 Grad Scale: 32768 Required: 113 hours Training: 2022-07-08 19:11:24,215-Speed 2508.81 samples/sec Loss 2.5192 LearningRate 0.000438 Epoch: 16 Global Step: 335320 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:11:32,416-Speed 2497.69 samples/sec Loss 2.5328 LearningRate 0.000438 Epoch: 16 Global Step: 335330 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:11:40,621-Speed 2496.59 samples/sec Loss 2.5604 LearningRate 0.000438 Epoch: 16 Global Step: 335340 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:11:48,767-Speed 2514.53 samples/sec Loss 2.5870 LearningRate 0.000438 Epoch: 16 Global Step: 335350 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:11:56,967-Speed 2497.84 samples/sec Loss 2.5617 LearningRate 0.000438 Epoch: 16 Global Step: 335360 Fp16 
Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:12:05,179-Speed 2494.36 samples/sec Loss 2.5024 LearningRate 0.000438 Epoch: 16 Global Step: 335370 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:12:13,381-Speed 2497.22 samples/sec Loss 2.5302 LearningRate 0.000438 Epoch: 16 Global Step: 335380 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:12:21,579-Speed 2498.61 samples/sec Loss 2.5506 LearningRate 0.000438 Epoch: 16 Global Step: 335390 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:12:29,785-Speed 2496.03 samples/sec Loss 2.5610 LearningRate 0.000438 Epoch: 16 Global Step: 335400 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:12:37,941-Speed 2511.35 samples/sec Loss 2.5471 LearningRate 0.000438 Epoch: 16 Global Step: 335410 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:12:46,139-Speed 2498.75 samples/sec Loss 2.6150 LearningRate 0.000438 Epoch: 16 Global Step: 335420 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:12:54,343-Speed 2497.08 samples/sec Loss 2.5695 LearningRate 0.000438 Epoch: 16 Global Step: 335430 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:13:02,543-Speed 2497.58 samples/sec Loss 2.5163 LearningRate 0.000438 Epoch: 16 Global Step: 335440 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:13:10,747-Speed 2497.04 samples/sec Loss 2.5882 LearningRate 0.000438 Epoch: 16 Global Step: 335450 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:13:18,943-Speed 2498.95 samples/sec Loss 2.6008 LearningRate 0.000438 Epoch: 16 Global Step: 335460 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:13:27,089-Speed 2514.64 samples/sec Loss 2.5706 LearningRate 0.000438 Epoch: 16 Global Step: 335470 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:13:35,286-Speed 2498.66 samples/sec Loss 2.6086 LearningRate 0.000438 Epoch: 16 Global Step: 335480 
Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:13:43,480-Speed 2499.75 samples/sec Loss 2.5456 LearningRate 0.000438 Epoch: 16 Global Step: 335490 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:13:51,681-Speed 2497.54 samples/sec Loss 2.5417 LearningRate 0.000438 Epoch: 16 Global Step: 335500 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:13:59,883-Speed 2497.41 samples/sec Loss 2.5459 LearningRate 0.000438 Epoch: 16 Global Step: 335510 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:14:08,087-Speed 2496.64 samples/sec Loss 2.5112 LearningRate 0.000438 Epoch: 16 Global Step: 335520 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:14:16,234-Speed 2514.45 samples/sec Loss 2.5144 LearningRate 0.000438 Epoch: 16 Global Step: 335530 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:14:24,430-Speed 2498.97 samples/sec Loss 2.5221 LearningRate 0.000438 Epoch: 16 Global Step: 335540 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:14:32,626-Speed 2499.32 samples/sec Loss 2.5737 LearningRate 0.000438 Epoch: 16 Global Step: 335550 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:14:40,831-Speed 2496.55 samples/sec Loss 2.6044 LearningRate 0.000438 Epoch: 16 Global Step: 335560 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:14:49,034-Speed 2497.13 samples/sec Loss 2.5413 LearningRate 0.000438 Epoch: 16 Global Step: 335570 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:14:57,232-Speed 2498.82 samples/sec Loss 2.5689 LearningRate 0.000438 Epoch: 16 Global Step: 335580 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:15:05,378-Speed 2514.24 samples/sec Loss 2.5401 LearningRate 0.000438 Epoch: 16 Global Step: 335590 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:15:13,577-Speed 2498.41 samples/sec Loss 2.6021 LearningRate 0.000438 Epoch: 16 Global Step: 
335600 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:15:21,775-Speed 2498.53 samples/sec Loss 2.6051 LearningRate 0.000438 Epoch: 16 Global Step: 335610 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:15:29,970-Speed 2499.48 samples/sec Loss 2.5353 LearningRate 0.000438 Epoch: 16 Global Step: 335620 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:15:38,166-Speed 2499.16 samples/sec Loss 2.6009 LearningRate 0.000438 Epoch: 16 Global Step: 335630 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:15:46,362-Speed 2499.46 samples/sec Loss 2.5881 LearningRate 0.000438 Epoch: 16 Global Step: 335640 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:15:54,507-Speed 2514.80 samples/sec Loss 2.5723 LearningRate 0.000438 Epoch: 16 Global Step: 335650 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:16:02,707-Speed 2497.96 samples/sec Loss 2.5261 LearningRate 0.000438 Epoch: 16 Global Step: 335660 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:16:10,903-Speed 2499.46 samples/sec Loss 2.5346 LearningRate 0.000438 Epoch: 16 Global Step: 335670 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:16:19,100-Speed 2498.80 samples/sec Loss 2.5271 LearningRate 0.000438 Epoch: 16 Global Step: 335680 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:16:27,298-Speed 2498.49 samples/sec Loss 2.5439 LearningRate 0.000438 Epoch: 16 Global Step: 335690 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:16:35,495-Speed 2498.90 samples/sec Loss 2.5102 LearningRate 0.000438 Epoch: 16 Global Step: 335700 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:16:43,654-Speed 2510.51 samples/sec Loss 2.5415 LearningRate 0.000438 Epoch: 16 Global Step: 335710 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:16:51,855-Speed 2497.64 samples/sec Loss 2.5855 LearningRate 0.000438 Epoch: 16 Global 
Step: 335720 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:17:00,054-Speed 2498.48 samples/sec Loss 2.4948 LearningRate 0.000437 Epoch: 16 Global Step: 335730 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:17:08,246-Speed 2500.32 samples/sec Loss 2.5561 LearningRate 0.000437 Epoch: 16 Global Step: 335740 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:17:16,458-Speed 2494.12 samples/sec Loss 2.5210 LearningRate 0.000437 Epoch: 16 Global Step: 335750 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:17:24,660-Speed 2497.35 samples/sec Loss 2.5086 LearningRate 0.000437 Epoch: 16 Global Step: 335760 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:17:32,802-Speed 2515.97 samples/sec Loss 2.4624 LearningRate 0.000437 Epoch: 16 Global Step: 335770 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:17:41,002-Speed 2497.85 samples/sec Loss 2.5265 LearningRate 0.000437 Epoch: 16 Global Step: 335780 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:17:49,200-Speed 2498.70 samples/sec Loss 2.5730 LearningRate 0.000437 Epoch: 16 Global Step: 335790 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:17:57,395-Speed 2499.70 samples/sec Loss 2.5278 LearningRate 0.000437 Epoch: 16 Global Step: 335800 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:18:05,593-Speed 2498.76 samples/sec Loss 2.5710 LearningRate 0.000437 Epoch: 16 Global Step: 335810 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:18:13,791-Speed 2498.33 samples/sec Loss 2.5361 LearningRate 0.000437 Epoch: 16 Global Step: 335820 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:18:21,937-Speed 2514.80 samples/sec Loss 2.5367 LearningRate 0.000437 Epoch: 16 Global Step: 335830 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:18:30,141-Speed 2496.63 samples/sec Loss 2.5580 LearningRate 0.000437 Epoch: 16 
Global Step: 335840 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:18:38,341-Speed 2497.96 samples/sec Loss 2.5916 LearningRate 0.000437 Epoch: 16 Global Step: 335850 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:18:46,543-Speed 2497.73 samples/sec Loss 2.5430 LearningRate 0.000437 Epoch: 16 Global Step: 335860 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:18:54,754-Speed 2494.38 samples/sec Loss 2.5972 LearningRate 0.000437 Epoch: 16 Global Step: 335870 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:19:03,222-Speed 2500.35 samples/sec Loss 2.6329 LearningRate 0.000437 Epoch: 16 Global Step: 335880 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:19:11,393-Speed 2514.90 samples/sec Loss 2.6042 LearningRate 0.000437 Epoch: 16 Global Step: 335890 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:19:21,909-Speed 1947.67 samples/sec Loss 2.6016 LearningRate 0.000437 Epoch: 16 Global Step: 335900 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:19:30,103-Speed 2499.83 samples/sec Loss 2.5782 LearningRate 0.000437 Epoch: 16 Global Step: 335910 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:19:38,339-Speed 2500.56 samples/sec Loss 2.5652 LearningRate 0.000437 Epoch: 16 Global Step: 335920 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:19:46,572-Speed 2500.24 samples/sec Loss 2.5739 LearningRate 0.000437 Epoch: 16 Global Step: 335930 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:19:54,770-Speed 2498.53 samples/sec Loss 2.5073 LearningRate 0.000437 Epoch: 16 Global Step: 335940 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:20:02,986-Speed 2515.79 samples/sec Loss 2.6107 LearningRate 0.000437 Epoch: 16 Global Step: 335950 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:20:15,299-Speed 2499.36 samples/sec Loss 2.5637 LearningRate 0.000437 
Epoch: 16 Global Step: 335960 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:20:23,555-Speed 2501.15 samples/sec Loss 2.6060 LearningRate 0.000437 Epoch: 16 Global Step: 335970 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:20:31,756-Speed 2497.68 samples/sec Loss 2.4885 LearningRate 0.000437 Epoch: 16 Global Step: 335980 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:20:44,778-Speed 1580.03 samples/sec Loss 2.5585 LearningRate 0.000437 Epoch: 16 Global Step: 335990 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:20:52,983-Speed 2498.10 samples/sec Loss 2.6141 LearningRate 0.000437 Epoch: 16 Global Step: 336000 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:21:01,147-Speed 2508.98 samples/sec Loss 2.5572 LearningRate 0.000437 Epoch: 16 Global Step: 336010 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:21:09,379-Speed 2488.14 samples/sec Loss 2.5537 LearningRate 0.000437 Epoch: 16 Global Step: 336020 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:21:21,505-Speed 1696.62 samples/sec Loss 2.5048 LearningRate 0.000437 Epoch: 16 Global Step: 336030 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:21:29,747-Speed 2486.78 samples/sec Loss 2.5595 LearningRate 0.000437 Epoch: 16 Global Step: 336040 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:21:37,970-Speed 2490.79 samples/sec Loss 2.5460 LearningRate 0.000437 Epoch: 16 Global Step: 336050 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:21:46,218-Speed 2493.66 samples/sec Loss 2.5358 LearningRate 0.000437 Epoch: 16 Global Step: 336060 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:21:58,703-Speed 1649.86 samples/sec Loss 2.5778 LearningRate 0.000437 Epoch: 16 Global Step: 336070 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:22:06,962-Speed 2496.98 samples/sec Loss 2.5138 LearningRate 
0.000437 Epoch: 16 Global Step: 336080 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:22:15,210-Speed 2498.34 samples/sec Loss 2.5502 LearningRate 0.000437 Epoch: 16 Global Step: 336090 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:22:26,735-Speed 1782.94 samples/sec Loss 2.5000 LearningRate 0.000437 Epoch: 16 Global Step: 336100 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:22:34,925-Speed 2501.14 samples/sec Loss 2.5407 LearningRate 0.000437 Epoch: 16 Global Step: 336110 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:22:43,324-Speed 2502.53 samples/sec Loss 2.5512 LearningRate 0.000437 Epoch: 16 Global Step: 336120 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:22:55,181-Speed 2294.12 samples/sec Loss 2.5196 LearningRate 0.000437 Epoch: 16 Global Step: 336130 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:23:03,374-Speed 2499.96 samples/sec Loss 2.5729 LearningRate 0.000437 Epoch: 16 Global Step: 336140 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:23:15,654-Speed 2501.64 samples/sec Loss 2.5184 LearningRate 0.000437 Epoch: 16 Global Step: 336150 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:23:23,856-Speed 2497.31 samples/sec Loss 2.5921 LearningRate 0.000437 Epoch: 16 Global Step: 336160 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:23:32,062-Speed 2496.05 samples/sec Loss 2.5895 LearningRate 0.000437 Epoch: 16 Global Step: 336170 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:23:40,269-Speed 2495.95 samples/sec Loss 2.5392 LearningRate 0.000437 Epoch: 16 Global Step: 336180 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:23:48,420-Speed 2512.97 samples/sec Loss 2.5633 LearningRate 0.000437 Epoch: 16 Global Step: 336190 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:23:56,639-Speed 2492.21 samples/sec Loss 2.4880 
LearningRate 0.000437 Epoch: 16 Global Step: 336200 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:24:04,849-Speed 2494.84 samples/sec Loss 2.4998 LearningRate 0.000437 Epoch: 16 Global Step: 336210 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:24:13,051-Speed 2497.34 samples/sec Loss 2.5343 LearningRate 0.000437 Epoch: 16 Global Step: 336220 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:24:21,255-Speed 2497.20 samples/sec Loss 2.5505 LearningRate 0.000437 Epoch: 16 Global Step: 336230 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:24:29,459-Speed 2496.85 samples/sec Loss 2.5446 LearningRate 0.000437 Epoch: 16 Global Step: 336240 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:24:37,611-Speed 2512.71 samples/sec Loss 2.5753 LearningRate 0.000437 Epoch: 16 Global Step: 336250 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:24:45,812-Speed 2497.76 samples/sec Loss 2.5618 LearningRate 0.000437 Epoch: 16 Global Step: 336260 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:24:54,023-Speed 2494.56 samples/sec Loss 2.5519 LearningRate 0.000437 Epoch: 16 Global Step: 336270 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:25:02,227-Speed 2496.66 samples/sec Loss 2.5564 LearningRate 0.000437 Epoch: 16 Global Step: 336280 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:25:10,433-Speed 2496.16 samples/sec Loss 2.5447 LearningRate 0.000437 Epoch: 16 Global Step: 336290 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:25:18,646-Speed 2493.89 samples/sec Loss 2.5529 LearningRate 0.000436 Epoch: 16 Global Step: 336300 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:25:26,796-Speed 2513.49 samples/sec Loss 2.5464 LearningRate 0.000436 Epoch: 16 Global Step: 336310 Fp16 Grad Scale: 16384 Required: 113 hours Training: 2022-07-08 19:25:34,996-Speed 2497.93 samples/sec Loss 
2.5496 LearningRate 0.000436 Epoch: 16 Global Step: 336320 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:25:43,211-Speed 2493.45 samples/sec Loss 2.5752 LearningRate 0.000436 Epoch: 16 Global Step: 336330 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:25:51,411-Speed 2498.14 samples/sec Loss 2.5123 LearningRate 0.000436 Epoch: 16 Global Step: 336340 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:25:59,618-Speed 2495.67 samples/sec Loss 2.5099 LearningRate 0.000436 Epoch: 16 Global Step: 336350 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:26:07,827-Speed 2495.24 samples/sec Loss 2.5915 LearningRate 0.000436 Epoch: 16 Global Step: 336360 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:26:15,972-Speed 2514.52 samples/sec Loss 2.5202 LearningRate 0.000436 Epoch: 16 Global Step: 336370 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:26:24,173-Speed 2497.95 samples/sec Loss 2.5784 LearningRate 0.000436 Epoch: 16 Global Step: 336380 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:26:32,379-Speed 2495.99 samples/sec Loss 2.5112 LearningRate 0.000436 Epoch: 16 Global Step: 336390 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:26:40,578-Speed 2498.18 samples/sec Loss 2.5313 LearningRate 0.000436 Epoch: 16 Global Step: 336400 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:26:48,781-Speed 2497.34 samples/sec Loss 2.5822 LearningRate 0.000436 Epoch: 16 Global Step: 336410 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:26:56,988-Speed 2495.71 samples/sec Loss 2.5589 LearningRate 0.000436 Epoch: 16 Global Step: 336420 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:27:05,149-Speed 2509.97 samples/sec Loss 2.6125 LearningRate 0.000436 Epoch: 16 Global Step: 336430 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:27:13,349-Speed 2497.97 samples/sec Loss 2.5292 LearningRate 0.000436 Epoch: 16 Global Step: 336440 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:27:21,548-Speed 2498.32 samples/sec Loss 2.5403 LearningRate 0.000436 Epoch: 16 Global Step: 336450 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:27:29,751-Speed 2497.04 samples/sec Loss 2.5642 LearningRate 0.000436 Epoch: 16 Global Step: 336460 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:27:37,952-Speed 2497.59 samples/sec Loss 2.5580 LearningRate 0.000436 Epoch: 16 Global Step: 336470 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:27:46,167-Speed 2493.66 samples/sec Loss 2.5840 LearningRate 0.000436 Epoch: 16 Global Step: 336480 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:27:54,314-Speed 2514.03 samples/sec Loss 2.5124 LearningRate 0.000436 Epoch: 16 Global Step: 336490 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:28:02,518-Speed 2496.93 samples/sec Loss 2.5567 LearningRate 0.000436 Epoch: 16 Global Step: 336500 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:28:10,722-Speed 2497.04 samples/sec Loss 2.5021 LearningRate 0.000436 Epoch: 16 Global Step: 336510 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:28:18,923-Speed 2497.26 samples/sec Loss 2.5585 LearningRate 0.000436 Epoch: 16 Global Step: 336520 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:28:27,128-Speed 2496.85 samples/sec Loss 2.5350 LearningRate 0.000436 Epoch: 16 Global Step: 336530 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:28:35,326-Speed 2498.62 samples/sec Loss 2.5161 LearningRate 0.000436 Epoch: 16 Global Step: 336540 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:28:43,475-Speed 2513.68 samples/sec Loss 2.5550 LearningRate 0.000436 Epoch: 16 Global Step: 336550 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:28:51,686-Speed 2494.49 samples/sec Loss 2.5286 LearningRate 0.000436 Epoch: 16 Global Step: 336560 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:28:59,887-Speed 2497.79 samples/sec Loss 2.5397 LearningRate 0.000436 Epoch: 16 Global Step: 336570 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:29:08,091-Speed 2496.55 samples/sec Loss 2.5450 LearningRate 0.000436 Epoch: 16 Global Step: 336580 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:29:16,294-Speed 2497.24 samples/sec Loss 2.4959 LearningRate 0.000436 Epoch: 16 Global Step: 336590 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:29:24,457-Speed 2509.18 samples/sec Loss 2.5530 LearningRate 0.000436 Epoch: 16 Global Step: 336600 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:29:32,606-Speed 2513.81 samples/sec Loss 2.5353 LearningRate 0.000436 Epoch: 16 Global Step: 336610 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:29:40,807-Speed 2497.46 samples/sec Loss 2.5128 LearningRate 0.000436 Epoch: 16 Global Step: 336620 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:29:49,012-Speed 2496.43 samples/sec Loss 2.5847 LearningRate 0.000436 Epoch: 16 Global Step: 336630 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:29:57,239-Speed 2490.19 samples/sec Loss 2.5626 LearningRate 0.000436 Epoch: 16 Global Step: 336640 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:30:05,451-Speed 2494.59 samples/sec Loss 2.5576 LearningRate 0.000436 Epoch: 16 Global Step: 336650 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:30:13,650-Speed 2498.02 samples/sec Loss 2.5819 LearningRate 0.000436 Epoch: 16 Global Step: 336660 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:30:21,812-Speed 2509.64 samples/sec Loss 2.5729 LearningRate 0.000436 Epoch: 16 Global Step: 336670 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:30:30,016-Speed 2496.86 samples/sec Loss 2.5342 LearningRate 0.000436 Epoch: 16 Global Step: 336680 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:30:38,217-Speed 2497.90 samples/sec Loss 2.5704 LearningRate 0.000436 Epoch: 16 Global Step: 336690 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:30:46,421-Speed 2496.82 samples/sec Loss 2.5812 LearningRate 0.000436 Epoch: 16 Global Step: 336700 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:30:54,621-Speed 2497.85 samples/sec Loss 2.5368 LearningRate 0.000436 Epoch: 16 Global Step: 336710 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:31:02,823-Speed 2497.45 samples/sec Loss 2.5121 LearningRate 0.000436 Epoch: 16 Global Step: 336720 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:31:10,971-Speed 2514.00 samples/sec Loss 2.5813 LearningRate 0.000436 Epoch: 16 Global Step: 336730 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:31:19,168-Speed 2498.66 samples/sec Loss 2.5870 LearningRate 0.000436 Epoch: 16 Global Step: 336740 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:31:27,388-Speed 2491.86 samples/sec Loss 2.5452 LearningRate 0.000436 Epoch: 16 Global Step: 336750 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:31:35,599-Speed 2494.86 samples/sec Loss 2.5384 LearningRate 0.000436 Epoch: 16 Global Step: 336760 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:31:43,799-Speed 2497.93 samples/sec Loss 2.5795 LearningRate 0.000436 Epoch: 16 Global Step: 336770 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:31:51,998-Speed 2498.39 samples/sec Loss 2.5783 LearningRate 0.000436 Epoch: 16 Global Step: 336780 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:32:00,147-Speed 2513.56 samples/sec Loss 2.5726 LearningRate 0.000436 Epoch: 16 Global Step: 336790 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:32:08,343-Speed 2498.98 samples/sec Loss 2.5740 LearningRate 0.000436 Epoch: 16 Global Step: 336800 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:32:16,540-Speed 2498.96 samples/sec Loss 2.5961 LearningRate 0.000436 Epoch: 16 Global Step: 336810 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:32:24,740-Speed 2497.94 samples/sec Loss 2.5597 LearningRate 0.000436 Epoch: 16 Global Step: 336820 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:32:32,943-Speed 2497.04 samples/sec Loss 2.5774 LearningRate 0.000436 Epoch: 16 Global Step: 336830 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:32:41,141-Speed 2498.75 samples/sec Loss 2.5462 LearningRate 0.000436 Epoch: 16 Global Step: 336840 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:32:49,284-Speed 2515.47 samples/sec Loss 2.5648 LearningRate 0.000436 Epoch: 16 Global Step: 336850 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:32:57,483-Speed 2498.16 samples/sec Loss 2.5369 LearningRate 0.000435 Epoch: 16 Global Step: 336860 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:33:05,685-Speed 2497.44 samples/sec Loss 2.5079 LearningRate 0.000435 Epoch: 16 Global Step: 336870 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:33:13,891-Speed 2496.20 samples/sec Loss 2.5029 LearningRate 0.000435 Epoch: 16 Global Step: 336880 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:33:22,089-Speed 2498.81 samples/sec Loss 2.5812 LearningRate 0.000435 Epoch: 16 Global Step: 336890 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:33:30,290-Speed 2497.46 samples/sec Loss 2.5420 LearningRate 0.000435 Epoch: 16 Global Step: 336900 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:33:38,435-Speed 2515.03 samples/sec Loss 2.5551 LearningRate 0.000435 Epoch: 16 Global Step: 336910 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:33:46,636-Speed 2497.72 samples/sec Loss 2.5633 LearningRate 0.000435 Epoch: 16 Global Step: 336920 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:33:54,841-Speed 2496.77 samples/sec Loss 2.6139 LearningRate 0.000435 Epoch: 16 Global Step: 336930 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:34:03,041-Speed 2498.05 samples/sec Loss 2.5641 LearningRate 0.000435 Epoch: 16 Global Step: 336940 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:34:11,243-Speed 2497.13 samples/sec Loss 2.5572 LearningRate 0.000435 Epoch: 16 Global Step: 336950 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:34:19,442-Speed 2498.49 samples/sec Loss 2.5615 LearningRate 0.000435 Epoch: 16 Global Step: 336960 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:34:27,595-Speed 2512.30 samples/sec Loss 2.5433 LearningRate 0.000435 Epoch: 16 Global Step: 336970 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:34:35,794-Speed 2498.17 samples/sec Loss 2.5689 LearningRate 0.000435 Epoch: 16 Global Step: 336980 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:34:44,003-Speed 2495.21 samples/sec Loss 2.5663 LearningRate 0.000435 Epoch: 16 Global Step: 336990 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:34:52,199-Speed 2499.25 samples/sec Loss 2.5677 LearningRate 0.000435 Epoch: 16 Global Step: 337000 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:35:00,402-Speed 2497.26 samples/sec Loss 2.5920 LearningRate 0.000435 Epoch: 16 Global Step: 337010 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:35:08,599-Speed 2498.56 samples/sec Loss 2.5212 LearningRate 0.000435 Epoch: 16 Global Step: 337020 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:35:16,749-Speed 2513.18 samples/sec Loss 2.5430 LearningRate 0.000435 Epoch: 16 Global Step: 337030 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:35:24,946-Speed 2498.80 samples/sec Loss 2.5007 LearningRate 0.000435 Epoch: 16 Global Step: 337040 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:35:33,149-Speed 2497.20 samples/sec Loss 2.5231 LearningRate 0.000435 Epoch: 16 Global Step: 337050 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:35:41,352-Speed 2497.11 samples/sec Loss 2.5249 LearningRate 0.000435 Epoch: 16 Global Step: 337060 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:35:49,551-Speed 2498.27 samples/sec Loss 2.4959 LearningRate 0.000435 Epoch: 16 Global Step: 337070 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:35:57,752-Speed 2497.65 samples/sec Loss 2.5732 LearningRate 0.000435 Epoch: 16 Global Step: 337080 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:36:05,904-Speed 2512.53 samples/sec Loss 2.5786 LearningRate 0.000435 Epoch: 16 Global Step: 337090 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:36:14,189-Speed 2472.37 samples/sec Loss 2.5962 LearningRate 0.000435 Epoch: 16 Global Step: 337100 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:36:22,401-Speed 2494.19 samples/sec Loss 2.5818 LearningRate 0.000435 Epoch: 16 Global Step: 337110 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:36:30,606-Speed 2496.69 samples/sec Loss 2.5892 LearningRate 0.000435 Epoch: 16 Global Step: 337120 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:36:38,812-Speed 2496.00 samples/sec Loss 2.5511 LearningRate 0.000435 Epoch: 16 Global Step: 337130 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:36:47,012-Speed 2498.31 samples/sec Loss 2.5674 LearningRate 0.000435 Epoch: 16 Global Step: 337140 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:36:55,168-Speed 2511.16 samples/sec Loss 2.5917 LearningRate 0.000435 Epoch: 16 Global Step: 337150 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:37:03,367-Speed 2498.53 samples/sec Loss 2.5357 LearningRate 0.000435 Epoch: 16 Global Step: 337160 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:37:11,566-Speed 2498.41 samples/sec Loss 2.5445 LearningRate 0.000435 Epoch: 16 Global Step: 337170 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:37:19,763-Speed 2498.68 samples/sec Loss 2.5315 LearningRate 0.000435 Epoch: 16 Global Step: 337180 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:37:27,958-Speed 2499.54 samples/sec Loss 2.5384 LearningRate 0.000435 Epoch: 16 Global Step: 337190 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:37:36,157-Speed 2498.47 samples/sec Loss 2.5574 LearningRate 0.000435 Epoch: 16 Global Step: 337200 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:37:44,301-Speed 2514.91 samples/sec Loss 2.5334 LearningRate 0.000435 Epoch: 16 Global Step: 337210 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:37:52,508-Speed 2495.93 samples/sec Loss 2.5474 LearningRate 0.000435 Epoch: 16 Global Step: 337220 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:38:00,706-Speed 2498.43 samples/sec Loss 2.4861 LearningRate 0.000435 Epoch: 16 Global Step: 337230 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:38:08,906-Speed 2497.99 samples/sec Loss 2.4910 LearningRate 0.000435 Epoch: 16 Global Step: 337240 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:38:17,106-Speed 2497.87 samples/sec Loss 2.5194 LearningRate 0.000435 Epoch: 16 Global Step: 337250 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:38:25,306-Speed 2498.03 samples/sec Loss 2.5555 LearningRate 0.000435 Epoch: 16 Global Step: 337260 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:38:33,449-Speed 2515.49 samples/sec Loss 2.4514 LearningRate 0.000435 Epoch: 16 Global Step: 337270 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:38:41,648-Speed 2498.44 samples/sec Loss 2.5534 LearningRate 0.000435 Epoch: 16 Global Step: 337280 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:38:49,850-Speed 2497.40 samples/sec Loss 2.5438 LearningRate 0.000435 Epoch: 16 Global Step: 337290 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:38:58,059-Speed 2495.05 samples/sec Loss 2.5140 LearningRate 0.000435 Epoch: 16 Global Step: 337300 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:39:06,270-Speed 2494.97 samples/sec Loss 2.4873 LearningRate 0.000435 Epoch: 16 Global Step: 337310 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:39:14,470-Speed 2497.92 samples/sec Loss 2.5278 LearningRate 0.000435 Epoch: 16 Global Step: 337320 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:39:22,616-Speed 2514.71 samples/sec Loss 2.5633 LearningRate 0.000435 Epoch: 16 Global Step: 337330 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:39:30,817-Speed 2497.65 samples/sec Loss 2.5582 LearningRate 0.000435 Epoch: 16 Global Step: 337340 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:39:39,021-Speed 2496.60 samples/sec Loss 2.5431 LearningRate 0.000435 Epoch: 16 Global Step: 337350 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:39:47,236-Speed 2493.57 samples/sec Loss 2.4931 LearningRate 0.000435 Epoch: 16 Global Step: 337360 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:39:55,439-Speed 2496.91 samples/sec Loss 2.5452 LearningRate 0.000435 Epoch: 16 Global Step: 337370 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:40:03,640-Speed 2497.60 samples/sec Loss 2.5074 LearningRate 0.000435 Epoch: 16 Global Step: 337380 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:40:11,786-Speed 2514.58 samples/sec Loss 2.5190 LearningRate 0.000435 Epoch: 16 Global Step: 337390 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:40:19,987-Speed 2497.72 samples/sec Loss 2.5455 LearningRate 0.000435 Epoch: 16 Global Step: 337400 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:40:28,190-Speed 2497.00 samples/sec Loss 2.4595 LearningRate 0.000435 Epoch: 16 Global Step: 337410 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:40:36,388-Speed 2498.98 samples/sec Loss 2.5368 LearningRate 0.000435 Epoch: 16 Global Step: 337420 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:40:44,592-Speed 2496.73 samples/sec Loss 2.5589 LearningRate 0.000434 Epoch: 16 Global Step: 337430 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:40:52,803-Speed 2494.66 samples/sec Loss 2.5312 LearningRate 0.000434 Epoch: 16 Global Step: 337440 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:41:00,954-Speed 2512.91 samples/sec Loss 2.5074 LearningRate 0.000434 Epoch: 16 Global Step: 337450 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:41:09,169-Speed 2493.51 samples/sec Loss 2.5478 LearningRate 0.000434 Epoch: 16 Global Step: 337460 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:41:17,382-Speed 2494.30 samples/sec Loss 2.5869 LearningRate 0.000434 Epoch: 16 Global Step: 337470 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:41:25,579-Speed 2498.70 samples/sec Loss 2.5082 LearningRate 0.000434 Epoch: 16 Global Step: 337480 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:41:33,786-Speed 2495.90 samples/sec Loss 2.4942 LearningRate 0.000434 Epoch: 16 Global Step: 337490 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:41:41,989-Speed 2496.97 samples/sec Loss 2.4908 LearningRate 0.000434 Epoch: 16 Global Step: 337500 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:41:50,135-Speed 2514.61 samples/sec Loss 2.5518 LearningRate 0.000434 Epoch: 16 Global Step: 337510 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:41:58,339-Speed 2496.79 samples/sec Loss 2.5275 LearningRate 0.000434 Epoch: 16 Global Step: 337520 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:42:06,542-Speed 2497.19 samples/sec Loss 2.5430 LearningRate 0.000434 Epoch: 16 Global Step: 337530 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:42:14,742-Speed 2498.08 samples/sec Loss 2.5475 LearningRate 0.000434 Epoch: 16 Global Step: 337540 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:42:22,946-Speed 2496.52 samples/sec Loss 2.5250 LearningRate 0.000434 Epoch: 16 Global Step: 337550 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:42:31,147-Speed 2497.58 samples/sec Loss 2.5035 LearningRate 0.000434 Epoch: 16 Global Step: 337560 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:42:39,295-Speed 2513.82 samples/sec Loss 2.5281 LearningRate 0.000434 Epoch: 16 Global Step: 337570 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:42:47,513-Speed 2492.84 samples/sec Loss 2.5242 LearningRate 0.000434 Epoch: 16 Global Step: 337580 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:42:55,711-Speed 2498.46 samples/sec Loss 2.5252 LearningRate 0.000434 Epoch: 16 Global Step: 337590 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:43:03,912-Speed 2497.73 samples/sec Loss 2.5123 LearningRate 0.000434 Epoch: 16 Global Step: 337600 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:43:12,145-Speed 2488.07 samples/sec Loss 2.5563 LearningRate 0.000434 Epoch: 16 Global Step: 337610 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:43:20,345-Speed 2498.00 samples/sec Loss 2.4991 LearningRate 0.000434 Epoch: 16 Global Step: 337620 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:43:28,495-Speed 2513.10 samples/sec Loss 2.5545 LearningRate 0.000434 Epoch: 16 Global Step: 337630 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:43:36,701-Speed 2496.04 samples/sec Loss 2.5370 LearningRate 0.000434 Epoch: 16 Global Step: 337640 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:43:44,900-Speed 2498.50 samples/sec Loss 2.5323 LearningRate 0.000434 Epoch: 16 Global Step: 337650 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:43:53,124-Speed 2490.80 samples/sec Loss 2.5335 LearningRate 0.000434 Epoch: 16 Global Step: 337660 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:44:01,324-Speed 2497.74 samples/sec Loss 2.5020 LearningRate 0.000434 Epoch: 16 Global Step: 337670 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:44:09,522-Speed 2498.62 samples/sec Loss 2.5454 LearningRate 0.000434 Epoch: 16 Global Step: 337680 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:44:17,671-Speed 2513.75 samples/sec Loss 2.6259 LearningRate 0.000434 Epoch: 16 Global Step: 337690 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:44:25,868-Speed 2498.90 samples/sec Loss 2.5311 LearningRate 0.000434 Epoch: 16 Global Step: 337700 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:44:34,067-Speed 2498.58 samples/sec Loss 2.5314 LearningRate 0.000434 Epoch: 16 Global Step: 337710 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:44:42,266-Speed 2498.55 samples/sec Loss 2.5904 LearningRate 0.000434 Epoch: 16 Global Step: 337720 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:44:50,464-Speed 2498.55 samples/sec Loss 2.6046 LearningRate 0.000434 Epoch: 16 Global Step: 337730 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:44:58,660-Speed 2499.00 samples/sec Loss 2.5660 LearningRate 0.000434 Epoch: 16 Global Step: 337740 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:45:06,806-Speed 2514.53 samples/sec Loss 2.5349 LearningRate 0.000434 Epoch: 16 Global Step: 337750 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:45:15,014-Speed 2495.71 samples/sec Loss 2.5003 LearningRate 0.000434 Epoch: 16 Global Step: 337760 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:45:23,218-Speed 2496.68 samples/sec Loss 2.6247 LearningRate 0.000434 Epoch: 16 Global Step: 337770 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:45:31,426-Speed 2495.49 samples/sec Loss 2.5311 LearningRate 0.000434 Epoch: 16 Global Step: 337780 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:45:39,622-Speed 2499.06 samples/sec Loss 2.5382 LearningRate 0.000434 Epoch: 16 Global Step: 337790 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:45:47,829-Speed 2495.92 samples/sec Loss 2.5964 LearningRate 0.000434 Epoch: 16 Global Step: 337800 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:45:55,979-Speed 2513.22 samples/sec Loss 2.5428 LearningRate 0.000434 Epoch: 16 Global Step: 337810 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:46:04,189-Speed 2495.06 samples/sec Loss 2.6166 LearningRate 0.000434 Epoch: 16 Global Step: 337820 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:46:12,386-Speed 2498.86 samples/sec Loss 2.6513 LearningRate 0.000434 Epoch: 16 Global Step: 337830 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:46:20,587-Speed 2497.56 samples/sec Loss 2.6068 LearningRate 0.000434 Epoch: 16 Global Step: 337840 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:46:28,796-Speed 2495.36 samples/sec Loss 2.5864 LearningRate 0.000434 Epoch: 16 Global Step: 337850 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:46:36,995-Speed 2498.26 samples/sec Loss 2.6414 LearningRate 0.000434 Epoch: 16 Global Step: 337860 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:46:45,141-Speed 2514.49 samples/sec Loss 2.5926 LearningRate 0.000434 Epoch: 16 Global Step: 337870 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:46:53,341-Speed 2498.09 samples/sec Loss 2.5792 LearningRate 0.000434 Epoch: 16 Global Step: 337880 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:47:01,541-Speed 2497.84 samples/sec Loss 2.5316 LearningRate 0.000434 Epoch: 16 Global Step: 337890 Fp16 Grad Scale: 32768 Required: 113 hours
Training: 2022-07-08 19:47:09,699-Speed 2511.15 samples/sec Loss 2.5502 LearningRate 0.000434 Epoch: 16 Global Step: 337900 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:47:17,912-Speed 2493.87 samples/sec Loss 2.5313 LearningRate 0.000434 Epoch: 16 Global Step: 337910 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:47:26,121-Speed 2495.43 samples/sec Loss 2.5975 LearningRate 0.000434 Epoch: 16 Global Step: 337920 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:47:34,265-Speed 2515.22 samples/sec Loss 2.5350 LearningRate 0.000434 Epoch: 16 Global Step: 337930 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:47:42,461-Speed 2499.25 samples/sec Loss 2.5452 LearningRate 0.000434 Epoch: 16 Global Step: 337940 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:47:50,663-Speed 2497.30 samples/sec Loss 2.5689 LearningRate 0.000434 Epoch: 16 Global Step: 337950 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:47:58,870-Speed 2495.89 samples/sec Loss 2.5756 LearningRate 0.000434 Epoch: 16 Global Step: 337960 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:48:07,075-Speed 2496.60 samples/sec Loss 2.5444 LearningRate 0.000434 Epoch: 16 Global Step: 337970 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:48:15,287-Speed 2494.17 samples/sec Loss 2.5484 LearningRate 0.000434 Epoch: 16 Global Step: 337980 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:48:23,431-Speed 2515.13 samples/sec Loss 2.5323 LearningRate 0.000434 Epoch: 16 Global Step: 337990 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:48:31,630-Speed 2498.29 samples/sec Loss 2.6211 LearningRate 0.000433 Epoch: 16 Global Step: 338000 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:48:39,864-Speed 2487.61 samples/sec Loss 2.5453 LearningRate 0.000433 Epoch: 16 Global Step: 338010 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:48:48,066-Speed 2497.59 samples/sec Loss 2.6076 LearningRate 0.000433 Epoch: 16 Global Step: 338020 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:48:56,275-Speed 2495.08 samples/sec Loss 2.5614 LearningRate 0.000433 Epoch: 16 Global Step: 338030 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:49:04,489-Speed 2493.67 samples/sec Loss 2.5626 LearningRate 0.000433 Epoch: 16 Global Step: 338040 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:49:12,633-Speed 2515.24 samples/sec Loss 2.5420 LearningRate 0.000433 Epoch: 16 Global Step: 338050 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:49:20,833-Speed 2497.71 samples/sec Loss 2.5537 LearningRate 0.000433 Epoch: 16 Global Step: 338060 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:49:29,042-Speed 2495.57 samples/sec Loss 2.5813 LearningRate 0.000433 Epoch: 16 Global Step: 338070 Fp16 Grad Scale: 16384 Required: 113 hours
Training: 2022-07-08 19:49:37,241-Speed 2498.13 samples/sec Loss 2.5768 LearningRate 0.000433 Epoch: 16 Global Step: 338080 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:49:45,455-Speed 2493.88 samples/sec Loss 2.5365 LearningRate 0.000433 Epoch: 16 Global Step: 338090 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:49:53,660-Speed 2496.56 samples/sec Loss 2.5269 LearningRate 0.000433 Epoch: 16 Global Step: 338100 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:50:01,804-Speed 2514.99 samples/sec Loss 2.5074 LearningRate 0.000433 Epoch: 16 Global Step: 338110 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:50:10,004-Speed 2497.93 samples/sec Loss 2.5643 LearningRate 0.000433 Epoch: 16 Global Step: 338120 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:50:18,217-Speed 2494.09 samples/sec Loss 2.5154 LearningRate 0.000433 Epoch: 16 Global Step: 338130 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:50:26,417-Speed 2497.80 samples/sec Loss 2.5813 LearningRate 0.000433 Epoch: 16 Global Step: 338140 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:50:34,624-Speed 2496.03 samples/sec Loss 2.5438 LearningRate 0.000433 Epoch: 16 Global Step: 338150 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:50:42,822-Speed 2498.66 samples/sec Loss 2.5483 LearningRate 0.000433 Epoch: 16 Global Step: 338160 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:50:50,979-Speed 2511.11 samples/sec Loss 2.5886 LearningRate 0.000433 Epoch: 16 Global Step: 338170 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:50:59,178-Speed 2498.06 samples/sec Loss 2.6003 LearningRate 0.000433 Epoch: 16 Global Step: 338180 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:51:07,382-Speed 2496.91 samples/sec Loss 2.5313 LearningRate 0.000433 Epoch: 16 Global Step: 338190 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:51:15,587-Speed 2496.39 samples/sec Loss 2.6205 LearningRate 0.000433 Epoch: 16 Global Step: 338200 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:51:23,788-Speed 2497.70 samples/sec Loss 2.6535 LearningRate 0.000433 Epoch: 16 Global Step: 338210 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:51:31,989-Speed 2497.60 samples/sec Loss 2.6114 LearningRate 0.000433 Epoch: 16 Global Step: 338220 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:51:40,153-Speed 2508.99 samples/sec Loss 2.6162 LearningRate 0.000433 Epoch: 16 Global Step: 338230 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:51:48,357-Speed 2496.95 samples/sec Loss 2.6045 LearningRate 0.000433 Epoch: 16 Global Step: 338240 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:51:56,555-Speed 2498.37 samples/sec Loss 2.6337 LearningRate 0.000433 Epoch: 16 Global Step: 338250 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:52:04,755-Speed 2498.03 samples/sec Loss 2.5308 LearningRate 0.000433 Epoch: 16 Global Step: 338260 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:52:12,966-Speed 2494.68 samples/sec Loss 2.5558 LearningRate 0.000433 Epoch: 16 Global Step: 338270 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:52:21,170-Speed 2496.63 samples/sec Loss 2.5783 LearningRate 0.000433 Epoch: 16 Global Step: 338280 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:52:29,313-Speed 2515.57 samples/sec Loss 2.4805 LearningRate 0.000433 Epoch: 16 Global Step: 338290 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:52:37,513-Speed 2498.30 samples/sec Loss 2.4982 LearningRate 0.000433 Epoch: 16 Global Step: 338300 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:52:45,715-Speed 2497.14 samples/sec Loss 2.5516 LearningRate 0.000433 Epoch: 16 Global Step: 338310 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:52:53,913-Speed 2498.74 samples/sec Loss 2.5363 LearningRate 0.000433 Epoch: 16 Global Step: 338320 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:53:02,124-Speed 2494.71 samples/sec Loss 2.5643 LearningRate 0.000433 Epoch: 16 Global Step: 338330 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:53:10,330-Speed 2496.14 samples/sec Loss 2.6203 LearningRate 0.000433 Epoch: 16 Global Step: 338340 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:53:18,477-Speed 2514.25 samples/sec Loss 2.5312 LearningRate 0.000433 Epoch: 16 Global Step: 338350 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:53:26,677-Speed 2497.61 samples/sec Loss 2.5569 LearningRate 0.000433 Epoch: 16 Global Step: 338360 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:53:34,877-Speed 2497.98 samples/sec Loss 2.5258 LearningRate 0.000433 Epoch: 16 Global Step: 338370 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:53:43,075-Speed 2498.90 samples/sec Loss 2.5431 LearningRate 0.000433 Epoch: 16 Global Step: 338380 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:53:51,276-Speed 2497.75 samples/sec Loss 2.5655 LearningRate 0.000433 Epoch: 16 Global Step: 338390 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:53:59,476-Speed 2497.89 samples/sec Loss 2.5170 LearningRate 0.000433 Epoch: 16 Global Step: 338400 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:54:07,626-Speed 2513.31 samples/sec Loss 2.5045 LearningRate 0.000433 Epoch: 16 Global Step: 338410 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:54:15,824-Speed 2498.85 samples/sec Loss 2.5058 LearningRate 0.000433 Epoch: 16 Global Step: 338420 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:54:24,043-Speed 2492.33 samples/sec Loss 2.5084 LearningRate 0.000433 Epoch: 16 Global Step: 338430 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:54:32,248-Speed 2496.33 samples/sec Loss 2.4977 LearningRate 0.000433 Epoch: 16 Global Step: 338440 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:54:40,446-Speed 2498.61 samples/sec Loss 2.5361 LearningRate 0.000433 Epoch: 16 Global Step: 338450 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:54:48,649-Speed 2497.24 samples/sec Loss 2.5000 LearningRate 0.000433 Epoch: 16 Global Step: 338460 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:54:56,795-Speed 2514.34 samples/sec Loss 2.4502 LearningRate 0.000433 Epoch: 16 Global Step: 338470 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:55:04,995-Speed 2498.38 samples/sec Loss 2.5021 LearningRate 0.000433 Epoch: 16 Global Step: 338480 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:55:13,196-Speed 2497.68 samples/sec Loss 2.5230 LearningRate 0.000433 Epoch: 16 Global Step: 338490 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:55:21,392-Speed 2499.05 samples/sec Loss 2.5401 LearningRate 0.000433 Epoch: 16 Global Step: 338500 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:55:29,590-Speed 2498.59 samples/sec Loss 2.4935 LearningRate 0.000433 Epoch: 16 Global Step: 338510 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:55:37,798-Speed 2495.59 samples/sec Loss 2.6079 LearningRate 0.000433 Epoch: 16 Global Step: 338520 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:55:45,954-Speed 2511.47 samples/sec Loss 2.5350 LearningRate 0.000433 Epoch: 16 Global Step: 338530 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:55:54,154-Speed 2497.94 samples/sec Loss 2.5379 LearningRate 0.000433 Epoch: 16 Global Step: 338540 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:56:02,352-Speed 2498.74 samples/sec Loss 2.5180 LearningRate 0.000433 Epoch: 16 Global Step: 338550 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:56:10,555-Speed 2496.84 samples/sec Loss 2.5730 LearningRate 0.000432 Epoch: 16 Global Step: 338560 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:56:18,756-Speed 2497.67 samples/sec Loss 2.5412 LearningRate 0.000432 Epoch: 16 Global Step: 338570 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:56:26,962-Speed 2496.09 samples/sec Loss 2.5357 LearningRate 0.000432 Epoch: 16 Global Step: 338580 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:56:35,106-Speed 2515.14 samples/sec Loss 2.5177 LearningRate 0.000432 Epoch: 16 Global Step: 338590 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:56:43,306-Speed 2498.09 samples/sec Loss 2.5521 LearningRate 0.000432 Epoch: 16 Global Step: 338600 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:56:51,507-Speed 2497.64 samples/sec Loss 2.5371 LearningRate 0.000432 Epoch: 16 Global Step: 338610 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:56:59,706-Speed 2498.26 samples/sec Loss 2.4966 LearningRate 0.000432 Epoch: 16 Global Step: 338620 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:57:07,919-Speed 2493.94 samples/sec Loss 2.5321 LearningRate 0.000432 Epoch: 16 Global Step: 338630 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:57:16,119-Speed 2497.91 samples/sec Loss 2.5300 LearningRate 0.000432 Epoch: 16 Global Step: 338640 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:57:24,271-Speed 2512.61 samples/sec Loss 2.5094 LearningRate 0.000432 Epoch: 16 Global Step: 338650 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:57:32,482-Speed 2494.57 samples/sec Loss 2.5102 LearningRate 0.000432 Epoch: 16 Global Step: 338660 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:57:40,680-Speed 2498.72 samples/sec Loss 2.5210 LearningRate 0.000432 Epoch: 16 Global Step: 338670 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:57:48,879-Speed 2498.32 samples/sec Loss 2.5029 LearningRate 0.000432 Epoch: 16 Global Step: 338680 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:57:57,074-Speed 2499.46 samples/sec Loss 2.5046 LearningRate 0.000432 Epoch: 16 Global Step: 338690 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:58:05,285-Speed 2494.51 samples/sec Loss 2.5013 LearningRate 0.000432 Epoch: 16 Global Step: 338700 Fp16 Grad Scale: 16384 Required: 112 hours
Training: 2022-07-08 19:58:13,446-Speed 2509.79 samples/sec Loss
2.5359 LearningRate 0.000432 Epoch: 16 Global Step: 338710 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 19:58:21,644-Speed 2498.61 samples/sec Loss 2.5478 LearningRate 0.000432 Epoch: 16 Global Step: 338720 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 19:58:29,847-Speed 2497.00 samples/sec Loss 2.4916 LearningRate 0.000432 Epoch: 16 Global Step: 338730 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 19:58:38,065-Speed 2492.61 samples/sec Loss 2.4873 LearningRate 0.000432 Epoch: 16 Global Step: 338740 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 19:58:46,265-Speed 2497.91 samples/sec Loss 2.5358 LearningRate 0.000432 Epoch: 16 Global Step: 338750 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 19:58:54,469-Speed 2496.73 samples/sec Loss 2.5458 LearningRate 0.000432 Epoch: 16 Global Step: 338760 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 19:59:02,621-Speed 2512.88 samples/sec Loss 2.5042 LearningRate 0.000432 Epoch: 16 Global Step: 338770 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 19:59:10,823-Speed 2497.18 samples/sec Loss 2.5429 LearningRate 0.000432 Epoch: 16 Global Step: 338780 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 19:59:19,023-Speed 2498.13 samples/sec Loss 2.5140 LearningRate 0.000432 Epoch: 16 Global Step: 338790 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 19:59:27,224-Speed 2497.65 samples/sec Loss 2.5283 LearningRate 0.000432 Epoch: 16 Global Step: 338800 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 19:59:35,424-Speed 2497.95 samples/sec Loss 2.5465 LearningRate 0.000432 Epoch: 16 Global Step: 338810 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 19:59:43,624-Speed 2497.97 samples/sec Loss 2.5908 LearningRate 0.000432 Epoch: 16 Global Step: 338820 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 19:59:51,768-Speed 2515.21 samples/sec 
Loss 2.5366 LearningRate 0.000432 Epoch: 16 Global Step: 338830 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 19:59:59,968-Speed 2498.04 samples/sec Loss 2.5253 LearningRate 0.000432 Epoch: 16 Global Step: 338840 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:00:08,166-Speed 2498.54 samples/sec Loss 2.6370 LearningRate 0.000432 Epoch: 16 Global Step: 338850 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:00:16,381-Speed 2493.55 samples/sec Loss 2.5415 LearningRate 0.000432 Epoch: 16 Global Step: 338860 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:00:24,589-Speed 2495.78 samples/sec Loss 2.5199 LearningRate 0.000432 Epoch: 16 Global Step: 338870 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:00:32,788-Speed 2497.97 samples/sec Loss 2.5267 LearningRate 0.000432 Epoch: 16 Global Step: 338880 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:00:40,934-Speed 2514.65 samples/sec Loss 2.4926 LearningRate 0.000432 Epoch: 16 Global Step: 338890 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:00:49,134-Speed 2497.86 samples/sec Loss 2.5516 LearningRate 0.000432 Epoch: 16 Global Step: 338900 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:00:57,332-Speed 2498.71 samples/sec Loss 2.5577 LearningRate 0.000432 Epoch: 16 Global Step: 338910 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:01:05,534-Speed 2497.27 samples/sec Loss 2.5501 LearningRate 0.000432 Epoch: 16 Global Step: 338920 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:01:13,735-Speed 2497.64 samples/sec Loss 2.5790 LearningRate 0.000432 Epoch: 16 Global Step: 338930 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:01:21,949-Speed 2493.78 samples/sec Loss 2.5274 LearningRate 0.000432 Epoch: 16 Global Step: 338940 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:01:30,096-Speed 2514.35 
samples/sec Loss 2.5679 LearningRate 0.000432 Epoch: 16 Global Step: 338950 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:01:38,299-Speed 2496.94 samples/sec Loss 2.5193 LearningRate 0.000432 Epoch: 16 Global Step: 338960 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:01:46,496-Speed 2498.60 samples/sec Loss 2.5078 LearningRate 0.000432 Epoch: 16 Global Step: 338970 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:01:54,700-Speed 2497.23 samples/sec Loss 2.5200 LearningRate 0.000432 Epoch: 16 Global Step: 338980 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:02:02,911-Speed 2494.55 samples/sec Loss 2.5320 LearningRate 0.000432 Epoch: 16 Global Step: 338990 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:02:11,111-Speed 2497.73 samples/sec Loss 2.5488 LearningRate 0.000432 Epoch: 16 Global Step: 339000 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:02:19,270-Speed 2510.44 samples/sec Loss 2.5148 LearningRate 0.000432 Epoch: 16 Global Step: 339010 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:02:27,469-Speed 2498.21 samples/sec Loss 2.4836 LearningRate 0.000432 Epoch: 16 Global Step: 339020 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:02:35,673-Speed 2497.12 samples/sec Loss 2.5813 LearningRate 0.000432 Epoch: 16 Global Step: 339030 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:02:43,874-Speed 2497.30 samples/sec Loss 2.5142 LearningRate 0.000432 Epoch: 16 Global Step: 339040 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:02:52,076-Speed 2497.24 samples/sec Loss 2.5427 LearningRate 0.000432 Epoch: 16 Global Step: 339050 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:03:00,273-Speed 2499.22 samples/sec Loss 2.5441 LearningRate 0.000432 Epoch: 16 Global Step: 339060 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:03:08,426-Speed 
2512.20 samples/sec Loss 2.5935 LearningRate 0.000432 Epoch: 16 Global Step: 339070 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:03:16,628-Speed 2497.25 samples/sec Loss 2.5724 LearningRate 0.000432 Epoch: 16 Global Step: 339080 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:03:24,830-Speed 2497.37 samples/sec Loss 2.5414 LearningRate 0.000432 Epoch: 16 Global Step: 339090 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:03:33,032-Speed 2497.44 samples/sec Loss 2.5660 LearningRate 0.000432 Epoch: 16 Global Step: 339100 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:03:41,235-Speed 2497.08 samples/sec Loss 2.5994 LearningRate 0.000432 Epoch: 16 Global Step: 339110 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:03:49,437-Speed 2497.34 samples/sec Loss 2.5355 LearningRate 0.000432 Epoch: 16 Global Step: 339120 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:03:57,583-Speed 2514.60 samples/sec Loss 2.5464 LearningRate 0.000431 Epoch: 16 Global Step: 339130 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:04:05,784-Speed 2497.53 samples/sec Loss 2.5845 LearningRate 0.000431 Epoch: 16 Global Step: 339140 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:04:13,985-Speed 2497.68 samples/sec Loss 2.6105 LearningRate 0.000431 Epoch: 16 Global Step: 339150 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:04:22,194-Speed 2495.04 samples/sec Loss 2.5440 LearningRate 0.000431 Epoch: 16 Global Step: 339160 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:04:30,394-Speed 2498.05 samples/sec Loss 2.5186 LearningRate 0.000431 Epoch: 16 Global Step: 339170 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:04:38,594-Speed 2498.00 samples/sec Loss 2.6099 LearningRate 0.000431 Epoch: 16 Global Step: 339180 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 
20:04:46,748-Speed 2512.03 samples/sec Loss 2.5890 LearningRate 0.000431 Epoch: 16 Global Step: 339190 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:04:54,951-Speed 2497.23 samples/sec Loss 2.6449 LearningRate 0.000431 Epoch: 16 Global Step: 339200 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:05:03,155-Speed 2496.50 samples/sec Loss 2.5967 LearningRate 0.000431 Epoch: 16 Global Step: 339210 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:05:11,353-Speed 2498.46 samples/sec Loss 2.6087 LearningRate 0.000431 Epoch: 16 Global Step: 339220 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:05:19,559-Speed 2496.15 samples/sec Loss 2.5553 LearningRate 0.000431 Epoch: 16 Global Step: 339230 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:05:27,759-Speed 2498.95 samples/sec Loss 2.6082 LearningRate 0.000431 Epoch: 16 Global Step: 339240 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:05:35,906-Speed 2514.30 samples/sec Loss 2.5557 LearningRate 0.000431 Epoch: 16 Global Step: 339250 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:05:44,121-Speed 2493.19 samples/sec Loss 2.5212 LearningRate 0.000431 Epoch: 16 Global Step: 339260 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:05:52,345-Speed 2490.56 samples/sec Loss 2.5968 LearningRate 0.000431 Epoch: 16 Global Step: 339270 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:06:00,546-Speed 2497.92 samples/sec Loss 2.5534 LearningRate 0.000431 Epoch: 16 Global Step: 339280 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:06:08,746-Speed 2497.78 samples/sec Loss 2.5788 LearningRate 0.000431 Epoch: 16 Global Step: 339290 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:06:16,957-Speed 2494.52 samples/sec Loss 2.5527 LearningRate 0.000431 Epoch: 16 Global Step: 339300 Fp16 Grad Scale: 32768 Required: 112 hours Training: 
2022-07-08 20:06:25,101-Speed 2515.22 samples/sec Loss 2.5397 LearningRate 0.000431 Epoch: 16 Global Step: 339310 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:06:33,303-Speed 2497.91 samples/sec Loss 2.5268 LearningRate 0.000431 Epoch: 16 Global Step: 339320 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:06:41,501-Speed 2498.34 samples/sec Loss 2.5241 LearningRate 0.000431 Epoch: 16 Global Step: 339330 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:06:49,706-Speed 2496.54 samples/sec Loss 2.5602 LearningRate 0.000431 Epoch: 16 Global Step: 339340 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:06:57,913-Speed 2495.76 samples/sec Loss 2.5156 LearningRate 0.000431 Epoch: 16 Global Step: 339350 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:07:06,119-Speed 2496.11 samples/sec Loss 2.5253 LearningRate 0.000431 Epoch: 16 Global Step: 339360 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:07:14,266-Speed 2514.10 samples/sec Loss 2.4827 LearningRate 0.000431 Epoch: 16 Global Step: 339370 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:07:22,464-Speed 2498.43 samples/sec Loss 2.5140 LearningRate 0.000431 Epoch: 16 Global Step: 339380 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:07:30,667-Speed 2497.13 samples/sec Loss 2.4599 LearningRate 0.000431 Epoch: 16 Global Step: 339390 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:07:38,865-Speed 2498.54 samples/sec Loss 2.4333 LearningRate 0.000431 Epoch: 16 Global Step: 339400 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:07:47,084-Speed 2492.48 samples/sec Loss 2.4595 LearningRate 0.000431 Epoch: 16 Global Step: 339410 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:07:55,282-Speed 2498.36 samples/sec Loss 2.5518 LearningRate 0.000431 Epoch: 16 Global Step: 339420 Fp16 Grad Scale: 32768 Required: 112 hours 
Training: 2022-07-08 20:08:03,426-Speed 2514.98 samples/sec Loss 2.5079 LearningRate 0.000431 Epoch: 16 Global Step: 339430 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:08:11,626-Speed 2498.09 samples/sec Loss 2.5029 LearningRate 0.000431 Epoch: 16 Global Step: 339440 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:08:19,826-Speed 2498.18 samples/sec Loss 2.4479 LearningRate 0.000431 Epoch: 16 Global Step: 339450 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:08:28,023-Speed 2499.09 samples/sec Loss 2.5465 LearningRate 0.000431 Epoch: 16 Global Step: 339460 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:08:36,226-Speed 2497.20 samples/sec Loss 2.5346 LearningRate 0.000431 Epoch: 16 Global Step: 339470 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:08:44,427-Speed 2497.70 samples/sec Loss 2.5320 LearningRate 0.000431 Epoch: 16 Global Step: 339480 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:08:52,575-Speed 2514.04 samples/sec Loss 2.5274 LearningRate 0.000431 Epoch: 16 Global Step: 339490 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:09:00,781-Speed 2495.97 samples/sec Loss 2.5075 LearningRate 0.000431 Epoch: 16 Global Step: 339500 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:09:08,978-Speed 2499.17 samples/sec Loss 2.5362 LearningRate 0.000431 Epoch: 16 Global Step: 339510 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:09:17,177-Speed 2498.04 samples/sec Loss 2.5025 LearningRate 0.000431 Epoch: 16 Global Step: 339520 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:09:25,377-Speed 2498.11 samples/sec Loss 2.5008 LearningRate 0.000431 Epoch: 16 Global Step: 339530 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:09:33,581-Speed 2496.68 samples/sec Loss 2.5297 LearningRate 0.000431 Epoch: 16 Global Step: 339540 Fp16 Grad Scale: 32768 Required: 112 
hours Training: 2022-07-08 20:09:41,731-Speed 2513.38 samples/sec Loss 2.5237 LearningRate 0.000431 Epoch: 16 Global Step: 339550 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:09:49,932-Speed 2497.57 samples/sec Loss 2.5116 LearningRate 0.000431 Epoch: 16 Global Step: 339560 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:09:58,131-Speed 2498.53 samples/sec Loss 2.5833 LearningRate 0.000431 Epoch: 16 Global Step: 339570 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:10:06,329-Speed 2498.65 samples/sec Loss 2.5134 LearningRate 0.000431 Epoch: 16 Global Step: 339580 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:10:14,531-Speed 2497.17 samples/sec Loss 2.5210 LearningRate 0.000431 Epoch: 16 Global Step: 339590 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:10:22,747-Speed 2493.21 samples/sec Loss 2.6407 LearningRate 0.000431 Epoch: 16 Global Step: 339600 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:10:30,898-Speed 2513.40 samples/sec Loss 2.5199 LearningRate 0.000431 Epoch: 16 Global Step: 339610 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:10:39,100-Speed 2497.35 samples/sec Loss 2.5314 LearningRate 0.000431 Epoch: 16 Global Step: 339620 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:10:47,310-Speed 2494.96 samples/sec Loss 2.5486 LearningRate 0.000431 Epoch: 16 Global Step: 339630 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:10:55,508-Speed 2498.71 samples/sec Loss 2.5585 LearningRate 0.000431 Epoch: 16 Global Step: 339640 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:11:03,707-Speed 2498.24 samples/sec Loss 2.5597 LearningRate 0.000431 Epoch: 16 Global Step: 339650 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:11:11,909-Speed 2497.15 samples/sec Loss 2.5247 LearningRate 0.000431 Epoch: 16 Global Step: 339660 Fp16 Grad Scale: 32768 Required: 
112 hours Training: 2022-07-08 20:11:20,058-Speed 2513.67 samples/sec Loss 2.5264 LearningRate 0.000431 Epoch: 16 Global Step: 339670 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:11:28,256-Speed 2498.69 samples/sec Loss 2.5306 LearningRate 0.000431 Epoch: 16 Global Step: 339680 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:11:36,461-Speed 2496.35 samples/sec Loss 2.4936 LearningRate 0.000431 Epoch: 16 Global Step: 339690 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:11:44,664-Speed 2497.26 samples/sec Loss 2.5435 LearningRate 0.000430 Epoch: 16 Global Step: 339700 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:11:52,869-Speed 2496.50 samples/sec Loss 2.5166 LearningRate 0.000430 Epoch: 16 Global Step: 339710 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:12:01,078-Speed 2495.26 samples/sec Loss 2.5533 LearningRate 0.000430 Epoch: 16 Global Step: 339720 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:12:09,223-Speed 2514.87 samples/sec Loss 2.5492 LearningRate 0.000430 Epoch: 16 Global Step: 339730 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:12:17,429-Speed 2496.35 samples/sec Loss 2.5490 LearningRate 0.000430 Epoch: 16 Global Step: 339740 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:12:25,629-Speed 2498.03 samples/sec Loss 2.5426 LearningRate 0.000430 Epoch: 16 Global Step: 339750 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:12:33,827-Speed 2498.58 samples/sec Loss 2.5601 LearningRate 0.000430 Epoch: 16 Global Step: 339760 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:12:42,052-Speed 2490.15 samples/sec Loss 2.5684 LearningRate 0.000430 Epoch: 16 Global Step: 339770 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:12:50,258-Speed 2496.25 samples/sec Loss 2.5437 LearningRate 0.000430 Epoch: 16 Global Step: 339780 Fp16 Grad Scale: 32768 
Required: 112 hours Training: 2022-07-08 20:12:58,405-Speed 2514.26 samples/sec Loss 2.5834 LearningRate 0.000430 Epoch: 16 Global Step: 339790 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:13:06,606-Speed 2498.07 samples/sec Loss 2.5541 LearningRate 0.000430 Epoch: 16 Global Step: 339800 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:13:14,804-Speed 2498.79 samples/sec Loss 2.5425 LearningRate 0.000430 Epoch: 16 Global Step: 339810 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:13:23,005-Speed 2497.66 samples/sec Loss 2.5128 LearningRate 0.000430 Epoch: 16 Global Step: 339820 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:13:31,205-Speed 2498.03 samples/sec Loss 2.5168 LearningRate 0.000430 Epoch: 16 Global Step: 339830 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:13:39,406-Speed 2497.66 samples/sec Loss 2.4806 LearningRate 0.000430 Epoch: 16 Global Step: 339840 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:13:47,554-Speed 2514.01 samples/sec Loss 2.4747 LearningRate 0.000430 Epoch: 16 Global Step: 339850 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:13:55,754-Speed 2497.98 samples/sec Loss 2.5361 LearningRate 0.000430 Epoch: 16 Global Step: 339860 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:14:03,954-Speed 2497.98 samples/sec Loss 2.5344 LearningRate 0.000430 Epoch: 16 Global Step: 339870 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:14:12,154-Speed 2497.85 samples/sec Loss 2.5805 LearningRate 0.000430 Epoch: 16 Global Step: 339880 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:14:20,352-Speed 2498.67 samples/sec Loss 2.4448 LearningRate 0.000430 Epoch: 16 Global Step: 339890 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:14:28,552-Speed 2497.90 samples/sec Loss 2.4947 LearningRate 0.000430 Epoch: 16 Global Step: 339900 Fp16 Grad Scale: 
32768 Required: 112 hours Training: 2022-07-08 20:14:36,701-Speed 2513.52 samples/sec Loss 2.5118 LearningRate 0.000430 Epoch: 16 Global Step: 339910 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:14:44,899-Speed 2498.69 samples/sec Loss 2.5622 LearningRate 0.000430 Epoch: 16 Global Step: 339920 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:14:53,100-Speed 2497.48 samples/sec Loss 2.6079 LearningRate 0.000430 Epoch: 16 Global Step: 339930 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:15:01,300-Speed 2497.90 samples/sec Loss 2.6222 LearningRate 0.000430 Epoch: 16 Global Step: 339940 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:15:09,499-Speed 2498.29 samples/sec Loss 2.5961 LearningRate 0.000430 Epoch: 16 Global Step: 339950 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:15:17,696-Speed 2498.89 samples/sec Loss 2.5781 LearningRate 0.000430 Epoch: 16 Global Step: 339960 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:15:25,841-Speed 2514.93 samples/sec Loss 2.5378 LearningRate 0.000430 Epoch: 16 Global Step: 339970 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:15:34,040-Speed 2498.17 samples/sec Loss 2.5237 LearningRate 0.000430 Epoch: 16 Global Step: 339980 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:15:42,237-Speed 2499.06 samples/sec Loss 2.5323 LearningRate 0.000430 Epoch: 16 Global Step: 339990 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:15:50,434-Speed 2498.76 samples/sec Loss 2.5638 LearningRate 0.000430 Epoch: 16 Global Step: 340000 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:15:58,639-Speed 2496.94 samples/sec Loss 2.5590 LearningRate 0.000430 Epoch: 16 Global Step: 340010 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:16:06,841-Speed 2497.39 samples/sec Loss 2.5079 LearningRate 0.000430 Epoch: 16 Global Step: 340020 Fp16 Grad 
Scale: 32768 Required: 112 hours Training: 2022-07-08 20:16:14,993-Speed 2512.56 samples/sec Loss 2.5151 LearningRate 0.000430 Epoch: 16 Global Step: 340030 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:16:23,191-Speed 2498.76 samples/sec Loss 2.5562 LearningRate 0.000430 Epoch: 16 Global Step: 340040 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:16:31,390-Speed 2498.00 samples/sec Loss 2.5226 LearningRate 0.000430 Epoch: 16 Global Step: 340050 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:16:39,590-Speed 2497.95 samples/sec Loss 2.5767 LearningRate 0.000430 Epoch: 16 Global Step: 340060 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:16:47,792-Speed 2497.35 samples/sec Loss 2.5549 LearningRate 0.000430 Epoch: 16 Global Step: 340070 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:16:55,993-Speed 2497.70 samples/sec Loss 2.5267 LearningRate 0.000430 Epoch: 16 Global Step: 340080 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:17:04,141-Speed 2514.09 samples/sec Loss 2.5578 LearningRate 0.000430 Epoch: 16 Global Step: 340090 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:17:12,345-Speed 2496.84 samples/sec Loss 2.5203 LearningRate 0.000430 Epoch: 16 Global Step: 340100 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:17:20,545-Speed 2498.02 samples/sec Loss 2.5348 LearningRate 0.000430 Epoch: 16 Global Step: 340110 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:17:28,744-Speed 2498.05 samples/sec Loss 2.5519 LearningRate 0.000430 Epoch: 16 Global Step: 340120 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:17:36,948-Speed 2497.09 samples/sec Loss 2.5372 LearningRate 0.000430 Epoch: 16 Global Step: 340130 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:17:45,144-Speed 2498.97 samples/sec Loss 2.5030 LearningRate 0.000430 Epoch: 16 Global Step: 340140 Fp16 
Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:17:53,291-Speed 2514.44 samples/sec Loss 2.5326 LearningRate 0.000430 Epoch: 16 Global Step: 340150 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:18:01,491-Speed 2497.64 samples/sec Loss 2.5728 LearningRate 0.000430 Epoch: 16 Global Step: 340160 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:18:09,718-Speed 2489.77 samples/sec Loss 2.5419 LearningRate 0.000430 Epoch: 16 Global Step: 340170 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:18:17,917-Speed 2498.32 samples/sec Loss 2.4966 LearningRate 0.000430 Epoch: 16 Global Step: 340180 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:18:26,118-Speed 2497.68 samples/sec Loss 2.5290 LearningRate 0.000430 Epoch: 16 Global Step: 340190 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:18:34,315-Speed 2498.78 samples/sec Loss 2.5317 LearningRate 0.000430 Epoch: 16 Global Step: 340200 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:18:42,467-Speed 2513.07 samples/sec Loss 2.5557 LearningRate 0.000430 Epoch: 16 Global Step: 340210 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:18:50,665-Speed 2498.39 samples/sec Loss 2.5135 LearningRate 0.000430 Epoch: 16 Global Step: 340220 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:18:58,865-Speed 2498.06 samples/sec Loss 2.4888 LearningRate 0.000430 Epoch: 16 Global Step: 340230 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:19:07,075-Speed 2495.10 samples/sec Loss 2.5713 LearningRate 0.000430 Epoch: 16 Global Step: 340240 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:19:15,271-Speed 2499.06 samples/sec Loss 2.5488 LearningRate 0.000430 Epoch: 16 Global Step: 340250 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:19:23,477-Speed 2496.00 samples/sec Loss 2.4800 LearningRate 0.000430 Epoch: 16 Global Step: 340260 
Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:19:31,620-Speed 2515.33 samples/sec Loss 2.5029 LearningRate 0.000429 Epoch: 16 Global Step: 340270 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:19:39,821-Speed 2498.05 samples/sec Loss 2.5363 LearningRate 0.000429 Epoch: 16 Global Step: 340280 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:19:48,030-Speed 2495.31 samples/sec Loss 2.5606 LearningRate 0.000429 Epoch: 16 Global Step: 340290 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:19:56,231-Speed 2497.78 samples/sec Loss 2.5092 LearningRate 0.000429 Epoch: 16 Global Step: 340300 Fp16 Grad Scale: 65536 Required: 112 hours Training: 2022-07-08 20:20:04,432-Speed 2497.61 samples/sec Loss 2.4890 LearningRate 0.000429 Epoch: 16 Global Step: 340310 Fp16 Grad Scale: 65536 Required: 112 hours Training: 2022-07-08 20:20:12,588-Speed 2511.43 samples/sec Loss 2.5311 LearningRate 0.000429 Epoch: 16 Global Step: 340320 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:20:20,734-Speed 2514.50 samples/sec Loss 2.5045 LearningRate 0.000429 Epoch: 16 Global Step: 340330 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:20:28,930-Speed 2499.03 samples/sec Loss 2.5558 LearningRate 0.000429 Epoch: 16 Global Step: 340340 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:20:37,130-Speed 2498.32 samples/sec Loss 2.5240 LearningRate 0.000429 Epoch: 16 Global Step: 340350 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:20:45,328-Speed 2498.43 samples/sec Loss 2.5446 LearningRate 0.000429 Epoch: 16 Global Step: 340360 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:20:53,524-Speed 2499.08 samples/sec Loss 2.5370 LearningRate 0.000429 Epoch: 16 Global Step: 340370 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:21:01,721-Speed 2498.89 samples/sec Loss 2.5312 LearningRate 0.000429 Epoch: 16 Global Step: 
340380 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:21:09,869-Speed 2513.86 samples/sec Loss 2.5126 LearningRate 0.000429 Epoch: 16 Global Step: 340390 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:21:18,066-Speed 2499.17 samples/sec Loss 2.5448 LearningRate 0.000429 Epoch: 16 Global Step: 340400 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:21:26,277-Speed 2494.71 samples/sec Loss 2.4892 LearningRate 0.000429 Epoch: 16 Global Step: 340410 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:21:34,477-Speed 2498.05 samples/sec Loss 2.5551 LearningRate 0.000429 Epoch: 16 Global Step: 340420 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:21:42,674-Speed 2498.73 samples/sec Loss 2.5624 LearningRate 0.000429 Epoch: 16 Global Step: 340430 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:21:50,886-Speed 2494.33 samples/sec Loss 2.5854 LearningRate 0.000429 Epoch: 16 Global Step: 340440 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:21:59,041-Speed 2511.90 samples/sec Loss 2.5579 LearningRate 0.000429 Epoch: 16 Global Step: 340450 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:22:07,240-Speed 2498.30 samples/sec Loss 2.6095 LearningRate 0.000429 Epoch: 16 Global Step: 340460 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:22:15,440-Speed 2498.13 samples/sec Loss 2.5208 LearningRate 0.000429 Epoch: 16 Global Step: 340470 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:22:23,641-Speed 2497.99 samples/sec Loss 2.5507 LearningRate 0.000429 Epoch: 16 Global Step: 340480 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:22:31,842-Speed 2497.58 samples/sec Loss 2.5392 LearningRate 0.000429 Epoch: 16 Global Step: 340490 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:22:40,059-Speed 2492.89 samples/sec Loss 2.5193 LearningRate 0.000429 Epoch: 16 Global 
Step: 340500 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:22:48,208-Speed 2513.47 samples/sec Loss 2.5445 LearningRate 0.000429 Epoch: 16 Global Step: 340510 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:22:56,430-Speed 2491.29 samples/sec Loss 2.5312 LearningRate 0.000429 Epoch: 16 Global Step: 340520 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:23:04,630-Speed 2498.12 samples/sec Loss 2.5734 LearningRate 0.000429 Epoch: 16 Global Step: 340530 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:23:12,831-Speed 2497.71 samples/sec Loss 2.5630 LearningRate 0.000429 Epoch: 16 Global Step: 340540 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:23:21,034-Speed 2497.19 samples/sec Loss 2.5818 LearningRate 0.000429 Epoch: 16 Global Step: 340550 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:23:29,232-Speed 2498.78 samples/sec Loss 2.5492 LearningRate 0.000429 Epoch: 16 Global Step: 340560 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:23:37,378-Speed 2514.19 samples/sec Loss 2.5183 LearningRate 0.000429 Epoch: 16 Global Step: 340570 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:23:45,573-Speed 2499.65 samples/sec Loss 2.5339 LearningRate 0.000429 Epoch: 16 Global Step: 340580 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:23:53,769-Speed 2499.34 samples/sec Loss 2.5261 LearningRate 0.000429 Epoch: 16 Global Step: 340590 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:24:01,980-Speed 2494.36 samples/sec Loss 2.5362 LearningRate 0.000429 Epoch: 16 Global Step: 340600 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:24:10,177-Speed 2498.87 samples/sec Loss 2.5188 LearningRate 0.000429 Epoch: 16 Global Step: 340610 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:24:18,388-Speed 2494.83 samples/sec Loss 2.5270 LearningRate 0.000429 Epoch: 16 
Global Step: 340620 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:24:26,531-Speed 2515.47 samples/sec Loss 2.5331 LearningRate 0.000429 Epoch: 16 Global Step: 340630 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:24:34,728-Speed 2498.66 samples/sec Loss 2.5034 LearningRate 0.000429 Epoch: 16 Global Step: 340640 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:24:42,927-Speed 2498.31 samples/sec Loss 2.5063 LearningRate 0.000429 Epoch: 16 Global Step: 340650 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:24:51,129-Speed 2497.51 samples/sec Loss 2.4793 LearningRate 0.000429 Epoch: 16 Global Step: 340660 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:24:59,329-Speed 2498.18 samples/sec Loss 2.5112 LearningRate 0.000429 Epoch: 16 Global Step: 340670 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:25:07,531-Speed 2497.03 samples/sec Loss 2.4799 LearningRate 0.000429 Epoch: 16 Global Step: 340680 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:25:15,688-Speed 2511.53 samples/sec Loss 2.5723 LearningRate 0.000429 Epoch: 16 Global Step: 340690 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:25:23,841-Speed 2512.18 samples/sec Loss 2.5588 LearningRate 0.000429 Epoch: 16 Global Step: 340700 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:25:32,039-Speed 2498.41 samples/sec Loss 2.6000 LearningRate 0.000429 Epoch: 16 Global Step: 340710 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:25:40,241-Speed 2497.41 samples/sec Loss 2.5476 LearningRate 0.000429 Epoch: 16 Global Step: 340720 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:25:48,455-Speed 2493.92 samples/sec Loss 2.5068 LearningRate 0.000429 Epoch: 16 Global Step: 340730 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:25:56,668-Speed 2493.97 samples/sec Loss 2.5425 LearningRate 0.000429 
Epoch: 16 Global Step: 340740 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:26:04,822-Speed 2512.01 samples/sec Loss 2.5283 LearningRate 0.000429 Epoch: 16 Global Step: 340750 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:26:13,030-Speed 2495.51 samples/sec Loss 2.5577 LearningRate 0.000429 Epoch: 16 Global Step: 340760 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:26:21,226-Speed 2499.02 samples/sec Loss 2.6039 LearningRate 0.000429 Epoch: 16 Global Step: 340770 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:26:29,427-Speed 2497.59 samples/sec Loss 2.5172 LearningRate 0.000429 Epoch: 16 Global Step: 340780 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:26:37,628-Speed 2497.64 samples/sec Loss 2.6027 LearningRate 0.000429 Epoch: 16 Global Step: 340790 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:26:45,822-Speed 2500.10 samples/sec Loss 2.6110 LearningRate 0.000429 Epoch: 16 Global Step: 340800 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:26:53,982-Speed 2510.28 samples/sec Loss 2.5364 LearningRate 0.000429 Epoch: 16 Global Step: 340810 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:27:02,184-Speed 2497.28 samples/sec Loss 2.5472 LearningRate 0.000429 Epoch: 16 Global Step: 340820 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:27:10,386-Speed 2497.30 samples/sec Loss 2.5816 LearningRate 0.000429 Epoch: 16 Global Step: 340830 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:27:18,595-Speed 2495.72 samples/sec Loss 2.5612 LearningRate 0.000428 Epoch: 16 Global Step: 340840 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:27:26,809-Speed 2494.02 samples/sec Loss 2.5816 LearningRate 0.000428 Epoch: 16 Global Step: 340850 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:27:35,008-Speed 2498.22 samples/sec Loss 2.5577 LearningRate 
0.000428 Epoch: 16 Global Step: 340860 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:27:43,152-Speed 2514.93 samples/sec Loss 2.5532 LearningRate 0.000428 Epoch: 16 Global Step: 340870 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:27:51,349-Speed 2498.91 samples/sec Loss 2.5282 LearningRate 0.000428 Epoch: 16 Global Step: 340880 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:27:59,555-Speed 2495.99 samples/sec Loss 2.5728 LearningRate 0.000428 Epoch: 16 Global Step: 340890 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:28:07,756-Speed 2497.78 samples/sec Loss 2.5205 LearningRate 0.000428 Epoch: 16 Global Step: 340900 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:28:15,953-Speed 2498.77 samples/sec Loss 2.5284 LearningRate 0.000428 Epoch: 16 Global Step: 340910 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:28:24,153-Speed 2498.00 samples/sec Loss 2.5561 LearningRate 0.000428 Epoch: 16 Global Step: 340920 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:28:32,304-Speed 2513.00 samples/sec Loss 2.5167 LearningRate 0.000428 Epoch: 16 Global Step: 340930 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:28:40,508-Speed 2496.78 samples/sec Loss 2.4988 LearningRate 0.000428 Epoch: 16 Global Step: 340940 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:28:48,709-Speed 2497.70 samples/sec Loss 2.5300 LearningRate 0.000428 Epoch: 16 Global Step: 340950 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:28:56,931-Speed 2491.37 samples/sec Loss 2.4982 LearningRate 0.000428 Epoch: 16 Global Step: 340960 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:29:05,133-Speed 2497.04 samples/sec Loss 2.4955 LearningRate 0.000428 Epoch: 16 Global Step: 340970 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:29:13,341-Speed 2495.48 samples/sec Loss 2.5510 
LearningRate 0.000428 Epoch: 16 Global Step: 340980 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:29:21,483-Speed 2515.79 samples/sec Loss 2.5292 LearningRate 0.000428 Epoch: 16 Global Step: 340990 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:29:29,685-Speed 2497.95 samples/sec Loss 2.4984 LearningRate 0.000428 Epoch: 16 Global Step: 341000 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:29:37,883-Speed 2498.44 samples/sec Loss 2.5823 LearningRate 0.000428 Epoch: 16 Global Step: 341010 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:29:46,081-Speed 2498.41 samples/sec Loss 2.5402 LearningRate 0.000428 Epoch: 16 Global Step: 341020 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:29:54,286-Speed 2496.61 samples/sec Loss 2.4731 LearningRate 0.000428 Epoch: 16 Global Step: 341030 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:30:02,489-Speed 2497.20 samples/sec Loss 2.5156 LearningRate 0.000428 Epoch: 16 Global Step: 341040 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:30:10,643-Speed 2511.87 samples/sec Loss 2.5448 LearningRate 0.000428 Epoch: 16 Global Step: 341050 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:30:18,851-Speed 2495.56 samples/sec Loss 2.5413 LearningRate 0.000428 Epoch: 16 Global Step: 341060 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:30:27,046-Speed 2499.26 samples/sec Loss 2.5636 LearningRate 0.000428 Epoch: 16 Global Step: 341070 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:30:35,243-Speed 2499.06 samples/sec Loss 2.5561 LearningRate 0.000428 Epoch: 16 Global Step: 341080 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:30:43,442-Speed 2498.18 samples/sec Loss 2.5257 LearningRate 0.000428 Epoch: 16 Global Step: 341090 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:30:51,644-Speed 2497.12 samples/sec Loss 
2.5447 LearningRate 0.000428 Epoch: 16 Global Step: 341100 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:30:59,790-Speed 2514.51 samples/sec Loss 2.4513 LearningRate 0.000428 Epoch: 16 Global Step: 341110 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:31:07,986-Speed 2499.11 samples/sec Loss 2.5193 LearningRate 0.000428 Epoch: 16 Global Step: 341120 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:31:16,186-Speed 2497.92 samples/sec Loss 2.4965 LearningRate 0.000428 Epoch: 16 Global Step: 341130 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:31:24,386-Speed 2498.09 samples/sec Loss 2.5441 LearningRate 0.000428 Epoch: 16 Global Step: 341140 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:31:32,586-Speed 2498.00 samples/sec Loss 2.5256 LearningRate 0.000428 Epoch: 16 Global Step: 341150 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:31:40,785-Speed 2498.35 samples/sec Loss 2.5329 LearningRate 0.000428 Epoch: 16 Global Step: 341160 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:31:48,930-Speed 2514.99 samples/sec Loss 2.5701 LearningRate 0.000428 Epoch: 16 Global Step: 341170 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:31:57,126-Speed 2499.01 samples/sec Loss 2.5206 LearningRate 0.000428 Epoch: 16 Global Step: 341180 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:32:05,324-Speed 2498.49 samples/sec Loss 2.5250 LearningRate 0.000428 Epoch: 16 Global Step: 341190 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:32:13,522-Speed 2499.04 samples/sec Loss 2.5044 LearningRate 0.000428 Epoch: 16 Global Step: 341200 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:32:21,723-Speed 2497.54 samples/sec Loss 2.5549 LearningRate 0.000428 Epoch: 16 Global Step: 341210 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:32:29,925-Speed 2497.76 samples/sec 
Loss 2.5335 LearningRate 0.000428 Epoch: 16 Global Step: 341220 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:32:38,071-Speed 2514.99 samples/sec Loss 2.5174 LearningRate 0.000428 Epoch: 16 Global Step: 341230 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:32:46,267-Speed 2499.27 samples/sec Loss 2.4661 LearningRate 0.000428 Epoch: 16 Global Step: 341240 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:32:54,475-Speed 2495.40 samples/sec Loss 2.5291 LearningRate 0.000428 Epoch: 16 Global Step: 341250 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:33:02,675-Speed 2498.20 samples/sec Loss 2.5399 LearningRate 0.000428 Epoch: 16 Global Step: 341260 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:33:10,887-Speed 2494.17 samples/sec Loss 2.5553 LearningRate 0.000428 Epoch: 16 Global Step: 341270 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:33:19,087-Speed 2498.04 samples/sec Loss 2.5788 LearningRate 0.000428 Epoch: 16 Global Step: 341280 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:33:27,232-Speed 2514.72 samples/sec Loss 2.5121 LearningRate 0.000428 Epoch: 16 Global Step: 341290 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:33:35,430-Speed 2498.45 samples/sec Loss 2.5585 LearningRate 0.000428 Epoch: 16 Global Step: 341300 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:33:43,626-Speed 2499.22 samples/sec Loss 2.5320 LearningRate 0.000428 Epoch: 16 Global Step: 341310 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:33:51,825-Speed 2498.28 samples/sec Loss 2.4920 LearningRate 0.000428 Epoch: 16 Global Step: 341320 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:34:00,036-Speed 2494.82 samples/sec Loss 2.5278 LearningRate 0.000428 Epoch: 16 Global Step: 341330 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:34:08,234-Speed 2498.36 
samples/sec Loss 2.5823 LearningRate 0.000428 Epoch: 16 Global Step: 341340 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:34:16,378-Speed 2515.39 samples/sec Loss 2.5297 LearningRate 0.000428 Epoch: 16 Global Step: 341350 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:34:24,578-Speed 2497.72 samples/sec Loss 2.5555 LearningRate 0.000428 Epoch: 16 Global Step: 341360 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:34:32,781-Speed 2497.11 samples/sec Loss 2.5084 LearningRate 0.000428 Epoch: 16 Global Step: 341370 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:34:40,984-Speed 2497.10 samples/sec Loss 2.5610 LearningRate 0.000428 Epoch: 16 Global Step: 341380 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:34:49,181-Speed 2498.89 samples/sec Loss 2.5999 LearningRate 0.000428 Epoch: 16 Global Step: 341390 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:34:57,380-Speed 2498.14 samples/sec Loss 2.5601 LearningRate 0.000428 Epoch: 16 Global Step: 341400 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:35:05,540-Speed 2510.16 samples/sec Loss 2.5054 LearningRate 0.000427 Epoch: 16 Global Step: 341410 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:35:13,737-Speed 2499.12 samples/sec Loss 2.5329 LearningRate 0.000427 Epoch: 16 Global Step: 341420 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:35:21,941-Speed 2497.11 samples/sec Loss 2.5624 LearningRate 0.000427 Epoch: 16 Global Step: 341430 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:35:30,139-Speed 2498.45 samples/sec Loss 2.5028 LearningRate 0.000427 Epoch: 16 Global Step: 341440 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:35:38,341-Speed 2497.77 samples/sec Loss 2.5174 LearningRate 0.000427 Epoch: 16 Global Step: 341450 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:35:46,542-Speed 
2497.76 samples/sec Loss 2.5015 LearningRate 0.000427 Epoch: 16 Global Step: 341460 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:35:54,691-Speed 2513.70 samples/sec Loss 2.5325 LearningRate 0.000427 Epoch: 16 Global Step: 341470 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:36:02,892-Speed 2497.89 samples/sec Loss 2.5095 LearningRate 0.000427 Epoch: 16 Global Step: 341480 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:36:11,105-Speed 2494.00 samples/sec Loss 2.5236 LearningRate 0.000427 Epoch: 16 Global Step: 341490 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:36:19,307-Speed 2497.14 samples/sec Loss 2.4816 LearningRate 0.000427 Epoch: 16 Global Step: 341500 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:36:27,521-Speed 2493.60 samples/sec Loss 2.5232 LearningRate 0.000427 Epoch: 16 Global Step: 341510 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:36:35,721-Speed 2498.24 samples/sec Loss 2.4910 LearningRate 0.000427 Epoch: 16 Global Step: 341520 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:36:43,868-Speed 2513.99 samples/sec Loss 2.5351 LearningRate 0.000427 Epoch: 16 Global Step: 341530 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:36:52,070-Speed 2497.40 samples/sec Loss 2.5477 LearningRate 0.000427 Epoch: 16 Global Step: 341540 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:37:00,271-Speed 2497.86 samples/sec Loss 2.5221 LearningRate 0.000427 Epoch: 16 Global Step: 341550 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:37:08,484-Speed 2493.89 samples/sec Loss 2.4801 LearningRate 0.000427 Epoch: 16 Global Step: 341560 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:37:16,685-Speed 2497.58 samples/sec Loss 2.5273 LearningRate 0.000427 Epoch: 16 Global Step: 341570 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 
20:37:24,887-Speed 2497.64 samples/sec Loss 2.4962 LearningRate 0.000427 Epoch: 16 Global Step: 341580 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:37:33,037-Speed 2513.27 samples/sec Loss 2.4968 LearningRate 0.000427 Epoch: 16 Global Step: 341590 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:37:41,242-Speed 2496.31 samples/sec Loss 2.5215 LearningRate 0.000427 Epoch: 16 Global Step: 341600 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:37:49,441-Speed 2498.38 samples/sec Loss 2.5245 LearningRate 0.000427 Epoch: 16 Global Step: 341610 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:37:57,642-Speed 2497.54 samples/sec Loss 2.5528 LearningRate 0.000427 Epoch: 16 Global Step: 341620 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:38:05,842-Speed 2497.91 samples/sec Loss 2.5785 LearningRate 0.000427 Epoch: 16 Global Step: 341630 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:38:14,037-Speed 2499.50 samples/sec Loss 2.5139 LearningRate 0.000427 Epoch: 16 Global Step: 341640 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:38:22,187-Speed 2513.42 samples/sec Loss 2.4973 LearningRate 0.000427 Epoch: 16 Global Step: 341650 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:38:30,383-Speed 2499.06 samples/sec Loss 2.4969 LearningRate 0.000427 Epoch: 16 Global Step: 341660 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:38:38,583-Speed 2498.18 samples/sec Loss 2.4978 LearningRate 0.000427 Epoch: 16 Global Step: 341670 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:38:46,788-Speed 2496.56 samples/sec Loss 2.5437 LearningRate 0.000427 Epoch: 16 Global Step: 341680 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:38:54,987-Speed 2498.57 samples/sec Loss 2.5752 LearningRate 0.000427 Epoch: 16 Global Step: 341690 Fp16 Grad Scale: 16384 Required: 112 hours Training: 
2022-07-08 20:39:03,191-Speed 2496.52 samples/sec Loss 2.5224 LearningRate 0.000427 Epoch: 16 Global Step: 341700 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:39:11,336-Speed 2514.68 samples/sec Loss 2.5781 LearningRate 0.000427 Epoch: 16 Global Step: 341710 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:39:19,539-Speed 2497.16 samples/sec Loss 2.4916 LearningRate 0.000427 Epoch: 16 Global Step: 341720 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:39:27,755-Speed 2493.16 samples/sec Loss 2.5432 LearningRate 0.000427 Epoch: 16 Global Step: 341730 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:39:35,961-Speed 2496.13 samples/sec Loss 2.5527 LearningRate 0.000427 Epoch: 16 Global Step: 341740 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:39:44,164-Speed 2496.97 samples/sec Loss 2.5125 LearningRate 0.000427 Epoch: 16 Global Step: 341750 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:39:52,364-Speed 2497.72 samples/sec Loss 2.5727 LearningRate 0.000427 Epoch: 16 Global Step: 341760 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:40:00,512-Speed 2513.97 samples/sec Loss 2.4564 LearningRate 0.000427 Epoch: 16 Global Step: 341770 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:40:08,715-Speed 2497.11 samples/sec Loss 2.5573 LearningRate 0.000427 Epoch: 16 Global Step: 341780 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:40:16,914-Speed 2498.08 samples/sec Loss 2.4733 LearningRate 0.000427 Epoch: 16 Global Step: 341790 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:40:25,120-Speed 2496.27 samples/sec Loss 2.4849 LearningRate 0.000427 Epoch: 16 Global Step: 341800 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:40:33,322-Speed 2497.18 samples/sec Loss 2.5708 LearningRate 0.000427 Epoch: 16 Global Step: 341810 Fp16 Grad Scale: 16384 Required: 112 hours 
Training: 2022-07-08 20:40:41,519-Speed 2498.79 samples/sec Loss 2.4918 LearningRate 0.000427 Epoch: 16 Global Step: 341820 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:40:49,668-Speed 2514.26 samples/sec Loss 2.5008 LearningRate 0.000427 Epoch: 16 Global Step: 341830 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:40:57,863-Speed 2500.00 samples/sec Loss 2.4833 LearningRate 0.000427 Epoch: 16 Global Step: 341840 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:41:06,057-Speed 2499.63 samples/sec Loss 2.4587 LearningRate 0.000427 Epoch: 16 Global Step: 341850 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:41:14,256-Speed 2498.88 samples/sec Loss 2.4451 LearningRate 0.000427 Epoch: 16 Global Step: 341860 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:41:22,456-Speed 2498.18 samples/sec Loss 2.5184 LearningRate 0.000427 Epoch: 16 Global Step: 341870 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:41:30,660-Speed 2496.64 samples/sec Loss 2.5177 LearningRate 0.000427 Epoch: 16 Global Step: 341880 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:41:38,828-Speed 2507.68 samples/sec Loss 2.5527 LearningRate 0.000427 Epoch: 16 Global Step: 341890 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:41:47,026-Speed 2498.57 samples/sec Loss 2.4639 LearningRate 0.000427 Epoch: 16 Global Step: 341900 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:41:55,232-Speed 2496.14 samples/sec Loss 2.4685 LearningRate 0.000427 Epoch: 16 Global Step: 341910 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:42:03,428-Speed 2499.14 samples/sec Loss 2.5642 LearningRate 0.000427 Epoch: 16 Global Step: 341920 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:42:11,627-Speed 2498.49 samples/sec Loss 2.4980 LearningRate 0.000427 Epoch: 16 Global Step: 341930 Fp16 Grad Scale: 32768 Required: 112 
hours Training: 2022-07-08 20:42:19,829-Speed 2497.25 samples/sec Loss 2.4579 LearningRate 0.000427 Epoch: 16 Global Step: 341940 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:42:27,982-Speed 2512.28 samples/sec Loss 2.4587 LearningRate 0.000427 Epoch: 16 Global Step: 341950 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:42:36,186-Speed 2496.57 samples/sec Loss 2.5456 LearningRate 0.000427 Epoch: 16 Global Step: 341960 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:42:44,387-Speed 2497.74 samples/sec Loss 2.5339 LearningRate 0.000427 Epoch: 16 Global Step: 341970 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:42:52,586-Speed 2498.48 samples/sec Loss 2.4654 LearningRate 0.000426 Epoch: 16 Global Step: 341980 Fp16 Grad Scale: 32768 Required: 112 hours Training: 2022-07-08 20:43:00,758-Speed 2506.71 samples/sec Loss 2.4830 LearningRate 0.000426 Epoch: 16 Global Step: 341990 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:43:08,960-Speed 2497.35 samples/sec Loss 2.5017 LearningRate 0.000426 Epoch: 16 Global Step: 342000 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:43:17,105-Speed 2514.70 samples/sec Loss 2.5367 LearningRate 0.000426 Epoch: 16 Global Step: 342010 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:43:25,309-Speed 2496.79 samples/sec Loss 2.5146 LearningRate 0.000426 Epoch: 16 Global Step: 342020 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:43:33,508-Speed 2498.57 samples/sec Loss 2.4603 LearningRate 0.000426 Epoch: 16 Global Step: 342030 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:43:41,706-Speed 2498.34 samples/sec Loss 2.5242 LearningRate 0.000426 Epoch: 16 Global Step: 342040 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:43:49,903-Speed 2499.05 samples/sec Loss 2.5169 LearningRate 0.000426 Epoch: 16 Global Step: 342050 Fp16 Grad Scale: 16384 Required: 
112 hours Training: 2022-07-08 20:43:58,105-Speed 2497.73 samples/sec Loss 2.4610 LearningRate 0.000426 Epoch: 16 Global Step: 342060 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:44:06,274-Speed 2507.35 samples/sec Loss 2.5788 LearningRate 0.000426 Epoch: 16 Global Step: 342070 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:44:14,471-Speed 2498.81 samples/sec Loss 2.4573 LearningRate 0.000426 Epoch: 16 Global Step: 342080 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:44:22,672-Speed 2498.60 samples/sec Loss 2.5322 LearningRate 0.000426 Epoch: 16 Global Step: 342090 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:44:30,870-Speed 2498.69 samples/sec Loss 2.5353 LearningRate 0.000426 Epoch: 16 Global Step: 342100 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:44:39,070-Speed 2498.04 samples/sec Loss 2.4566 LearningRate 0.000426 Epoch: 16 Global Step: 342110 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:44:47,269-Speed 2497.99 samples/sec Loss 2.5085 LearningRate 0.000426 Epoch: 16 Global Step: 342120 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:44:55,418-Speed 2513.60 samples/sec Loss 2.5301 LearningRate 0.000426 Epoch: 16 Global Step: 342130 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:45:03,619-Speed 2497.62 samples/sec Loss 2.5017 LearningRate 0.000426 Epoch: 16 Global Step: 342140 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:45:11,822-Speed 2497.14 samples/sec Loss 2.4567 LearningRate 0.000426 Epoch: 16 Global Step: 342150 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:45:20,025-Speed 2497.54 samples/sec Loss 2.4709 LearningRate 0.000426 Epoch: 16 Global Step: 342160 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:45:28,224-Speed 2498.32 samples/sec Loss 2.5200 LearningRate 0.000426 Epoch: 16 Global Step: 342170 Fp16 Grad Scale: 16384 
Required: 112 hours Training: 2022-07-08 20:45:36,426-Speed 2497.67 samples/sec Loss 2.4693 LearningRate 0.000426 Epoch: 16 Global Step: 342180 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:45:44,580-Speed 2512.04 samples/sec Loss 2.5096 LearningRate 0.000426 Epoch: 16 Global Step: 342190 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:45:52,795-Speed 2493.10 samples/sec Loss 2.5330 LearningRate 0.000426 Epoch: 16 Global Step: 342200 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:46:00,995-Speed 2497.95 samples/sec Loss 2.4754 LearningRate 0.000426 Epoch: 16 Global Step: 342210 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:46:09,195-Speed 2498.08 samples/sec Loss 2.5284 LearningRate 0.000426 Epoch: 16 Global Step: 342220 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:46:17,409-Speed 2493.87 samples/sec Loss 2.4710 LearningRate 0.000426 Epoch: 16 Global Step: 342230 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:46:25,618-Speed 2495.41 samples/sec Loss 2.5062 LearningRate 0.000426 Epoch: 16 Global Step: 342240 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:46:33,770-Speed 2512.50 samples/sec Loss 2.5125 LearningRate 0.000426 Epoch: 16 Global Step: 342250 Fp16 Grad Scale: 16384 Required: 112 hours Training: 2022-07-08 20:46:41,942-Speed 2506.57 samples/sec Loss 2.5388 LearningRate 0.000426 Epoch: 16 Global Step: 342260 Fp16 Grad Scale: 8192 Required: 112 hours Training: 2022-07-08 20:46:50,141-Speed 2498.39 samples/sec Loss 2.5340 LearningRate 0.000426 Epoch: 16 Global Step: 342270 Fp16 Grad Scale: 8192 Required: 112 hours Training: 2022-07-08 20:46:58,348-Speed 2495.92 samples/sec Loss 2.5050 LearningRate 0.000426 Epoch: 16 Global Step: 342280 Fp16 Grad Scale: 8192 Required: 112 hours Training: 2022-07-08 20:47:06,548-Speed 2497.98 samples/sec Loss 2.4453 LearningRate 0.000426 Epoch: 16 Global Step: 342290 Fp16 Grad Scale: 8192 
Required: 112 hours Training: 2022-07-08 20:47:14,743-Speed 2499.49 samples/sec Loss 2.4381 LearningRate 0.000426 Epoch: 16 Global Step: 342300 Fp16 Grad Scale: 8192 Required: 112 hours Training: 2022-07-08 20:47:22,892-Speed 2513.57 samples/sec Loss 2.5161 LearningRate 0.000426 Epoch: 16 Global Step: 342310 Fp16 Grad Scale: 8192 Required: 112 hours Training: 2022-07-08 20:47:31,091-Speed 2498.35 samples/sec Loss 2.4975 LearningRate 0.000426 Epoch: 16 Global Step: 342320 Fp16 Grad Scale: 8192 Required: 112 hours Training: 2022-07-08 20:47:39,287-Speed 2498.87 samples/sec Loss 2.5042 LearningRate 0.000426 Epoch: 16 Global Step: 342330 Fp16 Grad Scale: 8192 Required: 112 hours Training: 2022-07-08 20:47:47,485-Speed 2499.34 samples/sec Loss 2.5086 LearningRate 0.000426 Epoch: 16 Global Step: 342340 Fp16 Grad Scale: 8192 Required: 112 hours Training: 2022-07-08 20:47:55,683-Speed 2498.29 samples/sec Loss 2.4678 LearningRate 0.000426 Epoch: 16 Global Step: 342350 Fp16 Grad Scale: 8192 Required: 112 hours Training: 2022-07-08 20:48:03,883-Speed 2498.29 samples/sec Loss 2.4555 LearningRate 0.000426 Epoch: 16 Global Step: 342360 Fp16 Grad Scale: 8192 Required: 112 hours Training: 2022-07-08 20:48:12,035-Speed 2512.44 samples/sec Loss 2.4613 LearningRate 0.000426 Epoch: 16 Global Step: 342370 Fp16 Grad Scale: 8192 Required: 112 hours Training: 2022-07-08 20:48:20,230-Speed 2499.60 samples/sec Loss 2.5491 LearningRate 0.000426 Epoch: 16 Global Step: 342380 Fp16 Grad Scale: 8192 Required: 112 hours Training: 2022-07-08 20:48:28,427-Speed 2498.92 samples/sec Loss 2.5509 LearningRate 0.000426 Epoch: 16 Global Step: 342390 Fp16 Grad Scale: 8192 Required: 112 hours Training: 2022-07-08 20:48:36,625-Speed 2498.83 samples/sec Loss 2.4979 LearningRate 0.000426 Epoch: 16 Global Step: 342400 Fp16 Grad Scale: 8192 Required: 112 hours Training: 2022-07-08 20:48:44,823-Speed 2498.47 samples/sec Loss 2.4849 LearningRate 0.000426 Epoch: 16 Global Step: 342410 Fp16 Grad Scale: 8192 
Required: 111 hours Training: 2022-07-08 20:48:53,034-Speed 2494.54 samples/sec Loss 2.5490 LearningRate 0.000426 Epoch: 16 Global Step: 342420 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:49:01,177-Speed 2515.73 samples/sec Loss 2.5070 LearningRate 0.000426 Epoch: 16 Global Step: 342430 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:49:09,379-Speed 2497.51 samples/sec Loss 2.4879 LearningRate 0.000426 Epoch: 16 Global Step: 342440 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:49:17,576-Speed 2498.81 samples/sec Loss 2.5821 LearningRate 0.000426 Epoch: 16 Global Step: 342450 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:49:25,771-Speed 2499.50 samples/sec Loss 2.5317 LearningRate 0.000426 Epoch: 16 Global Step: 342460 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:49:33,969-Speed 2498.73 samples/sec Loss 2.5019 LearningRate 0.000426 Epoch: 16 Global Step: 342470 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:49:42,165-Speed 2498.90 samples/sec Loss 2.5676 LearningRate 0.000426 Epoch: 16 Global Step: 342480 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:49:50,311-Speed 2515.00 samples/sec Loss 2.5586 LearningRate 0.000426 Epoch: 16 Global Step: 342490 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:49:58,508-Speed 2498.58 samples/sec Loss 2.5035 LearningRate 0.000426 Epoch: 16 Global Step: 342500 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:50:06,707-Speed 2498.54 samples/sec Loss 2.5252 LearningRate 0.000426 Epoch: 16 Global Step: 342510 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:50:14,904-Speed 2498.70 samples/sec Loss 2.5356 LearningRate 0.000426 Epoch: 16 Global Step: 342520 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:50:23,106-Speed 2497.38 samples/sec Loss 2.5348 LearningRate 0.000426 Epoch: 16 Global Step: 342530 Fp16 Grad Scale: 8192 
Required: 111 hours Training: 2022-07-08 20:50:31,302-Speed 2498.84 samples/sec Loss 2.6024 LearningRate 0.000426 Epoch: 16 Global Step: 342540 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:50:39,460-Speed 2511.02 samples/sec Loss 2.5337 LearningRate 0.000425 Epoch: 16 Global Step: 342550 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:50:47,654-Speed 2499.83 samples/sec Loss 2.5607 LearningRate 0.000425 Epoch: 16 Global Step: 342560 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:50:55,850-Speed 2499.27 samples/sec Loss 2.5248 LearningRate 0.000425 Epoch: 16 Global Step: 342570 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:51:04,047-Speed 2498.67 samples/sec Loss 2.5498 LearningRate 0.000425 Epoch: 16 Global Step: 342580 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:51:12,244-Speed 2498.77 samples/sec Loss 2.5302 LearningRate 0.000425 Epoch: 16 Global Step: 342590 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:51:20,443-Speed 2498.37 samples/sec Loss 2.5189 LearningRate 0.000425 Epoch: 16 Global Step: 342600 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:51:28,588-Speed 2514.99 samples/sec Loss 2.4953 LearningRate 0.000425 Epoch: 16 Global Step: 342610 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:51:36,782-Speed 2499.67 samples/sec Loss 2.4969 LearningRate 0.000425 Epoch: 16 Global Step: 342620 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:51:44,980-Speed 2498.53 samples/sec Loss 2.4568 LearningRate 0.000425 Epoch: 16 Global Step: 342630 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:51:53,177-Speed 2498.87 samples/sec Loss 2.4926 LearningRate 0.000425 Epoch: 16 Global Step: 342640 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:52:01,376-Speed 2498.40 samples/sec Loss 2.5052 LearningRate 0.000425 Epoch: 16 Global Step: 342650 Fp16 Grad Scale: 8192 
Required: 111 hours Training: 2022-07-08 20:52:09,585-Speed 2495.19 samples/sec Loss 2.5133 LearningRate 0.000425 Epoch: 16 Global Step: 342660 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:52:17,730-Speed 2514.75 samples/sec Loss 2.5221 LearningRate 0.000425 Epoch: 16 Global Step: 342670 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:52:25,930-Speed 2498.58 samples/sec Loss 2.4466 LearningRate 0.000425 Epoch: 16 Global Step: 342680 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:52:34,131-Speed 2497.72 samples/sec Loss 2.4773 LearningRate 0.000425 Epoch: 16 Global Step: 342690 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:52:42,326-Speed 2499.44 samples/sec Loss 2.4891 LearningRate 0.000425 Epoch: 16 Global Step: 342700 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:52:50,524-Speed 2498.81 samples/sec Loss 2.4611 LearningRate 0.000425 Epoch: 16 Global Step: 342710 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:52:58,722-Speed 2498.57 samples/sec Loss 2.5163 LearningRate 0.000425 Epoch: 16 Global Step: 342720 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:53:06,866-Speed 2515.12 samples/sec Loss 2.4593 LearningRate 0.000425 Epoch: 16 Global Step: 342730 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:53:15,065-Speed 2498.60 samples/sec Loss 2.5299 LearningRate 0.000425 Epoch: 16 Global Step: 342740 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:53:23,264-Speed 2498.37 samples/sec Loss 2.5273 LearningRate 0.000425 Epoch: 16 Global Step: 342750 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:53:31,459-Speed 2499.46 samples/sec Loss 2.5123 LearningRate 0.000425 Epoch: 16 Global Step: 342760 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:53:39,657-Speed 2498.56 samples/sec Loss 2.4937 LearningRate 0.000425 Epoch: 16 Global Step: 342770 Fp16 Grad Scale: 8192 
Required: 111 hours Training: 2022-07-08 20:53:47,864-Speed 2495.49 samples/sec Loss 2.5821 LearningRate 0.000425 Epoch: 16 Global Step: 342780 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:53:59,234-Speed 1844.05 samples/sec Loss 2.4880 LearningRate 0.000425 Epoch: 16 Global Step: 342790 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:54:07,495-Speed 2502.93 samples/sec Loss 2.5245 LearningRate 0.000425 Epoch: 16 Global Step: 342800 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:54:15,700-Speed 2496.36 samples/sec Loss 2.4435 LearningRate 0.000425 Epoch: 16 Global Step: 342810 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:54:26,617-Speed 1886.69 samples/sec Loss 2.5184 LearningRate 0.000425 Epoch: 16 Global Step: 342820 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:54:34,993-Speed 2501.94 samples/sec Loss 2.5158 LearningRate 0.000425 Epoch: 16 Global Step: 342830 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:54:43,201-Speed 2495.45 samples/sec Loss 2.4870 LearningRate 0.000425 Epoch: 16 Global Step: 342840 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:54:51,399-Speed 2518.71 samples/sec Loss 2.5012 LearningRate 0.000425 Epoch: 16 Global Step: 342850 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:54:59,691-Speed 2498.12 samples/sec Loss 2.5265 LearningRate 0.000425 Epoch: 16 Global Step: 342860 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:55:07,895-Speed 2496.41 samples/sec Loss 2.4851 LearningRate 0.000425 Epoch: 16 Global Step: 342870 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:55:16,092-Speed 2499.00 samples/sec Loss 2.5015 LearningRate 0.000425 Epoch: 16 Global Step: 342880 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:55:28,178-Speed 1702.48 samples/sec Loss 2.5131 LearningRate 0.000425 Epoch: 16 Global Step: 342890 Fp16 Grad Scale: 8192 
Required: 111 hours Training: 2022-07-08 20:55:36,373-Speed 2500.54 samples/sec Loss 2.5706 LearningRate 0.000425 Epoch: 16 Global Step: 342900 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:55:44,515-Speed 2515.73 samples/sec Loss 2.5645 LearningRate 0.000425 Epoch: 16 Global Step: 342910 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:55:52,710-Speed 2499.24 samples/sec Loss 2.5135 LearningRate 0.000425 Epoch: 16 Global Step: 342920 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:56:00,971-Speed 2499.34 samples/sec Loss 2.5277 LearningRate 0.000425 Epoch: 16 Global Step: 342930 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:56:09,226-Speed 2500.78 samples/sec Loss 2.5184 LearningRate 0.000425 Epoch: 16 Global Step: 342940 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:56:18,431-Speed 2225.14 samples/sec Loss 2.5220 LearningRate 0.000425 Epoch: 16 Global Step: 342950 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:56:26,926-Speed 2502.28 samples/sec Loss 2.5425 LearningRate 0.000425 Epoch: 16 Global Step: 342960 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:56:35,076-Speed 2516.59 samples/sec Loss 2.5474 LearningRate 0.000425 Epoch: 16 Global Step: 342970 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:56:47,552-Speed 1641.81 samples/sec Loss 2.4984 LearningRate 0.000425 Epoch: 16 Global Step: 342980 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:56:55,758-Speed 2495.99 samples/sec Loss 2.4852 LearningRate 0.000425 Epoch: 16 Global Step: 342990 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:57:03,980-Speed 2501.72 samples/sec Loss 2.5041 LearningRate 0.000425 Epoch: 16 Global Step: 343000 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:57:12,222-Speed 2500.75 samples/sec Loss 2.4851 LearningRate 0.000425 Epoch: 16 Global Step: 343010 Fp16 Grad Scale: 8192 
Required: 111 hours Training: 2022-07-08 20:57:24,534-Speed 1663.62 samples/sec Loss 2.5630 LearningRate 0.000425 Epoch: 16 Global Step: 343020 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:57:32,752-Speed 2512.69 samples/sec Loss 2.5133 LearningRate 0.000425 Epoch: 16 Global Step: 343030 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:57:46,038-Speed 1665.35 samples/sec Loss 2.5348 LearningRate 0.000425 Epoch: 16 Global Step: 343040 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:57:54,712-Speed 2502.44 samples/sec Loss 2.5358 LearningRate 0.000425 Epoch: 16 Global Step: 343050 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:58:03,357-Speed 2375.40 samples/sec Loss 2.4992 LearningRate 0.000425 Epoch: 16 Global Step: 343060 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:58:11,565-Speed 2496.02 samples/sec Loss 2.5420 LearningRate 0.000425 Epoch: 16 Global Step: 343070 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:58:19,760-Speed 2499.40 samples/sec Loss 2.5654 LearningRate 0.000425 Epoch: 16 Global Step: 343080 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:58:27,904-Speed 2515.09 samples/sec Loss 2.4825 LearningRate 0.000425 Epoch: 16 Global Step: 343090 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:58:36,104-Speed 2497.90 samples/sec Loss 2.5232 LearningRate 0.000425 Epoch: 16 Global Step: 343100 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:58:44,309-Speed 2496.54 samples/sec Loss 2.5276 LearningRate 0.000425 Epoch: 16 Global Step: 343110 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:58:52,518-Speed 2495.17 samples/sec Loss 2.5300 LearningRate 0.000425 Epoch: 16 Global Step: 343120 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:59:00,715-Speed 2499.12 samples/sec Loss 2.5172 LearningRate 0.000424 Epoch: 16 Global Step: 343130 Fp16 Grad Scale: 8192 
Required: 111 hours Training: 2022-07-08 20:59:08,909-Speed 2499.67 samples/sec Loss 2.5060 LearningRate 0.000424 Epoch: 16 Global Step: 343140 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:59:17,054-Speed 2514.82 samples/sec Loss 2.5027 LearningRate 0.000424 Epoch: 16 Global Step: 343150 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:59:25,260-Speed 2496.39 samples/sec Loss 2.5041 LearningRate 0.000424 Epoch: 16 Global Step: 343160 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:59:33,464-Speed 2497.04 samples/sec Loss 2.4846 LearningRate 0.000424 Epoch: 16 Global Step: 343170 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:59:41,666-Speed 2497.18 samples/sec Loss 2.4758 LearningRate 0.000424 Epoch: 16 Global Step: 343180 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:59:49,870-Speed 2496.92 samples/sec Loss 2.4801 LearningRate 0.000424 Epoch: 16 Global Step: 343190 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 20:59:58,068-Speed 2498.68 samples/sec Loss 2.4669 LearningRate 0.000424 Epoch: 16 Global Step: 343200 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:00:06,216-Speed 2513.67 samples/sec Loss 2.5290 LearningRate 0.000424 Epoch: 16 Global Step: 343210 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:00:14,414-Speed 2498.74 samples/sec Loss 2.5182 LearningRate 0.000424 Epoch: 16 Global Step: 343220 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:00:22,616-Speed 2497.90 samples/sec Loss 2.5244 LearningRate 0.000424 Epoch: 16 Global Step: 343230 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:00:30,817-Speed 2497.56 samples/sec Loss 2.5010 LearningRate 0.000424 Epoch: 16 Global Step: 343240 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:00:39,016-Speed 2498.31 samples/sec Loss 2.4891 LearningRate 0.000424 Epoch: 16 Global Step: 343250 Fp16 Grad Scale: 8192 
Required: 111 hours Training: 2022-07-08 21:00:47,218-Speed 2497.51 samples/sec Loss 2.5434 LearningRate 0.000424 Epoch: 16 Global Step: 343260 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:00:55,363-Speed 2514.54 samples/sec Loss 2.5439 LearningRate 0.000424 Epoch: 16 Global Step: 343270 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:01:03,564-Speed 2498.25 samples/sec Loss 2.5634 LearningRate 0.000424 Epoch: 16 Global Step: 343280 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:01:11,761-Speed 2498.97 samples/sec Loss 2.5123 LearningRate 0.000424 Epoch: 16 Global Step: 343290 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:01:19,960-Speed 2498.19 samples/sec Loss 2.5319 LearningRate 0.000424 Epoch: 16 Global Step: 343300 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:01:28,163-Speed 2496.97 samples/sec Loss 2.4761 LearningRate 0.000424 Epoch: 16 Global Step: 343310 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:01:36,368-Speed 2496.29 samples/sec Loss 2.4740 LearningRate 0.000424 Epoch: 16 Global Step: 343320 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:01:44,517-Speed 2513.79 samples/sec Loss 2.4866 LearningRate 0.000424 Epoch: 16 Global Step: 343330 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:01:52,728-Speed 2494.79 samples/sec Loss 2.4304 LearningRate 0.000424 Epoch: 16 Global Step: 343340 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:02:00,932-Speed 2496.88 samples/sec Loss 2.5357 LearningRate 0.000424 Epoch: 16 Global Step: 343350 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:02:09,131-Speed 2498.16 samples/sec Loss 2.4839 LearningRate 0.000424 Epoch: 16 Global Step: 343360 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:02:17,332-Speed 2498.33 samples/sec Loss 2.5114 LearningRate 0.000424 Epoch: 16 Global Step: 343370 Fp16 Grad Scale: 8192 
Required: 111 hours Training: 2022-07-08 21:02:25,527-Speed 2499.13 samples/sec Loss 2.4943 LearningRate 0.000424 Epoch: 16 Global Step: 343380 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:02:33,675-Speed 2514.15 samples/sec Loss 2.4258 LearningRate 0.000424 Epoch: 16 Global Step: 343390 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:02:41,873-Speed 2498.92 samples/sec Loss 2.4644 LearningRate 0.000424 Epoch: 16 Global Step: 343400 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:02:50,074-Speed 2497.73 samples/sec Loss 2.5024 LearningRate 0.000424 Epoch: 16 Global Step: 343410 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:02:58,277-Speed 2496.75 samples/sec Loss 2.5026 LearningRate 0.000424 Epoch: 16 Global Step: 343420 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:03:06,488-Speed 2494.70 samples/sec Loss 2.5399 LearningRate 0.000424 Epoch: 16 Global Step: 343430 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:03:14,694-Speed 2496.07 samples/sec Loss 2.5032 LearningRate 0.000424 Epoch: 16 Global Step: 343440 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:03:22,850-Speed 2512.67 samples/sec Loss 2.5286 LearningRate 0.000424 Epoch: 16 Global Step: 343450 Fp16 Grad Scale: 8192 Required: 111 hours Training: 2022-07-08 21:03:31,047-Speed 2498.78 samples/sec Loss 2.5282 LearningRate 0.000424 Epoch: 16 Global Step: 343460 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:03:39,246-Speed 2498.24 samples/sec Loss 2.5253 LearningRate 0.000424 Epoch: 16 Global Step: 343470 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:03:47,457-Speed 2494.82 samples/sec Loss 2.5259 LearningRate 0.000424 Epoch: 16 Global Step: 343480 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:03:55,657-Speed 2498.00 samples/sec Loss 2.5031 LearningRate 0.000424 Epoch: 16 Global Step: 343490 Fp16 Grad Scale: 16384 
Required: 111 hours Training: 2022-07-08 21:04:03,860-Speed 2496.80 samples/sec Loss 2.4874 LearningRate 0.000424 Epoch: 16 Global Step: 343500 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:04:12,006-Speed 2514.52 samples/sec Loss 2.5418 LearningRate 0.000424 Epoch: 16 Global Step: 343510 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:04:20,207-Speed 2498.01 samples/sec Loss 2.5224 LearningRate 0.000424 Epoch: 16 Global Step: 343520 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:04:28,409-Speed 2497.29 samples/sec Loss 2.4905 LearningRate 0.000424 Epoch: 16 Global Step: 343530 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:04:36,616-Speed 2495.71 samples/sec Loss 2.5008 LearningRate 0.000424 Epoch: 16 Global Step: 343540 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:04:44,818-Speed 2497.19 samples/sec Loss 2.5830 LearningRate 0.000424 Epoch: 16 Global Step: 343550 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:04:53,023-Speed 2496.35 samples/sec Loss 2.5010 LearningRate 0.000424 Epoch: 16 Global Step: 343560 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:05:01,171-Speed 2514.03 samples/sec Loss 2.4790 LearningRate 0.000424 Epoch: 16 Global Step: 343570 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:05:09,374-Speed 2496.88 samples/sec Loss 2.5137 LearningRate 0.000424 Epoch: 16 Global Step: 343580 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:05:17,587-Speed 2494.88 samples/sec Loss 2.5231 LearningRate 0.000424 Epoch: 16 Global Step: 343590 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:05:25,785-Speed 2498.64 samples/sec Loss 2.4992 LearningRate 0.000424 Epoch: 16 Global Step: 343600 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:05:33,984-Speed 2498.04 samples/sec Loss 2.5434 LearningRate 0.000424 Epoch: 16 Global Step: 343610 Fp16 Grad Scale: 
16384 Required: 111 hours Training: 2022-07-08 21:05:42,180-Speed 2499.06 samples/sec Loss 2.5007 LearningRate 0.000424 Epoch: 16 Global Step: 343620 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:05:50,327-Speed 2514.30 samples/sec Loss 2.5207 LearningRate 0.000424 Epoch: 16 Global Step: 343630 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:05:58,604-Speed 2474.67 samples/sec Loss 2.4946 LearningRate 0.000424 Epoch: 16 Global Step: 343640 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:06:06,801-Speed 2498.73 samples/sec Loss 2.5082 LearningRate 0.000424 Epoch: 16 Global Step: 343650 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:06:15,005-Speed 2496.72 samples/sec Loss 2.5141 LearningRate 0.000424 Epoch: 16 Global Step: 343660 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:06:23,205-Speed 2498.23 samples/sec Loss 2.5494 LearningRate 0.000424 Epoch: 16 Global Step: 343670 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:06:31,407-Speed 2497.31 samples/sec Loss 2.4837 LearningRate 0.000424 Epoch: 16 Global Step: 343680 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:06:39,556-Speed 2513.59 samples/sec Loss 2.5108 LearningRate 0.000424 Epoch: 16 Global Step: 343690 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:06:47,753-Speed 2498.78 samples/sec Loss 2.5447 LearningRate 0.000423 Epoch: 16 Global Step: 343700 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:06:55,953-Speed 2497.87 samples/sec Loss 2.5051 LearningRate 0.000423 Epoch: 16 Global Step: 343710 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:07:04,149-Speed 2499.35 samples/sec Loss 2.5129 LearningRate 0.000423 Epoch: 16 Global Step: 343720 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:07:12,346-Speed 2498.79 samples/sec Loss 2.5115 LearningRate 0.000423 Epoch: 16 Global Step: 343730 Fp16 Grad 
Scale: 16384 Required: 111 hours Training: 2022-07-08 21:07:20,545-Speed 2498.20 samples/sec Loss 2.6165 LearningRate 0.000423 Epoch: 16 Global Step: 343740 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:07:28,694-Speed 2513.74 samples/sec Loss 2.5396 LearningRate 0.000423 Epoch: 16 Global Step: 343750 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:07:36,893-Speed 2498.56 samples/sec Loss 2.4559 LearningRate 0.000423 Epoch: 16 Global Step: 343760 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:07:45,092-Speed 2498.07 samples/sec Loss 2.4980 LearningRate 0.000423 Epoch: 16 Global Step: 343770 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:07:53,291-Speed 2498.54 samples/sec Loss 2.4983 LearningRate 0.000423 Epoch: 16 Global Step: 343780 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:08:01,495-Speed 2496.88 samples/sec Loss 2.4853 LearningRate 0.000423 Epoch: 16 Global Step: 343790 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:08:09,694-Speed 2497.92 samples/sec Loss 2.4897 LearningRate 0.000423 Epoch: 16 Global Step: 343800 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:08:17,841-Speed 2514.49 samples/sec Loss 2.4643 LearningRate 0.000423 Epoch: 16 Global Step: 343810 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:08:26,040-Speed 2498.01 samples/sec Loss 2.5247 LearningRate 0.000423 Epoch: 16 Global Step: 343820 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:08:34,241-Speed 2497.73 samples/sec Loss 2.5663 LearningRate 0.000423 Epoch: 16 Global Step: 343830 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:08:42,452-Speed 2494.84 samples/sec Loss 2.4858 LearningRate 0.000423 Epoch: 16 Global Step: 343840 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:08:50,649-Speed 2498.92 samples/sec Loss 2.5143 LearningRate 0.000423 Epoch: 16 Global Step: 343850 Fp16 
Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:08:58,845-Speed 2499.04 samples/sec Loss 2.5624 LearningRate 0.000423 Epoch: 16 Global Step: 343860 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:09:06,993-Speed 2514.22 samples/sec Loss 2.5288 LearningRate 0.000423 Epoch: 16 Global Step: 343870 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:09:15,193-Speed 2497.85 samples/sec Loss 2.4821 LearningRate 0.000423 Epoch: 16 Global Step: 343880 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:09:23,401-Speed 2495.29 samples/sec Loss 2.4790 LearningRate 0.000423 Epoch: 16 Global Step: 343890 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:09:31,605-Speed 2496.86 samples/sec Loss 2.5522 LearningRate 0.000423 Epoch: 16 Global Step: 343900 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:09:39,801-Speed 2499.05 samples/sec Loss 2.4793 LearningRate 0.000423 Epoch: 16 Global Step: 343910 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:09:47,997-Speed 2499.61 samples/sec Loss 2.5217 LearningRate 0.000423 Epoch: 16 Global Step: 343920 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:09:56,140-Speed 2515.36 samples/sec Loss 2.4479 LearningRate 0.000423 Epoch: 16 Global Step: 343930 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:10:04,353-Speed 2494.22 samples/sec Loss 2.5217 LearningRate 0.000423 Epoch: 16 Global Step: 343940 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:10:12,563-Speed 2495.08 samples/sec Loss 2.5502 LearningRate 0.000423 Epoch: 16 Global Step: 343950 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:10:20,774-Speed 2494.43 samples/sec Loss 2.4908 LearningRate 0.000423 Epoch: 16 Global Step: 343960 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:10:28,973-Speed 2498.20 samples/sec Loss 2.5026 LearningRate 0.000423 Epoch: 16 Global Step: 343970 
Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:10:37,177-Speed 2496.74 samples/sec Loss 2.5291 LearningRate 0.000423 Epoch: 16 Global Step: 343980 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:10:45,321-Speed 2515.19 samples/sec Loss 2.5239 LearningRate 0.000423 Epoch: 16 Global Step: 343990 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:10:53,519-Speed 2498.44 samples/sec Loss 2.4489 LearningRate 0.000423 Epoch: 16 Global Step: 344000 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:11:01,725-Speed 2496.22 samples/sec Loss 2.5356 LearningRate 0.000423 Epoch: 16 Global Step: 344010 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:11:09,937-Speed 2494.28 samples/sec Loss 2.5294 LearningRate 0.000423 Epoch: 16 Global Step: 344020 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:11:18,139-Speed 2497.72 samples/sec Loss 2.4792 LearningRate 0.000423 Epoch: 16 Global Step: 344030 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:11:26,335-Speed 2499.13 samples/sec Loss 2.4450 LearningRate 0.000423 Epoch: 16 Global Step: 344040 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:11:34,485-Speed 2513.52 samples/sec Loss 2.4795 LearningRate 0.000423 Epoch: 16 Global Step: 344050 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:11:42,685-Speed 2498.15 samples/sec Loss 2.4996 LearningRate 0.000423 Epoch: 16 Global Step: 344060 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:11:50,882-Speed 2498.70 samples/sec Loss 2.5369 LearningRate 0.000423 Epoch: 16 Global Step: 344070 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:11:59,082-Speed 2497.90 samples/sec Loss 2.5028 LearningRate 0.000423 Epoch: 16 Global Step: 344080 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:12:07,283-Speed 2498.00 samples/sec Loss 2.5004 LearningRate 0.000423 Epoch: 16 Global Step: 
344090 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:12:15,500-Speed 2492.67 samples/sec Loss 2.5269 LearningRate 0.000423 Epoch: 16 Global Step: 344100 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:12:23,654-Speed 2512.29 samples/sec Loss 2.4887 LearningRate 0.000423 Epoch: 16 Global Step: 344110 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:12:31,852-Speed 2498.42 samples/sec Loss 2.5341 LearningRate 0.000423 Epoch: 16 Global Step: 344120 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:12:40,049-Speed 2498.80 samples/sec Loss 2.5451 LearningRate 0.000423 Epoch: 16 Global Step: 344130 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:12:48,249-Speed 2498.01 samples/sec Loss 2.5159 LearningRate 0.000423 Epoch: 16 Global Step: 344140 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:12:56,458-Speed 2495.40 samples/sec Loss 2.5203 LearningRate 0.000423 Epoch: 16 Global Step: 344150 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:13:04,659-Speed 2497.46 samples/sec Loss 2.5113 LearningRate 0.000423 Epoch: 16 Global Step: 344160 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:13:12,811-Speed 2512.92 samples/sec Loss 2.4534 LearningRate 0.000423 Epoch: 16 Global Step: 344170 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:13:21,011-Speed 2497.87 samples/sec Loss 2.5024 LearningRate 0.000423 Epoch: 16 Global Step: 344180 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:13:29,216-Speed 2496.49 samples/sec Loss 2.5096 LearningRate 0.000423 Epoch: 16 Global Step: 344190 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:13:37,419-Speed 2497.04 samples/sec Loss 2.4744 LearningRate 0.000423 Epoch: 16 Global Step: 344200 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:13:45,631-Speed 2494.64 samples/sec Loss 2.5500 LearningRate 0.000423 Epoch: 16 Global 
Step: 344210 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:13:53,829-Speed 2498.54 samples/sec Loss 2.5233 LearningRate 0.000423 Epoch: 16 Global Step: 344220 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:14:01,973-Speed 2515.01 samples/sec Loss 2.5587 LearningRate 0.000423 Epoch: 16 Global Step: 344230 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:14:10,171-Speed 2498.69 samples/sec Loss 2.4916 LearningRate 0.000423 Epoch: 16 Global Step: 344240 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:14:18,370-Speed 2498.20 samples/sec Loss 2.5638 LearningRate 0.000423 Epoch: 16 Global Step: 344250 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:14:26,567-Speed 2499.13 samples/sec Loss 2.5326 LearningRate 0.000423 Epoch: 16 Global Step: 344260 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:14:34,765-Speed 2498.97 samples/sec Loss 2.5183 LearningRate 0.000422 Epoch: 16 Global Step: 344270 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:14:42,964-Speed 2498.05 samples/sec Loss 2.5027 LearningRate 0.000422 Epoch: 16 Global Step: 344280 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:14:51,106-Speed 2515.77 samples/sec Loss 2.5108 LearningRate 0.000422 Epoch: 16 Global Step: 344290 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:14:59,307-Speed 2497.82 samples/sec Loss 2.4604 LearningRate 0.000422 Epoch: 16 Global Step: 344300 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:15:07,512-Speed 2496.36 samples/sec Loss 2.5048 LearningRate 0.000422 Epoch: 16 Global Step: 344310 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:15:15,712-Speed 2497.88 samples/sec Loss 2.5023 LearningRate 0.000422 Epoch: 16 Global Step: 344320 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:15:23,929-Speed 2492.98 samples/sec Loss 2.4582 LearningRate 0.000422 Epoch: 16 
Global Step: 344330 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:15:32,128-Speed 2498.33 samples/sec Loss 2.4651 LearningRate 0.000422 Epoch: 16 Global Step: 344340 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:15:40,278-Speed 2513.35 samples/sec Loss 2.5524 LearningRate 0.000422 Epoch: 16 Global Step: 344350 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:15:48,482-Speed 2496.52 samples/sec Loss 2.4846 LearningRate 0.000422 Epoch: 16 Global Step: 344360 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:15:56,687-Speed 2496.58 samples/sec Loss 2.4827 LearningRate 0.000422 Epoch: 16 Global Step: 344370 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:16:04,888-Speed 2497.72 samples/sec Loss 2.5005 LearningRate 0.000422 Epoch: 16 Global Step: 344380 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:16:13,091-Speed 2496.94 samples/sec Loss 2.4814 LearningRate 0.000422 Epoch: 16 Global Step: 344390 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:16:21,309-Speed 2492.50 samples/sec Loss 2.5089 LearningRate 0.000422 Epoch: 16 Global Step: 344400 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:16:29,472-Speed 2509.10 samples/sec Loss 2.5139 LearningRate 0.000422 Epoch: 16 Global Step: 344410 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:16:37,673-Speed 2497.78 samples/sec Loss 2.5143 LearningRate 0.000422 Epoch: 16 Global Step: 344420 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:16:45,877-Speed 2497.01 samples/sec Loss 2.5110 LearningRate 0.000422 Epoch: 16 Global Step: 344430 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:16:54,076-Speed 2498.31 samples/sec Loss 2.5356 LearningRate 0.000422 Epoch: 16 Global Step: 344440 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:17:02,273-Speed 2498.84 samples/sec Loss 2.4765 LearningRate 0.000422 
Epoch: 16 Global Step: 344450 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:17:10,472-Speed 2498.37 samples/sec Loss 2.4343 LearningRate 0.000422 Epoch: 16 Global Step: 344460 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:17:18,618-Speed 2514.55 samples/sec Loss 2.5009 LearningRate 0.000422 Epoch: 16 Global Step: 344470 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:17:26,815-Speed 2498.66 samples/sec Loss 2.4662 LearningRate 0.000422 Epoch: 16 Global Step: 344480 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:17:35,022-Speed 2496.13 samples/sec Loss 2.4789 LearningRate 0.000422 Epoch: 16 Global Step: 344490 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:17:43,227-Speed 2496.46 samples/sec Loss 2.5110 LearningRate 0.000422 Epoch: 16 Global Step: 344500 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:17:51,430-Speed 2497.03 samples/sec Loss 2.5304 LearningRate 0.000422 Epoch: 16 Global Step: 344510 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:17:59,627-Speed 2498.87 samples/sec Loss 2.5385 LearningRate 0.000422 Epoch: 16 Global Step: 344520 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:18:07,772-Speed 2514.56 samples/sec Loss 2.5078 LearningRate 0.000422 Epoch: 16 Global Step: 344530 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:18:15,980-Speed 2495.58 samples/sec Loss 2.4978 LearningRate 0.000422 Epoch: 16 Global Step: 344540 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:18:24,190-Speed 2495.02 samples/sec Loss 2.4351 LearningRate 0.000422 Epoch: 16 Global Step: 344550 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:18:32,388-Speed 2498.51 samples/sec Loss 2.5185 LearningRate 0.000422 Epoch: 16 Global Step: 344560 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:18:40,586-Speed 2498.96 samples/sec Loss 2.5814 LearningRate 
0.000422 Epoch: 16 Global Step: 344570 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:18:48,787-Speed 2497.50 samples/sec Loss 2.4951 LearningRate 0.000422 Epoch: 16 Global Step: 344580 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:18:56,933-Speed 2514.35 samples/sec Loss 2.5175 LearningRate 0.000422 Epoch: 16 Global Step: 344590 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:19:05,133-Speed 2498.05 samples/sec Loss 2.5054 LearningRate 0.000422 Epoch: 16 Global Step: 344600 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:19:13,335-Speed 2497.56 samples/sec Loss 2.6098 LearningRate 0.000422 Epoch: 16 Global Step: 344610 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:19:21,539-Speed 2496.58 samples/sec Loss 2.5435 LearningRate 0.000422 Epoch: 16 Global Step: 344620 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:19:29,748-Speed 2495.24 samples/sec Loss 2.5238 LearningRate 0.000422 Epoch: 16 Global Step: 344630 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:19:37,951-Speed 2497.03 samples/sec Loss 2.5364 LearningRate 0.000422 Epoch: 16 Global Step: 344640 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:19:46,109-Speed 2510.70 samples/sec Loss 2.5462 LearningRate 0.000422 Epoch: 16 Global Step: 344650 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:19:54,310-Speed 2497.93 samples/sec Loss 2.5337 LearningRate 0.000422 Epoch: 16 Global Step: 344660 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:20:02,513-Speed 2496.92 samples/sec Loss 2.4974 LearningRate 0.000422 Epoch: 16 Global Step: 344670 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:20:10,711-Speed 2498.72 samples/sec Loss 2.4879 LearningRate 0.000422 Epoch: 16 Global Step: 344680 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:20:18,913-Speed 2497.13 samples/sec Loss 2.5143 
LearningRate 0.000422 Epoch: 16 Global Step: 344690 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:20:27,119-Speed 2496.24 samples/sec Loss 2.5661 LearningRate 0.000422 Epoch: 16 Global Step: 344700 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:20:35,267-Speed 2514.07 samples/sec Loss 2.5142 LearningRate 0.000422 Epoch: 16 Global Step: 344710 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:20:43,478-Speed 2494.41 samples/sec Loss 2.4919 LearningRate 0.000422 Epoch: 16 Global Step: 344720 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:20:51,691-Speed 2494.21 samples/sec Loss 2.4532 LearningRate 0.000422 Epoch: 16 Global Step: 344730 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:20:59,889-Speed 2498.42 samples/sec Loss 2.4993 LearningRate 0.000422 Epoch: 16 Global Step: 344740 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:21:08,090-Speed 2497.68 samples/sec Loss 2.4983 LearningRate 0.000422 Epoch: 16 Global Step: 344750 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:21:16,294-Speed 2496.84 samples/sec Loss 2.5036 LearningRate 0.000422 Epoch: 16 Global Step: 344760 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:21:24,443-Speed 2513.49 samples/sec Loss 2.5206 LearningRate 0.000422 Epoch: 16 Global Step: 344770 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:21:32,642-Speed 2498.32 samples/sec Loss 2.5054 LearningRate 0.000422 Epoch: 16 Global Step: 344780 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:21:40,841-Speed 2498.40 samples/sec Loss 2.5183 LearningRate 0.000422 Epoch: 16 Global Step: 344790 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:21:48,999-Speed 2510.72 samples/sec Loss 2.4962 LearningRate 0.000422 Epoch: 16 Global Step: 344800 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:21:57,203-Speed 2496.87 samples/sec Loss 
2.4964 LearningRate 0.000422 Epoch: 16 Global Step: 344810 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:22:05,400-Speed 2498.80 samples/sec Loss 2.5239 LearningRate 0.000422 Epoch: 16 Global Step: 344820 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:22:13,555-Speed 2511.70 samples/sec Loss 2.4662 LearningRate 0.000422 Epoch: 16 Global Step: 344830 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:22:21,755-Speed 2498.20 samples/sec Loss 2.5055 LearningRate 0.000422 Epoch: 16 Global Step: 344840 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:22:29,953-Speed 2498.49 samples/sec Loss 2.4887 LearningRate 0.000421 Epoch: 16 Global Step: 344850 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:22:38,153-Speed 2498.05 samples/sec Loss 2.4511 LearningRate 0.000421 Epoch: 16 Global Step: 344860 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:22:46,351-Speed 2499.09 samples/sec Loss 2.4498 LearningRate 0.000421 Epoch: 16 Global Step: 344870 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:22:54,564-Speed 2493.67 samples/sec Loss 2.4855 LearningRate 0.000421 Epoch: 16 Global Step: 344880 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:23:02,724-Speed 2510.40 samples/sec Loss 2.4817 LearningRate 0.000421 Epoch: 16 Global Step: 344890 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:23:10,926-Speed 2497.67 samples/sec Loss 2.5368 LearningRate 0.000421 Epoch: 16 Global Step: 344900 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:23:19,124-Speed 2498.53 samples/sec Loss 2.4965 LearningRate 0.000421 Epoch: 16 Global Step: 344910 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:23:27,323-Speed 2498.24 samples/sec Loss 2.4857 LearningRate 0.000421 Epoch: 16 Global Step: 344920 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:23:35,523-Speed 2498.24 samples/sec 
Loss 2.4505 LearningRate 0.000421 Epoch: 16 Global Step: 344930 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:23:43,723-Speed 2497.73 samples/sec Loss 2.4656 LearningRate 0.000421 Epoch: 16 Global Step: 344940 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:23:51,865-Speed 2515.83 samples/sec Loss 2.5238 LearningRate 0.000421 Epoch: 16 Global Step: 344950 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:24:00,071-Speed 2496.22 samples/sec Loss 2.4986 LearningRate 0.000421 Epoch: 16 Global Step: 344960 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:24:08,269-Speed 2498.36 samples/sec Loss 2.4849 LearningRate 0.000421 Epoch: 16 Global Step: 344970 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:24:16,467-Speed 2498.74 samples/sec Loss 2.5071 LearningRate 0.000421 Epoch: 16 Global Step: 344980 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:24:24,665-Speed 2498.67 samples/sec Loss 2.5098 LearningRate 0.000421 Epoch: 16 Global Step: 344990 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:24:32,865-Speed 2497.74 samples/sec Loss 2.5388 LearningRate 0.000421 Epoch: 16 Global Step: 345000 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:24:41,019-Speed 2512.18 samples/sec Loss 2.5649 LearningRate 0.000421 Epoch: 16 Global Step: 345010 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:24:49,221-Speed 2497.47 samples/sec Loss 2.4693 LearningRate 0.000421 Epoch: 16 Global Step: 345020 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:24:57,425-Speed 2496.92 samples/sec Loss 2.4824 LearningRate 0.000421 Epoch: 16 Global Step: 345030 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:25:05,632-Speed 2495.71 samples/sec Loss 2.5203 LearningRate 0.000421 Epoch: 16 Global Step: 345040 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:25:13,834-Speed 2497.25 
samples/sec Loss 2.4558 LearningRate 0.000421 Epoch: 16 Global Step: 345050 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:25:22,047-Speed 2494.17 samples/sec Loss 2.5453 LearningRate 0.000421 Epoch: 16 Global Step: 345060 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:25:30,193-Speed 2514.63 samples/sec Loss 2.4846 LearningRate 0.000421 Epoch: 16 Global Step: 345070 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:25:38,394-Speed 2497.45 samples/sec Loss 2.4905 LearningRate 0.000421 Epoch: 16 Global Step: 345080 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:25:46,592-Speed 2498.60 samples/sec Loss 2.4707 LearningRate 0.000421 Epoch: 16 Global Step: 345090 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:25:54,791-Speed 2498.39 samples/sec Loss 2.5266 LearningRate 0.000421 Epoch: 16 Global Step: 345100 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:26:02,995-Speed 2496.75 samples/sec Loss 2.4910 LearningRate 0.000421 Epoch: 16 Global Step: 345110 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:26:11,195-Speed 2498.04 samples/sec Loss 2.4820 LearningRate 0.000421 Epoch: 16 Global Step: 345120 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:26:19,345-Speed 2513.25 samples/sec Loss 2.4969 LearningRate 0.000421 Epoch: 16 Global Step: 345130 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:26:27,543-Speed 2498.72 samples/sec Loss 2.4390 LearningRate 0.000421 Epoch: 16 Global Step: 345140 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:26:35,743-Speed 2497.83 samples/sec Loss 2.5021 LearningRate 0.000421 Epoch: 16 Global Step: 345150 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:26:43,944-Speed 2497.60 samples/sec Loss 2.5492 LearningRate 0.000421 Epoch: 16 Global Step: 345160 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:26:52,142-Speed 
2498.56 samples/sec Loss 2.4853 LearningRate 0.000421 Epoch: 16 Global Step: 345170 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:27:00,343-Speed 2497.78 samples/sec Loss 2.5197 LearningRate 0.000421 Epoch: 16 Global Step: 345180 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:27:08,503-Speed 2510.38 samples/sec Loss 2.5328 LearningRate 0.000421 Epoch: 16 Global Step: 345190 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:27:16,699-Speed 2498.90 samples/sec Loss 2.5652 LearningRate 0.000421 Epoch: 16 Global Step: 345200 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:27:24,899-Speed 2498.47 samples/sec Loss 2.5089 LearningRate 0.000421 Epoch: 16 Global Step: 345210 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:27:33,098-Speed 2498.16 samples/sec Loss 2.4624 LearningRate 0.000421 Epoch: 16 Global Step: 345220 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:27:41,296-Speed 2498.87 samples/sec Loss 2.4884 LearningRate 0.000421 Epoch: 16 Global Step: 345230 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:27:49,499-Speed 2497.01 samples/sec Loss 2.4802 LearningRate 0.000421 Epoch: 16 Global Step: 345240 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:27:57,650-Speed 2512.99 samples/sec Loss 2.4818 LearningRate 0.000421 Epoch: 16 Global Step: 345250 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:28:05,867-Speed 2492.99 samples/sec Loss 2.4646 LearningRate 0.000421 Epoch: 16 Global Step: 345260 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:28:14,062-Speed 2499.39 samples/sec Loss 2.5359 LearningRate 0.000421 Epoch: 16 Global Step: 345270 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:28:22,272-Speed 2494.66 samples/sec Loss 2.5407 LearningRate 0.000421 Epoch: 16 Global Step: 345280 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 
21:28:30,470-Speed 2499.13 samples/sec Loss 2.4769 LearningRate 0.000421 Epoch: 16 Global Step: 345290 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:28:38,667-Speed 2498.74 samples/sec Loss 2.5237 LearningRate 0.000421 Epoch: 16 Global Step: 345300 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:28:46,815-Speed 2514.03 samples/sec Loss 2.5051 LearningRate 0.000421 Epoch: 16 Global Step: 345310 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:28:55,019-Speed 2496.85 samples/sec Loss 2.5311 LearningRate 0.000421 Epoch: 16 Global Step: 345320 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:29:03,216-Speed 2498.93 samples/sec Loss 2.5178 LearningRate 0.000421 Epoch: 16 Global Step: 345330 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:29:11,418-Speed 2497.14 samples/sec Loss 2.5132 LearningRate 0.000421 Epoch: 16 Global Step: 345340 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:29:19,616-Speed 2498.60 samples/sec Loss 2.5036 LearningRate 0.000421 Epoch: 16 Global Step: 345350 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:29:27,820-Speed 2497.10 samples/sec Loss 2.5119 LearningRate 0.000421 Epoch: 16 Global Step: 345360 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:29:35,967-Speed 2514.17 samples/sec Loss 2.5118 LearningRate 0.000421 Epoch: 16 Global Step: 345370 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:29:44,166-Speed 2498.19 samples/sec Loss 2.5317 LearningRate 0.000421 Epoch: 16 Global Step: 345380 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:29:52,364-Speed 2498.65 samples/sec Loss 2.4460 LearningRate 0.000421 Epoch: 16 Global Step: 345390 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:30:00,568-Speed 2496.78 samples/sec Loss 2.5638 LearningRate 0.000421 Epoch: 16 Global Step: 345400 Fp16 Grad Scale: 16384 Required: 111 hours Training: 
2022-07-08 21:30:08,765-Speed 2498.76 samples/sec Loss 2.5138 LearningRate 0.000421 Epoch: 16 Global Step: 345410 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:30:16,966-Speed 2497.78 samples/sec Loss 2.5048 LearningRate 0.000420 Epoch: 16 Global Step: 345420 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:30:25,115-Speed 2513.39 samples/sec Loss 2.4589 LearningRate 0.000420 Epoch: 16 Global Step: 345430 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:30:33,319-Speed 2496.78 samples/sec Loss 2.4148 LearningRate 0.000420 Epoch: 16 Global Step: 345440 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:30:41,529-Speed 2494.99 samples/sec Loss 2.4900 LearningRate 0.000420 Epoch: 16 Global Step: 345450 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:30:49,754-Speed 2490.21 samples/sec Loss 2.4535 LearningRate 0.000420 Epoch: 16 Global Step: 345460 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:30:57,959-Speed 2496.58 samples/sec Loss 2.4106 LearningRate 0.000420 Epoch: 16 Global Step: 345470 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:31:06,172-Speed 2493.70 samples/sec Loss 2.4997 LearningRate 0.000420 Epoch: 16 Global Step: 345480 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:31:14,319-Speed 2514.26 samples/sec Loss 2.5193 LearningRate 0.000420 Epoch: 16 Global Step: 345490 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:31:22,520-Speed 2497.61 samples/sec Loss 2.4356 LearningRate 0.000420 Epoch: 16 Global Step: 345500 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:31:30,734-Speed 2493.50 samples/sec Loss 2.4870 LearningRate 0.000420 Epoch: 16 Global Step: 345510 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:31:38,932-Speed 2498.46 samples/sec Loss 2.4868 LearningRate 0.000420 Epoch: 16 Global Step: 345520 Fp16 Grad Scale: 16384 Required: 111 hours 
Training: 2022-07-08 21:31:47,137-Speed 2496.64 samples/sec Loss 2.4529 LearningRate 0.000420 Epoch: 16 Global Step: 345530 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:31:55,333-Speed 2498.85 samples/sec Loss 2.5189 LearningRate 0.000420 Epoch: 16 Global Step: 345540 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:32:03,479-Speed 2514.57 samples/sec Loss 2.5158 LearningRate 0.000420 Epoch: 16 Global Step: 345550 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:32:11,681-Speed 2497.83 samples/sec Loss 2.4712 LearningRate 0.000420 Epoch: 16 Global Step: 345560 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:32:19,889-Speed 2495.39 samples/sec Loss 2.4991 LearningRate 0.000420 Epoch: 16 Global Step: 345570 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:32:28,086-Speed 2498.83 samples/sec Loss 2.4954 LearningRate 0.000420 Epoch: 16 Global Step: 345580 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:32:36,283-Speed 2498.78 samples/sec Loss 2.5158 LearningRate 0.000420 Epoch: 16 Global Step: 345590 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:32:44,480-Speed 2498.98 samples/sec Loss 2.4994 LearningRate 0.000420 Epoch: 16 Global Step: 345600 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:32:52,625-Speed 2514.64 samples/sec Loss 2.5163 LearningRate 0.000420 Epoch: 16 Global Step: 345610 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:33:00,824-Speed 2498.23 samples/sec Loss 2.3962 LearningRate 0.000420 Epoch: 16 Global Step: 345620 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:33:09,021-Speed 2498.99 samples/sec Loss 2.4870 LearningRate 0.000420 Epoch: 16 Global Step: 345630 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:33:17,223-Speed 2498.22 samples/sec Loss 2.4764 LearningRate 0.000420 Epoch: 16 Global Step: 345640 Fp16 Grad Scale: 16384 Required: 111 
hours Training: 2022-07-08 21:33:25,422-Speed 2498.33 samples/sec Loss 2.4935 LearningRate 0.000420 Epoch: 16 Global Step: 345650 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:33:33,622-Speed 2497.85 samples/sec Loss 2.5035 LearningRate 0.000420 Epoch: 16 Global Step: 345660 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:33:41,771-Speed 2513.51 samples/sec Loss 2.5144 LearningRate 0.000420 Epoch: 16 Global Step: 345670 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:33:49,969-Speed 2498.74 samples/sec Loss 2.5004 LearningRate 0.000420 Epoch: 16 Global Step: 345680 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:33:58,168-Speed 2498.32 samples/sec Loss 2.4944 LearningRate 0.000420 Epoch: 16 Global Step: 345690 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:34:06,363-Speed 2499.42 samples/sec Loss 2.5041 LearningRate 0.000420 Epoch: 16 Global Step: 345700 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:34:14,564-Speed 2497.83 samples/sec Loss 2.5004 LearningRate 0.000420 Epoch: 16 Global Step: 345710 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:34:22,767-Speed 2497.03 samples/sec Loss 2.5083 LearningRate 0.000420 Epoch: 16 Global Step: 345720 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:34:30,912-Speed 2514.74 samples/sec Loss 2.5089 LearningRate 0.000420 Epoch: 16 Global Step: 345730 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:34:39,110-Speed 2498.98 samples/sec Loss 2.4796 LearningRate 0.000420 Epoch: 16 Global Step: 345740 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:34:47,306-Speed 2499.03 samples/sec Loss 2.5042 LearningRate 0.000420 Epoch: 16 Global Step: 345750 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:34:55,504-Speed 2498.64 samples/sec Loss 2.4667 LearningRate 0.000420 Epoch: 16 Global Step: 345760 Fp16 Grad Scale: 16384 Required: 
111 hours Training: 2022-07-08 21:35:03,702-Speed 2498.51 samples/sec Loss 2.5014 LearningRate 0.000420 Epoch: 16 Global Step: 345770 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:35:11,904-Speed 2497.32 samples/sec Loss 2.4896 LearningRate 0.000420 Epoch: 16 Global Step: 345780 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:35:20,046-Speed 2515.74 samples/sec Loss 2.4910 LearningRate 0.000420 Epoch: 16 Global Step: 345790 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:35:28,245-Speed 2498.46 samples/sec Loss 2.4670 LearningRate 0.000420 Epoch: 16 Global Step: 345800 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:35:36,444-Speed 2498.30 samples/sec Loss 2.5248 LearningRate 0.000420 Epoch: 16 Global Step: 345810 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:35:44,645-Speed 2497.47 samples/sec Loss 2.5071 LearningRate 0.000420 Epoch: 16 Global Step: 345820 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:35:52,858-Speed 2494.15 samples/sec Loss 2.5200 LearningRate 0.000420 Epoch: 16 Global Step: 345830 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:36:01,055-Speed 2498.68 samples/sec Loss 2.4606 LearningRate 0.000420 Epoch: 16 Global Step: 345840 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:36:09,202-Speed 2514.38 samples/sec Loss 2.5466 LearningRate 0.000420 Epoch: 16 Global Step: 345850 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:36:17,409-Speed 2495.60 samples/sec Loss 2.4352 LearningRate 0.000420 Epoch: 16 Global Step: 345860 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:36:25,617-Speed 2495.83 samples/sec Loss 2.4891 LearningRate 0.000420 Epoch: 16 Global Step: 345870 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:36:33,820-Speed 2497.02 samples/sec Loss 2.4601 LearningRate 0.000420 Epoch: 16 Global Step: 345880 Fp16 Grad Scale: 16384 
Required: 111 hours Training: 2022-07-08 21:36:42,020-Speed 2497.72 samples/sec Loss 2.4567 LearningRate 0.000420 Epoch: 16 Global Step: 345890 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:36:50,219-Speed 2498.30 samples/sec Loss 2.4447 LearningRate 0.000420 Epoch: 16 Global Step: 345900 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:36:58,366-Speed 2514.49 samples/sec Loss 2.4695 LearningRate 0.000420 Epoch: 16 Global Step: 345910 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:37:06,565-Speed 2498.40 samples/sec Loss 2.4798 LearningRate 0.000420 Epoch: 16 Global Step: 345920 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:37:14,765-Speed 2497.89 samples/sec Loss 2.4870 LearningRate 0.000420 Epoch: 16 Global Step: 345930 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:37:22,976-Speed 2494.59 samples/sec Loss 2.4752 LearningRate 0.000420 Epoch: 16 Global Step: 345940 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:37:31,177-Speed 2497.89 samples/sec Loss 2.4889 LearningRate 0.000420 Epoch: 16 Global Step: 345950 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:37:39,376-Speed 2498.13 samples/sec Loss 2.5006 LearningRate 0.000420 Epoch: 16 Global Step: 345960 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:37:47,525-Speed 2513.89 samples/sec Loss 2.4838 LearningRate 0.000420 Epoch: 16 Global Step: 345970 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:37:55,724-Speed 2498.11 samples/sec Loss 2.4622 LearningRate 0.000420 Epoch: 16 Global Step: 345980 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:38:03,925-Speed 2497.64 samples/sec Loss 2.4390 LearningRate 0.000420 Epoch: 16 Global Step: 345990 Fp16 Grad Scale: 16384 Required: 111 hours Training: 2022-07-08 21:38:12,127-Speed 2497.80 samples/sec Loss 2.5185 LearningRate 0.000419 Epoch: 16 Global Step: 346000 Fp16 Grad Scale: 
32768 Required: 111 hours Training: 2022-07-08 21:38:20,330-Speed 2496.93 samples/sec Loss 2.4878 LearningRate 0.000419 Epoch: 16 Global Step: 346010 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:38:28,530-Speed 2498.45 samples/sec Loss 2.5460 LearningRate 0.000419 Epoch: 16 Global Step: 346020 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:38:36,674-Speed 2515.18 samples/sec Loss 2.4606 LearningRate 0.000419 Epoch: 16 Global Step: 346030 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:38:44,913-Speed 2486.06 samples/sec Loss 2.5109 LearningRate 0.000419 Epoch: 16 Global Step: 346040 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:38:53,113-Speed 2497.87 samples/sec Loss 2.4973 LearningRate 0.000419 Epoch: 16 Global Step: 346050 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:39:01,311-Speed 2498.72 samples/sec Loss 2.4974 LearningRate 0.000419 Epoch: 16 Global Step: 346060 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:39:09,513-Speed 2497.07 samples/sec Loss 2.5103 LearningRate 0.000419 Epoch: 16 Global Step: 346070 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:39:17,714-Speed 2497.80 samples/sec Loss 2.4880 LearningRate 0.000419 Epoch: 16 Global Step: 346080 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:39:25,863-Speed 2513.56 samples/sec Loss 2.4692 LearningRate 0.000419 Epoch: 16 Global Step: 346090 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:39:34,066-Speed 2497.05 samples/sec Loss 2.5042 LearningRate 0.000419 Epoch: 16 Global Step: 346100 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:39:42,265-Speed 2498.23 samples/sec Loss 2.5375 LearningRate 0.000419 Epoch: 16 Global Step: 346110 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:39:50,464-Speed 2498.25 samples/sec Loss 2.4980 LearningRate 0.000419 Epoch: 16 Global Step: 346120 Fp16 Grad 
Scale: 32768 Required: 111 hours Training: 2022-07-08 21:39:58,673-Speed 2495.25 samples/sec Loss 2.4991 LearningRate 0.000419 Epoch: 16 Global Step: 346130 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:40:06,879-Speed 2496.63 samples/sec Loss 2.5042 LearningRate 0.000419 Epoch: 16 Global Step: 346140 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:40:15,022-Speed 2515.26 samples/sec Loss 2.5062 LearningRate 0.000419 Epoch: 16 Global Step: 346150 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:40:23,235-Speed 2494.11 samples/sec Loss 2.4786 LearningRate 0.000419 Epoch: 16 Global Step: 346160 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:40:31,436-Speed 2497.52 samples/sec Loss 2.5070 LearningRate 0.000419 Epoch: 16 Global Step: 346170 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:40:39,635-Speed 2498.09 samples/sec Loss 2.5018 LearningRate 0.000419 Epoch: 16 Global Step: 346180 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:40:47,828-Speed 2499.92 samples/sec Loss 2.4770 LearningRate 0.000419 Epoch: 16 Global Step: 346190 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:40:56,027-Speed 2498.45 samples/sec Loss 2.4536 LearningRate 0.000419 Epoch: 16 Global Step: 346200 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:41:04,172-Speed 2514.78 samples/sec Loss 2.5123 LearningRate 0.000419 Epoch: 16 Global Step: 346210 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:41:12,381-Speed 2500.54 samples/sec Loss 2.4478 LearningRate 0.000419 Epoch: 16 Global Step: 346220 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:41:20,595-Speed 2494.05 samples/sec Loss 2.4613 LearningRate 0.000419 Epoch: 16 Global Step: 346230 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:41:28,797-Speed 2497.42 samples/sec Loss 2.5008 LearningRate 0.000419 Epoch: 16 Global Step: 346240 Fp16 
Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:41:36,990-Speed 2499.88 samples/sec Loss 2.4360 LearningRate 0.000419 Epoch: 16 Global Step: 346250 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:41:45,192-Speed 2497.43 samples/sec Loss 2.4519 LearningRate 0.000419 Epoch: 16 Global Step: 346260 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:41:53,337-Speed 2514.87 samples/sec Loss 2.4770 LearningRate 0.000419 Epoch: 16 Global Step: 346270 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:42:01,534-Speed 2498.77 samples/sec Loss 2.4223 LearningRate 0.000419 Epoch: 16 Global Step: 346280 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:42:09,744-Speed 2495.29 samples/sec Loss 2.4697 LearningRate 0.000419 Epoch: 16 Global Step: 346290 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:42:17,941-Speed 2498.88 samples/sec Loss 2.4023 LearningRate 0.000419 Epoch: 16 Global Step: 346300 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:42:26,141-Speed 2497.56 samples/sec Loss 2.4838 LearningRate 0.000419 Epoch: 16 Global Step: 346310 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:42:34,345-Speed 2496.99 samples/sec Loss 2.3987 LearningRate 0.000419 Epoch: 16 Global Step: 346320 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:42:42,487-Speed 2515.61 samples/sec Loss 2.4976 LearningRate 0.000419 Epoch: 16 Global Step: 346330 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:42:50,684-Speed 2498.97 samples/sec Loss 2.4548 LearningRate 0.000419 Epoch: 16 Global Step: 346340 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:42:58,886-Speed 2497.33 samples/sec Loss 2.4779 LearningRate 0.000419 Epoch: 16 Global Step: 346350 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:43:07,088-Speed 2497.43 samples/sec Loss 2.4452 LearningRate 0.000419 Epoch: 16 Global Step: 346360 
Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:43:15,287-Speed 2498.29 samples/sec Loss 2.4662 LearningRate 0.000419 Epoch: 16 Global Step: 346370 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:43:23,498-Speed 2494.32 samples/sec Loss 2.4606 LearningRate 0.000419 Epoch: 16 Global Step: 346380 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:43:31,653-Speed 2512.04 samples/sec Loss 2.4474 LearningRate 0.000419 Epoch: 16 Global Step: 346390 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:43:39,856-Speed 2497.01 samples/sec Loss 2.4622 LearningRate 0.000419 Epoch: 16 Global Step: 346400 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:43:48,052-Speed 2499.32 samples/sec Loss 2.4511 LearningRate 0.000419 Epoch: 16 Global Step: 346410 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:43:56,259-Speed 2496.02 samples/sec Loss 2.4694 LearningRate 0.000419 Epoch: 16 Global Step: 346420 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:44:04,462-Speed 2496.91 samples/sec Loss 2.4718 LearningRate 0.000419 Epoch: 16 Global Step: 346430 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:44:12,662-Speed 2497.92 samples/sec Loss 2.4351 LearningRate 0.000419 Epoch: 16 Global Step: 346440 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:44:20,826-Speed 2508.92 samples/sec Loss 2.4068 LearningRate 0.000419 Epoch: 16 Global Step: 346450 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:44:29,026-Speed 2498.02 samples/sec Loss 2.4983 LearningRate 0.000419 Epoch: 16 Global Step: 346460 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:44:37,239-Speed 2494.09 samples/sec Loss 2.4315 LearningRate 0.000419 Epoch: 16 Global Step: 346470 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:44:45,437-Speed 2498.27 samples/sec Loss 2.4249 LearningRate 0.000419 Epoch: 16 Global Step: 
346480 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:44:53,637-Speed 2498.20 samples/sec Loss 2.4624 LearningRate 0.000419 Epoch: 16 Global Step: 346490 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:45:01,832-Speed 2499.31 samples/sec Loss 2.5202 LearningRate 0.000419 Epoch: 16 Global Step: 346500 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:45:09,980-Speed 2513.86 samples/sec Loss 2.5002 LearningRate 0.000419 Epoch: 16 Global Step: 346510 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:45:18,187-Speed 2500.37 samples/sec Loss 2.4759 LearningRate 0.000419 Epoch: 16 Global Step: 346520 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:45:26,396-Speed 2497.80 samples/sec Loss 2.5079 LearningRate 0.000419 Epoch: 16 Global Step: 346530 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:45:34,610-Speed 2493.83 samples/sec Loss 2.4962 LearningRate 0.000419 Epoch: 16 Global Step: 346540 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:45:42,838-Speed 2489.42 samples/sec Loss 2.4891 LearningRate 0.000419 Epoch: 16 Global Step: 346550 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:45:51,044-Speed 2496.12 samples/sec Loss 2.5330 LearningRate 0.000419 Epoch: 16 Global Step: 346560 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:45:59,190-Speed 2514.62 samples/sec Loss 2.5039 LearningRate 0.000419 Epoch: 16 Global Step: 346570 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:46:07,401-Speed 2494.60 samples/sec Loss 2.4745 LearningRate 0.000418 Epoch: 16 Global Step: 346580 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:46:15,605-Speed 2496.71 samples/sec Loss 2.5081 LearningRate 0.000418 Epoch: 16 Global Step: 346590 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:46:23,803-Speed 2498.81 samples/sec Loss 2.5431 LearningRate 0.000418 Epoch: 16 Global 
Step: 346600 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:46:32,000-Speed 2498.73 samples/sec Loss 2.4954 LearningRate 0.000418 Epoch: 16 Global Step: 346610 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:46:40,200-Speed 2497.84 samples/sec Loss 2.4761 LearningRate 0.000418 Epoch: 16 Global Step: 346620 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:46:48,350-Speed 2513.27 samples/sec Loss 2.5385 LearningRate 0.000418 Epoch: 16 Global Step: 346630 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:46:56,547-Speed 2499.06 samples/sec Loss 2.5841 LearningRate 0.000418 Epoch: 16 Global Step: 346640 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:47:04,743-Speed 2499.04 samples/sec Loss 2.5606 LearningRate 0.000418 Epoch: 16 Global Step: 346650 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:47:12,986-Speed 2484.86 samples/sec Loss 2.5382 LearningRate 0.000418 Epoch: 16 Global Step: 346660 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:47:21,185-Speed 2498.30 samples/sec Loss 2.5676 LearningRate 0.000418 Epoch: 16 Global Step: 346670 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:47:29,391-Speed 2496.55 samples/sec Loss 2.5288 LearningRate 0.000418 Epoch: 16 Global Step: 346680 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:47:37,540-Speed 2513.27 samples/sec Loss 2.5671 LearningRate 0.000418 Epoch: 16 Global Step: 346690 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:47:45,755-Speed 2493.69 samples/sec Loss 2.5727 LearningRate 0.000418 Epoch: 16 Global Step: 346700 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:47:53,957-Speed 2497.22 samples/sec Loss 2.5450 LearningRate 0.000418 Epoch: 16 Global Step: 346710 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:48:02,156-Speed 2498.15 samples/sec Loss 2.5301 LearningRate 0.000418 Epoch: 16 
Global Step: 346720 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:48:10,362-Speed 2496.19 samples/sec Loss 2.5084 LearningRate 0.000418 Epoch: 16 Global Step: 346730 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:48:18,561-Speed 2498.16 samples/sec Loss 2.4881 LearningRate 0.000418 Epoch: 16 Global Step: 346740 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:48:26,712-Speed 2513.10 samples/sec Loss 2.5190 LearningRate 0.000418 Epoch: 16 Global Step: 346750 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:48:34,921-Speed 2495.32 samples/sec Loss 2.4797 LearningRate 0.000418 Epoch: 16 Global Step: 346760 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:48:43,120-Speed 2498.09 samples/sec Loss 2.4576 LearningRate 0.000418 Epoch: 16 Global Step: 346770 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:48:51,321-Speed 2497.63 samples/sec Loss 2.4791 LearningRate 0.000418 Epoch: 16 Global Step: 346780 Fp16 Grad Scale: 32768 Required: 111 hours Training: 2022-07-08 21:48:59,522-Speed 2497.79 samples/sec Loss 2.4639 LearningRate 0.000418 Epoch: 16 Global Step: 346790 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:49:07,724-Speed 2497.44 samples/sec Loss 2.4747 LearningRate 0.000418 Epoch: 16 Global Step: 346800 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:49:15,882-Speed 2510.64 samples/sec Loss 2.5213 LearningRate 0.000418 Epoch: 16 Global Step: 346810 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:49:24,082-Speed 2497.90 samples/sec Loss 2.5094 LearningRate 0.000418 Epoch: 16 Global Step: 346820 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:49:32,282-Speed 2498.20 samples/sec Loss 2.4942 LearningRate 0.000418 Epoch: 16 Global Step: 346830 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:49:40,486-Speed 2496.69 samples/sec Loss 2.5137 LearningRate 0.000418 
Epoch: 16 Global Step: 346840 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:49:48,683-Speed 2498.87 samples/sec Loss 2.4715 LearningRate 0.000418 Epoch: 16 Global Step: 346850 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:49:56,881-Speed 2498.60 samples/sec Loss 2.4903 LearningRate 0.000418 Epoch: 16 Global Step: 346860 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:50:05,025-Speed 2515.26 samples/sec Loss 2.5633 LearningRate 0.000418 Epoch: 16 Global Step: 346870 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:50:13,223-Speed 2498.43 samples/sec Loss 2.5764 LearningRate 0.000418 Epoch: 16 Global Step: 346880 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:50:21,428-Speed 2496.66 samples/sec Loss 2.5052 LearningRate 0.000418 Epoch: 16 Global Step: 346890 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:50:29,636-Speed 2495.48 samples/sec Loss 2.5400 LearningRate 0.000418 Epoch: 16 Global Step: 346900 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:50:37,835-Speed 2498.26 samples/sec Loss 2.5290 LearningRate 0.000418 Epoch: 16 Global Step: 346910 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:50:46,034-Speed 2498.23 samples/sec Loss 2.5256 LearningRate 0.000418 Epoch: 16 Global Step: 346920 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:50:54,179-Speed 2515.02 samples/sec Loss 2.4746 LearningRate 0.000418 Epoch: 16 Global Step: 346930 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:51:02,378-Speed 2498.08 samples/sec Loss 2.4628 LearningRate 0.000418 Epoch: 16 Global Step: 346940 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:51:10,578-Speed 2498.48 samples/sec Loss 2.5366 LearningRate 0.000418 Epoch: 16 Global Step: 346950 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:51:18,779-Speed 2497.50 samples/sec Loss 2.4608 LearningRate 
0.000418 Epoch: 16 Global Step: 346960 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:51:26,978-Speed 2498.12 samples/sec Loss 2.4749 LearningRate 0.000418 Epoch: 16 Global Step: 346970 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:51:35,184-Speed 2496.06 samples/sec Loss 2.5035 LearningRate 0.000418 Epoch: 16 Global Step: 346980 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:51:43,329-Speed 2515.02 samples/sec Loss 2.5493 LearningRate 0.000418 Epoch: 16 Global Step: 346990 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:51:51,529-Speed 2498.03 samples/sec Loss 2.4892 LearningRate 0.000418 Epoch: 16 Global Step: 347000 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:51:59,727-Speed 2498.72 samples/sec Loss 2.4839 LearningRate 0.000418 Epoch: 16 Global Step: 347010 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:52:07,923-Speed 2499.18 samples/sec Loss 2.4918 LearningRate 0.000418 Epoch: 16 Global Step: 347020 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:52:16,122-Speed 2498.29 samples/sec Loss 2.5029 LearningRate 0.000418 Epoch: 16 Global Step: 347030 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:52:24,321-Speed 2498.17 samples/sec Loss 2.5087 LearningRate 0.000418 Epoch: 16 Global Step: 347040 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:52:32,467-Speed 2514.74 samples/sec Loss 2.5138 LearningRate 0.000418 Epoch: 16 Global Step: 347050 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:52:40,668-Speed 2497.79 samples/sec Loss 2.5270 LearningRate 0.000418 Epoch: 16 Global Step: 347060 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:52:48,867-Speed 2498.39 samples/sec Loss 2.5050 LearningRate 0.000418 Epoch: 16 Global Step: 347070 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:52:57,073-Speed 2496.02 samples/sec Loss 2.5167 
LearningRate 0.000418 Epoch: 16 Global Step: 347080 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:53:05,270-Speed 2498.67 samples/sec Loss 2.4994 LearningRate 0.000418 Epoch: 16 Global Step: 347090 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:53:13,470-Speed 2498.16 samples/sec Loss 2.4686 LearningRate 0.000418 Epoch: 16 Global Step: 347100 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:53:21,620-Speed 2513.43 samples/sec Loss 2.4479 LearningRate 0.000418 Epoch: 16 Global Step: 347110 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:53:29,818-Speed 2498.61 samples/sec Loss 2.4887 LearningRate 0.000418 Epoch: 16 Global Step: 347120 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:53:38,021-Speed 2497.18 samples/sec Loss 2.5057 LearningRate 0.000418 Epoch: 16 Global Step: 347130 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:53:46,222-Speed 2497.93 samples/sec Loss 2.5374 LearningRate 0.000418 Epoch: 16 Global Step: 347140 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:53:54,423-Speed 2497.76 samples/sec Loss 2.5048 LearningRate 0.000417 Epoch: 16 Global Step: 347150 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:54:02,622-Speed 2498.36 samples/sec Loss 2.5009 LearningRate 0.000417 Epoch: 16 Global Step: 347160 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:54:10,772-Speed 2513.31 samples/sec Loss 2.4673 LearningRate 0.000417 Epoch: 16 Global Step: 347170 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:54:18,972-Speed 2497.91 samples/sec Loss 2.4752 LearningRate 0.000417 Epoch: 16 Global Step: 347180 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:54:27,171-Speed 2498.43 samples/sec Loss 2.4673 LearningRate 0.000417 Epoch: 16 Global Step: 347190 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 21:54:35,370-Speed 2498.25 samples/sec Loss 
2.5100 LearningRate 0.000417 Epoch: 16 Global Step: 347200 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:54:43,578-Speed 2495.24 samples/sec Loss 2.5275 LearningRate 0.000417 Epoch: 16 Global Step: 347210 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:54:51,780-Speed 2497.44 samples/sec Loss 2.4774 LearningRate 0.000417 Epoch: 16 Global Step: 347220 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:54:59,925-Speed 2515.11 samples/sec Loss 2.5149 LearningRate 0.000417 Epoch: 16 Global Step: 347230 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:55:08,135-Speed 2494.92 samples/sec Loss 2.5145 LearningRate 0.000417 Epoch: 16 Global Step: 347240 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:55:16,334-Speed 2498.38 samples/sec Loss 2.4583 LearningRate 0.000417 Epoch: 16 Global Step: 347250 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:55:24,542-Speed 2495.77 samples/sec Loss 2.5004 LearningRate 0.000417 Epoch: 16 Global Step: 347260 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:55:32,740-Speed 2498.20 samples/sec Loss 2.4981 LearningRate 0.000417 Epoch: 16 Global Step: 347270 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:55:40,940-Speed 2498.09 samples/sec Loss 2.5415 LearningRate 0.000417 Epoch: 16 Global Step: 347280 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:55:49,086-Speed 2514.53 samples/sec Loss 2.4950 LearningRate 0.000417 Epoch: 16 Global Step: 347290 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:55:57,287-Speed 2497.63 samples/sec Loss 2.5600 LearningRate 0.000417 Epoch: 16 Global Step: 347300 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:56:05,487-Speed 2497.77 samples/sec Loss 2.5265 LearningRate 0.000417 Epoch: 16 Global Step: 347310 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:56:13,692-Speed 2496.44 samples/sec 
Loss 2.5453 LearningRate 0.000417 Epoch: 16 Global Step: 347320 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:56:21,897-Speed 2496.69 samples/sec Loss 2.5532 LearningRate 0.000417 Epoch: 16 Global Step: 347330 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:56:30,090-Speed 2499.98 samples/sec Loss 2.5699 LearningRate 0.000417 Epoch: 16 Global Step: 347340 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:56:38,240-Speed 2513.29 samples/sec Loss 2.5449 LearningRate 0.000417 Epoch: 16 Global Step: 347350 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:56:46,441-Speed 2497.72 samples/sec Loss 2.5659 LearningRate 0.000417 Epoch: 16 Global Step: 347360 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:56:54,658-Speed 2493.08 samples/sec Loss 2.5539 LearningRate 0.000417 Epoch: 16 Global Step: 347370 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:57:02,854-Speed 2498.99 samples/sec Loss 2.5689 LearningRate 0.000417 Epoch: 16 Global Step: 347380 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:57:11,052-Speed 2498.64 samples/sec Loss 2.5933 LearningRate 0.000417 Epoch: 16 Global Step: 347390 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:57:19,251-Speed 2498.21 samples/sec Loss 2.4969 LearningRate 0.000417 Epoch: 16 Global Step: 347400 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:57:27,398-Speed 2514.40 samples/sec Loss 2.5123 LearningRate 0.000417 Epoch: 16 Global Step: 347410 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:57:35,598-Speed 2497.82 samples/sec Loss 2.5466 LearningRate 0.000417 Epoch: 16 Global Step: 347420 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:57:43,804-Speed 2496.33 samples/sec Loss 2.5266 LearningRate 0.000417 Epoch: 16 Global Step: 347430 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:57:51,999-Speed 2499.31 
samples/sec Loss 2.5292 LearningRate 0.000417 Epoch: 16 Global Step: 347440 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:58:00,222-Speed 2491.14 samples/sec Loss 2.5303 LearningRate 0.000417 Epoch: 16 Global Step: 347450 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:58:08,420-Speed 2498.43 samples/sec Loss 2.5023 LearningRate 0.000417 Epoch: 16 Global Step: 347460 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:58:16,566-Speed 2514.57 samples/sec Loss 2.5353 LearningRate 0.000417 Epoch: 16 Global Step: 347470 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:58:24,770-Speed 2496.67 samples/sec Loss 2.4811 LearningRate 0.000417 Epoch: 16 Global Step: 347480 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:58:32,971-Speed 2497.68 samples/sec Loss 2.4706 LearningRate 0.000417 Epoch: 16 Global Step: 347490 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:58:41,173-Speed 2497.14 samples/sec Loss 2.4672 LearningRate 0.000417 Epoch: 16 Global Step: 347500 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:58:49,378-Speed 2496.19 samples/sec Loss 2.5244 LearningRate 0.000417 Epoch: 16 Global Step: 347510 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:58:57,602-Speed 2490.71 samples/sec Loss 2.4703 LearningRate 0.000417 Epoch: 16 Global Step: 347520 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:59:05,745-Speed 2515.69 samples/sec Loss 2.4534 LearningRate 0.000417 Epoch: 16 Global Step: 347530 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:59:13,946-Speed 2498.12 samples/sec Loss 2.5060 LearningRate 0.000417 Epoch: 16 Global Step: 347540 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:59:22,148-Speed 2497.35 samples/sec Loss 2.4810 LearningRate 0.000417 Epoch: 16 Global Step: 347550 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:59:30,346-Speed 
2498.51 samples/sec Loss 2.4450 LearningRate 0.000417 Epoch: 16 Global Step: 347560 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:59:38,553-Speed 2495.92 samples/sec Loss 2.4496 LearningRate 0.000417 Epoch: 16 Global Step: 347570 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:59:46,764-Speed 2494.59 samples/sec Loss 2.5431 LearningRate 0.000417 Epoch: 16 Global Step: 347580 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 21:59:54,910-Speed 2515.75 samples/sec Loss 2.4706 LearningRate 0.000417 Epoch: 16 Global Step: 347590 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:00:03,104-Speed 2499.81 samples/sec Loss 2.4884 LearningRate 0.000417 Epoch: 16 Global Step: 347600 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:00:11,332-Speed 2489.43 samples/sec Loss 2.4621 LearningRate 0.000417 Epoch: 16 Global Step: 347610 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:00:19,533-Speed 2497.64 samples/sec Loss 2.4543 LearningRate 0.000417 Epoch: 16 Global Step: 347620 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:00:27,736-Speed 2497.01 samples/sec Loss 2.4823 LearningRate 0.000417 Epoch: 16 Global Step: 347630 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:00:35,936-Speed 2497.94 samples/sec Loss 2.4768 LearningRate 0.000417 Epoch: 16 Global Step: 347640 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:00:44,082-Speed 2514.50 samples/sec Loss 2.5091 LearningRate 0.000417 Epoch: 16 Global Step: 347650 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:00:52,282-Speed 2497.81 samples/sec Loss 2.5337 LearningRate 0.000417 Epoch: 16 Global Step: 347660 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:01:00,482-Speed 2498.44 samples/sec Loss 2.5525 LearningRate 0.000417 Epoch: 16 Global Step: 347670 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 
22:01:08,678-Speed 2499.17 samples/sec Loss 2.5089 LearningRate 0.000417 Epoch: 16 Global Step: 347680 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:01:16,878-Speed 2497.93 samples/sec Loss 2.4964 LearningRate 0.000417 Epoch: 16 Global Step: 347690 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:01:25,079-Speed 2497.52 samples/sec Loss 2.4733 LearningRate 0.000417 Epoch: 16 Global Step: 347700 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:01:33,235-Speed 2511.70 samples/sec Loss 2.5249 LearningRate 0.000417 Epoch: 16 Global Step: 347710 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:01:41,439-Speed 2496.68 samples/sec Loss 2.4682 LearningRate 0.000417 Epoch: 16 Global Step: 347720 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:01:49,644-Speed 2496.54 samples/sec Loss 2.4531 LearningRate 0.000416 Epoch: 16 Global Step: 347730 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:01:57,841-Speed 2498.75 samples/sec Loss 2.5094 LearningRate 0.000416 Epoch: 16 Global Step: 347740 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:02:06,045-Speed 2496.57 samples/sec Loss 2.4383 LearningRate 0.000416 Epoch: 16 Global Step: 347750 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:02:14,254-Speed 2495.28 samples/sec Loss 2.4467 LearningRate 0.000416 Epoch: 16 Global Step: 347760 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:02:22,400-Speed 2514.65 samples/sec Loss 2.4380 LearningRate 0.000416 Epoch: 16 Global Step: 347770 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:02:30,606-Speed 2496.02 samples/sec Loss 2.4493 LearningRate 0.000416 Epoch: 16 Global Step: 347780 Fp16 Grad Scale: 65536 Required: 110 hours Training: 2022-07-08 22:02:38,764-Speed 2510.76 samples/sec Loss 2.4850 LearningRate 0.000416 Epoch: 16 Global Step: 347790 Fp16 Grad Scale: 32768 Required: 110 hours Training: 
2022-07-08 22:02:46,965-Speed 2497.96 samples/sec Loss 2.4303 LearningRate 0.000416 Epoch: 16 Global Step: 347800 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:02:55,168-Speed 2496.76 samples/sec Loss 2.4594 LearningRate 0.000416 Epoch: 16 Global Step: 347810 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:03:03,372-Speed 2496.85 samples/sec Loss 2.4501 LearningRate 0.000416 Epoch: 16 Global Step: 347820 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:03:11,520-Speed 2514.38 samples/sec Loss 2.4899 LearningRate 0.000416 Epoch: 16 Global Step: 347830 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:03:19,723-Speed 2497.11 samples/sec Loss 2.4834 LearningRate 0.000416 Epoch: 16 Global Step: 347840 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:03:27,923-Speed 2497.90 samples/sec Loss 2.4897 LearningRate 0.000416 Epoch: 16 Global Step: 347850 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:03:36,127-Speed 2496.82 samples/sec Loss 2.4530 LearningRate 0.000416 Epoch: 16 Global Step: 347860 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:03:44,329-Speed 2497.18 samples/sec Loss 2.4476 LearningRate 0.000416 Epoch: 16 Global Step: 347870 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:03:52,541-Speed 2494.51 samples/sec Loss 2.4643 LearningRate 0.000416 Epoch: 16 Global Step: 347880 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:04:00,689-Speed 2513.72 samples/sec Loss 2.4990 LearningRate 0.000416 Epoch: 16 Global Step: 347890 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:04:08,898-Speed 2495.35 samples/sec Loss 2.4637 LearningRate 0.000416 Epoch: 16 Global Step: 347900 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:04:17,100-Speed 2497.32 samples/sec Loss 2.4651 LearningRate 0.000416 Epoch: 16 Global Step: 347910 Fp16 Grad Scale: 32768 Required: 110 hours 
Training: 2022-07-08 22:04:25,304-Speed 2497.15 samples/sec Loss 2.4835 LearningRate 0.000416 Epoch: 16 Global Step: 347920 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:04:33,510-Speed 2495.94 samples/sec Loss 2.4596 LearningRate 0.000416 Epoch: 16 Global Step: 347930 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:04:41,716-Speed 2496.35 samples/sec Loss 2.4790 LearningRate 0.000416 Epoch: 16 Global Step: 347940 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:04:49,866-Speed 2513.25 samples/sec Loss 2.4867 LearningRate 0.000416 Epoch: 16 Global Step: 347950 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:04:58,080-Speed 2493.74 samples/sec Loss 2.4876 LearningRate 0.000416 Epoch: 16 Global Step: 347960 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:05:06,281-Speed 2497.68 samples/sec Loss 2.4377 LearningRate 0.000416 Epoch: 16 Global Step: 347970 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:05:14,484-Speed 2497.12 samples/sec Loss 2.4658 LearningRate 0.000416 Epoch: 16 Global Step: 347980 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:05:22,688-Speed 2497.00 samples/sec Loss 2.4841 LearningRate 0.000416 Epoch: 16 Global Step: 347990 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:05:30,892-Speed 2496.70 samples/sec Loss 2.4455 LearningRate 0.000416 Epoch: 16 Global Step: 348000 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:05:39,044-Speed 2512.78 samples/sec Loss 2.4626 LearningRate 0.000416 Epoch: 16 Global Step: 348010 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:05:47,254-Speed 2495.20 samples/sec Loss 2.4868 LearningRate 0.000416 Epoch: 16 Global Step: 348020 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:05:55,456-Speed 2497.29 samples/sec Loss 2.5162 LearningRate 0.000416 Epoch: 16 Global Step: 348030 Fp16 Grad Scale: 32768 Required: 110 
hours Training: 2022-07-08 22:06:03,615-Speed 2510.48 samples/sec Loss 2.4770 LearningRate 0.000416 Epoch: 16 Global Step: 348040 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:06:11,826-Speed 2494.93 samples/sec Loss 2.4402 LearningRate 0.000416 Epoch: 16 Global Step: 348050 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:06:20,030-Speed 2497.04 samples/sec Loss 2.4807 LearningRate 0.000416 Epoch: 16 Global Step: 348060 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:06:28,177-Speed 2514.05 samples/sec Loss 2.4655 LearningRate 0.000416 Epoch: 16 Global Step: 348070 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:06:36,379-Speed 2497.17 samples/sec Loss 2.4334 LearningRate 0.000416 Epoch: 16 Global Step: 348080 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:06:44,580-Speed 2497.63 samples/sec Loss 2.4611 LearningRate 0.000416 Epoch: 16 Global Step: 348090 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:06:52,784-Speed 2497.31 samples/sec Loss 2.4309 LearningRate 0.000416 Epoch: 16 Global Step: 348100 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:07:00,981-Speed 2498.96 samples/sec Loss 2.4548 LearningRate 0.000416 Epoch: 16 Global Step: 348110 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:07:09,179-Speed 2498.41 samples/sec Loss 2.4573 LearningRate 0.000416 Epoch: 16 Global Step: 348120 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:07:17,324-Speed 2515.12 samples/sec Loss 2.4481 LearningRate 0.000416 Epoch: 16 Global Step: 348130 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:07:25,521-Speed 2499.03 samples/sec Loss 2.4906 LearningRate 0.000416 Epoch: 16 Global Step: 348140 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:07:33,726-Speed 2496.38 samples/sec Loss 2.4768 LearningRate 0.000416 Epoch: 16 Global Step: 348150 Fp16 Grad Scale: 16384 Required: 
110 hours Training: 2022-07-08 22:07:41,923-Speed 2498.75 samples/sec Loss 2.4727 LearningRate 0.000416 Epoch: 16 Global Step: 348160 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:07:50,118-Speed 2499.71 samples/sec Loss 2.4905 LearningRate 0.000416 Epoch: 16 Global Step: 348170 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:07:58,316-Speed 2498.95 samples/sec Loss 2.4966 LearningRate 0.000416 Epoch: 16 Global Step: 348180 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:08:06,460-Speed 2514.90 samples/sec Loss 2.5080 LearningRate 0.000416 Epoch: 16 Global Step: 348190 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:08:14,654-Speed 2499.70 samples/sec Loss 2.4594 LearningRate 0.000416 Epoch: 16 Global Step: 348200 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:08:22,853-Speed 2498.24 samples/sec Loss 2.4727 LearningRate 0.000416 Epoch: 16 Global Step: 348210 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:08:31,049-Speed 2499.10 samples/sec Loss 2.4847 LearningRate 0.000416 Epoch: 16 Global Step: 348220 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:08:39,245-Speed 2499.34 samples/sec Loss 2.4202 LearningRate 0.000416 Epoch: 16 Global Step: 348230 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:08:47,445-Speed 2498.10 samples/sec Loss 2.4752 LearningRate 0.000416 Epoch: 16 Global Step: 348240 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:08:55,592-Speed 2514.33 samples/sec Loss 2.4903 LearningRate 0.000416 Epoch: 16 Global Step: 348250 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:09:03,789-Speed 2498.80 samples/sec Loss 2.4659 LearningRate 0.000416 Epoch: 16 Global Step: 348260 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:09:11,989-Speed 2497.79 samples/sec Loss 2.4180 LearningRate 0.000416 Epoch: 16 Global Step: 348270 Fp16 Grad Scale: 16384 
Required: 110 hours Training: 2022-07-08 22:09:20,183-Speed 2499.76 samples/sec Loss 2.4573 LearningRate 0.000416 Epoch: 16 Global Step: 348280 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:09:28,398-Speed 2493.51 samples/sec Loss 2.5033 LearningRate 0.000416 Epoch: 16 Global Step: 348290 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:09:36,594-Speed 2498.94 samples/sec Loss 2.4718 LearningRate 0.000416 Epoch: 16 Global Step: 348300 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:09:44,740-Speed 2514.43 samples/sec Loss 2.4726 LearningRate 0.000415 Epoch: 16 Global Step: 348310 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:09:52,945-Speed 2496.50 samples/sec Loss 2.4629 LearningRate 0.000415 Epoch: 16 Global Step: 348320 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:10:01,146-Speed 2497.55 samples/sec Loss 2.5119 LearningRate 0.000415 Epoch: 16 Global Step: 348330 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:10:09,469-Speed 2499.78 samples/sec Loss 2.4948 LearningRate 0.000415 Epoch: 16 Global Step: 348340 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:10:17,667-Speed 2498.63 samples/sec Loss 2.5289 LearningRate 0.000415 Epoch: 16 Global Step: 348350 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:10:25,871-Speed 2496.81 samples/sec Loss 2.5269 LearningRate 0.000415 Epoch: 16 Global Step: 348360 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:10:34,015-Speed 2515.07 samples/sec Loss 2.4958 LearningRate 0.000415 Epoch: 16 Global Step: 348370 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:10:42,214-Speed 2498.19 samples/sec Loss 2.4160 LearningRate 0.000415 Epoch: 16 Global Step: 348380 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:10:50,409-Speed 2499.50 samples/sec Loss 2.4636 LearningRate 0.000415 Epoch: 16 Global Step: 348390 Fp16 Grad Scale: 
16384 Required: 110 hours Training: 2022-07-08 22:10:58,612-Speed 2496.81 samples/sec Loss 2.4579 LearningRate 0.000415 Epoch: 16 Global Step: 348400 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:11:06,812-Speed 2498.12 samples/sec Loss 2.4313 LearningRate 0.000415 Epoch: 16 Global Step: 348410 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:11:15,009-Speed 2498.64 samples/sec Loss 2.4846 LearningRate 0.000415 Epoch: 16 Global Step: 348420 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:11:23,158-Speed 2513.70 samples/sec Loss 2.4936 LearningRate 0.000415 Epoch: 16 Global Step: 348430 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:11:31,353-Speed 2499.41 samples/sec Loss 2.4604 LearningRate 0.000415 Epoch: 16 Global Step: 348440 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:11:39,548-Speed 2499.53 samples/sec Loss 2.5233 LearningRate 0.000415 Epoch: 16 Global Step: 348450 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:11:47,745-Speed 2498.90 samples/sec Loss 2.4642 LearningRate 0.000415 Epoch: 16 Global Step: 348460 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:11:55,939-Speed 2499.70 samples/sec Loss 2.5363 LearningRate 0.000415 Epoch: 16 Global Step: 348470 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:12:04,138-Speed 2498.13 samples/sec Loss 2.5144 LearningRate 0.000415 Epoch: 16 Global Step: 348480 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:12:12,279-Speed 2516.27 samples/sec Loss 2.5232 LearningRate 0.000415 Epoch: 16 Global Step: 348490 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:12:20,479-Speed 2497.87 samples/sec Loss 2.4926 LearningRate 0.000415 Epoch: 16 Global Step: 348500 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:12:28,677-Speed 2499.07 samples/sec Loss 2.4741 LearningRate 0.000415 Epoch: 16 Global Step: 348510 Fp16 Grad 
Scale: 16384 Required: 110 hours Training: 2022-07-08 22:12:36,878-Speed 2497.48 samples/sec Loss 2.4277 LearningRate 0.000415 Epoch: 16 Global Step: 348520 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:12:45,080-Speed 2497.63 samples/sec Loss 2.4890 LearningRate 0.000415 Epoch: 16 Global Step: 348530 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:12:53,279-Speed 2498.06 samples/sec Loss 2.4993 LearningRate 0.000415 Epoch: 16 Global Step: 348540 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:13:01,424-Speed 2514.90 samples/sec Loss 2.4700 LearningRate 0.000415 Epoch: 16 Global Step: 348550 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:13:09,627-Speed 2497.00 samples/sec Loss 2.4265 LearningRate 0.000415 Epoch: 16 Global Step: 348560 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:13:17,823-Speed 2499.17 samples/sec Loss 2.4773 LearningRate 0.000415 Epoch: 16 Global Step: 348570 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:13:26,037-Speed 2493.87 samples/sec Loss 2.5353 LearningRate 0.000415 Epoch: 16 Global Step: 348580 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:13:34,235-Speed 2498.58 samples/sec Loss 2.5182 LearningRate 0.000415 Epoch: 16 Global Step: 348590 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:13:42,449-Speed 2493.73 samples/sec Loss 2.5448 LearningRate 0.000415 Epoch: 16 Global Step: 348600 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:13:50,616-Speed 2508.16 samples/sec Loss 2.5312 LearningRate 0.000415 Epoch: 16 Global Step: 348610 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:13:58,826-Speed 2494.93 samples/sec Loss 2.4503 LearningRate 0.000415 Epoch: 16 Global Step: 348620 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:14:07,029-Speed 2497.07 samples/sec Loss 2.5072 LearningRate 0.000415 Epoch: 16 Global Step: 348630 Fp16 
Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:14:15,227-Speed 2498.57 samples/sec Loss 2.4697 LearningRate 0.000415 Epoch: 16 Global Step: 348640 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:14:23,425-Speed 2498.98 samples/sec Loss 2.4969 LearningRate 0.000415 Epoch: 16 Global Step: 348650 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:14:31,628-Speed 2497.02 samples/sec Loss 2.4192 LearningRate 0.000415 Epoch: 16 Global Step: 348660 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:14:39,776-Speed 2513.85 samples/sec Loss 2.5091 LearningRate 0.000415 Epoch: 16 Global Step: 348670 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:14:47,976-Speed 2498.11 samples/sec Loss 2.4918 LearningRate 0.000415 Epoch: 16 Global Step: 348680 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:14:56,174-Speed 2498.32 samples/sec Loss 2.4703 LearningRate 0.000415 Epoch: 16 Global Step: 348690 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:15:04,384-Speed 2495.03 samples/sec Loss 2.4524 LearningRate 0.000415 Epoch: 16 Global Step: 348700 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:15:12,580-Speed 2499.19 samples/sec Loss 2.4729 LearningRate 0.000415 Epoch: 16 Global Step: 348710 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:15:20,779-Speed 2498.08 samples/sec Loss 2.4189 LearningRate 0.000415 Epoch: 16 Global Step: 348720 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:15:28,924-Speed 2514.97 samples/sec Loss 2.4648 LearningRate 0.000415 Epoch: 16 Global Step: 348730 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:15:37,121-Speed 2498.90 samples/sec Loss 2.4517 LearningRate 0.000415 Epoch: 16 Global Step: 348740 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:15:45,325-Speed 2496.94 samples/sec Loss 2.4944 LearningRate 0.000415 Epoch: 16 Global Step: 348750 
Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:15:53,522-Speed 2498.78 samples/sec Loss 2.4752 LearningRate 0.000415 Epoch: 16 Global Step: 348760 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:16:01,722-Speed 2497.81 samples/sec Loss 2.4472 LearningRate 0.000415 Epoch: 16 Global Step: 348770 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:16:09,925-Speed 2497.02 samples/sec Loss 2.5167 LearningRate 0.000415 Epoch: 16 Global Step: 348780 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:16:18,075-Speed 2513.20 samples/sec Loss 2.4794 LearningRate 0.000415 Epoch: 16 Global Step: 348790 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:16:26,279-Speed 2497.65 samples/sec Loss 2.4402 LearningRate 0.000415 Epoch: 16 Global Step: 348800 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:16:34,475-Speed 2499.40 samples/sec Loss 2.4728 LearningRate 0.000415 Epoch: 16 Global Step: 348810 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:16:42,677-Speed 2497.38 samples/sec Loss 2.4682 LearningRate 0.000415 Epoch: 16 Global Step: 348820 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:16:50,878-Speed 2498.18 samples/sec Loss 2.4954 LearningRate 0.000415 Epoch: 16 Global Step: 348830 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:16:59,079-Speed 2497.48 samples/sec Loss 2.5089 LearningRate 0.000415 Epoch: 16 Global Step: 348840 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:17:07,231-Speed 2512.80 samples/sec Loss 2.4606 LearningRate 0.000415 Epoch: 16 Global Step: 348850 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:17:15,432-Speed 2497.65 samples/sec Loss 2.4526 LearningRate 0.000415 Epoch: 16 Global Step: 348860 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:17:23,629-Speed 2498.80 samples/sec Loss 2.4739 LearningRate 0.000415 Epoch: 16 Global Step: 
348870 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:17:31,831-Speed 2497.33 samples/sec Loss 2.5204 LearningRate 0.000415 Epoch: 16 Global Step: 348880 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:17:40,036-Speed 2496.56 samples/sec Loss 2.4992 LearningRate 0.000414 Epoch: 16 Global Step: 348890 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:17:48,234-Speed 2498.38 samples/sec Loss 2.4710 LearningRate 0.000414 Epoch: 16 Global Step: 348900 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:17:56,386-Speed 2512.71 samples/sec Loss 2.4710 LearningRate 0.000414 Epoch: 16 Global Step: 348910 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:18:04,584-Speed 2498.79 samples/sec Loss 2.5169 LearningRate 0.000414 Epoch: 16 Global Step: 348920 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:18:12,782-Speed 2498.34 samples/sec Loss 2.4884 LearningRate 0.000414 Epoch: 16 Global Step: 348930 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:18:20,984-Speed 2497.40 samples/sec Loss 2.5273 LearningRate 0.000414 Epoch: 16 Global Step: 348940 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:18:29,182-Speed 2498.51 samples/sec Loss 2.4864 LearningRate 0.000414 Epoch: 16 Global Step: 348950 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:18:37,382-Speed 2498.23 samples/sec Loss 2.5610 LearningRate 0.000414 Epoch: 16 Global Step: 348960 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:18:45,530-Speed 2513.84 samples/sec Loss 2.4565 LearningRate 0.000414 Epoch: 16 Global Step: 348970 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:18:53,734-Speed 2496.81 samples/sec Loss 2.4899 LearningRate 0.000414 Epoch: 16 Global Step: 348980 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:19:01,933-Speed 2498.26 samples/sec Loss 2.5540 LearningRate 0.000414 Epoch: 16 Global 
Step: 348990 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:19:10,139-Speed 2495.99 samples/sec Loss 2.4822 LearningRate 0.000414 Epoch: 16 Global Step: 349000 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:19:18,342-Speed 2497.09 samples/sec Loss 2.5103 LearningRate 0.000414 Epoch: 16 Global Step: 349010 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:19:26,540-Speed 2498.57 samples/sec Loss 2.5135 LearningRate 0.000414 Epoch: 16 Global Step: 349020 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:19:34,687-Speed 2514.30 samples/sec Loss 2.4880 LearningRate 0.000414 Epoch: 16 Global Step: 349030 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:19:42,890-Speed 2497.05 samples/sec Loss 2.4319 LearningRate 0.000414 Epoch: 16 Global Step: 349040 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:19:51,088-Speed 2498.58 samples/sec Loss 2.5027 LearningRate 0.000414 Epoch: 16 Global Step: 349050 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:19:59,286-Speed 2498.54 samples/sec Loss 2.5181 LearningRate 0.000414 Epoch: 16 Global Step: 349060 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:20:07,487-Speed 2497.94 samples/sec Loss 2.5175 LearningRate 0.000414 Epoch: 16 Global Step: 349070 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:20:15,686-Speed 2498.43 samples/sec Loss 2.4618 LearningRate 0.000414 Epoch: 16 Global Step: 349080 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:20:23,832-Speed 2514.53 samples/sec Loss 2.4913 LearningRate 0.000414 Epoch: 16 Global Step: 349090 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:20:32,035-Speed 2496.84 samples/sec Loss 2.4659 LearningRate 0.000414 Epoch: 16 Global Step: 349100 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:20:40,240-Speed 2496.69 samples/sec Loss 2.4904 LearningRate 0.000414 Epoch: 16 
Global Step: 349110 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:20:48,438-Speed 2498.75 samples/sec Loss 2.4328 LearningRate 0.000414 Epoch: 16 Global Step: 349120 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:20:56,638-Speed 2497.93 samples/sec Loss 2.5032 LearningRate 0.000414 Epoch: 16 Global Step: 349130 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:21:04,841-Speed 2496.99 samples/sec Loss 2.4851 LearningRate 0.000414 Epoch: 16 Global Step: 349140 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:21:12,993-Speed 2512.96 samples/sec Loss 2.5311 LearningRate 0.000414 Epoch: 16 Global Step: 349150 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:21:21,202-Speed 2495.19 samples/sec Loss 2.4757 LearningRate 0.000414 Epoch: 16 Global Step: 349160 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:21:29,404-Speed 2497.35 samples/sec Loss 2.4569 LearningRate 0.000414 Epoch: 16 Global Step: 349170 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:21:37,620-Speed 2493.35 samples/sec Loss 2.4850 LearningRate 0.000414 Epoch: 16 Global Step: 349180 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:21:45,818-Speed 2498.36 samples/sec Loss 2.4221 LearningRate 0.000414 Epoch: 16 Global Step: 349190 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:21:54,017-Speed 2498.21 samples/sec Loss 2.5605 LearningRate 0.000414 Epoch: 16 Global Step: 349200 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:22:02,164-Speed 2515.02 samples/sec Loss 2.5425 LearningRate 0.000414 Epoch: 16 Global Step: 349210 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:22:10,366-Speed 2497.65 samples/sec Loss 2.4880 LearningRate 0.000414 Epoch: 16 Global Step: 349220 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:22:18,564-Speed 2498.38 samples/sec Loss 2.4269 LearningRate 0.000414 
Epoch: 16 Global Step: 349230 Fp16 Grad Scale: 16384 Required: 110 hours Training: 2022-07-08 22:22:26,774-Speed 2494.90 samples/sec Loss 2.4444 LearningRate 0.000414 Epoch: 16 Global Step: 349240 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:22:34,973-Speed 2498.21 samples/sec Loss 2.4669 LearningRate 0.000414 Epoch: 16 Global Step: 349250 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:22:43,181-Speed 2495.63 samples/sec Loss 2.4479 LearningRate 0.000414 Epoch: 16 Global Step: 349260 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:22:51,327-Speed 2514.62 samples/sec Loss 2.4667 LearningRate 0.000414 Epoch: 16 Global Step: 349270 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:22:59,545-Speed 2492.46 samples/sec Loss 2.4926 LearningRate 0.000414 Epoch: 16 Global Step: 349280 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:23:07,752-Speed 2495.87 samples/sec Loss 2.4449 LearningRate 0.000414 Epoch: 16 Global Step: 349290 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:23:15,964-Speed 2494.27 samples/sec Loss 2.4502 LearningRate 0.000414 Epoch: 16 Global Step: 349300 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:23:24,170-Speed 2495.93 samples/sec Loss 2.5023 LearningRate 0.000414 Epoch: 16 Global Step: 349310 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:23:32,371-Speed 2497.59 samples/sec Loss 2.5256 LearningRate 0.000414 Epoch: 16 Global Step: 349320 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:23:40,524-Speed 2512.55 samples/sec Loss 2.4839 LearningRate 0.000414 Epoch: 16 Global Step: 349330 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:23:48,733-Speed 2495.10 samples/sec Loss 2.4973 LearningRate 0.000414 Epoch: 16 Global Step: 349340 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:23:56,939-Speed 2496.43 samples/sec Loss 2.5335 LearningRate 
0.000414 Epoch: 16 Global Step: 349350 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:24:05,141-Speed 2497.19 samples/sec Loss 2.5302 LearningRate 0.000414 Epoch: 16 Global Step: 349360 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:24:13,342-Speed 2497.82 samples/sec Loss 2.4174 LearningRate 0.000414 Epoch: 16 Global Step: 349370 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:24:21,540-Speed 2498.44 samples/sec Loss 2.5024 LearningRate 0.000414 Epoch: 16 Global Step: 349380 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:24:29,685-Speed 2514.72 samples/sec Loss 2.4852 LearningRate 0.000414 Epoch: 16 Global Step: 349390 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:24:37,887-Speed 2497.30 samples/sec Loss 2.4574 LearningRate 0.000414 Epoch: 16 Global Step: 349400 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:24:46,096-Speed 2495.23 samples/sec Loss 2.4248 LearningRate 0.000414 Epoch: 16 Global Step: 349410 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:24:54,303-Speed 2495.62 samples/sec Loss 2.4715 LearningRate 0.000414 Epoch: 16 Global Step: 349420 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:25:02,503-Speed 2498.01 samples/sec Loss 2.4412 LearningRate 0.000414 Epoch: 16 Global Step: 349430 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:25:10,704-Speed 2497.75 samples/sec Loss 2.4559 LearningRate 0.000414 Epoch: 16 Global Step: 349440 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:25:18,851-Speed 2514.18 samples/sec Loss 2.4642 LearningRate 0.000414 Epoch: 16 Global Step: 349450 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:25:27,047-Speed 2499.02 samples/sec Loss 2.4836 LearningRate 0.000414 Epoch: 16 Global Step: 349460 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:25:35,254-Speed 2495.65 samples/sec Loss 2.4566 
LearningRate 0.000413 Epoch: 16 Global Step: 349470 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:25:43,449-Speed 2499.52 samples/sec Loss 2.4572 LearningRate 0.000413 Epoch: 16 Global Step: 349480 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:25:51,646-Speed 2498.82 samples/sec Loss 2.4382 LearningRate 0.000413 Epoch: 16 Global Step: 349490 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:25:59,850-Speed 2496.77 samples/sec Loss 2.4247 LearningRate 0.000413 Epoch: 16 Global Step: 349500 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:26:08,002-Speed 2512.76 samples/sec Loss 2.4484 LearningRate 0.000413 Epoch: 16 Global Step: 349510 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:26:16,200-Speed 2498.49 samples/sec Loss 2.4622 LearningRate 0.000413 Epoch: 16 Global Step: 349520 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:26:24,399-Speed 2498.57 samples/sec Loss 2.4591 LearningRate 0.000413 Epoch: 16 Global Step: 349530 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:26:32,594-Speed 2499.16 samples/sec Loss 2.4705 LearningRate 0.000413 Epoch: 16 Global Step: 349540 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:26:40,795-Speed 2497.76 samples/sec Loss 2.4244 LearningRate 0.000413 Epoch: 16 Global Step: 349550 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:26:48,993-Speed 2498.60 samples/sec Loss 2.4367 LearningRate 0.000413 Epoch: 16 Global Step: 349560 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:26:57,149-Speed 2511.55 samples/sec Loss 2.5020 LearningRate 0.000413 Epoch: 16 Global Step: 349570 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:27:05,346-Speed 2498.85 samples/sec Loss 2.4586 LearningRate 0.000413 Epoch: 16 Global Step: 349580 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:27:13,545-Speed 2498.41 samples/sec Loss 
2.4551 LearningRate 0.000413 Epoch: 16 Global Step: 349590 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:27:21,748-Speed 2496.89 samples/sec Loss 2.4745 LearningRate 0.000413 Epoch: 16 Global Step: 349600 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:27:29,955-Speed 2495.80 samples/sec Loss 2.4490 LearningRate 0.000413 Epoch: 16 Global Step: 349610 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:27:38,155-Speed 2497.91 samples/sec Loss 2.4992 LearningRate 0.000413 Epoch: 16 Global Step: 349620 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:27:46,307-Speed 2513.52 samples/sec Loss 2.4612 LearningRate 0.000413 Epoch: 16 Global Step: 349630 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:27:54,512-Speed 2496.35 samples/sec Loss 2.5003 LearningRate 0.000413 Epoch: 16 Global Step: 349640 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:28:02,713-Speed 2497.69 samples/sec Loss 2.4760 LearningRate 0.000413 Epoch: 16 Global Step: 349650 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:28:10,935-Speed 2491.44 samples/sec Loss 2.4758 LearningRate 0.000413 Epoch: 16 Global Step: 349660 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:28:19,133-Speed 2498.54 samples/sec Loss 2.4442 LearningRate 0.000413 Epoch: 16 Global Step: 349670 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:28:27,332-Speed 2498.11 samples/sec Loss 2.5291 LearningRate 0.000413 Epoch: 16 Global Step: 349680 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:28:35,479-Speed 2514.31 samples/sec Loss 2.4593 LearningRate 0.000413 Epoch: 16 Global Step: 349690 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:28:44,010-Speed 2495.60 samples/sec Loss 2.4272 LearningRate 0.000413 Epoch: 16 Global Step: 349700 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:28:52,427-Speed 2497.57 samples/sec 
Loss 2.5533 LearningRate 0.000413 Epoch: 16 Global Step: 349710 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:29:00,672-Speed 2500.02 samples/sec Loss 2.4786 LearningRate 0.000413 Epoch: 16 Global Step: 349720 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:29:08,867-Speed 2499.34 samples/sec Loss 2.4388 LearningRate 0.000413 Epoch: 16 Global Step: 349730 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:29:17,146-Speed 2500.37 samples/sec Loss 2.4053 LearningRate 0.000413 Epoch: 16 Global Step: 349740 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:29:26,598-Speed 2517.40 samples/sec Loss 2.3935 LearningRate 0.000413 Epoch: 16 Global Step: 349750 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:29:34,866-Speed 2501.55 samples/sec Loss 2.3975 LearningRate 0.000413 Epoch: 16 Global Step: 349760 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:29:43,065-Speed 2498.18 samples/sec Loss 2.4376 LearningRate 0.000413 Epoch: 16 Global Step: 349770 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:29:51,335-Speed 2500.26 samples/sec Loss 2.5150 LearningRate 0.000413 Epoch: 16 Global Step: 349780 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:30:04,598-Speed 2494.47 samples/sec Loss 2.4804 LearningRate 0.000413 Epoch: 16 Global Step: 349790 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:30:12,860-Speed 2498.48 samples/sec Loss 2.4471 LearningRate 0.000413 Epoch: 16 Global Step: 349800 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:30:24,900-Speed 1710.61 samples/sec Loss 2.4644 LearningRate 0.000413 Epoch: 16 Global Step: 349810 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:30:33,156-Speed 2502.90 samples/sec Loss 2.5008 LearningRate 0.000413 Epoch: 16 Global Step: 349820 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:30:42,876-Speed 2107.27 
samples/sec Loss 2.4641 LearningRate 0.000413 Epoch: 16 Global Step: 349830 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:30:51,083-Speed 2499.55 samples/sec Loss 2.5285 LearningRate 0.000413 Epoch: 16 Global Step: 349840 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:31:01,066-Speed 2069.60 samples/sec Loss 2.4710 LearningRate 0.000413 Epoch: 16 Global Step: 349850 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:31:09,270-Speed 2496.86 samples/sec Loss 2.4540 LearningRate 0.000413 Epoch: 16 Global Step: 349860 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:31:17,488-Speed 2492.33 samples/sec Loss 2.5145 LearningRate 0.000413 Epoch: 16 Global Step: 349870 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:31:25,725-Speed 2498.60 samples/sec Loss 2.4714 LearningRate 0.000413 Epoch: 16 Global Step: 349880 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:31:38,408-Speed 1622.98 samples/sec Loss 2.4670 LearningRate 0.000413 Epoch: 16 Global Step: 349890 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:31:46,614-Speed 2496.08 samples/sec Loss 2.4422 LearningRate 0.000413 Epoch: 16 Global Step: 349900 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:31:54,883-Speed 2496.79 samples/sec Loss 2.4004 LearningRate 0.000413 Epoch: 16 Global Step: 349910 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:32:03,124-Speed 2495.63 samples/sec Loss 2.4550 LearningRate 0.000413 Epoch: 16 Global Step: 349920 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:32:11,996-Speed 2308.47 samples/sec Loss 2.4078 LearningRate 0.000413 Epoch: 16 Global Step: 349930 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:32:20,217-Speed 2491.69 samples/sec Loss 2.4572 LearningRate 0.000413 Epoch: 16 Global Step: 349940 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:32:28,485-Speed 
2492.35 samples/sec Loss 2.4182 LearningRate 0.000413 Epoch: 16 Global Step: 349950 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:32:36,737-Speed 2491.52 samples/sec Loss 2.4679 LearningRate 0.000413 Epoch: 16 Global Step: 349960 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:32:45,253-Speed 2405.08 samples/sec Loss 2.4351 LearningRate 0.000413 Epoch: 16 Global Step: 349970 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:32:53,485-Speed 2490.36 samples/sec Loss 2.4592 LearningRate 0.000413 Epoch: 16 Global Step: 349980 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:33:01,659-Speed 2506.11 samples/sec Loss 2.4903 LearningRate 0.000413 Epoch: 16 Global Step: 349990 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:33:09,882-Speed 2490.83 samples/sec Loss 2.4054 LearningRate 0.000413 Epoch: 16 Global Step: 350000 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:33:18,097-Speed 2493.24 samples/sec Loss 2.3743 LearningRate 0.000413 Epoch: 16 Global Step: 350010 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:33:26,324-Speed 2494.63 samples/sec Loss 2.4563 LearningRate 0.000413 Epoch: 16 Global Step: 350020 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:33:34,544-Speed 2493.80 samples/sec Loss 2.4688 LearningRate 0.000413 Epoch: 16 Global Step: 350030 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:33:42,748-Speed 2496.62 samples/sec Loss 2.4582 LearningRate 0.000413 Epoch: 16 Global Step: 350040 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:33:50,899-Speed 2513.08 samples/sec Loss 2.4895 LearningRate 0.000412 Epoch: 16 Global Step: 350050 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:33:59,102-Speed 2497.21 samples/sec Loss 2.4695 LearningRate 0.000412 Epoch: 16 Global Step: 350060 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 
22:34:07,306-Speed 2496.68 samples/sec Loss 2.4773 LearningRate 0.000412 Epoch: 16 Global Step: 350070 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:34:15,512-Speed 2496.12 samples/sec Loss 2.4533 LearningRate 0.000412 Epoch: 16 Global Step: 350080 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:34:23,718-Speed 2496.24 samples/sec Loss 2.4966 LearningRate 0.000412 Epoch: 16 Global Step: 350090 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:34:31,925-Speed 2495.65 samples/sec Loss 2.4777 LearningRate 0.000412 Epoch: 16 Global Step: 350100 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:34:40,093-Speed 2507.90 samples/sec Loss 2.5099 LearningRate 0.000412 Epoch: 16 Global Step: 350110 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:34:48,299-Speed 2496.50 samples/sec Loss 2.4753 LearningRate 0.000412 Epoch: 16 Global Step: 350120 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:34:56,505-Speed 2496.09 samples/sec Loss 2.5110 LearningRate 0.000412 Epoch: 16 Global Step: 350130 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:35:04,714-Speed 2495.27 samples/sec Loss 2.4325 LearningRate 0.000412 Epoch: 16 Global Step: 350140 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:35:12,931-Speed 2492.90 samples/sec Loss 2.4561 LearningRate 0.000412 Epoch: 16 Global Step: 350150 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:35:21,141-Speed 2494.98 samples/sec Loss 2.5342 LearningRate 0.000412 Epoch: 16 Global Step: 350160 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:35:29,293-Speed 2512.56 samples/sec Loss 2.4331 LearningRate 0.000412 Epoch: 16 Global Step: 350170 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:35:37,500-Speed 2495.97 samples/sec Loss 2.4813 LearningRate 0.000412 Epoch: 16 Global Step: 350180 Fp16 Grad Scale: 32768 Required: 110 hours Training: 
2022-07-08 22:35:45,707-Speed 2496.15 samples/sec Loss 2.4352 LearningRate 0.000412 Epoch: 16 Global Step: 350190 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:35:53,918-Speed 2494.43 samples/sec Loss 2.4766 LearningRate 0.000412 Epoch: 16 Global Step: 350200 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:36:02,121-Speed 2496.99 samples/sec Loss 2.4671 LearningRate 0.000412 Epoch: 16 Global Step: 350210 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:36:10,327-Speed 2496.31 samples/sec Loss 2.4919 LearningRate 0.000412 Epoch: 16 Global Step: 350220 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:36:18,475-Speed 2513.98 samples/sec Loss 2.3987 LearningRate 0.000412 Epoch: 16 Global Step: 350230 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:36:26,680-Speed 2496.30 samples/sec Loss 2.4908 LearningRate 0.000412 Epoch: 16 Global Step: 350240 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:36:34,885-Speed 2496.24 samples/sec Loss 2.4636 LearningRate 0.000412 Epoch: 16 Global Step: 350250 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:36:43,100-Speed 2493.46 samples/sec Loss 2.5234 LearningRate 0.000412 Epoch: 16 Global Step: 350260 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:36:51,304-Speed 2496.81 samples/sec Loss 2.5151 LearningRate 0.000412 Epoch: 16 Global Step: 350270 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:36:59,509-Speed 2496.62 samples/sec Loss 2.4844 LearningRate 0.000412 Epoch: 16 Global Step: 350280 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:37:07,664-Speed 2511.69 samples/sec Loss 2.4859 LearningRate 0.000412 Epoch: 16 Global Step: 350290 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:37:15,875-Speed 2494.34 samples/sec Loss 2.3657 LearningRate 0.000412 Epoch: 16 Global Step: 350300 Fp16 Grad Scale: 32768 Required: 110 hours 
Training: 2022-07-08 22:37:24,079-Speed 2496.60 samples/sec Loss 2.4719 LearningRate 0.000412 Epoch: 16 Global Step: 350310 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:37:32,287-Speed 2495.79 samples/sec Loss 2.5430 LearningRate 0.000412 Epoch: 16 Global Step: 350320 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:37:40,495-Speed 2495.40 samples/sec Loss 2.4576 LearningRate 0.000412 Epoch: 16 Global Step: 350330 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:37:48,703-Speed 2495.49 samples/sec Loss 2.4936 LearningRate 0.000412 Epoch: 16 Global Step: 350340 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:37:56,867-Speed 2509.15 samples/sec Loss 2.4836 LearningRate 0.000412 Epoch: 16 Global Step: 350350 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:38:05,070-Speed 2496.93 samples/sec Loss 2.5191 LearningRate 0.000412 Epoch: 16 Global Step: 350360 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:38:13,273-Speed 2496.98 samples/sec Loss 2.5038 LearningRate 0.000412 Epoch: 16 Global Step: 350370 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:38:21,500-Speed 2489.90 samples/sec Loss 2.4627 LearningRate 0.000412 Epoch: 16 Global Step: 350380 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:38:29,701-Speed 2497.57 samples/sec Loss 2.4692 LearningRate 0.000412 Epoch: 16 Global Step: 350390 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:38:37,907-Speed 2496.06 samples/sec Loss 2.4498 LearningRate 0.000412 Epoch: 16 Global Step: 350400 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:38:46,063-Speed 2511.68 samples/sec Loss 2.4230 LearningRate 0.000412 Epoch: 16 Global Step: 350410 Fp16 Grad Scale: 32768 Required: 110 hours Training: 2022-07-08 22:38:54,294-Speed 2488.50 samples/sec Loss 2.5053 LearningRate 0.000412 Epoch: 16 Global Step: 350420 Fp16 Grad Scale: 32768 Required: 110 
hours
Training: 2022-07-08 22:39:02,510-Speed 2493.02 samples/sec Loss 2.5770 LearningRate 0.000412 Epoch: 16 Global Step: 350430 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:39:10,715-Speed 2496.45 samples/sec Loss 2.4758 LearningRate 0.000412 Epoch: 16 Global Step: 350440 Fp16 Grad Scale: 65536 Required: 110 hours
Training: 2022-07-08 22:39:18,921-Speed 2496.41 samples/sec Loss 2.4657 LearningRate 0.000412 Epoch: 16 Global Step: 350450 Fp16 Grad Scale: 65536 Required: 110 hours
Training: 2022-07-08 22:39:27,124-Speed 2496.96 samples/sec Loss 2.4463 LearningRate 0.000412 Epoch: 16 Global Step: 350460 Fp16 Grad Scale: 65536 Required: 110 hours
Training: 2022-07-08 22:39:35,279-Speed 2511.85 samples/sec Loss 2.4560 LearningRate 0.000412 Epoch: 16 Global Step: 350470 Fp16 Grad Scale: 65536 Required: 110 hours
Training: 2022-07-08 22:39:43,483-Speed 2496.64 samples/sec Loss 2.4468 LearningRate 0.000412 Epoch: 16 Global Step: 350480 Fp16 Grad Scale: 65536 Required: 110 hours
Training: 2022-07-08 22:39:51,688-Speed 2496.68 samples/sec Loss 2.4410 LearningRate 0.000412 Epoch: 16 Global Step: 350490 Fp16 Grad Scale: 65536 Required: 110 hours
Training: 2022-07-08 22:39:59,849-Speed 2509.87 samples/sec Loss 2.4345 LearningRate 0.000412 Epoch: 16 Global Step: 350500 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:40:08,060-Speed 2494.65 samples/sec Loss 2.4883 LearningRate 0.000412 Epoch: 16 Global Step: 350510 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:40:16,269-Speed 2495.03 samples/sec Loss 2.4969 LearningRate 0.000412 Epoch: 16 Global Step: 350520 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:40:24,431-Speed 2509.71 samples/sec Loss 2.4502 LearningRate 0.000412 Epoch: 16 Global Step: 350530 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:40:32,643-Speed 2494.14 samples/sec Loss 2.4863 LearningRate 0.000412 Epoch: 16 Global Step: 350540 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:40:40,850-Speed 2495.95 samples/sec Loss 2.4262 LearningRate 0.000412 Epoch: 16 Global Step: 350550 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:40:49,055-Speed 2496.44 samples/sec Loss 2.4391 LearningRate 0.000412 Epoch: 16 Global Step: 350560 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:40:57,261-Speed 2496.01 samples/sec Loss 2.5184 LearningRate 0.000412 Epoch: 16 Global Step: 350570 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:41:05,466-Speed 2496.67 samples/sec Loss 2.4153 LearningRate 0.000412 Epoch: 16 Global Step: 350580 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:41:13,618-Speed 2512.42 samples/sec Loss 2.4818 LearningRate 0.000412 Epoch: 16 Global Step: 350590 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:41:21,837-Speed 2492.37 samples/sec Loss 2.4585 LearningRate 0.000412 Epoch: 16 Global Step: 350600 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:41:30,043-Speed 2496.12 samples/sec Loss 2.4997 LearningRate 0.000412 Epoch: 16 Global Step: 350610 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:41:38,248-Speed 2496.18 samples/sec Loss 2.3763 LearningRate 0.000412 Epoch: 16 Global Step: 350620 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:41:46,454-Speed 2496.12 samples/sec Loss 2.4271 LearningRate 0.000411 Epoch: 16 Global Step: 350630 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:41:54,674-Speed 2492.10 samples/sec Loss 2.4099 LearningRate 0.000411 Epoch: 16 Global Step: 350640 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:42:02,824-Speed 2513.27 samples/sec Loss 2.4207 LearningRate 0.000411 Epoch: 16 Global Step: 350650 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:42:11,059-Speed 2487.33 samples/sec Loss 2.4184 LearningRate 0.000411 Epoch: 16 Global Step: 350660 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:42:19,265-Speed 2496.04 samples/sec Loss 2.4620 LearningRate 0.000411 Epoch: 16 Global Step: 350670 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:42:27,468-Speed 2496.90 samples/sec Loss 2.4359 LearningRate 0.000411 Epoch: 16 Global Step: 350680 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:42:35,693-Speed 2490.65 samples/sec Loss 2.4700 LearningRate 0.000411 Epoch: 16 Global Step: 350690 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:42:43,900-Speed 2495.55 samples/sec Loss 2.4822 LearningRate 0.000411 Epoch: 16 Global Step: 350700 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:42:52,053-Speed 2512.61 samples/sec Loss 2.4231 LearningRate 0.000411 Epoch: 16 Global Step: 350710 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:43:00,258-Speed 2496.32 samples/sec Loss 2.5086 LearningRate 0.000411 Epoch: 16 Global Step: 350720 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:43:08,460-Speed 2497.26 samples/sec Loss 2.4872 LearningRate 0.000411 Epoch: 16 Global Step: 350730 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:43:16,664-Speed 2496.73 samples/sec Loss 2.4524 LearningRate 0.000411 Epoch: 16 Global Step: 350740 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:43:24,869-Speed 2496.78 samples/sec Loss 2.4202 LearningRate 0.000411 Epoch: 16 Global Step: 350750 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:43:33,078-Speed 2495.40 samples/sec Loss 2.4944 LearningRate 0.000411 Epoch: 16 Global Step: 350760 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:43:41,230-Speed 2512.59 samples/sec Loss 2.3962 LearningRate 0.000411 Epoch: 16 Global Step: 350770 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:43:49,435-Speed 2496.44 samples/sec Loss 2.3798 LearningRate 0.000411 Epoch: 16 Global Step: 350780 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:43:57,642-Speed 2495.74 samples/sec Loss 2.4752 LearningRate 0.000411 Epoch: 16 Global Step: 350790 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:44:05,861-Speed 2492.10 samples/sec Loss 2.4854 LearningRate 0.000411 Epoch: 16 Global Step: 350800 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:44:14,078-Speed 2492.76 samples/sec Loss 2.4548 LearningRate 0.000411 Epoch: 16 Global Step: 350810 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:44:22,287-Speed 2495.10 samples/sec Loss 2.5020 LearningRate 0.000411 Epoch: 16 Global Step: 350820 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:44:30,439-Speed 2512.95 samples/sec Loss 2.4790 LearningRate 0.000411 Epoch: 16 Global Step: 350830 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:44:38,645-Speed 2496.06 samples/sec Loss 2.4444 LearningRate 0.000411 Epoch: 16 Global Step: 350840 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:44:46,848-Speed 2496.85 samples/sec Loss 2.4832 LearningRate 0.000411 Epoch: 16 Global Step: 350850 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:44:55,052-Speed 2496.77 samples/sec Loss 2.4145 LearningRate 0.000411 Epoch: 16 Global Step: 350860 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:45:03,254-Speed 2497.25 samples/sec Loss 2.4054 LearningRate 0.000411 Epoch: 16 Global Step: 350870 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:45:11,460-Speed 2495.97 samples/sec Loss 2.3946 LearningRate 0.000411 Epoch: 16 Global Step: 350880 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:45:19,627-Speed 2508.14 samples/sec Loss 2.4037 LearningRate 0.000411 Epoch: 16 Global Step: 350890 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:45:27,829-Speed 2497.30 samples/sec Loss 2.4902 LearningRate 0.000411 Epoch: 16 Global Step: 350900 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:45:36,038-Speed 2495.61 samples/sec Loss 2.4043 LearningRate 0.000411 Epoch: 16 Global Step: 350910 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:45:44,242-Speed 2496.81 samples/sec Loss 2.4203 LearningRate 0.000411 Epoch: 16 Global Step: 350920 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:45:52,447-Speed 2496.44 samples/sec Loss 2.4474 LearningRate 0.000411 Epoch: 16 Global Step: 350930 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:46:00,651-Speed 2496.56 samples/sec Loss 2.4734 LearningRate 0.000411 Epoch: 16 Global Step: 350940 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:46:08,802-Speed 2512.94 samples/sec Loss 2.3922 LearningRate 0.000411 Epoch: 16 Global Step: 350950 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:46:17,008-Speed 2496.20 samples/sec Loss 2.4465 LearningRate 0.000411 Epoch: 16 Global Step: 350960 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:46:25,223-Speed 2493.57 samples/sec Loss 2.4363 LearningRate 0.000411 Epoch: 16 Global Step: 350970 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:46:33,425-Speed 2497.31 samples/sec Loss 2.4722 LearningRate 0.000411 Epoch: 16 Global Step: 350980 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:46:41,639-Speed 2493.71 samples/sec Loss 2.4485 LearningRate 0.000411 Epoch: 16 Global Step: 350990 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:46:49,845-Speed 2495.99 samples/sec Loss 2.4588 LearningRate 0.000411 Epoch: 16 Global Step: 351000 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:46:58,000-Speed 2511.94 samples/sec Loss 2.4130 LearningRate 0.000411 Epoch: 16 Global Step: 351010 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:47:06,213-Speed 2493.67 samples/sec Loss 2.4133 LearningRate 0.000411 Epoch: 16 Global Step: 351020 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:47:14,420-Speed 2495.98 samples/sec Loss 2.4406 LearningRate 0.000411 Epoch: 16 Global Step: 351030 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:47:22,627-Speed 2495.87 samples/sec Loss 2.4856 LearningRate 0.000411 Epoch: 16 Global Step: 351040 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:47:30,832-Speed 2496.32 samples/sec Loss 2.4656 LearningRate 0.000411 Epoch: 16 Global Step: 351050 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:47:39,038-Speed 2496.14 samples/sec Loss 2.4392 LearningRate 0.000411 Epoch: 16 Global Step: 351060 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:47:47,194-Speed 2512.04 samples/sec Loss 2.4903 LearningRate 0.000411 Epoch: 16 Global Step: 351070 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:47:55,400-Speed 2496.13 samples/sec Loss 2.4410 LearningRate 0.000411 Epoch: 16 Global Step: 351080 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:48:03,605-Speed 2496.57 samples/sec Loss 2.4718 LearningRate 0.000411 Epoch: 16 Global Step: 351090 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:48:11,811-Speed 2495.97 samples/sec Loss 2.4785 LearningRate 0.000411 Epoch: 16 Global Step: 351100 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:48:20,016-Speed 2496.41 samples/sec Loss 2.4411 LearningRate 0.000411 Epoch: 16 Global Step: 351110 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:48:28,224-Speed 2495.50 samples/sec Loss 2.4693 LearningRate 0.000411 Epoch: 16 Global Step: 351120 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:48:36,378-Speed 2512.12 samples/sec Loss 2.5226 LearningRate 0.000411 Epoch: 16 Global Step: 351130 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:48:44,584-Speed 2496.05 samples/sec Loss 2.5092 LearningRate 0.000411 Epoch: 16 Global Step: 351140 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:48:52,791-Speed 2496.09 samples/sec Loss 2.5174 LearningRate 0.000411 Epoch: 16 Global Step: 351150 Fp16 Grad Scale: 32768 Required: 110 hours
Training: 2022-07-08 22:49:00,998-Speed 2495.65 samples/sec Loss 2.4323 LearningRate 0.000411 Epoch: 16 Global Step: 351160 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:49:09,208-Speed 2494.91 samples/sec Loss 2.4892 LearningRate 0.000411 Epoch: 16 Global Step: 351170 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:49:17,417-Speed 2495.45 samples/sec Loss 2.4292 LearningRate 0.000411 Epoch: 16 Global Step: 351180 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:49:25,572-Speed 2511.74 samples/sec Loss 2.5180 LearningRate 0.000411 Epoch: 16 Global Step: 351190 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:49:33,780-Speed 2495.41 samples/sec Loss 2.5342 LearningRate 0.000411 Epoch: 16 Global Step: 351200 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:49:41,991-Speed 2494.82 samples/sec Loss 2.5016 LearningRate 0.000410 Epoch: 16 Global Step: 351210 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:49:50,196-Speed 2496.55 samples/sec Loss 2.4717 LearningRate 0.000410 Epoch: 16 Global Step: 351220 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:49:58,408-Speed 2494.30 samples/sec Loss 2.4905 LearningRate 0.000410 Epoch: 16 Global Step: 351230 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:50:06,616-Speed 2495.52 samples/sec Loss 2.5023 LearningRate 0.000410 Epoch: 16 Global Step: 351240 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:50:14,770-Speed 2512.15 samples/sec Loss 2.5200 LearningRate 0.000410 Epoch: 16 Global Step: 351250 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:50:22,978-Speed 2495.32 samples/sec Loss 2.4549 LearningRate 0.000410 Epoch: 16 Global Step: 351260 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:50:31,184-Speed 2496.13 samples/sec Loss 2.4627 LearningRate 0.000410 Epoch: 16 Global Step: 351270 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:50:39,392-Speed 2495.39 samples/sec Loss 2.5045 LearningRate 0.000410 Epoch: 16 Global Step: 351280 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:50:47,634-Speed 2485.32 samples/sec Loss 2.4577 LearningRate 0.000410 Epoch: 16 Global Step: 351290 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:50:55,844-Speed 2494.92 samples/sec Loss 2.4794 LearningRate 0.000410 Epoch: 16 Global Step: 351300 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:51:03,998-Speed 2511.94 samples/sec Loss 2.4669 LearningRate 0.000410 Epoch: 16 Global Step: 351310 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:51:12,201-Speed 2496.84 samples/sec Loss 2.4534 LearningRate 0.000410 Epoch: 16 Global Step: 351320 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:51:20,405-Speed 2496.88 samples/sec Loss 2.4697 LearningRate 0.000410 Epoch: 16 Global Step: 351330 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:51:28,609-Speed 2496.68 samples/sec Loss 2.4426 LearningRate 0.000410 Epoch: 16 Global Step: 351340 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:51:36,812-Speed 2497.09 samples/sec Loss 2.4700 LearningRate 0.000410 Epoch: 16 Global Step: 351350 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:51:45,016-Speed 2496.68 samples/sec Loss 2.4522 LearningRate 0.000410 Epoch: 16 Global Step: 351360 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:51:53,169-Speed 2512.41 samples/sec Loss 2.5098 LearningRate 0.000410 Epoch: 16 Global Step: 351370 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:52:01,373-Speed 2497.12 samples/sec Loss 2.4949 LearningRate 0.000410 Epoch: 16 Global Step: 351380 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:52:09,575-Speed 2497.09 samples/sec Loss 2.4509 LearningRate 0.000410 Epoch: 16 Global Step: 351390 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:52:17,779-Speed 2496.85 samples/sec Loss 2.4871 LearningRate 0.000410 Epoch: 16 Global Step: 351400 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:52:25,992-Speed 2493.86 samples/sec Loss 2.4845 LearningRate 0.000410 Epoch: 16 Global Step: 351410 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:52:34,199-Speed 2496.00 samples/sec Loss 2.4525 LearningRate 0.000410 Epoch: 16 Global Step: 351420 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:52:42,350-Speed 2512.92 samples/sec Loss 2.4149 LearningRate 0.000410 Epoch: 16 Global Step: 351430 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:52:50,557-Speed 2495.80 samples/sec Loss 2.5075 LearningRate 0.000410 Epoch: 16 Global Step: 351440 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:52:58,769-Speed 2494.69 samples/sec Loss 2.4666 LearningRate 0.000410 Epoch: 16 Global Step: 351450 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:53:06,981-Speed 2494.49 samples/sec Loss 2.4982 LearningRate 0.000410 Epoch: 16 Global Step: 351460 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:53:15,186-Speed 2496.47 samples/sec Loss 2.4614 LearningRate 0.000410 Epoch: 16 Global Step: 351470 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:53:23,390-Speed 2496.36 samples/sec Loss 2.5216 LearningRate 0.000410 Epoch: 16 Global Step: 351480 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:53:31,541-Speed 2513.12 samples/sec Loss 2.4935 LearningRate 0.000410 Epoch: 16 Global Step: 351490 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:53:39,770-Speed 2489.05 samples/sec Loss 2.4960 LearningRate 0.000410 Epoch: 16 Global Step: 351500 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:53:47,980-Speed 2494.99 samples/sec Loss 2.4477 LearningRate 0.000410 Epoch: 16 Global Step: 351510 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:53:56,181-Speed 2497.42 samples/sec Loss 2.4642 LearningRate 0.000410 Epoch: 16 Global Step: 351520 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:54:04,393-Speed 2494.30 samples/sec Loss 2.4678 LearningRate 0.000410 Epoch: 16 Global Step: 351530 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:54:12,601-Speed 2495.89 samples/sec Loss 2.4296 LearningRate 0.000410 Epoch: 16 Global Step: 351540 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:54:20,753-Speed 2512.42 samples/sec Loss 2.5450 LearningRate 0.000410 Epoch: 16 Global Step: 351550 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:54:28,965-Speed 2494.31 samples/sec Loss 2.4828 LearningRate 0.000410 Epoch: 16 Global Step: 351560 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:54:37,172-Speed 2495.96 samples/sec Loss 2.4905 LearningRate 0.000410 Epoch: 16 Global Step: 351570 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:54:45,380-Speed 2495.30 samples/sec Loss 2.4513 LearningRate 0.000410 Epoch: 16 Global Step: 351580 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:54:53,591-Speed 2494.75 samples/sec Loss 2.4878 LearningRate 0.000410 Epoch: 16 Global Step: 351590 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:55:01,799-Speed 2495.57 samples/sec Loss 2.5169 LearningRate 0.000410 Epoch: 16 Global Step: 351600 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:55:09,952-Speed 2512.18 samples/sec Loss 2.5497 LearningRate 0.000410 Epoch: 16 Global Step: 351610 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:55:18,163-Speed 2494.68 samples/sec Loss 2.5243 LearningRate 0.000410 Epoch: 16 Global Step: 351620 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:55:26,378-Speed 2493.56 samples/sec Loss 2.4985 LearningRate 0.000410 Epoch: 16 Global Step: 351630 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:55:34,594-Speed 2493.10 samples/sec Loss 2.4830 LearningRate 0.000410 Epoch: 16 Global Step: 351640 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:55:42,803-Speed 2495.53 samples/sec Loss 2.4560 LearningRate 0.000410 Epoch: 16 Global Step: 351650 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:55:51,011-Speed 2495.32 samples/sec Loss 2.4608 LearningRate 0.000410 Epoch: 16 Global Step: 351660 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:55:59,168-Speed 2511.37 samples/sec Loss 2.4630 LearningRate 0.000410 Epoch: 16 Global Step: 351670 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:56:07,390-Speed 2491.33 samples/sec Loss 2.4520 LearningRate 0.000410 Epoch: 16 Global Step: 351680 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:56:15,601-Speed 2494.67 samples/sec Loss 2.4867 LearningRate 0.000410 Epoch: 16 Global Step: 351690 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:56:23,837-Speed 2487.17 samples/sec Loss 2.4532 LearningRate 0.000410 Epoch: 16 Global Step: 351700 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:56:32,041-Speed 2496.94 samples/sec Loss 2.5333 LearningRate 0.000410 Epoch: 16 Global Step: 351710 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:56:40,248-Speed 2495.73 samples/sec Loss 2.4356 LearningRate 0.000410 Epoch: 16 Global Step: 351720 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:56:48,407-Speed 2510.70 samples/sec Loss 2.4325 LearningRate 0.000410 Epoch: 16 Global Step: 351730 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:56:56,611-Speed 2496.71 samples/sec Loss 2.4509 LearningRate 0.000410 Epoch: 16 Global Step: 351740 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:57:04,811-Speed 2497.77 samples/sec Loss 2.4762 LearningRate 0.000410 Epoch: 16 Global Step: 351750 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:57:13,013-Speed 2497.50 samples/sec Loss 2.4857 LearningRate 0.000410 Epoch: 16 Global Step: 351760 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:57:21,219-Speed 2495.94 samples/sec Loss 2.4648 LearningRate 0.000410 Epoch: 16 Global Step: 351770 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:57:29,424-Speed 2496.27 samples/sec Loss 2.4502 LearningRate 0.000410 Epoch: 16 Global Step: 351780 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:57:37,579-Speed 2511.73 samples/sec Loss 2.4533 LearningRate 0.000410 Epoch: 16 Global Step: 351790 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:57:45,785-Speed 2496.23 samples/sec Loss 2.4113 LearningRate 0.000409 Epoch: 16 Global Step: 351800 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:57:53,989-Speed 2496.56 samples/sec Loss 2.4232 LearningRate 0.000409 Epoch: 16 Global Step: 351810 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:58:02,196-Speed 2495.87 samples/sec Loss 2.5000 LearningRate 0.000409 Epoch: 16 Global Step: 351820 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:58:10,410-Speed 2494.09 samples/sec Loss 2.4642 LearningRate 0.000409 Epoch: 16 Global Step: 351830 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:58:18,613-Speed 2497.17 samples/sec Loss 2.4535 LearningRate 0.000409 Epoch: 16 Global Step: 351840 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:58:26,765-Speed 2512.62 samples/sec Loss 2.3955 LearningRate 0.000409 Epoch: 16 Global Step: 351850 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:58:34,971-Speed 2496.14 samples/sec Loss 2.4514 LearningRate 0.000409 Epoch: 16 Global Step: 351860 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:58:43,174-Speed 2496.87 samples/sec Loss 2.4714 LearningRate 0.000409 Epoch: 16 Global Step: 351870 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:58:51,377-Speed 2497.21 samples/sec Loss 2.4867 LearningRate 0.000409 Epoch: 16 Global Step: 351880 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:58:59,582-Speed 2496.71 samples/sec Loss 2.5015 LearningRate 0.000409 Epoch: 16 Global Step: 351890 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:59:07,799-Speed 2492.73 samples/sec Loss 2.4234 LearningRate 0.000409 Epoch: 16 Global Step: 351900 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:59:15,951-Speed 2512.43 samples/sec Loss 2.4304 LearningRate 0.000409 Epoch: 16 Global Step: 351910 Fp16 Grad Scale: 65536 Required: 109 hours
Training: 2022-07-08 22:59:24,111-Speed 2510.53 samples/sec Loss 2.4981 LearningRate 0.000409 Epoch: 16 Global Step: 351920 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:59:32,315-Speed 2496.75 samples/sec Loss 2.5115 LearningRate 0.000409 Epoch: 16 Global Step: 351930 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:59:40,518-Speed 2496.86 samples/sec Loss 2.4845 LearningRate 0.000409 Epoch: 16 Global Step: 351940 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:59:48,721-Speed 2497.19 samples/sec Loss 2.4330 LearningRate 0.000409 Epoch: 16 Global Step: 351950 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 22:59:56,927-Speed 2496.34 samples/sec Loss 2.4939 LearningRate 0.000409 Epoch: 16 Global Step: 351960 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:00:05,085-Speed 2510.85 samples/sec Loss 2.4361 LearningRate 0.000409 Epoch: 16 Global Step: 351970 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:00:13,290-Speed 2496.31 samples/sec Loss 2.4840 LearningRate 0.000409 Epoch: 16 Global Step: 351980 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:00:21,497-Speed 2496.01 samples/sec Loss 2.4952 LearningRate 0.000409 Epoch: 16 Global Step: 351990 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:00:29,699-Speed 2497.29 samples/sec Loss 2.4448 LearningRate 0.000409 Epoch: 16 Global Step: 352000 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:00:37,903-Speed 2496.83 samples/sec Loss 2.4618 LearningRate 0.000409 Epoch: 16 Global Step: 352010 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:00:46,103-Speed 2497.83 samples/sec Loss 2.3821 LearningRate 0.000409 Epoch: 16 Global Step: 352020 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:00:54,250-Speed 2514.32 samples/sec Loss 2.4812 LearningRate 0.000409 Epoch: 16 Global Step: 352030 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:01:02,458-Speed 2495.43 samples/sec Loss 2.4823 LearningRate 0.000409 Epoch: 16 Global Step: 352040 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:01:10,675-Speed 2493.15 samples/sec Loss 2.4582 LearningRate 0.000409 Epoch: 16 Global Step: 352050 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:01:18,881-Speed 2495.92 samples/sec Loss 2.4023 LearningRate 0.000409 Epoch: 16 Global Step: 352060 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:01:27,089-Speed 2495.53 samples/sec Loss 2.4010 LearningRate 0.000409 Epoch: 16 Global Step: 352070 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:01:35,291-Speed 2497.49 samples/sec Loss 2.4971 LearningRate 0.000409 Epoch: 16 Global Step: 352080 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:01:43,441-Speed 2512.96 samples/sec Loss 2.5244 LearningRate 0.000409 Epoch: 16 Global Step: 352090 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:01:51,641-Speed 2498.36 samples/sec Loss 2.5147 LearningRate 0.000409 Epoch: 16 Global Step: 352100 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:01:59,847-Speed 2496.17 samples/sec Loss 2.4908 LearningRate 0.000409 Epoch: 16 Global Step: 352110 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:02:08,050-Speed 2496.90 samples/sec Loss 2.4774 LearningRate 0.000409 Epoch: 16 Global Step: 352120 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:02:16,255-Speed 2496.45 samples/sec Loss 2.4592 LearningRate 0.000409 Epoch: 16 Global Step: 352130 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:02:24,456-Speed 2497.60 samples/sec Loss 2.4883 LearningRate 0.000409 Epoch: 16 Global Step: 352140 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:02:32,606-Speed 2513.17 samples/sec Loss 2.4980 LearningRate 0.000409 Epoch: 16 Global Step: 352150 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:02:40,807-Speed 2497.72 samples/sec Loss 2.4688 LearningRate 0.000409 Epoch: 16 Global Step: 352160 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:02:49,019-Speed 2494.19 samples/sec Loss 2.5081 LearningRate 0.000409 Epoch: 16 Global Step: 352170 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:02:57,222-Speed 2497.12 samples/sec Loss 2.5063 LearningRate 0.000409 Epoch: 16 Global Step: 352180 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:03:05,427-Speed 2496.40 samples/sec Loss 2.4804 LearningRate 0.000409 Epoch: 16 Global Step: 352190 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:03:13,632-Speed 2496.36 samples/sec Loss 2.5259 LearningRate 0.000409 Epoch: 16 Global Step: 352200 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:03:21,782-Speed 2513.51 samples/sec Loss 2.4865 LearningRate 0.000409 Epoch: 16 Global Step: 352210 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:03:29,985-Speed 2497.31 samples/sec Loss 2.4479 LearningRate 0.000409 Epoch: 16 Global Step: 352220 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:03:38,192-Speed 2495.89 samples/sec Loss 2.4852 LearningRate 0.000409 Epoch: 16 Global Step: 352230 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:03:46,394-Speed 2497.07 samples/sec Loss 2.5297 LearningRate 0.000409 Epoch: 16 Global Step: 352240 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:03:54,602-Speed 2495.66 samples/sec Loss 2.4822 LearningRate 0.000409 Epoch: 16 Global Step: 352250 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:04:02,807-Speed 2496.48 samples/sec Loss 2.4788 LearningRate 0.000409 Epoch: 16 Global Step: 352260 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:04:10,959-Speed 2512.77 samples/sec Loss 2.5207 LearningRate 0.000409 Epoch: 16 Global Step: 352270 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:04:19,163-Speed 2496.70 samples/sec Loss 2.4655 LearningRate 0.000409 Epoch: 16 Global Step: 352280 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:04:27,368-Speed 2496.77 samples/sec Loss 2.4633 LearningRate 0.000409 Epoch: 16 Global Step: 352290 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:04:35,574-Speed 2496.11 samples/sec Loss 2.5013 LearningRate 0.000409 Epoch: 16 Global Step: 352300 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:04:43,790-Speed 2492.93 samples/sec Loss 2.5306 LearningRate 0.000409 Epoch: 16 Global Step: 352310 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:04:51,993-Speed 2497.14 samples/sec Loss 2.4809 LearningRate 0.000409 Epoch: 16 Global Step: 352320 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:05:00,148-Speed 2511.68 samples/sec Loss 2.4530 LearningRate 0.000409 Epoch: 16 Global Step: 352330 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:05:08,359-Speed 2494.55 samples/sec Loss 2.4802 LearningRate 0.000409 Epoch: 16 Global Step: 352340 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:05:16,563-Speed 2496.79 samples/sec Loss 2.4768 LearningRate 0.000409 Epoch: 16 Global Step: 352350 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:05:24,776-Speed 2494.22 samples/sec Loss 2.4655 LearningRate 0.000409 Epoch: 16 Global Step: 352360 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:05:32,977-Speed 2497.43 samples/sec Loss 2.4323 LearningRate 0.000409 Epoch: 16 Global Step: 352370 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:05:41,180-Speed 2497.02 samples/sec Loss 2.4931 LearningRate 0.000408 Epoch: 16 Global Step: 352380 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:05:49,331-Speed 2513.13 samples/sec Loss 2.4996 LearningRate 0.000408 Epoch: 16 Global Step: 352390 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:05:57,532-Speed 2497.59 samples/sec Loss 2.5125 LearningRate 0.000408 Epoch: 16 Global Step: 352400 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:06:05,734-Speed 2497.27 samples/sec Loss 2.4882 LearningRate 0.000408 Epoch: 16 Global Step: 352410 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:06:13,935-Speed 2497.65 samples/sec Loss 2.5097 LearningRate 0.000408 Epoch: 16 Global Step: 352420 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:06:22,143-Speed 2495.45 samples/sec Loss 2.4553 LearningRate 0.000408 Epoch: 16 Global Step: 352430 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:06:30,346-Speed 2497.08 samples/sec Loss 2.4480 LearningRate 0.000408 Epoch: 16 Global Step: 352440 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:06:38,494-Speed 2514.05 samples/sec Loss 2.4302 LearningRate 0.000408 Epoch: 16 Global Step: 352450 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:06:46,695-Speed 2497.74 samples/sec Loss 2.5041 LearningRate 0.000408 Epoch: 16 Global Step: 352460 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:06:54,901-Speed 2495.98 samples/sec Loss 2.4391 LearningRate 0.000408 Epoch: 16 Global Step: 352470 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:07:03,108-Speed 2496.04 samples/sec Loss 2.4403 LearningRate 0.000408 Epoch: 16 Global Step: 352480 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:07:11,314-Speed 2496.45 samples/sec Loss 2.4445 LearningRate 0.000408 Epoch: 16 Global Step: 352490 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:07:19,516-Speed 2497.31 samples/sec Loss 2.4707 LearningRate 0.000408 Epoch: 16 Global Step: 352500 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:07:27,667-Speed 2512.91 samples/sec Loss 2.4952 LearningRate 0.000408 Epoch: 16 Global Step: 352510 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:07:35,869-Speed 2497.76 samples/sec Loss 2.4657 LearningRate 0.000408 Epoch: 16 Global Step: 352520 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:07:44,072-Speed 2496.98 samples/sec Loss 2.4993 LearningRate 0.000408 Epoch: 16 Global Step: 352530 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:07:52,276-Speed 2496.65 samples/sec Loss 2.4997 LearningRate 0.000408 Epoch: 16 Global Step: 352540 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:08:00,484-Speed 2495.59 samples/sec Loss 2.4496 LearningRate 0.000408 Epoch: 16 Global Step: 352550 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:08:08,687-Speed 2496.88 samples/sec Loss 2.4684 LearningRate 0.000408 Epoch: 16 Global Step: 352560 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:08:16,848-Speed 2510.09 samples/sec Loss 2.4918 LearningRate 0.000408 Epoch: 16 Global Step: 352570 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:08:27,875-Speed 1857.47 samples/sec Loss 2.4857 LearningRate 0.000408 Epoch: 17 Global Step: 352580 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:08:36,067-Speed 2500.31 samples/sec Loss 2.5484 LearningRate 0.000408 Epoch: 17 Global Step: 352590 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:08:44,268-Speed 2497.72 samples/sec Loss 2.5523 LearningRate 0.000408 Epoch: 17 Global Step: 352600 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:08:52,463-Speed 2499.24 samples/sec Loss 2.5409 LearningRate 0.000408 Epoch: 17 Global Step: 352610 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:09:00,663-Speed 2497.98 samples/sec Loss 2.5186 LearningRate 0.000408 Epoch: 17 Global Step: 352620 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:09:08,815-Speed 2512.81 samples/sec Loss 2.4861 LearningRate 0.000408 Epoch: 17 Global Step: 352630 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:09:17,011-Speed 2499.05 samples/sec Loss 2.5270 LearningRate 0.000408 Epoch: 17 Global Step: 352640 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:09:25,213-Speed 2497.52 samples/sec Loss 2.4681 LearningRate 0.000408 Epoch: 17 Global Step: 352650 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:09:33,422-Speed 2495.34 samples/sec Loss 2.5035 LearningRate 0.000408 Epoch: 17 Global Step: 352660 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:09:41,622-Speed 2498.12 samples/sec Loss 2.4869 LearningRate 0.000408 Epoch: 17 Global Step: 352670 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:09:49,829-Speed 2495.59 samples/sec Loss 2.4539 LearningRate 0.000408 Epoch: 17 Global Step: 352680 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:09:57,990-Speed 2509.79 samples/sec Loss 2.4770 LearningRate 0.000408 Epoch: 17 Global Step: 352690 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:10:06,194-Speed 2496.83 samples/sec Loss 2.5170 LearningRate 0.000408 Epoch: 17 Global Step: 352700 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:10:14,392-Speed 2498.65 samples/sec Loss 2.4408 LearningRate 0.000408 Epoch: 17 Global Step: 352710 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:10:22,592-Speed 2497.73 samples/sec Loss 2.4725 LearningRate 0.000408 Epoch: 17 Global Step: 352720 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:10:30,793-Speed 2497.78 samples/sec Loss 2.3935 LearningRate 0.000408 Epoch: 17 Global Step: 352730 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:10:38,995-Speed 2497.45 samples/sec Loss 2.4527 LearningRate 0.000408 Epoch: 17 Global Step: 352740 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:10:47,144-Speed 2513.59 samples/sec Loss 2.4921 LearningRate 0.000408 Epoch: 17 Global Step: 352750 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:10:55,343-Speed 2498.26 samples/sec Loss 2.4299 LearningRate 0.000408 Epoch: 17 Global Step: 352760 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:11:03,543-Speed 2497.87 samples/sec Loss 2.4384 LearningRate 0.000408 Epoch: 17 Global Step: 352770 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:11:11,740-Speed 2498.88 samples/sec Loss 2.4374 LearningRate 0.000408 Epoch: 17 Global Step: 352780 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:11:19,938-Speed 2498.66 samples/sec Loss 2.4323 LearningRate 0.000408 Epoch: 17 Global Step: 352790 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:11:28,140-Speed 2497.23 samples/sec Loss 2.4721 LearningRate 0.000408 Epoch: 17 Global Step: 352800 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:11:36,290-Speed 2513.31 samples/sec Loss 2.3955 LearningRate 0.000408 Epoch: 17 Global Step: 352810 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:11:44,486-Speed 2499.17 samples/sec Loss 2.4824 LearningRate 0.000408 Epoch: 17 Global Step: 352820 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:11:52,688-Speed 2497.26 samples/sec Loss 2.4326 LearningRate 0.000408 Epoch: 17 Global Step: 352830 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:12:00,892-Speed 2496.90 samples/sec Loss 2.4383 LearningRate 0.000408 Epoch: 17 Global Step: 352840 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:12:09,094-Speed 2497.22 samples/sec Loss 2.4852 LearningRate 0.000408 Epoch: 17 Global Step: 352850 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:12:17,301-Speed 2495.94 samples/sec Loss 2.4581 LearningRate 0.000408 Epoch: 17 Global Step: 352860 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:12:25,449-Speed 2513.97 samples/sec Loss 2.5063 LearningRate 0.000408 Epoch: 17 Global Step: 352870 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:12:33,646-Speed 2498.97 samples/sec Loss 2.4558 LearningRate 0.000408 Epoch: 17 Global Step: 352880 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:12:41,844-Speed 2498.46 samples/sec Loss 2.4222 LearningRate 0.000408 Epoch: 17 Global Step: 352890 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:12:50,061-Speed 2493.03 samples/sec Loss 2.4150 LearningRate 0.000408 Epoch: 17 Global Step: 352900 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:12:58,260-Speed 2498.05 samples/sec Loss 2.4846 LearningRate 0.000408 Epoch: 17 Global Step: 352910 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:13:06,458-Speed 2498.39 samples/sec Loss 2.4734 LearningRate 0.000408 Epoch: 17 Global Step: 352920 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:13:14,611-Speed 2512.75 samples/sec Loss 2.4224 LearningRate 0.000408 Epoch: 17 Global Step: 352930 Fp16 Grad Scale: 32768 Required:
109 hours Training: 2022-07-08 23:13:22,815-Speed 2497.04 samples/sec Loss 2.3873 LearningRate 0.000408 Epoch: 17 Global Step: 352940 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:13:31,027-Speed 2494.16 samples/sec Loss 2.4832 LearningRate 0.000408 Epoch: 17 Global Step: 352950 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:13:39,231-Speed 2496.65 samples/sec Loss 2.4633 LearningRate 0.000408 Epoch: 17 Global Step: 352960 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:13:47,432-Speed 2498.03 samples/sec Loss 2.4237 LearningRate 0.000407 Epoch: 17 Global Step: 352970 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:13:55,632-Speed 2498.04 samples/sec Loss 2.4327 LearningRate 0.000407 Epoch: 17 Global Step: 352980 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:14:03,781-Speed 2513.52 samples/sec Loss 2.4248 LearningRate 0.000407 Epoch: 17 Global Step: 352990 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:14:11,980-Speed 2498.09 samples/sec Loss 2.4234 LearningRate 0.000407 Epoch: 17 Global Step: 353000 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:14:20,180-Speed 2498.06 samples/sec Loss 2.4213 LearningRate 0.000407 Epoch: 17 Global Step: 353010 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:14:28,379-Speed 2498.18 samples/sec Loss 2.4254 LearningRate 0.000407 Epoch: 17 Global Step: 353020 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:14:36,585-Speed 2496.10 samples/sec Loss 2.4387 LearningRate 0.000407 Epoch: 17 Global Step: 353030 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:14:44,785-Speed 2498.09 samples/sec Loss 2.4420 LearningRate 0.000407 Epoch: 17 Global Step: 353040 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:14:52,931-Speed 2514.60 samples/sec Loss 2.4132 LearningRate 0.000407 Epoch: 17 Global Step: 353050 Fp16 Grad Scale: 32768 
Required: 109 hours Training: 2022-07-08 23:15:01,141-Speed 2494.70 samples/sec Loss 2.4417 LearningRate 0.000407 Epoch: 17 Global Step: 353060 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:15:09,339-Speed 2498.57 samples/sec Loss 2.4669 LearningRate 0.000407 Epoch: 17 Global Step: 353070 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:15:17,556-Speed 2492.90 samples/sec Loss 2.3976 LearningRate 0.000407 Epoch: 17 Global Step: 353080 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:15:25,754-Speed 2498.58 samples/sec Loss 2.3924 LearningRate 0.000407 Epoch: 17 Global Step: 353090 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:15:33,964-Speed 2494.83 samples/sec Loss 2.4141 LearningRate 0.000407 Epoch: 17 Global Step: 353100 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:15:42,109-Speed 2514.98 samples/sec Loss 2.4045 LearningRate 0.000407 Epoch: 17 Global Step: 353110 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:15:50,307-Speed 2498.46 samples/sec Loss 2.4467 LearningRate 0.000407 Epoch: 17 Global Step: 353120 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:15:58,506-Speed 2498.38 samples/sec Loss 2.4193 LearningRate 0.000407 Epoch: 17 Global Step: 353130 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:16:06,706-Speed 2497.84 samples/sec Loss 2.3735 LearningRate 0.000407 Epoch: 17 Global Step: 353140 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:16:14,919-Speed 2494.06 samples/sec Loss 2.3994 LearningRate 0.000407 Epoch: 17 Global Step: 353150 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:16:23,119-Speed 2497.81 samples/sec Loss 2.4160 LearningRate 0.000407 Epoch: 17 Global Step: 353160 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:16:31,267-Speed 2514.07 samples/sec Loss 2.4186 LearningRate 0.000407 Epoch: 17 Global Step: 353170 Fp16 Grad Scale: 
65536 Required: 109 hours Training: 2022-07-08 23:16:39,485-Speed 2492.48 samples/sec Loss 2.4963 LearningRate 0.000407 Epoch: 17 Global Step: 353180 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:16:47,686-Speed 2497.75 samples/sec Loss 2.4441 LearningRate 0.000407 Epoch: 17 Global Step: 353190 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:16:55,884-Speed 2498.59 samples/sec Loss 2.4935 LearningRate 0.000407 Epoch: 17 Global Step: 353200 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:17:04,086-Speed 2497.41 samples/sec Loss 2.4877 LearningRate 0.000407 Epoch: 17 Global Step: 353210 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:17:12,293-Speed 2495.86 samples/sec Loss 2.4993 LearningRate 0.000407 Epoch: 17 Global Step: 353220 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:17:20,440-Speed 2514.30 samples/sec Loss 2.4824 LearningRate 0.000407 Epoch: 17 Global Step: 353230 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:17:28,639-Speed 2498.11 samples/sec Loss 2.4763 LearningRate 0.000407 Epoch: 17 Global Step: 353240 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:17:36,841-Speed 2497.50 samples/sec Loss 2.4642 LearningRate 0.000407 Epoch: 17 Global Step: 353250 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:17:45,041-Speed 2498.16 samples/sec Loss 2.3874 LearningRate 0.000407 Epoch: 17 Global Step: 353260 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:17:53,243-Speed 2497.25 samples/sec Loss 2.4854 LearningRate 0.000407 Epoch: 17 Global Step: 353270 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:18:01,442-Speed 2498.37 samples/sec Loss 2.4831 LearningRate 0.000407 Epoch: 17 Global Step: 353280 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:18:09,590-Speed 2513.78 samples/sec Loss 2.4980 LearningRate 0.000407 Epoch: 17 Global Step: 353290 Fp16 Grad 
Scale: 65536 Required: 109 hours Training: 2022-07-08 23:18:17,804-Speed 2493.71 samples/sec Loss 2.4638 LearningRate 0.000407 Epoch: 17 Global Step: 353300 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:18:26,003-Speed 2498.54 samples/sec Loss 2.5370 LearningRate 0.000407 Epoch: 17 Global Step: 353310 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:18:34,202-Speed 2498.46 samples/sec Loss 2.5084 LearningRate 0.000407 Epoch: 17 Global Step: 353320 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:18:42,400-Speed 2498.73 samples/sec Loss 2.4116 LearningRate 0.000407 Epoch: 17 Global Step: 353330 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:18:50,606-Speed 2495.88 samples/sec Loss 2.5112 LearningRate 0.000407 Epoch: 17 Global Step: 353340 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:18:58,784-Speed 2504.69 samples/sec Loss 2.4787 LearningRate 0.000407 Epoch: 17 Global Step: 353350 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:19:06,982-Speed 2498.74 samples/sec Loss 2.4786 LearningRate 0.000407 Epoch: 17 Global Step: 353360 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:19:15,185-Speed 2497.11 samples/sec Loss 2.4911 LearningRate 0.000407 Epoch: 17 Global Step: 353370 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:19:23,385-Speed 2498.05 samples/sec Loss 2.4498 LearningRate 0.000407 Epoch: 17 Global Step: 353380 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:19:31,581-Speed 2499.21 samples/sec Loss 2.4771 LearningRate 0.000407 Epoch: 17 Global Step: 353390 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:19:39,787-Speed 2496.31 samples/sec Loss 2.4390 LearningRate 0.000407 Epoch: 17 Global Step: 353400 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:19:47,934-Speed 2513.85 samples/sec Loss 2.4093 LearningRate 0.000407 Epoch: 17 Global Step: 353410 Fp16 
Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:19:56,132-Speed 2499.25 samples/sec Loss 2.4735 LearningRate 0.000407 Epoch: 17 Global Step: 353420 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:20:04,334-Speed 2497.18 samples/sec Loss 2.4277 LearningRate 0.000407 Epoch: 17 Global Step: 353430 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:20:12,539-Speed 2496.64 samples/sec Loss 2.4157 LearningRate 0.000407 Epoch: 17 Global Step: 353440 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:20:20,737-Speed 2498.29 samples/sec Loss 2.4048 LearningRate 0.000407 Epoch: 17 Global Step: 353450 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:20:28,936-Speed 2498.17 samples/sec Loss 2.4270 LearningRate 0.000407 Epoch: 17 Global Step: 353460 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:20:37,080-Speed 2515.19 samples/sec Loss 2.3978 LearningRate 0.000407 Epoch: 17 Global Step: 353470 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:20:45,283-Speed 2497.18 samples/sec Loss 2.3982 LearningRate 0.000407 Epoch: 17 Global Step: 353480 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:20:53,492-Speed 2495.27 samples/sec Loss 2.4400 LearningRate 0.000407 Epoch: 17 Global Step: 353490 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:21:01,689-Speed 2498.90 samples/sec Loss 2.4251 LearningRate 0.000407 Epoch: 17 Global Step: 353500 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:21:09,981-Speed 2470.23 samples/sec Loss 2.4017 LearningRate 0.000407 Epoch: 17 Global Step: 353510 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:21:18,195-Speed 2493.72 samples/sec Loss 2.4717 LearningRate 0.000407 Epoch: 17 Global Step: 353520 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:21:26,344-Speed 2513.45 samples/sec Loss 2.4060 LearningRate 0.000407 Epoch: 17 Global Step: 353530 
Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:21:34,551-Speed 2496.07 samples/sec Loss 2.4472 LearningRate 0.000407 Epoch: 17 Global Step: 353540 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:21:42,751-Speed 2498.05 samples/sec Loss 2.4604 LearningRate 0.000406 Epoch: 17 Global Step: 353550 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:21:50,948-Speed 2498.69 samples/sec Loss 2.3962 LearningRate 0.000406 Epoch: 17 Global Step: 353560 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:21:59,147-Speed 2498.22 samples/sec Loss 2.4203 LearningRate 0.000406 Epoch: 17 Global Step: 353570 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:22:07,352-Speed 2496.45 samples/sec Loss 2.4015 LearningRate 0.000406 Epoch: 17 Global Step: 353580 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:22:15,498-Speed 2514.61 samples/sec Loss 2.3882 LearningRate 0.000406 Epoch: 17 Global Step: 353590 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:22:23,702-Speed 2496.94 samples/sec Loss 2.4615 LearningRate 0.000406 Epoch: 17 Global Step: 353600 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:22:31,908-Speed 2496.19 samples/sec Loss 2.4847 LearningRate 0.000406 Epoch: 17 Global Step: 353610 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:22:40,107-Speed 2498.34 samples/sec Loss 2.4193 LearningRate 0.000406 Epoch: 17 Global Step: 353620 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:22:48,308-Speed 2497.77 samples/sec Loss 2.4509 LearningRate 0.000406 Epoch: 17 Global Step: 353630 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:22:56,509-Speed 2497.67 samples/sec Loss 2.4337 LearningRate 0.000406 Epoch: 17 Global Step: 353640 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:23:04,656-Speed 2514.14 samples/sec Loss 2.4388 LearningRate 0.000406 Epoch: 17 Global Step: 
353650 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:23:12,859-Speed 2497.40 samples/sec Loss 2.4789 LearningRate 0.000406 Epoch: 17 Global Step: 353660 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:23:21,060-Speed 2497.58 samples/sec Loss 2.3668 LearningRate 0.000406 Epoch: 17 Global Step: 353670 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:23:29,262-Speed 2497.51 samples/sec Loss 2.4452 LearningRate 0.000406 Epoch: 17 Global Step: 353680 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:23:37,460-Speed 2498.37 samples/sec Loss 2.4515 LearningRate 0.000406 Epoch: 17 Global Step: 353690 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:23:45,657-Speed 2498.87 samples/sec Loss 2.4449 LearningRate 0.000406 Epoch: 17 Global Step: 353700 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:23:53,803-Speed 2514.51 samples/sec Loss 2.4624 LearningRate 0.000406 Epoch: 17 Global Step: 353710 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:24:02,003-Speed 2498.20 samples/sec Loss 2.4347 LearningRate 0.000406 Epoch: 17 Global Step: 353720 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:24:10,201-Speed 2498.32 samples/sec Loss 2.4299 LearningRate 0.000406 Epoch: 17 Global Step: 353730 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:24:18,401-Speed 2498.50 samples/sec Loss 2.4860 LearningRate 0.000406 Epoch: 17 Global Step: 353740 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:24:26,600-Speed 2498.48 samples/sec Loss 2.4190 LearningRate 0.000406 Epoch: 17 Global Step: 353750 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:24:34,805-Speed 2496.29 samples/sec Loss 2.4508 LearningRate 0.000406 Epoch: 17 Global Step: 353760 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:24:42,952-Speed 2514.09 samples/sec Loss 2.4363 LearningRate 0.000406 Epoch: 17 Global 
Step: 353770 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:24:51,151-Speed 2498.24 samples/sec Loss 2.4316 LearningRate 0.000406 Epoch: 17 Global Step: 353780 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:24:59,356-Speed 2496.56 samples/sec Loss 2.4646 LearningRate 0.000406 Epoch: 17 Global Step: 353790 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:25:07,561-Speed 2496.38 samples/sec Loss 2.4317 LearningRate 0.000406 Epoch: 17 Global Step: 353800 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:25:15,786-Speed 2490.24 samples/sec Loss 2.4269 LearningRate 0.000406 Epoch: 17 Global Step: 353810 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:25:23,987-Speed 2498.07 samples/sec Loss 2.4207 LearningRate 0.000406 Epoch: 17 Global Step: 353820 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:25:32,132-Speed 2514.84 samples/sec Loss 2.3841 LearningRate 0.000406 Epoch: 17 Global Step: 353830 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:25:40,331-Speed 2498.28 samples/sec Loss 2.4632 LearningRate 0.000406 Epoch: 17 Global Step: 353840 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:25:48,531-Speed 2497.79 samples/sec Loss 2.5079 LearningRate 0.000406 Epoch: 17 Global Step: 353850 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:25:56,734-Speed 2497.23 samples/sec Loss 2.4417 LearningRate 0.000406 Epoch: 17 Global Step: 353860 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:26:04,938-Speed 2496.74 samples/sec Loss 2.4083 LearningRate 0.000406 Epoch: 17 Global Step: 353870 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:26:13,140-Speed 2497.41 samples/sec Loss 2.4601 LearningRate 0.000406 Epoch: 17 Global Step: 353880 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:26:21,288-Speed 2513.66 samples/sec Loss 2.4560 LearningRate 0.000406 Epoch: 17 
Global Step: 353890 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:26:29,521-Speed 2488.06 samples/sec Loss 2.4701 LearningRate 0.000406 Epoch: 17 Global Step: 353900 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:26:37,719-Speed 2498.30 samples/sec Loss 2.4591 LearningRate 0.000406 Epoch: 17 Global Step: 353910 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:26:45,917-Speed 2498.52 samples/sec Loss 2.4424 LearningRate 0.000406 Epoch: 17 Global Step: 353920 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:26:54,120-Speed 2497.25 samples/sec Loss 2.4464 LearningRate 0.000406 Epoch: 17 Global Step: 353930 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:27:02,321-Speed 2497.49 samples/sec Loss 2.4655 LearningRate 0.000406 Epoch: 17 Global Step: 353940 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:27:10,471-Speed 2513.17 samples/sec Loss 2.4402 LearningRate 0.000406 Epoch: 17 Global Step: 353950 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:27:18,686-Speed 2493.19 samples/sec Loss 2.4609 LearningRate 0.000406 Epoch: 17 Global Step: 353960 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:27:26,885-Speed 2498.84 samples/sec Loss 2.4964 LearningRate 0.000406 Epoch: 17 Global Step: 353970 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:27:35,088-Speed 2496.93 samples/sec Loss 2.4879 LearningRate 0.000406 Epoch: 17 Global Step: 353980 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:27:43,301-Speed 2493.97 samples/sec Loss 2.4551 LearningRate 0.000406 Epoch: 17 Global Step: 353990 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:27:51,504-Speed 2497.11 samples/sec Loss 2.4325 LearningRate 0.000406 Epoch: 17 Global Step: 354000 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:27:59,667-Speed 2509.38 samples/sec Loss 2.4562 LearningRate 0.000406 
Epoch: 17 Global Step: 354010 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:28:07,867-Speed 2497.75 samples/sec Loss 2.4385 LearningRate 0.000406 Epoch: 17 Global Step: 354020 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:28:16,067-Speed 2498.02 samples/sec Loss 2.4608 LearningRate 0.000406 Epoch: 17 Global Step: 354030 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:28:24,269-Speed 2497.70 samples/sec Loss 2.4134 LearningRate 0.000406 Epoch: 17 Global Step: 354040 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:28:32,471-Speed 2497.92 samples/sec Loss 2.4532 LearningRate 0.000406 Epoch: 17 Global Step: 354050 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:28:40,671-Speed 2497.71 samples/sec Loss 2.4052 LearningRate 0.000406 Epoch: 17 Global Step: 354060 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:28:48,821-Speed 2513.43 samples/sec Loss 2.4617 LearningRate 0.000406 Epoch: 17 Global Step: 354070 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:28:57,021-Speed 2497.97 samples/sec Loss 2.4554 LearningRate 0.000406 Epoch: 17 Global Step: 354080 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:29:05,228-Speed 2496.15 samples/sec Loss 2.4640 LearningRate 0.000406 Epoch: 17 Global Step: 354090 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:29:13,438-Speed 2494.86 samples/sec Loss 2.4813 LearningRate 0.000406 Epoch: 17 Global Step: 354100 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:29:21,634-Speed 2499.06 samples/sec Loss 2.4624 LearningRate 0.000406 Epoch: 17 Global Step: 354110 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:29:29,833-Speed 2498.21 samples/sec Loss 2.4344 LearningRate 0.000406 Epoch: 17 Global Step: 354120 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:29:37,983-Speed 2513.36 samples/sec Loss 2.3909 LearningRate 
0.000406 Epoch: 17 Global Step: 354130 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:29:46,181-Speed 2498.65 samples/sec Loss 2.4407 LearningRate 0.000405 Epoch: 17 Global Step: 354140 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:29:54,381-Speed 2498.09 samples/sec Loss 2.3880 LearningRate 0.000405 Epoch: 17 Global Step: 354150 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:30:02,580-Speed 2498.24 samples/sec Loss 2.4364 LearningRate 0.000405 Epoch: 17 Global Step: 354160 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:30:10,780-Speed 2498.17 samples/sec Loss 2.4731 LearningRate 0.000405 Epoch: 17 Global Step: 354170 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:30:18,984-Speed 2496.70 samples/sec Loss 2.4352 LearningRate 0.000405 Epoch: 17 Global Step: 354180 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:30:27,128-Speed 2515.20 samples/sec Loss 2.4383 LearningRate 0.000405 Epoch: 17 Global Step: 354190 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:30:35,325-Speed 2498.89 samples/sec Loss 2.4709 LearningRate 0.000405 Epoch: 17 Global Step: 354200 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:30:43,524-Speed 2498.45 samples/sec Loss 2.4314 LearningRate 0.000405 Epoch: 17 Global Step: 354210 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:30:51,721-Speed 2498.65 samples/sec Loss 2.4362 LearningRate 0.000405 Epoch: 17 Global Step: 354220 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:30:59,924-Speed 2497.19 samples/sec Loss 2.4671 LearningRate 0.000405 Epoch: 17 Global Step: 354230 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:31:08,123-Speed 2497.98 samples/sec Loss 2.4495 LearningRate 0.000405 Epoch: 17 Global Step: 354240 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:31:16,269-Speed 2514.65 samples/sec Loss 2.4688 
LearningRate 0.000405 Epoch: 17 Global Step: 354250 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:31:24,473-Speed 2496.56 samples/sec Loss 2.4661 LearningRate 0.000405 Epoch: 17 Global Step: 354260 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:31:32,672-Speed 2498.43 samples/sec Loss 2.4497 LearningRate 0.000405 Epoch: 17 Global Step: 354270 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:31:40,867-Speed 2499.17 samples/sec Loss 2.4823 LearningRate 0.000405 Epoch: 17 Global Step: 354280 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:31:49,073-Speed 2496.38 samples/sec Loss 2.4272 LearningRate 0.000405 Epoch: 17 Global Step: 354290 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:31:57,271-Speed 2498.28 samples/sec Loss 2.4695 LearningRate 0.000405 Epoch: 17 Global Step: 354300 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:32:05,420-Speed 2513.66 samples/sec Loss 2.4745 LearningRate 0.000405 Epoch: 17 Global Step: 354310 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:32:13,616-Speed 2499.40 samples/sec Loss 2.4559 LearningRate 0.000405 Epoch: 17 Global Step: 354320 Fp16 Grad Scale: 131072 Required: 109 hours Training: 2022-07-08 23:32:21,774-Speed 2510.70 samples/sec Loss 2.4533 LearningRate 0.000405 Epoch: 17 Global Step: 354330 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:32:29,972-Speed 2498.59 samples/sec Loss 2.4684 LearningRate 0.000405 Epoch: 17 Global Step: 354340 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:32:38,171-Speed 2497.99 samples/sec Loss 2.4469 LearningRate 0.000405 Epoch: 17 Global Step: 354350 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:32:46,380-Speed 2495.32 samples/sec Loss 2.4417 LearningRate 0.000405 Epoch: 17 Global Step: 354360 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:32:54,553-Speed 2506.37 samples/sec Loss 
2.4121 LearningRate 0.000405 Epoch: 17 Global Step: 354370 Fp16 Grad Scale: 65536 Required: 109 hours Training: 2022-07-08 23:33:02,709-Speed 2511.28 samples/sec Loss 2.4705 LearningRate 0.000405 Epoch: 17 Global Step: 354380 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:33:10,914-Speed 2496.50 samples/sec Loss 2.4706 LearningRate 0.000405 Epoch: 17 Global Step: 354390 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:33:19,119-Speed 2496.38 samples/sec Loss 2.5079 LearningRate 0.000405 Epoch: 17 Global Step: 354400 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:33:27,316-Speed 2498.99 samples/sec Loss 2.4975 LearningRate 0.000405 Epoch: 17 Global Step: 354410 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:33:35,514-Speed 2498.48 samples/sec Loss 2.4306 LearningRate 0.000405 Epoch: 17 Global Step: 354420 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:33:43,661-Speed 2514.24 samples/sec Loss 2.4624 LearningRate 0.000405 Epoch: 17 Global Step: 354430 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:33:51,860-Speed 2498.27 samples/sec Loss 2.4867 LearningRate 0.000405 Epoch: 17 Global Step: 354440 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:34:00,062-Speed 2497.24 samples/sec Loss 2.4503 LearningRate 0.000405 Epoch: 17 Global Step: 354450 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:34:08,266-Speed 2496.79 samples/sec Loss 2.4348 LearningRate 0.000405 Epoch: 17 Global Step: 354460 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:34:16,467-Speed 2500.20 samples/sec Loss 2.4314 LearningRate 0.000405 Epoch: 17 Global Step: 354470 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:34:24,676-Speed 2495.18 samples/sec Loss 2.4393 LearningRate 0.000405 Epoch: 17 Global Step: 354480 Fp16 Grad Scale: 32768 Required: 109 hours Training: 2022-07-08 23:34:32,821-Speed 2514.54 samples/sec 
Loss 2.3508 LearningRate 0.000405 Epoch: 17 Global Step: 354490 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:34:41,021-Speed 2497.97 samples/sec Loss 2.3771 LearningRate 0.000405 Epoch: 17 Global Step: 354500 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:34:49,225-Speed 2496.75 samples/sec Loss 2.3911 LearningRate 0.000405 Epoch: 17 Global Step: 354510 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:34:57,425-Speed 2497.94 samples/sec Loss 2.4243 LearningRate 0.000405 Epoch: 17 Global Step: 354520 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:35:05,639-Speed 2493.51 samples/sec Loss 2.4244 LearningRate 0.000405 Epoch: 17 Global Step: 354530 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:35:13,838-Speed 2498.19 samples/sec Loss 2.4563 LearningRate 0.000405 Epoch: 17 Global Step: 354540 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:35:21,983-Speed 2514.97 samples/sec Loss 2.4018 LearningRate 0.000405 Epoch: 17 Global Step: 354550 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:35:30,180-Speed 2498.88 samples/sec Loss 2.3908 LearningRate 0.000405 Epoch: 17 Global Step: 354560 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:35:38,377-Speed 2498.99 samples/sec Loss 2.4153 LearningRate 0.000405 Epoch: 17 Global Step: 354570 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:35:46,576-Speed 2498.16 samples/sec Loss 2.4022 LearningRate 0.000405 Epoch: 17 Global Step: 354580 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:35:54,781-Speed 2496.53 samples/sec Loss 2.4390 LearningRate 0.000405 Epoch: 17 Global Step: 354590 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:36:02,997-Speed 2493.09 samples/sec Loss 2.3884 LearningRate 0.000405 Epoch: 17 Global Step: 354600 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:36:11,146-Speed 2513.47 samples/sec Loss 2.4535 LearningRate 0.000405 Epoch: 17 Global Step: 354610 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:36:19,346-Speed 2497.95 samples/sec Loss 2.4255 LearningRate 0.000405 Epoch: 17 Global Step: 354620 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:36:27,544-Speed 2498.68 samples/sec Loss 2.3960 LearningRate 0.000405 Epoch: 17 Global Step: 354630 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:36:35,744-Speed 2497.93 samples/sec Loss 2.4286 LearningRate 0.000405 Epoch: 17 Global Step: 354640 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:36:43,949-Speed 2496.40 samples/sec Loss 2.3838 LearningRate 0.000405 Epoch: 17 Global Step: 354650 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:36:52,162-Speed 2494.12 samples/sec Loss 2.4623 LearningRate 0.000405 Epoch: 17 Global Step: 354660 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:37:00,309-Speed 2514.09 samples/sec Loss 2.4406 LearningRate 0.000405 Epoch: 17 Global Step: 354670 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:37:08,523-Speed 2493.77 samples/sec Loss 2.4193 LearningRate 0.000405 Epoch: 17 Global Step: 354680 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:37:16,723-Speed 2498.11 samples/sec Loss 2.4085 LearningRate 0.000405 Epoch: 17 Global Step: 354690 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:37:24,921-Speed 2498.53 samples/sec Loss 2.4239 LearningRate 0.000405 Epoch: 17 Global Step: 354700 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:37:33,124-Speed 2497.06 samples/sec Loss 2.4043 LearningRate 0.000405 Epoch: 17 Global Step: 354710 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:37:41,338-Speed 2493.69 samples/sec Loss 2.3987 LearningRate 0.000404 Epoch: 17 Global Step: 354720 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:37:49,483-Speed 2514.92 samples/sec Loss 2.4493 LearningRate 0.000404 Epoch: 17 Global Step: 354730 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:37:57,689-Speed 2496.12 samples/sec Loss 2.4362 LearningRate 0.000404 Epoch: 17 Global Step: 354740 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:38:05,888-Speed 2498.30 samples/sec Loss 2.4614 LearningRate 0.000404 Epoch: 17 Global Step: 354750 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:38:14,110-Speed 2491.14 samples/sec Loss 2.4322 LearningRate 0.000404 Epoch: 17 Global Step: 354760 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:38:22,312-Speed 2497.48 samples/sec Loss 2.4255 LearningRate 0.000404 Epoch: 17 Global Step: 354770 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:38:30,517-Speed 2496.35 samples/sec Loss 2.4258 LearningRate 0.000404 Epoch: 17 Global Step: 354780 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:38:38,664-Speed 2514.38 samples/sec Loss 2.4198 LearningRate 0.000404 Epoch: 17 Global Step: 354790 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:38:46,863-Speed 2498.18 samples/sec Loss 2.4317 LearningRate 0.000404 Epoch: 17 Global Step: 354800 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:38:55,071-Speed 2495.57 samples/sec Loss 2.4154 LearningRate 0.000404 Epoch: 17 Global Step: 354810 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:39:03,272-Speed 2497.69 samples/sec Loss 2.4063 LearningRate 0.000404 Epoch: 17 Global Step: 354820 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:39:11,475-Speed 2497.35 samples/sec Loss 2.4433 LearningRate 0.000404 Epoch: 17 Global Step: 354830 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:39:19,675-Speed 2497.91 samples/sec Loss 2.4236 LearningRate 0.000404 Epoch: 17 Global Step: 354840 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:39:27,819-Speed 2515.08 samples/sec Loss 2.4370 LearningRate 0.000404 Epoch: 17 Global Step: 354850 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:39:36,023-Speed 2496.57 samples/sec Loss 2.4585 LearningRate 0.000404 Epoch: 17 Global Step: 354860 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:39:44,221-Speed 2498.48 samples/sec Loss 2.4407 LearningRate 0.000404 Epoch: 17 Global Step: 354870 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:39:52,424-Speed 2497.16 samples/sec Loss 2.4141 LearningRate 0.000404 Epoch: 17 Global Step: 354880 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:40:00,625-Speed 2497.73 samples/sec Loss 2.3740 LearningRate 0.000404 Epoch: 17 Global Step: 354890 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:40:08,828-Speed 2496.73 samples/sec Loss 2.4257 LearningRate 0.000404 Epoch: 17 Global Step: 354900 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:40:16,980-Speed 2512.57 samples/sec Loss 2.3857 LearningRate 0.000404 Epoch: 17 Global Step: 354910 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:40:25,181-Speed 2498.25 samples/sec Loss 2.3609 LearningRate 0.000404 Epoch: 17 Global Step: 354920 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:40:33,380-Speed 2498.03 samples/sec Loss 2.4294 LearningRate 0.000404 Epoch: 17 Global Step: 354930 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:40:41,589-Speed 2495.67 samples/sec Loss 2.4682 LearningRate 0.000404 Epoch: 17 Global Step: 354940 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:40:49,790-Speed 2497.38 samples/sec Loss 2.4077 LearningRate 0.000404 Epoch: 17 Global Step: 354950 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:40:57,990-Speed 2497.96 samples/sec Loss 2.4512 LearningRate 0.000404 Epoch: 17 Global Step: 354960 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:41:06,135-Speed 2514.84 samples/sec Loss 2.4222 LearningRate 0.000404 Epoch: 17 Global Step: 354970 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:41:14,335-Speed 2498.09 samples/sec Loss 2.4048 LearningRate 0.000404 Epoch: 17 Global Step: 354980 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:41:22,538-Speed 2496.99 samples/sec Loss 2.3965 LearningRate 0.000404 Epoch: 17 Global Step: 354990 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:41:30,740-Speed 2497.12 samples/sec Loss 2.4014 LearningRate 0.000404 Epoch: 17 Global Step: 355000 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:41:38,944-Speed 2497.04 samples/sec Loss 2.3778 LearningRate 0.000404 Epoch: 17 Global Step: 355010 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:41:47,157-Speed 2494.13 samples/sec Loss 2.4224 LearningRate 0.000404 Epoch: 17 Global Step: 355020 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:41:55,300-Speed 2515.22 samples/sec Loss 2.4314 LearningRate 0.000404 Epoch: 17 Global Step: 355030 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:42:03,505-Speed 2496.45 samples/sec Loss 2.4172 LearningRate 0.000404 Epoch: 17 Global Step: 355040 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:42:11,703-Speed 2498.87 samples/sec Loss 2.4070 LearningRate 0.000404 Epoch: 17 Global Step: 355050 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:42:19,905-Speed 2497.30 samples/sec Loss 2.4114 LearningRate 0.000404 Epoch: 17 Global Step: 355060 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:42:28,102-Speed 2498.95 samples/sec Loss 2.4118 LearningRate 0.000404 Epoch: 17 Global Step: 355070 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:42:36,304-Speed 2497.50 samples/sec Loss 2.4631 LearningRate 0.000404 Epoch: 17 Global Step: 355080 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:42:44,452-Speed 2513.92 samples/sec Loss 2.4538 LearningRate 0.000404 Epoch: 17 Global Step: 355090 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:42:52,649-Speed 2498.73 samples/sec Loss 2.3867 LearningRate 0.000404 Epoch: 17 Global Step: 355100 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:43:00,849-Speed 2498.02 samples/sec Loss 2.4335 LearningRate 0.000404 Epoch: 17 Global Step: 355110 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:43:09,054-Speed 2496.56 samples/sec Loss 2.4464 LearningRate 0.000404 Epoch: 17 Global Step: 355120 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:43:17,253-Speed 2498.17 samples/sec Loss 2.4262 LearningRate 0.000404 Epoch: 17 Global Step: 355130 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:43:25,457-Speed 2496.81 samples/sec Loss 2.4025 LearningRate 0.000404 Epoch: 17 Global Step: 355140 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:43:33,611-Speed 2512.22 samples/sec Loss 2.4885 LearningRate 0.000404 Epoch: 17 Global Step: 355150 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:43:41,812-Speed 2497.67 samples/sec Loss 2.4607 LearningRate 0.000404 Epoch: 17 Global Step: 355160 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:43:50,017-Speed 2496.69 samples/sec Loss 2.4298 LearningRate 0.000404 Epoch: 17 Global Step: 355170 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:43:58,221-Speed 2496.56 samples/sec Loss 2.4306 LearningRate 0.000404 Epoch: 17 Global Step: 355180 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:44:06,427-Speed 2496.10 samples/sec Loss 2.4566 LearningRate 0.000404 Epoch: 17 Global Step: 355190 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:44:14,634-Speed 2495.94 samples/sec Loss 2.4707 LearningRate 0.000404 Epoch: 17 Global Step: 355200 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:44:22,791-Speed 2511.70 samples/sec Loss 2.4458 LearningRate 0.000404 Epoch: 17 Global Step: 355210 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:44:30,995-Speed 2496.64 samples/sec Loss 2.4545 LearningRate 0.000404 Epoch: 17 Global Step: 355220 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:44:39,201-Speed 2496.06 samples/sec Loss 2.4583 LearningRate 0.000404 Epoch: 17 Global Step: 355230 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:44:47,404-Speed 2497.08 samples/sec Loss 2.4332 LearningRate 0.000404 Epoch: 17 Global Step: 355240 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:44:55,607-Speed 2496.86 samples/sec Loss 2.4766 LearningRate 0.000404 Epoch: 17 Global Step: 355250 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:45:03,809-Speed 2497.50 samples/sec Loss 2.3832 LearningRate 0.000404 Epoch: 17 Global Step: 355260 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:45:11,960-Speed 2513.11 samples/sec Loss 2.5308 LearningRate 0.000404 Epoch: 17 Global Step: 355270 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:45:20,161-Speed 2497.42 samples/sec Loss 2.4917 LearningRate 0.000404 Epoch: 17 Global Step: 355280 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:45:28,380-Speed 2492.39 samples/sec Loss 2.4657 LearningRate 0.000404 Epoch: 17 Global Step: 355290 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:45:36,591-Speed 2494.46 samples/sec Loss 2.5105 LearningRate 0.000404 Epoch: 17 Global Step: 355300 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:45:44,795-Speed 2496.91 samples/sec Loss 2.4245 LearningRate 0.000403 Epoch: 17 Global Step: 355310 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:45:53,001-Speed 2496.13 samples/sec Loss 2.4256 LearningRate 0.000403 Epoch: 17 Global Step: 355320 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:46:01,160-Speed 2510.56 samples/sec Loss 2.4181 LearningRate 0.000403 Epoch: 17 Global Step: 355330 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:46:09,363-Speed 2496.90 samples/sec Loss 2.4119 LearningRate 0.000403 Epoch: 17 Global Step: 355340 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:46:17,569-Speed 2496.16 samples/sec Loss 2.4494 LearningRate 0.000403 Epoch: 17 Global Step: 355350 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:46:25,773-Speed 2496.73 samples/sec Loss 2.4040 LearningRate 0.000403 Epoch: 17 Global Step: 355360 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:46:33,976-Speed 2497.12 samples/sec Loss 2.4374 LearningRate 0.000403 Epoch: 17 Global Step: 355370 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:46:42,183-Speed 2495.87 samples/sec Loss 2.4508 LearningRate 0.000403 Epoch: 17 Global Step: 355380 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:46:50,330-Speed 2513.86 samples/sec Loss 2.4284 LearningRate 0.000403 Epoch: 17 Global Step: 355390 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:46:58,533-Speed 2497.13 samples/sec Loss 2.4152 LearningRate 0.000403 Epoch: 17 Global Step: 355400 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:47:06,738-Speed 2496.70 samples/sec Loss 2.4237 LearningRate 0.000403 Epoch: 17 Global Step: 355410 Fp16 Grad Scale: 32768 Required: 109 hours
Training: 2022-07-08 23:47:14,901-Speed 2509.11 samples/sec Loss 2.4199 LearningRate 0.000403 Epoch: 17 Global Step: 355420 Fp16 Grad Scale: 16384 Required: 109 hours
Training: 2022-07-08 23:47:23,101-Speed 2498.09 samples/sec Loss 2.3820 LearningRate 0.000403 Epoch: 17 Global Step: 355430 Fp16 Grad Scale: 16384 Required: 109 hours
Training: 2022-07-08 23:47:31,315-Speed 2493.91 samples/sec Loss 2.4272 LearningRate 0.000403 Epoch: 17 Global Step: 355440 Fp16 Grad Scale: 16384 Required: 109 hours
Training: 2022-07-08 23:47:39,462-Speed 2514.34 samples/sec Loss 2.4318 LearningRate 0.000403 Epoch: 17 Global Step: 355450 Fp16 Grad Scale: 16384 Required: 109 hours
Training: 2022-07-08 23:47:47,663-Speed 2497.68 samples/sec Loss 2.3811 LearningRate 0.000403 Epoch: 17 Global Step: 355460 Fp16 Grad Scale: 16384 Required: 109 hours
Training: 2022-07-08 23:47:55,863-Speed 2497.83 samples/sec Loss 2.4093 LearningRate 0.000403 Epoch: 17 Global Step: 355470 Fp16 Grad Scale: 16384 Required: 109 hours
Training: 2022-07-08 23:48:04,070-Speed 2496.26 samples/sec Loss 2.3977 LearningRate 0.000403 Epoch: 17 Global Step: 355480 Fp16 Grad Scale: 16384 Required: 109 hours
Training: 2022-07-08 23:48:12,271-Speed 2497.46 samples/sec Loss 2.4015 LearningRate 0.000403 Epoch: 17 Global Step: 355490 Fp16 Grad Scale: 16384 Required: 109 hours
Training: 2022-07-08 23:48:20,500-Speed 2489.26 samples/sec Loss 2.3793 LearningRate 0.000403 Epoch: 17 Global Step: 355500 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:48:28,642-Speed 2515.74 samples/sec Loss 2.4026 LearningRate 0.000403 Epoch: 17 Global Step: 355510 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:48:36,845-Speed 2497.43 samples/sec Loss 2.3675 LearningRate 0.000403 Epoch: 17 Global Step: 355520 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:48:45,046-Speed 2497.63 samples/sec Loss 2.3943 LearningRate 0.000403 Epoch: 17 Global Step: 355530 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:48:53,250-Speed 2496.52 samples/sec Loss 2.3996 LearningRate 0.000403 Epoch: 17 Global Step: 355540 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:49:01,452-Speed 2497.23 samples/sec Loss 2.4182 LearningRate 0.000403 Epoch: 17 Global Step: 355550 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:49:09,655-Speed 2497.20 samples/sec Loss 2.3885 LearningRate 0.000403 Epoch: 17 Global Step: 355560 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:49:17,806-Speed 2512.78 samples/sec Loss 2.4268 LearningRate 0.000403 Epoch: 17 Global Step: 355570 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:49:26,008-Speed 2497.43 samples/sec Loss 2.3996 LearningRate 0.000403 Epoch: 17 Global Step: 355580 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:49:34,210-Speed 2497.66 samples/sec Loss 2.4229 LearningRate 0.000403 Epoch: 17 Global Step: 355590 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:49:42,423-Speed 2494.07 samples/sec Loss 2.4122 LearningRate 0.000403 Epoch: 17 Global Step: 355600 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:49:50,623-Speed 2497.93 samples/sec Loss 2.4228 LearningRate 0.000403 Epoch: 17 Global Step: 355610 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:49:58,825-Speed 2497.30 samples/sec Loss 2.5026 LearningRate 0.000403 Epoch: 17 Global Step: 355620 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:50:06,985-Speed 2510.19 samples/sec Loss 2.4575 LearningRate 0.000403 Epoch: 17 Global Step: 355630 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:50:15,182-Speed 2498.94 samples/sec Loss 2.4477 LearningRate 0.000403 Epoch: 17 Global Step: 355640 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:50:23,388-Speed 2495.94 samples/sec Loss 2.4668 LearningRate 0.000403 Epoch: 17 Global Step: 355650 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:50:31,589-Speed 2497.58 samples/sec Loss 2.4645 LearningRate 0.000403 Epoch: 17 Global Step: 355660 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:50:39,788-Speed 2498.35 samples/sec Loss 2.4576 LearningRate 0.000403 Epoch: 17 Global Step: 355670 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:50:47,993-Speed 2496.20 samples/sec Loss 2.4830 LearningRate 0.000403 Epoch: 17 Global Step: 355680 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:50:56,138-Speed 2514.64 samples/sec Loss 2.4432 LearningRate 0.000403 Epoch: 17 Global Step: 355690 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:51:04,340-Speed 2497.59 samples/sec Loss 2.4314 LearningRate 0.000403 Epoch: 17 Global Step: 355700 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:51:12,542-Speed 2497.24 samples/sec Loss 2.5134 LearningRate 0.000403 Epoch: 17 Global Step: 355710 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:51:20,743-Speed 2497.69 samples/sec Loss 2.4321 LearningRate 0.000403 Epoch: 17 Global Step: 355720 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:51:28,945-Speed 2497.41 samples/sec Loss 2.4202 LearningRate 0.000403 Epoch: 17 Global Step: 355730 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:51:37,146-Speed 2497.60 samples/sec Loss 2.4653 LearningRate 0.000403 Epoch: 17 Global Step: 355740 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:51:45,293-Speed 2514.20 samples/sec Loss 2.4587 LearningRate 0.000403 Epoch: 17 Global Step: 355750 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:51:53,510-Speed 2492.71 samples/sec Loss 2.4212 LearningRate 0.000403 Epoch: 17 Global Step: 355760 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:52:01,713-Speed 2497.31 samples/sec Loss 2.4287 LearningRate 0.000403 Epoch: 17 Global Step: 355770 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:52:09,917-Speed 2496.66 samples/sec Loss 2.4208 LearningRate 0.000403 Epoch: 17 Global Step: 355780 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:52:18,122-Speed 2496.71 samples/sec Loss 2.4552 LearningRate 0.000403 Epoch: 17 Global Step: 355790 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:52:26,322-Speed 2497.86 samples/sec Loss 2.4699 LearningRate 0.000403 Epoch: 17 Global Step: 355800 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:52:34,479-Speed 2511.22 samples/sec Loss 2.4588 LearningRate 0.000403 Epoch: 17 Global Step: 355810 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:52:42,684-Speed 2496.48 samples/sec Loss 2.4456 LearningRate 0.000403 Epoch: 17 Global Step: 355820 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:52:50,887-Speed 2497.16 samples/sec Loss 2.4380 LearningRate 0.000403 Epoch: 17 Global Step: 355830 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:52:59,096-Speed 2495.25 samples/sec Loss 2.4198 LearningRate 0.000403 Epoch: 17 Global Step: 355840 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:53:07,295-Speed 2498.28 samples/sec Loss 2.4263 LearningRate 0.000403 Epoch: 17 Global Step: 355850 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:53:15,498-Speed 2497.01 samples/sec Loss 2.4237 LearningRate 0.000403 Epoch: 17 Global Step: 355860 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:53:23,646-Speed 2513.91 samples/sec Loss 2.4639 LearningRate 0.000403 Epoch: 17 Global Step: 355870 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:53:31,846-Speed 2497.99 samples/sec Loss 2.4122 LearningRate 0.000403 Epoch: 17 Global Step: 355880 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:53:40,049-Speed 2496.62 samples/sec Loss 2.4972 LearningRate 0.000403 Epoch: 17 Global Step: 355890 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:53:48,258-Speed 2495.90 samples/sec Loss 2.4475 LearningRate 0.000402 Epoch: 17 Global Step: 355900 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:53:56,461-Speed 2496.90 samples/sec Loss 2.5369 LearningRate 0.000402 Epoch: 17 Global Step: 355910 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:54:04,662-Speed 2497.60 samples/sec Loss 2.4424 LearningRate 0.000402 Epoch: 17 Global Step: 355920 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:54:12,809-Speed 2514.21 samples/sec Loss 2.4525 LearningRate 0.000402 Epoch: 17 Global Step: 355930 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:54:21,040-Speed 2488.53 samples/sec Loss 2.4533 LearningRate 0.000402 Epoch: 17 Global Step: 355940 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:54:29,243-Speed 2497.10 samples/sec Loss 2.4334 LearningRate 0.000402 Epoch: 17 Global Step: 355950 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:54:37,443-Speed 2498.03 samples/sec Loss 2.4770 LearningRate 0.000402 Epoch: 17 Global Step: 355960 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:54:45,643-Speed 2497.95 samples/sec Loss 2.4228 LearningRate 0.000402 Epoch: 17 Global Step: 355970 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:54:53,844-Speed 2497.50 samples/sec Loss 2.3993 LearningRate 0.000402 Epoch: 17 Global Step: 355980 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:55:01,992-Speed 2514.17 samples/sec Loss 2.4136 LearningRate 0.000402 Epoch: 17 Global Step: 355990 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:55:10,192-Speed 2497.98 samples/sec Loss 2.4752 LearningRate 0.000402 Epoch: 17 Global Step: 356000 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:55:18,398-Speed 2496.29 samples/sec Loss 2.4687 LearningRate 0.000402 Epoch: 17 Global Step: 356010 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:55:26,599-Speed 2497.70 samples/sec Loss 2.4276 LearningRate 0.000402 Epoch: 17 Global Step: 356020 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:55:34,813-Speed 2493.67 samples/sec Loss 2.4583 LearningRate 0.000402 Epoch: 17 Global Step: 356030 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:55:43,019-Speed 2495.98 samples/sec Loss 2.4440 LearningRate 0.000402 Epoch: 17 Global Step: 356040 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:55:51,169-Speed 2513.29 samples/sec Loss 2.4269 LearningRate 0.000402 Epoch: 17 Global Step: 356050 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:55:59,375-Speed 2496.40 samples/sec Loss 2.4852 LearningRate 0.000402 Epoch: 17 Global Step: 356060 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:56:07,602-Speed 2489.69 samples/sec Loss 2.4654 LearningRate 0.000402 Epoch: 17 Global Step: 356070 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:56:15,802-Speed 2497.97 samples/sec Loss 2.4918 LearningRate 0.000402 Epoch: 17 Global Step: 356080 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:56:24,003-Speed 2497.68 samples/sec Loss 2.4789 LearningRate 0.000402 Epoch: 17 Global Step: 356090 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:56:32,218-Speed 2493.48 samples/sec Loss 2.4307 LearningRate 0.000402 Epoch: 17 Global Step: 356100 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:56:40,370-Speed 2512.47 samples/sec Loss 2.4263 LearningRate 0.000402 Epoch: 17 Global Step: 356110 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:56:48,573-Speed 2497.07 samples/sec Loss 2.4617 LearningRate 0.000402 Epoch: 17 Global Step: 356120 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:56:56,775-Speed 2497.26 samples/sec Loss 2.4728 LearningRate 0.000402 Epoch: 17 Global Step: 356130 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:57:04,978-Speed 2497.15 samples/sec Loss 2.4483 LearningRate 0.000402 Epoch: 17 Global Step: 356140 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:57:13,185-Speed 2495.67 samples/sec Loss 2.4869 LearningRate 0.000402 Epoch: 17 Global Step: 356150 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:57:21,388-Speed 2496.83 samples/sec Loss 2.4433 LearningRate 0.000402 Epoch: 17 Global Step: 356160 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:57:29,564-Speed 2505.53 samples/sec Loss 2.5271 LearningRate 0.000402 Epoch: 17 Global Step: 356170 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:57:37,765-Speed 2497.74 samples/sec Loss 2.4701 LearningRate 0.000402 Epoch: 17 Global Step: 356180 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:57:45,967-Speed 2497.62 samples/sec Loss 2.5180 LearningRate 0.000402 Epoch: 17 Global Step: 356190 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:57:54,166-Speed 2498.26 samples/sec Loss 2.5324 LearningRate 0.000402 Epoch: 17 Global Step: 356200 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:58:02,369-Speed 2496.83 samples/sec Loss 2.4852 LearningRate 0.000402 Epoch: 17 Global Step: 356210 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:58:10,574-Speed 2496.44 samples/sec Loss 2.4534 LearningRate 0.000402 Epoch: 17 Global Step: 356220 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:58:18,715-Speed 2516.06 samples/sec Loss 2.4953 LearningRate 0.000402 Epoch: 17 Global Step: 356230 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:58:26,919-Speed 2496.77 samples/sec Loss 2.4048 LearningRate 0.000402 Epoch: 17 Global Step: 356240 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:58:35,122-Speed 2497.16 samples/sec Loss 2.4088 LearningRate 0.000402 Epoch: 17 Global Step: 356250 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:58:43,325-Speed 2496.89 samples/sec Loss 2.4167 LearningRate 0.000402 Epoch: 17 Global Step: 356260 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:58:51,524-Speed 2498.02 samples/sec Loss 2.4223 LearningRate 0.000402 Epoch: 17 Global Step: 356270 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:58:59,724-Speed 2497.96 samples/sec Loss 2.4563 LearningRate 0.000402 Epoch: 17 Global Step: 356280 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:59:07,874-Speed 2513.35 samples/sec Loss 2.3885 LearningRate 0.000402 Epoch: 17 Global Step: 356290 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:59:16,082-Speed 2495.75 samples/sec Loss 2.4154 LearningRate 0.000402 Epoch: 17 Global Step: 356300 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:59:24,283-Speed 2497.36 samples/sec Loss 2.4235 LearningRate 0.000402 Epoch: 17 Global Step: 356310 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:59:32,486-Speed 2497.28 samples/sec Loss 2.4465 LearningRate 0.000402 Epoch: 17 Global Step: 356320 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:59:40,686-Speed 2497.79 samples/sec Loss 2.4200 LearningRate 0.000402 Epoch: 17 Global Step: 356330 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:59:48,887-Speed 2497.61 samples/sec Loss 2.4206 LearningRate 0.000402 Epoch: 17 Global Step: 356340 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-08 23:59:57,066-Speed 2504.35 samples/sec Loss 2.3918 LearningRate 0.000402 Epoch: 17 Global Step: 356350 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:00:05,281-Speed 2493.19 samples/sec Loss 2.3807 LearningRate 0.000402 Epoch: 17 Global Step: 356360 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:00:13,510-Speed 2489.21 samples/sec Loss 2.4096 LearningRate 0.000402 Epoch: 17 Global Step: 356370 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:00:21,721-Speed 2494.78 samples/sec Loss 2.4407 LearningRate 0.000402 Epoch: 17 Global Step: 356380 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:00:29,925-Speed 2496.64 samples/sec Loss 2.4255 LearningRate 0.000402 Epoch: 17 Global Step: 356390 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:00:38,127-Speed 2497.19 samples/sec Loss 2.3966 LearningRate 0.000402 Epoch: 17 Global Step: 356400 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:00:46,290-Speed 2509.47 samples/sec Loss 2.4900 LearningRate 0.000402 Epoch: 17 Global Step: 356410 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:00:54,498-Speed 2495.40 samples/sec Loss 2.4266 LearningRate 0.000402 Epoch: 17 Global Step: 356420 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:01:02,705-Speed 2495.62 samples/sec Loss 2.3724 LearningRate 0.000402 Epoch: 17 Global Step: 356430 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:01:10,926-Speed 2491.77 samples/sec Loss 2.4559 LearningRate 0.000402 Epoch: 17 Global Step: 356440 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:01:19,124-Speed 2498.39 samples/sec Loss 2.4532 LearningRate 0.000402 Epoch: 17 Global Step: 356450 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:01:27,326-Speed 2497.59 samples/sec Loss 2.4450 LearningRate 0.000402 Epoch: 17 Global Step: 356460 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:01:35,486-Speed 2510.20 samples/sec Loss 2.4447 LearningRate 0.000402 Epoch: 17 Global Step: 356470 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:01:43,687-Speed 2497.75 samples/sec Loss 2.4613 LearningRate 0.000402 Epoch: 17 Global Step: 356480 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:01:51,888-Speed 2497.46 samples/sec Loss 2.4660 LearningRate 0.000401 Epoch: 17 Global Step: 356490 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:02:00,091-Speed 2497.04 samples/sec Loss 2.4650 LearningRate 0.000401 Epoch: 17 Global Step: 356500 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:02:08,305-Speed 2493.63 samples/sec Loss 2.4431 LearningRate 0.000401 Epoch: 17 Global Step: 356510 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:02:16,531-Speed 2490.15 samples/sec Loss 2.4396 LearningRate 0.000401 Epoch: 17 Global Step: 356520 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:02:24,698-Speed 2507.94 samples/sec Loss 2.4694 LearningRate 0.000401 Epoch: 17 Global Step: 356530 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:02:32,927-Speed 2489.15 samples/sec Loss 2.4424 LearningRate 0.000401 Epoch: 17 Global Step: 356540 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:02:41,127-Speed 2497.81 samples/sec Loss 2.4618 LearningRate 0.000401 Epoch: 17 Global Step: 356550 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:02:49,333-Speed 2496.22 samples/sec Loss 2.4741 LearningRate 0.000401 Epoch: 17 Global Step: 356560 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:02:57,541-Speed 2495.51 samples/sec Loss 2.4562 LearningRate 0.000401 Epoch: 17 Global Step: 356570 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:03:05,740-Speed 2498.12 samples/sec Loss 2.5285 LearningRate 0.000401 Epoch: 17 Global Step: 356580 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:03:13,890-Speed 2513.19 samples/sec Loss 2.5408 LearningRate 0.000401 Epoch: 17 Global Step: 356590 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:03:22,102-Speed 2494.66 samples/sec Loss 2.4756 LearningRate 0.000401 Epoch: 17 Global Step: 356600 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:03:30,302-Speed 2498.17 samples/sec Loss 2.5060 LearningRate 0.000401 Epoch: 17 Global Step: 356610 Fp16 Grad Scale: 16384 Required: 108 hours
Training: 2022-07-09 00:03:38,517-Speed 2493.07 samples/sec Loss 2.4556 LearningRate 0.000401 Epoch: 17 Global Step: 356620 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:03:46,725-Speed 2496.08 samples/sec Loss 2.4420 LearningRate 0.000401 Epoch: 17 Global Step: 356630 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:03:54,926-Speed 2497.47 samples/sec Loss 2.4171 LearningRate 0.000401 Epoch: 17 Global Step: 356640 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:04:03,089-Speed 2509.13 samples/sec Loss 2.4066 LearningRate 0.000401 Epoch: 17 Global Step: 356650 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:04:11,291-Speed 2497.32 samples/sec Loss 2.4254 LearningRate 0.000401 Epoch: 17 Global Step: 356660 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:04:19,492-Speed 2497.60 samples/sec Loss 2.3893 LearningRate 0.000401 Epoch: 17 Global Step: 356670 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:04:27,693-Speed 2497.90 samples/sec Loss 2.3874 LearningRate 0.000401 Epoch: 17 Global Step: 356680 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:04:35,893-Speed 2497.87 samples/sec Loss 2.5048 LearningRate 0.000401 Epoch: 17 Global Step: 356690 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:04:44,147-Speed 2491.83 samples/sec Loss 2.4458 LearningRate 0.000401 Epoch: 17 Global Step: 356700 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:04:52,363-Speed 2516.03 samples/sec Loss 2.3881 LearningRate 0.000401 Epoch: 17 Global Step: 356710 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:05:00,564-Speed 2497.63 samples/sec Loss 2.4176 LearningRate 0.000401 Epoch: 17 Global Step: 356720 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:05:08,771-Speed 2495.67 samples/sec Loss 2.4031 LearningRate 0.000401 Epoch: 17 Global Step: 356730 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:05:21,634-Speed 2500.51 samples/sec Loss 2.4474 LearningRate 0.000401 Epoch: 17 Global Step: 356740 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:05:31,375-Speed 2501.33 samples/sec Loss 2.3565 LearningRate 0.000401 Epoch: 17 Global Step: 356750 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:05:39,572-Speed 2498.63 samples/sec Loss 2.3992 LearningRate 0.000401 Epoch: 17 Global Step: 356760 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:05:52,466-Speed 1639.63 samples/sec Loss 2.3479 LearningRate 0.000401 Epoch: 17 Global Step: 356770 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:06:00,677-Speed 2502.90 samples/sec Loss 2.4036 LearningRate 0.000401 Epoch: 17 Global Step: 356780 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:06:08,869-Speed 2500.29 samples/sec Loss 2.4226 LearningRate 0.000401 Epoch: 17 Global Step: 356790 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:06:21,468-Speed 1630.49 samples/sec Loss 2.3730 LearningRate 0.000401 Epoch: 17 Global Step: 356800 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:06:36,867-Speed 1498.47 samples/sec Loss 2.3376 LearningRate 0.000401 Epoch: 17 Global Step: 356810 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:06:45,100-Speed 2502.23 samples/sec Loss 2.3923 LearningRate 0.000401 Epoch: 17 Global Step: 356820 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:06:53,474-Speed 2516.72 samples/sec Loss 2.4016 LearningRate 0.000401 Epoch: 17 Global Step: 356830 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:07:01,743-Speed 2499.66 samples/sec Loss 2.4395 LearningRate 0.000401 Epoch: 17 Global Step: 356840 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:07:12,458-Speed 1911.51 samples/sec Loss 2.4847 LearningRate 0.000401 Epoch: 17 Global Step: 356850 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:07:20,699-Speed 2496.49 samples/sec Loss 2.4025 LearningRate 0.000401 Epoch: 17 Global Step: 356860 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:07:31,644-Speed 1903.99 samples/sec Loss 2.3838 LearningRate 0.000401 Epoch: 17 Global Step: 356870 Fp16 Grad Scale: 32768 Required: 108 hours
Training: 2022-07-09 00:07:39,851-Speed 2495.54 samples/sec
Loss 2.4071 LearningRate 0.000401 Epoch: 17 Global Step: 356880 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:07:54,261-Speed 1423.67 samples/sec Loss 2.4026 LearningRate 0.000401 Epoch: 17 Global Step: 356890 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:08:04,481-Speed 2012.12 samples/sec Loss 2.4147 LearningRate 0.000401 Epoch: 17 Global Step: 356900 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:08:12,714-Speed 2499.92 samples/sec Loss 2.4445 LearningRate 0.000401 Epoch: 17 Global Step: 356910 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:08:20,917-Speed 2496.77 samples/sec Loss 2.4388 LearningRate 0.000401 Epoch: 17 Global Step: 356920 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:08:29,305-Speed 2461.91 samples/sec Loss 2.4003 LearningRate 0.000401 Epoch: 17 Global Step: 356930 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:08:38,329-Speed 2498.55 samples/sec Loss 2.4094 LearningRate 0.000401 Epoch: 17 Global Step: 356940 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:08:49,375-Speed 1854.28 samples/sec Loss 2.4396 LearningRate 0.000401 Epoch: 17 Global Step: 356950 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:08:57,575-Speed 2497.91 samples/sec Loss 2.3991 LearningRate 0.000401 Epoch: 17 Global Step: 356960 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:09:05,777-Speed 2497.40 samples/sec Loss 2.4166 LearningRate 0.000401 Epoch: 17 Global Step: 356970 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:09:13,988-Speed 2494.78 samples/sec Loss 2.5367 LearningRate 0.000401 Epoch: 17 Global Step: 356980 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:09:22,196-Speed 2495.24 samples/sec Loss 2.4614 LearningRate 0.000401 Epoch: 17 Global Step: 356990 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:09:30,406-Speed 2494.90 
samples/sec Loss 2.4075 LearningRate 0.000401 Epoch: 17 Global Step: 357000 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:09:38,568-Speed 2510.54 samples/sec Loss 2.4240 LearningRate 0.000401 Epoch: 17 Global Step: 357010 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:09:46,785-Speed 2492.77 samples/sec Loss 2.4908 LearningRate 0.000401 Epoch: 17 Global Step: 357020 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:09:55,002-Speed 2492.58 samples/sec Loss 2.4704 LearningRate 0.000401 Epoch: 17 Global Step: 357030 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:10:03,213-Speed 2494.74 samples/sec Loss 2.4289 LearningRate 0.000401 Epoch: 17 Global Step: 357040 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:10:11,422-Speed 2494.99 samples/sec Loss 2.4310 LearningRate 0.000401 Epoch: 17 Global Step: 357050 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:10:19,632-Speed 2495.20 samples/sec Loss 2.4316 LearningRate 0.000401 Epoch: 17 Global Step: 357060 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:10:27,784-Speed 2512.43 samples/sec Loss 2.4921 LearningRate 0.000401 Epoch: 17 Global Step: 357070 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:10:35,987-Speed 2497.14 samples/sec Loss 2.4852 LearningRate 0.000400 Epoch: 17 Global Step: 357080 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:10:44,193-Speed 2496.19 samples/sec Loss 2.4873 LearningRate 0.000400 Epoch: 17 Global Step: 357090 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:10:52,400-Speed 2495.86 samples/sec Loss 2.4546 LearningRate 0.000400 Epoch: 17 Global Step: 357100 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:11:00,604-Speed 2496.65 samples/sec Loss 2.4214 LearningRate 0.000400 Epoch: 17 Global Step: 357110 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:11:08,814-Speed 
2494.99 samples/sec Loss 2.4341 LearningRate 0.000400 Epoch: 17 Global Step: 357120 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:11:16,971-Speed 2511.26 samples/sec Loss 2.4717 LearningRate 0.000400 Epoch: 17 Global Step: 357130 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:11:25,178-Speed 2495.72 samples/sec Loss 2.4103 LearningRate 0.000400 Epoch: 17 Global Step: 357140 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:11:33,382-Speed 2496.83 samples/sec Loss 2.4320 LearningRate 0.000400 Epoch: 17 Global Step: 357150 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:11:41,592-Speed 2495.26 samples/sec Loss 2.3879 LearningRate 0.000400 Epoch: 17 Global Step: 357160 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:11:49,795-Speed 2496.71 samples/sec Loss 2.4229 LearningRate 0.000400 Epoch: 17 Global Step: 357170 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:11:58,012-Speed 2492.85 samples/sec Loss 2.4556 LearningRate 0.000400 Epoch: 17 Global Step: 357180 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:12:06,164-Speed 2512.76 samples/sec Loss 2.4926 LearningRate 0.000400 Epoch: 17 Global Step: 357190 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:12:14,373-Speed 2495.22 samples/sec Loss 2.4037 LearningRate 0.000400 Epoch: 17 Global Step: 357200 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:12:22,580-Speed 2495.77 samples/sec Loss 2.4378 LearningRate 0.000400 Epoch: 17 Global Step: 357210 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:12:30,785-Speed 2496.42 samples/sec Loss 2.4441 LearningRate 0.000400 Epoch: 17 Global Step: 357220 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:12:38,992-Speed 2495.96 samples/sec Loss 2.4395 LearningRate 0.000400 Epoch: 17 Global Step: 357230 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 
00:12:47,208-Speed 2492.84 samples/sec Loss 2.4417 LearningRate 0.000400 Epoch: 17 Global Step: 357240 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:12:55,358-Speed 2513.25 samples/sec Loss 2.4549 LearningRate 0.000400 Epoch: 17 Global Step: 357250 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:13:03,566-Speed 2495.44 samples/sec Loss 2.3932 LearningRate 0.000400 Epoch: 17 Global Step: 357260 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:13:11,767-Speed 2497.78 samples/sec Loss 2.4127 LearningRate 0.000400 Epoch: 17 Global Step: 357270 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:13:19,972-Speed 2496.27 samples/sec Loss 2.4096 LearningRate 0.000400 Epoch: 17 Global Step: 357280 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:13:28,177-Speed 2496.45 samples/sec Loss 2.3999 LearningRate 0.000400 Epoch: 17 Global Step: 357290 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:13:36,381-Speed 2496.54 samples/sec Loss 2.3988 LearningRate 0.000400 Epoch: 17 Global Step: 357300 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:13:44,533-Speed 2512.89 samples/sec Loss 2.3895 LearningRate 0.000400 Epoch: 17 Global Step: 357310 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:13:52,694-Speed 2509.95 samples/sec Loss 2.3741 LearningRate 0.000400 Epoch: 17 Global Step: 357320 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:14:00,898-Speed 2496.42 samples/sec Loss 2.4243 LearningRate 0.000400 Epoch: 17 Global Step: 357330 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:14:09,107-Speed 2495.38 samples/sec Loss 2.4284 LearningRate 0.000400 Epoch: 17 Global Step: 357340 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:14:17,312-Speed 2496.40 samples/sec Loss 2.5275 LearningRate 0.000400 Epoch: 17 Global Step: 357350 Fp16 Grad Scale: 16384 Required: 108 hours Training: 
2022-07-09 00:14:25,515-Speed 2496.95 samples/sec Loss 2.4889 LearningRate 0.000400 Epoch: 17 Global Step: 357360 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:14:33,665-Speed 2513.92 samples/sec Loss 2.4039 LearningRate 0.000400 Epoch: 17 Global Step: 357370 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:14:41,871-Speed 2496.10 samples/sec Loss 2.4675 LearningRate 0.000400 Epoch: 17 Global Step: 357380 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:14:50,078-Speed 2495.65 samples/sec Loss 2.4680 LearningRate 0.000400 Epoch: 17 Global Step: 357390 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:14:58,291-Speed 2493.99 samples/sec Loss 2.4197 LearningRate 0.000400 Epoch: 17 Global Step: 357400 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:15:06,491-Speed 2498.13 samples/sec Loss 2.4214 LearningRate 0.000400 Epoch: 17 Global Step: 357410 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:15:14,692-Speed 2497.60 samples/sec Loss 2.5071 LearningRate 0.000400 Epoch: 17 Global Step: 357420 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:15:22,846-Speed 2512.03 samples/sec Loss 2.4929 LearningRate 0.000400 Epoch: 17 Global Step: 357430 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:15:31,048-Speed 2497.21 samples/sec Loss 2.4615 LearningRate 0.000400 Epoch: 17 Global Step: 357440 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:15:39,251-Speed 2497.21 samples/sec Loss 2.4003 LearningRate 0.000400 Epoch: 17 Global Step: 357450 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:15:47,452-Speed 2497.65 samples/sec Loss 2.4859 LearningRate 0.000400 Epoch: 17 Global Step: 357460 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:15:55,655-Speed 2496.88 samples/sec Loss 2.4014 LearningRate 0.000400 Epoch: 17 Global Step: 357470 Fp16 Grad Scale: 16384 Required: 108 hours 
Training: 2022-07-09 00:16:03,858-Speed 2497.08 samples/sec Loss 2.4261 LearningRate 0.000400 Epoch: 17 Global Step: 357480 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:16:12,010-Speed 2512.69 samples/sec Loss 2.4407 LearningRate 0.000400 Epoch: 17 Global Step: 357490 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:16:20,217-Speed 2496.06 samples/sec Loss 2.4559 LearningRate 0.000400 Epoch: 17 Global Step: 357500 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:16:28,425-Speed 2495.41 samples/sec Loss 2.4257 LearningRate 0.000400 Epoch: 17 Global Step: 357510 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:16:36,626-Speed 2497.43 samples/sec Loss 2.4596 LearningRate 0.000400 Epoch: 17 Global Step: 357520 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:16:44,832-Speed 2496.35 samples/sec Loss 2.4249 LearningRate 0.000400 Epoch: 17 Global Step: 357530 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:16:53,037-Speed 2496.23 samples/sec Loss 2.4397 LearningRate 0.000400 Epoch: 17 Global Step: 357540 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:17:01,191-Speed 2512.17 samples/sec Loss 2.4156 LearningRate 0.000400 Epoch: 17 Global Step: 357550 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:17:09,394-Speed 2497.05 samples/sec Loss 2.4488 LearningRate 0.000400 Epoch: 17 Global Step: 357560 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:17:17,603-Speed 2495.19 samples/sec Loss 2.4524 LearningRate 0.000400 Epoch: 17 Global Step: 357570 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:17:25,815-Speed 2494.40 samples/sec Loss 2.4086 LearningRate 0.000400 Epoch: 17 Global Step: 357580 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:17:34,021-Speed 2496.21 samples/sec Loss 2.4282 LearningRate 0.000400 Epoch: 17 Global Step: 357590 Fp16 Grad Scale: 16384 Required: 108 
hours Training: 2022-07-09 00:17:42,228-Speed 2495.90 samples/sec Loss 2.4338 LearningRate 0.000400 Epoch: 17 Global Step: 357600 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:17:50,377-Speed 2513.81 samples/sec Loss 2.3681 LearningRate 0.000400 Epoch: 17 Global Step: 357610 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:17:58,592-Speed 2493.40 samples/sec Loss 2.4166 LearningRate 0.000400 Epoch: 17 Global Step: 357620 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:18:06,791-Speed 2498.30 samples/sec Loss 2.3764 LearningRate 0.000400 Epoch: 17 Global Step: 357630 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:18:14,992-Speed 2497.54 samples/sec Loss 2.4392 LearningRate 0.000400 Epoch: 17 Global Step: 357640 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:18:23,197-Speed 2496.38 samples/sec Loss 2.4200 LearningRate 0.000400 Epoch: 17 Global Step: 357650 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:18:31,407-Speed 2495.03 samples/sec Loss 2.4389 LearningRate 0.000400 Epoch: 17 Global Step: 357660 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:18:39,554-Speed 2514.22 samples/sec Loss 2.4366 LearningRate 0.000399 Epoch: 17 Global Step: 357670 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:18:47,755-Speed 2497.52 samples/sec Loss 2.4107 LearningRate 0.000399 Epoch: 17 Global Step: 357680 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:18:55,959-Speed 2496.87 samples/sec Loss 2.4625 LearningRate 0.000399 Epoch: 17 Global Step: 357690 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:19:04,162-Speed 2496.72 samples/sec Loss 2.3834 LearningRate 0.000399 Epoch: 17 Global Step: 357700 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:19:12,364-Speed 2497.38 samples/sec Loss 2.4695 LearningRate 0.000399 Epoch: 17 Global Step: 357710 Fp16 Grad Scale: 16384 Required: 
108 hours Training: 2022-07-09 00:19:20,567-Speed 2496.93 samples/sec Loss 2.4242 LearningRate 0.000399 Epoch: 17 Global Step: 357720 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:19:28,720-Speed 2512.43 samples/sec Loss 2.4320 LearningRate 0.000399 Epoch: 17 Global Step: 357730 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:19:36,927-Speed 2496.03 samples/sec Loss 2.4592 LearningRate 0.000399 Epoch: 17 Global Step: 357740 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:19:45,134-Speed 2495.82 samples/sec Loss 2.4659 LearningRate 0.000399 Epoch: 17 Global Step: 357750 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:19:53,376-Speed 2484.92 samples/sec Loss 2.3959 LearningRate 0.000399 Epoch: 17 Global Step: 357760 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:20:01,584-Speed 2496.07 samples/sec Loss 2.4400 LearningRate 0.000399 Epoch: 17 Global Step: 357770 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:20:09,787-Speed 2496.93 samples/sec Loss 2.4689 LearningRate 0.000399 Epoch: 17 Global Step: 357780 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:20:17,951-Speed 2508.66 samples/sec Loss 2.4246 LearningRate 0.000399 Epoch: 17 Global Step: 357790 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:20:26,150-Speed 2498.37 samples/sec Loss 2.4301 LearningRate 0.000399 Epoch: 17 Global Step: 357800 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:20:34,437-Speed 2472.01 samples/sec Loss 2.4089 LearningRate 0.000399 Epoch: 17 Global Step: 357810 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:20:42,635-Speed 2498.53 samples/sec Loss 2.4799 LearningRate 0.000399 Epoch: 17 Global Step: 357820 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:20:50,846-Speed 2494.65 samples/sec Loss 2.4648 LearningRate 0.000399 Epoch: 17 Global Step: 357830 Fp16 Grad Scale: 16384 
Required: 108 hours Training: 2022-07-09 00:20:59,053-Speed 2496.10 samples/sec Loss 2.4148 LearningRate 0.000399 Epoch: 17 Global Step: 357840 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:21:07,211-Speed 2510.79 samples/sec Loss 2.3944 LearningRate 0.000399 Epoch: 17 Global Step: 357850 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:21:15,412-Speed 2497.44 samples/sec Loss 2.4585 LearningRate 0.000399 Epoch: 17 Global Step: 357860 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:21:23,623-Speed 2494.61 samples/sec Loss 2.4450 LearningRate 0.000399 Epoch: 17 Global Step: 357870 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:21:31,826-Speed 2497.42 samples/sec Loss 2.4217 LearningRate 0.000399 Epoch: 17 Global Step: 357880 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:21:40,029-Speed 2496.83 samples/sec Loss 2.4185 LearningRate 0.000399 Epoch: 17 Global Step: 357890 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:21:48,231-Speed 2497.23 samples/sec Loss 2.4064 LearningRate 0.000399 Epoch: 17 Global Step: 357900 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:21:56,379-Speed 2514.09 samples/sec Loss 2.4633 LearningRate 0.000399 Epoch: 17 Global Step: 357910 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:22:04,581-Speed 2497.30 samples/sec Loss 2.4228 LearningRate 0.000399 Epoch: 17 Global Step: 357920 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:22:12,799-Speed 2492.39 samples/sec Loss 2.4184 LearningRate 0.000399 Epoch: 17 Global Step: 357930 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:22:21,003-Speed 2496.96 samples/sec Loss 2.4814 LearningRate 0.000399 Epoch: 17 Global Step: 357940 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:22:29,211-Speed 2495.70 samples/sec Loss 2.4139 LearningRate 0.000399 Epoch: 17 Global Step: 357950 Fp16 Grad Scale: 
16384 Required: 108 hours Training: 2022-07-09 00:22:37,414-Speed 2496.87 samples/sec Loss 2.4680 LearningRate 0.000399 Epoch: 17 Global Step: 357960 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:22:45,561-Speed 2513.97 samples/sec Loss 2.4622 LearningRate 0.000399 Epoch: 17 Global Step: 357970 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:22:53,771-Speed 2495.13 samples/sec Loss 2.4019 LearningRate 0.000399 Epoch: 17 Global Step: 357980 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:23:01,971-Speed 2498.00 samples/sec Loss 2.4391 LearningRate 0.000399 Epoch: 17 Global Step: 357990 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:23:10,175-Speed 2496.82 samples/sec Loss 2.3995 LearningRate 0.000399 Epoch: 17 Global Step: 358000 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:23:18,377-Speed 2497.02 samples/sec Loss 2.3796 LearningRate 0.000399 Epoch: 17 Global Step: 358010 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:23:26,585-Speed 2495.66 samples/sec Loss 2.4515 LearningRate 0.000399 Epoch: 17 Global Step: 358020 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:23:34,762-Speed 2505.07 samples/sec Loss 2.4973 LearningRate 0.000399 Epoch: 17 Global Step: 358030 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:23:43,029-Speed 2477.59 samples/sec Loss 2.4503 LearningRate 0.000399 Epoch: 17 Global Step: 358040 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:23:51,231-Speed 2497.32 samples/sec Loss 2.4557 LearningRate 0.000399 Epoch: 17 Global Step: 358050 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:23:59,433-Speed 2497.10 samples/sec Loss 2.4454 LearningRate 0.000399 Epoch: 17 Global Step: 358060 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:24:07,635-Speed 2497.45 samples/sec Loss 2.4386 LearningRate 0.000399 Epoch: 17 Global Step: 358070 Fp16 Grad 
Scale: 16384 Required: 108 hours Training: 2022-07-09 00:24:15,838-Speed 2496.87 samples/sec Loss 2.4681 LearningRate 0.000399 Epoch: 17 Global Step: 358080 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:24:23,985-Speed 2514.55 samples/sec Loss 2.4721 LearningRate 0.000399 Epoch: 17 Global Step: 358090 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:24:32,188-Speed 2500.25 samples/sec Loss 2.4124 LearningRate 0.000399 Epoch: 17 Global Step: 358100 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:24:40,402-Speed 2493.76 samples/sec Loss 2.4020 LearningRate 0.000399 Epoch: 17 Global Step: 358110 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:24:48,601-Speed 2498.26 samples/sec Loss 2.3985 LearningRate 0.000399 Epoch: 17 Global Step: 358120 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:24:56,800-Speed 2498.35 samples/sec Loss 2.3901 LearningRate 0.000399 Epoch: 17 Global Step: 358130 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:25:05,003-Speed 2496.87 samples/sec Loss 2.4128 LearningRate 0.000399 Epoch: 17 Global Step: 358140 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:25:13,158-Speed 2512.03 samples/sec Loss 2.3826 LearningRate 0.000399 Epoch: 17 Global Step: 358150 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:25:21,361-Speed 2496.96 samples/sec Loss 2.4347 LearningRate 0.000399 Epoch: 17 Global Step: 358160 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:25:29,563-Speed 2497.51 samples/sec Loss 2.3852 LearningRate 0.000399 Epoch: 17 Global Step: 358170 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:25:37,766-Speed 2497.61 samples/sec Loss 2.3967 LearningRate 0.000399 Epoch: 17 Global Step: 358180 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:25:45,971-Speed 2496.02 samples/sec Loss 2.3800 LearningRate 0.000399 Epoch: 17 Global Step: 358190 Fp16 
Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:25:54,178-Speed 2495.83 samples/sec Loss 2.4043 LearningRate 0.000399 Epoch: 17 Global Step: 358200 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:26:02,327-Speed 2513.69 samples/sec Loss 2.3449 LearningRate 0.000399 Epoch: 17 Global Step: 358210 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:26:10,527-Speed 2497.90 samples/sec Loss 2.4061 LearningRate 0.000399 Epoch: 17 Global Step: 358220 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:26:18,733-Speed 2496.32 samples/sec Loss 2.3380 LearningRate 0.000399 Epoch: 17 Global Step: 358230 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:26:26,966-Speed 2488.01 samples/sec Loss 2.4363 LearningRate 0.000399 Epoch: 17 Global Step: 358240 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:26:35,171-Speed 2496.38 samples/sec Loss 2.4704 LearningRate 0.000399 Epoch: 17 Global Step: 358250 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:26:43,373-Speed 2497.22 samples/sec Loss 2.4349 LearningRate 0.000398 Epoch: 17 Global Step: 358260 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:26:51,525-Speed 2512.75 samples/sec Loss 2.4047 LearningRate 0.000398 Epoch: 17 Global Step: 358270 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:26:59,723-Speed 2498.85 samples/sec Loss 2.3985 LearningRate 0.000398 Epoch: 17 Global Step: 358280 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:27:07,928-Speed 2496.49 samples/sec Loss 2.4070 LearningRate 0.000398 Epoch: 17 Global Step: 358290 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:27:16,131-Speed 2497.03 samples/sec Loss 2.4148 LearningRate 0.000398 Epoch: 17 Global Step: 358300 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:27:24,380-Speed 2483.13 samples/sec Loss 2.3863 LearningRate 0.000398 Epoch: 17 Global Step: 358310 
Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:27:32,658-Speed 2474.91 samples/sec Loss 2.3677 LearningRate 0.000398 Epoch: 17 Global Step: 358320 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:27:40,885-Speed 2489.97 samples/sec Loss 2.4106 LearningRate 0.000398 Epoch: 17 Global Step: 358330 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:27:49,166-Speed 2474.09 samples/sec Loss 2.3871 LearningRate 0.000398 Epoch: 17 Global Step: 358340 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:27:57,456-Speed 2471.08 samples/sec Loss 2.4007 LearningRate 0.000398 Epoch: 17 Global Step: 358350 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:28:05,743-Speed 2471.65 samples/sec Loss 2.4395 LearningRate 0.000398 Epoch: 17 Global Step: 358360 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:28:14,047-Speed 2466.68 samples/sec Loss 2.3851 LearningRate 0.000398 Epoch: 17 Global Step: 358370 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:28:22,318-Speed 2476.79 samples/sec Loss 2.4313 LearningRate 0.000398 Epoch: 17 Global Step: 358380 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:28:30,533-Speed 2493.27 samples/sec Loss 2.4104 LearningRate 0.000398 Epoch: 17 Global Step: 358390 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:28:38,791-Speed 2481.66 samples/sec Loss 2.4000 LearningRate 0.000398 Epoch: 17 Global Step: 358400 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:28:47,013-Speed 2491.16 samples/sec Loss 2.4769 LearningRate 0.000398 Epoch: 17 Global Step: 358410 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:28:55,231-Speed 2492.63 samples/sec Loss 2.4588 LearningRate 0.000398 Epoch: 17 Global Step: 358420 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:29:03,444-Speed 2493.91 samples/sec Loss 2.4209 LearningRate 0.000398 Epoch: 17 Global Step: 
358430 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:29:11,655-Speed 2494.65 samples/sec Loss 2.4370 LearningRate 0.000398 Epoch: 17 Global Step: 358440 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:29:19,817-Speed 2509.59 samples/sec Loss 2.4096 LearningRate 0.000398 Epoch: 17 Global Step: 358450 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:29:28,023-Speed 2496.13 samples/sec Loss 2.3765 LearningRate 0.000398 Epoch: 17 Global Step: 358460 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:29:36,230-Speed 2495.86 samples/sec Loss 2.4333 LearningRate 0.000398 Epoch: 17 Global Step: 358470 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:29:44,445-Speed 2493.38 samples/sec Loss 2.4389 LearningRate 0.000398 Epoch: 17 Global Step: 358480 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:29:52,649-Speed 2496.67 samples/sec Loss 2.4102 LearningRate 0.000398 Epoch: 17 Global Step: 358490 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:30:00,852-Speed 2497.44 samples/sec Loss 2.3945 LearningRate 0.000398 Epoch: 17 Global Step: 358500 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:30:09,004-Speed 2512.59 samples/sec Loss 2.4330 LearningRate 0.000398 Epoch: 17 Global Step: 358510 Fp16 Grad Scale: 16384 Required: 108 hours Training: 2022-07-09 00:30:17,207-Speed 2497.00 samples/sec Loss 2.4364 LearningRate 0.000398 Epoch: 17 Global Step: 358520 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:30:25,413-Speed 2496.37 samples/sec Loss 2.4642 LearningRate 0.000398 Epoch: 17 Global Step: 358530 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:30:33,631-Speed 2492.60 samples/sec Loss 2.4222 LearningRate 0.000398 Epoch: 17 Global Step: 358540 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:30:41,846-Speed 2493.26 samples/sec Loss 2.4303 LearningRate 0.000398 Epoch: 17 Global 
Step: 358550 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:30:50,052-Speed 2496.39 samples/sec Loss 2.4021 LearningRate 0.000398 Epoch: 17 Global Step: 358560 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:30:58,208-Speed 2511.31 samples/sec Loss 2.4555 LearningRate 0.000398 Epoch: 17 Global Step: 358570 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:31:06,413-Speed 2496.41 samples/sec Loss 2.4831 LearningRate 0.000398 Epoch: 17 Global Step: 358580 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:31:14,614-Speed 2497.91 samples/sec Loss 2.4574 LearningRate 0.000398 Epoch: 17 Global Step: 358590 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:31:22,843-Speed 2489.22 samples/sec Loss 2.4092 LearningRate 0.000398 Epoch: 17 Global Step: 358600 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:31:31,048-Speed 2496.58 samples/sec Loss 2.4530 LearningRate 0.000398 Epoch: 17 Global Step: 358610 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:31:39,250-Speed 2497.11 samples/sec Loss 2.4761 LearningRate 0.000398 Epoch: 17 Global Step: 358620 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:31:47,400-Speed 2513.39 samples/sec Loss 2.4602 LearningRate 0.000398 Epoch: 17 Global Step: 358630 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:31:55,608-Speed 2495.46 samples/sec Loss 2.4306 LearningRate 0.000398 Epoch: 17 Global Step: 358640 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:32:03,809-Speed 2497.69 samples/sec Loss 2.4204 LearningRate 0.000398 Epoch: 17 Global Step: 358650 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:32:12,013-Speed 2496.67 samples/sec Loss 2.4128 LearningRate 0.000398 Epoch: 17 Global Step: 358660 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:32:20,229-Speed 2493.26 samples/sec Loss 2.4248 LearningRate 0.000398 Epoch: 17 
Global Step: 358670 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:32:28,434-Speed 2496.46 samples/sec Loss 2.4053 LearningRate 0.000398 Epoch: 17 Global Step: 358680 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:32:36,596-Speed 2509.47 samples/sec Loss 2.4468 LearningRate 0.000398 Epoch: 17 Global Step: 358690 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:32:44,810-Speed 2493.81 samples/sec Loss 2.4289 LearningRate 0.000398 Epoch: 17 Global Step: 358700 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:32:53,013-Speed 2497.04 samples/sec Loss 2.4322 LearningRate 0.000398 Epoch: 17 Global Step: 358710 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:33:01,230-Speed 2492.72 samples/sec Loss 2.4533 LearningRate 0.000398 Epoch: 17 Global Step: 358720 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:33:09,445-Speed 2493.48 samples/sec Loss 2.4291 LearningRate 0.000398 Epoch: 17 Global Step: 358730 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:33:17,646-Speed 2497.76 samples/sec Loss 2.4359 LearningRate 0.000398 Epoch: 17 Global Step: 358740 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:33:25,794-Speed 2513.96 samples/sec Loss 2.3908 LearningRate 0.000398 Epoch: 17 Global Step: 358750 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:33:34,002-Speed 2495.44 samples/sec Loss 2.4820 LearningRate 0.000398 Epoch: 17 Global Step: 358760 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:33:42,206-Speed 2496.83 samples/sec Loss 2.4427 LearningRate 0.000398 Epoch: 17 Global Step: 358770 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:33:50,408-Speed 2497.29 samples/sec Loss 2.4508 LearningRate 0.000398 Epoch: 17 Global Step: 358780 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:33:58,620-Speed 2494.61 samples/sec Loss 2.4533 LearningRate 0.000398 
Epoch: 17 Global Step: 358790 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:34:06,823-Speed 2496.79 samples/sec Loss 2.4133 LearningRate 0.000398 Epoch: 17 Global Step: 358800 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:34:14,977-Speed 2512.15 samples/sec Loss 2.4286 LearningRate 0.000398 Epoch: 17 Global Step: 358810 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:34:23,182-Speed 2496.37 samples/sec Loss 2.3683 LearningRate 0.000398 Epoch: 17 Global Step: 358820 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:34:31,384-Speed 2497.24 samples/sec Loss 2.3984 LearningRate 0.000398 Epoch: 17 Global Step: 358830 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:34:39,589-Speed 2496.62 samples/sec Loss 2.4210 LearningRate 0.000398 Epoch: 17 Global Step: 358840 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:34:47,788-Speed 2498.07 samples/sec Loss 2.3837 LearningRate 0.000397 Epoch: 17 Global Step: 358850 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:34:55,993-Speed 2496.36 samples/sec Loss 2.3572 LearningRate 0.000397 Epoch: 17 Global Step: 358860 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:35:04,143-Speed 2513.32 samples/sec Loss 2.3840 LearningRate 0.000397 Epoch: 17 Global Step: 358870 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:35:12,343-Speed 2498.11 samples/sec Loss 2.4042 LearningRate 0.000397 Epoch: 17 Global Step: 358880 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:35:20,545-Speed 2497.33 samples/sec Loss 2.4297 LearningRate 0.000397 Epoch: 17 Global Step: 358890 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:35:28,747-Speed 2497.44 samples/sec Loss 2.4339 LearningRate 0.000397 Epoch: 17 Global Step: 358900 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:35:36,948-Speed 2497.67 samples/sec Loss 2.4247 LearningRate 
0.000397 Epoch: 17 Global Step: 358910 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:35:45,154-Speed 2495.98 samples/sec Loss 2.4459 LearningRate 0.000397 Epoch: 17 Global Step: 358920 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:35:53,305-Speed 2513.04 samples/sec Loss 2.4469 LearningRate 0.000397 Epoch: 17 Global Step: 358930 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:36:01,510-Speed 2496.42 samples/sec Loss 2.4148 LearningRate 0.000397 Epoch: 17 Global Step: 358940 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:36:09,722-Speed 2494.29 samples/sec Loss 2.4498 LearningRate 0.000397 Epoch: 17 Global Step: 358950 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:36:17,921-Speed 2498.18 samples/sec Loss 2.3966 LearningRate 0.000397 Epoch: 17 Global Step: 358960 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:36:26,126-Speed 2496.53 samples/sec Loss 2.4332 LearningRate 0.000397 Epoch: 17 Global Step: 358970 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:36:34,328-Speed 2497.21 samples/sec Loss 2.4174 LearningRate 0.000397 Epoch: 17 Global Step: 358980 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:36:42,476-Speed 2514.00 samples/sec Loss 2.3210 LearningRate 0.000397 Epoch: 17 Global Step: 358990 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:36:50,682-Speed 2496.08 samples/sec Loss 2.3643 LearningRate 0.000397 Epoch: 17 Global Step: 359000 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:36:58,888-Speed 2496.22 samples/sec Loss 2.3745 LearningRate 0.000397 Epoch: 17 Global Step: 359010 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:37:07,092-Speed 2496.53 samples/sec Loss 2.3852 LearningRate 0.000397 Epoch: 17 Global Step: 359020 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:37:15,297-Speed 2496.51 samples/sec Loss 2.4203 
LearningRate 0.000397 Epoch: 17 Global Step: 359030 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:37:23,497-Speed 2497.75 samples/sec Loss 2.3936 LearningRate 0.000397 Epoch: 17 Global Step: 359040 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:37:31,653-Speed 2511.53 samples/sec Loss 2.4324 LearningRate 0.000397 Epoch: 17 Global Step: 359050 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:37:39,855-Speed 2497.36 samples/sec Loss 2.3998 LearningRate 0.000397 Epoch: 17 Global Step: 359060 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:37:48,057-Speed 2497.38 samples/sec Loss 2.3635 LearningRate 0.000397 Epoch: 17 Global Step: 359070 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:37:56,268-Speed 2494.61 samples/sec Loss 2.4144 LearningRate 0.000397 Epoch: 17 Global Step: 359080 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:38:04,470-Speed 2497.21 samples/sec Loss 2.4042 LearningRate 0.000397 Epoch: 17 Global Step: 359090 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:38:12,672-Speed 2497.33 samples/sec Loss 2.4269 LearningRate 0.000397 Epoch: 17 Global Step: 359100 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:38:20,834-Speed 2509.52 samples/sec Loss 2.4027 LearningRate 0.000397 Epoch: 17 Global Step: 359110 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:38:29,036-Speed 2497.61 samples/sec Loss 2.3823 LearningRate 0.000397 Epoch: 17 Global Step: 359120 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:38:37,242-Speed 2496.01 samples/sec Loss 2.4162 LearningRate 0.000397 Epoch: 17 Global Step: 359130 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:38:45,446-Speed 2496.74 samples/sec Loss 2.4039 LearningRate 0.000397 Epoch: 17 Global Step: 359140 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:38:53,647-Speed 2497.88 samples/sec Loss 
2.4106 LearningRate 0.000397 Epoch: 17 Global Step: 359150 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:39:01,849-Speed 2497.38 samples/sec Loss 2.4531 LearningRate 0.000397 Epoch: 17 Global Step: 359160 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:39:10,003-Speed 2512.05 samples/sec Loss 2.3495 LearningRate 0.000397 Epoch: 17 Global Step: 359170 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:39:18,207-Speed 2496.87 samples/sec Loss 2.4111 LearningRate 0.000397 Epoch: 17 Global Step: 359180 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:39:26,408-Speed 2497.71 samples/sec Loss 2.4383 LearningRate 0.000397 Epoch: 17 Global Step: 359190 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:39:34,608-Speed 2497.87 samples/sec Loss 2.4293 LearningRate 0.000397 Epoch: 17 Global Step: 359200 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:39:42,823-Speed 2493.39 samples/sec Loss 2.4020 LearningRate 0.000397 Epoch: 17 Global Step: 359210 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:39:51,028-Speed 2496.46 samples/sec Loss 2.3860 LearningRate 0.000397 Epoch: 17 Global Step: 359220 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:39:59,180-Speed 2512.54 samples/sec Loss 2.3856 LearningRate 0.000397 Epoch: 17 Global Step: 359230 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:40:07,385-Speed 2496.40 samples/sec Loss 2.3881 LearningRate 0.000397 Epoch: 17 Global Step: 359240 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:40:15,590-Speed 2496.64 samples/sec Loss 2.4571 LearningRate 0.000397 Epoch: 17 Global Step: 359250 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:40:23,798-Speed 2495.25 samples/sec Loss 2.4145 LearningRate 0.000397 Epoch: 17 Global Step: 359260 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:40:32,001-Speed 2497.16 samples/sec 
Loss 2.3769 LearningRate 0.000397 Epoch: 17 Global Step: 359270 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:40:40,206-Speed 2496.87 samples/sec Loss 2.4605 LearningRate 0.000397 Epoch: 17 Global Step: 359280 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:40:48,357-Speed 2512.86 samples/sec Loss 2.4001 LearningRate 0.000397 Epoch: 17 Global Step: 359290 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:40:56,561-Speed 2496.65 samples/sec Loss 2.3896 LearningRate 0.000397 Epoch: 17 Global Step: 359300 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:41:04,767-Speed 2495.99 samples/sec Loss 2.4636 LearningRate 0.000397 Epoch: 17 Global Step: 359310 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:41:12,970-Speed 2497.38 samples/sec Loss 2.4263 LearningRate 0.000397 Epoch: 17 Global Step: 359320 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:41:21,170-Speed 2497.99 samples/sec Loss 2.4274 LearningRate 0.000397 Epoch: 17 Global Step: 359330 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:41:29,373-Speed 2496.90 samples/sec Loss 2.4290 LearningRate 0.000397 Epoch: 17 Global Step: 359340 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:41:37,522-Speed 2513.72 samples/sec Loss 2.3999 LearningRate 0.000397 Epoch: 17 Global Step: 359350 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:41:45,726-Speed 2496.50 samples/sec Loss 2.4072 LearningRate 0.000397 Epoch: 17 Global Step: 359360 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:41:53,930-Speed 2496.83 samples/sec Loss 2.3945 LearningRate 0.000397 Epoch: 17 Global Step: 359370 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:42:02,129-Speed 2498.11 samples/sec Loss 2.4201 LearningRate 0.000397 Epoch: 17 Global Step: 359380 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:42:10,334-Speed 2496.67 
samples/sec Loss 2.3929 LearningRate 0.000397 Epoch: 17 Global Step: 359390 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:42:18,542-Speed 2495.40 samples/sec Loss 2.4066 LearningRate 0.000397 Epoch: 17 Global Step: 359400 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:42:26,690-Speed 2513.66 samples/sec Loss 2.3328 LearningRate 0.000397 Epoch: 17 Global Step: 359410 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:42:34,894-Speed 2496.92 samples/sec Loss 2.4077 LearningRate 0.000397 Epoch: 17 Global Step: 359420 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:42:43,097-Speed 2496.96 samples/sec Loss 2.4100 LearningRate 0.000397 Epoch: 17 Global Step: 359430 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:42:51,301-Speed 2496.69 samples/sec Loss 2.4386 LearningRate 0.000396 Epoch: 17 Global Step: 359440 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:42:59,502-Speed 2497.61 samples/sec Loss 2.3995 LearningRate 0.000396 Epoch: 17 Global Step: 359450 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:43:07,705-Speed 2497.15 samples/sec Loss 2.3968 LearningRate 0.000396 Epoch: 17 Global Step: 359460 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:43:15,855-Speed 2513.59 samples/sec Loss 2.3917 LearningRate 0.000396 Epoch: 17 Global Step: 359470 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:43:24,057-Speed 2497.23 samples/sec Loss 2.3548 LearningRate 0.000396 Epoch: 17 Global Step: 359480 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:43:32,260-Speed 2496.90 samples/sec Loss 2.3962 LearningRate 0.000396 Epoch: 17 Global Step: 359490 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:43:40,466-Speed 2496.27 samples/sec Loss 2.3727 LearningRate 0.000396 Epoch: 17 Global Step: 359500 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:43:48,669-Speed 
2496.85 samples/sec Loss 2.3789 LearningRate 0.000396 Epoch: 17 Global Step: 359510 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:43:56,875-Speed 2496.37 samples/sec Loss 2.3773 LearningRate 0.000396 Epoch: 17 Global Step: 359520 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:44:05,023-Speed 2513.93 samples/sec Loss 2.4274 LearningRate 0.000396 Epoch: 17 Global Step: 359530 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:44:13,239-Speed 2492.84 samples/sec Loss 2.3860 LearningRate 0.000396 Epoch: 17 Global Step: 359540 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:44:21,442-Speed 2497.18 samples/sec Loss 2.4126 LearningRate 0.000396 Epoch: 17 Global Step: 359550 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:44:29,651-Speed 2495.34 samples/sec Loss 2.4096 LearningRate 0.000396 Epoch: 17 Global Step: 359560 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:44:37,853-Speed 2497.63 samples/sec Loss 2.4448 LearningRate 0.000396 Epoch: 17 Global Step: 359570 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:44:46,057-Speed 2496.87 samples/sec Loss 2.4649 LearningRate 0.000396 Epoch: 17 Global Step: 359580 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:44:54,209-Speed 2512.41 samples/sec Loss 2.4136 LearningRate 0.000396 Epoch: 17 Global Step: 359590 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:45:02,414-Speed 2496.71 samples/sec Loss 2.4142 LearningRate 0.000396 Epoch: 17 Global Step: 359600 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:45:10,631-Speed 2492.66 samples/sec Loss 2.4280 LearningRate 0.000396 Epoch: 17 Global Step: 359610 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:45:18,834-Speed 2497.07 samples/sec Loss 2.4407 LearningRate 0.000396 Epoch: 17 Global Step: 359620 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 
00:45:27,040-Speed 2496.25 samples/sec Loss 2.4986 LearningRate 0.000396 Epoch: 17 Global Step: 359630 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:45:35,244-Speed 2496.66 samples/sec Loss 2.4548 LearningRate 0.000396 Epoch: 17 Global Step: 359640 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:45:43,393-Speed 2513.49 samples/sec Loss 2.4131 LearningRate 0.000396 Epoch: 17 Global Step: 359650 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:45:51,596-Speed 2497.03 samples/sec Loss 2.3996 LearningRate 0.000396 Epoch: 17 Global Step: 359660 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:45:59,802-Speed 2495.98 samples/sec Loss 2.5209 LearningRate 0.000396 Epoch: 17 Global Step: 359670 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:46:08,010-Speed 2495.66 samples/sec Loss 2.4472 LearningRate 0.000396 Epoch: 17 Global Step: 359680 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:46:16,214-Speed 2496.52 samples/sec Loss 2.3688 LearningRate 0.000396 Epoch: 17 Global Step: 359690 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:46:24,416-Speed 2497.42 samples/sec Loss 2.4348 LearningRate 0.000396 Epoch: 17 Global Step: 359700 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:46:32,566-Speed 2513.29 samples/sec Loss 2.4223 LearningRate 0.000396 Epoch: 17 Global Step: 359710 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:46:40,772-Speed 2496.14 samples/sec Loss 2.4214 LearningRate 0.000396 Epoch: 17 Global Step: 359720 Fp16 Grad Scale: 65536 Required: 108 hours Training: 2022-07-09 00:46:48,974-Speed 2497.55 samples/sec Loss 2.4060 LearningRate 0.000396 Epoch: 17 Global Step: 359730 Fp16 Grad Scale: 65536 Required: 108 hours Training: 2022-07-09 00:46:57,181-Speed 2496.05 samples/sec Loss 2.4412 LearningRate 0.000396 Epoch: 17 Global Step: 359740 Fp16 Grad Scale: 65536 Required: 108 hours Training: 
2022-07-09 00:47:05,383-Speed 2497.12 samples/sec Loss 2.3995 LearningRate 0.000396 Epoch: 17 Global Step: 359750 Fp16 Grad Scale: 65536 Required: 108 hours Training: 2022-07-09 00:47:13,588-Speed 2496.38 samples/sec Loss 2.3873 LearningRate 0.000396 Epoch: 17 Global Step: 359760 Fp16 Grad Scale: 65536 Required: 108 hours Training: 2022-07-09 00:47:21,739-Speed 2513.17 samples/sec Loss 2.4281 LearningRate 0.000396 Epoch: 17 Global Step: 359770 Fp16 Grad Scale: 65536 Required: 108 hours Training: 2022-07-09 00:47:29,939-Speed 2497.73 samples/sec Loss 2.4179 LearningRate 0.000396 Epoch: 17 Global Step: 359780 Fp16 Grad Scale: 65536 Required: 108 hours Training: 2022-07-09 00:47:38,100-Speed 2509.80 samples/sec Loss 2.3975 LearningRate 0.000396 Epoch: 17 Global Step: 359790 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:47:46,306-Speed 2496.17 samples/sec Loss 2.4808 LearningRate 0.000396 Epoch: 17 Global Step: 359800 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:47:54,513-Speed 2496.14 samples/sec Loss 2.4631 LearningRate 0.000396 Epoch: 17 Global Step: 359810 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:48:02,718-Speed 2496.18 samples/sec Loss 2.4166 LearningRate 0.000396 Epoch: 17 Global Step: 359820 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:48:10,868-Speed 2513.42 samples/sec Loss 2.4370 LearningRate 0.000396 Epoch: 17 Global Step: 359830 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:48:19,070-Speed 2497.32 samples/sec Loss 2.4450 LearningRate 0.000396 Epoch: 17 Global Step: 359840 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:48:27,277-Speed 2495.77 samples/sec Loss 2.4744 LearningRate 0.000396 Epoch: 17 Global Step: 359850 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:48:35,489-Speed 2494.41 samples/sec Loss 2.4437 LearningRate 0.000396 Epoch: 17 Global Step: 359860 Fp16 Grad Scale: 32768 Required: 108 hours 
Training: 2022-07-09 00:48:43,689-Speed 2498.07 samples/sec Loss 2.3952 LearningRate 0.000396 Epoch: 17 Global Step: 359870 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:48:51,894-Speed 2496.35 samples/sec Loss 2.4483 LearningRate 0.000396 Epoch: 17 Global Step: 359880 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:49:00,042-Speed 2513.80 samples/sec Loss 2.4350 LearningRate 0.000396 Epoch: 17 Global Step: 359890 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:49:08,245-Speed 2497.17 samples/sec Loss 2.4562 LearningRate 0.000396 Epoch: 17 Global Step: 359900 Fp16 Grad Scale: 32768 Required: 108 hours Training: 2022-07-09 00:49:16,450-Speed 2496.17 samples/sec Loss 2.3903 LearningRate 0.000396 Epoch: 17 Global Step: 359910 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:49:24,652-Speed 2497.45 samples/sec Loss 2.3938 LearningRate 0.000396 Epoch: 17 Global Step: 359920 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:49:32,854-Speed 2497.31 samples/sec Loss 2.3447 LearningRate 0.000396 Epoch: 17 Global Step: 359930 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:49:41,063-Speed 2495.24 samples/sec Loss 2.3795 LearningRate 0.000396 Epoch: 17 Global Step: 359940 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:49:49,234-Speed 2506.77 samples/sec Loss 2.4111 LearningRate 0.000396 Epoch: 17 Global Step: 359950 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:49:57,440-Speed 2496.24 samples/sec Loss 2.4166 LearningRate 0.000396 Epoch: 17 Global Step: 359960 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:50:05,645-Speed 2496.23 samples/sec Loss 2.4260 LearningRate 0.000396 Epoch: 17 Global Step: 359970 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:50:13,848-Speed 2497.05 samples/sec Loss 2.4406 LearningRate 0.000396 Epoch: 17 Global Step: 359980 Fp16 Grad Scale: 32768 Required: 107 
hours Training: 2022-07-09 00:50:22,054-Speed 2496.14 samples/sec Loss 2.4546 LearningRate 0.000396 Epoch: 17 Global Step: 359990 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:50:30,254-Speed 2497.90 samples/sec Loss 2.4290 LearningRate 0.000396 Epoch: 17 Global Step: 360000 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:50:38,407-Speed 2512.52 samples/sec Loss 2.4382 LearningRate 0.000396 Epoch: 17 Global Step: 360010 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:50:46,608-Speed 2497.76 samples/sec Loss 2.4107 LearningRate 0.000396 Epoch: 17 Global Step: 360020 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:50:54,811-Speed 2496.92 samples/sec Loss 2.4018 LearningRate 0.000396 Epoch: 17 Global Step: 360030 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:51:03,013-Speed 2497.28 samples/sec Loss 2.4172 LearningRate 0.000395 Epoch: 17 Global Step: 360040 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:51:11,219-Speed 2496.06 samples/sec Loss 2.3846 LearningRate 0.000395 Epoch: 17 Global Step: 360050 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:51:19,425-Speed 2496.14 samples/sec Loss 2.4387 LearningRate 0.000395 Epoch: 17 Global Step: 360060 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:51:27,576-Speed 2513.03 samples/sec Loss 2.3940 LearningRate 0.000395 Epoch: 17 Global Step: 360070 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:51:35,781-Speed 2496.38 samples/sec Loss 2.3681 LearningRate 0.000395 Epoch: 17 Global Step: 360080 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:51:43,988-Speed 2495.73 samples/sec Loss 2.4332 LearningRate 0.000395 Epoch: 17 Global Step: 360090 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:51:52,191-Speed 2496.94 samples/sec Loss 2.4203 LearningRate 0.000395 Epoch: 17 Global Step: 360100 Fp16 Grad Scale: 32768 Required: 
107 hours Training: 2022-07-09 00:52:00,403-Speed 2494.69 samples/sec Loss 2.4117 LearningRate 0.000395 Epoch: 17 Global Step: 360110 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:52:08,610-Speed 2495.89 samples/sec Loss 2.4400 LearningRate 0.000395 Epoch: 17 Global Step: 360120 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:52:16,762-Speed 2512.75 samples/sec Loss 2.3813 LearningRate 0.000395 Epoch: 17 Global Step: 360130 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:52:24,962-Speed 2497.66 samples/sec Loss 2.4429 LearningRate 0.000395 Epoch: 17 Global Step: 360140 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:52:33,177-Speed 2493.65 samples/sec Loss 2.3830 LearningRate 0.000395 Epoch: 17 Global Step: 360150 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:52:41,381-Speed 2496.73 samples/sec Loss 2.3376 LearningRate 0.000395 Epoch: 17 Global Step: 360160 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:52:49,587-Speed 2496.05 samples/sec Loss 2.4164 LearningRate 0.000395 Epoch: 17 Global Step: 360170 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:52:57,794-Speed 2495.72 samples/sec Loss 2.3892 LearningRate 0.000395 Epoch: 17 Global Step: 360180 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:53:05,946-Speed 2512.76 samples/sec Loss 2.4410 LearningRate 0.000395 Epoch: 17 Global Step: 360190 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:53:14,148-Speed 2497.40 samples/sec Loss 2.4147 LearningRate 0.000395 Epoch: 17 Global Step: 360200 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:53:22,347-Speed 2498.12 samples/sec Loss 2.3820 LearningRate 0.000395 Epoch: 17 Global Step: 360210 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:53:30,551-Speed 2496.66 samples/sec Loss 2.3737 LearningRate 0.000395 Epoch: 17 Global Step: 360220 Fp16 Grad Scale: 32768 
Required: 107 hours Training: 2022-07-09 00:53:38,767-Speed 2493.22 samples/sec Loss 2.4230 LearningRate 0.000395 Epoch: 17 Global Step: 360230 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:53:46,974-Speed 2495.59 samples/sec Loss 2.4540 LearningRate 0.000395 Epoch: 17 Global Step: 360240 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:53:55,123-Speed 2513.64 samples/sec Loss 2.4590 LearningRate 0.000395 Epoch: 17 Global Step: 360250 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:54:03,330-Speed 2495.86 samples/sec Loss 2.4404 LearningRate 0.000395 Epoch: 17 Global Step: 360260 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:54:11,533-Speed 2497.15 samples/sec Loss 2.3884 LearningRate 0.000395 Epoch: 17 Global Step: 360270 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:54:19,735-Speed 2497.32 samples/sec Loss 2.3857 LearningRate 0.000395 Epoch: 17 Global Step: 360280 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 00:54:27,893-Speed 2510.52 samples/sec Loss 2.4935 LearningRate 0.000395 Epoch: 17 Global Step: 360290 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:54:36,096-Speed 2497.24 samples/sec Loss 2.4852 LearningRate 0.000395 Epoch: 17 Global Step: 360300 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:54:44,259-Speed 2509.18 samples/sec Loss 2.4107 LearningRate 0.000395 Epoch: 17 Global Step: 360310 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:54:52,462-Speed 2496.96 samples/sec Loss 2.4454 LearningRate 0.000395 Epoch: 17 Global Step: 360320 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:55:00,664-Speed 2497.53 samples/sec Loss 2.4240 LearningRate 0.000395 Epoch: 17 Global Step: 360330 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:55:08,869-Speed 2496.57 samples/sec Loss 2.3991 LearningRate 0.000395 Epoch: 17 Global Step: 360340 Fp16 Grad Scale: 
16384 Required: 107 hours Training: 2022-07-09 00:55:17,069-Speed 2497.75 samples/sec Loss 2.3770 LearningRate 0.000395 Epoch: 17 Global Step: 360350 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:55:25,273-Speed 2496.86 samples/sec Loss 2.4284 LearningRate 0.000395 Epoch: 17 Global Step: 360360 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:55:33,426-Speed 2512.32 samples/sec Loss 2.3916 LearningRate 0.000395 Epoch: 17 Global Step: 360370 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:55:41,652-Speed 2490.03 samples/sec Loss 2.4911 LearningRate 0.000395 Epoch: 17 Global Step: 360380 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:55:49,854-Speed 2497.23 samples/sec Loss 2.4797 LearningRate 0.000395 Epoch: 17 Global Step: 360390 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:55:58,055-Speed 2497.57 samples/sec Loss 2.4894 LearningRate 0.000395 Epoch: 17 Global Step: 360400 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:56:06,256-Speed 2497.64 samples/sec Loss 2.4377 LearningRate 0.000395 Epoch: 17 Global Step: 360410 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:56:14,460-Speed 2496.87 samples/sec Loss 2.4284 LearningRate 0.000395 Epoch: 17 Global Step: 360420 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:56:22,610-Speed 2513.27 samples/sec Loss 2.4321 LearningRate 0.000395 Epoch: 17 Global Step: 360430 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:56:30,815-Speed 2496.49 samples/sec Loss 2.4307 LearningRate 0.000395 Epoch: 17 Global Step: 360440 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:56:39,023-Speed 2495.34 samples/sec Loss 2.4055 LearningRate 0.000395 Epoch: 17 Global Step: 360450 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:56:47,231-Speed 2495.50 samples/sec Loss 2.4215 LearningRate 0.000395 Epoch: 17 Global Step: 360460 Fp16 Grad 
Scale: 16384 Required: 107 hours Training: 2022-07-09 00:56:55,435-Speed 2496.98 samples/sec Loss 2.3706 LearningRate 0.000395 Epoch: 17 Global Step: 360470 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:57:03,641-Speed 2495.98 samples/sec Loss 2.3849 LearningRate 0.000395 Epoch: 17 Global Step: 360480 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:57:11,794-Speed 2512.39 samples/sec Loss 2.3894 LearningRate 0.000395 Epoch: 17 Global Step: 360490 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:57:19,997-Speed 2496.99 samples/sec Loss 2.3936 LearningRate 0.000395 Epoch: 17 Global Step: 360500 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:57:28,204-Speed 2495.77 samples/sec Loss 2.4203 LearningRate 0.000395 Epoch: 17 Global Step: 360510 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:57:36,417-Speed 2494.14 samples/sec Loss 2.4106 LearningRate 0.000395 Epoch: 17 Global Step: 360520 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:57:44,626-Speed 2495.00 samples/sec Loss 2.3889 LearningRate 0.000395 Epoch: 17 Global Step: 360530 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:57:52,843-Speed 2492.80 samples/sec Loss 2.4374 LearningRate 0.000395 Epoch: 17 Global Step: 360540 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:58:01,008-Speed 2508.59 samples/sec Loss 2.4405 LearningRate 0.000395 Epoch: 17 Global Step: 360550 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:58:09,226-Speed 2492.53 samples/sec Loss 2.4071 LearningRate 0.000395 Epoch: 17 Global Step: 360560 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:58:17,449-Speed 2491.03 samples/sec Loss 2.3593 LearningRate 0.000395 Epoch: 17 Global Step: 360570 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:58:25,657-Speed 2495.48 samples/sec Loss 2.3742 LearningRate 0.000395 Epoch: 17 Global Step: 360580 Fp16 
Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:58:33,863-Speed 2495.94 samples/sec Loss 2.4482 LearningRate 0.000395 Epoch: 17 Global Step: 360590 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:58:42,070-Speed 2495.73 samples/sec Loss 2.4492 LearningRate 0.000395 Epoch: 17 Global Step: 360600 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:58:50,228-Speed 2510.98 samples/sec Loss 2.4567 LearningRate 0.000395 Epoch: 17 Global Step: 360610 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:58:58,443-Speed 2493.65 samples/sec Loss 2.4394 LearningRate 0.000395 Epoch: 17 Global Step: 360620 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:59:06,652-Speed 2495.12 samples/sec Loss 2.4275 LearningRate 0.000394 Epoch: 17 Global Step: 360630 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:59:14,874-Speed 2491.44 samples/sec Loss 2.4485 LearningRate 0.000394 Epoch: 17 Global Step: 360640 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:59:23,081-Speed 2495.89 samples/sec Loss 2.3974 LearningRate 0.000394 Epoch: 17 Global Step: 360650 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:59:31,289-Speed 2495.48 samples/sec Loss 2.4055 LearningRate 0.000394 Epoch: 17 Global Step: 360660 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:59:39,442-Speed 2512.04 samples/sec Loss 2.4870 LearningRate 0.000394 Epoch: 17 Global Step: 360670 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:59:47,646-Speed 2496.91 samples/sec Loss 2.4166 LearningRate 0.000394 Epoch: 17 Global Step: 360680 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 00:59:55,852-Speed 2495.84 samples/sec Loss 2.3911 LearningRate 0.000394 Epoch: 17 Global Step: 360690 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:00:04,057-Speed 2496.49 samples/sec Loss 2.4038 LearningRate 0.000394 Epoch: 17 Global Step: 360700 
Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:00:12,261-Speed 2497.03 samples/sec Loss 2.4191 LearningRate 0.000394 Epoch: 17 Global Step: 360710 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:00:20,462-Speed 2497.50 samples/sec Loss 2.4099 LearningRate 0.000394 Epoch: 17 Global Step: 360720 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:00:28,627-Speed 2508.82 samples/sec Loss 2.4077 LearningRate 0.000394 Epoch: 17 Global Step: 360730 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:00:36,834-Speed 2495.71 samples/sec Loss 2.4379 LearningRate 0.000394 Epoch: 17 Global Step: 360740 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:00:45,038-Speed 2496.84 samples/sec Loss 2.4188 LearningRate 0.000394 Epoch: 17 Global Step: 360750 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:00:53,257-Speed 2491.87 samples/sec Loss 2.4546 LearningRate 0.000394 Epoch: 17 Global Step: 360760 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:01:01,460-Speed 2497.30 samples/sec Loss 2.4284 LearningRate 0.000394 Epoch: 17 Global Step: 360770 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:01:09,666-Speed 2495.98 samples/sec Loss 2.4127 LearningRate 0.000394 Epoch: 17 Global Step: 360780 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:01:17,818-Speed 2512.78 samples/sec Loss 2.4630 LearningRate 0.000394 Epoch: 17 Global Step: 360790 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:01:26,020-Speed 2497.13 samples/sec Loss 2.4574 LearningRate 0.000394 Epoch: 17 Global Step: 360800 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:01:34,232-Speed 2494.58 samples/sec Loss 2.4070 LearningRate 0.000394 Epoch: 17 Global Step: 360810 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:01:42,438-Speed 2496.10 samples/sec Loss 2.4261 LearningRate 0.000394 Epoch: 17 Global Step: 
360820 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:01:50,642-Speed 2496.58 samples/sec Loss 2.3976 LearningRate 0.000394 Epoch: 17 Global Step: 360830 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:01:58,846-Speed 2496.99 samples/sec Loss 2.4030 LearningRate 0.000394 Epoch: 17 Global Step: 360840 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:02:06,995-Speed 2513.74 samples/sec Loss 2.3910 LearningRate 0.000394 Epoch: 17 Global Step: 360850 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:02:15,196-Speed 2497.47 samples/sec Loss 2.3796 LearningRate 0.000394 Epoch: 17 Global Step: 360860 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:02:23,402-Speed 2496.35 samples/sec Loss 2.3596 LearningRate 0.000394 Epoch: 17 Global Step: 360870 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:02:31,611-Speed 2495.03 samples/sec Loss 2.4152 LearningRate 0.000394 Epoch: 17 Global Step: 360880 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:02:39,814-Speed 2497.01 samples/sec Loss 2.4133 LearningRate 0.000394 Epoch: 17 Global Step: 360890 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:02:48,027-Speed 2493.90 samples/sec Loss 2.4424 LearningRate 0.000394 Epoch: 17 Global Step: 360900 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:02:56,176-Speed 2513.68 samples/sec Loss 2.4383 LearningRate 0.000394 Epoch: 17 Global Step: 360910 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:03:04,388-Speed 2494.38 samples/sec Loss 2.4123 LearningRate 0.000394 Epoch: 17 Global Step: 360920 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:03:12,589-Speed 2497.71 samples/sec Loss 2.4443 LearningRate 0.000394 Epoch: 17 Global Step: 360930 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:03:20,791-Speed 2497.34 samples/sec Loss 2.4461 LearningRate 0.000394 Epoch: 17 Global 
Step: 360940 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:03:28,998-Speed 2495.69 samples/sec Loss 2.4494 LearningRate 0.000394 Epoch: 17 Global Step: 360950 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:03:37,206-Speed 2495.58 samples/sec Loss 2.3933 LearningRate 0.000394 Epoch: 17 Global Step: 360960 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:03:45,372-Speed 2508.61 samples/sec Loss 2.3951 LearningRate 0.000394 Epoch: 17 Global Step: 360970 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:03:53,575-Speed 2496.79 samples/sec Loss 2.3921 LearningRate 0.000394 Epoch: 17 Global Step: 360980 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:04:01,790-Speed 2493.51 samples/sec Loss 2.3706 LearningRate 0.000394 Epoch: 17 Global Step: 360990 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:04:10,001-Speed 2494.62 samples/sec Loss 2.4735 LearningRate 0.000394 Epoch: 17 Global Step: 361000 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:04:18,202-Speed 2497.74 samples/sec Loss 2.4151 LearningRate 0.000394 Epoch: 17 Global Step: 361010 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:04:26,405-Speed 2497.07 samples/sec Loss 2.4191 LearningRate 0.000394 Epoch: 17 Global Step: 361020 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:04:34,568-Speed 2509.35 samples/sec Loss 2.3878 LearningRate 0.000394 Epoch: 17 Global Step: 361030 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:04:42,780-Speed 2494.32 samples/sec Loss 2.4200 LearningRate 0.000394 Epoch: 17 Global Step: 361040 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:04:50,977-Speed 2499.20 samples/sec Loss 2.3721 LearningRate 0.000394 Epoch: 17 Global Step: 361050 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:04:59,177-Speed 2497.83 samples/sec Loss 2.4156 LearningRate 0.000394 Epoch: 17 
Global Step: 361060 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:05:07,382-Speed 2496.58 samples/sec Loss 2.4528 LearningRate 0.000394 Epoch: 17 Global Step: 361070 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:05:15,580-Speed 2498.38 samples/sec Loss 2.4729 LearningRate 0.000394 Epoch: 17 Global Step: 361080 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:05:23,730-Speed 2513.44 samples/sec Loss 2.4835 LearningRate 0.000394 Epoch: 17 Global Step: 361090 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:05:31,938-Speed 2495.38 samples/sec Loss 2.4305 LearningRate 0.000394 Epoch: 17 Global Step: 361100 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:05:40,139-Speed 2497.79 samples/sec Loss 2.4564 LearningRate 0.000394 Epoch: 17 Global Step: 361110 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:05:48,339-Speed 2498.30 samples/sec Loss 2.4356 LearningRate 0.000394 Epoch: 17 Global Step: 361120 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:05:56,539-Speed 2497.79 samples/sec Loss 2.4232 LearningRate 0.000394 Epoch: 17 Global Step: 361130 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:06:04,745-Speed 2496.17 samples/sec Loss 2.3952 LearningRate 0.000394 Epoch: 17 Global Step: 361140 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:06:12,894-Speed 2513.40 samples/sec Loss 2.3720 LearningRate 0.000394 Epoch: 17 Global Step: 361150 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:06:21,097-Speed 2497.12 samples/sec Loss 2.4101 LearningRate 0.000394 Epoch: 17 Global Step: 361160 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:06:29,302-Speed 2496.57 samples/sec Loss 2.4341 LearningRate 0.000394 Epoch: 17 Global Step: 361170 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:06:37,500-Speed 2498.49 samples/sec Loss 2.4068 LearningRate 0.000394 
Epoch: 17 Global Step: 361180 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:06:45,709-Speed 2495.39 samples/sec Loss 2.4215 LearningRate 0.000394 Epoch: 17 Global Step: 361190 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:06:53,913-Speed 2496.53 samples/sec Loss 2.3907 LearningRate 0.000394 Epoch: 17 Global Step: 361200 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:07:02,061-Speed 2513.77 samples/sec Loss 2.3400 LearningRate 0.000394 Epoch: 17 Global Step: 361210 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:07:10,267-Speed 2496.12 samples/sec Loss 2.3595 LearningRate 0.000393 Epoch: 17 Global Step: 361220 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:07:18,474-Speed 2495.89 samples/sec Loss 2.3983 LearningRate 0.000393 Epoch: 17 Global Step: 361230 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:07:26,698-Speed 2490.67 samples/sec Loss 2.3910 LearningRate 0.000393 Epoch: 17 Global Step: 361240 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:07:34,912-Speed 2493.51 samples/sec Loss 2.4337 LearningRate 0.000393 Epoch: 17 Global Step: 361250 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:07:43,118-Speed 2496.11 samples/sec Loss 2.4143 LearningRate 0.000393 Epoch: 17 Global Step: 361260 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:07:51,270-Speed 2512.99 samples/sec Loss 2.3877 LearningRate 0.000393 Epoch: 17 Global Step: 361270 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:07:59,473-Speed 2496.82 samples/sec Loss 2.4028 LearningRate 0.000393 Epoch: 17 Global Step: 361280 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:08:07,675-Speed 2497.48 samples/sec Loss 2.3526 LearningRate 0.000393 Epoch: 17 Global Step: 361290 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:08:15,891-Speed 2493.44 samples/sec Loss 2.4184 LearningRate 
0.000393 Epoch: 17 Global Step: 361300 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:08:24,096-Speed 2496.46 samples/sec Loss 2.4280 LearningRate 0.000393 Epoch: 17 Global Step: 361310 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:08:32,299-Speed 2496.87 samples/sec Loss 2.4035 LearningRate 0.000393 Epoch: 17 Global Step: 361320 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:08:40,449-Speed 2513.42 samples/sec Loss 2.4156 LearningRate 0.000393 Epoch: 17 Global Step: 361330 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:08:48,652-Speed 2497.18 samples/sec Loss 2.4030 LearningRate 0.000393 Epoch: 17 Global Step: 361340 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:08:56,856-Speed 2496.85 samples/sec Loss 2.3847 LearningRate 0.000393 Epoch: 17 Global Step: 361350 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:09:05,057-Speed 2497.66 samples/sec Loss 2.4177 LearningRate 0.000393 Epoch: 17 Global Step: 361360 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:09:13,260-Speed 2496.92 samples/sec Loss 2.4427 LearningRate 0.000393 Epoch: 17 Global Step: 361370 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:09:21,465-Speed 2496.39 samples/sec Loss 2.4247 LearningRate 0.000393 Epoch: 17 Global Step: 361380 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:09:29,615-Speed 2513.43 samples/sec Loss 2.4302 LearningRate 0.000393 Epoch: 17 Global Step: 361390 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:09:37,820-Speed 2496.18 samples/sec Loss 2.3768 LearningRate 0.000393 Epoch: 17 Global Step: 361400 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:09:46,023-Speed 2497.53 samples/sec Loss 2.4007 LearningRate 0.000393 Epoch: 17 Global Step: 361410 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:09:54,224-Speed 2497.49 samples/sec Loss 2.3944 
LearningRate 0.000393 Epoch: 17 Global Step: 361420 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:10:02,443-Speed 2492.10 samples/sec Loss 2.3934 LearningRate 0.000393 Epoch: 17 Global Step: 361430 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:10:10,645-Speed 2497.39 samples/sec Loss 2.4723 LearningRate 0.000393 Epoch: 17 Global Step: 361440 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:10:18,791-Speed 2514.55 samples/sec Loss 2.4418 LearningRate 0.000393 Epoch: 17 Global Step: 361450 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:10:26,997-Speed 2496.20 samples/sec Loss 2.3860 LearningRate 0.000393 Epoch: 17 Global Step: 361460 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:10:35,199-Speed 2497.59 samples/sec Loss 2.4522 LearningRate 0.000393 Epoch: 17 Global Step: 361470 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:10:43,402-Speed 2496.86 samples/sec Loss 2.4306 LearningRate 0.000393 Epoch: 17 Global Step: 361480 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:10:51,605-Speed 2497.29 samples/sec Loss 2.3890 LearningRate 0.000393 Epoch: 17 Global Step: 361490 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:10:59,805-Speed 2497.95 samples/sec Loss 2.4673 LearningRate 0.000393 Epoch: 17 Global Step: 361500 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:11:07,959-Speed 2511.92 samples/sec Loss 2.4504 LearningRate 0.000393 Epoch: 17 Global Step: 361510 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:11:16,126-Speed 2508.27 samples/sec Loss 2.4335 LearningRate 0.000393 Epoch: 17 Global Step: 361520 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:11:24,328-Speed 2497.40 samples/sec Loss 2.4788 LearningRate 0.000393 Epoch: 17 Global Step: 361530 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:11:32,544-Speed 2493.12 samples/sec Loss 
2.4640 LearningRate 0.000393 Epoch: 17 Global Step: 361540 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:11:40,757-Speed 2493.83 samples/sec Loss 2.4337 LearningRate 0.000393 Epoch: 17 Global Step: 361550 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:11:48,972-Speed 2493.57 samples/sec Loss 2.4910 LearningRate 0.000393 Epoch: 17 Global Step: 361560 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:11:57,126-Speed 2511.84 samples/sec Loss 2.4478 LearningRate 0.000393 Epoch: 17 Global Step: 361570 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:12:05,338-Speed 2494.28 samples/sec Loss 2.4886 LearningRate 0.000393 Epoch: 17 Global Step: 361580 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:12:13,540-Speed 2497.44 samples/sec Loss 2.4289 LearningRate 0.000393 Epoch: 17 Global Step: 361590 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:12:21,762-Speed 2491.31 samples/sec Loss 2.4040 LearningRate 0.000393 Epoch: 17 Global Step: 361600 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:12:29,959-Speed 2498.85 samples/sec Loss 2.4740 LearningRate 0.000393 Epoch: 17 Global Step: 361610 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:12:38,176-Speed 2492.98 samples/sec Loss 2.3662 LearningRate 0.000393 Epoch: 17 Global Step: 361620 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:12:46,324-Speed 2513.91 samples/sec Loss 2.4589 LearningRate 0.000393 Epoch: 17 Global Step: 361630 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:12:54,526-Speed 2497.47 samples/sec Loss 2.4262 LearningRate 0.000393 Epoch: 17 Global Step: 361640 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:13:02,732-Speed 2496.03 samples/sec Loss 2.4631 LearningRate 0.000393 Epoch: 17 Global Step: 361650 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:13:10,941-Speed 2495.24 samples/sec 
Loss 2.4665 LearningRate 0.000393 Epoch: 17 Global Step: 361660 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:13:19,142-Speed 2497.55 samples/sec Loss 2.4450 LearningRate 0.000393 Epoch: 17 Global Step: 361670 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:13:27,360-Speed 2492.59 samples/sec Loss 2.4355 LearningRate 0.000393 Epoch: 17 Global Step: 361680 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:13:35,512-Speed 2512.76 samples/sec Loss 2.4291 LearningRate 0.000393 Epoch: 17 Global Step: 361690 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:13:43,718-Speed 2495.92 samples/sec Loss 2.4051 LearningRate 0.000393 Epoch: 17 Global Step: 361700 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:13:51,923-Speed 2496.48 samples/sec Loss 2.3835 LearningRate 0.000393 Epoch: 17 Global Step: 361710 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:14:00,127-Speed 2496.68 samples/sec Loss 2.4381 LearningRate 0.000393 Epoch: 17 Global Step: 361720 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:14:08,331-Speed 2496.72 samples/sec Loss 2.3888 LearningRate 0.000393 Epoch: 17 Global Step: 361730 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:14:16,538-Speed 2495.91 samples/sec Loss 2.4227 LearningRate 0.000393 Epoch: 17 Global Step: 361740 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:14:24,695-Speed 2511.22 samples/sec Loss 2.3879 LearningRate 0.000393 Epoch: 17 Global Step: 361750 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:14:32,899-Speed 2496.64 samples/sec Loss 2.3792 LearningRate 0.000393 Epoch: 17 Global Step: 361760 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:14:41,111-Speed 2494.20 samples/sec Loss 2.4071 LearningRate 0.000393 Epoch: 17 Global Step: 361770 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:14:49,326-Speed 2493.36 
samples/sec Loss 2.3813 LearningRate 0.000393 Epoch: 17 Global Step: 361780 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:14:57,532-Speed 2496.42 samples/sec Loss 2.4295 LearningRate 0.000393 Epoch: 17 Global Step: 361790 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:15:05,733-Speed 2497.74 samples/sec Loss 2.3694 LearningRate 0.000393 Epoch: 17 Global Step: 361800 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:15:13,882-Speed 2513.78 samples/sec Loss 2.4355 LearningRate 0.000393 Epoch: 17 Global Step: 361810 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:15:22,081-Speed 2498.21 samples/sec Loss 2.3936 LearningRate 0.000392 Epoch: 17 Global Step: 361820 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:15:30,286-Speed 2496.38 samples/sec Loss 2.4303 LearningRate 0.000392 Epoch: 17 Global Step: 361830 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:15:38,488-Speed 2497.34 samples/sec Loss 2.3993 LearningRate 0.000392 Epoch: 17 Global Step: 361840 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:15:46,691-Speed 2496.98 samples/sec Loss 2.4362 LearningRate 0.000392 Epoch: 17 Global Step: 361850 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:15:54,895-Speed 2497.29 samples/sec Loss 2.4129 LearningRate 0.000392 Epoch: 17 Global Step: 361860 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:16:03,041-Speed 2514.34 samples/sec Loss 2.4779 LearningRate 0.000392 Epoch: 17 Global Step: 361870 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:16:11,241-Speed 2498.11 samples/sec Loss 2.4096 LearningRate 0.000392 Epoch: 17 Global Step: 361880 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:16:19,455-Speed 2493.98 samples/sec Loss 2.4480 LearningRate 0.000392 Epoch: 17 Global Step: 361890 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:16:27,656-Speed 
2497.44 samples/sec Loss 2.4036 LearningRate 0.000392 Epoch: 17 Global Step: 361900 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:16:35,858-Speed 2497.77 samples/sec Loss 2.4253 LearningRate 0.000392 Epoch: 17 Global Step: 361910 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:16:44,060-Speed 2497.33 samples/sec Loss 2.4066 LearningRate 0.000392 Epoch: 17 Global Step: 361920 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:16:52,214-Speed 2512.07 samples/sec Loss 2.4292 LearningRate 0.000392 Epoch: 17 Global Step: 361930 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:17:00,419-Speed 2496.43 samples/sec Loss 2.3801 LearningRate 0.000392 Epoch: 17 Global Step: 361940 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:17:08,620-Speed 2497.75 samples/sec Loss 2.4111 LearningRate 0.000392 Epoch: 17 Global Step: 361950 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:17:16,820-Speed 2497.87 samples/sec Loss 2.4827 LearningRate 0.000392 Epoch: 17 Global Step: 361960 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:17:25,061-Speed 2485.56 samples/sec Loss 2.3757 LearningRate 0.000392 Epoch: 17 Global Step: 361970 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:17:33,275-Speed 2493.54 samples/sec Loss 2.4316 LearningRate 0.000392 Epoch: 17 Global Step: 361980 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:17:41,425-Speed 2513.54 samples/sec Loss 2.4179 LearningRate 0.000392 Epoch: 17 Global Step: 361990 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:17:49,630-Speed 2496.53 samples/sec Loss 2.4605 LearningRate 0.000392 Epoch: 17 Global Step: 362000 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:17:57,832-Speed 2497.31 samples/sec Loss 2.4370 LearningRate 0.000392 Epoch: 17 Global Step: 362010 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 
01:18:06,051-Speed 2492.19 samples/sec Loss 2.4518 LearningRate 0.000392 Epoch: 17 Global Step: 362020 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:18:14,254-Speed 2496.99 samples/sec Loss 2.4216 LearningRate 0.000392 Epoch: 17 Global Step: 362030 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:18:22,472-Speed 2492.55 samples/sec Loss 2.4169 LearningRate 0.000392 Epoch: 17 Global Step: 362040 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:18:30,621-Speed 2513.67 samples/sec Loss 2.3914 LearningRate 0.000392 Epoch: 17 Global Step: 362050 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:18:38,829-Speed 2495.69 samples/sec Loss 2.4083 LearningRate 0.000392 Epoch: 17 Global Step: 362060 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:18:47,045-Speed 2493.22 samples/sec Loss 2.4358 LearningRate 0.000392 Epoch: 17 Global Step: 362070 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:18:55,251-Speed 2496.11 samples/sec Loss 2.4347 LearningRate 0.000392 Epoch: 17 Global Step: 362080 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:19:03,450-Speed 2498.03 samples/sec Loss 2.4137 LearningRate 0.000392 Epoch: 17 Global Step: 362090 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:19:11,653-Speed 2497.28 samples/sec Loss 2.3689 LearningRate 0.000392 Epoch: 17 Global Step: 362100 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:19:19,799-Speed 2514.52 samples/sec Loss 2.3113 LearningRate 0.000392 Epoch: 17 Global Step: 362110 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:19:28,012-Speed 2494.23 samples/sec Loss 2.3756 LearningRate 0.000392 Epoch: 17 Global Step: 362120 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:19:36,216-Speed 2496.47 samples/sec Loss 2.3953 LearningRate 0.000392 Epoch: 17 Global Step: 362130 Fp16 Grad Scale: 16384 Required: 107 hours Training: 
2022-07-09 01:19:44,418-Speed 2497.33 samples/sec Loss 2.4018 LearningRate 0.000392 Epoch: 17 Global Step: 362140 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:19:52,616-Speed 2498.51 samples/sec Loss 2.4122 LearningRate 0.000392 Epoch: 17 Global Step: 362150 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:20:00,817-Speed 2497.83 samples/sec Loss 2.3815 LearningRate 0.000392 Epoch: 17 Global Step: 362160 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:20:08,962-Speed 2514.61 samples/sec Loss 2.4327 LearningRate 0.000392 Epoch: 17 Global Step: 362170 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:20:17,162-Speed 2498.06 samples/sec Loss 2.3545 LearningRate 0.000392 Epoch: 17 Global Step: 362180 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:20:25,361-Speed 2498.31 samples/sec Loss 2.4131 LearningRate 0.000392 Epoch: 17 Global Step: 362190 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:20:33,563-Speed 2497.28 samples/sec Loss 2.3930 LearningRate 0.000392 Epoch: 17 Global Step: 362200 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:20:41,767-Speed 2496.92 samples/sec Loss 2.4202 LearningRate 0.000392 Epoch: 17 Global Step: 362210 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:20:49,970-Speed 2497.17 samples/sec Loss 2.4257 LearningRate 0.000392 Epoch: 17 Global Step: 362220 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:20:58,126-Speed 2511.23 samples/sec Loss 2.3494 LearningRate 0.000392 Epoch: 17 Global Step: 362230 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:21:06,330-Speed 2496.79 samples/sec Loss 2.3320 LearningRate 0.000392 Epoch: 17 Global Step: 362240 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:21:14,539-Speed 2495.37 samples/sec Loss 2.3706 LearningRate 0.000392 Epoch: 17 Global Step: 362250 Fp16 Grad Scale: 16384 Required: 107 hours 
Training: 2022-07-09 01:21:22,742-Speed 2497.01 samples/sec Loss 2.4001 LearningRate 0.000392 Epoch: 17 Global Step: 362260 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:21:30,947-Speed 2496.55 samples/sec Loss 2.4158 LearningRate 0.000392 Epoch: 17 Global Step: 362270 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:21:39,157-Speed 2495.04 samples/sec Loss 2.3371 LearningRate 0.000392 Epoch: 17 Global Step: 362280 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:21:47,314-Speed 2511.01 samples/sec Loss 2.4334 LearningRate 0.000392 Epoch: 17 Global Step: 362290 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:21:55,530-Speed 2493.27 samples/sec Loss 2.4075 LearningRate 0.000392 Epoch: 17 Global Step: 362300 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:22:03,731-Speed 2497.55 samples/sec Loss 2.4608 LearningRate 0.000392 Epoch: 17 Global Step: 362310 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:22:11,933-Speed 2497.51 samples/sec Loss 2.4111 LearningRate 0.000392 Epoch: 17 Global Step: 362320 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:22:20,134-Speed 2497.69 samples/sec Loss 2.4368 LearningRate 0.000392 Epoch: 17 Global Step: 362330 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:22:28,336-Speed 2497.53 samples/sec Loss 2.3663 LearningRate 0.000392 Epoch: 17 Global Step: 362340 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:22:36,486-Speed 2513.38 samples/sec Loss 2.4241 LearningRate 0.000392 Epoch: 17 Global Step: 362350 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:22:44,690-Speed 2496.94 samples/sec Loss 2.3666 LearningRate 0.000392 Epoch: 17 Global Step: 362360 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:22:52,896-Speed 2496.97 samples/sec Loss 2.3965 LearningRate 0.000392 Epoch: 17 Global Step: 362370 Fp16 Grad Scale: 16384 Required: 107 
hours Training: 2022-07-09 01:23:01,108-Speed 2494.43 samples/sec Loss 2.4022 LearningRate 0.000392 Epoch: 17 Global Step: 362380 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:23:09,327-Speed 2492.27 samples/sec Loss 2.4369 LearningRate 0.000392 Epoch: 17 Global Step: 362390 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:23:17,527-Speed 2498.09 samples/sec Loss 2.4394 LearningRate 0.000392 Epoch: 17 Global Step: 362400 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:23:25,680-Speed 2512.43 samples/sec Loss 2.4292 LearningRate 0.000392 Epoch: 17 Global Step: 362410 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:23:33,880-Speed 2497.74 samples/sec Loss 2.3687 LearningRate 0.000391 Epoch: 17 Global Step: 362420 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:23:42,090-Speed 2494.91 samples/sec Loss 2.4092 LearningRate 0.000391 Epoch: 17 Global Step: 362430 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:23:50,293-Speed 2497.06 samples/sec Loss 2.4353 LearningRate 0.000391 Epoch: 17 Global Step: 362440 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:23:58,501-Speed 2495.86 samples/sec Loss 2.3832 LearningRate 0.000391 Epoch: 17 Global Step: 362450 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:24:06,705-Speed 2496.80 samples/sec Loss 2.5099 LearningRate 0.000391 Epoch: 17 Global Step: 362460 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:24:14,854-Speed 2513.43 samples/sec Loss 2.4067 LearningRate 0.000391 Epoch: 17 Global Step: 362470 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:24:23,062-Speed 2495.71 samples/sec Loss 2.4386 LearningRate 0.000391 Epoch: 17 Global Step: 362480 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:24:31,264-Speed 2497.62 samples/sec Loss 2.4486 LearningRate 0.000391 Epoch: 17 Global Step: 362490 Fp16 Grad Scale: 16384 Required: 
107 hours Training: 2022-07-09 01:24:39,479-Speed 2493.36 samples/sec Loss 2.3646 LearningRate 0.000391 Epoch: 17 Global Step: 362500 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:24:47,677-Speed 2498.31 samples/sec Loss 2.3916 LearningRate 0.000391 Epoch: 17 Global Step: 362510 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:24:55,881-Speed 2496.92 samples/sec Loss 2.4130 LearningRate 0.000391 Epoch: 17 Global Step: 362520 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:25:04,032-Speed 2512.98 samples/sec Loss 2.4184 LearningRate 0.000391 Epoch: 17 Global Step: 362530 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:25:12,236-Speed 2496.53 samples/sec Loss 2.4123 LearningRate 0.000391 Epoch: 17 Global Step: 362540 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:25:20,441-Speed 2496.50 samples/sec Loss 2.3796 LearningRate 0.000391 Epoch: 17 Global Step: 362550 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:25:28,649-Speed 2495.82 samples/sec Loss 2.4791 LearningRate 0.000391 Epoch: 17 Global Step: 362560 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:25:36,846-Speed 2498.76 samples/sec Loss 2.4443 LearningRate 0.000391 Epoch: 17 Global Step: 362570 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:25:45,052-Speed 2496.09 samples/sec Loss 2.4891 LearningRate 0.000391 Epoch: 17 Global Step: 362580 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:25:53,197-Speed 2514.77 samples/sec Loss 2.3901 LearningRate 0.000391 Epoch: 17 Global Step: 362590 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:26:01,399-Speed 2497.32 samples/sec Loss 2.3983 LearningRate 0.000391 Epoch: 17 Global Step: 362600 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:26:09,602-Speed 2497.20 samples/sec Loss 2.3879 LearningRate 0.000391 Epoch: 17 Global Step: 362610 Fp16 Grad Scale: 16384 
Required: 107 hours Training: 2022-07-09 01:26:17,808-Speed 2496.33 samples/sec Loss 2.4060 LearningRate 0.000391 Epoch: 17 Global Step: 362620 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:26:26,011-Speed 2497.07 samples/sec Loss 2.4324 LearningRate 0.000391 Epoch: 17 Global Step: 362630 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:26:34,216-Speed 2499.39 samples/sec Loss 2.3951 LearningRate 0.000391 Epoch: 17 Global Step: 362640 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:26:42,376-Speed 2510.44 samples/sec Loss 2.3962 LearningRate 0.000391 Epoch: 17 Global Step: 362650 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:26:50,582-Speed 2496.24 samples/sec Loss 2.4319 LearningRate 0.000391 Epoch: 17 Global Step: 362660 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:26:58,790-Speed 2495.30 samples/sec Loss 2.4018 LearningRate 0.000391 Epoch: 17 Global Step: 362670 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:27:06,995-Speed 2496.45 samples/sec Loss 2.4458 LearningRate 0.000391 Epoch: 17 Global Step: 362680 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:27:15,196-Speed 2497.57 samples/sec Loss 2.4050 LearningRate 0.000391 Epoch: 17 Global Step: 362690 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:27:23,402-Speed 2496.21 samples/sec Loss 2.4015 LearningRate 0.000391 Epoch: 17 Global Step: 362700 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:27:31,552-Speed 2513.32 samples/sec Loss 2.5041 LearningRate 0.000391 Epoch: 17 Global Step: 362710 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:27:39,753-Speed 2497.83 samples/sec Loss 2.4336 LearningRate 0.000391 Epoch: 17 Global Step: 362720 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:27:47,957-Speed 2496.73 samples/sec Loss 2.3957 LearningRate 0.000391 Epoch: 17 Global Step: 362730 Fp16 Grad Scale: 
32768 Required: 107 hours Training: 2022-07-09 01:27:56,165-Speed 2495.69 samples/sec Loss 2.3541 LearningRate 0.000391 Epoch: 17 Global Step: 362740 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:28:04,372-Speed 2495.64 samples/sec Loss 2.4336 LearningRate 0.000391 Epoch: 17 Global Step: 362750 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:28:12,579-Speed 2495.92 samples/sec Loss 2.4137 LearningRate 0.000391 Epoch: 17 Global Step: 362760 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:28:20,727-Speed 2513.87 samples/sec Loss 2.3984 LearningRate 0.000391 Epoch: 17 Global Step: 362770 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:28:28,932-Speed 2496.52 samples/sec Loss 2.4050 LearningRate 0.000391 Epoch: 17 Global Step: 362780 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:28:37,136-Speed 2496.63 samples/sec Loss 2.4155 LearningRate 0.000391 Epoch: 17 Global Step: 362790 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:28:45,340-Speed 2496.77 samples/sec Loss 2.4133 LearningRate 0.000391 Epoch: 17 Global Step: 362800 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:28:53,546-Speed 2496.19 samples/sec Loss 2.4230 LearningRate 0.000391 Epoch: 17 Global Step: 362810 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:29:01,756-Speed 2494.66 samples/sec Loss 2.4270 LearningRate 0.000391 Epoch: 17 Global Step: 362820 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:29:09,931-Speed 2505.87 samples/sec Loss 2.3921 LearningRate 0.000391 Epoch: 17 Global Step: 362830 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:29:18,136-Speed 2496.86 samples/sec Loss 2.4059 LearningRate 0.000391 Epoch: 17 Global Step: 362840 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:29:26,339-Speed 2496.73 samples/sec Loss 2.3929 LearningRate 0.000391 Epoch: 17 Global Step: 362850 Fp16 Grad 
Scale: 32768 Required: 107 hours Training: 2022-07-09 01:29:34,551-Speed 2494.46 samples/sec Loss 2.4044 LearningRate 0.000391 Epoch: 17 Global Step: 362860 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:29:42,756-Speed 2496.26 samples/sec Loss 2.4145 LearningRate 0.000391 Epoch: 17 Global Step: 362870 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:29:50,958-Speed 2497.41 samples/sec Loss 2.4015 LearningRate 0.000391 Epoch: 17 Global Step: 362880 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:29:59,106-Speed 2513.70 samples/sec Loss 2.3730 LearningRate 0.000391 Epoch: 17 Global Step: 362890 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:30:07,316-Speed 2494.99 samples/sec Loss 2.3847 LearningRate 0.000391 Epoch: 17 Global Step: 362900 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:30:15,519-Speed 2497.02 samples/sec Loss 2.4392 LearningRate 0.000391 Epoch: 17 Global Step: 362910 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:30:23,721-Speed 2497.23 samples/sec Loss 2.4166 LearningRate 0.000391 Epoch: 17 Global Step: 362920 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:30:31,931-Speed 2494.93 samples/sec Loss 2.3689 LearningRate 0.000391 Epoch: 17 Global Step: 362930 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:30:40,135-Speed 2496.88 samples/sec Loss 2.4099 LearningRate 0.000391 Epoch: 17 Global Step: 362940 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:30:48,286-Speed 2513.06 samples/sec Loss 2.3883 LearningRate 0.000391 Epoch: 17 Global Step: 362950 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:30:56,493-Speed 2495.85 samples/sec Loss 2.4271 LearningRate 0.000391 Epoch: 17 Global Step: 362960 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:31:04,701-Speed 2495.60 samples/sec Loss 2.4379 LearningRate 0.000391 Epoch: 17 Global Step: 362970 Fp16 
Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:31:12,911-Speed 2495.08 samples/sec Loss 2.4230 LearningRate 0.000391 Epoch: 17 Global Step: 362980 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:31:21,119-Speed 2495.47 samples/sec Loss 2.4249 LearningRate 0.000391 Epoch: 17 Global Step: 362990 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:31:29,323-Speed 2496.76 samples/sec Loss 2.4575 LearningRate 0.000391 Epoch: 17 Global Step: 363000 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:31:37,492-Speed 2507.49 samples/sec Loss 2.4465 LearningRate 0.000390 Epoch: 17 Global Step: 363010 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:31:45,653-Speed 2509.89 samples/sec Loss 2.4073 LearningRate 0.000390 Epoch: 17 Global Step: 363020 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:31:53,860-Speed 2495.77 samples/sec Loss 2.4049 LearningRate 0.000390 Epoch: 17 Global Step: 363030 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:32:02,077-Speed 2492.95 samples/sec Loss 2.4131 LearningRate 0.000390 Epoch: 17 Global Step: 363040 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:32:10,282-Speed 2496.09 samples/sec Loss 2.4551 LearningRate 0.000390 Epoch: 17 Global Step: 363050 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:32:18,487-Speed 2496.38 samples/sec Loss 2.4101 LearningRate 0.000390 Epoch: 17 Global Step: 363060 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:32:26,642-Speed 2511.90 samples/sec Loss 2.3965 LearningRate 0.000390 Epoch: 17 Global Step: 363070 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:32:34,843-Speed 2498.05 samples/sec Loss 2.4334 LearningRate 0.000390 Epoch: 17 Global Step: 363080 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:32:43,045-Speed 2497.15 samples/sec Loss 2.4680 LearningRate 0.000390 Epoch: 17 Global Step: 363090 
Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:32:51,247-Speed 2497.53 samples/sec Loss 2.4456 LearningRate 0.000390 Epoch: 17 Global Step: 363100 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:32:59,448-Speed 2497.44 samples/sec Loss 2.4140 LearningRate 0.000390 Epoch: 17 Global Step: 363110 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:33:07,653-Speed 2496.63 samples/sec Loss 2.4149 LearningRate 0.000390 Epoch: 17 Global Step: 363120 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:33:15,801-Speed 2513.92 samples/sec Loss 2.4541 LearningRate 0.000390 Epoch: 17 Global Step: 363130 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:33:24,013-Speed 2494.13 samples/sec Loss 2.4388 LearningRate 0.000390 Epoch: 17 Global Step: 363140 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:33:32,217-Speed 2496.80 samples/sec Loss 2.4800 LearningRate 0.000390 Epoch: 17 Global Step: 363150 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:33:40,419-Speed 2497.47 samples/sec Loss 2.3759 LearningRate 0.000390 Epoch: 17 Global Step: 363160 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:33:48,628-Speed 2495.48 samples/sec Loss 2.4091 LearningRate 0.000390 Epoch: 17 Global Step: 363170 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:33:56,838-Speed 2494.99 samples/sec Loss 2.4321 LearningRate 0.000390 Epoch: 17 Global Step: 363180 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:34:04,986-Speed 2513.99 samples/sec Loss 2.4003 LearningRate 0.000390 Epoch: 17 Global Step: 363190 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:34:13,187-Speed 2497.55 samples/sec Loss 2.3699 LearningRate 0.000390 Epoch: 17 Global Step: 363200 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:34:21,392-Speed 2496.38 samples/sec Loss 2.3752 LearningRate 0.000390 Epoch: 17 Global Step: 
363210 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:34:29,608-Speed 2492.97 samples/sec Loss 2.3687 LearningRate 0.000390 Epoch: 17 Global Step: 363220 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:34:37,824-Speed 2493.34 samples/sec Loss 2.3696 LearningRate 0.000390 Epoch: 17 Global Step: 363230 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:34:46,028-Speed 2496.73 samples/sec Loss 2.4090 LearningRate 0.000390 Epoch: 17 Global Step: 363240 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:34:54,177-Speed 2513.50 samples/sec Loss 2.3702 LearningRate 0.000390 Epoch: 17 Global Step: 363250 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:35:02,382-Speed 2496.91 samples/sec Loss 2.3377 LearningRate 0.000390 Epoch: 17 Global Step: 363260 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:35:10,583-Speed 2497.76 samples/sec Loss 2.3636 LearningRate 0.000390 Epoch: 17 Global Step: 363270 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:35:18,789-Speed 2496.23 samples/sec Loss 2.3852 LearningRate 0.000390 Epoch: 17 Global Step: 363280 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:35:26,996-Speed 2495.95 samples/sec Loss 2.3443 LearningRate 0.000390 Epoch: 17 Global Step: 363290 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:35:35,197-Speed 2497.66 samples/sec Loss 2.3949 LearningRate 0.000390 Epoch: 17 Global Step: 363300 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:35:43,346-Speed 2513.63 samples/sec Loss 2.3773 LearningRate 0.000390 Epoch: 17 Global Step: 363310 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:35:51,552-Speed 2496.63 samples/sec Loss 2.4044 LearningRate 0.000390 Epoch: 17 Global Step: 363320 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:35:59,755-Speed 2496.89 samples/sec Loss 2.3398 LearningRate 0.000390 Epoch: 17 Global 
Step: 363330 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:36:07,959-Speed 2496.77 samples/sec Loss 2.3879 LearningRate 0.000390 Epoch: 17 Global Step: 363340 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:36:16,165-Speed 2496.17 samples/sec Loss 2.4715 LearningRate 0.000390 Epoch: 17 Global Step: 363350 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:36:24,371-Speed 2496.09 samples/sec Loss 2.3706 LearningRate 0.000390 Epoch: 17 Global Step: 363360 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:36:32,530-Speed 2510.50 samples/sec Loss 2.4007 LearningRate 0.000390 Epoch: 17 Global Step: 363370 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:36:40,733-Speed 2497.02 samples/sec Loss 2.3755 LearningRate 0.000390 Epoch: 17 Global Step: 363380 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:36:48,935-Speed 2497.63 samples/sec Loss 2.3881 LearningRate 0.000390 Epoch: 17 Global Step: 363390 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:36:57,148-Speed 2493.73 samples/sec Loss 2.4258 LearningRate 0.000390 Epoch: 17 Global Step: 363400 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:37:05,348-Speed 2497.94 samples/sec Loss 2.3835 LearningRate 0.000390 Epoch: 17 Global Step: 363410 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:37:13,568-Speed 2491.90 samples/sec Loss 2.3930 LearningRate 0.000390 Epoch: 17 Global Step: 363420 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:37:21,716-Speed 2514.09 samples/sec Loss 2.3669 LearningRate 0.000390 Epoch: 17 Global Step: 363430 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:37:29,929-Speed 2493.78 samples/sec Loss 2.3900 LearningRate 0.000390 Epoch: 17 Global Step: 363440 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:37:38,136-Speed 2496.19 samples/sec Loss 2.3533 LearningRate 0.000390 Epoch: 17 
Global Step: 363450 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:37:46,343-Speed 2495.96 samples/sec Loss 2.3629 LearningRate 0.000390 Epoch: 17 Global Step: 363460 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:37:54,548-Speed 2496.54 samples/sec Loss 2.3861 LearningRate 0.000390 Epoch: 17 Global Step: 363470 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:38:02,749-Speed 2497.75 samples/sec Loss 2.3796 LearningRate 0.000390 Epoch: 17 Global Step: 363480 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:38:10,897-Speed 2513.95 samples/sec Loss 2.3239 LearningRate 0.000390 Epoch: 17 Global Step: 363490 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:38:19,101-Speed 2496.64 samples/sec Loss 2.3475 LearningRate 0.000390 Epoch: 17 Global Step: 363500 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:38:27,303-Speed 2497.16 samples/sec Loss 2.3639 LearningRate 0.000390 Epoch: 17 Global Step: 363510 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:38:35,506-Speed 2497.06 samples/sec Loss 2.3809 LearningRate 0.000390 Epoch: 17 Global Step: 363520 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:38:43,719-Speed 2493.99 samples/sec Loss 2.3524 LearningRate 0.000390 Epoch: 17 Global Step: 363530 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:38:51,918-Speed 2498.14 samples/sec Loss 2.3868 LearningRate 0.000390 Epoch: 17 Global Step: 363540 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:39:00,070-Speed 2512.73 samples/sec Loss 2.3940 LearningRate 0.000390 Epoch: 17 Global Step: 363550 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:39:08,268-Speed 2498.42 samples/sec Loss 2.4232 LearningRate 0.000390 Epoch: 17 Global Step: 363560 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:39:16,470-Speed 2497.61 samples/sec Loss 2.4148 LearningRate 0.000390 
Epoch: 17 Global Step: 363570 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:39:24,674-Speed 2496.65 samples/sec Loss 2.3437 LearningRate 0.000390 Epoch: 17 Global Step: 363580 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:39:32,878-Speed 2496.78 samples/sec Loss 2.3686 LearningRate 0.000390 Epoch: 17 Global Step: 363590 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:39:41,143-Speed 2495.33 samples/sec Loss 2.4108 LearningRate 0.000390 Epoch: 17 Global Step: 363600 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:39:49,338-Speed 2515.15 samples/sec Loss 2.3446 LearningRate 0.000389 Epoch: 17 Global Step: 363610 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:39:57,542-Speed 2496.73 samples/sec Loss 2.3912 LearningRate 0.000389 Epoch: 17 Global Step: 363620 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:40:05,750-Speed 2495.43 samples/sec Loss 2.3992 LearningRate 0.000389 Epoch: 17 Global Step: 363630 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:40:15,834-Speed 2038.66 samples/sec Loss 2.4057 LearningRate 0.000389 Epoch: 17 Global Step: 363640 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:40:24,059-Speed 2501.08 samples/sec Loss 2.3800 LearningRate 0.000389 Epoch: 17 Global Step: 363650 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:40:32,261-Speed 2497.15 samples/sec Loss 2.3804 LearningRate 0.000389 Epoch: 17 Global Step: 363660 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:40:40,441-Speed 2516.27 samples/sec Loss 2.3600 LearningRate 0.000389 Epoch: 17 Global Step: 363670 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:40:52,820-Speed 1662.16 samples/sec Loss 2.3406 LearningRate 0.000389 Epoch: 17 Global Step: 363680 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:41:01,078-Speed 2501.18 samples/sec Loss 2.4046 LearningRate 
0.000389 Epoch: 17 Global Step: 363690 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:41:09,272-Speed 2499.83 samples/sec Loss 2.3552 LearningRate 0.000389 Epoch: 17 Global Step: 363700 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:41:17,589-Speed 2500.59 samples/sec Loss 2.3496 LearningRate 0.000389 Epoch: 17 Global Step: 363710 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:41:25,943-Speed 2500.24 samples/sec Loss 2.3529 LearningRate 0.000389 Epoch: 17 Global Step: 363720 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:41:34,091-Speed 2513.82 samples/sec Loss 2.3786 LearningRate 0.000389 Epoch: 17 Global Step: 363730 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:41:42,338-Speed 2500.25 samples/sec Loss 2.3797 LearningRate 0.000389 Epoch: 17 Global Step: 363740 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:41:54,481-Speed 1701.62 samples/sec Loss 2.3983 LearningRate 0.000389 Epoch: 17 Global Step: 363750 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:42:02,718-Speed 2500.40 samples/sec Loss 2.3455 LearningRate 0.000389 Epoch: 17 Global Step: 363760 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:42:16,751-Speed 1459.53 samples/sec Loss 2.3500 LearningRate 0.000389 Epoch: 17 Global Step: 363770 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:42:25,008-Speed 2500.93 samples/sec Loss 2.4071 LearningRate 0.000389 Epoch: 17 Global Step: 363780 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:42:33,206-Speed 2514.87 samples/sec Loss 2.4296 LearningRate 0.000389 Epoch: 17 Global Step: 363790 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:42:41,480-Speed 2498.55 samples/sec Loss 2.3816 LearningRate 0.000389 Epoch: 17 Global Step: 363800 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:42:49,678-Speed 2499.63 samples/sec Loss 2.4332 
LearningRate 0.000389 Epoch: 17 Global Step: 363810 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:42:57,905-Speed 2500.62 samples/sec Loss 2.4164 LearningRate 0.000389 Epoch: 17 Global Step: 363820 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:43:06,100-Speed 2499.50 samples/sec Loss 2.4147 LearningRate 0.000389 Epoch: 17 Global Step: 363830 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:43:20,738-Speed 2470.75 samples/sec Loss 2.4185 LearningRate 0.000389 Epoch: 17 Global Step: 363840 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:43:28,898-Speed 2516.84 samples/sec Loss 2.3524 LearningRate 0.000389 Epoch: 17 Global Step: 363850 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:43:38,747-Speed 2079.59 samples/sec Loss 2.3750 LearningRate 0.000389 Epoch: 17 Global Step: 363860 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:43:46,943-Speed 2499.08 samples/sec Loss 2.3883 LearningRate 0.000389 Epoch: 17 Global Step: 363870 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:43:55,137-Speed 2500.06 samples/sec Loss 2.4086 LearningRate 0.000389 Epoch: 17 Global Step: 363880 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:44:03,333-Speed 2499.00 samples/sec Loss 2.3961 LearningRate 0.000389 Epoch: 17 Global Step: 363890 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:44:11,535-Speed 2497.47 samples/sec Loss 2.4089 LearningRate 0.000389 Epoch: 17 Global Step: 363900 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:44:19,680-Speed 2515.04 samples/sec Loss 2.4092 LearningRate 0.000389 Epoch: 17 Global Step: 363910 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:44:27,880-Speed 2497.84 samples/sec Loss 2.3880 LearningRate 0.000389 Epoch: 17 Global Step: 363920 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:44:36,090-Speed 2495.07 samples/sec Loss 
2.4160 LearningRate 0.000389 Epoch: 17 Global Step: 363930 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:44:44,291-Speed 2497.74 samples/sec Loss 2.3638 LearningRate 0.000389 Epoch: 17 Global Step: 363940 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:44:52,508-Speed 2492.72 samples/sec Loss 2.4295 LearningRate 0.000389 Epoch: 17 Global Step: 363950 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:45:00,709-Speed 2497.72 samples/sec Loss 2.3384 LearningRate 0.000389 Epoch: 17 Global Step: 363960 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:45:08,856-Speed 2513.99 samples/sec Loss 2.3676 LearningRate 0.000389 Epoch: 17 Global Step: 363970 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:45:17,059-Speed 2497.11 samples/sec Loss 2.3136 LearningRate 0.000389 Epoch: 17 Global Step: 363980 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:45:25,262-Speed 2497.13 samples/sec Loss 2.3552 LearningRate 0.000389 Epoch: 17 Global Step: 363990 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:45:33,474-Speed 2494.59 samples/sec Loss 2.3854 LearningRate 0.000389 Epoch: 17 Global Step: 364000 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:45:41,678-Speed 2496.95 samples/sec Loss 2.3611 LearningRate 0.000389 Epoch: 17 Global Step: 364010 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:45:49,888-Speed 2494.56 samples/sec Loss 2.3659 LearningRate 0.000389 Epoch: 17 Global Step: 364020 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:45:58,054-Speed 2509.07 samples/sec Loss 2.3495 LearningRate 0.000389 Epoch: 17 Global Step: 364030 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:46:06,258-Speed 2496.60 samples/sec Loss 2.3805 LearningRate 0.000389 Epoch: 17 Global Step: 364040 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:46:14,467-Speed 2495.25 samples/sec 
Loss 2.3483 LearningRate 0.000389 Epoch: 17 Global Step: 364050 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:46:22,670-Speed 2496.97 samples/sec Loss 2.3597 LearningRate 0.000389 Epoch: 17 Global Step: 364060 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:46:30,872-Speed 2497.79 samples/sec Loss 2.3265 LearningRate 0.000389 Epoch: 17 Global Step: 364070 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:46:39,077-Speed 2496.25 samples/sec Loss 2.3812 LearningRate 0.000389 Epoch: 17 Global Step: 364080 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:46:47,227-Speed 2513.40 samples/sec Loss 2.3905 LearningRate 0.000389 Epoch: 17 Global Step: 364090 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:46:55,428-Speed 2497.59 samples/sec Loss 2.3749 LearningRate 0.000389 Epoch: 17 Global Step: 364100 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:47:03,635-Speed 2495.99 samples/sec Loss 2.3789 LearningRate 0.000389 Epoch: 17 Global Step: 364110 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:47:11,834-Speed 2498.28 samples/sec Loss 2.3729 LearningRate 0.000389 Epoch: 17 Global Step: 364120 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:47:20,035-Speed 2497.45 samples/sec Loss 2.3088 LearningRate 0.000389 Epoch: 17 Global Step: 364130 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:47:28,248-Speed 2494.10 samples/sec Loss 2.3189 LearningRate 0.000389 Epoch: 17 Global Step: 364140 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:47:36,403-Speed 2511.89 samples/sec Loss 2.3562 LearningRate 0.000389 Epoch: 17 Global Step: 364150 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:47:44,620-Speed 2492.69 samples/sec Loss 2.3979 LearningRate 0.000389 Epoch: 17 Global Step: 364160 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:47:52,825-Speed 2496.68 
samples/sec Loss 2.3646 LearningRate 0.000389 Epoch: 17 Global Step: 364170 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:48:01,022-Speed 2498.57 samples/sec Loss 2.3619 LearningRate 0.000389 Epoch: 17 Global Step: 364180 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:48:09,229-Speed 2495.63 samples/sec Loss 2.3856 LearningRate 0.000389 Epoch: 17 Global Step: 364190 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:48:17,431-Speed 2497.65 samples/sec Loss 2.4113 LearningRate 0.000389 Epoch: 17 Global Step: 364200 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:48:25,582-Speed 2513.02 samples/sec Loss 2.3876 LearningRate 0.000388 Epoch: 17 Global Step: 364210 Fp16 Grad Scale: 16384 Required: 107 hours Training: 2022-07-09 01:48:33,796-Speed 2493.65 samples/sec Loss 2.3558 LearningRate 0.000388 Epoch: 17 Global Step: 364220 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:48:42,000-Speed 2496.74 samples/sec Loss 2.3808 LearningRate 0.000388 Epoch: 17 Global Step: 364230 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:48:50,202-Speed 2497.23 samples/sec Loss 2.3744 LearningRate 0.000388 Epoch: 17 Global Step: 364240 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:48:58,409-Speed 2495.75 samples/sec Loss 2.3739 LearningRate 0.000388 Epoch: 17 Global Step: 364250 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:49:06,610-Speed 2498.21 samples/sec Loss 2.3528 LearningRate 0.000388 Epoch: 17 Global Step: 364260 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:49:14,764-Speed 2511.74 samples/sec Loss 2.3752 LearningRate 0.000388 Epoch: 17 Global Step: 364270 Fp16 Grad Scale: 32768 Required: 107 hours Training: 2022-07-09 01:49:22,973-Speed 2495.44 samples/sec Loss 2.3658 LearningRate 0.000388 Epoch: 17 Global Step: 364280 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:49:31,198-Speed 
2490.29 samples/sec Loss 2.4085 LearningRate 0.000388 Epoch: 17 Global Step: 364290 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:49:39,417-Speed 2492.38 samples/sec Loss 2.4384 LearningRate 0.000388 Epoch: 17 Global Step: 364300 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:49:47,631-Speed 2493.53 samples/sec Loss 2.3614 LearningRate 0.000388 Epoch: 17 Global Step: 364310 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:49:55,839-Speed 2495.46 samples/sec Loss 2.4369 LearningRate 0.000388 Epoch: 17 Global Step: 364320 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:50:03,988-Speed 2513.66 samples/sec Loss 2.3838 LearningRate 0.000388 Epoch: 17 Global Step: 364330 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:50:12,195-Speed 2495.78 samples/sec Loss 2.3903 LearningRate 0.000388 Epoch: 17 Global Step: 364340 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:50:20,396-Speed 2497.44 samples/sec Loss 2.4301 LearningRate 0.000388 Epoch: 17 Global Step: 364350 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:50:28,597-Speed 2497.80 samples/sec Loss 2.4042 LearningRate 0.000388 Epoch: 17 Global Step: 364360 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:50:36,803-Speed 2496.10 samples/sec Loss 2.4230 LearningRate 0.000388 Epoch: 17 Global Step: 364370 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:50:45,010-Speed 2495.77 samples/sec Loss 2.3352 LearningRate 0.000388 Epoch: 17 Global Step: 364380 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:50:53,164-Speed 2512.12 samples/sec Loss 2.4439 LearningRate 0.000388 Epoch: 17 Global Step: 364390 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:51:01,367-Speed 2497.41 samples/sec Loss 2.4099 LearningRate 0.000388 Epoch: 17 Global Step: 364400 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 
01:51:09,569-Speed 2497.39 samples/sec Loss 2.3326 LearningRate 0.000388 Epoch: 17 Global Step: 364410 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:51:17,773-Speed 2496.71 samples/sec Loss 2.4214 LearningRate 0.000388 Epoch: 17 Global Step: 364420 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:51:25,983-Speed 2494.79 samples/sec Loss 2.4215 LearningRate 0.000388 Epoch: 17 Global Step: 364430 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:51:34,193-Speed 2495.07 samples/sec Loss 2.4225 LearningRate 0.000388 Epoch: 17 Global Step: 364440 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:51:42,342-Speed 2513.36 samples/sec Loss 2.4800 LearningRate 0.000388 Epoch: 17 Global Step: 364450 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:51:50,546-Speed 2496.94 samples/sec Loss 2.4174 LearningRate 0.000388 Epoch: 17 Global Step: 364460 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:51:58,751-Speed 2496.32 samples/sec Loss 2.3553 LearningRate 0.000388 Epoch: 17 Global Step: 364470 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:52:06,957-Speed 2495.98 samples/sec Loss 2.4033 LearningRate 0.000388 Epoch: 17 Global Step: 364480 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:52:15,160-Speed 2497.55 samples/sec Loss 2.3778 LearningRate 0.000388 Epoch: 17 Global Step: 364490 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:52:23,363-Speed 2496.86 samples/sec Loss 2.3445 LearningRate 0.000388 Epoch: 17 Global Step: 364500 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:52:31,514-Speed 2513.02 samples/sec Loss 2.4009 LearningRate 0.000388 Epoch: 17 Global Step: 364510 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:52:39,719-Speed 2496.41 samples/sec Loss 2.3924 LearningRate 0.000388 Epoch: 17 Global Step: 364520 Fp16 Grad Scale: 32768 Required: 106 hours Training: 
2022-07-09 01:52:47,922-Speed 2497.11 samples/sec Loss 2.3930 LearningRate 0.000388 Epoch: 17 Global Step: 364530 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:52:56,137-Speed 2493.31 samples/sec Loss 2.3963 LearningRate 0.000388 Epoch: 17 Global Step: 364540 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:53:04,341-Speed 2496.55 samples/sec Loss 2.4026 LearningRate 0.000388 Epoch: 17 Global Step: 364550 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:53:12,538-Speed 2498.78 samples/sec Loss 2.3366 LearningRate 0.000388 Epoch: 17 Global Step: 364560 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:53:20,687-Speed 2513.63 samples/sec Loss 2.3929 LearningRate 0.000388 Epoch: 17 Global Step: 364570 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:53:28,890-Speed 2496.87 samples/sec Loss 2.3873 LearningRate 0.000388 Epoch: 17 Global Step: 364580 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:53:37,096-Speed 2496.31 samples/sec Loss 2.4056 LearningRate 0.000388 Epoch: 17 Global Step: 364590 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:53:45,304-Speed 2495.75 samples/sec Loss 2.3713 LearningRate 0.000388 Epoch: 17 Global Step: 364600 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:53:53,506-Speed 2496.99 samples/sec Loss 2.3606 LearningRate 0.000388 Epoch: 17 Global Step: 364610 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:54:01,713-Speed 2495.87 samples/sec Loss 2.3125 LearningRate 0.000388 Epoch: 17 Global Step: 364620 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:54:09,865-Speed 2512.63 samples/sec Loss 2.4367 LearningRate 0.000388 Epoch: 17 Global Step: 364630 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:54:18,073-Speed 2495.78 samples/sec Loss 2.4523 LearningRate 0.000388 Epoch: 17 Global Step: 364640 Fp16 Grad Scale: 32768 Required: 106 hours 
Training: 2022-07-09 01:54:26,295-Speed 2490.98 samples/sec Loss 2.3647 LearningRate 0.000388 Epoch: 17 Global Step: 364650 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:54:34,503-Speed 2495.55 samples/sec Loss 2.3882 LearningRate 0.000388 Epoch: 17 Global Step: 364660 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:54:42,708-Speed 2496.61 samples/sec Loss 2.3913 LearningRate 0.000388 Epoch: 17 Global Step: 364670 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:54:50,917-Speed 2495.39 samples/sec Loss 2.3889 LearningRate 0.000388 Epoch: 17 Global Step: 364680 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:54:59,064-Speed 2513.94 samples/sec Loss 2.4087 LearningRate 0.000388 Epoch: 17 Global Step: 364690 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:55:07,274-Speed 2494.86 samples/sec Loss 2.4058 LearningRate 0.000388 Epoch: 17 Global Step: 364700 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:55:15,477-Speed 2497.05 samples/sec Loss 2.3745 LearningRate 0.000388 Epoch: 17 Global Step: 364710 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:55:23,680-Speed 2496.95 samples/sec Loss 2.3100 LearningRate 0.000388 Epoch: 17 Global Step: 364720 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:55:31,888-Speed 2495.57 samples/sec Loss 2.3941 LearningRate 0.000388 Epoch: 17 Global Step: 364730 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:55:40,095-Speed 2496.01 samples/sec Loss 2.3884 LearningRate 0.000388 Epoch: 17 Global Step: 364740 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:55:48,246-Speed 2513.17 samples/sec Loss 2.3729 LearningRate 0.000388 Epoch: 17 Global Step: 364750 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:55:56,445-Speed 2497.97 samples/sec Loss 2.3420 LearningRate 0.000388 Epoch: 17 Global Step: 364760 Fp16 Grad Scale: 32768 Required: 106 
hours Training: 2022-07-09 01:56:04,660-Speed 2493.75 samples/sec Loss 2.4212 LearningRate 0.000388 Epoch: 17 Global Step: 364770 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:56:12,862-Speed 2497.23 samples/sec Loss 2.4237 LearningRate 0.000388 Epoch: 17 Global Step: 364780 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:56:21,068-Speed 2496.44 samples/sec Loss 2.3899 LearningRate 0.000388 Epoch: 17 Global Step: 364790 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:56:29,269-Speed 2497.31 samples/sec Loss 2.4245 LearningRate 0.000388 Epoch: 17 Global Step: 364800 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:56:37,423-Speed 2512.24 samples/sec Loss 2.4341 LearningRate 0.000387 Epoch: 17 Global Step: 364810 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:56:45,626-Speed 2497.05 samples/sec Loss 2.4185 LearningRate 0.000387 Epoch: 17 Global Step: 364820 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:56:53,826-Speed 2497.85 samples/sec Loss 2.4000 LearningRate 0.000387 Epoch: 17 Global Step: 364830 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:57:02,038-Speed 2494.49 samples/sec Loss 2.3860 LearningRate 0.000387 Epoch: 17 Global Step: 364840 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:57:10,244-Speed 2496.04 samples/sec Loss 2.3826 LearningRate 0.000387 Epoch: 17 Global Step: 364850 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:57:18,446-Speed 2497.41 samples/sec Loss 2.3891 LearningRate 0.000387 Epoch: 17 Global Step: 364860 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:57:26,597-Speed 2513.00 samples/sec Loss 2.3509 LearningRate 0.000387 Epoch: 17 Global Step: 364870 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:57:34,818-Speed 2491.53 samples/sec Loss 2.4167 LearningRate 0.000387 Epoch: 17 Global Step: 364880 Fp16 Grad Scale: 32768 Required: 
106 hours Training: 2022-07-09 01:57:43,036-Speed 2492.35 samples/sec Loss 2.4214 LearningRate 0.000387 Epoch: 17 Global Step: 364890 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:57:51,239-Speed 2497.16 samples/sec Loss 2.3646 LearningRate 0.000387 Epoch: 17 Global Step: 364900 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:57:59,440-Speed 2497.70 samples/sec Loss 2.3480 LearningRate 0.000387 Epoch: 17 Global Step: 364910 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:58:07,643-Speed 2497.09 samples/sec Loss 2.3767 LearningRate 0.000387 Epoch: 17 Global Step: 364920 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:58:15,795-Speed 2512.58 samples/sec Loss 2.3794 LearningRate 0.000387 Epoch: 17 Global Step: 364930 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:58:24,001-Speed 2496.11 samples/sec Loss 2.3497 LearningRate 0.000387 Epoch: 17 Global Step: 364940 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:58:32,202-Speed 2497.66 samples/sec Loss 2.3827 LearningRate 0.000387 Epoch: 17 Global Step: 364950 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:58:40,406-Speed 2496.70 samples/sec Loss 2.4153 LearningRate 0.000387 Epoch: 17 Global Step: 364960 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:58:48,627-Speed 2492.54 samples/sec Loss 2.3538 LearningRate 0.000387 Epoch: 17 Global Step: 364970 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:58:56,828-Speed 2497.59 samples/sec Loss 2.4013 LearningRate 0.000387 Epoch: 17 Global Step: 364980 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:59:04,989-Speed 2509.97 samples/sec Loss 2.3867 LearningRate 0.000387 Epoch: 17 Global Step: 364990 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:59:13,194-Speed 2496.36 samples/sec Loss 2.4044 LearningRate 0.000387 Epoch: 17 Global Step: 365000 Fp16 Grad Scale: 32768 
Required: 106 hours Training: 2022-07-09 01:59:21,395-Speed 2497.53 samples/sec Loss 2.3732 LearningRate 0.000387 Epoch: 17 Global Step: 365010 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:59:29,599-Speed 2496.83 samples/sec Loss 2.3979 LearningRate 0.000387 Epoch: 17 Global Step: 365020 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:59:37,803-Speed 2496.55 samples/sec Loss 2.3228 LearningRate 0.000387 Epoch: 17 Global Step: 365030 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:59:46,005-Speed 2497.54 samples/sec Loss 2.3755 LearningRate 0.000387 Epoch: 17 Global Step: 365040 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 01:59:54,161-Speed 2511.62 samples/sec Loss 2.3556 LearningRate 0.000387 Epoch: 17 Global Step: 365050 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:00:02,377-Speed 2493.26 samples/sec Loss 2.3534 LearningRate 0.000387 Epoch: 17 Global Step: 365060 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:00:10,584-Speed 2495.81 samples/sec Loss 2.4025 LearningRate 0.000387 Epoch: 17 Global Step: 365070 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:00:18,800-Speed 2492.92 samples/sec Loss 2.3874 LearningRate 0.000387 Epoch: 17 Global Step: 365080 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:00:27,000-Speed 2497.83 samples/sec Loss 2.4066 LearningRate 0.000387 Epoch: 17 Global Step: 365090 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:00:35,202-Speed 2497.49 samples/sec Loss 2.4104 LearningRate 0.000387 Epoch: 17 Global Step: 365100 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:00:43,349-Speed 2514.13 samples/sec Loss 2.3420 LearningRate 0.000387 Epoch: 17 Global Step: 365110 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:00:51,551-Speed 2497.81 samples/sec Loss 2.3527 LearningRate 0.000387 Epoch: 17 Global Step: 365120 Fp16 Grad Scale: 
32768 Required: 106 hours Training: 2022-07-09 02:00:59,753-Speed 2497.30 samples/sec Loss 2.4020 LearningRate 0.000387 Epoch: 17 Global Step: 365130 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:01:07,955-Speed 2497.39 samples/sec Loss 2.3607 LearningRate 0.000387 Epoch: 17 Global Step: 365140 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:01:16,157-Speed 2497.33 samples/sec Loss 2.3749 LearningRate 0.000387 Epoch: 17 Global Step: 365150 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:01:24,360-Speed 2496.97 samples/sec Loss 2.3632 LearningRate 0.000387 Epoch: 17 Global Step: 365160 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:01:32,512-Speed 2512.86 samples/sec Loss 2.4002 LearningRate 0.000387 Epoch: 17 Global Step: 365170 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:01:40,713-Speed 2497.47 samples/sec Loss 2.4237 LearningRate 0.000387 Epoch: 17 Global Step: 365180 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:01:48,918-Speed 2497.11 samples/sec Loss 2.4038 LearningRate 0.000387 Epoch: 17 Global Step: 365190 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:01:57,126-Speed 2495.39 samples/sec Loss 2.3647 LearningRate 0.000387 Epoch: 17 Global Step: 365200 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:02:05,330-Speed 2496.72 samples/sec Loss 2.3650 LearningRate 0.000387 Epoch: 17 Global Step: 365210 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:02:13,546-Speed 2493.24 samples/sec Loss 2.3771 LearningRate 0.000387 Epoch: 17 Global Step: 365220 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:02:21,701-Speed 2511.84 samples/sec Loss 2.4034 LearningRate 0.000387 Epoch: 17 Global Step: 365230 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:02:29,907-Speed 2496.27 samples/sec Loss 2.3828 LearningRate 0.000387 Epoch: 17 Global Step: 365240 Fp16 Grad 
Scale: 32768 Required: 106 hours Training: 2022-07-09 02:02:38,116-Speed 2495.17 samples/sec Loss 2.3919 LearningRate 0.000387 Epoch: 17 Global Step: 365250 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:02:46,319-Speed 2496.86 samples/sec Loss 2.3637 LearningRate 0.000387 Epoch: 17 Global Step: 365260 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:02:54,522-Speed 2497.10 samples/sec Loss 2.3931 LearningRate 0.000387 Epoch: 17 Global Step: 365270 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:03:02,736-Speed 2493.78 samples/sec Loss 2.4370 LearningRate 0.000387 Epoch: 17 Global Step: 365280 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:03:10,883-Speed 2514.15 samples/sec Loss 2.4482 LearningRate 0.000387 Epoch: 17 Global Step: 365290 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:03:19,084-Speed 2497.62 samples/sec Loss 2.3913 LearningRate 0.000387 Epoch: 17 Global Step: 365300 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:03:27,285-Speed 2497.46 samples/sec Loss 2.4198 LearningRate 0.000387 Epoch: 17 Global Step: 365310 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:03:35,490-Speed 2496.62 samples/sec Loss 2.4542 LearningRate 0.000387 Epoch: 17 Global Step: 365320 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:03:43,690-Speed 2498.00 samples/sec Loss 2.4197 LearningRate 0.000387 Epoch: 17 Global Step: 365330 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:03:51,891-Speed 2497.42 samples/sec Loss 2.4320 LearningRate 0.000387 Epoch: 17 Global Step: 365340 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:04:00,044-Speed 2512.44 samples/sec Loss 2.4001 LearningRate 0.000387 Epoch: 17 Global Step: 365350 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:04:08,245-Speed 2497.54 samples/sec Loss 2.4346 LearningRate 0.000387 Epoch: 17 Global Step: 365360 Fp16 
Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:04:16,454-Speed 2495.21 samples/sec Loss 2.4854 LearningRate 0.000387 Epoch: 17 Global Step: 365370 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:04:24,656-Speed 2497.36 samples/sec Loss 2.4227 LearningRate 0.000387 Epoch: 17 Global Step: 365380 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:04:32,861-Speed 2496.39 samples/sec Loss 2.4417 LearningRate 0.000387 Epoch: 17 Global Step: 365390 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:04:41,061-Speed 2498.01 samples/sec Loss 2.4755 LearningRate 0.000387 Epoch: 17 Global Step: 365400 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:04:49,208-Speed 2514.15 samples/sec Loss 2.3947 LearningRate 0.000386 Epoch: 17 Global Step: 365410 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:04:57,413-Speed 2496.74 samples/sec Loss 2.4314 LearningRate 0.000386 Epoch: 17 Global Step: 365420 Fp16 Grad Scale: 65536 Required: 106 hours Training: 2022-07-09 02:05:05,622-Speed 2495.55 samples/sec Loss 2.3750 LearningRate 0.000386 Epoch: 17 Global Step: 365430 Fp16 Grad Scale: 65536 Required: 106 hours Training: 2022-07-09 02:05:13,824-Speed 2497.45 samples/sec Loss 2.4223 LearningRate 0.000386 Epoch: 17 Global Step: 365440 Fp16 Grad Scale: 65536 Required: 106 hours Training: 2022-07-09 02:05:22,026-Speed 2497.28 samples/sec Loss 2.4187 LearningRate 0.000386 Epoch: 17 Global Step: 365450 Fp16 Grad Scale: 65536 Required: 106 hours Training: 2022-07-09 02:05:30,232-Speed 2496.03 samples/sec Loss 2.3904 LearningRate 0.000386 Epoch: 17 Global Step: 365460 Fp16 Grad Scale: 65536 Required: 106 hours Training: 2022-07-09 02:05:38,402-Speed 2507.31 samples/sec Loss 2.4474 LearningRate 0.000386 Epoch: 17 Global Step: 365470 Fp16 Grad Scale: 65536 Required: 106 hours Training: 2022-07-09 02:05:46,567-Speed 2508.73 samples/sec Loss 2.4209 LearningRate 0.000386 Epoch: 17 Global Step: 365480 
Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:05:54,769-Speed 2497.08 samples/sec Loss 2.4600 LearningRate 0.000386 Epoch: 17 Global Step: 365490 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:06:02,978-Speed 2495.51 samples/sec Loss 2.4197 LearningRate 0.000386 Epoch: 17 Global Step: 365500 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:06:11,184-Speed 2496.19 samples/sec Loss 2.3826 LearningRate 0.000386 Epoch: 17 Global Step: 365510 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:06:19,390-Speed 2496.66 samples/sec Loss 2.4730 LearningRate 0.000386 Epoch: 17 Global Step: 365520 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:06:27,542-Speed 2512.47 samples/sec Loss 2.3821 LearningRate 0.000386 Epoch: 17 Global Step: 365530 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:06:35,746-Speed 2497.01 samples/sec Loss 2.3633 LearningRate 0.000386 Epoch: 17 Global Step: 365540 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:06:43,946-Speed 2497.56 samples/sec Loss 2.4176 LearningRate 0.000386 Epoch: 17 Global Step: 365550 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:06:52,177-Speed 2488.97 samples/sec Loss 2.4617 LearningRate 0.000386 Epoch: 17 Global Step: 365560 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:07:00,378-Speed 2497.71 samples/sec Loss 2.4227 LearningRate 0.000386 Epoch: 17 Global Step: 365570 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:07:08,580-Speed 2497.34 samples/sec Loss 2.4092 LearningRate 0.000386 Epoch: 17 Global Step: 365580 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:07:16,729-Speed 2513.81 samples/sec Loss 2.4507 LearningRate 0.000386 Epoch: 17 Global Step: 365590 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:07:24,934-Speed 2496.51 samples/sec Loss 2.4011 LearningRate 0.000386 Epoch: 17 Global Step: 
365600 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:07:33,135-Speed 2497.42 samples/sec Loss 2.3850 LearningRate 0.000386 Epoch: 17 Global Step: 365610 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:07:41,341-Speed 2496.21 samples/sec Loss 2.3646 LearningRate 0.000386 Epoch: 17 Global Step: 365620 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:07:49,544-Speed 2496.87 samples/sec Loss 2.3338 LearningRate 0.000386 Epoch: 17 Global Step: 365630 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:07:57,747-Speed 2497.13 samples/sec Loss 2.3983 LearningRate 0.000386 Epoch: 17 Global Step: 365640 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:08:05,895-Speed 2513.79 samples/sec Loss 2.3559 LearningRate 0.000386 Epoch: 17 Global Step: 365650 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:08:14,102-Speed 2495.94 samples/sec Loss 2.4040 LearningRate 0.000386 Epoch: 17 Global Step: 365660 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:08:22,332-Speed 2488.86 samples/sec Loss 2.3733 LearningRate 0.000386 Epoch: 17 Global Step: 365670 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:08:30,536-Speed 2496.87 samples/sec Loss 2.3208 LearningRate 0.000386 Epoch: 17 Global Step: 365680 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:08:38,735-Speed 2498.17 samples/sec Loss 2.3615 LearningRate 0.000386 Epoch: 17 Global Step: 365690 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:08:46,944-Speed 2495.47 samples/sec Loss 2.3671 LearningRate 0.000386 Epoch: 17 Global Step: 365700 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:08:55,129-Speed 2502.97 samples/sec Loss 2.3931 LearningRate 0.000386 Epoch: 17 Global Step: 365710 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:09:03,331-Speed 2497.56 samples/sec Loss 2.3948 LearningRate 0.000386 Epoch: 17 Global 
Step: 365720 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:09:11,533-Speed 2497.38 samples/sec Loss 2.4352 LearningRate 0.000386 Epoch: 17 Global Step: 365730 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:09:19,735-Speed 2497.35 samples/sec Loss 2.3783 LearningRate 0.000386 Epoch: 17 Global Step: 365740 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:09:27,939-Speed 2496.72 samples/sec Loss 2.3741 LearningRate 0.000386 Epoch: 17 Global Step: 365750 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:09:36,143-Speed 2496.85 samples/sec Loss 2.3826 LearningRate 0.000386 Epoch: 17 Global Step: 365760 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:09:44,294-Speed 2512.89 samples/sec Loss 2.4062 LearningRate 0.000386 Epoch: 17 Global Step: 365770 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:09:52,500-Speed 2496.17 samples/sec Loss 2.4119 LearningRate 0.000386 Epoch: 17 Global Step: 365780 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:10:00,706-Speed 2496.03 samples/sec Loss 2.3783 LearningRate 0.000386 Epoch: 17 Global Step: 365790 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:10:08,910-Speed 2496.85 samples/sec Loss 2.4148 LearningRate 0.000386 Epoch: 17 Global Step: 365800 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:10:17,110-Speed 2497.93 samples/sec Loss 2.4045 LearningRate 0.000386 Epoch: 17 Global Step: 365810 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:10:25,310-Speed 2497.64 samples/sec Loss 2.3368 LearningRate 0.000386 Epoch: 17 Global Step: 365820 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:10:33,462-Speed 2512.86 samples/sec Loss 2.3776 LearningRate 0.000386 Epoch: 17 Global Step: 365830 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:10:41,667-Speed 2496.55 samples/sec Loss 2.3588 LearningRate 0.000386 Epoch: 17 
Global Step: 365840 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:10:49,870-Speed 2496.80 samples/sec Loss 2.3451 LearningRate 0.000386 Epoch: 17 Global Step: 365850 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:10:58,031-Speed 2509.92 samples/sec Loss 2.3817 LearningRate 0.000386 Epoch: 17 Global Step: 365860 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:11:06,242-Speed 2494.62 samples/sec Loss 2.4074 LearningRate 0.000386 Epoch: 17 Global Step: 365870 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:11:14,448-Speed 2496.26 samples/sec Loss 2.3705 LearningRate 0.000386 Epoch: 17 Global Step: 365880 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:11:22,596-Speed 2513.72 samples/sec Loss 2.4023 LearningRate 0.000386 Epoch: 17 Global Step: 365890 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:11:30,797-Speed 2497.42 samples/sec Loss 2.4049 LearningRate 0.000386 Epoch: 17 Global Step: 365900 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:11:38,999-Speed 2497.66 samples/sec Loss 2.3586 LearningRate 0.000386 Epoch: 17 Global Step: 365910 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:11:47,210-Speed 2494.41 samples/sec Loss 2.4043 LearningRate 0.000386 Epoch: 17 Global Step: 365920 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:11:55,413-Speed 2497.05 samples/sec Loss 2.3582 LearningRate 0.000386 Epoch: 17 Global Step: 365930 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:12:03,615-Speed 2497.17 samples/sec Loss 2.4171 LearningRate 0.000386 Epoch: 17 Global Step: 365940 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:12:11,764-Speed 2513.87 samples/sec Loss 2.3754 LearningRate 0.000386 Epoch: 17 Global Step: 365950 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:12:19,966-Speed 2497.34 samples/sec Loss 2.3208 LearningRate 0.000386 
Epoch: 17 Global Step: 365960 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:12:28,171-Speed 2496.59 samples/sec Loss 2.3173 LearningRate 0.000386 Epoch: 17 Global Step: 365970 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:12:36,377-Speed 2496.09 samples/sec Loss 2.3188 LearningRate 0.000386 Epoch: 17 Global Step: 365980 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:12:44,581-Speed 2496.95 samples/sec Loss 2.3514 LearningRate 0.000386 Epoch: 17 Global Step: 365990 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:12:52,810-Speed 2489.09 samples/sec Loss 2.3317 LearningRate 0.000386 Epoch: 17 Global Step: 366000 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:13:00,963-Speed 2512.33 samples/sec Loss 2.3592 LearningRate 0.000385 Epoch: 17 Global Step: 366010 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:13:09,167-Speed 2496.86 samples/sec Loss 2.3457 LearningRate 0.000385 Epoch: 17 Global Step: 366020 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:13:17,367-Speed 2497.91 samples/sec Loss 2.3617 LearningRate 0.000385 Epoch: 17 Global Step: 366030 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:13:25,571-Speed 2496.61 samples/sec Loss 2.4200 LearningRate 0.000385 Epoch: 17 Global Step: 366040 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:13:33,776-Speed 2496.51 samples/sec Loss 2.3362 LearningRate 0.000385 Epoch: 17 Global Step: 366050 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:13:41,981-Speed 2496.48 samples/sec Loss 2.4040 LearningRate 0.000385 Epoch: 17 Global Step: 366060 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:13:50,130-Speed 2513.26 samples/sec Loss 2.3676 LearningRate 0.000385 Epoch: 17 Global Step: 366070 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:13:58,330-Speed 2498.06 samples/sec Loss 2.3943 LearningRate 
0.000385 Epoch: 17 Global Step: 366080 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:14:06,533-Speed 2497.24 samples/sec Loss 2.4045 LearningRate 0.000385 Epoch: 17 Global Step: 366090 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:14:14,737-Speed 2496.71 samples/sec Loss 2.3959 LearningRate 0.000385 Epoch: 17 Global Step: 366100 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:14:22,938-Speed 2497.58 samples/sec Loss 2.4205 LearningRate 0.000385 Epoch: 17 Global Step: 366110 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:14:31,144-Speed 2496.39 samples/sec Loss 2.3734 LearningRate 0.000385 Epoch: 17 Global Step: 366120 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:14:39,295-Speed 2512.98 samples/sec Loss 2.4189 LearningRate 0.000385 Epoch: 17 Global Step: 366130 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:14:47,495-Speed 2498.09 samples/sec Loss 2.3818 LearningRate 0.000385 Epoch: 17 Global Step: 366140 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:14:55,699-Speed 2496.65 samples/sec Loss 2.3423 LearningRate 0.000385 Epoch: 17 Global Step: 366150 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:15:03,899-Speed 2498.27 samples/sec Loss 2.2974 LearningRate 0.000385 Epoch: 17 Global Step: 366160 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:15:12,107-Speed 2495.67 samples/sec Loss 2.3876 LearningRate 0.000385 Epoch: 17 Global Step: 366170 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:15:20,309-Speed 2497.42 samples/sec Loss 2.3363 LearningRate 0.000385 Epoch: 17 Global Step: 366180 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:15:28,468-Speed 2510.56 samples/sec Loss 2.3768 LearningRate 0.000385 Epoch: 17 Global Step: 366190 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:15:36,673-Speed 2496.40 samples/sec Loss 2.3538 
LearningRate 0.000385 Epoch: 17 Global Step: 366200 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:15:44,877-Speed 2496.65 samples/sec Loss 2.3595 LearningRate 0.000385 Epoch: 17 Global Step: 366210 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:15:53,079-Speed 2497.39 samples/sec Loss 2.3649 LearningRate 0.000385 Epoch: 17 Global Step: 366220 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:16:01,283-Speed 2496.65 samples/sec Loss 2.3836 LearningRate 0.000385 Epoch: 17 Global Step: 366230 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:16:09,487-Speed 2496.79 samples/sec Loss 2.3924 LearningRate 0.000385 Epoch: 17 Global Step: 366240 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:16:17,638-Speed 2513.32 samples/sec Loss 2.3164 LearningRate 0.000385 Epoch: 17 Global Step: 366250 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:16:25,839-Speed 2497.44 samples/sec Loss 2.3836 LearningRate 0.000385 Epoch: 17 Global Step: 366260 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:16:34,044-Speed 2496.49 samples/sec Loss 2.3297 LearningRate 0.000385 Epoch: 17 Global Step: 366270 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:16:42,246-Speed 2497.19 samples/sec Loss 2.4000 LearningRate 0.000385 Epoch: 17 Global Step: 366280 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:16:50,449-Speed 2497.07 samples/sec Loss 2.3988 LearningRate 0.000385 Epoch: 17 Global Step: 366290 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:16:58,662-Speed 2494.19 samples/sec Loss 2.3834 LearningRate 0.000385 Epoch: 17 Global Step: 366300 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:17:06,817-Speed 2511.41 samples/sec Loss 2.3453 LearningRate 0.000385 Epoch: 17 Global Step: 366310 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:17:15,020-Speed 2497.21 samples/sec Loss 
2.3881 LearningRate 0.000385 Epoch: 17 Global Step: 366320 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:17:23,222-Speed 2497.40 samples/sec Loss 2.3721 LearningRate 0.000385 Epoch: 17 Global Step: 366330 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:17:31,423-Speed 2497.37 samples/sec Loss 2.3370 LearningRate 0.000385 Epoch: 17 Global Step: 366340 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:17:39,628-Speed 2496.77 samples/sec Loss 2.3564 LearningRate 0.000385 Epoch: 17 Global Step: 366350 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:17:47,843-Speed 2493.15 samples/sec Loss 2.3381 LearningRate 0.000385 Epoch: 17 Global Step: 366360 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:17:55,994-Speed 2513.17 samples/sec Loss 2.4011 LearningRate 0.000385 Epoch: 17 Global Step: 366370 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:18:04,202-Speed 2495.36 samples/sec Loss 2.4501 LearningRate 0.000385 Epoch: 17 Global Step: 366380 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:18:12,421-Speed 2492.65 samples/sec Loss 2.4054 LearningRate 0.000385 Epoch: 17 Global Step: 366390 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:18:20,624-Speed 2496.91 samples/sec Loss 2.4159 LearningRate 0.000385 Epoch: 17 Global Step: 366400 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:18:28,830-Speed 2496.28 samples/sec Loss 2.3659 LearningRate 0.000385 Epoch: 17 Global Step: 366410 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:18:37,050-Speed 2492.16 samples/sec Loss 2.3448 LearningRate 0.000385 Epoch: 17 Global Step: 366420 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:18:45,210-Speed 2510.26 samples/sec Loss 2.3885 LearningRate 0.000385 Epoch: 17 Global Step: 366430 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:18:53,415-Speed 2496.63 samples/sec 
Loss 2.3630 LearningRate 0.000385 Epoch: 17 Global Step: 366440 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:19:01,618-Speed 2497.03 samples/sec Loss 2.3680 LearningRate 0.000385 Epoch: 17 Global Step: 366450 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:19:09,817-Speed 2498.28 samples/sec Loss 2.3758 LearningRate 0.000385 Epoch: 17 Global Step: 366460 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:19:18,018-Speed 2497.42 samples/sec Loss 2.3951 LearningRate 0.000385 Epoch: 17 Global Step: 366470 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:19:26,221-Speed 2497.09 samples/sec Loss 2.4193 LearningRate 0.000385 Epoch: 17 Global Step: 366480 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:19:34,381-Speed 2510.32 samples/sec Loss 2.4088 LearningRate 0.000385 Epoch: 17 Global Step: 366490 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:19:42,583-Speed 2497.35 samples/sec Loss 2.4282 LearningRate 0.000385 Epoch: 17 Global Step: 366500 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:19:50,788-Speed 2496.45 samples/sec Loss 2.4198 LearningRate 0.000385 Epoch: 17 Global Step: 366510 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:19:58,989-Speed 2497.80 samples/sec Loss 2.3708 LearningRate 0.000385 Epoch: 17 Global Step: 366520 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:20:07,194-Speed 2496.30 samples/sec Loss 2.3660 LearningRate 0.000385 Epoch: 17 Global Step: 366530 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:20:15,395-Speed 2497.63 samples/sec Loss 2.3611 LearningRate 0.000385 Epoch: 17 Global Step: 366540 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:20:23,539-Speed 2515.21 samples/sec Loss 2.3828 LearningRate 0.000385 Epoch: 17 Global Step: 366550 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:20:31,738-Speed 2498.15 
samples/sec Loss 2.3779 LearningRate 0.000385 Epoch: 17 Global Step: 366560 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:20:39,937-Speed 2498.44 samples/sec Loss 2.4120 LearningRate 0.000385 Epoch: 17 Global Step: 366570 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:20:48,143-Speed 2495.91 samples/sec Loss 2.4225 LearningRate 0.000385 Epoch: 17 Global Step: 366580 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:20:56,346-Speed 2497.12 samples/sec Loss 2.3480 LearningRate 0.000385 Epoch: 17 Global Step: 366590 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:21:04,548-Speed 2497.35 samples/sec Loss 2.4185 LearningRate 0.000385 Epoch: 17 Global Step: 366600 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:21:12,701-Speed 2512.29 samples/sec Loss 2.3492 LearningRate 0.000384 Epoch: 17 Global Step: 366610 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:21:20,904-Speed 2497.07 samples/sec Loss 2.3788 LearningRate 0.000384 Epoch: 17 Global Step: 366620 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:21:29,118-Speed 2494.06 samples/sec Loss 2.3671 LearningRate 0.000384 Epoch: 17 Global Step: 366630 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:21:37,319-Speed 2497.50 samples/sec Loss 2.3557 LearningRate 0.000384 Epoch: 17 Global Step: 366640 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:21:45,519-Speed 2497.95 samples/sec Loss 2.4107 LearningRate 0.000384 Epoch: 17 Global Step: 366650 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:21:53,723-Speed 2496.61 samples/sec Loss 2.3493 LearningRate 0.000384 Epoch: 17 Global Step: 366660 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:22:01,872-Speed 2513.48 samples/sec Loss 2.3261 LearningRate 0.000384 Epoch: 17 Global Step: 366670 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:22:10,071-Speed 
2498.50 samples/sec Loss 2.3895 LearningRate 0.000384 Epoch: 17 Global Step: 366680 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:22:18,275-Speed 2496.61 samples/sec Loss 2.3572 LearningRate 0.000384 Epoch: 17 Global Step: 366690 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:22:26,475-Speed 2497.93 samples/sec Loss 2.3455 LearningRate 0.000384 Epoch: 17 Global Step: 366700 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:22:34,677-Speed 2497.41 samples/sec Loss 2.3445 LearningRate 0.000384 Epoch: 17 Global Step: 366710 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:22:42,898-Speed 2491.79 samples/sec Loss 2.3927 LearningRate 0.000384 Epoch: 17 Global Step: 366720 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:22:51,049-Speed 2512.88 samples/sec Loss 2.3632 LearningRate 0.000384 Epoch: 17 Global Step: 366730 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:22:59,252-Speed 2497.02 samples/sec Loss 2.4793 LearningRate 0.000384 Epoch: 17 Global Step: 366740 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:23:07,456-Speed 2496.78 samples/sec Loss 2.4026 LearningRate 0.000384 Epoch: 17 Global Step: 366750 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:23:15,660-Speed 2496.87 samples/sec Loss 2.3853 LearningRate 0.000384 Epoch: 17 Global Step: 366760 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:23:23,865-Speed 2496.36 samples/sec Loss 2.3969 LearningRate 0.000384 Epoch: 17 Global Step: 366770 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:23:32,065-Speed 2497.76 samples/sec Loss 2.3786 LearningRate 0.000384 Epoch: 17 Global Step: 366780 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:23:40,223-Speed 2511.11 samples/sec Loss 2.4153 LearningRate 0.000384 Epoch: 17 Global Step: 366790 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 
02:23:48,426-Speed 2497.07 samples/sec Loss 2.3800 LearningRate 0.000384 Epoch: 17 Global Step: 366800 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:23:56,629-Speed 2496.73 samples/sec Loss 2.3499 LearningRate 0.000384 Epoch: 17 Global Step: 366810 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:24:04,832-Speed 2496.94 samples/sec Loss 2.3690 LearningRate 0.000384 Epoch: 17 Global Step: 366820 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:24:13,033-Speed 2498.06 samples/sec Loss 2.3276 LearningRate 0.000384 Epoch: 17 Global Step: 366830 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:24:21,235-Speed 2497.14 samples/sec Loss 2.3534 LearningRate 0.000384 Epoch: 17 Global Step: 366840 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:24:29,410-Speed 2505.52 samples/sec Loss 2.3385 LearningRate 0.000384 Epoch: 17 Global Step: 366850 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:24:37,631-Speed 2491.74 samples/sec Loss 2.3677 LearningRate 0.000384 Epoch: 17 Global Step: 366860 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:24:45,835-Speed 2496.80 samples/sec Loss 2.3375 LearningRate 0.000384 Epoch: 17 Global Step: 366870 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:24:54,035-Speed 2497.97 samples/sec Loss 2.3346 LearningRate 0.000384 Epoch: 17 Global Step: 366880 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:25:02,235-Speed 2497.81 samples/sec Loss 2.3615 LearningRate 0.000384 Epoch: 17 Global Step: 366890 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:25:10,443-Speed 2495.72 samples/sec Loss 2.3147 LearningRate 0.000384 Epoch: 17 Global Step: 366900 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:25:18,604-Speed 2510.18 samples/sec Loss 2.3663 LearningRate 0.000384 Epoch: 17 Global Step: 366910 Fp16 Grad Scale: 16384 Required: 106 hours Training: 
2022-07-09 02:25:26,810-Speed 2496.14 samples/sec Loss 2.3632 LearningRate 0.000384 Epoch: 17 Global Step: 366920 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:25:35,011-Speed 2497.56 samples/sec Loss 2.3911 LearningRate 0.000384 Epoch: 17 Global Step: 366930 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:25:43,210-Speed 2498.17 samples/sec Loss 2.3672 LearningRate 0.000384 Epoch: 17 Global Step: 366940 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:25:51,410-Speed 2498.01 samples/sec Loss 2.3714 LearningRate 0.000384 Epoch: 17 Global Step: 366950 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:25:59,618-Speed 2495.59 samples/sec Loss 2.3766 LearningRate 0.000384 Epoch: 17 Global Step: 366960 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:26:07,783-Speed 2508.82 samples/sec Loss 2.4185 LearningRate 0.000384 Epoch: 17 Global Step: 366970 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:26:15,982-Speed 2498.08 samples/sec Loss 2.3907 LearningRate 0.000384 Epoch: 17 Global Step: 366980 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:26:24,184-Speed 2497.50 samples/sec Loss 2.3788 LearningRate 0.000384 Epoch: 17 Global Step: 366990 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:26:32,400-Speed 2493.08 samples/sec Loss 2.3624 LearningRate 0.000384 Epoch: 17 Global Step: 367000 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:26:40,608-Speed 2495.56 samples/sec Loss 2.4001 LearningRate 0.000384 Epoch: 17 Global Step: 367010 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:26:48,812-Speed 2496.59 samples/sec Loss 2.3541 LearningRate 0.000384 Epoch: 17 Global Step: 367020 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:26:56,969-Speed 2511.28 samples/sec Loss 2.4218 LearningRate 0.000384 Epoch: 17 Global Step: 367030 Fp16 Grad Scale: 16384 Required: 106 hours 
Training: 2022-07-09 02:27:05,168-Speed 2498.15 samples/sec Loss 2.3963 LearningRate 0.000384 Epoch: 17 Global Step: 367040 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:27:13,368-Speed 2497.93 samples/sec Loss 2.4714 LearningRate 0.000384 Epoch: 17 Global Step: 367050 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:27:21,566-Speed 2498.47 samples/sec Loss 2.3335 LearningRate 0.000384 Epoch: 17 Global Step: 367060 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:27:29,772-Speed 2496.42 samples/sec Loss 2.3944 LearningRate 0.000384 Epoch: 17 Global Step: 367070 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:27:37,973-Speed 2497.63 samples/sec Loss 2.4389 LearningRate 0.000384 Epoch: 17 Global Step: 367080 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:27:46,125-Speed 2512.67 samples/sec Loss 2.4189 LearningRate 0.000384 Epoch: 17 Global Step: 367090 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:27:54,329-Speed 2496.63 samples/sec Loss 2.4372 LearningRate 0.000384 Epoch: 17 Global Step: 367100 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:28:02,531-Speed 2497.58 samples/sec Loss 2.4157 LearningRate 0.000384 Epoch: 17 Global Step: 367110 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:28:10,736-Speed 2496.45 samples/sec Loss 2.3997 LearningRate 0.000384 Epoch: 17 Global Step: 367120 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:28:18,937-Speed 2497.47 samples/sec Loss 2.3633 LearningRate 0.000384 Epoch: 17 Global Step: 367130 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:28:27,137-Speed 2498.21 samples/sec Loss 2.4346 LearningRate 0.000384 Epoch: 17 Global Step: 367140 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:28:35,287-Speed 2513.22 samples/sec Loss 2.4046 LearningRate 0.000384 Epoch: 17 Global Step: 367150 Fp16 Grad Scale: 32768 Required: 106 
hours Training: 2022-07-09 02:28:43,490-Speed 2497.00 samples/sec Loss 2.3821 LearningRate 0.000384 Epoch: 17 Global Step: 367160 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:28:51,692-Speed 2497.13 samples/sec Loss 2.3875 LearningRate 0.000384 Epoch: 17 Global Step: 367170 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:28:59,904-Speed 2494.81 samples/sec Loss 2.4087 LearningRate 0.000384 Epoch: 17 Global Step: 367180 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:29:08,105-Speed 2497.78 samples/sec Loss 2.3756 LearningRate 0.000384 Epoch: 17 Global Step: 367190 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:29:16,308-Speed 2496.89 samples/sec Loss 2.3469 LearningRate 0.000384 Epoch: 17 Global Step: 367200 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:29:24,457-Speed 2513.78 samples/sec Loss 2.4024 LearningRate 0.000383 Epoch: 17 Global Step: 367210 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:29:32,660-Speed 2497.00 samples/sec Loss 2.4024 LearningRate 0.000383 Epoch: 17 Global Step: 367220 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:29:40,876-Speed 2493.04 samples/sec Loss 2.4010 LearningRate 0.000383 Epoch: 17 Global Step: 367230 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:29:49,080-Speed 2496.64 samples/sec Loss 2.3866 LearningRate 0.000383 Epoch: 17 Global Step: 367240 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:29:57,284-Speed 2496.98 samples/sec Loss 2.3622 LearningRate 0.000383 Epoch: 17 Global Step: 367250 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:30:05,493-Speed 2494.99 samples/sec Loss 2.4028 LearningRate 0.000383 Epoch: 17 Global Step: 367260 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:30:13,641-Speed 2514.21 samples/sec Loss 2.4084 LearningRate 0.000383 Epoch: 17 Global Step: 367270 Fp16 Grad Scale: 32768 Required: 
106 hours Training: 2022-07-09 02:30:21,852-Speed 2494.42 samples/sec Loss 2.3766 LearningRate 0.000383 Epoch: 17 Global Step: 367280 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:30:30,058-Speed 2496.31 samples/sec Loss 2.3834 LearningRate 0.000383 Epoch: 17 Global Step: 367290 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:30:38,275-Speed 2492.61 samples/sec Loss 2.4288 LearningRate 0.000383 Epoch: 17 Global Step: 367300 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:30:46,483-Speed 2495.68 samples/sec Loss 2.4104 LearningRate 0.000383 Epoch: 17 Global Step: 367310 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:30:54,701-Speed 2492.35 samples/sec Loss 2.4036 LearningRate 0.000383 Epoch: 17 Global Step: 367320 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:31:02,861-Speed 2510.28 samples/sec Loss 2.3455 LearningRate 0.000383 Epoch: 17 Global Step: 367330 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:31:11,065-Speed 2496.81 samples/sec Loss 2.3493 LearningRate 0.000383 Epoch: 17 Global Step: 367340 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:31:19,268-Speed 2496.83 samples/sec Loss 2.3660 LearningRate 0.000383 Epoch: 17 Global Step: 367350 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:31:27,474-Speed 2496.37 samples/sec Loss 2.4060 LearningRate 0.000383 Epoch: 17 Global Step: 367360 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:31:35,676-Speed 2497.15 samples/sec Loss 2.3606 LearningRate 0.000383 Epoch: 17 Global Step: 367370 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:31:43,877-Speed 2497.87 samples/sec Loss 2.3980 LearningRate 0.000383 Epoch: 17 Global Step: 367380 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:31:52,029-Speed 2512.77 samples/sec Loss 2.3534 LearningRate 0.000383 Epoch: 17 Global Step: 367390 Fp16 Grad Scale: 32768 
Required: 106 hours Training: 2022-07-09 02:32:00,246-Speed 2492.77 samples/sec Loss 2.3963 LearningRate 0.000383 Epoch: 17 Global Step: 367400 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:32:08,446-Speed 2498.04 samples/sec Loss 2.3615 LearningRate 0.000383 Epoch: 17 Global Step: 367410 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:32:16,654-Speed 2495.49 samples/sec Loss 2.4004 LearningRate 0.000383 Epoch: 17 Global Step: 367420 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:32:24,856-Speed 2497.67 samples/sec Loss 2.4028 LearningRate 0.000383 Epoch: 17 Global Step: 367430 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:32:33,072-Speed 2493.18 samples/sec Loss 2.3670 LearningRate 0.000383 Epoch: 17 Global Step: 367440 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:32:41,217-Speed 2514.79 samples/sec Loss 2.3687 LearningRate 0.000383 Epoch: 17 Global Step: 367450 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:32:49,417-Speed 2498.02 samples/sec Loss 2.3421 LearningRate 0.000383 Epoch: 17 Global Step: 367460 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:32:57,619-Speed 2497.22 samples/sec Loss 2.4029 LearningRate 0.000383 Epoch: 17 Global Step: 367470 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:33:05,822-Speed 2497.32 samples/sec Loss 2.3974 LearningRate 0.000383 Epoch: 17 Global Step: 367480 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:33:14,030-Speed 2495.60 samples/sec Loss 2.3681 LearningRate 0.000383 Epoch: 17 Global Step: 367490 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:33:22,234-Speed 2496.34 samples/sec Loss 2.3403 LearningRate 0.000383 Epoch: 17 Global Step: 367500 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:33:30,381-Speed 2514.35 samples/sec Loss 2.3928 LearningRate 0.000383 Epoch: 17 Global Step: 367510 Fp16 Grad Scale: 
32768 Required: 106 hours Training: 2022-07-09 02:33:38,591-Speed 2495.30 samples/sec Loss 2.3808 LearningRate 0.000383 Epoch: 17 Global Step: 367520 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:33:46,793-Speed 2497.35 samples/sec Loss 2.4009 LearningRate 0.000383 Epoch: 17 Global Step: 367530 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:33:54,995-Speed 2497.38 samples/sec Loss 2.4319 LearningRate 0.000383 Epoch: 17 Global Step: 367540 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:34:03,198-Speed 2496.88 samples/sec Loss 2.3505 LearningRate 0.000383 Epoch: 17 Global Step: 367550 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:34:11,403-Speed 2496.55 samples/sec Loss 2.3635 LearningRate 0.000383 Epoch: 17 Global Step: 367560 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:34:19,558-Speed 2511.64 samples/sec Loss 2.3536 LearningRate 0.000383 Epoch: 17 Global Step: 367570 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:34:27,760-Speed 2497.51 samples/sec Loss 2.3393 LearningRate 0.000383 Epoch: 17 Global Step: 367580 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:34:35,962-Speed 2497.42 samples/sec Loss 2.3873 LearningRate 0.000383 Epoch: 17 Global Step: 367590 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:34:44,162-Speed 2497.91 samples/sec Loss 2.3950 LearningRate 0.000383 Epoch: 17 Global Step: 367600 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:34:52,363-Speed 2497.46 samples/sec Loss 2.3762 LearningRate 0.000383 Epoch: 17 Global Step: 367610 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:35:00,564-Speed 2497.66 samples/sec Loss 2.4015 LearningRate 0.000383 Epoch: 17 Global Step: 367620 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:35:08,720-Speed 2511.55 samples/sec Loss 2.4033 LearningRate 0.000383 Epoch: 17 Global Step: 367630 Fp16 Grad 
Scale: 32768 Required: 106 hours Training: 2022-07-09 02:35:16,920-Speed 2498.31 samples/sec Loss 2.3936 LearningRate 0.000383 Epoch: 17 Global Step: 367640 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:35:25,121-Speed 2497.73 samples/sec Loss 2.4472 LearningRate 0.000383 Epoch: 17 Global Step: 367650 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:35:33,335-Speed 2493.39 samples/sec Loss 2.3981 LearningRate 0.000383 Epoch: 17 Global Step: 367660 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:35:41,535-Speed 2498.06 samples/sec Loss 2.4345 LearningRate 0.000383 Epoch: 17 Global Step: 367670 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:35:49,750-Speed 2493.33 samples/sec Loss 2.3927 LearningRate 0.000383 Epoch: 17 Global Step: 367680 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:35:57,917-Speed 2508.26 samples/sec Loss 2.3928 LearningRate 0.000383 Epoch: 17 Global Step: 367690 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:36:06,118-Speed 2497.33 samples/sec Loss 2.3523 LearningRate 0.000383 Epoch: 17 Global Step: 367700 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:36:14,322-Speed 2496.87 samples/sec Loss 2.3193 LearningRate 0.000383 Epoch: 17 Global Step: 367710 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:36:22,525-Speed 2497.13 samples/sec Loss 2.3761 LearningRate 0.000383 Epoch: 17 Global Step: 367720 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:36:30,729-Speed 2496.52 samples/sec Loss 2.3493 LearningRate 0.000383 Epoch: 17 Global Step: 367730 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:36:38,935-Speed 2496.16 samples/sec Loss 2.3902 LearningRate 0.000383 Epoch: 17 Global Step: 367740 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:36:47,090-Speed 2511.69 samples/sec Loss 2.3608 LearningRate 0.000383 Epoch: 17 Global Step: 367750 Fp16 
Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:36:55,297-Speed 2495.84 samples/sec Loss 2.3459 LearningRate 0.000383 Epoch: 17 Global Step: 367760 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:37:03,506-Speed 2495.22 samples/sec Loss 2.3763 LearningRate 0.000383 Epoch: 17 Global Step: 367770 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:37:11,721-Speed 2493.57 samples/sec Loss 2.3016 LearningRate 0.000383 Epoch: 17 Global Step: 367780 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:37:19,930-Speed 2494.93 samples/sec Loss 2.3670 LearningRate 0.000383 Epoch: 17 Global Step: 367790 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:37:28,137-Speed 2495.93 samples/sec Loss 2.3868 LearningRate 0.000383 Epoch: 17 Global Step: 367800 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:37:36,293-Speed 2511.70 samples/sec Loss 2.3513 LearningRate 0.000383 Epoch: 17 Global Step: 367810 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:37:44,497-Speed 2496.57 samples/sec Loss 2.4097 LearningRate 0.000382 Epoch: 17 Global Step: 367820 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:37:52,710-Speed 2494.33 samples/sec Loss 2.4185 LearningRate 0.000382 Epoch: 17 Global Step: 367830 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:38:00,940-Speed 2488.58 samples/sec Loss 2.3716 LearningRate 0.000382 Epoch: 17 Global Step: 367840 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:38:09,157-Speed 2492.73 samples/sec Loss 2.3503 LearningRate 0.000382 Epoch: 17 Global Step: 367850 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:38:17,360-Speed 2497.14 samples/sec Loss 2.3515 LearningRate 0.000382 Epoch: 17 Global Step: 367860 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:38:25,514-Speed 2512.02 samples/sec Loss 2.3291 LearningRate 0.000382 Epoch: 17 Global Step: 367870 
Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:38:33,715-Speed 2497.91 samples/sec Loss 2.3412 LearningRate 0.000382 Epoch: 17 Global Step: 367880 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:38:41,917-Speed 2497.28 samples/sec Loss 2.3445 LearningRate 0.000382 Epoch: 17 Global Step: 367890 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:38:50,147-Speed 2488.78 samples/sec Loss 2.3515 LearningRate 0.000382 Epoch: 17 Global Step: 367900 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:38:58,354-Speed 2495.74 samples/sec Loss 2.3953 LearningRate 0.000382 Epoch: 17 Global Step: 367910 Fp16 Grad Scale: 32768 Required: 106 hours Training: 2022-07-09 02:39:06,527-Speed 2506.20 samples/sec Loss 2.3542 LearningRate 0.000382 Epoch: 17 Global Step: 367920 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:39:14,678-Speed 2513.04 samples/sec Loss 2.2910 LearningRate 0.000382 Epoch: 17 Global Step: 367930 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:39:22,882-Speed 2496.53 samples/sec Loss 2.3604 LearningRate 0.000382 Epoch: 17 Global Step: 367940 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:39:31,087-Speed 2496.47 samples/sec Loss 2.3528 LearningRate 0.000382 Epoch: 17 Global Step: 367950 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:39:39,297-Speed 2494.71 samples/sec Loss 2.4287 LearningRate 0.000382 Epoch: 17 Global Step: 367960 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:39:47,501-Speed 2496.78 samples/sec Loss 2.3747 LearningRate 0.000382 Epoch: 17 Global Step: 367970 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:39:55,708-Speed 2496.28 samples/sec Loss 2.4161 LearningRate 0.000382 Epoch: 17 Global Step: 367980 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:40:03,860-Speed 2512.60 samples/sec Loss 2.2777 LearningRate 0.000382 Epoch: 17 Global Step: 
367990 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:40:12,069-Speed 2495.54 samples/sec Loss 2.4083 LearningRate 0.000382 Epoch: 17 Global Step: 368000 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:40:20,277-Speed 2495.56 samples/sec Loss 2.4075 LearningRate 0.000382 Epoch: 17 Global Step: 368010 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:40:28,480-Speed 2496.78 samples/sec Loss 2.3744 LearningRate 0.000382 Epoch: 17 Global Step: 368020 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:40:36,688-Speed 2495.80 samples/sec Loss 2.3560 LearningRate 0.000382 Epoch: 17 Global Step: 368030 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:40:44,893-Speed 2496.48 samples/sec Loss 2.3656 LearningRate 0.000382 Epoch: 17 Global Step: 368040 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:40:53,048-Speed 2511.93 samples/sec Loss 2.3903 LearningRate 0.000382 Epoch: 17 Global Step: 368050 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:41:01,255-Speed 2495.57 samples/sec Loss 2.3856 LearningRate 0.000382 Epoch: 17 Global Step: 368060 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:41:09,461-Speed 2496.30 samples/sec Loss 2.3859 LearningRate 0.000382 Epoch: 17 Global Step: 368070 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:41:17,671-Speed 2494.74 samples/sec Loss 2.3280 LearningRate 0.000382 Epoch: 17 Global Step: 368080 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:41:25,880-Speed 2495.56 samples/sec Loss 2.3592 LearningRate 0.000382 Epoch: 17 Global Step: 368090 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:41:34,089-Speed 2495.02 samples/sec Loss 2.4414 LearningRate 0.000382 Epoch: 17 Global Step: 368100 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:41:42,244-Speed 2511.87 samples/sec Loss 2.3891 LearningRate 0.000382 Epoch: 17 Global 
Step: 368110 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:41:50,462-Speed 2492.42 samples/sec Loss 2.3600 LearningRate 0.000382 Epoch: 17 Global Step: 368120 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:41:58,672-Speed 2494.94 samples/sec Loss 2.3658 LearningRate 0.000382 Epoch: 17 Global Step: 368130 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:42:06,879-Speed 2495.63 samples/sec Loss 2.3541 LearningRate 0.000382 Epoch: 17 Global Step: 368140 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:42:15,088-Speed 2495.32 samples/sec Loss 2.3483 LearningRate 0.000382 Epoch: 17 Global Step: 368150 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:42:23,296-Speed 2495.45 samples/sec Loss 2.3600 LearningRate 0.000382 Epoch: 17 Global Step: 368160 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:42:31,468-Speed 2506.65 samples/sec Loss 2.3411 LearningRate 0.000382 Epoch: 17 Global Step: 368170 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:42:39,671-Speed 2496.72 samples/sec Loss 2.3636 LearningRate 0.000382 Epoch: 17 Global Step: 368180 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:42:47,874-Speed 2497.41 samples/sec Loss 2.3725 LearningRate 0.000382 Epoch: 17 Global Step: 368190 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:42:56,076-Speed 2497.45 samples/sec Loss 2.3369 LearningRate 0.000382 Epoch: 17 Global Step: 368200 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:43:04,288-Speed 2494.37 samples/sec Loss 2.3375 LearningRate 0.000382 Epoch: 17 Global Step: 368210 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:43:12,494-Speed 2495.94 samples/sec Loss 2.3730 LearningRate 0.000382 Epoch: 17 Global Step: 368220 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:43:20,648-Speed 2512.22 samples/sec Loss 2.3763 LearningRate 0.000382 Epoch: 17 
Global Step: 368230 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:43:28,851-Speed 2497.00 samples/sec Loss 2.3537 LearningRate 0.000382 Epoch: 17 Global Step: 368240 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:43:37,060-Speed 2494.93 samples/sec Loss 2.3893 LearningRate 0.000382 Epoch: 17 Global Step: 368250 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:43:45,264-Speed 2496.92 samples/sec Loss 2.3532 LearningRate 0.000382 Epoch: 17 Global Step: 368260 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:43:53,465-Speed 2497.48 samples/sec Loss 2.3618 LearningRate 0.000382 Epoch: 17 Global Step: 368270 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:44:01,679-Speed 2493.90 samples/sec Loss 2.3459 LearningRate 0.000382 Epoch: 17 Global Step: 368280 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:44:09,833-Speed 2512.08 samples/sec Loss 2.3827 LearningRate 0.000382 Epoch: 17 Global Step: 368290 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:44:18,035-Speed 2497.11 samples/sec Loss 2.3831 LearningRate 0.000382 Epoch: 17 Global Step: 368300 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:44:26,241-Speed 2496.54 samples/sec Loss 2.4079 LearningRate 0.000382 Epoch: 17 Global Step: 368310 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:44:34,446-Speed 2496.31 samples/sec Loss 2.3757 LearningRate 0.000382 Epoch: 17 Global Step: 368320 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:44:42,648-Speed 2497.25 samples/sec Loss 2.4406 LearningRate 0.000382 Epoch: 17 Global Step: 368330 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:44:50,852-Speed 2496.86 samples/sec Loss 2.3314 LearningRate 0.000382 Epoch: 17 Global Step: 368340 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:44:59,014-Speed 2509.72 samples/sec Loss 2.3558 LearningRate 0.000382 
Epoch: 17 Global Step: 368350 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:45:07,220-Speed 2496.02 samples/sec Loss 2.3355 LearningRate 0.000382 Epoch: 17 Global Step: 368360 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:45:15,421-Speed 2497.69 samples/sec Loss 2.3708 LearningRate 0.000382 Epoch: 17 Global Step: 368370 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:45:23,624-Speed 2497.12 samples/sec Loss 2.3655 LearningRate 0.000382 Epoch: 17 Global Step: 368380 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:45:31,824-Speed 2497.75 samples/sec Loss 2.3604 LearningRate 0.000382 Epoch: 17 Global Step: 368390 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:45:40,032-Speed 2495.54 samples/sec Loss 2.4089 LearningRate 0.000382 Epoch: 17 Global Step: 368400 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:45:48,184-Speed 2512.68 samples/sec Loss 2.3602 LearningRate 0.000382 Epoch: 17 Global Step: 368410 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:45:56,387-Speed 2497.07 samples/sec Loss 2.4032 LearningRate 0.000381 Epoch: 17 Global Step: 368420 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:46:04,593-Speed 2496.15 samples/sec Loss 2.3932 LearningRate 0.000381 Epoch: 17 Global Step: 368430 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:46:12,796-Speed 2496.79 samples/sec Loss 2.3740 LearningRate 0.000381 Epoch: 17 Global Step: 368440 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:46:20,997-Speed 2497.67 samples/sec Loss 2.4035 LearningRate 0.000381 Epoch: 17 Global Step: 368450 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:46:29,199-Speed 2497.27 samples/sec Loss 2.3784 LearningRate 0.000381 Epoch: 17 Global Step: 368460 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:46:37,354-Speed 2511.90 samples/sec Loss 2.4619 LearningRate 
0.000381 Epoch: 17 Global Step: 368470 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:46:45,555-Speed 2497.57 samples/sec Loss 2.4065 LearningRate 0.000381 Epoch: 17 Global Step: 368480 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:46:53,755-Speed 2497.92 samples/sec Loss 2.4672 LearningRate 0.000381 Epoch: 17 Global Step: 368490 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:47:01,966-Speed 2494.69 samples/sec Loss 2.3831 LearningRate 0.000381 Epoch: 17 Global Step: 368500 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:47:10,168-Speed 2497.54 samples/sec Loss 2.3876 LearningRate 0.000381 Epoch: 17 Global Step: 368510 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:47:18,372-Speed 2496.56 samples/sec Loss 2.3656 LearningRate 0.000381 Epoch: 17 Global Step: 368520 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:47:26,523-Speed 2512.98 samples/sec Loss 2.4450 LearningRate 0.000381 Epoch: 17 Global Step: 368530 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:47:34,726-Speed 2497.41 samples/sec Loss 2.3935 LearningRate 0.000381 Epoch: 17 Global Step: 368540 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:47:42,927-Speed 2497.39 samples/sec Loss 2.4314 LearningRate 0.000381 Epoch: 17 Global Step: 368550 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:47:51,132-Speed 2496.67 samples/sec Loss 2.4183 LearningRate 0.000381 Epoch: 17 Global Step: 368560 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:47:59,334-Speed 2497.15 samples/sec Loss 2.4191 LearningRate 0.000381 Epoch: 17 Global Step: 368570 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:48:07,538-Speed 2496.92 samples/sec Loss 2.3823 LearningRate 0.000381 Epoch: 17 Global Step: 368580 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:48:15,688-Speed 2513.26 samples/sec Loss 2.4080 
LearningRate 0.000381 Epoch: 17 Global Step: 368590 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:48:23,890-Speed 2497.40 samples/sec Loss 2.4015 LearningRate 0.000381 Epoch: 17 Global Step: 368600 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:48:32,095-Speed 2496.44 samples/sec Loss 2.4210 LearningRate 0.000381 Epoch: 17 Global Step: 368610 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:48:40,296-Speed 2497.73 samples/sec Loss 2.4039 LearningRate 0.000381 Epoch: 17 Global Step: 368620 Fp16 Grad Scale: 16384 Required: 106 hours Training: 2022-07-09 02:48:48,497-Speed 2497.50 samples/sec Loss 2.4125 LearningRate 0.000381 Epoch: 17 Global Step: 368630 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:48:56,712-Speed 2493.54 samples/sec Loss 2.4120 LearningRate 0.000381 Epoch: 17 Global Step: 368640 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:49:04,858-Speed 2514.30 samples/sec Loss 2.4154 LearningRate 0.000381 Epoch: 17 Global Step: 368650 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:49:13,055-Speed 2498.89 samples/sec Loss 2.4488 LearningRate 0.000381 Epoch: 17 Global Step: 368660 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:49:21,259-Speed 2496.69 samples/sec Loss 2.4727 LearningRate 0.000381 Epoch: 17 Global Step: 368670 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:49:29,466-Speed 2496.11 samples/sec Loss 2.4471 LearningRate 0.000381 Epoch: 17 Global Step: 368680 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:49:37,670-Speed 2496.85 samples/sec Loss 2.4318 LearningRate 0.000381 Epoch: 17 Global Step: 368690 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:49:45,877-Speed 2496.05 samples/sec Loss 2.4700 LearningRate 0.000381 Epoch: 17 Global Step: 368700 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:49:54,027-Speed 2513.01 samples/sec Loss 
2.4141 LearningRate 0.000381 Epoch: 17 Global Step: 368710 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:50:02,248-Speed 2491.60 samples/sec Loss 2.4280 LearningRate 0.000381 Epoch: 17 Global Step: 368720 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:50:10,449-Speed 2498.17 samples/sec Loss 2.3780 LearningRate 0.000381 Epoch: 17 Global Step: 368730 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:50:18,660-Speed 2494.63 samples/sec Loss 2.4272 LearningRate 0.000381 Epoch: 17 Global Step: 368740 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:50:26,858-Speed 2498.30 samples/sec Loss 2.3874 LearningRate 0.000381 Epoch: 17 Global Step: 368750 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:50:35,060-Speed 2497.52 samples/sec Loss 2.3898 LearningRate 0.000381 Epoch: 17 Global Step: 368760 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:50:43,210-Speed 2513.21 samples/sec Loss 2.3942 LearningRate 0.000381 Epoch: 17 Global Step: 368770 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:50:51,410-Speed 2497.81 samples/sec Loss 2.3880 LearningRate 0.000381 Epoch: 17 Global Step: 368780 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:50:59,616-Speed 2496.20 samples/sec Loss 2.4287 LearningRate 0.000381 Epoch: 17 Global Step: 368790 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:51:07,815-Speed 2498.24 samples/sec Loss 2.3655 LearningRate 0.000381 Epoch: 17 Global Step: 368800 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:51:16,027-Speed 2494.79 samples/sec Loss 2.3726 LearningRate 0.000381 Epoch: 17 Global Step: 368810 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:51:24,228-Speed 2497.41 samples/sec Loss 2.3734 LearningRate 0.000381 Epoch: 17 Global Step: 368820 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:51:32,379-Speed 2513.27 samples/sec 
Loss 2.4223 LearningRate 0.000381 Epoch: 17 Global Step: 368830 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:51:40,592-Speed 2493.93 samples/sec Loss 2.4017 LearningRate 0.000381 Epoch: 17 Global Step: 368840 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:51:48,793-Speed 2497.68 samples/sec Loss 2.4108 LearningRate 0.000381 Epoch: 17 Global Step: 368850 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:51:56,993-Speed 2497.75 samples/sec Loss 2.3479 LearningRate 0.000381 Epoch: 17 Global Step: 368860 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:52:05,198-Speed 2496.39 samples/sec Loss 2.3804 LearningRate 0.000381 Epoch: 17 Global Step: 368870 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:52:13,400-Speed 2497.39 samples/sec Loss 2.3920 LearningRate 0.000381 Epoch: 17 Global Step: 368880 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:52:21,550-Speed 2513.80 samples/sec Loss 2.3883 LearningRate 0.000381 Epoch: 17 Global Step: 368890 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:52:29,748-Speed 2498.51 samples/sec Loss 2.3858 LearningRate 0.000381 Epoch: 17 Global Step: 368900 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:52:37,950-Speed 2497.25 samples/sec Loss 2.3899 LearningRate 0.000381 Epoch: 17 Global Step: 368910 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:52:46,156-Speed 2496.09 samples/sec Loss 2.4133 LearningRate 0.000381 Epoch: 17 Global Step: 368920 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:52:54,379-Speed 2491.46 samples/sec Loss 2.3595 LearningRate 0.000381 Epoch: 17 Global Step: 368930 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:53:02,593-Speed 2493.37 samples/sec Loss 2.3937 LearningRate 0.000381 Epoch: 17 Global Step: 368940 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:53:10,756-Speed 2509.36 
samples/sec Loss 2.3729 LearningRate 0.000381 Epoch: 17 Global Step: 368950 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:53:18,958-Speed 2497.43 samples/sec Loss 2.3489 LearningRate 0.000381 Epoch: 17 Global Step: 368960 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:53:27,162-Speed 2496.66 samples/sec Loss 2.4099 LearningRate 0.000381 Epoch: 17 Global Step: 368970 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:53:35,364-Speed 2497.38 samples/sec Loss 2.4261 LearningRate 0.000381 Epoch: 17 Global Step: 368980 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:53:43,565-Speed 2497.66 samples/sec Loss 2.3802 LearningRate 0.000381 Epoch: 17 Global Step: 368990 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:53:51,773-Speed 2495.52 samples/sec Loss 2.3201 LearningRate 0.000381 Epoch: 17 Global Step: 369000 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:53:59,925-Speed 2512.80 samples/sec Loss 2.3468 LearningRate 0.000381 Epoch: 17 Global Step: 369010 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:54:08,131-Speed 2495.95 samples/sec Loss 2.3772 LearningRate 0.000381 Epoch: 17 Global Step: 369020 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:54:16,337-Speed 2496.26 samples/sec Loss 2.3487 LearningRate 0.000380 Epoch: 17 Global Step: 369030 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:54:24,540-Speed 2497.21 samples/sec Loss 2.3845 LearningRate 0.000380 Epoch: 17 Global Step: 369040 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:54:32,767-Speed 2489.47 samples/sec Loss 2.3921 LearningRate 0.000380 Epoch: 17 Global Step: 369050 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:54:40,971-Speed 2496.88 samples/sec Loss 2.3772 LearningRate 0.000380 Epoch: 17 Global Step: 369060 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:54:49,118-Speed 
2514.12 samples/sec Loss 2.3683 LearningRate 0.000380 Epoch: 17 Global Step: 369070 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:54:57,324-Speed 2496.37 samples/sec Loss 2.3875 LearningRate 0.000380 Epoch: 17 Global Step: 369080 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:55:05,524-Speed 2497.72 samples/sec Loss 2.3638 LearningRate 0.000380 Epoch: 17 Global Step: 369090 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:55:13,728-Speed 2496.69 samples/sec Loss 2.3637 LearningRate 0.000380 Epoch: 17 Global Step: 369100 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:55:21,945-Speed 2492.94 samples/sec Loss 2.3214 LearningRate 0.000380 Epoch: 17 Global Step: 369110 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:55:30,145-Speed 2498.19 samples/sec Loss 2.4098 LearningRate 0.000380 Epoch: 17 Global Step: 369120 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 02:55:38,299-Speed 2511.94 samples/sec Loss 2.3578 LearningRate 0.000380 Epoch: 17 Global Step: 369130 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 02:55:46,498-Speed 2498.50 samples/sec Loss 2.3715 LearningRate 0.000380 Epoch: 17 Global Step: 369140 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 02:55:54,700-Speed 2497.86 samples/sec Loss 2.3531 LearningRate 0.000380 Epoch: 17 Global Step: 369150 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 02:56:02,904-Speed 2496.90 samples/sec Loss 2.3641 LearningRate 0.000380 Epoch: 17 Global Step: 369160 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 02:56:11,105-Speed 2497.61 samples/sec Loss 2.3629 LearningRate 0.000380 Epoch: 17 Global Step: 369170 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 02:56:19,338-Speed 2488.19 samples/sec Loss 2.3341 LearningRate 0.000380 Epoch: 17 Global Step: 369180 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 
02:56:27,491-Speed 2513.03 samples/sec Loss 2.3820 LearningRate 0.000380 Epoch: 17 Global Step: 369190 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 02:56:35,692-Speed 2497.46 samples/sec Loss 2.3775 LearningRate 0.000380 Epoch: 17 Global Step: 369200 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 02:56:43,848-Speed 2511.53 samples/sec Loss 2.3589 LearningRate 0.000380 Epoch: 17 Global Step: 369210 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:56:52,053-Speed 2496.59 samples/sec Loss 2.3470 LearningRate 0.000380 Epoch: 17 Global Step: 369220 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:57:00,253-Speed 2497.97 samples/sec Loss 2.3864 LearningRate 0.000380 Epoch: 17 Global Step: 369230 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:57:08,457-Speed 2496.97 samples/sec Loss 2.3539 LearningRate 0.000380 Epoch: 17 Global Step: 369240 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:57:16,602-Speed 2514.43 samples/sec Loss 2.3795 LearningRate 0.000380 Epoch: 17 Global Step: 369250 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:57:24,802-Speed 2498.18 samples/sec Loss 2.3845 LearningRate 0.000380 Epoch: 17 Global Step: 369260 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:57:33,001-Speed 2498.13 samples/sec Loss 2.4389 LearningRate 0.000380 Epoch: 17 Global Step: 369270 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:57:41,203-Speed 2497.28 samples/sec Loss 2.4163 LearningRate 0.000380 Epoch: 17 Global Step: 369280 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:57:49,408-Speed 2496.60 samples/sec Loss 2.4135 LearningRate 0.000380 Epoch: 17 Global Step: 369290 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:57:57,610-Speed 2497.46 samples/sec Loss 2.3397 LearningRate 0.000380 Epoch: 17 Global Step: 369300 Fp16 Grad Scale: 16384 Required: 105 hours Training: 
2022-07-09 02:58:05,753-Speed 2515.23 samples/sec Loss 2.3773 LearningRate 0.000380 Epoch: 17 Global Step: 369310 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:58:13,965-Speed 2494.58 samples/sec Loss 2.4481 LearningRate 0.000380 Epoch: 17 Global Step: 369320 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:58:22,182-Speed 2492.44 samples/sec Loss 2.3863 LearningRate 0.000380 Epoch: 17 Global Step: 369330 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:58:30,387-Speed 2496.71 samples/sec Loss 2.4021 LearningRate 0.000380 Epoch: 17 Global Step: 369340 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:58:38,590-Speed 2497.48 samples/sec Loss 2.4014 LearningRate 0.000380 Epoch: 17 Global Step: 369350 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:58:46,789-Speed 2498.23 samples/sec Loss 2.3735 LearningRate 0.000380 Epoch: 17 Global Step: 369360 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:58:54,939-Speed 2513.20 samples/sec Loss 2.3635 LearningRate 0.000380 Epoch: 17 Global Step: 369370 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:59:03,138-Speed 2498.03 samples/sec Loss 2.4218 LearningRate 0.000380 Epoch: 17 Global Step: 369380 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:59:11,336-Speed 2499.09 samples/sec Loss 2.3704 LearningRate 0.000380 Epoch: 17 Global Step: 369390 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:59:19,551-Speed 2493.47 samples/sec Loss 2.3695 LearningRate 0.000380 Epoch: 17 Global Step: 369400 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:59:27,749-Speed 2498.44 samples/sec Loss 2.3567 LearningRate 0.000380 Epoch: 17 Global Step: 369410 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:59:35,947-Speed 2498.79 samples/sec Loss 2.3326 LearningRate 0.000380 Epoch: 17 Global Step: 369420 Fp16 Grad Scale: 16384 Required: 105 hours 
Training: 2022-07-09 02:59:44,106-Speed 2510.53 samples/sec Loss 2.3804 LearningRate 0.000380 Epoch: 17 Global Step: 369430 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 02:59:52,311-Speed 2496.32 samples/sec Loss 2.3509 LearningRate 0.000380 Epoch: 17 Global Step: 369440 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:00:00,510-Speed 2498.25 samples/sec Loss 2.3308 LearningRate 0.000380 Epoch: 17 Global Step: 369450 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:00:08,716-Speed 2496.09 samples/sec Loss 2.3199 LearningRate 0.000380 Epoch: 17 Global Step: 369460 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:00:16,920-Speed 2496.87 samples/sec Loss 2.3312 LearningRate 0.000380 Epoch: 17 Global Step: 369470 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:00:25,126-Speed 2496.18 samples/sec Loss 2.3561 LearningRate 0.000380 Epoch: 17 Global Step: 369480 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:00:33,278-Speed 2512.63 samples/sec Loss 2.3014 LearningRate 0.000380 Epoch: 17 Global Step: 369490 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:00:41,479-Speed 2497.55 samples/sec Loss 2.3022 LearningRate 0.000380 Epoch: 17 Global Step: 369500 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:00:49,677-Speed 2498.59 samples/sec Loss 2.3320 LearningRate 0.000380 Epoch: 17 Global Step: 369510 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:00:57,881-Speed 2496.81 samples/sec Loss 2.3144 LearningRate 0.000380 Epoch: 17 Global Step: 369520 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:01:06,083-Speed 2497.19 samples/sec Loss 2.2986 LearningRate 0.000380 Epoch: 17 Global Step: 369530 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:01:14,283-Speed 2498.07 samples/sec Loss 2.3607 LearningRate 0.000380 Epoch: 17 Global Step: 369540 Fp16 Grad Scale: 16384 Required: 105 
hours Training: 2022-07-09 03:01:22,431-Speed 2513.78 samples/sec Loss 2.3295 LearningRate 0.000380 Epoch: 17 Global Step: 369550 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:01:30,632-Speed 2497.74 samples/sec Loss 2.2801 LearningRate 0.000380 Epoch: 17 Global Step: 369560 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:01:38,832-Speed 2498.21 samples/sec Loss 2.3346 LearningRate 0.000380 Epoch: 17 Global Step: 369570 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:01:47,050-Speed 2492.49 samples/sec Loss 2.3379 LearningRate 0.000380 Epoch: 17 Global Step: 369580 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:01:55,251-Speed 2497.58 samples/sec Loss 2.3589 LearningRate 0.000380 Epoch: 17 Global Step: 369590 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:02:03,450-Speed 2498.28 samples/sec Loss 2.3247 LearningRate 0.000380 Epoch: 17 Global Step: 369600 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:02:11,597-Speed 2514.15 samples/sec Loss 2.3407 LearningRate 0.000380 Epoch: 17 Global Step: 369610 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:02:19,800-Speed 2496.90 samples/sec Loss 2.3223 LearningRate 0.000380 Epoch: 17 Global Step: 369620 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:02:28,005-Speed 2496.44 samples/sec Loss 2.3462 LearningRate 0.000379 Epoch: 17 Global Step: 369630 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:02:36,209-Speed 2496.54 samples/sec Loss 2.3220 LearningRate 0.000379 Epoch: 17 Global Step: 369640 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:02:44,412-Speed 2497.18 samples/sec Loss 2.3188 LearningRate 0.000379 Epoch: 17 Global Step: 369650 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:02:52,613-Speed 2497.98 samples/sec Loss 2.3138 LearningRate 0.000379 Epoch: 17 Global Step: 369660 Fp16 Grad Scale: 16384 Required: 
105 hours Training: 2022-07-09 03:03:00,766-Speed 2512.12 samples/sec Loss 2.3411 LearningRate 0.000379 Epoch: 17 Global Step: 369670 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:03:08,972-Speed 2496.16 samples/sec Loss 2.3733 LearningRate 0.000379 Epoch: 17 Global Step: 369680 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:03:17,177-Speed 2496.63 samples/sec Loss 2.3483 LearningRate 0.000379 Epoch: 17 Global Step: 369690 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:03:25,381-Speed 2496.63 samples/sec Loss 2.3643 LearningRate 0.000379 Epoch: 17 Global Step: 369700 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:03:33,587-Speed 2495.92 samples/sec Loss 2.3665 LearningRate 0.000379 Epoch: 17 Global Step: 369710 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:03:41,790-Speed 2497.01 samples/sec Loss 2.3597 LearningRate 0.000379 Epoch: 17 Global Step: 369720 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:03:49,942-Speed 2513.02 samples/sec Loss 2.3996 LearningRate 0.000379 Epoch: 17 Global Step: 369730 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:03:58,143-Speed 2497.60 samples/sec Loss 2.4031 LearningRate 0.000379 Epoch: 17 Global Step: 369740 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:04:06,345-Speed 2497.38 samples/sec Loss 2.3837 LearningRate 0.000379 Epoch: 17 Global Step: 369750 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:04:14,544-Speed 2498.23 samples/sec Loss 2.3428 LearningRate 0.000379 Epoch: 17 Global Step: 369760 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:04:22,746-Speed 2497.59 samples/sec Loss 2.3225 LearningRate 0.000379 Epoch: 17 Global Step: 369770 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:04:30,947-Speed 2497.70 samples/sec Loss 2.3768 LearningRate 0.000379 Epoch: 17 Global Step: 369780 Fp16 Grad Scale: 16384 
Required: 105 hours Training: 2022-07-09 03:04:39,093-Speed 2514.57 samples/sec Loss 2.3653 LearningRate 0.000379 Epoch: 17 Global Step: 369790 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:04:47,293-Speed 2497.87 samples/sec Loss 2.3658 LearningRate 0.000379 Epoch: 17 Global Step: 369800 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:04:55,496-Speed 2496.95 samples/sec Loss 2.3631 LearningRate 0.000379 Epoch: 17 Global Step: 369810 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:05:03,702-Speed 2496.11 samples/sec Loss 2.3402 LearningRate 0.000379 Epoch: 17 Global Step: 369820 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:05:11,903-Speed 2497.94 samples/sec Loss 2.4352 LearningRate 0.000379 Epoch: 17 Global Step: 369830 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:05:20,108-Speed 2496.30 samples/sec Loss 2.3434 LearningRate 0.000379 Epoch: 17 Global Step: 369840 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:05:28,255-Speed 2514.12 samples/sec Loss 2.3372 LearningRate 0.000379 Epoch: 17 Global Step: 369850 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:05:36,456-Speed 2497.63 samples/sec Loss 2.3353 LearningRate 0.000379 Epoch: 17 Global Step: 369860 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:05:44,656-Speed 2498.12 samples/sec Loss 2.3812 LearningRate 0.000379 Epoch: 17 Global Step: 369870 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:05:52,860-Speed 2496.91 samples/sec Loss 2.3487 LearningRate 0.000379 Epoch: 17 Global Step: 369880 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:06:01,061-Speed 2497.87 samples/sec Loss 2.2760 LearningRate 0.000379 Epoch: 17 Global Step: 369890 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:06:09,264-Speed 2496.84 samples/sec Loss 2.3688 LearningRate 0.000379 Epoch: 17 Global Step: 369900 Fp16 Grad Scale: 
16384 Required: 105 hours Training: 2022-07-09 03:06:17,415-Speed 2512.94 samples/sec Loss 2.3445 LearningRate 0.000379 Epoch: 17 Global Step: 369910 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:06:25,629-Speed 2493.83 samples/sec Loss 2.3210 LearningRate 0.000379 Epoch: 17 Global Step: 369920 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:06:33,830-Speed 2497.47 samples/sec Loss 2.3653 LearningRate 0.000379 Epoch: 17 Global Step: 369930 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:06:42,029-Speed 2498.39 samples/sec Loss 2.3614 LearningRate 0.000379 Epoch: 17 Global Step: 369940 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:06:50,236-Speed 2496.36 samples/sec Loss 2.3893 LearningRate 0.000379 Epoch: 17 Global Step: 369950 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:06:58,438-Speed 2497.13 samples/sec Loss 2.3154 LearningRate 0.000379 Epoch: 17 Global Step: 369960 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:07:06,595-Speed 2511.03 samples/sec Loss 2.3639 LearningRate 0.000379 Epoch: 17 Global Step: 369970 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:07:14,798-Speed 2497.14 samples/sec Loss 2.3361 LearningRate 0.000379 Epoch: 17 Global Step: 369980 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:07:23,004-Speed 2496.00 samples/sec Loss 2.3825 LearningRate 0.000379 Epoch: 17 Global Step: 369990 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:07:31,206-Speed 2497.26 samples/sec Loss 2.2738 LearningRate 0.000379 Epoch: 17 Global Step: 370000 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:07:39,412-Speed 2496.26 samples/sec Loss 2.3672 LearningRate 0.000379 Epoch: 17 Global Step: 370010 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:07:47,617-Speed 2496.32 samples/sec Loss 2.3678 LearningRate 0.000379 Epoch: 17 Global Step: 370020 Fp16 Grad 
Scale: 16384 Required: 105 hours Training: 2022-07-09 03:07:55,769-Speed 2512.86 samples/sec Loss 2.3376 LearningRate 0.000379 Epoch: 17 Global Step: 370030 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:08:03,973-Speed 2496.50 samples/sec Loss 2.3558 LearningRate 0.000379 Epoch: 17 Global Step: 370040 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:08:12,187-Speed 2493.64 samples/sec Loss 2.3712 LearningRate 0.000379 Epoch: 17 Global Step: 370050 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:08:20,386-Speed 2498.52 samples/sec Loss 2.3850 LearningRate 0.000379 Epoch: 17 Global Step: 370060 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:08:28,594-Speed 2495.27 samples/sec Loss 2.3687 LearningRate 0.000379 Epoch: 17 Global Step: 370070 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:08:36,797-Speed 2497.05 samples/sec Loss 2.3446 LearningRate 0.000379 Epoch: 17 Global Step: 370080 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:08:44,948-Speed 2513.08 samples/sec Loss 2.3787 LearningRate 0.000379 Epoch: 17 Global Step: 370090 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:08:53,153-Speed 2496.46 samples/sec Loss 2.3545 LearningRate 0.000379 Epoch: 17 Global Step: 370100 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:09:01,364-Speed 2494.41 samples/sec Loss 2.3626 LearningRate 0.000379 Epoch: 17 Global Step: 370110 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:09:09,566-Speed 2497.65 samples/sec Loss 2.3557 LearningRate 0.000379 Epoch: 17 Global Step: 370120 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:09:17,769-Speed 2496.89 samples/sec Loss 2.3527 LearningRate 0.000379 Epoch: 17 Global Step: 370130 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:09:25,975-Speed 2496.26 samples/sec Loss 2.3644 LearningRate 0.000379 Epoch: 17 Global Step: 370140 Fp16 
Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:09:34,125-Speed 2513.39 samples/sec Loss 2.3576 LearningRate 0.000379 Epoch: 17 Global Step: 370150 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:09:42,325-Speed 2497.91 samples/sec Loss 2.4202 LearningRate 0.000379 Epoch: 17 Global Step: 370160 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:09:50,543-Speed 2492.71 samples/sec Loss 2.3428 LearningRate 0.000379 Epoch: 17 Global Step: 370170 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:09:58,750-Speed 2495.95 samples/sec Loss 2.3758 LearningRate 0.000379 Epoch: 17 Global Step: 370180 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:10:06,959-Speed 2495.05 samples/sec Loss 2.3365 LearningRate 0.000379 Epoch: 17 Global Step: 370190 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:10:15,173-Speed 2493.76 samples/sec Loss 2.3667 LearningRate 0.000379 Epoch: 17 Global Step: 370200 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:10:23,322-Speed 2513.32 samples/sec Loss 2.4120 LearningRate 0.000379 Epoch: 17 Global Step: 370210 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:10:31,526-Speed 2496.91 samples/sec Loss 2.3436 LearningRate 0.000379 Epoch: 17 Global Step: 370220 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:10:39,728-Speed 2497.40 samples/sec Loss 2.3502 LearningRate 0.000379 Epoch: 17 Global Step: 370230 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:10:47,932-Speed 2496.89 samples/sec Loss 2.3202 LearningRate 0.000378 Epoch: 17 Global Step: 370240 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:10:56,133-Speed 2497.50 samples/sec Loss 2.3474 LearningRate 0.000378 Epoch: 17 Global Step: 370250 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:11:04,335-Speed 2497.38 samples/sec Loss 2.3221 LearningRate 0.000378 Epoch: 17 Global Step: 370260 
Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:11:12,484-Speed 2513.74 samples/sec Loss 2.3901 LearningRate 0.000378 Epoch: 17 Global Step: 370270 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:11:20,685-Speed 2497.63 samples/sec Loss 2.2806 LearningRate 0.000378 Epoch: 17 Global Step: 370280 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:11:28,890-Speed 2496.54 samples/sec Loss 2.3432 LearningRate 0.000378 Epoch: 17 Global Step: 370290 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:11:37,095-Speed 2496.66 samples/sec Loss 2.3702 LearningRate 0.000378 Epoch: 17 Global Step: 370300 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:11:45,296-Speed 2497.54 samples/sec Loss 2.3390 LearningRate 0.000378 Epoch: 17 Global Step: 370310 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:11:53,501-Speed 2496.47 samples/sec Loss 2.3022 LearningRate 0.000378 Epoch: 17 Global Step: 370320 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:12:01,655-Speed 2512.13 samples/sec Loss 2.3490 LearningRate 0.000378 Epoch: 17 Global Step: 370330 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:12:09,858-Speed 2498.44 samples/sec Loss 2.4153 LearningRate 0.000378 Epoch: 17 Global Step: 370340 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:12:18,072-Speed 2493.79 samples/sec Loss 2.3961 LearningRate 0.000378 Epoch: 17 Global Step: 370350 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:12:26,274-Speed 2497.24 samples/sec Loss 2.3910 LearningRate 0.000378 Epoch: 17 Global Step: 370360 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:12:34,476-Speed 2497.57 samples/sec Loss 2.3408 LearningRate 0.000378 Epoch: 17 Global Step: 370370 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:12:42,679-Speed 2497.13 samples/sec Loss 2.3711 LearningRate 0.000378 Epoch: 17 Global Step: 
370380 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:12:50,825-Speed 2514.58 samples/sec Loss 2.3310 LearningRate 0.000378 Epoch: 17 Global Step: 370390 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:12:59,037-Speed 2494.15 samples/sec Loss 2.4071 LearningRate 0.000378 Epoch: 17 Global Step: 370400 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:13:07,241-Speed 2497.04 samples/sec Loss 2.3431 LearningRate 0.000378 Epoch: 17 Global Step: 370410 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:13:15,441-Speed 2497.90 samples/sec Loss 2.3603 LearningRate 0.000378 Epoch: 17 Global Step: 370420 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:13:23,639-Speed 2498.37 samples/sec Loss 2.3163 LearningRate 0.000378 Epoch: 17 Global Step: 370430 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:13:31,845-Speed 2496.35 samples/sec Loss 2.3639 LearningRate 0.000378 Epoch: 17 Global Step: 370440 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:13:39,995-Speed 2513.23 samples/sec Loss 2.3306 LearningRate 0.000378 Epoch: 17 Global Step: 370450 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:13:48,192-Speed 2499.12 samples/sec Loss 2.4233 LearningRate 0.000378 Epoch: 17 Global Step: 370460 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:13:56,396-Speed 2496.59 samples/sec Loss 2.3538 LearningRate 0.000378 Epoch: 17 Global Step: 370470 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:14:04,600-Speed 2496.77 samples/sec Loss 2.3663 LearningRate 0.000378 Epoch: 17 Global Step: 370480 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:14:12,799-Speed 2498.45 samples/sec Loss 2.3468 LearningRate 0.000378 Epoch: 17 Global Step: 370490 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:14:21,008-Speed 2495.05 samples/sec Loss 2.3244 LearningRate 0.000378 Epoch: 17 Global 
Step: 370500 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:14:29,157-Speed 2513.89 samples/sec Loss 2.3654 LearningRate 0.000378 Epoch: 17 Global Step: 370510 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:14:37,453-Speed 2498.97 samples/sec Loss 2.3142 LearningRate 0.000378 Epoch: 17 Global Step: 370520 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:14:45,731-Speed 2499.94 samples/sec Loss 2.3402 LearningRate 0.000378 Epoch: 17 Global Step: 370530 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:14:55,954-Speed 2003.37 samples/sec Loss 2.3262 LearningRate 0.000378 Epoch: 17 Global Step: 370540 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:15:04,167-Speed 2502.15 samples/sec Loss 2.3539 LearningRate 0.000378 Epoch: 17 Global Step: 370550 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:15:12,405-Speed 2501.11 samples/sec Loss 2.3462 LearningRate 0.000378 Epoch: 17 Global Step: 370560 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:15:20,553-Speed 2514.00 samples/sec Loss 2.4077 LearningRate 0.000378 Epoch: 17 Global Step: 370570 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:15:28,792-Speed 2499.16 samples/sec Loss 2.3582 LearningRate 0.000378 Epoch: 17 Global Step: 370580 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:15:37,047-Speed 2498.53 samples/sec Loss 2.4111 LearningRate 0.000378 Epoch: 17 Global Step: 370590 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:15:45,276-Speed 2499.11 samples/sec Loss 2.3496 LearningRate 0.000378 Epoch: 17 Global Step: 370600 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:15:53,488-Speed 2494.23 samples/sec Loss 2.3550 LearningRate 0.000378 Epoch: 17 Global Step: 370610 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:16:01,698-Speed 2494.97 samples/sec Loss 2.3760 LearningRate 0.000378 Epoch: 17 
Global Step: 370620 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:16:13,279-Speed 1783.01 samples/sec Loss 2.3363 LearningRate 0.000378 Epoch: 17 Global Step: 370630 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:16:21,864-Speed 2395.56 samples/sec Loss 2.4114 LearningRate 0.000378 Epoch: 17 Global Step: 370640 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:16:30,064-Speed 2497.82 samples/sec Loss 2.3307 LearningRate 0.000378 Epoch: 17 Global Step: 370650 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:16:41,312-Speed 1821.04 samples/sec Loss 2.3808 LearningRate 0.000378 Epoch: 17 Global Step: 370660 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:16:49,568-Speed 2501.97 samples/sec Loss 2.3879 LearningRate 0.000378 Epoch: 17 Global Step: 370670 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:16:57,819-Speed 2500.96 samples/sec Loss 2.3768 LearningRate 0.000378 Epoch: 17 Global Step: 370680 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:17:07,178-Speed 2188.32 samples/sec Loss 2.4070 LearningRate 0.000378 Epoch: 17 Global Step: 370690 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:17:15,414-Speed 2501.13 samples/sec Loss 2.4273 LearningRate 0.000378 Epoch: 17 Global Step: 370700 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:17:26,293-Speed 1888.84 samples/sec Loss 2.3703 LearningRate 0.000378 Epoch: 17 Global Step: 370710 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:17:34,565-Speed 2502.48 samples/sec Loss 2.3547 LearningRate 0.000378 Epoch: 17 Global Step: 370720 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:17:42,781-Speed 2492.84 samples/sec Loss 2.3777 LearningRate 0.000378 Epoch: 17 Global Step: 370730 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:17:51,022-Speed 2499.65 samples/sec Loss 2.3505 LearningRate 0.000378 
Epoch: 17 Global Step: 370740 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:18:00,125-Speed 2512.04 samples/sec Loss 2.3968 LearningRate 0.000378 Epoch: 17 Global Step: 370750 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:18:12,283-Speed 1684.71 samples/sec Loss 2.3318 LearningRate 0.000378 Epoch: 17 Global Step: 370760 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:18:20,637-Speed 2502.68 samples/sec Loss 2.3235 LearningRate 0.000378 Epoch: 17 Global Step: 370770 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:18:28,844-Speed 2499.46 samples/sec Loss 2.3427 LearningRate 0.000378 Epoch: 17 Global Step: 370780 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:18:37,060-Speed 2498.19 samples/sec Loss 2.3608 LearningRate 0.000378 Epoch: 17 Global Step: 370790 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:18:45,259-Speed 2498.17 samples/sec Loss 2.3573 LearningRate 0.000378 Epoch: 17 Global Step: 370800 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:18:56,227-Speed 2514.00 samples/sec Loss 2.3339 LearningRate 0.000378 Epoch: 17 Global Step: 370810 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:19:04,555-Speed 2499.43 samples/sec Loss 2.3708 LearningRate 0.000378 Epoch: 17 Global Step: 370820 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:19:12,947-Speed 2499.66 samples/sec Loss 2.3730 LearningRate 0.000378 Epoch: 17 Global Step: 370830 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:19:21,148-Speed 2497.71 samples/sec Loss 2.3485 LearningRate 0.000377 Epoch: 17 Global Step: 370840 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:19:29,350-Speed 2497.34 samples/sec Loss 2.3431 LearningRate 0.000377 Epoch: 17 Global Step: 370850 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:19:37,621-Speed 2499.76 samples/sec Loss 2.3350 LearningRate 
0.000377 Epoch: 17 Global Step: 370860 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:19:45,810-Speed 2516.79 samples/sec Loss 2.3449 LearningRate 0.000377 Epoch: 17 Global Step: 370870 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:19:54,021-Speed 2494.53 samples/sec Loss 2.3526 LearningRate 0.000377 Epoch: 17 Global Step: 370880 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:20:02,251-Speed 2500.94 samples/sec Loss 2.3540 LearningRate 0.000377 Epoch: 17 Global Step: 370890 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:20:10,513-Speed 2498.78 samples/sec Loss 2.3435 LearningRate 0.000377 Epoch: 17 Global Step: 370900 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:20:18,718-Speed 2496.57 samples/sec Loss 2.3999 LearningRate 0.000377 Epoch: 17 Global Step: 370910 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:20:27,053-Speed 2499.76 samples/sec Loss 2.3114 LearningRate 0.000377 Epoch: 17 Global Step: 370920 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:20:39,518-Speed 1651.63 samples/sec Loss 2.3321 LearningRate 0.000377 Epoch: 17 Global Step: 370930 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:20:47,746-Speed 2501.62 samples/sec Loss 2.3501 LearningRate 0.000377 Epoch: 17 Global Step: 370940 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:20:55,955-Speed 2495.17 samples/sec Loss 2.3663 LearningRate 0.000377 Epoch: 17 Global Step: 370950 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:21:04,382-Speed 2497.16 samples/sec Loss 2.3502 LearningRate 0.000377 Epoch: 17 Global Step: 370960 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:21:12,616-Speed 2501.36 samples/sec Loss 2.3324 LearningRate 0.000377 Epoch: 17 Global Step: 370970 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:21:20,817-Speed 2497.56 samples/sec Loss 2.3597 
LearningRate 0.000377 Epoch: 17 Global Step: 370980 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:21:28,986-Speed 2507.36 samples/sec Loss 2.3669 LearningRate 0.000377 Epoch: 17 Global Step: 370990 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:21:39,973-Speed 1877.39 samples/sec Loss 2.4524 LearningRate 0.000377 Epoch: 17 Global Step: 371000 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:21:48,170-Speed 2499.71 samples/sec Loss 2.3749 LearningRate 0.000377 Epoch: 17 Global Step: 371010 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:21:56,365-Speed 2499.48 samples/sec Loss 2.4040 LearningRate 0.000377 Epoch: 17 Global Step: 371020 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:22:09,400-Speed 1582.16 samples/sec Loss 2.3248 LearningRate 0.000377 Epoch: 17 Global Step: 371030 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:22:17,619-Speed 2502.05 samples/sec Loss 2.4098 LearningRate 0.000377 Epoch: 17 Global Step: 371040 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:22:27,707-Speed 2030.42 samples/sec Loss 2.4082 LearningRate 0.000377 Epoch: 17 Global Step: 371050 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:22:35,900-Speed 2500.09 samples/sec Loss 2.4111 LearningRate 0.000377 Epoch: 17 Global Step: 371060 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:22:44,105-Speed 2501.32 samples/sec Loss 2.3640 LearningRate 0.000377 Epoch: 17 Global Step: 371070 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:22:54,862-Speed 1920.84 samples/sec Loss 2.3699 LearningRate 0.000377 Epoch: 17 Global Step: 371080 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:23:03,060-Speed 2498.40 samples/sec Loss 2.3540 LearningRate 0.000377 Epoch: 17 Global Step: 371090 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:23:11,842-Speed 2498.62 samples/sec Loss 
2.3823 LearningRate 0.000377 Epoch: 17 Global Step: 371100 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:23:21,064-Speed 2513.76 samples/sec Loss 2.4064 LearningRate 0.000377 Epoch: 17 Global Step: 371110 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:23:29,273-Speed 2495.23 samples/sec Loss 2.3512 LearningRate 0.000377 Epoch: 17 Global Step: 371120 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:23:37,486-Speed 2494.08 samples/sec Loss 2.3231 LearningRate 0.000377 Epoch: 17 Global Step: 371130 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:23:45,720-Speed 2487.85 samples/sec Loss 2.3355 LearningRate 0.000377 Epoch: 17 Global Step: 371140 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:23:53,945-Speed 2490.30 samples/sec Loss 2.3205 LearningRate 0.000377 Epoch: 17 Global Step: 371150 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:24:02,175-Speed 2488.57 samples/sec Loss 2.3879 LearningRate 0.000377 Epoch: 17 Global Step: 371160 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:24:10,358-Speed 2503.32 samples/sec Loss 2.3942 LearningRate 0.000377 Epoch: 17 Global Step: 371170 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:24:18,587-Speed 2489.17 samples/sec Loss 2.3358 LearningRate 0.000377 Epoch: 17 Global Step: 371180 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:24:26,808-Speed 2491.39 samples/sec Loss 2.3345 LearningRate 0.000377 Epoch: 17 Global Step: 371190 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:24:35,047-Speed 2486.27 samples/sec Loss 2.3229 LearningRate 0.000377 Epoch: 17 Global Step: 371200 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:24:43,263-Speed 2493.29 samples/sec Loss 2.3566 LearningRate 0.000377 Epoch: 17 Global Step: 371210 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:24:51,474-Speed 2494.56 samples/sec 
Loss 2.3547 LearningRate 0.000377 Epoch: 17 Global Step: 371220 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:24:59,633-Speed 2510.46 samples/sec Loss 2.3855 LearningRate 0.000377 Epoch: 17 Global Step: 371230 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:25:07,837-Speed 2496.39 samples/sec Loss 2.4162 LearningRate 0.000377 Epoch: 17 Global Step: 371240 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:25:16,035-Speed 2498.69 samples/sec Loss 2.3756 LearningRate 0.000377 Epoch: 17 Global Step: 371250 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:25:24,236-Speed 2497.90 samples/sec Loss 2.3924 LearningRate 0.000377 Epoch: 17 Global Step: 371260 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:25:32,438-Speed 2497.08 samples/sec Loss 2.3890 LearningRate 0.000377 Epoch: 17 Global Step: 371270 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:25:40,645-Speed 2495.88 samples/sec Loss 2.3589 LearningRate 0.000377 Epoch: 17 Global Step: 371280 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:25:48,800-Speed 2511.97 samples/sec Loss 2.3931 LearningRate 0.000377 Epoch: 17 Global Step: 371290 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:25:57,009-Speed 2495.21 samples/sec Loss 2.3364 LearningRate 0.000377 Epoch: 17 Global Step: 371300 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:26:05,217-Speed 2495.57 samples/sec Loss 2.3655 LearningRate 0.000377 Epoch: 17 Global Step: 371310 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:26:13,424-Speed 2495.65 samples/sec Loss 2.4169 LearningRate 0.000377 Epoch: 17 Global Step: 371320 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:26:21,642-Speed 2492.89 samples/sec Loss 2.3430 LearningRate 0.000377 Epoch: 17 Global Step: 371330 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:26:29,851-Speed 2494.93 
samples/sec Loss 2.3526 LearningRate 0.000377 Epoch: 17 Global Step: 371340 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:26:38,020-Speed 2507.24 samples/sec Loss 2.3338 LearningRate 0.000377 Epoch: 17 Global Step: 371350 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:26:46,246-Speed 2490.61 samples/sec Loss 2.3860 LearningRate 0.000377 Epoch: 17 Global Step: 371360 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:26:54,464-Speed 2492.53 samples/sec Loss 2.3953 LearningRate 0.000377 Epoch: 17 Global Step: 371370 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:27:02,670-Speed 2496.31 samples/sec Loss 2.3670 LearningRate 0.000377 Epoch: 17 Global Step: 371380 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:27:10,874-Speed 2496.77 samples/sec Loss 2.3718 LearningRate 0.000377 Epoch: 17 Global Step: 371390 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:27:19,079-Speed 2496.48 samples/sec Loss 2.3077 LearningRate 0.000377 Epoch: 17 Global Step: 371400 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:27:27,240-Speed 2509.93 samples/sec Loss 2.3306 LearningRate 0.000377 Epoch: 17 Global Step: 371410 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:27:35,454-Speed 2493.75 samples/sec Loss 2.3822 LearningRate 0.000377 Epoch: 17 Global Step: 371420 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:27:43,671-Speed 2492.92 samples/sec Loss 2.3786 LearningRate 0.000377 Epoch: 17 Global Step: 371430 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:27:51,875-Speed 2496.76 samples/sec Loss 2.3384 LearningRate 0.000377 Epoch: 17 Global Step: 371440 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:28:00,091-Speed 2493.19 samples/sec Loss 2.4111 LearningRate 0.000376 Epoch: 17 Global Step: 371450 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:28:08,297-Speed 
2496.25 samples/sec Loss 2.3408 LearningRate 0.000376 Epoch: 17 Global Step: 371460 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:28:16,449-Speed 2512.71 samples/sec Loss 2.3474 LearningRate 0.000376 Epoch: 17 Global Step: 371470 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:28:24,653-Speed 2496.69 samples/sec Loss 2.3415 LearningRate 0.000376 Epoch: 17 Global Step: 371480 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:28:32,860-Speed 2495.94 samples/sec Loss 2.3665 LearningRate 0.000376 Epoch: 17 Global Step: 371490 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:28:41,065-Speed 2496.42 samples/sec Loss 2.3636 LearningRate 0.000376 Epoch: 17 Global Step: 371500 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:28:49,286-Speed 2491.87 samples/sec Loss 2.3590 LearningRate 0.000376 Epoch: 17 Global Step: 371510 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:28:57,502-Speed 2493.17 samples/sec Loss 2.3664 LearningRate 0.000376 Epoch: 17 Global Step: 371520 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:29:05,675-Speed 2506.25 samples/sec Loss 2.4388 LearningRate 0.000376 Epoch: 17 Global Step: 371530 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:29:13,882-Speed 2495.80 samples/sec Loss 2.3678 LearningRate 0.000376 Epoch: 17 Global Step: 371540 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:29:22,047-Speed 2508.81 samples/sec Loss 2.3181 LearningRate 0.000376 Epoch: 17 Global Step: 371550 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:29:30,257-Speed 2494.72 samples/sec Loss 2.3534 LearningRate 0.000376 Epoch: 17 Global Step: 371560 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:29:38,463-Speed 2496.08 samples/sec Loss 2.3835 LearningRate 0.000376 Epoch: 17 Global Step: 371570 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 
03:29:46,671-Speed 2495.71 samples/sec Loss 2.3549 LearningRate 0.000376 Epoch: 17 Global Step: 371580 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:29:54,822-Speed 2512.76 samples/sec Loss 2.3883 LearningRate 0.000376 Epoch: 17 Global Step: 371590 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:30:03,039-Speed 2493.07 samples/sec Loss 2.3730 LearningRate 0.000376 Epoch: 17 Global Step: 371600 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:30:11,241-Speed 2497.26 samples/sec Loss 2.3367 LearningRate 0.000376 Epoch: 17 Global Step: 371610 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:30:19,447-Speed 2496.21 samples/sec Loss 2.3212 LearningRate 0.000376 Epoch: 17 Global Step: 371620 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:30:27,654-Speed 2495.79 samples/sec Loss 2.3366 LearningRate 0.000376 Epoch: 17 Global Step: 371630 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:30:35,859-Speed 2496.49 samples/sec Loss 2.4051 LearningRate 0.000376 Epoch: 17 Global Step: 371640 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:30:44,011-Speed 2512.92 samples/sec Loss 2.3661 LearningRate 0.000376 Epoch: 17 Global Step: 371650 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:30:52,214-Speed 2497.12 samples/sec Loss 2.3009 LearningRate 0.000376 Epoch: 17 Global Step: 371660 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:31:00,416-Speed 2497.42 samples/sec Loss 2.2965 LearningRate 0.000376 Epoch: 17 Global Step: 371670 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:31:08,626-Speed 2494.75 samples/sec Loss 2.3751 LearningRate 0.000376 Epoch: 17 Global Step: 371680 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:31:16,829-Speed 2497.15 samples/sec Loss 2.3574 LearningRate 0.000376 Epoch: 17 Global Step: 371690 Fp16 Grad Scale: 16384 Required: 105 hours Training: 
2022-07-09 03:31:25,035-Speed 2496.03 samples/sec Loss 2.3112 LearningRate 0.000376 Epoch: 17 Global Step: 371700 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:31:33,184-Speed 2513.77 samples/sec Loss 2.3153 LearningRate 0.000376 Epoch: 17 Global Step: 371710 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:31:41,387-Speed 2497.02 samples/sec Loss 2.3632 LearningRate 0.000376 Epoch: 17 Global Step: 371720 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:31:49,591-Speed 2496.55 samples/sec Loss 2.3425 LearningRate 0.000376 Epoch: 17 Global Step: 371730 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:31:57,792-Speed 2497.52 samples/sec Loss 2.3369 LearningRate 0.000376 Epoch: 17 Global Step: 371740 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:32:05,997-Speed 2496.58 samples/sec Loss 2.2942 LearningRate 0.000376 Epoch: 17 Global Step: 371750 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:32:14,206-Speed 2495.15 samples/sec Loss 2.3260 LearningRate 0.000376 Epoch: 17 Global Step: 371760 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:32:22,360-Speed 2511.99 samples/sec Loss 2.3541 LearningRate 0.000376 Epoch: 17 Global Step: 371770 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:32:30,566-Speed 2496.02 samples/sec Loss 2.3481 LearningRate 0.000376 Epoch: 17 Global Step: 371780 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:32:38,770-Speed 2496.75 samples/sec Loss 2.3464 LearningRate 0.000376 Epoch: 17 Global Step: 371790 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:32:46,974-Speed 2496.96 samples/sec Loss 2.3223 LearningRate 0.000376 Epoch: 17 Global Step: 371800 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:32:55,178-Speed 2496.51 samples/sec Loss 2.3492 LearningRate 0.000376 Epoch: 17 Global Step: 371810 Fp16 Grad Scale: 16384 Required: 105 hours 
Training: 2022-07-09 03:33:03,383-Speed 2496.74 samples/sec Loss 2.3420 LearningRate 0.000376 Epoch: 17 Global Step: 371820 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:33:11,537-Speed 2511.75 samples/sec Loss 2.3190 LearningRate 0.000376 Epoch: 17 Global Step: 371830 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:33:19,740-Speed 2497.11 samples/sec Loss 2.3122 LearningRate 0.000376 Epoch: 17 Global Step: 371840 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:33:27,942-Speed 2497.16 samples/sec Loss 2.3178 LearningRate 0.000376 Epoch: 17 Global Step: 371850 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:33:36,160-Speed 2492.46 samples/sec Loss 2.3302 LearningRate 0.000376 Epoch: 17 Global Step: 371860 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:33:44,364-Speed 2496.80 samples/sec Loss 2.3154 LearningRate 0.000376 Epoch: 17 Global Step: 371870 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:33:52,570-Speed 2496.12 samples/sec Loss 2.3933 LearningRate 0.000376 Epoch: 17 Global Step: 371880 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:34:00,727-Speed 2511.15 samples/sec Loss 2.3192 LearningRate 0.000376 Epoch: 17 Global Step: 371890 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:34:08,948-Speed 2491.53 samples/sec Loss 2.3152 LearningRate 0.000376 Epoch: 17 Global Step: 371900 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:34:17,152-Speed 2496.73 samples/sec Loss 2.3010 LearningRate 0.000376 Epoch: 17 Global Step: 371910 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:34:25,357-Speed 2496.48 samples/sec Loss 2.3585 LearningRate 0.000376 Epoch: 17 Global Step: 371920 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:34:33,560-Speed 2496.84 samples/sec Loss 2.3537 LearningRate 0.000376 Epoch: 17 Global Step: 371930 Fp16 Grad Scale: 16384 Required: 105 
hours Training: 2022-07-09 03:34:41,783-Speed 2491.02 samples/sec Loss 2.3075 LearningRate 0.000376 Epoch: 17 Global Step: 371940 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:34:49,933-Speed 2513.25 samples/sec Loss 2.3952 LearningRate 0.000376 Epoch: 17 Global Step: 371950 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:34:58,137-Speed 2496.90 samples/sec Loss 2.3024 LearningRate 0.000376 Epoch: 17 Global Step: 371960 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:35:06,339-Speed 2497.55 samples/sec Loss 2.3529 LearningRate 0.000376 Epoch: 17 Global Step: 371970 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:35:14,550-Speed 2494.61 samples/sec Loss 2.3649 LearningRate 0.000376 Epoch: 17 Global Step: 371980 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:35:22,755-Speed 2496.52 samples/sec Loss 2.3468 LearningRate 0.000376 Epoch: 17 Global Step: 371990 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:35:30,967-Speed 2494.34 samples/sec Loss 2.3989 LearningRate 0.000376 Epoch: 17 Global Step: 372000 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:35:39,116-Speed 2513.58 samples/sec Loss 2.3769 LearningRate 0.000376 Epoch: 17 Global Step: 372010 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:35:47,319-Speed 2497.06 samples/sec Loss 2.3315 LearningRate 0.000376 Epoch: 17 Global Step: 372020 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:35:55,522-Speed 2496.92 samples/sec Loss 2.3797 LearningRate 0.000376 Epoch: 17 Global Step: 372030 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:36:03,731-Speed 2495.41 samples/sec Loss 2.3804 LearningRate 0.000376 Epoch: 17 Global Step: 372040 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:36:11,934-Speed 2497.17 samples/sec Loss 2.3282 LearningRate 0.000376 Epoch: 17 Global Step: 372050 Fp16 Grad Scale: 16384 Required: 
105 hours Training: 2022-07-09 03:36:20,136-Speed 2497.31 samples/sec Loss 2.3157 LearningRate 0.000375 Epoch: 17 Global Step: 372060 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:36:28,286-Speed 2513.38 samples/sec Loss 2.3839 LearningRate 0.000375 Epoch: 17 Global Step: 372070 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:36:36,490-Speed 2496.55 samples/sec Loss 2.2837 LearningRate 0.000375 Epoch: 17 Global Step: 372080 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:36:44,693-Speed 2497.34 samples/sec Loss 2.3383 LearningRate 0.000375 Epoch: 17 Global Step: 372090 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:36:52,895-Speed 2497.20 samples/sec Loss 2.3778 LearningRate 0.000375 Epoch: 17 Global Step: 372100 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:37:01,111-Speed 2493.04 samples/sec Loss 2.3413 LearningRate 0.000375 Epoch: 17 Global Step: 372110 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:37:09,312-Speed 2497.42 samples/sec Loss 2.3583 LearningRate 0.000375 Epoch: 17 Global Step: 372120 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:37:17,465-Speed 2512.67 samples/sec Loss 2.3322 LearningRate 0.000375 Epoch: 17 Global Step: 372130 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:37:25,684-Speed 2492.06 samples/sec Loss 2.3207 LearningRate 0.000375 Epoch: 17 Global Step: 372140 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:37:33,891-Speed 2495.99 samples/sec Loss 2.3709 LearningRate 0.000375 Epoch: 17 Global Step: 372150 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:37:42,093-Speed 2497.23 samples/sec Loss 2.3234 LearningRate 0.000375 Epoch: 17 Global Step: 372160 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:37:50,294-Speed 2497.65 samples/sec Loss 2.2677 LearningRate 0.000375 Epoch: 17 Global Step: 372170 Fp16 Grad Scale: 16384 
Required: 105 hours Training: 2022-07-09 03:37:58,502-Speed 2495.58 samples/sec Loss 2.3658 LearningRate 0.000375 Epoch: 17 Global Step: 372180 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:38:06,653-Speed 2513.06 samples/sec Loss 2.3781 LearningRate 0.000375 Epoch: 17 Global Step: 372190 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:38:14,857-Speed 2496.85 samples/sec Loss 2.3269 LearningRate 0.000375 Epoch: 17 Global Step: 372200 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:38:23,066-Speed 2495.24 samples/sec Loss 2.3533 LearningRate 0.000375 Epoch: 17 Global Step: 372210 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:38:31,267-Speed 2497.52 samples/sec Loss 2.3758 LearningRate 0.000375 Epoch: 17 Global Step: 372220 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:38:39,467-Speed 2497.88 samples/sec Loss 2.3290 LearningRate 0.000375 Epoch: 17 Global Step: 372230 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:38:47,669-Speed 2497.30 samples/sec Loss 2.3715 LearningRate 0.000375 Epoch: 17 Global Step: 372240 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:38:55,821-Speed 2512.79 samples/sec Loss 2.3835 LearningRate 0.000375 Epoch: 17 Global Step: 372250 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:39:04,025-Speed 2496.81 samples/sec Loss 2.3836 LearningRate 0.000375 Epoch: 17 Global Step: 372260 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:39:12,226-Speed 2497.76 samples/sec Loss 2.3588 LearningRate 0.000375 Epoch: 17 Global Step: 372270 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:39:20,427-Speed 2497.54 samples/sec Loss 2.4344 LearningRate 0.000375 Epoch: 17 Global Step: 372280 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:39:28,629-Speed 2497.24 samples/sec Loss 2.3575 LearningRate 0.000375 Epoch: 17 Global Step: 372290 Fp16 Grad Scale: 
16384 Required: 105 hours Training: 2022-07-09 03:39:36,832-Speed 2497.09 samples/sec Loss 2.3351 LearningRate 0.000375 Epoch: 17 Global Step: 372300 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:39:44,984-Speed 2512.87 samples/sec Loss 2.3224 LearningRate 0.000375 Epoch: 17 Global Step: 372310 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:39:53,205-Speed 2491.30 samples/sec Loss 2.3086 LearningRate 0.000375 Epoch: 17 Global Step: 372320 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:40:01,406-Speed 2497.69 samples/sec Loss 2.2977 LearningRate 0.000375 Epoch: 17 Global Step: 372330 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:40:09,608-Speed 2497.18 samples/sec Loss 2.3931 LearningRate 0.000375 Epoch: 17 Global Step: 372340 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:40:17,810-Speed 2497.61 samples/sec Loss 2.2770 LearningRate 0.000375 Epoch: 17 Global Step: 372350 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:40:26,011-Speed 2497.52 samples/sec Loss 2.3472 LearningRate 0.000375 Epoch: 17 Global Step: 372360 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:40:34,160-Speed 2513.59 samples/sec Loss 2.3109 LearningRate 0.000375 Epoch: 17 Global Step: 372370 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:40:42,362-Speed 2497.35 samples/sec Loss 2.3361 LearningRate 0.000375 Epoch: 17 Global Step: 372380 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:40:50,569-Speed 2495.84 samples/sec Loss 2.3459 LearningRate 0.000375 Epoch: 17 Global Step: 372390 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:40:58,794-Speed 2490.47 samples/sec Loss 2.3560 LearningRate 0.000375 Epoch: 17 Global Step: 372400 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:41:06,999-Speed 2496.38 samples/sec Loss 2.3522 LearningRate 0.000375 Epoch: 17 Global Step: 372410 Fp16 Grad 
Scale: 16384 Required: 105 hours Training: 2022-07-09 03:41:15,208-Speed 2495.26 samples/sec Loss 2.3357 LearningRate 0.000375 Epoch: 17 Global Step: 372420 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:41:23,363-Speed 2511.69 samples/sec Loss 2.3236 LearningRate 0.000375 Epoch: 17 Global Step: 372430 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:41:31,580-Speed 2493.23 samples/sec Loss 2.3438 LearningRate 0.000375 Epoch: 17 Global Step: 372440 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:41:39,795-Speed 2493.13 samples/sec Loss 2.3481 LearningRate 0.000375 Epoch: 17 Global Step: 372450 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:41:48,004-Speed 2495.39 samples/sec Loss 2.3979 LearningRate 0.000375 Epoch: 17 Global Step: 372460 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:41:56,205-Speed 2497.64 samples/sec Loss 2.3668 LearningRate 0.000375 Epoch: 17 Global Step: 372470 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:42:04,421-Speed 2492.85 samples/sec Loss 2.3193 LearningRate 0.000375 Epoch: 17 Global Step: 372480 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:42:12,568-Speed 2514.28 samples/sec Loss 2.3658 LearningRate 0.000375 Epoch: 17 Global Step: 372490 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:42:20,773-Speed 2496.74 samples/sec Loss 2.3698 LearningRate 0.000375 Epoch: 17 Global Step: 372500 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:42:28,980-Speed 2495.88 samples/sec Loss 2.4083 LearningRate 0.000375 Epoch: 17 Global Step: 372510 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:42:37,184-Speed 2496.65 samples/sec Loss 2.3310 LearningRate 0.000375 Epoch: 17 Global Step: 372520 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:42:45,386-Speed 2497.36 samples/sec Loss 2.3686 LearningRate 0.000375 Epoch: 17 Global Step: 372530 Fp16 
Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:42:53,585-Speed 2498.11 samples/sec Loss 2.4012 LearningRate 0.000375 Epoch: 17 Global Step: 372540 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:43:01,731-Speed 2515.41 samples/sec Loss 2.3786 LearningRate 0.000375 Epoch: 17 Global Step: 372550 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:43:09,937-Speed 2496.28 samples/sec Loss 2.3786 LearningRate 0.000375 Epoch: 17 Global Step: 372560 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:43:18,145-Speed 2495.46 samples/sec Loss 2.3279 LearningRate 0.000375 Epoch: 17 Global Step: 372570 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:43:26,352-Speed 2496.11 samples/sec Loss 2.3979 LearningRate 0.000375 Epoch: 17 Global Step: 372580 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:43:34,553-Speed 2497.43 samples/sec Loss 2.3495 LearningRate 0.000375 Epoch: 17 Global Step: 372590 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:43:42,752-Speed 2498.33 samples/sec Loss 2.3819 LearningRate 0.000375 Epoch: 17 Global Step: 372600 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:43:50,900-Speed 2513.98 samples/sec Loss 2.3430 LearningRate 0.000375 Epoch: 17 Global Step: 372610 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:43:59,107-Speed 2495.81 samples/sec Loss 2.4168 LearningRate 0.000375 Epoch: 17 Global Step: 372620 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:44:07,323-Speed 2492.81 samples/sec Loss 2.4000 LearningRate 0.000375 Epoch: 17 Global Step: 372630 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:44:15,525-Speed 2497.35 samples/sec Loss 2.3159 LearningRate 0.000375 Epoch: 17 Global Step: 372640 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:44:23,727-Speed 2497.64 samples/sec Loss 2.3798 LearningRate 0.000375 Epoch: 17 Global Step: 372650 
Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:44:31,932-Speed 2496.21 samples/sec Loss 2.4222 LearningRate 0.000375 Epoch: 17 Global Step: 372660 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:44:40,082-Speed 2513.43 samples/sec Loss 2.3735 LearningRate 0.000374 Epoch: 17 Global Step: 372670 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:44:48,287-Speed 2496.42 samples/sec Loss 2.3980 LearningRate 0.000374 Epoch: 17 Global Step: 372680 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:44:56,487-Speed 2497.84 samples/sec Loss 2.3497 LearningRate 0.000374 Epoch: 17 Global Step: 372690 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:45:04,704-Speed 2492.92 samples/sec Loss 2.3445 LearningRate 0.000374 Epoch: 17 Global Step: 372700 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:45:12,913-Speed 2495.35 samples/sec Loss 2.3370 LearningRate 0.000374 Epoch: 17 Global Step: 372710 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:45:21,130-Speed 2492.58 samples/sec Loss 2.4085 LearningRate 0.000374 Epoch: 17 Global Step: 372720 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:45:29,279-Speed 2513.73 samples/sec Loss 2.3087 LearningRate 0.000374 Epoch: 17 Global Step: 372730 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:45:37,480-Speed 2497.73 samples/sec Loss 2.3448 LearningRate 0.000374 Epoch: 17 Global Step: 372740 Fp16 Grad Scale: 16384 Required: 105 hours Training: 2022-07-09 03:45:45,680-Speed 2497.87 samples/sec Loss 2.3529 LearningRate 0.000374 Epoch: 17 Global Step: 372750 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:45:53,879-Speed 2498.11 samples/sec Loss 2.3578 LearningRate 0.000374 Epoch: 17 Global Step: 372760 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:46:02,079-Speed 2498.40 samples/sec Loss 2.3557 LearningRate 0.000374 Epoch: 17 Global Step: 
372770 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:46:10,282-Speed 2496.88 samples/sec Loss 2.3698 LearningRate 0.000374 Epoch: 17 Global Step: 372780 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:46:18,441-Speed 2510.43 samples/sec Loss 2.4079 LearningRate 0.000374 Epoch: 17 Global Step: 372790 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:46:26,643-Speed 2497.25 samples/sec Loss 2.3845 LearningRate 0.000374 Epoch: 17 Global Step: 372800 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:46:34,847-Speed 2496.81 samples/sec Loss 2.3511 LearningRate 0.000374 Epoch: 17 Global Step: 372810 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:46:43,053-Speed 2496.24 samples/sec Loss 2.3707 LearningRate 0.000374 Epoch: 17 Global Step: 372820 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:46:51,256-Speed 2496.98 samples/sec Loss 2.3612 LearningRate 0.000374 Epoch: 17 Global Step: 372830 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:46:59,466-Speed 2494.85 samples/sec Loss 2.3413 LearningRate 0.000374 Epoch: 17 Global Step: 372840 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:47:07,644-Speed 2504.96 samples/sec Loss 2.3391 LearningRate 0.000374 Epoch: 17 Global Step: 372850 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:47:15,859-Speed 2493.28 samples/sec Loss 2.3911 LearningRate 0.000374 Epoch: 17 Global Step: 372860 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:47:24,078-Speed 2492.18 samples/sec Loss 2.3995 LearningRate 0.000374 Epoch: 17 Global Step: 372870 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:47:32,285-Speed 2495.90 samples/sec Loss 2.3394 LearningRate 0.000374 Epoch: 17 Global Step: 372880 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:47:40,495-Speed 2494.95 samples/sec Loss 2.3304 LearningRate 0.000374 Epoch: 17 Global 
Step: 372890 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:47:48,702-Speed 2495.79 samples/sec Loss 2.3525 LearningRate 0.000374 Epoch: 17 Global Step: 372900 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:47:56,852-Speed 2513.30 samples/sec Loss 2.3785 LearningRate 0.000374 Epoch: 17 Global Step: 372910 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:48:05,058-Speed 2496.33 samples/sec Loss 2.3638 LearningRate 0.000374 Epoch: 17 Global Step: 372920 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:48:13,265-Speed 2496.00 samples/sec Loss 2.3608 LearningRate 0.000374 Epoch: 17 Global Step: 372930 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:48:21,479-Speed 2493.65 samples/sec Loss 2.3607 LearningRate 0.000374 Epoch: 17 Global Step: 372940 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:48:29,698-Speed 2492.07 samples/sec Loss 2.3219 LearningRate 0.000374 Epoch: 17 Global Step: 372950 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:48:37,906-Speed 2495.56 samples/sec Loss 2.4027 LearningRate 0.000374 Epoch: 17 Global Step: 372960 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:48:46,057-Speed 2512.82 samples/sec Loss 2.3281 LearningRate 0.000374 Epoch: 17 Global Step: 372970 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:48:54,266-Speed 2495.16 samples/sec Loss 2.3793 LearningRate 0.000374 Epoch: 17 Global Step: 372980 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:49:02,477-Speed 2494.54 samples/sec Loss 2.3630 LearningRate 0.000374 Epoch: 17 Global Step: 372990 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:49:10,689-Speed 2494.34 samples/sec Loss 2.3637 LearningRate 0.000374 Epoch: 17 Global Step: 373000 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:49:18,894-Speed 2496.67 samples/sec Loss 2.3695 LearningRate 0.000374 Epoch: 17 
Global Step: 373010 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:49:27,097-Speed 2496.89 samples/sec Loss 2.3704 LearningRate 0.000374 Epoch: 17 Global Step: 373020 Fp16 Grad Scale: 32768 Required: 105 hours Training: 2022-07-09 03:49:35,247-Speed 2513.25 samples/sec Loss 2.3768 LearningRate 0.000374 Epoch: 17 Global Step: 373030 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:49:43,449-Speed 2497.51 samples/sec Loss 2.3359 LearningRate 0.000374 Epoch: 17 Global Step: 373040 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:49:51,673-Speed 2490.69 samples/sec Loss 2.3924 LearningRate 0.000374 Epoch: 17 Global Step: 373050 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:49:59,875-Speed 2497.65 samples/sec Loss 2.3618 LearningRate 0.000374 Epoch: 17 Global Step: 373060 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:50:08,081-Speed 2496.12 samples/sec Loss 2.3503 LearningRate 0.000374 Epoch: 17 Global Step: 373070 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:50:16,282-Speed 2497.88 samples/sec Loss 2.3569 LearningRate 0.000374 Epoch: 17 Global Step: 373080 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:50:24,439-Speed 2511.13 samples/sec Loss 2.3227 LearningRate 0.000374 Epoch: 17 Global Step: 373090 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:50:32,646-Speed 2495.66 samples/sec Loss 2.3413 LearningRate 0.000374 Epoch: 17 Global Step: 373100 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:50:40,851-Speed 2496.62 samples/sec Loss 2.3239 LearningRate 0.000374 Epoch: 17 Global Step: 373110 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:50:49,055-Speed 2496.95 samples/sec Loss 2.3313 LearningRate 0.000374 Epoch: 17 Global Step: 373120 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:50:57,263-Speed 2495.39 samples/sec Loss 2.3661 LearningRate 0.000374 
Epoch: 17 Global Step: 373130 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:51:05,470-Speed 2495.89 samples/sec Loss 2.3928 LearningRate 0.000374 Epoch: 17 Global Step: 373140 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:51:13,625-Speed 2511.72 samples/sec Loss 2.3018 LearningRate 0.000374 Epoch: 17 Global Step: 373150 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:51:21,834-Speed 2495.14 samples/sec Loss 2.3855 LearningRate 0.000374 Epoch: 17 Global Step: 373160 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:51:30,049-Speed 2493.36 samples/sec Loss 2.4242 LearningRate 0.000374 Epoch: 17 Global Step: 373170 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 03:51:38,209-Speed 2510.12 samples/sec Loss 2.3295 LearningRate 0.000374 Epoch: 17 Global Step: 373180 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:51:46,410-Speed 2497.97 samples/sec Loss 2.3424 LearningRate 0.000374 Epoch: 17 Global Step: 373190 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:51:54,634-Speed 2490.69 samples/sec Loss 2.3358 LearningRate 0.000374 Epoch: 17 Global Step: 373200 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:52:02,783-Speed 2513.57 samples/sec Loss 2.3812 LearningRate 0.000374 Epoch: 17 Global Step: 373210 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:52:10,990-Speed 2495.66 samples/sec Loss 2.3588 LearningRate 0.000374 Epoch: 17 Global Step: 373220 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:52:19,195-Speed 2496.71 samples/sec Loss 2.3480 LearningRate 0.000374 Epoch: 17 Global Step: 373230 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:52:27,402-Speed 2495.89 samples/sec Loss 2.3567 LearningRate 0.000374 Epoch: 17 Global Step: 373240 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:52:35,603-Speed 2497.66 samples/sec Loss 2.3753 LearningRate 
0.000374 Epoch: 17 Global Step: 373250 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:52:43,810-Speed 2495.78 samples/sec Loss 2.3479 LearningRate 0.000374 Epoch: 17 Global Step: 373260 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:52:51,956-Speed 2514.79 samples/sec Loss 2.3334 LearningRate 0.000374 Epoch: 17 Global Step: 373270 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:53:00,156-Speed 2497.87 samples/sec Loss 2.3100 LearningRate 0.000373 Epoch: 17 Global Step: 373280 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:53:08,354-Speed 2498.45 samples/sec Loss 2.3860 LearningRate 0.000373 Epoch: 17 Global Step: 373290 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:53:16,554-Speed 2497.97 samples/sec Loss 2.3846 LearningRate 0.000373 Epoch: 17 Global Step: 373300 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:53:24,751-Speed 2498.96 samples/sec Loss 2.4112 LearningRate 0.000373 Epoch: 17 Global Step: 373310 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:53:35,858-Speed 1844.10 samples/sec Loss 2.3970 LearningRate 0.000373 Epoch: 18 Global Step: 373320 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:53:43,999-Speed 2515.86 samples/sec Loss 2.3636 LearningRate 0.000373 Epoch: 18 Global Step: 373330 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:53:52,208-Speed 2495.28 samples/sec Loss 2.3825 LearningRate 0.000373 Epoch: 18 Global Step: 373340 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:54:00,401-Speed 2499.94 samples/sec Loss 2.3636 LearningRate 0.000373 Epoch: 18 Global Step: 373350 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:54:08,613-Speed 2494.37 samples/sec Loss 2.3795 LearningRate 0.000373 Epoch: 18 Global Step: 373360 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:54:16,822-Speed 2495.15 samples/sec Loss 2.3336 
LearningRate 0.000373 Epoch: 18 Global Step: 373370 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:54:25,030-Speed 2495.80 samples/sec Loss 2.3528 LearningRate 0.000373 Epoch: 18 Global Step: 373380 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:54:33,172-Speed 2515.63 samples/sec Loss 2.3377 LearningRate 0.000373 Epoch: 18 Global Step: 373390 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:54:41,367-Speed 2499.50 samples/sec Loss 2.3433 LearningRate 0.000373 Epoch: 18 Global Step: 373400 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:54:49,579-Speed 2495.16 samples/sec Loss 2.3501 LearningRate 0.000373 Epoch: 18 Global Step: 373410 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:54:57,781-Speed 2497.44 samples/sec Loss 2.3354 LearningRate 0.000373 Epoch: 18 Global Step: 373420 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:55:05,982-Speed 2497.71 samples/sec Loss 2.3110 LearningRate 0.000373 Epoch: 18 Global Step: 373430 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:55:14,182-Speed 2497.88 samples/sec Loss 2.3477 LearningRate 0.000373 Epoch: 18 Global Step: 373440 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:55:22,331-Speed 2513.34 samples/sec Loss 2.3103 LearningRate 0.000373 Epoch: 18 Global Step: 373450 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:55:30,535-Speed 2496.76 samples/sec Loss 2.3376 LearningRate 0.000373 Epoch: 18 Global Step: 373460 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:55:38,741-Speed 2496.61 samples/sec Loss 2.3422 LearningRate 0.000373 Epoch: 18 Global Step: 373470 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:55:46,943-Speed 2497.36 samples/sec Loss 2.3305 LearningRate 0.000373 Epoch: 18 Global Step: 373480 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:55:55,155-Speed 2494.24 samples/sec Loss 
2.3327 LearningRate 0.000373 Epoch: 18 Global Step: 373490 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:56:03,355-Speed 2497.96 samples/sec Loss 2.2737 LearningRate 0.000373 Epoch: 18 Global Step: 373500 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:56:11,504-Speed 2513.83 samples/sec Loss 2.3352 LearningRate 0.000373 Epoch: 18 Global Step: 373510 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:56:19,713-Speed 2495.06 samples/sec Loss 2.3498 LearningRate 0.000373 Epoch: 18 Global Step: 373520 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:56:27,916-Speed 2496.98 samples/sec Loss 2.3335 LearningRate 0.000373 Epoch: 18 Global Step: 373530 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:56:36,115-Speed 2498.52 samples/sec Loss 2.3188 LearningRate 0.000373 Epoch: 18 Global Step: 373540 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:56:44,316-Speed 2497.47 samples/sec Loss 2.3420 LearningRate 0.000373 Epoch: 18 Global Step: 373550 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:56:52,518-Speed 2497.42 samples/sec Loss 2.3568 LearningRate 0.000373 Epoch: 18 Global Step: 373560 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:57:00,662-Speed 2515.17 samples/sec Loss 2.2749 LearningRate 0.000373 Epoch: 18 Global Step: 373570 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:57:08,875-Speed 2493.98 samples/sec Loss 2.3380 LearningRate 0.000373 Epoch: 18 Global Step: 373580 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:57:17,074-Speed 2498.14 samples/sec Loss 2.3377 LearningRate 0.000373 Epoch: 18 Global Step: 373590 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:57:25,278-Speed 2497.02 samples/sec Loss 2.3083 LearningRate 0.000373 Epoch: 18 Global Step: 373600 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:57:33,476-Speed 2498.33 samples/sec 
Loss 2.3372 LearningRate 0.000373 Epoch: 18 Global Step: 373610 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:57:41,676-Speed 2498.22 samples/sec Loss 2.3513 LearningRate 0.000373 Epoch: 18 Global Step: 373620 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:57:49,823-Speed 2514.14 samples/sec Loss 2.3173 LearningRate 0.000373 Epoch: 18 Global Step: 373630 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:57:58,037-Speed 2493.62 samples/sec Loss 2.3816 LearningRate 0.000373 Epoch: 18 Global Step: 373640 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:58:06,237-Speed 2497.92 samples/sec Loss 2.3273 LearningRate 0.000373 Epoch: 18 Global Step: 373650 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:58:14,447-Speed 2494.94 samples/sec Loss 2.3491 LearningRate 0.000373 Epoch: 18 Global Step: 373660 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:58:22,662-Speed 2493.42 samples/sec Loss 2.3399 LearningRate 0.000373 Epoch: 18 Global Step: 373670 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:58:30,866-Speed 2496.86 samples/sec Loss 2.3153 LearningRate 0.000373 Epoch: 18 Global Step: 373680 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:58:39,015-Speed 2514.86 samples/sec Loss 2.3311 LearningRate 0.000373 Epoch: 18 Global Step: 373690 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:58:47,222-Speed 2496.05 samples/sec Loss 2.3724 LearningRate 0.000373 Epoch: 18 Global Step: 373700 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:58:55,424-Speed 2497.56 samples/sec Loss 2.3532 LearningRate 0.000373 Epoch: 18 Global Step: 373710 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:59:03,624-Speed 2497.68 samples/sec Loss 2.3108 LearningRate 0.000373 Epoch: 18 Global Step: 373720 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:59:11,840-Speed 2493.23 
samples/sec Loss 2.3433 LearningRate 0.000373 Epoch: 18 Global Step: 373730 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:59:20,040-Speed 2497.93 samples/sec Loss 2.3182 LearningRate 0.000373 Epoch: 18 Global Step: 373740 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:59:28,197-Speed 2510.97 samples/sec Loss 2.3209 LearningRate 0.000373 Epoch: 18 Global Step: 373750 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:59:36,398-Speed 2497.83 samples/sec Loss 2.3559 LearningRate 0.000373 Epoch: 18 Global Step: 373760 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:59:44,598-Speed 2497.76 samples/sec Loss 2.3226 LearningRate 0.000373 Epoch: 18 Global Step: 373770 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 03:59:52,797-Speed 2498.52 samples/sec Loss 2.2845 LearningRate 0.000373 Epoch: 18 Global Step: 373780 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:00:01,030-Speed 2487.90 samples/sec Loss 2.2811 LearningRate 0.000373 Epoch: 18 Global Step: 373790 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:00:09,230-Speed 2497.82 samples/sec Loss 2.2991 LearningRate 0.000373 Epoch: 18 Global Step: 373800 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:00:17,381-Speed 2513.17 samples/sec Loss 2.3594 LearningRate 0.000373 Epoch: 18 Global Step: 373810 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:00:25,612-Speed 2488.75 samples/sec Loss 2.3930 LearningRate 0.000373 Epoch: 18 Global Step: 373820 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:00:33,813-Speed 2497.65 samples/sec Loss 2.3488 LearningRate 0.000373 Epoch: 18 Global Step: 373830 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:00:42,011-Speed 2498.68 samples/sec Loss 2.3520 LearningRate 0.000373 Epoch: 18 Global Step: 373840 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:00:50,223-Speed 
2494.56 samples/sec Loss 2.3304 LearningRate 0.000373 Epoch: 18 Global Step: 373850 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:00:58,427-Speed 2496.62 samples/sec Loss 2.3087 LearningRate 0.000373 Epoch: 18 Global Step: 373860 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:01:06,587-Speed 2510.09 samples/sec Loss 2.3731 LearningRate 0.000373 Epoch: 18 Global Step: 373870 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:01:14,786-Speed 2498.30 samples/sec Loss 2.3723 LearningRate 0.000373 Epoch: 18 Global Step: 373880 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:01:22,986-Speed 2498.15 samples/sec Loss 2.3366 LearningRate 0.000372 Epoch: 18 Global Step: 373890 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:01:31,224-Speed 2486.15 samples/sec Loss 2.3474 LearningRate 0.000372 Epoch: 18 Global Step: 373900 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:01:39,426-Speed 2497.55 samples/sec Loss 2.3177 LearningRate 0.000372 Epoch: 18 Global Step: 373910 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:01:47,624-Speed 2498.49 samples/sec Loss 2.3427 LearningRate 0.000372 Epoch: 18 Global Step: 373920 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:01:55,780-Speed 2511.34 samples/sec Loss 2.3472 LearningRate 0.000372 Epoch: 18 Global Step: 373930 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:02:03,982-Speed 2497.37 samples/sec Loss 2.3729 LearningRate 0.000372 Epoch: 18 Global Step: 373940 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:02:12,181-Speed 2498.36 samples/sec Loss 2.3385 LearningRate 0.000372 Epoch: 18 Global Step: 373950 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:02:20,383-Speed 2497.27 samples/sec Loss 2.3377 LearningRate 0.000372 Epoch: 18 Global Step: 373960 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 
04:02:28,665-Speed 2473.21 samples/sec Loss 2.3167 LearningRate 0.000372 Epoch: 18 Global Step: 373970 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:02:36,868-Speed 2496.88 samples/sec Loss 2.3538 LearningRate 0.000372 Epoch: 18 Global Step: 373980 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:02:45,014-Speed 2514.61 samples/sec Loss 2.3578 LearningRate 0.000372 Epoch: 18 Global Step: 373990 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:02:53,219-Speed 2497.04 samples/sec Loss 2.3127 LearningRate 0.000372 Epoch: 18 Global Step: 374000 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:03:01,418-Speed 2498.00 samples/sec Loss 2.3788 LearningRate 0.000372 Epoch: 18 Global Step: 374010 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:03:09,623-Speed 2496.63 samples/sec Loss 2.3270 LearningRate 0.000372 Epoch: 18 Global Step: 374020 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:03:17,822-Speed 2498.26 samples/sec Loss 2.3579 LearningRate 0.000372 Epoch: 18 Global Step: 374030 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:03:26,027-Speed 2496.37 samples/sec Loss 2.3600 LearningRate 0.000372 Epoch: 18 Global Step: 374040 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:03:34,174-Speed 2514.43 samples/sec Loss 2.3890 LearningRate 0.000372 Epoch: 18 Global Step: 374050 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:03:42,387-Speed 2493.88 samples/sec Loss 2.3160 LearningRate 0.000372 Epoch: 18 Global Step: 374060 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:03:50,589-Speed 2497.54 samples/sec Loss 2.3445 LearningRate 0.000372 Epoch: 18 Global Step: 374070 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:03:58,784-Speed 2499.22 samples/sec Loss 2.3226 LearningRate 0.000372 Epoch: 18 Global Step: 374080 Fp16 Grad Scale: 16384 Required: 104 hours Training: 
2022-07-09 04:04:06,985-Speed 2497.70 samples/sec Loss 2.3132 LearningRate 0.000372 Epoch: 18 Global Step: 374090 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:04:15,185-Speed 2498.08 samples/sec Loss 2.3512 LearningRate 0.000372 Epoch: 18 Global Step: 374100 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:04:23,333-Speed 2514.08 samples/sec Loss 2.3148 LearningRate 0.000372 Epoch: 18 Global Step: 374110 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:04:31,538-Speed 2496.39 samples/sec Loss 2.3041 LearningRate 0.000372 Epoch: 18 Global Step: 374120 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:04:39,738-Speed 2497.97 samples/sec Loss 2.3575 LearningRate 0.000372 Epoch: 18 Global Step: 374130 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:04:47,939-Speed 2497.39 samples/sec Loss 2.3020 LearningRate 0.000372 Epoch: 18 Global Step: 374140 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:04:56,144-Speed 2496.66 samples/sec Loss 2.3723 LearningRate 0.000372 Epoch: 18 Global Step: 374150 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:05:04,348-Speed 2496.76 samples/sec Loss 2.3327 LearningRate 0.000372 Epoch: 18 Global Step: 374160 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:05:12,501-Speed 2512.57 samples/sec Loss 2.3367 LearningRate 0.000372 Epoch: 18 Global Step: 374170 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:05:20,705-Speed 2496.75 samples/sec Loss 2.3844 LearningRate 0.000372 Epoch: 18 Global Step: 374180 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:05:28,909-Speed 2496.71 samples/sec Loss 2.3592 LearningRate 0.000372 Epoch: 18 Global Step: 374190 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:05:37,112-Speed 2497.07 samples/sec Loss 2.3760 LearningRate 0.000372 Epoch: 18 Global Step: 374200 Fp16 Grad Scale: 16384 Required: 104 hours 
Training: 2022-07-09 04:05:45,329-Speed 2492.55 samples/sec Loss 2.3349 LearningRate 0.000372 Epoch: 18 Global Step: 374210 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:05:53,532-Speed 2497.36 samples/sec Loss 2.3345 LearningRate 0.000372 Epoch: 18 Global Step: 374220 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:06:01,704-Speed 2506.51 samples/sec Loss 2.3232 LearningRate 0.000372 Epoch: 18 Global Step: 374230 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:06:09,908-Speed 2496.61 samples/sec Loss 2.3037 LearningRate 0.000372 Epoch: 18 Global Step: 374240 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:06:18,131-Speed 2491.24 samples/sec Loss 2.3395 LearningRate 0.000372 Epoch: 18 Global Step: 374250 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:06:26,331-Speed 2497.77 samples/sec Loss 2.4016 LearningRate 0.000372 Epoch: 18 Global Step: 374260 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:06:34,535-Speed 2496.82 samples/sec Loss 2.3881 LearningRate 0.000372 Epoch: 18 Global Step: 374270 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:06:42,738-Speed 2496.96 samples/sec Loss 2.3197 LearningRate 0.000372 Epoch: 18 Global Step: 374280 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:06:50,886-Speed 2513.89 samples/sec Loss 2.3412 LearningRate 0.000372 Epoch: 18 Global Step: 374290 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:06:59,090-Speed 2496.76 samples/sec Loss 2.3714 LearningRate 0.000372 Epoch: 18 Global Step: 374300 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:07:07,290-Speed 2498.10 samples/sec Loss 2.3473 LearningRate 0.000372 Epoch: 18 Global Step: 374310 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:07:15,519-Speed 2489.17 samples/sec Loss 2.2940 LearningRate 0.000372 Epoch: 18 Global Step: 374320 Fp16 Grad Scale: 16384 Required: 104 
hours Training: 2022-07-09 04:07:23,724-Speed 2496.21 samples/sec Loss 2.3109 LearningRate 0.000372 Epoch: 18 Global Step: 374330 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:07:31,921-Speed 2498.96 samples/sec Loss 2.3013 LearningRate 0.000372 Epoch: 18 Global Step: 374340 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:07:40,067-Speed 2514.36 samples/sec Loss 2.3115 LearningRate 0.000372 Epoch: 18 Global Step: 374350 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:07:48,272-Speed 2496.63 samples/sec Loss 2.3581 LearningRate 0.000372 Epoch: 18 Global Step: 374360 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:07:56,475-Speed 2497.03 samples/sec Loss 2.3134 LearningRate 0.000372 Epoch: 18 Global Step: 374370 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:08:04,675-Speed 2498.03 samples/sec Loss 2.3483 LearningRate 0.000372 Epoch: 18 Global Step: 374380 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:08:12,879-Speed 2496.77 samples/sec Loss 2.3465 LearningRate 0.000372 Epoch: 18 Global Step: 374390 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:08:21,086-Speed 2495.84 samples/sec Loss 2.3173 LearningRate 0.000372 Epoch: 18 Global Step: 374400 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:08:29,233-Speed 2514.04 samples/sec Loss 2.3672 LearningRate 0.000372 Epoch: 18 Global Step: 374410 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:08:37,441-Speed 2495.81 samples/sec Loss 2.2813 LearningRate 0.000372 Epoch: 18 Global Step: 374420 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:08:45,643-Speed 2497.19 samples/sec Loss 2.3104 LearningRate 0.000372 Epoch: 18 Global Step: 374430 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:08:53,844-Speed 2497.65 samples/sec Loss 2.3205 LearningRate 0.000372 Epoch: 18 Global Step: 374440 Fp16 Grad Scale: 32768 Required: 
104 hours Training: 2022-07-09 04:09:02,044-Speed 2497.83 samples/sec Loss 2.3558 LearningRate 0.000372 Epoch: 18 Global Step: 374450 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:09:10,252-Speed 2495.59 samples/sec Loss 2.2842 LearningRate 0.000372 Epoch: 18 Global Step: 374460 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:09:18,413-Speed 2509.98 samples/sec Loss 2.2894 LearningRate 0.000372 Epoch: 18 Global Step: 374470 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:09:26,613-Speed 2498.02 samples/sec Loss 2.3379 LearningRate 0.000372 Epoch: 18 Global Step: 374480 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:09:34,817-Speed 2496.73 samples/sec Loss 2.3423 LearningRate 0.000372 Epoch: 18 Global Step: 374490 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:09:43,018-Speed 2497.77 samples/sec Loss 2.3355 LearningRate 0.000371 Epoch: 18 Global Step: 374500 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:09:51,231-Speed 2493.95 samples/sec Loss 2.3304 LearningRate 0.000371 Epoch: 18 Global Step: 374510 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:09:59,446-Speed 2493.23 samples/sec Loss 2.3429 LearningRate 0.000371 Epoch: 18 Global Step: 374520 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:10:07,598-Speed 2512.55 samples/sec Loss 2.3447 LearningRate 0.000371 Epoch: 18 Global Step: 374530 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:10:15,804-Speed 2496.36 samples/sec Loss 2.3014 LearningRate 0.000371 Epoch: 18 Global Step: 374540 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:10:24,008-Speed 2496.65 samples/sec Loss 2.3073 LearningRate 0.000371 Epoch: 18 Global Step: 374550 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:10:32,208-Speed 2497.92 samples/sec Loss 2.3396 LearningRate 0.000371 Epoch: 18 Global Step: 374560 Fp16 Grad Scale: 32768 
Required: 104 hours Training: 2022-07-09 04:10:40,411-Speed 2496.98 samples/sec Loss 2.2772 LearningRate 0.000371 Epoch: 18 Global Step: 374570 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:10:48,615-Speed 2496.88 samples/sec Loss 2.3645 LearningRate 0.000371 Epoch: 18 Global Step: 374580 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:10:56,763-Speed 2513.84 samples/sec Loss 2.2872 LearningRate 0.000371 Epoch: 18 Global Step: 374590 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:11:04,978-Speed 2493.41 samples/sec Loss 2.3388 LearningRate 0.000371 Epoch: 18 Global Step: 374600 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:11:13,178-Speed 2498.00 samples/sec Loss 2.3193 LearningRate 0.000371 Epoch: 18 Global Step: 374610 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:11:21,382-Speed 2496.77 samples/sec Loss 2.3327 LearningRate 0.000371 Epoch: 18 Global Step: 374620 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:11:29,581-Speed 2497.99 samples/sec Loss 2.3873 LearningRate 0.000371 Epoch: 18 Global Step: 374630 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:11:37,787-Speed 2496.27 samples/sec Loss 2.3014 LearningRate 0.000371 Epoch: 18 Global Step: 374640 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:11:45,935-Speed 2513.84 samples/sec Loss 2.3691 LearningRate 0.000371 Epoch: 18 Global Step: 374650 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:11:54,135-Speed 2498.03 samples/sec Loss 2.3060 LearningRate 0.000371 Epoch: 18 Global Step: 374660 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:12:02,336-Speed 2497.73 samples/sec Loss 2.2690 LearningRate 0.000371 Epoch: 18 Global Step: 374670 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:12:10,541-Speed 2496.24 samples/sec Loss 2.2813 LearningRate 0.000371 Epoch: 18 Global Step: 374680 Fp16 Grad Scale: 
32768 Required: 104 hours Training: 2022-07-09 04:12:18,742-Speed 2497.68 samples/sec Loss 2.3223 LearningRate 0.000371 Epoch: 18 Global Step: 374690 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:12:26,948-Speed 2496.32 samples/sec Loss 2.3267 LearningRate 0.000371 Epoch: 18 Global Step: 374700 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:12:35,100-Speed 2512.72 samples/sec Loss 2.2764 LearningRate 0.000371 Epoch: 18 Global Step: 374710 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:12:43,300-Speed 2497.81 samples/sec Loss 2.2824 LearningRate 0.000371 Epoch: 18 Global Step: 374720 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:12:51,500-Speed 2497.97 samples/sec Loss 2.3347 LearningRate 0.000371 Epoch: 18 Global Step: 374730 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:12:59,706-Speed 2496.12 samples/sec Loss 2.3119 LearningRate 0.000371 Epoch: 18 Global Step: 374740 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:13:07,907-Speed 2497.72 samples/sec Loss 2.3058 LearningRate 0.000371 Epoch: 18 Global Step: 374750 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:13:16,111-Speed 2496.70 samples/sec Loss 2.3904 LearningRate 0.000371 Epoch: 18 Global Step: 374760 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:13:24,279-Speed 2507.54 samples/sec Loss 2.3318 LearningRate 0.000371 Epoch: 18 Global Step: 374770 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:13:32,490-Speed 2494.68 samples/sec Loss 2.2999 LearningRate 0.000371 Epoch: 18 Global Step: 374780 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:13:40,695-Speed 2496.57 samples/sec Loss 2.3745 LearningRate 0.000371 Epoch: 18 Global Step: 374790 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:13:48,899-Speed 2496.63 samples/sec Loss 2.3448 LearningRate 0.000371 Epoch: 18 Global Step: 374800 Fp16 Grad 
Scale: 32768 Required: 104 hours Training: 2022-07-09 04:13:57,111-Speed 2494.29 samples/sec Loss 2.3544 LearningRate 0.000371 Epoch: 18 Global Step: 374810 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:14:05,318-Speed 2496.24 samples/sec Loss 2.3533 LearningRate 0.000371 Epoch: 18 Global Step: 374820 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:14:13,463-Speed 2514.69 samples/sec Loss 2.3275 LearningRate 0.000371 Epoch: 18 Global Step: 374830 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:14:21,672-Speed 2495.16 samples/sec Loss 2.3079 LearningRate 0.000371 Epoch: 18 Global Step: 374840 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:14:29,885-Speed 2493.95 samples/sec Loss 2.2996 LearningRate 0.000371 Epoch: 18 Global Step: 374850 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:14:38,105-Speed 2491.86 samples/sec Loss 2.3338 LearningRate 0.000371 Epoch: 18 Global Step: 374860 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:14:46,313-Speed 2495.93 samples/sec Loss 2.3386 LearningRate 0.000371 Epoch: 18 Global Step: 374870 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:14:54,513-Speed 2498.27 samples/sec Loss 2.3118 LearningRate 0.000371 Epoch: 18 Global Step: 374880 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:15:02,659-Speed 2514.50 samples/sec Loss 2.3331 LearningRate 0.000371 Epoch: 18 Global Step: 374890 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:15:10,873-Speed 2493.66 samples/sec Loss 2.3258 LearningRate 0.000371 Epoch: 18 Global Step: 374900 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:15:19,074-Speed 2497.86 samples/sec Loss 2.3025 LearningRate 0.000371 Epoch: 18 Global Step: 374910 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:15:27,276-Speed 2497.49 samples/sec Loss 2.3888 LearningRate 0.000371 Epoch: 18 Global Step: 374920 Fp16 
Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:15:35,479-Speed 2496.89 samples/sec Loss 2.3337 LearningRate 0.000371 Epoch: 18 Global Step: 374930 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:15:43,679-Speed 2497.96 samples/sec Loss 2.3048 LearningRate 0.000371 Epoch: 18 Global Step: 374940 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:15:51,827-Speed 2513.92 samples/sec Loss 2.3513 LearningRate 0.000371 Epoch: 18 Global Step: 374950 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:16:00,027-Speed 2497.94 samples/sec Loss 2.3484 LearningRate 0.000371 Epoch: 18 Global Step: 374960 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:16:08,226-Speed 2498.34 samples/sec Loss 2.3697 LearningRate 0.000371 Epoch: 18 Global Step: 374970 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:16:16,435-Speed 2495.24 samples/sec Loss 2.2773 LearningRate 0.000371 Epoch: 18 Global Step: 374980 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:16:24,633-Speed 2498.28 samples/sec Loss 2.4012 LearningRate 0.000371 Epoch: 18 Global Step: 374990 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:16:32,851-Speed 2492.61 samples/sec Loss 2.3095 LearningRate 0.000371 Epoch: 18 Global Step: 375000 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:16:41,012-Speed 2509.83 samples/sec Loss 2.3364 LearningRate 0.000371 Epoch: 18 Global Step: 375010 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:16:49,212-Speed 2497.83 samples/sec Loss 2.3626 LearningRate 0.000371 Epoch: 18 Global Step: 375020 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:16:57,425-Speed 2493.95 samples/sec Loss 2.3373 LearningRate 0.000371 Epoch: 18 Global Step: 375030 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:17:05,627-Speed 2497.37 samples/sec Loss 2.3619 LearningRate 0.000371 Epoch: 18 Global Step: 375040 
Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:17:13,830-Speed 2497.15 samples/sec Loss 2.3769 LearningRate 0.000371 Epoch: 18 Global Step: 375050 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:17:22,037-Speed 2495.76 samples/sec Loss 2.3394 LearningRate 0.000371 Epoch: 18 Global Step: 375060 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:17:30,191-Speed 2512.06 samples/sec Loss 2.3499 LearningRate 0.000371 Epoch: 18 Global Step: 375070 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:17:38,394-Speed 2496.86 samples/sec Loss 2.2915 LearningRate 0.000371 Epoch: 18 Global Step: 375080 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:17:46,599-Speed 2496.53 samples/sec Loss 2.3227 LearningRate 0.000371 Epoch: 18 Global Step: 375090 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:17:54,798-Speed 2497.99 samples/sec Loss 2.3531 LearningRate 0.000371 Epoch: 18 Global Step: 375100 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:18:03,001-Speed 2497.17 samples/sec Loss 2.3395 LearningRate 0.000371 Epoch: 18 Global Step: 375110 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:18:11,206-Speed 2496.26 samples/sec Loss 2.4296 LearningRate 0.000370 Epoch: 18 Global Step: 375120 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:18:19,365-Speed 2510.72 samples/sec Loss 2.3450 LearningRate 0.000370 Epoch: 18 Global Step: 375130 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:18:27,567-Speed 2497.30 samples/sec Loss 2.3802 LearningRate 0.000370 Epoch: 18 Global Step: 375140 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:18:35,779-Speed 2494.23 samples/sec Loss 2.3710 LearningRate 0.000370 Epoch: 18 Global Step: 375150 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:18:43,981-Speed 2497.33 samples/sec Loss 2.3724 LearningRate 0.000370 Epoch: 18 Global Step: 
375160 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:18:52,180-Speed 2498.01 samples/sec Loss 2.3689 LearningRate 0.000370 Epoch: 18 Global Step: 375170 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:19:00,382-Speed 2497.49 samples/sec Loss 2.3569 LearningRate 0.000370 Epoch: 18 Global Step: 375180 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:19:08,530-Speed 2513.59 samples/sec Loss 2.3467 LearningRate 0.000370 Epoch: 18 Global Step: 375190 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:19:16,733-Speed 2496.90 samples/sec Loss 2.3904 LearningRate 0.000370 Epoch: 18 Global Step: 375200 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:19:24,933-Speed 2497.97 samples/sec Loss 2.3607 LearningRate 0.000370 Epoch: 18 Global Step: 375210 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:19:33,131-Speed 2498.53 samples/sec Loss 2.3432 LearningRate 0.000370 Epoch: 18 Global Step: 375220 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:19:41,333-Speed 2497.54 samples/sec Loss 2.3354 LearningRate 0.000370 Epoch: 18 Global Step: 375230 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:19:49,532-Speed 2498.16 samples/sec Loss 2.3839 LearningRate 0.000370 Epoch: 18 Global Step: 375240 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:19:57,687-Speed 2511.84 samples/sec Loss 2.3341 LearningRate 0.000370 Epoch: 18 Global Step: 375250 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:20:05,888-Speed 2497.95 samples/sec Loss 2.4009 LearningRate 0.000370 Epoch: 18 Global Step: 375260 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:20:14,102-Speed 2493.40 samples/sec Loss 2.3495 LearningRate 0.000370 Epoch: 18 Global Step: 375270 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:20:22,326-Speed 2490.87 samples/sec Loss 2.3042 LearningRate 0.000370 Epoch: 18 Global 
Step: 375280 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:20:30,534-Speed 2495.50 samples/sec Loss 2.3069 LearningRate 0.000370 Epoch: 18 Global Step: 375290 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:20:38,738-Speed 2496.73 samples/sec Loss 2.3339 LearningRate 0.000370 Epoch: 18 Global Step: 375300 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:20:46,887-Speed 2513.56 samples/sec Loss 2.3366 LearningRate 0.000370 Epoch: 18 Global Step: 375310 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:20:55,093-Speed 2496.06 samples/sec Loss 2.2950 LearningRate 0.000370 Epoch: 18 Global Step: 375320 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:21:03,299-Speed 2495.89 samples/sec Loss 2.3130 LearningRate 0.000370 Epoch: 18 Global Step: 375330 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:21:11,500-Speed 2497.61 samples/sec Loss 2.3643 LearningRate 0.000370 Epoch: 18 Global Step: 375340 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:21:19,705-Speed 2496.54 samples/sec Loss 2.3224 LearningRate 0.000370 Epoch: 18 Global Step: 375350 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:21:27,910-Speed 2496.68 samples/sec Loss 2.3530 LearningRate 0.000370 Epoch: 18 Global Step: 375360 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:21:36,066-Speed 2511.67 samples/sec Loss 2.3023 LearningRate 0.000370 Epoch: 18 Global Step: 375370 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:21:44,273-Speed 2495.71 samples/sec Loss 2.2770 LearningRate 0.000370 Epoch: 18 Global Step: 375380 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:21:52,476-Speed 2496.98 samples/sec Loss 2.3188 LearningRate 0.000370 Epoch: 18 Global Step: 375390 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:22:00,683-Speed 2495.78 samples/sec Loss 2.2589 LearningRate 0.000370 Epoch: 18 
Global Step: 375400 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:22:08,920-Speed 2486.67 samples/sec Loss 2.3379 LearningRate 0.000370 Epoch: 18 Global Step: 375410 Fp16 Grad Scale: 32768 Required: 104 hours Training: 2022-07-09 04:22:17,085-Speed 2508.78 samples/sec Loss 2.2976 LearningRate 0.000370 Epoch: 18 Global Step: 375420 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:22:25,234-Speed 2513.60 samples/sec Loss 2.3352 LearningRate 0.000370 Epoch: 18 Global Step: 375430 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:22:33,437-Speed 2496.90 samples/sec Loss 2.3580 LearningRate 0.000370 Epoch: 18 Global Step: 375440 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:22:41,648-Speed 2494.60 samples/sec Loss 2.3086 LearningRate 0.000370 Epoch: 18 Global Step: 375450 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:22:49,856-Speed 2495.69 samples/sec Loss 2.3134 LearningRate 0.000370 Epoch: 18 Global Step: 375460 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:22:58,078-Speed 2491.17 samples/sec Loss 2.3532 LearningRate 0.000370 Epoch: 18 Global Step: 375470 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:23:06,277-Speed 2498.04 samples/sec Loss 2.3290 LearningRate 0.000370 Epoch: 18 Global Step: 375480 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:23:14,444-Speed 2508.22 samples/sec Loss 2.3174 LearningRate 0.000370 Epoch: 18 Global Step: 375490 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:23:22,644-Speed 2497.93 samples/sec Loss 2.3363 LearningRate 0.000370 Epoch: 18 Global Step: 375500 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:23:30,845-Speed 2497.51 samples/sec Loss 2.3036 LearningRate 0.000370 Epoch: 18 Global Step: 375510 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:23:39,048-Speed 2497.19 samples/sec Loss 2.3150 LearningRate 0.000370 
Epoch: 18 Global Step: 375520 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:23:47,251-Speed 2497.05 samples/sec Loss 2.3588 LearningRate 0.000370 Epoch: 18 Global Step: 375530 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:23:55,455-Speed 2496.75 samples/sec Loss 2.3361 LearningRate 0.000370 Epoch: 18 Global Step: 375540 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:24:03,604-Speed 2513.67 samples/sec Loss 2.3160 LearningRate 0.000370 Epoch: 18 Global Step: 375550 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:24:11,820-Speed 2493.20 samples/sec Loss 2.4022 LearningRate 0.000370 Epoch: 18 Global Step: 375560 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:24:20,024-Speed 2496.51 samples/sec Loss 2.3256 LearningRate 0.000370 Epoch: 18 Global Step: 375570 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:24:28,226-Speed 2497.36 samples/sec Loss 2.3010 LearningRate 0.000370 Epoch: 18 Global Step: 375580 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:24:36,431-Speed 2496.30 samples/sec Loss 2.3239 LearningRate 0.000370 Epoch: 18 Global Step: 375590 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:24:44,637-Speed 2496.38 samples/sec Loss 2.3374 LearningRate 0.000370 Epoch: 18 Global Step: 375600 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:24:52,788-Speed 2512.87 samples/sec Loss 2.3920 LearningRate 0.000370 Epoch: 18 Global Step: 375610 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:25:00,994-Speed 2496.02 samples/sec Loss 2.3381 LearningRate 0.000370 Epoch: 18 Global Step: 375620 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:25:09,149-Speed 2511.76 samples/sec Loss 2.3760 LearningRate 0.000370 Epoch: 18 Global Step: 375630 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:25:17,350-Speed 2498.00 samples/sec Loss 2.3836 LearningRate 
0.000370 Epoch: 18 Global Step: 375640 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:25:25,557-Speed 2495.89 samples/sec Loss 2.3656 LearningRate 0.000370 Epoch: 18 Global Step: 375650 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:25:33,755-Speed 2498.23 samples/sec Loss 2.3197 LearningRate 0.000370 Epoch: 18 Global Step: 375660 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:25:41,903-Speed 2514.06 samples/sec Loss 2.3035 LearningRate 0.000370 Epoch: 18 Global Step: 375670 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:25:50,106-Speed 2497.08 samples/sec Loss 2.3308 LearningRate 0.000370 Epoch: 18 Global Step: 375680 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:25:58,318-Speed 2494.26 samples/sec Loss 2.3106 LearningRate 0.000370 Epoch: 18 Global Step: 375690 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:26:06,519-Speed 2497.80 samples/sec Loss 2.2863 LearningRate 0.000370 Epoch: 18 Global Step: 375700 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:26:14,717-Speed 2499.50 samples/sec Loss 2.3383 LearningRate 0.000370 Epoch: 18 Global Step: 375710 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:26:22,917-Speed 2498.22 samples/sec Loss 2.3305 LearningRate 0.000370 Epoch: 18 Global Step: 375720 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:26:31,075-Speed 2510.58 samples/sec Loss 2.2887 LearningRate 0.000369 Epoch: 18 Global Step: 375730 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:26:39,274-Speed 2498.30 samples/sec Loss 2.2971 LearningRate 0.000369 Epoch: 18 Global Step: 375740 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:26:47,474-Speed 2498.14 samples/sec Loss 2.3435 LearningRate 0.000369 Epoch: 18 Global Step: 375750 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:26:55,675-Speed 2497.77 samples/sec Loss 2.3131 LearningRate 
0.000369 Epoch: 18 Global Step: 375760 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:27:03,882-Speed 2495.57 samples/sec Loss 2.3028 LearningRate 0.000369 Epoch: 18 Global Step: 375770 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:27:12,084-Speed 2497.49 samples/sec Loss 2.4212 LearningRate 0.000369 Epoch: 18 Global Step: 375780 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:27:20,244-Speed 2510.25 samples/sec Loss 2.3557 LearningRate 0.000369 Epoch: 18 Global Step: 375790 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:27:28,446-Speed 2497.45 samples/sec Loss 2.3768 LearningRate 0.000369 Epoch: 18 Global Step: 375800 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:27:36,646-Speed 2497.84 samples/sec Loss 2.3324 LearningRate 0.000369 Epoch: 18 Global Step: 375810 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:27:44,851-Speed 2496.64 samples/sec Loss 2.3040 LearningRate 0.000369 Epoch: 18 Global Step: 375820 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:27:53,060-Speed 2495.47 samples/sec Loss 2.3855 LearningRate 0.000369 Epoch: 18 Global Step: 375830 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:28:01,265-Speed 2496.43 samples/sec Loss 2.3071 LearningRate 0.000369 Epoch: 18 Global Step: 375840 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:28:09,412-Speed 2514.31 samples/sec Loss 2.3157 LearningRate 0.000369 Epoch: 18 Global Step: 375850 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:28:17,613-Speed 2497.78 samples/sec Loss 2.4129 LearningRate 0.000369 Epoch: 18 Global Step: 375860 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:28:25,809-Speed 2499.24 samples/sec Loss 2.2792 LearningRate 0.000369 Epoch: 18 Global Step: 375870 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:28:34,010-Speed 2497.57 samples/sec Loss 2.4126 LearningRate 
0.000369 Epoch: 18 Global Step: 375880 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:28:42,208-Speed 2498.72 samples/sec Loss 2.2995 LearningRate 0.000369 Epoch: 18 Global Step: 375890 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:28:50,412-Speed 2496.98 samples/sec Loss 2.3242 LearningRate 0.000369 Epoch: 18 Global Step: 375900 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:28:58,561-Speed 2513.56 samples/sec Loss 2.2971 LearningRate 0.000369 Epoch: 18 Global Step: 375910 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:29:06,757-Speed 2498.98 samples/sec Loss 2.3351 LearningRate 0.000369 Epoch: 18 Global Step: 375920 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:29:14,961-Speed 2497.07 samples/sec Loss 2.3858 LearningRate 0.000369 Epoch: 18 Global Step: 375930 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:29:23,161-Speed 2498.07 samples/sec Loss 2.3493 LearningRate 0.000369 Epoch: 18 Global Step: 375940 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:29:31,364-Speed 2496.83 samples/sec Loss 2.3505 LearningRate 0.000369 Epoch: 18 Global Step: 375950 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:29:39,572-Speed 2495.75 samples/sec Loss 2.3465 LearningRate 0.000369 Epoch: 18 Global Step: 375960 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:29:47,745-Speed 2506.07 samples/sec Loss 2.3207 LearningRate 0.000369 Epoch: 18 Global Step: 375970 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:29:55,948-Speed 2497.38 samples/sec Loss 2.3351 LearningRate 0.000369 Epoch: 18 Global Step: 375980 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:30:04,148-Speed 2497.95 samples/sec Loss 2.3342 LearningRate 0.000369 Epoch: 18 Global Step: 375990 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:30:12,352-Speed 2496.72 samples/sec Loss 2.3686 LearningRate 
0.000369 Epoch: 18 Global Step: 376000 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:30:20,554-Speed 2497.25 samples/sec Loss 2.3465 LearningRate 0.000369 Epoch: 18 Global Step: 376010 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:30:28,753-Speed 2498.33 samples/sec Loss 2.3309 LearningRate 0.000369 Epoch: 18 Global Step: 376020 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:30:36,903-Speed 2512.93 samples/sec Loss 2.3611 LearningRate 0.000369 Epoch: 18 Global Step: 376030 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:30:45,103-Speed 2498.02 samples/sec Loss 2.3652 LearningRate 0.000369 Epoch: 18 Global Step: 376040 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:30:53,302-Speed 2498.46 samples/sec Loss 2.3173 LearningRate 0.000369 Epoch: 18 Global Step: 376050 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:31:01,514-Speed 2494.46 samples/sec Loss 2.3317 LearningRate 0.000369 Epoch: 18 Global Step: 376060 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:31:09,715-Speed 2497.34 samples/sec Loss 2.3413 LearningRate 0.000369 Epoch: 18 Global Step: 376070 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:31:17,916-Speed 2497.75 samples/sec Loss 2.3572 LearningRate 0.000369 Epoch: 18 Global Step: 376080 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:31:26,065-Speed 2513.54 samples/sec Loss 2.3470 LearningRate 0.000369 Epoch: 18 Global Step: 376090 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:31:34,267-Speed 2497.55 samples/sec Loss 2.3207 LearningRate 0.000369 Epoch: 18 Global Step: 376100 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:31:42,466-Speed 2498.12 samples/sec Loss 2.2958 LearningRate 0.000369 Epoch: 18 Global Step: 376110 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:31:50,664-Speed 2498.74 samples/sec Loss 2.3823 LearningRate 
0.000369 Epoch: 18 Global Step: 376120 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:31:58,869-Speed 2496.48 samples/sec Loss 2.3282 LearningRate 0.000369 Epoch: 18 Global Step: 376130 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:32:07,072-Speed 2496.88 samples/sec Loss 2.3592 LearningRate 0.000369 Epoch: 18 Global Step: 376140 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:32:15,220-Speed 2514.01 samples/sec Loss 2.4502 LearningRate 0.000369 Epoch: 18 Global Step: 376150 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:32:23,418-Speed 2498.53 samples/sec Loss 2.3740 LearningRate 0.000369 Epoch: 18 Global Step: 376160 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:32:31,617-Speed 2498.10 samples/sec Loss 2.3297 LearningRate 0.000369 Epoch: 18 Global Step: 376170 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:32:39,823-Speed 2496.45 samples/sec Loss 2.3468 LearningRate 0.000369 Epoch: 18 Global Step: 376180 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:32:48,027-Speed 2496.56 samples/sec Loss 2.3757 LearningRate 0.000369 Epoch: 18 Global Step: 376190 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:32:56,228-Speed 2497.75 samples/sec Loss 2.3295 LearningRate 0.000369 Epoch: 18 Global Step: 376200 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:33:04,379-Speed 2512.91 samples/sec Loss 2.3307 LearningRate 0.000369 Epoch: 18 Global Step: 376210 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:33:12,581-Speed 2497.61 samples/sec Loss 2.3933 LearningRate 0.000369 Epoch: 18 Global Step: 376220 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:33:20,779-Speed 2498.53 samples/sec Loss 2.3376 LearningRate 0.000369 Epoch: 18 Global Step: 376230 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:33:28,988-Speed 2495.48 samples/sec Loss 2.3449 LearningRate 
0.000369 Epoch: 18 Global Step: 376240 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:33:37,193-Speed 2496.27 samples/sec Loss 2.3128 LearningRate 0.000369 Epoch: 18 Global Step: 376250 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:33:45,399-Speed 2496.21 samples/sec Loss 2.3371 LearningRate 0.000369 Epoch: 18 Global Step: 376260 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:33:53,546-Speed 2513.87 samples/sec Loss 2.3195 LearningRate 0.000369 Epoch: 18 Global Step: 376270 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:34:01,748-Speed 2497.46 samples/sec Loss 2.2962 LearningRate 0.000369 Epoch: 18 Global Step: 376280 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:34:09,949-Speed 2498.02 samples/sec Loss 2.3871 LearningRate 0.000369 Epoch: 18 Global Step: 376290 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:34:18,149-Speed 2497.88 samples/sec Loss 2.3939 LearningRate 0.000369 Epoch: 18 Global Step: 376300 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:34:26,360-Speed 2494.66 samples/sec Loss 2.3054 LearningRate 0.000369 Epoch: 18 Global Step: 376310 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:34:34,559-Speed 2498.70 samples/sec Loss 2.3133 LearningRate 0.000369 Epoch: 18 Global Step: 376320 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:34:42,704-Speed 2514.64 samples/sec Loss 2.3630 LearningRate 0.000369 Epoch: 18 Global Step: 376330 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:34:50,905-Speed 2497.66 samples/sec Loss 2.2993 LearningRate 0.000369 Epoch: 18 Global Step: 376340 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:34:59,107-Speed 2497.35 samples/sec Loss 2.3082 LearningRate 0.000368 Epoch: 18 Global Step: 376350 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:35:07,309-Speed 2497.43 samples/sec Loss 2.3188 LearningRate 
0.000368 Epoch: 18 Global Step: 376360 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:35:15,511-Speed 2497.29 samples/sec Loss 2.3259 LearningRate 0.000368 Epoch: 18 Global Step: 376370 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:35:23,710-Speed 2498.34 samples/sec Loss 2.3677 LearningRate 0.000368 Epoch: 18 Global Step: 376380 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:35:31,851-Speed 2516.13 samples/sec Loss 2.3083 LearningRate 0.000368 Epoch: 18 Global Step: 376390 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:35:40,052-Speed 2497.81 samples/sec Loss 2.2959 LearningRate 0.000368 Epoch: 18 Global Step: 376400 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:35:48,247-Speed 2499.33 samples/sec Loss 2.3096 LearningRate 0.000368 Epoch: 18 Global Step: 376410 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:35:56,447-Speed 2498.01 samples/sec Loss 2.3594 LearningRate 0.000368 Epoch: 18 Global Step: 376420 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:36:04,654-Speed 2495.78 samples/sec Loss 2.3297 LearningRate 0.000368 Epoch: 18 Global Step: 376430 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:36:12,857-Speed 2496.86 samples/sec Loss 2.3285 LearningRate 0.000368 Epoch: 18 Global Step: 376440 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:36:21,006-Speed 2513.96 samples/sec Loss 2.3501 LearningRate 0.000368 Epoch: 18 Global Step: 376450 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:36:29,204-Speed 2498.47 samples/sec Loss 2.3249 LearningRate 0.000368 Epoch: 18 Global Step: 376460 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:36:37,405-Speed 2497.67 samples/sec Loss 2.3494 LearningRate 0.000368 Epoch: 18 Global Step: 376470 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:36:45,603-Speed 2498.53 samples/sec Loss 2.3014 LearningRate 
0.000368 Epoch: 18 Global Step: 376480 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:36:53,800-Speed 2498.81 samples/sec Loss 2.3333 LearningRate 0.000368 Epoch: 18 Global Step: 376490 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:37:01,999-Speed 2498.23 samples/sec Loss 2.3469 LearningRate 0.000368 Epoch: 18 Global Step: 376500 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:37:10,177-Speed 2504.74 samples/sec Loss 2.3056 LearningRate 0.000368 Epoch: 18 Global Step: 376510 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:37:18,376-Speed 2498.14 samples/sec Loss 2.3299 LearningRate 0.000368 Epoch: 18 Global Step: 376520 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:37:26,583-Speed 2495.94 samples/sec Loss 2.2987 LearningRate 0.000368 Epoch: 18 Global Step: 376530 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:37:34,779-Speed 2499.04 samples/sec Loss 2.2883 LearningRate 0.000368 Epoch: 18 Global Step: 376540 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:37:42,985-Speed 2496.37 samples/sec Loss 2.2828 LearningRate 0.000368 Epoch: 18 Global Step: 376550 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:37:51,188-Speed 2496.96 samples/sec Loss 2.2776 LearningRate 0.000368 Epoch: 18 Global Step: 376560 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:37:59,339-Speed 2512.79 samples/sec Loss 2.2720 LearningRate 0.000368 Epoch: 18 Global Step: 376570 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:38:07,540-Speed 2497.74 samples/sec Loss 2.2956 LearningRate 0.000368 Epoch: 18 Global Step: 376580 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:38:15,739-Speed 2498.50 samples/sec Loss 2.3016 LearningRate 0.000368 Epoch: 18 Global Step: 376590 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:38:23,950-Speed 2494.61 samples/sec Loss 2.3342 LearningRate 
0.000368 Epoch: 18 Global Step: 376600 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:38:32,149-Speed 2498.38 samples/sec Loss 2.3264 LearningRate 0.000368 Epoch: 18 Global Step: 376610 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:38:40,348-Speed 2498.07 samples/sec Loss 2.3288 LearningRate 0.000368 Epoch: 18 Global Step: 376620 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:38:48,495-Speed 2514.06 samples/sec Loss 2.2872 LearningRate 0.000368 Epoch: 18 Global Step: 376630 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:38:56,706-Speed 2494.88 samples/sec Loss 2.2945 LearningRate 0.000368 Epoch: 18 Global Step: 376640 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:39:04,900-Speed 2499.84 samples/sec Loss 2.2925 LearningRate 0.000368 Epoch: 18 Global Step: 376650 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:39:13,106-Speed 2495.99 samples/sec Loss 2.3338 LearningRate 0.000368 Epoch: 18 Global Step: 376660 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:39:21,319-Speed 2493.87 samples/sec Loss 2.3119 LearningRate 0.000368 Epoch: 18 Global Step: 376670 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:39:29,522-Speed 2497.19 samples/sec Loss 2.2930 LearningRate 0.000368 Epoch: 18 Global Step: 376680 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:39:37,668-Speed 2514.58 samples/sec Loss 2.3159 LearningRate 0.000368 Epoch: 18 Global Step: 376690 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:39:45,866-Speed 2498.69 samples/sec Loss 2.2811 LearningRate 0.000368 Epoch: 18 Global Step: 376700 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:39:54,064-Speed 2498.66 samples/sec Loss 2.3024 LearningRate 0.000368 Epoch: 18 Global Step: 376710 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:40:02,263-Speed 2498.35 samples/sec Loss 2.3276 LearningRate 
0.000368 Epoch: 18 Global Step: 376720 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:40:10,468-Speed 2496.27 samples/sec Loss 2.3242 LearningRate 0.000368 Epoch: 18 Global Step: 376730 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:40:18,673-Speed 2496.66 samples/sec Loss 2.2785 LearningRate 0.000368 Epoch: 18 Global Step: 376740 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:40:26,826-Speed 2512.29 samples/sec Loss 2.2677 LearningRate 0.000368 Epoch: 18 Global Step: 376750 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:40:35,029-Speed 2496.81 samples/sec Loss 2.3687 LearningRate 0.000368 Epoch: 18 Global Step: 376760 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:40:43,233-Speed 2496.79 samples/sec Loss 2.3054 LearningRate 0.000368 Epoch: 18 Global Step: 376770 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:40:51,437-Speed 2496.75 samples/sec Loss 2.3259 LearningRate 0.000368 Epoch: 18 Global Step: 376780 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:40:59,641-Speed 2496.76 samples/sec Loss 2.3055 LearningRate 0.000368 Epoch: 18 Global Step: 376790 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:41:07,848-Speed 2495.89 samples/sec Loss 2.2853 LearningRate 0.000368 Epoch: 18 Global Step: 376800 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:41:16,002-Speed 2512.07 samples/sec Loss 2.3172 LearningRate 0.000368 Epoch: 18 Global Step: 376810 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:41:24,207-Speed 2496.66 samples/sec Loss 2.3088 LearningRate 0.000368 Epoch: 18 Global Step: 376820 Fp16 Grad Scale: 8192 Required: 104 hours Training: 2022-07-09 04:41:32,406-Speed 2498.21 samples/sec Loss 2.3892 LearningRate 0.000368 Epoch: 18 Global Step: 376830 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:41:40,608-Speed 2497.75 samples/sec Loss 2.3147 LearningRate 
0.000368 Epoch: 18 Global Step: 376840 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:41:48,807-Speed 2498.25 samples/sec Loss 2.2899 LearningRate 0.000368 Epoch: 18 Global Step: 376850 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:41:57,009-Speed 2497.31 samples/sec Loss 2.2794 LearningRate 0.000368 Epoch: 18 Global Step: 376860 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:42:05,169-Speed 2510.00 samples/sec Loss 2.3013 LearningRate 0.000368 Epoch: 18 Global Step: 376870 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:42:13,372-Speed 2497.20 samples/sec Loss 2.2770 LearningRate 0.000368 Epoch: 18 Global Step: 376880 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:42:21,571-Speed 2498.43 samples/sec Loss 2.3489 LearningRate 0.000368 Epoch: 18 Global Step: 376890 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:42:29,773-Speed 2497.24 samples/sec Loss 2.2674 LearningRate 0.000368 Epoch: 18 Global Step: 376900 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:42:37,974-Speed 2497.62 samples/sec Loss 2.3204 LearningRate 0.000368 Epoch: 18 Global Step: 376910 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:42:46,179-Speed 2496.40 samples/sec Loss 2.3229 LearningRate 0.000368 Epoch: 18 Global Step: 376920 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:42:54,337-Speed 2511.00 samples/sec Loss 2.2951 LearningRate 0.000368 Epoch: 18 Global Step: 376930 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:43:02,539-Speed 2497.31 samples/sec Loss 2.3325 LearningRate 0.000368 Epoch: 18 Global Step: 376940 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:43:10,736-Speed 2498.90 samples/sec Loss 2.3247 LearningRate 0.000368 Epoch: 18 Global Step: 376950 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:43:18,941-Speed 2496.52 samples/sec Loss 2.2812 
LearningRate 0.000367 Epoch: 18 Global Step: 376960 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:43:27,138-Speed 2498.94 samples/sec Loss 2.3387 LearningRate 0.000367 Epoch: 18 Global Step: 376970 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:43:35,342-Speed 2496.94 samples/sec Loss 2.3495 LearningRate 0.000367 Epoch: 18 Global Step: 376980 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:43:43,506-Speed 2508.95 samples/sec Loss 2.2991 LearningRate 0.000367 Epoch: 18 Global Step: 376990 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:43:51,709-Speed 2497.00 samples/sec Loss 2.2905 LearningRate 0.000367 Epoch: 18 Global Step: 377000 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:43:59,909-Speed 2497.92 samples/sec Loss 2.2815 LearningRate 0.000367 Epoch: 18 Global Step: 377010 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:44:08,110-Speed 2497.59 samples/sec Loss 2.2854 LearningRate 0.000367 Epoch: 18 Global Step: 377020 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:44:16,309-Speed 2497.97 samples/sec Loss 2.2871 LearningRate 0.000367 Epoch: 18 Global Step: 377030 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:44:24,515-Speed 2496.23 samples/sec Loss 2.3135 LearningRate 0.000367 Epoch: 18 Global Step: 377040 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:44:32,663-Speed 2513.97 samples/sec Loss 2.3181 LearningRate 0.000367 Epoch: 18 Global Step: 377050 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:44:40,863-Speed 2497.99 samples/sec Loss 2.3736 LearningRate 0.000367 Epoch: 18 Global Step: 377060 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:44:49,070-Speed 2495.69 samples/sec Loss 2.3175 LearningRate 0.000367 Epoch: 18 Global Step: 377070 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:44:57,273-Speed 2497.21 samples/sec Loss 
2.3160 LearningRate 0.000367 Epoch: 18 Global Step: 377080 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:45:05,484-Speed 2495.03 samples/sec Loss 2.3557 LearningRate 0.000367 Epoch: 18 Global Step: 377090 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:45:13,688-Speed 2496.63 samples/sec Loss 2.2870 LearningRate 0.000367 Epoch: 18 Global Step: 377100 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:45:21,840-Speed 2512.51 samples/sec Loss 2.3452 LearningRate 0.000367 Epoch: 18 Global Step: 377110 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:45:30,046-Speed 2496.36 samples/sec Loss 2.3411 LearningRate 0.000367 Epoch: 18 Global Step: 377120 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:45:38,248-Speed 2497.36 samples/sec Loss 2.3354 LearningRate 0.000367 Epoch: 18 Global Step: 377130 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:45:46,449-Speed 2497.49 samples/sec Loss 2.3245 LearningRate 0.000367 Epoch: 18 Global Step: 377140 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:45:54,652-Speed 2497.10 samples/sec Loss 2.3348 LearningRate 0.000367 Epoch: 18 Global Step: 377150 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:46:02,869-Speed 2493.06 samples/sec Loss 2.2646 LearningRate 0.000367 Epoch: 18 Global Step: 377160 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:46:11,024-Speed 2511.50 samples/sec Loss 2.2928 LearningRate 0.000367 Epoch: 18 Global Step: 377170 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:46:19,235-Speed 2494.54 samples/sec Loss 2.2935 LearningRate 0.000367 Epoch: 18 Global Step: 377180 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:46:27,456-Speed 2491.59 samples/sec Loss 2.3681 LearningRate 0.000367 Epoch: 18 Global Step: 377190 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:46:35,667-Speed 2494.62 samples/sec 
Loss 2.3543 LearningRate 0.000367 Epoch: 18 Global Step: 377200 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:46:43,870-Speed 2497.07 samples/sec Loss 2.3242 LearningRate 0.000367 Epoch: 18 Global Step: 377210 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:46:52,069-Speed 2498.32 samples/sec Loss 2.3413 LearningRate 0.000367 Epoch: 18 Global Step: 377220 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:47:00,218-Speed 2513.64 samples/sec Loss 2.2972 LearningRate 0.000367 Epoch: 18 Global Step: 377230 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:47:08,419-Speed 2497.68 samples/sec Loss 2.3579 LearningRate 0.000367 Epoch: 18 Global Step: 377240 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:47:16,620-Speed 2497.55 samples/sec Loss 2.2641 LearningRate 0.000367 Epoch: 18 Global Step: 377250 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:47:24,825-Speed 2496.28 samples/sec Loss 2.3523 LearningRate 0.000367 Epoch: 18 Global Step: 377260 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:47:33,028-Speed 2497.18 samples/sec Loss 2.2861 LearningRate 0.000367 Epoch: 18 Global Step: 377270 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:47:41,231-Speed 2497.37 samples/sec Loss 2.2651 LearningRate 0.000367 Epoch: 18 Global Step: 377280 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:47:49,387-Speed 2511.40 samples/sec Loss 2.3499 LearningRate 0.000367 Epoch: 18 Global Step: 377290 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:47:57,588-Speed 2497.59 samples/sec Loss 2.3719 LearningRate 0.000367 Epoch: 18 Global Step: 377300 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:48:05,792-Speed 2496.55 samples/sec Loss 2.2881 LearningRate 0.000367 Epoch: 18 Global Step: 377310 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:48:13,997-Speed 2496.77 
samples/sec Loss 2.3243 LearningRate 0.000367 Epoch: 18 Global Step: 377320 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:48:22,206-Speed 2494.96 samples/sec Loss 2.3302 LearningRate 0.000367 Epoch: 18 Global Step: 377330 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:48:30,414-Speed 2495.73 samples/sec Loss 2.3136 LearningRate 0.000367 Epoch: 18 Global Step: 377340 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:48:38,563-Speed 2513.42 samples/sec Loss 2.2833 LearningRate 0.000367 Epoch: 18 Global Step: 377350 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:48:46,770-Speed 2495.87 samples/sec Loss 2.3218 LearningRate 0.000367 Epoch: 18 Global Step: 377360 Fp16 Grad Scale: 16384 Required: 104 hours Training: 2022-07-09 04:48:54,970-Speed 2497.99 samples/sec Loss 2.3008 LearningRate 0.000367 Epoch: 18 Global Step: 377370 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:49:03,174-Speed 2496.87 samples/sec Loss 2.3100 LearningRate 0.000367 Epoch: 18 Global Step: 377380 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:49:11,392-Speed 2492.23 samples/sec Loss 2.3791 LearningRate 0.000367 Epoch: 18 Global Step: 377390 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:49:19,594-Speed 2497.29 samples/sec Loss 2.3029 LearningRate 0.000367 Epoch: 18 Global Step: 377400 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:49:27,743-Speed 2513.57 samples/sec Loss 2.3603 LearningRate 0.000367 Epoch: 18 Global Step: 377410 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:49:35,944-Speed 2497.66 samples/sec Loss 2.3645 LearningRate 0.000367 Epoch: 18 Global Step: 377420 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:49:44,150-Speed 2496.07 samples/sec Loss 2.3177 LearningRate 0.000367 Epoch: 18 Global Step: 377430 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:49:52,351-Speed 
2497.64 samples/sec Loss 2.3652 LearningRate 0.000367 Epoch: 18 Global Step: 377440 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:50:00,550-Speed 2498.48 samples/sec Loss 2.3201 LearningRate 0.000367 Epoch: 18 Global Step: 377450 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:50:08,759-Speed 2495.11 samples/sec Loss 2.3586 LearningRate 0.000367 Epoch: 18 Global Step: 377460 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:50:16,908-Speed 2513.67 samples/sec Loss 2.3350 LearningRate 0.000367 Epoch: 18 Global Step: 377470 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:50:25,109-Speed 2497.65 samples/sec Loss 2.3273 LearningRate 0.000367 Epoch: 18 Global Step: 377480 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:50:33,315-Speed 2496.20 samples/sec Loss 2.3542 LearningRate 0.000367 Epoch: 18 Global Step: 377490 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:50:41,529-Speed 2493.68 samples/sec Loss 2.2746 LearningRate 0.000367 Epoch: 18 Global Step: 377500 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:50:49,730-Speed 2497.69 samples/sec Loss 2.3318 LearningRate 0.000367 Epoch: 18 Global Step: 377510 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:50:57,946-Speed 2493.18 samples/sec Loss 2.3068 LearningRate 0.000367 Epoch: 18 Global Step: 377520 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:51:06,100-Speed 2512.62 samples/sec Loss 2.3397 LearningRate 0.000367 Epoch: 18 Global Step: 377530 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:51:14,314-Speed 2493.49 samples/sec Loss 2.2949 LearningRate 0.000367 Epoch: 18 Global Step: 377540 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:51:22,516-Speed 2497.34 samples/sec Loss 2.3424 LearningRate 0.000367 Epoch: 18 Global Step: 377550 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 
04:51:30,722-Speed 2496.08 samples/sec Loss 2.2892 LearningRate 0.000367 Epoch: 18 Global Step: 377560 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:51:38,924-Speed 2497.34 samples/sec Loss 2.2771 LearningRate 0.000367 Epoch: 18 Global Step: 377570 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:51:47,130-Speed 2496.26 samples/sec Loss 2.3320 LearningRate 0.000366 Epoch: 18 Global Step: 377580 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:51:55,276-Speed 2514.48 samples/sec Loss 2.2824 LearningRate 0.000366 Epoch: 18 Global Step: 377590 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:52:03,483-Speed 2495.87 samples/sec Loss 2.2768 LearningRate 0.000366 Epoch: 18 Global Step: 377600 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:52:11,687-Speed 2496.79 samples/sec Loss 2.3001 LearningRate 0.000366 Epoch: 18 Global Step: 377610 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:52:19,899-Speed 2494.13 samples/sec Loss 2.3248 LearningRate 0.000366 Epoch: 18 Global Step: 377620 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:52:28,106-Speed 2496.03 samples/sec Loss 2.3268 LearningRate 0.000366 Epoch: 18 Global Step: 377630 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:52:36,312-Speed 2495.98 samples/sec Loss 2.2992 LearningRate 0.000366 Epoch: 18 Global Step: 377640 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:52:44,462-Speed 2513.31 samples/sec Loss 2.3442 LearningRate 0.000366 Epoch: 18 Global Step: 377650 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:52:52,663-Speed 2497.78 samples/sec Loss 2.3042 LearningRate 0.000366 Epoch: 18 Global Step: 377660 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:53:00,863-Speed 2497.77 samples/sec Loss 2.3257 LearningRate 0.000366 Epoch: 18 Global Step: 377670 Fp16 Grad Scale: 16384 Required: 103 hours Training: 
2022-07-09 04:53:09,065-Speed 2497.28 samples/sec Loss 2.3001 LearningRate 0.000366 Epoch: 18 Global Step: 377680 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:53:17,281-Speed 2493.11 samples/sec Loss 2.3249 LearningRate 0.000366 Epoch: 18 Global Step: 377690 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:53:25,483-Speed 2497.49 samples/sec Loss 2.3070 LearningRate 0.000366 Epoch: 18 Global Step: 377700 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:53:33,632-Speed 2513.55 samples/sec Loss 2.2831 LearningRate 0.000366 Epoch: 18 Global Step: 377710 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:53:41,832-Speed 2497.86 samples/sec Loss 2.3030 LearningRate 0.000366 Epoch: 18 Global Step: 377720 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:53:50,033-Speed 2498.57 samples/sec Loss 2.2805 LearningRate 0.000366 Epoch: 18 Global Step: 377730 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:54:00,332-Speed 1988.71 samples/sec Loss 2.3169 LearningRate 0.000366 Epoch: 18 Global Step: 377740 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:54:08,529-Speed 2498.63 samples/sec Loss 2.3154 LearningRate 0.000366 Epoch: 18 Global Step: 377750 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:54:16,802-Speed 2500.22 samples/sec Loss 2.2964 LearningRate 0.000366 Epoch: 18 Global Step: 377760 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:54:28,243-Speed 1801.71 samples/sec Loss 2.3127 LearningRate 0.000366 Epoch: 18 Global Step: 377770 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:54:36,448-Speed 2496.55 samples/sec Loss 2.3034 LearningRate 0.000366 Epoch: 18 Global Step: 377780 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:54:44,683-Speed 2500.73 samples/sec Loss 2.3338 LearningRate 0.000366 Epoch: 18 Global Step: 377790 Fp16 Grad Scale: 16384 Required: 103 hours 
Training: 2022-07-09 04:54:52,911-Speed 2500.04 samples/sec Loss 2.3518 LearningRate 0.000366 Epoch: 18 Global Step: 377800 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:55:01,104-Speed 2499.83 samples/sec Loss 2.3089 LearningRate 0.000366 Epoch: 18 Global Step: 377810 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:55:09,302-Speed 2498.76 samples/sec Loss 2.3236 LearningRate 0.000366 Epoch: 18 Global Step: 377820 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:55:19,084-Speed 2119.21 samples/sec Loss 2.3341 LearningRate 0.000366 Epoch: 18 Global Step: 377830 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:55:27,329-Speed 2500.64 samples/sec Loss 2.3153 LearningRate 0.000366 Epoch: 18 Global Step: 377840 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:55:35,529-Speed 2497.74 samples/sec Loss 2.2637 LearningRate 0.000366 Epoch: 18 Global Step: 377850 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:55:46,776-Speed 1830.35 samples/sec Loss 2.2770 LearningRate 0.000366 Epoch: 18 Global Step: 377860 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:55:55,003-Speed 2501.40 samples/sec Loss 2.3421 LearningRate 0.000366 Epoch: 18 Global Step: 377870 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:56:03,203-Speed 2497.80 samples/sec Loss 2.2973 LearningRate 0.000366 Epoch: 18 Global Step: 377880 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:56:11,384-Speed 2517.17 samples/sec Loss 2.3424 LearningRate 0.000366 Epoch: 18 Global Step: 377890 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:56:19,650-Speed 2497.89 samples/sec Loss 2.3088 LearningRate 0.000366 Epoch: 18 Global Step: 377900 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:56:27,848-Speed 2498.54 samples/sec Loss 2.3078 LearningRate 0.000366 Epoch: 18 Global Step: 377910 Fp16 Grad Scale: 16384 Required: 103 
hours Training: 2022-07-09 04:56:36,068-Speed 2499.02 samples/sec Loss 2.2758 LearningRate 0.000366 Epoch: 18 Global Step: 377920 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:56:48,384-Speed 1669.11 samples/sec Loss 2.2531 LearningRate 0.000366 Epoch: 18 Global Step: 377930 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:56:56,670-Speed 2499.10 samples/sec Loss 2.3132 LearningRate 0.000366 Epoch: 18 Global Step: 377940 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:57:04,821-Speed 2512.77 samples/sec Loss 2.3400 LearningRate 0.000366 Epoch: 18 Global Step: 377950 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:57:13,030-Speed 2495.18 samples/sec Loss 2.2937 LearningRate 0.000366 Epoch: 18 Global Step: 377960 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:57:21,307-Speed 2496.49 samples/sec Loss 2.2957 LearningRate 0.000366 Epoch: 18 Global Step: 377970 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:57:29,543-Speed 2497.25 samples/sec Loss 2.3198 LearningRate 0.000366 Epoch: 18 Global Step: 377980 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:57:37,748-Speed 2496.51 samples/sec Loss 2.3431 LearningRate 0.000366 Epoch: 18 Global Step: 377990 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:57:46,008-Speed 2498.80 samples/sec Loss 2.3325 LearningRate 0.000366 Epoch: 18 Global Step: 378000 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:57:54,670-Speed 2365.17 samples/sec Loss 2.3354 LearningRate 0.000366 Epoch: 18 Global Step: 378010 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:58:03,222-Speed 2395.06 samples/sec Loss 2.2567 LearningRate 0.000366 Epoch: 18 Global Step: 378020 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 04:58:13,282-Speed 2499.41 samples/sec Loss 2.3344 LearningRate 0.000366 Epoch: 18 Global Step: 378030 Fp16 Grad Scale: 32768 Required: 
103 hours Training: 2022-07-09 04:58:21,500-Speed 2492.70 samples/sec Loss 2.3566 LearningRate 0.000366 Epoch: 18 Global Step: 378040 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 04:58:29,702-Speed 2497.38 samples/sec Loss 2.3797 LearningRate 0.000366 Epoch: 18 Global Step: 378050 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 04:58:37,904-Speed 2497.34 samples/sec Loss 2.3029 LearningRate 0.000366 Epoch: 18 Global Step: 378060 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 04:58:46,054-Speed 2513.47 samples/sec Loss 2.3442 LearningRate 0.000366 Epoch: 18 Global Step: 378070 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 04:58:54,257-Speed 2497.09 samples/sec Loss 2.3401 LearningRate 0.000366 Epoch: 18 Global Step: 378080 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 04:59:02,461-Speed 2497.10 samples/sec Loss 2.3428 LearningRate 0.000366 Epoch: 18 Global Step: 378090 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 04:59:10,681-Speed 2492.27 samples/sec Loss 2.3163 LearningRate 0.000366 Epoch: 18 Global Step: 378100 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 04:59:18,909-Speed 2489.49 samples/sec Loss 2.3324 LearningRate 0.000366 Epoch: 18 Global Step: 378110 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 04:59:27,124-Speed 2493.44 samples/sec Loss 2.3329 LearningRate 0.000366 Epoch: 18 Global Step: 378120 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 04:59:35,276-Speed 2512.89 samples/sec Loss 2.2902 LearningRate 0.000366 Epoch: 18 Global Step: 378130 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 04:59:43,480-Speed 2496.59 samples/sec Loss 2.3022 LearningRate 0.000366 Epoch: 18 Global Step: 378140 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 04:59:51,685-Speed 2496.40 samples/sec Loss 2.3527 LearningRate 0.000366 Epoch: 18 Global Step: 378150 Fp16 Grad Scale: 32768 
Required: 103 hours Training: 2022-07-09 04:59:59,890-Speed 2496.34 samples/sec Loss 2.3283 LearningRate 0.000366 Epoch: 18 Global Step: 378160 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:00:08,100-Speed 2494.94 samples/sec Loss 2.3229 LearningRate 0.000366 Epoch: 18 Global Step: 378170 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:00:16,304-Speed 2496.99 samples/sec Loss 2.3551 LearningRate 0.000366 Epoch: 18 Global Step: 378180 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:00:24,452-Speed 2514.02 samples/sec Loss 2.3103 LearningRate 0.000365 Epoch: 18 Global Step: 378190 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:00:32,653-Speed 2497.82 samples/sec Loss 2.3148 LearningRate 0.000365 Epoch: 18 Global Step: 378200 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:00:40,854-Speed 2497.40 samples/sec Loss 2.3707 LearningRate 0.000365 Epoch: 18 Global Step: 378210 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:00:49,056-Speed 2497.57 samples/sec Loss 2.3195 LearningRate 0.000365 Epoch: 18 Global Step: 378220 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:00:57,269-Speed 2493.93 samples/sec Loss 2.3322 LearningRate 0.000365 Epoch: 18 Global Step: 378230 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:01:05,471-Speed 2497.53 samples/sec Loss 2.2688 LearningRate 0.000365 Epoch: 18 Global Step: 378240 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:01:13,619-Speed 2514.03 samples/sec Loss 2.3361 LearningRate 0.000365 Epoch: 18 Global Step: 378250 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:01:21,834-Speed 2493.40 samples/sec Loss 2.3277 LearningRate 0.000365 Epoch: 18 Global Step: 378260 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:01:30,035-Speed 2497.61 samples/sec Loss 2.2902 LearningRate 0.000365 Epoch: 18 Global Step: 378270 Fp16 Grad Scale: 
32768 Required: 103 hours Training: 2022-07-09 05:01:38,250-Speed 2493.50 samples/sec Loss 2.3242 LearningRate 0.000365 Epoch: 18 Global Step: 378280 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:01:46,453-Speed 2496.99 samples/sec Loss 2.3705 LearningRate 0.000365 Epoch: 18 Global Step: 378290 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:01:54,671-Speed 2492.33 samples/sec Loss 2.4006 LearningRate 0.000365 Epoch: 18 Global Step: 378300 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:02:02,822-Speed 2513.35 samples/sec Loss 2.3139 LearningRate 0.000365 Epoch: 18 Global Step: 378310 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:02:11,030-Speed 2495.60 samples/sec Loss 2.2814 LearningRate 0.000365 Epoch: 18 Global Step: 378320 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:02:19,236-Speed 2496.09 samples/sec Loss 2.2980 LearningRate 0.000365 Epoch: 18 Global Step: 378330 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:02:27,440-Speed 2496.78 samples/sec Loss 2.2578 LearningRate 0.000365 Epoch: 18 Global Step: 378340 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:02:35,642-Speed 2497.29 samples/sec Loss 2.3238 LearningRate 0.000365 Epoch: 18 Global Step: 378350 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:02:43,845-Speed 2497.01 samples/sec Loss 2.3663 LearningRate 0.000365 Epoch: 18 Global Step: 378360 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:02:51,997-Speed 2512.66 samples/sec Loss 2.3852 LearningRate 0.000365 Epoch: 18 Global Step: 378370 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:03:00,201-Speed 2497.25 samples/sec Loss 2.3082 LearningRate 0.000365 Epoch: 18 Global Step: 378380 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:03:08,401-Speed 2498.01 samples/sec Loss 2.3057 LearningRate 0.000365 Epoch: 18 Global Step: 378390 Fp16 Grad 
Scale: 32768 Required: 103 hours Training: 2022-07-09 05:03:16,603-Speed 2497.21 samples/sec Loss 2.3224 LearningRate 0.000365 Epoch: 18 Global Step: 378400 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:03:24,802-Speed 2497.99 samples/sec Loss 2.3440 LearningRate 0.000365 Epoch: 18 Global Step: 378410 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:03:33,003-Speed 2497.86 samples/sec Loss 2.3251 LearningRate 0.000365 Epoch: 18 Global Step: 378420 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:03:41,149-Speed 2514.60 samples/sec Loss 2.3365 LearningRate 0.000365 Epoch: 18 Global Step: 378430 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:03:49,347-Speed 2498.40 samples/sec Loss 2.2942 LearningRate 0.000365 Epoch: 18 Global Step: 378440 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:03:57,548-Speed 2497.86 samples/sec Loss 2.2982 LearningRate 0.000365 Epoch: 18 Global Step: 378450 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:04:05,759-Speed 2494.71 samples/sec Loss 2.2463 LearningRate 0.000365 Epoch: 18 Global Step: 378460 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:04:13,960-Speed 2497.78 samples/sec Loss 2.2877 LearningRate 0.000365 Epoch: 18 Global Step: 378470 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:04:22,167-Speed 2495.76 samples/sec Loss 2.2805 LearningRate 0.000365 Epoch: 18 Global Step: 378480 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:04:30,320-Speed 2512.28 samples/sec Loss 2.3534 LearningRate 0.000365 Epoch: 18 Global Step: 378490 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:04:38,521-Speed 2497.80 samples/sec Loss 2.3301 LearningRate 0.000365 Epoch: 18 Global Step: 378500 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:04:46,722-Speed 2497.64 samples/sec Loss 2.3228 LearningRate 0.000365 Epoch: 18 Global Step: 378510 Fp16 
Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:04:54,946-Speed 2490.54 samples/sec Loss 2.2754 LearningRate 0.000365 Epoch: 18 Global Step: 378520 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:05:03,151-Speed 2496.52 samples/sec Loss 2.3001 LearningRate 0.000365 Epoch: 18 Global Step: 378530 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:05:11,354-Speed 2497.24 samples/sec Loss 2.3121 LearningRate 0.000365 Epoch: 18 Global Step: 378540 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:05:19,503-Speed 2513.67 samples/sec Loss 2.2953 LearningRate 0.000365 Epoch: 18 Global Step: 378550 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:05:27,705-Speed 2497.43 samples/sec Loss 2.3069 LearningRate 0.000365 Epoch: 18 Global Step: 378560 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:05:35,906-Speed 2497.76 samples/sec Loss 2.2985 LearningRate 0.000365 Epoch: 18 Global Step: 378570 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:05:44,108-Speed 2497.34 samples/sec Loss 2.3125 LearningRate 0.000365 Epoch: 18 Global Step: 378580 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:05:52,325-Speed 2492.74 samples/sec Loss 2.2673 LearningRate 0.000365 Epoch: 18 Global Step: 378590 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:06:00,527-Speed 2497.42 samples/sec Loss 2.2906 LearningRate 0.000365 Epoch: 18 Global Step: 378600 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:06:08,675-Speed 2513.58 samples/sec Loss 2.2835 LearningRate 0.000365 Epoch: 18 Global Step: 378610 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:06:16,881-Speed 2496.42 samples/sec Loss 2.3255 LearningRate 0.000365 Epoch: 18 Global Step: 378620 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:06:25,084-Speed 2496.92 samples/sec Loss 2.3367 LearningRate 0.000365 Epoch: 18 Global Step: 378630 
Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:06:33,302-Speed 2492.40 samples/sec Loss 2.3594 LearningRate 0.000365 Epoch: 18 Global Step: 378640 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:06:41,506-Speed 2496.70 samples/sec Loss 2.3094 LearningRate 0.000365 Epoch: 18 Global Step: 378650 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:06:49,706-Speed 2498.69 samples/sec Loss 2.2854 LearningRate 0.000365 Epoch: 18 Global Step: 378660 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:06:57,852-Speed 2514.45 samples/sec Loss 2.2908 LearningRate 0.000365 Epoch: 18 Global Step: 378670 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:07:06,054-Speed 2497.50 samples/sec Loss 2.3239 LearningRate 0.000365 Epoch: 18 Global Step: 378680 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:07:14,253-Speed 2498.18 samples/sec Loss 2.3374 LearningRate 0.000365 Epoch: 18 Global Step: 378690 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:07:22,458-Speed 2496.51 samples/sec Loss 2.3433 LearningRate 0.000365 Epoch: 18 Global Step: 378700 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:07:30,656-Speed 2498.48 samples/sec Loss 2.2882 LearningRate 0.000365 Epoch: 18 Global Step: 378710 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:07:38,861-Speed 2496.46 samples/sec Loss 2.3389 LearningRate 0.000365 Epoch: 18 Global Step: 378720 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:07:47,010-Speed 2513.82 samples/sec Loss 2.3338 LearningRate 0.000365 Epoch: 18 Global Step: 378730 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:07:55,211-Speed 2497.52 samples/sec Loss 2.2932 LearningRate 0.000365 Epoch: 18 Global Step: 378740 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:08:03,411-Speed 2497.87 samples/sec Loss 2.3191 LearningRate 0.000365 Epoch: 18 Global Step: 
378750 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:08:11,613-Speed 2497.55 samples/sec Loss 2.2801 LearningRate 0.000365 Epoch: 18 Global Step: 378760 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:08:19,811-Speed 2498.35 samples/sec Loss 2.3372 LearningRate 0.000365 Epoch: 18 Global Step: 378770 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:08:28,011-Speed 2497.92 samples/sec Loss 2.2989 LearningRate 0.000365 Epoch: 18 Global Step: 378780 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:08:36,159-Speed 2513.82 samples/sec Loss 2.2996 LearningRate 0.000365 Epoch: 18 Global Step: 378790 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:08:44,363-Speed 2496.73 samples/sec Loss 2.3341 LearningRate 0.000365 Epoch: 18 Global Step: 378800 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:08:52,563-Speed 2498.05 samples/sec Loss 2.3135 LearningRate 0.000364 Epoch: 18 Global Step: 378810 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:09:00,779-Speed 2493.31 samples/sec Loss 2.3347 LearningRate 0.000364 Epoch: 18 Global Step: 378820 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:09:08,993-Speed 2493.89 samples/sec Loss 2.3415 LearningRate 0.000364 Epoch: 18 Global Step: 378830 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:09:17,195-Speed 2497.04 samples/sec Loss 2.2597 LearningRate 0.000364 Epoch: 18 Global Step: 378840 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:09:25,346-Speed 2513.10 samples/sec Loss 2.2480 LearningRate 0.000364 Epoch: 18 Global Step: 378850 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:09:33,544-Speed 2498.81 samples/sec Loss 2.3058 LearningRate 0.000364 Epoch: 18 Global Step: 378860 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:09:41,740-Speed 2499.16 samples/sec Loss 2.3052 LearningRate 0.000364 Epoch: 18 Global 
Step: 378870 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:09:49,940-Speed 2498.08 samples/sec Loss 2.2636 LearningRate 0.000364 Epoch: 18 Global Step: 378880 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:09:58,138-Speed 2498.67 samples/sec Loss 2.2774 LearningRate 0.000364 Epoch: 18 Global Step: 378890 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:10:06,339-Speed 2497.82 samples/sec Loss 2.2590 LearningRate 0.000364 Epoch: 18 Global Step: 378900 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:10:14,483-Speed 2514.87 samples/sec Loss 2.3042 LearningRate 0.000364 Epoch: 18 Global Step: 378910 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:10:22,683-Speed 2497.96 samples/sec Loss 2.3267 LearningRate 0.000364 Epoch: 18 Global Step: 378920 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:10:30,885-Speed 2497.31 samples/sec Loss 2.3131 LearningRate 0.000364 Epoch: 18 Global Step: 378930 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:10:39,085-Speed 2498.37 samples/sec Loss 2.2885 LearningRate 0.000364 Epoch: 18 Global Step: 378940 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:10:47,298-Speed 2493.94 samples/sec Loss 2.2733 LearningRate 0.000364 Epoch: 18 Global Step: 378950 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:10:55,495-Speed 2498.85 samples/sec Loss 2.3010 LearningRate 0.000364 Epoch: 18 Global Step: 378960 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:11:03,647-Speed 2512.80 samples/sec Loss 2.2738 LearningRate 0.000364 Epoch: 18 Global Step: 378970 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:11:11,845-Speed 2498.62 samples/sec Loss 2.3032 LearningRate 0.000364 Epoch: 18 Global Step: 378980 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:11:20,055-Speed 2494.94 samples/sec Loss 2.2833 LearningRate 0.000364 Epoch: 18 
Global Step: 378990 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:11:28,255-Speed 2497.99 samples/sec Loss 2.2938 LearningRate 0.000364 Epoch: 18 Global Step: 379000 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:11:36,457-Speed 2497.30 samples/sec Loss 2.2929 LearningRate 0.000364 Epoch: 18 Global Step: 379010 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:11:44,657-Speed 2497.91 samples/sec Loss 2.3309 LearningRate 0.000364 Epoch: 18 Global Step: 379020 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:11:52,813-Speed 2511.45 samples/sec Loss 2.3044 LearningRate 0.000364 Epoch: 18 Global Step: 379030 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:12:01,014-Speed 2497.82 samples/sec Loss 2.3299 LearningRate 0.000364 Epoch: 18 Global Step: 379040 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:12:09,227-Speed 2493.83 samples/sec Loss 2.3100 LearningRate 0.000364 Epoch: 18 Global Step: 379050 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:12:17,429-Speed 2497.63 samples/sec Loss 2.3261 LearningRate 0.000364 Epoch: 18 Global Step: 379060 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:12:25,632-Speed 2496.80 samples/sec Loss 2.3066 LearningRate 0.000364 Epoch: 18 Global Step: 379070 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:12:33,836-Speed 2496.86 samples/sec Loss 2.2329 LearningRate 0.000364 Epoch: 18 Global Step: 379080 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:12:41,988-Speed 2512.94 samples/sec Loss 2.2849 LearningRate 0.000364 Epoch: 18 Global Step: 379090 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:12:50,190-Speed 2497.10 samples/sec Loss 2.3028 LearningRate 0.000364 Epoch: 18 Global Step: 379100 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:12:58,392-Speed 2497.89 samples/sec Loss 2.2892 LearningRate 0.000364 
Epoch: 18 Global Step: 379110 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:13:06,595-Speed 2496.87 samples/sec Loss 2.3810 LearningRate 0.000364 Epoch: 18 Global Step: 379120 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:13:14,800-Speed 2497.02 samples/sec Loss 2.3058 LearningRate 0.000364 Epoch: 18 Global Step: 379130 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:13:23,012-Speed 2493.95 samples/sec Loss 2.3297 LearningRate 0.000364 Epoch: 18 Global Step: 379140 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:13:31,165-Speed 2512.67 samples/sec Loss 2.3143 LearningRate 0.000364 Epoch: 18 Global Step: 379150 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:13:39,376-Speed 2494.70 samples/sec Loss 2.3360 LearningRate 0.000364 Epoch: 18 Global Step: 379160 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:13:47,573-Speed 2498.89 samples/sec Loss 2.2995 LearningRate 0.000364 Epoch: 18 Global Step: 379170 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:13:55,773-Speed 2498.06 samples/sec Loss 2.3560 LearningRate 0.000364 Epoch: 18 Global Step: 379180 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:14:03,971-Speed 2498.63 samples/sec Loss 2.2927 LearningRate 0.000364 Epoch: 18 Global Step: 379190 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:14:12,171-Speed 2497.90 samples/sec Loss 2.3178 LearningRate 0.000364 Epoch: 18 Global Step: 379200 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:14:20,319-Speed 2513.78 samples/sec Loss 2.3202 LearningRate 0.000364 Epoch: 18 Global Step: 379210 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:14:28,528-Speed 2495.11 samples/sec Loss 2.3090 LearningRate 0.000364 Epoch: 18 Global Step: 379220 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:14:36,729-Speed 2497.80 samples/sec Loss 2.3159 LearningRate 
0.000364 Epoch: 18 Global Step: 379230 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:14:44,926-Speed 2498.69 samples/sec Loss 2.2819 LearningRate 0.000364 Epoch: 18 Global Step: 379240 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:14:53,136-Speed 2494.96 samples/sec Loss 2.3219 LearningRate 0.000364 Epoch: 18 Global Step: 379250 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:15:01,336-Speed 2498.28 samples/sec Loss 2.3110 LearningRate 0.000364 Epoch: 18 Global Step: 379260 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:15:09,482-Speed 2514.34 samples/sec Loss 2.3254 LearningRate 0.000364 Epoch: 18 Global Step: 379270 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:15:17,681-Speed 2498.33 samples/sec Loss 2.3260 LearningRate 0.000364 Epoch: 18 Global Step: 379280 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:15:25,896-Speed 2493.33 samples/sec Loss 2.3533 LearningRate 0.000364 Epoch: 18 Global Step: 379290 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:15:34,105-Speed 2495.16 samples/sec Loss 2.3316 LearningRate 0.000364 Epoch: 18 Global Step: 379300 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:15:42,321-Speed 2493.38 samples/sec Loss 2.3248 LearningRate 0.000364 Epoch: 18 Global Step: 379310 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:15:50,522-Speed 2497.70 samples/sec Loss 2.2825 LearningRate 0.000364 Epoch: 18 Global Step: 379320 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:15:58,695-Speed 2506.21 samples/sec Loss 2.3390 LearningRate 0.000364 Epoch: 18 Global Step: 379330 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:16:06,901-Speed 2495.95 samples/sec Loss 2.3017 LearningRate 0.000364 Epoch: 18 Global Step: 379340 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:16:15,104-Speed 2497.04 samples/sec Loss 2.3985 
LearningRate 0.000364 Epoch: 18 Global Step: 379350 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:16:23,308-Speed 2496.85 samples/sec Loss 2.3180 LearningRate 0.000364 Epoch: 18 Global Step: 379360 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:16:31,520-Speed 2494.18 samples/sec Loss 2.2895 LearningRate 0.000364 Epoch: 18 Global Step: 379370 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:16:39,721-Speed 2497.66 samples/sec Loss 2.3232 LearningRate 0.000364 Epoch: 18 Global Step: 379380 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:16:47,876-Speed 2511.64 samples/sec Loss 2.3132 LearningRate 0.000364 Epoch: 18 Global Step: 379390 Fp16 Grad Scale: 65536 Required: 103 hours Training: 2022-07-09 05:16:56,038-Speed 2509.66 samples/sec Loss 2.2774 LearningRate 0.000364 Epoch: 18 Global Step: 379400 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:17:04,243-Speed 2496.43 samples/sec Loss 2.2975 LearningRate 0.000364 Epoch: 18 Global Step: 379410 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:17:12,445-Speed 2497.32 samples/sec Loss 2.3691 LearningRate 0.000364 Epoch: 18 Global Step: 379420 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:17:20,643-Speed 2498.45 samples/sec Loss 2.2968 LearningRate 0.000363 Epoch: 18 Global Step: 379430 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:17:28,842-Speed 2498.26 samples/sec Loss 2.3150 LearningRate 0.000363 Epoch: 18 Global Step: 379440 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:17:36,989-Speed 2514.39 samples/sec Loss 2.2938 LearningRate 0.000363 Epoch: 18 Global Step: 379450 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:17:45,190-Speed 2497.84 samples/sec Loss 2.2446 LearningRate 0.000363 Epoch: 18 Global Step: 379460 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:17:53,395-Speed 2496.57 samples/sec Loss 
2.2265 LearningRate 0.000363 Epoch: 18 Global Step: 379470 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:18:01,596-Speed 2497.64 samples/sec Loss 2.2162 LearningRate 0.000363 Epoch: 18 Global Step: 379480 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:18:09,800-Speed 2496.64 samples/sec Loss 2.3022 LearningRate 0.000363 Epoch: 18 Global Step: 379490 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:18:18,006-Speed 2496.01 samples/sec Loss 2.3460 LearningRate 0.000363 Epoch: 18 Global Step: 379500 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:18:26,156-Speed 2513.47 samples/sec Loss 2.2497 LearningRate 0.000363 Epoch: 18 Global Step: 379510 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:18:34,364-Speed 2495.64 samples/sec Loss 2.2624 LearningRate 0.000363 Epoch: 18 Global Step: 379520 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:18:42,564-Speed 2497.95 samples/sec Loss 2.2803 LearningRate 0.000363 Epoch: 18 Global Step: 379530 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:18:50,765-Speed 2497.81 samples/sec Loss 2.3162 LearningRate 0.000363 Epoch: 18 Global Step: 379540 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:18:58,969-Speed 2496.61 samples/sec Loss 2.3013 LearningRate 0.000363 Epoch: 18 Global Step: 379550 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:19:07,176-Speed 2495.88 samples/sec Loss 2.2924 LearningRate 0.000363 Epoch: 18 Global Step: 379560 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:19:15,323-Speed 2514.02 samples/sec Loss 2.3289 LearningRate 0.000363 Epoch: 18 Global Step: 379570 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:19:23,530-Speed 2495.87 samples/sec Loss 2.3561 LearningRate 0.000363 Epoch: 18 Global Step: 379580 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:19:31,729-Speed 2498.47 samples/sec 
Loss 2.3120 LearningRate 0.000363 Epoch: 18 Global Step: 379590 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:19:39,932-Speed 2496.84 samples/sec Loss 2.2643 LearningRate 0.000363 Epoch: 18 Global Step: 379600 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:19:48,147-Speed 2493.25 samples/sec Loss 2.3196 LearningRate 0.000363 Epoch: 18 Global Step: 379610 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:19:56,348-Speed 2497.64 samples/sec Loss 2.2766 LearningRate 0.000363 Epoch: 18 Global Step: 379620 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:20:04,498-Speed 2513.38 samples/sec Loss 2.2795 LearningRate 0.000363 Epoch: 18 Global Step: 379630 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:20:12,697-Speed 2498.08 samples/sec Loss 2.2930 LearningRate 0.000363 Epoch: 18 Global Step: 379640 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:20:20,895-Speed 2498.54 samples/sec Loss 2.3449 LearningRate 0.000363 Epoch: 18 Global Step: 379650 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:20:29,093-Speed 2498.77 samples/sec Loss 2.2540 LearningRate 0.000363 Epoch: 18 Global Step: 379660 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:20:37,295-Speed 2497.28 samples/sec Loss 2.2931 LearningRate 0.000363 Epoch: 18 Global Step: 379670 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:20:45,495-Speed 2497.87 samples/sec Loss 2.3544 LearningRate 0.000363 Epoch: 18 Global Step: 379680 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:20:53,650-Speed 2511.97 samples/sec Loss 2.2971 LearningRate 0.000363 Epoch: 18 Global Step: 379690 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:21:01,847-Speed 2498.98 samples/sec Loss 2.3422 LearningRate 0.000363 Epoch: 18 Global Step: 379700 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:21:10,053-Speed 2496.13 
samples/sec Loss 2.2859 LearningRate 0.000363 Epoch: 18 Global Step: 379710 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:21:18,257-Speed 2497.23 samples/sec Loss 2.3003 LearningRate 0.000363 Epoch: 18 Global Step: 379720 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:21:26,465-Speed 2495.67 samples/sec Loss 2.3105 LearningRate 0.000363 Epoch: 18 Global Step: 379730 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:21:34,670-Speed 2496.16 samples/sec Loss 2.3687 LearningRate 0.000363 Epoch: 18 Global Step: 379740 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:21:42,828-Speed 2510.75 samples/sec Loss 2.3037 LearningRate 0.000363 Epoch: 18 Global Step: 379750 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:21:51,029-Speed 2497.72 samples/sec Loss 2.3331 LearningRate 0.000363 Epoch: 18 Global Step: 379760 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:21:59,235-Speed 2496.07 samples/sec Loss 2.3165 LearningRate 0.000363 Epoch: 18 Global Step: 379770 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:22:07,438-Speed 2497.04 samples/sec Loss 2.3249 LearningRate 0.000363 Epoch: 18 Global Step: 379780 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:22:15,641-Speed 2496.92 samples/sec Loss 2.3181 LearningRate 0.000363 Epoch: 18 Global Step: 379790 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:22:23,849-Speed 2495.51 samples/sec Loss 2.3050 LearningRate 0.000363 Epoch: 18 Global Step: 379800 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:22:32,003-Speed 2512.06 samples/sec Loss 2.3401 LearningRate 0.000363 Epoch: 18 Global Step: 379810 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:22:40,205-Speed 2497.28 samples/sec Loss 2.2675 LearningRate 0.000363 Epoch: 18 Global Step: 379820 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:22:48,407-Speed 
2497.66 samples/sec Loss 2.3476 LearningRate 0.000363 Epoch: 18 Global Step: 379830 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:22:56,610-Speed 2496.96 samples/sec Loss 2.2456 LearningRate 0.000363 Epoch: 18 Global Step: 379840 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:23:04,811-Speed 2497.65 samples/sec Loss 2.3154 LearningRate 0.000363 Epoch: 18 Global Step: 379850 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:23:13,013-Speed 2497.47 samples/sec Loss 2.3176 LearningRate 0.000363 Epoch: 18 Global Step: 379860 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:23:21,158-Speed 2514.55 samples/sec Loss 2.3208 LearningRate 0.000363 Epoch: 18 Global Step: 379870 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:23:29,361-Speed 2497.20 samples/sec Loss 2.2773 LearningRate 0.000363 Epoch: 18 Global Step: 379880 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:23:37,562-Speed 2497.81 samples/sec Loss 2.3093 LearningRate 0.000363 Epoch: 18 Global Step: 379890 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:23:45,760-Speed 2498.37 samples/sec Loss 2.2895 LearningRate 0.000363 Epoch: 18 Global Step: 379900 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:23:53,971-Speed 2494.76 samples/sec Loss 2.2624 LearningRate 0.000363 Epoch: 18 Global Step: 379910 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:24:02,174-Speed 2497.02 samples/sec Loss 2.2602 LearningRate 0.000363 Epoch: 18 Global Step: 379920 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:24:10,326-Speed 2512.75 samples/sec Loss 2.3244 LearningRate 0.000363 Epoch: 18 Global Step: 379930 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:24:18,532-Speed 2496.31 samples/sec Loss 2.2943 LearningRate 0.000363 Epoch: 18 Global Step: 379940 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 
05:24:26,730-Speed 2498.53 samples/sec Loss 2.2884 LearningRate 0.000363 Epoch: 18 Global Step: 379950 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:24:34,930-Speed 2498.44 samples/sec Loss 2.3316 LearningRate 0.000363 Epoch: 18 Global Step: 379960 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:24:43,133-Speed 2496.96 samples/sec Loss 2.3143 LearningRate 0.000363 Epoch: 18 Global Step: 379970 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:24:51,344-Speed 2494.48 samples/sec Loss 2.2567 LearningRate 0.000363 Epoch: 18 Global Step: 379980 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:24:59,509-Speed 2508.86 samples/sec Loss 2.3265 LearningRate 0.000363 Epoch: 18 Global Step: 379990 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:25:07,716-Speed 2495.93 samples/sec Loss 2.2970 LearningRate 0.000363 Epoch: 18 Global Step: 380000 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:25:15,918-Speed 2497.27 samples/sec Loss 2.2927 LearningRate 0.000363 Epoch: 18 Global Step: 380010 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:25:24,120-Speed 2497.30 samples/sec Loss 2.2789 LearningRate 0.000363 Epoch: 18 Global Step: 380020 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:25:32,324-Speed 2496.70 samples/sec Loss 2.2465 LearningRate 0.000363 Epoch: 18 Global Step: 380030 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:25:40,522-Speed 2498.55 samples/sec Loss 2.2473 LearningRate 0.000363 Epoch: 18 Global Step: 380040 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:25:48,674-Speed 2512.55 samples/sec Loss 2.3076 LearningRate 0.000362 Epoch: 18 Global Step: 380050 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:25:56,870-Speed 2499.14 samples/sec Loss 2.2401 LearningRate 0.000362 Epoch: 18 Global Step: 380060 Fp16 Grad Scale: 32768 Required: 103 hours Training: 
2022-07-09 05:26:05,069-Speed 2498.37 samples/sec Loss 2.2719 LearningRate 0.000362 Epoch: 18 Global Step: 380070 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:26:13,269-Speed 2497.76 samples/sec Loss 2.3212 LearningRate 0.000362 Epoch: 18 Global Step: 380080 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:26:21,469-Speed 2498.08 samples/sec Loss 2.3380 LearningRate 0.000362 Epoch: 18 Global Step: 380090 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:26:29,668-Speed 2498.22 samples/sec Loss 2.2995 LearningRate 0.000362 Epoch: 18 Global Step: 380100 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:26:37,816-Speed 2514.16 samples/sec Loss 2.3241 LearningRate 0.000362 Epoch: 18 Global Step: 380110 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:26:46,016-Speed 2497.72 samples/sec Loss 2.3084 LearningRate 0.000362 Epoch: 18 Global Step: 380120 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:26:54,216-Speed 2497.89 samples/sec Loss 2.2762 LearningRate 0.000362 Epoch: 18 Global Step: 380130 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:27:02,422-Speed 2496.08 samples/sec Loss 2.3195 LearningRate 0.000362 Epoch: 18 Global Step: 380140 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:27:10,634-Speed 2494.46 samples/sec Loss 2.3220 LearningRate 0.000362 Epoch: 18 Global Step: 380150 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:27:18,836-Speed 2497.27 samples/sec Loss 2.2506 LearningRate 0.000362 Epoch: 18 Global Step: 380160 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:27:26,985-Speed 2513.39 samples/sec Loss 2.3372 LearningRate 0.000362 Epoch: 18 Global Step: 380170 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:27:35,185-Speed 2498.36 samples/sec Loss 2.3260 LearningRate 0.000362 Epoch: 18 Global Step: 380180 Fp16 Grad Scale: 32768 Required: 103 hours 
Training: 2022-07-09 05:27:43,388-Speed 2497.20 samples/sec Loss 2.3567 LearningRate 0.000362 Epoch: 18 Global Step: 380190 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:27:51,594-Speed 2496.07 samples/sec Loss 2.3257 LearningRate 0.000362 Epoch: 18 Global Step: 380200 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:27:59,811-Speed 2493.22 samples/sec Loss 2.3249 LearningRate 0.000362 Epoch: 18 Global Step: 380210 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:28:08,009-Speed 2498.45 samples/sec Loss 2.3263 LearningRate 0.000362 Epoch: 18 Global Step: 380220 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:28:16,163-Speed 2512.27 samples/sec Loss 2.3616 LearningRate 0.000362 Epoch: 18 Global Step: 380230 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:28:24,360-Speed 2498.44 samples/sec Loss 2.3060 LearningRate 0.000362 Epoch: 18 Global Step: 380240 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:28:32,562-Speed 2497.57 samples/sec Loss 2.3046 LearningRate 0.000362 Epoch: 18 Global Step: 380250 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:28:40,760-Speed 2498.56 samples/sec Loss 2.3088 LearningRate 0.000362 Epoch: 18 Global Step: 380260 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:28:48,957-Speed 2498.75 samples/sec Loss 2.3142 LearningRate 0.000362 Epoch: 18 Global Step: 380270 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:28:57,158-Speed 2497.72 samples/sec Loss 2.3521 LearningRate 0.000362 Epoch: 18 Global Step: 380280 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:29:05,304-Speed 2514.52 samples/sec Loss 2.2828 LearningRate 0.000362 Epoch: 18 Global Step: 380290 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:29:13,509-Speed 2496.31 samples/sec Loss 2.3015 LearningRate 0.000362 Epoch: 18 Global Step: 380300 Fp16 Grad Scale: 32768 Required: 103 
hours Training: 2022-07-09 05:29:21,713-Speed 2496.76 samples/sec Loss 2.3192 LearningRate 0.000362 Epoch: 18 Global Step: 380310 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:29:29,919-Speed 2496.06 samples/sec Loss 2.3043 LearningRate 0.000362 Epoch: 18 Global Step: 380320 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:29:38,117-Speed 2498.56 samples/sec Loss 2.3285 LearningRate 0.000362 Epoch: 18 Global Step: 380330 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:29:46,320-Speed 2497.52 samples/sec Loss 2.2821 LearningRate 0.000362 Epoch: 18 Global Step: 380340 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:29:54,465-Speed 2514.84 samples/sec Loss 2.3060 LearningRate 0.000362 Epoch: 18 Global Step: 380350 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:30:02,666-Speed 2497.45 samples/sec Loss 2.2652 LearningRate 0.000362 Epoch: 18 Global Step: 380360 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:30:10,872-Speed 2496.22 samples/sec Loss 2.2751 LearningRate 0.000362 Epoch: 18 Global Step: 380370 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:30:19,070-Speed 2498.55 samples/sec Loss 2.2767 LearningRate 0.000362 Epoch: 18 Global Step: 380380 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:30:27,229-Speed 2510.73 samples/sec Loss 2.2921 LearningRate 0.000362 Epoch: 18 Global Step: 380390 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:30:35,429-Speed 2497.75 samples/sec Loss 2.3115 LearningRate 0.000362 Epoch: 18 Global Step: 380400 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:30:43,598-Speed 2507.56 samples/sec Loss 2.3298 LearningRate 0.000362 Epoch: 18 Global Step: 380410 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:30:51,815-Speed 2492.65 samples/sec Loss 2.3482 LearningRate 0.000362 Epoch: 18 Global Step: 380420 Fp16 Grad Scale: 16384 Required: 
103 hours Training: 2022-07-09 05:31:00,029-Speed 2493.67 samples/sec Loss 2.3677 LearningRate 0.000362 Epoch: 18 Global Step: 380430 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:31:08,227-Speed 2498.85 samples/sec Loss 2.3213 LearningRate 0.000362 Epoch: 18 Global Step: 380440 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:31:16,426-Speed 2498.07 samples/sec Loss 2.3185 LearningRate 0.000362 Epoch: 18 Global Step: 380450 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:31:24,634-Speed 2495.69 samples/sec Loss 2.3758 LearningRate 0.000362 Epoch: 18 Global Step: 380460 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:31:32,783-Speed 2513.49 samples/sec Loss 2.3518 LearningRate 0.000362 Epoch: 18 Global Step: 380470 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:31:40,984-Speed 2497.62 samples/sec Loss 2.2835 LearningRate 0.000362 Epoch: 18 Global Step: 380480 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:31:49,185-Speed 2497.77 samples/sec Loss 2.2680 LearningRate 0.000362 Epoch: 18 Global Step: 380490 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:31:57,393-Speed 2495.55 samples/sec Loss 2.3397 LearningRate 0.000362 Epoch: 18 Global Step: 380500 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:32:05,601-Speed 2495.42 samples/sec Loss 2.3289 LearningRate 0.000362 Epoch: 18 Global Step: 380510 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:32:13,802-Speed 2498.01 samples/sec Loss 2.3391 LearningRate 0.000362 Epoch: 18 Global Step: 380520 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:32:21,950-Speed 2514.04 samples/sec Loss 2.3565 LearningRate 0.000362 Epoch: 18 Global Step: 380530 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:32:30,145-Speed 2499.41 samples/sec Loss 2.2847 LearningRate 0.000362 Epoch: 18 Global Step: 380540 Fp16 Grad Scale: 16384 
Required: 103 hours Training: 2022-07-09 05:32:38,346-Speed 2497.68 samples/sec Loss 2.2916 LearningRate 0.000362 Epoch: 18 Global Step: 380550 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:32:46,547-Speed 2497.93 samples/sec Loss 2.3038 LearningRate 0.000362 Epoch: 18 Global Step: 380560 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:32:54,744-Speed 2498.80 samples/sec Loss 2.3648 LearningRate 0.000362 Epoch: 18 Global Step: 380570 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:33:02,948-Speed 2496.74 samples/sec Loss 2.3376 LearningRate 0.000362 Epoch: 18 Global Step: 380580 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:33:11,095-Speed 2514.04 samples/sec Loss 2.3904 LearningRate 0.000362 Epoch: 18 Global Step: 380590 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:33:19,296-Speed 2497.85 samples/sec Loss 2.3603 LearningRate 0.000362 Epoch: 18 Global Step: 380600 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:33:27,496-Speed 2498.11 samples/sec Loss 2.3307 LearningRate 0.000362 Epoch: 18 Global Step: 380610 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:33:35,699-Speed 2496.89 samples/sec Loss 2.3199 LearningRate 0.000362 Epoch: 18 Global Step: 380620 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:33:43,897-Speed 2498.51 samples/sec Loss 2.3848 LearningRate 0.000362 Epoch: 18 Global Step: 380630 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:33:52,099-Speed 2497.33 samples/sec Loss 2.3438 LearningRate 0.000362 Epoch: 18 Global Step: 380640 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:34:00,245-Speed 2514.46 samples/sec Loss 2.3047 LearningRate 0.000362 Epoch: 18 Global Step: 380650 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:34:08,444-Speed 2498.27 samples/sec Loss 2.3051 LearningRate 0.000362 Epoch: 18 Global Step: 380660 Fp16 Grad Scale: 
16384 Required: 103 hours Training: 2022-07-09 05:34:16,644-Speed 2498.06 samples/sec Loss 2.3288 LearningRate 0.000361 Epoch: 18 Global Step: 380670 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:34:24,843-Speed 2498.46 samples/sec Loss 2.3100 LearningRate 0.000361 Epoch: 18 Global Step: 380680 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:34:33,046-Speed 2497.10 samples/sec Loss 2.2726 LearningRate 0.000361 Epoch: 18 Global Step: 380690 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:34:41,244-Speed 2498.48 samples/sec Loss 2.2974 LearningRate 0.000361 Epoch: 18 Global Step: 380700 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:34:49,403-Speed 2510.50 samples/sec Loss 2.3239 LearningRate 0.000361 Epoch: 18 Global Step: 380710 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:34:57,609-Speed 2496.15 samples/sec Loss 2.3864 LearningRate 0.000361 Epoch: 18 Global Step: 380720 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:35:05,811-Speed 2497.29 samples/sec Loss 2.2958 LearningRate 0.000361 Epoch: 18 Global Step: 380730 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:35:14,012-Speed 2497.70 samples/sec Loss 2.3362 LearningRate 0.000361 Epoch: 18 Global Step: 380740 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:35:22,222-Speed 2494.89 samples/sec Loss 2.3036 LearningRate 0.000361 Epoch: 18 Global Step: 380750 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:35:30,419-Speed 2499.08 samples/sec Loss 2.3528 LearningRate 0.000361 Epoch: 18 Global Step: 380760 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:35:38,566-Speed 2514.27 samples/sec Loss 2.3470 LearningRate 0.000361 Epoch: 18 Global Step: 380770 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:35:46,776-Speed 2495.18 samples/sec Loss 2.3168 LearningRate 0.000361 Epoch: 18 Global Step: 380780 Fp16 Grad 
Scale: 16384 Required: 103 hours Training: 2022-07-09 05:35:54,976-Speed 2498.29 samples/sec Loss 2.3939 LearningRate 0.000361 Epoch: 18 Global Step: 380790 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:36:03,178-Speed 2497.20 samples/sec Loss 2.3579 LearningRate 0.000361 Epoch: 18 Global Step: 380800 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:36:11,376-Speed 2498.57 samples/sec Loss 2.3213 LearningRate 0.000361 Epoch: 18 Global Step: 380810 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:36:19,592-Speed 2493.22 samples/sec Loss 2.2874 LearningRate 0.000361 Epoch: 18 Global Step: 380820 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:36:27,744-Speed 2512.93 samples/sec Loss 2.2824 LearningRate 0.000361 Epoch: 18 Global Step: 380830 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:36:35,947-Speed 2496.88 samples/sec Loss 2.2894 LearningRate 0.000361 Epoch: 18 Global Step: 380840 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:36:44,150-Speed 2497.12 samples/sec Loss 2.2891 LearningRate 0.000361 Epoch: 18 Global Step: 380850 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:36:52,358-Speed 2495.56 samples/sec Loss 2.3058 LearningRate 0.000361 Epoch: 18 Global Step: 380860 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:37:00,560-Speed 2497.41 samples/sec Loss 2.2872 LearningRate 0.000361 Epoch: 18 Global Step: 380870 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:37:08,761-Speed 2497.39 samples/sec Loss 2.2595 LearningRate 0.000361 Epoch: 18 Global Step: 380880 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:37:16,911-Speed 2513.34 samples/sec Loss 2.2673 LearningRate 0.000361 Epoch: 18 Global Step: 380890 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:37:25,119-Speed 2496.26 samples/sec Loss 2.2417 LearningRate 0.000361 Epoch: 18 Global Step: 380900 Fp16 
Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:37:33,324-Speed 2496.56 samples/sec Loss 2.2610 LearningRate 0.000361 Epoch: 18 Global Step: 380910 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:37:41,525-Speed 2497.62 samples/sec Loss 2.2454 LearningRate 0.000361 Epoch: 18 Global Step: 380920 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:37:49,736-Speed 2494.49 samples/sec Loss 2.3041 LearningRate 0.000361 Epoch: 18 Global Step: 380930 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:37:57,954-Speed 2492.65 samples/sec Loss 2.3079 LearningRate 0.000361 Epoch: 18 Global Step: 380940 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:38:06,109-Speed 2511.95 samples/sec Loss 2.2852 LearningRate 0.000361 Epoch: 18 Global Step: 380950 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:38:14,311-Speed 2497.17 samples/sec Loss 2.2736 LearningRate 0.000361 Epoch: 18 Global Step: 380960 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:38:22,518-Speed 2496.11 samples/sec Loss 2.3227 LearningRate 0.000361 Epoch: 18 Global Step: 380970 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:38:30,722-Speed 2496.86 samples/sec Loss 2.3345 LearningRate 0.000361 Epoch: 18 Global Step: 380980 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:38:38,921-Speed 2498.13 samples/sec Loss 2.3211 LearningRate 0.000361 Epoch: 18 Global Step: 380990 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:38:47,123-Speed 2497.51 samples/sec Loss 2.3335 LearningRate 0.000361 Epoch: 18 Global Step: 381000 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:38:55,281-Speed 2511.03 samples/sec Loss 2.3791 LearningRate 0.000361 Epoch: 18 Global Step: 381010 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:39:03,496-Speed 2493.31 samples/sec Loss 2.3216 LearningRate 0.000361 Epoch: 18 Global Step: 381020 
Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:39:11,695-Speed 2498.08 samples/sec Loss 2.3213 LearningRate 0.000361 Epoch: 18 Global Step: 381030 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:39:19,894-Speed 2498.45 samples/sec Loss 2.3349 LearningRate 0.000361 Epoch: 18 Global Step: 381040 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:39:28,098-Speed 2496.69 samples/sec Loss 2.3336 LearningRate 0.000361 Epoch: 18 Global Step: 381050 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:39:36,301-Speed 2497.21 samples/sec Loss 2.2677 LearningRate 0.000361 Epoch: 18 Global Step: 381060 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:39:44,450-Speed 2513.52 samples/sec Loss 2.2980 LearningRate 0.000361 Epoch: 18 Global Step: 381070 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:39:52,652-Speed 2497.29 samples/sec Loss 2.2744 LearningRate 0.000361 Epoch: 18 Global Step: 381080 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:40:00,854-Speed 2497.45 samples/sec Loss 2.3372 LearningRate 0.000361 Epoch: 18 Global Step: 381090 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:40:09,063-Speed 2495.18 samples/sec Loss 2.3379 LearningRate 0.000361 Epoch: 18 Global Step: 381100 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:40:17,263-Speed 2497.95 samples/sec Loss 2.3175 LearningRate 0.000361 Epoch: 18 Global Step: 381110 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:40:25,465-Speed 2497.29 samples/sec Loss 2.3109 LearningRate 0.000361 Epoch: 18 Global Step: 381120 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:40:33,610-Speed 2514.96 samples/sec Loss 2.2956 LearningRate 0.000361 Epoch: 18 Global Step: 381130 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:40:41,819-Speed 2495.67 samples/sec Loss 2.3796 LearningRate 0.000361 Epoch: 18 Global Step: 
381140 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:40:50,020-Speed 2497.67 samples/sec Loss 2.3540 LearningRate 0.000361 Epoch: 18 Global Step: 381150 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:40:58,216-Speed 2498.98 samples/sec Loss 2.3423 LearningRate 0.000361 Epoch: 18 Global Step: 381160 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:41:06,416-Speed 2498.21 samples/sec Loss 2.3094 LearningRate 0.000361 Epoch: 18 Global Step: 381170 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:41:14,612-Speed 2499.07 samples/sec Loss 2.3072 LearningRate 0.000361 Epoch: 18 Global Step: 381180 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:41:22,758-Speed 2514.40 samples/sec Loss 2.3550 LearningRate 0.000361 Epoch: 18 Global Step: 381190 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:41:30,967-Speed 2495.21 samples/sec Loss 2.2862 LearningRate 0.000361 Epoch: 18 Global Step: 381200 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:41:39,168-Speed 2497.83 samples/sec Loss 2.3649 LearningRate 0.000361 Epoch: 18 Global Step: 381210 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:41:47,364-Speed 2499.23 samples/sec Loss 2.3412 LearningRate 0.000361 Epoch: 18 Global Step: 381220 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:41:55,564-Speed 2497.76 samples/sec Loss 2.2849 LearningRate 0.000361 Epoch: 18 Global Step: 381230 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:42:03,763-Speed 2498.35 samples/sec Loss 2.3113 LearningRate 0.000361 Epoch: 18 Global Step: 381240 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:42:11,910-Speed 2514.15 samples/sec Loss 2.2795 LearningRate 0.000361 Epoch: 18 Global Step: 381250 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:42:20,117-Speed 2495.60 samples/sec Loss 2.2727 LearningRate 0.000361 Epoch: 18 Global 
Step: 381260 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:42:28,317-Speed 2498.10 samples/sec Loss 2.3089 LearningRate 0.000361 Epoch: 18 Global Step: 381270 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:42:36,518-Speed 2497.40 samples/sec Loss 2.2720 LearningRate 0.000361 Epoch: 18 Global Step: 381280 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:42:44,713-Speed 2499.38 samples/sec Loss 2.2999 LearningRate 0.000360 Epoch: 18 Global Step: 381290 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:42:52,914-Speed 2497.59 samples/sec Loss 2.2912 LearningRate 0.000360 Epoch: 18 Global Step: 381300 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:43:01,071-Speed 2511.27 samples/sec Loss 2.2772 LearningRate 0.000360 Epoch: 18 Global Step: 381310 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:43:09,281-Speed 2494.97 samples/sec Loss 2.2951 LearningRate 0.000360 Epoch: 18 Global Step: 381320 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:43:17,478-Speed 2498.85 samples/sec Loss 2.3460 LearningRate 0.000360 Epoch: 18 Global Step: 381330 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:43:25,678-Speed 2497.93 samples/sec Loss 2.2914 LearningRate 0.000360 Epoch: 18 Global Step: 381340 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:43:33,883-Speed 2496.46 samples/sec Loss 2.3267 LearningRate 0.000360 Epoch: 18 Global Step: 381350 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:43:42,083-Speed 2498.35 samples/sec Loss 2.3422 LearningRate 0.000360 Epoch: 18 Global Step: 381360 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:43:50,238-Speed 2512.08 samples/sec Loss 2.3182 LearningRate 0.000360 Epoch: 18 Global Step: 381370 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:43:58,436-Speed 2498.44 samples/sec Loss 2.3432 LearningRate 0.000360 Epoch: 18 
Global Step: 381380 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:44:06,637-Speed 2497.39 samples/sec Loss 2.3255 LearningRate 0.000360 Epoch: 18 Global Step: 381390 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:44:14,835-Speed 2500.74 samples/sec Loss 2.3348 LearningRate 0.000360 Epoch: 18 Global Step: 381400 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:44:23,033-Speed 2498.62 samples/sec Loss 2.3350 LearningRate 0.000360 Epoch: 18 Global Step: 381410 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:44:31,229-Speed 2499.21 samples/sec Loss 2.3309 LearningRate 0.000360 Epoch: 18 Global Step: 381420 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:44:39,374-Speed 2514.74 samples/sec Loss 2.2692 LearningRate 0.000360 Epoch: 18 Global Step: 381430 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:44:47,582-Speed 2495.69 samples/sec Loss 2.3348 LearningRate 0.000360 Epoch: 18 Global Step: 381440 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:44:55,792-Speed 2494.97 samples/sec Loss 2.3164 LearningRate 0.000360 Epoch: 18 Global Step: 381450 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:45:03,989-Speed 2499.11 samples/sec Loss 2.2884 LearningRate 0.000360 Epoch: 18 Global Step: 381460 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:45:12,194-Speed 2496.77 samples/sec Loss 2.3373 LearningRate 0.000360 Epoch: 18 Global Step: 381470 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:45:20,394-Speed 2498.04 samples/sec Loss 2.2599 LearningRate 0.000360 Epoch: 18 Global Step: 381480 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:45:28,542-Speed 2513.71 samples/sec Loss 2.3115 LearningRate 0.000360 Epoch: 18 Global Step: 381490 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:45:36,744-Speed 2497.41 samples/sec Loss 2.3009 LearningRate 0.000360 
Epoch: 18 Global Step: 381500 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:45:44,939-Speed 2499.42 samples/sec Loss 2.3260 LearningRate 0.000360 Epoch: 18 Global Step: 381510 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:45:53,137-Speed 2498.56 samples/sec Loss 2.3286 LearningRate 0.000360 Epoch: 18 Global Step: 381520 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:46:01,334-Speed 2498.95 samples/sec Loss 2.3054 LearningRate 0.000360 Epoch: 18 Global Step: 381530 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:46:09,531-Speed 2498.76 samples/sec Loss 2.3502 LearningRate 0.000360 Epoch: 18 Global Step: 381540 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:46:17,677-Speed 2514.44 samples/sec Loss 2.3075 LearningRate 0.000360 Epoch: 18 Global Step: 381550 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:46:25,878-Speed 2497.53 samples/sec Loss 2.2913 LearningRate 0.000360 Epoch: 18 Global Step: 381560 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:46:34,085-Speed 2495.89 samples/sec Loss 2.3287 LearningRate 0.000360 Epoch: 18 Global Step: 381570 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:46:42,284-Speed 2498.38 samples/sec Loss 2.2883 LearningRate 0.000360 Epoch: 18 Global Step: 381580 Fp16 Grad Scale: 16384 Required: 103 hours Training: 2022-07-09 05:46:50,491-Speed 2495.55 samples/sec Loss 2.2899 LearningRate 0.000360 Epoch: 18 Global Step: 381590 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:46:58,694-Speed 2497.14 samples/sec Loss 2.2902 LearningRate 0.000360 Epoch: 18 Global Step: 381600 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:47:06,848-Speed 2512.10 samples/sec Loss 2.2552 LearningRate 0.000360 Epoch: 18 Global Step: 381610 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:47:15,048-Speed 2497.82 samples/sec Loss 2.3049 LearningRate 
0.000360 Epoch: 18 Global Step: 381620 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:47:23,254-Speed 2496.39 samples/sec Loss 2.2655 LearningRate 0.000360 Epoch: 18 Global Step: 381630 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:47:31,466-Speed 2494.22 samples/sec Loss 2.3048 LearningRate 0.000360 Epoch: 18 Global Step: 381640 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:47:39,670-Speed 2496.76 samples/sec Loss 2.3254 LearningRate 0.000360 Epoch: 18 Global Step: 381650 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:47:47,876-Speed 2496.16 samples/sec Loss 2.3169 LearningRate 0.000360 Epoch: 18 Global Step: 381660 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:47:56,026-Speed 2513.30 samples/sec Loss 2.3153 LearningRate 0.000360 Epoch: 18 Global Step: 381670 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:48:04,232-Speed 2496.08 samples/sec Loss 2.3200 LearningRate 0.000360 Epoch: 18 Global Step: 381680 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:48:12,432-Speed 2498.35 samples/sec Loss 2.3159 LearningRate 0.000360 Epoch: 18 Global Step: 381690 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:48:20,632-Speed 2498.23 samples/sec Loss 2.3277 LearningRate 0.000360 Epoch: 18 Global Step: 381700 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:48:28,834-Speed 2497.44 samples/sec Loss 2.3187 LearningRate 0.000360 Epoch: 18 Global Step: 381710 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:48:37,033-Speed 2498.09 samples/sec Loss 2.3560 LearningRate 0.000360 Epoch: 18 Global Step: 381720 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:48:45,183-Speed 2513.51 samples/sec Loss 2.3167 LearningRate 0.000360 Epoch: 18 Global Step: 381730 Fp16 Grad Scale: 32768 Required: 103 hours Training: 2022-07-09 05:48:53,389-Speed 2496.12 samples/sec Loss 2.3131 
LearningRate 0.000360 Epoch: 18 Global Step: 381740 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:49:01,586-Speed 2499.01 samples/sec Loss 2.3557 LearningRate 0.000360 Epoch: 18 Global Step: 381750 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:49:09,795-Speed 2495.26 samples/sec Loss 2.3602 LearningRate 0.000360 Epoch: 18 Global Step: 381760 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:49:17,994-Speed 2498.06 samples/sec Loss 2.3443 LearningRate 0.000360 Epoch: 18 Global Step: 381770 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:49:26,205-Speed 2494.63 samples/sec Loss 2.2971 LearningRate 0.000360 Epoch: 18 Global Step: 381780 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:49:34,352-Speed 2514.21 samples/sec Loss 2.3439 LearningRate 0.000360 Epoch: 18 Global Step: 381790 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:49:42,559-Speed 2495.91 samples/sec Loss 2.3078 LearningRate 0.000360 Epoch: 18 Global Step: 381800 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:49:50,772-Speed 2494.42 samples/sec Loss 2.2725 LearningRate 0.000360 Epoch: 18 Global Step: 381810 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:49:58,972-Speed 2497.79 samples/sec Loss 2.2472 LearningRate 0.000360 Epoch: 18 Global Step: 381820 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:50:07,173-Speed 2497.77 samples/sec Loss 2.2523 LearningRate 0.000360 Epoch: 18 Global Step: 381830 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:50:15,377-Speed 2496.57 samples/sec Loss 2.2785 LearningRate 0.000360 Epoch: 18 Global Step: 381840 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:50:23,529-Speed 2512.68 samples/sec Loss 2.3172 LearningRate 0.000360 Epoch: 18 Global Step: 381850 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:50:31,728-Speed 2498.23 samples/sec Loss 
2.3226 LearningRate 0.000360 Epoch: 18 Global Step: 381860 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:50:39,928-Speed 2497.95 samples/sec Loss 2.3308 LearningRate 0.000360 Epoch: 18 Global Step: 381870 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:50:48,127-Speed 2498.95 samples/sec Loss 2.2413 LearningRate 0.000360 Epoch: 18 Global Step: 381880 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:50:56,329-Speed 2497.33 samples/sec Loss 2.2467 LearningRate 0.000360 Epoch: 18 Global Step: 381890 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:51:04,526-Speed 2498.89 samples/sec Loss 2.3192 LearningRate 0.000360 Epoch: 18 Global Step: 381900 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:51:12,672-Speed 2514.46 samples/sec Loss 2.2822 LearningRate 0.000359 Epoch: 18 Global Step: 381910 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:51:20,871-Speed 2498.13 samples/sec Loss 2.3230 LearningRate 0.000359 Epoch: 18 Global Step: 381920 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:51:29,071-Speed 2498.18 samples/sec Loss 2.2437 LearningRate 0.000359 Epoch: 18 Global Step: 381930 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:51:37,272-Speed 2497.68 samples/sec Loss 2.2583 LearningRate 0.000359 Epoch: 18 Global Step: 381940 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:51:45,472-Speed 2497.93 samples/sec Loss 2.3000 LearningRate 0.000359 Epoch: 18 Global Step: 381950 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:51:53,687-Speed 2493.44 samples/sec Loss 2.3180 LearningRate 0.000359 Epoch: 18 Global Step: 381960 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:52:01,830-Speed 2515.23 samples/sec Loss 2.2913 LearningRate 0.000359 Epoch: 18 Global Step: 381970 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:52:10,034-Speed 2496.94 samples/sec 
Loss 2.3028 LearningRate 0.000359 Epoch: 18 Global Step: 381980 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:52:18,234-Speed 2498.06 samples/sec Loss 2.3241 LearningRate 0.000359 Epoch: 18 Global Step: 381990 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:52:26,453-Speed 2492.02 samples/sec Loss 2.3627 LearningRate 0.000359 Epoch: 18 Global Step: 382000 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:52:34,656-Speed 2497.02 samples/sec Loss 2.2936 LearningRate 0.000359 Epoch: 18 Global Step: 382010 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:52:42,856-Speed 2498.09 samples/sec Loss 2.2527 LearningRate 0.000359 Epoch: 18 Global Step: 382020 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:52:50,999-Speed 2515.28 samples/sec Loss 2.3041 LearningRate 0.000359 Epoch: 18 Global Step: 382030 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:52:59,211-Speed 2494.35 samples/sec Loss 2.3103 LearningRate 0.000359 Epoch: 18 Global Step: 382040 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:53:07,424-Speed 2494.05 samples/sec Loss 2.3282 LearningRate 0.000359 Epoch: 18 Global Step: 382050 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:53:15,634-Speed 2495.16 samples/sec Loss 2.2466 LearningRate 0.000359 Epoch: 18 Global Step: 382060 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:53:23,834-Speed 2497.78 samples/sec Loss 2.3027 LearningRate 0.000359 Epoch: 18 Global Step: 382070 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:53:32,033-Speed 2498.05 samples/sec Loss 2.2989 LearningRate 0.000359 Epoch: 18 Global Step: 382080 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:53:40,185-Speed 2512.96 samples/sec Loss 2.3473 LearningRate 0.000359 Epoch: 18 Global Step: 382090 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:53:48,385-Speed 2497.79 
samples/sec Loss 2.3421 LearningRate 0.000359 Epoch: 18 Global Step: 382100 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:53:56,582-Speed 2498.86 samples/sec Loss 2.3065 LearningRate 0.000359 Epoch: 18 Global Step: 382110 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:54:04,811-Speed 2489.32 samples/sec Loss 2.3612 LearningRate 0.000359 Epoch: 18 Global Step: 382120 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:54:13,012-Speed 2497.57 samples/sec Loss 2.3256 LearningRate 0.000359 Epoch: 18 Global Step: 382130 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:54:21,226-Speed 2493.58 samples/sec Loss 2.3154 LearningRate 0.000359 Epoch: 18 Global Step: 382140 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:54:29,374-Speed 2513.92 samples/sec Loss 2.3177 LearningRate 0.000359 Epoch: 18 Global Step: 382150 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:54:37,577-Speed 2497.15 samples/sec Loss 2.3166 LearningRate 0.000359 Epoch: 18 Global Step: 382160 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:54:45,779-Speed 2497.25 samples/sec Loss 2.2877 LearningRate 0.000359 Epoch: 18 Global Step: 382170 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:54:53,986-Speed 2495.85 samples/sec Loss 2.2967 LearningRate 0.000359 Epoch: 18 Global Step: 382180 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:55:02,187-Speed 2497.79 samples/sec Loss 2.3181 LearningRate 0.000359 Epoch: 18 Global Step: 382190 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:55:10,386-Speed 2498.27 samples/sec Loss 2.2743 LearningRate 0.000359 Epoch: 18 Global Step: 382200 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:55:18,530-Speed 2515.25 samples/sec Loss 2.3181 LearningRate 0.000359 Epoch: 18 Global Step: 382210 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:55:26,730-Speed 
2497.74 samples/sec Loss 2.3139 LearningRate 0.000359 Epoch: 18 Global Step: 382220 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:55:34,927-Speed 2498.92 samples/sec Loss 2.3215 LearningRate 0.000359 Epoch: 18 Global Step: 382230 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:55:43,128-Speed 2497.63 samples/sec Loss 2.3146 LearningRate 0.000359 Epoch: 18 Global Step: 382240 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:55:51,327-Speed 2498.31 samples/sec Loss 2.3277 LearningRate 0.000359 Epoch: 18 Global Step: 382250 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:55:59,523-Speed 2499.15 samples/sec Loss 2.3035 LearningRate 0.000359 Epoch: 18 Global Step: 382260 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:56:07,672-Speed 2513.33 samples/sec Loss 2.3286 LearningRate 0.000359 Epoch: 18 Global Step: 382270 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:56:15,871-Speed 2498.44 samples/sec Loss 2.3482 LearningRate 0.000359 Epoch: 18 Global Step: 382280 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:56:24,073-Speed 2497.88 samples/sec Loss 2.2576 LearningRate 0.000359 Epoch: 18 Global Step: 382290 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 05:56:32,228-Speed 2511.61 samples/sec Loss 2.3650 LearningRate 0.000359 Epoch: 18 Global Step: 382300 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:56:40,424-Speed 2499.07 samples/sec Loss 2.2819 LearningRate 0.000359 Epoch: 18 Global Step: 382310 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:56:48,628-Speed 2496.98 samples/sec Loss 2.3042 LearningRate 0.000359 Epoch: 18 Global Step: 382320 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:56:56,772-Speed 2514.99 samples/sec Loss 2.3136 LearningRate 0.000359 Epoch: 18 Global Step: 382330 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 
05:57:04,971-Speed 2498.22 samples/sec Loss 2.3204 LearningRate 0.000359 Epoch: 18 Global Step: 382340 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:57:13,178-Speed 2495.71 samples/sec Loss 2.2732 LearningRate 0.000359 Epoch: 18 Global Step: 382350 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:57:21,375-Speed 2498.84 samples/sec Loss 2.2662 LearningRate 0.000359 Epoch: 18 Global Step: 382360 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:57:29,580-Speed 2496.30 samples/sec Loss 2.2751 LearningRate 0.000359 Epoch: 18 Global Step: 382370 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:57:37,781-Speed 2497.47 samples/sec Loss 2.2487 LearningRate 0.000359 Epoch: 18 Global Step: 382380 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:57:45,925-Speed 2515.39 samples/sec Loss 2.2907 LearningRate 0.000359 Epoch: 18 Global Step: 382390 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:57:54,123-Speed 2498.46 samples/sec Loss 2.2649 LearningRate 0.000359 Epoch: 18 Global Step: 382400 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:58:02,317-Speed 2499.66 samples/sec Loss 2.3185 LearningRate 0.000359 Epoch: 18 Global Step: 382410 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:58:10,518-Speed 2497.87 samples/sec Loss 2.2901 LearningRate 0.000359 Epoch: 18 Global Step: 382420 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:58:18,715-Speed 2498.89 samples/sec Loss 2.2755 LearningRate 0.000359 Epoch: 18 Global Step: 382430 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:58:26,913-Speed 2498.77 samples/sec Loss 2.2847 LearningRate 0.000359 Epoch: 18 Global Step: 382440 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:58:35,061-Speed 2513.78 samples/sec Loss 2.2833 LearningRate 0.000359 Epoch: 18 Global Step: 382450 Fp16 Grad Scale: 16384 Required: 102 hours Training: 
2022-07-09 05:58:43,260-Speed 2498.34 samples/sec Loss 2.3523 LearningRate 0.000359 Epoch: 18 Global Step: 382460 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:58:51,456-Speed 2499.14 samples/sec Loss 2.3768 LearningRate 0.000359 Epoch: 18 Global Step: 382470 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:58:59,653-Speed 2498.84 samples/sec Loss 2.2902 LearningRate 0.000359 Epoch: 18 Global Step: 382480 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 05:59:07,809-Speed 2511.46 samples/sec Loss 2.3126 LearningRate 0.000359 Epoch: 18 Global Step: 382490 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 05:59:16,008-Speed 2498.44 samples/sec Loss 2.3038 LearningRate 0.000359 Epoch: 18 Global Step: 382500 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 05:59:24,154-Speed 2514.50 samples/sec Loss 2.3248 LearningRate 0.000359 Epoch: 18 Global Step: 382510 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 05:59:32,352-Speed 2498.39 samples/sec Loss 2.3122 LearningRate 0.000359 Epoch: 18 Global Step: 382520 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 05:59:40,557-Speed 2496.56 samples/sec Loss 2.3157 LearningRate 0.000359 Epoch: 18 Global Step: 382530 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 05:59:48,764-Speed 2495.66 samples/sec Loss 2.2895 LearningRate 0.000358 Epoch: 18 Global Step: 382540 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 05:59:56,968-Speed 2496.88 samples/sec Loss 2.2905 LearningRate 0.000358 Epoch: 18 Global Step: 382550 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:00:05,168-Speed 2497.89 samples/sec Loss 2.2930 LearningRate 0.000358 Epoch: 18 Global Step: 382560 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:00:13,317-Speed 2513.71 samples/sec Loss 2.2474 LearningRate 0.000358 Epoch: 18 Global Step: 382570 Fp16 Grad Scale: 8192 Required: 102 hours Training: 
2022-07-09 06:00:21,517-Speed 2497.72 samples/sec Loss 2.2372 LearningRate 0.000358 Epoch: 18 Global Step: 382580 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:00:29,725-Speed 2495.42 samples/sec Loss 2.3321 LearningRate 0.000358 Epoch: 18 Global Step: 382590 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:00:37,930-Speed 2496.45 samples/sec Loss 2.2348 LearningRate 0.000358 Epoch: 18 Global Step: 382600 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:00:46,132-Speed 2497.26 samples/sec Loss 2.2948 LearningRate 0.000358 Epoch: 18 Global Step: 382610 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:00:54,335-Speed 2496.99 samples/sec Loss 2.2515 LearningRate 0.000358 Epoch: 18 Global Step: 382620 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:01:02,483-Speed 2514.23 samples/sec Loss 2.3310 LearningRate 0.000358 Epoch: 18 Global Step: 382630 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:01:10,688-Speed 2496.39 samples/sec Loss 2.3008 LearningRate 0.000358 Epoch: 18 Global Step: 382640 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:01:18,886-Speed 2498.63 samples/sec Loss 2.2696 LearningRate 0.000358 Epoch: 18 Global Step: 382650 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:01:27,087-Speed 2497.57 samples/sec Loss 2.2319 LearningRate 0.000358 Epoch: 18 Global Step: 382660 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:01:35,289-Speed 2497.49 samples/sec Loss 2.3035 LearningRate 0.000358 Epoch: 18 Global Step: 382670 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:01:43,493-Speed 2496.74 samples/sec Loss 2.2796 LearningRate 0.000358 Epoch: 18 Global Step: 382680 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:01:51,639-Speed 2514.40 samples/sec Loss 2.3642 LearningRate 0.000358 Epoch: 18 Global Step: 382690 Fp16 Grad Scale: 8192 Required: 102 hours Training: 
2022-07-09 06:01:59,842-Speed 2497.13 samples/sec Loss 2.2577 LearningRate 0.000358 Epoch: 18 Global Step: 382700 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:02:08,057-Speed 2493.57 samples/sec Loss 2.3231 LearningRate 0.000358 Epoch: 18 Global Step: 382710 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:02:16,258-Speed 2497.69 samples/sec Loss 2.2802 LearningRate 0.000358 Epoch: 18 Global Step: 382720 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:02:24,462-Speed 2496.83 samples/sec Loss 2.3262 LearningRate 0.000358 Epoch: 18 Global Step: 382730 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:02:32,661-Speed 2498.33 samples/sec Loss 2.2842 LearningRate 0.000358 Epoch: 18 Global Step: 382740 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:02:40,822-Speed 2510.09 samples/sec Loss 2.3229 LearningRate 0.000358 Epoch: 18 Global Step: 382750 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:02:49,028-Speed 2496.12 samples/sec Loss 2.2983 LearningRate 0.000358 Epoch: 18 Global Step: 382760 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:02:57,224-Speed 2498.99 samples/sec Loss 2.2795 LearningRate 0.000358 Epoch: 18 Global Step: 382770 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:03:05,424-Speed 2498.01 samples/sec Loss 2.3221 LearningRate 0.000358 Epoch: 18 Global Step: 382780 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:03:13,631-Speed 2495.87 samples/sec Loss 2.2625 LearningRate 0.000358 Epoch: 18 Global Step: 382790 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:03:21,831-Speed 2497.91 samples/sec Loss 2.3420 LearningRate 0.000358 Epoch: 18 Global Step: 382800 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:03:29,980-Speed 2513.83 samples/sec Loss 2.2979 LearningRate 0.000358 Epoch: 18 Global Step: 382810 Fp16 Grad Scale: 8192 Required: 102 hours Training: 
2022-07-09 06:03:38,182-Speed 2497.32 samples/sec Loss 2.3257 LearningRate 0.000358 Epoch: 18 Global Step: 382820 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:03:46,382-Speed 2498.02 samples/sec Loss 2.2725 LearningRate 0.000358 Epoch: 18 Global Step: 382830 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:03:54,580-Speed 2498.38 samples/sec Loss 2.3041 LearningRate 0.000358 Epoch: 18 Global Step: 382840 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:04:02,785-Speed 2496.50 samples/sec Loss 2.2274 LearningRate 0.000358 Epoch: 18 Global Step: 382850 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:04:10,983-Speed 2498.57 samples/sec Loss 2.2566 LearningRate 0.000358 Epoch: 18 Global Step: 382860 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:04:19,130-Speed 2514.19 samples/sec Loss 2.2173 LearningRate 0.000358 Epoch: 18 Global Step: 382870 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:04:27,330-Speed 2498.05 samples/sec Loss 2.2082 LearningRate 0.000358 Epoch: 18 Global Step: 382880 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:04:35,532-Speed 2497.57 samples/sec Loss 2.2540 LearningRate 0.000358 Epoch: 18 Global Step: 382890 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:04:43,733-Speed 2497.71 samples/sec Loss 2.3206 LearningRate 0.000358 Epoch: 18 Global Step: 382900 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:04:51,932-Speed 2498.39 samples/sec Loss 2.2961 LearningRate 0.000358 Epoch: 18 Global Step: 382910 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:05:00,147-Speed 2493.43 samples/sec Loss 2.2862 LearningRate 0.000358 Epoch: 18 Global Step: 382920 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:05:08,302-Speed 2511.56 samples/sec Loss 2.2576 LearningRate 0.000358 Epoch: 18 Global Step: 382930 Fp16 Grad Scale: 8192 Required: 102 hours Training: 
2022-07-09 06:05:16,506-Speed 2496.92 samples/sec Loss 2.2819 LearningRate 0.000358 Epoch: 18 Global Step: 382940 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:05:24,705-Speed 2498.32 samples/sec Loss 2.3063 LearningRate 0.000358 Epoch: 18 Global Step: 382950 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:05:32,901-Speed 2498.91 samples/sec Loss 2.2628 LearningRate 0.000358 Epoch: 18 Global Step: 382960 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:05:41,104-Speed 2497.14 samples/sec Loss 2.2453 LearningRate 0.000358 Epoch: 18 Global Step: 382970 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:05:49,307-Speed 2496.89 samples/sec Loss 2.2989 LearningRate 0.000358 Epoch: 18 Global Step: 382980 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:05:57,453-Speed 2514.72 samples/sec Loss 2.3011 LearningRate 0.000358 Epoch: 18 Global Step: 382990 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:06:05,654-Speed 2497.61 samples/sec Loss 2.3081 LearningRate 0.000358 Epoch: 18 Global Step: 383000 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:06:13,853-Speed 2498.24 samples/sec Loss 2.2795 LearningRate 0.000358 Epoch: 18 Global Step: 383010 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:06:22,051-Speed 2498.45 samples/sec Loss 2.2830 LearningRate 0.000358 Epoch: 18 Global Step: 383020 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:06:30,257-Speed 2496.18 samples/sec Loss 2.2610 LearningRate 0.000358 Epoch: 18 Global Step: 383030 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:06:38,463-Speed 2496.17 samples/sec Loss 2.2572 LearningRate 0.000358 Epoch: 18 Global Step: 383040 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:06:46,612-Speed 2513.40 samples/sec Loss 2.3023 LearningRate 0.000358 Epoch: 18 Global Step: 383050 Fp16 Grad Scale: 8192 Required: 102 hours Training: 
2022-07-09 06:06:54,817-Speed 2496.46 samples/sec Loss 2.2876 LearningRate 0.000358 Epoch: 18 Global Step: 383060 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:07:03,028-Speed 2494.78 samples/sec Loss 2.2757 LearningRate 0.000358 Epoch: 18 Global Step: 383070 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:07:11,230-Speed 2497.26 samples/sec Loss 2.3249 LearningRate 0.000358 Epoch: 18 Global Step: 383080 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:07:19,433-Speed 2497.10 samples/sec Loss 2.2594 LearningRate 0.000358 Epoch: 18 Global Step: 383090 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:07:27,632-Speed 2498.37 samples/sec Loss 2.3055 LearningRate 0.000358 Epoch: 18 Global Step: 383100 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:07:35,789-Speed 2510.80 samples/sec Loss 2.2593 LearningRate 0.000358 Epoch: 18 Global Step: 383110 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:07:43,991-Speed 2497.52 samples/sec Loss 2.2978 LearningRate 0.000358 Epoch: 18 Global Step: 383120 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:07:52,189-Speed 2498.48 samples/sec Loss 2.3129 LearningRate 0.000358 Epoch: 18 Global Step: 383130 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:08:00,390-Speed 2497.61 samples/sec Loss 2.2515 LearningRate 0.000358 Epoch: 18 Global Step: 383140 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:08:08,596-Speed 2496.02 samples/sec Loss 2.2838 LearningRate 0.000358 Epoch: 18 Global Step: 383150 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:08:16,799-Speed 2497.06 samples/sec Loss 2.2958 LearningRate 0.000357 Epoch: 18 Global Step: 383160 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:08:24,969-Speed 2507.43 samples/sec Loss 2.3181 LearningRate 0.000357 Epoch: 18 Global Step: 383170 Fp16 Grad Scale: 8192 Required: 102 hours Training: 
2022-07-09 06:08:33,183-Speed 2493.76 samples/sec Loss 2.2788 LearningRate 0.000357 Epoch: 18 Global Step: 383180 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:08:41,392-Speed 2495.03 samples/sec Loss 2.2318 LearningRate 0.000357 Epoch: 18 Global Step: 383190 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:08:49,591-Speed 2498.12 samples/sec Loss 2.3108 LearningRate 0.000357 Epoch: 18 Global Step: 383200 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:08:57,788-Speed 2499.13 samples/sec Loss 2.3017 LearningRate 0.000357 Epoch: 18 Global Step: 383210 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:09:05,986-Speed 2498.10 samples/sec Loss 2.2830 LearningRate 0.000357 Epoch: 18 Global Step: 383220 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:09:14,131-Speed 2515.05 samples/sec Loss 2.3073 LearningRate 0.000357 Epoch: 18 Global Step: 383230 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:09:22,352-Speed 2491.40 samples/sec Loss 2.2641 LearningRate 0.000357 Epoch: 18 Global Step: 383240 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:09:30,551-Speed 2498.39 samples/sec Loss 2.3051 LearningRate 0.000357 Epoch: 18 Global Step: 383250 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:09:38,749-Speed 2498.57 samples/sec Loss 2.3303 LearningRate 0.000357 Epoch: 18 Global Step: 383260 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:09:46,947-Speed 2498.30 samples/sec Loss 2.2917 LearningRate 0.000357 Epoch: 18 Global Step: 383270 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:09:55,148-Speed 2497.85 samples/sec Loss 2.3287 LearningRate 0.000357 Epoch: 18 Global Step: 383280 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:10:03,296-Speed 2514.09 samples/sec Loss 2.3196 LearningRate 0.000357 Epoch: 18 Global Step: 383290 Fp16 Grad Scale: 8192 Required: 102 hours Training: 
2022-07-09 06:10:11,494-Speed 2498.33 samples/sec Loss 2.2765 LearningRate 0.000357 Epoch: 18 Global Step: 383300 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:10:19,693-Speed 2498.44 samples/sec Loss 2.3217 LearningRate 0.000357 Epoch: 18 Global Step: 383310 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:10:27,895-Speed 2497.33 samples/sec Loss 2.3235 LearningRate 0.000357 Epoch: 18 Global Step: 383320 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:10:36,095-Speed 2497.91 samples/sec Loss 2.2841 LearningRate 0.000357 Epoch: 18 Global Step: 383330 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:10:44,301-Speed 2496.41 samples/sec Loss 2.2821 LearningRate 0.000357 Epoch: 18 Global Step: 383340 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:10:52,446-Speed 2514.62 samples/sec Loss 2.3254 LearningRate 0.000357 Epoch: 18 Global Step: 383350 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:11:00,649-Speed 2497.27 samples/sec Loss 2.2715 LearningRate 0.000357 Epoch: 18 Global Step: 383360 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:11:08,883-Speed 2487.38 samples/sec Loss 2.2947 LearningRate 0.000357 Epoch: 18 Global Step: 383370 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:11:17,084-Speed 2497.78 samples/sec Loss 2.2752 LearningRate 0.000357 Epoch: 18 Global Step: 383380 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:11:25,298-Speed 2493.48 samples/sec Loss 2.3437 LearningRate 0.000357 Epoch: 18 Global Step: 383390 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:11:33,499-Speed 2497.78 samples/sec Loss 2.3558 LearningRate 0.000357 Epoch: 18 Global Step: 383400 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:11:41,651-Speed 2512.59 samples/sec Loss 2.3060 LearningRate 0.000357 Epoch: 18 Global Step: 383410 Fp16 Grad Scale: 8192 Required: 102 hours Training: 
2022-07-09 06:11:49,849-Speed 2498.61 samples/sec Loss 2.2634 LearningRate 0.000357 Epoch: 18 Global Step: 383420 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:11:58,048-Speed 2498.50 samples/sec Loss 2.2743 LearningRate 0.000357 Epoch: 18 Global Step: 383430 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:12:06,247-Speed 2498.02 samples/sec Loss 2.2909 LearningRate 0.000357 Epoch: 18 Global Step: 383440 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:12:14,446-Speed 2498.20 samples/sec Loss 2.3475 LearningRate 0.000357 Epoch: 18 Global Step: 383450 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:12:22,650-Speed 2496.68 samples/sec Loss 2.3125 LearningRate 0.000357 Epoch: 18 Global Step: 383460 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:12:30,796-Speed 2514.49 samples/sec Loss 2.3025 LearningRate 0.000357 Epoch: 18 Global Step: 383470 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:12:38,999-Speed 2497.19 samples/sec Loss 2.3362 LearningRate 0.000357 Epoch: 18 Global Step: 383480 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:12:47,208-Speed 2495.31 samples/sec Loss 2.2727 LearningRate 0.000357 Epoch: 18 Global Step: 383490 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:12:55,410-Speed 2497.35 samples/sec Loss 2.3062 LearningRate 0.000357 Epoch: 18 Global Step: 383500 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:13:03,622-Speed 2494.50 samples/sec Loss 2.2791 LearningRate 0.000357 Epoch: 18 Global Step: 383510 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:13:11,822-Speed 2498.17 samples/sec Loss 2.3538 LearningRate 0.000357 Epoch: 18 Global Step: 383520 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:13:19,963-Speed 2515.98 samples/sec Loss 2.2814 LearningRate 0.000357 Epoch: 18 Global Step: 383530 Fp16 Grad Scale: 8192 Required: 102 hours Training: 
2022-07-09 06:13:28,163-Speed 2497.80 samples/sec Loss 2.2902 LearningRate 0.000357 Epoch: 18 Global Step: 383540 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:13:36,375-Speed 2494.59 samples/sec Loss 2.2681 LearningRate 0.000357 Epoch: 18 Global Step: 383550 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:13:44,579-Speed 2496.44 samples/sec Loss 2.2992 LearningRate 0.000357 Epoch: 18 Global Step: 383560 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:13:52,776-Speed 2498.97 samples/sec Loss 2.3045 LearningRate 0.000357 Epoch: 18 Global Step: 383570 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:14:00,980-Speed 2496.73 samples/sec Loss 2.2560 LearningRate 0.000357 Epoch: 18 Global Step: 383580 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:14:09,129-Speed 2513.84 samples/sec Loss 2.2761 LearningRate 0.000357 Epoch: 18 Global Step: 383590 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:14:17,327-Speed 2498.52 samples/sec Loss 2.2923 LearningRate 0.000357 Epoch: 18 Global Step: 383600 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:14:25,527-Speed 2497.95 samples/sec Loss 2.2477 LearningRate 0.000357 Epoch: 18 Global Step: 383610 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:14:33,730-Speed 2497.37 samples/sec Loss 2.3244 LearningRate 0.000357 Epoch: 18 Global Step: 383620 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:14:41,927-Speed 2499.00 samples/sec Loss 2.2931 LearningRate 0.000357 Epoch: 18 Global Step: 383630 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:14:50,125-Speed 2498.62 samples/sec Loss 2.3282 LearningRate 0.000357 Epoch: 18 Global Step: 383640 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:14:58,274-Speed 2514.37 samples/sec Loss 2.2539 LearningRate 0.000357 Epoch: 18 Global Step: 383650 Fp16 Grad Scale: 8192 Required: 102 hours Training: 
2022-07-09 06:15:06,477-Speed 2497.26 samples/sec Loss 2.3013 LearningRate 0.000357 Epoch: 18 Global Step: 383660 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:15:14,676-Speed 2497.96 samples/sec Loss 2.2903 LearningRate 0.000357 Epoch: 18 Global Step: 383670 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:15:22,882-Speed 2496.17 samples/sec Loss 2.3278 LearningRate 0.000357 Epoch: 18 Global Step: 383680 Fp16 Grad Scale: 8192 Required: 102 hours Training: 2022-07-09 06:15:31,081-Speed 2498.52 samples/sec Loss 2.3277 LearningRate 0.000357 Epoch: 18 Global Step: 383690 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:15:39,281-Speed 2497.90 samples/sec Loss 2.2919 LearningRate 0.000357 Epoch: 18 Global Step: 383700 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:15:47,429-Speed 2513.98 samples/sec Loss 2.2686 LearningRate 0.000357 Epoch: 18 Global Step: 383710 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:15:55,627-Speed 2498.42 samples/sec Loss 2.3138 LearningRate 0.000357 Epoch: 18 Global Step: 383720 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:16:03,825-Speed 2498.49 samples/sec Loss 2.3231 LearningRate 0.000357 Epoch: 18 Global Step: 383730 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:16:12,035-Speed 2494.98 samples/sec Loss 2.3467 LearningRate 0.000357 Epoch: 18 Global Step: 383740 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:16:20,240-Speed 2496.56 samples/sec Loss 2.2734 LearningRate 0.000357 Epoch: 18 Global Step: 383750 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:16:28,445-Speed 2496.36 samples/sec Loss 2.3091 LearningRate 0.000357 Epoch: 18 Global Step: 383760 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:16:36,618-Speed 2506.47 samples/sec Loss 2.2688 LearningRate 0.000357 Epoch: 18 Global Step: 383770 Fp16 Grad Scale: 16384 Required: 102 hours 
Training: 2022-07-09 06:16:44,836-Speed 2492.48 samples/sec Loss 2.3489 LearningRate 0.000357 Epoch: 18 Global Step: 383780 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:16:53,049-Speed 2493.88 samples/sec Loss 2.2953 LearningRate 0.000356 Epoch: 18 Global Step: 383790 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:17:01,257-Speed 2495.43 samples/sec Loss 2.2996 LearningRate 0.000356 Epoch: 18 Global Step: 383800 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:17:09,462-Speed 2496.49 samples/sec Loss 2.2926 LearningRate 0.000356 Epoch: 18 Global Step: 383810 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:17:17,663-Speed 2497.67 samples/sec Loss 2.2625 LearningRate 0.000356 Epoch: 18 Global Step: 383820 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:17:25,813-Speed 2513.34 samples/sec Loss 2.2594 LearningRate 0.000356 Epoch: 18 Global Step: 383830 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:17:34,019-Speed 2496.15 samples/sec Loss 2.2875 LearningRate 0.000356 Epoch: 18 Global Step: 383840 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:17:42,233-Speed 2493.43 samples/sec Loss 2.3067 LearningRate 0.000356 Epoch: 18 Global Step: 383850 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:17:50,443-Speed 2495.39 samples/sec Loss 2.3116 LearningRate 0.000356 Epoch: 18 Global Step: 383860 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:17:58,653-Speed 2494.90 samples/sec Loss 2.2937 LearningRate 0.000356 Epoch: 18 Global Step: 383870 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:18:06,858-Speed 2496.62 samples/sec Loss 2.2122 LearningRate 0.000356 Epoch: 18 Global Step: 383880 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:18:15,021-Speed 2509.50 samples/sec Loss 2.3143 LearningRate 0.000356 Epoch: 18 Global Step: 383890 Fp16 Grad Scale: 16384 Required: 102 
hours Training: 2022-07-09 06:18:23,224-Speed 2496.95 samples/sec Loss 2.2493 LearningRate 0.000356 Epoch: 18 Global Step: 383900 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:18:31,429-Speed 2496.34 samples/sec Loss 2.2261 LearningRate 0.000356 Epoch: 18 Global Step: 383910 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:18:39,634-Speed 2496.80 samples/sec Loss 2.2928 LearningRate 0.000356 Epoch: 18 Global Step: 383920 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:18:47,843-Speed 2495.53 samples/sec Loss 2.2824 LearningRate 0.000356 Epoch: 18 Global Step: 383930 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:18:56,056-Speed 2493.78 samples/sec Loss 2.2830 LearningRate 0.000356 Epoch: 18 Global Step: 383940 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:19:04,207-Speed 2512.96 samples/sec Loss 2.3520 LearningRate 0.000356 Epoch: 18 Global Step: 383950 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:19:12,411-Speed 2496.98 samples/sec Loss 2.3086 LearningRate 0.000356 Epoch: 18 Global Step: 383960 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:19:20,611-Speed 2497.88 samples/sec Loss 2.2855 LearningRate 0.000356 Epoch: 18 Global Step: 383970 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:19:28,813-Speed 2497.39 samples/sec Loss 2.2858 LearningRate 0.000356 Epoch: 18 Global Step: 383980 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:19:37,014-Speed 2497.74 samples/sec Loss 2.3144 LearningRate 0.000356 Epoch: 18 Global Step: 383990 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:19:45,211-Speed 2498.69 samples/sec Loss 2.2830 LearningRate 0.000356 Epoch: 18 Global Step: 384000 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:19:53,358-Speed 2514.57 samples/sec Loss 2.2667 LearningRate 0.000356 Epoch: 18 Global Step: 384010 Fp16 Grad Scale: 16384 Required: 
102 hours Training: 2022-07-09 06:20:01,557-Speed 2498.97 samples/sec Loss 2.2653 LearningRate 0.000356 Epoch: 18 Global Step: 384020 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:20:09,757-Speed 2497.90 samples/sec Loss 2.2238 LearningRate 0.000356 Epoch: 18 Global Step: 384030 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:20:17,957-Speed 2498.09 samples/sec Loss 2.2603 LearningRate 0.000356 Epoch: 18 Global Step: 384040 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:20:26,162-Speed 2496.28 samples/sec Loss 2.2753 LearningRate 0.000356 Epoch: 18 Global Step: 384050 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:20:34,365-Speed 2497.20 samples/sec Loss 2.2498 LearningRate 0.000356 Epoch: 18 Global Step: 384060 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:20:42,512-Speed 2513.96 samples/sec Loss 2.3077 LearningRate 0.000356 Epoch: 18 Global Step: 384070 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:20:50,711-Speed 2498.42 samples/sec Loss 2.2212 LearningRate 0.000356 Epoch: 18 Global Step: 384080 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:20:58,910-Speed 2498.28 samples/sec Loss 2.2412 LearningRate 0.000356 Epoch: 18 Global Step: 384090 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:21:07,114-Speed 2496.63 samples/sec Loss 2.3221 LearningRate 0.000356 Epoch: 18 Global Step: 384100 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:21:15,320-Speed 2496.04 samples/sec Loss 2.2914 LearningRate 0.000356 Epoch: 18 Global Step: 384110 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:21:23,518-Speed 2498.45 samples/sec Loss 2.2533 LearningRate 0.000356 Epoch: 18 Global Step: 384120 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:21:31,664-Speed 2514.74 samples/sec Loss 2.3587 LearningRate 0.000356 Epoch: 18 Global Step: 384130 Fp16 Grad Scale: 16384 
Required: 102 hours Training: 2022-07-09 06:21:39,865-Speed 2497.57 samples/sec Loss 2.3256 LearningRate 0.000356 Epoch: 18 Global Step: 384140 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:21:48,072-Speed 2495.78 samples/sec Loss 2.3219 LearningRate 0.000356 Epoch: 18 Global Step: 384150 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:21:56,283-Speed 2494.62 samples/sec Loss 2.3393 LearningRate 0.000356 Epoch: 18 Global Step: 384160 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:22:04,486-Speed 2496.99 samples/sec Loss 2.2803 LearningRate 0.000356 Epoch: 18 Global Step: 384170 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:22:12,689-Speed 2496.82 samples/sec Loss 2.3044 LearningRate 0.000356 Epoch: 18 Global Step: 384180 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:22:20,833-Speed 2515.16 samples/sec Loss 2.2558 LearningRate 0.000356 Epoch: 18 Global Step: 384190 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:22:29,038-Speed 2496.62 samples/sec Loss 2.2643 LearningRate 0.000356 Epoch: 18 Global Step: 384200 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:22:37,333-Speed 2469.09 samples/sec Loss 2.2982 LearningRate 0.000356 Epoch: 18 Global Step: 384210 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:22:45,531-Speed 2498.61 samples/sec Loss 2.3149 LearningRate 0.000356 Epoch: 18 Global Step: 384220 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:22:53,745-Speed 2493.78 samples/sec Loss 2.3547 LearningRate 0.000356 Epoch: 18 Global Step: 384230 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:23:01,943-Speed 2498.45 samples/sec Loss 2.2493 LearningRate 0.000356 Epoch: 18 Global Step: 384240 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:23:10,093-Speed 2513.38 samples/sec Loss 2.2580 LearningRate 0.000356 Epoch: 18 Global Step: 384250 Fp16 Grad Scale: 
16384 Required: 102 hours Training: 2022-07-09 06:23:18,293-Speed 2498.14 samples/sec Loss 2.2413 LearningRate 0.000356 Epoch: 18 Global Step: 384260 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:23:26,490-Speed 2498.85 samples/sec Loss 2.3075 LearningRate 0.000356 Epoch: 18 Global Step: 384270 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:23:34,690-Speed 2498.01 samples/sec Loss 2.2463 LearningRate 0.000356 Epoch: 18 Global Step: 384280 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:23:42,893-Speed 2496.69 samples/sec Loss 2.2328 LearningRate 0.000356 Epoch: 18 Global Step: 384290 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:23:51,114-Speed 2491.49 samples/sec Loss 2.3044 LearningRate 0.000356 Epoch: 18 Global Step: 384300 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:23:59,275-Speed 2509.96 samples/sec Loss 2.3007 LearningRate 0.000356 Epoch: 18 Global Step: 384310 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:24:07,475-Speed 2498.45 samples/sec Loss 2.2523 LearningRate 0.000356 Epoch: 18 Global Step: 384320 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:24:15,682-Speed 2495.53 samples/sec Loss 2.3180 LearningRate 0.000356 Epoch: 18 Global Step: 384330 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:24:23,886-Speed 2496.88 samples/sec Loss 2.2825 LearningRate 0.000356 Epoch: 18 Global Step: 384340 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:24:32,088-Speed 2497.33 samples/sec Loss 2.2676 LearningRate 0.000356 Epoch: 18 Global Step: 384350 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:24:40,301-Speed 2493.99 samples/sec Loss 2.2695 LearningRate 0.000356 Epoch: 18 Global Step: 384360 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:24:48,471-Speed 2507.06 samples/sec Loss 2.2845 LearningRate 0.000356 Epoch: 18 Global Step: 384370 Fp16 Grad 
Scale: 16384 Required: 102 hours Training: 2022-07-09 06:24:56,671-Speed 2498.06 samples/sec Loss 2.3419 LearningRate 0.000356 Epoch: 18 Global Step: 384380 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:25:04,871-Speed 2498.12 samples/sec Loss 2.2937 LearningRate 0.000356 Epoch: 18 Global Step: 384390 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:25:13,075-Speed 2496.61 samples/sec Loss 2.2970 LearningRate 0.000356 Epoch: 18 Global Step: 384400 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:25:21,280-Speed 2496.35 samples/sec Loss 2.2584 LearningRate 0.000355 Epoch: 18 Global Step: 384410 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:25:29,480-Speed 2498.24 samples/sec Loss 2.3741 LearningRate 0.000355 Epoch: 18 Global Step: 384420 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:25:37,629-Speed 2513.68 samples/sec Loss 2.3066 LearningRate 0.000355 Epoch: 18 Global Step: 384430 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:25:45,829-Speed 2497.78 samples/sec Loss 2.2699 LearningRate 0.000355 Epoch: 18 Global Step: 384440 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:25:54,031-Speed 2497.88 samples/sec Loss 2.2542 LearningRate 0.000355 Epoch: 18 Global Step: 384450 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:26:02,232-Speed 2497.72 samples/sec Loss 2.2866 LearningRate 0.000355 Epoch: 18 Global Step: 384460 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:26:10,435-Speed 2496.78 samples/sec Loss 2.2726 LearningRate 0.000355 Epoch: 18 Global Step: 384470 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:26:18,634-Speed 2498.56 samples/sec Loss 2.3233 LearningRate 0.000355 Epoch: 18 Global Step: 384480 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:26:26,778-Speed 2514.99 samples/sec Loss 2.3206 LearningRate 0.000355 Epoch: 18 Global Step: 384490 Fp16 
Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:26:34,977-Speed 2498.15 samples/sec Loss 2.2932 LearningRate 0.000355 Epoch: 18 Global Step: 384500 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:26:43,177-Speed 2498.03 samples/sec Loss 2.1913 LearningRate 0.000355 Epoch: 18 Global Step: 384510 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:26:51,393-Speed 2493.19 samples/sec Loss 2.2846 LearningRate 0.000355 Epoch: 18 Global Step: 384520 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:26:59,594-Speed 2497.66 samples/sec Loss 2.2977 LearningRate 0.000355 Epoch: 18 Global Step: 384530 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:27:07,799-Speed 2496.45 samples/sec Loss 2.2782 LearningRate 0.000355 Epoch: 18 Global Step: 384540 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:27:15,949-Speed 2513.33 samples/sec Loss 2.2496 LearningRate 0.000355 Epoch: 18 Global Step: 384550 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:27:24,149-Speed 2498.14 samples/sec Loss 2.2811 LearningRate 0.000355 Epoch: 18 Global Step: 384560 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:27:32,348-Speed 2498.18 samples/sec Loss 2.3356 LearningRate 0.000355 Epoch: 18 Global Step: 384570 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:27:40,548-Speed 2498.11 samples/sec Loss 2.2534 LearningRate 0.000355 Epoch: 18 Global Step: 384580 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:27:48,746-Speed 2498.49 samples/sec Loss 2.2725 LearningRate 0.000355 Epoch: 18 Global Step: 384590 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:27:56,961-Speed 2493.42 samples/sec Loss 2.2274 LearningRate 0.000355 Epoch: 18 Global Step: 384600 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:28:05,111-Speed 2513.20 samples/sec Loss 2.2703 LearningRate 0.000355 Epoch: 18 Global Step: 384610 
Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:28:13,316-Speed 2496.56 samples/sec Loss 2.2465 LearningRate 0.000355 Epoch: 18 Global Step: 384620 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:28:21,515-Speed 2498.08 samples/sec Loss 2.2719 LearningRate 0.000355 Epoch: 18 Global Step: 384630 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:28:29,717-Speed 2497.56 samples/sec Loss 2.3073 LearningRate 0.000355 Epoch: 18 Global Step: 384640 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:28:37,918-Speed 2497.61 samples/sec Loss 2.3830 LearningRate 0.000355 Epoch: 18 Global Step: 384650 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:28:46,133-Speed 2493.52 samples/sec Loss 2.2714 LearningRate 0.000355 Epoch: 18 Global Step: 384660 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:28:54,281-Speed 2514.14 samples/sec Loss 2.2898 LearningRate 0.000355 Epoch: 18 Global Step: 384670 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:29:02,490-Speed 2495.36 samples/sec Loss 2.3444 LearningRate 0.000355 Epoch: 18 Global Step: 384680 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:29:10,689-Speed 2498.23 samples/sec Loss 2.3369 LearningRate 0.000355 Epoch: 18 Global Step: 384690 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:29:18,914-Speed 2499.20 samples/sec Loss 2.2765 LearningRate 0.000355 Epoch: 18 Global Step: 384700 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:29:27,115-Speed 2497.46 samples/sec Loss 2.3431 LearningRate 0.000355 Epoch: 18 Global Step: 384710 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:29:35,367-Speed 2498.90 samples/sec Loss 2.2849 LearningRate 0.000355 Epoch: 18 Global Step: 384720 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:29:43,510-Speed 2515.28 samples/sec Loss 2.2875 LearningRate 0.000355 Epoch: 18 Global Step: 
384730 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:29:51,780-Speed 2498.78 samples/sec Loss 2.2672 LearningRate 0.000355 Epoch: 18 Global Step: 384740 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:30:02,886-Speed 2498.34 samples/sec Loss 2.2348 LearningRate 0.000355 Epoch: 18 Global Step: 384750 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:30:11,081-Speed 2499.43 samples/sec Loss 2.2586 LearningRate 0.000355 Epoch: 18 Global Step: 384760 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:30:23,680-Speed 1626.89 samples/sec Loss 2.2886 LearningRate 0.000355 Epoch: 18 Global Step: 384770 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:30:32,098-Speed 2499.78 samples/sec Loss 2.2374 LearningRate 0.000355 Epoch: 18 Global Step: 384780 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:30:40,244-Speed 2514.54 samples/sec Loss 2.2630 LearningRate 0.000355 Epoch: 18 Global Step: 384790 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:30:53,374-Speed 2494.95 samples/sec Loss 2.2622 LearningRate 0.000355 Epoch: 18 Global Step: 384800 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:31:01,587-Speed 2501.94 samples/sec Loss 2.2893 LearningRate 0.000355 Epoch: 18 Global Step: 384810 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:31:13,840-Speed 1671.55 samples/sec Loss 2.2938 LearningRate 0.000355 Epoch: 18 Global Step: 384820 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:31:22,031-Speed 2502.05 samples/sec Loss 2.2959 LearningRate 0.000355 Epoch: 18 Global Step: 384830 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:31:30,274-Speed 2500.69 samples/sec Loss 2.2575 LearningRate 0.000355 Epoch: 18 Global Step: 384840 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:31:38,419-Speed 2514.64 samples/sec Loss 2.3384 LearningRate 0.000355 Epoch: 18 Global 
Step: 384850 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:31:50,711-Speed 2489.27 samples/sec Loss 2.2417 LearningRate 0.000355 Epoch: 18 Global Step: 384860 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:31:59,384-Speed 2499.58 samples/sec Loss 2.3187 LearningRate 0.000355 Epoch: 18 Global Step: 384870 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:32:07,593-Speed 2495.19 samples/sec Loss 2.3028 LearningRate 0.000355 Epoch: 18 Global Step: 384880 Fp16 Grad Scale: 16384 Required: 102 hours Training: 2022-07-09 06:32:15,833-Speed 2498.66 samples/sec Loss 2.2682 LearningRate 0.000355 Epoch: 18 Global Step: 384890 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:32:26,817-Speed 1894.34 samples/sec Loss 2.3126 LearningRate 0.000355 Epoch: 18 Global Step: 384900 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:32:34,998-Speed 2518.14 samples/sec Loss 2.2996 LearningRate 0.000355 Epoch: 18 Global Step: 384910 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:32:43,197-Speed 2498.15 samples/sec Loss 2.2654 LearningRate 0.000355 Epoch: 18 Global Step: 384920 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:32:55,779-Speed 2499.34 samples/sec Loss 2.2697 LearningRate 0.000355 Epoch: 18 Global Step: 384930 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:33:07,394-Speed 1765.14 samples/sec Loss 2.2580 LearningRate 0.000355 Epoch: 18 Global Step: 384940 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:33:15,670-Speed 2474.89 samples/sec Loss 2.3001 LearningRate 0.000355 Epoch: 18 Global Step: 384950 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:33:23,903-Speed 2500.77 samples/sec Loss 2.2471 LearningRate 0.000355 Epoch: 18 Global Step: 384960 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:33:33,454-Speed 2185.83 samples/sec Loss 2.3250 LearningRate 0.000355 Epoch: 18 
Global Step: 384970 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:33:41,657-Speed 2497.18 samples/sec Loss 2.2856 LearningRate 0.000355 Epoch: 18 Global Step: 384980 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:33:49,854-Speed 2498.71 samples/sec Loss 2.2981 LearningRate 0.000355 Epoch: 18 Global Step: 384990 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:33:58,057-Speed 2497.21 samples/sec Loss 2.2909 LearningRate 0.000355 Epoch: 18 Global Step: 385000 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:34:06,258-Speed 2497.37 samples/sec Loss 2.2377 LearningRate 0.000355 Epoch: 18 Global Step: 385010 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:34:14,464-Speed 2496.13 samples/sec Loss 2.2917 LearningRate 0.000355 Epoch: 18 Global Step: 385020 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:34:22,614-Speed 2513.34 samples/sec Loss 2.2697 LearningRate 0.000355 Epoch: 18 Global Step: 385030 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:34:30,824-Speed 2495.06 samples/sec Loss 2.2541 LearningRate 0.000354 Epoch: 18 Global Step: 385040 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:34:39,031-Speed 2495.55 samples/sec Loss 2.2454 LearningRate 0.000354 Epoch: 18 Global Step: 385050 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:34:47,240-Speed 2495.50 samples/sec Loss 2.2651 LearningRate 0.000354 Epoch: 18 Global Step: 385060 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:34:55,441-Speed 2497.60 samples/sec Loss 2.2520 LearningRate 0.000354 Epoch: 18 Global Step: 385070 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:35:03,646-Speed 2496.13 samples/sec Loss 2.3160 LearningRate 0.000354 Epoch: 18 Global Step: 385080 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:35:11,794-Speed 2513.90 samples/sec Loss 2.2193 LearningRate 0.000354 
Epoch: 18 Global Step: 385090 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:35:19,995-Speed 2497.78 samples/sec Loss 2.2106 LearningRate 0.000354 Epoch: 18 Global Step: 385100 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:35:28,201-Speed 2495.83 samples/sec Loss 2.2431 LearningRate 0.000354 Epoch: 18 Global Step: 385110 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:35:36,405-Speed 2497.42 samples/sec Loss 2.2459 LearningRate 0.000354 Epoch: 18 Global Step: 385120 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:35:44,622-Speed 2492.73 samples/sec Loss 2.2550 LearningRate 0.000354 Epoch: 18 Global Step: 385130 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:35:52,826-Speed 2496.66 samples/sec Loss 2.2335 LearningRate 0.000354 Epoch: 18 Global Step: 385140 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:36:00,977-Speed 2513.10 samples/sec Loss 2.2097 LearningRate 0.000354 Epoch: 18 Global Step: 385150 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:36:09,187-Speed 2494.91 samples/sec Loss 2.2998 LearningRate 0.000354 Epoch: 18 Global Step: 385160 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:36:17,389-Speed 2497.48 samples/sec Loss 2.2312 LearningRate 0.000354 Epoch: 18 Global Step: 385170 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:36:25,602-Speed 2494.11 samples/sec Loss 2.2568 LearningRate 0.000354 Epoch: 18 Global Step: 385180 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:36:33,812-Speed 2494.87 samples/sec Loss 2.2306 LearningRate 0.000354 Epoch: 18 Global Step: 385190 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:36:42,017-Speed 2496.22 samples/sec Loss 2.2693 LearningRate 0.000354 Epoch: 18 Global Step: 385200 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:36:50,171-Speed 2512.08 samples/sec Loss 2.2229 LearningRate 
0.000354 Epoch: 18 Global Step: 385210 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:36:58,372-Speed 2497.78 samples/sec Loss 2.2474 LearningRate 0.000354 Epoch: 18 Global Step: 385220 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:37:06,578-Speed 2496.25 samples/sec Loss 2.2519 LearningRate 0.000354 Epoch: 18 Global Step: 385230 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:37:14,782-Speed 2496.97 samples/sec Loss 2.2458 LearningRate 0.000354 Epoch: 18 Global Step: 385240 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:37:22,990-Speed 2495.67 samples/sec Loss 2.2342 LearningRate 0.000354 Epoch: 18 Global Step: 385250 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:37:31,198-Speed 2495.34 samples/sec Loss 2.2544 LearningRate 0.000354 Epoch: 18 Global Step: 385260 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:37:39,346-Speed 2513.71 samples/sec Loss 2.2201 LearningRate 0.000354 Epoch: 18 Global Step: 385270 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:37:47,546-Speed 2498.25 samples/sec Loss 2.2490 LearningRate 0.000354 Epoch: 18 Global Step: 385280 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:37:55,743-Speed 2498.64 samples/sec Loss 2.2709 LearningRate 0.000354 Epoch: 18 Global Step: 385290 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:38:03,947-Speed 2496.93 samples/sec Loss 2.2739 LearningRate 0.000354 Epoch: 18 Global Step: 385300 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:38:12,151-Speed 2496.72 samples/sec Loss 2.2879 LearningRate 0.000354 Epoch: 18 Global Step: 385310 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:38:20,359-Speed 2495.80 samples/sec Loss 2.2857 LearningRate 0.000354 Epoch: 18 Global Step: 385320 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:38:28,520-Speed 2509.79 samples/sec Loss 2.2388 
LearningRate 0.000354 Epoch: 18 Global Step: 385330 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:38:36,726-Speed 2496.06 samples/sec Loss 2.2511 LearningRate 0.000354 Epoch: 18 Global Step: 385340 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:38:44,925-Speed 2498.47 samples/sec Loss 2.2703 LearningRate 0.000354 Epoch: 18 Global Step: 385350 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:38:53,125-Speed 2498.16 samples/sec Loss 2.2767 LearningRate 0.000354 Epoch: 18 Global Step: 385360 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:39:01,329-Speed 2496.59 samples/sec Loss 2.2850 LearningRate 0.000354 Epoch: 18 Global Step: 385370 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:39:09,535-Speed 2496.21 samples/sec Loss 2.2632 LearningRate 0.000354 Epoch: 18 Global Step: 385380 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:39:17,697-Speed 2509.75 samples/sec Loss 2.2994 LearningRate 0.000354 Epoch: 18 Global Step: 385390 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:39:25,900-Speed 2497.15 samples/sec Loss 2.2538 LearningRate 0.000354 Epoch: 18 Global Step: 385400 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:39:34,106-Speed 2495.93 samples/sec Loss 2.2571 LearningRate 0.000354 Epoch: 18 Global Step: 385410 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:39:42,309-Speed 2496.97 samples/sec Loss 2.3073 LearningRate 0.000354 Epoch: 18 Global Step: 385420 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:39:50,510-Speed 2497.74 samples/sec Loss 2.2804 LearningRate 0.000354 Epoch: 18 Global Step: 385430 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:39:58,722-Speed 2494.45 samples/sec Loss 2.2873 LearningRate 0.000354 Epoch: 18 Global Step: 385440 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:40:06,879-Speed 2510.95 samples/sec Loss 
2.2661 LearningRate 0.000354 Epoch: 18 Global Step: 385450 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:40:15,079-Speed 2498.00 samples/sec Loss 2.2943 LearningRate 0.000354 Epoch: 18 Global Step: 385460 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:40:23,279-Speed 2498.15 samples/sec Loss 2.2791 LearningRate 0.000354 Epoch: 18 Global Step: 385470 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:40:31,477-Speed 2498.56 samples/sec Loss 2.3329 LearningRate 0.000354 Epoch: 18 Global Step: 385480 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:40:39,683-Speed 2496.44 samples/sec Loss 2.3082 LearningRate 0.000354 Epoch: 18 Global Step: 385490 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:40:47,889-Speed 2496.04 samples/sec Loss 2.3518 LearningRate 0.000354 Epoch: 18 Global Step: 385500 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:40:56,045-Speed 2511.33 samples/sec Loss 2.2391 LearningRate 0.000354 Epoch: 18 Global Step: 385510 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:41:04,245-Speed 2498.21 samples/sec Loss 2.2697 LearningRate 0.000354 Epoch: 18 Global Step: 385520 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:41:12,450-Speed 2496.27 samples/sec Loss 2.3077 LearningRate 0.000354 Epoch: 18 Global Step: 385530 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:41:20,655-Speed 2496.58 samples/sec Loss 2.3043 LearningRate 0.000354 Epoch: 18 Global Step: 385540 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:41:28,855-Speed 2498.02 samples/sec Loss 2.2520 LearningRate 0.000354 Epoch: 18 Global Step: 385550 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:41:37,059-Speed 2496.64 samples/sec Loss 2.3529 LearningRate 0.000354 Epoch: 18 Global Step: 385560 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:41:45,210-Speed 2513.15 samples/sec 
Loss 2.2661 LearningRate 0.000354 Epoch: 18 Global Step: 385570 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:41:53,410-Speed 2497.72 samples/sec Loss 2.2985 LearningRate 0.000354 Epoch: 18 Global Step: 385580 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:42:01,619-Speed 2495.19 samples/sec Loss 2.2569 LearningRate 0.000354 Epoch: 18 Global Step: 385590 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:42:09,820-Speed 2497.70 samples/sec Loss 2.2774 LearningRate 0.000354 Epoch: 18 Global Step: 385600 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:42:18,038-Speed 2492.36 samples/sec Loss 2.3025 LearningRate 0.000354 Epoch: 18 Global Step: 385610 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:42:26,249-Speed 2494.96 samples/sec Loss 2.2421 LearningRate 0.000354 Epoch: 18 Global Step: 385620 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:42:34,398-Speed 2513.99 samples/sec Loss 2.3133 LearningRate 0.000354 Epoch: 18 Global Step: 385630 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:42:42,605-Speed 2495.46 samples/sec Loss 2.2907 LearningRate 0.000354 Epoch: 18 Global Step: 385640 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:42:50,807-Speed 2497.58 samples/sec Loss 2.3063 LearningRate 0.000354 Epoch: 18 Global Step: 385650 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:42:59,010-Speed 2497.13 samples/sec Loss 2.2796 LearningRate 0.000354 Epoch: 18 Global Step: 385660 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:43:07,214-Speed 2496.66 samples/sec Loss 2.3120 LearningRate 0.000353 Epoch: 18 Global Step: 385670 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:43:15,418-Speed 2496.81 samples/sec Loss 2.3073 LearningRate 0.000353 Epoch: 18 Global Step: 385680 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:43:23,588-Speed 2507.23 
samples/sec Loss 2.2904 LearningRate 0.000353 Epoch: 18 Global Step: 385690 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:43:31,791-Speed 2496.97 samples/sec Loss 2.2426 LearningRate 0.000353 Epoch: 18 Global Step: 385700 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:43:39,991-Speed 2497.90 samples/sec Loss 2.2945 LearningRate 0.000353 Epoch: 18 Global Step: 385710 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:43:48,194-Speed 2496.98 samples/sec Loss 2.1959 LearningRate 0.000353 Epoch: 18 Global Step: 385720 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:43:56,398-Speed 2496.78 samples/sec Loss 2.1934 LearningRate 0.000353 Epoch: 18 Global Step: 385730 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:44:04,604-Speed 2496.31 samples/sec Loss 2.2223 LearningRate 0.000353 Epoch: 18 Global Step: 385740 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:44:12,751-Speed 2514.27 samples/sec Loss 2.2524 LearningRate 0.000353 Epoch: 18 Global Step: 385750 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:44:20,950-Speed 2498.04 samples/sec Loss 2.2649 LearningRate 0.000353 Epoch: 18 Global Step: 385760 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:44:29,152-Speed 2497.47 samples/sec Loss 2.3022 LearningRate 0.000353 Epoch: 18 Global Step: 385770 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:44:37,353-Speed 2497.77 samples/sec Loss 2.2416 LearningRate 0.000353 Epoch: 18 Global Step: 385780 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:44:45,568-Speed 2493.29 samples/sec Loss 2.2689 LearningRate 0.000353 Epoch: 18 Global Step: 385790 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:44:53,769-Speed 2497.67 samples/sec Loss 2.2808 LearningRate 0.000353 Epoch: 18 Global Step: 385800 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:45:01,912-Speed 
2515.57 samples/sec Loss 2.2875 LearningRate 0.000353 Epoch: 18 Global Step: 385810 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:45:10,112-Speed 2498.07 samples/sec Loss 2.2651 LearningRate 0.000353 Epoch: 18 Global Step: 385820 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:45:18,317-Speed 2496.35 samples/sec Loss 2.2297 LearningRate 0.000353 Epoch: 18 Global Step: 385830 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:45:26,516-Speed 2498.19 samples/sec Loss 2.2759 LearningRate 0.000353 Epoch: 18 Global Step: 385840 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:45:34,717-Speed 2497.67 samples/sec Loss 2.2417 LearningRate 0.000353 Epoch: 18 Global Step: 385850 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:45:42,920-Speed 2496.99 samples/sec Loss 2.2573 LearningRate 0.000353 Epoch: 18 Global Step: 385860 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:45:51,080-Speed 2510.14 samples/sec Loss 2.2416 LearningRate 0.000353 Epoch: 18 Global Step: 385870 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:45:59,286-Speed 2496.43 samples/sec Loss 2.2250 LearningRate 0.000353 Epoch: 18 Global Step: 385880 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:46:07,487-Speed 2497.67 samples/sec Loss 2.3081 LearningRate 0.000353 Epoch: 18 Global Step: 385890 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:46:15,687-Speed 2497.84 samples/sec Loss 2.3188 LearningRate 0.000353 Epoch: 18 Global Step: 385900 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:46:23,890-Speed 2496.86 samples/sec Loss 2.2572 LearningRate 0.000353 Epoch: 18 Global Step: 385910 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:46:32,101-Speed 2494.75 samples/sec Loss 2.3115 LearningRate 0.000353 Epoch: 18 Global Step: 385920 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 
06:46:40,252-Speed 2513.13 samples/sec Loss 2.3093 LearningRate 0.000353 Epoch: 18 Global Step: 385930 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:46:48,452-Speed 2497.90 samples/sec Loss 2.2887 LearningRate 0.000353 Epoch: 18 Global Step: 385940 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:46:56,651-Speed 2498.22 samples/sec Loss 2.3126 LearningRate 0.000353 Epoch: 18 Global Step: 385950 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:47:04,854-Speed 2497.07 samples/sec Loss 2.2741 LearningRate 0.000353 Epoch: 18 Global Step: 385960 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:47:13,054-Speed 2498.19 samples/sec Loss 2.2354 LearningRate 0.000353 Epoch: 18 Global Step: 385970 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:47:21,255-Speed 2497.64 samples/sec Loss 2.2957 LearningRate 0.000353 Epoch: 18 Global Step: 385980 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:47:29,401-Speed 2514.51 samples/sec Loss 2.2559 LearningRate 0.000353 Epoch: 18 Global Step: 385990 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:47:37,607-Speed 2496.19 samples/sec Loss 2.2720 LearningRate 0.000353 Epoch: 18 Global Step: 386000 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:47:45,813-Speed 2496.24 samples/sec Loss 2.2799 LearningRate 0.000353 Epoch: 18 Global Step: 386010 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:47:54,013-Speed 2497.89 samples/sec Loss 2.2776 LearningRate 0.000353 Epoch: 18 Global Step: 386020 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:48:02,215-Speed 2498.00 samples/sec Loss 2.3085 LearningRate 0.000353 Epoch: 18 Global Step: 386030 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:48:10,417-Speed 2497.05 samples/sec Loss 2.2845 LearningRate 0.000353 Epoch: 18 Global Step: 386040 Fp16 Grad Scale: 32768 Required: 102 hours Training: 
2022-07-09 06:48:18,570-Speed 2512.54 samples/sec Loss 2.2610 LearningRate 0.000353 Epoch: 18 Global Step: 386050 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:48:26,772-Speed 2497.45 samples/sec Loss 2.2986 LearningRate 0.000353 Epoch: 18 Global Step: 386060 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:48:34,976-Speed 2496.45 samples/sec Loss 2.3157 LearningRate 0.000353 Epoch: 18 Global Step: 386070 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:48:43,174-Speed 2498.78 samples/sec Loss 2.2412 LearningRate 0.000353 Epoch: 18 Global Step: 386080 Fp16 Grad Scale: 32768 Required: 102 hours Training: 2022-07-09 06:48:51,372-Speed 2498.74 samples/sec Loss 2.3101 LearningRate 0.000353 Epoch: 18 Global Step: 386090 Fp16 Grad Scale: 65536 Required: 102 hours Training: 2022-07-09 06:48:59,580-Speed 2495.42 samples/sec Loss 2.2607 LearningRate 0.000353 Epoch: 18 Global Step: 386100 Fp16 Grad Scale: 65536 Required: 102 hours Training: 2022-07-09 06:49:07,727-Speed 2514.30 samples/sec Loss 2.2674 LearningRate 0.000353 Epoch: 18 Global Step: 386110 Fp16 Grad Scale: 65536 Required: 102 hours Training: 2022-07-09 06:49:15,933-Speed 2496.16 samples/sec Loss 2.2925 LearningRate 0.000353 Epoch: 18 Global Step: 386120 Fp16 Grad Scale: 65536 Required: 101 hours Training: 2022-07-09 06:49:24,135-Speed 2497.30 samples/sec Loss 2.3389 LearningRate 0.000353 Epoch: 18 Global Step: 386130 Fp16 Grad Scale: 65536 Required: 101 hours Training: 2022-07-09 06:49:32,350-Speed 2493.25 samples/sec Loss 2.2723 LearningRate 0.000353 Epoch: 18 Global Step: 386140 Fp16 Grad Scale: 65536 Required: 101 hours Training: 2022-07-09 06:49:40,552-Speed 2497.49 samples/sec Loss 2.2493 LearningRate 0.000353 Epoch: 18 Global Step: 386150 Fp16 Grad Scale: 65536 Required: 101 hours Training: 2022-07-09 06:49:48,758-Speed 2496.03 samples/sec Loss 2.2974 LearningRate 0.000353 Epoch: 18 Global Step: 386160 Fp16 Grad Scale: 65536 Required: 101 hours 
Training: 2022-07-09 06:49:56,904-Speed 2514.64 samples/sec Loss 2.2904 LearningRate 0.000353 Epoch: 18 Global Step: 386170 Fp16 Grad Scale: 65536 Required: 101 hours Training: 2022-07-09 06:50:05,111-Speed 2495.86 samples/sec Loss 2.3386 LearningRate 0.000353 Epoch: 18 Global Step: 386180 Fp16 Grad Scale: 65536 Required: 101 hours Training: 2022-07-09 06:50:13,314-Speed 2497.21 samples/sec Loss 2.2771 LearningRate 0.000353 Epoch: 18 Global Step: 386190 Fp16 Grad Scale: 65536 Required: 101 hours Training: 2022-07-09 06:50:21,520-Speed 2496.13 samples/sec Loss 2.3106 LearningRate 0.000353 Epoch: 18 Global Step: 386200 Fp16 Grad Scale: 65536 Required: 101 hours Training: 2022-07-09 06:50:29,718-Speed 2498.50 samples/sec Loss 2.3338 LearningRate 0.000353 Epoch: 18 Global Step: 386210 Fp16 Grad Scale: 65536 Required: 101 hours Training: 2022-07-09 06:50:37,875-Speed 2510.83 samples/sec Loss 2.2812 LearningRate 0.000353 Epoch: 18 Global Step: 386220 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:50:46,019-Speed 2515.33 samples/sec Loss 2.2823 LearningRate 0.000353 Epoch: 18 Global Step: 386230 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:50:54,217-Speed 2498.56 samples/sec Loss 2.2672 LearningRate 0.000353 Epoch: 18 Global Step: 386240 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:51:02,429-Speed 2494.32 samples/sec Loss 2.3334 LearningRate 0.000353 Epoch: 18 Global Step: 386250 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:51:10,630-Speed 2498.00 samples/sec Loss 2.2845 LearningRate 0.000353 Epoch: 18 Global Step: 386260 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:51:18,827-Speed 2498.77 samples/sec Loss 2.2812 LearningRate 0.000353 Epoch: 18 Global Step: 386270 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:51:27,028-Speed 2497.85 samples/sec Loss 2.2961 LearningRate 0.000353 Epoch: 18 Global Step: 386280 Fp16 Grad Scale: 32768 Required: 101 
hours Training: 2022-07-09 06:51:35,174-Speed 2514.24 samples/sec Loss 2.2911 LearningRate 0.000352 Epoch: 18 Global Step: 386290 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:51:43,374-Speed 2497.87 samples/sec Loss 2.2969 LearningRate 0.000352 Epoch: 18 Global Step: 386300 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:51:51,574-Speed 2497.89 samples/sec Loss 2.3196 LearningRate 0.000352 Epoch: 18 Global Step: 386310 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:51:59,774-Speed 2498.06 samples/sec Loss 2.2930 LearningRate 0.000352 Epoch: 18 Global Step: 386320 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:52:07,977-Speed 2497.02 samples/sec Loss 2.3139 LearningRate 0.000352 Epoch: 18 Global Step: 386330 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:52:16,179-Speed 2497.32 samples/sec Loss 2.3101 LearningRate 0.000352 Epoch: 18 Global Step: 386340 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:52:24,322-Speed 2515.40 samples/sec Loss 2.2645 LearningRate 0.000352 Epoch: 18 Global Step: 386350 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:52:32,527-Speed 2496.50 samples/sec Loss 2.2763 LearningRate 0.000352 Epoch: 18 Global Step: 386360 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:52:40,728-Speed 2497.76 samples/sec Loss 2.2730 LearningRate 0.000352 Epoch: 18 Global Step: 386370 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:52:48,932-Speed 2496.81 samples/sec Loss 2.2444 LearningRate 0.000352 Epoch: 18 Global Step: 386380 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:52:57,131-Speed 2498.08 samples/sec Loss 2.2706 LearningRate 0.000352 Epoch: 18 Global Step: 386390 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:53:05,335-Speed 2497.21 samples/sec Loss 2.3093 LearningRate 0.000352 Epoch: 18 Global Step: 386400 Fp16 Grad Scale: 32768 Required: 
101 hours Training: 2022-07-09 06:53:13,483-Speed 2513.97 samples/sec Loss 2.2639 LearningRate 0.000352 Epoch: 18 Global Step: 386410 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:53:21,683-Speed 2497.84 samples/sec Loss 2.2824 LearningRate 0.000352 Epoch: 18 Global Step: 386420 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:53:29,880-Speed 2498.93 samples/sec Loss 2.2422 LearningRate 0.000352 Epoch: 18 Global Step: 386430 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:53:38,079-Speed 2498.58 samples/sec Loss 2.2622 LearningRate 0.000352 Epoch: 18 Global Step: 386440 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:53:46,282-Speed 2497.11 samples/sec Loss 2.2742 LearningRate 0.000352 Epoch: 18 Global Step: 386450 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:53:54,484-Speed 2497.47 samples/sec Loss 2.2590 LearningRate 0.000352 Epoch: 18 Global Step: 386460 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:54:02,632-Speed 2513.73 samples/sec Loss 2.2905 LearningRate 0.000352 Epoch: 18 Global Step: 386470 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:54:10,830-Speed 2499.51 samples/sec Loss 2.2319 LearningRate 0.000352 Epoch: 18 Global Step: 386480 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:54:19,029-Speed 2498.13 samples/sec Loss 2.2747 LearningRate 0.000352 Epoch: 18 Global Step: 386490 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:54:27,232-Speed 2496.79 samples/sec Loss 2.2959 LearningRate 0.000352 Epoch: 18 Global Step: 386500 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:54:35,435-Speed 2497.16 samples/sec Loss 2.2904 LearningRate 0.000352 Epoch: 18 Global Step: 386510 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:54:43,635-Speed 2498.18 samples/sec Loss 2.2621 LearningRate 0.000352 Epoch: 18 Global Step: 386520 Fp16 Grad Scale: 32768 
Required: 101 hours Training: 2022-07-09 06:54:51,785-Speed 2513.33 samples/sec Loss 2.3027 LearningRate 0.000352 Epoch: 18 Global Step: 386530 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:54:59,988-Speed 2496.84 samples/sec Loss 2.2737 LearningRate 0.000352 Epoch: 18 Global Step: 386540 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:55:08,197-Speed 2495.25 samples/sec Loss 2.3024 LearningRate 0.000352 Epoch: 18 Global Step: 386550 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:55:16,397-Speed 2497.92 samples/sec Loss 2.2763 LearningRate 0.000352 Epoch: 18 Global Step: 386560 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:55:24,597-Speed 2498.02 samples/sec Loss 2.2728 LearningRate 0.000352 Epoch: 18 Global Step: 386570 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:55:32,813-Speed 2493.16 samples/sec Loss 2.3093 LearningRate 0.000352 Epoch: 18 Global Step: 386580 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:55:40,960-Speed 2514.07 samples/sec Loss 2.2535 LearningRate 0.000352 Epoch: 18 Global Step: 386590 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:55:49,170-Speed 2494.72 samples/sec Loss 2.2642 LearningRate 0.000352 Epoch: 18 Global Step: 386600 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:55:57,377-Speed 2495.99 samples/sec Loss 2.2918 LearningRate 0.000352 Epoch: 18 Global Step: 386610 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:56:05,583-Speed 2496.22 samples/sec Loss 2.3112 LearningRate 0.000352 Epoch: 18 Global Step: 386620 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:56:13,790-Speed 2495.95 samples/sec Loss 2.2959 LearningRate 0.000352 Epoch: 18 Global Step: 386630 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:56:21,996-Speed 2496.00 samples/sec Loss 2.2649 LearningRate 0.000352 Epoch: 18 Global Step: 386640 Fp16 Grad Scale: 
32768 Required: 101 hours Training: 2022-07-09 06:56:30,148-Speed 2512.55 samples/sec Loss 2.3005 LearningRate 0.000352 Epoch: 18 Global Step: 386650 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:56:38,352-Speed 2496.87 samples/sec Loss 2.3293 LearningRate 0.000352 Epoch: 18 Global Step: 386660 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:56:46,554-Speed 2497.72 samples/sec Loss 2.3014 LearningRate 0.000352 Epoch: 18 Global Step: 386670 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:56:54,761-Speed 2495.75 samples/sec Loss 2.3434 LearningRate 0.000352 Epoch: 18 Global Step: 386680 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:57:02,966-Speed 2496.33 samples/sec Loss 2.3131 LearningRate 0.000352 Epoch: 18 Global Step: 386690 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:57:11,179-Speed 2493.94 samples/sec Loss 2.2679 LearningRate 0.000352 Epoch: 18 Global Step: 386700 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:57:19,327-Speed 2513.75 samples/sec Loss 2.2731 LearningRate 0.000352 Epoch: 18 Global Step: 386710 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:57:27,529-Speed 2497.48 samples/sec Loss 2.3134 LearningRate 0.000352 Epoch: 18 Global Step: 386720 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:57:35,728-Speed 2498.25 samples/sec Loss 2.3428 LearningRate 0.000352 Epoch: 18 Global Step: 386730 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:57:43,932-Speed 2496.54 samples/sec Loss 2.2944 LearningRate 0.000352 Epoch: 18 Global Step: 386740 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:57:52,139-Speed 2496.13 samples/sec Loss 2.2964 LearningRate 0.000352 Epoch: 18 Global Step: 386750 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:58:00,357-Speed 2492.19 samples/sec Loss 2.2845 LearningRate 0.000352 Epoch: 18 Global Step: 386760 Fp16 Grad 
Scale: 32768 Required: 101 hours Training: 2022-07-09 06:58:08,518-Speed 2510.18 samples/sec Loss 2.2704 LearningRate 0.000352 Epoch: 18 Global Step: 386770 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:58:16,718-Speed 2497.77 samples/sec Loss 2.2919 LearningRate 0.000352 Epoch: 18 Global Step: 386780 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:58:24,934-Speed 2493.24 samples/sec Loss 2.2963 LearningRate 0.000352 Epoch: 18 Global Step: 386790 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:58:33,136-Speed 2497.45 samples/sec Loss 2.2822 LearningRate 0.000352 Epoch: 18 Global Step: 386800 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:58:41,339-Speed 2496.83 samples/sec Loss 2.2429 LearningRate 0.000352 Epoch: 18 Global Step: 386810 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:58:49,540-Speed 2497.94 samples/sec Loss 2.2916 LearningRate 0.000352 Epoch: 18 Global Step: 386820 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:58:57,685-Speed 2514.85 samples/sec Loss 2.2412 LearningRate 0.000352 Epoch: 18 Global Step: 386830 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:59:05,889-Speed 2497.16 samples/sec Loss 2.2571 LearningRate 0.000352 Epoch: 18 Global Step: 386840 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:59:14,090-Speed 2497.68 samples/sec Loss 2.2808 LearningRate 0.000352 Epoch: 18 Global Step: 386850 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:59:22,306-Speed 2493.28 samples/sec Loss 2.2232 LearningRate 0.000352 Epoch: 18 Global Step: 386860 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:59:30,510-Speed 2496.56 samples/sec Loss 2.2615 LearningRate 0.000352 Epoch: 18 Global Step: 386870 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:59:38,717-Speed 2495.64 samples/sec Loss 2.2240 LearningRate 0.000352 Epoch: 18 Global Step: 386880 Fp16 
Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:59:46,871-Speed 2512.13 samples/sec Loss 2.2879 LearningRate 0.000352 Epoch: 18 Global Step: 386890 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 06:59:55,080-Speed 2495.27 samples/sec Loss 2.2120 LearningRate 0.000352 Epoch: 18 Global Step: 386900 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:00:03,281-Speed 2497.66 samples/sec Loss 2.2461 LearningRate 0.000352 Epoch: 18 Global Step: 386910 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:00:11,485-Speed 2496.86 samples/sec Loss 2.2725 LearningRate 0.000351 Epoch: 18 Global Step: 386920 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:00:19,687-Speed 2497.66 samples/sec Loss 2.2718 LearningRate 0.000351 Epoch: 18 Global Step: 386930 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:00:27,887-Speed 2497.73 samples/sec Loss 2.2670 LearningRate 0.000351 Epoch: 18 Global Step: 386940 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:00:36,037-Speed 2513.50 samples/sec Loss 2.2067 LearningRate 0.000351 Epoch: 18 Global Step: 386950 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:00:44,237-Speed 2497.98 samples/sec Loss 2.2409 LearningRate 0.000351 Epoch: 18 Global Step: 386960 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:00:52,438-Speed 2497.52 samples/sec Loss 2.2804 LearningRate 0.000351 Epoch: 18 Global Step: 386970 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:01:00,638-Speed 2497.97 samples/sec Loss 2.2541 LearningRate 0.000351 Epoch: 18 Global Step: 386980 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:01:08,835-Speed 2498.95 samples/sec Loss 2.3129 LearningRate 0.000351 Epoch: 18 Global Step: 386990 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:01:17,036-Speed 2497.86 samples/sec Loss 2.3077 LearningRate 0.000351 Epoch: 18 Global Step: 387000 
Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:01:25,183-Speed 2514.08 samples/sec Loss 2.2782 LearningRate 0.000351 Epoch: 18 Global Step: 387010 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:01:33,339-Speed 2511.51 samples/sec Loss 2.2329 LearningRate 0.000351 Epoch: 18 Global Step: 387020 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:01:41,538-Speed 2498.14 samples/sec Loss 2.2548 LearningRate 0.000351 Epoch: 18 Global Step: 387030 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:01:49,739-Speed 2497.87 samples/sec Loss 2.2835 LearningRate 0.000351 Epoch: 18 Global Step: 387040 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:01:57,938-Speed 2498.45 samples/sec Loss 2.2707 LearningRate 0.000351 Epoch: 18 Global Step: 387050 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:02:06,137-Speed 2498.15 samples/sec Loss 2.2465 LearningRate 0.000351 Epoch: 18 Global Step: 387060 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:02:14,287-Speed 2513.20 samples/sec Loss 2.2866 LearningRate 0.000351 Epoch: 18 Global Step: 387070 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:02:22,494-Speed 2495.86 samples/sec Loss 2.2576 LearningRate 0.000351 Epoch: 18 Global Step: 387080 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:02:30,708-Speed 2493.66 samples/sec Loss 2.2948 LearningRate 0.000351 Epoch: 18 Global Step: 387090 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:02:38,908-Speed 2498.07 samples/sec Loss 2.3113 LearningRate 0.000351 Epoch: 18 Global Step: 387100 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:02:47,108-Speed 2497.77 samples/sec Loss 2.2451 LearningRate 0.000351 Epoch: 18 Global Step: 387110 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:02:55,313-Speed 2496.82 samples/sec Loss 2.2405 LearningRate 0.000351 Epoch: 18 Global Step: 
387120 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:03:03,473-Speed 2510.02 samples/sec Loss 2.2200 LearningRate 0.000351 Epoch: 18 Global Step: 387130 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:03:11,682-Speed 2495.38 samples/sec Loss 2.2921 LearningRate 0.000351 Epoch: 18 Global Step: 387140 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:03:19,890-Speed 2495.43 samples/sec Loss 2.3034 LearningRate 0.000351 Epoch: 18 Global Step: 387150 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:03:28,104-Speed 2493.75 samples/sec Loss 2.2883 LearningRate 0.000351 Epoch: 18 Global Step: 387160 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:03:36,308-Speed 2496.82 samples/sec Loss 2.2838 LearningRate 0.000351 Epoch: 18 Global Step: 387170 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:03:44,511-Speed 2497.06 samples/sec Loss 2.2996 LearningRate 0.000351 Epoch: 18 Global Step: 387180 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:03:52,661-Speed 2513.32 samples/sec Loss 2.3002 LearningRate 0.000351 Epoch: 18 Global Step: 387190 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:04:00,861-Speed 2498.06 samples/sec Loss 2.2677 LearningRate 0.000351 Epoch: 18 Global Step: 387200 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:04:09,066-Speed 2496.28 samples/sec Loss 2.2513 LearningRate 0.000351 Epoch: 18 Global Step: 387210 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:04:17,266-Speed 2498.17 samples/sec Loss 2.2512 LearningRate 0.000351 Epoch: 18 Global Step: 387220 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:04:25,468-Speed 2497.26 samples/sec Loss 2.2839 LearningRate 0.000351 Epoch: 18 Global Step: 387230 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:04:33,672-Speed 2496.66 samples/sec Loss 2.2838 LearningRate 0.000351 Epoch: 18 Global 
Step: 387240 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:04:41,824-Speed 2512.64 samples/sec Loss 2.2249 LearningRate 0.000351 Epoch: 18 Global Step: 387250 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:04:50,023-Speed 2498.13 samples/sec Loss 2.2477 LearningRate 0.000351 Epoch: 18 Global Step: 387260 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:04:58,228-Speed 2496.58 samples/sec Loss 2.2577 LearningRate 0.000351 Epoch: 18 Global Step: 387270 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:05:06,428-Speed 2498.00 samples/sec Loss 2.2756 LearningRate 0.000351 Epoch: 18 Global Step: 387280 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:05:14,626-Speed 2498.48 samples/sec Loss 2.2811 LearningRate 0.000351 Epoch: 18 Global Step: 387290 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:05:22,846-Speed 2491.80 samples/sec Loss 2.2873 LearningRate 0.000351 Epoch: 18 Global Step: 387300 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:05:30,996-Speed 2513.65 samples/sec Loss 2.2085 LearningRate 0.000351 Epoch: 18 Global Step: 387310 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:05:39,199-Speed 2496.92 samples/sec Loss 2.2678 LearningRate 0.000351 Epoch: 18 Global Step: 387320 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:05:47,400-Speed 2497.78 samples/sec Loss 2.2379 LearningRate 0.000351 Epoch: 18 Global Step: 387330 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:05:55,604-Speed 2496.84 samples/sec Loss 2.2590 LearningRate 0.000351 Epoch: 18 Global Step: 387340 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:06:03,802-Speed 2498.63 samples/sec Loss 2.2209 LearningRate 0.000351 Epoch: 18 Global Step: 387350 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:06:12,004-Speed 2497.42 samples/sec Loss 2.2517 LearningRate 0.000351 Epoch: 18 
Global Step: 387360 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:06:20,152-Speed 2513.83 samples/sec Loss 2.2494 LearningRate 0.000351 Epoch: 18 Global Step: 387370 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:06:28,354-Speed 2497.23 samples/sec Loss 2.2289 LearningRate 0.000351 Epoch: 18 Global Step: 387380 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:06:36,555-Speed 2497.79 samples/sec Loss 2.2360 LearningRate 0.000351 Epoch: 18 Global Step: 387390 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:06:44,764-Speed 2495.11 samples/sec Loss 2.2535 LearningRate 0.000351 Epoch: 18 Global Step: 387400 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:06:52,964-Speed 2497.93 samples/sec Loss 2.2468 LearningRate 0.000351 Epoch: 18 Global Step: 387410 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:07:01,173-Speed 2495.31 samples/sec Loss 2.2508 LearningRate 0.000351 Epoch: 18 Global Step: 387420 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:07:09,337-Speed 2509.12 samples/sec Loss 2.2606 LearningRate 0.000351 Epoch: 18 Global Step: 387430 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:07:17,540-Speed 2497.24 samples/sec Loss 2.2616 LearningRate 0.000351 Epoch: 18 Global Step: 387440 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:07:25,738-Speed 2498.55 samples/sec Loss 2.2805 LearningRate 0.000351 Epoch: 18 Global Step: 387450 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:07:33,940-Speed 2497.30 samples/sec Loss 2.2865 LearningRate 0.000351 Epoch: 18 Global Step: 387460 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:07:42,145-Speed 2496.34 samples/sec Loss 2.3090 LearningRate 0.000351 Epoch: 18 Global Step: 387470 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:07:50,342-Speed 2498.78 samples/sec Loss 2.2991 LearningRate 0.000351 
Epoch: 18 Global Step: 387480 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:07:58,488-Speed 2514.68 samples/sec Loss 2.2314 LearningRate 0.000351 Epoch: 18 Global Step: 387490 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:08:06,685-Speed 2498.89 samples/sec Loss 2.2741 LearningRate 0.000351 Epoch: 18 Global Step: 387500 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:08:14,885-Speed 2497.80 samples/sec Loss 2.3511 LearningRate 0.000351 Epoch: 18 Global Step: 387510 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:08:23,086-Speed 2497.72 samples/sec Loss 2.2730 LearningRate 0.000351 Epoch: 18 Global Step: 387520 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:08:31,287-Speed 2497.68 samples/sec Loss 2.2345 LearningRate 0.000351 Epoch: 18 Global Step: 387530 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:08:39,484-Speed 2498.73 samples/sec Loss 2.2610 LearningRate 0.000351 Epoch: 18 Global Step: 387540 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:08:47,631-Speed 2514.36 samples/sec Loss 2.2368 LearningRate 0.000350 Epoch: 18 Global Step: 387550 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:08:55,830-Speed 2498.23 samples/sec Loss 2.1939 LearningRate 0.000350 Epoch: 18 Global Step: 387560 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:09:04,026-Speed 2499.18 samples/sec Loss 2.2446 LearningRate 0.000350 Epoch: 18 Global Step: 387570 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:09:12,225-Speed 2498.50 samples/sec Loss 2.2490 LearningRate 0.000350 Epoch: 18 Global Step: 387580 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:09:20,422-Speed 2498.82 samples/sec Loss 2.2353 LearningRate 0.000350 Epoch: 18 Global Step: 387590 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:09:28,619-Speed 2498.67 samples/sec Loss 2.2210 LearningRate 
0.000350 Epoch: 18 Global Step: 387600 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:09:36,766-Speed 2514.25 samples/sec Loss 2.1926 LearningRate 0.000350 Epoch: 18 Global Step: 387610 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:09:44,968-Speed 2497.64 samples/sec Loss 2.2130 LearningRate 0.000350 Epoch: 18 Global Step: 387620 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:09:53,164-Speed 2499.10 samples/sec Loss 2.2231 LearningRate 0.000350 Epoch: 18 Global Step: 387630 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:10:01,365-Speed 2497.67 samples/sec Loss 2.2325 LearningRate 0.000350 Epoch: 18 Global Step: 387640 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:10:09,567-Speed 2497.28 samples/sec Loss 2.2622 LearningRate 0.000350 Epoch: 18 Global Step: 387650 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:10:17,767-Speed 2497.98 samples/sec Loss 2.1805 LearningRate 0.000350 Epoch: 18 Global Step: 387660 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:10:25,925-Speed 2510.83 samples/sec Loss 2.2331 LearningRate 0.000350 Epoch: 18 Global Step: 387670 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:10:34,123-Speed 2498.48 samples/sec Loss 2.2311 LearningRate 0.000350 Epoch: 18 Global Step: 387680 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:10:42,326-Speed 2497.16 samples/sec Loss 2.2696 LearningRate 0.000350 Epoch: 18 Global Step: 387690 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:10:50,526-Speed 2497.75 samples/sec Loss 2.2245 LearningRate 0.000350 Epoch: 18 Global Step: 387700 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:10:58,731-Speed 2496.64 samples/sec Loss 2.2005 LearningRate 0.000350 Epoch: 18 Global Step: 387710 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:11:06,937-Speed 2496.28 samples/sec Loss 2.2849 
LearningRate 0.000350 Epoch: 18 Global Step: 387720 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:11:15,083-Speed 2514.36 samples/sec Loss 2.2633 LearningRate 0.000350 Epoch: 18 Global Step: 387730 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:11:23,287-Speed 2496.89 samples/sec Loss 2.2523 LearningRate 0.000350 Epoch: 18 Global Step: 387740 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:11:31,496-Speed 2495.08 samples/sec Loss 2.2533 LearningRate 0.000350 Epoch: 18 Global Step: 387750 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:11:39,698-Speed 2497.43 samples/sec Loss 2.1980 LearningRate 0.000350 Epoch: 18 Global Step: 387760 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:11:47,900-Speed 2497.03 samples/sec Loss 2.2868 LearningRate 0.000350 Epoch: 18 Global Step: 387770 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:11:56,109-Speed 2495.36 samples/sec Loss 2.2633 LearningRate 0.000350 Epoch: 18 Global Step: 387780 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:12:04,258-Speed 2513.86 samples/sec Loss 2.2432 LearningRate 0.000350 Epoch: 18 Global Step: 387790 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:12:12,461-Speed 2497.18 samples/sec Loss 2.2593 LearningRate 0.000350 Epoch: 18 Global Step: 387800 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:12:20,665-Speed 2496.45 samples/sec Loss 2.2208 LearningRate 0.000350 Epoch: 18 Global Step: 387810 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:12:28,870-Speed 2497.04 samples/sec Loss 2.2594 LearningRate 0.000350 Epoch: 18 Global Step: 387820 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:12:37,068-Speed 2498.52 samples/sec Loss 2.2508 LearningRate 0.000350 Epoch: 18 Global Step: 387830 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:12:45,271-Speed 2497.07 samples/sec Loss 
2.2756 LearningRate 0.000350 Epoch: 18 Global Step: 387840 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:12:53,423-Speed 2512.62 samples/sec Loss 2.2429 LearningRate 0.000350 Epoch: 18 Global Step: 387850 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:13:01,624-Speed 2497.77 samples/sec Loss 2.2955 LearningRate 0.000350 Epoch: 18 Global Step: 387860 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:13:09,824-Speed 2497.89 samples/sec Loss 2.2373 LearningRate 0.000350 Epoch: 18 Global Step: 387870 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:13:18,028-Speed 2496.68 samples/sec Loss 2.2620 LearningRate 0.000350 Epoch: 18 Global Step: 387880 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:13:26,231-Speed 2497.40 samples/sec Loss 2.2639 LearningRate 0.000350 Epoch: 18 Global Step: 387890 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:13:34,431-Speed 2498.01 samples/sec Loss 2.2757 LearningRate 0.000350 Epoch: 18 Global Step: 387900 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:13:42,579-Speed 2513.85 samples/sec Loss 2.2376 LearningRate 0.000350 Epoch: 18 Global Step: 387910 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:13:50,778-Speed 2498.52 samples/sec Loss 2.2500 LearningRate 0.000350 Epoch: 18 Global Step: 387920 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:13:58,978-Speed 2497.88 samples/sec Loss 2.2763 LearningRate 0.000350 Epoch: 18 Global Step: 387930 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:14:07,179-Speed 2497.74 samples/sec Loss 2.3393 LearningRate 0.000350 Epoch: 18 Global Step: 387940 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:14:15,378-Speed 2498.38 samples/sec Loss 2.2830 LearningRate 0.000350 Epoch: 18 Global Step: 387950 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:14:23,577-Speed 2498.56 samples/sec 
Loss 2.2396 LearningRate 0.000350 Epoch: 18 Global Step: 387960 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:14:31,728-Speed 2513.27 samples/sec Loss 2.3387 LearningRate 0.000350 Epoch: 18 Global Step: 387970 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:14:39,927-Speed 2498.40 samples/sec Loss 2.3060 LearningRate 0.000350 Epoch: 18 Global Step: 387980 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:14:48,134-Speed 2496.10 samples/sec Loss 2.2884 LearningRate 0.000350 Epoch: 18 Global Step: 387990 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:14:56,340-Speed 2495.87 samples/sec Loss 2.3005 LearningRate 0.000350 Epoch: 18 Global Step: 388000 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:15:04,548-Speed 2495.56 samples/sec Loss 2.2881 LearningRate 0.000350 Epoch: 18 Global Step: 388010 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:15:12,752-Speed 2496.84 samples/sec Loss 2.2921 LearningRate 0.000350 Epoch: 18 Global Step: 388020 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:15:20,898-Speed 2514.55 samples/sec Loss 2.2755 LearningRate 0.000350 Epoch: 18 Global Step: 388030 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:15:29,115-Speed 2492.94 samples/sec Loss 2.2825 LearningRate 0.000350 Epoch: 18 Global Step: 388040 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:15:37,316-Speed 2497.69 samples/sec Loss 2.3141 LearningRate 0.000350 Epoch: 18 Global Step: 388050 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:15:45,520-Speed 2496.70 samples/sec Loss 2.3914 LearningRate 0.000350 Epoch: 18 Global Step: 388060 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:15:53,726-Speed 2495.94 samples/sec Loss 2.3454 LearningRate 0.000350 Epoch: 18 Global Step: 388070 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:16:01,930-Speed 2496.83 
samples/sec Loss 2.3264 LearningRate 0.000350 Epoch: 18 Global Step: 388080 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:16:10,088-Speed 2511.00 samples/sec Loss 2.2525 LearningRate 0.000350 Epoch: 18 Global Step: 388090 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:16:18,289-Speed 2497.54 samples/sec Loss 2.2680 LearningRate 0.000350 Epoch: 18 Global Step: 388100 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:16:26,490-Speed 2497.48 samples/sec Loss 2.2725 LearningRate 0.000350 Epoch: 18 Global Step: 388110 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:16:34,693-Speed 2497.22 samples/sec Loss 2.3079 LearningRate 0.000350 Epoch: 18 Global Step: 388120 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:16:42,893-Speed 2497.89 samples/sec Loss 2.2846 LearningRate 0.000350 Epoch: 18 Global Step: 388130 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:16:51,090-Speed 2498.62 samples/sec Loss 2.3140 LearningRate 0.000350 Epoch: 18 Global Step: 388140 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:16:59,235-Speed 2514.94 samples/sec Loss 2.2842 LearningRate 0.000350 Epoch: 18 Global Step: 388150 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:17:07,452-Speed 2492.75 samples/sec Loss 2.2685 LearningRate 0.000350 Epoch: 18 Global Step: 388160 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:17:15,662-Speed 2494.84 samples/sec Loss 2.3083 LearningRate 0.000350 Epoch: 18 Global Step: 388170 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:17:23,866-Speed 2496.63 samples/sec Loss 2.2876 LearningRate 0.000349 Epoch: 18 Global Step: 388180 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:17:32,067-Speed 2497.58 samples/sec Loss 2.2360 LearningRate 0.000349 Epoch: 18 Global Step: 388190 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:17:40,267-Speed 
2498.08 samples/sec Loss 2.2162 LearningRate 0.000349 Epoch: 18 Global Step: 388200 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:17:48,425-Speed 2510.63 samples/sec Loss 2.2479 LearningRate 0.000349 Epoch: 18 Global Step: 388210 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:17:56,629-Speed 2496.96 samples/sec Loss 2.2573 LearningRate 0.000349 Epoch: 18 Global Step: 388220 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:18:04,832-Speed 2496.92 samples/sec Loss 2.2348 LearningRate 0.000349 Epoch: 18 Global Step: 388230 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:18:13,035-Speed 2497.21 samples/sec Loss 2.2535 LearningRate 0.000349 Epoch: 18 Global Step: 388240 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:18:21,238-Speed 2497.07 samples/sec Loss 2.2538 LearningRate 0.000349 Epoch: 18 Global Step: 388250 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:18:29,441-Speed 2497.03 samples/sec Loss 2.2730 LearningRate 0.000349 Epoch: 18 Global Step: 388260 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:18:37,589-Speed 2513.91 samples/sec Loss 2.2804 LearningRate 0.000349 Epoch: 18 Global Step: 388270 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:18:45,795-Speed 2496.35 samples/sec Loss 2.3259 LearningRate 0.000349 Epoch: 18 Global Step: 388280 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:18:54,006-Speed 2494.58 samples/sec Loss 2.3020 LearningRate 0.000349 Epoch: 18 Global Step: 388290 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:19:02,207-Speed 2497.62 samples/sec Loss 2.3249 LearningRate 0.000349 Epoch: 18 Global Step: 388300 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:19:10,414-Speed 2496.43 samples/sec Loss 2.3158 LearningRate 0.000349 Epoch: 18 Global Step: 388310 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 
07:19:18,614-Speed 2497.85 samples/sec Loss 2.3358 LearningRate 0.000349 Epoch: 18 Global Step: 388320 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:19:26,763-Speed 2513.66 samples/sec Loss 2.3603 LearningRate 0.000349 Epoch: 18 Global Step: 388330 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:19:34,958-Speed 2499.22 samples/sec Loss 2.3100 LearningRate 0.000349 Epoch: 18 Global Step: 388340 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:19:43,158-Speed 2498.14 samples/sec Loss 2.3250 LearningRate 0.000349 Epoch: 18 Global Step: 388350 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:19:51,357-Speed 2498.16 samples/sec Loss 2.3114 LearningRate 0.000349 Epoch: 18 Global Step: 388360 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:19:59,556-Speed 2498.05 samples/sec Loss 2.2454 LearningRate 0.000349 Epoch: 18 Global Step: 388370 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:20:07,763-Speed 2496.01 samples/sec Loss 2.2870 LearningRate 0.000349 Epoch: 18 Global Step: 388380 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:20:15,909-Speed 2514.35 samples/sec Loss 2.2642 LearningRate 0.000349 Epoch: 18 Global Step: 388390 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:20:24,106-Speed 2498.85 samples/sec Loss 2.2576 LearningRate 0.000349 Epoch: 18 Global Step: 388400 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:20:32,302-Speed 2499.09 samples/sec Loss 2.2690 LearningRate 0.000349 Epoch: 18 Global Step: 388410 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:20:40,505-Speed 2497.16 samples/sec Loss 2.2777 LearningRate 0.000349 Epoch: 18 Global Step: 388420 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:20:48,703-Speed 2498.42 samples/sec Loss 2.3007 LearningRate 0.000349 Epoch: 18 Global Step: 388430 Fp16 Grad Scale: 32768 Required: 101 hours Training: 
2022-07-09 07:20:56,912-Speed 2495.13 samples/sec Loss 2.2528 LearningRate 0.000349 Epoch: 18 Global Step: 388440 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:21:05,060-Speed 2514.16 samples/sec Loss 2.2325 LearningRate 0.000349 Epoch: 18 Global Step: 388450 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:21:13,258-Speed 2498.64 samples/sec Loss 2.2462 LearningRate 0.000349 Epoch: 18 Global Step: 388460 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:21:21,463-Speed 2496.20 samples/sec Loss 2.2967 LearningRate 0.000349 Epoch: 18 Global Step: 388470 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:21:29,664-Speed 2497.40 samples/sec Loss 2.2629 LearningRate 0.000349 Epoch: 18 Global Step: 388480 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:21:37,874-Speed 2495.01 samples/sec Loss 2.2358 LearningRate 0.000349 Epoch: 18 Global Step: 388490 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:21:46,084-Speed 2494.94 samples/sec Loss 2.3030 LearningRate 0.000349 Epoch: 18 Global Step: 388500 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:21:54,230-Speed 2514.31 samples/sec Loss 2.2843 LearningRate 0.000349 Epoch: 18 Global Step: 388510 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:22:02,431-Speed 2497.71 samples/sec Loss 2.2717 LearningRate 0.000349 Epoch: 18 Global Step: 388520 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:22:10,632-Speed 2497.71 samples/sec Loss 2.2225 LearningRate 0.000349 Epoch: 18 Global Step: 388530 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:22:18,830-Speed 2498.46 samples/sec Loss 2.2880 LearningRate 0.000349 Epoch: 18 Global Step: 388540 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:22:27,030-Speed 2497.96 samples/sec Loss 2.2715 LearningRate 0.000349 Epoch: 18 Global Step: 388550 Fp16 Grad Scale: 32768 Required: 101 hours 
Training: 2022-07-09 07:22:35,246-Speed 2493.38 samples/sec Loss 2.2373 LearningRate 0.000349 Epoch: 18 Global Step: 388560 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:22:43,395-Speed 2513.49 samples/sec Loss 2.2315 LearningRate 0.000349 Epoch: 18 Global Step: 388570 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:22:51,606-Speed 2494.67 samples/sec Loss 2.3321 LearningRate 0.000349 Epoch: 18 Global Step: 388580 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:22:59,810-Speed 2496.66 samples/sec Loss 2.2892 LearningRate 0.000349 Epoch: 18 Global Step: 388590 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:23:08,012-Speed 2497.34 samples/sec Loss 2.3027 LearningRate 0.000349 Epoch: 18 Global Step: 388600 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:23:16,213-Speed 2497.93 samples/sec Loss 2.2918 LearningRate 0.000349 Epoch: 18 Global Step: 388610 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:23:24,413-Speed 2497.74 samples/sec Loss 2.2849 LearningRate 0.000349 Epoch: 18 Global Step: 388620 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:23:32,560-Speed 2514.45 samples/sec Loss 2.3356 LearningRate 0.000349 Epoch: 18 Global Step: 388630 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:23:40,773-Speed 2494.19 samples/sec Loss 2.3020 LearningRate 0.000349 Epoch: 18 Global Step: 388640 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:23:48,979-Speed 2496.37 samples/sec Loss 2.2705 LearningRate 0.000349 Epoch: 18 Global Step: 388650 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:23:57,178-Speed 2498.11 samples/sec Loss 2.3001 LearningRate 0.000349 Epoch: 18 Global Step: 388660 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:24:05,386-Speed 2495.52 samples/sec Loss 2.3144 LearningRate 0.000349 Epoch: 18 Global Step: 388670 Fp16 Grad Scale: 32768 Required: 101 
hours Training: 2022-07-09 07:24:13,586-Speed 2497.78 samples/sec Loss 2.3323 LearningRate 0.000349 Epoch: 18 Global Step: 388680 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:24:21,732-Speed 2514.77 samples/sec Loss 2.2859 LearningRate 0.000349 Epoch: 18 Global Step: 388690 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:24:29,934-Speed 2497.23 samples/sec Loss 2.3152 LearningRate 0.000349 Epoch: 18 Global Step: 388700 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:24:38,137-Speed 2497.19 samples/sec Loss 2.2708 LearningRate 0.000349 Epoch: 18 Global Step: 388710 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:24:46,338-Speed 2497.57 samples/sec Loss 2.2995 LearningRate 0.000349 Epoch: 18 Global Step: 388720 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:24:54,551-Speed 2494.12 samples/sec Loss 2.3185 LearningRate 0.000349 Epoch: 18 Global Step: 388730 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:25:02,752-Speed 2497.44 samples/sec Loss 2.3038 LearningRate 0.000349 Epoch: 18 Global Step: 388740 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:25:10,902-Speed 2513.40 samples/sec Loss 2.3406 LearningRate 0.000349 Epoch: 18 Global Step: 388750 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:25:19,102-Speed 2498.19 samples/sec Loss 2.2784 LearningRate 0.000349 Epoch: 18 Global Step: 388760 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:25:27,301-Speed 2498.35 samples/sec Loss 2.2768 LearningRate 0.000349 Epoch: 18 Global Step: 388770 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:25:35,505-Speed 2496.75 samples/sec Loss 2.2956 LearningRate 0.000349 Epoch: 18 Global Step: 388780 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:25:43,705-Speed 2497.78 samples/sec Loss 2.3205 LearningRate 0.000349 Epoch: 18 Global Step: 388790 Fp16 Grad Scale: 32768 Required: 
101 hours Training: 2022-07-09 07:25:51,861-Speed 2511.73 samples/sec Loss 2.2481 LearningRate 0.000349 Epoch: 18 Global Step: 388800 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:26:00,008-Speed 2514.08 samples/sec Loss 2.3080 LearningRate 0.000349 Epoch: 18 Global Step: 388810 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:26:08,207-Speed 2498.17 samples/sec Loss 2.2762 LearningRate 0.000348 Epoch: 18 Global Step: 388820 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:26:16,408-Speed 2497.89 samples/sec Loss 2.2712 LearningRate 0.000348 Epoch: 18 Global Step: 388830 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:26:24,609-Speed 2497.78 samples/sec Loss 2.2488 LearningRate 0.000348 Epoch: 18 Global Step: 388840 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:26:32,810-Speed 2497.53 samples/sec Loss 2.2943 LearningRate 0.000348 Epoch: 18 Global Step: 388850 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:26:41,013-Speed 2496.97 samples/sec Loss 2.2132 LearningRate 0.000348 Epoch: 18 Global Step: 388860 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:26:49,159-Speed 2514.63 samples/sec Loss 2.3076 LearningRate 0.000348 Epoch: 18 Global Step: 388870 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:26:57,384-Speed 2490.73 samples/sec Loss 2.2970 LearningRate 0.000348 Epoch: 18 Global Step: 388880 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:27:05,586-Speed 2497.20 samples/sec Loss 2.2284 LearningRate 0.000348 Epoch: 18 Global Step: 388890 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:27:13,785-Speed 2498.29 samples/sec Loss 2.2948 LearningRate 0.000348 Epoch: 18 Global Step: 388900 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:27:21,985-Speed 2497.82 samples/sec Loss 2.2835 LearningRate 0.000348 Epoch: 18 Global Step: 388910 Fp16 Grad Scale: 16384 
Required: 101 hours Training: 2022-07-09 07:27:30,185-Speed 2497.96 samples/sec Loss 2.3027 LearningRate 0.000348 Epoch: 18 Global Step: 388920 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:27:38,331-Speed 2514.53 samples/sec Loss 2.3011 LearningRate 0.000348 Epoch: 18 Global Step: 388930 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:27:46,534-Speed 2497.02 samples/sec Loss 2.2492 LearningRate 0.000348 Epoch: 18 Global Step: 388940 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:27:54,759-Speed 2490.36 samples/sec Loss 2.2264 LearningRate 0.000348 Epoch: 18 Global Step: 388950 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:28:02,960-Speed 2498.04 samples/sec Loss 2.2873 LearningRate 0.000348 Epoch: 18 Global Step: 388960 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:28:11,165-Speed 2497.16 samples/sec Loss 2.2674 LearningRate 0.000348 Epoch: 18 Global Step: 388970 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:28:19,362-Speed 2498.66 samples/sec Loss 2.2883 LearningRate 0.000348 Epoch: 18 Global Step: 388980 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:28:27,512-Speed 2513.50 samples/sec Loss 2.2976 LearningRate 0.000348 Epoch: 18 Global Step: 388990 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:28:35,721-Speed 2495.15 samples/sec Loss 2.2643 LearningRate 0.000348 Epoch: 18 Global Step: 389000 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:28:43,921-Speed 2497.93 samples/sec Loss 2.2812 LearningRate 0.000348 Epoch: 18 Global Step: 389010 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:28:52,121-Speed 2497.79 samples/sec Loss 2.2778 LearningRate 0.000348 Epoch: 18 Global Step: 389020 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:29:00,335-Speed 2494.02 samples/sec Loss 2.2488 LearningRate 0.000348 Epoch: 18 Global Step: 389030 Fp16 Grad Scale: 
16384 Required: 101 hours Training: 2022-07-09 07:29:08,547-Speed 2494.14 samples/sec Loss 2.2617 LearningRate 0.000348 Epoch: 18 Global Step: 389040 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:29:16,695-Speed 2513.81 samples/sec Loss 2.2979 LearningRate 0.000348 Epoch: 18 Global Step: 389050 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:29:24,894-Speed 2498.45 samples/sec Loss 2.2498 LearningRate 0.000348 Epoch: 18 Global Step: 389060 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:29:33,107-Speed 2494.02 samples/sec Loss 2.2686 LearningRate 0.000348 Epoch: 18 Global Step: 389070 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:29:41,314-Speed 2495.74 samples/sec Loss 2.3016 LearningRate 0.000348 Epoch: 18 Global Step: 389080 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:29:49,515-Speed 2497.71 samples/sec Loss 2.2900 LearningRate 0.000348 Epoch: 18 Global Step: 389090 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:29:57,716-Speed 2497.51 samples/sec Loss 2.2473 LearningRate 0.000348 Epoch: 18 Global Step: 389100 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:30:05,866-Speed 2513.48 samples/sec Loss 2.2767 LearningRate 0.000348 Epoch: 18 Global Step: 389110 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:30:14,066-Speed 2497.74 samples/sec Loss 2.2684 LearningRate 0.000348 Epoch: 18 Global Step: 389120 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:30:22,270-Speed 2496.93 samples/sec Loss 2.2442 LearningRate 0.000348 Epoch: 18 Global Step: 389130 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:30:30,467-Speed 2499.38 samples/sec Loss 2.3060 LearningRate 0.000348 Epoch: 18 Global Step: 389140 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:30:38,679-Speed 2494.24 samples/sec Loss 2.2456 LearningRate 0.000348 Epoch: 18 Global Step: 389150 Fp16 Grad 
Scale: 16384 Required: 101 hours Training: 2022-07-09 07:30:46,877-Speed 2498.65 samples/sec Loss 2.2526 LearningRate 0.000348 Epoch: 18 Global Step: 389160 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:30:55,031-Speed 2511.85 samples/sec Loss 2.3484 LearningRate 0.000348 Epoch: 18 Global Step: 389170 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:31:03,231-Speed 2498.03 samples/sec Loss 2.2210 LearningRate 0.000348 Epoch: 18 Global Step: 389180 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:31:11,434-Speed 2497.10 samples/sec Loss 2.2907 LearningRate 0.000348 Epoch: 18 Global Step: 389190 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:31:19,646-Speed 2493.96 samples/sec Loss 2.3038 LearningRate 0.000348 Epoch: 18 Global Step: 389200 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:31:27,853-Speed 2496.05 samples/sec Loss 2.2609 LearningRate 0.000348 Epoch: 18 Global Step: 389210 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:31:36,058-Speed 2496.48 samples/sec Loss 2.2041 LearningRate 0.000348 Epoch: 18 Global Step: 389220 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:31:44,207-Speed 2513.54 samples/sec Loss 2.2589 LearningRate 0.000348 Epoch: 18 Global Step: 389230 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:31:52,413-Speed 2496.44 samples/sec Loss 2.3324 LearningRate 0.000348 Epoch: 18 Global Step: 389240 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:32:00,625-Speed 2494.44 samples/sec Loss 2.2530 LearningRate 0.000348 Epoch: 18 Global Step: 389250 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:32:08,828-Speed 2497.11 samples/sec Loss 2.2985 LearningRate 0.000348 Epoch: 18 Global Step: 389260 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:32:17,026-Speed 2498.52 samples/sec Loss 2.3197 LearningRate 0.000348 Epoch: 18 Global Step: 389270 Fp16 
Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:32:25,226-Speed 2497.78 samples/sec Loss 2.3070 LearningRate 0.000348 Epoch: 18 Global Step: 389280 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:32:33,380-Speed 2511.92 samples/sec Loss 2.2212 LearningRate 0.000348 Epoch: 18 Global Step: 389290 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:32:41,579-Speed 2498.59 samples/sec Loss 2.2604 LearningRate 0.000348 Epoch: 18 Global Step: 389300 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:32:49,786-Speed 2495.52 samples/sec Loss 2.2638 LearningRate 0.000348 Epoch: 18 Global Step: 389310 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:32:57,988-Speed 2497.39 samples/sec Loss 2.2698 LearningRate 0.000348 Epoch: 18 Global Step: 389320 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:33:06,193-Speed 2496.76 samples/sec Loss 2.3019 LearningRate 0.000348 Epoch: 18 Global Step: 389330 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:33:14,396-Speed 2497.11 samples/sec Loss 2.2439 LearningRate 0.000348 Epoch: 18 Global Step: 389340 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:33:22,543-Speed 2514.09 samples/sec Loss 2.2242 LearningRate 0.000348 Epoch: 18 Global Step: 389350 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:33:30,740-Speed 2498.90 samples/sec Loss 2.2575 LearningRate 0.000348 Epoch: 18 Global Step: 389360 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:33:38,947-Speed 2495.89 samples/sec Loss 2.2539 LearningRate 0.000348 Epoch: 18 Global Step: 389370 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:33:47,144-Speed 2498.84 samples/sec Loss 2.2687 LearningRate 0.000348 Epoch: 18 Global Step: 389380 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:33:55,348-Speed 2496.76 samples/sec Loss 2.3111 LearningRate 0.000348 Epoch: 18 Global Step: 389390 
Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:34:03,576-Speed 2489.43 samples/sec Loss 2.2339 LearningRate 0.000348 Epoch: 18 Global Step: 389400 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:34:11,733-Speed 2511.38 samples/sec Loss 2.2460 LearningRate 0.000348 Epoch: 18 Global Step: 389410 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:34:19,940-Speed 2495.95 samples/sec Loss 2.2857 LearningRate 0.000348 Epoch: 18 Global Step: 389420 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:34:28,149-Speed 2495.30 samples/sec Loss 2.2467 LearningRate 0.000348 Epoch: 18 Global Step: 389430 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:34:36,353-Speed 2496.69 samples/sec Loss 2.2372 LearningRate 0.000348 Epoch: 18 Global Step: 389440 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:34:44,552-Speed 2498.03 samples/sec Loss 2.2832 LearningRate 0.000347 Epoch: 18 Global Step: 389450 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:34:52,755-Speed 2497.17 samples/sec Loss 2.2681 LearningRate 0.000347 Epoch: 18 Global Step: 389460 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:35:00,906-Speed 2513.21 samples/sec Loss 2.2532 LearningRate 0.000347 Epoch: 18 Global Step: 389470 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:35:09,107-Speed 2497.91 samples/sec Loss 2.2471 LearningRate 0.000347 Epoch: 18 Global Step: 389480 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:35:17,309-Speed 2496.94 samples/sec Loss 2.2481 LearningRate 0.000347 Epoch: 18 Global Step: 389490 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:35:25,517-Speed 2495.66 samples/sec Loss 2.2711 LearningRate 0.000347 Epoch: 18 Global Step: 389500 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:35:33,722-Speed 2496.43 samples/sec Loss 2.2817 LearningRate 0.000347 Epoch: 18 Global Step: 
389510 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:35:41,924-Speed 2497.24 samples/sec Loss 2.2881 LearningRate 0.000347 Epoch: 18 Global Step: 389520 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:35:50,074-Speed 2513.17 samples/sec Loss 2.2509 LearningRate 0.000347 Epoch: 18 Global Step: 389530 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:35:58,276-Speed 2497.71 samples/sec Loss 2.2661 LearningRate 0.000347 Epoch: 18 Global Step: 389540 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:36:06,477-Speed 2497.63 samples/sec Loss 2.3327 LearningRate 0.000347 Epoch: 18 Global Step: 389550 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:36:14,677-Speed 2497.90 samples/sec Loss 2.2788 LearningRate 0.000347 Epoch: 18 Global Step: 389560 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:36:22,880-Speed 2497.15 samples/sec Loss 2.2545 LearningRate 0.000347 Epoch: 18 Global Step: 389570 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:36:31,083-Speed 2497.03 samples/sec Loss 2.2564 LearningRate 0.000347 Epoch: 18 Global Step: 389580 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:36:39,233-Speed 2513.13 samples/sec Loss 2.2872 LearningRate 0.000347 Epoch: 18 Global Step: 389590 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:36:47,449-Speed 2493.28 samples/sec Loss 2.2801 LearningRate 0.000347 Epoch: 18 Global Step: 389600 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:36:55,649-Speed 2497.85 samples/sec Loss 2.2387 LearningRate 0.000347 Epoch: 18 Global Step: 389610 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:37:03,849-Speed 2497.75 samples/sec Loss 2.2901 LearningRate 0.000347 Epoch: 18 Global Step: 389620 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:37:12,049-Speed 2497.87 samples/sec Loss 2.2745 LearningRate 0.000347 Epoch: 18 Global 
Step: 389630 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:37:20,254-Speed 2496.59 samples/sec Loss 2.2594 LearningRate 0.000347 Epoch: 18 Global Step: 389640 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:37:28,401-Speed 2514.19 samples/sec Loss 2.3052 LearningRate 0.000347 Epoch: 18 Global Step: 389650 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:37:36,605-Speed 2497.08 samples/sec Loss 2.2187 LearningRate 0.000347 Epoch: 18 Global Step: 389660 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:37:44,802-Speed 2498.79 samples/sec Loss 2.2467 LearningRate 0.000347 Epoch: 18 Global Step: 389670 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:37:53,014-Speed 2494.22 samples/sec Loss 2.2867 LearningRate 0.000347 Epoch: 18 Global Step: 389680 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:38:01,215-Speed 2497.64 samples/sec Loss 2.2934 LearningRate 0.000347 Epoch: 18 Global Step: 389690 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:38:09,414-Speed 2498.27 samples/sec Loss 2.2330 LearningRate 0.000347 Epoch: 18 Global Step: 389700 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:38:17,575-Speed 2509.97 samples/sec Loss 2.3182 LearningRate 0.000347 Epoch: 18 Global Step: 389710 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:38:25,777-Speed 2497.51 samples/sec Loss 2.2864 LearningRate 0.000347 Epoch: 18 Global Step: 389720 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:38:33,991-Speed 2493.45 samples/sec Loss 2.2359 LearningRate 0.000347 Epoch: 18 Global Step: 389730 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:38:42,194-Speed 2497.36 samples/sec Loss 2.3149 LearningRate 0.000347 Epoch: 18 Global Step: 389740 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:38:50,393-Speed 2498.41 samples/sec Loss 2.3083 LearningRate 0.000347 Epoch: 18 
Global Step: 389750 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:38:58,607-Speed 2493.44 samples/sec Loss 2.2372 LearningRate 0.000347 Epoch: 18 Global Step: 389760 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:39:06,759-Speed 2512.90 samples/sec Loss 2.2412 LearningRate 0.000347 Epoch: 18 Global Step: 389770 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:39:14,961-Speed 2497.33 samples/sec Loss 2.2959 LearningRate 0.000347 Epoch: 18 Global Step: 389780 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:39:23,163-Speed 2497.25 samples/sec Loss 2.2529 LearningRate 0.000347 Epoch: 18 Global Step: 389790 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:39:31,365-Speed 2497.20 samples/sec Loss 2.2504 LearningRate 0.000347 Epoch: 18 Global Step: 389800 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:39:39,569-Speed 2496.90 samples/sec Loss 2.2300 LearningRate 0.000347 Epoch: 18 Global Step: 389810 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:39:47,772-Speed 2497.16 samples/sec Loss 2.2490 LearningRate 0.000347 Epoch: 18 Global Step: 389820 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:39:55,919-Speed 2514.10 samples/sec Loss 2.2684 LearningRate 0.000347 Epoch: 18 Global Step: 389830 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:40:04,137-Speed 2492.65 samples/sec Loss 2.1681 LearningRate 0.000347 Epoch: 18 Global Step: 389840 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:40:12,341-Speed 2496.75 samples/sec Loss 2.2382 LearningRate 0.000347 Epoch: 18 Global Step: 389850 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:40:20,546-Speed 2496.36 samples/sec Loss 2.2031 LearningRate 0.000347 Epoch: 18 Global Step: 389860 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:40:28,747-Speed 2497.82 samples/sec Loss 2.2488 LearningRate 0.000347 
Epoch: 18 Global Step: 389870 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:40:36,947-Speed 2497.92 samples/sec Loss 2.2502 LearningRate 0.000347 Epoch: 18 Global Step: 389880 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:40:45,096-Speed 2513.63 samples/sec Loss 2.2076 LearningRate 0.000347 Epoch: 18 Global Step: 389890 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:40:53,299-Speed 2496.79 samples/sec Loss 2.2673 LearningRate 0.000347 Epoch: 18 Global Step: 389900 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:41:01,508-Speed 2495.33 samples/sec Loss 2.2880 LearningRate 0.000347 Epoch: 18 Global Step: 389910 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:41:09,708-Speed 2498.28 samples/sec Loss 2.2818 LearningRate 0.000347 Epoch: 18 Global Step: 389920 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:41:17,906-Speed 2498.36 samples/sec Loss 2.2839 LearningRate 0.000347 Epoch: 18 Global Step: 389930 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:41:26,114-Speed 2495.89 samples/sec Loss 2.2854 LearningRate 0.000347 Epoch: 18 Global Step: 389940 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:41:34,260-Speed 2514.56 samples/sec Loss 2.2388 LearningRate 0.000347 Epoch: 18 Global Step: 389950 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:41:42,458-Speed 2498.42 samples/sec Loss 2.3112 LearningRate 0.000347 Epoch: 18 Global Step: 389960 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:41:50,668-Speed 2494.87 samples/sec Loss 2.3097 LearningRate 0.000347 Epoch: 18 Global Step: 389970 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:41:58,886-Speed 2492.61 samples/sec Loss 2.2960 LearningRate 0.000347 Epoch: 18 Global Step: 389980 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:42:07,089-Speed 2497.07 samples/sec Loss 2.3242 LearningRate 
0.000347 Epoch: 18 Global Step: 389990 Fp16 Grad Scale: 16384 Required: 101 hours Training: 2022-07-09 07:42:15,293-Speed 2496.58 samples/sec Loss 2.2450 LearningRate 0.000347 Epoch: 18 Global Step: 390000 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:42:23,439-Speed 2515.17 samples/sec Loss 2.2565 LearningRate 0.000347 Epoch: 18 Global Step: 390010 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:42:31,638-Speed 2498.28 samples/sec Loss 2.2713 LearningRate 0.000347 Epoch: 18 Global Step: 390020 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:42:39,836-Speed 2498.45 samples/sec Loss 2.3005 LearningRate 0.000347 Epoch: 18 Global Step: 390030 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:42:48,046-Speed 2495.16 samples/sec Loss 2.2470 LearningRate 0.000347 Epoch: 18 Global Step: 390040 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:42:56,242-Speed 2499.03 samples/sec Loss 2.2663 LearningRate 0.000347 Epoch: 18 Global Step: 390050 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:43:04,443-Speed 2497.75 samples/sec Loss 2.3369 LearningRate 0.000347 Epoch: 18 Global Step: 390060 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:43:12,589-Speed 2514.37 samples/sec Loss 2.3100 LearningRate 0.000347 Epoch: 18 Global Step: 390070 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:43:20,799-Speed 2494.81 samples/sec Loss 2.2614 LearningRate 0.000346 Epoch: 18 Global Step: 390080 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:43:29,001-Speed 2497.51 samples/sec Loss 2.2630 LearningRate 0.000346 Epoch: 18 Global Step: 390090 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:43:37,202-Speed 2497.54 samples/sec Loss 2.2523 LearningRate 0.000346 Epoch: 18 Global Step: 390100 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:43:45,408-Speed 2496.12 samples/sec Loss 2.2383 
LearningRate 0.000346 Epoch: 18 Global Step: 390110 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:43:53,612-Speed 2496.70 samples/sec Loss 2.2836 LearningRate 0.000346 Epoch: 18 Global Step: 390120 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:44:01,760-Speed 2513.93 samples/sec Loss 2.2413 LearningRate 0.000346 Epoch: 18 Global Step: 390130 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:44:09,957-Speed 2498.79 samples/sec Loss 2.2536 LearningRate 0.000346 Epoch: 18 Global Step: 390140 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:44:18,157-Speed 2497.99 samples/sec Loss 2.2530 LearningRate 0.000346 Epoch: 18 Global Step: 390150 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:44:26,356-Speed 2498.25 samples/sec Loss 2.2605 LearningRate 0.000346 Epoch: 18 Global Step: 390160 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:44:34,553-Speed 2498.61 samples/sec Loss 2.2699 LearningRate 0.000346 Epoch: 18 Global Step: 390170 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:44:42,755-Speed 2497.50 samples/sec Loss 2.2146 LearningRate 0.000346 Epoch: 18 Global Step: 390180 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:44:50,903-Speed 2513.77 samples/sec Loss 2.2153 LearningRate 0.000346 Epoch: 18 Global Step: 390190 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:44:59,104-Speed 2498.06 samples/sec Loss 2.2211 LearningRate 0.000346 Epoch: 18 Global Step: 390200 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:45:07,300-Speed 2499.46 samples/sec Loss 2.2507 LearningRate 0.000346 Epoch: 18 Global Step: 390210 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:45:15,502-Speed 2497.28 samples/sec Loss 2.2987 LearningRate 0.000346 Epoch: 18 Global Step: 390220 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:45:23,705-Speed 2496.90 samples/sec Loss 
2.2734 LearningRate 0.000346 Epoch: 18 Global Step: 390230 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:45:31,908-Speed 2496.99 samples/sec Loss 2.2679 LearningRate 0.000346 Epoch: 18 Global Step: 390240 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:45:40,057-Speed 2513.55 samples/sec Loss 2.2200 LearningRate 0.000346 Epoch: 18 Global Step: 390250 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:45:48,257-Speed 2498.08 samples/sec Loss 2.2492 LearningRate 0.000346 Epoch: 18 Global Step: 390260 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:45:56,456-Speed 2498.29 samples/sec Loss 2.2364 LearningRate 0.000346 Epoch: 18 Global Step: 390270 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:46:04,653-Speed 2498.65 samples/sec Loss 2.2249 LearningRate 0.000346 Epoch: 18 Global Step: 390280 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:46:12,850-Speed 2499.93 samples/sec Loss 2.1922 LearningRate 0.000346 Epoch: 18 Global Step: 390290 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:46:21,048-Speed 2498.57 samples/sec Loss 2.3036 LearningRate 0.000346 Epoch: 18 Global Step: 390300 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:46:29,198-Speed 2513.25 samples/sec Loss 2.2663 LearningRate 0.000346 Epoch: 18 Global Step: 390310 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:46:37,397-Speed 2498.14 samples/sec Loss 2.2524 LearningRate 0.000346 Epoch: 18 Global Step: 390320 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:46:45,598-Speed 2497.73 samples/sec Loss 2.2836 LearningRate 0.000346 Epoch: 18 Global Step: 390330 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:46:53,809-Speed 2494.35 samples/sec Loss 2.2532 LearningRate 0.000346 Epoch: 18 Global Step: 390340 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:47:02,008-Speed 2498.31 samples/sec 
Loss 2.2502 LearningRate 0.000346 Epoch: 18 Global Step: 390350 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:47:10,217-Speed 2495.32 samples/sec Loss 2.2193 LearningRate 0.000346 Epoch: 18 Global Step: 390360 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:47:18,366-Speed 2513.37 samples/sec Loss 2.2340 LearningRate 0.000346 Epoch: 18 Global Step: 390370 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:47:26,564-Speed 2498.77 samples/sec Loss 2.2381 LearningRate 0.000346 Epoch: 18 Global Step: 390380 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:47:34,762-Speed 2498.69 samples/sec Loss 2.2848 LearningRate 0.000346 Epoch: 18 Global Step: 390390 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:47:42,962-Speed 2498.00 samples/sec Loss 2.2702 LearningRate 0.000346 Epoch: 18 Global Step: 390400 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:47:51,163-Speed 2497.74 samples/sec Loss 2.2109 LearningRate 0.000346 Epoch: 18 Global Step: 390410 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:47:59,366-Speed 2496.98 samples/sec Loss 2.2975 LearningRate 0.000346 Epoch: 18 Global Step: 390420 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:48:07,512-Speed 2514.61 samples/sec Loss 2.2303 LearningRate 0.000346 Epoch: 18 Global Step: 390430 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:48:15,710-Speed 2498.81 samples/sec Loss 2.2332 LearningRate 0.000346 Epoch: 18 Global Step: 390440 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:48:23,906-Speed 2498.82 samples/sec Loss 2.2607 LearningRate 0.000346 Epoch: 18 Global Step: 390450 Fp16 Grad Scale: 32768 Required: 101 hours Training: 2022-07-09 07:48:32,108-Speed 2497.51 samples/sec Loss 2.2447 LearningRate 0.000346 Epoch: 18 Global Step: 390460 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 07:48:40,302-Speed 2500.06 
samples/sec Loss 2.2795 LearningRate 0.000346 Epoch: 18 Global Step: 390470 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 07:48:48,455-Speed 2512.20 samples/sec Loss 2.2911 LearningRate 0.000346 Epoch: 18 Global Step: 390480 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:48:56,598-Speed 2515.51 samples/sec Loss 2.2651 LearningRate 0.000346 Epoch: 18 Global Step: 390490 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:49:04,798-Speed 2498.02 samples/sec Loss 2.2812 LearningRate 0.000346 Epoch: 18 Global Step: 390500 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:49:12,997-Speed 2498.24 samples/sec Loss 2.3162 LearningRate 0.000346 Epoch: 18 Global Step: 390510 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:49:21,197-Speed 2497.98 samples/sec Loss 2.3381 LearningRate 0.000346 Epoch: 18 Global Step: 390520 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:49:29,404-Speed 2496.05 samples/sec Loss 2.3398 LearningRate 0.000346 Epoch: 18 Global Step: 390530 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:49:37,605-Speed 2497.85 samples/sec Loss 2.2784 LearningRate 0.000346 Epoch: 18 Global Step: 390540 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:49:45,753-Speed 2513.90 samples/sec Loss 2.2820 LearningRate 0.000346 Epoch: 18 Global Step: 390550 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:49:53,950-Speed 2498.93 samples/sec Loss 2.2860 LearningRate 0.000346 Epoch: 18 Global Step: 390560 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:50:02,161-Speed 2494.63 samples/sec Loss 2.3037 LearningRate 0.000346 Epoch: 18 Global Step: 390570 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:50:10,377-Speed 2493.16 samples/sec Loss 2.2637 LearningRate 0.000346 Epoch: 18 Global Step: 390580 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:50:18,584-Speed 
2495.83 samples/sec Loss 2.2528 LearningRate 0.000346 Epoch: 18 Global Step: 390590 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:50:26,790-Speed 2495.98 samples/sec Loss 2.2035 LearningRate 0.000346 Epoch: 18 Global Step: 390600 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:50:34,947-Speed 2510.95 samples/sec Loss 2.2525 LearningRate 0.000346 Epoch: 18 Global Step: 390610 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:50:43,150-Speed 2497.31 samples/sec Loss 2.2695 LearningRate 0.000346 Epoch: 18 Global Step: 390620 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:50:51,350-Speed 2497.96 samples/sec Loss 2.2400 LearningRate 0.000346 Epoch: 18 Global Step: 390630 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:50:59,549-Speed 2498.43 samples/sec Loss 2.2887 LearningRate 0.000346 Epoch: 18 Global Step: 390640 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:51:07,747-Speed 2498.51 samples/sec Loss 2.2605 LearningRate 0.000346 Epoch: 18 Global Step: 390650 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:51:15,946-Speed 2498.45 samples/sec Loss 2.2570 LearningRate 0.000346 Epoch: 18 Global Step: 390660 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:51:24,096-Speed 2513.33 samples/sec Loss 2.2626 LearningRate 0.000346 Epoch: 18 Global Step: 390670 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:51:32,296-Speed 2497.88 samples/sec Loss 2.2376 LearningRate 0.000346 Epoch: 18 Global Step: 390680 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:51:40,499-Speed 2497.21 samples/sec Loss 2.2075 LearningRate 0.000346 Epoch: 18 Global Step: 390690 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:51:48,706-Speed 2496.15 samples/sec Loss 2.1899 LearningRate 0.000346 Epoch: 18 Global Step: 390700 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 
07:51:56,909-Speed 2497.07 samples/sec Loss 2.2535 LearningRate 0.000346 Epoch: 18 Global Step: 390710 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:52:05,111-Speed 2497.33 samples/sec Loss 2.2270 LearningRate 0.000345 Epoch: 18 Global Step: 390720 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:52:13,261-Speed 2513.09 samples/sec Loss 2.2031 LearningRate 0.000345 Epoch: 18 Global Step: 390730 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:52:21,468-Speed 2496.16 samples/sec Loss 2.2837 LearningRate 0.000345 Epoch: 18 Global Step: 390740 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:52:29,665-Speed 2498.92 samples/sec Loss 2.2015 LearningRate 0.000345 Epoch: 18 Global Step: 390750 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:52:37,867-Speed 2497.18 samples/sec Loss 2.2331 LearningRate 0.000345 Epoch: 18 Global Step: 390760 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:52:46,069-Speed 2497.80 samples/sec Loss 2.2225 LearningRate 0.000345 Epoch: 18 Global Step: 390770 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:52:54,270-Speed 2497.63 samples/sec Loss 2.1903 LearningRate 0.000345 Epoch: 18 Global Step: 390780 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:53:02,419-Speed 2513.69 samples/sec Loss 2.2073 LearningRate 0.000345 Epoch: 18 Global Step: 390790 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:53:10,630-Speed 2494.44 samples/sec Loss 2.2245 LearningRate 0.000345 Epoch: 18 Global Step: 390800 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:53:18,841-Speed 2494.68 samples/sec Loss 2.2486 LearningRate 0.000345 Epoch: 18 Global Step: 390810 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:53:27,046-Speed 2496.80 samples/sec Loss 2.2421 LearningRate 0.000345 Epoch: 18 Global Step: 390820 Fp16 Grad Scale: 16384 Required: 100 hours Training: 
2022-07-09 07:53:35,245-Speed 2498.12 samples/sec Loss 2.2176 LearningRate 0.000345 Epoch: 18 Global Step: 390830 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:53:43,456-Speed 2494.71 samples/sec Loss 2.2576 LearningRate 0.000345 Epoch: 18 Global Step: 390840 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:53:51,605-Speed 2513.75 samples/sec Loss 2.2738 LearningRate 0.000345 Epoch: 18 Global Step: 390850 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:53:59,806-Speed 2497.48 samples/sec Loss 2.2271 LearningRate 0.000345 Epoch: 18 Global Step: 390860 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:54:08,006-Speed 2498.15 samples/sec Loss 2.2773 LearningRate 0.000345 Epoch: 18 Global Step: 390870 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:54:16,213-Speed 2495.66 samples/sec Loss 2.2370 LearningRate 0.000345 Epoch: 18 Global Step: 390880 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:54:24,424-Speed 2494.69 samples/sec Loss 2.2696 LearningRate 0.000345 Epoch: 18 Global Step: 390890 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:54:32,625-Speed 2497.64 samples/sec Loss 2.2346 LearningRate 0.000345 Epoch: 18 Global Step: 390900 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:54:40,774-Speed 2513.39 samples/sec Loss 2.2912 LearningRate 0.000345 Epoch: 18 Global Step: 390910 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:54:48,976-Speed 2497.42 samples/sec Loss 2.2872 LearningRate 0.000345 Epoch: 18 Global Step: 390920 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:54:57,177-Speed 2497.76 samples/sec Loss 2.2280 LearningRate 0.000345 Epoch: 18 Global Step: 390930 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:55:05,373-Speed 2498.98 samples/sec Loss 2.2555 LearningRate 0.000345 Epoch: 18 Global Step: 390940 Fp16 Grad Scale: 16384 Required: 100 hours 
Training: 2022-07-09 07:55:13,572-Speed 2498.36 samples/sec Loss 2.2290 LearningRate 0.000345 Epoch: 18 Global Step: 390950 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:55:21,771-Speed 2498.11 samples/sec Loss 2.2556 LearningRate 0.000345 Epoch: 18 Global Step: 390960 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:55:29,919-Speed 2514.41 samples/sec Loss 2.2794 LearningRate 0.000345 Epoch: 18 Global Step: 390970 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:55:38,118-Speed 2498.29 samples/sec Loss 2.2296 LearningRate 0.000345 Epoch: 18 Global Step: 390980 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:55:46,320-Speed 2497.29 samples/sec Loss 2.2285 LearningRate 0.000345 Epoch: 18 Global Step: 390990 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:55:54,522-Speed 2497.47 samples/sec Loss 2.2715 LearningRate 0.000345 Epoch: 18 Global Step: 391000 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:56:02,720-Speed 2498.51 samples/sec Loss 2.2758 LearningRate 0.000345 Epoch: 18 Global Step: 391010 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:56:10,923-Speed 2496.95 samples/sec Loss 2.2644 LearningRate 0.000345 Epoch: 18 Global Step: 391020 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:56:19,073-Speed 2513.41 samples/sec Loss 2.1954 LearningRate 0.000345 Epoch: 18 Global Step: 391030 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:56:27,275-Speed 2497.29 samples/sec Loss 2.2361 LearningRate 0.000345 Epoch: 18 Global Step: 391040 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:56:35,474-Speed 2498.20 samples/sec Loss 2.2403 LearningRate 0.000345 Epoch: 18 Global Step: 391050 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:56:43,697-Speed 2491.11 samples/sec Loss 2.2129 LearningRate 0.000345 Epoch: 18 Global Step: 391060 Fp16 Grad Scale: 16384 Required: 100 
hours Training: 2022-07-09 07:56:51,894-Speed 2498.69 samples/sec Loss 2.2326 LearningRate 0.000345 Epoch: 18 Global Step: 391070 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:57:00,095-Speed 2497.75 samples/sec Loss 2.2362 LearningRate 0.000345 Epoch: 18 Global Step: 391080 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:57:08,245-Speed 2513.27 samples/sec Loss 2.2869 LearningRate 0.000345 Epoch: 18 Global Step: 391090 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:57:16,443-Speed 2498.65 samples/sec Loss 2.2718 LearningRate 0.000345 Epoch: 18 Global Step: 391100 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:57:24,640-Speed 2498.79 samples/sec Loss 2.2804 LearningRate 0.000345 Epoch: 18 Global Step: 391110 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:57:32,865-Speed 2490.37 samples/sec Loss 2.2543 LearningRate 0.000345 Epoch: 18 Global Step: 391120 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:57:41,064-Speed 2498.09 samples/sec Loss 2.2347 LearningRate 0.000345 Epoch: 18 Global Step: 391130 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:57:49,266-Speed 2497.31 samples/sec Loss 2.2725 LearningRate 0.000345 Epoch: 18 Global Step: 391140 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:57:57,425-Speed 2510.36 samples/sec Loss 2.2537 LearningRate 0.000345 Epoch: 18 Global Step: 391150 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:58:05,624-Speed 2498.30 samples/sec Loss 2.2606 LearningRate 0.000345 Epoch: 18 Global Step: 391160 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:58:13,825-Speed 2497.65 samples/sec Loss 2.3112 LearningRate 0.000345 Epoch: 18 Global Step: 391170 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:58:22,031-Speed 2495.96 samples/sec Loss 2.2539 LearningRate 0.000345 Epoch: 18 Global Step: 391180 Fp16 Grad Scale: 16384 Required: 
100 hours Training: 2022-07-09 07:58:30,234-Speed 2497.19 samples/sec Loss 2.2212 LearningRate 0.000345 Epoch: 18 Global Step: 391190 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:58:38,440-Speed 2496.19 samples/sec Loss 2.2053 LearningRate 0.000345 Epoch: 18 Global Step: 391200 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:58:46,599-Speed 2510.30 samples/sec Loss 2.2461 LearningRate 0.000345 Epoch: 18 Global Step: 391210 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:58:54,804-Speed 2496.50 samples/sec Loss 2.2028 LearningRate 0.000345 Epoch: 18 Global Step: 391220 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:59:03,006-Speed 2497.23 samples/sec Loss 2.2793 LearningRate 0.000345 Epoch: 18 Global Step: 391230 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:59:11,206-Speed 2498.08 samples/sec Loss 2.3071 LearningRate 0.000345 Epoch: 18 Global Step: 391240 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:59:19,409-Speed 2496.98 samples/sec Loss 2.2459 LearningRate 0.000345 Epoch: 18 Global Step: 391250 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:59:27,611-Speed 2497.43 samples/sec Loss 2.3082 LearningRate 0.000345 Epoch: 18 Global Step: 391260 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:59:35,766-Speed 2511.70 samples/sec Loss 2.2578 LearningRate 0.000345 Epoch: 18 Global Step: 391270 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:59:43,974-Speed 2495.68 samples/sec Loss 2.3067 LearningRate 0.000345 Epoch: 18 Global Step: 391280 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 07:59:52,177-Speed 2496.93 samples/sec Loss 2.3362 LearningRate 0.000345 Epoch: 18 Global Step: 391290 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:00:00,382-Speed 2496.99 samples/sec Loss 2.3400 LearningRate 0.000345 Epoch: 18 Global Step: 391300 Fp16 Grad Scale: 16384 
Required: 100 hours Training: 2022-07-09 08:00:08,587-Speed 2496.32 samples/sec Loss 2.3168 LearningRate 0.000345 Epoch: 18 Global Step: 391310 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:00:16,792-Speed 2496.43 samples/sec Loss 2.2798 LearningRate 0.000345 Epoch: 18 Global Step: 391320 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:00:24,947-Speed 2511.61 samples/sec Loss 2.2078 LearningRate 0.000345 Epoch: 18 Global Step: 391330 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:00:33,155-Speed 2495.73 samples/sec Loss 2.2339 LearningRate 0.000345 Epoch: 18 Global Step: 391340 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:00:41,373-Speed 2492.40 samples/sec Loss 2.2450 LearningRate 0.000344 Epoch: 18 Global Step: 391350 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:00:49,570-Speed 2498.70 samples/sec Loss 2.2656 LearningRate 0.000344 Epoch: 18 Global Step: 391360 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:00:57,778-Speed 2495.77 samples/sec Loss 2.2626 LearningRate 0.000344 Epoch: 18 Global Step: 391370 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:01:05,976-Speed 2498.41 samples/sec Loss 2.2214 LearningRate 0.000344 Epoch: 18 Global Step: 391380 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:01:14,122-Speed 2514.60 samples/sec Loss 2.2842 LearningRate 0.000344 Epoch: 18 Global Step: 391390 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:01:22,323-Speed 2497.79 samples/sec Loss 2.2647 LearningRate 0.000344 Epoch: 18 Global Step: 391400 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:01:30,526-Speed 2497.02 samples/sec Loss 2.2555 LearningRate 0.000344 Epoch: 18 Global Step: 391410 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:01:38,741-Speed 2493.34 samples/sec Loss 2.2754 LearningRate 0.000344 Epoch: 18 Global Step: 391420 Fp16 Grad Scale: 
16384 Required: 100 hours Training: 2022-07-09 08:01:46,966-Speed 2490.47 samples/sec Loss 2.2172 LearningRate 0.000344 Epoch: 18 Global Step: 391430 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:01:55,167-Speed 2497.66 samples/sec Loss 2.2419 LearningRate 0.000344 Epoch: 18 Global Step: 391440 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:02:03,318-Speed 2512.74 samples/sec Loss 2.2199 LearningRate 0.000344 Epoch: 18 Global Step: 391450 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:02:11,522-Speed 2496.75 samples/sec Loss 2.1915 LearningRate 0.000344 Epoch: 18 Global Step: 391460 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:02:19,724-Speed 2497.37 samples/sec Loss 2.2538 LearningRate 0.000344 Epoch: 18 Global Step: 391470 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:02:27,928-Speed 2496.50 samples/sec Loss 2.2525 LearningRate 0.000344 Epoch: 18 Global Step: 391480 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:02:36,129-Speed 2497.51 samples/sec Loss 2.2347 LearningRate 0.000344 Epoch: 18 Global Step: 391490 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:02:44,335-Speed 2496.45 samples/sec Loss 2.1966 LearningRate 0.000344 Epoch: 18 Global Step: 391500 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:02:52,482-Speed 2514.13 samples/sec Loss 2.2046 LearningRate 0.000344 Epoch: 18 Global Step: 391510 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:03:00,684-Speed 2497.32 samples/sec Loss 2.2164 LearningRate 0.000344 Epoch: 18 Global Step: 391520 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:03:08,885-Speed 2497.67 samples/sec Loss 2.2183 LearningRate 0.000344 Epoch: 18 Global Step: 391530 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:03:17,086-Speed 2497.71 samples/sec Loss 2.2533 LearningRate 0.000344 Epoch: 18 Global Step: 391540 Fp16 Grad 
Scale: 16384 Required: 100 hours Training: 2022-07-09 08:03:25,289-Speed 2496.98 samples/sec Loss 2.2796 LearningRate 0.000344 Epoch: 18 Global Step: 391550 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:03:33,489-Speed 2498.02 samples/sec Loss 2.2347 LearningRate 0.000344 Epoch: 18 Global Step: 391560 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:03:41,635-Speed 2514.45 samples/sec Loss 2.2561 LearningRate 0.000344 Epoch: 18 Global Step: 391570 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:03:49,847-Speed 2494.54 samples/sec Loss 2.2745 LearningRate 0.000344 Epoch: 18 Global Step: 391580 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:03:58,048-Speed 2497.70 samples/sec Loss 2.2294 LearningRate 0.000344 Epoch: 18 Global Step: 391590 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:04:06,251-Speed 2496.85 samples/sec Loss 2.2042 LearningRate 0.000344 Epoch: 18 Global Step: 391600 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:04:14,454-Speed 2497.14 samples/sec Loss 2.2038 LearningRate 0.000344 Epoch: 18 Global Step: 391610 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:04:22,653-Speed 2498.19 samples/sec Loss 2.2336 LearningRate 0.000344 Epoch: 18 Global Step: 391620 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:04:30,816-Speed 2509.36 samples/sec Loss 2.2153 LearningRate 0.000344 Epoch: 18 Global Step: 391630 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:04:39,016-Speed 2498.07 samples/sec Loss 2.2083 LearningRate 0.000344 Epoch: 18 Global Step: 391640 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:04:47,220-Speed 2496.99 samples/sec Loss 2.3133 LearningRate 0.000344 Epoch: 18 Global Step: 391650 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:04:55,419-Speed 2498.30 samples/sec Loss 2.2981 LearningRate 0.000344 Epoch: 18 Global Step: 391660 Fp16 
Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:05:03,621-Speed 2497.28 samples/sec Loss 2.2324 LearningRate 0.000344 Epoch: 18 Global Step: 391670 Fp16 Grad Scale: 16384 Required: 100 hours Training: 2022-07-09 08:05:11,824-Speed 2496.91 samples/sec Loss 2.2388 LearningRate 0.000344 Epoch: 18 Global Step: 391680 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:05:20,357-Speed 2516.45 samples/sec Loss 2.2467 LearningRate 0.000344 Epoch: 18 Global Step: 391690 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:05:28,606-Speed 2499.82 samples/sec Loss 2.2038 LearningRate 0.000344 Epoch: 18 Global Step: 391700 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:05:39,496-Speed 1880.76 samples/sec Loss 2.2426 LearningRate 0.000344 Epoch: 18 Global Step: 391710 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:05:47,694-Speed 2498.16 samples/sec Loss 2.2450 LearningRate 0.000344 Epoch: 18 Global Step: 391720 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:05:55,960-Speed 2499.17 samples/sec Loss 2.2416 LearningRate 0.000344 Epoch: 18 Global Step: 391730 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:06:04,185-Speed 2498.12 samples/sec Loss 2.2438 LearningRate 0.000344 Epoch: 18 Global Step: 391740 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:06:14,786-Speed 1932.19 samples/sec Loss 2.2196 LearningRate 0.000344 Epoch: 18 Global Step: 391750 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:06:22,993-Speed 2495.75 samples/sec Loss 2.2099 LearningRate 0.000344 Epoch: 18 Global Step: 391760 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:06:31,451-Speed 2496.51 samples/sec Loss 2.2044 LearningRate 0.000344 Epoch: 18 Global Step: 391770 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:06:39,709-Speed 2495.79 samples/sec Loss 2.2407 LearningRate 0.000344 Epoch: 18 Global Step: 391780 
Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:06:51,605-Speed 1721.77 samples/sec Loss 2.2199 LearningRate 0.000344 Epoch: 18 Global Step: 391790 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:06:59,866-Speed 2496.87 samples/sec Loss 2.2672 LearningRate 0.000344 Epoch: 18 Global Step: 391800 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:07:08,063-Speed 2513.24 samples/sec Loss 2.2077 LearningRate 0.000344 Epoch: 18 Global Step: 391810 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:07:16,530-Speed 2497.04 samples/sec Loss 2.2070 LearningRate 0.000344 Epoch: 18 Global Step: 391820 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:07:24,735-Speed 2496.36 samples/sec Loss 2.2801 LearningRate 0.000344 Epoch: 18 Global Step: 391830 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:07:32,937-Speed 2497.13 samples/sec Loss 2.2600 LearningRate 0.000344 Epoch: 18 Global Step: 391840 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:07:41,178-Speed 2500.13 samples/sec Loss 2.2621 LearningRate 0.000344 Epoch: 18 Global Step: 391850 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:07:49,400-Speed 2500.20 samples/sec Loss 2.2126 LearningRate 0.000344 Epoch: 18 Global Step: 391860 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:08:01,386-Speed 1708.74 samples/sec Loss 2.2771 LearningRate 0.000344 Epoch: 18 Global Step: 391870 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:08:09,584-Speed 2498.52 samples/sec Loss 2.2466 LearningRate 0.000344 Epoch: 18 Global Step: 391880 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:08:17,819-Speed 2503.77 samples/sec Loss 2.2761 LearningRate 0.000344 Epoch: 18 Global Step: 391890 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:08:27,499-Speed 2132.58 samples/sec Loss 2.2608 LearningRate 0.000344 Epoch: 18 Global Step: 
391900 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:08:35,699-Speed 2497.68 samples/sec Loss 2.2509 LearningRate 0.000344 Epoch: 18 Global Step: 391910 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:09:11,139-Speed 577.90 samples/sec Loss 2.2482 LearningRate 0.000344 Epoch: 18 Global Step: 391920 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:09:19,850-Speed 2516.47 samples/sec Loss 2.2583 LearningRate 0.000344 Epoch: 18 Global Step: 391930 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:09:28,094-Speed 2501.85 samples/sec Loss 2.2997 LearningRate 0.000344 Epoch: 18 Global Step: 391940 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:09:36,300-Speed 2496.01 samples/sec Loss 2.2498 LearningRate 0.000344 Epoch: 18 Global Step: 391950 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:09:44,507-Speed 2495.66 samples/sec Loss 2.2558 LearningRate 0.000344 Epoch: 18 Global Step: 391960 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:09:52,721-Speed 2493.62 samples/sec Loss 2.2524 LearningRate 0.000344 Epoch: 18 Global Step: 391970 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:10:00,941-Speed 2491.95 samples/sec Loss 2.2264 LearningRate 0.000344 Epoch: 18 Global Step: 391980 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:10:09,121-Speed 2503.90 samples/sec Loss 2.2510 LearningRate 0.000343 Epoch: 18 Global Step: 391990 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:10:17,366-Speed 2484.37 samples/sec Loss 2.2119 LearningRate 0.000343 Epoch: 18 Global Step: 392000 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:10:25,591-Speed 2490.48 samples/sec Loss 2.2115 LearningRate 0.000343 Epoch: 18 Global Step: 392010 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:10:33,813-Speed 2491.33 samples/sec Loss 2.2480 LearningRate 0.000343 Epoch: 18 Global 
Step: 392020 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:10:42,044-Speed 2488.48 samples/sec Loss 2.1958 LearningRate 0.000343 Epoch: 18 Global Step: 392030 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:10:50,265-Speed 2491.55 samples/sec Loss 2.2470 LearningRate 0.000343 Epoch: 18 Global Step: 392040 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:10:58,421-Speed 2511.57 samples/sec Loss 2.2276 LearningRate 0.000343 Epoch: 18 Global Step: 392050 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:11:06,632-Speed 2494.54 samples/sec Loss 2.2299 LearningRate 0.000343 Epoch: 18 Global Step: 392060 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:11:14,838-Speed 2496.14 samples/sec Loss 2.2449 LearningRate 0.000343 Epoch: 18 Global Step: 392070 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:11:23,041-Speed 2497.40 samples/sec Loss 2.2481 LearningRate 0.000343 Epoch: 18 Global Step: 392080 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:11:31,248-Speed 2495.68 samples/sec Loss 2.2096 LearningRate 0.000343 Epoch: 18 Global Step: 392090 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:11:39,451-Speed 2497.14 samples/sec Loss 2.2486 LearningRate 0.000343 Epoch: 18 Global Step: 392100 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:11:47,599-Speed 2514.06 samples/sec Loss 2.2375 LearningRate 0.000343 Epoch: 18 Global Step: 392110 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:11:55,803-Speed 2496.88 samples/sec Loss 2.2270 LearningRate 0.000343 Epoch: 18 Global Step: 392120 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:12:04,004-Speed 2497.43 samples/sec Loss 2.2584 LearningRate 0.000343 Epoch: 18 Global Step: 392130 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:12:12,208-Speed 2496.85 samples/sec Loss 2.2504 LearningRate 0.000343 Epoch: 18 
Global Step: 392140 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:12:20,415-Speed 2495.74 samples/sec Loss 2.2692 LearningRate 0.000343 Epoch: 18 Global Step: 392150 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:12:28,618-Speed 2496.90 samples/sec Loss 2.2236 LearningRate 0.000343 Epoch: 18 Global Step: 392160 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:12:36,766-Speed 2514.03 samples/sec Loss 2.2138 LearningRate 0.000343 Epoch: 18 Global Step: 392170 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:12:44,969-Speed 2497.09 samples/sec Loss 2.2656 LearningRate 0.000343 Epoch: 18 Global Step: 392180 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:12:53,172-Speed 2497.13 samples/sec Loss 2.2879 LearningRate 0.000343 Epoch: 18 Global Step: 392190 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:13:01,377-Speed 2496.38 samples/sec Loss 2.2636 LearningRate 0.000343 Epoch: 18 Global Step: 392200 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:13:09,586-Speed 2495.41 samples/sec Loss 2.1877 LearningRate 0.000343 Epoch: 18 Global Step: 392210 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:13:17,791-Speed 2496.27 samples/sec Loss 2.2453 LearningRate 0.000343 Epoch: 18 Global Step: 392220 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:13:25,970-Speed 2504.38 samples/sec Loss 2.2444 LearningRate 0.000343 Epoch: 18 Global Step: 392230 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:13:34,179-Speed 2495.10 samples/sec Loss 2.2742 LearningRate 0.000343 Epoch: 18 Global Step: 392240 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:13:42,402-Speed 2491.24 samples/sec Loss 2.2524 LearningRate 0.000343 Epoch: 18 Global Step: 392250 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:13:50,608-Speed 2496.19 samples/sec Loss 2.2502 LearningRate 0.000343 
Epoch: 18 Global Step: 392260 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:13:58,818-Speed 2494.75 samples/sec Loss 2.3320 LearningRate 0.000343 Epoch: 18 Global Step: 392270 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:14:07,028-Speed 2494.99 samples/sec Loss 2.2532 LearningRate 0.000343 Epoch: 18 Global Step: 392280 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:14:15,181-Speed 2512.29 samples/sec Loss 2.3117 LearningRate 0.000343 Epoch: 18 Global Step: 392290 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:14:23,388-Speed 2495.59 samples/sec Loss 2.2582 LearningRate 0.000343 Epoch: 18 Global Step: 392300 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:14:31,606-Speed 2492.47 samples/sec Loss 2.2791 LearningRate 0.000343 Epoch: 18 Global Step: 392310 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:14:39,808-Speed 2497.35 samples/sec Loss 2.2506 LearningRate 0.000343 Epoch: 18 Global Step: 392320 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:14:48,014-Speed 2495.96 samples/sec Loss 2.2613 LearningRate 0.000343 Epoch: 18 Global Step: 392330 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:14:56,220-Speed 2496.38 samples/sec Loss 2.2578 LearningRate 0.000343 Epoch: 18 Global Step: 392340 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:15:04,377-Speed 2511.17 samples/sec Loss 2.2226 LearningRate 0.000343 Epoch: 18 Global Step: 392350 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:15:12,579-Speed 2497.16 samples/sec Loss 2.2445 LearningRate 0.000343 Epoch: 18 Global Step: 392360 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:15:20,781-Speed 2497.19 samples/sec Loss 2.2808 LearningRate 0.000343 Epoch: 18 Global Step: 392370 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:15:28,986-Speed 2496.43 samples/sec Loss 2.2610 LearningRate 
0.000343 Epoch: 18 Global Step: 392380 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:15:37,197-Speed 2494.42 samples/sec Loss 2.2890 LearningRate 0.000343 Epoch: 18 Global Step: 392390 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:15:45,451-Speed 2481.71 samples/sec Loss 2.2652 LearningRate 0.000343 Epoch: 18 Global Step: 392400 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:15:53,604-Speed 2512.12 samples/sec Loss 2.2765 LearningRate 0.000343 Epoch: 18 Global Step: 392410 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:16:01,816-Speed 2494.43 samples/sec Loss 2.2555 LearningRate 0.000343 Epoch: 18 Global Step: 392420 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:16:10,018-Speed 2498.13 samples/sec Loss 2.2537 LearningRate 0.000343 Epoch: 18 Global Step: 392430 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:16:18,222-Speed 2496.46 samples/sec Loss 2.3056 LearningRate 0.000343 Epoch: 18 Global Step: 392440 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:16:26,426-Speed 2497.05 samples/sec Loss 2.2287 LearningRate 0.000343 Epoch: 18 Global Step: 392450 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:16:34,636-Speed 2495.14 samples/sec Loss 2.2778 LearningRate 0.000343 Epoch: 18 Global Step: 392460 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:16:42,792-Speed 2511.43 samples/sec Loss 2.2743 LearningRate 0.000343 Epoch: 18 Global Step: 392470 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:16:51,002-Speed 2494.93 samples/sec Loss 2.2452 LearningRate 0.000343 Epoch: 18 Global Step: 392480 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:16:59,209-Speed 2495.92 samples/sec Loss 2.2745 LearningRate 0.000343 Epoch: 18 Global Step: 392490 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:17:07,417-Speed 2496.03 samples/sec Loss 2.2865 
LearningRate 0.000343 Epoch: 18 Global Step: 392500 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:17:15,620-Speed 2497.01 samples/sec Loss 2.2319 LearningRate 0.000343 Epoch: 18 Global Step: 392510 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:17:23,827-Speed 2495.82 samples/sec Loss 2.2815 LearningRate 0.000343 Epoch: 18 Global Step: 392520 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:17:31,979-Speed 2512.89 samples/sec Loss 2.2576 LearningRate 0.000343 Epoch: 18 Global Step: 392530 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:17:40,184-Speed 2496.45 samples/sec Loss 2.2833 LearningRate 0.000343 Epoch: 18 Global Step: 392540 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:17:48,388-Speed 2496.72 samples/sec Loss 2.2354 LearningRate 0.000343 Epoch: 18 Global Step: 392550 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:17:56,602-Speed 2493.71 samples/sec Loss 2.2271 LearningRate 0.000343 Epoch: 18 Global Step: 392560 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:18:04,806-Speed 2496.57 samples/sec Loss 2.2416 LearningRate 0.000343 Epoch: 18 Global Step: 392570 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:18:13,012-Speed 2496.33 samples/sec Loss 2.2777 LearningRate 0.000343 Epoch: 18 Global Step: 392580 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:18:21,187-Speed 2505.59 samples/sec Loss 2.2413 LearningRate 0.000343 Epoch: 18 Global Step: 392590 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:18:29,392-Speed 2496.30 samples/sec Loss 2.2660 LearningRate 0.000343 Epoch: 18 Global Step: 392600 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:18:37,596-Speed 2496.91 samples/sec Loss 2.2641 LearningRate 0.000343 Epoch: 18 Global Step: 392610 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:18:45,800-Speed 2496.62 samples/sec Loss 
2.2590 LearningRate 0.000343 Epoch: 18 Global Step: 392620 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:18:54,003-Speed 2496.97 samples/sec Loss 2.2571 LearningRate 0.000342 Epoch: 18 Global Step: 392630 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:19:02,206-Speed 2497.04 samples/sec Loss 2.2445 LearningRate 0.000342 Epoch: 18 Global Step: 392640 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:19:10,358-Speed 2512.66 samples/sec Loss 2.2697 LearningRate 0.000342 Epoch: 18 Global Step: 392650 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:19:18,562-Speed 2496.78 samples/sec Loss 2.2261 LearningRate 0.000342 Epoch: 18 Global Step: 392660 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:19:26,769-Speed 2495.70 samples/sec Loss 2.2858 LearningRate 0.000342 Epoch: 18 Global Step: 392670 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:19:34,971-Speed 2496.91 samples/sec Loss 2.2649 LearningRate 0.000342 Epoch: 18 Global Step: 392680 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:19:43,187-Speed 2493.37 samples/sec Loss 2.2457 LearningRate 0.000342 Epoch: 18 Global Step: 392690 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:19:51,400-Speed 2494.02 samples/sec Loss 2.2939 LearningRate 0.000342 Epoch: 18 Global Step: 392700 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:19:59,552-Speed 2512.76 samples/sec Loss 2.2420 LearningRate 0.000342 Epoch: 18 Global Step: 392710 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:20:07,758-Speed 2496.03 samples/sec Loss 2.2456 LearningRate 0.000342 Epoch: 18 Global Step: 392720 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:20:15,961-Speed 2496.73 samples/sec Loss 2.2321 LearningRate 0.000342 Epoch: 18 Global Step: 392730 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:20:24,166-Speed 2496.52 samples/sec 
Loss 2.2230 LearningRate 0.000342 Epoch: 18 Global Step: 392740 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:20:32,364-Speed 2498.40 samples/sec Loss 2.2212 LearningRate 0.000342 Epoch: 18 Global Step: 392750 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:20:40,570-Speed 2496.20 samples/sec Loss 2.1938 LearningRate 0.000342 Epoch: 18 Global Step: 392760 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:20:48,729-Speed 2510.37 samples/sec Loss 2.1889 LearningRate 0.000342 Epoch: 18 Global Step: 392770 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:20:56,930-Speed 2497.67 samples/sec Loss 2.2512 LearningRate 0.000342 Epoch: 18 Global Step: 392780 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:21:05,136-Speed 2496.20 samples/sec Loss 2.2298 LearningRate 0.000342 Epoch: 18 Global Step: 392790 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:21:13,340-Speed 2496.65 samples/sec Loss 2.2453 LearningRate 0.000342 Epoch: 18 Global Step: 392800 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:21:21,551-Speed 2494.73 samples/sec Loss 2.2474 LearningRate 0.000342 Epoch: 18 Global Step: 392810 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:21:29,754-Speed 2496.79 samples/sec Loss 2.1815 LearningRate 0.000342 Epoch: 18 Global Step: 392820 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:21:37,905-Speed 2513.53 samples/sec Loss 2.2235 LearningRate 0.000342 Epoch: 18 Global Step: 392830 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:21:46,106-Speed 2497.33 samples/sec Loss 2.1872 LearningRate 0.000342 Epoch: 18 Global Step: 392840 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:21:54,305-Speed 2498.38 samples/sec Loss 2.2065 LearningRate 0.000342 Epoch: 18 Global Step: 392850 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:22:02,508-Speed 2497.09 
samples/sec Loss 2.2501 LearningRate 0.000342 Epoch: 18 Global Step: 392860 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:22:10,714-Speed 2496.07 samples/sec Loss 2.2223 LearningRate 0.000342 Epoch: 18 Global Step: 392870 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:22:18,913-Speed 2498.37 samples/sec Loss 2.2931 LearningRate 0.000342 Epoch: 18 Global Step: 392880 Fp16 Grad Scale: 65536 Required: 100 hours Training: 2022-07-09 08:22:27,059-Speed 2514.37 samples/sec Loss 2.2846 LearningRate 0.000342 Epoch: 18 Global Step: 392890 Fp16 Grad Scale: 65536 Required: 100 hours Training: 2022-07-09 08:22:35,261-Speed 2497.14 samples/sec Loss 2.2354 LearningRate 0.000342 Epoch: 18 Global Step: 392900 Fp16 Grad Scale: 65536 Required: 100 hours Training: 2022-07-09 08:22:43,472-Speed 2494.71 samples/sec Loss 2.2641 LearningRate 0.000342 Epoch: 18 Global Step: 392910 Fp16 Grad Scale: 65536 Required: 100 hours Training: 2022-07-09 08:22:51,675-Speed 2497.11 samples/sec Loss 2.1847 LearningRate 0.000342 Epoch: 18 Global Step: 392920 Fp16 Grad Scale: 65536 Required: 100 hours Training: 2022-07-09 08:22:59,835-Speed 2510.36 samples/sec Loss 2.2244 LearningRate 0.000342 Epoch: 18 Global Step: 392930 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:23:08,034-Speed 2497.99 samples/sec Loss 2.2924 LearningRate 0.000342 Epoch: 18 Global Step: 392940 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:23:16,183-Speed 2513.68 samples/sec Loss 2.2449 LearningRate 0.000342 Epoch: 18 Global Step: 392950 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:23:24,384-Speed 2497.66 samples/sec Loss 2.2891 LearningRate 0.000342 Epoch: 18 Global Step: 392960 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:23:32,587-Speed 2496.94 samples/sec Loss 2.2751 LearningRate 0.000342 Epoch: 18 Global Step: 392970 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:23:40,786-Speed 
2498.18 samples/sec Loss 2.2278 LearningRate 0.000342 Epoch: 18 Global Step: 392980 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:23:48,987-Speed 2497.66 samples/sec Loss 2.1992 LearningRate 0.000342 Epoch: 18 Global Step: 392990 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:23:57,190-Speed 2496.98 samples/sec Loss 2.2638 LearningRate 0.000342 Epoch: 18 Global Step: 393000 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:24:05,336-Speed 2514.44 samples/sec Loss 2.2585 LearningRate 0.000342 Epoch: 18 Global Step: 393010 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:24:13,536-Speed 2498.34 samples/sec Loss 2.2116 LearningRate 0.000342 Epoch: 18 Global Step: 393020 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:24:21,739-Speed 2497.08 samples/sec Loss 2.2151 LearningRate 0.000342 Epoch: 18 Global Step: 393030 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:24:29,941-Speed 2497.50 samples/sec Loss 2.2825 LearningRate 0.000342 Epoch: 18 Global Step: 393040 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:24:38,148-Speed 2495.80 samples/sec Loss 2.2057 LearningRate 0.000342 Epoch: 18 Global Step: 393050 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:24:46,360-Speed 2494.23 samples/sec Loss 2.2332 LearningRate 0.000342 Epoch: 18 Global Step: 393060 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:24:54,517-Speed 2510.94 samples/sec Loss 2.2402 LearningRate 0.000342 Epoch: 18 Global Step: 393070 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:25:02,722-Speed 2496.55 samples/sec Loss 2.2382 LearningRate 0.000342 Epoch: 18 Global Step: 393080 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:25:10,920-Speed 2498.43 samples/sec Loss 2.2545 LearningRate 0.000342 Epoch: 18 Global Step: 393090 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 
08:25:19,124-Speed 2496.76 samples/sec Loss 2.2825 LearningRate 0.000342 Epoch: 18 Global Step: 393100 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:25:27,331-Speed 2496.09 samples/sec Loss 2.2298 LearningRate 0.000342 Epoch: 18 Global Step: 393110 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:25:35,532-Speed 2497.47 samples/sec Loss 2.1995 LearningRate 0.000342 Epoch: 18 Global Step: 393120 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:25:43,694-Speed 2509.68 samples/sec Loss 2.2482 LearningRate 0.000342 Epoch: 18 Global Step: 393130 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:25:51,899-Speed 2496.41 samples/sec Loss 2.2681 LearningRate 0.000342 Epoch: 18 Global Step: 393140 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:26:00,101-Speed 2497.39 samples/sec Loss 2.2412 LearningRate 0.000342 Epoch: 18 Global Step: 393150 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:26:08,305-Speed 2496.76 samples/sec Loss 2.2390 LearningRate 0.000342 Epoch: 18 Global Step: 393160 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:26:16,505-Speed 2497.63 samples/sec Loss 2.2046 LearningRate 0.000342 Epoch: 18 Global Step: 393170 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:26:24,711-Speed 2496.38 samples/sec Loss 2.2623 LearningRate 0.000342 Epoch: 18 Global Step: 393180 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:26:32,859-Speed 2513.70 samples/sec Loss 2.2309 LearningRate 0.000342 Epoch: 18 Global Step: 393190 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:26:41,060-Speed 2498.13 samples/sec Loss 2.2624 LearningRate 0.000342 Epoch: 18 Global Step: 393200 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:26:49,267-Speed 2496.04 samples/sec Loss 2.2799 LearningRate 0.000342 Epoch: 18 Global Step: 393210 Fp16 Grad Scale: 32768 Required: 100 hours Training: 
2022-07-09 08:26:57,467-Speed 2497.96 samples/sec Loss 2.2614 LearningRate 0.000342 Epoch: 18 Global Step: 393220 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:27:05,666-Speed 2498.17 samples/sec Loss 2.2318 LearningRate 0.000342 Epoch: 18 Global Step: 393230 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:27:13,871-Speed 2496.41 samples/sec Loss 2.1940 LearningRate 0.000342 Epoch: 18 Global Step: 393240 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:27:22,020-Speed 2513.86 samples/sec Loss 2.2325 LearningRate 0.000342 Epoch: 18 Global Step: 393250 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:27:30,222-Speed 2497.59 samples/sec Loss 2.2334 LearningRate 0.000342 Epoch: 18 Global Step: 393260 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:27:38,425-Speed 2496.96 samples/sec Loss 2.2249 LearningRate 0.000341 Epoch: 18 Global Step: 393270 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:27:46,630-Speed 2496.27 samples/sec Loss 2.2262 LearningRate 0.000341 Epoch: 18 Global Step: 393280 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:27:54,836-Speed 2496.29 samples/sec Loss 2.1650 LearningRate 0.000341 Epoch: 18 Global Step: 393290 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:28:03,038-Speed 2497.12 samples/sec Loss 2.2156 LearningRate 0.000341 Epoch: 18 Global Step: 393300 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:28:11,186-Speed 2513.77 samples/sec Loss 2.2431 LearningRate 0.000341 Epoch: 18 Global Step: 393310 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:28:19,428-Speed 2485.53 samples/sec Loss 2.2129 LearningRate 0.000341 Epoch: 18 Global Step: 393320 Fp16 Grad Scale: 32768 Required: 100 hours Training: 2022-07-09 08:28:27,627-Speed 2498.12 samples/sec Loss 2.2381 LearningRate 0.000341 Epoch: 18 Global Step: 393330 Fp16 Grad Scale: 32768 Required: 100 hours 
Training: 2022-07-09 08:28:35,831-Speed 2496.61 samples/sec Loss 2.2192 LearningRate 0.000341 Epoch: 18 Global Step: 393340 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:28:44,035-Speed 2496.94 samples/sec Loss 2.2114 LearningRate 0.000341 Epoch: 18 Global Step: 393350 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:28:52,243-Speed 2495.48 samples/sec Loss 2.2601 LearningRate 0.000341 Epoch: 18 Global Step: 393360 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:29:00,388-Speed 2514.68 samples/sec Loss 2.2696 LearningRate 0.000341 Epoch: 18 Global Step: 393370 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:29:08,588-Speed 2498.12 samples/sec Loss 2.2664 LearningRate 0.000341 Epoch: 18 Global Step: 393380 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:29:16,787-Speed 2498.03 samples/sec Loss 2.2422 LearningRate 0.000341 Epoch: 18 Global Step: 393390 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:29:24,994-Speed 2495.73 samples/sec Loss 2.2274 LearningRate 0.000341 Epoch: 18 Global Step: 393400 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:29:33,195-Speed 2497.62 samples/sec Loss 2.2008 LearningRate 0.000341 Epoch: 18 Global Step: 393410 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:29:41,407-Speed 2494.28 samples/sec Loss 2.2185 LearningRate 0.000341 Epoch: 18 Global Step: 393420 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:29:49,560-Speed 2512.20 samples/sec Loss 2.2791 LearningRate 0.000341 Epoch: 18 Global Step: 393430 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:29:57,759-Speed 2498.35 samples/sec Loss 2.2731 LearningRate 0.000341 Epoch: 18 Global Step: 393440 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:30:05,960-Speed 2497.67 samples/sec Loss 2.2220 LearningRate 0.000341 Epoch: 18 Global Step: 393450 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:30:14,158-Speed 2498.37 samples/sec Loss 2.2516 LearningRate 0.000341 Epoch: 18 Global Step: 393460 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:30:22,356-Speed 2498.52 samples/sec Loss 2.2265 LearningRate 0.000341 Epoch: 18 Global Step: 393470 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:30:30,557-Speed 2497.90 samples/sec Loss 2.2326 LearningRate 0.000341 Epoch: 18 Global Step: 393480 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:30:38,706-Speed 2513.47 samples/sec Loss 2.2224 LearningRate 0.000341 Epoch: 18 Global Step: 393490 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:30:46,919-Speed 2494.91 samples/sec Loss 2.1898 LearningRate 0.000341 Epoch: 18 Global Step: 393500 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:30:55,122-Speed 2496.94 samples/sec Loss 2.1830 LearningRate 0.000341 Epoch: 18 Global Step: 393510 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:31:03,322-Speed 2498.12 samples/sec Loss 2.2470 LearningRate 0.000341 Epoch: 18 Global Step: 393520 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:31:11,524-Speed 2497.29 samples/sec Loss 2.1950 LearningRate 0.000341 Epoch: 18 Global Step: 393530 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:31:19,731-Speed 2495.84 samples/sec Loss 2.2307 LearningRate 0.000341 Epoch: 18 Global Step: 393540 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:31:27,879-Speed 2514.04 samples/sec Loss 2.2217 LearningRate 0.000341 Epoch: 18 Global Step: 393550 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:31:36,093-Speed 2493.62 samples/sec Loss 2.2468 LearningRate 0.000341 Epoch: 18 Global Step: 393560 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:31:44,294-Speed 2497.57 samples/sec Loss 2.2568 LearningRate 0.000341 Epoch: 18 Global Step: 393570 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:31:52,498-Speed 2496.72 samples/sec Loss 2.2422 LearningRate 0.000341 Epoch: 18 Global Step: 393580 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:32:00,699-Speed 2497.72 samples/sec Loss 2.2194 LearningRate 0.000341 Epoch: 18 Global Step: 393590 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:32:08,899-Speed 2498.05 samples/sec Loss 2.2258 LearningRate 0.000341 Epoch: 18 Global Step: 393600 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:32:17,049-Speed 2513.22 samples/sec Loss 2.2188 LearningRate 0.000341 Epoch: 18 Global Step: 393610 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:32:25,250-Speed 2497.57 samples/sec Loss 2.1972 LearningRate 0.000341 Epoch: 18 Global Step: 393620 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:32:33,453-Speed 2497.08 samples/sec Loss 2.2332 LearningRate 0.000341 Epoch: 18 Global Step: 393630 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:32:41,655-Speed 2497.32 samples/sec Loss 2.2498 LearningRate 0.000341 Epoch: 18 Global Step: 393640 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:32:49,855-Speed 2498.04 samples/sec Loss 2.2120 LearningRate 0.000341 Epoch: 18 Global Step: 393650 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:32:58,051-Speed 2499.09 samples/sec Loss 2.2012 LearningRate 0.000341 Epoch: 18 Global Step: 393660 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:33:06,197-Speed 2514.84 samples/sec Loss 2.2267 LearningRate 0.000341 Epoch: 18 Global Step: 393670 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:33:14,396-Speed 2498.48 samples/sec Loss 2.2289 LearningRate 0.000341 Epoch: 18 Global Step: 393680 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:33:22,595-Speed 2498.25 samples/sec Loss 2.2116 LearningRate 0.000341 Epoch: 18 Global Step: 393690 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:33:30,807-Speed 2494.81 samples/sec Loss 2.2085 LearningRate 0.000341 Epoch: 18 Global Step: 393700 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:33:39,007-Speed 2497.76 samples/sec Loss 2.2133 LearningRate 0.000341 Epoch: 18 Global Step: 393710 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:33:47,208-Speed 2497.66 samples/sec Loss 2.2828 LearningRate 0.000341 Epoch: 18 Global Step: 393720 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:33:55,354-Speed 2514.68 samples/sec Loss 2.2302 LearningRate 0.000341 Epoch: 18 Global Step: 393730 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:34:03,556-Speed 2497.31 samples/sec Loss 2.2157 LearningRate 0.000341 Epoch: 18 Global Step: 393740 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:34:11,770-Speed 2496.41 samples/sec Loss 2.2444 LearningRate 0.000341 Epoch: 18 Global Step: 393750 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:34:19,973-Speed 2496.86 samples/sec Loss 2.2318 LearningRate 0.000341 Epoch: 18 Global Step: 393760 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:34:28,177-Speed 2497.01 samples/sec Loss 2.2218 LearningRate 0.000341 Epoch: 18 Global Step: 393770 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:34:36,378-Speed 2497.81 samples/sec Loss 2.2188 LearningRate 0.000341 Epoch: 18 Global Step: 393780 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:34:44,521-Speed 2515.43 samples/sec Loss 2.2144 LearningRate 0.000341 Epoch: 18 Global Step: 393790 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:34:52,733-Speed 2494.64 samples/sec Loss 2.2487 LearningRate 0.000341 Epoch: 18 Global Step: 393800 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:35:00,938-Speed 2496.47 samples/sec Loss 2.2036 LearningRate 0.000341 Epoch: 18 Global Step: 393810 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:35:09,148-Speed 2494.80 samples/sec Loss 2.2520 LearningRate 0.000341 Epoch: 18 Global Step: 393820 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:35:17,354-Speed 2496.02 samples/sec Loss 2.2572 LearningRate 0.000341 Epoch: 18 Global Step: 393830 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:35:25,554-Speed 2499.17 samples/sec Loss 2.2165 LearningRate 0.000341 Epoch: 18 Global Step: 393840 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:35:33,698-Speed 2514.97 samples/sec Loss 2.2709 LearningRate 0.000341 Epoch: 18 Global Step: 393850 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:35:41,920-Speed 2491.37 samples/sec Loss 2.2583 LearningRate 0.000341 Epoch: 18 Global Step: 393860 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:35:50,121-Speed 2497.91 samples/sec Loss 2.3009 LearningRate 0.000341 Epoch: 18 Global Step: 393870 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:35:58,326-Speed 2496.36 samples/sec Loss 2.2536 LearningRate 0.000341 Epoch: 18 Global Step: 393880 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:36:06,526-Speed 2497.91 samples/sec Loss 2.2518 LearningRate 0.000341 Epoch: 18 Global Step: 393890 Fp16 Grad Scale: 32768 Required: 100 hours
Training: 2022-07-09 08:36:14,685-Speed 2510.62 samples/sec Loss 2.2367 LearningRate 0.000340 Epoch: 18 Global Step: 393900 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:36:22,838-Speed 2512.26 samples/sec Loss 2.3338 LearningRate 0.000340 Epoch: 18 Global Step: 393910 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:36:31,042-Speed 2496.79 samples/sec Loss 2.2433 LearningRate 0.000340 Epoch: 18 Global Step: 393920 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:36:39,244-Speed 2497.19 samples/sec Loss 2.2356 LearningRate 0.000340 Epoch: 18 Global Step: 393930 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:36:47,448-Speed 2496.76 samples/sec Loss 2.2575 LearningRate 0.000340 Epoch: 18 Global Step: 393940 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:36:55,650-Speed 2497.53 samples/sec Loss 2.2757 LearningRate 0.000340 Epoch: 18 Global Step: 393950 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:37:03,851-Speed 2497.89 samples/sec Loss 2.2180 LearningRate 0.000340 Epoch: 18 Global Step: 393960 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:37:12,011-Speed 2510.17 samples/sec Loss 2.2665 LearningRate 0.000340 Epoch: 18 Global Step: 393970 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:37:20,220-Speed 2495.12 samples/sec Loss 2.2464 LearningRate 0.000340 Epoch: 18 Global Step: 393980 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:37:28,419-Speed 2498.19 samples/sec Loss 2.2006 LearningRate 0.000340 Epoch: 18 Global Step: 393990 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:37:36,617-Speed 2498.81 samples/sec Loss 2.2803 LearningRate 0.000340 Epoch: 18 Global Step: 394000 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:37:44,842-Speed 2490.07 samples/sec Loss 2.2500 LearningRate 0.000340 Epoch: 18 Global Step: 394010 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:37:53,044-Speed 2497.51 samples/sec Loss 2.2555 LearningRate 0.000340 Epoch: 18 Global Step: 394020 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:38:01,185-Speed 2516.02 samples/sec Loss 2.2372 LearningRate 0.000340 Epoch: 18 Global Step: 394030 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:38:09,406-Speed 2491.77 samples/sec Loss 2.2672 LearningRate 0.000340 Epoch: 18 Global Step: 394040 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:38:17,602-Speed 2499.15 samples/sec Loss 2.2367 LearningRate 0.000340 Epoch: 18 Global Step: 394050 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:38:28,206-Speed 1931.45 samples/sec Loss 2.2098 LearningRate 0.000340 Epoch: 19 Global Step: 394060 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:38:36,401-Speed 2499.72 samples/sec Loss 2.2872 LearningRate 0.000340 Epoch: 19 Global Step: 394070 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:38:44,608-Speed 2495.75 samples/sec Loss 2.2716 LearningRate 0.000340 Epoch: 19 Global Step: 394080 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:38:52,774-Speed 2508.45 samples/sec Loss 2.2669 LearningRate 0.000340 Epoch: 19 Global Step: 394090 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:39:00,978-Speed 2496.74 samples/sec Loss 2.2481 LearningRate 0.000340 Epoch: 19 Global Step: 394100 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:39:09,178-Speed 2498.02 samples/sec Loss 2.2250 LearningRate 0.000340 Epoch: 19 Global Step: 394110 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:39:17,379-Speed 2497.44 samples/sec Loss 2.3131 LearningRate 0.000340 Epoch: 19 Global Step: 394120 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:39:25,580-Speed 2497.66 samples/sec Loss 2.2598 LearningRate 0.000340 Epoch: 19 Global Step: 394130 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:39:33,811-Speed 2488.60 samples/sec Loss 2.2336 LearningRate 0.000340 Epoch: 19 Global Step: 394140 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:39:41,957-Speed 2514.33 samples/sec Loss 2.2136 LearningRate 0.000340 Epoch: 19 Global Step: 394150 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:39:50,157-Speed 2498.10 samples/sec Loss 2.1965 LearningRate 0.000340 Epoch: 19 Global Step: 394160 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:39:58,356-Speed 2498.11 samples/sec Loss 2.2351 LearningRate 0.000340 Epoch: 19 Global Step: 394170 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:40:06,560-Speed 2496.74 samples/sec Loss 2.2776 LearningRate 0.000340 Epoch: 19 Global Step: 394180 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:40:14,764-Speed 2496.64 samples/sec Loss 2.2816 LearningRate 0.000340 Epoch: 19 Global Step: 394190 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:40:22,968-Speed 2496.64 samples/sec Loss 2.2590 LearningRate 0.000340 Epoch: 19 Global Step: 394200 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:40:31,117-Speed 2513.67 samples/sec Loss 2.2590 LearningRate 0.000340 Epoch: 19 Global Step: 394210 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:40:39,324-Speed 2496.02 samples/sec Loss 2.2364 LearningRate 0.000340 Epoch: 19 Global Step: 394220 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:40:47,530-Speed 2496.09 samples/sec Loss 2.2324 LearningRate 0.000340 Epoch: 19 Global Step: 394230 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:40:55,727-Speed 2498.75 samples/sec Loss 2.2126 LearningRate 0.000340 Epoch: 19 Global Step: 394240 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:41:03,931-Speed 2496.69 samples/sec Loss 2.2135 LearningRate 0.000340 Epoch: 19 Global Step: 394250 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:41:12,145-Speed 2493.98 samples/sec Loss 2.1930 LearningRate 0.000340 Epoch: 19 Global Step: 394260 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:41:20,292-Speed 2514.15 samples/sec Loss 2.2434 LearningRate 0.000340 Epoch: 19 Global Step: 394270 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:41:28,496-Speed 2496.74 samples/sec Loss 2.2258 LearningRate 0.000340 Epoch: 19 Global Step: 394280 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:41:36,698-Speed 2497.59 samples/sec Loss 2.2763 LearningRate 0.000340 Epoch: 19 Global Step: 394290 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:41:44,925-Speed 2489.73 samples/sec Loss 2.2434 LearningRate 0.000340 Epoch: 19 Global Step: 394300 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:41:53,127-Speed 2497.23 samples/sec Loss 2.2400 LearningRate 0.000340 Epoch: 19 Global Step: 394310 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:42:01,342-Speed 2493.69 samples/sec Loss 2.2361 LearningRate 0.000340 Epoch: 19 Global Step: 394320 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:42:09,495-Speed 2512.39 samples/sec Loss 2.2181 LearningRate 0.000340 Epoch: 19 Global Step: 394330 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:42:17,694-Speed 2498.10 samples/sec Loss 2.2115 LearningRate 0.000340 Epoch: 19 Global Step: 394340 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:42:25,891-Speed 2499.07 samples/sec Loss 2.2469 LearningRate 0.000340 Epoch: 19 Global Step: 394350 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:42:34,091-Speed 2497.91 samples/sec Loss 2.2506 LearningRate 0.000340 Epoch: 19 Global Step: 394360 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:42:42,293-Speed 2497.27 samples/sec Loss 2.2253 LearningRate 0.000340 Epoch: 19 Global Step: 394370 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:42:50,493-Speed 2497.84 samples/sec Loss 2.2731 LearningRate 0.000340 Epoch: 19 Global Step: 394380 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:42:58,671-Speed 2504.42 samples/sec Loss 2.2633 LearningRate 0.000340 Epoch: 19 Global Step: 394390 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:43:06,874-Speed 2497.05 samples/sec Loss 2.3012 LearningRate 0.000340 Epoch: 19 Global Step: 394400 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:43:15,079-Speed 2496.69 samples/sec Loss 2.2097 LearningRate 0.000340 Epoch: 19 Global Step: 394410 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:43:23,280-Speed 2497.60 samples/sec Loss 2.2836 LearningRate 0.000340 Epoch: 19 Global Step: 394420 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:43:31,561-Speed 2474.15 samples/sec Loss 2.2670 LearningRate 0.000340 Epoch: 19 Global Step: 394430 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:43:39,764-Speed 2496.91 samples/sec Loss 2.2368 LearningRate 0.000340 Epoch: 19 Global Step: 394440 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:43:47,914-Speed 2513.31 samples/sec Loss 2.2896 LearningRate 0.000340 Epoch: 19 Global Step: 394450 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:43:56,130-Speed 2492.90 samples/sec Loss 2.2102 LearningRate 0.000340 Epoch: 19 Global Step: 394460 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:44:04,337-Speed 2495.94 samples/sec Loss 2.3025 LearningRate 0.000340 Epoch: 19 Global Step: 394470 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:44:12,548-Speed 2494.51 samples/sec Loss 2.2567 LearningRate 0.000340 Epoch: 19 Global Step: 394480 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:44:20,750-Speed 2497.14 samples/sec Loss 2.2229 LearningRate 0.000340 Epoch: 19 Global Step: 394490 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:44:28,965-Speed 2493.69 samples/sec Loss 2.2575 LearningRate 0.000340 Epoch: 19 Global Step: 394500 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:44:37,113-Speed 2513.96 samples/sec Loss 2.2770 LearningRate 0.000340 Epoch: 19 Global Step: 394510 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:44:45,313-Speed 2497.91 samples/sec Loss 2.2304 LearningRate 0.000340 Epoch: 19 Global Step: 394520 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:44:53,514-Speed 2497.71 samples/sec Loss 2.2927 LearningRate 0.000340 Epoch: 19 Global Step: 394530 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:45:01,713-Speed 2498.25 samples/sec Loss 2.2663 LearningRate 0.000340 Epoch: 19 Global Step: 394540 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:45:09,914-Speed 2497.74 samples/sec Loss 2.2746 LearningRate 0.000339 Epoch: 19 Global Step: 394550 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:45:18,116-Speed 2499.53 samples/sec Loss 2.2188 LearningRate 0.000339 Epoch: 19 Global Step: 394560 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:45:26,264-Speed 2514.02 samples/sec Loss 2.2259 LearningRate 0.000339 Epoch: 19 Global Step: 394570 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:45:34,472-Speed 2495.48 samples/sec Loss 2.2650 LearningRate 0.000339 Epoch: 19 Global Step: 394580 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:45:42,671-Speed 2498.17 samples/sec Loss 2.2212 LearningRate 0.000339 Epoch: 19 Global Step: 394590 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:45:50,881-Speed 2494.93 samples/sec Loss 2.2379 LearningRate 0.000339 Epoch: 19 Global Step: 394600 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:45:59,082-Speed 2498.02 samples/sec Loss 2.2322 LearningRate 0.000339 Epoch: 19 Global Step: 394610 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:46:07,295-Speed 2494.00 samples/sec Loss 2.2265 LearningRate 0.000339 Epoch: 19 Global Step: 394620 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:46:15,457-Speed 2509.85 samples/sec Loss 2.2089 LearningRate 0.000339 Epoch: 19 Global Step: 394630 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:46:23,659-Speed 2497.13 samples/sec Loss 2.2545 LearningRate 0.000339 Epoch: 19 Global Step: 394640 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:46:31,860-Speed 2497.80 samples/sec Loss 2.1945 LearningRate 0.000339 Epoch: 19 Global Step: 394650 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:46:40,061-Speed 2497.74 samples/sec Loss 2.2085 LearningRate 0.000339 Epoch: 19 Global Step: 394660 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:46:48,261-Speed 2497.88 samples/sec Loss 2.2244 LearningRate 0.000339 Epoch: 19 Global Step: 394670 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:46:56,462-Speed 2497.50 samples/sec Loss 2.2000 LearningRate 0.000339 Epoch: 19 Global Step: 394680 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:47:04,608-Speed 2514.53 samples/sec Loss 2.2063 LearningRate 0.000339 Epoch: 19 Global Step: 394690 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:47:12,809-Speed 2497.88 samples/sec Loss 2.1809 LearningRate 0.000339 Epoch: 19 Global Step: 394700 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:47:21,012-Speed 2496.99 samples/sec Loss 2.1890 LearningRate 0.000339 Epoch: 19 Global Step: 394710 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:47:29,212-Speed 2497.86 samples/sec Loss 2.2025 LearningRate 0.000339 Epoch: 19 Global Step: 394720 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:47:37,412-Speed 2498.10 samples/sec Loss 2.2611 LearningRate 0.000339 Epoch: 19 Global Step: 394730 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:47:45,616-Speed 2496.52 samples/sec Loss 2.1414 LearningRate 0.000339 Epoch: 19 Global Step: 394740 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:47:53,760-Speed 2515.15 samples/sec Loss 2.1358 LearningRate 0.000339 Epoch: 19 Global Step: 394750 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:48:01,965-Speed 2496.56 samples/sec Loss 2.2230 LearningRate 0.000339 Epoch: 19 Global Step: 394760 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:48:10,165-Speed 2497.98 samples/sec Loss 2.2155 LearningRate 0.000339 Epoch: 19 Global Step: 394770 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:48:18,368-Speed 2497.16 samples/sec Loss 2.2067 LearningRate 0.000339 Epoch: 19 Global Step: 394780 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:48:26,571-Speed 2497.13 samples/sec Loss 2.1850 LearningRate 0.000339 Epoch: 19 Global Step: 394790 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:48:34,773-Speed 2497.36 samples/sec Loss 2.2029 LearningRate 0.000339 Epoch: 19 Global Step: 394800 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:48:42,924-Speed 2513.04 samples/sec Loss 2.2307 LearningRate 0.000339 Epoch: 19 Global Step: 394810 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:48:51,127-Speed 2496.72 samples/sec Loss 2.2145 LearningRate 0.000339 Epoch: 19 Global Step: 394820 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:48:59,333-Speed 2496.28 samples/sec Loss 2.2400 LearningRate 0.000339 Epoch: 19 Global Step: 394830 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:49:07,534-Speed 2497.56 samples/sec Loss 2.1896 LearningRate 0.000339 Epoch: 19 Global Step: 394840 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:49:15,735-Speed 2497.69 samples/sec Loss 2.2426 LearningRate 0.000339 Epoch: 19 Global Step: 394850 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:49:23,935-Speed 2498.03 samples/sec Loss 2.2328 LearningRate 0.000339 Epoch: 19 Global Step: 394860 Fp16 Grad Scale: 16384 Required: 100 hours
Training: 2022-07-09 08:49:32,096-Speed 2509.92 samples/sec Loss 2.2066 LearningRate 0.000339 Epoch: 19 Global Step: 394870 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:49:40,297-Speed 2497.56 samples/sec Loss 2.1809 LearningRate 0.000339 Epoch: 19 Global Step: 394880 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:49:48,496-Speed 2498.37 samples/sec Loss 2.2543 LearningRate 0.000339 Epoch: 19 Global Step: 394890 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:49:56,699-Speed 2497.17 samples/sec Loss 2.2610 LearningRate 0.000339 Epoch: 19 Global Step: 394900 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:50:04,899-Speed 2497.91 samples/sec Loss 2.1867 LearningRate 0.000339 Epoch: 19 Global Step: 394910 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:50:13,099-Speed 2498.21 samples/sec Loss 2.2050 LearningRate 0.000339 Epoch: 19 Global Step: 394920 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:50:21,255-Speed 2511.60 samples/sec Loss 2.2013 LearningRate 0.000339 Epoch: 19 Global Step: 394930 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:50:29,466-Speed 2494.72 samples/sec Loss 2.1998 LearningRate 0.000339 Epoch: 19 Global Step: 394940 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:50:37,666-Speed 2497.95 samples/sec Loss 2.2100 LearningRate 0.000339 Epoch: 19 Global Step: 394950 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:50:45,870-Speed 2496.83 samples/sec Loss 2.1871 LearningRate 0.000339 Epoch: 19 Global Step: 394960 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:50:54,072-Speed 2497.22 samples/sec Loss 2.2117 LearningRate 0.000339 Epoch: 19 Global Step: 394970 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:51:02,272-Speed 2497.97 samples/sec Loss 2.1952 LearningRate 0.000339 Epoch: 19 Global Step: 394980 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:51:10,419-Speed 2513.99 samples/sec Loss 2.2064 LearningRate 0.000339 Epoch: 19 Global Step: 394990 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:51:18,618-Speed 2498.55 samples/sec Loss 2.2638 LearningRate 0.000339 Epoch: 19 Global Step: 395000 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:51:26,827-Speed 2495.36 samples/sec Loss 2.2319 LearningRate 0.000339 Epoch: 19 Global Step: 395010 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:51:35,028-Speed 2497.62 samples/sec Loss 2.1833 LearningRate 0.000339 Epoch: 19 Global Step: 395020 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:51:43,228-Speed 2498.12 samples/sec Loss 2.2323 LearningRate 0.000339 Epoch: 19 Global Step: 395030 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:51:51,428-Speed 2497.84 samples/sec Loss 2.2508 LearningRate 0.000339 Epoch: 19 Global Step: 395040 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:51:59,574-Speed 2514.58 samples/sec Loss 2.1983 LearningRate 0.000339 Epoch: 19 Global Step: 395050 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:52:07,787-Speed 2494.13 samples/sec Loss 2.2349 LearningRate 0.000339 Epoch: 19 Global Step: 395060 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:52:15,988-Speed 2497.78 samples/sec Loss 2.2374 LearningRate 0.000339 Epoch: 19 Global Step: 395070 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:52:24,196-Speed 2495.44 samples/sec Loss 2.2061 LearningRate 0.000339 Epoch: 19 Global Step: 395080 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:52:32,410-Speed 2493.75 samples/sec Loss 2.1967 LearningRate 0.000339 Epoch: 19 Global Step: 395090 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:52:40,618-Speed 2495.42 samples/sec Loss 2.2355 LearningRate 0.000339 Epoch: 19 Global Step: 395100 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:52:48,770-Speed 2512.90 samples/sec Loss 2.1911 LearningRate 0.000339 Epoch: 19 Global Step: 395110 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:52:56,974-Speed 2496.50 samples/sec Loss 2.1997 LearningRate 0.000339 Epoch: 19 Global Step: 395120 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:53:05,174-Speed 2497.92 samples/sec Loss 2.2421 LearningRate 0.000339 Epoch: 19 Global Step: 395130 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:53:13,390-Speed 2493.22 samples/sec Loss 2.2574 LearningRate 0.000339 Epoch: 19 Global Step: 395140 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:53:21,595-Speed 2496.46 samples/sec Loss 2.2595 LearningRate 0.000339 Epoch: 19 Global Step: 395150 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:53:29,795-Speed 2498.13 samples/sec Loss 2.2489 LearningRate 0.000339 Epoch: 19 Global Step: 395160 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:53:37,944-Speed 2513.66 samples/sec Loss 2.2304 LearningRate 0.000339 Epoch: 19 Global Step: 395170 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:53:46,140-Speed 2498.99 samples/sec Loss 2.2034 LearningRate 0.000339 Epoch: 19 Global Step: 395180 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:53:54,341-Speed 2497.68 samples/sec Loss 2.2395 LearningRate 0.000338 Epoch: 19 Global Step: 395190 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:54:02,540-Speed 2498.02 samples/sec Loss 2.2360 LearningRate 0.000338 Epoch: 19 Global Step: 395200 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:54:10,744-Speed 2496.83 samples/sec Loss 2.2640 LearningRate 0.000338 Epoch: 19 Global Step: 395210 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:54:18,944-Speed 2498.18 samples/sec Loss 2.2258 LearningRate 0.000338 Epoch: 19 Global Step: 395220 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:54:27,098-Speed 2511.91 samples/sec Loss 2.2240 LearningRate 0.000338 Epoch: 19 Global Step: 395230 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:54:35,306-Speed 2495.40 samples/sec Loss 2.2264 LearningRate 0.000338 Epoch: 19 Global Step: 395240 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:54:43,510-Speed 2496.74 samples/sec Loss 2.2466 LearningRate 0.000338 Epoch: 19 Global Step: 395250 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:54:51,711-Speed 2497.70 samples/sec Loss 2.2408 LearningRate 0.000338 Epoch: 19 Global Step: 395260 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:54:59,914-Speed 2497.40 samples/sec Loss 2.2211 LearningRate 0.000338 Epoch: 19 Global Step: 395270 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:55:08,115-Speed 2497.50 samples/sec Loss 2.2358 LearningRate 0.000338 Epoch: 19 Global Step: 395280 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:55:16,262-Speed 2514.30 samples/sec Loss 2.2422 LearningRate 0.000338 Epoch: 19 Global Step: 395290 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:55:24,482-Speed 2492.95 samples/sec Loss 2.2733 LearningRate 0.000338 Epoch: 19 Global Step: 395300 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:55:32,680-Speed 2498.30 samples/sec Loss 2.2459 LearningRate 0.000338 Epoch: 19 Global Step: 395310 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:55:40,879-Speed 2498.40 samples/sec Loss 2.1664 LearningRate 0.000338 Epoch: 19 Global Step: 395320 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:55:49,079-Speed 2497.80 samples/sec Loss 2.2535 LearningRate 0.000338 Epoch: 19 Global Step: 395330 Fp16 Grad Scale: 32768 Required: 99 hours
Training: 2022-07-09 08:55:57,246-Speed 2508.51 samples/sec Loss 2.2250 LearningRate 0.000338 Epoch: 19 Global Step: 395340 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:56:05,402-Speed 2511.34 samples/sec Loss 2.2296 LearningRate 0.000338 Epoch: 19 Global Step: 395350 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:56:13,607-Speed 2496.48 samples/sec Loss 2.1982 LearningRate 0.000338 Epoch: 19 Global Step: 395360 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:56:21,808-Speed 2497.99 samples/sec Loss 2.2257 LearningRate 0.000338 Epoch: 19 Global Step: 395370 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:56:30,012-Speed 2497.01 samples/sec Loss 2.2254 LearningRate 0.000338 Epoch: 19 Global Step: 395380 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:56:38,213-Speed 2497.42 samples/sec Loss 2.2351 LearningRate 0.000338 Epoch: 19 Global Step: 395390 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:56:46,416-Speed 2496.98 samples/sec Loss 2.1814 LearningRate 0.000338 Epoch: 19 Global Step: 395400 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:56:54,565-Speed 2514.09 samples/sec Loss 2.2372 LearningRate 0.000338 Epoch: 19 Global Step: 395410 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:57:02,765-Speed 2497.72 samples/sec Loss 2.1981 LearningRate 0.000338 Epoch: 19 Global Step: 395420 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:57:10,980-Speed 2493.92 samples/sec Loss 2.1794 LearningRate 0.000338 Epoch: 19 Global Step: 395430 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:57:19,182-Speed 2497.21 samples/sec Loss 2.1744 LearningRate 0.000338 Epoch: 19 Global Step: 395440 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:57:27,383-Speed 2497.82 samples/sec Loss 2.2071 LearningRate 0.000338 Epoch: 19 Global Step: 395450 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:57:35,591-Speed 2495.43 samples/sec Loss 2.2420 LearningRate 0.000338 Epoch: 19 Global Step: 395460 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:57:43,742-Speed 2512.73 samples/sec Loss 2.1871 LearningRate 0.000338 Epoch: 19 Global Step: 395470 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:57:51,942-Speed 2498.08 samples/sec Loss 2.1943 LearningRate 0.000338 Epoch: 19 Global Step: 395480 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:58:00,162-Speed 2491.92 samples/sec Loss 2.1914 LearningRate 0.000338 Epoch: 19 Global Step: 395490 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:58:08,364-Speed 2497.21 samples/sec Loss 2.1928 LearningRate 0.000338 Epoch: 19 Global Step: 395500 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:58:16,563-Speed 2498.21 samples/sec Loss 2.1609 LearningRate 0.000338 Epoch: 19 Global Step: 395510 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:58:24,768-Speed 2496.67 samples/sec Loss 2.2246 LearningRate 0.000338 Epoch: 19 Global Step: 395520 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:58:32,928-Speed 2510.07 samples/sec Loss 2.1883 LearningRate 0.000338 Epoch: 19 Global Step: 395530 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:58:41,126-Speed 2498.50 samples/sec Loss 2.1894 LearningRate 0.000338 Epoch: 19 Global Step: 395540 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:58:49,327-Speed 2497.73 samples/sec Loss 2.2136 LearningRate 0.000338 Epoch: 19 Global Step: 395550 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:58:57,528-Speed 2497.46 samples/sec Loss 2.2168 LearningRate 0.000338 Epoch: 19 Global Step: 395560 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:59:05,733-Speed 2496.66 samples/sec Loss 2.2097 LearningRate 0.000338 Epoch: 19 Global Step: 395570 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:59:13,933-Speed 2497.79 samples/sec Loss 2.2196 LearningRate 0.000338 Epoch: 19 Global Step: 395580 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:59:22,085-Speed 2512.56 samples/sec Loss 2.2056 LearningRate 0.000338 Epoch: 19 Global Step: 395590 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:59:30,288-Speed 2497.19 samples/sec Loss 2.2229 LearningRate 0.000338 Epoch: 19 Global Step: 395600 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-07-09 08:59:38,497-Speed 2495.44 samples/sec Loss 2.2520
LearningRate 0.000338 Epoch: 19 Global Step: 395610 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 08:59:46,696-Speed 2498.15 samples/sec Loss 2.1929 LearningRate 0.000338 Epoch: 19 Global Step: 395620 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 08:59:54,900-Speed 2496.76 samples/sec Loss 2.2167 LearningRate 0.000338 Epoch: 19 Global Step: 395630 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:00:03,107-Speed 2495.95 samples/sec Loss 2.1884 LearningRate 0.000338 Epoch: 19 Global Step: 395640 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:00:11,258-Speed 2512.74 samples/sec Loss 2.1913 LearningRate 0.000338 Epoch: 19 Global Step: 395650 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:00:19,466-Speed 2495.65 samples/sec Loss 2.1822 LearningRate 0.000338 Epoch: 19 Global Step: 395660 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:00:27,666-Speed 2497.97 samples/sec Loss 2.2281 LearningRate 0.000338 Epoch: 19 Global Step: 395670 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:00:35,869-Speed 2497.28 samples/sec Loss 2.2438 LearningRate 0.000338 Epoch: 19 Global Step: 395680 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:00:44,067-Speed 2498.52 samples/sec Loss 2.2327 LearningRate 0.000338 Epoch: 19 Global Step: 395690 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:00:52,274-Speed 2495.99 samples/sec Loss 2.1859 LearningRate 0.000338 Epoch: 19 Global Step: 395700 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:01:00,426-Speed 2512.72 samples/sec Loss 2.2363 LearningRate 0.000338 Epoch: 19 Global Step: 395710 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:01:08,624-Speed 2498.54 samples/sec Loss 2.2222 LearningRate 0.000338 Epoch: 19 Global Step: 395720 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:01:16,827-Speed 2496.98 samples/sec Loss 2.2494 
LearningRate 0.000338 Epoch: 19 Global Step: 395730 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:01:25,031-Speed 2496.58 samples/sec Loss 2.2468 LearningRate 0.000338 Epoch: 19 Global Step: 395740 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:01:33,238-Speed 2496.11 samples/sec Loss 2.2268 LearningRate 0.000338 Epoch: 19 Global Step: 395750 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:01:41,447-Speed 2495.14 samples/sec Loss 2.2183 LearningRate 0.000338 Epoch: 19 Global Step: 395760 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:01:49,598-Speed 2513.17 samples/sec Loss 2.2014 LearningRate 0.000338 Epoch: 19 Global Step: 395770 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:01:57,802-Speed 2496.70 samples/sec Loss 2.2251 LearningRate 0.000338 Epoch: 19 Global Step: 395780 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:02:06,002-Speed 2497.85 samples/sec Loss 2.2218 LearningRate 0.000338 Epoch: 19 Global Step: 395790 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:02:14,208-Speed 2496.27 samples/sec Loss 2.2567 LearningRate 0.000338 Epoch: 19 Global Step: 395800 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:02:22,411-Speed 2497.22 samples/sec Loss 2.2356 LearningRate 0.000338 Epoch: 19 Global Step: 395810 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:02:30,620-Speed 2494.99 samples/sec Loss 2.2392 LearningRate 0.000338 Epoch: 19 Global Step: 395820 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:02:38,782-Speed 2509.67 samples/sec Loss 2.2248 LearningRate 0.000337 Epoch: 19 Global Step: 395830 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:02:46,983-Speed 2497.66 samples/sec Loss 2.2572 LearningRate 0.000337 Epoch: 19 Global Step: 395840 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:02:55,188-Speed 2496.44 samples/sec Loss 2.2608 
LearningRate 0.000337 Epoch: 19 Global Step: 395850 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:03:03,390-Speed 2497.10 samples/sec Loss 2.1976 LearningRate 0.000337 Epoch: 19 Global Step: 395860 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:03:11,590-Speed 2497.99 samples/sec Loss 2.1752 LearningRate 0.000337 Epoch: 19 Global Step: 395870 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:03:19,792-Speed 2497.58 samples/sec Loss 2.2466 LearningRate 0.000337 Epoch: 19 Global Step: 395880 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:03:27,940-Speed 2513.89 samples/sec Loss 2.2631 LearningRate 0.000337 Epoch: 19 Global Step: 395890 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:03:36,139-Speed 2498.04 samples/sec Loss 2.2896 LearningRate 0.000337 Epoch: 19 Global Step: 395900 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:03:44,339-Speed 2498.05 samples/sec Loss 2.2645 LearningRate 0.000337 Epoch: 19 Global Step: 395910 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:03:52,539-Speed 2498.24 samples/sec Loss 2.2312 LearningRate 0.000337 Epoch: 19 Global Step: 395920 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:04:00,740-Speed 2498.50 samples/sec Loss 2.2087 LearningRate 0.000337 Epoch: 19 Global Step: 395930 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:04:08,948-Speed 2495.52 samples/sec Loss 2.2640 LearningRate 0.000337 Epoch: 19 Global Step: 395940 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:04:17,102-Speed 2512.01 samples/sec Loss 2.1879 LearningRate 0.000337 Epoch: 19 Global Step: 395950 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:04:25,304-Speed 2497.25 samples/sec Loss 2.2057 LearningRate 0.000337 Epoch: 19 Global Step: 395960 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:04:33,507-Speed 2497.35 samples/sec Loss 2.2359 
LearningRate 0.000337 Epoch: 19 Global Step: 395970 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:04:41,708-Speed 2497.58 samples/sec Loss 2.2395 LearningRate 0.000337 Epoch: 19 Global Step: 395980 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:04:49,913-Speed 2496.18 samples/sec Loss 2.2443 LearningRate 0.000337 Epoch: 19 Global Step: 395990 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:04:58,128-Speed 2493.47 samples/sec Loss 2.2219 LearningRate 0.000337 Epoch: 19 Global Step: 396000 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:05:06,280-Speed 2512.55 samples/sec Loss 2.2190 LearningRate 0.000337 Epoch: 19 Global Step: 396010 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:05:14,487-Speed 2495.91 samples/sec Loss 2.2312 LearningRate 0.000337 Epoch: 19 Global Step: 396020 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:05:22,689-Speed 2497.41 samples/sec Loss 2.2025 LearningRate 0.000337 Epoch: 19 Global Step: 396030 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:05:30,894-Speed 2496.63 samples/sec Loss 2.2314 LearningRate 0.000337 Epoch: 19 Global Step: 396040 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:05:39,101-Speed 2495.86 samples/sec Loss 2.2099 LearningRate 0.000337 Epoch: 19 Global Step: 396050 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:05:47,304-Speed 2496.88 samples/sec Loss 2.2445 LearningRate 0.000337 Epoch: 19 Global Step: 396060 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:05:55,454-Speed 2513.55 samples/sec Loss 2.2424 LearningRate 0.000337 Epoch: 19 Global Step: 396070 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:06:03,657-Speed 2497.05 samples/sec Loss 2.2133 LearningRate 0.000337 Epoch: 19 Global Step: 396080 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:06:11,860-Speed 2496.93 samples/sec Loss 2.2541 
LearningRate 0.000337 Epoch: 19 Global Step: 396090 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:06:20,067-Speed 2495.83 samples/sec Loss 2.2529 LearningRate 0.000337 Epoch: 19 Global Step: 396100 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:06:28,269-Speed 2497.38 samples/sec Loss 2.2516 LearningRate 0.000337 Epoch: 19 Global Step: 396110 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:06:36,471-Speed 2497.40 samples/sec Loss 2.2422 LearningRate 0.000337 Epoch: 19 Global Step: 396120 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:06:44,619-Speed 2513.92 samples/sec Loss 2.2706 LearningRate 0.000337 Epoch: 19 Global Step: 396130 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:06:52,821-Speed 2497.45 samples/sec Loss 2.2512 LearningRate 0.000337 Epoch: 19 Global Step: 396140 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:07:01,029-Speed 2495.43 samples/sec Loss 2.2709 LearningRate 0.000337 Epoch: 19 Global Step: 396150 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:07:09,230-Speed 2497.77 samples/sec Loss 2.2420 LearningRate 0.000337 Epoch: 19 Global Step: 396160 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:07:17,439-Speed 2495.48 samples/sec Loss 2.2716 LearningRate 0.000337 Epoch: 19 Global Step: 396170 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:07:25,641-Speed 2497.49 samples/sec Loss 2.2886 LearningRate 0.000337 Epoch: 19 Global Step: 396180 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:07:33,801-Speed 2510.28 samples/sec Loss 2.3239 LearningRate 0.000337 Epoch: 19 Global Step: 396190 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:07:42,003-Speed 2497.34 samples/sec Loss 2.2773 LearningRate 0.000337 Epoch: 19 Global Step: 396200 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:07:50,206-Speed 2497.28 samples/sec Loss 2.2793 
LearningRate 0.000337 Epoch: 19 Global Step: 396210 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:07:58,408-Speed 2497.42 samples/sec Loss 2.2685 LearningRate 0.000337 Epoch: 19 Global Step: 396220 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:08:06,614-Speed 2496.18 samples/sec Loss 2.2877 LearningRate 0.000337 Epoch: 19 Global Step: 396230 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:08:14,816-Speed 2497.09 samples/sec Loss 2.2281 LearningRate 0.000337 Epoch: 19 Global Step: 396240 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:08:22,967-Speed 2513.14 samples/sec Loss 2.2324 LearningRate 0.000337 Epoch: 19 Global Step: 396250 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:08:31,169-Speed 2497.20 samples/sec Loss 2.2278 LearningRate 0.000337 Epoch: 19 Global Step: 396260 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:08:39,370-Speed 2497.88 samples/sec Loss 2.2788 LearningRate 0.000337 Epoch: 19 Global Step: 396270 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:08:47,593-Speed 2491.02 samples/sec Loss 2.2935 LearningRate 0.000337 Epoch: 19 Global Step: 396280 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:08:55,811-Speed 2492.83 samples/sec Loss 2.2612 LearningRate 0.000337 Epoch: 19 Global Step: 396290 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:09:04,012-Speed 2497.82 samples/sec Loss 2.2969 LearningRate 0.000337 Epoch: 19 Global Step: 396300 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:09:12,175-Speed 2509.13 samples/sec Loss 2.2335 LearningRate 0.000337 Epoch: 19 Global Step: 396310 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:09:20,381-Speed 2496.34 samples/sec Loss 2.2254 LearningRate 0.000337 Epoch: 19 Global Step: 396320 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:09:28,593-Speed 2494.39 samples/sec Loss 2.2742 
LearningRate 0.000337 Epoch: 19 Global Step: 396330 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:09:36,793-Speed 2497.96 samples/sec Loss 2.2902 LearningRate 0.000337 Epoch: 19 Global Step: 396340 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:09:44,991-Speed 2498.69 samples/sec Loss 2.2609 LearningRate 0.000337 Epoch: 19 Global Step: 396350 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:09:53,203-Speed 2494.36 samples/sec Loss 2.2788 LearningRate 0.000337 Epoch: 19 Global Step: 396360 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:10:01,357-Speed 2511.94 samples/sec Loss 2.2216 LearningRate 0.000337 Epoch: 19 Global Step: 396370 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:10:09,558-Speed 2497.47 samples/sec Loss 2.2814 LearningRate 0.000337 Epoch: 19 Global Step: 396380 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:10:17,759-Speed 2497.86 samples/sec Loss 2.2199 LearningRate 0.000337 Epoch: 19 Global Step: 396390 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:10:25,965-Speed 2495.88 samples/sec Loss 2.2507 LearningRate 0.000337 Epoch: 19 Global Step: 396400 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:10:34,164-Speed 2498.27 samples/sec Loss 2.2446 LearningRate 0.000337 Epoch: 19 Global Step: 396410 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:10:42,371-Speed 2495.96 samples/sec Loss 2.2496 LearningRate 0.000337 Epoch: 19 Global Step: 396420 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:10:50,518-Speed 2514.05 samples/sec Loss 2.2536 LearningRate 0.000337 Epoch: 19 Global Step: 396430 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:10:58,724-Speed 2496.23 samples/sec Loss 2.2175 LearningRate 0.000337 Epoch: 19 Global Step: 396440 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:11:06,927-Speed 2497.17 samples/sec Loss 2.2508 
LearningRate 0.000337 Epoch: 19 Global Step: 396450 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:11:15,126-Speed 2498.31 samples/sec Loss 2.2665 LearningRate 0.000337 Epoch: 19 Global Step: 396460 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:11:23,340-Speed 2493.86 samples/sec Loss 2.2559 LearningRate 0.000336 Epoch: 19 Global Step: 396470 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:11:31,547-Speed 2495.81 samples/sec Loss 2.2195 LearningRate 0.000336 Epoch: 19 Global Step: 396480 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:11:39,722-Speed 2505.39 samples/sec Loss 2.2399 LearningRate 0.000336 Epoch: 19 Global Step: 396490 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:11:47,928-Speed 2496.12 samples/sec Loss 2.2496 LearningRate 0.000336 Epoch: 19 Global Step: 396500 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:11:56,128-Speed 2497.86 samples/sec Loss 2.2019 LearningRate 0.000336 Epoch: 19 Global Step: 396510 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:12:04,336-Speed 2495.91 samples/sec Loss 2.2041 LearningRate 0.000336 Epoch: 19 Global Step: 396520 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:12:12,538-Speed 2497.08 samples/sec Loss 2.2709 LearningRate 0.000336 Epoch: 19 Global Step: 396530 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:12:20,739-Speed 2497.80 samples/sec Loss 2.2023 LearningRate 0.000336 Epoch: 19 Global Step: 396540 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:12:28,887-Speed 2513.83 samples/sec Loss 2.1902 LearningRate 0.000336 Epoch: 19 Global Step: 396550 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:12:37,094-Speed 2495.80 samples/sec Loss 2.2102 LearningRate 0.000336 Epoch: 19 Global Step: 396560 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:12:45,307-Speed 2494.03 samples/sec Loss 2.2466 
LearningRate 0.000336 Epoch: 19 Global Step: 396570 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:12:53,516-Speed 2495.25 samples/sec Loss 2.2491 LearningRate 0.000336 Epoch: 19 Global Step: 396580 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:13:01,716-Speed 2497.75 samples/sec Loss 2.1795 LearningRate 0.000336 Epoch: 19 Global Step: 396590 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:13:09,923-Speed 2495.97 samples/sec Loss 2.2126 LearningRate 0.000336 Epoch: 19 Global Step: 396600 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:13:18,072-Speed 2513.56 samples/sec Loss 2.1964 LearningRate 0.000336 Epoch: 19 Global Step: 396610 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:13:26,274-Speed 2497.41 samples/sec Loss 2.1944 LearningRate 0.000336 Epoch: 19 Global Step: 396620 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:13:34,485-Speed 2494.79 samples/sec Loss 2.2276 LearningRate 0.000336 Epoch: 19 Global Step: 396630 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:13:42,686-Speed 2497.69 samples/sec Loss 2.2237 LearningRate 0.000336 Epoch: 19 Global Step: 396640 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:13:50,887-Speed 2497.52 samples/sec Loss 2.1919 LearningRate 0.000336 Epoch: 19 Global Step: 396650 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:13:59,091-Speed 2496.75 samples/sec Loss 2.2209 LearningRate 0.000336 Epoch: 19 Global Step: 396660 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:14:07,237-Speed 2514.51 samples/sec Loss 2.2389 LearningRate 0.000336 Epoch: 19 Global Step: 396670 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:14:15,447-Speed 2494.87 samples/sec Loss 2.2703 LearningRate 0.000336 Epoch: 19 Global Step: 396680 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:14:23,663-Speed 2493.20 samples/sec Loss 2.2344 
LearningRate 0.000336 Epoch: 19 Global Step: 396690 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:14:31,864-Speed 2497.65 samples/sec Loss 2.2298 LearningRate 0.000336 Epoch: 19 Global Step: 396700 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:14:40,079-Speed 2493.42 samples/sec Loss 2.2386 LearningRate 0.000336 Epoch: 19 Global Step: 396710 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:14:48,303-Speed 2490.53 samples/sec Loss 2.1872 LearningRate 0.000336 Epoch: 19 Global Step: 396720 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:14:56,466-Speed 2509.30 samples/sec Loss 2.1955 LearningRate 0.000336 Epoch: 19 Global Step: 396730 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:15:04,672-Speed 2496.23 samples/sec Loss 2.2150 LearningRate 0.000336 Epoch: 19 Global Step: 396740 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:15:12,870-Speed 2498.59 samples/sec Loss 2.2469 LearningRate 0.000336 Epoch: 19 Global Step: 396750 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:15:21,069-Speed 2498.21 samples/sec Loss 2.2630 LearningRate 0.000336 Epoch: 19 Global Step: 396760 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:15:29,276-Speed 2495.82 samples/sec Loss 2.1705 LearningRate 0.000336 Epoch: 19 Global Step: 396770 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:15:37,479-Speed 2496.96 samples/sec Loss 2.2312 LearningRate 0.000336 Epoch: 19 Global Step: 396780 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:15:45,629-Speed 2513.31 samples/sec Loss 2.1562 LearningRate 0.000336 Epoch: 19 Global Step: 396790 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:15:53,829-Speed 2498.08 samples/sec Loss 2.1969 LearningRate 0.000336 Epoch: 19 Global Step: 396800 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:16:02,032-Speed 2497.13 samples/sec Loss 2.2243 
LearningRate 0.000336 Epoch: 19 Global Step: 396810 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:16:10,235-Speed 2496.90 samples/sec Loss 2.1491 LearningRate 0.000336 Epoch: 19 Global Step: 396820 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:16:18,439-Speed 2496.75 samples/sec Loss 2.2434 LearningRate 0.000336 Epoch: 19 Global Step: 396830 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:16:26,655-Speed 2493.15 samples/sec Loss 2.2321 LearningRate 0.000336 Epoch: 19 Global Step: 396840 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:16:34,803-Speed 2513.70 samples/sec Loss 2.2081 LearningRate 0.000336 Epoch: 19 Global Step: 396850 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:16:43,016-Speed 2493.91 samples/sec Loss 2.2136 LearningRate 0.000336 Epoch: 19 Global Step: 396860 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:16:51,235-Speed 2492.28 samples/sec Loss 2.2057 LearningRate 0.000336 Epoch: 19 Global Step: 396870 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:16:59,447-Speed 2494.39 samples/sec Loss 2.1988 LearningRate 0.000336 Epoch: 19 Global Step: 396880 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:17:07,659-Speed 2494.26 samples/sec Loss 2.2333 LearningRate 0.000336 Epoch: 19 Global Step: 396890 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:17:15,862-Speed 2497.16 samples/sec Loss 2.2412 LearningRate 0.000336 Epoch: 19 Global Step: 396900 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:17:24,011-Speed 2513.63 samples/sec Loss 2.2418 LearningRate 0.000336 Epoch: 19 Global Step: 396910 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:17:32,212-Speed 2497.68 samples/sec Loss 2.2480 LearningRate 0.000336 Epoch: 19 Global Step: 396920 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:17:40,429-Speed 2492.61 samples/sec Loss 2.2429 
LearningRate 0.000336 Epoch: 19 Global Step: 396930 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:17:48,630-Speed 2497.73 samples/sec Loss 2.2373 LearningRate 0.000336 Epoch: 19 Global Step: 396940 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:17:56,836-Speed 2496.24 samples/sec Loss 2.2403 LearningRate 0.000336 Epoch: 19 Global Step: 396950 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:18:05,036-Speed 2497.90 samples/sec Loss 2.2960 LearningRate 0.000336 Epoch: 19 Global Step: 396960 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:18:13,184-Speed 2513.77 samples/sec Loss 2.2185 LearningRate 0.000336 Epoch: 19 Global Step: 396970 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:18:21,392-Speed 2495.65 samples/sec Loss 2.2392 LearningRate 0.000336 Epoch: 19 Global Step: 396980 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:18:29,593-Speed 2497.60 samples/sec Loss 2.2273 LearningRate 0.000336 Epoch: 19 Global Step: 396990 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:18:37,807-Speed 2493.75 samples/sec Loss 2.2613 LearningRate 0.000336 Epoch: 19 Global Step: 397000 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:18:46,009-Speed 2497.37 samples/sec Loss 2.2752 LearningRate 0.000336 Epoch: 19 Global Step: 397010 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:18:54,209-Speed 2498.34 samples/sec Loss 2.2706 LearningRate 0.000336 Epoch: 19 Global Step: 397020 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:19:02,372-Speed 2509.27 samples/sec Loss 2.2121 LearningRate 0.000336 Epoch: 19 Global Step: 397030 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:19:10,574-Speed 2496.95 samples/sec Loss 2.2051 LearningRate 0.000336 Epoch: 19 Global Step: 397040 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:19:18,778-Speed 2496.92 samples/sec Loss 2.2579 
LearningRate 0.000336 Epoch: 19 Global Step: 397050 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:19:26,988-Speed 2495.07 samples/sec Loss 2.2249 LearningRate 0.000336 Epoch: 19 Global Step: 397060 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:19:35,191-Speed 2497.12 samples/sec Loss 2.1939 LearningRate 0.000336 Epoch: 19 Global Step: 397070 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:19:43,393-Speed 2497.09 samples/sec Loss 2.2437 LearningRate 0.000336 Epoch: 19 Global Step: 397080 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:19:51,548-Speed 2511.87 samples/sec Loss 2.2277 LearningRate 0.000336 Epoch: 19 Global Step: 397090 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:19:59,749-Speed 2497.48 samples/sec Loss 2.2332 LearningRate 0.000336 Epoch: 19 Global Step: 397100 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:20:07,954-Speed 2496.41 samples/sec Loss 2.2671 LearningRate 0.000336 Epoch: 19 Global Step: 397110 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:20:16,154-Speed 2497.87 samples/sec Loss 2.2381 LearningRate 0.000335 Epoch: 19 Global Step: 397120 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:20:24,356-Speed 2497.37 samples/sec Loss 2.2985 LearningRate 0.000335 Epoch: 19 Global Step: 397130 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:20:32,561-Speed 2496.49 samples/sec Loss 2.2423 LearningRate 0.000335 Epoch: 19 Global Step: 397140 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:20:40,709-Speed 2513.67 samples/sec Loss 2.2742 LearningRate 0.000335 Epoch: 19 Global Step: 397150 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:20:48,910-Speed 2497.82 samples/sec Loss 2.2693 LearningRate 0.000335 Epoch: 19 Global Step: 397160 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:20:57,115-Speed 2496.50 samples/sec Loss 2.2760 
LearningRate 0.000335 Epoch: 19 Global Step: 397170 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:21:05,313-Speed 2498.36 samples/sec Loss 2.1946 LearningRate 0.000335 Epoch: 19 Global Step: 397180 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:21:13,515-Speed 2497.21 samples/sec Loss 2.2408 LearningRate 0.000335 Epoch: 19 Global Step: 397190 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:21:21,727-Speed 2494.43 samples/sec Loss 2.2182 LearningRate 0.000335 Epoch: 19 Global Step: 397200 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:21:29,873-Speed 2514.35 samples/sec Loss 2.2695 LearningRate 0.000335 Epoch: 19 Global Step: 397210 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:21:38,073-Speed 2498.18 samples/sec Loss 2.2138 LearningRate 0.000335 Epoch: 19 Global Step: 397220 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:21:46,275-Speed 2497.43 samples/sec Loss 2.2475 LearningRate 0.000335 Epoch: 19 Global Step: 397230 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:21:54,478-Speed 2497.18 samples/sec Loss 2.2226 LearningRate 0.000335 Epoch: 19 Global Step: 397240 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:22:02,692-Speed 2493.84 samples/sec Loss 2.2667 LearningRate 0.000335 Epoch: 19 Global Step: 397250 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:22:10,895-Speed 2497.15 samples/sec Loss 2.1613 LearningRate 0.000335 Epoch: 19 Global Step: 397260 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:22:19,046-Speed 2512.96 samples/sec Loss 2.2604 LearningRate 0.000335 Epoch: 19 Global Step: 397270 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:22:27,249-Speed 2496.88 samples/sec Loss 2.2171 LearningRate 0.000335 Epoch: 19 Global Step: 397280 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:22:35,457-Speed 2495.43 samples/sec Loss 2.2088 
LearningRate 0.000335 Epoch: 19 Global Step: 397290 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:22:43,659-Speed 2497.63 samples/sec Loss 2.2088 LearningRate 0.000335 Epoch: 19 Global Step: 397300 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:22:51,860-Speed 2497.46 samples/sec Loss 2.2344 LearningRate 0.000335 Epoch: 19 Global Step: 397310 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:23:00,083-Speed 2491.14 samples/sec Loss 2.2689 LearningRate 0.000335 Epoch: 19 Global Step: 397320 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:23:08,229-Speed 2514.32 samples/sec Loss 2.2243 LearningRate 0.000335 Epoch: 19 Global Step: 397330 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:23:16,442-Speed 2494.22 samples/sec Loss 2.2523 LearningRate 0.000335 Epoch: 19 Global Step: 397340 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:23:24,643-Speed 2497.89 samples/sec Loss 2.2076 LearningRate 0.000335 Epoch: 19 Global Step: 397350 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:23:32,845-Speed 2497.47 samples/sec Loss 2.2246 LearningRate 0.000335 Epoch: 19 Global Step: 397360 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:23:41,058-Speed 2493.97 samples/sec Loss 2.2280 LearningRate 0.000335 Epoch: 19 Global Step: 397370 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:23:49,257-Speed 2498.43 samples/sec Loss 2.2027 LearningRate 0.000335 Epoch: 19 Global Step: 397380 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:23:57,403-Speed 2514.26 samples/sec Loss 2.2579 LearningRate 0.000335 Epoch: 19 Global Step: 397390 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:24:05,614-Speed 2494.71 samples/sec Loss 2.2176 LearningRate 0.000335 Epoch: 19 Global Step: 397400 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:24:13,816-Speed 2497.47 samples/sec Loss 2.2039 
LearningRate 0.000335 Epoch: 19 Global Step: 397410 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:24:22,014-Speed 2498.40 samples/sec Loss 2.2082 LearningRate 0.000335 Epoch: 19 Global Step: 397420 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:24:30,219-Speed 2496.49 samples/sec Loss 2.1922 LearningRate 0.000335 Epoch: 19 Global Step: 397430 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:24:38,422-Speed 2496.99 samples/sec Loss 2.2066 LearningRate 0.000335 Epoch: 19 Global Step: 397440 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:24:46,572-Speed 2513.49 samples/sec Loss 2.2028 LearningRate 0.000335 Epoch: 19 Global Step: 397450 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:24:54,773-Speed 2497.54 samples/sec Loss 2.1916 LearningRate 0.000335 Epoch: 19 Global Step: 397460 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:25:02,975-Speed 2497.37 samples/sec Loss 2.1566 LearningRate 0.000335 Epoch: 19 Global Step: 397470 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:25:11,178-Speed 2497.54 samples/sec Loss 2.2005 LearningRate 0.000335 Epoch: 19 Global Step: 397480 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:25:19,382-Speed 2496.60 samples/sec Loss 2.2310 LearningRate 0.000335 Epoch: 19 Global Step: 397490 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:25:27,590-Speed 2495.36 samples/sec Loss 2.2135 LearningRate 0.000335 Epoch: 19 Global Step: 397500 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:25:35,740-Speed 2513.49 samples/sec Loss 2.1810 LearningRate 0.000335 Epoch: 19 Global Step: 397510 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:25:43,939-Speed 2498.26 samples/sec Loss 2.2121 LearningRate 0.000335 Epoch: 19 Global Step: 397520 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:25:52,145-Speed 2496.04 samples/sec Loss 2.1526 
LearningRate 0.000335 Epoch: 19 Global Step: 397530 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:26:00,304-Speed 2510.41 samples/sec Loss 2.2128 LearningRate 0.000335 Epoch: 19 Global Step: 397540 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:26:08,508-Speed 2496.80 samples/sec Loss 2.1873 LearningRate 0.000335 Epoch: 19 Global Step: 397550 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:26:16,707-Speed 2498.25 samples/sec Loss 2.2150 LearningRate 0.000335 Epoch: 19 Global Step: 397560 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:26:24,857-Speed 2513.15 samples/sec Loss 2.1923 LearningRate 0.000335 Epoch: 19 Global Step: 397570 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:26:33,061-Speed 2496.47 samples/sec Loss 2.2376 LearningRate 0.000335 Epoch: 19 Global Step: 397580 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:26:41,262-Speed 2497.68 samples/sec Loss 2.1977 LearningRate 0.000335 Epoch: 19 Global Step: 397590 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:26:49,466-Speed 2497.25 samples/sec Loss 2.1704 LearningRate 0.000335 Epoch: 19 Global Step: 397600 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:26:57,668-Speed 2497.26 samples/sec Loss 2.1850 LearningRate 0.000335 Epoch: 19 Global Step: 397610 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:27:05,880-Speed 2494.27 samples/sec Loss 2.1942 LearningRate 0.000335 Epoch: 19 Global Step: 397620 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:27:14,028-Speed 2513.81 samples/sec Loss 2.2231 LearningRate 0.000335 Epoch: 19 Global Step: 397630 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:27:22,230-Speed 2497.67 samples/sec Loss 2.1419 LearningRate 0.000335 Epoch: 19 Global Step: 397640 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:27:30,443-Speed 2493.74 samples/sec Loss 2.2138 
LearningRate 0.000335 Epoch: 19 Global Step: 397650 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:27:38,646-Speed 2497.23 samples/sec Loss 2.1935 LearningRate 0.000335 Epoch: 19 Global Step: 397660 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:27:46,848-Speed 2497.38 samples/sec Loss 2.1915 LearningRate 0.000335 Epoch: 19 Global Step: 397670 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:27:55,047-Speed 2498.43 samples/sec Loss 2.1795 LearningRate 0.000335 Epoch: 19 Global Step: 397680 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:28:03,196-Speed 2513.65 samples/sec Loss 2.1912 LearningRate 0.000335 Epoch: 19 Global Step: 397690 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:28:11,399-Speed 2497.16 samples/sec Loss 2.1807 LearningRate 0.000335 Epoch: 19 Global Step: 397700 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:28:19,597-Speed 2498.77 samples/sec Loss 2.1960 LearningRate 0.000335 Epoch: 19 Global Step: 397710 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:28:27,798-Speed 2497.74 samples/sec Loss 2.1705 LearningRate 0.000335 Epoch: 19 Global Step: 397720 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:28:36,000-Speed 2497.18 samples/sec Loss 2.1955 LearningRate 0.000335 Epoch: 19 Global Step: 397730 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:28:44,203-Speed 2496.87 samples/sec Loss 2.2246 LearningRate 0.000335 Epoch: 19 Global Step: 397740 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:28:52,350-Speed 2514.37 samples/sec Loss 2.2231 LearningRate 0.000335 Epoch: 19 Global Step: 397750 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:29:00,551-Speed 2497.50 samples/sec Loss 2.2179 LearningRate 0.000334 Epoch: 19 Global Step: 397760 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:29:08,754-Speed 2496.94 samples/sec Loss 2.1784 
LearningRate 0.000334 Epoch: 19 Global Step: 397770 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:29:16,954-Speed 2497.94 samples/sec Loss 2.2109 LearningRate 0.000334 Epoch: 19 Global Step: 397780 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:29:25,149-Speed 2499.60 samples/sec Loss 2.1731 LearningRate 0.000334 Epoch: 19 Global Step: 397790 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:29:33,347-Speed 2498.31 samples/sec Loss 2.2119 LearningRate 0.000334 Epoch: 19 Global Step: 397800 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:29:41,498-Speed 2513.12 samples/sec Loss 2.1949 LearningRate 0.000334 Epoch: 19 Global Step: 397810 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:29:49,700-Speed 2497.54 samples/sec Loss 2.2683 LearningRate 0.000334 Epoch: 19 Global Step: 397820 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:29:57,900-Speed 2498.05 samples/sec Loss 2.1666 LearningRate 0.000334 Epoch: 19 Global Step: 397830 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:30:06,099-Speed 2498.13 samples/sec Loss 2.1740 LearningRate 0.000334 Epoch: 19 Global Step: 397840 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:30:14,302-Speed 2496.99 samples/sec Loss 2.1382 LearningRate 0.000334 Epoch: 19 Global Step: 397850 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:30:22,505-Speed 2497.34 samples/sec Loss 2.2261 LearningRate 0.000334 Epoch: 19 Global Step: 397860 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:30:30,650-Speed 2514.79 samples/sec Loss 2.2072 LearningRate 0.000334 Epoch: 19 Global Step: 397870 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:30:38,851-Speed 2497.93 samples/sec Loss 2.2075 LearningRate 0.000334 Epoch: 19 Global Step: 397880 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:30:47,051-Speed 2497.82 samples/sec Loss 2.2004 
LearningRate 0.000334 Epoch: 19 Global Step: 397890 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:30:55,252-Speed 2497.69 samples/sec Loss 2.1599 LearningRate 0.000334 Epoch: 19 Global Step: 397900 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:31:03,452-Speed 2498.18 samples/sec Loss 2.2199 LearningRate 0.000334 Epoch: 19 Global Step: 397910 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:31:11,655-Speed 2496.98 samples/sec Loss 2.2226 LearningRate 0.000334 Epoch: 19 Global Step: 397920 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:31:19,802-Speed 2514.24 samples/sec Loss 2.2355 LearningRate 0.000334 Epoch: 19 Global Step: 397930 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:31:28,003-Speed 2497.56 samples/sec Loss 2.2217 LearningRate 0.000334 Epoch: 19 Global Step: 397940 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:31:36,204-Speed 2497.71 samples/sec Loss 2.2598 LearningRate 0.000334 Epoch: 19 Global Step: 397950 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:31:44,404-Speed 2497.88 samples/sec Loss 2.2571 LearningRate 0.000334 Epoch: 19 Global Step: 397960 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:31:52,604-Speed 2497.95 samples/sec Loss 2.1993 LearningRate 0.000334 Epoch: 19 Global Step: 397970 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:32:00,802-Speed 2498.82 samples/sec Loss 2.1577 LearningRate 0.000334 Epoch: 19 Global Step: 397980 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:32:08,955-Speed 2512.32 samples/sec Loss 2.2378 LearningRate 0.000334 Epoch: 19 Global Step: 397990 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:32:17,158-Speed 2497.10 samples/sec Loss 2.1579 LearningRate 0.000334 Epoch: 19 Global Step: 398000 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:32:25,356-Speed 2498.47 samples/sec Loss 2.2505 
LearningRate 0.000334 Epoch: 19 Global Step: 398010 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:32:33,560-Speed 2496.77 samples/sec Loss 2.2041 LearningRate 0.000334 Epoch: 19 Global Step: 398020 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:32:41,761-Speed 2497.38 samples/sec Loss 2.2828 LearningRate 0.000334 Epoch: 19 Global Step: 398030 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:32:49,988-Speed 2490.09 samples/sec Loss 2.2394 LearningRate 0.000334 Epoch: 19 Global Step: 398040 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:32:58,135-Speed 2514.13 samples/sec Loss 2.1977 LearningRate 0.000334 Epoch: 19 Global Step: 398050 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:33:06,342-Speed 2495.98 samples/sec Loss 2.1683 LearningRate 0.000334 Epoch: 19 Global Step: 398060 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:33:14,543-Speed 2497.49 samples/sec Loss 2.2029 LearningRate 0.000334 Epoch: 19 Global Step: 398070 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:33:22,743-Speed 2498.42 samples/sec Loss 2.2198 LearningRate 0.000334 Epoch: 19 Global Step: 398080 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:33:30,944-Speed 2497.62 samples/sec Loss 2.2142 LearningRate 0.000334 Epoch: 19 Global Step: 398090 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:33:39,158-Speed 2493.73 samples/sec Loss 2.2179 LearningRate 0.000334 Epoch: 19 Global Step: 398100 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:33:47,304-Speed 2514.40 samples/sec Loss 2.1349 LearningRate 0.000334 Epoch: 19 Global Step: 398110 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:33:55,503-Speed 2498.22 samples/sec Loss 2.2215 LearningRate 0.000334 Epoch: 19 Global Step: 398120 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:34:03,714-Speed 2494.75 samples/sec Loss 2.1958 
LearningRate 0.000334 Epoch: 19 Global Step: 398130 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:34:11,914-Speed 2497.94 samples/sec Loss 2.2103 LearningRate 0.000334 Epoch: 19 Global Step: 398140 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:34:20,114-Speed 2498.17 samples/sec Loss 2.2262 LearningRate 0.000334 Epoch: 19 Global Step: 398150 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:34:28,311-Speed 2498.89 samples/sec Loss 2.2371 LearningRate 0.000334 Epoch: 19 Global Step: 398160 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:34:36,455-Speed 2515.10 samples/sec Loss 2.2703 LearningRate 0.000334 Epoch: 19 Global Step: 398170 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:34:44,654-Speed 2498.29 samples/sec Loss 2.2319 LearningRate 0.000334 Epoch: 19 Global Step: 398180 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:34:52,855-Speed 2497.66 samples/sec Loss 2.2328 LearningRate 0.000334 Epoch: 19 Global Step: 398190 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:35:01,052-Speed 2498.75 samples/sec Loss 2.2831 LearningRate 0.000334 Epoch: 19 Global Step: 398200 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:35:09,256-Speed 2497.07 samples/sec Loss 2.2899 LearningRate 0.000334 Epoch: 19 Global Step: 398210 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:35:17,456-Speed 2497.96 samples/sec Loss 2.2691 LearningRate 0.000334 Epoch: 19 Global Step: 398220 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:35:25,601-Speed 2514.70 samples/sec Loss 2.2119 LearningRate 0.000334 Epoch: 19 Global Step: 398230 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:35:33,799-Speed 2498.59 samples/sec Loss 2.2309 LearningRate 0.000334 Epoch: 19 Global Step: 398240 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:35:41,998-Speed 2498.40 samples/sec Loss 2.2399 
LearningRate 0.000334 Epoch: 19 Global Step: 398250 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:35:50,208-Speed 2495.14 samples/sec Loss 2.1767 LearningRate 0.000334 Epoch: 19 Global Step: 398260 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:35:58,420-Speed 2494.49 samples/sec Loss 2.2087 LearningRate 0.000334 Epoch: 19 Global Step: 398270 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:36:06,619-Speed 2497.98 samples/sec Loss 2.1655 LearningRate 0.000334 Epoch: 19 Global Step: 398280 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:36:14,766-Speed 2514.44 samples/sec Loss 2.2114 LearningRate 0.000334 Epoch: 19 Global Step: 398290 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:36:22,971-Speed 2496.52 samples/sec Loss 2.1868 LearningRate 0.000334 Epoch: 19 Global Step: 398300 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:36:31,168-Speed 2498.93 samples/sec Loss 2.1973 LearningRate 0.000334 Epoch: 19 Global Step: 398310 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:36:39,368-Speed 2498.55 samples/sec Loss 2.2175 LearningRate 0.000334 Epoch: 19 Global Step: 398320 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:36:47,577-Speed 2495.20 samples/sec Loss 2.1468 LearningRate 0.000334 Epoch: 19 Global Step: 398330 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:36:55,781-Speed 2496.74 samples/sec Loss 2.2100 LearningRate 0.000334 Epoch: 19 Global Step: 398340 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:37:03,928-Speed 2514.28 samples/sec Loss 2.1899 LearningRate 0.000334 Epoch: 19 Global Step: 398350 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:37:12,131-Speed 2497.33 samples/sec Loss 2.2013 LearningRate 0.000334 Epoch: 19 Global Step: 398360 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:37:20,334-Speed 2497.09 samples/sec Loss 2.1947 
LearningRate 0.000334 Epoch: 19 Global Step: 398370 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:37:28,535-Speed 2497.61 samples/sec Loss 2.1823 LearningRate 0.000334 Epoch: 19 Global Step: 398380 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:37:36,735-Speed 2498.00 samples/sec Loss 2.2435 LearningRate 0.000334 Epoch: 19 Global Step: 398390 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:37:44,935-Speed 2498.00 samples/sec Loss 2.2187 LearningRate 0.000334 Epoch: 19 Global Step: 398400 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:37:53,085-Speed 2513.09 samples/sec Loss 2.1775 LearningRate 0.000333 Epoch: 19 Global Step: 398410 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:38:01,288-Speed 2497.37 samples/sec Loss 2.2380 LearningRate 0.000333 Epoch: 19 Global Step: 398420 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:38:09,487-Speed 2498.16 samples/sec Loss 2.2336 LearningRate 0.000333 Epoch: 19 Global Step: 398430 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:38:17,689-Speed 2497.36 samples/sec Loss 2.2311 LearningRate 0.000333 Epoch: 19 Global Step: 398440 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:38:25,886-Speed 2498.82 samples/sec Loss 2.2104 LearningRate 0.000333 Epoch: 19 Global Step: 398450 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:38:34,084-Speed 2498.63 samples/sec Loss 2.2235 LearningRate 0.000333 Epoch: 19 Global Step: 398460 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:38:42,239-Speed 2511.69 samples/sec Loss 2.2283 LearningRate 0.000333 Epoch: 19 Global Step: 398470 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:38:50,440-Speed 2497.91 samples/sec Loss 2.2251 LearningRate 0.000333 Epoch: 19 Global Step: 398480 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:38:58,640-Speed 2497.85 samples/sec Loss 2.2495 
LearningRate 0.000333 Epoch: 19 Global Step: 398490 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:39:06,847-Speed 2495.83 samples/sec Loss 2.2507 LearningRate 0.000333 Epoch: 19 Global Step: 398500 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:39:15,056-Speed 2495.13 samples/sec Loss 2.1939 LearningRate 0.000333 Epoch: 19 Global Step: 398510 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:39:23,257-Speed 2497.95 samples/sec Loss 2.1410 LearningRate 0.000333 Epoch: 19 Global Step: 398520 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:39:31,405-Speed 2514.09 samples/sec Loss 2.2415 LearningRate 0.000333 Epoch: 19 Global Step: 398530 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:39:39,610-Speed 2496.28 samples/sec Loss 2.2596 LearningRate 0.000333 Epoch: 19 Global Step: 398540 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:39:47,813-Speed 2497.49 samples/sec Loss 2.2119 LearningRate 0.000333 Epoch: 19 Global Step: 398550 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:39:56,012-Speed 2498.30 samples/sec Loss 2.1738 LearningRate 0.000333 Epoch: 19 Global Step: 398560 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:40:04,219-Speed 2495.75 samples/sec Loss 2.1914 LearningRate 0.000333 Epoch: 19 Global Step: 398570 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:40:12,417-Speed 2498.58 samples/sec Loss 2.2368 LearningRate 0.000333 Epoch: 19 Global Step: 398580 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:40:20,563-Speed 2514.42 samples/sec Loss 2.2012 LearningRate 0.000333 Epoch: 19 Global Step: 398590 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:40:28,763-Speed 2498.23 samples/sec Loss 2.2142 LearningRate 0.000333 Epoch: 19 Global Step: 398600 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:40:36,961-Speed 2498.54 samples/sec Loss 2.2389 
LearningRate 0.000333 Epoch: 19 Global Step: 398610 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:40:45,171-Speed 2494.78 samples/sec Loss 2.1630 LearningRate 0.000333 Epoch: 19 Global Step: 398620 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:40:53,390-Speed 2492.35 samples/sec Loss 2.2003 LearningRate 0.000333 Epoch: 19 Global Step: 398630 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:41:01,592-Speed 2497.33 samples/sec Loss 2.1667 LearningRate 0.000333 Epoch: 19 Global Step: 398640 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:41:09,738-Speed 2514.29 samples/sec Loss 2.1806 LearningRate 0.000333 Epoch: 19 Global Step: 398650 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:41:17,946-Speed 2495.59 samples/sec Loss 2.2414 LearningRate 0.000333 Epoch: 19 Global Step: 398660 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:41:26,195-Speed 2500.05 samples/sec Loss 2.2383 LearningRate 0.000333 Epoch: 19 Global Step: 398670 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:41:38,011-Speed 1908.26 samples/sec Loss 2.2219 LearningRate 0.000333 Epoch: 19 Global Step: 398680 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:41:46,211-Speed 2498.01 samples/sec Loss 2.1808 LearningRate 0.000333 Epoch: 19 Global Step: 398690 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:41:54,470-Speed 2501.20 samples/sec Loss 2.2176 LearningRate 0.000333 Epoch: 19 Global Step: 398700 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:42:06,933-Speed 2515.06 samples/sec Loss 2.2123 LearningRate 0.000333 Epoch: 19 Global Step: 398710 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:42:15,190-Speed 2502.57 samples/sec Loss 2.1913 LearningRate 0.000333 Epoch: 19 Global Step: 398720 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:42:23,401-Speed 2494.65 samples/sec Loss 2.1881 
LearningRate 0.000333 Epoch: 19 Global Step: 398730 Fp16 Grad Scale: 16384 Required: 99 hours Training: 2022-07-09 09:42:31,623-Speed 2500.96 samples/sec Loss 2.1850 LearningRate 0.000333 Epoch: 19 Global Step: 398740 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:42:44,263-Speed 1723.77 samples/sec Loss 2.2578 LearningRate 0.000333 Epoch: 19 Global Step: 398750 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:42:52,489-Speed 2501.91 samples/sec Loss 2.2283 LearningRate 0.000333 Epoch: 19 Global Step: 398760 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:43:00,660-Speed 2517.81 samples/sec Loss 2.2068 LearningRate 0.000333 Epoch: 19 Global Step: 398770 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:43:08,913-Speed 2499.34 samples/sec Loss 2.2133 LearningRate 0.000333 Epoch: 19 Global Step: 398780 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:43:17,111-Speed 2498.44 samples/sec Loss 2.1626 LearningRate 0.000333 Epoch: 19 Global Step: 398790 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:43:30,075-Speed 1580.01 samples/sec Loss 2.2524 LearningRate 0.000333 Epoch: 19 Global Step: 398800 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:43:38,323-Speed 2500.74 samples/sec Loss 2.2257 LearningRate 0.000333 Epoch: 19 Global Step: 398810 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:43:46,568-Speed 2498.32 samples/sec Loss 2.2279 LearningRate 0.000333 Epoch: 19 Global Step: 398820 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:43:58,251-Speed 1753.19 samples/sec Loss 2.2559 LearningRate 0.000333 Epoch: 19 Global Step: 398830 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:44:06,510-Speed 2500.00 samples/sec Loss 2.1975 LearningRate 0.000333 Epoch: 19 Global Step: 398840 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:44:14,749-Speed 2499.51 samples/sec Loss 2.2330 
LearningRate 0.000333 Epoch: 19 Global Step: 398850 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:44:23,435-Speed 2357.96 samples/sec Loss 2.2100 LearningRate 0.000333 Epoch: 19 Global Step: 398860 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:44:35,772-Speed 1664.14 samples/sec Loss 2.1756 LearningRate 0.000333 Epoch: 19 Global Step: 398870 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:44:44,043-Speed 2495.79 samples/sec Loss 2.2294 LearningRate 0.000333 Epoch: 19 Global Step: 398880 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:44:52,225-Speed 2514.87 samples/sec Loss 2.2531 LearningRate 0.000333 Epoch: 19 Global Step: 398890 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:45:03,937-Speed 1748.74 samples/sec Loss 2.2398 LearningRate 0.000333 Epoch: 19 Global Step: 398900 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:45:12,332-Speed 2461.11 samples/sec Loss 2.1878 LearningRate 0.000333 Epoch: 19 Global Step: 398910 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:45:20,625-Speed 2492.81 samples/sec Loss 2.2391 LearningRate 0.000333 Epoch: 19 Global Step: 398920 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:45:28,830-Speed 2496.32 samples/sec Loss 2.2349 LearningRate 0.000333 Epoch: 19 Global Step: 398930 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:45:37,716-Speed 2492.74 samples/sec Loss 2.1752 LearningRate 0.000333 Epoch: 19 Global Step: 398940 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:45:45,870-Speed 2512.27 samples/sec Loss 2.2153 LearningRate 0.000333 Epoch: 19 Global Step: 398950 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:45:54,077-Speed 2495.66 samples/sec Loss 2.2089 LearningRate 0.000333 Epoch: 19 Global Step: 398960 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:46:02,281-Speed 2496.86 samples/sec Loss 2.2496 
LearningRate 0.000333 Epoch: 19 Global Step: 398970 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:46:10,486-Speed 2496.27 samples/sec Loss 2.2038 LearningRate 0.000333 Epoch: 19 Global Step: 398980 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:46:18,692-Speed 2496.14 samples/sec Loss 2.2377 LearningRate 0.000333 Epoch: 19 Global Step: 398990 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:46:26,898-Speed 2496.01 samples/sec Loss 2.2327 LearningRate 0.000333 Epoch: 19 Global Step: 399000 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:46:35,049-Speed 2513.00 samples/sec Loss 2.2265 LearningRate 0.000333 Epoch: 19 Global Step: 399010 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:46:43,254-Speed 2496.68 samples/sec Loss 2.2778 LearningRate 0.000333 Epoch: 19 Global Step: 399020 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:46:51,469-Speed 2493.34 samples/sec Loss 2.1946 LearningRate 0.000333 Epoch: 19 Global Step: 399030 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:46:59,672-Speed 2497.04 samples/sec Loss 2.1974 LearningRate 0.000333 Epoch: 19 Global Step: 399040 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:47:07,889-Speed 2493.09 samples/sec Loss 2.2033 LearningRate 0.000332 Epoch: 19 Global Step: 399050 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:47:16,094-Speed 2496.34 samples/sec Loss 2.2314 LearningRate 0.000332 Epoch: 19 Global Step: 399060 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:47:24,242-Speed 2513.98 samples/sec Loss 2.1955 LearningRate 0.000332 Epoch: 19 Global Step: 399070 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:47:32,450-Speed 2495.32 samples/sec Loss 2.2415 LearningRate 0.000332 Epoch: 19 Global Step: 399080 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:47:40,658-Speed 2495.62 samples/sec Loss 2.2278 
LearningRate 0.000332 Epoch: 19 Global Step: 399090 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:47:48,877-Speed 2492.21 samples/sec Loss 2.2165 LearningRate 0.000332 Epoch: 19 Global Step: 399100 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:47:57,086-Speed 2495.05 samples/sec Loss 2.1903 LearningRate 0.000332 Epoch: 19 Global Step: 399110 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:48:05,302-Speed 2493.05 samples/sec Loss 2.1392 LearningRate 0.000332 Epoch: 19 Global Step: 399120 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:48:13,454-Speed 2512.63 samples/sec Loss 2.2035 LearningRate 0.000332 Epoch: 19 Global Step: 399130 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:48:21,657-Speed 2497.04 samples/sec Loss 2.2066 LearningRate 0.000332 Epoch: 19 Global Step: 399140 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:48:29,862-Speed 2496.70 samples/sec Loss 2.1646 LearningRate 0.000332 Epoch: 19 Global Step: 399150 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:48:38,062-Speed 2497.67 samples/sec Loss 2.2062 LearningRate 0.000332 Epoch: 19 Global Step: 399160 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:48:46,269-Speed 2495.86 samples/sec Loss 2.2505 LearningRate 0.000332 Epoch: 19 Global Step: 399170 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:48:54,484-Speed 2493.54 samples/sec Loss 2.1851 LearningRate 0.000332 Epoch: 19 Global Step: 399180 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:49:02,636-Speed 2512.57 samples/sec Loss 2.1921 LearningRate 0.000332 Epoch: 19 Global Step: 399190 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:49:10,843-Speed 2496.28 samples/sec Loss 2.1570 LearningRate 0.000332 Epoch: 19 Global Step: 399200 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:49:19,048-Speed 2496.16 samples/sec Loss 2.2226 
LearningRate 0.000332 Epoch: 19 Global Step: 399210 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:49:27,252-Speed 2496.94 samples/sec Loss 2.1880 LearningRate 0.000332 Epoch: 19 Global Step: 399220 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:49:35,489-Speed 2486.47 samples/sec Loss 2.1901 LearningRate 0.000332 Epoch: 19 Global Step: 399230 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:49:43,691-Speed 2497.28 samples/sec Loss 2.2294 LearningRate 0.000332 Epoch: 19 Global Step: 399240 Fp16 Grad Scale: 32768 Required: 99 hours Training: 2022-07-09 09:49:51,837-Speed 2514.41 samples/sec Loss 2.2295 LearningRate 0.000332 Epoch: 19 Global Step: 399250 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 09:50:00,048-Speed 2494.82 samples/sec Loss 2.1903 LearningRate 0.000332 Epoch: 19 Global Step: 399260 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 09:50:08,247-Speed 2498.37 samples/sec Loss 2.1797 LearningRate 0.000332 Epoch: 19 Global Step: 399270 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 09:50:16,453-Speed 2496.08 samples/sec Loss 2.1766 LearningRate 0.000332 Epoch: 19 Global Step: 399280 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 09:50:24,612-Speed 2510.24 samples/sec Loss 2.1871 LearningRate 0.000332 Epoch: 19 Global Step: 399290 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:50:32,827-Speed 2493.53 samples/sec Loss 2.2261 LearningRate 0.000332 Epoch: 19 Global Step: 399300 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:50:40,978-Speed 2513.03 samples/sec Loss 2.2044 LearningRate 0.000332 Epoch: 19 Global Step: 399310 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:50:49,184-Speed 2496.09 samples/sec Loss 2.1815 LearningRate 0.000332 Epoch: 19 Global Step: 399320 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:50:57,389-Speed 2496.51 samples/sec Loss 2.1488 
LearningRate 0.000332 Epoch: 19 Global Step: 399330 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:51:05,589-Speed 2498.02 samples/sec Loss 2.1477 LearningRate 0.000332 Epoch: 19 Global Step: 399340 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:51:13,797-Speed 2495.32 samples/sec Loss 2.2150 LearningRate 0.000332 Epoch: 19 Global Step: 399350 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:51:22,003-Speed 2496.09 samples/sec Loss 2.2124 LearningRate 0.000332 Epoch: 19 Global Step: 399360 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:51:30,149-Speed 2515.43 samples/sec Loss 2.1896 LearningRate 0.000332 Epoch: 19 Global Step: 399370 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:51:38,348-Speed 2498.23 samples/sec Loss 2.2347 LearningRate 0.000332 Epoch: 19 Global Step: 399380 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:51:46,551-Speed 2497.13 samples/sec Loss 2.2620 LearningRate 0.000332 Epoch: 19 Global Step: 399390 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:51:54,753-Speed 2497.46 samples/sec Loss 2.2289 LearningRate 0.000332 Epoch: 19 Global Step: 399400 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:52:02,955-Speed 2497.49 samples/sec Loss 2.2261 LearningRate 0.000332 Epoch: 19 Global Step: 399410 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:52:11,161-Speed 2496.34 samples/sec Loss 2.2009 LearningRate 0.000332 Epoch: 19 Global Step: 399420 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:52:19,308-Speed 2514.17 samples/sec Loss 2.2081 LearningRate 0.000332 Epoch: 19 Global Step: 399430 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:52:27,511-Speed 2497.08 samples/sec Loss 2.1991 LearningRate 0.000332 Epoch: 19 Global Step: 399440 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:52:35,714-Speed 2496.81 samples/sec Loss 2.2421 
LearningRate 0.000332 Epoch: 19 Global Step: 399450 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:52:43,914-Speed 2498.06 samples/sec Loss 2.2533 LearningRate 0.000332 Epoch: 19 Global Step: 399460 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:52:52,116-Speed 2497.47 samples/sec Loss 2.2006 LearningRate 0.000332 Epoch: 19 Global Step: 399470 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:53:00,320-Speed 2496.68 samples/sec Loss 2.1875 LearningRate 0.000332 Epoch: 19 Global Step: 399480 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:53:08,470-Speed 2513.14 samples/sec Loss 2.2386 LearningRate 0.000332 Epoch: 19 Global Step: 399490 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:53:16,675-Speed 2496.45 samples/sec Loss 2.2154 LearningRate 0.000332 Epoch: 19 Global Step: 399500 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:53:24,883-Speed 2495.73 samples/sec Loss 2.2442 LearningRate 0.000332 Epoch: 19 Global Step: 399510 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:53:33,086-Speed 2497.13 samples/sec Loss 2.2253 LearningRate 0.000332 Epoch: 19 Global Step: 399520 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:53:41,285-Speed 2498.23 samples/sec Loss 2.2241 LearningRate 0.000332 Epoch: 19 Global Step: 399530 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:53:49,486-Speed 2497.81 samples/sec Loss 2.1866 LearningRate 0.000332 Epoch: 19 Global Step: 399540 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:53:57,634-Speed 2513.66 samples/sec Loss 2.2423 LearningRate 0.000332 Epoch: 19 Global Step: 399550 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:54:05,835-Speed 2497.69 samples/sec Loss 2.2318 LearningRate 0.000332 Epoch: 19 Global Step: 399560 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:54:14,036-Speed 2497.65 samples/sec Loss 2.2113 
LearningRate 0.000332 Epoch: 19 Global Step: 399570 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:54:22,243-Speed 2495.92 samples/sec Loss 2.2239 LearningRate 0.000332 Epoch: 19 Global Step: 399580 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:54:30,445-Speed 2497.14 samples/sec Loss 2.2102 LearningRate 0.000332 Epoch: 19 Global Step: 399590 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:54:38,651-Speed 2496.13 samples/sec Loss 2.2122 LearningRate 0.000332 Epoch: 19 Global Step: 399600 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:54:46,800-Speed 2513.78 samples/sec Loss 2.1742 LearningRate 0.000332 Epoch: 19 Global Step: 399610 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:54:55,002-Speed 2497.34 samples/sec Loss 2.1859 LearningRate 0.000332 Epoch: 19 Global Step: 399620 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:55:03,204-Speed 2497.27 samples/sec Loss 2.2008 LearningRate 0.000332 Epoch: 19 Global Step: 399630 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:55:11,406-Speed 2497.50 samples/sec Loss 2.2063 LearningRate 0.000332 Epoch: 19 Global Step: 399640 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:55:19,607-Speed 2497.67 samples/sec Loss 2.2222 LearningRate 0.000332 Epoch: 19 Global Step: 399650 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:55:27,819-Speed 2494.22 samples/sec Loss 2.2214 LearningRate 0.000332 Epoch: 19 Global Step: 399660 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:55:35,977-Speed 2510.98 samples/sec Loss 2.2186 LearningRate 0.000332 Epoch: 19 Global Step: 399670 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:55:44,180-Speed 2497.09 samples/sec Loss 2.1926 LearningRate 0.000332 Epoch: 19 Global Step: 399680 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:55:52,382-Speed 2497.61 samples/sec Loss 2.1568 
LearningRate 0.000332 Epoch: 19 Global Step: 399690 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:56:00,585-Speed 2497.08 samples/sec Loss 2.1844 LearningRate 0.000331 Epoch: 19 Global Step: 399700 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:56:08,785-Speed 2498.04 samples/sec Loss 2.1658 LearningRate 0.000331 Epoch: 19 Global Step: 399710 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:56:16,985-Speed 2497.84 samples/sec Loss 2.2234 LearningRate 0.000331 Epoch: 19 Global Step: 399720 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:56:25,134-Speed 2513.73 samples/sec Loss 2.1589 LearningRate 0.000331 Epoch: 19 Global Step: 399730 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:56:33,340-Speed 2496.10 samples/sec Loss 2.2278 LearningRate 0.000331 Epoch: 19 Global Step: 399740 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:56:41,550-Speed 2495.20 samples/sec Loss 2.1952 LearningRate 0.000331 Epoch: 19 Global Step: 399750 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:56:49,758-Speed 2495.42 samples/sec Loss 2.1863 LearningRate 0.000331 Epoch: 19 Global Step: 399760 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:56:57,966-Speed 2495.71 samples/sec Loss 2.1942 LearningRate 0.000331 Epoch: 19 Global Step: 399770 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:57:06,171-Speed 2496.36 samples/sec Loss 2.2014 LearningRate 0.000331 Epoch: 19 Global Step: 399780 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:57:14,321-Speed 2513.15 samples/sec Loss 2.2327 LearningRate 0.000331 Epoch: 19 Global Step: 399790 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:57:22,526-Speed 2496.64 samples/sec Loss 2.2210 LearningRate 0.000331 Epoch: 19 Global Step: 399800 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:57:30,729-Speed 2496.84 samples/sec Loss 2.2475 
LearningRate 0.000331 Epoch: 19 Global Step: 399810 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:57:38,948-Speed 2492.30 samples/sec Loss 2.1886 LearningRate 0.000331 Epoch: 19 Global Step: 399820 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:57:47,162-Speed 2493.68 samples/sec Loss 2.1724 LearningRate 0.000331 Epoch: 19 Global Step: 399830 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:57:55,364-Speed 2497.61 samples/sec Loss 2.1831 LearningRate 0.000331 Epoch: 19 Global Step: 399840 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:58:03,529-Speed 2508.40 samples/sec Loss 2.2180 LearningRate 0.000331 Epoch: 19 Global Step: 399850 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:58:11,731-Speed 2497.47 samples/sec Loss 2.2223 LearningRate 0.000331 Epoch: 19 Global Step: 399860 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:58:19,931-Speed 2498.01 samples/sec Loss 2.1405 LearningRate 0.000331 Epoch: 19 Global Step: 399870 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:58:28,134-Speed 2497.31 samples/sec Loss 2.2402 LearningRate 0.000331 Epoch: 19 Global Step: 399880 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:58:36,334-Speed 2497.69 samples/sec Loss 2.2828 LearningRate 0.000331 Epoch: 19 Global Step: 399890 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:58:44,537-Speed 2497.14 samples/sec Loss 2.1799 LearningRate 0.000331 Epoch: 19 Global Step: 399900 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:58:52,686-Speed 2513.52 samples/sec Loss 2.2557 LearningRate 0.000331 Epoch: 19 Global Step: 399910 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:59:00,904-Speed 2492.59 samples/sec Loss 2.1558 LearningRate 0.000331 Epoch: 19 Global Step: 399920 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:59:09,112-Speed 2495.45 samples/sec Loss 2.2706 
LearningRate 0.000331 Epoch: 19 Global Step: 399930 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:59:17,318-Speed 2495.87 samples/sec Loss 2.2231 LearningRate 0.000331 Epoch: 19 Global Step: 399940 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:59:25,519-Speed 2497.79 samples/sec Loss 2.2242 LearningRate 0.000331 Epoch: 19 Global Step: 399950 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:59:33,735-Speed 2493.10 samples/sec Loss 2.1909 LearningRate 0.000331 Epoch: 19 Global Step: 399960 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:59:41,884-Speed 2513.46 samples/sec Loss 2.2168 LearningRate 0.000331 Epoch: 19 Global Step: 399970 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:59:50,086-Speed 2497.54 samples/sec Loss 2.1986 LearningRate 0.000331 Epoch: 19 Global Step: 399980 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 09:59:58,292-Speed 2496.69 samples/sec Loss 2.2525 LearningRate 0.000331 Epoch: 19 Global Step: 399990 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:00:06,494-Speed 2497.29 samples/sec Loss 2.2592 LearningRate 0.000331 Epoch: 19 Global Step: 400000 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:00:14,703-Speed 2495.22 samples/sec Loss 2.2068 LearningRate 0.000331 Epoch: 19 Global Step: 400010 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:00:22,906-Speed 2496.99 samples/sec Loss 2.2114 LearningRate 0.000331 Epoch: 19 Global Step: 400020 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:00:31,057-Speed 2513.02 samples/sec Loss 2.1987 LearningRate 0.000331 Epoch: 19 Global Step: 400030 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:00:39,258-Speed 2497.53 samples/sec Loss 2.2075 LearningRate 0.000331 Epoch: 19 Global Step: 400040 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:00:47,459-Speed 2497.68 samples/sec Loss 2.2809 
LearningRate 0.000331 Epoch: 19 Global Step: 400050 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:00:55,668-Speed 2495.35 samples/sec Loss 2.2261 LearningRate 0.000331 Epoch: 19 Global Step: 400060 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:01:03,871-Speed 2496.81 samples/sec Loss 2.2251 LearningRate 0.000331 Epoch: 19 Global Step: 400070 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:01:12,069-Speed 2498.49 samples/sec Loss 2.1976 LearningRate 0.000331 Epoch: 19 Global Step: 400080 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:01:20,218-Speed 2513.98 samples/sec Loss 2.1943 LearningRate 0.000331 Epoch: 19 Global Step: 400090 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:01:28,430-Speed 2494.06 samples/sec Loss 2.1849 LearningRate 0.000331 Epoch: 19 Global Step: 400100 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:01:36,636-Speed 2496.12 samples/sec Loss 2.1835 LearningRate 0.000331 Epoch: 19 Global Step: 400110 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:01:44,838-Speed 2497.49 samples/sec Loss 2.2111 LearningRate 0.000331 Epoch: 19 Global Step: 400120 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:01:53,052-Speed 2493.79 samples/sec Loss 2.2859 LearningRate 0.000331 Epoch: 19 Global Step: 400130 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:02:01,259-Speed 2496.70 samples/sec Loss 2.1583 LearningRate 0.000331 Epoch: 19 Global Step: 400140 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:02:09,410-Speed 2512.99 samples/sec Loss 2.2361 LearningRate 0.000331 Epoch: 19 Global Step: 400150 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:02:17,609-Speed 2498.33 samples/sec Loss 2.1685 LearningRate 0.000331 Epoch: 19 Global Step: 400160 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:02:25,820-Speed 2494.63 samples/sec Loss 2.1870 
LearningRate 0.000331 Epoch: 19 Global Step: 400170 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:02:34,022-Speed 2497.58 samples/sec Loss 2.1579 LearningRate 0.000331 Epoch: 19 Global Step: 400180 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:02:42,227-Speed 2496.27 samples/sec Loss 2.1884 LearningRate 0.000331 Epoch: 19 Global Step: 400190 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:02:50,430-Speed 2496.98 samples/sec Loss 2.2272 LearningRate 0.000331 Epoch: 19 Global Step: 400200 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:02:58,597-Speed 2508.97 samples/sec Loss 2.2245 LearningRate 0.000331 Epoch: 19 Global Step: 400210 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:03:06,806-Speed 2495.23 samples/sec Loss 2.1774 LearningRate 0.000331 Epoch: 19 Global Step: 400220 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:03:15,013-Speed 2495.89 samples/sec Loss 2.3121 LearningRate 0.000331 Epoch: 19 Global Step: 400230 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:03:23,216-Speed 2497.18 samples/sec Loss 2.2019 LearningRate 0.000331 Epoch: 19 Global Step: 400240 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:03:31,420-Speed 2496.60 samples/sec Loss 2.1969 LearningRate 0.000331 Epoch: 19 Global Step: 400250 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:03:39,619-Speed 2498.34 samples/sec Loss 2.2270 LearningRate 0.000331 Epoch: 19 Global Step: 400260 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:03:47,769-Speed 2513.19 samples/sec Loss 2.2079 LearningRate 0.000331 Epoch: 19 Global Step: 400270 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:03:55,978-Speed 2495.45 samples/sec Loss 2.2203 LearningRate 0.000331 Epoch: 19 Global Step: 400280 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:04:04,176-Speed 2498.68 samples/sec Loss 2.2314 
LearningRate 0.000331 Epoch: 19 Global Step: 400290 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:04:12,380-Speed 2496.65 samples/sec Loss 2.2075 LearningRate 0.000331 Epoch: 19 Global Step: 400300 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:04:20,582-Speed 2497.41 samples/sec Loss 2.2137 LearningRate 0.000331 Epoch: 19 Global Step: 400310 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:04:28,785-Speed 2497.02 samples/sec Loss 2.2209 LearningRate 0.000331 Epoch: 19 Global Step: 400320 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:04:36,945-Speed 2510.04 samples/sec Loss 2.1943 LearningRate 0.000331 Epoch: 19 Global Step: 400330 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:04:45,147-Speed 2497.47 samples/sec Loss 2.2345 LearningRate 0.000331 Epoch: 19 Global Step: 400340 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:04:53,346-Speed 2498.43 samples/sec Loss 2.2376 LearningRate 0.000330 Epoch: 19 Global Step: 400350 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:05:01,560-Speed 2493.63 samples/sec Loss 2.1582 LearningRate 0.000330 Epoch: 19 Global Step: 400360 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:05:09,761-Speed 2497.92 samples/sec Loss 2.2260 LearningRate 0.000330 Epoch: 19 Global Step: 400370 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:05:17,962-Speed 2497.42 samples/sec Loss 2.2262 LearningRate 0.000330 Epoch: 19 Global Step: 400380 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:05:26,110-Speed 2513.87 samples/sec Loss 2.2172 LearningRate 0.000330 Epoch: 19 Global Step: 400390 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:05:34,310-Speed 2498.39 samples/sec Loss 2.1968 LearningRate 0.000330 Epoch: 19 Global Step: 400400 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:05:42,510-Speed 2497.77 samples/sec Loss 2.1936 
LearningRate 0.000330 Epoch: 19 Global Step: 400410 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:05:50,712-Speed 2497.30 samples/sec Loss 2.2428 LearningRate 0.000330 Epoch: 19 Global Step: 400420 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:05:58,925-Speed 2494.02 samples/sec Loss 2.2024 LearningRate 0.000330 Epoch: 19 Global Step: 400430 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:06:07,127-Speed 2497.49 samples/sec Loss 2.2124 LearningRate 0.000330 Epoch: 19 Global Step: 400440 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:06:15,277-Speed 2513.22 samples/sec Loss 2.1483 LearningRate 0.000330 Epoch: 19 Global Step: 400450 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:06:23,477-Speed 2498.02 samples/sec Loss 2.2485 LearningRate 0.000330 Epoch: 19 Global Step: 400460 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:06:31,692-Speed 2493.33 samples/sec Loss 2.1898 LearningRate 0.000330 Epoch: 19 Global Step: 400470 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:06:39,889-Speed 2499.14 samples/sec Loss 2.2129 LearningRate 0.000330 Epoch: 19 Global Step: 400480 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:06:48,092-Speed 2496.94 samples/sec Loss 2.1709 LearningRate 0.000330 Epoch: 19 Global Step: 400490 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:06:56,297-Speed 2496.58 samples/sec Loss 2.1626 LearningRate 0.000330 Epoch: 19 Global Step: 400500 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:07:04,443-Speed 2514.43 samples/sec Loss 2.1655 LearningRate 0.000330 Epoch: 19 Global Step: 400510 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:07:12,649-Speed 2496.31 samples/sec Loss 2.1612 LearningRate 0.000330 Epoch: 19 Global Step: 400520 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:07:20,852-Speed 2496.80 samples/sec Loss 2.2010 
LearningRate 0.000330 Epoch: 19 Global Step: 400530 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:07:29,061-Speed 2495.36 samples/sec Loss 2.2221 LearningRate 0.000330 Epoch: 19 Global Step: 400540 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:07:37,267-Speed 2496.01 samples/sec Loss 2.2135 LearningRate 0.000330 Epoch: 19 Global Step: 400550 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:07:45,472-Speed 2496.48 samples/sec Loss 2.2031 LearningRate 0.000330 Epoch: 19 Global Step: 400560 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:07:53,623-Speed 2512.98 samples/sec Loss 2.1552 LearningRate 0.000330 Epoch: 19 Global Step: 400570 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:08:01,825-Speed 2497.23 samples/sec Loss 2.1861 LearningRate 0.000330 Epoch: 19 Global Step: 400580 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:08:10,038-Speed 2494.03 samples/sec Loss 2.2123 LearningRate 0.000330 Epoch: 19 Global Step: 400590 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:08:18,240-Speed 2497.23 samples/sec Loss 2.1868 LearningRate 0.000330 Epoch: 19 Global Step: 400600 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:08:26,445-Speed 2496.58 samples/sec Loss 2.1600 LearningRate 0.000330 Epoch: 19 Global Step: 400610 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:08:34,657-Speed 2494.22 samples/sec Loss 2.1868 LearningRate 0.000330 Epoch: 19 Global Step: 400620 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:08:42,806-Speed 2513.83 samples/sec Loss 2.1978 LearningRate 0.000330 Epoch: 19 Global Step: 400630 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:08:51,009-Speed 2496.99 samples/sec Loss 2.1732 LearningRate 0.000330 Epoch: 19 Global Step: 400640 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:08:59,213-Speed 2496.50 samples/sec Loss 2.2216 
LearningRate 0.000330 Epoch: 19 Global Step: 400650 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:09:07,413-Speed 2498.04 samples/sec Loss 2.2068 LearningRate 0.000330 Epoch: 19 Global Step: 400660 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:09:15,614-Speed 2497.63 samples/sec Loss 2.1565 LearningRate 0.000330 Epoch: 19 Global Step: 400670 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:09:23,821-Speed 2495.94 samples/sec Loss 2.1854 LearningRate 0.000330 Epoch: 19 Global Step: 400680 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:09:31,985-Speed 2508.74 samples/sec Loss 2.1973 LearningRate 0.000330 Epoch: 19 Global Step: 400690 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:09:40,194-Speed 2495.26 samples/sec Loss 2.1518 LearningRate 0.000330 Epoch: 19 Global Step: 400700 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:09:48,404-Speed 2494.96 samples/sec Loss 2.1353 LearningRate 0.000330 Epoch: 19 Global Step: 400710 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:09:56,607-Speed 2496.88 samples/sec Loss 2.2113 LearningRate 0.000330 Epoch: 19 Global Step: 400720 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:10:04,810-Speed 2497.03 samples/sec Loss 2.1339 LearningRate 0.000330 Epoch: 19 Global Step: 400730 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:10:13,025-Speed 2493.34 samples/sec Loss 2.1477 LearningRate 0.000330 Epoch: 19 Global Step: 400740 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:10:21,175-Speed 2513.27 samples/sec Loss 2.1665 LearningRate 0.000330 Epoch: 19 Global Step: 400750 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:10:29,379-Speed 2496.80 samples/sec Loss 2.1473 LearningRate 0.000330 Epoch: 19 Global Step: 400760 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:10:37,580-Speed 2497.53 samples/sec Loss 2.2080 
LearningRate 0.000330 Epoch: 19 Global Step: 400770 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:10:45,786-Speed 2496.29 samples/sec Loss 2.2267 LearningRate 0.000330 Epoch: 19 Global Step: 400780 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:10:53,990-Speed 2496.58 samples/sec Loss 2.2322 LearningRate 0.000330 Epoch: 19 Global Step: 400790 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:11:02,190-Speed 2497.66 samples/sec Loss 2.2277 LearningRate 0.000330 Epoch: 19 Global Step: 400800 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:11:10,343-Speed 2512.48 samples/sec Loss 2.1840 LearningRate 0.000330 Epoch: 19 Global Step: 400810 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:11:18,544-Speed 2497.76 samples/sec Loss 2.1675 LearningRate 0.000330 Epoch: 19 Global Step: 400820 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:11:26,752-Speed 2495.44 samples/sec Loss 2.1820 LearningRate 0.000330 Epoch: 19 Global Step: 400830 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:11:34,962-Speed 2494.77 samples/sec Loss 2.1531 LearningRate 0.000330 Epoch: 19 Global Step: 400840 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:11:43,163-Speed 2497.69 samples/sec Loss 2.1974 LearningRate 0.000330 Epoch: 19 Global Step: 400850 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:11:51,366-Speed 2497.02 samples/sec Loss 2.1486 LearningRate 0.000330 Epoch: 19 Global Step: 400860 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:11:59,518-Speed 2512.71 samples/sec Loss 2.2065 LearningRate 0.000330 Epoch: 19 Global Step: 400870 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:12:07,721-Speed 2496.95 samples/sec Loss 2.2021 LearningRate 0.000330 Epoch: 19 Global Step: 400880 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:12:15,926-Speed 2496.46 samples/sec Loss 2.1818 
LearningRate 0.000330 Epoch: 19 Global Step: 400890 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:12:24,128-Speed 2497.41 samples/sec Loss 2.2474 LearningRate 0.000330 Epoch: 19 Global Step: 400900 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:12:32,330-Speed 2497.38 samples/sec Loss 2.1825 LearningRate 0.000330 Epoch: 19 Global Step: 400910 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:12:40,533-Speed 2497.91 samples/sec Loss 2.1830 LearningRate 0.000330 Epoch: 19 Global Step: 400920 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:12:48,681-Speed 2513.76 samples/sec Loss 2.1939 LearningRate 0.000330 Epoch: 19 Global Step: 400930 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:12:56,882-Speed 2497.73 samples/sec Loss 2.2186 LearningRate 0.000330 Epoch: 19 Global Step: 400940 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:13:05,086-Speed 2496.74 samples/sec Loss 2.1972 LearningRate 0.000330 Epoch: 19 Global Step: 400950 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:13:13,284-Speed 2498.50 samples/sec Loss 2.1940 LearningRate 0.000330 Epoch: 19 Global Step: 400960 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:13:21,488-Speed 2497.01 samples/sec Loss 2.2079 LearningRate 0.000330 Epoch: 19 Global Step: 400970 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:13:29,692-Speed 2496.80 samples/sec Loss 2.1734 LearningRate 0.000330 Epoch: 19 Global Step: 400980 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:13:37,838-Speed 2514.15 samples/sec Loss 2.2064 LearningRate 0.000330 Epoch: 19 Global Step: 400990 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:13:46,053-Speed 2493.59 samples/sec Loss 2.1885 LearningRate 0.000329 Epoch: 19 Global Step: 401000 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:13:54,257-Speed 2496.86 samples/sec Loss 2.2136 
LearningRate 0.000329 Epoch: 19 Global Step: 401010 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:14:02,455-Speed 2498.39 samples/sec Loss 2.1963 LearningRate 0.000329 Epoch: 19 Global Step: 401020 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:14:10,669-Speed 2493.88 samples/sec Loss 2.1504 LearningRate 0.000329 Epoch: 19 Global Step: 401030 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:14:18,870-Speed 2497.61 samples/sec Loss 2.2401 LearningRate 0.000329 Epoch: 19 Global Step: 401040 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:14:27,018-Speed 2514.08 samples/sec Loss 2.2525 LearningRate 0.000329 Epoch: 19 Global Step: 401050 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:14:35,222-Speed 2496.73 samples/sec Loss 2.2152 LearningRate 0.000329 Epoch: 19 Global Step: 401060 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:14:43,426-Speed 2496.44 samples/sec Loss 2.2409 LearningRate 0.000329 Epoch: 19 Global Step: 401070 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:14:51,632-Speed 2496.43 samples/sec Loss 2.2516 LearningRate 0.000329 Epoch: 19 Global Step: 401080 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:14:59,834-Speed 2497.36 samples/sec Loss 2.2202 LearningRate 0.000329 Epoch: 19 Global Step: 401090 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:15:08,042-Speed 2495.39 samples/sec Loss 2.2895 LearningRate 0.000329 Epoch: 19 Global Step: 401100 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:15:16,187-Speed 2515.05 samples/sec Loss 2.2412 LearningRate 0.000329 Epoch: 19 Global Step: 401110 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:15:24,386-Speed 2498.51 samples/sec Loss 2.2185 LearningRate 0.000329 Epoch: 19 Global Step: 401120 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:15:32,588-Speed 2497.17 samples/sec Loss 2.2494 
LearningRate 0.000329 Epoch: 19 Global Step: 401130 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:15:40,789-Speed 2497.50 samples/sec Loss 2.2259 LearningRate 0.000329 Epoch: 19 Global Step: 401140 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:15:49,001-Speed 2494.39 samples/sec Loss 2.1693 LearningRate 0.000329 Epoch: 19 Global Step: 401150 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:15:57,204-Speed 2497.18 samples/sec Loss 2.2579 LearningRate 0.000329 Epoch: 19 Global Step: 401160 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:16:05,350-Speed 2514.43 samples/sec Loss 2.2299 LearningRate 0.000329 Epoch: 19 Global Step: 401170 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:16:13,506-Speed 2511.45 samples/sec Loss 2.2111 LearningRate 0.000329 Epoch: 19 Global Step: 401180 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:16:21,708-Speed 2497.28 samples/sec Loss 2.1878 LearningRate 0.000329 Epoch: 19 Global Step: 401190 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:16:29,910-Speed 2497.32 samples/sec Loss 2.2269 LearningRate 0.000329 Epoch: 19 Global Step: 401200 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:16:38,109-Speed 2498.31 samples/sec Loss 2.2409 LearningRate 0.000329 Epoch: 19 Global Step: 401210 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:16:46,316-Speed 2495.73 samples/sec Loss 2.2373 LearningRate 0.000329 Epoch: 19 Global Step: 401220 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:16:54,470-Speed 2512.12 samples/sec Loss 2.2326 LearningRate 0.000329 Epoch: 19 Global Step: 401230 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:17:02,678-Speed 2495.49 samples/sec Loss 2.2242 LearningRate 0.000329 Epoch: 19 Global Step: 401240 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:17:10,879-Speed 2497.66 samples/sec Loss 2.2213 
LearningRate 0.000329 Epoch: 19 Global Step: 401250 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:17:19,077-Speed 2498.63 samples/sec Loss 2.1832 LearningRate 0.000329 Epoch: 19 Global Step: 401260 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:17:27,282-Speed 2496.19 samples/sec Loss 2.2610 LearningRate 0.000329 Epoch: 19 Global Step: 401270 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:17:35,483-Speed 2497.93 samples/sec Loss 2.2351 LearningRate 0.000329 Epoch: 19 Global Step: 401280 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:17:43,639-Speed 2511.28 samples/sec Loss 2.1671 LearningRate 0.000329 Epoch: 19 Global Step: 401290 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:17:51,846-Speed 2496.09 samples/sec Loss 2.2161 LearningRate 0.000329 Epoch: 19 Global Step: 401300 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:18:00,045-Speed 2498.26 samples/sec Loss 2.2059 LearningRate 0.000329 Epoch: 19 Global Step: 401310 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:18:08,243-Speed 2498.41 samples/sec Loss 2.1423 LearningRate 0.000329 Epoch: 19 Global Step: 401320 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:18:16,444-Speed 2497.64 samples/sec Loss 2.2741 LearningRate 0.000329 Epoch: 19 Global Step: 401330 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:18:24,645-Speed 2497.60 samples/sec Loss 2.1894 LearningRate 0.000329 Epoch: 19 Global Step: 401340 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:18:32,794-Speed 2513.40 samples/sec Loss 2.1825 LearningRate 0.000329 Epoch: 19 Global Step: 401350 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:18:40,992-Speed 2498.52 samples/sec Loss 2.2048 LearningRate 0.000329 Epoch: 19 Global Step: 401360 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:18:49,195-Speed 2497.11 samples/sec Loss 2.1930 
LearningRate 0.000329 Epoch: 19 Global Step: 401370 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:18:57,397-Speed 2497.24 samples/sec Loss 2.2128 LearningRate 0.000329 Epoch: 19 Global Step: 401380 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:19:05,592-Speed 2499.43 samples/sec Loss 2.1873 LearningRate 0.000329 Epoch: 19 Global Step: 401390 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:19:13,794-Speed 2497.26 samples/sec Loss 2.2057 LearningRate 0.000329 Epoch: 19 Global Step: 401400 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:19:21,946-Speed 2512.68 samples/sec Loss 2.2542 LearningRate 0.000329 Epoch: 19 Global Step: 401410 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:19:30,146-Speed 2498.28 samples/sec Loss 2.2282 LearningRate 0.000329 Epoch: 19 Global Step: 401420 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:19:38,357-Speed 2494.83 samples/sec Loss 2.1857 LearningRate 0.000329 Epoch: 19 Global Step: 401430 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:19:46,569-Speed 2494.45 samples/sec Loss 2.1980 LearningRate 0.000329 Epoch: 19 Global Step: 401440 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:19:54,770-Speed 2497.51 samples/sec Loss 2.2583 LearningRate 0.000329 Epoch: 19 Global Step: 401450 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:20:02,974-Speed 2496.81 samples/sec Loss 2.1900 LearningRate 0.000329 Epoch: 19 Global Step: 401460 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:20:11,126-Speed 2512.56 samples/sec Loss 2.2060 LearningRate 0.000329 Epoch: 19 Global Step: 401470 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:20:19,351-Speed 2490.33 samples/sec Loss 2.2058 LearningRate 0.000329 Epoch: 19 Global Step: 401480 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:20:27,554-Speed 2497.05 samples/sec Loss 2.2333 
LearningRate 0.000329 Epoch: 19 Global Step: 401490 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:20:35,757-Speed 2497.11 samples/sec Loss 2.2281 LearningRate 0.000329 Epoch: 19 Global Step: 401500 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:20:43,958-Speed 2497.63 samples/sec Loss 2.1643 LearningRate 0.000329 Epoch: 19 Global Step: 401510 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:20:52,163-Speed 2496.31 samples/sec Loss 2.2132 LearningRate 0.000329 Epoch: 19 Global Step: 401520 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:21:00,312-Speed 2513.54 samples/sec Loss 2.1823 LearningRate 0.000329 Epoch: 19 Global Step: 401530 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:21:08,514-Speed 2497.46 samples/sec Loss 2.2285 LearningRate 0.000329 Epoch: 19 Global Step: 401540 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:21:16,715-Speed 2497.53 samples/sec Loss 2.2561 LearningRate 0.000329 Epoch: 19 Global Step: 401550 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:21:24,917-Speed 2497.45 samples/sec Loss 2.2277 LearningRate 0.000329 Epoch: 19 Global Step: 401560 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:21:33,139-Speed 2491.20 samples/sec Loss 2.2504 LearningRate 0.000329 Epoch: 19 Global Step: 401570 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:21:41,341-Speed 2497.47 samples/sec Loss 2.2117 LearningRate 0.000329 Epoch: 19 Global Step: 401580 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:21:49,502-Speed 2509.80 samples/sec Loss 2.2204 LearningRate 0.000329 Epoch: 19 Global Step: 401590 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:21:57,705-Speed 2497.37 samples/sec Loss 2.2514 LearningRate 0.000329 Epoch: 19 Global Step: 401600 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:22:05,906-Speed 2497.53 samples/sec Loss 2.1864 
LearningRate 0.000329 Epoch: 19 Global Step: 401610 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:22:14,113-Speed 2495.95 samples/sec Loss 2.1795 LearningRate 0.000329 Epoch: 19 Global Step: 401620 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:22:22,316-Speed 2497.15 samples/sec Loss 2.1952 LearningRate 0.000329 Epoch: 19 Global Step: 401630 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:22:30,521-Speed 2496.41 samples/sec Loss 2.1654 LearningRate 0.000329 Epoch: 19 Global Step: 401640 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:22:38,692-Speed 2506.78 samples/sec Loss 2.1797 LearningRate 0.000328 Epoch: 19 Global Step: 401650 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:22:46,898-Speed 2496.13 samples/sec Loss 2.1820 LearningRate 0.000328 Epoch: 19 Global Step: 401660 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:22:55,122-Speed 2490.86 samples/sec Loss 2.1574 LearningRate 0.000328 Epoch: 19 Global Step: 401670 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:23:03,323-Speed 2497.44 samples/sec Loss 2.1931 LearningRate 0.000328 Epoch: 19 Global Step: 401680 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:23:11,526-Speed 2497.31 samples/sec Loss 2.1962 LearningRate 0.000328 Epoch: 19 Global Step: 401690 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:23:19,728-Speed 2497.36 samples/sec Loss 2.1642 LearningRate 0.000328 Epoch: 19 Global Step: 401700 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:23:27,875-Speed 2513.96 samples/sec Loss 2.1811 LearningRate 0.000328 Epoch: 19 Global Step: 401710 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:23:36,080-Speed 2496.52 samples/sec Loss 2.1894 LearningRate 0.000328 Epoch: 19 Global Step: 401720 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:23:44,287-Speed 2496.09 samples/sec Loss 2.2006 
LearningRate 0.000328 Epoch: 19 Global Step: 401730 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:23:52,501-Speed 2493.90 samples/sec Loss 2.1978 LearningRate 0.000328 Epoch: 19 Global Step: 401740 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:24:00,704-Speed 2496.99 samples/sec Loss 2.2347 LearningRate 0.000328 Epoch: 19 Global Step: 401750 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:24:08,907-Speed 2496.78 samples/sec Loss 2.2275 LearningRate 0.000328 Epoch: 19 Global Step: 401760 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:24:17,058-Speed 2512.90 samples/sec Loss 2.2224 LearningRate 0.000328 Epoch: 19 Global Step: 401770 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:24:25,267-Speed 2495.23 samples/sec Loss 2.1740 LearningRate 0.000328 Epoch: 19 Global Step: 401780 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:24:33,466-Speed 2498.52 samples/sec Loss 2.2056 LearningRate 0.000328 Epoch: 19 Global Step: 401790 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:24:41,669-Speed 2497.10 samples/sec Loss 2.2197 LearningRate 0.000328 Epoch: 19 Global Step: 401800 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:24:49,873-Speed 2496.52 samples/sec Loss 2.2079 LearningRate 0.000328 Epoch: 19 Global Step: 401810 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:24:58,085-Speed 2494.65 samples/sec Loss 2.3122 LearningRate 0.000328 Epoch: 19 Global Step: 401820 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:25:06,234-Speed 2513.63 samples/sec Loss 2.1552 LearningRate 0.000328 Epoch: 19 Global Step: 401830 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:25:14,437-Speed 2497.24 samples/sec Loss 2.2993 LearningRate 0.000328 Epoch: 19 Global Step: 401840 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:25:22,651-Speed 2493.93 samples/sec Loss 2.2457 
LearningRate 0.000328 Epoch: 19 Global Step: 401850 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:25:30,858-Speed 2495.89 samples/sec Loss 2.1852 LearningRate 0.000328 Epoch: 19 Global Step: 401860 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:25:39,065-Speed 2495.56 samples/sec Loss 2.2804 LearningRate 0.000328 Epoch: 19 Global Step: 401870 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:25:47,267-Speed 2497.30 samples/sec Loss 2.1745 LearningRate 0.000328 Epoch: 19 Global Step: 401880 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:25:55,416-Speed 2513.55 samples/sec Loss 2.2783 LearningRate 0.000328 Epoch: 19 Global Step: 401890 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:26:03,633-Speed 2492.92 samples/sec Loss 2.2156 LearningRate 0.000328 Epoch: 19 Global Step: 401900 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:26:11,835-Speed 2497.15 samples/sec Loss 2.1734 LearningRate 0.000328 Epoch: 19 Global Step: 401910 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:26:20,057-Speed 2491.29 samples/sec Loss 2.1520 LearningRate 0.000328 Epoch: 19 Global Step: 401920 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:26:28,271-Speed 2493.86 samples/sec Loss 2.1832 LearningRate 0.000328 Epoch: 19 Global Step: 401930 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:26:36,471-Speed 2497.84 samples/sec Loss 2.2171 LearningRate 0.000328 Epoch: 19 Global Step: 401940 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:26:44,617-Speed 2514.43 samples/sec Loss 2.2117 LearningRate 0.000328 Epoch: 19 Global Step: 401950 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:26:52,815-Speed 2498.72 samples/sec Loss 2.1303 LearningRate 0.000328 Epoch: 19 Global Step: 401960 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:27:01,014-Speed 2498.02 samples/sec Loss 2.2605 
LearningRate 0.000328 Epoch: 19 Global Step: 401970 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:27:09,213-Speed 2498.41 samples/sec Loss 2.1803 LearningRate 0.000328 Epoch: 19 Global Step: 401980 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:27:17,421-Speed 2495.53 samples/sec Loss 2.1529 LearningRate 0.000328 Epoch: 19 Global Step: 401990 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:27:25,620-Speed 2498.10 samples/sec Loss 2.1320 LearningRate 0.000328 Epoch: 19 Global Step: 402000 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:27:33,764-Speed 2515.23 samples/sec Loss 2.1680 LearningRate 0.000328 Epoch: 19 Global Step: 402010 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:27:41,969-Speed 2496.55 samples/sec Loss 2.1283 LearningRate 0.000328 Epoch: 19 Global Step: 402020 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:27:50,178-Speed 2495.03 samples/sec Loss 2.1906 LearningRate 0.000328 Epoch: 19 Global Step: 402030 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:27:58,411-Speed 2488.02 samples/sec Loss 2.1564 LearningRate 0.000328 Epoch: 19 Global Step: 402040 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:28:06,614-Speed 2497.24 samples/sec Loss 2.1916 LearningRate 0.000328 Epoch: 19 Global Step: 402050 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:28:14,818-Speed 2496.92 samples/sec Loss 2.1758 LearningRate 0.000328 Epoch: 19 Global Step: 402060 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:28:22,966-Speed 2513.84 samples/sec Loss 2.1516 LearningRate 0.000328 Epoch: 19 Global Step: 402070 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:28:31,164-Speed 2498.41 samples/sec Loss 2.1736 LearningRate 0.000328 Epoch: 19 Global Step: 402080 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:28:39,378-Speed 2494.22 samples/sec Loss 2.1793 
LearningRate 0.000328 Epoch: 19 Global Step: 402090 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:28:47,581-Speed 2497.09 samples/sec Loss 2.2002 LearningRate 0.000328 Epoch: 19 Global Step: 402100 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:28:55,782-Speed 2497.65 samples/sec Loss 2.1472 LearningRate 0.000328 Epoch: 19 Global Step: 402110 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:29:03,985-Speed 2497.17 samples/sec Loss 2.1697 LearningRate 0.000328 Epoch: 19 Global Step: 402120 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:29:12,143-Speed 2511.00 samples/sec Loss 2.1548 LearningRate 0.000328 Epoch: 19 Global Step: 402130 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:29:20,343-Speed 2497.93 samples/sec Loss 2.1421 LearningRate 0.000328 Epoch: 19 Global Step: 402140 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:29:28,550-Speed 2496.07 samples/sec Loss 2.1934 LearningRate 0.000328 Epoch: 19 Global Step: 402150 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:29:36,753-Speed 2497.25 samples/sec Loss 2.1644 LearningRate 0.000328 Epoch: 19 Global Step: 402160 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:29:44,956-Speed 2497.05 samples/sec Loss 2.1843 LearningRate 0.000328 Epoch: 19 Global Step: 402170 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:29:53,163-Speed 2495.94 samples/sec Loss 2.1518 LearningRate 0.000328 Epoch: 19 Global Step: 402180 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:30:01,309-Speed 2514.38 samples/sec Loss 2.1650 LearningRate 0.000328 Epoch: 19 Global Step: 402190 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:30:09,509-Speed 2498.03 samples/sec Loss 2.1508 LearningRate 0.000328 Epoch: 19 Global Step: 402200 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:30:17,714-Speed 2496.23 samples/sec Loss 2.1396 
LearningRate 0.000328 Epoch: 19 Global Step: 402210 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:30:25,914-Speed 2497.92 samples/sec Loss 2.1603 LearningRate 0.000328 Epoch: 19 Global Step: 402220 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:30:34,126-Speed 2494.29 samples/sec Loss 2.1765 LearningRate 0.000328 Epoch: 19 Global Step: 402230 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:30:42,328-Speed 2497.27 samples/sec Loss 2.1797 LearningRate 0.000328 Epoch: 19 Global Step: 402240 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:30:50,473-Speed 2514.89 samples/sec Loss 2.1553 LearningRate 0.000328 Epoch: 19 Global Step: 402250 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:30:58,676-Speed 2497.04 samples/sec Loss 2.1805 LearningRate 0.000328 Epoch: 19 Global Step: 402260 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:31:06,879-Speed 2497.00 samples/sec Loss 2.1574 LearningRate 0.000328 Epoch: 19 Global Step: 402270 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:31:15,082-Speed 2497.10 samples/sec Loss 2.2320 LearningRate 0.000328 Epoch: 19 Global Step: 402280 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:31:23,283-Speed 2497.75 samples/sec Loss 2.1864 LearningRate 0.000328 Epoch: 19 Global Step: 402290 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:31:31,480-Speed 2498.49 samples/sec Loss 2.2472 LearningRate 0.000327 Epoch: 19 Global Step: 402300 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:31:39,627-Speed 2514.37 samples/sec Loss 2.1849 LearningRate 0.000327 Epoch: 19 Global Step: 402310 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:31:47,831-Speed 2496.54 samples/sec Loss 2.2490 LearningRate 0.000327 Epoch: 19 Global Step: 402320 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:31:56,028-Speed 2498.98 samples/sec Loss 2.2022 
LearningRate 0.000327 Epoch: 19 Global Step: 402330 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:32:04,235-Speed 2495.77 samples/sec Loss 2.1490 LearningRate 0.000327 Epoch: 19 Global Step: 402340 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:32:12,436-Speed 2497.61 samples/sec Loss 2.2474 LearningRate 0.000327 Epoch: 19 Global Step: 402350 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:32:20,635-Speed 2498.31 samples/sec Loss 2.1595 LearningRate 0.000327 Epoch: 19 Global Step: 402360 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:32:28,781-Speed 2514.57 samples/sec Loss 2.2042 LearningRate 0.000327 Epoch: 19 Global Step: 402370 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:32:36,981-Speed 2497.83 samples/sec Loss 2.1906 LearningRate 0.000327 Epoch: 19 Global Step: 402380 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:32:45,182-Speed 2497.51 samples/sec Loss 2.1856 LearningRate 0.000327 Epoch: 19 Global Step: 402390 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:32:53,387-Speed 2496.66 samples/sec Loss 2.1653 LearningRate 0.000327 Epoch: 19 Global Step: 402400 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:33:01,589-Speed 2497.25 samples/sec Loss 2.1809 LearningRate 0.000327 Epoch: 19 Global Step: 402410 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:33:09,798-Speed 2495.31 samples/sec Loss 2.1582 LearningRate 0.000327 Epoch: 19 Global Step: 402420 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:33:17,956-Speed 2510.88 samples/sec Loss 2.2145 LearningRate 0.000327 Epoch: 19 Global Step: 402430 Fp16 Grad Scale: 32768 Required: 98 hours Training: 2022-07-09 10:33:26,112-Speed 2511.44 samples/sec Loss 2.1966 LearningRate 0.000327 Epoch: 19 Global Step: 402440 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:33:34,312-Speed 2497.95 samples/sec Loss 2.1657 
LearningRate 0.000327 Epoch: 19 Global Step: 402450 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:33:42,511-Speed 2498.19 samples/sec Loss 2.2493 LearningRate 0.000327 Epoch: 19 Global Step: 402460 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:33:50,711-Speed 2497.96 samples/sec Loss 2.2032 LearningRate 0.000327 Epoch: 19 Global Step: 402470 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:33:58,911-Speed 2497.92 samples/sec Loss 2.2272 LearningRate 0.000327 Epoch: 19 Global Step: 402480 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:34:07,061-Speed 2513.48 samples/sec Loss 2.2043 LearningRate 0.000327 Epoch: 19 Global Step: 402490 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:34:15,263-Speed 2497.26 samples/sec Loss 2.2514 LearningRate 0.000327 Epoch: 19 Global Step: 402500 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:34:23,486-Speed 2490.86 samples/sec Loss 2.2585 LearningRate 0.000327 Epoch: 19 Global Step: 402510 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:34:31,687-Speed 2497.72 samples/sec Loss 2.2707 LearningRate 0.000327 Epoch: 19 Global Step: 402520 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:34:39,886-Speed 2498.11 samples/sec Loss 2.2232 LearningRate 0.000327 Epoch: 19 Global Step: 402530 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:34:48,091-Speed 2497.39 samples/sec Loss 2.2403 LearningRate 0.000327 Epoch: 19 Global Step: 402540 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:34:56,239-Speed 2513.70 samples/sec Loss 2.2565 LearningRate 0.000327 Epoch: 19 Global Step: 402550 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:35:04,437-Speed 2498.51 samples/sec Loss 2.2506 LearningRate 0.000327 Epoch: 19 Global Step: 402560 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:35:12,642-Speed 2496.62 samples/sec Loss 2.1942 
LearningRate 0.000327 Epoch: 19 Global Step: 402570 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:35:20,843-Speed 2497.45 samples/sec Loss 2.2140 LearningRate 0.000327 Epoch: 19 Global Step: 402580 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:35:29,045-Speed 2497.38 samples/sec Loss 2.2034 LearningRate 0.000327 Epoch: 19 Global Step: 402590 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:35:37,246-Speed 2497.71 samples/sec Loss 2.1777 LearningRate 0.000327 Epoch: 19 Global Step: 402600 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:35:45,396-Speed 2513.45 samples/sec Loss 2.2386 LearningRate 0.000327 Epoch: 19 Global Step: 402610 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:35:53,596-Speed 2497.83 samples/sec Loss 2.1755 LearningRate 0.000327 Epoch: 19 Global Step: 402620 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:36:01,797-Speed 2497.60 samples/sec Loss 2.1954 LearningRate 0.000327 Epoch: 19 Global Step: 402630 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:36:09,996-Speed 2498.67 samples/sec Loss 2.1971 LearningRate 0.000327 Epoch: 19 Global Step: 402640 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:36:18,197-Speed 2497.62 samples/sec Loss 2.2196 LearningRate 0.000327 Epoch: 19 Global Step: 402650 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:36:26,400-Speed 2497.20 samples/sec Loss 2.2112 LearningRate 0.000327 Epoch: 19 Global Step: 402660 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:36:34,550-Speed 2513.27 samples/sec Loss 2.2121 LearningRate 0.000327 Epoch: 19 Global Step: 402670 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:36:42,757-Speed 2495.97 samples/sec Loss 2.1947 LearningRate 0.000327 Epoch: 19 Global Step: 402680 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:36:50,960-Speed 2496.90 samples/sec Loss 2.1771 
LearningRate 0.000327 Epoch: 19 Global Step: 402690 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:36:59,163-Speed 2497.05 samples/sec Loss 2.2105 LearningRate 0.000327 Epoch: 19 Global Step: 402700 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:37:07,364-Speed 2497.83 samples/sec Loss 2.2193 LearningRate 0.000327 Epoch: 19 Global Step: 402710 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:37:15,563-Speed 2497.99 samples/sec Loss 2.1809 LearningRate 0.000327 Epoch: 19 Global Step: 402720 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:37:23,708-Speed 2514.78 samples/sec Loss 2.2005 LearningRate 0.000327 Epoch: 19 Global Step: 402730 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:37:31,910-Speed 2497.50 samples/sec Loss 2.1917 LearningRate 0.000327 Epoch: 19 Global Step: 402740 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:37:40,111-Speed 2497.46 samples/sec Loss 2.1426 LearningRate 0.000327 Epoch: 19 Global Step: 402750 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:37:48,309-Speed 2498.62 samples/sec Loss 2.1879 LearningRate 0.000327 Epoch: 19 Global Step: 402760 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:37:56,525-Speed 2493.23 samples/sec Loss 2.2087 LearningRate 0.000327 Epoch: 19 Global Step: 402770 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:38:04,728-Speed 2497.02 samples/sec Loss 2.2285 LearningRate 0.000327 Epoch: 19 Global Step: 402780 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:38:12,886-Speed 2510.79 samples/sec Loss 2.1779 LearningRate 0.000327 Epoch: 19 Global Step: 402790 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:38:21,099-Speed 2494.16 samples/sec Loss 2.1727 LearningRate 0.000327 Epoch: 19 Global Step: 402800 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:38:29,300-Speed 2497.40 samples/sec Loss 2.1595 
LearningRate 0.000327 Epoch: 19 Global Step: 402810 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:38:37,504-Speed 2496.91 samples/sec Loss 2.1396 LearningRate 0.000327 Epoch: 19 Global Step: 402820 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:38:45,715-Speed 2494.93 samples/sec Loss 2.1811 LearningRate 0.000327 Epoch: 19 Global Step: 402830 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:38:53,919-Speed 2496.50 samples/sec Loss 2.2234 LearningRate 0.000327 Epoch: 19 Global Step: 402840 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:39:02,068-Speed 2513.68 samples/sec Loss 2.1454 LearningRate 0.000327 Epoch: 19 Global Step: 402850 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:39:10,267-Speed 2498.31 samples/sec Loss 2.1765 LearningRate 0.000327 Epoch: 19 Global Step: 402860 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:39:18,469-Speed 2497.42 samples/sec Loss 2.2240 LearningRate 0.000327 Epoch: 19 Global Step: 402870 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:39:26,671-Speed 2497.33 samples/sec Loss 2.2087 LearningRate 0.000327 Epoch: 19 Global Step: 402880 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:39:34,877-Speed 2496.07 samples/sec Loss 2.1847 LearningRate 0.000327 Epoch: 19 Global Step: 402890 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:39:43,092-Speed 2493.43 samples/sec Loss 2.1650 LearningRate 0.000327 Epoch: 19 Global Step: 402900 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:39:51,239-Speed 2514.18 samples/sec Loss 2.2031 LearningRate 0.000327 Epoch: 19 Global Step: 402910 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:39:59,452-Speed 2494.01 samples/sec Loss 2.2040 LearningRate 0.000327 Epoch: 19 Global Step: 402920 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:40:07,655-Speed 2496.88 samples/sec Loss 2.2453 
LearningRate 0.000327 Epoch: 19 Global Step: 402930 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:40:15,853-Speed 2498.47 samples/sec Loss 2.1594 LearningRate 0.000327 Epoch: 19 Global Step: 402940 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:40:24,067-Speed 2493.93 samples/sec Loss 2.2221 LearningRate 0.000327 Epoch: 19 Global Step: 402950 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:40:32,267-Speed 2498.02 samples/sec Loss 2.2241 LearningRate 0.000326 Epoch: 19 Global Step: 402960 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:40:40,415-Speed 2513.67 samples/sec Loss 2.1787 LearningRate 0.000326 Epoch: 19 Global Step: 402970 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:40:48,617-Speed 2497.35 samples/sec Loss 2.2060 LearningRate 0.000326 Epoch: 19 Global Step: 402980 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:40:56,819-Speed 2497.40 samples/sec Loss 2.1942 LearningRate 0.000326 Epoch: 19 Global Step: 402990 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:41:05,019-Speed 2497.75 samples/sec Loss 2.2288 LearningRate 0.000326 Epoch: 19 Global Step: 403000 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:41:13,228-Speed 2495.10 samples/sec Loss 2.2338 LearningRate 0.000326 Epoch: 19 Global Step: 403010 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:41:21,441-Speed 2493.99 samples/sec Loss 2.2021 LearningRate 0.000326 Epoch: 19 Global Step: 403020 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:41:29,591-Speed 2513.47 samples/sec Loss 2.2080 LearningRate 0.000326 Epoch: 19 Global Step: 403030 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:41:37,792-Speed 2497.39 samples/sec Loss 2.1972 LearningRate 0.000326 Epoch: 19 Global Step: 403040 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:41:45,995-Speed 2497.05 samples/sec Loss 2.1694 
LearningRate 0.000326 Epoch: 19 Global Step: 403050 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:41:54,209-Speed 2493.84 samples/sec Loss 2.2141 LearningRate 0.000326 Epoch: 19 Global Step: 403060 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:42:02,410-Speed 2497.59 samples/sec Loss 2.2004 LearningRate 0.000326 Epoch: 19 Global Step: 403070 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:42:10,621-Speed 2494.55 samples/sec Loss 2.1860 LearningRate 0.000326 Epoch: 19 Global Step: 403080 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:42:18,771-Speed 2513.40 samples/sec Loss 2.1876 LearningRate 0.000326 Epoch: 19 Global Step: 403090 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:42:26,978-Speed 2496.10 samples/sec Loss 2.1780 LearningRate 0.000326 Epoch: 19 Global Step: 403100 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:42:35,185-Speed 2495.86 samples/sec Loss 2.1770 LearningRate 0.000326 Epoch: 19 Global Step: 403110 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:42:43,387-Speed 2497.17 samples/sec Loss 2.1707 LearningRate 0.000326 Epoch: 19 Global Step: 403120 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:42:51,586-Speed 2498.36 samples/sec Loss 2.1712 LearningRate 0.000326 Epoch: 19 Global Step: 403130 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:42:59,786-Speed 2498.01 samples/sec Loss 2.2123 LearningRate 0.000326 Epoch: 19 Global Step: 403140 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:43:07,932-Speed 2514.58 samples/sec Loss 2.2451 LearningRate 0.000326 Epoch: 19 Global Step: 403150 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:43:16,132-Speed 2497.71 samples/sec Loss 2.2131 LearningRate 0.000326 Epoch: 19 Global Step: 403160 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:43:24,339-Speed 2496.05 samples/sec Loss 2.2152 
LearningRate 0.000326 Epoch: 19 Global Step: 403170 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:43:32,541-Speed 2497.32 samples/sec Loss 2.2223 LearningRate 0.000326 Epoch: 19 Global Step: 403180 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:43:40,743-Speed 2497.38 samples/sec Loss 2.2594 LearningRate 0.000326 Epoch: 19 Global Step: 403190 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:43:48,944-Speed 2497.70 samples/sec Loss 2.2014 LearningRate 0.000326 Epoch: 19 Global Step: 403200 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:43:57,093-Speed 2513.75 samples/sec Loss 2.2488 LearningRate 0.000326 Epoch: 19 Global Step: 403210 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:44:05,294-Speed 2497.62 samples/sec Loss 2.2545 LearningRate 0.000326 Epoch: 19 Global Step: 403220 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:44:13,495-Speed 2497.80 samples/sec Loss 2.2159 LearningRate 0.000326 Epoch: 19 Global Step: 403230 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:44:21,705-Speed 2494.83 samples/sec Loss 2.2284 LearningRate 0.000326 Epoch: 19 Global Step: 403240 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:44:29,909-Speed 2496.72 samples/sec Loss 2.2627 LearningRate 0.000326 Epoch: 19 Global Step: 403250 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:44:38,110-Speed 2497.54 samples/sec Loss 2.1892 LearningRate 0.000326 Epoch: 19 Global Step: 403260 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:44:46,271-Speed 2509.93 samples/sec Loss 2.2102 LearningRate 0.000326 Epoch: 19 Global Step: 403270 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:44:54,492-Speed 2491.64 samples/sec Loss 2.1270 LearningRate 0.000326 Epoch: 19 Global Step: 403280 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:45:02,691-Speed 2498.43 samples/sec Loss 2.1994 
LearningRate 0.000326 Epoch: 19 Global Step: 403290 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:45:10,892-Speed 2497.60 samples/sec Loss 2.2221 LearningRate 0.000326 Epoch: 19 Global Step: 403300 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:45:19,098-Speed 2496.16 samples/sec Loss 2.1889 LearningRate 0.000326 Epoch: 19 Global Step: 403310 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:45:27,297-Speed 2498.13 samples/sec Loss 2.1763 LearningRate 0.000326 Epoch: 19 Global Step: 403320 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:45:35,448-Speed 2513.05 samples/sec Loss 2.1832 LearningRate 0.000326 Epoch: 19 Global Step: 403330 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:45:43,661-Speed 2493.97 samples/sec Loss 2.1699 LearningRate 0.000326 Epoch: 19 Global Step: 403340 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:45:51,867-Speed 2496.40 samples/sec Loss 2.1361 LearningRate 0.000326 Epoch: 19 Global Step: 403350 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:46:00,082-Speed 2493.30 samples/sec Loss 2.2326 LearningRate 0.000326 Epoch: 19 Global Step: 403360 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:46:08,303-Speed 2491.52 samples/sec Loss 2.2371 LearningRate 0.000326 Epoch: 19 Global Step: 403370 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:46:16,504-Speed 2497.73 samples/sec Loss 2.2062 LearningRate 0.000326 Epoch: 19 Global Step: 403380 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:46:24,658-Speed 2512.01 samples/sec Loss 2.2027 LearningRate 0.000326 Epoch: 19 Global Step: 403390 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:46:32,865-Speed 2495.97 samples/sec Loss 2.2146 LearningRate 0.000326 Epoch: 19 Global Step: 403400 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:46:41,066-Speed 2497.53 samples/sec Loss 2.2282 
LearningRate 0.000326 Epoch: 19 Global Step: 403410 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:46:49,274-Speed 2495.40 samples/sec Loss 2.2120 LearningRate 0.000326 Epoch: 19 Global Step: 403420 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:46:57,483-Speed 2495.14 samples/sec Loss 2.2049 LearningRate 0.000326 Epoch: 19 Global Step: 403430 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:47:05,698-Speed 2493.47 samples/sec Loss 2.2320 LearningRate 0.000326 Epoch: 19 Global Step: 403440 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:47:13,849-Speed 2512.97 samples/sec Loss 2.1820 LearningRate 0.000326 Epoch: 19 Global Step: 403450 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:47:22,053-Speed 2496.82 samples/sec Loss 2.2008 LearningRate 0.000326 Epoch: 19 Global Step: 403460 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:47:30,258-Speed 2496.19 samples/sec Loss 2.1894 LearningRate 0.000326 Epoch: 19 Global Step: 403470 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:47:38,460-Speed 2497.64 samples/sec Loss 2.2175 LearningRate 0.000326 Epoch: 19 Global Step: 403480 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:47:46,675-Speed 2493.42 samples/sec Loss 2.1637 LearningRate 0.000326 Epoch: 19 Global Step: 403490 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:47:54,876-Speed 2497.78 samples/sec Loss 2.2131 LearningRate 0.000326 Epoch: 19 Global Step: 403500 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:48:03,028-Speed 2512.59 samples/sec Loss 2.2245 LearningRate 0.000326 Epoch: 19 Global Step: 403510 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:48:11,230-Speed 2497.32 samples/sec Loss 2.1796 LearningRate 0.000326 Epoch: 19 Global Step: 403520 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:48:19,430-Speed 2497.91 samples/sec Loss 2.1520 
LearningRate 0.000326 Epoch: 19 Global Step: 403530 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:48:27,630-Speed 2498.08 samples/sec Loss 2.2554 LearningRate 0.000326 Epoch: 19 Global Step: 403540 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:48:35,833-Speed 2497.11 samples/sec Loss 2.1721 LearningRate 0.000326 Epoch: 19 Global Step: 403550 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:48:44,046-Speed 2494.02 samples/sec Loss 2.1759 LearningRate 0.000326 Epoch: 19 Global Step: 403560 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:48:52,195-Speed 2513.57 samples/sec Loss 2.2360 LearningRate 0.000326 Epoch: 19 Global Step: 403570 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:49:00,396-Speed 2497.32 samples/sec Loss 2.2102 LearningRate 0.000326 Epoch: 19 Global Step: 403580 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:49:08,611-Speed 2493.57 samples/sec Loss 2.2104 LearningRate 0.000326 Epoch: 19 Global Step: 403590 Fp16 Grad Scale: 16384 Required: 98 hours Training: 2022-07-09 10:49:16,809-Speed 2498.53 samples/sec Loss 2.2018 LearningRate 0.000326 Epoch: 19 Global Step: 403600 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:49:25,012-Speed 2497.19 samples/sec Loss 2.1865 LearningRate 0.000325 Epoch: 19 Global Step: 403610 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:49:33,211-Speed 2498.39 samples/sec Loss 2.1630 LearningRate 0.000325 Epoch: 19 Global Step: 403620 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:49:41,360-Speed 2513.38 samples/sec Loss 2.1710 LearningRate 0.000325 Epoch: 19 Global Step: 403630 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:49:49,562-Speed 2497.34 samples/sec Loss 2.1428 LearningRate 0.000325 Epoch: 19 Global Step: 403640 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:49:57,770-Speed 2495.46 samples/sec Loss 2.2163 
LearningRate 0.000325 Epoch: 19 Global Step: 403650 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:50:05,970-Speed 2498.32 samples/sec Loss 2.2046 LearningRate 0.000325 Epoch: 19 Global Step: 403660 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:50:14,174-Speed 2497.05 samples/sec Loss 2.1996 LearningRate 0.000325 Epoch: 19 Global Step: 403670 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:50:22,390-Speed 2492.94 samples/sec Loss 2.1584 LearningRate 0.000325 Epoch: 19 Global Step: 403680 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:50:30,545-Speed 2511.79 samples/sec Loss 2.1709 LearningRate 0.000325 Epoch: 19 Global Step: 403690 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:50:38,750-Speed 2496.74 samples/sec Loss 2.1644 LearningRate 0.000325 Epoch: 19 Global Step: 403700 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:50:46,966-Speed 2493.21 samples/sec Loss 2.1889 LearningRate 0.000325 Epoch: 19 Global Step: 403710 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:50:55,187-Speed 2491.51 samples/sec Loss 2.1433 LearningRate 0.000325 Epoch: 19 Global Step: 403720 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:51:03,397-Speed 2494.93 samples/sec Loss 2.1792 LearningRate 0.000325 Epoch: 19 Global Step: 403730 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:51:11,598-Speed 2497.52 samples/sec Loss 2.1745 LearningRate 0.000325 Epoch: 19 Global Step: 403740 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:51:19,743-Speed 2514.77 samples/sec Loss 2.2039 LearningRate 0.000325 Epoch: 19 Global Step: 403750 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:51:27,948-Speed 2496.61 samples/sec Loss 2.1835 LearningRate 0.000325 Epoch: 19 Global Step: 403760 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:51:36,149-Speed 2497.91 samples/sec Loss 2.1653 
LearningRate 0.000325 Epoch: 19 Global Step: 403770 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:51:44,349-Speed 2497.76 samples/sec Loss 2.1882 LearningRate 0.000325 Epoch: 19 Global Step: 403780 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:51:52,548-Speed 2498.29 samples/sec Loss 2.1987 LearningRate 0.000325 Epoch: 19 Global Step: 403790 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:52:00,748-Speed 2498.15 samples/sec Loss 2.2080 LearningRate 0.000325 Epoch: 19 Global Step: 403800 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:52:08,927-Speed 2504.34 samples/sec Loss 2.1711 LearningRate 0.000325 Epoch: 19 Global Step: 403810 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:52:17,138-Speed 2494.72 samples/sec Loss 2.1789 LearningRate 0.000325 Epoch: 19 Global Step: 403820 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:52:25,339-Speed 2497.54 samples/sec Loss 2.1941 LearningRate 0.000325 Epoch: 19 Global Step: 403830 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:52:33,551-Speed 2494.44 samples/sec Loss 2.1609 LearningRate 0.000325 Epoch: 19 Global Step: 403840 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:52:41,752-Speed 2497.40 samples/sec Loss 2.1986 LearningRate 0.000325 Epoch: 19 Global Step: 403850 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:52:49,951-Speed 2498.38 samples/sec Loss 2.1468 LearningRate 0.000325 Epoch: 19 Global Step: 403860 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:52:58,104-Speed 2512.31 samples/sec Loss 2.2100 LearningRate 0.000325 Epoch: 19 Global Step: 403870 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:53:06,305-Speed 2497.58 samples/sec Loss 2.1617 LearningRate 0.000325 Epoch: 19 Global Step: 403880 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:53:14,516-Speed 2494.90 samples/sec Loss 2.1746 
LearningRate 0.000325 Epoch: 19 Global Step: 403890 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:53:22,718-Speed 2497.35 samples/sec Loss 2.1099 LearningRate 0.000325 Epoch: 19 Global Step: 403900 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:53:30,922-Speed 2496.47 samples/sec Loss 2.2148 LearningRate 0.000325 Epoch: 19 Global Step: 403910 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:53:39,125-Speed 2497.12 samples/sec Loss 2.2096 LearningRate 0.000325 Epoch: 19 Global Step: 403920 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:53:47,280-Speed 2512.08 samples/sec Loss 2.1915 LearningRate 0.000325 Epoch: 19 Global Step: 403930 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:53:55,495-Speed 2493.08 samples/sec Loss 2.1463 LearningRate 0.000325 Epoch: 19 Global Step: 403940 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:54:03,711-Speed 2493.14 samples/sec Loss 2.1623 LearningRate 0.000325 Epoch: 19 Global Step: 403950 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:54:11,917-Speed 2496.08 samples/sec Loss 2.1828 LearningRate 0.000325 Epoch: 19 Global Step: 403960 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:54:20,119-Speed 2497.51 samples/sec Loss 2.1970 LearningRate 0.000325 Epoch: 19 Global Step: 403970 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:54:28,323-Speed 2496.64 samples/sec Loss 2.2150 LearningRate 0.000325 Epoch: 19 Global Step: 403980 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:54:36,472-Speed 2513.63 samples/sec Loss 2.1571 LearningRate 0.000325 Epoch: 19 Global Step: 403990 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:54:44,693-Speed 2491.32 samples/sec Loss 2.2003 LearningRate 0.000325 Epoch: 19 Global Step: 404000 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:54:52,897-Speed 2496.84 samples/sec Loss 2.2233 
LearningRate 0.000325 Epoch: 19 Global Step: 404010 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:55:01,100-Speed 2497.10 samples/sec Loss 2.1851 LearningRate 0.000325 Epoch: 19 Global Step: 404020 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:55:09,315-Speed 2493.10 samples/sec Loss 2.2455 LearningRate 0.000325 Epoch: 19 Global Step: 404030 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:55:17,517-Speed 2497.58 samples/sec Loss 2.2212 LearningRate 0.000325 Epoch: 19 Global Step: 404040 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:55:25,669-Speed 2512.73 samples/sec Loss 2.1793 LearningRate 0.000325 Epoch: 19 Global Step: 404050 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:55:33,874-Speed 2496.52 samples/sec Loss 2.1508 LearningRate 0.000325 Epoch: 19 Global Step: 404060 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:55:42,086-Speed 2494.32 samples/sec Loss 2.1831 LearningRate 0.000325 Epoch: 19 Global Step: 404070 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:55:50,288-Speed 2497.37 samples/sec Loss 2.1695 LearningRate 0.000325 Epoch: 19 Global Step: 404080 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:55:58,494-Speed 2496.05 samples/sec Loss 2.1578 LearningRate 0.000325 Epoch: 19 Global Step: 404090 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:56:06,703-Speed 2495.10 samples/sec Loss 2.2390 LearningRate 0.000325 Epoch: 19 Global Step: 404100 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:56:14,857-Speed 2512.14 samples/sec Loss 2.1731 LearningRate 0.000325 Epoch: 19 Global Step: 404110 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:56:23,059-Speed 2497.58 samples/sec Loss 2.1665 LearningRate 0.000325 Epoch: 19 Global Step: 404120 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:56:31,259-Speed 2497.65 samples/sec Loss 2.1770 
LearningRate 0.000325 Epoch: 19 Global Step: 404130 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:56:39,461-Speed 2497.48 samples/sec Loss 2.2151 LearningRate 0.000325 Epoch: 19 Global Step: 404140 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:56:47,665-Speed 2496.95 samples/sec Loss 2.1932 LearningRate 0.000325 Epoch: 19 Global Step: 404150 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:56:55,889-Speed 2490.40 samples/sec Loss 2.1532 LearningRate 0.000325 Epoch: 19 Global Step: 404160 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:57:04,036-Speed 2514.28 samples/sec Loss 2.1711 LearningRate 0.000325 Epoch: 19 Global Step: 404170 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:57:12,248-Speed 2494.63 samples/sec Loss 2.1332 LearningRate 0.000325 Epoch: 19 Global Step: 404180 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:57:20,449-Speed 2497.82 samples/sec Loss 2.1239 LearningRate 0.000325 Epoch: 19 Global Step: 404190 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:57:28,653-Speed 2496.75 samples/sec Loss 2.1679 LearningRate 0.000325 Epoch: 19 Global Step: 404200 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:57:36,857-Speed 2496.90 samples/sec Loss 2.1529 LearningRate 0.000325 Epoch: 19 Global Step: 404210 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:57:45,070-Speed 2493.69 samples/sec Loss 2.1723 LearningRate 0.000325 Epoch: 19 Global Step: 404220 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:57:53,215-Speed 2514.95 samples/sec Loss 2.1477 LearningRate 0.000325 Epoch: 19 Global Step: 404230 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:58:01,415-Speed 2498.15 samples/sec Loss 2.1937 LearningRate 0.000325 Epoch: 19 Global Step: 404240 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:58:09,613-Speed 2498.29 samples/sec Loss 2.1858 
LearningRate 0.000325 Epoch: 19 Global Step: 404250 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:58:17,812-Speed 2498.17 samples/sec Loss 2.1851 LearningRate 0.000324 Epoch: 19 Global Step: 404260 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 10:58:25,970-Speed 2510.77 samples/sec Loss 2.1667 LearningRate 0.000324 Epoch: 19 Global Step: 404270 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:58:34,173-Speed 2497.24 samples/sec Loss 2.2034 LearningRate 0.000324 Epoch: 19 Global Step: 404280 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:58:42,318-Speed 2514.56 samples/sec Loss 2.1993 LearningRate 0.000324 Epoch: 19 Global Step: 404290 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:58:50,520-Speed 2497.32 samples/sec Loss 2.2124 LearningRate 0.000324 Epoch: 19 Global Step: 404300 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:58:58,723-Speed 2496.86 samples/sec Loss 2.1566 LearningRate 0.000324 Epoch: 19 Global Step: 404310 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:59:06,924-Speed 2497.67 samples/sec Loss 2.2184 LearningRate 0.000324 Epoch: 19 Global Step: 404320 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:59:15,127-Speed 2497.16 samples/sec Loss 2.1914 LearningRate 0.000324 Epoch: 19 Global Step: 404330 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:59:23,328-Speed 2497.58 samples/sec Loss 2.2087 LearningRate 0.000324 Epoch: 19 Global Step: 404340 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:59:31,478-Speed 2513.09 samples/sec Loss 2.1914 LearningRate 0.000324 Epoch: 19 Global Step: 404350 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:59:39,678-Speed 2498.06 samples/sec Loss 2.1921 LearningRate 0.000324 Epoch: 19 Global Step: 404360 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:59:47,885-Speed 2495.86 samples/sec Loss 2.2123 
LearningRate 0.000324 Epoch: 19 Global Step: 404370 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 10:59:56,087-Speed 2497.15 samples/sec Loss 2.2028 LearningRate 0.000324 Epoch: 19 Global Step: 404380 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:00:04,290-Speed 2497.05 samples/sec Loss 2.1557 LearningRate 0.000324 Epoch: 19 Global Step: 404390 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:00:12,509-Speed 2492.34 samples/sec Loss 2.1931 LearningRate 0.000324 Epoch: 19 Global Step: 404400 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:00:20,671-Speed 2509.49 samples/sec Loss 2.2269 LearningRate 0.000324 Epoch: 19 Global Step: 404410 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:00:28,880-Speed 2495.31 samples/sec Loss 2.2066 LearningRate 0.000324 Epoch: 19 Global Step: 404420 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:00:37,087-Speed 2495.76 samples/sec Loss 2.1421 LearningRate 0.000324 Epoch: 19 Global Step: 404430 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:00:45,296-Speed 2495.20 samples/sec Loss 2.2207 LearningRate 0.000324 Epoch: 19 Global Step: 404440 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:00:53,505-Speed 2495.09 samples/sec Loss 2.2435 LearningRate 0.000324 Epoch: 19 Global Step: 404450 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:01:01,715-Speed 2494.93 samples/sec Loss 2.1937 LearningRate 0.000324 Epoch: 19 Global Step: 404460 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:01:09,865-Speed 2513.53 samples/sec Loss 2.1661 LearningRate 0.000324 Epoch: 19 Global Step: 404470 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:01:18,067-Speed 2497.28 samples/sec Loss 2.2692 LearningRate 0.000324 Epoch: 19 Global Step: 404480 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:01:26,289-Speed 2491.28 samples/sec Loss 2.2090 
LearningRate 0.000324 Epoch: 19 Global Step: 404490 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:01:34,489-Speed 2498.22 samples/sec Loss 2.2168 LearningRate 0.000324 Epoch: 19 Global Step: 404500 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:01:42,691-Speed 2497.39 samples/sec Loss 2.2050 LearningRate 0.000324 Epoch: 19 Global Step: 404510 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:01:50,907-Speed 2492.87 samples/sec Loss 2.2138 LearningRate 0.000324 Epoch: 19 Global Step: 404520 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:01:59,055-Speed 2513.97 samples/sec Loss 2.1971 LearningRate 0.000324 Epoch: 19 Global Step: 404530 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:02:07,263-Speed 2495.61 samples/sec Loss 2.2283 LearningRate 0.000324 Epoch: 19 Global Step: 404540 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:02:15,462-Speed 2498.30 samples/sec Loss 2.2220 LearningRate 0.000324 Epoch: 19 Global Step: 404550 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:02:23,664-Speed 2497.87 samples/sec Loss 2.2177 LearningRate 0.000324 Epoch: 19 Global Step: 404560 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:02:31,876-Speed 2494.27 samples/sec Loss 2.1958 LearningRate 0.000324 Epoch: 19 Global Step: 404570 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:02:40,077-Speed 2497.91 samples/sec Loss 2.1876 LearningRate 0.000324 Epoch: 19 Global Step: 404580 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:02:48,226-Speed 2513.31 samples/sec Loss 2.1944 LearningRate 0.000324 Epoch: 19 Global Step: 404590 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:02:56,428-Speed 2497.21 samples/sec Loss 2.1781 LearningRate 0.000324 Epoch: 19 Global Step: 404600 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:03:04,629-Speed 2497.68 samples/sec Loss 2.2246 
LearningRate 0.000324 Epoch: 19 Global Step: 404610 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:03:12,830-Speed 2498.00 samples/sec Loss 2.1977 LearningRate 0.000324 Epoch: 19 Global Step: 404620 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:03:21,042-Speed 2493.91 samples/sec Loss 2.2577 LearningRate 0.000324 Epoch: 19 Global Step: 404630 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:03:29,242-Speed 2497.86 samples/sec Loss 2.1627 LearningRate 0.000324 Epoch: 19 Global Step: 404640 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:03:37,402-Speed 2510.52 samples/sec Loss 2.1392 LearningRate 0.000324 Epoch: 19 Global Step: 404650 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:03:45,607-Speed 2496.32 samples/sec Loss 2.1914 LearningRate 0.000324 Epoch: 19 Global Step: 404660 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:03:53,908-Speed 2467.52 samples/sec Loss 2.1859 LearningRate 0.000324 Epoch: 19 Global Step: 404670 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:04:02,109-Speed 2497.65 samples/sec Loss 2.2100 LearningRate 0.000324 Epoch: 19 Global Step: 404680 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:04:10,309-Speed 2497.91 samples/sec Loss 2.1893 LearningRate 0.000324 Epoch: 19 Global Step: 404690 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:04:18,521-Speed 2494.35 samples/sec Loss 2.2427 LearningRate 0.000324 Epoch: 19 Global Step: 404700 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:04:26,671-Speed 2513.10 samples/sec Loss 2.2472 LearningRate 0.000324 Epoch: 19 Global Step: 404710 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:04:34,872-Speed 2497.62 samples/sec Loss 2.1855 LearningRate 0.000324 Epoch: 19 Global Step: 404720 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:04:43,075-Speed 2496.93 samples/sec Loss 2.1970 
LearningRate 0.000324 Epoch: 19 Global Step: 404730 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:04:51,279-Speed 2496.83 samples/sec Loss 2.2205 LearningRate 0.000324 Epoch: 19 Global Step: 404740 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:04:59,483-Speed 2496.86 samples/sec Loss 2.1862 LearningRate 0.000324 Epoch: 19 Global Step: 404750 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:05:07,687-Speed 2496.73 samples/sec Loss 2.1881 LearningRate 0.000324 Epoch: 19 Global Step: 404760 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:05:15,841-Speed 2512.13 samples/sec Loss 2.2178 LearningRate 0.000324 Epoch: 19 Global Step: 404770 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:05:24,057-Speed 2493.16 samples/sec Loss 2.1581 LearningRate 0.000324 Epoch: 19 Global Step: 404780 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:05:32,261-Speed 2496.72 samples/sec Loss 2.2129 LearningRate 0.000324 Epoch: 19 Global Step: 404790 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:05:40,465-Speed 2496.86 samples/sec Loss 2.1961 LearningRate 0.000324 Epoch: 19 Global Step: 404800 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:05:48,668-Speed 2497.08 samples/sec Loss 2.2094 LearningRate 0.000324 Epoch: 19 Global Step: 404810 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:05:56,871-Speed 2496.73 samples/sec Loss 2.2061 LearningRate 0.000324 Epoch: 19 Global Step: 404820 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:06:05,030-Speed 2510.51 samples/sec Loss 2.1595 LearningRate 0.000324 Epoch: 19 Global Step: 404830 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:06:13,238-Speed 2495.51 samples/sec Loss 2.1602 LearningRate 0.000324 Epoch: 19 Global Step: 404840 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:06:21,447-Speed 2495.17 samples/sec Loss 2.2466 
LearningRate 0.000324 Epoch: 19 Global Step: 404850 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:06:29,647-Speed 2498.06 samples/sec Loss 2.2237 LearningRate 0.000324 Epoch: 19 Global Step: 404860 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:06:37,845-Speed 2498.60 samples/sec Loss 2.1860 LearningRate 0.000324 Epoch: 19 Global Step: 404870 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:06:46,052-Speed 2495.75 samples/sec Loss 2.2355 LearningRate 0.000324 Epoch: 19 Global Step: 404880 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:06:54,200-Speed 2513.93 samples/sec Loss 2.1045 LearningRate 0.000324 Epoch: 19 Global Step: 404890 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:07:02,401-Speed 2497.99 samples/sec Loss 2.2181 LearningRate 0.000324 Epoch: 19 Global Step: 404900 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:07:10,603-Speed 2497.21 samples/sec Loss 2.1579 LearningRate 0.000324 Epoch: 19 Global Step: 404910 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:07:18,823-Speed 2491.71 samples/sec Loss 2.1971 LearningRate 0.000323 Epoch: 19 Global Step: 404920 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:07:27,036-Speed 2494.06 samples/sec Loss 2.1044 LearningRate 0.000323 Epoch: 19 Global Step: 404930 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:07:35,237-Speed 2497.54 samples/sec Loss 2.1564 LearningRate 0.000323 Epoch: 19 Global Step: 404940 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:07:43,386-Speed 2513.80 samples/sec Loss 2.1885 LearningRate 0.000323 Epoch: 19 Global Step: 404950 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:07:51,587-Speed 2497.58 samples/sec Loss 2.1941 LearningRate 0.000323 Epoch: 19 Global Step: 404960 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:07:59,792-Speed 2496.50 samples/sec Loss 2.1043 
LearningRate 0.000323 Epoch: 19 Global Step: 404970 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:08:08,000-Speed 2495.66 samples/sec Loss 2.1831 LearningRate 0.000323 Epoch: 19 Global Step: 404980 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:08:16,213-Speed 2493.71 samples/sec Loss 2.2058 LearningRate 0.000323 Epoch: 19 Global Step: 404990 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:08:24,418-Speed 2496.62 samples/sec Loss 2.2078 LearningRate 0.000323 Epoch: 19 Global Step: 405000 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:08:32,573-Speed 2511.86 samples/sec Loss 2.2099 LearningRate 0.000323 Epoch: 19 Global Step: 405010 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:08:40,776-Speed 2496.78 samples/sec Loss 2.1664 LearningRate 0.000323 Epoch: 19 Global Step: 405020 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:08:48,989-Speed 2494.30 samples/sec Loss 2.2301 LearningRate 0.000323 Epoch: 19 Global Step: 405030 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:08:57,196-Speed 2495.80 samples/sec Loss 2.1678 LearningRate 0.000323 Epoch: 19 Global Step: 405040 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:09:05,400-Speed 2496.84 samples/sec Loss 2.1377 LearningRate 0.000323 Epoch: 19 Global Step: 405050 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:09:13,620-Speed 2491.46 samples/sec Loss 2.1538 LearningRate 0.000323 Epoch: 19 Global Step: 405060 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:09:21,775-Speed 2511.97 samples/sec Loss 2.1775 LearningRate 0.000323 Epoch: 19 Global Step: 405070 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:09:29,978-Speed 2497.09 samples/sec Loss 2.1692 LearningRate 0.000323 Epoch: 19 Global Step: 405080 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:09:38,195-Speed 2492.42 samples/sec Loss 2.1511 
LearningRate 0.000323 Epoch: 19 Global Step: 405090 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:09:46,407-Speed 2494.71 samples/sec Loss 2.1886 LearningRate 0.000323 Epoch: 19 Global Step: 405100 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:09:54,611-Speed 2497.24 samples/sec Loss 2.1521 LearningRate 0.000323 Epoch: 19 Global Step: 405110 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:10:02,813-Speed 2497.29 samples/sec Loss 2.1930 LearningRate 0.000323 Epoch: 19 Global Step: 405120 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:10:10,964-Speed 2512.83 samples/sec Loss 2.2057 LearningRate 0.000323 Epoch: 19 Global Step: 405130 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:10:19,173-Speed 2495.34 samples/sec Loss 2.1872 LearningRate 0.000323 Epoch: 19 Global Step: 405140 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:10:27,380-Speed 2495.78 samples/sec Loss 2.1739 LearningRate 0.000323 Epoch: 19 Global Step: 405150 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:10:35,580-Speed 2498.02 samples/sec Loss 2.2038 LearningRate 0.000323 Epoch: 19 Global Step: 405160 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:10:43,781-Speed 2497.36 samples/sec Loss 2.1952 LearningRate 0.000323 Epoch: 19 Global Step: 405170 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:10:51,982-Speed 2497.68 samples/sec Loss 2.2279 LearningRate 0.000323 Epoch: 19 Global Step: 405180 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:11:00,151-Speed 2507.78 samples/sec Loss 2.2260 LearningRate 0.000323 Epoch: 19 Global Step: 405190 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:11:08,354-Speed 2496.95 samples/sec Loss 2.1878 LearningRate 0.000323 Epoch: 19 Global Step: 405200 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:11:16,556-Speed 2497.39 samples/sec Loss 2.1866 
LearningRate 0.000323 Epoch: 19 Global Step: 405210 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:11:24,755-Speed 2498.29 samples/sec Loss 2.1918 LearningRate 0.000323 Epoch: 19 Global Step: 405220 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:11:32,963-Speed 2495.78 samples/sec Loss 2.2854 LearningRate 0.000323 Epoch: 19 Global Step: 405230 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:11:41,164-Speed 2497.55 samples/sec Loss 2.1566 LearningRate 0.000323 Epoch: 19 Global Step: 405240 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:11:49,315-Speed 2513.06 samples/sec Loss 2.1827 LearningRate 0.000323 Epoch: 19 Global Step: 405250 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:11:57,518-Speed 2497.08 samples/sec Loss 2.1831 LearningRate 0.000323 Epoch: 19 Global Step: 405260 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:12:05,722-Speed 2496.73 samples/sec Loss 2.1618 LearningRate 0.000323 Epoch: 19 Global Step: 405270 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:12:13,923-Speed 2497.80 samples/sec Loss 2.1763 LearningRate 0.000323 Epoch: 19 Global Step: 405280 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:12:22,124-Speed 2497.41 samples/sec Loss 2.1350 LearningRate 0.000323 Epoch: 19 Global Step: 405290 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:12:30,328-Speed 2496.78 samples/sec Loss 2.2170 LearningRate 0.000323 Epoch: 19 Global Step: 405300 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:12:38,478-Speed 2513.38 samples/sec Loss 2.1523 LearningRate 0.000323 Epoch: 19 Global Step: 405310 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:12:46,681-Speed 2496.97 samples/sec Loss 2.1801 LearningRate 0.000323 Epoch: 19 Global Step: 405320 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:12:54,887-Speed 2496.18 samples/sec Loss 2.2150 
LearningRate 0.000323 Epoch: 19 Global Step: 405330 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:13:03,087-Speed 2497.88 samples/sec Loss 2.1191 LearningRate 0.000323 Epoch: 19 Global Step: 405340 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:13:11,297-Speed 2494.96 samples/sec Loss 2.1662 LearningRate 0.000323 Epoch: 19 Global Step: 405350 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:13:19,501-Speed 2497.10 samples/sec Loss 2.2287 LearningRate 0.000323 Epoch: 19 Global Step: 405360 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:13:27,647-Speed 2514.32 samples/sec Loss 2.1625 LearningRate 0.000323 Epoch: 19 Global Step: 405370 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:13:35,856-Speed 2495.22 samples/sec Loss 2.1974 LearningRate 0.000323 Epoch: 19 Global Step: 405380 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:13:44,058-Speed 2497.81 samples/sec Loss 2.1661 LearningRate 0.000323 Epoch: 19 Global Step: 405390 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:13:52,263-Speed 2496.17 samples/sec Loss 2.2176 LearningRate 0.000323 Epoch: 19 Global Step: 405400 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:14:00,463-Speed 2497.99 samples/sec Loss 2.2038 LearningRate 0.000323 Epoch: 19 Global Step: 405410 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:14:08,665-Speed 2497.52 samples/sec Loss 2.1776 LearningRate 0.000323 Epoch: 19 Global Step: 405420 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:14:16,814-Speed 2514.43 samples/sec Loss 2.1898 LearningRate 0.000323 Epoch: 19 Global Step: 405430 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:14:25,016-Speed 2497.15 samples/sec Loss 2.1522 LearningRate 0.000323 Epoch: 19 Global Step: 405440 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:14:33,218-Speed 2497.21 samples/sec Loss 2.2481 
LearningRate 0.000323 Epoch: 19 Global Step: 405450 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:14:41,433-Speed 2493.69 samples/sec Loss 2.2247 LearningRate 0.000323 Epoch: 19 Global Step: 405460 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:14:49,636-Speed 2496.86 samples/sec Loss 2.1953 LearningRate 0.000323 Epoch: 19 Global Step: 405470 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:14:57,850-Speed 2493.75 samples/sec Loss 2.1833 LearningRate 0.000323 Epoch: 19 Global Step: 405480 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:15:06,001-Speed 2512.92 samples/sec Loss 2.1708 LearningRate 0.000323 Epoch: 19 Global Step: 405490 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:15:14,203-Speed 2497.39 samples/sec Loss 2.1912 LearningRate 0.000323 Epoch: 19 Global Step: 405500 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:15:22,400-Speed 2498.97 samples/sec Loss 2.2230 LearningRate 0.000323 Epoch: 19 Global Step: 405510 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:15:30,601-Speed 2497.73 samples/sec Loss 2.1854 LearningRate 0.000323 Epoch: 19 Global Step: 405520 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:15:38,801-Speed 2497.92 samples/sec Loss 2.2003 LearningRate 0.000323 Epoch: 19 Global Step: 405530 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:15:47,001-Speed 2497.83 samples/sec Loss 2.1723 LearningRate 0.000323 Epoch: 19 Global Step: 405540 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:15:55,164-Speed 2509.50 samples/sec Loss 2.1802 LearningRate 0.000323 Epoch: 19 Global Step: 405550 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:16:03,367-Speed 2497.16 samples/sec Loss 2.1352 LearningRate 0.000323 Epoch: 19 Global Step: 405560 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:16:11,571-Speed 2496.52 samples/sec Loss 2.1937 
LearningRate 0.000323 Epoch: 19 Global Step: 405570 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:16:19,784-Speed 2493.94 samples/sec Loss 2.1506 LearningRate 0.000322 Epoch: 19 Global Step: 405580 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:16:27,986-Speed 2497.34 samples/sec Loss 2.1351 LearningRate 0.000322 Epoch: 19 Global Step: 405590 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:16:36,193-Speed 2495.88 samples/sec Loss 2.1499 LearningRate 0.000322 Epoch: 19 Global Step: 405600 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:16:44,347-Speed 2512.25 samples/sec Loss 2.1396 LearningRate 0.000322 Epoch: 19 Global Step: 405610 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:16:52,563-Speed 2494.73 samples/sec Loss 2.1456 LearningRate 0.000322 Epoch: 19 Global Step: 405620 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:17:01,100-Speed 2499.48 samples/sec Loss 2.1918 LearningRate 0.000322 Epoch: 19 Global Step: 405630 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:17:09,304-Speed 2496.60 samples/sec Loss 2.1587 LearningRate 0.000322 Epoch: 19 Global Step: 405640 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:17:17,504-Speed 2497.98 samples/sec Loss 2.2003 LearningRate 0.000322 Epoch: 19 Global Step: 405650 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:17:25,742-Speed 2499.93 samples/sec Loss 2.1966 LearningRate 0.000322 Epoch: 19 Global Step: 405660 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:17:33,943-Speed 2515.33 samples/sec Loss 2.1835 LearningRate 0.000322 Epoch: 19 Global Step: 405670 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:17:42,143-Speed 2497.89 samples/sec Loss 2.2144 LearningRate 0.000322 Epoch: 19 Global Step: 405680 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:17:50,709-Speed 2500.36 samples/sec Loss 2.2155 
LearningRate 0.000322 Epoch: 19 Global Step: 405690 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:18:00,998-Speed 2486.01 samples/sec Loss 2.1884 LearningRate 0.000322 Epoch: 19 Global Step: 405700 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:18:09,385-Speed 2495.44 samples/sec Loss 2.2145 LearningRate 0.000322 Epoch: 19 Global Step: 405710 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:18:20,317-Speed 1873.55 samples/sec Loss 2.1733 LearningRate 0.000322 Epoch: 19 Global Step: 405720 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:18:28,531-Speed 2517.17 samples/sec Loss 2.2350 LearningRate 0.000322 Epoch: 19 Global Step: 405730 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:18:36,767-Speed 2499.67 samples/sec Loss 2.2198 LearningRate 0.000322 Epoch: 19 Global Step: 405740 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:18:44,969-Speed 2497.11 samples/sec Loss 2.1752 LearningRate 0.000322 Epoch: 19 Global Step: 405750 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:18:54,482-Speed 2498.41 samples/sec Loss 2.2142 LearningRate 0.000322 Epoch: 19 Global Step: 405760 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:19:02,748-Speed 2496.73 samples/sec Loss 2.2003 LearningRate 0.000322 Epoch: 19 Global Step: 405770 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:19:10,956-Speed 2495.34 samples/sec Loss 2.2097 LearningRate 0.000322 Epoch: 19 Global Step: 405780 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:19:22,785-Speed 1731.44 samples/sec Loss 2.2045 LearningRate 0.000322 Epoch: 19 Global Step: 405790 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:19:31,049-Speed 2495.09 samples/sec Loss 2.2090 LearningRate 0.000322 Epoch: 19 Global Step: 405800 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:19:39,337-Speed 2498.30 samples/sec Loss 2.1820 
LearningRate 0.000322 Epoch: 19 Global Step: 405810 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:19:47,539-Speed 2497.20 samples/sec Loss 2.1916 LearningRate 0.000322 Epoch: 19 Global Step: 405820 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:19:55,743-Speed 2499.91 samples/sec Loss 2.1716 LearningRate 0.000322 Epoch: 19 Global Step: 405830 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:20:03,978-Speed 2499.21 samples/sec Loss 2.1783 LearningRate 0.000322 Epoch: 19 Global Step: 405840 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:20:12,175-Speed 2515.55 samples/sec Loss 2.1716 LearningRate 0.000322 Epoch: 19 Global Step: 405850 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:20:20,376-Speed 2497.28 samples/sec Loss 2.1418 LearningRate 0.000322 Epoch: 19 Global Step: 405860 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:20:29,099-Speed 2348.13 samples/sec Loss 2.1900 LearningRate 0.000322 Epoch: 19 Global Step: 405870 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:20:37,338-Speed 2495.22 samples/sec Loss 2.1872 LearningRate 0.000322 Epoch: 19 Global Step: 405880 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:20:45,584-Speed 2499.26 samples/sec Loss 2.1734 LearningRate 0.000322 Epoch: 19 Global Step: 405890 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:20:53,783-Speed 2498.19 samples/sec Loss 2.1648 LearningRate 0.000322 Epoch: 19 Global Step: 405900 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:21:06,107-Speed 1924.95 samples/sec Loss 2.1706 LearningRate 0.000322 Epoch: 19 Global Step: 405910 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:21:14,729-Speed 2498.98 samples/sec Loss 2.1345 LearningRate 0.000322 Epoch: 19 Global Step: 405920 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:21:22,931-Speed 2497.24 samples/sec Loss 2.1841 
LearningRate 0.000322 Epoch: 19 Global Step: 405930 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:21:31,132-Speed 2497.82 samples/sec Loss 2.2143 LearningRate 0.000322 Epoch: 19 Global Step: 405940 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:21:39,334-Speed 2497.62 samples/sec Loss 2.2098 LearningRate 0.000322 Epoch: 19 Global Step: 405950 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:21:47,541-Speed 2495.67 samples/sec Loss 2.2051 LearningRate 0.000322 Epoch: 19 Global Step: 405960 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:21:55,693-Speed 2512.64 samples/sec Loss 2.1722 LearningRate 0.000322 Epoch: 19 Global Step: 405970 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:22:03,899-Speed 2496.05 samples/sec Loss 2.1834 LearningRate 0.000322 Epoch: 19 Global Step: 405980 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:22:12,106-Speed 2496.03 samples/sec Loss 2.1868 LearningRate 0.000322 Epoch: 19 Global Step: 405990 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:22:20,312-Speed 2496.46 samples/sec Loss 2.1456 LearningRate 0.000322 Epoch: 19 Global Step: 406000 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:22:28,525-Speed 2493.65 samples/sec Loss 2.2201 LearningRate 0.000322 Epoch: 19 Global Step: 406010 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:22:36,731-Speed 2496.15 samples/sec Loss 2.1526 LearningRate 0.000322 Epoch: 19 Global Step: 406020 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:22:44,890-Speed 2510.45 samples/sec Loss 2.1760 LearningRate 0.000322 Epoch: 19 Global Step: 406030 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:22:53,054-Speed 2509.21 samples/sec Loss 2.1699 LearningRate 0.000322 Epoch: 19 Global Step: 406040 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:23:01,261-Speed 2495.63 samples/sec Loss 2.2150 
LearningRate 0.000322 Epoch: 19 Global Step: 406050 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:23:09,469-Speed 2495.59 samples/sec Loss 2.1700 LearningRate 0.000322 Epoch: 19 Global Step: 406060 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:23:17,679-Speed 2495.29 samples/sec Loss 2.1429 LearningRate 0.000322 Epoch: 19 Global Step: 406070 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:23:25,885-Speed 2496.06 samples/sec Loss 2.1800 LearningRate 0.000322 Epoch: 19 Global Step: 406080 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:23:34,038-Speed 2512.30 samples/sec Loss 2.1816 LearningRate 0.000322 Epoch: 19 Global Step: 406090 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:23:42,242-Speed 2496.72 samples/sec Loss 2.1339 LearningRate 0.000322 Epoch: 19 Global Step: 406100 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:23:50,459-Speed 2492.65 samples/sec Loss 2.1912 LearningRate 0.000322 Epoch: 19 Global Step: 406110 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:23:58,665-Speed 2496.18 samples/sec Loss 2.1590 LearningRate 0.000322 Epoch: 19 Global Step: 406120 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:24:06,868-Speed 2496.64 samples/sec Loss 2.1995 LearningRate 0.000322 Epoch: 19 Global Step: 406130 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:24:15,072-Speed 2496.72 samples/sec Loss 2.1604 LearningRate 0.000322 Epoch: 19 Global Step: 406140 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:24:23,226-Speed 2512.12 samples/sec Loss 2.1969 LearningRate 0.000322 Epoch: 19 Global Step: 406150 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:24:31,430-Speed 2496.73 samples/sec Loss 2.1736 LearningRate 0.000322 Epoch: 19 Global Step: 406160 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:24:39,633-Speed 2497.11 samples/sec Loss 2.1777 
LearningRate 0.000322 Epoch: 19 Global Step: 406170 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:24:47,842-Speed 2495.21 samples/sec Loss 2.1919 LearningRate 0.000322 Epoch: 19 Global Step: 406180 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:24:56,057-Speed 2493.24 samples/sec Loss 2.2120 LearningRate 0.000322 Epoch: 19 Global Step: 406190 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:25:04,260-Speed 2497.16 samples/sec Loss 2.1831 LearningRate 0.000322 Epoch: 19 Global Step: 406200 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:25:12,408-Speed 2514.42 samples/sec Loss 2.2030 LearningRate 0.000322 Epoch: 19 Global Step: 406210 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:25:20,612-Speed 2497.04 samples/sec Loss 2.1092 LearningRate 0.000322 Epoch: 19 Global Step: 406220 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:25:28,816-Speed 2496.56 samples/sec Loss 2.1558 LearningRate 0.000321 Epoch: 19 Global Step: 406230 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:25:37,019-Speed 2497.10 samples/sec Loss 2.1724 LearningRate 0.000321 Epoch: 19 Global Step: 406240 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:25:45,230-Speed 2494.66 samples/sec Loss 2.1836 LearningRate 0.000321 Epoch: 19 Global Step: 406250 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:25:53,433-Speed 2496.76 samples/sec Loss 2.1698 LearningRate 0.000321 Epoch: 19 Global Step: 406260 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:26:01,587-Speed 2512.20 samples/sec Loss 2.1771 LearningRate 0.000321 Epoch: 19 Global Step: 406270 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:26:09,799-Speed 2494.24 samples/sec Loss 2.1769 LearningRate 0.000321 Epoch: 19 Global Step: 406280 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:26:18,007-Speed 2495.77 samples/sec Loss 2.1589 
LearningRate 0.000321 Epoch: 19 Global Step: 406290 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:26:26,208-Speed 2497.82 samples/sec Loss 2.1659 LearningRate 0.000321 Epoch: 19 Global Step: 406300 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:26:34,421-Speed 2494.03 samples/sec Loss 2.1268 LearningRate 0.000321 Epoch: 19 Global Step: 406310 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:26:42,623-Speed 2497.57 samples/sec Loss 2.2074 LearningRate 0.000321 Epoch: 19 Global Step: 406320 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:26:50,768-Speed 2514.75 samples/sec Loss 2.1656 LearningRate 0.000321 Epoch: 19 Global Step: 406330 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:26:58,968-Speed 2498.56 samples/sec Loss 2.1663 LearningRate 0.000321 Epoch: 19 Global Step: 406340 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:27:07,167-Speed 2498.10 samples/sec Loss 2.1677 LearningRate 0.000321 Epoch: 19 Global Step: 406350 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:27:15,366-Speed 2498.37 samples/sec Loss 2.1957 LearningRate 0.000321 Epoch: 19 Global Step: 406360 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:27:23,571-Speed 2496.18 samples/sec Loss 2.2062 LearningRate 0.000321 Epoch: 19 Global Step: 406370 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:27:31,775-Speed 2496.84 samples/sec Loss 2.1441 LearningRate 0.000321 Epoch: 19 Global Step: 406380 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:27:39,927-Speed 2512.66 samples/sec Loss 2.1696 LearningRate 0.000321 Epoch: 19 Global Step: 406390 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:27:48,128-Speed 2497.71 samples/sec Loss 2.1498 LearningRate 0.000321 Epoch: 19 Global Step: 406400 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:27:56,338-Speed 2495.19 samples/sec Loss 2.1056 
LearningRate 0.000321 Epoch: 19 Global Step: 406410 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:28:04,542-Speed 2496.77 samples/sec Loss 2.1376 LearningRate 0.000321 Epoch: 19 Global Step: 406420 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:28:12,744-Speed 2497.06 samples/sec Loss 2.0757 LearningRate 0.000321 Epoch: 19 Global Step: 406430 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:28:20,962-Speed 2492.82 samples/sec Loss 2.1852 LearningRate 0.000321 Epoch: 19 Global Step: 406440 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:28:29,109-Speed 2514.87 samples/sec Loss 2.1835 LearningRate 0.000321 Epoch: 19 Global Step: 406450 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:28:37,323-Speed 2493.65 samples/sec Loss 2.1678 LearningRate 0.000321 Epoch: 19 Global Step: 406460 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:28:45,523-Speed 2497.91 samples/sec Loss 2.1673 LearningRate 0.000321 Epoch: 19 Global Step: 406470 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:28:53,736-Speed 2494.12 samples/sec Loss 2.1413 LearningRate 0.000321 Epoch: 19 Global Step: 406480 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:29:01,939-Speed 2497.17 samples/sec Loss 2.1371 LearningRate 0.000321 Epoch: 19 Global Step: 406490 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:29:10,140-Speed 2497.39 samples/sec Loss 2.1039 LearningRate 0.000321 Epoch: 19 Global Step: 406500 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:29:18,289-Speed 2513.48 samples/sec Loss 2.1856 LearningRate 0.000321 Epoch: 19 Global Step: 406510 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:29:26,492-Speed 2497.55 samples/sec Loss 2.1119 LearningRate 0.000321 Epoch: 19 Global Step: 406520 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:29:34,696-Speed 2496.86 samples/sec Loss 2.1333 
LearningRate 0.000321 Epoch: 19 Global Step: 406530 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:29:42,911-Speed 2493.55 samples/sec Loss 2.1319 LearningRate 0.000321 Epoch: 19 Global Step: 406540 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:29:51,112-Speed 2497.60 samples/sec Loss 2.1878 LearningRate 0.000321 Epoch: 19 Global Step: 406550 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:29:59,314-Speed 2497.60 samples/sec Loss 2.1717 LearningRate 0.000321 Epoch: 19 Global Step: 406560 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:30:07,465-Speed 2512.83 samples/sec Loss 2.1474 LearningRate 0.000321 Epoch: 19 Global Step: 406570 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:30:15,668-Speed 2497.08 samples/sec Loss 2.1557 LearningRate 0.000321 Epoch: 19 Global Step: 406580 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:30:23,871-Speed 2496.98 samples/sec Loss 2.1913 LearningRate 0.000321 Epoch: 19 Global Step: 406590 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:30:32,073-Speed 2497.33 samples/sec Loss 2.2347 LearningRate 0.000321 Epoch: 19 Global Step: 406600 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:30:40,277-Speed 2496.69 samples/sec Loss 2.0987 LearningRate 0.000321 Epoch: 19 Global Step: 406610 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:30:48,484-Speed 2495.90 samples/sec Loss 2.2198 LearningRate 0.000321 Epoch: 19 Global Step: 406620 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:30:56,632-Speed 2513.91 samples/sec Loss 2.1917 LearningRate 0.000321 Epoch: 19 Global Step: 406630 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:31:04,840-Speed 2495.80 samples/sec Loss 2.1596 LearningRate 0.000321 Epoch: 19 Global Step: 406640 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:31:13,042-Speed 2497.47 samples/sec Loss 2.1791 
LearningRate 0.000321 Epoch: 19 Global Step: 406650 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:31:21,246-Speed 2496.79 samples/sec Loss 2.1624 LearningRate 0.000321 Epoch: 19 Global Step: 406660 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:31:29,452-Speed 2496.88 samples/sec Loss 2.1735 LearningRate 0.000321 Epoch: 19 Global Step: 406670 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:31:37,667-Speed 2493.47 samples/sec Loss 2.2041 LearningRate 0.000321 Epoch: 19 Global Step: 406680 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:31:45,821-Speed 2512.00 samples/sec Loss 2.1763 LearningRate 0.000321 Epoch: 19 Global Step: 406690 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:31:54,024-Speed 2496.97 samples/sec Loss 2.1852 LearningRate 0.000321 Epoch: 19 Global Step: 406700 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:32:02,236-Speed 2494.68 samples/sec Loss 2.1784 LearningRate 0.000321 Epoch: 19 Global Step: 406710 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:32:10,444-Speed 2495.63 samples/sec Loss 2.1538 LearningRate 0.000321 Epoch: 19 Global Step: 406720 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:32:18,661-Speed 2492.62 samples/sec Loss 2.1666 LearningRate 0.000321 Epoch: 19 Global Step: 406730 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:32:26,863-Speed 2497.43 samples/sec Loss 2.1679 LearningRate 0.000321 Epoch: 19 Global Step: 406740 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:32:35,016-Speed 2512.40 samples/sec Loss 2.1567 LearningRate 0.000321 Epoch: 19 Global Step: 406750 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:32:43,215-Speed 2498.17 samples/sec Loss 2.1917 LearningRate 0.000321 Epoch: 19 Global Step: 406760 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:32:51,425-Speed 2494.86 samples/sec Loss 2.1563 
LearningRate 0.000321 Epoch: 19 Global Step: 406770 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:32:59,632-Speed 2495.78 samples/sec Loss 2.1406 LearningRate 0.000321 Epoch: 19 Global Step: 406780 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:33:07,840-Speed 2495.85 samples/sec Loss 2.1667 LearningRate 0.000321 Epoch: 19 Global Step: 406790 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:33:16,048-Speed 2495.23 samples/sec Loss 2.1837 LearningRate 0.000321 Epoch: 19 Global Step: 406800 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:33:24,206-Speed 2511.68 samples/sec Loss 2.1462 LearningRate 0.000321 Epoch: 19 Global Step: 406810 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:33:32,417-Speed 2494.39 samples/sec Loss 2.1867 LearningRate 0.000321 Epoch: 19 Global Step: 406820 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:33:40,618-Speed 2497.67 samples/sec Loss 2.1550 LearningRate 0.000321 Epoch: 19 Global Step: 406830 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:33:48,818-Speed 2497.89 samples/sec Loss 2.1549 LearningRate 0.000321 Epoch: 19 Global Step: 406840 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:33:57,022-Speed 2496.45 samples/sec Loss 2.1562 LearningRate 0.000321 Epoch: 19 Global Step: 406850 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:34:05,240-Speed 2492.63 samples/sec Loss 2.1815 LearningRate 0.000321 Epoch: 19 Global Step: 406860 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:34:13,391-Speed 2512.98 samples/sec Loss 2.1815 LearningRate 0.000321 Epoch: 19 Global Step: 406870 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:34:21,600-Speed 2495.32 samples/sec Loss 2.1817 LearningRate 0.000321 Epoch: 19 Global Step: 406880 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:34:29,800-Speed 2497.99 samples/sec Loss 2.1551 
LearningRate 0.000320 Epoch: 19 Global Step: 406890 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:34:38,002-Speed 2497.67 samples/sec Loss 2.1422 LearningRate 0.000320 Epoch: 19 Global Step: 406900 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:34:46,208-Speed 2496.17 samples/sec Loss 2.1758 LearningRate 0.000320 Epoch: 19 Global Step: 406910 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:34:54,408-Speed 2497.77 samples/sec Loss 2.2020 LearningRate 0.000320 Epoch: 19 Global Step: 406920 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:35:02,568-Speed 2510.35 samples/sec Loss 2.2027 LearningRate 0.000320 Epoch: 19 Global Step: 406930 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:35:10,769-Speed 2497.48 samples/sec Loss 2.1560 LearningRate 0.000320 Epoch: 19 Global Step: 406940 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:35:18,971-Speed 2497.62 samples/sec Loss 2.2136 LearningRate 0.000320 Epoch: 19 Global Step: 406950 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:35:27,176-Speed 2496.39 samples/sec Loss 2.1853 LearningRate 0.000320 Epoch: 19 Global Step: 406960 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:35:35,377-Speed 2497.70 samples/sec Loss 2.1692 LearningRate 0.000320 Epoch: 19 Global Step: 406970 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:35:43,584-Speed 2495.74 samples/sec Loss 2.1886 LearningRate 0.000320 Epoch: 19 Global Step: 406980 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:35:51,731-Speed 2514.29 samples/sec Loss 2.1656 LearningRate 0.000320 Epoch: 19 Global Step: 406990 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:35:59,934-Speed 2497.18 samples/sec Loss 2.1516 LearningRate 0.000320 Epoch: 19 Global Step: 407000 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:36:08,140-Speed 2495.95 samples/sec Loss 2.1800 
LearningRate 0.000320 Epoch: 19 Global Step: 407010 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:36:16,342-Speed 2497.23 samples/sec Loss 2.1877 LearningRate 0.000320 Epoch: 19 Global Step: 407020 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:36:24,552-Speed 2494.92 samples/sec Loss 2.1615 LearningRate 0.000320 Epoch: 19 Global Step: 407030 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:36:32,753-Speed 2497.78 samples/sec Loss 2.1843 LearningRate 0.000320 Epoch: 19 Global Step: 407040 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:36:40,901-Speed 2514.03 samples/sec Loss 2.2077 LearningRate 0.000320 Epoch: 19 Global Step: 407050 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:36:49,110-Speed 2495.03 samples/sec Loss 2.1417 LearningRate 0.000320 Epoch: 19 Global Step: 407060 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:36:57,313-Speed 2497.33 samples/sec Loss 2.1505 LearningRate 0.000320 Epoch: 19 Global Step: 407070 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:37:05,519-Speed 2496.13 samples/sec Loss 2.1809 LearningRate 0.000320 Epoch: 19 Global Step: 407080 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:37:13,718-Speed 2498.17 samples/sec Loss 2.1989 LearningRate 0.000320 Epoch: 19 Global Step: 407090 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:37:21,918-Speed 2497.84 samples/sec Loss 2.1336 LearningRate 0.000320 Epoch: 19 Global Step: 407100 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:37:30,067-Speed 2513.87 samples/sec Loss 2.1763 LearningRate 0.000320 Epoch: 19 Global Step: 407110 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:37:38,265-Speed 2498.45 samples/sec Loss 2.1361 LearningRate 0.000320 Epoch: 19 Global Step: 407120 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:37:46,476-Speed 2494.66 samples/sec Loss 2.1511 
LearningRate 0.000320 Epoch: 19 Global Step: 407130 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:37:54,703-Speed 2490.02 samples/sec Loss 2.1363 LearningRate 0.000320 Epoch: 19 Global Step: 407140 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:38:02,908-Speed 2496.52 samples/sec Loss 2.1010 LearningRate 0.000320 Epoch: 19 Global Step: 407150 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:38:11,129-Speed 2491.94 samples/sec Loss 2.1298 LearningRate 0.000320 Epoch: 19 Global Step: 407160 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:38:19,291-Speed 2509.23 samples/sec Loss 2.1399 LearningRate 0.000320 Epoch: 19 Global Step: 407170 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:38:27,495-Speed 2497.01 samples/sec Loss 2.1505 LearningRate 0.000320 Epoch: 19 Global Step: 407180 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:38:35,697-Speed 2497.48 samples/sec Loss 2.1698 LearningRate 0.000320 Epoch: 19 Global Step: 407190 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:38:43,899-Speed 2497.26 samples/sec Loss 2.2065 LearningRate 0.000320 Epoch: 19 Global Step: 407200 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:38:52,102-Speed 2497.11 samples/sec Loss 2.1769 LearningRate 0.000320 Epoch: 19 Global Step: 407210 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:39:00,306-Speed 2497.00 samples/sec Loss 2.1473 LearningRate 0.000320 Epoch: 19 Global Step: 407220 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:39:08,454-Speed 2513.49 samples/sec Loss 2.1595 LearningRate 0.000320 Epoch: 19 Global Step: 407230 Fp16 Grad Scale: 16384 Required: 97 hours Training: 2022-07-09 11:39:16,656-Speed 2497.35 samples/sec Loss 2.1425 LearningRate 0.000320 Epoch: 19 Global Step: 407240 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:39:24,859-Speed 2497.26 samples/sec Loss 2.1963 
LearningRate 0.000320 Epoch: 19 Global Step: 407250 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:39:33,066-Speed 2495.72 samples/sec Loss 2.1659 LearningRate 0.000320 Epoch: 19 Global Step: 407260 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:39:41,264-Speed 2498.44 samples/sec Loss 2.1565 LearningRate 0.000320 Epoch: 19 Global Step: 407270 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:39:49,468-Speed 2496.67 samples/sec Loss 2.1878 LearningRate 0.000320 Epoch: 19 Global Step: 407280 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:39:57,620-Speed 2512.85 samples/sec Loss 2.1318 LearningRate 0.000320 Epoch: 19 Global Step: 407290 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:40:05,834-Speed 2493.70 samples/sec Loss 2.1705 LearningRate 0.000320 Epoch: 19 Global Step: 407300 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:40:14,035-Speed 2497.66 samples/sec Loss 2.1775 LearningRate 0.000320 Epoch: 19 Global Step: 407310 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:40:22,235-Speed 2497.89 samples/sec Loss 2.1469 LearningRate 0.000320 Epoch: 19 Global Step: 407320 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:40:30,435-Speed 2498.01 samples/sec Loss 2.1861 LearningRate 0.000320 Epoch: 19 Global Step: 407330 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:40:38,635-Speed 2497.84 samples/sec Loss 2.1736 LearningRate 0.000320 Epoch: 19 Global Step: 407340 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:40:46,786-Speed 2512.99 samples/sec Loss 2.1474 LearningRate 0.000320 Epoch: 19 Global Step: 407350 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:40:54,987-Speed 2497.55 samples/sec Loss 2.1481 LearningRate 0.000320 Epoch: 19 Global Step: 407360 Fp16 Grad Scale: 32768 Required: 97 hours Training: 2022-07-09 11:41:03,187-Speed 2497.95 samples/sec Loss 2.1930 
LearningRate 0.000320 Epoch: 19 Global Step: 407370 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:41:11,393-Speed 2496.03 samples/sec Loss 2.1975 LearningRate 0.000320 Epoch: 19 Global Step: 407380 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:41:19,595-Speed 2497.58 samples/sec Loss 2.1670 LearningRate 0.000320 Epoch: 19 Global Step: 407390 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:41:27,797-Speed 2497.48 samples/sec Loss 2.1825 LearningRate 0.000320 Epoch: 19 Global Step: 407400 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:41:35,949-Speed 2512.95 samples/sec Loss 2.2155 LearningRate 0.000320 Epoch: 19 Global Step: 407410 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:41:44,155-Speed 2495.86 samples/sec Loss 2.1827 LearningRate 0.000320 Epoch: 19 Global Step: 407420 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:41:52,375-Speed 2491.83 samples/sec Loss 2.1996 LearningRate 0.000320 Epoch: 19 Global Step: 407430 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:42:00,582-Speed 2496.02 samples/sec Loss 2.1645 LearningRate 0.000320 Epoch: 19 Global Step: 407440 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:42:08,795-Speed 2494.14 samples/sec Loss 2.1180 LearningRate 0.000320 Epoch: 19 Global Step: 407450 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:42:17,003-Speed 2495.60 samples/sec Loss 2.1280 LearningRate 0.000320 Epoch: 19 Global Step: 407460 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:42:25,154-Speed 2512.87 samples/sec Loss 2.2238 LearningRate 0.000320 Epoch: 19 Global Step: 407470 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:42:33,354-Speed 2497.88 samples/sec Loss 2.1368 LearningRate 0.000320 Epoch: 19 Global Step: 407480 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:42:41,556-Speed 2497.46 samples/sec Loss 2.2309 LearningRate 0.000320 Epoch: 19 Global Step: 407490 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:42:49,760-Speed 2496.84 samples/sec Loss 2.1347 LearningRate 0.000320 Epoch: 19 Global Step: 407500 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:42:57,964-Speed 2496.67 samples/sec Loss 2.1855 LearningRate 0.000320 Epoch: 19 Global Step: 407510 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:43:06,177-Speed 2493.77 samples/sec Loss 2.1285 LearningRate 0.000320 Epoch: 19 Global Step: 407520 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:43:14,324-Speed 2514.23 samples/sec Loss 2.2124 LearningRate 0.000320 Epoch: 19 Global Step: 407530 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:43:22,527-Speed 2497.21 samples/sec Loss 2.1556 LearningRate 0.000320 Epoch: 19 Global Step: 407540 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:43:30,726-Speed 2497.99 samples/sec Loss 2.2235 LearningRate 0.000319 Epoch: 19 Global Step: 407550 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:43:38,928-Speed 2497.33 samples/sec Loss 2.2231 LearningRate 0.000319 Epoch: 19 Global Step: 407560 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:43:47,135-Speed 2496.25 samples/sec Loss 2.1834 LearningRate 0.000319 Epoch: 19 Global Step: 407570 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:43:55,337-Speed 2497.29 samples/sec Loss 2.1614 LearningRate 0.000319 Epoch: 19 Global Step: 407580 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:44:03,492-Speed 2511.77 samples/sec Loss 2.1421 LearningRate 0.000319 Epoch: 19 Global Step: 407590 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:44:11,707-Speed 2493.56 samples/sec Loss 2.1967 LearningRate 0.000319 Epoch: 19 Global Step: 407600 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:44:19,913-Speed 2496.34 samples/sec Loss 2.1932 LearningRate 0.000319 Epoch: 19 Global Step: 407610 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:44:28,119-Speed 2495.99 samples/sec Loss 2.1751 LearningRate 0.000319 Epoch: 19 Global Step: 407620 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:44:36,322-Speed 2497.04 samples/sec Loss 2.2221 LearningRate 0.000319 Epoch: 19 Global Step: 407630 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:44:44,530-Speed 2495.46 samples/sec Loss 2.2049 LearningRate 0.000319 Epoch: 19 Global Step: 407640 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:44:52,689-Speed 2510.47 samples/sec Loss 2.1639 LearningRate 0.000319 Epoch: 19 Global Step: 407650 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:45:00,901-Speed 2494.58 samples/sec Loss 2.1882 LearningRate 0.000319 Epoch: 19 Global Step: 407660 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:45:09,109-Speed 2495.47 samples/sec Loss 2.1544 LearningRate 0.000319 Epoch: 19 Global Step: 407670 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:45:17,313-Speed 2496.94 samples/sec Loss 2.2190 LearningRate 0.000319 Epoch: 19 Global Step: 407680 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:45:25,520-Speed 2495.71 samples/sec Loss 2.2064 LearningRate 0.000319 Epoch: 19 Global Step: 407690 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:45:33,723-Speed 2496.91 samples/sec Loss 2.1813 LearningRate 0.000319 Epoch: 19 Global Step: 407700 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:45:41,874-Speed 2513.32 samples/sec Loss 2.2439 LearningRate 0.000319 Epoch: 19 Global Step: 407710 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:45:50,077-Speed 2497.16 samples/sec Loss 2.1779 LearningRate 0.000319 Epoch: 19 Global Step: 407720 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:45:58,285-Speed 2495.58 samples/sec Loss 2.1467 LearningRate 0.000319 Epoch: 19 Global Step: 407730 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:46:06,498-Speed 2493.95 samples/sec Loss 2.2113 LearningRate 0.000319 Epoch: 19 Global Step: 407740 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:46:14,697-Speed 2498.35 samples/sec Loss 2.1801 LearningRate 0.000319 Epoch: 19 Global Step: 407750 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:46:22,901-Speed 2496.67 samples/sec Loss 2.1975 LearningRate 0.000319 Epoch: 19 Global Step: 407760 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:46:31,051-Speed 2513.37 samples/sec Loss 2.2159 LearningRate 0.000319 Epoch: 19 Global Step: 407770 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:46:39,256-Speed 2496.68 samples/sec Loss 2.2053 LearningRate 0.000319 Epoch: 19 Global Step: 407780 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:46:47,457-Speed 2497.47 samples/sec Loss 2.1942 LearningRate 0.000319 Epoch: 19 Global Step: 407790 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:46:55,654-Speed 2498.93 samples/sec Loss 2.1909 LearningRate 0.000319 Epoch: 19 Global Step: 407800 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:47:03,852-Speed 2498.97 samples/sec Loss 2.1632 LearningRate 0.000319 Epoch: 19 Global Step: 407810 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:47:12,052-Speed 2498.08 samples/sec Loss 2.1738 LearningRate 0.000319 Epoch: 19 Global Step: 407820 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:47:20,201-Speed 2513.58 samples/sec Loss 2.1963 LearningRate 0.000319 Epoch: 19 Global Step: 407830 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:47:28,399-Speed 2498.96 samples/sec Loss 2.2005 LearningRate 0.000319 Epoch: 19 Global Step: 407840 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:47:36,597-Speed 2498.56 samples/sec Loss 2.1509 LearningRate 0.000319 Epoch: 19 Global Step: 407850 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:47:44,801-Speed 2496.68 samples/sec Loss 2.1642 LearningRate 0.000319 Epoch: 19 Global Step: 407860 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:47:53,015-Speed 2493.78 samples/sec Loss 2.1680 LearningRate 0.000319 Epoch: 19 Global Step: 407870 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:48:01,221-Speed 2496.12 samples/sec Loss 2.1294 LearningRate 0.000319 Epoch: 19 Global Step: 407880 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:48:09,369-Speed 2513.88 samples/sec Loss 2.1735 LearningRate 0.000319 Epoch: 19 Global Step: 407890 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:48:17,571-Speed 2497.25 samples/sec Loss 2.2029 LearningRate 0.000319 Epoch: 19 Global Step: 407900 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:48:25,770-Speed 2498.27 samples/sec Loss 2.2421 LearningRate 0.000319 Epoch: 19 Global Step: 407910 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:48:33,970-Speed 2498.37 samples/sec Loss 2.1711 LearningRate 0.000319 Epoch: 19 Global Step: 407920 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:48:42,182-Speed 2494.08 samples/sec Loss 2.1871 LearningRate 0.000319 Epoch: 19 Global Step: 407930 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:48:50,386-Speed 2496.80 samples/sec Loss 2.1810 LearningRate 0.000319 Epoch: 19 Global Step: 407940 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:48:58,544-Speed 2510.85 samples/sec Loss 2.1790 LearningRate 0.000319 Epoch: 19 Global Step: 407950 Fp16 Grad Scale: 32768 Required: 97 hours
Training: 2022-07-09 11:49:06,750-Speed 2495.99 samples/sec Loss 2.1423 LearningRate 0.000319 Epoch: 19 Global Step: 407960 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:49:14,952-Speed 2497.47 samples/sec Loss 2.2002 LearningRate 0.000319 Epoch: 19 Global Step: 407970 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:49:23,155-Speed 2497.11 samples/sec Loss 2.1927 LearningRate 0.000319 Epoch: 19 Global Step: 407980 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:49:31,356-Speed 2497.56 samples/sec Loss 2.1541 LearningRate 0.000319 Epoch: 19 Global Step: 407990 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:49:39,558-Speed 2497.26 samples/sec Loss 2.1553 LearningRate 0.000319 Epoch: 19 Global Step: 408000 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:49:47,706-Speed 2513.68 samples/sec Loss 2.2227 LearningRate 0.000319 Epoch: 19 Global Step: 408010 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:49:55,919-Speed 2494.20 samples/sec Loss 2.2137 LearningRate 0.000319 Epoch: 19 Global Step: 408020 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:50:04,123-Speed 2496.62 samples/sec Loss 2.2463 LearningRate 0.000319 Epoch: 19 Global Step: 408030 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:50:12,329-Speed 2496.31 samples/sec Loss 2.1954 LearningRate 0.000319 Epoch: 19 Global Step: 408040 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:50:20,535-Speed 2495.99 samples/sec Loss 2.1800 LearningRate 0.000319 Epoch: 19 Global Step: 408050 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:50:28,737-Speed 2497.48 samples/sec Loss 2.1079 LearningRate 0.000319 Epoch: 19 Global Step: 408060 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:50:36,885-Speed 2513.83 samples/sec Loss 2.1746 LearningRate 0.000319 Epoch: 19 Global Step: 408070 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:50:45,085-Speed 2498.09 samples/sec Loss 2.1930 LearningRate 0.000319 Epoch: 19 Global Step: 408080 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:50:53,290-Speed 2496.32 samples/sec Loss 2.1846 LearningRate 0.000319 Epoch: 19 Global Step: 408090 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:51:01,493-Speed 2497.22 samples/sec Loss 2.2041 LearningRate 0.000319 Epoch: 19 Global Step: 408100 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:51:09,699-Speed 2496.07 samples/sec Loss 2.1418 LearningRate 0.000319 Epoch: 19 Global Step: 408110 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:51:17,901-Speed 2497.29 samples/sec Loss 2.1634 LearningRate 0.000319 Epoch: 19 Global Step: 408120 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:51:26,049-Speed 2513.84 samples/sec Loss 2.1726 LearningRate 0.000319 Epoch: 19 Global Step: 408130 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:51:34,249-Speed 2498.08 samples/sec Loss 2.1855 LearningRate 0.000319 Epoch: 19 Global Step: 408140 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:51:42,449-Speed 2497.85 samples/sec Loss 2.1852 LearningRate 0.000319 Epoch: 19 Global Step: 408150 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:51:50,652-Speed 2497.08 samples/sec Loss 2.1391 LearningRate 0.000319 Epoch: 19 Global Step: 408160 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:51:58,850-Speed 2498.46 samples/sec Loss 2.1611 LearningRate 0.000319 Epoch: 19 Global Step: 408170 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:52:07,066-Speed 2493.33 samples/sec Loss 2.1645 LearningRate 0.000319 Epoch: 19 Global Step: 408180 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:52:15,216-Speed 2513.16 samples/sec Loss 2.2593 LearningRate 0.000319 Epoch: 19 Global Step: 408190 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:52:23,419-Speed 2497.00 samples/sec Loss 2.2030 LearningRate 0.000319 Epoch: 19 Global Step: 408200 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:52:31,622-Speed 2497.17 samples/sec Loss 2.1859 LearningRate 0.000318 Epoch: 19 Global Step: 408210 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:52:39,831-Speed 2495.49 samples/sec Loss 2.1866 LearningRate 0.000318 Epoch: 19 Global Step: 408220 Fp16 Grad Scale: 32768 Required: 96 hours
Training: 2022-07-09 11:52:47,988-Speed 2511.11 samples/sec Loss 2.1924 LearningRate 0.000318 Epoch: 19 Global Step: 408230 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 11:52:56,189-Speed 2497.52 samples/sec Loss 2.1554 LearningRate 0.000318 Epoch: 19 Global Step: 408240 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 11:53:04,337-Speed 2513.96 samples/sec Loss 2.1691 LearningRate 0.000318 Epoch: 19 Global Step: 408250 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 11:53:12,537-Speed 2498.10 samples/sec Loss 2.2117 LearningRate 0.000318 Epoch: 19 Global Step: 408260 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 11:53:20,739-Speed 2497.34 samples/sec Loss 2.1445 LearningRate 0.000318 Epoch: 19 Global Step: 408270 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 11:53:28,940-Speed 2497.72 samples/sec Loss 2.1461 LearningRate 0.000318 Epoch: 19 Global Step: 408280 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 11:53:37,140-Speed 2497.90 samples/sec Loss 2.1341 LearningRate 0.000318 Epoch: 19 Global Step: 408290 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 11:53:45,344-Speed 2496.79 samples/sec Loss 2.2139 LearningRate 0.000318 Epoch: 19 Global Step: 408300 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 11:53:53,491-Speed 2514.17 samples/sec Loss 2.1406 LearningRate 0.000318 Epoch: 19 Global Step: 408310 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 11:54:01,646-Speed 2511.86 samples/sec Loss 2.1276 LearningRate 0.000318 Epoch: 19 Global Step: 408320 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:54:09,845-Speed 2498.22 samples/sec Loss 2.2075 LearningRate 0.000318 Epoch: 19 Global Step: 408330 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:54:18,048-Speed 2497.23 samples/sec Loss 2.1916 LearningRate 0.000318 Epoch: 19 Global Step: 408340 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:54:26,255-Speed 2495.83 samples/sec Loss 2.1879 LearningRate 0.000318 Epoch: 19 Global Step: 408350 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:54:34,460-Speed 2496.60 samples/sec Loss 2.1639 LearningRate 0.000318 Epoch: 19 Global Step: 408360 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:54:42,607-Speed 2514.11 samples/sec Loss 2.1851 LearningRate 0.000318 Epoch: 19 Global Step: 408370 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:54:50,810-Speed 2497.21 samples/sec Loss 2.1930 LearningRate 0.000318 Epoch: 19 Global Step: 408380 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:54:59,012-Speed 2497.32 samples/sec Loss 2.1768 LearningRate 0.000318 Epoch: 19 Global Step: 408390 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:55:07,214-Speed 2497.37 samples/sec Loss 2.2367 LearningRate 0.000318 Epoch: 19 Global Step: 408400 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:55:15,419-Speed 2496.49 samples/sec Loss 2.1778 LearningRate 0.000318 Epoch: 19 Global Step: 408410 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:55:23,619-Speed 2497.88 samples/sec Loss 2.1618 LearningRate 0.000318 Epoch: 19 Global Step: 408420 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:55:31,770-Speed 2513.09 samples/sec Loss 2.1774 LearningRate 0.000318 Epoch: 19 Global Step: 408430 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:55:39,968-Speed 2498.49 samples/sec Loss 2.1557 LearningRate 0.000318 Epoch: 19 Global Step: 408440 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:55:48,171-Speed 2497.17 samples/sec Loss 2.1731 LearningRate 0.000318 Epoch: 19 Global Step: 408450 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:55:56,400-Speed 2488.94 samples/sec Loss 2.1907 LearningRate 0.000318 Epoch: 19 Global Step: 408460 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:56:04,602-Speed 2497.52 samples/sec Loss 2.1730 LearningRate 0.000318 Epoch: 19 Global Step: 408470 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:56:12,802-Speed 2498.08 samples/sec Loss 2.1639 LearningRate 0.000318 Epoch: 19 Global Step: 408480 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:56:20,951-Speed 2513.58 samples/sec Loss 2.2447 LearningRate 0.000318 Epoch: 19 Global Step: 408490 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:56:29,156-Speed 2496.39 samples/sec Loss 2.2091 LearningRate 0.000318 Epoch: 19 Global Step: 408500 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:56:37,359-Speed 2497.14 samples/sec Loss 2.1722 LearningRate 0.000318 Epoch: 19 Global Step: 408510 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:56:45,565-Speed 2495.99 samples/sec Loss 2.1687 LearningRate 0.000318 Epoch: 19 Global Step: 408520 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:56:53,765-Speed 2498.11 samples/sec Loss 2.1902 LearningRate 0.000318 Epoch: 19 Global Step: 408530 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:57:01,970-Speed 2496.51 samples/sec Loss 2.1842 LearningRate 0.000318 Epoch: 19 Global Step: 408540 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:57:10,118-Speed 2513.94 samples/sec Loss 2.1383 LearningRate 0.000318 Epoch: 19 Global Step: 408550 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:57:18,321-Speed 2497.23 samples/sec Loss 2.1906 LearningRate 0.000318 Epoch: 19 Global Step: 408560 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:57:26,539-Speed 2492.42 samples/sec Loss 2.1831 LearningRate 0.000318 Epoch: 19 Global Step: 408570 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:57:34,740-Speed 2497.48 samples/sec Loss 2.1835 LearningRate 0.000318 Epoch: 19 Global Step: 408580 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:57:42,941-Speed 2497.89 samples/sec Loss 2.1140 LearningRate 0.000318 Epoch: 19 Global Step: 408590 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:57:51,153-Speed 2494.28 samples/sec Loss 2.1179 LearningRate 0.000318 Epoch: 19 Global Step: 408600 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:57:59,300-Speed 2514.41 samples/sec Loss 2.1930 LearningRate 0.000318 Epoch: 19 Global Step: 408610 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:58:07,509-Speed 2495.08 samples/sec Loss 2.1641 LearningRate 0.000318 Epoch: 19 Global Step: 408620 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:58:15,722-Speed 2494.00 samples/sec Loss 2.1250 LearningRate 0.000318 Epoch: 19 Global Step: 408630 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:58:23,923-Speed 2497.90 samples/sec Loss 2.1411 LearningRate 0.000318 Epoch: 19 Global Step: 408640 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:58:32,127-Speed 2496.62 samples/sec Loss 2.1711 LearningRate 0.000318 Epoch: 19 Global Step: 408650 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:58:40,326-Speed 2498.30 samples/sec Loss 2.1327 LearningRate 0.000318 Epoch: 19 Global Step: 408660 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:58:48,475-Speed 2514.16 samples/sec Loss 2.1913 LearningRate 0.000318 Epoch: 19 Global Step: 408670 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:58:56,672-Speed 2498.95 samples/sec Loss 2.1671 LearningRate 0.000318 Epoch: 19 Global Step: 408680 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:59:04,873-Speed 2497.62 samples/sec Loss 2.1489 LearningRate 0.000318 Epoch: 19 Global Step: 408690 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:59:13,075-Speed 2497.48 samples/sec Loss 2.1865 LearningRate 0.000318 Epoch: 19 Global Step: 408700 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:59:21,280-Speed 2496.31 samples/sec Loss 2.1458 LearningRate 0.000318 Epoch: 19 Global Step: 408710 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:59:29,485-Speed 2496.41 samples/sec Loss 2.1503 LearningRate 0.000318 Epoch: 19 Global Step: 408720 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:59:37,634-Speed 2513.66 samples/sec Loss 2.1222 LearningRate 0.000318 Epoch: 19 Global Step: 408730 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:59:45,838-Speed 2496.75 samples/sec Loss 2.1397 LearningRate 0.000318 Epoch: 19 Global Step: 408740 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 11:59:54,037-Speed 2498.31 samples/sec Loss 2.1470 LearningRate 0.000318 Epoch: 19 Global Step: 408750 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:00:02,239-Speed 2497.46 samples/sec Loss 2.1276 LearningRate 0.000318 Epoch: 19 Global Step: 408760 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:00:10,437-Speed 2498.21 samples/sec Loss 2.1326 LearningRate 0.000318 Epoch: 19 Global Step: 408770 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:00:18,640-Speed 2497.18 samples/sec Loss 2.1756 LearningRate 0.000318 Epoch: 19 Global Step: 408780 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:00:26,785-Speed 2514.80 samples/sec Loss 2.0977 LearningRate 0.000318 Epoch: 19 Global Step: 408790 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:00:34,991-Speed 2496.12 samples/sec Loss 2.1508 LearningRate 0.000318 Epoch: 19 Global Step: 408800 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:00:43,210-Speed 2492.34 samples/sec Loss 2.1393 LearningRate 0.000318 Epoch: 19 Global Step: 408810 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:00:51,424-Speed 2493.50 samples/sec Loss 2.1538 LearningRate 0.000318 Epoch: 19 Global Step: 408820 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:00:59,624-Speed 2498.21 samples/sec Loss 2.1360 LearningRate 0.000318 Epoch: 19 Global Step: 408830 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:01:07,825-Speed 2497.52 samples/sec Loss 2.1373 LearningRate 0.000318 Epoch: 19 Global Step: 408840 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:01:15,972-Speed 2514.17 samples/sec Loss 2.1691 LearningRate 0.000318 Epoch: 19 Global Step: 408850 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:01:24,173-Speed 2497.64 samples/sec Loss 2.1711 LearningRate 0.000318 Epoch: 19 Global Step: 408860 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:01:32,372-Speed 2498.36 samples/sec Loss 2.1435 LearningRate 0.000318 Epoch: 19 Global Step: 408870 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:01:40,574-Speed 2497.50 samples/sec Loss 2.1655 LearningRate 0.000317 Epoch: 19 Global Step: 408880 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:01:48,780-Speed 2495.98 samples/sec Loss 2.1730 LearningRate 0.000317 Epoch: 19 Global Step: 408890 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:01:56,989-Speed 2495.31 samples/sec Loss 2.1370 LearningRate 0.000317 Epoch: 19 Global Step: 408900 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:02:05,138-Speed 2513.77 samples/sec Loss 2.1532 LearningRate 0.000317 Epoch: 19 Global Step: 408910 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:02:13,341-Speed 2496.94 samples/sec Loss 2.1744 LearningRate 0.000317 Epoch: 19 Global Step: 408920 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:02:21,561-Speed 2491.74 samples/sec Loss 2.1671 LearningRate 0.000317 Epoch: 19 Global Step: 408930 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:02:29,778-Speed 2493.19 samples/sec Loss 2.1532 LearningRate 0.000317 Epoch: 19 Global Step: 408940 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:02:37,982-Speed 2496.63 samples/sec Loss 2.1710 LearningRate 0.000317 Epoch: 19 Global Step: 408950 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:02:46,183-Speed 2497.71 samples/sec Loss 2.1866 LearningRate 0.000317 Epoch: 19 Global Step: 408960 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:02:54,332-Speed 2513.68 samples/sec Loss 2.1671 LearningRate 0.000317 Epoch: 19 Global Step: 408970 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:03:02,544-Speed 2494.32 samples/sec Loss 2.1501 LearningRate 0.000317 Epoch: 19 Global Step: 408980 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:03:10,743-Speed 2498.02 samples/sec Loss 2.1254 LearningRate 0.000317 Epoch: 19 Global Step: 408990 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:03:18,964-Speed 2491.72 samples/sec Loss 2.1374 LearningRate 0.000317 Epoch: 19 Global Step: 409000 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:03:27,164-Speed 2498.18 samples/sec Loss 2.1781 LearningRate 0.000317 Epoch: 19 Global Step: 409010 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:03:35,362-Speed 2498.55 samples/sec Loss 2.1756 LearningRate 0.000317 Epoch: 19 Global Step: 409020 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:03:43,509-Speed 2514.08 samples/sec Loss 2.1159 LearningRate 0.000317 Epoch: 19 Global Step: 409030 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:03:51,707-Speed 2498.54 samples/sec Loss 2.1481 LearningRate 0.000317 Epoch: 19 Global Step: 409040 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:03:59,918-Speed 2494.82 samples/sec Loss 2.1472 LearningRate 0.000317 Epoch: 19 Global Step: 409050 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:04:08,129-Speed 2494.72 samples/sec Loss 2.1628 LearningRate 0.000317 Epoch: 19 Global Step: 409060 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:04:16,324-Speed 2499.27 samples/sec Loss 2.1339 LearningRate 0.000317 Epoch: 19 Global Step: 409070 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:04:24,526-Speed 2497.43 samples/sec Loss 2.1189 LearningRate 0.000317 Epoch: 19 Global Step: 409080 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:04:32,672-Speed 2514.37 samples/sec Loss 2.1733 LearningRate 0.000317 Epoch: 19 Global Step: 409090 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:04:40,876-Speed 2496.89 samples/sec Loss 2.1552 LearningRate 0.000317 Epoch: 19 Global Step: 409100 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:04:49,075-Speed 2498.18 samples/sec Loss 2.1358 LearningRate 0.000317 Epoch: 19 Global Step: 409110 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:04:57,273-Speed 2498.48 samples/sec Loss 2.1569 LearningRate 0.000317 Epoch: 19 Global Step: 409120 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:05:05,483-Speed 2495.23 samples/sec Loss 2.1957 LearningRate 0.000317 Epoch: 19 Global Step: 409130 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:05:13,681-Speed 2498.60 samples/sec Loss 2.1526 LearningRate 0.000317 Epoch: 19 Global Step: 409140 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:05:21,829-Speed 2513.80 samples/sec Loss 2.1789 LearningRate 0.000317 Epoch: 19 Global Step: 409150 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:05:30,029-Speed 2498.02 samples/sec Loss 2.1759 LearningRate 0.000317 Epoch: 19 Global Step: 409160 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:05:38,234-Speed 2496.49 samples/sec Loss 2.1747 LearningRate 0.000317 Epoch: 19 Global Step: 409170 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:05:46,434-Speed 2497.88 samples/sec Loss 2.1825 LearningRate 0.000317 Epoch: 19 Global Step: 409180 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:05:54,637-Speed 2497.25 samples/sec Loss 2.1733 LearningRate 0.000317 Epoch: 19 Global Step: 409190 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:06:02,838-Speed 2497.37 samples/sec Loss 2.1973 LearningRate 0.000317 Epoch: 19 Global Step: 409200 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:06:10,985-Speed 2514.36 samples/sec Loss 2.1607 LearningRate 0.000317 Epoch: 19 Global Step: 409210 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:06:19,184-Speed 2498.13 samples/sec Loss 2.1782 LearningRate 0.000317 Epoch: 19 Global Step: 409220 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:06:27,384-Speed 2497.86 samples/sec Loss 2.1926 LearningRate 0.000317 Epoch: 19 Global Step: 409230 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:06:35,585-Speed 2497.97 samples/sec Loss 2.1837 LearningRate 0.000317 Epoch: 19 Global Step: 409240 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:06:43,782-Speed 2498.88 samples/sec Loss 2.2194 LearningRate 0.000317 Epoch: 19 Global Step: 409250 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:06:51,985-Speed 2496.93 samples/sec Loss 2.1833 LearningRate 0.000317 Epoch: 19 Global Step: 409260 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:07:00,145-Speed 2510.09 samples/sec Loss 2.1615 LearningRate 0.000317 Epoch: 19 Global Step: 409270 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:07:08,345-Speed 2497.90 samples/sec Loss 2.1725 LearningRate 0.000317 Epoch: 19 Global Step: 409280 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:07:16,545-Speed 2497.95 samples/sec Loss 2.2133 LearningRate 0.000317 Epoch: 19 Global Step: 409290 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:07:24,756-Speed 2495.08 samples/sec Loss 2.1765 LearningRate 0.000317 Epoch: 19 Global Step: 409300 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:07:32,959-Speed 2497.21 samples/sec Loss 2.1757 LearningRate 0.000317 Epoch: 19 Global Step: 409310 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:07:41,157-Speed 2498.57 samples/sec Loss 2.2045 LearningRate 0.000317 Epoch: 19 Global Step: 409320 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:07:49,306-Speed 2513.76 samples/sec Loss 2.1776 LearningRate 0.000317 Epoch: 19 Global Step: 409330 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:07:57,504-Speed 2498.42 samples/sec Loss 2.1600 LearningRate 0.000317 Epoch: 19 Global Step: 409340 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:08:05,718-Speed 2493.74 samples/sec Loss 2.1872 LearningRate 0.000317 Epoch: 19 Global Step: 409350 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:08:13,919-Speed 2497.60 samples/sec Loss 2.1897 LearningRate 0.000317 Epoch: 19 Global Step: 409360 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:08:22,119-Speed 2498.19 samples/sec Loss 2.2060 LearningRate 0.000317 Epoch: 19 Global Step: 409370 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:08:30,329-Speed 2494.94 samples/sec Loss 2.2235 LearningRate 0.000317 Epoch: 19 Global Step: 409380 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:08:38,475-Speed 2514.43 samples/sec Loss 2.1745 LearningRate 0.000317 Epoch: 19 Global Step: 409390 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:08:46,692-Speed 2492.85 samples/sec Loss 2.1405 LearningRate 0.000317 Epoch: 19 Global Step: 409400 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:08:54,892-Speed 2498.13 samples/sec Loss 2.1773 LearningRate 0.000317 Epoch: 19 Global Step: 409410 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:09:03,092-Speed 2497.81 samples/sec Loss 2.1862 LearningRate 0.000317 Epoch: 19 Global Step: 409420 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:09:11,293-Speed 2498.00 samples/sec Loss 2.1854 LearningRate 0.000317 Epoch: 19 Global Step: 409430 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:09:19,493-Speed 2497.80 samples/sec Loss 2.1963 LearningRate 0.000317 Epoch: 19 Global Step: 409440 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:09:27,641-Speed 2514.03 samples/sec Loss 2.1701 LearningRate 0.000317 Epoch: 19 Global Step: 409450 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:09:35,840-Speed 2498.34 samples/sec Loss 2.1571 LearningRate 0.000317 Epoch: 19 Global Step: 409460 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:09:44,042-Speed 2497.22 samples/sec Loss 2.1278 LearningRate 0.000317 Epoch: 19 Global Step: 409470 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:09:52,243-Speed 2497.94 samples/sec Loss 2.2065 LearningRate 0.000317 Epoch: 19 Global Step: 409480 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:10:00,444-Speed 2497.74 samples/sec Loss 2.1894 LearningRate 0.000317 Epoch: 19 Global Step: 409490 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:10:08,643-Speed 2497.95 samples/sec Loss 2.1595 LearningRate 0.000317 Epoch: 19 Global Step: 409500 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:10:16,789-Speed 2514.57 samples/sec Loss 2.1683 LearningRate 0.000317 Epoch: 19 Global Step: 409510 Fp16 Grad Scale: 8192 Required: 96 hours
Training: 2022-07-09 12:10:24,989-Speed 2498.21 samples/sec Loss 2.1450 LearningRate 0.000317 Epoch: 19 Global Step: 409520 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:10:33,207-Speed 2492.51 samples/sec Loss 2.1689 LearningRate 0.000317 Epoch: 19 Global Step: 409530 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:10:41,406-Speed 2498.19 samples/sec Loss 2.1426 LearningRate 0.000316 Epoch: 19 Global Step: 409540 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:10:49,620-Speed 2493.94 samples/sec Loss 2.1510 LearningRate 0.000316 Epoch: 19 Global Step: 409550 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:10:57,822-Speed 2497.22 samples/sec Loss 2.1646 LearningRate 0.000316 Epoch: 19 Global Step: 409560 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:11:05,969-Speed 2514.17 samples/sec Loss 2.1791 LearningRate 0.000316 Epoch: 19 Global Step: 409570 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:11:14,187-Speed 2492.45 samples/sec Loss 2.2002 LearningRate 0.000316 Epoch: 19 Global Step: 409580 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:11:22,413-Speed 2490.54 samples/sec Loss 2.1491 LearningRate 0.000316 Epoch: 19 Global Step: 409590 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:11:30,621-Speed 2495.66 samples/sec Loss 2.1500 LearningRate 0.000316 Epoch: 19 Global Step: 409600 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:11:38,822-Speed 2497.54 samples/sec Loss 2.1584 LearningRate 0.000316 Epoch: 19 Global Step: 409610 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:11:47,026-Speed 2496.84 samples/sec Loss 2.1705 LearningRate 0.000316 Epoch: 19 Global Step: 409620 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:11:55,171-Speed 2514.62 samples/sec Loss 2.1377 LearningRate 0.000316 Epoch: 19 Global Step: 409630 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:12:03,370-Speed 2498.51 samples/sec Loss 2.1835 LearningRate 0.000316 Epoch: 19 Global Step: 409640 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:12:11,576-Speed 2496.12 samples/sec Loss 2.1683 LearningRate 0.000316 Epoch: 19 Global Step: 409650 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:12:19,781-Speed 2496.64 samples/sec Loss 2.1734 LearningRate 0.000316 Epoch: 19 Global Step: 409660 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:12:27,985-Speed 2496.71 samples/sec Loss 2.1816 LearningRate 0.000316 Epoch: 19 Global Step: 409670 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:12:36,190-Speed 2496.62 samples/sec Loss 2.1214 LearningRate 0.000316 Epoch: 19 Global Step: 409680 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:12:44,337-Speed 2514.24 samples/sec Loss 2.1538 LearningRate 0.000316 Epoch: 19 Global Step: 409690 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:12:52,572-Speed 2487.48 samples/sec Loss 2.1814 LearningRate 0.000316 Epoch: 19 Global Step: 409700 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:13:00,778-Speed 2496.13 samples/sec Loss 2.1809 LearningRate 0.000316 Epoch: 19 Global Step: 409710 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:13:08,984-Speed 2495.99 samples/sec Loss 2.2002 LearningRate 0.000316 Epoch: 19 Global Step: 409720 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:13:17,192-Speed 2495.69 samples/sec Loss 2.1395 LearningRate 0.000316 Epoch: 19 Global Step: 409730 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:13:25,395-Speed 2497.00 samples/sec Loss 2.2092 LearningRate 0.000316 Epoch: 19 Global Step: 409740 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:13:33,549-Speed 2512.18 samples/sec Loss 2.1328 LearningRate 0.000316 Epoch: 19 Global Step: 409750 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:13:41,750-Speed 2497.80 samples/sec Loss 2.1768 LearningRate 0.000316 Epoch: 19 Global Step: 409760 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:13:49,954-Speed 2496.67 samples/sec Loss 2.1199 LearningRate 0.000316 Epoch: 19 Global Step: 409770 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:13:58,162-Speed 2495.65 samples/sec Loss 2.1666 LearningRate 0.000316 Epoch: 19 Global Step: 409780 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:14:06,382-Speed 2491.95 samples/sec Loss 2.1991 LearningRate 0.000316 Epoch: 19 Global Step: 409790 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:14:14,579-Speed 2498.74 samples/sec Loss 2.1232 LearningRate 0.000316 Epoch: 19 Global Step: 409800 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:14:22,722-Speed 2515.18 samples/sec Loss 2.1480 LearningRate 0.000316 Epoch: 19 Global Step: 409810 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:14:30,921-Speed 2498.30 samples/sec Loss 2.1176 LearningRate 0.000316 Epoch: 19 Global Step: 409820 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:14:39,119-Speed 2498.66 samples/sec Loss 2.1429 LearningRate 0.000316 Epoch: 19 Global Step: 409830 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:14:47,322-Speed 2496.95 samples/sec Loss 2.1295 LearningRate 0.000316 Epoch: 19 Global Step: 409840 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:14:55,534-Speed 2494.43 samples/sec Loss 2.1143 LearningRate 0.000316 Epoch: 19 Global Step: 409850 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:15:03,733-Speed 2498.06 samples/sec Loss 2.2130 LearningRate 0.000316 Epoch: 19 Global Step: 409860 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:15:11,903-Speed 2507.43 samples/sec Loss 2.1946 LearningRate 0.000316 Epoch: 19 Global Step: 409870 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:15:20,102-Speed 2498.14 samples/sec Loss 2.1999 LearningRate 0.000316 Epoch: 19 Global Step: 409880 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:15:28,303-Speed 2497.63 samples/sec Loss 2.1191 LearningRate 0.000316 Epoch: 19 Global Step: 409890 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-07-09 12:15:36,506-Speed 2497.27 samples/sec Loss 2.1616
LearningRate 0.000316 Epoch: 19 Global Step: 409900 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:15:44,712-Speed 2496.09 samples/sec Loss 2.2314 LearningRate 0.000316 Epoch: 19 Global Step: 409910 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:15:52,911-Speed 2497.95 samples/sec Loss 2.1804 LearningRate 0.000316 Epoch: 19 Global Step: 409920 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:16:01,055-Speed 2515.37 samples/sec Loss 2.1982 LearningRate 0.000316 Epoch: 19 Global Step: 409930 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:16:09,255-Speed 2497.74 samples/sec Loss 2.1792 LearningRate 0.000316 Epoch: 19 Global Step: 409940 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:16:17,461-Speed 2496.30 samples/sec Loss 2.1478 LearningRate 0.000316 Epoch: 19 Global Step: 409950 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:16:25,660-Speed 2498.15 samples/sec Loss 2.2188 LearningRate 0.000316 Epoch: 19 Global Step: 409960 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:16:33,864-Speed 2496.73 samples/sec Loss 2.2018 LearningRate 0.000316 Epoch: 19 Global Step: 409970 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:16:42,064-Speed 2498.05 samples/sec Loss 2.2369 LearningRate 0.000316 Epoch: 19 Global Step: 409980 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:16:50,210-Speed 2514.17 samples/sec Loss 2.2259 LearningRate 0.000316 Epoch: 19 Global Step: 409990 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:16:58,411-Speed 2497.80 samples/sec Loss 2.1797 LearningRate 0.000316 Epoch: 19 Global Step: 410000 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:17:06,621-Speed 2494.82 samples/sec Loss 2.1486 LearningRate 0.000316 Epoch: 19 Global Step: 410010 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:17:14,822-Speed 2497.70 samples/sec Loss 2.1772 
LearningRate 0.000316 Epoch: 19 Global Step: 410020 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:17:23,026-Speed 2496.46 samples/sec Loss 2.1511 LearningRate 0.000316 Epoch: 19 Global Step: 410030 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:17:31,230-Speed 2496.97 samples/sec Loss 2.1872 LearningRate 0.000316 Epoch: 19 Global Step: 410040 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:17:39,386-Speed 2511.44 samples/sec Loss 2.1766 LearningRate 0.000316 Epoch: 19 Global Step: 410050 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:17:47,585-Speed 2498.15 samples/sec Loss 2.1724 LearningRate 0.000316 Epoch: 19 Global Step: 410060 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:17:55,785-Speed 2497.82 samples/sec Loss 2.1924 LearningRate 0.000316 Epoch: 19 Global Step: 410070 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:18:03,990-Speed 2496.57 samples/sec Loss 2.2185 LearningRate 0.000316 Epoch: 19 Global Step: 410080 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:18:12,206-Speed 2493.11 samples/sec Loss 2.2425 LearningRate 0.000316 Epoch: 19 Global Step: 410090 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:18:20,419-Speed 2493.88 samples/sec Loss 2.2528 LearningRate 0.000316 Epoch: 19 Global Step: 410100 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:18:28,572-Speed 2512.40 samples/sec Loss 2.2087 LearningRate 0.000316 Epoch: 19 Global Step: 410110 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:18:36,774-Speed 2497.41 samples/sec Loss 2.1869 LearningRate 0.000316 Epoch: 19 Global Step: 410120 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:18:44,978-Speed 2496.54 samples/sec Loss 2.1697 LearningRate 0.000316 Epoch: 19 Global Step: 410130 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:18:53,189-Speed 2494.74 samples/sec Loss 2.1797 
LearningRate 0.000316 Epoch: 19 Global Step: 410140 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:19:01,390-Speed 2497.93 samples/sec Loss 2.2355 LearningRate 0.000316 Epoch: 19 Global Step: 410150 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:19:09,594-Speed 2496.76 samples/sec Loss 2.2229 LearningRate 0.000316 Epoch: 19 Global Step: 410160 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:19:17,740-Speed 2514.54 samples/sec Loss 2.1957 LearningRate 0.000316 Epoch: 19 Global Step: 410170 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:19:25,938-Speed 2498.42 samples/sec Loss 2.1757 LearningRate 0.000316 Epoch: 19 Global Step: 410180 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:19:34,142-Speed 2496.63 samples/sec Loss 2.1561 LearningRate 0.000316 Epoch: 19 Global Step: 410190 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:19:42,344-Speed 2497.53 samples/sec Loss 2.1998 LearningRate 0.000315 Epoch: 19 Global Step: 410200 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:19:50,567-Speed 2490.94 samples/sec Loss 2.1964 LearningRate 0.000315 Epoch: 19 Global Step: 410210 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:19:58,775-Speed 2495.84 samples/sec Loss 2.1854 LearningRate 0.000315 Epoch: 19 Global Step: 410220 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:20:06,919-Speed 2515.15 samples/sec Loss 2.2186 LearningRate 0.000315 Epoch: 19 Global Step: 410230 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:20:15,118-Speed 2498.09 samples/sec Loss 2.1448 LearningRate 0.000315 Epoch: 19 Global Step: 410240 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:20:23,316-Speed 2498.51 samples/sec Loss 2.1451 LearningRate 0.000315 Epoch: 19 Global Step: 410250 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:20:31,519-Speed 2496.97 samples/sec Loss 2.1092 
LearningRate 0.000315 Epoch: 19 Global Step: 410260 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:20:39,722-Speed 2497.08 samples/sec Loss 2.1788 LearningRate 0.000315 Epoch: 19 Global Step: 410270 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:20:47,926-Speed 2497.06 samples/sec Loss 2.1619 LearningRate 0.000315 Epoch: 19 Global Step: 410280 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:20:56,077-Speed 2513.05 samples/sec Loss 2.0826 LearningRate 0.000315 Epoch: 19 Global Step: 410290 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:21:04,278-Speed 2497.77 samples/sec Loss 2.1103 LearningRate 0.000315 Epoch: 19 Global Step: 410300 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:21:12,476-Speed 2498.32 samples/sec Loss 2.1715 LearningRate 0.000315 Epoch: 19 Global Step: 410310 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:21:20,680-Speed 2496.85 samples/sec Loss 2.1505 LearningRate 0.000315 Epoch: 19 Global Step: 410320 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:21:28,879-Speed 2498.65 samples/sec Loss 2.1519 LearningRate 0.000315 Epoch: 19 Global Step: 410330 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:21:37,081-Speed 2497.02 samples/sec Loss 2.1533 LearningRate 0.000315 Epoch: 19 Global Step: 410340 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:21:45,228-Speed 2514.44 samples/sec Loss 2.1488 LearningRate 0.000315 Epoch: 19 Global Step: 410350 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:21:53,431-Speed 2496.93 samples/sec Loss 2.1069 LearningRate 0.000315 Epoch: 19 Global Step: 410360 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:22:01,644-Speed 2494.17 samples/sec Loss 2.1441 LearningRate 0.000315 Epoch: 19 Global Step: 410370 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:22:09,848-Speed 2496.73 samples/sec Loss 2.1622 
LearningRate 0.000315 Epoch: 19 Global Step: 410380 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:22:18,050-Speed 2497.68 samples/sec Loss 2.1582 LearningRate 0.000315 Epoch: 19 Global Step: 410390 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:22:26,251-Speed 2497.58 samples/sec Loss 2.2049 LearningRate 0.000315 Epoch: 19 Global Step: 410400 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:22:34,405-Speed 2512.02 samples/sec Loss 2.0882 LearningRate 0.000315 Epoch: 19 Global Step: 410410 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:22:42,621-Speed 2493.23 samples/sec Loss 2.1826 LearningRate 0.000315 Epoch: 19 Global Step: 410420 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:22:50,821-Speed 2497.98 samples/sec Loss 2.1249 LearningRate 0.000315 Epoch: 19 Global Step: 410430 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:22:59,019-Speed 2498.57 samples/sec Loss 2.1094 LearningRate 0.000315 Epoch: 19 Global Step: 410440 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:23:07,223-Speed 2496.64 samples/sec Loss 2.1419 LearningRate 0.000315 Epoch: 19 Global Step: 410450 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:23:15,423-Speed 2498.04 samples/sec Loss 2.1522 LearningRate 0.000315 Epoch: 19 Global Step: 410460 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:23:23,570-Speed 2514.45 samples/sec Loss 2.1388 LearningRate 0.000315 Epoch: 19 Global Step: 410470 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:23:31,771-Speed 2497.55 samples/sec Loss 2.1784 LearningRate 0.000315 Epoch: 19 Global Step: 410480 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:23:39,971-Speed 2498.07 samples/sec Loss 2.1759 LearningRate 0.000315 Epoch: 19 Global Step: 410490 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:23:48,176-Speed 2496.30 samples/sec Loss 2.1972 
LearningRate 0.000315 Epoch: 19 Global Step: 410500 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:23:56,381-Speed 2496.49 samples/sec Loss 2.1592 LearningRate 0.000315 Epoch: 19 Global Step: 410510 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:24:04,585-Speed 2496.56 samples/sec Loss 2.1468 LearningRate 0.000315 Epoch: 19 Global Step: 410520 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:24:12,737-Speed 2512.77 samples/sec Loss 2.1499 LearningRate 0.000315 Epoch: 19 Global Step: 410530 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:24:20,937-Speed 2498.04 samples/sec Loss 2.0888 LearningRate 0.000315 Epoch: 19 Global Step: 410540 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:24:29,138-Speed 2497.79 samples/sec Loss 2.1444 LearningRate 0.000315 Epoch: 19 Global Step: 410550 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:24:37,341-Speed 2497.22 samples/sec Loss 2.1310 LearningRate 0.000315 Epoch: 19 Global Step: 410560 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:24:45,542-Speed 2497.79 samples/sec Loss 2.1510 LearningRate 0.000315 Epoch: 19 Global Step: 410570 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:24:53,747-Speed 2496.33 samples/sec Loss 2.1464 LearningRate 0.000315 Epoch: 19 Global Step: 410580 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:25:01,895-Speed 2514.20 samples/sec Loss 2.1272 LearningRate 0.000315 Epoch: 19 Global Step: 410590 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:25:10,107-Speed 2494.17 samples/sec Loss 2.1376 LearningRate 0.000315 Epoch: 19 Global Step: 410600 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:25:18,308-Speed 2497.82 samples/sec Loss 2.1429 LearningRate 0.000315 Epoch: 19 Global Step: 410610 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:25:26,506-Speed 2498.63 samples/sec Loss 2.1428 
LearningRate 0.000315 Epoch: 19 Global Step: 410620 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:25:34,708-Speed 2497.42 samples/sec Loss 2.1287 LearningRate 0.000315 Epoch: 19 Global Step: 410630 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:25:42,864-Speed 2511.28 samples/sec Loss 2.1523 LearningRate 0.000315 Epoch: 19 Global Step: 410640 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:25:51,008-Speed 2514.85 samples/sec Loss 2.1524 LearningRate 0.000315 Epoch: 19 Global Step: 410650 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:25:59,207-Speed 2498.73 samples/sec Loss 2.1024 LearningRate 0.000315 Epoch: 19 Global Step: 410660 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:26:07,407-Speed 2497.96 samples/sec Loss 2.1282 LearningRate 0.000315 Epoch: 19 Global Step: 410670 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:26:15,605-Speed 2498.46 samples/sec Loss 2.1730 LearningRate 0.000315 Epoch: 19 Global Step: 410680 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:26:23,816-Speed 2494.73 samples/sec Loss 2.1421 LearningRate 0.000315 Epoch: 19 Global Step: 410690 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:26:32,027-Speed 2494.50 samples/sec Loss 2.1728 LearningRate 0.000315 Epoch: 19 Global Step: 410700 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:26:40,170-Speed 2515.45 samples/sec Loss 2.1690 LearningRate 0.000315 Epoch: 19 Global Step: 410710 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:26:48,370-Speed 2498.23 samples/sec Loss 2.1388 LearningRate 0.000315 Epoch: 19 Global Step: 410720 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:26:56,574-Speed 2496.67 samples/sec Loss 2.1066 LearningRate 0.000315 Epoch: 19 Global Step: 410730 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:27:04,794-Speed 2492.12 samples/sec Loss 2.0819 LearningRate 
0.000315 Epoch: 19 Global Step: 410740 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:27:12,989-Speed 2499.38 samples/sec Loss 2.1340 LearningRate 0.000315 Epoch: 19 Global Step: 410750 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:27:21,204-Speed 2493.64 samples/sec Loss 2.1744 LearningRate 0.000315 Epoch: 19 Global Step: 410760 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:27:29,356-Speed 2512.25 samples/sec Loss 2.1588 LearningRate 0.000315 Epoch: 19 Global Step: 410770 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:27:37,559-Speed 2497.22 samples/sec Loss 2.1490 LearningRate 0.000315 Epoch: 19 Global Step: 410780 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:27:45,759-Speed 2497.82 samples/sec Loss 2.1751 LearningRate 0.000315 Epoch: 19 Global Step: 410790 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:27:53,963-Speed 2496.83 samples/sec Loss 2.1593 LearningRate 0.000315 Epoch: 19 Global Step: 410800 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:28:02,159-Speed 2498.98 samples/sec Loss 2.1795 LearningRate 0.000315 Epoch: 19 Global Step: 410810 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:28:10,361-Speed 2497.49 samples/sec Loss 2.2212 LearningRate 0.000315 Epoch: 19 Global Step: 410820 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:28:18,515-Speed 2512.14 samples/sec Loss 2.1583 LearningRate 0.000315 Epoch: 19 Global Step: 410830 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:28:26,712-Speed 2498.75 samples/sec Loss 2.2394 LearningRate 0.000315 Epoch: 19 Global Step: 410840 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:28:34,914-Speed 2497.25 samples/sec Loss 2.1673 LearningRate 0.000315 Epoch: 19 Global Step: 410850 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:28:43,117-Speed 2497.23 samples/sec Loss 2.1476 LearningRate 0.000315 Epoch: 19 
Global Step: 410860 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:28:51,315-Speed 2498.41 samples/sec Loss 2.2398 LearningRate 0.000314 Epoch: 19 Global Step: 410870 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:28:59,515-Speed 2498.00 samples/sec Loss 2.1638 LearningRate 0.000314 Epoch: 19 Global Step: 410880 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:29:07,661-Speed 2514.52 samples/sec Loss 2.1745 LearningRate 0.000314 Epoch: 19 Global Step: 410890 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:29:15,864-Speed 2496.91 samples/sec Loss 2.2005 LearningRate 0.000314 Epoch: 19 Global Step: 410900 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:29:24,067-Speed 2497.28 samples/sec Loss 2.1226 LearningRate 0.000314 Epoch: 19 Global Step: 410910 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:29:32,281-Speed 2493.39 samples/sec Loss 2.1744 LearningRate 0.000314 Epoch: 19 Global Step: 410920 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:29:40,481-Speed 2498.11 samples/sec Loss 2.2003 LearningRate 0.000314 Epoch: 19 Global Step: 410930 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:29:48,691-Speed 2495.02 samples/sec Loss 2.1422 LearningRate 0.000314 Epoch: 19 Global Step: 410940 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:29:56,846-Speed 2511.86 samples/sec Loss 2.1495 LearningRate 0.000314 Epoch: 19 Global Step: 410950 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:30:05,043-Speed 2498.93 samples/sec Loss 2.1204 LearningRate 0.000314 Epoch: 19 Global Step: 410960 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:30:13,251-Speed 2495.97 samples/sec Loss 2.1408 LearningRate 0.000314 Epoch: 19 Global Step: 410970 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:30:21,459-Speed 2495.44 samples/sec Loss 2.1592 LearningRate 0.000314 Epoch: 19 Global Step: 410980 
Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:30:29,662-Speed 2496.93 samples/sec Loss 2.1308 LearningRate 0.000314 Epoch: 19 Global Step: 410990 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:30:37,871-Speed 2495.30 samples/sec Loss 2.1893 LearningRate 0.000314 Epoch: 19 Global Step: 411000 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:30:46,019-Speed 2513.93 samples/sec Loss 2.1326 LearningRate 0.000314 Epoch: 19 Global Step: 411010 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:30:54,221-Speed 2497.61 samples/sec Loss 2.1042 LearningRate 0.000314 Epoch: 19 Global Step: 411020 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:31:02,423-Speed 2497.31 samples/sec Loss 2.1435 LearningRate 0.000314 Epoch: 19 Global Step: 411030 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:31:10,622-Speed 2498.00 samples/sec Loss 2.1266 LearningRate 0.000314 Epoch: 19 Global Step: 411040 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:31:18,832-Speed 2495.14 samples/sec Loss 2.1411 LearningRate 0.000314 Epoch: 19 Global Step: 411050 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:31:27,045-Speed 2494.22 samples/sec Loss 2.1572 LearningRate 0.000314 Epoch: 19 Global Step: 411060 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:31:35,193-Speed 2513.98 samples/sec Loss 2.1883 LearningRate 0.000314 Epoch: 19 Global Step: 411070 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:31:43,391-Speed 2498.66 samples/sec Loss 2.1149 LearningRate 0.000314 Epoch: 19 Global Step: 411080 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:31:51,591-Speed 2497.90 samples/sec Loss 2.1115 LearningRate 0.000314 Epoch: 19 Global Step: 411090 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:31:59,789-Speed 2498.93 samples/sec Loss 2.1838 LearningRate 0.000314 Epoch: 19 Global Step: 411100 Fp16 Grad Scale: 
8192 Required: 96 hours Training: 2022-07-09 12:32:07,989-Speed 2498.01 samples/sec Loss 2.1792 LearningRate 0.000314 Epoch: 19 Global Step: 411110 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:32:16,186-Speed 2498.82 samples/sec Loss 2.1420 LearningRate 0.000314 Epoch: 19 Global Step: 411120 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:32:24,331-Speed 2515.09 samples/sec Loss 2.1860 LearningRate 0.000314 Epoch: 19 Global Step: 411130 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:32:32,532-Speed 2497.56 samples/sec Loss 2.1999 LearningRate 0.000314 Epoch: 19 Global Step: 411140 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:32:40,731-Speed 2498.28 samples/sec Loss 2.1861 LearningRate 0.000314 Epoch: 19 Global Step: 411150 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:32:48,933-Speed 2497.27 samples/sec Loss 2.1587 LearningRate 0.000314 Epoch: 19 Global Step: 411160 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:32:57,133-Speed 2498.02 samples/sec Loss 2.2024 LearningRate 0.000314 Epoch: 19 Global Step: 411170 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:33:05,335-Speed 2497.19 samples/sec Loss 2.2122 LearningRate 0.000314 Epoch: 19 Global Step: 411180 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:33:13,478-Speed 2515.44 samples/sec Loss 2.1190 LearningRate 0.000314 Epoch: 19 Global Step: 411190 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:33:21,679-Speed 2497.90 samples/sec Loss 2.1668 LearningRate 0.000314 Epoch: 19 Global Step: 411200 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:33:29,875-Speed 2499.17 samples/sec Loss 2.1308 LearningRate 0.000314 Epoch: 19 Global Step: 411210 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:33:38,099-Speed 2490.38 samples/sec Loss 2.1870 LearningRate 0.000314 Epoch: 19 Global Step: 411220 Fp16 Grad Scale: 8192 Required: 96 
hours Training: 2022-07-09 12:33:46,296-Speed 2498.92 samples/sec Loss 2.2314 LearningRate 0.000314 Epoch: 19 Global Step: 411230 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:33:54,497-Speed 2497.83 samples/sec Loss 2.1425 LearningRate 0.000314 Epoch: 19 Global Step: 411240 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:34:02,642-Speed 2514.58 samples/sec Loss 2.1906 LearningRate 0.000314 Epoch: 19 Global Step: 411250 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:34:10,854-Speed 2494.45 samples/sec Loss 2.1850 LearningRate 0.000314 Epoch: 19 Global Step: 411260 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:34:19,055-Speed 2497.52 samples/sec Loss 2.1972 LearningRate 0.000314 Epoch: 19 Global Step: 411270 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:34:27,254-Speed 2498.28 samples/sec Loss 2.1419 LearningRate 0.000314 Epoch: 19 Global Step: 411280 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:34:35,457-Speed 2497.15 samples/sec Loss 2.1991 LearningRate 0.000314 Epoch: 19 Global Step: 411290 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:34:43,658-Speed 2497.60 samples/sec Loss 2.1582 LearningRate 0.000314 Epoch: 19 Global Step: 411300 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:34:51,801-Speed 2515.41 samples/sec Loss 2.2141 LearningRate 0.000314 Epoch: 19 Global Step: 411310 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:35:00,005-Speed 2497.33 samples/sec Loss 2.1718 LearningRate 0.000314 Epoch: 19 Global Step: 411320 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:35:08,215-Speed 2494.90 samples/sec Loss 2.1857 LearningRate 0.000314 Epoch: 19 Global Step: 411330 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:35:16,410-Speed 2499.36 samples/sec Loss 2.1655 LearningRate 0.000314 Epoch: 19 Global Step: 411340 Fp16 Grad Scale: 8192 Required: 96 hours Training: 
2022-07-09 12:35:24,632-Speed 2491.63 samples/sec Loss 2.1813 LearningRate 0.000314 Epoch: 19 Global Step: 411350 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:35:32,840-Speed 2495.50 samples/sec Loss 2.1640 LearningRate 0.000314 Epoch: 19 Global Step: 411360 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:35:40,986-Speed 2514.59 samples/sec Loss 2.1824 LearningRate 0.000314 Epoch: 19 Global Step: 411370 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:35:49,188-Speed 2497.23 samples/sec Loss 2.1927 LearningRate 0.000314 Epoch: 19 Global Step: 411380 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:35:57,385-Speed 2498.84 samples/sec Loss 2.1842 LearningRate 0.000314 Epoch: 19 Global Step: 411390 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:36:05,585-Speed 2497.93 samples/sec Loss 2.1427 LearningRate 0.000314 Epoch: 19 Global Step: 411400 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:36:13,791-Speed 2496.05 samples/sec Loss 2.1500 LearningRate 0.000314 Epoch: 19 Global Step: 411410 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:36:21,991-Speed 2498.02 samples/sec Loss 2.1668 LearningRate 0.000314 Epoch: 19 Global Step: 411420 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:36:30,145-Speed 2512.11 samples/sec Loss 2.1408 LearningRate 0.000314 Epoch: 19 Global Step: 411430 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:36:38,355-Speed 2494.99 samples/sec Loss 2.1506 LearningRate 0.000314 Epoch: 19 Global Step: 411440 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:36:46,552-Speed 2498.72 samples/sec Loss 2.1406 LearningRate 0.000314 Epoch: 19 Global Step: 411450 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:36:54,757-Speed 2496.91 samples/sec Loss 2.1361 LearningRate 0.000314 Epoch: 19 Global Step: 411460 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 
12:37:02,955-Speed 2498.39 samples/sec Loss 2.1431 LearningRate 0.000314 Epoch: 19 Global Step: 411470 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:37:11,156-Speed 2497.80 samples/sec Loss 2.1572 LearningRate 0.000314 Epoch: 19 Global Step: 411480 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:37:19,302-Speed 2514.49 samples/sec Loss 2.1514 LearningRate 0.000314 Epoch: 19 Global Step: 411490 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:37:27,500-Speed 2498.61 samples/sec Loss 2.1651 LearningRate 0.000314 Epoch: 19 Global Step: 411500 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:37:35,698-Speed 2498.69 samples/sec Loss 2.1292 LearningRate 0.000314 Epoch: 19 Global Step: 411510 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:37:43,900-Speed 2497.47 samples/sec Loss 2.1844 LearningRate 0.000314 Epoch: 19 Global Step: 411520 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:37:52,098-Speed 2498.95 samples/sec Loss 2.1911 LearningRate 0.000313 Epoch: 19 Global Step: 411530 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:38:00,293-Speed 2499.36 samples/sec Loss 2.1350 LearningRate 0.000313 Epoch: 19 Global Step: 411540 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:38:08,443-Speed 2513.44 samples/sec Loss 2.1018 LearningRate 0.000313 Epoch: 19 Global Step: 411550 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:38:16,641-Speed 2498.54 samples/sec Loss 2.1558 LearningRate 0.000313 Epoch: 19 Global Step: 411560 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:38:24,838-Speed 2499.10 samples/sec Loss 2.1564 LearningRate 0.000313 Epoch: 19 Global Step: 411570 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:38:33,036-Speed 2498.40 samples/sec Loss 2.1614 LearningRate 0.000313 Epoch: 19 Global Step: 411580 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:38:41,237-Speed 
2497.67 samples/sec Loss 2.1782 LearningRate 0.000313 Epoch: 19 Global Step: 411590 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:38:49,437-Speed 2498.29 samples/sec Loss 2.1854 LearningRate 0.000313 Epoch: 19 Global Step: 411600 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:38:57,586-Speed 2513.38 samples/sec Loss 2.1814 LearningRate 0.000313 Epoch: 19 Global Step: 411610 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:39:05,792-Speed 2496.34 samples/sec Loss 2.1761 LearningRate 0.000313 Epoch: 19 Global Step: 411620 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:39:14,001-Speed 2495.51 samples/sec Loss 2.1737 LearningRate 0.000313 Epoch: 19 Global Step: 411630 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:39:22,203-Speed 2497.70 samples/sec Loss 2.1445 LearningRate 0.000313 Epoch: 19 Global Step: 411640 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:39:30,403-Speed 2498.00 samples/sec Loss 2.0983 LearningRate 0.000313 Epoch: 19 Global Step: 411650 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:39:38,605-Speed 2498.34 samples/sec Loss 2.1223 LearningRate 0.000313 Epoch: 19 Global Step: 411660 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:39:46,752-Speed 2514.11 samples/sec Loss 2.1343 LearningRate 0.000313 Epoch: 19 Global Step: 411670 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:39:54,952-Speed 2497.96 samples/sec Loss 2.0992 LearningRate 0.000313 Epoch: 19 Global Step: 411680 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:40:03,154-Speed 2497.38 samples/sec Loss 2.1402 LearningRate 0.000313 Epoch: 19 Global Step: 411690 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:40:11,353-Speed 2498.33 samples/sec Loss 2.1507 LearningRate 0.000313 Epoch: 19 Global Step: 411700 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:40:19,576-Speed 2491.65 samples/sec 
Loss 2.1345 LearningRate 0.000313 Epoch: 19 Global Step: 411710 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:40:27,775-Speed 2498.36 samples/sec Loss 2.1508 LearningRate 0.000313 Epoch: 19 Global Step: 411720 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:40:35,926-Speed 2512.94 samples/sec Loss 2.1253 LearningRate 0.000313 Epoch: 19 Global Step: 411730 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:40:44,123-Speed 2498.99 samples/sec Loss 2.1053 LearningRate 0.000313 Epoch: 19 Global Step: 411740 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:40:52,321-Speed 2498.37 samples/sec Loss 2.1531 LearningRate 0.000313 Epoch: 19 Global Step: 411750 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:41:00,528-Speed 2495.90 samples/sec Loss 2.1706 LearningRate 0.000313 Epoch: 19 Global Step: 411760 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:41:08,732-Speed 2496.81 samples/sec Loss 2.1332 LearningRate 0.000313 Epoch: 19 Global Step: 411770 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:41:16,933-Speed 2497.68 samples/sec Loss 2.1150 LearningRate 0.000313 Epoch: 19 Global Step: 411780 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:41:25,087-Speed 2512.29 samples/sec Loss 2.0979 LearningRate 0.000313 Epoch: 19 Global Step: 411790 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:41:33,290-Speed 2496.93 samples/sec Loss 2.0976 LearningRate 0.000313 Epoch: 19 Global Step: 411800 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:41:41,494-Speed 2496.94 samples/sec Loss 2.1524 LearningRate 0.000313 Epoch: 19 Global Step: 411810 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:41:49,697-Speed 2497.08 samples/sec Loss 2.1120 LearningRate 0.000313 Epoch: 19 Global Step: 411820 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:41:57,902-Speed 2496.46 samples/sec Loss 2.1552 
LearningRate 0.000313 Epoch: 19 Global Step: 411830 Fp16 Grad Scale: 8192 Required: 96 hours Training: 2022-07-09 12:42:06,102-Speed 2497.89 samples/sec Loss 2.1172 LearningRate 0.000313 Epoch: 19 Global Step: 411840 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:42:14,251-Speed 2513.51 samples/sec Loss 2.1323 LearningRate 0.000313 Epoch: 19 Global Step: 411850 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:42:22,450-Speed 2498.63 samples/sec Loss 2.1298 LearningRate 0.000313 Epoch: 19 Global Step: 411860 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:42:30,652-Speed 2497.55 samples/sec Loss 2.1164 LearningRate 0.000313 Epoch: 19 Global Step: 411870 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:42:38,855-Speed 2496.92 samples/sec Loss 2.1536 LearningRate 0.000313 Epoch: 19 Global Step: 411880 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:42:47,060-Speed 2496.49 samples/sec Loss 2.1304 LearningRate 0.000313 Epoch: 19 Global Step: 411890 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:42:55,274-Speed 2493.76 samples/sec Loss 2.1703 LearningRate 0.000313 Epoch: 19 Global Step: 411900 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:43:03,421-Speed 2514.13 samples/sec Loss 2.1217 LearningRate 0.000313 Epoch: 19 Global Step: 411910 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:43:11,621-Speed 2497.86 samples/sec Loss 2.0910 LearningRate 0.000313 Epoch: 19 Global Step: 411920 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:43:19,828-Speed 2495.90 samples/sec Loss 2.1244 LearningRate 0.000313 Epoch: 19 Global Step: 411930 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:43:28,028-Speed 2498.81 samples/sec Loss 2.1494 LearningRate 0.000313 Epoch: 19 Global Step: 411940 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:43:36,231-Speed 2497.19 samples/sec Loss 2.1732 
LearningRate 0.000313 Epoch: 19 Global Step: 411950 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:43:44,431-Speed 2498.03 samples/sec Loss 2.1272 LearningRate 0.000313 Epoch: 19 Global Step: 411960 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:43:52,602-Speed 2506.90 samples/sec Loss 2.1109 LearningRate 0.000313 Epoch: 19 Global Step: 411970 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:44:00,800-Speed 2498.27 samples/sec Loss 2.1143 LearningRate 0.000313 Epoch: 19 Global Step: 411980 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:44:09,011-Speed 2494.58 samples/sec Loss 2.1622 LearningRate 0.000313 Epoch: 19 Global Step: 411990 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:44:17,212-Speed 2497.74 samples/sec Loss 2.1564 LearningRate 0.000313 Epoch: 19 Global Step: 412000 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:44:25,412-Speed 2498.18 samples/sec Loss 2.1058 LearningRate 0.000313 Epoch: 19 Global Step: 412010 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:44:33,613-Speed 2497.99 samples/sec Loss 2.1468 LearningRate 0.000313 Epoch: 19 Global Step: 412020 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:44:41,771-Speed 2510.60 samples/sec Loss 2.1018 LearningRate 0.000313 Epoch: 19 Global Step: 412030 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:44:49,974-Speed 2497.08 samples/sec Loss 2.1590 LearningRate 0.000313 Epoch: 19 Global Step: 412040 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:44:58,178-Speed 2496.76 samples/sec Loss 2.1435 LearningRate 0.000313 Epoch: 19 Global Step: 412050 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:45:06,378-Speed 2497.66 samples/sec Loss 2.1394 LearningRate 0.000313 Epoch: 19 Global Step: 412060 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:45:14,584-Speed 2496.24 samples/sec Loss 2.2363 
LearningRate 0.000313 Epoch: 19 Global Step: 412070 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:45:22,785-Speed 2497.85 samples/sec Loss 2.1125 LearningRate 0.000313 Epoch: 19 Global Step: 412080 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:45:30,933-Speed 2514.11 samples/sec Loss 2.1775 LearningRate 0.000313 Epoch: 19 Global Step: 412090 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:45:39,134-Speed 2497.50 samples/sec Loss 2.1743 LearningRate 0.000313 Epoch: 19 Global Step: 412100 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:45:47,332-Speed 2498.84 samples/sec Loss 2.1120 LearningRate 0.000313 Epoch: 19 Global Step: 412110 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:45:55,535-Speed 2497.06 samples/sec Loss 2.1439 LearningRate 0.000313 Epoch: 19 Global Step: 412120 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:46:03,739-Speed 2496.85 samples/sec Loss 2.1289 LearningRate 0.000313 Epoch: 19 Global Step: 412130 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:46:11,940-Speed 2497.70 samples/sec Loss 2.0904 LearningRate 0.000313 Epoch: 19 Global Step: 412140 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:46:20,090-Speed 2513.16 samples/sec Loss 2.1637 LearningRate 0.000313 Epoch: 19 Global Step: 412150 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:46:28,288-Speed 2498.88 samples/sec Loss 2.1791 LearningRate 0.000313 Epoch: 19 Global Step: 412160 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:46:36,487-Speed 2498.41 samples/sec Loss 2.0965 LearningRate 0.000313 Epoch: 19 Global Step: 412170 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:46:44,687-Speed 2497.89 samples/sec Loss 2.1249 LearningRate 0.000313 Epoch: 19 Global Step: 412180 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:46:52,884-Speed 2498.89 samples/sec Loss 2.1585 
LearningRate 0.000313 Epoch: 19 Global Step: 412190 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:47:01,084-Speed 2497.90 samples/sec Loss 2.1646 LearningRate 0.000312 Epoch: 19 Global Step: 412200 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:47:09,233-Speed 2513.60 samples/sec Loss 2.1915 LearningRate 0.000312 Epoch: 19 Global Step: 412210 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:47:17,435-Speed 2497.31 samples/sec Loss 2.1522 LearningRate 0.000312 Epoch: 19 Global Step: 412220 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:47:25,645-Speed 2494.75 samples/sec Loss 2.1304 LearningRate 0.000312 Epoch: 19 Global Step: 412230 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:47:33,845-Speed 2498.03 samples/sec Loss 2.0972 LearningRate 0.000312 Epoch: 19 Global Step: 412240 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:47:42,044-Speed 2498.13 samples/sec Loss 2.0920 LearningRate 0.000312 Epoch: 19 Global Step: 412250 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:47:50,243-Speed 2498.47 samples/sec Loss 2.1600 LearningRate 0.000312 Epoch: 19 Global Step: 412260 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:47:58,391-Speed 2513.85 samples/sec Loss 2.1302 LearningRate 0.000312 Epoch: 19 Global Step: 412270 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:48:06,593-Speed 2497.25 samples/sec Loss 2.1653 LearningRate 0.000312 Epoch: 19 Global Step: 412280 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:48:14,795-Speed 2497.31 samples/sec Loss 2.1713 LearningRate 0.000312 Epoch: 19 Global Step: 412290 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:48:22,995-Speed 2498.26 samples/sec Loss 2.1340 LearningRate 0.000312 Epoch: 19 Global Step: 412300 Fp16 Grad Scale: 16384 Required: 96 hours Training: 2022-07-09 12:48:31,198-Speed 2497.13 samples/sec Loss 2.1845 
LearningRate 0.000312 Epoch: 19 Global Step: 412310 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:48:39,397-Speed 2498.21 samples/sec Loss 2.1229 LearningRate 0.000312 Epoch: 19 Global Step: 412320 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:48:47,542-Speed 2515.17 samples/sec Loss 2.1650 LearningRate 0.000312 Epoch: 19 Global Step: 412330 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:48:55,742-Speed 2497.62 samples/sec Loss 2.1554 LearningRate 0.000312 Epoch: 19 Global Step: 412340 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:49:03,939-Speed 2498.87 samples/sec Loss 2.1627 LearningRate 0.000312 Epoch: 19 Global Step: 412350 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:49:12,139-Speed 2498.06 samples/sec Loss 2.2109 LearningRate 0.000312 Epoch: 19 Global Step: 412360 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:49:20,341-Speed 2497.38 samples/sec Loss 2.1555 LearningRate 0.000312 Epoch: 19 Global Step: 412370 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:49:28,542-Speed 2497.42 samples/sec Loss 2.2028 LearningRate 0.000312 Epoch: 19 Global Step: 412380 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:49:36,703-Speed 2509.98 samples/sec Loss 2.1450 LearningRate 0.000312 Epoch: 19 Global Step: 412390 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:49:44,900-Speed 2498.69 samples/sec Loss 2.1797 LearningRate 0.000312 Epoch: 19 Global Step: 412400 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:49:53,098-Speed 2498.65 samples/sec Loss 2.1637 LearningRate 0.000312 Epoch: 19 Global Step: 412410 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:50:01,307-Speed 2495.21 samples/sec Loss 2.1627 LearningRate 0.000312 Epoch: 19 Global Step: 412420 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:50:09,511-Speed 2497.08 samples/sec Loss 2.1576 
LearningRate 0.000312 Epoch: 19 Global Step: 412430 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:50:17,710-Speed 2498.33 samples/sec Loss 2.1219 LearningRate 0.000312 Epoch: 19 Global Step: 412440 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:50:25,862-Speed 2512.75 samples/sec Loss 2.1652 LearningRate 0.000312 Epoch: 19 Global Step: 412450 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:50:34,063-Speed 2498.41 samples/sec Loss 2.1339 LearningRate 0.000312 Epoch: 19 Global Step: 412460 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:50:42,265-Speed 2497.21 samples/sec Loss 2.1506 LearningRate 0.000312 Epoch: 19 Global Step: 412470 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:50:50,467-Speed 2497.28 samples/sec Loss 2.1194 LearningRate 0.000312 Epoch: 19 Global Step: 412480 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:50:58,670-Speed 2497.30 samples/sec Loss 2.1034 LearningRate 0.000312 Epoch: 19 Global Step: 412490 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:51:06,875-Speed 2496.19 samples/sec Loss 2.0917 LearningRate 0.000312 Epoch: 19 Global Step: 412500 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:51:15,020-Speed 2514.94 samples/sec Loss 2.1172 LearningRate 0.000312 Epoch: 19 Global Step: 412510 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:51:23,221-Speed 2498.04 samples/sec Loss 2.1341 LearningRate 0.000312 Epoch: 19 Global Step: 412520 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:51:31,445-Speed 2490.73 samples/sec Loss 2.1533 LearningRate 0.000312 Epoch: 19 Global Step: 412530 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:51:39,652-Speed 2496.03 samples/sec Loss 2.1322 LearningRate 0.000312 Epoch: 19 Global Step: 412540 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:51:47,849-Speed 2499.09 samples/sec Loss 2.1092 
LearningRate 0.000312 Epoch: 19 Global Step: 412550 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:51:56,053-Speed 2496.88 samples/sec Loss 2.1519 LearningRate 0.000312 Epoch: 19 Global Step: 412560 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:52:04,204-Speed 2512.64 samples/sec Loss 2.1296 LearningRate 0.000312 Epoch: 19 Global Step: 412570 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:52:12,405-Speed 2497.77 samples/sec Loss 2.1611 LearningRate 0.000312 Epoch: 19 Global Step: 412580 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:52:20,779-Speed 2445.77 samples/sec Loss 2.1300 LearningRate 0.000312 Epoch: 19 Global Step: 412590 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:52:29,175-Speed 2500.71 samples/sec Loss 2.1386 LearningRate 0.000312 Epoch: 19 Global Step: 412600 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:52:37,371-Speed 2498.95 samples/sec Loss 2.1018 LearningRate 0.000312 Epoch: 19 Global Step: 412610 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:52:48,170-Speed 1910.54 samples/sec Loss 2.1582 LearningRate 0.000312 Epoch: 19 Global Step: 412620 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:52:56,409-Speed 2515.72 samples/sec Loss 2.0792 LearningRate 0.000312 Epoch: 19 Global Step: 412630 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:53:04,609-Speed 2497.89 samples/sec Loss 2.1071 LearningRate 0.000312 Epoch: 19 Global Step: 412640 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:53:16,992-Speed 1664.98 samples/sec Loss 2.1046 LearningRate 0.000312 Epoch: 19 Global Step: 412650 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:53:25,193-Speed 2498.46 samples/sec Loss 2.1340 LearningRate 0.000312 Epoch: 19 Global Step: 412660 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:53:33,402-Speed 2495.09 samples/sec Loss 2.1315 
LearningRate 0.000312 Epoch: 19 Global Step: 412670 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:53:46,467-Speed 1567.85 samples/sec Loss 2.1278 LearningRate 0.000312 Epoch: 19 Global Step: 412680 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:53:54,624-Speed 2512.36 samples/sec Loss 2.1968 LearningRate 0.000312 Epoch: 19 Global Step: 412690 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:54:02,865-Speed 2495.67 samples/sec Loss 2.1327 LearningRate 0.000312 Epoch: 19 Global Step: 412700 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:54:11,082-Speed 2492.80 samples/sec Loss 2.1283 LearningRate 0.000312 Epoch: 19 Global Step: 412710 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:54:23,800-Speed 2497.75 samples/sec Loss 2.1040 LearningRate 0.000312 Epoch: 19 Global Step: 412720 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:54:32,051-Speed 2494.68 samples/sec Loss 2.1134 LearningRate 0.000312 Epoch: 19 Global Step: 412730 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:54:40,264-Speed 2493.93 samples/sec Loss 2.1030 LearningRate 0.000312 Epoch: 19 Global Step: 412740 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:54:53,711-Speed 2177.74 samples/sec Loss 2.1498 LearningRate 0.000312 Epoch: 19 Global Step: 412750 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:55:02,444-Speed 2482.26 samples/sec Loss 2.1158 LearningRate 0.000312 Epoch: 19 Global Step: 412760 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:55:10,673-Speed 2503.29 samples/sec Loss 2.1349 LearningRate 0.000312 Epoch: 19 Global Step: 412770 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:55:19,007-Speed 2501.41 samples/sec Loss 2.1483 LearningRate 0.000312 Epoch: 19 Global Step: 412780 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:55:32,516-Speed 1516.09 samples/sec Loss 2.1541 
LearningRate 0.000312 Epoch: 19 Global Step: 412790 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:55:40,736-Speed 2500.78 samples/sec Loss 2.0614 LearningRate 0.000312 Epoch: 19 Global Step: 412800 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:55:48,927-Speed 2514.70 samples/sec Loss 2.1220 LearningRate 0.000312 Epoch: 19 Global Step: 412810 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:56:05,851-Speed 1634.39 samples/sec Loss 2.1165 LearningRate 0.000312 Epoch: 19 Global Step: 412820 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:56:14,091-Speed 2501.55 samples/sec Loss 2.1100 LearningRate 0.000312 Epoch: 19 Global Step: 412830 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:56:24,785-Speed 2404.22 samples/sec Loss 2.1225 LearningRate 0.000312 Epoch: 19 Global Step: 412840 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:56:32,990-Speed 2496.31 samples/sec Loss 2.1150 LearningRate 0.000312 Epoch: 19 Global Step: 412850 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:56:41,388-Speed 2438.80 samples/sec Loss 2.1264 LearningRate 0.000312 Epoch: 19 Global Step: 412860 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:56:49,545-Speed 2511.46 samples/sec Loss 2.1388 LearningRate 0.000311 Epoch: 19 Global Step: 412870 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:56:57,755-Speed 2495.06 samples/sec Loss 2.1340 LearningRate 0.000311 Epoch: 19 Global Step: 412880 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:57:05,966-Speed 2494.79 samples/sec Loss 2.1652 LearningRate 0.000311 Epoch: 19 Global Step: 412890 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:57:14,185-Speed 2492.21 samples/sec Loss 2.0477 LearningRate 0.000311 Epoch: 19 Global Step: 412900 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:57:22,395-Speed 2494.64 samples/sec Loss 2.1029 
LearningRate 0.000311 Epoch: 19 Global Step: 412910 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:57:30,603-Speed 2495.67 samples/sec Loss 2.1653 LearningRate 0.000311 Epoch: 19 Global Step: 412920 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:57:38,760-Speed 2512.08 samples/sec Loss 2.1504 LearningRate 0.000311 Epoch: 19 Global Step: 412930 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:57:46,966-Speed 2496.06 samples/sec Loss 2.1417 LearningRate 0.000311 Epoch: 19 Global Step: 412940 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:57:55,185-Speed 2492.25 samples/sec Loss 2.0830 LearningRate 0.000311 Epoch: 19 Global Step: 412950 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:58:03,398-Speed 2494.16 samples/sec Loss 2.1285 LearningRate 0.000311 Epoch: 19 Global Step: 412960 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:58:11,603-Speed 2496.16 samples/sec Loss 2.1760 LearningRate 0.000311 Epoch: 19 Global Step: 412970 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:58:19,822-Speed 2492.09 samples/sec Loss 2.2042 LearningRate 0.000311 Epoch: 19 Global Step: 412980 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:58:27,985-Speed 2509.32 samples/sec Loss 2.1443 LearningRate 0.000311 Epoch: 19 Global Step: 412990 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:58:36,202-Speed 2492.71 samples/sec Loss 2.1892 LearningRate 0.000311 Epoch: 19 Global Step: 413000 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:58:44,407-Speed 2496.45 samples/sec Loss 2.1953 LearningRate 0.000311 Epoch: 19 Global Step: 413010 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:58:52,608-Speed 2497.92 samples/sec Loss 2.1596 LearningRate 0.000311 Epoch: 19 Global Step: 413020 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:59:00,820-Speed 2494.25 samples/sec Loss 2.0946 
LearningRate 0.000311 Epoch: 19 Global Step: 413030 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 12:59:09,026-Speed 2495.95 samples/sec Loss 2.1210 LearningRate 0.000311 Epoch: 19 Global Step: 413040 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 12:59:17,179-Speed 2512.68 samples/sec Loss 2.1128 LearningRate 0.000311 Epoch: 19 Global Step: 413050 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 12:59:25,387-Speed 2495.53 samples/sec Loss 2.1615 LearningRate 0.000311 Epoch: 19 Global Step: 413060 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 12:59:33,588-Speed 2497.74 samples/sec Loss 2.1533 LearningRate 0.000311 Epoch: 19 Global Step: 413070 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 12:59:41,804-Speed 2492.87 samples/sec Loss 2.1289 LearningRate 0.000311 Epoch: 19 Global Step: 413080 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 12:59:50,011-Speed 2496.42 samples/sec Loss 2.1076 LearningRate 0.000311 Epoch: 19 Global Step: 413090 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 12:59:58,215-Speed 2496.74 samples/sec Loss 2.1228 LearningRate 0.000311 Epoch: 19 Global Step: 413100 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:00:06,370-Speed 2511.69 samples/sec Loss 2.1173 LearningRate 0.000311 Epoch: 19 Global Step: 413110 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:00:14,575-Speed 2496.56 samples/sec Loss 2.1433 LearningRate 0.000311 Epoch: 19 Global Step: 413120 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:00:22,783-Speed 2495.59 samples/sec Loss 2.1216 LearningRate 0.000311 Epoch: 19 Global Step: 413130 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:00:30,996-Speed 2494.05 samples/sec Loss 2.1408 LearningRate 0.000311 Epoch: 19 Global Step: 413140 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:00:39,209-Speed 2494.25 samples/sec Loss 2.1632 
LearningRate 0.000311 Epoch: 19 Global Step: 413150 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:00:47,415-Speed 2496.06 samples/sec Loss 2.1786 LearningRate 0.000311 Epoch: 19 Global Step: 413160 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:00:55,566-Speed 2512.77 samples/sec Loss 2.1318 LearningRate 0.000311 Epoch: 19 Global Step: 413170 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:01:03,771-Speed 2496.65 samples/sec Loss 2.1516 LearningRate 0.000311 Epoch: 19 Global Step: 413180 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:01:12,013-Speed 2485.41 samples/sec Loss 2.1221 LearningRate 0.000311 Epoch: 19 Global Step: 413190 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:01:20,218-Speed 2496.37 samples/sec Loss 2.1661 LearningRate 0.000311 Epoch: 19 Global Step: 413200 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:01:28,423-Speed 2496.53 samples/sec Loss 2.1005 LearningRate 0.000311 Epoch: 19 Global Step: 413210 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:01:36,626-Speed 2497.05 samples/sec Loss 2.1121 LearningRate 0.000311 Epoch: 19 Global Step: 413220 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:01:44,781-Speed 2511.99 samples/sec Loss 2.1268 LearningRate 0.000311 Epoch: 19 Global Step: 413230 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:01:52,985-Speed 2496.67 samples/sec Loss 2.1745 LearningRate 0.000311 Epoch: 19 Global Step: 413240 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:02:01,190-Speed 2496.28 samples/sec Loss 2.1904 LearningRate 0.000311 Epoch: 19 Global Step: 413250 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:02:09,351-Speed 2510.20 samples/sec Loss 2.1800 LearningRate 0.000311 Epoch: 19 Global Step: 413260 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:02:17,550-Speed 2498.10 samples/sec Loss 2.1120 
LearningRate 0.000311 Epoch: 19 Global Step: 413270 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:02:25,757-Speed 2495.81 samples/sec Loss 2.1886 LearningRate 0.000311 Epoch: 19 Global Step: 413280 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:02:33,907-Speed 2513.19 samples/sec Loss 2.1389 LearningRate 0.000311 Epoch: 19 Global Step: 413290 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:02:42,116-Speed 2495.12 samples/sec Loss 2.1773 LearningRate 0.000311 Epoch: 19 Global Step: 413300 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:02:50,319-Speed 2497.17 samples/sec Loss 2.1142 LearningRate 0.000311 Epoch: 19 Global Step: 413310 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:02:58,517-Speed 2498.73 samples/sec Loss 2.1814 LearningRate 0.000311 Epoch: 19 Global Step: 413320 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:03:06,719-Speed 2497.41 samples/sec Loss 2.1105 LearningRate 0.000311 Epoch: 19 Global Step: 413330 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:03:14,923-Speed 2496.81 samples/sec Loss 2.1469 LearningRate 0.000311 Epoch: 19 Global Step: 413340 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:03:23,074-Speed 2513.01 samples/sec Loss 2.1792 LearningRate 0.000311 Epoch: 19 Global Step: 413350 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:03:31,276-Speed 2497.36 samples/sec Loss 2.1445 LearningRate 0.000311 Epoch: 19 Global Step: 413360 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:03:39,478-Speed 2497.19 samples/sec Loss 2.1477 LearningRate 0.000311 Epoch: 19 Global Step: 413370 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:03:47,685-Speed 2495.84 samples/sec Loss 2.1115 LearningRate 0.000311 Epoch: 19 Global Step: 413380 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:03:55,886-Speed 2497.61 samples/sec Loss 2.0950 
LearningRate 0.000311 Epoch: 19 Global Step: 413390 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:04:04,088-Speed 2497.33 samples/sec Loss 2.1903 LearningRate 0.000311 Epoch: 19 Global Step: 413400 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:04:12,243-Speed 2512.14 samples/sec Loss 2.1845 LearningRate 0.000311 Epoch: 19 Global Step: 413410 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:04:20,448-Speed 2496.42 samples/sec Loss 2.1570 LearningRate 0.000311 Epoch: 19 Global Step: 413420 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:04:28,652-Speed 2496.82 samples/sec Loss 2.1936 LearningRate 0.000311 Epoch: 19 Global Step: 413430 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:04:36,853-Speed 2497.58 samples/sec Loss 2.1613 LearningRate 0.000311 Epoch: 19 Global Step: 413440 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:04:45,068-Speed 2493.26 samples/sec Loss 2.1557 LearningRate 0.000311 Epoch: 19 Global Step: 413450 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:04:53,275-Speed 2495.73 samples/sec Loss 2.1699 LearningRate 0.000311 Epoch: 19 Global Step: 413460 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:05:01,426-Speed 2513.03 samples/sec Loss 2.1459 LearningRate 0.000311 Epoch: 19 Global Step: 413470 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:05:09,639-Speed 2493.93 samples/sec Loss 2.1098 LearningRate 0.000311 Epoch: 19 Global Step: 413480 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:05:17,858-Speed 2492.37 samples/sec Loss 2.1706 LearningRate 0.000311 Epoch: 19 Global Step: 413490 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:05:26,063-Speed 2496.56 samples/sec Loss 2.2140 LearningRate 0.000311 Epoch: 19 Global Step: 413500 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:05:34,269-Speed 2496.17 samples/sec Loss 2.1086 
LearningRate 0.000311 Epoch: 19 Global Step: 413510 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:05:42,476-Speed 2496.24 samples/sec Loss 2.1436 LearningRate 0.000311 Epoch: 19 Global Step: 413520 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:05:50,648-Speed 2506.29 samples/sec Loss 2.1417 LearningRate 0.000311 Epoch: 19 Global Step: 413530 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:05:58,873-Speed 2490.33 samples/sec Loss 2.1518 LearningRate 0.000310 Epoch: 19 Global Step: 413540 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:06:07,076-Speed 2496.98 samples/sec Loss 2.1338 LearningRate 0.000310 Epoch: 19 Global Step: 413550 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:06:15,279-Speed 2497.20 samples/sec Loss 2.1714 LearningRate 0.000310 Epoch: 19 Global Step: 413560 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:06:23,480-Speed 2497.79 samples/sec Loss 2.1254 LearningRate 0.000310 Epoch: 19 Global Step: 413570 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:06:31,687-Speed 2495.80 samples/sec Loss 2.1650 LearningRate 0.000310 Epoch: 19 Global Step: 413580 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:06:39,840-Speed 2512.25 samples/sec Loss 2.1770 LearningRate 0.000310 Epoch: 19 Global Step: 413590 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:06:48,062-Speed 2491.48 samples/sec Loss 2.1251 LearningRate 0.000310 Epoch: 19 Global Step: 413600 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:06:56,260-Speed 2498.56 samples/sec Loss 2.1467 LearningRate 0.000310 Epoch: 19 Global Step: 413610 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:07:04,460-Speed 2498.05 samples/sec Loss 2.1641 LearningRate 0.000310 Epoch: 19 Global Step: 413620 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:07:12,673-Speed 2493.73 samples/sec Loss 2.1956 
LearningRate 0.000310 Epoch: 19 Global Step: 413630 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:07:20,876-Speed 2497.09 samples/sec Loss 2.1161 LearningRate 0.000310 Epoch: 19 Global Step: 413640 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:07:29,027-Speed 2513.05 samples/sec Loss 2.1645 LearningRate 0.000310 Epoch: 19 Global Step: 413650 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:07:37,234-Speed 2495.74 samples/sec Loss 2.1574 LearningRate 0.000310 Epoch: 19 Global Step: 413660 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:07:45,436-Speed 2497.44 samples/sec Loss 2.1620 LearningRate 0.000310 Epoch: 19 Global Step: 413670 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:07:53,638-Speed 2497.55 samples/sec Loss 2.1507 LearningRate 0.000310 Epoch: 19 Global Step: 413680 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:08:01,854-Speed 2492.97 samples/sec Loss 2.1243 LearningRate 0.000310 Epoch: 19 Global Step: 413690 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:08:10,059-Speed 2496.57 samples/sec Loss 2.1627 LearningRate 0.000310 Epoch: 19 Global Step: 413700 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:08:18,208-Speed 2513.51 samples/sec Loss 2.1477 LearningRate 0.000310 Epoch: 19 Global Step: 413710 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:08:26,412-Speed 2496.66 samples/sec Loss 2.1891 LearningRate 0.000310 Epoch: 19 Global Step: 413720 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:08:34,617-Speed 2496.46 samples/sec Loss 2.1525 LearningRate 0.000310 Epoch: 19 Global Step: 413730 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:08:42,827-Speed 2494.82 samples/sec Loss 2.1668 LearningRate 0.000310 Epoch: 19 Global Step: 413740 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:08:51,028-Speed 2497.75 samples/sec Loss 2.1604 
LearningRate 0.000310 Epoch: 19 Global Step: 413750 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:08:59,231-Speed 2496.91 samples/sec Loss 2.1597 LearningRate 0.000310 Epoch: 19 Global Step: 413760 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:09:07,381-Speed 2513.47 samples/sec Loss 2.1773 LearningRate 0.000310 Epoch: 19 Global Step: 413770 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:09:15,588-Speed 2495.61 samples/sec Loss 2.1584 LearningRate 0.000310 Epoch: 19 Global Step: 413780 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:09:23,787-Speed 2498.91 samples/sec Loss 2.1303 LearningRate 0.000310 Epoch: 19 Global Step: 413790 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:09:31,989-Speed 2497.28 samples/sec Loss 2.1529 LearningRate 0.000310 Epoch: 19 Global Step: 413800 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:09:40,198-Speed 2495.24 samples/sec Loss 2.0991 LearningRate 0.000310 Epoch: 19 Global Step: 413810 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:09:48,399-Speed 2497.40 samples/sec Loss 2.1628 LearningRate 0.000310 Epoch: 19 Global Step: 413820 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:09:56,562-Speed 2509.60 samples/sec Loss 2.1563 LearningRate 0.000310 Epoch: 19 Global Step: 413830 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:10:04,766-Speed 2496.68 samples/sec Loss 2.1319 LearningRate 0.000310 Epoch: 19 Global Step: 413840 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:10:12,966-Speed 2498.09 samples/sec Loss 2.1028 LearningRate 0.000310 Epoch: 19 Global Step: 413850 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:10:21,168-Speed 2497.47 samples/sec Loss 2.1340 LearningRate 0.000310 Epoch: 19 Global Step: 413860 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:10:29,370-Speed 2497.57 samples/sec Loss 2.1272 
LearningRate 0.000310 Epoch: 19 Global Step: 413870 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:10:37,570-Speed 2497.65 samples/sec Loss 2.1114 LearningRate 0.000310 Epoch: 19 Global Step: 413880 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:10:45,719-Speed 2513.69 samples/sec Loss 2.1329 LearningRate 0.000310 Epoch: 19 Global Step: 413890 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:10:53,932-Speed 2494.06 samples/sec Loss 2.1712 LearningRate 0.000310 Epoch: 19 Global Step: 413900 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:11:02,133-Speed 2498.13 samples/sec Loss 2.1919 LearningRate 0.000310 Epoch: 19 Global Step: 413910 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:11:10,351-Speed 2492.50 samples/sec Loss 2.1393 LearningRate 0.000310 Epoch: 19 Global Step: 413920 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:11:18,554-Speed 2497.00 samples/sec Loss 2.2240 LearningRate 0.000310 Epoch: 19 Global Step: 413930 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:11:26,759-Speed 2496.72 samples/sec Loss 2.1638 LearningRate 0.000310 Epoch: 19 Global Step: 413940 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:11:34,907-Speed 2513.78 samples/sec Loss 2.1248 LearningRate 0.000310 Epoch: 19 Global Step: 413950 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:11:43,113-Speed 2496.02 samples/sec Loss 2.1577 LearningRate 0.000310 Epoch: 19 Global Step: 413960 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:11:51,311-Speed 2498.60 samples/sec Loss 2.1108 LearningRate 0.000310 Epoch: 19 Global Step: 413970 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:11:59,519-Speed 2495.59 samples/sec Loss 2.1134 LearningRate 0.000310 Epoch: 19 Global Step: 413980 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:12:07,722-Speed 2497.16 samples/sec Loss 2.1705 
LearningRate 0.000310 Epoch: 19 Global Step: 413990 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:12:15,921-Speed 2498.14 samples/sec Loss 2.1623 LearningRate 0.000310 Epoch: 19 Global Step: 414000 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:12:24,069-Speed 2513.89 samples/sec Loss 2.1759 LearningRate 0.000310 Epoch: 19 Global Step: 414010 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:12:32,272-Speed 2497.15 samples/sec Loss 2.1362 LearningRate 0.000310 Epoch: 19 Global Step: 414020 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:12:40,476-Speed 2496.91 samples/sec Loss 2.1490 LearningRate 0.000310 Epoch: 19 Global Step: 414030 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:12:48,677-Speed 2497.88 samples/sec Loss 2.1800 LearningRate 0.000310 Epoch: 19 Global Step: 414040 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:12:56,876-Speed 2498.02 samples/sec Loss 2.1739 LearningRate 0.000310 Epoch: 19 Global Step: 414050 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:13:05,080-Speed 2497.18 samples/sec Loss 2.1867 LearningRate 0.000310 Epoch: 19 Global Step: 414060 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:13:13,225-Speed 2515.44 samples/sec Loss 2.1553 LearningRate 0.000310 Epoch: 19 Global Step: 414070 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:13:21,441-Speed 2493.02 samples/sec Loss 2.0606 LearningRate 0.000310 Epoch: 19 Global Step: 414080 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:13:29,647-Speed 2496.32 samples/sec Loss 2.1475 LearningRate 0.000310 Epoch: 19 Global Step: 414090 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:13:37,848-Speed 2497.57 samples/sec Loss 2.1113 LearningRate 0.000310 Epoch: 19 Global Step: 414100 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:13:46,052-Speed 2496.93 samples/sec Loss 2.1611 
LearningRate 0.000310 Epoch: 19 Global Step: 414110 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:13:54,252-Speed 2497.86 samples/sec Loss 2.1497 LearningRate 0.000310 Epoch: 19 Global Step: 414120 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:14:02,400-Speed 2514.00 samples/sec Loss 2.1541 LearningRate 0.000310 Epoch: 19 Global Step: 414130 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:14:10,600-Speed 2497.95 samples/sec Loss 2.1400 LearningRate 0.000310 Epoch: 19 Global Step: 414140 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:14:18,800-Speed 2497.90 samples/sec Loss 2.1622 LearningRate 0.000310 Epoch: 19 Global Step: 414150 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:14:26,999-Speed 2498.27 samples/sec Loss 2.1042 LearningRate 0.000310 Epoch: 19 Global Step: 414160 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:14:35,202-Speed 2497.15 samples/sec Loss 2.1244 LearningRate 0.000310 Epoch: 19 Global Step: 414170 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:14:43,403-Speed 2497.54 samples/sec Loss 2.1376 LearningRate 0.000310 Epoch: 19 Global Step: 414180 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:14:51,567-Speed 2509.19 samples/sec Loss 2.1684 LearningRate 0.000310 Epoch: 19 Global Step: 414190 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:14:59,768-Speed 2497.70 samples/sec Loss 2.1959 LearningRate 0.000310 Epoch: 19 Global Step: 414200 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:15:07,971-Speed 2497.21 samples/sec Loss 2.1956 LearningRate 0.000309 Epoch: 19 Global Step: 414210 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:15:16,172-Speed 2497.62 samples/sec Loss 2.1950 LearningRate 0.000309 Epoch: 19 Global Step: 414220 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:15:24,380-Speed 2495.65 samples/sec Loss 2.1146 
LearningRate 0.000309 Epoch: 19 Global Step: 414230 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:15:32,580-Speed 2497.67 samples/sec Loss 2.1448 LearningRate 0.000309 Epoch: 19 Global Step: 414240 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:15:40,728-Speed 2514.00 samples/sec Loss 2.2074 LearningRate 0.000309 Epoch: 19 Global Step: 414250 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:15:48,938-Speed 2495.11 samples/sec Loss 2.1782 LearningRate 0.000309 Epoch: 19 Global Step: 414260 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:15:57,143-Speed 2496.37 samples/sec Loss 2.1886 LearningRate 0.000309 Epoch: 19 Global Step: 414270 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:16:05,344-Speed 2497.83 samples/sec Loss 2.1616 LearningRate 0.000309 Epoch: 19 Global Step: 414280 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:16:13,548-Speed 2496.74 samples/sec Loss 2.1654 LearningRate 0.000309 Epoch: 19 Global Step: 414290 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:16:21,753-Speed 2496.39 samples/sec Loss 2.1031 LearningRate 0.000309 Epoch: 19 Global Step: 414300 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:16:29,904-Speed 2513.04 samples/sec Loss 2.1441 LearningRate 0.000309 Epoch: 19 Global Step: 414310 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:16:38,104-Speed 2497.92 samples/sec Loss 2.1188 LearningRate 0.000309 Epoch: 19 Global Step: 414320 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:16:46,309-Speed 2496.61 samples/sec Loss 2.1274 LearningRate 0.000309 Epoch: 19 Global Step: 414330 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:16:54,512-Speed 2496.72 samples/sec Loss 2.1398 LearningRate 0.000309 Epoch: 19 Global Step: 414340 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:17:02,712-Speed 2498.13 samples/sec Loss 2.1388 
LearningRate 0.000309 Epoch: 19 Global Step: 414350 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:17:10,915-Speed 2496.95 samples/sec Loss 2.1516 LearningRate 0.000309 Epoch: 19 Global Step: 414360 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:17:19,061-Speed 2515.00 samples/sec Loss 2.1293 LearningRate 0.000309 Epoch: 19 Global Step: 414370 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:17:27,263-Speed 2497.58 samples/sec Loss 2.0932 LearningRate 0.000309 Epoch: 19 Global Step: 414380 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:17:35,466-Speed 2496.79 samples/sec Loss 2.1224 LearningRate 0.000309 Epoch: 19 Global Step: 414390 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:17:43,670-Speed 2497.16 samples/sec Loss 2.1031 LearningRate 0.000309 Epoch: 19 Global Step: 414400 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:17:51,870-Speed 2498.01 samples/sec Loss 2.1061 LearningRate 0.000309 Epoch: 19 Global Step: 414410 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:18:00,074-Speed 2496.46 samples/sec Loss 2.1794 LearningRate 0.000309 Epoch: 19 Global Step: 414420 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:18:08,223-Speed 2513.65 samples/sec Loss 2.0946 LearningRate 0.000309 Epoch: 19 Global Step: 414430 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:18:16,426-Speed 2497.03 samples/sec Loss 2.1683 LearningRate 0.000309 Epoch: 19 Global Step: 414440 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:18:24,626-Speed 2498.07 samples/sec Loss 2.1255 LearningRate 0.000309 Epoch: 19 Global Step: 414450 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:18:32,826-Speed 2497.93 samples/sec Loss 2.1384 LearningRate 0.000309 Epoch: 19 Global Step: 414460 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:18:41,028-Speed 2497.28 samples/sec Loss 2.2095 
LearningRate 0.000309 Epoch: 19 Global Step: 414470 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:18:49,232-Speed 2496.61 samples/sec Loss 2.1346 LearningRate 0.000309 Epoch: 19 Global Step: 414480 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:18:57,386-Speed 2512.42 samples/sec Loss 2.1389 LearningRate 0.000309 Epoch: 19 Global Step: 414490 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:19:05,588-Speed 2497.26 samples/sec Loss 2.1888 LearningRate 0.000309 Epoch: 19 Global Step: 414500 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:19:13,788-Speed 2497.76 samples/sec Loss 2.1425 LearningRate 0.000309 Epoch: 19 Global Step: 414510 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:19:21,991-Speed 2497.17 samples/sec Loss 2.1267 LearningRate 0.000309 Epoch: 19 Global Step: 414520 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:19:30,193-Speed 2497.40 samples/sec Loss 2.1512 LearningRate 0.000309 Epoch: 19 Global Step: 414530 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:19:38,410-Speed 2492.68 samples/sec Loss 2.1478 LearningRate 0.000309 Epoch: 19 Global Step: 414540 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:19:46,565-Speed 2511.84 samples/sec Loss 2.1287 LearningRate 0.000309 Epoch: 19 Global Step: 414550 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:19:54,765-Speed 2498.26 samples/sec Loss 2.1176 LearningRate 0.000309 Epoch: 19 Global Step: 414560 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:20:02,965-Speed 2497.77 samples/sec Loss 2.1211 LearningRate 0.000309 Epoch: 19 Global Step: 414570 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:20:11,168-Speed 2497.15 samples/sec Loss 2.1322 LearningRate 0.000309 Epoch: 19 Global Step: 414580 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:20:19,370-Speed 2496.99 samples/sec Loss 2.1147 
LearningRate 0.000309 Epoch: 19 Global Step: 414590 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:20:27,576-Speed 2496.73 samples/sec Loss 2.1902 LearningRate 0.000309 Epoch: 19 Global Step: 414600 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:20:35,725-Speed 2513.72 samples/sec Loss 2.1657 LearningRate 0.000309 Epoch: 19 Global Step: 414610 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:20:43,927-Speed 2497.34 samples/sec Loss 2.1278 LearningRate 0.000309 Epoch: 19 Global Step: 414620 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:20:52,129-Speed 2497.42 samples/sec Loss 2.1781 LearningRate 0.000309 Epoch: 19 Global Step: 414630 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:21:00,328-Speed 2498.18 samples/sec Loss 2.1307 LearningRate 0.000309 Epoch: 19 Global Step: 414640 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:21:08,529-Speed 2497.65 samples/sec Loss 2.1441 LearningRate 0.000309 Epoch: 19 Global Step: 414650 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:21:16,729-Speed 2497.85 samples/sec Loss 2.2038 LearningRate 0.000309 Epoch: 19 Global Step: 414660 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:21:24,876-Speed 2514.51 samples/sec Loss 2.1285 LearningRate 0.000309 Epoch: 19 Global Step: 414670 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:21:33,095-Speed 2492.21 samples/sec Loss 2.1571 LearningRate 0.000309 Epoch: 19 Global Step: 414680 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:21:41,295-Speed 2497.96 samples/sec Loss 2.1775 LearningRate 0.000309 Epoch: 19 Global Step: 414690 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:21:49,494-Speed 2498.10 samples/sec Loss 2.1836 LearningRate 0.000309 Epoch: 19 Global Step: 414700 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:21:57,696-Speed 2497.40 samples/sec Loss 2.1629 
LearningRate 0.000309 Epoch: 19 Global Step: 414710 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:22:05,898-Speed 2497.37 samples/sec Loss 2.1485 LearningRate 0.000309 Epoch: 19 Global Step: 414720 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:22:14,049-Speed 2513.10 samples/sec Loss 2.2051 LearningRate 0.000309 Epoch: 19 Global Step: 414730 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:22:22,256-Speed 2495.73 samples/sec Loss 2.1742 LearningRate 0.000309 Epoch: 19 Global Step: 414740 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:22:30,456-Speed 2498.15 samples/sec Loss 2.2107 LearningRate 0.000309 Epoch: 19 Global Step: 414750 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:22:38,658-Speed 2497.39 samples/sec Loss 2.1996 LearningRate 0.000309 Epoch: 19 Global Step: 414760 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:22:46,869-Speed 2494.56 samples/sec Loss 2.1536 LearningRate 0.000309 Epoch: 19 Global Step: 414770 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:22:55,068-Speed 2498.39 samples/sec Loss 2.1398 LearningRate 0.000309 Epoch: 19 Global Step: 414780 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:23:03,229-Speed 2510.05 samples/sec Loss 2.2042 LearningRate 0.000309 Epoch: 19 Global Step: 414790 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:23:13,934-Speed 1913.33 samples/sec Loss 2.1478 LearningRate 0.000309 Epoch: 20 Global Step: 414800 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:23:22,128-Speed 2499.76 samples/sec Loss 2.1123 LearningRate 0.000309 Epoch: 20 Global Step: 414810 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:23:30,340-Speed 2494.36 samples/sec Loss 2.2057 LearningRate 0.000309 Epoch: 20 Global Step: 414820 Fp16 Grad Scale: 32768 Required: 95 hours Training: 2022-07-09 13:23:38,496-Speed 2511.44 samples/sec Loss 2.1233 
LearningRate 0.000309 Epoch: 20 Global Step: 414830 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:23:46,701-Speed 2496.36 samples/sec Loss 2.1387 LearningRate 0.000309 Epoch: 20 Global Step: 414840 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:23:54,867-Speed 2508.49 samples/sec Loss 2.1666 LearningRate 0.000309 Epoch: 20 Global Step: 414850 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:24:03,068-Speed 2497.65 samples/sec Loss 2.1229 LearningRate 0.000309 Epoch: 20 Global Step: 414860 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:24:11,274-Speed 2496.51 samples/sec Loss 2.1369 LearningRate 0.000309 Epoch: 20 Global Step: 414870 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:24:19,474-Speed 2498.03 samples/sec Loss 2.1173 LearningRate 0.000308 Epoch: 20 Global Step: 414880 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:24:27,673-Speed 2498.03 samples/sec Loss 2.1250 LearningRate 0.000308 Epoch: 20 Global Step: 414890 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:24:35,875-Speed 2497.23 samples/sec Loss 2.1697 LearningRate 0.000308 Epoch: 20 Global Step: 414900 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:24:44,022-Speed 2514.38 samples/sec Loss 2.1703 LearningRate 0.000308 Epoch: 20 Global Step: 414910 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:24:52,221-Speed 2498.28 samples/sec Loss 2.1076 LearningRate 0.000308 Epoch: 20 Global Step: 414920 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:25:00,423-Speed 2497.24 samples/sec Loss 2.1663 LearningRate 0.000308 Epoch: 20 Global Step: 414930 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:25:08,622-Speed 2498.61 samples/sec Loss 2.1043 LearningRate 0.000308 Epoch: 20 Global Step: 414940 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:25:16,827-Speed 2496.63 samples/sec Loss 2.1466 
LearningRate 0.000308 Epoch: 20 Global Step: 414950 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:25:25,027-Speed 2497.73 samples/sec Loss 2.1330 LearningRate 0.000308 Epoch: 20 Global Step: 414960 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:25:33,175-Speed 2513.87 samples/sec Loss 2.1122 LearningRate 0.000308 Epoch: 20 Global Step: 414970 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:25:41,377-Speed 2497.73 samples/sec Loss 2.0945 LearningRate 0.000308 Epoch: 20 Global Step: 414980 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:25:49,573-Speed 2498.88 samples/sec Loss 2.0933 LearningRate 0.000308 Epoch: 20 Global Step: 414990 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:25:57,774-Speed 2497.85 samples/sec Loss 2.1419 LearningRate 0.000308 Epoch: 20 Global Step: 415000 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:26:05,978-Speed 2496.62 samples/sec Loss 2.1914 LearningRate 0.000308 Epoch: 20 Global Step: 415010 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:26:14,192-Speed 2493.66 samples/sec Loss 2.1136 LearningRate 0.000308 Epoch: 20 Global Step: 415020 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:26:22,342-Speed 2513.40 samples/sec Loss 2.1005 LearningRate 0.000308 Epoch: 20 Global Step: 415030 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:26:30,544-Speed 2497.35 samples/sec Loss 2.1052 LearningRate 0.000308 Epoch: 20 Global Step: 415040 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:26:38,744-Speed 2497.88 samples/sec Loss 2.1099 LearningRate 0.000308 Epoch: 20 Global Step: 415050 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:26:46,949-Speed 2496.86 samples/sec Loss 2.1203 LearningRate 0.000308 Epoch: 20 Global Step: 415060 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:26:55,148-Speed 2498.66 samples/sec Loss 2.1712 
LearningRate 0.000308 Epoch: 20 Global Step: 415070 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:27:03,348-Speed 2497.99 samples/sec Loss 2.1227 LearningRate 0.000308 Epoch: 20 Global Step: 415080 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:27:11,496-Speed 2513.74 samples/sec Loss 2.1031 LearningRate 0.000308 Epoch: 20 Global Step: 415090 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:27:19,695-Speed 2498.32 samples/sec Loss 2.1363 LearningRate 0.000308 Epoch: 20 Global Step: 415100 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:27:27,893-Speed 2498.65 samples/sec Loss 2.1732 LearningRate 0.000308 Epoch: 20 Global Step: 415110 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:27:36,096-Speed 2497.00 samples/sec Loss 2.0985 LearningRate 0.000308 Epoch: 20 Global Step: 415120 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:27:44,294-Speed 2498.52 samples/sec Loss 2.1177 LearningRate 0.000308 Epoch: 20 Global Step: 415130 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:27:52,497-Speed 2497.23 samples/sec Loss 2.0795 LearningRate 0.000308 Epoch: 20 Global Step: 415140 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:28:00,655-Speed 2510.83 samples/sec Loss 2.1263 LearningRate 0.000308 Epoch: 20 Global Step: 415150 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:28:08,857-Speed 2497.19 samples/sec Loss 2.0909 LearningRate 0.000308 Epoch: 20 Global Step: 415160 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:28:17,056-Speed 2498.34 samples/sec Loss 2.1105 LearningRate 0.000308 Epoch: 20 Global Step: 415170 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:28:25,267-Speed 2494.80 samples/sec Loss 2.0919 LearningRate 0.000308 Epoch: 20 Global Step: 415180 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:28:33,463-Speed 2499.07 samples/sec Loss 2.0975 
LearningRate 0.000308 Epoch: 20 Global Step: 415190 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:28:41,671-Speed 2495.64 samples/sec Loss 2.1259 LearningRate 0.000308 Epoch: 20 Global Step: 415200 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:28:49,814-Speed 2515.48 samples/sec Loss 2.1835 LearningRate 0.000308 Epoch: 20 Global Step: 415210 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:28:58,014-Speed 2498.34 samples/sec Loss 2.1309 LearningRate 0.000308 Epoch: 20 Global Step: 415220 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:29:06,214-Speed 2497.75 samples/sec Loss 2.1081 LearningRate 0.000308 Epoch: 20 Global Step: 415230 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:29:14,413-Speed 2498.31 samples/sec Loss 2.0761 LearningRate 0.000308 Epoch: 20 Global Step: 415240 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:29:22,612-Speed 2498.58 samples/sec Loss 2.1401 LearningRate 0.000308 Epoch: 20 Global Step: 415250 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:29:30,816-Speed 2496.80 samples/sec Loss 2.1594 LearningRate 0.000308 Epoch: 20 Global Step: 415260 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:29:38,959-Speed 2515.20 samples/sec Loss 2.1305 LearningRate 0.000308 Epoch: 20 Global Step: 415270 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:29:47,160-Speed 2497.71 samples/sec Loss 2.1476 LearningRate 0.000308 Epoch: 20 Global Step: 415280 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:29:55,362-Speed 2497.33 samples/sec Loss 2.1029 LearningRate 0.000308 Epoch: 20 Global Step: 415290 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:30:03,561-Speed 2498.36 samples/sec Loss 2.0859 LearningRate 0.000308 Epoch: 20 Global Step: 415300 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:30:11,770-Speed 2495.21 samples/sec Loss 2.1662 
LearningRate 0.000308 Epoch: 20 Global Step: 415310 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:30:19,969-Speed 2498.19 samples/sec Loss 2.0984 LearningRate 0.000308 Epoch: 20 Global Step: 415320 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:30:28,114-Speed 2515.46 samples/sec Loss 2.1155 LearningRate 0.000308 Epoch: 20 Global Step: 415330 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:30:36,316-Speed 2497.68 samples/sec Loss 2.0902 LearningRate 0.000308 Epoch: 20 Global Step: 415340 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:30:44,518-Speed 2497.24 samples/sec Loss 2.1327 LearningRate 0.000308 Epoch: 20 Global Step: 415350 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:30:52,718-Speed 2498.23 samples/sec Loss 2.1002 LearningRate 0.000308 Epoch: 20 Global Step: 415360 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:31:00,916-Speed 2498.59 samples/sec Loss 2.1124 LearningRate 0.000308 Epoch: 20 Global Step: 415370 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:31:09,134-Speed 2492.56 samples/sec Loss 2.1143 LearningRate 0.000308 Epoch: 20 Global Step: 415380 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:31:17,276-Speed 2515.60 samples/sec Loss 2.1596 LearningRate 0.000308 Epoch: 20 Global Step: 415390 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:31:25,474-Speed 2498.95 samples/sec Loss 2.1177 LearningRate 0.000308 Epoch: 20 Global Step: 415400 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:31:33,677-Speed 2497.24 samples/sec Loss 2.1307 LearningRate 0.000308 Epoch: 20 Global Step: 415410 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:31:41,874-Speed 2498.67 samples/sec Loss 2.1511 LearningRate 0.000308 Epoch: 20 Global Step: 415420 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:31:50,071-Speed 2498.96 samples/sec Loss 2.1226 
LearningRate 0.000308 Epoch: 20 Global Step: 415430 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:31:58,272-Speed 2497.87 samples/sec Loss 2.1745 LearningRate 0.000308 Epoch: 20 Global Step: 415440 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:32:06,416-Speed 2515.14 samples/sec Loss 2.1094 LearningRate 0.000308 Epoch: 20 Global Step: 415450 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:32:14,614-Speed 2498.70 samples/sec Loss 2.1097 LearningRate 0.000308 Epoch: 20 Global Step: 415460 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:32:22,814-Speed 2497.85 samples/sec Loss 2.1069 LearningRate 0.000308 Epoch: 20 Global Step: 415470 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:32:31,013-Speed 2498.68 samples/sec Loss 2.0853 LearningRate 0.000308 Epoch: 20 Global Step: 415480 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:32:39,212-Speed 2498.03 samples/sec Loss 2.1494 LearningRate 0.000308 Epoch: 20 Global Step: 415490 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:32:47,427-Speed 2493.33 samples/sec Loss 2.1330 LearningRate 0.000308 Epoch: 20 Global Step: 415500 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:32:55,571-Speed 2515.14 samples/sec Loss 2.0866 LearningRate 0.000308 Epoch: 20 Global Step: 415510 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:33:03,775-Speed 2496.92 samples/sec Loss 2.0613 LearningRate 0.000308 Epoch: 20 Global Step: 415520 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:33:11,988-Speed 2494.04 samples/sec Loss 2.1732 LearningRate 0.000308 Epoch: 20 Global Step: 415530 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:33:20,193-Speed 2496.43 samples/sec Loss 2.1439 LearningRate 0.000308 Epoch: 20 Global Step: 415540 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:33:28,392-Speed 2498.25 samples/sec Loss 2.1647 
LearningRate 0.000307 Epoch: 20 Global Step: 415550 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:33:36,594-Speed 2497.40 samples/sec Loss 2.1720 LearningRate 0.000307 Epoch: 20 Global Step: 415560 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:33:44,751-Speed 2511.16 samples/sec Loss 2.1662 LearningRate 0.000307 Epoch: 20 Global Step: 415570 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:33:52,950-Speed 2498.33 samples/sec Loss 2.1292 LearningRate 0.000307 Epoch: 20 Global Step: 415580 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:34:01,154-Speed 2496.61 samples/sec Loss 2.1247 LearningRate 0.000307 Epoch: 20 Global Step: 415590 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:34:09,355-Speed 2497.68 samples/sec Loss 2.1599 LearningRate 0.000307 Epoch: 20 Global Step: 415600 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:34:17,554-Speed 2498.37 samples/sec Loss 2.1279 LearningRate 0.000307 Epoch: 20 Global Step: 415610 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:34:25,751-Speed 2498.74 samples/sec Loss 2.1334 LearningRate 0.000307 Epoch: 20 Global Step: 415620 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:34:33,898-Speed 2514.02 samples/sec Loss 2.1549 LearningRate 0.000307 Epoch: 20 Global Step: 415630 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:34:42,097-Speed 2498.26 samples/sec Loss 2.1535 LearningRate 0.000307 Epoch: 20 Global Step: 415640 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:34:50,305-Speed 2495.58 samples/sec Loss 2.1196 LearningRate 0.000307 Epoch: 20 Global Step: 415650 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:34:58,501-Speed 2499.00 samples/sec Loss 2.0949 LearningRate 0.000307 Epoch: 20 Global Step: 415660 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:35:06,700-Speed 2498.44 samples/sec Loss 2.1443 
LearningRate 0.000307 Epoch: 20 Global Step: 415670 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:35:14,903-Speed 2496.94 samples/sec Loss 2.1177 LearningRate 0.000307 Epoch: 20 Global Step: 415680 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:35:23,054-Speed 2513.30 samples/sec Loss 2.1108 LearningRate 0.000307 Epoch: 20 Global Step: 415690 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:35:31,255-Speed 2497.71 samples/sec Loss 2.1380 LearningRate 0.000307 Epoch: 20 Global Step: 415700 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:35:39,455-Speed 2497.91 samples/sec Loss 2.1052 LearningRate 0.000307 Epoch: 20 Global Step: 415710 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:35:47,653-Speed 2498.59 samples/sec Loss 2.1328 LearningRate 0.000307 Epoch: 20 Global Step: 415720 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:35:55,863-Speed 2494.77 samples/sec Loss 2.1088 LearningRate 0.000307 Epoch: 20 Global Step: 415730 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:36:04,060-Speed 2498.62 samples/sec Loss 2.1381 LearningRate 0.000307 Epoch: 20 Global Step: 415740 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:36:12,207-Speed 2514.31 samples/sec Loss 2.1075 LearningRate 0.000307 Epoch: 20 Global Step: 415750 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:36:20,405-Speed 2498.60 samples/sec Loss 2.1311 LearningRate 0.000307 Epoch: 20 Global Step: 415760 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:36:28,607-Speed 2497.34 samples/sec Loss 2.0880 LearningRate 0.000307 Epoch: 20 Global Step: 415770 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:36:36,822-Speed 2493.54 samples/sec Loss 2.1667 LearningRate 0.000307 Epoch: 20 Global Step: 415780 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:36:45,024-Speed 2497.29 samples/sec Loss 2.1349 
LearningRate 0.000307 Epoch: 20 Global Step: 415790 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:36:53,225-Speed 2497.60 samples/sec Loss 2.1299 LearningRate 0.000307 Epoch: 20 Global Step: 415800 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:37:01,372-Speed 2514.36 samples/sec Loss 2.1061 LearningRate 0.000307 Epoch: 20 Global Step: 415810 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:37:09,575-Speed 2496.73 samples/sec Loss 2.1640 LearningRate 0.000307 Epoch: 20 Global Step: 415820 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:37:17,777-Speed 2497.60 samples/sec Loss 2.1306 LearningRate 0.000307 Epoch: 20 Global Step: 415830 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:37:25,978-Speed 2497.50 samples/sec Loss 2.1272 LearningRate 0.000307 Epoch: 20 Global Step: 415840 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:37:34,194-Speed 2493.23 samples/sec Loss 2.0900 LearningRate 0.000307 Epoch: 20 Global Step: 415850 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:37:42,395-Speed 2497.59 samples/sec Loss 2.1117 LearningRate 0.000307 Epoch: 20 Global Step: 415860 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:37:50,547-Speed 2512.75 samples/sec Loss 2.0891 LearningRate 0.000307 Epoch: 20 Global Step: 415870 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:37:58,754-Speed 2495.70 samples/sec Loss 2.1575 LearningRate 0.000307 Epoch: 20 Global Step: 415880 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:38:06,960-Speed 2496.05 samples/sec Loss 2.1163 LearningRate 0.000307 Epoch: 20 Global Step: 415890 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:38:15,163-Speed 2497.37 samples/sec Loss 2.1641 LearningRate 0.000307 Epoch: 20 Global Step: 415900 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:38:23,369-Speed 2496.23 samples/sec Loss 2.1139 
LearningRate 0.000307 Epoch: 20 Global Step: 415910 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:38:31,570-Speed 2497.57 samples/sec Loss 2.1379 LearningRate 0.000307 Epoch: 20 Global Step: 415920 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:38:39,716-Speed 2514.46 samples/sec Loss 2.1271 LearningRate 0.000307 Epoch: 20 Global Step: 415930 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:38:47,914-Speed 2498.65 samples/sec Loss 2.1144 LearningRate 0.000307 Epoch: 20 Global Step: 415940 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:38:56,111-Speed 2498.84 samples/sec Loss 2.1242 LearningRate 0.000307 Epoch: 20 Global Step: 415950 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:39:04,309-Speed 2498.65 samples/sec Loss 2.1437 LearningRate 0.000307 Epoch: 20 Global Step: 415960 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:39:12,514-Speed 2496.43 samples/sec Loss 2.1161 LearningRate 0.000307 Epoch: 20 Global Step: 415970 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:39:20,715-Speed 2497.67 samples/sec Loss 2.1643 LearningRate 0.000307 Epoch: 20 Global Step: 415980 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:39:28,860-Speed 2514.69 samples/sec Loss 2.1117 LearningRate 0.000307 Epoch: 20 Global Step: 415990 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:39:37,058-Speed 2499.14 samples/sec Loss 2.1465 LearningRate 0.000307 Epoch: 20 Global Step: 416000 Fp16 Grad Scale: 16384 Required: 95 hours Training: 2022-07-09 13:39:45,220-Speed 2509.33 samples/sec Loss 2.1175 LearningRate 0.000307 Epoch: 20 Global Step: 416010 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:39:53,419-Speed 2498.40 samples/sec Loss 2.1481 LearningRate 0.000307 Epoch: 20 Global Step: 416020 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:40:01,615-Speed 2499.19 samples/sec Loss 2.1594 
LearningRate 0.000307 Epoch: 20 Global Step: 416030 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:40:09,814-Speed 2498.30 samples/sec Loss 2.1495 LearningRate 0.000307 Epoch: 20 Global Step: 416040 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:40:17,960-Speed 2514.70 samples/sec Loss 2.0613 LearningRate 0.000307 Epoch: 20 Global Step: 416050 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:40:26,157-Speed 2498.98 samples/sec Loss 2.1315 LearningRate 0.000307 Epoch: 20 Global Step: 416060 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:40:34,359-Speed 2497.24 samples/sec Loss 2.1089 LearningRate 0.000307 Epoch: 20 Global Step: 416070 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:40:42,567-Speed 2495.57 samples/sec Loss 2.1079 LearningRate 0.000307 Epoch: 20 Global Step: 416080 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:40:50,767-Speed 2497.82 samples/sec Loss 2.0905 LearningRate 0.000307 Epoch: 20 Global Step: 416090 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:40:58,970-Speed 2496.89 samples/sec Loss 2.1117 LearningRate 0.000307 Epoch: 20 Global Step: 416100 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:41:07,119-Speed 2513.73 samples/sec Loss 2.0985 LearningRate 0.000307 Epoch: 20 Global Step: 416110 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:41:15,319-Speed 2498.05 samples/sec Loss 2.1438 LearningRate 0.000307 Epoch: 20 Global Step: 416120 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:41:23,524-Speed 2496.58 samples/sec Loss 2.1289 LearningRate 0.000307 Epoch: 20 Global Step: 416130 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:41:31,727-Speed 2496.75 samples/sec Loss 2.1761 LearningRate 0.000307 Epoch: 20 Global Step: 416140 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:41:39,927-Speed 2498.15 samples/sec Loss 2.1239 LearningRate 
0.000307 Epoch: 20 Global Step: 416150 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:41:48,129-Speed 2497.13 samples/sec Loss 2.1493 LearningRate 0.000307 Epoch: 20 Global Step: 416160 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:41:56,276-Speed 2514.37 samples/sec Loss 2.1671 LearningRate 0.000307 Epoch: 20 Global Step: 416170 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:42:04,478-Speed 2497.31 samples/sec Loss 2.1067 LearningRate 0.000307 Epoch: 20 Global Step: 416180 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:42:12,687-Speed 2495.40 samples/sec Loss 2.1170 LearningRate 0.000307 Epoch: 20 Global Step: 416190 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:42:20,892-Speed 2496.42 samples/sec Loss 2.1796 LearningRate 0.000307 Epoch: 20 Global Step: 416200 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:42:29,095-Speed 2496.85 samples/sec Loss 2.0996 LearningRate 0.000307 Epoch: 20 Global Step: 416210 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:42:37,300-Speed 2496.50 samples/sec Loss 2.0880 LearningRate 0.000307 Epoch: 20 Global Step: 416220 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:42:45,444-Speed 2515.14 samples/sec Loss 2.1649 LearningRate 0.000306 Epoch: 20 Global Step: 416230 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:42:53,646-Speed 2497.17 samples/sec Loss 2.0907 LearningRate 0.000306 Epoch: 20 Global Step: 416240 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:43:01,849-Speed 2497.29 samples/sec Loss 2.1035 LearningRate 0.000306 Epoch: 20 Global Step: 416250 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:43:10,049-Speed 2497.98 samples/sec Loss 2.1018 LearningRate 0.000306 Epoch: 20 Global Step: 416260 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:43:18,258-Speed 2494.92 samples/sec Loss 2.0915 LearningRate 0.000306 Epoch: 20 
Global Step: 416270 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:43:26,458-Speed 2498.42 samples/sec Loss 2.1469 LearningRate 0.000306 Epoch: 20 Global Step: 416280 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:43:34,607-Speed 2513.52 samples/sec Loss 2.0979 LearningRate 0.000306 Epoch: 20 Global Step: 416290 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:43:42,810-Speed 2497.01 samples/sec Loss 2.0867 LearningRate 0.000306 Epoch: 20 Global Step: 416300 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:43:51,015-Speed 2497.08 samples/sec Loss 2.1158 LearningRate 0.000306 Epoch: 20 Global Step: 416310 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:43:59,214-Speed 2498.37 samples/sec Loss 2.1251 LearningRate 0.000306 Epoch: 20 Global Step: 416320 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:44:07,416-Speed 2497.26 samples/sec Loss 2.1304 LearningRate 0.000306 Epoch: 20 Global Step: 416330 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:44:15,622-Speed 2496.20 samples/sec Loss 2.1269 LearningRate 0.000306 Epoch: 20 Global Step: 416340 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:44:23,784-Speed 2509.57 samples/sec Loss 2.1470 LearningRate 0.000306 Epoch: 20 Global Step: 416350 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:44:31,997-Speed 2494.02 samples/sec Loss 2.0933 LearningRate 0.000306 Epoch: 20 Global Step: 416360 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:44:40,203-Speed 2496.14 samples/sec Loss 2.1067 LearningRate 0.000306 Epoch: 20 Global Step: 416370 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:44:48,401-Speed 2498.34 samples/sec Loss 2.1198 LearningRate 0.000306 Epoch: 20 Global Step: 416380 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:44:56,604-Speed 2497.28 samples/sec Loss 2.1107 LearningRate 0.000306 Epoch: 20 Global Step: 416390 
Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:45:04,816-Speed 2494.46 samples/sec Loss 2.1145 LearningRate 0.000306 Epoch: 20 Global Step: 416400 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:45:12,963-Speed 2514.13 samples/sec Loss 2.1154 LearningRate 0.000306 Epoch: 20 Global Step: 416410 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:45:21,165-Speed 2497.78 samples/sec Loss 2.1662 LearningRate 0.000306 Epoch: 20 Global Step: 416420 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:45:29,365-Speed 2497.70 samples/sec Loss 2.0963 LearningRate 0.000306 Epoch: 20 Global Step: 416430 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:45:37,579-Speed 2494.11 samples/sec Loss 2.1595 LearningRate 0.000306 Epoch: 20 Global Step: 416440 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:45:45,779-Speed 2497.64 samples/sec Loss 2.1545 LearningRate 0.000306 Epoch: 20 Global Step: 416450 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:45:53,981-Speed 2497.43 samples/sec Loss 2.1687 LearningRate 0.000306 Epoch: 20 Global Step: 416460 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:46:02,128-Speed 2514.46 samples/sec Loss 2.1129 LearningRate 0.000306 Epoch: 20 Global Step: 416470 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:46:10,331-Speed 2497.22 samples/sec Loss 2.1161 LearningRate 0.000306 Epoch: 20 Global Step: 416480 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:46:18,530-Speed 2498.43 samples/sec Loss 2.1442 LearningRate 0.000306 Epoch: 20 Global Step: 416490 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:46:26,727-Speed 2498.58 samples/sec Loss 2.1391 LearningRate 0.000306 Epoch: 20 Global Step: 416500 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:46:34,929-Speed 2497.54 samples/sec Loss 2.0900 LearningRate 0.000306 Epoch: 20 Global Step: 416510 Fp16 Grad Scale: 
8192 Required: 95 hours Training: 2022-07-09 13:46:43,125-Speed 2499.27 samples/sec Loss 2.0988 LearningRate 0.000306 Epoch: 20 Global Step: 416520 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:46:51,271-Speed 2514.51 samples/sec Loss 2.1027 LearningRate 0.000306 Epoch: 20 Global Step: 416530 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:46:59,472-Speed 2497.66 samples/sec Loss 2.1472 LearningRate 0.000306 Epoch: 20 Global Step: 416540 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:47:07,671-Speed 2498.41 samples/sec Loss 2.1578 LearningRate 0.000306 Epoch: 20 Global Step: 416550 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:47:15,871-Speed 2497.69 samples/sec Loss 2.0958 LearningRate 0.000306 Epoch: 20 Global Step: 416560 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:47:24,070-Speed 2498.49 samples/sec Loss 2.1272 LearningRate 0.000306 Epoch: 20 Global Step: 416570 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:47:32,269-Speed 2498.33 samples/sec Loss 2.1281 LearningRate 0.000306 Epoch: 20 Global Step: 416580 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:47:40,418-Speed 2513.70 samples/sec Loss 2.1534 LearningRate 0.000306 Epoch: 20 Global Step: 416590 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:47:48,615-Speed 2498.62 samples/sec Loss 2.1117 LearningRate 0.000306 Epoch: 20 Global Step: 416600 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:47:56,827-Speed 2494.30 samples/sec Loss 2.2045 LearningRate 0.000306 Epoch: 20 Global Step: 416610 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:48:05,030-Speed 2497.36 samples/sec Loss 2.1437 LearningRate 0.000306 Epoch: 20 Global Step: 416620 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:48:13,240-Speed 2494.73 samples/sec Loss 2.1583 LearningRate 0.000306 Epoch: 20 Global Step: 416630 Fp16 Grad Scale: 8192 Required: 95 
hours Training: 2022-07-09 13:48:21,450-Speed 2494.84 samples/sec Loss 2.1733 LearningRate 0.000306 Epoch: 20 Global Step: 416640 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:48:29,602-Speed 2513.26 samples/sec Loss 2.1317 LearningRate 0.000306 Epoch: 20 Global Step: 416650 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:48:37,797-Speed 2499.32 samples/sec Loss 2.1628 LearningRate 0.000306 Epoch: 20 Global Step: 416660 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:48:45,998-Speed 2497.71 samples/sec Loss 2.0990 LearningRate 0.000306 Epoch: 20 Global Step: 416670 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:48:54,196-Speed 2498.61 samples/sec Loss 2.1826 LearningRate 0.000306 Epoch: 20 Global Step: 416680 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:49:02,392-Speed 2499.29 samples/sec Loss 2.1405 LearningRate 0.000306 Epoch: 20 Global Step: 416690 Fp16 Grad Scale: 8192 Required: 95 hours Training: 2022-07-09 13:49:10,591-Speed 2498.29 samples/sec Loss 2.1159 LearningRate 0.000306 Epoch: 20 Global Step: 416700 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:49:18,737-Speed 2514.69 samples/sec Loss 2.1123 LearningRate 0.000306 Epoch: 20 Global Step: 416710 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:49:26,935-Speed 2498.48 samples/sec Loss 2.0720 LearningRate 0.000306 Epoch: 20 Global Step: 416720 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:49:35,129-Speed 2499.88 samples/sec Loss 2.0883 LearningRate 0.000306 Epoch: 20 Global Step: 416730 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:49:43,342-Speed 2493.90 samples/sec Loss 2.1524 LearningRate 0.000306 Epoch: 20 Global Step: 416740 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:49:51,539-Speed 2498.79 samples/sec Loss 2.1172 LearningRate 0.000306 Epoch: 20 Global Step: 416750 Fp16 Grad Scale: 8192 Required: 94 hours Training: 
2022-07-09 13:49:59,743-Speed 2496.73 samples/sec Loss 2.1038 LearningRate 0.000306 Epoch: 20 Global Step: 416760 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:50:07,891-Speed 2513.89 samples/sec Loss 2.1394 LearningRate 0.000306 Epoch: 20 Global Step: 416770 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:50:16,091-Speed 2498.09 samples/sec Loss 2.1259 LearningRate 0.000306 Epoch: 20 Global Step: 416780 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:50:24,291-Speed 2498.09 samples/sec Loss 2.0774 LearningRate 0.000306 Epoch: 20 Global Step: 416790 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:50:32,487-Speed 2498.97 samples/sec Loss 2.1218 LearningRate 0.000306 Epoch: 20 Global Step: 416800 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:50:40,688-Speed 2497.82 samples/sec Loss 2.1648 LearningRate 0.000306 Epoch: 20 Global Step: 416810 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:50:48,885-Speed 2498.80 samples/sec Loss 2.1345 LearningRate 0.000306 Epoch: 20 Global Step: 416820 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:50:57,029-Speed 2515.04 samples/sec Loss 2.0848 LearningRate 0.000306 Epoch: 20 Global Step: 416830 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:51:05,230-Speed 2497.83 samples/sec Loss 2.1331 LearningRate 0.000306 Epoch: 20 Global Step: 416840 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:51:13,427-Speed 2498.91 samples/sec Loss 2.1575 LearningRate 0.000306 Epoch: 20 Global Step: 416850 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:51:21,624-Speed 2498.73 samples/sec Loss 2.1702 LearningRate 0.000306 Epoch: 20 Global Step: 416860 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:51:29,830-Speed 2496.23 samples/sec Loss 2.1428 LearningRate 0.000306 Epoch: 20 Global Step: 416870 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 
13:51:38,029-Speed 2498.01 samples/sec Loss 2.1238 LearningRate 0.000306 Epoch: 20 Global Step: 416880 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:51:46,171-Speed 2515.88 samples/sec Loss 2.1595 LearningRate 0.000306 Epoch: 20 Global Step: 416890 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:51:54,367-Speed 2499.10 samples/sec Loss 2.0604 LearningRate 0.000305 Epoch: 20 Global Step: 416900 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:52:02,570-Speed 2497.20 samples/sec Loss 2.1270 LearningRate 0.000305 Epoch: 20 Global Step: 416910 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:52:10,781-Speed 2494.62 samples/sec Loss 2.1601 LearningRate 0.000305 Epoch: 20 Global Step: 416920 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:52:18,980-Speed 2498.36 samples/sec Loss 2.1175 LearningRate 0.000305 Epoch: 20 Global Step: 416930 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:52:27,179-Speed 2498.22 samples/sec Loss 2.1379 LearningRate 0.000305 Epoch: 20 Global Step: 416940 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:52:35,326-Speed 2513.87 samples/sec Loss 2.1093 LearningRate 0.000305 Epoch: 20 Global Step: 416950 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:52:43,528-Speed 2497.36 samples/sec Loss 2.1536 LearningRate 0.000305 Epoch: 20 Global Step: 416960 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:52:51,729-Speed 2497.76 samples/sec Loss 2.1682 LearningRate 0.000305 Epoch: 20 Global Step: 416970 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:52:59,927-Speed 2498.97 samples/sec Loss 2.1425 LearningRate 0.000305 Epoch: 20 Global Step: 416980 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:53:08,128-Speed 2497.49 samples/sec Loss 2.1223 LearningRate 0.000305 Epoch: 20 Global Step: 416990 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:53:16,327-Speed 
2498.07 samples/sec Loss 2.1045 LearningRate 0.000305 Epoch: 20 Global Step: 417000 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:53:24,474-Speed 2514.26 samples/sec Loss 2.1422 LearningRate 0.000305 Epoch: 20 Global Step: 417010 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:53:32,678-Speed 2496.89 samples/sec Loss 2.1145 LearningRate 0.000305 Epoch: 20 Global Step: 417020 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:53:40,875-Speed 2498.84 samples/sec Loss 2.1118 LearningRate 0.000305 Epoch: 20 Global Step: 417030 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:53:49,071-Speed 2498.99 samples/sec Loss 2.1032 LearningRate 0.000305 Epoch: 20 Global Step: 417040 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:53:57,269-Speed 2498.82 samples/sec Loss 2.1468 LearningRate 0.000305 Epoch: 20 Global Step: 417050 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:54:05,472-Speed 2496.81 samples/sec Loss 2.1161 LearningRate 0.000305 Epoch: 20 Global Step: 417060 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:54:13,640-Speed 2507.86 samples/sec Loss 2.1300 LearningRate 0.000305 Epoch: 20 Global Step: 417070 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:54:21,836-Speed 2499.14 samples/sec Loss 2.0790 LearningRate 0.000305 Epoch: 20 Global Step: 417080 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:54:30,036-Speed 2498.05 samples/sec Loss 2.1140 LearningRate 0.000305 Epoch: 20 Global Step: 417090 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:54:38,235-Speed 2498.01 samples/sec Loss 2.0986 LearningRate 0.000305 Epoch: 20 Global Step: 417100 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:54:46,430-Speed 2499.40 samples/sec Loss 2.0700 LearningRate 0.000305 Epoch: 20 Global Step: 417110 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:54:54,628-Speed 2498.77 samples/sec 
Loss 2.1518 LearningRate 0.000305 Epoch: 20 Global Step: 417120 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:55:02,769-Speed 2516.06 samples/sec Loss 2.1057 LearningRate 0.000305 Epoch: 20 Global Step: 417130 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:55:10,972-Speed 2496.77 samples/sec Loss 2.1095 LearningRate 0.000305 Epoch: 20 Global Step: 417140 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:55:19,176-Speed 2496.69 samples/sec Loss 2.1514 LearningRate 0.000305 Epoch: 20 Global Step: 417150 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:55:27,378-Speed 2497.61 samples/sec Loss 2.1283 LearningRate 0.000305 Epoch: 20 Global Step: 417160 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:55:35,578-Speed 2497.80 samples/sec Loss 2.0790 LearningRate 0.000305 Epoch: 20 Global Step: 417170 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:55:43,788-Speed 2494.71 samples/sec Loss 2.1496 LearningRate 0.000305 Epoch: 20 Global Step: 417180 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:55:51,932-Speed 2515.31 samples/sec Loss 2.1219 LearningRate 0.000305 Epoch: 20 Global Step: 417190 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:56:00,146-Speed 2493.97 samples/sec Loss 2.1395 LearningRate 0.000305 Epoch: 20 Global Step: 417200 Fp16 Grad Scale: 8192 Required: 94 hours Training: 2022-07-09 13:56:08,362-Speed 2492.88 samples/sec Loss 2.1608 LearningRate 0.000305 Epoch: 20 Global Step: 417210 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:56:16,570-Speed 2495.37 samples/sec Loss 2.1237 LearningRate 0.000305 Epoch: 20 Global Step: 417220 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:56:24,768-Speed 2498.71 samples/sec Loss 2.0756 LearningRate 0.000305 Epoch: 20 Global Step: 417230 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:56:32,979-Speed 2494.39 samples/sec Loss 2.1472 
LearningRate 0.000305 Epoch: 20 Global Step: 417240 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:56:41,129-Speed 2513.43 samples/sec Loss 2.1376 LearningRate 0.000305 Epoch: 20 Global Step: 417250 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:56:49,326-Speed 2498.70 samples/sec Loss 2.1282 LearningRate 0.000305 Epoch: 20 Global Step: 417260 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:56:57,527-Speed 2497.77 samples/sec Loss 2.1224 LearningRate 0.000305 Epoch: 20 Global Step: 417270 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:57:05,727-Speed 2497.90 samples/sec Loss 2.1019 LearningRate 0.000305 Epoch: 20 Global Step: 417280 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:57:13,923-Speed 2499.18 samples/sec Loss 2.1276 LearningRate 0.000305 Epoch: 20 Global Step: 417290 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:57:22,121-Speed 2498.65 samples/sec Loss 2.1078 LearningRate 0.000305 Epoch: 20 Global Step: 417300 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:57:30,266-Speed 2514.69 samples/sec Loss 2.0929 LearningRate 0.000305 Epoch: 20 Global Step: 417310 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:57:38,465-Speed 2498.40 samples/sec Loss 2.0686 LearningRate 0.000305 Epoch: 20 Global Step: 417320 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:57:46,664-Speed 2498.56 samples/sec Loss 2.0912 LearningRate 0.000305 Epoch: 20 Global Step: 417330 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:57:54,860-Speed 2499.00 samples/sec Loss 2.0903 LearningRate 0.000305 Epoch: 20 Global Step: 417340 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:58:03,060-Speed 2497.78 samples/sec Loss 2.1358 LearningRate 0.000305 Epoch: 20 Global Step: 417350 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:58:11,268-Speed 2495.47 samples/sec Loss 2.0937 
LearningRate 0.000305 Epoch: 20 Global Step: 417360 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:58:19,413-Speed 2514.98 samples/sec Loss 2.0749 LearningRate 0.000305 Epoch: 20 Global Step: 417370 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:58:27,613-Speed 2498.05 samples/sec Loss 2.0858 LearningRate 0.000305 Epoch: 20 Global Step: 417380 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:58:35,822-Speed 2495.13 samples/sec Loss 2.0956 LearningRate 0.000305 Epoch: 20 Global Step: 417390 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:58:44,021-Speed 2498.61 samples/sec Loss 2.0798 LearningRate 0.000305 Epoch: 20 Global Step: 417400 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:58:52,223-Speed 2497.14 samples/sec Loss 2.0984 LearningRate 0.000305 Epoch: 20 Global Step: 417410 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:59:00,426-Speed 2497.21 samples/sec Loss 2.0410 LearningRate 0.000305 Epoch: 20 Global Step: 417420 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:59:08,569-Speed 2515.42 samples/sec Loss 2.1061 LearningRate 0.000305 Epoch: 20 Global Step: 417430 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:59:16,769-Speed 2497.72 samples/sec Loss 2.1401 LearningRate 0.000305 Epoch: 20 Global Step: 417440 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:59:24,967-Speed 2498.67 samples/sec Loss 2.0940 LearningRate 0.000305 Epoch: 20 Global Step: 417450 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:59:33,170-Speed 2497.04 samples/sec Loss 2.1299 LearningRate 0.000305 Epoch: 20 Global Step: 417460 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:59:41,372-Speed 2497.37 samples/sec Loss 2.0962 LearningRate 0.000305 Epoch: 20 Global Step: 417470 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:59:49,571-Speed 2498.17 samples/sec Loss 2.1298 
LearningRate 0.000305 Epoch: 20 Global Step: 417480 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 13:59:57,717-Speed 2514.52 samples/sec Loss 2.1121 LearningRate 0.000305 Epoch: 20 Global Step: 417490 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:00:05,920-Speed 2497.06 samples/sec Loss 2.0835 LearningRate 0.000305 Epoch: 20 Global Step: 417500 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:00:14,118-Speed 2498.72 samples/sec Loss 2.0767 LearningRate 0.000305 Epoch: 20 Global Step: 417510 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:00:22,319-Speed 2497.66 samples/sec Loss 2.0830 LearningRate 0.000305 Epoch: 20 Global Step: 417520 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:00:30,519-Speed 2497.74 samples/sec Loss 2.1229 LearningRate 0.000305 Epoch: 20 Global Step: 417530 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:00:38,732-Speed 2494.30 samples/sec Loss 2.1548 LearningRate 0.000305 Epoch: 20 Global Step: 417540 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:00:46,890-Speed 2510.77 samples/sec Loss 2.0908 LearningRate 0.000305 Epoch: 20 Global Step: 417550 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:00:55,088-Speed 2498.47 samples/sec Loss 2.0686 LearningRate 0.000305 Epoch: 20 Global Step: 417560 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:01:03,294-Speed 2496.22 samples/sec Loss 2.0607 LearningRate 0.000305 Epoch: 20 Global Step: 417570 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:01:11,508-Speed 2493.46 samples/sec Loss 2.1104 LearningRate 0.000304 Epoch: 20 Global Step: 417580 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:01:19,711-Speed 2497.03 samples/sec Loss 2.0751 LearningRate 0.000304 Epoch: 20 Global Step: 417590 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:01:27,921-Speed 2495.09 samples/sec Loss 2.0876 
LearningRate 0.000304 Epoch: 20 Global Step: 417600 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:01:36,069-Speed 2514.23 samples/sec Loss 2.1080 LearningRate 0.000304 Epoch: 20 Global Step: 417610 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:01:44,270-Speed 2497.45 samples/sec Loss 2.0499 LearningRate 0.000304 Epoch: 20 Global Step: 417620 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:01:52,469-Speed 2498.65 samples/sec Loss 2.1197 LearningRate 0.000304 Epoch: 20 Global Step: 417630 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:02:00,666-Speed 2499.03 samples/sec Loss 2.0975 LearningRate 0.000304 Epoch: 20 Global Step: 417640 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:02:08,883-Speed 2492.61 samples/sec Loss 2.0821 LearningRate 0.000304 Epoch: 20 Global Step: 417650 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:02:17,087-Speed 2496.61 samples/sec Loss 2.1299 LearningRate 0.000304 Epoch: 20 Global Step: 417660 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:02:25,231-Speed 2515.26 samples/sec Loss 2.0989 LearningRate 0.000304 Epoch: 20 Global Step: 417670 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:02:33,432-Speed 2497.75 samples/sec Loss 2.1352 LearningRate 0.000304 Epoch: 20 Global Step: 417680 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:02:41,629-Speed 2498.97 samples/sec Loss 2.1513 LearningRate 0.000304 Epoch: 20 Global Step: 417690 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:02:49,830-Speed 2497.52 samples/sec Loss 2.1190 LearningRate 0.000304 Epoch: 20 Global Step: 417700 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:02:58,032-Speed 2497.41 samples/sec Loss 2.1137 LearningRate 0.000304 Epoch: 20 Global Step: 417710 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:03:06,228-Speed 2499.11 samples/sec Loss 2.0871 
LearningRate 0.000304 Epoch: 20 Global Step: 417720 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:03:14,374-Speed 2514.32 samples/sec Loss 2.0556 LearningRate 0.000304 Epoch: 20 Global Step: 417730 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:03:22,577-Speed 2497.32 samples/sec Loss 2.1445 LearningRate 0.000304 Epoch: 20 Global Step: 417740 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:03:30,774-Speed 2498.90 samples/sec Loss 2.1326 LearningRate 0.000304 Epoch: 20 Global Step: 417750 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:03:38,972-Speed 2498.34 samples/sec Loss 2.0815 LearningRate 0.000304 Epoch: 20 Global Step: 417760 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:03:47,173-Speed 2497.91 samples/sec Loss 2.1157 LearningRate 0.000304 Epoch: 20 Global Step: 417770 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:03:55,371-Speed 2498.49 samples/sec Loss 2.1218 LearningRate 0.000304 Epoch: 20 Global Step: 417780 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:04:03,521-Speed 2513.33 samples/sec Loss 2.1145 LearningRate 0.000304 Epoch: 20 Global Step: 417790 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:04:11,720-Speed 2498.22 samples/sec Loss 2.1110 LearningRate 0.000304 Epoch: 20 Global Step: 417800 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:04:19,919-Speed 2498.22 samples/sec Loss 2.1472 LearningRate 0.000304 Epoch: 20 Global Step: 417810 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:04:28,123-Speed 2497.45 samples/sec Loss 2.1255 LearningRate 0.000304 Epoch: 20 Global Step: 417820 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:04:36,325-Speed 2497.40 samples/sec Loss 2.1191 LearningRate 0.000304 Epoch: 20 Global Step: 417830 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:04:44,526-Speed 2497.64 samples/sec Loss 2.1528 
LearningRate 0.000304 Epoch: 20 Global Step: 417840 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:04:52,671-Speed 2514.75 samples/sec Loss 2.1345 LearningRate 0.000304 Epoch: 20 Global Step: 417850 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:05:00,874-Speed 2496.99 samples/sec Loss 2.1374 LearningRate 0.000304 Epoch: 20 Global Step: 417860 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:05:09,077-Speed 2497.20 samples/sec Loss 2.1692 LearningRate 0.000304 Epoch: 20 Global Step: 417870 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:05:17,282-Speed 2496.45 samples/sec Loss 2.0753 LearningRate 0.000304 Epoch: 20 Global Step: 417880 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:05:25,485-Speed 2496.97 samples/sec Loss 2.1831 LearningRate 0.000304 Epoch: 20 Global Step: 417890 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:05:33,691-Speed 2496.42 samples/sec Loss 2.1319 LearningRate 0.000304 Epoch: 20 Global Step: 417900 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:05:41,838-Speed 2514.57 samples/sec Loss 2.1170 LearningRate 0.000304 Epoch: 20 Global Step: 417910 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:05:50,036-Speed 2498.28 samples/sec Loss 2.1522 LearningRate 0.000304 Epoch: 20 Global Step: 417920 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:05:58,238-Speed 2497.23 samples/sec Loss 2.1134 LearningRate 0.000304 Epoch: 20 Global Step: 417930 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:06:06,442-Speed 2497.06 samples/sec Loss 2.1471 LearningRate 0.000304 Epoch: 20 Global Step: 417940 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:06:14,644-Speed 2497.38 samples/sec Loss 2.1134 LearningRate 0.000304 Epoch: 20 Global Step: 417950 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:06:22,846-Speed 2497.34 samples/sec Loss 2.0952 
LearningRate 0.000304 Epoch: 20 Global Step: 417960 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:06:30,999-Speed 2512.64 samples/sec Loss 2.1114 LearningRate 0.000304 Epoch: 20 Global Step: 417970 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:06:39,201-Speed 2497.54 samples/sec Loss 2.0959 LearningRate 0.000304 Epoch: 20 Global Step: 417980 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:06:47,400-Speed 2498.09 samples/sec Loss 2.1951 LearningRate 0.000304 Epoch: 20 Global Step: 417990 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:06:55,606-Speed 2496.07 samples/sec Loss 2.1066 LearningRate 0.000304 Epoch: 20 Global Step: 418000 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:07:03,808-Speed 2497.40 samples/sec Loss 2.1252 LearningRate 0.000304 Epoch: 20 Global Step: 418010 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:07:12,009-Speed 2497.81 samples/sec Loss 2.0894 LearningRate 0.000304 Epoch: 20 Global Step: 418020 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:07:20,170-Speed 2510.00 samples/sec Loss 2.1347 LearningRate 0.000304 Epoch: 20 Global Step: 418030 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:07:28,374-Speed 2496.92 samples/sec Loss 2.1477 LearningRate 0.000304 Epoch: 20 Global Step: 418040 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:07:36,577-Speed 2496.97 samples/sec Loss 2.0758 LearningRate 0.000304 Epoch: 20 Global Step: 418050 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:07:44,776-Speed 2498.19 samples/sec Loss 2.1369 LearningRate 0.000304 Epoch: 20 Global Step: 418060 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:07:52,973-Speed 2499.15 samples/sec Loss 2.0891 LearningRate 0.000304 Epoch: 20 Global Step: 418070 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:08:01,173-Speed 2498.02 samples/sec Loss 2.1503 
LearningRate 0.000304 Epoch: 20 Global Step: 418080 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:08:09,319-Speed 2514.59 samples/sec Loss 2.1366 LearningRate 0.000304 Epoch: 20 Global Step: 418090 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:08:17,517-Speed 2498.75 samples/sec Loss 2.1200 LearningRate 0.000304 Epoch: 20 Global Step: 418100 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:08:25,719-Speed 2497.37 samples/sec Loss 2.1191 LearningRate 0.000304 Epoch: 20 Global Step: 418110 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:08:33,918-Speed 2498.11 samples/sec Loss 2.1392 LearningRate 0.000304 Epoch: 20 Global Step: 418120 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:08:42,123-Speed 2496.58 samples/sec Loss 2.1632 LearningRate 0.000304 Epoch: 20 Global Step: 418130 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:08:50,320-Speed 2498.60 samples/sec Loss 2.1192 LearningRate 0.000304 Epoch: 20 Global Step: 418140 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:08:58,468-Speed 2513.97 samples/sec Loss 2.1754 LearningRate 0.000304 Epoch: 20 Global Step: 418150 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:09:06,668-Speed 2497.90 samples/sec Loss 2.1316 LearningRate 0.000304 Epoch: 20 Global Step: 418160 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:09:14,870-Speed 2497.14 samples/sec Loss 2.0991 LearningRate 0.000304 Epoch: 20 Global Step: 418170 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:09:23,076-Speed 2496.19 samples/sec Loss 2.1506 LearningRate 0.000304 Epoch: 20 Global Step: 418180 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:09:31,277-Speed 2498.15 samples/sec Loss 2.1251 LearningRate 0.000304 Epoch: 20 Global Step: 418190 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:09:39,478-Speed 2497.55 samples/sec Loss 2.0997 
LearningRate 0.000304 Epoch: 20 Global Step: 418200 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:09:47,627-Speed 2513.99 samples/sec Loss 2.0889 LearningRate 0.000304 Epoch: 20 Global Step: 418210 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:09:55,832-Speed 2496.19 samples/sec Loss 2.1017 LearningRate 0.000304 Epoch: 20 Global Step: 418220 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:10:04,032-Speed 2498.07 samples/sec Loss 2.1138 LearningRate 0.000304 Epoch: 20 Global Step: 418230 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:10:12,234-Speed 2497.34 samples/sec Loss 2.1230 LearningRate 0.000304 Epoch: 20 Global Step: 418240 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:10:20,445-Speed 2494.91 samples/sec Loss 2.1181 LearningRate 0.000304 Epoch: 20 Global Step: 418250 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:10:28,643-Speed 2498.28 samples/sec Loss 2.1023 LearningRate 0.000303 Epoch: 20 Global Step: 418260 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:10:36,804-Speed 2510.06 samples/sec Loss 2.1213 LearningRate 0.000303 Epoch: 20 Global Step: 418270 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:10:45,004-Speed 2497.78 samples/sec Loss 2.1549 LearningRate 0.000303 Epoch: 20 Global Step: 418280 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:10:53,208-Speed 2497.30 samples/sec Loss 2.1370 LearningRate 0.000303 Epoch: 20 Global Step: 418290 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:11:01,410-Speed 2497.07 samples/sec Loss 2.0480 LearningRate 0.000303 Epoch: 20 Global Step: 418300 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:11:09,613-Speed 2497.19 samples/sec Loss 2.1160 LearningRate 0.000303 Epoch: 20 Global Step: 418310 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:11:17,811-Speed 2498.34 samples/sec Loss 2.1121 
LearningRate 0.000303 Epoch: 20 Global Step: 418320 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:11:25,962-Speed 2513.00 samples/sec Loss 2.1174 LearningRate 0.000303 Epoch: 20 Global Step: 418330 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:11:34,161-Speed 2498.18 samples/sec Loss 2.1396 LearningRate 0.000303 Epoch: 20 Global Step: 418340 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:11:42,360-Speed 2498.43 samples/sec Loss 2.0819 LearningRate 0.000303 Epoch: 20 Global Step: 418350 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:11:50,558-Speed 2498.36 samples/sec Loss 2.1346 LearningRate 0.000303 Epoch: 20 Global Step: 418360 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:11:58,762-Speed 2496.91 samples/sec Loss 2.1346 LearningRate 0.000303 Epoch: 20 Global Step: 418370 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:12:06,961-Speed 2498.12 samples/sec Loss 2.0944 LearningRate 0.000303 Epoch: 20 Global Step: 418380 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:12:15,113-Speed 2512.78 samples/sec Loss 2.1104 LearningRate 0.000303 Epoch: 20 Global Step: 418390 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:12:23,310-Speed 2499.11 samples/sec Loss 2.0794 LearningRate 0.000303 Epoch: 20 Global Step: 418400 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:12:31,507-Speed 2498.91 samples/sec Loss 2.1209 LearningRate 0.000303 Epoch: 20 Global Step: 418410 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:12:39,711-Speed 2496.44 samples/sec Loss 2.0874 LearningRate 0.000303 Epoch: 20 Global Step: 418420 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:12:47,913-Speed 2497.55 samples/sec Loss 2.1155 LearningRate 0.000303 Epoch: 20 Global Step: 418430 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:12:56,116-Speed 2497.08 samples/sec Loss 2.1276 
LearningRate 0.000303 Epoch: 20 Global Step: 418440 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:13:04,264-Speed 2513.93 samples/sec Loss 2.1645 LearningRate 0.000303 Epoch: 20 Global Step: 418450 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:13:12,465-Speed 2498.16 samples/sec Loss 2.1138 LearningRate 0.000303 Epoch: 20 Global Step: 418460 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:13:20,664-Speed 2498.14 samples/sec Loss 2.1219 LearningRate 0.000303 Epoch: 20 Global Step: 418470 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:13:28,866-Speed 2497.39 samples/sec Loss 2.1335 LearningRate 0.000303 Epoch: 20 Global Step: 418480 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:13:37,077-Speed 2494.75 samples/sec Loss 2.0895 LearningRate 0.000303 Epoch: 20 Global Step: 418490 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:13:45,280-Speed 2497.08 samples/sec Loss 2.1140 LearningRate 0.000303 Epoch: 20 Global Step: 418500 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:13:53,428-Speed 2513.73 samples/sec Loss 2.1200 LearningRate 0.000303 Epoch: 20 Global Step: 418510 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:14:01,635-Speed 2496.06 samples/sec Loss 2.1400 LearningRate 0.000303 Epoch: 20 Global Step: 418520 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:14:09,835-Speed 2497.65 samples/sec Loss 2.1013 LearningRate 0.000303 Epoch: 20 Global Step: 418530 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:14:18,037-Speed 2497.34 samples/sec Loss 2.0814 LearningRate 0.000303 Epoch: 20 Global Step: 418540 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:14:26,242-Speed 2496.68 samples/sec Loss 2.0521 LearningRate 0.000303 Epoch: 20 Global Step: 418550 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:14:34,444-Speed 2497.32 samples/sec Loss 2.1540 
LearningRate 0.000303 Epoch: 20 Global Step: 418560 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:14:42,592-Speed 2513.87 samples/sec Loss 2.0601 LearningRate 0.000303 Epoch: 20 Global Step: 418570 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:14:50,791-Speed 2498.19 samples/sec Loss 2.1464 LearningRate 0.000303 Epoch: 20 Global Step: 418580 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:14:58,992-Speed 2497.84 samples/sec Loss 2.1031 LearningRate 0.000303 Epoch: 20 Global Step: 418590 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:15:07,197-Speed 2496.58 samples/sec Loss 2.1257 LearningRate 0.000303 Epoch: 20 Global Step: 418600 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:15:15,405-Speed 2495.41 samples/sec Loss 2.0727 LearningRate 0.000303 Epoch: 20 Global Step: 418610 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:15:23,606-Speed 2497.74 samples/sec Loss 2.0968 LearningRate 0.000303 Epoch: 20 Global Step: 418620 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:15:31,755-Speed 2513.84 samples/sec Loss 2.1359 LearningRate 0.000303 Epoch: 20 Global Step: 418630 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:15:39,962-Speed 2495.69 samples/sec Loss 2.1245 LearningRate 0.000303 Epoch: 20 Global Step: 418640 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:15:48,160-Speed 2498.54 samples/sec Loss 2.1431 LearningRate 0.000303 Epoch: 20 Global Step: 418650 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:15:56,360-Speed 2498.19 samples/sec Loss 2.1009 LearningRate 0.000303 Epoch: 20 Global Step: 418660 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:16:04,556-Speed 2499.06 samples/sec Loss 2.1115 LearningRate 0.000303 Epoch: 20 Global Step: 418670 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:16:12,766-Speed 2494.77 samples/sec Loss 2.0613 
LearningRate 0.000303 Epoch: 20 Global Step: 418680 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:16:20,914-Speed 2514.08 samples/sec Loss 2.1195 LearningRate 0.000303 Epoch: 20 Global Step: 418690 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:16:29,111-Speed 2499.01 samples/sec Loss 2.1171 LearningRate 0.000303 Epoch: 20 Global Step: 418700 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:16:37,311-Speed 2498.24 samples/sec Loss 2.0687 LearningRate 0.000303 Epoch: 20 Global Step: 418710 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:16:45,507-Speed 2498.94 samples/sec Loss 2.1346 LearningRate 0.000303 Epoch: 20 Global Step: 418720 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:16:53,709-Speed 2497.33 samples/sec Loss 2.1146 LearningRate 0.000303 Epoch: 20 Global Step: 418730 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:17:01,911-Speed 2497.26 samples/sec Loss 2.1389 LearningRate 0.000303 Epoch: 20 Global Step: 418740 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:17:10,059-Speed 2514.29 samples/sec Loss 2.1172 LearningRate 0.000303 Epoch: 20 Global Step: 418750 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:17:18,259-Speed 2497.79 samples/sec Loss 2.1448 LearningRate 0.000303 Epoch: 20 Global Step: 418760 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:17:26,466-Speed 2495.71 samples/sec Loss 2.1298 LearningRate 0.000303 Epoch: 20 Global Step: 418770 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:17:34,667-Speed 2498.13 samples/sec Loss 2.0681 LearningRate 0.000303 Epoch: 20 Global Step: 418780 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:17:42,886-Speed 2492.28 samples/sec Loss 2.1468 LearningRate 0.000303 Epoch: 20 Global Step: 418790 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:17:51,087-Speed 2497.77 samples/sec Loss 2.0567 
LearningRate 0.000303 Epoch: 20 Global Step: 418800 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:17:59,245-Speed 2510.86 samples/sec Loss 2.1190 LearningRate 0.000303 Epoch: 20 Global Step: 418810 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:18:07,415-Speed 2507.08 samples/sec Loss 2.1070 LearningRate 0.000303 Epoch: 20 Global Step: 418820 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:18:15,613-Speed 2498.62 samples/sec Loss 2.0749 LearningRate 0.000303 Epoch: 20 Global Step: 418830 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:18:23,817-Speed 2496.55 samples/sec Loss 2.0940 LearningRate 0.000303 Epoch: 20 Global Step: 418840 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:18:32,017-Speed 2498.13 samples/sec Loss 2.1204 LearningRate 0.000303 Epoch: 20 Global Step: 418850 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:18:40,217-Speed 2497.98 samples/sec Loss 2.0837 LearningRate 0.000303 Epoch: 20 Global Step: 418860 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:18:48,363-Speed 2514.41 samples/sec Loss 2.1268 LearningRate 0.000303 Epoch: 20 Global Step: 418870 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:18:56,560-Speed 2498.85 samples/sec Loss 2.0660 LearningRate 0.000303 Epoch: 20 Global Step: 418880 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:19:04,759-Speed 2498.41 samples/sec Loss 2.1201 LearningRate 0.000303 Epoch: 20 Global Step: 418890 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:19:12,956-Speed 2498.88 samples/sec Loss 2.1398 LearningRate 0.000303 Epoch: 20 Global Step: 418900 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:19:21,164-Speed 2495.37 samples/sec Loss 2.1130 LearningRate 0.000303 Epoch: 20 Global Step: 418910 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:19:29,370-Speed 2496.01 samples/sec Loss 2.1020 
LearningRate 0.000303 Epoch: 20 Global Step: 418920 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:19:37,517-Speed 2514.47 samples/sec Loss 2.0764 LearningRate 0.000302 Epoch: 20 Global Step: 418930 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:19:45,721-Speed 2496.58 samples/sec Loss 2.1199 LearningRate 0.000302 Epoch: 20 Global Step: 418940 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:19:53,924-Speed 2496.99 samples/sec Loss 2.0536 LearningRate 0.000302 Epoch: 20 Global Step: 418950 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:20:02,129-Speed 2496.36 samples/sec Loss 2.0963 LearningRate 0.000302 Epoch: 20 Global Step: 418960 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:20:10,331-Speed 2497.39 samples/sec Loss 2.0730 LearningRate 0.000302 Epoch: 20 Global Step: 418970 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:20:18,534-Speed 2496.96 samples/sec Loss 2.1117 LearningRate 0.000302 Epoch: 20 Global Step: 418980 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:20:26,683-Speed 2513.81 samples/sec Loss 2.1480 LearningRate 0.000302 Epoch: 20 Global Step: 418990 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:20:34,887-Speed 2496.59 samples/sec Loss 2.0750 LearningRate 0.000302 Epoch: 20 Global Step: 419000 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:20:43,099-Speed 2494.78 samples/sec Loss 2.1051 LearningRate 0.000302 Epoch: 20 Global Step: 419010 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:20:51,298-Speed 2498.33 samples/sec Loss 2.1168 LearningRate 0.000302 Epoch: 20 Global Step: 419020 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:20:59,506-Speed 2495.49 samples/sec Loss 2.1206 LearningRate 0.000302 Epoch: 20 Global Step: 419030 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:21:07,713-Speed 2495.99 samples/sec Loss 2.1355 
LearningRate 0.000302 Epoch: 20 Global Step: 419040 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:21:15,864-Speed 2512.81 samples/sec Loss 2.1345 LearningRate 0.000302 Epoch: 20 Global Step: 419050 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:21:24,067-Speed 2497.35 samples/sec Loss 2.1233 LearningRate 0.000302 Epoch: 20 Global Step: 419060 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:21:32,283-Speed 2492.99 samples/sec Loss 2.1129 LearningRate 0.000302 Epoch: 20 Global Step: 419070 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:21:40,485-Speed 2497.14 samples/sec Loss 2.0875 LearningRate 0.000302 Epoch: 20 Global Step: 419080 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:21:48,687-Speed 2497.66 samples/sec Loss 2.0873 LearningRate 0.000302 Epoch: 20 Global Step: 419090 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:21:56,886-Speed 2498.13 samples/sec Loss 2.1179 LearningRate 0.000302 Epoch: 20 Global Step: 419100 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:22:05,036-Speed 2513.28 samples/sec Loss 2.1375 LearningRate 0.000302 Epoch: 20 Global Step: 419110 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:22:13,241-Speed 2496.47 samples/sec Loss 2.1332 LearningRate 0.000302 Epoch: 20 Global Step: 419120 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:22:21,442-Speed 2497.61 samples/sec Loss 2.1004 LearningRate 0.000302 Epoch: 20 Global Step: 419130 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:22:29,644-Speed 2497.40 samples/sec Loss 2.1612 LearningRate 0.000302 Epoch: 20 Global Step: 419140 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:22:37,859-Speed 2493.31 samples/sec Loss 2.0509 LearningRate 0.000302 Epoch: 20 Global Step: 419150 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:22:46,063-Speed 2496.63 samples/sec Loss 2.1402 
LearningRate 0.000302 Epoch: 20 Global Step: 419160 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:22:54,215-Speed 2512.84 samples/sec Loss 2.1360 LearningRate 0.000302 Epoch: 20 Global Step: 419170 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:23:02,418-Speed 2496.89 samples/sec Loss 2.1117 LearningRate 0.000302 Epoch: 20 Global Step: 419180 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:23:10,618-Speed 2497.97 samples/sec Loss 2.0982 LearningRate 0.000302 Epoch: 20 Global Step: 419190 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:23:18,816-Speed 2498.76 samples/sec Loss 2.1337 LearningRate 0.000302 Epoch: 20 Global Step: 419200 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:23:27,013-Speed 2499.04 samples/sec Loss 2.1322 LearningRate 0.000302 Epoch: 20 Global Step: 419210 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:23:35,216-Speed 2496.73 samples/sec Loss 2.0929 LearningRate 0.000302 Epoch: 20 Global Step: 419220 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:23:43,384-Speed 2507.76 samples/sec Loss 2.1246 LearningRate 0.000302 Epoch: 20 Global Step: 419230 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:23:51,580-Speed 2500.13 samples/sec Loss 2.1038 LearningRate 0.000302 Epoch: 20 Global Step: 419240 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:23:59,776-Speed 2499.17 samples/sec Loss 2.1841 LearningRate 0.000302 Epoch: 20 Global Step: 419250 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:24:07,976-Speed 2498.03 samples/sec Loss 2.0923 LearningRate 0.000302 Epoch: 20 Global Step: 419260 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:24:16,173-Speed 2498.81 samples/sec Loss 2.0927 LearningRate 0.000302 Epoch: 20 Global Step: 419270 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:24:24,370-Speed 2499.03 samples/sec Loss 2.1396 
LearningRate 0.000302 Epoch: 20 Global Step: 419280 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:24:32,519-Speed 2513.73 samples/sec Loss 2.0827 LearningRate 0.000302 Epoch: 20 Global Step: 419290 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:24:40,718-Speed 2498.09 samples/sec Loss 2.1558 LearningRate 0.000302 Epoch: 20 Global Step: 419300 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:24:48,914-Speed 2499.27 samples/sec Loss 2.1297 LearningRate 0.000302 Epoch: 20 Global Step: 419310 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:24:57,111-Speed 2498.60 samples/sec Loss 2.1115 LearningRate 0.000302 Epoch: 20 Global Step: 419320 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:25:05,311-Speed 2497.99 samples/sec Loss 2.0974 LearningRate 0.000302 Epoch: 20 Global Step: 419330 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:25:13,509-Speed 2498.55 samples/sec Loss 2.1352 LearningRate 0.000302 Epoch: 20 Global Step: 419340 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:25:21,669-Speed 2510.30 samples/sec Loss 2.0772 LearningRate 0.000302 Epoch: 20 Global Step: 419350 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:25:29,870-Speed 2497.52 samples/sec Loss 2.0841 LearningRate 0.000302 Epoch: 20 Global Step: 419360 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:25:38,075-Speed 2496.44 samples/sec Loss 2.1429 LearningRate 0.000302 Epoch: 20 Global Step: 419370 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:25:46,275-Speed 2498.22 samples/sec Loss 2.1110 LearningRate 0.000302 Epoch: 20 Global Step: 419380 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:25:54,487-Speed 2494.14 samples/sec Loss 2.1352 LearningRate 0.000302 Epoch: 20 Global Step: 419390 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:26:02,708-Speed 2491.64 samples/sec Loss 2.0562 
LearningRate 0.000302 Epoch: 20 Global Step: 419400 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:26:10,855-Speed 2514.32 samples/sec Loss 2.0920 LearningRate 0.000302 Epoch: 20 Global Step: 419410 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:26:19,053-Speed 2498.54 samples/sec Loss 2.0679 LearningRate 0.000302 Epoch: 20 Global Step: 419420 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:26:27,255-Speed 2497.07 samples/sec Loss 2.0670 LearningRate 0.000302 Epoch: 20 Global Step: 419430 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:26:35,452-Speed 2499.11 samples/sec Loss 2.1032 LearningRate 0.000302 Epoch: 20 Global Step: 419440 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:26:43,650-Speed 2498.45 samples/sec Loss 2.0692 LearningRate 0.000302 Epoch: 20 Global Step: 419450 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:26:51,851-Speed 2497.76 samples/sec Loss 2.0635 LearningRate 0.000302 Epoch: 20 Global Step: 419460 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:26:59,999-Speed 2513.82 samples/sec Loss 2.1076 LearningRate 0.000302 Epoch: 20 Global Step: 419470 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:27:08,199-Speed 2498.11 samples/sec Loss 2.0896 LearningRate 0.000302 Epoch: 20 Global Step: 419480 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:27:16,399-Speed 2498.04 samples/sec Loss 2.1000 LearningRate 0.000302 Epoch: 20 Global Step: 419490 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:27:24,599-Speed 2497.83 samples/sec Loss 2.1128 LearningRate 0.000302 Epoch: 20 Global Step: 419500 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:27:32,801-Speed 2497.53 samples/sec Loss 2.1144 LearningRate 0.000302 Epoch: 20 Global Step: 419510 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:27:40,992-Speed 2500.56 samples/sec Loss 2.0889 
LearningRate 0.000302 Epoch: 20 Global Step: 419520 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:27:49,141-Speed 2513.77 samples/sec Loss 2.0760 LearningRate 0.000302 Epoch: 20 Global Step: 419530 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:27:57,342-Speed 2497.42 samples/sec Loss 2.1496 LearningRate 0.000302 Epoch: 20 Global Step: 419540 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:28:09,472-Speed 1690.59 samples/sec Loss 2.0836 LearningRate 0.000302 Epoch: 20 Global Step: 419550 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:28:17,678-Speed 2502.09 samples/sec Loss 2.0767 LearningRate 0.000302 Epoch: 20 Global Step: 419560 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:28:25,875-Speed 2498.96 samples/sec Loss 2.1054 LearningRate 0.000302 Epoch: 20 Global Step: 419570 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:28:37,927-Speed 1736.34 samples/sec Loss 2.1236 LearningRate 0.000302 Epoch: 20 Global Step: 419580 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:28:46,083-Speed 2518.37 samples/sec Loss 2.1235 LearningRate 0.000302 Epoch: 20 Global Step: 419590 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:28:54,278-Speed 2499.39 samples/sec Loss 2.1226 LearningRate 0.000302 Epoch: 20 Global Step: 419600 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:29:04,627-Speed 1985.75 samples/sec Loss 2.1294 LearningRate 0.000301 Epoch: 20 Global Step: 419610 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:29:12,879-Speed 2501.57 samples/sec Loss 2.0949 LearningRate 0.000301 Epoch: 20 Global Step: 419620 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:29:21,092-Speed 2493.73 samples/sec Loss 2.0839 LearningRate 0.000301 Epoch: 20 Global Step: 419630 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:29:29,348-Speed 2499.97 samples/sec Loss 2.1189 
LearningRate 0.000301 Epoch: 20 Global Step: 419640 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:29:37,542-Speed 2516.64 samples/sec Loss 2.1304 LearningRate 0.000301 Epoch: 20 Global Step: 419650 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:29:50,505-Speed 1580.01 samples/sec Loss 2.1603 LearningRate 0.000301 Epoch: 20 Global Step: 419660 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:29:58,736-Speed 2502.45 samples/sec Loss 2.1358 LearningRate 0.000301 Epoch: 20 Global Step: 419670 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:30:06,982-Speed 2501.52 samples/sec Loss 2.1353 LearningRate 0.000301 Epoch: 20 Global Step: 419680 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:30:15,234-Speed 2500.24 samples/sec Loss 2.1327 LearningRate 0.000301 Epoch: 20 Global Step: 419690 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:30:27,199-Speed 1711.86 samples/sec Loss 2.1171 LearningRate 0.000301 Epoch: 20 Global Step: 419700 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:30:35,653-Speed 2519.06 samples/sec Loss 2.1593 LearningRate 0.000301 Epoch: 20 Global Step: 419710 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:30:49,233-Speed 1517.81 samples/sec Loss 2.0894 LearningRate 0.000301 Epoch: 20 Global Step: 419720 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:30:57,455-Speed 2503.14 samples/sec Loss 2.1491 LearningRate 0.000301 Epoch: 20 Global Step: 419730 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:31:05,650-Speed 2499.36 samples/sec Loss 2.1563 LearningRate 0.000301 Epoch: 20 Global Step: 419740 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:31:16,991-Speed 2481.43 samples/sec Loss 2.1004 LearningRate 0.000301 Epoch: 20 Global Step: 419750 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:31:25,286-Speed 2500.52 samples/sec Loss 2.1101 
LearningRate 0.000301 Epoch: 20 Global Step: 419760 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:31:33,426-Speed 2516.61 samples/sec Loss 2.0794 LearningRate 0.000301 Epoch: 20 Global Step: 419770 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:31:47,226-Speed 1633.75 samples/sec Loss 2.0987 LearningRate 0.000301 Epoch: 20 Global Step: 419780 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:31:56,956-Speed 2107.17 samples/sec Loss 2.1519 LearningRate 0.000301 Epoch: 20 Global Step: 419790 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:32:09,571-Speed 1629.21 samples/sec Loss 2.1516 LearningRate 0.000301 Epoch: 20 Global Step: 419800 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:32:19,340-Speed 2096.97 samples/sec Loss 2.1221 LearningRate 0.000301 Epoch: 20 Global Step: 419810 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:32:27,548-Speed 2495.81 samples/sec Loss 2.1240 LearningRate 0.000301 Epoch: 20 Global Step: 419820 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:32:35,707-Speed 2510.46 samples/sec Loss 2.1711 LearningRate 0.000301 Epoch: 20 Global Step: 419830 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:32:43,925-Speed 2492.68 samples/sec Loss 2.1219 LearningRate 0.000301 Epoch: 20 Global Step: 419840 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:32:52,147-Speed 2491.50 samples/sec Loss 2.1150 LearningRate 0.000301 Epoch: 20 Global Step: 419850 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:33:00,381-Speed 2487.59 samples/sec Loss 2.1582 LearningRate 0.000301 Epoch: 20 Global Step: 419860 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:33:08,604-Speed 2490.85 samples/sec Loss 2.1720 LearningRate 0.000301 Epoch: 20 Global Step: 419870 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:33:16,823-Speed 2492.45 samples/sec Loss 2.1579 
LearningRate 0.000301 Epoch: 20 Global Step: 419880 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:33:24,985-Speed 2509.55 samples/sec Loss 2.1233 LearningRate 0.000301 Epoch: 20 Global Step: 419890 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:33:33,206-Speed 2491.79 samples/sec Loss 2.1406 LearningRate 0.000301 Epoch: 20 Global Step: 419900 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:33:41,410-Speed 2496.72 samples/sec Loss 2.1294 LearningRate 0.000301 Epoch: 20 Global Step: 419910 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:33:49,614-Speed 2496.48 samples/sec Loss 2.1310 LearningRate 0.000301 Epoch: 20 Global Step: 419920 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:33:57,846-Speed 2488.23 samples/sec Loss 2.1406 LearningRate 0.000301 Epoch: 20 Global Step: 419930 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:34:06,075-Speed 2489.72 samples/sec Loss 2.1527 LearningRate 0.000301 Epoch: 20 Global Step: 419940 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:34:14,245-Speed 2507.27 samples/sec Loss 2.1294 LearningRate 0.000301 Epoch: 20 Global Step: 419950 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:34:22,437-Speed 2500.33 samples/sec Loss 2.1245 LearningRate 0.000301 Epoch: 20 Global Step: 419960 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:34:30,666-Speed 2489.22 samples/sec Loss 2.1163 LearningRate 0.000301 Epoch: 20 Global Step: 419970 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:34:38,903-Speed 2486.81 samples/sec Loss 2.1145 LearningRate 0.000301 Epoch: 20 Global Step: 419980 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:34:47,102-Speed 2498.06 samples/sec Loss 2.1275 LearningRate 0.000301 Epoch: 20 Global Step: 419990 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:34:55,310-Speed 2495.74 samples/sec Loss 2.1504 
LearningRate 0.000301 Epoch: 20 Global Step: 420000 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:35:03,465-Speed 2511.78 samples/sec Loss 2.1140 LearningRate 0.000301 Epoch: 20 Global Step: 420010 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:35:11,674-Speed 2495.20 samples/sec Loss 2.1298 LearningRate 0.000301 Epoch: 20 Global Step: 420020 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:35:19,892-Speed 2492.35 samples/sec Loss 2.1488 LearningRate 0.000301 Epoch: 20 Global Step: 420030 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:35:28,102-Speed 2494.89 samples/sec Loss 2.1351 LearningRate 0.000301 Epoch: 20 Global Step: 420040 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:35:36,311-Speed 2495.11 samples/sec Loss 2.0946 LearningRate 0.000301 Epoch: 20 Global Step: 420050 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:35:44,520-Speed 2495.26 samples/sec Loss 2.1575 LearningRate 0.000301 Epoch: 20 Global Step: 420060 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:35:52,677-Speed 2510.84 samples/sec Loss 2.1039 LearningRate 0.000301 Epoch: 20 Global Step: 420070 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:36:00,885-Speed 2495.73 samples/sec Loss 2.1320 LearningRate 0.000301 Epoch: 20 Global Step: 420080 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:36:09,088-Speed 2497.21 samples/sec Loss 2.1271 LearningRate 0.000301 Epoch: 20 Global Step: 420090 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:36:17,296-Speed 2495.64 samples/sec Loss 2.1601 LearningRate 0.000301 Epoch: 20 Global Step: 420100 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:36:25,504-Speed 2495.58 samples/sec Loss 2.1096 LearningRate 0.000301 Epoch: 20 Global Step: 420110 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:36:33,714-Speed 2496.07 samples/sec Loss 2.1362 
LearningRate 0.000301 Epoch: 20 Global Step: 420120 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:36:41,874-Speed 2510.10 samples/sec Loss 2.0789 LearningRate 0.000301 Epoch: 20 Global Step: 420130 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:36:50,082-Speed 2495.58 samples/sec Loss 2.1527 LearningRate 0.000301 Epoch: 20 Global Step: 420140 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:36:58,296-Speed 2493.47 samples/sec Loss 2.0876 LearningRate 0.000301 Epoch: 20 Global Step: 420150 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:37:06,509-Speed 2494.10 samples/sec Loss 2.1372 LearningRate 0.000301 Epoch: 20 Global Step: 420160 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:37:14,717-Speed 2495.48 samples/sec Loss 2.1611 LearningRate 0.000301 Epoch: 20 Global Step: 420170 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:37:22,922-Speed 2496.37 samples/sec Loss 2.1033 LearningRate 0.000301 Epoch: 20 Global Step: 420180 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:37:31,086-Speed 2508.88 samples/sec Loss 2.1105 LearningRate 0.000301 Epoch: 20 Global Step: 420190 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:37:39,289-Speed 2497.17 samples/sec Loss 2.0971 LearningRate 0.000301 Epoch: 20 Global Step: 420200 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:37:47,492-Speed 2497.00 samples/sec Loss 2.0924 LearningRate 0.000301 Epoch: 20 Global Step: 420210 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:37:55,702-Speed 2494.88 samples/sec Loss 2.1517 LearningRate 0.000301 Epoch: 20 Global Step: 420220 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:38:03,907-Speed 2496.49 samples/sec Loss 2.1384 LearningRate 0.000301 Epoch: 20 Global Step: 420230 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:38:12,116-Speed 2495.09 samples/sec Loss 2.1051 
LearningRate 0.000301 Epoch: 20 Global Step: 420240 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:38:20,273-Speed 2511.34 samples/sec Loss 2.1241 LearningRate 0.000301 Epoch: 20 Global Step: 420250 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:38:28,480-Speed 2496.76 samples/sec Loss 2.0921 LearningRate 0.000301 Epoch: 20 Global Step: 420260 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:38:36,686-Speed 2495.80 samples/sec Loss 2.1102 LearningRate 0.000301 Epoch: 20 Global Step: 420270 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:38:44,892-Speed 2496.49 samples/sec Loss 2.0795 LearningRate 0.000301 Epoch: 20 Global Step: 420280 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:38:53,095-Speed 2497.07 samples/sec Loss 2.1302 LearningRate 0.000300 Epoch: 20 Global Step: 420290 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:39:01,301-Speed 2495.83 samples/sec Loss 2.0964 LearningRate 0.000300 Epoch: 20 Global Step: 420300 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:39:09,454-Speed 2512.19 samples/sec Loss 2.1480 LearningRate 0.000300 Epoch: 20 Global Step: 420310 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:39:17,655-Speed 2497.85 samples/sec Loss 2.1264 LearningRate 0.000300 Epoch: 20 Global Step: 420320 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:39:25,871-Speed 2493.26 samples/sec Loss 2.1153 LearningRate 0.000300 Epoch: 20 Global Step: 420330 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:39:34,078-Speed 2495.68 samples/sec Loss 2.1659 LearningRate 0.000300 Epoch: 20 Global Step: 420340 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:39:42,281-Speed 2497.15 samples/sec Loss 2.1301 LearningRate 0.000300 Epoch: 20 Global Step: 420350 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:39:50,483-Speed 2497.38 samples/sec Loss 2.1201 
LearningRate 0.000300 Epoch: 20 Global Step: 420360 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:39:58,633-Speed 2513.48 samples/sec Loss 2.1130 LearningRate 0.000300 Epoch: 20 Global Step: 420370 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:40:06,835-Speed 2497.34 samples/sec Loss 2.1154 LearningRate 0.000300 Epoch: 20 Global Step: 420380 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:40:15,035-Speed 2498.07 samples/sec Loss 2.0843 LearningRate 0.000300 Epoch: 20 Global Step: 420390 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:40:23,288-Speed 2481.79 samples/sec Loss 2.1097 LearningRate 0.000300 Epoch: 20 Global Step: 420400 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:40:31,496-Speed 2495.63 samples/sec Loss 2.1140 LearningRate 0.000300 Epoch: 20 Global Step: 420410 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:40:39,699-Speed 2497.02 samples/sec Loss 2.1000 LearningRate 0.000300 Epoch: 20 Global Step: 420420 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:40:47,849-Speed 2513.24 samples/sec Loss 2.1065 LearningRate 0.000300 Epoch: 20 Global Step: 420430 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:40:56,055-Speed 2496.28 samples/sec Loss 2.1177 LearningRate 0.000300 Epoch: 20 Global Step: 420440 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:41:04,258-Speed 2497.38 samples/sec Loss 2.0804 LearningRate 0.000300 Epoch: 20 Global Step: 420450 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:41:12,464-Speed 2496.01 samples/sec Loss 2.0821 LearningRate 0.000300 Epoch: 20 Global Step: 420460 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:41:20,669-Speed 2496.24 samples/sec Loss 2.1236 LearningRate 0.000300 Epoch: 20 Global Step: 420470 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:41:28,872-Speed 2497.07 samples/sec Loss 2.1091 
LearningRate 0.000300 Epoch: 20 Global Step: 420480 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:41:37,024-Speed 2512.85 samples/sec Loss 2.1346 LearningRate 0.000300 Epoch: 20 Global Step: 420490 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:41:45,226-Speed 2497.27 samples/sec Loss 2.1349 LearningRate 0.000300 Epoch: 20 Global Step: 420500 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:41:53,428-Speed 2497.16 samples/sec Loss 2.1154 LearningRate 0.000300 Epoch: 20 Global Step: 420510 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:42:01,633-Speed 2497.10 samples/sec Loss 2.1357 LearningRate 0.000300 Epoch: 20 Global Step: 420520 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:42:09,835-Speed 2497.33 samples/sec Loss 2.0947 LearningRate 0.000300 Epoch: 20 Global Step: 420530 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:42:18,035-Speed 2497.79 samples/sec Loss 2.1117 LearningRate 0.000300 Epoch: 20 Global Step: 420540 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:42:26,186-Speed 2512.99 samples/sec Loss 2.0948 LearningRate 0.000300 Epoch: 20 Global Step: 420550 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:42:34,388-Speed 2497.75 samples/sec Loss 2.0687 LearningRate 0.000300 Epoch: 20 Global Step: 420560 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:42:42,595-Speed 2496.06 samples/sec Loss 2.1028 LearningRate 0.000300 Epoch: 20 Global Step: 420570 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:42:50,791-Speed 2499.08 samples/sec Loss 2.1338 LearningRate 0.000300 Epoch: 20 Global Step: 420580 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:42:58,992-Speed 2497.73 samples/sec Loss 2.0968 LearningRate 0.000300 Epoch: 20 Global Step: 420590 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:43:07,195-Speed 2496.88 samples/sec Loss 2.1277 
LearningRate 0.000300 Epoch: 20 Global Step: 420600 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:43:15,341-Speed 2514.72 samples/sec Loss 2.0938 LearningRate 0.000300 Epoch: 20 Global Step: 420610 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:43:23,543-Speed 2497.26 samples/sec Loss 2.0893 LearningRate 0.000300 Epoch: 20 Global Step: 420620 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:43:31,741-Speed 2498.62 samples/sec Loss 2.1304 LearningRate 0.000300 Epoch: 20 Global Step: 420630 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:43:39,953-Speed 2494.93 samples/sec Loss 2.1143 LearningRate 0.000300 Epoch: 20 Global Step: 420640 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:43:48,153-Speed 2498.06 samples/sec Loss 2.1002 LearningRate 0.000300 Epoch: 20 Global Step: 420650 Fp16 Grad Scale: 32768 Required: 94 hours Training: 2022-07-09 14:43:56,314-Speed 2509.77 samples/sec Loss 2.0995 LearningRate 0.000300 Epoch: 20 Global Step: 420660 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:44:04,460-Speed 2514.66 samples/sec Loss 2.0915 LearningRate 0.000300 Epoch: 20 Global Step: 420670 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:44:12,661-Speed 2497.58 samples/sec Loss 2.1376 LearningRate 0.000300 Epoch: 20 Global Step: 420680 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:44:20,868-Speed 2495.61 samples/sec Loss 2.1266 LearningRate 0.000300 Epoch: 20 Global Step: 420690 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:44:29,075-Speed 2496.10 samples/sec Loss 2.1148 LearningRate 0.000300 Epoch: 20 Global Step: 420700 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:44:37,277-Speed 2497.39 samples/sec Loss 2.1559 LearningRate 0.000300 Epoch: 20 Global Step: 420710 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:44:45,490-Speed 2493.83 samples/sec Loss 2.1124 
LearningRate 0.000300 Epoch: 20 Global Step: 420720 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:44:53,642-Speed 2512.85 samples/sec Loss 2.1599 LearningRate 0.000300 Epoch: 20 Global Step: 420730 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:45:01,839-Speed 2498.70 samples/sec Loss 2.1070 LearningRate 0.000300 Epoch: 20 Global Step: 420740 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:45:10,039-Speed 2498.15 samples/sec Loss 2.1512 LearningRate 0.000300 Epoch: 20 Global Step: 420750 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:45:18,242-Speed 2497.02 samples/sec Loss 2.1190 LearningRate 0.000300 Epoch: 20 Global Step: 420760 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:45:26,443-Speed 2497.54 samples/sec Loss 2.0911 LearningRate 0.000300 Epoch: 20 Global Step: 420770 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:45:34,651-Speed 2495.90 samples/sec Loss 2.1113 LearningRate 0.000300 Epoch: 20 Global Step: 420780 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:45:42,801-Speed 2513.27 samples/sec Loss 2.0973 LearningRate 0.000300 Epoch: 20 Global Step: 420790 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:45:51,003-Speed 2497.26 samples/sec Loss 2.1167 LearningRate 0.000300 Epoch: 20 Global Step: 420800 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:45:59,207-Speed 2497.13 samples/sec Loss 2.1526 LearningRate 0.000300 Epoch: 20 Global Step: 420810 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:46:07,417-Speed 2494.98 samples/sec Loss 2.1334 LearningRate 0.000300 Epoch: 20 Global Step: 420820 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:46:15,617-Speed 2497.74 samples/sec Loss 2.1450 LearningRate 0.000300 Epoch: 20 Global Step: 420830 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:46:23,827-Speed 2495.06 samples/sec Loss 2.1497 
LearningRate 0.000300 Epoch: 20 Global Step: 420840 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:46:31,973-Speed 2514.43 samples/sec Loss 2.1577 LearningRate 0.000300 Epoch: 20 Global Step: 420850 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:46:40,176-Speed 2497.05 samples/sec Loss 2.1400 LearningRate 0.000300 Epoch: 20 Global Step: 420860 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:46:48,380-Speed 2496.75 samples/sec Loss 2.1177 LearningRate 0.000300 Epoch: 20 Global Step: 420870 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:46:56,585-Speed 2496.35 samples/sec Loss 2.1093 LearningRate 0.000300 Epoch: 20 Global Step: 420880 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:47:04,787-Speed 2497.22 samples/sec Loss 2.1058 LearningRate 0.000300 Epoch: 20 Global Step: 420890 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:47:12,990-Speed 2497.04 samples/sec Loss 2.1037 LearningRate 0.000300 Epoch: 20 Global Step: 420900 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:47:21,139-Speed 2513.56 samples/sec Loss 2.1240 LearningRate 0.000300 Epoch: 20 Global Step: 420910 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:47:29,356-Speed 2492.67 samples/sec Loss 2.1386 LearningRate 0.000300 Epoch: 20 Global Step: 420920 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:47:37,555-Speed 2498.23 samples/sec Loss 2.1433 LearningRate 0.000300 Epoch: 20 Global Step: 420930 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:47:45,754-Speed 2498.18 samples/sec Loss 2.1060 LearningRate 0.000300 Epoch: 20 Global Step: 420940 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:47:53,954-Speed 2497.99 samples/sec Loss 2.1554 LearningRate 0.000300 Epoch: 20 Global Step: 420950 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:48:02,155-Speed 2498.49 samples/sec Loss 2.0850 
LearningRate 0.000300 Epoch: 20 Global Step: 420960 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:48:10,302-Speed 2514.30 samples/sec Loss 2.0829 LearningRate 0.000300 Epoch: 20 Global Step: 420970 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:48:18,512-Speed 2494.74 samples/sec Loss 2.1102 LearningRate 0.000299 Epoch: 20 Global Step: 420980 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:48:26,717-Speed 2496.32 samples/sec Loss 2.1002 LearningRate 0.000299 Epoch: 20 Global Step: 420990 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:48:34,919-Speed 2497.63 samples/sec Loss 2.1209 LearningRate 0.000299 Epoch: 20 Global Step: 421000 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:48:43,116-Speed 2498.74 samples/sec Loss 2.0633 LearningRate 0.000299 Epoch: 20 Global Step: 421010 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:48:51,315-Speed 2498.33 samples/sec Loss 2.0706 LearningRate 0.000299 Epoch: 20 Global Step: 421020 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:48:59,466-Speed 2513.08 samples/sec Loss 2.1116 LearningRate 0.000299 Epoch: 20 Global Step: 421030 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:49:07,676-Speed 2494.62 samples/sec Loss 2.0842 LearningRate 0.000299 Epoch: 20 Global Step: 421040 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:49:15,881-Speed 2496.57 samples/sec Loss 2.1228 LearningRate 0.000299 Epoch: 20 Global Step: 421050 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:49:24,079-Speed 2498.45 samples/sec Loss 2.1249 LearningRate 0.000299 Epoch: 20 Global Step: 421060 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:49:32,283-Speed 2497.03 samples/sec Loss 2.1154 LearningRate 0.000299 Epoch: 20 Global Step: 421070 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:49:40,485-Speed 2497.17 samples/sec Loss 2.0966 
LearningRate 0.000299 Epoch: 20 Global Step: 421080 Fp16 Grad Scale: 16384 Required: 94 hours Training: 2022-07-09 14:49:48,633-Speed 2514.15 samples/sec Loss 2.0719 LearningRate 0.000299 Epoch: 20 Global Step: 421090 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:49:56,832-Speed 2498.47 samples/sec Loss 2.1218 LearningRate 0.000299 Epoch: 20 Global Step: 421100 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:50:05,030-Speed 2498.31 samples/sec Loss 2.1145 LearningRate 0.000299 Epoch: 20 Global Step: 421110 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:50:13,227-Speed 2499.06 samples/sec Loss 2.0823 LearningRate 0.000299 Epoch: 20 Global Step: 421120 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:50:21,429-Speed 2497.45 samples/sec Loss 2.0908 LearningRate 0.000299 Epoch: 20 Global Step: 421130 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:50:29,630-Speed 2497.52 samples/sec Loss 2.0771 LearningRate 0.000299 Epoch: 20 Global Step: 421140 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:50:37,773-Speed 2515.56 samples/sec Loss 2.1138 LearningRate 0.000299 Epoch: 20 Global Step: 421150 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:50:45,975-Speed 2497.28 samples/sec Loss 2.0704 LearningRate 0.000299 Epoch: 20 Global Step: 421160 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:50:54,185-Speed 2495.10 samples/sec Loss 2.0431 LearningRate 0.000299 Epoch: 20 Global Step: 421170 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:51:02,400-Speed 2493.25 samples/sec Loss 2.0852 LearningRate 0.000299 Epoch: 20 Global Step: 421180 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:51:10,608-Speed 2495.69 samples/sec Loss 2.0753 LearningRate 0.000299 Epoch: 20 Global Step: 421190 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:51:18,818-Speed 2495.01 samples/sec Loss 2.0813 
LearningRate 0.000299 Epoch: 20 Global Step: 421200 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:51:26,978-Speed 2510.16 samples/sec Loss 2.1063 LearningRate 0.000299 Epoch: 20 Global Step: 421210 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:51:35,178-Speed 2497.70 samples/sec Loss 2.0495 LearningRate 0.000299 Epoch: 20 Global Step: 421220 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:51:43,380-Speed 2497.63 samples/sec Loss 2.1115 LearningRate 0.000299 Epoch: 20 Global Step: 421230 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:51:51,581-Speed 2497.81 samples/sec Loss 2.1411 LearningRate 0.000299 Epoch: 20 Global Step: 421240 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:51:59,780-Speed 2498.35 samples/sec Loss 2.1402 LearningRate 0.000299 Epoch: 20 Global Step: 421250 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:52:07,995-Speed 2493.24 samples/sec Loss 2.0999 LearningRate 0.000299 Epoch: 20 Global Step: 421260 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:52:16,148-Speed 2512.60 samples/sec Loss 2.0638 LearningRate 0.000299 Epoch: 20 Global Step: 421270 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:52:24,344-Speed 2499.04 samples/sec Loss 2.1067 LearningRate 0.000299 Epoch: 20 Global Step: 421280 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:52:32,554-Speed 2494.91 samples/sec Loss 2.0777 LearningRate 0.000299 Epoch: 20 Global Step: 421290 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:52:40,767-Speed 2494.20 samples/sec Loss 2.1044 LearningRate 0.000299 Epoch: 20 Global Step: 421300 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:52:48,966-Speed 2498.40 samples/sec Loss 2.0763 LearningRate 0.000299 Epoch: 20 Global Step: 421310 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:52:57,168-Speed 2497.50 samples/sec Loss 2.0719 
LearningRate 0.000299 Epoch: 20 Global Step: 421320 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:53:05,315-Speed 2514.29 samples/sec Loss 2.0393 LearningRate 0.000299 Epoch: 20 Global Step: 421330 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:53:13,513-Speed 2498.50 samples/sec Loss 2.0542 LearningRate 0.000299 Epoch: 20 Global Step: 421340 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:53:21,715-Speed 2497.28 samples/sec Loss 2.1382 LearningRate 0.000299 Epoch: 20 Global Step: 421350 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:53:29,916-Speed 2497.73 samples/sec Loss 2.1170 LearningRate 0.000299 Epoch: 20 Global Step: 421360 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:53:38,118-Speed 2497.39 samples/sec Loss 2.0959 LearningRate 0.000299 Epoch: 20 Global Step: 421370 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:53:46,329-Speed 2494.80 samples/sec Loss 2.0464 LearningRate 0.000299 Epoch: 20 Global Step: 421380 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:53:54,482-Speed 2512.32 samples/sec Loss 2.1364 LearningRate 0.000299 Epoch: 20 Global Step: 421390 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:54:02,686-Speed 2496.57 samples/sec Loss 2.0974 LearningRate 0.000299 Epoch: 20 Global Step: 421400 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:54:10,889-Speed 2496.95 samples/sec Loss 2.0585 LearningRate 0.000299 Epoch: 20 Global Step: 421410 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:54:19,087-Speed 2498.69 samples/sec Loss 2.1110 LearningRate 0.000299 Epoch: 20 Global Step: 421420 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:54:27,289-Speed 2497.07 samples/sec Loss 2.1003 LearningRate 0.000299 Epoch: 20 Global Step: 421430 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:54:35,512-Speed 2491.17 samples/sec Loss 2.1342 
LearningRate 0.000299 Epoch: 20 Global Step: 421440 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:54:43,657-Speed 2514.72 samples/sec Loss 2.0920 LearningRate 0.000299 Epoch: 20 Global Step: 421450 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:54:51,857-Speed 2498.04 samples/sec Loss 2.0914 LearningRate 0.000299 Epoch: 20 Global Step: 421460 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:55:00,054-Speed 2498.95 samples/sec Loss 2.1429 LearningRate 0.000299 Epoch: 20 Global Step: 421470 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:55:08,256-Speed 2496.98 samples/sec Loss 2.0803 LearningRate 0.000299 Epoch: 20 Global Step: 421480 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:55:16,455-Speed 2498.50 samples/sec Loss 2.0720 LearningRate 0.000299 Epoch: 20 Global Step: 421490 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:55:24,664-Speed 2495.21 samples/sec Loss 2.0992 LearningRate 0.000299 Epoch: 20 Global Step: 421500 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:55:32,824-Speed 2510.19 samples/sec Loss 2.0622 LearningRate 0.000299 Epoch: 20 Global Step: 421510 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:55:41,030-Speed 2496.19 samples/sec Loss 2.0621 LearningRate 0.000299 Epoch: 20 Global Step: 421520 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:55:49,231-Speed 2497.40 samples/sec Loss 2.0939 LearningRate 0.000299 Epoch: 20 Global Step: 421530 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:55:57,435-Speed 2496.74 samples/sec Loss 2.0519 LearningRate 0.000299 Epoch: 20 Global Step: 421540 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:56:05,644-Speed 2495.24 samples/sec Loss 2.1320 LearningRate 0.000299 Epoch: 20 Global Step: 421550 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:56:13,845-Speed 2497.55 samples/sec Loss 2.0901 
LearningRate 0.000299 Epoch: 20 Global Step: 421560 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:56:21,991-Speed 2514.46 samples/sec Loss 2.1245 LearningRate 0.000299 Epoch: 20 Global Step: 421570 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:56:30,189-Speed 2498.81 samples/sec Loss 2.0814 LearningRate 0.000299 Epoch: 20 Global Step: 421580 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:56:38,391-Speed 2497.30 samples/sec Loss 2.0941 LearningRate 0.000299 Epoch: 20 Global Step: 421590 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:56:46,586-Speed 2499.51 samples/sec Loss 2.0724 LearningRate 0.000299 Epoch: 20 Global Step: 421600 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:56:54,803-Speed 2492.62 samples/sec Loss 2.0629 LearningRate 0.000299 Epoch: 20 Global Step: 421610 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:57:03,008-Speed 2496.60 samples/sec Loss 2.1364 LearningRate 0.000299 Epoch: 20 Global Step: 421620 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:57:11,152-Speed 2514.90 samples/sec Loss 2.0878 LearningRate 0.000299 Epoch: 20 Global Step: 421630 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:57:19,350-Speed 2498.57 samples/sec Loss 2.1223 LearningRate 0.000299 Epoch: 20 Global Step: 421640 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:57:27,560-Speed 2494.78 samples/sec Loss 2.0417 LearningRate 0.000299 Epoch: 20 Global Step: 421650 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:57:35,772-Speed 2494.49 samples/sec Loss 2.0668 LearningRate 0.000298 Epoch: 20 Global Step: 421660 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:57:43,973-Speed 2497.57 samples/sec Loss 2.1013 LearningRate 0.000298 Epoch: 20 Global Step: 421670 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:57:52,174-Speed 2497.37 samples/sec Loss 2.1011 
LearningRate 0.000298 Epoch: 20 Global Step: 421680 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:58:00,324-Speed 2513.41 samples/sec Loss 2.0598 LearningRate 0.000298 Epoch: 20 Global Step: 421690 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:58:08,526-Speed 2497.40 samples/sec Loss 2.1419 LearningRate 0.000298 Epoch: 20 Global Step: 421700 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:58:16,724-Speed 2498.29 samples/sec Loss 2.0869 LearningRate 0.000298 Epoch: 20 Global Step: 421710 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:58:24,921-Speed 2498.87 samples/sec Loss 2.0848 LearningRate 0.000298 Epoch: 20 Global Step: 421720 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:58:33,127-Speed 2496.52 samples/sec Loss 2.1114 LearningRate 0.000298 Epoch: 20 Global Step: 421730 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:58:41,329-Speed 2497.13 samples/sec Loss 2.1178 LearningRate 0.000298 Epoch: 20 Global Step: 421740 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:58:49,478-Speed 2513.82 samples/sec Loss 2.0880 LearningRate 0.000298 Epoch: 20 Global Step: 421750 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:58:57,678-Speed 2497.93 samples/sec Loss 2.0978 LearningRate 0.000298 Epoch: 20 Global Step: 421760 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:59:05,878-Speed 2497.99 samples/sec Loss 2.1058 LearningRate 0.000298 Epoch: 20 Global Step: 421770 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:59:14,082-Speed 2497.25 samples/sec Loss 2.1044 LearningRate 0.000298 Epoch: 20 Global Step: 421780 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:59:22,289-Speed 2495.60 samples/sec Loss 2.0745 LearningRate 0.000298 Epoch: 20 Global Step: 421790 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:59:30,490-Speed 2498.02 samples/sec Loss 2.1107 
LearningRate 0.000298 Epoch: 20 Global Step: 421800 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:59:38,638-Speed 2514.24 samples/sec Loss 2.1192 LearningRate 0.000298 Epoch: 20 Global Step: 421810 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:59:46,884-Speed 2483.97 samples/sec Loss 2.0659 LearningRate 0.000298 Epoch: 20 Global Step: 421820 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 14:59:55,085-Speed 2497.52 samples/sec Loss 2.0907 LearningRate 0.000298 Epoch: 20 Global Step: 421830 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 15:00:03,283-Speed 2498.38 samples/sec Loss 2.0882 LearningRate 0.000298 Epoch: 20 Global Step: 421840 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 15:00:11,488-Speed 2496.67 samples/sec Loss 2.1097 LearningRate 0.000298 Epoch: 20 Global Step: 421850 Fp16 Grad Scale: 16384 Required: 93 hours Training: 2022-07-09 15:00:19,690-Speed 2497.14 samples/sec Loss 2.1219 LearningRate 0.000298 Epoch: 20 Global Step: 421860 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:00:27,842-Speed 2512.95 samples/sec Loss 2.0967 LearningRate 0.000298 Epoch: 20 Global Step: 421870 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:00:36,044-Speed 2497.56 samples/sec Loss 2.1227 LearningRate 0.000298 Epoch: 20 Global Step: 421880 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:00:44,247-Speed 2497.18 samples/sec Loss 2.0832 LearningRate 0.000298 Epoch: 20 Global Step: 421890 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:00:52,455-Speed 2495.32 samples/sec Loss 2.1283 LearningRate 0.000298 Epoch: 20 Global Step: 421900 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:01:00,658-Speed 2497.05 samples/sec Loss 2.1192 LearningRate 0.000298 Epoch: 20 Global Step: 421910 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:01:08,860-Speed 2497.52 samples/sec Loss 2.1543 
LearningRate 0.000298 Epoch: 20 Global Step: 421920 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:01:17,007-Speed 2514.09 samples/sec Loss 2.1031 LearningRate 0.000298 Epoch: 20 Global Step: 421930 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:01:25,204-Speed 2498.87 samples/sec Loss 2.1105 LearningRate 0.000298 Epoch: 20 Global Step: 421940 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:01:33,400-Speed 2499.53 samples/sec Loss 2.0946 LearningRate 0.000298 Epoch: 20 Global Step: 421950 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:01:41,599-Speed 2498.08 samples/sec Loss 2.1362 LearningRate 0.000298 Epoch: 20 Global Step: 421960 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:01:49,801-Speed 2497.56 samples/sec Loss 2.0755 LearningRate 0.000298 Epoch: 20 Global Step: 421970 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:01:58,001-Speed 2497.93 samples/sec Loss 2.1218 LearningRate 0.000298 Epoch: 20 Global Step: 421980 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:02:06,151-Speed 2513.47 samples/sec Loss 2.0645 LearningRate 0.000298 Epoch: 20 Global Step: 421990 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:02:14,363-Speed 2494.26 samples/sec Loss 2.1177 LearningRate 0.000298 Epoch: 20 Global Step: 422000 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:02:22,564-Speed 2497.59 samples/sec Loss 2.1351 LearningRate 0.000298 Epoch: 20 Global Step: 422010 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:02:30,764-Speed 2497.99 samples/sec Loss 2.0767 LearningRate 0.000298 Epoch: 20 Global Step: 422020 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:02:38,965-Speed 2497.73 samples/sec Loss 2.0783 LearningRate 0.000298 Epoch: 20 Global Step: 422030 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:02:47,169-Speed 2496.59 samples/sec Loss 2.0750 
LearningRate 0.000298 Epoch: 20 Global Step: 422040 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:02:55,340-Speed 2506.97 samples/sec Loss 2.1533 LearningRate 0.000298 Epoch: 20 Global Step: 422050 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:03:03,549-Speed 2495.06 samples/sec Loss 2.0955 LearningRate 0.000298 Epoch: 20 Global Step: 422060 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:03:11,751-Speed 2497.46 samples/sec Loss 2.1121 LearningRate 0.000298 Epoch: 20 Global Step: 422070 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:03:19,947-Speed 2499.12 samples/sec Loss 2.1435 LearningRate 0.000298 Epoch: 20 Global Step: 422080 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:03:28,149-Speed 2497.63 samples/sec Loss 2.1071 LearningRate 0.000298 Epoch: 20 Global Step: 422090 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:03:36,351-Speed 2497.06 samples/sec Loss 2.0691 LearningRate 0.000298 Epoch: 20 Global Step: 422100 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:03:44,504-Speed 2512.29 samples/sec Loss 2.1302 LearningRate 0.000298 Epoch: 20 Global Step: 422110 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:03:52,708-Speed 2496.99 samples/sec Loss 2.1359 LearningRate 0.000298 Epoch: 20 Global Step: 422120 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:04:00,916-Speed 2495.35 samples/sec Loss 2.0713 LearningRate 0.000298 Epoch: 20 Global Step: 422130 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:04:09,114-Speed 2498.39 samples/sec Loss 2.1179 LearningRate 0.000298 Epoch: 20 Global Step: 422140 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:04:17,315-Speed 2497.68 samples/sec Loss 2.0842 LearningRate 0.000298 Epoch: 20 Global Step: 422150 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:04:25,515-Speed 2498.20 samples/sec Loss 2.0913 
LearningRate 0.000298 Epoch: 20 Global Step: 422160 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:04:33,665-Speed 2513.20 samples/sec Loss 2.1287 LearningRate 0.000298 Epoch: 20 Global Step: 422170 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:04:41,864-Speed 2498.13 samples/sec Loss 2.0494 LearningRate 0.000298 Epoch: 20 Global Step: 422180 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:04:50,067-Speed 2497.38 samples/sec Loss 2.1120 LearningRate 0.000298 Epoch: 20 Global Step: 422190 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:04:58,270-Speed 2497.20 samples/sec Loss 2.0785 LearningRate 0.000298 Epoch: 20 Global Step: 422200 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:05:06,472-Speed 2497.26 samples/sec Loss 2.0807 LearningRate 0.000298 Epoch: 20 Global Step: 422210 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:05:14,674-Speed 2497.69 samples/sec Loss 2.0570 LearningRate 0.000298 Epoch: 20 Global Step: 422220 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:05:22,821-Speed 2514.04 samples/sec Loss 2.0671 LearningRate 0.000298 Epoch: 20 Global Step: 422230 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:05:31,022-Speed 2497.53 samples/sec Loss 2.0897 LearningRate 0.000298 Epoch: 20 Global Step: 422240 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:05:39,237-Speed 2493.54 samples/sec Loss 2.0430 LearningRate 0.000298 Epoch: 20 Global Step: 422250 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:05:47,438-Speed 2497.69 samples/sec Loss 2.1004 LearningRate 0.000298 Epoch: 20 Global Step: 422260 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:05:55,636-Speed 2498.64 samples/sec Loss 2.0707 LearningRate 0.000298 Epoch: 20 Global Step: 422270 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:06:03,835-Speed 2498.22 samples/sec Loss 2.0693 
LearningRate 0.000298 Epoch: 20 Global Step: 422280 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:06:11,982-Speed 2514.22 samples/sec Loss 2.0715 LearningRate 0.000298 Epoch: 20 Global Step: 422290 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:06:20,183-Speed 2497.67 samples/sec Loss 2.0924 LearningRate 0.000298 Epoch: 20 Global Step: 422300 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:06:28,389-Speed 2496.05 samples/sec Loss 2.1203 LearningRate 0.000298 Epoch: 20 Global Step: 422310 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:06:36,587-Speed 2498.75 samples/sec Loss 2.0804 LearningRate 0.000298 Epoch: 20 Global Step: 422320 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:06:44,815-Speed 2489.44 samples/sec Loss 2.1021 LearningRate 0.000298 Epoch: 20 Global Step: 422330 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:06:53,014-Speed 2498.04 samples/sec Loss 2.1163 LearningRate 0.000297 Epoch: 20 Global Step: 422340 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:07:01,165-Speed 2513.48 samples/sec Loss 2.1148 LearningRate 0.000297 Epoch: 20 Global Step: 422350 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:07:09,369-Speed 2496.71 samples/sec Loss 2.0705 LearningRate 0.000297 Epoch: 20 Global Step: 422360 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:07:17,567-Speed 2498.59 samples/sec Loss 2.0919 LearningRate 0.000297 Epoch: 20 Global Step: 422370 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:07:25,780-Speed 2493.75 samples/sec Loss 2.0426 LearningRate 0.000297 Epoch: 20 Global Step: 422380 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:07:33,985-Speed 2497.00 samples/sec Loss 2.1009 LearningRate 0.000297 Epoch: 20 Global Step: 422390 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:07:42,191-Speed 2495.77 samples/sec Loss 2.1220 
LearningRate 0.000297 Epoch: 20 Global Step: 422400 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:07:50,340-Speed 2513.76 samples/sec Loss 2.1203 LearningRate 0.000297 Epoch: 20 Global Step: 422410 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:07:58,538-Speed 2498.61 samples/sec Loss 2.1163 LearningRate 0.000297 Epoch: 20 Global Step: 422420 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:08:06,742-Speed 2496.72 samples/sec Loss 2.0941 LearningRate 0.000297 Epoch: 20 Global Step: 422430 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:08:14,942-Speed 2498.14 samples/sec Loss 2.1194 LearningRate 0.000297 Epoch: 20 Global Step: 422440 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:08:23,144-Speed 2497.36 samples/sec Loss 2.0430 LearningRate 0.000297 Epoch: 20 Global Step: 422450 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:08:31,353-Speed 2495.18 samples/sec Loss 2.0609 LearningRate 0.000297 Epoch: 20 Global Step: 422460 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:08:39,499-Speed 2514.58 samples/sec Loss 2.1130 LearningRate 0.000297 Epoch: 20 Global Step: 422470 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:08:47,697-Speed 2498.34 samples/sec Loss 2.1089 LearningRate 0.000297 Epoch: 20 Global Step: 422480 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:08:55,899-Speed 2497.44 samples/sec Loss 2.0941 LearningRate 0.000297 Epoch: 20 Global Step: 422490 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:09:04,105-Speed 2496.19 samples/sec Loss 2.0664 LearningRate 0.000297 Epoch: 20 Global Step: 422500 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:09:12,305-Speed 2497.92 samples/sec Loss 2.0906 LearningRate 0.000297 Epoch: 20 Global Step: 422510 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:09:20,504-Speed 2498.37 samples/sec Loss 2.0387 
LearningRate 0.000297 Epoch: 20 Global Step: 422520 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:09:28,650-Speed 2514.37 samples/sec Loss 2.0777 LearningRate 0.000297 Epoch: 20 Global Step: 422530 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:09:36,853-Speed 2497.01 samples/sec Loss 2.1130 LearningRate 0.000297 Epoch: 20 Global Step: 422540 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:09:45,054-Speed 2497.64 samples/sec Loss 2.1119 LearningRate 0.000297 Epoch: 20 Global Step: 422550 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:09:53,261-Speed 2495.93 samples/sec Loss 2.0523 LearningRate 0.000297 Epoch: 20 Global Step: 422560 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:10:01,458-Speed 2498.71 samples/sec Loss 2.0817 LearningRate 0.000297 Epoch: 20 Global Step: 422570 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:10:09,659-Speed 2497.63 samples/sec Loss 2.0890 LearningRate 0.000297 Epoch: 20 Global Step: 422580 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:10:17,813-Speed 2512.12 samples/sec Loss 2.0653 LearningRate 0.000297 Epoch: 20 Global Step: 422590 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:10:26,012-Speed 2498.18 samples/sec Loss 2.0731 LearningRate 0.000297 Epoch: 20 Global Step: 422600 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:10:34,239-Speed 2489.99 samples/sec Loss 2.0612 LearningRate 0.000297 Epoch: 20 Global Step: 422610 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:10:42,440-Speed 2498.05 samples/sec Loss 2.0610 LearningRate 0.000297 Epoch: 20 Global Step: 422620 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:10:50,641-Speed 2497.58 samples/sec Loss 2.0881 LearningRate 0.000297 Epoch: 20 Global Step: 422630 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:10:58,842-Speed 2497.66 samples/sec Loss 2.0635 
LearningRate 0.000297 Epoch: 20 Global Step: 422640 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:11:06,986-Speed 2514.84 samples/sec Loss 2.0695 LearningRate 0.000297 Epoch: 20 Global Step: 422650 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:11:15,185-Speed 2498.32 samples/sec Loss 2.0480 LearningRate 0.000297 Epoch: 20 Global Step: 422660 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:11:23,382-Speed 2498.83 samples/sec Loss 2.0783 LearningRate 0.000297 Epoch: 20 Global Step: 422670 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:11:31,582-Speed 2498.01 samples/sec Loss 2.0429 LearningRate 0.000297 Epoch: 20 Global Step: 422680 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:11:39,779-Speed 2499.10 samples/sec Loss 2.0050 LearningRate 0.000297 Epoch: 20 Global Step: 422690 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:11:47,983-Speed 2496.48 samples/sec Loss 2.0894 LearningRate 0.000297 Epoch: 20 Global Step: 422700 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:11:56,147-Speed 2509.04 samples/sec Loss 2.1198 LearningRate 0.000297 Epoch: 20 Global Step: 422710 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:12:04,348-Speed 2497.81 samples/sec Loss 2.0870 LearningRate 0.000297 Epoch: 20 Global Step: 422720 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:12:12,549-Speed 2497.81 samples/sec Loss 2.1136 LearningRate 0.000297 Epoch: 20 Global Step: 422730 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:12:20,752-Speed 2496.92 samples/sec Loss 2.0764 LearningRate 0.000297 Epoch: 20 Global Step: 422740 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:12:28,953-Speed 2497.62 samples/sec Loss 2.0546 LearningRate 0.000297 Epoch: 20 Global Step: 422750 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:12:37,154-Speed 2497.61 samples/sec Loss 2.0619 
LearningRate 0.000297 Epoch: 20 Global Step: 422760 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:12:45,306-Speed 2513.76 samples/sec Loss 2.1216 LearningRate 0.000297 Epoch: 20 Global Step: 422770 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:12:53,517-Speed 2494.41 samples/sec Loss 2.0864 LearningRate 0.000297 Epoch: 20 Global Step: 422780 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:13:01,721-Speed 2496.92 samples/sec Loss 2.0866 LearningRate 0.000297 Epoch: 20 Global Step: 422790 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:13:09,920-Speed 2498.11 samples/sec Loss 2.0948 LearningRate 0.000297 Epoch: 20 Global Step: 422800 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:13:18,124-Speed 2497.00 samples/sec Loss 2.0932 LearningRate 0.000297 Epoch: 20 Global Step: 422810 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:13:26,325-Speed 2497.55 samples/sec Loss 2.1335 LearningRate 0.000297 Epoch: 20 Global Step: 422820 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:13:34,477-Speed 2512.76 samples/sec Loss 2.0398 LearningRate 0.000297 Epoch: 20 Global Step: 422830 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:13:42,679-Speed 2497.19 samples/sec Loss 2.1046 LearningRate 0.000297 Epoch: 20 Global Step: 422840 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:13:50,884-Speed 2496.69 samples/sec Loss 2.1056 LearningRate 0.000297 Epoch: 20 Global Step: 422850 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:13:59,101-Speed 2492.76 samples/sec Loss 2.1156 LearningRate 0.000297 Epoch: 20 Global Step: 422860 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:14:07,312-Speed 2494.41 samples/sec Loss 2.0578 LearningRate 0.000297 Epoch: 20 Global Step: 422870 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:14:15,515-Speed 2496.97 samples/sec Loss 2.0551 
LearningRate 0.000297 Epoch: 20 Global Step: 422880 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:14:23,665-Speed 2513.94 samples/sec Loss 2.1198 LearningRate 0.000297 Epoch: 20 Global Step: 422890 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:14:31,866-Speed 2497.66 samples/sec Loss 2.1293 LearningRate 0.000297 Epoch: 20 Global Step: 422900 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:14:40,073-Speed 2495.62 samples/sec Loss 2.0527 LearningRate 0.000297 Epoch: 20 Global Step: 422910 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:14:48,273-Speed 2498.01 samples/sec Loss 2.1070 LearningRate 0.000297 Epoch: 20 Global Step: 422920 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:14:56,473-Speed 2497.95 samples/sec Loss 2.1774 LearningRate 0.000297 Epoch: 20 Global Step: 422930 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:15:04,676-Speed 2496.84 samples/sec Loss 2.1458 LearningRate 0.000297 Epoch: 20 Global Step: 422940 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:15:12,824-Speed 2514.39 samples/sec Loss 2.1016 LearningRate 0.000297 Epoch: 20 Global Step: 422950 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:15:21,024-Speed 2497.72 samples/sec Loss 2.1361 LearningRate 0.000297 Epoch: 20 Global Step: 422960 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:15:29,222-Speed 2498.75 samples/sec Loss 2.0572 LearningRate 0.000297 Epoch: 20 Global Step: 422970 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:15:37,418-Speed 2499.10 samples/sec Loss 2.1102 LearningRate 0.000297 Epoch: 20 Global Step: 422980 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:15:45,619-Speed 2497.49 samples/sec Loss 2.1134 LearningRate 0.000297 Epoch: 20 Global Step: 422990 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:15:53,830-Speed 2494.91 samples/sec Loss 2.0470 
LearningRate 0.000297 Epoch: 20 Global Step: 423000 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:16:01,987-Speed 2510.98 samples/sec Loss 2.1291 LearningRate 0.000297 Epoch: 20 Global Step: 423010 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:16:10,185-Speed 2498.67 samples/sec Loss 2.1171 LearningRate 0.000297 Epoch: 20 Global Step: 423020 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:16:18,393-Speed 2495.35 samples/sec Loss 2.1202 LearningRate 0.000296 Epoch: 20 Global Step: 423030 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:16:26,590-Speed 2498.80 samples/sec Loss 2.0755 LearningRate 0.000296 Epoch: 20 Global Step: 423040 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:16:34,791-Speed 2497.87 samples/sec Loss 2.1138 LearningRate 0.000296 Epoch: 20 Global Step: 423050 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:16:42,992-Speed 2497.39 samples/sec Loss 2.0936 LearningRate 0.000296 Epoch: 20 Global Step: 423060 Fp16 Grad Scale: 65536 Required: 93 hours Training: 2022-07-09 15:16:51,140-Speed 2514.14 samples/sec Loss 2.1004 LearningRate 0.000296 Epoch: 20 Global Step: 423070 Fp16 Grad Scale: 65536 Required: 93 hours Training: 2022-07-09 15:16:59,345-Speed 2496.81 samples/sec Loss 2.0511 LearningRate 0.000296 Epoch: 20 Global Step: 423080 Fp16 Grad Scale: 65536 Required: 93 hours Training: 2022-07-09 15:17:07,543-Speed 2498.44 samples/sec Loss 2.1363 LearningRate 0.000296 Epoch: 20 Global Step: 423090 Fp16 Grad Scale: 65536 Required: 93 hours Training: 2022-07-09 15:17:15,743-Speed 2498.04 samples/sec Loss 2.0907 LearningRate 0.000296 Epoch: 20 Global Step: 423100 Fp16 Grad Scale: 65536 Required: 93 hours Training: 2022-07-09 15:17:23,943-Speed 2497.95 samples/sec Loss 2.1006 LearningRate 0.000296 Epoch: 20 Global Step: 423110 Fp16 Grad Scale: 65536 Required: 93 hours Training: 2022-07-09 15:17:32,142-Speed 2498.10 samples/sec Loss 2.1116 
LearningRate 0.000296 Epoch: 20 Global Step: 423120 Fp16 Grad Scale: 65536 Required: 93 hours Training: 2022-07-09 15:17:40,304-Speed 2509.73 samples/sec Loss 2.0713 LearningRate 0.000296 Epoch: 20 Global Step: 423130 Fp16 Grad Scale: 65536 Required: 93 hours Training: 2022-07-09 15:17:48,505-Speed 2497.61 samples/sec Loss 2.1539 LearningRate 0.000296 Epoch: 20 Global Step: 423140 Fp16 Grad Scale: 65536 Required: 93 hours Training: 2022-07-09 15:17:56,703-Speed 2498.43 samples/sec Loss 2.0962 LearningRate 0.000296 Epoch: 20 Global Step: 423150 Fp16 Grad Scale: 65536 Required: 93 hours Training: 2022-07-09 15:18:04,864-Speed 2510.20 samples/sec Loss 2.0784 LearningRate 0.000296 Epoch: 20 Global Step: 423160 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:18:13,064-Speed 2497.89 samples/sec Loss 2.0659 LearningRate 0.000296 Epoch: 20 Global Step: 423170 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:18:21,262-Speed 2498.82 samples/sec Loss 2.1225 LearningRate 0.000296 Epoch: 20 Global Step: 423180 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:18:29,416-Speed 2512.04 samples/sec Loss 2.0896 LearningRate 0.000296 Epoch: 20 Global Step: 423190 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:18:37,649-Speed 2487.67 samples/sec Loss 2.0706 LearningRate 0.000296 Epoch: 20 Global Step: 423200 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:18:45,852-Speed 2497.14 samples/sec Loss 2.1029 LearningRate 0.000296 Epoch: 20 Global Step: 423210 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:18:54,053-Speed 2497.56 samples/sec Loss 2.1310 LearningRate 0.000296 Epoch: 20 Global Step: 423220 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:19:02,250-Speed 2498.91 samples/sec Loss 2.1156 LearningRate 0.000296 Epoch: 20 Global Step: 423230 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:19:10,453-Speed 2497.25 samples/sec Loss 2.1117 
LearningRate 0.000296 Epoch: 20 Global Step: 423240 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:19:18,602-Speed 2513.46 samples/sec Loss 2.1231 LearningRate 0.000296 Epoch: 20 Global Step: 423250 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:19:26,801-Speed 2498.27 samples/sec Loss 2.0589 LearningRate 0.000296 Epoch: 20 Global Step: 423260 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:19:35,003-Speed 2497.43 samples/sec Loss 2.0887 LearningRate 0.000296 Epoch: 20 Global Step: 423270 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:19:43,206-Speed 2497.04 samples/sec Loss 2.1522 LearningRate 0.000296 Epoch: 20 Global Step: 423280 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:19:51,406-Speed 2497.94 samples/sec Loss 2.1207 LearningRate 0.000296 Epoch: 20 Global Step: 423290 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:19:59,605-Speed 2498.37 samples/sec Loss 2.0865 LearningRate 0.000296 Epoch: 20 Global Step: 423300 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:20:07,750-Speed 2515.00 samples/sec Loss 2.0917 LearningRate 0.000296 Epoch: 20 Global Step: 423310 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:20:15,952-Speed 2497.38 samples/sec Loss 2.0979 LearningRate 0.000296 Epoch: 20 Global Step: 423320 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:20:24,150-Speed 2498.41 samples/sec Loss 2.0847 LearningRate 0.000296 Epoch: 20 Global Step: 423330 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:20:32,362-Speed 2494.58 samples/sec Loss 2.0565 LearningRate 0.000296 Epoch: 20 Global Step: 423340 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:20:40,567-Speed 2496.54 samples/sec Loss 2.1084 LearningRate 0.000296 Epoch: 20 Global Step: 423350 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:20:48,768-Speed 2497.51 samples/sec Loss 2.0743 
LearningRate 0.000296 Epoch: 20 Global Step: 423360 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:20:56,920-Speed 2512.33 samples/sec Loss 2.1321 LearningRate 0.000296 Epoch: 20 Global Step: 423370 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:21:05,117-Speed 2499.11 samples/sec Loss 2.0587 LearningRate 0.000296 Epoch: 20 Global Step: 423380 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:21:13,316-Speed 2498.36 samples/sec Loss 2.0862 LearningRate 0.000296 Epoch: 20 Global Step: 423390 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:21:21,536-Speed 2491.82 samples/sec Loss 2.0491 LearningRate 0.000296 Epoch: 20 Global Step: 423400 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:21:29,746-Speed 2495.10 samples/sec Loss 2.1094 LearningRate 0.000296 Epoch: 20 Global Step: 423410 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:21:37,948-Speed 2497.23 samples/sec Loss 2.0568 LearningRate 0.000296 Epoch: 20 Global Step: 423420 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:21:46,092-Speed 2515.17 samples/sec Loss 2.0634 LearningRate 0.000296 Epoch: 20 Global Step: 423430 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:21:54,288-Speed 2499.18 samples/sec Loss 2.0898 LearningRate 0.000296 Epoch: 20 Global Step: 423440 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:22:02,490-Speed 2497.36 samples/sec Loss 2.0489 LearningRate 0.000296 Epoch: 20 Global Step: 423450 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:22:10,694-Speed 2497.06 samples/sec Loss 2.0572 LearningRate 0.000296 Epoch: 20 Global Step: 423460 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:22:18,893-Speed 2497.89 samples/sec Loss 2.1059 LearningRate 0.000296 Epoch: 20 Global Step: 423470 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:22:27,099-Speed 2496.60 samples/sec Loss 2.1033 
LearningRate 0.000296 Epoch: 20 Global Step: 423480 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:22:35,256-Speed 2511.15 samples/sec Loss 2.0685 LearningRate 0.000296 Epoch: 20 Global Step: 423490 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:22:43,458-Speed 2497.33 samples/sec Loss 2.0933 LearningRate 0.000296 Epoch: 20 Global Step: 423500 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:22:51,656-Speed 2498.70 samples/sec Loss 2.0764 LearningRate 0.000296 Epoch: 20 Global Step: 423510 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:22:59,853-Speed 2498.75 samples/sec Loss 2.0804 LearningRate 0.000296 Epoch: 20 Global Step: 423520 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:23:08,051-Speed 2498.63 samples/sec Loss 2.1457 LearningRate 0.000296 Epoch: 20 Global Step: 423530 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:23:16,251-Speed 2497.85 samples/sec Loss 2.0827 LearningRate 0.000296 Epoch: 20 Global Step: 423540 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:23:24,402-Speed 2512.99 samples/sec Loss 2.0929 LearningRate 0.000296 Epoch: 20 Global Step: 423550 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:23:32,602-Speed 2498.01 samples/sec Loss 2.1127 LearningRate 0.000296 Epoch: 20 Global Step: 423560 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:23:40,799-Speed 2499.11 samples/sec Loss 2.0921 LearningRate 0.000296 Epoch: 20 Global Step: 423570 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:23:49,001-Speed 2497.51 samples/sec Loss 2.0609 LearningRate 0.000296 Epoch: 20 Global Step: 423580 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:23:57,203-Speed 2497.31 samples/sec Loss 2.1107 LearningRate 0.000296 Epoch: 20 Global Step: 423590 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:24:05,402-Speed 2498.20 samples/sec Loss 2.1061 
LearningRate 0.000296 Epoch: 20 Global Step: 423600 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:24:13,548-Speed 2514.58 samples/sec Loss 2.1098 LearningRate 0.000296 Epoch: 20 Global Step: 423610 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:24:21,748-Speed 2497.98 samples/sec Loss 2.1106 LearningRate 0.000296 Epoch: 20 Global Step: 423620 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:24:29,946-Speed 2498.51 samples/sec Loss 2.1246 LearningRate 0.000296 Epoch: 20 Global Step: 423630 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:24:38,149-Speed 2497.20 samples/sec Loss 2.0873 LearningRate 0.000296 Epoch: 20 Global Step: 423640 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:24:46,345-Speed 2499.33 samples/sec Loss 2.1036 LearningRate 0.000296 Epoch: 20 Global Step: 423650 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:24:54,551-Speed 2496.07 samples/sec Loss 2.1154 LearningRate 0.000296 Epoch: 20 Global Step: 423660 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:25:02,697-Speed 2514.62 samples/sec Loss 2.1095 LearningRate 0.000296 Epoch: 20 Global Step: 423670 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:25:10,899-Speed 2497.17 samples/sec Loss 2.1171 LearningRate 0.000296 Epoch: 20 Global Step: 423680 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:25:19,103-Speed 2496.93 samples/sec Loss 2.1022 LearningRate 0.000296 Epoch: 20 Global Step: 423690 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:25:27,302-Speed 2498.34 samples/sec Loss 2.1209 LearningRate 0.000296 Epoch: 20 Global Step: 423700 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:25:35,510-Speed 2495.29 samples/sec Loss 2.1084 LearningRate 0.000295 Epoch: 20 Global Step: 423710 Fp16 Grad Scale: 32768 Required: 93 hours Training: 2022-07-09 15:25:43,713-Speed 2496.86 samples/sec Loss 2.0995 
LearningRate 0.000295 Epoch: 20 Global Step: 423720 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:25:51,883-Speed 2507.37 samples/sec Loss 2.1367 LearningRate 0.000295 Epoch: 20 Global Step: 423730 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:26:00,081-Speed 2498.80 samples/sec Loss 2.0976 LearningRate 0.000295 Epoch: 20 Global Step: 423740 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:26:08,281-Speed 2497.85 samples/sec Loss 2.0604 LearningRate 0.000295 Epoch: 20 Global Step: 423750 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:26:16,481-Speed 2497.96 samples/sec Loss 2.1054 LearningRate 0.000295 Epoch: 20 Global Step: 423760 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:26:24,693-Speed 2494.43 samples/sec Loss 2.1312 LearningRate 0.000295 Epoch: 20 Global Step: 423770 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:26:32,893-Speed 2497.83 samples/sec Loss 2.1087 LearningRate 0.000295 Epoch: 20 Global Step: 423780 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:26:41,040-Speed 2514.22 samples/sec Loss 2.0987 LearningRate 0.000295 Epoch: 20 Global Step: 423790 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:26:49,239-Speed 2498.05 samples/sec Loss 2.0791 LearningRate 0.000295 Epoch: 20 Global Step: 423800 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:26:57,438-Speed 2498.16 samples/sec Loss 2.1096 LearningRate 0.000295 Epoch: 20 Global Step: 423810 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:27:05,639-Speed 2497.85 samples/sec Loss 2.0735 LearningRate 0.000295 Epoch: 20 Global Step: 423820 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:27:13,836-Speed 2498.76 samples/sec Loss 2.0742 LearningRate 0.000295 Epoch: 20 Global Step: 423830 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:27:22,036-Speed 2498.04 samples/sec Loss 2.0649 LearningRate 0.000295 Epoch: 20 Global Step: 423840 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:27:30,189-Speed 2512.65 samples/sec Loss 2.0853 LearningRate 0.000295 Epoch: 20 Global Step: 423850 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:27:38,390-Speed 2497.54 samples/sec Loss 2.0326 LearningRate 0.000295 Epoch: 20 Global Step: 423860 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:27:46,587-Speed 2498.77 samples/sec Loss 2.0706 LearningRate 0.000295 Epoch: 20 Global Step: 423870 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:27:54,808-Speed 2491.87 samples/sec Loss 2.0901 LearningRate 0.000295 Epoch: 20 Global Step: 423880 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:28:03,010-Speed 2497.24 samples/sec Loss 2.0528 LearningRate 0.000295 Epoch: 20 Global Step: 423890 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:28:11,216-Speed 2496.23 samples/sec Loss 2.1082 LearningRate 0.000295 Epoch: 20 Global Step: 423900 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:28:19,376-Speed 2510.12 samples/sec Loss 2.0891 LearningRate 0.000295 Epoch: 20 Global Step: 423910 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:28:27,574-Speed 2498.67 samples/sec Loss 2.0972 LearningRate 0.000295 Epoch: 20 Global Step: 423920 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:28:35,773-Speed 2498.25 samples/sec Loss 2.0860 LearningRate 0.000295 Epoch: 20 Global Step: 423930 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:28:43,973-Speed 2498.06 samples/sec Loss 2.0442 LearningRate 0.000295 Epoch: 20 Global Step: 423940 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:28:52,172-Speed 2498.35 samples/sec Loss 2.0926 LearningRate 0.000295 Epoch: 20 Global Step: 423950 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:29:00,371-Speed 2498.30 samples/sec Loss 2.0871 LearningRate 0.000295 Epoch: 20 Global Step: 423960 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:29:08,515-Speed 2515.00 samples/sec Loss 2.0782 LearningRate 0.000295 Epoch: 20 Global Step: 423970 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:29:16,724-Speed 2495.32 samples/sec Loss 2.1482 LearningRate 0.000295 Epoch: 20 Global Step: 423980 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:29:24,927-Speed 2497.08 samples/sec Loss 2.1004 LearningRate 0.000295 Epoch: 20 Global Step: 423990 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:29:33,092-Speed 2508.55 samples/sec Loss 2.0631 LearningRate 0.000295 Epoch: 20 Global Step: 424000 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:29:41,292-Speed 2497.89 samples/sec Loss 2.0922 LearningRate 0.000295 Epoch: 20 Global Step: 424010 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:29:49,491-Speed 2498.14 samples/sec Loss 2.1214 LearningRate 0.000295 Epoch: 20 Global Step: 424020 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:29:57,639-Speed 2514.17 samples/sec Loss 2.0261 LearningRate 0.000295 Epoch: 20 Global Step: 424030 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:30:05,847-Speed 2495.65 samples/sec Loss 2.1111 LearningRate 0.000295 Epoch: 20 Global Step: 424040 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:30:14,048-Speed 2497.82 samples/sec Loss 2.1303 LearningRate 0.000295 Epoch: 20 Global Step: 424050 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:30:22,248-Speed 2497.71 samples/sec Loss 2.0652 LearningRate 0.000295 Epoch: 20 Global Step: 424060 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:30:30,456-Speed 2495.62 samples/sec Loss 2.0808 LearningRate 0.000295 Epoch: 20 Global Step: 424070 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:30:38,656-Speed 2498.06 samples/sec Loss 2.1009 LearningRate 0.000295 Epoch: 20 Global Step: 424080 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:30:46,803-Speed 2514.42 samples/sec Loss 2.1039 LearningRate 0.000295 Epoch: 20 Global Step: 424090 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:30:55,017-Speed 2493.56 samples/sec Loss 2.0732 LearningRate 0.000295 Epoch: 20 Global Step: 424100 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:31:03,216-Speed 2498.20 samples/sec Loss 2.1141 LearningRate 0.000295 Epoch: 20 Global Step: 424110 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:31:11,423-Speed 2495.74 samples/sec Loss 2.0679 LearningRate 0.000295 Epoch: 20 Global Step: 424120 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:31:19,622-Speed 2498.51 samples/sec Loss 2.0740 LearningRate 0.000295 Epoch: 20 Global Step: 424130 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:31:27,822-Speed 2497.71 samples/sec Loss 2.0649 LearningRate 0.000295 Epoch: 20 Global Step: 424140 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:31:36,001-Speed 2504.39 samples/sec Loss 2.1093 LearningRate 0.000295 Epoch: 20 Global Step: 424150 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:31:44,200-Speed 2498.29 samples/sec Loss 2.0555 LearningRate 0.000295 Epoch: 20 Global Step: 424160 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:31:52,399-Speed 2498.27 samples/sec Loss 2.0765 LearningRate 0.000295 Epoch: 20 Global Step: 424170 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:32:00,599-Speed 2497.92 samples/sec Loss 2.1119 LearningRate 0.000295 Epoch: 20 Global Step: 424180 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:32:08,820-Speed 2491.62 samples/sec Loss 2.0815 LearningRate 0.000295 Epoch: 20 Global Step: 424190 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:32:17,020-Speed 2497.75 samples/sec Loss 2.0621 LearningRate 0.000295 Epoch: 20 Global Step: 424200 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:32:25,168-Speed 2513.83 samples/sec Loss 2.0548 LearningRate 0.000295 Epoch: 20 Global Step: 424210 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:32:33,366-Speed 2498.86 samples/sec Loss 2.0719 LearningRate 0.000295 Epoch: 20 Global Step: 424220 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:32:41,566-Speed 2497.81 samples/sec Loss 2.0561 LearningRate 0.000295 Epoch: 20 Global Step: 424230 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:32:49,769-Speed 2497.08 samples/sec Loss 2.1233 LearningRate 0.000295 Epoch: 20 Global Step: 424240 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:32:57,969-Speed 2498.07 samples/sec Loss 2.0679 LearningRate 0.000295 Epoch: 20 Global Step: 424250 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:33:06,176-Speed 2495.61 samples/sec Loss 2.0758 LearningRate 0.000295 Epoch: 20 Global Step: 424260 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:33:14,325-Speed 2513.75 samples/sec Loss 2.1160 LearningRate 0.000295 Epoch: 20 Global Step: 424270 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:33:22,527-Speed 2497.25 samples/sec Loss 2.0774 LearningRate 0.000295 Epoch: 20 Global Step: 424280 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:33:30,728-Speed 2498.17 samples/sec Loss 2.1006 LearningRate 0.000295 Epoch: 20 Global Step: 424290 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:33:38,931-Speed 2497.05 samples/sec Loss 2.1320 LearningRate 0.000295 Epoch: 20 Global Step: 424300 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:33:47,144-Speed 2494.02 samples/sec Loss 2.0868 LearningRate 0.000295 Epoch: 20 Global Step: 424310 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:33:55,342-Speed 2498.52 samples/sec Loss 2.0895 LearningRate 0.000295 Epoch: 20 Global Step: 424320 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:34:03,491-Speed 2513.58 samples/sec Loss 2.0630 LearningRate 0.000295 Epoch: 20 Global Step: 424330 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:34:11,691-Speed 2497.86 samples/sec Loss 2.0770 LearningRate 0.000295 Epoch: 20 Global Step: 424340 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:34:19,891-Speed 2497.90 samples/sec Loss 2.0768 LearningRate 0.000295 Epoch: 20 Global Step: 424350 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:34:28,093-Speed 2497.47 samples/sec Loss 2.0662 LearningRate 0.000295 Epoch: 20 Global Step: 424360 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:34:36,293-Speed 2498.01 samples/sec Loss 2.0824 LearningRate 0.000295 Epoch: 20 Global Step: 424370 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:34:44,501-Speed 2495.32 samples/sec Loss 2.0541 LearningRate 0.000295 Epoch: 20 Global Step: 424380 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:34:52,647-Speed 2514.56 samples/sec Loss 2.0348 LearningRate 0.000295 Epoch: 20 Global Step: 424390 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:35:00,848-Speed 2497.49 samples/sec Loss 2.0817 LearningRate 0.000294 Epoch: 20 Global Step: 424400 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:35:09,057-Speed 2495.34 samples/sec Loss 2.0578 LearningRate 0.000294 Epoch: 20 Global Step: 424410 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:35:17,262-Speed 2496.52 samples/sec Loss 2.0650 LearningRate 0.000294 Epoch: 20 Global Step: 424420 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:35:25,460-Speed 2498.46 samples/sec Loss 2.0904 LearningRate 0.000294 Epoch: 20 Global Step: 424430 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:35:33,660-Speed 2497.87 samples/sec Loss 2.1738 LearningRate 0.000294 Epoch: 20 Global Step: 424440 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:35:41,810-Speed 2513.36 samples/sec Loss 2.1688 LearningRate 0.000294 Epoch: 20 Global Step: 424450 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:35:50,014-Speed 2496.94 samples/sec Loss 2.0874 LearningRate 0.000294 Epoch: 20 Global Step: 424460 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:35:58,215-Speed 2497.56 samples/sec Loss 2.0845 LearningRate 0.000294 Epoch: 20 Global Step: 424470 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:36:06,420-Speed 2496.23 samples/sec Loss 2.1352 LearningRate 0.000294 Epoch: 20 Global Step: 424480 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:36:14,621-Speed 2498.35 samples/sec Loss 2.0807 LearningRate 0.000294 Epoch: 20 Global Step: 424490 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:36:22,822-Speed 2497.66 samples/sec Loss 2.1067 LearningRate 0.000294 Epoch: 20 Global Step: 424500 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:36:30,971-Speed 2513.46 samples/sec Loss 2.0254 LearningRate 0.000294 Epoch: 20 Global Step: 424510 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:36:39,172-Speed 2497.67 samples/sec Loss 2.1004 LearningRate 0.000294 Epoch: 20 Global Step: 424520 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:36:47,376-Speed 2496.92 samples/sec Loss 2.1010 LearningRate 0.000294 Epoch: 20 Global Step: 424530 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:36:55,584-Speed 2495.49 samples/sec Loss 2.1156 LearningRate 0.000294 Epoch: 20 Global Step: 424540 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:37:03,786-Speed 2497.44 samples/sec Loss 2.1365 LearningRate 0.000294 Epoch: 20 Global Step: 424550 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:37:11,984-Speed 2498.38 samples/sec Loss 2.0789 LearningRate 0.000294 Epoch: 20 Global Step: 424560 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:37:20,130-Speed 2514.52 samples/sec Loss 2.1181 LearningRate 0.000294 Epoch: 20 Global Step: 424570 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:37:28,335-Speed 2496.42 samples/sec Loss 2.0459 LearningRate 0.000294 Epoch: 20 Global Step: 424580 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:37:36,543-Speed 2495.54 samples/sec Loss 2.0736 LearningRate 0.000294 Epoch: 20 Global Step: 424590 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:37:44,746-Speed 2497.18 samples/sec Loss 2.0503 LearningRate 0.000294 Epoch: 20 Global Step: 424600 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:37:52,950-Speed 2496.62 samples/sec Loss 2.0778 LearningRate 0.000294 Epoch: 20 Global Step: 424610 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:38:01,149-Speed 2498.57 samples/sec Loss 2.1415 LearningRate 0.000294 Epoch: 20 Global Step: 424620 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:38:09,306-Speed 2511.16 samples/sec Loss 2.1161 LearningRate 0.000294 Epoch: 20 Global Step: 424630 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:38:17,507-Speed 2497.51 samples/sec Loss 2.0825 LearningRate 0.000294 Epoch: 20 Global Step: 424640 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:38:25,717-Speed 2495.11 samples/sec Loss 2.1085 LearningRate 0.000294 Epoch: 20 Global Step: 424650 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:38:33,922-Speed 2496.36 samples/sec Loss 2.1035 LearningRate 0.000294 Epoch: 20 Global Step: 424660 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:38:42,129-Speed 2495.71 samples/sec Loss 2.1457 LearningRate 0.000294 Epoch: 20 Global Step: 424670 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:38:50,329-Speed 2498.10 samples/sec Loss 2.1472 LearningRate 0.000294 Epoch: 20 Global Step: 424680 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:38:58,481-Speed 2512.70 samples/sec Loss 2.0865 LearningRate 0.000294 Epoch: 20 Global Step: 424690 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:39:06,685-Speed 2496.67 samples/sec Loss 2.1594 LearningRate 0.000294 Epoch: 20 Global Step: 424700 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:39:14,887-Speed 2497.35 samples/sec Loss 2.1127 LearningRate 0.000294 Epoch: 20 Global Step: 424710 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:39:23,090-Speed 2496.74 samples/sec Loss 2.0996 LearningRate 0.000294 Epoch: 20 Global Step: 424720 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:39:31,298-Speed 2495.61 samples/sec Loss 2.1130 LearningRate 0.000294 Epoch: 20 Global Step: 424730 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:39:39,498-Speed 2497.95 samples/sec Loss 2.0926 LearningRate 0.000294 Epoch: 20 Global Step: 424740 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:39:47,645-Speed 2514.37 samples/sec Loss 2.1296 LearningRate 0.000294 Epoch: 20 Global Step: 424750 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:39:55,842-Speed 2498.65 samples/sec Loss 2.1505 LearningRate 0.000294 Epoch: 20 Global Step: 424760 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:40:04,047-Speed 2496.55 samples/sec Loss 2.1112 LearningRate 0.000294 Epoch: 20 Global Step: 424770 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:40:12,248-Speed 2497.40 samples/sec Loss 2.1013 LearningRate 0.000294 Epoch: 20 Global Step: 424780 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:40:20,451-Speed 2497.33 samples/sec Loss 2.1372 LearningRate 0.000294 Epoch: 20 Global Step: 424790 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:40:28,651-Speed 2498.05 samples/sec Loss 2.0896 LearningRate 0.000294 Epoch: 20 Global Step: 424800 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:40:36,800-Speed 2513.65 samples/sec Loss 2.1279 LearningRate 0.000294 Epoch: 20 Global Step: 424810 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:40:45,002-Speed 2497.13 samples/sec Loss 2.1267 LearningRate 0.000294 Epoch: 20 Global Step: 424820 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:40:53,201-Speed 2498.22 samples/sec Loss 2.1261 LearningRate 0.000294 Epoch: 20 Global Step: 424830 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:41:01,398-Speed 2498.97 samples/sec Loss 2.1525 LearningRate 0.000294 Epoch: 20 Global Step: 424840 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:41:09,600-Speed 2497.64 samples/sec Loss 2.0615 LearningRate 0.000294 Epoch: 20 Global Step: 424850 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:41:17,801-Speed 2497.44 samples/sec Loss 2.1214 LearningRate 0.000294 Epoch: 20 Global Step: 424860 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:41:25,948-Speed 2514.32 samples/sec Loss 2.1368 LearningRate 0.000294 Epoch: 20 Global Step: 424870 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:41:34,148-Speed 2498.07 samples/sec Loss 2.1305 LearningRate 0.000294 Epoch: 20 Global Step: 424880 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:41:42,349-Speed 2497.70 samples/sec Loss 2.1127 LearningRate 0.000294 Epoch: 20 Global Step: 424890 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:41:50,562-Speed 2494.00 samples/sec Loss 2.0935 LearningRate 0.000294 Epoch: 20 Global Step: 424900 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:41:58,764-Speed 2497.39 samples/sec Loss 2.1286 LearningRate 0.000294 Epoch: 20 Global Step: 424910 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:42:06,964-Speed 2498.02 samples/sec Loss 2.1322 LearningRate 0.000294 Epoch: 20 Global Step: 424920 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:42:15,111-Speed 2514.16 samples/sec Loss 2.1023 LearningRate 0.000294 Epoch: 20 Global Step: 424930 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:42:23,308-Speed 2498.83 samples/sec Loss 2.1260 LearningRate 0.000294 Epoch: 20 Global Step: 424940 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:42:31,511-Speed 2497.24 samples/sec Loss 2.0877 LearningRate 0.000294 Epoch: 20 Global Step: 424950 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:42:39,718-Speed 2496.00 samples/sec Loss 2.1009 LearningRate 0.000294 Epoch: 20 Global Step: 424960 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:42:47,916-Speed 2498.67 samples/sec Loss 2.1205 LearningRate 0.000294 Epoch: 20 Global Step: 424970 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:42:56,118-Speed 2497.18 samples/sec Loss 2.1100 LearningRate 0.000294 Epoch: 20 Global Step: 424980 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:43:04,285-Speed 2508.23 samples/sec Loss 2.1463 LearningRate 0.000294 Epoch: 20 Global Step: 424990 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:43:12,490-Speed 2496.26 samples/sec Loss 2.1161 LearningRate 0.000294 Epoch: 20 Global Step: 425000 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:43:20,695-Speed 2496.34 samples/sec Loss 2.1459 LearningRate 0.000294 Epoch: 20 Global Step: 425010 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:43:28,899-Speed 2497.31 samples/sec Loss 2.1346 LearningRate 0.000294 Epoch: 20 Global Step: 425020 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:43:37,102-Speed 2497.09 samples/sec Loss 2.1103 LearningRate 0.000294 Epoch: 20 Global Step: 425030 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:43:45,308-Speed 2496.07 samples/sec Loss 2.0884 LearningRate 0.000294 Epoch: 20 Global Step: 425040 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:43:53,470-Speed 2509.75 samples/sec Loss 2.1195 LearningRate 0.000294 Epoch: 20 Global Step: 425050 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:44:01,674-Speed 2496.84 samples/sec Loss 2.1067 LearningRate 0.000294 Epoch: 20 Global Step: 425060 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:44:09,875-Speed 2497.68 samples/sec Loss 2.1062 LearningRate 0.000294 Epoch: 20 Global Step: 425070 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:44:18,072-Speed 2498.78 samples/sec Loss 2.0952 LearningRate 0.000294 Epoch: 20 Global Step: 425080 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:44:26,270-Speed 2498.51 samples/sec Loss 2.1108 LearningRate 0.000293 Epoch: 20 Global Step: 425090 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:44:34,471-Speed 2497.42 samples/sec Loss 2.0577 LearningRate 0.000293 Epoch: 20 Global Step: 425100 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:44:42,631-Speed 2510.61 samples/sec Loss 2.0762 LearningRate 0.000293 Epoch: 20 Global Step: 425110 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:44:50,829-Speed 2498.64 samples/sec Loss 2.0739 LearningRate 0.000293 Epoch: 20 Global Step: 425120 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:44:59,125-Speed 2468.93 samples/sec Loss 2.0826 LearningRate 0.000293 Epoch: 20 Global Step: 425130 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:45:07,328-Speed 2496.98 samples/sec Loss 2.0879 LearningRate 0.000293 Epoch: 20 Global Step: 425140 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:45:15,532-Speed 2496.80 samples/sec Loss 2.0923 LearningRate 0.000293 Epoch: 20 Global Step: 425150 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:45:23,730-Speed 2498.91 samples/sec Loss 2.1299 LearningRate 0.000293 Epoch: 20 Global Step: 425160 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:45:31,877-Speed 2514.03 samples/sec Loss 2.0923 LearningRate 0.000293 Epoch: 20 Global Step: 425170 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:45:40,082-Speed 2496.50 samples/sec Loss 2.0820 LearningRate 0.000293 Epoch: 20 Global Step: 425180 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:45:48,281-Speed 2498.38 samples/sec Loss 2.1228 LearningRate 0.000293 Epoch: 20 Global Step: 425190 Fp16 Grad Scale: 16384 Required: 93 hours
Training: 2022-07-09 15:45:56,487-Speed 2496.05 samples/sec Loss 2.0940 LearningRate 0.000293 Epoch: 20 Global Step: 425200 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:46:04,700-Speed 2494.06 samples/sec Loss 2.0846 LearningRate 0.000293 Epoch: 20 Global Step: 425210 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:46:12,903-Speed 2497.13 samples/sec Loss 2.0796 LearningRate 0.000293 Epoch: 20 Global Step: 425220 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:46:21,053-Speed 2513.41 samples/sec Loss 2.0829 LearningRate 0.000293 Epoch: 20 Global Step: 425230 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:46:29,259-Speed 2496.19 samples/sec Loss 2.0541 LearningRate 0.000293 Epoch: 20 Global Step: 425240 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:46:37,468-Speed 2495.05 samples/sec Loss 2.1249 LearningRate 0.000293 Epoch: 20 Global Step: 425250 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:46:45,677-Speed 2495.40 samples/sec Loss 2.1091 LearningRate 0.000293 Epoch: 20 Global Step: 425260 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:46:53,882-Speed 2496.60 samples/sec Loss 2.0952 LearningRate 0.000293 Epoch: 20 Global Step: 425270 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:47:02,088-Speed 2495.93 samples/sec Loss 2.1054 LearningRate 0.000293 Epoch: 20 Global Step: 425280 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:47:10,240-Speed 2512.53 samples/sec Loss 2.1157 LearningRate 0.000293 Epoch: 20 Global Step: 425290 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:47:18,444-Speed 2496.65 samples/sec Loss 2.0252 LearningRate 0.000293 Epoch: 20 Global Step: 425300 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:47:26,644-Speed 2498.19 samples/sec Loss 2.1074 LearningRate 0.000293 Epoch: 20 Global Step: 425310 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:47:34,855-Speed 2494.43 samples/sec Loss 2.0827 LearningRate 0.000293 Epoch: 20 Global Step: 425320 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:47:43,069-Speed 2493.92 samples/sec Loss 2.0538 LearningRate 0.000293 Epoch: 20 Global Step: 425330 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:47:51,270-Speed 2497.67 samples/sec Loss 2.0946 LearningRate 0.000293 Epoch: 20 Global Step: 425340 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:47:59,418-Speed 2513.90 samples/sec Loss 2.0664 LearningRate 0.000293 Epoch: 20 Global Step: 425350 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:48:07,627-Speed 2495.26 samples/sec Loss 2.0890 LearningRate 0.000293 Epoch: 20 Global Step: 425360 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:48:15,828-Speed 2497.46 samples/sec Loss 2.0569 LearningRate 0.000293 Epoch: 20 Global Step: 425370 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:48:24,030-Speed 2497.51 samples/sec Loss 2.0810 LearningRate 0.000293 Epoch: 20 Global Step: 425380 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:48:32,234-Speed 2496.51 samples/sec Loss 2.0827 LearningRate 0.000293 Epoch: 20 Global Step: 425390 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:48:40,440-Speed 2496.19 samples/sec Loss 2.1002 LearningRate 0.000293 Epoch: 20 Global Step: 425400 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:48:48,610-Speed 2507.13 samples/sec Loss 2.0928 LearningRate 0.000293 Epoch: 20 Global Step: 425410 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:48:56,817-Speed 2496.30 samples/sec Loss 2.0544 LearningRate 0.000293 Epoch: 20 Global Step: 425420 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:49:05,022-Speed 2496.33 samples/sec Loss 2.0420 LearningRate 0.000293 Epoch: 20 Global Step: 425430 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-07-09 15:49:13,225-Speed 2497.02 samples/sec Loss 2.0579 LearningRate 0.000293 Epoch: 20 Global Step: 425440 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:49:21,426-Speed 2497.59 samples/sec Loss 2.0443 LearningRate 0.000293 Epoch: 20 Global Step: 425450 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:49:29,629-Speed 2497.14 samples/sec Loss 2.0927 LearningRate 0.000293 Epoch: 20 Global Step: 425460 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:49:37,787-Speed 2510.74 samples/sec Loss 2.1270 LearningRate 0.000293 Epoch: 20 Global Step: 425470 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:49:45,989-Speed 2497.26 samples/sec Loss 2.0324 LearningRate 0.000293 Epoch: 20 Global Step: 425480 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:49:54,197-Speed 2495.77 samples/sec Loss 2.0750 LearningRate 0.000293 Epoch: 20 Global Step: 425490 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:50:02,398-Speed 2497.56 samples/sec Loss 2.0315 LearningRate 0.000293 Epoch: 20 Global Step: 425500 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:50:10,611-Speed 2494.04 samples/sec Loss 2.0623 LearningRate 0.000293 Epoch: 20 Global Step: 425510 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:50:18,815-Speed 2496.72 samples/sec Loss 2.0897 LearningRate 0.000293 Epoch: 20 Global Step: 425520 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:50:26,961-Speed 2514.45 samples/sec Loss 2.0544 LearningRate 0.000293 Epoch: 20 Global Step: 425530 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:50:35,190-Speed 2489.26 samples/sec Loss 2.0717 LearningRate 0.000293 Epoch: 20 Global Step: 425540 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:50:43,395-Speed 2496.37 samples/sec Loss 2.0439 LearningRate 0.000293 Epoch: 20 Global Step: 425550 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:50:51,607-Speed 2494.30 samples/sec Loss 2.0690 LearningRate 0.000293 Epoch: 20 Global Step: 425560 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:50:59,817-Speed 2494.82 samples/sec Loss 2.0735 LearningRate 0.000293 Epoch: 20 Global Step: 425570 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:51:08,021-Speed 2496.86 samples/sec Loss 2.0590 LearningRate 0.000293 Epoch: 20 Global Step: 425580 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:51:16,168-Speed 2514.05 samples/sec Loss 2.1084 LearningRate 0.000293 Epoch: 20 Global Step: 425590 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:51:24,375-Speed 2495.80 samples/sec Loss 2.0956 LearningRate 0.000293 Epoch: 20 Global Step: 425600 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:51:32,574-Speed 2498.35 samples/sec Loss 2.0959 LearningRate 0.000293 Epoch: 20 Global Step: 425610 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:51:40,776-Speed 2497.33 samples/sec Loss 2.0744 LearningRate 0.000293 Epoch: 20 Global Step: 425620 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:51:48,977-Speed 2497.64 samples/sec Loss 2.0865 LearningRate 0.000293 Epoch: 20 Global Step: 425630 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:51:57,183-Speed 2496.73 samples/sec Loss 2.0821 LearningRate 0.000293 Epoch: 20 Global Step: 425640 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:52:05,332-Speed 2513.96 samples/sec Loss 2.0267 LearningRate 0.000293 Epoch: 20 Global Step: 425650 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:52:13,539-Speed 2495.83 samples/sec Loss 2.1006 LearningRate 0.000293 Epoch: 20 Global Step: 425660 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:52:21,740-Speed 2497.38 samples/sec Loss 2.0579 LearningRate 0.000293 Epoch: 20 Global Step: 425670 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:52:29,940-Speed 2497.96 samples/sec Loss 2.0799 LearningRate 0.000293 Epoch: 20 Global Step: 425680 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:52:38,140-Speed 2498.14 samples/sec Loss 2.0780 LearningRate 0.000293 Epoch: 20 Global Step: 425690 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:52:46,343-Speed 2496.86 samples/sec Loss 2.1206 LearningRate 0.000293 Epoch: 20 Global Step: 425700 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:52:54,492-Speed 2513.51 samples/sec Loss 2.0187 LearningRate 0.000293 Epoch: 20 Global Step: 425710 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:53:02,691-Speed 2498.22 samples/sec Loss 2.0585 LearningRate 0.000293 Epoch: 20 Global Step: 425720 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:53:10,892-Speed 2497.89 samples/sec Loss 2.0709 LearningRate 0.000293 Epoch: 20 Global Step: 425730 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:53:19,100-Speed 2495.37 samples/sec Loss 2.0810 LearningRate 0.000293 Epoch: 20 Global Step: 425740 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:53:27,317-Speed 2492.71 samples/sec Loss 2.0605 LearningRate 0.000293 Epoch: 20 Global Step: 425750 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:53:35,515-Speed 2498.85 samples/sec Loss 2.0396 LearningRate 0.000293 Epoch: 20 Global Step: 425760 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:53:43,677-Speed 2509.60 samples/sec Loss 2.1287 LearningRate 0.000293 Epoch: 20 Global Step: 425770 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:53:51,878-Speed 2497.66 samples/sec Loss 2.0420 LearningRate 0.000292 Epoch: 20 Global Step: 425780 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:54:00,080-Speed 2497.67 samples/sec Loss 2.0508 LearningRate 0.000292 Epoch: 20 Global Step: 425790 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:54:08,278-Speed 2498.59 samples/sec Loss 2.0445 LearningRate 0.000292 Epoch: 20 Global Step: 425800 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:54:16,482-Speed 2496.71 samples/sec Loss 2.1049 LearningRate 0.000292 Epoch: 20 Global Step: 425810 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:54:24,689-Speed 2495.69 samples/sec Loss 2.1078 LearningRate 0.000292 Epoch: 20 Global Step: 425820 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:54:32,837-Speed 2513.91 samples/sec Loss 2.0550 LearningRate 0.000292 Epoch: 20 Global Step: 425830 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:54:41,046-Speed 2495.49 samples/sec Loss 2.0769 LearningRate 0.000292 Epoch: 20 Global Step: 425840 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:54:49,259-Speed 2494.08 samples/sec Loss 2.0899 LearningRate 0.000292 Epoch: 20 Global Step: 425850 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:54:57,464-Speed 2496.49 samples/sec Loss 2.0812 LearningRate 0.000292 Epoch: 20 Global Step: 425860 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:55:05,666-Speed 2497.48 samples/sec Loss 2.0927 LearningRate 0.000292 Epoch: 20 Global Step: 425870 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:55:13,875-Speed 2495.21 samples/sec Loss 2.0918 LearningRate 0.000292 Epoch: 20 Global Step: 425880 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:55:22,029-Speed 2511.93 samples/sec Loss 2.0336 LearningRate 0.000292 Epoch: 20 Global Step: 425890 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:55:30,232-Speed 2497.00 samples/sec Loss 2.0188 LearningRate 0.000292 Epoch: 20 Global Step: 425900 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:55:38,443-Speed 2494.78 samples/sec Loss 2.0277 LearningRate 0.000292 Epoch: 20 Global Step: 425910 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:55:46,645-Speed 2497.70 samples/sec Loss 2.0795 LearningRate 0.000292 Epoch: 20 Global Step: 425920 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:55:54,857-Speed 2494.26 samples/sec Loss 2.0916 LearningRate 0.000292 Epoch: 20 Global Step: 425930 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:56:03,060-Speed 2497.13 samples/sec Loss 2.0128 LearningRate 0.000292 Epoch: 20 Global Step: 425940 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:56:11,221-Speed 2510.05 samples/sec Loss 2.0870 LearningRate 0.000292 Epoch: 20 Global Step: 425950 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:56:19,420-Speed 2498.19 samples/sec Loss 2.1434 LearningRate 0.000292 Epoch: 20 Global Step: 425960 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:56:27,624-Speed 2496.80 samples/sec Loss 2.1099 LearningRate 0.000292 Epoch: 20 Global Step: 425970 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:56:35,827-Speed 2500.59 samples/sec Loss 2.1090 LearningRate 0.000292 Epoch: 20 Global Step: 425980 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:56:44,031-Speed 2496.88 samples/sec Loss 2.0980 LearningRate 0.000292 Epoch: 20 Global Step: 425990 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:56:52,231-Speed 2498.05 samples/sec Loss 2.0873 LearningRate 0.000292 Epoch: 20 Global Step: 426000 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:57:00,392-Speed 2510.53 samples/sec Loss 2.0561 LearningRate 0.000292 Epoch: 20 Global Step: 426010 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:57:08,597-Speed 2496.24 samples/sec Loss 2.0841 LearningRate 0.000292 Epoch: 20 Global Step: 426020 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:57:16,799-Speed 2497.48 samples/sec Loss 2.0750 LearningRate 0.000292 Epoch: 20 Global Step: 426030 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:57:24,999-Speed 2498.12 samples/sec Loss 2.0980 LearningRate 0.000292 Epoch: 20 Global Step: 426040 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:57:33,199-Speed 2497.94 samples/sec Loss 2.0773 LearningRate 0.000292 Epoch: 20 Global Step: 426050 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:57:41,401-Speed 2497.50 samples/sec Loss 2.0769 LearningRate 0.000292 Epoch: 20 Global Step: 426060 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:57:49,546-Speed 2514.67 samples/sec Loss 2.0954 LearningRate 0.000292 Epoch: 20 Global Step: 426070 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:57:57,756-Speed 2495.11 samples/sec Loss 2.0925 LearningRate 0.000292 Epoch: 20 Global Step: 426080 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:58:05,968-Speed 2494.02 samples/sec Loss 2.0678 LearningRate 0.000292 Epoch: 20 Global Step: 426090 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:58:14,165-Speed 2499.10 samples/sec Loss 2.0814 LearningRate 0.000292 Epoch: 20 Global Step: 426100 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:58:22,373-Speed 2495.29 samples/sec Loss 2.0730 LearningRate 0.000292 Epoch: 20 Global Step: 426110 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:58:30,577-Speed 2497.03 samples/sec Loss 2.0515 LearningRate 0.000292 Epoch: 20 Global Step: 426120 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:58:38,725-Speed 2513.96 samples/sec Loss 2.0886 LearningRate 0.000292 Epoch: 20 Global Step: 426130 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:58:46,926-Speed 2497.48 samples/sec Loss 2.0829 LearningRate 0.000292 Epoch: 20 Global Step: 426140 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:58:55,126-Speed 2497.97 samples/sec Loss 2.1000 LearningRate 0.000292 Epoch: 20 Global Step: 426150 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:59:03,323-Speed 2499.01 samples/sec Loss 2.1082 LearningRate 0.000292 Epoch: 20 Global Step: 426160 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:59:11,522-Speed 2498.28 samples/sec Loss 2.1535 LearningRate 0.000292 Epoch: 20 Global Step: 426170 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:59:19,733-Speed 2494.49 samples/sec Loss 2.0536 LearningRate 0.000292 Epoch: 20 Global Step: 426180 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:59:27,880-Speed 2514.41 samples/sec Loss 2.0994 LearningRate 0.000292 Epoch: 20 Global Step: 426190 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:59:36,080-Speed 2497.86 samples/sec Loss 2.0665 LearningRate 0.000292 Epoch: 20 Global Step: 426200 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:59:44,282-Speed 2497.32 samples/sec Loss 2.1058 LearningRate 0.000292 Epoch: 20 Global Step: 426210 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 15:59:52,440-Speed 2510.96 samples/sec Loss 2.0829 LearningRate 0.000292 Epoch: 20 Global Step: 426220 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:00:00,636-Speed 2499.24 samples/sec Loss 2.0989 LearningRate 0.000292 Epoch: 20 Global Step: 426230 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:00:08,833-Speed 2498.55 samples/sec Loss 2.0586
LearningRate 0.000292 Epoch: 20 Global Step: 426240 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:00:16,982-Speed 2513.55 samples/sec Loss 2.0589 LearningRate 0.000292 Epoch: 20 Global Step: 426250 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:00:25,185-Speed 2497.01 samples/sec Loss 2.0948 LearningRate 0.000292 Epoch: 20 Global Step: 426260 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:00:33,391-Speed 2496.36 samples/sec Loss 2.0674 LearningRate 0.000292 Epoch: 20 Global Step: 426270 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:00:41,592-Speed 2497.75 samples/sec Loss 2.1198 LearningRate 0.000292 Epoch: 20 Global Step: 426280 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:00:49,793-Speed 2498.04 samples/sec Loss 2.0699 LearningRate 0.000292 Epoch: 20 Global Step: 426290 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:00:57,997-Speed 2496.58 samples/sec Loss 2.1103 LearningRate 0.000292 Epoch: 20 Global Step: 426300 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:01:06,148-Speed 2512.99 samples/sec Loss 2.0491 LearningRate 0.000292 Epoch: 20 Global Step: 426310 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:01:14,353-Speed 2496.63 samples/sec Loss 2.0703 LearningRate 0.000292 Epoch: 20 Global Step: 426320 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:01:22,555-Speed 2497.31 samples/sec Loss 2.0598 LearningRate 0.000292 Epoch: 20 Global Step: 426330 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:01:30,763-Speed 2495.66 samples/sec Loss 2.1217 LearningRate 0.000292 Epoch: 20 Global Step: 426340 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:01:38,966-Speed 2497.07 samples/sec Loss 2.0727 LearningRate 0.000292 Epoch: 20 Global Step: 426350 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:01:47,174-Speed 2495.38 samples/sec Loss 2.1313 
LearningRate 0.000292 Epoch: 20 Global Step: 426360 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:01:55,328-Speed 2512.16 samples/sec Loss 2.0918 LearningRate 0.000292 Epoch: 20 Global Step: 426370 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:02:03,530-Speed 2498.49 samples/sec Loss 2.0882 LearningRate 0.000292 Epoch: 20 Global Step: 426380 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:02:11,738-Speed 2495.46 samples/sec Loss 2.1178 LearningRate 0.000292 Epoch: 20 Global Step: 426390 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:02:19,937-Speed 2498.25 samples/sec Loss 2.1147 LearningRate 0.000292 Epoch: 20 Global Step: 426400 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:02:28,141-Speed 2496.58 samples/sec Loss 2.1219 LearningRate 0.000292 Epoch: 20 Global Step: 426410 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:02:36,353-Speed 2494.51 samples/sec Loss 2.1437 LearningRate 0.000292 Epoch: 20 Global Step: 426420 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:02:44,497-Speed 2514.90 samples/sec Loss 2.1181 LearningRate 0.000292 Epoch: 20 Global Step: 426430 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:02:52,698-Speed 2497.83 samples/sec Loss 2.1149 LearningRate 0.000292 Epoch: 20 Global Step: 426440 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:03:00,898-Speed 2497.88 samples/sec Loss 2.0284 LearningRate 0.000292 Epoch: 20 Global Step: 426450 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:03:09,096-Speed 2498.55 samples/sec Loss 2.1295 LearningRate 0.000292 Epoch: 20 Global Step: 426460 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:03:17,292-Speed 2499.17 samples/sec Loss 2.0858 LearningRate 0.000291 Epoch: 20 Global Step: 426470 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:03:25,490-Speed 2498.42 samples/sec Loss 2.0949 
LearningRate 0.000291 Epoch: 20 Global Step: 426480 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:03:33,634-Speed 2515.24 samples/sec Loss 2.0723 LearningRate 0.000291 Epoch: 20 Global Step: 426490 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:03:41,839-Speed 2496.51 samples/sec Loss 2.0896 LearningRate 0.000291 Epoch: 20 Global Step: 426500 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:03:50,039-Speed 2498.10 samples/sec Loss 2.0988 LearningRate 0.000291 Epoch: 20 Global Step: 426510 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:03:58,238-Speed 2498.23 samples/sec Loss 2.0810 LearningRate 0.000291 Epoch: 20 Global Step: 426520 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:04:07,364-Speed 2501.47 samples/sec Loss 2.0955 LearningRate 0.000291 Epoch: 20 Global Step: 426530 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:04:18,908-Speed 2497.62 samples/sec Loss 2.0561 LearningRate 0.000291 Epoch: 20 Global Step: 426540 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:04:27,060-Speed 2512.62 samples/sec Loss 2.0620 LearningRate 0.000291 Epoch: 20 Global Step: 426550 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:04:40,347-Speed 1550.83 samples/sec Loss 2.0430 LearningRate 0.000291 Epoch: 20 Global Step: 426560 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:04:48,592-Speed 2499.24 samples/sec Loss 2.0828 LearningRate 0.000291 Epoch: 20 Global Step: 426570 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:04:56,790-Speed 2498.60 samples/sec Loss 2.0922 LearningRate 0.000291 Epoch: 20 Global Step: 426580 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:05:04,990-Speed 2497.89 samples/sec Loss 2.0718 LearningRate 0.000291 Epoch: 20 Global Step: 426590 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:05:13,231-Speed 2500.57 samples/sec Loss 2.0510 
LearningRate 0.000291 Epoch: 20 Global Step: 426600 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:05:26,587-Speed 2515.54 samples/sec Loss 2.0283 LearningRate 0.000291 Epoch: 20 Global Step: 426610 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:05:34,842-Speed 2500.44 samples/sec Loss 2.0738 LearningRate 0.000291 Epoch: 20 Global Step: 426620 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:05:46,938-Speed 1702.98 samples/sec Loss 2.0655 LearningRate 0.000291 Epoch: 20 Global Step: 426630 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:05:55,158-Speed 2500.56 samples/sec Loss 2.0360 LearningRate 0.000291 Epoch: 20 Global Step: 426640 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:06:03,357-Speed 2498.39 samples/sec Loss 2.0894 LearningRate 0.000291 Epoch: 20 Global Step: 426650 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:06:15,613-Speed 2496.82 samples/sec Loss 2.0845 LearningRate 0.000291 Epoch: 20 Global Step: 426660 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:06:23,783-Speed 2514.69 samples/sec Loss 2.1144 LearningRate 0.000291 Epoch: 20 Global Step: 426670 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:06:31,993-Speed 2495.10 samples/sec Loss 2.0638 LearningRate 0.000291 Epoch: 20 Global Step: 426680 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:06:46,047-Speed 1457.38 samples/sec Loss 2.1086 LearningRate 0.000291 Epoch: 20 Global Step: 426690 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:06:54,274-Speed 2501.47 samples/sec Loss 2.1215 LearningRate 0.000291 Epoch: 20 Global Step: 426700 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:07:02,525-Speed 2499.71 samples/sec Loss 2.0843 LearningRate 0.000291 Epoch: 20 Global Step: 426710 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:07:14,873-Speed 1658.67 samples/sec Loss 2.0896 
LearningRate 0.000291 Epoch: 20 Global Step: 426720 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:07:23,081-Speed 2517.35 samples/sec Loss 2.0226 LearningRate 0.000291 Epoch: 20 Global Step: 426730 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:07:31,289-Speed 2501.95 samples/sec Loss 2.0881 LearningRate 0.000291 Epoch: 20 Global Step: 426740 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:07:42,246-Speed 1869.36 samples/sec Loss 2.0973 LearningRate 0.000291 Epoch: 20 Global Step: 426750 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:07:50,487-Speed 2501.50 samples/sec Loss 2.0647 LearningRate 0.000291 Epoch: 20 Global Step: 426760 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:07:58,799-Speed 2502.30 samples/sec Loss 2.0819 LearningRate 0.000291 Epoch: 20 Global Step: 426770 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:08:08,268-Speed 2163.07 samples/sec Loss 2.0535 LearningRate 0.000291 Epoch: 20 Global Step: 426780 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:08:16,416-Speed 2514.11 samples/sec Loss 2.0716 LearningRate 0.000291 Epoch: 20 Global Step: 426790 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:08:25,716-Speed 2500.20 samples/sec Loss 2.0798 LearningRate 0.000291 Epoch: 20 Global Step: 426800 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:08:33,919-Speed 2497.14 samples/sec Loss 2.0489 LearningRate 0.000291 Epoch: 20 Global Step: 426810 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:08:42,125-Speed 2496.26 samples/sec Loss 2.0817 LearningRate 0.000291 Epoch: 20 Global Step: 426820 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:08:50,334-Speed 2495.07 samples/sec Loss 2.0417 LearningRate 0.000291 Epoch: 20 Global Step: 426830 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:08:58,545-Speed 2494.68 samples/sec Loss 2.0372 
LearningRate 0.000291 Epoch: 20 Global Step: 426840 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:09:06,699-Speed 2511.94 samples/sec Loss 2.1009 LearningRate 0.000291 Epoch: 20 Global Step: 426850 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:09:14,937-Speed 2486.47 samples/sec Loss 2.0607 LearningRate 0.000291 Epoch: 20 Global Step: 426860 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:09:23,147-Speed 2494.71 samples/sec Loss 2.1039 LearningRate 0.000291 Epoch: 20 Global Step: 426870 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:09:31,360-Speed 2494.18 samples/sec Loss 2.1125 LearningRate 0.000291 Epoch: 20 Global Step: 426880 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:09:39,569-Speed 2495.21 samples/sec Loss 2.1017 LearningRate 0.000291 Epoch: 20 Global Step: 426890 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:09:47,774-Speed 2496.48 samples/sec Loss 2.0349 LearningRate 0.000291 Epoch: 20 Global Step: 426900 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:09:55,928-Speed 2512.04 samples/sec Loss 2.1169 LearningRate 0.000291 Epoch: 20 Global Step: 426910 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:10:04,133-Speed 2496.38 samples/sec Loss 2.0370 LearningRate 0.000291 Epoch: 20 Global Step: 426920 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:10:12,340-Speed 2496.16 samples/sec Loss 2.0815 LearningRate 0.000291 Epoch: 20 Global Step: 426930 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:10:20,546-Speed 2496.08 samples/sec Loss 2.0771 LearningRate 0.000291 Epoch: 20 Global Step: 426940 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:10:28,756-Speed 2495.09 samples/sec Loss 2.0713 LearningRate 0.000291 Epoch: 20 Global Step: 426950 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:10:36,965-Speed 2495.18 samples/sec Loss 2.1219 
LearningRate 0.000291 Epoch: 20 Global Step: 426960 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:10:45,120-Speed 2511.69 samples/sec Loss 2.0637 LearningRate 0.000291 Epoch: 20 Global Step: 426970 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:10:53,335-Speed 2493.42 samples/sec Loss 2.1104 LearningRate 0.000291 Epoch: 20 Global Step: 426980 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:11:01,542-Speed 2496.18 samples/sec Loss 2.1189 LearningRate 0.000291 Epoch: 20 Global Step: 426990 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:11:09,753-Speed 2494.63 samples/sec Loss 2.0650 LearningRate 0.000291 Epoch: 20 Global Step: 427000 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:11:17,961-Speed 2495.36 samples/sec Loss 2.0645 LearningRate 0.000291 Epoch: 20 Global Step: 427010 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:11:26,175-Speed 2493.63 samples/sec Loss 2.0514 LearningRate 0.000291 Epoch: 20 Global Step: 427020 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:11:34,329-Speed 2512.28 samples/sec Loss 2.0687 LearningRate 0.000291 Epoch: 20 Global Step: 427030 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:11:42,537-Speed 2495.60 samples/sec Loss 2.0282 LearningRate 0.000291 Epoch: 20 Global Step: 427040 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:11:50,742-Speed 2496.30 samples/sec Loss 2.0709 LearningRate 0.000291 Epoch: 20 Global Step: 427050 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:11:58,952-Speed 2495.04 samples/sec Loss 2.1146 LearningRate 0.000291 Epoch: 20 Global Step: 427060 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:12:07,160-Speed 2495.47 samples/sec Loss 2.0677 LearningRate 0.000291 Epoch: 20 Global Step: 427070 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:12:15,365-Speed 2496.45 samples/sec Loss 2.0452 
LearningRate 0.000291 Epoch: 20 Global Step: 427080 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:12:23,515-Speed 2513.21 samples/sec Loss 2.0342 LearningRate 0.000291 Epoch: 20 Global Step: 427090 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:12:31,720-Speed 2496.51 samples/sec Loss 2.0598 LearningRate 0.000291 Epoch: 20 Global Step: 427100 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:12:39,927-Speed 2495.89 samples/sec Loss 2.0375 LearningRate 0.000291 Epoch: 20 Global Step: 427110 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:12:48,130-Speed 2496.89 samples/sec Loss 2.0806 LearningRate 0.000291 Epoch: 20 Global Step: 427120 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:12:56,336-Speed 2496.10 samples/sec Loss 2.1188 LearningRate 0.000291 Epoch: 20 Global Step: 427130 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:13:04,552-Speed 2493.32 samples/sec Loss 2.0147 LearningRate 0.000291 Epoch: 20 Global Step: 427140 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:13:12,712-Speed 2510.28 samples/sec Loss 2.0910 LearningRate 0.000291 Epoch: 20 Global Step: 427150 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:13:20,921-Speed 2495.40 samples/sec Loss 2.1038 LearningRate 0.000290 Epoch: 20 Global Step: 427160 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:13:29,125-Speed 2496.43 samples/sec Loss 2.1158 LearningRate 0.000290 Epoch: 20 Global Step: 427170 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:13:37,332-Speed 2495.99 samples/sec Loss 2.0720 LearningRate 0.000290 Epoch: 20 Global Step: 427180 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:13:45,551-Speed 2492.12 samples/sec Loss 2.1117 LearningRate 0.000290 Epoch: 20 Global Step: 427190 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:13:53,757-Speed 2496.24 samples/sec Loss 2.0500 
LearningRate 0.000290 Epoch: 20 Global Step: 427200 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:14:01,911-Speed 2512.07 samples/sec Loss 2.0907 LearningRate 0.000290 Epoch: 20 Global Step: 427210 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:14:10,118-Speed 2496.12 samples/sec Loss 2.0576 LearningRate 0.000290 Epoch: 20 Global Step: 427220 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:14:18,325-Speed 2495.93 samples/sec Loss 2.1049 LearningRate 0.000290 Epoch: 20 Global Step: 427230 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:14:26,539-Speed 2493.82 samples/sec Loss 2.0578 LearningRate 0.000290 Epoch: 20 Global Step: 427240 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:14:34,742-Speed 2496.97 samples/sec Loss 2.1087 LearningRate 0.000290 Epoch: 20 Global Step: 427250 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:14:42,945-Speed 2497.11 samples/sec Loss 2.0916 LearningRate 0.000290 Epoch: 20 Global Step: 427260 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:14:51,102-Speed 2511.09 samples/sec Loss 2.0099 LearningRate 0.000290 Epoch: 20 Global Step: 427270 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:14:59,313-Speed 2494.52 samples/sec Loss 2.0683 LearningRate 0.000290 Epoch: 20 Global Step: 427280 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:15:07,519-Speed 2496.07 samples/sec Loss 2.1178 LearningRate 0.000290 Epoch: 20 Global Step: 427290 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:15:15,725-Speed 2496.29 samples/sec Loss 2.0854 LearningRate 0.000290 Epoch: 20 Global Step: 427300 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:15:23,941-Speed 2493.18 samples/sec Loss 2.1159 LearningRate 0.000290 Epoch: 20 Global Step: 427310 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:15:32,147-Speed 2495.99 samples/sec Loss 2.1338 
LearningRate 0.000290 Epoch: 20 Global Step: 427320 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:15:40,301-Speed 2512.10 samples/sec Loss 2.1019 LearningRate 0.000290 Epoch: 20 Global Step: 427330 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:15:48,505-Speed 2496.77 samples/sec Loss 2.0892 LearningRate 0.000290 Epoch: 20 Global Step: 427340 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:15:56,710-Speed 2496.23 samples/sec Loss 2.0758 LearningRate 0.000290 Epoch: 20 Global Step: 427350 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:16:04,912-Speed 2497.18 samples/sec Loss 2.0466 LearningRate 0.000290 Epoch: 20 Global Step: 427360 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:16:13,115-Speed 2497.13 samples/sec Loss 2.1210 LearningRate 0.000290 Epoch: 20 Global Step: 427370 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:16:21,324-Speed 2495.60 samples/sec Loss 2.0712 LearningRate 0.000290 Epoch: 20 Global Step: 427380 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:16:29,484-Speed 2510.14 samples/sec Loss 2.0839 LearningRate 0.000290 Epoch: 20 Global Step: 427390 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:16:37,691-Speed 2495.90 samples/sec Loss 2.0892 LearningRate 0.000290 Epoch: 20 Global Step: 427400 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:16:45,895-Speed 2496.88 samples/sec Loss 2.0893 LearningRate 0.000290 Epoch: 20 Global Step: 427410 Fp16 Grad Scale: 16384 Required: 92 hours Training: 2022-07-09 16:16:54,101-Speed 2496.14 samples/sec Loss 2.0862 LearningRate 0.000290 Epoch: 20 Global Step: 427420 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:17:02,307-Speed 2496.02 samples/sec Loss 2.0894 LearningRate 0.000290 Epoch: 20 Global Step: 427430 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:17:10,511-Speed 2497.03 samples/sec Loss 2.0504 
LearningRate 0.000290 Epoch: 20 Global Step: 427440 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:17:18,674-Speed 2509.42 samples/sec Loss 2.0681 LearningRate 0.000290 Epoch: 20 Global Step: 427450 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:17:26,882-Speed 2495.36 samples/sec Loss 2.0985 LearningRate 0.000290 Epoch: 20 Global Step: 427460 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:17:35,086-Speed 2497.24 samples/sec Loss 2.1120 LearningRate 0.000290 Epoch: 20 Global Step: 427470 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:17:43,292-Speed 2496.20 samples/sec Loss 2.0819 LearningRate 0.000290 Epoch: 20 Global Step: 427480 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:17:51,496-Speed 2496.56 samples/sec Loss 2.0533 LearningRate 0.000290 Epoch: 20 Global Step: 427490 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:17:59,709-Speed 2493.88 samples/sec Loss 2.0786 LearningRate 0.000290 Epoch: 20 Global Step: 427500 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:18:07,877-Speed 2507.95 samples/sec Loss 2.1049 LearningRate 0.000290 Epoch: 20 Global Step: 427510 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:18:16,080-Speed 2497.04 samples/sec Loss 2.0459 LearningRate 0.000290 Epoch: 20 Global Step: 427520 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:18:24,283-Speed 2497.25 samples/sec Loss 2.0681 LearningRate 0.000290 Epoch: 20 Global Step: 427530 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:18:32,490-Speed 2495.81 samples/sec Loss 2.0773 LearningRate 0.000290 Epoch: 20 Global Step: 427540 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:18:40,708-Speed 2492.29 samples/sec Loss 2.0490 LearningRate 0.000290 Epoch: 20 Global Step: 427550 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:18:48,914-Speed 2496.19 samples/sec Loss 2.0417 
LearningRate 0.000290 Epoch: 20 Global Step: 427560 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:18:57,068-Speed 2512.02 samples/sec Loss 2.1505 LearningRate 0.000290 Epoch: 20 Global Step: 427570 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:19:05,269-Speed 2497.66 samples/sec Loss 2.0428 LearningRate 0.000290 Epoch: 20 Global Step: 427580 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:19:13,488-Speed 2491.94 samples/sec Loss 2.0868 LearningRate 0.000290 Epoch: 20 Global Step: 427590 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:19:21,692-Speed 2496.83 samples/sec Loss 2.0783 LearningRate 0.000290 Epoch: 20 Global Step: 427600 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:19:29,894-Speed 2497.34 samples/sec Loss 2.0431 LearningRate 0.000290 Epoch: 20 Global Step: 427610 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:19:38,096-Speed 2497.26 samples/sec Loss 2.0701 LearningRate 0.000290 Epoch: 20 Global Step: 427620 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:19:46,248-Speed 2512.61 samples/sec Loss 2.0684 LearningRate 0.000290 Epoch: 20 Global Step: 427630 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:19:54,455-Speed 2495.96 samples/sec Loss 2.0709 LearningRate 0.000290 Epoch: 20 Global Step: 427640 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:20:02,659-Speed 2496.61 samples/sec Loss 2.0948 LearningRate 0.000290 Epoch: 20 Global Step: 427650 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:20:10,863-Speed 2496.69 samples/sec Loss 2.0536 LearningRate 0.000290 Epoch: 20 Global Step: 427660 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:20:19,087-Speed 2490.67 samples/sec Loss 2.0732 LearningRate 0.000290 Epoch: 20 Global Step: 427670 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:20:27,293-Speed 2496.05 samples/sec Loss 2.0867 
LearningRate 0.000290 Epoch: 20 Global Step: 427680 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:20:35,444-Speed 2513.02 samples/sec Loss 2.0702 LearningRate 0.000290 Epoch: 20 Global Step: 427690 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:20:43,655-Speed 2494.62 samples/sec Loss 2.0986 LearningRate 0.000290 Epoch: 20 Global Step: 427700 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:20:51,858-Speed 2497.08 samples/sec Loss 2.0700 LearningRate 0.000290 Epoch: 20 Global Step: 427710 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:21:00,061-Speed 2497.41 samples/sec Loss 2.0527 LearningRate 0.000290 Epoch: 20 Global Step: 427720 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:21:08,263-Speed 2497.13 samples/sec Loss 2.0544 LearningRate 0.000290 Epoch: 20 Global Step: 427730 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:21:16,466-Speed 2497.03 samples/sec Loss 2.0952 LearningRate 0.000290 Epoch: 20 Global Step: 427740 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:21:24,617-Speed 2513.06 samples/sec Loss 2.0428 LearningRate 0.000290 Epoch: 20 Global Step: 427750 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:21:32,819-Speed 2497.34 samples/sec Loss 2.0918 LearningRate 0.000290 Epoch: 20 Global Step: 427760 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:21:41,024-Speed 2496.51 samples/sec Loss 2.1192 LearningRate 0.000290 Epoch: 20 Global Step: 427770 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:21:49,231-Speed 2496.12 samples/sec Loss 2.0952 LearningRate 0.000290 Epoch: 20 Global Step: 427780 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:21:57,436-Speed 2496.39 samples/sec Loss 2.1177 LearningRate 0.000290 Epoch: 20 Global Step: 427790 Fp16 Grad Scale: 32768 Required: 92 hours Training: 2022-07-09 16:22:05,658-Speed 2491.32 samples/sec Loss 2.1022 
LearningRate 0.000290 Epoch: 20 Global Step: 427800 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:22:13,812-Speed 2512.58 samples/sec Loss 2.1073 LearningRate 0.000290 Epoch: 20 Global Step: 427810 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:22:22,024-Speed 2494.39 samples/sec Loss 2.0475 LearningRate 0.000290 Epoch: 20 Global Step: 427820 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:22:30,230-Speed 2496.04 samples/sec Loss 2.0497 LearningRate 0.000290 Epoch: 20 Global Step: 427830 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:22:38,435-Speed 2497.08 samples/sec Loss 2.0369 LearningRate 0.000290 Epoch: 20 Global Step: 427840 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:22:46,639-Speed 2496.48 samples/sec Loss 2.0683 LearningRate 0.000289 Epoch: 20 Global Step: 427850 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:22:54,847-Speed 2495.67 samples/sec Loss 2.1027 LearningRate 0.000289 Epoch: 20 Global Step: 427860 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:23:03,002-Speed 2511.61 samples/sec Loss 2.1235 LearningRate 0.000289 Epoch: 20 Global Step: 427870 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:23:11,212-Speed 2495.11 samples/sec Loss 2.0617 LearningRate 0.000289 Epoch: 20 Global Step: 427880 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:23:19,422-Speed 2494.78 samples/sec Loss 2.0632 LearningRate 0.000289 Epoch: 20 Global Step: 427890 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:23:27,634-Speed 2494.16 samples/sec Loss 2.0317 LearningRate 0.000289 Epoch: 20 Global Step: 427900 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:23:35,804-Speed 2507.34 samples/sec Loss 2.0699 LearningRate 0.000289 Epoch: 20 Global Step: 427910 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:23:44,010-Speed 2495.99 samples/sec Loss 2.0745 LearningRate 0.000289 Epoch: 20 Global Step: 427920 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:23:52,169-Speed 2510.45 samples/sec Loss 2.0481 LearningRate 0.000289 Epoch: 20 Global Step: 427930 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:24:00,377-Speed 2495.70 samples/sec Loss 2.0809 LearningRate 0.000289 Epoch: 20 Global Step: 427940 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:24:08,592-Speed 2493.53 samples/sec Loss 2.0705 LearningRate 0.000289 Epoch: 20 Global Step: 427950 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:24:16,800-Speed 2495.51 samples/sec Loss 2.0629 LearningRate 0.000289 Epoch: 20 Global Step: 427960 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:24:25,007-Speed 2495.71 samples/sec Loss 2.0459 LearningRate 0.000289 Epoch: 20 Global Step: 427970 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:24:33,214-Speed 2495.68 samples/sec Loss 2.0736 LearningRate 0.000289 Epoch: 20 Global Step: 427980 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:24:41,369-Speed 2511.96 samples/sec Loss 2.0379 LearningRate 0.000289 Epoch: 20 Global Step: 427990 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:24:49,575-Speed 2496.08 samples/sec Loss 2.0304 LearningRate 0.000289 Epoch: 20 Global Step: 428000 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:24:57,783-Speed 2496.26 samples/sec Loss 2.1280 LearningRate 0.000289 Epoch: 20 Global Step: 428010 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:25:05,990-Speed 2495.74 samples/sec Loss 2.1204 LearningRate 0.000289 Epoch: 20 Global Step: 428020 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:25:14,197-Speed 2495.85 samples/sec Loss 2.1205 LearningRate 0.000289 Epoch: 20 Global Step: 428030 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:25:22,400-Speed 2497.07 samples/sec Loss 2.0861 LearningRate 0.000289 Epoch: 20 Global Step: 428040 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:25:30,551-Speed 2512.81 samples/sec Loss 2.0569 LearningRate 0.000289 Epoch: 20 Global Step: 428050 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:25:38,758-Speed 2495.72 samples/sec Loss 2.0724 LearningRate 0.000289 Epoch: 20 Global Step: 428060 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:25:46,962-Speed 2496.85 samples/sec Loss 2.0757 LearningRate 0.000289 Epoch: 20 Global Step: 428070 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:25:55,165-Speed 2497.07 samples/sec Loss 2.0428 LearningRate 0.000289 Epoch: 20 Global Step: 428080 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:26:03,373-Speed 2495.48 samples/sec Loss 2.0939 LearningRate 0.000289 Epoch: 20 Global Step: 428090 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:26:11,579-Speed 2496.11 samples/sec Loss 2.0405 LearningRate 0.000289 Epoch: 20 Global Step: 428100 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:26:19,730-Speed 2513.09 samples/sec Loss 2.0858 LearningRate 0.000289 Epoch: 20 Global Step: 428110 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:26:27,934-Speed 2496.69 samples/sec Loss 2.1053 LearningRate 0.000289 Epoch: 20 Global Step: 428120 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:26:36,137-Speed 2496.88 samples/sec Loss 2.1026 LearningRate 0.000289 Epoch: 20 Global Step: 428130 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:26:44,343-Speed 2496.43 samples/sec Loss 2.1223 LearningRate 0.000289 Epoch: 20 Global Step: 428140 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:26:52,548-Speed 2496.28 samples/sec Loss 2.0839 LearningRate 0.000289 Epoch: 20 Global Step: 428150 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:27:00,765-Speed 2492.71 samples/sec Loss 2.0896 LearningRate 0.000289 Epoch: 20 Global Step: 428160 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:27:08,918-Speed 2512.36 samples/sec Loss 2.0815 LearningRate 0.000289 Epoch: 20 Global Step: 428170 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:27:17,135-Speed 2493.10 samples/sec Loss 2.0676 LearningRate 0.000289 Epoch: 20 Global Step: 428180 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:27:25,344-Speed 2495.18 samples/sec Loss 2.1150 LearningRate 0.000289 Epoch: 20 Global Step: 428190 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:27:33,551-Speed 2495.74 samples/sec Loss 2.0929 LearningRate 0.000289 Epoch: 20 Global Step: 428200 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:27:41,757-Speed 2496.08 samples/sec Loss 2.1191 LearningRate 0.000289 Epoch: 20 Global Step: 428210 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:27:49,962-Speed 2496.41 samples/sec Loss 2.1034 LearningRate 0.000289 Epoch: 20 Global Step: 428220 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:27:58,121-Speed 2510.68 samples/sec Loss 2.0114 LearningRate 0.000289 Epoch: 20 Global Step: 428230 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:28:06,330-Speed 2495.09 samples/sec Loss 2.0808 LearningRate 0.000289 Epoch: 20 Global Step: 428240 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:28:14,541-Speed 2494.57 samples/sec Loss 2.0661 LearningRate 0.000289 Epoch: 20 Global Step: 428250 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:28:22,751-Speed 2495.07 samples/sec Loss 2.0442 LearningRate 0.000289 Epoch: 20 Global Step: 428260 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:28:30,959-Speed 2495.30 samples/sec Loss 2.0690 LearningRate 0.000289 Epoch: 20 Global Step: 428270 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:28:39,166-Speed 2495.62 samples/sec Loss 2.0798 LearningRate 0.000289 Epoch: 20 Global Step: 428280 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:28:47,321-Speed 2511.79 samples/sec Loss 2.0790 LearningRate 0.000289 Epoch: 20 Global Step: 428290 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:28:55,525-Speed 2496.89 samples/sec Loss 2.0898 LearningRate 0.000289 Epoch: 20 Global Step: 428300 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:29:03,738-Speed 2494.04 samples/sec Loss 2.0495 LearningRate 0.000289 Epoch: 20 Global Step: 428310 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:29:11,944-Speed 2495.85 samples/sec Loss 2.0972 LearningRate 0.000289 Epoch: 20 Global Step: 428320 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:29:20,151-Speed 2495.97 samples/sec Loss 2.0782 LearningRate 0.000289 Epoch: 20 Global Step: 428330 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:29:28,356-Speed 2496.53 samples/sec Loss 2.0485 LearningRate 0.000289 Epoch: 20 Global Step: 428340 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:29:36,511-Speed 2511.64 samples/sec Loss 2.0730 LearningRate 0.000289 Epoch: 20 Global Step: 428350 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:29:44,738-Speed 2489.66 samples/sec Loss 2.0803 LearningRate 0.000289 Epoch: 20 Global Step: 428360 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:29:52,944-Speed 2496.32 samples/sec Loss 2.0594 LearningRate 0.000289 Epoch: 20 Global Step: 428370 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:30:01,148-Speed 2497.01 samples/sec Loss 2.0752 LearningRate 0.000289 Epoch: 20 Global Step: 428380 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:30:09,352-Speed 2496.43 samples/sec Loss 2.0294 LearningRate 0.000289 Epoch: 20 Global Step: 428390 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:30:17,557-Speed 2496.67 samples/sec Loss 2.0490 LearningRate 0.000289 Epoch: 20 Global Step: 428400 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:30:25,709-Speed 2512.54 samples/sec Loss 2.0320 LearningRate 0.000289 Epoch: 20 Global Step: 428410 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:30:33,913-Speed 2496.59 samples/sec Loss 2.0542 LearningRate 0.000289 Epoch: 20 Global Step: 428420 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:30:42,124-Speed 2494.90 samples/sec Loss 2.0456 LearningRate 0.000289 Epoch: 20 Global Step: 428430 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:30:50,330-Speed 2495.83 samples/sec Loss 2.0473 LearningRate 0.000289 Epoch: 20 Global Step: 428440 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:30:58,540-Speed 2495.08 samples/sec Loss 2.0653 LearningRate 0.000289 Epoch: 20 Global Step: 428450 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:31:06,745-Speed 2496.52 samples/sec Loss 2.0814 LearningRate 0.000289 Epoch: 20 Global Step: 428460 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:31:14,897-Speed 2512.36 samples/sec Loss 2.1067 LearningRate 0.000289 Epoch: 20 Global Step: 428470 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:31:23,109-Speed 2494.32 samples/sec Loss 2.0645 LearningRate 0.000289 Epoch: 20 Global Step: 428480 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:31:31,311-Speed 2497.43 samples/sec Loss 2.0439 LearningRate 0.000289 Epoch: 20 Global Step: 428490 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:31:39,519-Speed 2495.71 samples/sec Loss 2.0341 LearningRate 0.000289 Epoch: 20 Global Step: 428500 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:31:47,724-Speed 2496.36 samples/sec Loss 2.0974 LearningRate 0.000289 Epoch: 20 Global Step: 428510 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:31:55,930-Speed 2495.94 samples/sec Loss 2.0660 LearningRate 0.000289 Epoch: 20 Global Step: 428520 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:32:04,098-Speed 2508.04 samples/sec Loss 2.0072 LearningRate 0.000289 Epoch: 20 Global Step: 428530 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:32:12,302-Speed 2496.81 samples/sec Loss 2.0379 LearningRate 0.000289 Epoch: 20 Global Step: 428540 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:32:20,507-Speed 2496.20 samples/sec Loss 2.0345 LearningRate 0.000288 Epoch: 20 Global Step: 428550 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:32:28,710-Speed 2497.31 samples/sec Loss 2.0721 LearningRate 0.000288 Epoch: 20 Global Step: 428560 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:32:36,915-Speed 2496.81 samples/sec Loss 2.0210 LearningRate 0.000288 Epoch: 20 Global Step: 428570 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:32:45,121-Speed 2496.07 samples/sec Loss 2.0959 LearningRate 0.000288 Epoch: 20 Global Step: 428580 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:32:53,270-Speed 2513.47 samples/sec Loss 1.9965 LearningRate 0.000288 Epoch: 20 Global Step: 428590 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:33:01,473-Speed 2497.05 samples/sec Loss 2.0735 LearningRate 0.000288 Epoch: 20 Global Step: 428600 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:33:09,695-Speed 2491.36 samples/sec Loss 2.0516 LearningRate 0.000288 Epoch: 20 Global Step: 428610 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:33:17,897-Speed 2497.27 samples/sec Loss 2.0833 LearningRate 0.000288 Epoch: 20 Global Step: 428620 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:33:26,101-Speed 2496.91 samples/sec Loss 2.0228 LearningRate 0.000288 Epoch: 20 Global Step: 428630 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:33:34,308-Speed 2495.75 samples/sec Loss 2.0292 LearningRate 0.000288 Epoch: 20 Global Step: 428640 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:33:42,474-Speed 2508.57 samples/sec Loss 2.0409 LearningRate 0.000288 Epoch: 20 Global Step: 428650 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:33:50,681-Speed 2495.64 samples/sec Loss 2.0875 LearningRate 0.000288 Epoch: 20 Global Step: 428660 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:33:58,882-Speed 2497.76 samples/sec Loss 2.0507 LearningRate 0.000288 Epoch: 20 Global Step: 428670 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:34:07,087-Speed 2496.21 samples/sec Loss 2.0821 LearningRate 0.000288 Epoch: 20 Global Step: 428680 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:34:15,289-Speed 2497.66 samples/sec Loss 2.0560 LearningRate 0.000288 Epoch: 20 Global Step: 428690 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:34:23,507-Speed 2492.41 samples/sec Loss 2.0764 LearningRate 0.000288 Epoch: 20 Global Step: 428700 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:34:31,658-Speed 2513.16 samples/sec Loss 2.0941 LearningRate 0.000288 Epoch: 20 Global Step: 428710 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:34:39,867-Speed 2495.25 samples/sec Loss 2.0541 LearningRate 0.000288 Epoch: 20 Global Step: 428720 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:34:48,075-Speed 2495.56 samples/sec Loss 2.1162 LearningRate 0.000288 Epoch: 20 Global Step: 428730 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:34:56,279-Speed 2496.64 samples/sec Loss 2.0884 LearningRate 0.000288 Epoch: 20 Global Step: 428740 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:35:04,487-Speed 2495.45 samples/sec Loss 2.0431 LearningRate 0.000288 Epoch: 20 Global Step: 428750 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:35:12,707-Speed 2491.91 samples/sec Loss 2.0734 LearningRate 0.000288 Epoch: 20 Global Step: 428760 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:35:20,862-Speed 2511.85 samples/sec Loss 2.0710 LearningRate 0.000288 Epoch: 20 Global Step: 428770 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:35:29,067-Speed 2496.44 samples/sec Loss 2.0432 LearningRate 0.000288 Epoch: 20 Global Step: 428780 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:35:37,272-Speed 2496.41 samples/sec Loss 2.0397 LearningRate 0.000288 Epoch: 20 Global Step: 428790 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:35:45,484-Speed 2494.31 samples/sec Loss 2.0487 LearningRate 0.000288 Epoch: 20 Global Step: 428800 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:35:53,690-Speed 2496.16 samples/sec Loss 2.0484 LearningRate 0.000288 Epoch: 20 Global Step: 428810 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:36:01,898-Speed 2495.47 samples/sec Loss 2.1029 LearningRate 0.000288 Epoch: 20 Global Step: 428820 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:36:10,053-Speed 2511.57 samples/sec Loss 2.0692 LearningRate 0.000288 Epoch: 20 Global Step: 428830 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:36:18,255-Speed 2497.32 samples/sec Loss 2.0399 LearningRate 0.000288 Epoch: 20 Global Step: 428840 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:36:26,466-Speed 2494.81 samples/sec Loss 2.0564 LearningRate 0.000288 Epoch: 20 Global Step: 428850 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:36:34,670-Speed 2496.54 samples/sec Loss 2.1230 LearningRate 0.000288 Epoch: 20 Global Step: 428860 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:36:42,878-Speed 2495.53 samples/sec Loss 2.1544 LearningRate 0.000288 Epoch: 20 Global Step: 428870 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:36:51,086-Speed 2495.88 samples/sec Loss 2.0313 LearningRate 0.000288 Epoch: 20 Global Step: 428880 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:36:59,236-Speed 2513.21 samples/sec Loss 2.1545 LearningRate 0.000288 Epoch: 20 Global Step: 428890 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:37:07,443-Speed 2496.28 samples/sec Loss 2.0925 LearningRate 0.000288 Epoch: 20 Global Step: 428900 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:37:15,649-Speed 2496.42 samples/sec Loss 2.1317 LearningRate 0.000288 Epoch: 20 Global Step: 428910 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:37:23,853-Speed 2496.72 samples/sec Loss 2.1225 LearningRate 0.000288 Epoch: 20 Global Step: 428920 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:37:32,058-Speed 2496.31 samples/sec Loss 2.1490 LearningRate 0.000288 Epoch: 20 Global Step: 428930 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:37:40,266-Speed 2495.68 samples/sec Loss 2.0974 LearningRate 0.000288 Epoch: 20 Global Step: 428940 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:37:48,416-Speed 2513.12 samples/sec Loss 2.1011 LearningRate 0.000288 Epoch: 20 Global Step: 428950 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:37:56,623-Speed 2496.07 samples/sec Loss 2.1132 LearningRate 0.000288 Epoch: 20 Global Step: 428960 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:38:04,826-Speed 2496.97 samples/sec Loss 2.0712 LearningRate 0.000288 Epoch: 20 Global Step: 428970 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:38:13,034-Speed 2495.66 samples/sec Loss 2.1310 LearningRate 0.000288 Epoch: 20 Global Step: 428980 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:38:21,238-Speed 2496.59 samples/sec Loss 2.1183 LearningRate 0.000288 Epoch: 20 Global Step: 428990 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:38:29,443-Speed 2496.36 samples/sec Loss 2.1217 LearningRate 0.000288 Epoch: 20 Global Step: 429000 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:38:37,601-Speed 2510.75 samples/sec Loss 2.1363 LearningRate 0.000288 Epoch: 20 Global Step: 429010 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:38:45,820-Speed 2492.14 samples/sec Loss 2.1131 LearningRate 0.000288 Epoch: 20 Global Step: 429020 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:38:54,027-Speed 2496.02 samples/sec Loss 2.1150 LearningRate 0.000288 Epoch: 20 Global Step: 429030 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:39:02,243-Speed 2492.81 samples/sec Loss 2.1441 LearningRate 0.000288 Epoch: 20 Global Step: 429040 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:39:10,452-Speed 2495.40 samples/sec Loss 2.1065 LearningRate 0.000288 Epoch: 20 Global Step: 429050 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:39:18,660-Speed 2495.43 samples/sec Loss 2.1862 LearningRate 0.000288 Epoch: 20 Global Step: 429060 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:39:26,815-Speed 2511.60 samples/sec Loss 2.0534 LearningRate 0.000288 Epoch: 20 Global Step: 429070 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:39:35,021-Speed 2496.13 samples/sec Loss 2.1405 LearningRate 0.000288 Epoch: 20 Global Step: 429080 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:39:43,225-Speed 2496.63 samples/sec Loss 2.0941 LearningRate 0.000288 Epoch: 20 Global Step: 429090 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:39:51,430-Speed 2496.48 samples/sec Loss 2.1435 LearningRate 0.000288 Epoch: 20 Global Step: 429100 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:39:59,639-Speed 2495.18 samples/sec Loss 2.1980 LearningRate 0.000288 Epoch: 20 Global Step: 429110 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:40:07,846-Speed 2495.73 samples/sec Loss 2.1676 LearningRate 0.000288 Epoch: 20 Global Step: 429120 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:40:15,994-Speed 2513.66 samples/sec Loss 2.0697 LearningRate 0.000288 Epoch: 20 Global Step: 429130 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:40:24,198-Speed 2496.99 samples/sec Loss 2.1606 LearningRate 0.000288 Epoch: 20 Global Step: 429140 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:40:32,400-Speed 2497.24 samples/sec Loss 2.0693 LearningRate 0.000288 Epoch: 20 Global Step: 429150 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:40:40,603-Speed 2497.13 samples/sec Loss 2.1050 LearningRate 0.000288 Epoch: 20 Global Step: 429160 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:40:48,806-Speed 2496.90 samples/sec Loss 2.1506 LearningRate 0.000288 Epoch: 20 Global Step: 429170 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:40:57,010-Speed 2496.83 samples/sec Loss 2.0527 LearningRate 0.000288 Epoch: 20 Global Step: 429180 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:41:05,159-Speed 2513.47 samples/sec Loss 2.1017 LearningRate 0.000288 Epoch: 20 Global Step: 429190 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:41:13,367-Speed 2495.61 samples/sec Loss 2.0715 LearningRate 0.000288 Epoch: 20 Global Step: 429200 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:41:21,584-Speed 2492.82 samples/sec Loss 2.0900 LearningRate 0.000288 Epoch: 20 Global Step: 429210 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:41:29,785-Speed 2497.33 samples/sec Loss 2.0751 LearningRate 0.000288 Epoch: 20 Global Step: 429220 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:41:37,992-Speed 2496.17 samples/sec Loss 2.0546 LearningRate 0.000288 Epoch: 20 Global Step: 429230 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-07-09 16:41:46,154-Speed 2509.48 samples/sec Loss 2.1000 LearningRate 0.000287 Epoch: 20 Global Step: 429240 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:41:54,302-Speed 2513.76 samples/sec Loss 2.0900 LearningRate 0.000287 Epoch: 20 Global Step: 429250 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:42:02,505-Speed 2496.96 samples/sec Loss 2.1028 LearningRate 0.000287 Epoch: 20 Global Step: 429260 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:42:10,711-Speed 2496.35 samples/sec Loss 2.0873 LearningRate 0.000287 Epoch: 20 Global Step: 429270 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:42:18,922-Speed 2494.54 samples/sec Loss 2.0704 LearningRate 0.000287 Epoch: 20 Global Step: 429280 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:42:27,124-Speed 2497.31 samples/sec Loss 2.1005 LearningRate 0.000287 Epoch: 20 Global Step: 429290 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:42:35,331-Speed 2496.45 samples/sec Loss 2.0514 LearningRate 0.000287 Epoch: 20 Global Step: 429300 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:42:43,489-Speed 2511.15 samples/sec Loss 2.0905 LearningRate 0.000287 Epoch: 20 Global Step: 429310 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:42:51,691-Speed 2497.05 samples/sec Loss 2.0639 LearningRate 0.000287 Epoch: 20 Global Step: 429320 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:42:59,894-Speed 2496.98 samples/sec Loss 2.0116 LearningRate 0.000287 Epoch: 20 Global Step: 429330 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:43:08,100-Speed 2496.34 samples/sec Loss 2.0462 LearningRate 0.000287 Epoch: 20 Global Step: 429340 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:43:16,306-Speed 2496.13 samples/sec Loss 2.0214 LearningRate 0.000287 Epoch: 20 Global Step: 429350 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:43:24,516-Speed 2494.71 samples/sec Loss 2.0284 LearningRate 0.000287 Epoch: 20 Global Step: 429360 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:43:32,669-Speed 2512.50 samples/sec Loss 2.0557 LearningRate 0.000287 Epoch: 20 Global Step: 429370 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:43:40,894-Speed 2490.36 samples/sec Loss 2.0580 LearningRate 0.000287 Epoch: 20 Global Step: 429380 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:43:49,117-Speed 2490.67 samples/sec Loss 2.0694 LearningRate 0.000287 Epoch: 20 Global Step: 429390 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:43:57,323-Speed 2496.21 samples/sec Loss 2.0382 LearningRate 0.000287 Epoch: 20 Global Step: 429400 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:44:05,539-Speed 2493.00 samples/sec Loss 2.0575 LearningRate 0.000287 Epoch: 20 Global Step: 429410 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:44:13,748-Speed 2495.31 samples/sec Loss 2.0592 LearningRate 0.000287 Epoch: 20 Global Step: 429420 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:44:21,900-Speed 2512.44 samples/sec Loss 2.0607 LearningRate 0.000287 Epoch: 20 Global Step: 429430 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:44:30,105-Speed 2496.43 samples/sec Loss 2.0776 LearningRate 0.000287 Epoch: 20 Global Step: 429440 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:44:38,308-Speed 2497.08 samples/sec Loss 2.0492 LearningRate 0.000287 Epoch: 20 Global Step: 429450 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:44:46,515-Speed 2495.68 samples/sec Loss 2.0803 LearningRate 0.000287 Epoch: 20 Global Step: 429460 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:44:54,721-Speed 2496.20 samples/sec Loss 2.0660 LearningRate 0.000287 Epoch: 20 Global Step: 429470 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:45:02,928-Speed 2495.61 samples/sec Loss 2.0349 LearningRate 0.000287 Epoch: 20 Global Step: 429480 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:45:11,077-Speed 2513.64 samples/sec Loss 2.0314 LearningRate 0.000287 Epoch: 20 Global Step: 429490 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:45:19,283-Speed 2496.14 samples/sec Loss 2.0906 LearningRate 0.000287 Epoch: 20 Global Step: 429500 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:45:27,489-Speed 2496.41 samples/sec Loss 2.0907 LearningRate 0.000287 Epoch: 20 Global Step: 429510 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:45:35,695-Speed 2496.04 samples/sec Loss 2.0997 LearningRate 0.000287 Epoch: 20 Global Step: 429520 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:45:43,901-Speed 2496.23 samples/sec Loss 2.0908 LearningRate 0.000287 Epoch: 20 Global Step: 429530 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:45:52,116-Speed 2493.26 samples/sec Loss 2.0747 LearningRate 0.000287 Epoch: 20 Global Step: 429540 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:46:00,266-Speed 2513.34 samples/sec Loss 2.0809 LearningRate 0.000287 Epoch: 20 Global Step: 429550 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:46:08,472-Speed 2496.38 samples/sec Loss 2.0997 LearningRate 0.000287 Epoch: 20 Global Step: 429560 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:46:16,674-Speed 2497.63 samples/sec Loss 2.0258 LearningRate 0.000287 Epoch: 20 Global Step: 429570 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:46:24,875-Speed 2497.70 samples/sec Loss 2.0719 LearningRate 0.000287 Epoch: 20 Global Step: 429580 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:46:33,077-Speed 2497.47 samples/sec Loss 2.1205 LearningRate 0.000287 Epoch: 20 Global Step: 429590 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:46:41,280-Speed 2496.93 samples/sec Loss 2.0792 LearningRate 0.000287 Epoch: 20 Global Step: 429600 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:46:49,431-Speed 2513.13 samples/sec Loss 2.0997 LearningRate 0.000287 Epoch: 20 Global Step: 429610 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:46:57,636-Speed 2496.30 samples/sec Loss 2.1320 LearningRate 0.000287 Epoch: 20 Global Step: 429620 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:47:05,839-Speed 2497.06 samples/sec Loss 2.0359 LearningRate 0.000287 Epoch: 20 Global Step: 429630 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:47:14,042-Speed 2497.05 samples/sec Loss 2.0937 LearningRate 0.000287 Epoch: 20 Global Step: 429640 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:47:22,245-Speed 2497.11 samples/sec Loss 2.0812 LearningRate 0.000287 Epoch: 20 Global Step: 429650 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:47:30,453-Speed 2495.59 samples/sec Loss 2.0459 LearningRate 0.000287 Epoch: 20 Global Step: 429660 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:47:38,609-Speed 2511.25 samples/sec Loss 2.0641 LearningRate 0.000287 Epoch: 20 Global Step: 429670 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:47:46,810-Speed 2497.59 samples/sec Loss 2.0963 LearningRate 0.000287 Epoch: 20 Global Step: 429680 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:47:55,015-Speed 2496.46 samples/sec Loss 2.0432 LearningRate 0.000287 Epoch: 20 Global Step: 429690 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:48:03,218-Speed 2497.35 samples/sec Loss 2.0686 LearningRate 0.000287 Epoch: 20 Global Step: 429700 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:48:11,419-Speed 2497.46 samples/sec Loss 2.0358 LearningRate 0.000287 Epoch: 20 Global Step: 429710 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:48:19,628-Speed 2495.22 samples/sec Loss 2.0448 LearningRate 0.000287 Epoch: 20 Global Step: 429720 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:48:27,784-Speed 2511.75 samples/sec Loss 2.0587 LearningRate 0.000287 Epoch: 20 Global Step: 429730 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:48:35,996-Speed 2494.45 samples/sec Loss 2.0896 LearningRate 0.000287 Epoch: 20 Global Step: 429740 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:48:44,198-Speed 2497.34 samples/sec Loss 2.0426 LearningRate 0.000287 Epoch: 20 Global Step: 429750 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:48:52,399-Speed 2497.60 samples/sec Loss 2.0509 LearningRate 0.000287 Epoch: 20 Global Step: 429760 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:49:00,603-Speed 2496.74 samples/sec Loss 2.0946 LearningRate 0.000287 Epoch: 20 Global Step: 429770 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:49:08,805-Speed 2497.13 samples/sec Loss 2.1328 LearningRate 0.000287 Epoch: 20 Global Step: 429780 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:49:16,957-Speed 2512.74 samples/sec Loss 2.0162 LearningRate 0.000287 Epoch: 20 Global Step: 429790 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:49:25,162-Speed 2496.64 samples/sec Loss 2.0430 LearningRate 0.000287 Epoch: 20 Global Step: 429800 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:49:33,366-Speed 2497.16 samples/sec Loss 2.0748 LearningRate 0.000287 Epoch: 20 Global Step: 429810 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:49:41,577-Speed 2494.42 samples/sec Loss 2.0305 LearningRate 0.000287 Epoch: 20 Global Step: 429820 Fp16 Grad Scale: 16384 Required: 92 hours
Training: 2022-07-09 16:49:49,778-Speed 2497.57 samples/sec Loss 2.0465 LearningRate 0.000287 Epoch: 20 Global Step: 429830 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:49:57,984-Speed 2496.21 samples/sec Loss 2.0840 LearningRate 0.000287 Epoch: 20 Global Step: 429840 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:50:06,134-Speed 2513.20 samples/sec Loss 2.0404 LearningRate 0.000287 Epoch: 20 Global Step: 429850 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:50:14,346-Speed 2494.31 samples/sec Loss 2.0824 LearningRate 0.000287 Epoch: 20 Global Step: 429860 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:50:22,550-Speed 2497.00 samples/sec Loss 2.0364 LearningRate 0.000287 Epoch: 20 Global Step: 429870 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:50:30,757-Speed 2495.70 samples/sec Loss 2.0731 LearningRate 0.000287 Epoch: 20 Global Step: 429880 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:50:38,976-Speed 2492.18 samples/sec Loss 1.9898 LearningRate 0.000287 Epoch: 20 Global Step: 429890 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:50:47,179-Speed 2497.11 samples/sec Loss 2.0369 LearningRate 0.000287 Epoch: 20 Global Step: 429900 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:50:55,330-Speed 2512.76 samples/sec Loss 2.0599 LearningRate 0.000287 Epoch: 20 Global Step: 429910 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:51:03,532-Speed 2497.51 samples/sec Loss 2.0585 LearningRate 0.000287 Epoch: 20 Global Step: 429920 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:51:11,738-Speed 2496.14 samples/sec Loss 2.0708 LearningRate 0.000287 Epoch: 20 Global Step: 429930 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:51:19,941-Speed 2496.77 samples/sec Loss 2.0919 LearningRate 0.000286 Epoch: 20 Global Step: 429940 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:51:28,144-Speed 2497.06 samples/sec Loss 2.0630 LearningRate 0.000286 Epoch: 20 Global Step: 429950 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:51:36,351-Speed 2496.11 samples/sec Loss 2.0283 LearningRate 0.000286 Epoch: 20 Global Step: 429960 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:51:44,500-Speed 2513.70 samples/sec Loss 2.0497 LearningRate 0.000286 Epoch: 20 Global Step: 429970 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:51:52,706-Speed 2496.16 samples/sec Loss 2.0523 LearningRate 0.000286 Epoch: 20 Global Step: 429980 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:52:00,921-Speed 2493.52 samples/sec Loss 2.0723 LearningRate 0.000286 Epoch: 20 Global Step: 429990 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:52:09,127-Speed 2496.00 samples/sec Loss 2.0774 LearningRate 0.000286 Epoch: 20 Global Step: 430000 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:52:17,331-Speed 2496.82 samples/sec Loss 2.1163 LearningRate 0.000286 Epoch: 20 Global Step: 430010 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:52:25,533-Speed 2497.43 samples/sec Loss 2.0885 LearningRate 0.000286 Epoch: 20 Global Step: 430020 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:52:33,684-Speed 2513.04 samples/sec Loss 2.0508 LearningRate 0.000286 Epoch: 20 Global Step: 430030 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:52:41,892-Speed 2495.58 samples/sec Loss 2.0527 LearningRate 0.000286 Epoch: 20 Global Step: 430040 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:52:50,094-Speed 2497.07 samples/sec Loss 2.0500 LearningRate 0.000286 Epoch: 20 Global Step: 430050 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:52:58,309-Speed 2493.49 samples/sec Loss 2.0694 LearningRate 0.000286 Epoch: 20 Global Step: 430060 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:53:06,519-Speed 2494.99 samples/sec Loss 2.0605 LearningRate 0.000286 Epoch: 20 Global Step: 430070 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:53:14,724-Speed 2496.35 samples/sec Loss 2.0286 LearningRate 0.000286 Epoch: 20 Global Step: 430080 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:53:22,882-Speed 2511.30 samples/sec Loss 2.0788 LearningRate 0.000286 Epoch: 20 Global Step: 430090 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:53:31,087-Speed 2496.61 samples/sec Loss 2.0415 LearningRate 0.000286 Epoch: 20 Global Step: 430100 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:53:39,292-Speed 2496.52 samples/sec Loss 2.0645 LearningRate 0.000286 Epoch: 20 Global Step: 430110 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:53:47,496-Speed 2496.52 samples/sec Loss 2.0982 LearningRate 0.000286 Epoch: 20 Global Step: 430120 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:53:55,704-Speed 2495.50 samples/sec Loss 2.0751 LearningRate 0.000286 Epoch: 20 Global Step: 430130 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:54:03,915-Speed 2494.76 samples/sec Loss 2.0558 LearningRate 0.000286 Epoch: 20 Global Step: 430140 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:54:12,071-Speed 2511.30 samples/sec Loss 2.0648 LearningRate 0.000286 Epoch: 20 Global Step: 430150 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:54:20,277-Speed 2496.10 samples/sec Loss 2.0786 LearningRate 0.000286 Epoch: 20 Global Step: 430160 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:54:28,481-Speed 2496.84 samples/sec Loss 2.0452 LearningRate 0.000286 Epoch: 20 Global Step: 430170 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:54:36,698-Speed 2493.23 samples/sec Loss 2.0164 LearningRate 0.000286 Epoch: 20 Global Step: 430180 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:54:44,901-Speed 2496.95 samples/sec Loss 2.0541 LearningRate 0.000286 Epoch: 20 Global Step: 430190 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:54:53,106-Speed 2496.27 samples/sec Loss 2.0655 LearningRate 0.000286 Epoch: 20 Global Step: 430200 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:55:01,255-Speed 2513.77 samples/sec Loss 1.9921 LearningRate 0.000286 Epoch: 20 Global Step: 430210 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:55:09,462-Speed 2496.09 samples/sec Loss 2.0634 LearningRate 0.000286 Epoch: 20 Global Step: 430220 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:55:17,666-Speed 2496.70 samples/sec Loss 2.0843 LearningRate 0.000286 Epoch: 20 Global Step: 430230 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:55:25,874-Speed 2495.50 samples/sec Loss 2.0813 LearningRate 0.000286 Epoch: 20 Global Step: 430240 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:55:34,078-Speed 2496.70 samples/sec Loss 2.0351 LearningRate 0.000286 Epoch: 20 Global Step: 430250 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:55:42,282-Speed 2496.70 samples/sec Loss 2.0854 LearningRate 0.000286 Epoch: 20 Global Step: 430260 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:55:50,439-Speed 2511.17 samples/sec Loss 2.1110 LearningRate 0.000286 Epoch: 20 Global Step: 430270 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:55:58,665-Speed 2490.12 samples/sec Loss 2.0758 LearningRate 0.000286 Epoch: 20 Global Step: 430280 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:56:06,868-Speed 2497.06 samples/sec Loss 2.1084 LearningRate 0.000286 Epoch: 20 Global Step: 430290 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:56:15,091-Speed 2491.14 samples/sec Loss 2.0787 LearningRate 0.000286 Epoch: 20 Global Step: 430300 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:56:23,296-Speed 2496.63 samples/sec Loss 2.0569 LearningRate 0.000286 Epoch: 20 Global Step: 430310 Fp16 Grad Scale: 16384 Required: 91 hours
Training: 2022-07-09 16:56:31,499-Speed 2496.95 samples/sec Loss 2.0357
LearningRate 0.000286 Epoch: 20 Global Step: 430320 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 16:56:39,650-Speed 2513.11 samples/sec Loss 2.0333 LearningRate 0.000286 Epoch: 20 Global Step: 430330 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 16:56:47,856-Speed 2496.31 samples/sec Loss 2.0898 LearningRate 0.000286 Epoch: 20 Global Step: 430340 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 16:56:56,064-Speed 2495.51 samples/sec Loss 2.1167 LearningRate 0.000286 Epoch: 20 Global Step: 430350 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 16:57:04,271-Speed 2495.94 samples/sec Loss 2.0743 LearningRate 0.000286 Epoch: 20 Global Step: 430360 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 16:57:12,475-Speed 2496.41 samples/sec Loss 2.0570 LearningRate 0.000286 Epoch: 20 Global Step: 430370 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 16:57:20,678-Speed 2497.12 samples/sec Loss 2.0659 LearningRate 0.000286 Epoch: 20 Global Step: 430380 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 16:57:28,828-Speed 2513.47 samples/sec Loss 2.1148 LearningRate 0.000286 Epoch: 20 Global Step: 430390 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 16:57:37,033-Speed 2496.49 samples/sec Loss 2.0841 LearningRate 0.000286 Epoch: 20 Global Step: 430400 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 16:57:45,238-Speed 2496.25 samples/sec Loss 2.0788 LearningRate 0.000286 Epoch: 20 Global Step: 430410 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 16:57:53,457-Speed 2492.09 samples/sec Loss 2.0736 LearningRate 0.000286 Epoch: 20 Global Step: 430420 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 16:58:01,663-Speed 2496.20 samples/sec Loss 2.0792 LearningRate 0.000286 Epoch: 20 Global Step: 430430 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 16:58:09,868-Speed 2496.78 samples/sec Loss 2.0744 
LearningRate 0.000286 Epoch: 20 Global Step: 430440 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 16:58:18,032-Speed 2508.94 samples/sec Loss 2.0446 LearningRate 0.000286 Epoch: 20 Global Step: 430450 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 16:58:26,239-Speed 2495.78 samples/sec Loss 2.0029 LearningRate 0.000286 Epoch: 20 Global Step: 430460 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 16:58:34,451-Speed 2494.30 samples/sec Loss 2.0638 LearningRate 0.000286 Epoch: 20 Global Step: 430470 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 16:58:42,655-Speed 2496.89 samples/sec Loss 2.0307 LearningRate 0.000286 Epoch: 20 Global Step: 430480 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 16:58:50,861-Speed 2496.18 samples/sec Loss 2.0731 LearningRate 0.000286 Epoch: 20 Global Step: 430490 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 16:58:59,065-Speed 2496.60 samples/sec Loss 2.0681 LearningRate 0.000286 Epoch: 20 Global Step: 430500 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 16:59:07,225-Speed 2510.05 samples/sec Loss 2.0401 LearningRate 0.000286 Epoch: 20 Global Step: 430510 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 16:59:15,429-Speed 2496.80 samples/sec Loss 2.0663 LearningRate 0.000286 Epoch: 20 Global Step: 430520 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 16:59:23,637-Speed 2495.48 samples/sec Loss 2.0642 LearningRate 0.000286 Epoch: 20 Global Step: 430530 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 16:59:31,842-Speed 2496.41 samples/sec Loss 2.0107 LearningRate 0.000286 Epoch: 20 Global Step: 430540 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 16:59:40,047-Speed 2496.30 samples/sec Loss 2.0589 LearningRate 0.000286 Epoch: 20 Global Step: 430550 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 16:59:48,252-Speed 2496.69 samples/sec Loss 2.0574 
LearningRate 0.000286 Epoch: 20 Global Step: 430560 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 16:59:56,407-Speed 2511.93 samples/sec Loss 2.0703 LearningRate 0.000286 Epoch: 20 Global Step: 430570 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:00:04,610-Speed 2496.77 samples/sec Loss 2.0764 LearningRate 0.000286 Epoch: 20 Global Step: 430580 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:00:12,814-Speed 2496.91 samples/sec Loss 2.0387 LearningRate 0.000286 Epoch: 20 Global Step: 430590 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:00:21,022-Speed 2495.44 samples/sec Loss 2.0977 LearningRate 0.000286 Epoch: 20 Global Step: 430600 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:00:29,243-Speed 2491.96 samples/sec Loss 2.0352 LearningRate 0.000286 Epoch: 20 Global Step: 430610 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:00:37,448-Speed 2496.28 samples/sec Loss 2.0856 LearningRate 0.000286 Epoch: 20 Global Step: 430620 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:00:45,601-Speed 2512.10 samples/sec Loss 2.1408 LearningRate 0.000286 Epoch: 20 Global Step: 430630 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:00:53,819-Speed 2492.82 samples/sec Loss 2.0767 LearningRate 0.000285 Epoch: 20 Global Step: 430640 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:01:02,031-Speed 2494.14 samples/sec Loss 1.9843 LearningRate 0.000285 Epoch: 20 Global Step: 430650 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:01:10,239-Speed 2495.57 samples/sec Loss 2.0743 LearningRate 0.000285 Epoch: 20 Global Step: 430660 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:01:18,442-Speed 2496.99 samples/sec Loss 2.0665 LearningRate 0.000285 Epoch: 20 Global Step: 430670 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:01:26,667-Speed 2490.58 samples/sec Loss 2.1152 
LearningRate 0.000285 Epoch: 20 Global Step: 430680 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:01:34,818-Speed 2512.70 samples/sec Loss 2.0993 LearningRate 0.000285 Epoch: 20 Global Step: 430690 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:01:43,023-Speed 2496.31 samples/sec Loss 2.0457 LearningRate 0.000285 Epoch: 20 Global Step: 430700 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:01:51,229-Speed 2496.41 samples/sec Loss 2.0679 LearningRate 0.000285 Epoch: 20 Global Step: 430710 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:01:59,431-Speed 2497.09 samples/sec Loss 2.0611 LearningRate 0.000285 Epoch: 20 Global Step: 430720 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:02:07,653-Speed 2491.47 samples/sec Loss 2.0938 LearningRate 0.000285 Epoch: 20 Global Step: 430730 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:02:15,860-Speed 2495.92 samples/sec Loss 2.1226 LearningRate 0.000285 Epoch: 20 Global Step: 430740 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:02:24,009-Speed 2513.67 samples/sec Loss 2.0852 LearningRate 0.000285 Epoch: 20 Global Step: 430750 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:02:32,213-Speed 2496.69 samples/sec Loss 2.0896 LearningRate 0.000285 Epoch: 20 Global Step: 430760 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:02:40,420-Speed 2495.80 samples/sec Loss 2.0850 LearningRate 0.000285 Epoch: 20 Global Step: 430770 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:02:48,624-Speed 2496.63 samples/sec Loss 2.1222 LearningRate 0.000285 Epoch: 20 Global Step: 430780 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:02:56,829-Speed 2496.47 samples/sec Loss 2.0598 LearningRate 0.000285 Epoch: 20 Global Step: 430790 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:03:05,034-Speed 2496.53 samples/sec Loss 2.1088 
LearningRate 0.000285 Epoch: 20 Global Step: 430800 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:03:13,185-Speed 2512.73 samples/sec Loss 2.0828 LearningRate 0.000285 Epoch: 20 Global Step: 430810 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:03:21,390-Speed 2496.57 samples/sec Loss 2.0583 LearningRate 0.000285 Epoch: 20 Global Step: 430820 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:03:29,593-Speed 2497.21 samples/sec Loss 2.0860 LearningRate 0.000285 Epoch: 20 Global Step: 430830 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:03:37,799-Speed 2496.18 samples/sec Loss 2.0711 LearningRate 0.000285 Epoch: 20 Global Step: 430840 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:03:46,018-Speed 2491.94 samples/sec Loss 2.1143 LearningRate 0.000285 Epoch: 20 Global Step: 430850 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:03:54,223-Speed 2496.33 samples/sec Loss 2.1309 LearningRate 0.000285 Epoch: 20 Global Step: 430860 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:04:02,379-Speed 2511.61 samples/sec Loss 2.0815 LearningRate 0.000285 Epoch: 20 Global Step: 430870 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:04:10,587-Speed 2495.47 samples/sec Loss 2.0996 LearningRate 0.000285 Epoch: 20 Global Step: 430880 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:04:18,794-Speed 2495.78 samples/sec Loss 2.0935 LearningRate 0.000285 Epoch: 20 Global Step: 430890 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:04:26,997-Speed 2496.94 samples/sec Loss 2.0795 LearningRate 0.000285 Epoch: 20 Global Step: 430900 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:04:35,201-Speed 2496.98 samples/sec Loss 2.0679 LearningRate 0.000285 Epoch: 20 Global Step: 430910 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:04:43,411-Speed 2494.83 samples/sec Loss 2.0708 
LearningRate 0.000285 Epoch: 20 Global Step: 430920 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:04:51,563-Speed 2512.94 samples/sec Loss 2.0375 LearningRate 0.000285 Epoch: 20 Global Step: 430930 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:04:59,767-Speed 2496.55 samples/sec Loss 2.1129 LearningRate 0.000285 Epoch: 20 Global Step: 430940 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:05:07,973-Speed 2496.27 samples/sec Loss 2.0530 LearningRate 0.000285 Epoch: 20 Global Step: 430950 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:05:16,179-Speed 2496.12 samples/sec Loss 2.0824 LearningRate 0.000285 Epoch: 20 Global Step: 430960 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:05:24,385-Speed 2496.07 samples/sec Loss 2.0591 LearningRate 0.000285 Epoch: 20 Global Step: 430970 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:05:32,597-Speed 2494.21 samples/sec Loss 2.0890 LearningRate 0.000285 Epoch: 20 Global Step: 430980 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:05:40,747-Speed 2513.47 samples/sec Loss 2.0627 LearningRate 0.000285 Epoch: 20 Global Step: 430990 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:05:48,959-Speed 2494.25 samples/sec Loss 2.0236 LearningRate 0.000285 Epoch: 20 Global Step: 431000 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:05:57,171-Speed 2494.19 samples/sec Loss 2.0682 LearningRate 0.000285 Epoch: 20 Global Step: 431010 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:06:05,390-Speed 2492.17 samples/sec Loss 2.0223 LearningRate 0.000285 Epoch: 20 Global Step: 431020 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:06:13,598-Speed 2495.36 samples/sec Loss 2.0301 LearningRate 0.000285 Epoch: 20 Global Step: 431030 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:06:21,767-Speed 2507.42 samples/sec Loss 2.0479 
LearningRate 0.000285 Epoch: 20 Global Step: 431040 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:06:29,920-Speed 2512.28 samples/sec Loss 2.0260 LearningRate 0.000285 Epoch: 20 Global Step: 431050 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:06:38,132-Speed 2494.37 samples/sec Loss 2.0453 LearningRate 0.000285 Epoch: 20 Global Step: 431060 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:06:46,336-Speed 2497.05 samples/sec Loss 2.0318 LearningRate 0.000285 Epoch: 20 Global Step: 431070 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:06:54,540-Speed 2496.81 samples/sec Loss 2.0588 LearningRate 0.000285 Epoch: 20 Global Step: 431080 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:07:02,746-Speed 2496.04 samples/sec Loss 2.0175 LearningRate 0.000285 Epoch: 20 Global Step: 431090 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:07:10,957-Speed 2494.84 samples/sec Loss 2.0470 LearningRate 0.000285 Epoch: 20 Global Step: 431100 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:07:19,121-Speed 2508.93 samples/sec Loss 1.9779 LearningRate 0.000285 Epoch: 20 Global Step: 431110 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:07:27,327-Speed 2496.29 samples/sec Loss 2.0385 LearningRate 0.000285 Epoch: 20 Global Step: 431120 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:07:35,541-Speed 2493.68 samples/sec Loss 2.0397 LearningRate 0.000285 Epoch: 20 Global Step: 431130 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:07:43,745-Speed 2496.92 samples/sec Loss 2.0671 LearningRate 0.000285 Epoch: 20 Global Step: 431140 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:07:51,948-Speed 2496.96 samples/sec Loss 2.0449 LearningRate 0.000285 Epoch: 20 Global Step: 431150 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:08:00,150-Speed 2497.33 samples/sec Loss 2.1065 
LearningRate 0.000285 Epoch: 20 Global Step: 431160 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:08:08,299-Speed 2513.57 samples/sec Loss 2.0372 LearningRate 0.000285 Epoch: 20 Global Step: 431170 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:08:16,501-Speed 2497.15 samples/sec Loss 2.0594 LearningRate 0.000285 Epoch: 20 Global Step: 431180 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:08:24,704-Speed 2496.93 samples/sec Loss 2.0514 LearningRate 0.000285 Epoch: 20 Global Step: 431190 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:08:32,911-Speed 2495.95 samples/sec Loss 2.0826 LearningRate 0.000285 Epoch: 20 Global Step: 431200 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:08:41,117-Speed 2496.42 samples/sec Loss 2.0556 LearningRate 0.000285 Epoch: 20 Global Step: 431210 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:08:49,321-Speed 2496.33 samples/sec Loss 2.0689 LearningRate 0.000285 Epoch: 20 Global Step: 431220 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:08:57,475-Speed 2512.42 samples/sec Loss 2.0426 LearningRate 0.000285 Epoch: 20 Global Step: 431230 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:09:05,685-Speed 2495.04 samples/sec Loss 2.0261 LearningRate 0.000285 Epoch: 20 Global Step: 431240 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:09:13,890-Speed 2496.44 samples/sec Loss 2.0660 LearningRate 0.000285 Epoch: 20 Global Step: 431250 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:09:22,096-Speed 2496.28 samples/sec Loss 2.0891 LearningRate 0.000285 Epoch: 20 Global Step: 431260 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:09:30,315-Speed 2492.15 samples/sec Loss 2.0508 LearningRate 0.000285 Epoch: 20 Global Step: 431270 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:09:38,519-Speed 2496.57 samples/sec Loss 2.0536 
LearningRate 0.000285 Epoch: 20 Global Step: 431280 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:09:46,673-Speed 2512.29 samples/sec Loss 2.0899 LearningRate 0.000285 Epoch: 20 Global Step: 431290 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:09:54,879-Speed 2495.85 samples/sec Loss 2.0127 LearningRate 0.000285 Epoch: 20 Global Step: 431300 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:10:03,087-Speed 2495.58 samples/sec Loss 2.0796 LearningRate 0.000285 Epoch: 20 Global Step: 431310 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:10:11,294-Speed 2495.92 samples/sec Loss 2.0458 LearningRate 0.000285 Epoch: 20 Global Step: 431320 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:10:19,510-Speed 2493.12 samples/sec Loss 2.0324 LearningRate 0.000285 Epoch: 20 Global Step: 431330 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:10:27,717-Speed 2495.79 samples/sec Loss 2.0187 LearningRate 0.000284 Epoch: 20 Global Step: 431340 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:10:35,873-Speed 2511.64 samples/sec Loss 2.0385 LearningRate 0.000284 Epoch: 20 Global Step: 431350 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:10:44,079-Speed 2496.12 samples/sec Loss 2.0928 LearningRate 0.000284 Epoch: 20 Global Step: 431360 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:10:52,283-Speed 2496.59 samples/sec Loss 2.0603 LearningRate 0.000284 Epoch: 20 Global Step: 431370 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:11:00,487-Speed 2496.68 samples/sec Loss 2.0176 LearningRate 0.000284 Epoch: 20 Global Step: 431380 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:11:08,696-Speed 2495.53 samples/sec Loss 2.0786 LearningRate 0.000284 Epoch: 20 Global Step: 431390 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:11:16,902-Speed 2496.17 samples/sec Loss 2.0749 
LearningRate 0.000284 Epoch: 20 Global Step: 431400 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:11:25,054-Speed 2512.56 samples/sec Loss 2.0293 LearningRate 0.000284 Epoch: 20 Global Step: 431410 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:11:33,277-Speed 2491.00 samples/sec Loss 2.0741 LearningRate 0.000284 Epoch: 20 Global Step: 431420 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:11:41,485-Speed 2495.43 samples/sec Loss 2.0489 LearningRate 0.000284 Epoch: 20 Global Step: 431430 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:11:49,691-Speed 2496.32 samples/sec Loss 2.0659 LearningRate 0.000284 Epoch: 20 Global Step: 431440 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:11:57,898-Speed 2495.97 samples/sec Loss 2.0551 LearningRate 0.000284 Epoch: 20 Global Step: 431450 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:12:06,101-Speed 2496.90 samples/sec Loss 2.0090 LearningRate 0.000284 Epoch: 20 Global Step: 431460 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:12:14,253-Speed 2512.54 samples/sec Loss 2.0139 LearningRate 0.000284 Epoch: 20 Global Step: 431470 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:12:22,465-Speed 2494.38 samples/sec Loss 2.0470 LearningRate 0.000284 Epoch: 20 Global Step: 431480 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:12:30,673-Speed 2495.21 samples/sec Loss 2.0524 LearningRate 0.000284 Epoch: 20 Global Step: 431490 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:12:38,881-Speed 2495.51 samples/sec Loss 2.0306 LearningRate 0.000284 Epoch: 20 Global Step: 431500 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:12:47,086-Speed 2496.49 samples/sec Loss 2.0061 LearningRate 0.000284 Epoch: 20 Global Step: 431510 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:12:55,294-Speed 2495.78 samples/sec Loss 2.1218 
LearningRate 0.000284 Epoch: 20 Global Step: 431520 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:13:03,443-Speed 2513.57 samples/sec Loss 2.0299 LearningRate 0.000284 Epoch: 20 Global Step: 431530 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:13:11,649-Speed 2496.18 samples/sec Loss 2.0570 LearningRate 0.000284 Epoch: 20 Global Step: 431540 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:13:19,852-Speed 2497.31 samples/sec Loss 2.0459 LearningRate 0.000284 Epoch: 20 Global Step: 431550 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:13:28,058-Speed 2496.23 samples/sec Loss 2.0590 LearningRate 0.000284 Epoch: 20 Global Step: 431560 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:13:36,263-Speed 2496.41 samples/sec Loss 2.0294 LearningRate 0.000284 Epoch: 20 Global Step: 431570 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:13:44,465-Speed 2497.04 samples/sec Loss 1.9858 LearningRate 0.000284 Epoch: 20 Global Step: 431580 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:13:52,626-Speed 2509.96 samples/sec Loss 2.0643 LearningRate 0.000284 Epoch: 20 Global Step: 431590 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:14:00,830-Speed 2496.83 samples/sec Loss 2.0348 LearningRate 0.000284 Epoch: 20 Global Step: 431600 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:14:09,036-Speed 2495.97 samples/sec Loss 2.0423 LearningRate 0.000284 Epoch: 20 Global Step: 431610 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:14:17,246-Speed 2495.04 samples/sec Loss 2.0454 LearningRate 0.000284 Epoch: 20 Global Step: 431620 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:14:25,452-Speed 2496.21 samples/sec Loss 2.0307 LearningRate 0.000284 Epoch: 20 Global Step: 431630 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:14:33,657-Speed 2496.42 samples/sec Loss 2.0173 
LearningRate 0.000284 Epoch: 20 Global Step: 431640 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:14:41,831-Speed 2505.72 samples/sec Loss 2.0008 LearningRate 0.000284 Epoch: 20 Global Step: 431650 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:14:50,043-Speed 2494.62 samples/sec Loss 2.0255 LearningRate 0.000284 Epoch: 20 Global Step: 431660 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:14:58,264-Speed 2491.57 samples/sec Loss 2.0493 LearningRate 0.000284 Epoch: 20 Global Step: 431670 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:15:06,473-Speed 2495.10 samples/sec Loss 2.0151 LearningRate 0.000284 Epoch: 20 Global Step: 431680 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:15:14,677-Speed 2496.93 samples/sec Loss 2.0121 LearningRate 0.000284 Epoch: 20 Global Step: 431690 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:15:22,890-Speed 2494.29 samples/sec Loss 2.0194 LearningRate 0.000284 Epoch: 20 Global Step: 431700 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:15:31,041-Speed 2513.00 samples/sec Loss 2.0407 LearningRate 0.000284 Epoch: 20 Global Step: 431710 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:15:39,252-Speed 2494.47 samples/sec Loss 2.0990 LearningRate 0.000284 Epoch: 20 Global Step: 431720 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:15:47,461-Speed 2495.31 samples/sec Loss 2.0803 LearningRate 0.000284 Epoch: 20 Global Step: 431730 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:15:55,679-Speed 2492.41 samples/sec Loss 2.0671 LearningRate 0.000284 Epoch: 20 Global Step: 431740 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:16:03,882-Speed 2497.16 samples/sec Loss 2.0889 LearningRate 0.000284 Epoch: 20 Global Step: 431750 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:16:12,085-Speed 2496.93 samples/sec Loss 2.0376 
LearningRate 0.000284 Epoch: 20 Global Step: 431760 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:16:20,239-Speed 2512.24 samples/sec Loss 2.0539 LearningRate 0.000284 Epoch: 20 Global Step: 431770 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:16:28,444-Speed 2496.72 samples/sec Loss 2.0802 LearningRate 0.000284 Epoch: 20 Global Step: 431780 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:16:36,648-Speed 2496.64 samples/sec Loss 2.0508 LearningRate 0.000284 Epoch: 20 Global Step: 431790 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:16:44,868-Speed 2491.94 samples/sec Loss 2.0333 LearningRate 0.000284 Epoch: 20 Global Step: 431800 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:16:53,072-Speed 2496.59 samples/sec Loss 2.0232 LearningRate 0.000284 Epoch: 20 Global Step: 431810 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:17:01,275-Speed 2496.87 samples/sec Loss 1.9909 LearningRate 0.000284 Epoch: 20 Global Step: 431820 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:17:09,430-Speed 2512.03 samples/sec Loss 2.0724 LearningRate 0.000284 Epoch: 20 Global Step: 431830 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:17:17,640-Speed 2494.78 samples/sec Loss 2.0737 LearningRate 0.000284 Epoch: 20 Global Step: 431840 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:17:25,845-Speed 2496.70 samples/sec Loss 2.0602 LearningRate 0.000284 Epoch: 20 Global Step: 431850 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:17:34,050-Speed 2496.38 samples/sec Loss 2.0703 LearningRate 0.000284 Epoch: 20 Global Step: 431860 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:17:42,271-Speed 2491.71 samples/sec Loss 2.0978 LearningRate 0.000284 Epoch: 20 Global Step: 431870 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:17:50,475-Speed 2496.76 samples/sec Loss 2.0775 
LearningRate 0.000284 Epoch: 20 Global Step: 431880 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:17:58,626-Speed 2513.05 samples/sec Loss 2.0799 LearningRate 0.000284 Epoch: 20 Global Step: 431890 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:18:06,830-Speed 2496.56 samples/sec Loss 2.0720 LearningRate 0.000284 Epoch: 20 Global Step: 431900 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:18:15,036-Speed 2496.34 samples/sec Loss 2.0479 LearningRate 0.000284 Epoch: 20 Global Step: 431910 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:18:23,240-Speed 2496.58 samples/sec Loss 2.1011 LearningRate 0.000284 Epoch: 20 Global Step: 431920 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:18:31,457-Speed 2493.12 samples/sec Loss 2.0732 LearningRate 0.000284 Epoch: 20 Global Step: 431930 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:18:39,663-Speed 2496.48 samples/sec Loss 2.0778 LearningRate 0.000284 Epoch: 20 Global Step: 431940 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:18:47,812-Speed 2513.66 samples/sec Loss 2.0522 LearningRate 0.000284 Epoch: 20 Global Step: 431950 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:18:56,021-Speed 2495.14 samples/sec Loss 2.1286 LearningRate 0.000284 Epoch: 20 Global Step: 431960 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:19:04,232-Speed 2494.73 samples/sec Loss 2.0900 LearningRate 0.000284 Epoch: 20 Global Step: 431970 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:19:12,439-Speed 2495.81 samples/sec Loss 2.0899 LearningRate 0.000284 Epoch: 20 Global Step: 431980 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:19:20,644-Speed 2496.17 samples/sec Loss 2.0564 LearningRate 0.000284 Epoch: 20 Global Step: 431990 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:19:28,858-Speed 2494.06 samples/sec Loss 2.0598 
LearningRate 0.000284 Epoch: 20 Global Step: 432000 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:19:37,026-Speed 2507.56 samples/sec Loss 2.0465 LearningRate 0.000284 Epoch: 20 Global Step: 432010 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:19:45,229-Speed 2497.03 samples/sec Loss 2.0487 LearningRate 0.000284 Epoch: 20 Global Step: 432020 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:19:53,436-Speed 2495.96 samples/sec Loss 2.0556 LearningRate 0.000284 Epoch: 20 Global Step: 432030 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:20:01,642-Speed 2496.05 samples/sec Loss 2.0632 LearningRate 0.000283 Epoch: 20 Global Step: 432040 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:20:09,863-Speed 2491.55 samples/sec Loss 2.0427 LearningRate 0.000283 Epoch: 20 Global Step: 432050 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:20:18,071-Speed 2495.52 samples/sec Loss 2.0935 LearningRate 0.000283 Epoch: 20 Global Step: 432060 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:20:26,220-Speed 2513.47 samples/sec Loss 2.0785 LearningRate 0.000283 Epoch: 20 Global Step: 432070 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:20:34,430-Speed 2495.17 samples/sec Loss 2.0383 LearningRate 0.000283 Epoch: 20 Global Step: 432080 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:20:42,632-Speed 2497.35 samples/sec Loss 2.0476 LearningRate 0.000283 Epoch: 20 Global Step: 432090 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:20:50,838-Speed 2496.20 samples/sec Loss 2.0387 LearningRate 0.000283 Epoch: 20 Global Step: 432100 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:20:59,042-Speed 2496.50 samples/sec Loss 2.0579 LearningRate 0.000283 Epoch: 20 Global Step: 432110 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:21:07,245-Speed 2497.45 samples/sec Loss 2.0487 
LearningRate 0.000283 Epoch: 20 Global Step: 432120 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:21:15,400-Speed 2511.75 samples/sec Loss 2.0725 LearningRate 0.000283 Epoch: 20 Global Step: 432130 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:21:23,603-Speed 2497.03 samples/sec Loss 2.0573 LearningRate 0.000283 Epoch: 20 Global Step: 432140 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:21:31,811-Speed 2495.58 samples/sec Loss 2.0186 LearningRate 0.000283 Epoch: 20 Global Step: 432150 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:21:40,016-Speed 2496.66 samples/sec Loss 2.0403 LearningRate 0.000283 Epoch: 20 Global Step: 432160 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:21:48,223-Speed 2495.67 samples/sec Loss 2.0915 LearningRate 0.000283 Epoch: 20 Global Step: 432170 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:21:56,428-Speed 2496.57 samples/sec Loss 2.0456 LearningRate 0.000283 Epoch: 20 Global Step: 432180 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:22:04,579-Speed 2512.87 samples/sec Loss 2.0544 LearningRate 0.000283 Epoch: 20 Global Step: 432190 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:22:12,781-Speed 2497.15 samples/sec Loss 2.0873 LearningRate 0.000283 Epoch: 20 Global Step: 432200 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:22:20,985-Speed 2496.98 samples/sec Loss 2.0710 LearningRate 0.000283 Epoch: 20 Global Step: 432210 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:22:29,188-Speed 2497.46 samples/sec Loss 2.0613 LearningRate 0.000283 Epoch: 20 Global Step: 432220 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:22:37,390-Speed 2497.18 samples/sec Loss 2.0549 LearningRate 0.000283 Epoch: 20 Global Step: 432230 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:22:45,608-Speed 2492.50 samples/sec Loss 2.0321 
LearningRate 0.000283 Epoch: 20 Global Step: 432240 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:22:53,759-Speed 2513.22 samples/sec Loss 2.0414 LearningRate 0.000283 Epoch: 20 Global Step: 432250 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:23:01,962-Speed 2496.91 samples/sec Loss 2.0561 LearningRate 0.000283 Epoch: 20 Global Step: 432260 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:23:10,169-Speed 2495.86 samples/sec Loss 2.0387 LearningRate 0.000283 Epoch: 20 Global Step: 432270 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:23:18,378-Speed 2495.23 samples/sec Loss 2.0354 LearningRate 0.000283 Epoch: 20 Global Step: 432280 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:23:26,585-Speed 2496.02 samples/sec Loss 2.0284 LearningRate 0.000283 Epoch: 20 Global Step: 432290 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:23:34,796-Speed 2494.46 samples/sec Loss 2.0358 LearningRate 0.000283 Epoch: 20 Global Step: 432300 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:23:42,947-Speed 2513.08 samples/sec Loss 2.0294 LearningRate 0.000283 Epoch: 20 Global Step: 432310 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:23:51,153-Speed 2496.37 samples/sec Loss 2.0609 LearningRate 0.000283 Epoch: 20 Global Step: 432320 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:23:59,363-Speed 2495.14 samples/sec Loss 2.1056 LearningRate 0.000283 Epoch: 20 Global Step: 432330 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:24:07,575-Speed 2494.39 samples/sec Loss 2.0792 LearningRate 0.000283 Epoch: 20 Global Step: 432340 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:24:15,780-Speed 2496.36 samples/sec Loss 2.0478 LearningRate 0.000283 Epoch: 20 Global Step: 432350 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:24:23,985-Speed 2496.51 samples/sec Loss 2.0189 
LearningRate 0.000283 Epoch: 20 Global Step: 432360 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:24:32,141-Speed 2511.21 samples/sec Loss 2.0777 LearningRate 0.000283 Epoch: 20 Global Step: 432370 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:24:40,348-Speed 2495.94 samples/sec Loss 2.0878 LearningRate 0.000283 Epoch: 20 Global Step: 432380 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:24:48,551-Speed 2497.15 samples/sec Loss 2.0742 LearningRate 0.000283 Epoch: 20 Global Step: 432390 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:24:56,758-Speed 2495.56 samples/sec Loss 2.0438 LearningRate 0.000283 Epoch: 20 Global Step: 432400 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:25:04,963-Speed 2496.57 samples/sec Loss 2.1093 LearningRate 0.000283 Epoch: 20 Global Step: 432410 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:25:13,179-Speed 2493.08 samples/sec Loss 2.0445 LearningRate 0.000283 Epoch: 20 Global Step: 432420 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:25:21,359-Speed 2504.05 samples/sec Loss 2.0544 LearningRate 0.000283 Epoch: 20 Global Step: 432430 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:25:29,565-Speed 2496.25 samples/sec Loss 2.0765 LearningRate 0.000283 Epoch: 20 Global Step: 432440 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:25:37,769-Speed 2496.81 samples/sec Loss 2.0659 LearningRate 0.000283 Epoch: 20 Global Step: 432450 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:25:45,990-Speed 2491.47 samples/sec Loss 2.0525 LearningRate 0.000283 Epoch: 20 Global Step: 432460 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:25:54,204-Speed 2493.70 samples/sec Loss 2.0312 LearningRate 0.000283 Epoch: 20 Global Step: 432470 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:26:02,407-Speed 2496.99 samples/sec Loss 2.0658 
LearningRate 0.000283 Epoch: 20 Global Step: 432480 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:26:10,565-Speed 2510.78 samples/sec Loss 2.0433 LearningRate 0.000283 Epoch: 20 Global Step: 432490 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:26:18,772-Speed 2496.19 samples/sec Loss 2.0608 LearningRate 0.000283 Epoch: 20 Global Step: 432500 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:26:26,976-Speed 2496.61 samples/sec Loss 2.0430 LearningRate 0.000283 Epoch: 20 Global Step: 432510 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:26:35,192-Speed 2493.21 samples/sec Loss 2.0686 LearningRate 0.000283 Epoch: 20 Global Step: 432520 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:26:43,394-Speed 2497.42 samples/sec Loss 2.0163 LearningRate 0.000283 Epoch: 20 Global Step: 432530 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:26:51,602-Speed 2495.53 samples/sec Loss 2.0750 LearningRate 0.000283 Epoch: 20 Global Step: 432540 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:26:59,753-Speed 2512.92 samples/sec Loss 2.0822 LearningRate 0.000283 Epoch: 20 Global Step: 432550 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:27:07,957-Speed 2496.74 samples/sec Loss 2.0176 LearningRate 0.000283 Epoch: 20 Global Step: 432560 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:27:16,163-Speed 2496.39 samples/sec Loss 2.0408 LearningRate 0.000283 Epoch: 20 Global Step: 432570 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:27:24,371-Speed 2495.43 samples/sec Loss 2.0279 LearningRate 0.000283 Epoch: 20 Global Step: 432580 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:27:32,578-Speed 2495.69 samples/sec Loss 2.0042 LearningRate 0.000283 Epoch: 20 Global Step: 432590 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:27:40,784-Speed 2496.19 samples/sec Loss 2.0429 
LearningRate 0.000283 Epoch: 20 Global Step: 432600 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:27:48,943-Speed 2510.58 samples/sec Loss 2.0347 LearningRate 0.000283 Epoch: 20 Global Step: 432610 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:27:57,148-Speed 2496.42 samples/sec Loss 2.0392 LearningRate 0.000283 Epoch: 20 Global Step: 432620 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:28:05,352-Speed 2496.76 samples/sec Loss 2.0309 LearningRate 0.000283 Epoch: 20 Global Step: 432630 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:28:13,567-Speed 2493.58 samples/sec Loss 2.0062 LearningRate 0.000283 Epoch: 20 Global Step: 432640 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:28:21,773-Speed 2495.96 samples/sec Loss 2.0197 LearningRate 0.000283 Epoch: 20 Global Step: 432650 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:28:29,984-Speed 2494.53 samples/sec Loss 2.0681 LearningRate 0.000283 Epoch: 20 Global Step: 432660 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:28:38,140-Speed 2511.56 samples/sec Loss 2.0032 LearningRate 0.000283 Epoch: 20 Global Step: 432670 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:28:46,349-Speed 2495.43 samples/sec Loss 2.0167 LearningRate 0.000283 Epoch: 20 Global Step: 432680 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:28:54,551-Speed 2497.21 samples/sec Loss 2.0474 LearningRate 0.000283 Epoch: 20 Global Step: 432690 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:29:02,755-Speed 2496.92 samples/sec Loss 2.0535 LearningRate 0.000283 Epoch: 20 Global Step: 432700 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:29:10,963-Speed 2495.39 samples/sec Loss 2.0359 LearningRate 0.000283 Epoch: 20 Global Step: 432710 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:29:19,169-Speed 2496.17 samples/sec Loss 2.0410 
LearningRate 0.000283 Epoch: 20 Global Step: 432720 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:29:27,332-Speed 2508.99 samples/sec Loss 1.9720 LearningRate 0.000283 Epoch: 20 Global Step: 432730 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:29:35,535-Speed 2497.20 samples/sec Loss 2.0447 LearningRate 0.000282 Epoch: 20 Global Step: 432740 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:29:43,750-Speed 2493.34 samples/sec Loss 2.0326 LearningRate 0.000282 Epoch: 20 Global Step: 432750 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:29:51,954-Speed 2496.63 samples/sec Loss 2.0835 LearningRate 0.000282 Epoch: 20 Global Step: 432760 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:30:00,157-Speed 2497.39 samples/sec Loss 2.0273 LearningRate 0.000282 Epoch: 20 Global Step: 432770 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:30:08,365-Speed 2495.70 samples/sec Loss 2.0400 LearningRate 0.000282 Epoch: 20 Global Step: 432780 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:30:16,517-Speed 2512.64 samples/sec Loss 1.9921 LearningRate 0.000282 Epoch: 20 Global Step: 432790 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:30:24,720-Speed 2496.88 samples/sec Loss 2.0731 LearningRate 0.000282 Epoch: 20 Global Step: 432800 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:30:32,926-Speed 2496.26 samples/sec Loss 2.0573 LearningRate 0.000282 Epoch: 20 Global Step: 432810 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:30:41,141-Speed 2493.36 samples/sec Loss 2.0345 LearningRate 0.000282 Epoch: 20 Global Step: 432820 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:30:49,359-Speed 2492.45 samples/sec Loss 2.0541 LearningRate 0.000282 Epoch: 20 Global Step: 432830 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:30:57,564-Speed 2496.39 samples/sec Loss 2.0844 
LearningRate 0.000282 Epoch: 20 Global Step: 432840 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:31:05,735-Speed 2506.81 samples/sec Loss 2.0666 LearningRate 0.000282 Epoch: 20 Global Step: 432850 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:31:13,941-Speed 2496.30 samples/sec Loss 2.0369 LearningRate 0.000282 Epoch: 20 Global Step: 432860 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:31:22,164-Speed 2490.99 samples/sec Loss 2.0674 LearningRate 0.000282 Epoch: 20 Global Step: 432870 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:31:30,373-Speed 2495.03 samples/sec Loss 2.0153 LearningRate 0.000282 Epoch: 20 Global Step: 432880 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:31:38,581-Speed 2495.65 samples/sec Loss 2.0100 LearningRate 0.000282 Epoch: 20 Global Step: 432890 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:31:46,799-Speed 2492.50 samples/sec Loss 2.1052 LearningRate 0.000282 Epoch: 20 Global Step: 432900 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:31:54,996-Speed 2498.83 samples/sec Loss 2.0802 LearningRate 0.000282 Epoch: 20 Global Step: 432910 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:32:03,204-Speed 2495.19 samples/sec Loss 2.0694 LearningRate 0.000282 Epoch: 20 Global Step: 432920 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:32:11,411-Speed 2495.95 samples/sec Loss 2.0926 LearningRate 0.000282 Epoch: 20 Global Step: 432930 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:32:19,618-Speed 2495.57 samples/sec Loss 2.0459 LearningRate 0.000282 Epoch: 20 Global Step: 432940 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:32:27,824-Speed 2496.51 samples/sec Loss 2.0196 LearningRate 0.000282 Epoch: 20 Global Step: 432950 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:32:36,036-Speed 2494.45 samples/sec Loss 2.0304 
LearningRate 0.000282 Epoch: 20 Global Step: 432960 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:32:44,192-Speed 2511.43 samples/sec Loss 2.0805 LearningRate 0.000282 Epoch: 20 Global Step: 432970 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:32:52,398-Speed 2496.19 samples/sec Loss 2.0744 LearningRate 0.000282 Epoch: 20 Global Step: 432980 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:33:00,606-Speed 2495.44 samples/sec Loss 2.0825 LearningRate 0.000282 Epoch: 20 Global Step: 432990 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:33:08,859-Speed 2481.85 samples/sec Loss 2.0305 LearningRate 0.000282 Epoch: 20 Global Step: 433000 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:33:17,093-Speed 2487.66 samples/sec Loss 2.0501 LearningRate 0.000282 Epoch: 20 Global Step: 433010 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:33:25,296-Speed 2497.00 samples/sec Loss 2.0506 LearningRate 0.000282 Epoch: 20 Global Step: 433020 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:33:33,448-Speed 2512.59 samples/sec Loss 2.0811 LearningRate 0.000282 Epoch: 20 Global Step: 433030 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:33:41,659-Speed 2494.42 samples/sec Loss 2.0282 LearningRate 0.000282 Epoch: 20 Global Step: 433040 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:33:49,868-Speed 2495.50 samples/sec Loss 2.0730 LearningRate 0.000282 Epoch: 20 Global Step: 433050 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:33:58,085-Speed 2492.71 samples/sec Loss 2.0406 LearningRate 0.000282 Epoch: 20 Global Step: 433060 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:34:06,291-Speed 2496.00 samples/sec Loss 2.0758 LearningRate 0.000282 Epoch: 20 Global Step: 433070 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:34:14,504-Speed 2494.47 samples/sec Loss 2.0364 
LearningRate 0.000282 Epoch: 20 Global Step: 433080 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:34:22,654-Speed 2513.09 samples/sec Loss 2.0603 LearningRate 0.000282 Epoch: 20 Global Step: 433090 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:34:30,870-Speed 2493.41 samples/sec Loss 2.0316 LearningRate 0.000282 Epoch: 20 Global Step: 433100 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:34:39,077-Speed 2495.90 samples/sec Loss 2.0181 LearningRate 0.000282 Epoch: 20 Global Step: 433110 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:34:47,280-Speed 2497.12 samples/sec Loss 2.0281 LearningRate 0.000282 Epoch: 20 Global Step: 433120 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:34:55,496-Speed 2493.32 samples/sec Loss 1.9934 LearningRate 0.000282 Epoch: 20 Global Step: 433130 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:35:03,699-Speed 2497.37 samples/sec Loss 2.0387 LearningRate 0.000282 Epoch: 20 Global Step: 433140 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:35:11,864-Speed 2508.87 samples/sec Loss 2.0447 LearningRate 0.000282 Epoch: 20 Global Step: 433150 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:35:20,068-Speed 2496.69 samples/sec Loss 2.0763 LearningRate 0.000282 Epoch: 20 Global Step: 433160 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:35:28,274-Speed 2496.31 samples/sec Loss 2.0332 LearningRate 0.000282 Epoch: 20 Global Step: 433170 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:35:36,481-Speed 2495.78 samples/sec Loss 2.0372 LearningRate 0.000282 Epoch: 20 Global Step: 433180 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:35:44,709-Speed 2489.39 samples/sec Loss 2.0397 LearningRate 0.000282 Epoch: 20 Global Step: 433190 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:35:52,915-Speed 2495.99 samples/sec Loss 2.0640 
LearningRate 0.000282 Epoch: 20 Global Step: 433200 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:36:01,082-Speed 2508.14 samples/sec Loss 2.0190 LearningRate 0.000282 Epoch: 20 Global Step: 433210 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:36:09,289-Speed 2495.94 samples/sec Loss 2.0553 LearningRate 0.000282 Epoch: 20 Global Step: 433220 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:36:17,494-Speed 2496.03 samples/sec Loss 2.0115 LearningRate 0.000282 Epoch: 20 Global Step: 433230 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:36:25,709-Speed 2493.31 samples/sec Loss 2.0404 LearningRate 0.000282 Epoch: 20 Global Step: 433240 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:36:33,915-Speed 2496.27 samples/sec Loss 2.0471 LearningRate 0.000282 Epoch: 20 Global Step: 433250 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:36:42,133-Speed 2492.56 samples/sec Loss 1.9796 LearningRate 0.000282 Epoch: 20 Global Step: 433260 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:36:50,291-Speed 2510.84 samples/sec Loss 2.0218 LearningRate 0.000282 Epoch: 20 Global Step: 433270 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:36:58,496-Speed 2496.24 samples/sec Loss 2.0627 LearningRate 0.000282 Epoch: 20 Global Step: 433280 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:37:06,705-Speed 2495.34 samples/sec Loss 2.0461 LearningRate 0.000282 Epoch: 20 Global Step: 433290 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:37:14,918-Speed 2494.03 samples/sec Loss 2.0383 LearningRate 0.000282 Epoch: 20 Global Step: 433300 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:37:23,124-Speed 2496.39 samples/sec Loss 1.9892 LearningRate 0.000282 Epoch: 20 Global Step: 433310 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:37:31,327-Speed 2497.07 samples/sec Loss 2.0686 
LearningRate 0.000282 Epoch: 20 Global Step: 433320 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:37:39,477-Speed 2513.95 samples/sec Loss 2.0855 LearningRate 0.000282 Epoch: 20 Global Step: 433330 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:37:47,686-Speed 2495.28 samples/sec Loss 2.0236 LearningRate 0.000282 Epoch: 20 Global Step: 433340 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:37:55,889-Speed 2496.90 samples/sec Loss 2.0234 LearningRate 0.000282 Epoch: 20 Global Step: 433350 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:38:04,093-Speed 2496.90 samples/sec Loss 2.0815 LearningRate 0.000282 Epoch: 20 Global Step: 433360 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:38:12,295-Speed 2497.56 samples/sec Loss 2.0363 LearningRate 0.000282 Epoch: 20 Global Step: 433370 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:38:20,506-Speed 2494.59 samples/sec Loss 2.0396 LearningRate 0.000282 Epoch: 20 Global Step: 433380 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:38:28,657-Speed 2513.22 samples/sec Loss 2.0943 LearningRate 0.000282 Epoch: 20 Global Step: 433390 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:38:36,861-Speed 2496.44 samples/sec Loss 2.0609 LearningRate 0.000282 Epoch: 20 Global Step: 433400 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:38:45,068-Speed 2495.99 samples/sec Loss 2.0625 LearningRate 0.000282 Epoch: 20 Global Step: 433410 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:38:53,272-Speed 2496.59 samples/sec Loss 2.0730 LearningRate 0.000282 Epoch: 20 Global Step: 433420 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:39:01,476-Speed 2496.80 samples/sec Loss 2.0811 LearningRate 0.000282 Epoch: 20 Global Step: 433430 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:39:09,682-Speed 2496.29 samples/sec Loss 2.0587 
LearningRate 0.000281 Epoch: 20 Global Step: 433440 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:39:17,835-Speed 2512.67 samples/sec Loss 2.0155 LearningRate 0.000281 Epoch: 20 Global Step: 433450 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:39:26,040-Speed 2496.42 samples/sec Loss 2.0106 LearningRate 0.000281 Epoch: 20 Global Step: 433460 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:39:34,242-Speed 2497.07 samples/sec Loss 2.0141 LearningRate 0.000281 Epoch: 20 Global Step: 433470 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:39:42,446-Speed 2496.90 samples/sec Loss 2.0367 LearningRate 0.000281 Epoch: 20 Global Step: 433480 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:39:50,648-Speed 2497.78 samples/sec Loss 1.9928 LearningRate 0.000281 Epoch: 20 Global Step: 433490 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:39:58,854-Speed 2496.26 samples/sec Loss 2.0383 LearningRate 0.000281 Epoch: 20 Global Step: 433500 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:40:07,008-Speed 2512.15 samples/sec Loss 2.0634 LearningRate 0.000281 Epoch: 20 Global Step: 433510 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:40:15,224-Speed 2493.23 samples/sec Loss 2.0493 LearningRate 0.000281 Epoch: 20 Global Step: 433520 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:40:23,426-Speed 2497.08 samples/sec Loss 2.0635 LearningRate 0.000281 Epoch: 20 Global Step: 433530 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:40:31,633-Speed 2495.77 samples/sec Loss 2.0232 LearningRate 0.000281 Epoch: 20 Global Step: 433540 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:40:39,841-Speed 2495.40 samples/sec Loss 2.0108 LearningRate 0.000281 Epoch: 20 Global Step: 433550 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:40:48,050-Speed 2495.88 samples/sec Loss 2.0572 
LearningRate 0.000281 Epoch: 20 Global Step: 433560 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:40:56,201-Speed 2512.56 samples/sec Loss 2.0784 LearningRate 0.000281 Epoch: 20 Global Step: 433570 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:41:04,408-Speed 2496.05 samples/sec Loss 2.0534 LearningRate 0.000281 Epoch: 20 Global Step: 433580 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:41:12,681-Speed 2494.33 samples/sec Loss 2.0428 LearningRate 0.000281 Epoch: 20 Global Step: 433590 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:41:20,940-Speed 2498.01 samples/sec Loss 2.0381 LearningRate 0.000281 Epoch: 20 Global Step: 433600 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:41:29,148-Speed 2495.50 samples/sec Loss 2.1008 LearningRate 0.000281 Epoch: 20 Global Step: 433610 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:41:41,623-Speed 2498.64 samples/sec Loss 2.0252 LearningRate 0.000281 Epoch: 20 Global Step: 433620 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:41:49,788-Speed 2517.77 samples/sec Loss 2.0242 LearningRate 0.000281 Epoch: 20 Global Step: 433630 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:41:58,034-Speed 2499.93 samples/sec Loss 2.0309 LearningRate 0.000281 Epoch: 20 Global Step: 433640 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:42:10,713-Speed 1615.45 samples/sec Loss 2.0195 LearningRate 0.000281 Epoch: 20 Global Step: 433650 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:42:18,954-Speed 2501.66 samples/sec Loss 2.0168 LearningRate 0.000281 Epoch: 20 Global Step: 433660 Fp16 Grad Scale: 65536 Required: 91 hours Training: 2022-07-09 17:42:27,151-Speed 2513.22 samples/sec Loss 2.0508 LearningRate 0.000281 Epoch: 20 Global Step: 433670 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:42:38,787-Speed 1768.07 samples/sec Loss 1.9995 
LearningRate 0.000281 Epoch: 20 Global Step: 433680 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:42:46,936-Speed 2513.36 samples/sec Loss 1.9904 LearningRate 0.000281 Epoch: 20 Global Step: 433690 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:42:59,452-Speed 1643.22 samples/sec Loss 2.0079 LearningRate 0.000281 Epoch: 20 Global Step: 433700 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:43:07,658-Speed 2497.39 samples/sec Loss 2.0546 LearningRate 0.000281 Epoch: 20 Global Step: 433710 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:43:16,948-Speed 2498.61 samples/sec Loss 2.0335 LearningRate 0.000281 Epoch: 20 Global Step: 433720 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:43:25,164-Speed 2493.02 samples/sec Loss 2.0350 LearningRate 0.000281 Epoch: 20 Global Step: 433730 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:43:33,422-Speed 2496.12 samples/sec Loss 2.0197 LearningRate 0.000281 Epoch: 20 Global Step: 433740 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:43:41,880-Speed 2513.51 samples/sec Loss 2.0775 LearningRate 0.000281 Epoch: 20 Global Step: 433750 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:43:53,345-Speed 1786.48 samples/sec Loss 2.0590 LearningRate 0.000281 Epoch: 20 Global Step: 433760 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:44:01,597-Speed 2498.99 samples/sec Loss 2.0581 LearningRate 0.000281 Epoch: 20 Global Step: 433770 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:44:14,543-Speed 1592.84 samples/sec Loss 2.0471 LearningRate 0.000281 Epoch: 20 Global Step: 433780 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:44:22,830-Speed 2501.21 samples/sec Loss 2.0193 LearningRate 0.000281 Epoch: 20 Global Step: 433790 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:44:42,402-Speed 1046.41 samples/sec Loss 2.0008 
LearningRate 0.000281 Epoch: 20 Global Step: 433800 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:44:51,052-Speed 2382.11 samples/sec Loss 2.0864 LearningRate 0.000281 Epoch: 20 Global Step: 433810 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:45:02,300-Speed 2502.21 samples/sec Loss 2.0430 LearningRate 0.000281 Epoch: 20 Global Step: 433820 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:45:10,499-Speed 2498.47 samples/sec Loss 2.0793 LearningRate 0.000281 Epoch: 20 Global Step: 433830 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:45:18,710-Speed 2494.55 samples/sec Loss 1.9807 LearningRate 0.000281 Epoch: 20 Global Step: 433840 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:45:26,916-Speed 2496.05 samples/sec Loss 2.0608 LearningRate 0.000281 Epoch: 20 Global Step: 433850 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:45:35,119-Speed 2497.35 samples/sec Loss 2.0455 LearningRate 0.000281 Epoch: 20 Global Step: 433860 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:45:43,273-Speed 2511.89 samples/sec Loss 2.0331 LearningRate 0.000281 Epoch: 20 Global Step: 433870 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:45:51,479-Speed 2496.41 samples/sec Loss 2.0815 LearningRate 0.000281 Epoch: 20 Global Step: 433880 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:45:59,685-Speed 2495.91 samples/sec Loss 2.0631 LearningRate 0.000281 Epoch: 20 Global Step: 433890 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:46:07,893-Speed 2495.53 samples/sec Loss 2.0183 LearningRate 0.000281 Epoch: 20 Global Step: 433900 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:46:16,113-Speed 2492.21 samples/sec Loss 2.0445 LearningRate 0.000281 Epoch: 20 Global Step: 433910 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:46:24,320-Speed 2495.61 samples/sec Loss 2.0878 
LearningRate 0.000281 Epoch: 20 Global Step: 433920 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:46:32,474-Speed 2512.06 samples/sec Loss 2.0822 LearningRate 0.000281 Epoch: 20 Global Step: 433930 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:46:40,681-Speed 2496.03 samples/sec Loss 2.0465 LearningRate 0.000281 Epoch: 20 Global Step: 433940 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:46:48,894-Speed 2494.06 samples/sec Loss 2.1295 LearningRate 0.000281 Epoch: 20 Global Step: 433950 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:46:57,101-Speed 2495.67 samples/sec Loss 2.0765 LearningRate 0.000281 Epoch: 20 Global Step: 433960 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:47:05,308-Speed 2495.79 samples/sec Loss 2.0425 LearningRate 0.000281 Epoch: 20 Global Step: 433970 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:47:13,513-Speed 2496.51 samples/sec Loss 2.0909 LearningRate 0.000281 Epoch: 20 Global Step: 433980 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:47:21,678-Speed 2508.67 samples/sec Loss 2.1205 LearningRate 0.000281 Epoch: 20 Global Step: 433990 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:47:29,888-Speed 2495.18 samples/sec Loss 2.0749 LearningRate 0.000281 Epoch: 20 Global Step: 434000 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:47:38,093-Speed 2496.17 samples/sec Loss 2.0731 LearningRate 0.000281 Epoch: 20 Global Step: 434010 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:47:46,303-Speed 2495.09 samples/sec Loss 2.1394 LearningRate 0.000281 Epoch: 20 Global Step: 434020 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:47:54,506-Speed 2497.06 samples/sec Loss 2.0558 LearningRate 0.000281 Epoch: 20 Global Step: 434030 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:48:02,711-Speed 2496.40 samples/sec Loss 2.0643 
LearningRate 0.000281 Epoch: 20 Global Step: 434040 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:48:10,866-Speed 2511.49 samples/sec Loss 2.0605 LearningRate 0.000281 Epoch: 20 Global Step: 434050 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:48:19,076-Speed 2494.97 samples/sec Loss 2.0473 LearningRate 0.000281 Epoch: 20 Global Step: 434060 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:48:27,279-Speed 2497.18 samples/sec Loss 2.0588 LearningRate 0.000281 Epoch: 20 Global Step: 434070 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:48:35,484-Speed 2496.45 samples/sec Loss 2.0692 LearningRate 0.000281 Epoch: 20 Global Step: 434080 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:48:43,701-Speed 2492.61 samples/sec Loss 2.0731 LearningRate 0.000281 Epoch: 20 Global Step: 434090 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:48:51,905-Speed 2496.96 samples/sec Loss 2.0688 LearningRate 0.000281 Epoch: 20 Global Step: 434100 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:49:00,057-Speed 2513.86 samples/sec Loss 2.1010 LearningRate 0.000281 Epoch: 20 Global Step: 434110 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:49:08,261-Speed 2496.35 samples/sec Loss 2.0351 LearningRate 0.000281 Epoch: 20 Global Step: 434120 Fp16 Grad Scale: 32768 Required: 91 hours Training: 2022-07-09 17:49:16,430-Speed 2507.82 samples/sec Loss 2.0459 LearningRate 0.000281 Epoch: 20 Global Step: 434130 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:49:24,638-Speed 2495.39 samples/sec Loss 2.0496 LearningRate 0.000281 Epoch: 20 Global Step: 434140 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:49:32,842-Speed 2496.89 samples/sec Loss 2.0334 LearningRate 0.000280 Epoch: 20 Global Step: 434150 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:49:41,050-Speed 2495.50 samples/sec Loss 2.0612 
LearningRate 0.000280 Epoch: 20 Global Step: 434160 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:49:49,206-Speed 2511.46 samples/sec Loss 2.0793 LearningRate 0.000280 Epoch: 20 Global Step: 434170 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:49:57,412-Speed 2495.98 samples/sec Loss 2.0222 LearningRate 0.000280 Epoch: 20 Global Step: 434180 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:50:05,624-Speed 2494.28 samples/sec Loss 2.0361 LearningRate 0.000280 Epoch: 20 Global Step: 434190 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:50:13,831-Speed 2495.59 samples/sec Loss 2.0281 LearningRate 0.000280 Epoch: 20 Global Step: 434200 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:50:22,044-Speed 2494.26 samples/sec Loss 2.0146 LearningRate 0.000280 Epoch: 20 Global Step: 434210 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:50:30,250-Speed 2495.96 samples/sec Loss 1.9786 LearningRate 0.000280 Epoch: 20 Global Step: 434220 Fp16 Grad Scale: 16384 Required: 91 hours Training: 2022-07-09 17:50:38,403-Speed 2512.43 samples/sec Loss 2.0481 LearningRate 0.000280 Epoch: 20 Global Step: 434230 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:50:46,607-Speed 2496.95 samples/sec Loss 2.0532 LearningRate 0.000280 Epoch: 20 Global Step: 434240 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:50:54,814-Speed 2496.13 samples/sec Loss 2.0506 LearningRate 0.000280 Epoch: 20 Global Step: 434250 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:51:03,043-Speed 2489.15 samples/sec Loss 2.0764 LearningRate 0.000280 Epoch: 20 Global Step: 434260 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:51:11,246-Speed 2497.20 samples/sec Loss 2.0373 LearningRate 0.000280 Epoch: 20 Global Step: 434270 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:51:19,447-Speed 2497.62 samples/sec Loss 2.0485 
LearningRate 0.000280 Epoch: 20 Global Step: 434280 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:51:27,598-Speed 2512.99 samples/sec Loss 2.0413 LearningRate 0.000280 Epoch: 20 Global Step: 434290 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:51:35,798-Speed 2498.08 samples/sec Loss 2.0562 LearningRate 0.000280 Epoch: 20 Global Step: 434300 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:51:44,010-Speed 2494.07 samples/sec Loss 2.0362 LearningRate 0.000280 Epoch: 20 Global Step: 434310 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:51:52,211-Speed 2497.38 samples/sec Loss 2.0496 LearningRate 0.000280 Epoch: 20 Global Step: 434320 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:52:00,415-Speed 2497.19 samples/sec Loss 2.0635 LearningRate 0.000280 Epoch: 20 Global Step: 434330 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:52:08,618-Speed 2497.18 samples/sec Loss 2.0139 LearningRate 0.000280 Epoch: 20 Global Step: 434340 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:52:16,778-Speed 2509.99 samples/sec Loss 2.0151 LearningRate 0.000280 Epoch: 20 Global Step: 434350 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:52:24,984-Speed 2496.35 samples/sec Loss 2.0841 LearningRate 0.000280 Epoch: 20 Global Step: 434360 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:52:33,189-Speed 2496.25 samples/sec Loss 2.0533 LearningRate 0.000280 Epoch: 20 Global Step: 434370 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:52:41,400-Speed 2494.69 samples/sec Loss 2.0406 LearningRate 0.000280 Epoch: 20 Global Step: 434380 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:52:49,603-Speed 2496.71 samples/sec Loss 2.0790 LearningRate 0.000280 Epoch: 20 Global Step: 434390 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:52:57,808-Speed 2496.76 samples/sec Loss 2.0970 
LearningRate 0.000280 Epoch: 20 Global Step: 434400 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:53:05,961-Speed 2512.60 samples/sec Loss 2.0438 LearningRate 0.000280 Epoch: 20 Global Step: 434410 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:53:14,164-Speed 2496.89 samples/sec Loss 2.0590 LearningRate 0.000280 Epoch: 20 Global Step: 434420 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:53:22,370-Speed 2496.03 samples/sec Loss 2.0017 LearningRate 0.000280 Epoch: 20 Global Step: 434430 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:53:30,575-Speed 2496.57 samples/sec Loss 2.0535 LearningRate 0.000280 Epoch: 20 Global Step: 434440 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:53:38,779-Speed 2496.58 samples/sec Loss 2.0755 LearningRate 0.000280 Epoch: 20 Global Step: 434450 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:53:46,985-Speed 2496.21 samples/sec Loss 2.0358 LearningRate 0.000280 Epoch: 20 Global Step: 434460 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:53:55,133-Speed 2514.27 samples/sec Loss 2.0336 LearningRate 0.000280 Epoch: 20 Global Step: 434470 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:54:03,337-Speed 2496.76 samples/sec Loss 2.0461 LearningRate 0.000280 Epoch: 20 Global Step: 434480 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:54:11,543-Speed 2496.30 samples/sec Loss 2.0407 LearningRate 0.000280 Epoch: 20 Global Step: 434490 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:54:19,751-Speed 2495.48 samples/sec Loss 2.0582 LearningRate 0.000280 Epoch: 20 Global Step: 434500 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:54:27,953-Speed 2497.69 samples/sec Loss 2.0243 LearningRate 0.000280 Epoch: 20 Global Step: 434510 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:54:36,170-Speed 2492.81 samples/sec Loss 2.0362 
LearningRate 0.000280 Epoch: 20 Global Step: 434520 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:54:44,325-Speed 2511.81 samples/sec Loss 1.9742 LearningRate 0.000280 Epoch: 20 Global Step: 434530 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:54:52,530-Speed 2496.29 samples/sec Loss 2.0488 LearningRate 0.000280 Epoch: 20 Global Step: 434540 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:55:00,741-Speed 2495.05 samples/sec Loss 2.0050 LearningRate 0.000280 Epoch: 20 Global Step: 434550 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:55:08,948-Speed 2495.67 samples/sec Loss 2.0525 LearningRate 0.000280 Epoch: 20 Global Step: 434560 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:55:17,160-Speed 2494.43 samples/sec Loss 2.0303 LearningRate 0.000280 Epoch: 20 Global Step: 434570 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:55:25,362-Speed 2497.04 samples/sec Loss 2.0268 LearningRate 0.000280 Epoch: 20 Global Step: 434580 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:55:33,516-Speed 2512.30 samples/sec Loss 2.0105 LearningRate 0.000280 Epoch: 20 Global Step: 434590 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:55:41,718-Speed 2497.30 samples/sec Loss 2.0574 LearningRate 0.000280 Epoch: 20 Global Step: 434600 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:55:49,918-Speed 2497.77 samples/sec Loss 2.0606 LearningRate 0.000280 Epoch: 20 Global Step: 434610 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:55:58,122-Speed 2496.84 samples/sec Loss 2.0665 LearningRate 0.000280 Epoch: 20 Global Step: 434620 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:56:06,320-Speed 2498.49 samples/sec Loss 2.0238 LearningRate 0.000280 Epoch: 20 Global Step: 434630 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:56:14,516-Speed 2499.03 samples/sec Loss 2.0572 
LearningRate 0.000280 Epoch: 20 Global Step: 434640 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:56:22,666-Speed 2513.14 samples/sec Loss 2.0467 LearningRate 0.000280 Epoch: 20 Global Step: 434650 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:56:30,871-Speed 2496.47 samples/sec Loss 2.0356 LearningRate 0.000280 Epoch: 20 Global Step: 434660 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:56:39,069-Speed 2498.55 samples/sec Loss 1.9815 LearningRate 0.000280 Epoch: 20 Global Step: 434670 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:56:47,284-Speed 2493.23 samples/sec Loss 2.0154 LearningRate 0.000280 Epoch: 20 Global Step: 434680 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:56:55,488-Speed 2497.00 samples/sec Loss 2.0077 LearningRate 0.000280 Epoch: 20 Global Step: 434690 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:57:03,691-Speed 2497.03 samples/sec Loss 2.0165 LearningRate 0.000280 Epoch: 20 Global Step: 434700 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:57:11,844-Speed 2513.01 samples/sec Loss 1.9977 LearningRate 0.000280 Epoch: 20 Global Step: 434710 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:57:20,044-Speed 2497.91 samples/sec Loss 1.9967 LearningRate 0.000280 Epoch: 20 Global Step: 434720 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:57:28,251-Speed 2495.98 samples/sec Loss 2.0058 LearningRate 0.000280 Epoch: 20 Global Step: 434730 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:57:36,478-Speed 2489.72 samples/sec Loss 2.0488 LearningRate 0.000280 Epoch: 20 Global Step: 434740 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:57:44,680-Speed 2497.34 samples/sec Loss 2.0562 LearningRate 0.000280 Epoch: 20 Global Step: 434750 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:57:52,884-Speed 2496.85 samples/sec Loss 2.0507 
LearningRate 0.000280 Epoch: 20 Global Step: 434760 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:58:01,029-Speed 2514.51 samples/sec Loss 2.0368 LearningRate 0.000280 Epoch: 20 Global Step: 434770 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:58:09,243-Speed 2493.88 samples/sec Loss 2.0276 LearningRate 0.000280 Epoch: 20 Global Step: 434780 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:58:17,444-Speed 2497.75 samples/sec Loss 2.0329 LearningRate 0.000280 Epoch: 20 Global Step: 434790 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:58:25,644-Speed 2497.98 samples/sec Loss 2.0285 LearningRate 0.000280 Epoch: 20 Global Step: 434800 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:58:33,883-Speed 2485.86 samples/sec Loss 2.0154 LearningRate 0.000280 Epoch: 20 Global Step: 434810 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:58:42,089-Speed 2496.39 samples/sec Loss 2.0256 LearningRate 0.000280 Epoch: 20 Global Step: 434820 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:58:50,236-Speed 2514.32 samples/sec Loss 2.0794 LearningRate 0.000280 Epoch: 20 Global Step: 434830 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:58:58,438-Speed 2497.22 samples/sec Loss 2.0920 LearningRate 0.000280 Epoch: 20 Global Step: 434840 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:59:06,635-Speed 2498.65 samples/sec Loss 2.0050 LearningRate 0.000279 Epoch: 20 Global Step: 434850 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:59:14,834-Speed 2498.19 samples/sec Loss 2.0474 LearningRate 0.000279 Epoch: 20 Global Step: 434860 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:59:23,035-Speed 2498.24 samples/sec Loss 2.0816 LearningRate 0.000279 Epoch: 20 Global Step: 434870 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:59:31,232-Speed 2498.84 samples/sec Loss 2.0475 
LearningRate 0.000279 Epoch: 20 Global Step: 434880 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:59:39,376-Speed 2515.12 samples/sec Loss 2.0547 LearningRate 0.000279 Epoch: 20 Global Step: 434890 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:59:47,577-Speed 2497.57 samples/sec Loss 2.0118 LearningRate 0.000279 Epoch: 20 Global Step: 434900 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 17:59:55,777-Speed 2497.91 samples/sec Loss 2.0524 LearningRate 0.000279 Epoch: 20 Global Step: 434910 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:00:03,976-Speed 2498.26 samples/sec Loss 2.0911 LearningRate 0.000279 Epoch: 20 Global Step: 434920 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:00:12,175-Speed 2498.30 samples/sec Loss 2.0107 LearningRate 0.000279 Epoch: 20 Global Step: 434930 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:00:20,374-Speed 2498.37 samples/sec Loss 2.0305 LearningRate 0.000279 Epoch: 20 Global Step: 434940 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:00:28,528-Speed 2511.81 samples/sec Loss 2.0978 LearningRate 0.000279 Epoch: 20 Global Step: 434950 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:00:36,726-Speed 2498.41 samples/sec Loss 2.0109 LearningRate 0.000279 Epoch: 20 Global Step: 434960 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:00:44,928-Speed 2497.40 samples/sec Loss 2.0158 LearningRate 0.000279 Epoch: 20 Global Step: 434970 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:00:53,127-Speed 2498.44 samples/sec Loss 2.0562 LearningRate 0.000279 Epoch: 20 Global Step: 434980 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:01:01,331-Speed 2496.70 samples/sec Loss 2.0533 LearningRate 0.000279 Epoch: 20 Global Step: 434990 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:01:09,530-Speed 2497.94 samples/sec Loss 2.0198 
LearningRate 0.000279 Epoch: 20 Global Step: 435000 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:01:17,694-Speed 2509.32 samples/sec Loss 2.0295 LearningRate 0.000279 Epoch: 20 Global Step: 435010 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:01:25,892-Speed 2498.52 samples/sec Loss 2.0974 LearningRate 0.000279 Epoch: 20 Global Step: 435020 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:01:34,099-Speed 2495.64 samples/sec Loss 2.0311 LearningRate 0.000279 Epoch: 20 Global Step: 435030 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:01:42,298-Speed 2498.27 samples/sec Loss 2.0247 LearningRate 0.000279 Epoch: 20 Global Step: 435040 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:01:50,506-Speed 2495.48 samples/sec Loss 2.0722 LearningRate 0.000279 Epoch: 20 Global Step: 435050 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:01:58,718-Speed 2494.37 samples/sec Loss 2.0781 LearningRate 0.000279 Epoch: 20 Global Step: 435060 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:02:06,868-Speed 2513.18 samples/sec Loss 2.0707 LearningRate 0.000279 Epoch: 20 Global Step: 435070 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:02:15,072-Speed 2496.81 samples/sec Loss 2.0389 LearningRate 0.000279 Epoch: 20 Global Step: 435080 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:02:23,274-Speed 2497.57 samples/sec Loss 2.0298 LearningRate 0.000279 Epoch: 20 Global Step: 435090 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:02:31,476-Speed 2497.59 samples/sec Loss 2.0316 LearningRate 0.000279 Epoch: 20 Global Step: 435100 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:02:39,673-Speed 2498.77 samples/sec Loss 2.0081 LearningRate 0.000279 Epoch: 20 Global Step: 435110 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:02:47,872-Speed 2498.55 samples/sec Loss 2.0367 
LearningRate 0.000279 Epoch: 20 Global Step: 435120 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:02:56,022-Speed 2513.33 samples/sec Loss 2.0413 LearningRate 0.000279 Epoch: 20 Global Step: 435130 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:03:04,230-Speed 2495.36 samples/sec Loss 2.0702 LearningRate 0.000279 Epoch: 20 Global Step: 435140 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:03:12,433-Speed 2497.24 samples/sec Loss 2.0576 LearningRate 0.000279 Epoch: 20 Global Step: 435150 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:03:20,639-Speed 2495.96 samples/sec Loss 2.0558 LearningRate 0.000279 Epoch: 20 Global Step: 435160 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:03:28,840-Speed 2497.83 samples/sec Loss 2.0441 LearningRate 0.000279 Epoch: 20 Global Step: 435170 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:03:37,048-Speed 2495.39 samples/sec Loss 2.0658 LearningRate 0.000279 Epoch: 20 Global Step: 435180 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:03:45,211-Speed 2509.37 samples/sec Loss 2.0499 LearningRate 0.000279 Epoch: 20 Global Step: 435190 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:03:53,410-Speed 2498.23 samples/sec Loss 2.0124 LearningRate 0.000279 Epoch: 20 Global Step: 435200 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:04:01,627-Speed 2492.80 samples/sec Loss 2.0530 LearningRate 0.000279 Epoch: 20 Global Step: 435210 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:04:09,826-Speed 2498.36 samples/sec Loss 2.0664 LearningRate 0.000279 Epoch: 20 Global Step: 435220 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:04:18,037-Speed 2494.46 samples/sec Loss 2.0998 LearningRate 0.000279 Epoch: 20 Global Step: 435230 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:04:26,239-Speed 2497.57 samples/sec Loss 2.0533 
LearningRate 0.000279 Epoch: 20 Global Step: 435240 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:04:34,401-Speed 2509.49 samples/sec Loss 2.0287 LearningRate 0.000279 Epoch: 20 Global Step: 435250 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:04:42,603-Speed 2497.32 samples/sec Loss 2.0323 LearningRate 0.000279 Epoch: 20 Global Step: 435260 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:04:50,804-Speed 2497.62 samples/sec Loss 2.0220 LearningRate 0.000279 Epoch: 20 Global Step: 435270 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:04:59,005-Speed 2497.78 samples/sec Loss 2.0673 LearningRate 0.000279 Epoch: 20 Global Step: 435280 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:05:07,206-Speed 2497.70 samples/sec Loss 2.0867 LearningRate 0.000279 Epoch: 20 Global Step: 435290 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:05:15,405-Speed 2498.18 samples/sec Loss 2.1135 LearningRate 0.000279 Epoch: 20 Global Step: 435300 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:05:23,547-Speed 2516.48 samples/sec Loss 2.0496 LearningRate 0.000279 Epoch: 20 Global Step: 435310 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:05:31,748-Speed 2497.90 samples/sec Loss 2.0476 LearningRate 0.000279 Epoch: 20 Global Step: 435320 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:05:39,945-Speed 2498.89 samples/sec Loss 2.0631 LearningRate 0.000279 Epoch: 20 Global Step: 435330 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:05:48,152-Speed 2496.06 samples/sec Loss 2.0262 LearningRate 0.000279 Epoch: 20 Global Step: 435340 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:05:56,354-Speed 2497.36 samples/sec Loss 2.0704 LearningRate 0.000279 Epoch: 20 Global Step: 435350 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:06:04,566-Speed 2494.11 samples/sec Loss 2.0616 
LearningRate 0.000279 Epoch: 20 Global Step: 435360 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:06:12,723-Speed 2511.98 samples/sec Loss 2.0470 LearningRate 0.000279 Epoch: 20 Global Step: 435370 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:06:21,025-Speed 2467.34 samples/sec Loss 2.0623 LearningRate 0.000279 Epoch: 20 Global Step: 435380 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:06:29,225-Speed 2497.91 samples/sec Loss 2.0861 LearningRate 0.000279 Epoch: 20 Global Step: 435390 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:06:37,428-Speed 2496.91 samples/sec Loss 2.0611 LearningRate 0.000279 Epoch: 20 Global Step: 435400 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:06:45,626-Speed 2498.60 samples/sec Loss 2.0280 LearningRate 0.000279 Epoch: 20 Global Step: 435410 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:06:53,824-Speed 2498.53 samples/sec Loss 2.0911 LearningRate 0.000279 Epoch: 20 Global Step: 435420 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:07:01,973-Speed 2513.78 samples/sec Loss 2.0758 LearningRate 0.000279 Epoch: 20 Global Step: 435430 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:07:10,172-Speed 2498.23 samples/sec Loss 2.0903 LearningRate 0.000279 Epoch: 20 Global Step: 435440 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:07:18,370-Speed 2498.76 samples/sec Loss 2.0290 LearningRate 0.000279 Epoch: 20 Global Step: 435450 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:07:26,582-Speed 2494.35 samples/sec Loss 2.0480 LearningRate 0.000279 Epoch: 20 Global Step: 435460 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:07:34,785-Speed 2497.32 samples/sec Loss 2.0931 LearningRate 0.000279 Epoch: 20 Global Step: 435470 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:07:42,980-Speed 2499.33 samples/sec Loss 2.0360 
LearningRate 0.000279 Epoch: 20 Global Step: 435480 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:07:51,127-Speed 2514.26 samples/sec Loss 2.0211 LearningRate 0.000279 Epoch: 20 Global Step: 435490 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:07:59,329-Speed 2497.40 samples/sec Loss 2.0336 LearningRate 0.000279 Epoch: 20 Global Step: 435500 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:08:07,529-Speed 2498.12 samples/sec Loss 2.0319 LearningRate 0.000279 Epoch: 20 Global Step: 435510 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:08:15,731-Speed 2497.32 samples/sec Loss 2.0476 LearningRate 0.000279 Epoch: 20 Global Step: 435520 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:08:23,933-Speed 2497.42 samples/sec Loss 2.0458 LearningRate 0.000279 Epoch: 20 Global Step: 435530 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:08:34,619-Speed 1916.79 samples/sec Loss 2.0932 LearningRate 0.000279 Epoch: 21 Global Step: 435540 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:08:42,763-Speed 2515.10 samples/sec Loss 2.0897 LearningRate 0.000279 Epoch: 21 Global Step: 435550 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:08:50,962-Speed 2498.48 samples/sec Loss 2.0702 LearningRate 0.000278 Epoch: 21 Global Step: 435560 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:08:59,176-Speed 2493.47 samples/sec Loss 2.0549 LearningRate 0.000278 Epoch: 21 Global Step: 435570 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:09:07,377-Speed 2497.68 samples/sec Loss 2.1423 LearningRate 0.000278 Epoch: 21 Global Step: 435580 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:09:15,576-Speed 2498.35 samples/sec Loss 2.0484 LearningRate 0.000278 Epoch: 21 Global Step: 435590 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:09:23,778-Speed 2497.38 samples/sec Loss 2.0953 
LearningRate 0.000278 Epoch: 21 Global Step: 435600 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:09:31,932-Speed 2511.97 samples/sec Loss 2.1170 LearningRate 0.000278 Epoch: 21 Global Step: 435610 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:09:40,136-Speed 2496.83 samples/sec Loss 2.0646 LearningRate 0.000278 Epoch: 21 Global Step: 435620 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:09:48,339-Speed 2497.06 samples/sec Loss 2.0076 LearningRate 0.000278 Epoch: 21 Global Step: 435630 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:09:56,540-Speed 2497.56 samples/sec Loss 2.0970 LearningRate 0.000278 Epoch: 21 Global Step: 435640 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:10:04,745-Speed 2496.57 samples/sec Loss 2.0490 LearningRate 0.000278 Epoch: 21 Global Step: 435650 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:10:12,943-Speed 2498.38 samples/sec Loss 2.0419 LearningRate 0.000278 Epoch: 21 Global Step: 435660 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:10:21,096-Speed 2512.30 samples/sec Loss 2.0290 LearningRate 0.000278 Epoch: 21 Global Step: 435670 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:10:29,297-Speed 2497.65 samples/sec Loss 2.0149 LearningRate 0.000278 Epoch: 21 Global Step: 435680 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:10:37,496-Speed 2498.36 samples/sec Loss 2.0345 LearningRate 0.000278 Epoch: 21 Global Step: 435690 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:10:45,700-Speed 2496.72 samples/sec Loss 2.0495 LearningRate 0.000278 Epoch: 21 Global Step: 435700 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:10:53,896-Speed 2499.03 samples/sec Loss 2.0282 LearningRate 0.000278 Epoch: 21 Global Step: 435710 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:11:02,096-Speed 2497.87 samples/sec Loss 2.0186 
LearningRate 0.000278 Epoch: 21 Global Step: 435720 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:11:10,244-Speed 2514.10 samples/sec Loss 2.0072 LearningRate 0.000278 Epoch: 21 Global Step: 435730 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:11:18,443-Speed 2498.04 samples/sec Loss 2.0586 LearningRate 0.000278 Epoch: 21 Global Step: 435740 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:11:26,646-Speed 2496.93 samples/sec Loss 2.0090 LearningRate 0.000278 Epoch: 21 Global Step: 435750 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:11:34,847-Speed 2498.03 samples/sec Loss 2.0165 LearningRate 0.000278 Epoch: 21 Global Step: 435760 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:11:43,049-Speed 2497.59 samples/sec Loss 2.0705 LearningRate 0.000278 Epoch: 21 Global Step: 435770 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:11:51,253-Speed 2496.89 samples/sec Loss 2.0131 LearningRate 0.000278 Epoch: 21 Global Step: 435780 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:11:59,405-Speed 2512.67 samples/sec Loss 2.0638 LearningRate 0.000278 Epoch: 21 Global Step: 435790 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:12:07,607-Speed 2497.23 samples/sec Loss 2.0275 LearningRate 0.000278 Epoch: 21 Global Step: 435800 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:12:15,813-Speed 2496.36 samples/sec Loss 2.0479 LearningRate 0.000278 Epoch: 21 Global Step: 435810 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:12:23,976-Speed 2509.37 samples/sec Loss 2.0712 LearningRate 0.000278 Epoch: 21 Global Step: 435820 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:12:32,180-Speed 2496.79 samples/sec Loss 2.0145 LearningRate 0.000278 Epoch: 21 Global Step: 435830 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:12:40,381-Speed 2497.60 samples/sec Loss 2.0322 
LearningRate 0.000278 Epoch: 21 Global Step: 435840 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:12:48,543-Speed 2509.60 samples/sec Loss 2.0252 LearningRate 0.000278 Epoch: 21 Global Step: 435850 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:12:56,751-Speed 2495.51 samples/sec Loss 1.9950 LearningRate 0.000278 Epoch: 21 Global Step: 435860 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:13:04,948-Speed 2498.55 samples/sec Loss 2.0237 LearningRate 0.000278 Epoch: 21 Global Step: 435870 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:13:13,151-Speed 2497.08 samples/sec Loss 2.0401 LearningRate 0.000278 Epoch: 21 Global Step: 435880 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:13:21,367-Speed 2493.32 samples/sec Loss 2.0124 LearningRate 0.000278 Epoch: 21 Global Step: 435890 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:13:29,569-Speed 2497.26 samples/sec Loss 2.0098 LearningRate 0.000278 Epoch: 21 Global Step: 435900 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:13:37,715-Speed 2514.45 samples/sec Loss 2.0309 LearningRate 0.000278 Epoch: 21 Global Step: 435910 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:13:45,918-Speed 2497.13 samples/sec Loss 2.0556 LearningRate 0.000278 Epoch: 21 Global Step: 435920 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:13:54,121-Speed 2497.26 samples/sec Loss 2.0539 LearningRate 0.000278 Epoch: 21 Global Step: 435930 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:14:02,324-Speed 2497.23 samples/sec Loss 2.0389 LearningRate 0.000278 Epoch: 21 Global Step: 435940 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:14:10,530-Speed 2495.95 samples/sec Loss 2.0208 LearningRate 0.000278 Epoch: 21 Global Step: 435950 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:14:18,733-Speed 2497.27 samples/sec Loss 2.0653 
LearningRate 0.000278 Epoch: 21 Global Step: 435960 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:14:26,883-Speed 2513.24 samples/sec Loss 2.0342 LearningRate 0.000278 Epoch: 21 Global Step: 435970 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:14:35,093-Speed 2495.11 samples/sec Loss 2.0317 LearningRate 0.000278 Epoch: 21 Global Step: 435980 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:14:43,343-Speed 2482.65 samples/sec Loss 2.0334 LearningRate 0.000278 Epoch: 21 Global Step: 435990 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:14:51,548-Speed 2496.63 samples/sec Loss 2.0463 LearningRate 0.000278 Epoch: 21 Global Step: 436000 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:14:59,750-Speed 2497.50 samples/sec Loss 2.0929 LearningRate 0.000278 Epoch: 21 Global Step: 436010 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:15:07,951-Speed 2497.48 samples/sec Loss 1.9833 LearningRate 0.000278 Epoch: 21 Global Step: 436020 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:15:16,099-Speed 2513.85 samples/sec Loss 2.0435 LearningRate 0.000278 Epoch: 21 Global Step: 436030 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:15:24,303-Speed 2496.79 samples/sec Loss 2.0221 LearningRate 0.000278 Epoch: 21 Global Step: 436040 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:15:32,504-Speed 2497.74 samples/sec Loss 2.0320 LearningRate 0.000278 Epoch: 21 Global Step: 436050 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:15:40,705-Speed 2497.73 samples/sec Loss 2.0219 LearningRate 0.000278 Epoch: 21 Global Step: 436060 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:15:48,901-Speed 2499.19 samples/sec Loss 2.0092 LearningRate 0.000278 Epoch: 21 Global Step: 436070 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:15:57,106-Speed 2496.48 samples/sec Loss 2.0469 
LearningRate 0.000278 Epoch: 21 Global Step: 436080 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:16:05,248-Speed 2516.05 samples/sec Loss 2.0459 LearningRate 0.000278 Epoch: 21 Global Step: 436090 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:16:13,462-Speed 2493.77 samples/sec Loss 2.0179 LearningRate 0.000278 Epoch: 21 Global Step: 436100 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:16:21,661-Speed 2497.97 samples/sec Loss 2.0492 LearningRate 0.000278 Epoch: 21 Global Step: 436110 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:16:29,862-Speed 2497.75 samples/sec Loss 2.0681 LearningRate 0.000278 Epoch: 21 Global Step: 436120 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:16:38,064-Speed 2497.58 samples/sec Loss 2.0307 LearningRate 0.000278 Epoch: 21 Global Step: 436130 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:16:46,277-Speed 2493.98 samples/sec Loss 2.0336 LearningRate 0.000278 Epoch: 21 Global Step: 436140 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:16:54,424-Speed 2514.13 samples/sec Loss 2.0366 LearningRate 0.000278 Epoch: 21 Global Step: 436150 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:17:02,629-Speed 2496.73 samples/sec Loss 2.0631 LearningRate 0.000278 Epoch: 21 Global Step: 436160 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:17:10,835-Speed 2495.97 samples/sec Loss 2.0060 LearningRate 0.000278 Epoch: 21 Global Step: 436170 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:17:19,046-Speed 2494.51 samples/sec Loss 2.0462 LearningRate 0.000278 Epoch: 21 Global Step: 436180 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:17:27,262-Speed 2493.12 samples/sec Loss 2.0243 LearningRate 0.000278 Epoch: 21 Global Step: 436190 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:17:35,465-Speed 2497.06 samples/sec Loss 2.0170 
LearningRate 0.000278 Epoch: 21 Global Step: 436200 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:17:43,617-Speed 2512.54 samples/sec Loss 2.0161 LearningRate 0.000278 Epoch: 21 Global Step: 436210 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:17:51,818-Speed 2497.69 samples/sec Loss 2.0435 LearningRate 0.000278 Epoch: 21 Global Step: 436220 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:18:00,015-Speed 2498.94 samples/sec Loss 2.0469 LearningRate 0.000278 Epoch: 21 Global Step: 436230 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:18:08,223-Speed 2495.64 samples/sec Loss 2.0187 LearningRate 0.000278 Epoch: 21 Global Step: 436240 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:18:16,431-Speed 2495.47 samples/sec Loss 2.0285 LearningRate 0.000278 Epoch: 21 Global Step: 436250 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:18:24,628-Speed 2499.00 samples/sec Loss 1.9861 LearningRate 0.000278 Epoch: 21 Global Step: 436260 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:18:32,776-Speed 2513.73 samples/sec Loss 2.0446 LearningRate 0.000277 Epoch: 21 Global Step: 436270 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:18:40,980-Speed 2496.86 samples/sec Loss 2.0372 LearningRate 0.000277 Epoch: 21 Global Step: 436280 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:18:49,175-Speed 2499.58 samples/sec Loss 2.0467 LearningRate 0.000277 Epoch: 21 Global Step: 436290 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:18:57,376-Speed 2497.68 samples/sec Loss 2.0328 LearningRate 0.000277 Epoch: 21 Global Step: 436300 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:19:05,577-Speed 2497.69 samples/sec Loss 2.0322 LearningRate 0.000277 Epoch: 21 Global Step: 436310 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:19:13,780-Speed 2497.17 samples/sec Loss 2.0161 
LearningRate 0.000277 Epoch: 21 Global Step: 436320 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:19:21,932-Speed 2512.46 samples/sec Loss 2.0026 LearningRate 0.000277 Epoch: 21 Global Step: 436330 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:19:30,134-Speed 2497.62 samples/sec Loss 1.9835 LearningRate 0.000277 Epoch: 21 Global Step: 436340 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:19:38,336-Speed 2497.30 samples/sec Loss 2.0583 LearningRate 0.000277 Epoch: 21 Global Step: 436350 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:19:46,541-Speed 2496.54 samples/sec Loss 2.0702 LearningRate 0.000277 Epoch: 21 Global Step: 436360 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:19:54,742-Speed 2497.61 samples/sec Loss 2.0427 LearningRate 0.000277 Epoch: 21 Global Step: 436370 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:20:02,942-Speed 2497.70 samples/sec Loss 2.0586 LearningRate 0.000277 Epoch: 21 Global Step: 436380 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:20:11,099-Speed 2511.39 samples/sec Loss 2.0303 LearningRate 0.000277 Epoch: 21 Global Step: 436390 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:20:19,296-Speed 2498.94 samples/sec Loss 2.0708 LearningRate 0.000277 Epoch: 21 Global Step: 436400 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:20:27,496-Speed 2497.74 samples/sec Loss 2.0719 LearningRate 0.000277 Epoch: 21 Global Step: 436410 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:20:35,698-Speed 2497.34 samples/sec Loss 2.0545 LearningRate 0.000277 Epoch: 21 Global Step: 436420 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:20:43,902-Speed 2496.74 samples/sec Loss 2.0653 LearningRate 0.000277 Epoch: 21 Global Step: 436430 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:20:52,108-Speed 2496.02 samples/sec Loss 2.0445 
LearningRate 0.000277 Epoch: 21 Global Step: 436440 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:21:00,261-Speed 2512.42 samples/sec Loss 2.0408 LearningRate 0.000277 Epoch: 21 Global Step: 436450 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:21:08,467-Speed 2496.47 samples/sec Loss 2.1019 LearningRate 0.000277 Epoch: 21 Global Step: 436460 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:21:16,663-Speed 2499.25 samples/sec Loss 2.0586 LearningRate 0.000277 Epoch: 21 Global Step: 436470 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:21:24,865-Speed 2497.34 samples/sec Loss 2.0370 LearningRate 0.000277 Epoch: 21 Global Step: 436480 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:21:33,065-Speed 2497.87 samples/sec Loss 2.0818 LearningRate 0.000277 Epoch: 21 Global Step: 436490 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:21:41,278-Speed 2494.13 samples/sec Loss 2.0439 LearningRate 0.000277 Epoch: 21 Global Step: 436500 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:21:49,432-Speed 2513.62 samples/sec Loss 2.0662 LearningRate 0.000277 Epoch: 21 Global Step: 436510 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:21:57,634-Speed 2497.12 samples/sec Loss 2.0040 LearningRate 0.000277 Epoch: 21 Global Step: 436520 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:22:05,835-Speed 2497.80 samples/sec Loss 2.0255 LearningRate 0.000277 Epoch: 21 Global Step: 436530 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:22:14,033-Speed 2498.89 samples/sec Loss 2.0684 LearningRate 0.000277 Epoch: 21 Global Step: 436540 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:22:22,237-Speed 2496.69 samples/sec Loss 2.0276 LearningRate 0.000277 Epoch: 21 Global Step: 436550 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:22:30,434-Speed 2498.84 samples/sec Loss 1.9580 
LearningRate 0.000277 Epoch: 21 Global Step: 436560 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:22:38,583-Speed 2513.65 samples/sec Loss 2.0091 LearningRate 0.000277 Epoch: 21 Global Step: 436570 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:22:46,782-Speed 2498.07 samples/sec Loss 2.0331 LearningRate 0.000277 Epoch: 21 Global Step: 436580 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:22:54,983-Speed 2498.03 samples/sec Loss 1.9922 LearningRate 0.000277 Epoch: 21 Global Step: 436590 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:23:03,182-Speed 2498.18 samples/sec Loss 2.0460 LearningRate 0.000277 Epoch: 21 Global Step: 436600 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:23:11,385-Speed 2496.84 samples/sec Loss 2.0178 LearningRate 0.000277 Epoch: 21 Global Step: 436610 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:23:19,589-Speed 2496.91 samples/sec Loss 2.0413 LearningRate 0.000277 Epoch: 21 Global Step: 436620 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:23:27,739-Speed 2513.32 samples/sec Loss 2.0591 LearningRate 0.000277 Epoch: 21 Global Step: 436630 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:23:35,936-Speed 2498.90 samples/sec Loss 2.0047 LearningRate 0.000277 Epoch: 21 Global Step: 436640 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:23:44,140-Speed 2496.57 samples/sec Loss 2.0412 LearningRate 0.000277 Epoch: 21 Global Step: 436650 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:23:52,352-Speed 2494.51 samples/sec Loss 2.0594 LearningRate 0.000277 Epoch: 21 Global Step: 436660 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:24:00,550-Speed 2498.41 samples/sec Loss 2.0626 LearningRate 0.000277 Epoch: 21 Global Step: 436670 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:24:08,750-Speed 2497.92 samples/sec Loss 2.0366 
LearningRate 0.000277 Epoch: 21 Global Step: 436680 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:24:16,893-Speed 2515.61 samples/sec Loss 2.0318 LearningRate 0.000277 Epoch: 21 Global Step: 436690 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:24:25,092-Speed 2498.39 samples/sec Loss 2.0786 LearningRate 0.000277 Epoch: 21 Global Step: 436700 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:24:33,291-Speed 2498.70 samples/sec Loss 2.0552 LearningRate 0.000277 Epoch: 21 Global Step: 436710 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:24:41,495-Speed 2496.69 samples/sec Loss 2.0699 LearningRate 0.000277 Epoch: 21 Global Step: 436720 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:24:49,696-Speed 2497.99 samples/sec Loss 2.0135 LearningRate 0.000277 Epoch: 21 Global Step: 436730 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:24:57,904-Speed 2495.51 samples/sec Loss 2.0953 LearningRate 0.000277 Epoch: 21 Global Step: 436740 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:25:06,045-Speed 2515.90 samples/sec Loss 2.0687 LearningRate 0.000277 Epoch: 21 Global Step: 436750 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:25:14,245-Speed 2498.12 samples/sec Loss 2.0718 LearningRate 0.000277 Epoch: 21 Global Step: 436760 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:25:22,448-Speed 2496.96 samples/sec Loss 2.0762 LearningRate 0.000277 Epoch: 21 Global Step: 436770 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:25:30,650-Speed 2497.40 samples/sec Loss 2.0708 LearningRate 0.000277 Epoch: 21 Global Step: 436780 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:25:38,851-Speed 2497.58 samples/sec Loss 2.0969 LearningRate 0.000277 Epoch: 21 Global Step: 436790 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:25:47,050-Speed 2498.27 samples/sec Loss 2.0578 
LearningRate 0.000277 Epoch: 21 Global Step: 436800 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:25:55,199-Speed 2513.44 samples/sec Loss 2.0640 LearningRate 0.000277 Epoch: 21 Global Step: 436810 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:26:03,401-Speed 2497.60 samples/sec Loss 2.0521 LearningRate 0.000277 Epoch: 21 Global Step: 436820 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:26:11,611-Speed 2494.88 samples/sec Loss 2.0584 LearningRate 0.000277 Epoch: 21 Global Step: 436830 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:26:19,816-Speed 2497.09 samples/sec Loss 1.9886 LearningRate 0.000277 Epoch: 21 Global Step: 436840 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:26:28,015-Speed 2498.45 samples/sec Loss 2.0397 LearningRate 0.000277 Epoch: 21 Global Step: 436850 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:26:36,214-Speed 2498.12 samples/sec Loss 2.0968 LearningRate 0.000277 Epoch: 21 Global Step: 436860 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:26:44,360-Speed 2514.67 samples/sec Loss 2.0249 LearningRate 0.000277 Epoch: 21 Global Step: 436870 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:26:52,557-Speed 2498.97 samples/sec Loss 2.0827 LearningRate 0.000277 Epoch: 21 Global Step: 436880 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:27:00,757-Speed 2497.77 samples/sec Loss 2.0396 LearningRate 0.000277 Epoch: 21 Global Step: 436890 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:27:08,959-Speed 2497.55 samples/sec Loss 1.9935 LearningRate 0.000277 Epoch: 21 Global Step: 436900 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:27:17,158-Speed 2498.11 samples/sec Loss 2.0306 LearningRate 0.000277 Epoch: 21 Global Step: 436910 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:27:25,371-Speed 2494.50 samples/sec Loss 2.0248 
LearningRate 0.000277 Epoch: 21 Global Step: 436920 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:27:33,514-Speed 2515.33 samples/sec Loss 2.0346 LearningRate 0.000277 Epoch: 21 Global Step: 436930 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:27:41,719-Speed 2496.47 samples/sec Loss 2.0429 LearningRate 0.000277 Epoch: 21 Global Step: 436940 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:27:49,923-Speed 2496.71 samples/sec Loss 2.0385 LearningRate 0.000277 Epoch: 21 Global Step: 436950 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:27:58,121-Speed 2499.04 samples/sec Loss 2.0532 LearningRate 0.000277 Epoch: 21 Global Step: 436960 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:28:06,322-Speed 2498.01 samples/sec Loss 2.0004 LearningRate 0.000277 Epoch: 21 Global Step: 436970 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:28:14,534-Speed 2494.26 samples/sec Loss 2.0246 LearningRate 0.000276 Epoch: 21 Global Step: 436980 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:28:22,687-Speed 2512.47 samples/sec Loss 2.0320 LearningRate 0.000276 Epoch: 21 Global Step: 436990 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:28:30,884-Speed 2498.86 samples/sec Loss 2.0493 LearningRate 0.000276 Epoch: 21 Global Step: 437000 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:28:39,084-Speed 2498.22 samples/sec Loss 2.0176 LearningRate 0.000276 Epoch: 21 Global Step: 437010 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:28:47,285-Speed 2497.36 samples/sec Loss 2.0365 LearningRate 0.000276 Epoch: 21 Global Step: 437020 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:28:55,486-Speed 2497.80 samples/sec Loss 1.9985 LearningRate 0.000276 Epoch: 21 Global Step: 437030 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:29:03,691-Speed 2496.42 samples/sec Loss 2.0668 
LearningRate 0.000276 Epoch: 21 Global Step: 437040 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:29:11,842-Speed 2513.12 samples/sec Loss 2.0252 LearningRate 0.000276 Epoch: 21 Global Step: 437050 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:29:20,045-Speed 2497.15 samples/sec Loss 2.0238 LearningRate 0.000276 Epoch: 21 Global Step: 437060 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:29:28,253-Speed 2495.54 samples/sec Loss 2.0371 LearningRate 0.000276 Epoch: 21 Global Step: 437070 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:29:36,481-Speed 2489.67 samples/sec Loss 2.1091 LearningRate 0.000276 Epoch: 21 Global Step: 437080 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:29:44,680-Speed 2498.40 samples/sec Loss 2.0737 LearningRate 0.000276 Epoch: 21 Global Step: 437090 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:29:52,898-Speed 2492.36 samples/sec Loss 2.0478 LearningRate 0.000276 Epoch: 21 Global Step: 437100 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:30:01,047-Speed 2513.62 samples/sec Loss 2.0446 LearningRate 0.000276 Epoch: 21 Global Step: 437110 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:30:09,248-Speed 2497.47 samples/sec Loss 2.0241 LearningRate 0.000276 Epoch: 21 Global Step: 437120 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:30:17,451-Speed 2497.23 samples/sec Loss 2.0195 LearningRate 0.000276 Epoch: 21 Global Step: 437130 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:30:25,649-Speed 2498.51 samples/sec Loss 2.0203 LearningRate 0.000276 Epoch: 21 Global Step: 437140 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:30:33,857-Speed 2495.18 samples/sec Loss 1.9898 LearningRate 0.000276 Epoch: 21 Global Step: 437150 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:30:42,059-Speed 2497.46 samples/sec Loss 2.0413 
LearningRate 0.000276 Epoch: 21 Global Step: 437160 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:30:50,205-Speed 2514.62 samples/sec Loss 2.0607 LearningRate 0.000276 Epoch: 21 Global Step: 437170 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:30:58,407-Speed 2497.44 samples/sec Loss 1.9997 LearningRate 0.000276 Epoch: 21 Global Step: 437180 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:31:06,602-Speed 2499.30 samples/sec Loss 2.0174 LearningRate 0.000276 Epoch: 21 Global Step: 437190 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:31:14,800-Speed 2498.81 samples/sec Loss 2.0690 LearningRate 0.000276 Epoch: 21 Global Step: 437200 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:31:23,011-Speed 2494.47 samples/sec Loss 2.0631 LearningRate 0.000276 Epoch: 21 Global Step: 437210 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:31:31,220-Speed 2495.13 samples/sec Loss 2.0658 LearningRate 0.000276 Epoch: 21 Global Step: 437220 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:31:39,370-Speed 2513.47 samples/sec Loss 2.0485 LearningRate 0.000276 Epoch: 21 Global Step: 437230 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:31:47,600-Speed 2489.13 samples/sec Loss 2.0518 LearningRate 0.000276 Epoch: 21 Global Step: 437240 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:31:55,805-Speed 2496.43 samples/sec Loss 2.0498 LearningRate 0.000276 Epoch: 21 Global Step: 437250 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:32:04,008-Speed 2496.79 samples/sec Loss 2.0163 LearningRate 0.000276 Epoch: 21 Global Step: 437260 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:32:12,205-Speed 2498.90 samples/sec Loss 2.0115 LearningRate 0.000276 Epoch: 21 Global Step: 437270 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:32:20,409-Speed 2496.80 samples/sec Loss 2.0622 
LearningRate 0.000276 Epoch: 21 Global Step: 437280 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:32:28,558-Speed 2513.53 samples/sec Loss 2.0321 LearningRate 0.000276 Epoch: 21 Global Step: 437290 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:32:36,758-Speed 2498.02 samples/sec Loss 2.0564 LearningRate 0.000276 Epoch: 21 Global Step: 437300 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:32:44,957-Speed 2498.11 samples/sec Loss 2.0282 LearningRate 0.000276 Epoch: 21 Global Step: 437310 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:32:53,157-Speed 2498.10 samples/sec Loss 2.0374 LearningRate 0.000276 Epoch: 21 Global Step: 437320 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:33:01,355-Speed 2498.45 samples/sec Loss 2.0267 LearningRate 0.000276 Epoch: 21 Global Step: 437330 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:33:09,558-Speed 2496.93 samples/sec Loss 2.0734 LearningRate 0.000276 Epoch: 21 Global Step: 437340 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:33:17,704-Speed 2514.62 samples/sec Loss 2.0629 LearningRate 0.000276 Epoch: 21 Global Step: 437350 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:33:25,903-Speed 2498.25 samples/sec Loss 2.0673 LearningRate 0.000276 Epoch: 21 Global Step: 437360 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:33:34,103-Speed 2497.71 samples/sec Loss 2.0786 LearningRate 0.000276 Epoch: 21 Global Step: 437370 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:33:42,304-Speed 2497.77 samples/sec Loss 2.0413 LearningRate 0.000276 Epoch: 21 Global Step: 437380 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:33:50,501-Speed 2499.13 samples/sec Loss 2.0469 LearningRate 0.000276 Epoch: 21 Global Step: 437390 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:33:58,712-Speed 2494.57 samples/sec Loss 2.0477 
LearningRate 0.000276 Epoch: 21 Global Step: 437400 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:34:06,858-Speed 2514.59 samples/sec Loss 2.0340 LearningRate 0.000276 Epoch: 21 Global Step: 437410 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:34:15,061-Speed 2497.08 samples/sec Loss 2.0510 LearningRate 0.000276 Epoch: 21 Global Step: 437420 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:34:23,260-Speed 2498.35 samples/sec Loss 2.0448 LearningRate 0.000276 Epoch: 21 Global Step: 437430 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:34:31,474-Speed 2493.74 samples/sec Loss 2.0880 LearningRate 0.000276 Epoch: 21 Global Step: 437440 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:34:39,677-Speed 2496.87 samples/sec Loss 2.0602 LearningRate 0.000276 Epoch: 21 Global Step: 437450 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:34:47,876-Speed 2498.39 samples/sec Loss 2.0081 LearningRate 0.000276 Epoch: 21 Global Step: 437460 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:34:56,025-Speed 2513.67 samples/sec Loss 2.0112 LearningRate 0.000276 Epoch: 21 Global Step: 437470 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:35:04,228-Speed 2496.86 samples/sec Loss 2.0514 LearningRate 0.000276 Epoch: 21 Global Step: 437480 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:35:12,428-Speed 2497.96 samples/sec Loss 2.0580 LearningRate 0.000276 Epoch: 21 Global Step: 437490 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:35:20,631-Speed 2497.05 samples/sec Loss 2.0527 LearningRate 0.000276 Epoch: 21 Global Step: 437500 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:35:28,833-Speed 2497.52 samples/sec Loss 2.0107 LearningRate 0.000276 Epoch: 21 Global Step: 437510 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:35:37,035-Speed 2497.25 samples/sec Loss 2.0961 
LearningRate 0.000276 Epoch: 21 Global Step: 437520 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:35:45,181-Speed 2514.67 samples/sec Loss 2.0265 LearningRate 0.000276 Epoch: 21 Global Step: 437530 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:35:53,377-Speed 2499.25 samples/sec Loss 2.0172 LearningRate 0.000276 Epoch: 21 Global Step: 437540 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:36:01,588-Speed 2494.46 samples/sec Loss 2.0279 LearningRate 0.000276 Epoch: 21 Global Step: 437550 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:36:09,789-Speed 2497.73 samples/sec Loss 2.0486 LearningRate 0.000276 Epoch: 21 Global Step: 437560 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:36:18,004-Speed 2493.66 samples/sec Loss 2.0374 LearningRate 0.000276 Epoch: 21 Global Step: 437570 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:36:26,203-Speed 2498.24 samples/sec Loss 2.0562 LearningRate 0.000276 Epoch: 21 Global Step: 437580 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:36:34,352-Speed 2514.15 samples/sec Loss 2.0479 LearningRate 0.000276 Epoch: 21 Global Step: 437590 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:36:42,556-Speed 2496.77 samples/sec Loss 2.0028 LearningRate 0.000276 Epoch: 21 Global Step: 437600 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:36:50,756-Speed 2497.92 samples/sec Loss 2.0195 LearningRate 0.000276 Epoch: 21 Global Step: 437610 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:36:58,954-Speed 2498.53 samples/sec Loss 2.0341 LearningRate 0.000276 Epoch: 21 Global Step: 437620 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:37:07,164-Speed 2494.83 samples/sec Loss 2.0017 LearningRate 0.000276 Epoch: 21 Global Step: 437630 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:37:15,373-Speed 2495.43 samples/sec Loss 2.0497 
LearningRate 0.000276 Epoch: 21 Global Step: 437640 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:37:23,517-Speed 2514.99 samples/sec Loss 2.0293 LearningRate 0.000276 Epoch: 21 Global Step: 437650 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:37:31,718-Speed 2498.08 samples/sec Loss 2.0241 LearningRate 0.000276 Epoch: 21 Global Step: 437660 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:37:39,917-Speed 2498.27 samples/sec Loss 2.0398 LearningRate 0.000276 Epoch: 21 Global Step: 437670 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:37:48,121-Speed 2496.83 samples/sec Loss 2.0575 LearningRate 0.000276 Epoch: 21 Global Step: 437680 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:37:56,326-Speed 2496.30 samples/sec Loss 2.0301 LearningRate 0.000275 Epoch: 21 Global Step: 437690 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:38:04,528-Speed 2497.35 samples/sec Loss 2.0788 LearningRate 0.000275 Epoch: 21 Global Step: 437700 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:38:12,674-Speed 2514.70 samples/sec Loss 2.0291 LearningRate 0.000275 Epoch: 21 Global Step: 437710 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:38:20,873-Speed 2498.06 samples/sec Loss 2.0333 LearningRate 0.000275 Epoch: 21 Global Step: 437720 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:38:29,069-Speed 2499.23 samples/sec Loss 2.0144 LearningRate 0.000275 Epoch: 21 Global Step: 437730 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:38:37,269-Speed 2497.99 samples/sec Loss 2.0223 LearningRate 0.000275 Epoch: 21 Global Step: 437740 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:38:45,477-Speed 2495.60 samples/sec Loss 2.0112 LearningRate 0.000275 Epoch: 21 Global Step: 437750 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:38:53,680-Speed 2496.93 samples/sec Loss 2.0448 
LearningRate 0.000275 Epoch: 21 Global Step: 437760 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:39:01,868-Speed 2501.92 samples/sec Loss 2.0341 LearningRate 0.000275 Epoch: 21 Global Step: 437770 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:39:10,069-Speed 2497.65 samples/sec Loss 2.0369 LearningRate 0.000275 Epoch: 21 Global Step: 437780 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:39:18,267-Speed 2498.48 samples/sec Loss 2.0522 LearningRate 0.000275 Epoch: 21 Global Step: 437790 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:39:26,469-Speed 2497.48 samples/sec Loss 2.0287 LearningRate 0.000275 Epoch: 21 Global Step: 437800 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:39:34,669-Speed 2498.06 samples/sec Loss 2.0232 LearningRate 0.000275 Epoch: 21 Global Step: 437810 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:39:42,885-Speed 2493.49 samples/sec Loss 2.0951 LearningRate 0.000275 Epoch: 21 Global Step: 437820 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:39:51,032-Speed 2513.86 samples/sec Loss 2.0202 LearningRate 0.000275 Epoch: 21 Global Step: 437830 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:39:59,233-Speed 2497.84 samples/sec Loss 2.0022 LearningRate 0.000275 Epoch: 21 Global Step: 437840 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:40:07,437-Speed 2496.84 samples/sec Loss 2.0170 LearningRate 0.000275 Epoch: 21 Global Step: 437850 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:40:15,633-Speed 2498.94 samples/sec Loss 2.0079 LearningRate 0.000275 Epoch: 21 Global Step: 437860 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:40:23,831-Speed 2498.88 samples/sec Loss 2.0253 LearningRate 0.000275 Epoch: 21 Global Step: 437870 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:40:32,033-Speed 2497.52 samples/sec Loss 2.0053 
LearningRate 0.000275 Epoch: 21 Global Step: 437880 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:40:40,180-Speed 2514.53 samples/sec Loss 2.0157 LearningRate 0.000275 Epoch: 21 Global Step: 437890 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:40:48,379-Speed 2498.24 samples/sec Loss 2.0498 LearningRate 0.000275 Epoch: 21 Global Step: 437900 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:40:56,576-Speed 2498.82 samples/sec Loss 2.0259 LearningRate 0.000275 Epoch: 21 Global Step: 437910 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:41:04,780-Speed 2496.95 samples/sec Loss 1.9945 LearningRate 0.000275 Epoch: 21 Global Step: 437920 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:41:12,979-Speed 2498.51 samples/sec Loss 2.0272 LearningRate 0.000275 Epoch: 21 Global Step: 437930 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:41:21,186-Speed 2495.90 samples/sec Loss 2.0440 LearningRate 0.000275 Epoch: 21 Global Step: 437940 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:41:29,333-Speed 2514.24 samples/sec Loss 1.9877 LearningRate 0.000275 Epoch: 21 Global Step: 437950 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:41:37,534-Speed 2497.71 samples/sec Loss 2.0289 LearningRate 0.000275 Epoch: 21 Global Step: 437960 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:41:45,735-Speed 2497.57 samples/sec Loss 2.0617 LearningRate 0.000275 Epoch: 21 Global Step: 437970 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:41:53,937-Speed 2497.49 samples/sec Loss 2.0298 LearningRate 0.000275 Epoch: 21 Global Step: 437980 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:42:02,140-Speed 2496.98 samples/sec Loss 2.0377 LearningRate 0.000275 Epoch: 21 Global Step: 437990 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:42:10,343-Speed 2496.84 samples/sec Loss 2.0518 
LearningRate 0.000275 Epoch: 21 Global Step: 438000 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:42:18,492-Speed 2513.69 samples/sec Loss 1.9922 LearningRate 0.000275 Epoch: 21 Global Step: 438010 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:42:26,692-Speed 2498.17 samples/sec Loss 2.0377 LearningRate 0.000275 Epoch: 21 Global Step: 438020 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:42:34,892-Speed 2497.91 samples/sec Loss 2.0271 LearningRate 0.000275 Epoch: 21 Global Step: 438030 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:42:43,108-Speed 2493.78 samples/sec Loss 2.0308 LearningRate 0.000275 Epoch: 21 Global Step: 438040 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:42:51,308-Speed 2498.00 samples/sec Loss 1.9680 LearningRate 0.000275 Epoch: 21 Global Step: 438050 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:42:59,511-Speed 2496.98 samples/sec Loss 2.0104 LearningRate 0.000275 Epoch: 21 Global Step: 438060 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:43:07,658-Speed 2514.01 samples/sec Loss 1.9957 LearningRate 0.000275 Epoch: 21 Global Step: 438070 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:43:15,861-Speed 2497.25 samples/sec Loss 2.0539 LearningRate 0.000275 Epoch: 21 Global Step: 438080 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:43:24,068-Speed 2495.88 samples/sec Loss 2.0336 LearningRate 0.000275 Epoch: 21 Global Step: 438090 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:43:32,269-Speed 2497.54 samples/sec Loss 2.0282 LearningRate 0.000275 Epoch: 21 Global Step: 438100 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:43:40,482-Speed 2493.93 samples/sec Loss 2.0062 LearningRate 0.000275 Epoch: 21 Global Step: 438110 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:43:48,685-Speed 2497.26 samples/sec Loss 2.0060 
LearningRate 0.000275 Epoch: 21 Global Step: 438120 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:43:56,844-Speed 2510.77 samples/sec Loss 2.0139 LearningRate 0.000275 Epoch: 21 Global Step: 438130 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:44:05,047-Speed 2496.87 samples/sec Loss 2.0068 LearningRate 0.000275 Epoch: 21 Global Step: 438140 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:44:13,246-Speed 2498.10 samples/sec Loss 2.0168 LearningRate 0.000275 Epoch: 21 Global Step: 438150 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:44:21,454-Speed 2495.63 samples/sec Loss 2.0355 LearningRate 0.000275 Epoch: 21 Global Step: 438160 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:44:29,655-Speed 2497.77 samples/sec Loss 2.0243 LearningRate 0.000275 Epoch: 21 Global Step: 438170 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:44:37,871-Speed 2492.89 samples/sec Loss 2.0052 LearningRate 0.000275 Epoch: 21 Global Step: 438180 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:44:46,021-Speed 2513.40 samples/sec Loss 1.9903 LearningRate 0.000275 Epoch: 21 Global Step: 438190 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:44:54,223-Speed 2497.44 samples/sec Loss 1.9885 LearningRate 0.000275 Epoch: 21 Global Step: 438200 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:45:02,421-Speed 2498.78 samples/sec Loss 1.9886 LearningRate 0.000275 Epoch: 21 Global Step: 438210 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:45:10,621-Speed 2497.90 samples/sec Loss 2.0311 LearningRate 0.000275 Epoch: 21 Global Step: 438220 Fp16 Grad Scale: 65536 Required: 90 hours Training: 2022-07-09 18:45:18,820-Speed 2498.48 samples/sec Loss 2.0306 LearningRate 0.000275 Epoch: 21 Global Step: 438230 Fp16 Grad Scale: 65536 Required: 90 hours Training: 2022-07-09 18:45:27,017-Speed 2498.79 samples/sec Loss 1.9814 
LearningRate 0.000275 Epoch: 21 Global Step: 438240 Fp16 Grad Scale: 65536 Required: 90 hours Training: 2022-07-09 18:45:35,166-Speed 2513.56 samples/sec Loss 2.0111 LearningRate 0.000275 Epoch: 21 Global Step: 438250 Fp16 Grad Scale: 65536 Required: 90 hours Training: 2022-07-09 18:45:43,376-Speed 2494.93 samples/sec Loss 2.0229 LearningRate 0.000275 Epoch: 21 Global Step: 438260 Fp16 Grad Scale: 65536 Required: 90 hours Training: 2022-07-09 18:45:51,575-Speed 2498.53 samples/sec Loss 2.0015 LearningRate 0.000275 Epoch: 21 Global Step: 438270 Fp16 Grad Scale: 65536 Required: 90 hours Training: 2022-07-09 18:45:59,776-Speed 2497.62 samples/sec Loss 2.0227 LearningRate 0.000275 Epoch: 21 Global Step: 438280 Fp16 Grad Scale: 65536 Required: 90 hours Training: 2022-07-09 18:46:07,976-Speed 2497.95 samples/sec Loss 2.0328 LearningRate 0.000275 Epoch: 21 Global Step: 438290 Fp16 Grad Scale: 65536 Required: 90 hours Training: 2022-07-09 18:46:16,174-Speed 2498.59 samples/sec Loss 1.9980 LearningRate 0.000275 Epoch: 21 Global Step: 438300 Fp16 Grad Scale: 65536 Required: 90 hours Training: 2022-07-09 18:46:24,324-Speed 2513.41 samples/sec Loss 2.0504 LearningRate 0.000275 Epoch: 21 Global Step: 438310 Fp16 Grad Scale: 65536 Required: 90 hours Training: 2022-07-09 18:46:32,522-Speed 2498.50 samples/sec Loss 2.0287 LearningRate 0.000275 Epoch: 21 Global Step: 438320 Fp16 Grad Scale: 65536 Required: 90 hours Training: 2022-07-09 18:46:40,734-Speed 2494.57 samples/sec Loss 2.0245 LearningRate 0.000275 Epoch: 21 Global Step: 438330 Fp16 Grad Scale: 65536 Required: 90 hours Training: 2022-07-09 18:46:48,934-Speed 2498.09 samples/sec Loss 2.0156 LearningRate 0.000275 Epoch: 21 Global Step: 438340 Fp16 Grad Scale: 65536 Required: 90 hours Training: 2022-07-09 18:46:57,133-Speed 2498.02 samples/sec Loss 2.0235 LearningRate 0.000275 Epoch: 21 Global Step: 438350 Fp16 Grad Scale: 65536 Required: 90 hours Training: 2022-07-09 18:47:05,293-Speed 2510.66 samples/sec Loss 2.0410 
LearningRate 0.000275 Epoch: 21 Global Step: 438360 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:47:13,437-Speed 2515.02 samples/sec Loss 2.0857 LearningRate 0.000275 Epoch: 21 Global Step: 438370 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:47:21,635-Speed 2498.83 samples/sec Loss 2.0235 LearningRate 0.000275 Epoch: 21 Global Step: 438380 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:47:29,840-Speed 2496.51 samples/sec Loss 2.0008 LearningRate 0.000275 Epoch: 21 Global Step: 438390 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:47:38,038-Speed 2498.74 samples/sec Loss 1.9986 LearningRate 0.000274 Epoch: 21 Global Step: 438400 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:47:46,238-Speed 2498.02 samples/sec Loss 2.0453 LearningRate 0.000274 Epoch: 21 Global Step: 438410 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:47:54,436-Speed 2498.47 samples/sec Loss 2.0517 LearningRate 0.000274 Epoch: 21 Global Step: 438420 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:48:02,591-Speed 2511.91 samples/sec Loss 2.0194 LearningRate 0.000274 Epoch: 21 Global Step: 438430 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:48:10,801-Speed 2494.86 samples/sec Loss 2.0083 LearningRate 0.000274 Epoch: 21 Global Step: 438440 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:48:19,020-Speed 2492.13 samples/sec Loss 2.0358 LearningRate 0.000274 Epoch: 21 Global Step: 438450 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:48:27,220-Speed 2497.90 samples/sec Loss 2.0138 LearningRate 0.000274 Epoch: 21 Global Step: 438460 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:48:35,433-Speed 2494.01 samples/sec Loss 2.0136 LearningRate 0.000274 Epoch: 21 Global Step: 438470 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:48:43,635-Speed 2497.46 samples/sec Loss 2.0288 
LearningRate 0.000274 Epoch: 21 Global Step: 438480 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:48:51,784-Speed 2513.62 samples/sec Loss 2.0116 LearningRate 0.000274 Epoch: 21 Global Step: 438490 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:48:59,985-Speed 2497.79 samples/sec Loss 2.0139 LearningRate 0.000274 Epoch: 21 Global Step: 438500 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:49:08,185-Speed 2497.99 samples/sec Loss 2.0043 LearningRate 0.000274 Epoch: 21 Global Step: 438510 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:49:16,384-Speed 2498.21 samples/sec Loss 2.0488 LearningRate 0.000274 Epoch: 21 Global Step: 438520 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:49:24,584-Speed 2498.01 samples/sec Loss 2.0357 LearningRate 0.000274 Epoch: 21 Global Step: 438530 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:49:32,785-Speed 2498.29 samples/sec Loss 2.0488 LearningRate 0.000274 Epoch: 21 Global Step: 438540 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:49:40,942-Speed 2511.51 samples/sec Loss 2.0646 LearningRate 0.000274 Epoch: 21 Global Step: 438550 Fp16 Grad Scale: 32768 Required: 90 hours Training: 2022-07-09 18:49:49,099-Speed 2511.29 samples/sec Loss 2.0124 LearningRate 0.000274 Epoch: 21 Global Step: 438560 Fp16 Grad Scale: 16384 Required: 90 hours Training: 2022-07-09 18:49:57,310-Speed 2494.79 samples/sec Loss 2.0256 LearningRate 0.000274 Epoch: 21 Global Step: 438570 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:50:05,511-Speed 2497.75 samples/sec Loss 2.0373 LearningRate 0.000274 Epoch: 21 Global Step: 438580 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:50:13,709-Speed 2498.55 samples/sec Loss 2.0150 LearningRate 0.000274 Epoch: 21 Global Step: 438590 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:50:21,910-Speed 2497.80 samples/sec Loss 2.0085 
LearningRate 0.000274 Epoch: 21 Global Step: 438600 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:50:30,067-Speed 2510.99 samples/sec Loss 2.0803 LearningRate 0.000274 Epoch: 21 Global Step: 438610 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:50:38,268-Speed 2497.55 samples/sec Loss 2.0099 LearningRate 0.000274 Epoch: 21 Global Step: 438620 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:50:46,470-Speed 2497.38 samples/sec Loss 2.0411 LearningRate 0.000274 Epoch: 21 Global Step: 438630 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:50:54,674-Speed 2496.83 samples/sec Loss 2.0235 LearningRate 0.000274 Epoch: 21 Global Step: 438640 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:51:02,880-Speed 2496.24 samples/sec Loss 2.0040 LearningRate 0.000274 Epoch: 21 Global Step: 438650 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:51:11,098-Speed 2493.69 samples/sec Loss 1.9939 LearningRate 0.000274 Epoch: 21 Global Step: 438660 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:51:19,249-Speed 2513.18 samples/sec Loss 2.0300 LearningRate 0.000274 Epoch: 21 Global Step: 438670 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:51:27,455-Speed 2496.15 samples/sec Loss 2.0467 LearningRate 0.000274 Epoch: 21 Global Step: 438680 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:51:35,661-Speed 2496.07 samples/sec Loss 1.9784 LearningRate 0.000274 Epoch: 21 Global Step: 438690 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:51:43,863-Speed 2497.43 samples/sec Loss 2.0301 LearningRate 0.000274 Epoch: 21 Global Step: 438700 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:51:52,067-Speed 2497.02 samples/sec Loss 2.0497 LearningRate 0.000274 Epoch: 21 Global Step: 438710 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:52:00,266-Speed 2498.23 samples/sec Loss 2.0133 
LearningRate 0.000274 Epoch: 21 Global Step: 438720 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:52:08,413-Speed 2514.08 samples/sec Loss 2.0179 LearningRate 0.000274 Epoch: 21 Global Step: 438730 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:52:16,614-Speed 2497.81 samples/sec Loss 2.0187 LearningRate 0.000274 Epoch: 21 Global Step: 438740 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:52:24,812-Speed 2498.34 samples/sec Loss 2.0334 LearningRate 0.000274 Epoch: 21 Global Step: 438750 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:52:33,015-Speed 2497.30 samples/sec Loss 2.0109 LearningRate 0.000274 Epoch: 21 Global Step: 438760 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:52:41,221-Speed 2496.34 samples/sec Loss 2.0410 LearningRate 0.000274 Epoch: 21 Global Step: 438770 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:52:49,422-Speed 2497.65 samples/sec Loss 2.0189 LearningRate 0.000274 Epoch: 21 Global Step: 438780 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:52:57,570-Speed 2514.05 samples/sec Loss 2.0152 LearningRate 0.000274 Epoch: 21 Global Step: 438790 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:53:05,775-Speed 2496.52 samples/sec Loss 2.0411 LearningRate 0.000274 Epoch: 21 Global Step: 438800 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:53:13,983-Speed 2495.65 samples/sec Loss 2.0176 LearningRate 0.000274 Epoch: 21 Global Step: 438810 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:53:22,178-Speed 2499.37 samples/sec Loss 1.9982 LearningRate 0.000274 Epoch: 21 Global Step: 438820 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:53:30,376-Speed 2498.63 samples/sec Loss 2.0427 LearningRate 0.000274 Epoch: 21 Global Step: 438830 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:53:38,577-Speed 2497.95 samples/sec Loss 1.9336 
LearningRate 0.000274 Epoch: 21 Global Step: 438840 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:53:46,724-Speed 2514.87 samples/sec Loss 1.9931 LearningRate 0.000274 Epoch: 21 Global Step: 438850 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:53:54,921-Speed 2498.70 samples/sec Loss 2.0072 LearningRate 0.000274 Epoch: 21 Global Step: 438860 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:54:03,125-Speed 2496.78 samples/sec Loss 2.0297 LearningRate 0.000274 Epoch: 21 Global Step: 438870 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:54:11,322-Speed 2499.28 samples/sec Loss 2.0134 LearningRate 0.000274 Epoch: 21 Global Step: 438880 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:54:19,523-Speed 2497.60 samples/sec Loss 2.0309 LearningRate 0.000274 Epoch: 21 Global Step: 438890 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:54:27,721-Speed 2498.37 samples/sec Loss 2.0386 LearningRate 0.000274 Epoch: 21 Global Step: 438900 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:54:35,874-Speed 2512.37 samples/sec Loss 2.0176 LearningRate 0.000274 Epoch: 21 Global Step: 438910 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:54:44,071-Speed 2498.93 samples/sec Loss 2.0152 LearningRate 0.000274 Epoch: 21 Global Step: 438920 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:54:52,280-Speed 2495.51 samples/sec Loss 1.9973 LearningRate 0.000274 Epoch: 21 Global Step: 438930 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:55:00,486-Speed 2495.99 samples/sec Loss 2.0451 LearningRate 0.000274 Epoch: 21 Global Step: 438940 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:55:08,687-Speed 2497.89 samples/sec Loss 2.0082 LearningRate 0.000274 Epoch: 21 Global Step: 438950 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:55:16,886-Speed 2498.50 samples/sec Loss 1.9780 
LearningRate 0.000274 Epoch: 21 Global Step: 438960 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:55:25,033-Speed 2514.47 samples/sec Loss 2.0189 LearningRate 0.000274 Epoch: 21 Global Step: 438970 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:55:33,232-Speed 2498.05 samples/sec Loss 2.0306 LearningRate 0.000274 Epoch: 21 Global Step: 438980 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:55:41,431-Speed 2498.16 samples/sec Loss 1.9918 LearningRate 0.000274 Epoch: 21 Global Step: 438990 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:55:49,632-Speed 2497.83 samples/sec Loss 2.0252 LearningRate 0.000274 Epoch: 21 Global Step: 439000 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:55:57,831-Speed 2498.56 samples/sec Loss 2.0228 LearningRate 0.000274 Epoch: 21 Global Step: 439010 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:56:06,031-Speed 2497.81 samples/sec Loss 2.0274 LearningRate 0.000274 Epoch: 21 Global Step: 439020 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:56:14,177-Speed 2514.56 samples/sec Loss 1.9741 LearningRate 0.000274 Epoch: 21 Global Step: 439030 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:56:22,377-Speed 2498.00 samples/sec Loss 1.9503 LearningRate 0.000274 Epoch: 21 Global Step: 439040 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:56:30,579-Speed 2497.31 samples/sec Loss 2.0230 LearningRate 0.000274 Epoch: 21 Global Step: 439050 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:56:38,781-Speed 2497.39 samples/sec Loss 2.0022 LearningRate 0.000274 Epoch: 21 Global Step: 439060 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:56:46,984-Speed 2497.30 samples/sec Loss 2.0057 LearningRate 0.000274 Epoch: 21 Global Step: 439070 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:56:55,184-Speed 2498.03 samples/sec Loss 1.9824 
LearningRate 0.000274 Epoch: 21 Global Step: 439080 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:57:03,328-Speed 2515.15 samples/sec Loss 2.0418 LearningRate 0.000274 Epoch: 21 Global Step: 439090 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:57:11,524-Speed 2499.04 samples/sec Loss 2.0518 LearningRate 0.000274 Epoch: 21 Global Step: 439100 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:57:19,727-Speed 2497.04 samples/sec Loss 2.0259 LearningRate 0.000273 Epoch: 21 Global Step: 439110 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:57:27,940-Speed 2494.03 samples/sec Loss 1.9689 LearningRate 0.000273 Epoch: 21 Global Step: 439120 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:57:36,151-Speed 2494.70 samples/sec Loss 2.0407 LearningRate 0.000273 Epoch: 21 Global Step: 439130 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:57:44,355-Speed 2497.48 samples/sec Loss 2.0643 LearningRate 0.000273 Epoch: 21 Global Step: 439140 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:57:52,494-Speed 2516.71 samples/sec Loss 2.0180 LearningRate 0.000273 Epoch: 21 Global Step: 439150 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:58:00,703-Speed 2495.22 samples/sec Loss 1.9937 LearningRate 0.000273 Epoch: 21 Global Step: 439160 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:58:08,900-Speed 2498.95 samples/sec Loss 2.0194 LearningRate 0.000273 Epoch: 21 Global Step: 439170 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:58:17,097-Speed 2498.75 samples/sec Loss 2.0088 LearningRate 0.000273 Epoch: 21 Global Step: 439180 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:58:25,297-Speed 2497.78 samples/sec Loss 2.0007 LearningRate 0.000273 Epoch: 21 Global Step: 439190 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:58:33,496-Speed 2498.23 samples/sec Loss 1.9637 
LearningRate 0.000273 Epoch: 21 Global Step: 439200 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:58:41,643-Speed 2514.63 samples/sec Loss 2.0255 LearningRate 0.000273 Epoch: 21 Global Step: 439210 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:58:49,844-Speed 2497.58 samples/sec Loss 1.9790 LearningRate 0.000273 Epoch: 21 Global Step: 439220 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:58:58,054-Speed 2494.92 samples/sec Loss 2.0199 LearningRate 0.000273 Epoch: 21 Global Step: 439230 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:59:06,260-Speed 2496.05 samples/sec Loss 2.0626 LearningRate 0.000273 Epoch: 21 Global Step: 439240 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:59:14,468-Speed 2495.73 samples/sec Loss 2.0253 LearningRate 0.000273 Epoch: 21 Global Step: 439250 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:59:22,675-Speed 2495.56 samples/sec Loss 2.0144 LearningRate 0.000273 Epoch: 21 Global Step: 439260 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:59:30,828-Speed 2512.69 samples/sec Loss 2.0160 LearningRate 0.000273 Epoch: 21 Global Step: 439270 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:59:39,025-Speed 2498.86 samples/sec Loss 2.0053 LearningRate 0.000273 Epoch: 21 Global Step: 439280 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:59:47,230-Speed 2496.30 samples/sec Loss 2.0474 LearningRate 0.000273 Epoch: 21 Global Step: 439290 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 18:59:55,434-Speed 2496.79 samples/sec Loss 2.0333 LearningRate 0.000273 Epoch: 21 Global Step: 439300 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:00:03,630-Speed 2498.99 samples/sec Loss 2.0167 LearningRate 0.000273 Epoch: 21 Global Step: 439310 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:00:11,829-Speed 2498.39 samples/sec Loss 2.0222 
LearningRate 0.000273 Epoch: 21 Global Step: 439320 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:00:19,975-Speed 2514.49 samples/sec Loss 2.0431 LearningRate 0.000273 Epoch: 21 Global Step: 439330 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:00:28,176-Speed 2497.58 samples/sec Loss 1.9834 LearningRate 0.000273 Epoch: 21 Global Step: 439340 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:00:36,375-Speed 2498.38 samples/sec Loss 2.0267 LearningRate 0.000273 Epoch: 21 Global Step: 439350 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:00:44,573-Speed 2498.34 samples/sec Loss 2.0535 LearningRate 0.000273 Epoch: 21 Global Step: 439360 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:00:52,788-Speed 2493.45 samples/sec Loss 1.9775 LearningRate 0.000273 Epoch: 21 Global Step: 439370 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:01:00,986-Speed 2498.51 samples/sec Loss 1.9905 LearningRate 0.000273 Epoch: 21 Global Step: 439380 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:01:09,133-Speed 2514.27 samples/sec Loss 1.9962 LearningRate 0.000273 Epoch: 21 Global Step: 439390 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:01:17,347-Speed 2493.66 samples/sec Loss 2.0209 LearningRate 0.000273 Epoch: 21 Global Step: 439400 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:01:25,550-Speed 2497.41 samples/sec Loss 1.9890 LearningRate 0.000273 Epoch: 21 Global Step: 439410 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:01:33,751-Speed 2497.52 samples/sec Loss 2.0508 LearningRate 0.000273 Epoch: 21 Global Step: 439420 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:01:41,950-Speed 2498.37 samples/sec Loss 2.0512 LearningRate 0.000273 Epoch: 21 Global Step: 439430 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:01:50,164-Speed 2493.69 samples/sec Loss 1.9997 
LearningRate 0.000273 Epoch: 21 Global Step: 439440 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:01:58,314-Speed 2513.42 samples/sec Loss 2.0362 LearningRate 0.000273 Epoch: 21 Global Step: 439450 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:02:06,516-Speed 2497.16 samples/sec Loss 2.0058 LearningRate 0.000273 Epoch: 21 Global Step: 439460 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:02:14,715-Speed 2498.09 samples/sec Loss 2.0611 LearningRate 0.000273 Epoch: 21 Global Step: 439470 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:02:22,915-Speed 2498.40 samples/sec Loss 2.0031 LearningRate 0.000273 Epoch: 21 Global Step: 439480 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:02:31,126-Speed 2494.73 samples/sec Loss 2.0641 LearningRate 0.000273 Epoch: 21 Global Step: 439490 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:02:39,325-Speed 2498.31 samples/sec Loss 2.0239 LearningRate 0.000273 Epoch: 21 Global Step: 439500 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:02:47,470-Speed 2514.49 samples/sec Loss 2.0182 LearningRate 0.000273 Epoch: 21 Global Step: 439510 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:02:55,668-Speed 2498.80 samples/sec Loss 2.0282 LearningRate 0.000273 Epoch: 21 Global Step: 439520 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:03:03,866-Speed 2498.67 samples/sec Loss 2.0259 LearningRate 0.000273 Epoch: 21 Global Step: 439530 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:03:12,071-Speed 2496.24 samples/sec Loss 2.0189 LearningRate 0.000273 Epoch: 21 Global Step: 439540 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:03:20,270-Speed 2498.35 samples/sec Loss 2.0282 LearningRate 0.000273 Epoch: 21 Global Step: 439550 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:03:28,467-Speed 2498.79 samples/sec Loss 1.9834 
LearningRate 0.000273 Epoch: 21 Global Step: 439560 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:03:36,616-Speed 2513.73 samples/sec Loss 1.9683 LearningRate 0.000273 Epoch: 21 Global Step: 439570 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:03:44,825-Speed 2495.36 samples/sec Loss 2.0249 LearningRate 0.000273 Epoch: 21 Global Step: 439580 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:03:53,024-Speed 2498.04 samples/sec Loss 1.9876 LearningRate 0.000273 Epoch: 21 Global Step: 439590 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:04:01,224-Speed 2498.05 samples/sec Loss 2.0245 LearningRate 0.000273 Epoch: 21 Global Step: 439600 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:04:09,421-Speed 2499.13 samples/sec Loss 2.0397 LearningRate 0.000273 Epoch: 21 Global Step: 439610 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:04:17,622-Speed 2497.69 samples/sec Loss 2.0245 LearningRate 0.000273 Epoch: 21 Global Step: 439620 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:04:25,770-Speed 2514.07 samples/sec Loss 1.9640 LearningRate 0.000273 Epoch: 21 Global Step: 439630 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:04:33,968-Speed 2498.31 samples/sec Loss 2.0133 LearningRate 0.000273 Epoch: 21 Global Step: 439640 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:04:42,181-Speed 2494.20 samples/sec Loss 1.9943 LearningRate 0.000273 Epoch: 21 Global Step: 439650 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:04:50,385-Speed 2496.74 samples/sec Loss 2.0126 LearningRate 0.000273 Epoch: 21 Global Step: 439660 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:04:58,591-Speed 2496.14 samples/sec Loss 1.9996 LearningRate 0.000273 Epoch: 21 Global Step: 439670 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:05:06,795-Speed 2496.86 samples/sec Loss 2.0371 
LearningRate 0.000273 Epoch: 21 Global Step: 439680 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:05:14,940-Speed 2514.94 samples/sec Loss 1.9686 LearningRate 0.000273 Epoch: 21 Global Step: 439690 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:05:23,140-Speed 2498.01 samples/sec Loss 1.9849 LearningRate 0.000273 Epoch: 21 Global Step: 439700 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:05:31,341-Speed 2497.70 samples/sec Loss 2.0603 LearningRate 0.000273 Epoch: 21 Global Step: 439710 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:05:39,553-Speed 2494.40 samples/sec Loss 2.0798 LearningRate 0.000273 Epoch: 21 Global Step: 439720 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:05:47,752-Speed 2498.06 samples/sec Loss 1.9968 LearningRate 0.000273 Epoch: 21 Global Step: 439730 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:05:55,952-Speed 2498.02 samples/sec Loss 2.0486 LearningRate 0.000273 Epoch: 21 Global Step: 439740 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:06:04,125-Speed 2506.13 samples/sec Loss 1.9614 LearningRate 0.000273 Epoch: 21 Global Step: 439750 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:06:12,326-Speed 2497.56 samples/sec Loss 1.9970 LearningRate 0.000273 Epoch: 21 Global Step: 439760 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:06:20,537-Speed 2494.59 samples/sec Loss 1.9710 LearningRate 0.000273 Epoch: 21 Global Step: 439770 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:06:28,742-Speed 2496.47 samples/sec Loss 1.9774 LearningRate 0.000273 Epoch: 21 Global Step: 439780 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:06:36,947-Speed 2496.27 samples/sec Loss 1.9683 LearningRate 0.000273 Epoch: 21 Global Step: 439790 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:06:45,149-Speed 2497.72 samples/sec Loss 1.9924 
LearningRate 0.000273 Epoch: 21 Global Step: 439800 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:06:53,297-Speed 2514.04 samples/sec Loss 1.9667 LearningRate 0.000273 Epoch: 21 Global Step: 439810 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:07:01,508-Speed 2494.61 samples/sec Loss 1.9997 LearningRate 0.000273 Epoch: 21 Global Step: 439820 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:07:09,709-Speed 2497.39 samples/sec Loss 1.9916 LearningRate 0.000272 Epoch: 21 Global Step: 439830 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:07:17,915-Speed 2496.26 samples/sec Loss 2.0235 LearningRate 0.000272 Epoch: 21 Global Step: 439840 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:07:26,116-Speed 2497.71 samples/sec Loss 2.0003 LearningRate 0.000272 Epoch: 21 Global Step: 439850 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:07:34,319-Speed 2497.04 samples/sec Loss 2.0526 LearningRate 0.000272 Epoch: 21 Global Step: 439860 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:07:42,467-Speed 2513.85 samples/sec Loss 1.9618 LearningRate 0.000272 Epoch: 21 Global Step: 439870 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:07:50,674-Speed 2495.84 samples/sec Loss 2.0254 LearningRate 0.000272 Epoch: 21 Global Step: 439880 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:07:58,878-Speed 2497.04 samples/sec Loss 1.9795 LearningRate 0.000272 Epoch: 21 Global Step: 439890 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:08:07,079-Speed 2497.54 samples/sec Loss 2.0093 LearningRate 0.000272 Epoch: 21 Global Step: 439900 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:08:15,281-Speed 2497.38 samples/sec Loss 2.0338 LearningRate 0.000272 Epoch: 21 Global Step: 439910 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:08:23,481-Speed 2497.89 samples/sec Loss 2.0230 
LearningRate 0.000272 Epoch: 21 Global Step: 439920 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:08:31,631-Speed 2513.39 samples/sec Loss 1.9622 LearningRate 0.000272 Epoch: 21 Global Step: 439930 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:08:39,833-Speed 2497.35 samples/sec Loss 1.9974 LearningRate 0.000272 Epoch: 21 Global Step: 439940 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:08:48,034-Speed 2497.69 samples/sec Loss 2.0251 LearningRate 0.000272 Epoch: 21 Global Step: 439950 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:08:56,240-Speed 2496.11 samples/sec Loss 2.0375 LearningRate 0.000272 Epoch: 21 Global Step: 439960 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:09:04,440-Speed 2497.91 samples/sec Loss 1.9937 LearningRate 0.000272 Epoch: 21 Global Step: 439970 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:09:12,642-Speed 2497.23 samples/sec Loss 1.9774 LearningRate 0.000272 Epoch: 21 Global Step: 439980 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:09:20,801-Speed 2510.48 samples/sec Loss 2.0173 LearningRate 0.000272 Epoch: 21 Global Step: 439990 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:09:29,003-Speed 2497.45 samples/sec Loss 1.9445 LearningRate 0.000272 Epoch: 21 Global Step: 440000 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:09:37,206-Speed 2496.78 samples/sec Loss 2.0238 LearningRate 0.000272 Epoch: 21 Global Step: 440010 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:09:45,408-Speed 2497.46 samples/sec Loss 2.0043 LearningRate 0.000272 Epoch: 21 Global Step: 440020 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:09:53,614-Speed 2496.37 samples/sec Loss 2.0399 LearningRate 0.000272 Epoch: 21 Global Step: 440030 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:10:01,831-Speed 2492.78 samples/sec Loss 2.0104 
LearningRate 0.000272 Epoch: 21 Global Step: 440040 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:10:09,980-Speed 2513.65 samples/sec Loss 1.9958 LearningRate 0.000272 Epoch: 21 Global Step: 440050 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:10:18,181-Speed 2497.49 samples/sec Loss 1.9831 LearningRate 0.000272 Epoch: 21 Global Step: 440060 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:10:26,392-Speed 2494.77 samples/sec Loss 2.0033 LearningRate 0.000272 Epoch: 21 Global Step: 440070 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:10:34,599-Speed 2495.83 samples/sec Loss 1.9832 LearningRate 0.000272 Epoch: 21 Global Step: 440080 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:10:42,803-Speed 2496.85 samples/sec Loss 1.9912 LearningRate 0.000272 Epoch: 21 Global Step: 440090 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:10:51,003-Speed 2498.23 samples/sec Loss 2.0324 LearningRate 0.000272 Epoch: 21 Global Step: 440100 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:10:59,154-Speed 2513.40 samples/sec Loss 1.9668 LearningRate 0.000272 Epoch: 21 Global Step: 440110 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:11:07,354-Speed 2497.87 samples/sec Loss 2.0241 LearningRate 0.000272 Epoch: 21 Global Step: 440120 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:11:15,565-Speed 2494.78 samples/sec Loss 1.9849 LearningRate 0.000272 Epoch: 21 Global Step: 440130 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:11:23,765-Speed 2498.04 samples/sec Loss 1.9916 LearningRate 0.000272 Epoch: 21 Global Step: 440140 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:11:31,965-Speed 2497.89 samples/sec Loss 2.0011 LearningRate 0.000272 Epoch: 21 Global Step: 440150 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:11:40,166-Speed 2497.65 samples/sec Loss 2.0011 
LearningRate 0.000272 Epoch: 21 Global Step: 440160 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:11:48,313-Speed 2514.32 samples/sec Loss 1.9880 LearningRate 0.000272 Epoch: 21 Global Step: 440170 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:11:56,516-Speed 2497.37 samples/sec Loss 2.0319 LearningRate 0.000272 Epoch: 21 Global Step: 440180 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:12:04,720-Speed 2496.73 samples/sec Loss 1.9699 LearningRate 0.000272 Epoch: 21 Global Step: 440190 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:12:12,920-Speed 2497.84 samples/sec Loss 1.9636 LearningRate 0.000272 Epoch: 21 Global Step: 440200 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:12:21,124-Speed 2496.91 samples/sec Loss 2.0063 LearningRate 0.000272 Epoch: 21 Global Step: 440210 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:12:29,336-Speed 2494.69 samples/sec Loss 2.0363 LearningRate 0.000272 Epoch: 21 Global Step: 440220 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:12:37,486-Speed 2513.13 samples/sec Loss 2.0128 LearningRate 0.000272 Epoch: 21 Global Step: 440230 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:12:45,683-Speed 2498.85 samples/sec Loss 2.0025 LearningRate 0.000272 Epoch: 21 Global Step: 440240 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:12:53,882-Speed 2498.46 samples/sec Loss 1.9821 LearningRate 0.000272 Epoch: 21 Global Step: 440250 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:13:02,085-Speed 2497.05 samples/sec Loss 1.9879 LearningRate 0.000272 Epoch: 21 Global Step: 440260 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:13:10,284-Speed 2498.58 samples/sec Loss 2.0095 LearningRate 0.000272 Epoch: 21 Global Step: 440270 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:13:18,480-Speed 2499.12 samples/sec Loss 1.9819 
LearningRate 0.000272 Epoch: 21 Global Step: 440280 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:13:26,623-Speed 2515.59 samples/sec Loss 2.0534 LearningRate 0.000272 Epoch: 21 Global Step: 440290 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:13:34,830-Speed 2495.60 samples/sec Loss 2.0066 LearningRate 0.000272 Epoch: 21 Global Step: 440300 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:13:43,029-Speed 2498.53 samples/sec Loss 2.0038 LearningRate 0.000272 Epoch: 21 Global Step: 440310 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:13:51,225-Speed 2499.40 samples/sec Loss 2.0476 LearningRate 0.000272 Epoch: 21 Global Step: 440320 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:13:59,427-Speed 2497.26 samples/sec Loss 2.0403 LearningRate 0.000272 Epoch: 21 Global Step: 440330 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:14:07,635-Speed 2495.80 samples/sec Loss 1.9929 LearningRate 0.000272 Epoch: 21 Global Step: 440340 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:14:15,799-Speed 2508.87 samples/sec Loss 2.0197 LearningRate 0.000272 Epoch: 21 Global Step: 440350 Fp16 Grad Scale: 32768 Required: 89 hours Training: 2022-07-09 19:14:23,972-Speed 2506.27 samples/sec Loss 1.9788 LearningRate 0.000272 Epoch: 21 Global Step: 440360 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:14:32,172-Speed 2497.87 samples/sec Loss 2.0405 LearningRate 0.000272 Epoch: 21 Global Step: 440370 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:14:40,378-Speed 2496.39 samples/sec Loss 2.0257 LearningRate 0.000272 Epoch: 21 Global Step: 440380 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:14:48,588-Speed 2495.01 samples/sec Loss 1.9902 LearningRate 0.000272 Epoch: 21 Global Step: 440390 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:14:56,809-Speed 2491.58 samples/sec Loss 1.9641 
LearningRate 0.000272 Epoch: 21 Global Step: 440400 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:15:04,957-Speed 2514.13 samples/sec Loss 2.0466 LearningRate 0.000272 Epoch: 21 Global Step: 440410 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:15:13,156-Speed 2498.15 samples/sec Loss 2.0659 LearningRate 0.000272 Epoch: 21 Global Step: 440420 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:15:21,358-Speed 2497.37 samples/sec Loss 2.0040 LearningRate 0.000272 Epoch: 21 Global Step: 440430 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:15:29,562-Speed 2496.61 samples/sec Loss 2.0432 LearningRate 0.000272 Epoch: 21 Global Step: 440440 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:15:37,760-Speed 2498.76 samples/sec Loss 2.1154 LearningRate 0.000272 Epoch: 21 Global Step: 440450 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:15:45,960-Speed 2497.88 samples/sec Loss 2.0230 LearningRate 0.000272 Epoch: 21 Global Step: 440460 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:15:54,108-Speed 2513.93 samples/sec Loss 2.1139 LearningRate 0.000272 Epoch: 21 Global Step: 440470 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:16:02,307-Speed 2498.31 samples/sec Loss 2.0149 LearningRate 0.000272 Epoch: 21 Global Step: 440480 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:16:10,529-Speed 2491.21 samples/sec Loss 2.0241 LearningRate 0.000272 Epoch: 21 Global Step: 440490 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:16:18,745-Speed 2493.07 samples/sec Loss 2.0472 LearningRate 0.000272 Epoch: 21 Global Step: 440500 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:16:26,945-Speed 2498.00 samples/sec Loss 2.0176 LearningRate 0.000272 Epoch: 21 Global Step: 440510 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:16:35,156-Speed 2494.61 samples/sec Loss 2.0047 
LearningRate 0.000272 Epoch: 21 Global Step: 440520 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:16:43,302-Speed 2514.57 samples/sec Loss 2.0454 LearningRate 0.000272 Epoch: 21 Global Step: 440530 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:16:51,500-Speed 2498.63 samples/sec Loss 2.0221 LearningRate 0.000271 Epoch: 21 Global Step: 440540 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:16:59,696-Speed 2498.92 samples/sec Loss 2.0214 LearningRate 0.000271 Epoch: 21 Global Step: 440550 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:17:07,894-Speed 2498.70 samples/sec Loss 1.9917 LearningRate 0.000271 Epoch: 21 Global Step: 440560 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:17:16,140-Speed 2496.52 samples/sec Loss 2.0240 LearningRate 0.000271 Epoch: 21 Global Step: 440570 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:17:24,359-Speed 2496.70 samples/sec Loss 2.0247 LearningRate 0.000271 Epoch: 21 Global Step: 440580 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:17:32,528-Speed 2507.30 samples/sec Loss 2.0886 LearningRate 0.000271 Epoch: 21 Global Step: 440590 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:17:40,781-Speed 2498.72 samples/sec Loss 2.0175 LearningRate 0.000271 Epoch: 21 Global Step: 440600 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:17:53,093-Speed 1672.79 samples/sec Loss 2.0943 LearningRate 0.000271 Epoch: 21 Global Step: 440610 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:18:01,371-Speed 2494.45 samples/sec Loss 2.1196 LearningRate 0.000271 Epoch: 21 Global Step: 440620 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:18:09,575-Speed 2496.93 samples/sec Loss 2.0579 LearningRate 0.000271 Epoch: 21 Global Step: 440630 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:18:18,046-Speed 2497.36 samples/sec Loss 2.0853 
LearningRate 0.000271 Epoch: 21 Global Step: 440640 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:18:26,377-Speed 2517.30 samples/sec Loss 2.0845 LearningRate 0.000271 Epoch: 21 Global Step: 440650 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:18:34,600-Speed 2490.70 samples/sec Loss 2.0811 LearningRate 0.000271 Epoch: 21 Global Step: 440660 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:18:42,803-Speed 2497.18 samples/sec Loss 2.0929 LearningRate 0.000271 Epoch: 21 Global Step: 440670 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:18:51,053-Speed 2499.79 samples/sec Loss 2.0715 LearningRate 0.000271 Epoch: 21 Global Step: 440680 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:18:59,287-Speed 2497.76 samples/sec Loss 2.0680 LearningRate 0.000271 Epoch: 21 Global Step: 440690 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:19:07,490-Speed 2496.96 samples/sec Loss 2.0537 LearningRate 0.000271 Epoch: 21 Global Step: 440700 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:19:15,938-Speed 2512.19 samples/sec Loss 2.0428 LearningRate 0.000271 Epoch: 21 Global Step: 440710 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:19:24,171-Speed 2499.23 samples/sec Loss 2.0523 LearningRate 0.000271 Epoch: 21 Global Step: 440720 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:19:32,370-Speed 2498.36 samples/sec Loss 1.9910 LearningRate 0.000271 Epoch: 21 Global Step: 440730 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:19:40,568-Speed 2498.24 samples/sec Loss 2.0788 LearningRate 0.000271 Epoch: 21 Global Step: 440740 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:19:49,584-Speed 2499.45 samples/sec Loss 1.9927 LearningRate 0.000271 Epoch: 21 Global Step: 440750 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:19:57,810-Speed 2499.57 samples/sec Loss 2.0030 
LearningRate 0.000271 Epoch: 21 Global Step: 440760 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:20:05,959-Speed 2513.58 samples/sec Loss 2.0144 LearningRate 0.000271 Epoch: 21 Global Step: 440770 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:20:15,745-Speed 2106.29 samples/sec Loss 2.0672 LearningRate 0.000271 Epoch: 21 Global Step: 440780 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:20:23,941-Speed 2501.18 samples/sec Loss 2.0177 LearningRate 0.000271 Epoch: 21 Global Step: 440790 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:20:32,150-Speed 2495.11 samples/sec Loss 1.9929 LearningRate 0.000271 Epoch: 21 Global Step: 440800 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:20:46,298-Speed 2499.16 samples/sec Loss 2.0256 LearningRate 0.000271 Epoch: 21 Global Step: 440810 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:20:54,558-Speed 2499.92 samples/sec Loss 2.0578 LearningRate 0.000271 Epoch: 21 Global Step: 440820 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:21:07,453-Speed 1635.24 samples/sec Loss 1.9970 LearningRate 0.000271 Epoch: 21 Global Step: 440830 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:21:16,607-Speed 2250.03 samples/sec Loss 1.9930 LearningRate 0.000271 Epoch: 21 Global Step: 440840 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:21:25,202-Speed 2411.47 samples/sec Loss 1.9910 LearningRate 0.000271 Epoch: 21 Global Step: 440850 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:21:33,560-Speed 2490.81 samples/sec Loss 1.9982 LearningRate 0.000271 Epoch: 21 Global Step: 440860 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:21:41,768-Speed 2495.33 samples/sec Loss 1.9946 LearningRate 0.000271 Epoch: 21 Global Step: 440870 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:21:49,966-Speed 2498.69 samples/sec Loss 2.0409 
LearningRate 0.000271 Epoch: 21 Global Step: 440880 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:21:58,111-Speed 2514.63 samples/sec Loss 1.9855 LearningRate 0.000271 Epoch: 21 Global Step: 440890 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:22:06,317-Speed 2496.17 samples/sec Loss 1.9927 LearningRate 0.000271 Epoch: 21 Global Step: 440900 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:22:14,520-Speed 2497.31 samples/sec Loss 2.0212 LearningRate 0.000271 Epoch: 21 Global Step: 440910 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:22:22,728-Speed 2495.27 samples/sec Loss 2.0217 LearningRate 0.000271 Epoch: 21 Global Step: 440920 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:22:30,932-Speed 2496.95 samples/sec Loss 1.9607 LearningRate 0.000271 Epoch: 21 Global Step: 440930 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:22:39,136-Speed 2496.79 samples/sec Loss 1.9765 LearningRate 0.000271 Epoch: 21 Global Step: 440940 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:22:47,287-Speed 2513.31 samples/sec Loss 1.9861 LearningRate 0.000271 Epoch: 21 Global Step: 440950 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:22:55,487-Speed 2497.67 samples/sec Loss 1.9907 LearningRate 0.000271 Epoch: 21 Global Step: 440960 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:23:03,692-Speed 2496.52 samples/sec Loss 2.0024 LearningRate 0.000271 Epoch: 21 Global Step: 440970 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:23:11,902-Speed 2494.80 samples/sec Loss 1.9958 LearningRate 0.000271 Epoch: 21 Global Step: 440980 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:23:20,104-Speed 2497.50 samples/sec Loss 2.0365 LearningRate 0.000271 Epoch: 21 Global Step: 440990 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:23:28,318-Speed 2493.58 samples/sec Loss 1.9758 
LearningRate 0.000271 Epoch: 21 Global Step: 441000 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:23:36,464-Speed 2514.66 samples/sec Loss 2.0483 LearningRate 0.000271 Epoch: 21 Global Step: 441010 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:23:44,661-Speed 2498.70 samples/sec Loss 2.0114 LearningRate 0.000271 Epoch: 21 Global Step: 441020 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:23:52,858-Speed 2498.60 samples/sec Loss 2.0331 LearningRate 0.000271 Epoch: 21 Global Step: 441030 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:24:01,056-Speed 2498.57 samples/sec Loss 2.0235 LearningRate 0.000271 Epoch: 21 Global Step: 441040 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:24:09,282-Speed 2490.05 samples/sec Loss 2.0226 LearningRate 0.000271 Epoch: 21 Global Step: 441050 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:24:17,483-Speed 2497.75 samples/sec Loss 1.9856 LearningRate 0.000271 Epoch: 21 Global Step: 441060 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:24:25,637-Speed 2512.63 samples/sec Loss 1.9919 LearningRate 0.000271 Epoch: 21 Global Step: 441070 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:24:33,841-Speed 2496.59 samples/sec Loss 1.9949 LearningRate 0.000271 Epoch: 21 Global Step: 441080 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:24:42,042-Speed 2497.66 samples/sec Loss 2.0532 LearningRate 0.000271 Epoch: 21 Global Step: 441090 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:24:50,242-Speed 2497.97 samples/sec Loss 2.0019 LearningRate 0.000271 Epoch: 21 Global Step: 441100 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:24:58,448-Speed 2496.06 samples/sec Loss 1.9988 LearningRate 0.000271 Epoch: 21 Global Step: 441110 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:25:06,652-Speed 2496.99 samples/sec Loss 1.9859 
LearningRate 0.000271 Epoch: 21 Global Step: 441120 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:25:14,802-Speed 2513.35 samples/sec Loss 1.9964 LearningRate 0.000271 Epoch: 21 Global Step: 441130 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:25:23,001-Speed 2498.08 samples/sec Loss 2.0095 LearningRate 0.000271 Epoch: 21 Global Step: 441140 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:25:31,204-Speed 2497.10 samples/sec Loss 1.9850 LearningRate 0.000271 Epoch: 21 Global Step: 441150 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:25:39,406-Speed 2497.44 samples/sec Loss 1.9600 LearningRate 0.000271 Epoch: 21 Global Step: 441160 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:25:47,608-Speed 2497.37 samples/sec Loss 2.0121 LearningRate 0.000271 Epoch: 21 Global Step: 441170 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:25:55,768-Speed 2510.25 samples/sec Loss 1.9641 LearningRate 0.000271 Epoch: 21 Global Step: 441180 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:26:03,916-Speed 2513.84 samples/sec Loss 2.0155 LearningRate 0.000271 Epoch: 21 Global Step: 441190 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:26:12,116-Speed 2497.99 samples/sec Loss 2.0380 LearningRate 0.000271 Epoch: 21 Global Step: 441200 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:26:20,313-Speed 2498.87 samples/sec Loss 2.0183 LearningRate 0.000271 Epoch: 21 Global Step: 441210 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:26:28,512-Speed 2498.42 samples/sec Loss 2.0290 LearningRate 0.000271 Epoch: 21 Global Step: 441220 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:26:36,709-Speed 2498.78 samples/sec Loss 2.0184 LearningRate 0.000271 Epoch: 21 Global Step: 441230 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:26:44,905-Speed 2499.48 samples/sec Loss 2.0083 LearningRate 
0.000271 Epoch: 21 Global Step: 441240 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:26:53,051-Speed 2514.29 samples/sec Loss 1.9811 LearningRate 0.000271 Epoch: 21 Global Step: 441250 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:27:01,247-Speed 2499.07 samples/sec Loss 1.9734 LearningRate 0.000270 Epoch: 21 Global Step: 441260 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:27:09,448-Speed 2497.64 samples/sec Loss 2.0363 LearningRate 0.000270 Epoch: 21 Global Step: 441270 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:27:17,649-Speed 2497.57 samples/sec Loss 2.0247 LearningRate 0.000270 Epoch: 21 Global Step: 441280 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:27:25,854-Speed 2496.47 samples/sec Loss 2.0293 LearningRate 0.000270 Epoch: 21 Global Step: 441290 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:27:34,059-Speed 2496.42 samples/sec Loss 2.0077 LearningRate 0.000270 Epoch: 21 Global Step: 441300 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:27:42,205-Speed 2514.53 samples/sec Loss 2.0291 LearningRate 0.000270 Epoch: 21 Global Step: 441310 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:27:50,405-Speed 2497.90 samples/sec Loss 1.9887 LearningRate 0.000270 Epoch: 21 Global Step: 441320 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:27:58,609-Speed 2496.87 samples/sec Loss 2.0127 LearningRate 0.000270 Epoch: 21 Global Step: 441330 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:28:06,811-Speed 2497.25 samples/sec Loss 2.0210 LearningRate 0.000270 Epoch: 21 Global Step: 441340 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:28:15,012-Speed 2497.78 samples/sec Loss 1.9832 LearningRate 0.000270 Epoch: 21 Global Step: 441350 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:28:23,211-Speed 2498.49 samples/sec Loss 1.9821 LearningRate 0.000270 Epoch: 21 
Global Step: 441360 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:28:31,354-Speed 2515.18 samples/sec Loss 2.0346 LearningRate 0.000270 Epoch: 21 Global Step: 441370 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:28:39,556-Speed 2497.48 samples/sec Loss 2.0188 LearningRate 0.000270 Epoch: 21 Global Step: 441380 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:28:47,753-Speed 2498.69 samples/sec Loss 2.0604 LearningRate 0.000270 Epoch: 21 Global Step: 441390 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:28:55,954-Speed 2497.96 samples/sec Loss 2.0345 LearningRate 0.000270 Epoch: 21 Global Step: 441400 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:29:04,162-Speed 2495.30 samples/sec Loss 2.0118 LearningRate 0.000270 Epoch: 21 Global Step: 441410 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:29:12,377-Speed 2493.34 samples/sec Loss 2.0045 LearningRate 0.000270 Epoch: 21 Global Step: 441420 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:29:20,522-Speed 2515.16 samples/sec Loss 2.0026 LearningRate 0.000270 Epoch: 21 Global Step: 441430 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:29:28,720-Speed 2498.36 samples/sec Loss 1.9804 LearningRate 0.000270 Epoch: 21 Global Step: 441440 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:29:36,920-Speed 2498.09 samples/sec Loss 2.0301 LearningRate 0.000270 Epoch: 21 Global Step: 441450 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:29:45,121-Speed 2497.56 samples/sec Loss 2.0068 LearningRate 0.000270 Epoch: 21 Global Step: 441460 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:29:53,323-Speed 2497.39 samples/sec Loss 2.0318 LearningRate 0.000270 Epoch: 21 Global Step: 441470 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:30:01,523-Speed 2497.80 samples/sec Loss 2.0263 LearningRate 0.000270 Epoch: 21 Global Step: 441480 
Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:30:09,668-Speed 2514.83 samples/sec Loss 2.0631 LearningRate 0.000270 Epoch: 21 Global Step: 441490 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:30:17,873-Speed 2496.62 samples/sec Loss 2.0545 LearningRate 0.000270 Epoch: 21 Global Step: 441500 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:30:26,070-Speed 2498.58 samples/sec Loss 2.0450 LearningRate 0.000270 Epoch: 21 Global Step: 441510 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:30:34,273-Speed 2497.21 samples/sec Loss 2.0321 LearningRate 0.000270 Epoch: 21 Global Step: 441520 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:30:42,473-Speed 2498.10 samples/sec Loss 2.0069 LearningRate 0.000270 Epoch: 21 Global Step: 441530 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:30:50,673-Speed 2497.81 samples/sec Loss 2.0279 LearningRate 0.000270 Epoch: 21 Global Step: 441540 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:30:58,817-Speed 2515.10 samples/sec Loss 2.0572 LearningRate 0.000270 Epoch: 21 Global Step: 441550 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:31:07,032-Speed 2493.45 samples/sec Loss 2.0524 LearningRate 0.000270 Epoch: 21 Global Step: 441560 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:31:15,229-Speed 2498.93 samples/sec Loss 1.9784 LearningRate 0.000270 Epoch: 21 Global Step: 441570 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:31:23,431-Speed 2497.44 samples/sec Loss 2.0582 LearningRate 0.000270 Epoch: 21 Global Step: 441580 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:31:31,633-Speed 2497.32 samples/sec Loss 1.9984 LearningRate 0.000270 Epoch: 21 Global Step: 441590 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:31:39,837-Speed 2496.85 samples/sec Loss 1.9725 LearningRate 0.000270 Epoch: 21 Global Step: 441600 Fp16 Grad Scale: 
8192 Required: 89 hours Training: 2022-07-09 19:31:47,986-Speed 2513.47 samples/sec Loss 2.0693 LearningRate 0.000270 Epoch: 21 Global Step: 441610 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:31:56,185-Speed 2498.43 samples/sec Loss 2.0226 LearningRate 0.000270 Epoch: 21 Global Step: 441620 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:32:04,384-Speed 2498.25 samples/sec Loss 1.9829 LearningRate 0.000270 Epoch: 21 Global Step: 441630 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:32:12,581-Speed 2498.99 samples/sec Loss 2.0037 LearningRate 0.000270 Epoch: 21 Global Step: 441640 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:32:20,781-Speed 2497.89 samples/sec Loss 1.9836 LearningRate 0.000270 Epoch: 21 Global Step: 441650 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:32:28,990-Speed 2495.29 samples/sec Loss 2.0269 LearningRate 0.000270 Epoch: 21 Global Step: 441660 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:32:37,142-Speed 2512.83 samples/sec Loss 2.0665 LearningRate 0.000270 Epoch: 21 Global Step: 441670 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:32:45,343-Speed 2497.70 samples/sec Loss 2.0076 LearningRate 0.000270 Epoch: 21 Global Step: 441680 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:32:53,542-Speed 2498.02 samples/sec Loss 2.0174 LearningRate 0.000270 Epoch: 21 Global Step: 441690 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:33:01,751-Speed 2495.49 samples/sec Loss 2.0488 LearningRate 0.000270 Epoch: 21 Global Step: 441700 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:33:09,951-Speed 2498.04 samples/sec Loss 1.9701 LearningRate 0.000270 Epoch: 21 Global Step: 441710 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:33:18,164-Speed 2494.01 samples/sec Loss 1.9956 LearningRate 0.000270 Epoch: 21 Global Step: 441720 Fp16 Grad Scale: 8192 Required: 89 
hours Training: 2022-07-09 19:33:26,313-Speed 2513.64 samples/sec Loss 1.9792 LearningRate 0.000270 Epoch: 21 Global Step: 441730 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:33:34,514-Speed 2497.67 samples/sec Loss 2.0241 LearningRate 0.000270 Epoch: 21 Global Step: 441740 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:33:42,716-Speed 2497.46 samples/sec Loss 2.0191 LearningRate 0.000270 Epoch: 21 Global Step: 441750 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:33:50,917-Speed 2497.98 samples/sec Loss 2.0139 LearningRate 0.000270 Epoch: 21 Global Step: 441760 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:33:59,126-Speed 2495.24 samples/sec Loss 1.9976 LearningRate 0.000270 Epoch: 21 Global Step: 441770 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:34:07,327-Speed 2497.71 samples/sec Loss 1.9642 LearningRate 0.000270 Epoch: 21 Global Step: 441780 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:34:15,473-Speed 2514.42 samples/sec Loss 2.0788 LearningRate 0.000270 Epoch: 21 Global Step: 441790 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:34:23,671-Speed 2498.53 samples/sec Loss 2.0013 LearningRate 0.000270 Epoch: 21 Global Step: 441800 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:34:31,874-Speed 2497.12 samples/sec Loss 2.0034 LearningRate 0.000270 Epoch: 21 Global Step: 441810 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:34:40,072-Speed 2498.46 samples/sec Loss 2.0067 LearningRate 0.000270 Epoch: 21 Global Step: 441820 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:34:48,273-Speed 2497.62 samples/sec Loss 2.0185 LearningRate 0.000270 Epoch: 21 Global Step: 441830 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:34:56,477-Speed 2496.54 samples/sec Loss 2.0283 LearningRate 0.000270 Epoch: 21 Global Step: 441840 Fp16 Grad Scale: 8192 Required: 89 hours Training: 
2022-07-09 19:35:04,657-Speed 2504.42 samples/sec Loss 2.0111 LearningRate 0.000270 Epoch: 21 Global Step: 441850 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:35:12,855-Speed 2498.56 samples/sec Loss 1.9765 LearningRate 0.000270 Epoch: 21 Global Step: 441860 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:35:21,055-Speed 2498.00 samples/sec Loss 2.0042 LearningRate 0.000270 Epoch: 21 Global Step: 441870 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:35:29,272-Speed 2492.80 samples/sec Loss 2.0383 LearningRate 0.000270 Epoch: 21 Global Step: 441880 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:35:37,478-Speed 2496.07 samples/sec Loss 1.9686 LearningRate 0.000270 Epoch: 21 Global Step: 441890 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:35:45,677-Speed 2498.61 samples/sec Loss 1.9661 LearningRate 0.000270 Epoch: 21 Global Step: 441900 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:35:53,821-Speed 2515.18 samples/sec Loss 2.0279 LearningRate 0.000270 Epoch: 21 Global Step: 441910 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:36:02,018-Speed 2498.74 samples/sec Loss 1.9574 LearningRate 0.000270 Epoch: 21 Global Step: 441920 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:36:10,214-Speed 2499.10 samples/sec Loss 2.0093 LearningRate 0.000270 Epoch: 21 Global Step: 441930 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:36:18,411-Speed 2498.89 samples/sec Loss 1.9956 LearningRate 0.000270 Epoch: 21 Global Step: 441940 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:36:26,608-Speed 2498.80 samples/sec Loss 1.9678 LearningRate 0.000270 Epoch: 21 Global Step: 441950 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:36:34,812-Speed 2496.86 samples/sec Loss 1.9711 LearningRate 0.000270 Epoch: 21 Global Step: 441960 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 
19:36:42,953-Speed 2516.03 samples/sec Loss 1.9962 LearningRate 0.000270 Epoch: 21 Global Step: 441970 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:36:51,151-Speed 2498.41 samples/sec Loss 2.0236 LearningRate 0.000269 Epoch: 21 Global Step: 441980 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:36:59,354-Speed 2497.47 samples/sec Loss 2.0005 LearningRate 0.000269 Epoch: 21 Global Step: 441990 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:37:07,553-Speed 2498.66 samples/sec Loss 2.0136 LearningRate 0.000269 Epoch: 21 Global Step: 442000 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:37:15,752-Speed 2498.19 samples/sec Loss 1.9526 LearningRate 0.000269 Epoch: 21 Global Step: 442010 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:37:23,953-Speed 2497.56 samples/sec Loss 1.9887 LearningRate 0.000269 Epoch: 21 Global Step: 442020 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:37:32,096-Speed 2515.75 samples/sec Loss 2.0060 LearningRate 0.000269 Epoch: 21 Global Step: 442030 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:37:40,290-Speed 2499.69 samples/sec Loss 1.9986 LearningRate 0.000269 Epoch: 21 Global Step: 442040 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:37:48,486-Speed 2499.30 samples/sec Loss 2.0200 LearningRate 0.000269 Epoch: 21 Global Step: 442050 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:37:56,683-Speed 2498.81 samples/sec Loss 1.9931 LearningRate 0.000269 Epoch: 21 Global Step: 442060 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:38:04,881-Speed 2498.43 samples/sec Loss 1.9968 LearningRate 0.000269 Epoch: 21 Global Step: 442070 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:38:13,079-Speed 2498.68 samples/sec Loss 1.9763 LearningRate 0.000269 Epoch: 21 Global Step: 442080 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:38:21,223-Speed 
2515.11 samples/sec Loss 2.0029 LearningRate 0.000269 Epoch: 21 Global Step: 442090 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:38:29,421-Speed 2498.55 samples/sec Loss 2.0274 LearningRate 0.000269 Epoch: 21 Global Step: 442100 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:38:37,617-Speed 2499.44 samples/sec Loss 2.0195 LearningRate 0.000269 Epoch: 21 Global Step: 442110 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:38:45,813-Speed 2499.14 samples/sec Loss 1.9778 LearningRate 0.000269 Epoch: 21 Global Step: 442120 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:38:54,018-Speed 2496.32 samples/sec Loss 1.9918 LearningRate 0.000269 Epoch: 21 Global Step: 442130 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:39:02,218-Speed 2497.94 samples/sec Loss 1.9388 LearningRate 0.000269 Epoch: 21 Global Step: 442140 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:39:10,362-Speed 2514.91 samples/sec Loss 1.9772 LearningRate 0.000269 Epoch: 21 Global Step: 442150 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:39:18,559-Speed 2499.37 samples/sec Loss 1.9981 LearningRate 0.000269 Epoch: 21 Global Step: 442160 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:39:26,753-Speed 2499.89 samples/sec Loss 2.0251 LearningRate 0.000269 Epoch: 21 Global Step: 442170 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:39:34,950-Speed 2498.70 samples/sec Loss 2.0015 LearningRate 0.000269 Epoch: 21 Global Step: 442180 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:39:43,151-Speed 2497.81 samples/sec Loss 1.9794 LearningRate 0.000269 Epoch: 21 Global Step: 442190 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:39:51,353-Speed 2497.44 samples/sec Loss 1.9973 LearningRate 0.000269 Epoch: 21 Global Step: 442200 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:39:59,502-Speed 2513.51 samples/sec 
Loss 2.0470 LearningRate 0.000269 Epoch: 21 Global Step: 442210 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:40:07,703-Speed 2498.03 samples/sec Loss 2.0247 LearningRate 0.000269 Epoch: 21 Global Step: 442220 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:40:15,900-Speed 2499.04 samples/sec Loss 2.0133 LearningRate 0.000269 Epoch: 21 Global Step: 442230 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:40:24,098-Speed 2498.89 samples/sec Loss 1.9871 LearningRate 0.000269 Epoch: 21 Global Step: 442240 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:40:32,294-Speed 2499.18 samples/sec Loss 2.0062 LearningRate 0.000269 Epoch: 21 Global Step: 442250 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:40:40,492-Speed 2498.49 samples/sec Loss 1.9502 LearningRate 0.000269 Epoch: 21 Global Step: 442260 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:40:48,662-Speed 2507.38 samples/sec Loss 1.9778 LearningRate 0.000269 Epoch: 21 Global Step: 442270 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:40:56,867-Speed 2496.49 samples/sec Loss 1.9877 LearningRate 0.000269 Epoch: 21 Global Step: 442280 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:41:05,068-Speed 2497.86 samples/sec Loss 1.9895 LearningRate 0.000269 Epoch: 21 Global Step: 442290 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:41:13,274-Speed 2496.01 samples/sec Loss 1.9645 LearningRate 0.000269 Epoch: 21 Global Step: 442300 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:41:21,472-Speed 2498.40 samples/sec Loss 2.0108 LearningRate 0.000269 Epoch: 21 Global Step: 442310 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:41:29,677-Speed 2496.55 samples/sec Loss 2.0187 LearningRate 0.000269 Epoch: 21 Global Step: 442320 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:41:37,821-Speed 2515.10 samples/sec Loss 1.9824 
LearningRate 0.000269 Epoch: 21 Global Step: 442330 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:41:46,024-Speed 2497.23 samples/sec Loss 1.9503 LearningRate 0.000269 Epoch: 21 Global Step: 442340 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:41:54,225-Speed 2497.58 samples/sec Loss 1.9595 LearningRate 0.000269 Epoch: 21 Global Step: 442350 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:42:02,422-Speed 2498.89 samples/sec Loss 1.9822 LearningRate 0.000269 Epoch: 21 Global Step: 442360 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:42:10,625-Speed 2496.98 samples/sec Loss 2.0201 LearningRate 0.000269 Epoch: 21 Global Step: 442370 Fp16 Grad Scale: 8192 Required: 89 hours Training: 2022-07-09 19:42:18,840-Speed 2493.25 samples/sec Loss 1.9970 LearningRate 0.000269 Epoch: 21 Global Step: 442380 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:42:26,988-Speed 2513.98 samples/sec Loss 1.9909 LearningRate 0.000269 Epoch: 21 Global Step: 442390 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:42:35,190-Speed 2497.50 samples/sec Loss 2.0031 LearningRate 0.000269 Epoch: 21 Global Step: 442400 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:42:43,389-Speed 2498.18 samples/sec Loss 2.0175 LearningRate 0.000269 Epoch: 21 Global Step: 442410 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:42:51,588-Speed 2498.54 samples/sec Loss 2.0705 LearningRate 0.000269 Epoch: 21 Global Step: 442420 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:42:59,786-Speed 2498.48 samples/sec Loss 2.0608 LearningRate 0.000269 Epoch: 21 Global Step: 442430 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:43:07,987-Speed 2497.29 samples/sec Loss 1.9948 LearningRate 0.000269 Epoch: 21 Global Step: 442440 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:43:16,139-Speed 2513.26 samples/sec Loss 2.0085 LearningRate 
0.000269 Epoch: 21 Global Step: 442450 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:43:24,352-Speed 2494.08 samples/sec Loss 1.9961 LearningRate 0.000269 Epoch: 21 Global Step: 442460 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:43:32,550-Speed 2498.45 samples/sec Loss 2.0491 LearningRate 0.000269 Epoch: 21 Global Step: 442470 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:43:40,753-Speed 2496.91 samples/sec Loss 1.9873 LearningRate 0.000269 Epoch: 21 Global Step: 442480 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:43:48,953-Speed 2498.07 samples/sec Loss 2.0314 LearningRate 0.000269 Epoch: 21 Global Step: 442490 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:43:57,154-Speed 2497.60 samples/sec Loss 1.9798 LearningRate 0.000269 Epoch: 21 Global Step: 442500 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:44:05,312-Speed 2511.10 samples/sec Loss 2.0278 LearningRate 0.000269 Epoch: 21 Global Step: 442510 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:44:13,508-Speed 2499.84 samples/sec Loss 2.0271 LearningRate 0.000269 Epoch: 21 Global Step: 442520 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:44:21,712-Speed 2496.63 samples/sec Loss 1.9979 LearningRate 0.000269 Epoch: 21 Global Step: 442530 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:44:29,913-Speed 2497.80 samples/sec Loss 2.0442 LearningRate 0.000269 Epoch: 21 Global Step: 442540 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:44:38,111-Speed 2498.37 samples/sec Loss 1.9860 LearningRate 0.000269 Epoch: 21 Global Step: 442550 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:44:46,328-Speed 2492.83 samples/sec Loss 2.0838 LearningRate 0.000269 Epoch: 21 Global Step: 442560 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:44:54,480-Speed 2512.50 samples/sec Loss 2.0629 LearningRate 
0.000269 Epoch: 21 Global Step: 442570 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:45:02,683-Speed 2497.06 samples/sec Loss 2.0362 LearningRate 0.000269 Epoch: 21 Global Step: 442580 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:45:10,888-Speed 2496.51 samples/sec Loss 2.0157 LearningRate 0.000269 Epoch: 21 Global Step: 442590 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:45:19,103-Speed 2493.09 samples/sec Loss 1.9729 LearningRate 0.000269 Epoch: 21 Global Step: 442600 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:45:27,310-Speed 2495.87 samples/sec Loss 2.0271 LearningRate 0.000269 Epoch: 21 Global Step: 442610 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:45:35,518-Speed 2495.39 samples/sec Loss 2.0140 LearningRate 0.000269 Epoch: 21 Global Step: 442620 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:45:43,667-Speed 2513.86 samples/sec Loss 2.0088 LearningRate 0.000269 Epoch: 21 Global Step: 442630 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:45:51,868-Speed 2497.55 samples/sec Loss 2.0113 LearningRate 0.000269 Epoch: 21 Global Step: 442640 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:46:00,071-Speed 2497.15 samples/sec Loss 1.9796 LearningRate 0.000269 Epoch: 21 Global Step: 442650 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:46:08,273-Speed 2497.15 samples/sec Loss 2.0517 LearningRate 0.000269 Epoch: 21 Global Step: 442660 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:46:16,487-Speed 2493.76 samples/sec Loss 2.0181 LearningRate 0.000269 Epoch: 21 Global Step: 442670 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:46:24,685-Speed 2498.37 samples/sec Loss 2.0516 LearningRate 0.000269 Epoch: 21 Global Step: 442680 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:46:32,837-Speed 2512.65 samples/sec Loss 2.0236 LearningRate 
0.000269 Epoch: 21 Global Step: 442690 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:46:41,035-Speed 2498.81 samples/sec Loss 2.0214 LearningRate 0.000268 Epoch: 21 Global Step: 442700 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:46:49,233-Speed 2498.52 samples/sec Loss 2.0286 LearningRate 0.000268 Epoch: 21 Global Step: 442710 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:46:57,430-Speed 2498.86 samples/sec Loss 2.0331 LearningRate 0.000268 Epoch: 21 Global Step: 442720 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:47:05,626-Speed 2499.46 samples/sec Loss 2.0243 LearningRate 0.000268 Epoch: 21 Global Step: 442730 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:47:13,822-Speed 2499.10 samples/sec Loss 2.0368 LearningRate 0.000268 Epoch: 21 Global Step: 442740 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:47:21,971-Speed 2513.72 samples/sec Loss 2.0162 LearningRate 0.000268 Epoch: 21 Global Step: 442750 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:47:30,173-Speed 2497.21 samples/sec Loss 2.0189 LearningRate 0.000268 Epoch: 21 Global Step: 442760 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:47:38,374-Speed 2497.47 samples/sec Loss 2.0115 LearningRate 0.000268 Epoch: 21 Global Step: 442770 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:47:46,577-Speed 2497.43 samples/sec Loss 2.0214 LearningRate 0.000268 Epoch: 21 Global Step: 442780 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:47:54,796-Speed 2492.45 samples/sec Loss 1.9925 LearningRate 0.000268 Epoch: 21 Global Step: 442790 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:48:02,998-Speed 2497.18 samples/sec Loss 2.0005 LearningRate 0.000268 Epoch: 21 Global Step: 442800 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:48:11,164-Speed 2508.55 samples/sec Loss 1.9855 LearningRate 
0.000268 Epoch: 21 Global Step: 442810 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:48:19,364-Speed 2497.93 samples/sec Loss 2.0031 LearningRate 0.000268 Epoch: 21 Global Step: 442820 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:48:27,566-Speed 2497.12 samples/sec Loss 1.9291 LearningRate 0.000268 Epoch: 21 Global Step: 442830 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:48:35,766-Speed 2498.33 samples/sec Loss 1.9811 LearningRate 0.000268 Epoch: 21 Global Step: 442840 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:48:44,001-Speed 2487.62 samples/sec Loss 1.9491 LearningRate 0.000268 Epoch: 21 Global Step: 442850 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:48:52,204-Speed 2496.97 samples/sec Loss 1.9833 LearningRate 0.000268 Epoch: 21 Global Step: 442860 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:49:00,351-Speed 2514.21 samples/sec Loss 1.9926 LearningRate 0.000268 Epoch: 21 Global Step: 442870 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:49:08,551-Speed 2497.91 samples/sec Loss 1.9564 LearningRate 0.000268 Epoch: 21 Global Step: 442880 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:49:16,753-Speed 2497.30 samples/sec Loss 2.0087 LearningRate 0.000268 Epoch: 21 Global Step: 442890 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:49:24,957-Speed 2496.89 samples/sec Loss 1.9698 LearningRate 0.000268 Epoch: 21 Global Step: 442900 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:49:33,163-Speed 2496.33 samples/sec Loss 1.9785 LearningRate 0.000268 Epoch: 21 Global Step: 442910 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:49:41,367-Speed 2497.04 samples/sec Loss 2.0381 LearningRate 0.000268 Epoch: 21 Global Step: 442920 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:49:49,517-Speed 2513.34 samples/sec Loss 2.0162 LearningRate 
0.000268 Epoch: 21 Global Step: 442930 Fp16 Grad Scale: 16384 Required: 89 hours Training: 2022-07-09 19:49:57,719-Speed 2497.41 samples/sec Loss 2.0056 LearningRate 0.000268 Epoch: 21 Global Step: 442940 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:50:05,919-Speed 2497.76 samples/sec Loss 2.0129 LearningRate 0.000268 Epoch: 21 Global Step: 442950 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:50:14,122-Speed 2497.31 samples/sec Loss 1.9773 LearningRate 0.000268 Epoch: 21 Global Step: 442960 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:50:22,324-Speed 2497.27 samples/sec Loss 2.0026 LearningRate 0.000268 Epoch: 21 Global Step: 442970 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:50:30,524-Speed 2498.00 samples/sec Loss 2.0853 LearningRate 0.000268 Epoch: 21 Global Step: 442980 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:50:38,671-Speed 2513.93 samples/sec Loss 2.0323 LearningRate 0.000268 Epoch: 21 Global Step: 442990 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:50:46,868-Speed 2498.91 samples/sec Loss 2.0305 LearningRate 0.000268 Epoch: 21 Global Step: 443000 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:50:55,073-Speed 2496.60 samples/sec Loss 2.0688 LearningRate 0.000268 Epoch: 21 Global Step: 443010 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:51:03,288-Speed 2493.30 samples/sec Loss 1.9904 LearningRate 0.000268 Epoch: 21 Global Step: 443020 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:51:11,486-Speed 2498.64 samples/sec Loss 2.0234 LearningRate 0.000268 Epoch: 21 Global Step: 443030 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:51:19,694-Speed 2495.50 samples/sec Loss 2.0325 LearningRate 0.000268 Epoch: 21 Global Step: 443040 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:51:27,838-Speed 2515.18 samples/sec Loss 1.9820 LearningRate 
0.000268 Epoch: 21 Global Step: 443050 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:51:36,044-Speed 2496.32 samples/sec Loss 1.9984 LearningRate 0.000268 Epoch: 21 Global Step: 443060 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:51:44,265-Speed 2491.53 samples/sec Loss 2.0087 LearningRate 0.000268 Epoch: 21 Global Step: 443070 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:51:52,467-Speed 2497.26 samples/sec Loss 1.9948 LearningRate 0.000268 Epoch: 21 Global Step: 443080 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:52:00,676-Speed 2495.12 samples/sec Loss 2.0079 LearningRate 0.000268 Epoch: 21 Global Step: 443090 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:52:08,881-Speed 2496.49 samples/sec Loss 1.9762 LearningRate 0.000268 Epoch: 21 Global Step: 443100 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:52:17,028-Speed 2514.25 samples/sec Loss 2.0171 LearningRate 0.000268 Epoch: 21 Global Step: 443110 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:52:25,237-Speed 2494.88 samples/sec Loss 2.0033 LearningRate 0.000268 Epoch: 21 Global Step: 443120 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:52:33,445-Speed 2495.84 samples/sec Loss 1.9780 LearningRate 0.000268 Epoch: 21 Global Step: 443130 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:52:41,646-Speed 2497.52 samples/sec Loss 2.0136 LearningRate 0.000268 Epoch: 21 Global Step: 443140 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:52:49,868-Speed 2491.12 samples/sec Loss 2.0110 LearningRate 0.000268 Epoch: 21 Global Step: 443150 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:52:58,070-Speed 2497.55 samples/sec Loss 1.9757 LearningRate 0.000268 Epoch: 21 Global Step: 443160 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:53:06,226-Speed 2511.25 samples/sec Loss 2.0141 LearningRate 
0.000268 Epoch: 21 Global Step: 443170 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:53:14,434-Speed 2495.73 samples/sec Loss 1.9768 LearningRate 0.000268 Epoch: 21 Global Step: 443180 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:53:22,636-Speed 2497.44 samples/sec Loss 2.0258 LearningRate 0.000268 Epoch: 21 Global Step: 443190 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:53:30,839-Speed 2496.98 samples/sec Loss 2.0718 LearningRate 0.000268 Epoch: 21 Global Step: 443200 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:53:39,047-Speed 2495.39 samples/sec Loss 2.0381 LearningRate 0.000268 Epoch: 21 Global Step: 443210 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:53:47,248-Speed 2497.57 samples/sec Loss 2.0331 LearningRate 0.000268 Epoch: 21 Global Step: 443220 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:53:55,402-Speed 2512.16 samples/sec Loss 1.9825 LearningRate 0.000268 Epoch: 21 Global Step: 443230 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:54:03,615-Speed 2494.01 samples/sec Loss 2.0237 LearningRate 0.000268 Epoch: 21 Global Step: 443240 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:54:11,815-Speed 2498.01 samples/sec Loss 2.0564 LearningRate 0.000268 Epoch: 21 Global Step: 443250 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:54:20,013-Speed 2498.43 samples/sec Loss 2.0136 LearningRate 0.000268 Epoch: 21 Global Step: 443260 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:54:28,211-Speed 2498.42 samples/sec Loss 2.0194 LearningRate 0.000268 Epoch: 21 Global Step: 443270 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:54:36,410-Speed 2498.30 samples/sec Loss 2.0059 LearningRate 0.000268 Epoch: 21 Global Step: 443280 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:54:44,558-Speed 2513.97 samples/sec Loss 1.9816 LearningRate 
0.000268 Epoch: 21 Global Step: 443290 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:54:52,779-Speed 2491.44 samples/sec Loss 2.0048 LearningRate 0.000268 Epoch: 21 Global Step: 443300 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:55:00,982-Speed 2497.48 samples/sec Loss 2.0278 LearningRate 0.000268 Epoch: 21 Global Step: 443310 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:55:09,186-Speed 2496.88 samples/sec Loss 1.9419 LearningRate 0.000268 Epoch: 21 Global Step: 443320 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:55:17,398-Speed 2494.30 samples/sec Loss 2.0217 LearningRate 0.000268 Epoch: 21 Global Step: 443330 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:55:25,600-Speed 2497.59 samples/sec Loss 1.9851 LearningRate 0.000268 Epoch: 21 Global Step: 443340 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:55:33,750-Speed 2513.28 samples/sec Loss 1.9772 LearningRate 0.000268 Epoch: 21 Global Step: 443350 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:55:41,958-Speed 2495.39 samples/sec Loss 2.0345 LearningRate 0.000268 Epoch: 21 Global Step: 443360 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:55:50,165-Speed 2495.85 samples/sec Loss 1.9799 LearningRate 0.000268 Epoch: 21 Global Step: 443370 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:55:58,372-Speed 2495.54 samples/sec Loss 2.0189 LearningRate 0.000268 Epoch: 21 Global Step: 443380 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:56:06,572-Speed 2498.06 samples/sec Loss 2.0402 LearningRate 0.000268 Epoch: 21 Global Step: 443390 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:56:14,777-Speed 2496.43 samples/sec Loss 2.0180 LearningRate 0.000268 Epoch: 21 Global Step: 443400 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:56:22,927-Speed 2513.13 samples/sec Loss 2.0200 LearningRate 
0.000268 Epoch: 21 Global Step: 443410 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:56:31,126-Speed 2498.30 samples/sec Loss 1.9500 LearningRate 0.000267 Epoch: 21 Global Step: 443420 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:56:39,335-Speed 2495.99 samples/sec Loss 2.0157 LearningRate 0.000267 Epoch: 21 Global Step: 443430 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:56:47,542-Speed 2495.80 samples/sec Loss 2.0378 LearningRate 0.000267 Epoch: 21 Global Step: 443440 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:56:55,745-Speed 2496.96 samples/sec Loss 1.9720 LearningRate 0.000267 Epoch: 21 Global Step: 443450 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:57:03,969-Speed 2490.71 samples/sec Loss 1.9993 LearningRate 0.000267 Epoch: 21 Global Step: 443460 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:57:12,115-Speed 2514.37 samples/sec Loss 1.9788 LearningRate 0.000267 Epoch: 21 Global Step: 443470 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:57:20,318-Speed 2497.10 samples/sec Loss 2.0183 LearningRate 0.000267 Epoch: 21 Global Step: 443480 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:57:28,514-Speed 2499.43 samples/sec Loss 2.0111 LearningRate 0.000267 Epoch: 21 Global Step: 443490 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:57:36,714-Speed 2497.75 samples/sec Loss 2.0128 LearningRate 0.000267 Epoch: 21 Global Step: 443500 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:57:44,916-Speed 2497.33 samples/sec Loss 1.9776 LearningRate 0.000267 Epoch: 21 Global Step: 443510 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:57:53,120-Speed 2496.85 samples/sec Loss 1.9745 LearningRate 0.000267 Epoch: 21 Global Step: 443520 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:58:01,279-Speed 2510.29 samples/sec Loss 1.9836 LearningRate 
0.000267 Epoch: 21 Global Step: 443530 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:58:09,491-Speed 2494.42 samples/sec Loss 1.9979 LearningRate 0.000267 Epoch: 21 Global Step: 443540 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:58:17,719-Speed 2489.26 samples/sec Loss 1.9966 LearningRate 0.000267 Epoch: 21 Global Step: 443550 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:58:25,924-Speed 2496.59 samples/sec Loss 2.0479 LearningRate 0.000267 Epoch: 21 Global Step: 443560 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:58:34,132-Speed 2495.33 samples/sec Loss 2.0636 LearningRate 0.000267 Epoch: 21 Global Step: 443570 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 19:58:42,336-Speed 2496.81 samples/sec Loss 2.0061 LearningRate 0.000267 Epoch: 21 Global Step: 443580 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 19:58:50,480-Speed 2514.99 samples/sec Loss 2.0188 LearningRate 0.000267 Epoch: 21 Global Step: 443590 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 19:58:58,710-Speed 2489.48 samples/sec Loss 2.0389 LearningRate 0.000267 Epoch: 21 Global Step: 443600 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 19:59:06,910-Speed 2497.67 samples/sec Loss 2.0047 LearningRate 0.000267 Epoch: 21 Global Step: 443610 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 19:59:15,134-Speed 2490.72 samples/sec Loss 1.9819 LearningRate 0.000267 Epoch: 21 Global Step: 443620 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 19:59:23,332-Speed 2498.93 samples/sec Loss 2.0162 LearningRate 0.000267 Epoch: 21 Global Step: 443630 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 19:59:31,536-Speed 2496.95 samples/sec Loss 2.0059 LearningRate 0.000267 Epoch: 21 Global Step: 443640 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 19:59:39,682-Speed 2514.42 samples/sec Loss 2.0094 LearningRate 
0.000267 Epoch: 21 Global Step: 443650 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 19:59:47,894-Speed 2494.44 samples/sec Loss 1.9995 LearningRate 0.000267 Epoch: 21 Global Step: 443660 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 19:59:56,097-Speed 2496.94 samples/sec Loss 2.0116 LearningRate 0.000267 Epoch: 21 Global Step: 443670 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:00:04,305-Speed 2495.49 samples/sec Loss 2.0177 LearningRate 0.000267 Epoch: 21 Global Step: 443680 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:00:12,505-Speed 2498.07 samples/sec Loss 1.9918 LearningRate 0.000267 Epoch: 21 Global Step: 443690 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:00:20,704-Speed 2498.29 samples/sec Loss 1.9766 LearningRate 0.000267 Epoch: 21 Global Step: 443700 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:00:28,849-Speed 2514.83 samples/sec Loss 2.0411 LearningRate 0.000267 Epoch: 21 Global Step: 443710 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:00:37,052-Speed 2496.97 samples/sec Loss 2.0025 LearningRate 0.000267 Epoch: 21 Global Step: 443720 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:00:45,254-Speed 2497.30 samples/sec Loss 1.9689 LearningRate 0.000267 Epoch: 21 Global Step: 443730 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:00:53,454-Speed 2498.34 samples/sec Loss 2.0097 LearningRate 0.000267 Epoch: 21 Global Step: 443740 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:01:01,655-Speed 2497.47 samples/sec Loss 1.9796 LearningRate 0.000267 Epoch: 21 Global Step: 443750 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:01:09,861-Speed 2496.34 samples/sec Loss 1.9945 LearningRate 0.000267 Epoch: 21 Global Step: 443760 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:01:18,011-Speed 2513.38 samples/sec Loss 2.0132 LearningRate 
0.000267 Epoch: 21 Global Step: 443770 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:01:26,220-Speed 2495.07 samples/sec Loss 1.9893 LearningRate 0.000267 Epoch: 21 Global Step: 443780 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:01:34,420-Speed 2497.90 samples/sec Loss 2.0001 LearningRate 0.000267 Epoch: 21 Global Step: 443790 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:01:42,620-Speed 2498.05 samples/sec Loss 1.9832 LearningRate 0.000267 Epoch: 21 Global Step: 443800 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:01:50,824-Speed 2496.69 samples/sec Loss 2.0435 LearningRate 0.000267 Epoch: 21 Global Step: 443810 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:01:59,025-Speed 2497.90 samples/sec Loss 2.0637 LearningRate 0.000267 Epoch: 21 Global Step: 443820 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:02:07,170-Speed 2514.59 samples/sec Loss 2.1128 LearningRate 0.000267 Epoch: 21 Global Step: 443830 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:02:15,373-Speed 2497.25 samples/sec Loss 2.1185 LearningRate 0.000267 Epoch: 21 Global Step: 443840 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:02:23,604-Speed 2488.59 samples/sec Loss 2.0904 LearningRate 0.000267 Epoch: 21 Global Step: 443850 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:02:31,804-Speed 2497.99 samples/sec Loss 2.1523 LearningRate 0.000267 Epoch: 21 Global Step: 443860 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:02:40,022-Speed 2492.71 samples/sec Loss 2.0622 LearningRate 0.000267 Epoch: 21 Global Step: 443870 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:02:48,224-Speed 2497.34 samples/sec Loss 2.0701 LearningRate 0.000267 Epoch: 21 Global Step: 443880 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:02:56,368-Speed 2515.24 samples/sec Loss 2.0854 LearningRate 
0.000267 Epoch: 21 Global Step: 443890 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:03:04,574-Speed 2497.69 samples/sec Loss 2.0419 LearningRate 0.000267 Epoch: 21 Global Step: 443900 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:03:12,773-Speed 2498.25 samples/sec Loss 2.0429 LearningRate 0.000267 Epoch: 21 Global Step: 443910 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:03:20,972-Speed 2498.31 samples/sec Loss 2.0793 LearningRate 0.000267 Epoch: 21 Global Step: 443920 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:03:29,169-Speed 2498.76 samples/sec Loss 2.0494 LearningRate 0.000267 Epoch: 21 Global Step: 443930 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:03:37,372-Speed 2497.40 samples/sec Loss 2.0519 LearningRate 0.000267 Epoch: 21 Global Step: 443940 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:03:45,532-Speed 2510.51 samples/sec Loss 2.0731 LearningRate 0.000267 Epoch: 21 Global Step: 443950 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:03:53,733-Speed 2497.59 samples/sec Loss 2.0017 LearningRate 0.000267 Epoch: 21 Global Step: 443960 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:04:01,934-Speed 2497.58 samples/sec Loss 2.0391 LearningRate 0.000267 Epoch: 21 Global Step: 443970 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:04:10,136-Speed 2497.52 samples/sec Loss 2.0513 LearningRate 0.000267 Epoch: 21 Global Step: 443980 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:04:18,333-Speed 2498.67 samples/sec Loss 1.9775 LearningRate 0.000267 Epoch: 21 Global Step: 443990 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:04:26,534-Speed 2497.73 samples/sec Loss 1.9915 LearningRate 0.000267 Epoch: 21 Global Step: 444000 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:04:34,688-Speed 2512.13 samples/sec Loss 1.9744 LearningRate 
0.000267 Epoch: 21 Global Step: 444010 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:04:42,899-Speed 2494.50 samples/sec Loss 2.0092 LearningRate 0.000267 Epoch: 21 Global Step: 444020 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:04:51,105-Speed 2496.29 samples/sec Loss 2.0012 LearningRate 0.000267 Epoch: 21 Global Step: 444030 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:04:59,311-Speed 2496.09 samples/sec Loss 1.9736 LearningRate 0.000267 Epoch: 21 Global Step: 444040 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:05:07,515-Speed 2496.59 samples/sec Loss 1.9919 LearningRate 0.000267 Epoch: 21 Global Step: 444050 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:05:15,715-Speed 2497.99 samples/sec Loss 2.0131 LearningRate 0.000267 Epoch: 21 Global Step: 444060 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:05:23,865-Speed 2513.21 samples/sec Loss 1.9893 LearningRate 0.000267 Epoch: 21 Global Step: 444070 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:05:32,064-Speed 2498.40 samples/sec Loss 1.9906 LearningRate 0.000267 Epoch: 21 Global Step: 444080 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:05:40,268-Speed 2496.85 samples/sec Loss 2.0140 LearningRate 0.000267 Epoch: 21 Global Step: 444090 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:05:48,475-Speed 2495.98 samples/sec Loss 2.0132 LearningRate 0.000267 Epoch: 21 Global Step: 444100 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:05:56,678-Speed 2497.16 samples/sec Loss 2.0068 LearningRate 0.000267 Epoch: 21 Global Step: 444110 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:06:04,878-Speed 2498.33 samples/sec Loss 1.9980 LearningRate 0.000267 Epoch: 21 Global Step: 444120 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:06:13,031-Speed 2512.58 samples/sec Loss 2.0170 LearningRate 
0.000267 Epoch: 21 Global Step: 444130 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:06:21,249-Speed 2492.23 samples/sec Loss 2.0131 LearningRate 0.000266 Epoch: 21 Global Step: 444140 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:06:29,464-Speed 2493.30 samples/sec Loss 2.0015 LearningRate 0.000266 Epoch: 21 Global Step: 444150 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:06:37,691-Speed 2490.03 samples/sec Loss 1.9932 LearningRate 0.000266 Epoch: 21 Global Step: 444160 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:06:45,896-Speed 2496.28 samples/sec Loss 1.9536 LearningRate 0.000266 Epoch: 21 Global Step: 444170 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:06:54,097-Speed 2497.57 samples/sec Loss 2.0022 LearningRate 0.000266 Epoch: 21 Global Step: 444180 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:07:02,244-Speed 2514.62 samples/sec Loss 2.0394 LearningRate 0.000266 Epoch: 21 Global Step: 444190 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:07:10,443-Speed 2498.07 samples/sec Loss 1.9821 LearningRate 0.000266 Epoch: 21 Global Step: 444200 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:07:18,641-Speed 2498.43 samples/sec Loss 1.9810 LearningRate 0.000266 Epoch: 21 Global Step: 444210 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:07:26,844-Speed 2497.39 samples/sec Loss 2.0068 LearningRate 0.000266 Epoch: 21 Global Step: 444220 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:07:35,044-Speed 2497.98 samples/sec Loss 1.9858 LearningRate 0.000266 Epoch: 21 Global Step: 444230 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:07:43,241-Speed 2499.16 samples/sec Loss 1.9499 LearningRate 0.000266 Epoch: 21 Global Step: 444240 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:07:51,389-Speed 2513.80 samples/sec Loss 1.9773 LearningRate 
0.000266 Epoch: 21 Global Step: 444250 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:07:59,593-Speed 2496.70 samples/sec Loss 2.0205 LearningRate 0.000266 Epoch: 21 Global Step: 444260 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:08:07,796-Speed 2497.10 samples/sec Loss 1.9509 LearningRate 0.000266 Epoch: 21 Global Step: 444270 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:08:15,993-Speed 2498.86 samples/sec Loss 2.0327 LearningRate 0.000266 Epoch: 21 Global Step: 444280 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:08:24,203-Speed 2494.94 samples/sec Loss 1.9822 LearningRate 0.000266 Epoch: 21 Global Step: 444290 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:08:32,414-Speed 2494.70 samples/sec Loss 1.9454 LearningRate 0.000266 Epoch: 21 Global Step: 444300 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:08:40,563-Speed 2514.14 samples/sec Loss 2.0161 LearningRate 0.000266 Epoch: 21 Global Step: 444310 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:08:48,763-Speed 2497.89 samples/sec Loss 1.9784 LearningRate 0.000266 Epoch: 21 Global Step: 444320 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:08:56,970-Speed 2495.77 samples/sec Loss 1.9636 LearningRate 0.000266 Epoch: 21 Global Step: 444330 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:09:05,169-Speed 2498.37 samples/sec Loss 1.9510 LearningRate 0.000266 Epoch: 21 Global Step: 444340 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:09:13,326-Speed 2511.01 samples/sec Loss 1.9798 LearningRate 0.000266 Epoch: 21 Global Step: 444350 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:09:21,532-Speed 2496.12 samples/sec Loss 1.9917 LearningRate 0.000266 Epoch: 21 Global Step: 444360 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:09:29,678-Speed 2514.42 samples/sec Loss 1.9680 LearningRate 
0.000266 Epoch: 21 Global Step: 444370 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:09:37,893-Speed 2493.63 samples/sec Loss 2.0383 LearningRate 0.000266 Epoch: 21 Global Step: 444380 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:09:46,105-Speed 2493.98 samples/sec Loss 1.9948 LearningRate 0.000266 Epoch: 21 Global Step: 444390 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:09:54,320-Speed 2493.38 samples/sec Loss 1.9867 LearningRate 0.000266 Epoch: 21 Global Step: 444400 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:10:02,521-Speed 2498.08 samples/sec Loss 2.0018 LearningRate 0.000266 Epoch: 21 Global Step: 444410 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:10:10,720-Speed 2497.95 samples/sec Loss 1.9951 LearningRate 0.000266 Epoch: 21 Global Step: 444420 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:10:18,868-Speed 2514.01 samples/sec Loss 1.9627 LearningRate 0.000266 Epoch: 21 Global Step: 444430 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:10:27,066-Speed 2498.60 samples/sec Loss 1.9899 LearningRate 0.000266 Epoch: 21 Global Step: 444440 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:10:35,266-Speed 2497.82 samples/sec Loss 1.9889 LearningRate 0.000266 Epoch: 21 Global Step: 444450 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:10:43,467-Speed 2497.80 samples/sec Loss 2.0503 LearningRate 0.000266 Epoch: 21 Global Step: 444460 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:10:51,693-Speed 2490.17 samples/sec Loss 2.0059 LearningRate 0.000266 Epoch: 21 Global Step: 444470 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:10:59,896-Speed 2497.14 samples/sec Loss 1.9851 LearningRate 0.000266 Epoch: 21 Global Step: 444480 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:11:08,045-Speed 2513.60 samples/sec Loss 2.0112 LearningRate 
0.000266 Epoch: 21 Global Step: 444490 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:11:16,244-Speed 2498.16 samples/sec Loss 2.0112 LearningRate 0.000266 Epoch: 21 Global Step: 444500 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:11:24,448-Speed 2496.90 samples/sec Loss 2.0187 LearningRate 0.000266 Epoch: 21 Global Step: 444510 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:11:32,652-Speed 2496.69 samples/sec Loss 2.0322 LearningRate 0.000266 Epoch: 21 Global Step: 444520 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:11:40,848-Speed 2498.99 samples/sec Loss 2.0435 LearningRate 0.000266 Epoch: 21 Global Step: 444530 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:11:49,049-Speed 2497.94 samples/sec Loss 1.9848 LearningRate 0.000266 Epoch: 21 Global Step: 444540 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:11:57,204-Speed 2511.83 samples/sec Loss 1.9893 LearningRate 0.000266 Epoch: 21 Global Step: 444550 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:12:05,412-Speed 2495.55 samples/sec Loss 1.9931 LearningRate 0.000266 Epoch: 21 Global Step: 444560 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:12:13,616-Speed 2496.93 samples/sec Loss 1.9912 LearningRate 0.000266 Epoch: 21 Global Step: 444570 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:12:21,841-Speed 2490.42 samples/sec Loss 1.9827 LearningRate 0.000266 Epoch: 21 Global Step: 444580 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:12:30,041-Speed 2497.77 samples/sec Loss 2.0301 LearningRate 0.000266 Epoch: 21 Global Step: 444590 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:12:38,215-Speed 2505.85 samples/sec Loss 2.0285 LearningRate 0.000266 Epoch: 21 Global Step: 444600 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:12:46,363-Speed 2513.97 samples/sec Loss 1.9862 LearningRate 0.000266 
Epoch: 21 Global Step: 444610 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:12:54,562-Speed 2498.30 samples/sec Loss 2.0218 LearningRate 0.000266 Epoch: 21 Global Step: 444620 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:13:02,762-Speed 2497.86 samples/sec Loss 2.0016 LearningRate 0.000266 Epoch: 21 Global Step: 444630 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:13:10,971-Speed 2495.11 samples/sec Loss 1.9593 LearningRate 0.000266 Epoch: 21 Global Step: 444640 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:13:19,174-Speed 2497.17 samples/sec Loss 2.0005 LearningRate 0.000266 Epoch: 21 Global Step: 444650 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:13:27,385-Speed 2494.59 samples/sec Loss 1.9931 LearningRate 0.000266 Epoch: 21 Global Step: 444660 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:13:35,530-Speed 2514.74 samples/sec Loss 2.0304 LearningRate 0.000266 Epoch: 21 Global Step: 444670 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:13:43,728-Speed 2498.70 samples/sec Loss 2.0045 LearningRate 0.000266 Epoch: 21 Global Step: 444680 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:13:51,926-Speed 2498.70 samples/sec Loss 2.0030 LearningRate 0.000266 Epoch: 21 Global Step: 444690 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:14:00,126-Speed 2498.00 samples/sec Loss 2.0242 LearningRate 0.000266 Epoch: 21 Global Step: 444700 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:14:08,325-Speed 2498.14 samples/sec Loss 2.0031 LearningRate 0.000266 Epoch: 21 Global Step: 444710 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:14:16,523-Speed 2498.40 samples/sec Loss 2.0086 LearningRate 0.000266 Epoch: 21 Global Step: 444720 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:14:24,669-Speed 2514.47 samples/sec Loss 2.0191 LearningRate 0.000266 Epoch: 21 Global 
Step: 444730 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:14:32,868-Speed 2498.29 samples/sec Loss 2.0200 LearningRate 0.000266 Epoch: 21 Global Step: 444740 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:14:41,070-Speed 2497.29 samples/sec Loss 2.0503 LearningRate 0.000266 Epoch: 21 Global Step: 444750 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:14:49,268-Speed 2498.57 samples/sec Loss 1.9959 LearningRate 0.000266 Epoch: 21 Global Step: 444760 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:14:57,469-Speed 2497.81 samples/sec Loss 1.9916 LearningRate 0.000266 Epoch: 21 Global Step: 444770 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:15:05,678-Speed 2495.33 samples/sec Loss 2.0050 LearningRate 0.000266 Epoch: 21 Global Step: 444780 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:15:13,826-Speed 2513.92 samples/sec Loss 2.0265 LearningRate 0.000266 Epoch: 21 Global Step: 444790 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:15:22,024-Speed 2498.52 samples/sec Loss 1.9988 LearningRate 0.000266 Epoch: 21 Global Step: 444800 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:15:30,222-Speed 2498.84 samples/sec Loss 1.9863 LearningRate 0.000266 Epoch: 21 Global Step: 444810 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:15:38,422-Speed 2498.55 samples/sec Loss 2.0226 LearningRate 0.000266 Epoch: 21 Global Step: 444820 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:15:46,624-Speed 2497.39 samples/sec Loss 2.0300 LearningRate 0.000266 Epoch: 21 Global Step: 444830 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:15:54,821-Speed 2498.70 samples/sec Loss 2.0086 LearningRate 0.000266 Epoch: 21 Global Step: 444840 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:16:02,960-Speed 2516.84 samples/sec Loss 2.0042 LearningRate 0.000266 Epoch: 21 Global Step: 444850 Fp16 
Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:16:11,160-Speed 2498.06 samples/sec Loss 2.0148 LearningRate 0.000266 Epoch: 21 Global Step: 444860 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:16:19,360-Speed 2498.22 samples/sec Loss 2.0160 LearningRate 0.000265 Epoch: 21 Global Step: 444870 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:16:27,558-Speed 2498.51 samples/sec Loss 1.9843 LearningRate 0.000265 Epoch: 21 Global Step: 444880 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:16:35,760-Speed 2497.43 samples/sec Loss 2.0168 LearningRate 0.000265 Epoch: 21 Global Step: 444890 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:16:43,963-Speed 2497.28 samples/sec Loss 1.9903 LearningRate 0.000265 Epoch: 21 Global Step: 444900 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:16:52,116-Speed 2512.07 samples/sec Loss 1.9655 LearningRate 0.000265 Epoch: 21 Global Step: 444910 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:17:00,313-Speed 2499.02 samples/sec Loss 2.0101 LearningRate 0.000265 Epoch: 21 Global Step: 444920 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:17:08,513-Speed 2497.80 samples/sec Loss 1.9791 LearningRate 0.000265 Epoch: 21 Global Step: 444930 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:17:16,712-Speed 2498.59 samples/sec Loss 2.0065 LearningRate 0.000265 Epoch: 21 Global Step: 444940 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:17:24,914-Speed 2497.02 samples/sec Loss 1.9998 LearningRate 0.000265 Epoch: 21 Global Step: 444950 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:17:33,115-Speed 2497.60 samples/sec Loss 2.0098 LearningRate 0.000265 Epoch: 21 Global Step: 444960 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:17:41,271-Speed 2511.50 samples/sec Loss 1.9720 LearningRate 0.000265 Epoch: 21 Global Step: 444970 Fp16 Grad Scale: 8192 
Required: 88 hours Training: 2022-07-09 20:17:49,470-Speed 2498.68 samples/sec Loss 2.0179 LearningRate 0.000265 Epoch: 21 Global Step: 444980 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:17:57,673-Speed 2496.81 samples/sec Loss 1.9858 LearningRate 0.000265 Epoch: 21 Global Step: 444990 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:18:05,881-Speed 2495.55 samples/sec Loss 2.0039 LearningRate 0.000265 Epoch: 21 Global Step: 445000 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:18:14,079-Speed 2498.93 samples/sec Loss 2.0416 LearningRate 0.000265 Epoch: 21 Global Step: 445010 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:18:22,291-Speed 2494.11 samples/sec Loss 2.0111 LearningRate 0.000265 Epoch: 21 Global Step: 445020 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:18:30,439-Speed 2514.12 samples/sec Loss 1.9740 LearningRate 0.000265 Epoch: 21 Global Step: 445030 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:18:38,637-Speed 2498.43 samples/sec Loss 2.0047 LearningRate 0.000265 Epoch: 21 Global Step: 445040 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:18:46,836-Speed 2498.19 samples/sec Loss 1.9982 LearningRate 0.000265 Epoch: 21 Global Step: 445050 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:18:55,039-Speed 2497.12 samples/sec Loss 1.9722 LearningRate 0.000265 Epoch: 21 Global Step: 445060 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:19:03,236-Speed 2498.56 samples/sec Loss 1.9796 LearningRate 0.000265 Epoch: 21 Global Step: 445070 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:19:11,438-Speed 2497.61 samples/sec Loss 2.0101 LearningRate 0.000265 Epoch: 21 Global Step: 445080 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:19:19,610-Speed 2506.49 samples/sec Loss 1.9969 LearningRate 0.000265 Epoch: 21 Global Step: 445090 Fp16 Grad Scale: 8192 Required: 88 hours 
Training: 2022-07-09 20:19:27,807-Speed 2498.53 samples/sec Loss 1.9806 LearningRate 0.000265 Epoch: 21 Global Step: 445100 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:19:36,008-Speed 2497.77 samples/sec Loss 2.0244 LearningRate 0.000265 Epoch: 21 Global Step: 445110 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:19:44,206-Speed 2498.73 samples/sec Loss 1.9886 LearningRate 0.000265 Epoch: 21 Global Step: 445120 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:19:52,425-Speed 2492.19 samples/sec Loss 1.9660 LearningRate 0.000265 Epoch: 21 Global Step: 445130 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:20:00,629-Speed 2496.50 samples/sec Loss 1.9310 LearningRate 0.000265 Epoch: 21 Global Step: 445140 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:20:08,778-Speed 2513.76 samples/sec Loss 1.9612 LearningRate 0.000265 Epoch: 21 Global Step: 445150 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:20:16,976-Speed 2498.61 samples/sec Loss 2.0018 LearningRate 0.000265 Epoch: 21 Global Step: 445160 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:20:25,188-Speed 2494.22 samples/sec Loss 1.9853 LearningRate 0.000265 Epoch: 21 Global Step: 445170 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:20:33,388-Speed 2497.73 samples/sec Loss 2.0036 LearningRate 0.000265 Epoch: 21 Global Step: 445180 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:20:41,596-Speed 2495.83 samples/sec Loss 2.0332 LearningRate 0.000265 Epoch: 21 Global Step: 445190 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:20:49,794-Speed 2498.50 samples/sec Loss 1.9963 LearningRate 0.000265 Epoch: 21 Global Step: 445200 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:20:57,957-Speed 2509.46 samples/sec Loss 1.9935 LearningRate 0.000265 Epoch: 21 Global Step: 445210 Fp16 Grad Scale: 8192 Required: 88 hours Training: 
2022-07-09 20:21:06,156-Speed 2498.21 samples/sec Loss 1.9553 LearningRate 0.000265 Epoch: 21 Global Step: 445220 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:21:14,355-Speed 2498.24 samples/sec Loss 2.0227 LearningRate 0.000265 Epoch: 21 Global Step: 445230 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:21:22,557-Speed 2497.30 samples/sec Loss 2.0521 LearningRate 0.000265 Epoch: 21 Global Step: 445240 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:21:30,773-Speed 2493.13 samples/sec Loss 1.9986 LearningRate 0.000265 Epoch: 21 Global Step: 445250 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:21:38,972-Speed 2498.28 samples/sec Loss 1.9868 LearningRate 0.000265 Epoch: 21 Global Step: 445260 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:21:47,116-Speed 2515.14 samples/sec Loss 2.0077 LearningRate 0.000265 Epoch: 21 Global Step: 445270 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:21:55,315-Speed 2498.82 samples/sec Loss 1.9979 LearningRate 0.000265 Epoch: 21 Global Step: 445280 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:22:03,517-Speed 2497.21 samples/sec Loss 2.0401 LearningRate 0.000265 Epoch: 21 Global Step: 445290 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:22:11,713-Speed 2499.33 samples/sec Loss 2.0147 LearningRate 0.000265 Epoch: 21 Global Step: 445300 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:22:19,913-Speed 2497.94 samples/sec Loss 1.9978 LearningRate 0.000265 Epoch: 21 Global Step: 445310 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:22:28,110-Speed 2499.04 samples/sec Loss 2.0608 LearningRate 0.000265 Epoch: 21 Global Step: 445320 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:22:36,257-Speed 2514.23 samples/sec Loss 1.9619 LearningRate 0.000265 Epoch: 21 Global Step: 445330 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 
20:22:44,454-Speed 2498.93 samples/sec Loss 2.0067 LearningRate 0.000265 Epoch: 21 Global Step: 445340 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:22:52,654-Speed 2497.90 samples/sec Loss 2.0002 LearningRate 0.000265 Epoch: 21 Global Step: 445350 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:23:00,854-Speed 2498.19 samples/sec Loss 1.9688 LearningRate 0.000265 Epoch: 21 Global Step: 445360 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:23:09,059-Speed 2496.16 samples/sec Loss 1.9994 LearningRate 0.000265 Epoch: 21 Global Step: 445370 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:23:17,257-Speed 2498.61 samples/sec Loss 2.0170 LearningRate 0.000265 Epoch: 21 Global Step: 445380 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:23:25,413-Speed 2511.57 samples/sec Loss 2.0001 LearningRate 0.000265 Epoch: 21 Global Step: 445390 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:23:33,613-Speed 2498.28 samples/sec Loss 2.0415 LearningRate 0.000265 Epoch: 21 Global Step: 445400 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:23:41,811-Speed 2498.72 samples/sec Loss 1.9714 LearningRate 0.000265 Epoch: 21 Global Step: 445410 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:23:50,008-Speed 2498.73 samples/sec Loss 1.9967 LearningRate 0.000265 Epoch: 21 Global Step: 445420 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:23:58,220-Speed 2494.18 samples/sec Loss 2.0080 LearningRate 0.000265 Epoch: 21 Global Step: 445430 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:24:06,419-Speed 2498.23 samples/sec Loss 1.9926 LearningRate 0.000265 Epoch: 21 Global Step: 445440 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:24:14,566-Speed 2514.21 samples/sec Loss 1.9924 LearningRate 0.000265 Epoch: 21 Global Step: 445450 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:24:22,772-Speed 
2496.12 samples/sec Loss 2.0274 LearningRate 0.000265 Epoch: 21 Global Step: 445460 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:24:30,969-Speed 2498.90 samples/sec Loss 1.9755 LearningRate 0.000265 Epoch: 21 Global Step: 445470 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:24:39,168-Speed 2498.09 samples/sec Loss 1.9896 LearningRate 0.000265 Epoch: 21 Global Step: 445480 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:24:47,375-Speed 2496.04 samples/sec Loss 1.9951 LearningRate 0.000265 Epoch: 21 Global Step: 445490 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:24:55,570-Speed 2499.31 samples/sec Loss 2.0416 LearningRate 0.000265 Epoch: 21 Global Step: 445500 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:25:03,717-Speed 2514.30 samples/sec Loss 1.9932 LearningRate 0.000265 Epoch: 21 Global Step: 445510 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:25:11,918-Speed 2497.66 samples/sec Loss 2.0026 LearningRate 0.000265 Epoch: 21 Global Step: 445520 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:25:20,115-Speed 2498.68 samples/sec Loss 2.0383 LearningRate 0.000265 Epoch: 21 Global Step: 445530 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:25:28,317-Speed 2497.63 samples/sec Loss 2.0108 LearningRate 0.000265 Epoch: 21 Global Step: 445540 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:25:36,519-Speed 2497.51 samples/sec Loss 1.9970 LearningRate 0.000265 Epoch: 21 Global Step: 445550 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:25:44,737-Speed 2492.38 samples/sec Loss 2.0075 LearningRate 0.000265 Epoch: 21 Global Step: 445560 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:25:52,884-Speed 2514.20 samples/sec Loss 1.9933 LearningRate 0.000265 Epoch: 21 Global Step: 445570 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:26:01,082-Speed 2498.50 samples/sec 
Loss 2.0087 LearningRate 0.000265 Epoch: 21 Global Step: 445580 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:26:09,290-Speed 2495.26 samples/sec Loss 2.0038 LearningRate 0.000264 Epoch: 21 Global Step: 445590 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:26:17,573-Speed 2473.02 samples/sec Loss 1.9573 LearningRate 0.000264 Epoch: 21 Global Step: 445600 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:26:25,774-Speed 2497.47 samples/sec Loss 2.0060 LearningRate 0.000264 Epoch: 21 Global Step: 445610 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:26:33,972-Speed 2499.00 samples/sec Loss 2.0054 LearningRate 0.000264 Epoch: 21 Global Step: 445620 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:26:42,117-Speed 2514.99 samples/sec Loss 2.0388 LearningRate 0.000264 Epoch: 21 Global Step: 445630 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:26:50,314-Speed 2498.71 samples/sec Loss 2.0139 LearningRate 0.000264 Epoch: 21 Global Step: 445640 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:26:58,510-Speed 2499.28 samples/sec Loss 2.0417 LearningRate 0.000264 Epoch: 21 Global Step: 445650 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:27:06,706-Speed 2499.16 samples/sec Loss 2.0472 LearningRate 0.000264 Epoch: 21 Global Step: 445660 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:27:14,910-Speed 2496.71 samples/sec Loss 2.0263 LearningRate 0.000264 Epoch: 21 Global Step: 445670 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:27:23,110-Speed 2498.20 samples/sec Loss 1.9800 LearningRate 0.000264 Epoch: 21 Global Step: 445680 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:27:31,270-Speed 2510.42 samples/sec Loss 2.0080 LearningRate 0.000264 Epoch: 21 Global Step: 445690 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:27:39,485-Speed 2493.33 samples/sec Loss 2.0415 
LearningRate 0.000264 Epoch: 21 Global Step: 445700 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:27:47,690-Speed 2496.65 samples/sec Loss 1.9760 LearningRate 0.000264 Epoch: 21 Global Step: 445710 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:27:55,888-Speed 2498.51 samples/sec Loss 1.9933 LearningRate 0.000264 Epoch: 21 Global Step: 445720 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:28:04,087-Speed 2498.47 samples/sec Loss 1.9759 LearningRate 0.000264 Epoch: 21 Global Step: 445730 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:28:12,291-Speed 2496.80 samples/sec Loss 2.0211 LearningRate 0.000264 Epoch: 21 Global Step: 445740 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:28:20,449-Speed 2510.83 samples/sec Loss 2.0402 LearningRate 0.000264 Epoch: 21 Global Step: 445750 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:28:28,646-Speed 2498.81 samples/sec Loss 2.0001 LearningRate 0.000264 Epoch: 21 Global Step: 445760 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:28:36,846-Speed 2497.77 samples/sec Loss 2.0074 LearningRate 0.000264 Epoch: 21 Global Step: 445770 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:28:45,046-Speed 2497.93 samples/sec Loss 2.0114 LearningRate 0.000264 Epoch: 21 Global Step: 445780 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:28:53,243-Speed 2498.97 samples/sec Loss 2.0153 LearningRate 0.000264 Epoch: 21 Global Step: 445790 Fp16 Grad Scale: 8192 Required: 88 hours Training: 2022-07-09 20:29:01,439-Speed 2499.30 samples/sec Loss 1.9753 LearningRate 0.000264 Epoch: 21 Global Step: 445800 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:29:09,605-Speed 2508.23 samples/sec Loss 2.0006 LearningRate 0.000264 Epoch: 21 Global Step: 445810 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:29:17,827-Speed 2491.77 samples/sec Loss 2.0106 LearningRate 
0.000264 Epoch: 21 Global Step: 445820 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:29:26,031-Speed 2496.49 samples/sec Loss 1.9531 LearningRate 0.000264 Epoch: 21 Global Step: 445830 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:29:34,230-Speed 2498.20 samples/sec Loss 1.9918 LearningRate 0.000264 Epoch: 21 Global Step: 445840 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:29:42,462-Speed 2488.53 samples/sec Loss 2.0196 LearningRate 0.000264 Epoch: 21 Global Step: 445850 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:29:50,669-Speed 2495.85 samples/sec Loss 1.9737 LearningRate 0.000264 Epoch: 21 Global Step: 445860 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:29:58,814-Speed 2514.57 samples/sec Loss 1.9520 LearningRate 0.000264 Epoch: 21 Global Step: 445870 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:30:07,014-Speed 2498.14 samples/sec Loss 1.9714 LearningRate 0.000264 Epoch: 21 Global Step: 445880 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:30:15,222-Speed 2495.27 samples/sec Loss 1.9322 LearningRate 0.000264 Epoch: 21 Global Step: 445890 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:30:23,445-Speed 2491.56 samples/sec Loss 1.9433 LearningRate 0.000264 Epoch: 21 Global Step: 445900 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:30:31,651-Speed 2495.95 samples/sec Loss 1.9488 LearningRate 0.000264 Epoch: 21 Global Step: 445910 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:30:39,851-Speed 2498.13 samples/sec Loss 2.0122 LearningRate 0.000264 Epoch: 21 Global Step: 445920 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:30:47,999-Speed 2513.84 samples/sec Loss 1.9699 LearningRate 0.000264 Epoch: 21 Global Step: 445930 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:30:56,205-Speed 2496.04 samples/sec Loss 1.9746 LearningRate 
0.000264 Epoch: 21 Global Step: 445940 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:31:04,403-Speed 2498.72 samples/sec Loss 1.9563 LearningRate 0.000264 Epoch: 21 Global Step: 445950 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:31:12,601-Speed 2498.30 samples/sec Loss 1.9771 LearningRate 0.000264 Epoch: 21 Global Step: 445960 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:31:20,810-Speed 2495.40 samples/sec Loss 2.0114 LearningRate 0.000264 Epoch: 21 Global Step: 445970 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:31:29,025-Speed 2493.43 samples/sec Loss 2.0070 LearningRate 0.000264 Epoch: 21 Global Step: 445980 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:31:37,186-Speed 2509.96 samples/sec Loss 1.9429 LearningRate 0.000264 Epoch: 21 Global Step: 445990 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:31:45,387-Speed 2497.61 samples/sec Loss 2.0130 LearningRate 0.000264 Epoch: 21 Global Step: 446000 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:31:53,589-Speed 2497.32 samples/sec Loss 1.9781 LearningRate 0.000264 Epoch: 21 Global Step: 446010 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:32:01,790-Speed 2497.66 samples/sec Loss 1.9956 LearningRate 0.000264 Epoch: 21 Global Step: 446020 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:32:09,995-Speed 2496.31 samples/sec Loss 2.0007 LearningRate 0.000264 Epoch: 21 Global Step: 446030 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:32:18,209-Speed 2493.77 samples/sec Loss 1.9889 LearningRate 0.000264 Epoch: 21 Global Step: 446040 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:32:26,369-Speed 2510.31 samples/sec Loss 2.0403 LearningRate 0.000264 Epoch: 21 Global Step: 446050 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:32:34,569-Speed 2497.65 samples/sec Loss 2.0501 LearningRate 
0.000264 Epoch: 21 Global Step: 446060 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:32:42,792-Speed 2490.93 samples/sec Loss 1.9759 LearningRate 0.000264 Epoch: 21 Global Step: 446070 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:32:50,994-Speed 2497.45 samples/sec Loss 2.0373 LearningRate 0.000264 Epoch: 21 Global Step: 446080 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:32:59,202-Speed 2495.35 samples/sec Loss 1.9830 LearningRate 0.000264 Epoch: 21 Global Step: 446090 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:33:07,405-Speed 2497.25 samples/sec Loss 1.9742 LearningRate 0.000264 Epoch: 21 Global Step: 446100 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:33:15,550-Speed 2514.73 samples/sec Loss 1.9616 LearningRate 0.000264 Epoch: 21 Global Step: 446110 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:33:23,761-Speed 2494.55 samples/sec Loss 2.0282 LearningRate 0.000264 Epoch: 21 Global Step: 446120 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:33:31,960-Speed 2498.32 samples/sec Loss 1.9853 LearningRate 0.000264 Epoch: 21 Global Step: 446130 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:33:40,160-Speed 2498.27 samples/sec Loss 1.9974 LearningRate 0.000264 Epoch: 21 Global Step: 446140 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:33:48,364-Speed 2496.66 samples/sec Loss 1.9981 LearningRate 0.000264 Epoch: 21 Global Step: 446150 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:33:56,564-Speed 2497.93 samples/sec Loss 2.0266 LearningRate 0.000264 Epoch: 21 Global Step: 446160 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:34:04,722-Speed 2510.92 samples/sec Loss 1.9665 LearningRate 0.000264 Epoch: 21 Global Step: 446170 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:34:12,927-Speed 2496.54 samples/sec Loss 1.9373 LearningRate 
0.000264 Epoch: 21 Global Step: 446180 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:34:21,130-Speed 2497.02 samples/sec Loss 2.0189 LearningRate 0.000264 Epoch: 21 Global Step: 446190 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:34:29,350-Speed 2492.03 samples/sec Loss 2.0050 LearningRate 0.000264 Epoch: 21 Global Step: 446200 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:34:37,554-Speed 2496.76 samples/sec Loss 1.9921 LearningRate 0.000264 Epoch: 21 Global Step: 446210 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:34:45,765-Speed 2494.52 samples/sec Loss 2.0488 LearningRate 0.000264 Epoch: 21 Global Step: 446220 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:34:53,928-Speed 2509.52 samples/sec Loss 2.0034 LearningRate 0.000264 Epoch: 21 Global Step: 446230 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:35:02,132-Speed 2496.55 samples/sec Loss 1.9770 LearningRate 0.000264 Epoch: 21 Global Step: 446240 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:35:10,336-Speed 2496.94 samples/sec Loss 1.9886 LearningRate 0.000264 Epoch: 21 Global Step: 446250 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:35:18,535-Speed 2498.03 samples/sec Loss 1.9891 LearningRate 0.000264 Epoch: 21 Global Step: 446260 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:35:26,734-Speed 2498.35 samples/sec Loss 1.9277 LearningRate 0.000264 Epoch: 21 Global Step: 446270 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:35:34,932-Speed 2498.51 samples/sec Loss 2.0074 LearningRate 0.000264 Epoch: 21 Global Step: 446280 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:35:43,080-Speed 2513.74 samples/sec Loss 1.9700 LearningRate 0.000264 Epoch: 21 Global Step: 446290 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:35:51,293-Speed 2494.19 samples/sec Loss 1.9930 LearningRate 
0.000264 Epoch: 21 Global Step: 446300 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:35:59,497-Speed 2496.93 samples/sec Loss 1.9553 LearningRate 0.000264 Epoch: 21 Global Step: 446310 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:36:07,720-Speed 2491.03 samples/sec Loss 1.9763 LearningRate 0.000263 Epoch: 21 Global Step: 446320 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:36:15,940-Speed 2491.89 samples/sec Loss 1.9924 LearningRate 0.000263 Epoch: 21 Global Step: 446330 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:36:24,139-Speed 2498.30 samples/sec Loss 2.0343 LearningRate 0.000263 Epoch: 21 Global Step: 446340 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:36:32,289-Speed 2513.39 samples/sec Loss 2.0102 LearningRate 0.000263 Epoch: 21 Global Step: 446350 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:36:40,511-Speed 2490.88 samples/sec Loss 1.9980 LearningRate 0.000263 Epoch: 21 Global Step: 446360 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:36:48,722-Speed 2494.64 samples/sec Loss 1.9817 LearningRate 0.000263 Epoch: 21 Global Step: 446370 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:36:56,923-Speed 2497.77 samples/sec Loss 2.0553 LearningRate 0.000263 Epoch: 21 Global Step: 446380 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:37:05,122-Speed 2498.27 samples/sec Loss 2.0448 LearningRate 0.000263 Epoch: 21 Global Step: 446390 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:37:13,320-Speed 2498.56 samples/sec Loss 1.9370 LearningRate 0.000263 Epoch: 21 Global Step: 446400 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:37:21,471-Speed 2513.01 samples/sec Loss 2.0111 LearningRate 0.000263 Epoch: 21 Global Step: 446410 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:37:29,679-Speed 2495.56 samples/sec Loss 2.0258 LearningRate 
0.000263 Epoch: 21 Global Step: 446420 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:37:37,880-Speed 2497.53 samples/sec Loss 2.0273 LearningRate 0.000263 Epoch: 21 Global Step: 446430 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:37:46,080-Speed 2497.96 samples/sec Loss 2.0297 LearningRate 0.000263 Epoch: 21 Global Step: 446440 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:37:54,278-Speed 2498.68 samples/sec Loss 2.0291 LearningRate 0.000263 Epoch: 21 Global Step: 446450 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:38:02,476-Speed 2498.43 samples/sec Loss 2.0182 LearningRate 0.000263 Epoch: 21 Global Step: 446460 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:38:10,628-Speed 2512.59 samples/sec Loss 1.9907 LearningRate 0.000263 Epoch: 21 Global Step: 446470 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:38:18,827-Speed 2498.25 samples/sec Loss 2.0303 LearningRate 0.000263 Epoch: 21 Global Step: 446480 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:38:27,026-Speed 2498.38 samples/sec Loss 1.9565 LearningRate 0.000263 Epoch: 21 Global Step: 446490 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:38:35,224-Speed 2498.80 samples/sec Loss 1.9469 LearningRate 0.000263 Epoch: 21 Global Step: 446500 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:38:43,427-Speed 2496.72 samples/sec Loss 2.0217 LearningRate 0.000263 Epoch: 21 Global Step: 446510 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:38:51,630-Speed 2497.07 samples/sec Loss 1.9612 LearningRate 0.000263 Epoch: 21 Global Step: 446520 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:38:59,778-Speed 2513.79 samples/sec Loss 1.9997 LearningRate 0.000263 Epoch: 21 Global Step: 446530 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:39:07,976-Speed 2498.58 samples/sec Loss 2.0328 LearningRate 
0.000263 Epoch: 21 Global Step: 446540 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:39:16,178-Speed 2497.55 samples/sec Loss 1.9739 LearningRate 0.000263 Epoch: 21 Global Step: 446550 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:39:24,380-Speed 2497.15 samples/sec Loss 1.9916 LearningRate 0.000263 Epoch: 21 Global Step: 446560 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:39:32,605-Speed 2490.37 samples/sec Loss 2.0250 LearningRate 0.000263 Epoch: 21 Global Step: 446570 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:39:40,821-Speed 2493.13 samples/sec Loss 1.9682 LearningRate 0.000263 Epoch: 21 Global Step: 446580 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:39:48,971-Speed 2513.10 samples/sec Loss 1.9481 LearningRate 0.000263 Epoch: 21 Global Step: 446590 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:39:57,178-Speed 2495.90 samples/sec Loss 2.0091 LearningRate 0.000263 Epoch: 21 Global Step: 446600 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:40:05,377-Speed 2498.16 samples/sec Loss 1.9585 LearningRate 0.000263 Epoch: 21 Global Step: 446610 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:40:13,576-Speed 2498.45 samples/sec Loss 1.9724 LearningRate 0.000263 Epoch: 21 Global Step: 446620 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:40:21,776-Speed 2497.95 samples/sec Loss 2.0075 LearningRate 0.000263 Epoch: 21 Global Step: 446630 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:40:29,986-Speed 2494.77 samples/sec Loss 1.9701 LearningRate 0.000263 Epoch: 21 Global Step: 446640 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:40:38,136-Speed 2513.49 samples/sec Loss 1.9747 LearningRate 0.000263 Epoch: 21 Global Step: 446650 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:40:46,332-Speed 2499.22 samples/sec Loss 1.9798 LearningRate 
0.000263 Epoch: 21 Global Step: 446660 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:40:54,541-Speed 2495.36 samples/sec Loss 1.9806 LearningRate 0.000263 Epoch: 21 Global Step: 446670 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:41:02,738-Speed 2498.72 samples/sec Loss 1.9908 LearningRate 0.000263 Epoch: 21 Global Step: 446680 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:41:10,955-Speed 2492.91 samples/sec Loss 1.9775 LearningRate 0.000263 Epoch: 21 Global Step: 446690 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:41:19,155-Speed 2497.76 samples/sec Loss 1.9743 LearningRate 0.000263 Epoch: 21 Global Step: 446700 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:41:27,321-Speed 2508.41 samples/sec Loss 2.0142 LearningRate 0.000263 Epoch: 21 Global Step: 446710 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:41:35,532-Speed 2494.60 samples/sec Loss 1.9725 LearningRate 0.000263 Epoch: 21 Global Step: 446720 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:41:43,731-Speed 2498.52 samples/sec Loss 1.9867 LearningRate 0.000263 Epoch: 21 Global Step: 446730 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:41:51,930-Speed 2498.50 samples/sec Loss 2.0367 LearningRate 0.000263 Epoch: 21 Global Step: 446740 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:42:00,137-Speed 2496.05 samples/sec Loss 1.9978 LearningRate 0.000263 Epoch: 21 Global Step: 446750 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:42:08,336-Speed 2498.33 samples/sec Loss 2.0663 LearningRate 0.000263 Epoch: 21 Global Step: 446760 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:42:16,497-Speed 2509.89 samples/sec Loss 2.0243 LearningRate 0.000263 Epoch: 21 Global Step: 446770 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:42:24,696-Speed 2498.10 samples/sec Loss 1.9954 LearningRate 
0.000263 Epoch: 21 Global Step: 446780 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:42:32,895-Speed 2498.31 samples/sec Loss 2.0241 LearningRate 0.000263 Epoch: 21 Global Step: 446790 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:42:41,096-Speed 2497.91 samples/sec Loss 2.0128 LearningRate 0.000263 Epoch: 21 Global Step: 446800 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:42:49,294-Speed 2498.62 samples/sec Loss 2.0429 LearningRate 0.000263 Epoch: 21 Global Step: 446810 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:42:57,516-Speed 2491.07 samples/sec Loss 2.0045 LearningRate 0.000263 Epoch: 21 Global Step: 446820 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:43:05,666-Speed 2513.42 samples/sec Loss 1.9752 LearningRate 0.000263 Epoch: 21 Global Step: 446830 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:43:13,865-Speed 2498.58 samples/sec Loss 1.9664 LearningRate 0.000263 Epoch: 21 Global Step: 446840 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:43:22,063-Speed 2498.68 samples/sec Loss 1.9592 LearningRate 0.000263 Epoch: 21 Global Step: 446850 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:43:30,264-Speed 2497.71 samples/sec Loss 1.9853 LearningRate 0.000263 Epoch: 21 Global Step: 446860 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:43:38,464-Speed 2497.67 samples/sec Loss 2.0273 LearningRate 0.000263 Epoch: 21 Global Step: 446870 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:43:46,669-Speed 2497.04 samples/sec Loss 1.9647 LearningRate 0.000263 Epoch: 21 Global Step: 446880 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:43:54,815-Speed 2514.47 samples/sec Loss 2.0387 LearningRate 0.000263 Epoch: 21 Global Step: 446890 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:44:03,013-Speed 2498.46 samples/sec Loss 2.0122 LearningRate 
0.000263 Epoch: 21 Global Step: 446900 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:44:11,219-Speed 2496.36 samples/sec Loss 1.9623 LearningRate 0.000263 Epoch: 21 Global Step: 446910 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:44:19,416-Speed 2499.13 samples/sec Loss 1.9789 LearningRate 0.000263 Epoch: 21 Global Step: 446920 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:44:27,617-Speed 2497.86 samples/sec Loss 2.0286 LearningRate 0.000263 Epoch: 21 Global Step: 446930 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:44:35,816-Speed 2498.02 samples/sec Loss 1.9768 LearningRate 0.000263 Epoch: 21 Global Step: 446940 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:44:43,962-Speed 2514.67 samples/sec Loss 1.9987 LearningRate 0.000263 Epoch: 21 Global Step: 446950 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:44:52,163-Speed 2497.81 samples/sec Loss 1.9795 LearningRate 0.000263 Epoch: 21 Global Step: 446960 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:45:00,360-Speed 2498.84 samples/sec Loss 1.9592 LearningRate 0.000263 Epoch: 21 Global Step: 446970 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:45:08,566-Speed 2496.22 samples/sec Loss 1.9992 LearningRate 0.000263 Epoch: 21 Global Step: 446980 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:45:16,766-Speed 2497.88 samples/sec Loss 1.9903 LearningRate 0.000263 Epoch: 21 Global Step: 446990 Fp16 Grad Scale: 16384 Required: 88 hours Training: 2022-07-09 20:45:24,966-Speed 2498.21 samples/sec Loss 1.9922 LearningRate 0.000263 Epoch: 21 Global Step: 447000 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:45:33,113-Speed 2514.10 samples/sec Loss 2.0240 LearningRate 0.000263 Epoch: 21 Global Step: 447010 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:45:41,313-Speed 2498.20 samples/sec Loss 2.0283 LearningRate 
0.000263 Epoch: 21 Global Step: 447020 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:45:49,527-Speed 2494.15 samples/sec Loss 1.9918 LearningRate 0.000263 Epoch: 21 Global Step: 447030 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:45:57,725-Speed 2498.53 samples/sec Loss 2.0190 LearningRate 0.000263 Epoch: 21 Global Step: 447040 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:46:05,922-Speed 2498.55 samples/sec Loss 1.9553 LearningRate 0.000262 Epoch: 21 Global Step: 447050 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:46:14,122-Speed 2498.07 samples/sec Loss 1.9696 LearningRate 0.000262 Epoch: 21 Global Step: 447060 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:46:22,291-Speed 2507.39 samples/sec Loss 1.9990 LearningRate 0.000262 Epoch: 21 Global Step: 447070 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:46:30,492-Speed 2497.82 samples/sec Loss 1.9567 LearningRate 0.000262 Epoch: 21 Global Step: 447080 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:46:38,691-Speed 2498.08 samples/sec Loss 1.9662 LearningRate 0.000262 Epoch: 21 Global Step: 447090 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:46:46,896-Speed 2496.51 samples/sec Loss 1.9990 LearningRate 0.000262 Epoch: 21 Global Step: 447100 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:46:55,104-Speed 2495.42 samples/sec Loss 1.9660 LearningRate 0.000262 Epoch: 21 Global Step: 447110 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:47:03,319-Speed 2493.10 samples/sec Loss 2.0272 LearningRate 0.000262 Epoch: 21 Global Step: 447120 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:47:11,463-Speed 2515.19 samples/sec Loss 2.0314 LearningRate 0.000262 Epoch: 21 Global Step: 447130 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:47:19,665-Speed 2497.39 samples/sec Loss 2.0192 LearningRate 
0.000262 Epoch: 21 Global Step: 447140 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:47:27,867-Speed 2497.39 samples/sec Loss 1.9998 LearningRate 0.000262 Epoch: 21 Global Step: 447150 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:47:36,069-Speed 2497.19 samples/sec Loss 2.0501 LearningRate 0.000262 Epoch: 21 Global Step: 447160 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:47:44,270-Speed 2497.57 samples/sec Loss 2.0209 LearningRate 0.000262 Epoch: 21 Global Step: 447170 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:47:52,471-Speed 2498.03 samples/sec Loss 1.9880 LearningRate 0.000262 Epoch: 21 Global Step: 447180 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:48:00,620-Speed 2513.71 samples/sec Loss 1.9657 LearningRate 0.000262 Epoch: 21 Global Step: 447190 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:48:08,823-Speed 2496.81 samples/sec Loss 1.9783 LearningRate 0.000262 Epoch: 21 Global Step: 447200 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:48:17,033-Speed 2495.21 samples/sec Loss 2.0480 LearningRate 0.000262 Epoch: 21 Global Step: 447210 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:48:25,239-Speed 2496.26 samples/sec Loss 1.9907 LearningRate 0.000262 Epoch: 21 Global Step: 447220 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:48:33,459-Speed 2492.51 samples/sec Loss 2.0315 LearningRate 0.000262 Epoch: 21 Global Step: 447230 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:48:41,656-Speed 2499.50 samples/sec Loss 2.0249 LearningRate 0.000262 Epoch: 21 Global Step: 447240 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:48:49,808-Speed 2512.69 samples/sec Loss 2.0272 LearningRate 0.000262 Epoch: 21 Global Step: 447250 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:48:58,007-Speed 2498.69 samples/sec Loss 2.0022 LearningRate 
0.000262 Epoch: 21 Global Step: 447260 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:49:06,206-Speed 2498.10 samples/sec Loss 2.0086 LearningRate 0.000262 Epoch: 21 Global Step: 447270 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:49:14,405-Speed 2498.15 samples/sec Loss 1.9643 LearningRate 0.000262 Epoch: 21 Global Step: 447280 Fp16 Grad Scale: 32768 Required: 88 hours Training: 2022-07-09 20:49:22,612-Speed 2496.14 samples/sec Loss 2.0397 LearningRate 0.000262 Epoch: 21 Global Step: 447290 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:49:30,826-Speed 2493.82 samples/sec Loss 1.9769 LearningRate 0.000262 Epoch: 21 Global Step: 447300 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:49:38,978-Speed 2512.48 samples/sec Loss 2.0147 LearningRate 0.000262 Epoch: 21 Global Step: 447310 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:49:47,182-Speed 2497.26 samples/sec Loss 2.0675 LearningRate 0.000262 Epoch: 21 Global Step: 447320 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:49:55,395-Speed 2493.98 samples/sec Loss 1.9884 LearningRate 0.000262 Epoch: 21 Global Step: 447330 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:50:03,601-Speed 2496.09 samples/sec Loss 2.0093 LearningRate 0.000262 Epoch: 21 Global Step: 447340 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:50:11,807-Speed 2497.15 samples/sec Loss 2.0327 LearningRate 0.000262 Epoch: 21 Global Step: 447350 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:50:20,009-Speed 2497.28 samples/sec Loss 2.0445 LearningRate 0.000262 Epoch: 21 Global Step: 447360 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:50:28,156-Speed 2514.07 samples/sec Loss 1.9264 LearningRate 0.000262 Epoch: 21 Global Step: 447370 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:50:36,360-Speed 2496.85 samples/sec Loss 2.0092 LearningRate 
0.000262 Epoch: 21 Global Step: 447380 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:50:44,561-Speed 2497.43 samples/sec Loss 1.9564 LearningRate 0.000262 Epoch: 21 Global Step: 447390 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:50:52,765-Speed 2497.00 samples/sec Loss 1.9874 LearningRate 0.000262 Epoch: 21 Global Step: 447400 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:51:00,963-Speed 2498.35 samples/sec Loss 1.9709 LearningRate 0.000262 Epoch: 21 Global Step: 447410 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:51:09,159-Speed 2499.15 samples/sec Loss 1.9823 LearningRate 0.000262 Epoch: 21 Global Step: 447420 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:51:17,306-Speed 2515.55 samples/sec Loss 1.9348 LearningRate 0.000262 Epoch: 21 Global Step: 447430 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:51:25,524-Speed 2492.50 samples/sec Loss 2.0056 LearningRate 0.000262 Epoch: 21 Global Step: 447440 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:51:33,724-Speed 2497.86 samples/sec Loss 2.0109 LearningRate 0.000262 Epoch: 21 Global Step: 447450 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:51:41,944-Speed 2491.73 samples/sec Loss 1.9566 LearningRate 0.000262 Epoch: 21 Global Step: 447460 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:51:50,142-Speed 2498.94 samples/sec Loss 1.9473 LearningRate 0.000262 Epoch: 21 Global Step: 447470 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:51:58,354-Speed 2494.43 samples/sec Loss 1.9375 LearningRate 0.000262 Epoch: 21 Global Step: 447480 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:52:06,515-Speed 2509.91 samples/sec Loss 1.9983 LearningRate 0.000262 Epoch: 21 Global Step: 447490 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:52:14,717-Speed 2497.30 samples/sec Loss 1.9656 LearningRate 
0.000262 Epoch: 21 Global Step: 447500 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:52:22,919-Speed 2497.32 samples/sec Loss 1.9292 LearningRate 0.000262 Epoch: 21 Global Step: 447510 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:52:31,118-Speed 2498.58 samples/sec Loss 1.9705 LearningRate 0.000262 Epoch: 21 Global Step: 447520 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:52:39,330-Speed 2494.52 samples/sec Loss 1.9828 LearningRate 0.000262 Epoch: 21 Global Step: 447530 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:52:47,538-Speed 2495.26 samples/sec Loss 1.9665 LearningRate 0.000262 Epoch: 21 Global Step: 447540 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:52:55,686-Speed 2514.02 samples/sec Loss 1.9700 LearningRate 0.000262 Epoch: 21 Global Step: 447550 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:53:03,885-Speed 2498.44 samples/sec Loss 1.9749 LearningRate 0.000262 Epoch: 21 Global Step: 447560 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:53:12,088-Speed 2496.84 samples/sec Loss 1.9583 LearningRate 0.000262 Epoch: 21 Global Step: 447570 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:53:20,297-Speed 2495.37 samples/sec Loss 1.9355 LearningRate 0.000262 Epoch: 21 Global Step: 447580 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:53:30,224-Speed 2501.02 samples/sec Loss 2.0144 LearningRate 0.000262 Epoch: 21 Global Step: 447590 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:53:38,469-Speed 2496.17 samples/sec Loss 2.0145 LearningRate 0.000262 Epoch: 21 Global Step: 447600 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:53:46,620-Speed 2512.72 samples/sec Loss 2.0064 LearningRate 0.000262 Epoch: 21 Global Step: 447610 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:53:54,868-Speed 2497.79 samples/sec Loss 1.9665 LearningRate 
0.000262 Epoch: 21 Global Step: 447620 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:54:03,286-Speed 2499.25 samples/sec Loss 1.9588 LearningRate 0.000262 Epoch: 21 Global Step: 447630 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:54:11,546-Speed 2495.02 samples/sec Loss 1.9581 LearningRate 0.000262 Epoch: 21 Global Step: 447640 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:54:23,668-Speed 1689.65 samples/sec Loss 1.9799 LearningRate 0.000262 Epoch: 21 Global Step: 447650 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:54:32,100-Speed 2501.36 samples/sec Loss 2.0282 LearningRate 0.000262 Epoch: 21 Global Step: 447660 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:54:40,300-Speed 2515.73 samples/sec Loss 1.9881 LearningRate 0.000262 Epoch: 21 Global Step: 447670 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:54:48,495-Speed 2499.43 samples/sec Loss 1.9934 LearningRate 0.000262 Epoch: 21 Global Step: 447680 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:54:56,739-Speed 2500.12 samples/sec Loss 1.9161 LearningRate 0.000262 Epoch: 21 Global Step: 447690 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:55:08,734-Speed 1707.52 samples/sec Loss 1.9773 LearningRate 0.000262 Epoch: 21 Global Step: 447700 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:55:17,000-Speed 2497.57 samples/sec Loss 1.9652 LearningRate 0.000262 Epoch: 21 Global Step: 447710 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:55:25,223-Speed 2498.26 samples/sec Loss 1.9916 LearningRate 0.000262 Epoch: 21 Global Step: 447720 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:55:36,152-Speed 1874.00 samples/sec Loss 1.9614 LearningRate 0.000262 Epoch: 21 Global Step: 447730 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:55:44,407-Speed 2495.73 samples/sec Loss 2.0173 LearningRate 
0.000262 Epoch: 21 Global Step: 447740 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:55:52,651-Speed 2498.97 samples/sec Loss 1.9388 LearningRate 0.000262 Epoch: 21 Global Step: 447750 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:56:02,126-Speed 2161.69 samples/sec Loss 2.0188 LearningRate 0.000262 Epoch: 21 Global Step: 447760 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:56:10,359-Speed 2496.12 samples/sec Loss 1.9903 LearningRate 0.000261 Epoch: 21 Global Step: 447770 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:56:18,613-Speed 2499.56 samples/sec Loss 1.9921 LearningRate 0.000261 Epoch: 21 Global Step: 447780 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:56:26,934-Speed 2486.21 samples/sec Loss 1.9690 LearningRate 0.000261 Epoch: 21 Global Step: 447790 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:56:35,150-Speed 2492.96 samples/sec Loss 1.9699 LearningRate 0.000261 Epoch: 21 Global Step: 447800 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:56:43,373-Speed 2497.22 samples/sec Loss 2.0122 LearningRate 0.000261 Epoch: 21 Global Step: 447810 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:56:55,685-Speed 1663.67 samples/sec Loss 1.9931 LearningRate 0.000261 Epoch: 21 Global Step: 447820 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:57:03,882-Speed 2500.11 samples/sec Loss 1.9916 LearningRate 0.000261 Epoch: 21 Global Step: 447830 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:57:12,117-Speed 2499.58 samples/sec Loss 1.9545 LearningRate 0.000261 Epoch: 21 Global Step: 447840 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:57:22,650-Speed 1944.57 samples/sec Loss 1.9503 LearningRate 0.000261 Epoch: 21 Global Step: 447850 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:57:31,269-Speed 2403.51 samples/sec Loss 1.9669 LearningRate 
0.000261 Epoch: 21 Global Step: 447860 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:57:42,395-Speed 1850.49 samples/sec Loss 1.9859 LearningRate 0.000261 Epoch: 21 Global Step: 447870 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:57:50,588-Speed 2500.03 samples/sec Loss 1.9887 LearningRate 0.000261 Epoch: 21 Global Step: 447880 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:57:58,785-Speed 2499.01 samples/sec Loss 1.9740 LearningRate 0.000261 Epoch: 21 Global Step: 447890 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:58:06,985-Speed 2497.77 samples/sec Loss 1.9966 LearningRate 0.000261 Epoch: 21 Global Step: 447900 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:58:15,135-Speed 2513.52 samples/sec Loss 1.9868 LearningRate 0.000261 Epoch: 21 Global Step: 447910 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:58:23,339-Speed 2496.72 samples/sec Loss 1.9806 LearningRate 0.000261 Epoch: 21 Global Step: 447920 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:58:31,544-Speed 2496.70 samples/sec Loss 1.9616 LearningRate 0.000261 Epoch: 21 Global Step: 447930 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:58:39,745-Speed 2497.71 samples/sec Loss 1.9687 LearningRate 0.000261 Epoch: 21 Global Step: 447940 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 20:58:47,903-Speed 2511.18 samples/sec Loss 2.0465 LearningRate 0.000261 Epoch: 21 Global Step: 447950 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 20:58:56,110-Speed 2495.82 samples/sec Loss 1.9999 LearningRate 0.000261 Epoch: 21 Global Step: 447960 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 20:59:04,260-Speed 2513.37 samples/sec Loss 1.9797 LearningRate 0.000261 Epoch: 21 Global Step: 447970 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 20:59:12,461-Speed 2497.52 samples/sec Loss 1.9730 LearningRate 
0.000261 Epoch: 21 Global Step: 447980 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 20:59:20,664-Speed 2497.05 samples/sec Loss 1.9349 LearningRate 0.000261 Epoch: 21 Global Step: 447990 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 20:59:28,868-Speed 2496.77 samples/sec Loss 1.9358 LearningRate 0.000261 Epoch: 21 Global Step: 448000 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 20:59:37,073-Speed 2496.38 samples/sec Loss 1.9528 LearningRate 0.000261 Epoch: 21 Global Step: 448010 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 20:59:45,286-Speed 2493.92 samples/sec Loss 1.9512 LearningRate 0.000261 Epoch: 21 Global Step: 448020 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 20:59:53,461-Speed 2505.75 samples/sec Loss 1.9741 LearningRate 0.000261 Epoch: 21 Global Step: 448030 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:00:01,663-Speed 2497.61 samples/sec Loss 1.9674 LearningRate 0.000261 Epoch: 21 Global Step: 448040 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:00:09,864-Speed 2497.67 samples/sec Loss 2.0122 LearningRate 0.000261 Epoch: 21 Global Step: 448050 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:00:18,066-Speed 2497.19 samples/sec Loss 2.0454 LearningRate 0.000261 Epoch: 21 Global Step: 448060 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:00:26,267-Speed 2497.69 samples/sec Loss 2.0183 LearningRate 0.000261 Epoch: 21 Global Step: 448070 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:00:34,466-Speed 2498.23 samples/sec Loss 2.0213 LearningRate 0.000261 Epoch: 21 Global Step: 448080 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:00:42,617-Speed 2513.04 samples/sec Loss 1.9822 LearningRate 0.000261 Epoch: 21 Global Step: 448090 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:00:50,820-Speed 2497.06 samples/sec Loss 1.9876 LearningRate 
0.000261 Epoch: 21 Global Step: 448100 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:00:59,021-Speed 2497.70 samples/sec Loss 2.0319 LearningRate 0.000261 Epoch: 21 Global Step: 448110 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:01:07,226-Speed 2496.56 samples/sec Loss 2.0032 LearningRate 0.000261 Epoch: 21 Global Step: 448120 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:01:15,428-Speed 2497.00 samples/sec Loss 2.0084 LearningRate 0.000261 Epoch: 21 Global Step: 448130 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:01:23,653-Speed 2490.60 samples/sec Loss 2.0709 LearningRate 0.000261 Epoch: 21 Global Step: 448140 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:01:31,800-Speed 2514.18 samples/sec Loss 1.9705 LearningRate 0.000261 Epoch: 21 Global Step: 448150 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:01:40,012-Speed 2494.25 samples/sec Loss 1.9736 LearningRate 0.000261 Epoch: 21 Global Step: 448160 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:01:48,217-Speed 2496.36 samples/sec Loss 1.9877 LearningRate 0.000261 Epoch: 21 Global Step: 448170 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:01:56,416-Speed 2498.35 samples/sec Loss 2.0169 LearningRate 0.000261 Epoch: 21 Global Step: 448180 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:02:04,624-Speed 2495.62 samples/sec Loss 1.9723 LearningRate 0.000261 Epoch: 21 Global Step: 448190 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:02:12,825-Speed 2497.41 samples/sec Loss 2.0346 LearningRate 0.000261 Epoch: 21 Global Step: 448200 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:02:20,976-Speed 2513.11 samples/sec Loss 1.9847 LearningRate 0.000261 Epoch: 21 Global Step: 448210 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:02:29,187-Speed 2494.58 samples/sec Loss 1.9833 LearningRate 
0.000261 Epoch: 21 Global Step: 448220 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:02:37,388-Speed 2497.62 samples/sec Loss 1.9845 LearningRate 0.000261 Epoch: 21 Global Step: 448230 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:02:45,587-Speed 2498.14 samples/sec Loss 2.0243 LearningRate 0.000261 Epoch: 21 Global Step: 448240 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:02:53,789-Speed 2497.32 samples/sec Loss 1.9620 LearningRate 0.000261 Epoch: 21 Global Step: 448250 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:03:02,001-Speed 2494.21 samples/sec Loss 1.9634 LearningRate 0.000261 Epoch: 21 Global Step: 448260 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:03:10,155-Speed 2511.79 samples/sec Loss 1.9590 LearningRate 0.000261 Epoch: 21 Global Step: 448270 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:03:18,355-Speed 2498.28 samples/sec Loss 2.0223 LearningRate 0.000261 Epoch: 21 Global Step: 448280 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:03:26,563-Speed 2495.41 samples/sec Loss 1.9763 LearningRate 0.000261 Epoch: 21 Global Step: 448290 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:03:34,770-Speed 2495.94 samples/sec Loss 1.9689 LearningRate 0.000261 Epoch: 21 Global Step: 448300 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:03:42,970-Speed 2497.92 samples/sec Loss 1.9828 LearningRate 0.000261 Epoch: 21 Global Step: 448310 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:03:51,183-Speed 2494.01 samples/sec Loss 1.9902 LearningRate 0.000261 Epoch: 21 Global Step: 448320 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:03:59,336-Speed 2512.48 samples/sec Loss 1.9860 LearningRate 0.000261 Epoch: 21 Global Step: 448330 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:04:07,540-Speed 2496.62 samples/sec Loss 1.9009 LearningRate 
0.000261 Epoch: 21 Global Step: 448340 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:04:15,748-Speed 2495.47 samples/sec Loss 1.9705 LearningRate 0.000261 Epoch: 21 Global Step: 448350 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:04:23,962-Speed 2493.81 samples/sec Loss 1.9737 LearningRate 0.000261 Epoch: 21 Global Step: 448360 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:04:32,176-Speed 2493.70 samples/sec Loss 2.0022 LearningRate 0.000261 Epoch: 21 Global Step: 448370 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:04:40,381-Speed 2496.42 samples/sec Loss 1.9916 LearningRate 0.000261 Epoch: 21 Global Step: 448380 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:04:48,538-Speed 2511.02 samples/sec Loss 1.9369 LearningRate 0.000261 Epoch: 21 Global Step: 448390 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:04:56,744-Speed 2496.28 samples/sec Loss 1.9720 LearningRate 0.000261 Epoch: 21 Global Step: 448400 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:05:04,962-Speed 2492.50 samples/sec Loss 1.9351 LearningRate 0.000261 Epoch: 21 Global Step: 448410 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:05:13,160-Speed 2498.38 samples/sec Loss 1.9776 LearningRate 0.000261 Epoch: 21 Global Step: 448420 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:05:21,364-Speed 2496.82 samples/sec Loss 1.9750 LearningRate 0.000261 Epoch: 21 Global Step: 448430 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:05:29,570-Speed 2496.16 samples/sec Loss 1.9515 LearningRate 0.000261 Epoch: 21 Global Step: 448440 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:05:37,721-Speed 2512.93 samples/sec Loss 1.9592 LearningRate 0.000261 Epoch: 21 Global Step: 448450 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:05:45,920-Speed 2498.24 samples/sec Loss 2.0004 LearningRate 
0.000261 Epoch: 21 Global Step: 448460 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:05:54,130-Speed 2495.44 samples/sec Loss 1.9842 LearningRate 0.000261 Epoch: 21 Global Step: 448470 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:06:02,330-Speed 2497.82 samples/sec Loss 2.0198 LearningRate 0.000261 Epoch: 21 Global Step: 448480 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:06:10,534-Speed 2496.67 samples/sec Loss 1.9497 LearningRate 0.000261 Epoch: 21 Global Step: 448490 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:06:18,734-Speed 2497.91 samples/sec Loss 1.9895 LearningRate 0.000261 Epoch: 21 Global Step: 448500 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:06:26,883-Speed 2513.64 samples/sec Loss 1.9814 LearningRate 0.000260 Epoch: 21 Global Step: 448510 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:06:35,103-Speed 2492.09 samples/sec Loss 1.9304 LearningRate 0.000260 Epoch: 21 Global Step: 448520 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:06:43,304-Speed 2497.59 samples/sec Loss 2.0050 LearningRate 0.000260 Epoch: 21 Global Step: 448530 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:06:51,510-Speed 2495.97 samples/sec Loss 1.9820 LearningRate 0.000260 Epoch: 21 Global Step: 448540 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:06:59,721-Speed 2494.61 samples/sec Loss 1.9677 LearningRate 0.000260 Epoch: 21 Global Step: 448550 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:07:07,923-Speed 2497.48 samples/sec Loss 2.0072 LearningRate 0.000260 Epoch: 21 Global Step: 448560 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:07:16,069-Speed 2514.12 samples/sec Loss 1.9747 LearningRate 0.000260 Epoch: 21 Global Step: 448570 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:07:24,268-Speed 2498.36 samples/sec Loss 1.9393 LearningRate 
0.000260 Epoch: 21 Global Step: 448580 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:07:32,470-Speed 2497.52 samples/sec Loss 2.0002 LearningRate 0.000260 Epoch: 21 Global Step: 448590 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:07:40,671-Speed 2497.60 samples/sec Loss 1.9832 LearningRate 0.000260 Epoch: 21 Global Step: 448600 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:07:48,872-Speed 2497.71 samples/sec Loss 2.0366 LearningRate 0.000260 Epoch: 21 Global Step: 448610 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:07:57,086-Speed 2493.79 samples/sec Loss 2.0193 LearningRate 0.000260 Epoch: 21 Global Step: 448620 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:08:05,234-Speed 2513.90 samples/sec Loss 1.9658 LearningRate 0.000260 Epoch: 21 Global Step: 448630 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:08:13,434-Speed 2498.07 samples/sec Loss 2.0073 LearningRate 0.000260 Epoch: 21 Global Step: 448640 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:08:21,640-Speed 2495.98 samples/sec Loss 1.9877 LearningRate 0.000260 Epoch: 21 Global Step: 448650 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:08:29,840-Speed 2498.18 samples/sec Loss 2.0126 LearningRate 0.000260 Epoch: 21 Global Step: 448660 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:08:38,046-Speed 2495.87 samples/sec Loss 2.0165 LearningRate 0.000260 Epoch: 21 Global Step: 448670 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:08:46,253-Speed 2495.91 samples/sec Loss 1.9777 LearningRate 0.000260 Epoch: 21 Global Step: 448680 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:08:54,404-Speed 2513.02 samples/sec Loss 1.9602 LearningRate 0.000260 Epoch: 21 Global Step: 448690 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:09:02,605-Speed 2497.80 samples/sec Loss 1.9803 LearningRate 
0.000260 Epoch: 21 Global Step: 448700 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:09:10,806-Speed 2497.73 samples/sec Loss 2.0428 LearningRate 0.000260 Epoch: 21 Global Step: 448710 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:09:19,009-Speed 2496.92 samples/sec Loss 1.9963 LearningRate 0.000260 Epoch: 21 Global Step: 448720 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:09:27,217-Speed 2495.89 samples/sec Loss 1.9936 LearningRate 0.000260 Epoch: 21 Global Step: 448730 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:09:35,419-Speed 2497.45 samples/sec Loss 2.0202 LearningRate 0.000260 Epoch: 21 Global Step: 448740 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:09:43,573-Speed 2512.19 samples/sec Loss 1.9581 LearningRate 0.000260 Epoch: 21 Global Step: 448750 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:09:51,774-Speed 2498.12 samples/sec Loss 1.9704 LearningRate 0.000260 Epoch: 21 Global Step: 448760 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:09:59,979-Speed 2496.35 samples/sec Loss 1.9853 LearningRate 0.000260 Epoch: 21 Global Step: 448770 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:10:08,184-Speed 2496.59 samples/sec Loss 2.0033 LearningRate 0.000260 Epoch: 21 Global Step: 448780 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:10:16,399-Speed 2493.89 samples/sec Loss 2.0093 LearningRate 0.000260 Epoch: 21 Global Step: 448790 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:10:24,607-Speed 2495.22 samples/sec Loss 1.9653 LearningRate 0.000260 Epoch: 21 Global Step: 448800 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:10:32,758-Speed 2512.89 samples/sec Loss 2.0029 LearningRate 0.000260 Epoch: 21 Global Step: 448810 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:10:40,960-Speed 2497.66 samples/sec Loss 1.9784 LearningRate 
0.000260 Epoch: 21 Global Step: 448820 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:10:49,160-Speed 2498.20 samples/sec Loss 2.0038 LearningRate 0.000260 Epoch: 21 Global Step: 448830 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:10:57,363-Speed 2497.24 samples/sec Loss 1.9704 LearningRate 0.000260 Epoch: 21 Global Step: 448840 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:11:05,563-Speed 2497.81 samples/sec Loss 1.9868 LearningRate 0.000260 Epoch: 21 Global Step: 448850 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:11:13,771-Speed 2495.44 samples/sec Loss 1.9905 LearningRate 0.000260 Epoch: 21 Global Step: 448860 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:11:21,921-Speed 2513.89 samples/sec Loss 2.0028 LearningRate 0.000260 Epoch: 21 Global Step: 448870 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:11:30,120-Speed 2498.16 samples/sec Loss 1.9843 LearningRate 0.000260 Epoch: 21 Global Step: 448880 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:11:38,317-Speed 2498.84 samples/sec Loss 1.9761 LearningRate 0.000260 Epoch: 21 Global Step: 448890 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:11:46,515-Speed 2498.57 samples/sec Loss 1.9594 LearningRate 0.000260 Epoch: 21 Global Step: 448900 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:11:54,714-Speed 2498.27 samples/sec Loss 1.9905 LearningRate 0.000260 Epoch: 21 Global Step: 448910 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:12:02,916-Speed 2497.21 samples/sec Loss 1.9790 LearningRate 0.000260 Epoch: 21 Global Step: 448920 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:12:11,072-Speed 2511.61 samples/sec Loss 1.9833 LearningRate 0.000260 Epoch: 21 Global Step: 448930 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:12:19,277-Speed 2496.14 samples/sec Loss 1.9972 LearningRate 
0.000260 Epoch: 21 Global Step: 448940 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:12:27,486-Speed 2495.42 samples/sec Loss 1.9921 LearningRate 0.000260 Epoch: 21 Global Step: 448950 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:12:35,686-Speed 2497.90 samples/sec Loss 1.9570 LearningRate 0.000260 Epoch: 21 Global Step: 448960 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:12:43,911-Speed 2490.22 samples/sec Loss 1.9889 LearningRate 0.000260 Epoch: 21 Global Step: 448970 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:12:52,108-Speed 2498.72 samples/sec Loss 1.9779 LearningRate 0.000260 Epoch: 21 Global Step: 448980 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:13:00,263-Speed 2511.92 samples/sec Loss 1.9559 LearningRate 0.000260 Epoch: 21 Global Step: 448990 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:13:08,461-Speed 2498.31 samples/sec Loss 1.9649 LearningRate 0.000260 Epoch: 21 Global Step: 449000 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:13:16,663-Speed 2497.42 samples/sec Loss 1.9752 LearningRate 0.000260 Epoch: 21 Global Step: 449010 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:13:24,867-Speed 2496.86 samples/sec Loss 1.9530 LearningRate 0.000260 Epoch: 21 Global Step: 449020 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:13:33,084-Speed 2492.57 samples/sec Loss 2.0170 LearningRate 0.000260 Epoch: 21 Global Step: 449030 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:13:41,298-Speed 2493.87 samples/sec Loss 1.9451 LearningRate 0.000260 Epoch: 21 Global Step: 449040 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:13:49,450-Speed 2512.59 samples/sec Loss 2.0081 LearningRate 0.000260 Epoch: 21 Global Step: 449050 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:13:57,663-Speed 2493.96 samples/sec Loss 1.9616 LearningRate 
0.000260 Epoch: 21 Global Step: 449060 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:14:05,863-Speed 2497.95 samples/sec Loss 1.9194 LearningRate 0.000260 Epoch: 21 Global Step: 449070 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:14:14,061-Speed 2498.76 samples/sec Loss 1.9515 LearningRate 0.000260 Epoch: 21 Global Step: 449080 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:14:22,282-Speed 2491.48 samples/sec Loss 2.0130 LearningRate 0.000260 Epoch: 21 Global Step: 449090 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:14:30,485-Speed 2496.74 samples/sec Loss 2.0065 LearningRate 0.000260 Epoch: 21 Global Step: 449100 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:14:38,635-Speed 2513.52 samples/sec Loss 1.9581 LearningRate 0.000260 Epoch: 21 Global Step: 449110 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:14:46,831-Speed 2498.96 samples/sec Loss 1.9745 LearningRate 0.000260 Epoch: 21 Global Step: 449120 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:14:55,036-Speed 2496.67 samples/sec Loss 1.9626 LearningRate 0.000260 Epoch: 21 Global Step: 449130 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:15:03,234-Speed 2498.55 samples/sec Loss 1.9776 LearningRate 0.000260 Epoch: 21 Global Step: 449140 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:15:11,451-Speed 2492.70 samples/sec Loss 1.9510 LearningRate 0.000260 Epoch: 21 Global Step: 449150 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:15:19,659-Speed 2495.73 samples/sec Loss 1.9740 LearningRate 0.000260 Epoch: 21 Global Step: 449160 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:15:27,806-Speed 2514.14 samples/sec Loss 2.0058 LearningRate 0.000260 Epoch: 21 Global Step: 449170 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:15:36,003-Speed 2499.07 samples/sec Loss 2.0026 LearningRate 
0.000260 Epoch: 21 Global Step: 449180 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:15:44,202-Speed 2498.31 samples/sec Loss 1.9984 LearningRate 0.000260 Epoch: 21 Global Step: 449190 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:15:52,416-Speed 2494.13 samples/sec Loss 1.9963 LearningRate 0.000260 Epoch: 21 Global Step: 449200 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:16:00,620-Speed 2496.88 samples/sec Loss 2.0443 LearningRate 0.000260 Epoch: 21 Global Step: 449210 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:16:08,823-Speed 2496.91 samples/sec Loss 1.9602 LearningRate 0.000260 Epoch: 21 Global Step: 449220 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:16:16,973-Speed 2513.33 samples/sec Loss 1.9989 LearningRate 0.000260 Epoch: 21 Global Step: 449230 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:16:25,186-Speed 2494.46 samples/sec Loss 1.9806 LearningRate 0.000259 Epoch: 21 Global Step: 449240 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:16:33,385-Speed 2498.02 samples/sec Loss 1.9934 LearningRate 0.000259 Epoch: 21 Global Step: 449250 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:16:41,597-Speed 2494.54 samples/sec Loss 1.9553 LearningRate 0.000259 Epoch: 21 Global Step: 449260 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:16:49,797-Speed 2498.10 samples/sec Loss 1.9845 LearningRate 0.000259 Epoch: 21 Global Step: 449270 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:16:57,997-Speed 2497.89 samples/sec Loss 1.9604 LearningRate 0.000259 Epoch: 21 Global Step: 449280 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:17:06,150-Speed 2512.29 samples/sec Loss 1.9526 LearningRate 0.000259 Epoch: 21 Global Step: 449290 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:17:14,350-Speed 2497.73 samples/sec Loss 1.9519 LearningRate 
0.000259 Epoch: 21 Global Step: 449300 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:17:22,560-Speed 2495.08 samples/sec Loss 1.9858 LearningRate 0.000259 Epoch: 21 Global Step: 449310 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:17:30,759-Speed 2498.25 samples/sec Loss 1.9629 LearningRate 0.000259 Epoch: 21 Global Step: 449320 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:17:38,958-Speed 2498.21 samples/sec Loss 1.9419 LearningRate 0.000259 Epoch: 21 Global Step: 449330 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:17:47,184-Speed 2490.04 samples/sec Loss 1.9454 LearningRate 0.000259 Epoch: 21 Global Step: 449340 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:17:55,328-Speed 2515.17 samples/sec Loss 1.9429 LearningRate 0.000259 Epoch: 21 Global Step: 449350 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:18:03,530-Speed 2497.61 samples/sec Loss 1.9617 LearningRate 0.000259 Epoch: 21 Global Step: 449360 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:18:11,735-Speed 2496.40 samples/sec Loss 1.9544 LearningRate 0.000259 Epoch: 21 Global Step: 449370 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:18:19,934-Speed 2498.20 samples/sec Loss 1.9780 LearningRate 0.000259 Epoch: 21 Global Step: 449380 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:18:28,135-Speed 2497.93 samples/sec Loss 1.9661 LearningRate 0.000259 Epoch: 21 Global Step: 449390 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:18:36,335-Speed 2497.91 samples/sec Loss 1.9390 LearningRate 0.000259 Epoch: 21 Global Step: 449400 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:18:44,512-Speed 2505.19 samples/sec Loss 1.9714 LearningRate 0.000259 Epoch: 21 Global Step: 449410 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:18:52,712-Speed 2497.81 samples/sec Loss 1.9638 LearningRate 
0.000259 Epoch: 21 Global Step: 449420 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:19:00,913-Speed 2497.88 samples/sec Loss 1.9689 LearningRate 0.000259 Epoch: 21 Global Step: 449430 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:19:09,119-Speed 2496.34 samples/sec Loss 1.9349 LearningRate 0.000259 Epoch: 21 Global Step: 449440 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:19:17,332-Speed 2494.09 samples/sec Loss 1.9559 LearningRate 0.000259 Epoch: 21 Global Step: 449450 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:19:25,527-Speed 2499.47 samples/sec Loss 1.9998 LearningRate 0.000259 Epoch: 21 Global Step: 449460 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:19:33,674-Speed 2514.19 samples/sec Loss 1.9738 LearningRate 0.000259 Epoch: 21 Global Step: 449470 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:19:41,871-Speed 2498.78 samples/sec Loss 1.9056 LearningRate 0.000259 Epoch: 21 Global Step: 449480 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:19:50,069-Speed 2498.87 samples/sec Loss 2.0000 LearningRate 0.000259 Epoch: 21 Global Step: 449490 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:19:58,273-Speed 2496.78 samples/sec Loss 1.9427 LearningRate 0.000259 Epoch: 21 Global Step: 449500 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:20:06,473-Speed 2498.14 samples/sec Loss 1.9711 LearningRate 0.000259 Epoch: 21 Global Step: 449510 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:20:14,686-Speed 2493.73 samples/sec Loss 2.0317 LearningRate 0.000259 Epoch: 21 Global Step: 449520 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:20:22,836-Speed 2513.39 samples/sec Loss 2.0109 LearningRate 0.000259 Epoch: 21 Global Step: 449530 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:20:31,035-Speed 2498.44 samples/sec Loss 1.9893 LearningRate 
0.000259 Epoch: 21 Global Step: 449540 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:20:39,237-Speed 2497.48 samples/sec Loss 2.0511 LearningRate 0.000259 Epoch: 21 Global Step: 449550 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:20:47,437-Speed 2498.18 samples/sec Loss 2.0161 LearningRate 0.000259 Epoch: 21 Global Step: 449560 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:20:55,638-Speed 2497.47 samples/sec Loss 1.9991 LearningRate 0.000259 Epoch: 21 Global Step: 449570 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:21:03,837-Speed 2498.46 samples/sec Loss 1.9933 LearningRate 0.000259 Epoch: 21 Global Step: 449580 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:21:11,979-Speed 2515.65 samples/sec Loss 2.0061 LearningRate 0.000259 Epoch: 21 Global Step: 449590 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:21:20,182-Speed 2496.90 samples/sec Loss 1.9969 LearningRate 0.000259 Epoch: 21 Global Step: 449600 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:21:28,382-Speed 2498.12 samples/sec Loss 1.9830 LearningRate 0.000259 Epoch: 21 Global Step: 449610 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:21:36,580-Speed 2498.62 samples/sec Loss 1.9628 LearningRate 0.000259 Epoch: 21 Global Step: 449620 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:21:44,792-Speed 2494.37 samples/sec Loss 1.9893 LearningRate 0.000259 Epoch: 21 Global Step: 449630 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:21:53,002-Speed 2494.93 samples/sec Loss 1.9661 LearningRate 0.000259 Epoch: 21 Global Step: 449640 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:22:01,150-Speed 2513.84 samples/sec Loss 2.0037 LearningRate 0.000259 Epoch: 21 Global Step: 449650 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:22:09,349-Speed 2498.39 samples/sec Loss 1.9666 LearningRate 
0.000259 Epoch: 21 Global Step: 449660 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:22:17,565-Speed 2493.01 samples/sec Loss 2.0101 LearningRate 0.000259 Epoch: 21 Global Step: 449670 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:22:25,764-Speed 2498.37 samples/sec Loss 1.9611 LearningRate 0.000259 Epoch: 21 Global Step: 449680 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:22:33,964-Speed 2497.93 samples/sec Loss 1.9829 LearningRate 0.000259 Epoch: 21 Global Step: 449690 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:22:42,165-Speed 2497.37 samples/sec Loss 1.9382 LearningRate 0.000259 Epoch: 21 Global Step: 449700 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:22:50,315-Speed 2513.42 samples/sec Loss 2.0047 LearningRate 0.000259 Epoch: 21 Global Step: 449710 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:22:58,513-Speed 2498.55 samples/sec Loss 1.9490 LearningRate 0.000259 Epoch: 21 Global Step: 449720 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:23:06,712-Speed 2498.41 samples/sec Loss 1.9717 LearningRate 0.000259 Epoch: 21 Global Step: 449730 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:23:14,910-Speed 2498.40 samples/sec Loss 1.9933 LearningRate 0.000259 Epoch: 21 Global Step: 449740 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:23:23,111-Speed 2497.63 samples/sec Loss 1.9920 LearningRate 0.000259 Epoch: 21 Global Step: 449750 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:23:31,322-Speed 2494.73 samples/sec Loss 1.9939 LearningRate 0.000259 Epoch: 21 Global Step: 449760 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:23:39,470-Speed 2513.66 samples/sec Loss 2.0027 LearningRate 0.000259 Epoch: 21 Global Step: 449770 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:23:47,670-Speed 2498.17 samples/sec Loss 2.0224 LearningRate 
0.000259 Epoch: 21 Global Step: 449780 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:23:55,869-Speed 2498.23 samples/sec Loss 2.0312 LearningRate 0.000259 Epoch: 21 Global Step: 449790 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:24:04,072-Speed 2497.14 samples/sec Loss 1.9953 LearningRate 0.000259 Epoch: 21 Global Step: 449800 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:24:12,273-Speed 2497.66 samples/sec Loss 2.0686 LearningRate 0.000259 Epoch: 21 Global Step: 449810 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:24:20,473-Speed 2497.74 samples/sec Loss 2.0508 LearningRate 0.000259 Epoch: 21 Global Step: 449820 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:24:28,629-Speed 2511.23 samples/sec Loss 1.9903 LearningRate 0.000259 Epoch: 21 Global Step: 449830 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:24:36,831-Speed 2497.52 samples/sec Loss 1.9897 LearningRate 0.000259 Epoch: 21 Global Step: 449840 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:24:45,037-Speed 2496.32 samples/sec Loss 1.9790 LearningRate 0.000259 Epoch: 21 Global Step: 449850 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:24:53,237-Speed 2498.05 samples/sec Loss 2.0129 LearningRate 0.000259 Epoch: 21 Global Step: 449860 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:25:01,438-Speed 2497.57 samples/sec Loss 1.9869 LearningRate 0.000259 Epoch: 21 Global Step: 449870 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:25:09,651-Speed 2493.94 samples/sec Loss 1.9995 LearningRate 0.000259 Epoch: 21 Global Step: 449880 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:25:17,798-Speed 2514.39 samples/sec Loss 2.0229 LearningRate 0.000259 Epoch: 21 Global Step: 449890 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:25:25,997-Speed 2498.30 samples/sec Loss 1.9532 LearningRate 
0.000259 Epoch: 21 Global Step: 449900 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:25:34,199-Speed 2496.97 samples/sec Loss 2.0048 LearningRate 0.000259 Epoch: 21 Global Step: 449910 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:25:42,403-Speed 2496.76 samples/sec Loss 1.9959 LearningRate 0.000259 Epoch: 21 Global Step: 449920 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:25:50,603-Speed 2498.27 samples/sec Loss 2.0118 LearningRate 0.000259 Epoch: 21 Global Step: 449930 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:25:58,802-Speed 2498.21 samples/sec Loss 1.9425 LearningRate 0.000259 Epoch: 21 Global Step: 449940 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:26:06,956-Speed 2512.07 samples/sec Loss 1.9883 LearningRate 0.000259 Epoch: 21 Global Step: 449950 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:26:15,152-Speed 2499.42 samples/sec Loss 2.0354 LearningRate 0.000259 Epoch: 21 Global Step: 449960 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:26:23,352-Speed 2497.97 samples/sec Loss 2.0654 LearningRate 0.000258 Epoch: 21 Global Step: 449970 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:26:31,557-Speed 2496.46 samples/sec Loss 1.9831 LearningRate 0.000258 Epoch: 21 Global Step: 449980 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:26:39,757-Speed 2498.07 samples/sec Loss 1.9216 LearningRate 0.000258 Epoch: 21 Global Step: 449990 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:26:47,963-Speed 2495.99 samples/sec Loss 1.9905 LearningRate 0.000258 Epoch: 21 Global Step: 450000 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:26:56,110-Speed 2514.12 samples/sec Loss 2.0117 LearningRate 0.000258 Epoch: 21 Global Step: 450010 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:27:04,307-Speed 2498.89 samples/sec Loss 1.9881 LearningRate 
0.000258 Epoch: 21 Global Step: 450020 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:27:12,523-Speed 2493.16 samples/sec Loss 2.0288 LearningRate 0.000258 Epoch: 21 Global Step: 450030 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:27:20,735-Speed 2494.50 samples/sec Loss 1.9983 LearningRate 0.000258 Epoch: 21 Global Step: 450040 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:27:28,930-Speed 2499.79 samples/sec Loss 1.9995 LearningRate 0.000258 Epoch: 21 Global Step: 450050 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:27:37,131-Speed 2497.54 samples/sec Loss 1.9818 LearningRate 0.000258 Epoch: 21 Global Step: 450060 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:27:45,276-Speed 2514.76 samples/sec Loss 1.9838 LearningRate 0.000258 Epoch: 21 Global Step: 450070 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:27:53,494-Speed 2492.72 samples/sec Loss 2.0140 LearningRate 0.000258 Epoch: 21 Global Step: 450080 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:28:01,692-Speed 2498.69 samples/sec Loss 1.9973 LearningRate 0.000258 Epoch: 21 Global Step: 450090 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:28:09,893-Speed 2497.73 samples/sec Loss 1.9745 LearningRate 0.000258 Epoch: 21 Global Step: 450100 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:28:18,096-Speed 2497.09 samples/sec Loss 2.0205 LearningRate 0.000258 Epoch: 21 Global Step: 450110 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:28:26,313-Speed 2492.88 samples/sec Loss 1.9741 LearningRate 0.000258 Epoch: 21 Global Step: 450120 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:28:34,460-Speed 2514.02 samples/sec Loss 1.9775 LearningRate 0.000258 Epoch: 21 Global Step: 450130 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:28:42,666-Speed 2496.27 samples/sec Loss 1.9798 LearningRate 
0.000258 Epoch: 21 Global Step: 450140 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:28:50,870-Speed 2496.88 samples/sec Loss 1.9327 LearningRate 0.000258 Epoch: 21 Global Step: 450150 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:28:59,085-Speed 2493.37 samples/sec Loss 1.9996 LearningRate 0.000258 Epoch: 21 Global Step: 450160 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:29:07,286-Speed 2497.63 samples/sec Loss 1.9907 LearningRate 0.000258 Epoch: 21 Global Step: 450170 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:29:15,490-Speed 2496.64 samples/sec Loss 1.9567 LearningRate 0.000258 Epoch: 21 Global Step: 450180 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:29:23,636-Speed 2514.57 samples/sec Loss 1.9863 LearningRate 0.000258 Epoch: 21 Global Step: 450190 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:29:31,842-Speed 2496.08 samples/sec Loss 1.9866 LearningRate 0.000258 Epoch: 21 Global Step: 450200 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:29:40,047-Speed 2496.59 samples/sec Loss 1.9955 LearningRate 0.000258 Epoch: 21 Global Step: 450210 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:29:48,250-Speed 2496.78 samples/sec Loss 1.9617 LearningRate 0.000258 Epoch: 21 Global Step: 450220 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:29:56,451-Speed 2497.59 samples/sec Loss 1.9519 LearningRate 0.000258 Epoch: 21 Global Step: 450230 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:30:04,650-Speed 2498.41 samples/sec Loss 1.9805 LearningRate 0.000258 Epoch: 21 Global Step: 450240 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:30:12,795-Speed 2514.91 samples/sec Loss 1.9390 LearningRate 0.000258 Epoch: 21 Global Step: 450250 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:30:20,993-Speed 2498.62 samples/sec Loss 2.0126 LearningRate 
0.000258 Epoch: 21 Global Step: 450260 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:30:29,196-Speed 2497.00 samples/sec Loss 1.9719 LearningRate 0.000258 Epoch: 21 Global Step: 450270 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:30:37,407-Speed 2494.62 samples/sec Loss 1.9650 LearningRate 0.000258 Epoch: 21 Global Step: 450280 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:30:45,608-Speed 2497.75 samples/sec Loss 1.9642 LearningRate 0.000258 Epoch: 21 Global Step: 450290 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:30:53,804-Speed 2498.78 samples/sec Loss 1.9985 LearningRate 0.000258 Epoch: 21 Global Step: 450300 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:31:01,954-Speed 2513.42 samples/sec Loss 1.9364 LearningRate 0.000258 Epoch: 21 Global Step: 450310 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:31:10,163-Speed 2495.32 samples/sec Loss 1.9824 LearningRate 0.000258 Epoch: 21 Global Step: 450320 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:31:18,365-Speed 2497.37 samples/sec Loss 1.9780 LearningRate 0.000258 Epoch: 21 Global Step: 450330 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:31:26,564-Speed 2498.07 samples/sec Loss 1.9819 LearningRate 0.000258 Epoch: 21 Global Step: 450340 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:31:34,764-Speed 2498.02 samples/sec Loss 1.9504 LearningRate 0.000258 Epoch: 21 Global Step: 450350 Fp16 Grad Scale: 65536 Required: 87 hours Training: 2022-07-09 21:31:42,968-Speed 2496.60 samples/sec Loss 1.9413 LearningRate 0.000258 Epoch: 21 Global Step: 450360 Fp16 Grad Scale: 65536 Required: 87 hours Training: 2022-07-09 21:31:51,114-Speed 2514.70 samples/sec Loss 1.9582 LearningRate 0.000258 Epoch: 21 Global Step: 450370 Fp16 Grad Scale: 65536 Required: 87 hours Training: 2022-07-09 21:31:59,318-Speed 2496.49 samples/sec Loss 1.9855 LearningRate 
0.000258 Epoch: 21 Global Step: 450380 Fp16 Grad Scale: 65536 Required: 87 hours Training: 2022-07-09 21:32:07,519-Speed 2497.83 samples/sec Loss 1.9154 LearningRate 0.000258 Epoch: 21 Global Step: 450390 Fp16 Grad Scale: 65536 Required: 87 hours Training: 2022-07-09 21:32:15,673-Speed 2511.89 samples/sec Loss 1.9372 LearningRate 0.000258 Epoch: 21 Global Step: 450400 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:32:23,873-Speed 2498.01 samples/sec Loss 1.9643 LearningRate 0.000258 Epoch: 21 Global Step: 450410 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:32:32,078-Speed 2496.50 samples/sec Loss 1.9638 LearningRate 0.000258 Epoch: 21 Global Step: 450420 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:32:40,226-Speed 2513.79 samples/sec Loss 1.9202 LearningRate 0.000258 Epoch: 21 Global Step: 450430 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:32:48,423-Speed 2498.95 samples/sec Loss 1.9446 LearningRate 0.000258 Epoch: 21 Global Step: 450440 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:32:56,621-Speed 2498.57 samples/sec Loss 1.9527 LearningRate 0.000258 Epoch: 21 Global Step: 450450 Fp16 Grad Scale: 32768 Required: 87 hours Training: 2022-07-09 21:33:04,785-Speed 2508.67 samples/sec Loss 1.9519 LearningRate 0.000258 Epoch: 21 Global Step: 450460 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:33:12,988-Speed 2497.35 samples/sec Loss 1.9754 LearningRate 0.000258 Epoch: 21 Global Step: 450470 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:33:21,185-Speed 2498.63 samples/sec Loss 1.9658 LearningRate 0.000258 Epoch: 21 Global Step: 450480 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:33:29,331-Speed 2514.42 samples/sec Loss 1.9233 LearningRate 0.000258 Epoch: 21 Global Step: 450490 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:33:37,529-Speed 2498.85 samples/sec Loss 1.9612 LearningRate 
0.000258 Epoch: 21 Global Step: 450500 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:33:45,731-Speed 2497.48 samples/sec Loss 1.9498 LearningRate 0.000258 Epoch: 21 Global Step: 450510 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:33:53,930-Speed 2498.36 samples/sec Loss 1.9307 LearningRate 0.000258 Epoch: 21 Global Step: 450520 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:34:02,126-Speed 2499.05 samples/sec Loss 1.9093 LearningRate 0.000258 Epoch: 21 Global Step: 450530 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:34:10,329-Speed 2497.26 samples/sec Loss 1.9442 LearningRate 0.000258 Epoch: 21 Global Step: 450540 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:34:18,488-Speed 2510.21 samples/sec Loss 1.8866 LearningRate 0.000258 Epoch: 21 Global Step: 450550 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:34:26,686-Speed 2498.47 samples/sec Loss 1.9696 LearningRate 0.000258 Epoch: 21 Global Step: 450560 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:34:34,886-Speed 2498.46 samples/sec Loss 1.9288 LearningRate 0.000258 Epoch: 21 Global Step: 450570 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:34:43,087-Speed 2497.75 samples/sec Loss 2.0127 LearningRate 0.000258 Epoch: 21 Global Step: 450580 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:34:51,289-Speed 2497.24 samples/sec Loss 1.9677 LearningRate 0.000258 Epoch: 21 Global Step: 450590 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:34:59,490-Speed 2497.69 samples/sec Loss 1.9596 LearningRate 0.000258 Epoch: 21 Global Step: 450600 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:35:07,636-Speed 2514.36 samples/sec Loss 1.9560 LearningRate 0.000258 Epoch: 21 Global Step: 450610 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:35:15,855-Speed 2492.52 samples/sec Loss 2.0171 LearningRate 
0.000258 Epoch: 21 Global Step: 450620 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:35:24,062-Speed 2495.67 samples/sec Loss 1.9726 LearningRate 0.000258 Epoch: 21 Global Step: 450630 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:35:32,261-Speed 2498.24 samples/sec Loss 1.9919 LearningRate 0.000258 Epoch: 21 Global Step: 450640 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:35:40,463-Speed 2497.44 samples/sec Loss 1.9402 LearningRate 0.000258 Epoch: 21 Global Step: 450650 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:35:48,664-Speed 2497.71 samples/sec Loss 1.9235 LearningRate 0.000258 Epoch: 21 Global Step: 450660 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:35:56,806-Speed 2515.54 samples/sec Loss 1.9313 LearningRate 0.000258 Epoch: 21 Global Step: 450670 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:36:05,014-Speed 2495.77 samples/sec Loss 1.9808 LearningRate 0.000258 Epoch: 21 Global Step: 450680 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:36:13,209-Speed 2499.43 samples/sec Loss 1.9768 LearningRate 0.000258 Epoch: 21 Global Step: 450690 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:36:21,408-Speed 2498.22 samples/sec Loss 1.9968 LearningRate 0.000258 Epoch: 21 Global Step: 450700 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:36:29,613-Speed 2496.33 samples/sec Loss 1.9341 LearningRate 0.000257 Epoch: 21 Global Step: 450710 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:36:37,813-Speed 2497.78 samples/sec Loss 1.9238 LearningRate 0.000257 Epoch: 21 Global Step: 450720 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:36:45,965-Speed 2512.61 samples/sec Loss 1.9039 LearningRate 0.000257 Epoch: 21 Global Step: 450730 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:36:54,167-Speed 2497.43 samples/sec Loss 1.9304 LearningRate 
0.000257 Epoch: 21 Global Step: 450740 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:37:02,375-Speed 2495.59 samples/sec Loss 1.9680 LearningRate 0.000257 Epoch: 21 Global Step: 450750 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:37:10,581-Speed 2496.32 samples/sec Loss 1.9579 LearningRate 0.000257 Epoch: 21 Global Step: 450760 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:37:18,781-Speed 2497.94 samples/sec Loss 1.9549 LearningRate 0.000257 Epoch: 21 Global Step: 450770 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:37:26,985-Speed 2496.50 samples/sec Loss 2.0050 LearningRate 0.000257 Epoch: 21 Global Step: 450780 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:37:35,138-Speed 2512.31 samples/sec Loss 1.9381 LearningRate 0.000257 Epoch: 21 Global Step: 450790 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:37:43,340-Speed 2497.28 samples/sec Loss 2.0597 LearningRate 0.000257 Epoch: 21 Global Step: 450800 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:37:51,543-Speed 2497.10 samples/sec Loss 2.0273 LearningRate 0.000257 Epoch: 21 Global Step: 450810 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:37:59,750-Speed 2496.01 samples/sec Loss 1.9979 LearningRate 0.000257 Epoch: 21 Global Step: 450820 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:38:07,953-Speed 2496.98 samples/sec Loss 1.9852 LearningRate 0.000257 Epoch: 21 Global Step: 450830 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:38:16,161-Speed 2495.47 samples/sec Loss 2.0028 LearningRate 0.000257 Epoch: 21 Global Step: 450840 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:38:24,311-Speed 2513.45 samples/sec Loss 1.9971 LearningRate 0.000257 Epoch: 21 Global Step: 450850 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:38:32,512-Speed 2497.56 samples/sec Loss 1.9905 LearningRate 
0.000257 Epoch: 21 Global Step: 450860 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:38:40,713-Speed 2497.73 samples/sec Loss 1.9641 LearningRate 0.000257 Epoch: 21 Global Step: 450870 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:38:48,917-Speed 2497.07 samples/sec Loss 2.0028 LearningRate 0.000257 Epoch: 21 Global Step: 450880 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:38:57,124-Speed 2495.81 samples/sec Loss 1.9892 LearningRate 0.000257 Epoch: 21 Global Step: 450890 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:39:05,328-Speed 2496.56 samples/sec Loss 1.9970 LearningRate 0.000257 Epoch: 21 Global Step: 450900 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:39:13,475-Speed 2514.25 samples/sec Loss 1.9615 LearningRate 0.000257 Epoch: 21 Global Step: 450910 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:39:21,674-Speed 2498.43 samples/sec Loss 1.9852 LearningRate 0.000257 Epoch: 21 Global Step: 450920 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:39:29,880-Speed 2496.42 samples/sec Loss 2.0015 LearningRate 0.000257 Epoch: 21 Global Step: 450930 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:39:38,091-Speed 2494.57 samples/sec Loss 1.9854 LearningRate 0.000257 Epoch: 21 Global Step: 450940 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:39:46,295-Speed 2496.86 samples/sec Loss 1.9747 LearningRate 0.000257 Epoch: 21 Global Step: 450950 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:39:54,503-Speed 2495.49 samples/sec Loss 2.0024 LearningRate 0.000257 Epoch: 21 Global Step: 450960 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:40:02,649-Speed 2514.35 samples/sec Loss 1.9791 LearningRate 0.000257 Epoch: 21 Global Step: 450970 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:40:10,857-Speed 2495.70 samples/sec Loss 1.9685 LearningRate 
0.000257 Epoch: 21 Global Step: 450980 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:40:19,056-Speed 2498.61 samples/sec Loss 1.9515 LearningRate 0.000257 Epoch: 21 Global Step: 450990 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:40:27,260-Speed 2496.94 samples/sec Loss 1.9315 LearningRate 0.000257 Epoch: 21 Global Step: 451000 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:40:35,479-Speed 2492.11 samples/sec Loss 1.9487 LearningRate 0.000257 Epoch: 21 Global Step: 451010 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:40:43,677-Speed 2498.46 samples/sec Loss 1.9336 LearningRate 0.000257 Epoch: 21 Global Step: 451020 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:40:51,837-Speed 2510.15 samples/sec Loss 1.9926 LearningRate 0.000257 Epoch: 21 Global Step: 451030 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:41:00,036-Speed 2498.54 samples/sec Loss 1.9737 LearningRate 0.000257 Epoch: 21 Global Step: 451040 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:41:08,238-Speed 2497.20 samples/sec Loss 1.9499 LearningRate 0.000257 Epoch: 21 Global Step: 451050 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:41:16,439-Speed 2497.80 samples/sec Loss 1.9847 LearningRate 0.000257 Epoch: 21 Global Step: 451060 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:41:24,637-Speed 2498.63 samples/sec Loss 1.9475 LearningRate 0.000257 Epoch: 21 Global Step: 451070 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:41:32,841-Speed 2496.92 samples/sec Loss 1.9676 LearningRate 0.000257 Epoch: 21 Global Step: 451080 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:41:40,984-Speed 2515.45 samples/sec Loss 1.9205 LearningRate 0.000257 Epoch: 21 Global Step: 451090 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:41:49,186-Speed 2497.29 samples/sec Loss 1.9248 LearningRate 
0.000257 Epoch: 21 Global Step: 451100 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:41:57,386-Speed 2497.85 samples/sec Loss 1.9399 LearningRate 0.000257 Epoch: 21 Global Step: 451110 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:42:05,583-Speed 2498.96 samples/sec Loss 1.9369 LearningRate 0.000257 Epoch: 21 Global Step: 451120 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:42:13,786-Speed 2496.95 samples/sec Loss 1.9500 LearningRate 0.000257 Epoch: 21 Global Step: 451130 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:42:21,984-Speed 2498.53 samples/sec Loss 1.8694 LearningRate 0.000257 Epoch: 21 Global Step: 451140 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:42:30,133-Speed 2513.53 samples/sec Loss 1.9346 LearningRate 0.000257 Epoch: 21 Global Step: 451150 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:42:38,336-Speed 2497.26 samples/sec Loss 1.9831 LearningRate 0.000257 Epoch: 21 Global Step: 451160 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:42:46,539-Speed 2497.09 samples/sec Loss 1.9175 LearningRate 0.000257 Epoch: 21 Global Step: 451170 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:42:54,737-Speed 2498.65 samples/sec Loss 1.9550 LearningRate 0.000257 Epoch: 21 Global Step: 451180 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:43:02,938-Speed 2497.98 samples/sec Loss 1.9322 LearningRate 0.000257 Epoch: 21 Global Step: 451190 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:43:11,142-Speed 2496.62 samples/sec Loss 1.9512 LearningRate 0.000257 Epoch: 21 Global Step: 451200 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:43:19,297-Speed 2511.93 samples/sec Loss 1.9330 LearningRate 0.000257 Epoch: 21 Global Step: 451210 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:43:27,506-Speed 2495.21 samples/sec Loss 1.9540 LearningRate 
0.000257 Epoch: 21 Global Step: 451220 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:43:35,703-Speed 2498.57 samples/sec Loss 2.0015 LearningRate 0.000257 Epoch: 21 Global Step: 451230 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:43:43,902-Speed 2498.23 samples/sec Loss 1.9221 LearningRate 0.000257 Epoch: 21 Global Step: 451240 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:43:52,104-Speed 2497.59 samples/sec Loss 1.9767 LearningRate 0.000257 Epoch: 21 Global Step: 451250 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:44:00,318-Speed 2493.92 samples/sec Loss 1.9333 LearningRate 0.000257 Epoch: 21 Global Step: 451260 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:44:08,468-Speed 2513.43 samples/sec Loss 1.9373 LearningRate 0.000257 Epoch: 21 Global Step: 451270 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:44:16,668-Speed 2497.81 samples/sec Loss 1.9559 LearningRate 0.000257 Epoch: 21 Global Step: 451280 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:44:24,870-Speed 2497.24 samples/sec Loss 1.9263 LearningRate 0.000257 Epoch: 21 Global Step: 451290 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:44:33,088-Speed 2492.47 samples/sec Loss 1.9440 LearningRate 0.000257 Epoch: 21 Global Step: 451300 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:44:41,290-Speed 2497.31 samples/sec Loss 1.9866 LearningRate 0.000257 Epoch: 21 Global Step: 451310 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:44:49,491-Speed 2497.51 samples/sec Loss 1.9452 LearningRate 0.000257 Epoch: 21 Global Step: 451320 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:44:57,637-Speed 2515.41 samples/sec Loss 1.9481 LearningRate 0.000257 Epoch: 21 Global Step: 451330 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:45:05,841-Speed 2496.68 samples/sec Loss 1.9256 LearningRate 
0.000257 Epoch: 21 Global Step: 451340 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:45:14,046-Speed 2496.48 samples/sec Loss 1.9159 LearningRate 0.000257 Epoch: 21 Global Step: 451350 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:45:22,259-Speed 2493.82 samples/sec Loss 1.9729 LearningRate 0.000257 Epoch: 21 Global Step: 451360 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:45:30,459-Speed 2498.25 samples/sec Loss 1.9240 LearningRate 0.000257 Epoch: 21 Global Step: 451370 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:45:38,661-Speed 2497.16 samples/sec Loss 1.9546 LearningRate 0.000257 Epoch: 21 Global Step: 451380 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:45:46,807-Speed 2514.59 samples/sec Loss 2.0314 LearningRate 0.000257 Epoch: 21 Global Step: 451390 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:45:55,010-Speed 2497.46 samples/sec Loss 1.9138 LearningRate 0.000257 Epoch: 21 Global Step: 451400 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:46:03,211-Speed 2497.60 samples/sec Loss 2.0024 LearningRate 0.000257 Epoch: 21 Global Step: 451410 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:46:11,414-Speed 2496.92 samples/sec Loss 1.9209 LearningRate 0.000257 Epoch: 21 Global Step: 451420 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:46:19,627-Speed 2494.12 samples/sec Loss 1.9543 LearningRate 0.000257 Epoch: 21 Global Step: 451430 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:46:27,833-Speed 2496.07 samples/sec Loss 1.9781 LearningRate 0.000256 Epoch: 21 Global Step: 451440 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:46:35,997-Speed 2509.06 samples/sec Loss 1.9380 LearningRate 0.000256 Epoch: 21 Global Step: 451450 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:46:44,196-Speed 2498.14 samples/sec Loss 2.0056 LearningRate 
0.000256 Epoch: 21 Global Step: 451460 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:46:52,417-Speed 2491.79 samples/sec Loss 1.9967 LearningRate 0.000256 Epoch: 21 Global Step: 451470 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:47:00,618-Speed 2497.55 samples/sec Loss 2.0048 LearningRate 0.000256 Epoch: 21 Global Step: 451480 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:47:08,817-Speed 2498.11 samples/sec Loss 1.9477 LearningRate 0.000256 Epoch: 21 Global Step: 451490 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:47:17,016-Speed 2498.16 samples/sec Loss 2.0011 LearningRate 0.000256 Epoch: 21 Global Step: 451500 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:47:25,163-Speed 2515.06 samples/sec Loss 1.9900 LearningRate 0.000256 Epoch: 21 Global Step: 451510 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:47:33,357-Speed 2499.42 samples/sec Loss 2.0037 LearningRate 0.000256 Epoch: 21 Global Step: 451520 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:47:41,558-Speed 2497.80 samples/sec Loss 2.0219 LearningRate 0.000256 Epoch: 21 Global Step: 451530 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:47:49,753-Speed 2499.95 samples/sec Loss 1.9999 LearningRate 0.000256 Epoch: 21 Global Step: 451540 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:47:57,951-Speed 2498.59 samples/sec Loss 1.9453 LearningRate 0.000256 Epoch: 21 Global Step: 451550 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:48:06,151-Speed 2497.61 samples/sec Loss 1.9637 LearningRate 0.000256 Epoch: 21 Global Step: 451560 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:48:14,303-Speed 2513.06 samples/sec Loss 1.9675 LearningRate 0.000256 Epoch: 21 Global Step: 451570 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:48:22,503-Speed 2498.03 samples/sec Loss 1.9751 LearningRate 
0.000256 Epoch: 21 Global Step: 451580 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:48:30,701-Speed 2498.40 samples/sec Loss 1.9528 LearningRate 0.000256 Epoch: 21 Global Step: 451590 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:48:38,909-Speed 2495.69 samples/sec Loss 1.9606 LearningRate 0.000256 Epoch: 21 Global Step: 451600 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:48:47,128-Speed 2492.39 samples/sec Loss 1.9714 LearningRate 0.000256 Epoch: 21 Global Step: 451610 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:48:55,332-Speed 2496.80 samples/sec Loss 1.9806 LearningRate 0.000256 Epoch: 21 Global Step: 451620 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:49:03,487-Speed 2511.51 samples/sec Loss 1.9632 LearningRate 0.000256 Epoch: 21 Global Step: 451630 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:49:11,688-Speed 2497.54 samples/sec Loss 1.9620 LearningRate 0.000256 Epoch: 21 Global Step: 451640 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:49:19,889-Speed 2497.69 samples/sec Loss 2.0060 LearningRate 0.000256 Epoch: 21 Global Step: 451650 Fp16 Grad Scale: 16384 Required: 87 hours Training: 2022-07-09 21:49:28,089-Speed 2498.28 samples/sec Loss 2.0174 LearningRate 0.000256 Epoch: 21 Global Step: 451660 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:49:36,303-Speed 2493.62 samples/sec Loss 1.9622 LearningRate 0.000256 Epoch: 21 Global Step: 451670 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:49:44,501-Speed 2498.41 samples/sec Loss 1.9840 LearningRate 0.000256 Epoch: 21 Global Step: 451680 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:49:52,652-Speed 2513.19 samples/sec Loss 1.9673 LearningRate 0.000256 Epoch: 21 Global Step: 451690 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:50:00,861-Speed 2495.23 samples/sec Loss 2.0112 LearningRate 
0.000256 Epoch: 21 Global Step: 451700 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:50:09,061-Speed 2498.01 samples/sec Loss 1.9774 LearningRate 0.000256 Epoch: 21 Global Step: 451710 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:50:17,260-Speed 2498.46 samples/sec Loss 2.0163 LearningRate 0.000256 Epoch: 21 Global Step: 451720 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:50:25,480-Speed 2491.61 samples/sec Loss 2.0295 LearningRate 0.000256 Epoch: 21 Global Step: 451730 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:50:33,681-Speed 2497.81 samples/sec Loss 2.0007 LearningRate 0.000256 Epoch: 21 Global Step: 451740 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:50:41,852-Speed 2507.01 samples/sec Loss 1.9114 LearningRate 0.000256 Epoch: 21 Global Step: 451750 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:50:50,052-Speed 2497.81 samples/sec Loss 1.9980 LearningRate 0.000256 Epoch: 21 Global Step: 451760 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:50:58,252-Speed 2498.17 samples/sec Loss 1.9743 LearningRate 0.000256 Epoch: 21 Global Step: 451770 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:51:06,452-Speed 2498.11 samples/sec Loss 2.0251 LearningRate 0.000256 Epoch: 21 Global Step: 451780 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:51:14,655-Speed 2497.05 samples/sec Loss 1.9818 LearningRate 0.000256 Epoch: 21 Global Step: 451790 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:51:22,857-Speed 2497.39 samples/sec Loss 2.0006 LearningRate 0.000256 Epoch: 21 Global Step: 451800 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:51:31,003-Speed 2514.41 samples/sec Loss 1.9718 LearningRate 0.000256 Epoch: 21 Global Step: 451810 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:51:39,207-Speed 2497.09 samples/sec Loss 1.9791 LearningRate 
0.000256 Epoch: 21 Global Step: 451820 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:51:47,411-Speed 2496.75 samples/sec Loss 1.9872 LearningRate 0.000256 Epoch: 21 Global Step: 451830 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:51:55,615-Speed 2496.58 samples/sec Loss 1.9879 LearningRate 0.000256 Epoch: 21 Global Step: 451840 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:52:03,816-Speed 2497.52 samples/sec Loss 1.9852 LearningRate 0.000256 Epoch: 21 Global Step: 451850 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:52:12,017-Speed 2497.92 samples/sec Loss 1.9834 LearningRate 0.000256 Epoch: 21 Global Step: 451860 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:52:20,163-Speed 2514.43 samples/sec Loss 2.0016 LearningRate 0.000256 Epoch: 21 Global Step: 451870 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:52:28,365-Speed 2497.43 samples/sec Loss 1.9917 LearningRate 0.000256 Epoch: 21 Global Step: 451880 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:52:36,575-Speed 2494.81 samples/sec Loss 2.0102 LearningRate 0.000256 Epoch: 21 Global Step: 451890 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:52:44,789-Speed 2493.47 samples/sec Loss 2.0158 LearningRate 0.000256 Epoch: 21 Global Step: 451900 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:52:52,990-Speed 2497.90 samples/sec Loss 1.9716 LearningRate 0.000256 Epoch: 21 Global Step: 451910 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:53:01,199-Speed 2495.18 samples/sec Loss 1.9334 LearningRate 0.000256 Epoch: 21 Global Step: 451920 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:53:09,353-Speed 2512.41 samples/sec Loss 1.9685 LearningRate 0.000256 Epoch: 21 Global Step: 451930 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:53:17,552-Speed 2498.25 samples/sec Loss 1.9384 LearningRate 
0.000256 Epoch: 21 Global Step: 451940 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:53:25,753-Speed 2497.67 samples/sec Loss 1.9658 LearningRate 0.000256 Epoch: 21 Global Step: 451950 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:53:33,956-Speed 2496.93 samples/sec Loss 1.9730 LearningRate 0.000256 Epoch: 21 Global Step: 451960 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:53:42,154-Speed 2498.65 samples/sec Loss 1.9698 LearningRate 0.000256 Epoch: 21 Global Step: 451970 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:53:50,355-Speed 2497.61 samples/sec Loss 1.9793 LearningRate 0.000256 Epoch: 21 Global Step: 451980 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:53:58,504-Speed 2513.67 samples/sec Loss 1.9609 LearningRate 0.000256 Epoch: 21 Global Step: 451990 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:54:06,704-Speed 2497.94 samples/sec Loss 1.9823 LearningRate 0.000256 Epoch: 21 Global Step: 452000 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:54:14,904-Speed 2497.96 samples/sec Loss 2.0112 LearningRate 0.000256 Epoch: 21 Global Step: 452010 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:54:23,111-Speed 2495.90 samples/sec Loss 2.0203 LearningRate 0.000256 Epoch: 21 Global Step: 452020 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:54:31,309-Speed 2498.51 samples/sec Loss 1.9288 LearningRate 0.000256 Epoch: 21 Global Step: 452030 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:54:39,525-Speed 2493.31 samples/sec Loss 2.0067 LearningRate 0.000256 Epoch: 21 Global Step: 452040 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:54:47,672-Speed 2514.13 samples/sec Loss 1.9518 LearningRate 0.000256 Epoch: 21 Global Step: 452050 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:54:55,874-Speed 2497.50 samples/sec Loss 1.9585 LearningRate 
0.000256 Epoch: 21 Global Step: 452060 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:55:04,075-Speed 2497.59 samples/sec Loss 1.9488 LearningRate 0.000256 Epoch: 21 Global Step: 452070 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:55:12,278-Speed 2496.92 samples/sec Loss 1.9692 LearningRate 0.000256 Epoch: 21 Global Step: 452080 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:55:20,480-Speed 2497.47 samples/sec Loss 1.9729 LearningRate 0.000256 Epoch: 21 Global Step: 452090 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:55:28,679-Speed 2498.10 samples/sec Loss 1.9462 LearningRate 0.000256 Epoch: 21 Global Step: 452100 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:55:36,827-Speed 2513.93 samples/sec Loss 1.9690 LearningRate 0.000256 Epoch: 21 Global Step: 452110 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:55:45,028-Speed 2497.51 samples/sec Loss 1.9488 LearningRate 0.000256 Epoch: 21 Global Step: 452120 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:55:53,241-Speed 2494.15 samples/sec Loss 1.9648 LearningRate 0.000256 Epoch: 21 Global Step: 452130 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:56:01,442-Speed 2497.60 samples/sec Loss 1.9943 LearningRate 0.000256 Epoch: 21 Global Step: 452140 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:56:09,647-Speed 2496.62 samples/sec Loss 1.9916 LearningRate 0.000256 Epoch: 21 Global Step: 452150 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:56:17,852-Speed 2496.44 samples/sec Loss 1.9980 LearningRate 0.000256 Epoch: 21 Global Step: 452160 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:56:25,995-Speed 2515.69 samples/sec Loss 1.9867 LearningRate 0.000256 Epoch: 21 Global Step: 452170 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:56:34,196-Speed 2497.42 samples/sec Loss 1.9583 LearningRate 
0.000255 Epoch: 21 Global Step: 452180 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:56:42,398-Speed 2497.51 samples/sec Loss 2.0064 LearningRate 0.000255 Epoch: 21 Global Step: 452190 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:56:50,600-Speed 2497.25 samples/sec Loss 1.9750 LearningRate 0.000255 Epoch: 21 Global Step: 452200 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:56:58,804-Speed 2496.80 samples/sec Loss 2.0059 LearningRate 0.000255 Epoch: 21 Global Step: 452210 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:57:07,014-Speed 2494.82 samples/sec Loss 1.9718 LearningRate 0.000255 Epoch: 21 Global Step: 452220 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:57:15,165-Speed 2513.08 samples/sec Loss 1.9847 LearningRate 0.000255 Epoch: 21 Global Step: 452230 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:57:23,367-Speed 2497.60 samples/sec Loss 1.9782 LearningRate 0.000255 Epoch: 21 Global Step: 452240 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:57:31,571-Speed 2496.67 samples/sec Loss 1.9721 LearningRate 0.000255 Epoch: 21 Global Step: 452250 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:57:39,770-Speed 2498.48 samples/sec Loss 1.9798 LearningRate 0.000255 Epoch: 21 Global Step: 452260 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:57:47,968-Speed 2498.62 samples/sec Loss 1.9512 LearningRate 0.000255 Epoch: 21 Global Step: 452270 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:57:56,168-Speed 2497.85 samples/sec Loss 1.9463 LearningRate 0.000255 Epoch: 21 Global Step: 452280 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:58:04,314-Speed 2514.51 samples/sec Loss 1.9541 LearningRate 0.000255 Epoch: 21 Global Step: 452290 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:58:12,514-Speed 2498.53 samples/sec Loss 1.9487 LearningRate 
0.000255 Epoch: 21 Global Step: 452300 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:58:20,714-Speed 2497.70 samples/sec Loss 1.9877 LearningRate 0.000255 Epoch: 21 Global Step: 452310 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:58:28,914-Speed 2498.17 samples/sec Loss 1.9682 LearningRate 0.000255 Epoch: 21 Global Step: 452320 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:58:37,114-Speed 2498.21 samples/sec Loss 1.9653 LearningRate 0.000255 Epoch: 21 Global Step: 452330 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:58:45,311-Speed 2498.73 samples/sec Loss 1.9589 LearningRate 0.000255 Epoch: 21 Global Step: 452340 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:58:53,457-Speed 2514.45 samples/sec Loss 2.0021 LearningRate 0.000255 Epoch: 21 Global Step: 452350 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:59:01,660-Speed 2497.13 samples/sec Loss 1.9514 LearningRate 0.000255 Epoch: 21 Global Step: 452360 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:59:09,872-Speed 2494.24 samples/sec Loss 1.9587 LearningRate 0.000255 Epoch: 21 Global Step: 452370 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:59:18,080-Speed 2495.60 samples/sec Loss 1.9962 LearningRate 0.000255 Epoch: 21 Global Step: 452380 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:59:26,280-Speed 2498.24 samples/sec Loss 1.9607 LearningRate 0.000255 Epoch: 21 Global Step: 452390 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:59:34,482-Speed 2497.31 samples/sec Loss 1.9549 LearningRate 0.000255 Epoch: 21 Global Step: 452400 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:59:42,633-Speed 2513.14 samples/sec Loss 1.9386 LearningRate 0.000255 Epoch: 21 Global Step: 452410 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:59:50,830-Speed 2498.64 samples/sec Loss 1.9948 LearningRate 
0.000255 Epoch: 21 Global Step: 452420 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 21:59:59,030-Speed 2498.26 samples/sec Loss 1.9363 LearningRate 0.000255 Epoch: 21 Global Step: 452430 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:00:07,237-Speed 2495.81 samples/sec Loss 1.9220 LearningRate 0.000255 Epoch: 21 Global Step: 452440 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:00:15,433-Speed 2499.14 samples/sec Loss 1.9585 LearningRate 0.000255 Epoch: 21 Global Step: 452450 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:00:23,632-Speed 2498.67 samples/sec Loss 1.9533 LearningRate 0.000255 Epoch: 21 Global Step: 452460 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:00:31,780-Speed 2514.11 samples/sec Loss 1.9565 LearningRate 0.000255 Epoch: 21 Global Step: 452470 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:00:39,981-Speed 2497.44 samples/sec Loss 1.9480 LearningRate 0.000255 Epoch: 21 Global Step: 452480 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:00:48,181-Speed 2498.14 samples/sec Loss 1.9137 LearningRate 0.000255 Epoch: 21 Global Step: 452490 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:00:56,397-Speed 2493.16 samples/sec Loss 1.8838 LearningRate 0.000255 Epoch: 21 Global Step: 452500 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:01:04,594-Speed 2498.78 samples/sec Loss 1.9093 LearningRate 0.000255 Epoch: 21 Global Step: 452510 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:01:12,801-Speed 2495.78 samples/sec Loss 1.9362 LearningRate 0.000255 Epoch: 21 Global Step: 452520 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:01:20,953-Speed 2512.67 samples/sec Loss 1.9768 LearningRate 0.000255 Epoch: 21 Global Step: 452530 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:01:29,159-Speed 2496.32 samples/sec Loss 1.9812 LearningRate 
0.000255 Epoch: 21 Global Step: 452540 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:01:37,374-Speed 2493.27 samples/sec Loss 1.9503 LearningRate 0.000255 Epoch: 21 Global Step: 452550 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:01:45,577-Speed 2497.12 samples/sec Loss 1.9218 LearningRate 0.000255 Epoch: 21 Global Step: 452560 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:01:53,778-Speed 2497.66 samples/sec Loss 1.9402 LearningRate 0.000255 Epoch: 21 Global Step: 452570 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:02:01,979-Speed 2497.53 samples/sec Loss 2.0113 LearningRate 0.000255 Epoch: 21 Global Step: 452580 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:02:10,126-Speed 2514.44 samples/sec Loss 1.9644 LearningRate 0.000255 Epoch: 21 Global Step: 452590 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:02:18,325-Speed 2498.09 samples/sec Loss 1.9378 LearningRate 0.000255 Epoch: 21 Global Step: 452600 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:02:26,524-Speed 2498.41 samples/sec Loss 1.9725 LearningRate 0.000255 Epoch: 21 Global Step: 452610 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:02:34,727-Speed 2497.02 samples/sec Loss 1.9391 LearningRate 0.000255 Epoch: 21 Global Step: 452620 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:02:42,927-Speed 2497.87 samples/sec Loss 1.9374 LearningRate 0.000255 Epoch: 21 Global Step: 452630 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:02:51,127-Speed 2498.04 samples/sec Loss 1.9710 LearningRate 0.000255 Epoch: 21 Global Step: 452640 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:02:59,278-Speed 2512.93 samples/sec Loss 1.9808 LearningRate 0.000255 Epoch: 21 Global Step: 452650 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:03:07,479-Speed 2497.79 samples/sec Loss 1.9519 LearningRate 
0.000255 Epoch: 21 Global Step: 452660 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:03:15,679-Speed 2498.04 samples/sec Loss 1.9404 LearningRate 0.000255 Epoch: 21 Global Step: 452670 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:03:23,881-Speed 2497.15 samples/sec Loss 1.9858 LearningRate 0.000255 Epoch: 21 Global Step: 452680 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:03:32,080-Speed 2498.43 samples/sec Loss 1.9303 LearningRate 0.000255 Epoch: 21 Global Step: 452690 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:03:40,278-Speed 2498.53 samples/sec Loss 2.0190 LearningRate 0.000255 Epoch: 21 Global Step: 452700 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:03:48,431-Speed 2512.25 samples/sec Loss 1.9818 LearningRate 0.000255 Epoch: 21 Global Step: 452710 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:03:56,635-Speed 2497.03 samples/sec Loss 1.9582 LearningRate 0.000255 Epoch: 21 Global Step: 452720 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:04:04,832-Speed 2498.90 samples/sec Loss 2.0177 LearningRate 0.000255 Epoch: 21 Global Step: 452730 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:04:13,051-Speed 2493.51 samples/sec Loss 1.9880 LearningRate 0.000255 Epoch: 21 Global Step: 452740 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:04:21,253-Speed 2497.29 samples/sec Loss 1.9566 LearningRate 0.000255 Epoch: 21 Global Step: 452750 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:04:29,457-Speed 2496.59 samples/sec Loss 1.9487 LearningRate 0.000255 Epoch: 21 Global Step: 452760 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:04:37,607-Speed 2513.27 samples/sec Loss 1.9271 LearningRate 0.000255 Epoch: 21 Global Step: 452770 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:04:45,808-Speed 2497.57 samples/sec Loss 1.9389 LearningRate 
0.000255 Epoch: 21 Global Step: 452780 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:04:54,009-Speed 2497.71 samples/sec Loss 1.9300 LearningRate 0.000255 Epoch: 21 Global Step: 452790 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:05:02,213-Speed 2496.60 samples/sec Loss 1.9534 LearningRate 0.000255 Epoch: 21 Global Step: 452800 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:05:10,415-Speed 2497.51 samples/sec Loss 2.0179 LearningRate 0.000255 Epoch: 21 Global Step: 452810 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:05:18,615-Speed 2498.05 samples/sec Loss 1.9703 LearningRate 0.000255 Epoch: 21 Global Step: 452820 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:05:26,758-Speed 2515.34 samples/sec Loss 1.9617 LearningRate 0.000255 Epoch: 21 Global Step: 452830 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:05:34,961-Speed 2497.29 samples/sec Loss 1.9369 LearningRate 0.000255 Epoch: 21 Global Step: 452840 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:05:43,162-Speed 2497.54 samples/sec Loss 1.9870 LearningRate 0.000255 Epoch: 21 Global Step: 452850 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:05:51,361-Speed 2499.70 samples/sec Loss 1.9265 LearningRate 0.000255 Epoch: 21 Global Step: 452860 Fp16 Grad Scale: 65536 Required: 86 hours Training: 2022-07-09 22:05:59,562-Speed 2497.39 samples/sec Loss 1.9569 LearningRate 0.000255 Epoch: 21 Global Step: 452870 Fp16 Grad Scale: 65536 Required: 86 hours Training: 2022-07-09 22:06:07,764-Speed 2497.44 samples/sec Loss 1.9743 LearningRate 0.000255 Epoch: 21 Global Step: 452880 Fp16 Grad Scale: 65536 Required: 86 hours Training: 2022-07-09 22:06:15,909-Speed 2514.75 samples/sec Loss 1.9173 LearningRate 0.000255 Epoch: 21 Global Step: 452890 Fp16 Grad Scale: 65536 Required: 86 hours Training: 2022-07-09 22:06:24,111-Speed 2497.52 samples/sec Loss 1.9422 LearningRate 
0.000255 Epoch: 21 Global Step: 452900 Fp16 Grad Scale: 65536 Required: 86 hours Training: 2022-07-09 22:06:32,307-Speed 2498.76 samples/sec Loss 1.9657 LearningRate 0.000255 Epoch: 21 Global Step: 452910 Fp16 Grad Scale: 65536 Required: 86 hours Training: 2022-07-09 22:06:40,510-Speed 2497.16 samples/sec Loss 1.9811 LearningRate 0.000254 Epoch: 21 Global Step: 452920 Fp16 Grad Scale: 65536 Required: 86 hours Training: 2022-07-09 22:06:48,710-Speed 2498.11 samples/sec Loss 1.9421 LearningRate 0.000254 Epoch: 21 Global Step: 452930 Fp16 Grad Scale: 65536 Required: 86 hours Training: 2022-07-09 22:06:56,909-Speed 2498.09 samples/sec Loss 1.9425 LearningRate 0.000254 Epoch: 21 Global Step: 452940 Fp16 Grad Scale: 65536 Required: 86 hours Training: 2022-07-09 22:07:05,059-Speed 2513.47 samples/sec Loss 1.9478 LearningRate 0.000254 Epoch: 21 Global Step: 452950 Fp16 Grad Scale: 65536 Required: 86 hours Training: 2022-07-09 22:07:13,256-Speed 2499.05 samples/sec Loss 1.9136 LearningRate 0.000254 Epoch: 21 Global Step: 452960 Fp16 Grad Scale: 65536 Required: 86 hours Training: 2022-07-09 22:07:21,456-Speed 2498.22 samples/sec Loss 1.9512 LearningRate 0.000254 Epoch: 21 Global Step: 452970 Fp16 Grad Scale: 65536 Required: 86 hours Training: 2022-07-09 22:07:29,653-Speed 2499.08 samples/sec Loss 1.9523 LearningRate 0.000254 Epoch: 21 Global Step: 452980 Fp16 Grad Scale: 65536 Required: 86 hours Training: 2022-07-09 22:07:37,821-Speed 2507.74 samples/sec Loss 1.9450 LearningRate 0.000254 Epoch: 21 Global Step: 452990 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:07:46,031-Speed 2494.78 samples/sec Loss 1.9758 LearningRate 0.000254 Epoch: 21 Global Step: 453000 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:07:54,179-Speed 2514.10 samples/sec Loss 1.9355 LearningRate 0.000254 Epoch: 21 Global Step: 453010 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:08:02,377-Speed 2498.75 samples/sec Loss 1.9409 LearningRate 
0.000254 Epoch: 21 Global Step: 453020 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:08:10,575-Speed 2498.43 samples/sec Loss 1.9594 LearningRate 0.000254 Epoch: 21 Global Step: 453030 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:08:18,777-Speed 2497.91 samples/sec Loss 1.9612 LearningRate 0.000254 Epoch: 21 Global Step: 453040 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:08:26,976-Speed 2498.21 samples/sec Loss 1.9809 LearningRate 0.000254 Epoch: 21 Global Step: 453050 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:08:35,180-Speed 2496.48 samples/sec Loss 1.9420 LearningRate 0.000254 Epoch: 21 Global Step: 453060 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:08:43,326-Speed 2514.57 samples/sec Loss 1.9640 LearningRate 0.000254 Epoch: 21 Global Step: 453070 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:08:51,523-Speed 2498.94 samples/sec Loss 1.9765 LearningRate 0.000254 Epoch: 21 Global Step: 453080 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:08:59,725-Speed 2497.20 samples/sec Loss 1.9574 LearningRate 0.000254 Epoch: 21 Global Step: 453090 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:09:07,880-Speed 2511.70 samples/sec Loss 1.9718 LearningRate 0.000254 Epoch: 21 Global Step: 453100 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:09:16,081-Speed 2497.67 samples/sec Loss 1.9262 LearningRate 0.000254 Epoch: 21 Global Step: 453110 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:09:24,283-Speed 2497.40 samples/sec Loss 1.9489 LearningRate 0.000254 Epoch: 21 Global Step: 453120 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:09:32,428-Speed 2514.82 samples/sec Loss 1.9636 LearningRate 0.000254 Epoch: 21 Global Step: 453130 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:09:40,630-Speed 2497.22 samples/sec Loss 1.9575 LearningRate 
0.000254 Epoch: 21 Global Step: 453140 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:09:48,830-Speed 2498.04 samples/sec Loss 1.9466 LearningRate 0.000254 Epoch: 21 Global Step: 453150 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:09:57,039-Speed 2495.36 samples/sec Loss 1.9611 LearningRate 0.000254 Epoch: 21 Global Step: 453160 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:10:05,242-Speed 2496.61 samples/sec Loss 1.9530 LearningRate 0.000254 Epoch: 21 Global Step: 453170 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:10:13,439-Speed 2499.19 samples/sec Loss 1.9169 LearningRate 0.000254 Epoch: 21 Global Step: 453180 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:10:21,585-Speed 2514.45 samples/sec Loss 1.9226 LearningRate 0.000254 Epoch: 21 Global Step: 453190 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:10:29,785-Speed 2497.94 samples/sec Loss 1.9334 LearningRate 0.000254 Epoch: 21 Global Step: 453200 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:10:37,984-Speed 2498.08 samples/sec Loss 1.9887 LearningRate 0.000254 Epoch: 21 Global Step: 453210 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:10:46,183-Speed 2498.37 samples/sec Loss 1.9608 LearningRate 0.000254 Epoch: 21 Global Step: 453220 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:10:54,381-Speed 2498.76 samples/sec Loss 1.9192 LearningRate 0.000254 Epoch: 21 Global Step: 453230 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:11:02,579-Speed 2498.77 samples/sec Loss 1.9276 LearningRate 0.000254 Epoch: 21 Global Step: 453240 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:11:10,728-Speed 2513.80 samples/sec Loss 1.9662 LearningRate 0.000254 Epoch: 21 Global Step: 453250 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:11:18,882-Speed 2512.22 samples/sec Loss 1.9856 LearningRate 
0.000254 Epoch: 21 Global Step: 453260 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:11:27,084-Speed 2497.27 samples/sec Loss 1.9498 LearningRate 0.000254 Epoch: 21 Global Step: 453270 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:11:35,285-Speed 2497.82 samples/sec Loss 1.9350 LearningRate 0.000254 Epoch: 21 Global Step: 453280 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:11:43,486-Speed 2497.70 samples/sec Loss 1.9896 LearningRate 0.000254 Epoch: 21 Global Step: 453290 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:11:51,688-Speed 2497.37 samples/sec Loss 1.9505 LearningRate 0.000254 Epoch: 21 Global Step: 453300 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:11:59,829-Speed 2516.29 samples/sec Loss 1.9380 LearningRate 0.000254 Epoch: 21 Global Step: 453310 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:12:08,024-Speed 2499.62 samples/sec Loss 1.9429 LearningRate 0.000254 Epoch: 21 Global Step: 453320 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:12:16,222-Speed 2498.38 samples/sec Loss 1.8919 LearningRate 0.000254 Epoch: 21 Global Step: 453330 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:12:24,424-Speed 2497.23 samples/sec Loss 1.9287 LearningRate 0.000254 Epoch: 21 Global Step: 453340 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:12:32,620-Speed 2499.45 samples/sec Loss 1.9548 LearningRate 0.000254 Epoch: 21 Global Step: 453350 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:12:40,815-Speed 2499.76 samples/sec Loss 1.9410 LearningRate 0.000254 Epoch: 21 Global Step: 453360 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:12:48,960-Speed 2514.99 samples/sec Loss 1.9876 LearningRate 0.000254 Epoch: 21 Global Step: 453370 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:12:57,160-Speed 2497.92 samples/sec Loss 1.9365 LearningRate 0.000254 Epoch: 21 
Global Step: 453380 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:13:05,358-Speed 2498.63 samples/sec Loss 1.9928 LearningRate 0.000254 Epoch: 21 Global Step: 453390 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:13:13,554-Speed 2498.91 samples/sec Loss 1.9442 LearningRate 0.000254 Epoch: 21 Global Step: 453400 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:13:21,752-Speed 2498.64 samples/sec Loss 1.9798 LearningRate 0.000254 Epoch: 21 Global Step: 453410 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:13:29,957-Speed 2496.54 samples/sec Loss 1.9604 LearningRate 0.000254 Epoch: 21 Global Step: 453420 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:13:38,103-Speed 2514.46 samples/sec Loss 1.9649 LearningRate 0.000254 Epoch: 21 Global Step: 453430 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:13:46,301-Speed 2498.67 samples/sec Loss 1.9525 LearningRate 0.000254 Epoch: 21 Global Step: 453440 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:13:54,500-Speed 2498.38 samples/sec Loss 1.9820 LearningRate 0.000254 Epoch: 21 Global Step: 453450 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:14:02,706-Speed 2496.32 samples/sec Loss 1.9292 LearningRate 0.000254 Epoch: 21 Global Step: 453460 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:14:10,910-Speed 2496.80 samples/sec Loss 1.9089 LearningRate 0.000254 Epoch: 21 Global Step: 453470 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:14:19,106-Speed 2499.22 samples/sec Loss 1.9462 LearningRate 0.000254 Epoch: 21 Global Step: 453480 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:14:27,254-Speed 2513.73 samples/sec Loss 1.9833 LearningRate 0.000254 Epoch: 21 Global Step: 453490 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:14:35,455-Speed 2498.10 samples/sec Loss 1.9511 LearningRate 0.000254 Epoch: 21 Global Step: 453500 
Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:14:43,652-Speed 2499.20 samples/sec Loss 1.9767 LearningRate 0.000254 Epoch: 21 Global Step: 453510 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:14:51,857-Speed 2496.40 samples/sec Loss 1.9110 LearningRate 0.000254 Epoch: 21 Global Step: 453520 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:15:00,055-Speed 2498.61 samples/sec Loss 1.9582 LearningRate 0.000254 Epoch: 21 Global Step: 453530 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:15:08,255-Speed 2497.88 samples/sec Loss 1.9729 LearningRate 0.000254 Epoch: 21 Global Step: 453540 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:15:16,408-Speed 2512.35 samples/sec Loss 1.9738 LearningRate 0.000254 Epoch: 21 Global Step: 453550 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:15:24,626-Speed 2492.69 samples/sec Loss 1.9429 LearningRate 0.000254 Epoch: 21 Global Step: 453560 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:15:32,828-Speed 2497.15 samples/sec Loss 1.9731 LearningRate 0.000254 Epoch: 21 Global Step: 453570 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:15:41,027-Speed 2498.16 samples/sec Loss 1.9142 LearningRate 0.000254 Epoch: 21 Global Step: 453580 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:15:49,228-Speed 2497.71 samples/sec Loss 1.9591 LearningRate 0.000254 Epoch: 21 Global Step: 453590 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:15:57,425-Speed 2499.02 samples/sec Loss 1.9514 LearningRate 0.000254 Epoch: 21 Global Step: 453600 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:16:05,572-Speed 2514.28 samples/sec Loss 1.9339 LearningRate 0.000254 Epoch: 21 Global Step: 453610 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:16:13,779-Speed 2495.80 samples/sec Loss 1.9462 LearningRate 0.000254 Epoch: 21 Global Step: 453620 Fp16 Grad Scale: 
8192 Required: 86 hours Training: 2022-07-09 22:16:21,978-Speed 2498.18 samples/sec Loss 1.9292 LearningRate 0.000254 Epoch: 21 Global Step: 453630 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:16:30,175-Speed 2499.08 samples/sec Loss 1.9408 LearningRate 0.000254 Epoch: 21 Global Step: 453640 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:16:38,375-Speed 2498.05 samples/sec Loss 1.9165 LearningRate 0.000254 Epoch: 21 Global Step: 453650 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:16:46,579-Speed 2497.12 samples/sec Loss 1.9232 LearningRate 0.000253 Epoch: 21 Global Step: 453660 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:16:54,738-Speed 2510.44 samples/sec Loss 1.9314 LearningRate 0.000253 Epoch: 21 Global Step: 453670 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:17:02,939-Speed 2497.62 samples/sec Loss 1.9542 LearningRate 0.000253 Epoch: 21 Global Step: 453680 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:17:11,137-Speed 2498.53 samples/sec Loss 1.9298 LearningRate 0.000253 Epoch: 21 Global Step: 453690 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:17:19,337-Speed 2498.23 samples/sec Loss 1.9213 LearningRate 0.000253 Epoch: 21 Global Step: 453700 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:17:27,537-Speed 2497.78 samples/sec Loss 1.9314 LearningRate 0.000253 Epoch: 21 Global Step: 453710 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:17:35,743-Speed 2496.26 samples/sec Loss 1.9455 LearningRate 0.000253 Epoch: 21 Global Step: 453720 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:17:43,905-Speed 2509.42 samples/sec Loss 1.9226 LearningRate 0.000253 Epoch: 21 Global Step: 453730 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:17:52,106-Speed 2497.79 samples/sec Loss 1.9129 LearningRate 0.000253 Epoch: 21 Global Step: 453740 Fp16 Grad Scale: 8192 Required: 86 
hours Training: 2022-07-09 22:18:00,306-Speed 2497.89 samples/sec Loss 1.9770 LearningRate 0.000253 Epoch: 21 Global Step: 453750 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:18:08,515-Speed 2495.33 samples/sec Loss 1.9791 LearningRate 0.000253 Epoch: 21 Global Step: 453760 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:18:16,729-Speed 2493.79 samples/sec Loss 1.9367 LearningRate 0.000253 Epoch: 21 Global Step: 453770 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:18:24,927-Speed 2498.60 samples/sec Loss 1.9051 LearningRate 0.000253 Epoch: 21 Global Step: 453780 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:18:33,074-Speed 2514.24 samples/sec Loss 1.9588 LearningRate 0.000253 Epoch: 21 Global Step: 453790 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:18:41,276-Speed 2497.35 samples/sec Loss 1.9032 LearningRate 0.000253 Epoch: 21 Global Step: 453800 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:18:49,482-Speed 2495.88 samples/sec Loss 1.9194 LearningRate 0.000253 Epoch: 21 Global Step: 453810 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:18:57,680-Speed 2498.81 samples/sec Loss 1.9195 LearningRate 0.000253 Epoch: 21 Global Step: 453820 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:19:05,874-Speed 2499.71 samples/sec Loss 1.9077 LearningRate 0.000253 Epoch: 21 Global Step: 453830 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:19:14,074-Speed 2498.18 samples/sec Loss 1.9447 LearningRate 0.000253 Epoch: 21 Global Step: 453840 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:19:22,219-Speed 2514.98 samples/sec Loss 1.9560 LearningRate 0.000253 Epoch: 21 Global Step: 453850 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:19:30,417-Speed 2498.54 samples/sec Loss 1.9633 LearningRate 0.000253 Epoch: 21 Global Step: 453860 Fp16 Grad Scale: 8192 Required: 86 hours Training: 
2022-07-09 22:19:38,614-Speed 2499.14 samples/sec Loss 1.9029 LearningRate 0.000253 Epoch: 21 Global Step: 453870 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:19:46,811-Speed 2498.87 samples/sec Loss 1.9702 LearningRate 0.000253 Epoch: 21 Global Step: 453880 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:19:55,024-Speed 2494.23 samples/sec Loss 1.9408 LearningRate 0.000253 Epoch: 21 Global Step: 453890 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:20:03,222-Speed 2498.46 samples/sec Loss 1.9419 LearningRate 0.000253 Epoch: 21 Global Step: 453900 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:20:11,368-Speed 2514.97 samples/sec Loss 1.9354 LearningRate 0.000253 Epoch: 21 Global Step: 453910 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:20:19,565-Speed 2498.61 samples/sec Loss 1.9330 LearningRate 0.000253 Epoch: 21 Global Step: 453920 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:20:27,767-Speed 2497.52 samples/sec Loss 1.9392 LearningRate 0.000253 Epoch: 21 Global Step: 453930 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:20:35,963-Speed 2499.49 samples/sec Loss 1.9342 LearningRate 0.000253 Epoch: 21 Global Step: 453940 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:20:44,161-Speed 2498.39 samples/sec Loss 1.9729 LearningRate 0.000253 Epoch: 21 Global Step: 453950 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:20:52,359-Speed 2498.40 samples/sec Loss 1.9800 LearningRate 0.000253 Epoch: 21 Global Step: 453960 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:21:00,503-Speed 2515.52 samples/sec Loss 1.9279 LearningRate 0.000253 Epoch: 21 Global Step: 453970 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:21:08,699-Speed 2498.96 samples/sec Loss 1.9958 LearningRate 0.000253 Epoch: 21 Global Step: 453980 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 
22:21:16,897-Speed 2498.73 samples/sec Loss 1.9661 LearningRate 0.000253 Epoch: 21 Global Step: 453990 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:21:25,099-Speed 2497.45 samples/sec Loss 1.9179 LearningRate 0.000253 Epoch: 21 Global Step: 454000 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:21:33,302-Speed 2497.11 samples/sec Loss 1.9476 LearningRate 0.000253 Epoch: 21 Global Step: 454010 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:21:41,507-Speed 2497.08 samples/sec Loss 1.9436 LearningRate 0.000253 Epoch: 21 Global Step: 454020 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:21:49,649-Speed 2515.79 samples/sec Loss 1.9827 LearningRate 0.000253 Epoch: 21 Global Step: 454030 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:21:57,853-Speed 2496.78 samples/sec Loss 1.8986 LearningRate 0.000253 Epoch: 21 Global Step: 454040 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:22:06,051-Speed 2498.74 samples/sec Loss 1.9470 LearningRate 0.000253 Epoch: 21 Global Step: 454050 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:22:14,252-Speed 2497.68 samples/sec Loss 1.9770 LearningRate 0.000253 Epoch: 21 Global Step: 454060 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:22:22,455-Speed 2497.28 samples/sec Loss 1.9740 LearningRate 0.000253 Epoch: 21 Global Step: 454070 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:22:30,652-Speed 2498.78 samples/sec Loss 1.9271 LearningRate 0.000253 Epoch: 21 Global Step: 454080 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:22:38,797-Speed 2514.90 samples/sec Loss 1.9031 LearningRate 0.000253 Epoch: 21 Global Step: 454090 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:22:46,998-Speed 2497.36 samples/sec Loss 1.9299 LearningRate 0.000253 Epoch: 21 Global Step: 454100 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:22:55,198-Speed 
2498.26 samples/sec Loss 1.9623 LearningRate 0.000253 Epoch: 21 Global Step: 454110 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:23:03,395-Speed 2498.96 samples/sec Loss 1.9677 LearningRate 0.000253 Epoch: 21 Global Step: 454120 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:23:11,593-Speed 2498.49 samples/sec Loss 1.9640 LearningRate 0.000253 Epoch: 21 Global Step: 454130 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:23:19,791-Speed 2498.56 samples/sec Loss 1.9846 LearningRate 0.000253 Epoch: 21 Global Step: 454140 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:23:27,934-Speed 2515.39 samples/sec Loss 1.9628 LearningRate 0.000253 Epoch: 21 Global Step: 454150 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:23:36,129-Speed 2499.45 samples/sec Loss 1.9133 LearningRate 0.000253 Epoch: 21 Global Step: 454160 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:23:44,336-Speed 2496.27 samples/sec Loss 2.0167 LearningRate 0.000253 Epoch: 21 Global Step: 454170 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:23:52,533-Speed 2498.65 samples/sec Loss 1.9593 LearningRate 0.000253 Epoch: 21 Global Step: 454180 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:24:00,738-Speed 2496.45 samples/sec Loss 1.9810 LearningRate 0.000253 Epoch: 21 Global Step: 454190 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:24:08,935-Speed 2498.79 samples/sec Loss 1.9177 LearningRate 0.000253 Epoch: 21 Global Step: 454200 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:24:17,081-Speed 2514.43 samples/sec Loss 2.0360 LearningRate 0.000253 Epoch: 21 Global Step: 454210 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:24:25,279-Speed 2498.51 samples/sec Loss 1.9364 LearningRate 0.000253 Epoch: 21 Global Step: 454220 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:24:33,481-Speed 2497.33 samples/sec 
Loss 1.9151 LearningRate 0.000253 Epoch: 21 Global Step: 454230 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:24:41,679-Speed 2499.18 samples/sec Loss 1.9543 LearningRate 0.000253 Epoch: 21 Global Step: 454240 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:24:49,875-Speed 2499.31 samples/sec Loss 1.9009 LearningRate 0.000253 Epoch: 21 Global Step: 454250 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:24:58,071-Speed 2499.11 samples/sec Loss 1.9338 LearningRate 0.000253 Epoch: 21 Global Step: 454260 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:25:06,215-Speed 2515.00 samples/sec Loss 1.9602 LearningRate 0.000253 Epoch: 21 Global Step: 454270 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:25:14,417-Speed 2497.61 samples/sec Loss 1.9513 LearningRate 0.000253 Epoch: 21 Global Step: 454280 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:25:22,614-Speed 2498.64 samples/sec Loss 1.9281 LearningRate 0.000253 Epoch: 21 Global Step: 454290 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:25:30,811-Speed 2498.81 samples/sec Loss 1.9287 LearningRate 0.000253 Epoch: 21 Global Step: 454300 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:25:39,013-Speed 2497.47 samples/sec Loss 1.9246 LearningRate 0.000253 Epoch: 21 Global Step: 454310 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:25:47,225-Speed 2494.10 samples/sec Loss 2.0046 LearningRate 0.000253 Epoch: 21 Global Step: 454320 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:25:55,379-Speed 2512.49 samples/sec Loss 1.9567 LearningRate 0.000253 Epoch: 21 Global Step: 454330 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:26:03,572-Speed 2499.94 samples/sec Loss 1.9438 LearningRate 0.000253 Epoch: 21 Global Step: 454340 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:26:11,769-Speed 2499.31 samples/sec Loss 1.9521 
LearningRate 0.000253 Epoch: 21 Global Step: 454350 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:26:19,974-Speed 2496.36 samples/sec Loss 1.9730 LearningRate 0.000253 Epoch: 21 Global Step: 454360 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:26:28,172-Speed 2498.58 samples/sec Loss 1.9722 LearningRate 0.000253 Epoch: 21 Global Step: 454370 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:26:36,369-Speed 2498.81 samples/sec Loss 1.9673 LearningRate 0.000253 Epoch: 21 Global Step: 454380 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:26:44,515-Speed 2514.61 samples/sec Loss 1.9364 LearningRate 0.000253 Epoch: 21 Global Step: 454390 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:26:52,715-Speed 2498.29 samples/sec Loss 1.9816 LearningRate 0.000252 Epoch: 21 Global Step: 454400 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:27:00,912-Speed 2498.71 samples/sec Loss 1.9455 LearningRate 0.000252 Epoch: 21 Global Step: 454410 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:27:09,120-Speed 2495.43 samples/sec Loss 1.9646 LearningRate 0.000252 Epoch: 21 Global Step: 454420 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:27:17,320-Speed 2498.15 samples/sec Loss 1.9448 LearningRate 0.000252 Epoch: 21 Global Step: 454430 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:27:25,521-Speed 2497.50 samples/sec Loss 1.9280 LearningRate 0.000252 Epoch: 21 Global Step: 454440 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:27:33,663-Speed 2515.94 samples/sec Loss 1.9451 LearningRate 0.000252 Epoch: 21 Global Step: 454450 Fp16 Grad Scale: 8192 Required: 86 hours Training: 2022-07-09 22:27:41,863-Speed 2497.97 samples/sec Loss 2.0096 LearningRate 0.000252 Epoch: 21 Global Step: 454460 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:27:50,057-Speed 2499.89 samples/sec Loss 1.9499 LearningRate 
0.000252 Epoch: 21 Global Step: 454470 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:27:58,260-Speed 2496.97 samples/sec Loss 1.9092 LearningRate 0.000252 Epoch: 21 Global Step: 454480 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:28:06,463-Speed 2497.09 samples/sec Loss 1.9544 LearningRate 0.000252 Epoch: 21 Global Step: 454490 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:28:14,667-Speed 2496.90 samples/sec Loss 1.9842 LearningRate 0.000252 Epoch: 21 Global Step: 454500 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:28:22,813-Speed 2514.29 samples/sec Loss 1.9346 LearningRate 0.000252 Epoch: 21 Global Step: 454510 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:28:31,014-Speed 2497.76 samples/sec Loss 1.9505 LearningRate 0.000252 Epoch: 21 Global Step: 454520 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:28:39,212-Speed 2498.56 samples/sec Loss 1.9357 LearningRate 0.000252 Epoch: 21 Global Step: 454530 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:28:47,412-Speed 2497.84 samples/sec Loss 1.9454 LearningRate 0.000252 Epoch: 21 Global Step: 454540 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:28:55,615-Speed 2496.94 samples/sec Loss 1.9003 LearningRate 0.000252 Epoch: 21 Global Step: 454550 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:29:03,813-Speed 2498.71 samples/sec Loss 1.9802 LearningRate 0.000252 Epoch: 21 Global Step: 454560 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:29:11,959-Speed 2514.54 samples/sec Loss 1.9409 LearningRate 0.000252 Epoch: 21 Global Step: 454570 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:29:20,155-Speed 2499.10 samples/sec Loss 1.9683 LearningRate 0.000252 Epoch: 21 Global Step: 454580 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:29:28,359-Speed 2496.76 samples/sec Loss 1.9878 LearningRate 
0.000252 Epoch: 21 Global Step: 454590 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:29:36,606-Speed 2500.17 samples/sec Loss 1.9718 LearningRate 0.000252 Epoch: 21 Global Step: 454600 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:29:44,807-Speed 2497.49 samples/sec Loss 1.9628 LearningRate 0.000252 Epoch: 21 Global Step: 454610 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:29:53,051-Speed 2500.21 samples/sec Loss 1.9693 LearningRate 0.000252 Epoch: 21 Global Step: 454620 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:30:01,231-Speed 2517.04 samples/sec Loss 1.9787 LearningRate 0.000252 Epoch: 21 Global Step: 454630 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:30:09,434-Speed 2497.16 samples/sec Loss 1.9630 LearningRate 0.000252 Epoch: 21 Global Step: 454640 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:31:40,108-Speed 225.95 samples/sec Loss 1.9108 LearningRate 0.000252 Epoch: 21 Global Step: 454650 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:31:48,320-Speed 2502.60 samples/sec Loss 1.9464 LearningRate 0.000252 Epoch: 21 Global Step: 454660 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:31:56,546-Speed 2501.88 samples/sec Loss 1.9413 LearningRate 0.000252 Epoch: 21 Global Step: 454670 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:32:05,597-Speed 2262.77 samples/sec Loss 1.9637 LearningRate 0.000252 Epoch: 21 Global Step: 454680 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:32:13,764-Speed 2512.11 samples/sec Loss 2.0213 LearningRate 0.000252 Epoch: 21 Global Step: 454690 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:32:25,850-Speed 1694.75 samples/sec Loss 1.9767 LearningRate 0.000252 Epoch: 21 Global Step: 454700 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:32:34,114-Speed 2491.33 samples/sec Loss 1.9684 LearningRate 0.000252 
Epoch: 21 Global Step: 454710 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:32:42,384-Speed 2490.82 samples/sec Loss 1.9400 LearningRate 0.000252 Epoch: 21 Global Step: 454720 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:32:51,815-Speed 2171.61 samples/sec Loss 1.9855 LearningRate 0.000252 Epoch: 21 Global Step: 454730 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:33:00,037-Speed 2491.21 samples/sec Loss 1.9769 LearningRate 0.000252 Epoch: 21 Global Step: 454740 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:33:08,261-Speed 2505.72 samples/sec Loss 1.9822 LearningRate 0.000252 Epoch: 21 Global Step: 454750 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:33:16,503-Speed 2496.99 samples/sec Loss 1.9874 LearningRate 0.000252 Epoch: 21 Global Step: 454760 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:33:24,709-Speed 2496.07 samples/sec Loss 1.9835 LearningRate 0.000252 Epoch: 21 Global Step: 454770 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:33:33,914-Speed 2497.68 samples/sec Loss 1.9661 LearningRate 0.000252 Epoch: 21 Global Step: 454780 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:33:43,324-Speed 2494.77 samples/sec Loss 2.0080 LearningRate 0.000252 Epoch: 21 Global Step: 454790 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:33:51,524-Speed 2498.03 samples/sec Loss 2.0298 LearningRate 0.000252 Epoch: 21 Global Step: 454800 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:33:59,672-Speed 2513.97 samples/sec Loss 1.9882 LearningRate 0.000252 Epoch: 21 Global Step: 454810 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:34:07,874-Speed 2497.38 samples/sec Loss 1.9971 LearningRate 0.000252 Epoch: 21 Global Step: 454820 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:34:16,070-Speed 2499.53 samples/sec Loss 2.0230 LearningRate 0.000252 Epoch: 
21 Global Step: 454830 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:34:24,277-Speed 2495.84 samples/sec Loss 1.9941 LearningRate 0.000252 Epoch: 21 Global Step: 454840 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:34:32,500-Speed 2490.72 samples/sec Loss 1.9958 LearningRate 0.000252 Epoch: 21 Global Step: 454850 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:34:40,707-Speed 2495.87 samples/sec Loss 2.0186 LearningRate 0.000252 Epoch: 21 Global Step: 454860 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:34:48,862-Speed 2511.95 samples/sec Loss 1.9991 LearningRate 0.000252 Epoch: 21 Global Step: 454870 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:34:57,063-Speed 2497.38 samples/sec Loss 1.9888 LearningRate 0.000252 Epoch: 21 Global Step: 454880 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:35:05,268-Speed 2496.55 samples/sec Loss 2.0326 LearningRate 0.000252 Epoch: 21 Global Step: 454890 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:35:13,470-Speed 2497.47 samples/sec Loss 1.9992 LearningRate 0.000252 Epoch: 21 Global Step: 454900 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:35:21,672-Speed 2497.45 samples/sec Loss 2.0059 LearningRate 0.000252 Epoch: 21 Global Step: 454910 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:35:29,874-Speed 2497.31 samples/sec Loss 1.9867 LearningRate 0.000252 Epoch: 21 Global Step: 454920 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:35:38,026-Speed 2512.82 samples/sec Loss 1.9747 LearningRate 0.000252 Epoch: 21 Global Step: 454930 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:35:46,230-Speed 2496.72 samples/sec Loss 1.9770 LearningRate 0.000252 Epoch: 21 Global Step: 454940 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:35:54,433-Speed 2497.09 samples/sec Loss 1.9894 LearningRate 0.000252 Epoch: 21 
Global Step: 454950 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:36:02,648-Speed 2493.55 samples/sec Loss 1.9697 LearningRate 0.000252 Epoch: 21 Global Step: 454960 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:36:10,857-Speed 2495.50 samples/sec Loss 2.0206 LearningRate 0.000252 Epoch: 21 Global Step: 454970 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:36:19,061-Speed 2496.54 samples/sec Loss 1.9822 LearningRate 0.000252 Epoch: 21 Global Step: 454980 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:36:27,213-Speed 2512.77 samples/sec Loss 1.9964 LearningRate 0.000252 Epoch: 21 Global Step: 454990 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:36:35,417-Speed 2496.65 samples/sec Loss 1.9683 LearningRate 0.000252 Epoch: 21 Global Step: 455000 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:36:43,623-Speed 2496.21 samples/sec Loss 1.9344 LearningRate 0.000252 Epoch: 21 Global Step: 455010 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:36:51,824-Speed 2497.76 samples/sec Loss 1.9829 LearningRate 0.000252 Epoch: 21 Global Step: 455020 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:37:00,030-Speed 2495.92 samples/sec Loss 1.9502 LearningRate 0.000252 Epoch: 21 Global Step: 455030 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:37:08,239-Speed 2495.32 samples/sec Loss 1.9640 LearningRate 0.000252 Epoch: 21 Global Step: 455040 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:37:16,395-Speed 2511.55 samples/sec Loss 1.9725 LearningRate 0.000252 Epoch: 21 Global Step: 455050 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:37:24,609-Speed 2493.81 samples/sec Loss 2.0038 LearningRate 0.000252 Epoch: 21 Global Step: 455060 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:37:32,819-Speed 2494.93 samples/sec Loss 1.9862 LearningRate 0.000252 Epoch: 21 Global 
Step: 455070 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:37:41,030-Speed 2494.74 samples/sec Loss 1.9964 LearningRate 0.000252 Epoch: 21 Global Step: 455080 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:37:49,240-Speed 2494.97 samples/sec Loss 1.9606 LearningRate 0.000252 Epoch: 21 Global Step: 455090 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:37:57,464-Speed 2490.54 samples/sec Loss 2.0019 LearningRate 0.000252 Epoch: 21 Global Step: 455100 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:38:05,623-Speed 2510.69 samples/sec Loss 1.9460 LearningRate 0.000252 Epoch: 21 Global Step: 455110 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:38:13,838-Speed 2493.62 samples/sec Loss 1.9885 LearningRate 0.000252 Epoch: 21 Global Step: 455120 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:38:22,050-Speed 2494.46 samples/sec Loss 1.9813 LearningRate 0.000252 Epoch: 21 Global Step: 455130 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:38:30,263-Speed 2493.79 samples/sec Loss 1.9566 LearningRate 0.000252 Epoch: 21 Global Step: 455140 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:38:38,480-Speed 2492.75 samples/sec Loss 1.9738 LearningRate 0.000251 Epoch: 21 Global Step: 455150 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:38:46,693-Speed 2494.11 samples/sec Loss 1.9266 LearningRate 0.000251 Epoch: 21 Global Step: 455160 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:38:54,853-Speed 2510.13 samples/sec Loss 1.9048 LearningRate 0.000251 Epoch: 21 Global Step: 455170 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:39:03,067-Speed 2493.82 samples/sec Loss 1.8971 LearningRate 0.000251 Epoch: 21 Global Step: 455180 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:39:11,311-Speed 2484.68 samples/sec Loss 1.9234 LearningRate 0.000251 Epoch: 21 Global Step: 
455190 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:39:19,522-Speed 2494.48 samples/sec Loss 1.9097 LearningRate 0.000251 Epoch: 21 Global Step: 455200 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:39:27,734-Speed 2494.22 samples/sec Loss 1.9684 LearningRate 0.000251 Epoch: 21 Global Step: 455210 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:39:35,950-Speed 2493.12 samples/sec Loss 1.9335 LearningRate 0.000251 Epoch: 21 Global Step: 455220 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:39:44,112-Speed 2509.72 samples/sec Loss 1.9577 LearningRate 0.000251 Epoch: 21 Global Step: 455230 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:39:52,327-Speed 2493.18 samples/sec Loss 1.9159 LearningRate 0.000251 Epoch: 21 Global Step: 455240 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:40:00,560-Speed 2488.19 samples/sec Loss 1.9316 LearningRate 0.000251 Epoch: 21 Global Step: 455250 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:40:08,774-Speed 2493.56 samples/sec Loss 1.9366 LearningRate 0.000251 Epoch: 21 Global Step: 455260 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:40:16,991-Speed 2492.83 samples/sec Loss 1.9630 LearningRate 0.000251 Epoch: 21 Global Step: 455270 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:40:25,208-Speed 2492.74 samples/sec Loss 1.9904 LearningRate 0.000251 Epoch: 21 Global Step: 455280 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:40:33,371-Speed 2509.57 samples/sec Loss 1.9932 LearningRate 0.000251 Epoch: 21 Global Step: 455290 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:40:41,587-Speed 2492.94 samples/sec Loss 1.9555 LearningRate 0.000251 Epoch: 21 Global Step: 455300 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:40:49,802-Speed 2493.59 samples/sec Loss 1.9660 LearningRate 0.000251 Epoch: 21 Global Step: 455310 
Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:40:58,019-Speed 2492.71 samples/sec Loss 1.9772 LearningRate 0.000251 Epoch: 21 Global Step: 455320 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:41:06,239-Speed 2491.73 samples/sec Loss 1.9780 LearningRate 0.000251 Epoch: 21 Global Step: 455330 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:41:14,455-Speed 2493.12 samples/sec Loss 1.9907 LearningRate 0.000251 Epoch: 21 Global Step: 455340 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:41:22,630-Speed 2505.67 samples/sec Loss 1.9385 LearningRate 0.000251 Epoch: 21 Global Step: 455350 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:41:30,851-Speed 2491.54 samples/sec Loss 1.9303 LearningRate 0.000251 Epoch: 21 Global Step: 455360 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:41:39,067-Speed 2492.98 samples/sec Loss 1.9385 LearningRate 0.000251 Epoch: 21 Global Step: 455370 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:41:47,286-Speed 2492.28 samples/sec Loss 1.9464 LearningRate 0.000251 Epoch: 21 Global Step: 455380 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:41:55,503-Speed 2492.68 samples/sec Loss 1.9213 LearningRate 0.000251 Epoch: 21 Global Step: 455390 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:42:03,725-Speed 2491.10 samples/sec Loss 1.8962 LearningRate 0.000251 Epoch: 21 Global Step: 455400 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:42:11,897-Speed 2506.42 samples/sec Loss 1.8973 LearningRate 0.000251 Epoch: 21 Global Step: 455410 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:42:20,114-Speed 2492.72 samples/sec Loss 1.9355 LearningRate 0.000251 Epoch: 21 Global Step: 455420 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:42:28,331-Speed 2492.81 samples/sec Loss 1.9234 LearningRate 0.000251 Epoch: 21 Global Step: 455430 Fp16 
Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:42:36,554-Speed 2491.24 samples/sec Loss 1.9555 LearningRate 0.000251 Epoch: 21 Global Step: 455440 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:42:44,777-Speed 2491.11 samples/sec Loss 1.9288 LearningRate 0.000251 Epoch: 21 Global Step: 455450 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:42:53,001-Speed 2490.67 samples/sec Loss 2.0122 LearningRate 0.000251 Epoch: 21 Global Step: 455460 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:43:01,169-Speed 2507.62 samples/sec Loss 1.9338 LearningRate 0.000251 Epoch: 21 Global Step: 455470 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:43:09,390-Speed 2491.61 samples/sec Loss 1.9614 LearningRate 0.000251 Epoch: 21 Global Step: 455480 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:43:17,610-Speed 2491.92 samples/sec Loss 1.9396 LearningRate 0.000251 Epoch: 21 Global Step: 455490 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:43:25,828-Speed 2492.62 samples/sec Loss 1.9284 LearningRate 0.000251 Epoch: 21 Global Step: 455500 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:43:34,051-Speed 2490.98 samples/sec Loss 1.9297 LearningRate 0.000251 Epoch: 21 Global Step: 455510 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:43:42,272-Speed 2491.57 samples/sec Loss 1.9581 LearningRate 0.000251 Epoch: 21 Global Step: 455520 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:43:50,441-Speed 2507.50 samples/sec Loss 1.9172 LearningRate 0.000251 Epoch: 21 Global Step: 455530 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:43:58,661-Speed 2491.94 samples/sec Loss 1.9001 LearningRate 0.000251 Epoch: 21 Global Step: 455540 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:44:06,889-Speed 2489.77 samples/sec Loss 1.9465 LearningRate 0.000251 Epoch: 21 Global Step: 455550 Fp16 Grad 
Scale: 16384 Required: 86 hours Training: 2022-07-09 22:44:15,141-Speed 2482.07 samples/sec Loss 1.9304 LearningRate 0.000251 Epoch: 21 Global Step: 455560 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:44:23,373-Speed 2488.13 samples/sec Loss 1.9056 LearningRate 0.000251 Epoch: 21 Global Step: 455570 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:44:31,602-Speed 2489.27 samples/sec Loss 1.9304 LearningRate 0.000251 Epoch: 21 Global Step: 455580 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:44:39,771-Speed 2507.59 samples/sec Loss 1.9488 LearningRate 0.000251 Epoch: 21 Global Step: 455590 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:44:47,992-Speed 2491.55 samples/sec Loss 1.9676 LearningRate 0.000251 Epoch: 21 Global Step: 455600 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:44:56,218-Speed 2490.37 samples/sec Loss 1.9708 LearningRate 0.000251 Epoch: 21 Global Step: 455610 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:45:04,441-Speed 2490.97 samples/sec Loss 1.9619 LearningRate 0.000251 Epoch: 21 Global Step: 455620 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:45:12,663-Speed 2491.33 samples/sec Loss 1.9207 LearningRate 0.000251 Epoch: 21 Global Step: 455630 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:45:20,886-Speed 2490.90 samples/sec Loss 1.9317 LearningRate 0.000251 Epoch: 21 Global Step: 455640 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:45:29,054-Speed 2507.74 samples/sec Loss 1.9784 LearningRate 0.000251 Epoch: 21 Global Step: 455650 Fp16 Grad Scale: 16384 Required: 86 hours Training: 2022-07-09 22:45:37,277-Speed 2490.94 samples/sec Loss 1.9710 LearningRate 0.000251 Epoch: 21 Global Step: 455660 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:45:45,502-Speed 2490.44 samples/sec Loss 1.9090 LearningRate 0.000251 Epoch: 21 Global Step: 455670 Fp16 Grad Scale: 
32768 Required: 86 hours Training: 2022-07-09 22:45:53,730-Speed 2489.27 samples/sec Loss 1.9449 LearningRate 0.000251 Epoch: 21 Global Step: 455680 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:46:01,956-Speed 2490.44 samples/sec Loss 1.9310 LearningRate 0.000251 Epoch: 21 Global Step: 455690 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:46:10,178-Speed 2491.00 samples/sec Loss 1.9502 LearningRate 0.000251 Epoch: 21 Global Step: 455700 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:46:18,355-Speed 2505.14 samples/sec Loss 1.9526 LearningRate 0.000251 Epoch: 21 Global Step: 455710 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:46:26,579-Speed 2490.77 samples/sec Loss 1.9790 LearningRate 0.000251 Epoch: 21 Global Step: 455720 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:46:34,803-Speed 2490.48 samples/sec Loss 1.9664 LearningRate 0.000251 Epoch: 21 Global Step: 455730 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:46:43,034-Speed 2488.78 samples/sec Loss 1.9886 LearningRate 0.000251 Epoch: 21 Global Step: 455740 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:46:51,256-Speed 2491.33 samples/sec Loss 1.9842 LearningRate 0.000251 Epoch: 21 Global Step: 455750 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:46:59,481-Speed 2490.33 samples/sec Loss 1.9861 LearningRate 0.000251 Epoch: 21 Global Step: 455760 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:47:07,648-Speed 2508.11 samples/sec Loss 1.9968 LearningRate 0.000251 Epoch: 21 Global Step: 455770 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:47:15,871-Speed 2490.75 samples/sec Loss 1.9921 LearningRate 0.000251 Epoch: 21 Global Step: 455780 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:47:24,099-Speed 2489.57 samples/sec Loss 2.0016 LearningRate 0.000251 Epoch: 21 Global Step: 455790 Fp16 Grad Scale: 32768 
Required: 86 hours Training: 2022-07-09 22:47:32,320-Speed 2491.53 samples/sec Loss 1.9604 LearningRate 0.000251 Epoch: 21 Global Step: 455800 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:47:40,548-Speed 2489.71 samples/sec Loss 1.9757 LearningRate 0.000251 Epoch: 21 Global Step: 455810 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:47:48,766-Speed 2492.48 samples/sec Loss 1.9605 LearningRate 0.000251 Epoch: 21 Global Step: 455820 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:47:56,932-Speed 2508.20 samples/sec Loss 1.9532 LearningRate 0.000251 Epoch: 21 Global Step: 455830 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:48:05,243-Speed 2465.53 samples/sec Loss 1.9874 LearningRate 0.000251 Epoch: 21 Global Step: 455840 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:48:13,462-Speed 2492.08 samples/sec Loss 1.9613 LearningRate 0.000251 Epoch: 21 Global Step: 455850 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:48:21,689-Speed 2489.83 samples/sec Loss 1.9075 LearningRate 0.000251 Epoch: 21 Global Step: 455860 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:48:29,907-Speed 2493.41 samples/sec Loss 1.9789 LearningRate 0.000251 Epoch: 21 Global Step: 455870 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:48:38,121-Speed 2493.72 samples/sec Loss 1.9329 LearningRate 0.000251 Epoch: 21 Global Step: 455880 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:48:46,285-Speed 2508.81 samples/sec Loss 1.9333 LearningRate 0.000250 Epoch: 21 Global Step: 455890 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:48:54,504-Speed 2492.32 samples/sec Loss 1.9429 LearningRate 0.000250 Epoch: 21 Global Step: 455900 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:49:02,720-Speed 2493.04 samples/sec Loss 1.9379 LearningRate 0.000250 Epoch: 21 Global Step: 455910 Fp16 Grad Scale: 32768 
Required: 86 hours Training: 2022-07-09 22:49:10,937-Speed 2492.84 samples/sec Loss 1.9460 LearningRate 0.000250 Epoch: 21 Global Step: 455920 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:49:19,153-Speed 2493.07 samples/sec Loss 1.9467 LearningRate 0.000250 Epoch: 21 Global Step: 455930 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:49:27,375-Speed 2491.64 samples/sec Loss 1.9921 LearningRate 0.000250 Epoch: 21 Global Step: 455940 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:49:35,536-Speed 2509.69 samples/sec Loss 1.9486 LearningRate 0.000250 Epoch: 21 Global Step: 455950 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:49:43,751-Speed 2493.48 samples/sec Loss 1.8681 LearningRate 0.000250 Epoch: 21 Global Step: 455960 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:49:51,967-Speed 2493.21 samples/sec Loss 1.9079 LearningRate 0.000250 Epoch: 21 Global Step: 455970 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:50:00,199-Speed 2488.24 samples/sec Loss 1.9027 LearningRate 0.000250 Epoch: 21 Global Step: 455980 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:50:08,415-Speed 2492.90 samples/sec Loss 1.9101 LearningRate 0.000250 Epoch: 21 Global Step: 455990 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:50:16,645-Speed 2489.40 samples/sec Loss 1.9183 LearningRate 0.000250 Epoch: 21 Global Step: 456000 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:50:24,808-Speed 2509.30 samples/sec Loss 1.9638 LearningRate 0.000250 Epoch: 21 Global Step: 456010 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:50:33,027-Speed 2492.37 samples/sec Loss 1.8874 LearningRate 0.000250 Epoch: 21 Global Step: 456020 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:50:41,244-Speed 2492.58 samples/sec Loss 1.9310 LearningRate 0.000250 Epoch: 21 Global Step: 456030 Fp16 Grad Scale: 32768 
Required: 86 hours Training: 2022-07-09 22:50:49,460-Speed 2493.16 samples/sec Loss 1.8928 LearningRate 0.000250 Epoch: 21 Global Step: 456040 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:50:57,684-Speed 2490.89 samples/sec Loss 1.8808 LearningRate 0.000250 Epoch: 21 Global Step: 456050 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:51:05,925-Speed 2485.35 samples/sec Loss 1.9177 LearningRate 0.000250 Epoch: 21 Global Step: 456060 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:51:14,088-Speed 2509.43 samples/sec Loss 1.9605 LearningRate 0.000250 Epoch: 21 Global Step: 456070 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:51:22,307-Speed 2492.83 samples/sec Loss 1.9640 LearningRate 0.000250 Epoch: 21 Global Step: 456080 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:51:30,520-Speed 2493.85 samples/sec Loss 1.8941 LearningRate 0.000250 Epoch: 21 Global Step: 456090 Fp16 Grad Scale: 32768 Required: 86 hours Training: 2022-07-09 22:51:38,738-Speed 2492.51 samples/sec Loss 1.9255 LearningRate 0.000250 Epoch: 21 Global Step: 456100 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:51:46,957-Speed 2495.77 samples/sec Loss 1.9095 LearningRate 0.000250 Epoch: 21 Global Step: 456110 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:51:55,171-Speed 2493.71 samples/sec Loss 1.9288 LearningRate 0.000250 Epoch: 21 Global Step: 456120 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:52:03,333-Speed 2509.61 samples/sec Loss 1.9216 LearningRate 0.000250 Epoch: 21 Global Step: 456130 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:52:11,562-Speed 2489.56 samples/sec Loss 1.9628 LearningRate 0.000250 Epoch: 21 Global Step: 456140 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:52:19,785-Speed 2490.81 samples/sec Loss 2.0008 LearningRate 0.000250 Epoch: 21 Global Step: 456150 Fp16 Grad Scale: 32768 
Required: 85 hours Training: 2022-07-09 22:52:27,999-Speed 2493.68 samples/sec Loss 1.9537 LearningRate 0.000250 Epoch: 21 Global Step: 456160 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:52:36,207-Speed 2496.00 samples/sec Loss 1.9555 LearningRate 0.000250 Epoch: 21 Global Step: 456170 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:52:44,436-Speed 2489.08 samples/sec Loss 1.9546 LearningRate 0.000250 Epoch: 21 Global Step: 456180 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:52:52,594-Speed 2511.13 samples/sec Loss 1.9412 LearningRate 0.000250 Epoch: 21 Global Step: 456190 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:53:00,804-Speed 2494.91 samples/sec Loss 1.9498 LearningRate 0.000250 Epoch: 21 Global Step: 456200 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:53:09,018-Speed 2493.67 samples/sec Loss 1.8830 LearningRate 0.000250 Epoch: 21 Global Step: 456210 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:53:17,231-Speed 2494.02 samples/sec Loss 1.9562 LearningRate 0.000250 Epoch: 21 Global Step: 456220 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:53:25,444-Speed 2494.23 samples/sec Loss 1.9657 LearningRate 0.000250 Epoch: 21 Global Step: 456230 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:53:33,654-Speed 2494.74 samples/sec Loss 1.8991 LearningRate 0.000250 Epoch: 21 Global Step: 456240 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:53:41,810-Speed 2511.52 samples/sec Loss 1.8907 LearningRate 0.000250 Epoch: 21 Global Step: 456250 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:53:50,024-Speed 2493.93 samples/sec Loss 1.9470 LearningRate 0.000250 Epoch: 21 Global Step: 456260 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:53:58,238-Speed 2493.50 samples/sec Loss 1.9819 LearningRate 0.000250 Epoch: 21 Global Step: 456270 Fp16 Grad Scale: 32768 
Required: 85 hours Training: 2022-07-09 22:54:08,661-Speed 1965.30 samples/sec Loss 1.9676 LearningRate 0.000250 Epoch: 22 Global Step: 456280 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:54:16,864-Speed 2497.19 samples/sec Loss 1.9721 LearningRate 0.000250 Epoch: 22 Global Step: 456290 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:54:25,066-Speed 2497.16 samples/sec Loss 1.9630 LearningRate 0.000250 Epoch: 22 Global Step: 456300 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:54:33,214-Speed 2514.10 samples/sec Loss 1.9327 LearningRate 0.000250 Epoch: 22 Global Step: 456310 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:54:41,417-Speed 2497.02 samples/sec Loss 1.9527 LearningRate 0.000250 Epoch: 22 Global Step: 456320 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 22:54:49,579-Speed 2509.55 samples/sec Loss 1.9258 LearningRate 0.000250 Epoch: 22 Global Step: 456330 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:54:57,783-Speed 2496.62 samples/sec Loss 1.9689 LearningRate 0.000250 Epoch: 22 Global Step: 456340 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:55:05,990-Speed 2495.68 samples/sec Loss 1.9636 LearningRate 0.000250 Epoch: 22 Global Step: 456350 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:55:14,195-Speed 2496.67 samples/sec Loss 1.9486 LearningRate 0.000250 Epoch: 22 Global Step: 456360 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:55:22,347-Speed 2512.46 samples/sec Loss 1.9688 LearningRate 0.000250 Epoch: 22 Global Step: 456370 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:55:30,555-Speed 2495.69 samples/sec Loss 1.9310 LearningRate 0.000250 Epoch: 22 Global Step: 456380 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:55:38,760-Speed 2496.57 samples/sec Loss 1.9050 LearningRate 0.000250 Epoch: 22 Global Step: 456390 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 22:55:46,966-Speed 2496.35 samples/sec Loss 1.9472 LearningRate 0.000250 Epoch: 22 Global Step: 456400 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:55:55,173-Speed 2496.04 samples/sec Loss 2.0030 LearningRate 0.000250 Epoch: 22 Global Step: 456410 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:56:03,391-Speed 2492.57 samples/sec Loss 1.9942 LearningRate 0.000250 Epoch: 22 Global Step: 456420 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:56:11,547-Speed 2511.28 samples/sec Loss 1.9508 LearningRate 0.000250 Epoch: 22 Global Step: 456430 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:56:19,757-Speed 2495.04 samples/sec Loss 1.9761 LearningRate 0.000250 Epoch: 22 Global Step: 456440 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:56:27,971-Speed 2493.61 samples/sec Loss 1.9509 LearningRate 0.000250 Epoch: 22 Global Step: 456450 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:56:36,185-Speed 2493.80 samples/sec Loss 1.9737 LearningRate 0.000250 Epoch: 22 Global Step: 456460 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:56:44,389-Speed 2496.76 samples/sec Loss 1.9295 LearningRate 0.000250 Epoch: 22 Global Step: 456470 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:56:52,598-Speed 2495.09 samples/sec Loss 1.9370 LearningRate 0.000250 Epoch: 22 Global Step: 456480 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:57:00,754-Speed 2511.64 samples/sec Loss 1.9469 LearningRate 0.000250 Epoch: 22 Global Step: 456490 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:57:08,957-Speed 2496.77 samples/sec Loss 2.0014 LearningRate 0.000250 Epoch: 22 Global Step: 456500 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:57:17,164-Speed 2495.96 samples/sec Loss 1.9884 LearningRate 0.000250 Epoch: 22 Global Step: 456510 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 22:57:25,383-Speed 2492.45 samples/sec Loss 1.9341 LearningRate 0.000250 Epoch: 22 Global Step: 456520 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:57:33,589-Speed 2496.12 samples/sec Loss 1.9376 LearningRate 0.000250 Epoch: 22 Global Step: 456530 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:57:41,795-Speed 2496.12 samples/sec Loss 1.9528 LearningRate 0.000250 Epoch: 22 Global Step: 456540 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:57:49,949-Speed 2511.86 samples/sec Loss 1.9463 LearningRate 0.000250 Epoch: 22 Global Step: 456550 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:57:58,168-Speed 2492.51 samples/sec Loss 1.9379 LearningRate 0.000250 Epoch: 22 Global Step: 456560 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:58:06,371-Speed 2497.14 samples/sec Loss 1.9553 LearningRate 0.000250 Epoch: 22 Global Step: 456570 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:58:14,576-Speed 2496.26 samples/sec Loss 1.9740 LearningRate 0.000250 Epoch: 22 Global Step: 456580 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:58:22,781-Speed 2496.59 samples/sec Loss 1.9780 LearningRate 0.000250 Epoch: 22 Global Step: 456590 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:58:30,992-Speed 2494.53 samples/sec Loss 1.9276 LearningRate 0.000250 Epoch: 22 Global Step: 456600 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:58:39,146-Speed 2512.17 samples/sec Loss 1.9279 LearningRate 0.000250 Epoch: 22 Global Step: 456610 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:58:47,349-Speed 2497.15 samples/sec Loss 1.9347 LearningRate 0.000250 Epoch: 22 Global Step: 456620 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:58:55,554-Speed 2496.17 samples/sec Loss 1.9673 LearningRate 0.000250 Epoch: 22 Global Step: 456630 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 22:59:03,762-Speed 2495.41 samples/sec Loss 1.9567 LearningRate 0.000249 Epoch: 22 Global Step: 456640 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:59:11,970-Speed 2495.83 samples/sec Loss 1.9511 LearningRate 0.000249 Epoch: 22 Global Step: 456650 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:59:20,177-Speed 2495.69 samples/sec Loss 1.9412 LearningRate 0.000249 Epoch: 22 Global Step: 456660 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:59:28,333-Speed 2511.41 samples/sec Loss 1.9634 LearningRate 0.000249 Epoch: 22 Global Step: 456670 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:59:36,544-Speed 2494.73 samples/sec Loss 1.9976 LearningRate 0.000249 Epoch: 22 Global Step: 456680 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:59:44,760-Speed 2493.13 samples/sec Loss 1.9795 LearningRate 0.000249 Epoch: 22 Global Step: 456690 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 22:59:52,966-Speed 2496.14 samples/sec Loss 1.9619 LearningRate 0.000249 Epoch: 22 Global Step: 456700 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:00:01,177-Speed 2494.55 samples/sec Loss 1.9621 LearningRate 0.000249 Epoch: 22 Global Step: 456710 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:00:09,386-Speed 2495.36 samples/sec Loss 1.9623 LearningRate 0.000249 Epoch: 22 Global Step: 456720 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:00:17,538-Speed 2512.64 samples/sec Loss 1.9499 LearningRate 0.000249 Epoch: 22 Global Step: 456730 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:00:25,768-Speed 2488.72 samples/sec Loss 1.9728 LearningRate 0.000249 Epoch: 22 Global Step: 456740 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:00:33,976-Speed 2495.43 samples/sec Loss 1.9512 LearningRate 0.000249 Epoch: 22 Global Step: 456750 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:00:42,200-Speed 2490.76 samples/sec Loss 1.9474 LearningRate 0.000249 Epoch: 22 Global Step: 456760 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:00:50,406-Speed 2496.04 samples/sec Loss 1.9504 LearningRate 0.000249 Epoch: 22 Global Step: 456770 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:00:58,609-Speed 2497.03 samples/sec Loss 1.9216 LearningRate 0.000249 Epoch: 22 Global Step: 456780 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:01:06,762-Speed 2512.55 samples/sec Loss 1.9486 LearningRate 0.000249 Epoch: 22 Global Step: 456790 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:01:14,966-Speed 2496.43 samples/sec Loss 1.8705 LearningRate 0.000249 Epoch: 22 Global Step: 456800 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:01:23,174-Speed 2496.11 samples/sec Loss 1.9575 LearningRate 0.000249 Epoch: 22 Global Step: 456810 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:01:31,375-Speed 2497.58 samples/sec Loss 1.9263 LearningRate 0.000249 Epoch: 22 Global Step: 456820 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:01:39,578-Speed 2497.18 samples/sec Loss 1.9182 LearningRate 0.000249 Epoch: 22 Global Step: 456830 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:01:47,784-Speed 2496.04 samples/sec Loss 1.9351 LearningRate 0.000249 Epoch: 22 Global Step: 456840 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:01:55,940-Speed 2511.56 samples/sec Loss 1.9048 LearningRate 0.000249 Epoch: 22 Global Step: 456850 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:02:04,146-Speed 2496.33 samples/sec Loss 1.9343 LearningRate 0.000249 Epoch: 22 Global Step: 456860 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:02:12,351-Speed 2496.33 samples/sec Loss 1.9017 LearningRate 0.000249 Epoch: 22 Global Step: 456870 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:02:20,558-Speed 2495.94 samples/sec Loss 1.9061 LearningRate 0.000249 Epoch: 22 Global Step: 456880 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:02:28,770-Speed 2494.13 samples/sec Loss 1.9464 LearningRate 0.000249 Epoch: 22 Global Step: 456890 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:02:36,979-Speed 2495.37 samples/sec Loss 1.9044 LearningRate 0.000249 Epoch: 22 Global Step: 456900 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:02:45,133-Speed 2511.88 samples/sec Loss 1.9473 LearningRate 0.000249 Epoch: 22 Global Step: 456910 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:02:53,338-Speed 2496.55 samples/sec Loss 1.9397 LearningRate 0.000249 Epoch: 22 Global Step: 456920 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:03:01,546-Speed 2495.52 samples/sec Loss 1.9352 LearningRate 0.000249 Epoch: 22 Global Step: 456930 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:03:09,755-Speed 2495.34 samples/sec Loss 1.9282 LearningRate 0.000249 Epoch: 22 Global Step: 456940 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:03:17,962-Speed 2495.66 samples/sec Loss 1.9492 LearningRate 0.000249 Epoch: 22 Global Step: 456950 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:03:26,167-Speed 2496.54 samples/sec Loss 1.8911 LearningRate 0.000249 Epoch: 22 Global Step: 456960 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:03:34,334-Speed 2508.23 samples/sec Loss 1.9367 LearningRate 0.000249 Epoch: 22 Global Step: 456970 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:03:42,542-Speed 2495.37 samples/sec Loss 1.8929 LearningRate 0.000249 Epoch: 22 Global Step: 456980 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:03:50,756-Speed 2494.01 samples/sec Loss 1.9073 LearningRate 0.000249 Epoch: 22 Global Step: 456990 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:03:58,960-Speed 2496.49 samples/sec Loss 1.9386 LearningRate 0.000249 Epoch: 22 Global Step: 457000 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:04:07,168-Speed 2495.65 samples/sec Loss 1.9381 LearningRate 0.000249 Epoch: 22 Global Step: 457010 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:04:15,380-Speed 2494.39 samples/sec Loss 1.9296 LearningRate 0.000249 Epoch: 22 Global Step: 457020 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:04:23,530-Speed 2513.14 samples/sec Loss 1.9326 LearningRate 0.000249 Epoch: 22 Global Step: 457030 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:04:31,739-Speed 2495.18 samples/sec Loss 1.9136 LearningRate 0.000249 Epoch: 22 Global Step: 457040 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:04:39,943-Speed 2496.69 samples/sec Loss 1.8992 LearningRate 0.000249 Epoch: 22 Global Step: 457050 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:04:48,151-Speed 2495.63 samples/sec Loss 1.9426 LearningRate 0.000249 Epoch: 22 Global Step: 457060 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:04:56,359-Speed 2495.40 samples/sec Loss 1.9250 LearningRate 0.000249 Epoch: 22 Global Step: 457070 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:05:04,564-Speed 2496.50 samples/sec Loss 1.9004 LearningRate 0.000249 Epoch: 22 Global Step: 457080 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:05:12,718-Speed 2511.78 samples/sec Loss 1.9600 LearningRate 0.000249 Epoch: 22 Global Step: 457090 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:05:20,924-Speed 2496.45 samples/sec Loss 1.9989 LearningRate 0.000249 Epoch: 22 Global Step: 457100 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:05:29,127-Speed 2496.79 samples/sec Loss 1.9507 LearningRate 0.000249 Epoch: 22 Global Step: 457110 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:05:37,336-Speed 2495.19 samples/sec Loss 1.9685 LearningRate 0.000249 Epoch: 22 Global Step: 457120 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:05:45,543-Speed 2496.10 samples/sec Loss 1.9054 LearningRate 0.000249 Epoch: 22 Global Step: 457130 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:05:53,747-Speed 2496.45 samples/sec Loss 1.9623 LearningRate 0.000249 Epoch: 22 Global Step: 457140 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:06:01,901-Speed 2512.24 samples/sec Loss 1.9588 LearningRate 0.000249 Epoch: 22 Global Step: 457150 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:06:10,103-Speed 2497.14 samples/sec Loss 1.9525 LearningRate 0.000249 Epoch: 22 Global Step: 457160 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:06:18,308-Speed 2496.75 samples/sec Loss 1.9653 LearningRate 0.000249 Epoch: 22 Global Step: 457170 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:06:26,526-Speed 2492.37 samples/sec Loss 1.8796 LearningRate 0.000249 Epoch: 22 Global Step: 457180 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:06:34,737-Speed 2494.80 samples/sec Loss 1.8869 LearningRate 0.000249 Epoch: 22 Global Step: 457190 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:06:42,948-Speed 2494.62 samples/sec Loss 1.9351 LearningRate 0.000249 Epoch: 22 Global Step: 457200 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:06:51,104-Speed 2511.85 samples/sec Loss 1.9402 LearningRate 0.000249 Epoch: 22 Global Step: 457210 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:06:59,314-Speed 2495.29 samples/sec Loss 1.9260 LearningRate 0.000249 Epoch: 22 Global Step: 457220 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:07:07,521-Speed 2496.13 samples/sec Loss 1.9823 LearningRate 0.000249 Epoch: 22 Global Step: 457230 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:07:15,723-Speed 2497.29 samples/sec Loss 1.9763 LearningRate 0.000249 Epoch: 22 Global Step: 457240 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:07:23,928-Speed 2496.71 samples/sec Loss 1.9476 LearningRate 0.000249 Epoch: 22 Global Step: 457250 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:07:32,133-Speed 2496.37 samples/sec Loss 1.9499 LearningRate 0.000249 Epoch: 22 Global Step: 457260 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:07:40,284-Speed 2512.73 samples/sec Loss 1.9473 LearningRate 0.000249 Epoch: 22 Global Step: 457270 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:07:48,489-Speed 2496.59 samples/sec Loss 1.9150 LearningRate 0.000249 Epoch: 22 Global Step: 457280 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:07:56,692-Speed 2497.19 samples/sec Loss 1.9286 LearningRate 0.000249 Epoch: 22 Global Step: 457290 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:08:04,898-Speed 2496.15 samples/sec Loss 1.9346 LearningRate 0.000249 Epoch: 22 Global Step: 457300 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:08:13,103-Speed 2496.45 samples/sec Loss 1.9109 LearningRate 0.000249 Epoch: 22 Global Step: 457310 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:08:21,308-Speed 2496.25 samples/sec Loss 1.9594 LearningRate 0.000249 Epoch: 22 Global Step: 457320 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:08:29,460-Speed 2512.77 samples/sec Loss 1.9443 LearningRate 0.000249 Epoch: 22 Global Step: 457330 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:08:37,666-Speed 2496.15 samples/sec Loss 1.9276 LearningRate 0.000249 Epoch: 22 Global Step: 457340 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:08:45,891-Speed 2490.11 samples/sec Loss 1.9217 LearningRate 0.000249 Epoch: 22 Global Step: 457350 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:08:54,103-Speed 2494.40 samples/sec Loss 1.9053 LearningRate 0.000249 Epoch: 22 Global Step: 457360 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:09:02,306-Speed 2497.21 samples/sec Loss 1.8747 LearningRate 0.000249 Epoch: 22 Global Step: 457370 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:09:10,511-Speed 2496.07 samples/sec Loss 1.9133 LearningRate 0.000249 Epoch: 22 Global Step: 457380 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:09:18,667-Speed 2511.71 samples/sec Loss 1.9480 LearningRate 0.000248 Epoch: 22 Global Step: 457390 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:09:26,871-Speed 2497.04 samples/sec Loss 1.9253 LearningRate 0.000248 Epoch: 22 Global Step: 457400 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:09:35,080-Speed 2495.25 samples/sec Loss 1.9447 LearningRate 0.000248 Epoch: 22 Global Step: 457410 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:09:43,284-Speed 2496.65 samples/sec Loss 1.9898 LearningRate 0.000248 Epoch: 22 Global Step: 457420 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:09:51,485-Speed 2497.64 samples/sec Loss 1.9323 LearningRate 0.000248 Epoch: 22 Global Step: 457430 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:09:59,694-Speed 2495.26 samples/sec Loss 1.9099 LearningRate 0.000248 Epoch: 22 Global Step: 457440 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:10:07,847-Speed 2512.13 samples/sec Loss 1.9386 LearningRate 0.000248 Epoch: 22 Global Step: 457450 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:10:16,051-Speed 2496.88 samples/sec Loss 1.9424 LearningRate 0.000248 Epoch: 22 Global Step: 457460 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:10:24,269-Speed 2492.70 samples/sec Loss 1.8980 LearningRate 0.000248 Epoch: 22 Global Step: 457470 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:10:32,476-Speed 2495.70 samples/sec Loss 1.9231 LearningRate 0.000248 Epoch: 22 Global Step: 457480 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:10:40,685-Speed 2495.22 samples/sec Loss 1.9383 LearningRate 0.000248 Epoch: 22 Global Step: 457490 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:10:48,895-Speed 2495.10 samples/sec Loss 1.9624 LearningRate 0.000248 Epoch: 22 Global Step: 457500 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:10:57,052-Speed 2510.99 samples/sec Loss 1.9961 LearningRate 0.000248 Epoch: 22 Global Step: 457510 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:11:05,266-Speed 2493.83 samples/sec Loss 1.9490 LearningRate 0.000248 Epoch: 22 Global Step: 457520 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:11:13,476-Speed 2495.16 samples/sec Loss 1.9495 LearningRate 0.000248 Epoch: 22 Global Step: 457530 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:11:21,684-Speed 2495.39 samples/sec Loss 1.9907 LearningRate 0.000248 Epoch: 22 Global Step: 457540 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:11:29,890-Speed 2496.38 samples/sec Loss 1.9604 LearningRate 0.000248 Epoch: 22 Global Step: 457550 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:11:38,094-Speed 2496.61 samples/sec Loss 1.9498 LearningRate 0.000248 Epoch: 22 Global Step: 457560 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:11:46,248-Speed 2511.84 samples/sec Loss 1.9374 LearningRate 0.000248 Epoch: 22 Global Step: 457570 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:11:54,454-Speed 2496.17 samples/sec Loss 1.9281 LearningRate 0.000248 Epoch: 22 Global Step: 457580 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:12:02,660-Speed 2496.28 samples/sec Loss 1.9443 LearningRate 0.000248 Epoch: 22 Global Step: 457590 Fp16 Grad Scale: 32768 
Required: 85 hours Training: 2022-07-09 23:12:10,866-Speed 2496.19 samples/sec Loss 1.9500 LearningRate 0.000248 Epoch: 22 Global Step: 457600 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:12:19,071-Speed 2496.16 samples/sec Loss 1.9290 LearningRate 0.000248 Epoch: 22 Global Step: 457610 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:12:27,275-Speed 2496.87 samples/sec Loss 1.9623 LearningRate 0.000248 Epoch: 22 Global Step: 457620 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:12:35,428-Speed 2512.27 samples/sec Loss 1.9538 LearningRate 0.000248 Epoch: 22 Global Step: 457630 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:12:43,631-Speed 2498.25 samples/sec Loss 1.9612 LearningRate 0.000248 Epoch: 22 Global Step: 457640 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:12:51,836-Speed 2496.57 samples/sec Loss 1.9487 LearningRate 0.000248 Epoch: 22 Global Step: 457650 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:13:00,041-Speed 2496.64 samples/sec Loss 1.9269 LearningRate 0.000248 Epoch: 22 Global Step: 457660 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:13:08,245-Speed 2496.90 samples/sec Loss 1.9463 LearningRate 0.000248 Epoch: 22 Global Step: 457670 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:13:16,449-Speed 2496.72 samples/sec Loss 1.9309 LearningRate 0.000248 Epoch: 22 Global Step: 457680 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:13:24,602-Speed 2512.15 samples/sec Loss 1.9339 LearningRate 0.000248 Epoch: 22 Global Step: 457690 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:13:32,808-Speed 2496.08 samples/sec Loss 1.9355 LearningRate 0.000248 Epoch: 22 Global Step: 457700 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:13:41,012-Speed 2497.36 samples/sec Loss 1.9101 LearningRate 0.000248 Epoch: 22 Global Step: 457710 Fp16 Grad Scale: 32768 
Required: 85 hours Training: 2022-07-09 23:13:49,218-Speed 2496.04 samples/sec Loss 1.9056 LearningRate 0.000248 Epoch: 22 Global Step: 457720 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:13:57,428-Speed 2494.80 samples/sec Loss 1.9717 LearningRate 0.000248 Epoch: 22 Global Step: 457730 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:14:05,648-Speed 2491.78 samples/sec Loss 1.9794 LearningRate 0.000248 Epoch: 22 Global Step: 457740 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:14:13,801-Speed 2512.48 samples/sec Loss 1.9742 LearningRate 0.000248 Epoch: 22 Global Step: 457750 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:14:22,005-Speed 2496.68 samples/sec Loss 1.9287 LearningRate 0.000248 Epoch: 22 Global Step: 457760 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:14:30,208-Speed 2497.11 samples/sec Loss 1.9474 LearningRate 0.000248 Epoch: 22 Global Step: 457770 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:14:38,425-Speed 2493.08 samples/sec Loss 1.8735 LearningRate 0.000248 Epoch: 22 Global Step: 457780 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:14:46,628-Speed 2497.04 samples/sec Loss 1.9592 LearningRate 0.000248 Epoch: 22 Global Step: 457790 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:14:54,831-Speed 2497.25 samples/sec Loss 1.9383 LearningRate 0.000248 Epoch: 22 Global Step: 457800 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:15:02,986-Speed 2511.85 samples/sec Loss 1.8953 LearningRate 0.000248 Epoch: 22 Global Step: 457810 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:15:11,193-Speed 2496.12 samples/sec Loss 1.9244 LearningRate 0.000248 Epoch: 22 Global Step: 457820 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:15:19,399-Speed 2496.06 samples/sec Loss 1.9368 LearningRate 0.000248 Epoch: 22 Global Step: 457830 Fp16 Grad Scale: 32768 
Required: 85 hours Training: 2022-07-09 23:15:27,605-Speed 2496.14 samples/sec Loss 1.9274 LearningRate 0.000248 Epoch: 22 Global Step: 457840 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:15:35,812-Speed 2495.63 samples/sec Loss 1.9683 LearningRate 0.000248 Epoch: 22 Global Step: 457850 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:15:44,016-Speed 2496.89 samples/sec Loss 1.9439 LearningRate 0.000248 Epoch: 22 Global Step: 457860 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:15:52,178-Speed 2509.42 samples/sec Loss 1.9288 LearningRate 0.000248 Epoch: 22 Global Step: 457870 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:16:00,382-Speed 2496.81 samples/sec Loss 1.9463 LearningRate 0.000248 Epoch: 22 Global Step: 457880 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:16:08,593-Speed 2494.70 samples/sec Loss 1.9649 LearningRate 0.000248 Epoch: 22 Global Step: 457890 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:16:16,798-Speed 2496.27 samples/sec Loss 1.9212 LearningRate 0.000248 Epoch: 22 Global Step: 457900 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:16:25,010-Speed 2494.47 samples/sec Loss 1.9270 LearningRate 0.000248 Epoch: 22 Global Step: 457910 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:16:33,211-Speed 2497.50 samples/sec Loss 1.9373 LearningRate 0.000248 Epoch: 22 Global Step: 457920 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:16:41,373-Speed 2509.66 samples/sec Loss 1.9649 LearningRate 0.000248 Epoch: 22 Global Step: 457930 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:16:49,580-Speed 2495.95 samples/sec Loss 1.8917 LearningRate 0.000248 Epoch: 22 Global Step: 457940 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:16:57,787-Speed 2495.53 samples/sec Loss 1.9703 LearningRate 0.000248 Epoch: 22 Global Step: 457950 Fp16 Grad Scale: 32768 
Required: 85 hours Training: 2022-07-09 23:17:05,997-Speed 2495.09 samples/sec Loss 1.9151 LearningRate 0.000248 Epoch: 22 Global Step: 457960 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:17:14,201-Speed 2496.47 samples/sec Loss 1.9112 LearningRate 0.000248 Epoch: 22 Global Step: 457970 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:17:22,416-Speed 2493.73 samples/sec Loss 1.9362 LearningRate 0.000248 Epoch: 22 Global Step: 457980 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:17:30,566-Speed 2515.27 samples/sec Loss 1.9319 LearningRate 0.000248 Epoch: 22 Global Step: 457990 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:17:38,768-Speed 2497.07 samples/sec Loss 1.9253 LearningRate 0.000248 Epoch: 22 Global Step: 458000 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:17:46,970-Speed 2497.29 samples/sec Loss 1.9601 LearningRate 0.000248 Epoch: 22 Global Step: 458010 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:17:55,178-Speed 2495.52 samples/sec Loss 1.9676 LearningRate 0.000248 Epoch: 22 Global Step: 458020 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:18:03,383-Speed 2496.58 samples/sec Loss 1.9192 LearningRate 0.000248 Epoch: 22 Global Step: 458030 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:18:11,587-Speed 2496.53 samples/sec Loss 1.9492 LearningRate 0.000248 Epoch: 22 Global Step: 458040 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:18:19,741-Speed 2511.93 samples/sec Loss 1.9517 LearningRate 0.000248 Epoch: 22 Global Step: 458050 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:18:27,951-Speed 2495.19 samples/sec Loss 1.9448 LearningRate 0.000248 Epoch: 22 Global Step: 458060 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:18:36,154-Speed 2497.02 samples/sec Loss 1.9462 LearningRate 0.000248 Epoch: 22 Global Step: 458070 Fp16 Grad Scale: 32768 
Required: 85 hours Training: 2022-07-09 23:18:44,365-Speed 2494.56 samples/sec Loss 1.8937 LearningRate 0.000248 Epoch: 22 Global Step: 458080 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:18:52,575-Speed 2494.85 samples/sec Loss 1.9097 LearningRate 0.000248 Epoch: 22 Global Step: 458090 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:19:00,783-Speed 2495.72 samples/sec Loss 1.9325 LearningRate 0.000248 Epoch: 22 Global Step: 458100 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:19:08,934-Speed 2512.74 samples/sec Loss 1.8992 LearningRate 0.000248 Epoch: 22 Global Step: 458110 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:19:17,138-Speed 2496.88 samples/sec Loss 1.9125 LearningRate 0.000248 Epoch: 22 Global Step: 458120 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:19:25,355-Speed 2493.04 samples/sec Loss 1.9518 LearningRate 0.000248 Epoch: 22 Global Step: 458130 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:19:33,561-Speed 2495.89 samples/sec Loss 1.9410 LearningRate 0.000247 Epoch: 22 Global Step: 458140 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:19:41,765-Speed 2496.73 samples/sec Loss 1.9365 LearningRate 0.000247 Epoch: 22 Global Step: 458150 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:19:49,971-Speed 2496.07 samples/sec Loss 1.9335 LearningRate 0.000247 Epoch: 22 Global Step: 458160 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:19:58,130-Speed 2510.74 samples/sec Loss 1.9308 LearningRate 0.000247 Epoch: 22 Global Step: 458170 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:20:06,334-Speed 2496.73 samples/sec Loss 1.9197 LearningRate 0.000247 Epoch: 22 Global Step: 458180 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:20:14,548-Speed 2493.61 samples/sec Loss 1.8836 LearningRate 0.000247 Epoch: 22 Global Step: 458190 Fp16 Grad Scale: 32768 
Required: 85 hours Training: 2022-07-09 23:20:22,752-Speed 2496.90 samples/sec Loss 1.9326 LearningRate 0.000247 Epoch: 22 Global Step: 458200 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:20:30,956-Speed 2496.62 samples/sec Loss 1.9218 LearningRate 0.000247 Epoch: 22 Global Step: 458210 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:20:39,164-Speed 2495.64 samples/sec Loss 1.8871 LearningRate 0.000247 Epoch: 22 Global Step: 458220 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:20:47,312-Speed 2513.89 samples/sec Loss 1.9386 LearningRate 0.000247 Epoch: 22 Global Step: 458230 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:20:55,514-Speed 2497.49 samples/sec Loss 1.9694 LearningRate 0.000247 Epoch: 22 Global Step: 458240 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:21:03,716-Speed 2497.22 samples/sec Loss 1.9028 LearningRate 0.000247 Epoch: 22 Global Step: 458250 Fp16 Grad Scale: 32768 Required: 85 hours Training: 2022-07-09 23:21:11,876-Speed 2510.45 samples/sec Loss 1.8772 LearningRate 0.000247 Epoch: 22 Global Step: 458260 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:21:20,097-Speed 2491.67 samples/sec Loss 1.9374 LearningRate 0.000247 Epoch: 22 Global Step: 458270 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:21:28,308-Speed 2494.50 samples/sec Loss 1.9784 LearningRate 0.000247 Epoch: 22 Global Step: 458280 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:21:36,461-Speed 2512.33 samples/sec Loss 1.8938 LearningRate 0.000247 Epoch: 22 Global Step: 458290 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:21:44,669-Speed 2495.39 samples/sec Loss 1.9247 LearningRate 0.000247 Epoch: 22 Global Step: 458300 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:21:52,880-Speed 2494.55 samples/sec Loss 1.9536 LearningRate 0.000247 Epoch: 22 Global Step: 458310 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:22:01,091-Speed 2494.82 samples/sec Loss 1.9280 LearningRate 0.000247 Epoch: 22 Global Step: 458320 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:22:09,303-Speed 2493.99 samples/sec Loss 1.8875 LearningRate 0.000247 Epoch: 22 Global Step: 458330 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:22:17,509-Speed 2496.34 samples/sec Loss 1.8840 LearningRate 0.000247 Epoch: 22 Global Step: 458340 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:22:25,683-Speed 2505.90 samples/sec Loss 1.8944 LearningRate 0.000247 Epoch: 22 Global Step: 458350 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:22:33,890-Speed 2496.06 samples/sec Loss 1.9029 LearningRate 0.000247 Epoch: 22 Global Step: 458360 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:22:42,092-Speed 2497.28 samples/sec Loss 1.8964 LearningRate 0.000247 Epoch: 22 Global Step: 458370 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:22:50,296-Speed 2496.73 samples/sec Loss 1.8977 LearningRate 0.000247 Epoch: 22 Global Step: 458380 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:22:58,499-Speed 2497.15 samples/sec Loss 1.9657 LearningRate 0.000247 Epoch: 22 Global Step: 458390 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:23:06,703-Speed 2496.99 samples/sec Loss 1.9190 LearningRate 0.000247 Epoch: 22 Global Step: 458400 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:23:14,853-Speed 2513.29 samples/sec Loss 1.9454 LearningRate 0.000247 Epoch: 22 Global Step: 458410 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:23:23,068-Speed 2493.44 samples/sec Loss 1.9404 LearningRate 0.000247 Epoch: 22 Global Step: 458420 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:23:31,270-Speed 2497.38 samples/sec Loss 1.9661 LearningRate 0.000247 Epoch: 22 Global Step: 458430 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:23:39,473-Speed 2496.97 samples/sec Loss 1.9778 LearningRate 0.000247 Epoch: 22 Global Step: 458440 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:23:47,680-Speed 2495.97 samples/sec Loss 1.9413 LearningRate 0.000247 Epoch: 22 Global Step: 458450 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:23:55,886-Speed 2495.97 samples/sec Loss 1.9589 LearningRate 0.000247 Epoch: 22 Global Step: 458460 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:24:04,039-Speed 2513.22 samples/sec Loss 1.9260 LearningRate 0.000247 Epoch: 22 Global Step: 458470 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:24:12,243-Speed 2496.73 samples/sec Loss 1.8555 LearningRate 0.000247 Epoch: 22 Global Step: 458480 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:24:20,455-Speed 2494.34 samples/sec Loss 1.8862 LearningRate 0.000247 Epoch: 22 Global Step: 458490 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:24:28,664-Speed 2495.08 samples/sec Loss 1.9257 LearningRate 0.000247 Epoch: 22 Global Step: 458500 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:24:36,873-Speed 2495.40 samples/sec Loss 1.9225 LearningRate 0.000247 Epoch: 22 Global Step: 458510 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:24:45,077-Speed 2496.70 samples/sec Loss 1.9400 LearningRate 0.000247 Epoch: 22 Global Step: 458520 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:24:53,229-Speed 2512.78 samples/sec Loss 1.9779 LearningRate 0.000247 Epoch: 22 Global Step: 458530 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:25:01,442-Speed 2493.94 samples/sec Loss 1.9145 LearningRate 0.000247 Epoch: 22 Global Step: 458540 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:25:09,647-Speed 2496.44 samples/sec Loss 1.9124 LearningRate 0.000247 Epoch: 22 Global Step: 458550 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:25:17,853-Speed 2496.31 samples/sec Loss 1.8674 LearningRate 0.000247 Epoch: 22 Global Step: 458560 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:25:26,057-Speed 2496.70 samples/sec Loss 1.9005 LearningRate 0.000247 Epoch: 22 Global Step: 458570 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:25:34,264-Speed 2495.61 samples/sec Loss 1.9675 LearningRate 0.000247 Epoch: 22 Global Step: 458580 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:25:42,417-Speed 2512.53 samples/sec Loss 1.9171 LearningRate 0.000247 Epoch: 22 Global Step: 458590 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:25:50,621-Speed 2496.68 samples/sec Loss 1.9826 LearningRate 0.000247 Epoch: 22 Global Step: 458600 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:25:58,823-Speed 2497.24 samples/sec Loss 1.9109 LearningRate 0.000247 Epoch: 22 Global Step: 458610 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:26:07,041-Speed 2492.55 samples/sec Loss 1.9054 LearningRate 0.000247 Epoch: 22 Global Step: 458620 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:26:15,244-Speed 2496.99 samples/sec Loss 1.9576 LearningRate 0.000247 Epoch: 22 Global Step: 458630 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:26:23,447-Speed 2496.92 samples/sec Loss 1.9569 LearningRate 0.000247 Epoch: 22 Global Step: 458640 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:26:31,611-Speed 2509.04 samples/sec Loss 1.9526 LearningRate 0.000247 Epoch: 22 Global Step: 458650 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:26:39,816-Speed 2496.27 samples/sec Loss 1.9129 LearningRate 0.000247 Epoch: 22 Global Step: 458660 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:26:48,020-Speed 2497.14 samples/sec Loss 1.8940 LearningRate 0.000247 Epoch: 22 Global Step: 458670 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:26:56,235-Speed 2493.34 samples/sec Loss 1.9134 LearningRate 0.000247 Epoch: 22 Global Step: 458680 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:27:04,442-Speed 2496.01 samples/sec Loss 1.9342 LearningRate 0.000247 Epoch: 22 Global Step: 458690 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:27:12,649-Speed 2495.76 samples/sec Loss 1.9259 LearningRate 0.000247 Epoch: 22 Global Step: 458700 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:27:20,804-Speed 2511.70 samples/sec Loss 1.9441 LearningRate 0.000247 Epoch: 22 Global Step: 458710 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:27:29,010-Speed 2495.94 samples/sec Loss 1.8773 LearningRate 0.000247 Epoch: 22 Global Step: 458720 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:27:37,219-Speed 2495.22 samples/sec Loss 1.9426 LearningRate 0.000247 Epoch: 22 Global Step: 458730 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:27:45,430-Speed 2494.67 samples/sec Loss 1.9269 LearningRate 0.000247 Epoch: 22 Global Step: 458740 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:27:53,635-Speed 2496.33 samples/sec Loss 1.9525 LearningRate 0.000247 Epoch: 22 Global Step: 458750 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:28:01,844-Speed 2495.09 samples/sec Loss 1.9425 LearningRate 0.000247 Epoch: 22 Global Step: 458760 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:28:09,996-Speed 2512.81 samples/sec Loss 1.9241 LearningRate 0.000247 Epoch: 22 Global Step: 458770 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:28:18,199-Speed 2497.04 samples/sec Loss 1.9255 LearningRate 0.000247 Epoch: 22 Global Step: 458780 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:28:26,409-Speed 2494.98 samples/sec Loss 1.9216 LearningRate 0.000247 Epoch: 22 Global Step: 458790 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:28:34,609-Speed 2498.13 samples/sec Loss 1.8783 LearningRate 0.000247 Epoch: 22 Global Step: 458800 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:28:42,820-Speed 2495.01 samples/sec Loss 1.9530 LearningRate 0.000247 Epoch: 22 Global Step: 458810 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:28:51,022-Speed 2497.32 samples/sec Loss 1.8706 LearningRate 0.000247 Epoch: 22 Global Step: 458820 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:28:59,184-Speed 2509.54 samples/sec Loss 1.9327 LearningRate 0.000247 Epoch: 22 Global Step: 458830 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:29:07,348-Speed 2508.97 samples/sec Loss 1.9509 LearningRate 0.000247 Epoch: 22 Global Step: 458840 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:29:15,550-Speed 2497.19 samples/sec Loss 1.8792 LearningRate 0.000247 Epoch: 22 Global Step: 458850 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:29:23,755-Speed 2496.51 samples/sec Loss 1.8851 LearningRate 0.000247 Epoch: 22 Global Step: 458860 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:29:31,953-Speed 2498.40 samples/sec Loss 1.9027 LearningRate 0.000247 Epoch: 22 Global Step: 458870 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:29:40,159-Speed 2496.36 samples/sec Loss 1.9515 LearningRate 0.000247 Epoch: 22 Global Step: 458880 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:29:48,306-Speed 2514.14 samples/sec Loss 1.9059 LearningRate 0.000246 Epoch: 22 Global Step: 458890 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:29:56,508-Speed 2497.50 samples/sec Loss 1.9399 LearningRate 0.000246 Epoch: 22 Global Step: 458900 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:30:04,710-Speed 2497.20 samples/sec Loss 1.9232 LearningRate 0.000246 Epoch: 22 Global Step: 458910 Fp16 Grad Scale: 8192 Required: 85 
hours Training: 2022-07-09 23:30:12,915-Speed 2496.42 samples/sec Loss 1.9640 LearningRate 0.000246 Epoch: 22 Global Step: 458920 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:30:21,115-Speed 2497.73 samples/sec Loss 1.9443 LearningRate 0.000246 Epoch: 22 Global Step: 458930 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:30:29,319-Speed 2496.81 samples/sec Loss 1.9424 LearningRate 0.000246 Epoch: 22 Global Step: 458940 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:30:37,467-Speed 2514.16 samples/sec Loss 1.8955 LearningRate 0.000246 Epoch: 22 Global Step: 458950 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:30:45,672-Speed 2496.53 samples/sec Loss 1.9524 LearningRate 0.000246 Epoch: 22 Global Step: 458960 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:30:53,873-Speed 2498.01 samples/sec Loss 1.9595 LearningRate 0.000246 Epoch: 22 Global Step: 458970 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:31:02,073-Speed 2497.80 samples/sec Loss 1.9777 LearningRate 0.000246 Epoch: 22 Global Step: 458980 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:31:10,276-Speed 2497.25 samples/sec Loss 1.9539 LearningRate 0.000246 Epoch: 22 Global Step: 458990 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:31:18,483-Speed 2495.79 samples/sec Loss 1.9787 LearningRate 0.000246 Epoch: 22 Global Step: 459000 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:31:26,629-Speed 2514.57 samples/sec Loss 1.9311 LearningRate 0.000246 Epoch: 22 Global Step: 459010 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:31:34,835-Speed 2496.12 samples/sec Loss 1.9158 LearningRate 0.000246 Epoch: 22 Global Step: 459020 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:31:43,037-Speed 2497.17 samples/sec Loss 1.9244 LearningRate 0.000246 Epoch: 22 Global Step: 459030 Fp16 Grad Scale: 8192 Required: 85 hours Training: 
2022-07-09 23:31:51,239-Speed 2497.43 samples/sec Loss 1.9492 LearningRate 0.000246 Epoch: 22 Global Step: 459040 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:31:59,442-Speed 2496.85 samples/sec Loss 1.9123 LearningRate 0.000246 Epoch: 22 Global Step: 459050 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:32:07,645-Speed 2497.14 samples/sec Loss 1.9893 LearningRate 0.000246 Epoch: 22 Global Step: 459060 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:32:15,795-Speed 2513.33 samples/sec Loss 1.9647 LearningRate 0.000246 Epoch: 22 Global Step: 459070 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:32:23,996-Speed 2497.88 samples/sec Loss 1.9361 LearningRate 0.000246 Epoch: 22 Global Step: 459080 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:32:32,196-Speed 2497.77 samples/sec Loss 2.0114 LearningRate 0.000246 Epoch: 22 Global Step: 459090 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:32:40,399-Speed 2497.04 samples/sec Loss 1.9241 LearningRate 0.000246 Epoch: 22 Global Step: 459100 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:32:48,600-Speed 2497.90 samples/sec Loss 1.9867 LearningRate 0.000246 Epoch: 22 Global Step: 459110 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:32:56,802-Speed 2497.15 samples/sec Loss 1.9319 LearningRate 0.000246 Epoch: 22 Global Step: 459120 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:33:04,949-Speed 2514.30 samples/sec Loss 1.9704 LearningRate 0.000246 Epoch: 22 Global Step: 459130 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:33:13,153-Speed 2496.88 samples/sec Loss 1.9469 LearningRate 0.000246 Epoch: 22 Global Step: 459140 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:33:21,368-Speed 2493.63 samples/sec Loss 1.9652 LearningRate 0.000246 Epoch: 22 Global Step: 459150 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 
23:33:29,569-Speed 2497.46 samples/sec Loss 1.9692 LearningRate 0.000246 Epoch: 22 Global Step: 459160 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:33:37,774-Speed 2496.64 samples/sec Loss 1.9134 LearningRate 0.000246 Epoch: 22 Global Step: 459170 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:33:45,981-Speed 2495.84 samples/sec Loss 1.9379 LearningRate 0.000246 Epoch: 22 Global Step: 459180 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:33:54,127-Speed 2514.94 samples/sec Loss 1.9465 LearningRate 0.000246 Epoch: 22 Global Step: 459190 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:34:02,340-Speed 2493.89 samples/sec Loss 1.9986 LearningRate 0.000246 Epoch: 22 Global Step: 459200 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:34:10,550-Speed 2494.84 samples/sec Loss 1.9318 LearningRate 0.000246 Epoch: 22 Global Step: 459210 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:34:18,750-Speed 2498.01 samples/sec Loss 1.9353 LearningRate 0.000246 Epoch: 22 Global Step: 459220 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:34:26,964-Speed 2493.66 samples/sec Loss 1.9659 LearningRate 0.000246 Epoch: 22 Global Step: 459230 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:34:35,167-Speed 2497.07 samples/sec Loss 1.9575 LearningRate 0.000246 Epoch: 22 Global Step: 459240 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:34:43,329-Speed 2509.62 samples/sec Loss 1.9541 LearningRate 0.000246 Epoch: 22 Global Step: 459250 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:34:51,529-Speed 2497.90 samples/sec Loss 1.9775 LearningRate 0.000246 Epoch: 22 Global Step: 459260 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:34:59,732-Speed 2496.97 samples/sec Loss 1.9676 LearningRate 0.000246 Epoch: 22 Global Step: 459270 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:35:07,937-Speed 
2496.34 samples/sec Loss 2.0195 LearningRate 0.000246 Epoch: 22 Global Step: 459280 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:35:16,142-Speed 2496.57 samples/sec Loss 1.9594 LearningRate 0.000246 Epoch: 22 Global Step: 459290 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:35:24,340-Speed 2498.50 samples/sec Loss 1.9281 LearningRate 0.000246 Epoch: 22 Global Step: 459300 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:35:32,489-Speed 2513.60 samples/sec Loss 1.9517 LearningRate 0.000246 Epoch: 22 Global Step: 459310 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:35:40,690-Speed 2497.60 samples/sec Loss 1.9515 LearningRate 0.000246 Epoch: 22 Global Step: 459320 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:35:48,894-Speed 2496.93 samples/sec Loss 1.9904 LearningRate 0.000246 Epoch: 22 Global Step: 459330 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:35:57,092-Speed 2498.48 samples/sec Loss 1.9295 LearningRate 0.000246 Epoch: 22 Global Step: 459340 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:36:05,298-Speed 2496.08 samples/sec Loss 1.9676 LearningRate 0.000246 Epoch: 22 Global Step: 459350 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:36:13,498-Speed 2497.84 samples/sec Loss 1.9840 LearningRate 0.000246 Epoch: 22 Global Step: 459360 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:36:21,650-Speed 2512.82 samples/sec Loss 1.9661 LearningRate 0.000246 Epoch: 22 Global Step: 459370 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:36:29,860-Speed 2494.85 samples/sec Loss 1.9518 LearningRate 0.000246 Epoch: 22 Global Step: 459380 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:36:38,059-Speed 2498.41 samples/sec Loss 1.9373 LearningRate 0.000246 Epoch: 22 Global Step: 459390 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:36:46,260-Speed 2497.67 samples/sec 
Loss 1.9437 LearningRate 0.000246 Epoch: 22 Global Step: 459400 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:36:54,471-Speed 2494.67 samples/sec Loss 1.9514 LearningRate 0.000246 Epoch: 22 Global Step: 459410 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:37:02,674-Speed 2497.22 samples/sec Loss 1.9733 LearningRate 0.000246 Epoch: 22 Global Step: 459420 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:37:10,826-Speed 2512.75 samples/sec Loss 1.9332 LearningRate 0.000246 Epoch: 22 Global Step: 459430 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:37:19,028-Speed 2497.31 samples/sec Loss 1.9418 LearningRate 0.000246 Epoch: 22 Global Step: 459440 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:37:27,229-Speed 2497.70 samples/sec Loss 1.9640 LearningRate 0.000246 Epoch: 22 Global Step: 459450 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:37:35,429-Speed 2497.74 samples/sec Loss 1.9249 LearningRate 0.000246 Epoch: 22 Global Step: 459460 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:37:43,660-Speed 2488.68 samples/sec Loss 1.9246 LearningRate 0.000246 Epoch: 22 Global Step: 459470 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:37:51,873-Speed 2493.85 samples/sec Loss 1.9286 LearningRate 0.000246 Epoch: 22 Global Step: 459480 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:38:00,022-Speed 2513.51 samples/sec Loss 1.9010 LearningRate 0.000246 Epoch: 22 Global Step: 459490 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:38:08,226-Speed 2496.75 samples/sec Loss 1.9394 LearningRate 0.000246 Epoch: 22 Global Step: 459500 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:38:16,442-Speed 2493.24 samples/sec Loss 2.0002 LearningRate 0.000246 Epoch: 22 Global Step: 459510 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:38:24,644-Speed 2497.30 samples/sec Loss 1.9103 
LearningRate 0.000246 Epoch: 22 Global Step: 459520 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:38:32,845-Speed 2497.71 samples/sec Loss 1.9330 LearningRate 0.000246 Epoch: 22 Global Step: 459530 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:38:41,048-Speed 2497.05 samples/sec Loss 1.8961 LearningRate 0.000246 Epoch: 22 Global Step: 459540 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:38:49,197-Speed 2513.43 samples/sec Loss 1.9257 LearningRate 0.000246 Epoch: 22 Global Step: 459550 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:38:57,400-Speed 2496.92 samples/sec Loss 1.9161 LearningRate 0.000246 Epoch: 22 Global Step: 459560 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:39:05,601-Speed 2497.91 samples/sec Loss 1.8961 LearningRate 0.000246 Epoch: 22 Global Step: 459570 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:39:13,802-Speed 2497.64 samples/sec Loss 1.8832 LearningRate 0.000246 Epoch: 22 Global Step: 459580 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:39:22,000-Speed 2498.40 samples/sec Loss 1.9320 LearningRate 0.000246 Epoch: 22 Global Step: 459590 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:39:30,209-Speed 2495.04 samples/sec Loss 1.9577 LearningRate 0.000246 Epoch: 22 Global Step: 459600 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:39:38,365-Speed 2511.27 samples/sec Loss 1.9178 LearningRate 0.000246 Epoch: 22 Global Step: 459610 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:39:46,567-Speed 2497.88 samples/sec Loss 1.9238 LearningRate 0.000246 Epoch: 22 Global Step: 459620 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:39:54,771-Speed 2496.94 samples/sec Loss 1.9060 LearningRate 0.000246 Epoch: 22 Global Step: 459630 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:40:02,977-Speed 2496.00 samples/sec Loss 1.8885 LearningRate 
0.000245 Epoch: 22 Global Step: 459640 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:40:11,179-Speed 2497.41 samples/sec Loss 1.9434 LearningRate 0.000245 Epoch: 22 Global Step: 459650 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:40:19,380-Speed 2497.75 samples/sec Loss 1.9530 LearningRate 0.000245 Epoch: 22 Global Step: 459660 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:40:27,527-Speed 2514.03 samples/sec Loss 1.9423 LearningRate 0.000245 Epoch: 22 Global Step: 459670 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:40:35,728-Speed 2498.03 samples/sec Loss 1.9316 LearningRate 0.000245 Epoch: 22 Global Step: 459680 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:40:43,931-Speed 2496.92 samples/sec Loss 1.9376 LearningRate 0.000245 Epoch: 22 Global Step: 459690 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:40:52,131-Speed 2498.20 samples/sec Loss 1.9306 LearningRate 0.000245 Epoch: 22 Global Step: 459700 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:41:00,343-Speed 2494.27 samples/sec Loss 1.9587 LearningRate 0.000245 Epoch: 22 Global Step: 459710 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:41:08,549-Speed 2495.92 samples/sec Loss 1.9468 LearningRate 0.000245 Epoch: 22 Global Step: 459720 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:41:16,713-Speed 2509.13 samples/sec Loss 1.9209 LearningRate 0.000245 Epoch: 22 Global Step: 459730 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:41:24,916-Speed 2497.14 samples/sec Loss 1.9361 LearningRate 0.000245 Epoch: 22 Global Step: 459740 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:41:33,153-Speed 2486.69 samples/sec Loss 1.9125 LearningRate 0.000245 Epoch: 22 Global Step: 459750 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:41:41,355-Speed 2497.32 samples/sec Loss 1.8978 LearningRate 0.000245 Epoch: 22 
Global Step: 459760 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:41:49,556-Speed 2497.97 samples/sec Loss 1.9705 LearningRate 0.000245 Epoch: 22 Global Step: 459770 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:41:57,762-Speed 2496.09 samples/sec Loss 1.9155 LearningRate 0.000245 Epoch: 22 Global Step: 459780 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:42:05,914-Speed 2512.68 samples/sec Loss 1.9365 LearningRate 0.000245 Epoch: 22 Global Step: 459790 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:42:14,116-Speed 2497.34 samples/sec Loss 1.9267 LearningRate 0.000245 Epoch: 22 Global Step: 459800 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:42:22,323-Speed 2495.72 samples/sec Loss 1.9530 LearningRate 0.000245 Epoch: 22 Global Step: 459810 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:42:30,525-Speed 2497.56 samples/sec Loss 1.9312 LearningRate 0.000245 Epoch: 22 Global Step: 459820 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:42:38,726-Speed 2497.70 samples/sec Loss 1.8952 LearningRate 0.000245 Epoch: 22 Global Step: 459830 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:42:46,935-Speed 2495.09 samples/sec Loss 1.9177 LearningRate 0.000245 Epoch: 22 Global Step: 459840 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:42:55,082-Speed 2514.33 samples/sec Loss 1.9416 LearningRate 0.000245 Epoch: 22 Global Step: 459850 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:43:03,282-Speed 2497.98 samples/sec Loss 1.9601 LearningRate 0.000245 Epoch: 22 Global Step: 459860 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:43:11,484-Speed 2497.25 samples/sec Loss 1.9484 LearningRate 0.000245 Epoch: 22 Global Step: 459870 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:43:19,688-Speed 2496.89 samples/sec Loss 1.8872 LearningRate 0.000245 Epoch: 22 Global Step: 459880 
Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:43:27,892-Speed 2496.68 samples/sec Loss 1.9742 LearningRate 0.000245 Epoch: 22 Global Step: 459890 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:43:36,096-Speed 2496.74 samples/sec Loss 1.9164 LearningRate 0.000245 Epoch: 22 Global Step: 459900 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:43:44,243-Speed 2514.10 samples/sec Loss 1.9942 LearningRate 0.000245 Epoch: 22 Global Step: 459910 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:43:52,445-Speed 2497.68 samples/sec Loss 1.9307 LearningRate 0.000245 Epoch: 22 Global Step: 459920 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:44:00,646-Speed 2497.55 samples/sec Loss 1.9282 LearningRate 0.000245 Epoch: 22 Global Step: 459930 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:44:08,851-Speed 2496.64 samples/sec Loss 1.8825 LearningRate 0.000245 Epoch: 22 Global Step: 459940 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:44:17,057-Speed 2496.27 samples/sec Loss 1.8960 LearningRate 0.000245 Epoch: 22 Global Step: 459950 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:44:25,260-Speed 2496.92 samples/sec Loss 1.9567 LearningRate 0.000245 Epoch: 22 Global Step: 459960 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:44:33,412-Speed 2512.74 samples/sec Loss 1.9488 LearningRate 0.000245 Epoch: 22 Global Step: 459970 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:44:41,616-Speed 2496.73 samples/sec Loss 1.8841 LearningRate 0.000245 Epoch: 22 Global Step: 459980 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:44:49,816-Speed 2498.02 samples/sec Loss 1.9216 LearningRate 0.000245 Epoch: 22 Global Step: 459990 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:44:58,019-Speed 2497.10 samples/sec Loss 1.8970 LearningRate 0.000245 Epoch: 22 Global Step: 460000 Fp16 Grad Scale: 
8192 Required: 85 hours Training: 2022-07-09 23:45:06,226-Speed 2495.73 samples/sec Loss 1.9162 LearningRate 0.000245 Epoch: 22 Global Step: 460010 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:45:14,427-Speed 2497.73 samples/sec Loss 1.9068 LearningRate 0.000245 Epoch: 22 Global Step: 460020 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:45:22,585-Speed 2510.58 samples/sec Loss 1.9040 LearningRate 0.000245 Epoch: 22 Global Step: 460030 Fp16 Grad Scale: 8192 Required: 85 hours Training: 2022-07-09 23:45:30,790-Speed 2496.53 samples/sec Loss 1.8853 LearningRate 0.000245 Epoch: 22 Global Step: 460040 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:45:39,008-Speed 2492.55 samples/sec Loss 1.9491 LearningRate 0.000245 Epoch: 22 Global Step: 460050 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:45:47,210-Speed 2497.32 samples/sec Loss 1.9141 LearningRate 0.000245 Epoch: 22 Global Step: 460060 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:45:55,413-Speed 2497.06 samples/sec Loss 1.9484 LearningRate 0.000245 Epoch: 22 Global Step: 460070 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:46:03,614-Speed 2497.84 samples/sec Loss 1.9053 LearningRate 0.000245 Epoch: 22 Global Step: 460080 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:46:11,767-Speed 2512.48 samples/sec Loss 1.9104 LearningRate 0.000245 Epoch: 22 Global Step: 460090 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:46:19,967-Speed 2497.65 samples/sec Loss 1.9302 LearningRate 0.000245 Epoch: 22 Global Step: 460100 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:46:28,172-Speed 2496.58 samples/sec Loss 1.9033 LearningRate 0.000245 Epoch: 22 Global Step: 460110 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:46:36,382-Speed 2494.88 samples/sec Loss 1.9303 LearningRate 0.000245 Epoch: 22 Global Step: 460120 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:46:44,593-Speed 2494.70 samples/sec Loss 1.9087 LearningRate 0.000245 Epoch: 22 Global Step: 460130 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:46:52,798-Speed 2496.34 samples/sec Loss 1.9514 LearningRate 0.000245 Epoch: 22 Global Step: 460140 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:47:00,949-Speed 2512.81 samples/sec Loss 1.9269 LearningRate 0.000245 Epoch: 22 Global Step: 460150 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:47:09,158-Speed 2498.19 samples/sec Loss 1.9230 LearningRate 0.000245 Epoch: 22 Global Step: 460160 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:47:17,360-Speed 2497.40 samples/sec Loss 1.8872 LearningRate 0.000245 Epoch: 22 Global Step: 460170 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:47:25,591-Speed 2496.98 samples/sec Loss 1.9386 LearningRate 0.000245 Epoch: 22 Global Step: 460180 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:47:33,793-Speed 2497.67 samples/sec Loss 1.9053 LearningRate 0.000245 Epoch: 22 Global Step: 460190 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:47:42,009-Speed 2492.99 samples/sec Loss 1.9245 LearningRate 0.000245 Epoch: 22 Global Step: 460200 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:47:50,166-Speed 2511.15 samples/sec Loss 1.9691 LearningRate 0.000245 Epoch: 22 Global Step: 460210 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:47:58,369-Speed 2497.02 samples/sec Loss 1.8771 LearningRate 0.000245 Epoch: 22 Global Step: 460220 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:48:06,570-Speed 2497.48 samples/sec Loss 1.9100 LearningRate 0.000245 Epoch: 22 Global Step: 460230 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:48:14,777-Speed 2495.95 samples/sec Loss 1.9422 LearningRate 0.000245 Epoch: 22 Global Step: 460240 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:48:22,983-Speed 2496.10 samples/sec Loss 1.9524 LearningRate 0.000245 Epoch: 22 Global Step: 460250 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:48:31,187-Speed 2496.53 samples/sec Loss 1.9522 LearningRate 0.000245 Epoch: 22 Global Step: 460260 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:48:39,339-Speed 2512.86 samples/sec Loss 1.9356 LearningRate 0.000245 Epoch: 22 Global Step: 460270 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:48:47,548-Speed 2495.21 samples/sec Loss 1.9432 LearningRate 0.000245 Epoch: 22 Global Step: 460280 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:48:55,762-Speed 2493.48 samples/sec Loss 1.8761 LearningRate 0.000245 Epoch: 22 Global Step: 460290 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:49:03,964-Speed 2497.68 samples/sec Loss 1.9482 LearningRate 0.000245 Epoch: 22 Global Step: 460300 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:49:12,166-Speed 2497.13 samples/sec Loss 1.9098 LearningRate 0.000245 Epoch: 22 Global Step: 460310 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:49:20,380-Speed 2493.70 samples/sec Loss 1.9327 LearningRate 0.000245 Epoch: 22 Global Step: 460320 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:49:28,547-Speed 2508.16 samples/sec Loss 1.9253 LearningRate 0.000245 Epoch: 22 Global Step: 460330 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:49:36,755-Speed 2495.31 samples/sec Loss 1.8915 LearningRate 0.000245 Epoch: 22 Global Step: 460340 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:49:44,961-Speed 2496.39 samples/sec Loss 1.8812 LearningRate 0.000245 Epoch: 22 Global Step: 460350 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:49:53,163-Speed 2497.54 samples/sec Loss 1.9365 LearningRate 0.000245 Epoch: 22 Global Step: 460360 Fp16 Grad Scale: 16384 
Required: 85 hours Training: 2022-07-09 23:50:01,365-Speed 2497.24 samples/sec Loss 1.8972 LearningRate 0.000245 Epoch: 22 Global Step: 460370 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:50:09,567-Speed 2497.45 samples/sec Loss 1.9375 LearningRate 0.000245 Epoch: 22 Global Step: 460380 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:50:17,715-Speed 2514.09 samples/sec Loss 1.9639 LearningRate 0.000244 Epoch: 22 Global Step: 460390 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:50:25,921-Speed 2496.12 samples/sec Loss 1.9167 LearningRate 0.000244 Epoch: 22 Global Step: 460400 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:50:34,123-Speed 2497.52 samples/sec Loss 1.9654 LearningRate 0.000244 Epoch: 22 Global Step: 460410 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:50:42,348-Speed 2490.46 samples/sec Loss 1.9674 LearningRate 0.000244 Epoch: 22 Global Step: 460420 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:50:50,549-Speed 2497.59 samples/sec Loss 1.9264 LearningRate 0.000244 Epoch: 22 Global Step: 460430 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:50:58,763-Speed 2493.57 samples/sec Loss 1.9028 LearningRate 0.000244 Epoch: 22 Global Step: 460440 Fp16 Grad Scale: 16384 Required: 85 hours Training: 2022-07-09 23:51:06,934-Speed 2506.96 samples/sec Loss 1.9739 LearningRate 0.000244 Epoch: 22 Global Step: 460450 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-09 23:51:15,147-Speed 2494.33 samples/sec Loss 1.9949 LearningRate 0.000244 Epoch: 22 Global Step: 460460 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-09 23:51:23,352-Speed 2496.57 samples/sec Loss 1.9582 LearningRate 0.000244 Epoch: 22 Global Step: 460470 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-09 23:51:31,557-Speed 2496.28 samples/sec Loss 1.9384 LearningRate 0.000244 Epoch: 22 Global Step: 460480 Fp16 Grad Scale: 16384 
Required: 84 hours
Training: 2022-07-09 23:51:39,766-Speed 2495.36 samples/sec Loss 1.9699 LearningRate 0.000244 Epoch: 22 Global Step: 460490 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:51:47,966-Speed 2497.65 samples/sec Loss 1.9208 LearningRate 0.000244 Epoch: 22 Global Step: 460500 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:51:56,118-Speed 2512.87 samples/sec Loss 1.9172 LearningRate 0.000244 Epoch: 22 Global Step: 460510 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:52:04,320-Speed 2497.23 samples/sec Loss 1.9363 LearningRate 0.000244 Epoch: 22 Global Step: 460520 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:52:12,522-Speed 2497.38 samples/sec Loss 1.9500 LearningRate 0.000244 Epoch: 22 Global Step: 460530 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:52:20,725-Speed 2496.95 samples/sec Loss 1.8847 LearningRate 0.000244 Epoch: 22 Global Step: 460540 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:52:28,938-Speed 2493.86 samples/sec Loss 1.9285 LearningRate 0.000244 Epoch: 22 Global Step: 460550 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:52:37,151-Speed 2494.02 samples/sec Loss 1.9746 LearningRate 0.000244 Epoch: 22 Global Step: 460560 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:52:45,318-Speed 2508.16 samples/sec Loss 1.9240 LearningRate 0.000244 Epoch: 22 Global Step: 460570 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:52:53,525-Speed 2495.89 samples/sec Loss 1.9329 LearningRate 0.000244 Epoch: 22 Global Step: 460580 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:53:01,728-Speed 2496.81 samples/sec Loss 1.9533 LearningRate 0.000244 Epoch: 22 Global Step: 460590 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:53:09,933-Speed 2496.59 samples/sec Loss 1.9491 LearningRate 0.000244 Epoch: 22 Global Step: 460600 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:53:18,137-Speed 2496.85 samples/sec Loss 1.9231 LearningRate 0.000244 Epoch: 22 Global Step: 460610 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:53:26,340-Speed 2496.87 samples/sec Loss 1.9587 LearningRate 0.000244 Epoch: 22 Global Step: 460620 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:53:34,498-Speed 2510.92 samples/sec Loss 1.9583 LearningRate 0.000244 Epoch: 22 Global Step: 460630 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:53:42,705-Speed 2495.93 samples/sec Loss 1.9722 LearningRate 0.000244 Epoch: 22 Global Step: 460640 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:53:50,913-Speed 2495.65 samples/sec Loss 1.9543 LearningRate 0.000244 Epoch: 22 Global Step: 460650 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:53:59,118-Speed 2496.29 samples/sec Loss 1.9059 LearningRate 0.000244 Epoch: 22 Global Step: 460660 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:54:07,320-Speed 2497.69 samples/sec Loss 1.9139 LearningRate 0.000244 Epoch: 22 Global Step: 460670 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:54:15,530-Speed 2494.97 samples/sec Loss 1.9678 LearningRate 0.000244 Epoch: 22 Global Step: 460680 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:54:23,678-Speed 2513.97 samples/sec Loss 1.8687 LearningRate 0.000244 Epoch: 22 Global Step: 460690 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:54:31,882-Speed 2496.58 samples/sec Loss 1.9395 LearningRate 0.000244 Epoch: 22 Global Step: 460700 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:54:40,084-Speed 2497.38 samples/sec Loss 1.8813 LearningRate 0.000244 Epoch: 22 Global Step: 460710 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:54:48,283-Speed 2499.01 samples/sec Loss 1.9514 LearningRate 0.000244 Epoch: 22 Global Step: 460720 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:54:56,491-Speed 2495.44 samples/sec Loss 1.9421 LearningRate 0.000244 Epoch: 22 Global Step: 460730 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:55:04,693-Speed 2497.31 samples/sec Loss 1.8921 LearningRate 0.000244 Epoch: 22 Global Step: 460740 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:55:12,846-Speed 2512.28 samples/sec Loss 1.9642 LearningRate 0.000244 Epoch: 22 Global Step: 460750 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:55:21,049-Speed 2496.87 samples/sec Loss 1.9105 LearningRate 0.000244 Epoch: 22 Global Step: 460760 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:55:29,253-Speed 2496.96 samples/sec Loss 1.9429 LearningRate 0.000244 Epoch: 22 Global Step: 460770 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:55:37,477-Speed 2490.70 samples/sec Loss 1.9400 LearningRate 0.000244 Epoch: 22 Global Step: 460780 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:55:45,681-Speed 2496.35 samples/sec Loss 1.8984 LearningRate 0.000244 Epoch: 22 Global Step: 460790 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:55:53,886-Speed 2496.50 samples/sec Loss 1.9551 LearningRate 0.000244 Epoch: 22 Global Step: 460800 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:56:02,037-Speed 2513.17 samples/sec Loss 1.9120 LearningRate 0.000244 Epoch: 22 Global Step: 460810 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:56:10,237-Speed 2497.82 samples/sec Loss 1.9015 LearningRate 0.000244 Epoch: 22 Global Step: 460820 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:56:18,438-Speed 2497.69 samples/sec Loss 1.9059 LearningRate 0.000244 Epoch: 22 Global Step: 460830 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:56:26,654-Speed 2493.33 samples/sec Loss 1.9230 LearningRate 0.000244 Epoch: 22 Global Step: 460840 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:56:34,857-Speed 2497.01 samples/sec Loss 1.9014 LearningRate 0.000244 Epoch: 22 Global Step: 460850 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:56:43,064-Speed 2495.87 samples/sec Loss 1.9710 LearningRate 0.000244 Epoch: 22 Global Step: 460860 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:56:51,213-Speed 2513.56 samples/sec Loss 1.9643 LearningRate 0.000244 Epoch: 22 Global Step: 460870 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:56:59,417-Speed 2496.98 samples/sec Loss 1.9701 LearningRate 0.000244 Epoch: 22 Global Step: 460880 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:57:07,631-Speed 2493.51 samples/sec Loss 1.9267 LearningRate 0.000244 Epoch: 22 Global Step: 460890 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:57:15,832-Speed 2497.77 samples/sec Loss 1.9222 LearningRate 0.000244 Epoch: 22 Global Step: 460900 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:57:24,034-Speed 2497.37 samples/sec Loss 1.9926 LearningRate 0.000244 Epoch: 22 Global Step: 460910 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:57:32,235-Speed 2497.43 samples/sec Loss 1.9156 LearningRate 0.000244 Epoch: 22 Global Step: 460920 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:57:40,389-Speed 2512.17 samples/sec Loss 1.9615 LearningRate 0.000244 Epoch: 22 Global Step: 460930 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:57:48,596-Speed 2495.92 samples/sec Loss 1.9063 LearningRate 0.000244 Epoch: 22 Global Step: 460940 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:57:56,798-Speed 2497.09 samples/sec Loss 1.9665 LearningRate 0.000244 Epoch: 22 Global Step: 460950 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:58:04,999-Speed 2497.91 samples/sec Loss 1.9492 LearningRate 0.000244 Epoch: 22 Global Step: 460960 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:58:13,225-Speed 2498.60 samples/sec Loss 1.9277 LearningRate 0.000244 Epoch: 22 Global Step: 460970 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:58:21,431-Speed 2496.03 samples/sec Loss 1.9463 LearningRate 0.000244 Epoch: 22 Global Step: 460980 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:58:29,623-Speed 2515.19 samples/sec Loss 1.9101 LearningRate 0.000244 Epoch: 22 Global Step: 460990 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:58:37,839-Speed 2499.22 samples/sec Loss 1.8808 LearningRate 0.000244 Epoch: 22 Global Step: 461000 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:58:46,045-Speed 2496.23 samples/sec Loss 1.9767 LearningRate 0.000244 Epoch: 22 Global Step: 461010 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:58:54,252-Speed 2495.71 samples/sec Loss 1.9538 LearningRate 0.000244 Epoch: 22 Global Step: 461020 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:59:06,810-Speed 1646.11 samples/sec Loss 1.9493 LearningRate 0.000244 Epoch: 22 Global Step: 461030 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:59:15,112-Speed 2501.35 samples/sec Loss 2.0005 LearningRate 0.000244 Epoch: 22 Global Step: 461040 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:59:23,256-Speed 2514.81 samples/sec Loss 1.9661 LearningRate 0.000244 Epoch: 22 Global Step: 461050 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:59:36,548-Speed 2499.34 samples/sec Loss 1.9334 LearningRate 0.000244 Epoch: 22 Global Step: 461060 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:59:45,505-Speed 2499.98 samples/sec Loss 1.9199 LearningRate 0.000244 Epoch: 22 Global Step: 461070 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-09 23:59:54,019-Speed 2405.75 samples/sec Loss 1.9537 LearningRate 0.000244 Epoch: 22 Global Step: 461080 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:00:02,221-Speed 2497.81 samples/sec Loss 1.9627 LearningRate 0.000244 Epoch: 22 Global Step: 461090 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:00:10,499-Speed 2494.39 samples/sec Loss 1.9405 LearningRate 0.000244 Epoch: 22 Global Step: 461100 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:00:22,369-Speed 1725.39 samples/sec Loss 1.9152 LearningRate 0.000244 Epoch: 22 Global Step: 461110 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:00:30,579-Speed 2498.19 samples/sec Loss 1.9307 LearningRate 0.000244 Epoch: 22 Global Step: 461120 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:00:38,805-Speed 2498.84 samples/sec Loss 1.9465 LearningRate 0.000244 Epoch: 22 Global Step: 461130 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:00:47,025-Speed 2491.74 samples/sec Loss 1.9374 LearningRate 0.000244 Epoch: 22 Global Step: 461140 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:00:55,323-Speed 2473.16 samples/sec Loss 1.9552 LearningRate 0.000243 Epoch: 22 Global Step: 461150 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:01:03,520-Speed 2499.66 samples/sec Loss 1.9447 LearningRate 0.000243 Epoch: 22 Global Step: 461160 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:01:11,748-Speed 2512.47 samples/sec Loss 1.9250 LearningRate 0.000243 Epoch: 22 Global Step: 461170 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:01:19,995-Speed 2499.48 samples/sec Loss 1.9296 LearningRate 0.000243 Epoch: 22 Global Step: 461180 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:01:28,194-Speed 2498.35 samples/sec Loss 1.9232 LearningRate 0.000243 Epoch: 22 Global Step: 461190 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:01:36,420-Speed 2493.03 samples/sec Loss 1.9747 LearningRate 0.000243 Epoch: 22 Global Step: 461200 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:01:44,707-Speed 2471.78 samples/sec Loss 1.9165 LearningRate 0.000243 Epoch: 22 Global Step: 461210 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:01:52,907-Speed 2497.91 samples/sec Loss 1.9727 LearningRate 0.000243 Epoch: 22 Global Step: 461220 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:02:01,053-Speed 2514.57 samples/sec Loss 1.9313 LearningRate 0.000243 Epoch: 22 Global Step: 461230 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-07-10 00:02:09,254-Speed 2497.63 samples/sec Loss 1.9236 LearningRate 0.000243 Epoch: 22 Global Step: 461240 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:02:17,454-Speed 2498.85 samples/sec Loss 1.9138 LearningRate 0.000243 Epoch: 22 Global Step: 461250 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:02:25,652-Speed 2498.40 samples/sec Loss 1.9395 LearningRate 0.000243 Epoch: 22 Global Step: 461260 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:02:33,863-Speed 2494.50 samples/sec Loss 1.8922 LearningRate 0.000243 Epoch: 22 Global Step: 461270 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:02:42,066-Speed 2498.98 samples/sec Loss 1.9323 LearningRate 0.000243 Epoch: 22 Global Step: 461280 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:02:50,212-Speed 2514.49 samples/sec Loss 1.9402 LearningRate 0.000243 Epoch: 22 Global Step: 461290 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:02:58,415-Speed 2497.17 samples/sec Loss 1.9234 LearningRate 0.000243 Epoch: 22 Global Step: 461300 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:03:06,614-Speed 2498.67 samples/sec Loss 1.8978 LearningRate 0.000243 Epoch: 22 Global Step: 461310 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:03:14,818-Speed 2496.82 samples/sec Loss 1.9601 LearningRate 0.000243 Epoch: 22 Global Step: 461320 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:03:23,027-Speed 2495.28 samples/sec Loss 1.9095 LearningRate 0.000243 Epoch: 22 Global Step: 461330 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:03:31,226-Speed 2498.28 samples/sec Loss 1.9430 LearningRate 0.000243 Epoch: 22 Global Step: 461340 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:03:39,374-Speed 2513.94 samples/sec Loss 1.9769 LearningRate 0.000243 Epoch: 22 Global Step: 461350 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:03:47,575-Speed 2497.77 samples/sec Loss 1.9634 LearningRate 0.000243 Epoch: 22 Global Step: 461360 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:03:55,797-Speed 2491.22 samples/sec Loss 1.9218 LearningRate 0.000243 Epoch: 22 Global Step: 461370 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:04:04,005-Speed 2495.51 samples/sec Loss 1.9318 LearningRate 0.000243 Epoch: 22 Global Step: 461380 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:04:12,209-Speed 2497.13 samples/sec Loss 1.9034 LearningRate 0.000243 Epoch: 22 Global Step: 461390 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:04:20,410-Speed 2497.77 samples/sec Loss 1.9077 LearningRate 0.000243 Epoch: 22 Global Step: 461400 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:04:28,557-Speed 2514.19 samples/sec Loss 1.9588 LearningRate 0.000243 Epoch: 22 Global Step: 461410 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:04:36,755-Speed 2498.57 samples/sec Loss 1.9438 LearningRate 0.000243 Epoch: 22 Global Step: 461420 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:04:44,957-Speed 2497.45 samples/sec Loss 1.9349 LearningRate 0.000243 Epoch: 22 Global Step: 461430 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:04:53,156-Speed 2498.32 samples/sec Loss 1.9171 LearningRate 0.000243 Epoch: 22 Global Step: 461440 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:05:01,351-Speed 2499.34 samples/sec Loss 1.9292 LearningRate 0.000243 Epoch: 22 Global Step: 461450 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:05:09,552-Speed 2497.62 samples/sec Loss 1.9486 LearningRate 0.000243 Epoch: 22 Global Step: 461460 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:05:17,697-Speed 2514.74 samples/sec Loss 1.9382 LearningRate 0.000243 Epoch: 22 Global Step: 461470 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:05:25,895-Speed 2498.78 samples/sec Loss 1.8956 LearningRate 0.000243 Epoch: 22 Global Step: 461480 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:05:34,098-Speed 2497.13 samples/sec Loss 1.8991 LearningRate 0.000243 Epoch: 22 Global Step: 461490 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:05:42,299-Speed 2497.55 samples/sec Loss 1.8996 LearningRate 0.000243 Epoch: 22 Global Step: 461500 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:05:50,500-Speed 2497.60 samples/sec Loss 1.9068 LearningRate 0.000243 Epoch: 22 Global Step: 461510 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:05:58,701-Speed 2497.93 samples/sec Loss 1.9143 LearningRate 0.000243 Epoch: 22 Global Step: 461520 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:06:06,844-Speed 2515.35 samples/sec Loss 1.8708 LearningRate 0.000243 Epoch: 22 Global Step: 461530 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:06:15,045-Speed 2497.91 samples/sec Loss 1.9078 LearningRate 0.000243 Epoch: 22 Global Step: 461540 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:06:23,244-Speed 2498.20 samples/sec Loss 1.9642 LearningRate 0.000243 Epoch: 22 Global Step: 461550 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:06:31,449-Speed 2496.14 samples/sec Loss 1.8927 LearningRate 0.000243 Epoch: 22 Global Step: 461560 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:06:39,659-Speed 2494.98 samples/sec Loss 1.8960 LearningRate 0.000243 Epoch: 22 Global Step: 461570 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:06:47,856-Speed 2498.95 samples/sec Loss 1.9192 LearningRate 0.000243 Epoch: 22 Global Step: 461580 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:06:56,008-Speed 2512.65 samples/sec Loss 1.8911 LearningRate 0.000243 Epoch: 22 Global Step: 461590 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:07:04,203-Speed 2499.36 samples/sec Loss 1.9020 LearningRate 0.000243 Epoch: 22 Global Step: 461600 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:07:12,404-Speed 2497.87 samples/sec Loss 1.8870 LearningRate 0.000243 Epoch: 22 Global Step: 461610 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:07:20,604-Speed 2497.97 samples/sec Loss 1.8791 LearningRate 0.000243 Epoch: 22 Global Step: 461620 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:07:28,802-Speed 2498.53 samples/sec Loss 1.9330 LearningRate 0.000243 Epoch: 22 Global Step: 461630 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:07:37,006-Speed 2496.96 samples/sec Loss 1.9581 LearningRate 0.000243 Epoch: 22 Global Step: 461640 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:07:45,166-Speed 2510.23 samples/sec Loss 1.8847 LearningRate 0.000243 Epoch: 22 Global Step: 461650 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:07:53,371-Speed 2496.56 samples/sec Loss 1.8859 LearningRate 0.000243 Epoch: 22 Global Step: 461660 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:08:01,577-Speed 2496.38 samples/sec Loss 1.8971 LearningRate 0.000243 Epoch: 22 Global Step: 461670 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:08:09,779-Speed 2497.17 samples/sec Loss 1.8998 LearningRate 0.000243 Epoch: 22 Global Step: 461680 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:08:17,987-Speed 2495.42 samples/sec Loss 1.8912 LearningRate 0.000243 Epoch: 22 Global Step: 461690 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:08:26,192-Speed 2496.58 samples/sec Loss 1.9055 LearningRate 0.000243 Epoch: 22 Global Step: 461700 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:08:34,342-Speed 2513.12 samples/sec Loss 1.8873 LearningRate 0.000243 Epoch: 22 Global Step: 461710 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:08:42,542-Speed 2497.97 samples/sec Loss 1.9459 LearningRate 0.000243 Epoch: 22 Global Step: 461720 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:08:50,744-Speed 2497.46 samples/sec Loss 1.8910 LearningRate 0.000243 Epoch: 22 Global Step: 461730 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:08:58,945-Speed 2497.59 samples/sec Loss 1.9558 LearningRate 0.000243 Epoch: 22 Global Step: 461740 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:09:07,154-Speed 2496.79 samples/sec Loss 1.8939 LearningRate 0.000243 Epoch: 22 Global Step: 461750 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:09:15,365-Speed 2497.85 samples/sec Loss 1.8845 LearningRate 0.000243 Epoch: 22 Global Step: 461760 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:09:23,514-Speed 2513.70 samples/sec Loss 1.9086 LearningRate 0.000243 Epoch: 22 Global Step: 461770 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:09:31,714-Speed 2497.63 samples/sec Loss 1.9363 LearningRate 0.000243 Epoch: 22 Global Step: 461780 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:09:39,915-Speed 2497.56 samples/sec Loss 1.8819 LearningRate 0.000243 Epoch: 22 Global Step: 461790 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:09:48,115-Speed 2497.92 samples/sec Loss 1.9307 LearningRate 0.000243 Epoch: 22 Global Step: 461800 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:09:56,313-Speed 2498.72 samples/sec Loss 1.9099 LearningRate 0.000243 Epoch: 22 Global Step: 461810 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:10:04,514-Speed 2497.65 samples/sec Loss 1.8957 LearningRate 0.000243 Epoch: 22 Global Step: 461820 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:10:12,659-Speed 2514.75 samples/sec Loss 1.9225 LearningRate 0.000243 Epoch: 22 Global Step: 461830 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:10:20,857-Speed 2498.50 samples/sec Loss 1.9587 LearningRate 0.000243 Epoch: 22 Global Step: 461840 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:10:29,056-Speed 2498.45 samples/sec Loss 1.9175 LearningRate 0.000243 Epoch: 22 Global Step: 461850 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:10:37,254-Speed 2498.35 samples/sec Loss 1.9516 LearningRate 0.000243 Epoch: 22 Global Step: 461860 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:10:45,450-Speed 2499.11 samples/sec Loss 1.9019 LearningRate 0.000243 Epoch: 22 Global Step: 461870 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:10:53,647-Speed 2499.41 samples/sec Loss 1.9404 LearningRate 0.000243 Epoch: 22 Global Step: 461880 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:11:01,794-Speed 2514.16 samples/sec Loss 1.9109 LearningRate 0.000243 Epoch: 22 Global Step: 461890 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:11:10,000-Speed 2496.10 samples/sec Loss 1.9560 LearningRate 0.000243 Epoch: 22 Global Step: 461900 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:11:18,201-Speed 2497.67 samples/sec Loss 1.9642 LearningRate 0.000242 Epoch: 22 Global Step: 461910 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:11:26,399-Speed 2498.52 samples/sec Loss 1.9231 LearningRate 0.000242 Epoch: 22 Global Step: 461920 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:11:34,595-Speed 2499.15 samples/sec Loss 1.9256 LearningRate 0.000242 Epoch: 22 Global Step: 461930 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:11:42,793-Speed 2498.33 samples/sec Loss 1.8873 LearningRate 0.000242 Epoch: 22 Global Step: 461940 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:11:50,944-Speed 2512.97 samples/sec Loss 1.9302 LearningRate 0.000242 Epoch: 22 Global Step: 461950 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:11:59,142-Speed 2498.62 samples/sec Loss 1.9521 LearningRate 0.000242 Epoch: 22 Global Step: 461960 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:12:07,344-Speed 2497.60 samples/sec Loss 1.9516 LearningRate 0.000242 Epoch: 22 Global Step: 461970 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:12:15,544-Speed 2498.00 samples/sec Loss 1.9279 LearningRate 0.000242 Epoch: 22 Global Step: 461980 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:12:23,749-Speed 2496.84 samples/sec Loss 1.9485 LearningRate 0.000242 Epoch: 22 Global Step: 461990 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:12:31,951-Speed 2497.34 samples/sec Loss 1.9329 LearningRate 0.000242 Epoch: 22 Global Step: 462000 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:12:40,100-Speed 2513.50 samples/sec Loss 1.9314 LearningRate 0.000242 Epoch: 22 Global Step: 462010 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:12:48,302-Speed 2497.51 samples/sec Loss 1.9337 LearningRate 0.000242 Epoch: 22 Global Step: 462020 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:12:56,506-Speed 2496.91 samples/sec Loss 1.9225 LearningRate 0.000242 Epoch: 22 Global Step: 462030 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:13:04,723-Speed 2492.91 samples/sec Loss 1.9441 LearningRate 0.000242 Epoch: 22 Global Step: 462040 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:13:12,921-Speed 2498.61 samples/sec Loss 1.9264 LearningRate 0.000242 Epoch: 22 Global Step: 462050 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:13:21,121-Speed 2497.66 samples/sec Loss 1.9339 LearningRate 0.000242 Epoch: 22 Global Step: 462060 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:13:29,266-Speed 2514.80 samples/sec Loss 1.9619 LearningRate 0.000242 Epoch: 22 Global Step: 462070 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:13:37,466-Speed 2497.98 samples/sec Loss 1.9260 LearningRate 0.000242 Epoch: 22 Global Step: 462080 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:13:45,667-Speed 2497.56 samples/sec Loss 1.8760 LearningRate 0.000242 Epoch: 22 Global Step: 462090 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:13:53,871-Speed 2496.78 samples/sec Loss 1.9133 LearningRate 0.000242 Epoch: 22 Global Step: 462100 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:14:02,072-Speed 2497.66 samples/sec Loss 1.9562 LearningRate 0.000242 Epoch: 22 Global Step: 462110 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:14:10,276-Speed 2496.78 samples/sec Loss 1.9184 LearningRate 0.000242 Epoch: 22 Global Step: 462120 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:14:18,427-Speed 2512.85 samples/sec Loss 1.9431 LearningRate 0.000242 Epoch: 22 Global Step: 462130 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:14:26,625-Speed 2498.49 samples/sec Loss 1.9380 LearningRate 0.000242 Epoch: 22 Global Step: 462140 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:14:34,825-Speed 2498.23 samples/sec Loss 1.9677 LearningRate 0.000242 Epoch: 22 Global Step: 462150 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:14:43,022-Speed 2498.82 samples/sec Loss 1.9800 LearningRate 0.000242 Epoch: 22 Global Step: 462160 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:14:51,228-Speed 2496.25 samples/sec Loss 1.9945 LearningRate 0.000242 Epoch: 22 Global Step: 462170 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:14:59,429-Speed 2497.56 samples/sec Loss 1.9746 LearningRate 0.000242 Epoch: 22 Global Step: 462180 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:15:07,578-Speed 2514.08 samples/sec Loss 1.9485 LearningRate 0.000242 Epoch: 22 Global Step: 462190 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:15:15,779-Speed 2497.72 samples/sec Loss 1.9549 LearningRate 0.000242 Epoch: 22 Global Step: 462200 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:15:23,975-Speed 2499.10 samples/sec Loss 1.9733 LearningRate 0.000242 Epoch: 22 Global Step: 462210 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:15:32,179-Speed 2496.74 samples/sec Loss 1.9137 LearningRate 0.000242 Epoch: 22 Global Step: 462220 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:15:40,378-Speed 2498.19 samples/sec Loss 1.9385 LearningRate 0.000242 Epoch: 22 Global Step: 462230 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:15:48,599-Speed 2491.35 samples/sec Loss 1.9893 LearningRate 0.000242 Epoch: 22 Global Step: 462240 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:15:56,750-Speed 2513.10 samples/sec Loss 1.8857 LearningRate 0.000242 Epoch: 22 Global Step: 462250 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:16:04,963-Speed 2493.97 samples/sec Loss 1.9448 LearningRate 0.000242 Epoch: 22 Global Step: 462260 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:16:13,166-Speed 2496.80 samples/sec Loss 1.9349 LearningRate 0.000242 Epoch: 22 Global Step: 462270 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:16:21,372-Speed 2496.53 samples/sec Loss 1.9302 LearningRate 0.000242 Epoch: 22 Global Step: 462280 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:16:29,568-Speed 2498.94 samples/sec Loss 1.9275 LearningRate 0.000242 Epoch: 22 Global Step: 462290 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:16:37,768-Speed 2498.15 samples/sec Loss 1.9245 LearningRate 0.000242 Epoch: 22 Global Step: 462300 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:16:45,920-Speed 2512.68 samples/sec Loss 1.9672 LearningRate 0.000242 Epoch: 22 Global Step: 462310 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:16:54,121-Speed 2497.56 samples/sec Loss 1.9374 LearningRate 0.000242 Epoch: 22 Global Step: 462320 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:17:02,322-Speed 2497.46 samples/sec Loss 1.9368 LearningRate 0.000242 Epoch: 22 Global Step: 462330 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:17:10,538-Speed 2493.28 samples/sec Loss 1.9533 LearningRate 0.000242 Epoch: 22 Global Step: 462340 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:17:18,742-Speed 2496.73 samples/sec Loss 1.9602 LearningRate 0.000242 Epoch: 22 Global Step: 462350 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:17:26,946-Speed 2496.82 samples/sec Loss 2.0156 LearningRate 0.000242 Epoch: 22 Global Step: 462360 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:17:35,094-Speed 2513.59 samples/sec Loss 1.9070 LearningRate 0.000242 Epoch: 22 Global Step: 462370 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:17:43,313-Speed 2492.37 samples/sec Loss 1.9002 LearningRate 0.000242 Epoch: 22 Global Step: 462380 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:17:51,514-Speed 2497.79 samples/sec Loss 1.9177 LearningRate 0.000242 Epoch: 22 Global Step: 462390 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:17:59,734-Speed 2491.68 samples/sec Loss 1.9520 LearningRate 0.000242 Epoch: 22 Global Step: 462400 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:18:07,935-Speed 2497.68 samples/sec Loss 1.9111 LearningRate 0.000242 Epoch: 22 Global Step: 462410 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:18:16,134-Speed 2498.34 samples/sec Loss 1.9226 LearningRate 0.000242 Epoch: 22 Global Step: 462420 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:18:24,279-Speed 2514.76 samples/sec Loss 1.9450 LearningRate 0.000242 Epoch: 22 Global Step: 462430 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-07-10 00:18:32,481-Speed 2497.51 samples/sec Loss 1.9494 LearningRate 0.000242 Epoch: 22 Global Step: 462440 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:18:40,686-Speed 2496.42 samples/sec Loss 1.9240 LearningRate 0.000242 Epoch: 22 Global Step: 462450 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:18:48,886-Speed 2497.81 samples/sec Loss 1.8835 LearningRate 0.000242 Epoch: 22 Global Step: 462460 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:18:57,087-Speed 2497.73 samples/sec Loss 1.9084 LearningRate 0.000242 Epoch: 22 Global Step: 462470 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:19:05,282-Speed 2499.51 samples/sec Loss 1.9266 LearningRate 0.000242 Epoch: 22 Global Step: 462480 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:19:13,431-Speed 2513.74 samples/sec Loss 1.9080 LearningRate 0.000242 Epoch: 22 Global Step: 462490 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:19:21,633-Speed 2497.27 samples/sec Loss 1.9020 LearningRate 0.000242 Epoch: 22 Global Step: 462500 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:19:29,837-Speed 2496.70 samples/sec Loss 1.9183 LearningRate 0.000242 Epoch: 22 Global Step: 462510 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:19:38,040-Speed 2496.89 samples/sec Loss 1.9190 LearningRate 0.000242 Epoch: 22 Global Step: 462520 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:19:46,242-Speed 2497.90 samples/sec Loss 1.9391 LearningRate 0.000242 Epoch: 22 Global Step: 462530 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:19:54,445-Speed 2497.37 samples/sec Loss 1.8779 LearningRate 0.000242 Epoch: 22 Global Step: 462540 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:20:02,599-Speed 2511.85 samples/sec Loss 1.9338 LearningRate 0.000242 Epoch: 22 Global Step: 462550 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:20:10,803-Speed 2497.18 samples/sec Loss 1.8773 LearningRate 0.000242 Epoch: 22 Global Step: 462560 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:20:19,002-Speed 2498.34 samples/sec Loss 1.8896 LearningRate 0.000242 Epoch: 22 Global Step: 462570 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:20:27,199-Speed 2498.96 samples/sec Loss 1.8909 LearningRate 0.000242 Epoch: 22 Global Step: 462580 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:20:35,395-Speed 2499.11 samples/sec Loss 1.8826 LearningRate 0.000242 Epoch: 22 Global Step: 462590 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:20:43,597-Speed 2497.18 samples/sec Loss 1.8514 LearningRate 0.000242 Epoch: 22 Global Step: 462600 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:20:51,740-Speed 2515.31 samples/sec Loss 1.8943 LearningRate 0.000242 Epoch: 22 Global Step: 462610 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:20:59,937-Speed 2499.03 samples/sec Loss 1.9095 LearningRate 0.000242 Epoch: 22 Global Step: 462620 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:21:08,138-Speed 2497.97 samples/sec Loss 1.8848 LearningRate 0.000242 Epoch: 22 Global Step: 462630 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:21:16,336-Speed 2498.44 samples/sec Loss 1.8931 LearningRate 0.000242 Epoch: 22 Global Step: 462640 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:21:24,536-Speed 2497.89 samples/sec Loss 1.9015 LearningRate 0.000242 Epoch: 22 Global Step: 462650 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:21:32,746-Speed 2494.98 samples/sec Loss 1.9217 LearningRate 0.000242 Epoch: 22 Global Step: 462660 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:21:40,894-Speed 2513.98 samples/sec Loss 1.9123 LearningRate 0.000241 Epoch: 22 Global Step: 462670 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:21:49,095-Speed 2497.63 samples/sec Loss 1.9120 LearningRate 0.000241 Epoch: 22 Global Step: 462680 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:21:57,297-Speed 2497.46 samples/sec Loss 1.9036 LearningRate 0.000241 Epoch: 22 Global Step: 462690 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:22:05,494-Speed 2498.65 samples/sec Loss 1.9250 LearningRate 0.000241 Epoch: 22 Global Step: 462700 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:22:13,695-Speed 2497.73 samples/sec Loss 1.8960 LearningRate 0.000241 Epoch: 22 Global Step: 462710 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:22:21,897-Speed 2497.39 samples/sec Loss 1.9012 LearningRate 0.000241 Epoch: 22 Global Step: 462720 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:22:30,042-Speed 2514.98 samples/sec Loss 1.9054 LearningRate 0.000241 Epoch: 22 Global Step: 462730 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:22:38,246-Speed 2496.86 samples/sec Loss 1.8840 LearningRate 0.000241 Epoch: 22 Global Step: 462740 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:22:46,460-Speed 2493.51 samples/sec Loss 1.9139 LearningRate 0.000241 Epoch: 22 Global Step: 462750 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:22:54,665-Speed 2496.61 samples/sec Loss 1.9117 LearningRate 0.000241 Epoch: 22 Global Step: 462760 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:23:02,877-Speed 2494.32 samples/sec Loss 1.8896 LearningRate 0.000241 Epoch: 22 Global Step: 462770 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:23:11,075-Speed 2498.35 samples/sec Loss 1.9296 LearningRate 0.000241 Epoch: 22 Global Step: 462780 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:23:19,224-Speed 2513.85 samples/sec Loss 1.9269 LearningRate 0.000241 Epoch: 22 Global Step: 462790 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:23:27,424-Speed 2497.76 samples/sec Loss 1.8987 LearningRate 0.000241 Epoch: 22 Global Step: 462800 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:23:35,623-Speed 2498.42 samples/sec Loss 1.9095 LearningRate 0.000241 Epoch: 22 Global Step: 462810 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:23:43,824-Speed 2497.80 samples/sec Loss 1.9082 LearningRate 0.000241 Epoch: 22 Global Step: 462820 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:23:52,019-Speed 2499.27 samples/sec Loss 1.8916 LearningRate 0.000241 Epoch: 22 Global Step: 462830 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:24:00,221-Speed 2497.44 samples/sec Loss 1.8848 LearningRate 0.000241 Epoch: 22 Global Step: 462840 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:24:08,370-Speed 2513.60 samples/sec Loss 1.8882 LearningRate 0.000241 Epoch: 22 Global Step: 462850 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:24:16,570-Speed 2497.70 samples/sec Loss 1.9089 LearningRate 0.000241 Epoch: 22 Global Step: 462860 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:24:24,772-Speed 2497.18 samples/sec Loss 1.9353 LearningRate 0.000241 Epoch: 22 Global Step: 462870 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:24:32,971-Speed 2498.50 samples/sec Loss 1.9406 LearningRate 0.000241 Epoch: 22 Global Step: 462880 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:24:41,173-Speed 2497.27 samples/sec Loss 1.8699 LearningRate 0.000241 Epoch: 22 Global Step: 462890 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:24:49,373-Speed 2497.98 samples/sec Loss 1.9042 LearningRate 0.000241 Epoch: 22 Global Step: 462900 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:24:57,519-Speed 2514.50 samples/sec Loss 1.8980 LearningRate 0.000241 Epoch: 22 Global Step: 462910 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:25:05,724-Speed 2496.63 samples/sec Loss 1.9157 LearningRate 0.000241 Epoch: 22 Global Step: 462920 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:25:13,935-Speed 2494.71 samples/sec Loss 1.8842 LearningRate 0.000241 Epoch: 22 Global Step: 462930 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:25:22,143-Speed 2495.43 samples/sec Loss 1.9075 LearningRate 0.000241 Epoch: 22 Global Step: 462940 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:25:30,345-Speed 2497.50 samples/sec Loss 1.8972 LearningRate 0.000241 Epoch: 22 Global Step: 462950 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:25:38,546-Speed 2497.49 samples/sec Loss 1.9023 LearningRate 0.000241 Epoch: 22 Global Step: 462960 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:25:46,693-Speed 2514.43 samples/sec Loss 1.9807 LearningRate 0.000241 Epoch: 22 Global Step: 462970 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:25:54,897-Speed 2496.81 samples/sec Loss 1.9185 LearningRate 0.000241 Epoch: 22 Global Step: 462980 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:26:03,107-Speed 2494.81 samples/sec Loss 1.8910 LearningRate 0.000241 Epoch: 22 Global Step: 462990 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-07-10 00:26:11,308-Speed 2497.77 samples/sec Loss 1.9163 LearningRate 0.000241 Epoch: 22 Global Step: 463000 Fp16 Grad Scale: 65536
Required: 84 hours Training: 2022-07-10 00:26:19,509-Speed 2497.61 samples/sec Loss 1.9067 LearningRate 0.000241 Epoch: 22 Global Step: 463010 Fp16 Grad Scale: 65536 Required: 84 hours Training: 2022-07-10 00:26:27,713-Speed 2496.70 samples/sec Loss 1.9013 LearningRate 0.000241 Epoch: 22 Global Step: 463020 Fp16 Grad Scale: 65536 Required: 84 hours Training: 2022-07-10 00:26:35,862-Speed 2513.70 samples/sec Loss 1.9302 LearningRate 0.000241 Epoch: 22 Global Step: 463030 Fp16 Grad Scale: 65536 Required: 84 hours Training: 2022-07-10 00:26:44,060-Speed 2498.39 samples/sec Loss 1.9279 LearningRate 0.000241 Epoch: 22 Global Step: 463040 Fp16 Grad Scale: 65536 Required: 84 hours Training: 2022-07-10 00:26:52,216-Speed 2511.81 samples/sec Loss 1.9324 LearningRate 0.000241 Epoch: 22 Global Step: 463050 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:27:00,417-Speed 2497.57 samples/sec Loss 1.9242 LearningRate 0.000241 Epoch: 22 Global Step: 463060 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:27:08,619-Speed 2497.62 samples/sec Loss 1.9286 LearningRate 0.000241 Epoch: 22 Global Step: 463070 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:27:16,825-Speed 2496.12 samples/sec Loss 1.8989 LearningRate 0.000241 Epoch: 22 Global Step: 463080 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:27:24,977-Speed 2512.85 samples/sec Loss 1.8961 LearningRate 0.000241 Epoch: 22 Global Step: 463090 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:27:33,175-Speed 2498.37 samples/sec Loss 1.9205 LearningRate 0.000241 Epoch: 22 Global Step: 463100 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:27:41,378-Speed 2497.10 samples/sec Loss 1.9234 LearningRate 0.000241 Epoch: 22 Global Step: 463110 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:27:49,577-Speed 2498.29 samples/sec Loss 1.9035 LearningRate 0.000241 Epoch: 22 Global Step: 463120 Fp16 Grad Scale: 32768 
Required: 84 hours Training: 2022-07-10 00:27:57,776-Speed 2498.26 samples/sec Loss 1.9298 LearningRate 0.000241 Epoch: 22 Global Step: 463130 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:28:05,976-Speed 2497.84 samples/sec Loss 1.9138 LearningRate 0.000241 Epoch: 22 Global Step: 463140 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:28:14,123-Speed 2514.55 samples/sec Loss 1.9291 LearningRate 0.000241 Epoch: 22 Global Step: 463150 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:28:22,338-Speed 2493.58 samples/sec Loss 1.9318 LearningRate 0.000241 Epoch: 22 Global Step: 463160 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:28:30,541-Speed 2497.29 samples/sec Loss 1.8781 LearningRate 0.000241 Epoch: 22 Global Step: 463170 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:28:38,746-Speed 2496.49 samples/sec Loss 1.8959 LearningRate 0.000241 Epoch: 22 Global Step: 463180 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:28:46,949-Speed 2496.93 samples/sec Loss 1.9419 LearningRate 0.000241 Epoch: 22 Global Step: 463190 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:28:55,149-Speed 2497.98 samples/sec Loss 1.8994 LearningRate 0.000241 Epoch: 22 Global Step: 463200 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:29:03,294-Speed 2515.30 samples/sec Loss 1.8837 LearningRate 0.000241 Epoch: 22 Global Step: 463210 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:29:11,500-Speed 2496.17 samples/sec Loss 1.8877 LearningRate 0.000241 Epoch: 22 Global Step: 463220 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:29:19,706-Speed 2496.09 samples/sec Loss 1.9089 LearningRate 0.000241 Epoch: 22 Global Step: 463230 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:29:27,910-Speed 2496.82 samples/sec Loss 1.9225 LearningRate 0.000241 Epoch: 22 Global Step: 463240 Fp16 Grad Scale: 32768 
Required: 84 hours Training: 2022-07-10 00:29:36,107-Speed 2498.97 samples/sec Loss 1.9114 LearningRate 0.000241 Epoch: 22 Global Step: 463250 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:29:44,306-Speed 2498.28 samples/sec Loss 1.8649 LearningRate 0.000241 Epoch: 22 Global Step: 463260 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:29:52,455-Speed 2513.49 samples/sec Loss 1.9129 LearningRate 0.000241 Epoch: 22 Global Step: 463270 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:30:00,657-Speed 2497.36 samples/sec Loss 1.8988 LearningRate 0.000241 Epoch: 22 Global Step: 463280 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:30:08,862-Speed 2496.52 samples/sec Loss 1.8936 LearningRate 0.000241 Epoch: 22 Global Step: 463290 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:30:17,062-Speed 2497.65 samples/sec Loss 1.8800 LearningRate 0.000241 Epoch: 22 Global Step: 463300 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:30:25,264-Speed 2497.45 samples/sec Loss 1.8960 LearningRate 0.000241 Epoch: 22 Global Step: 463310 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:30:33,472-Speed 2495.70 samples/sec Loss 1.9004 LearningRate 0.000241 Epoch: 22 Global Step: 463320 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:30:41,617-Speed 2514.61 samples/sec Loss 1.8553 LearningRate 0.000241 Epoch: 22 Global Step: 463330 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:30:49,774-Speed 2511.19 samples/sec Loss 1.9145 LearningRate 0.000241 Epoch: 22 Global Step: 463340 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:30:57,974-Speed 2498.03 samples/sec Loss 1.9063 LearningRate 0.000241 Epoch: 22 Global Step: 463350 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:31:06,174-Speed 2497.55 samples/sec Loss 1.8968 LearningRate 0.000241 Epoch: 22 Global Step: 463360 Fp16 Grad Scale: 16384 
Required: 84 hours Training: 2022-07-10 00:31:14,373-Speed 2498.64 samples/sec Loss 1.8989 LearningRate 0.000241 Epoch: 22 Global Step: 463370 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:31:22,577-Speed 2496.78 samples/sec Loss 1.9257 LearningRate 0.000241 Epoch: 22 Global Step: 463380 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:31:30,729-Speed 2512.47 samples/sec Loss 1.9336 LearningRate 0.000241 Epoch: 22 Global Step: 463390 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:31:38,929-Speed 2498.06 samples/sec Loss 1.9313 LearningRate 0.000241 Epoch: 22 Global Step: 463400 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:31:47,153-Speed 2490.83 samples/sec Loss 1.8803 LearningRate 0.000241 Epoch: 22 Global Step: 463410 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:31:55,352-Speed 2498.06 samples/sec Loss 1.9451 LearningRate 0.000241 Epoch: 22 Global Step: 463420 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:32:03,557-Speed 2496.45 samples/sec Loss 1.9420 LearningRate 0.000240 Epoch: 22 Global Step: 463430 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:32:11,757-Speed 2497.83 samples/sec Loss 1.8923 LearningRate 0.000240 Epoch: 22 Global Step: 463440 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:32:19,907-Speed 2513.21 samples/sec Loss 1.9149 LearningRate 0.000240 Epoch: 22 Global Step: 463450 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:32:28,112-Speed 2496.36 samples/sec Loss 1.9131 LearningRate 0.000240 Epoch: 22 Global Step: 463460 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:32:36,309-Speed 2499.11 samples/sec Loss 1.8817 LearningRate 0.000240 Epoch: 22 Global Step: 463470 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:32:44,510-Speed 2497.50 samples/sec Loss 1.8878 LearningRate 0.000240 Epoch: 22 Global Step: 463480 Fp16 Grad Scale: 16384 
Required: 84 hours Training: 2022-07-10 00:32:52,711-Speed 2497.65 samples/sec Loss 1.8903 LearningRate 0.000240 Epoch: 22 Global Step: 463490 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:33:00,913-Speed 2497.58 samples/sec Loss 1.8703 LearningRate 0.000240 Epoch: 22 Global Step: 463500 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:33:09,061-Speed 2513.73 samples/sec Loss 1.8942 LearningRate 0.000240 Epoch: 22 Global Step: 463510 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:33:17,262-Speed 2497.63 samples/sec Loss 1.9396 LearningRate 0.000240 Epoch: 22 Global Step: 463520 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:33:25,462-Speed 2497.88 samples/sec Loss 1.9096 LearningRate 0.000240 Epoch: 22 Global Step: 463530 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:33:33,668-Speed 2496.29 samples/sec Loss 1.9160 LearningRate 0.000240 Epoch: 22 Global Step: 463540 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:33:41,868-Speed 2498.03 samples/sec Loss 1.8809 LearningRate 0.000240 Epoch: 22 Global Step: 463550 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:33:50,066-Speed 2498.49 samples/sec Loss 1.9067 LearningRate 0.000240 Epoch: 22 Global Step: 463560 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:33:58,216-Speed 2513.04 samples/sec Loss 1.9059 LearningRate 0.000240 Epoch: 22 Global Step: 463570 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:34:06,416-Speed 2498.26 samples/sec Loss 1.9196 LearningRate 0.000240 Epoch: 22 Global Step: 463580 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:34:14,629-Speed 2493.94 samples/sec Loss 1.8936 LearningRate 0.000240 Epoch: 22 Global Step: 463590 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:34:22,831-Speed 2497.44 samples/sec Loss 1.9213 LearningRate 0.000240 Epoch: 22 Global Step: 463600 Fp16 Grad Scale: 16384 
Required: 84 hours Training: 2022-07-10 00:34:31,029-Speed 2498.61 samples/sec Loss 1.8814 LearningRate 0.000240 Epoch: 22 Global Step: 463610 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:34:39,229-Speed 2498.17 samples/sec Loss 1.9298 LearningRate 0.000240 Epoch: 22 Global Step: 463620 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:34:47,380-Speed 2513.07 samples/sec Loss 1.9017 LearningRate 0.000240 Epoch: 22 Global Step: 463630 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:34:55,591-Speed 2494.57 samples/sec Loss 1.9140 LearningRate 0.000240 Epoch: 22 Global Step: 463640 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:35:03,801-Speed 2494.79 samples/sec Loss 1.8872 LearningRate 0.000240 Epoch: 22 Global Step: 463650 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:35:12,003-Speed 2497.57 samples/sec Loss 1.9006 LearningRate 0.000240 Epoch: 22 Global Step: 463660 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:35:20,203-Speed 2498.35 samples/sec Loss 1.9057 LearningRate 0.000240 Epoch: 22 Global Step: 463670 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:35:28,401-Speed 2498.45 samples/sec Loss 1.9418 LearningRate 0.000240 Epoch: 22 Global Step: 463680 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:35:36,558-Speed 2511.17 samples/sec Loss 1.9360 LearningRate 0.000240 Epoch: 22 Global Step: 463690 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:35:44,757-Speed 2498.28 samples/sec Loss 1.9620 LearningRate 0.000240 Epoch: 22 Global Step: 463700 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:35:52,982-Speed 2490.42 samples/sec Loss 1.9263 LearningRate 0.000240 Epoch: 22 Global Step: 463710 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:36:01,183-Speed 2497.40 samples/sec Loss 1.8866 LearningRate 0.000240 Epoch: 22 Global Step: 463720 Fp16 Grad Scale: 16384 
Required: 84 hours Training: 2022-07-10 00:36:09,386-Speed 2497.40 samples/sec Loss 1.9248 LearningRate 0.000240 Epoch: 22 Global Step: 463730 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:36:17,583-Speed 2498.96 samples/sec Loss 1.9331 LearningRate 0.000240 Epoch: 22 Global Step: 463740 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:36:25,736-Speed 2512.25 samples/sec Loss 1.9545 LearningRate 0.000240 Epoch: 22 Global Step: 463750 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:36:33,947-Speed 2494.50 samples/sec Loss 1.9427 LearningRate 0.000240 Epoch: 22 Global Step: 463760 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:36:42,144-Speed 2498.97 samples/sec Loss 1.9457 LearningRate 0.000240 Epoch: 22 Global Step: 463770 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:36:50,348-Speed 2496.77 samples/sec Loss 1.9561 LearningRate 0.000240 Epoch: 22 Global Step: 463780 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:36:58,550-Speed 2497.39 samples/sec Loss 1.9264 LearningRate 0.000240 Epoch: 22 Global Step: 463790 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:37:06,753-Speed 2497.00 samples/sec Loss 1.9158 LearningRate 0.000240 Epoch: 22 Global Step: 463800 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:37:14,908-Speed 2511.86 samples/sec Loss 1.9021 LearningRate 0.000240 Epoch: 22 Global Step: 463810 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:37:23,110-Speed 2497.25 samples/sec Loss 1.9063 LearningRate 0.000240 Epoch: 22 Global Step: 463820 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:37:31,316-Speed 2496.13 samples/sec Loss 1.9207 LearningRate 0.000240 Epoch: 22 Global Step: 463830 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:37:39,520-Speed 2496.77 samples/sec Loss 1.8999 LearningRate 0.000240 Epoch: 22 Global Step: 463840 Fp16 Grad Scale: 16384 
Required: 84 hours Training: 2022-07-10 00:37:47,726-Speed 2496.19 samples/sec Loss 1.9344 LearningRate 0.000240 Epoch: 22 Global Step: 463850 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:37:55,927-Speed 2497.57 samples/sec Loss 1.8801 LearningRate 0.000240 Epoch: 22 Global Step: 463860 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:38:04,077-Speed 2513.33 samples/sec Loss 1.8836 LearningRate 0.000240 Epoch: 22 Global Step: 463870 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:38:12,275-Speed 2498.46 samples/sec Loss 1.9215 LearningRate 0.000240 Epoch: 22 Global Step: 463880 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:38:20,476-Speed 2497.68 samples/sec Loss 1.9143 LearningRate 0.000240 Epoch: 22 Global Step: 463890 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:38:28,692-Speed 2492.95 samples/sec Loss 1.8486 LearningRate 0.000240 Epoch: 22 Global Step: 463900 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:38:36,893-Speed 2497.77 samples/sec Loss 1.9195 LearningRate 0.000240 Epoch: 22 Global Step: 463910 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:38:45,091-Speed 2498.45 samples/sec Loss 1.9004 LearningRate 0.000240 Epoch: 22 Global Step: 463920 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:38:53,244-Speed 2512.38 samples/sec Loss 1.9198 LearningRate 0.000240 Epoch: 22 Global Step: 463930 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:39:01,453-Speed 2495.20 samples/sec Loss 1.9413 LearningRate 0.000240 Epoch: 22 Global Step: 463940 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:39:09,672-Speed 2492.43 samples/sec Loss 1.8629 LearningRate 0.000240 Epoch: 22 Global Step: 463950 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:39:17,873-Speed 2497.76 samples/sec Loss 1.9107 LearningRate 0.000240 Epoch: 22 Global Step: 463960 Fp16 Grad Scale: 16384 
Required: 84 hours Training: 2022-07-10 00:39:26,082-Speed 2495.02 samples/sec Loss 1.8646 LearningRate 0.000240 Epoch: 22 Global Step: 463970 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:39:34,280-Speed 2498.77 samples/sec Loss 1.9364 LearningRate 0.000240 Epoch: 22 Global Step: 463980 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:39:42,425-Speed 2514.74 samples/sec Loss 1.8993 LearningRate 0.000240 Epoch: 22 Global Step: 463990 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:39:50,626-Speed 2497.75 samples/sec Loss 1.8776 LearningRate 0.000240 Epoch: 22 Global Step: 464000 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:39:58,829-Speed 2497.22 samples/sec Loss 1.9086 LearningRate 0.000240 Epoch: 22 Global Step: 464010 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:40:07,035-Speed 2495.93 samples/sec Loss 1.8924 LearningRate 0.000240 Epoch: 22 Global Step: 464020 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:40:15,240-Speed 2496.50 samples/sec Loss 1.9029 LearningRate 0.000240 Epoch: 22 Global Step: 464030 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:40:23,440-Speed 2498.03 samples/sec Loss 1.9340 LearningRate 0.000240 Epoch: 22 Global Step: 464040 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:40:31,586-Speed 2514.40 samples/sec Loss 1.9359 LearningRate 0.000240 Epoch: 22 Global Step: 464050 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:40:39,787-Speed 2497.73 samples/sec Loss 1.9044 LearningRate 0.000240 Epoch: 22 Global Step: 464060 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:40:47,997-Speed 2494.85 samples/sec Loss 1.9281 LearningRate 0.000240 Epoch: 22 Global Step: 464070 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:40:56,204-Speed 2495.73 samples/sec Loss 1.9329 LearningRate 0.000240 Epoch: 22 Global Step: 464080 Fp16 Grad Scale: 16384 
Required: 84 hours Training: 2022-07-10 00:41:04,419-Speed 2493.43 samples/sec Loss 1.9144 LearningRate 0.000240 Epoch: 22 Global Step: 464090 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:41:12,617-Speed 2498.44 samples/sec Loss 1.9134 LearningRate 0.000240 Epoch: 22 Global Step: 464100 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:41:20,764-Speed 2514.13 samples/sec Loss 1.8987 LearningRate 0.000240 Epoch: 22 Global Step: 464110 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:41:28,963-Speed 2498.48 samples/sec Loss 1.9111 LearningRate 0.000240 Epoch: 22 Global Step: 464120 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:41:37,170-Speed 2495.60 samples/sec Loss 1.9079 LearningRate 0.000240 Epoch: 22 Global Step: 464130 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:41:45,373-Speed 2496.96 samples/sec Loss 1.8927 LearningRate 0.000240 Epoch: 22 Global Step: 464140 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:41:53,570-Speed 2499.02 samples/sec Loss 1.8815 LearningRate 0.000240 Epoch: 22 Global Step: 464150 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:42:01,769-Speed 2498.16 samples/sec Loss 1.8681 LearningRate 0.000240 Epoch: 22 Global Step: 464160 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:42:09,916-Speed 2514.38 samples/sec Loss 1.8976 LearningRate 0.000240 Epoch: 22 Global Step: 464170 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:42:18,119-Speed 2497.25 samples/sec Loss 1.8996 LearningRate 0.000240 Epoch: 22 Global Step: 464180 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:42:26,322-Speed 2496.93 samples/sec Loss 1.9099 LearningRate 0.000239 Epoch: 22 Global Step: 464190 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:42:34,523-Speed 2497.77 samples/sec Loss 1.9032 LearningRate 0.000239 Epoch: 22 Global Step: 464200 Fp16 Grad Scale: 16384 
Required: 84 hours Training: 2022-07-10 00:42:42,739-Speed 2493.18 samples/sec Loss 1.8952 LearningRate 0.000239 Epoch: 22 Global Step: 464210 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:42:50,939-Speed 2497.69 samples/sec Loss 1.8935 LearningRate 0.000239 Epoch: 22 Global Step: 464220 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:42:59,089-Speed 2513.45 samples/sec Loss 1.8961 LearningRate 0.000239 Epoch: 22 Global Step: 464230 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:43:07,286-Speed 2498.91 samples/sec Loss 1.9344 LearningRate 0.000239 Epoch: 22 Global Step: 464240 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:43:15,485-Speed 2498.19 samples/sec Loss 1.8535 LearningRate 0.000239 Epoch: 22 Global Step: 464250 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:43:23,684-Speed 2498.21 samples/sec Loss 1.9164 LearningRate 0.000239 Epoch: 22 Global Step: 464260 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:43:31,883-Speed 2498.14 samples/sec Loss 1.9390 LearningRate 0.000239 Epoch: 22 Global Step: 464270 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:43:40,105-Speed 2491.29 samples/sec Loss 1.8862 LearningRate 0.000239 Epoch: 22 Global Step: 464280 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:43:48,249-Speed 2515.03 samples/sec Loss 1.9085 LearningRate 0.000239 Epoch: 22 Global Step: 464290 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:43:56,449-Speed 2497.95 samples/sec Loss 1.9206 LearningRate 0.000239 Epoch: 22 Global Step: 464300 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:44:04,665-Speed 2493.19 samples/sec Loss 1.9492 LearningRate 0.000239 Epoch: 22 Global Step: 464310 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:44:12,864-Speed 2498.29 samples/sec Loss 1.8981 LearningRate 0.000239 Epoch: 22 Global Step: 464320 Fp16 Grad Scale: 16384 
Required: 84 hours Training: 2022-07-10 00:44:21,059-Speed 2499.18 samples/sec Loss 1.9149 LearningRate 0.000239 Epoch: 22 Global Step: 464330 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:44:29,272-Speed 2494.19 samples/sec Loss 1.8998 LearningRate 0.000239 Epoch: 22 Global Step: 464340 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:44:37,423-Speed 2512.98 samples/sec Loss 1.8897 LearningRate 0.000239 Epoch: 22 Global Step: 464350 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:44:45,630-Speed 2496.03 samples/sec Loss 1.9114 LearningRate 0.000239 Epoch: 22 Global Step: 464360 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:44:53,840-Speed 2494.73 samples/sec Loss 1.9409 LearningRate 0.000239 Epoch: 22 Global Step: 464370 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:45:02,038-Speed 2498.76 samples/sec Loss 1.8978 LearningRate 0.000239 Epoch: 22 Global Step: 464380 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:45:10,240-Speed 2497.42 samples/sec Loss 1.9577 LearningRate 0.000239 Epoch: 22 Global Step: 464390 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:45:18,440-Speed 2497.85 samples/sec Loss 1.9867 LearningRate 0.000239 Epoch: 22 Global Step: 464400 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:45:26,585-Speed 2514.89 samples/sec Loss 1.9293 LearningRate 0.000239 Epoch: 22 Global Step: 464410 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:45:34,782-Speed 2499.44 samples/sec Loss 1.9541 LearningRate 0.000239 Epoch: 22 Global Step: 464420 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:45:42,979-Speed 2498.93 samples/sec Loss 1.9736 LearningRate 0.000239 Epoch: 22 Global Step: 464430 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:45:51,177-Speed 2498.65 samples/sec Loss 1.9049 LearningRate 0.000239 Epoch: 22 Global Step: 464440 Fp16 Grad Scale: 16384 
Required: 84 hours Training: 2022-07-10 00:45:59,375-Speed 2498.34 samples/sec Loss 1.9029 LearningRate 0.000239 Epoch: 22 Global Step: 464450 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:46:07,574-Speed 2498.41 samples/sec Loss 1.9016 LearningRate 0.000239 Epoch: 22 Global Step: 464460 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:46:15,721-Speed 2514.33 samples/sec Loss 1.9622 LearningRate 0.000239 Epoch: 22 Global Step: 464470 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:46:23,918-Speed 2498.75 samples/sec Loss 1.8753 LearningRate 0.000239 Epoch: 22 Global Step: 464480 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:46:32,116-Speed 2498.76 samples/sec Loss 1.9186 LearningRate 0.000239 Epoch: 22 Global Step: 464490 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:46:40,319-Speed 2497.23 samples/sec Loss 1.9447 LearningRate 0.000239 Epoch: 22 Global Step: 464500 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:46:48,517-Speed 2498.26 samples/sec Loss 1.9067 LearningRate 0.000239 Epoch: 22 Global Step: 464510 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:46:56,713-Speed 2499.28 samples/sec Loss 1.8719 LearningRate 0.000239 Epoch: 22 Global Step: 464520 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:47:04,864-Speed 2513.11 samples/sec Loss 1.8973 LearningRate 0.000239 Epoch: 22 Global Step: 464530 Fp16 Grad Scale: 16384 Required: 84 hours Training: 2022-07-10 00:47:13,073-Speed 2495.46 samples/sec Loss 1.8930 LearningRate 0.000239 Epoch: 22 Global Step: 464540 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:47:21,274-Speed 2497.77 samples/sec Loss 1.9396 LearningRate 0.000239 Epoch: 22 Global Step: 464550 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:47:29,475-Speed 2497.54 samples/sec Loss 1.8982 LearningRate 0.000239 Epoch: 22 Global Step: 464560 Fp16 Grad Scale: 32768 
Required: 84 hours Training: 2022-07-10 00:47:37,672-Speed 2498.95 samples/sec Loss 1.8903 LearningRate 0.000239 Epoch: 22 Global Step: 464570 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:47:45,868-Speed 2498.98 samples/sec Loss 1.8752 LearningRate 0.000239 Epoch: 22 Global Step: 464580 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:47:54,014-Speed 2514.85 samples/sec Loss 1.9199 LearningRate 0.000239 Epoch: 22 Global Step: 464590 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:48:02,211-Speed 2498.47 samples/sec Loss 1.9129 LearningRate 0.000239 Epoch: 22 Global Step: 464600 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:48:10,413-Speed 2497.55 samples/sec Loss 1.9254 LearningRate 0.000239 Epoch: 22 Global Step: 464610 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:48:18,610-Speed 2498.96 samples/sec Loss 1.9089 LearningRate 0.000239 Epoch: 22 Global Step: 464620 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:48:26,812-Speed 2497.56 samples/sec Loss 1.9378 LearningRate 0.000239 Epoch: 22 Global Step: 464630 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:48:35,009-Speed 2498.62 samples/sec Loss 1.9183 LearningRate 0.000239 Epoch: 22 Global Step: 464640 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:48:43,153-Speed 2515.30 samples/sec Loss 1.9459 LearningRate 0.000239 Epoch: 22 Global Step: 464650 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:48:51,354-Speed 2497.88 samples/sec Loss 1.9027 LearningRate 0.000239 Epoch: 22 Global Step: 464660 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:48:59,552-Speed 2498.34 samples/sec Loss 1.8960 LearningRate 0.000239 Epoch: 22 Global Step: 464670 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:49:07,753-Speed 2497.76 samples/sec Loss 1.9190 LearningRate 0.000239 Epoch: 22 Global Step: 464680 Fp16 Grad Scale: 32768 
Required: 84 hours Training: 2022-07-10 00:49:15,955-Speed 2497.40 samples/sec Loss 1.9362 LearningRate 0.000239 Epoch: 22 Global Step: 464690 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:49:24,151-Speed 2499.36 samples/sec Loss 1.9614 LearningRate 0.000239 Epoch: 22 Global Step: 464700 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:49:32,299-Speed 2513.86 samples/sec Loss 1.9051 LearningRate 0.000239 Epoch: 22 Global Step: 464710 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:49:40,496-Speed 2498.96 samples/sec Loss 1.8925 LearningRate 0.000239 Epoch: 22 Global Step: 464720 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:49:48,695-Speed 2498.19 samples/sec Loss 1.9473 LearningRate 0.000239 Epoch: 22 Global Step: 464730 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:49:56,893-Speed 2498.60 samples/sec Loss 1.9370 LearningRate 0.000239 Epoch: 22 Global Step: 464740 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:50:05,090-Speed 2498.93 samples/sec Loss 1.8730 LearningRate 0.000239 Epoch: 22 Global Step: 464750 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:50:13,291-Speed 2497.86 samples/sec Loss 1.9364 LearningRate 0.000239 Epoch: 22 Global Step: 464760 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:50:21,441-Speed 2513.34 samples/sec Loss 1.9027 LearningRate 0.000239 Epoch: 22 Global Step: 464770 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:50:29,637-Speed 2499.34 samples/sec Loss 1.8851 LearningRate 0.000239 Epoch: 22 Global Step: 464780 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:50:37,836-Speed 2498.34 samples/sec Loss 1.8615 LearningRate 0.000239 Epoch: 22 Global Step: 464790 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:50:46,046-Speed 2494.80 samples/sec Loss 1.8933 LearningRate 0.000239 Epoch: 22 Global Step: 464800 Fp16 Grad Scale: 32768 
Required: 84 hours Training: 2022-07-10 00:50:54,247-Speed 2497.63 samples/sec Loss 1.8962 LearningRate 0.000239 Epoch: 22 Global Step: 464810 Fp16 Grad Scale: 32768 Required: 84 hours Training: 2022-07-10 00:51:02,449-Speed 2497.55 samples/sec Loss 1.9182 LearningRate 0.000239 Epoch: 22 Global Step: 464820 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 00:51:10,595-Speed 2514.51 samples/sec Loss 1.9281 LearningRate 0.000239 Epoch: 22 Global Step: 464830 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 00:51:18,792-Speed 2498.76 samples/sec Loss 1.9013 LearningRate 0.000239 Epoch: 22 Global Step: 464840 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 00:51:26,994-Speed 2497.68 samples/sec Loss 1.8890 LearningRate 0.000239 Epoch: 22 Global Step: 464850 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 00:51:35,196-Speed 2497.24 samples/sec Loss 1.9281 LearningRate 0.000239 Epoch: 22 Global Step: 464860 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 00:51:43,365-Speed 2507.51 samples/sec Loss 1.9047 LearningRate 0.000239 Epoch: 22 Global Step: 464870 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:51:51,576-Speed 2494.72 samples/sec Loss 1.8932 LearningRate 0.000239 Epoch: 22 Global Step: 464880 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:51:59,722-Speed 2514.30 samples/sec Loss 1.9016 LearningRate 0.000239 Epoch: 22 Global Step: 464890 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:52:07,928-Speed 2496.14 samples/sec Loss 1.8992 LearningRate 0.000239 Epoch: 22 Global Step: 464900 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:52:16,133-Speed 2496.25 samples/sec Loss 1.9092 LearningRate 0.000239 Epoch: 22 Global Step: 464910 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:52:24,330-Speed 2498.98 samples/sec Loss 1.8720 LearningRate 0.000239 Epoch: 22 Global Step: 464920 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 00:52:32,528-Speed 2498.51 samples/sec Loss 1.8884 LearningRate 0.000239 Epoch: 22 Global Step: 464930 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:52:40,743-Speed 2493.51 samples/sec Loss 1.8623 LearningRate 0.000239 Epoch: 22 Global Step: 464940 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:52:48,889-Speed 2514.39 samples/sec Loss 1.8898 LearningRate 0.000238 Epoch: 22 Global Step: 464950 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:52:57,087-Speed 2498.55 samples/sec Loss 1.9512 LearningRate 0.000238 Epoch: 22 Global Step: 464960 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:53:05,286-Speed 2498.22 samples/sec Loss 1.8855 LearningRate 0.000238 Epoch: 22 Global Step: 464970 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:53:13,484-Speed 2498.56 samples/sec Loss 1.9478 LearningRate 0.000238 Epoch: 22 Global Step: 464980 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:53:21,684-Speed 2498.02 samples/sec Loss 1.8913 LearningRate 0.000238 Epoch: 22 Global Step: 464990 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:53:29,885-Speed 2497.70 samples/sec Loss 1.8933 LearningRate 0.000238 Epoch: 22 Global Step: 465000 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:53:38,030-Speed 2514.79 samples/sec Loss 1.8822 LearningRate 0.000238 Epoch: 22 Global Step: 465010 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:53:46,228-Speed 2499.62 samples/sec Loss 1.9349 LearningRate 0.000238 Epoch: 22 Global Step: 465020 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:53:54,429-Speed 2497.60 samples/sec Loss 1.9055 LearningRate 0.000238 Epoch: 22 Global Step: 465030 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:54:02,630-Speed 2497.66 samples/sec Loss 1.9072 LearningRate 0.000238 Epoch: 22 Global Step: 465040 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 00:54:10,828-Speed 2499.16 samples/sec Loss 1.9240 LearningRate 0.000238 Epoch: 22 Global Step: 465050 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:54:19,023-Speed 2499.49 samples/sec Loss 1.9335 LearningRate 0.000238 Epoch: 22 Global Step: 465060 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:54:27,168-Speed 2514.89 samples/sec Loss 1.9317 LearningRate 0.000238 Epoch: 22 Global Step: 465070 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:54:35,375-Speed 2495.77 samples/sec Loss 1.8953 LearningRate 0.000238 Epoch: 22 Global Step: 465080 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:54:43,578-Speed 2496.91 samples/sec Loss 1.9649 LearningRate 0.000238 Epoch: 22 Global Step: 465090 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:54:51,781-Speed 2497.10 samples/sec Loss 1.9499 LearningRate 0.000238 Epoch: 22 Global Step: 465100 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:54:59,993-Speed 2494.42 samples/sec Loss 1.8892 LearningRate 0.000238 Epoch: 22 Global Step: 465110 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:55:08,197-Speed 2496.53 samples/sec Loss 1.9056 LearningRate 0.000238 Epoch: 22 Global Step: 465120 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:55:16,352-Speed 2511.82 samples/sec Loss 1.8927 LearningRate 0.000238 Epoch: 22 Global Step: 465130 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:55:24,554-Speed 2497.24 samples/sec Loss 1.8800 LearningRate 0.000238 Epoch: 22 Global Step: 465140 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:55:32,752-Speed 2498.71 samples/sec Loss 1.9207 LearningRate 0.000238 Epoch: 22 Global Step: 465150 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:55:40,949-Speed 2498.60 samples/sec Loss 1.8956 LearningRate 0.000238 Epoch: 22 Global Step: 465160 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 00:55:49,148-Speed 2498.49 samples/sec Loss 1.8950 LearningRate 0.000238 Epoch: 22 Global Step: 465170 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:55:57,374-Speed 2489.99 samples/sec Loss 1.8972 LearningRate 0.000238 Epoch: 22 Global Step: 465180 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:56:05,534-Speed 2510.29 samples/sec Loss 1.9243 LearningRate 0.000238 Epoch: 22 Global Step: 465190 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:56:13,731-Speed 2498.73 samples/sec Loss 1.9183 LearningRate 0.000238 Epoch: 22 Global Step: 465200 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:56:21,929-Speed 2498.47 samples/sec Loss 1.9027 LearningRate 0.000238 Epoch: 22 Global Step: 465210 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:56:30,135-Speed 2496.51 samples/sec Loss 1.9277 LearningRate 0.000238 Epoch: 22 Global Step: 465220 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:56:38,335-Speed 2497.75 samples/sec Loss 1.9202 LearningRate 0.000238 Epoch: 22 Global Step: 465230 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:56:46,539-Speed 2496.89 samples/sec Loss 1.8933 LearningRate 0.000238 Epoch: 22 Global Step: 465240 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:56:54,701-Speed 2509.83 samples/sec Loss 1.9701 LearningRate 0.000238 Epoch: 22 Global Step: 465250 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:57:02,907-Speed 2495.87 samples/sec Loss 1.8959 LearningRate 0.000238 Epoch: 22 Global Step: 465260 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:57:11,109-Speed 2497.54 samples/sec Loss 1.9135 LearningRate 0.000238 Epoch: 22 Global Step: 465270 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:57:19,324-Speed 2493.36 samples/sec Loss 1.8955 LearningRate 0.000238 Epoch: 22 Global Step: 465280 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 00:57:27,525-Speed 2497.56 samples/sec Loss 1.9298 LearningRate 0.000238 Epoch: 22 Global Step: 465290 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:57:35,748-Speed 2491.20 samples/sec Loss 1.9163 LearningRate 0.000238 Epoch: 22 Global Step: 465300 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:57:43,915-Speed 2508.29 samples/sec Loss 1.9341 LearningRate 0.000238 Epoch: 22 Global Step: 465310 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:57:52,113-Speed 2498.46 samples/sec Loss 1.9306 LearningRate 0.000238 Epoch: 22 Global Step: 465320 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:58:00,311-Speed 2498.51 samples/sec Loss 1.9414 LearningRate 0.000238 Epoch: 22 Global Step: 465330 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:58:08,512-Speed 2497.62 samples/sec Loss 1.9638 LearningRate 0.000238 Epoch: 22 Global Step: 465340 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:58:16,712-Speed 2497.91 samples/sec Loss 1.9860 LearningRate 0.000238 Epoch: 22 Global Step: 465350 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:58:24,914-Speed 2497.31 samples/sec Loss 1.9094 LearningRate 0.000238 Epoch: 22 Global Step: 465360 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:58:33,059-Speed 2514.94 samples/sec Loss 1.9238 LearningRate 0.000238 Epoch: 22 Global Step: 465370 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:58:41,258-Speed 2498.36 samples/sec Loss 1.8859 LearningRate 0.000238 Epoch: 22 Global Step: 465380 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:58:49,457-Speed 2498.59 samples/sec Loss 1.9342 LearningRate 0.000238 Epoch: 22 Global Step: 465390 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:58:57,656-Speed 2498.21 samples/sec Loss 1.9387 LearningRate 0.000238 Epoch: 22 Global Step: 465400 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 00:59:05,866-Speed 2494.85 samples/sec Loss 1.9603 LearningRate 0.000238 Epoch: 22 Global Step: 465410 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:59:14,067-Speed 2497.60 samples/sec Loss 1.9289 LearningRate 0.000238 Epoch: 22 Global Step: 465420 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:59:22,218-Speed 2513.21 samples/sec Loss 1.9606 LearningRate 0.000238 Epoch: 22 Global Step: 465430 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:59:30,417-Speed 2498.36 samples/sec Loss 1.9542 LearningRate 0.000238 Epoch: 22 Global Step: 465440 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:59:38,639-Speed 2491.27 samples/sec Loss 1.9633 LearningRate 0.000238 Epoch: 22 Global Step: 465450 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:59:46,857-Speed 2492.78 samples/sec Loss 1.9501 LearningRate 0.000238 Epoch: 22 Global Step: 465460 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 00:59:55,056-Speed 2498.26 samples/sec Loss 1.9251 LearningRate 0.000238 Epoch: 22 Global Step: 465470 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:00:03,255-Speed 2498.27 samples/sec Loss 1.9171 LearningRate 0.000238 Epoch: 22 Global Step: 465480 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:00:11,411-Speed 2511.31 samples/sec Loss 1.9104 LearningRate 0.000238 Epoch: 22 Global Step: 465490 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:00:19,609-Speed 2498.77 samples/sec Loss 1.8875 LearningRate 0.000238 Epoch: 22 Global Step: 465500 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:00:27,806-Speed 2498.70 samples/sec Loss 1.8923 LearningRate 0.000238 Epoch: 22 Global Step: 465510 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:00:36,011-Speed 2496.48 samples/sec Loss 1.8864 LearningRate 0.000238 Epoch: 22 Global Step: 465520 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:00:44,211-Speed 2497.90 samples/sec Loss 1.9127 LearningRate 0.000238 Epoch: 22 Global Step: 465530 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:00:52,412-Speed 2497.60 samples/sec Loss 1.9432 LearningRate 0.000238 Epoch: 22 Global Step: 465540 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:01:00,557-Speed 2514.77 samples/sec Loss 1.9030 LearningRate 0.000238 Epoch: 22 Global Step: 465550 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:01:08,757-Speed 2497.94 samples/sec Loss 1.9429 LearningRate 0.000238 Epoch: 22 Global Step: 465560 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:01:16,971-Speed 2493.79 samples/sec Loss 1.9105 LearningRate 0.000238 Epoch: 22 Global Step: 465570 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:01:25,173-Speed 2497.31 samples/sec Loss 1.9526 LearningRate 0.000238 Epoch: 22 Global Step: 465580 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:01:33,372-Speed 2498.42 samples/sec Loss 1.8915 LearningRate 0.000238 Epoch: 22 Global Step: 465590 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:01:41,571-Speed 2498.30 samples/sec Loss 1.8718 LearningRate 0.000238 Epoch: 22 Global Step: 465600 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:01:49,719-Speed 2514.25 samples/sec Loss 1.9113 LearningRate 0.000238 Epoch: 22 Global Step: 465610 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:01:57,919-Speed 2497.85 samples/sec Loss 1.8833 LearningRate 0.000238 Epoch: 22 Global Step: 465620 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:02:06,123-Speed 2496.72 samples/sec Loss 1.8934 LearningRate 0.000238 Epoch: 22 Global Step: 465630 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:02:14,328-Speed 2496.61 samples/sec Loss 1.8626 LearningRate 0.000238 Epoch: 22 Global Step: 465640 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:02:22,527-Speed 2498.34 samples/sec Loss 1.9372 LearningRate 0.000238 Epoch: 22 Global Step: 465650 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:02:30,735-Speed 2495.54 samples/sec Loss 1.8940 LearningRate 0.000238 Epoch: 22 Global Step: 465660 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:02:38,885-Speed 2513.34 samples/sec Loss 1.9031 LearningRate 0.000238 Epoch: 22 Global Step: 465670 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:02:47,086-Speed 2497.70 samples/sec Loss 1.9326 LearningRate 0.000238 Epoch: 22 Global Step: 465680 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:02:55,291-Speed 2496.33 samples/sec Loss 1.8875 LearningRate 0.000238 Epoch: 22 Global Step: 465690 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:03:03,490-Speed 2498.30 samples/sec Loss 1.8610 LearningRate 0.000238 Epoch: 22 Global Step: 465700 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:03:11,700-Speed 2495.34 samples/sec Loss 1.8583 LearningRate 0.000238 Epoch: 22 Global Step: 465710 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:03:19,899-Speed 2498.02 samples/sec Loss 1.9022 LearningRate 0.000237 Epoch: 22 Global Step: 465720 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:03:28,043-Speed 2515.23 samples/sec Loss 1.8827 LearningRate 0.000237 Epoch: 22 Global Step: 465730 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:03:36,243-Speed 2497.80 samples/sec Loss 1.9014 LearningRate 0.000237 Epoch: 22 Global Step: 465740 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:03:44,439-Speed 2499.14 samples/sec Loss 1.8941 LearningRate 0.000237 Epoch: 22 Global Step: 465750 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:03:52,636-Speed 2498.81 samples/sec Loss 1.9189 LearningRate 0.000237 Epoch: 22 Global Step: 465760 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:04:00,840-Speed 2496.88 samples/sec Loss 1.8801 LearningRate 0.000237 Epoch: 22 Global Step: 465770 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:04:09,051-Speed 2494.54 samples/sec Loss 1.8824 LearningRate 0.000237 Epoch: 22 Global Step: 465780 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:04:17,199-Speed 2514.08 samples/sec Loss 1.9211 LearningRate 0.000237 Epoch: 22 Global Step: 465790 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:04:25,398-Speed 2498.18 samples/sec Loss 1.9032 LearningRate 0.000237 Epoch: 22 Global Step: 465800 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:04:33,599-Speed 2497.86 samples/sec Loss 1.9191 LearningRate 0.000237 Epoch: 22 Global Step: 465810 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:04:41,799-Speed 2498.01 samples/sec Loss 1.9408 LearningRate 0.000237 Epoch: 22 Global Step: 465820 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:04:49,998-Speed 2498.33 samples/sec Loss 1.8904 LearningRate 0.000237 Epoch: 22 Global Step: 465830 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:04:58,198-Speed 2498.29 samples/sec Loss 1.9671 LearningRate 0.000237 Epoch: 22 Global Step: 465840 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:05:06,342-Speed 2515.29 samples/sec Loss 1.8973 LearningRate 0.000237 Epoch: 22 Global Step: 465850 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:05:14,537-Speed 2499.25 samples/sec Loss 1.9390 LearningRate 0.000237 Epoch: 22 Global Step: 465860 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:05:22,739-Speed 2497.35 samples/sec Loss 1.9113 LearningRate 0.000237 Epoch: 22 Global Step: 465870 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:05:30,939-Speed 2497.94 samples/sec Loss 1.9498 LearningRate 0.000237 Epoch: 22 Global Step: 465880 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:05:39,142-Speed 2497.03 samples/sec Loss 1.8658 LearningRate 0.000237 Epoch: 22 Global Step: 465890 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:05:47,362-Speed 2491.69 samples/sec Loss 1.9026 LearningRate 0.000237 Epoch: 22 Global Step: 465900 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:05:55,506-Speed 2515.41 samples/sec Loss 1.8514 LearningRate 0.000237 Epoch: 22 Global Step: 465910 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:06:03,702-Speed 2499.16 samples/sec Loss 1.8770 LearningRate 0.000237 Epoch: 22 Global Step: 465920 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:06:11,908-Speed 2496.18 samples/sec Loss 1.8745 LearningRate 0.000237 Epoch: 22 Global Step: 465930 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:06:20,110-Speed 2497.42 samples/sec Loss 1.8941 LearningRate 0.000237 Epoch: 22 Global Step: 465940 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:06:28,320-Speed 2495.02 samples/sec Loss 1.8765 LearningRate 0.000237 Epoch: 22 Global Step: 465950 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:06:36,537-Speed 2492.85 samples/sec Loss 1.8505 LearningRate 0.000237 Epoch: 22 Global Step: 465960 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:06:44,686-Speed 2513.30 samples/sec Loss 1.8598 LearningRate 0.000237 Epoch: 22 Global Step: 465970 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:06:52,883-Speed 2499.15 samples/sec Loss 1.8966 LearningRate 0.000237 Epoch: 22 Global Step: 465980 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:07:01,080-Speed 2498.79 samples/sec Loss 1.8736 LearningRate 0.000237 Epoch: 22 Global Step: 465990 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:07:09,280-Speed 2497.76 samples/sec Loss 1.8745 LearningRate 0.000237 Epoch: 22 Global Step: 466000 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:07:17,489-Speed 2495.29 samples/sec Loss 1.8708 LearningRate 0.000237 Epoch: 22 Global Step: 466010 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:07:25,684-Speed 2499.68 samples/sec Loss 1.8500 LearningRate 0.000237 Epoch: 22 Global Step: 466020 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:07:33,839-Speed 2511.73 samples/sec Loss 1.8889 LearningRate 0.000237 Epoch: 22 Global Step: 466030 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:07:42,045-Speed 2496.13 samples/sec Loss 1.8521 LearningRate 0.000237 Epoch: 22 Global Step: 466040 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:07:50,247-Speed 2497.40 samples/sec Loss 1.8934 LearningRate 0.000237 Epoch: 22 Global Step: 466050 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:07:58,530-Speed 2473.10 samples/sec Loss 1.8725 LearningRate 0.000237 Epoch: 22 Global Step: 466060 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:08:06,733-Speed 2497.39 samples/sec Loss 1.9190 LearningRate 0.000237 Epoch: 22 Global Step: 466070 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:08:14,936-Speed 2496.93 samples/sec Loss 1.8635 LearningRate 0.000237 Epoch: 22 Global Step: 466080 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:08:23,080-Speed 2515.36 samples/sec Loss 1.8356 LearningRate 0.000237 Epoch: 22 Global Step: 466090 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:08:31,277-Speed 2498.96 samples/sec Loss 1.8745 LearningRate 0.000237 Epoch: 22 Global Step: 466100 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:08:39,477-Speed 2497.91 samples/sec Loss 1.9093 LearningRate 0.000237 Epoch: 22 Global Step: 466110 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:08:47,693-Speed 2493.10 samples/sec Loss 1.9040 LearningRate 0.000237 Epoch: 22 Global Step: 466120 Fp16 Grad Scale: 32768 
Required: 83 hours Training: 2022-07-10 01:08:55,893-Speed 2498.08 samples/sec Loss 1.8976 LearningRate 0.000237 Epoch: 22 Global Step: 466130 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:09:04,096-Speed 2497.06 samples/sec Loss 1.8420 LearningRate 0.000237 Epoch: 22 Global Step: 466140 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:09:12,242-Speed 2514.71 samples/sec Loss 1.9124 LearningRate 0.000237 Epoch: 22 Global Step: 466150 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:09:20,456-Speed 2493.52 samples/sec Loss 1.8983 LearningRate 0.000237 Epoch: 22 Global Step: 466160 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:09:28,654-Speed 2498.80 samples/sec Loss 1.8825 LearningRate 0.000237 Epoch: 22 Global Step: 466170 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:09:36,851-Speed 2498.95 samples/sec Loss 1.9047 LearningRate 0.000237 Epoch: 22 Global Step: 466180 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:09:45,050-Speed 2498.26 samples/sec Loss 1.8931 LearningRate 0.000237 Epoch: 22 Global Step: 466190 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:09:53,260-Speed 2494.70 samples/sec Loss 1.9379 LearningRate 0.000237 Epoch: 22 Global Step: 466200 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:10:01,400-Speed 2516.46 samples/sec Loss 2.0090 LearningRate 0.000237 Epoch: 22 Global Step: 466210 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:10:09,605-Speed 2496.23 samples/sec Loss 1.9603 LearningRate 0.000237 Epoch: 22 Global Step: 466220 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:10:17,812-Speed 2495.76 samples/sec Loss 1.8892 LearningRate 0.000237 Epoch: 22 Global Step: 466230 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:10:26,013-Speed 2498.02 samples/sec Loss 1.8932 LearningRate 0.000237 Epoch: 22 Global Step: 466240 Fp16 Grad Scale: 32768 
Required: 83 hours Training: 2022-07-10 01:10:34,225-Speed 2494.28 samples/sec Loss 1.9447 LearningRate 0.000237 Epoch: 22 Global Step: 466250 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:10:42,428-Speed 2497.16 samples/sec Loss 1.8954 LearningRate 0.000237 Epoch: 22 Global Step: 466260 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:10:50,577-Speed 2513.58 samples/sec Loss 1.9092 LearningRate 0.000237 Epoch: 22 Global Step: 466270 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:10:58,777-Speed 2498.21 samples/sec Loss 1.9105 LearningRate 0.000237 Epoch: 22 Global Step: 466280 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:11:06,978-Speed 2497.33 samples/sec Loss 1.8711 LearningRate 0.000237 Epoch: 22 Global Step: 466290 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:11:15,186-Speed 2495.83 samples/sec Loss 1.9333 LearningRate 0.000237 Epoch: 22 Global Step: 466300 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:11:23,386-Speed 2497.98 samples/sec Loss 1.9000 LearningRate 0.000237 Epoch: 22 Global Step: 466310 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:11:31,598-Speed 2494.25 samples/sec Loss 1.9614 LearningRate 0.000237 Epoch: 22 Global Step: 466320 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:11:39,743-Speed 2514.80 samples/sec Loss 1.9389 LearningRate 0.000237 Epoch: 22 Global Step: 466330 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:11:47,945-Speed 2497.32 samples/sec Loss 1.9525 LearningRate 0.000237 Epoch: 22 Global Step: 466340 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:11:56,152-Speed 2495.89 samples/sec Loss 1.8926 LearningRate 0.000237 Epoch: 22 Global Step: 466350 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:12:04,313-Speed 2509.90 samples/sec Loss 1.9333 LearningRate 0.000237 Epoch: 22 Global Step: 466360 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:12:12,535-Speed 2491.28 samples/sec Loss 1.8211 LearningRate 0.000237 Epoch: 22 Global Step: 466370 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:12:20,741-Speed 2496.13 samples/sec Loss 1.8896 LearningRate 0.000237 Epoch: 22 Global Step: 466380 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:12:28,886-Speed 2514.77 samples/sec Loss 1.9275 LearningRate 0.000237 Epoch: 22 Global Step: 466390 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:12:37,091-Speed 2496.51 samples/sec Loss 1.8669 LearningRate 0.000237 Epoch: 22 Global Step: 466400 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:12:45,295-Speed 2496.76 samples/sec Loss 1.9248 LearningRate 0.000237 Epoch: 22 Global Step: 466410 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:12:53,511-Speed 2493.16 samples/sec Loss 1.9426 LearningRate 0.000237 Epoch: 22 Global Step: 466420 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:13:01,716-Speed 2496.09 samples/sec Loss 1.8708 LearningRate 0.000237 Epoch: 22 Global Step: 466430 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:13:09,936-Speed 2492.08 samples/sec Loss 1.8906 LearningRate 0.000237 Epoch: 22 Global Step: 466440 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:13:18,081-Speed 2515.03 samples/sec Loss 1.8905 LearningRate 0.000237 Epoch: 22 Global Step: 466450 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:13:26,287-Speed 2495.97 samples/sec Loss 1.9154 LearningRate 0.000237 Epoch: 22 Global Step: 466460 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:13:34,488-Speed 2497.80 samples/sec Loss 1.8998 LearningRate 0.000237 Epoch: 22 Global Step: 466470 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:13:42,692-Speed 2496.94 samples/sec Loss 1.9077 LearningRate 0.000236 Epoch: 22 Global Step: 466480 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:13:50,896-Speed 2496.66 samples/sec Loss 1.8823 LearningRate 0.000236 Epoch: 22 Global Step: 466490 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:13:59,101-Speed 2496.58 samples/sec Loss 1.8947 LearningRate 0.000236 Epoch: 22 Global Step: 466500 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:14:07,250-Speed 2513.59 samples/sec Loss 1.9075 LearningRate 0.000236 Epoch: 22 Global Step: 466510 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:14:15,452-Speed 2497.15 samples/sec Loss 1.9207 LearningRate 0.000236 Epoch: 22 Global Step: 466520 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:14:23,659-Speed 2495.90 samples/sec Loss 1.8959 LearningRate 0.000236 Epoch: 22 Global Step: 466530 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:14:31,862-Speed 2496.99 samples/sec Loss 1.8934 LearningRate 0.000236 Epoch: 22 Global Step: 466540 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:14:40,069-Speed 2495.67 samples/sec Loss 1.8691 LearningRate 0.000236 Epoch: 22 Global Step: 466550 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:14:48,273-Speed 2496.77 samples/sec Loss 1.9257 LearningRate 0.000236 Epoch: 22 Global Step: 466560 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:14:56,420-Speed 2514.41 samples/sec Loss 1.8244 LearningRate 0.000236 Epoch: 22 Global Step: 466570 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:15:04,622-Speed 2497.24 samples/sec Loss 1.9193 LearningRate 0.000236 Epoch: 22 Global Step: 466580 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:15:12,827-Speed 2496.55 samples/sec Loss 1.9098 LearningRate 0.000236 Epoch: 22 Global Step: 466590 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:15:21,047-Speed 2495.97 samples/sec Loss 1.8856 LearningRate 0.000236 Epoch: 22 Global Step: 466600 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:15:29,316-Speed 2495.50 samples/sec Loss 1.8699 LearningRate 0.000236 Epoch: 22 Global Step: 466610 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:15:37,517-Speed 2497.64 samples/sec Loss 1.9120 LearningRate 0.000236 Epoch: 22 Global Step: 466620 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:15:45,721-Speed 2511.54 samples/sec Loss 1.8775 LearningRate 0.000236 Epoch: 22 Global Step: 466630 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:15:56,676-Speed 1869.64 samples/sec Loss 1.8998 LearningRate 0.000236 Epoch: 22 Global Step: 466640 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:16:04,898-Speed 2500.98 samples/sec Loss 1.8961 LearningRate 0.000236 Epoch: 22 Global Step: 466650 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:16:13,124-Speed 2500.66 samples/sec Loss 1.8634 LearningRate 0.000236 Epoch: 22 Global Step: 466660 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:16:21,337-Speed 2493.96 samples/sec Loss 1.8524 LearningRate 0.000236 Epoch: 22 Global Step: 466670 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:16:33,619-Speed 2497.42 samples/sec Loss 1.9237 LearningRate 0.000236 Epoch: 22 Global Step: 466680 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:16:41,817-Speed 2513.57 samples/sec Loss 1.8443 LearningRate 0.000236 Epoch: 22 Global Step: 466690 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:16:52,744-Speed 1874.36 samples/sec Loss 1.8663 LearningRate 0.000236 Epoch: 22 Global Step: 466700 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:17:00,970-Speed 2498.41 samples/sec Loss 1.8581 LearningRate 0.000236 Epoch: 22 Global Step: 466710 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:17:09,214-Speed 2496.21 samples/sec Loss 1.8939 LearningRate 0.000236 Epoch: 22 Global Step: 466720 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:17:17,472-Speed 2495.31 samples/sec Loss 1.8604 LearningRate 0.000236 Epoch: 22 Global Step: 466730 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:17:28,178-Speed 1913.17 samples/sec Loss 1.8754 LearningRate 0.000236 Epoch: 22 Global Step: 466740 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:17:36,377-Speed 2509.77 samples/sec Loss 1.8542 LearningRate 0.000236 Epoch: 22 Global Step: 466750 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:17:44,636-Speed 2493.28 samples/sec Loss 1.8862 LearningRate 0.000236 Epoch: 22 Global Step: 466760 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:17:52,850-Speed 2493.73 samples/sec Loss 1.8644 LearningRate 0.000236 Epoch: 22 Global Step: 466770 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:18:01,108-Speed 2494.12 samples/sec Loss 1.8721 LearningRate 0.000236 Epoch: 22 Global Step: 466780 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:18:13,147-Speed 2474.38 samples/sec Loss 1.8690 LearningRate 0.000236 Epoch: 22 Global Step: 466790 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:18:26,478-Speed 1543.95 samples/sec Loss 1.8543 LearningRate 0.000236 Epoch: 22 Global Step: 466800 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:18:35,203-Speed 2354.34 samples/sec Loss 1.8783 LearningRate 0.000236 Epoch: 22 Global Step: 466810 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:18:43,438-Speed 2494.50 samples/sec Loss 1.8701 LearningRate 0.000236 Epoch: 22 Global Step: 466820 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:18:52,035-Speed 2382.73 samples/sec Loss 1.8666 LearningRate 0.000236 Epoch: 22 Global Step: 466830 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:19:00,847-Speed 2324.47 samples/sec Loss 1.9232 LearningRate 0.000236 Epoch: 22 Global Step: 466840 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:19:09,049-Speed 2497.44 samples/sec Loss 1.8860 LearningRate 0.000236 Epoch: 22 Global Step: 466850 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:19:17,283-Speed 2487.75 samples/sec Loss 1.8693 LearningRate 0.000236 Epoch: 22 Global Step: 466860 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:19:25,434-Speed 2513.01 samples/sec Loss 1.8910 LearningRate 0.000236 Epoch: 22 Global Step: 466870 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:19:33,646-Speed 2494.31 samples/sec Loss 1.8939 LearningRate 0.000236 Epoch: 22 Global Step: 466880 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:19:41,860-Speed 2493.52 samples/sec Loss 1.8979 LearningRate 0.000236 Epoch: 22 Global Step: 466890 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:19:50,076-Speed 2493.08 samples/sec Loss 1.9122 LearningRate 0.000236 Epoch: 22 Global Step: 466900 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:19:58,285-Speed 2495.62 samples/sec Loss 1.8574 LearningRate 0.000236 Epoch: 22 Global Step: 466910 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:20:06,494-Speed 2494.93 samples/sec Loss 1.8516 LearningRate 0.000236 Epoch: 22 Global Step: 466920 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:20:14,657-Speed 2509.78 samples/sec Loss 1.9034 LearningRate 0.000236 Epoch: 22 Global Step: 466930 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:20:22,864-Speed 2495.88 samples/sec Loss 1.8873 LearningRate 0.000236 Epoch: 22 Global Step: 466940 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:20:31,070-Speed 2495.92 samples/sec Loss 1.8806 LearningRate 0.000236 Epoch: 22 Global Step: 466950 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:20:39,288-Speed 2492.83 samples/sec Loss 1.8723 LearningRate 0.000236 Epoch: 22 Global Step: 466960 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:20:47,489-Speed 2497.39 samples/sec Loss 1.9101 LearningRate 0.000236 Epoch: 22 Global Step: 466970 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:20:55,699-Speed 2495.03 samples/sec Loss 1.9124 LearningRate 0.000236 Epoch: 22 Global Step: 466980 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:21:03,850-Speed 2513.04 samples/sec Loss 1.8909 LearningRate 0.000236 Epoch: 22 Global Step: 466990 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:21:12,050-Speed 2497.95 samples/sec Loss 1.9010 LearningRate 0.000236 Epoch: 22 Global Step: 467000 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:21:20,284-Speed 2487.48 samples/sec Loss 1.8679 LearningRate 0.000236 Epoch: 22 Global Step: 467010 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:21:28,499-Speed 2493.63 samples/sec Loss 1.9110 LearningRate 0.000236 Epoch: 22 Global Step: 467020 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:21:36,710-Speed 2494.42 samples/sec Loss 1.8849 LearningRate 0.000236 Epoch: 22 Global Step: 467030 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:21:44,922-Speed 2494.17 samples/sec Loss 1.8833 LearningRate 0.000236 Epoch: 22 Global Step: 467040 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:21:53,075-Speed 2512.56 samples/sec Loss 1.8882 LearningRate 0.000236 Epoch: 22 Global Step: 467050 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:22:01,281-Speed 2496.03 samples/sec Loss 1.9169 LearningRate 0.000236 Epoch: 22 Global Step: 467060 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:22:09,488-Speed 2495.97 samples/sec Loss 1.9147 LearningRate 0.000236 Epoch: 22 Global Step: 467070 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:22:17,705-Speed 2492.93 samples/sec Loss 1.8982 LearningRate 0.000236 Epoch: 22 Global Step: 467080 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:22:25,935-Speed 2488.58 samples/sec Loss 1.9181 LearningRate 0.000236 Epoch: 22 Global Step: 467090 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:22:34,143-Speed 2495.72 samples/sec Loss 1.9228 LearningRate 0.000236 Epoch: 22 Global Step: 467100 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:22:42,299-Speed 2511.26 samples/sec Loss 1.8872 LearningRate 0.000236 Epoch: 22 Global Step: 467110 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:22:50,521-Speed 2491.43 samples/sec Loss 1.8937 LearningRate 0.000236 Epoch: 22 Global Step: 467120 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:22:58,724-Speed 2496.89 samples/sec Loss 1.8692 LearningRate 0.000236 Epoch: 22 Global Step: 467130 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:23:06,935-Speed 2494.72 samples/sec Loss 1.8935 LearningRate 0.000236 Epoch: 22 Global Step: 467140 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:23:15,145-Speed 2494.61 samples/sec Loss 1.8791 LearningRate 0.000236 Epoch: 22 Global Step: 467150 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:23:23,371-Speed 2490.10 samples/sec Loss 1.8851 LearningRate 0.000236 Epoch: 22 Global Step: 467160 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:23:31,536-Speed 2508.79 samples/sec Loss 1.9012 LearningRate 0.000236 Epoch: 22 Global Step: 467170 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:23:39,741-Speed 2496.29 samples/sec Loss 1.9251 LearningRate 0.000236 Epoch: 22 Global Step: 467180 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:23:47,947-Speed 2496.32 samples/sec Loss 1.9077 LearningRate 0.000236 Epoch: 22 Global Step: 467190 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:23:56,158-Speed 2494.47 samples/sec Loss 1.9076 LearningRate 0.000236 Epoch: 22 Global Step: 467200 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:24:04,371-Speed 2494.13 samples/sec Loss 1.8982 LearningRate 0.000236 Epoch: 22 Global Step: 467210 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:24:12,575-Speed 2496.43 samples/sec Loss 1.9102 LearningRate 0.000236 Epoch: 22 Global Step: 467220 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:24:20,730-Speed 2511.96 samples/sec Loss 1.9072 LearningRate 0.000236 Epoch: 22 Global Step: 467230 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:24:28,935-Speed 2496.33 samples/sec Loss 1.8647 LearningRate 0.000236 Epoch: 22 Global Step: 467240 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:24:37,141-Speed 2495.89 samples/sec Loss 1.9228 LearningRate 0.000235 Epoch: 22 Global Step: 467250 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:24:45,346-Speed 2496.57 samples/sec Loss 1.8755 LearningRate 0.000235 Epoch: 22 Global Step: 467260 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:24:53,562-Speed 2493.06 samples/sec Loss 1.8862 LearningRate 0.000235 Epoch: 22 Global Step: 467270 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:25:01,778-Speed 2493.25 samples/sec Loss 1.8960 LearningRate 0.000235 Epoch: 22 Global Step: 467280 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:25:09,936-Speed 2510.81 samples/sec Loss 1.8805 LearningRate 0.000235 Epoch: 22 Global Step: 467290 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:25:18,154-Speed 2492.54 samples/sec Loss 1.9222 LearningRate 0.000235 Epoch: 22 Global Step: 467300 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:25:26,361-Speed 2495.82 samples/sec Loss 1.8432 LearningRate 0.000235 Epoch: 22 Global Step: 467310 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:25:34,575-Speed 2493.83 samples/sec Loss 1.9057 LearningRate 0.000235 Epoch: 22 Global Step: 467320 Fp16 Grad Scale: 16384 
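The `LearningRate` column is consistent with two phases: a linear warmup, then polynomial decay with power 2. Both use the config header's `lr 0.001`, `warmup_step 82956` and `total_step 829560`. Under that assumption, the six-decimal value should change from 0.000236 to 0.000235 between steps 467240 and 467250, which is exactly where it changes above. The same formula predicts the later changes to 0.000234 at step 468020 and to 0.000233 at step 468790. It also matches the warmup values at the very start of the run (0.000000 at step 40, 0.000001 at step 50). The power of 2 is inferred from these values and is not stated in the config. `lr_at` is an illustrative name.

```python
BASE_LR = 0.001    # "lr 0.001"
WARMUP = 82956     # "warmup_step 82956"
TOTAL = 829560     # "total_step 829560"
POWER = 2          # inferred from the logged values, not read from the config


def lr_at(step, base_lr=BASE_LR, warmup=WARMUP, total=TOTAL, power=POWER):
    """Linear warmup to base_lr, then polynomial decay reaching 0 at `total`."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / (total - warmup)
    return base_lr * (1.0 - progress) ** power


# Print with the log's six-decimal formatting.
for s in (467240, 467250, 468010, 468020, 468780, 468790):
    print(s, f"{lr_at(s):.6f}")
# 467240 0.000236
# 467250 0.000235
# 468010 0.000235
# 468020 0.000234
# 468780 0.000234
# 468790 0.000233
```

The logged value only pins the schedule down to about ±5 steps near each change, so this check does not show whether the value is read before or after the scheduler's step.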
Required: 83 hours Training: 2022-07-10 01:25:42,781-Speed 2496.04 samples/sec Loss 1.8982 LearningRate 0.000235 Epoch: 22 Global Step: 467330 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:25:50,986-Speed 2496.41 samples/sec Loss 1.8936 LearningRate 0.000235 Epoch: 22 Global Step: 467340 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:25:59,152-Speed 2508.40 samples/sec Loss 1.8634 LearningRate 0.000235 Epoch: 22 Global Step: 467350 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:26:07,373-Speed 2491.71 samples/sec Loss 1.9122 LearningRate 0.000235 Epoch: 22 Global Step: 467360 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:26:15,575-Speed 2497.17 samples/sec Loss 1.8894 LearningRate 0.000235 Epoch: 22 Global Step: 467370 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:26:23,777-Speed 2497.39 samples/sec Loss 1.8920 LearningRate 0.000235 Epoch: 22 Global Step: 467380 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:26:31,983-Speed 2496.29 samples/sec Loss 1.8860 LearningRate 0.000235 Epoch: 22 Global Step: 467390 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:26:40,197-Speed 2493.66 samples/sec Loss 1.8605 LearningRate 0.000235 Epoch: 22 Global Step: 467400 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:26:48,351-Speed 2512.11 samples/sec Loss 1.8843 LearningRate 0.000235 Epoch: 22 Global Step: 467410 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:26:56,554-Speed 2496.75 samples/sec Loss 1.8653 LearningRate 0.000235 Epoch: 22 Global Step: 467420 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:27:04,763-Speed 2495.43 samples/sec Loss 1.8715 LearningRate 0.000235 Epoch: 22 Global Step: 467430 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:27:12,969-Speed 2496.24 samples/sec Loss 1.9159 LearningRate 0.000235 Epoch: 22 Global Step: 467440 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:27:21,188-Speed 2492.25 samples/sec Loss 1.8616 LearningRate 0.000235 Epoch: 22 Global Step: 467450 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:27:29,404-Speed 2492.93 samples/sec Loss 1.9017 LearningRate 0.000235 Epoch: 22 Global Step: 467460 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:27:37,555-Speed 2512.95 samples/sec Loss 1.8426 LearningRate 0.000235 Epoch: 22 Global Step: 467470 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:27:45,757-Speed 2497.29 samples/sec Loss 1.8972 LearningRate 0.000235 Epoch: 22 Global Step: 467480 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:27:53,968-Speed 2494.53 samples/sec Loss 1.8991 LearningRate 0.000235 Epoch: 22 Global Step: 467490 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:28:02,173-Speed 2496.56 samples/sec Loss 1.9183 LearningRate 0.000235 Epoch: 22 Global Step: 467500 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:28:10,377-Speed 2496.97 samples/sec Loss 1.8993 LearningRate 0.000235 Epoch: 22 Global Step: 467510 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:28:18,582-Speed 2496.31 samples/sec Loss 1.8810 LearningRate 0.000235 Epoch: 22 Global Step: 467520 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:28:26,737-Speed 2511.86 samples/sec Loss 1.8867 LearningRate 0.000235 Epoch: 22 Global Step: 467530 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:28:34,955-Speed 2492.48 samples/sec Loss 1.9082 LearningRate 0.000235 Epoch: 22 Global Step: 467540 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:28:43,175-Speed 2491.74 samples/sec Loss 1.8584 LearningRate 0.000235 Epoch: 22 Global Step: 467550 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:28:51,381-Speed 2495.99 samples/sec Loss 1.9003 LearningRate 0.000235 Epoch: 22 Global Step: 467560 Fp16 Grad Scale: 32768 
Required: 83 hours Training: 2022-07-10 01:28:59,590-Speed 2495.24 samples/sec Loss 1.9102 LearningRate 0.000235 Epoch: 22 Global Step: 467570 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:29:07,804-Speed 2493.51 samples/sec Loss 1.9185 LearningRate 0.000235 Epoch: 22 Global Step: 467580 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:29:15,955-Speed 2512.89 samples/sec Loss 1.8965 LearningRate 0.000235 Epoch: 22 Global Step: 467590 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:29:24,175-Speed 2491.85 samples/sec Loss 1.8794 LearningRate 0.000235 Epoch: 22 Global Step: 467600 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:29:32,383-Speed 2495.59 samples/sec Loss 1.8732 LearningRate 0.000235 Epoch: 22 Global Step: 467610 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:29:40,591-Speed 2495.68 samples/sec Loss 1.8864 LearningRate 0.000235 Epoch: 22 Global Step: 467620 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:29:48,795-Speed 2496.58 samples/sec Loss 1.8707 LearningRate 0.000235 Epoch: 22 Global Step: 467630 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:29:57,013-Speed 2492.76 samples/sec Loss 1.8844 LearningRate 0.000235 Epoch: 22 Global Step: 467640 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:30:05,172-Speed 2510.41 samples/sec Loss 1.8655 LearningRate 0.000235 Epoch: 22 Global Step: 467650 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:30:13,376-Speed 2496.58 samples/sec Loss 1.8673 LearningRate 0.000235 Epoch: 22 Global Step: 467660 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:30:21,582-Speed 2496.27 samples/sec Loss 1.8885 LearningRate 0.000235 Epoch: 22 Global Step: 467670 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:30:29,788-Speed 2496.06 samples/sec Loss 1.9229 LearningRate 0.000235 Epoch: 22 Global Step: 467680 Fp16 Grad Scale: 32768 
Required: 83 hours Training: 2022-07-10 01:30:37,995-Speed 2495.89 samples/sec Loss 1.9129 LearningRate 0.000235 Epoch: 22 Global Step: 467690 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:30:46,202-Speed 2495.93 samples/sec Loss 1.9255 LearningRate 0.000235 Epoch: 22 Global Step: 467700 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:30:54,367-Speed 2508.91 samples/sec Loss 1.9100 LearningRate 0.000235 Epoch: 22 Global Step: 467710 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:31:02,572-Speed 2496.23 samples/sec Loss 1.8492 LearningRate 0.000235 Epoch: 22 Global Step: 467720 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:31:10,789-Speed 2492.86 samples/sec Loss 1.9591 LearningRate 0.000235 Epoch: 22 Global Step: 467730 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:31:19,008-Speed 2492.54 samples/sec Loss 1.9277 LearningRate 0.000235 Epoch: 22 Global Step: 467740 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:31:27,171-Speed 2509.22 samples/sec Loss 1.9007 LearningRate 0.000235 Epoch: 22 Global Step: 467750 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:31:35,375-Speed 2496.53 samples/sec Loss 1.9447 LearningRate 0.000235 Epoch: 22 Global Step: 467760 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:31:43,531-Speed 2511.59 samples/sec Loss 1.9652 LearningRate 0.000235 Epoch: 22 Global Step: 467770 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:31:51,736-Speed 2496.42 samples/sec Loss 1.9001 LearningRate 0.000235 Epoch: 22 Global Step: 467780 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:31:59,938-Speed 2497.20 samples/sec Loss 1.8672 LearningRate 0.000235 Epoch: 22 Global Step: 467790 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:32:08,143-Speed 2496.91 samples/sec Loss 1.9006 LearningRate 0.000235 Epoch: 22 Global Step: 467800 Fp16 Grad Scale: 16384 
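`Fp16 Grad Scale` is the dynamic loss scale used for mixed-precision training (`fp16 True`). In this stretch it doubles from 16384 to 32768 at step 467560 and halves back to 16384 at step 467750. It doubles again at step 468950. This is the usual pattern for a dynamic scaler: it doubles after a run of overflow-free optimizer steps and halves when an inf/NaN gradient is detected. The class below is a toy model of that update rule. It follows the semantics of PyTorch's `torch.cuda.amp.GradScaler` (defaults `init_scale=65536.0`, `growth_factor=2.0`, `backoff_factor=0.5`, `growth_interval=2000`; the run's first records show 65536). `DynamicLossScale` and the `growth_interval=100` in the trace are illustrative, not read from this run.

```python
class DynamicLossScale:
    """Toy model of GradScaler's scale update rule (not the real implementation)."""

    def __init__(self, init_scale=65536.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._clean_steps = 0  # consecutive updates without inf/NaN gradients

    def update(self, found_inf: bool) -> float:
        if found_inf:
            # Overflow detected: shrink the scale and restart the count.
            self.scale *= self.backoff_factor
            self._clean_steps = 0
        else:
            self._clean_steps += 1
            if self._clean_steps == self.growth_interval:
                # Enough clean steps in a row: try a larger scale.
                self.scale *= self.growth_factor
                self._clean_steps = 0
        return self.scale


# Illustrative trace starting from the value seen in this log.
scaler = DynamicLossScale(init_scale=16384.0, growth_interval=100)
for _ in range(100):
    scaler.update(found_inf=False)
print(scaler.scale)                    # 32768.0
print(scaler.update(found_inf=True))   # 16384.0
```

The next doubling comes about 1200 global steps after the 467750 backoff. One possible explanation: the scaler updates only on optimizer steps, and `gradient_acc 12` stretches the interval in global steps. Together with a growth interval of 100, that would give 1200. This is an inference from the log, not a documented setting.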
Required: 83 hours Training: 2022-07-10 01:32:16,344-Speed 2497.71 samples/sec Loss 1.8877 LearningRate 0.000235 Epoch: 22 Global Step: 467810 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:32:24,547-Speed 2497.06 samples/sec Loss 1.9160 LearningRate 0.000235 Epoch: 22 Global Step: 467820 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:32:32,692-Speed 2514.61 samples/sec Loss 1.9011 LearningRate 0.000235 Epoch: 22 Global Step: 467830 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:32:40,902-Speed 2494.86 samples/sec Loss 1.9102 LearningRate 0.000235 Epoch: 22 Global Step: 467840 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:32:49,108-Speed 2496.15 samples/sec Loss 1.8869 LearningRate 0.000235 Epoch: 22 Global Step: 467850 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:32:57,308-Speed 2497.92 samples/sec Loss 1.9125 LearningRate 0.000235 Epoch: 22 Global Step: 467860 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:33:05,510-Speed 2497.47 samples/sec Loss 1.8467 LearningRate 0.000235 Epoch: 22 Global Step: 467870 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:33:13,730-Speed 2491.84 samples/sec Loss 1.9081 LearningRate 0.000235 Epoch: 22 Global Step: 467880 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:33:21,880-Speed 2513.43 samples/sec Loss 1.8690 LearningRate 0.000235 Epoch: 22 Global Step: 467890 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:33:30,085-Speed 2496.34 samples/sec Loss 1.8911 LearningRate 0.000235 Epoch: 22 Global Step: 467900 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:33:38,292-Speed 2495.87 samples/sec Loss 1.8668 LearningRate 0.000235 Epoch: 22 Global Step: 467910 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:33:46,503-Speed 2494.36 samples/sec Loss 1.8647 LearningRate 0.000235 Epoch: 22 Global Step: 467920 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:33:54,712-Speed 2495.12 samples/sec Loss 1.9394 LearningRate 0.000235 Epoch: 22 Global Step: 467930 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:34:02,919-Speed 2496.07 samples/sec Loss 1.9080 LearningRate 0.000235 Epoch: 22 Global Step: 467940 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:34:11,071-Speed 2512.47 samples/sec Loss 1.9164 LearningRate 0.000235 Epoch: 22 Global Step: 467950 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:34:19,275-Speed 2496.73 samples/sec Loss 1.8754 LearningRate 0.000235 Epoch: 22 Global Step: 467960 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:34:27,478-Speed 2496.99 samples/sec Loss 1.9225 LearningRate 0.000235 Epoch: 22 Global Step: 467970 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:34:35,684-Speed 2496.03 samples/sec Loss 1.8982 LearningRate 0.000235 Epoch: 22 Global Step: 467980 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:34:43,890-Speed 2496.24 samples/sec Loss 1.8660 LearningRate 0.000235 Epoch: 22 Global Step: 467990 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:34:52,091-Speed 2497.52 samples/sec Loss 1.9073 LearningRate 0.000235 Epoch: 22 Global Step: 468000 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:35:00,242-Speed 2512.98 samples/sec Loss 1.8790 LearningRate 0.000235 Epoch: 22 Global Step: 468010 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:35:08,459-Speed 2492.85 samples/sec Loss 1.9021 LearningRate 0.000234 Epoch: 22 Global Step: 468020 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:35:16,668-Speed 2495.16 samples/sec Loss 1.8817 LearningRate 0.000234 Epoch: 22 Global Step: 468030 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:35:24,883-Speed 2493.35 samples/sec Loss 1.9096 LearningRate 0.000234 Epoch: 22 Global Step: 468040 Fp16 Grad Scale: 16384 
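The step counters in the config header fit together if one epoch is `num_image // total_batch_size` steps. That is 42474557 // 2048 = 20739 steps, so `warmup_step` = 4 × 20739 = 82956 and `total_step` = 40 × 20739 = 829560. The same arithmetic puts global step ~468000 in epoch 22 (0-indexed), matching the `Epoch: 22` field. If every epoch has exactly 20739 steps, epoch 23 would begin at step 476997. `epoch_of` is an illustrative helper.

```python
NUM_IMAGE = 42474557   # "num_image"
TOTAL_BATCH = 2048     # "total_batch_size"
NUM_EPOCH = 40         # "num_epoch"
WARMUP_EPOCH = 4       # "warmup_epoch"

steps_per_epoch = NUM_IMAGE // TOTAL_BATCH      # last partial batch dropped
warmup_step = steps_per_epoch * WARMUP_EPOCH
total_step = steps_per_epoch * NUM_EPOCH


def epoch_of(global_step):
    """0-indexed epoch, assuming every epoch has exactly steps_per_epoch steps."""
    return global_step // steps_per_epoch


print(steps_per_epoch, warmup_step, total_step)  # 20739 82956 829560
print(epoch_of(468000))                          # 22
print(23 * steps_per_epoch)                      # 476997
```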
Required: 83 hours Training: 2022-07-10 01:35:33,084-Speed 2497.72 samples/sec Loss 1.8773 LearningRate 0.000234 Epoch: 22 Global Step: 468050 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:35:41,289-Speed 2496.32 samples/sec Loss 1.8709 LearningRate 0.000234 Epoch: 22 Global Step: 468060 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:35:49,437-Speed 2513.74 samples/sec Loss 1.8817 LearningRate 0.000234 Epoch: 22 Global Step: 468070 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:35:57,644-Speed 2495.95 samples/sec Loss 1.9295 LearningRate 0.000234 Epoch: 22 Global Step: 468080 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:36:05,851-Speed 2495.63 samples/sec Loss 1.9095 LearningRate 0.000234 Epoch: 22 Global Step: 468090 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:36:14,056-Speed 2496.44 samples/sec Loss 1.8951 LearningRate 0.000234 Epoch: 22 Global Step: 468100 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:36:22,257-Speed 2497.69 samples/sec Loss 1.9390 LearningRate 0.000234 Epoch: 22 Global Step: 468110 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:36:30,470-Speed 2494.07 samples/sec Loss 1.9310 LearningRate 0.000234 Epoch: 22 Global Step: 468120 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:36:38,626-Speed 2511.58 samples/sec Loss 1.9239 LearningRate 0.000234 Epoch: 22 Global Step: 468130 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:36:46,834-Speed 2495.64 samples/sec Loss 1.9224 LearningRate 0.000234 Epoch: 22 Global Step: 468140 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:36:55,045-Speed 2494.71 samples/sec Loss 1.8820 LearningRate 0.000234 Epoch: 22 Global Step: 468150 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:37:03,247-Speed 2497.33 samples/sec Loss 1.8756 LearningRate 0.000234 Epoch: 22 Global Step: 468160 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:37:11,453-Speed 2496.09 samples/sec Loss 1.9153 LearningRate 0.000234 Epoch: 22 Global Step: 468170 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:37:19,662-Speed 2495.55 samples/sec Loss 1.8829 LearningRate 0.000234 Epoch: 22 Global Step: 468180 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:37:27,816-Speed 2511.99 samples/sec Loss 1.8883 LearningRate 0.000234 Epoch: 22 Global Step: 468190 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:37:36,022-Speed 2496.12 samples/sec Loss 1.9046 LearningRate 0.000234 Epoch: 22 Global Step: 468200 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:37:44,225-Speed 2497.20 samples/sec Loss 1.9280 LearningRate 0.000234 Epoch: 22 Global Step: 468210 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:37:52,427-Speed 2497.05 samples/sec Loss 1.9016 LearningRate 0.000234 Epoch: 22 Global Step: 468220 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:38:00,637-Speed 2494.94 samples/sec Loss 1.8731 LearningRate 0.000234 Epoch: 22 Global Step: 468230 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:38:08,845-Speed 2495.53 samples/sec Loss 1.9048 LearningRate 0.000234 Epoch: 22 Global Step: 468240 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:38:17,002-Speed 2511.26 samples/sec Loss 1.8823 LearningRate 0.000234 Epoch: 22 Global Step: 468250 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:38:25,204-Speed 2497.43 samples/sec Loss 1.8502 LearningRate 0.000234 Epoch: 22 Global Step: 468260 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:38:33,421-Speed 2492.97 samples/sec Loss 1.8514 LearningRate 0.000234 Epoch: 22 Global Step: 468270 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:38:41,633-Speed 2494.31 samples/sec Loss 1.8701 LearningRate 0.000234 Epoch: 22 Global Step: 468280 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:38:49,843-Speed 2494.94 samples/sec Loss 1.8851 LearningRate 0.000234 Epoch: 22 Global Step: 468290 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:38:58,047-Speed 2496.61 samples/sec Loss 1.8753 LearningRate 0.000234 Epoch: 22 Global Step: 468300 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:39:06,197-Speed 2513.36 samples/sec Loss 1.8646 LearningRate 0.000234 Epoch: 22 Global Step: 468310 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:39:14,402-Speed 2496.55 samples/sec Loss 1.8915 LearningRate 0.000234 Epoch: 22 Global Step: 468320 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:39:22,626-Speed 2490.70 samples/sec Loss 1.9000 LearningRate 0.000234 Epoch: 22 Global Step: 468330 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:39:30,832-Speed 2496.21 samples/sec Loss 1.8757 LearningRate 0.000234 Epoch: 22 Global Step: 468340 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:39:39,037-Speed 2496.25 samples/sec Loss 1.8613 LearningRate 0.000234 Epoch: 22 Global Step: 468350 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:39:47,244-Speed 2495.69 samples/sec Loss 1.9324 LearningRate 0.000234 Epoch: 22 Global Step: 468360 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:39:55,399-Speed 2512.10 samples/sec Loss 1.9076 LearningRate 0.000234 Epoch: 22 Global Step: 468370 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:40:03,601-Speed 2497.10 samples/sec Loss 1.9061 LearningRate 0.000234 Epoch: 22 Global Step: 468380 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:40:11,803-Speed 2497.39 samples/sec Loss 1.9002 LearningRate 0.000234 Epoch: 22 Global Step: 468390 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:40:20,009-Speed 2496.13 samples/sec Loss 1.8937 LearningRate 0.000234 Epoch: 22 Global Step: 468400 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:40:28,213-Speed 2496.71 samples/sec Loss 1.9003 LearningRate 0.000234 Epoch: 22 Global Step: 468410 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:40:36,422-Speed 2495.17 samples/sec Loss 1.8810 LearningRate 0.000234 Epoch: 22 Global Step: 468420 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:40:44,573-Speed 2513.00 samples/sec Loss 1.8654 LearningRate 0.000234 Epoch: 22 Global Step: 468430 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:40:52,792-Speed 2491.95 samples/sec Loss 1.9062 LearningRate 0.000234 Epoch: 22 Global Step: 468440 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:41:00,994-Speed 2497.54 samples/sec Loss 1.9400 LearningRate 0.000234 Epoch: 22 Global Step: 468450 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:41:09,201-Speed 2495.84 samples/sec Loss 1.9076 LearningRate 0.000234 Epoch: 22 Global Step: 468460 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:41:17,417-Speed 2492.96 samples/sec Loss 1.9119 LearningRate 0.000234 Epoch: 22 Global Step: 468470 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:41:25,624-Speed 2495.61 samples/sec Loss 1.8895 LearningRate 0.000234 Epoch: 22 Global Step: 468480 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:41:33,775-Speed 2513.07 samples/sec Loss 1.8967 LearningRate 0.000234 Epoch: 22 Global Step: 468490 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:41:41,988-Speed 2494.32 samples/sec Loss 1.9454 LearningRate 0.000234 Epoch: 22 Global Step: 468500 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:41:50,204-Speed 2493.07 samples/sec Loss 1.8861 LearningRate 0.000234 Epoch: 22 Global Step: 468510 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:41:58,410-Speed 2496.07 samples/sec Loss 1.9194 LearningRate 0.000234 Epoch: 22 Global Step: 468520 Fp16 Grad Scale: 16384 
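`Required: 83 hours` is a remaining-time estimate. One formula consistent with it multiplies the average seconds per step since training began by the number of remaining steps. The run's first records are timestamped 2022-07-05 14:26:29. Taking that as the start and the record for step 466610 (2022-07-10 01:15:29,316) as now gives about 0.824 s/step, and 829560 − 466610 = 362950 remaining steps then give about 83.1 hours. The training script's exact formula is an assumption here. Extrapolating from the recent ~8.2 s per 10 steps instead gives about 82.7 hours, which also rounds to 83. `eta_hours` is an illustrative name.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S,%f"


def eta_hours(start_ts, now_ts, global_step, total_step):
    """Remaining hours = mean seconds per step so far x steps left."""
    elapsed = (datetime.strptime(now_ts, FMT)
               - datetime.strptime(start_ts, FMT)).total_seconds()
    sec_per_step = elapsed / global_step
    return sec_per_step * (total_step - global_step) / 3600.0


eta = eta_hours("2022-07-05 14:26:29,158", "2022-07-10 01:15:29,316",
                global_step=466610, total_step=829560)
print(f"Required: {eta:.0f} hours")   # Required: 83 hours
```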
Required: 83 hours Training: 2022-07-10 01:42:06,619-Speed 2495.39 samples/sec Loss 1.9059 LearningRate 0.000234 Epoch: 22 Global Step: 468530 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:42:14,830-Speed 2494.62 samples/sec Loss 1.8602 LearningRate 0.000234 Epoch: 22 Global Step: 468540 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:42:22,992-Speed 2509.56 samples/sec Loss 1.8879 LearningRate 0.000234 Epoch: 22 Global Step: 468550 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:42:31,201-Speed 2495.47 samples/sec Loss 1.8867 LearningRate 0.000234 Epoch: 22 Global Step: 468560 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:42:39,406-Speed 2496.30 samples/sec Loss 1.9027 LearningRate 0.000234 Epoch: 22 Global Step: 468570 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:42:47,613-Speed 2495.75 samples/sec Loss 1.9154 LearningRate 0.000234 Epoch: 22 Global Step: 468580 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:42:55,819-Speed 2496.17 samples/sec Loss 1.8645 LearningRate 0.000234 Epoch: 22 Global Step: 468590 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:43:04,023-Speed 2496.60 samples/sec Loss 1.9031 LearningRate 0.000234 Epoch: 22 Global Step: 468600 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:43:12,180-Speed 2511.38 samples/sec Loss 1.9043 LearningRate 0.000234 Epoch: 22 Global Step: 468610 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:43:20,385-Speed 2496.40 samples/sec Loss 1.9229 LearningRate 0.000234 Epoch: 22 Global Step: 468620 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:43:28,596-Speed 2494.46 samples/sec Loss 1.9285 LearningRate 0.000234 Epoch: 22 Global Step: 468630 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:43:36,801-Speed 2496.69 samples/sec Loss 1.8847 LearningRate 0.000234 Epoch: 22 Global Step: 468640 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:43:45,017-Speed 2493.16 samples/sec Loss 1.8768 LearningRate 0.000234 Epoch: 22 Global Step: 468650 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:43:53,221-Speed 2496.76 samples/sec Loss 1.8902 LearningRate 0.000234 Epoch: 22 Global Step: 468660 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:44:01,380-Speed 2510.54 samples/sec Loss 1.8850 LearningRate 0.000234 Epoch: 22 Global Step: 468670 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:44:09,588-Speed 2495.47 samples/sec Loss 1.9172 LearningRate 0.000234 Epoch: 22 Global Step: 468680 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:44:17,793-Speed 2496.33 samples/sec Loss 1.9220 LearningRate 0.000234 Epoch: 22 Global Step: 468690 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:44:26,001-Speed 2495.67 samples/sec Loss 1.9307 LearningRate 0.000234 Epoch: 22 Global Step: 468700 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:44:34,217-Speed 2493.28 samples/sec Loss 1.8861 LearningRate 0.000234 Epoch: 22 Global Step: 468710 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:44:42,420-Speed 2496.92 samples/sec Loss 1.8822 LearningRate 0.000234 Epoch: 22 Global Step: 468720 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:44:50,583-Speed 2509.90 samples/sec Loss 1.8144 LearningRate 0.000234 Epoch: 22 Global Step: 468730 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:44:58,790-Speed 2495.63 samples/sec Loss 1.8736 LearningRate 0.000234 Epoch: 22 Global Step: 468740 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:45:06,994-Speed 2496.68 samples/sec Loss 1.8719 LearningRate 0.000234 Epoch: 22 Global Step: 468750 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:45:15,207-Speed 2494.13 samples/sec Loss 1.9132 LearningRate 0.000234 Epoch: 22 Global Step: 468760 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:45:23,421-Speed 2493.74 samples/sec Loss 1.9204 LearningRate 0.000234 Epoch: 22 Global Step: 468770 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:45:31,621-Speed 2498.05 samples/sec Loss 1.9072 LearningRate 0.000234 Epoch: 22 Global Step: 468780 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:45:39,772-Speed 2512.81 samples/sec Loss 1.8379 LearningRate 0.000233 Epoch: 22 Global Step: 468790 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:45:47,982-Speed 2495.16 samples/sec Loss 1.8749 LearningRate 0.000233 Epoch: 22 Global Step: 468800 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:45:56,182-Speed 2497.91 samples/sec Loss 1.8851 LearningRate 0.000233 Epoch: 22 Global Step: 468810 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:46:04,391-Speed 2495.20 samples/sec Loss 1.9038 LearningRate 0.000233 Epoch: 22 Global Step: 468820 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:46:12,603-Speed 2494.73 samples/sec Loss 1.8869 LearningRate 0.000233 Epoch: 22 Global Step: 468830 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:46:20,807-Speed 2496.64 samples/sec Loss 1.9191 LearningRate 0.000233 Epoch: 22 Global Step: 468840 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:46:28,982-Speed 2505.60 samples/sec Loss 1.8949 LearningRate 0.000233 Epoch: 22 Global Step: 468850 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:46:37,193-Speed 2494.64 samples/sec Loss 1.9152 LearningRate 0.000233 Epoch: 22 Global Step: 468860 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:46:45,396-Speed 2497.53 samples/sec Loss 1.9079 LearningRate 0.000233 Epoch: 22 Global Step: 468870 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:46:53,618-Speed 2491.37 samples/sec Loss 1.9018 LearningRate 0.000233 Epoch: 22 Global Step: 468880 Fp16 Grad Scale: 16384 
Required: 83 hours Training: 2022-07-10 01:47:01,828-Speed 2494.94 samples/sec Loss 1.8851 LearningRate 0.000233 Epoch: 22 Global Step: 468890 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:47:10,041-Speed 2493.96 samples/sec Loss 1.9335 LearningRate 0.000233 Epoch: 22 Global Step: 468900 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:47:18,190-Speed 2513.63 samples/sec Loss 1.9128 LearningRate 0.000233 Epoch: 22 Global Step: 468910 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:47:26,394-Speed 2497.14 samples/sec Loss 1.8958 LearningRate 0.000233 Epoch: 22 Global Step: 468920 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:47:34,608-Speed 2493.69 samples/sec Loss 1.9001 LearningRate 0.000233 Epoch: 22 Global Step: 468930 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:47:42,812-Speed 2496.57 samples/sec Loss 1.8815 LearningRate 0.000233 Epoch: 22 Global Step: 468940 Fp16 Grad Scale: 16384 Required: 83 hours Training: 2022-07-10 01:47:51,016-Speed 2496.74 samples/sec Loss 1.9302 LearningRate 0.000233 Epoch: 22 Global Step: 468950 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:47:59,224-Speed 2495.72 samples/sec Loss 1.9355 LearningRate 0.000233 Epoch: 22 Global Step: 468960 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:48:07,374-Speed 2513.08 samples/sec Loss 1.9272 LearningRate 0.000233 Epoch: 22 Global Step: 468970 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:48:15,578-Speed 2497.36 samples/sec Loss 1.9443 LearningRate 0.000233 Epoch: 22 Global Step: 468980 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:48:23,781-Speed 2497.29 samples/sec Loss 1.9176 LearningRate 0.000233 Epoch: 22 Global Step: 468990 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:48:31,984-Speed 2496.76 samples/sec Loss 1.9164 LearningRate 0.000233 Epoch: 22 Global Step: 469000 Fp16 Grad Scale: 32768 
Required: 83 hours Training: 2022-07-10 01:48:40,191-Speed 2495.79 samples/sec Loss 1.9330 LearningRate 0.000233 Epoch: 22 Global Step: 469010 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:48:48,398-Speed 2496.02 samples/sec Loss 1.9344 LearningRate 0.000233 Epoch: 22 Global Step: 469020 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:48:56,549-Speed 2512.94 samples/sec Loss 1.8589 LearningRate 0.000233 Epoch: 22 Global Step: 469030 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:49:04,753-Speed 2496.80 samples/sec Loss 1.8776 LearningRate 0.000233 Epoch: 22 Global Step: 469040 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:49:12,957-Speed 2496.69 samples/sec Loss 1.8630 LearningRate 0.000233 Epoch: 22 Global Step: 469050 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:49:21,159-Speed 2497.43 samples/sec Loss 1.8507 LearningRate 0.000233 Epoch: 22 Global Step: 469060 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:49:29,360-Speed 2497.80 samples/sec Loss 1.8832 LearningRate 0.000233 Epoch: 22 Global Step: 469070 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:49:37,565-Speed 2496.60 samples/sec Loss 1.8767 LearningRate 0.000233 Epoch: 22 Global Step: 469080 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:49:45,719-Speed 2511.87 samples/sec Loss 1.8979 LearningRate 0.000233 Epoch: 22 Global Step: 469090 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:49:53,937-Speed 2492.59 samples/sec Loss 1.9033 LearningRate 0.000233 Epoch: 22 Global Step: 469100 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:50:02,143-Speed 2496.29 samples/sec Loss 1.9417 LearningRate 0.000233 Epoch: 22 Global Step: 469110 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:50:10,344-Speed 2497.67 samples/sec Loss 1.8828 LearningRate 0.000233 Epoch: 22 Global Step: 469120 Fp16 Grad Scale: 32768 
Required: 83 hours Training: 2022-07-10 01:50:18,545-Speed 2497.66 samples/sec Loss 1.9341 LearningRate 0.000233 Epoch: 22 Global Step: 469130 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:50:26,747-Speed 2497.07 samples/sec Loss 1.8925 LearningRate 0.000233 Epoch: 22 Global Step: 469140 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:50:34,911-Speed 2509.16 samples/sec Loss 1.8604 LearningRate 0.000233 Epoch: 22 Global Step: 469150 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:50:43,120-Speed 2495.37 samples/sec Loss 1.8845 LearningRate 0.000233 Epoch: 22 Global Step: 469160 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:50:51,337-Speed 2492.70 samples/sec Loss 1.8862 LearningRate 0.000233 Epoch: 22 Global Step: 469170 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:50:59,546-Speed 2496.17 samples/sec Loss 1.9342 LearningRate 0.000233 Epoch: 22 Global Step: 469180 Fp16 Grad Scale: 32768 Required: 83 hours Training: 2022-07-10 01:51:07,749-Speed 2496.87 samples/sec Loss 1.8802 LearningRate 0.000233 Epoch: 22 Global Step: 469190 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:51:15,956-Speed 2495.85 samples/sec Loss 1.8932 LearningRate 0.000233 Epoch: 22 Global Step: 469200 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:51:24,106-Speed 2513.26 samples/sec Loss 1.8965 LearningRate 0.000233 Epoch: 22 Global Step: 469210 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:51:32,309-Speed 2496.93 samples/sec Loss 1.8803 LearningRate 0.000233 Epoch: 22 Global Step: 469220 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:51:40,514-Speed 2496.50 samples/sec Loss 1.9172 LearningRate 0.000233 Epoch: 22 Global Step: 469230 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:51:48,718-Speed 2496.66 samples/sec Loss 1.8717 LearningRate 0.000233 Epoch: 22 Global Step: 469240 Fp16 Grad Scale: 32768 
Required: 82 hours Training: 2022-07-10 01:51:56,934-Speed 2493.10 samples/sec Loss 1.8837 LearningRate 0.000233 Epoch: 22 Global Step: 469250 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:52:05,137-Speed 2497.25 samples/sec Loss 1.8929 LearningRate 0.000233 Epoch: 22 Global Step: 469260 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:52:13,286-Speed 2513.63 samples/sec Loss 1.9169 LearningRate 0.000233 Epoch: 22 Global Step: 469270 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:52:21,503-Speed 2492.76 samples/sec Loss 1.8851 LearningRate 0.000233 Epoch: 22 Global Step: 469280 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:52:29,715-Speed 2494.36 samples/sec Loss 1.8988 LearningRate 0.000233 Epoch: 22 Global Step: 469290 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:52:37,918-Speed 2497.19 samples/sec Loss 1.8745 LearningRate 0.000233 Epoch: 22 Global Step: 469300 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:52:46,135-Speed 2492.63 samples/sec Loss 1.8437 LearningRate 0.000233 Epoch: 22 Global Step: 469310 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:52:54,342-Speed 2495.79 samples/sec Loss 1.9177 LearningRate 0.000233 Epoch: 22 Global Step: 469320 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:53:02,505-Speed 2509.20 samples/sec Loss 1.9060 LearningRate 0.000233 Epoch: 22 Global Step: 469330 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:53:10,719-Speed 2493.78 samples/sec Loss 1.8953 LearningRate 0.000233 Epoch: 22 Global Step: 469340 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:53:18,928-Speed 2495.25 samples/sec Loss 1.9187 LearningRate 0.000233 Epoch: 22 Global Step: 469350 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:53:27,133-Speed 2496.76 samples/sec Loss 1.9126 LearningRate 0.000233 Epoch: 22 Global Step: 469360 Fp16 Grad Scale: 32768 
Required: 82 hours Training: 2022-07-10 01:53:35,340-Speed 2495.83 samples/sec Loss 1.8889 LearningRate 0.000233 Epoch: 22 Global Step: 469370 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:53:43,545-Speed 2496.22 samples/sec Loss 1.9097 LearningRate 0.000233 Epoch: 22 Global Step: 469380 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:53:51,698-Speed 2512.58 samples/sec Loss 1.9361 LearningRate 0.000233 Epoch: 22 Global Step: 469390 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:53:59,914-Speed 2493.04 samples/sec Loss 1.9924 LearningRate 0.000233 Epoch: 22 Global Step: 469400 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:54:08,127-Speed 2493.93 samples/sec Loss 1.9294 LearningRate 0.000233 Epoch: 22 Global Step: 469410 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:54:16,328-Speed 2497.50 samples/sec Loss 1.8943 LearningRate 0.000233 Epoch: 22 Global Step: 469420 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:54:24,536-Speed 2495.66 samples/sec Loss 1.8758 LearningRate 0.000233 Epoch: 22 Global Step: 469430 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:54:32,746-Speed 2494.88 samples/sec Loss 1.8971 LearningRate 0.000233 Epoch: 22 Global Step: 469440 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:54:40,893-Speed 2514.37 samples/sec Loss 1.8764 LearningRate 0.000233 Epoch: 22 Global Step: 469450 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:54:49,093-Speed 2497.77 samples/sec Loss 1.8795 LearningRate 0.000233 Epoch: 22 Global Step: 469460 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:54:57,298-Speed 2496.36 samples/sec Loss 1.9181 LearningRate 0.000233 Epoch: 22 Global Step: 469470 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:55:05,500-Speed 2497.43 samples/sec Loss 1.9300 LearningRate 0.000233 Epoch: 22 Global Step: 469480 Fp16 Grad Scale: 32768 
Required: 82 hours Training: 2022-07-10 01:55:13,701-Speed 2497.57 samples/sec Loss 1.8626 LearningRate 0.000233 Epoch: 22 Global Step: 469490 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:55:21,902-Speed 2497.66 samples/sec Loss 1.8740 LearningRate 0.000233 Epoch: 22 Global Step: 469500 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:55:30,050-Speed 2514.04 samples/sec Loss 1.8796 LearningRate 0.000233 Epoch: 22 Global Step: 469510 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:55:38,255-Speed 2496.28 samples/sec Loss 1.9027 LearningRate 0.000233 Epoch: 22 Global Step: 469520 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:55:46,457-Speed 2497.43 samples/sec Loss 1.8679 LearningRate 0.000233 Epoch: 22 Global Step: 469530 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:55:54,667-Speed 2495.00 samples/sec Loss 1.9082 LearningRate 0.000233 Epoch: 22 Global Step: 469540 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:56:02,870-Speed 2496.93 samples/sec Loss 1.8696 LearningRate 0.000233 Epoch: 22 Global Step: 469550 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:56:11,071-Speed 2497.61 samples/sec Loss 1.8950 LearningRate 0.000233 Epoch: 22 Global Step: 469560 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:56:19,225-Speed 2512.58 samples/sec Loss 1.9434 LearningRate 0.000232 Epoch: 22 Global Step: 469570 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:56:27,441-Speed 2492.90 samples/sec Loss 1.8713 LearningRate 0.000232 Epoch: 22 Global Step: 469580 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:56:35,646-Speed 2496.44 samples/sec Loss 1.8961 LearningRate 0.000232 Epoch: 22 Global Step: 469590 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:56:43,856-Speed 2495.07 samples/sec Loss 1.9115 LearningRate 0.000232 Epoch: 22 Global Step: 469600 Fp16 Grad Scale: 32768 
Required: 82 hours Training: 2022-07-10 01:56:52,066-Speed 2495.02 samples/sec Loss 1.8808 LearningRate 0.000232 Epoch: 22 Global Step: 469610 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:57:00,270-Speed 2496.55 samples/sec Loss 1.9158 LearningRate 0.000232 Epoch: 22 Global Step: 469620 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:57:08,421-Speed 2512.94 samples/sec Loss 1.9005 LearningRate 0.000232 Epoch: 22 Global Step: 469630 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:57:16,626-Speed 2496.57 samples/sec Loss 1.9070 LearningRate 0.000232 Epoch: 22 Global Step: 469640 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:57:24,831-Speed 2496.20 samples/sec Loss 1.9221 LearningRate 0.000232 Epoch: 22 Global Step: 469650 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:57:33,047-Speed 2492.94 samples/sec Loss 1.9686 LearningRate 0.000232 Epoch: 22 Global Step: 469660 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:57:41,260-Speed 2494.09 samples/sec Loss 1.9266 LearningRate 0.000232 Epoch: 22 Global Step: 469670 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:57:49,463-Speed 2496.95 samples/sec Loss 1.9274 LearningRate 0.000232 Epoch: 22 Global Step: 469680 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:57:57,615-Speed 2512.72 samples/sec Loss 1.9323 LearningRate 0.000232 Epoch: 22 Global Step: 469690 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:58:05,814-Speed 2498.25 samples/sec Loss 1.8628 LearningRate 0.000232 Epoch: 22 Global Step: 469700 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:58:14,018-Speed 2496.63 samples/sec Loss 1.9131 LearningRate 0.000232 Epoch: 22 Global Step: 469710 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:58:22,223-Speed 2496.54 samples/sec Loss 1.8738 LearningRate 0.000232 Epoch: 22 Global Step: 469720 Fp16 Grad Scale: 32768 
Required: 82 hours Training: 2022-07-10 01:58:30,428-Speed 2496.92 samples/sec Loss 1.9027 LearningRate 0.000232 Epoch: 22 Global Step: 469730 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:58:38,642-Speed 2493.53 samples/sec Loss 1.8985 LearningRate 0.000232 Epoch: 22 Global Step: 469740 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:58:46,796-Speed 2511.86 samples/sec Loss 1.9478 LearningRate 0.000232 Epoch: 22 Global Step: 469750 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:58:55,002-Speed 2496.49 samples/sec Loss 1.9055 LearningRate 0.000232 Epoch: 22 Global Step: 469760 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:59:03,215-Speed 2493.93 samples/sec Loss 1.8934 LearningRate 0.000232 Epoch: 22 Global Step: 469770 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:59:11,421-Speed 2496.32 samples/sec Loss 1.9259 LearningRate 0.000232 Epoch: 22 Global Step: 469780 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:59:19,626-Speed 2496.20 samples/sec Loss 1.8833 LearningRate 0.000232 Epoch: 22 Global Step: 469790 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:59:27,835-Speed 2495.41 samples/sec Loss 1.9355 LearningRate 0.000232 Epoch: 22 Global Step: 469800 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:59:35,995-Speed 2510.42 samples/sec Loss 1.8959 LearningRate 0.000232 Epoch: 22 Global Step: 469810 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:59:44,201-Speed 2496.05 samples/sec Loss 1.9118 LearningRate 0.000232 Epoch: 22 Global Step: 469820 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 01:59:52,416-Speed 2493.62 samples/sec Loss 1.9110 LearningRate 0.000232 Epoch: 22 Global Step: 469830 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:00:00,629-Speed 2493.95 samples/sec Loss 1.8982 LearningRate 0.000232 Epoch: 22 Global Step: 469840 Fp16 Grad Scale: 32768 
Required: 82 hours Training: 2022-07-10 02:00:08,837-Speed 2495.29 samples/sec Loss 1.9008 LearningRate 0.000232 Epoch: 22 Global Step: 469850 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:00:17,043-Speed 2496.36 samples/sec Loss 1.9490 LearningRate 0.000232 Epoch: 22 Global Step: 469860 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:00:25,199-Speed 2511.47 samples/sec Loss 1.8729 LearningRate 0.000232 Epoch: 22 Global Step: 469870 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:00:33,403-Speed 2496.76 samples/sec Loss 1.9448 LearningRate 0.000232 Epoch: 22 Global Step: 469880 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:00:41,611-Speed 2495.51 samples/sec Loss 1.8657 LearningRate 0.000232 Epoch: 22 Global Step: 469890 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:00:49,820-Speed 2495.23 samples/sec Loss 1.8753 LearningRate 0.000232 Epoch: 22 Global Step: 469900 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:00:58,022-Speed 2497.26 samples/sec Loss 1.9119 LearningRate 0.000232 Epoch: 22 Global Step: 469910 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:01:06,225-Speed 2496.79 samples/sec Loss 1.8836 LearningRate 0.000232 Epoch: 22 Global Step: 469920 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:01:14,378-Speed 2513.39 samples/sec Loss 1.8851 LearningRate 0.000232 Epoch: 22 Global Step: 469930 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:01:22,583-Speed 2496.32 samples/sec Loss 1.9444 LearningRate 0.000232 Epoch: 22 Global Step: 469940 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:01:30,790-Speed 2495.62 samples/sec Loss 1.9204 LearningRate 0.000232 Epoch: 22 Global Step: 469950 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:01:38,997-Speed 2496.23 samples/sec Loss 1.9066 LearningRate 0.000232 Epoch: 22 Global Step: 469960 Fp16 Grad Scale: 32768 
Required: 82 hours Training: 2022-07-10 02:01:47,198-Speed 2497.63 samples/sec Loss 1.9254 LearningRate 0.000232 Epoch: 22 Global Step: 469970 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:01:55,408-Speed 2494.96 samples/sec Loss 1.9510 LearningRate 0.000232 Epoch: 22 Global Step: 469980 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:02:03,555-Speed 2514.33 samples/sec Loss 1.9279 LearningRate 0.000232 Epoch: 22 Global Step: 469990 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:02:11,760-Speed 2496.41 samples/sec Loss 1.9326 LearningRate 0.000232 Epoch: 22 Global Step: 470000 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:02:19,967-Speed 2495.77 samples/sec Loss 1.9274 LearningRate 0.000232 Epoch: 22 Global Step: 470010 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:02:28,173-Speed 2496.19 samples/sec Loss 1.8767 LearningRate 0.000232 Epoch: 22 Global Step: 470020 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:02:36,376-Speed 2496.80 samples/sec Loss 1.8960 LearningRate 0.000232 Epoch: 22 Global Step: 470030 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:02:44,583-Speed 2495.97 samples/sec Loss 1.9310 LearningRate 0.000232 Epoch: 22 Global Step: 470040 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:02:52,733-Speed 2513.25 samples/sec Loss 1.9199 LearningRate 0.000232 Epoch: 22 Global Step: 470050 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:03:00,940-Speed 2495.65 samples/sec Loss 1.9191 LearningRate 0.000232 Epoch: 22 Global Step: 470060 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:03:09,147-Speed 2496.05 samples/sec Loss 1.8949 LearningRate 0.000232 Epoch: 22 Global Step: 470070 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:03:17,349-Speed 2497.39 samples/sec Loss 1.9064 LearningRate 0.000232 Epoch: 22 Global Step: 470080 Fp16 Grad Scale: 32768 
Required: 82 hours Training: 2022-07-10 02:03:25,553-Speed 2496.46 samples/sec Loss 1.9042 LearningRate 0.000232 Epoch: 22 Global Step: 470090 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:03:33,757-Speed 2496.76 samples/sec Loss 1.8708 LearningRate 0.000232 Epoch: 22 Global Step: 470100 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:03:41,906-Speed 2513.64 samples/sec Loss 1.9278 LearningRate 0.000232 Epoch: 22 Global Step: 470110 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:03:50,108-Speed 2497.79 samples/sec Loss 1.8720 LearningRate 0.000232 Epoch: 22 Global Step: 470120 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:03:58,312-Speed 2496.50 samples/sec Loss 1.8863 LearningRate 0.000232 Epoch: 22 Global Step: 470130 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:04:06,516-Speed 2496.95 samples/sec Loss 1.8437 LearningRate 0.000232 Epoch: 22 Global Step: 470140 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:04:14,719-Speed 2496.80 samples/sec Loss 1.8137 LearningRate 0.000232 Epoch: 22 Global Step: 470150 Fp16 Grad Scale: 65536 Required: 82 hours Training: 2022-07-10 02:04:22,924-Speed 2496.47 samples/sec Loss 1.8890 LearningRate 0.000232 Epoch: 22 Global Step: 470160 Fp16 Grad Scale: 65536 Required: 82 hours Training: 2022-07-10 02:04:31,077-Speed 2512.32 samples/sec Loss 1.8108 LearningRate 0.000232 Epoch: 22 Global Step: 470170 Fp16 Grad Scale: 65536 Required: 82 hours Training: 2022-07-10 02:04:39,281-Speed 2496.78 samples/sec Loss 1.8884 LearningRate 0.000232 Epoch: 22 Global Step: 470180 Fp16 Grad Scale: 65536 Required: 82 hours Training: 2022-07-10 02:04:47,486-Speed 2496.61 samples/sec Loss 1.8324 LearningRate 0.000232 Epoch: 22 Global Step: 470190 Fp16 Grad Scale: 65536 Required: 82 hours Training: 2022-07-10 02:04:55,700-Speed 2493.55 samples/sec Loss 1.8432 LearningRate 0.000232 Epoch: 22 Global Step: 470200 Fp16 Grad Scale: 65536 
Required: 82 hours Training: 2022-07-10 02:05:03,904-Speed 2496.86 samples/sec Loss 1.8497 LearningRate 0.000232 Epoch: 22 Global Step: 470210 Fp16 Grad Scale: 65536 Required: 82 hours Training: 2022-07-10 02:05:12,113-Speed 2495.15 samples/sec Loss 1.8451 LearningRate 0.000232 Epoch: 22 Global Step: 470220 Fp16 Grad Scale: 65536 Required: 82 hours Training: 2022-07-10 02:05:20,269-Speed 2511.56 samples/sec Loss 1.8825 LearningRate 0.000232 Epoch: 22 Global Step: 470230 Fp16 Grad Scale: 65536 Required: 82 hours Training: 2022-07-10 02:05:28,471-Speed 2497.27 samples/sec Loss 1.9028 LearningRate 0.000232 Epoch: 22 Global Step: 470240 Fp16 Grad Scale: 65536 Required: 82 hours Training: 2022-07-10 02:05:36,628-Speed 2511.21 samples/sec Loss 1.8456 LearningRate 0.000232 Epoch: 22 Global Step: 470250 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:05:44,830-Speed 2497.38 samples/sec Loss 1.7894 LearningRate 0.000232 Epoch: 22 Global Step: 470260 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:05:53,038-Speed 2495.71 samples/sec Loss 1.8350 LearningRate 0.000232 Epoch: 22 Global Step: 470270 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:06:01,244-Speed 2496.25 samples/sec Loss 1.8560 LearningRate 0.000232 Epoch: 22 Global Step: 470280 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:06:09,393-Speed 2513.51 samples/sec Loss 1.9306 LearningRate 0.000232 Epoch: 22 Global Step: 470290 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:06:17,596-Speed 2496.98 samples/sec Loss 1.8385 LearningRate 0.000232 Epoch: 22 Global Step: 470300 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:06:25,828-Speed 2488.38 samples/sec Loss 1.8688 LearningRate 0.000232 Epoch: 22 Global Step: 470310 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:06:34,028-Speed 2498.22 samples/sec Loss 1.9223 LearningRate 0.000232 Epoch: 22 Global Step: 470320 Fp16 Grad Scale: 32768 
Required: 82 hours Training: 2022-07-10 02:06:42,236-Speed 2495.34 samples/sec Loss 1.8584 LearningRate 0.000232 Epoch: 22 Global Step: 470330 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:06:50,445-Speed 2495.35 samples/sec Loss 1.8684 LearningRate 0.000231 Epoch: 22 Global Step: 470340 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:06:58,598-Speed 2512.54 samples/sec Loss 1.8092 LearningRate 0.000231 Epoch: 22 Global Step: 470350 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:07:06,806-Speed 2495.51 samples/sec Loss 1.8670 LearningRate 0.000231 Epoch: 22 Global Step: 470360 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:07:15,011-Speed 2496.49 samples/sec Loss 1.8480 LearningRate 0.000231 Epoch: 22 Global Step: 470370 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:07:23,231-Speed 2491.72 samples/sec Loss 1.8679 LearningRate 0.000231 Epoch: 22 Global Step: 470380 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:07:31,436-Speed 2496.50 samples/sec Loss 1.8917 LearningRate 0.000231 Epoch: 22 Global Step: 470390 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:07:39,641-Speed 2496.42 samples/sec Loss 1.8832 LearningRate 0.000231 Epoch: 22 Global Step: 470400 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:07:47,794-Speed 2512.43 samples/sec Loss 1.8719 LearningRate 0.000231 Epoch: 22 Global Step: 470410 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:07:55,999-Speed 2496.33 samples/sec Loss 1.9003 LearningRate 0.000231 Epoch: 22 Global Step: 470420 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:08:04,231-Speed 2488.37 samples/sec Loss 1.9032 LearningRate 0.000231 Epoch: 22 Global Step: 470430 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:08:12,436-Speed 2496.49 samples/sec Loss 1.9131 LearningRate 0.000231 Epoch: 22 Global Step: 470440 Fp16 Grad Scale: 32768 
Required: 82 hours Training: 2022-07-10 02:08:20,648-Speed 2494.06 samples/sec Loss 1.8530 LearningRate 0.000231 Epoch: 22 Global Step: 470450 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:08:28,861-Speed 2494.01 samples/sec Loss 1.9216 LearningRate 0.000231 Epoch: 22 Global Step: 470460 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:08:37,009-Speed 2513.97 samples/sec Loss 1.8506 LearningRate 0.000231 Epoch: 22 Global Step: 470470 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:08:45,215-Speed 2496.23 samples/sec Loss 1.8817 LearningRate 0.000231 Epoch: 22 Global Step: 470480 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:08:53,428-Speed 2494.20 samples/sec Loss 1.8723 LearningRate 0.000231 Epoch: 22 Global Step: 470490 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:09:01,632-Speed 2497.09 samples/sec Loss 1.8727 LearningRate 0.000231 Epoch: 22 Global Step: 470500 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:09:09,836-Speed 2499.10 samples/sec Loss 1.9105 LearningRate 0.000231 Epoch: 22 Global Step: 470510 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:09:18,038-Speed 2497.14 samples/sec Loss 1.8949 LearningRate 0.000231 Epoch: 22 Global Step: 470520 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:09:26,198-Speed 2510.11 samples/sec Loss 1.8391 LearningRate 0.000231 Epoch: 22 Global Step: 470530 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:09:34,401-Speed 2496.86 samples/sec Loss 1.8823 LearningRate 0.000231 Epoch: 22 Global Step: 470540 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:09:42,606-Speed 2496.40 samples/sec Loss 1.9109 LearningRate 0.000231 Epoch: 22 Global Step: 470550 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:09:50,825-Speed 2492.79 samples/sec Loss 1.8930 LearningRate 0.000231 Epoch: 22 Global Step: 470560 Fp16 Grad Scale: 32768 
Required: 82 hours Training: 2022-07-10 02:09:59,031-Speed 2496.04 samples/sec Loss 1.8984 LearningRate 0.000231 Epoch: 22 Global Step: 470570 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:10:07,247-Speed 2493.41 samples/sec Loss 1.8712 LearningRate 0.000231 Epoch: 22 Global Step: 470580 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:10:15,400-Speed 2512.42 samples/sec Loss 1.9084 LearningRate 0.000231 Epoch: 22 Global Step: 470590 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:10:23,611-Speed 2494.56 samples/sec Loss 1.8759 LearningRate 0.000231 Epoch: 22 Global Step: 470600 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:10:31,819-Speed 2495.48 samples/sec Loss 1.8938 LearningRate 0.000231 Epoch: 22 Global Step: 470610 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:10:40,027-Speed 2495.52 samples/sec Loss 1.8522 LearningRate 0.000231 Epoch: 22 Global Step: 470620 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:10:48,230-Speed 2497.31 samples/sec Loss 1.9320 LearningRate 0.000231 Epoch: 22 Global Step: 470630 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:10:56,440-Speed 2494.84 samples/sec Loss 1.8784 LearningRate 0.000231 Epoch: 22 Global Step: 470640 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:11:04,595-Speed 2511.87 samples/sec Loss 1.8880 LearningRate 0.000231 Epoch: 22 Global Step: 470650 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:11:12,795-Speed 2497.82 samples/sec Loss 1.8730 LearningRate 0.000231 Epoch: 22 Global Step: 470660 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:11:20,998-Speed 2497.24 samples/sec Loss 1.8656 LearningRate 0.000231 Epoch: 22 Global Step: 470670 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:11:29,200-Speed 2497.09 samples/sec Loss 1.8850 LearningRate 0.000231 Epoch: 22 Global Step: 470680 Fp16 Grad Scale: 32768 
Required: 82 hours Training: 2022-07-10 02:11:37,401-Speed 2497.76 samples/sec Loss 1.9204 LearningRate 0.000231 Epoch: 22 Global Step: 470690 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:11:45,605-Speed 2497.13 samples/sec Loss 1.8753 LearningRate 0.000231 Epoch: 22 Global Step: 470700 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:11:53,755-Speed 2513.33 samples/sec Loss 1.9027 LearningRate 0.000231 Epoch: 22 Global Step: 470710 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:12:01,962-Speed 2495.65 samples/sec Loss 1.9156 LearningRate 0.000231 Epoch: 22 Global Step: 470720 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:12:10,169-Speed 2496.06 samples/sec Loss 1.9147 LearningRate 0.000231 Epoch: 22 Global Step: 470730 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:12:18,374-Speed 2496.38 samples/sec Loss 1.9057 LearningRate 0.000231 Epoch: 22 Global Step: 470740 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:12:26,547-Speed 2506.36 samples/sec Loss 1.8979 LearningRate 0.000231 Epoch: 22 Global Step: 470750 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:12:34,752-Speed 2496.68 samples/sec Loss 1.9469 LearningRate 0.000231 Epoch: 22 Global Step: 470760 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:12:42,901-Speed 2513.62 samples/sec Loss 1.9068 LearningRate 0.000231 Epoch: 22 Global Step: 470770 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:12:51,109-Speed 2495.38 samples/sec Loss 1.9430 LearningRate 0.000231 Epoch: 22 Global Step: 470780 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:12:59,314-Speed 2496.49 samples/sec Loss 1.9108 LearningRate 0.000231 Epoch: 22 Global Step: 470790 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:13:07,515-Speed 2497.43 samples/sec Loss 1.8799 LearningRate 0.000231 Epoch: 22 Global Step: 470800 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:13:15,721-Speed 2496.22 samples/sec Loss 1.9103 LearningRate 0.000231 Epoch: 22 Global Step: 470810 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:13:23,927-Speed 2496.28 samples/sec Loss 1.9069 LearningRate 0.000231 Epoch: 22 Global Step: 470820 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:13:32,077-Speed 2513.28 samples/sec Loss 1.8612 LearningRate 0.000231 Epoch: 22 Global Step: 470830 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:13:40,296-Speed 2492.27 samples/sec Loss 1.8894 LearningRate 0.000231 Epoch: 22 Global Step: 470840 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:13:48,497-Speed 2497.68 samples/sec Loss 1.8821 LearningRate 0.000231 Epoch: 22 Global Step: 470850 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:13:56,704-Speed 2495.75 samples/sec Loss 1.8694 LearningRate 0.000231 Epoch: 22 Global Step: 470860 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:14:04,907-Speed 2497.19 samples/sec Loss 1.8563 LearningRate 0.000231 Epoch: 22 Global Step: 470870 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:14:13,106-Speed 2498.10 samples/sec Loss 1.8748 LearningRate 0.000231 Epoch: 22 Global Step: 470880 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:14:21,265-Speed 2510.50 samples/sec Loss 1.9234 LearningRate 0.000231 Epoch: 22 Global Step: 470890 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:14:29,470-Speed 2496.37 samples/sec Loss 1.8897 LearningRate 0.000231 Epoch: 22 Global Step: 470900 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:14:37,672-Speed 2497.32 samples/sec Loss 1.8866 LearningRate 0.000231 Epoch: 22 Global Step: 470910 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:14:45,889-Speed 2492.87 samples/sec Loss 1.8794 LearningRate 0.000231 Epoch: 22 Global Step: 470920 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:14:54,106-Speed 2492.74 samples/sec Loss 1.8811 LearningRate 0.000231 Epoch: 22 Global Step: 470930 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:15:02,311-Speed 2496.66 samples/sec Loss 1.9189 LearningRate 0.000231 Epoch: 22 Global Step: 470940 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:15:10,465-Speed 2511.90 samples/sec Loss 1.8700 LearningRate 0.000231 Epoch: 22 Global Step: 470950 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:15:18,670-Speed 2496.45 samples/sec Loss 1.8620 LearningRate 0.000231 Epoch: 22 Global Step: 470960 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:15:26,881-Speed 2494.76 samples/sec Loss 1.8919 LearningRate 0.000231 Epoch: 22 Global Step: 470970 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:15:35,088-Speed 2496.00 samples/sec Loss 1.9561 LearningRate 0.000231 Epoch: 22 Global Step: 470980 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:15:43,296-Speed 2495.57 samples/sec Loss 1.8409 LearningRate 0.000231 Epoch: 22 Global Step: 470990 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:15:51,502-Speed 2496.05 samples/sec Loss 1.9032 LearningRate 0.000231 Epoch: 22 Global Step: 471000 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:15:59,668-Speed 2508.23 samples/sec Loss 1.8416 LearningRate 0.000231 Epoch: 22 Global Step: 471010 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:16:07,879-Speed 2494.72 samples/sec Loss 1.9135 LearningRate 0.000231 Epoch: 22 Global Step: 471020 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:16:16,090-Speed 2494.62 samples/sec Loss 1.8465 LearningRate 0.000231 Epoch: 22 Global Step: 471030 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:16:24,295-Speed 2496.47 samples/sec Loss 1.8933 LearningRate 0.000231 Epoch: 22 Global Step: 471040 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:16:32,509-Speed 2493.71 samples/sec Loss 1.8897 LearningRate 0.000231 Epoch: 22 Global Step: 471050 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:16:40,721-Speed 2494.41 samples/sec Loss 1.9319 LearningRate 0.000231 Epoch: 22 Global Step: 471060 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:16:48,869-Speed 2513.70 samples/sec Loss 1.9197 LearningRate 0.000231 Epoch: 22 Global Step: 471070 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:16:57,071-Speed 2497.42 samples/sec Loss 1.8740 LearningRate 0.000231 Epoch: 22 Global Step: 471080 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:17:05,274-Speed 2497.01 samples/sec Loss 1.9114 LearningRate 0.000231 Epoch: 22 Global Step: 471090 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:17:13,475-Speed 2497.50 samples/sec Loss 1.9275 LearningRate 0.000231 Epoch: 22 Global Step: 471100 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:17:21,682-Speed 2496.07 samples/sec Loss 1.9424 LearningRate 0.000231 Epoch: 22 Global Step: 471110 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:17:29,885-Speed 2496.91 samples/sec Loss 1.9370 LearningRate 0.000230 Epoch: 22 Global Step: 471120 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:17:38,032-Speed 2514.43 samples/sec Loss 1.9124 LearningRate 0.000230 Epoch: 22 Global Step: 471130 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:17:46,236-Speed 2496.73 samples/sec Loss 1.9474 LearningRate 0.000230 Epoch: 22 Global Step: 471140 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:17:54,456-Speed 2491.82 samples/sec Loss 1.9211 LearningRate 0.000230 Epoch: 22 Global Step: 471150 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:18:02,660-Speed 2496.63 samples/sec Loss 1.9204 LearningRate 0.000230 Epoch: 22 Global Step: 471160 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:18:10,867-Speed 2495.87 samples/sec Loss 1.9212 LearningRate 0.000230 Epoch: 22 Global Step: 471170 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:18:19,069-Speed 2497.07 samples/sec Loss 1.9788 LearningRate 0.000230 Epoch: 22 Global Step: 471180 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:18:27,218-Speed 2513.82 samples/sec Loss 1.8814 LearningRate 0.000230 Epoch: 22 Global Step: 471190 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:18:35,419-Speed 2497.71 samples/sec Loss 1.9222 LearningRate 0.000230 Epoch: 22 Global Step: 471200 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:18:43,625-Speed 2496.13 samples/sec Loss 1.9180 LearningRate 0.000230 Epoch: 22 Global Step: 471210 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:18:51,827-Speed 2497.35 samples/sec Loss 1.8827 LearningRate 0.000230 Epoch: 22 Global Step: 471220 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:19:00,028-Speed 2497.86 samples/sec Loss 1.8563 LearningRate 0.000230 Epoch: 22 Global Step: 471230 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:19:08,238-Speed 2494.56 samples/sec Loss 1.9028 LearningRate 0.000230 Epoch: 22 Global Step: 471240 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:19:16,393-Speed 2511.87 samples/sec Loss 1.9482 LearningRate 0.000230 Epoch: 22 Global Step: 471250 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:19:24,597-Speed 2496.56 samples/sec Loss 1.9051 LearningRate 0.000230 Epoch: 22 Global Step: 471260 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:19:32,801-Speed 2496.79 samples/sec Loss 1.9264 LearningRate 0.000230 Epoch: 22 Global Step: 471270 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:19:41,005-Speed 2496.63 samples/sec Loss 1.9132 LearningRate 0.000230 Epoch: 22 Global Step: 471280 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:19:49,208-Speed 2497.03 samples/sec Loss 1.8834 LearningRate 0.000230 Epoch: 22 Global Step: 471290 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:19:57,410-Speed 2497.54 samples/sec Loss 1.8348 LearningRate 0.000230 Epoch: 22 Global Step: 471300 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:20:05,570-Speed 2510.34 samples/sec Loss 1.9198 LearningRate 0.000230 Epoch: 22 Global Step: 471310 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:20:13,773-Speed 2497.04 samples/sec Loss 1.9173 LearningRate 0.000230 Epoch: 22 Global Step: 471320 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:20:21,981-Speed 2495.68 samples/sec Loss 1.8948 LearningRate 0.000230 Epoch: 22 Global Step: 471330 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:20:30,187-Speed 2496.21 samples/sec Loss 1.9019 LearningRate 0.000230 Epoch: 22 Global Step: 471340 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:20:38,395-Speed 2495.36 samples/sec Loss 1.8908 LearningRate 0.000230 Epoch: 22 Global Step: 471350 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:20:46,601-Speed 2496.32 samples/sec Loss 1.8972 LearningRate 0.000230 Epoch: 22 Global Step: 471360 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:20:54,753-Speed 2512.40 samples/sec Loss 1.9311 LearningRate 0.000230 Epoch: 22 Global Step: 471370 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:21:02,957-Speed 2496.71 samples/sec Loss 1.8704 LearningRate 0.000230 Epoch: 22 Global Step: 471380 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:21:11,161-Speed 2496.79 samples/sec Loss 1.9194 LearningRate 0.000230 Epoch: 22 Global Step: 471390 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:21:19,364-Speed 2497.15 samples/sec Loss 1.9380 LearningRate 0.000230 Epoch: 22 Global Step: 471400 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:21:27,566-Speed 2497.21 samples/sec Loss 1.8992 LearningRate 0.000230 Epoch: 22 Global Step: 471410 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:21:35,767-Speed 2497.55 samples/sec Loss 1.9140 LearningRate 0.000230 Epoch: 22 Global Step: 471420 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:21:43,926-Speed 2510.74 samples/sec Loss 1.8835 LearningRate 0.000230 Epoch: 22 Global Step: 471430 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:21:52,143-Speed 2492.71 samples/sec Loss 1.8993 LearningRate 0.000230 Epoch: 22 Global Step: 471440 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:22:00,343-Speed 2497.82 samples/sec Loss 1.9130 LearningRate 0.000230 Epoch: 22 Global Step: 471450 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:22:08,548-Speed 2496.53 samples/sec Loss 1.8536 LearningRate 0.000230 Epoch: 22 Global Step: 471460 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:22:16,752-Speed 2496.89 samples/sec Loss 1.8659 LearningRate 0.000230 Epoch: 22 Global Step: 471470 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:22:24,956-Speed 2496.76 samples/sec Loss 1.8846 LearningRate 0.000230 Epoch: 22 Global Step: 471480 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:22:33,105-Speed 2513.51 samples/sec Loss 1.8837 LearningRate 0.000230 Epoch: 22 Global Step: 471490 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:22:41,313-Speed 2495.65 samples/sec Loss 1.8415 LearningRate 0.000230 Epoch: 22 Global Step: 471500 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:22:49,519-Speed 2496.30 samples/sec Loss 1.9264 LearningRate 0.000230 Epoch: 22 Global Step: 471510 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:22:57,721-Speed 2497.24 samples/sec Loss 1.9161 LearningRate 0.000230 Epoch: 22 Global Step: 471520 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:23:05,929-Speed 2495.48 samples/sec Loss 1.9123 LearningRate 0.000230 Epoch: 22 Global Step: 471530 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:23:14,131-Speed 2497.48 samples/sec Loss 1.8909 LearningRate 0.000230 Epoch: 22 Global Step: 471540 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:23:22,280-Speed 2513.60 samples/sec Loss 1.8558 LearningRate 0.000230 Epoch: 22 Global Step: 471550 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:23:30,482-Speed 2497.28 samples/sec Loss 1.8786 LearningRate 0.000230 Epoch: 22 Global Step: 471560 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:23:38,685-Speed 2496.85 samples/sec Loss 1.8463 LearningRate 0.000230 Epoch: 22 Global Step: 471570 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:23:46,889-Speed 2497.40 samples/sec Loss 1.9290 LearningRate 0.000230 Epoch: 22 Global Step: 471580 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:23:55,095-Speed 2495.98 samples/sec Loss 1.9066 LearningRate 0.000230 Epoch: 22 Global Step: 471590 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:24:03,297-Speed 2497.40 samples/sec Loss 1.9285 LearningRate 0.000230 Epoch: 22 Global Step: 471600 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:24:11,464-Speed 2508.04 samples/sec Loss 1.8734 LearningRate 0.000230 Epoch: 22 Global Step: 471610 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:24:19,676-Speed 2494.47 samples/sec Loss 1.9242 LearningRate 0.000230 Epoch: 22 Global Step: 471620 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:24:27,882-Speed 2496.20 samples/sec Loss 1.8873 LearningRate 0.000230 Epoch: 22 Global Step: 471630 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:24:36,083-Speed 2497.65 samples/sec Loss 1.8822 LearningRate 0.000230 Epoch: 22 Global Step: 471640 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:24:44,287-Speed 2496.75 samples/sec Loss 1.9146 LearningRate 0.000230 Epoch: 22 Global Step: 471650 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:24:52,495-Speed 2495.40 samples/sec Loss 1.8682 LearningRate 0.000230 Epoch: 22 Global Step: 471660 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:25:00,651-Speed 2511.42 samples/sec Loss 1.8902 LearningRate 0.000230 Epoch: 22 Global Step: 471670 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:25:08,858-Speed 2495.86 samples/sec Loss 1.8746 LearningRate 0.000230 Epoch: 22 Global Step: 471680 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:25:17,068-Speed 2494.89 samples/sec Loss 1.9003 LearningRate 0.000230 Epoch: 22 Global Step: 471690 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:25:25,272-Speed 2496.61 samples/sec Loss 1.8617 LearningRate 0.000230 Epoch: 22 Global Step: 471700 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:25:33,484-Speed 2494.30 samples/sec Loss 1.8553 LearningRate 0.000230 Epoch: 22 Global Step: 471710 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:25:41,686-Speed 2497.42 samples/sec Loss 1.8551 LearningRate 0.000230 Epoch: 22 Global Step: 471720 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:25:49,871-Speed 2502.47 samples/sec Loss 1.9127 LearningRate 0.000230 Epoch: 22 Global Step: 471730 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:25:58,077-Speed 2496.27 samples/sec Loss 1.8852 LearningRate 0.000230 Epoch: 22 Global Step: 471740 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:26:06,280-Speed 2497.05 samples/sec Loss 1.9067 LearningRate 0.000230 Epoch: 22 Global Step: 471750 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:26:14,501-Speed 2491.86 samples/sec Loss 1.9224 LearningRate 0.000230 Epoch: 22 Global Step: 471760 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:26:22,712-Speed 2494.38 samples/sec Loss 1.8597 LearningRate 0.000230 Epoch: 22 Global Step: 471770 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:26:30,915-Speed 2496.99 samples/sec Loss 1.8858 LearningRate 0.000230 Epoch: 22 Global Step: 471780 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:26:39,070-Speed 2511.81 samples/sec Loss 1.8818 LearningRate 0.000230 Epoch: 22 Global Step: 471790 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:26:47,276-Speed 2496.25 samples/sec Loss 1.8645 LearningRate 0.000230 Epoch: 22 Global Step: 471800 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:26:55,476-Speed 2497.71 samples/sec Loss 1.8864 LearningRate 0.000230 Epoch: 22 Global Step: 471810 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:27:03,680-Speed 2497.29 samples/sec Loss 1.8847 LearningRate 0.000230 Epoch: 22 Global Step: 471820 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:27:11,884-Speed 2496.94 samples/sec Loss 1.8938 LearningRate 0.000230 Epoch: 22 Global Step: 471830 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:27:20,090-Speed 2496.17 samples/sec Loss 1.8716 LearningRate 0.000230 Epoch: 22 Global Step: 471840 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:27:28,248-Speed 2510.92 samples/sec Loss 1.9468 LearningRate 0.000230 Epoch: 22 Global Step: 471850 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:27:36,463-Speed 2493.49 samples/sec Loss 1.8917 LearningRate 0.000230 Epoch: 22 Global Step: 471860 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:27:44,667-Speed 2496.63 samples/sec Loss 1.8587 LearningRate 0.000230 Epoch: 22 Global Step: 471870 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:27:52,869-Speed 2497.34 samples/sec Loss 1.8597 LearningRate 0.000230 Epoch: 22 Global Step: 471880 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:28:01,074-Speed 2496.55 samples/sec Loss 1.8738 LearningRate 0.000230 Epoch: 22 Global Step: 471890 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:28:09,279-Speed 2496.31 samples/sec Loss 1.8953 LearningRate 0.000229 Epoch: 22 Global Step: 471900 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:28:17,439-Speed 2510.22 samples/sec Loss 1.8781 LearningRate 0.000229 Epoch: 22 Global Step: 471910 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:28:25,646-Speed 2496.25 samples/sec Loss 1.8709 LearningRate 0.000229 Epoch: 22 Global Step: 471920 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:28:33,855-Speed 2495.23 samples/sec Loss 1.8554 LearningRate 0.000229 Epoch: 22 Global Step: 471930 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:28:42,059-Speed 2496.77 samples/sec Loss 1.8639 LearningRate 0.000229 Epoch: 22 Global Step: 471940 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:28:50,260-Speed 2497.62 samples/sec Loss 1.8222 LearningRate 0.000229 Epoch: 22 Global Step: 471950 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:28:58,470-Speed 2495.27 samples/sec Loss 1.8832 LearningRate 0.000229 Epoch: 22 Global Step: 471960 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:29:06,620-Speed 2513.24 samples/sec Loss 1.8282 LearningRate 0.000229 Epoch: 22 Global Step: 471970 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:29:14,822-Speed 2497.37 samples/sec Loss 1.8932 LearningRate 0.000229 Epoch: 22 Global Step: 471980 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:29:23,035-Speed 2493.90 samples/sec Loss 1.8305 LearningRate 0.000229 Epoch: 22 Global Step: 471990 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:29:31,257-Speed 2491.42 samples/sec Loss 1.8681 LearningRate 0.000229 Epoch: 22 Global Step: 472000 Fp16 Grad Scale: 32768 
Required: 82 hours Training: 2022-07-10 02:29:39,459-Speed 2497.73 samples/sec Loss 1.8155 LearningRate 0.000229 Epoch: 22 Global Step: 472010 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:29:47,665-Speed 2495.90 samples/sec Loss 1.8663 LearningRate 0.000229 Epoch: 22 Global Step: 472020 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:29:55,813-Speed 2513.78 samples/sec Loss 1.8596 LearningRate 0.000229 Epoch: 22 Global Step: 472030 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:30:04,018-Speed 2496.31 samples/sec Loss 1.8213 LearningRate 0.000229 Epoch: 22 Global Step: 472040 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:30:12,225-Speed 2495.98 samples/sec Loss 1.8577 LearningRate 0.000229 Epoch: 22 Global Step: 472050 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:30:20,426-Speed 2497.57 samples/sec Loss 1.8479 LearningRate 0.000229 Epoch: 22 Global Step: 472060 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:30:28,625-Speed 2498.10 samples/sec Loss 1.8738 LearningRate 0.000229 Epoch: 22 Global Step: 472070 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:30:36,830-Speed 2496.14 samples/sec Loss 1.8793 LearningRate 0.000229 Epoch: 22 Global Step: 472080 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:30:44,982-Speed 2512.57 samples/sec Loss 1.8972 LearningRate 0.000229 Epoch: 22 Global Step: 472090 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:30:53,185-Speed 2497.24 samples/sec Loss 1.8494 LearningRate 0.000229 Epoch: 22 Global Step: 472100 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:31:01,398-Speed 2494.10 samples/sec Loss 1.8698 LearningRate 0.000229 Epoch: 22 Global Step: 472110 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:31:09,606-Speed 2495.47 samples/sec Loss 1.8941 LearningRate 0.000229 Epoch: 22 Global Step: 472120 Fp16 Grad Scale: 32768 
Required: 82 hours Training: 2022-07-10 02:31:17,808-Speed 2497.29 samples/sec Loss 1.8293 LearningRate 0.000229 Epoch: 22 Global Step: 472130 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:31:26,011-Speed 2497.19 samples/sec Loss 1.9121 LearningRate 0.000229 Epoch: 22 Global Step: 472140 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:31:34,168-Speed 2511.02 samples/sec Loss 1.8527 LearningRate 0.000229 Epoch: 22 Global Step: 472150 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:31:42,370-Speed 2497.26 samples/sec Loss 1.8823 LearningRate 0.000229 Epoch: 22 Global Step: 472160 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:31:50,578-Speed 2495.85 samples/sec Loss 1.8719 LearningRate 0.000229 Epoch: 22 Global Step: 472170 Fp16 Grad Scale: 32768 Required: 82 hours Training: 2022-07-10 02:31:58,737-Speed 2510.49 samples/sec Loss 1.8323 LearningRate 0.000229 Epoch: 22 Global Step: 472180 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:32:06,949-Speed 2494.24 samples/sec Loss 1.8878 LearningRate 0.000229 Epoch: 22 Global Step: 472190 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:32:15,152-Speed 2496.91 samples/sec Loss 1.8503 LearningRate 0.000229 Epoch: 22 Global Step: 472200 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:32:23,307-Speed 2511.84 samples/sec Loss 1.8497 LearningRate 0.000229 Epoch: 22 Global Step: 472210 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:32:31,522-Speed 2493.50 samples/sec Loss 1.8318 LearningRate 0.000229 Epoch: 22 Global Step: 472220 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:32:39,757-Speed 2498.78 samples/sec Loss 1.8166 LearningRate 0.000229 Epoch: 22 Global Step: 472230 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:32:47,967-Speed 2494.65 samples/sec Loss 1.8476 LearningRate 0.000229 Epoch: 22 Global Step: 472240 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:32:56,224-Speed 2497.59 samples/sec Loss 1.8634 LearningRate 0.000229 Epoch: 22 Global Step: 472250 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:33:04,483-Speed 2497.95 samples/sec Loss 1.8929 LearningRate 0.000229 Epoch: 22 Global Step: 472260 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:33:12,634-Speed 2512.77 samples/sec Loss 1.8243 LearningRate 0.000229 Epoch: 22 Global Step: 472270 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:33:20,879-Speed 2499.34 samples/sec Loss 1.8523 LearningRate 0.000229 Epoch: 22 Global Step: 472280 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:33:29,128-Speed 2499.12 samples/sec Loss 1.8860 LearningRate 0.000229 Epoch: 22 Global Step: 472290 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:33:37,330-Speed 2497.21 samples/sec Loss 1.8904 LearningRate 0.000229 Epoch: 22 Global Step: 472300 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:33:51,498-Speed 2499.31 samples/sec Loss 1.8887 LearningRate 0.000229 Epoch: 22 Global Step: 472310 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:33:59,738-Speed 2503.10 samples/sec Loss 1.8734 LearningRate 0.000229 Epoch: 22 Global Step: 472320 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:34:10,735-Speed 1869.29 samples/sec Loss 1.8660 LearningRate 0.000229 Epoch: 22 Global Step: 472330 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:34:18,926-Speed 2500.59 samples/sec Loss 1.8891 LearningRate 0.000229 Epoch: 22 Global Step: 472340 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:34:29,019-Speed 2029.41 samples/sec Loss 1.9058 LearningRate 0.000229 Epoch: 22 Global Step: 472350 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:34:37,543-Speed 2414.25 samples/sec Loss 1.8979 LearningRate 0.000229 Epoch: 22 Global Step: 472360 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:34:46,801-Speed 2500.79 samples/sec Loss 1.9133 LearningRate 0.000229 Epoch: 22 Global Step: 472370 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:34:55,006-Speed 2496.33 samples/sec Loss 1.9062 LearningRate 0.000229 Epoch: 22 Global Step: 472380 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:35:03,195-Speed 2515.40 samples/sec Loss 1.9351 LearningRate 0.000229 Epoch: 22 Global Step: 472390 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:35:11,464-Speed 2499.76 samples/sec Loss 1.8579 LearningRate 0.000229 Epoch: 22 Global Step: 472400 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:35:41,738-Speed 676.53 samples/sec Loss 1.9037 LearningRate 0.000229 Epoch: 22 Global Step: 472410 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:35:49,995-Speed 2506.48 samples/sec Loss 1.9239 LearningRate 0.000229 Epoch: 22 Global Step: 472420 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:35:58,243-Speed 2504.18 samples/sec Loss 1.8672 LearningRate 0.000229 Epoch: 22 Global Step: 472430 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:36:06,931-Speed 2417.30 samples/sec Loss 1.8489 LearningRate 0.000229 Epoch: 22 Global Step: 472440 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:36:15,079-Speed 2513.97 samples/sec Loss 1.8681 LearningRate 0.000229 Epoch: 22 Global Step: 472450 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:36:23,287-Speed 2495.65 samples/sec Loss 1.8885 LearningRate 0.000229 Epoch: 22 Global Step: 472460 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:36:31,492-Speed 2496.69 samples/sec Loss 1.8751 LearningRate 0.000229 Epoch: 22 Global Step: 472470 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:36:39,698-Speed 2496.26 samples/sec Loss 1.8991 LearningRate 0.000229 Epoch: 22 Global Step: 472480 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:36:47,913-Speed 2493.67 samples/sec Loss 1.8545 LearningRate 0.000229 Epoch: 22 Global Step: 472490 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:36:56,126-Speed 2493.70 samples/sec Loss 1.8705 LearningRate 0.000229 Epoch: 22 Global Step: 472500 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:37:04,291-Speed 2509.12 samples/sec Loss 1.8647 LearningRate 0.000229 Epoch: 22 Global Step: 472510 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:37:12,505-Speed 2493.83 samples/sec Loss 1.8416 LearningRate 0.000229 Epoch: 22 Global Step: 472520 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:37:20,715-Speed 2494.77 samples/sec Loss 1.8879 LearningRate 0.000229 Epoch: 22 Global Step: 472530 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:37:28,937-Speed 2491.37 samples/sec Loss 1.9453 LearningRate 0.000229 Epoch: 22 Global Step: 472540 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:37:37,144-Speed 2495.52 samples/sec Loss 1.8412 LearningRate 0.000229 Epoch: 22 Global Step: 472550 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:37:45,349-Speed 2496.55 samples/sec Loss 1.8563 LearningRate 0.000229 Epoch: 22 Global Step: 472560 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:37:53,499-Speed 2513.40 samples/sec Loss 1.9149 LearningRate 0.000229 Epoch: 22 Global Step: 472570 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:38:01,702-Speed 2496.87 samples/sec Loss 1.9082 LearningRate 0.000229 Epoch: 22 Global Step: 472580 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:38:09,913-Speed 2494.55 samples/sec Loss 1.8747 LearningRate 0.000229 Epoch: 22 Global Step: 472590 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:38:18,125-Speed 2494.45 samples/sec Loss 1.8886 LearningRate 0.000229 Epoch: 22 Global Step: 472600 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:38:26,326-Speed 2497.60 samples/sec Loss 1.8728 LearningRate 0.000229 Epoch: 22 Global Step: 472610 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:38:34,533-Speed 2495.82 samples/sec Loss 1.9563 LearningRate 0.000229 Epoch: 22 Global Step: 472620 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:38:42,685-Speed 2512.60 samples/sec Loss 1.8446 LearningRate 0.000229 Epoch: 22 Global Step: 472630 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:38:50,891-Speed 2496.20 samples/sec Loss 1.9018 LearningRate 0.000229 Epoch: 22 Global Step: 472640 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:38:59,102-Speed 2494.75 samples/sec Loss 1.8903 LearningRate 0.000229 Epoch: 22 Global Step: 472650 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:39:07,306-Speed 2496.79 samples/sec Loss 1.8679 LearningRate 0.000229 Epoch: 22 Global Step: 472660 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:39:15,512-Speed 2496.01 samples/sec Loss 1.8604 LearningRate 0.000229 Epoch: 22 Global Step: 472670 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:39:23,720-Speed 2495.58 samples/sec Loss 1.8933 LearningRate 0.000228 Epoch: 22 Global Step: 472680 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:39:31,870-Speed 2513.05 samples/sec Loss 1.8749 LearningRate 0.000228 Epoch: 22 Global Step: 472690 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:39:40,078-Speed 2495.54 samples/sec Loss 1.9215 LearningRate 0.000228 Epoch: 22 Global Step: 472700 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:39:48,283-Speed 2496.21 samples/sec Loss 1.8770 LearningRate 0.000228 Epoch: 22 Global Step: 472710 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:39:56,491-Speed 2495.84 samples/sec Loss 1.8906 LearningRate 0.000228 Epoch: 22 Global Step: 472720 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:40:04,699-Speed 2495.48 samples/sec Loss 1.9048 LearningRate 0.000228 Epoch: 22 Global Step: 472730 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:40:12,909-Speed 2494.73 samples/sec Loss 1.8525 LearningRate 0.000228 Epoch: 22 Global Step: 472740 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:40:21,064-Speed 2511.86 samples/sec Loss 1.9002 LearningRate 0.000228 Epoch: 22 Global Step: 472750 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:40:29,290-Speed 2490.11 samples/sec Loss 1.9053 LearningRate 0.000228 Epoch: 22 Global Step: 472760 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:40:37,499-Speed 2495.00 samples/sec Loss 1.8827 LearningRate 0.000228 Epoch: 22 Global Step: 472770 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:40:45,706-Speed 2495.78 samples/sec Loss 1.8901 LearningRate 0.000228 Epoch: 22 Global Step: 472780 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:40:53,910-Speed 2496.96 samples/sec Loss 1.8830 LearningRate 0.000228 Epoch: 22 Global Step: 472790 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:41:02,115-Speed 2496.30 samples/sec Loss 1.9133 LearningRate 0.000228 Epoch: 22 Global Step: 472800 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:41:10,270-Speed 2511.45 samples/sec Loss 1.8861 LearningRate 0.000228 Epoch: 22 Global Step: 472810 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:41:18,473-Speed 2497.05 samples/sec Loss 1.9235 LearningRate 0.000228 Epoch: 22 Global Step: 472820 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:41:26,679-Speed 2496.41 samples/sec Loss 1.8770 LearningRate 0.000228 Epoch: 22 Global Step: 472830 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:41:34,892-Speed 2493.97 samples/sec Loss 1.8702 LearningRate 0.000228 Epoch: 22 Global Step: 472840 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:41:43,107-Speed 2493.18 samples/sec Loss 1.8660 LearningRate 0.000228 Epoch: 22 Global Step: 472850 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:41:51,315-Speed 2495.47 samples/sec Loss 1.9012 LearningRate 0.000228 Epoch: 22 Global Step: 472860 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:41:59,468-Speed 2512.55 samples/sec Loss 1.8710 LearningRate 0.000228 Epoch: 22 Global Step: 472870 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:42:07,675-Speed 2495.70 samples/sec Loss 1.8855 LearningRate 0.000228 Epoch: 22 Global Step: 472880 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:42:15,882-Speed 2495.87 samples/sec Loss 1.8855 LearningRate 0.000228 Epoch: 22 Global Step: 472890 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:42:24,086-Speed 2496.73 samples/sec Loss 1.8172 LearningRate 0.000228 Epoch: 22 Global Step: 472900 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:42:32,293-Speed 2495.81 samples/sec Loss 1.8656 LearningRate 0.000228 Epoch: 22 Global Step: 472910 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:42:40,512-Speed 2492.45 samples/sec Loss 1.8492 LearningRate 0.000228 Epoch: 22 Global Step: 472920 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:42:48,666-Speed 2512.14 samples/sec Loss 1.8573 LearningRate 0.000228 Epoch: 22 Global Step: 472930 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:42:56,877-Speed 2494.57 samples/sec Loss 1.8618 LearningRate 0.000228 Epoch: 22 Global Step: 472940 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:43:05,084-Speed 2495.82 samples/sec Loss 1.8465 LearningRate 0.000228 Epoch: 22 Global Step: 472950 Fp16 Grad Scale: 16384 Required: 82 hours Training: 2022-07-10 02:43:13,298-Speed 2493.80 samples/sec Loss 1.8467 LearningRate 0.000228 Epoch: 22 Global Step: 472960 Fp16 Grad Scale: 16384 
Required: 82 hours Training: 2022-07-10 02:43:21,480-Speed 2503.44 samples/sec Loss 1.9133 LearningRate 0.000228 Epoch: 22 Global Step: 472970 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:43:29,700-Speed 2492.05 samples/sec Loss 1.8836 LearningRate 0.000228 Epoch: 22 Global Step: 472980 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:43:37,849-Speed 2513.50 samples/sec Loss 1.8490 LearningRate 0.000228 Epoch: 22 Global Step: 472990 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:43:46,053-Speed 2496.70 samples/sec Loss 1.8771 LearningRate 0.000228 Epoch: 22 Global Step: 473000 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:43:54,252-Speed 2498.30 samples/sec Loss 1.8793 LearningRate 0.000228 Epoch: 22 Global Step: 473010 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:44:02,454-Speed 2497.38 samples/sec Loss 1.8678 LearningRate 0.000228 Epoch: 22 Global Step: 473020 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:44:10,662-Speed 2495.60 samples/sec Loss 1.8993 LearningRate 0.000228 Epoch: 22 Global Step: 473030 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:44:18,869-Speed 2495.81 samples/sec Loss 1.8964 LearningRate 0.000228 Epoch: 22 Global Step: 473040 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:44:27,019-Speed 2513.45 samples/sec Loss 1.9403 LearningRate 0.000228 Epoch: 22 Global Step: 473050 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:44:35,220-Speed 2497.81 samples/sec Loss 1.8728 LearningRate 0.000228 Epoch: 22 Global Step: 473060 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:44:43,419-Speed 2498.51 samples/sec Loss 1.8995 LearningRate 0.000228 Epoch: 22 Global Step: 473070 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:44:51,623-Speed 2496.59 samples/sec Loss 1.9405 LearningRate 0.000228 Epoch: 22 Global Step: 473080 Fp16 Grad Scale: 8192 Required: 82 hours 
Training: 2022-07-10 02:44:59,836-Speed 2493.99 samples/sec Loss 1.9167 LearningRate 0.000228 Epoch: 22 Global Step: 473090 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:45:08,042-Speed 2496.26 samples/sec Loss 1.8627 LearningRate 0.000228 Epoch: 22 Global Step: 473100 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:45:16,199-Speed 2511.11 samples/sec Loss 1.8733 LearningRate 0.000228 Epoch: 22 Global Step: 473110 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:45:24,407-Speed 2495.38 samples/sec Loss 1.9096 LearningRate 0.000228 Epoch: 22 Global Step: 473120 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:45:32,629-Speed 2491.50 samples/sec Loss 1.8928 LearningRate 0.000228 Epoch: 22 Global Step: 473130 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:45:40,840-Speed 2494.74 samples/sec Loss 1.8573 LearningRate 0.000228 Epoch: 22 Global Step: 473140 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:45:49,044-Speed 2496.75 samples/sec Loss 1.8743 LearningRate 0.000228 Epoch: 22 Global Step: 473150 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:45:57,260-Speed 2492.84 samples/sec Loss 1.9389 LearningRate 0.000228 Epoch: 22 Global Step: 473160 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:46:05,426-Speed 2508.50 samples/sec Loss 1.8552 LearningRate 0.000228 Epoch: 22 Global Step: 473170 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:46:13,635-Speed 2495.06 samples/sec Loss 1.8645 LearningRate 0.000228 Epoch: 22 Global Step: 473180 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:46:21,840-Speed 2496.41 samples/sec Loss 1.8596 LearningRate 0.000228 Epoch: 22 Global Step: 473190 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:46:30,044-Speed 2496.86 samples/sec Loss 1.8589 LearningRate 0.000228 Epoch: 22 Global Step: 473200 Fp16 Grad Scale: 8192 Required: 82 hours Training: 
2022-07-10 02:46:38,246-Speed 2497.37 samples/sec Loss 1.8368 LearningRate 0.000228 Epoch: 22 Global Step: 473210 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:46:46,452-Speed 2496.32 samples/sec Loss 1.8581 LearningRate 0.000228 Epoch: 22 Global Step: 473220 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:46:54,607-Speed 2511.82 samples/sec Loss 1.8819 LearningRate 0.000228 Epoch: 22 Global Step: 473230 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:47:02,821-Speed 2493.50 samples/sec Loss 1.8661 LearningRate 0.000228 Epoch: 22 Global Step: 473240 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:47:11,024-Speed 2497.23 samples/sec Loss 1.8538 LearningRate 0.000228 Epoch: 22 Global Step: 473250 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:47:19,224-Speed 2497.95 samples/sec Loss 1.8383 LearningRate 0.000228 Epoch: 22 Global Step: 473260 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:47:27,431-Speed 2495.96 samples/sec Loss 1.8305 LearningRate 0.000228 Epoch: 22 Global Step: 473270 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:47:35,637-Speed 2496.05 samples/sec Loss 1.8308 LearningRate 0.000228 Epoch: 22 Global Step: 473280 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:47:43,784-Speed 2514.43 samples/sec Loss 1.8059 LearningRate 0.000228 Epoch: 22 Global Step: 473290 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:47:51,984-Speed 2497.88 samples/sec Loss 1.8773 LearningRate 0.000228 Epoch: 22 Global Step: 473300 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:48:00,188-Speed 2496.82 samples/sec Loss 1.8796 LearningRate 0.000228 Epoch: 22 Global Step: 473310 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:48:08,392-Speed 2496.76 samples/sec Loss 1.8734 LearningRate 0.000228 Epoch: 22 Global Step: 473320 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 
02:48:16,600-Speed 2495.70 samples/sec Loss 1.9124 LearningRate 0.000228 Epoch: 22 Global Step: 473330 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:48:24,805-Speed 2496.28 samples/sec Loss 1.9550 LearningRate 0.000228 Epoch: 22 Global Step: 473340 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:48:32,955-Speed 2513.16 samples/sec Loss 1.8857 LearningRate 0.000228 Epoch: 22 Global Step: 473350 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:48:41,158-Speed 2497.17 samples/sec Loss 1.9314 LearningRate 0.000228 Epoch: 22 Global Step: 473360 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:48:49,360-Speed 2497.41 samples/sec Loss 1.8877 LearningRate 0.000228 Epoch: 22 Global Step: 473370 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:48:57,563-Speed 2497.04 samples/sec Loss 1.8740 LearningRate 0.000228 Epoch: 22 Global Step: 473380 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:49:05,767-Speed 2496.42 samples/sec Loss 1.8849 LearningRate 0.000228 Epoch: 22 Global Step: 473390 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:49:13,972-Speed 2496.49 samples/sec Loss 1.9264 LearningRate 0.000228 Epoch: 22 Global Step: 473400 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:49:22,147-Speed 2505.71 samples/sec Loss 1.9083 LearningRate 0.000228 Epoch: 22 Global Step: 473410 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:49:30,375-Speed 2489.37 samples/sec Loss 1.9115 LearningRate 0.000228 Epoch: 22 Global Step: 473420 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:49:38,579-Speed 2496.63 samples/sec Loss 1.8744 LearningRate 0.000228 Epoch: 22 Global Step: 473430 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:49:46,787-Speed 2495.59 samples/sec Loss 1.8639 LearningRate 0.000228 Epoch: 22 Global Step: 473440 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:49:54,992-Speed 
2496.38 samples/sec Loss 1.8818 LearningRate 0.000228 Epoch: 22 Global Step: 473450 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:50:03,193-Speed 2497.68 samples/sec Loss 1.8949 LearningRate 0.000227 Epoch: 22 Global Step: 473460 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:50:11,344-Speed 2512.91 samples/sec Loss 1.8543 LearningRate 0.000227 Epoch: 22 Global Step: 473470 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:50:19,546-Speed 2497.42 samples/sec Loss 1.8807 LearningRate 0.000227 Epoch: 22 Global Step: 473480 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:50:27,746-Speed 2497.66 samples/sec Loss 1.8803 LearningRate 0.000227 Epoch: 22 Global Step: 473490 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:50:35,951-Speed 2496.55 samples/sec Loss 1.8740 LearningRate 0.000227 Epoch: 22 Global Step: 473500 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:50:44,159-Speed 2495.37 samples/sec Loss 1.8505 LearningRate 0.000227 Epoch: 22 Global Step: 473510 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:50:52,361-Speed 2497.45 samples/sec Loss 1.8565 LearningRate 0.000227 Epoch: 22 Global Step: 473520 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:51:00,511-Speed 2513.33 samples/sec Loss 1.8763 LearningRate 0.000227 Epoch: 22 Global Step: 473530 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:51:08,712-Speed 2497.54 samples/sec Loss 1.8733 LearningRate 0.000227 Epoch: 22 Global Step: 473540 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:51:16,929-Speed 2492.89 samples/sec Loss 1.9097 LearningRate 0.000227 Epoch: 22 Global Step: 473550 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:51:25,133-Speed 2496.70 samples/sec Loss 1.9372 LearningRate 0.000227 Epoch: 22 Global Step: 473560 Fp16 Grad Scale: 8192 Required: 82 hours Training: 2022-07-10 02:51:33,337-Speed 2496.83 samples/sec 
Loss 1.8921 LearningRate 0.000227 Epoch: 22 Global Step: 473570 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:51:41,542-Speed 2496.35 samples/sec Loss 1.8678 LearningRate 0.000227 Epoch: 22 Global Step: 473580 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:51:49,688-Speed 2514.44 samples/sec Loss 1.9154 LearningRate 0.000227 Epoch: 22 Global Step: 473590 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:51:57,902-Speed 2493.71 samples/sec Loss 1.8585 LearningRate 0.000227 Epoch: 22 Global Step: 473600 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:52:06,122-Speed 2491.90 samples/sec Loss 1.9196 LearningRate 0.000227 Epoch: 22 Global Step: 473610 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:52:14,333-Speed 2494.52 samples/sec Loss 1.9091 LearningRate 0.000227 Epoch: 22 Global Step: 473620 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:52:22,540-Speed 2495.72 samples/sec Loss 1.9117 LearningRate 0.000227 Epoch: 22 Global Step: 473630 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:52:30,744-Speed 2496.89 samples/sec Loss 1.8642 LearningRate 0.000227 Epoch: 22 Global Step: 473640 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:52:38,898-Speed 2511.86 samples/sec Loss 1.8484 LearningRate 0.000227 Epoch: 22 Global Step: 473650 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:52:47,102-Speed 2496.70 samples/sec Loss 1.8876 LearningRate 0.000227 Epoch: 22 Global Step: 473660 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:52:55,308-Speed 2496.13 samples/sec Loss 1.8629 LearningRate 0.000227 Epoch: 22 Global Step: 473670 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:53:03,512-Speed 2496.92 samples/sec Loss 1.8652 LearningRate 0.000227 Epoch: 22 Global Step: 473680 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:53:11,719-Speed 2495.52 samples/sec Loss 1.8693 
LearningRate 0.000227 Epoch: 22 Global Step: 473690 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:53:19,925-Speed 2496.21 samples/sec Loss 1.8860 LearningRate 0.000227 Epoch: 22 Global Step: 473700 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:53:28,074-Speed 2513.54 samples/sec Loss 1.8408 LearningRate 0.000227 Epoch: 22 Global Step: 473710 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:53:36,290-Speed 2493.27 samples/sec Loss 1.8951 LearningRate 0.000227 Epoch: 22 Global Step: 473720 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:53:44,504-Speed 2493.72 samples/sec Loss 1.8555 LearningRate 0.000227 Epoch: 22 Global Step: 473730 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:53:52,711-Speed 2495.77 samples/sec Loss 1.9019 LearningRate 0.000227 Epoch: 22 Global Step: 473740 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:54:00,913-Speed 2497.49 samples/sec Loss 1.9029 LearningRate 0.000227 Epoch: 22 Global Step: 473750 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:54:09,122-Speed 2495.27 samples/sec Loss 1.9237 LearningRate 0.000227 Epoch: 22 Global Step: 473760 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:54:17,281-Speed 2510.60 samples/sec Loss 1.8418 LearningRate 0.000227 Epoch: 22 Global Step: 473770 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:54:25,484-Speed 2497.11 samples/sec Loss 1.9328 LearningRate 0.000227 Epoch: 22 Global Step: 473780 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:54:33,699-Speed 2493.59 samples/sec Loss 1.8899 LearningRate 0.000227 Epoch: 22 Global Step: 473790 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:54:41,900-Speed 2497.52 samples/sec Loss 1.8905 LearningRate 0.000227 Epoch: 22 Global Step: 473800 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:54:50,104-Speed 2497.00 samples/sec Loss 1.8488 LearningRate 
0.000227 Epoch: 22 Global Step: 473810 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:54:58,311-Speed 2495.73 samples/sec Loss 1.8328 LearningRate 0.000227 Epoch: 22 Global Step: 473820 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:55:06,464-Speed 2512.53 samples/sec Loss 1.8713 LearningRate 0.000227 Epoch: 22 Global Step: 473830 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:55:14,679-Speed 2493.14 samples/sec Loss 1.9239 LearningRate 0.000227 Epoch: 22 Global Step: 473840 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:55:22,881-Speed 2497.47 samples/sec Loss 1.9044 LearningRate 0.000227 Epoch: 22 Global Step: 473850 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:55:31,086-Speed 2496.42 samples/sec Loss 1.9201 LearningRate 0.000227 Epoch: 22 Global Step: 473860 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:55:39,290-Speed 2496.95 samples/sec Loss 1.8932 LearningRate 0.000227 Epoch: 22 Global Step: 473870 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:55:47,493-Speed 2497.16 samples/sec Loss 1.9020 LearningRate 0.000227 Epoch: 22 Global Step: 473880 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:55:55,642-Speed 2513.48 samples/sec Loss 1.8698 LearningRate 0.000227 Epoch: 22 Global Step: 473890 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:56:03,850-Speed 2495.65 samples/sec Loss 1.8880 LearningRate 0.000227 Epoch: 22 Global Step: 473900 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:56:12,055-Speed 2496.72 samples/sec Loss 1.8875 LearningRate 0.000227 Epoch: 22 Global Step: 473910 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:56:20,284-Speed 2489.16 samples/sec Loss 1.9016 LearningRate 0.000227 Epoch: 22 Global Step: 473920 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:56:28,490-Speed 2495.91 samples/sec Loss 1.8758 LearningRate 0.000227 Epoch: 22 
Global Step: 473930 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:56:36,691-Speed 2497.65 samples/sec Loss 1.8550 LearningRate 0.000227 Epoch: 22 Global Step: 473940 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:56:44,844-Speed 2512.56 samples/sec Loss 1.9231 LearningRate 0.000227 Epoch: 22 Global Step: 473950 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:56:53,050-Speed 2496.06 samples/sec Loss 1.8625 LearningRate 0.000227 Epoch: 22 Global Step: 473960 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:57:01,256-Speed 2496.19 samples/sec Loss 1.8419 LearningRate 0.000227 Epoch: 22 Global Step: 473970 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:57:09,465-Speed 2495.21 samples/sec Loss 1.9037 LearningRate 0.000227 Epoch: 22 Global Step: 473980 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:57:17,670-Speed 2496.24 samples/sec Loss 1.8515 LearningRate 0.000227 Epoch: 22 Global Step: 473990 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:57:25,879-Speed 2495.33 samples/sec Loss 1.8495 LearningRate 0.000227 Epoch: 22 Global Step: 474000 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:57:34,027-Speed 2513.96 samples/sec Loss 1.8991 LearningRate 0.000227 Epoch: 22 Global Step: 474010 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:57:42,227-Speed 2497.74 samples/sec Loss 1.8883 LearningRate 0.000227 Epoch: 22 Global Step: 474020 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:57:50,441-Speed 2493.84 samples/sec Loss 1.8367 LearningRate 0.000227 Epoch: 22 Global Step: 474030 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:57:58,643-Speed 2497.39 samples/sec Loss 1.8602 LearningRate 0.000227 Epoch: 22 Global Step: 474040 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:58:06,846-Speed 2497.12 samples/sec Loss 1.8537 LearningRate 0.000227 Epoch: 22 Global Step: 474050 
Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:58:15,048-Speed 2497.56 samples/sec Loss 1.9027 LearningRate 0.000227 Epoch: 22 Global Step: 474060 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:58:23,199-Speed 2512.92 samples/sec Loss 1.8874 LearningRate 0.000227 Epoch: 22 Global Step: 474070 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:58:31,401-Speed 2497.51 samples/sec Loss 1.8504 LearningRate 0.000227 Epoch: 22 Global Step: 474080 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:58:39,601-Speed 2497.99 samples/sec Loss 1.8947 LearningRate 0.000227 Epoch: 22 Global Step: 474090 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:58:47,807-Speed 2496.39 samples/sec Loss 1.8562 LearningRate 0.000227 Epoch: 22 Global Step: 474100 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:58:56,012-Speed 2496.21 samples/sec Loss 1.8313 LearningRate 0.000227 Epoch: 22 Global Step: 474110 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:59:04,216-Speed 2497.16 samples/sec Loss 1.8583 LearningRate 0.000227 Epoch: 22 Global Step: 474120 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:59:12,365-Speed 2513.40 samples/sec Loss 1.8713 LearningRate 0.000227 Epoch: 22 Global Step: 474130 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:59:20,574-Speed 2495.09 samples/sec Loss 1.8620 LearningRate 0.000227 Epoch: 22 Global Step: 474140 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:59:28,779-Speed 2496.49 samples/sec Loss 1.8453 LearningRate 0.000227 Epoch: 22 Global Step: 474150 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:59:36,983-Speed 2497.36 samples/sec Loss 1.8768 LearningRate 0.000227 Epoch: 22 Global Step: 474160 Fp16 Grad Scale: 8192 Required: 81 hours Training: 2022-07-10 02:59:45,196-Speed 2494.06 samples/sec Loss 1.8478 LearningRate 0.000227 Epoch: 22 Global Step: 474170 Fp16 Grad Scale: 
16384 Required: 81 hours Training: 2022-07-10 02:59:53,397-Speed 2497.32 samples/sec Loss 1.7949 LearningRate 0.000227 Epoch: 22 Global Step: 474180 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:00:01,570-Speed 2506.23 samples/sec Loss 1.8372 LearningRate 0.000227 Epoch: 22 Global Step: 474190 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:00:09,776-Speed 2496.34 samples/sec Loss 1.8945 LearningRate 0.000227 Epoch: 22 Global Step: 474200 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:00:17,983-Speed 2495.97 samples/sec Loss 1.8470 LearningRate 0.000227 Epoch: 22 Global Step: 474210 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:00:26,186-Speed 2496.86 samples/sec Loss 1.8551 LearningRate 0.000227 Epoch: 22 Global Step: 474220 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:00:34,403-Speed 2492.88 samples/sec Loss 1.8894 LearningRate 0.000227 Epoch: 22 Global Step: 474230 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:00:42,606-Speed 2496.90 samples/sec Loss 1.8760 LearningRate 0.000226 Epoch: 22 Global Step: 474240 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:00:50,762-Speed 2511.30 samples/sec Loss 1.8243 LearningRate 0.000226 Epoch: 22 Global Step: 474250 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:00:58,964-Speed 2497.40 samples/sec Loss 1.8788 LearningRate 0.000226 Epoch: 22 Global Step: 474260 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:01:07,175-Speed 2494.54 samples/sec Loss 1.8776 LearningRate 0.000226 Epoch: 22 Global Step: 474270 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:01:15,377-Speed 2497.81 samples/sec Loss 1.8567 LearningRate 0.000226 Epoch: 22 Global Step: 474280 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:01:23,597-Speed 2491.79 samples/sec Loss 1.8532 LearningRate 0.000226 Epoch: 22 Global Step: 474290 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:01:31,799-Speed 2497.33 samples/sec Loss 1.9003 LearningRate 0.000226 Epoch: 22 Global Step: 474300 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:01:39,949-Speed 2513.44 samples/sec Loss 1.8680 LearningRate 0.000226 Epoch: 22 Global Step: 474310 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:01:48,154-Speed 2496.41 samples/sec Loss 1.8341 LearningRate 0.000226 Epoch: 22 Global Step: 474320 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:01:56,369-Speed 2493.47 samples/sec Loss 1.8567 LearningRate 0.000226 Epoch: 22 Global Step: 474330 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:02:04,571-Speed 2497.44 samples/sec Loss 1.9120 LearningRate 0.000226 Epoch: 22 Global Step: 474340 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:02:12,784-Speed 2493.85 samples/sec Loss 1.8622 LearningRate 0.000226 Epoch: 22 Global Step: 474350 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:02:20,988-Speed 2496.90 samples/sec Loss 1.8446 LearningRate 0.000226 Epoch: 22 Global Step: 474360 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:02:29,136-Speed 2514.03 samples/sec Loss 1.8147 LearningRate 0.000226 Epoch: 22 Global Step: 474370 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:02:37,338-Speed 2497.35 samples/sec Loss 1.8787 LearningRate 0.000226 Epoch: 22 Global Step: 474380 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:02:45,541-Speed 2497.06 samples/sec Loss 1.8475 LearningRate 0.000226 Epoch: 22 Global Step: 474390 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:02:53,747-Speed 2496.28 samples/sec Loss 1.8648 LearningRate 0.000226 Epoch: 22 Global Step: 474400 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:03:01,951-Speed 2496.46 samples/sec Loss 1.8944 LearningRate 0.000226 Epoch: 22 Global Step: 474410 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:03:10,152-Speed 2497.81 samples/sec Loss 1.8356 LearningRate 0.000226 Epoch: 22 Global Step: 474420 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:03:18,298-Speed 2514.50 samples/sec Loss 1.8308 LearningRate 0.000226 Epoch: 22 Global Step: 474430 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:03:26,501-Speed 2497.35 samples/sec Loss 1.8779 LearningRate 0.000226 Epoch: 22 Global Step: 474440 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:03:34,700-Speed 2498.35 samples/sec Loss 1.8687 LearningRate 0.000226 Epoch: 22 Global Step: 474450 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:03:42,900-Speed 2497.89 samples/sec Loss 1.8536 LearningRate 0.000226 Epoch: 22 Global Step: 474460 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:03:51,113-Speed 2493.86 samples/sec Loss 1.8769 LearningRate 0.000226 Epoch: 22 Global Step: 474470 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:03:59,316-Speed 2497.21 samples/sec Loss 1.8412 LearningRate 0.000226 Epoch: 22 Global Step: 474480 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:04:07,472-Speed 2511.44 samples/sec Loss 1.8304 LearningRate 0.000226 Epoch: 22 Global Step: 474490 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:04:15,676-Speed 2496.62 samples/sec Loss 1.8735 LearningRate 0.000226 Epoch: 22 Global Step: 474500 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:04:23,882-Speed 2496.20 samples/sec Loss 1.8465 LearningRate 0.000226 Epoch: 22 Global Step: 474510 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:04:32,088-Speed 2496.04 samples/sec Loss 1.8692 LearningRate 0.000226 Epoch: 22 Global Step: 474520 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:04:40,288-Speed 2497.93 samples/sec Loss 1.8229 LearningRate 0.000226 Epoch: 22 Global Step: 474530 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:04:48,489-Speed 2497.99 samples/sec Loss 1.8560 LearningRate 0.000226 Epoch: 22 Global Step: 474540 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:04:56,638-Speed 2513.67 samples/sec Loss 1.8201 LearningRate 0.000226 Epoch: 22 Global Step: 474550 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:05:04,836-Speed 2498.41 samples/sec Loss 1.8427 LearningRate 0.000226 Epoch: 22 Global Step: 474560 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:05:13,057-Speed 2491.57 samples/sec Loss 1.8437 LearningRate 0.000226 Epoch: 22 Global Step: 474570 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:05:21,260-Speed 2497.18 samples/sec Loss 1.8416 LearningRate 0.000226 Epoch: 22 Global Step: 474580 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:05:29,463-Speed 2497.01 samples/sec Loss 1.8012 LearningRate 0.000226 Epoch: 22 Global Step: 474590 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:05:37,668-Speed 2496.18 samples/sec Loss 1.8658 LearningRate 0.000226 Epoch: 22 Global Step: 474600 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:05:45,827-Speed 2510.59 samples/sec Loss 1.8257 LearningRate 0.000226 Epoch: 22 Global Step: 474610 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:05:54,033-Speed 2496.06 samples/sec Loss 1.8257 LearningRate 0.000226 Epoch: 22 Global Step: 474620 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:06:02,239-Speed 2496.33 samples/sec Loss 1.8611 LearningRate 0.000226 Epoch: 22 Global Step: 474630 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:06:10,461-Speed 2491.21 samples/sec Loss 1.8659 LearningRate 0.000226 Epoch: 22 Global Step: 474640 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:06:18,663-Speed 2497.31 samples/sec Loss 1.8172 LearningRate 0.000226 Epoch: 22 Global Step: 474650 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:06:26,863-Speed 2497.86 samples/sec Loss 1.8775 LearningRate 0.000226 Epoch: 22 Global Step: 474660 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:06:35,017-Speed 2512.17 samples/sec Loss 1.8707 LearningRate 0.000226 Epoch: 22 Global Step: 474670 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:06:43,220-Speed 2497.06 samples/sec Loss 1.8674 LearningRate 0.000226 Epoch: 22 Global Step: 474680 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:06:51,428-Speed 2495.41 samples/sec Loss 1.8175 LearningRate 0.000226 Epoch: 22 Global Step: 474690 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:06:59,646-Speed 2492.55 samples/sec Loss 1.8692 LearningRate 0.000226 Epoch: 22 Global Step: 474700 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:07:07,848-Speed 2497.38 samples/sec Loss 1.8131 LearningRate 0.000226 Epoch: 22 Global Step: 474710 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:07:16,052-Speed 2496.67 samples/sec Loss 1.8586 LearningRate 0.000226 Epoch: 22 Global Step: 474720 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:07:24,202-Speed 2513.55 samples/sec Loss 1.8435 LearningRate 0.000226 Epoch: 22 Global Step: 474730 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:07:32,410-Speed 2495.47 samples/sec Loss 1.8133 LearningRate 0.000226 Epoch: 22 Global Step: 474740 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:07:40,629-Speed 2492.56 samples/sec Loss 1.8479 LearningRate 0.000226 Epoch: 22 Global Step: 474750 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:07:48,835-Speed 2496.05 samples/sec Loss 1.8516 LearningRate 0.000226 Epoch: 22 Global Step: 474760 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:07:57,039-Speed 2496.47 samples/sec Loss 1.8140 LearningRate 0.000226 Epoch: 22 Global Step: 474770 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:08:05,244-Speed 2496.55 samples/sec Loss 1.8640 LearningRate 0.000226 Epoch: 22 Global Step: 474780 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:08:13,398-Speed 2512.03 samples/sec Loss 1.8418 LearningRate 0.000226 Epoch: 22 Global Step: 474790 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:08:21,614-Speed 2493.03 samples/sec Loss 1.8604 LearningRate 0.000226 Epoch: 22 Global Step: 474800 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:08:29,824-Speed 2494.94 samples/sec Loss 1.8052 LearningRate 0.000226 Epoch: 22 Global Step: 474810 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:08:38,030-Speed 2496.48 samples/sec Loss 1.8342 LearningRate 0.000226 Epoch: 22 Global Step: 474820 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:08:46,236-Speed 2496.00 samples/sec Loss 1.8578 LearningRate 0.000226 Epoch: 22 Global Step: 474830 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:08:54,457-Speed 2491.66 samples/sec Loss 1.8268 LearningRate 0.000226 Epoch: 22 Global Step: 474840 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:09:02,606-Speed 2513.64 samples/sec Loss 1.8303 LearningRate 0.000226 Epoch: 22 Global Step: 474850 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:09:10,808-Speed 2497.65 samples/sec Loss 1.8868 LearningRate 0.000226 Epoch: 22 Global Step: 474860 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:09:19,017-Speed 2495.25 samples/sec Loss 1.8448 LearningRate 0.000226 Epoch: 22 Global Step: 474870 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:09:27,219-Speed 2497.04 samples/sec Loss 1.8340 LearningRate 0.000226 Epoch: 22 Global Step: 474880 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:09:35,429-Speed 2495.13 samples/sec Loss 1.8648 LearningRate 0.000226 Epoch: 22 Global Step: 474890 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:09:43,634-Speed 2496.40 samples/sec Loss 1.8312 LearningRate 0.000226 Epoch: 22 Global Step: 474900 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:09:51,783-Speed 2513.72 samples/sec Loss 1.8504 LearningRate 0.000226 Epoch: 22 Global Step: 474910 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:09:59,982-Speed 2498.61 samples/sec Loss 1.8375 LearningRate 0.000226 Epoch: 22 Global Step: 474920 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:10:08,192-Speed 2494.77 samples/sec Loss 1.8711 LearningRate 0.000226 Epoch: 22 Global Step: 474930 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:10:16,391-Speed 2498.39 samples/sec Loss 1.8542 LearningRate 0.000226 Epoch: 22 Global Step: 474940 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:10:24,595-Speed 2496.61 samples/sec Loss 1.8425 LearningRate 0.000226 Epoch: 22 Global Step: 474950 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:10:32,801-Speed 2496.24 samples/sec Loss 1.8488 LearningRate 0.000226 Epoch: 22 Global Step: 474960 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:10:40,953-Speed 2512.45 samples/sec Loss 1.8818 LearningRate 0.000226 Epoch: 22 Global Step: 474970 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:10:49,185-Speed 2488.39 samples/sec Loss 1.9325 LearningRate 0.000226 Epoch: 22 Global Step: 474980 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:10:57,388-Speed 2497.14 samples/sec Loss 1.9101 LearningRate 0.000226 Epoch: 22 Global Step: 474990 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:11:05,590-Speed 2497.16 samples/sec Loss 1.8539 LearningRate 0.000226 Epoch: 22 Global Step: 475000 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:11:13,799-Speed 2495.27 samples/sec Loss 1.8421 LearningRate 0.000226 Epoch: 22 Global Step: 475010 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:11:22,000-Speed 2497.88 samples/sec Loss 1.8310 LearningRate 0.000226 Epoch: 22 Global Step: 475020 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:11:30,151-Speed 2512.67 samples/sec Loss 1.8625 LearningRate 0.000225 Epoch: 22 Global Step: 475030 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:11:38,361-Speed 2494.95 samples/sec Loss 1.8448 LearningRate 0.000225 Epoch: 22 Global Step: 475040 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:11:46,566-Speed 2496.65 samples/sec Loss 1.8659 LearningRate 0.000225 Epoch: 22 Global Step: 475050 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:11:54,767-Speed 2497.88 samples/sec Loss 1.8820 LearningRate 0.000225 Epoch: 22 Global Step: 475060 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:12:02,984-Speed 2492.92 samples/sec Loss 1.8645 LearningRate 0.000225 Epoch: 22 Global Step: 475070 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:12:11,188-Speed 2496.49 samples/sec Loss 1.8517 LearningRate 0.000225 Epoch: 22 Global Step: 475080 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:12:19,338-Speed 2513.59 samples/sec Loss 1.9089 LearningRate 0.000225 Epoch: 22 Global Step: 475090 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:12:27,548-Speed 2494.87 samples/sec Loss 1.8468 LearningRate 0.000225 Epoch: 22 Global Step: 475100 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:12:35,752-Speed 2496.82 samples/sec Loss 1.8741 LearningRate 0.000225 Epoch: 22 Global Step: 475110 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:12:43,968-Speed 2493.05 samples/sec Loss 1.8275 LearningRate 0.000225 Epoch: 22 Global Step: 475120 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:12:52,169-Speed 2497.71 samples/sec Loss 1.9315 LearningRate 0.000225 Epoch: 22 Global Step: 475130 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:13:00,387-Speed 2492.37 samples/sec Loss 1.8876 LearningRate 0.000225 Epoch: 22 Global Step: 475140 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:13:08,536-Speed 2513.28 samples/sec Loss 1.8468 LearningRate 0.000225 Epoch: 22 Global Step: 475150 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:13:16,736-Speed 2498.11 samples/sec Loss 1.8840 LearningRate 0.000225 Epoch: 22 Global Step: 475160 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:13:24,949-Speed 2494.26 samples/sec Loss 1.8955 LearningRate 0.000225 Epoch: 22 Global Step: 475170 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:13:33,160-Speed 2494.39 samples/sec Loss 1.8574 LearningRate 0.000225 Epoch: 22 Global Step: 475180 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:13:41,363-Speed 2496.94 samples/sec Loss 1.8461 LearningRate 0.000225 Epoch: 22 Global Step: 475190 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:13:49,563-Speed 2497.99 samples/sec Loss 1.8380 LearningRate 0.000225 Epoch: 22 Global Step: 475200 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:13:57,718-Speed 2511.93 samples/sec Loss 1.8912 LearningRate 0.000225 Epoch: 22 Global Step: 475210 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:14:05,920-Speed 2497.31 samples/sec Loss 1.8628 LearningRate 0.000225 Epoch: 22 Global Step: 475220 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:14:14,156-Speed 2486.94 samples/sec Loss 1.8382 LearningRate 0.000225 Epoch: 22 Global Step: 475230 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:14:22,359-Speed 2497.09 samples/sec Loss 1.8150 LearningRate 0.000225 Epoch: 22 Global Step: 475240 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:14:30,559-Speed 2498.06 samples/sec Loss 1.8566 LearningRate 0.000225 Epoch: 22 Global Step: 475250 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:14:38,765-Speed 2496.17 samples/sec Loss 1.8861 LearningRate 0.000225 Epoch: 22 Global Step: 475260 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:14:46,915-Speed 2513.32 samples/sec Loss 1.8972 LearningRate 0.000225 Epoch: 22 Global Step: 475270 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:14:55,115-Speed 2497.84 samples/sec Loss 1.8466 LearningRate 0.000225 Epoch: 22 Global Step: 475280 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:15:03,331-Speed 2493.18 samples/sec Loss 1.8419 LearningRate 0.000225 Epoch: 22 Global Step: 475290 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:15:11,532-Speed 2497.57 samples/sec Loss 1.8163 LearningRate 0.000225 Epoch: 22 Global Step: 475300 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:15:19,759-Speed 2489.69 samples/sec Loss 1.8962 LearningRate 0.000225 Epoch: 22 Global Step: 475310 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:15:27,961-Speed 2497.28 samples/sec Loss 1.8872 LearningRate 0.000225 Epoch: 22 Global Step: 475320 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:15:36,111-Speed 2513.48 samples/sec Loss 1.8189 LearningRate 0.000225 Epoch: 22 Global Step: 475330 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:15:44,311-Speed 2497.74 samples/sec Loss 1.8547 LearningRate 0.000225 Epoch: 22 Global Step: 475340 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:15:52,521-Speed 2494.94 samples/sec Loss 1.8706 LearningRate 0.000225 Epoch: 22 Global Step: 475350 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:16:00,724-Speed 2497.10 samples/sec Loss 1.8419 LearningRate 0.000225 Epoch: 22 Global Step: 475360 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:16:08,927-Speed 2497.29 samples/sec Loss 1.8663 LearningRate 0.000225 Epoch: 22 Global Step: 475370 Fp16 Grad Scale: 32768 
Required: 81 hours Training: 2022-07-10 03:16:17,129-Speed 2497.02 samples/sec Loss 1.8832 LearningRate 0.000225 Epoch: 22 Global Step: 475380 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:16:25,282-Speed 2512.36 samples/sec Loss 1.8884 LearningRate 0.000225 Epoch: 22 Global Step: 475390 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:16:33,491-Speed 2495.62 samples/sec Loss 1.8282 LearningRate 0.000225 Epoch: 22 Global Step: 475400 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:16:41,696-Speed 2496.42 samples/sec Loss 1.8633 LearningRate 0.000225 Epoch: 22 Global Step: 475410 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:16:49,909-Speed 2493.99 samples/sec Loss 1.8723 LearningRate 0.000225 Epoch: 22 Global Step: 475420 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:16:58,114-Speed 2496.21 samples/sec Loss 1.8266 LearningRate 0.000225 Epoch: 22 Global Step: 475430 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:17:06,332-Speed 2492.63 samples/sec Loss 1.8443 LearningRate 0.000225 Epoch: 22 Global Step: 475440 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:17:14,481-Speed 2513.55 samples/sec Loss 1.8601 LearningRate 0.000225 Epoch: 22 Global Step: 475450 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:17:22,686-Speed 2496.47 samples/sec Loss 1.8841 LearningRate 0.000225 Epoch: 22 Global Step: 475460 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:17:30,889-Speed 2497.14 samples/sec Loss 1.8547 LearningRate 0.000225 Epoch: 22 Global Step: 475470 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:17:39,090-Speed 2497.83 samples/sec Loss 1.9154 LearningRate 0.000225 Epoch: 22 Global Step: 475480 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:17:47,291-Speed 2497.50 samples/sec Loss 1.8929 LearningRate 0.000225 Epoch: 22 Global Step: 475490 Fp16 Grad Scale: 32768 
Required: 81 hours Training: 2022-07-10 03:17:55,498-Speed 2495.72 samples/sec Loss 1.8474 LearningRate 0.000225 Epoch: 22 Global Step: 475500 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:18:03,647-Speed 2513.95 samples/sec Loss 1.8611 LearningRate 0.000225 Epoch: 22 Global Step: 475510 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:18:11,853-Speed 2496.09 samples/sec Loss 1.8805 LearningRate 0.000225 Epoch: 22 Global Step: 475520 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:18:20,055-Speed 2497.30 samples/sec Loss 1.8581 LearningRate 0.000225 Epoch: 22 Global Step: 475530 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:18:28,259-Speed 2496.65 samples/sec Loss 1.8915 LearningRate 0.000225 Epoch: 22 Global Step: 475540 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:18:36,462-Speed 2497.41 samples/sec Loss 1.8986 LearningRate 0.000225 Epoch: 22 Global Step: 475550 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:18:44,676-Speed 2493.54 samples/sec Loss 1.8798 LearningRate 0.000225 Epoch: 22 Global Step: 475560 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:18:52,832-Speed 2511.42 samples/sec Loss 1.8817 LearningRate 0.000225 Epoch: 22 Global Step: 475570 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:19:01,034-Speed 2497.43 samples/sec Loss 1.8701 LearningRate 0.000225 Epoch: 22 Global Step: 475580 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:19:09,233-Speed 2498.56 samples/sec Loss 1.8872 LearningRate 0.000225 Epoch: 22 Global Step: 475590 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:19:17,439-Speed 2496.08 samples/sec Loss 1.8597 LearningRate 0.000225 Epoch: 22 Global Step: 475600 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:19:25,642-Speed 2496.88 samples/sec Loss 1.8605 LearningRate 0.000225 Epoch: 22 Global Step: 475610 Fp16 Grad Scale: 32768 
Required: 81 hours Training: 2022-07-10 03:19:33,858-Speed 2493.18 samples/sec Loss 1.8661 LearningRate 0.000225 Epoch: 22 Global Step: 475620 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:19:42,007-Speed 2513.69 samples/sec Loss 1.8434 LearningRate 0.000225 Epoch: 22 Global Step: 475630 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:19:50,210-Speed 2496.90 samples/sec Loss 1.8605 LearningRate 0.000225 Epoch: 22 Global Step: 475640 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:19:58,415-Speed 2496.34 samples/sec Loss 1.8411 LearningRate 0.000225 Epoch: 22 Global Step: 475650 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:20:06,630-Speed 2493.48 samples/sec Loss 1.8811 LearningRate 0.000225 Epoch: 22 Global Step: 475660 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:20:14,837-Speed 2495.80 samples/sec Loss 1.8678 LearningRate 0.000225 Epoch: 22 Global Step: 475670 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:20:23,038-Speed 2497.54 samples/sec Loss 1.8947 LearningRate 0.000225 Epoch: 22 Global Step: 475680 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:20:31,186-Speed 2513.78 samples/sec Loss 1.8295 LearningRate 0.000225 Epoch: 22 Global Step: 475690 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:20:39,399-Speed 2494.07 samples/sec Loss 1.8281 LearningRate 0.000225 Epoch: 22 Global Step: 475700 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:20:47,605-Speed 2496.02 samples/sec Loss 1.8484 LearningRate 0.000225 Epoch: 22 Global Step: 475710 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:20:55,806-Speed 2497.75 samples/sec Loss 1.8751 LearningRate 0.000225 Epoch: 22 Global Step: 475720 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:21:04,007-Speed 2497.59 samples/sec Loss 1.8782 LearningRate 0.000225 Epoch: 22 Global Step: 475730 Fp16 Grad Scale: 32768 
Required: 81 hours Training: 2022-07-10 03:21:12,211-Speed 2496.99 samples/sec Loss 1.8169 LearningRate 0.000225 Epoch: 22 Global Step: 475740 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:21:20,363-Speed 2512.55 samples/sec Loss 1.8691 LearningRate 0.000225 Epoch: 22 Global Step: 475750 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:21:28,564-Speed 2497.60 samples/sec Loss 1.8758 LearningRate 0.000225 Epoch: 22 Global Step: 475760 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:21:36,766-Speed 2497.58 samples/sec Loss 1.8383 LearningRate 0.000225 Epoch: 22 Global Step: 475770 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:21:44,970-Speed 2496.78 samples/sec Loss 1.8319 LearningRate 0.000225 Epoch: 22 Global Step: 475780 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:21:53,204-Speed 2487.59 samples/sec Loss 1.8623 LearningRate 0.000225 Epoch: 22 Global Step: 475790 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:22:01,411-Speed 2495.51 samples/sec Loss 1.8709 LearningRate 0.000225 Epoch: 22 Global Step: 475800 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:22:09,563-Speed 2512.88 samples/sec Loss 1.8739 LearningRate 0.000224 Epoch: 22 Global Step: 475810 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:22:17,766-Speed 2497.08 samples/sec Loss 1.8813 LearningRate 0.000224 Epoch: 22 Global Step: 475820 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:22:25,967-Speed 2497.65 samples/sec Loss 1.9089 LearningRate 0.000224 Epoch: 22 Global Step: 475830 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:22:34,168-Speed 2497.50 samples/sec Loss 1.8509 LearningRate 0.000224 Epoch: 22 Global Step: 475840 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:22:42,398-Speed 2489.79 samples/sec Loss 1.8543 LearningRate 0.000224 Epoch: 22 Global Step: 475850 Fp16 Grad Scale: 32768 
Required: 81 hours Training: 2022-07-10 03:22:50,599-Speed 2497.80 samples/sec Loss 1.8773 LearningRate 0.000224 Epoch: 22 Global Step: 475860 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:22:58,750-Speed 2513.00 samples/sec Loss 1.8693 LearningRate 0.000224 Epoch: 22 Global Step: 475870 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:23:06,950-Speed 2497.87 samples/sec Loss 1.8654 LearningRate 0.000224 Epoch: 22 Global Step: 475880 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:23:15,153-Speed 2497.12 samples/sec Loss 1.8677 LearningRate 0.000224 Epoch: 22 Global Step: 475890 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:23:23,323-Speed 2507.15 samples/sec Loss 1.9063 LearningRate 0.000224 Epoch: 22 Global Step: 475900 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:23:31,541-Speed 2492.36 samples/sec Loss 1.8901 LearningRate 0.000224 Epoch: 22 Global Step: 475910 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:23:39,746-Speed 2496.60 samples/sec Loss 1.8375 LearningRate 0.000224 Epoch: 22 Global Step: 475920 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:23:47,893-Speed 2514.42 samples/sec Loss 1.8806 LearningRate 0.000224 Epoch: 22 Global Step: 475930 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:23:56,097-Speed 2496.70 samples/sec Loss 1.8333 LearningRate 0.000224 Epoch: 22 Global Step: 475940 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:24:04,303-Speed 2496.14 samples/sec Loss 1.8503 LearningRate 0.000224 Epoch: 22 Global Step: 475950 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:24:12,506-Speed 2497.21 samples/sec Loss 1.8316 LearningRate 0.000224 Epoch: 22 Global Step: 475960 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:24:20,705-Speed 2498.25 samples/sec Loss 1.8744 LearningRate 0.000224 Epoch: 22 Global Step: 475970 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:24:28,907-Speed 2497.35 samples/sec Loss 1.8431 LearningRate 0.000224 Epoch: 22 Global Step: 475980 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:24:37,058-Speed 2512.85 samples/sec Loss 1.8511 LearningRate 0.000224 Epoch: 22 Global Step: 475990 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:24:45,268-Speed 2495.03 samples/sec Loss 1.8440 LearningRate 0.000224 Epoch: 22 Global Step: 476000 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:24:53,478-Speed 2495.28 samples/sec Loss 1.8300 LearningRate 0.000224 Epoch: 22 Global Step: 476010 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:25:01,681-Speed 2497.02 samples/sec Loss 1.8362 LearningRate 0.000224 Epoch: 22 Global Step: 476020 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:25:09,898-Speed 2493.02 samples/sec Loss 1.8575 LearningRate 0.000224 Epoch: 22 Global Step: 476030 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:25:18,099-Speed 2497.66 samples/sec Loss 1.8673 LearningRate 0.000224 Epoch: 22 Global Step: 476040 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:25:26,247-Speed 2513.92 samples/sec Loss 1.8493 LearningRate 0.000224 Epoch: 22 Global Step: 476050 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:25:34,449-Speed 2497.34 samples/sec Loss 1.9052 LearningRate 0.000224 Epoch: 22 Global Step: 476060 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:25:42,650-Speed 2497.75 samples/sec Loss 1.8576 LearningRate 0.000224 Epoch: 22 Global Step: 476070 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:25:50,857-Speed 2495.84 samples/sec Loss 1.8340 LearningRate 0.000224 Epoch: 22 Global Step: 476080 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:25:59,061-Speed 2496.64 samples/sec Loss 1.8918 LearningRate 0.000224 Epoch: 22 Global Step: 476090 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:26:07,266-Speed 2496.19 samples/sec Loss 1.8926 LearningRate 0.000224 Epoch: 22 Global Step: 476100 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:26:15,419-Speed 2512.46 samples/sec Loss 1.8600 LearningRate 0.000224 Epoch: 22 Global Step: 476110 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:26:23,624-Speed 2496.62 samples/sec Loss 1.8511 LearningRate 0.000224 Epoch: 22 Global Step: 476120 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:26:31,842-Speed 2492.38 samples/sec Loss 1.9029 LearningRate 0.000224 Epoch: 22 Global Step: 476130 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:26:40,054-Speed 2494.10 samples/sec Loss 1.8887 LearningRate 0.000224 Epoch: 22 Global Step: 476140 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:26:48,261-Speed 2496.51 samples/sec Loss 1.8814 LearningRate 0.000224 Epoch: 22 Global Step: 476150 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:26:56,469-Speed 2495.54 samples/sec Loss 1.8616 LearningRate 0.000224 Epoch: 22 Global Step: 476160 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:27:04,620-Speed 2512.75 samples/sec Loss 1.8798 LearningRate 0.000224 Epoch: 22 Global Step: 476170 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:27:12,820-Speed 2498.26 samples/sec Loss 1.8269 LearningRate 0.000224 Epoch: 22 Global Step: 476180 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:27:21,034-Speed 2493.68 samples/sec Loss 1.8578 LearningRate 0.000224 Epoch: 22 Global Step: 476190 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:27:29,239-Speed 2496.29 samples/sec Loss 1.8587 LearningRate 0.000224 Epoch: 22 Global Step: 476200 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:27:37,446-Speed 2496.04 samples/sec Loss 1.8617 LearningRate 0.000224 Epoch: 22 Global Step: 476210 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:27:45,652-Speed 2495.81 samples/sec Loss 1.8920 LearningRate 0.000224 Epoch: 22 Global Step: 476220 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:27:53,803-Speed 2513.30 samples/sec Loss 1.8805 LearningRate 0.000224 Epoch: 22 Global Step: 476230 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:28:02,006-Speed 2497.16 samples/sec Loss 1.8794 LearningRate 0.000224 Epoch: 22 Global Step: 476240 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:28:10,208-Speed 2497.36 samples/sec Loss 1.8895 LearningRate 0.000224 Epoch: 22 Global Step: 476250 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:28:18,414-Speed 2496.13 samples/sec Loss 1.8832 LearningRate 0.000224 Epoch: 22 Global Step: 476260 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:28:26,619-Speed 2496.48 samples/sec Loss 1.8730 LearningRate 0.000224 Epoch: 22 Global Step: 476270 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:28:34,824-Speed 2496.33 samples/sec Loss 1.8546 LearningRate 0.000224 Epoch: 22 Global Step: 476280 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:28:42,975-Speed 2512.94 samples/sec Loss 1.8657 LearningRate 0.000224 Epoch: 22 Global Step: 476290 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:28:51,276-Speed 2467.52 samples/sec Loss 1.8587 LearningRate 0.000224 Epoch: 22 Global Step: 476300 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:28:59,509-Speed 2488.00 samples/sec Loss 1.8316 LearningRate 0.000224 Epoch: 22 Global Step: 476310 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:29:07,712-Speed 2496.80 samples/sec Loss 1.8665 LearningRate 0.000224 Epoch: 22 Global Step: 476320 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:29:15,917-Speed 2496.38 samples/sec Loss 1.8563 LearningRate 0.000224 Epoch: 22 Global Step: 476330 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:29:24,118-Speed 2497.57 samples/sec Loss 1.8824 LearningRate 0.000224 Epoch: 22 Global Step: 476340 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:29:32,269-Speed 2513.14 samples/sec Loss 1.8559 LearningRate 0.000224 Epoch: 22 Global Step: 476350 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:29:40,474-Speed 2496.24 samples/sec Loss 1.8772 LearningRate 0.000224 Epoch: 22 Global Step: 476360 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:29:48,675-Speed 2497.40 samples/sec Loss 1.8480 LearningRate 0.000224 Epoch: 22 Global Step: 476370 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:29:56,881-Speed 2496.03 samples/sec Loss 1.8300 LearningRate 0.000224 Epoch: 22 Global Step: 476380 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:30:05,094-Speed 2493.91 samples/sec Loss 1.8841 LearningRate 0.000224 Epoch: 22 Global Step: 476390 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:30:13,311-Speed 2492.56 samples/sec Loss 1.8524 LearningRate 0.000224 Epoch: 22 Global Step: 476400 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:30:21,463-Speed 2512.69 samples/sec Loss 1.8964 LearningRate 0.000224 Epoch: 22 Global Step: 476410 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:30:29,669-Speed 2496.17 samples/sec Loss 1.8054 LearningRate 0.000224 Epoch: 22 Global Step: 476420 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:30:37,872-Speed 2497.25 samples/sec Loss 1.8592 LearningRate 0.000224 Epoch: 22 Global Step: 476430 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:30:46,083-Speed 2494.71 samples/sec Loss 1.8365 LearningRate 0.000224 Epoch: 22 Global Step: 476440 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:30:54,285-Speed 2497.20 samples/sec Loss 1.8641 LearningRate 0.000224 Epoch: 22 Global Step: 476450 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:31:02,491-Speed 2496.04 samples/sec Loss 1.8540 LearningRate 0.000224 Epoch: 22 Global Step: 476460 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:31:10,646-Speed 2511.68 samples/sec Loss 1.8422 LearningRate 0.000224 Epoch: 22 Global Step: 476470 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:31:18,861-Speed 2493.49 samples/sec Loss 1.8560 LearningRate 0.000224 Epoch: 22 Global Step: 476480 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:31:27,064-Speed 2497.07 samples/sec Loss 1.8582 LearningRate 0.000224 Epoch: 22 Global Step: 476490 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:31:35,265-Speed 2497.55 samples/sec Loss 1.9064 LearningRate 0.000224 Epoch: 22 Global Step: 476500 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:31:43,478-Speed 2493.91 samples/sec Loss 1.8232 LearningRate 0.000224 Epoch: 22 Global Step: 476510 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:31:51,692-Speed 2493.42 samples/sec Loss 1.8352 LearningRate 0.000224 Epoch: 22 Global Step: 476520 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:31:59,840-Speed 2514.07 samples/sec Loss 1.8628 LearningRate 0.000224 Epoch: 22 Global Step: 476530 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:32:08,044-Speed 2496.85 samples/sec Loss 1.8706 LearningRate 0.000224 Epoch: 22 Global Step: 476540 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:32:16,243-Speed 2498.21 samples/sec Loss 1.8370 LearningRate 0.000224 Epoch: 22 Global Step: 476550 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:32:24,462-Speed 2492.34 samples/sec Loss 1.8769 LearningRate 0.000224 Epoch: 22 Global Step: 476560 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:32:32,669-Speed 2495.85 samples/sec Loss 1.8846 LearningRate 0.000224 Epoch: 22 Global Step: 476570 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:32:40,871-Speed 2497.26 samples/sec Loss 1.8448 LearningRate 0.000224 Epoch: 22 Global Step: 476580 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:32:49,021-Speed 2513.60 samples/sec Loss 1.8648 LearningRate 0.000224 Epoch: 22 Global Step: 476590 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:32:57,225-Speed 2496.64 samples/sec Loss 1.8803 LearningRate 0.000223 Epoch: 22 Global Step: 476600 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:33:05,430-Speed 2496.61 samples/sec Loss 1.8659 LearningRate 0.000223 Epoch: 22 Global Step: 476610 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:33:13,641-Speed 2494.62 samples/sec Loss 1.8685 LearningRate 0.000223 Epoch: 22 Global Step: 476620 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:33:21,852-Speed 2494.53 samples/sec Loss 1.8521 LearningRate 0.000223 Epoch: 22 Global Step: 476630 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:33:30,069-Speed 2493.05 samples/sec Loss 1.8222 LearningRate 0.000223 Epoch: 22 Global Step: 476640 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:33:38,218-Speed 2513.69 samples/sec Loss 1.8632 LearningRate 0.000223 Epoch: 22 Global Step: 476650 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:33:46,436-Speed 2492.38 samples/sec Loss 1.8881 LearningRate 0.000223 Epoch: 22 Global Step: 476660 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:33:54,640-Speed 2496.91 samples/sec Loss 1.8736 LearningRate 0.000223 Epoch: 22 Global Step: 476670 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:34:02,841-Speed 2497.39 samples/sec Loss 1.8851 LearningRate 0.000223 Epoch: 22 Global Step: 476680 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:34:11,045-Speed 2497.12 samples/sec Loss 1.8600 LearningRate 0.000223 Epoch: 22 Global Step: 476690 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:34:19,251-Speed 2496.23 samples/sec Loss 1.8082 LearningRate 0.000223 Epoch: 22 Global Step: 476700 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:34:27,400-Speed 2513.74 samples/sec Loss 1.8471 LearningRate 0.000223 Epoch: 22 Global Step: 476710 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:34:35,605-Speed 2496.31 samples/sec Loss 1.8990 LearningRate 0.000223 Epoch: 22 Global Step: 476720 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:34:43,808-Speed 2497.10 samples/sec Loss 1.8718 LearningRate 0.000223 Epoch: 22 Global Step: 476730 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:34:52,013-Speed 2496.61 samples/sec Loss 1.9032 LearningRate 0.000223 Epoch: 22 Global Step: 476740 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:35:00,217-Speed 2496.62 samples/sec Loss 1.8560 LearningRate 0.000223 Epoch: 22 Global Step: 476750 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:35:08,419-Speed 2497.28 samples/sec Loss 1.8699 LearningRate 0.000223 Epoch: 22 Global Step: 476760 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:35:16,571-Speed 2512.93 samples/sec Loss 1.9033 LearningRate 0.000223 Epoch: 22 Global Step: 476770 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:35:24,784-Speed 2493.91 samples/sec Loss 1.8490 LearningRate 0.000223 Epoch: 22 Global Step: 476780 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:35:32,987-Speed 2496.95 samples/sec Loss 1.8857 LearningRate 0.000223 Epoch: 22 Global Step: 476790 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:35:41,195-Speed 2495.79 samples/sec Loss 1.8667 LearningRate 0.000223 Epoch: 22 Global Step: 476800 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:35:49,397-Speed 2497.46 samples/sec Loss 1.8667 LearningRate 0.000223 Epoch: 22 Global Step: 476810 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:35:57,603-Speed 2495.99 samples/sec Loss 1.8592 LearningRate 0.000223 Epoch: 22 Global Step: 476820 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:36:05,758-Speed 2511.99 samples/sec Loss 1.8651 LearningRate 0.000223 Epoch: 22 Global Step: 476830 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:36:13,958-Speed 2498.01 samples/sec Loss 1.8658 LearningRate 0.000223 Epoch: 22 Global Step: 476840 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:36:22,171-Speed 2494.10 samples/sec Loss 1.8487 LearningRate 0.000223 Epoch: 22 Global Step: 476850 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:36:30,375-Speed 2496.43 samples/sec Loss 1.8473 LearningRate 0.000223 Epoch: 22 Global Step: 476860 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:36:38,580-Speed 2496.51 samples/sec Loss 1.8073 LearningRate 0.000223 Epoch: 22 Global Step: 476870 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:36:46,785-Speed 2496.43 samples/sec Loss 1.8950 LearningRate 0.000223 Epoch: 22 Global Step: 476880 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:36:54,936-Speed 2513.05 samples/sec Loss 1.8348 LearningRate 0.000223 Epoch: 22 Global Step: 476890 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:37:03,157-Speed 2491.76 samples/sec Loss 1.8595 LearningRate 0.000223 Epoch: 22 Global Step: 476900 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:37:11,360-Speed 2496.77 samples/sec Loss 1.8638 LearningRate 0.000223 Epoch: 22 Global Step: 476910 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:37:19,567-Speed 2495.83 samples/sec Loss 1.8769 LearningRate 0.000223 Epoch: 22 Global Step: 476920 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:37:27,773-Speed 2496.73 samples/sec Loss 1.8892 LearningRate 0.000223 Epoch: 22 Global Step: 476930 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:37:35,976-Speed 2496.67 samples/sec Loss 1.8391 LearningRate 0.000223 Epoch: 22 Global Step: 476940 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:37:44,137-Speed 2510.15 samples/sec Loss 1.8812 LearningRate 0.000223 Epoch: 22 Global Step: 476950 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:37:52,341-Speed 2496.67 samples/sec Loss 1.8836 LearningRate 0.000223 Epoch: 22 Global Step: 476960 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:38:00,547-Speed 2496.34 samples/sec Loss 1.7993 LearningRate 0.000223 Epoch: 22 Global Step: 476970 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:38:08,748-Speed 2497.66 samples/sec Loss 1.8203 LearningRate 0.000223 Epoch: 22 Global Step: 476980 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:38:16,947-Speed 2498.55 samples/sec Loss 1.9134 LearningRate 0.000223 Epoch: 22 Global Step: 476990 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:38:25,152-Speed 2496.68 samples/sec Loss 1.8631 LearningRate 0.000223 Epoch: 22 Global Step: 477000 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:38:33,303-Speed 2513.00 samples/sec Loss 1.8753 LearningRate 0.000223 Epoch: 22 Global Step: 477010 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:38:43,666-Speed 1976.30 samples/sec Loss 1.8521 LearningRate 0.000223 Epoch: 23 Global Step: 477020 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:38:51,864-Speed 2498.68 samples/sec Loss 1.9133 LearningRate 0.000223 Epoch: 23 Global Step: 477030 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:39:00,074-Speed 2494.92 samples/sec Loss 1.8876 LearningRate 0.000223 Epoch: 23 Global Step: 477040 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:39:08,273-Speed 2498.58 samples/sec Loss 1.9105 LearningRate 0.000223 Epoch: 23 Global Step: 477050 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:39:16,468-Speed 2499.17 samples/sec Loss 1.8801 LearningRate 0.000223 Epoch: 23 Global Step: 477060 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:39:24,616-Speed 2514.16 samples/sec Loss 1.8850 LearningRate 0.000223 Epoch: 23 Global Step: 477070 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:39:32,826-Speed 2494.79 samples/sec Loss 1.8195 LearningRate 0.000223 Epoch: 23 Global Step: 477080 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:39:41,036-Speed 2494.98 samples/sec Loss 1.8865 LearningRate 0.000223 Epoch: 23 Global Step: 477090 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:39:49,240-Speed 2496.94 samples/sec Loss 1.8894 LearningRate 0.000223 Epoch: 23 Global Step: 477100 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:39:57,445-Speed 2496.39 samples/sec Loss 1.8826 LearningRate 0.000223 Epoch: 23 Global Step: 477110 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:40:05,657-Speed 2494.30 samples/sec Loss 1.9473 LearningRate 0.000223 Epoch: 23 Global Step: 477120 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:40:13,808-Speed 2512.98 samples/sec Loss 1.9035 LearningRate 0.000223 Epoch: 23 Global Step: 477130 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:40:22,028-Speed 2492.10 samples/sec Loss 1.8980 LearningRate 0.000223 Epoch: 23 Global Step: 477140 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:40:30,231-Speed 2496.87 samples/sec Loss 1.8511 LearningRate 0.000223 Epoch: 23 Global Step: 477150 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:40:38,437-Speed 2496.26 samples/sec Loss 1.8477 LearningRate 0.000223 Epoch: 23 Global Step: 477160 Fp16 Grad Scale: 32768 Required: 81 hours Training: 2022-07-10 03:40:46,640-Speed 2497.05 samples/sec Loss 1.8744 LearningRate 0.000223 Epoch: 23 Global Step: 477170 Fp16 Grad Scale: 32768 
Required: 81 hours Training: 2022-07-10 03:40:54,802-Speed 2509.66 samples/sec Loss 1.8934 LearningRate 0.000223 Epoch: 23 Global Step: 477180 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:41:02,951-Speed 2513.78 samples/sec Loss 1.8681 LearningRate 0.000223 Epoch: 23 Global Step: 477190 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:41:11,154-Speed 2496.76 samples/sec Loss 1.9186 LearningRate 0.000223 Epoch: 23 Global Step: 477200 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:41:19,364-Speed 2494.92 samples/sec Loss 1.8846 LearningRate 0.000223 Epoch: 23 Global Step: 477210 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:41:27,565-Speed 2497.62 samples/sec Loss 1.8847 LearningRate 0.000223 Epoch: 23 Global Step: 477220 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:41:35,773-Speed 2495.57 samples/sec Loss 1.8284 LearningRate 0.000223 Epoch: 23 Global Step: 477230 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:41:43,973-Speed 2497.93 samples/sec Loss 1.9249 LearningRate 0.000223 Epoch: 23 Global Step: 477240 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:41:52,122-Speed 2513.51 samples/sec Loss 1.8479 LearningRate 0.000223 Epoch: 23 Global Step: 477250 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:42:00,326-Speed 2496.97 samples/sec Loss 1.8509 LearningRate 0.000223 Epoch: 23 Global Step: 477260 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:42:08,532-Speed 2496.01 samples/sec Loss 1.8600 LearningRate 0.000223 Epoch: 23 Global Step: 477270 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:42:16,735-Speed 2497.20 samples/sec Loss 1.8783 LearningRate 0.000223 Epoch: 23 Global Step: 477280 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:42:24,933-Speed 2498.40 samples/sec Loss 1.8463 LearningRate 0.000223 Epoch: 23 Global Step: 477290 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:42:33,138-Speed 2496.56 samples/sec Loss 1.8473 LearningRate 0.000223 Epoch: 23 Global Step: 477300 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:42:41,284-Speed 2514.45 samples/sec Loss 1.8935 LearningRate 0.000223 Epoch: 23 Global Step: 477310 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:42:49,485-Speed 2497.74 samples/sec Loss 1.8750 LearningRate 0.000223 Epoch: 23 Global Step: 477320 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:42:57,682-Speed 2498.77 samples/sec Loss 1.8599 LearningRate 0.000223 Epoch: 23 Global Step: 477330 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:43:05,883-Speed 2497.57 samples/sec Loss 1.8594 LearningRate 0.000223 Epoch: 23 Global Step: 477340 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:43:14,087-Speed 2496.67 samples/sec Loss 1.8756 LearningRate 0.000223 Epoch: 23 Global Step: 477350 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:43:22,288-Speed 2497.77 samples/sec Loss 1.8687 LearningRate 0.000223 Epoch: 23 Global Step: 477360 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:43:30,448-Speed 2510.19 samples/sec Loss 1.8131 LearningRate 0.000223 Epoch: 23 Global Step: 477370 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:43:38,647-Speed 2498.33 samples/sec Loss 1.8976 LearningRate 0.000223 Epoch: 23 Global Step: 477380 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:43:46,853-Speed 2496.08 samples/sec Loss 1.8649 LearningRate 0.000222 Epoch: 23 Global Step: 477390 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:43:55,054-Speed 2497.98 samples/sec Loss 1.8394 LearningRate 0.000222 Epoch: 23 Global Step: 477400 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:44:03,268-Speed 2493.67 samples/sec Loss 1.8497 LearningRate 0.000222 Epoch: 23 Global Step: 477410 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:44:11,470-Speed 2497.23 samples/sec Loss 1.8410 LearningRate 0.000222 Epoch: 23 Global Step: 477420 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:44:19,624-Speed 2512.06 samples/sec Loss 1.8440 LearningRate 0.000222 Epoch: 23 Global Step: 477430 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:44:27,837-Speed 2494.04 samples/sec Loss 1.8258 LearningRate 0.000222 Epoch: 23 Global Step: 477440 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:44:36,038-Speed 2497.57 samples/sec Loss 1.8329 LearningRate 0.000222 Epoch: 23 Global Step: 477450 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:44:44,245-Speed 2495.86 samples/sec Loss 1.8641 LearningRate 0.000222 Epoch: 23 Global Step: 477460 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:44:52,457-Speed 2494.39 samples/sec Loss 1.8723 LearningRate 0.000222 Epoch: 23 Global Step: 477470 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:45:00,659-Speed 2497.31 samples/sec Loss 1.8221 LearningRate 0.000222 Epoch: 23 Global Step: 477480 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:45:08,817-Speed 2510.97 samples/sec Loss 1.8811 LearningRate 0.000222 Epoch: 23 Global Step: 477490 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:45:17,021-Speed 2496.73 samples/sec Loss 1.8313 LearningRate 0.000222 Epoch: 23 Global Step: 477500 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:45:25,234-Speed 2494.00 samples/sec Loss 1.8204 LearningRate 0.000222 Epoch: 23 Global Step: 477510 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:45:33,440-Speed 2496.17 samples/sec Loss 1.8610 LearningRate 0.000222 Epoch: 23 Global Step: 477520 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:45:41,639-Speed 2498.20 samples/sec Loss 1.8793 LearningRate 0.000222 Epoch: 23 Global Step: 477530 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:45:49,841-Speed 2497.27 samples/sec Loss 1.8066 LearningRate 0.000222 Epoch: 23 Global Step: 477540 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:45:57,988-Speed 2514.29 samples/sec Loss 1.8553 LearningRate 0.000222 Epoch: 23 Global Step: 477550 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:46:06,188-Speed 2497.89 samples/sec Loss 1.8526 LearningRate 0.000222 Epoch: 23 Global Step: 477560 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:46:14,389-Speed 2497.88 samples/sec Loss 1.8247 LearningRate 0.000222 Epoch: 23 Global Step: 477570 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:46:22,589-Speed 2497.74 samples/sec Loss 1.8378 LearningRate 0.000222 Epoch: 23 Global Step: 477580 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:46:30,790-Speed 2497.61 samples/sec Loss 1.8182 LearningRate 0.000222 Epoch: 23 Global Step: 477590 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:46:38,992-Speed 2497.37 samples/sec Loss 1.8667 LearningRate 0.000222 Epoch: 23 Global Step: 477600 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:46:47,139-Speed 2514.14 samples/sec Loss 1.8450 LearningRate 0.000222 Epoch: 23 Global Step: 477610 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:46:55,355-Speed 2493.24 samples/sec Loss 1.8507 LearningRate 0.000222 Epoch: 23 Global Step: 477620 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:47:03,555-Speed 2497.85 samples/sec Loss 1.8657 LearningRate 0.000222 Epoch: 23 Global Step: 477630 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:47:11,768-Speed 2493.96 samples/sec Loss 1.8664 LearningRate 0.000222 Epoch: 23 Global Step: 477640 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:47:19,969-Speed 2497.92 samples/sec Loss 1.8641 LearningRate 0.000222 Epoch: 23 Global Step: 477650 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:47:28,167-Speed 2498.41 samples/sec Loss 1.8731 LearningRate 0.000222 Epoch: 23 Global Step: 477660 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:47:36,321-Speed 2512.00 samples/sec Loss 1.8642 LearningRate 0.000222 Epoch: 23 Global Step: 477670 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:47:44,523-Speed 2497.52 samples/sec Loss 1.8635 LearningRate 0.000222 Epoch: 23 Global Step: 477680 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:47:52,725-Speed 2497.28 samples/sec Loss 1.8214 LearningRate 0.000222 Epoch: 23 Global Step: 477690 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:48:00,956-Speed 2488.83 samples/sec Loss 1.8512 LearningRate 0.000222 Epoch: 23 Global Step: 477700 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:48:09,157-Speed 2497.45 samples/sec Loss 1.8566 LearningRate 0.000222 Epoch: 23 Global Step: 477710 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:48:17,356-Speed 2498.21 samples/sec Loss 1.8793 LearningRate 0.000222 Epoch: 23 Global Step: 477720 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:48:25,508-Speed 2512.90 samples/sec Loss 1.8921 LearningRate 0.000222 Epoch: 23 Global Step: 477730 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:48:33,708-Speed 2497.82 samples/sec Loss 1.8434 LearningRate 0.000222 Epoch: 23 Global Step: 477740 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:48:41,910-Speed 2497.35 samples/sec Loss 1.8812 LearningRate 0.000222 Epoch: 23 Global Step: 477750 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:48:50,123-Speed 2496.16 samples/sec Loss 1.8527 LearningRate 0.000222 Epoch: 23 Global Step: 477760 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:48:58,321-Speed 2498.67 samples/sec Loss 1.9087 LearningRate 0.000222 Epoch: 23 Global Step: 477770 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:49:06,526-Speed 2496.18 samples/sec Loss 1.8596 LearningRate 0.000222 Epoch: 23 Global Step: 477780 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:49:14,679-Speed 2512.43 samples/sec Loss 1.8269 LearningRate 0.000222 Epoch: 23 Global Step: 477790 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:49:22,886-Speed 2495.96 samples/sec Loss 1.7905 LearningRate 0.000222 Epoch: 23 Global Step: 477800 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:49:31,092-Speed 2496.19 samples/sec Loss 1.8486 LearningRate 0.000222 Epoch: 23 Global Step: 477810 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:49:39,295-Speed 2496.88 samples/sec Loss 1.8817 LearningRate 0.000222 Epoch: 23 Global Step: 477820 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:49:47,505-Speed 2495.07 samples/sec Loss 1.8318 LearningRate 0.000222 Epoch: 23 Global Step: 477830 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:49:55,707-Speed 2497.38 samples/sec Loss 1.8496 LearningRate 0.000222 Epoch: 23 Global Step: 477840 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:50:03,853-Speed 2514.27 samples/sec Loss 1.7958 LearningRate 0.000222 Epoch: 23 Global Step: 477850 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:50:12,056-Speed 2497.17 samples/sec Loss 1.8344 LearningRate 0.000222 Epoch: 23 Global Step: 477860 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:50:23,982-Speed 2496.71 samples/sec Loss 1.8458 LearningRate 0.000222 Epoch: 23 Global Step: 477870 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:50:32,223-Speed 2500.64 samples/sec Loss 1.8477 LearningRate 0.000222 Epoch: 23 Global Step: 477880 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:50:40,424-Speed 2497.30 samples/sec Loss 1.8328 LearningRate 0.000222 Epoch: 23 Global Step: 477890 Fp16 Grad Scale: 16384 
Required: 81 hours Training: 2022-07-10 03:50:48,654-Speed 2500.91 samples/sec Loss 1.8504 LearningRate 0.000222 Epoch: 23 Global Step: 477900 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:50:56,820-Speed 2517.14 samples/sec Loss 1.8215 LearningRate 0.000222 Epoch: 23 Global Step: 477910 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:51:05,016-Speed 2499.07 samples/sec Loss 1.8823 LearningRate 0.000222 Epoch: 23 Global Step: 477920 Fp16 Grad Scale: 16384 Required: 81 hours Training: 2022-07-10 03:51:17,747-Speed 1618.78 samples/sec Loss 1.8172 LearningRate 0.000222 Epoch: 23 Global Step: 477930 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:51:25,972-Speed 2501.46 samples/sec Loss 1.8513 LearningRate 0.000222 Epoch: 23 Global Step: 477940 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:51:34,164-Speed 2500.17 samples/sec Loss 1.8737 LearningRate 0.000222 Epoch: 23 Global Step: 477950 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:51:42,362-Speed 2498.69 samples/sec Loss 1.8282 LearningRate 0.000222 Epoch: 23 Global Step: 477960 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:51:55,057-Speed 1623.47 samples/sec Loss 1.8471 LearningRate 0.000222 Epoch: 23 Global Step: 477970 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:52:03,260-Speed 2502.22 samples/sec Loss 1.8450 LearningRate 0.000222 Epoch: 23 Global Step: 477980 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:52:11,454-Speed 2499.72 samples/sec Loss 1.8544 LearningRate 0.000222 Epoch: 23 Global Step: 477990 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:52:24,902-Speed 2502.12 samples/sec Loss 1.8166 LearningRate 0.000222 Epoch: 23 Global Step: 478000 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:52:33,133-Speed 2501.39 samples/sec Loss 1.8467 LearningRate 0.000222 Epoch: 23 Global Step: 478010 Fp16 Grad Scale: 16384 
Required: 80 hours Training: 2022-07-10 03:52:44,100-Speed 1867.45 samples/sec Loss 1.8736 LearningRate 0.000222 Epoch: 23 Global Step: 478020 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:52:52,301-Speed 2519.29 samples/sec Loss 1.8854 LearningRate 0.000222 Epoch: 23 Global Step: 478030 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:53:01,584-Speed 2210.60 samples/sec Loss 1.8446 LearningRate 0.000222 Epoch: 23 Global Step: 478040 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:53:10,714-Speed 2243.45 samples/sec Loss 1.8451 LearningRate 0.000222 Epoch: 23 Global Step: 478050 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:53:23,492-Speed 2496.21 samples/sec Loss 1.8765 LearningRate 0.000222 Epoch: 23 Global Step: 478060 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:53:32,958-Speed 2179.14 samples/sec Loss 1.8227 LearningRate 0.000222 Epoch: 23 Global Step: 478070 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:53:42,706-Speed 2502.86 samples/sec Loss 1.8451 LearningRate 0.000222 Epoch: 23 Global Step: 478080 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:53:50,847-Speed 2516.03 samples/sec Loss 1.8691 LearningRate 0.000222 Epoch: 23 Global Step: 478090 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:53:59,041-Speed 2499.80 samples/sec Loss 1.8639 LearningRate 0.000222 Epoch: 23 Global Step: 478100 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:54:07,238-Speed 2498.91 samples/sec Loss 1.8577 LearningRate 0.000222 Epoch: 23 Global Step: 478110 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:54:15,443-Speed 2496.30 samples/sec Loss 1.8660 LearningRate 0.000222 Epoch: 23 Global Step: 478120 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:54:23,648-Speed 2496.70 samples/sec Loss 1.8384 LearningRate 0.000222 Epoch: 23 Global Step: 478130 Fp16 Grad Scale: 16384 
Required: 80 hours Training: 2022-07-10 03:54:31,846-Speed 2498.91 samples/sec Loss 1.8806 LearningRate 0.000222 Epoch: 23 Global Step: 478140 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:54:39,997-Speed 2513.32 samples/sec Loss 1.8434 LearningRate 0.000222 Epoch: 23 Global Step: 478150 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:54:48,209-Speed 2494.33 samples/sec Loss 1.8273 LearningRate 0.000222 Epoch: 23 Global Step: 478160 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:54:56,413-Speed 2496.79 samples/sec Loss 1.8830 LearningRate 0.000222 Epoch: 23 Global Step: 478170 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:55:04,618-Speed 2496.56 samples/sec Loss 1.8449 LearningRate 0.000222 Epoch: 23 Global Step: 478180 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:55:12,821-Speed 2497.88 samples/sec Loss 1.8791 LearningRate 0.000221 Epoch: 23 Global Step: 478190 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:55:21,027-Speed 2496.19 samples/sec Loss 1.8933 LearningRate 0.000221 Epoch: 23 Global Step: 478200 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:55:29,192-Speed 2508.71 samples/sec Loss 1.8572 LearningRate 0.000221 Epoch: 23 Global Step: 478210 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:55:37,405-Speed 2494.11 samples/sec Loss 1.8311 LearningRate 0.000221 Epoch: 23 Global Step: 478220 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:55:45,614-Speed 2494.97 samples/sec Loss 1.8778 LearningRate 0.000221 Epoch: 23 Global Step: 478230 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:55:53,811-Speed 2498.94 samples/sec Loss 1.8923 LearningRate 0.000221 Epoch: 23 Global Step: 478240 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:56:02,015-Speed 2496.63 samples/sec Loss 1.8199 LearningRate 0.000221 Epoch: 23 Global Step: 478250 Fp16 Grad Scale: 16384 
Required: 80 hours Training: 2022-07-10 03:56:10,214-Speed 2498.65 samples/sec Loss 1.8702 LearningRate 0.000221 Epoch: 23 Global Step: 478260 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:56:18,362-Speed 2513.91 samples/sec Loss 1.8939 LearningRate 0.000221 Epoch: 23 Global Step: 478270 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:56:26,565-Speed 2496.81 samples/sec Loss 1.8990 LearningRate 0.000221 Epoch: 23 Global Step: 478280 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:56:34,767-Speed 2497.54 samples/sec Loss 1.8990 LearningRate 0.000221 Epoch: 23 Global Step: 478290 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:56:42,967-Speed 2498.85 samples/sec Loss 1.8594 LearningRate 0.000221 Epoch: 23 Global Step: 478300 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:56:51,170-Speed 2497.09 samples/sec Loss 1.8513 LearningRate 0.000221 Epoch: 23 Global Step: 478310 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:56:59,374-Speed 2496.55 samples/sec Loss 1.8601 LearningRate 0.000221 Epoch: 23 Global Step: 478320 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:57:07,523-Speed 2513.74 samples/sec Loss 1.8309 LearningRate 0.000221 Epoch: 23 Global Step: 478330 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:57:15,727-Speed 2496.49 samples/sec Loss 1.8687 LearningRate 0.000221 Epoch: 23 Global Step: 478340 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:57:23,928-Speed 2498.06 samples/sec Loss 1.8654 LearningRate 0.000221 Epoch: 23 Global Step: 478350 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:57:32,146-Speed 2492.47 samples/sec Loss 1.8496 LearningRate 0.000221 Epoch: 23 Global Step: 478360 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 03:57:40,354-Speed 2495.43 samples/sec Loss 1.8803 LearningRate 0.000221 Epoch: 23 Global Step: 478370 Fp16 Grad Scale: 16384 
Required: 80 hours Training: 2022-07-10 03:57:48,557-Speed 2496.97 samples/sec Loss 1.8556 LearningRate 0.000221 Epoch: 23 Global Step: 478380 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:57:56,709-Speed 2512.57 samples/sec Loss 1.8769 LearningRate 0.000221 Epoch: 23 Global Step: 478390 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:58:04,909-Speed 2497.73 samples/sec Loss 1.8195 LearningRate 0.000221 Epoch: 23 Global Step: 478400 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:58:13,114-Speed 2496.59 samples/sec Loss 1.8640 LearningRate 0.000221 Epoch: 23 Global Step: 478410 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:58:21,314-Speed 2498.12 samples/sec Loss 1.8556 LearningRate 0.000221 Epoch: 23 Global Step: 478420 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:58:29,525-Speed 2494.41 samples/sec Loss 1.8283 LearningRate 0.000221 Epoch: 23 Global Step: 478430 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:58:37,733-Speed 2495.64 samples/sec Loss 1.8294 LearningRate 0.000221 Epoch: 23 Global Step: 478440 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:58:45,893-Speed 2510.42 samples/sec Loss 1.8432 LearningRate 0.000221 Epoch: 23 Global Step: 478450 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:58:54,095-Speed 2497.18 samples/sec Loss 1.8624 LearningRate 0.000221 Epoch: 23 Global Step: 478460 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:59:02,300-Speed 2496.51 samples/sec Loss 1.8573 LearningRate 0.000221 Epoch: 23 Global Step: 478470 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:59:10,501-Speed 2497.50 samples/sec Loss 1.8861 LearningRate 0.000221 Epoch: 23 Global Step: 478480 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:59:18,713-Speed 2494.30 samples/sec Loss 1.8687 LearningRate 0.000221 Epoch: 23 Global Step: 478490 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 03:59:26,922-Speed 2495.24 samples/sec Loss 1.8371 LearningRate 0.000221 Epoch: 23 Global Step: 478500 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:59:35,083-Speed 2509.92 samples/sec Loss 1.8238 LearningRate 0.000221 Epoch: 23 Global Step: 478510 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:59:43,284-Speed 2497.53 samples/sec Loss 1.8464 LearningRate 0.000221 Epoch: 23 Global Step: 478520 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:59:51,509-Speed 2490.84 samples/sec Loss 1.8641 LearningRate 0.000221 Epoch: 23 Global Step: 478530 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 03:59:59,721-Speed 2494.20 samples/sec Loss 1.8646 LearningRate 0.000221 Epoch: 23 Global Step: 478540 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:00:07,940-Speed 2492.25 samples/sec Loss 1.8592 LearningRate 0.000221 Epoch: 23 Global Step: 478550 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:00:16,140-Speed 2497.91 samples/sec Loss 1.8637 LearningRate 0.000221 Epoch: 23 Global Step: 478560 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:00:24,290-Speed 2513.50 samples/sec Loss 1.8471 LearningRate 0.000221 Epoch: 23 Global Step: 478570 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:00:32,497-Speed 2495.80 samples/sec Loss 1.8760 LearningRate 0.000221 Epoch: 23 Global Step: 478580 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:00:40,714-Speed 2492.90 samples/sec Loss 1.8476 LearningRate 0.000221 Epoch: 23 Global Step: 478590 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:00:48,919-Speed 2496.57 samples/sec Loss 1.8590 LearningRate 0.000221 Epoch: 23 Global Step: 478600 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:00:57,120-Speed 2497.58 samples/sec Loss 1.8680 LearningRate 0.000221 Epoch: 23 Global Step: 478610 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:01:05,326-Speed 2496.11 samples/sec Loss 1.8723 LearningRate 0.000221 Epoch: 23 Global Step: 478620 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:01:13,478-Speed 2512.67 samples/sec Loss 1.8888 LearningRate 0.000221 Epoch: 23 Global Step: 478630 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:01:21,675-Speed 2498.81 samples/sec Loss 1.8832 LearningRate 0.000221 Epoch: 23 Global Step: 478640 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:01:29,894-Speed 2492.35 samples/sec Loss 1.8105 LearningRate 0.000221 Epoch: 23 Global Step: 478650 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:01:38,099-Speed 2496.29 samples/sec Loss 1.8803 LearningRate 0.000221 Epoch: 23 Global Step: 478660 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:01:46,298-Speed 2498.20 samples/sec Loss 1.8783 LearningRate 0.000221 Epoch: 23 Global Step: 478670 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:01:54,501-Speed 2497.13 samples/sec Loss 1.8859 LearningRate 0.000221 Epoch: 23 Global Step: 478680 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:02:02,655-Speed 2512.13 samples/sec Loss 1.8117 LearningRate 0.000221 Epoch: 23 Global Step: 478690 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:02:10,857-Speed 2497.34 samples/sec Loss 1.8937 LearningRate 0.000221 Epoch: 23 Global Step: 478700 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:02:19,054-Speed 2498.72 samples/sec Loss 1.8050 LearningRate 0.000221 Epoch: 23 Global Step: 478710 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:02:27,253-Speed 2498.24 samples/sec Loss 1.8382 LearningRate 0.000221 Epoch: 23 Global Step: 478720 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:02:35,455-Speed 2497.49 samples/sec Loss 1.8673 LearningRate 0.000221 Epoch: 23 Global Step: 478730 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:02:43,657-Speed 2497.18 samples/sec Loss 1.8730 LearningRate 0.000221 Epoch: 23 Global Step: 478740 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:02:51,824-Speed 2508.25 samples/sec Loss 1.8240 LearningRate 0.000221 Epoch: 23 Global Step: 478750 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:03:00,031-Speed 2496.10 samples/sec Loss 1.8798 LearningRate 0.000221 Epoch: 23 Global Step: 478760 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:03:08,237-Speed 2496.29 samples/sec Loss 1.8591 LearningRate 0.000221 Epoch: 23 Global Step: 478770 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:03:16,440-Speed 2497.08 samples/sec Loss 1.8508 LearningRate 0.000221 Epoch: 23 Global Step: 478780 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:03:24,663-Speed 2490.76 samples/sec Loss 1.8524 LearningRate 0.000221 Epoch: 23 Global Step: 478790 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:03:32,868-Speed 2496.62 samples/sec Loss 1.8887 LearningRate 0.000221 Epoch: 23 Global Step: 478800 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:03:41,018-Speed 2513.20 samples/sec Loss 1.8398 LearningRate 0.000221 Epoch: 23 Global Step: 478810 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:03:49,223-Speed 2496.44 samples/sec Loss 1.8343 LearningRate 0.000221 Epoch: 23 Global Step: 478820 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:03:57,424-Speed 2497.70 samples/sec Loss 1.8726 LearningRate 0.000221 Epoch: 23 Global Step: 478830 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:04:05,627-Speed 2497.20 samples/sec Loss 1.8556 LearningRate 0.000221 Epoch: 23 Global Step: 478840 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:04:13,827-Speed 2497.74 samples/sec Loss 1.8489 LearningRate 0.000221 Epoch: 23 Global Step: 478850 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:04:22,031-Speed 2496.78 samples/sec Loss 1.9290 LearningRate 0.000221 Epoch: 23 Global Step: 478860 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:04:30,180-Speed 2513.78 samples/sec Loss 1.8648 LearningRate 0.000221 Epoch: 23 Global Step: 478870 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:04:38,392-Speed 2494.12 samples/sec Loss 1.8030 LearningRate 0.000221 Epoch: 23 Global Step: 478880 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:04:46,596-Speed 2496.97 samples/sec Loss 1.8724 LearningRate 0.000221 Epoch: 23 Global Step: 478890 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:04:54,799-Speed 2497.11 samples/sec Loss 1.8305 LearningRate 0.000221 Epoch: 23 Global Step: 478900 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:05:03,005-Speed 2496.07 samples/sec Loss 1.8305 LearningRate 0.000221 Epoch: 23 Global Step: 478910 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:05:11,207-Speed 2497.11 samples/sec Loss 1.8855 LearningRate 0.000221 Epoch: 23 Global Step: 478920 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:05:19,360-Speed 2512.60 samples/sec Loss 1.8556 LearningRate 0.000221 Epoch: 23 Global Step: 478930 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:05:27,560-Speed 2497.76 samples/sec Loss 1.8277 LearningRate 0.000221 Epoch: 23 Global Step: 478940 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:05:35,765-Speed 2496.42 samples/sec Loss 1.8981 LearningRate 0.000221 Epoch: 23 Global Step: 478950 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:05:43,967-Speed 2497.33 samples/sec Loss 1.8203 LearningRate 0.000221 Epoch: 23 Global Step: 478960 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:05:52,169-Speed 2497.50 samples/sec Loss 1.8738 LearningRate 0.000221 Epoch: 23 Global Step: 478970 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:06:00,370-Speed 2497.55 samples/sec Loss 1.8626 LearningRate 0.000220 Epoch: 23 Global Step: 478980 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:06:08,520-Speed 2513.41 samples/sec Loss 1.8622 LearningRate 0.000220 Epoch: 23 Global Step: 478990 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:06:16,721-Speed 2497.57 samples/sec Loss 1.8646 LearningRate 0.000220 Epoch: 23 Global Step: 479000 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:06:24,922-Speed 2497.69 samples/sec Loss 1.8354 LearningRate 0.000220 Epoch: 23 Global Step: 479010 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:06:33,122-Speed 2497.77 samples/sec Loss 1.8673 LearningRate 0.000220 Epoch: 23 Global Step: 479020 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:06:41,348-Speed 2490.13 samples/sec Loss 1.8648 LearningRate 0.000220 Epoch: 23 Global Step: 479030 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:06:49,550-Speed 2497.39 samples/sec Loss 1.8306 LearningRate 0.000220 Epoch: 23 Global Step: 479040 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:06:57,699-Speed 2513.48 samples/sec Loss 1.8767 LearningRate 0.000220 Epoch: 23 Global Step: 479050 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:07:05,903-Speed 2496.74 samples/sec Loss 1.8722 LearningRate 0.000220 Epoch: 23 Global Step: 479060 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:07:14,105-Speed 2497.31 samples/sec Loss 1.8501 LearningRate 0.000220 Epoch: 23 Global Step: 479070 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:07:22,312-Speed 2496.01 samples/sec Loss 1.8620 LearningRate 0.000220 Epoch: 23 Global Step: 479080 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:07:30,516-Speed 2496.64 samples/sec Loss 1.8417 LearningRate 0.000220 Epoch: 23 Global Step: 479090 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:07:38,718-Speed 2497.58 samples/sec Loss 1.8943 LearningRate 0.000220 Epoch: 23 Global Step: 479100 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:07:46,871-Speed 2512.35 samples/sec Loss 1.8725 LearningRate 0.000220 Epoch: 23 Global Step: 479110 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:07:55,074-Speed 2497.25 samples/sec Loss 1.8340 LearningRate 0.000220 Epoch: 23 Global Step: 479120 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:08:03,289-Speed 2493.78 samples/sec Loss 1.8702 LearningRate 0.000220 Epoch: 23 Global Step: 479130 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:08:11,487-Speed 2498.64 samples/sec Loss 1.8459 LearningRate 0.000220 Epoch: 23 Global Step: 479140 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:08:19,697-Speed 2494.92 samples/sec Loss 1.8669 LearningRate 0.000220 Epoch: 23 Global Step: 479150 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:08:27,902-Speed 2496.71 samples/sec Loss 1.9589 LearningRate 0.000220 Epoch: 23 Global Step: 479160 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:08:36,051-Speed 2513.51 samples/sec Loss 1.9101 LearningRate 0.000220 Epoch: 23 Global Step: 479170 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:08:44,256-Speed 2496.56 samples/sec Loss 1.7950 LearningRate 0.000220 Epoch: 23 Global Step: 479180 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:08:52,457-Speed 2497.65 samples/sec Loss 1.8428 LearningRate 0.000220 Epoch: 23 Global Step: 479190 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:09:00,663-Speed 2496.29 samples/sec Loss 1.8841 LearningRate 0.000220 Epoch: 23 Global Step: 479200 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:09:08,880-Speed 2492.70 samples/sec Loss 1.8182 LearningRate 0.000220 Epoch: 23 Global Step: 479210 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:09:17,084-Speed 2496.84 samples/sec Loss 1.8472 LearningRate 0.000220 Epoch: 23 Global Step: 479220 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:09:25,238-Speed 2512.12 samples/sec Loss 1.8175 LearningRate 0.000220 Epoch: 23 Global Step: 479230 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:09:33,445-Speed 2495.67 samples/sec Loss 1.8495 LearningRate 0.000220 Epoch: 23 Global Step: 479240 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:09:41,648-Speed 2497.23 samples/sec Loss 1.8033 LearningRate 0.000220 Epoch: 23 Global Step: 479250 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:09:49,854-Speed 2496.25 samples/sec Loss 1.8662 LearningRate 0.000220 Epoch: 23 Global Step: 479260 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:09:58,060-Speed 2496.04 samples/sec Loss 1.8341 LearningRate 0.000220 Epoch: 23 Global Step: 479270 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:10:06,275-Speed 2493.43 samples/sec Loss 1.8323 LearningRate 0.000220 Epoch: 23 Global Step: 479280 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:10:14,430-Speed 2511.85 samples/sec Loss 1.8235 LearningRate 0.000220 Epoch: 23 Global Step: 479290 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:10:22,636-Speed 2496.48 samples/sec Loss 1.8909 LearningRate 0.000220 Epoch: 23 Global Step: 479300 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:10:30,839-Speed 2497.10 samples/sec Loss 1.8227 LearningRate 0.000220 Epoch: 23 Global Step: 479310 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:10:39,046-Speed 2496.11 samples/sec Loss 1.8362 LearningRate 0.000220 Epoch: 23 Global Step: 479320 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:10:47,248-Speed 2497.52 samples/sec Loss 1.8641 LearningRate 0.000220 Epoch: 23 Global Step: 479330 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:10:55,452-Speed 2496.98 samples/sec Loss 1.8277 LearningRate 0.000220 Epoch: 23 Global Step: 479340 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:11:03,599-Speed 2513.93 samples/sec Loss 1.8207 LearningRate 0.000220 Epoch: 23 Global Step: 479350 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:11:11,816-Speed 2493.52 samples/sec Loss 1.8143 LearningRate 0.000220 Epoch: 23 Global Step: 479360 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:11:20,020-Speed 2496.67 samples/sec Loss 1.8823 LearningRate 0.000220 Epoch: 23 Global Step: 479370 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:11:28,232-Speed 2494.24 samples/sec Loss 1.8424 LearningRate 0.000220 Epoch: 23 Global Step: 479380 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:11:36,430-Speed 2498.53 samples/sec Loss 1.8374 LearningRate 0.000220 Epoch: 23 Global Step: 479390 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:11:44,634-Speed 2496.85 samples/sec Loss 1.8524 LearningRate 0.000220 Epoch: 23 Global Step: 479400 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:11:52,781-Speed 2514.15 samples/sec Loss 1.8488 LearningRate 0.000220 Epoch: 23 Global Step: 479410 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:12:00,986-Speed 2496.46 samples/sec Loss 1.8143 LearningRate 0.000220 Epoch: 23 Global Step: 479420 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:12:09,198-Speed 2494.32 samples/sec Loss 1.8761 LearningRate 0.000220 Epoch: 23 Global Step: 479430 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:12:17,394-Speed 2499.14 samples/sec Loss 1.8400 LearningRate 0.000220 Epoch: 23 Global Step: 479440 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:12:25,600-Speed 2495.91 samples/sec Loss 1.8497 LearningRate 0.000220 Epoch: 23 Global Step: 479450 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:12:33,805-Speed 2496.56 samples/sec Loss 1.8666 LearningRate 0.000220 Epoch: 23 Global Step: 479460 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:12:41,958-Speed 2512.56 samples/sec Loss 1.8607 LearningRate 0.000220 Epoch: 23 Global Step: 479470 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:12:50,158-Speed 2497.95 samples/sec Loss 1.8867 LearningRate 0.000220 Epoch: 23 Global Step: 479480 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:12:58,363-Speed 2496.47 samples/sec Loss 1.8459 LearningRate 0.000220 Epoch: 23 Global Step: 479490 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:13:06,569-Speed 2496.17 samples/sec Loss 1.8151 LearningRate 0.000220 Epoch: 23 Global Step: 479500 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:13:14,777-Speed 2495.45 samples/sec Loss 1.8401 LearningRate 0.000220 Epoch: 23 Global Step: 479510 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:13:22,993-Speed 2492.96 samples/sec Loss 1.8408 LearningRate 0.000220 Epoch: 23 Global Step: 479520 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:13:31,155-Speed 2509.49 samples/sec Loss 1.8423 LearningRate 0.000220 Epoch: 23 Global Step: 479530 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:13:39,355-Speed 2498.18 samples/sec Loss 1.8333 LearningRate 0.000220 Epoch: 23 Global Step: 479540 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:13:47,557-Speed 2497.29 samples/sec Loss 1.8843 LearningRate 0.000220 Epoch: 23 Global Step: 479550 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:13:55,759-Speed 2497.56 samples/sec Loss 1.8525 LearningRate 0.000220 Epoch: 23 Global Step: 479560 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:14:03,960-Speed 2497.49 samples/sec Loss 1.8822 LearningRate 0.000220 Epoch: 23 Global Step: 479570 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:14:12,162-Speed 2497.71 samples/sec Loss 1.8285 LearningRate 0.000220 Epoch: 23 Global Step: 479580 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:14:20,313-Speed 2513.10 samples/sec Loss 1.8544 LearningRate 0.000220 Epoch: 23 Global Step: 479590 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:14:28,521-Speed 2495.36 samples/sec Loss 1.8446 LearningRate 0.000220 Epoch: 23 Global Step: 479600 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:14:36,735-Speed 2493.86 samples/sec Loss 1.8639 LearningRate 0.000220 Epoch: 23 Global Step: 479610 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:14:44,936-Speed 2497.40 samples/sec Loss 1.8488 LearningRate 0.000220 Epoch: 23 Global Step: 479620 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:14:53,147-Speed 2494.72 samples/sec Loss 1.8266 LearningRate 0.000220 Epoch: 23 Global Step: 479630 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:15:01,347-Speed 2497.83 samples/sec Loss 1.8269 LearningRate 0.000220 Epoch: 23 Global Step: 479640 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:15:09,495-Speed 2513.94 samples/sec Loss 1.8231 LearningRate 0.000220 Epoch: 23 Global Step: 479650 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:15:17,700-Speed 2496.58 samples/sec Loss 1.8484 LearningRate 0.000220 Epoch: 23 Global Step: 479660 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:15:25,909-Speed 2495.12 samples/sec Loss 1.8561 LearningRate 0.000220 Epoch: 23 Global Step: 479670 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:15:34,109-Speed 2498.10 samples/sec Loss 1.8420 LearningRate 0.000220 Epoch: 23 Global Step: 479680 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:15:42,310-Speed 2497.49 samples/sec Loss 1.8395 LearningRate 0.000220 Epoch: 23 Global Step: 479690 Fp16 Grad Scale: 65536 
Required: 80 hours Training: 2022-07-10 04:15:50,511-Speed 2497.69 samples/sec Loss 1.8504 LearningRate 0.000220 Epoch: 23 Global Step: 479700 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:15:58,669-Speed 2510.86 samples/sec Loss 1.8335 LearningRate 0.000220 Epoch: 23 Global Step: 479710 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:16:06,870-Speed 2497.72 samples/sec Loss 1.8584 LearningRate 0.000220 Epoch: 23 Global Step: 479720 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:16:15,069-Speed 2498.02 samples/sec Loss 1.8671 LearningRate 0.000220 Epoch: 23 Global Step: 479730 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:16:23,271-Speed 2497.55 samples/sec Loss 1.8352 LearningRate 0.000220 Epoch: 23 Global Step: 479740 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:16:31,471-Speed 2497.85 samples/sec Loss 1.8057 LearningRate 0.000220 Epoch: 23 Global Step: 479750 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:16:39,674-Speed 2497.12 samples/sec Loss 1.8781 LearningRate 0.000220 Epoch: 23 Global Step: 479760 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:16:47,825-Speed 2512.78 samples/sec Loss 1.8171 LearningRate 0.000220 Epoch: 23 Global Step: 479770 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:16:56,028-Speed 2497.15 samples/sec Loss 1.8161 LearningRate 0.000219 Epoch: 23 Global Step: 479780 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:17:04,228-Speed 2497.95 samples/sec Loss 1.8367 LearningRate 0.000219 Epoch: 23 Global Step: 479790 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:17:12,429-Speed 2497.58 samples/sec Loss 1.8385 LearningRate 0.000219 Epoch: 23 Global Step: 479800 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:17:20,628-Speed 2498.19 samples/sec Loss 1.8238 LearningRate 0.000219 Epoch: 23 Global Step: 479810 Fp16 Grad Scale: 65536 
Required: 80 hours Training: 2022-07-10 04:17:28,837-Speed 2495.25 samples/sec Loss 1.8769 LearningRate 0.000219 Epoch: 23 Global Step: 479820 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:17:36,986-Speed 2513.74 samples/sec Loss 1.8090 LearningRate 0.000219 Epoch: 23 Global Step: 479830 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:17:45,186-Speed 2497.86 samples/sec Loss 1.8396 LearningRate 0.000219 Epoch: 23 Global Step: 479840 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:17:53,389-Speed 2497.07 samples/sec Loss 1.8536 LearningRate 0.000219 Epoch: 23 Global Step: 479850 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:18:01,593-Speed 2496.73 samples/sec Loss 1.8309 LearningRate 0.000219 Epoch: 23 Global Step: 479860 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:18:09,799-Speed 2496.27 samples/sec Loss 1.8548 LearningRate 0.000219 Epoch: 23 Global Step: 479870 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:18:18,000-Speed 2497.54 samples/sec Loss 1.8515 LearningRate 0.000219 Epoch: 23 Global Step: 479880 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:18:26,151-Speed 2512.89 samples/sec Loss 1.8471 LearningRate 0.000219 Epoch: 23 Global Step: 479890 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:18:34,353-Speed 2497.38 samples/sec Loss 1.8224 LearningRate 0.000219 Epoch: 23 Global Step: 479900 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:18:42,553-Speed 2497.94 samples/sec Loss 1.8187 LearningRate 0.000219 Epoch: 23 Global Step: 479910 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:18:50,756-Speed 2496.93 samples/sec Loss 1.8133 LearningRate 0.000219 Epoch: 23 Global Step: 479920 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:18:58,958-Speed 2497.30 samples/sec Loss 1.8534 LearningRate 0.000219 Epoch: 23 Global Step: 479930 Fp16 Grad Scale: 65536 
Required: 80 hours Training: 2022-07-10 04:19:07,161-Speed 2497.18 samples/sec Loss 1.8229 LearningRate 0.000219 Epoch: 23 Global Step: 479940 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:19:15,316-Speed 2511.79 samples/sec Loss 1.8278 LearningRate 0.000219 Epoch: 23 Global Step: 479950 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:19:23,519-Speed 2496.96 samples/sec Loss 1.8313 LearningRate 0.000219 Epoch: 23 Global Step: 479960 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:19:31,720-Speed 2497.61 samples/sec Loss 1.8124 LearningRate 0.000219 Epoch: 23 Global Step: 479970 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:19:39,919-Speed 2498.29 samples/sec Loss 1.8305 LearningRate 0.000219 Epoch: 23 Global Step: 479980 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:19:48,118-Speed 2498.10 samples/sec Loss 1.8133 LearningRate 0.000219 Epoch: 23 Global Step: 479990 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:19:56,283-Speed 2508.72 samples/sec Loss 1.8520 LearningRate 0.000219 Epoch: 23 Global Step: 480000 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:20:04,431-Speed 2514.08 samples/sec Loss 1.8407 LearningRate 0.000219 Epoch: 23 Global Step: 480010 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:20:12,632-Speed 2497.26 samples/sec Loss 1.8536 LearningRate 0.000219 Epoch: 23 Global Step: 480020 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:20:20,832-Speed 2498.16 samples/sec Loss 1.8425 LearningRate 0.000219 Epoch: 23 Global Step: 480030 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:20:29,030-Speed 2498.66 samples/sec Loss 1.8386 LearningRate 0.000219 Epoch: 23 Global Step: 480040 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:20:37,244-Speed 2493.77 samples/sec Loss 1.8452 LearningRate 0.000219 Epoch: 23 Global Step: 480050 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:20:45,443-Speed 2498.18 samples/sec Loss 1.8481 LearningRate 0.000219 Epoch: 23 Global Step: 480060 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:20:53,592-Speed 2513.63 samples/sec Loss 1.8589 LearningRate 0.000219 Epoch: 23 Global Step: 480070 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:21:01,791-Speed 2498.22 samples/sec Loss 1.8884 LearningRate 0.000219 Epoch: 23 Global Step: 480080 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:21:09,995-Speed 2496.83 samples/sec Loss 1.8663 LearningRate 0.000219 Epoch: 23 Global Step: 480090 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:21:18,196-Speed 2497.52 samples/sec Loss 1.8504 LearningRate 0.000219 Epoch: 23 Global Step: 480100 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:21:26,400-Speed 2496.91 samples/sec Loss 1.8531 LearningRate 0.000219 Epoch: 23 Global Step: 480110 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:21:34,600-Speed 2497.99 samples/sec Loss 1.8485 LearningRate 0.000219 Epoch: 23 Global Step: 480120 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:21:42,748-Speed 2513.97 samples/sec Loss 1.8618 LearningRate 0.000219 Epoch: 23 Global Step: 480130 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:21:50,961-Speed 2494.11 samples/sec Loss 1.8512 LearningRate 0.000219 Epoch: 23 Global Step: 480140 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:21:59,162-Speed 2498.17 samples/sec Loss 1.8257 LearningRate 0.000219 Epoch: 23 Global Step: 480150 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:22:07,376-Speed 2493.92 samples/sec Loss 1.8773 LearningRate 0.000219 Epoch: 23 Global Step: 480160 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:22:15,582-Speed 2495.96 samples/sec Loss 1.8845 LearningRate 0.000219 Epoch: 23 Global Step: 480170 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:22:23,787-Speed 2496.40 samples/sec Loss 1.8657 LearningRate 0.000219 Epoch: 23 Global Step: 480180 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:22:31,963-Speed 2505.24 samples/sec Loss 1.8560 LearningRate 0.000219 Epoch: 23 Global Step: 480190 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:22:40,165-Speed 2497.14 samples/sec Loss 1.8995 LearningRate 0.000219 Epoch: 23 Global Step: 480200 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:22:48,375-Speed 2494.93 samples/sec Loss 1.8432 LearningRate 0.000219 Epoch: 23 Global Step: 480210 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:22:56,599-Speed 2490.43 samples/sec Loss 1.8471 LearningRate 0.000219 Epoch: 23 Global Step: 480220 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:23:04,808-Speed 2495.70 samples/sec Loss 1.8645 LearningRate 0.000219 Epoch: 23 Global Step: 480230 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:23:13,016-Speed 2495.48 samples/sec Loss 1.8712 LearningRate 0.000219 Epoch: 23 Global Step: 480240 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:23:21,180-Speed 2508.85 samples/sec Loss 1.8791 LearningRate 0.000219 Epoch: 23 Global Step: 480250 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:23:29,383-Speed 2497.06 samples/sec Loss 1.8322 LearningRate 0.000219 Epoch: 23 Global Step: 480260 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:23:37,588-Speed 2496.47 samples/sec Loss 1.8673 LearningRate 0.000219 Epoch: 23 Global Step: 480270 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:23:45,789-Speed 2497.47 samples/sec Loss 1.8273 LearningRate 0.000219 Epoch: 23 Global Step: 480280 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:23:53,995-Speed 2496.33 samples/sec Loss 1.8217 LearningRate 0.000219 Epoch: 23 Global Step: 480290 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:24:02,193-Speed 2498.41 samples/sec Loss 1.8241 LearningRate 0.000219 Epoch: 23 Global Step: 480300 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:24:10,342-Speed 2514.21 samples/sec Loss 1.7948 LearningRate 0.000219 Epoch: 23 Global Step: 480310 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:24:18,557-Speed 2493.37 samples/sec Loss 1.8349 LearningRate 0.000219 Epoch: 23 Global Step: 480320 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:24:26,764-Speed 2495.85 samples/sec Loss 1.8640 LearningRate 0.000219 Epoch: 23 Global Step: 480330 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:24:34,963-Speed 2498.25 samples/sec Loss 1.8054 LearningRate 0.000219 Epoch: 23 Global Step: 480340 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:24:43,168-Speed 2496.52 samples/sec Loss 1.8303 LearningRate 0.000219 Epoch: 23 Global Step: 480350 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:24:51,370-Speed 2497.63 samples/sec Loss 1.8308 LearningRate 0.000219 Epoch: 23 Global Step: 480360 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:24:59,519-Speed 2513.32 samples/sec Loss 1.8479 LearningRate 0.000219 Epoch: 23 Global Step: 480370 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:25:07,720-Speed 2497.74 samples/sec Loss 1.8729 LearningRate 0.000219 Epoch: 23 Global Step: 480380 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:25:15,924-Speed 2496.80 samples/sec Loss 1.8397 LearningRate 0.000219 Epoch: 23 Global Step: 480390 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:25:24,128-Speed 2496.56 samples/sec Loss 1.8420 LearningRate 0.000219 Epoch: 23 Global Step: 480400 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:25:32,330-Speed 2497.46 samples/sec Loss 1.8513 LearningRate 0.000219 Epoch: 23 Global Step: 480410 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:25:40,533-Speed 2497.05 samples/sec Loss 1.8443 LearningRate 0.000219 Epoch: 23 Global Step: 480420 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:25:48,698-Speed 2508.89 samples/sec Loss 1.8619 LearningRate 0.000219 Epoch: 23 Global Step: 480430 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:25:56,899-Speed 2497.54 samples/sec Loss 1.8394 LearningRate 0.000219 Epoch: 23 Global Step: 480440 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:26:05,101-Speed 2497.26 samples/sec Loss 1.8452 LearningRate 0.000219 Epoch: 23 Global Step: 480450 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:26:13,314-Speed 2494.28 samples/sec Loss 1.8296 LearningRate 0.000219 Epoch: 23 Global Step: 480460 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:26:21,513-Speed 2498.21 samples/sec Loss 1.8617 LearningRate 0.000219 Epoch: 23 Global Step: 480470 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:26:29,717-Speed 2496.61 samples/sec Loss 1.8693 LearningRate 0.000219 Epoch: 23 Global Step: 480480 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:26:37,865-Speed 2513.93 samples/sec Loss 1.8248 LearningRate 0.000219 Epoch: 23 Global Step: 480490 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:26:46,068-Speed 2497.07 samples/sec Loss 1.8390 LearningRate 0.000219 Epoch: 23 Global Step: 480500 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:26:54,275-Speed 2495.90 samples/sec Loss 1.8198 LearningRate 0.000219 Epoch: 23 Global Step: 480510 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:27:02,475-Speed 2497.81 samples/sec Loss 1.8369 LearningRate 0.000219 Epoch: 23 Global Step: 480520 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:27:10,678-Speed 2497.10 samples/sec Loss 1.8271 LearningRate 0.000219 Epoch: 23 Global Step: 480530 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:27:18,881-Speed 2497.04 samples/sec Loss 1.8218 LearningRate 0.000219 Epoch: 23 Global Step: 480540 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:27:27,027-Speed 2514.69 samples/sec Loss 1.8466 LearningRate 0.000219 Epoch: 23 Global Step: 480550 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:27:35,236-Speed 2495.02 samples/sec Loss 1.8055 LearningRate 0.000219 Epoch: 23 Global Step: 480560 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:27:43,438-Speed 2497.55 samples/sec Loss 1.8471 LearningRate 0.000218 Epoch: 23 Global Step: 480570 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:27:51,644-Speed 2495.93 samples/sec Loss 1.8639 LearningRate 0.000218 Epoch: 23 Global Step: 480580 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:27:59,849-Speed 2496.44 samples/sec Loss 1.8575 LearningRate 0.000218 Epoch: 23 Global Step: 480590 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:28:08,052-Speed 2497.22 samples/sec Loss 1.7981 LearningRate 0.000218 Epoch: 23 Global Step: 480600 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:28:16,204-Speed 2512.47 samples/sec Loss 1.8190 LearningRate 0.000218 Epoch: 23 Global Step: 480610 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:28:24,410-Speed 2496.19 samples/sec Loss 1.8507 LearningRate 0.000218 Epoch: 23 Global Step: 480620 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:28:32,614-Speed 2496.81 samples/sec Loss 1.8454 LearningRate 0.000218 Epoch: 23 Global Step: 480630 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:28:40,820-Speed 2495.78 samples/sec Loss 1.8542 LearningRate 0.000218 Epoch: 23 Global Step: 480640 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:28:49,026-Speed 2496.42 samples/sec Loss 1.8770 LearningRate 0.000218 Epoch: 23 Global Step: 480650 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:28:57,232-Speed 2496.16 samples/sec Loss 1.8975 LearningRate 0.000218 Epoch: 23 Global Step: 480660 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:29:05,382-Speed 2513.02 samples/sec Loss 1.8742 LearningRate 0.000218 Epoch: 23 Global Step: 480670 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:29:13,587-Speed 2496.61 samples/sec Loss 1.8537 LearningRate 0.000218 Epoch: 23 Global Step: 480680 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:29:21,787-Speed 2497.91 samples/sec Loss 1.8047 LearningRate 0.000218 Epoch: 23 Global Step: 480690 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:29:29,989-Speed 2497.27 samples/sec Loss 1.8212 LearningRate 0.000218 Epoch: 23 Global Step: 480700 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:29:38,189-Speed 2498.32 samples/sec Loss 1.8572 LearningRate 0.000218 Epoch: 23 Global Step: 480710 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:29:46,402-Speed 2493.83 samples/sec Loss 1.8575 LearningRate 0.000218 Epoch: 23 Global Step: 480720 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:29:54,551-Speed 2513.85 samples/sec Loss 1.8487 LearningRate 0.000218 Epoch: 23 Global Step: 480730 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:30:02,752-Speed 2497.45 samples/sec Loss 1.8515 LearningRate 0.000218 Epoch: 23 Global Step: 480740 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:30:10,955-Speed 2496.93 samples/sec Loss 1.8701 LearningRate 0.000218 Epoch: 23 Global Step: 480750 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:30:19,155-Speed 2497.92 samples/sec Loss 1.8609 LearningRate 0.000218 Epoch: 23 Global Step: 480760 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:30:27,369-Speed 2493.86 samples/sec Loss 1.8205 LearningRate 0.000218 Epoch: 23 Global Step: 480770 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:30:35,578-Speed 2495.17 samples/sec Loss 1.8198 LearningRate 0.000218 Epoch: 23 Global Step: 480780 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:30:43,730-Speed 2512.75 samples/sec Loss 1.8602 LearningRate 0.000218 Epoch: 23 Global Step: 480790 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:30:51,938-Speed 2495.67 samples/sec Loss 1.8668 LearningRate 0.000218 Epoch: 23 Global Step: 480800 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:31:00,145-Speed 2495.90 samples/sec Loss 1.8572 LearningRate 0.000218 Epoch: 23 Global Step: 480810 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:31:08,348-Speed 2497.23 samples/sec Loss 1.8396 LearningRate 0.000218 Epoch: 23 Global Step: 480820 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:31:16,547-Speed 2498.25 samples/sec Loss 1.8980 LearningRate 0.000218 Epoch: 23 Global Step: 480830 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:31:24,758-Speed 2494.58 samples/sec Loss 1.8897 LearningRate 0.000218 Epoch: 23 Global Step: 480840 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:31:32,912-Speed 2512.07 samples/sec Loss 1.7808 LearningRate 0.000218 Epoch: 23 Global Step: 480850 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:31:41,114-Speed 2497.22 samples/sec Loss 1.8210 LearningRate 0.000218 Epoch: 23 Global Step: 480860 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:31:49,327-Speed 2493.92 samples/sec Loss 1.8074 LearningRate 0.000218 Epoch: 23 Global Step: 480870 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:31:57,544-Speed 2492.80 samples/sec Loss 1.8352 LearningRate 0.000218 Epoch: 23 Global Step: 480880 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:32:05,750-Speed 2496.15 samples/sec Loss 1.8589 LearningRate 0.000218 Epoch: 23 Global Step: 480890 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:32:13,952-Speed 2498.28 samples/sec Loss 1.8568 LearningRate 0.000218 Epoch: 23 Global Step: 480900 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:32:22,105-Speed 2512.66 samples/sec Loss 1.8275 LearningRate 0.000218 Epoch: 23 Global Step: 480910 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:32:30,314-Speed 2495.37 samples/sec Loss 1.8900 LearningRate 0.000218 Epoch: 23 Global Step: 480920 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:32:38,513-Speed 2497.95 samples/sec Loss 1.8639 LearningRate 0.000218 Epoch: 23 Global Step: 480930 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:32:46,735-Speed 2491.78 samples/sec Loss 1.7854 LearningRate 0.000218 Epoch: 23 Global Step: 480940 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:32:54,944-Speed 2495.05 samples/sec Loss 1.8199 LearningRate 0.000218 Epoch: 23 Global Step: 480950 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:33:03,150-Speed 2496.04 samples/sec Loss 1.8290 LearningRate 0.000218 Epoch: 23 Global Step: 480960 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:33:11,310-Speed 2510.22 samples/sec Loss 1.8437 LearningRate 0.000218 Epoch: 23 Global Step: 480970 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:33:19,525-Speed 2493.56 samples/sec Loss 1.8689 LearningRate 0.000218 Epoch: 23 Global Step: 480980 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:33:27,740-Speed 2493.47 samples/sec Loss 1.8479 LearningRate 0.000218 Epoch: 23 Global Step: 480990 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:33:35,943-Speed 2496.94 samples/sec Loss 1.8367 LearningRate 0.000218 Epoch: 23 Global Step: 481000 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:33:44,158-Speed 2493.62 samples/sec Loss 1.8176 LearningRate 0.000218 Epoch: 23 Global Step: 481010 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:33:52,363-Speed 2496.30 samples/sec Loss 1.8207 LearningRate 0.000218 Epoch: 23 Global Step: 481020 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:34:00,512-Speed 2513.53 samples/sec Loss 1.8164 LearningRate 0.000218 Epoch: 23 Global Step: 481030 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:34:08,720-Speed 2495.73 samples/sec Loss 1.8381 LearningRate 0.000218 Epoch: 23 Global Step: 481040 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:34:16,925-Speed 2496.57 samples/sec Loss 1.8239 LearningRate 0.000218 Epoch: 23 Global Step: 481050 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:34:25,130-Speed 2496.41 samples/sec Loss 1.8362 LearningRate 0.000218 Epoch: 23 Global Step: 481060 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:34:33,335-Speed 2496.32 samples/sec Loss 1.8387 LearningRate 0.000218 Epoch: 23 Global Step: 481070 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:34:41,538-Speed 2497.23 samples/sec Loss 1.8351 LearningRate 0.000218 Epoch: 23 Global Step: 481080 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:34:49,687-Speed 2513.47 samples/sec Loss 1.8467 LearningRate 0.000218 Epoch: 23 Global Step: 481090 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:34:57,893-Speed 2496.32 samples/sec Loss 1.8036 LearningRate 0.000218 Epoch: 23 Global Step: 481100 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:35:06,094-Speed 2497.60 samples/sec Loss 1.8511 LearningRate 0.000218 Epoch: 23 Global Step: 481110 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:35:14,299-Speed 2496.43 samples/sec Loss 1.8590 LearningRate 0.000218 Epoch: 23 Global Step: 481120 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:35:22,503-Speed 2496.61 samples/sec Loss 1.8315 LearningRate 0.000218 Epoch: 23 Global Step: 481130 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:35:30,702-Speed 2498.60 samples/sec Loss 1.8643 LearningRate 0.000218 Epoch: 23 Global Step: 481140 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:35:38,851-Speed 2513.68 samples/sec Loss 1.8951 LearningRate 0.000218 Epoch: 23 Global Step: 481150 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:35:47,050-Speed 2498.02 samples/sec Loss 1.8597 LearningRate 0.000218 Epoch: 23 Global Step: 481160 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:35:55,250-Speed 2498.01 samples/sec Loss 1.8736 LearningRate 0.000218 Epoch: 23 Global Step: 481170 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:36:03,455-Speed 2496.51 samples/sec Loss 1.8289 LearningRate 0.000218 Epoch: 23 Global Step: 481180 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:36:11,680-Speed 2490.66 samples/sec Loss 1.8373 LearningRate 0.000218 Epoch: 23 Global Step: 481190 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:36:19,881-Speed 2497.34 samples/sec Loss 1.8382 LearningRate 0.000218 Epoch: 23 Global Step: 481200 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:36:28,047-Speed 2508.63 samples/sec Loss 1.8147 LearningRate 0.000218 Epoch: 23 Global Step: 481210 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:36:36,266-Speed 2492.42 samples/sec Loss 1.8018 LearningRate 0.000218 Epoch: 23 Global Step: 481220 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:36:44,469-Speed 2496.90 samples/sec Loss 1.8407 LearningRate 0.000218 Epoch: 23 Global Step: 481230 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:36:52,670-Speed 2497.83 samples/sec Loss 1.8478 LearningRate 0.000218 Epoch: 23 Global Step: 481240 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:37:00,872-Speed 2497.30 samples/sec Loss 1.8471 LearningRate 0.000218 Epoch: 23 Global Step: 481250 Fp16 Grad Scale: 65536 
Required: 80 hours Training: 2022-07-10 04:37:09,081-Speed 2495.29 samples/sec Loss 1.8355 LearningRate 0.000218 Epoch: 23 Global Step: 481260 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:37:17,238-Speed 2511.13 samples/sec Loss 1.8404 LearningRate 0.000218 Epoch: 23 Global Step: 481270 Fp16 Grad Scale: 65536 Required: 80 hours Training: 2022-07-10 04:37:25,409-Speed 2506.77 samples/sec Loss 1.8267 LearningRate 0.000218 Epoch: 23 Global Step: 481280 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:37:33,610-Speed 2497.75 samples/sec Loss 1.8632 LearningRate 0.000218 Epoch: 23 Global Step: 481290 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:37:41,812-Speed 2497.39 samples/sec Loss 1.8580 LearningRate 0.000218 Epoch: 23 Global Step: 481300 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:37:50,024-Speed 2494.66 samples/sec Loss 1.8029 LearningRate 0.000218 Epoch: 23 Global Step: 481310 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:37:58,224-Speed 2497.65 samples/sec Loss 1.8285 LearningRate 0.000218 Epoch: 23 Global Step: 481320 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:38:06,382-Speed 2510.96 samples/sec Loss 1.8260 LearningRate 0.000218 Epoch: 23 Global Step: 481330 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:38:14,581-Speed 2498.40 samples/sec Loss 1.8606 LearningRate 0.000218 Epoch: 23 Global Step: 481340 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:38:22,799-Speed 2492.82 samples/sec Loss 1.8734 LearningRate 0.000218 Epoch: 23 Global Step: 481350 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:38:31,031-Speed 2487.98 samples/sec Loss 1.8618 LearningRate 0.000218 Epoch: 23 Global Step: 481360 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:38:39,236-Speed 2496.64 samples/sec Loss 1.8199 LearningRate 0.000217 Epoch: 23 Global Step: 481370 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:38:47,444-Speed 2495.44 samples/sec Loss 1.8408 LearningRate 0.000217 Epoch: 23 Global Step: 481380 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:38:55,595-Speed 2512.90 samples/sec Loss 1.8877 LearningRate 0.000217 Epoch: 23 Global Step: 481390 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:39:03,798-Speed 2496.98 samples/sec Loss 1.8847 LearningRate 0.000217 Epoch: 23 Global Step: 481400 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:39:12,000-Speed 2497.59 samples/sec Loss 1.9339 LearningRate 0.000217 Epoch: 23 Global Step: 481410 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:39:20,206-Speed 2496.20 samples/sec Loss 1.8531 LearningRate 0.000217 Epoch: 23 Global Step: 481420 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:39:28,405-Speed 2498.10 samples/sec Loss 1.8509 LearningRate 0.000217 Epoch: 23 Global Step: 481430 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:39:36,625-Speed 2491.75 samples/sec Loss 1.8503 LearningRate 0.000217 Epoch: 23 Global Step: 481440 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:39:44,774-Speed 2513.63 samples/sec Loss 1.8748 LearningRate 0.000217 Epoch: 23 Global Step: 481450 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:39:52,976-Speed 2497.68 samples/sec Loss 1.8021 LearningRate 0.000217 Epoch: 23 Global Step: 481460 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:40:01,180-Speed 2497.04 samples/sec Loss 1.8463 LearningRate 0.000217 Epoch: 23 Global Step: 481470 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:40:09,380-Speed 2497.81 samples/sec Loss 1.8270 LearningRate 0.000217 Epoch: 23 Global Step: 481480 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:40:17,594-Speed 2493.74 samples/sec Loss 1.8324 LearningRate 0.000217 Epoch: 23 Global Step: 481490 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:40:25,815-Speed 2491.53 samples/sec Loss 1.8618 LearningRate 0.000217 Epoch: 23 Global Step: 481500 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:40:33,976-Speed 2509.71 samples/sec Loss 1.8041 LearningRate 0.000217 Epoch: 23 Global Step: 481510 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:40:42,178-Speed 2497.44 samples/sec Loss 1.7949 LearningRate 0.000217 Epoch: 23 Global Step: 481520 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:40:50,380-Speed 2497.61 samples/sec Loss 1.8775 LearningRate 0.000217 Epoch: 23 Global Step: 481530 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:40:58,581-Speed 2497.34 samples/sec Loss 1.8396 LearningRate 0.000217 Epoch: 23 Global Step: 481540 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:41:06,795-Speed 2494.04 samples/sec Loss 1.8528 LearningRate 0.000217 Epoch: 23 Global Step: 481550 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:41:15,012-Speed 2492.64 samples/sec Loss 1.7940 LearningRate 0.000217 Epoch: 23 Global Step: 481560 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:41:23,168-Speed 2511.44 samples/sec Loss 1.8403 LearningRate 0.000217 Epoch: 23 Global Step: 481570 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:41:31,371-Speed 2496.92 samples/sec Loss 1.8466 LearningRate 0.000217 Epoch: 23 Global Step: 481580 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:41:39,573-Speed 2497.33 samples/sec Loss 1.8324 LearningRate 0.000217 Epoch: 23 Global Step: 481590 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:41:47,775-Speed 2497.46 samples/sec Loss 1.8448 LearningRate 0.000217 Epoch: 23 Global Step: 481600 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:41:55,977-Speed 2497.38 samples/sec Loss 1.8190 LearningRate 0.000217 Epoch: 23 Global Step: 481610 Fp16 Grad Scale: 32768 
Required: 80 hours Training: 2022-07-10 04:42:04,178-Speed 2497.66 samples/sec Loss 1.7861 LearningRate 0.000217 Epoch: 23 Global Step: 481620 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:42:12,327-Speed 2513.67 samples/sec Loss 1.8273 LearningRate 0.000217 Epoch: 23 Global Step: 481630 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:42:20,541-Speed 2493.80 samples/sec Loss 1.8393 LearningRate 0.000217 Epoch: 23 Global Step: 481640 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:42:28,745-Speed 2496.76 samples/sec Loss 1.8071 LearningRate 0.000217 Epoch: 23 Global Step: 481650 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:42:36,949-Speed 2496.54 samples/sec Loss 1.8408 LearningRate 0.000217 Epoch: 23 Global Step: 481660 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:42:45,163-Speed 2493.81 samples/sec Loss 1.8089 LearningRate 0.000217 Epoch: 23 Global Step: 481670 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:42:53,362-Speed 2497.85 samples/sec Loss 1.8274 LearningRate 0.000217 Epoch: 23 Global Step: 481680 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:43:01,514-Speed 2512.89 samples/sec Loss 1.7972 LearningRate 0.000217 Epoch: 23 Global Step: 481690 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:43:09,720-Speed 2496.28 samples/sec Loss 1.8110 LearningRate 0.000217 Epoch: 23 Global Step: 481700 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:43:17,937-Speed 2492.51 samples/sec Loss 1.8120 LearningRate 0.000217 Epoch: 23 Global Step: 481710 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:43:26,145-Speed 2495.56 samples/sec Loss 1.8525 LearningRate 0.000217 Epoch: 23 Global Step: 481720 Fp16 Grad Scale: 32768 Required: 80 hours Training: 2022-07-10 04:43:34,314-Speed 2507.45 samples/sec Loss 1.8115 LearningRate 0.000217 Epoch: 23 Global Step: 481730 Fp16 Grad Scale: 16384 
Required: 80 hours Training: 2022-07-10 04:43:42,531-Speed 2492.75 samples/sec Loss 1.8081 LearningRate 0.000217 Epoch: 23 Global Step: 481740 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:43:50,711-Speed 2504.04 samples/sec Loss 1.8185 LearningRate 0.000217 Epoch: 23 Global Step: 481750 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:43:58,924-Speed 2494.05 samples/sec Loss 1.8428 LearningRate 0.000217 Epoch: 23 Global Step: 481760 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:44:07,128-Speed 2496.68 samples/sec Loss 1.8560 LearningRate 0.000217 Epoch: 23 Global Step: 481770 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:44:15,329-Speed 2497.49 samples/sec Loss 1.8396 LearningRate 0.000217 Epoch: 23 Global Step: 481780 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:44:23,530-Speed 2497.52 samples/sec Loss 1.8517 LearningRate 0.000217 Epoch: 23 Global Step: 481790 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:44:31,737-Speed 2495.98 samples/sec Loss 1.8438 LearningRate 0.000217 Epoch: 23 Global Step: 481800 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:44:39,887-Speed 2513.34 samples/sec Loss 1.8404 LearningRate 0.000217 Epoch: 23 Global Step: 481810 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:44:48,103-Speed 2493.29 samples/sec Loss 1.8126 LearningRate 0.000217 Epoch: 23 Global Step: 481820 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:44:56,306-Speed 2496.97 samples/sec Loss 1.8260 LearningRate 0.000217 Epoch: 23 Global Step: 481830 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:45:04,509-Speed 2497.12 samples/sec Loss 1.8499 LearningRate 0.000217 Epoch: 23 Global Step: 481840 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:45:12,726-Speed 2492.67 samples/sec Loss 1.8434 LearningRate 0.000217 Epoch: 23 Global Step: 481850 Fp16 Grad Scale: 16384 
Required: 80 hours Training: 2022-07-10 04:45:20,930-Speed 2496.60 samples/sec Loss 1.8430 LearningRate 0.000217 Epoch: 23 Global Step: 481860 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:45:29,082-Speed 2512.70 samples/sec Loss 1.8353 LearningRate 0.000217 Epoch: 23 Global Step: 481870 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:45:37,284-Speed 2497.30 samples/sec Loss 1.8578 LearningRate 0.000217 Epoch: 23 Global Step: 481880 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:45:45,486-Speed 2497.36 samples/sec Loss 1.8330 LearningRate 0.000217 Epoch: 23 Global Step: 481890 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:45:53,694-Speed 2495.61 samples/sec Loss 1.8408 LearningRate 0.000217 Epoch: 23 Global Step: 481900 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:46:01,898-Speed 2496.72 samples/sec Loss 1.8128 LearningRate 0.000217 Epoch: 23 Global Step: 481910 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:46:10,116-Speed 2492.48 samples/sec Loss 1.8470 LearningRate 0.000217 Epoch: 23 Global Step: 481920 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:46:18,264-Speed 2513.95 samples/sec Loss 1.8127 LearningRate 0.000217 Epoch: 23 Global Step: 481930 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:46:26,468-Speed 2496.61 samples/sec Loss 1.8259 LearningRate 0.000217 Epoch: 23 Global Step: 481940 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:46:34,672-Speed 2496.68 samples/sec Loss 1.8514 LearningRate 0.000217 Epoch: 23 Global Step: 481950 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:46:42,883-Speed 2494.66 samples/sec Loss 1.8290 LearningRate 0.000217 Epoch: 23 Global Step: 481960 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:46:51,086-Speed 2497.20 samples/sec Loss 1.8694 LearningRate 0.000217 Epoch: 23 Global Step: 481970 Fp16 Grad Scale: 16384 
Required: 80 hours Training: 2022-07-10 04:46:59,287-Speed 2497.50 samples/sec Loss 1.8555 LearningRate 0.000217 Epoch: 23 Global Step: 481980 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:47:07,433-Speed 2515.53 samples/sec Loss 1.8391 LearningRate 0.000217 Epoch: 23 Global Step: 481990 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:47:15,647-Speed 2493.64 samples/sec Loss 1.8840 LearningRate 0.000217 Epoch: 23 Global Step: 482000 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:47:23,855-Speed 2495.38 samples/sec Loss 1.8458 LearningRate 0.000217 Epoch: 23 Global Step: 482010 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:47:32,060-Speed 2496.23 samples/sec Loss 1.8442 LearningRate 0.000217 Epoch: 23 Global Step: 482020 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:47:40,262-Speed 2497.45 samples/sec Loss 1.8417 LearningRate 0.000217 Epoch: 23 Global Step: 482030 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:47:48,473-Speed 2494.73 samples/sec Loss 1.8469 LearningRate 0.000217 Epoch: 23 Global Step: 482040 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:47:56,626-Speed 2512.26 samples/sec Loss 1.8658 LearningRate 0.000217 Epoch: 23 Global Step: 482050 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:48:04,836-Speed 2494.87 samples/sec Loss 1.8537 LearningRate 0.000217 Epoch: 23 Global Step: 482060 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:48:13,039-Speed 2497.65 samples/sec Loss 1.8380 LearningRate 0.000217 Epoch: 23 Global Step: 482070 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:48:21,262-Speed 2490.82 samples/sec Loss 1.8629 LearningRate 0.000217 Epoch: 23 Global Step: 482080 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:48:29,463-Speed 2497.72 samples/sec Loss 1.8382 LearningRate 0.000217 Epoch: 23 Global Step: 482090 Fp16 Grad Scale: 16384 
Required: 80 hours Training: 2022-07-10 04:48:37,679-Speed 2493.06 samples/sec Loss 1.8248 LearningRate 0.000217 Epoch: 23 Global Step: 482100 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:48:45,831-Speed 2513.26 samples/sec Loss 1.8471 LearningRate 0.000217 Epoch: 23 Global Step: 482110 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:48:54,032-Speed 2497.79 samples/sec Loss 1.8164 LearningRate 0.000217 Epoch: 23 Global Step: 482120 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:49:02,249-Speed 2492.81 samples/sec Loss 1.8414 LearningRate 0.000217 Epoch: 23 Global Step: 482130 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:49:10,468-Speed 2492.23 samples/sec Loss 1.8294 LearningRate 0.000217 Epoch: 23 Global Step: 482140 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:49:18,669-Speed 2497.45 samples/sec Loss 1.8525 LearningRate 0.000217 Epoch: 23 Global Step: 482150 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:49:26,877-Speed 2495.62 samples/sec Loss 1.8621 LearningRate 0.000217 Epoch: 23 Global Step: 482160 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:49:35,026-Speed 2513.70 samples/sec Loss 1.8175 LearningRate 0.000216 Epoch: 23 Global Step: 482170 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:49:43,227-Speed 2497.81 samples/sec Loss 1.8202 LearningRate 0.000216 Epoch: 23 Global Step: 482180 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:49:51,435-Speed 2495.70 samples/sec Loss 1.8345 LearningRate 0.000216 Epoch: 23 Global Step: 482190 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:49:59,645-Speed 2494.58 samples/sec Loss 1.7847 LearningRate 0.000216 Epoch: 23 Global Step: 482200 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:50:07,849-Speed 2496.71 samples/sec Loss 1.8213 LearningRate 0.000216 Epoch: 23 Global Step: 482210 Fp16 Grad Scale: 16384 
Required: 80 hours Training: 2022-07-10 04:50:16,054-Speed 2496.43 samples/sec Loss 1.8758 LearningRate 0.000216 Epoch: 23 Global Step: 482220 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:50:24,204-Speed 2513.35 samples/sec Loss 1.8996 LearningRate 0.000216 Epoch: 23 Global Step: 482230 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:50:32,417-Speed 2493.91 samples/sec Loss 1.8439 LearningRate 0.000216 Epoch: 23 Global Step: 482240 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:50:40,620-Speed 2496.99 samples/sec Loss 1.8300 LearningRate 0.000216 Epoch: 23 Global Step: 482250 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:50:48,823-Speed 2497.34 samples/sec Loss 1.8571 LearningRate 0.000216 Epoch: 23 Global Step: 482260 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:50:57,030-Speed 2496.05 samples/sec Loss 1.8315 LearningRate 0.000216 Epoch: 23 Global Step: 482270 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:51:05,238-Speed 2495.54 samples/sec Loss 1.8348 LearningRate 0.000216 Epoch: 23 Global Step: 482280 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:51:13,386-Speed 2513.79 samples/sec Loss 1.8588 LearningRate 0.000216 Epoch: 23 Global Step: 482290 Fp16 Grad Scale: 16384 Required: 80 hours Training: 2022-07-10 04:51:21,593-Speed 2496.14 samples/sec Loss 1.8580 LearningRate 0.000216 Epoch: 23 Global Step: 482300 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:51:29,794-Speed 2497.50 samples/sec Loss 1.8416 LearningRate 0.000216 Epoch: 23 Global Step: 482310 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:51:38,005-Speed 2494.59 samples/sec Loss 1.8340 LearningRate 0.000216 Epoch: 23 Global Step: 482320 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:51:46,206-Speed 2497.75 samples/sec Loss 1.8145 LearningRate 0.000216 Epoch: 23 Global Step: 482330 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 04:51:54,408-Speed 2497.48 samples/sec Loss 1.7884 LearningRate 0.000216 Epoch: 23 Global Step: 482340 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:52:02,557-Speed 2513.66 samples/sec Loss 1.8037 LearningRate 0.000216 Epoch: 23 Global Step: 482350 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:52:10,766-Speed 2495.05 samples/sec Loss 1.8175 LearningRate 0.000216 Epoch: 23 Global Step: 482360 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:52:18,978-Speed 2494.09 samples/sec Loss 1.7983 LearningRate 0.000216 Epoch: 23 Global Step: 482370 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:52:27,182-Speed 2497.17 samples/sec Loss 1.8085 LearningRate 0.000216 Epoch: 23 Global Step: 482380 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:52:35,388-Speed 2496.39 samples/sec Loss 1.8439 LearningRate 0.000216 Epoch: 23 Global Step: 482390 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:52:43,601-Speed 2493.93 samples/sec Loss 1.8674 LearningRate 0.000216 Epoch: 23 Global Step: 482400 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:52:51,752-Speed 2513.12 samples/sec Loss 1.8420 LearningRate 0.000216 Epoch: 23 Global Step: 482410 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:52:59,961-Speed 2495.12 samples/sec Loss 1.8573 LearningRate 0.000216 Epoch: 23 Global Step: 482420 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:53:08,166-Speed 2496.35 samples/sec Loss 1.8126 LearningRate 0.000216 Epoch: 23 Global Step: 482430 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:53:16,369-Speed 2497.72 samples/sec Loss 1.8183 LearningRate 0.000216 Epoch: 23 Global Step: 482440 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:53:24,588-Speed 2492.57 samples/sec Loss 1.7790 LearningRate 0.000216 Epoch: 23 Global Step: 482450 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 04:53:32,787-Speed 2498.02 samples/sec Loss 1.8424 LearningRate 0.000216 Epoch: 23 Global Step: 482460 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:53:40,937-Speed 2513.38 samples/sec Loss 1.8126 LearningRate 0.000216 Epoch: 23 Global Step: 482470 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:53:49,153-Speed 2493.11 samples/sec Loss 1.8115 LearningRate 0.000216 Epoch: 23 Global Step: 482480 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:53:57,355-Speed 2497.58 samples/sec Loss 1.8435 LearningRate 0.000216 Epoch: 23 Global Step: 482490 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:54:05,556-Speed 2497.54 samples/sec Loss 1.8132 LearningRate 0.000216 Epoch: 23 Global Step: 482500 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:54:13,772-Speed 2493.24 samples/sec Loss 1.8779 LearningRate 0.000216 Epoch: 23 Global Step: 482510 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:54:21,976-Speed 2496.59 samples/sec Loss 1.8037 LearningRate 0.000216 Epoch: 23 Global Step: 482520 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:54:30,155-Speed 2504.55 samples/sec Loss 1.8211 LearningRate 0.000216 Epoch: 23 Global Step: 482530 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:54:38,356-Speed 2497.39 samples/sec Loss 1.8133 LearningRate 0.000216 Epoch: 23 Global Step: 482540 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:54:46,563-Speed 2495.80 samples/sec Loss 1.7922 LearningRate 0.000216 Epoch: 23 Global Step: 482550 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:54:54,768-Speed 2496.71 samples/sec Loss 1.7953 LearningRate 0.000216 Epoch: 23 Global Step: 482560 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:55:02,963-Speed 2499.20 samples/sec Loss 1.8151 LearningRate 0.000216 Epoch: 23 Global Step: 482570 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 04:55:11,164-Speed 2497.90 samples/sec Loss 1.7940 LearningRate 0.000216 Epoch: 23 Global Step: 482580 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:55:19,323-Speed 2510.69 samples/sec Loss 1.7936 LearningRate 0.000216 Epoch: 23 Global Step: 482590 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:55:27,532-Speed 2495.22 samples/sec Loss 1.8211 LearningRate 0.000216 Epoch: 23 Global Step: 482600 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:55:35,743-Speed 2494.30 samples/sec Loss 1.8394 LearningRate 0.000216 Epoch: 23 Global Step: 482610 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:55:43,948-Speed 2496.69 samples/sec Loss 1.8426 LearningRate 0.000216 Epoch: 23 Global Step: 482620 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:55:52,148-Speed 2497.93 samples/sec Loss 1.7974 LearningRate 0.000216 Epoch: 23 Global Step: 482630 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:56:00,346-Speed 2498.55 samples/sec Loss 1.7966 LearningRate 0.000216 Epoch: 23 Global Step: 482640 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:56:08,505-Speed 2510.75 samples/sec Loss 1.8376 LearningRate 0.000216 Epoch: 23 Global Step: 482650 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:56:16,705-Speed 2497.85 samples/sec Loss 1.8749 LearningRate 0.000216 Epoch: 23 Global Step: 482660 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:56:24,917-Speed 2494.50 samples/sec Loss 1.8297 LearningRate 0.000216 Epoch: 23 Global Step: 482670 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:56:33,116-Speed 2498.11 samples/sec Loss 1.8113 LearningRate 0.000216 Epoch: 23 Global Step: 482680 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:56:41,331-Speed 2493.44 samples/sec Loss 1.8646 LearningRate 0.000216 Epoch: 23 Global Step: 482690 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 04:56:49,532-Speed 2497.36 samples/sec Loss 1.7773 LearningRate 0.000216 Epoch: 23 Global Step: 482700 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:56:57,679-Speed 2514.47 samples/sec Loss 1.8422 LearningRate 0.000216 Epoch: 23 Global Step: 482710 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:57:05,882-Speed 2497.01 samples/sec Loss 1.8054 LearningRate 0.000216 Epoch: 23 Global Step: 482720 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:57:14,081-Speed 2498.32 samples/sec Loss 1.8315 LearningRate 0.000216 Epoch: 23 Global Step: 482730 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:57:22,285-Speed 2496.50 samples/sec Loss 1.8227 LearningRate 0.000216 Epoch: 23 Global Step: 482740 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:57:30,490-Speed 2496.62 samples/sec Loss 1.8480 LearningRate 0.000216 Epoch: 23 Global Step: 482750 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:57:38,694-Speed 2496.81 samples/sec Loss 1.7849 LearningRate 0.000216 Epoch: 23 Global Step: 482760 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:57:46,839-Speed 2514.87 samples/sec Loss 1.8739 LearningRate 0.000216 Epoch: 23 Global Step: 482770 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:57:55,041-Speed 2497.32 samples/sec Loss 1.8194 LearningRate 0.000216 Epoch: 23 Global Step: 482780 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:58:03,253-Speed 2494.48 samples/sec Loss 1.8029 LearningRate 0.000216 Epoch: 23 Global Step: 482790 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:58:11,456-Speed 2496.80 samples/sec Loss 1.8526 LearningRate 0.000216 Epoch: 23 Global Step: 482800 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:58:19,660-Speed 2496.80 samples/sec Loss 1.8543 LearningRate 0.000216 Epoch: 23 Global Step: 482810 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 04:58:27,868-Speed 2495.88 samples/sec Loss 1.8260 LearningRate 0.000216 Epoch: 23 Global Step: 482820 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:58:36,018-Speed 2513.21 samples/sec Loss 1.8824 LearningRate 0.000216 Epoch: 23 Global Step: 482830 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:58:44,223-Speed 2496.41 samples/sec Loss 1.7934 LearningRate 0.000216 Epoch: 23 Global Step: 482840 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:58:52,425-Speed 2497.45 samples/sec Loss 1.8493 LearningRate 0.000216 Epoch: 23 Global Step: 482850 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:59:00,636-Speed 2494.78 samples/sec Loss 1.8482 LearningRate 0.000216 Epoch: 23 Global Step: 482860 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:59:08,838-Speed 2497.41 samples/sec Loss 1.8000 LearningRate 0.000216 Epoch: 23 Global Step: 482870 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:59:17,042-Speed 2496.73 samples/sec Loss 1.8485 LearningRate 0.000216 Epoch: 23 Global Step: 482880 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:59:25,195-Speed 2512.23 samples/sec Loss 1.8543 LearningRate 0.000216 Epoch: 23 Global Step: 482890 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:59:33,398-Speed 2497.01 samples/sec Loss 1.8545 LearningRate 0.000216 Epoch: 23 Global Step: 482900 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:59:41,599-Speed 2497.96 samples/sec Loss 1.8748 LearningRate 0.000216 Epoch: 23 Global Step: 482910 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:59:49,802-Speed 2496.97 samples/sec Loss 1.8017 LearningRate 0.000216 Epoch: 23 Global Step: 482920 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 04:59:58,016-Speed 2493.80 samples/sec Loss 1.8313 LearningRate 0.000216 Epoch: 23 Global Step: 482930 Fp16 Grad Scale: 32768 
Required: 79 hours Training: 2022-07-10 05:00:06,177-Speed 2509.75 samples/sec Loss 1.8652 LearningRate 0.000216 Epoch: 23 Global Step: 482940 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:00:14,324-Speed 2514.29 samples/sec Loss 1.8170 LearningRate 0.000216 Epoch: 23 Global Step: 482950 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:00:22,525-Speed 2497.76 samples/sec Loss 1.7946 LearningRate 0.000216 Epoch: 23 Global Step: 482960 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:00:30,724-Speed 2498.09 samples/sec Loss 1.8260 LearningRate 0.000216 Epoch: 23 Global Step: 482970 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:00:38,921-Speed 2498.78 samples/sec Loss 1.8338 LearningRate 0.000215 Epoch: 23 Global Step: 482980 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:00:47,124-Speed 2499.03 samples/sec Loss 1.8250 LearningRate 0.000215 Epoch: 23 Global Step: 482990 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:00:55,327-Speed 2497.32 samples/sec Loss 1.8338 LearningRate 0.000215 Epoch: 23 Global Step: 483000 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:01:03,470-Speed 2515.29 samples/sec Loss 1.8303 LearningRate 0.000215 Epoch: 23 Global Step: 483010 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:01:11,671-Speed 2497.55 samples/sec Loss 1.8561 LearningRate 0.000215 Epoch: 23 Global Step: 483020 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:01:19,871-Speed 2498.16 samples/sec Loss 1.8298 LearningRate 0.000215 Epoch: 23 Global Step: 483030 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:01:28,074-Speed 2496.97 samples/sec Loss 1.8099 LearningRate 0.000215 Epoch: 23 Global Step: 483040 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:01:36,279-Speed 2496.31 samples/sec Loss 1.8122 LearningRate 0.000215 Epoch: 23 Global Step: 483050 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:01:44,492-Speed 2494.29 samples/sec Loss 1.8300 LearningRate 0.000215 Epoch: 23 Global Step: 483060 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:01:52,657-Speed 2508.40 samples/sec Loss 1.8174 LearningRate 0.000215 Epoch: 23 Global Step: 483070 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:02:00,860-Speed 2497.13 samples/sec Loss 1.8911 LearningRate 0.000215 Epoch: 23 Global Step: 483080 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:02:09,065-Speed 2496.30 samples/sec Loss 1.8484 LearningRate 0.000215 Epoch: 23 Global Step: 483090 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:02:17,284-Speed 2492.56 samples/sec Loss 1.8189 LearningRate 0.000215 Epoch: 23 Global Step: 483100 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:02:25,486-Speed 2497.46 samples/sec Loss 1.7971 LearningRate 0.000215 Epoch: 23 Global Step: 483110 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:02:33,695-Speed 2495.23 samples/sec Loss 1.8220 LearningRate 0.000215 Epoch: 23 Global Step: 483120 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:02:41,858-Speed 2509.34 samples/sec Loss 1.8384 LearningRate 0.000215 Epoch: 23 Global Step: 483130 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:02:50,060-Speed 2497.61 samples/sec Loss 1.8563 LearningRate 0.000215 Epoch: 23 Global Step: 483140 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:02:58,283-Speed 2491.05 samples/sec Loss 1.7932 LearningRate 0.000215 Epoch: 23 Global Step: 483150 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:03:06,495-Speed 2494.38 samples/sec Loss 1.8357 LearningRate 0.000215 Epoch: 23 Global Step: 483160 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:03:14,697-Speed 2497.32 samples/sec Loss 1.8610 LearningRate 0.000215 Epoch: 23 Global Step: 483170 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:03:22,908-Speed 2495.01 samples/sec Loss 1.7730 LearningRate 0.000215 Epoch: 23 Global Step: 483180 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:03:31,069-Speed 2509.92 samples/sec Loss 1.8351 LearningRate 0.000215 Epoch: 23 Global Step: 483190 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:03:39,269-Speed 2497.88 samples/sec Loss 1.8718 LearningRate 0.000215 Epoch: 23 Global Step: 483200 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:03:47,473-Speed 2496.65 samples/sec Loss 1.8055 LearningRate 0.000215 Epoch: 23 Global Step: 483210 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:03:55,688-Speed 2493.62 samples/sec Loss 1.8473 LearningRate 0.000215 Epoch: 23 Global Step: 483220 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:04:03,897-Speed 2495.37 samples/sec Loss 1.8353 LearningRate 0.000215 Epoch: 23 Global Step: 483230 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:04:12,100-Speed 2497.05 samples/sec Loss 1.8728 LearningRate 0.000215 Epoch: 23 Global Step: 483240 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:04:20,253-Speed 2512.54 samples/sec Loss 1.7878 LearningRate 0.000215 Epoch: 23 Global Step: 483250 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:04:28,458-Speed 2496.73 samples/sec Loss 1.8676 LearningRate 0.000215 Epoch: 23 Global Step: 483260 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:04:36,663-Speed 2496.11 samples/sec Loss 1.8801 LearningRate 0.000215 Epoch: 23 Global Step: 483270 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:04:44,864-Speed 2497.81 samples/sec Loss 1.8281 LearningRate 0.000215 Epoch: 23 Global Step: 483280 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:04:53,064-Speed 2497.79 samples/sec Loss 1.8430 LearningRate 0.000215 Epoch: 23 Global Step: 483290 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:05:01,261-Speed 2499.21 samples/sec Loss 1.8214 LearningRate 0.000215 Epoch: 23 Global Step: 483300 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:05:09,415-Speed 2512.18 samples/sec Loss 1.8801 LearningRate 0.000215 Epoch: 23 Global Step: 483310 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:05:17,626-Speed 2494.46 samples/sec Loss 1.8596 LearningRate 0.000215 Epoch: 23 Global Step: 483320 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:05:25,828-Speed 2497.43 samples/sec Loss 1.8282 LearningRate 0.000215 Epoch: 23 Global Step: 483330 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:05:34,041-Speed 2494.21 samples/sec Loss 1.8358 LearningRate 0.000215 Epoch: 23 Global Step: 483340 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:05:42,240-Speed 2498.14 samples/sec Loss 1.8390 LearningRate 0.000215 Epoch: 23 Global Step: 483350 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:05:50,461-Speed 2491.86 samples/sec Loss 1.8374 LearningRate 0.000215 Epoch: 23 Global Step: 483360 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:05:58,616-Speed 2511.76 samples/sec Loss 1.8631 LearningRate 0.000215 Epoch: 23 Global Step: 483370 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:06:06,839-Speed 2490.85 samples/sec Loss 1.8370 LearningRate 0.000215 Epoch: 23 Global Step: 483380 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:06:15,045-Speed 2496.02 samples/sec Loss 1.7993 LearningRate 0.000215 Epoch: 23 Global Step: 483390 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:06:23,253-Speed 2495.74 samples/sec Loss 1.8411 LearningRate 0.000215 Epoch: 23 Global Step: 483400 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:06:31,464-Speed 2494.88 samples/sec Loss 1.7938 LearningRate 0.000215 Epoch: 23 Global Step: 483410 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:06:39,663-Speed 2498.31 samples/sec Loss 1.8454 LearningRate 0.000215 Epoch: 23 Global Step: 483420 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:06:47,814-Speed 2513.13 samples/sec Loss 1.8528 LearningRate 0.000215 Epoch: 23 Global Step: 483430 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:06:56,012-Speed 2498.39 samples/sec Loss 1.8602 LearningRate 0.000215 Epoch: 23 Global Step: 483440 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:07:04,219-Speed 2496.13 samples/sec Loss 1.8250 LearningRate 0.000215 Epoch: 23 Global Step: 483450 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:07:12,433-Speed 2493.76 samples/sec Loss 1.8520 LearningRate 0.000215 Epoch: 23 Global Step: 483460 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:07:20,635-Speed 2497.31 samples/sec Loss 1.7905 LearningRate 0.000215 Epoch: 23 Global Step: 483470 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:07:28,835-Speed 2497.97 samples/sec Loss 1.8245 LearningRate 0.000215 Epoch: 23 Global Step: 483480 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:07:36,992-Speed 2511.05 samples/sec Loss 1.8483 LearningRate 0.000215 Epoch: 23 Global Step: 483490 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:07:45,846-Speed 2497.98 samples/sec Loss 1.8473 LearningRate 0.000215 Epoch: 23 Global Step: 483500 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:07:54,068-Speed 2491.04 samples/sec Loss 1.8047 LearningRate 0.000215 Epoch: 23 Global Step: 483510 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:08:02,312-Speed 2500.10 samples/sec Loss 1.8177 LearningRate 0.000215 Epoch: 23 Global Step: 483520 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:08:16,855-Speed 2500.43 samples/sec Loss 1.8233 LearningRate 0.000215 Epoch: 23 Global Step: 483530 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:08:25,065-Speed 2502.12 samples/sec Loss 1.8323 LearningRate 0.000215 Epoch: 23 Global Step: 483540 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:08:33,209-Speed 2514.80 samples/sec Loss 1.8254 LearningRate 0.000215 Epoch: 23 Global Step: 483550 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:08:46,660-Speed 1525.59 samples/sec Loss 1.8302 LearningRate 0.000215 Epoch: 23 Global Step: 483560 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:08:54,899-Speed 2500.15 samples/sec Loss 1.7833 LearningRate 0.000215 Epoch: 23 Global Step: 483570 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:09:03,099-Speed 2497.75 samples/sec Loss 1.8303 LearningRate 0.000215 Epoch: 23 Global Step: 483580 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:09:16,016-Speed 1591.37 samples/sec Loss 1.7949 LearningRate 0.000215 Epoch: 23 Global Step: 483590 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:09:25,281-Speed 2500.81 samples/sec Loss 1.8227 LearningRate 0.000215 Epoch: 23 Global Step: 483600 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:09:33,451-Speed 2507.19 samples/sec Loss 1.8066 LearningRate 0.000215 Epoch: 23 Global Step: 483610 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:09:41,698-Speed 2497.70 samples/sec Loss 1.7990 LearningRate 0.000215 Epoch: 23 Global Step: 483620 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:09:51,025-Speed 2250.98 samples/sec Loss 1.8774 LearningRate 0.000215 Epoch: 23 Global Step: 483630 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:09:59,257-Speed 2488.06 samples/sec Loss 1.7896 LearningRate 0.000215 Epoch: 23 Global Step: 483640 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:10:12,374-Speed 1566.20 samples/sec Loss 1.8149 LearningRate 0.000215 Epoch: 23 Global Step: 483650 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:10:20,613-Speed 2497.39 samples/sec Loss 1.8319 LearningRate 0.000215 Epoch: 23 Global Step: 483660 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:10:28,764-Speed 2512.86 samples/sec Loss 1.8110 LearningRate 0.000215 Epoch: 23 Global Step: 483670 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:10:40,421-Speed 1784.17 samples/sec Loss 1.8709 LearningRate 0.000215 Epoch: 23 Global Step: 483680 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:10:48,626-Speed 2500.25 samples/sec Loss 1.8461 LearningRate 0.000215 Epoch: 23 Global Step: 483690 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:10:57,571-Speed 2289.62 samples/sec Loss 1.8446 LearningRate 0.000215 Epoch: 23 Global Step: 483700 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:11:11,980-Speed 2496.15 samples/sec Loss 1.8212 LearningRate 0.000215 Epoch: 23 Global Step: 483710 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:11:21,149-Speed 2500.48 samples/sec Loss 1.8417 LearningRate 0.000215 Epoch: 23 Global Step: 483720 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:11:29,829-Speed 2518.21 samples/sec Loss 1.8688 LearningRate 0.000215 Epoch: 23 Global Step: 483730 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:11:38,025-Speed 2498.89 samples/sec Loss 1.8685 LearningRate 0.000215 Epoch: 23 Global Step: 483740 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:11:46,245-Speed 2492.46 samples/sec Loss 1.8407 LearningRate 0.000215 Epoch: 23 Global Step: 483750 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:11:54,450-Speed 2496.73 samples/sec Loss 1.8116 LearningRate 0.000215 Epoch: 23 Global Step: 483760 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:12:02,668-Speed 2492.36 samples/sec Loss 1.8487 LearningRate 0.000215 Epoch: 23 Global Step: 483770 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:12:10,873-Speed 2496.64 samples/sec Loss 1.8934 LearningRate 0.000214 Epoch: 23 Global Step: 483780 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:12:19,029-Speed 2511.39 samples/sec Loss 1.8202 LearningRate 0.000214 Epoch: 23 Global Step: 483790 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:12:27,241-Speed 2494.20 samples/sec Loss 1.8431 LearningRate 0.000214 Epoch: 23 Global Step: 483800 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:12:35,472-Speed 2488.51 samples/sec Loss 1.8428 LearningRate 0.000214 Epoch: 23 Global Step: 483810 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:12:43,680-Speed 2495.32 samples/sec Loss 1.8685 LearningRate 0.000214 Epoch: 23 Global Step: 483820 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:12:51,888-Speed 2495.44 samples/sec Loss 1.8331 LearningRate 0.000214 Epoch: 23 Global Step: 483830 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:13:00,121-Speed 2488.04 samples/sec Loss 1.8557 LearningRate 0.000214 Epoch: 23 Global Step: 483840 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:13:08,273-Speed 2512.67 samples/sec Loss 1.8005 LearningRate 0.000214 Epoch: 23 Global Step: 483850 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:13:16,481-Speed 2495.70 samples/sec Loss 1.8434 LearningRate 0.000214 Epoch: 23 Global Step: 483860 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:13:24,683-Speed 2497.50 samples/sec Loss 1.8699 LearningRate 0.000214 Epoch: 23 Global Step: 483870 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:13:32,888-Speed 2496.47 samples/sec Loss 1.8601 LearningRate 0.000214 Epoch: 23 Global Step: 483880 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:13:41,105-Speed 2492.98 samples/sec Loss 1.8342 LearningRate 0.000214 Epoch: 23 Global Step: 483890 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:13:49,319-Speed 2493.67 samples/sec Loss 1.8519 LearningRate 0.000214 Epoch: 23 Global Step: 483900 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:13:57,479-Speed 2510.15 samples/sec Loss 1.8652 LearningRate 0.000214 Epoch: 23 Global Step: 483910 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:14:05,688-Speed 2495.27 samples/sec Loss 1.8633 LearningRate 0.000214 Epoch: 23 Global Step: 483920 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:14:13,891-Speed 2497.12 samples/sec Loss 1.8160 LearningRate 0.000214 Epoch: 23 Global Step: 483930 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:14:22,097-Speed 2496.22 samples/sec Loss 1.8246 LearningRate 0.000214 Epoch: 23 Global Step: 483940 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:14:30,311-Speed 2493.80 samples/sec Loss 1.8684 LearningRate 0.000214 Epoch: 23 Global Step: 483950 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:14:38,520-Speed 2495.33 samples/sec Loss 1.8145 LearningRate 0.000214 Epoch: 23 Global Step: 483960 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:14:46,672-Speed 2512.83 samples/sec Loss 1.8453 LearningRate 0.000214 Epoch: 23 Global Step: 483970 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:14:54,882-Speed 2494.83 samples/sec Loss 1.8873 LearningRate 0.000214 Epoch: 23 Global Step: 483980 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:15:03,089-Speed 2495.90 samples/sec Loss 1.8425 LearningRate 0.000214 Epoch: 23 Global Step: 483990 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:15:11,293-Speed 2496.50 samples/sec Loss 1.8653 LearningRate 0.000214 Epoch: 23 Global Step: 484000 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:15:19,499-Speed 2496.20 samples/sec Loss 1.8616 LearningRate 0.000214 Epoch: 23 Global Step: 484010 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:15:27,703-Speed 2496.92 samples/sec Loss 1.8471 LearningRate 0.000214 Epoch: 23 Global Step: 484020 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:15:35,865-Speed 2509.74 samples/sec Loss 1.8414 LearningRate 0.000214 Epoch: 23 Global Step: 484030 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:15:44,083-Speed 2492.45 samples/sec Loss 1.8374 LearningRate 0.000214 Epoch: 23 Global Step: 484040 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:15:52,298-Speed 2493.37 samples/sec Loss 1.8380 LearningRate 0.000214 Epoch: 23 Global Step: 484050 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:16:00,512-Speed 2493.77 samples/sec Loss 1.8475 LearningRate 0.000214 Epoch: 23 Global Step: 484060 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:16:08,714-Speed 2497.31 samples/sec Loss 1.8388 LearningRate 0.000214 Epoch: 23 Global Step: 484070 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:16:16,925-Speed 2494.45 samples/sec Loss 1.8189 LearningRate 0.000214 Epoch: 23 Global Step: 484080 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:16:25,075-Speed 2513.56 samples/sec Loss 1.7963 LearningRate 0.000214 Epoch: 23 Global Step: 484090 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:16:33,281-Speed 2496.21 samples/sec Loss 1.7964 LearningRate 0.000214 Epoch: 23 Global Step: 484100 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:16:41,503-Speed 2491.31 samples/sec Loss 1.8356 LearningRate 0.000214 Epoch: 23 Global Step: 484110 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:16:49,709-Speed 2496.12 samples/sec Loss 1.8495 LearningRate 0.000214 Epoch: 23 Global Step: 484120 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:16:57,912-Speed 2497.05 samples/sec Loss 1.8174 LearningRate 0.000214 Epoch: 23 Global Step: 484130 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:17:06,120-Speed 2495.64 samples/sec Loss 1.8132 LearningRate 0.000214 Epoch: 23 Global Step: 484140 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:17:14,273-Speed 2512.35 samples/sec Loss 1.8438 LearningRate 0.000214 Epoch: 23 Global Step: 484150 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:17:22,480-Speed 2495.91 samples/sec Loss 1.8657 LearningRate 0.000214 Epoch: 23 Global Step: 484160 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:17:30,689-Speed 2495.49 samples/sec Loss 1.8451 LearningRate 0.000214 Epoch: 23 Global Step: 484170 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:17:38,906-Speed 2492.67 samples/sec Loss 1.8490 LearningRate 0.000214 Epoch: 23 Global Step: 484180 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:17:47,117-Speed 2494.61 samples/sec Loss 1.8638 LearningRate 0.000214 Epoch: 23 Global Step: 484190 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:17:55,327-Speed 2495.40 samples/sec Loss 1.8268 LearningRate 0.000214 Epoch: 23 Global Step: 484200 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:18:03,488-Speed 2509.97 samples/sec Loss 1.8049 LearningRate 0.000214 Epoch: 23 Global Step: 484210 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:18:11,711-Speed 2491.11 samples/sec Loss 1.8031 LearningRate 0.000214 Epoch: 23 Global Step: 484220 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:18:19,874-Speed 2509.20 samples/sec Loss 1.8368 LearningRate 0.000214 Epoch: 23 Global Step: 484230 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:18:28,082-Speed 2495.54 samples/sec Loss 1.8092 LearningRate 0.000214 Epoch: 23 Global Step: 484240 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:18:36,287-Speed 2496.47 samples/sec Loss 1.8197 LearningRate 0.000214 Epoch: 23 Global Step: 484250 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:18:44,508-Speed 2491.57 samples/sec Loss 1.8893 LearningRate 0.000214 Epoch: 23 Global Step: 484260 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:18:52,662-Speed 2511.93 samples/sec Loss 1.8454 LearningRate 0.000214 Epoch: 23 Global Step: 484270 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:19:00,868-Speed 2496.09 samples/sec Loss 1.7942 LearningRate 0.000214 Epoch: 23 Global Step: 484280 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:19:09,069-Speed 2497.74 samples/sec Loss 1.8092 LearningRate 0.000214 Epoch: 23 Global Step: 484290 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:19:17,286-Speed 2492.76 samples/sec Loss 1.8284 LearningRate 0.000214 Epoch: 23 Global Step: 484300 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:19:25,499-Speed 2493.70 samples/sec Loss 1.8121 LearningRate 0.000214 Epoch: 23 Global Step: 484310 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:19:33,705-Speed 2496.15 samples/sec Loss 1.8556 LearningRate 0.000214 Epoch: 23 Global Step: 484320 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:19:41,867-Speed 2509.38 samples/sec Loss 1.7909 LearningRate 0.000214 Epoch: 23 Global Step: 484330 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:19:50,090-Speed 2491.07 samples/sec Loss 1.8037 LearningRate 0.000214 Epoch: 23 Global Step: 484340 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:19:58,296-Speed 2496.23 samples/sec Loss 1.8460 LearningRate 0.000214 Epoch: 23 Global Step: 484350 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:20:06,503-Speed 2495.90 samples/sec Loss 1.8365 LearningRate 0.000214 Epoch: 23 Global Step: 484360 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:20:14,708-Speed 2496.16 samples/sec Loss 1.8484 LearningRate 0.000214 Epoch: 23 Global Step: 484370 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:20:22,914-Speed 2496.01 samples/sec Loss 1.7853 LearningRate 0.000214 Epoch: 23 Global Step: 484380 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:20:31,067-Speed 2512.57 samples/sec Loss 1.8195 LearningRate 0.000214 Epoch: 23 Global Step: 484390 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:20:39,271-Speed 2497.03 samples/sec Loss 1.8011 LearningRate 0.000214 Epoch: 23 Global Step: 484400 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:20:47,476-Speed 2496.36 samples/sec Loss 1.8006 LearningRate 0.000214 Epoch: 23 Global Step: 484410 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:20:55,679-Speed 2496.74 samples/sec Loss 1.7960 LearningRate 0.000214 Epoch: 23 Global Step: 484420 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:21:03,886-Speed 2495.87 samples/sec Loss 1.8119 LearningRate 0.000214 Epoch: 23 Global Step: 484430 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:21:12,092-Speed 2496.52 samples/sec Loss 1.8543 LearningRate 0.000214 Epoch: 23 Global Step: 484440 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:21:20,247-Speed 2512.55 samples/sec Loss 1.8162 LearningRate 0.000214 Epoch: 23 Global Step: 484450 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:21:28,453-Speed 2495.98 samples/sec Loss 1.8518 LearningRate 0.000214 Epoch: 23 Global Step: 484460 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:21:36,667-Speed 2493.65 samples/sec Loss 1.8288 LearningRate 0.000214 Epoch: 23 Global Step: 484470 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:21:44,874-Speed 2495.78 samples/sec Loss 1.8184 LearningRate 0.000214 Epoch: 23 Global Step: 484480 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:21:53,080-Speed 2496.59 samples/sec Loss 1.8040 LearningRate 0.000214 Epoch: 23 Global Step: 484490 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:22:01,287-Speed 2496.08 samples/sec Loss 1.8249 LearningRate 0.000214 Epoch: 23 Global Step: 484500 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:22:09,443-Speed 2511.17 samples/sec Loss 1.8290 LearningRate 0.000214 Epoch: 23 Global Step: 484510 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:22:17,642-Speed 2498.53 samples/sec Loss 1.8083 LearningRate 0.000214 Epoch: 23 Global Step: 484520 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:22:25,877-Speed 2487.88 samples/sec Loss 1.7903 LearningRate 0.000214 Epoch: 23 Global Step: 484530 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:22:34,084-Speed 2495.63 samples/sec Loss 1.8319 LearningRate 0.000214 Epoch: 23 Global Step: 484540 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:22:42,289-Speed 2496.62 samples/sec Loss 1.8000 LearningRate 0.000214 Epoch: 23 Global Step: 484550 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:22:50,522-Speed 2487.97 samples/sec Loss 1.7927 LearningRate 0.000214 Epoch: 23 Global Step: 484560 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:22:58,685-Speed 2509.15 samples/sec Loss 1.7865 LearningRate 0.000214 Epoch: 23 Global Step: 484570 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:23:06,886-Speed 2497.55 samples/sec Loss 1.7788 LearningRate 0.000214 Epoch: 23 Global Step: 484580 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:23:15,088-Speed 2497.30 samples/sec Loss 1.8328 LearningRate 0.000213 Epoch: 23 Global Step: 484590 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:23:23,294-Speed 2496.45 samples/sec Loss 1.8453 LearningRate 0.000213 Epoch: 23 Global Step: 484600 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:23:31,499-Speed 2496.34 samples/sec Loss 1.8154 LearningRate 0.000213 Epoch: 23 Global Step: 484610 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:23:39,717-Speed 2492.52 samples/sec Loss 1.8068 LearningRate 0.000213 Epoch: 23 Global Step: 484620 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:23:47,870-Speed 2512.41 samples/sec Loss 1.8268 LearningRate 0.000213 Epoch: 23 Global Step: 484630 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:23:56,076-Speed 2496.10 samples/sec Loss 1.7792 LearningRate 0.000213 Epoch: 23 Global Step: 484640 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:24:04,292-Speed 2493.03 samples/sec Loss 1.8533 LearningRate 0.000213 Epoch: 23 Global Step: 484650 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:24:12,495-Speed 2496.79 samples/sec Loss 1.7875 LearningRate 0.000213 Epoch: 23 Global Step: 484660 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:24:20,701-Speed 2496.73 samples/sec Loss 1.7742 LearningRate 0.000213 Epoch: 23 Global Step: 484670 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:24:28,914-Speed 2494.14 samples/sec Loss 1.7856 LearningRate 0.000213 Epoch: 23 Global Step: 484680 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:24:37,074-Speed 2510.16 samples/sec Loss 1.8625 LearningRate 0.000213 Epoch: 23 Global Step: 484690 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:24:45,279-Speed 2496.29 samples/sec Loss 1.7777 LearningRate 0.000213 Epoch: 23 Global Step: 484700 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:24:53,486-Speed 2496.04 samples/sec Loss 1.7933 LearningRate 0.000213 Epoch: 23 Global Step: 484710 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:25:01,719-Speed 2488.17 samples/sec Loss 1.8524 LearningRate 0.000213 Epoch: 23 Global Step: 484720 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:25:09,929-Speed 2494.59 samples/sec Loss 1.8202 LearningRate 0.000213 Epoch: 23 Global Step: 484730 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:25:18,140-Speed 2494.68 samples/sec Loss 1.8072 LearningRate 0.000213 Epoch: 23 Global Step: 484740 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:25:26,310-Speed 2508.30 samples/sec Loss 1.7936 LearningRate 0.000213 Epoch: 23 Global Step: 484750 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:25:34,527-Speed 2492.86 samples/sec Loss 1.8094 LearningRate 0.000213 Epoch: 23 Global Step: 484760 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:25:42,753-Speed 2489.99 samples/sec Loss 1.7944 LearningRate 0.000213 Epoch: 23 Global Step: 484770 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:25:50,957-Speed 2496.74 samples/sec Loss 1.7993 LearningRate 0.000213 Epoch: 23 Global Step: 484780 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:25:59,168-Speed 2494.84 samples/sec Loss 1.8550 LearningRate 0.000213 Epoch: 23 Global Step: 484790 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:26:07,379-Speed 2494.50 samples/sec Loss 1.8217 LearningRate 0.000213 Epoch: 23 Global Step: 484800 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:26:15,536-Speed 2511.11 samples/sec Loss 1.8248 LearningRate 0.000213 Epoch: 23 Global Step: 484810 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:26:23,741-Speed 2496.40 samples/sec Loss 1.8443 LearningRate 0.000213 Epoch: 23 Global Step: 484820 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:26:31,946-Speed 2496.46 samples/sec Loss 1.8268 LearningRate 0.000213 Epoch: 23 Global Step: 484830 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:26:40,172-Speed 2489.90 samples/sec Loss 1.8336 LearningRate 0.000213 Epoch: 23 Global Step: 484840 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:26:48,378-Speed 2496.27 samples/sec Loss 1.8512 LearningRate 0.000213 Epoch: 23 Global Step: 484850 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:26:56,580-Speed 2497.27 samples/sec Loss 1.8433 LearningRate 0.000213 Epoch: 23 Global Step: 484860 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:27:04,736-Speed 2511.49 samples/sec Loss 1.8256 LearningRate 0.000213 Epoch: 23 Global Step: 484870 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:27:12,944-Speed 2495.49 samples/sec Loss 1.8013 LearningRate 0.000213 Epoch: 23 Global Step: 484880 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:27:21,152-Speed 2495.50 samples/sec Loss 1.8177 LearningRate 0.000213 Epoch: 23 Global Step: 484890 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:27:29,359-Speed 2495.70 samples/sec Loss 1.7771 LearningRate 0.000213 Epoch: 23 Global Step: 484900 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:27:37,574-Speed 2493.52 samples/sec Loss 1.8381 LearningRate 0.000213 Epoch: 23 Global Step: 484910 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:27:45,781-Speed 2495.69 samples/sec Loss 1.8329 LearningRate 0.000213 Epoch: 23 Global Step: 484920 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:27:53,951-Speed 2507.24 samples/sec Loss 1.8670 LearningRate 0.000213 Epoch: 23 Global Step: 484930 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:28:02,157-Speed 2496.12 samples/sec Loss 1.8376 LearningRate 0.000213 Epoch: 23 Global Step: 484940 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:28:10,362-Speed 2496.71 samples/sec Loss 1.7984 LearningRate 0.000213 Epoch: 23 Global Step: 484950 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:28:18,569-Speed 2496.11 samples/sec Loss 1.8362 LearningRate 0.000213 Epoch: 23 Global Step: 484960 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:28:26,798-Speed 2488.91 samples/sec Loss 1.7974 LearningRate 0.000213 Epoch: 23 Global Step: 484970 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:28:35,008-Speed 2494.78 samples/sec Loss 1.8120 LearningRate 0.000213 Epoch: 23 Global Step: 484980 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:28:43,161-Speed 2512.66 samples/sec Loss 1.8368 LearningRate 0.000213 Epoch: 23 Global Step: 484990 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:28:51,367-Speed 2496.02 samples/sec Loss 1.8426 LearningRate 0.000213 Epoch: 23 Global Step: 485000 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:28:59,570-Speed 2496.89 samples/sec Loss 1.8209 LearningRate 0.000213 Epoch: 23 Global Step: 485010 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:29:07,772-Speed 2497.34 samples/sec Loss 1.8217 LearningRate 0.000213 Epoch: 23 Global Step: 485020 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:29:15,978-Speed 2496.22 samples/sec Loss 1.8309 LearningRate 0.000213 Epoch: 23 Global Step: 485030 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:29:24,182-Speed 2496.60 samples/sec Loss 1.8284 LearningRate 0.000213 Epoch: 23 Global Step: 485040 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:29:32,333-Speed 2513.29 samples/sec Loss 1.7903 LearningRate 0.000213 Epoch: 23 Global Step: 485050 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:29:40,535-Speed 2497.30 samples/sec Loss 1.8556 LearningRate 0.000213 Epoch: 23 Global Step: 485060 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:29:48,742-Speed 2495.82 samples/sec Loss 1.8190 LearningRate 0.000213 Epoch: 23 Global Step: 485070 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:29:56,952-Speed 2494.62 samples/sec Loss 1.8446 LearningRate 0.000213 Epoch: 23 Global Step: 485080 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:30:05,160-Speed 2495.70 samples/sec Loss 1.8274 LearningRate 0.000213 Epoch: 23 Global Step: 485090 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:30:13,367-Speed 2495.88 samples/sec Loss 1.8144 LearningRate 0.000213 Epoch: 23 Global Step: 485100 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:30:21,517-Speed 2513.35 samples/sec Loss 1.8174 LearningRate 0.000213 Epoch: 23 Global Step: 485110 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:30:29,729-Speed 2493.95 samples/sec Loss 1.8521 LearningRate 0.000213 Epoch: 23 Global Step: 485120 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:30:37,938-Speed 2495.49 samples/sec Loss 1.7972 LearningRate 0.000213 Epoch: 23 Global Step: 485130 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:30:46,144-Speed 2495.93 samples/sec Loss 1.8503 LearningRate 0.000213 Epoch: 23 Global Step: 485140 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:30:54,347-Speed 2496.94 samples/sec Loss 1.8332 LearningRate 0.000213 Epoch: 23 Global Step: 485150 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:31:02,568-Speed 2491.63 samples/sec Loss 1.8477 LearningRate 0.000213 Epoch: 23 Global Step: 485160 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:31:10,740-Speed 2506.64 samples/sec Loss 1.8447 LearningRate 0.000213 Epoch: 23 Global Step: 485170 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:31:18,950-Speed 2494.92 samples/sec Loss 1.8005 LearningRate 0.000213 Epoch: 23 Global Step: 485180 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:31:27,155-Speed 2496.39 samples/sec Loss 1.8138 LearningRate 0.000213 Epoch: 23 Global Step: 485190 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:31:35,364-Speed 2495.30 samples/sec Loss 1.8562 LearningRate 0.000213 Epoch: 23 Global Step: 485200 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:31:43,578-Speed 2493.91 samples/sec Loss 1.8136 LearningRate 0.000213 Epoch: 23 Global Step: 485210 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:31:51,779-Speed 2497.32 samples/sec Loss 1.7899 LearningRate 0.000213 Epoch: 23 Global Step: 485220 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:31:59,927-Speed 2514.00 samples/sec Loss 1.8274 LearningRate 0.000213 Epoch: 23 Global Step: 485230 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:32:08,130-Speed 2496.99 samples/sec Loss 1.7920 LearningRate 0.000213 Epoch: 23 Global Step: 485240 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:32:16,341-Speed 2494.80 samples/sec Loss 1.8444 LearningRate 0.000213 Epoch: 23 Global Step: 485250 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:32:24,543-Speed 2497.25 samples/sec Loss 1.8070 LearningRate 0.000213 Epoch: 23 Global Step: 485260 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:32:32,748-Speed 2496.34 samples/sec Loss 1.8253 LearningRate 0.000213 Epoch: 23 Global Step: 485270 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:32:40,954-Speed 2495.99 samples/sec Loss 1.8181 LearningRate 0.000213 Epoch: 23 Global Step: 485280 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:32:49,118-Speed 2509.16 samples/sec Loss 1.8039 LearningRate 0.000213 Epoch: 23 Global Step: 485290 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:32:57,328-Speed 2494.91 samples/sec Loss 1.8072 LearningRate 0.000213 Epoch: 23 Global Step: 485300 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:33:05,540-Speed 2494.25 samples/sec Loss 1.7750 LearningRate 0.000213 Epoch: 23 Global Step: 485310 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:33:13,746-Speed 2496.08 samples/sec Loss 1.8543 LearningRate 0.000213 Epoch: 23 Global Step: 485320 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:33:21,952-Speed 2495.92 samples/sec Loss 1.8398 LearningRate 0.000213 Epoch: 23 Global Step: 485330 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:33:30,160-Speed 2495.55 samples/sec Loss 1.8505 LearningRate 0.000213 Epoch: 23 Global Step: 485340 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:33:38,310-Speed 2513.23 samples/sec Loss 1.8312 LearningRate 0.000213 Epoch: 23 Global Step: 485350 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:33:46,517-Speed 2496.04 samples/sec Loss 1.8352 LearningRate 0.000213 Epoch: 23 Global Step: 485360 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:33:54,719-Speed 2497.19 samples/sec Loss 1.8451 LearningRate 0.000213 Epoch: 23 Global Step: 485370 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:34:02,939-Speed 2491.68 samples/sec Loss 1.8472 LearningRate 0.000213 Epoch: 23 Global Step: 485380 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:34:11,140-Speed 2497.83 samples/sec Loss 1.8428 LearningRate 0.000213 Epoch: 23 Global Step: 485390 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:34:19,343-Speed 2496.95 samples/sec Loss 1.8157 LearningRate 0.000212 Epoch: 23 Global Step: 485400 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:34:27,491-Speed 2513.83 samples/sec Loss 1.8409 LearningRate 0.000212 Epoch: 23 Global Step: 485410 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:34:35,723-Speed 2488.55 samples/sec Loss 1.8388 LearningRate 0.000212 Epoch: 23 Global Step: 485420 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:34:43,926-Speed 2496.78 samples/sec Loss 1.8247 LearningRate 0.000212 Epoch: 23 Global Step: 485430 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:34:52,126-Speed 2497.95 samples/sec Loss 1.8461 LearningRate 0.000212 Epoch: 23 Global Step: 485440 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:35:00,336-Speed 2495.13 samples/sec Loss 1.8420 LearningRate 0.000212 Epoch: 23 Global Step: 485450 Fp16 Grad Scale: 32768 
Required: 79 hours Training: 2022-07-10 05:35:08,537-Speed 2497.59 samples/sec Loss 1.8900 LearningRate 0.000212 Epoch: 23 Global Step: 485460 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:35:16,688-Speed 2513.08 samples/sec Loss 1.8330 LearningRate 0.000212 Epoch: 23 Global Step: 485470 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:35:24,891-Speed 2497.04 samples/sec Loss 1.8525 LearningRate 0.000212 Epoch: 23 Global Step: 485480 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:35:33,097-Speed 2496.25 samples/sec Loss 1.8225 LearningRate 0.000212 Epoch: 23 Global Step: 485490 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:35:41,302-Speed 2496.33 samples/sec Loss 1.8511 LearningRate 0.000212 Epoch: 23 Global Step: 485500 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:35:49,504-Speed 2497.07 samples/sec Loss 1.8500 LearningRate 0.000212 Epoch: 23 Global Step: 485510 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:35:57,719-Speed 2493.35 samples/sec Loss 1.8420 LearningRate 0.000212 Epoch: 23 Global Step: 485520 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:36:05,878-Speed 2510.89 samples/sec Loss 1.8493 LearningRate 0.000212 Epoch: 23 Global Step: 485530 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:36:14,093-Speed 2493.39 samples/sec Loss 1.8462 LearningRate 0.000212 Epoch: 23 Global Step: 485540 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:36:22,310-Speed 2492.62 samples/sec Loss 1.8482 LearningRate 0.000212 Epoch: 23 Global Step: 485550 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:36:30,519-Speed 2495.49 samples/sec Loss 1.8987 LearningRate 0.000212 Epoch: 23 Global Step: 485560 Fp16 Grad Scale: 32768 Required: 79 hours Training: 2022-07-10 05:36:38,724-Speed 2496.19 samples/sec Loss 1.8557 LearningRate 0.000212 Epoch: 23 Global Step: 485570 Fp16 Grad Scale: 32768 
Required: 79 hours Training: 2022-07-10 05:36:46,885-Speed 2509.89 samples/sec Loss 1.8398 LearningRate 0.000212 Epoch: 23 Global Step: 485580 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:36:55,035-Speed 2513.12 samples/sec Loss 1.8697 LearningRate 0.000212 Epoch: 23 Global Step: 485590 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:37:03,238-Speed 2497.03 samples/sec Loss 1.8745 LearningRate 0.000212 Epoch: 23 Global Step: 485600 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:37:11,451-Speed 2494.00 samples/sec Loss 1.8251 LearningRate 0.000212 Epoch: 23 Global Step: 485610 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:37:19,653-Speed 2497.24 samples/sec Loss 1.8587 LearningRate 0.000212 Epoch: 23 Global Step: 485620 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:37:27,857-Speed 2496.56 samples/sec Loss 1.8659 LearningRate 0.000212 Epoch: 23 Global Step: 485630 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:37:36,069-Speed 2495.21 samples/sec Loss 1.8145 LearningRate 0.000212 Epoch: 23 Global Step: 485640 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:37:44,229-Speed 2510.20 samples/sec Loss 1.8178 LearningRate 0.000212 Epoch: 23 Global Step: 485650 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:37:52,452-Speed 2491.08 samples/sec Loss 1.8348 LearningRate 0.000212 Epoch: 23 Global Step: 485660 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:38:00,663-Speed 2494.50 samples/sec Loss 1.8489 LearningRate 0.000212 Epoch: 23 Global Step: 485670 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:38:08,869-Speed 2496.23 samples/sec Loss 1.8120 LearningRate 0.000212 Epoch: 23 Global Step: 485680 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:38:17,074-Speed 2496.32 samples/sec Loss 1.8707 LearningRate 0.000212 Epoch: 23 Global Step: 485690 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:38:25,280-Speed 2496.13 samples/sec Loss 1.8320 LearningRate 0.000212 Epoch: 23 Global Step: 485700 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:38:33,429-Speed 2513.62 samples/sec Loss 1.8291 LearningRate 0.000212 Epoch: 23 Global Step: 485710 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:38:41,633-Speed 2496.78 samples/sec Loss 1.8254 LearningRate 0.000212 Epoch: 23 Global Step: 485720 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:38:49,850-Speed 2492.70 samples/sec Loss 1.7796 LearningRate 0.000212 Epoch: 23 Global Step: 485730 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:38:58,056-Speed 2496.42 samples/sec Loss 1.8905 LearningRate 0.000212 Epoch: 23 Global Step: 485740 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:39:06,265-Speed 2495.28 samples/sec Loss 1.8371 LearningRate 0.000212 Epoch: 23 Global Step: 485750 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:39:14,469-Speed 2496.53 samples/sec Loss 1.8161 LearningRate 0.000212 Epoch: 23 Global Step: 485760 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:39:22,632-Speed 2509.31 samples/sec Loss 1.8762 LearningRate 0.000212 Epoch: 23 Global Step: 485770 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:39:30,838-Speed 2495.96 samples/sec Loss 1.8039 LearningRate 0.000212 Epoch: 23 Global Step: 485780 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:39:39,048-Speed 2495.21 samples/sec Loss 1.8082 LearningRate 0.000212 Epoch: 23 Global Step: 485790 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:39:47,267-Speed 2492.30 samples/sec Loss 1.8443 LearningRate 0.000212 Epoch: 23 Global Step: 485800 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:39:55,472-Speed 2496.25 samples/sec Loss 1.8853 LearningRate 0.000212 Epoch: 23 Global Step: 485810 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:40:03,677-Speed 2496.42 samples/sec Loss 1.7929 LearningRate 0.000212 Epoch: 23 Global Step: 485820 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:40:11,843-Speed 2508.41 samples/sec Loss 1.8101 LearningRate 0.000212 Epoch: 23 Global Step: 485830 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:40:20,058-Speed 2493.24 samples/sec Loss 1.8384 LearningRate 0.000212 Epoch: 23 Global Step: 485840 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:40:28,268-Speed 2495.15 samples/sec Loss 1.8153 LearningRate 0.000212 Epoch: 23 Global Step: 485850 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:40:36,475-Speed 2495.97 samples/sec Loss 1.8210 LearningRate 0.000212 Epoch: 23 Global Step: 485860 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:40:44,682-Speed 2495.77 samples/sec Loss 1.8078 LearningRate 0.000212 Epoch: 23 Global Step: 485870 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:40:52,898-Speed 2493.15 samples/sec Loss 1.7860 LearningRate 0.000212 Epoch: 23 Global Step: 485880 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:41:01,051-Speed 2512.48 samples/sec Loss 1.7943 LearningRate 0.000212 Epoch: 23 Global Step: 485890 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:41:09,261-Speed 2494.74 samples/sec Loss 1.8420 LearningRate 0.000212 Epoch: 23 Global Step: 485900 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:41:17,466-Speed 2496.76 samples/sec Loss 1.7769 LearningRate 0.000212 Epoch: 23 Global Step: 485910 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:41:25,674-Speed 2495.33 samples/sec Loss 1.7820 LearningRate 0.000212 Epoch: 23 Global Step: 485920 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:41:33,887-Speed 2494.29 samples/sec Loss 1.8082 LearningRate 0.000212 Epoch: 23 Global Step: 485930 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:41:42,090-Speed 2496.95 samples/sec Loss 1.8231 LearningRate 0.000212 Epoch: 23 Global Step: 485940 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:41:50,243-Speed 2512.48 samples/sec Loss 1.7882 LearningRate 0.000212 Epoch: 23 Global Step: 485950 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:41:58,454-Speed 2494.40 samples/sec Loss 1.8068 LearningRate 0.000212 Epoch: 23 Global Step: 485960 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:42:06,655-Speed 2497.63 samples/sec Loss 1.7795 LearningRate 0.000212 Epoch: 23 Global Step: 485970 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:42:14,857-Speed 2497.29 samples/sec Loss 1.8276 LearningRate 0.000212 Epoch: 23 Global Step: 485980 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:42:23,060-Speed 2497.11 samples/sec Loss 1.7823 LearningRate 0.000212 Epoch: 23 Global Step: 485990 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:42:31,274-Speed 2493.56 samples/sec Loss 1.8158 LearningRate 0.000212 Epoch: 23 Global Step: 486000 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:42:39,424-Speed 2513.51 samples/sec Loss 1.8834 LearningRate 0.000212 Epoch: 23 Global Step: 486010 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:42:47,627-Speed 2497.35 samples/sec Loss 1.8488 LearningRate 0.000212 Epoch: 23 Global Step: 486020 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:42:55,827-Speed 2497.74 samples/sec Loss 1.8556 LearningRate 0.000212 Epoch: 23 Global Step: 486030 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:43:04,043-Speed 2493.16 samples/sec Loss 1.7976 LearningRate 0.000212 Epoch: 23 Global Step: 486040 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:43:12,251-Speed 2495.55 samples/sec Loss 1.8441 LearningRate 0.000212 Epoch: 23 Global Step: 486050 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:43:20,464-Speed 2494.14 samples/sec Loss 1.8315 LearningRate 0.000212 Epoch: 23 Global Step: 486060 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:43:28,620-Speed 2511.62 samples/sec Loss 1.8007 LearningRate 0.000212 Epoch: 23 Global Step: 486070 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:43:36,824-Speed 2496.65 samples/sec Loss 1.8424 LearningRate 0.000212 Epoch: 23 Global Step: 486080 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:43:45,028-Speed 2496.71 samples/sec Loss 1.8349 LearningRate 0.000212 Epoch: 23 Global Step: 486090 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:43:53,230-Speed 2497.59 samples/sec Loss 1.8067 LearningRate 0.000212 Epoch: 23 Global Step: 486100 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:44:01,436-Speed 2496.09 samples/sec Loss 1.8464 LearningRate 0.000212 Epoch: 23 Global Step: 486110 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:44:09,636-Speed 2497.90 samples/sec Loss 1.8084 LearningRate 0.000212 Epoch: 23 Global Step: 486120 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:44:17,792-Speed 2511.85 samples/sec Loss 1.7859 LearningRate 0.000212 Epoch: 23 Global Step: 486130 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:44:25,997-Speed 2496.26 samples/sec Loss 1.8219 LearningRate 0.000212 Epoch: 23 Global Step: 486140 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:44:34,204-Speed 2495.98 samples/sec Loss 1.7902 LearningRate 0.000212 Epoch: 23 Global Step: 486150 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:44:42,408-Speed 2496.76 samples/sec Loss 1.7674 LearningRate 0.000212 Epoch: 23 Global Step: 486160 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:44:50,614-Speed 2496.05 samples/sec Loss 1.8157 LearningRate 0.000212 Epoch: 23 Global Step: 486170 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:44:58,819-Speed 2496.49 samples/sec Loss 1.8465 LearningRate 0.000212 Epoch: 23 Global Step: 486180 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:45:06,987-Speed 2507.91 samples/sec Loss 1.8873 LearningRate 0.000212 Epoch: 23 Global Step: 486190 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:45:15,192-Speed 2496.44 samples/sec Loss 1.8219 LearningRate 0.000212 Epoch: 23 Global Step: 486200 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:45:23,404-Speed 2494.35 samples/sec Loss 1.8259 LearningRate 0.000211 Epoch: 23 Global Step: 486210 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:45:31,619-Speed 2493.39 samples/sec Loss 1.8404 LearningRate 0.000211 Epoch: 23 Global Step: 486220 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:45:39,820-Speed 2497.76 samples/sec Loss 1.7480 LearningRate 0.000211 Epoch: 23 Global Step: 486230 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:45:48,029-Speed 2495.60 samples/sec Loss 1.8634 LearningRate 0.000211 Epoch: 23 Global Step: 486240 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:45:56,193-Speed 2508.99 samples/sec Loss 1.7957 LearningRate 0.000211 Epoch: 23 Global Step: 486250 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:46:04,396-Speed 2496.97 samples/sec Loss 1.7945 LearningRate 0.000211 Epoch: 23 Global Step: 486260 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:46:12,596-Speed 2498.44 samples/sec Loss 1.8196 LearningRate 0.000211 Epoch: 23 Global Step: 486270 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:46:20,803-Speed 2496.68 samples/sec Loss 1.8216 LearningRate 0.000211 Epoch: 23 Global Step: 486280 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:46:29,004-Speed 2497.63 samples/sec Loss 1.8362 LearningRate 0.000211 Epoch: 23 Global Step: 486290 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:46:37,215-Speed 2494.81 samples/sec Loss 1.7851 LearningRate 0.000211 Epoch: 23 Global Step: 486300 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:46:45,365-Speed 2513.07 samples/sec Loss 1.8007 LearningRate 0.000211 Epoch: 23 Global Step: 486310 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:46:53,568-Speed 2497.01 samples/sec Loss 1.7795 LearningRate 0.000211 Epoch: 23 Global Step: 486320 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:47:01,771-Speed 2497.31 samples/sec Loss 1.8308 LearningRate 0.000211 Epoch: 23 Global Step: 486330 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:47:09,974-Speed 2496.91 samples/sec Loss 1.7724 LearningRate 0.000211 Epoch: 23 Global Step: 486340 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:47:18,180-Speed 2496.19 samples/sec Loss 1.7659 LearningRate 0.000211 Epoch: 23 Global Step: 486350 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:47:26,385-Speed 2496.51 samples/sec Loss 1.8089 LearningRate 0.000211 Epoch: 23 Global Step: 486360 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:47:34,551-Speed 2508.13 samples/sec Loss 1.8162 LearningRate 0.000211 Epoch: 23 Global Step: 486370 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:47:42,756-Speed 2496.53 samples/sec Loss 1.8228 LearningRate 0.000211 Epoch: 23 Global Step: 486380 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:47:50,958-Speed 2497.26 samples/sec Loss 1.7990 LearningRate 0.000211 Epoch: 23 Global Step: 486390 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:47:59,164-Speed 2496.21 samples/sec Loss 1.8418 LearningRate 0.000211 Epoch: 23 Global Step: 486400 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:48:07,367-Speed 2496.94 samples/sec Loss 1.8240 LearningRate 0.000211 Epoch: 23 Global Step: 486410 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:48:15,568-Speed 2497.65 samples/sec Loss 1.8017 LearningRate 0.000211 Epoch: 23 Global Step: 486420 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:48:23,717-Speed 2513.46 samples/sec Loss 1.8258 LearningRate 0.000211 Epoch: 23 Global Step: 486430 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:48:31,924-Speed 2495.90 samples/sec Loss 1.8017 LearningRate 0.000211 Epoch: 23 Global Step: 486440 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:48:40,125-Speed 2497.76 samples/sec Loss 1.8170 LearningRate 0.000211 Epoch: 23 Global Step: 486450 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:48:48,333-Speed 2495.46 samples/sec Loss 1.8629 LearningRate 0.000211 Epoch: 23 Global Step: 486460 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:48:56,533-Speed 2497.84 samples/sec Loss 1.8206 LearningRate 0.000211 Epoch: 23 Global Step: 486470 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:49:04,736-Speed 2497.39 samples/sec Loss 1.8589 LearningRate 0.000211 Epoch: 23 Global Step: 486480 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:49:12,887-Speed 2512.91 samples/sec Loss 1.7859 LearningRate 0.000211 Epoch: 23 Global Step: 486490 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:49:21,090-Speed 2497.00 samples/sec Loss 1.8104 LearningRate 0.000211 Epoch: 23 Global Step: 486500 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:49:29,294-Speed 2496.77 samples/sec Loss 1.8375 LearningRate 0.000211 Epoch: 23 Global Step: 486510 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:49:37,587-Speed 2469.93 samples/sec Loss 1.7871 LearningRate 0.000211 Epoch: 23 Global Step: 486520 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:49:45,818-Speed 2488.36 samples/sec Loss 1.7998 LearningRate 0.000211 Epoch: 23 Global Step: 486530 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:49:54,020-Speed 2497.35 samples/sec Loss 1.7989 LearningRate 0.000211 Epoch: 23 Global Step: 486540 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:50:02,174-Speed 2512.34 samples/sec Loss 1.8324 LearningRate 0.000211 Epoch: 23 Global Step: 486550 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:50:10,377-Speed 2497.02 samples/sec Loss 1.8248 LearningRate 0.000211 Epoch: 23 Global Step: 486560 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:50:18,618-Speed 2485.66 samples/sec Loss 1.8254 LearningRate 0.000211 Epoch: 23 Global Step: 486570 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:50:26,835-Speed 2492.68 samples/sec Loss 1.7588 LearningRate 0.000211 Epoch: 23 Global Step: 486580 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:50:35,039-Speed 2496.70 samples/sec Loss 1.8440 LearningRate 0.000211 Epoch: 23 Global Step: 486590 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:50:43,245-Speed 2496.41 samples/sec Loss 1.7965 LearningRate 0.000211 Epoch: 23 Global Step: 486600 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:50:51,394-Speed 2513.40 samples/sec Loss 1.8274 LearningRate 0.000211 Epoch: 23 Global Step: 486610 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:50:59,597-Speed 2497.27 samples/sec Loss 1.7811 LearningRate 0.000211 Epoch: 23 Global Step: 486620 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:51:07,799-Speed 2497.42 samples/sec Loss 1.7893 LearningRate 0.000211 Epoch: 23 Global Step: 486630 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:51:16,001-Speed 2497.60 samples/sec Loss 1.7947 LearningRate 0.000211 Epoch: 23 Global Step: 486640 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:51:24,209-Speed 2495.62 samples/sec Loss 1.8456 LearningRate 0.000211 Epoch: 23 Global Step: 486650 Fp16 Grad Scale: 16384 
Required: 79 hours Training: 2022-07-10 05:51:32,411-Speed 2497.57 samples/sec Loss 1.8372 LearningRate 0.000211 Epoch: 23 Global Step: 486660 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:51:40,560-Speed 2513.40 samples/sec Loss 1.8138 LearningRate 0.000211 Epoch: 23 Global Step: 486670 Fp16 Grad Scale: 16384 Required: 79 hours Training: 2022-07-10 05:51:48,763-Speed 2496.98 samples/sec Loss 1.8508 LearningRate 0.000211 Epoch: 23 Global Step: 486680 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:51:56,968-Speed 2496.70 samples/sec Loss 1.8602 LearningRate 0.000211 Epoch: 23 Global Step: 486690 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:52:05,171-Speed 2497.03 samples/sec Loss 1.8175 LearningRate 0.000211 Epoch: 23 Global Step: 486700 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:52:13,372-Speed 2497.72 samples/sec Loss 1.8112 LearningRate 0.000211 Epoch: 23 Global Step: 486710 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:52:21,579-Speed 2495.60 samples/sec Loss 1.8017 LearningRate 0.000211 Epoch: 23 Global Step: 486720 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:52:29,728-Speed 2513.49 samples/sec Loss 1.8060 LearningRate 0.000211 Epoch: 23 Global Step: 486730 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:52:37,928-Speed 2498.07 samples/sec Loss 1.8402 LearningRate 0.000211 Epoch: 23 Global Step: 486740 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:52:46,132-Speed 2496.78 samples/sec Loss 1.8257 LearningRate 0.000211 Epoch: 23 Global Step: 486750 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:52:54,332-Speed 2498.11 samples/sec Loss 1.7997 LearningRate 0.000211 Epoch: 23 Global Step: 486760 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:53:02,533-Speed 2497.68 samples/sec Loss 1.8032 LearningRate 0.000211 Epoch: 23 Global Step: 486770 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 05:53:10,741-Speed 2495.49 samples/sec Loss 1.8123 LearningRate 0.000211 Epoch: 23 Global Step: 486780 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 05:53:18,891-Speed 2513.57 samples/sec Loss 1.8301 LearningRate 0.000211 Epoch: 23 Global Step: 486790 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 05:53:27,094-Speed 2497.31 samples/sec Loss 1.8253 LearningRate 0.000211 Epoch: 23 Global Step: 486800 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 05:53:35,294-Speed 2497.76 samples/sec Loss 1.8803 LearningRate 0.000211 Epoch: 23 Global Step: 486810 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 05:53:43,499-Speed 2496.53 samples/sec Loss 1.8496 LearningRate 0.000211 Epoch: 23 Global Step: 486820 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 05:53:51,661-Speed 2509.62 samples/sec Loss 1.8511 LearningRate 0.000211 Epoch: 23 Global Step: 486830 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:53:59,866-Speed 2496.59 samples/sec Loss 1.8276 LearningRate 0.000211 Epoch: 23 Global Step: 486840 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:54:08,015-Speed 2513.49 samples/sec Loss 1.8016 LearningRate 0.000211 Epoch: 23 Global Step: 486850 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:54:16,219-Speed 2496.83 samples/sec Loss 1.8552 LearningRate 0.000211 Epoch: 23 Global Step: 486860 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:54:24,422-Speed 2497.50 samples/sec Loss 1.7781 LearningRate 0.000211 Epoch: 23 Global Step: 486870 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:54:32,622-Speed 2497.79 samples/sec Loss 1.8338 LearningRate 0.000211 Epoch: 23 Global Step: 486880 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:54:40,825-Speed 2497.10 samples/sec Loss 1.8267 LearningRate 0.000211 Epoch: 23 Global Step: 486890 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 05:54:49,037-Speed 2494.30 samples/sec Loss 1.8561 LearningRate 0.000211 Epoch: 23 Global Step: 486900 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:54:57,184-Speed 2514.20 samples/sec Loss 1.7687 LearningRate 0.000211 Epoch: 23 Global Step: 486910 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:55:05,389-Speed 2496.42 samples/sec Loss 1.8470 LearningRate 0.000211 Epoch: 23 Global Step: 486920 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:55:13,591-Speed 2497.37 samples/sec Loss 1.8517 LearningRate 0.000211 Epoch: 23 Global Step: 486930 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:55:21,794-Speed 2497.18 samples/sec Loss 1.8456 LearningRate 0.000211 Epoch: 23 Global Step: 486940 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:55:29,998-Speed 2496.72 samples/sec Loss 1.7789 LearningRate 0.000211 Epoch: 23 Global Step: 486950 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:55:38,212-Speed 2493.41 samples/sec Loss 1.8047 LearningRate 0.000211 Epoch: 23 Global Step: 486960 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:55:46,369-Speed 2511.08 samples/sec Loss 1.8230 LearningRate 0.000211 Epoch: 23 Global Step: 486970 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:55:54,572-Speed 2497.30 samples/sec Loss 1.8093 LearningRate 0.000211 Epoch: 23 Global Step: 486980 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:56:02,788-Speed 2493.03 samples/sec Loss 1.8038 LearningRate 0.000211 Epoch: 23 Global Step: 486990 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:56:10,992-Speed 2496.89 samples/sec Loss 1.8609 LearningRate 0.000211 Epoch: 23 Global Step: 487000 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:56:19,196-Speed 2496.72 samples/sec Loss 1.7944 LearningRate 0.000211 Epoch: 23 Global Step: 487010 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 05:56:27,402-Speed 2496.10 samples/sec Loss 1.8395 LearningRate 0.000210 Epoch: 23 Global Step: 487020 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:56:35,561-Speed 2510.75 samples/sec Loss 1.8191 LearningRate 0.000210 Epoch: 23 Global Step: 487030 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:56:43,768-Speed 2495.58 samples/sec Loss 1.7878 LearningRate 0.000210 Epoch: 23 Global Step: 487040 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:56:51,975-Speed 2495.83 samples/sec Loss 1.7936 LearningRate 0.000210 Epoch: 23 Global Step: 487050 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:57:00,177-Speed 2497.47 samples/sec Loss 1.8134 LearningRate 0.000210 Epoch: 23 Global Step: 487060 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:57:08,378-Speed 2497.66 samples/sec Loss 1.8316 LearningRate 0.000210 Epoch: 23 Global Step: 487070 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:57:16,579-Speed 2497.39 samples/sec Loss 1.8617 LearningRate 0.000210 Epoch: 23 Global Step: 487080 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:57:24,735-Speed 2511.67 samples/sec Loss 1.8304 LearningRate 0.000210 Epoch: 23 Global Step: 487090 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:57:32,937-Speed 2497.57 samples/sec Loss 1.8456 LearningRate 0.000210 Epoch: 23 Global Step: 487100 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:57:41,139-Speed 2497.41 samples/sec Loss 1.8134 LearningRate 0.000210 Epoch: 23 Global Step: 487110 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:57:49,342-Speed 2497.34 samples/sec Loss 1.8172 LearningRate 0.000210 Epoch: 23 Global Step: 487120 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:57:57,550-Speed 2495.59 samples/sec Loss 1.8672 LearningRate 0.000210 Epoch: 23 Global Step: 487130 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 05:58:05,751-Speed 2497.85 samples/sec Loss 1.8237 LearningRate 0.000210 Epoch: 23 Global Step: 487140 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:58:13,898-Speed 2514.05 samples/sec Loss 1.8061 LearningRate 0.000210 Epoch: 23 Global Step: 487150 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:58:22,103-Speed 2496.34 samples/sec Loss 1.8171 LearningRate 0.000210 Epoch: 23 Global Step: 487160 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:58:30,310-Speed 2496.19 samples/sec Loss 1.8292 LearningRate 0.000210 Epoch: 23 Global Step: 487170 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:58:38,515-Speed 2496.33 samples/sec Loss 1.8400 LearningRate 0.000210 Epoch: 23 Global Step: 487180 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:58:46,718-Speed 2497.39 samples/sec Loss 1.8390 LearningRate 0.000210 Epoch: 23 Global Step: 487190 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:58:54,920-Speed 2497.07 samples/sec Loss 1.8344 LearningRate 0.000210 Epoch: 23 Global Step: 487200 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:59:03,070-Speed 2513.53 samples/sec Loss 1.8438 LearningRate 0.000210 Epoch: 23 Global Step: 487210 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:59:11,274-Speed 2496.75 samples/sec Loss 1.8573 LearningRate 0.000210 Epoch: 23 Global Step: 487220 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:59:19,485-Speed 2494.56 samples/sec Loss 1.8060 LearningRate 0.000210 Epoch: 23 Global Step: 487230 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:59:27,705-Speed 2492.02 samples/sec Loss 1.8492 LearningRate 0.000210 Epoch: 23 Global Step: 487240 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:59:35,919-Speed 2493.69 samples/sec Loss 1.8256 LearningRate 0.000210 Epoch: 23 Global Step: 487250 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 05:59:44,125-Speed 2495.94 samples/sec Loss 1.8218 LearningRate 0.000210 Epoch: 23 Global Step: 487260 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 05:59:52,274-Speed 2513.69 samples/sec Loss 1.8645 LearningRate 0.000210 Epoch: 23 Global Step: 487270 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:00:00,477-Speed 2497.18 samples/sec Loss 1.8338 LearningRate 0.000210 Epoch: 23 Global Step: 487280 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:00:08,682-Speed 2497.08 samples/sec Loss 1.7591 LearningRate 0.000210 Epoch: 23 Global Step: 487290 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:00:16,883-Speed 2497.35 samples/sec Loss 1.8099 LearningRate 0.000210 Epoch: 23 Global Step: 487300 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:00:25,085-Speed 2497.31 samples/sec Loss 1.7878 LearningRate 0.000210 Epoch: 23 Global Step: 487310 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:00:33,290-Speed 2496.54 samples/sec Loss 1.7893 LearningRate 0.000210 Epoch: 23 Global Step: 487320 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:00:41,442-Speed 2512.84 samples/sec Loss 1.7911 LearningRate 0.000210 Epoch: 23 Global Step: 487330 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:00:49,655-Speed 2494.13 samples/sec Loss 1.8133 LearningRate 0.000210 Epoch: 23 Global Step: 487340 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:00:57,858-Speed 2497.05 samples/sec Loss 1.8099 LearningRate 0.000210 Epoch: 23 Global Step: 487350 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:01:06,076-Speed 2492.48 samples/sec Loss 1.8395 LearningRate 0.000210 Epoch: 23 Global Step: 487360 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:01:14,278-Speed 2497.18 samples/sec Loss 1.8010 LearningRate 0.000210 Epoch: 23 Global Step: 487370 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:01:22,480-Speed 2497.20 samples/sec Loss 1.8159 LearningRate 0.000210 Epoch: 23 Global Step: 487380 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:01:30,632-Speed 2512.93 samples/sec Loss 1.8737 LearningRate 0.000210 Epoch: 23 Global Step: 487390 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:01:38,834-Speed 2497.42 samples/sec Loss 1.8491 LearningRate 0.000210 Epoch: 23 Global Step: 487400 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:01:47,040-Speed 2496.42 samples/sec Loss 1.8185 LearningRate 0.000210 Epoch: 23 Global Step: 487410 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:01:55,247-Speed 2495.65 samples/sec Loss 1.8183 LearningRate 0.000210 Epoch: 23 Global Step: 487420 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:02:03,451-Speed 2496.84 samples/sec Loss 1.7918 LearningRate 0.000210 Epoch: 23 Global Step: 487430 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:02:11,654-Speed 2497.22 samples/sec Loss 1.8494 LearningRate 0.000210 Epoch: 23 Global Step: 487440 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:02:19,804-Speed 2513.09 samples/sec Loss 1.7814 LearningRate 0.000210 Epoch: 23 Global Step: 487450 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:02:28,017-Speed 2494.28 samples/sec Loss 1.8375 LearningRate 0.000210 Epoch: 23 Global Step: 487460 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:02:36,219-Speed 2498.22 samples/sec Loss 1.7642 LearningRate 0.000210 Epoch: 23 Global Step: 487470 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:02:44,421-Speed 2497.30 samples/sec Loss 1.8468 LearningRate 0.000210 Epoch: 23 Global Step: 487480 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:02:52,620-Speed 2498.20 samples/sec Loss 1.7850 LearningRate 0.000210 Epoch: 23 Global Step: 487490 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:03:00,821-Speed 2497.43 samples/sec Loss 1.8310 LearningRate 0.000210 Epoch: 23 Global Step: 487500 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:03:08,995-Speed 2505.81 samples/sec Loss 1.7919 LearningRate 0.000210 Epoch: 23 Global Step: 487510 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:03:17,211-Speed 2493.27 samples/sec Loss 1.8102 LearningRate 0.000210 Epoch: 23 Global Step: 487520 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:03:25,417-Speed 2495.95 samples/sec Loss 1.8520 LearningRate 0.000210 Epoch: 23 Global Step: 487530 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:03:33,619-Speed 2497.39 samples/sec Loss 1.8044 LearningRate 0.000210 Epoch: 23 Global Step: 487540 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:03:41,824-Speed 2496.39 samples/sec Loss 1.8127 LearningRate 0.000210 Epoch: 23 Global Step: 487550 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:03:50,025-Speed 2497.92 samples/sec Loss 1.7836 LearningRate 0.000210 Epoch: 23 Global Step: 487560 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:03:58,176-Speed 2512.83 samples/sec Loss 1.8288 LearningRate 0.000210 Epoch: 23 Global Step: 487570 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:04:06,389-Speed 2493.80 samples/sec Loss 1.7537 LearningRate 0.000210 Epoch: 23 Global Step: 487580 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:04:14,590-Speed 2497.74 samples/sec Loss 1.7534 LearningRate 0.000210 Epoch: 23 Global Step: 487590 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:04:22,811-Speed 2491.60 samples/sec Loss 1.7963 LearningRate 0.000210 Epoch: 23 Global Step: 487600 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:04:31,014-Speed 2497.17 samples/sec Loss 1.8024 LearningRate 0.000210 Epoch: 23 Global Step: 487610 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:04:39,219-Speed 2496.49 samples/sec Loss 1.8273 LearningRate 0.000210 Epoch: 23 Global Step: 487620 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:04:47,366-Speed 2514.36 samples/sec Loss 1.7989 LearningRate 0.000210 Epoch: 23 Global Step: 487630 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:04:55,571-Speed 2496.41 samples/sec Loss 1.7820 LearningRate 0.000210 Epoch: 23 Global Step: 487640 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:05:03,789-Speed 2492.11 samples/sec Loss 1.8132 LearningRate 0.000210 Epoch: 23 Global Step: 487650 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:05:12,003-Speed 2493.97 samples/sec Loss 1.7921 LearningRate 0.000210 Epoch: 23 Global Step: 487660 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:05:20,206-Speed 2497.02 samples/sec Loss 1.8070 LearningRate 0.000210 Epoch: 23 Global Step: 487670 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:05:28,409-Speed 2497.72 samples/sec Loss 1.8159 LearningRate 0.000210 Epoch: 23 Global Step: 487680 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:05:36,572-Speed 2509.31 samples/sec Loss 1.7786 LearningRate 0.000210 Epoch: 23 Global Step: 487690 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:05:44,777-Speed 2496.23 samples/sec Loss 1.8026 LearningRate 0.000210 Epoch: 23 Global Step: 487700 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:05:52,990-Speed 2494.11 samples/sec Loss 1.7793 LearningRate 0.000210 Epoch: 23 Global Step: 487710 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:06:01,200-Speed 2495.03 samples/sec Loss 1.8083 LearningRate 0.000210 Epoch: 23 Global Step: 487720 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:06:09,408-Speed 2495.18 samples/sec Loss 1.8337 LearningRate 0.000210 Epoch: 23 Global Step: 487730 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:06:17,628-Speed 2492.10 samples/sec Loss 1.8387 LearningRate 0.000210 Epoch: 23 Global Step: 487740 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:06:25,816-Speed 2501.77 samples/sec Loss 1.7614 LearningRate 0.000210 Epoch: 23 Global Step: 487750 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:06:34,019-Speed 2497.07 samples/sec Loss 1.8404 LearningRate 0.000210 Epoch: 23 Global Step: 487760 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:06:42,229-Speed 2494.66 samples/sec Loss 1.7513 LearningRate 0.000210 Epoch: 23 Global Step: 487770 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:06:50,430-Speed 2497.91 samples/sec Loss 1.7998 LearningRate 0.000210 Epoch: 23 Global Step: 487780 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:06:58,635-Speed 2496.84 samples/sec Loss 1.7881 LearningRate 0.000210 Epoch: 23 Global Step: 487790 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:07:06,838-Speed 2496.87 samples/sec Loss 1.7711 LearningRate 0.000210 Epoch: 23 Global Step: 487800 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:07:14,988-Speed 2513.23 samples/sec Loss 1.7823 LearningRate 0.000210 Epoch: 23 Global Step: 487810 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:07:23,190-Speed 2497.41 samples/sec Loss 1.7887 LearningRate 0.000210 Epoch: 23 Global Step: 487820 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:07:31,392-Speed 2497.72 samples/sec Loss 1.8503 LearningRate 0.000210 Epoch: 23 Global Step: 487830 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:07:39,596-Speed 2496.69 samples/sec Loss 1.8278 LearningRate 0.000209 Epoch: 23 Global Step: 487840 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:07:47,800-Speed 2496.94 samples/sec Loss 1.8692 LearningRate 0.000209 Epoch: 23 Global Step: 487850 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:07:56,002-Speed 2497.21 samples/sec Loss 1.8093 LearningRate 0.000209 Epoch: 23 Global Step: 487860 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:08:04,153-Speed 2513.19 samples/sec Loss 1.7986 LearningRate 0.000209 Epoch: 23 Global Step: 487870 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:08:12,353-Speed 2498.16 samples/sec Loss 1.8168 LearningRate 0.000209 Epoch: 23 Global Step: 487880 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:08:20,552-Speed 2498.17 samples/sec Loss 1.7942 LearningRate 0.000209 Epoch: 23 Global Step: 487890 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:08:28,753-Speed 2497.77 samples/sec Loss 1.7895 LearningRate 0.000209 Epoch: 23 Global Step: 487900 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:08:36,910-Speed 2511.32 samples/sec Loss 1.8234 LearningRate 0.000209 Epoch: 23 Global Step: 487910 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:08:45,107-Speed 2498.71 samples/sec Loss 1.8200 LearningRate 0.000209 Epoch: 23 Global Step: 487920 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:08:53,258-Speed 2513.27 samples/sec Loss 1.8441 LearningRate 0.000209 Epoch: 23 Global Step: 487930 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:09:01,460-Speed 2497.51 samples/sec Loss 1.8274 LearningRate 0.000209 Epoch: 23 Global Step: 487940 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:09:09,662-Speed 2497.38 samples/sec Loss 1.8578 LearningRate 0.000209 Epoch: 23 Global Step: 487950 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:09:17,863-Speed 2497.60 samples/sec Loss 1.8244 LearningRate 0.000209 Epoch: 23 Global Step: 487960 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:09:26,062-Speed 2498.28 samples/sec Loss 1.8038 LearningRate 0.000209 Epoch: 23 Global Step: 487970 Fp16 Grad Scale: 8192 Required: 78 
hours Training: 2022-07-10 06:09:34,263-Speed 2497.51 samples/sec Loss 1.8083 LearningRate 0.000209 Epoch: 23 Global Step: 487980 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:09:42,408-Speed 2515.01 samples/sec Loss 1.8238 LearningRate 0.000209 Epoch: 23 Global Step: 487990 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:09:50,607-Speed 2498.15 samples/sec Loss 1.8544 LearningRate 0.000209 Epoch: 23 Global Step: 488000 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:09:58,823-Speed 2493.14 samples/sec Loss 1.8120 LearningRate 0.000209 Epoch: 23 Global Step: 488010 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:10:07,025-Speed 2497.21 samples/sec Loss 1.7922 LearningRate 0.000209 Epoch: 23 Global Step: 488020 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:10:15,235-Speed 2494.79 samples/sec Loss 1.7891 LearningRate 0.000209 Epoch: 23 Global Step: 488030 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:10:23,437-Speed 2497.61 samples/sec Loss 1.8365 LearningRate 0.000209 Epoch: 23 Global Step: 488040 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:10:31,585-Speed 2513.76 samples/sec Loss 1.8000 LearningRate 0.000209 Epoch: 23 Global Step: 488050 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:10:39,787-Speed 2497.28 samples/sec Loss 1.8138 LearningRate 0.000209 Epoch: 23 Global Step: 488060 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:10:47,993-Speed 2496.35 samples/sec Loss 1.8102 LearningRate 0.000209 Epoch: 23 Global Step: 488070 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:10:56,208-Speed 2493.19 samples/sec Loss 1.8185 LearningRate 0.000209 Epoch: 23 Global Step: 488080 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:11:04,413-Speed 2496.35 samples/sec Loss 1.8026 LearningRate 0.000209 Epoch: 23 Global Step: 488090 Fp16 Grad Scale: 8192 Required: 78 hours Training: 
2022-07-10 06:11:12,613-Speed 2498.57 samples/sec Loss 1.8165 LearningRate 0.000209 Epoch: 23 Global Step: 488100 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:11:20,765-Speed 2512.55 samples/sec Loss 1.8042 LearningRate 0.000209 Epoch: 23 Global Step: 488110 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:11:28,966-Speed 2497.46 samples/sec Loss 1.8280 LearningRate 0.000209 Epoch: 23 Global Step: 488120 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:11:37,167-Speed 2498.08 samples/sec Loss 1.8249 LearningRate 0.000209 Epoch: 23 Global Step: 488130 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:11:45,376-Speed 2495.15 samples/sec Loss 1.8481 LearningRate 0.000209 Epoch: 23 Global Step: 488140 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:11:53,578-Speed 2497.30 samples/sec Loss 1.7913 LearningRate 0.000209 Epoch: 23 Global Step: 488150 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:12:01,780-Speed 2497.52 samples/sec Loss 1.7994 LearningRate 0.000209 Epoch: 23 Global Step: 488160 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:12:09,933-Speed 2512.51 samples/sec Loss 1.7926 LearningRate 0.000209 Epoch: 23 Global Step: 488170 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:12:18,146-Speed 2494.20 samples/sec Loss 1.7965 LearningRate 0.000209 Epoch: 23 Global Step: 488180 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:12:26,352-Speed 2495.98 samples/sec Loss 1.8044 LearningRate 0.000209 Epoch: 23 Global Step: 488190 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:12:34,559-Speed 2495.90 samples/sec Loss 1.8149 LearningRate 0.000209 Epoch: 23 Global Step: 488200 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:12:42,757-Speed 2498.50 samples/sec Loss 1.8509 LearningRate 0.000209 Epoch: 23 Global Step: 488210 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 
06:12:50,957-Speed 2497.94 samples/sec Loss 1.8400 LearningRate 0.000209 Epoch: 23 Global Step: 488220 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:12:59,106-Speed 2513.57 samples/sec Loss 1.8158 LearningRate 0.000209 Epoch: 23 Global Step: 488230 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:13:07,319-Speed 2494.00 samples/sec Loss 1.8166 LearningRate 0.000209 Epoch: 23 Global Step: 488240 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:13:15,523-Speed 2496.98 samples/sec Loss 1.8036 LearningRate 0.000209 Epoch: 23 Global Step: 488250 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:13:23,723-Speed 2497.73 samples/sec Loss 1.8280 LearningRate 0.000209 Epoch: 23 Global Step: 488260 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:13:31,927-Speed 2496.82 samples/sec Loss 1.8278 LearningRate 0.000209 Epoch: 23 Global Step: 488270 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:13:40,128-Speed 2497.91 samples/sec Loss 1.8205 LearningRate 0.000209 Epoch: 23 Global Step: 488280 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:13:48,277-Speed 2513.53 samples/sec Loss 1.7880 LearningRate 0.000209 Epoch: 23 Global Step: 488290 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:13:56,477-Speed 2497.84 samples/sec Loss 1.8209 LearningRate 0.000209 Epoch: 23 Global Step: 488300 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:14:04,675-Speed 2498.61 samples/sec Loss 1.8024 LearningRate 0.000209 Epoch: 23 Global Step: 488310 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:14:12,891-Speed 2493.04 samples/sec Loss 1.8614 LearningRate 0.000209 Epoch: 23 Global Step: 488320 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:14:21,095-Speed 2496.89 samples/sec Loss 1.8167 LearningRate 0.000209 Epoch: 23 Global Step: 488330 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:14:29,297-Speed 
2497.17 samples/sec Loss 1.8233 LearningRate 0.000209 Epoch: 23 Global Step: 488340 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:14:37,447-Speed 2513.37 samples/sec Loss 1.8343 LearningRate 0.000209 Epoch: 23 Global Step: 488350 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:14:45,646-Speed 2498.24 samples/sec Loss 1.8218 LearningRate 0.000209 Epoch: 23 Global Step: 488360 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:14:53,843-Speed 2498.70 samples/sec Loss 1.8331 LearningRate 0.000209 Epoch: 23 Global Step: 488370 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:15:02,052-Speed 2495.06 samples/sec Loss 1.7755 LearningRate 0.000209 Epoch: 23 Global Step: 488380 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:15:10,252-Speed 2497.91 samples/sec Loss 1.8221 LearningRate 0.000209 Epoch: 23 Global Step: 488390 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:15:18,466-Speed 2494.32 samples/sec Loss 1.7987 LearningRate 0.000209 Epoch: 23 Global Step: 488400 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:15:26,625-Speed 2510.40 samples/sec Loss 1.8186 LearningRate 0.000209 Epoch: 23 Global Step: 488410 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:15:34,826-Speed 2498.16 samples/sec Loss 1.8547 LearningRate 0.000209 Epoch: 23 Global Step: 488420 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:15:43,027-Speed 2497.70 samples/sec Loss 1.7892 LearningRate 0.000209 Epoch: 23 Global Step: 488430 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:15:51,227-Speed 2497.89 samples/sec Loss 1.8011 LearningRate 0.000209 Epoch: 23 Global Step: 488440 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:15:59,429-Speed 2497.25 samples/sec Loss 1.8272 LearningRate 0.000209 Epoch: 23 Global Step: 488450 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:16:07,631-Speed 2497.04 samples/sec 
Loss 1.8499 LearningRate 0.000209 Epoch: 23 Global Step: 488460 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:16:15,780-Speed 2513.87 samples/sec Loss 1.8058 LearningRate 0.000209 Epoch: 23 Global Step: 488470 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:16:23,981-Speed 2497.75 samples/sec Loss 1.8215 LearningRate 0.000209 Epoch: 23 Global Step: 488480 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:16:32,182-Speed 2497.76 samples/sec Loss 1.8510 LearningRate 0.000209 Epoch: 23 Global Step: 488490 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:16:40,384-Speed 2497.30 samples/sec Loss 1.8039 LearningRate 0.000209 Epoch: 23 Global Step: 488500 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:16:48,596-Speed 2494.17 samples/sec Loss 1.8453 LearningRate 0.000209 Epoch: 23 Global Step: 488510 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:16:56,799-Speed 2497.08 samples/sec Loss 1.8437 LearningRate 0.000209 Epoch: 23 Global Step: 488520 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:17:04,950-Speed 2512.83 samples/sec Loss 1.7991 LearningRate 0.000209 Epoch: 23 Global Step: 488530 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:17:13,153-Speed 2497.44 samples/sec Loss 1.8459 LearningRate 0.000209 Epoch: 23 Global Step: 488540 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:17:21,354-Speed 2497.82 samples/sec Loss 1.7931 LearningRate 0.000209 Epoch: 23 Global Step: 488550 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:17:29,555-Speed 2497.42 samples/sec Loss 1.8001 LearningRate 0.000209 Epoch: 23 Global Step: 488560 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:17:37,763-Speed 2495.38 samples/sec Loss 1.8695 LearningRate 0.000209 Epoch: 23 Global Step: 488570 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:17:45,966-Speed 2497.52 samples/sec Loss 1.8062 
LearningRate 0.000209 Epoch: 23 Global Step: 488580 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:17:54,114-Speed 2513.94 samples/sec Loss 1.7943 LearningRate 0.000209 Epoch: 23 Global Step: 488590 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:18:02,320-Speed 2495.96 samples/sec Loss 1.8254 LearningRate 0.000209 Epoch: 23 Global Step: 488600 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:18:10,538-Speed 2492.72 samples/sec Loss 1.8438 LearningRate 0.000209 Epoch: 23 Global Step: 488610 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:18:18,742-Speed 2496.80 samples/sec Loss 1.8289 LearningRate 0.000209 Epoch: 23 Global Step: 488620 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:18:26,942-Speed 2497.91 samples/sec Loss 1.8033 LearningRate 0.000209 Epoch: 23 Global Step: 488630 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:18:35,149-Speed 2496.06 samples/sec Loss 1.8202 LearningRate 0.000209 Epoch: 23 Global Step: 488640 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:18:43,315-Speed 2508.40 samples/sec Loss 1.8237 LearningRate 0.000208 Epoch: 23 Global Step: 488650 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:18:51,515-Speed 2498.19 samples/sec Loss 1.7936 LearningRate 0.000208 Epoch: 23 Global Step: 488660 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:18:59,728-Speed 2493.91 samples/sec Loss 1.8477 LearningRate 0.000208 Epoch: 23 Global Step: 488670 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:19:07,927-Speed 2498.21 samples/sec Loss 1.8111 LearningRate 0.000208 Epoch: 23 Global Step: 488680 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:19:16,133-Speed 2496.37 samples/sec Loss 1.8046 LearningRate 0.000208 Epoch: 23 Global Step: 488690 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:19:24,337-Speed 2496.60 samples/sec Loss 1.8073 LearningRate 
0.000208 Epoch: 23 Global Step: 488700 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:19:32,484-Speed 2514.32 samples/sec Loss 1.8136 LearningRate 0.000208 Epoch: 23 Global Step: 488710 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:19:40,685-Speed 2497.81 samples/sec Loss 1.7825 LearningRate 0.000208 Epoch: 23 Global Step: 488720 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:19:48,884-Speed 2498.27 samples/sec Loss 1.8004 LearningRate 0.000208 Epoch: 23 Global Step: 488730 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:19:57,089-Speed 2496.31 samples/sec Loss 1.8032 LearningRate 0.000208 Epoch: 23 Global Step: 488740 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:20:05,292-Speed 2497.01 samples/sec Loss 1.8518 LearningRate 0.000208 Epoch: 23 Global Step: 488750 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:20:13,490-Speed 2498.55 samples/sec Loss 1.8703 LearningRate 0.000208 Epoch: 23 Global Step: 488760 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:20:21,641-Speed 2512.69 samples/sec Loss 1.8512 LearningRate 0.000208 Epoch: 23 Global Step: 488770 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:20:29,858-Speed 2492.77 samples/sec Loss 1.8062 LearningRate 0.000208 Epoch: 23 Global Step: 488780 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:20:38,057-Speed 2498.56 samples/sec Loss 1.8345 LearningRate 0.000208 Epoch: 23 Global Step: 488790 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:20:46,256-Speed 2498.35 samples/sec Loss 1.8157 LearningRate 0.000208 Epoch: 23 Global Step: 488800 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:20:54,456-Speed 2498.27 samples/sec Loss 1.7624 LearningRate 0.000208 Epoch: 23 Global Step: 488810 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:21:02,655-Speed 2499.10 samples/sec Loss 1.8562 LearningRate 0.000208 Epoch: 23 
Global Step: 488820 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:21:10,803-Speed 2513.74 samples/sec Loss 1.8111 LearningRate 0.000208 Epoch: 23 Global Step: 488830 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:21:19,017-Speed 2493.86 samples/sec Loss 1.8626 LearningRate 0.000208 Epoch: 23 Global Step: 488840 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:21:27,213-Speed 2499.02 samples/sec Loss 1.8345 LearningRate 0.000208 Epoch: 23 Global Step: 488850 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:21:35,413-Speed 2497.95 samples/sec Loss 1.8217 LearningRate 0.000208 Epoch: 23 Global Step: 488860 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:21:43,613-Speed 2498.00 samples/sec Loss 1.8165 LearningRate 0.000208 Epoch: 23 Global Step: 488870 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:21:51,815-Speed 2497.44 samples/sec Loss 1.7994 LearningRate 0.000208 Epoch: 23 Global Step: 488880 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:21:59,966-Speed 2512.95 samples/sec Loss 1.8145 LearningRate 0.000208 Epoch: 23 Global Step: 488890 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:22:08,178-Speed 2494.54 samples/sec Loss 1.8208 LearningRate 0.000208 Epoch: 23 Global Step: 488900 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:22:16,379-Speed 2497.51 samples/sec Loss 1.8084 LearningRate 0.000208 Epoch: 23 Global Step: 488910 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:22:24,580-Speed 2497.83 samples/sec Loss 1.8424 LearningRate 0.000208 Epoch: 23 Global Step: 488920 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:22:32,779-Speed 2498.17 samples/sec Loss 1.7998 LearningRate 0.000208 Epoch: 23 Global Step: 488930 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:22:40,979-Speed 2498.03 samples/sec Loss 1.8083 LearningRate 0.000208 Epoch: 23 Global Step: 488940 
Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:22:49,131-Speed 2512.50 samples/sec Loss 1.7937 LearningRate 0.000208 Epoch: 23 Global Step: 488950 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:22:57,330-Speed 2498.27 samples/sec Loss 1.7748 LearningRate 0.000208 Epoch: 23 Global Step: 488960 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:23:05,535-Speed 2496.83 samples/sec Loss 1.7875 LearningRate 0.000208 Epoch: 23 Global Step: 488970 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:23:13,729-Speed 2499.50 samples/sec Loss 1.8106 LearningRate 0.000208 Epoch: 23 Global Step: 488980 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:23:21,934-Speed 2497.10 samples/sec Loss 1.8049 LearningRate 0.000208 Epoch: 23 Global Step: 488990 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:23:30,134-Speed 2498.10 samples/sec Loss 1.7871 LearningRate 0.000208 Epoch: 23 Global Step: 489000 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:23:38,283-Speed 2513.63 samples/sec Loss 1.8275 LearningRate 0.000208 Epoch: 23 Global Step: 489010 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:23:46,489-Speed 2496.17 samples/sec Loss 1.7773 LearningRate 0.000208 Epoch: 23 Global Step: 489020 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:23:54,700-Speed 2494.57 samples/sec Loss 1.7812 LearningRate 0.000208 Epoch: 23 Global Step: 489030 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:24:02,902-Speed 2497.46 samples/sec Loss 1.7892 LearningRate 0.000208 Epoch: 23 Global Step: 489040 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:24:11,101-Speed 2498.09 samples/sec Loss 1.7719 LearningRate 0.000208 Epoch: 23 Global Step: 489050 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:24:19,317-Speed 2492.95 samples/sec Loss 1.8341 LearningRate 0.000208 Epoch: 23 Global Step: 489060 Fp16 Grad Scale: 
8192 Required: 78 hours Training: 2022-07-10 06:24:27,466-Speed 2513.54 samples/sec Loss 1.8125 LearningRate 0.000208 Epoch: 23 Global Step: 489070 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:24:35,666-Speed 2498.04 samples/sec Loss 1.8108 LearningRate 0.000208 Epoch: 23 Global Step: 489080 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:24:43,873-Speed 2495.92 samples/sec Loss 1.8058 LearningRate 0.000208 Epoch: 23 Global Step: 489090 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:24:52,073-Speed 2497.86 samples/sec Loss 1.8041 LearningRate 0.000208 Epoch: 23 Global Step: 489100 Fp16 Grad Scale: 8192 Required: 78 hours Training: 2022-07-10 06:25:00,277-Speed 2496.93 samples/sec Loss 1.7802 LearningRate 0.000208 Epoch: 23 Global Step: 489110 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:25:08,485-Speed 2495.38 samples/sec Loss 1.7531 LearningRate 0.000208 Epoch: 23 Global Step: 489120 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:25:16,637-Speed 2512.94 samples/sec Loss 1.8038 LearningRate 0.000208 Epoch: 23 Global Step: 489130 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:25:24,854-Speed 2492.60 samples/sec Loss 1.7982 LearningRate 0.000208 Epoch: 23 Global Step: 489140 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:25:33,090-Speed 2498.98 samples/sec Loss 1.7795 LearningRate 0.000208 Epoch: 23 Global Step: 489150 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:25:41,296-Speed 2499.26 samples/sec Loss 1.8532 LearningRate 0.000208 Epoch: 23 Global Step: 489160 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:25:49,507-Speed 2494.45 samples/sec Loss 1.8267 LearningRate 0.000208 Epoch: 23 Global Step: 489170 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:25:59,337-Speed 2189.99 samples/sec Loss 1.7892 LearningRate 0.000208 Epoch: 23 Global Step: 489180 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:26:07,593-Speed 2517.36 samples/sec Loss 1.8031 LearningRate 0.000208 Epoch: 23 Global Step: 489190 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:26:15,831-Speed 2500.93 samples/sec Loss 1.8190 LearningRate 0.000208 Epoch: 23 Global Step: 489200 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:26:24,074-Speed 2500.12 samples/sec Loss 1.8079 LearningRate 0.000208 Epoch: 23 Global Step: 489210 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:26:36,584-Speed 1637.26 samples/sec Loss 1.7708 LearningRate 0.000208 Epoch: 23 Global Step: 489220 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:26:44,826-Speed 2497.95 samples/sec Loss 1.7730 LearningRate 0.000208 Epoch: 23 Global Step: 489230 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:26:53,110-Speed 2500.48 samples/sec Loss 1.7727 LearningRate 0.000208 Epoch: 23 Global Step: 489240 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:27:01,251-Speed 2515.81 samples/sec Loss 1.7935 LearningRate 0.000208 Epoch: 23 Global Step: 489250 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:27:11,015-Speed 2500.61 samples/sec Loss 1.7589 LearningRate 0.000208 Epoch: 23 Global Step: 489260 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:27:19,231-Speed 2501.10 samples/sec Loss 1.7700 LearningRate 0.000208 Epoch: 23 Global Step: 489270 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:27:31,008-Speed 1740.10 samples/sec Loss 1.8379 LearningRate 0.000208 Epoch: 23 Global Step: 489280 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:27:39,197-Speed 2501.26 samples/sec Loss 1.7965 LearningRate 0.000208 Epoch: 23 Global Step: 489290 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:27:47,392-Speed 2499.10 samples/sec Loss 1.8032 LearningRate 0.000208 Epoch: 23 Global Step: 489300 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:27:55,575-Speed 2515.59 samples/sec Loss 1.7961 LearningRate 0.000208 Epoch: 23 Global Step: 489310 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:28:07,835-Speed 2070.87 samples/sec Loss 1.8685 LearningRate 0.000208 Epoch: 23 Global Step: 489320 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:28:16,033-Speed 2498.65 samples/sec Loss 1.7869 LearningRate 0.000208 Epoch: 23 Global Step: 489330 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:28:27,845-Speed 1740.68 samples/sec Loss 1.7755 LearningRate 0.000208 Epoch: 23 Global Step: 489340 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:28:37,045-Speed 2500.08 samples/sec Loss 1.8338 LearningRate 0.000208 Epoch: 23 Global Step: 489350 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:28:45,244-Speed 2498.22 samples/sec Loss 1.8157 LearningRate 0.000208 Epoch: 23 Global Step: 489360 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:28:58,276-Speed 1576.74 samples/sec Loss 1.8337 LearningRate 0.000208 Epoch: 23 Global Step: 489370 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:29:07,047-Speed 2498.66 samples/sec Loss 1.7905 LearningRate 0.000208 Epoch: 23 Global Step: 489380 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:29:15,255-Speed 2495.55 samples/sec Loss 1.8182 LearningRate 0.000208 Epoch: 23 Global Step: 489390 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:29:23,467-Speed 2494.03 samples/sec Loss 1.7919 LearningRate 0.000208 Epoch: 23 Global Step: 489400 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:29:31,685-Speed 2492.95 samples/sec Loss 1.7936 LearningRate 0.000208 Epoch: 23 Global Step: 489410 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:29:39,908-Speed 2491.01 samples/sec Loss 1.8311 LearningRate 0.000208 Epoch: 23 Global Step: 489420 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:29:48,068-Speed 2510.12 samples/sec Loss 1.8076 LearningRate 0.000208 Epoch: 23 Global Step: 489430 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:29:56,284-Speed 2493.23 samples/sec Loss 1.8262 LearningRate 0.000208 Epoch: 23 Global Step: 489440 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:30:04,500-Speed 2493.04 samples/sec Loss 1.7978 LearningRate 0.000208 Epoch: 23 Global Step: 489450 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:30:12,709-Speed 2495.23 samples/sec Loss 1.8121 LearningRate 0.000208 Epoch: 23 Global Step: 489460 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:30:20,916-Speed 2495.69 samples/sec Loss 1.7520 LearningRate 0.000207 Epoch: 23 Global Step: 489470 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:30:29,122-Speed 2496.25 samples/sec Loss 1.7750 LearningRate 0.000207 Epoch: 23 Global Step: 489480 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:30:37,268-Speed 2514.46 samples/sec Loss 1.7915 LearningRate 0.000207 Epoch: 23 Global Step: 489490 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:30:45,471-Speed 2497.19 samples/sec Loss 1.7762 LearningRate 0.000207 Epoch: 23 Global Step: 489500 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:30:53,673-Speed 2497.14 samples/sec Loss 1.8093 LearningRate 0.000207 Epoch: 23 Global Step: 489510 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:31:01,872-Speed 2498.46 samples/sec Loss 1.7583 LearningRate 0.000207 Epoch: 23 Global Step: 489520 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:31:10,083-Speed 2494.58 samples/sec Loss 1.8230 LearningRate 0.000207 Epoch: 23 Global Step: 489530 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:31:18,282-Speed 2498.43 samples/sec Loss 1.7731 LearningRate 0.000207 Epoch: 23 Global Step: 489540 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:31:26,434-Speed 2512.47 samples/sec Loss 1.8274 LearningRate 0.000207 Epoch: 23 Global Step: 489550 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:31:34,641-Speed 2496.10 samples/sec Loss 1.7938 LearningRate 0.000207 Epoch: 23 Global Step: 489560 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:31:42,847-Speed 2496.20 samples/sec Loss 1.8235 LearningRate 0.000207 Epoch: 23 Global Step: 489570 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:31:51,053-Speed 2496.27 samples/sec Loss 1.7761 LearningRate 0.000207 Epoch: 23 Global Step: 489580 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:31:59,258-Speed 2496.45 samples/sec Loss 1.8278 LearningRate 0.000207 Epoch: 23 Global Step: 489590 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:32:07,463-Speed 2496.59 samples/sec Loss 1.8143 LearningRate 0.000207 Epoch: 23 Global Step: 489600 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:32:15,615-Speed 2512.67 samples/sec Loss 1.8072 LearningRate 0.000207 Epoch: 23 Global Step: 489610 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:32:23,822-Speed 2495.78 samples/sec Loss 1.8152 LearningRate 0.000207 Epoch: 23 Global Step: 489620 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:32:32,026-Speed 2496.51 samples/sec Loss 1.8499 LearningRate 0.000207 Epoch: 23 Global Step: 489630 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:32:40,233-Speed 2495.77 samples/sec Loss 1.7722 LearningRate 0.000207 Epoch: 23 Global Step: 489640 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:32:48,434-Speed 2497.71 samples/sec Loss 1.8117 LearningRate 0.000207 Epoch: 23 Global Step: 489650 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:32:56,645-Speed 2494.75 samples/sec Loss 1.8248 LearningRate 0.000207 Epoch: 23 Global Step: 489660 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:33:04,799-Speed 2511.92 samples/sec Loss 1.8314 LearningRate 0.000207 Epoch: 23 Global Step: 489670 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:33:13,003-Speed 2496.92 samples/sec Loss 1.8488 LearningRate 0.000207 Epoch: 23 Global Step: 489680 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:33:21,206-Speed 2496.91 samples/sec Loss 1.7965 LearningRate 0.000207 Epoch: 23 Global Step: 489690 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:33:29,407-Speed 2497.70 samples/sec Loss 1.8195 LearningRate 0.000207 Epoch: 23 Global Step: 489700 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:33:37,615-Speed 2495.55 samples/sec Loss 1.8062 LearningRate 0.000207 Epoch: 23 Global Step: 489710 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:33:45,816-Speed 2497.57 samples/sec Loss 1.7848 LearningRate 0.000207 Epoch: 23 Global Step: 489720 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:33:53,968-Speed 2512.86 samples/sec Loss 1.8534 LearningRate 0.000207 Epoch: 23 Global Step: 489730 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:34:02,175-Speed 2495.58 samples/sec Loss 1.8394 LearningRate 0.000207 Epoch: 23 Global Step: 489740 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:34:10,383-Speed 2495.47 samples/sec Loss 1.7981 LearningRate 0.000207 Epoch: 23 Global Step: 489750 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:34:18,588-Speed 2496.52 samples/sec Loss 1.8050 LearningRate 0.000207 Epoch: 23 Global Step: 489760 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:34:26,798-Speed 2494.92 samples/sec Loss 1.8384 LearningRate 0.000207 Epoch: 23 Global Step: 489770 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:34:35,005-Speed 2495.87 samples/sec Loss 1.8243 LearningRate 0.000207 Epoch: 23 Global Step: 489780 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:34:43,174-Speed 2507.50 samples/sec Loss 1.8076 LearningRate 0.000207 Epoch: 23 Global Step: 489790 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:34:51,375-Speed 2497.78 samples/sec Loss 1.7972 LearningRate 0.000207 Epoch: 23 Global Step: 489800 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:34:59,579-Speed 2497.03 samples/sec Loss 1.8084 LearningRate 0.000207 Epoch: 23 Global Step: 489810 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:35:07,785-Speed 2495.89 samples/sec Loss 1.8471 LearningRate 0.000207 Epoch: 23 Global Step: 489820 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:35:15,989-Speed 2496.75 samples/sec Loss 1.8152 LearningRate 0.000207 Epoch: 23 Global Step: 489830 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:35:24,188-Speed 2498.47 samples/sec Loss 1.7782 LearningRate 0.000207 Epoch: 23 Global Step: 489840 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:35:32,336-Speed 2514.22 samples/sec Loss 1.8077 LearningRate 0.000207 Epoch: 23 Global Step: 489850 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:35:40,549-Speed 2493.72 samples/sec Loss 1.8432 LearningRate 0.000207 Epoch: 23 Global Step: 489860 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:35:48,753-Speed 2497.96 samples/sec Loss 1.8028 LearningRate 0.000207 Epoch: 23 Global Step: 489870 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:35:56,954-Speed 2497.55 samples/sec Loss 1.8507 LearningRate 0.000207 Epoch: 23 Global Step: 489880 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:36:05,155-Speed 2497.58 samples/sec Loss 1.7843 LearningRate 0.000207 Epoch: 23 Global Step: 489890 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:36:13,361-Speed 2496.10 samples/sec Loss 1.8262 LearningRate 0.000207 Epoch: 23 Global Step: 489900 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:36:21,511-Speed 2513.44 samples/sec Loss 1.8376 LearningRate 0.000207 Epoch: 23 Global Step: 489910 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:36:29,712-Speed 2497.66 samples/sec Loss 1.7964 LearningRate 0.000207 Epoch: 23 Global Step: 489920 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:36:37,922-Speed 2494.67 samples/sec Loss 1.7917 LearningRate 0.000207 Epoch: 23 Global Step: 489930 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:36:46,126-Speed 2496.82 samples/sec Loss 1.7914 LearningRate 0.000207 Epoch: 23 Global Step: 489940 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:36:54,327-Speed 2497.63 samples/sec Loss 1.7547 LearningRate 0.000207 Epoch: 23 Global Step: 489950 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:37:02,541-Speed 2493.79 samples/sec Loss 1.7949 LearningRate 0.000207 Epoch: 23 Global Step: 489960 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:37:10,690-Speed 2513.59 samples/sec Loss 1.7935 LearningRate 0.000207 Epoch: 23 Global Step: 489970 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:37:18,890-Speed 2497.99 samples/sec Loss 1.8003 LearningRate 0.000207 Epoch: 23 Global Step: 489980 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:37:27,096-Speed 2496.12 samples/sec Loss 1.7819 LearningRate 0.000207 Epoch: 23 Global Step: 489990 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:37:35,296-Speed 2497.79 samples/sec Loss 1.8270 LearningRate 0.000207 Epoch: 23 Global Step: 490000 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:37:43,495-Speed 2498.50 samples/sec Loss 1.8134 LearningRate 0.000207 Epoch: 23 Global Step: 490010 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:37:51,697-Speed 2497.35 samples/sec Loss 1.7635 LearningRate 0.000207 Epoch: 23 Global Step: 490020 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:37:59,868-Speed 2507.12 samples/sec Loss 1.8502 LearningRate 0.000207 Epoch: 23 Global Step: 490030 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:38:08,070-Speed 2497.44 samples/sec Loss 1.8066 LearningRate 0.000207 Epoch: 23 Global Step: 490040 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:38:16,273-Speed 2497.04 samples/sec Loss 1.7991 LearningRate 0.000207 Epoch: 23 Global Step: 490050 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:38:24,481-Speed 2495.50 samples/sec Loss 1.7880 LearningRate 0.000207 Epoch: 23 Global Step: 490060 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:38:32,680-Speed 2498.30 samples/sec Loss 1.8637 LearningRate 0.000207 Epoch: 23 Global Step: 490070 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:38:40,881-Speed 2497.68 samples/sec Loss 1.8128 LearningRate 0.000207 Epoch: 23 Global Step: 490080 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:38:49,033-Speed 2512.91 samples/sec Loss 1.8131 LearningRate 0.000207 Epoch: 23 Global Step: 490090 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:38:57,233-Speed 2497.88 samples/sec Loss 1.7813 LearningRate 0.000207 Epoch: 23 Global Step: 490100 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:39:05,434-Speed 2497.54 samples/sec Loss 1.7790 LearningRate 0.000207 Epoch: 23 Global Step: 490110 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:39:13,644-Speed 2495.17 samples/sec Loss 1.7762 LearningRate 0.000207 Epoch: 23 Global Step: 490120 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:39:21,851-Speed 2495.85 samples/sec Loss 1.8118 LearningRate 0.000207 Epoch: 23 Global Step: 490130 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:39:30,060-Speed 2495.03 samples/sec Loss 1.8322 LearningRate 0.000207 Epoch: 23 Global Step: 490140 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:39:38,211-Speed 2513.11 samples/sec Loss 1.7664 LearningRate 0.000207 Epoch: 23 Global Step: 490150 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:39:46,411-Speed 2498.02 samples/sec Loss 1.8048 LearningRate 0.000207 Epoch: 23 Global Step: 490160 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:39:54,617-Speed 2495.92 samples/sec Loss 1.8042 LearningRate 0.000207 Epoch: 23 Global Step: 490170 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:40:02,823-Speed 2496.19 samples/sec Loss 1.8075 LearningRate 0.000207 Epoch: 23 Global Step: 490180 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:40:11,033-Speed 2494.89 samples/sec Loss 1.7823 LearningRate 0.000207 Epoch: 23 Global Step: 490190 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:40:19,238-Speed 2496.63 samples/sec Loss 1.8269 LearningRate 0.000207 Epoch: 23 Global Step: 490200 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:40:27,383-Speed 2514.47 samples/sec Loss 1.7859 LearningRate 0.000207 Epoch: 23 Global Step: 490210 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:40:35,591-Speed 2495.61 samples/sec Loss 1.7842 LearningRate 0.000207 Epoch: 23 Global Step: 490220 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:40:43,796-Speed 2496.52 samples/sec Loss 1.8019 LearningRate 0.000207 Epoch: 23 Global Step: 490230 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:40:52,000-Speed 2496.71 samples/sec Loss 1.8245 LearningRate 0.000207 Epoch: 23 Global Step: 490240 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:41:00,202-Speed 2497.30 samples/sec Loss 1.8488 LearningRate 0.000207 Epoch: 23 Global Step: 490250 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:41:08,402-Speed 2497.88 samples/sec Loss 1.8020 LearningRate 0.000207 Epoch: 23 Global Step: 490260 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:41:16,546-Speed 2515.27 samples/sec Loss 1.8119 LearningRate 0.000207 Epoch: 23 Global Step: 490270 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:41:24,758-Speed 2494.03 samples/sec Loss 1.8214 LearningRate 0.000207 Epoch: 23 Global Step: 490280 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:41:32,980-Speed 2491.80 samples/sec Loss 1.7944 LearningRate 0.000206 Epoch: 23 Global Step: 490290 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:41:41,183-Speed 2496.89 samples/sec Loss 1.8002 LearningRate 0.000206 Epoch: 23 Global Step: 490300 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:41:49,382-Speed 2498.26 samples/sec Loss 1.7867 LearningRate 0.000206 Epoch: 23 Global Step: 490310 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:41:57,582-Speed 2497.98 samples/sec Loss 1.7972 LearningRate 0.000206 Epoch: 23 Global Step: 490320 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:42:05,738-Speed 2511.32 samples/sec Loss 1.7743 LearningRate 0.000206 Epoch: 23 Global Step: 490330 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:42:13,937-Speed 2498.72 samples/sec Loss 1.8137 LearningRate 0.000206 Epoch: 23 Global Step: 490340 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:42:22,138-Speed 2497.52 samples/sec Loss 1.8041 LearningRate 0.000206 Epoch: 23 Global Step: 490350 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:42:30,339-Speed 2497.74 samples/sec Loss 1.8019 LearningRate 0.000206 Epoch: 23 Global Step: 490360 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:42:38,540-Speed 2497.48 samples/sec Loss 1.7588 LearningRate 0.000206 Epoch: 23 Global Step: 490370 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:42:46,740-Speed 2498.31 samples/sec Loss 1.8427 LearningRate 0.000206 Epoch: 23 Global Step: 490380 Fp16 Grad Scale: 32768 
Required: 78 hours Training: 2022-07-10 06:42:54,890-Speed 2513.19 samples/sec Loss 1.8059 LearningRate 0.000206 Epoch: 23 Global Step: 490390 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:43:03,092-Speed 2497.47 samples/sec Loss 1.8295 LearningRate 0.000206 Epoch: 23 Global Step: 490400 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:43:11,293-Speed 2497.40 samples/sec Loss 1.7972 LearningRate 0.000206 Epoch: 23 Global Step: 490410 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:43:19,500-Speed 2495.87 samples/sec Loss 1.8132 LearningRate 0.000206 Epoch: 23 Global Step: 490420 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:43:27,701-Speed 2497.76 samples/sec Loss 1.8268 LearningRate 0.000206 Epoch: 23 Global Step: 490430 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:43:35,904-Speed 2496.97 samples/sec Loss 1.8386 LearningRate 0.000206 Epoch: 23 Global Step: 490440 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:43:44,052-Speed 2513.87 samples/sec Loss 1.8076 LearningRate 0.000206 Epoch: 23 Global Step: 490450 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:43:52,254-Speed 2497.43 samples/sec Loss 1.8115 LearningRate 0.000206 Epoch: 23 Global Step: 490460 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:44:00,465-Speed 2494.69 samples/sec Loss 1.7957 LearningRate 0.000206 Epoch: 23 Global Step: 490470 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:44:08,668-Speed 2496.92 samples/sec Loss 1.8272 LearningRate 0.000206 Epoch: 23 Global Step: 490480 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:44:16,890-Speed 2491.35 samples/sec Loss 1.8005 LearningRate 0.000206 Epoch: 23 Global Step: 490490 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:44:25,101-Speed 2494.53 samples/sec Loss 1.7753 LearningRate 0.000206 Epoch: 23 Global Step: 490500 Fp16 Grad Scale: 32768 
Required: 78 hours Training: 2022-07-10 06:44:33,249-Speed 2514.05 samples/sec Loss 1.8381 LearningRate 0.000206 Epoch: 23 Global Step: 490510 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:44:41,457-Speed 2495.40 samples/sec Loss 1.7871 LearningRate 0.000206 Epoch: 23 Global Step: 490520 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:44:49,676-Speed 2492.21 samples/sec Loss 1.7830 LearningRate 0.000206 Epoch: 23 Global Step: 490530 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:44:57,883-Speed 2495.80 samples/sec Loss 1.7941 LearningRate 0.000206 Epoch: 23 Global Step: 490540 Fp16 Grad Scale: 32768 Required: 78 hours Training: 2022-07-10 06:45:06,041-Speed 2510.68 samples/sec Loss 1.8061 LearningRate 0.000206 Epoch: 23 Global Step: 490550 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:45:14,241-Speed 2498.10 samples/sec Loss 1.7811 LearningRate 0.000206 Epoch: 23 Global Step: 490560 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:45:22,388-Speed 2514.00 samples/sec Loss 1.8077 LearningRate 0.000206 Epoch: 23 Global Step: 490570 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:45:30,592-Speed 2497.05 samples/sec Loss 1.7858 LearningRate 0.000206 Epoch: 23 Global Step: 490580 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:45:38,794-Speed 2497.42 samples/sec Loss 1.7859 LearningRate 0.000206 Epoch: 23 Global Step: 490590 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:45:46,996-Speed 2497.11 samples/sec Loss 1.7899 LearningRate 0.000206 Epoch: 23 Global Step: 490600 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:45:55,199-Speed 2497.19 samples/sec Loss 1.8105 LearningRate 0.000206 Epoch: 23 Global Step: 490610 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:46:03,401-Speed 2497.30 samples/sec Loss 1.8082 LearningRate 0.000206 Epoch: 23 Global Step: 490620 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:46:11,557-Speed 2511.45 samples/sec Loss 1.7946 LearningRate 0.000206 Epoch: 23 Global Step: 490630 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:46:19,756-Speed 2498.24 samples/sec Loss 1.8327 LearningRate 0.000206 Epoch: 23 Global Step: 490640 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:46:27,963-Speed 2496.01 samples/sec Loss 1.8065 LearningRate 0.000206 Epoch: 23 Global Step: 490650 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:46:36,176-Speed 2493.79 samples/sec Loss 1.7922 LearningRate 0.000206 Epoch: 23 Global Step: 490660 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:46:44,377-Speed 2497.56 samples/sec Loss 1.8289 LearningRate 0.000206 Epoch: 23 Global Step: 490670 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:46:52,577-Speed 2497.97 samples/sec Loss 1.7896 LearningRate 0.000206 Epoch: 23 Global Step: 490680 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:47:00,729-Speed 2512.65 samples/sec Loss 1.7612 LearningRate 0.000206 Epoch: 23 Global Step: 490690 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:47:08,933-Speed 2496.66 samples/sec Loss 1.7986 LearningRate 0.000206 Epoch: 23 Global Step: 490700 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:47:17,143-Speed 2494.87 samples/sec Loss 1.7762 LearningRate 0.000206 Epoch: 23 Global Step: 490710 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:47:25,344-Speed 2497.81 samples/sec Loss 1.7747 LearningRate 0.000206 Epoch: 23 Global Step: 490720 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:47:33,547-Speed 2497.18 samples/sec Loss 1.8040 LearningRate 0.000206 Epoch: 23 Global Step: 490730 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:47:41,760-Speed 2493.96 samples/sec Loss 1.7586 LearningRate 0.000206 Epoch: 23 Global Step: 490740 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:47:49,907-Speed 2514.05 samples/sec Loss 1.8123 LearningRate 0.000206 Epoch: 23 Global Step: 490750 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:47:58,113-Speed 2496.15 samples/sec Loss 1.7805 LearningRate 0.000206 Epoch: 23 Global Step: 490760 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:48:06,313-Speed 2497.65 samples/sec Loss 1.7770 LearningRate 0.000206 Epoch: 23 Global Step: 490770 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:48:14,514-Speed 2497.61 samples/sec Loss 1.7778 LearningRate 0.000206 Epoch: 23 Global Step: 490780 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:48:22,721-Speed 2495.80 samples/sec Loss 1.7565 LearningRate 0.000206 Epoch: 23 Global Step: 490790 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:48:30,926-Speed 2496.55 samples/sec Loss 1.8187 LearningRate 0.000206 Epoch: 23 Global Step: 490800 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:48:39,073-Speed 2514.52 samples/sec Loss 1.8308 LearningRate 0.000206 Epoch: 23 Global Step: 490810 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:48:47,271-Speed 2498.75 samples/sec Loss 1.7789 LearningRate 0.000206 Epoch: 23 Global Step: 490820 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:48:55,487-Speed 2493.09 samples/sec Loss 1.8008 LearningRate 0.000206 Epoch: 23 Global Step: 490830 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:49:03,691-Speed 2497.08 samples/sec Loss 1.7643 LearningRate 0.000206 Epoch: 23 Global Step: 490840 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:49:11,891-Speed 2497.82 samples/sec Loss 1.8284 LearningRate 0.000206 Epoch: 23 Global Step: 490850 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:49:20,108-Speed 2492.77 samples/sec Loss 1.8267 LearningRate 0.000206 Epoch: 23 Global Step: 490860 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:49:28,257-Speed 2513.56 samples/sec Loss 1.7940 LearningRate 0.000206 Epoch: 23 Global Step: 490870 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:49:36,475-Speed 2492.35 samples/sec Loss 1.8374 LearningRate 0.000206 Epoch: 23 Global Step: 490880 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:49:44,677-Speed 2497.63 samples/sec Loss 1.8297 LearningRate 0.000206 Epoch: 23 Global Step: 490890 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:49:52,879-Speed 2497.37 samples/sec Loss 1.7799 LearningRate 0.000206 Epoch: 23 Global Step: 490900 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:50:01,087-Speed 2495.73 samples/sec Loss 1.8024 LearningRate 0.000206 Epoch: 23 Global Step: 490910 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:50:09,288-Speed 2497.75 samples/sec Loss 1.7825 LearningRate 0.000206 Epoch: 23 Global Step: 490920 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:50:17,437-Speed 2513.72 samples/sec Loss 1.8477 LearningRate 0.000206 Epoch: 23 Global Step: 490930 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:50:25,652-Speed 2493.50 samples/sec Loss 1.8233 LearningRate 0.000206 Epoch: 23 Global Step: 490940 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:50:33,866-Speed 2493.82 samples/sec Loss 1.8245 LearningRate 0.000206 Epoch: 23 Global Step: 490950 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:50:42,070-Speed 2496.56 samples/sec Loss 1.8268 LearningRate 0.000206 Epoch: 23 Global Step: 490960 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:50:50,280-Speed 2495.14 samples/sec Loss 1.8615 LearningRate 0.000206 Epoch: 23 Global Step: 490970 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:50:58,484-Speed 2496.60 samples/sec Loss 1.8049 LearningRate 0.000206 Epoch: 23 Global Step: 490980 Fp16 Grad Scale: 16384 
Required: 78 hours Training: 2022-07-10 06:51:06,632-Speed 2514.01 samples/sec Loss 1.8352 LearningRate 0.000206 Epoch: 23 Global Step: 490990 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:51:14,833-Speed 2497.42 samples/sec Loss 1.7815 LearningRate 0.000206 Epoch: 23 Global Step: 491000 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:51:23,034-Speed 2497.94 samples/sec Loss 1.8447 LearningRate 0.000206 Epoch: 23 Global Step: 491010 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:51:31,247-Speed 2494.02 samples/sec Loss 1.8311 LearningRate 0.000206 Epoch: 23 Global Step: 491020 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:51:39,448-Speed 2497.61 samples/sec Loss 1.8218 LearningRate 0.000206 Epoch: 23 Global Step: 491030 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:51:47,656-Speed 2495.40 samples/sec Loss 1.8368 LearningRate 0.000206 Epoch: 23 Global Step: 491040 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:51:55,805-Speed 2513.74 samples/sec Loss 1.8080 LearningRate 0.000206 Epoch: 23 Global Step: 491050 Fp16 Grad Scale: 16384 Required: 78 hours Training: 2022-07-10 06:52:04,008-Speed 2496.86 samples/sec Loss 1.8124 LearningRate 0.000206 Epoch: 23 Global Step: 491060 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 06:52:12,210-Speed 2497.49 samples/sec Loss 1.8314 LearningRate 0.000206 Epoch: 23 Global Step: 491070 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 06:52:20,416-Speed 2496.12 samples/sec Loss 1.8341 LearningRate 0.000206 Epoch: 23 Global Step: 491080 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 06:52:28,621-Speed 2496.59 samples/sec Loss 1.8137 LearningRate 0.000206 Epoch: 23 Global Step: 491090 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 06:52:36,839-Speed 2492.40 samples/sec Loss 1.7900 LearningRate 0.000206 Epoch: 23 Global Step: 491100 Fp16 Grad Scale: 16384 
Required: 77 hours Training: 2022-07-10 06:52:44,995-Speed 2511.22 samples/sec Loss 1.7987 LearningRate 0.000205 Epoch: 23 Global Step: 491110 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 06:52:53,211-Speed 2493.09 samples/sec Loss 1.7856 LearningRate 0.000205 Epoch: 23 Global Step: 491120 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 06:53:01,415-Speed 2496.79 samples/sec Loss 1.8230 LearningRate 0.000205 Epoch: 23 Global Step: 491130 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 06:53:09,617-Speed 2497.43 samples/sec Loss 1.8654 LearningRate 0.000205 Epoch: 23 Global Step: 491140 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 06:53:17,817-Speed 2498.01 samples/sec Loss 1.8381 LearningRate 0.000205 Epoch: 23 Global Step: 491150 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 06:53:26,020-Speed 2496.88 samples/sec Loss 1.8106 LearningRate 0.000205 Epoch: 23 Global Step: 491160 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 06:53:34,169-Speed 2513.50 samples/sec Loss 1.8065 LearningRate 0.000205 Epoch: 23 Global Step: 491170 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 06:53:42,375-Speed 2496.32 samples/sec Loss 1.7963 LearningRate 0.000205 Epoch: 23 Global Step: 491180 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 06:53:50,579-Speed 2497.08 samples/sec Loss 1.8118 LearningRate 0.000205 Epoch: 23 Global Step: 491190 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 06:53:58,738-Speed 2510.41 samples/sec Loss 1.7757 LearningRate 0.000205 Epoch: 23 Global Step: 491200 Fp16 Grad Scale: 8192 Required: 77 hours Training: 2022-07-10 06:54:06,943-Speed 2496.21 samples/sec Loss 1.8188 LearningRate 0.000205 Epoch: 23 Global Step: 491210 Fp16 Grad Scale: 8192 Required: 77 hours Training: 2022-07-10 06:54:15,152-Speed 2495.18 samples/sec Loss 1.7567 LearningRate 0.000205 Epoch: 23 Global Step: 491220 Fp16 Grad Scale: 8192 Required: 
77 hours
Training: 2022-07-10 06:54:23,305-Speed 2512.80 samples/sec Loss 1.7715 LearningRate 0.000205 Epoch: 23 Global Step: 491230 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:54:31,508-Speed 2497.05 samples/sec Loss 1.8025 LearningRate 0.000205 Epoch: 23 Global Step: 491240 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:54:39,710-Speed 2497.36 samples/sec Loss 1.7947 LearningRate 0.000205 Epoch: 23 Global Step: 491250 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:54:47,912-Speed 2497.29 samples/sec Loss 1.8139 LearningRate 0.000205 Epoch: 23 Global Step: 491260 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:54:56,123-Speed 2494.90 samples/sec Loss 1.8225 LearningRate 0.000205 Epoch: 23 Global Step: 491270 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:55:04,332-Speed 2495.22 samples/sec Loss 1.7489 LearningRate 0.000205 Epoch: 23 Global Step: 491280 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:55:12,489-Speed 2510.85 samples/sec Loss 1.7693 LearningRate 0.000205 Epoch: 23 Global Step: 491290 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:55:20,698-Speed 2495.40 samples/sec Loss 1.8140 LearningRate 0.000205 Epoch: 23 Global Step: 491300 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:55:28,902-Speed 2497.10 samples/sec Loss 1.7782 LearningRate 0.000205 Epoch: 23 Global Step: 491310 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:55:37,144-Speed 2485.10 samples/sec Loss 1.7768 LearningRate 0.000205 Epoch: 23 Global Step: 491320 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:55:45,345-Speed 2497.49 samples/sec Loss 1.8413 LearningRate 0.000205 Epoch: 23 Global Step: 491330 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:55:53,551-Speed 2496.35 samples/sec Loss 1.8139 LearningRate 0.000205 Epoch: 23 Global Step: 491340 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:56:01,702-Speed 2513.02 samples/sec Loss 1.7872 LearningRate 0.000205 Epoch: 23 Global Step: 491350 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:56:09,909-Speed 2495.96 samples/sec Loss 1.8458 LearningRate 0.000205 Epoch: 23 Global Step: 491360 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:56:18,116-Speed 2496.08 samples/sec Loss 1.7694 LearningRate 0.000205 Epoch: 23 Global Step: 491370 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:56:26,318-Speed 2497.07 samples/sec Loss 1.7938 LearningRate 0.000205 Epoch: 23 Global Step: 491380 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:56:34,521-Speed 2497.15 samples/sec Loss 1.7997 LearningRate 0.000205 Epoch: 23 Global Step: 491390 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:56:42,720-Speed 2498.13 samples/sec Loss 1.8644 LearningRate 0.000205 Epoch: 23 Global Step: 491400 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:56:50,867-Speed 2514.37 samples/sec Loss 1.8160 LearningRate 0.000205 Epoch: 23 Global Step: 491410 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:56:59,069-Speed 2497.43 samples/sec Loss 1.8293 LearningRate 0.000205 Epoch: 23 Global Step: 491420 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:57:07,271-Speed 2497.39 samples/sec Loss 1.8287 LearningRate 0.000205 Epoch: 23 Global Step: 491430 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:57:15,470-Speed 2498.25 samples/sec Loss 1.8458 LearningRate 0.000205 Epoch: 23 Global Step: 491440 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:57:23,674-Speed 2496.98 samples/sec Loss 1.8451 LearningRate 0.000205 Epoch: 23 Global Step: 491450 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:57:31,872-Speed 2498.57 samples/sec Loss 1.8508 LearningRate 0.000205 Epoch: 23 Global Step: 491460 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:57:40,019-Speed 2514.34 samples/sec Loss 1.8215 LearningRate 0.000205 Epoch: 23 Global Step: 491470 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:57:48,224-Speed 2496.61 samples/sec Loss 1.8108 LearningRate 0.000205 Epoch: 23 Global Step: 491480 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:57:56,424-Speed 2497.73 samples/sec Loss 1.8544 LearningRate 0.000205 Epoch: 23 Global Step: 491490 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:58:04,624-Speed 2498.17 samples/sec Loss 1.7759 LearningRate 0.000205 Epoch: 23 Global Step: 491500 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:58:12,825-Speed 2497.62 samples/sec Loss 1.8467 LearningRate 0.000205 Epoch: 23 Global Step: 491510 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:58:21,028-Speed 2496.97 samples/sec Loss 1.8334 LearningRate 0.000205 Epoch: 23 Global Step: 491520 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:58:29,173-Speed 2515.33 samples/sec Loss 1.8129 LearningRate 0.000205 Epoch: 23 Global Step: 491530 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:58:37,372-Speed 2498.26 samples/sec Loss 1.8116 LearningRate 0.000205 Epoch: 23 Global Step: 491540 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:58:45,583-Speed 2494.50 samples/sec Loss 1.7959 LearningRate 0.000205 Epoch: 23 Global Step: 491550 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:58:53,783-Speed 2497.69 samples/sec Loss 1.7901 LearningRate 0.000205 Epoch: 23 Global Step: 491560 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:59:01,983-Speed 2498.15 samples/sec Loss 1.8298 LearningRate 0.000205 Epoch: 23 Global Step: 491570 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:59:10,180-Speed 2498.96 samples/sec Loss 1.8509 LearningRate 0.000205 Epoch: 23 Global Step: 491580 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:59:18,329-Speed 2513.48 samples/sec Loss 1.8035 LearningRate 0.000205 Epoch: 23 Global Step: 491590 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:59:26,528-Speed 2498.46 samples/sec Loss 1.8202 LearningRate 0.000205 Epoch: 23 Global Step: 491600 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:59:34,733-Speed 2496.39 samples/sec Loss 1.7997 LearningRate 0.000205 Epoch: 23 Global Step: 491610 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:59:42,933-Speed 2498.32 samples/sec Loss 1.7650 LearningRate 0.000205 Epoch: 23 Global Step: 491620 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:59:51,142-Speed 2495.11 samples/sec Loss 1.8246 LearningRate 0.000205 Epoch: 23 Global Step: 491630 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 06:59:59,344-Speed 2497.29 samples/sec Loss 1.8037 LearningRate 0.000205 Epoch: 23 Global Step: 491640 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:00:07,491-Speed 2514.29 samples/sec Loss 1.8007 LearningRate 0.000205 Epoch: 23 Global Step: 491650 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:00:15,689-Speed 2498.60 samples/sec Loss 1.8058 LearningRate 0.000205 Epoch: 23 Global Step: 491660 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:00:23,893-Speed 2496.75 samples/sec Loss 1.8043 LearningRate 0.000205 Epoch: 23 Global Step: 491670 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:00:32,090-Speed 2498.89 samples/sec Loss 1.8118 LearningRate 0.000205 Epoch: 23 Global Step: 491680 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:00:40,299-Speed 2495.24 samples/sec Loss 1.7971 LearningRate 0.000205 Epoch: 23 Global Step: 491690 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:00:48,501-Speed 2497.52 samples/sec Loss 1.7952 LearningRate 0.000205 Epoch: 23 Global Step: 491700 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:00:56,656-Speed 2511.80 samples/sec Loss 1.7710 LearningRate 0.000205 Epoch: 23 Global Step: 491710 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:01:04,854-Speed 2498.46 samples/sec Loss 1.8386 LearningRate 0.000205 Epoch: 23 Global Step: 491720 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:01:13,052-Speed 2498.49 samples/sec Loss 1.8401 LearningRate 0.000205 Epoch: 23 Global Step: 491730 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:01:21,251-Speed 2498.38 samples/sec Loss 1.8331 LearningRate 0.000205 Epoch: 23 Global Step: 491740 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:01:29,452-Speed 2497.73 samples/sec Loss 1.7801 LearningRate 0.000205 Epoch: 23 Global Step: 491750 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:01:37,664-Speed 2494.59 samples/sec Loss 1.8404 LearningRate 0.000205 Epoch: 23 Global Step: 491760 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:01:45,808-Speed 2515.22 samples/sec Loss 1.7982 LearningRate 0.000205 Epoch: 23 Global Step: 491770 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:01:54,008-Speed 2497.92 samples/sec Loss 1.7850 LearningRate 0.000205 Epoch: 23 Global Step: 491780 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:02:02,211-Speed 2497.27 samples/sec Loss 1.8128 LearningRate 0.000205 Epoch: 23 Global Step: 491790 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:02:10,410-Speed 2498.08 samples/sec Loss 1.7406 LearningRate 0.000205 Epoch: 23 Global Step: 491800 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:02:18,608-Speed 2498.67 samples/sec Loss 1.7964 LearningRate 0.000205 Epoch: 23 Global Step: 491810 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:02:26,811-Speed 2497.03 samples/sec Loss 1.8330 LearningRate 0.000205 Epoch: 23 Global Step: 491820 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:02:34,955-Speed 2515.10 samples/sec Loss 1.7922 LearningRate 0.000205 Epoch: 23 Global Step: 491830 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:02:43,157-Speed 2497.37 samples/sec Loss 1.8405 LearningRate 0.000205 Epoch: 23 Global Step: 491840 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:02:51,355-Speed 2498.56 samples/sec Loss 1.8178 LearningRate 0.000205 Epoch: 23 Global Step: 491850 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:02:59,554-Speed 2498.69 samples/sec Loss 1.8007 LearningRate 0.000205 Epoch: 23 Global Step: 491860 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:03:07,754-Speed 2497.84 samples/sec Loss 1.8432 LearningRate 0.000205 Epoch: 23 Global Step: 491870 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:03:15,960-Speed 2496.10 samples/sec Loss 1.7844 LearningRate 0.000205 Epoch: 23 Global Step: 491880 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:03:24,106-Speed 2514.57 samples/sec Loss 1.8083 LearningRate 0.000205 Epoch: 23 Global Step: 491890 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:03:32,305-Speed 2498.34 samples/sec Loss 1.8520 LearningRate 0.000205 Epoch: 23 Global Step: 491900 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:03:40,504-Speed 2498.50 samples/sec Loss 1.8137 LearningRate 0.000205 Epoch: 23 Global Step: 491910 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:03:48,701-Speed 2498.83 samples/sec Loss 1.8272 LearningRate 0.000205 Epoch: 23 Global Step: 491920 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:03:56,899-Speed 2498.73 samples/sec Loss 1.8177 LearningRate 0.000205 Epoch: 23 Global Step: 491930 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:04:05,102-Speed 2496.91 samples/sec Loss 1.8548 LearningRate 0.000204 Epoch: 23 Global Step: 491940 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:04:13,259-Speed 2511.18 samples/sec Loss 1.8131 LearningRate 0.000204 Epoch: 23 Global Step: 491950 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:04:21,468-Speed 2495.38 samples/sec Loss 1.8313 LearningRate 0.000204 Epoch: 23 Global Step: 491960 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:04:29,677-Speed 2495.18 samples/sec Loss 1.8063 LearningRate 0.000204 Epoch: 23 Global Step: 491970 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:04:37,878-Speed 2497.83 samples/sec Loss 1.7923 LearningRate 0.000204 Epoch: 23 Global Step: 491980 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:04:46,076-Speed 2498.29 samples/sec Loss 1.8105 LearningRate 0.000204 Epoch: 23 Global Step: 491990 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:04:54,278-Speed 2497.51 samples/sec Loss 1.7854 LearningRate 0.000204 Epoch: 23 Global Step: 492000 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:05:02,424-Speed 2514.45 samples/sec Loss 1.7796 LearningRate 0.000204 Epoch: 23 Global Step: 492010 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:05:10,634-Speed 2495.05 samples/sec Loss 1.8533 LearningRate 0.000204 Epoch: 23 Global Step: 492020 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:05:18,833-Speed 2498.24 samples/sec Loss 1.8053 LearningRate 0.000204 Epoch: 23 Global Step: 492030 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:05:27,033-Speed 2498.08 samples/sec Loss 1.7782 LearningRate 0.000204 Epoch: 23 Global Step: 492040 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:05:35,233-Speed 2497.91 samples/sec Loss 1.8205 LearningRate 0.000204 Epoch: 23 Global Step: 492050 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:05:43,431-Speed 2498.65 samples/sec Loss 1.7848 LearningRate 0.000204 Epoch: 23 Global Step: 492060 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:05:51,579-Speed 2513.98 samples/sec Loss 1.8095 LearningRate 0.000204 Epoch: 23 Global Step: 492070 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:05:59,779-Speed 2497.94 samples/sec Loss 1.7875 LearningRate 0.000204 Epoch: 23 Global Step: 492080 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:06:07,984-Speed 2496.37 samples/sec Loss 1.8001 LearningRate 0.000204 Epoch: 23 Global Step: 492090 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:06:16,188-Speed 2496.94 samples/sec Loss 1.8083 LearningRate 0.000204 Epoch: 23 Global Step: 492100 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:06:24,388-Speed 2497.88 samples/sec Loss 1.7959 LearningRate 0.000204 Epoch: 23 Global Step: 492110 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:06:32,592-Speed 2496.79 samples/sec Loss 1.7807 LearningRate 0.000204 Epoch: 23 Global Step: 492120 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:06:40,740-Speed 2513.92 samples/sec Loss 1.7915 LearningRate 0.000204 Epoch: 23 Global Step: 492130 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:06:48,937-Speed 2498.81 samples/sec Loss 1.8240 LearningRate 0.000204 Epoch: 23 Global Step: 492140 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:06:57,141-Speed 2496.86 samples/sec Loss 1.7935 LearningRate 0.000204 Epoch: 23 Global Step: 492150 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:07:05,343-Speed 2497.35 samples/sec Loss 1.7889 LearningRate 0.000204 Epoch: 23 Global Step: 492160 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:07:13,542-Speed 2498.04 samples/sec Loss 1.8005 LearningRate 0.000204 Epoch: 23 Global Step: 492170 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:07:21,756-Speed 2493.62 samples/sec Loss 1.7957 LearningRate 0.000204 Epoch: 23 Global Step: 492180 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:07:29,900-Speed 2515.15 samples/sec Loss 1.7711 LearningRate 0.000204 Epoch: 23 Global Step: 492190 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:07:38,099-Speed 2498.26 samples/sec Loss 1.8062 LearningRate 0.000204 Epoch: 23 Global Step: 492200 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:07:46,296-Speed 2498.77 samples/sec Loss 1.7811 LearningRate 0.000204 Epoch: 23 Global Step: 492210 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:07:54,491-Speed 2499.46 samples/sec Loss 1.7726 LearningRate 0.000204 Epoch: 23 Global Step: 492220 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:08:02,688-Speed 2498.74 samples/sec Loss 1.8226 LearningRate 0.000204 Epoch: 23 Global Step: 492230 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:08:10,896-Speed 2495.49 samples/sec Loss 1.7774 LearningRate 0.000204 Epoch: 23 Global Step: 492240 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:08:19,042-Speed 2514.48 samples/sec Loss 1.8151 LearningRate 0.000204 Epoch: 23 Global Step: 492250 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:08:27,246-Speed 2497.10 samples/sec Loss 1.8170 LearningRate 0.000204 Epoch: 23 Global Step: 492260 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:08:35,449-Speed 2497.04 samples/sec Loss 1.8002 LearningRate 0.000204 Epoch: 23 Global Step: 492270 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:08:43,656-Speed 2495.71 samples/sec Loss 1.8456 LearningRate 0.000204 Epoch: 23 Global Step: 492280 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:08:51,857-Speed 2497.58 samples/sec Loss 1.7957 LearningRate 0.000204 Epoch: 23 Global Step: 492290 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:09:00,057-Speed 2497.72 samples/sec Loss 1.7754 LearningRate 0.000204 Epoch: 23 Global Step: 492300 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:09:08,209-Speed 2513.00 samples/sec Loss 1.7896 LearningRate 0.000204 Epoch: 23 Global Step: 492310 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:09:16,412-Speed 2497.11 samples/sec Loss 1.7917 LearningRate 0.000204 Epoch: 23 Global Step: 492320 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:09:24,613-Speed 2497.58 samples/sec Loss 1.8399 LearningRate 0.000204 Epoch: 23 Global Step: 492330 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:09:32,828-Speed 2493.47 samples/sec Loss 1.8463 LearningRate 0.000204 Epoch: 23 Global Step: 492340 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:09:41,028-Speed 2497.82 samples/sec Loss 1.7532 LearningRate 0.000204 Epoch: 23 Global Step: 492350 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:09:49,233-Speed 2496.54 samples/sec Loss 1.7836 LearningRate 0.000204 Epoch: 23 Global Step: 492360 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:09:57,379-Speed 2514.31 samples/sec Loss 1.7907 LearningRate 0.000204 Epoch: 23 Global Step: 492370 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:10:05,583-Speed 2496.78 samples/sec Loss 1.8379 LearningRate 0.000204 Epoch: 23 Global Step: 492380 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:10:13,784-Speed 2497.83 samples/sec Loss 1.7576 LearningRate 0.000204 Epoch: 23 Global Step: 492390 Fp16 Grad Scale: 8192 Required: 77 hours
Training: 2022-07-10 07:10:22,007-Speed 2490.73 samples/sec Loss 1.8393 LearningRate 0.000204 Epoch: 23 Global Step: 492400 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:10:30,212-Speed 2496.37 samples/sec Loss 1.7531 LearningRate 0.000204 Epoch: 23 Global Step: 492410 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:10:38,415-Speed 2497.28 samples/sec Loss 1.8344 LearningRate 0.000204 Epoch: 23 Global Step: 492420 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:10:46,561-Speed 2514.55 samples/sec Loss 1.8174 LearningRate 0.000204 Epoch: 23 Global Step: 492430 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:10:54,779-Speed 2492.35 samples/sec Loss 1.8029 LearningRate 0.000204 Epoch: 23 Global Step: 492440 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:11:02,978-Speed 2498.40 samples/sec Loss 1.8073 LearningRate 0.000204 Epoch: 23 Global Step: 492450 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:11:11,173-Speed 2499.43 samples/sec Loss 1.8065 LearningRate 0.000204 Epoch: 23 Global Step: 492460 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:11:19,368-Speed 2499.48 samples/sec Loss 1.8001 LearningRate 0.000204 Epoch: 23 Global Step: 492470 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:11:27,565-Speed 2498.70 samples/sec Loss 1.8064 LearningRate 0.000204 Epoch: 23 Global Step: 492480 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:11:35,726-Speed 2510.13 samples/sec Loss 1.7820 LearningRate 0.000204 Epoch: 23 Global Step: 492490 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:11:43,927-Speed 2497.63 samples/sec Loss 1.7606 LearningRate 0.000204 Epoch: 23 Global Step: 492500 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:11:52,128-Speed 2497.52 samples/sec Loss 1.7993 LearningRate 0.000204 Epoch: 23 Global Step: 492510 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:12:00,330-Speed 2497.31 samples/sec Loss 1.7119 LearningRate 0.000204 Epoch: 23 Global Step: 492520 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:12:08,542-Speed 2494.39 samples/sec Loss 1.8109 LearningRate 0.000204 Epoch: 23 Global Step: 492530 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:12:16,744-Speed 2497.62 samples/sec Loss 1.7809 LearningRate 0.000204 Epoch: 23 Global Step: 492540 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:12:24,890-Speed 2514.32 samples/sec Loss 1.7759 LearningRate 0.000204 Epoch: 23 Global Step: 492550 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:12:33,091-Speed 2497.42 samples/sec Loss 1.7461 LearningRate 0.000204 Epoch: 23 Global Step: 492560 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:12:41,289-Speed 2498.84 samples/sec Loss 1.7900 LearningRate 0.000204 Epoch: 23 Global Step: 492570 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:12:49,488-Speed 2498.14 samples/sec Loss 1.8109 LearningRate 0.000204 Epoch: 23 Global Step: 492580 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:12:57,690-Speed 2497.35 samples/sec Loss 1.7666 LearningRate 0.000204 Epoch: 23 Global Step: 492590 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:13:05,889-Speed 2498.35 samples/sec Loss 1.8017 LearningRate 0.000204 Epoch: 23 Global Step: 492600 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:13:14,039-Speed 2513.34 samples/sec Loss 1.8070 LearningRate 0.000204 Epoch: 23 Global Step: 492610 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:13:22,256-Speed 2492.86 samples/sec Loss 1.7977 LearningRate 0.000204 Epoch: 23 Global Step: 492620 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:13:30,459-Speed 2497.01 samples/sec Loss 1.8099 LearningRate 0.000204 Epoch: 23 Global Step: 492630 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:13:38,661-Speed 2497.37 samples/sec Loss 1.8050 LearningRate 0.000204 Epoch: 23 Global Step: 492640 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:13:46,858-Speed 2498.72 samples/sec Loss 1.8206 LearningRate 0.000204 Epoch: 23 Global Step: 492650 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:13:55,056-Speed 2498.84 samples/sec Loss 1.8097 LearningRate 0.000204 Epoch: 23 Global Step: 492660 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:14:03,203-Speed 2514.23 samples/sec Loss 1.8049 LearningRate 0.000204 Epoch: 23 Global Step: 492670 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:14:11,406-Speed 2496.85 samples/sec Loss 1.7782 LearningRate 0.000204 Epoch: 23 Global Step: 492680 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:14:19,608-Speed 2497.38 samples/sec Loss 1.7770 LearningRate 0.000204 Epoch: 23 Global Step: 492690 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:14:27,811-Speed 2497.40 samples/sec Loss 1.7484 LearningRate 0.000204 Epoch: 23 Global Step: 492700 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:14:36,012-Speed 2497.67 samples/sec Loss 1.8104 LearningRate 0.000204 Epoch: 23 Global Step: 492710 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:14:44,212-Speed 2498.07 samples/sec Loss 1.7754 LearningRate 0.000204 Epoch: 23 Global Step: 492720 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:14:52,357-Speed 2514.84 samples/sec Loss 1.7584 LearningRate 0.000204 Epoch: 23 Global Step: 492730 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:15:00,556-Speed 2498.58 samples/sec Loss 1.7916 LearningRate 0.000204 Epoch: 23 Global Step: 492740 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:15:08,753-Speed 2498.83 samples/sec Loss 1.8032 LearningRate 0.000204 Epoch: 23 Global Step: 492750 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:15:16,953-Speed 2497.85 samples/sec Loss 1.7917 LearningRate 0.000204 Epoch: 23 Global Step: 492760 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:15:25,156-Speed 2497.16 samples/sec Loss 1.7930 LearningRate 0.000203 Epoch: 23 Global Step: 492770 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:15:33,351-Speed 2499.50 samples/sec Loss 1.8102 LearningRate 0.000203 Epoch: 23 Global Step: 492780 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:15:41,496-Speed 2514.90 samples/sec Loss 1.7675 LearningRate 0.000203 Epoch: 23 Global Step: 492790 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:15:49,700-Speed 2496.89 samples/sec Loss 1.7559 LearningRate 0.000203 Epoch: 23 Global Step: 492800 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:15:57,899-Speed 2498.26 samples/sec Loss 1.8137 LearningRate 0.000203 Epoch: 23 Global Step: 492810 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:16:06,118-Speed 2492.11 samples/sec Loss 1.7986 LearningRate 0.000203 Epoch: 23 Global Step: 492820 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:16:14,318-Speed 2497.85 samples/sec Loss 1.7815 LearningRate 0.000203 Epoch: 23 Global Step: 492830 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:16:22,522-Speed 2497.19 samples/sec Loss 1.8015 LearningRate 0.000203 Epoch: 23 Global Step: 492840 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:16:30,670-Speed 2514.09 samples/sec Loss 1.7899 LearningRate 0.000203 Epoch: 23 Global Step: 492850 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:16:38,867-Speed 2498.63 samples/sec Loss 1.7898 LearningRate 0.000203 Epoch: 23 Global Step: 492860 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:16:47,070-Speed 2497.03 samples/sec Loss 1.8289 LearningRate 0.000203 Epoch: 23 Global Step: 492870 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:16:55,269-Speed 2498.19 samples/sec Loss 1.8249 LearningRate 0.000203 Epoch: 23 Global Step: 492880 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:17:03,468-Speed 2498.20 samples/sec Loss 1.8034 LearningRate 0.000203 Epoch: 23 Global Step: 492890 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:17:11,677-Speed 2495.44 samples/sec Loss 1.8260 LearningRate 0.000203 Epoch: 23 Global Step: 492900 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:17:19,824-Speed 2514.29 samples/sec Loss 1.7853 LearningRate 0.000203 Epoch: 23 Global Step: 492910 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:17:28,022-Speed 2498.46 samples/sec Loss 1.7636 LearningRate 0.000203 Epoch: 23 Global Step: 492920 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:17:36,220-Speed 2499.56 samples/sec Loss 1.8252 LearningRate 0.000203 Epoch: 23 Global Step: 492930 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:17:44,417-Speed 2498.92 samples/sec Loss 1.8139 LearningRate 0.000203 Epoch: 23 Global Step: 492940 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:17:52,618-Speed 2497.52 samples/sec Loss 1.8255 LearningRate 0.000203 Epoch: 23 Global Step: 492950 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:18:00,821-Speed 2497.11 samples/sec Loss 1.7914 LearningRate 0.000203 Epoch: 23 Global Step: 492960 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:18:08,968-Speed 2514.03 samples/sec Loss 1.8033 LearningRate 0.000203 Epoch: 23 Global Step: 492970 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:18:17,163-Speed 2499.44 samples/sec Loss 1.8374 LearningRate 0.000203 Epoch: 23 Global Step: 492980 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:18:25,377-Speed 2493.70 samples/sec Loss 1.8177 LearningRate 0.000203 Epoch: 23 Global Step: 492990 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:18:33,588-Speed 2494.74 samples/sec Loss 1.7829 LearningRate 0.000203 Epoch: 23 Global Step: 493000 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:18:41,789-Speed 2497.76 samples/sec Loss 1.8412 LearningRate 0.000203 Epoch: 23 Global Step: 493010 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:18:49,991-Speed 2497.26 samples/sec Loss 1.7901 LearningRate 0.000203 Epoch: 23 Global Step: 493020 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:18:58,137-Speed 2514.40 samples/sec Loss 1.7910 LearningRate 0.000203 Epoch: 23 Global Step: 493030 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:19:06,336-Speed 2498.34 samples/sec Loss 1.8156 LearningRate 0.000203 Epoch: 23 Global Step: 493040 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:19:14,533-Speed 2498.92 samples/sec Loss 1.7928 LearningRate 0.000203 Epoch: 23 Global Step: 493050 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:19:22,735-Speed 2497.51 samples/sec Loss 1.8310 LearningRate 0.000203 Epoch: 23 Global Step: 493060 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:19:30,934-Speed 2498.20 samples/sec Loss 1.7945 LearningRate 0.000203 Epoch: 23 Global Step: 493070 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:19:39,133-Speed 2498.64 samples/sec Loss 1.8099 LearningRate 0.000203 Epoch: 23 Global Step: 493080 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:19:47,279-Speed 2514.29 samples/sec Loss 1.7950 LearningRate 0.000203 Epoch: 23 Global Step: 493090 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:19:55,480-Speed 2497.96 samples/sec Loss 1.7936 LearningRate 0.000203 Epoch: 23 Global Step: 493100 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:20:03,694-Speed 2493.47 samples/sec Loss 1.8130 LearningRate 0.000203 Epoch: 23 Global Step: 493110 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:20:11,894-Speed 2497.97 samples/sec Loss 1.7829 LearningRate 0.000203 Epoch: 23 Global Step: 493120 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:20:20,096-Speed 2497.30 samples/sec Loss 1.8270 LearningRate 0.000203 Epoch: 23 Global Step: 493130 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:20:28,299-Speed 2497.18 samples/sec Loss 1.7889 LearningRate 0.000203 Epoch: 23 Global Step: 493140 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:20:36,449-Speed 2513.12 samples/sec Loss 1.8325 LearningRate 0.000203 Epoch: 23 Global Step: 493150 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:20:44,649-Speed 2498.44 samples/sec Loss 1.7843 LearningRate 0.000203 Epoch: 23 Global Step: 493160 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:20:52,850-Speed 2497.40 samples/sec Loss 1.7893 LearningRate 0.000203 Epoch: 23 Global Step: 493170 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:21:01,048-Speed 2498.67 samples/sec Loss 1.8061 LearningRate 0.000203 Epoch: 23 Global Step: 493180 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:21:09,249-Speed 2497.66 samples/sec Loss 1.7912 LearningRate 0.000203 Epoch: 23 Global Step: 493190 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:21:17,448-Speed 2498.21 samples/sec Loss 1.7972 LearningRate 0.000203 Epoch: 23 Global Step: 493200 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:21:25,596-Speed 2514.00 samples/sec Loss 1.8005 LearningRate 0.000203 Epoch: 23 Global Step: 493210 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:21:33,801-Speed 2496.22 samples/sec Loss 1.8237 LearningRate 0.000203 Epoch: 23 Global Step: 493220 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:21:42,001-Speed 2498.18 samples/sec Loss 1.7640 LearningRate 0.000203 Epoch: 23 Global Step: 493230 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:21:50,199-Speed 2498.73 samples/sec Loss 1.8206 LearningRate 0.000203 Epoch: 23 Global Step: 493240 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:21:58,403-Speed 2496.66 samples/sec Loss 1.7887 LearningRate 0.000203 Epoch: 23 Global Step: 493250 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:22:06,606-Speed 2497.37 samples/sec Loss 1.8121 LearningRate 0.000203 Epoch: 23 Global Step: 493260 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:22:14,754-Speed 2513.84 samples/sec Loss 1.7923 LearningRate 0.000203 Epoch: 23 Global Step: 493270 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:22:22,954-Speed 2497.67 samples/sec Loss 1.8286 LearningRate 0.000203 Epoch: 23 Global Step: 493280 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:22:31,160-Speed 2496.36 samples/sec Loss 1.7705 LearningRate 0.000203 Epoch: 23 Global Step: 493290 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:22:39,363-Speed 2497.14 samples/sec Loss 1.8061 LearningRate 0.000203 Epoch: 23 Global Step: 493300 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:22:47,566-Speed 2497.02 samples/sec Loss 1.7343 LearningRate 0.000203 Epoch: 23 Global Step: 493310 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:22:55,768-Speed 2497.19 samples/sec Loss 1.7862 LearningRate 0.000203 Epoch: 23 Global Step: 493320 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:23:03,918-Speed 2513.18 samples/sec Loss 1.8171 LearningRate 0.000203 Epoch: 23 Global Step: 493330 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:23:12,118-Speed 2498.10 samples/sec Loss 1.7797 LearningRate 0.000203 Epoch: 23 Global Step: 493340 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:23:20,323-Speed 2496.54 samples/sec Loss 1.7851 LearningRate 0.000203 Epoch: 23 Global Step: 493350 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:23:28,520-Speed 2499.00 samples/sec Loss 1.8138 LearningRate 0.000203 Epoch: 23 Global Step: 493360 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:23:36,720-Speed 2497.96 samples/sec Loss 1.8230 LearningRate 0.000203 Epoch: 23 Global Step: 493370 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:23:44,927-Speed 2495.88 samples/sec Loss 1.7795 LearningRate 0.000203 Epoch: 23 Global Step: 493380 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:23:53,069-Speed 2515.48 samples/sec Loss 1.8304 LearningRate 0.000203 Epoch: 23 Global Step: 493390 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-07-10 07:24:01,272-Speed 2497.09 samples/sec Loss 1.7783 LearningRate 0.000203 Epoch: 23 Global Step: 493400 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:24:09,472-Speed 2498.30 samples/sec Loss 1.7522 LearningRate 0.000203 Epoch: 23 Global Step: 493410 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:24:17,673-Speed 2497.79 samples/sec Loss 1.7913 LearningRate 0.000203 Epoch: 23 Global Step: 493420 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:24:25,876-Speed 2496.99 samples/sec Loss 1.8424 LearningRate 0.000203 Epoch: 23 Global Step: 493430 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:24:34,075-Speed 2498.26 samples/sec Loss 1.7815 LearningRate 0.000203 Epoch: 23 Global Step: 493440 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:24:42,227-Speed 2512.80 samples/sec Loss 1.8096 LearningRate 0.000203 Epoch: 23 Global Step: 493450 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:24:50,428-Speed 2497.61 samples/sec Loss 1.8310 LearningRate 0.000203 Epoch: 23 Global Step: 493460 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:24:58,632-Speed 2496.75 samples/sec Loss 1.8220 LearningRate 0.000203 Epoch: 23 Global Step: 493470 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:25:06,830-Speed 2498.60 samples/sec Loss 1.7842 LearningRate 0.000203 Epoch: 23 Global Step: 493480 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:25:15,032-Speed 2497.70 samples/sec Loss 1.7968 LearningRate 0.000203 Epoch: 23 Global Step: 493490 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:25:23,228-Speed 2499.04 samples/sec Loss 1.8174 LearningRate 0.000203 Epoch: 23 Global Step: 493500 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:25:31,371-Speed 2515.29 samples/sec Loss 1.8327 LearningRate 0.000203 Epoch: 23 Global Step: 493510 Fp16 Grad Scale: 16384 Required: 77 hours 
Training: 2022-07-10 07:25:39,590-Speed 2492.28 samples/sec Loss 1.8553 LearningRate 0.000203 Epoch: 23 Global Step: 493520 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:25:47,787-Speed 2498.81 samples/sec Loss 1.8364 LearningRate 0.000203 Epoch: 23 Global Step: 493530 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:25:55,992-Speed 2496.46 samples/sec Loss 1.8352 LearningRate 0.000203 Epoch: 23 Global Step: 493540 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:26:04,197-Speed 2496.47 samples/sec Loss 1.8668 LearningRate 0.000203 Epoch: 23 Global Step: 493550 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:26:12,408-Speed 2494.77 samples/sec Loss 1.8663 LearningRate 0.000203 Epoch: 23 Global Step: 493560 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:26:20,554-Speed 2514.28 samples/sec Loss 1.8112 LearningRate 0.000203 Epoch: 23 Global Step: 493570 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:26:28,753-Speed 2498.68 samples/sec Loss 1.7801 LearningRate 0.000203 Epoch: 23 Global Step: 493580 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:26:36,951-Speed 2498.45 samples/sec Loss 1.7844 LearningRate 0.000202 Epoch: 23 Global Step: 493590 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:26:45,151-Speed 2498.18 samples/sec Loss 1.7942 LearningRate 0.000202 Epoch: 23 Global Step: 493600 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:26:53,351-Speed 2498.04 samples/sec Loss 1.8185 LearningRate 0.000202 Epoch: 23 Global Step: 493610 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:27:01,551-Speed 2498.06 samples/sec Loss 1.8241 LearningRate 0.000202 Epoch: 23 Global Step: 493620 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:27:09,700-Speed 2513.28 samples/sec Loss 1.8359 LearningRate 0.000202 Epoch: 23 Global Step: 493630 Fp16 Grad Scale: 32768 Required: 77 hours 
Training: 2022-07-10 07:27:17,901-Speed 2498.01 samples/sec Loss 1.8051 LearningRate 0.000202 Epoch: 23 Global Step: 493640 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:27:26,099-Speed 2498.53 samples/sec Loss 1.8221 LearningRate 0.000202 Epoch: 23 Global Step: 493650 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:27:34,300-Speed 2497.92 samples/sec Loss 1.8116 LearningRate 0.000202 Epoch: 23 Global Step: 493660 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:27:42,502-Speed 2497.16 samples/sec Loss 1.7695 LearningRate 0.000202 Epoch: 23 Global Step: 493670 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:27:50,703-Speed 2497.75 samples/sec Loss 1.7965 LearningRate 0.000202 Epoch: 23 Global Step: 493680 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:27:58,848-Speed 2514.67 samples/sec Loss 1.7746 LearningRate 0.000202 Epoch: 23 Global Step: 493690 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:28:07,047-Speed 2498.48 samples/sec Loss 1.8278 LearningRate 0.000202 Epoch: 23 Global Step: 493700 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:28:15,251-Speed 2496.93 samples/sec Loss 1.8065 LearningRate 0.000202 Epoch: 23 Global Step: 493710 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:28:23,450-Speed 2498.37 samples/sec Loss 1.8205 LearningRate 0.000202 Epoch: 23 Global Step: 493720 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:28:31,651-Speed 2497.84 samples/sec Loss 1.8177 LearningRate 0.000202 Epoch: 23 Global Step: 493730 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:28:39,851-Speed 2497.75 samples/sec Loss 1.8091 LearningRate 0.000202 Epoch: 23 Global Step: 493740 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:28:48,013-Speed 2509.41 samples/sec Loss 1.8121 LearningRate 0.000202 Epoch: 23 Global Step: 493750 Fp16 Grad Scale: 32768 Required: 77 hours 
Training: 2022-07-10 07:28:56,219-Speed 2496.27 samples/sec Loss 1.7798 LearningRate 0.000202 Epoch: 23 Global Step: 493760 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:29:04,419-Speed 2498.19 samples/sec Loss 1.8126 LearningRate 0.000202 Epoch: 23 Global Step: 493770 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:29:12,617-Speed 2498.35 samples/sec Loss 1.7509 LearningRate 0.000202 Epoch: 23 Global Step: 493780 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:29:20,815-Speed 2498.86 samples/sec Loss 1.7789 LearningRate 0.000202 Epoch: 23 Global Step: 493790 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:29:29,021-Speed 2496.07 samples/sec Loss 1.7744 LearningRate 0.000202 Epoch: 23 Global Step: 493800 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:29:37,169-Speed 2513.66 samples/sec Loss 1.7763 LearningRate 0.000202 Epoch: 23 Global Step: 493810 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:29:45,368-Speed 2498.32 samples/sec Loss 1.7486 LearningRate 0.000202 Epoch: 23 Global Step: 493820 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:29:53,568-Speed 2497.86 samples/sec Loss 1.7900 LearningRate 0.000202 Epoch: 23 Global Step: 493830 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:30:01,768-Speed 2498.39 samples/sec Loss 1.8042 LearningRate 0.000202 Epoch: 23 Global Step: 493840 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:30:09,970-Speed 2497.26 samples/sec Loss 1.7424 LearningRate 0.000202 Epoch: 23 Global Step: 493850 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:30:18,172-Speed 2497.67 samples/sec Loss 1.7781 LearningRate 0.000202 Epoch: 23 Global Step: 493860 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:30:26,331-Speed 2510.56 samples/sec Loss 1.7839 LearningRate 0.000202 Epoch: 23 Global Step: 493870 Fp16 Grad Scale: 32768 Required: 77 hours 
Training: 2022-07-10 07:30:34,528-Speed 2498.76 samples/sec Loss 1.7743 LearningRate 0.000202 Epoch: 23 Global Step: 493880 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:30:42,730-Speed 2497.51 samples/sec Loss 1.7637 LearningRate 0.000202 Epoch: 23 Global Step: 493890 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:30:50,928-Speed 2498.42 samples/sec Loss 1.7898 LearningRate 0.000202 Epoch: 23 Global Step: 493900 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:30:59,126-Speed 2498.62 samples/sec Loss 1.7797 LearningRate 0.000202 Epoch: 23 Global Step: 493910 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:31:07,325-Speed 2498.20 samples/sec Loss 1.7717 LearningRate 0.000202 Epoch: 23 Global Step: 493920 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:31:15,473-Speed 2513.88 samples/sec Loss 1.7944 LearningRate 0.000202 Epoch: 23 Global Step: 493930 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:31:23,672-Speed 2498.41 samples/sec Loss 1.7866 LearningRate 0.000202 Epoch: 23 Global Step: 493940 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:31:31,870-Speed 2498.63 samples/sec Loss 1.7637 LearningRate 0.000202 Epoch: 23 Global Step: 493950 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:31:40,071-Speed 2497.69 samples/sec Loss 1.7689 LearningRate 0.000202 Epoch: 23 Global Step: 493960 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:31:48,279-Speed 2495.32 samples/sec Loss 1.7487 LearningRate 0.000202 Epoch: 23 Global Step: 493970 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:31:56,486-Speed 2495.92 samples/sec Loss 1.7638 LearningRate 0.000202 Epoch: 23 Global Step: 493980 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:32:04,632-Speed 2514.34 samples/sec Loss 1.7406 LearningRate 0.000202 Epoch: 23 Global Step: 493990 Fp16 Grad Scale: 32768 Required: 77 hours 
Training: 2022-07-10 07:32:12,835-Speed 2497.18 samples/sec Loss 1.8427 LearningRate 0.000202 Epoch: 23 Global Step: 494000 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:32:21,045-Speed 2495.04 samples/sec Loss 1.7873 LearningRate 0.000202 Epoch: 23 Global Step: 494010 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:32:29,245-Speed 2498.07 samples/sec Loss 1.7672 LearningRate 0.000202 Epoch: 23 Global Step: 494020 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:32:37,445-Speed 2497.97 samples/sec Loss 1.7551 LearningRate 0.000202 Epoch: 23 Global Step: 494030 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:32:45,647-Speed 2497.44 samples/sec Loss 1.7632 LearningRate 0.000202 Epoch: 23 Global Step: 494040 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:32:53,794-Speed 2514.01 samples/sec Loss 1.8181 LearningRate 0.000202 Epoch: 23 Global Step: 494050 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:33:02,000-Speed 2496.15 samples/sec Loss 1.8191 LearningRate 0.000202 Epoch: 23 Global Step: 494060 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:33:10,204-Speed 2496.98 samples/sec Loss 1.7892 LearningRate 0.000202 Epoch: 23 Global Step: 494070 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:33:18,406-Speed 2497.57 samples/sec Loss 1.8317 LearningRate 0.000202 Epoch: 23 Global Step: 494080 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:33:26,611-Speed 2496.48 samples/sec Loss 1.8587 LearningRate 0.000202 Epoch: 23 Global Step: 494090 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:33:34,812-Speed 2497.88 samples/sec Loss 1.8024 LearningRate 0.000202 Epoch: 23 Global Step: 494100 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:33:42,966-Speed 2512.10 samples/sec Loss 1.8346 LearningRate 0.000202 Epoch: 23 Global Step: 494110 Fp16 Grad Scale: 32768 Required: 77 hours 
Training: 2022-07-10 07:33:51,167-Speed 2497.50 samples/sec Loss 1.7640 LearningRate 0.000202 Epoch: 23 Global Step: 494120 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:33:59,367-Speed 2497.85 samples/sec Loss 1.8122 LearningRate 0.000202 Epoch: 23 Global Step: 494130 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:34:07,572-Speed 2496.46 samples/sec Loss 1.8417 LearningRate 0.000202 Epoch: 23 Global Step: 494140 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:34:15,777-Speed 2496.56 samples/sec Loss 1.7822 LearningRate 0.000202 Epoch: 23 Global Step: 494150 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:34:23,982-Speed 2496.30 samples/sec Loss 1.8283 LearningRate 0.000202 Epoch: 23 Global Step: 494160 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:34:32,131-Speed 2513.61 samples/sec Loss 1.8262 LearningRate 0.000202 Epoch: 23 Global Step: 494170 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:34:40,339-Speed 2495.66 samples/sec Loss 1.7990 LearningRate 0.000202 Epoch: 23 Global Step: 494180 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:34:48,548-Speed 2495.25 samples/sec Loss 1.8013 LearningRate 0.000202 Epoch: 23 Global Step: 494190 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:34:56,750-Speed 2497.27 samples/sec Loss 1.7895 LearningRate 0.000202 Epoch: 23 Global Step: 494200 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:35:04,952-Speed 2497.53 samples/sec Loss 1.7745 LearningRate 0.000202 Epoch: 23 Global Step: 494210 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:35:13,155-Speed 2497.36 samples/sec Loss 1.8256 LearningRate 0.000202 Epoch: 23 Global Step: 494220 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:35:21,302-Speed 2514.34 samples/sec Loss 1.8445 LearningRate 0.000202 Epoch: 23 Global Step: 494230 Fp16 Grad Scale: 32768 Required: 77 hours 
Training: 2022-07-10 07:35:29,505-Speed 2496.94 samples/sec Loss 1.8043 LearningRate 0.000202 Epoch: 23 Global Step: 494240 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:35:37,704-Speed 2498.49 samples/sec Loss 1.7494 LearningRate 0.000202 Epoch: 23 Global Step: 494250 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:35:45,904-Speed 2497.72 samples/sec Loss 1.7437 LearningRate 0.000202 Epoch: 23 Global Step: 494260 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:35:54,100-Speed 2499.31 samples/sec Loss 1.7902 LearningRate 0.000202 Epoch: 23 Global Step: 494270 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:36:02,299-Speed 2498.12 samples/sec Loss 1.7939 LearningRate 0.000202 Epoch: 23 Global Step: 494280 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:36:10,449-Speed 2513.22 samples/sec Loss 1.7692 LearningRate 0.000202 Epoch: 23 Global Step: 494290 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:36:18,653-Speed 2497.00 samples/sec Loss 1.7981 LearningRate 0.000202 Epoch: 23 Global Step: 494300 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:36:26,853-Speed 2497.79 samples/sec Loss 1.7598 LearningRate 0.000202 Epoch: 23 Global Step: 494310 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:36:35,052-Speed 2498.30 samples/sec Loss 1.8032 LearningRate 0.000202 Epoch: 23 Global Step: 494320 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:36:43,249-Speed 2499.07 samples/sec Loss 1.7765 LearningRate 0.000202 Epoch: 23 Global Step: 494330 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:36:51,448-Speed 2498.17 samples/sec Loss 1.8324 LearningRate 0.000202 Epoch: 23 Global Step: 494340 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:36:59,596-Speed 2513.94 samples/sec Loss 1.8193 LearningRate 0.000202 Epoch: 23 Global Step: 494350 Fp16 Grad Scale: 32768 Required: 77 hours 
Training: 2022-07-10 07:37:07,813-Speed 2492.92 samples/sec Loss 1.8281 LearningRate 0.000202 Epoch: 23 Global Step: 494360 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:37:16,015-Speed 2497.18 samples/sec Loss 1.8080 LearningRate 0.000202 Epoch: 23 Global Step: 494370 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:37:24,221-Speed 2496.22 samples/sec Loss 1.7871 LearningRate 0.000202 Epoch: 23 Global Step: 494380 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:37:32,454-Speed 2487.95 samples/sec Loss 1.7738 LearningRate 0.000202 Epoch: 23 Global Step: 494390 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:37:40,659-Speed 2496.32 samples/sec Loss 1.8281 LearningRate 0.000202 Epoch: 23 Global Step: 494400 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:37:48,808-Speed 2513.49 samples/sec Loss 1.7910 LearningRate 0.000202 Epoch: 23 Global Step: 494410 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:37:57,013-Speed 2496.63 samples/sec Loss 1.8133 LearningRate 0.000201 Epoch: 23 Global Step: 494420 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:38:05,213-Speed 2497.80 samples/sec Loss 1.7982 LearningRate 0.000201 Epoch: 23 Global Step: 494430 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:38:13,413-Speed 2498.29 samples/sec Loss 1.7766 LearningRate 0.000201 Epoch: 23 Global Step: 494440 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:38:21,612-Speed 2498.26 samples/sec Loss 1.7706 LearningRate 0.000201 Epoch: 23 Global Step: 494450 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:38:29,810-Speed 2498.40 samples/sec Loss 1.7869 LearningRate 0.000201 Epoch: 23 Global Step: 494460 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:38:37,956-Speed 2514.34 samples/sec Loss 1.7838 LearningRate 0.000201 Epoch: 23 Global Step: 494470 Fp16 Grad Scale: 32768 Required: 77 hours 
Training: 2022-07-10 07:38:46,159-Speed 2497.15 samples/sec Loss 1.7214 LearningRate 0.000201 Epoch: 23 Global Step: 494480 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:38:54,358-Speed 2498.29 samples/sec Loss 1.8003 LearningRate 0.000201 Epoch: 23 Global Step: 494490 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:39:02,558-Speed 2498.00 samples/sec Loss 1.8162 LearningRate 0.000201 Epoch: 23 Global Step: 494500 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:39:10,770-Speed 2494.33 samples/sec Loss 1.7889 LearningRate 0.000201 Epoch: 23 Global Step: 494510 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:39:18,973-Speed 2497.14 samples/sec Loss 1.7782 LearningRate 0.000201 Epoch: 23 Global Step: 494520 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:39:27,121-Speed 2513.83 samples/sec Loss 1.8073 LearningRate 0.000201 Epoch: 23 Global Step: 494530 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:39:35,325-Speed 2496.70 samples/sec Loss 1.8111 LearningRate 0.000201 Epoch: 23 Global Step: 494540 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:39:43,525-Speed 2497.83 samples/sec Loss 1.7844 LearningRate 0.000201 Epoch: 23 Global Step: 494550 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:39:51,729-Speed 2496.66 samples/sec Loss 1.7753 LearningRate 0.000201 Epoch: 23 Global Step: 494560 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:39:59,930-Speed 2497.72 samples/sec Loss 1.8101 LearningRate 0.000201 Epoch: 23 Global Step: 494570 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:40:08,133-Speed 2496.97 samples/sec Loss 1.7681 LearningRate 0.000201 Epoch: 23 Global Step: 494580 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:40:16,280-Speed 2514.14 samples/sec Loss 1.7860 LearningRate 0.000201 Epoch: 23 Global Step: 494590 Fp16 Grad Scale: 32768 Required: 77 hours 
Training: 2022-07-10 07:40:24,481-Speed 2497.74 samples/sec Loss 1.7651 LearningRate 0.000201 Epoch: 23 Global Step: 494600 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:40:32,695-Speed 2493.71 samples/sec Loss 1.8224 LearningRate 0.000201 Epoch: 23 Global Step: 494610 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:40:40,897-Speed 2497.56 samples/sec Loss 1.8084 LearningRate 0.000201 Epoch: 23 Global Step: 494620 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:40:49,098-Speed 2497.45 samples/sec Loss 1.8225 LearningRate 0.000201 Epoch: 23 Global Step: 494630 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:40:57,299-Speed 2497.85 samples/sec Loss 1.7826 LearningRate 0.000201 Epoch: 23 Global Step: 494640 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:41:05,446-Speed 2514.24 samples/sec Loss 1.7761 LearningRate 0.000201 Epoch: 23 Global Step: 494650 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:41:13,646-Speed 2498.37 samples/sec Loss 1.7627 LearningRate 0.000201 Epoch: 23 Global Step: 494660 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:41:21,848-Speed 2497.16 samples/sec Loss 1.7957 LearningRate 0.000201 Epoch: 23 Global Step: 494670 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:41:30,046-Speed 2498.25 samples/sec Loss 1.8000 LearningRate 0.000201 Epoch: 23 Global Step: 494680 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:41:38,248-Speed 2497.45 samples/sec Loss 1.8146 LearningRate 0.000201 Epoch: 23 Global Step: 494690 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:41:46,448-Speed 2497.93 samples/sec Loss 1.8273 LearningRate 0.000201 Epoch: 23 Global Step: 494700 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:41:54,598-Speed 2513.48 samples/sec Loss 1.7254 LearningRate 0.000201 Epoch: 23 Global Step: 494710 Fp16 Grad Scale: 32768 Required: 77 hours 
Training: 2022-07-10 07:42:02,811-Speed 2493.82 samples/sec Loss 1.7597 LearningRate 0.000201 Epoch: 23 Global Step: 494720 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:42:11,008-Speed 2498.85 samples/sec Loss 1.7602 LearningRate 0.000201 Epoch: 23 Global Step: 494730 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:42:19,212-Speed 2496.82 samples/sec Loss 1.7647 LearningRate 0.000201 Epoch: 23 Global Step: 494740 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:42:27,414-Speed 2497.19 samples/sec Loss 1.8227 LearningRate 0.000201 Epoch: 23 Global Step: 494750 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:42:35,613-Speed 2498.50 samples/sec Loss 1.7952 LearningRate 0.000201 Epoch: 23 Global Step: 494760 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:42:43,760-Speed 2514.23 samples/sec Loss 1.7983 LearningRate 0.000201 Epoch: 23 Global Step: 494770 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:42:51,964-Speed 2496.89 samples/sec Loss 1.8311 LearningRate 0.000201 Epoch: 23 Global Step: 494780 Fp16 Grad Scale: 32768 Required: 77 hours Training: 2022-07-10 07:43:00,119-Speed 2511.72 samples/sec Loss 1.7805 LearningRate 0.000201 Epoch: 23 Global Step: 494790 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:43:08,337-Speed 2492.63 samples/sec Loss 1.8202 LearningRate 0.000201 Epoch: 23 Global Step: 494800 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:43:16,537-Speed 2497.89 samples/sec Loss 1.7837 LearningRate 0.000201 Epoch: 23 Global Step: 494810 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:43:24,736-Speed 2498.30 samples/sec Loss 1.8298 LearningRate 0.000201 Epoch: 23 Global Step: 494820 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:43:32,886-Speed 2513.37 samples/sec Loss 1.8062 LearningRate 0.000201 Epoch: 23 Global Step: 494830 Fp16 Grad Scale: 16384 Required: 77 hours 
Training: 2022-07-10 07:43:41,085-Speed 2500.00 samples/sec Loss 1.7828 LearningRate 0.000201 Epoch: 23 Global Step: 494840 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:43:49,318-Speed 2496.41 samples/sec Loss 1.7623 LearningRate 0.000201 Epoch: 23 Global Step: 494850 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:43:57,535-Speed 2492.59 samples/sec Loss 1.8382 LearningRate 0.000201 Epoch: 23 Global Step: 494860 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:44:06,008-Speed 2497.75 samples/sec Loss 1.7918 LearningRate 0.000201 Epoch: 23 Global Step: 494870 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:44:14,268-Speed 2499.41 samples/sec Loss 1.7671 LearningRate 0.000201 Epoch: 23 Global Step: 494880 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:44:22,461-Speed 2515.33 samples/sec Loss 1.7784 LearningRate 0.000201 Epoch: 23 Global Step: 494890 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:44:34,805-Speed 1659.31 samples/sec Loss 1.7315 LearningRate 0.000201 Epoch: 23 Global Step: 494900 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:44:43,051-Speed 2498.30 samples/sec Loss 1.7496 LearningRate 0.000201 Epoch: 23 Global Step: 494910 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:44:51,292-Speed 2499.95 samples/sec Loss 1.7568 LearningRate 0.000201 Epoch: 23 Global Step: 494920 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:44:59,487-Speed 2499.23 samples/sec Loss 1.7426 LearningRate 0.000201 Epoch: 23 Global Step: 494930 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:45:13,747-Speed 2241.20 samples/sec Loss 1.8138 LearningRate 0.000201 Epoch: 23 Global Step: 494940 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:45:21,913-Speed 2517.64 samples/sec Loss 1.7737 LearningRate 0.000201 Epoch: 23 Global Step: 494950 Fp16 Grad Scale: 16384 Required: 77 hours 
Training: 2022-07-10 07:45:34,544-Speed 1627.79 samples/sec Loss 1.7634 LearningRate 0.000201 Epoch: 23 Global Step: 494960 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:45:44,401-Speed 2502.57 samples/sec Loss 1.7390 LearningRate 0.000201 Epoch: 23 Global Step: 494970 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:45:52,641-Speed 2501.21 samples/sec Loss 1.8082 LearningRate 0.000201 Epoch: 23 Global Step: 494980 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:46:03,436-Speed 1897.48 samples/sec Loss 1.7425 LearningRate 0.000201 Epoch: 23 Global Step: 494990 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:46:11,672-Speed 2498.42 samples/sec Loss 1.7543 LearningRate 0.000201 Epoch: 23 Global Step: 495000 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:46:19,837-Speed 2514.97 samples/sec Loss 1.7876 LearningRate 0.000201 Epoch: 23 Global Step: 495010 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:46:33,388-Speed 1544.14 samples/sec Loss 1.7777 LearningRate 0.000201 Epoch: 23 Global Step: 495020 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:46:41,596-Speed 2495.23 samples/sec Loss 1.7864 LearningRate 0.000201 Epoch: 23 Global Step: 495030 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:46:53,475-Speed 1728.32 samples/sec Loss 1.7373 LearningRate 0.000201 Epoch: 23 Global Step: 495040 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:47:02,367-Speed 2303.70 samples/sec Loss 1.7655 LearningRate 0.000201 Epoch: 23 Global Step: 495050 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:47:10,759-Speed 2440.61 samples/sec Loss 1.7894 LearningRate 0.000201 Epoch: 23 Global Step: 495060 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:47:18,911-Speed 2512.73 samples/sec Loss 1.7913 LearningRate 0.000201 Epoch: 23 Global Step: 495070 Fp16 Grad Scale: 16384 Required: 77 hours 
Training: 2022-07-10 07:47:27,114-Speed 2497.04 samples/sec Loss 1.7664 LearningRate 0.000201 Epoch: 23 Global Step: 495080 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:47:35,321-Speed 2495.74 samples/sec Loss 1.8189 LearningRate 0.000201 Epoch: 23 Global Step: 495090 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:47:43,528-Speed 2495.70 samples/sec Loss 1.7766 LearningRate 0.000201 Epoch: 23 Global Step: 495100 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:47:51,736-Speed 2495.58 samples/sec Loss 1.7776 LearningRate 0.000201 Epoch: 23 Global Step: 495110 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:47:59,939-Speed 2497.23 samples/sec Loss 1.7249 LearningRate 0.000201 Epoch: 23 Global Step: 495120 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:48:08,089-Speed 2513.17 samples/sec Loss 1.7478 LearningRate 0.000201 Epoch: 23 Global Step: 495130 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:48:16,298-Speed 2495.20 samples/sec Loss 1.7760 LearningRate 0.000201 Epoch: 23 Global Step: 495140 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:48:24,505-Speed 2496.18 samples/sec Loss 1.7706 LearningRate 0.000201 Epoch: 23 Global Step: 495150 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:48:32,714-Speed 2495.25 samples/sec Loss 1.7781 LearningRate 0.000201 Epoch: 23 Global Step: 495160 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:48:40,918-Speed 2496.74 samples/sec Loss 1.7780 LearningRate 0.000201 Epoch: 23 Global Step: 495170 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:48:49,128-Speed 2494.93 samples/sec Loss 1.7648 LearningRate 0.000201 Epoch: 23 Global Step: 495180 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:48:57,283-Speed 2511.65 samples/sec Loss 1.7528 LearningRate 0.000201 Epoch: 23 Global Step: 495190 Fp16 Grad Scale: 16384 Required: 77 hours 
Training: 2022-07-10 07:49:05,492-Speed 2495.41 samples/sec Loss 1.7741 LearningRate 0.000201 Epoch: 23 Global Step: 495200 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:49:13,699-Speed 2495.61 samples/sec Loss 1.7647 LearningRate 0.000201 Epoch: 23 Global Step: 495210 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:49:21,915-Speed 2493.30 samples/sec Loss 1.7611 LearningRate 0.000201 Epoch: 23 Global Step: 495220 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:49:30,122-Speed 2495.87 samples/sec Loss 1.7640 LearningRate 0.000201 Epoch: 23 Global Step: 495230 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:49:38,328-Speed 2496.38 samples/sec Loss 1.7574 LearningRate 0.000201 Epoch: 23 Global Step: 495240 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:49:46,474-Speed 2514.34 samples/sec Loss 1.7938 LearningRate 0.000201 Epoch: 23 Global Step: 495250 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:49:54,678-Speed 2497.00 samples/sec Loss 1.7555 LearningRate 0.000200 Epoch: 23 Global Step: 495260 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:50:02,881-Speed 2497.00 samples/sec Loss 1.8007 LearningRate 0.000200 Epoch: 23 Global Step: 495270 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:50:11,080-Speed 2498.36 samples/sec Loss 1.7184 LearningRate 0.000200 Epoch: 23 Global Step: 495280 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:50:19,278-Speed 2498.57 samples/sec Loss 1.7450 LearningRate 0.000200 Epoch: 23 Global Step: 495290 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:50:27,493-Speed 2493.48 samples/sec Loss 1.7830 LearningRate 0.000200 Epoch: 23 Global Step: 495300 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:50:35,642-Speed 2513.83 samples/sec Loss 1.7766 LearningRate 0.000200 Epoch: 23 Global Step: 495310 Fp16 Grad Scale: 16384 Required: 77 hours 
Training: 2022-07-10 07:50:43,844-Speed 2497.46 samples/sec Loss 1.7727 LearningRate 0.000200 Epoch: 23 Global Step: 495320 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:50:52,043-Speed 2498.17 samples/sec Loss 1.8226 LearningRate 0.000200 Epoch: 23 Global Step: 495330 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:51:00,247-Speed 2497.20 samples/sec Loss 1.7367 LearningRate 0.000200 Epoch: 23 Global Step: 495340 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:51:08,455-Speed 2495.41 samples/sec Loss 1.7802 LearningRate 0.000200 Epoch: 23 Global Step: 495350 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:51:16,657-Speed 2497.18 samples/sec Loss 1.8093 LearningRate 0.000200 Epoch: 23 Global Step: 495360 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:51:24,817-Speed 2510.59 samples/sec Loss 1.7588 LearningRate 0.000200 Epoch: 23 Global Step: 495370 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:51:33,033-Speed 2493.26 samples/sec Loss 1.7830 LearningRate 0.000200 Epoch: 23 Global Step: 495380 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:51:41,237-Speed 2496.62 samples/sec Loss 1.8019 LearningRate 0.000200 Epoch: 23 Global Step: 495390 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:51:49,439-Speed 2497.34 samples/sec Loss 1.7969 LearningRate 0.000200 Epoch: 23 Global Step: 495400 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:51:57,645-Speed 2496.23 samples/sec Loss 1.8037 LearningRate 0.000200 Epoch: 23 Global Step: 495410 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:52:05,853-Speed 2495.48 samples/sec Loss 1.7930 LearningRate 0.000200 Epoch: 23 Global Step: 495420 Fp16 Grad Scale: 16384 Required: 77 hours Training: 2022-07-10 07:52:14,000-Speed 2514.05 samples/sec Loss 1.7778 LearningRate 0.000200 Epoch: 23 Global Step: 495430 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 07:52:22,221-Speed 2491.47 samples/sec Loss 1.8473 LearningRate 0.000200 Epoch: 23 Global Step: 495440 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:52:30,424-Speed 2497.33 samples/sec Loss 1.8081 LearningRate 0.000200 Epoch: 23 Global Step: 495450 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:52:38,623-Speed 2498.07 samples/sec Loss 1.7684 LearningRate 0.000200 Epoch: 23 Global Step: 495460 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:52:46,825-Speed 2497.46 samples/sec Loss 1.7599 LearningRate 0.000200 Epoch: 23 Global Step: 495470 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:52:55,028-Speed 2497.03 samples/sec Loss 1.7681 LearningRate 0.000200 Epoch: 23 Global Step: 495480 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:53:03,174-Speed 2514.63 samples/sec Loss 1.7874 LearningRate 0.000200 Epoch: 23 Global Step: 495490 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:53:11,378-Speed 2496.76 samples/sec Loss 1.7761 LearningRate 0.000200 Epoch: 23 Global Step: 495500 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:53:19,580-Speed 2497.27 samples/sec Loss 1.8090 LearningRate 0.000200 Epoch: 23 Global Step: 495510 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:53:27,783-Speed 2497.26 samples/sec Loss 1.7706 LearningRate 0.000200 Epoch: 23 Global Step: 495520 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:53:35,986-Speed 2497.06 samples/sec Loss 1.8008 LearningRate 0.000200 Epoch: 23 Global Step: 495530 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:53:44,189-Speed 2497.12 samples/sec Loss 1.7727 LearningRate 0.000200 Epoch: 23 Global Step: 495540 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:53:52,343-Speed 2511.81 samples/sec Loss 1.7738 LearningRate 0.000200 Epoch: 23 Global Step: 495550 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 07:54:00,548-Speed 2496.50 samples/sec Loss 1.8254 LearningRate 0.000200 Epoch: 23 Global Step: 495560 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:54:08,749-Speed 2497.60 samples/sec Loss 1.7314 LearningRate 0.000200 Epoch: 23 Global Step: 495570 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:54:16,952-Speed 2497.13 samples/sec Loss 1.7428 LearningRate 0.000200 Epoch: 23 Global Step: 495580 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:54:25,157-Speed 2496.22 samples/sec Loss 1.7902 LearningRate 0.000200 Epoch: 23 Global Step: 495590 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:54:33,367-Speed 2495.03 samples/sec Loss 1.7375 LearningRate 0.000200 Epoch: 23 Global Step: 495600 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:54:41,514-Speed 2514.38 samples/sec Loss 1.7464 LearningRate 0.000200 Epoch: 23 Global Step: 495610 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:54:49,713-Speed 2498.48 samples/sec Loss 1.8092 LearningRate 0.000200 Epoch: 23 Global Step: 495620 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:54:57,918-Speed 2496.38 samples/sec Loss 1.7760 LearningRate 0.000200 Epoch: 23 Global Step: 495630 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:55:06,120-Speed 2497.28 samples/sec Loss 1.7610 LearningRate 0.000200 Epoch: 23 Global Step: 495640 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:55:14,326-Speed 2496.35 samples/sec Loss 1.8019 LearningRate 0.000200 Epoch: 23 Global Step: 495650 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:55:22,528-Speed 2497.26 samples/sec Loss 1.8423 LearningRate 0.000200 Epoch: 23 Global Step: 495660 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:55:30,677-Speed 2513.55 samples/sec Loss 1.7656 LearningRate 0.000200 Epoch: 23 Global Step: 495670 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 07:55:38,881-Speed 2497.09 samples/sec Loss 1.7707 LearningRate 0.000200 Epoch: 23 Global Step: 495680 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:55:47,084-Speed 2497.07 samples/sec Loss 1.7356 LearningRate 0.000200 Epoch: 23 Global Step: 495690 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 07:55:55,242-Speed 2510.84 samples/sec Loss 1.7961 LearningRate 0.000200 Epoch: 23 Global Step: 495700 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:56:03,454-Speed 2494.80 samples/sec Loss 1.7932 LearningRate 0.000200 Epoch: 23 Global Step: 495710 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:56:11,655-Speed 2497.47 samples/sec Loss 1.7685 LearningRate 0.000200 Epoch: 23 Global Step: 495720 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:56:19,828-Speed 2506.37 samples/sec Loss 1.8112 LearningRate 0.000200 Epoch: 23 Global Step: 495730 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:56:28,027-Speed 2498.13 samples/sec Loss 1.7569 LearningRate 0.000200 Epoch: 23 Global Step: 495740 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:56:36,232-Speed 2496.28 samples/sec Loss 1.7809 LearningRate 0.000200 Epoch: 23 Global Step: 495750 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:56:44,437-Speed 2496.91 samples/sec Loss 1.7770 LearningRate 0.000200 Epoch: 23 Global Step: 495760 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:56:52,636-Speed 2498.30 samples/sec Loss 1.7692 LearningRate 0.000200 Epoch: 23 Global Step: 495770 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:57:00,834-Speed 2498.36 samples/sec Loss 1.7584 LearningRate 0.000200 Epoch: 23 Global Step: 495780 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:57:08,984-Speed 2513.23 samples/sec Loss 1.7610 LearningRate 0.000200 Epoch: 23 Global Step: 495790 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-07-10 07:57:17,188-Speed 2497.02 samples/sec Loss 1.7956 LearningRate 0.000200 Epoch: 23 Global Step: 495800 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:57:25,384-Speed 2499.10 samples/sec Loss 1.7875 LearningRate 0.000200 Epoch: 23 Global Step: 495810 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:57:33,586-Speed 2497.33 samples/sec Loss 1.7556 LearningRate 0.000200 Epoch: 23 Global Step: 495820 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:57:41,783-Speed 2498.78 samples/sec Loss 1.7538 LearningRate 0.000200 Epoch: 23 Global Step: 495830 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:57:49,983-Speed 2498.03 samples/sec Loss 1.8185 LearningRate 0.000200 Epoch: 23 Global Step: 495840 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:57:58,130-Speed 2514.23 samples/sec Loss 1.7560 LearningRate 0.000200 Epoch: 23 Global Step: 495850 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:58:06,332-Speed 2497.23 samples/sec Loss 1.7610 LearningRate 0.000200 Epoch: 23 Global Step: 495860 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:58:14,533-Speed 2497.93 samples/sec Loss 1.7489 LearningRate 0.000200 Epoch: 23 Global Step: 495870 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:58:22,732-Speed 2498.35 samples/sec Loss 1.7764 LearningRate 0.000200 Epoch: 23 Global Step: 495880 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:58:30,932-Speed 2497.93 samples/sec Loss 1.8041 LearningRate 0.000200 Epoch: 23 Global Step: 495890 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:58:39,130-Speed 2498.34 samples/sec Loss 1.7538 LearningRate 0.000200 Epoch: 23 Global Step: 495900 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:58:47,294-Speed 2516.27 samples/sec Loss 1.7230 LearningRate 0.000200 Epoch: 23 Global Step: 495910 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-07-10 07:58:55,497-Speed 2497.04 samples/sec Loss 1.7354 LearningRate 0.000200 Epoch: 23 Global Step: 495920 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:59:03,699-Speed 2497.26 samples/sec Loss 1.7796 LearningRate 0.000200 Epoch: 23 Global Step: 495930 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:59:11,897-Speed 2498.66 samples/sec Loss 1.7566 LearningRate 0.000200 Epoch: 23 Global Step: 495940 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:59:20,100-Speed 2497.06 samples/sec Loss 1.7567 LearningRate 0.000200 Epoch: 23 Global Step: 495950 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:59:28,313-Speed 2494.17 samples/sec Loss 1.7057 LearningRate 0.000200 Epoch: 23 Global Step: 495960 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:59:36,459-Speed 2514.45 samples/sec Loss 1.7297 LearningRate 0.000200 Epoch: 23 Global Step: 495970 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:59:44,659-Speed 2497.91 samples/sec Loss 1.7457 LearningRate 0.000200 Epoch: 23 Global Step: 495980 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 07:59:52,858-Speed 2498.40 samples/sec Loss 1.7519 LearningRate 0.000200 Epoch: 23 Global Step: 495990 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:00:01,057-Speed 2498.31 samples/sec Loss 1.7628 LearningRate 0.000200 Epoch: 23 Global Step: 496000 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:00:09,266-Speed 2495.28 samples/sec Loss 1.7767 LearningRate 0.000200 Epoch: 23 Global Step: 496010 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:00:17,469-Speed 2497.00 samples/sec Loss 1.7231 LearningRate 0.000200 Epoch: 23 Global Step: 496020 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:00:25,616-Speed 2514.31 samples/sec Loss 1.7788 LearningRate 0.000200 Epoch: 23 Global Step: 496030 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-07-10 08:00:33,814-Speed 2498.41 samples/sec Loss 1.7745 LearningRate 0.000200 Epoch: 23 Global Step: 496040 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:00:42,020-Speed 2496.38 samples/sec Loss 1.7654 LearningRate 0.000200 Epoch: 23 Global Step: 496050 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:00:50,218-Speed 2498.48 samples/sec Loss 1.7798 LearningRate 0.000200 Epoch: 23 Global Step: 496060 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:00:58,420-Speed 2497.58 samples/sec Loss 1.7377 LearningRate 0.000200 Epoch: 23 Global Step: 496070 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:01:06,624-Speed 2497.03 samples/sec Loss 1.7434 LearningRate 0.000200 Epoch: 23 Global Step: 496080 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:01:14,772-Speed 2514.04 samples/sec Loss 1.7345 LearningRate 0.000199 Epoch: 23 Global Step: 496090 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:01:22,973-Speed 2497.44 samples/sec Loss 1.7880 LearningRate 0.000199 Epoch: 23 Global Step: 496100 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:01:31,172-Speed 2498.44 samples/sec Loss 1.7662 LearningRate 0.000199 Epoch: 23 Global Step: 496110 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:01:39,374-Speed 2497.34 samples/sec Loss 1.7224 LearningRate 0.000199 Epoch: 23 Global Step: 496120 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:01:47,574-Speed 2497.82 samples/sec Loss 1.7434 LearningRate 0.000199 Epoch: 23 Global Step: 496130 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:01:55,776-Speed 2497.66 samples/sec Loss 1.7257 LearningRate 0.000199 Epoch: 23 Global Step: 496140 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:02:03,926-Speed 2513.11 samples/sec Loss 1.7848 LearningRate 0.000199 Epoch: 23 Global Step: 496150 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-07-10 08:02:12,136-Speed 2495.05 samples/sec Loss 1.7990 LearningRate 0.000199 Epoch: 23 Global Step: 496160 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:02:20,337-Speed 2497.54 samples/sec Loss 1.7814 LearningRate 0.000199 Epoch: 23 Global Step: 496170 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:02:28,538-Speed 2497.80 samples/sec Loss 1.7874 LearningRate 0.000199 Epoch: 23 Global Step: 496180 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:02:36,737-Speed 2498.27 samples/sec Loss 1.7774 LearningRate 0.000199 Epoch: 23 Global Step: 496190 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:02:44,953-Speed 2493.03 samples/sec Loss 1.8136 LearningRate 0.000199 Epoch: 23 Global Step: 496200 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:02:53,100-Speed 2514.22 samples/sec Loss 1.7765 LearningRate 0.000199 Epoch: 23 Global Step: 496210 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:03:01,300-Speed 2497.84 samples/sec Loss 1.7577 LearningRate 0.000199 Epoch: 23 Global Step: 496220 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:03:09,504-Speed 2497.02 samples/sec Loss 1.7404 LearningRate 0.000199 Epoch: 23 Global Step: 496230 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:03:17,714-Speed 2494.73 samples/sec Loss 1.7503 LearningRate 0.000199 Epoch: 23 Global Step: 496240 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:03:25,915-Speed 2497.98 samples/sec Loss 1.7320 LearningRate 0.000199 Epoch: 23 Global Step: 496250 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:03:34,112-Speed 2499.13 samples/sec Loss 1.7862 LearningRate 0.000199 Epoch: 23 Global Step: 496260 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:03:42,262-Speed 2513.08 samples/sec Loss 1.7620 LearningRate 0.000199 Epoch: 23 Global Step: 496270 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-07-10 08:03:50,459-Speed 2499.29 samples/sec Loss 1.7855 LearningRate 0.000199 Epoch: 23 Global Step: 496280 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:03:58,656-Speed 2498.97 samples/sec Loss 1.7764 LearningRate 0.000199 Epoch: 23 Global Step: 496290 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:04:06,854-Speed 2498.46 samples/sec Loss 1.7435 LearningRate 0.000199 Epoch: 23 Global Step: 496300 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:04:15,054-Speed 2498.23 samples/sec Loss 1.7671 LearningRate 0.000199 Epoch: 23 Global Step: 496310 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:04:23,255-Speed 2497.60 samples/sec Loss 1.7800 LearningRate 0.000199 Epoch: 23 Global Step: 496320 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:04:31,401-Speed 2514.77 samples/sec Loss 1.8143 LearningRate 0.000199 Epoch: 23 Global Step: 496330 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:04:39,600-Speed 2498.23 samples/sec Loss 1.7666 LearningRate 0.000199 Epoch: 23 Global Step: 496340 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:04:47,800-Speed 2497.95 samples/sec Loss 1.8239 LearningRate 0.000199 Epoch: 23 Global Step: 496350 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:04:55,998-Speed 2498.28 samples/sec Loss 1.8099 LearningRate 0.000199 Epoch: 23 Global Step: 496360 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:05:04,199-Speed 2497.85 samples/sec Loss 1.7751 LearningRate 0.000199 Epoch: 23 Global Step: 496370 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:05:12,397-Speed 2498.34 samples/sec Loss 1.7770 LearningRate 0.000199 Epoch: 23 Global Step: 496380 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:05:20,546-Speed 2513.55 samples/sec Loss 1.7186 LearningRate 0.000199 Epoch: 23 Global Step: 496390 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-07-10 08:05:28,747-Speed 2498.12 samples/sec Loss 1.7680 LearningRate 0.000199 Epoch: 23 Global Step: 496400 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:05:36,945-Speed 2498.49 samples/sec Loss 1.7745 LearningRate 0.000199 Epoch: 23 Global Step: 496410 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:05:45,147-Speed 2497.33 samples/sec Loss 1.7691 LearningRate 0.000199 Epoch: 23 Global Step: 496420 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:05:53,347-Speed 2498.05 samples/sec Loss 1.7345 LearningRate 0.000199 Epoch: 23 Global Step: 496430 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:06:01,550-Speed 2497.04 samples/sec Loss 1.7210 LearningRate 0.000199 Epoch: 23 Global Step: 496440 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:06:09,697-Speed 2514.12 samples/sec Loss 1.7567 LearningRate 0.000199 Epoch: 23 Global Step: 496450 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:06:17,898-Speed 2497.72 samples/sec Loss 1.7656 LearningRate 0.000199 Epoch: 23 Global Step: 496460 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:06:26,099-Speed 2497.79 samples/sec Loss 1.7797 LearningRate 0.000199 Epoch: 23 Global Step: 496470 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:06:34,301-Speed 2497.23 samples/sec Loss 1.7391 LearningRate 0.000199 Epoch: 23 Global Step: 496480 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:06:42,507-Speed 2496.32 samples/sec Loss 1.7660 LearningRate 0.000199 Epoch: 23 Global Step: 496490 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:06:50,713-Speed 2496.21 samples/sec Loss 1.7931 LearningRate 0.000199 Epoch: 23 Global Step: 496500 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:06:58,860-Speed 2513.90 samples/sec Loss 1.7217 LearningRate 0.000199 Epoch: 23 Global Step: 496510 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-07-10 08:07:07,065-Speed 2496.75 samples/sec Loss 1.7878 LearningRate 0.000199 Epoch: 23 Global Step: 496520 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:07:15,267-Speed 2497.38 samples/sec Loss 1.7778 LearningRate 0.000199 Epoch: 23 Global Step: 496530 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:07:23,471-Speed 2496.75 samples/sec Loss 1.7162 LearningRate 0.000199 Epoch: 23 Global Step: 496540 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:07:31,673-Speed 2497.54 samples/sec Loss 1.7554 LearningRate 0.000199 Epoch: 23 Global Step: 496550 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:07:39,874-Speed 2497.59 samples/sec Loss 1.7507 LearningRate 0.000199 Epoch: 23 Global Step: 496560 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:07:48,023-Speed 2513.87 samples/sec Loss 1.7530 LearningRate 0.000199 Epoch: 23 Global Step: 496570 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:07:56,230-Speed 2496.07 samples/sec Loss 1.7855 LearningRate 0.000199 Epoch: 23 Global Step: 496580 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:08:04,428-Speed 2498.19 samples/sec Loss 1.7671 LearningRate 0.000199 Epoch: 23 Global Step: 496590 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:08:12,632-Speed 2496.79 samples/sec Loss 1.7969 LearningRate 0.000199 Epoch: 23 Global Step: 496600 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:08:20,832-Speed 2498.07 samples/sec Loss 1.7608 LearningRate 0.000199 Epoch: 23 Global Step: 496610 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:08:29,036-Speed 2496.92 samples/sec Loss 1.7881 LearningRate 0.000199 Epoch: 23 Global Step: 496620 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:08:37,180-Speed 2514.96 samples/sec Loss 1.7776 LearningRate 0.000199 Epoch: 23 Global Step: 496630 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-07-10 08:08:45,380-Speed 2498.02 samples/sec Loss 1.7532 LearningRate 0.000199 Epoch: 23 Global Step: 496640 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:08:53,582-Speed 2497.44 samples/sec Loss 1.7870 LearningRate 0.000199 Epoch: 23 Global Step: 496650 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:09:01,802-Speed 2491.71 samples/sec Loss 1.8364 LearningRate 0.000199 Epoch: 23 Global Step: 496660 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:09:09,999-Speed 2498.61 samples/sec Loss 1.7709 LearningRate 0.000199 Epoch: 23 Global Step: 496670 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:09:18,213-Speed 2493.89 samples/sec Loss 1.7535 LearningRate 0.000199 Epoch: 23 Global Step: 496680 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:09:26,359-Speed 2514.46 samples/sec Loss 1.7841 LearningRate 0.000199 Epoch: 23 Global Step: 496690 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:09:34,558-Speed 2498.09 samples/sec Loss 1.7703 LearningRate 0.000199 Epoch: 23 Global Step: 496700 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:09:42,754-Speed 2499.03 samples/sec Loss 1.7812 LearningRate 0.000199 Epoch: 23 Global Step: 496710 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:09:50,952-Speed 2498.42 samples/sec Loss 1.7822 LearningRate 0.000199 Epoch: 23 Global Step: 496720 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:09:59,152-Speed 2498.38 samples/sec Loss 1.7654 LearningRate 0.000199 Epoch: 23 Global Step: 496730 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:10:07,365-Speed 2494.02 samples/sec Loss 1.7961 LearningRate 0.000199 Epoch: 23 Global Step: 496740 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:10:15,512-Speed 2514.09 samples/sec Loss 1.7861 LearningRate 0.000199 Epoch: 23 Global Step: 496750 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-07-10 08:10:23,808-Speed 2469.15 samples/sec Loss 1.7932 LearningRate 0.000199 Epoch: 23 Global Step: 496760 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:10:32,006-Speed 2498.40 samples/sec Loss 1.7845 LearningRate 0.000199 Epoch: 23 Global Step: 496770 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:10:40,207-Speed 2497.70 samples/sec Loss 1.7474 LearningRate 0.000199 Epoch: 23 Global Step: 496780 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:10:48,405-Speed 2498.59 samples/sec Loss 1.7830 LearningRate 0.000199 Epoch: 23 Global Step: 496790 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:10:56,606-Speed 2497.51 samples/sec Loss 1.7805 LearningRate 0.000199 Epoch: 23 Global Step: 496800 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:11:04,756-Speed 2513.42 samples/sec Loss 1.7765 LearningRate 0.000199 Epoch: 23 Global Step: 496810 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:11:12,952-Speed 2499.14 samples/sec Loss 1.7695 LearningRate 0.000199 Epoch: 23 Global Step: 496820 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:11:21,155-Speed 2497.03 samples/sec Loss 1.7864 LearningRate 0.000199 Epoch: 23 Global Step: 496830 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:11:29,354-Speed 2498.46 samples/sec Loss 1.7619 LearningRate 0.000199 Epoch: 23 Global Step: 496840 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:11:37,553-Speed 2498.00 samples/sec Loss 1.7966 LearningRate 0.000199 Epoch: 23 Global Step: 496850 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:11:45,756-Speed 2497.13 samples/sec Loss 1.7578 LearningRate 0.000199 Epoch: 23 Global Step: 496860 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:11:53,903-Speed 2514.28 samples/sec Loss 1.7799 LearningRate 0.000199 Epoch: 23 Global Step: 496870 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-07-10 08:12:02,103-Speed 2498.37 samples/sec Loss 1.8042 LearningRate 0.000199 Epoch: 23 Global Step: 496880 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:12:10,301-Speed 2498.39 samples/sec Loss 1.8354 LearningRate 0.000199 Epoch: 23 Global Step: 496890 Fp16 Grad Scale: 8192 Required: 76 hours Training: 2022-07-10 08:12:18,505-Speed 2496.67 samples/sec Loss 1.7772 LearningRate 0.000199 Epoch: 23 Global Step: 496900 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:12:26,711-Speed 2496.21 samples/sec Loss 1.7744 LearningRate 0.000199 Epoch: 23 Global Step: 496910 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:12:34,925-Speed 2493.68 samples/sec Loss 1.7446 LearningRate 0.000199 Epoch: 23 Global Step: 496920 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:12:43,073-Speed 2514.10 samples/sec Loss 1.7619 LearningRate 0.000198 Epoch: 23 Global Step: 496930 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:12:51,274-Speed 2497.53 samples/sec Loss 1.8214 LearningRate 0.000198 Epoch: 23 Global Step: 496940 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:12:59,477-Speed 2497.34 samples/sec Loss 1.8069 LearningRate 0.000198 Epoch: 23 Global Step: 496950 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:13:07,680-Speed 2497.18 samples/sec Loss 1.7996 LearningRate 0.000198 Epoch: 23 Global Step: 496960 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:13:15,883-Speed 2497.10 samples/sec Loss 1.8057 LearningRate 0.000198 Epoch: 23 Global Step: 496970 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:13:24,086-Speed 2496.81 samples/sec Loss 1.7901 LearningRate 0.000198 Epoch: 23 Global Step: 496980 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:13:32,236-Speed 2513.32 samples/sec Loss 1.7220 LearningRate 0.000198 Epoch: 23 Global Step: 496990 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:13:40,444-Speed 2495.40 samples/sec Loss 1.7733 LearningRate 0.000198 Epoch: 23 Global Step: 497000 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-07-10 08:13:48,648-Speed 2496.84 samples/sec Loss 1.7565 LearningRate 0.000198 Epoch: 23 Global Step: 497010 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:13:56,849-Speed 2497.45 samples/sec Loss 1.7262 LearningRate 0.000198 Epoch: 23 Global Step: 497020 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:14:05,050-Speed 2497.77 samples/sec Loss 1.7971 LearningRate 0.000198 Epoch: 23 Global Step: 497030 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:14:13,252-Speed 2497.34 samples/sec Loss 1.7862 LearningRate 0.000198 Epoch: 23 Global Step: 497040 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:14:21,406-Speed 2512.20 samples/sec Loss 1.7852 LearningRate 0.000198 Epoch: 23 Global Step: 497050 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:14:29,601-Speed 2499.63 samples/sec Loss 1.7769 LearningRate 0.000198 Epoch: 23 Global Step: 497060 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:14:37,807-Speed 2496.22 samples/sec Loss 1.7800 LearningRate 0.000198 Epoch: 23 Global Step: 497070 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:14:46,012-Speed 2496.40 samples/sec Loss 1.7668 LearningRate 0.000198 Epoch: 23 Global Step: 497080 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:14:54,224-Speed 2494.21 samples/sec Loss 1.7555 LearningRate 0.000198 Epoch: 23 Global Step: 497090 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:15:02,427-Speed 2497.09 samples/sec Loss 1.7396 LearningRate 0.000198 Epoch: 23 Global Step: 497100 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:15:10,580-Speed 2512.31 samples/sec Loss 1.7903 LearningRate 0.000198 Epoch: 23 Global Step: 497110 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:15:18,781-Speed 2497.64 samples/sec Loss 1.7496 LearningRate 0.000198 Epoch: 23 Global Step: 497120 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:15:26,978-Speed 2499.01 samples/sec Loss 1.7577 LearningRate 0.000198 Epoch: 23 Global Step: 497130 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:15:35,176-Speed 2498.44 samples/sec Loss 1.8040 LearningRate 0.000198 Epoch: 23 Global Step: 497140 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:15:43,375-Speed 2498.34 samples/sec Loss 1.8128 LearningRate 0.000198 Epoch: 23 Global Step: 497150 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:15:51,573-Speed 2498.59 samples/sec Loss 1.8348 LearningRate 0.000198 Epoch: 23 Global Step: 497160 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:15:59,721-Speed 2513.78 samples/sec Loss 1.7848 LearningRate 0.000198 Epoch: 23 Global Step: 497170 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:16:07,923-Speed 2497.41 samples/sec Loss 1.7523 LearningRate 0.000198 Epoch: 23 Global Step: 497180 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:16:16,127-Speed 2496.70 samples/sec Loss 1.7939 LearningRate 0.000198 Epoch: 23 Global Step: 497190 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:16:24,337-Speed 2494.94 samples/sec Loss 1.7642 LearningRate 0.000198 Epoch: 23 Global Step: 497200 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:16:32,539-Speed 2497.10 samples/sec Loss 1.7683 LearningRate 0.000198 Epoch: 23 Global Step: 497210 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:16:40,740-Speed 2497.95 samples/sec Loss 1.7356 LearningRate 0.000198 Epoch: 23 Global Step: 497220 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:16:48,889-Speed 2513.74 samples/sec Loss 1.7717 LearningRate 0.000198 Epoch: 23 Global Step: 497230 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:16:57,093-Speed 2496.78 samples/sec Loss 1.8069 LearningRate 0.000198 Epoch: 23 Global Step: 497240 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:17:05,292-Speed 2498.16 samples/sec Loss 1.8217 LearningRate 0.000198 Epoch: 23 Global Step: 497250 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:17:13,492-Speed 2497.93 samples/sec Loss 1.7283 LearningRate 0.000198 Epoch: 23 Global Step: 497260 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:17:21,694-Speed 2497.60 samples/sec Loss 1.7530 LearningRate 0.000198 Epoch: 23 Global Step: 497270 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:17:29,895-Speed 2498.05 samples/sec Loss 1.7844 LearningRate 0.000198 Epoch: 23 Global Step: 497280 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:17:38,040-Speed 2514.82 samples/sec Loss 1.7898 LearningRate 0.000198 Epoch: 23 Global Step: 497290 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:17:46,241-Speed 2497.52 samples/sec Loss 1.7739 LearningRate 0.000198 Epoch: 23 Global Step: 497300 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:17:54,439-Speed 2498.75 samples/sec Loss 1.7248 LearningRate 0.000198 Epoch: 23 Global Step: 497310 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:18:02,637-Speed 2498.54 samples/sec Loss 1.7354 LearningRate 0.000198 Epoch: 23 Global Step: 497320 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:18:10,838-Speed 2497.73 samples/sec Loss 1.7296 LearningRate 0.000198 Epoch: 23 Global Step: 497330 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:18:19,035-Speed 2498.94 samples/sec Loss 1.7908 LearningRate 0.000198 Epoch: 23 Global Step: 497340 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:18:27,189-Speed 2512.39 samples/sec Loss 1.7476 LearningRate 0.000198 Epoch: 23 Global Step: 497350 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:18:35,389-Speed 2497.78 samples/sec Loss 1.7621 LearningRate 0.000198 Epoch: 23 Global Step: 497360 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:18:43,591-Speed 2497.40 samples/sec Loss 1.7621 LearningRate 0.000198 Epoch: 23 Global Step: 497370 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:18:51,791-Speed 2498.16 samples/sec Loss 1.7711 LearningRate 0.000198 Epoch: 23 Global Step: 497380 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:18:59,991-Speed 2498.02 samples/sec Loss 1.7957 LearningRate 0.000198 Epoch: 23 Global Step: 497390 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:19:08,194-Speed 2496.93 samples/sec Loss 1.7953 LearningRate 0.000198 Epoch: 23 Global Step: 497400 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:19:16,344-Speed 2513.16 samples/sec Loss 1.7603 LearningRate 0.000198 Epoch: 23 Global Step: 497410 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:19:24,547-Speed 2497.28 samples/sec Loss 1.7782 LearningRate 0.000198 Epoch: 23 Global Step: 497420 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:19:32,748-Speed 2497.76 samples/sec Loss 1.7881 LearningRate 0.000198 Epoch: 23 Global Step: 497430 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:19:40,947-Speed 2498.09 samples/sec Loss 1.7649 LearningRate 0.000198 Epoch: 23 Global Step: 497440 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:19:49,149-Speed 2497.63 samples/sec Loss 1.7993 LearningRate 0.000198 Epoch: 23 Global Step: 497450 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:19:57,347-Speed 2498.65 samples/sec Loss 1.7867 LearningRate 0.000198 Epoch: 23 Global Step: 497460 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:20:05,495-Speed 2513.86 samples/sec Loss 1.7873 LearningRate 0.000198 Epoch: 23 Global Step: 497470 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:20:13,695-Speed 2498.15 samples/sec Loss 1.7801 LearningRate 0.000198 Epoch: 23 Global Step: 497480 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:20:21,890-Speed 2499.39 samples/sec Loss 1.7332 LearningRate 0.000198 Epoch: 23 Global Step: 497490 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:20:30,087-Speed 2499.26 samples/sec Loss 1.8148 LearningRate 0.000198 Epoch: 23 Global Step: 497500 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:20:38,288-Speed 2497.56 samples/sec Loss 1.7618 LearningRate 0.000198 Epoch: 23 Global Step: 497510 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:20:46,500-Speed 2494.68 samples/sec Loss 1.7562 LearningRate 0.000198 Epoch: 23 Global Step: 497520 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:20:54,648-Speed 2513.78 samples/sec Loss 1.8002 LearningRate 0.000198 Epoch: 23 Global Step: 497530 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:21:02,851-Speed 2497.08 samples/sec Loss 1.7585 LearningRate 0.000198 Epoch: 23 Global Step: 497540 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:21:11,056-Speed 2496.53 samples/sec Loss 1.7727 LearningRate 0.000198 Epoch: 23 Global Step: 497550 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:21:19,258-Speed 2497.30 samples/sec Loss 1.8200 LearningRate 0.000198 Epoch: 23 Global Step: 497560 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:21:27,463-Speed 2496.34 samples/sec Loss 1.8029 LearningRate 0.000198 Epoch: 23 Global Step: 497570 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:21:35,666-Speed 2497.13 samples/sec Loss 1.7684 LearningRate 0.000198 Epoch: 23 Global Step: 497580 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:21:43,818-Speed 2512.74 samples/sec Loss 1.7688 LearningRate 0.000198 Epoch: 23 Global Step: 497590 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:21:52,025-Speed 2495.93 samples/sec Loss 1.7833 LearningRate 0.000198 Epoch: 23 Global Step: 497600 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:22:00,225-Speed 2498.20 samples/sec Loss 1.7233 LearningRate 0.000198 Epoch: 23 Global Step: 497610 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:22:08,424-Speed 2498.31 samples/sec Loss 1.7887 LearningRate 0.000198 Epoch: 23 Global Step: 497620 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:22:16,625-Speed 2497.75 samples/sec Loss 1.7689 LearningRate 0.000198 Epoch: 23 Global Step: 497630 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:22:24,823-Speed 2498.45 samples/sec Loss 1.7449 LearningRate 0.000198 Epoch: 23 Global Step: 497640 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:22:32,972-Speed 2513.80 samples/sec Loss 1.7400 LearningRate 0.000198 Epoch: 23 Global Step: 497650 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:22:41,169-Speed 2499.06 samples/sec Loss 1.7822 LearningRate 0.000198 Epoch: 23 Global Step: 497660 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:22:49,373-Speed 2496.70 samples/sec Loss 1.8105 LearningRate 0.000198 Epoch: 23 Global Step: 497670 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:22:57,572-Speed 2498.20 samples/sec Loss 1.7601 LearningRate 0.000198 Epoch: 23 Global Step: 497680 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:23:05,772-Speed 2498.23 samples/sec Loss 1.7694 LearningRate 0.000198 Epoch: 23 Global Step: 497690 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:23:13,975-Speed 2497.10 samples/sec Loss 1.7669 LearningRate 0.000198 Epoch: 23 Global Step: 497700 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:23:22,123-Speed 2513.71 samples/sec Loss 1.7927 LearningRate 0.000198 Epoch: 23 Global Step: 497710 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:23:30,323-Speed 2498.01 samples/sec Loss 1.8201 LearningRate 0.000198 Epoch: 23 Global Step: 497720 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:23:38,524-Speed 2497.75 samples/sec Loss 1.7819 LearningRate 0.000198 Epoch: 23 Global Step: 497730 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:23:46,723-Speed 2498.07 samples/sec Loss 1.8146 LearningRate 0.000198 Epoch: 23 Global Step: 497740 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:23:57,126-Speed 1969.00 samples/sec Loss 1.7697 LearningRate 0.000198 Epoch: 24 Global Step: 497750 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:24:05,320-Speed 2499.54 samples/sec Loss 1.8101 LearningRate 0.000198 Epoch: 24 Global Step: 497760 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:24:13,476-Speed 2511.61 samples/sec Loss 1.7834 LearningRate 0.000197 Epoch: 24 Global Step: 497770 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:24:21,687-Speed 2494.58 samples/sec Loss 1.8285 LearningRate 0.000197 Epoch: 24 Global Step: 497780 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:24:29,884-Speed 2498.98 samples/sec Loss 1.7814 LearningRate 0.000197 Epoch: 24 Global Step: 497790 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:24:38,084-Speed 2498.09 samples/sec Loss 1.8330 LearningRate 0.000197 Epoch: 24 Global Step: 497800 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:24:46,278-Speed 2499.51 samples/sec Loss 1.8196 LearningRate 0.000197 Epoch: 24 Global Step: 497810 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:24:54,494-Speed 2493.10 samples/sec Loss 1.7594 LearningRate 0.000197 Epoch: 24 Global Step: 497820 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:25:02,638-Speed 2515.13 samples/sec Loss 1.8115 LearningRate 0.000197 Epoch: 24 Global Step: 497830 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:25:10,838-Speed 2498.19 samples/sec Loss 1.7829 LearningRate 0.000197 Epoch: 24 Global Step: 497840 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:25:19,050-Speed 2494.21 samples/sec Loss 1.7873 LearningRate 0.000197 Epoch: 24 Global Step: 497850 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:25:27,247-Speed 2498.80 samples/sec Loss 1.8033 LearningRate 0.000197 Epoch: 24 Global Step: 497860 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:25:35,447-Speed 2497.92 samples/sec Loss 1.7866 LearningRate 0.000197 Epoch: 24 Global Step: 497870 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:25:43,642-Speed 2499.54 samples/sec Loss 1.8165 LearningRate 0.000197 Epoch: 24 Global Step: 497880 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:25:51,787-Speed 2514.81 samples/sec Loss 1.7683 LearningRate 0.000197 Epoch: 24 Global Step: 497890 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:25:59,986-Speed 2498.16 samples/sec Loss 1.8065 LearningRate 0.000197 Epoch: 24 Global Step: 497900 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:26:08,187-Speed 2497.64 samples/sec Loss 1.7869 LearningRate 0.000197 Epoch: 24 Global Step: 497910 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:26:16,386-Speed 2498.55 samples/sec Loss 1.8056 LearningRate 0.000197 Epoch: 24 Global Step: 497920 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:26:24,584-Speed 2498.43 samples/sec Loss 1.7872 LearningRate 0.000197 Epoch: 24 Global Step: 497930 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:26:32,786-Speed 2497.46 samples/sec Loss 1.7876 LearningRate 0.000197 Epoch: 24 Global Step: 497940 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:26:40,934-Speed 2514.05 samples/sec Loss 1.8147 LearningRate 0.000197 Epoch: 24 Global Step: 497950 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:26:49,139-Speed 2496.44 samples/sec Loss 1.7773 LearningRate 0.000197 Epoch: 24 Global Step: 497960 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:26:57,338-Speed 2498.22 samples/sec Loss 1.7657 LearningRate 0.000197 Epoch: 24 Global Step: 497970 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:27:05,536-Speed 2498.49 samples/sec Loss 1.7629 LearningRate 0.000197 Epoch: 24 Global Step: 497980 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:27:13,740-Speed 2496.86 samples/sec Loss 1.7890 LearningRate 0.000197 Epoch: 24 Global Step: 497990 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:27:21,939-Speed 2498.25 samples/sec Loss 1.7783 LearningRate 0.000197 Epoch: 24 Global Step: 498000 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:27:30,101-Speed 2509.53 samples/sec Loss 1.7697 LearningRate 0.000197 Epoch: 24 Global Step: 498010 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:27:38,306-Speed 2496.50 samples/sec Loss 1.7634 LearningRate 0.000197 Epoch: 24 Global Step: 498020 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:27:46,512-Speed 2496.28 samples/sec Loss 1.7596 LearningRate 0.000197 Epoch: 24 Global Step: 498030 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:27:54,715-Speed 2497.07 samples/sec Loss 1.7816 LearningRate 0.000197 Epoch: 24 Global Step: 498040 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:28:02,915-Speed 2497.79 samples/sec Loss 1.7569 LearningRate 0.000197 Epoch: 24 Global Step: 498050 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:28:11,119-Speed 2496.80 samples/sec Loss 1.7471 LearningRate 0.000197 Epoch: 24 Global Step: 498060 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:28:19,265-Speed 2514.62 samples/sec Loss 1.7574 LearningRate 0.000197 Epoch: 24 Global Step: 498070 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:28:27,468-Speed 2496.99 samples/sec Loss 1.7746 LearningRate 0.000197 Epoch: 24 Global Step: 498080 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:28:35,666-Speed 2498.25 samples/sec Loss 1.7774 LearningRate 0.000197 Epoch: 24 Global Step: 498090 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:28:43,867-Speed 2497.80 samples/sec Loss 1.7568 LearningRate 0.000197 Epoch: 24 Global Step: 498100 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:28:52,068-Speed 2497.76 samples/sec Loss 1.7688 LearningRate 0.000197 Epoch: 24 Global Step: 498110 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:29:00,267-Speed 2498.17 samples/sec Loss 1.7718 LearningRate 0.000197 Epoch: 24 Global Step: 498120 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:29:08,415-Speed 2513.88 samples/sec Loss 1.7969 LearningRate 0.000197 Epoch: 24 Global Step: 498130 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:29:16,632-Speed 2492.82 samples/sec Loss 1.7760 LearningRate 0.000197 Epoch: 24 Global Step: 498140 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:29:24,836-Speed 2497.03 samples/sec Loss 1.7342 LearningRate 0.000197 Epoch: 24 Global Step: 498150 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:29:33,040-Speed 2496.73 samples/sec Loss 1.7716 LearningRate 0.000197 Epoch: 24 Global Step: 498160 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:29:41,242-Speed 2497.37 samples/sec Loss 1.8158 LearningRate 0.000197 Epoch: 24 Global Step: 498170 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:29:49,442-Speed 2497.79 samples/sec Loss 1.7732 LearningRate 0.000197 Epoch: 24 Global Step: 498180 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:29:57,591-Speed 2513.68 samples/sec Loss 1.8213 LearningRate 0.000197 Epoch: 24 Global Step: 498190 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:30:05,790-Speed 2498.09 samples/sec Loss 1.7728 LearningRate 0.000197 Epoch: 24 Global Step: 498200 Fp16 Grad Scale: 32768 Required: 76 hours 
Training: 2022-07-10 08:30:13,991-Speed 2497.83 samples/sec Loss 1.7445 LearningRate 0.000197 Epoch: 24 Global Step: 498210 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:30:22,195-Speed 2496.53 samples/sec Loss 1.8002 LearningRate 0.000197 Epoch: 24 Global Step: 498220 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:30:30,393-Speed 2498.69 samples/sec Loss 1.7413 LearningRate 0.000197 Epoch: 24 Global Step: 498230 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:30:38,596-Speed 2497.06 samples/sec Loss 1.7502 LearningRate 0.000197 Epoch: 24 Global Step: 498240 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:30:46,757-Speed 2509.98 samples/sec Loss 1.7402 LearningRate 0.000197 Epoch: 24 Global Step: 498250 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:30:54,954-Speed 2498.97 samples/sec Loss 1.7805 LearningRate 0.000197 Epoch: 24 Global Step: 498260 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:31:03,156-Speed 2497.38 samples/sec Loss 1.7932 LearningRate 0.000197 Epoch: 24 Global Step: 498270 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:31:11,356-Speed 2497.87 samples/sec Loss 1.7510 LearningRate 0.000197 Epoch: 24 Global Step: 498280 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:31:19,558-Speed 2497.45 samples/sec Loss 1.7930 LearningRate 0.000197 Epoch: 24 Global Step: 498290 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:31:27,758-Speed 2497.93 samples/sec Loss 1.7424 LearningRate 0.000197 Epoch: 24 Global Step: 498300 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:31:35,901-Speed 2515.37 samples/sec Loss 1.7576 LearningRate 0.000197 Epoch: 24 Global Step: 498310 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:31:44,099-Speed 2498.63 samples/sec Loss 1.7507 LearningRate 0.000197 Epoch: 24 Global Step: 498320 Fp16 Grad Scale: 32768 Required: 76 hours 
Training: 2022-07-10 08:31:52,297-Speed 2498.62 samples/sec Loss 1.7426 LearningRate 0.000197 Epoch: 24 Global Step: 498330 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:32:00,497-Speed 2497.86 samples/sec Loss 1.7788 LearningRate 0.000197 Epoch: 24 Global Step: 498340 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:32:08,698-Speed 2497.71 samples/sec Loss 1.7617 LearningRate 0.000197 Epoch: 24 Global Step: 498350 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:32:16,899-Speed 2497.60 samples/sec Loss 1.7458 LearningRate 0.000197 Epoch: 24 Global Step: 498360 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:32:25,046-Speed 2514.24 samples/sec Loss 1.7602 LearningRate 0.000197 Epoch: 24 Global Step: 498370 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:32:33,247-Speed 2497.44 samples/sec Loss 1.7784 LearningRate 0.000197 Epoch: 24 Global Step: 498380 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:32:41,448-Speed 2497.69 samples/sec Loss 1.7783 LearningRate 0.000197 Epoch: 24 Global Step: 498390 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:32:49,650-Speed 2497.56 samples/sec Loss 1.7916 LearningRate 0.000197 Epoch: 24 Global Step: 498400 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:32:57,862-Speed 2494.12 samples/sec Loss 1.7564 LearningRate 0.000197 Epoch: 24 Global Step: 498410 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:33:06,062-Speed 2498.11 samples/sec Loss 1.7837 LearningRate 0.000197 Epoch: 24 Global Step: 498420 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:33:14,212-Speed 2513.14 samples/sec Loss 1.7176 LearningRate 0.000197 Epoch: 24 Global Step: 498430 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:33:22,410-Speed 2498.70 samples/sec Loss 1.7767 LearningRate 0.000197 Epoch: 24 Global Step: 498440 Fp16 Grad Scale: 32768 Required: 76 hours 
Training: 2022-07-10 08:33:30,611-Speed 2497.66 samples/sec Loss 1.7788 LearningRate 0.000197 Epoch: 24 Global Step: 498450 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:33:38,820-Speed 2495.31 samples/sec Loss 1.7759 LearningRate 0.000197 Epoch: 24 Global Step: 498460 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:33:47,021-Speed 2497.64 samples/sec Loss 1.7734 LearningRate 0.000197 Epoch: 24 Global Step: 498470 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:33:55,218-Speed 2498.75 samples/sec Loss 1.7412 LearningRate 0.000197 Epoch: 24 Global Step: 498480 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:34:03,368-Speed 2513.42 samples/sec Loss 1.7641 LearningRate 0.000197 Epoch: 24 Global Step: 498490 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:34:11,571-Speed 2496.98 samples/sec Loss 1.7805 LearningRate 0.000197 Epoch: 24 Global Step: 498500 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:34:19,781-Speed 2494.74 samples/sec Loss 1.7646 LearningRate 0.000197 Epoch: 24 Global Step: 498510 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:34:27,986-Speed 2496.67 samples/sec Loss 1.7684 LearningRate 0.000197 Epoch: 24 Global Step: 498520 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:34:36,186-Speed 2498.07 samples/sec Loss 1.7881 LearningRate 0.000197 Epoch: 24 Global Step: 498530 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:34:44,385-Speed 2498.08 samples/sec Loss 1.7916 LearningRate 0.000197 Epoch: 24 Global Step: 498540 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:34:52,532-Speed 2514.41 samples/sec Loss 1.7626 LearningRate 0.000197 Epoch: 24 Global Step: 498550 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:35:00,729-Speed 2498.88 samples/sec Loss 1.7692 LearningRate 0.000197 Epoch: 24 Global Step: 498560 Fp16 Grad Scale: 32768 Required: 76 hours 
Training: 2022-07-10 08:35:08,935-Speed 2496.04 samples/sec Loss 1.7467 LearningRate 0.000197 Epoch: 24 Global Step: 498570 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:35:17,133-Speed 2498.46 samples/sec Loss 1.7592 LearningRate 0.000197 Epoch: 24 Global Step: 498580 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:35:25,333-Speed 2498.15 samples/sec Loss 1.7808 LearningRate 0.000197 Epoch: 24 Global Step: 498590 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:35:33,537-Speed 2496.75 samples/sec Loss 1.7750 LearningRate 0.000197 Epoch: 24 Global Step: 498600 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:35:41,684-Speed 2514.11 samples/sec Loss 1.7562 LearningRate 0.000196 Epoch: 24 Global Step: 498610 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:35:49,882-Speed 2498.80 samples/sec Loss 1.8014 LearningRate 0.000196 Epoch: 24 Global Step: 498620 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:35:58,080-Speed 2498.98 samples/sec Loss 1.7917 LearningRate 0.000196 Epoch: 24 Global Step: 498630 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:36:06,290-Speed 2495.03 samples/sec Loss 1.7988 LearningRate 0.000196 Epoch: 24 Global Step: 498640 Fp16 Grad Scale: 32768 Required: 76 hours Training: 2022-07-10 08:36:14,446-Speed 2511.37 samples/sec Loss 1.7823 LearningRate 0.000196 Epoch: 24 Global Step: 498650 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:36:22,647-Speed 2497.57 samples/sec Loss 1.7624 LearningRate 0.000196 Epoch: 24 Global Step: 498660 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:36:30,793-Speed 2514.63 samples/sec Loss 1.7801 LearningRate 0.000196 Epoch: 24 Global Step: 498670 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:36:38,998-Speed 2496.37 samples/sec Loss 1.7984 LearningRate 0.000196 Epoch: 24 Global Step: 498680 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:36:47,200-Speed 2497.33 samples/sec Loss 1.7446 LearningRate 0.000196 Epoch: 24 Global Step: 498690 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:36:55,400-Speed 2497.99 samples/sec Loss 1.7720 LearningRate 0.000196 Epoch: 24 Global Step: 498700 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:37:03,614-Speed 2493.48 samples/sec Loss 1.7519 LearningRate 0.000196 Epoch: 24 Global Step: 498710 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:37:11,813-Speed 2498.42 samples/sec Loss 1.7461 LearningRate 0.000196 Epoch: 24 Global Step: 498720 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:37:19,962-Speed 2513.67 samples/sec Loss 1.7674 LearningRate 0.000196 Epoch: 24 Global Step: 498730 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:37:28,162-Speed 2497.74 samples/sec Loss 1.7388 LearningRate 0.000196 Epoch: 24 Global Step: 498740 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:37:36,364-Speed 2497.68 samples/sec Loss 1.7934 LearningRate 0.000196 Epoch: 24 Global Step: 498750 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:37:44,565-Speed 2497.75 samples/sec Loss 1.7698 LearningRate 0.000196 Epoch: 24 Global Step: 498760 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:37:52,765-Speed 2498.05 samples/sec Loss 1.7525 LearningRate 0.000196 Epoch: 24 Global Step: 498770 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:38:00,964-Speed 2497.96 samples/sec Loss 1.7883 LearningRate 0.000196 Epoch: 24 Global Step: 498780 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:38:09,110-Speed 2514.74 samples/sec Loss 1.7155 LearningRate 0.000196 Epoch: 24 Global Step: 498790 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:38:17,314-Speed 2496.55 samples/sec Loss 1.7284 LearningRate 0.000196 Epoch: 24 Global Step: 498800 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:38:25,521-Speed 2495.90 samples/sec Loss 1.7294 LearningRate 0.000196 Epoch: 24 Global Step: 498810 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:38:33,719-Speed 2498.35 samples/sec Loss 1.7535 LearningRate 0.000196 Epoch: 24 Global Step: 498820 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:38:41,919-Speed 2497.86 samples/sec Loss 1.7456 LearningRate 0.000196 Epoch: 24 Global Step: 498830 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:38:50,118-Speed 2498.26 samples/sec Loss 1.8179 LearningRate 0.000196 Epoch: 24 Global Step: 498840 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:38:58,266-Speed 2514.25 samples/sec Loss 1.7825 LearningRate 0.000196 Epoch: 24 Global Step: 498850 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:39:06,465-Speed 2498.11 samples/sec Loss 1.7825 LearningRate 0.000196 Epoch: 24 Global Step: 498860 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:39:14,665-Speed 2498.02 samples/sec Loss 1.7534 LearningRate 0.000196 Epoch: 24 Global Step: 498870 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:39:22,868-Speed 2497.12 samples/sec Loss 1.7744 LearningRate 0.000196 Epoch: 24 Global Step: 498880 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:39:31,069-Speed 2497.57 samples/sec Loss 1.7268 LearningRate 0.000196 Epoch: 24 Global Step: 498890 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:39:39,271-Speed 2497.52 samples/sec Loss 1.7939 LearningRate 0.000196 Epoch: 24 Global Step: 498900 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:39:47,431-Speed 2510.38 samples/sec Loss 1.7769 LearningRate 0.000196 Epoch: 24 Global Step: 498910 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:39:55,641-Speed 2495.24 samples/sec Loss 1.7609 LearningRate 0.000196 Epoch: 24 Global Step: 498920 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:40:03,840-Speed 2498.17 samples/sec Loss 1.7122 LearningRate 0.000196 Epoch: 24 Global Step: 498930 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:40:12,043-Speed 2496.89 samples/sec Loss 1.7487 LearningRate 0.000196 Epoch: 24 Global Step: 498940 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:40:20,238-Speed 2499.39 samples/sec Loss 1.7512 LearningRate 0.000196 Epoch: 24 Global Step: 498950 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:40:28,438-Speed 2498.14 samples/sec Loss 1.7806 LearningRate 0.000196 Epoch: 24 Global Step: 498960 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:40:36,585-Speed 2514.24 samples/sec Loss 1.7425 LearningRate 0.000196 Epoch: 24 Global Step: 498970 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:40:44,788-Speed 2496.84 samples/sec Loss 1.7389 LearningRate 0.000196 Epoch: 24 Global Step: 498980 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:40:52,994-Speed 2496.15 samples/sec Loss 1.7730 LearningRate 0.000196 Epoch: 24 Global Step: 498990 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:41:01,194-Speed 2498.09 samples/sec Loss 1.7688 LearningRate 0.000196 Epoch: 24 Global Step: 499000 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:41:09,392-Speed 2498.68 samples/sec Loss 1.7540 LearningRate 0.000196 Epoch: 24 Global Step: 499010 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:41:17,593-Speed 2497.67 samples/sec Loss 1.7421 LearningRate 0.000196 Epoch: 24 Global Step: 499020 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:41:25,738-Speed 2514.75 samples/sec Loss 1.7364 LearningRate 0.000196 Epoch: 24 Global Step: 499030 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:41:33,942-Speed 2497.04 samples/sec Loss 1.7331 LearningRate 0.000196 Epoch: 24 Global Step: 499040 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:41:42,141-Speed 2498.34 samples/sec Loss 1.7592 LearningRate 0.000196 Epoch: 24 Global Step: 499050 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:41:50,339-Speed 2498.33 samples/sec Loss 1.7881 LearningRate 0.000196 Epoch: 24 Global Step: 499060 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:41:58,538-Speed 2498.31 samples/sec Loss 1.7574 LearningRate 0.000196 Epoch: 24 Global Step: 499070 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:42:06,739-Speed 2497.79 samples/sec Loss 1.7491 LearningRate 0.000196 Epoch: 24 Global Step: 499080 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:42:14,884-Speed 2514.82 samples/sec Loss 1.7898 LearningRate 0.000196 Epoch: 24 Global Step: 499090 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:42:23,084-Speed 2497.96 samples/sec Loss 1.7624 LearningRate 0.000196 Epoch: 24 Global Step: 499100 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:42:31,289-Speed 2496.48 samples/sec Loss 1.7351 LearningRate 0.000196 Epoch: 24 Global Step: 499110 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:42:39,488-Speed 2498.20 samples/sec Loss 1.8050 LearningRate 0.000196 Epoch: 24 Global Step: 499120 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:42:47,687-Speed 2498.38 samples/sec Loss 1.7460 LearningRate 0.000196 Epoch: 24 Global Step: 499130 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:42:55,889-Speed 2497.73 samples/sec Loss 1.7882 LearningRate 0.000196 Epoch: 24 Global Step: 499140 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:43:04,034-Speed 2515.06 samples/sec Loss 1.7333 LearningRate 0.000196 Epoch: 24 Global Step: 499150 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:43:12,237-Speed 2496.89 samples/sec Loss 1.7240 LearningRate 0.000196 Epoch: 24 Global Step: 499160 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:43:20,439-Speed 2497.27 samples/sec Loss 1.7528 LearningRate 0.000196 Epoch: 24 Global Step: 499170 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:43:28,642-Speed 2497.32 samples/sec Loss 1.7953 LearningRate 0.000196 Epoch: 24 Global Step: 499180 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:43:36,840-Speed 2498.56 samples/sec Loss 1.7431 LearningRate 0.000196 Epoch: 24 Global Step: 499190 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:43:45,042-Speed 2497.26 samples/sec Loss 1.7439 LearningRate 0.000196 Epoch: 24 Global Step: 499200 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:43:53,194-Speed 2512.64 samples/sec Loss 1.7635 LearningRate 0.000196 Epoch: 24 Global Step: 499210 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:44:01,395-Speed 2497.83 samples/sec Loss 1.7886 LearningRate 0.000196 Epoch: 24 Global Step: 499220 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:44:09,596-Speed 2497.54 samples/sec Loss 1.7457 LearningRate 0.000196 Epoch: 24 Global Step: 499230 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:44:17,801-Speed 2496.81 samples/sec Loss 1.7648 LearningRate 0.000196 Epoch: 24 Global Step: 499240 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:44:26,008-Speed 2495.60 samples/sec Loss 1.7307 LearningRate 0.000196 Epoch: 24 Global Step: 499250 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:44:34,213-Speed 2496.72 samples/sec Loss 1.6958 LearningRate 0.000196 Epoch: 24 Global Step: 499260 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:44:42,361-Speed 2513.83 samples/sec Loss 1.7177 LearningRate 0.000196 Epoch: 24 Global Step: 499270 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:44:50,561-Speed 2498.12 samples/sec Loss 1.7355 LearningRate 0.000196 Epoch: 24 Global Step: 499280 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:44:58,762-Speed 2497.46 samples/sec Loss 1.7844 LearningRate 0.000196 Epoch: 24 Global Step: 499290 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:45:06,974-Speed 2494.27 samples/sec Loss 1.7242 LearningRate 0.000196 Epoch: 24 Global Step: 499300 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:45:15,176-Speed 2497.61 samples/sec Loss 1.7372 LearningRate 0.000196 Epoch: 24 Global Step: 499310 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:45:23,374-Speed 2498.40 samples/sec Loss 1.8008 LearningRate 0.000196 Epoch: 24 Global Step: 499320 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:45:31,518-Speed 2515.02 samples/sec Loss 1.7955 LearningRate 0.000196 Epoch: 24 Global Step: 499330 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:45:39,720-Speed 2497.37 samples/sec Loss 1.7822 LearningRate 0.000196 Epoch: 24 Global Step: 499340 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:45:47,919-Speed 2498.28 samples/sec Loss 1.7917 LearningRate 0.000196 Epoch: 24 Global Step: 499350 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:45:56,116-Speed 2498.92 samples/sec Loss 1.7662 LearningRate 0.000196 Epoch: 24 Global Step: 499360 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:46:04,317-Speed 2497.49 samples/sec Loss 1.7820 LearningRate 0.000196 Epoch: 24 Global Step: 499370 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:46:12,517-Speed 2497.87 samples/sec Loss 1.7335 LearningRate 0.000196 Epoch: 24 Global Step: 499380 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:46:20,666-Speed 2513.67 samples/sec Loss 1.7753 LearningRate 0.000196 Epoch: 24 Global Step: 499390 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:46:28,865-Speed 2498.01 samples/sec Loss 1.7526 LearningRate 0.000196 Epoch: 24 Global Step: 499400 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:46:37,064-Speed 2498.41 samples/sec Loss 1.7287 LearningRate 0.000196 Epoch: 24 Global Step: 499410 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:46:45,268-Speed 2496.81 samples/sec Loss 1.7838 LearningRate 0.000196 Epoch: 24 Global Step: 499420 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:46:53,466-Speed 2498.34 samples/sec Loss 1.7683 LearningRate 0.000196 Epoch: 24 Global Step: 499430 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:47:01,670-Speed 2497.56 samples/sec Loss 1.7426 LearningRate 0.000196 Epoch: 24 Global Step: 499440 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:47:09,833-Speed 2509.51 samples/sec Loss 1.7818 LearningRate 0.000195 Epoch: 24 Global Step: 499450 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:47:18,031-Speed 2498.27 samples/sec Loss 1.7644 LearningRate 0.000195 Epoch: 24 Global Step: 499460 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:47:26,231-Speed 2497.99 samples/sec Loss 1.7630 LearningRate 0.000195 Epoch: 24 Global Step: 499470 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:47:34,430-Speed 2498.36 samples/sec Loss 1.7607 LearningRate 0.000195 Epoch: 24 Global Step: 499480 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:47:42,638-Speed 2495.46 samples/sec Loss 1.7746 LearningRate 0.000195 Epoch: 24 Global Step: 499490 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:47:50,837-Speed 2497.94 samples/sec Loss 1.7345 LearningRate 0.000195 Epoch: 24 Global Step: 499500 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:47:58,999-Speed 2509.90 samples/sec Loss 1.7884 LearningRate 0.000195 Epoch: 24 Global Step: 499510 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:48:07,198-Speed 2497.91 samples/sec Loss 1.7757 LearningRate 0.000195 Epoch: 24 Global Step: 499520 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:48:15,400-Speed 2497.48 samples/sec Loss 1.7595 LearningRate 0.000195 Epoch: 24 Global Step: 499530 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:48:23,599-Speed 2498.69 samples/sec Loss 1.7866 LearningRate 0.000195 Epoch: 24 Global Step: 499540 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:48:31,798-Speed 2498.00 samples/sec Loss 1.7857 LearningRate 0.000195 Epoch: 24 Global Step: 499550 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:48:40,010-Speed 2494.51 samples/sec Loss 1.7648 LearningRate 0.000195 Epoch: 24 Global Step: 499560 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:48:48,155-Speed 2515.09 samples/sec Loss 1.7674 LearningRate 0.000195 Epoch: 24 Global Step: 499570 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:48:56,354-Speed 2498.05 samples/sec Loss 1.7614 LearningRate 0.000195 Epoch: 24 Global Step: 499580 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:49:04,552-Speed 2498.61 samples/sec Loss 1.7837 LearningRate 0.000195 Epoch: 24 Global Step: 499590 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:49:12,756-Speed 2497.09 samples/sec Loss 1.7947 LearningRate 0.000195 Epoch: 24 Global Step: 499600 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:49:20,955-Speed 2498.14 samples/sec Loss 1.7597 LearningRate 0.000195 Epoch: 24 Global Step: 499610 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:49:29,158-Speed 2496.95 samples/sec Loss 1.7458 LearningRate 0.000195 Epoch: 24 Global Step: 499620 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:49:37,303-Speed 2514.72 samples/sec Loss 1.7771 LearningRate 0.000195 Epoch: 24 Global Step: 499630 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:49:45,500-Speed 2499.03 samples/sec Loss 1.7191 LearningRate 0.000195 Epoch: 24 Global Step: 499640 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:49:53,702-Speed 2497.54 samples/sec Loss 1.7368 LearningRate 0.000195 Epoch: 24 Global Step: 499650 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:50:01,907-Speed 2496.42 samples/sec Loss 1.7663 LearningRate 0.000195 Epoch: 24 Global Step: 499660 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:50:10,111-Speed 2496.65 samples/sec Loss 1.7579 LearningRate 0.000195 Epoch: 24 Global Step: 499670 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:50:18,316-Speed 2496.59 samples/sec Loss 1.7519 LearningRate 0.000195 Epoch: 24 Global Step: 499680 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:50:26,460-Speed 2515.13 samples/sec Loss 1.7436 LearningRate 0.000195 Epoch: 24 Global Step: 499690 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:50:34,660-Speed 2498.04 samples/sec Loss 1.7419 LearningRate 0.000195 Epoch: 24 Global Step: 499700 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:50:42,867-Speed 2495.90 samples/sec Loss 1.7362 LearningRate 0.000195 Epoch: 24 Global Step: 499710 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:50:51,065-Speed 2498.51 samples/sec Loss 1.7631 LearningRate 0.000195 Epoch: 24 Global Step: 499720 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:50:59,266-Speed 2497.62 samples/sec Loss 1.7653 LearningRate 0.000195 Epoch: 24 Global Step: 499730 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:51:07,467-Speed 2497.50 samples/sec Loss 1.7593 LearningRate 0.000195 Epoch: 24 Global Step: 499740 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:51:15,614-Speed 2514.18 samples/sec Loss 1.7107 LearningRate 0.000195 Epoch: 24 Global Step: 499750 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:51:23,813-Speed 2498.18 samples/sec Loss 1.7740 LearningRate 0.000195 Epoch: 24 Global Step: 499760 Fp16 Grad Scale: 16384 Required: 76 hours 
Training: 2022-07-10 08:51:32,012-Speed 2498.26 samples/sec Loss 1.7507 LearningRate 0.000195 Epoch: 24 Global Step: 499770 Fp16 Grad Scale: 16384 Required: 76 hours Training: 2022-07-10 08:51:40,215-Speed 2497.00 samples/sec Loss 1.7189 LearningRate 0.000195 Epoch: 24 Global Step: 499780 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 08:51:48,415-Speed 2498.12 samples/sec Loss 1.7586 LearningRate 0.000195 Epoch: 24 Global Step: 499790 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 08:51:56,617-Speed 2497.40 samples/sec Loss 1.7509 LearningRate 0.000195 Epoch: 24 Global Step: 499800 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 08:52:04,762-Speed 2514.66 samples/sec Loss 1.7390 LearningRate 0.000195 Epoch: 24 Global Step: 499810 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 08:52:12,961-Speed 2498.38 samples/sec Loss 1.7194 LearningRate 0.000195 Epoch: 24 Global Step: 499820 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 08:52:21,160-Speed 2498.44 samples/sec Loss 1.7572 LearningRate 0.000195 Epoch: 24 Global Step: 499830 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 08:52:29,359-Speed 2498.22 samples/sec Loss 1.7645 LearningRate 0.000195 Epoch: 24 Global Step: 499840 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 08:52:37,561-Speed 2497.39 samples/sec Loss 1.7374 LearningRate 0.000195 Epoch: 24 Global Step: 499850 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:52:45,764-Speed 2497.02 samples/sec Loss 1.7915 LearningRate 0.000195 Epoch: 24 Global Step: 499860 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:52:53,914-Speed 2513.40 samples/sec Loss 1.7826 LearningRate 0.000195 Epoch: 24 Global Step: 499870 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:53:02,116-Speed 2497.57 samples/sec Loss 1.7714 LearningRate 0.000195 Epoch: 24 Global Step: 499880 Fp16 Grad Scale: 32768 Required: 75 hours 
Training: 2022-07-10 08:53:10,315-Speed 2498.13 samples/sec Loss 1.8028 LearningRate 0.000195 Epoch: 24 Global Step: 499890 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:53:18,514-Speed 2498.27 samples/sec Loss 1.8173 LearningRate 0.000195 Epoch: 24 Global Step: 499900 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:53:26,712-Speed 2498.55 samples/sec Loss 1.7628 LearningRate 0.000195 Epoch: 24 Global Step: 499910 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:53:34,912-Speed 2497.95 samples/sec Loss 1.7808 LearningRate 0.000195 Epoch: 24 Global Step: 499920 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:53:43,058-Speed 2514.49 samples/sec Loss 1.7532 LearningRate 0.000195 Epoch: 24 Global Step: 499930 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:53:51,257-Speed 2498.30 samples/sec Loss 1.7720 LearningRate 0.000195 Epoch: 24 Global Step: 499940 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:53:59,456-Speed 2498.58 samples/sec Loss 1.7205 LearningRate 0.000195 Epoch: 24 Global Step: 499950 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:54:07,661-Speed 2496.30 samples/sec Loss 1.7448 LearningRate 0.000195 Epoch: 24 Global Step: 499960 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:54:15,861-Speed 2498.15 samples/sec Loss 1.7479 LearningRate 0.000195 Epoch: 24 Global Step: 499970 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:54:24,066-Speed 2496.21 samples/sec Loss 1.7705 LearningRate 0.000195 Epoch: 24 Global Step: 499980 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:54:32,223-Speed 2511.13 samples/sec Loss 1.7892 LearningRate 0.000195 Epoch: 24 Global Step: 499990 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:54:40,422-Speed 2498.28 samples/sec Loss 1.7728 LearningRate 0.000195 Epoch: 24 Global Step: 500000 Fp16 Grad Scale: 32768 Required: 75 hours 
Training: 2022-07-10 08:54:48,622-Speed 2497.91 samples/sec Loss 1.8029 LearningRate 0.000195 Epoch: 24 Global Step: 500010 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:54:56,821-Speed 2498.08 samples/sec Loss 1.7845 LearningRate 0.000195 Epoch: 24 Global Step: 500020 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:55:05,020-Speed 2498.43 samples/sec Loss 1.7969 LearningRate 0.000195 Epoch: 24 Global Step: 500030 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:55:13,220-Speed 2497.71 samples/sec Loss 1.7963 LearningRate 0.000195 Epoch: 24 Global Step: 500040 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:55:21,365-Speed 2514.89 samples/sec Loss 1.7417 LearningRate 0.000195 Epoch: 24 Global Step: 500050 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:55:29,560-Speed 2499.35 samples/sec Loss 1.7258 LearningRate 0.000195 Epoch: 24 Global Step: 500060 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:55:37,758-Speed 2498.57 samples/sec Loss 1.7594 LearningRate 0.000195 Epoch: 24 Global Step: 500070 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:55:45,961-Speed 2497.31 samples/sec Loss 1.7494 LearningRate 0.000195 Epoch: 24 Global Step: 500080 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:55:54,160-Speed 2498.05 samples/sec Loss 1.7678 LearningRate 0.000195 Epoch: 24 Global Step: 500090 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:56:02,360-Speed 2497.96 samples/sec Loss 1.7584 LearningRate 0.000195 Epoch: 24 Global Step: 500100 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:56:10,506-Speed 2514.68 samples/sec Loss 1.7909 LearningRate 0.000195 Epoch: 24 Global Step: 500110 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:56:18,709-Speed 2497.15 samples/sec Loss 1.7749 LearningRate 0.000195 Epoch: 24 Global Step: 500120 Fp16 Grad Scale: 32768 Required: 75 hours 
Training: 2022-07-10 08:56:26,918-Speed 2495.13 samples/sec Loss 1.7710 LearningRate 0.000195 Epoch: 24 Global Step: 500130 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:56:35,117-Speed 2498.57 samples/sec Loss 1.7403 LearningRate 0.000195 Epoch: 24 Global Step: 500140 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:56:43,315-Speed 2498.46 samples/sec Loss 1.7499 LearningRate 0.000195 Epoch: 24 Global Step: 500150 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:56:51,514-Speed 2498.65 samples/sec Loss 1.7701 LearningRate 0.000195 Epoch: 24 Global Step: 500160 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:56:59,662-Speed 2513.74 samples/sec Loss 1.7959 LearningRate 0.000195 Epoch: 24 Global Step: 500170 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:57:07,861-Speed 2498.41 samples/sec Loss 1.7706 LearningRate 0.000195 Epoch: 24 Global Step: 500180 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:57:16,064-Speed 2497.04 samples/sec Loss 1.7571 LearningRate 0.000195 Epoch: 24 Global Step: 500190 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:57:24,264-Speed 2498.27 samples/sec Loss 1.7519 LearningRate 0.000195 Epoch: 24 Global Step: 500200 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:57:32,465-Speed 2497.62 samples/sec Loss 1.7862 LearningRate 0.000195 Epoch: 24 Global Step: 500210 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:57:40,675-Speed 2494.65 samples/sec Loss 1.7839 LearningRate 0.000195 Epoch: 24 Global Step: 500220 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:57:48,823-Speed 2514.18 samples/sec Loss 1.7726 LearningRate 0.000195 Epoch: 24 Global Step: 500230 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:57:57,025-Speed 2497.36 samples/sec Loss 1.7672 LearningRate 0.000195 Epoch: 24 Global Step: 500240 Fp16 Grad Scale: 32768 Required: 75 hours 
Training: 2022-07-10 08:58:05,240-Speed 2493.20 samples/sec Loss 1.7746 LearningRate 0.000195 Epoch: 24 Global Step: 500250 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:58:13,439-Speed 2498.30 samples/sec Loss 1.7503 LearningRate 0.000195 Epoch: 24 Global Step: 500260 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:58:21,637-Speed 2498.47 samples/sec Loss 1.7760 LearningRate 0.000195 Epoch: 24 Global Step: 500270 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:58:29,837-Speed 2498.01 samples/sec Loss 1.7298 LearningRate 0.000195 Epoch: 24 Global Step: 500280 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:58:37,989-Speed 2512.73 samples/sec Loss 1.7653 LearningRate 0.000195 Epoch: 24 Global Step: 500290 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:58:46,189-Speed 2498.13 samples/sec Loss 1.8082 LearningRate 0.000194 Epoch: 24 Global Step: 500300 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:58:54,389-Speed 2497.78 samples/sec Loss 1.8218 LearningRate 0.000194 Epoch: 24 Global Step: 500310 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:59:02,590-Speed 2497.60 samples/sec Loss 1.7991 LearningRate 0.000194 Epoch: 24 Global Step: 500320 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:59:10,793-Speed 2497.18 samples/sec Loss 1.7361 LearningRate 0.000194 Epoch: 24 Global Step: 500330 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:59:18,994-Speed 2497.54 samples/sec Loss 1.7838 LearningRate 0.000194 Epoch: 24 Global Step: 500340 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:59:27,137-Speed 2515.37 samples/sec Loss 1.7423 LearningRate 0.000194 Epoch: 24 Global Step: 500350 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:59:35,338-Speed 2497.64 samples/sec Loss 1.7792 LearningRate 0.000194 Epoch: 24 Global Step: 500360 Fp16 Grad Scale: 32768 Required: 75 hours 
Training: 2022-07-10 08:59:43,538-Speed 2498.08 samples/sec Loss 1.7610 LearningRate 0.000194 Epoch: 24 Global Step: 500370 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:59:51,738-Speed 2498.04 samples/sec Loss 1.8035 LearningRate 0.000194 Epoch: 24 Global Step: 500380 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 08:59:59,948-Speed 2494.98 samples/sec Loss 1.8091 LearningRate 0.000194 Epoch: 24 Global Step: 500390 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:00:08,152-Speed 2496.65 samples/sec Loss 1.7991 LearningRate 0.000194 Epoch: 24 Global Step: 500400 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:00:16,301-Speed 2513.79 samples/sec Loss 1.7784 LearningRate 0.000194 Epoch: 24 Global Step: 500410 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:00:24,508-Speed 2495.67 samples/sec Loss 1.7969 LearningRate 0.000194 Epoch: 24 Global Step: 500420 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:00:32,709-Speed 2497.69 samples/sec Loss 1.7625 LearningRate 0.000194 Epoch: 24 Global Step: 500430 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:00:40,928-Speed 2492.26 samples/sec Loss 1.8016 LearningRate 0.000194 Epoch: 24 Global Step: 500440 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:00:49,129-Speed 2497.40 samples/sec Loss 1.7719 LearningRate 0.000194 Epoch: 24 Global Step: 500450 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:00:57,331-Speed 2497.90 samples/sec Loss 1.7869 LearningRate 0.000194 Epoch: 24 Global Step: 500460 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:01:05,495-Speed 2508.87 samples/sec Loss 1.8251 LearningRate 0.000194 Epoch: 24 Global Step: 500470 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:01:13,700-Speed 2496.47 samples/sec Loss 1.8077 LearningRate 0.000194 Epoch: 24 Global Step: 500480 Fp16 Grad Scale: 32768 Required: 75 hours 
Training: 2022-07-10 09:01:21,901-Speed 2497.66 samples/sec Loss 1.7749 LearningRate 0.000194 Epoch: 24 Global Step: 500490 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:01:30,102-Speed 2497.37 samples/sec Loss 1.7726 LearningRate 0.000194 Epoch: 24 Global Step: 500500 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:01:38,355-Speed 2499.51 samples/sec Loss 1.7732 LearningRate 0.000194 Epoch: 24 Global Step: 500510 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:01:46,587-Speed 2499.10 samples/sec Loss 1.7686 LearningRate 0.000194 Epoch: 24 Global Step: 500520 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:01:54,778-Speed 2515.38 samples/sec Loss 1.7941 LearningRate 0.000194 Epoch: 24 Global Step: 500530 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:02:02,980-Speed 2497.36 samples/sec Loss 1.7507 LearningRate 0.000194 Epoch: 24 Global Step: 500540 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:02:11,153-Speed 2506.12 samples/sec Loss 1.7714 LearningRate 0.000194 Epoch: 24 Global Step: 500550 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:02:22,509-Speed 1809.82 samples/sec Loss 1.7691 LearningRate 0.000194 Epoch: 24 Global Step: 500560 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:02:30,734-Speed 2501.74 samples/sec Loss 1.7508 LearningRate 0.000194 Epoch: 24 Global Step: 500570 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:02:38,931-Speed 2498.79 samples/sec Loss 1.7417 LearningRate 0.000194 Epoch: 24 Global Step: 500580 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:02:47,139-Speed 2514.96 samples/sec Loss 1.7432 LearningRate 0.000194 Epoch: 24 Global Step: 500590 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:02:58,734-Speed 1780.19 samples/sec Loss 1.7496 LearningRate 0.000194 Epoch: 24 Global Step: 500600 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:03:06,928-Speed 2499.84 samples/sec Loss 1.7850 LearningRate 0.000194 Epoch: 24 Global Step: 500610 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:03:15,183-Speed 2496.92 samples/sec Loss 1.7764 LearningRate 0.000194 Epoch: 24 Global Step: 500620 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:03:23,380-Speed 2500.62 samples/sec Loss 1.7218 LearningRate 0.000194 Epoch: 24 Global Step: 500630 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:03:35,904-Speed 1635.40 samples/sec Loss 1.7444 LearningRate 0.000194 Epoch: 24 Global Step: 500640 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:03:44,276-Speed 2446.90 samples/sec Loss 1.7701 LearningRate 0.000194 Epoch: 24 Global Step: 500650 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:03:57,112-Speed 1696.19 samples/sec Loss 1.7639 LearningRate 0.000194 Epoch: 24 Global Step: 500660 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:04:05,304-Speed 2505.03 samples/sec Loss 1.7199 LearningRate 0.000194 Epoch: 24 Global Step: 500670 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:04:13,514-Speed 2503.36 samples/sec Loss 1.7963 LearningRate 0.000194 Epoch: 24 Global Step: 500680 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:04:27,730-Speed 2498.31 samples/sec Loss 1.7859 LearningRate 0.000194 Epoch: 24 Global Step: 500690 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:04:35,964-Speed 2502.43 samples/sec Loss 1.7825 LearningRate 0.000194 Epoch: 24 Global Step: 500700 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:04:47,083-Speed 1842.01 samples/sec Loss 1.7888 LearningRate 0.000194 Epoch: 24 Global Step: 500710 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:04:55,501-Speed 2433.28 samples/sec Loss 1.7671 LearningRate 0.000194 Epoch: 24 Global Step: 500720 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:05:05,851-Speed 2075.82 samples/sec Loss 1.7676 LearningRate 0.000194 Epoch: 24 Global Step: 500730 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:05:14,972-Speed 2245.75 samples/sec Loss 1.7672 LearningRate 0.000194 Epoch: 24 Global Step: 500740 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:05:23,171-Speed 2498.17 samples/sec Loss 1.7501 LearningRate 0.000194 Epoch: 24 Global Step: 500750 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:05:31,390-Speed 2492.13 samples/sec Loss 1.7814 LearningRate 0.000194 Epoch: 24 Global Step: 500760 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:05:39,540-Speed 2513.22 samples/sec Loss 1.8160 LearningRate 0.000194 Epoch: 24 Global Step: 500770 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:05:47,749-Speed 2495.09 samples/sec Loss 1.7564 LearningRate 0.000194 Epoch: 24 Global Step: 500780 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:05:55,959-Speed 2494.91 samples/sec Loss 1.7683 LearningRate 0.000194 Epoch: 24 Global Step: 500790 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:06:04,174-Speed 2493.41 samples/sec Loss 1.8038 LearningRate 0.000194 Epoch: 24 Global Step: 500800 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:06:12,400-Speed 2490.14 samples/sec Loss 1.7388 LearningRate 0.000194 Epoch: 24 Global Step: 500810 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:06:20,631-Speed 2488.43 samples/sec Loss 1.7556 LearningRate 0.000194 Epoch: 24 Global Step: 500820 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:06:28,800-Speed 2507.57 samples/sec Loss 1.7693 LearningRate 0.000194 Epoch: 24 Global Step: 500830 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:06:37,014-Speed 2493.67 samples/sec Loss 1.7801 LearningRate 0.000194 Epoch: 24 Global Step: 500840 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:06:45,228-Speed 2493.68 samples/sec Loss 1.7845 LearningRate 0.000194 Epoch: 24 Global Step: 500850 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:06:53,441-Speed 2494.16 samples/sec Loss 1.7992 LearningRate 0.000194 Epoch: 24 Global Step: 500860 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:07:01,653-Speed 2494.37 samples/sec Loss 1.7606 LearningRate 0.000194 Epoch: 24 Global Step: 500870 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:07:09,865-Speed 2494.36 samples/sec Loss 1.7872 LearningRate 0.000194 Epoch: 24 Global Step: 500880 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:07:18,020-Speed 2511.73 samples/sec Loss 1.7892 LearningRate 0.000194 Epoch: 24 Global Step: 500890 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:07:26,224-Speed 2496.62 samples/sec Loss 1.7502 LearningRate 0.000194 Epoch: 24 Global Step: 500900 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:07:34,430-Speed 2496.26 samples/sec Loss 1.7700 LearningRate 0.000194 Epoch: 24 Global Step: 500910 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:07:42,644-Speed 2493.70 samples/sec Loss 1.7288 LearningRate 0.000194 Epoch: 24 Global Step: 500920 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:07:50,854-Speed 2494.90 samples/sec Loss 1.7701 LearningRate 0.000194 Epoch: 24 Global Step: 500930 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:07:59,060-Speed 2496.14 samples/sec Loss 1.7740 LearningRate 0.000194 Epoch: 24 Global Step: 500940 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:08:07,211-Speed 2513.04 samples/sec Loss 1.7827 LearningRate 0.000194 Epoch: 24 Global Step: 500950 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:08:15,417-Speed 2496.24 samples/sec Loss 1.7550 LearningRate 0.000194 Epoch: 24 Global Step: 500960 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:08:23,622-Speed 2496.58 samples/sec Loss 1.7674 LearningRate 0.000194 Epoch: 24 Global Step: 500970 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:08:31,837-Speed 2493.24 samples/sec Loss 1.7264 LearningRate 0.000194 Epoch: 24 Global Step: 500980 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:08:40,046-Speed 2495.32 samples/sec Loss 1.7820 LearningRate 0.000194 Epoch: 24 Global Step: 500990 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:08:48,254-Speed 2495.83 samples/sec Loss 1.7473 LearningRate 0.000194 Epoch: 24 Global Step: 501000 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:08:56,416-Speed 2509.43 samples/sec Loss 1.7819 LearningRate 0.000194 Epoch: 24 Global Step: 501010 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:09:04,625-Speed 2495.26 samples/sec Loss 1.7808 LearningRate 0.000194 Epoch: 24 Global Step: 501020 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:09:12,831-Speed 2496.34 samples/sec Loss 1.7774 LearningRate 0.000194 Epoch: 24 Global Step: 501030 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:09:21,041-Speed 2494.90 samples/sec Loss 1.7428 LearningRate 0.000194 Epoch: 24 Global Step: 501040 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:09:29,253-Speed 2494.37 samples/sec Loss 1.7670 LearningRate 0.000194 Epoch: 24 Global Step: 501050 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:09:37,455-Speed 2497.20 samples/sec Loss 1.7685 LearningRate 0.000194 Epoch: 24 Global Step: 501060 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:09:45,612-Speed 2511.07 samples/sec Loss 1.7348 LearningRate 0.000194 Epoch: 24 Global Step: 501070 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:09:53,815-Speed 2497.32 samples/sec Loss 1.7839 LearningRate 0.000194 Epoch: 24 Global Step: 501080 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:10:02,020-Speed 2496.41 samples/sec Loss 1.7760 LearningRate 0.000194 Epoch: 24 Global Step: 501090 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:10:10,225-Speed 2496.42 samples/sec Loss 1.7310 LearningRate 0.000194 Epoch: 24 Global Step: 501100 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:10:18,430-Speed 2496.54 samples/sec Loss 1.7157 LearningRate 0.000194 Epoch: 24 Global Step: 501110 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:10:26,636-Speed 2495.90 samples/sec Loss 1.7688 LearningRate 0.000194 Epoch: 24 Global Step: 501120 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:10:34,802-Speed 2508.51 samples/sec Loss 1.7514 LearningRate 0.000194 Epoch: 24 Global Step: 501130 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:10:43,010-Speed 2495.40 samples/sec Loss 1.7763 LearningRate 0.000194 Epoch: 24 Global Step: 501140 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:10:51,221-Speed 2494.81 samples/sec Loss 1.7568 LearningRate 0.000193 Epoch: 24 Global Step: 501150 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:10:59,425-Speed 2496.80 samples/sec Loss 1.7299 LearningRate 0.000193 Epoch: 24 Global Step: 501160 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:11:07,629-Speed 2496.77 samples/sec Loss 1.7440 LearningRate 0.000193 Epoch: 24 Global Step: 501170 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:11:15,834-Speed 2496.54 samples/sec Loss 1.7693 LearningRate 0.000193 Epoch: 24 Global Step: 501180 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:11:23,983-Speed 2513.42 samples/sec Loss 1.7742 LearningRate 0.000193 Epoch: 24 Global Step: 501190 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:11:32,197-Speed 2494.23 samples/sec Loss 1.7688 LearningRate 0.000193 Epoch: 24 Global Step: 501200 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:11:40,404-Speed 2495.79 samples/sec Loss 1.7286 LearningRate 0.000193 Epoch: 24 Global Step: 501210 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:11:48,608-Speed 2496.81 samples/sec Loss 1.7971 LearningRate 0.000193 Epoch: 24 Global Step: 501220 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:11:56,815-Speed 2495.85 samples/sec Loss 1.7191 LearningRate 0.000193 Epoch: 24 Global Step: 501230 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:12:05,017-Speed 2497.27 samples/sec Loss 1.7474 LearningRate 0.000193 Epoch: 24 Global Step: 501240 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:12:13,170-Speed 2512.18 samples/sec Loss 1.7277 LearningRate 0.000193 Epoch: 24 Global Step: 501250 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:12:21,380-Speed 2495.12 samples/sec Loss 1.7335 LearningRate 0.000193 Epoch: 24 Global Step: 501260 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:12:29,589-Speed 2495.16 samples/sec Loss 1.7378 LearningRate 0.000193 Epoch: 24 Global Step: 501270 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:12:37,799-Speed 2494.97 samples/sec Loss 1.7298 LearningRate 0.000193 Epoch: 24 Global Step: 501280 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:12:46,009-Speed 2495.05 samples/sec Loss 1.7617 LearningRate 0.000193 Epoch: 24 Global Step: 501290 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:12:54,219-Speed 2495.05 samples/sec Loss 1.7675 LearningRate 0.000193 Epoch: 24 Global Step: 501300 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:13:02,378-Speed 2510.39 samples/sec Loss 1.7903 LearningRate 0.000193 Epoch: 24 Global Step: 501310 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:13:10,610-Speed 2488.13 samples/sec Loss 1.7893 LearningRate 0.000193 Epoch: 24 Global Step: 501320 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:13:18,822-Speed 2494.40 samples/sec Loss 1.8027 LearningRate 0.000193 Epoch: 24 Global Step: 501330 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:13:27,036-Speed 2493.69 samples/sec Loss 1.8048 LearningRate 0.000193 Epoch: 24 Global Step: 501340 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:13:35,247-Speed 2494.73 samples/sec Loss 1.7558 LearningRate 0.000193 Epoch: 24 Global Step: 501350 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:13:43,452-Speed 2496.21 samples/sec Loss 1.7681 LearningRate 0.000193 Epoch: 24 Global Step: 501360 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:13:51,605-Speed 2512.40 samples/sec Loss 1.8030 LearningRate 0.000193 Epoch: 24 Global Step: 501370 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:13:59,809-Speed 2496.70 samples/sec Loss 1.7547 LearningRate 0.000193 Epoch: 24 Global Step: 501380 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:14:08,017-Speed 2495.50 samples/sec Loss 1.7927 LearningRate 0.000193 Epoch: 24 Global Step: 501390 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:14:16,220-Speed 2497.11 samples/sec Loss 1.7764 LearningRate 0.000193 Epoch: 24 Global Step: 501400 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:14:24,438-Speed 2492.79 samples/sec Loss 1.7459 LearningRate 0.000193 Epoch: 24 Global Step: 501410 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:14:32,643-Speed 2496.38 samples/sec Loss 1.7480 LearningRate 0.000193 Epoch: 24 Global Step: 501420 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:14:40,790-Speed 2514.02 samples/sec Loss 1.7746 LearningRate 0.000193 Epoch: 24 Global Step: 501430 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:14:48,992-Speed 2497.57 samples/sec Loss 1.7429 LearningRate 0.000193 Epoch: 24 Global Step: 501440 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:14:57,195-Speed 2497.26 samples/sec Loss 1.7267 LearningRate 0.000193 Epoch: 24 Global Step: 501450 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:15:05,408-Speed 2494.02 samples/sec Loss 1.7062 LearningRate 0.000193 Epoch: 24 Global Step: 501460 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:15:13,612-Speed 2496.57 samples/sec Loss 1.7271 LearningRate 0.000193 Epoch: 24 Global Step: 501470 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:15:21,820-Speed 2495.59 samples/sec Loss 1.7919 LearningRate 0.000193 Epoch: 24 Global Step: 501480 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:15:29,980-Speed 2510.43 samples/sec Loss 1.7230 LearningRate 0.000193 Epoch: 24 Global Step: 501490 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:15:38,185-Speed 2496.31 samples/sec Loss 1.7167 LearningRate 0.000193 Epoch: 24 Global Step: 501500 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:15:46,392-Speed 2495.70 samples/sec Loss 1.7078 LearningRate 0.000193 Epoch: 24 Global Step: 501510 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:15:54,598-Speed 2496.15 samples/sec Loss 1.7252 LearningRate 0.000193 Epoch: 24 Global Step: 501520 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:16:02,804-Speed 2496.02 samples/sec Loss 1.7702 LearningRate 0.000193 Epoch: 24 Global Step: 501530 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:16:11,008-Speed 2496.71 samples/sec Loss 1.7422 LearningRate 0.000193 Epoch: 24 Global Step: 501540 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:16:19,166-Speed 2510.91 samples/sec Loss 1.7579 LearningRate 0.000193 Epoch: 24 Global Step: 501550 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:16:27,376-Speed 2494.71 samples/sec Loss 1.7584 LearningRate 0.000193 Epoch: 24 Global Step: 501560 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:16:35,584-Speed 2495.69 samples/sec Loss 1.7465 LearningRate 0.000193 Epoch: 24 Global Step: 501570 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:16:43,796-Speed 2494.25 samples/sec Loss 1.7489 LearningRate 0.000193 Epoch: 24 Global Step: 501580 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:16:52,012-Speed 2492.96 samples/sec Loss 1.7103 LearningRate 0.000193 Epoch: 24 Global Step: 501590 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:17:00,219-Speed 2496.03 samples/sec Loss 1.7385 LearningRate 0.000193 Epoch: 24 Global Step: 501600 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:17:08,370-Speed 2512.93 samples/sec Loss 1.7522 LearningRate 0.000193 Epoch: 24 Global Step: 501610 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:17:16,574-Speed 2496.53 samples/sec Loss 1.7736 LearningRate 0.000193 Epoch: 24 Global Step: 501620 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:17:24,780-Speed 2496.25 samples/sec Loss 1.7317 LearningRate 0.000193 Epoch: 24 Global Step: 501630 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:17:32,986-Speed 2496.05 samples/sec Loss 1.7256 LearningRate 0.000193 Epoch: 24 Global Step: 501640 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:17:41,190-Speed 2496.81 samples/sec Loss 1.7399 LearningRate 0.000193 Epoch: 24 Global Step: 501650 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:17:49,393-Speed 2497.05 samples/sec Loss 1.7505 LearningRate 0.000193 Epoch: 24 Global Step: 501660 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:17:57,555-Speed 2509.55 samples/sec Loss 1.7013 LearningRate 0.000193 Epoch: 24 Global Step: 501670 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:18:05,761-Speed 2495.99 samples/sec Loss 1.7241 LearningRate 0.000193 Epoch: 24 Global Step: 501680 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:18:13,964-Speed 2497.21 samples/sec Loss 1.7641 LearningRate 0.000193 Epoch: 24 Global Step: 501690 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:18:22,165-Speed 2497.34 samples/sec Loss 1.7549 LearningRate 0.000193 Epoch: 24 Global Step: 501700 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:18:30,368-Speed 2497.15 samples/sec Loss 1.7209 LearningRate 0.000193 Epoch: 24 Global Step: 501710 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:18:38,570-Speed 2497.62 samples/sec Loss 1.7149 LearningRate 0.000193 Epoch: 24 Global Step: 501720 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:18:46,723-Speed 2512.37 samples/sec Loss 1.7617 LearningRate 0.000193 Epoch: 24 Global Step: 501730 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:18:54,927-Speed 2496.71 samples/sec Loss 1.7341 LearningRate 0.000193 Epoch: 24 Global Step: 501740 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:19:03,130-Speed 2496.92 samples/sec Loss 1.7733 LearningRate 0.000193 Epoch: 24 Global Step: 501750 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:19:11,348-Speed 2492.62 samples/sec Loss 1.7569 LearningRate 0.000193 Epoch: 24 Global Step: 501760 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:19:19,558-Speed 2494.72 samples/sec Loss 1.7350 LearningRate 0.000193 Epoch: 24 Global Step: 501770 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:19:27,760-Speed 2497.37 samples/sec Loss 1.7689 LearningRate 0.000193 Epoch: 24 Global Step: 501780 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:19:35,914-Speed 2512.46 samples/sec Loss 1.7003 LearningRate 0.000193 Epoch: 24 Global Step: 501790 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:19:44,119-Speed 2496.22 samples/sec Loss 1.7636 LearningRate 0.000193 Epoch: 24 Global Step: 501800 Fp16 Grad Scale: 32768 Required: 75 hours 
Training: 2022-07-10 09:19:52,322-Speed 2496.93 samples/sec Loss 1.7382 LearningRate 0.000193 Epoch: 24 Global Step: 501810 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:20:00,526-Speed 2496.92 samples/sec Loss 1.7748 LearningRate 0.000193 Epoch: 24 Global Step: 501820 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:20:08,726-Speed 2497.89 samples/sec Loss 1.7117 LearningRate 0.000193 Epoch: 24 Global Step: 501830 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:20:16,931-Speed 2496.42 samples/sec Loss 1.7606 LearningRate 0.000193 Epoch: 24 Global Step: 501840 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:20:25,097-Speed 2508.14 samples/sec Loss 1.8075 LearningRate 0.000193 Epoch: 24 Global Step: 501850 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:20:33,299-Speed 2497.35 samples/sec Loss 1.7986 LearningRate 0.000193 Epoch: 24 Global Step: 501860 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:20:41,510-Speed 2494.68 samples/sec Loss 1.7604 LearningRate 0.000193 Epoch: 24 Global Step: 501870 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:20:49,712-Speed 2497.23 samples/sec Loss 1.7603 LearningRate 0.000193 Epoch: 24 Global Step: 501880 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:20:57,923-Speed 2494.73 samples/sec Loss 1.7649 LearningRate 0.000193 Epoch: 24 Global Step: 501890 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:21:06,131-Speed 2495.58 samples/sec Loss 1.7391 LearningRate 0.000193 Epoch: 24 Global Step: 501900 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:21:14,280-Speed 2513.75 samples/sec Loss 1.7565 LearningRate 0.000193 Epoch: 24 Global Step: 501910 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:21:22,486-Speed 2496.28 samples/sec Loss 1.7190 LearningRate 0.000193 Epoch: 24 Global Step: 501920 Fp16 Grad Scale: 32768 Required: 75 hours 
Training: 2022-07-10 09:21:30,691-Speed 2496.33 samples/sec Loss 1.7781 LearningRate 0.000193 Epoch: 24 Global Step: 501930 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:21:38,894-Speed 2497.38 samples/sec Loss 1.7834 LearningRate 0.000193 Epoch: 24 Global Step: 501940 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:21:47,103-Speed 2495.36 samples/sec Loss 1.7906 LearningRate 0.000193 Epoch: 24 Global Step: 501950 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:21:55,310-Speed 2495.65 samples/sec Loss 1.7737 LearningRate 0.000193 Epoch: 24 Global Step: 501960 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:22:03,469-Speed 2510.75 samples/sec Loss 1.7303 LearningRate 0.000193 Epoch: 24 Global Step: 501970 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:22:11,686-Speed 2492.70 samples/sec Loss 1.8191 LearningRate 0.000193 Epoch: 24 Global Step: 501980 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:22:19,890-Speed 2496.84 samples/sec Loss 1.7609 LearningRate 0.000192 Epoch: 24 Global Step: 501990 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:22:28,094-Speed 2496.48 samples/sec Loss 1.8163 LearningRate 0.000192 Epoch: 24 Global Step: 502000 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:22:36,301-Speed 2495.88 samples/sec Loss 1.7697 LearningRate 0.000192 Epoch: 24 Global Step: 502010 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:22:44,505-Speed 2496.93 samples/sec Loss 1.7802 LearningRate 0.000192 Epoch: 24 Global Step: 502020 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:22:52,657-Speed 2512.64 samples/sec Loss 1.7761 LearningRate 0.000192 Epoch: 24 Global Step: 502030 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:23:00,860-Speed 2496.81 samples/sec Loss 1.7714 LearningRate 0.000192 Epoch: 24 Global Step: 502040 Fp16 Grad Scale: 32768 Required: 75 hours 
Training: 2022-07-10 09:23:09,079-Speed 2492.19 samples/sec Loss 1.7799 LearningRate 0.000192 Epoch: 24 Global Step: 502050 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:23:17,284-Speed 2496.65 samples/sec Loss 1.7657 LearningRate 0.000192 Epoch: 24 Global Step: 502060 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:23:25,487-Speed 2497.27 samples/sec Loss 1.7536 LearningRate 0.000192 Epoch: 24 Global Step: 502070 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:23:33,694-Speed 2495.51 samples/sec Loss 1.7417 LearningRate 0.000192 Epoch: 24 Global Step: 502080 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:23:41,847-Speed 2512.62 samples/sec Loss 1.7182 LearningRate 0.000192 Epoch: 24 Global Step: 502090 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:23:50,051-Speed 2496.65 samples/sec Loss 1.7320 LearningRate 0.000192 Epoch: 24 Global Step: 502100 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:23:58,259-Speed 2495.74 samples/sec Loss 1.7634 LearningRate 0.000192 Epoch: 24 Global Step: 502110 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:24:06,462-Speed 2496.80 samples/sec Loss 1.7359 LearningRate 0.000192 Epoch: 24 Global Step: 502120 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:24:14,667-Speed 2497.02 samples/sec Loss 1.7663 LearningRate 0.000192 Epoch: 24 Global Step: 502130 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:24:22,869-Speed 2497.30 samples/sec Loss 1.7268 LearningRate 0.000192 Epoch: 24 Global Step: 502140 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:24:31,024-Speed 2511.76 samples/sec Loss 1.6912 LearningRate 0.000192 Epoch: 24 Global Step: 502150 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:24:39,231-Speed 2495.95 samples/sec Loss 1.7434 LearningRate 0.000192 Epoch: 24 Global Step: 502160 Fp16 Grad Scale: 32768 Required: 75 hours 
Training: 2022-07-10 09:24:47,439-Speed 2495.85 samples/sec Loss 1.7888 LearningRate 0.000192 Epoch: 24 Global Step: 502170 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:24:55,645-Speed 2496.37 samples/sec Loss 1.7703 LearningRate 0.000192 Epoch: 24 Global Step: 502180 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:25:03,851-Speed 2496.07 samples/sec Loss 1.7439 LearningRate 0.000192 Epoch: 24 Global Step: 502190 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:25:12,055-Speed 2496.59 samples/sec Loss 1.7294 LearningRate 0.000192 Epoch: 24 Global Step: 502200 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:25:20,211-Speed 2511.35 samples/sec Loss 1.7822 LearningRate 0.000192 Epoch: 24 Global Step: 502210 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:25:28,379-Speed 2507.96 samples/sec Loss 1.7699 LearningRate 0.000192 Epoch: 24 Global Step: 502220 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:25:36,582-Speed 2496.70 samples/sec Loss 1.7490 LearningRate 0.000192 Epoch: 24 Global Step: 502230 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:25:44,794-Speed 2494.36 samples/sec Loss 1.6958 LearningRate 0.000192 Epoch: 24 Global Step: 502240 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:25:52,998-Speed 2496.67 samples/sec Loss 1.7345 LearningRate 0.000192 Epoch: 24 Global Step: 502250 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:26:01,206-Speed 2495.79 samples/sec Loss 1.7456 LearningRate 0.000192 Epoch: 24 Global Step: 502260 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:26:09,360-Speed 2511.82 samples/sec Loss 1.7273 LearningRate 0.000192 Epoch: 24 Global Step: 502270 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:26:17,563-Speed 2497.19 samples/sec Loss 1.7750 LearningRate 0.000192 Epoch: 24 Global Step: 502280 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:26:25,769-Speed 2496.24 samples/sec Loss 1.7350 LearningRate 0.000192 Epoch: 24 Global Step: 502290 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:26:33,974-Speed 2496.32 samples/sec Loss 1.7324 LearningRate 0.000192 Epoch: 24 Global Step: 502300 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:26:42,177-Speed 2496.84 samples/sec Loss 1.7327 LearningRate 0.000192 Epoch: 24 Global Step: 502310 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:26:50,382-Speed 2496.91 samples/sec Loss 1.7459 LearningRate 0.000192 Epoch: 24 Global Step: 502320 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:26:58,532-Speed 2513.16 samples/sec Loss 1.7655 LearningRate 0.000192 Epoch: 24 Global Step: 502330 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:27:06,749-Speed 2492.96 samples/sec Loss 1.7492 LearningRate 0.000192 Epoch: 24 Global Step: 502340 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:27:14,951-Speed 2497.39 samples/sec Loss 1.7599 LearningRate 0.000192 Epoch: 24 Global Step: 502350 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:27:23,154-Speed 2497.02 samples/sec Loss 1.7587 LearningRate 0.000192 Epoch: 24 Global Step: 502360 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:27:31,360-Speed 2496.13 samples/sec Loss 1.7896 LearningRate 0.000192 Epoch: 24 Global Step: 502370 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:27:39,567-Speed 2496.01 samples/sec Loss 1.7483 LearningRate 0.000192 Epoch: 24 Global Step: 502380 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:27:47,720-Speed 2512.17 samples/sec Loss 1.7646 LearningRate 0.000192 Epoch: 24 Global Step: 502390 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:27:55,931-Speed 2494.51 samples/sec Loss 1.8035 LearningRate 0.000192 Epoch: 24 Global Step: 502400 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:28:04,136-Speed 2496.57 samples/sec Loss 1.7577 LearningRate 0.000192 Epoch: 24 Global Step: 502410 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:28:12,344-Speed 2495.51 samples/sec Loss 1.6964 LearningRate 0.000192 Epoch: 24 Global Step: 502420 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:28:20,567-Speed 2491.56 samples/sec Loss 1.7333 LearningRate 0.000192 Epoch: 24 Global Step: 502430 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:28:28,771-Speed 2496.63 samples/sec Loss 1.7314 LearningRate 0.000192 Epoch: 24 Global Step: 502440 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:28:36,924-Speed 2512.45 samples/sec Loss 1.7540 LearningRate 0.000192 Epoch: 24 Global Step: 502450 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:28:45,127-Speed 2497.08 samples/sec Loss 1.7543 LearningRate 0.000192 Epoch: 24 Global Step: 502460 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:28:53,330-Speed 2496.93 samples/sec Loss 1.7673 LearningRate 0.000192 Epoch: 24 Global Step: 502470 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:29:01,536-Speed 2496.48 samples/sec Loss 1.7459 LearningRate 0.000192 Epoch: 24 Global Step: 502480 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:29:09,739-Speed 2497.19 samples/sec Loss 1.8257 LearningRate 0.000192 Epoch: 24 Global Step: 502490 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:29:17,946-Speed 2495.61 samples/sec Loss 1.7439 LearningRate 0.000192 Epoch: 24 Global Step: 502500 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:29:26,109-Speed 2509.54 samples/sec Loss 1.7692 LearningRate 0.000192 Epoch: 24 Global Step: 502510 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:29:34,312-Speed 2496.93 samples/sec Loss 1.7477 LearningRate 0.000192 Epoch: 24 Global Step: 502520 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:29:42,517-Speed 2496.41 samples/sec Loss 1.7827 LearningRate 0.000192 Epoch: 24 Global Step: 502530 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:29:50,723-Speed 2496.30 samples/sec Loss 1.7604 LearningRate 0.000192 Epoch: 24 Global Step: 502540 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:29:58,927-Speed 2496.84 samples/sec Loss 1.7767 LearningRate 0.000192 Epoch: 24 Global Step: 502550 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:30:07,132-Speed 2496.26 samples/sec Loss 1.7943 LearningRate 0.000192 Epoch: 24 Global Step: 502560 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:30:15,285-Speed 2512.50 samples/sec Loss 1.7544 LearningRate 0.000192 Epoch: 24 Global Step: 502570 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:30:23,494-Speed 2495.13 samples/sec Loss 1.7469 LearningRate 0.000192 Epoch: 24 Global Step: 502580 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:30:31,700-Speed 2496.10 samples/sec Loss 1.7497 LearningRate 0.000192 Epoch: 24 Global Step: 502590 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:30:39,910-Speed 2494.94 samples/sec Loss 1.7624 LearningRate 0.000192 Epoch: 24 Global Step: 502600 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:30:48,118-Speed 2495.47 samples/sec Loss 1.7633 LearningRate 0.000192 Epoch: 24 Global Step: 502610 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:30:56,326-Speed 2495.73 samples/sec Loss 1.7426 LearningRate 0.000192 Epoch: 24 Global Step: 502620 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:31:04,477-Speed 2512.98 samples/sec Loss 1.7771 LearningRate 0.000192 Epoch: 24 Global Step: 502630 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:31:12,682-Speed 2496.63 samples/sec Loss 1.7754 LearningRate 0.000192 Epoch: 24 Global Step: 502640 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:31:20,898-Speed 2493.17 samples/sec Loss 1.7392 LearningRate 0.000192 Epoch: 24 Global Step: 502650 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:31:29,104-Speed 2495.91 samples/sec Loss 1.7350 LearningRate 0.000192 Epoch: 24 Global Step: 502660 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:31:37,306-Speed 2497.28 samples/sec Loss 1.7430 LearningRate 0.000192 Epoch: 24 Global Step: 502670 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:31:45,508-Speed 2497.40 samples/sec Loss 1.7689 LearningRate 0.000192 Epoch: 24 Global Step: 502680 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:31:53,657-Speed 2513.56 samples/sec Loss 1.7682 LearningRate 0.000192 Epoch: 24 Global Step: 502690 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:32:01,860-Speed 2497.11 samples/sec Loss 1.7833 LearningRate 0.000192 Epoch: 24 Global Step: 502700 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:32:10,064-Speed 2496.62 samples/sec Loss 1.7355 LearningRate 0.000192 Epoch: 24 Global Step: 502710 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:32:18,267-Speed 2497.08 samples/sec Loss 1.7485 LearningRate 0.000192 Epoch: 24 Global Step: 502720 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:32:26,467-Speed 2498.05 samples/sec Loss 1.7736 LearningRate 0.000192 Epoch: 24 Global Step: 502730 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:32:34,673-Speed 2496.25 samples/sec Loss 1.7572 LearningRate 0.000192 Epoch: 24 Global Step: 502740 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:32:42,821-Speed 2513.83 samples/sec Loss 1.7593 LearningRate 0.000192 Epoch: 24 Global Step: 502750 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:32:51,024-Speed 2497.10 samples/sec Loss 1.7640 LearningRate 0.000192 Epoch: 24 Global Step: 502760 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:32:59,230-Speed 2495.87 samples/sec Loss 1.7604 LearningRate 0.000192 Epoch: 24 Global Step: 502770 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:33:07,438-Speed 2495.63 samples/sec Loss 1.7810 LearningRate 0.000192 Epoch: 24 Global Step: 502780 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:33:15,660-Speed 2491.43 samples/sec Loss 1.7469 LearningRate 0.000192 Epoch: 24 Global Step: 502790 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:33:23,862-Speed 2497.48 samples/sec Loss 1.8279 LearningRate 0.000192 Epoch: 24 Global Step: 502800 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:33:32,015-Speed 2512.39 samples/sec Loss 1.7804 LearningRate 0.000192 Epoch: 24 Global Step: 502810 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:33:40,219-Speed 2496.67 samples/sec Loss 1.7239 LearningRate 0.000192 Epoch: 24 Global Step: 502820 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:33:48,422-Speed 2496.98 samples/sec Loss 1.7656 LearningRate 0.000192 Epoch: 24 Global Step: 502830 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:33:56,638-Speed 2493.29 samples/sec Loss 1.7974 LearningRate 0.000192 Epoch: 24 Global Step: 502840 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:34:04,842-Speed 2496.73 samples/sec Loss 1.6930 LearningRate 0.000191 Epoch: 24 Global Step: 502850 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:34:13,044-Speed 2497.26 samples/sec Loss 1.8199 LearningRate 0.000191 Epoch: 24 Global Step: 502860 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:34:21,192-Speed 2513.91 samples/sec Loss 1.7658 LearningRate 0.000191 Epoch: 24 Global Step: 502870 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:34:29,395-Speed 2497.11 samples/sec Loss 1.7904 LearningRate 0.000191 Epoch: 24 Global Step: 502880 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:34:37,608-Speed 2493.83 samples/sec Loss 1.7594 LearningRate 0.000191 Epoch: 24 Global Step: 502890 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:34:45,815-Speed 2496.31 samples/sec Loss 1.7700 LearningRate 0.000191 Epoch: 24 Global Step: 502900 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:34:54,030-Speed 2493.37 samples/sec Loss 1.7723 LearningRate 0.000191 Epoch: 24 Global Step: 502910 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:35:02,246-Speed 2493.26 samples/sec Loss 1.7583 LearningRate 0.000191 Epoch: 24 Global Step: 502920 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:35:10,410-Speed 2508.85 samples/sec Loss 1.7993 LearningRate 0.000191 Epoch: 24 Global Step: 502930 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:35:18,613-Speed 2496.82 samples/sec Loss 1.7834 LearningRate 0.000191 Epoch: 24 Global Step: 502940 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:35:26,817-Speed 2496.82 samples/sec Loss 1.7477 LearningRate 0.000191 Epoch: 24 Global Step: 502950 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:35:35,035-Speed 2492.49 samples/sec Loss 1.7578 LearningRate 0.000191 Epoch: 24 Global Step: 502960 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:35:43,236-Speed 2497.61 samples/sec Loss 1.7544 LearningRate 0.000191 Epoch: 24 Global Step: 502970 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:35:51,440-Speed 2496.83 samples/sec Loss 1.7153 LearningRate 0.000191 Epoch: 24 Global Step: 502980 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:35:59,589-Speed 2513.44 samples/sec Loss 1.8174 LearningRate 0.000191 Epoch: 24 Global Step: 502990 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:36:07,797-Speed 2495.72 samples/sec Loss 1.7394 LearningRate 0.000191 Epoch: 24 Global Step: 503000 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:36:16,011-Speed 2493.72 samples/sec Loss 1.7576 LearningRate 0.000191 Epoch: 24 Global Step: 503010 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:36:24,214-Speed 2496.97 samples/sec Loss 1.7589 LearningRate 0.000191 Epoch: 24 Global Step: 503020 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:36:32,418-Speed 2496.82 samples/sec Loss 1.7277 LearningRate 0.000191 Epoch: 24 Global Step: 503030 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:36:40,624-Speed 2496.12 samples/sec Loss 1.7871 LearningRate 0.000191 Epoch: 24 Global Step: 503040 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:36:48,775-Speed 2512.85 samples/sec Loss 1.7519 LearningRate 0.000191 Epoch: 24 Global Step: 503050 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:36:56,978-Speed 2496.96 samples/sec Loss 1.7970 LearningRate 0.000191 Epoch: 24 Global Step: 503060 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:37:05,185-Speed 2495.81 samples/sec Loss 1.7351 LearningRate 0.000191 Epoch: 24 Global Step: 503070 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:37:13,389-Speed 2496.71 samples/sec Loss 1.7099 LearningRate 0.000191 Epoch: 24 Global Step: 503080 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:37:21,593-Speed 2496.77 samples/sec Loss 1.7621 LearningRate 0.000191 Epoch: 24 Global Step: 503090 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:37:29,795-Speed 2497.39 samples/sec Loss 1.7574 LearningRate 0.000191 Epoch: 24 Global Step: 503100 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:37:37,945-Speed 2513.24 samples/sec Loss 1.7393 LearningRate 0.000191 Epoch: 24 Global Step: 503110 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:37:46,149-Speed 2497.04 samples/sec Loss 1.7322 LearningRate 0.000191 Epoch: 24 Global Step: 503120 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:37:54,354-Speed 2496.57 samples/sec Loss 1.7117 LearningRate 0.000191 Epoch: 24 Global Step: 503130 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:38:02,556-Speed 2497.31 samples/sec Loss 1.7450 LearningRate 0.000191 Epoch: 24 Global Step: 503140 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:38:10,759-Speed 2496.95 samples/sec Loss 1.7569 LearningRate 0.000191 Epoch: 24 Global Step: 503150 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:38:18,968-Speed 2495.47 samples/sec Loss 1.7570 LearningRate 0.000191 Epoch: 24 Global Step: 503160 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:38:27,116-Speed 2513.98 samples/sec Loss 1.7931 LearningRate 0.000191 Epoch: 24 Global Step: 503170 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:38:35,325-Speed 2495.25 samples/sec Loss 1.7729 LearningRate 0.000191 Epoch: 24 Global Step: 503180 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:38:43,531-Speed 2496.23 samples/sec Loss 1.7703 LearningRate 0.000191 Epoch: 24 Global Step: 503190 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:38:51,733-Speed 2497.11 samples/sec Loss 1.7202 LearningRate 0.000191 Epoch: 24 Global Step: 503200 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:38:59,935-Speed 2497.54 samples/sec Loss 1.7462 LearningRate 0.000191 Epoch: 24 Global Step: 503210 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:39:08,140-Speed 2496.33 samples/sec Loss 1.7275 LearningRate 0.000191 Epoch: 24 Global Step: 503220 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:39:16,288-Speed 2514.06 samples/sec Loss 1.7753 LearningRate 0.000191 Epoch: 24 Global Step: 503230 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:39:24,491-Speed 2497.01 samples/sec Loss 1.7494 LearningRate 0.000191 Epoch: 24 Global Step: 503240 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:39:32,694-Speed 2496.85 samples/sec Loss 1.7709 LearningRate 0.000191 Epoch: 24 Global Step: 503250 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:39:40,907-Speed 2494.24 samples/sec Loss 1.7850 LearningRate 0.000191 Epoch: 24 Global Step: 503260 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:39:49,122-Speed 2493.37 samples/sec Loss 1.7518 LearningRate 0.000191 Epoch: 24 Global Step: 503270 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:39:57,326-Speed 2496.89 samples/sec Loss 1.7704 LearningRate 0.000191 Epoch: 24 Global Step: 503280 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:40:05,475-Speed 2513.55 samples/sec Loss 1.7551 LearningRate 0.000191 Epoch: 24 Global Step: 503290 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:40:13,681-Speed 2496.21 samples/sec Loss 1.7801 LearningRate 0.000191 Epoch: 24 Global Step: 503300 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:40:21,885-Speed 2496.72 samples/sec Loss 1.7552 LearningRate 0.000191 Epoch: 24 Global Step: 503310 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:40:30,088-Speed 2497.06 samples/sec Loss 1.7233 LearningRate 0.000191 Epoch: 24 Global Step: 503320 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:40:38,292-Speed 2496.84 samples/sec Loss 1.7527 LearningRate 0.000191 Epoch: 24 Global Step: 503330 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:40:46,493-Speed 2497.39 samples/sec Loss 1.7402 LearningRate 0.000191 Epoch: 24 Global Step: 503340 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:40:54,656-Speed 2509.52 samples/sec Loss 1.7233 LearningRate 0.000191 Epoch: 24 Global Step: 503350 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:41:02,858-Speed 2497.25 samples/sec Loss 1.7254 LearningRate 0.000191 Epoch: 24 Global Step: 503360 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:41:11,062-Speed 2497.01 samples/sec Loss 1.7065 LearningRate 0.000191 Epoch: 24 Global Step: 503370 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:41:19,265-Speed 2496.76 samples/sec Loss 1.7497 LearningRate 0.000191 Epoch: 24 Global Step: 503380 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:41:27,467-Speed 2497.50 samples/sec Loss 1.7464 LearningRate 0.000191 Epoch: 24 Global Step: 503390 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:41:35,672-Speed 2496.49 samples/sec Loss 1.7775 LearningRate 0.000191 Epoch: 24 Global Step: 503400 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:41:43,826-Speed 2511.96 samples/sec Loss 1.7128 LearningRate 0.000191 Epoch: 24 Global Step: 503410 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:41:52,028-Speed 2497.32 samples/sec Loss 1.7650 LearningRate 0.000191 Epoch: 24 Global Step: 503420 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:42:00,234-Speed 2496.23 samples/sec Loss 1.7640 LearningRate 0.000191 Epoch: 24 Global Step: 503430 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:42:08,439-Speed 2496.54 samples/sec Loss 1.7312 LearningRate 0.000191 Epoch: 24 Global Step: 503440 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:42:16,642-Speed 2496.84 samples/sec Loss 1.7225 LearningRate 0.000191 Epoch: 24 Global Step: 503450 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:42:24,852-Speed 2494.91 samples/sec Loss 1.7747 LearningRate 0.000191 Epoch: 24 Global Step: 503460 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:42:33,003-Speed 2512.95 samples/sec Loss 1.7299 LearningRate 0.000191 Epoch: 24 Global Step: 503470 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:42:41,208-Speed 2496.57 samples/sec Loss 1.7250 LearningRate 0.000191 Epoch: 24 Global Step: 503480 Fp16 Grad Scale: 32768 Required: 75 hours 
Training: 2022-07-10 09:42:49,413-Speed 2496.41 samples/sec Loss 1.7405 LearningRate 0.000191 Epoch: 24 Global Step: 503490 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:42:57,626-Speed 2494.09 samples/sec Loss 1.7139 LearningRate 0.000191 Epoch: 24 Global Step: 503500 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:43:05,835-Speed 2495.15 samples/sec Loss 1.7327 LearningRate 0.000191 Epoch: 24 Global Step: 503510 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:43:14,043-Speed 2495.84 samples/sec Loss 1.7592 LearningRate 0.000191 Epoch: 24 Global Step: 503520 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:43:22,194-Speed 2512.90 samples/sec Loss 1.7577 LearningRate 0.000191 Epoch: 24 Global Step: 503530 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:43:30,396-Speed 2497.08 samples/sec Loss 1.6936 LearningRate 0.000191 Epoch: 24 Global Step: 503540 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:43:38,598-Speed 2497.40 samples/sec Loss 1.7713 LearningRate 0.000191 Epoch: 24 Global Step: 503550 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:43:46,800-Speed 2497.38 samples/sec Loss 1.7726 LearningRate 0.000191 Epoch: 24 Global Step: 503560 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:43:55,006-Speed 2496.16 samples/sec Loss 1.7599 LearningRate 0.000191 Epoch: 24 Global Step: 503570 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:44:03,224-Speed 2492.55 samples/sec Loss 1.7570 LearningRate 0.000191 Epoch: 24 Global Step: 503580 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:44:11,378-Speed 2511.93 samples/sec Loss 1.7351 LearningRate 0.000191 Epoch: 24 Global Step: 503590 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:44:19,598-Speed 2491.69 samples/sec Loss 1.7742 LearningRate 0.000191 Epoch: 24 Global Step: 503600 Fp16 Grad Scale: 32768 Required: 75 hours 
Training: 2022-07-10 09:44:27,807-Speed 2495.48 samples/sec Loss 1.7566 LearningRate 0.000191 Epoch: 24 Global Step: 503610 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:44:36,015-Speed 2495.48 samples/sec Loss 1.7384 LearningRate 0.000191 Epoch: 24 Global Step: 503620 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:44:44,224-Speed 2495.35 samples/sec Loss 1.7773 LearningRate 0.000191 Epoch: 24 Global Step: 503630 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:44:52,431-Speed 2496.06 samples/sec Loss 1.7824 LearningRate 0.000191 Epoch: 24 Global Step: 503640 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:45:00,585-Speed 2512.04 samples/sec Loss 1.7269 LearningRate 0.000191 Epoch: 24 Global Step: 503650 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:45:08,797-Speed 2494.46 samples/sec Loss 1.7581 LearningRate 0.000191 Epoch: 24 Global Step: 503660 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:45:17,020-Speed 2490.95 samples/sec Loss 1.7338 LearningRate 0.000191 Epoch: 24 Global Step: 503670 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:45:25,232-Speed 2494.10 samples/sec Loss 1.7543 LearningRate 0.000191 Epoch: 24 Global Step: 503680 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:45:33,456-Speed 2490.68 samples/sec Loss 1.7615 LearningRate 0.000191 Epoch: 24 Global Step: 503690 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:45:41,673-Speed 2492.88 samples/sec Loss 1.7456 LearningRate 0.000190 Epoch: 24 Global Step: 503700 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:45:49,830-Speed 2511.09 samples/sec Loss 1.7654 LearningRate 0.000190 Epoch: 24 Global Step: 503710 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:45:58,039-Speed 2495.54 samples/sec Loss 1.7706 LearningRate 0.000190 Epoch: 24 Global Step: 503720 Fp16 Grad Scale: 32768 Required: 75 hours 
Training: 2022-07-10 09:46:06,254-Speed 2493.31 samples/sec Loss 1.7646 LearningRate 0.000190 Epoch: 24 Global Step: 503730 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:46:14,461-Speed 2496.02 samples/sec Loss 1.7515 LearningRate 0.000190 Epoch: 24 Global Step: 503740 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:46:22,670-Speed 2495.32 samples/sec Loss 1.7778 LearningRate 0.000190 Epoch: 24 Global Step: 503750 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:46:30,878-Speed 2495.50 samples/sec Loss 1.7945 LearningRate 0.000190 Epoch: 24 Global Step: 503760 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:46:39,028-Speed 2513.20 samples/sec Loss 1.8114 LearningRate 0.000190 Epoch: 24 Global Step: 503770 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:46:47,236-Speed 2495.61 samples/sec Loss 1.8097 LearningRate 0.000190 Epoch: 24 Global Step: 503780 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:46:55,440-Speed 2497.25 samples/sec Loss 1.7716 LearningRate 0.000190 Epoch: 24 Global Step: 503790 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:47:03,646-Speed 2495.96 samples/sec Loss 1.7690 LearningRate 0.000190 Epoch: 24 Global Step: 503800 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:47:11,851-Speed 2496.98 samples/sec Loss 1.7864 LearningRate 0.000190 Epoch: 24 Global Step: 503810 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:47:20,055-Speed 2496.45 samples/sec Loss 1.7761 LearningRate 0.000190 Epoch: 24 Global Step: 503820 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:47:28,207-Speed 2513.05 samples/sec Loss 1.7354 LearningRate 0.000190 Epoch: 24 Global Step: 503830 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:47:36,417-Speed 2494.77 samples/sec Loss 1.7047 LearningRate 0.000190 Epoch: 24 Global Step: 503840 Fp16 Grad Scale: 32768 Required: 75 hours 
Training: 2022-07-10 09:47:44,620-Speed 2497.13 samples/sec Loss 1.7542 LearningRate 0.000190 Epoch: 24 Global Step: 503850 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:47:52,825-Speed 2496.41 samples/sec Loss 1.7333 LearningRate 0.000190 Epoch: 24 Global Step: 503860 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:48:01,026-Speed 2497.57 samples/sec Loss 1.7993 LearningRate 0.000190 Epoch: 24 Global Step: 503870 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:48:09,229-Speed 2496.91 samples/sec Loss 1.7268 LearningRate 0.000190 Epoch: 24 Global Step: 503880 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:48:17,385-Speed 2511.41 samples/sec Loss 1.7415 LearningRate 0.000190 Epoch: 24 Global Step: 503890 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:48:25,593-Speed 2495.72 samples/sec Loss 1.7358 LearningRate 0.000190 Epoch: 24 Global Step: 503900 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:48:33,799-Speed 2495.99 samples/sec Loss 1.7520 LearningRate 0.000190 Epoch: 24 Global Step: 503910 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-07-10 09:48:41,962-Speed 2509.34 samples/sec Loss 1.7538 LearningRate 0.000190 Epoch: 24 Global Step: 503920 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:48:50,167-Speed 2496.35 samples/sec Loss 1.7172 LearningRate 0.000190 Epoch: 24 Global Step: 503930 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:48:58,376-Speed 2495.36 samples/sec Loss 1.6996 LearningRate 0.000190 Epoch: 24 Global Step: 503940 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:49:06,537-Speed 2509.82 samples/sec Loss 1.7421 LearningRate 0.000190 Epoch: 24 Global Step: 503950 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:49:14,742-Speed 2496.41 samples/sec Loss 1.7435 LearningRate 0.000190 Epoch: 24 Global Step: 503960 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:49:22,947-Speed 2496.33 samples/sec Loss 1.7543 LearningRate 0.000190 Epoch: 24 Global Step: 503970 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:49:31,152-Speed 2496.55 samples/sec Loss 1.7043 LearningRate 0.000190 Epoch: 24 Global Step: 503980 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:49:39,366-Speed 2493.76 samples/sec Loss 1.7294 LearningRate 0.000190 Epoch: 24 Global Step: 503990 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:49:47,574-Speed 2495.58 samples/sec Loss 1.7554 LearningRate 0.000190 Epoch: 24 Global Step: 504000 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:49:55,725-Speed 2512.74 samples/sec Loss 1.7908 LearningRate 0.000190 Epoch: 24 Global Step: 504010 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:50:03,931-Speed 2496.42 samples/sec Loss 1.7456 LearningRate 0.000190 Epoch: 24 Global Step: 504020 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:50:12,134-Speed 2497.12 samples/sec Loss 1.7828 LearningRate 0.000190 Epoch: 24 Global Step: 504030 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:50:20,340-Speed 2495.99 samples/sec Loss 1.7472 LearningRate 0.000190 Epoch: 24 Global Step: 504040 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:50:28,544-Speed 2496.80 samples/sec Loss 1.8174 LearningRate 0.000190 Epoch: 24 Global Step: 504050 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:50:36,761-Speed 2493.39 samples/sec Loss 1.7588 LearningRate 0.000190 Epoch: 24 Global Step: 504060 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:50:44,912-Speed 2512.95 samples/sec Loss 1.7647 LearningRate 0.000190 Epoch: 24 Global Step: 504070 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:50:53,115-Speed 2497.19 samples/sec Loss 1.7436 LearningRate 0.000190 Epoch: 24 Global Step: 504080 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-07-10 09:51:01,317-Speed 2497.23 samples/sec Loss 1.7610 LearningRate 0.000190 Epoch: 24 Global Step: 504090 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:51:09,526-Speed 2495.34 samples/sec Loss 1.7419 LearningRate 0.000190 Epoch: 24 Global Step: 504100 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:51:17,731-Speed 2496.32 samples/sec Loss 1.7553 LearningRate 0.000190 Epoch: 24 Global Step: 504110 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:51:25,935-Speed 2496.86 samples/sec Loss 1.6988 LearningRate 0.000190 Epoch: 24 Global Step: 504120 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:51:34,083-Speed 2514.19 samples/sec Loss 1.7854 LearningRate 0.000190 Epoch: 24 Global Step: 504130 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:51:42,286-Speed 2496.95 samples/sec Loss 1.7671 LearningRate 0.000190 Epoch: 24 Global Step: 504140 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:51:50,490-Speed 2496.87 samples/sec Loss 1.7109 LearningRate 0.000190 Epoch: 24 Global Step: 504150 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-07-10 09:51:58,697-Speed 2495.64 samples/sec Loss 1.7633 LearningRate 0.000190 Epoch: 24 Global Step: 504160 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:52:06,902-Speed 2496.64 samples/sec Loss 1.7661 LearningRate 0.000190 Epoch: 24 Global Step: 504170 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:52:15,105-Speed 2496.96 samples/sec Loss 1.7419 LearningRate 0.000190 Epoch: 24 Global Step: 504180 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:52:23,264-Speed 2510.41 samples/sec Loss 1.7376 LearningRate 0.000190 Epoch: 24 Global Step: 504190 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:52:31,470-Speed 2496.17 samples/sec Loss 1.7088 LearningRate 0.000190 Epoch: 24 Global Step: 504200 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 09:52:39,675-Speed 2496.39 samples/sec Loss 1.7279 LearningRate 0.000190 Epoch: 24 Global Step: 504210 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:52:47,879-Speed 2496.82 samples/sec Loss 1.7111 LearningRate 0.000190 Epoch: 24 Global Step: 504220 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:52:56,081-Speed 2497.36 samples/sec Loss 1.7326 LearningRate 0.000190 Epoch: 24 Global Step: 504230 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:53:04,285-Speed 2496.63 samples/sec Loss 1.7387 LearningRate 0.000190 Epoch: 24 Global Step: 504240 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:53:12,435-Speed 2513.34 samples/sec Loss 1.7242 LearningRate 0.000190 Epoch: 24 Global Step: 504250 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:53:20,650-Speed 2493.46 samples/sec Loss 1.7580 LearningRate 0.000190 Epoch: 24 Global Step: 504260 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:53:28,864-Speed 2493.78 samples/sec Loss 1.7353 LearningRate 0.000190 Epoch: 24 Global Step: 504270 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:53:37,067-Speed 2497.10 samples/sec Loss 1.7501 LearningRate 0.000190 Epoch: 24 Global Step: 504280 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:53:45,272-Speed 2496.44 samples/sec Loss 1.7240 LearningRate 0.000190 Epoch: 24 Global Step: 504290 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:53:53,473-Speed 2497.37 samples/sec Loss 1.7452 LearningRate 0.000190 Epoch: 24 Global Step: 504300 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:54:01,624-Speed 2513.21 samples/sec Loss 1.7256 LearningRate 0.000190 Epoch: 24 Global Step: 504310 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:54:09,826-Speed 2497.06 samples/sec Loss 1.7315 LearningRate 0.000190 Epoch: 24 Global Step: 504320 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 09:54:18,028-Speed 2497.45 samples/sec Loss 1.7616 LearningRate 0.000190 Epoch: 24 Global Step: 504330 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:54:26,233-Speed 2496.50 samples/sec Loss 1.7417 LearningRate 0.000190 Epoch: 24 Global Step: 504340 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:54:34,447-Speed 2493.32 samples/sec Loss 1.7236 LearningRate 0.000190 Epoch: 24 Global Step: 504350 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:54:42,649-Speed 2497.49 samples/sec Loss 1.7387 LearningRate 0.000190 Epoch: 24 Global Step: 504360 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:54:50,799-Speed 2513.30 samples/sec Loss 1.7594 LearningRate 0.000190 Epoch: 24 Global Step: 504370 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:54:59,004-Speed 2496.50 samples/sec Loss 1.7478 LearningRate 0.000190 Epoch: 24 Global Step: 504380 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:55:07,206-Speed 2497.31 samples/sec Loss 1.7831 LearningRate 0.000190 Epoch: 24 Global Step: 504390 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:55:15,412-Speed 2496.28 samples/sec Loss 1.7307 LearningRate 0.000190 Epoch: 24 Global Step: 504400 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:55:23,619-Speed 2495.62 samples/sec Loss 1.7514 LearningRate 0.000190 Epoch: 24 Global Step: 504410 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:55:31,826-Speed 2496.10 samples/sec Loss 1.8189 LearningRate 0.000190 Epoch: 24 Global Step: 504420 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:55:39,978-Speed 2512.37 samples/sec Loss 1.7788 LearningRate 0.000190 Epoch: 24 Global Step: 504430 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:55:48,180-Speed 2497.79 samples/sec Loss 1.7619 LearningRate 0.000190 Epoch: 24 Global Step: 504440 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 09:55:56,388-Speed 2495.45 samples/sec Loss 1.7370 LearningRate 0.000190 Epoch: 24 Global Step: 504450 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:56:04,590-Speed 2497.38 samples/sec Loss 1.7851 LearningRate 0.000190 Epoch: 24 Global Step: 504460 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:56:12,792-Speed 2497.38 samples/sec Loss 1.7914 LearningRate 0.000190 Epoch: 24 Global Step: 504470 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:56:21,007-Speed 2493.95 samples/sec Loss 1.7723 LearningRate 0.000190 Epoch: 24 Global Step: 504480 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:56:29,158-Speed 2513.10 samples/sec Loss 1.7367 LearningRate 0.000190 Epoch: 24 Global Step: 504490 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:56:37,364-Speed 2496.08 samples/sec Loss 1.7859 LearningRate 0.000190 Epoch: 24 Global Step: 504500 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:56:45,577-Speed 2493.87 samples/sec Loss 1.7650 LearningRate 0.000190 Epoch: 24 Global Step: 504510 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:56:53,781-Speed 2496.69 samples/sec Loss 1.7809 LearningRate 0.000190 Epoch: 24 Global Step: 504520 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:57:01,985-Speed 2496.79 samples/sec Loss 1.7351 LearningRate 0.000190 Epoch: 24 Global Step: 504530 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:57:10,190-Speed 2496.81 samples/sec Loss 1.7953 LearningRate 0.000190 Epoch: 24 Global Step: 504540 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:57:18,340-Speed 2513.23 samples/sec Loss 1.7395 LearningRate 0.000190 Epoch: 24 Global Step: 504550 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:57:26,546-Speed 2495.86 samples/sec Loss 1.7785 LearningRate 0.000189 Epoch: 24 Global Step: 504560 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 09:57:34,749-Speed 2497.16 samples/sec Loss 1.7548 LearningRate 0.000189 Epoch: 24 Global Step: 504570 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:57:42,952-Speed 2496.82 samples/sec Loss 1.7310 LearningRate 0.000189 Epoch: 24 Global Step: 504580 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:57:51,162-Speed 2494.93 samples/sec Loss 1.7700 LearningRate 0.000189 Epoch: 24 Global Step: 504590 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:57:59,364-Speed 2497.49 samples/sec Loss 1.6902 LearningRate 0.000189 Epoch: 24 Global Step: 504600 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:58:07,514-Speed 2513.31 samples/sec Loss 1.7484 LearningRate 0.000189 Epoch: 24 Global Step: 504610 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:58:15,714-Speed 2497.93 samples/sec Loss 1.7052 LearningRate 0.000189 Epoch: 24 Global Step: 504620 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:58:23,918-Speed 2496.63 samples/sec Loss 1.7346 LearningRate 0.000189 Epoch: 24 Global Step: 504630 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:58:32,137-Speed 2492.13 samples/sec Loss 1.7398 LearningRate 0.000189 Epoch: 24 Global Step: 504640 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:58:40,339-Speed 2497.30 samples/sec Loss 1.7568 LearningRate 0.000189 Epoch: 24 Global Step: 504650 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:58:48,551-Speed 2494.36 samples/sec Loss 1.7059 LearningRate 0.000189 Epoch: 24 Global Step: 504660 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:58:56,702-Speed 2512.98 samples/sec Loss 1.7408 LearningRate 0.000189 Epoch: 24 Global Step: 504670 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:59:04,903-Speed 2497.73 samples/sec Loss 1.7468 LearningRate 0.000189 Epoch: 24 Global Step: 504680 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 09:59:13,112-Speed 2495.13 samples/sec Loss 1.7103 LearningRate 0.000189 Epoch: 24 Global Step: 504690 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:59:21,322-Speed 2495.02 samples/sec Loss 1.7943 LearningRate 0.000189 Epoch: 24 Global Step: 504700 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:59:29,536-Speed 2493.55 samples/sec Loss 1.7623 LearningRate 0.000189 Epoch: 24 Global Step: 504710 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:59:37,738-Speed 2497.27 samples/sec Loss 1.7255 LearningRate 0.000189 Epoch: 24 Global Step: 504720 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:59:45,887-Speed 2513.70 samples/sec Loss 1.7192 LearningRate 0.000189 Epoch: 24 Global Step: 504730 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 09:59:54,089-Speed 2497.35 samples/sec Loss 1.7611 LearningRate 0.000189 Epoch: 24 Global Step: 504740 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:00:02,291-Speed 2497.62 samples/sec Loss 1.7071 LearningRate 0.000189 Epoch: 24 Global Step: 504750 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:00:10,493-Speed 2497.63 samples/sec Loss 1.7464 LearningRate 0.000189 Epoch: 24 Global Step: 504760 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:00:18,697-Speed 2496.78 samples/sec Loss 1.7725 LearningRate 0.000189 Epoch: 24 Global Step: 504770 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:00:26,899-Speed 2497.26 samples/sec Loss 1.7853 LearningRate 0.000189 Epoch: 24 Global Step: 504780 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:00:35,051-Speed 2512.70 samples/sec Loss 1.7404 LearningRate 0.000189 Epoch: 24 Global Step: 504790 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:00:43,256-Speed 2496.58 samples/sec Loss 1.7507 LearningRate 0.000189 Epoch: 24 Global Step: 504800 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 10:00:51,459-Speed 2496.71 samples/sec Loss 1.8109 LearningRate 0.000189 Epoch: 24 Global Step: 504810 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:00:59,669-Speed 2495.46 samples/sec Loss 1.7688 LearningRate 0.000189 Epoch: 24 Global Step: 504820 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:01:07,878-Speed 2495.11 samples/sec Loss 1.7306 LearningRate 0.000189 Epoch: 24 Global Step: 504830 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:01:16,081-Speed 2497.21 samples/sec Loss 1.7987 LearningRate 0.000189 Epoch: 24 Global Step: 504840 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:01:24,236-Speed 2511.74 samples/sec Loss 1.7934 LearningRate 0.000189 Epoch: 24 Global Step: 504850 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:01:32,440-Speed 2496.73 samples/sec Loss 1.7981 LearningRate 0.000189 Epoch: 24 Global Step: 504860 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:01:40,650-Speed 2494.83 samples/sec Loss 1.7858 LearningRate 0.000189 Epoch: 24 Global Step: 504870 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:01:48,857-Speed 2495.81 samples/sec Loss 1.7747 LearningRate 0.000189 Epoch: 24 Global Step: 504880 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:01:57,061-Speed 2496.68 samples/sec Loss 1.7565 LearningRate 0.000189 Epoch: 24 Global Step: 504890 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:02:05,278-Speed 2492.55 samples/sec Loss 1.7582 LearningRate 0.000189 Epoch: 24 Global Step: 504900 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:02:13,433-Speed 2512.00 samples/sec Loss 1.7622 LearningRate 0.000189 Epoch: 24 Global Step: 504910 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:02:21,651-Speed 2492.49 samples/sec Loss 1.7713 LearningRate 0.000189 Epoch: 24 Global Step: 504920 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 10:02:29,854-Speed 2496.88 samples/sec Loss 1.7557 LearningRate 0.000189 Epoch: 24 Global Step: 504930 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:02:38,056-Speed 2497.13 samples/sec Loss 1.7878 LearningRate 0.000189 Epoch: 24 Global Step: 504940 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:02:46,267-Speed 2494.73 samples/sec Loss 1.7912 LearningRate 0.000189 Epoch: 24 Global Step: 504950 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:02:54,470-Speed 2497.27 samples/sec Loss 1.7866 LearningRate 0.000189 Epoch: 24 Global Step: 504960 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:03:02,619-Speed 2513.58 samples/sec Loss 1.7393 LearningRate 0.000189 Epoch: 24 Global Step: 504970 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:03:10,822-Speed 2497.24 samples/sec Loss 1.8001 LearningRate 0.000189 Epoch: 24 Global Step: 504980 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:03:19,024-Speed 2497.42 samples/sec Loss 1.7408 LearningRate 0.000189 Epoch: 24 Global Step: 504990 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:03:27,226-Speed 2497.26 samples/sec Loss 1.7640 LearningRate 0.000189 Epoch: 24 Global Step: 505000 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:03:35,431-Speed 2496.64 samples/sec Loss 1.7654 LearningRate 0.000189 Epoch: 24 Global Step: 505010 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:03:43,637-Speed 2495.93 samples/sec Loss 1.7233 LearningRate 0.000189 Epoch: 24 Global Step: 505020 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:03:51,794-Speed 2511.48 samples/sec Loss 1.7584 LearningRate 0.000189 Epoch: 24 Global Step: 505030 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:03:59,996-Speed 2497.17 samples/sec Loss 1.7622 LearningRate 0.000189 Epoch: 24 Global Step: 505040 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 10:04:08,202-Speed 2495.95 samples/sec Loss 1.7507 LearningRate 0.000189 Epoch: 24 Global Step: 505050 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:04:16,406-Speed 2496.95 samples/sec Loss 1.7728 LearningRate 0.000189 Epoch: 24 Global Step: 505060 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:04:24,612-Speed 2496.22 samples/sec Loss 1.7462 LearningRate 0.000189 Epoch: 24 Global Step: 505070 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:04:32,815-Speed 2497.09 samples/sec Loss 1.7131 LearningRate 0.000189 Epoch: 24 Global Step: 505080 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:04:40,970-Speed 2511.85 samples/sec Loss 1.7464 LearningRate 0.000189 Epoch: 24 Global Step: 505090 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:04:49,176-Speed 2496.25 samples/sec Loss 1.7464 LearningRate 0.000189 Epoch: 24 Global Step: 505100 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:04:57,384-Speed 2495.43 samples/sec Loss 1.7561 LearningRate 0.000189 Epoch: 24 Global Step: 505110 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:05:05,591-Speed 2495.85 samples/sec Loss 1.7524 LearningRate 0.000189 Epoch: 24 Global Step: 505120 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:05:13,797-Speed 2496.07 samples/sec Loss 1.8005 LearningRate 0.000189 Epoch: 24 Global Step: 505130 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:05:22,004-Speed 2496.35 samples/sec Loss 1.7934 LearningRate 0.000189 Epoch: 24 Global Step: 505140 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:05:30,154-Speed 2513.12 samples/sec Loss 1.7737 LearningRate 0.000189 Epoch: 24 Global Step: 505150 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:05:38,359-Speed 2496.46 samples/sec Loss 1.7085 LearningRate 0.000189 Epoch: 24 Global Step: 505160 Fp16 Grad Scale: 32768 Required: 74 hours 
Training: 2022-07-10 10:05:46,562-Speed 2496.99 samples/sec Loss 1.7309 LearningRate 0.000189 Epoch: 24 Global Step: 505170 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:05:54,767-Speed 2496.84 samples/sec Loss 1.7391 LearningRate 0.000189 Epoch: 24 Global Step: 505180 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:06:02,974-Speed 2495.99 samples/sec Loss 1.7551 LearningRate 0.000189 Epoch: 24 Global Step: 505190 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:06:11,177-Speed 2496.90 samples/sec Loss 1.7762 LearningRate 0.000189 Epoch: 24 Global Step: 505200 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:06:19,328-Speed 2512.89 samples/sec Loss 1.7158 LearningRate 0.000189 Epoch: 24 Global Step: 505210 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:06:27,532-Speed 2496.58 samples/sec Loss 1.7179 LearningRate 0.000189 Epoch: 24 Global Step: 505220 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:06:35,735-Speed 2497.16 samples/sec Loss 1.7331 LearningRate 0.000189 Epoch: 24 Global Step: 505230 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:06:43,938-Speed 2496.93 samples/sec Loss 1.7446 LearningRate 0.000189 Epoch: 24 Global Step: 505240 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:06:52,167-Speed 2489.43 samples/sec Loss 1.7199 LearningRate 0.000189 Epoch: 24 Global Step: 505250 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:07:00,371-Speed 2497.04 samples/sec Loss 1.7306 LearningRate 0.000189 Epoch: 24 Global Step: 505260 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:07:08,522-Speed 2512.81 samples/sec Loss 1.7755 LearningRate 0.000189 Epoch: 24 Global Step: 505270 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:07:16,748-Speed 2490.36 samples/sec Loss 1.7369 LearningRate 0.000189 Epoch: 24 Global Step: 505280 Fp16 Grad Scale: 32768 Required: 74 hours 
Training: 2022-07-10 10:07:24,950-Speed 2497.15 samples/sec Loss 1.7765 LearningRate 0.000189 Epoch: 24 Global Step: 505290 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:07:33,154-Speed 2496.77 samples/sec Loss 1.7054 LearningRate 0.000189 Epoch: 24 Global Step: 505300 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:07:41,367-Speed 2494.11 samples/sec Loss 1.7681 LearningRate 0.000189 Epoch: 24 Global Step: 505310 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:07:49,569-Speed 2497.48 samples/sec Loss 1.7375 LearningRate 0.000189 Epoch: 24 Global Step: 505320 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:07:57,719-Speed 2513.20 samples/sec Loss 1.7363 LearningRate 0.000189 Epoch: 24 Global Step: 505330 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:08:05,927-Speed 2495.85 samples/sec Loss 1.7009 LearningRate 0.000189 Epoch: 24 Global Step: 505340 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:08:14,130-Speed 2497.11 samples/sec Loss 1.7599 LearningRate 0.000189 Epoch: 24 Global Step: 505350 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:08:22,331-Speed 2497.57 samples/sec Loss 1.7519 LearningRate 0.000189 Epoch: 24 Global Step: 505360 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:08:30,538-Speed 2495.82 samples/sec Loss 1.7467 LearningRate 0.000189 Epoch: 24 Global Step: 505370 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:08:38,742-Speed 2496.69 samples/sec Loss 1.7282 LearningRate 0.000189 Epoch: 24 Global Step: 505380 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:08:46,892-Speed 2513.60 samples/sec Loss 1.7114 LearningRate 0.000189 Epoch: 24 Global Step: 505390 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:08:55,094-Speed 2497.29 samples/sec Loss 1.7529 LearningRate 0.000189 Epoch: 24 Global Step: 505400 Fp16 Grad Scale: 32768 Required: 74 hours 
Training: 2022-07-10 10:09:03,298-Speed 2496.71 samples/sec Loss 1.7695 LearningRate 0.000189 Epoch: 24 Global Step: 505410 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:09:11,503-Speed 2496.44 samples/sec Loss 1.7158 LearningRate 0.000188 Epoch: 24 Global Step: 505420 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:09:19,705-Speed 2497.37 samples/sec Loss 1.7737 LearningRate 0.000188 Epoch: 24 Global Step: 505430 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:09:27,910-Speed 2496.64 samples/sec Loss 1.7246 LearningRate 0.000188 Epoch: 24 Global Step: 505440 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:09:36,060-Speed 2513.36 samples/sec Loss 1.7499 LearningRate 0.000188 Epoch: 24 Global Step: 505450 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:09:44,264-Speed 2496.91 samples/sec Loss 1.7069 LearningRate 0.000188 Epoch: 24 Global Step: 505460 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:09:52,470-Speed 2496.05 samples/sec Loss 1.7188 LearningRate 0.000188 Epoch: 24 Global Step: 505470 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:10:00,675-Speed 2496.63 samples/sec Loss 1.8185 LearningRate 0.000188 Epoch: 24 Global Step: 505480 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:10:08,883-Speed 2495.42 samples/sec Loss 1.7493 LearningRate 0.000188 Epoch: 24 Global Step: 505490 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:10:17,088-Speed 2496.36 samples/sec Loss 1.7487 LearningRate 0.000188 Epoch: 24 Global Step: 505500 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:10:25,246-Speed 2511.02 samples/sec Loss 1.7679 LearningRate 0.000188 Epoch: 24 Global Step: 505510 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:10:33,410-Speed 2508.86 samples/sec Loss 1.7547 LearningRate 0.000188 Epoch: 24 Global Step: 505520 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 10:10:41,618-Speed 2495.51 samples/sec Loss 1.7281 LearningRate 0.000188 Epoch: 24 Global Step: 505530 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:10:49,823-Speed 2496.53 samples/sec Loss 1.7646 LearningRate 0.000188 Epoch: 24 Global Step: 505540 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:10:58,043-Speed 2492.08 samples/sec Loss 1.7005 LearningRate 0.000188 Epoch: 24 Global Step: 505550 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:11:06,252-Speed 2495.24 samples/sec Loss 1.7303 LearningRate 0.000188 Epoch: 24 Global Step: 505560 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:11:14,403-Speed 2513.34 samples/sec Loss 1.7212 LearningRate 0.000188 Epoch: 24 Global Step: 505570 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:11:22,609-Speed 2496.12 samples/sec Loss 1.7764 LearningRate 0.000188 Epoch: 24 Global Step: 505580 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:11:30,813-Speed 2496.53 samples/sec Loss 1.7237 LearningRate 0.000188 Epoch: 24 Global Step: 505590 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:11:39,019-Speed 2496.24 samples/sec Loss 1.6939 LearningRate 0.000188 Epoch: 24 Global Step: 505600 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:11:47,228-Speed 2495.28 samples/sec Loss 1.7153 LearningRate 0.000188 Epoch: 24 Global Step: 505610 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:11:55,432-Speed 2496.94 samples/sec Loss 1.7152 LearningRate 0.000188 Epoch: 24 Global Step: 505620 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:12:03,580-Speed 2513.81 samples/sec Loss 1.7563 LearningRate 0.000188 Epoch: 24 Global Step: 505630 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:12:11,796-Speed 2493.16 samples/sec Loss 1.7068 LearningRate 0.000188 Epoch: 24 Global Step: 505640 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 10:12:19,998-Speed 2497.34 samples/sec Loss 1.7153 LearningRate 0.000188 Epoch: 24 Global Step: 505650 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:12:28,215-Speed 2492.82 samples/sec Loss 1.7211 LearningRate 0.000188 Epoch: 24 Global Step: 505660 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:12:36,418-Speed 2497.06 samples/sec Loss 1.7602 LearningRate 0.000188 Epoch: 24 Global Step: 505670 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:12:44,624-Speed 2496.02 samples/sec Loss 1.7656 LearningRate 0.000188 Epoch: 24 Global Step: 505680 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:12:52,781-Speed 2511.23 samples/sec Loss 1.7384 LearningRate 0.000188 Epoch: 24 Global Step: 505690 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:13:00,987-Speed 2496.38 samples/sec Loss 1.7506 LearningRate 0.000188 Epoch: 24 Global Step: 505700 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:13:09,203-Speed 2492.84 samples/sec Loss 1.7428 LearningRate 0.000188 Epoch: 24 Global Step: 505710 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:13:17,412-Speed 2495.32 samples/sec Loss 1.7606 LearningRate 0.000188 Epoch: 24 Global Step: 505720 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:13:25,631-Speed 2492.14 samples/sec Loss 1.7488 LearningRate 0.000188 Epoch: 24 Global Step: 505730 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:13:33,837-Speed 2495.99 samples/sec Loss 1.7603 LearningRate 0.000188 Epoch: 24 Global Step: 505740 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:13:41,992-Speed 2511.72 samples/sec Loss 1.7116 LearningRate 0.000188 Epoch: 24 Global Step: 505750 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:13:50,203-Speed 2494.87 samples/sec Loss 1.7464 LearningRate 0.000188 Epoch: 24 Global Step: 505760 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:13:58,405-Speed 2497.32 samples/sec Loss 1.7320 LearningRate 0.000188 Epoch: 24 Global Step: 505770 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:14:06,621-Speed 2493.33 samples/sec Loss 1.7260 LearningRate 0.000188 Epoch: 24 Global Step: 505780 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:14:14,823-Speed 2497.37 samples/sec Loss 1.7259 LearningRate 0.000188 Epoch: 24 Global Step: 505790 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:14:23,025-Speed 2497.02 samples/sec Loss 1.7161 LearningRate 0.000188 Epoch: 24 Global Step: 505800 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:14:31,177-Speed 2512.88 samples/sec Loss 1.7613 LearningRate 0.000188 Epoch: 24 Global Step: 505810 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:14:39,380-Speed 2496.90 samples/sec Loss 1.7075 LearningRate 0.000188 Epoch: 24 Global Step: 505820 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:14:47,583-Speed 2496.97 samples/sec Loss 1.7165 LearningRate 0.000188 Epoch: 24 Global Step: 505830 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:14:55,788-Speed 2496.47 samples/sec Loss 1.7143 LearningRate 0.000188 Epoch: 24 Global Step: 505840 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:15:03,994-Speed 2496.12 samples/sec Loss 1.7380 LearningRate 0.000188 Epoch: 24 Global Step: 505850 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:15:12,204-Speed 2495.15 samples/sec Loss 1.7235 LearningRate 0.000188 Epoch: 24 Global Step: 505860 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:15:20,355-Speed 2513.28 samples/sec Loss 1.7977 LearningRate 0.000188 Epoch: 24 Global Step: 505870 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:15:28,560-Speed 2496.35 samples/sec Loss 1.7317 LearningRate 0.000188 Epoch: 24 Global Step: 505880 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:15:36,765-Speed 2496.27 samples/sec Loss 1.7187 LearningRate 0.000188 Epoch: 24 Global Step: 505890 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:15:44,974-Speed 2495.29 samples/sec Loss 1.8286 LearningRate 0.000188 Epoch: 24 Global Step: 505900 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:15:53,180-Speed 2496.01 samples/sec Loss 1.7331 LearningRate 0.000188 Epoch: 24 Global Step: 505910 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:16:01,385-Speed 2496.54 samples/sec Loss 1.7361 LearningRate 0.000188 Epoch: 24 Global Step: 505920 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:16:09,537-Speed 2512.82 samples/sec Loss 1.7918 LearningRate 0.000188 Epoch: 24 Global Step: 505930 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:16:17,742-Speed 2496.42 samples/sec Loss 1.7884 LearningRate 0.000188 Epoch: 24 Global Step: 505940 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:16:25,949-Speed 2496.16 samples/sec Loss 1.7901 LearningRate 0.000188 Epoch: 24 Global Step: 505950 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:16:34,151-Speed 2497.21 samples/sec Loss 1.7466 LearningRate 0.000188 Epoch: 24 Global Step: 505960 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:16:42,357-Speed 2496.05 samples/sec Loss 1.7193 LearningRate 0.000188 Epoch: 24 Global Step: 505970 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:16:50,561-Speed 2496.82 samples/sec Loss 1.7513 LearningRate 0.000188 Epoch: 24 Global Step: 505980 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:16:58,710-Speed 2513.59 samples/sec Loss 1.7687 LearningRate 0.000188 Epoch: 24 Global Step: 505990 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:17:06,915-Speed 2496.45 samples/sec Loss 1.7231 LearningRate 0.000188 Epoch: 24 Global Step: 506000 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:17:15,116-Speed 2497.46 samples/sec Loss 1.7415 LearningRate 0.000188 Epoch: 24 Global Step: 506010 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:17:23,318-Speed 2497.48 samples/sec Loss 1.7493 LearningRate 0.000188 Epoch: 24 Global Step: 506020 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:17:31,521-Speed 2497.25 samples/sec Loss 1.7681 LearningRate 0.000188 Epoch: 24 Global Step: 506030 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:17:39,726-Speed 2496.20 samples/sec Loss 1.7789 LearningRate 0.000188 Epoch: 24 Global Step: 506040 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:17:47,876-Speed 2513.53 samples/sec Loss 1.7143 LearningRate 0.000188 Epoch: 24 Global Step: 506050 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:17:56,083-Speed 2495.79 samples/sec Loss 1.7601 LearningRate 0.000188 Epoch: 24 Global Step: 506060 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:18:04,286-Speed 2496.97 samples/sec Loss 1.7774 LearningRate 0.000188 Epoch: 24 Global Step: 506070 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:18:12,495-Speed 2495.20 samples/sec Loss 1.7243 LearningRate 0.000188 Epoch: 24 Global Step: 506080 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:18:20,700-Speed 2496.77 samples/sec Loss 1.7190 LearningRate 0.000188 Epoch: 24 Global Step: 506090 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:18:28,906-Speed 2496.08 samples/sec Loss 1.7342 LearningRate 0.000188 Epoch: 24 Global Step: 506100 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:18:37,056-Speed 2513.28 samples/sec Loss 1.7136 LearningRate 0.000188 Epoch: 24 Global Step: 506110 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:18:45,258-Speed 2497.16 samples/sec Loss 1.7947 LearningRate 0.000188 Epoch: 24 Global Step: 506120 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:18:53,461-Speed 2497.17 samples/sec Loss 1.7403 LearningRate 0.000188 Epoch: 24 Global Step: 506130 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:19:01,665-Speed 2496.88 samples/sec Loss 1.7427 LearningRate 0.000188 Epoch: 24 Global Step: 506140 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:19:09,866-Speed 2497.59 samples/sec Loss 1.7575 LearningRate 0.000188 Epoch: 24 Global Step: 506150 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:19:18,073-Speed 2495.62 samples/sec Loss 1.7288 LearningRate 0.000188 Epoch: 24 Global Step: 506160 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:19:26,272-Speed 2515.20 samples/sec Loss 1.7539 LearningRate 0.000188 Epoch: 24 Global Step: 506170 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:19:34,531-Speed 2498.50 samples/sec Loss 1.7422 LearningRate 0.000188 Epoch: 24 Global Step: 506180 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:19:42,736-Speed 2496.54 samples/sec Loss 1.7518 LearningRate 0.000188 Epoch: 24 Global Step: 506190 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:19:51,004-Speed 2495.27 samples/sec Loss 1.7269 LearningRate 0.000188 Epoch: 24 Global Step: 506200 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:19:59,229-Speed 2498.99 samples/sec Loss 1.7755 LearningRate 0.000188 Epoch: 24 Global Step: 506210 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:20:07,467-Speed 2494.82 samples/sec Loss 1.7325 LearningRate 0.000188 Epoch: 24 Global Step: 506220 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:20:15,618-Speed 2512.98 samples/sec Loss 1.7501 LearningRate 0.000188 Epoch: 24 Global Step: 506230 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:20:23,857-Speed 2497.85 samples/sec Loss 1.7392 LearningRate 0.000188 Epoch: 24 Global Step: 506240 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:20:32,059-Speed 2497.07 samples/sec Loss 1.7155 LearningRate 0.000188 Epoch: 24 Global Step: 506250 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:20:40,307-Speed 2498.57 samples/sec Loss 1.7494 LearningRate 0.000188 Epoch: 24 Global Step: 506260 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:20:51,524-Speed 2084.68 samples/sec Loss 1.7621 LearningRate 0.000188 Epoch: 24 Global Step: 506270 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:20:59,723-Speed 2498.02 samples/sec Loss 1.7474 LearningRate 0.000187 Epoch: 24 Global Step: 506280 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:21:07,884-Speed 2517.31 samples/sec Loss 1.7639 LearningRate 0.000187 Epoch: 24 Global Step: 506290 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:21:20,328-Speed 1649.78 samples/sec Loss 1.7366 LearningRate 0.000187 Epoch: 24 Global Step: 506300 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:21:28,528-Speed 2501.97 samples/sec Loss 1.7674 LearningRate 0.000187 Epoch: 24 Global Step: 506310 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:21:36,773-Speed 2497.58 samples/sec Loss 1.7489 LearningRate 0.000187 Epoch: 24 Global Step: 506320 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:21:48,704-Speed 1723.05 samples/sec Loss 1.7864 LearningRate 0.000187 Epoch: 24 Global Step: 506330 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:21:56,894-Speed 2500.96 samples/sec Loss 1.7760 LearningRate 0.000187 Epoch: 24 Global Step: 506340 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:22:05,041-Speed 2514.16 samples/sec Loss 1.7524 LearningRate 0.000187 Epoch: 24 Global Step: 506350 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:22:18,196-Speed 2500.31 samples/sec Loss 1.8258 LearningRate 0.000187 Epoch: 24 Global Step: 506360 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:22:27,316-Speed 2501.48 samples/sec Loss 1.7645 LearningRate 0.000187 Epoch: 24 Global Step: 506370 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:22:35,510-Speed 2499.77 samples/sec Loss 1.8034 LearningRate 0.000187 Epoch: 24 Global Step: 506380 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:22:47,526-Speed 2498.94 samples/sec Loss 1.7863 LearningRate 0.000187 Epoch: 24 Global Step: 506390 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:22:55,878-Speed 2502.95 samples/sec Loss 1.7513 LearningRate 0.000187 Epoch: 24 Global Step: 506400 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:23:04,476-Speed 2382.01 samples/sec Loss 1.7782 LearningRate 0.000187 Epoch: 24 Global Step: 506410 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:23:12,678-Speed 2497.52 samples/sec Loss 1.7657 LearningRate 0.000187 Epoch: 24 Global Step: 506420 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:23:20,890-Speed 2494.30 samples/sec Loss 1.7562 LearningRate 0.000187 Epoch: 24 Global Step: 506430 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:23:29,090-Speed 2498.02 samples/sec Loss 1.7601 LearningRate 0.000187 Epoch: 24 Global Step: 506440 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:23:37,291-Speed 2497.52 samples/sec Loss 1.7259 LearningRate 0.000187 Epoch: 24 Global Step: 506450 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:23:45,494-Speed 2497.29 samples/sec Loss 1.7222 LearningRate 0.000187 Epoch: 24 Global Step: 506460 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:23:53,644-Speed 2513.49 samples/sec Loss 1.7708 LearningRate 0.000187 Epoch: 24 Global Step: 506470 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:24:01,846-Speed 2497.09 samples/sec Loss 1.7380 LearningRate 0.000187 Epoch: 24 Global Step: 506480 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:24:10,046-Speed 2498.15 samples/sec Loss 1.7426 LearningRate 0.000187 Epoch: 24 Global Step: 506490 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:24:18,244-Speed 2498.58 samples/sec Loss 1.8069 LearningRate 0.000187 Epoch: 24 Global Step: 506500 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:24:26,441-Speed 2498.78 samples/sec Loss 1.7136 LearningRate 0.000187 Epoch: 24 Global Step: 506510 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:24:34,641-Speed 2498.02 samples/sec Loss 1.7444 LearningRate 0.000187 Epoch: 24 Global Step: 506520 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:24:42,786-Speed 2514.79 samples/sec Loss 1.7723 LearningRate 0.000187 Epoch: 24 Global Step: 506530 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:24:50,989-Speed 2497.34 samples/sec Loss 1.7943 LearningRate 0.000187 Epoch: 24 Global Step: 506540 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:24:59,189-Speed 2497.79 samples/sec Loss 1.7587 LearningRate 0.000187 Epoch: 24 Global Step: 506550 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:25:07,392-Speed 2497.28 samples/sec Loss 1.7199 LearningRate 0.000187 Epoch: 24 Global Step: 506560 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:25:15,591-Speed 2498.60 samples/sec Loss 1.7059 LearningRate 0.000187 Epoch: 24 Global Step: 506570 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:25:23,793-Speed 2497.09 samples/sec Loss 1.7483 LearningRate 0.000187 Epoch: 24 Global Step: 506580 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:25:31,944-Speed 2512.91 samples/sec Loss 1.7091 LearningRate 0.000187 Epoch: 24 Global Step: 506590 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:25:40,145-Speed 2497.45 samples/sec Loss 1.7570 LearningRate 0.000187 Epoch: 24 Global Step: 506600 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:25:48,347-Speed 2497.65 samples/sec Loss 1.7156 LearningRate 0.000187 Epoch: 24 Global Step: 506610 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:25:56,547-Speed 2497.80 samples/sec Loss 1.7557 LearningRate 0.000187 Epoch: 24 Global Step: 506620 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:26:04,750-Speed 2497.15 samples/sec Loss 1.7726 LearningRate 0.000187 Epoch: 24 Global Step: 506630 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:26:12,952-Speed 2497.24 samples/sec Loss 1.7422 LearningRate 0.000187 Epoch: 24 Global Step: 506640 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:26:21,098-Speed 2514.63 samples/sec Loss 1.7676 LearningRate 0.000187 Epoch: 24 Global Step: 506650 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:26:29,299-Speed 2497.91 samples/sec Loss 1.7339 LearningRate 0.000187 Epoch: 24 Global Step: 506660 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:26:37,500-Speed 2497.57 samples/sec Loss 1.7739 LearningRate 0.000187 Epoch: 24 Global Step: 506670 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:26:45,714-Speed 2493.78 samples/sec Loss 1.7212 LearningRate 0.000187 Epoch: 24 Global Step: 506680 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:26:53,916-Speed 2497.39 samples/sec Loss 1.7439 LearningRate 0.000187 Epoch: 24 Global Step: 506690 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:27:02,115-Speed 2498.39 samples/sec Loss 1.7077 LearningRate 0.000187 Epoch: 24 Global Step: 506700 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:27:10,277-Speed 2509.85 samples/sec Loss 1.7193 LearningRate 0.000187 Epoch: 24 Global Step: 506710 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:27:18,479-Speed 2497.27 samples/sec Loss 1.7127 LearningRate 0.000187 Epoch: 24 Global Step: 506720 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:27:26,677-Speed 2498.64 samples/sec Loss 1.7709 LearningRate 0.000187 Epoch: 24 Global Step: 506730 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:27:34,879-Speed 2497.59 samples/sec Loss 1.7521 LearningRate 0.000187 Epoch: 24 Global Step: 506740 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:27:43,167-Speed 2471.45 samples/sec Loss 1.7717 LearningRate 0.000187 Epoch: 24 Global Step: 506750 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:27:51,368-Speed 2497.65 samples/sec Loss 1.7350 LearningRate 0.000187 Epoch: 24 Global Step: 506760 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:27:59,520-Speed 2512.53 samples/sec Loss 1.7217 LearningRate 0.000187 Epoch: 24 Global Step: 506770 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:28:07,742-Speed 2491.37 samples/sec Loss 1.7372 LearningRate 0.000187 Epoch: 24 Global Step: 506780 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:28:15,943-Speed 2497.72 samples/sec Loss 1.7205 LearningRate 0.000187 Epoch: 24 Global Step: 506790 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:28:24,144-Speed 2497.69 samples/sec Loss 1.7520 LearningRate 0.000187 Epoch: 24 Global Step: 506800 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:28:32,346-Speed 2497.50 samples/sec Loss 1.7220 LearningRate 0.000187 Epoch: 24 Global Step: 506810 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:28:40,554-Speed 2495.71 samples/sec Loss 1.7251 LearningRate 0.000187 Epoch: 24 Global Step: 506820 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:28:48,699-Speed 2514.51 samples/sec Loss 1.7273 LearningRate 0.000187 Epoch: 24 Global Step: 506830 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:28:56,897-Speed 2498.61 samples/sec Loss 1.7550 LearningRate 0.000187 Epoch: 24 Global Step: 506840 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:29:05,105-Speed 2495.56 samples/sec Loss 1.7095 LearningRate 0.000187 Epoch: 24 Global Step: 506850 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:29:13,303-Speed 2498.79 samples/sec Loss 1.7354 LearningRate 0.000187 Epoch: 24 Global Step: 506860 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:29:21,504-Speed 2497.49 samples/sec Loss 1.7165 LearningRate 0.000187 Epoch: 24 Global Step: 506870 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:29:29,704-Speed 2498.03 samples/sec Loss 1.7039 LearningRate 0.000187 Epoch: 24 Global Step: 506880 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:29:37,851-Speed 2514.28 samples/sec Loss 1.7790 LearningRate 0.000187 Epoch: 24 Global Step: 506890 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:29:46,048-Speed 2498.84 samples/sec Loss 1.7337 LearningRate 0.000187 Epoch: 24 Global Step: 506900 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:29:54,262-Speed 2493.69 samples/sec Loss 1.7621 LearningRate 0.000187 Epoch: 24 Global Step: 506910 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:30:02,479-Speed 2492.89 samples/sec Loss 1.7695 LearningRate 0.000187 Epoch: 24 Global Step: 506920 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:30:10,683-Speed 2497.09 samples/sec Loss 1.7065 LearningRate 0.000187 Epoch: 24 Global Step: 506930 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:30:18,883-Speed 2497.88 samples/sec Loss 1.7192 LearningRate 0.000187 Epoch: 24 Global Step: 506940 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:30:27,038-Speed 2511.61 samples/sec Loss 1.7255 LearningRate 0.000187 Epoch: 24 Global Step: 506950 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:30:35,245-Speed 2495.93 samples/sec Loss 1.7209 LearningRate 0.000187 Epoch: 24 Global Step: 506960 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:30:43,443-Speed 2498.73 samples/sec Loss 1.7330 LearningRate 0.000187 Epoch: 24 Global Step: 506970 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:30:51,640-Speed 2498.71 samples/sec Loss 1.7362 LearningRate 0.000187 Epoch: 24 Global Step: 506980 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:30:59,841-Speed 2497.50 samples/sec Loss 1.7024 LearningRate 0.000187 Epoch: 24 Global Step: 506990 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:31:08,048-Speed 2495.86 samples/sec Loss 1.7782 LearningRate 0.000187 Epoch: 24 Global Step: 507000 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:31:16,198-Speed 2513.46 samples/sec Loss 1.7401 LearningRate 0.000187 Epoch: 24 Global Step: 507010 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:31:24,399-Speed 2497.59 samples/sec Loss 1.7193 LearningRate 0.000187 Epoch: 24 Global Step: 507020 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:31:32,599-Speed 2497.90 samples/sec Loss 1.7584 LearningRate 0.000187 Epoch: 24 Global Step: 507030 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:31:40,804-Speed 2496.42 samples/sec Loss 1.7444 LearningRate 0.000187 Epoch: 24 Global Step: 507040 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:31:49,003-Speed 2498.35 samples/sec Loss 1.7395 LearningRate 0.000187 Epoch: 24 Global Step: 507050 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:31:57,206-Speed 2497.19 samples/sec Loss 1.7334 LearningRate 0.000187 Epoch: 24 Global Step: 507060 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:32:05,355-Speed 2513.76 samples/sec Loss 1.7533 LearningRate 0.000187 Epoch: 24 Global Step: 507070 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:32:13,557-Speed 2497.46 samples/sec Loss 1.7523 LearningRate 0.000187 Epoch: 24 Global Step: 507080 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:32:21,763-Speed 2495.98 samples/sec Loss 1.7570 LearningRate 0.000187 Epoch: 24 Global Step: 507090 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:32:29,970-Speed 2495.89 samples/sec Loss 1.7541 LearningRate 0.000187 Epoch: 24 Global Step: 507100 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:32:38,173-Speed 2497.31 samples/sec Loss 1.7971 LearningRate 0.000187 Epoch: 24 Global Step: 507110 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:32:46,387-Speed 2493.87 samples/sec Loss 1.7333 LearningRate 0.000187 Epoch: 24 Global Step: 507120 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:32:54,536-Speed 2513.69 samples/sec Loss 1.7445 LearningRate 0.000187 Epoch: 24 Global Step: 507130 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:33:02,739-Speed 2497.06 samples/sec Loss 1.7072 LearningRate 0.000186 Epoch: 24 Global Step: 507140 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:33:10,941-Speed 2497.17 samples/sec Loss 1.7323 LearningRate 0.000186 Epoch: 24 Global Step: 507150 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:33:19,145-Speed 2496.78 samples/sec Loss 1.7810 LearningRate 0.000186 Epoch: 24 Global Step: 507160 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:33:27,345-Speed 2498.21 samples/sec Loss 1.6870 LearningRate 0.000186 Epoch: 24 Global Step: 507170 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:33:35,548-Speed 2496.84 samples/sec Loss 1.7402 LearningRate 0.000186 Epoch: 24 Global Step: 507180 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:33:43,695-Speed 2514.27 samples/sec Loss 1.7554 LearningRate 0.000186 Epoch: 24 Global Step: 507190 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:33:51,896-Speed 2497.95 samples/sec Loss 1.7270 LearningRate 0.000186 Epoch: 24 Global Step: 507200 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:34:00,113-Speed 2492.43 samples/sec Loss 1.7108 LearningRate 0.000186 Epoch: 24 Global Step: 507210 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:34:08,313-Speed 2498.13 samples/sec Loss 1.7532 LearningRate 0.000186 Epoch: 24 Global Step: 507220 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:34:16,526-Speed 2494.39 samples/sec Loss 1.7364 LearningRate 0.000186 Epoch: 24 Global Step: 507230 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:34:24,744-Speed 2492.36 samples/sec Loss 1.7233 LearningRate 0.000186 Epoch: 24 Global Step: 507240 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:34:32,892-Speed 2513.84 samples/sec Loss 1.7150 LearningRate 0.000186 Epoch: 24 Global Step: 507250 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:34:41,091-Speed 2498.14 samples/sec Loss 1.7058 LearningRate 0.000186 Epoch: 24 Global Step: 507260 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:34:49,288-Speed 2499.04 samples/sec Loss 1.7199 LearningRate 0.000186 Epoch: 24 Global Step: 507270 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:34:57,501-Speed 2494.07 samples/sec Loss 1.7299 LearningRate 0.000186 Epoch: 24 Global Step: 507280 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-07-10 10:35:05,657-Speed 2511.47 samples/sec Loss 1.7262 LearningRate 0.000186 Epoch: 24 Global Step: 507290 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:35:13,858-Speed 2497.37 samples/sec Loss 1.7388 LearningRate 0.000186 Epoch: 24 Global Step: 507300 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:35:22,002-Speed 2515.36 samples/sec Loss 1.7502 LearningRate 0.000186 Epoch: 24 Global Step: 507310 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:35:30,201-Speed 2498.19 samples/sec Loss 1.7056 LearningRate 0.000186 Epoch: 24 Global Step: 507320 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:35:38,400-Speed 2498.18 samples/sec Loss 1.7311 LearningRate 0.000186 Epoch: 24 Global Step: 507330 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:35:46,603-Speed 2497.32 samples/sec Loss 1.7586 LearningRate 0.000186 Epoch: 24 Global Step: 507340 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:35:54,806-Speed 2497.10 samples/sec Loss 1.7678 LearningRate 0.000186 Epoch: 24 Global Step: 507350 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:36:03,005-Speed 2498.62 samples/sec Loss 1.7678 LearningRate 0.000186 Epoch: 24 Global Step: 507360 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:36:11,152-Speed 2514.21 samples/sec Loss 1.7464 LearningRate 0.000186 Epoch: 24 Global Step: 507370 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:36:19,353-Speed 2497.43 samples/sec Loss 1.7347 LearningRate 0.000186 Epoch: 24 Global Step: 507380 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:36:27,581-Speed 2489.43 samples/sec Loss 1.7752 LearningRate 0.000186 Epoch: 24 Global Step: 507390 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:36:35,783-Speed 2497.57 samples/sec Loss 1.7428 LearningRate 0.000186 Epoch: 24 Global Step: 507400 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:36:43,987-Speed 2496.72 samples/sec Loss 1.7421 LearningRate 0.000186 Epoch: 24 Global Step: 507410 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:36:52,190-Speed 2496.97 samples/sec Loss 1.7242 LearningRate 0.000186 Epoch: 24 Global Step: 507420 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:37:00,337-Speed 2514.11 samples/sec Loss 1.7689 LearningRate 0.000186 Epoch: 24 Global Step: 507430 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:37:08,541-Speed 2496.96 samples/sec Loss 1.7223 LearningRate 0.000186 Epoch: 24 Global Step: 507440 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:37:16,743-Speed 2497.31 samples/sec Loss 1.7375 LearningRate 0.000186 Epoch: 24 Global Step: 507450 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:37:24,947-Speed 2496.92 samples/sec Loss 1.7405 LearningRate 0.000186 Epoch: 24 Global Step: 507460 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:37:33,167-Speed 2492.12 samples/sec Loss 1.7307 LearningRate 0.000186 Epoch: 24 Global Step: 507470 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:37:41,372-Speed 2496.27 samples/sec Loss 1.7513 LearningRate 0.000186 Epoch: 24 Global Step: 507480 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:37:49,522-Speed 2513.14 samples/sec Loss 1.7236 LearningRate 0.000186 Epoch: 24 Global Step: 507490 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:37:57,734-Speed 2494.34 samples/sec Loss 1.7140 LearningRate 0.000186 Epoch: 24 Global Step: 507500 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:38:05,933-Speed 2498.27 samples/sec Loss 1.7465 LearningRate 0.000186 Epoch: 24 Global Step: 507510 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:38:14,134-Speed 2497.57 samples/sec Loss 1.7089 LearningRate 0.000186 Epoch: 24 Global Step: 507520 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:38:22,334-Speed 2497.96 samples/sec Loss 1.7403 LearningRate 0.000186 Epoch: 24 Global Step: 507530 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:38:30,536-Speed 2497.42 samples/sec Loss 1.6973 LearningRate 0.000186 Epoch: 24 Global Step: 507540 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:38:38,682-Speed 2514.48 samples/sec Loss 1.7211 LearningRate 0.000186 Epoch: 24 Global Step: 507550 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:38:46,921-Speed 2486.13 samples/sec Loss 1.7202 LearningRate 0.000186 Epoch: 24 Global Step: 507560 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:38:55,122-Speed 2497.66 samples/sec Loss 1.7342 LearningRate 0.000186 Epoch: 24 Global Step: 507570 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:39:03,322-Speed 2497.80 samples/sec Loss 1.7460 LearningRate 0.000186 Epoch: 24 Global Step: 507580 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:39:11,521-Speed 2498.48 samples/sec Loss 1.7214 LearningRate 0.000186 Epoch: 24 Global Step: 507590 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:39:19,724-Speed 2496.95 samples/sec Loss 1.7315 LearningRate 0.000186 Epoch: 24 Global Step: 507600 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:39:27,866-Speed 2515.83 samples/sec Loss 1.7257 LearningRate 0.000186 Epoch: 24 Global Step: 507610 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:39:36,068-Speed 2497.65 samples/sec Loss 1.6855 LearningRate 0.000186 Epoch: 24 Global Step: 507620 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:39:44,268-Speed 2497.83 samples/sec Loss 1.7446 LearningRate 0.000186 Epoch: 24 Global Step: 507630 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:39:52,467-Speed 2498.11 samples/sec Loss 1.7256 LearningRate 0.000186 Epoch: 24 Global Step: 507640 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:40:00,668-Speed 2497.74 samples/sec Loss 1.7088 LearningRate 0.000186 Epoch: 24 Global Step: 507650 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:40:08,873-Speed 2496.62 samples/sec Loss 1.7131 LearningRate 0.000186 Epoch: 24 Global Step: 507660 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:40:17,021-Speed 2513.96 samples/sec Loss 1.7594 LearningRate 0.000186 Epoch: 24 Global Step: 507670 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:40:25,221-Speed 2497.56 samples/sec Loss 1.7206 LearningRate 0.000186 Epoch: 24 Global Step: 507680 Fp16 Grad Scale: 16384 Required: 74 hours
Training: 2022-07-10 10:40:33,424-Speed 2497.27 samples/sec Loss 1.7350 LearningRate 0.000186 Epoch: 24 Global Step: 507690 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:40:41,627-Speed 2497.09 samples/sec Loss 1.7130 LearningRate 0.000186 Epoch: 24 Global Step: 507700 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:40:49,831-Speed 2496.54 samples/sec Loss 1.7227 LearningRate 0.000186 Epoch: 24 Global Step: 507710 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:40:58,031-Speed 2497.95 samples/sec Loss 1.7276 LearningRate 0.000186 Epoch: 24 Global Step: 507720 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:41:06,180-Speed 2514.11 samples/sec Loss 1.7405 LearningRate 0.000186 Epoch: 24 Global Step: 507730 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:41:14,381-Speed 2497.46 samples/sec Loss 1.7224 LearningRate 0.000186 Epoch: 24 Global Step: 507740 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:41:22,582-Speed 2497.65 samples/sec Loss 1.6989 LearningRate 0.000186 Epoch: 24 Global Step: 507750 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:41:30,781-Speed 2498.59 samples/sec Loss 1.7121 LearningRate 0.000186 Epoch: 24 Global Step: 507760 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:41:38,982-Speed 2497.36 samples/sec Loss 1.6883 LearningRate 0.000186 Epoch: 24 Global Step: 507770 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:41:47,193-Speed 2495.02 samples/sec Loss 1.7496 LearningRate 0.000186 Epoch: 24 Global Step: 507780 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:41:55,343-Speed 2513.45 samples/sec Loss 1.7271 LearningRate 0.000186 Epoch: 24 Global Step: 507790 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:42:03,541-Speed 2498.44 samples/sec Loss 1.7397 LearningRate 0.000186 Epoch: 24 Global Step: 507800 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 10:42:11,742-Speed 2497.61 samples/sec Loss 1.7298 LearningRate 0.000186 Epoch: 24 Global Step: 507810 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:42:19,945-Speed 2497.12 samples/sec Loss 1.7318 LearningRate 0.000186 Epoch: 24 Global Step: 507820 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:42:28,143-Speed 2498.72 samples/sec Loss 1.7174 LearningRate 0.000186 Epoch: 24 Global Step: 507830 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:42:36,343-Speed 2498.11 samples/sec Loss 1.7135 LearningRate 0.000186 Epoch: 24 Global Step: 507840 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:42:44,489-Speed 2514.43 samples/sec Loss 1.7724 LearningRate 0.000186 Epoch: 24 Global Step: 507850 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:42:52,693-Speed 2496.67 samples/sec Loss 1.7649 LearningRate 0.000186 Epoch: 24 Global Step: 507860 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:43:00,894-Speed 2497.76 samples/sec Loss 1.7345 LearningRate 0.000186 Epoch: 24 Global Step: 507870 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:43:09,094-Speed 2498.04 samples/sec Loss 1.7366 LearningRate 0.000186 Epoch: 24 Global Step: 507880 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:43:17,294-Speed 2497.68 samples/sec Loss 1.7707 LearningRate 0.000186 Epoch: 24 Global Step: 507890 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:43:25,493-Speed 2498.54 samples/sec Loss 1.7379 LearningRate 0.000186 Epoch: 24 Global Step: 507900 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:43:33,653-Speed 2510.44 samples/sec Loss 1.7755 LearningRate 0.000186 Epoch: 24 Global Step: 507910 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:43:41,851-Speed 2498.40 samples/sec Loss 1.7470 LearningRate 0.000186 Epoch: 24 Global Step: 507920 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 10:43:50,053-Speed 2497.59 samples/sec Loss 1.7174 LearningRate 0.000186 Epoch: 24 Global Step: 507930 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:43:58,269-Speed 2493.04 samples/sec Loss 1.7633 LearningRate 0.000186 Epoch: 24 Global Step: 507940 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:44:06,468-Speed 2498.18 samples/sec Loss 1.7183 LearningRate 0.000186 Epoch: 24 Global Step: 507950 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:44:14,668-Speed 2497.99 samples/sec Loss 1.7165 LearningRate 0.000186 Epoch: 24 Global Step: 507960 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:44:22,816-Speed 2513.80 samples/sec Loss 1.7327 LearningRate 0.000186 Epoch: 24 Global Step: 507970 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:44:31,027-Speed 2494.59 samples/sec Loss 1.7355 LearningRate 0.000186 Epoch: 24 Global Step: 507980 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:44:39,251-Speed 2490.74 samples/sec Loss 1.7261 LearningRate 0.000186 Epoch: 24 Global Step: 507990 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:44:47,457-Speed 2496.00 samples/sec Loss 1.7152 LearningRate 0.000186 Epoch: 24 Global Step: 508000 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:44:55,657-Speed 2498.10 samples/sec Loss 1.7470 LearningRate 0.000185 Epoch: 24 Global Step: 508010 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:45:03,870-Speed 2494.11 samples/sec Loss 1.6937 LearningRate 0.000185 Epoch: 24 Global Step: 508020 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:45:12,018-Speed 2513.87 samples/sec Loss 1.7175 LearningRate 0.000185 Epoch: 24 Global Step: 508030 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:45:20,219-Speed 2497.73 samples/sec Loss 1.6996 LearningRate 0.000185 Epoch: 24 Global Step: 508040 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 10:45:28,420-Speed 2497.71 samples/sec Loss 1.6768 LearningRate 0.000185 Epoch: 24 Global Step: 508050 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:45:36,620-Speed 2497.88 samples/sec Loss 1.7678 LearningRate 0.000185 Epoch: 24 Global Step: 508060 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:45:44,833-Speed 2493.99 samples/sec Loss 1.7351 LearningRate 0.000185 Epoch: 24 Global Step: 508070 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:45:53,037-Speed 2496.67 samples/sec Loss 1.7029 LearningRate 0.000185 Epoch: 24 Global Step: 508080 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:46:01,183-Speed 2514.57 samples/sec Loss 1.7356 LearningRate 0.000185 Epoch: 24 Global Step: 508090 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:46:09,385-Speed 2497.58 samples/sec Loss 1.7333 LearningRate 0.000185 Epoch: 24 Global Step: 508100 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:46:17,590-Speed 2496.30 samples/sec Loss 1.7343 LearningRate 0.000185 Epoch: 24 Global Step: 508110 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:46:25,791-Speed 2497.63 samples/sec Loss 1.7371 LearningRate 0.000185 Epoch: 24 Global Step: 508120 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:46:33,992-Speed 2497.75 samples/sec Loss 1.7332 LearningRate 0.000185 Epoch: 24 Global Step: 508130 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:46:42,193-Speed 2497.61 samples/sec Loss 1.7492 LearningRate 0.000185 Epoch: 24 Global Step: 508140 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:46:50,339-Speed 2514.52 samples/sec Loss 1.7448 LearningRate 0.000185 Epoch: 24 Global Step: 508150 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:46:58,539-Speed 2497.92 samples/sec Loss 1.7419 LearningRate 0.000185 Epoch: 24 Global Step: 508160 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 10:47:06,752-Speed 2494.04 samples/sec Loss 1.8051 LearningRate 0.000185 Epoch: 24 Global Step: 508170 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:47:14,950-Speed 2498.78 samples/sec Loss 1.7186 LearningRate 0.000185 Epoch: 24 Global Step: 508180 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:47:23,150-Speed 2498.09 samples/sec Loss 1.6986 LearningRate 0.000185 Epoch: 24 Global Step: 508190 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:47:31,348-Speed 2498.56 samples/sec Loss 1.7071 LearningRate 0.000185 Epoch: 24 Global Step: 508200 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:47:39,496-Speed 2513.75 samples/sec Loss 1.7447 LearningRate 0.000185 Epoch: 24 Global Step: 508210 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:47:47,694-Speed 2498.56 samples/sec Loss 1.7197 LearningRate 0.000185 Epoch: 24 Global Step: 508220 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:47:55,900-Speed 2496.23 samples/sec Loss 1.7293 LearningRate 0.000185 Epoch: 24 Global Step: 508230 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:48:04,102-Speed 2497.42 samples/sec Loss 1.7605 LearningRate 0.000185 Epoch: 24 Global Step: 508240 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:48:12,303-Speed 2497.65 samples/sec Loss 1.7563 LearningRate 0.000185 Epoch: 24 Global Step: 508250 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:48:20,503-Speed 2497.88 samples/sec Loss 1.7243 LearningRate 0.000185 Epoch: 24 Global Step: 508260 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:48:28,650-Speed 2514.03 samples/sec Loss 1.7241 LearningRate 0.000185 Epoch: 24 Global Step: 508270 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:48:36,851-Speed 2497.93 samples/sec Loss 1.7381 LearningRate 0.000185 Epoch: 24 Global Step: 508280 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 10:48:45,057-Speed 2496.19 samples/sec Loss 1.7408 LearningRate 0.000185 Epoch: 24 Global Step: 508290 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:48:53,267-Speed 2494.62 samples/sec Loss 1.7036 LearningRate 0.000185 Epoch: 24 Global Step: 508300 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:49:01,471-Speed 2496.88 samples/sec Loss 1.7802 LearningRate 0.000185 Epoch: 24 Global Step: 508310 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:49:09,674-Speed 2496.95 samples/sec Loss 1.7149 LearningRate 0.000185 Epoch: 24 Global Step: 508320 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:49:17,824-Speed 2513.45 samples/sec Loss 1.7085 LearningRate 0.000185 Epoch: 24 Global Step: 508330 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:49:26,032-Speed 2495.30 samples/sec Loss 1.7111 LearningRate 0.000185 Epoch: 24 Global Step: 508340 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:49:34,236-Speed 2497.15 samples/sec Loss 1.7156 LearningRate 0.000185 Epoch: 24 Global Step: 508350 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:49:42,438-Speed 2497.27 samples/sec Loss 1.6637 LearningRate 0.000185 Epoch: 24 Global Step: 508360 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:49:50,641-Speed 2496.98 samples/sec Loss 1.7098 LearningRate 0.000185 Epoch: 24 Global Step: 508370 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:49:58,842-Speed 2497.62 samples/sec Loss 1.7569 LearningRate 0.000185 Epoch: 24 Global Step: 508380 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:50:06,993-Speed 2512.73 samples/sec Loss 1.7426 LearningRate 0.000185 Epoch: 24 Global Step: 508390 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:50:15,197-Speed 2497.12 samples/sec Loss 1.7280 LearningRate 0.000185 Epoch: 24 Global Step: 508400 Fp16 Grad Scale: 16384 Required: 74 hours 
Training: 2022-07-10 10:50:23,405-Speed 2495.21 samples/sec Loss 1.7037 LearningRate 0.000185 Epoch: 24 Global Step: 508410 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:50:31,606-Speed 2497.62 samples/sec Loss 1.7239 LearningRate 0.000185 Epoch: 24 Global Step: 508420 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:50:39,815-Speed 2495.21 samples/sec Loss 1.7160 LearningRate 0.000185 Epoch: 24 Global Step: 508430 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:50:48,014-Speed 2498.19 samples/sec Loss 1.7492 LearningRate 0.000185 Epoch: 24 Global Step: 508440 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:50:56,169-Speed 2511.70 samples/sec Loss 1.7041 LearningRate 0.000185 Epoch: 24 Global Step: 508450 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:51:04,369-Speed 2497.83 samples/sec Loss 1.7094 LearningRate 0.000185 Epoch: 24 Global Step: 508460 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:51:12,570-Speed 2497.92 samples/sec Loss 1.7203 LearningRate 0.000185 Epoch: 24 Global Step: 508470 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:51:20,774-Speed 2496.72 samples/sec Loss 1.7255 LearningRate 0.000185 Epoch: 24 Global Step: 508480 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-07-10 10:51:28,973-Speed 2498.15 samples/sec Loss 1.7509 LearningRate 0.000185 Epoch: 24 Global Step: 508490 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:51:37,175-Speed 2497.31 samples/sec Loss 1.7326 LearningRate 0.000185 Epoch: 24 Global Step: 508500 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:51:45,324-Speed 2513.95 samples/sec Loss 1.7210 LearningRate 0.000185 Epoch: 24 Global Step: 508510 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-07-10 10:51:53,523-Speed 2498.03 samples/sec Loss 1.7297 LearningRate 0.000185 Epoch: 24 Global Step: 508520 Fp16 Grad Scale: 32768 Required: 74 hours 
Training: 2022-07-10 10:52:01,728-Speed 2496.55 samples/sec Loss 1.7166 LearningRate 0.000185 Epoch: 24 Global Step: 508530 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:52:09,927-Speed 2498.27 samples/sec Loss 1.7145 LearningRate 0.000185 Epoch: 24 Global Step: 508540 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:52:18,124-Speed 2498.64 samples/sec Loss 1.7517 LearningRate 0.000185 Epoch: 24 Global Step: 508550 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:52:26,321-Speed 2498.91 samples/sec Loss 1.7557 LearningRate 0.000185 Epoch: 24 Global Step: 508560 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:52:34,466-Speed 2514.94 samples/sec Loss 1.7253 LearningRate 0.000185 Epoch: 24 Global Step: 508570 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:52:42,674-Speed 2495.65 samples/sec Loss 1.7775 LearningRate 0.000185 Epoch: 24 Global Step: 508580 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:52:50,870-Speed 2499.08 samples/sec Loss 1.7032 LearningRate 0.000185 Epoch: 24 Global Step: 508590 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:52:59,070-Speed 2497.99 samples/sec Loss 1.7274 LearningRate 0.000185 Epoch: 24 Global Step: 508600 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:53:07,276-Speed 2496.30 samples/sec Loss 1.7417 LearningRate 0.000185 Epoch: 24 Global Step: 508610 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:53:15,472-Speed 2499.01 samples/sec Loss 1.7754 LearningRate 0.000185 Epoch: 24 Global Step: 508620 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:53:23,618-Speed 2514.69 samples/sec Loss 1.7617 LearningRate 0.000185 Epoch: 24 Global Step: 508630 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:53:31,816-Speed 2498.63 samples/sec Loss 1.7489 LearningRate 0.000185 Epoch: 24 Global Step: 508640 Fp16 Grad Scale: 32768 Required: 73 hours 
Training: 2022-07-10 10:53:40,012-Speed 2499.30 samples/sec Loss 1.7410 LearningRate 0.000185 Epoch: 24 Global Step: 508650 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:53:48,211-Speed 2498.23 samples/sec Loss 1.7512 LearningRate 0.000185 Epoch: 24 Global Step: 508660 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:53:56,409-Speed 2498.51 samples/sec Loss 1.7505 LearningRate 0.000185 Epoch: 24 Global Step: 508670 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:54:04,618-Speed 2495.14 samples/sec Loss 1.7132 LearningRate 0.000185 Epoch: 24 Global Step: 508680 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:54:12,764-Speed 2514.76 samples/sec Loss 1.7679 LearningRate 0.000185 Epoch: 24 Global Step: 508690 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:54:20,960-Speed 2499.17 samples/sec Loss 1.7334 LearningRate 0.000185 Epoch: 24 Global Step: 508700 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:54:29,164-Speed 2496.68 samples/sec Loss 1.7474 LearningRate 0.000185 Epoch: 24 Global Step: 508710 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:54:37,374-Speed 2494.68 samples/sec Loss 1.7459 LearningRate 0.000185 Epoch: 24 Global Step: 508720 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:54:45,573-Speed 2498.28 samples/sec Loss 1.7021 LearningRate 0.000185 Epoch: 24 Global Step: 508730 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:54:53,773-Speed 2498.18 samples/sec Loss 1.7411 LearningRate 0.000185 Epoch: 24 Global Step: 508740 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:55:01,919-Speed 2515.10 samples/sec Loss 1.7552 LearningRate 0.000185 Epoch: 24 Global Step: 508750 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:55:10,126-Speed 2495.61 samples/sec Loss 1.7558 LearningRate 0.000185 Epoch: 24 Global Step: 508760 Fp16 Grad Scale: 32768 Required: 73 hours 
Training: 2022-07-10 10:55:18,326-Speed 2498.17 samples/sec Loss 1.7765 LearningRate 0.000185 Epoch: 24 Global Step: 508770 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:55:26,523-Speed 2498.59 samples/sec Loss 1.7331 LearningRate 0.000185 Epoch: 24 Global Step: 508780 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:55:34,720-Speed 2498.84 samples/sec Loss 1.7498 LearningRate 0.000185 Epoch: 24 Global Step: 508790 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:55:42,916-Speed 2499.45 samples/sec Loss 1.7541 LearningRate 0.000185 Epoch: 24 Global Step: 508800 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:55:51,058-Speed 2515.67 samples/sec Loss 1.7700 LearningRate 0.000185 Epoch: 24 Global Step: 508810 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:55:59,256-Speed 2498.74 samples/sec Loss 1.7384 LearningRate 0.000185 Epoch: 24 Global Step: 508820 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:56:07,449-Speed 2500.22 samples/sec Loss 1.7316 LearningRate 0.000185 Epoch: 24 Global Step: 508830 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:56:15,645-Speed 2499.25 samples/sec Loss 1.7345 LearningRate 0.000185 Epoch: 24 Global Step: 508840 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:56:23,842-Speed 2498.86 samples/sec Loss 1.7599 LearningRate 0.000185 Epoch: 24 Global Step: 508850 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:56:32,039-Speed 2498.79 samples/sec Loss 1.7271 LearningRate 0.000185 Epoch: 24 Global Step: 508860 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:56:40,185-Speed 2514.62 samples/sec Loss 1.7311 LearningRate 0.000184 Epoch: 24 Global Step: 508870 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:56:48,379-Speed 2499.66 samples/sec Loss 1.7141 LearningRate 0.000184 Epoch: 24 Global Step: 508880 Fp16 Grad Scale: 32768 Required: 73 hours 
Training: 2022-07-10 10:56:56,576-Speed 2499.03 samples/sec Loss 1.7550 LearningRate 0.000184 Epoch: 24 Global Step: 508890 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:57:04,778-Speed 2497.22 samples/sec Loss 1.7248 LearningRate 0.000184 Epoch: 24 Global Step: 508900 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:57:12,975-Speed 2499.17 samples/sec Loss 1.7397 LearningRate 0.000184 Epoch: 24 Global Step: 508910 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:57:21,177-Speed 2497.10 samples/sec Loss 1.7177 LearningRate 0.000184 Epoch: 24 Global Step: 508920 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:57:29,327-Speed 2513.60 samples/sec Loss 1.7553 LearningRate 0.000184 Epoch: 24 Global Step: 508930 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:57:37,525-Speed 2498.50 samples/sec Loss 1.7396 LearningRate 0.000184 Epoch: 24 Global Step: 508940 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:57:45,723-Speed 2498.54 samples/sec Loss 1.7178 LearningRate 0.000184 Epoch: 24 Global Step: 508950 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:57:53,925-Speed 2497.24 samples/sec Loss 1.7267 LearningRate 0.000184 Epoch: 24 Global Step: 508960 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:58:02,122-Speed 2499.08 samples/sec Loss 1.7581 LearningRate 0.000184 Epoch: 24 Global Step: 508970 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:58:10,330-Speed 2495.71 samples/sec Loss 1.7042 LearningRate 0.000184 Epoch: 24 Global Step: 508980 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:58:18,476-Speed 2514.32 samples/sec Loss 1.7167 LearningRate 0.000184 Epoch: 24 Global Step: 508990 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:58:26,681-Speed 2496.64 samples/sec Loss 1.7510 LearningRate 0.000184 Epoch: 24 Global Step: 509000 Fp16 Grad Scale: 32768 Required: 73 hours 
Training: 2022-07-10 10:58:34,881-Speed 2497.95 samples/sec Loss 1.7402 LearningRate 0.000184 Epoch: 24 Global Step: 509010 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:58:43,089-Speed 2495.58 samples/sec Loss 1.7513 LearningRate 0.000184 Epoch: 24 Global Step: 509020 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:58:51,292-Speed 2497.05 samples/sec Loss 1.7531 LearningRate 0.000184 Epoch: 24 Global Step: 509030 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:58:59,490-Speed 2498.24 samples/sec Loss 1.7588 LearningRate 0.000184 Epoch: 24 Global Step: 509040 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:59:07,646-Speed 2511.60 samples/sec Loss 1.7447 LearningRate 0.000184 Epoch: 24 Global Step: 509050 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:59:15,846-Speed 2497.97 samples/sec Loss 1.7137 LearningRate 0.000184 Epoch: 24 Global Step: 509060 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:59:24,050-Speed 2496.83 samples/sec Loss 1.7625 LearningRate 0.000184 Epoch: 24 Global Step: 509070 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:59:32,258-Speed 2495.43 samples/sec Loss 1.7542 LearningRate 0.000184 Epoch: 24 Global Step: 509080 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:59:40,455-Speed 2498.68 samples/sec Loss 1.7287 LearningRate 0.000184 Epoch: 24 Global Step: 509090 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:59:48,654-Speed 2498.37 samples/sec Loss 1.7816 LearningRate 0.000184 Epoch: 24 Global Step: 509100 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 10:59:56,799-Speed 2514.80 samples/sec Loss 1.7304 LearningRate 0.000184 Epoch: 24 Global Step: 509110 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:00:04,998-Speed 2498.47 samples/sec Loss 1.7628 LearningRate 0.000184 Epoch: 24 Global Step: 509120 Fp16 Grad Scale: 32768 Required: 73 hours 
Training: 2022-07-10 11:00:13,204-Speed 2496.11 samples/sec Loss 1.7233 LearningRate 0.000184 Epoch: 24 Global Step: 509130 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:00:21,404-Speed 2497.89 samples/sec Loss 1.7410 LearningRate 0.000184 Epoch: 24 Global Step: 509140 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:00:29,602-Speed 2498.68 samples/sec Loss 1.7109 LearningRate 0.000184 Epoch: 24 Global Step: 509150 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:00:37,801-Speed 2498.36 samples/sec Loss 1.7550 LearningRate 0.000184 Epoch: 24 Global Step: 509160 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:00:45,948-Speed 2514.17 samples/sec Loss 1.7590 LearningRate 0.000184 Epoch: 24 Global Step: 509170 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:00:54,147-Speed 2498.46 samples/sec Loss 1.7647 LearningRate 0.000184 Epoch: 24 Global Step: 509180 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:01:02,346-Speed 2498.17 samples/sec Loss 1.7227 LearningRate 0.000184 Epoch: 24 Global Step: 509190 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:01:10,543-Speed 2499.42 samples/sec Loss 1.7350 LearningRate 0.000184 Epoch: 24 Global Step: 509200 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:01:18,739-Speed 2499.20 samples/sec Loss 1.7272 LearningRate 0.000184 Epoch: 24 Global Step: 509210 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:01:26,938-Speed 2498.27 samples/sec Loss 1.7498 LearningRate 0.000184 Epoch: 24 Global Step: 509220 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:01:35,085-Speed 2514.13 samples/sec Loss 1.7677 LearningRate 0.000184 Epoch: 24 Global Step: 509230 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:01:43,289-Speed 2496.51 samples/sec Loss 1.7406 LearningRate 0.000184 Epoch: 24 Global Step: 509240 Fp16 Grad Scale: 32768 Required: 73 hours 
Training: 2022-07-10 11:01:51,486-Speed 2499.01 samples/sec Loss 1.7686 LearningRate 0.000184 Epoch: 24 Global Step: 509250 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:01:59,682-Speed 2499.23 samples/sec Loss 1.7505 LearningRate 0.000184 Epoch: 24 Global Step: 509260 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:02:07,880-Speed 2498.57 samples/sec Loss 1.7258 LearningRate 0.000184 Epoch: 24 Global Step: 509270 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:02:16,081-Speed 2497.75 samples/sec Loss 1.7388 LearningRate 0.000184 Epoch: 24 Global Step: 509280 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:02:24,230-Speed 2513.84 samples/sec Loss 1.7547 LearningRate 0.000184 Epoch: 24 Global Step: 509290 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:02:32,434-Speed 2496.37 samples/sec Loss 1.7556 LearningRate 0.000184 Epoch: 24 Global Step: 509300 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:02:40,637-Speed 2497.37 samples/sec Loss 1.7689 LearningRate 0.000184 Epoch: 24 Global Step: 509310 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:02:48,833-Speed 2499.10 samples/sec Loss 1.7004 LearningRate 0.000184 Epoch: 24 Global Step: 509320 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:02:57,034-Speed 2497.74 samples/sec Loss 1.7374 LearningRate 0.000184 Epoch: 24 Global Step: 509330 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:03:05,231-Speed 2498.77 samples/sec Loss 1.7428 LearningRate 0.000184 Epoch: 24 Global Step: 509340 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:03:13,376-Speed 2514.65 samples/sec Loss 1.7644 LearningRate 0.000184 Epoch: 24 Global Step: 509350 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:03:21,575-Speed 2498.54 samples/sec Loss 1.7337 LearningRate 0.000184 Epoch: 24 Global Step: 509360 Fp16 Grad Scale: 32768 Required: 73 hours 
Training: 2022-07-10 11:03:29,772-Speed 2498.71 samples/sec Loss 1.7382 LearningRate 0.000184 Epoch: 24 Global Step: 509370 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:03:37,974-Speed 2497.50 samples/sec Loss 1.7240 LearningRate 0.000184 Epoch: 24 Global Step: 509380 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:03:46,174-Speed 2497.70 samples/sec Loss 1.7269 LearningRate 0.000184 Epoch: 24 Global Step: 509390 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:03:54,377-Speed 2497.48 samples/sec Loss 1.7439 LearningRate 0.000184 Epoch: 24 Global Step: 509400 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:04:02,529-Speed 2512.64 samples/sec Loss 1.7477 LearningRate 0.000184 Epoch: 24 Global Step: 509410 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:04:10,725-Speed 2498.90 samples/sec Loss 1.7453 LearningRate 0.000184 Epoch: 24 Global Step: 509420 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:04:18,924-Speed 2498.46 samples/sec Loss 1.7225 LearningRate 0.000184 Epoch: 24 Global Step: 509430 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:04:27,126-Speed 2497.17 samples/sec Loss 1.7035 LearningRate 0.000184 Epoch: 24 Global Step: 509440 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:04:35,323-Speed 2498.94 samples/sec Loss 1.7114 LearningRate 0.000184 Epoch: 24 Global Step: 509450 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:04:43,521-Speed 2498.60 samples/sec Loss 1.7469 LearningRate 0.000184 Epoch: 24 Global Step: 509460 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:04:51,674-Speed 2512.44 samples/sec Loss 1.6955 LearningRate 0.000184 Epoch: 24 Global Step: 509470 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:04:59,877-Speed 2496.82 samples/sec Loss 1.7696 LearningRate 0.000184 Epoch: 24 Global Step: 509480 Fp16 Grad Scale: 32768 Required: 73 hours 
Training: 2022-07-10 11:05:08,076-Speed 2498.28 samples/sec Loss 1.7135 LearningRate 0.000184 Epoch: 24 Global Step: 509490 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:05:16,275-Speed 2498.13 samples/sec Loss 1.7275 LearningRate 0.000184 Epoch: 24 Global Step: 509500 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:05:24,474-Speed 2498.38 samples/sec Loss 1.6660 LearningRate 0.000184 Epoch: 24 Global Step: 509510 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:05:32,676-Speed 2497.65 samples/sec Loss 1.7487 LearningRate 0.000184 Epoch: 24 Global Step: 509520 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:05:40,823-Speed 2514.44 samples/sec Loss 1.6903 LearningRate 0.000184 Epoch: 24 Global Step: 509530 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:05:49,024-Speed 2497.42 samples/sec Loss 1.7012 LearningRate 0.000184 Epoch: 24 Global Step: 509540 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:05:57,227-Speed 2497.09 samples/sec Loss 1.7545 LearningRate 0.000184 Epoch: 24 Global Step: 509550 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:06:05,422-Speed 2499.81 samples/sec Loss 1.7484 LearningRate 0.000184 Epoch: 24 Global Step: 509560 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:06:13,621-Speed 2498.03 samples/sec Loss 1.7146 LearningRate 0.000184 Epoch: 24 Global Step: 509570 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:06:21,821-Speed 2498.10 samples/sec Loss 1.7252 LearningRate 0.000184 Epoch: 24 Global Step: 509580 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:06:29,972-Speed 2514.07 samples/sec Loss 1.7401 LearningRate 0.000184 Epoch: 24 Global Step: 509590 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:06:38,170-Speed 2498.50 samples/sec Loss 1.6994 LearningRate 0.000184 Epoch: 24 Global Step: 509600 Fp16 Grad Scale: 32768 Required: 73 hours 
Training: 2022-07-10 11:06:46,369-Speed 2498.17 samples/sec Loss 1.7315 LearningRate 0.000184 Epoch: 24 Global Step: 509610 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:06:54,570-Speed 2497.83 samples/sec Loss 1.6805 LearningRate 0.000184 Epoch: 24 Global Step: 509620 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-07-10 11:07:02,733-Speed 2509.30 samples/sec Loss 1.7538 LearningRate 0.000184 Epoch: 24 Global Step: 509630 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:07:10,935-Speed 2497.27 samples/sec Loss 1.7079 LearningRate 0.000184 Epoch: 24 Global Step: 509640 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:07:19,088-Speed 2512.33 samples/sec Loss 1.7179 LearningRate 0.000184 Epoch: 24 Global Step: 509650 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:07:27,285-Speed 2498.85 samples/sec Loss 1.7334 LearningRate 0.000184 Epoch: 24 Global Step: 509660 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:07:35,489-Speed 2497.03 samples/sec Loss 1.6952 LearningRate 0.000184 Epoch: 24 Global Step: 509670 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:07:43,703-Speed 2493.38 samples/sec Loss 1.7259 LearningRate 0.000184 Epoch: 24 Global Step: 509680 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:07:51,905-Speed 2497.68 samples/sec Loss 1.7239 LearningRate 0.000184 Epoch: 24 Global Step: 509690 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:08:00,110-Speed 2496.37 samples/sec Loss 1.7487 LearningRate 0.000184 Epoch: 24 Global Step: 509700 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:08:08,255-Speed 2514.75 samples/sec Loss 1.7143 LearningRate 0.000184 Epoch: 24 Global Step: 509710 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:08:16,465-Speed 2495.09 samples/sec Loss 1.7456 LearningRate 0.000184 Epoch: 24 Global Step: 509720 Fp16 Grad Scale: 16384 Required: 73 hours 
Training: 2022-07-10 11:08:24,671-Speed 2496.06 samples/sec Loss 1.7359 LearningRate 0.000184 Epoch: 24 Global Step: 509730 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:08:32,866-Speed 2499.68 samples/sec Loss 1.7260 LearningRate 0.000183 Epoch: 24 Global Step: 509740 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:08:41,070-Speed 2497.39 samples/sec Loss 1.7316 LearningRate 0.000183 Epoch: 24 Global Step: 509750 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:08:49,272-Speed 2497.11 samples/sec Loss 1.6614 LearningRate 0.000183 Epoch: 24 Global Step: 509760 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:08:57,421-Speed 2513.61 samples/sec Loss 1.7273 LearningRate 0.000183 Epoch: 24 Global Step: 509770 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:09:05,624-Speed 2497.17 samples/sec Loss 1.6851 LearningRate 0.000183 Epoch: 24 Global Step: 509780 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:09:13,821-Speed 2498.94 samples/sec Loss 1.7056 LearningRate 0.000183 Epoch: 24 Global Step: 509790 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:09:22,021-Speed 2497.79 samples/sec Loss 1.7154 LearningRate 0.000183 Epoch: 24 Global Step: 509800 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:09:30,235-Speed 2493.95 samples/sec Loss 1.6946 LearningRate 0.000183 Epoch: 24 Global Step: 509810 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:09:38,430-Speed 2499.33 samples/sec Loss 1.6959 LearningRate 0.000183 Epoch: 24 Global Step: 509820 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:09:46,577-Speed 2514.41 samples/sec Loss 1.6991 LearningRate 0.000183 Epoch: 24 Global Step: 509830 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:09:54,786-Speed 2494.99 samples/sec Loss 1.7029 LearningRate 0.000183 Epoch: 24 Global Step: 509840 Fp16 Grad Scale: 16384 Required: 73 hours 
Training: 2022-07-10 11:10:02,988-Speed 2497.57 samples/sec Loss 1.7290 LearningRate 0.000183 Epoch: 24 Global Step: 509850 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:10:11,189-Speed 2497.72 samples/sec Loss 1.6873 LearningRate 0.000183 Epoch: 24 Global Step: 509860 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:10:19,388-Speed 2498.18 samples/sec Loss 1.7324 LearningRate 0.000183 Epoch: 24 Global Step: 509870 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:10:27,585-Speed 2498.70 samples/sec Loss 1.7247 LearningRate 0.000183 Epoch: 24 Global Step: 509880 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:10:35,731-Speed 2514.82 samples/sec Loss 1.7096 LearningRate 0.000183 Epoch: 24 Global Step: 509890 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:10:43,929-Speed 2498.67 samples/sec Loss 1.7440 LearningRate 0.000183 Epoch: 24 Global Step: 509900 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:10:52,130-Speed 2497.72 samples/sec Loss 1.7293 LearningRate 0.000183 Epoch: 24 Global Step: 509910 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:11:00,329-Speed 2498.37 samples/sec Loss 1.7438 LearningRate 0.000183 Epoch: 24 Global Step: 509920 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:11:08,527-Speed 2498.48 samples/sec Loss 1.7285 LearningRate 0.000183 Epoch: 24 Global Step: 509930 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:11:16,728-Speed 2497.79 samples/sec Loss 1.7040 LearningRate 0.000183 Epoch: 24 Global Step: 509940 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:11:24,872-Speed 2515.05 samples/sec Loss 1.7265 LearningRate 0.000183 Epoch: 24 Global Step: 509950 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:11:33,073-Speed 2497.75 samples/sec Loss 1.6789 LearningRate 0.000183 Epoch: 24 Global Step: 509960 Fp16 Grad Scale: 16384 Required: 73 hours 
Training: 2022-07-10 11:11:41,272-Speed 2498.23 samples/sec Loss 1.7116 LearningRate 0.000183 Epoch: 24 Global Step: 509970 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:11:49,470-Speed 2498.48 samples/sec Loss 1.6990 LearningRate 0.000183 Epoch: 24 Global Step: 509980 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:11:57,675-Speed 2496.58 samples/sec Loss 1.7045 LearningRate 0.000183 Epoch: 24 Global Step: 509990 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:12:05,876-Speed 2497.49 samples/sec Loss 1.7158 LearningRate 0.000183 Epoch: 24 Global Step: 510000 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:12:14,024-Speed 2514.65 samples/sec Loss 1.6701 LearningRate 0.000183 Epoch: 24 Global Step: 510010 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:12:22,225-Speed 2497.67 samples/sec Loss 1.7146 LearningRate 0.000183 Epoch: 24 Global Step: 510020 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:12:30,426-Speed 2497.79 samples/sec Loss 1.6960 LearningRate 0.000183 Epoch: 24 Global Step: 510030 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:12:38,625-Speed 2498.04 samples/sec Loss 1.7085 LearningRate 0.000183 Epoch: 24 Global Step: 510040 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:12:46,825-Speed 2498.07 samples/sec Loss 1.7196 LearningRate 0.000183 Epoch: 24 Global Step: 510050 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:12:55,028-Speed 2497.16 samples/sec Loss 1.7177 LearningRate 0.000183 Epoch: 24 Global Step: 510060 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:13:03,174-Speed 2514.28 samples/sec Loss 1.7251 LearningRate 0.000183 Epoch: 24 Global Step: 510070 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:13:11,374-Speed 2498.14 samples/sec Loss 1.7085 LearningRate 0.000183 Epoch: 24 Global Step: 510080 Fp16 Grad Scale: 16384 Required: 73 hours 
Training: 2022-07-10 11:13:19,573-Speed 2498.42 samples/sec Loss 1.6821 LearningRate 0.000183 Epoch: 24 Global Step: 510090 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:13:27,774-Speed 2497.47 samples/sec Loss 1.7266 LearningRate 0.000183 Epoch: 24 Global Step: 510100 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:13:35,986-Speed 2494.47 samples/sec Loss 1.7308 LearningRate 0.000183 Epoch: 24 Global Step: 510110 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:13:44,188-Speed 2497.45 samples/sec Loss 1.6953 LearningRate 0.000183 Epoch: 24 Global Step: 510120 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:13:52,336-Speed 2514.45 samples/sec Loss 1.7179 LearningRate 0.000183 Epoch: 24 Global Step: 510130 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:14:00,534-Speed 2498.28 samples/sec Loss 1.7398 LearningRate 0.000183 Epoch: 24 Global Step: 510140 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:14:08,734-Speed 2498.03 samples/sec Loss 1.7020 LearningRate 0.000183 Epoch: 24 Global Step: 510150 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:14:16,932-Speed 2498.53 samples/sec Loss 1.6892 LearningRate 0.000183 Epoch: 24 Global Step: 510160 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:14:25,129-Speed 2498.95 samples/sec Loss 1.7394 LearningRate 0.000183 Epoch: 24 Global Step: 510170 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:14:33,330-Speed 2497.84 samples/sec Loss 1.7112 LearningRate 0.000183 Epoch: 24 Global Step: 510180 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:14:41,475-Speed 2514.83 samples/sec Loss 1.7203 LearningRate 0.000183 Epoch: 24 Global Step: 510190 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:14:49,674-Speed 2498.11 samples/sec Loss 1.7008 LearningRate 0.000183 Epoch: 24 Global Step: 510200 Fp16 Grad Scale: 16384 Required: 73 hours 
Training: 2022-07-10 11:14:57,881-Speed 2495.69 samples/sec Loss 1.7148 LearningRate 0.000183 Epoch: 24 Global Step: 510210 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:15:06,084-Speed 2497.13 samples/sec Loss 1.7109 LearningRate 0.000183 Epoch: 24 Global Step: 510220 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:15:14,287-Speed 2497.08 samples/sec Loss 1.7335 LearningRate 0.000183 Epoch: 24 Global Step: 510230 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:15:22,491-Speed 2496.95 samples/sec Loss 1.7286 LearningRate 0.000183 Epoch: 24 Global Step: 510240 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:15:30,639-Speed 2513.61 samples/sec Loss 1.7193 LearningRate 0.000183 Epoch: 24 Global Step: 510250 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:15:38,840-Speed 2497.57 samples/sec Loss 1.7367 LearningRate 0.000183 Epoch: 24 Global Step: 510260 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:15:46,999-Speed 2510.93 samples/sec Loss 1.6602 LearningRate 0.000183 Epoch: 24 Global Step: 510270 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:15:55,221-Speed 2491.16 samples/sec Loss 1.7261 LearningRate 0.000183 Epoch: 24 Global Step: 510280 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:16:03,422-Speed 2498.14 samples/sec Loss 1.7142 LearningRate 0.000183 Epoch: 24 Global Step: 510290 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:16:11,619-Speed 2498.96 samples/sec Loss 1.7107 LearningRate 0.000183 Epoch: 24 Global Step: 510300 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:16:19,765-Speed 2514.57 samples/sec Loss 1.7190 LearningRate 0.000183 Epoch: 24 Global Step: 510310 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:16:27,961-Speed 2499.00 samples/sec Loss 1.6918 LearningRate 0.000183 Epoch: 24 Global Step: 510320 Fp16 Grad Scale: 8192 Required: 73 hours Training: 
2022-07-10 11:16:36,158-Speed 2499.12 samples/sec Loss 1.7225 LearningRate 0.000183 Epoch: 24 Global Step: 510330 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:16:44,359-Speed 2497.76 samples/sec Loss 1.7117 LearningRate 0.000183 Epoch: 24 Global Step: 510340 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:16:52,558-Speed 2498.14 samples/sec Loss 1.6935 LearningRate 0.000183 Epoch: 24 Global Step: 510350 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:17:00,757-Speed 2498.26 samples/sec Loss 1.7481 LearningRate 0.000183 Epoch: 24 Global Step: 510360 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:17:08,900-Speed 2515.67 samples/sec Loss 1.7487 LearningRate 0.000183 Epoch: 24 Global Step: 510370 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:17:17,111-Speed 2494.65 samples/sec Loss 1.7372 LearningRate 0.000183 Epoch: 24 Global Step: 510380 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:17:25,310-Speed 2498.20 samples/sec Loss 1.7254 LearningRate 0.000183 Epoch: 24 Global Step: 510390 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:17:33,509-Speed 2498.44 samples/sec Loss 1.7854 LearningRate 0.000183 Epoch: 24 Global Step: 510400 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:17:41,706-Speed 2498.78 samples/sec Loss 1.7535 LearningRate 0.000183 Epoch: 24 Global Step: 510410 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:17:49,904-Speed 2498.54 samples/sec Loss 1.7161 LearningRate 0.000183 Epoch: 24 Global Step: 510420 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:17:58,050-Speed 2514.51 samples/sec Loss 1.7391 LearningRate 0.000183 Epoch: 24 Global Step: 510430 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:18:06,245-Speed 2499.48 samples/sec Loss 1.7469 LearningRate 0.000183 Epoch: 24 Global Step: 510440 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 
11:18:14,445-Speed 2498.03 samples/sec Loss 1.7425 LearningRate 0.000183 Epoch: 24 Global Step: 510450 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:18:22,642-Speed 2498.80 samples/sec Loss 1.6789 LearningRate 0.000183 Epoch: 24 Global Step: 510460 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:18:30,850-Speed 2495.48 samples/sec Loss 1.7371 LearningRate 0.000183 Epoch: 24 Global Step: 510470 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:18:39,046-Speed 2499.11 samples/sec Loss 1.7391 LearningRate 0.000183 Epoch: 24 Global Step: 510480 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:18:47,201-Speed 2511.59 samples/sec Loss 1.7295 LearningRate 0.000183 Epoch: 24 Global Step: 510490 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:18:55,402-Speed 2497.89 samples/sec Loss 1.7481 LearningRate 0.000183 Epoch: 24 Global Step: 510500 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:19:03,595-Speed 2500.17 samples/sec Loss 1.7340 LearningRate 0.000183 Epoch: 24 Global Step: 510510 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:19:11,795-Speed 2497.99 samples/sec Loss 1.7206 LearningRate 0.000183 Epoch: 24 Global Step: 510520 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:19:19,992-Speed 2498.83 samples/sec Loss 1.7315 LearningRate 0.000183 Epoch: 24 Global Step: 510530 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:19:28,191-Speed 2498.42 samples/sec Loss 1.7419 LearningRate 0.000183 Epoch: 24 Global Step: 510540 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:19:36,342-Speed 2513.05 samples/sec Loss 1.6948 LearningRate 0.000183 Epoch: 24 Global Step: 510550 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:19:44,548-Speed 2496.08 samples/sec Loss 1.7220 LearningRate 0.000183 Epoch: 24 Global Step: 510560 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:19:52,752-Speed 
2496.64 samples/sec Loss 1.7040 LearningRate 0.000183 Epoch: 24 Global Step: 510570 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:20:00,951-Speed 2498.48 samples/sec Loss 1.6795 LearningRate 0.000183 Epoch: 24 Global Step: 510580 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:20:09,146-Speed 2499.19 samples/sec Loss 1.7021 LearningRate 0.000183 Epoch: 24 Global Step: 510590 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:20:17,342-Speed 2499.14 samples/sec Loss 1.7216 LearningRate 0.000183 Epoch: 24 Global Step: 510600 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:20:25,487-Speed 2515.00 samples/sec Loss 1.7550 LearningRate 0.000183 Epoch: 24 Global Step: 510610 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:20:33,686-Speed 2498.22 samples/sec Loss 1.7398 LearningRate 0.000182 Epoch: 24 Global Step: 510620 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:20:41,885-Speed 2498.48 samples/sec Loss 1.7317 LearningRate 0.000182 Epoch: 24 Global Step: 510630 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:20:50,084-Speed 2498.32 samples/sec Loss 1.7094 LearningRate 0.000182 Epoch: 24 Global Step: 510640 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:20:58,281-Speed 2498.97 samples/sec Loss 1.7131 LearningRate 0.000182 Epoch: 24 Global Step: 510650 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:21:06,485-Speed 2496.83 samples/sec Loss 1.7367 LearningRate 0.000182 Epoch: 24 Global Step: 510660 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:21:14,629-Speed 2515.20 samples/sec Loss 1.7754 LearningRate 0.000182 Epoch: 24 Global Step: 510670 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:21:22,838-Speed 2495.16 samples/sec Loss 1.7024 LearningRate 0.000182 Epoch: 24 Global Step: 510680 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:21:31,034-Speed 2499.09 samples/sec 
Loss 1.7449 LearningRate 0.000182 Epoch: 24 Global Step: 510690 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:21:39,235-Speed 2497.72 samples/sec Loss 1.7118 LearningRate 0.000182 Epoch: 24 Global Step: 510700 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:21:47,436-Speed 2497.81 samples/sec Loss 1.7666 LearningRate 0.000182 Epoch: 24 Global Step: 510710 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:21:55,634-Speed 2498.37 samples/sec Loss 1.7038 LearningRate 0.000182 Epoch: 24 Global Step: 510720 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:22:03,791-Speed 2511.08 samples/sec Loss 1.7479 LearningRate 0.000182 Epoch: 24 Global Step: 510730 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:22:11,990-Speed 2498.39 samples/sec Loss 1.7224 LearningRate 0.000182 Epoch: 24 Global Step: 510740 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:22:20,186-Speed 2499.36 samples/sec Loss 1.7102 LearningRate 0.000182 Epoch: 24 Global Step: 510750 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:22:28,385-Speed 2498.46 samples/sec Loss 1.6811 LearningRate 0.000182 Epoch: 24 Global Step: 510760 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:22:36,579-Speed 2499.74 samples/sec Loss 1.7360 LearningRate 0.000182 Epoch: 24 Global Step: 510770 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:22:44,778-Speed 2498.31 samples/sec Loss 1.7213 LearningRate 0.000182 Epoch: 24 Global Step: 510780 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:22:52,922-Speed 2515.11 samples/sec Loss 1.7226 LearningRate 0.000182 Epoch: 24 Global Step: 510790 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:23:01,121-Speed 2498.29 samples/sec Loss 1.7620 LearningRate 0.000182 Epoch: 24 Global Step: 510800 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:23:09,322-Speed 2497.84 samples/sec Loss 1.6998 
LearningRate 0.000182 Epoch: 24 Global Step: 510810 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:23:17,524-Speed 2497.13 samples/sec Loss 1.7287 LearningRate 0.000182 Epoch: 24 Global Step: 510820 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:23:25,721-Speed 2499.08 samples/sec Loss 1.7157 LearningRate 0.000182 Epoch: 24 Global Step: 510830 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:23:33,921-Speed 2497.95 samples/sec Loss 1.7179 LearningRate 0.000182 Epoch: 24 Global Step: 510840 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:23:42,067-Speed 2514.46 samples/sec Loss 1.7432 LearningRate 0.000182 Epoch: 24 Global Step: 510850 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:23:50,262-Speed 2499.71 samples/sec Loss 1.7023 LearningRate 0.000182 Epoch: 24 Global Step: 510860 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:23:58,467-Speed 2496.56 samples/sec Loss 1.7057 LearningRate 0.000182 Epoch: 24 Global Step: 510870 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:24:06,662-Speed 2499.37 samples/sec Loss 1.7075 LearningRate 0.000182 Epoch: 24 Global Step: 510880 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:24:14,866-Speed 2496.88 samples/sec Loss 1.7269 LearningRate 0.000182 Epoch: 24 Global Step: 510890 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:24:23,065-Speed 2498.29 samples/sec Loss 1.7035 LearningRate 0.000182 Epoch: 24 Global Step: 510900 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:24:31,208-Speed 2515.26 samples/sec Loss 1.7651 LearningRate 0.000182 Epoch: 24 Global Step: 510910 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:24:39,421-Speed 2494.13 samples/sec Loss 1.6832 LearningRate 0.000182 Epoch: 24 Global Step: 510920 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:24:47,624-Speed 2497.09 samples/sec Loss 1.7296 LearningRate 
0.000182 Epoch: 24 Global Step: 510930 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:24:55,819-Speed 2499.46 samples/sec Loss 1.7440 LearningRate 0.000182 Epoch: 24 Global Step: 510940 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:25:04,035-Speed 2493.16 samples/sec Loss 1.7226 LearningRate 0.000182 Epoch: 24 Global Step: 510950 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:25:12,234-Speed 2498.28 samples/sec Loss 1.7574 LearningRate 0.000182 Epoch: 24 Global Step: 510960 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:25:20,381-Speed 2514.45 samples/sec Loss 1.7364 LearningRate 0.000182 Epoch: 24 Global Step: 510970 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:25:28,580-Speed 2498.02 samples/sec Loss 1.7628 LearningRate 0.000182 Epoch: 24 Global Step: 510980 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:25:36,777-Speed 2499.00 samples/sec Loss 1.7084 LearningRate 0.000182 Epoch: 24 Global Step: 510990 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:25:44,991-Speed 2494.08 samples/sec Loss 1.7549 LearningRate 0.000182 Epoch: 24 Global Step: 511000 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:25:53,191-Speed 2497.85 samples/sec Loss 1.7175 LearningRate 0.000182 Epoch: 24 Global Step: 511010 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:26:01,390-Speed 2498.53 samples/sec Loss 1.7703 LearningRate 0.000182 Epoch: 24 Global Step: 511020 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:26:09,539-Speed 2513.44 samples/sec Loss 1.7689 LearningRate 0.000182 Epoch: 24 Global Step: 511030 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:26:17,739-Speed 2498.23 samples/sec Loss 1.7205 LearningRate 0.000182 Epoch: 24 Global Step: 511040 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:26:25,936-Speed 2498.66 samples/sec Loss 1.7112 LearningRate 0.000182 Epoch: 24 
Global Step: 511050 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:26:34,134-Speed 2498.62 samples/sec Loss 1.7383 LearningRate 0.000182 Epoch: 24 Global Step: 511060 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:26:42,332-Speed 2498.76 samples/sec Loss 1.6932 LearningRate 0.000182 Epoch: 24 Global Step: 511070 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:26:50,544-Speed 2494.17 samples/sec Loss 1.7408 LearningRate 0.000182 Epoch: 24 Global Step: 511080 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:26:58,691-Speed 2514.42 samples/sec Loss 1.7534 LearningRate 0.000182 Epoch: 24 Global Step: 511090 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:27:06,886-Speed 2499.35 samples/sec Loss 1.7354 LearningRate 0.000182 Epoch: 24 Global Step: 511100 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:27:15,086-Speed 2498.12 samples/sec Loss 1.6995 LearningRate 0.000182 Epoch: 24 Global Step: 511110 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:27:23,281-Speed 2499.48 samples/sec Loss 1.7218 LearningRate 0.000182 Epoch: 24 Global Step: 511120 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:27:31,480-Speed 2498.09 samples/sec Loss 1.6940 LearningRate 0.000182 Epoch: 24 Global Step: 511130 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:27:39,681-Speed 2497.76 samples/sec Loss 1.7128 LearningRate 0.000182 Epoch: 24 Global Step: 511140 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:27:47,826-Speed 2515.01 samples/sec Loss 1.6710 LearningRate 0.000182 Epoch: 24 Global Step: 511150 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:27:56,020-Speed 2499.71 samples/sec Loss 1.7176 LearningRate 0.000182 Epoch: 24 Global Step: 511160 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:28:04,221-Speed 2497.53 samples/sec Loss 1.7195 LearningRate 0.000182 Epoch: 24 Global Step: 511170 
Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:28:12,414-Speed 2499.93 samples/sec Loss 1.7122 LearningRate 0.000182 Epoch: 24 Global Step: 511180 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:28:20,611-Speed 2499.19 samples/sec Loss 1.7532 LearningRate 0.000182 Epoch: 24 Global Step: 511190 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:28:28,806-Speed 2499.23 samples/sec Loss 1.7471 LearningRate 0.000182 Epoch: 24 Global Step: 511200 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:28:36,950-Speed 2515.23 samples/sec Loss 1.7545 LearningRate 0.000182 Epoch: 24 Global Step: 511210 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:28:45,147-Speed 2498.91 samples/sec Loss 1.7169 LearningRate 0.000182 Epoch: 24 Global Step: 511220 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:28:53,343-Speed 2499.34 samples/sec Loss 1.7601 LearningRate 0.000182 Epoch: 24 Global Step: 511230 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:29:01,542-Speed 2498.16 samples/sec Loss 1.7226 LearningRate 0.000182 Epoch: 24 Global Step: 511240 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:29:09,750-Speed 2495.53 samples/sec Loss 1.7495 LearningRate 0.000182 Epoch: 24 Global Step: 511250 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:29:17,947-Speed 2499.21 samples/sec Loss 1.7671 LearningRate 0.000182 Epoch: 24 Global Step: 511260 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:29:26,090-Speed 2515.28 samples/sec Loss 1.7686 LearningRate 0.000182 Epoch: 24 Global Step: 511270 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:29:34,286-Speed 2499.08 samples/sec Loss 1.7057 LearningRate 0.000182 Epoch: 24 Global Step: 511280 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:29:42,496-Speed 2495.17 samples/sec Loss 1.7496 LearningRate 0.000182 Epoch: 24 Global Step: 511290 Fp16 Grad Scale: 
8192 Required: 73 hours Training: 2022-07-10 11:29:50,693-Speed 2498.66 samples/sec Loss 1.7104 LearningRate 0.000182 Epoch: 24 Global Step: 511300 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:29:58,890-Speed 2498.93 samples/sec Loss 1.6916 LearningRate 0.000182 Epoch: 24 Global Step: 511310 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:30:07,089-Speed 2498.19 samples/sec Loss 1.7452 LearningRate 0.000182 Epoch: 24 Global Step: 511320 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:30:15,235-Speed 2514.60 samples/sec Loss 1.7350 LearningRate 0.000182 Epoch: 24 Global Step: 511330 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:30:23,451-Speed 2493.04 samples/sec Loss 1.7096 LearningRate 0.000182 Epoch: 24 Global Step: 511340 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:30:31,660-Speed 2495.31 samples/sec Loss 1.7129 LearningRate 0.000182 Epoch: 24 Global Step: 511350 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:30:39,861-Speed 2497.69 samples/sec Loss 1.7540 LearningRate 0.000182 Epoch: 24 Global Step: 511360 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:30:48,060-Speed 2498.24 samples/sec Loss 1.7247 LearningRate 0.000182 Epoch: 24 Global Step: 511370 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:30:56,272-Speed 2494.31 samples/sec Loss 1.7448 LearningRate 0.000182 Epoch: 24 Global Step: 511380 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:31:04,420-Speed 2513.94 samples/sec Loss 1.7541 LearningRate 0.000182 Epoch: 24 Global Step: 511390 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:31:12,621-Speed 2497.74 samples/sec Loss 1.7072 LearningRate 0.000182 Epoch: 24 Global Step: 511400 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:31:20,819-Speed 2498.37 samples/sec Loss 1.7306 LearningRate 0.000182 Epoch: 24 Global Step: 511410 Fp16 Grad Scale: 8192 Required: 73 
hours Training: 2022-07-10 11:31:29,016-Speed 2499.08 samples/sec Loss 1.7464 LearningRate 0.000182 Epoch: 24 Global Step: 511420 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:31:37,217-Speed 2497.72 samples/sec Loss 1.7050 LearningRate 0.000182 Epoch: 24 Global Step: 511430 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:31:45,411-Speed 2499.65 samples/sec Loss 1.7575 LearningRate 0.000182 Epoch: 24 Global Step: 511440 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:31:53,558-Speed 2514.21 samples/sec Loss 1.7424 LearningRate 0.000182 Epoch: 24 Global Step: 511450 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:32:01,763-Speed 2496.45 samples/sec Loss 1.7421 LearningRate 0.000182 Epoch: 24 Global Step: 511460 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:32:09,962-Speed 2498.37 samples/sec Loss 1.7303 LearningRate 0.000182 Epoch: 24 Global Step: 511470 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:32:18,170-Speed 2495.59 samples/sec Loss 1.7305 LearningRate 0.000182 Epoch: 24 Global Step: 511480 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:32:26,371-Speed 2497.77 samples/sec Loss 1.7067 LearningRate 0.000181 Epoch: 24 Global Step: 511490 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:32:34,574-Speed 2497.33 samples/sec Loss 1.6951 LearningRate 0.000181 Epoch: 24 Global Step: 511500 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:32:42,718-Speed 2514.97 samples/sec Loss 1.6955 LearningRate 0.000181 Epoch: 24 Global Step: 511510 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:32:50,921-Speed 2497.32 samples/sec Loss 1.6921 LearningRate 0.000181 Epoch: 24 Global Step: 511520 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:32:59,119-Speed 2498.67 samples/sec Loss 1.7227 LearningRate 0.000181 Epoch: 24 Global Step: 511530 Fp16 Grad Scale: 16384 Required: 73 hours 
Training: 2022-07-10 11:33:07,317-Speed 2498.46 samples/sec Loss 1.7359 LearningRate 0.000181 Epoch: 24 Global Step: 511540 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:33:15,517-Speed 2498.16 samples/sec Loss 1.7247 LearningRate 0.000181 Epoch: 24 Global Step: 511550 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:33:23,718-Speed 2497.49 samples/sec Loss 1.6994 LearningRate 0.000181 Epoch: 24 Global Step: 511560 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:33:31,866-Speed 2514.17 samples/sec Loss 1.7531 LearningRate 0.000181 Epoch: 24 Global Step: 511570 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:33:40,066-Speed 2498.04 samples/sec Loss 1.6872 LearningRate 0.000181 Epoch: 24 Global Step: 511580 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:33:48,268-Speed 2497.24 samples/sec Loss 1.7349 LearningRate 0.000181 Epoch: 24 Global Step: 511590 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:33:56,467-Speed 2498.50 samples/sec Loss 1.7146 LearningRate 0.000181 Epoch: 24 Global Step: 511600 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:34:04,666-Speed 2498.50 samples/sec Loss 1.7362 LearningRate 0.000181 Epoch: 24 Global Step: 511610 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:34:12,865-Speed 2498.20 samples/sec Loss 1.7283 LearningRate 0.000181 Epoch: 24 Global Step: 511620 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:34:21,009-Speed 2515.19 samples/sec Loss 1.7295 LearningRate 0.000181 Epoch: 24 Global Step: 511630 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:34:29,207-Speed 2498.49 samples/sec Loss 1.6806 LearningRate 0.000181 Epoch: 24 Global Step: 511640 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:34:37,408-Speed 2497.77 samples/sec Loss 1.7167 LearningRate 0.000181 Epoch: 24 Global Step: 511650 Fp16 Grad Scale: 16384 Required: 73 hours 
Training: 2022-07-10 11:34:45,612-Speed 2496.67 samples/sec Loss 1.7781 LearningRate 0.000181 Epoch: 24 Global Step: 511660 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:34:53,810-Speed 2498.67 samples/sec Loss 1.6714 LearningRate 0.000181 Epoch: 24 Global Step: 511670 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:35:02,010-Speed 2497.90 samples/sec Loss 1.7024 LearningRate 0.000181 Epoch: 24 Global Step: 511680 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:35:10,159-Speed 2513.64 samples/sec Loss 1.7283 LearningRate 0.000181 Epoch: 24 Global Step: 511690 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:35:18,365-Speed 2496.11 samples/sec Loss 1.7457 LearningRate 0.000181 Epoch: 24 Global Step: 511700 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:35:26,568-Speed 2497.07 samples/sec Loss 1.7503 LearningRate 0.000181 Epoch: 24 Global Step: 511710 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:35:34,773-Speed 2496.32 samples/sec Loss 1.7247 LearningRate 0.000181 Epoch: 24 Global Step: 511720 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:35:43,002-Speed 2489.29 samples/sec Loss 1.7252 LearningRate 0.000181 Epoch: 24 Global Step: 511730 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:35:51,202-Speed 2498.14 samples/sec Loss 1.7046 LearningRate 0.000181 Epoch: 24 Global Step: 511740 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:35:59,350-Speed 2513.90 samples/sec Loss 1.7553 LearningRate 0.000181 Epoch: 24 Global Step: 511750 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:36:07,556-Speed 2496.19 samples/sec Loss 1.7494 LearningRate 0.000181 Epoch: 24 Global Step: 511760 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:36:15,776-Speed 2492.22 samples/sec Loss 1.7267 LearningRate 0.000181 Epoch: 24 Global Step: 511770 Fp16 Grad Scale: 16384 Required: 73 hours 
Training: 2022-07-10 11:36:23,987-Speed 2494.50 samples/sec Loss 1.7316 LearningRate 0.000181 Epoch: 24 Global Step: 511780 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:36:32,191-Speed 2496.93 samples/sec Loss 1.6699 LearningRate 0.000181 Epoch: 24 Global Step: 511790 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:36:40,392-Speed 2497.77 samples/sec Loss 1.7216 LearningRate 0.000181 Epoch: 24 Global Step: 511800 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:36:48,542-Speed 2513.40 samples/sec Loss 1.7369 LearningRate 0.000181 Epoch: 24 Global Step: 511810 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:36:56,743-Speed 2497.63 samples/sec Loss 1.7404 LearningRate 0.000181 Epoch: 24 Global Step: 511820 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:37:04,945-Speed 2497.51 samples/sec Loss 1.7133 LearningRate 0.000181 Epoch: 24 Global Step: 511830 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:37:13,142-Speed 2498.70 samples/sec Loss 1.6911 LearningRate 0.000181 Epoch: 24 Global Step: 511840 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:37:21,354-Speed 2494.72 samples/sec Loss 1.7391 LearningRate 0.000181 Epoch: 24 Global Step: 511850 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:37:29,561-Speed 2495.69 samples/sec Loss 1.7212 LearningRate 0.000181 Epoch: 24 Global Step: 511860 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:37:37,707-Speed 2514.65 samples/sec Loss 1.7401 LearningRate 0.000181 Epoch: 24 Global Step: 511870 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:37:45,917-Speed 2495.00 samples/sec Loss 1.7360 LearningRate 0.000181 Epoch: 24 Global Step: 511880 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:37:54,117-Speed 2497.64 samples/sec Loss 1.7103 LearningRate 0.000181 Epoch: 24 Global Step: 511890 Fp16 Grad Scale: 16384 Required: 73 hours 
Training: 2022-07-10 11:38:02,319-Speed 2497.29 samples/sec Loss 1.7212 LearningRate 0.000181 Epoch: 24 Global Step: 511900 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:38:10,551-Speed 2498.81 samples/sec Loss 1.7209 LearningRate 0.000181 Epoch: 24 Global Step: 511910 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:38:18,792-Speed 2498.69 samples/sec Loss 1.7433 LearningRate 0.000181 Epoch: 24 Global Step: 511920 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:38:26,942-Speed 2513.30 samples/sec Loss 1.7027 LearningRate 0.000181 Epoch: 24 Global Step: 511930 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:38:35,147-Speed 2496.10 samples/sec Loss 1.7446 LearningRate 0.000181 Epoch: 24 Global Step: 511940 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:38:43,407-Speed 2496.12 samples/sec Loss 1.7124 LearningRate 0.000181 Epoch: 24 Global Step: 511950 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:38:56,763-Speed 2131.95 samples/sec Loss 1.7472 LearningRate 0.000181 Epoch: 24 Global Step: 511960 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:39:04,988-Speed 2499.60 samples/sec Loss 1.6890 LearningRate 0.000181 Epoch: 24 Global Step: 511970 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:39:13,208-Speed 2500.67 samples/sec Loss 1.7519 LearningRate 0.000181 Epoch: 24 Global Step: 511980 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:39:25,930-Speed 1615.94 samples/sec Loss 1.7744 LearningRate 0.000181 Epoch: 24 Global Step: 511990 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:39:34,151-Speed 2498.90 samples/sec Loss 1.7266 LearningRate 0.000181 Epoch: 24 Global Step: 512000 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:39:42,343-Speed 2500.39 samples/sec Loss 1.7478 LearningRate 0.000181 Epoch: 24 Global Step: 512010 Fp16 Grad Scale: 16384 Required: 73 hours 
Training: 2022-07-10 11:39:57,043-Speed 1602.98 samples/sec Loss 1.7111 LearningRate 0.000181 Epoch: 24 Global Step: 512020 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:40:05,272-Speed 2495.10 samples/sec Loss 1.7380 LearningRate 0.000181 Epoch: 24 Global Step: 512030 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:40:13,464-Speed 2500.35 samples/sec Loss 1.6914 LearningRate 0.000181 Epoch: 24 Global Step: 512040 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:40:21,635-Speed 2517.63 samples/sec Loss 1.6971 LearningRate 0.000181 Epoch: 24 Global Step: 512050 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:40:34,190-Speed 1631.34 samples/sec Loss 1.7335 LearningRate 0.000181 Epoch: 24 Global Step: 512060 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:40:42,425-Speed 2500.60 samples/sec Loss 1.7285 LearningRate 0.000181 Epoch: 24 Global Step: 512070 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:40:55,865-Speed 1527.53 samples/sec Loss 1.7216 LearningRate 0.000181 Epoch: 24 Global Step: 512080 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:41:04,059-Speed 2499.57 samples/sec Loss 1.6747 LearningRate 0.000181 Epoch: 24 Global Step: 512090 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:41:12,286-Speed 2499.44 samples/sec Loss 1.7040 LearningRate 0.000181 Epoch: 24 Global Step: 512100 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:41:20,570-Speed 2511.21 samples/sec Loss 1.6933 LearningRate 0.000181 Epoch: 24 Global Step: 512110 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:41:29,439-Speed 2309.38 samples/sec Loss 1.7041 LearningRate 0.000181 Epoch: 24 Global Step: 512120 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:41:37,657-Speed 2492.38 samples/sec Loss 1.7117 LearningRate 0.000181 Epoch: 24 Global Step: 512130 Fp16 Grad Scale: 16384 Required: 73 hours 
Training: 2022-07-10 11:41:45,863-Speed 2496.25 samples/sec Loss 1.7143 LearningRate 0.000181 Epoch: 24 Global Step: 512140 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-07-10 11:41:54,027-Speed 2508.84 samples/sec Loss 1.7418 LearningRate 0.000181 Epoch: 24 Global Step: 512150 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:42:02,227-Speed 2497.84 samples/sec Loss 1.7268 LearningRate 0.000181 Epoch: 24 Global Step: 512160 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:42:10,380-Speed 2513.09 samples/sec Loss 1.6893 LearningRate 0.000181 Epoch: 24 Global Step: 512170 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:42:18,585-Speed 2496.51 samples/sec Loss 1.7257 LearningRate 0.000181 Epoch: 24 Global Step: 512180 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:42:26,790-Speed 2496.38 samples/sec Loss 1.7295 LearningRate 0.000181 Epoch: 24 Global Step: 512190 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:42:34,998-Speed 2495.58 samples/sec Loss 1.7184 LearningRate 0.000181 Epoch: 24 Global Step: 512200 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:42:43,196-Speed 2498.52 samples/sec Loss 1.7485 LearningRate 0.000181 Epoch: 24 Global Step: 512210 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:42:51,395-Speed 2498.35 samples/sec Loss 1.7470 LearningRate 0.000181 Epoch: 24 Global Step: 512220 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:42:59,549-Speed 2512.75 samples/sec Loss 1.7486 LearningRate 0.000181 Epoch: 24 Global Step: 512230 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:43:07,757-Speed 2495.74 samples/sec Loss 1.6816 LearningRate 0.000181 Epoch: 24 Global Step: 512240 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:43:15,961-Speed 2496.78 samples/sec Loss 1.7565 LearningRate 0.000181 Epoch: 24 Global Step: 512250 Fp16 Grad Scale: 8192 Required: 73 hours Training: 
2022-07-10 11:43:24,163-Speed 2497.15 samples/sec Loss 1.7361 LearningRate 0.000181 Epoch: 24 Global Step: 512260 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:43:32,359-Speed 2499.33 samples/sec Loss 1.7179 LearningRate 0.000181 Epoch: 24 Global Step: 512270 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:43:40,570-Speed 2494.61 samples/sec Loss 1.7217 LearningRate 0.000181 Epoch: 24 Global Step: 512280 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:43:48,719-Speed 2513.49 samples/sec Loss 1.7375 LearningRate 0.000181 Epoch: 24 Global Step: 512290 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:43:56,925-Speed 2496.31 samples/sec Loss 1.7313 LearningRate 0.000181 Epoch: 24 Global Step: 512300 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:44:05,132-Speed 2495.89 samples/sec Loss 1.7357 LearningRate 0.000181 Epoch: 24 Global Step: 512310 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:44:13,334-Speed 2497.38 samples/sec Loss 1.6982 LearningRate 0.000181 Epoch: 24 Global Step: 512320 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:44:21,546-Speed 2494.18 samples/sec Loss 1.6934 LearningRate 0.000181 Epoch: 24 Global Step: 512330 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:44:29,750-Speed 2496.76 samples/sec Loss 1.7159 LearningRate 0.000181 Epoch: 24 Global Step: 512340 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:44:37,903-Speed 2512.56 samples/sec Loss 1.6934 LearningRate 0.000181 Epoch: 24 Global Step: 512350 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:44:46,110-Speed 2495.88 samples/sec Loss 1.6988 LearningRate 0.000181 Epoch: 24 Global Step: 512360 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:44:54,314-Speed 2496.91 samples/sec Loss 1.6717 LearningRate 0.000180 Epoch: 24 Global Step: 512370 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 
11:45:02,517-Speed 2497.25 samples/sec Loss 1.7153 LearningRate 0.000180 Epoch: 24 Global Step: 512380 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:45:10,719-Speed 2497.11 samples/sec Loss 1.7050 LearningRate 0.000180 Epoch: 24 Global Step: 512390 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:45:18,925-Speed 2496.04 samples/sec Loss 1.6820 LearningRate 0.000180 Epoch: 24 Global Step: 512400 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:45:27,074-Speed 2513.63 samples/sec Loss 1.7523 LearningRate 0.000180 Epoch: 24 Global Step: 512410 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:45:35,283-Speed 2495.30 samples/sec Loss 1.7596 LearningRate 0.000180 Epoch: 24 Global Step: 512420 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:45:43,485-Speed 2497.41 samples/sec Loss 1.7072 LearningRate 0.000180 Epoch: 24 Global Step: 512430 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:45:51,690-Speed 2496.41 samples/sec Loss 1.7492 LearningRate 0.000180 Epoch: 24 Global Step: 512440 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:45:59,901-Speed 2494.67 samples/sec Loss 1.7480 LearningRate 0.000180 Epoch: 24 Global Step: 512450 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:46:08,110-Speed 2495.34 samples/sec Loss 1.6776 LearningRate 0.000180 Epoch: 24 Global Step: 512460 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:46:16,257-Speed 2514.42 samples/sec Loss 1.7030 LearningRate 0.000180 Epoch: 24 Global Step: 512470 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:46:24,462-Speed 2496.30 samples/sec Loss 1.7552 LearningRate 0.000180 Epoch: 24 Global Step: 512480 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:46:32,665-Speed 2497.13 samples/sec Loss 1.7478 LearningRate 0.000180 Epoch: 24 Global Step: 512490 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:46:40,867-Speed 
2497.28 samples/sec Loss 1.6938 LearningRate 0.000180 Epoch: 24 Global Step: 512500 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:46:49,069-Speed 2497.47 samples/sec Loss 1.7108 LearningRate 0.000180 Epoch: 24 Global Step: 512510 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:46:57,267-Speed 2498.60 samples/sec Loss 1.7143 LearningRate 0.000180 Epoch: 24 Global Step: 512520 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:47:05,416-Speed 2513.50 samples/sec Loss 1.7039 LearningRate 0.000180 Epoch: 24 Global Step: 512530 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:47:13,615-Speed 2498.29 samples/sec Loss 1.7324 LearningRate 0.000180 Epoch: 24 Global Step: 512540 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:47:21,814-Speed 2498.39 samples/sec Loss 1.6827 LearningRate 0.000180 Epoch: 24 Global Step: 512550 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:47:30,012-Speed 2498.39 samples/sec Loss 1.7451 LearningRate 0.000180 Epoch: 24 Global Step: 512560 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:47:38,208-Speed 2499.10 samples/sec Loss 1.7489 LearningRate 0.000180 Epoch: 24 Global Step: 512570 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:47:46,426-Speed 2492.57 samples/sec Loss 1.6955 LearningRate 0.000180 Epoch: 24 Global Step: 512580 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:47:54,573-Speed 2514.29 samples/sec Loss 1.7731 LearningRate 0.000180 Epoch: 24 Global Step: 512590 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:48:02,789-Speed 2492.90 samples/sec Loss 1.7155 LearningRate 0.000180 Epoch: 24 Global Step: 512600 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:48:10,994-Speed 2496.86 samples/sec Loss 1.7087 LearningRate 0.000180 Epoch: 24 Global Step: 512610 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:48:19,194-Speed 2498.20 samples/sec 
Loss 1.7585 LearningRate 0.000180 Epoch: 24 Global Step: 512620 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:48:27,393-Speed 2497.87 samples/sec Loss 1.7229 LearningRate 0.000180 Epoch: 24 Global Step: 512630 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:48:35,598-Speed 2496.55 samples/sec Loss 1.7277 LearningRate 0.000180 Epoch: 24 Global Step: 512640 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:48:43,749-Speed 2513.11 samples/sec Loss 1.7190 LearningRate 0.000180 Epoch: 24 Global Step: 512650 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:48:51,958-Speed 2495.46 samples/sec Loss 1.7399 LearningRate 0.000180 Epoch: 24 Global Step: 512660 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:49:00,157-Speed 2498.14 samples/sec Loss 1.7088 LearningRate 0.000180 Epoch: 24 Global Step: 512670 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:49:08,363-Speed 2496.03 samples/sec Loss 1.7260 LearningRate 0.000180 Epoch: 24 Global Step: 512680 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:49:16,562-Speed 2498.54 samples/sec Loss 1.7027 LearningRate 0.000180 Epoch: 24 Global Step: 512690 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:49:24,760-Speed 2498.69 samples/sec Loss 1.7033 LearningRate 0.000180 Epoch: 24 Global Step: 512700 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:49:32,908-Speed 2513.77 samples/sec Loss 1.6966 LearningRate 0.000180 Epoch: 24 Global Step: 512710 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:49:41,108-Speed 2498.10 samples/sec Loss 1.6812 LearningRate 0.000180 Epoch: 24 Global Step: 512720 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:49:49,305-Speed 2498.77 samples/sec Loss 1.6712 LearningRate 0.000180 Epoch: 24 Global Step: 512730 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:49:57,518-Speed 2493.94 samples/sec Loss 1.6946 
LearningRate 0.000180 Epoch: 24 Global Step: 512740 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:50:05,716-Speed 2498.61 samples/sec Loss 1.6978 LearningRate 0.000180 Epoch: 24 Global Step: 512750 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:50:13,913-Speed 2499.02 samples/sec Loss 1.7268 LearningRate 0.000180 Epoch: 24 Global Step: 512760 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:50:22,074-Speed 2510.00 samples/sec Loss 1.6955 LearningRate 0.000180 Epoch: 24 Global Step: 512770 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:50:30,277-Speed 2497.17 samples/sec Loss 1.6915 LearningRate 0.000180 Epoch: 24 Global Step: 512780 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:50:38,481-Speed 2496.41 samples/sec Loss 1.7195 LearningRate 0.000180 Epoch: 24 Global Step: 512790 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:50:46,694-Speed 2494.19 samples/sec Loss 1.7214 LearningRate 0.000180 Epoch: 24 Global Step: 512800 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:50:54,904-Speed 2494.87 samples/sec Loss 1.7230 LearningRate 0.000180 Epoch: 24 Global Step: 512810 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:51:03,103-Speed 2498.23 samples/sec Loss 1.6903 LearningRate 0.000180 Epoch: 24 Global Step: 512820 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:51:11,251-Speed 2514.08 samples/sec Loss 1.7103 LearningRate 0.000180 Epoch: 24 Global Step: 512830 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:51:19,471-Speed 2491.83 samples/sec Loss 1.7304 LearningRate 0.000180 Epoch: 24 Global Step: 512840 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:51:27,673-Speed 2497.62 samples/sec Loss 1.7016 LearningRate 0.000180 Epoch: 24 Global Step: 512850 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:51:35,871-Speed 2498.45 samples/sec Loss 1.7217 LearningRate 
0.000180 Epoch: 24 Global Step: 512860 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:51:44,071-Speed 2498.00 samples/sec Loss 1.6803 LearningRate 0.000180 Epoch: 24 Global Step: 512870 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:51:52,268-Speed 2498.74 samples/sec Loss 1.7039 LearningRate 0.000180 Epoch: 24 Global Step: 512880 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:52:00,410-Speed 2515.76 samples/sec Loss 1.7425 LearningRate 0.000180 Epoch: 24 Global Step: 512890 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-07-10 11:52:08,629-Speed 2492.19 samples/sec Loss 1.6937 LearningRate 0.000180 Epoch: 24 Global Step: 512900 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:52:16,831-Speed 2497.14 samples/sec Loss 1.7126 LearningRate 0.000180 Epoch: 24 Global Step: 512910 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:52:25,034-Speed 2497.68 samples/sec Loss 1.7012 LearningRate 0.000180 Epoch: 24 Global Step: 512920 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:52:33,236-Speed 2497.46 samples/sec Loss 1.6975 LearningRate 0.000180 Epoch: 24 Global Step: 512930 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:52:41,439-Speed 2496.75 samples/sec Loss 1.7169 LearningRate 0.000180 Epoch: 24 Global Step: 512940 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:52:49,585-Speed 2514.88 samples/sec Loss 1.7476 LearningRate 0.000180 Epoch: 24 Global Step: 512950 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:52:57,787-Speed 2497.42 samples/sec Loss 1.7344 LearningRate 0.000180 Epoch: 24 Global Step: 512960 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:53:05,985-Speed 2498.47 samples/sec Loss 1.7400 LearningRate 0.000180 Epoch: 24 Global Step: 512970 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:53:14,189-Speed 2496.87 samples/sec Loss 1.7437 LearningRate 0.000180 Epoch: 24 
Global Step: 512980 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:53:22,389-Speed 2498.19 samples/sec Loss 1.6956 LearningRate 0.000180 Epoch: 24 Global Step: 512990 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:53:30,588-Speed 2498.12 samples/sec Loss 1.6928 LearningRate 0.000180 Epoch: 24 Global Step: 513000 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:53:38,736-Speed 2513.87 samples/sec Loss 1.6823 LearningRate 0.000180 Epoch: 24 Global Step: 513010 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:53:46,942-Speed 2496.20 samples/sec Loss 1.7106 LearningRate 0.000180 Epoch: 24 Global Step: 513020 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:53:55,142-Speed 2498.11 samples/sec Loss 1.6988 LearningRate 0.000180 Epoch: 24 Global Step: 513030 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:54:03,354-Speed 2494.11 samples/sec Loss 1.7143 LearningRate 0.000180 Epoch: 24 Global Step: 513040 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:54:11,553-Speed 2498.73 samples/sec Loss 1.7258 LearningRate 0.000180 Epoch: 24 Global Step: 513050 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:54:19,757-Speed 2496.59 samples/sec Loss 1.6833 LearningRate 0.000180 Epoch: 24 Global Step: 513060 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:54:27,904-Speed 2514.25 samples/sec Loss 1.7659 LearningRate 0.000180 Epoch: 24 Global Step: 513070 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:54:36,104-Speed 2498.24 samples/sec Loss 1.6792 LearningRate 0.000180 Epoch: 24 Global Step: 513080 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:54:44,320-Speed 2493.10 samples/sec Loss 1.7261 LearningRate 0.000180 Epoch: 24 Global Step: 513090 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:54:52,519-Speed 2498.54 samples/sec Loss 1.6706 LearningRate 0.000180 Epoch: 24 Global Step: 513100 
Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:55:00,715-Speed 2499.22 samples/sec Loss 1.7134 LearningRate 0.000180 Epoch: 24 Global Step: 513110 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:55:08,915-Speed 2497.91 samples/sec Loss 1.7087 LearningRate 0.000180 Epoch: 24 Global Step: 513120 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:55:17,062-Speed 2514.15 samples/sec Loss 1.7288 LearningRate 0.000180 Epoch: 24 Global Step: 513130 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:55:25,264-Speed 2497.52 samples/sec Loss 1.6715 LearningRate 0.000180 Epoch: 24 Global Step: 513140 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:55:33,486-Speed 2491.32 samples/sec Loss 1.6709 LearningRate 0.000180 Epoch: 24 Global Step: 513150 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:55:41,698-Speed 2494.45 samples/sec Loss 1.7186 LearningRate 0.000180 Epoch: 24 Global Step: 513160 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:55:49,898-Speed 2498.06 samples/sec Loss 1.7392 LearningRate 0.000180 Epoch: 24 Global Step: 513170 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:55:58,107-Speed 2495.37 samples/sec Loss 1.7170 LearningRate 0.000180 Epoch: 24 Global Step: 513180 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:56:06,266-Speed 2510.46 samples/sec Loss 1.6937 LearningRate 0.000180 Epoch: 24 Global Step: 513190 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:56:14,471-Speed 2496.34 samples/sec Loss 1.6941 LearningRate 0.000180 Epoch: 24 Global Step: 513200 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:56:22,683-Speed 2494.39 samples/sec Loss 1.7276 LearningRate 0.000180 Epoch: 24 Global Step: 513210 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:56:30,883-Speed 2498.08 samples/sec Loss 1.6605 LearningRate 0.000180 Epoch: 24 Global Step: 513220 Fp16 Grad Scale: 
8192 Required: 72 hours Training: 2022-07-10 11:56:39,089-Speed 2496.60 samples/sec Loss 1.7342 LearningRate 0.000180 Epoch: 24 Global Step: 513230 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:56:47,300-Speed 2494.36 samples/sec Loss 1.7440 LearningRate 0.000180 Epoch: 24 Global Step: 513240 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:56:55,448-Speed 2513.79 samples/sec Loss 1.7182 LearningRate 0.000179 Epoch: 24 Global Step: 513250 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:57:03,651-Speed 2497.27 samples/sec Loss 1.7455 LearningRate 0.000179 Epoch: 24 Global Step: 513260 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:57:11,860-Speed 2495.20 samples/sec Loss 1.7103 LearningRate 0.000179 Epoch: 24 Global Step: 513270 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:57:20,059-Speed 2498.26 samples/sec Loss 1.6770 LearningRate 0.000179 Epoch: 24 Global Step: 513280 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:57:28,255-Speed 2499.02 samples/sec Loss 1.7402 LearningRate 0.000179 Epoch: 24 Global Step: 513290 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:57:36,456-Speed 2497.68 samples/sec Loss 1.7022 LearningRate 0.000179 Epoch: 24 Global Step: 513300 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:57:44,605-Speed 2513.61 samples/sec Loss 1.7317 LearningRate 0.000179 Epoch: 24 Global Step: 513310 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:57:52,807-Speed 2497.82 samples/sec Loss 1.6913 LearningRate 0.000179 Epoch: 24 Global Step: 513320 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:58:01,005-Speed 2498.61 samples/sec Loss 1.6676 LearningRate 0.000179 Epoch: 24 Global Step: 513330 Fp16 Grad Scale: 8192 Required: 72 hours Training: 2022-07-10 11:58:09,226-Speed 2491.54 samples/sec Loss 1.6946 LearningRate 0.000179 Epoch: 24 Global Step: 513340 Fp16 Grad Scale: 8192 Required: 72 
hours Training: 2022-07-10 11:58:17,423-Speed 2499.00 samples/sec Loss 1.7441 LearningRate 0.000179 Epoch: 24 Global Step: 513350 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 11:58:25,628-Speed 2496.40 samples/sec Loss 1.7367 LearningRate 0.000179 Epoch: 24 Global Step: 513360 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 11:58:33,776-Speed 2513.60 samples/sec Loss 1.7224 LearningRate 0.000179 Epoch: 24 Global Step: 513370 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 11:58:41,984-Speed 2495.43 samples/sec Loss 1.7457 LearningRate 0.000179 Epoch: 24 Global Step: 513380 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 11:58:50,187-Speed 2497.12 samples/sec Loss 1.7272 LearningRate 0.000179 Epoch: 24 Global Step: 513390 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 11:58:58,389-Speed 2497.40 samples/sec Loss 1.6890 LearningRate 0.000179 Epoch: 24 Global Step: 513400 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 11:59:06,593-Speed 2496.79 samples/sec Loss 1.7958 LearningRate 0.000179 Epoch: 24 Global Step: 513410 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 11:59:14,795-Speed 2497.42 samples/sec Loss 1.7382 LearningRate 0.000179 Epoch: 24 Global Step: 513420 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 11:59:22,944-Speed 2513.84 samples/sec Loss 1.7230 LearningRate 0.000179 Epoch: 24 Global Step: 513430 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 11:59:31,147-Speed 2496.93 samples/sec Loss 1.7012 LearningRate 0.000179 Epoch: 24 Global Step: 513440 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 11:59:39,345-Speed 2498.66 samples/sec Loss 1.7061 LearningRate 0.000179 Epoch: 24 Global Step: 513450 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 11:59:47,549-Speed 2496.66 samples/sec Loss 1.7105 LearningRate 0.000179 Epoch: 24 Global Step: 513460 Fp16 Grad Scale: 16384 Required: 72 hours 
Training: 2022-07-10 11:59:55,754-Speed 2496.51 samples/sec Loss 1.6543 LearningRate 0.000179 Epoch: 24 Global Step: 513470 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:00:03,952-Speed 2498.45 samples/sec Loss 1.6973 LearningRate 0.000179 Epoch: 24 Global Step: 513480 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:00:12,103-Speed 2512.89 samples/sec Loss 1.6610 LearningRate 0.000179 Epoch: 24 Global Step: 513490 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:00:20,305-Speed 2497.61 samples/sec Loss 1.7133 LearningRate 0.000179 Epoch: 24 Global Step: 513500 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:00:28,507-Speed 2497.26 samples/sec Loss 1.6906 LearningRate 0.000179 Epoch: 24 Global Step: 513510 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:00:36,707-Speed 2497.81 samples/sec Loss 1.6744 LearningRate 0.000179 Epoch: 24 Global Step: 513520 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:00:44,929-Speed 2491.29 samples/sec Loss 1.6834 LearningRate 0.000179 Epoch: 24 Global Step: 513530 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:00:53,128-Speed 2498.24 samples/sec Loss 1.6776 LearningRate 0.000179 Epoch: 24 Global Step: 513540 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:01:01,276-Speed 2513.84 samples/sec Loss 1.7139 LearningRate 0.000179 Epoch: 24 Global Step: 513550 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:01:09,484-Speed 2496.17 samples/sec Loss 1.7155 LearningRate 0.000179 Epoch: 24 Global Step: 513560 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:01:17,685-Speed 2497.50 samples/sec Loss 1.7254 LearningRate 0.000179 Epoch: 24 Global Step: 513570 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:01:25,889-Speed 2496.92 samples/sec Loss 1.7244 LearningRate 0.000179 Epoch: 24 Global Step: 513580 Fp16 Grad Scale: 16384 Required: 72 hours 
Training: 2022-07-10 12:01:34,088-Speed 2498.46 samples/sec Loss 1.6779 LearningRate 0.000179 Epoch: 24 Global Step: 513590 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:01:42,303-Speed 2493.76 samples/sec Loss 1.6907 LearningRate 0.000179 Epoch: 24 Global Step: 513600 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:01:50,453-Speed 2513.17 samples/sec Loss 1.6894 LearningRate 0.000179 Epoch: 24 Global Step: 513610 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:01:58,655-Speed 2497.38 samples/sec Loss 1.6906 LearningRate 0.000179 Epoch: 24 Global Step: 513620 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:02:06,866-Speed 2494.98 samples/sec Loss 1.6993 LearningRate 0.000179 Epoch: 24 Global Step: 513630 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:02:15,062-Speed 2498.85 samples/sec Loss 1.6896 LearningRate 0.000179 Epoch: 24 Global Step: 513640 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:02:23,264-Speed 2497.43 samples/sec Loss 1.7418 LearningRate 0.000179 Epoch: 24 Global Step: 513650 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:02:31,473-Speed 2495.30 samples/sec Loss 1.7154 LearningRate 0.000179 Epoch: 24 Global Step: 513660 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:02:39,623-Speed 2513.19 samples/sec Loss 1.7198 LearningRate 0.000179 Epoch: 24 Global Step: 513670 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:02:47,829-Speed 2496.37 samples/sec Loss 1.7298 LearningRate 0.000179 Epoch: 24 Global Step: 513680 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:02:56,028-Speed 2498.14 samples/sec Loss 1.7006 LearningRate 0.000179 Epoch: 24 Global Step: 513690 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:03:04,229-Speed 2497.92 samples/sec Loss 1.7081 LearningRate 0.000179 Epoch: 24 Global Step: 513700 Fp16 Grad Scale: 16384 Required: 72 hours 
Training: 2022-07-10 12:03:12,440-Speed 2494.44 samples/sec Loss 1.7225 LearningRate 0.000179 Epoch: 24 Global Step: 513710 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:03:20,643-Speed 2497.22 samples/sec Loss 1.7361 LearningRate 0.000179 Epoch: 24 Global Step: 513720 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:03:28,789-Speed 2514.37 samples/sec Loss 1.6815 LearningRate 0.000179 Epoch: 24 Global Step: 513730 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:03:36,990-Speed 2497.63 samples/sec Loss 1.6978 LearningRate 0.000179 Epoch: 24 Global Step: 513740 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:03:45,147-Speed 2511.23 samples/sec Loss 1.7178 LearningRate 0.000179 Epoch: 24 Global Step: 513750 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:03:53,345-Speed 2499.02 samples/sec Loss 1.7111 LearningRate 0.000179 Epoch: 24 Global Step: 513760 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:04:01,543-Speed 2498.55 samples/sec Loss 1.7188 LearningRate 0.000179 Epoch: 24 Global Step: 513770 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:04:09,741-Speed 2498.40 samples/sec Loss 1.7195 LearningRate 0.000179 Epoch: 24 Global Step: 513780 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:04:17,887-Speed 2514.46 samples/sec Loss 1.7119 LearningRate 0.000179 Epoch: 24 Global Step: 513790 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:04:26,091-Speed 2497.02 samples/sec Loss 1.6825 LearningRate 0.000179 Epoch: 24 Global Step: 513800 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:04:34,290-Speed 2498.36 samples/sec Loss 1.7411 LearningRate 0.000179 Epoch: 24 Global Step: 513810 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:04:42,493-Speed 2496.98 samples/sec Loss 1.7228 LearningRate 0.000179 Epoch: 24 Global Step: 513820 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:04:50,693-Speed 2498.01 samples/sec Loss 1.7154 LearningRate 0.000179 Epoch: 24 Global Step: 513830 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:04:58,893-Speed 2497.86 samples/sec Loss 1.6885 LearningRate 0.000179 Epoch: 24 Global Step: 513840 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:05:07,039-Speed 2514.41 samples/sec Loss 1.6827 LearningRate 0.000179 Epoch: 24 Global Step: 513850 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:05:15,244-Speed 2496.65 samples/sec Loss 1.7261 LearningRate 0.000179 Epoch: 24 Global Step: 513860 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:05:23,448-Speed 2496.59 samples/sec Loss 1.7271 LearningRate 0.000179 Epoch: 24 Global Step: 513870 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:05:31,652-Speed 2496.80 samples/sec Loss 1.7025 LearningRate 0.000179 Epoch: 24 Global Step: 513880 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:05:39,853-Speed 2497.74 samples/sec Loss 1.7337 LearningRate 0.000179 Epoch: 24 Global Step: 513890 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:05:48,054-Speed 2497.84 samples/sec Loss 1.7419 LearningRate 0.000179 Epoch: 24 Global Step: 513900 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:05:56,200-Speed 2514.58 samples/sec Loss 1.6923 LearningRate 0.000179 Epoch: 24 Global Step: 513910 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:06:04,399-Speed 2498.16 samples/sec Loss 1.7177 LearningRate 0.000179 Epoch: 24 Global Step: 513920 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:06:12,596-Speed 2499.11 samples/sec Loss 1.7148 LearningRate 0.000179 Epoch: 24 Global Step: 513930 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:06:20,797-Speed 2497.59 samples/sec Loss 1.7130 LearningRate 0.000179 Epoch: 24 Global Step: 513940 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:06:29,014-Speed 2492.56 samples/sec Loss 1.7029 LearningRate 0.000179 Epoch: 24 Global Step: 513950 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:06:37,214-Speed 2498.06 samples/sec Loss 1.7162 LearningRate 0.000179 Epoch: 24 Global Step: 513960 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:06:45,376-Speed 2509.74 samples/sec Loss 1.7150 LearningRate 0.000179 Epoch: 24 Global Step: 513970 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:06:53,588-Speed 2494.33 samples/sec Loss 1.7204 LearningRate 0.000179 Epoch: 24 Global Step: 513980 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:07:01,787-Speed 2498.35 samples/sec Loss 1.6684 LearningRate 0.000179 Epoch: 24 Global Step: 513990 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:07:09,988-Speed 2497.51 samples/sec Loss 1.7107 LearningRate 0.000179 Epoch: 24 Global Step: 514000 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:07:18,186-Speed 2498.48 samples/sec Loss 1.7672 LearningRate 0.000179 Epoch: 24 Global Step: 514010 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:07:26,395-Speed 2495.32 samples/sec Loss 1.7542 LearningRate 0.000179 Epoch: 24 Global Step: 514020 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:07:34,540-Speed 2515.01 samples/sec Loss 1.7423 LearningRate 0.000179 Epoch: 24 Global Step: 514030 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:07:42,738-Speed 2498.39 samples/sec Loss 1.7016 LearningRate 0.000179 Epoch: 24 Global Step: 514040 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:07:50,938-Speed 2498.12 samples/sec Loss 1.7538 LearningRate 0.000179 Epoch: 24 Global Step: 514050 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:07:59,137-Speed 2498.38 samples/sec Loss 1.6761 LearningRate 0.000179 Epoch: 24 Global Step: 514060 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:08:07,333-Speed 2498.96 samples/sec Loss 1.7082 LearningRate 0.000179 Epoch: 24 Global Step: 514070 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:08:15,531-Speed 2498.72 samples/sec Loss 1.6986 LearningRate 0.000179 Epoch: 24 Global Step: 514080 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:08:23,677-Speed 2514.59 samples/sec Loss 1.7283 LearningRate 0.000179 Epoch: 24 Global Step: 514090 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:08:31,877-Speed 2497.96 samples/sec Loss 1.7398 LearningRate 0.000179 Epoch: 24 Global Step: 514100 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:08:40,079-Speed 2497.64 samples/sec Loss 1.6947 LearningRate 0.000179 Epoch: 24 Global Step: 514110 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:08:48,275-Speed 2498.98 samples/sec Loss 1.6833 LearningRate 0.000179 Epoch: 24 Global Step: 514120 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:08:56,475-Speed 2497.85 samples/sec Loss 1.7051 LearningRate 0.000178 Epoch: 24 Global Step: 514130 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:09:04,680-Speed 2496.63 samples/sec Loss 1.7608 LearningRate 0.000178 Epoch: 24 Global Step: 514140 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:09:12,835-Speed 2511.87 samples/sec Loss 1.6770 LearningRate 0.000178 Epoch: 24 Global Step: 514150 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:09:21,048-Speed 2494.04 samples/sec Loss 1.7183 LearningRate 0.000178 Epoch: 24 Global Step: 514160 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:09:29,247-Speed 2498.40 samples/sec Loss 1.6881 LearningRate 0.000178 Epoch: 24 Global Step: 514170 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:09:37,448-Speed 2497.48 samples/sec Loss 1.7136 LearningRate 0.000178 Epoch: 24 Global Step: 514180 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:09:45,646-Speed 2498.38 samples/sec Loss 1.6848 LearningRate 0.000178 Epoch: 24 Global Step: 514190 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:09:53,846-Speed 2498.11 samples/sec Loss 1.6928 LearningRate 0.000178 Epoch: 24 Global Step: 514200 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:10:01,990-Speed 2515.39 samples/sec Loss 1.6934 LearningRate 0.000178 Epoch: 24 Global Step: 514210 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:10:10,200-Speed 2494.79 samples/sec Loss 1.6992 LearningRate 0.000178 Epoch: 24 Global Step: 514220 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:10:18,398-Speed 2498.53 samples/sec Loss 1.6884 LearningRate 0.000178 Epoch: 24 Global Step: 514230 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:10:26,599-Speed 2497.55 samples/sec Loss 1.7224 LearningRate 0.000178 Epoch: 24 Global Step: 514240 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:10:34,804-Speed 2496.56 samples/sec Loss 1.7143 LearningRate 0.000178 Epoch: 24 Global Step: 514250 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:10:43,003-Speed 2499.04 samples/sec Loss 1.7299 LearningRate 0.000178 Epoch: 24 Global Step: 514260 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:10:51,149-Speed 2514.43 samples/sec Loss 1.7403 LearningRate 0.000178 Epoch: 24 Global Step: 514270 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:10:59,349-Speed 2498.11 samples/sec Loss 1.6998 LearningRate 0.000178 Epoch: 24 Global Step: 514280 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:11:07,549-Speed 2497.76 samples/sec Loss 1.6912 LearningRate 0.000178 Epoch: 24 Global Step: 514290 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:11:15,748-Speed 2498.16 samples/sec Loss 1.7438 LearningRate 0.000178 Epoch: 24 Global Step: 514300 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:11:23,948-Speed 2498.34 samples/sec Loss 1.7202 LearningRate 0.000178 Epoch: 24 Global Step: 514310 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:11:32,148-Speed 2498.01 samples/sec Loss 1.6750 LearningRate 0.000178 Epoch: 24 Global Step: 514320 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:11:40,296-Speed 2513.78 samples/sec Loss 1.7099 LearningRate 0.000178 Epoch: 24 Global Step: 514330 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:11:48,502-Speed 2496.42 samples/sec Loss 1.6906 LearningRate 0.000178 Epoch: 24 Global Step: 514340 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:11:56,703-Speed 2497.93 samples/sec Loss 1.7327 LearningRate 0.000178 Epoch: 24 Global Step: 514350 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:12:04,899-Speed 2499.02 samples/sec Loss 1.7604 LearningRate 0.000178 Epoch: 24 Global Step: 514360 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:12:13,098-Speed 2498.19 samples/sec Loss 1.7332 LearningRate 0.000178 Epoch: 24 Global Step: 514370 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:12:21,302-Speed 2496.91 samples/sec Loss 1.7309 LearningRate 0.000178 Epoch: 24 Global Step: 514380 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:12:29,447-Speed 2515.11 samples/sec Loss 1.7404 LearningRate 0.000178 Epoch: 24 Global Step: 514390 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:12:37,646-Speed 2498.01 samples/sec Loss 1.7153 LearningRate 0.000178 Epoch: 24 Global Step: 514400 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:12:45,844-Speed 2498.57 samples/sec Loss 1.7516 LearningRate 0.000178 Epoch: 24 Global Step: 514410 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:12:54,045-Speed 2497.72 samples/sec Loss 1.7506 LearningRate 0.000178 Epoch: 24 Global Step: 514420 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:13:02,250-Speed 2496.57 samples/sec Loss 1.6910 LearningRate 0.000178 Epoch: 24 Global Step: 514430 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:13:10,453-Speed 2496.83 samples/sec Loss 1.6925 LearningRate 0.000178 Epoch: 24 Global Step: 514440 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:13:18,612-Speed 2510.63 samples/sec Loss 1.6901 LearningRate 0.000178 Epoch: 24 Global Step: 514450 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:13:26,814-Speed 2497.44 samples/sec Loss 1.7155 LearningRate 0.000178 Epoch: 24 Global Step: 514460 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:13:35,016-Speed 2497.33 samples/sec Loss 1.7523 LearningRate 0.000178 Epoch: 24 Global Step: 514470 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:13:43,213-Speed 2498.89 samples/sec Loss 1.6980 LearningRate 0.000178 Epoch: 24 Global Step: 514480 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:13:51,415-Speed 2497.36 samples/sec Loss 1.6968 LearningRate 0.000178 Epoch: 24 Global Step: 514490 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:13:59,613-Speed 2498.51 samples/sec Loss 1.6949 LearningRate 0.000178 Epoch: 24 Global Step: 514500 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:14:07,762-Speed 2513.81 samples/sec Loss 1.6828 LearningRate 0.000178 Epoch: 24 Global Step: 514510 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:14:15,974-Speed 2494.12 samples/sec Loss 1.6770 LearningRate 0.000178 Epoch: 24 Global Step: 514520 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:14:24,174-Speed 2498.03 samples/sec Loss 1.7255 LearningRate 0.000178 Epoch: 24 Global Step: 514530 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:14:32,389-Speed 2493.46 samples/sec Loss 1.7291 LearningRate 0.000178 Epoch: 24 Global Step: 514540 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:14:40,587-Speed 2498.34 samples/sec Loss 1.6837 LearningRate 0.000178 Epoch: 24 Global Step: 514550 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:14:48,791-Speed 2496.98 samples/sec Loss 1.6980 LearningRate 0.000178 Epoch: 24 Global Step: 514560 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:14:56,939-Speed 2513.82 samples/sec Loss 1.7084 LearningRate 0.000178 Epoch: 24 Global Step: 514570 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:15:05,142-Speed 2497.00 samples/sec Loss 1.6952 LearningRate 0.000178 Epoch: 24 Global Step: 514580 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:15:13,343-Speed 2497.57 samples/sec Loss 1.7003 LearningRate 0.000178 Epoch: 24 Global Step: 514590 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:15:21,542-Speed 2498.30 samples/sec Loss 1.7122 LearningRate 0.000178 Epoch: 24 Global Step: 514600 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:15:29,739-Speed 2498.76 samples/sec Loss 1.7130 LearningRate 0.000178 Epoch: 24 Global Step: 514610 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:15:37,942-Speed 2497.26 samples/sec Loss 1.6897 LearningRate 0.000178 Epoch: 24 Global Step: 514620 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:15:46,087-Speed 2514.53 samples/sec Loss 1.6882 LearningRate 0.000178 Epoch: 24 Global Step: 514630 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:15:54,302-Speed 2493.70 samples/sec Loss 1.7132 LearningRate 0.000178 Epoch: 24 Global Step: 514640 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:16:02,508-Speed 2496.09 samples/sec Loss 1.7190 LearningRate 0.000178 Epoch: 24 Global Step: 514650 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:16:10,707-Speed 2498.67 samples/sec Loss 1.6950 LearningRate 0.000178 Epoch: 24 Global Step: 514660 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:16:18,907-Speed 2497.87 samples/sec Loss 1.7135 LearningRate 0.000178 Epoch: 24 Global Step: 514670 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:16:27,115-Speed 2495.50 samples/sec Loss 1.6941 LearningRate 0.000178 Epoch: 24 Global Step: 514680 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:16:35,261-Speed 2514.55 samples/sec Loss 1.7121 LearningRate 0.000178 Epoch: 24 Global Step: 514690 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:16:43,468-Speed 2495.78 samples/sec Loss 1.7159 LearningRate 0.000178 Epoch: 24 Global Step: 514700 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:16:51,668-Speed 2497.88 samples/sec Loss 1.7174 LearningRate 0.000178 Epoch: 24 Global Step: 514710 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:16:59,886-Speed 2492.46 samples/sec Loss 1.6709 LearningRate 0.000178 Epoch: 24 Global Step: 514720 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:17:08,092-Speed 2496.35 samples/sec Loss 1.6735 LearningRate 0.000178 Epoch: 24 Global Step: 514730 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:17:16,291-Speed 2498.19 samples/sec Loss 1.6564 LearningRate 0.000178 Epoch: 24 Global Step: 514740 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:17:24,439-Speed 2513.81 samples/sec Loss 1.7206 LearningRate 0.000178 Epoch: 24 Global Step: 514750 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:17:32,640-Speed 2497.59 samples/sec Loss 1.6619 LearningRate 0.000178 Epoch: 24 Global Step: 514760 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:17:40,843-Speed 2497.00 samples/sec Loss 1.6721 LearningRate 0.000178 Epoch: 24 Global Step: 514770 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:17:49,047-Speed 2496.67 samples/sec Loss 1.7064 LearningRate 0.000178 Epoch: 24 Global Step: 514780 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:17:57,246-Speed 2498.22 samples/sec Loss 1.7142 LearningRate 0.000178 Epoch: 24 Global Step: 514790 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:18:05,445-Speed 2498.26 samples/sec Loss 1.7084 LearningRate 0.000178 Epoch: 24 Global Step: 514800 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:18:13,594-Speed 2513.74 samples/sec Loss 1.7015 LearningRate 0.000178 Epoch: 24 Global Step: 514810 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:18:21,796-Speed 2497.30 samples/sec Loss 1.6784 LearningRate 0.000178 Epoch: 24 Global Step: 514820 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:18:29,999-Speed 2497.27 samples/sec Loss 1.7166 LearningRate 0.000178 Epoch: 24 Global Step: 514830 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:18:38,199-Speed 2497.94 samples/sec Loss 1.6987 LearningRate 0.000178 Epoch: 24 Global Step: 514840 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:18:46,402-Speed 2496.95 samples/sec Loss 1.7303 LearningRate 0.000178 Epoch: 24 Global Step: 514850 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:18:54,603-Speed 2497.35 samples/sec Loss 1.7151 LearningRate 0.000178 Epoch: 24 Global Step: 514860 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:19:02,749-Speed 2514.45 samples/sec Loss 1.6916 LearningRate 0.000178 Epoch: 24 Global Step: 514870 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:19:10,948-Speed 2498.22 samples/sec Loss 1.7402 LearningRate 0.000178 Epoch: 24 Global Step: 514880 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:19:19,150-Speed 2497.70 samples/sec Loss 1.6911 LearningRate 0.000178 Epoch: 24 Global Step: 514890 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:19:27,343-Speed 2499.97 samples/sec Loss 1.7235 LearningRate 0.000178 Epoch: 24 Global Step: 514900 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:19:35,542-Speed 2498.42 samples/sec Loss 1.7104 LearningRate 0.000178 Epoch: 24 Global Step: 514910 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:19:43,741-Speed 2498.21 samples/sec Loss 1.6762 LearningRate 0.000178 Epoch: 24 Global Step: 514920 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:19:51,889-Speed 2513.86 samples/sec Loss 1.7092 LearningRate 0.000178 Epoch: 24 Global Step: 514930 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:20:00,087-Speed 2498.42 samples/sec Loss 1.7134 LearningRate 0.000178 Epoch: 24 Global Step: 514940 Fp16 Grad Scale: 8192 Required: 72 hours
Training: 2022-07-10 12:20:08,287-Speed 2498.09 samples/sec Loss 1.6978 LearningRate 0.000178 Epoch: 24 Global Step: 514950 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:20:16,484-Speed 2499.13 samples/sec Loss 1.7109 LearningRate 0.000178 Epoch: 24 Global Step: 514960 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:20:24,684-Speed 2497.76 samples/sec Loss 1.7310 LearningRate 0.000178 Epoch: 24 Global Step: 514970 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:20:32,885-Speed 2497.53 samples/sec Loss 1.7019 LearningRate 0.000178 Epoch: 24 Global Step: 514980 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:20:41,028-Speed 2516.07 samples/sec Loss 1.7192 LearningRate 0.000178 Epoch: 24 Global Step: 514990 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:20:49,227-Speed 2498.33 samples/sec Loss 1.7383 LearningRate 0.000178 Epoch: 24 Global Step: 515000 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:20:57,437-Speed 2494.82 samples/sec Loss 1.6831 LearningRate 0.000178 Epoch: 24 Global Step: 515010 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:21:05,640-Speed 2496.83 samples/sec Loss 1.7145 LearningRate 0.000177 Epoch: 24 Global Step: 515020 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:21:13,842-Speed 2497.37 samples/sec Loss 1.6888 LearningRate 0.000177 Epoch: 24 Global Step: 515030 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:21:22,046-Speed 2496.97 samples/sec Loss 1.7101 LearningRate 0.000177 Epoch: 24 Global Step: 515040 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:21:30,197-Speed 2513.02 samples/sec Loss 1.6779 LearningRate 0.000177 Epoch: 24 Global Step: 515050 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:21:38,403-Speed 2496.18 samples/sec Loss 1.6426 LearningRate 0.000177 Epoch: 24 Global Step: 515060 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:21:46,602-Speed 2498.03 samples/sec Loss 1.7177 LearningRate 0.000177 Epoch: 24 Global Step: 515070 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:21:54,807-Speed 2496.57 samples/sec Loss 1.7072 LearningRate 0.000177 Epoch: 24 Global Step: 515080 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:22:03,004-Speed 2498.74 samples/sec Loss 1.7092 LearningRate 0.000177 Epoch: 24 Global Step: 515090 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:22:11,219-Speed 2493.46 samples/sec Loss 1.7043 LearningRate 0.000177 Epoch: 24 Global Step: 515100 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:22:19,363-Speed 2514.93 samples/sec Loss 1.7062 LearningRate 0.000177 Epoch: 24 Global Step: 515110 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:22:27,570-Speed 2496.11 samples/sec Loss 1.7123 LearningRate 0.000177 Epoch: 24 Global Step: 515120 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:22:35,781-Speed 2494.65 samples/sec Loss 1.7020 LearningRate 0.000177 Epoch: 24 Global Step: 515130 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:22:43,979-Speed 2498.61 samples/sec Loss 1.7131 LearningRate 0.000177 Epoch: 24 Global Step: 515140 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:22:52,185-Speed 2496.09 samples/sec Loss 1.7000 LearningRate 0.000177 Epoch: 24 Global Step: 515150 Fp16 Grad Scale: 16384 Required: 72 hours 
Training: 2022-07-10 12:23:00,398-Speed 2493.90 samples/sec Loss 1.7557 LearningRate 0.000177 Epoch: 24 Global Step: 515160 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:23:08,547-Speed 2513.74 samples/sec Loss 1.7059 LearningRate 0.000177 Epoch: 24 Global Step: 515170 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:23:16,745-Speed 2498.62 samples/sec Loss 1.7480 LearningRate 0.000177 Epoch: 24 Global Step: 515180 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:23:24,956-Speed 2494.62 samples/sec Loss 1.7086 LearningRate 0.000177 Epoch: 24 Global Step: 515190 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:23:33,153-Speed 2499.03 samples/sec Loss 1.6725 LearningRate 0.000177 Epoch: 24 Global Step: 515200 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:23:41,351-Speed 2498.41 samples/sec Loss 1.7323 LearningRate 0.000177 Epoch: 24 Global Step: 515210 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:23:49,552-Speed 2497.67 samples/sec Loss 1.6865 LearningRate 0.000177 Epoch: 24 Global Step: 515220 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:23:57,700-Speed 2514.05 samples/sec Loss 1.7087 LearningRate 0.000177 Epoch: 24 Global Step: 515230 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:24:05,903-Speed 2497.07 samples/sec Loss 1.7229 LearningRate 0.000177 Epoch: 24 Global Step: 515240 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:24:14,104-Speed 2497.54 samples/sec Loss 1.6975 LearningRate 0.000177 Epoch: 24 Global Step: 515250 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:24:22,306-Speed 2497.24 samples/sec Loss 1.7046 LearningRate 0.000177 Epoch: 24 Global Step: 515260 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:24:30,507-Speed 2497.90 samples/sec Loss 1.7649 LearningRate 0.000177 Epoch: 24 Global Step: 515270 Fp16 Grad Scale: 16384 Required: 72 hours 
Training: 2022-07-10 12:24:38,719-Speed 2494.31 samples/sec Loss 1.6848 LearningRate 0.000177 Epoch: 24 Global Step: 515280 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:24:46,867-Speed 2513.52 samples/sec Loss 1.7087 LearningRate 0.000177 Epoch: 24 Global Step: 515290 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:24:55,066-Speed 2498.44 samples/sec Loss 1.6944 LearningRate 0.000177 Epoch: 24 Global Step: 515300 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:25:03,296-Speed 2488.97 samples/sec Loss 1.6959 LearningRate 0.000177 Epoch: 24 Global Step: 515310 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:25:11,494-Speed 2498.58 samples/sec Loss 1.7094 LearningRate 0.000177 Epoch: 24 Global Step: 515320 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:25:19,694-Speed 2497.83 samples/sec Loss 1.7411 LearningRate 0.000177 Epoch: 24 Global Step: 515330 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:25:27,906-Speed 2494.50 samples/sec Loss 1.7498 LearningRate 0.000177 Epoch: 24 Global Step: 515340 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:25:36,046-Speed 2516.30 samples/sec Loss 1.7261 LearningRate 0.000177 Epoch: 24 Global Step: 515350 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:25:44,247-Speed 2497.85 samples/sec Loss 1.7094 LearningRate 0.000177 Epoch: 24 Global Step: 515360 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:25:52,455-Speed 2495.24 samples/sec Loss 1.7219 LearningRate 0.000177 Epoch: 24 Global Step: 515370 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:26:00,655-Speed 2497.95 samples/sec Loss 1.6767 LearningRate 0.000177 Epoch: 24 Global Step: 515380 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:26:08,850-Speed 2499.74 samples/sec Loss 1.7391 LearningRate 0.000177 Epoch: 24 Global Step: 515390 Fp16 Grad Scale: 16384 Required: 72 hours 
Training: 2022-07-10 12:26:17,052-Speed 2497.28 samples/sec Loss 1.7246 LearningRate 0.000177 Epoch: 24 Global Step: 515400 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:26:25,198-Speed 2514.70 samples/sec Loss 1.7184 LearningRate 0.000177 Epoch: 24 Global Step: 515410 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:26:33,401-Speed 2497.14 samples/sec Loss 1.7014 LearningRate 0.000177 Epoch: 24 Global Step: 515420 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:26:41,600-Speed 2498.39 samples/sec Loss 1.6989 LearningRate 0.000177 Epoch: 24 Global Step: 515430 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:26:49,799-Speed 2498.01 samples/sec Loss 1.7019 LearningRate 0.000177 Epoch: 24 Global Step: 515440 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:26:57,999-Speed 2498.14 samples/sec Loss 1.7486 LearningRate 0.000177 Epoch: 24 Global Step: 515450 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:27:06,204-Speed 2496.14 samples/sec Loss 1.7460 LearningRate 0.000177 Epoch: 24 Global Step: 515460 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:27:14,350-Speed 2514.83 samples/sec Loss 1.6869 LearningRate 0.000177 Epoch: 24 Global Step: 515470 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:27:22,552-Speed 2497.18 samples/sec Loss 1.7586 LearningRate 0.000177 Epoch: 24 Global Step: 515480 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:27:30,757-Speed 2496.33 samples/sec Loss 1.6732 LearningRate 0.000177 Epoch: 24 Global Step: 515490 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:27:38,964-Speed 2495.95 samples/sec Loss 1.7305 LearningRate 0.000177 Epoch: 24 Global Step: 515500 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:27:47,164-Speed 2498.03 samples/sec Loss 1.6701 LearningRate 0.000177 Epoch: 24 Global Step: 515510 Fp16 Grad Scale: 16384 Required: 72 hours 
Training: 2022-07-10 12:27:55,364-Speed 2498.17 samples/sec Loss 1.7288 LearningRate 0.000177 Epoch: 24 Global Step: 515520 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:28:03,514-Speed 2513.21 samples/sec Loss 1.7020 LearningRate 0.000177 Epoch: 24 Global Step: 515530 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:28:11,710-Speed 2499.20 samples/sec Loss 1.7197 LearningRate 0.000177 Epoch: 24 Global Step: 515540 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:28:19,921-Speed 2494.48 samples/sec Loss 1.6868 LearningRate 0.000177 Epoch: 24 Global Step: 515550 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:28:28,122-Speed 2497.87 samples/sec Loss 1.7272 LearningRate 0.000177 Epoch: 24 Global Step: 515560 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:28:36,325-Speed 2496.77 samples/sec Loss 1.6798 LearningRate 0.000177 Epoch: 24 Global Step: 515570 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:28:44,523-Speed 2498.71 samples/sec Loss 1.7396 LearningRate 0.000177 Epoch: 24 Global Step: 515580 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:28:52,673-Speed 2513.29 samples/sec Loss 1.6543 LearningRate 0.000177 Epoch: 24 Global Step: 515590 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:29:00,869-Speed 2499.09 samples/sec Loss 1.7580 LearningRate 0.000177 Epoch: 24 Global Step: 515600 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:29:09,082-Speed 2494.29 samples/sec Loss 1.6804 LearningRate 0.000177 Epoch: 24 Global Step: 515610 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:29:17,285-Speed 2496.84 samples/sec Loss 1.7207 LearningRate 0.000177 Epoch: 24 Global Step: 515620 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:29:25,486-Speed 2497.82 samples/sec Loss 1.6962 LearningRate 0.000177 Epoch: 24 Global Step: 515630 Fp16 Grad Scale: 16384 Required: 72 hours 
Training: 2022-07-10 12:29:33,682-Speed 2498.95 samples/sec Loss 1.6908 LearningRate 0.000177 Epoch: 24 Global Step: 515640 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:29:41,830-Speed 2514.09 samples/sec Loss 1.6917 LearningRate 0.000177 Epoch: 24 Global Step: 515650 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:29:50,026-Speed 2499.11 samples/sec Loss 1.7273 LearningRate 0.000177 Epoch: 24 Global Step: 515660 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:29:58,226-Speed 2497.76 samples/sec Loss 1.6966 LearningRate 0.000177 Epoch: 24 Global Step: 515670 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:30:06,431-Speed 2496.53 samples/sec Loss 1.6702 LearningRate 0.000177 Epoch: 24 Global Step: 515680 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:30:14,631-Speed 2498.05 samples/sec Loss 1.6407 LearningRate 0.000177 Epoch: 24 Global Step: 515690 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:30:22,827-Speed 2499.04 samples/sec Loss 1.6941 LearningRate 0.000177 Epoch: 24 Global Step: 515700 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:30:30,981-Speed 2511.90 samples/sec Loss 1.7321 LearningRate 0.000177 Epoch: 24 Global Step: 515710 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:30:39,178-Speed 2498.91 samples/sec Loss 1.7034 LearningRate 0.000177 Epoch: 24 Global Step: 515720 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:30:47,375-Speed 2498.88 samples/sec Loss 1.6843 LearningRate 0.000177 Epoch: 24 Global Step: 515730 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:30:55,580-Speed 2496.88 samples/sec Loss 1.6793 LearningRate 0.000177 Epoch: 24 Global Step: 515740 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:31:03,780-Speed 2497.84 samples/sec Loss 1.6984 LearningRate 0.000177 Epoch: 24 Global Step: 515750 Fp16 Grad Scale: 16384 Required: 72 hours 
Training: 2022-07-10 12:31:11,980-Speed 2498.21 samples/sec Loss 1.7256 LearningRate 0.000177 Epoch: 24 Global Step: 515760 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:31:20,124-Speed 2515.24 samples/sec Loss 1.6687 LearningRate 0.000177 Epoch: 24 Global Step: 515770 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:31:28,326-Speed 2497.41 samples/sec Loss 1.6752 LearningRate 0.000177 Epoch: 24 Global Step: 515780 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:31:36,523-Speed 2498.96 samples/sec Loss 1.7172 LearningRate 0.000177 Epoch: 24 Global Step: 515790 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:31:44,722-Speed 2498.22 samples/sec Loss 1.7476 LearningRate 0.000177 Epoch: 24 Global Step: 515800 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:31:52,919-Speed 2498.63 samples/sec Loss 1.6999 LearningRate 0.000177 Epoch: 24 Global Step: 515810 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:32:01,117-Speed 2498.64 samples/sec Loss 1.6860 LearningRate 0.000177 Epoch: 24 Global Step: 515820 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:32:09,260-Speed 2515.51 samples/sec Loss 1.7228 LearningRate 0.000177 Epoch: 24 Global Step: 515830 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:32:17,458-Speed 2498.61 samples/sec Loss 1.6717 LearningRate 0.000177 Epoch: 24 Global Step: 515840 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:32:25,670-Speed 2494.57 samples/sec Loss 1.7069 LearningRate 0.000177 Epoch: 24 Global Step: 515850 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:32:33,876-Speed 2496.54 samples/sec Loss 1.6655 LearningRate 0.000177 Epoch: 24 Global Step: 515860 Fp16 Grad Scale: 16384 Required: 72 hours Training: 2022-07-10 12:32:42,071-Speed 2499.21 samples/sec Loss 1.6899 LearningRate 0.000177 Epoch: 24 Global Step: 515870 Fp16 Grad Scale: 16384 Required: 72 hours 
Training: 2022-07-10 12:32:50,273-Speed 2497.42 samples/sec Loss 1.6925 LearningRate 0.000177 Epoch: 24 Global Step: 515880 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:32:58,423-Speed 2513.27 samples/sec Loss 1.6818 LearningRate 0.000177 Epoch: 24 Global Step: 515890 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:33:06,622-Speed 2498.48 samples/sec Loss 1.6900 LearningRate 0.000176 Epoch: 24 Global Step: 515900 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:33:14,827-Speed 2496.22 samples/sec Loss 1.7191 LearningRate 0.000176 Epoch: 24 Global Step: 515910 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:33:23,029-Speed 2497.37 samples/sec Loss 1.6720 LearningRate 0.000176 Epoch: 24 Global Step: 515920 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:33:31,233-Speed 2496.84 samples/sec Loss 1.6843 LearningRate 0.000176 Epoch: 24 Global Step: 515930 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:33:39,434-Speed 2497.67 samples/sec Loss 1.7087 LearningRate 0.000176 Epoch: 24 Global Step: 515940 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:33:47,598-Speed 2508.84 samples/sec Loss 1.6970 LearningRate 0.000176 Epoch: 24 Global Step: 515950 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:33:55,812-Speed 2493.70 samples/sec Loss 1.7145 LearningRate 0.000176 Epoch: 24 Global Step: 515960 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:34:04,025-Speed 2493.97 samples/sec Loss 1.7087 LearningRate 0.000176 Epoch: 24 Global Step: 515970 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:34:12,225-Speed 2497.83 samples/sec Loss 1.6694 LearningRate 0.000176 Epoch: 24 Global Step: 515980 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:34:20,423-Speed 2498.56 samples/sec Loss 1.7124 LearningRate 0.000176 Epoch: 24 Global Step: 515990 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:34:28,621-Speed 2498.78 samples/sec Loss 1.6305 LearningRate 0.000176 Epoch: 24 Global Step: 516000 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:34:36,776-Speed 2511.64 samples/sec Loss 1.7055 LearningRate 0.000176 Epoch: 24 Global Step: 516010 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:34:44,978-Speed 2497.65 samples/sec Loss 1.7127 LearningRate 0.000176 Epoch: 24 Global Step: 516020 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:34:53,175-Speed 2498.57 samples/sec Loss 1.6697 LearningRate 0.000176 Epoch: 24 Global Step: 516030 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:35:01,373-Speed 2498.66 samples/sec Loss 1.6932 LearningRate 0.000176 Epoch: 24 Global Step: 516040 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:35:09,574-Speed 2497.99 samples/sec Loss 1.6949 LearningRate 0.000176 Epoch: 24 Global Step: 516050 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:35:17,784-Speed 2494.92 samples/sec Loss 1.6390 LearningRate 0.000176 Epoch: 24 Global Step: 516060 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:35:25,929-Speed 2514.89 samples/sec Loss 1.7562 LearningRate 0.000176 Epoch: 24 Global Step: 516070 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:35:34,128-Speed 2498.35 samples/sec Loss 1.7034 LearningRate 0.000176 Epoch: 24 Global Step: 516080 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:35:42,333-Speed 2496.26 samples/sec Loss 1.7379 LearningRate 0.000176 Epoch: 24 Global Step: 516090 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:35:50,535-Speed 2497.69 samples/sec Loss 1.7208 LearningRate 0.000176 Epoch: 24 Global Step: 516100 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:35:58,731-Speed 2499.07 samples/sec Loss 1.7240 LearningRate 0.000176 Epoch: 24 Global Step: 516110 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:36:06,943-Speed 2494.58 samples/sec Loss 1.7410 LearningRate 0.000176 Epoch: 24 Global Step: 516120 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:36:15,090-Speed 2514.12 samples/sec Loss 1.6961 LearningRate 0.000176 Epoch: 24 Global Step: 516130 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:36:23,289-Speed 2497.94 samples/sec Loss 1.6926 LearningRate 0.000176 Epoch: 24 Global Step: 516140 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:36:31,490-Speed 2497.71 samples/sec Loss 1.7179 LearningRate 0.000176 Epoch: 24 Global Step: 516150 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:36:39,686-Speed 2499.11 samples/sec Loss 1.6701 LearningRate 0.000176 Epoch: 24 Global Step: 516160 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:36:47,884-Speed 2498.71 samples/sec Loss 1.6886 LearningRate 0.000176 Epoch: 24 Global Step: 516170 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:36:56,082-Speed 2498.33 samples/sec Loss 1.6911 LearningRate 0.000176 Epoch: 24 Global Step: 516180 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:37:04,225-Speed 2515.70 samples/sec Loss 1.7292 LearningRate 0.000176 Epoch: 24 Global Step: 516190 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:37:12,430-Speed 2496.54 samples/sec Loss 1.7013 LearningRate 0.000176 Epoch: 24 Global Step: 516200 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:37:20,630-Speed 2497.75 samples/sec Loss 1.7466 LearningRate 0.000176 Epoch: 24 Global Step: 516210 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:37:28,828-Speed 2498.63 samples/sec Loss 1.7113 LearningRate 0.000176 Epoch: 24 Global Step: 516220 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:37:37,026-Speed 2498.72 samples/sec Loss 1.7224 LearningRate 0.000176 Epoch: 24 Global Step: 516230 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:37:45,229-Speed 2497.15 samples/sec Loss 1.6653 LearningRate 0.000176 Epoch: 24 Global Step: 516240 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:37:53,376-Speed 2514.18 samples/sec Loss 1.6793 LearningRate 0.000176 Epoch: 24 Global Step: 516250 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:38:01,578-Speed 2497.44 samples/sec Loss 1.6694 LearningRate 0.000176 Epoch: 24 Global Step: 516260 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:38:09,775-Speed 2498.82 samples/sec Loss 1.6819 LearningRate 0.000176 Epoch: 24 Global Step: 516270 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:38:17,974-Speed 2498.39 samples/sec Loss 1.6709 LearningRate 0.000176 Epoch: 24 Global Step: 516280 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:38:26,179-Speed 2496.32 samples/sec Loss 1.6862 LearningRate 0.000176 Epoch: 24 Global Step: 516290 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:38:34,378-Speed 2498.28 samples/sec Loss 1.7005 LearningRate 0.000176 Epoch: 24 Global Step: 516300 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:38:42,524-Speed 2514.53 samples/sec Loss 1.6672 LearningRate 0.000176 Epoch: 24 Global Step: 516310 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:38:50,723-Speed 2498.23 samples/sec Loss 1.7003 LearningRate 0.000176 Epoch: 24 Global Step: 516320 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:38:58,930-Speed 2495.81 samples/sec Loss 1.7123 LearningRate 0.000176 Epoch: 24 Global Step: 516330 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:39:07,128-Speed 2498.71 samples/sec Loss 1.7135 LearningRate 0.000176 Epoch: 24 Global Step: 516340 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:39:15,324-Speed 2499.28 samples/sec Loss 1.6993 LearningRate 0.000176 Epoch: 24 Global Step: 516350 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:39:23,540-Speed 2493.48 samples/sec Loss 1.7189 LearningRate 0.000176 Epoch: 24 Global Step: 516360 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:39:31,683-Speed 2515.38 samples/sec Loss 1.7342 LearningRate 0.000176 Epoch: 24 Global Step: 516370 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:39:39,880-Speed 2498.66 samples/sec Loss 1.6965 LearningRate 0.000176 Epoch: 24 Global Step: 516380 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:39:48,079-Speed 2498.52 samples/sec Loss 1.6849 LearningRate 0.000176 Epoch: 24 Global Step: 516390 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:39:56,277-Speed 2498.73 samples/sec Loss 1.7374 LearningRate 0.000176 Epoch: 24 Global Step: 516400 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:40:04,485-Speed 2495.53 samples/sec Loss 1.7391 LearningRate 0.000176 Epoch: 24 Global Step: 516410 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:40:12,695-Speed 2494.77 samples/sec Loss 1.7009 LearningRate 0.000176 Epoch: 24 Global Step: 516420 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:40:20,842-Speed 2514.29 samples/sec Loss 1.6976 LearningRate 0.000176 Epoch: 24 Global Step: 516430 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:40:29,059-Speed 2492.46 samples/sec Loss 1.7285 LearningRate 0.000176 Epoch: 24 Global Step: 516440 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:40:37,261-Speed 2497.34 samples/sec Loss 1.7157 LearningRate 0.000176 Epoch: 24 Global Step: 516450 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:40:45,459-Speed 2498.56 samples/sec Loss 1.7077 LearningRate 0.000176 Epoch: 24 Global Step: 516460 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:40:53,659-Speed 2498.13 samples/sec Loss 1.7111 LearningRate 0.000176 Epoch: 24 Global Step: 516470 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:41:01,858-Speed 2498.15 samples/sec Loss 1.7153 LearningRate 0.000176 Epoch: 24 Global Step: 516480 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:41:10,004-Speed 2514.57 samples/sec Loss 1.6997 LearningRate 0.000176 Epoch: 24 Global Step: 516490 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:41:18,206-Speed 2497.50 samples/sec Loss 1.7113 LearningRate 0.000176 Epoch: 24 Global Step: 516500 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:41:26,407-Speed 2497.53 samples/sec Loss 1.7190 LearningRate 0.000176 Epoch: 24 Global Step: 516510 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:41:34,605-Speed 2498.38 samples/sec Loss 1.6952 LearningRate 0.000176 Epoch: 24 Global Step: 516520 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:41:42,804-Speed 2498.62 samples/sec Loss 1.6992 LearningRate 0.000176 Epoch: 24 Global Step: 516530 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:41:51,008-Speed 2496.82 samples/sec Loss 1.6985 LearningRate 0.000176 Epoch: 24 Global Step: 516540 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:41:59,155-Speed 2514.12 samples/sec Loss 1.6672 LearningRate 0.000176 Epoch: 24 Global Step: 516550 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:42:07,351-Speed 2499.14 samples/sec Loss 1.7413 LearningRate 0.000176 Epoch: 24 Global Step: 516560 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:42:15,557-Speed 2497.11 samples/sec Loss 1.7504 LearningRate 0.000176 Epoch: 24 Global Step: 516570 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:42:23,759-Speed 2497.33 samples/sec Loss 1.7201 LearningRate 0.000176 Epoch: 24 Global Step: 516580 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:42:31,958-Speed 2498.26 samples/sec Loss 1.6963 LearningRate 0.000176 Epoch: 24 Global Step: 516590 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:42:40,161-Speed 2496.89 samples/sec Loss 1.7183 LearningRate 0.000176 Epoch: 24 Global Step: 516600 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:42:48,308-Speed 2514.24 samples/sec Loss 1.7079 LearningRate 0.000176 Epoch: 24 Global Step: 516610 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:42:56,507-Speed 2498.45 samples/sec Loss 1.7373 LearningRate 0.000176 Epoch: 24 Global Step: 516620 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:43:04,709-Speed 2497.52 samples/sec Loss 1.7045 LearningRate 0.000176 Epoch: 24 Global Step: 516630 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:43:12,910-Speed 2497.59 samples/sec Loss 1.6945 LearningRate 0.000176 Epoch: 24 Global Step: 516640 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:43:21,110-Speed 2497.81 samples/sec Loss 1.7213 LearningRate 0.000176 Epoch: 24 Global Step: 516650 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:43:29,311-Speed 2497.58 samples/sec Loss 1.7144 LearningRate 0.000176 Epoch: 24 Global Step: 516660 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:43:37,458-Speed 2514.21 samples/sec Loss 1.6852 LearningRate 0.000176 Epoch: 24 Global Step: 516670 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:43:45,664-Speed 2496.12 samples/sec Loss 1.6897 LearningRate 0.000176 Epoch: 24 Global Step: 516680 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:43:53,861-Speed 2498.97 samples/sec Loss 1.6955 LearningRate 0.000176 Epoch: 24 Global Step: 516690 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:44:02,060-Speed 2498.75 samples/sec Loss 1.6742 LearningRate 0.000176 Epoch: 24 Global Step: 516700 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:44:10,264-Speed 2496.63 samples/sec Loss 1.7284 LearningRate 0.000176 Epoch: 24 Global Step: 516710 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:44:18,467-Speed 2496.96 samples/sec Loss 1.6999 LearningRate 0.000176 Epoch: 24 Global Step: 516720 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:44:26,616-Speed 2513.84 samples/sec Loss 1.7439 LearningRate 0.000176 Epoch: 24 Global Step: 516730 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:44:34,819-Speed 2497.38 samples/sec Loss 1.7160 LearningRate 0.000176 Epoch: 24 Global Step: 516740 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:44:43,021-Speed 2497.29 samples/sec Loss 1.6847 LearningRate 0.000176 Epoch: 24 Global Step: 516750 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:44:51,224-Speed 2496.98 samples/sec Loss 1.6883 LearningRate 0.000176 Epoch: 24 Global Step: 516760 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:44:59,436-Speed 2494.01 samples/sec Loss 1.6950 LearningRate 0.000176 Epoch: 24 Global Step: 516770 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:45:07,638-Speed 2497.58 samples/sec Loss 1.7264 LearningRate 0.000176 Epoch: 24 Global Step: 516780 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:45:15,789-Speed 2512.91 samples/sec Loss 1.6494 LearningRate 0.000175 Epoch: 24 Global Step: 516790 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-07-10 12:45:23,951-Speed 2509.31 samples/sec Loss 1.6963 LearningRate 0.000175 Epoch: 24 Global Step: 516800 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:45:32,151-Speed 2498.04 samples/sec Loss 1.7052 LearningRate 0.000175 Epoch: 24 Global Step: 516810 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:45:40,356-Speed 2496.51 samples/sec Loss 1.6805 LearningRate 0.000175 Epoch: 24 Global Step: 516820 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:45:48,558-Speed 2497.20 samples/sec Loss 1.6735 LearningRate 0.000175 Epoch: 24 Global Step: 516830 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:45:56,760-Speed 2497.23 samples/sec Loss 1.7291 LearningRate 0.000175 Epoch: 24 Global Step: 516840 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:46:04,907-Speed 2514.52 samples/sec Loss 1.6981 LearningRate 0.000175 Epoch: 24 Global Step: 516850 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:46:13,103-Speed 2499.18 samples/sec Loss 1.6739 LearningRate 0.000175 Epoch: 24 Global Step: 516860 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:46:21,302-Speed 2498.29 samples/sec Loss 1.7141 LearningRate 0.000175 Epoch: 24 Global Step: 516870 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:46:29,521-Speed 2492.21 samples/sec Loss 1.7212 LearningRate 0.000175 Epoch: 24 Global Step: 516880 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:46:37,717-Speed 2499.27 samples/sec Loss 1.7171 LearningRate 0.000175 Epoch: 24 Global Step: 516890 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:46:45,922-Speed 2496.44 samples/sec Loss 1.7312 LearningRate 0.000175 Epoch: 24 Global Step: 516900 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:46:54,066-Speed 2515.14 samples/sec Loss 1.7261 LearningRate 0.000175 Epoch: 24 Global Step: 516910 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:47:02,269-Speed 2497.03 samples/sec Loss 1.7361 LearningRate 0.000175 Epoch: 24 Global Step: 516920 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:47:10,488-Speed 2492.29 samples/sec Loss 1.7127 LearningRate 0.000175 Epoch: 24 Global Step: 516930 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:47:18,689-Speed 2497.76 samples/sec Loss 1.6722 LearningRate 0.000175 Epoch: 24 Global Step: 516940 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:47:26,889-Speed 2497.72 samples/sec Loss 1.6720 LearningRate 0.000175 Epoch: 24 Global Step: 516950 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:47:35,090-Speed 2497.95 samples/sec Loss 1.7091 LearningRate 0.000175 Epoch: 24 Global Step: 516960 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:47:43,236-Speed 2514.64 samples/sec Loss 1.7095 LearningRate 0.000175 Epoch: 24 Global Step: 516970 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:47:51,429-Speed 2500.01 samples/sec Loss 1.7166 LearningRate 0.000175 Epoch: 24 Global Step: 516980 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:47:59,722-Speed 2470.20 samples/sec Loss 1.7006 LearningRate 0.000175 Epoch: 24 Global Step: 516990 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:48:07,919-Speed 2498.87 samples/sec Loss 1.6938 LearningRate 0.000175 Epoch: 24 Global Step: 517000 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:48:16,119-Speed 2498.08 samples/sec Loss 1.7537 LearningRate 0.000175 Epoch: 24 Global Step: 517010 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:48:24,318-Speed 2498.04 samples/sec Loss 1.7017 LearningRate 0.000175 Epoch: 24 Global Step: 517020 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:48:32,464-Speed 2514.68 samples/sec Loss 1.6611 LearningRate 0.000175 Epoch: 24 Global Step: 517030 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:48:40,661-Speed 2498.66 samples/sec Loss 1.7287 LearningRate 0.000175 Epoch: 24 Global Step: 517040 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:48:48,862-Speed 2497.66 samples/sec Loss 1.7375 LearningRate 0.000175 Epoch: 24 Global Step: 517050 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:48:57,062-Speed 2497.94 samples/sec Loss 1.7283 LearningRate 0.000175 Epoch: 24 Global Step: 517060 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:49:05,262-Speed 2497.95 samples/sec Loss 1.6774 LearningRate 0.000175 Epoch: 24 Global Step: 517070 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:49:13,462-Speed 2498.00 samples/sec Loss 1.7414 LearningRate 0.000175 Epoch: 24 Global Step: 517080 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:49:21,610-Speed 2513.96 samples/sec Loss 1.7154 LearningRate 0.000175 Epoch: 24 Global Step: 517090 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:49:29,811-Speed 2497.51 samples/sec Loss 1.6923 LearningRate 0.000175 Epoch: 24 Global Step: 517100 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:49:38,014-Speed 2496.99 samples/sec Loss 1.6937 LearningRate 0.000175 Epoch: 24 Global Step: 517110 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:49:46,231-Speed 2492.80 samples/sec Loss 1.7053 LearningRate 0.000175 Epoch: 24 Global Step: 517120 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:49:54,435-Speed 2496.65 samples/sec Loss 1.6948 LearningRate 0.000175 Epoch: 24 Global Step: 517130 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:50:02,639-Speed 2496.78 samples/sec Loss 1.6744 LearningRate 0.000175 Epoch: 24 Global Step: 517140 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:50:10,795-Speed 2512.23 samples/sec Loss 1.7118 LearningRate 0.000175 Epoch: 24 Global Step: 517150 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:50:18,994-Speed 2498.20 samples/sec Loss 1.6972 LearningRate 0.000175 Epoch: 24 Global Step: 517160 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:50:27,208-Speed 2493.74 samples/sec Loss 1.6920 LearningRate 0.000175 Epoch: 24 Global Step: 517170 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:50:35,423-Speed 2493.24 samples/sec Loss 1.6604 LearningRate 0.000175 Epoch: 24 Global Step: 517180 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:50:43,626-Speed 2496.88 samples/sec Loss 1.6590 LearningRate 0.000175 Epoch: 24 Global Step: 517190 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:50:51,827-Speed 2497.68 samples/sec Loss 1.6573 LearningRate 0.000175 Epoch: 24 Global Step: 517200 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:50:59,970-Speed 2515.60 samples/sec Loss 1.6688 LearningRate 0.000175 Epoch: 24 Global Step: 517210 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:51:08,177-Speed 2495.57 samples/sec Loss 1.6580 LearningRate 0.000175 Epoch: 24 Global Step: 517220 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:51:16,380-Speed 2497.19 samples/sec Loss 1.6664 LearningRate 0.000175 Epoch: 24 Global Step: 517230 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:51:24,582-Speed 2497.53 samples/sec Loss 1.7242 LearningRate 0.000175 Epoch: 24 Global Step: 517240 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-07-10 12:51:32,782-Speed 2497.91 samples/sec Loss 1.6903 LearningRate 0.000175 Epoch: 24 Global Step: 517250 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:51:40,985-Speed 2496.96 samples/sec Loss 1.6814 LearningRate 0.000175 Epoch: 24 Global Step: 517260 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:51:49,139-Speed 2512.02 samples/sec Loss 1.7200 LearningRate 0.000175 Epoch: 24 Global Step: 517270 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:51:57,344-Speed 2496.50 samples/sec Loss 1.7007 LearningRate 0.000175 Epoch: 24 Global Step: 517280 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:52:05,547-Speed 2497.12 samples/sec Loss 1.6840 LearningRate 0.000175 Epoch: 24 Global Step: 517290 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:52:13,751-Speed 2496.67 samples/sec Loss 1.7338 LearningRate 0.000175 Epoch: 24 Global Step: 517300 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:52:21,956-Speed 2496.54 samples/sec Loss 1.7097 LearningRate 0.000175 Epoch: 24 Global Step: 517310 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:52:30,160-Speed 2496.55 samples/sec Loss 1.6876 LearningRate 0.000175 Epoch: 24 Global Step: 517320 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:52:38,308-Speed 2513.91 samples/sec Loss 1.6840 LearningRate 0.000175 Epoch: 24 Global Step: 517330 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:52:46,512-Speed 2496.89 samples/sec Loss 1.7223 LearningRate 0.000175 Epoch: 24 Global Step: 517340 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:52:54,715-Speed 2496.99 samples/sec Loss 1.6786 LearningRate 0.000175 Epoch: 24 Global Step: 517350 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:53:02,919-Speed 2496.63 samples/sec Loss 1.6927 LearningRate 0.000175 Epoch: 24 Global Step: 517360 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:53:11,127-Speed 2495.82 samples/sec Loss 1.6865 LearningRate 0.000175 Epoch: 24 Global Step: 517370 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:53:19,330-Speed 2497.01 samples/sec Loss 1.7157 LearningRate 0.000175 Epoch: 24 Global Step: 517380 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:53:27,479-Speed 2513.37 samples/sec Loss 1.7212 LearningRate 0.000175 Epoch: 24 Global Step: 517390 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:53:35,683-Speed 2496.67 samples/sec Loss 1.7117 LearningRate 0.000175 Epoch: 24 Global Step: 517400 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:53:43,886-Speed 2497.18 samples/sec Loss 1.7300 LearningRate 0.000175 Epoch: 24 Global Step: 517410 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:53:52,088-Speed 2497.59 samples/sec Loss 1.6730 LearningRate 0.000175 Epoch: 24 Global Step: 517420 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:54:00,296-Speed 2495.78 samples/sec Loss 1.6570 LearningRate 0.000175 Epoch: 24 Global Step: 517430 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:54:08,498-Speed 2497.15 samples/sec Loss 1.7211 LearningRate 0.000175 Epoch: 24 Global Step: 517440 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:54:16,656-Speed 2510.89 samples/sec Loss 1.6935 LearningRate 0.000175 Epoch: 24 Global Step: 517450 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:54:24,865-Speed 2495.62 samples/sec Loss 1.7105 LearningRate 0.000175 Epoch: 24 Global Step: 517460 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:54:33,070-Speed 2496.30 samples/sec Loss 1.7017 LearningRate 0.000175 Epoch: 24 Global Step: 517470 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:54:41,277-Speed 2495.76 samples/sec Loss 1.7157 LearningRate 0.000175 Epoch: 24 Global Step: 517480 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:54:49,477-Speed 2498.79 samples/sec Loss 1.7094 LearningRate 0.000175 Epoch: 24 Global Step: 517490 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:54:57,675-Speed 2498.87 samples/sec Loss 1.6951 LearningRate 0.000175 Epoch: 24 Global Step: 517500 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:55:05,820-Speed 2514.74 samples/sec Loss 1.7224 LearningRate 0.000175 Epoch: 24 Global Step: 517510 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:55:14,020-Speed 2498.05 samples/sec Loss 1.7197 LearningRate 0.000175 Epoch: 24 Global Step: 517520 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:55:22,225-Speed 2496.49 samples/sec Loss 1.7021 LearningRate 0.000175 Epoch: 24 Global Step: 517530 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:55:30,430-Speed 2496.36 samples/sec Loss 1.6995 LearningRate 0.000175 Epoch: 24 Global Step: 517540 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:55:38,627-Speed 2499.06 samples/sec Loss 1.6994 LearningRate 0.000175 Epoch: 24 Global Step: 517550 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:55:46,828-Speed 2497.58 samples/sec Loss 1.7260 LearningRate 0.000175 Epoch: 24 Global Step: 517560 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:55:54,973-Speed 2514.81 samples/sec Loss 1.6661 LearningRate 0.000175 Epoch: 24 Global Step: 517570 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:56:03,172-Speed 2498.27 samples/sec Loss 1.6970 LearningRate 0.000175 Epoch: 24 Global Step: 517580 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:56:11,369-Speed 2499.03 samples/sec Loss 1.6673 LearningRate 0.000175 Epoch: 24 Global Step: 517590 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:56:19,571-Speed 2497.46 samples/sec Loss 1.6995 LearningRate 0.000175 Epoch: 24 Global Step: 517600 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:56:27,769-Speed 2498.38 samples/sec Loss 1.7089 LearningRate 0.000175 Epoch: 24 Global Step: 517610 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:56:36,005-Speed 2499.59 samples/sec Loss 1.6920 LearningRate 0.000175 Epoch: 24 Global Step: 517620 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:56:44,189-Speed 2516.07 samples/sec Loss 1.7442 LearningRate 0.000175 Epoch: 24 Global Step: 517630 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:56:52,390-Speed 2497.50 samples/sec Loss 1.6683 LearningRate 0.000175 Epoch: 24 Global Step: 517640 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:57:00,592-Speed 2497.25 samples/sec Loss 1.6757 LearningRate 0.000175 Epoch: 24 Global Step: 517650 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:57:08,808-Speed 2500.19 samples/sec Loss 1.6918 LearningRate 0.000175 Epoch: 24 Global Step: 517660 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:57:17,058-Speed 2497.50 samples/sec Loss 1.7202 LearningRate 0.000175 Epoch: 24 Global Step: 517670 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:57:29,493-Speed 1647.14 samples/sec Loss 1.6492 LearningRate 0.000175 Epoch: 24 Global Step: 517680 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:57:37,689-Speed 2518.14 samples/sec Loss 1.6770 LearningRate 0.000174 Epoch: 24 Global Step: 517690 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:57:45,935-Speed 2497.91 samples/sec Loss 1.6662 LearningRate 0.000174 Epoch: 24 Global Step: 517700 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:57:54,135-Speed 2498.08 samples/sec Loss 1.6734 LearningRate 0.000174 Epoch: 24 Global Step: 517710 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:58:04,786-Speed 2502.29 samples/sec Loss 1.6901 LearningRate 0.000174 Epoch: 24 Global Step: 517720 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:58:13,015-Speed 2500.51 samples/sec Loss 1.6915 LearningRate 0.000174 Epoch: 24 Global Step: 517730 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:58:25,358-Speed 1659.30 samples/sec Loss 1.6856 LearningRate 0.000174 Epoch: 24 Global Step: 517740 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:58:33,508-Speed 2513.53 samples/sec Loss 1.6820 LearningRate 0.000174 Epoch: 24 Global Step: 517750 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:58:41,759-Speed 2496.39 samples/sec Loss 1.6793 LearningRate 0.000174 Epoch: 24 Global Step: 517760 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:58:54,068-Speed 1669.82 samples/sec Loss 1.6748 LearningRate 0.000174 Epoch: 24 Global Step: 517770 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:59:06,892-Speed 2500.55 samples/sec Loss 1.7220 LearningRate 0.000174 Epoch: 24 Global Step: 517780 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:59:19,083-Speed 1684.75 samples/sec Loss 1.6823 LearningRate 0.000174 Epoch: 24 Global Step: 517790 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:59:27,969-Speed 2305.33 samples/sec Loss 1.6746 LearningRate 0.000174 Epoch: 24 Global Step: 517800 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:59:36,125-Speed 2511.31 samples/sec Loss 1.6827 LearningRate 0.000174 Epoch: 24 Global Step: 517810 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:59:49,087-Speed 2478.86 samples/sec Loss 1.7376 LearningRate 0.000174 Epoch: 24 Global Step: 517820 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 12:59:58,357-Speed 2213.94 samples/sec Loss 1.7037 LearningRate 0.000174 Epoch: 24 Global Step: 517830 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 13:00:06,570-Speed 2493.98 samples/sec Loss 1.7452 LearningRate 0.000174 Epoch: 24 Global Step: 517840 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 13:00:14,780-Speed 2494.72 samples/sec Loss 1.6970 LearningRate 0.000174 Epoch: 24 Global Step: 517850 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 13:00:23,007-Speed 2490.00 samples/sec Loss 1.6981 LearningRate 0.000174 Epoch: 24 Global Step: 517860 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 13:00:31,165-Speed 2510.81 samples/sec Loss 1.6991 LearningRate 0.000174 Epoch: 24 Global Step: 517870 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 13:00:39,387-Speed 2491.22 samples/sec Loss 1.6942 LearningRate 0.000174 Epoch: 24 Global Step: 517880 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 13:00:47,596-Speed 2495.28 samples/sec Loss 1.6688 LearningRate 0.000174 Epoch: 24 Global Step: 517890 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 13:00:55,811-Speed 2493.21 samples/sec Loss 1.7095 LearningRate 0.000174 Epoch: 24 Global Step: 517900 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 13:01:04,021-Speed 2494.95 samples/sec Loss 1.7179 LearningRate 0.000174 Epoch: 24 Global Step: 517910 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-07-10 13:01:12,232-Speed 2494.71 samples/sec Loss 1.7284 LearningRate 0.000174 Epoch: 24 Global Step: 517920 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:01:20,387-Speed 2511.56 samples/sec Loss 1.7336 LearningRate 0.000174 Epoch: 24 Global Step: 517930 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:01:28,601-Speed 2493.71 samples/sec Loss 1.7437 LearningRate 0.000174 Epoch: 24 Global Step: 517940 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:01:36,809-Speed 2495.78 samples/sec Loss 1.6517 LearningRate 0.000174 Epoch: 24 Global Step: 517950 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:01:45,029-Speed 2492.14 samples/sec Loss 1.7052 LearningRate 0.000174 Epoch: 24 Global Step: 517960 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:01:53,238-Speed 2495.28 samples/sec Loss 1.6911 LearningRate 0.000174 Epoch: 24 Global Step: 517970 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:02:01,446-Speed 2495.47 samples/sec Loss 1.6752 LearningRate 0.000174 Epoch: 24 Global Step: 517980 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:02:09,608-Speed 2509.61 samples/sec Loss 1.7142 LearningRate 0.000174 Epoch: 24 Global Step: 517990 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:02:17,817-Speed 2495.21 samples/sec Loss 1.6826 LearningRate 0.000174 Epoch: 24 Global Step: 518000 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:02:26,039-Speed 2491.29 samples/sec Loss 1.7158 LearningRate 0.000174 Epoch: 24 Global Step: 518010 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:02:34,255-Speed 2492.96 samples/sec Loss 1.7134 LearningRate 0.000174 Epoch: 24 Global Step: 518020 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:02:42,467-Speed 2494.28 samples/sec Loss 1.7037 LearningRate 0.000174 Epoch: 24 Global Step: 518030 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:02:50,678-Speed 2494.82 samples/sec Loss 1.6949 LearningRate 0.000174 Epoch: 24 Global Step: 518040 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:02:58,831-Speed 2514.27 samples/sec Loss 1.6710 LearningRate 0.000174 Epoch: 24 Global Step: 518050 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:03:07,040-Speed 2495.15 samples/sec Loss 1.6659 LearningRate 0.000174 Epoch: 24 Global Step: 518060 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:03:15,253-Speed 2494.14 samples/sec Loss 1.7091 LearningRate 0.000174 Epoch: 24 Global Step: 518070 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:03:23,463-Speed 2495.02 samples/sec Loss 1.7170 LearningRate 0.000174 Epoch: 24 Global Step: 518080 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:03:31,671-Speed 2495.41 samples/sec Loss 1.6773 LearningRate 0.000174 Epoch: 24 Global Step: 518090 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:03:39,882-Speed 2494.42 samples/sec Loss 1.7071 LearningRate 0.000174 Epoch: 24 Global Step: 518100 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:03:48,039-Speed 2511.09 samples/sec Loss 1.6751 LearningRate 0.000174 Epoch: 24 Global Step: 518110 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:03:56,252-Speed 2494.02 samples/sec Loss 1.6727 LearningRate 0.000174 Epoch: 24 Global Step: 518120 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:04:04,460-Speed 2496.12 samples/sec Loss 1.6721 LearningRate 0.000174 Epoch: 24 Global Step: 518130 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:04:12,667-Speed 2495.68 samples/sec Loss 1.7030 LearningRate 0.000174 Epoch: 24 Global Step: 518140 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:04:20,874-Speed 2495.84 samples/sec Loss 1.6428 LearningRate 0.000174 Epoch: 24 Global Step: 518150 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:04:29,083-Speed 2495.41 samples/sec Loss 1.7272 LearningRate 0.000174 Epoch: 24 Global Step: 518160 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:04:37,239-Speed 2511.25 samples/sec Loss 1.6710 LearningRate 0.000174 Epoch: 24 Global Step: 518170 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:04:45,448-Speed 2495.46 samples/sec Loss 1.6792 LearningRate 0.000174 Epoch: 24 Global Step: 518180 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:04:53,661-Speed 2494.10 samples/sec Loss 1.6770 LearningRate 0.000174 Epoch: 24 Global Step: 518190 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:05:01,872-Speed 2494.66 samples/sec Loss 1.7057 LearningRate 0.000174 Epoch: 24 Global Step: 518200 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:05:10,081-Speed 2495.24 samples/sec Loss 1.6837 LearningRate 0.000174 Epoch: 24 Global Step: 518210 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:05:18,288-Speed 2495.59 samples/sec Loss 1.7261 LearningRate 0.000174 Epoch: 24 Global Step: 518220 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:05:26,440-Speed 2512.67 samples/sec Loss 1.6564 LearningRate 0.000174 Epoch: 24 Global Step: 518230 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:05:34,647-Speed 2495.99 samples/sec Loss 1.6940 LearningRate 0.000174 Epoch: 24 Global Step: 518240 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:05:42,814-Speed 2507.93 samples/sec Loss 1.7125 LearningRate 0.000174 Epoch: 24 Global Step: 518250 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:05:51,021-Speed 2495.88 samples/sec Loss 1.6258 LearningRate 0.000174 Epoch: 24 Global Step: 518260 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:05:59,231-Speed 2494.85 samples/sec Loss 1.7099 LearningRate 0.000174 Epoch: 24 Global Step: 518270 Fp16 Grad Scale: 16384 Required: 71 hours 
Training: 2022-07-10 13:06:07,439-Speed 2495.35 samples/sec Loss 1.6885 LearningRate 0.000174 Epoch: 24 Global Step: 518280 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:06:15,596-Speed 2511.43 samples/sec Loss 1.7101 LearningRate 0.000174 Epoch: 24 Global Step: 518290 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:06:23,808-Speed 2494.16 samples/sec Loss 1.7301 LearningRate 0.000174 Epoch: 24 Global Step: 518300 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:06:32,016-Speed 2495.63 samples/sec Loss 1.6910 LearningRate 0.000174 Epoch: 24 Global Step: 518310 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:06:40,225-Speed 2495.13 samples/sec Loss 1.6843 LearningRate 0.000174 Epoch: 24 Global Step: 518320 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:06:48,453-Speed 2489.34 samples/sec Loss 1.6792 LearningRate 0.000174 Epoch: 24 Global Step: 518330 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:06:56,666-Speed 2494.00 samples/sec Loss 1.7230 LearningRate 0.000174 Epoch: 24 Global Step: 518340 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:07:04,821-Speed 2511.66 samples/sec Loss 1.7374 LearningRate 0.000174 Epoch: 24 Global Step: 518350 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:07:13,035-Speed 2493.90 samples/sec Loss 1.6813 LearningRate 0.000174 Epoch: 24 Global Step: 518360 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:07:21,245-Speed 2494.96 samples/sec Loss 1.6510 LearningRate 0.000174 Epoch: 24 Global Step: 518370 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:07:29,456-Speed 2494.71 samples/sec Loss 1.7400 LearningRate 0.000174 Epoch: 24 Global Step: 518380 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:07:37,661-Speed 2496.54 samples/sec Loss 1.7177 LearningRate 0.000174 Epoch: 24 Global Step: 518390 Fp16 Grad Scale: 16384 Required: 71 hours 
Training: 2022-07-10 13:07:45,866-Speed 2496.49 samples/sec Loss 1.7189 LearningRate 0.000174 Epoch: 24 Global Step: 518400 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:07:54,020-Speed 2512.16 samples/sec Loss 1.7118 LearningRate 0.000174 Epoch: 24 Global Step: 518410 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:08:02,227-Speed 2495.86 samples/sec Loss 1.6621 LearningRate 0.000174 Epoch: 24 Global Step: 518420 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:08:10,438-Speed 2494.50 samples/sec Loss 1.6959 LearningRate 0.000174 Epoch: 24 Global Step: 518430 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:08:18,643-Speed 2496.51 samples/sec Loss 1.6929 LearningRate 0.000174 Epoch: 24 Global Step: 518440 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:08:26,849-Speed 2496.19 samples/sec Loss 1.6984 LearningRate 0.000174 Epoch: 24 Global Step: 518450 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:08:35,057-Speed 2495.37 samples/sec Loss 1.7704 LearningRate 0.000174 Epoch: 24 Global Step: 518460 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:08:43,209-Speed 2512.92 samples/sec Loss 1.7108 LearningRate 0.000174 Epoch: 24 Global Step: 518470 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:08:51,430-Speed 2491.65 samples/sec Loss 1.7239 LearningRate 0.000174 Epoch: 24 Global Step: 518480 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:09:02,251-Speed 1892.83 samples/sec Loss 1.6978 LearningRate 0.000174 Epoch: 25 Global Step: 518490 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:09:10,450-Speed 2498.23 samples/sec Loss 1.6930 LearningRate 0.000174 Epoch: 25 Global Step: 518500 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:09:18,654-Speed 2496.71 samples/sec Loss 1.7359 LearningRate 0.000174 Epoch: 25 Global Step: 518510 Fp16 Grad Scale: 16384 Required: 71 hours 
Training: 2022-07-10 13:09:26,857-Speed 2496.97 samples/sec Loss 1.7070 LearningRate 0.000174 Epoch: 25 Global Step: 518520 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:09:35,008-Speed 2513.17 samples/sec Loss 1.7547 LearningRate 0.000174 Epoch: 25 Global Step: 518530 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:09:43,213-Speed 2496.49 samples/sec Loss 1.7796 LearningRate 0.000174 Epoch: 25 Global Step: 518540 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:09:51,419-Speed 2495.85 samples/sec Loss 1.7211 LearningRate 0.000174 Epoch: 25 Global Step: 518550 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:09:59,625-Speed 2496.39 samples/sec Loss 1.7600 LearningRate 0.000174 Epoch: 25 Global Step: 518560 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:10:07,832-Speed 2495.89 samples/sec Loss 1.7189 LearningRate 0.000174 Epoch: 25 Global Step: 518570 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:10:16,037-Speed 2496.44 samples/sec Loss 1.7080 LearningRate 0.000173 Epoch: 25 Global Step: 518580 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:10:24,189-Speed 2512.68 samples/sec Loss 1.7334 LearningRate 0.000173 Epoch: 25 Global Step: 518590 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:10:32,396-Speed 2495.93 samples/sec Loss 1.6974 LearningRate 0.000173 Epoch: 25 Global Step: 518600 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:10:40,599-Speed 2496.98 samples/sec Loss 1.6997 LearningRate 0.000173 Epoch: 25 Global Step: 518610 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:10:48,799-Speed 2497.96 samples/sec Loss 1.6698 LearningRate 0.000173 Epoch: 25 Global Step: 518620 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:10:56,998-Speed 2498.33 samples/sec Loss 1.7194 LearningRate 0.000173 Epoch: 25 Global Step: 518630 Fp16 Grad Scale: 16384 Required: 71 hours 
Training: 2022-07-10 13:11:05,202-Speed 2496.85 samples/sec Loss 1.7223 LearningRate 0.000173 Epoch: 25 Global Step: 518640 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:11:13,354-Speed 2516.54 samples/sec Loss 1.6885 LearningRate 0.000173 Epoch: 25 Global Step: 518650 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:11:21,560-Speed 2496.08 samples/sec Loss 1.7192 LearningRate 0.000173 Epoch: 25 Global Step: 518660 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:11:29,760-Speed 2497.89 samples/sec Loss 1.6919 LearningRate 0.000173 Epoch: 25 Global Step: 518670 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:11:37,963-Speed 2497.15 samples/sec Loss 1.7312 LearningRate 0.000173 Epoch: 25 Global Step: 518680 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:11:46,165-Speed 2497.47 samples/sec Loss 1.6984 LearningRate 0.000173 Epoch: 25 Global Step: 518690 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:11:54,366-Speed 2497.77 samples/sec Loss 1.7073 LearningRate 0.000173 Epoch: 25 Global Step: 518700 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:12:02,525-Speed 2510.34 samples/sec Loss 1.6995 LearningRate 0.000173 Epoch: 25 Global Step: 518710 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:12:10,733-Speed 2495.69 samples/sec Loss 1.7346 LearningRate 0.000173 Epoch: 25 Global Step: 518720 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:12:18,936-Speed 2497.08 samples/sec Loss 1.6943 LearningRate 0.000173 Epoch: 25 Global Step: 518730 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:12:27,141-Speed 2496.25 samples/sec Loss 1.7240 LearningRate 0.000173 Epoch: 25 Global Step: 518740 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:12:35,345-Speed 2496.56 samples/sec Loss 1.7105 LearningRate 0.000173 Epoch: 25 Global Step: 518750 Fp16 Grad Scale: 16384 Required: 71 hours 
Training: 2022-07-10 13:12:43,552-Speed 2495.82 samples/sec Loss 1.6905 LearningRate 0.000173 Epoch: 25 Global Step: 518760 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:12:51,710-Speed 2510.93 samples/sec Loss 1.7367 LearningRate 0.000173 Epoch: 25 Global Step: 518770 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:12:59,913-Speed 2497.25 samples/sec Loss 1.6828 LearningRate 0.000173 Epoch: 25 Global Step: 518780 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:13:08,115-Speed 2497.05 samples/sec Loss 1.7113 LearningRate 0.000173 Epoch: 25 Global Step: 518790 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:13:16,316-Speed 2497.69 samples/sec Loss 1.6572 LearningRate 0.000173 Epoch: 25 Global Step: 518800 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:13:24,522-Speed 2496.25 samples/sec Loss 1.7151 LearningRate 0.000173 Epoch: 25 Global Step: 518810 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:13:32,724-Speed 2497.36 samples/sec Loss 1.6738 LearningRate 0.000173 Epoch: 25 Global Step: 518820 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:13:40,871-Speed 2514.25 samples/sec Loss 1.7060 LearningRate 0.000173 Epoch: 25 Global Step: 518830 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:13:49,077-Speed 2496.24 samples/sec Loss 1.6829 LearningRate 0.000173 Epoch: 25 Global Step: 518840 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:13:57,284-Speed 2495.58 samples/sec Loss 1.6589 LearningRate 0.000173 Epoch: 25 Global Step: 518850 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:14:05,507-Speed 2491.08 samples/sec Loss 1.6468 LearningRate 0.000173 Epoch: 25 Global Step: 518860 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:14:13,710-Speed 2496.91 samples/sec Loss 1.6937 LearningRate 0.000173 Epoch: 25 Global Step: 518870 Fp16 Grad Scale: 16384 Required: 71 hours 
Training: 2022-07-10 13:14:21,910-Speed 2498.17 samples/sec Loss 1.7067 LearningRate 0.000173 Epoch: 25 Global Step: 518880 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:14:30,065-Speed 2511.88 samples/sec Loss 1.6764 LearningRate 0.000173 Epoch: 25 Global Step: 518890 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:14:38,266-Speed 2497.87 samples/sec Loss 1.7042 LearningRate 0.000173 Epoch: 25 Global Step: 518900 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:14:46,472-Speed 2496.24 samples/sec Loss 1.6609 LearningRate 0.000173 Epoch: 25 Global Step: 518910 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:14:54,674-Speed 2497.35 samples/sec Loss 1.6998 LearningRate 0.000173 Epoch: 25 Global Step: 518920 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:15:02,872-Speed 2498.62 samples/sec Loss 1.7016 LearningRate 0.000173 Epoch: 25 Global Step: 518930 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:15:11,075-Speed 2497.10 samples/sec Loss 1.6661 LearningRate 0.000173 Epoch: 25 Global Step: 518940 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:15:19,241-Speed 2508.48 samples/sec Loss 1.6512 LearningRate 0.000173 Epoch: 25 Global Step: 518950 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:15:27,439-Speed 2498.37 samples/sec Loss 1.6879 LearningRate 0.000173 Epoch: 25 Global Step: 518960 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:15:35,646-Speed 2495.77 samples/sec Loss 1.6578 LearningRate 0.000173 Epoch: 25 Global Step: 518970 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:15:43,854-Speed 2495.89 samples/sec Loss 1.6504 LearningRate 0.000173 Epoch: 25 Global Step: 518980 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:15:52,056-Speed 2497.60 samples/sec Loss 1.6702 LearningRate 0.000173 Epoch: 25 Global Step: 518990 Fp16 Grad Scale: 16384 Required: 71 hours 
Training: 2022-07-10 13:16:00,285-Speed 2489.15 samples/sec Loss 1.6797 LearningRate 0.000173 Epoch: 25 Global Step: 519000 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:16:08,439-Speed 2511.86 samples/sec Loss 1.6985 LearningRate 0.000173 Epoch: 25 Global Step: 519010 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:16:16,673-Speed 2487.78 samples/sec Loss 1.6633 LearningRate 0.000173 Epoch: 25 Global Step: 519020 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:16:24,889-Speed 2493.21 samples/sec Loss 1.6776 LearningRate 0.000173 Epoch: 25 Global Step: 519030 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:16:33,093-Speed 2496.73 samples/sec Loss 1.6783 LearningRate 0.000173 Epoch: 25 Global Step: 519040 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:16:41,294-Speed 2497.79 samples/sec Loss 1.6708 LearningRate 0.000173 Epoch: 25 Global Step: 519050 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:16:49,496-Speed 2497.36 samples/sec Loss 1.6914 LearningRate 0.000173 Epoch: 25 Global Step: 519060 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:16:57,640-Speed 2515.14 samples/sec Loss 1.6904 LearningRate 0.000173 Epoch: 25 Global Step: 519070 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:17:05,841-Speed 2497.72 samples/sec Loss 1.6899 LearningRate 0.000173 Epoch: 25 Global Step: 519080 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:17:14,037-Speed 2498.96 samples/sec Loss 1.7008 LearningRate 0.000173 Epoch: 25 Global Step: 519090 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:17:22,245-Speed 2495.49 samples/sec Loss 1.7181 LearningRate 0.000173 Epoch: 25 Global Step: 519100 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:17:30,449-Speed 2497.26 samples/sec Loss 1.7248 LearningRate 0.000173 Epoch: 25 Global Step: 519110 Fp16 Grad Scale: 16384 Required: 71 hours 
Training: 2022-07-10 13:17:38,650-Speed 2497.71 samples/sec Loss 1.6998 LearningRate 0.000173 Epoch: 25 Global Step: 519120 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:17:46,796-Speed 2514.31 samples/sec Loss 1.7231 LearningRate 0.000173 Epoch: 25 Global Step: 519130 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:17:55,009-Speed 2494.14 samples/sec Loss 1.6296 LearningRate 0.000173 Epoch: 25 Global Step: 519140 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:18:03,207-Speed 2498.71 samples/sec Loss 1.6571 LearningRate 0.000173 Epoch: 25 Global Step: 519150 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:18:11,408-Speed 2497.56 samples/sec Loss 1.7276 LearningRate 0.000173 Epoch: 25 Global Step: 519160 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:18:19,617-Speed 2495.41 samples/sec Loss 1.6570 LearningRate 0.000173 Epoch: 25 Global Step: 519170 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:18:27,824-Speed 2495.59 samples/sec Loss 1.6659 LearningRate 0.000173 Epoch: 25 Global Step: 519180 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:18:35,977-Speed 2512.70 samples/sec Loss 1.6834 LearningRate 0.000173 Epoch: 25 Global Step: 519190 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:18:44,182-Speed 2496.52 samples/sec Loss 1.7059 LearningRate 0.000173 Epoch: 25 Global Step: 519200 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:18:52,387-Speed 2496.43 samples/sec Loss 1.6890 LearningRate 0.000173 Epoch: 25 Global Step: 519210 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:19:00,589-Speed 2497.10 samples/sec Loss 1.6936 LearningRate 0.000173 Epoch: 25 Global Step: 519220 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:19:08,800-Speed 2494.70 samples/sec Loss 1.7147 LearningRate 0.000173 Epoch: 25 Global Step: 519230 Fp16 Grad Scale: 16384 Required: 71 hours 
Training: 2022-07-10 13:19:17,009-Speed 2495.30 samples/sec Loss 1.6959 LearningRate 0.000173 Epoch: 25 Global Step: 519240 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:19:25,160-Speed 2512.98 samples/sec Loss 1.6908 LearningRate 0.000173 Epoch: 25 Global Step: 519250 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:19:33,366-Speed 2496.04 samples/sec Loss 1.7093 LearningRate 0.000173 Epoch: 25 Global Step: 519260 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:19:41,571-Speed 2496.25 samples/sec Loss 1.7231 LearningRate 0.000173 Epoch: 25 Global Step: 519270 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:19:49,774-Speed 2497.24 samples/sec Loss 1.6951 LearningRate 0.000173 Epoch: 25 Global Step: 519280 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:19:57,987-Speed 2493.95 samples/sec Loss 1.6885 LearningRate 0.000173 Epoch: 25 Global Step: 519290 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:20:06,191-Speed 2496.70 samples/sec Loss 1.6813 LearningRate 0.000173 Epoch: 25 Global Step: 519300 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:20:14,338-Speed 2514.44 samples/sec Loss 1.7088 LearningRate 0.000173 Epoch: 25 Global Step: 519310 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:20:22,538-Speed 2498.06 samples/sec Loss 1.6966 LearningRate 0.000173 Epoch: 25 Global Step: 519320 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:20:30,737-Speed 2498.28 samples/sec Loss 1.6999 LearningRate 0.000173 Epoch: 25 Global Step: 519330 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:20:38,936-Speed 2498.24 samples/sec Loss 1.7108 LearningRate 0.000173 Epoch: 25 Global Step: 519340 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:20:47,135-Speed 2498.33 samples/sec Loss 1.7236 LearningRate 0.000173 Epoch: 25 Global Step: 519350 Fp16 Grad Scale: 16384 Required: 71 hours 
Training: 2022-07-10 13:20:55,333-Speed 2498.67 samples/sec Loss 1.7070 LearningRate 0.000173 Epoch: 25 Global Step: 519360 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:21:03,485-Speed 2512.63 samples/sec Loss 1.6952 LearningRate 0.000173 Epoch: 25 Global Step: 519370 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:21:11,687-Speed 2497.27 samples/sec Loss 1.7378 LearningRate 0.000173 Epoch: 25 Global Step: 519380 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:21:19,889-Speed 2497.53 samples/sec Loss 1.7058 LearningRate 0.000173 Epoch: 25 Global Step: 519390 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:21:28,088-Speed 2498.18 samples/sec Loss 1.7110 LearningRate 0.000173 Epoch: 25 Global Step: 519400 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:21:36,286-Speed 2498.44 samples/sec Loss 1.6830 LearningRate 0.000173 Epoch: 25 Global Step: 519410 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:21:44,489-Speed 2497.17 samples/sec Loss 1.6939 LearningRate 0.000173 Epoch: 25 Global Step: 519420 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:21:52,640-Speed 2512.80 samples/sec Loss 1.6902 LearningRate 0.000173 Epoch: 25 Global Step: 519430 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:22:00,847-Speed 2495.90 samples/sec Loss 1.7029 LearningRate 0.000173 Epoch: 25 Global Step: 519440 Fp16 Grad Scale: 16384 Required: 71 hours Training: 2022-07-10 13:22:09,049-Speed 2497.29 samples/sec Loss 1.6653 LearningRate 0.000173 Epoch: 25 Global Step: 519450 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:22:17,252-Speed 2497.20 samples/sec Loss 1.6566 LearningRate 0.000173 Epoch: 25 Global Step: 519460 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:22:25,455-Speed 2497.13 samples/sec Loss 1.6937 LearningRate 0.000173 Epoch: 25 Global Step: 519470 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:22:33,658-Speed 2496.84 samples/sec Loss 1.7283 LearningRate 0.000172 Epoch: 25 Global Step: 519480 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:22:41,806-Speed 2513.86 samples/sec Loss 1.6971 LearningRate 0.000172 Epoch: 25 Global Step: 519490 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:22:50,009-Speed 2496.99 samples/sec Loss 1.6867 LearningRate 0.000172 Epoch: 25 Global Step: 519500 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:22:58,211-Speed 2497.54 samples/sec Loss 1.7050 LearningRate 0.000172 Epoch: 25 Global Step: 519510 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:23:06,413-Speed 2497.00 samples/sec Loss 1.6610 LearningRate 0.000172 Epoch: 25 Global Step: 519520 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:23:14,624-Speed 2494.86 samples/sec Loss 1.6430 LearningRate 0.000172 Epoch: 25 Global Step: 519530 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:23:22,828-Speed 2496.70 samples/sec Loss 1.6663 LearningRate 0.000172 Epoch: 25 Global Step: 519540 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:23:30,978-Speed 2513.11 samples/sec Loss 1.7050 LearningRate 0.000172 Epoch: 25 Global Step: 519550 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:23:39,186-Speed 2495.83 samples/sec Loss 1.7096 LearningRate 0.000172 Epoch: 25 Global Step: 519560 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:23:47,394-Speed 2495.50 samples/sec Loss 1.6480 LearningRate 0.000172 Epoch: 25 Global Step: 519570 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:23:55,605-Speed 2494.41 samples/sec Loss 1.6589 LearningRate 0.000172 Epoch: 25 Global Step: 519580 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:24:03,809-Speed 2496.67 samples/sec Loss 1.6831 LearningRate 0.000172 Epoch: 25 Global Step: 519590 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:24:12,013-Speed 2497.71 samples/sec Loss 1.6892 LearningRate 0.000172 Epoch: 25 Global Step: 519600 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:24:20,161-Speed 2513.78 samples/sec Loss 1.6278 LearningRate 0.000172 Epoch: 25 Global Step: 519610 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:24:28,368-Speed 2495.85 samples/sec Loss 1.6857 LearningRate 0.000172 Epoch: 25 Global Step: 519620 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:24:36,572-Speed 2496.91 samples/sec Loss 1.6870 LearningRate 0.000172 Epoch: 25 Global Step: 519630 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:24:44,770-Speed 2498.34 samples/sec Loss 1.6778 LearningRate 0.000172 Epoch: 25 Global Step: 519640 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:24:52,970-Speed 2497.99 samples/sec Loss 1.6830 LearningRate 0.000172 Epoch: 25 Global Step: 519650 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:25:01,176-Speed 2496.23 samples/sec Loss 1.6302 LearningRate 0.000172 Epoch: 25 Global Step: 519660 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:25:09,323-Speed 2514.00 samples/sec Loss 1.6802 LearningRate 0.000172 Epoch: 25 Global Step: 519670 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:25:17,523-Speed 2498.06 samples/sec Loss 1.6789 LearningRate 0.000172 Epoch: 25 Global Step: 519680 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:25:25,719-Speed 2499.37 samples/sec Loss 1.6634 LearningRate 0.000172 Epoch: 25 Global Step: 519690 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:25:33,920-Speed 2497.88 samples/sec Loss 1.7081 LearningRate 0.000172 Epoch: 25 Global Step: 519700 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:25:42,119-Speed 2498.10 samples/sec Loss 1.6887 LearningRate 0.000172 Epoch: 25 Global Step: 519710 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:25:50,319-Speed 2497.82 samples/sec Loss 1.6527 LearningRate 0.000172 Epoch: 25 Global Step: 519720 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:25:58,468-Speed 2513.86 samples/sec Loss 1.6649 LearningRate 0.000172 Epoch: 25 Global Step: 519730 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:26:06,673-Speed 2496.35 samples/sec Loss 1.6855 LearningRate 0.000172 Epoch: 25 Global Step: 519740 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:26:14,877-Speed 2496.75 samples/sec Loss 1.6755 LearningRate 0.000172 Epoch: 25 Global Step: 519750 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:26:23,084-Speed 2495.86 samples/sec Loss 1.6899 LearningRate 0.000172 Epoch: 25 Global Step: 519760 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:26:31,282-Speed 2498.49 samples/sec Loss 1.6621 LearningRate 0.000172 Epoch: 25 Global Step: 519770 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:26:39,485-Speed 2497.33 samples/sec Loss 1.6747 LearningRate 0.000172 Epoch: 25 Global Step: 519780 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:26:47,641-Speed 2511.45 samples/sec Loss 1.6464 LearningRate 0.000172 Epoch: 25 Global Step: 519790 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:26:55,843-Speed 2497.41 samples/sec Loss 1.6659 LearningRate 0.000172 Epoch: 25 Global Step: 519800 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:27:04,045-Speed 2497.32 samples/sec Loss 1.6259 LearningRate 0.000172 Epoch: 25 Global Step: 519810 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:27:12,247-Speed 2497.22 samples/sec Loss 1.6760 LearningRate 0.000172 Epoch: 25 Global Step: 519820 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:27:20,445-Speed 2498.60 samples/sec Loss 1.6988 LearningRate 0.000172 Epoch: 25 Global Step: 519830 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:27:28,647-Speed 2497.54 samples/sec Loss 1.7068 LearningRate 0.000172 Epoch: 25 Global Step: 519840 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:27:36,806-Speed 2510.23 samples/sec Loss 1.6986 LearningRate 0.000172 Epoch: 25 Global Step: 519850 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:27:45,007-Speed 2497.86 samples/sec Loss 1.6575 LearningRate 0.000172 Epoch: 25 Global Step: 519860 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:27:53,204-Speed 2498.67 samples/sec Loss 1.6697 LearningRate 0.000172 Epoch: 25 Global Step: 519870 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:28:01,406-Speed 2497.41 samples/sec Loss 1.7156 LearningRate 0.000172 Epoch: 25 Global Step: 519880 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:28:09,605-Speed 2498.19 samples/sec Loss 1.6952 LearningRate 0.000172 Epoch: 25 Global Step: 519890 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:28:17,822-Speed 2492.69 samples/sec Loss 1.7142 LearningRate 0.000172 Epoch: 25 Global Step: 519900 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:28:25,972-Speed 2513.31 samples/sec Loss 1.6351 LearningRate 0.000172 Epoch: 25 Global Step: 519910 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:28:34,173-Speed 2497.82 samples/sec Loss 1.6427 LearningRate 0.000172 Epoch: 25 Global Step: 519920 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:28:42,397-Speed 2490.70 samples/sec Loss 1.6982 LearningRate 0.000172 Epoch: 25 Global Step: 519930 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:28:50,601-Speed 2496.61 samples/sec Loss 1.6846 LearningRate 0.000172 Epoch: 25 Global Step: 519940 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:28:58,809-Speed 2495.61 samples/sec Loss 1.6644 LearningRate 0.000172 Epoch: 25 Global Step: 519950 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:29:07,023-Speed 2493.74 samples/sec Loss 1.6648 LearningRate 0.000172 Epoch: 25 Global Step: 519960 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:29:15,173-Speed 2513.40 samples/sec Loss 1.6778 LearningRate 0.000172 Epoch: 25 Global Step: 519970 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:29:23,373-Speed 2498.01 samples/sec Loss 1.6371 LearningRate 0.000172 Epoch: 25 Global Step: 519980 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:29:31,581-Speed 2495.42 samples/sec Loss 1.6876 LearningRate 0.000172 Epoch: 25 Global Step: 519990 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:29:39,782-Speed 2497.67 samples/sec Loss 1.6943 LearningRate 0.000172 Epoch: 25 Global Step: 520000 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:29:47,982-Speed 2498.05 samples/sec Loss 1.6903 LearningRate 0.000172 Epoch: 25 Global Step: 520010 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:29:56,178-Speed 2499.14 samples/sec Loss 1.6727 LearningRate 0.000172 Epoch: 25 Global Step: 520020 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:30:04,325-Speed 2514.09 samples/sec Loss 1.6839 LearningRate 0.000172 Epoch: 25 Global Step: 520030 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:30:12,528-Speed 2497.32 samples/sec Loss 1.6863 LearningRate 0.000172 Epoch: 25 Global Step: 520040 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:30:20,728-Speed 2497.69 samples/sec Loss 1.6749 LearningRate 0.000172 Epoch: 25 Global Step: 520050 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:30:28,929-Speed 2497.62 samples/sec Loss 1.7006 LearningRate 0.000172 Epoch: 25 Global Step: 520060 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:30:37,127-Speed 2498.72 samples/sec Loss 1.6892 LearningRate 0.000172 Epoch: 25 Global Step: 520070 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:30:45,333-Speed 2496.41 samples/sec Loss 1.6697 LearningRate 0.000172 Epoch: 25 Global Step: 520080 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:30:53,481-Speed 2513.88 samples/sec Loss 1.7286 LearningRate 0.000172 Epoch: 25 Global Step: 520090 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:31:01,687-Speed 2496.09 samples/sec Loss 1.7484 LearningRate 0.000172 Epoch: 25 Global Step: 520100 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:31:09,891-Speed 2496.96 samples/sec Loss 1.7081 LearningRate 0.000172 Epoch: 25 Global Step: 520110 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:31:18,095-Speed 2496.64 samples/sec Loss 1.7146 LearningRate 0.000172 Epoch: 25 Global Step: 520120 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:31:26,295-Speed 2497.93 samples/sec Loss 1.7699 LearningRate 0.000172 Epoch: 25 Global Step: 520130 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:31:34,497-Speed 2497.31 samples/sec Loss 1.6830 LearningRate 0.000172 Epoch: 25 Global Step: 520140 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:31:42,645-Speed 2513.92 samples/sec Loss 1.7094 LearningRate 0.000172 Epoch: 25 Global Step: 520150 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:31:50,857-Speed 2494.15 samples/sec Loss 1.7193 LearningRate 0.000172 Epoch: 25 Global Step: 520160 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:31:59,059-Speed 2497.41 samples/sec Loss 1.7384 LearningRate 0.000172 Epoch: 25 Global Step: 520170 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:32:07,259-Speed 2498.05 samples/sec Loss 1.7580 LearningRate 0.000172 Epoch: 25 Global Step: 520180 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:32:15,462-Speed 2497.08 samples/sec Loss 1.6993 LearningRate 0.000172 Epoch: 25 Global Step: 520190 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:32:23,666-Speed 2496.75 samples/sec Loss 1.6701 LearningRate 0.000172 Epoch: 25 Global Step: 520200 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:32:31,816-Speed 2513.17 samples/sec Loss 1.6795 LearningRate 0.000172 Epoch: 25 Global Step: 520210 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:32:40,022-Speed 2496.36 samples/sec Loss 1.6917 LearningRate 0.000172 Epoch: 25 Global Step: 520220 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:32:48,226-Speed 2496.62 samples/sec Loss 1.6721 LearningRate 0.000172 Epoch: 25 Global Step: 520230 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:32:56,436-Speed 2494.92 samples/sec Loss 1.7208 LearningRate 0.000172 Epoch: 25 Global Step: 520240 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:33:04,648-Speed 2494.31 samples/sec Loss 1.6970 LearningRate 0.000172 Epoch: 25 Global Step: 520250 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:33:12,849-Speed 2497.89 samples/sec Loss 1.6884 LearningRate 0.000172 Epoch: 25 Global Step: 520260 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:33:20,998-Speed 2513.55 samples/sec Loss 1.7022 LearningRate 0.000172 Epoch: 25 Global Step: 520270 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:33:29,198-Speed 2497.90 samples/sec Loss 1.6496 LearningRate 0.000172 Epoch: 25 Global Step: 520280 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:33:37,401-Speed 2496.99 samples/sec Loss 1.7341 LearningRate 0.000172 Epoch: 25 Global Step: 520290 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:33:45,614-Speed 2494.14 samples/sec Loss 1.6341 LearningRate 0.000172 Epoch: 25 Global Step: 520300 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:33:53,814-Speed 2497.76 samples/sec Loss 1.6973 LearningRate 0.000172 Epoch: 25 Global Step: 520310 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:34:02,012-Speed 2498.81 samples/sec Loss 1.6708 LearningRate 0.000172 Epoch: 25 Global Step: 520320 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:34:10,170-Speed 2510.67 samples/sec Loss 1.6986 LearningRate 0.000172 Epoch: 25 Global Step: 520330 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:34:18,382-Speed 2494.46 samples/sec Loss 1.7136 LearningRate 0.000172 Epoch: 25 Global Step: 520340 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:34:26,578-Speed 2499.16 samples/sec Loss 1.6821 LearningRate 0.000172 Epoch: 25 Global Step: 520350 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:34:34,777-Speed 2498.27 samples/sec Loss 1.7243 LearningRate 0.000172 Epoch: 25 Global Step: 520360 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:34:42,976-Speed 2498.24 samples/sec Loss 1.6879 LearningRate 0.000172 Epoch: 25 Global Step: 520370 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:34:51,176-Speed 2497.70 samples/sec Loss 1.6569 LearningRate 0.000171 Epoch: 25 Global Step: 520380 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:34:59,323-Speed 2514.24 samples/sec Loss 1.7069 LearningRate 0.000171 Epoch: 25 Global Step: 520390 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:35:07,529-Speed 2496.26 samples/sec Loss 1.6646 LearningRate 0.000171 Epoch: 25 Global Step: 520400 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:35:15,734-Speed 2496.56 samples/sec Loss 1.6985 LearningRate 0.000171 Epoch: 25 Global Step: 520410 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:35:23,943-Speed 2495.10 samples/sec Loss 1.6935 LearningRate 0.000171 Epoch: 25 Global Step: 520420 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:35:32,149-Speed 2496.05 samples/sec Loss 1.6990 LearningRate 0.000171 Epoch: 25 Global Step: 520430 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:35:40,351-Speed 2497.31 samples/sec Loss 1.7263 LearningRate 0.000171 Epoch: 25 Global Step: 520440 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:35:48,514-Speed 2509.49 samples/sec Loss 1.7226 LearningRate 0.000171 Epoch: 25 Global Step: 520450 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:35:56,723-Speed 2495.35 samples/sec Loss 1.7040 LearningRate 0.000171 Epoch: 25 Global Step: 520460 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:36:04,938-Speed 2493.23 samples/sec Loss 1.6824 LearningRate 0.000171 Epoch: 25 Global Step: 520470 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:36:13,139-Speed 2497.35 samples/sec Loss 1.6707 LearningRate 0.000171 Epoch: 25 Global Step: 520480 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:36:21,341-Speed 2497.61 samples/sec Loss 1.6946 LearningRate 0.000171 Epoch: 25 Global Step: 520490 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:36:29,540-Speed 2498.07 samples/sec Loss 1.7006 LearningRate 0.000171 Epoch: 25 Global Step: 520500 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:36:37,701-Speed 2509.92 samples/sec Loss 1.7172 LearningRate 0.000171 Epoch: 25 Global Step: 520510 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:36:45,902-Speed 2497.51 samples/sec Loss 1.6205 LearningRate 0.000171 Epoch: 25 Global Step: 520520 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:36:54,109-Speed 2495.99 samples/sec Loss 1.6306 LearningRate 0.000171 Epoch: 25 Global Step: 520530 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:37:02,308-Speed 2498.15 samples/sec Loss 1.6343 LearningRate 0.000171 Epoch: 25 Global Step: 520540 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:37:10,510-Speed 2497.55 samples/sec Loss 1.6391 LearningRate 0.000171 Epoch: 25 Global Step: 520550 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:37:18,710-Speed 2497.98 samples/sec Loss 1.6471 LearningRate 0.000171 Epoch: 25 Global Step: 520560 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:37:26,856-Speed 2514.29 samples/sec Loss 1.6456 LearningRate 0.000171 Epoch: 25 Global Step: 520570 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:37:35,064-Speed 2495.51 samples/sec Loss 1.6667 LearningRate 0.000171 Epoch: 25 Global Step: 520580 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:37:43,276-Speed 2494.41 samples/sec Loss 1.6443 LearningRate 0.000171 Epoch: 25 Global Step: 520590 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:37:51,482-Speed 2496.40 samples/sec Loss 1.6677 LearningRate 0.000171 Epoch: 25 Global Step: 520600 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:37:59,684-Speed 2497.48 samples/sec Loss 1.6494 LearningRate 0.000171 Epoch: 25 Global Step: 520610 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:38:07,896-Speed 2494.31 samples/sec Loss 1.6764 LearningRate 0.000171 Epoch: 25 Global Step: 520620 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:38:16,048-Speed 2512.79 samples/sec Loss 1.6663 LearningRate 0.000171 Epoch: 25 Global Step: 520630 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:38:24,254-Speed 2496.17 samples/sec Loss 1.6783 LearningRate 0.000171 Epoch: 25 Global Step: 520640 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:38:32,452-Speed 2498.43 samples/sec Loss 1.6408 LearningRate 0.000171 Epoch: 25 Global Step: 520650 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-07-10 13:38:40,658-Speed 2496.05 samples/sec Loss 1.6720 LearningRate 0.000171 Epoch: 25 Global Step: 520660 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-07-10 13:38:48,859-Speed 2497.80 samples/sec Loss 1.6634 LearningRate 0.000171 Epoch: 25 Global Step: 520670 Fp16 Grad Scale: 65536 Required: 71 hours 
Training: 2022-07-10 13:38:57,062-Speed 2497.10 samples/sec Loss 1.6510 LearningRate 0.000171 Epoch: 25 Global Step: 520680 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-07-10 13:39:05,213-Speed 2513.10 samples/sec Loss 1.6593 LearningRate 0.000171 Epoch: 25 Global Step: 520690 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-07-10 13:39:13,418-Speed 2496.78 samples/sec Loss 1.6352 LearningRate 0.000171 Epoch: 25 Global Step: 520700 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-07-10 13:39:21,619-Speed 2497.62 samples/sec Loss 1.6578 LearningRate 0.000171 Epoch: 25 Global Step: 520710 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-07-10 13:39:29,823-Speed 2496.99 samples/sec Loss 1.6515 LearningRate 0.000171 Epoch: 25 Global Step: 520720 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-07-10 13:39:37,981-Speed 2510.82 samples/sec Loss 1.6932 LearningRate 0.000171 Epoch: 25 Global Step: 520730 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:39:46,184-Speed 2497.00 samples/sec Loss 1.6767 LearningRate 0.000171 Epoch: 25 Global Step: 520740 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:39:54,336-Speed 2512.69 samples/sec Loss 1.6175 LearningRate 0.000171 Epoch: 25 Global Step: 520750 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:40:02,540-Speed 2496.66 samples/sec Loss 1.6742 LearningRate 0.000171 Epoch: 25 Global Step: 520760 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:40:10,753-Speed 2494.09 samples/sec Loss 1.6655 LearningRate 0.000171 Epoch: 25 Global Step: 520770 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:40:18,954-Speed 2497.56 samples/sec Loss 1.6651 LearningRate 0.000171 Epoch: 25 Global Step: 520780 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:40:27,156-Speed 2497.37 samples/sec Loss 1.6485 LearningRate 0.000171 Epoch: 25 Global Step: 520790 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:40:35,359-Speed 2497.22 samples/sec Loss 1.6911 LearningRate 0.000171 Epoch: 25 Global Step: 520800 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:40:43,514-Speed 2511.65 samples/sec Loss 1.7044 LearningRate 0.000171 Epoch: 25 Global Step: 520810 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:40:51,729-Speed 2493.55 samples/sec Loss 1.6255 LearningRate 0.000171 Epoch: 25 Global Step: 520820 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:40:59,935-Speed 2496.32 samples/sec Loss 1.6384 LearningRate 0.000171 Epoch: 25 Global Step: 520830 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:41:08,135-Speed 2497.79 samples/sec Loss 1.6500 LearningRate 0.000171 Epoch: 25 Global Step: 520840 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:41:16,334-Speed 2498.22 samples/sec Loss 1.6728 LearningRate 0.000171 Epoch: 25 Global Step: 520850 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:41:24,532-Speed 2498.70 samples/sec Loss 1.6662 LearningRate 0.000171 Epoch: 25 Global Step: 520860 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:41:32,681-Speed 2513.74 samples/sec Loss 1.6875 LearningRate 0.000171 Epoch: 25 Global Step: 520870 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:41:40,877-Speed 2499.11 samples/sec Loss 1.6915 LearningRate 0.000171 Epoch: 25 Global Step: 520880 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:41:49,075-Speed 2498.55 samples/sec Loss 1.6961 LearningRate 0.000171 Epoch: 25 Global Step: 520890 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:41:57,280-Speed 2496.81 samples/sec Loss 1.6558 LearningRate 0.000171 Epoch: 25 Global Step: 520900 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:42:05,479-Speed 2498.23 samples/sec Loss 1.6765 LearningRate 0.000171 Epoch: 25 Global Step: 520910 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:42:13,692-Speed 2494.03 samples/sec Loss 1.6905 LearningRate 0.000171 Epoch: 25 Global Step: 520920 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:42:21,842-Speed 2513.13 samples/sec Loss 1.6491 LearningRate 0.000171 Epoch: 25 Global Step: 520930 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:42:30,039-Speed 2499.09 samples/sec Loss 1.6719 LearningRate 0.000171 Epoch: 25 Global Step: 520940 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:42:38,242-Speed 2496.96 samples/sec Loss 1.6949 LearningRate 0.000171 Epoch: 25 Global Step: 520950 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:42:46,443-Speed 2497.60 samples/sec Loss 1.7058 LearningRate 0.000171 Epoch: 25 Global Step: 520960 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:42:54,645-Speed 2497.35 samples/sec Loss 1.7299 LearningRate 0.000171 Epoch: 25 Global Step: 520970 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:43:02,848-Speed 2497.15 samples/sec Loss 1.7130 LearningRate 0.000171 Epoch: 25 Global Step: 520980 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:43:11,002-Speed 2512.04 samples/sec Loss 1.6791 LearningRate 0.000171 Epoch: 25 Global Step: 520990 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:43:19,214-Speed 2494.24 samples/sec Loss 1.6816 LearningRate 0.000171 Epoch: 25 Global Step: 521000 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:43:27,418-Speed 2496.77 samples/sec Loss 1.7027 LearningRate 0.000171 Epoch: 25 Global Step: 521010 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:43:35,621-Speed 2497.06 samples/sec Loss 1.7225 LearningRate 0.000171 Epoch: 25 Global Step: 521020 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:43:43,829-Speed 2495.64 samples/sec Loss 1.6730 LearningRate 0.000171 Epoch: 25 Global Step: 521030 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:43:52,031-Speed 2497.30 samples/sec Loss 1.6745 LearningRate 0.000171 Epoch: 25 Global Step: 521040 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:44:00,178-Speed 2514.09 samples/sec Loss 1.7023 LearningRate 0.000171 Epoch: 25 Global Step: 521050 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:44:08,382-Speed 2496.90 samples/sec Loss 1.7012 LearningRate 0.000171 Epoch: 25 Global Step: 521060 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:44:16,580-Speed 2498.44 samples/sec Loss 1.6541 LearningRate 0.000171 Epoch: 25 Global Step: 521070 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:44:24,779-Speed 2498.26 samples/sec Loss 1.6780 LearningRate 0.000171 Epoch: 25 Global Step: 521080 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:44:32,980-Speed 2497.68 samples/sec Loss 1.6449 LearningRate 0.000171 Epoch: 25 Global Step: 521090 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:44:41,182-Speed 2497.15 samples/sec Loss 1.6838 LearningRate 0.000171 Epoch: 25 Global Step: 521100 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:44:49,334-Speed 2512.77 samples/sec Loss 1.6921 LearningRate 0.000171 Epoch: 25 Global Step: 521110 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:44:57,532-Speed 2498.46 samples/sec Loss 1.6874 LearningRate 0.000171 Epoch: 25 Global Step: 521120 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:45:05,744-Speed 2494.39 samples/sec Loss 1.7060 LearningRate 0.000171 Epoch: 25 Global Step: 521130 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:45:13,945-Speed 2497.89 samples/sec Loss 1.6347 LearningRate 0.000171 Epoch: 25 Global Step: 521140 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:45:22,154-Speed 2494.94 samples/sec Loss 1.6739 LearningRate 0.000171 Epoch: 25 Global Step: 521150 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:45:30,353-Speed 2498.13 samples/sec Loss 1.6899 LearningRate 0.000171 Epoch: 25 Global Step: 521160 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:45:38,502-Speed 2513.73 samples/sec Loss 1.6772 LearningRate 0.000171 Epoch: 25 Global Step: 521170 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:45:46,700-Speed 2498.93 samples/sec Loss 1.7065 LearningRate 0.000171 Epoch: 25 Global Step: 521180 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:45:54,901-Speed 2497.49 samples/sec Loss 1.6739 LearningRate 0.000171 Epoch: 25 Global Step: 521190 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:46:03,104-Speed 2497.29 samples/sec Loss 1.6523 LearningRate 0.000171 Epoch: 25 Global Step: 521200 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:46:11,303-Speed 2498.25 samples/sec Loss 1.6716 LearningRate 0.000171 Epoch: 25 Global Step: 521210 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:46:19,505-Speed 2497.12 samples/sec Loss 1.6962 LearningRate 0.000171 Epoch: 25 Global Step: 521220 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:46:27,651-Speed 2514.49 samples/sec Loss 1.7200 LearningRate 0.000171 Epoch: 25 Global Step: 521230 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:46:35,862-Speed 2495.06 samples/sec Loss 1.7289 LearningRate 0.000171 Epoch: 25 Global Step: 521240 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:46:44,058-Speed 2499.02 samples/sec Loss 1.7309 LearningRate 0.000171 Epoch: 25 Global Step: 521250 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:46:52,270-Speed 2494.36 samples/sec Loss 1.6565 LearningRate 0.000171 Epoch: 25 Global Step: 521260 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:47:00,471-Speed 2497.75 samples/sec Loss 1.6911 LearningRate 0.000171 Epoch: 25 Global Step: 521270 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:47:08,674-Speed 2497.11 samples/sec Loss 1.6972 LearningRate 0.000170 Epoch: 25 Global Step: 521280 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:47:16,823-Speed 2513.52 samples/sec Loss 1.7199 LearningRate 0.000170 Epoch: 25 Global Step: 521290 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:47:25,026-Speed 2496.90 samples/sec Loss 1.7095 LearningRate 0.000170 Epoch: 25 Global Step: 521300 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:47:33,238-Speed 2494.25 samples/sec Loss 1.6835 LearningRate 0.000170 Epoch: 25 Global Step: 521310 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:47:41,439-Speed 2497.88 samples/sec Loss 1.6535 LearningRate 0.000170 Epoch: 25 Global Step: 521320 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:47:49,639-Speed 2497.88 samples/sec Loss 1.6812 LearningRate 0.000170 Epoch: 25 Global Step: 521330 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:47:57,838-Speed 2498.25 samples/sec Loss 1.6919 LearningRate 0.000170 Epoch: 25 Global Step: 521340 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:48:05,981-Speed 2515.75 samples/sec Loss 1.6776 LearningRate 0.000170 Epoch: 25 Global Step: 521350 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:48:14,183-Speed 2497.73 samples/sec Loss 1.6918 LearningRate 0.000170 Epoch: 25 Global Step: 521360 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:48:22,385-Speed 2497.21 samples/sec Loss 1.6907 LearningRate 0.000170 Epoch: 25 Global Step: 521370 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:48:30,587-Speed 2497.42 samples/sec Loss 1.6718 LearningRate 0.000170 Epoch: 25 Global Step: 521380 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:48:38,786-Speed 2498.14 samples/sec Loss 1.6733 LearningRate 0.000170 Epoch: 25 Global Step: 521390 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:48:46,982-Speed 2499.43 samples/sec Loss 1.6919 LearningRate 0.000170 Epoch: 25 Global Step: 521400 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:48:55,131-Speed 2513.27 samples/sec Loss 1.6508 LearningRate 0.000170 Epoch: 25 Global Step: 521410 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:49:03,335-Speed 2496.80 samples/sec Loss 1.6625 LearningRate 0.000170 Epoch: 25 Global Step: 521420 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:49:11,535-Speed 2497.82 samples/sec Loss 1.6728 LearningRate 0.000170 Epoch: 25 Global Step: 521430 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:49:19,736-Speed 2497.72 samples/sec Loss 1.7383 LearningRate 0.000170 Epoch: 25 Global Step: 521440 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:49:27,935-Speed 2498.33 samples/sec Loss 1.6662 LearningRate 0.000170 Epoch: 25 Global Step: 521450 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:49:36,138-Speed 2497.06 samples/sec Loss 1.7463 LearningRate 0.000170 Epoch: 25 Global Step: 521460 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:49:44,285-Speed 2514.45 samples/sec Loss 1.6864 LearningRate 0.000170 Epoch: 25 Global Step: 521470 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:49:52,483-Speed 2498.34 samples/sec Loss 1.6793 LearningRate 0.000170 Epoch: 25 Global Step: 521480 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:50:00,701-Speed 2492.57 samples/sec Loss 1.6963 LearningRate 0.000170 Epoch: 25 Global Step: 521490 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:50:08,900-Speed 2498.27 samples/sec Loss 1.7121 LearningRate 0.000170 Epoch: 25 Global Step: 521500 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:50:17,112-Speed 2494.32 samples/sec Loss 1.6688 LearningRate 0.000170 Epoch: 25 Global Step: 521510 Fp16 Grad Scale: 32768 Required: 71 hours 
Training: 2022-07-10 13:50:25,317-Speed 2496.54 samples/sec Loss 1.7015 LearningRate 0.000170 Epoch: 25 Global Step: 521520 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:50:33,464-Speed 2514.00 samples/sec Loss 1.6803 LearningRate 0.000170 Epoch: 25 Global Step: 521530 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:50:41,666-Speed 2497.22 samples/sec Loss 1.6897 LearningRate 0.000170 Epoch: 25 Global Step: 521540 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:50:49,866-Speed 2498.01 samples/sec Loss 1.6777 LearningRate 0.000170 Epoch: 25 Global Step: 521550 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:50:58,067-Speed 2497.69 samples/sec Loss 1.6798 LearningRate 0.000170 Epoch: 25 Global Step: 521560 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:51:06,268-Speed 2497.58 samples/sec Loss 1.6923 LearningRate 0.000170 Epoch: 25 Global Step: 521570 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:51:14,473-Speed 2496.39 samples/sec Loss 1.6474 LearningRate 0.000170 Epoch: 25 Global Step: 521580 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:51:22,629-Speed 2511.62 samples/sec Loss 1.7006 LearningRate 0.000170 Epoch: 25 Global Step: 521590 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:51:30,829-Speed 2497.89 samples/sec Loss 1.7070 LearningRate 0.000170 Epoch: 25 Global Step: 521600 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:51:39,043-Speed 2493.71 samples/sec Loss 1.6920 LearningRate 0.000170 Epoch: 25 Global Step: 521610 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:51:47,243-Speed 2498.15 samples/sec Loss 1.6611 LearningRate 0.000170 Epoch: 25 Global Step: 521620 Fp16 Grad Scale: 32768 Required: 71 hours Training: 2022-07-10 13:51:55,442-Speed 2498.08 samples/sec Loss 1.6766 LearningRate 0.000170 Epoch: 25 Global Step: 521630 Fp16 Grad Scale: 32768 Required: 70 hours 
Training: 2022-07-10 13:52:03,644-Speed 2497.37 samples/sec Loss 1.6670 LearningRate 0.000170 Epoch: 25 Global Step: 521640 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:52:11,795-Speed 2513.02 samples/sec Loss 1.6610 LearningRate 0.000170 Epoch: 25 Global Step: 521650 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:52:19,995-Speed 2497.78 samples/sec Loss 1.6660 LearningRate 0.000170 Epoch: 25 Global Step: 521660 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:52:28,203-Speed 2495.53 samples/sec Loss 1.6687 LearningRate 0.000170 Epoch: 25 Global Step: 521670 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:52:36,404-Speed 2497.70 samples/sec Loss 1.7435 LearningRate 0.000170 Epoch: 25 Global Step: 521680 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:52:44,606-Speed 2497.36 samples/sec Loss 1.7227 LearningRate 0.000170 Epoch: 25 Global Step: 521690 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:52:52,815-Speed 2498.77 samples/sec Loss 1.6662 LearningRate 0.000170 Epoch: 25 Global Step: 521700 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:53:00,973-Speed 2511.02 samples/sec Loss 1.7007 LearningRate 0.000170 Epoch: 25 Global Step: 521710 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:53:09,190-Speed 2492.73 samples/sec Loss 1.6735 LearningRate 0.000170 Epoch: 25 Global Step: 521720 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:53:17,390-Speed 2498.01 samples/sec Loss 1.7025 LearningRate 0.000170 Epoch: 25 Global Step: 521730 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:53:25,593-Speed 2497.01 samples/sec Loss 1.7059 LearningRate 0.000170 Epoch: 25 Global Step: 521740 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:53:33,795-Speed 2497.39 samples/sec Loss 1.6853 LearningRate 0.000170 Epoch: 25 Global Step: 521750 Fp16 Grad Scale: 32768 Required: 70 hours 
Training: 2022-07-10 13:53:41,993-Speed 2498.59 samples/sec Loss 1.6993 LearningRate 0.000170 Epoch: 25 Global Step: 521760 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:53:50,142-Speed 2513.82 samples/sec Loss 1.7046 LearningRate 0.000170 Epoch: 25 Global Step: 521770 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:53:58,338-Speed 2499.14 samples/sec Loss 1.6961 LearningRate 0.000170 Epoch: 25 Global Step: 521780 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:54:06,543-Speed 2496.37 samples/sec Loss 1.7011 LearningRate 0.000170 Epoch: 25 Global Step: 521790 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:54:14,738-Speed 2499.40 samples/sec Loss 1.6994 LearningRate 0.000170 Epoch: 25 Global Step: 521800 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:54:22,941-Speed 2497.07 samples/sec Loss 1.6826 LearningRate 0.000170 Epoch: 25 Global Step: 521810 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:54:31,148-Speed 2496.14 samples/sec Loss 1.6784 LearningRate 0.000170 Epoch: 25 Global Step: 521820 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:54:39,302-Speed 2511.91 samples/sec Loss 1.6810 LearningRate 0.000170 Epoch: 25 Global Step: 521830 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:54:47,506-Speed 2496.66 samples/sec Loss 1.6855 LearningRate 0.000170 Epoch: 25 Global Step: 521840 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:54:55,709-Speed 2497.11 samples/sec Loss 1.6667 LearningRate 0.000170 Epoch: 25 Global Step: 521850 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:55:03,910-Speed 2497.91 samples/sec Loss 1.7105 LearningRate 0.000170 Epoch: 25 Global Step: 521860 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:55:12,108-Speed 2498.37 samples/sec Loss 1.7061 LearningRate 0.000170 Epoch: 25 Global Step: 521870 Fp16 Grad Scale: 32768 Required: 70 hours 
Training: 2022-07-10 13:55:20,320-Speed 2494.55 samples/sec Loss 1.6637 LearningRate 0.000170 Epoch: 25 Global Step: 521880 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:55:28,463-Speed 2515.57 samples/sec Loss 1.6838 LearningRate 0.000170 Epoch: 25 Global Step: 521890 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:55:36,668-Speed 2496.23 samples/sec Loss 1.6713 LearningRate 0.000170 Epoch: 25 Global Step: 521900 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:55:44,872-Speed 2496.71 samples/sec Loss 1.7029 LearningRate 0.000170 Epoch: 25 Global Step: 521910 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:55:53,075-Speed 2497.18 samples/sec Loss 1.6567 LearningRate 0.000170 Epoch: 25 Global Step: 521920 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:56:01,288-Speed 2494.16 samples/sec Loss 1.6495 LearningRate 0.000170 Epoch: 25 Global Step: 521930 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-07-10 13:56:09,488-Speed 2497.87 samples/sec Loss 1.6680 LearningRate 0.000170 Epoch: 25 Global Step: 521940 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-07-10 13:56:17,634-Speed 2514.29 samples/sec Loss 1.6714 LearningRate 0.000170 Epoch: 25 Global Step: 521950 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-07-10 13:56:25,837-Speed 2497.14 samples/sec Loss 1.6673 LearningRate 0.000170 Epoch: 25 Global Step: 521960 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-07-10 13:56:33,992-Speed 2511.94 samples/sec Loss 1.6819 LearningRate 0.000170 Epoch: 25 Global Step: 521970 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:56:42,192-Speed 2498.09 samples/sec Loss 1.6945 LearningRate 0.000170 Epoch: 25 Global Step: 521980 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:56:50,390-Speed 2498.43 samples/sec Loss 1.6598 LearningRate 0.000170 Epoch: 25 Global Step: 521990 Fp16 Grad Scale: 32768 Required: 70 hours 
Training: 2022-07-10 13:56:58,588-Speed 2498.41 samples/sec Loss 1.6713 LearningRate 0.000170 Epoch: 25 Global Step: 522000 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:57:06,736-Speed 2514.16 samples/sec Loss 1.6864 LearningRate 0.000170 Epoch: 25 Global Step: 522010 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:57:14,941-Speed 2496.56 samples/sec Loss 1.6540 LearningRate 0.000170 Epoch: 25 Global Step: 522020 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:57:23,137-Speed 2499.28 samples/sec Loss 1.6770 LearningRate 0.000170 Epoch: 25 Global Step: 522030 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:57:31,338-Speed 2497.59 samples/sec Loss 1.6750 LearningRate 0.000170 Epoch: 25 Global Step: 522040 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:57:39,540-Speed 2497.59 samples/sec Loss 1.6849 LearningRate 0.000170 Epoch: 25 Global Step: 522050 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:57:47,745-Speed 2496.28 samples/sec Loss 1.6701 LearningRate 0.000170 Epoch: 25 Global Step: 522060 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:57:55,893-Speed 2513.96 samples/sec Loss 1.6828 LearningRate 0.000170 Epoch: 25 Global Step: 522070 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:58:04,092-Speed 2498.28 samples/sec Loss 1.7092 LearningRate 0.000170 Epoch: 25 Global Step: 522080 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:58:12,291-Speed 2498.48 samples/sec Loss 1.6543 LearningRate 0.000170 Epoch: 25 Global Step: 522090 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:58:20,495-Speed 2496.75 samples/sec Loss 1.6537 LearningRate 0.000170 Epoch: 25 Global Step: 522100 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:58:28,703-Speed 2495.75 samples/sec Loss 1.6706 LearningRate 0.000170 Epoch: 25 Global Step: 522110 Fp16 Grad Scale: 32768 Required: 70 hours 
Training: 2022-07-10 13:58:36,917-Speed 2493.69 samples/sec Loss 1.6992 LearningRate 0.000170 Epoch: 25 Global Step: 522120 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:58:45,065-Speed 2513.90 samples/sec Loss 1.7028 LearningRate 0.000170 Epoch: 25 Global Step: 522130 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:58:53,270-Speed 2496.29 samples/sec Loss 1.6542 LearningRate 0.000170 Epoch: 25 Global Step: 522140 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:59:01,485-Speed 2493.68 samples/sec Loss 1.7003 LearningRate 0.000170 Epoch: 25 Global Step: 522150 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:59:09,690-Speed 2496.37 samples/sec Loss 1.6921 LearningRate 0.000170 Epoch: 25 Global Step: 522160 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:59:17,896-Speed 2496.22 samples/sec Loss 1.6897 LearningRate 0.000170 Epoch: 25 Global Step: 522170 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:59:26,095-Speed 2498.19 samples/sec Loss 1.6826 LearningRate 0.000170 Epoch: 25 Global Step: 522180 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:59:34,248-Speed 2512.45 samples/sec Loss 1.6694 LearningRate 0.000169 Epoch: 25 Global Step: 522190 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:59:42,453-Speed 2496.37 samples/sec Loss 1.7072 LearningRate 0.000169 Epoch: 25 Global Step: 522200 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:59:50,656-Speed 2496.95 samples/sec Loss 1.6956 LearningRate 0.000169 Epoch: 25 Global Step: 522210 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 13:59:58,855-Speed 2498.42 samples/sec Loss 1.6361 LearningRate 0.000169 Epoch: 25 Global Step: 522220 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:00:07,072-Speed 2492.75 samples/sec Loss 1.6736 LearningRate 0.000169 Epoch: 25 Global Step: 522230 Fp16 Grad Scale: 32768 Required: 70 hours 
Training: 2022-07-10 14:00:15,278-Speed 2496.06 samples/sec Loss 1.6823 LearningRate 0.000169 Epoch: 25 Global Step: 522240 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:00:23,425-Speed 2514.01 samples/sec Loss 1.6659 LearningRate 0.000169 Epoch: 25 Global Step: 522250 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:00:31,635-Speed 2495.04 samples/sec Loss 1.6580 LearningRate 0.000169 Epoch: 25 Global Step: 522260 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:00:39,795-Speed 2510.21 samples/sec Loss 1.6958 LearningRate 0.000169 Epoch: 25 Global Step: 522270 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:00:48,000-Speed 2496.33 samples/sec Loss 1.6721 LearningRate 0.000169 Epoch: 25 Global Step: 522280 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:00:56,200-Speed 2498.00 samples/sec Loss 1.7033 LearningRate 0.000169 Epoch: 25 Global Step: 522290 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:01:04,398-Speed 2498.30 samples/sec Loss 1.6764 LearningRate 0.000169 Epoch: 25 Global Step: 522300 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:01:12,545-Speed 2514.54 samples/sec Loss 1.6887 LearningRate 0.000169 Epoch: 25 Global Step: 522310 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:01:20,745-Speed 2497.98 samples/sec Loss 1.6547 LearningRate 0.000169 Epoch: 25 Global Step: 522320 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:01:28,948-Speed 2496.77 samples/sec Loss 1.6472 LearningRate 0.000169 Epoch: 25 Global Step: 522330 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:01:37,148-Speed 2498.11 samples/sec Loss 1.6721 LearningRate 0.000169 Epoch: 25 Global Step: 522340 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:01:45,347-Speed 2498.43 samples/sec Loss 1.6649 LearningRate 0.000169 Epoch: 25 Global Step: 522350 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:01:53,547-Speed 2497.70 samples/sec Loss 1.6999 LearningRate 0.000169 Epoch: 25 Global Step: 522360 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:02:01,695-Speed 2513.89 samples/sec Loss 1.6813 LearningRate 0.000169 Epoch: 25 Global Step: 522370 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:02:09,894-Speed 2498.72 samples/sec Loss 1.6760 LearningRate 0.000169 Epoch: 25 Global Step: 522380 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:02:18,094-Speed 2497.81 samples/sec Loss 1.6693 LearningRate 0.000169 Epoch: 25 Global Step: 522390 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:02:26,293-Speed 2498.30 samples/sec Loss 1.6875 LearningRate 0.000169 Epoch: 25 Global Step: 522400 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:02:34,490-Speed 2498.71 samples/sec Loss 1.6796 LearningRate 0.000169 Epoch: 25 Global Step: 522410 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:02:42,696-Speed 2496.35 samples/sec Loss 1.6794 LearningRate 0.000169 Epoch: 25 Global Step: 522420 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:02:50,841-Speed 2514.62 samples/sec Loss 1.6606 LearningRate 0.000169 Epoch: 25 Global Step: 522430 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:02:59,049-Speed 2495.42 samples/sec Loss 1.6879 LearningRate 0.000169 Epoch: 25 Global Step: 522440 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:03:07,244-Speed 2499.68 samples/sec Loss 1.6671 LearningRate 0.000169 Epoch: 25 Global Step: 522450 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:03:15,443-Speed 2498.29 samples/sec Loss 1.6906 LearningRate 0.000169 Epoch: 25 Global Step: 522460 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:03:23,648-Speed 2496.37 samples/sec Loss 1.6622 LearningRate 0.000169 Epoch: 25 Global Step: 522470 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:03:31,856-Speed 2495.28 samples/sec Loss 1.6667 LearningRate 0.000169 Epoch: 25 Global Step: 522480 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:03:40,000-Speed 2515.26 samples/sec Loss 1.6915 LearningRate 0.000169 Epoch: 25 Global Step: 522490 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:03:48,200-Speed 2498.16 samples/sec Loss 1.6543 LearningRate 0.000169 Epoch: 25 Global Step: 522500 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:03:56,409-Speed 2495.13 samples/sec Loss 1.6760 LearningRate 0.000169 Epoch: 25 Global Step: 522510 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:04:04,608-Speed 2498.12 samples/sec Loss 1.6519 LearningRate 0.000169 Epoch: 25 Global Step: 522520 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:04:12,804-Speed 2499.10 samples/sec Loss 1.6788 LearningRate 0.000169 Epoch: 25 Global Step: 522530 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:04:21,005-Speed 2497.43 samples/sec Loss 1.6639 LearningRate 0.000169 Epoch: 25 Global Step: 522540 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:04:29,152-Speed 2514.58 samples/sec Loss 1.6798 LearningRate 0.000169 Epoch: 25 Global Step: 522550 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:04:37,349-Speed 2498.85 samples/sec Loss 1.6664 LearningRate 0.000169 Epoch: 25 Global Step: 522560 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:04:45,547-Speed 2498.38 samples/sec Loss 1.6851 LearningRate 0.000169 Epoch: 25 Global Step: 522570 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:04:53,753-Speed 2496.08 samples/sec Loss 1.6465 LearningRate 0.000169 Epoch: 25 Global Step: 522580 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:05:01,952-Speed 2498.55 samples/sec Loss 1.6671 LearningRate 0.000169 Epoch: 25 Global Step: 522590 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:05:10,152-Speed 2497.67 samples/sec Loss 1.7281 LearningRate 0.000169 Epoch: 25 Global Step: 522600 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:05:18,296-Speed 2515.14 samples/sec Loss 1.7063 LearningRate 0.000169 Epoch: 25 Global Step: 522610 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:05:26,506-Speed 2495.19 samples/sec Loss 1.6456 LearningRate 0.000169 Epoch: 25 Global Step: 522620 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:05:34,712-Speed 2496.39 samples/sec Loss 1.7293 LearningRate 0.000169 Epoch: 25 Global Step: 522630 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:05:42,911-Speed 2498.36 samples/sec Loss 1.6949 LearningRate 0.000169 Epoch: 25 Global Step: 522640 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:05:51,110-Speed 2498.16 samples/sec Loss 1.6883 LearningRate 0.000169 Epoch: 25 Global Step: 522650 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:05:59,311-Speed 2497.57 samples/sec Loss 1.6949 LearningRate 0.000169 Epoch: 25 Global Step: 522660 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:06:07,465-Speed 2511.94 samples/sec Loss 1.6787 LearningRate 0.000169 Epoch: 25 Global Step: 522670 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:06:15,666-Speed 2497.72 samples/sec Loss 1.6722 LearningRate 0.000169 Epoch: 25 Global Step: 522680 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:06:23,865-Speed 2498.35 samples/sec Loss 1.6699 LearningRate 0.000169 Epoch: 25 Global Step: 522690 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:06:32,068-Speed 2497.15 samples/sec Loss 1.6964 LearningRate 0.000169 Epoch: 25 Global Step: 522700 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:06:40,269-Speed 2497.47 samples/sec Loss 1.6598 LearningRate 0.000169 Epoch: 25 Global Step: 522710 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:06:48,486-Speed 2492.98 samples/sec Loss 1.7238 LearningRate 0.000169 Epoch: 25 Global Step: 522720 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:06:56,644-Speed 2510.81 samples/sec Loss 1.6899 LearningRate 0.000169 Epoch: 25 Global Step: 522730 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:07:04,847-Speed 2497.11 samples/sec Loss 1.6953 LearningRate 0.000169 Epoch: 25 Global Step: 522740 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:07:13,046-Speed 2498.16 samples/sec Loss 1.6450 LearningRate 0.000169 Epoch: 25 Global Step: 522750 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:07:21,256-Speed 2494.97 samples/sec Loss 1.6877 LearningRate 0.000169 Epoch: 25 Global Step: 522760 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:07:29,459-Speed 2497.07 samples/sec Loss 1.6692 LearningRate 0.000169 Epoch: 25 Global Step: 522770 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:07:37,662-Speed 2497.18 samples/sec Loss 1.6781 LearningRate 0.000169 Epoch: 25 Global Step: 522780 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:07:45,812-Speed 2513.14 samples/sec Loss 1.6728 LearningRate 0.000169 Epoch: 25 Global Step: 522790 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:07:54,027-Speed 2493.36 samples/sec Loss 1.6779 LearningRate 0.000169 Epoch: 25 Global Step: 522800 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:08:02,233-Speed 2496.17 samples/sec Loss 1.7199 LearningRate 0.000169 Epoch: 25 Global Step: 522810 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:08:10,434-Speed 2497.62 samples/sec Loss 1.6583 LearningRate 0.000169 Epoch: 25 Global Step: 522820 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:08:18,642-Speed 2495.78 samples/sec Loss 1.7170 LearningRate 0.000169 Epoch: 25 Global Step: 522830 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:08:26,844-Speed 2497.01 samples/sec Loss 1.6978 LearningRate 0.000169 Epoch: 25 Global Step: 522840 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:08:35,004-Speed 2510.16 samples/sec Loss 1.6967 LearningRate 0.000169 Epoch: 25 Global Step: 522850 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:08:43,207-Speed 2497.08 samples/sec Loss 1.6603 LearningRate 0.000169 Epoch: 25 Global Step: 522860 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:08:51,411-Speed 2496.68 samples/sec Loss 1.6545 LearningRate 0.000169 Epoch: 25 Global Step: 522870 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:08:59,614-Speed 2497.18 samples/sec Loss 1.6895 LearningRate 0.000169 Epoch: 25 Global Step: 522880 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:09:07,818-Speed 2496.70 samples/sec Loss 1.6657 LearningRate 0.000169 Epoch: 25 Global Step: 522890 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:09:16,035-Speed 2492.95 samples/sec Loss 1.6769 LearningRate 0.000169 Epoch: 25 Global Step: 522900 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:09:24,179-Speed 2514.80 samples/sec Loss 1.6881 LearningRate 0.000169 Epoch: 25 Global Step: 522910 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:09:32,382-Speed 2497.11 samples/sec Loss 1.6988 LearningRate 0.000169 Epoch: 25 Global Step: 522920 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:09:40,582-Speed 2498.05 samples/sec Loss 1.7121 LearningRate 0.000169 Epoch: 25 Global Step: 522930 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:09:48,779-Speed 2498.90 samples/sec Loss 1.6917 LearningRate 0.000169 Epoch: 25 Global Step: 522940 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:09:56,984-Speed 2496.46 samples/sec Loss 1.6995 LearningRate 0.000169 Epoch: 25 Global Step: 522950 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:10:05,181-Speed 2498.86 samples/sec Loss 1.6722 LearningRate 0.000169 Epoch: 25 Global Step: 522960 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:10:13,326-Speed 2514.98 samples/sec Loss 1.6782 LearningRate 0.000169 Epoch: 25 Global Step: 522970 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:10:21,528-Speed 2497.40 samples/sec Loss 1.6965 LearningRate 0.000169 Epoch: 25 Global Step: 522980 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:10:29,727-Speed 2498.43 samples/sec Loss 1.6983 LearningRate 0.000169 Epoch: 25 Global Step: 522990 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:10:37,940-Speed 2493.93 samples/sec Loss 1.6473 LearningRate 0.000169 Epoch: 25 Global Step: 523000 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:10:46,140-Speed 2498.17 samples/sec Loss 1.6898 LearningRate 0.000169 Epoch: 25 Global Step: 523010 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:10:54,341-Speed 2497.48 samples/sec Loss 1.6824 LearningRate 0.000169 Epoch: 25 Global Step: 523020 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:11:02,498-Speed 2511.30 samples/sec Loss 1.6852 LearningRate 0.000169 Epoch: 25 Global Step: 523030 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:11:10,704-Speed 2496.02 samples/sec Loss 1.6694 LearningRate 0.000169 Epoch: 25 Global Step: 523040 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:11:18,908-Speed 2496.88 samples/sec Loss 1.6911 LearningRate 0.000169 Epoch: 25 Global Step: 523050 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:11:27,106-Speed 2498.40 samples/sec Loss 1.6813 LearningRate 0.000169 Epoch: 25 Global Step: 523060 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:11:35,306-Speed 2498.07 samples/sec Loss 1.6827 LearningRate 0.000169 Epoch: 25 Global Step: 523070 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:11:43,505-Speed 2498.17 samples/sec Loss 1.7071 LearningRate 0.000169 Epoch: 25 Global Step: 523080 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:11:51,650-Speed 2514.87 samples/sec Loss 1.6652 LearningRate 0.000168 Epoch: 25 Global Step: 523090 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:11:59,848-Speed 2498.39 samples/sec Loss 1.6776 LearningRate 0.000168 Epoch: 25 Global Step: 523100 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:12:08,045-Speed 2498.87 samples/sec Loss 1.6678 LearningRate 0.000168 Epoch: 25 Global Step: 523110 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:12:16,249-Speed 2496.99 samples/sec Loss 1.6520 LearningRate 0.000168 Epoch: 25 Global Step: 523120 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:12:24,451-Speed 2497.28 samples/sec Loss 1.6676 LearningRate 0.000168 Epoch: 25 Global Step: 523130 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:12:32,655-Speed 2496.63 samples/sec Loss 1.7281 LearningRate 0.000168 Epoch: 25 Global Step: 523140 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:12:40,805-Speed 2513.15 samples/sec Loss 1.6548 LearningRate 0.000168 Epoch: 25 Global Step: 523150 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:12:49,009-Speed 2496.83 samples/sec Loss 1.6674 LearningRate 0.000168 Epoch: 25 Global Step: 523160 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:12:57,217-Speed 2495.43 samples/sec Loss 1.6609 LearningRate 0.000168 Epoch: 25 Global Step: 523170 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:13:05,415-Speed 2498.50 samples/sec Loss 1.6754 LearningRate 0.000168 Epoch: 25 Global Step: 523180 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:13:13,618-Speed 2496.90 samples/sec Loss 1.7077 LearningRate 0.000168 Epoch: 25 Global Step: 523190 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:13:21,823-Speed 2496.90 samples/sec Loss 1.7022 LearningRate 0.000168 Epoch: 25 Global Step: 523200 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:13:29,974-Speed 2512.77 samples/sec Loss 1.7000 LearningRate 0.000168 Epoch: 25 Global Step: 523210 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:13:38,176-Speed 2497.28 samples/sec Loss 1.6405 LearningRate 0.000168 Epoch: 25 Global Step: 523220 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:13:46,376-Speed 2498.04 samples/sec Loss 1.6771 LearningRate 0.000168 Epoch: 25 Global Step: 523230 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:13:54,585-Speed 2495.35 samples/sec Loss 1.6578 LearningRate 0.000168 Epoch: 25 Global Step: 523240 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:14:02,788-Speed 2496.80 samples/sec Loss 1.6995 LearningRate 0.000168 Epoch: 25 Global Step: 523250 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:14:10,992-Speed 2496.71 samples/sec Loss 1.6928 LearningRate 0.000168 Epoch: 25 Global Step: 523260 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:14:19,142-Speed 2513.38 samples/sec Loss 1.6349 LearningRate 0.000168 Epoch: 25 Global Step: 523270 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:14:27,373-Speed 2488.75 samples/sec Loss 1.6557 LearningRate 0.000168 Epoch: 25 Global Step: 523280 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:14:35,578-Speed 2496.40 samples/sec Loss 1.6755 LearningRate 0.000168 Epoch: 25 Global Step: 523290 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:14:43,804-Speed 2498.55 samples/sec Loss 1.6526 LearningRate 0.000168 Epoch: 25 Global Step: 523300 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:14:52,008-Speed 2496.78 samples/sec Loss 1.6582 LearningRate 0.000168 Epoch: 25 Global Step: 523310 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:15:00,246-Speed 2498.18 samples/sec Loss 1.6232 LearningRate 0.000168 Epoch: 25 Global Step: 523320 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:15:08,440-Speed 2515.27 samples/sec Loss 1.6372 LearningRate 0.000168 Epoch: 25 Global Step: 523330 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:15:16,643-Speed 2497.05 samples/sec Loss 1.6593 LearningRate 0.000168 Epoch: 25 Global Step: 523340 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:15:24,884-Speed 2498.01 samples/sec Loss 1.6626 LearningRate 0.000168 Epoch: 25 Global Step: 523350 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:15:33,125-Speed 2499.13 samples/sec Loss 1.6797 LearningRate 0.000168 Epoch: 25 Global Step: 523360 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:15:42,445-Speed 2197.58 samples/sec Loss 1.6449 LearningRate 0.000168 Epoch: 25 Global Step: 523370 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:15:50,705-Speed 2499.18 samples/sec Loss 1.6617 LearningRate 0.000168 Epoch: 25 Global Step: 523380 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:15:58,897-Speed 2515.24 samples/sec Loss 1.6977 LearningRate 0.000168 Epoch: 25 Global Step: 523390 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:16:12,318-Speed 1529.44 samples/sec Loss 1.7015 LearningRate 0.000168 Epoch: 25 Global Step: 523400 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:16:20,571-Speed 2500.05 samples/sec Loss 1.6824 LearningRate 0.000168 Epoch: 25 Global Step: 523410 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:16:28,843-Speed 2492.67 samples/sec Loss 1.6972 LearningRate 0.000168 Epoch: 25 Global Step: 523420 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:16:39,971-Speed 1847.51 samples/sec Loss 1.7040 LearningRate 0.000168 Epoch: 25 Global Step: 523430 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:16:48,196-Speed 2490.39 samples/sec Loss 1.6594 LearningRate 0.000168 Epoch: 25 Global Step: 523440 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:16:56,357-Speed 2509.76 samples/sec Loss 1.7003 LearningRate 0.000168 Epoch: 25 Global Step: 523450 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:17:10,594-Speed 2491.19 samples/sec Loss 1.6542 LearningRate 0.000168 Epoch: 25 Global Step: 523460 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:17:18,900-Speed 2492.21 samples/sec Loss 1.7008 LearningRate 0.000168 Epoch: 25 Global Step: 523470 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:17:27,869-Speed 2283.66 samples/sec Loss 1.6981 LearningRate 0.000168 Epoch: 25 Global Step: 523480 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:17:37,935-Speed 2497.55 samples/sec Loss 1.6982 LearningRate 0.000168 Epoch: 25 Global Step: 523490 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:17:46,194-Speed 2488.60 samples/sec Loss 1.7033 LearningRate 0.000168 Epoch: 25 Global Step: 523500 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:17:58,016-Speed 1732.51 samples/sec Loss 1.6888 LearningRate 0.000168 Epoch: 25 Global Step: 523510 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:18:07,144-Speed 2252.88 samples/sec Loss 1.6439 LearningRate 0.000168 Epoch: 25 Global Step: 523520 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:18:15,383-Speed 2501.56 samples/sec Loss 1.6734 LearningRate 0.000168 Epoch: 25 Global Step: 523530 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:18:23,579-Speed 2499.08 samples/sec Loss 1.6698 LearningRate 0.000168 Epoch: 25 Global Step: 523540 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:18:31,785-Speed 2496.45 samples/sec Loss 1.6715 LearningRate 0.000168 Epoch: 25 Global Step: 523550 Fp16 Grad Scale: 32768 Required: 70 hours 
Training: 2022-07-10 14:18:39,991-Speed 2496.00 samples/sec Loss 1.6654 LearningRate 0.000168 Epoch: 25 Global Step: 523560 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:18:48,140-Speed 2513.69 samples/sec Loss 1.6963 LearningRate 0.000168 Epoch: 25 Global Step: 523570 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:18:56,339-Speed 2498.20 samples/sec Loss 1.6415 LearningRate 0.000168 Epoch: 25 Global Step: 523580 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:19:04,539-Speed 2498.26 samples/sec Loss 1.6548 LearningRate 0.000168 Epoch: 25 Global Step: 523590 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:19:12,745-Speed 2496.14 samples/sec Loss 1.6622 LearningRate 0.000168 Epoch: 25 Global Step: 523600 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:19:20,950-Speed 2496.41 samples/sec Loss 1.7182 LearningRate 0.000168 Epoch: 25 Global Step: 523610 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:19:29,155-Speed 2496.53 samples/sec Loss 1.7019 LearningRate 0.000168 Epoch: 25 Global Step: 523620 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:19:37,329-Speed 2505.87 samples/sec Loss 1.6864 LearningRate 0.000168 Epoch: 25 Global Step: 523630 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:19:45,534-Speed 2496.63 samples/sec Loss 1.6853 LearningRate 0.000168 Epoch: 25 Global Step: 523640 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:19:53,746-Speed 2494.25 samples/sec Loss 1.6461 LearningRate 0.000168 Epoch: 25 Global Step: 523650 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:20:01,952-Speed 2496.03 samples/sec Loss 1.6778 LearningRate 0.000168 Epoch: 25 Global Step: 523660 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:20:10,154-Speed 2497.47 samples/sec Loss 1.6554 LearningRate 0.000168 Epoch: 25 Global Step: 523670 Fp16 Grad Scale: 32768 Required: 70 hours 
Training: 2022-07-10 14:20:18,360-Speed 2495.96 samples/sec Loss 1.6624 LearningRate 0.000168 Epoch: 25 Global Step: 523680 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:20:26,521-Speed 2509.73 samples/sec Loss 1.6471 LearningRate 0.000168 Epoch: 25 Global Step: 523690 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:20:34,733-Speed 2494.45 samples/sec Loss 1.6681 LearningRate 0.000168 Epoch: 25 Global Step: 523700 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:20:42,945-Speed 2494.26 samples/sec Loss 1.6849 LearningRate 0.000168 Epoch: 25 Global Step: 523710 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:20:51,152-Speed 2495.80 samples/sec Loss 1.6383 LearningRate 0.000168 Epoch: 25 Global Step: 523720 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:20:59,354-Speed 2497.47 samples/sec Loss 1.6572 LearningRate 0.000168 Epoch: 25 Global Step: 523730 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:21:07,553-Speed 2498.06 samples/sec Loss 1.6664 LearningRate 0.000168 Epoch: 25 Global Step: 523740 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:21:15,702-Speed 2513.62 samples/sec Loss 1.6567 LearningRate 0.000168 Epoch: 25 Global Step: 523750 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:21:23,903-Speed 2497.66 samples/sec Loss 1.6877 LearningRate 0.000168 Epoch: 25 Global Step: 523760 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:21:32,062-Speed 2510.21 samples/sec Loss 1.6832 LearningRate 0.000168 Epoch: 25 Global Step: 523770 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:21:40,266-Speed 2497.00 samples/sec Loss 1.6705 LearningRate 0.000168 Epoch: 25 Global Step: 523780 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:21:48,471-Speed 2496.30 samples/sec Loss 1.6739 LearningRate 0.000168 Epoch: 25 Global Step: 523790 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:21:56,679-Speed 2495.47 samples/sec Loss 1.6787 LearningRate 0.000168 Epoch: 25 Global Step: 523800 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:22:04,824-Speed 2514.73 samples/sec Loss 1.6609 LearningRate 0.000168 Epoch: 25 Global Step: 523810 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:22:13,035-Speed 2494.99 samples/sec Loss 1.6619 LearningRate 0.000168 Epoch: 25 Global Step: 523820 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:22:21,236-Speed 2497.24 samples/sec Loss 1.6336 LearningRate 0.000168 Epoch: 25 Global Step: 523830 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:22:29,446-Speed 2495.17 samples/sec Loss 1.6566 LearningRate 0.000168 Epoch: 25 Global Step: 523840 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:22:37,650-Speed 2496.80 samples/sec Loss 1.6994 LearningRate 0.000168 Epoch: 25 Global Step: 523850 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:22:45,868-Speed 2492.48 samples/sec Loss 1.6330 LearningRate 0.000168 Epoch: 25 Global Step: 523860 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:22:54,019-Speed 2513.00 samples/sec Loss 1.6617 LearningRate 0.000168 Epoch: 25 Global Step: 523870 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:23:02,222-Speed 2496.94 samples/sec Loss 1.6546 LearningRate 0.000168 Epoch: 25 Global Step: 523880 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:23:10,444-Speed 2491.46 samples/sec Loss 1.6878 LearningRate 0.000168 Epoch: 25 Global Step: 523890 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:23:18,644-Speed 2497.71 samples/sec Loss 1.6536 LearningRate 0.000168 Epoch: 25 Global Step: 523900 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:23:26,848-Speed 2496.76 samples/sec Loss 1.6696 LearningRate 0.000168 Epoch: 25 Global Step: 523910 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:23:35,050-Speed 2497.47 samples/sec Loss 1.6430 LearningRate 0.000168 Epoch: 25 Global Step: 523920 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:23:43,200-Speed 2513.34 samples/sec Loss 1.6762 LearningRate 0.000168 Epoch: 25 Global Step: 523930 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:23:51,399-Speed 2498.23 samples/sec Loss 1.6469 LearningRate 0.000168 Epoch: 25 Global Step: 523940 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:23:59,596-Speed 2499.17 samples/sec Loss 1.6630 LearningRate 0.000168 Epoch: 25 Global Step: 523950 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:24:07,803-Speed 2495.64 samples/sec Loss 1.6257 LearningRate 0.000168 Epoch: 25 Global Step: 523960 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:24:16,017-Speed 2494.09 samples/sec Loss 1.6087 LearningRate 0.000168 Epoch: 25 Global Step: 523970 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:24:24,222-Speed 2496.28 samples/sec Loss 1.7074 LearningRate 0.000168 Epoch: 25 Global Step: 523980 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:24:32,369-Speed 2514.22 samples/sec Loss 1.6744 LearningRate 0.000168 Epoch: 25 Global Step: 523990 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:24:40,570-Speed 2497.53 samples/sec Loss 1.6767 LearningRate 0.000168 Epoch: 25 Global Step: 524000 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:24:48,772-Speed 2497.63 samples/sec Loss 1.6614 LearningRate 0.000167 Epoch: 25 Global Step: 524010 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:24:56,971-Speed 2498.14 samples/sec Loss 1.6758 LearningRate 0.000167 Epoch: 25 Global Step: 524020 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:25:05,171-Speed 2497.81 samples/sec Loss 1.6962 LearningRate 0.000167 Epoch: 25 Global Step: 524030 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:25:13,376-Speed 2496.68 samples/sec Loss 1.6304 LearningRate 0.000167 Epoch: 25 Global Step: 524040 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:25:21,528-Speed 2512.66 samples/sec Loss 1.6618 LearningRate 0.000167 Epoch: 25 Global Step: 524050 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:25:29,745-Speed 2492.78 samples/sec Loss 1.6689 LearningRate 0.000167 Epoch: 25 Global Step: 524060 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:25:37,962-Speed 2492.58 samples/sec Loss 1.6565 LearningRate 0.000167 Epoch: 25 Global Step: 524070 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:25:46,165-Speed 2497.53 samples/sec Loss 1.6638 LearningRate 0.000167 Epoch: 25 Global Step: 524080 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:25:54,366-Speed 2497.93 samples/sec Loss 1.6682 LearningRate 0.000167 Epoch: 25 Global Step: 524090 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:26:02,571-Speed 2496.31 samples/sec Loss 1.6909 LearningRate 0.000167 Epoch: 25 Global Step: 524100 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:26:10,717-Speed 2514.53 samples/sec Loss 1.6456 LearningRate 0.000167 Epoch: 25 Global Step: 524110 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:26:18,919-Speed 2497.30 samples/sec Loss 1.6839 LearningRate 0.000167 Epoch: 25 Global Step: 524120 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:26:27,118-Speed 2498.71 samples/sec Loss 1.6224 LearningRate 0.000167 Epoch: 25 Global Step: 524130 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:26:35,319-Speed 2497.58 samples/sec Loss 1.6510 LearningRate 0.000167 Epoch: 25 Global Step: 524140 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:26:43,519-Speed 2497.66 samples/sec Loss 1.6636 LearningRate 0.000167 Epoch: 25 Global Step: 524150 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:26:51,727-Speed 2495.45 samples/sec Loss 1.6538 LearningRate 0.000167 Epoch: 25 Global Step: 524160 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:26:59,874-Speed 2514.48 samples/sec Loss 1.6646 LearningRate 0.000167 Epoch: 25 Global Step: 524170 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:27:08,083-Speed 2495.25 samples/sec Loss 1.6894 LearningRate 0.000167 Epoch: 25 Global Step: 524180 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:27:16,286-Speed 2497.34 samples/sec Loss 1.7109 LearningRate 0.000167 Epoch: 25 Global Step: 524190 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:27:24,496-Speed 2495.01 samples/sec Loss 1.7209 LearningRate 0.000167 Epoch: 25 Global Step: 524200 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:27:32,697-Speed 2497.77 samples/sec Loss 1.6635 LearningRate 0.000167 Epoch: 25 Global Step: 524210 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:27:40,897-Speed 2497.95 samples/sec Loss 1.7129 LearningRate 0.000167 Epoch: 25 Global Step: 524220 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:27:49,047-Speed 2513.42 samples/sec Loss 1.6339 LearningRate 0.000167 Epoch: 25 Global Step: 524230 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:27:57,246-Speed 2498.29 samples/sec Loss 1.6484 LearningRate 0.000167 Epoch: 25 Global Step: 524240 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:28:05,446-Speed 2498.21 samples/sec Loss 1.6577 LearningRate 0.000167 Epoch: 25 Global Step: 524250 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:28:13,660-Speed 2493.57 samples/sec Loss 1.6548 LearningRate 0.000167 Epoch: 25 Global Step: 524260 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:28:21,853-Speed 2499.91 samples/sec Loss 1.6451 LearningRate 0.000167 Epoch: 25 Global Step: 524270 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:28:30,053-Speed 2498.41 samples/sec Loss 1.6937 LearningRate 0.000167 Epoch: 25 Global Step: 524280 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:28:38,206-Speed 2512.29 samples/sec Loss 1.6250 LearningRate 0.000167 Epoch: 25 Global Step: 524290 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:28:46,410-Speed 2496.80 samples/sec Loss 1.6637 LearningRate 0.000167 Epoch: 25 Global Step: 524300 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:28:54,622-Speed 2494.22 samples/sec Loss 1.6579 LearningRate 0.000167 Epoch: 25 Global Step: 524310 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:29:02,821-Speed 2498.28 samples/sec Loss 1.6717 LearningRate 0.000167 Epoch: 25 Global Step: 524320 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:29:11,023-Speed 2497.32 samples/sec Loss 1.6738 LearningRate 0.000167 Epoch: 25 Global Step: 524330 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:29:19,223-Speed 2498.25 samples/sec Loss 1.6454 LearningRate 0.000167 Epoch: 25 Global Step: 524340 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:29:27,376-Speed 2512.51 samples/sec Loss 1.6415 LearningRate 0.000167 Epoch: 25 Global Step: 524350 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:29:35,576-Speed 2497.80 samples/sec Loss 1.6775 LearningRate 0.000167 Epoch: 25 Global Step: 524360 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:29:43,775-Speed 2498.17 samples/sec Loss 1.6547 LearningRate 0.000167 Epoch: 25 Global Step: 524370 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:29:51,986-Speed 2494.61 samples/sec Loss 1.6987 LearningRate 0.000167 Epoch: 25 Global Step: 524380 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:30:00,186-Speed 2498.05 samples/sec Loss 1.6931 LearningRate 0.000167 Epoch: 25 Global Step: 524390 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:30:08,389-Speed 2497.16 samples/sec Loss 1.6879 LearningRate 0.000167 Epoch: 25 Global Step: 524400 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:30:16,538-Speed 2513.63 samples/sec Loss 1.7079 LearningRate 0.000167 Epoch: 25 Global Step: 524410 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:30:24,737-Speed 2498.03 samples/sec Loss 1.6894 LearningRate 0.000167 Epoch: 25 Global Step: 524420 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:30:32,940-Speed 2497.07 samples/sec Loss 1.6750 LearningRate 0.000167 Epoch: 25 Global Step: 524430 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:30:41,144-Speed 2496.95 samples/sec Loss 1.7021 LearningRate 0.000167 Epoch: 25 Global Step: 524440 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:30:49,342-Speed 2498.85 samples/sec Loss 1.6634 LearningRate 0.000167 Epoch: 25 Global Step: 524450 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:30:57,540-Speed 2498.42 samples/sec Loss 1.6701 LearningRate 0.000167 Epoch: 25 Global Step: 524460 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:31:05,691-Speed 2513.02 samples/sec Loss 1.6826 LearningRate 0.000167 Epoch: 25 Global Step: 524470 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:31:13,892-Speed 2497.52 samples/sec Loss 1.6723 LearningRate 0.000167 Epoch: 25 Global Step: 524480 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:31:22,102-Speed 2495.15 samples/sec Loss 1.6906 LearningRate 0.000167 Epoch: 25 Global Step: 524490 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:31:30,308-Speed 2496.44 samples/sec Loss 1.6391 LearningRate 0.000167 Epoch: 25 Global Step: 524500 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:31:38,509-Speed 2497.38 samples/sec Loss 1.6963 LearningRate 0.000167 Epoch: 25 Global Step: 524510 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:31:46,709-Speed 2498.12 samples/sec Loss 1.7010 LearningRate 0.000167 Epoch: 25 Global Step: 524520 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:31:54,860-Speed 2513.25 samples/sec Loss 1.6808 LearningRate 0.000167 Epoch: 25 Global Step: 524530 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:32:03,061-Speed 2497.51 samples/sec Loss 1.6947 LearningRate 0.000167 Epoch: 25 Global Step: 524540 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:32:11,260-Speed 2498.32 samples/sec Loss 1.6706 LearningRate 0.000167 Epoch: 25 Global Step: 524550 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:32:19,458-Speed 2498.41 samples/sec Loss 1.6957 LearningRate 0.000167 Epoch: 25 Global Step: 524560 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:32:27,661-Speed 2497.35 samples/sec Loss 1.6626 LearningRate 0.000167 Epoch: 25 Global Step: 524570 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:32:35,867-Speed 2496.12 samples/sec Loss 1.6806 LearningRate 0.000167 Epoch: 25 Global Step: 524580 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:32:44,014-Speed 2513.97 samples/sec Loss 1.7164 LearningRate 0.000167 Epoch: 25 Global Step: 524590 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:32:52,213-Speed 2498.79 samples/sec Loss 1.6688 LearningRate 0.000167 Epoch: 25 Global Step: 524600 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:33:00,415-Speed 2497.18 samples/sec Loss 1.6570 LearningRate 0.000167 Epoch: 25 Global Step: 524610 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:33:08,631-Speed 2493.02 samples/sec Loss 1.7028 LearningRate 0.000167 Epoch: 25 Global Step: 524620 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:33:16,841-Speed 2495.22 samples/sec Loss 1.6918 LearningRate 0.000167 Epoch: 25 Global Step: 524630 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:33:25,043-Speed 2497.59 samples/sec Loss 1.6984 LearningRate 0.000167 Epoch: 25 Global Step: 524640 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:33:33,193-Speed 2513.25 samples/sec Loss 1.6981 LearningRate 0.000167 Epoch: 25 Global Step: 524650 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:33:41,395-Speed 2497.91 samples/sec Loss 1.6945 LearningRate 0.000167 Epoch: 25 Global Step: 524660 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:33:49,592-Speed 2498.90 samples/sec Loss 1.6671 LearningRate 0.000167 Epoch: 25 Global Step: 524670 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:33:57,804-Speed 2494.55 samples/sec Loss 1.6868 LearningRate 0.000167 Epoch: 25 Global Step: 524680 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:34:06,006-Speed 2497.17 samples/sec Loss 1.6490 LearningRate 0.000167 Epoch: 25 Global Step: 524690 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:34:14,208-Speed 2497.41 samples/sec Loss 1.6764 LearningRate 0.000167 Epoch: 25 Global Step: 524700 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:34:22,355-Speed 2514.36 samples/sec Loss 1.6670 LearningRate 0.000167 Epoch: 25 Global Step: 524710 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:34:30,559-Speed 2496.61 samples/sec Loss 1.6782 LearningRate 0.000167 Epoch: 25 Global Step: 524720 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:34:38,763-Speed 2496.71 samples/sec Loss 1.6656 LearningRate 0.000167 Epoch: 25 Global Step: 524730 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:34:46,962-Speed 2498.43 samples/sec Loss 1.6799 LearningRate 0.000167 Epoch: 25 Global Step: 524740 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:34:55,160-Speed 2498.63 samples/sec Loss 1.6794 LearningRate 0.000167 Epoch: 25 Global Step: 524750 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:35:03,361-Speed 2497.69 samples/sec Loss 1.6694 LearningRate 0.000167 Epoch: 25 Global Step: 524760 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:35:11,510-Speed 2513.60 samples/sec Loss 1.6761 LearningRate 0.000167 Epoch: 25 Global Step: 524770 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:35:19,708-Speed 2498.48 samples/sec Loss 1.6703 LearningRate 0.000167 Epoch: 25 Global Step: 524780 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:35:27,907-Speed 2498.44 samples/sec Loss 1.6524 LearningRate 0.000167 Epoch: 25 Global Step: 524790 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:35:36,109-Speed 2497.56 samples/sec Loss 1.6699 LearningRate 0.000167 Epoch: 25 Global Step: 524800 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:35:44,328-Speed 2492.21 samples/sec Loss 1.7320 LearningRate 0.000167 Epoch: 25 Global Step: 524810 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:35:52,530-Speed 2497.46 samples/sec Loss 1.6598 LearningRate 0.000167 Epoch: 25 Global Step: 524820 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:36:00,673-Speed 2515.23 samples/sec Loss 1.6862 LearningRate 0.000167 Epoch: 25 Global Step: 524830 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:36:08,883-Speed 2494.97 samples/sec Loss 1.6734 LearningRate 0.000167 Epoch: 25 Global Step: 524840 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:36:17,082-Speed 2498.29 samples/sec Loss 1.6413 LearningRate 0.000167 Epoch: 25 Global Step: 524850 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:36:25,287-Speed 2496.30 samples/sec Loss 1.6726 LearningRate 0.000167 Epoch: 25 Global Step: 524860 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:36:33,490-Speed 2497.28 samples/sec Loss 1.6804 LearningRate 0.000167 Epoch: 25 Global Step: 524870 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:36:41,690-Speed 2497.98 samples/sec Loss 1.6900 LearningRate 0.000167 Epoch: 25 Global Step: 524880 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:36:49,843-Speed 2512.38 samples/sec Loss 1.6429 LearningRate 0.000167 Epoch: 25 Global Step: 524890 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:36:58,045-Speed 2497.39 samples/sec Loss 1.6254 LearningRate 0.000167 Epoch: 25 Global Step: 524900 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:37:06,245-Speed 2497.90 samples/sec Loss 1.6410 LearningRate 0.000167 Epoch: 25 Global Step: 524910 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:37:14,444-Speed 2498.48 samples/sec Loss 1.6526 LearningRate 0.000166 Epoch: 25 Global Step: 524920 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:37:22,644-Speed 2497.82 samples/sec Loss 1.6722 LearningRate 0.000166 Epoch: 25 Global Step: 524930 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:37:30,844-Speed 2497.86 samples/sec Loss 1.6509 LearningRate 0.000166 Epoch: 25 Global Step: 524940 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:37:38,992-Speed 2514.10 samples/sec Loss 1.6454 LearningRate 0.000166 Epoch: 25 Global Step: 524950 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:37:47,192-Speed 2497.86 samples/sec Loss 1.6374 LearningRate 0.000166 Epoch: 25 Global Step: 524960 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:37:55,394-Speed 2497.55 samples/sec Loss 1.6437 LearningRate 0.000166 Epoch: 25 Global Step: 524970 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:38:03,597-Speed 2497.09 samples/sec Loss 1.6683 LearningRate 0.000166 Epoch: 25 Global Step: 524980 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:38:11,800-Speed 2497.49 samples/sec Loss 1.6768 LearningRate 0.000166 Epoch: 25 Global Step: 524990 Fp16 Grad Scale: 32768 Required: 70 hours 
Training: 2022-07-10 14:38:20,005-Speed 2496.19 samples/sec Loss 1.6960 LearningRate 0.000166 Epoch: 25 Global Step: 525000 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:38:28,152-Speed 2514.20 samples/sec Loss 1.6562 LearningRate 0.000166 Epoch: 25 Global Step: 525010 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:38:36,353-Speed 2497.81 samples/sec Loss 1.6355 LearningRate 0.000166 Epoch: 25 Global Step: 525020 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:38:44,553-Speed 2498.02 samples/sec Loss 1.6411 LearningRate 0.000166 Epoch: 25 Global Step: 525030 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:38:52,752-Speed 2498.17 samples/sec Loss 1.6539 LearningRate 0.000166 Epoch: 25 Global Step: 525040 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:39:00,957-Speed 2496.74 samples/sec Loss 1.7099 LearningRate 0.000166 Epoch: 25 Global Step: 525050 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:39:09,158-Speed 2497.82 samples/sec Loss 1.6657 LearningRate 0.000166 Epoch: 25 Global Step: 525060 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:39:17,308-Speed 2513.16 samples/sec Loss 1.7116 LearningRate 0.000166 Epoch: 25 Global Step: 525070 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:39:25,509-Speed 2497.73 samples/sec Loss 1.6515 LearningRate 0.000166 Epoch: 25 Global Step: 525080 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:39:33,716-Speed 2495.77 samples/sec Loss 1.6785 LearningRate 0.000166 Epoch: 25 Global Step: 525090 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:39:41,922-Speed 2496.24 samples/sec Loss 1.7019 LearningRate 0.000166 Epoch: 25 Global Step: 525100 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:39:50,127-Speed 2496.48 samples/sec Loss 1.6885 LearningRate 0.000166 Epoch: 25 Global Step: 525110 Fp16 Grad Scale: 32768 Required: 70 hours 
Training: 2022-07-10 14:39:58,332-Speed 2496.56 samples/sec Loss 1.7163 LearningRate 0.000166 Epoch: 25 Global Step: 525120 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:40:06,485-Speed 2512.38 samples/sec Loss 1.6572 LearningRate 0.000166 Epoch: 25 Global Step: 525130 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:40:14,685-Speed 2498.12 samples/sec Loss 1.6926 LearningRate 0.000166 Epoch: 25 Global Step: 525140 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:40:22,889-Speed 2497.11 samples/sec Loss 1.6739 LearningRate 0.000166 Epoch: 25 Global Step: 525150 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:40:31,088-Speed 2498.09 samples/sec Loss 1.6838 LearningRate 0.000166 Epoch: 25 Global Step: 525160 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:40:39,288-Speed 2498.07 samples/sec Loss 1.6969 LearningRate 0.000166 Epoch: 25 Global Step: 525170 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:40:47,494-Speed 2496.29 samples/sec Loss 1.6918 LearningRate 0.000166 Epoch: 25 Global Step: 525180 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:40:55,642-Speed 2513.74 samples/sec Loss 1.6994 LearningRate 0.000166 Epoch: 25 Global Step: 525190 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:41:03,839-Speed 2499.13 samples/sec Loss 1.7101 LearningRate 0.000166 Epoch: 25 Global Step: 525200 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:41:12,037-Speed 2498.69 samples/sec Loss 1.6497 LearningRate 0.000166 Epoch: 25 Global Step: 525210 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:41:20,237-Speed 2498.01 samples/sec Loss 1.6879 LearningRate 0.000166 Epoch: 25 Global Step: 525220 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:41:28,440-Speed 2497.07 samples/sec Loss 1.6556 LearningRate 0.000166 Epoch: 25 Global Step: 525230 Fp16 Grad Scale: 32768 Required: 70 hours 
Training: 2022-07-10 14:41:36,645-Speed 2496.23 samples/sec Loss 1.7137 LearningRate 0.000166 Epoch: 25 Global Step: 525240 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:41:44,809-Speed 2509.16 samples/sec Loss 1.6528 LearningRate 0.000166 Epoch: 25 Global Step: 525250 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:41:53,010-Speed 2497.51 samples/sec Loss 1.6468 LearningRate 0.000166 Epoch: 25 Global Step: 525260 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:42:01,212-Speed 2497.44 samples/sec Loss 1.6573 LearningRate 0.000166 Epoch: 25 Global Step: 525270 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-07-10 14:42:09,371-Speed 2510.38 samples/sec Loss 1.7138 LearningRate 0.000166 Epoch: 25 Global Step: 525280 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:42:17,572-Speed 2497.63 samples/sec Loss 1.6578 LearningRate 0.000166 Epoch: 25 Global Step: 525290 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:42:25,780-Speed 2495.53 samples/sec Loss 1.7059 LearningRate 0.000166 Epoch: 25 Global Step: 525300 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:42:33,931-Speed 2513.11 samples/sec Loss 1.6972 LearningRate 0.000166 Epoch: 25 Global Step: 525310 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:42:42,132-Speed 2497.69 samples/sec Loss 1.6920 LearningRate 0.000166 Epoch: 25 Global Step: 525320 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:42:50,329-Speed 2498.93 samples/sec Loss 1.6690 LearningRate 0.000166 Epoch: 25 Global Step: 525330 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:42:58,533-Speed 2496.51 samples/sec Loss 1.6765 LearningRate 0.000166 Epoch: 25 Global Step: 525340 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:43:06,736-Speed 2497.19 samples/sec Loss 1.6885 LearningRate 0.000166 Epoch: 25 Global Step: 525350 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:43:14,939-Speed 2497.02 samples/sec Loss 1.6643 LearningRate 0.000166 Epoch: 25 Global Step: 525360 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:43:23,091-Speed 2512.70 samples/sec Loss 1.7076 LearningRate 0.000166 Epoch: 25 Global Step: 525370 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:43:31,289-Speed 2498.67 samples/sec Loss 1.6641 LearningRate 0.000166 Epoch: 25 Global Step: 525380 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:43:39,504-Speed 2493.27 samples/sec Loss 1.6573 LearningRate 0.000166 Epoch: 25 Global Step: 525390 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:43:47,702-Speed 2498.44 samples/sec Loss 1.6710 LearningRate 0.000166 Epoch: 25 Global Step: 525400 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:43:55,902-Speed 2498.20 samples/sec Loss 1.6301 LearningRate 0.000166 Epoch: 25 Global Step: 525410 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:44:04,101-Speed 2497.90 samples/sec Loss 1.6344 LearningRate 0.000166 Epoch: 25 Global Step: 525420 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:44:12,245-Speed 2514.97 samples/sec Loss 1.6475 LearningRate 0.000166 Epoch: 25 Global Step: 525430 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:44:20,445-Speed 2498.13 samples/sec Loss 1.6687 LearningRate 0.000166 Epoch: 25 Global Step: 525440 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:44:28,645-Speed 2498.13 samples/sec Loss 1.7024 LearningRate 0.000166 Epoch: 25 Global Step: 525450 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:44:36,862-Speed 2492.66 samples/sec Loss 1.6237 LearningRate 0.000166 Epoch: 25 Global Step: 525460 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:44:45,060-Speed 2498.34 samples/sec Loss 1.6459 LearningRate 0.000166 Epoch: 25 Global Step: 525470 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:44:53,260-Speed 2497.97 samples/sec Loss 1.6954 LearningRate 0.000166 Epoch: 25 Global Step: 525480 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:45:01,415-Speed 2512.06 samples/sec Loss 1.6287 LearningRate 0.000166 Epoch: 25 Global Step: 525490 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:45:09,612-Speed 2498.64 samples/sec Loss 1.7064 LearningRate 0.000166 Epoch: 25 Global Step: 525500 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:45:17,821-Speed 2495.70 samples/sec Loss 1.6647 LearningRate 0.000166 Epoch: 25 Global Step: 525510 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:45:26,019-Speed 2498.45 samples/sec Loss 1.6741 LearningRate 0.000166 Epoch: 25 Global Step: 525520 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:45:34,222-Speed 2496.98 samples/sec Loss 1.6703 LearningRate 0.000166 Epoch: 25 Global Step: 525530 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:45:42,424-Speed 2497.39 samples/sec Loss 1.6707 LearningRate 0.000166 Epoch: 25 Global Step: 525540 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:45:50,576-Speed 2512.65 samples/sec Loss 1.6366 LearningRate 0.000166 Epoch: 25 Global Step: 525550 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:45:58,778-Speed 2497.41 samples/sec Loss 1.6286 LearningRate 0.000166 Epoch: 25 Global Step: 525560 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:46:06,977-Speed 2498.31 samples/sec Loss 1.6772 LearningRate 0.000166 Epoch: 25 Global Step: 525570 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:46:15,175-Speed 2498.34 samples/sec Loss 1.6596 LearningRate 0.000166 Epoch: 25 Global Step: 525580 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:46:23,377-Speed 2497.40 samples/sec Loss 1.6183 LearningRate 0.000166 Epoch: 25 Global Step: 525590 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:46:31,578-Speed 2497.59 samples/sec Loss 1.6305 LearningRate 0.000166 Epoch: 25 Global Step: 525600 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:46:39,729-Speed 2512.92 samples/sec Loss 1.7015 LearningRate 0.000166 Epoch: 25 Global Step: 525610 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:46:47,933-Speed 2496.90 samples/sec Loss 1.6504 LearningRate 0.000166 Epoch: 25 Global Step: 525620 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:46:56,134-Speed 2497.56 samples/sec Loss 1.6352 LearningRate 0.000166 Epoch: 25 Global Step: 525630 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:47:04,335-Speed 2497.89 samples/sec Loss 1.6675 LearningRate 0.000166 Epoch: 25 Global Step: 525640 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:47:12,540-Speed 2496.28 samples/sec Loss 1.6793 LearningRate 0.000166 Epoch: 25 Global Step: 525650 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:47:20,746-Speed 2496.19 samples/sec Loss 1.6759 LearningRate 0.000166 Epoch: 25 Global Step: 525660 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:47:28,894-Speed 2513.86 samples/sec Loss 1.6708 LearningRate 0.000166 Epoch: 25 Global Step: 525670 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:47:37,094-Speed 2498.10 samples/sec Loss 1.6911 LearningRate 0.000166 Epoch: 25 Global Step: 525680 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:47:45,294-Speed 2497.93 samples/sec Loss 1.6670 LearningRate 0.000166 Epoch: 25 Global Step: 525690 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:47:53,495-Speed 2497.54 samples/sec Loss 1.6602 LearningRate 0.000166 Epoch: 25 Global Step: 525700 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:48:01,698-Speed 2496.88 samples/sec Loss 1.6311 LearningRate 0.000166 Epoch: 25 Global Step: 525710 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:48:09,898-Speed 2498.10 samples/sec Loss 1.6680 LearningRate 0.000166 Epoch: 25 Global Step: 525720 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:48:18,046-Speed 2514.01 samples/sec Loss 1.6528 LearningRate 0.000166 Epoch: 25 Global Step: 525730 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:48:26,248-Speed 2497.13 samples/sec Loss 1.6878 LearningRate 0.000166 Epoch: 25 Global Step: 525740 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:48:34,461-Speed 2494.44 samples/sec Loss 1.6622 LearningRate 0.000166 Epoch: 25 Global Step: 525750 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:48:42,663-Speed 2497.36 samples/sec Loss 1.6832 LearningRate 0.000166 Epoch: 25 Global Step: 525760 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:48:50,863-Speed 2497.91 samples/sec Loss 1.6337 LearningRate 0.000166 Epoch: 25 Global Step: 525770 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:48:59,078-Speed 2493.55 samples/sec Loss 1.6307 LearningRate 0.000166 Epoch: 25 Global Step: 525780 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:49:07,225-Speed 2514.09 samples/sec Loss 1.6634 LearningRate 0.000166 Epoch: 25 Global Step: 525790 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:49:15,430-Speed 2496.79 samples/sec Loss 1.6718 LearningRate 0.000166 Epoch: 25 Global Step: 525800 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:49:23,635-Speed 2496.25 samples/sec Loss 1.6818 LearningRate 0.000166 Epoch: 25 Global Step: 525810 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:49:31,836-Speed 2497.64 samples/sec Loss 1.6631 LearningRate 0.000166 Epoch: 25 Global Step: 525820 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:49:40,038-Speed 2497.57 samples/sec Loss 1.6274 LearningRate 0.000165 Epoch: 25 Global Step: 525830 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:49:48,238-Speed 2498.16 samples/sec Loss 1.6567 LearningRate 0.000165 Epoch: 25 Global Step: 525840 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:49:56,385-Speed 2514.12 samples/sec Loss 1.6430 LearningRate 0.000165 Epoch: 25 Global Step: 525850 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:50:04,587-Speed 2497.35 samples/sec Loss 1.6346 LearningRate 0.000165 Epoch: 25 Global Step: 525860 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:50:12,785-Speed 2498.66 samples/sec Loss 1.6883 LearningRate 0.000165 Epoch: 25 Global Step: 525870 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:50:20,985-Speed 2497.93 samples/sec Loss 1.6956 LearningRate 0.000165 Epoch: 25 Global Step: 525880 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:50:29,181-Speed 2499.03 samples/sec Loss 1.6645 LearningRate 0.000165 Epoch: 25 Global Step: 525890 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:50:37,378-Speed 2498.87 samples/sec Loss 1.6941 LearningRate 0.000165 Epoch: 25 Global Step: 525900 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:50:45,531-Speed 2512.47 samples/sec Loss 1.7032 LearningRate 0.000165 Epoch: 25 Global Step: 525910 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:50:53,729-Speed 2498.40 samples/sec Loss 1.6594 LearningRate 0.000165 Epoch: 25 Global Step: 525920 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:51:01,928-Speed 2498.39 samples/sec Loss 1.6805 LearningRate 0.000165 Epoch: 25 Global Step: 525930 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:51:10,146-Speed 2492.56 samples/sec Loss 1.6670 LearningRate 0.000165 Epoch: 25 Global Step: 525940 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:51:18,346-Speed 2497.91 samples/sec Loss 1.7079 LearningRate 0.000165 Epoch: 25 Global Step: 525950 Fp16 Grad Scale: 16384 Required: 70 hours 
Training: 2022-07-10 14:51:26,551-Speed 2496.46 samples/sec Loss 1.6606 LearningRate 0.000165 Epoch: 25 Global Step: 525960 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:51:34,697-Speed 2514.47 samples/sec Loss 1.7026 LearningRate 0.000165 Epoch: 25 Global Step: 525970 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:51:42,900-Speed 2497.22 samples/sec Loss 1.6585 LearningRate 0.000165 Epoch: 25 Global Step: 525980 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:51:51,110-Speed 2494.69 samples/sec Loss 1.6392 LearningRate 0.000165 Epoch: 25 Global Step: 525990 Fp16 Grad Scale: 16384 Required: 70 hours Training: 2022-07-10 14:51:59,328-Speed 2492.99 samples/sec Loss 1.6193 LearningRate 0.000165 Epoch: 25 Global Step: 526000 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:52:07,530-Speed 2497.28 samples/sec Loss 1.6623 LearningRate 0.000165 Epoch: 25 Global Step: 526010 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:52:15,734-Speed 2497.12 samples/sec Loss 1.6778 LearningRate 0.000165 Epoch: 25 Global Step: 526020 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:52:23,886-Speed 2512.79 samples/sec Loss 1.6631 LearningRate 0.000165 Epoch: 25 Global Step: 526030 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:52:32,086-Speed 2497.82 samples/sec Loss 1.6777 LearningRate 0.000165 Epoch: 25 Global Step: 526040 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:52:40,287-Speed 2497.72 samples/sec Loss 1.6661 LearningRate 0.000165 Epoch: 25 Global Step: 526050 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:52:48,484-Speed 2498.81 samples/sec Loss 1.6718 LearningRate 0.000165 Epoch: 25 Global Step: 526060 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:52:56,682-Speed 2498.49 samples/sec Loss 1.6672 LearningRate 0.000165 Epoch: 25 Global Step: 526070 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 14:53:04,881-Speed 2498.25 samples/sec Loss 1.6884 LearningRate 0.000165 Epoch: 25 Global Step: 526080 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:53:13,033-Speed 2512.95 samples/sec Loss 1.6462 LearningRate 0.000165 Epoch: 25 Global Step: 526090 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:53:21,237-Speed 2496.51 samples/sec Loss 1.6618 LearningRate 0.000165 Epoch: 25 Global Step: 526100 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:53:29,439-Speed 2497.61 samples/sec Loss 1.6897 LearningRate 0.000165 Epoch: 25 Global Step: 526110 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:53:37,638-Speed 2498.19 samples/sec Loss 1.6601 LearningRate 0.000165 Epoch: 25 Global Step: 526120 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:53:45,854-Speed 2493.07 samples/sec Loss 1.6632 LearningRate 0.000165 Epoch: 25 Global Step: 526130 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:53:54,058-Speed 2496.67 samples/sec Loss 1.6735 LearningRate 0.000165 Epoch: 25 Global Step: 526140 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:54:02,207-Speed 2513.62 samples/sec Loss 1.6806 LearningRate 0.000165 Epoch: 25 Global Step: 526150 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:54:10,406-Speed 2498.14 samples/sec Loss 1.6844 LearningRate 0.000165 Epoch: 25 Global Step: 526160 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:54:18,611-Speed 2496.51 samples/sec Loss 1.6592 LearningRate 0.000165 Epoch: 25 Global Step: 526170 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:54:26,813-Speed 2497.36 samples/sec Loss 1.6528 LearningRate 0.000165 Epoch: 25 Global Step: 526180 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:54:35,017-Speed 2496.95 samples/sec Loss 1.7049 LearningRate 0.000165 Epoch: 25 Global Step: 526190 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 14:54:43,217-Speed 2497.89 samples/sec Loss 1.7007 LearningRate 0.000165 Epoch: 25 Global Step: 526200 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:54:51,375-Speed 2510.93 samples/sec Loss 1.6964 LearningRate 0.000165 Epoch: 25 Global Step: 526210 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:54:59,576-Speed 2497.60 samples/sec Loss 1.6330 LearningRate 0.000165 Epoch: 25 Global Step: 526220 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:55:07,775-Speed 2498.37 samples/sec Loss 1.6522 LearningRate 0.000165 Epoch: 25 Global Step: 526230 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:55:15,975-Speed 2498.29 samples/sec Loss 1.6663 LearningRate 0.000165 Epoch: 25 Global Step: 526240 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:55:24,176-Speed 2497.50 samples/sec Loss 1.6958 LearningRate 0.000165 Epoch: 25 Global Step: 526250 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:55:32,376-Speed 2497.91 samples/sec Loss 1.7115 LearningRate 0.000165 Epoch: 25 Global Step: 526260 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:55:40,523-Speed 2514.44 samples/sec Loss 1.6499 LearningRate 0.000165 Epoch: 25 Global Step: 526270 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:55:48,718-Speed 2499.18 samples/sec Loss 1.6623 LearningRate 0.000165 Epoch: 25 Global Step: 526280 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:55:56,919-Speed 2497.57 samples/sec Loss 1.6660 LearningRate 0.000165 Epoch: 25 Global Step: 526290 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:56:05,119-Speed 2498.31 samples/sec Loss 1.6642 LearningRate 0.000165 Epoch: 25 Global Step: 526300 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:56:13,316-Speed 2498.70 samples/sec Loss 1.6825 LearningRate 0.000165 Epoch: 25 Global Step: 526310 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 14:56:21,519-Speed 2496.97 samples/sec Loss 1.6790 LearningRate 0.000165 Epoch: 25 Global Step: 526320 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:56:29,699-Speed 2504.37 samples/sec Loss 1.6794 LearningRate 0.000165 Epoch: 25 Global Step: 526330 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:56:37,899-Speed 2497.72 samples/sec Loss 1.6816 LearningRate 0.000165 Epoch: 25 Global Step: 526340 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:56:46,098-Speed 2498.18 samples/sec Loss 1.6849 LearningRate 0.000165 Epoch: 25 Global Step: 526350 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:56:54,299-Speed 2497.65 samples/sec Loss 1.6639 LearningRate 0.000165 Epoch: 25 Global Step: 526360 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:57:02,500-Speed 2497.96 samples/sec Loss 1.6848 LearningRate 0.000165 Epoch: 25 Global Step: 526370 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:57:10,700-Speed 2497.81 samples/sec Loss 1.6861 LearningRate 0.000165 Epoch: 25 Global Step: 526380 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:57:18,852-Speed 2512.91 samples/sec Loss 1.7026 LearningRate 0.000165 Epoch: 25 Global Step: 526390 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:57:27,054-Speed 2497.06 samples/sec Loss 1.6686 LearningRate 0.000165 Epoch: 25 Global Step: 526400 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:57:35,258-Speed 2496.94 samples/sec Loss 1.6777 LearningRate 0.000165 Epoch: 25 Global Step: 526410 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:57:43,460-Speed 2497.13 samples/sec Loss 1.6533 LearningRate 0.000165 Epoch: 25 Global Step: 526420 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:57:51,661-Speed 2497.88 samples/sec Loss 1.6503 LearningRate 0.000165 Epoch: 25 Global Step: 526430 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 14:57:59,864-Speed 2497.06 samples/sec Loss 1.6276 LearningRate 0.000165 Epoch: 25 Global Step: 526440 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:58:08,013-Speed 2513.45 samples/sec Loss 1.7046 LearningRate 0.000165 Epoch: 25 Global Step: 526450 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:58:16,218-Speed 2496.43 samples/sec Loss 1.6755 LearningRate 0.000165 Epoch: 25 Global Step: 526460 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:58:24,418-Speed 2498.49 samples/sec Loss 1.6727 LearningRate 0.000165 Epoch: 25 Global Step: 526470 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 14:58:32,620-Speed 2497.34 samples/sec Loss 1.6719 LearningRate 0.000165 Epoch: 25 Global Step: 526480 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 14:58:40,820-Speed 2498.15 samples/sec Loss 1.6664 LearningRate 0.000165 Epoch: 25 Global Step: 526490 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 14:58:49,032-Speed 2494.22 samples/sec Loss 1.7276 LearningRate 0.000165 Epoch: 25 Global Step: 526500 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 14:58:57,194-Speed 2509.74 samples/sec Loss 1.7065 LearningRate 0.000165 Epoch: 25 Global Step: 526510 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 14:59:05,396-Speed 2497.36 samples/sec Loss 1.6291 LearningRate 0.000165 Epoch: 25 Global Step: 526520 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 14:59:13,603-Speed 2495.86 samples/sec Loss 1.6605 LearningRate 0.000165 Epoch: 25 Global Step: 526530 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 14:59:21,800-Speed 2498.61 samples/sec Loss 1.6938 LearningRate 0.000165 Epoch: 25 Global Step: 526540 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 14:59:30,005-Speed 2496.83 samples/sec Loss 1.6223 LearningRate 0.000165 Epoch: 25 Global Step: 526550 Fp16 Grad Scale: 32768 Required: 69 hours 
Training: 2022-07-10 14:59:38,208-Speed 2496.92 samples/sec Loss 1.6696 LearningRate 0.000165 Epoch: 25 Global Step: 526560 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 14:59:46,353-Speed 2514.89 samples/sec Loss 1.6441 LearningRate 0.000165 Epoch: 25 Global Step: 526570 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 14:59:54,554-Speed 2497.91 samples/sec Loss 1.6968 LearningRate 0.000165 Epoch: 25 Global Step: 526580 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:00:02,712-Speed 2511.20 samples/sec Loss 1.6899 LearningRate 0.000165 Epoch: 25 Global Step: 526590 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:00:10,911-Speed 2498.23 samples/sec Loss 1.6864 LearningRate 0.000165 Epoch: 25 Global Step: 526600 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:00:19,121-Speed 2495.04 samples/sec Loss 1.6497 LearningRate 0.000165 Epoch: 25 Global Step: 526610 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:00:27,326-Speed 2496.62 samples/sec Loss 1.6952 LearningRate 0.000165 Epoch: 25 Global Step: 526620 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:00:35,475-Speed 2513.33 samples/sec Loss 1.6895 LearningRate 0.000165 Epoch: 25 Global Step: 526630 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:00:43,677-Speed 2497.42 samples/sec Loss 1.6641 LearningRate 0.000165 Epoch: 25 Global Step: 526640 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:00:51,878-Speed 2497.92 samples/sec Loss 1.6593 LearningRate 0.000165 Epoch: 25 Global Step: 526650 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:01:00,089-Speed 2494.48 samples/sec Loss 1.6610 LearningRate 0.000165 Epoch: 25 Global Step: 526660 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:01:08,291-Speed 2497.32 samples/sec Loss 1.6662 LearningRate 0.000165 Epoch: 25 Global Step: 526670 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:01:16,500-Speed 2495.42 samples/sec Loss 1.6882 LearningRate 0.000165 Epoch: 25 Global Step: 526680 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:01:24,646-Speed 2514.43 samples/sec Loss 1.6688 LearningRate 0.000165 Epoch: 25 Global Step: 526690 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:01:32,848-Speed 2497.31 samples/sec Loss 1.6681 LearningRate 0.000165 Epoch: 25 Global Step: 526700 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:01:41,057-Speed 2495.28 samples/sec Loss 1.6515 LearningRate 0.000165 Epoch: 25 Global Step: 526710 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:01:49,261-Speed 2497.02 samples/sec Loss 1.6649 LearningRate 0.000165 Epoch: 25 Global Step: 526720 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:01:57,474-Speed 2494.10 samples/sec Loss 1.6899 LearningRate 0.000165 Epoch: 25 Global Step: 526730 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:02:05,676-Speed 2497.16 samples/sec Loss 1.6964 LearningRate 0.000165 Epoch: 25 Global Step: 526740 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:02:13,825-Speed 2513.87 samples/sec Loss 1.6850 LearningRate 0.000164 Epoch: 25 Global Step: 526750 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:02:22,031-Speed 2496.10 samples/sec Loss 1.6607 LearningRate 0.000164 Epoch: 25 Global Step: 526760 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:02:30,234-Speed 2497.11 samples/sec Loss 1.6492 LearningRate 0.000164 Epoch: 25 Global Step: 526770 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:02:38,432-Speed 2498.40 samples/sec Loss 1.6359 LearningRate 0.000164 Epoch: 25 Global Step: 526780 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:02:46,636-Speed 2496.79 samples/sec Loss 1.6847 LearningRate 0.000164 Epoch: 25 Global Step: 526790 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:02:54,839-Speed 2497.02 samples/sec Loss 1.6733 LearningRate 0.000164 Epoch: 25 Global Step: 526800 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:03:02,986-Speed 2514.19 samples/sec Loss 1.6484 LearningRate 0.000164 Epoch: 25 Global Step: 526810 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:03:11,189-Speed 2497.18 samples/sec Loss 1.6870 LearningRate 0.000164 Epoch: 25 Global Step: 526820 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:03:19,386-Speed 2498.84 samples/sec Loss 1.7137 LearningRate 0.000164 Epoch: 25 Global Step: 526830 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:03:27,587-Speed 2497.46 samples/sec Loss 1.6298 LearningRate 0.000164 Epoch: 25 Global Step: 526840 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:03:35,794-Speed 2495.78 samples/sec Loss 1.6430 LearningRate 0.000164 Epoch: 25 Global Step: 526850 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:03:43,993-Speed 2498.19 samples/sec Loss 1.6658 LearningRate 0.000164 Epoch: 25 Global Step: 526860 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:03:52,136-Speed 2515.57 samples/sec Loss 1.6463 LearningRate 0.000164 Epoch: 25 Global Step: 526870 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:04:00,335-Speed 2498.27 samples/sec Loss 1.6657 LearningRate 0.000164 Epoch: 25 Global Step: 526880 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:04:08,536-Speed 2497.54 samples/sec Loss 1.6828 LearningRate 0.000164 Epoch: 25 Global Step: 526890 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:04:16,745-Speed 2495.31 samples/sec Loss 1.6827 LearningRate 0.000164 Epoch: 25 Global Step: 526900 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:04:24,943-Speed 2498.56 samples/sec Loss 1.6730 LearningRate 0.000164 Epoch: 25 Global Step: 526910 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:04:33,144-Speed 2497.60 samples/sec Loss 1.6774 LearningRate 0.000164 Epoch: 25 Global Step: 526920 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:04:41,288-Speed 2515.04 samples/sec Loss 1.6376 LearningRate 0.000164 Epoch: 25 Global Step: 526930 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:04:49,495-Speed 2496.10 samples/sec Loss 1.6669 LearningRate 0.000164 Epoch: 25 Global Step: 526940 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:04:57,698-Speed 2497.17 samples/sec Loss 1.6318 LearningRate 0.000164 Epoch: 25 Global Step: 526950 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:05:05,898-Speed 2498.10 samples/sec Loss 1.7141 LearningRate 0.000164 Epoch: 25 Global Step: 526960 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:05:14,106-Speed 2495.41 samples/sec Loss 1.6757 LearningRate 0.000164 Epoch: 25 Global Step: 526970 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:05:22,318-Speed 2494.29 samples/sec Loss 1.6448 LearningRate 0.000164 Epoch: 25 Global Step: 526980 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:05:30,470-Speed 2512.80 samples/sec Loss 1.6634 LearningRate 0.000164 Epoch: 25 Global Step: 526990 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:05:38,675-Speed 2496.41 samples/sec Loss 1.6609 LearningRate 0.000164 Epoch: 25 Global Step: 527000 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:05:46,881-Speed 2495.97 samples/sec Loss 1.6410 LearningRate 0.000164 Epoch: 25 Global Step: 527010 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:05:55,080-Speed 2498.22 samples/sec Loss 1.6935 LearningRate 0.000164 Epoch: 25 Global Step: 527020 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:06:03,278-Speed 2498.82 samples/sec Loss 1.6526 LearningRate 0.000164 Epoch: 25 Global Step: 527030 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:06:11,479-Speed 2497.64 samples/sec Loss 1.6399 LearningRate 0.000164 Epoch: 25 Global Step: 527040 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:06:19,637-Speed 2510.84 samples/sec Loss 1.6388 LearningRate 0.000164 Epoch: 25 Global Step: 527050 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:06:27,840-Speed 2497.18 samples/sec Loss 1.6690 LearningRate 0.000164 Epoch: 25 Global Step: 527060 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:06:36,047-Speed 2495.72 samples/sec Loss 1.6489 LearningRate 0.000164 Epoch: 25 Global Step: 527070 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:06:44,246-Speed 2498.28 samples/sec Loss 1.6772 LearningRate 0.000164 Epoch: 25 Global Step: 527080 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:06:52,449-Speed 2496.76 samples/sec Loss 1.6683 LearningRate 0.000164 Epoch: 25 Global Step: 527090 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:07:00,659-Speed 2494.91 samples/sec Loss 1.6861 LearningRate 0.000164 Epoch: 25 Global Step: 527100 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:07:08,815-Speed 2511.65 samples/sec Loss 1.6327 LearningRate 0.000164 Epoch: 25 Global Step: 527110 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:07:17,020-Speed 2496.26 samples/sec Loss 1.6930 LearningRate 0.000164 Epoch: 25 Global Step: 527120 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:07:25,218-Speed 2498.81 samples/sec Loss 1.6336 LearningRate 0.000164 Epoch: 25 Global Step: 527130 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:07:33,418-Speed 2498.02 samples/sec Loss 1.6574 LearningRate 0.000164 Epoch: 25 Global Step: 527140 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:07:41,615-Speed 2498.75 samples/sec Loss 1.6806 LearningRate 0.000164 Epoch: 25 Global Step: 527150 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:07:49,812-Speed 2498.69 samples/sec Loss 1.7246 LearningRate 0.000164 Epoch: 25 Global Step: 527160 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:07:57,958-Speed 2514.59 samples/sec Loss 1.6542 LearningRate 0.000164 Epoch: 25 Global Step: 527170 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:08:06,160-Speed 2497.45 samples/sec Loss 1.6632 LearningRate 0.000164 Epoch: 25 Global Step: 527180 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:08:14,359-Speed 2498.07 samples/sec Loss 1.6745 LearningRate 0.000164 Epoch: 25 Global Step: 527190 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:08:22,557-Speed 2498.62 samples/sec Loss 1.6624 LearningRate 0.000164 Epoch: 25 Global Step: 527200 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:08:30,839-Speed 2473.10 samples/sec Loss 1.7000 LearningRate 0.000164 Epoch: 25 Global Step: 527210 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:08:39,036-Speed 2499.12 samples/sec Loss 1.6539 LearningRate 0.000164 Epoch: 25 Global Step: 527220 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:08:47,181-Speed 2514.89 samples/sec Loss 1.7062 LearningRate 0.000164 Epoch: 25 Global Step: 527230 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:08:55,379-Speed 2498.53 samples/sec Loss 1.6865 LearningRate 0.000164 Epoch: 25 Global Step: 527240 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:09:03,579-Speed 2497.97 samples/sec Loss 1.6919 LearningRate 0.000164 Epoch: 25 Global Step: 527250 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:09:11,775-Speed 2499.35 samples/sec Loss 1.6602 LearningRate 0.000164 Epoch: 25 Global Step: 527260 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:09:19,977-Speed 2497.24 samples/sec Loss 1.6780 LearningRate 0.000164 Epoch: 25 Global Step: 527270 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:09:28,202-Speed 2490.17 samples/sec Loss 1.6179 LearningRate 0.000164 Epoch: 25 Global Step: 527280 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:09:36,355-Speed 2512.42 samples/sec Loss 1.6822 LearningRate 0.000164 Epoch: 25 Global Step: 527290 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:09:44,564-Speed 2495.47 samples/sec Loss 1.6738 LearningRate 0.000164 Epoch: 25 Global Step: 527300 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:09:52,764-Speed 2497.67 samples/sec Loss 1.6993 LearningRate 0.000164 Epoch: 25 Global Step: 527310 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:10:00,966-Speed 2497.15 samples/sec Loss 1.6570 LearningRate 0.000164 Epoch: 25 Global Step: 527320 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:10:09,171-Speed 2496.54 samples/sec Loss 1.6517 LearningRate 0.000164 Epoch: 25 Global Step: 527330 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:10:17,371-Speed 2498.06 samples/sec Loss 1.6566 LearningRate 0.000164 Epoch: 25 Global Step: 527340 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:10:25,517-Speed 2514.33 samples/sec Loss 1.6562 LearningRate 0.000164 Epoch: 25 Global Step: 527350 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:10:33,728-Speed 2494.61 samples/sec Loss 1.6340 LearningRate 0.000164 Epoch: 25 Global Step: 527360 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:10:41,923-Speed 2499.87 samples/sec Loss 1.6553 LearningRate 0.000164 Epoch: 25 Global Step: 527370 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:10:50,119-Speed 2499.16 samples/sec Loss 1.6711 LearningRate 0.000164 Epoch: 25 Global Step: 527380 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:10:58,316-Speed 2498.62 samples/sec Loss 1.6562 LearningRate 0.000164 Epoch: 25 Global Step: 527390 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:11:06,517-Speed 2498.03 samples/sec Loss 1.6521 LearningRate 0.000164 Epoch: 25 Global Step: 527400 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:11:14,663-Speed 2514.55 samples/sec Loss 1.6880 LearningRate 0.000164 Epoch: 25 Global Step: 527410 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:11:22,877-Speed 2493.72 samples/sec Loss 1.6680 LearningRate 0.000164 Epoch: 25 Global Step: 527420 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:11:31,082-Speed 2496.21 samples/sec Loss 1.7039 LearningRate 0.000164 Epoch: 25 Global Step: 527430 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:11:39,282-Speed 2497.98 samples/sec Loss 1.6930 LearningRate 0.000164 Epoch: 25 Global Step: 527440 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:11:47,480-Speed 2498.74 samples/sec Loss 1.6933 LearningRate 0.000164 Epoch: 25 Global Step: 527450 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:11:55,677-Speed 2498.77 samples/sec Loss 1.6740 LearningRate 0.000164 Epoch: 25 Global Step: 527460 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:12:03,824-Speed 2514.33 samples/sec Loss 1.6574 LearningRate 0.000164 Epoch: 25 Global Step: 527470 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:12:12,019-Speed 2499.41 samples/sec Loss 1.6602 LearningRate 0.000164 Epoch: 25 Global Step: 527480 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:12:20,228-Speed 2495.37 samples/sec Loss 1.6931 LearningRate 0.000164 Epoch: 25 Global Step: 527490 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:12:28,426-Speed 2498.48 samples/sec Loss 1.6655 LearningRate 0.000164 Epoch: 25 Global Step: 527500 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:12:36,623-Speed 2498.88 samples/sec Loss 1.6455 LearningRate 0.000164 Epoch: 25 Global Step: 527510 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:12:44,825-Speed 2497.36 samples/sec Loss 1.6582 LearningRate 0.000164 Epoch: 25 Global Step: 527520 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:12:52,973-Speed 2513.92 samples/sec Loss 1.6372 LearningRate 0.000164 Epoch: 25 Global Step: 527530 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:13:01,173-Speed 2498.11 samples/sec Loss 1.6585 LearningRate 0.000164 Epoch: 25 Global Step: 527540 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:13:09,375-Speed 2497.17 samples/sec Loss 1.6932 LearningRate 0.000164 Epoch: 25 Global Step: 527550 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:13:17,578-Speed 2497.18 samples/sec Loss 1.6875 LearningRate 0.000164 Epoch: 25 Global Step: 527560 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:13:25,776-Speed 2498.27 samples/sec Loss 1.6532 LearningRate 0.000164 Epoch: 25 Global Step: 527570 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:13:33,975-Speed 2498.43 samples/sec Loss 1.6850 LearningRate 0.000164 Epoch: 25 Global Step: 527580 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:13:42,125-Speed 2513.30 samples/sec Loss 1.6411 LearningRate 0.000164 Epoch: 25 Global Step: 527590 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:13:50,325-Speed 2497.93 samples/sec Loss 1.6481 LearningRate 0.000164 Epoch: 25 Global Step: 527600 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:13:58,524-Speed 2498.41 samples/sec Loss 1.6923 LearningRate 0.000164 Epoch: 25 Global Step: 527610 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:14:06,730-Speed 2496.24 samples/sec Loss 1.7087 LearningRate 0.000164 Epoch: 25 Global Step: 527620 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:14:14,927-Speed 2498.91 samples/sec Loss 1.6676 LearningRate 0.000164 Epoch: 25 Global Step: 527630 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:14:23,126-Speed 2498.08 samples/sec Loss 1.6516 LearningRate 0.000164 Epoch: 25 Global Step: 527640 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:14:31,278-Speed 2512.64 samples/sec Loss 1.6583 LearningRate 0.000164 Epoch: 25 Global Step: 527650 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:14:39,490-Speed 2494.43 samples/sec Loss 1.6620 LearningRate 0.000164 Epoch: 25 Global Step: 527660 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:14:47,701-Speed 2494.85 samples/sec Loss 1.6808 LearningRate 0.000164 Epoch: 25 Global Step: 527670 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:14:55,900-Speed 2498.42 samples/sec Loss 1.6350 LearningRate 0.000163 Epoch: 25 Global Step: 527680 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:15:04,099-Speed 2498.16 samples/sec Loss 1.6660 LearningRate 0.000163 Epoch: 25 Global Step: 527690 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:15:12,299-Speed 2497.86 samples/sec Loss 1.6518 LearningRate 0.000163 Epoch: 25 Global Step: 527700 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:15:20,445-Speed 2514.75 samples/sec Loss 1.6939 LearningRate 0.000163 Epoch: 25 Global Step: 527710 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:15:28,643-Speed 2498.25 samples/sec Loss 1.6303 LearningRate 0.000163 Epoch: 25 Global Step: 527720 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:15:36,848-Speed 2496.42 samples/sec Loss 1.6467 LearningRate 0.000163 Epoch: 25 Global Step: 527730 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:15:45,047-Speed 2498.22 samples/sec Loss 1.6349 LearningRate 0.000163 Epoch: 25 Global Step: 527740 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:15:53,253-Speed 2496.32 samples/sec Loss 1.6426 LearningRate 0.000163 Epoch: 25 Global Step: 527750 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:16:01,455-Speed 2497.39 samples/sec Loss 1.6650 LearningRate 0.000163 Epoch: 25 Global Step: 527760 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:16:09,604-Speed 2513.27 samples/sec Loss 1.6562 LearningRate 0.000163 Epoch: 25 Global Step: 527770 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:16:17,806-Speed 2497.40 samples/sec Loss 1.6381 LearningRate 0.000163 Epoch: 25 Global Step: 527780 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:16:26,007-Speed 2498.06 samples/sec Loss 1.6559 LearningRate 0.000163 Epoch: 25 Global Step: 527790 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:16:34,211-Speed 2496.51 samples/sec Loss 1.6431 LearningRate 0.000163 Epoch: 25 Global Step: 527800 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:16:42,430-Speed 2492.26 samples/sec Loss 1.6679 LearningRate 0.000163 Epoch: 25 Global Step: 527810 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:16:50,633-Speed 2497.02 samples/sec Loss 1.6393 LearningRate 0.000163 Epoch: 25 Global Step: 527820 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:16:58,779-Speed 2514.54 samples/sec Loss 1.6693 LearningRate 0.000163 Epoch: 25 Global Step: 527830 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:17:06,982-Speed 2497.79 samples/sec Loss 1.6605 LearningRate 0.000163 Epoch: 25 Global Step: 527840 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:17:15,183-Speed 2497.94 samples/sec Loss 1.6574 LearningRate 0.000163 Epoch: 25 Global Step: 527850 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:17:23,389-Speed 2496.32 samples/sec Loss 1.6290 LearningRate 0.000163 Epoch: 25 Global Step: 527860 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:17:31,595-Speed 2496.17 samples/sec Loss 1.6719 LearningRate 0.000163 Epoch: 25 Global Step: 527870 Fp16 Grad Scale: 32768 Required: 69 hours 
Training: 2022-07-10 15:17:39,803-Speed 2495.64 samples/sec Loss 1.7006 LearningRate 0.000163 Epoch: 25 Global Step: 527880 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:17:47,946-Speed 2515.23 samples/sec Loss 1.6661 LearningRate 0.000163 Epoch: 25 Global Step: 527890 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:17:56,149-Speed 2497.43 samples/sec Loss 1.6420 LearningRate 0.000163 Epoch: 25 Global Step: 527900 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:18:04,347-Speed 2498.53 samples/sec Loss 1.6536 LearningRate 0.000163 Epoch: 25 Global Step: 527910 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:18:12,547-Speed 2497.90 samples/sec Loss 1.6327 LearningRate 0.000163 Epoch: 25 Global Step: 527920 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:18:20,747-Speed 2498.13 samples/sec Loss 1.6166 LearningRate 0.000163 Epoch: 25 Global Step: 527930 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:18:28,948-Speed 2497.56 samples/sec Loss 1.6820 LearningRate 0.000163 Epoch: 25 Global Step: 527940 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:18:37,094-Speed 2514.44 samples/sec Loss 1.7101 LearningRate 0.000163 Epoch: 25 Global Step: 527950 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:18:45,294-Speed 2497.94 samples/sec Loss 1.6667 LearningRate 0.000163 Epoch: 25 Global Step: 527960 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:18:53,498-Speed 2496.92 samples/sec Loss 1.6553 LearningRate 0.000163 Epoch: 25 Global Step: 527970 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:19:01,697-Speed 2498.43 samples/sec Loss 1.6664 LearningRate 0.000163 Epoch: 25 Global Step: 527980 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:19:09,897-Speed 2497.90 samples/sec Loss 1.6678 LearningRate 0.000163 Epoch: 25 Global Step: 527990 Fp16 Grad Scale: 32768 Required: 69 hours 
Training: 2022-07-10 15:19:18,099-Speed 2497.38 samples/sec Loss 1.7114 LearningRate 0.000163 Epoch: 25 Global Step: 528000 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:19:26,255-Speed 2511.30 samples/sec Loss 1.7008 LearningRate 0.000163 Epoch: 25 Global Step: 528010 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:19:34,416-Speed 2510.04 samples/sec Loss 1.6822 LearningRate 0.000163 Epoch: 25 Global Step: 528020 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:19:42,624-Speed 2495.39 samples/sec Loss 1.7114 LearningRate 0.000163 Epoch: 25 Global Step: 528030 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:19:50,826-Speed 2497.29 samples/sec Loss 1.6645 LearningRate 0.000163 Epoch: 25 Global Step: 528040 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:19:59,033-Speed 2495.98 samples/sec Loss 1.7105 LearningRate 0.000163 Epoch: 25 Global Step: 528050 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:20:07,235-Speed 2497.19 samples/sec Loss 1.6499 LearningRate 0.000163 Epoch: 25 Global Step: 528060 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:20:15,391-Speed 2511.64 samples/sec Loss 1.7176 LearningRate 0.000163 Epoch: 25 Global Step: 528070 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:20:23,593-Speed 2497.54 samples/sec Loss 1.6479 LearningRate 0.000163 Epoch: 25 Global Step: 528080 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:20:31,793-Speed 2498.33 samples/sec Loss 1.6705 LearningRate 0.000163 Epoch: 25 Global Step: 528090 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:20:39,991-Speed 2498.33 samples/sec Loss 1.6849 LearningRate 0.000163 Epoch: 25 Global Step: 528100 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:20:48,203-Speed 2494.45 samples/sec Loss 1.6208 LearningRate 0.000163 Epoch: 25 Global Step: 528110 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:20:56,400-Speed 2498.73 samples/sec Loss 1.6379 LearningRate 0.000163 Epoch: 25 Global Step: 528120 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:21:04,549-Speed 2513.91 samples/sec Loss 1.5970 LearningRate 0.000163 Epoch: 25 Global Step: 528130 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:21:12,751-Speed 2497.55 samples/sec Loss 1.6659 LearningRate 0.000163 Epoch: 25 Global Step: 528140 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:21:20,955-Speed 2496.48 samples/sec Loss 1.6593 LearningRate 0.000163 Epoch: 25 Global Step: 528150 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:21:29,159-Speed 2497.02 samples/sec Loss 1.6743 LearningRate 0.000163 Epoch: 25 Global Step: 528160 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:21:37,368-Speed 2495.19 samples/sec Loss 1.6635 LearningRate 0.000163 Epoch: 25 Global Step: 528170 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:21:45,571-Speed 2497.27 samples/sec Loss 1.6400 LearningRate 0.000163 Epoch: 25 Global Step: 528180 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:21:53,724-Speed 2512.35 samples/sec Loss 1.6321 LearningRate 0.000163 Epoch: 25 Global Step: 528190 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:22:01,919-Speed 2499.24 samples/sec Loss 1.6311 LearningRate 0.000163 Epoch: 25 Global Step: 528200 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:22:10,120-Speed 2498.20 samples/sec Loss 1.6555 LearningRate 0.000163 Epoch: 25 Global Step: 528210 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:22:18,315-Speed 2499.53 samples/sec Loss 1.6757 LearningRate 0.000163 Epoch: 25 Global Step: 528220 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:22:26,513-Speed 2498.40 samples/sec Loss 1.6407 LearningRate 0.000163 Epoch: 25 Global Step: 528230 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:22:34,719-Speed 2496.69 samples/sec Loss 1.6532 LearningRate 0.000163 Epoch: 25 Global Step: 528240 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:22:42,859-Speed 2516.35 samples/sec Loss 1.6452 LearningRate 0.000163 Epoch: 25 Global Step: 528250 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:22:51,061-Speed 2497.26 samples/sec Loss 1.6225 LearningRate 0.000163 Epoch: 25 Global Step: 528260 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:22:59,256-Speed 2499.81 samples/sec Loss 1.6491 LearningRate 0.000163 Epoch: 25 Global Step: 528270 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:23:07,456-Speed 2497.99 samples/sec Loss 1.6592 LearningRate 0.000163 Epoch: 25 Global Step: 528280 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:23:15,654-Speed 2498.72 samples/sec Loss 1.6949 LearningRate 0.000163 Epoch: 25 Global Step: 528290 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:23:23,856-Speed 2497.64 samples/sec Loss 1.6661 LearningRate 0.000163 Epoch: 25 Global Step: 528300 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:23:32,004-Speed 2513.89 samples/sec Loss 1.6594 LearningRate 0.000163 Epoch: 25 Global Step: 528310 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:23:40,210-Speed 2495.98 samples/sec Loss 1.6484 LearningRate 0.000163 Epoch: 25 Global Step: 528320 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:23:48,411-Speed 2497.89 samples/sec Loss 1.6474 LearningRate 0.000163 Epoch: 25 Global Step: 528330 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:23:56,607-Speed 2499.16 samples/sec Loss 1.6851 LearningRate 0.000163 Epoch: 25 Global Step: 528340 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:24:04,810-Speed 2497.08 samples/sec Loss 1.6148 LearningRate 0.000163 Epoch: 25 Global Step: 528350 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:24:13,008-Speed 2498.39 samples/sec Loss 1.6550 LearningRate 0.000163 Epoch: 25 Global Step: 528360 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:24:21,171-Speed 2509.38 samples/sec Loss 1.6548 LearningRate 0.000163 Epoch: 25 Global Step: 528370 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:24:29,377-Speed 2495.89 samples/sec Loss 1.6607 LearningRate 0.000163 Epoch: 25 Global Step: 528380 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:24:37,578-Speed 2497.64 samples/sec Loss 1.6273 LearningRate 0.000163 Epoch: 25 Global Step: 528390 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:24:45,777-Speed 2498.42 samples/sec Loss 1.6527 LearningRate 0.000163 Epoch: 25 Global Step: 528400 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:24:53,977-Speed 2497.75 samples/sec Loss 1.6722 LearningRate 0.000163 Epoch: 25 Global Step: 528410 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:25:02,184-Speed 2495.96 samples/sec Loss 1.6655 LearningRate 0.000163 Epoch: 25 Global Step: 528420 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:25:10,336-Speed 2512.68 samples/sec Loss 1.6573 LearningRate 0.000163 Epoch: 25 Global Step: 528430 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:25:18,540-Speed 2496.83 samples/sec Loss 1.6392 LearningRate 0.000163 Epoch: 25 Global Step: 528440 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:25:26,742-Speed 2497.21 samples/sec Loss 1.6680 LearningRate 0.000163 Epoch: 25 Global Step: 528450 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:25:34,941-Speed 2498.18 samples/sec Loss 1.6725 LearningRate 0.000163 Epoch: 25 Global Step: 528460 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:25:43,139-Speed 2498.97 samples/sec Loss 1.6285 LearningRate 0.000163 Epoch: 25 Global Step: 528470 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:25:51,341-Speed 2497.18 samples/sec Loss 1.6767 LearningRate 0.000163 Epoch: 25 Global Step: 528480 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:25:59,485-Speed 2515.05 samples/sec Loss 1.6834 LearningRate 0.000163 Epoch: 25 Global Step: 528490 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:26:07,686-Speed 2497.95 samples/sec Loss 1.6538 LearningRate 0.000163 Epoch: 25 Global Step: 528500 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:26:15,891-Speed 2498.43 samples/sec Loss 1.6307 LearningRate 0.000163 Epoch: 25 Global Step: 528510 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:26:24,088-Speed 2498.79 samples/sec Loss 1.6835 LearningRate 0.000163 Epoch: 25 Global Step: 528520 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:26:32,290-Speed 2497.36 samples/sec Loss 1.6648 LearningRate 0.000163 Epoch: 25 Global Step: 528530 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:26:40,492-Speed 2497.45 samples/sec Loss 1.6554 LearningRate 0.000163 Epoch: 25 Global Step: 528540 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:26:48,635-Speed 2515.36 samples/sec Loss 1.6461 LearningRate 0.000163 Epoch: 25 Global Step: 528550 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:26:56,839-Speed 2496.81 samples/sec Loss 1.6476 LearningRate 0.000163 Epoch: 25 Global Step: 528560 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:27:05,036-Speed 2498.78 samples/sec Loss 1.6972 LearningRate 0.000163 Epoch: 25 Global Step: 528570 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:27:13,233-Speed 2498.76 samples/sec Loss 1.6535 LearningRate 0.000163 Epoch: 25 Global Step: 528580 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:27:21,443-Speed 2495.06 samples/sec Loss 1.6730 LearningRate 0.000163 Epoch: 25 Global Step: 528590 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:27:29,644-Speed 2497.65 samples/sec Loss 1.6936 LearningRate 0.000162 Epoch: 25 Global Step: 528600 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:27:37,795-Speed 2513.02 samples/sec Loss 1.6341 LearningRate 0.000162 Epoch: 25 Global Step: 528610 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:27:45,997-Speed 2497.78 samples/sec Loss 1.7006 LearningRate 0.000162 Epoch: 25 Global Step: 528620 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:27:54,202-Speed 2496.52 samples/sec Loss 1.6610 LearningRate 0.000162 Epoch: 25 Global Step: 528630 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:28:02,410-Speed 2495.84 samples/sec Loss 1.6440 LearningRate 0.000162 Epoch: 25 Global Step: 528640 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:28:10,612-Speed 2497.31 samples/sec Loss 1.6730 LearningRate 0.000162 Epoch: 25 Global Step: 528650 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:28:18,812-Speed 2498.06 samples/sec Loss 1.6413 LearningRate 0.000162 Epoch: 25 Global Step: 528660 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:28:26,959-Speed 2514.24 samples/sec Loss 1.6319 LearningRate 0.000162 Epoch: 25 Global Step: 528670 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:28:35,164-Speed 2496.32 samples/sec Loss 1.6859 LearningRate 0.000162 Epoch: 25 Global Step: 528680 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:28:43,370-Speed 2496.27 samples/sec Loss 1.6571 LearningRate 0.000162 Epoch: 25 Global Step: 528690 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:28:51,574-Speed 2496.75 samples/sec Loss 1.6600 LearningRate 0.000162 Epoch: 25 Global Step: 528700 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:28:59,779-Speed 2496.89 samples/sec Loss 1.6455 LearningRate 0.000162 Epoch: 25 Global Step: 528710 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:29:07,980-Speed 2497.53 samples/sec Loss 1.6793 LearningRate 0.000162 Epoch: 25 Global Step: 528720 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:29:16,141-Speed 2509.92 samples/sec Loss 1.6000 LearningRate 0.000162 Epoch: 25 Global Step: 528730 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:29:24,336-Speed 2499.36 samples/sec Loss 1.6718 LearningRate 0.000162 Epoch: 25 Global Step: 528740 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:29:32,544-Speed 2495.63 samples/sec Loss 1.6345 LearningRate 0.000162 Epoch: 25 Global Step: 528750 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:29:40,744-Speed 2497.77 samples/sec Loss 1.6257 LearningRate 0.000162 Epoch: 25 Global Step: 528760 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:29:48,941-Speed 2498.99 samples/sec Loss 1.6619 LearningRate 0.000162 Epoch: 25 Global Step: 528770 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:29:57,138-Speed 2498.83 samples/sec Loss 1.6332 LearningRate 0.000162 Epoch: 25 Global Step: 528780 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:30:05,293-Speed 2511.76 samples/sec Loss 1.6674 LearningRate 0.000162 Epoch: 25 Global Step: 528790 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:30:13,496-Speed 2497.15 samples/sec Loss 1.6390 LearningRate 0.000162 Epoch: 25 Global Step: 528800 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:30:21,707-Speed 2494.60 samples/sec Loss 1.5842 LearningRate 0.000162 Epoch: 25 Global Step: 528810 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:30:29,900-Speed 2500.36 samples/sec Loss 1.6230 LearningRate 0.000162 Epoch: 25 Global Step: 528820 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:30:38,103-Speed 2497.11 samples/sec Loss 1.6473 LearningRate 0.000162 Epoch: 25 Global Step: 528830 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:30:46,309-Speed 2496.36 samples/sec Loss 1.6398 LearningRate 0.000162 Epoch: 25 Global Step: 528840 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:30:54,459-Speed 2513.13 samples/sec Loss 1.6307 LearningRate 0.000162 Epoch: 25 Global Step: 528850 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:31:02,657-Speed 2498.74 samples/sec Loss 1.6654 LearningRate 0.000162 Epoch: 25 Global Step: 528860 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:31:10,854-Speed 2498.90 samples/sec Loss 1.6660 LearningRate 0.000162 Epoch: 25 Global Step: 528870 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:31:19,052-Speed 2498.74 samples/sec Loss 1.6763 LearningRate 0.000162 Epoch: 25 Global Step: 528880 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:31:27,255-Speed 2497.18 samples/sec Loss 1.6277 LearningRate 0.000162 Epoch: 25 Global Step: 528890 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:31:35,463-Speed 2495.63 samples/sec Loss 1.6518 LearningRate 0.000162 Epoch: 25 Global Step: 528900 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:31:43,608-Speed 2514.91 samples/sec Loss 1.6520 LearningRate 0.000162 Epoch: 25 Global Step: 528910 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:31:51,815-Speed 2495.89 samples/sec Loss 1.6898 LearningRate 0.000162 Epoch: 25 Global Step: 528920 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:32:00,021-Speed 2495.91 samples/sec Loss 1.6529 LearningRate 0.000162 Epoch: 25 Global Step: 528930 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:32:08,221-Speed 2498.36 samples/sec Loss 1.6793 LearningRate 0.000162 Epoch: 25 Global Step: 528940 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:32:16,422-Speed 2497.69 samples/sec Loss 1.6462 LearningRate 0.000162 Epoch: 25 Global Step: 528950 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:32:24,632-Speed 2494.86 samples/sec Loss 1.6235 LearningRate 0.000162 Epoch: 25 Global Step: 528960 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:32:32,779-Speed 2514.31 samples/sec Loss 1.6610 LearningRate 0.000162 Epoch: 25 Global Step: 528970 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:32:40,979-Speed 2498.14 samples/sec Loss 1.6809 LearningRate 0.000162 Epoch: 25 Global Step: 528980 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:32:49,179-Speed 2499.51 samples/sec Loss 1.6540 LearningRate 0.000162 Epoch: 25 Global Step: 528990 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:32:57,407-Speed 2499.14 samples/sec Loss 1.6305 LearningRate 0.000162 Epoch: 25 Global Step: 529000 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:33:05,642-Speed 2500.27 samples/sec Loss 1.6324 LearningRate 0.000162 Epoch: 25 Global Step: 529010 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:33:13,857-Speed 2493.55 samples/sec Loss 1.6391 LearningRate 0.000162 Epoch: 25 Global Step: 529020 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:33:26,961-Speed 1570.79 samples/sec Loss 1.6804 LearningRate 0.000162 Epoch: 25 Global Step: 529030 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:33:35,194-Speed 2499.38 samples/sec Loss 1.6582 LearningRate 0.000162 Epoch: 25 Global Step: 529040 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:33:43,429-Speed 2499.28 samples/sec Loss 1.6568 LearningRate 0.000162 Epoch: 25 Global Step: 529050 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:33:55,536-Speed 1691.72 samples/sec Loss 1.6444 LearningRate 0.000162 Epoch: 25 Global Step: 529060 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:34:03,730-Speed 2499.76 samples/sec Loss 1.6354 LearningRate 0.000162 Epoch: 25 Global Step: 529070 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:34:11,965-Speed 2497.10 samples/sec Loss 1.6677 LearningRate 0.000162 Epoch: 25 Global Step: 529080 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:34:20,436-Speed 2419.80 samples/sec Loss 1.6342 LearningRate 0.000162 Epoch: 25 Global Step: 529090 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:34:30,271-Speed 2082.51 samples/sec Loss 1.6636 LearningRate 0.000162 Epoch: 25 Global Step: 529100 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:34:38,467-Speed 2499.10 samples/sec Loss 1.6569 LearningRate 0.000162 Epoch: 25 Global Step: 529110 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:34:47,359-Speed 2311.83 samples/sec Loss 1.6299 LearningRate 0.000162 Epoch: 25 Global Step: 529120 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:35:00,165-Speed 1821.25 samples/sec Loss 1.6434 LearningRate 0.000162 Epoch: 25 Global Step: 529130 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:35:08,356-Speed 2500.75 samples/sec Loss 1.6746 LearningRate 0.000162 Epoch: 25 Global Step: 529140 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:35:16,522-Speed 2516.32 samples/sec Loss 1.6735 LearningRate 0.000162 Epoch: 25 Global Step: 529150 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:35:28,785-Speed 2500.73 samples/sec Loss 1.6775 LearningRate 0.000162 Epoch: 25 Global Step: 529160 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:35:37,005-Speed 2499.41 samples/sec Loss 1.6353 LearningRate 0.000162 Epoch: 25 Global Step: 529170 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:35:47,993-Speed 1874.47 samples/sec Loss 1.6366 LearningRate 0.000162 Epoch: 25 Global Step: 529180 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:35:57,046-Speed 2500.14 samples/sec Loss 1.6811 LearningRate 0.000162 Epoch: 25 Global Step: 529190 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:36:05,242-Speed 2499.12 samples/sec Loss 1.6823 LearningRate 0.000162 Epoch: 25 Global Step: 529200 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:36:13,459-Speed 2501.11 samples/sec Loss 1.6437 LearningRate 0.000162 Epoch: 25 Global Step: 529210 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:36:23,658-Speed 2501.35 samples/sec Loss 1.6522 LearningRate 0.000162 Epoch: 25 Global Step: 529220 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:36:31,856-Speed 2498.51 samples/sec Loss 1.6395 LearningRate 0.000162 Epoch: 25 Global Step: 529230 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:36:40,059-Speed 2496.93 samples/sec Loss 1.6582 LearningRate 0.000162 Epoch: 25 Global Step: 529240 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:36:48,261-Speed 2497.24 samples/sec Loss 1.6782 LearningRate 0.000162 Epoch: 25 Global Step: 529250 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-07-10 15:36:56,423-Speed 2510.24 samples/sec Loss 1.6363 LearningRate 0.000162 Epoch: 25 Global Step: 529260 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:37:04,575-Speed 2512.38 samples/sec Loss 1.6431 LearningRate 0.000162 Epoch: 25 Global Step: 529270 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:37:12,776-Speed 2498.02 samples/sec Loss 1.6709 LearningRate 0.000162 Epoch: 25 Global Step: 529280 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:37:20,988-Speed 2494.16 samples/sec Loss 1.6917 LearningRate 0.000162 Epoch: 25 Global Step: 529290 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:37:29,201-Speed 2493.85 samples/sec Loss 1.6462 LearningRate 0.000162 Epoch: 25 Global Step: 529300 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:37:37,401-Speed 2498.08 samples/sec Loss 1.6566 LearningRate 0.000162 Epoch: 25 Global Step: 529310 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:37:45,613-Speed 2494.31 samples/sec Loss 1.6589 LearningRate 0.000162 Epoch: 25 Global Step: 529320 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:37:53,772-Speed 2510.48 samples/sec Loss 1.6779 LearningRate 0.000162 Epoch: 25 Global Step: 529330 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:38:01,979-Speed 2496.00 samples/sec Loss 1.6514 LearningRate 0.000162 Epoch: 25 Global Step: 529340 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:38:10,184-Speed 2496.34 samples/sec Loss 1.6428 LearningRate 0.000162 Epoch: 25 Global Step: 529350 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:38:18,389-Speed 2496.63 samples/sec Loss 1.6601 LearningRate 0.000162 Epoch: 25 Global Step: 529360 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:38:26,593-Speed 2497.08 samples/sec Loss 1.6711 LearningRate 0.000162 Epoch: 25 Global Step: 529370 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:38:34,795-Speed 2497.30 samples/sec Loss 1.6360 LearningRate 0.000162 Epoch: 25 Global Step: 529380 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:38:42,944-Speed 2513.32 samples/sec Loss 1.7004 LearningRate 0.000162 Epoch: 25 Global Step: 529390 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:38:51,148-Speed 2496.81 samples/sec Loss 1.6772 LearningRate 0.000162 Epoch: 25 Global Step: 529400 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:38:59,363-Speed 2493.21 samples/sec Loss 1.6767 LearningRate 0.000162 Epoch: 25 Global Step: 529410 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:39:07,569-Speed 2496.28 samples/sec Loss 1.6704 LearningRate 0.000162 Epoch: 25 Global Step: 529420 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:39:15,774-Speed 2496.69 samples/sec Loss 1.6592 LearningRate 0.000162 Epoch: 25 Global Step: 529430 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:39:23,975-Speed 2497.84 samples/sec Loss 1.6439 LearningRate 0.000162 Epoch: 25 Global Step: 529440 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:39:32,125-Speed 2513.23 samples/sec Loss 1.6947 LearningRate 0.000162 Epoch: 25 Global Step: 529450 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:39:40,331-Speed 2495.89 samples/sec Loss 1.6814 LearningRate 0.000162 Epoch: 25 Global Step: 529460 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:39:48,553-Speed 2491.80 samples/sec Loss 1.6730 LearningRate 0.000162 Epoch: 25 Global Step: 529470 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:39:56,751-Speed 2498.67 samples/sec Loss 1.6767 LearningRate 0.000162 Epoch: 25 Global Step: 529480 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:40:04,951-Speed 2497.93 samples/sec Loss 1.6304 LearningRate 0.000162 Epoch: 25 Global Step: 529490 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:40:13,154-Speed 2497.02 samples/sec Loss 1.6884 LearningRate 0.000162 Epoch: 25 Global Step: 529500 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:40:21,299-Speed 2514.96 samples/sec Loss 1.6489 LearningRate 0.000162 Epoch: 25 Global Step: 529510 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:40:29,501-Speed 2497.29 samples/sec Loss 1.6689 LearningRate 0.000162 Epoch: 25 Global Step: 529520 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:40:37,717-Speed 2492.98 samples/sec Loss 1.6741 LearningRate 0.000161 Epoch: 25 Global Step: 529530 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:40:45,919-Speed 2497.53 samples/sec Loss 1.6547 LearningRate 0.000161 Epoch: 25 Global Step: 529540 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:40:54,123-Speed 2496.99 samples/sec Loss 1.6806 LearningRate 0.000161 Epoch: 25 Global Step: 529550 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:41:02,322-Speed 2498.07 samples/sec Loss 1.6796 LearningRate 0.000161 Epoch: 25 Global Step: 529560 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:41:10,471-Speed 2513.75 samples/sec Loss 1.6445 LearningRate 0.000161 Epoch: 25 Global Step: 529570 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:41:18,675-Speed 2496.66 samples/sec Loss 1.6496 LearningRate 0.000161 Epoch: 25 Global Step: 529580 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:41:26,875-Speed 2498.05 samples/sec Loss 1.6394 LearningRate 0.000161 Epoch: 25 Global Step: 529590 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:41:35,078-Speed 2497.05 samples/sec Loss 1.6620 LearningRate 0.000161 Epoch: 25 Global Step: 529600 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:41:43,282-Speed 2496.66 samples/sec Loss 1.6618 LearningRate 0.000161 Epoch: 25 Global Step: 529610 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:41:51,483-Speed 2498.15 samples/sec Loss 1.6551 LearningRate 0.000161 Epoch: 25 Global Step: 529620 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:41:59,634-Speed 2512.99 samples/sec Loss 1.6388 LearningRate 0.000161 Epoch: 25 Global Step: 529630 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:42:07,837-Speed 2496.91 samples/sec Loss 1.6571 LearningRate 0.000161 Epoch: 25 Global Step: 529640 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:42:16,038-Speed 2497.74 samples/sec Loss 1.6238 LearningRate 0.000161 Epoch: 25 Global Step: 529650 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:42:24,240-Speed 2497.18 samples/sec Loss 1.6847 LearningRate 0.000161 Epoch: 25 Global Step: 529660 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:42:32,445-Speed 2496.51 samples/sec Loss 1.6277 LearningRate 0.000161 Epoch: 25 Global Step: 529670 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:42:40,651-Speed 2496.25 samples/sec Loss 1.6331 LearningRate 0.000161 Epoch: 25 Global Step: 529680 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:42:48,801-Speed 2513.25 samples/sec Loss 1.5935 LearningRate 0.000161 Epoch: 25 Global Step: 529690 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:42:57,000-Speed 2498.36 samples/sec Loss 1.6537 LearningRate 0.000161 Epoch: 25 Global Step: 529700 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:43:05,210-Speed 2494.84 samples/sec Loss 1.6316 LearningRate 0.000161 Epoch: 25 Global Step: 529710 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:43:13,413-Speed 2497.12 samples/sec Loss 1.6071 LearningRate 0.000161 Epoch: 25 Global Step: 529720 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:43:21,610-Speed 2498.68 samples/sec Loss 1.6371 LearningRate 0.000161 Epoch: 25 Global Step: 529730 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:43:29,810-Speed 2498.03 samples/sec Loss 1.6363 LearningRate 0.000161 Epoch: 25 Global Step: 529740 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:43:37,971-Speed 2509.89 samples/sec Loss 1.6605 LearningRate 0.000161 Epoch: 25 Global Step: 529750 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:43:46,173-Speed 2497.73 samples/sec Loss 1.6636 LearningRate 0.000161 Epoch: 25 Global Step: 529760 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:43:54,373-Speed 2497.98 samples/sec Loss 1.6352 LearningRate 0.000161 Epoch: 25 Global Step: 529770 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:44:02,569-Speed 2499.20 samples/sec Loss 1.6399 LearningRate 0.000161 Epoch: 25 Global Step: 529780 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:44:10,768-Speed 2498.27 samples/sec Loss 1.6285 LearningRate 0.000161 Epoch: 25 Global Step: 529790 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:44:18,970-Speed 2497.29 samples/sec Loss 1.6358 LearningRate 0.000161 Epoch: 25 Global Step: 529800 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:44:27,116-Speed 2514.52 samples/sec Loss 1.6481 LearningRate 0.000161 Epoch: 25 Global Step: 529810 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:44:35,318-Speed 2497.49 samples/sec Loss 1.6270 LearningRate 0.000161 Epoch: 25 Global Step: 529820 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:44:43,519-Speed 2497.70 samples/sec Loss 1.6361 LearningRate 0.000161 Epoch: 25 Global Step: 529830 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:44:51,721-Speed 2497.42 samples/sec Loss 1.6655 LearningRate 0.000161 Epoch: 25 Global Step: 529840 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:44:59,920-Speed 2498.29 samples/sec Loss 1.6226 LearningRate 0.000161 Epoch: 25 Global Step: 529850 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:45:08,122-Speed 2497.32 samples/sec Loss 1.6479 LearningRate 0.000161 Epoch: 25 Global Step: 529860 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:45:16,268-Speed 2514.58 samples/sec Loss 1.5995 LearningRate 0.000161 Epoch: 25 Global Step: 529870 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:45:24,465-Speed 2498.65 samples/sec Loss 1.6528 LearningRate 0.000161 Epoch: 25 Global Step: 529880 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:45:32,668-Speed 2497.12 samples/sec Loss 1.6458 LearningRate 0.000161 Epoch: 25 Global Step: 529890 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:45:40,867-Speed 2498.17 samples/sec Loss 1.6712 LearningRate 0.000161 Epoch: 25 Global Step: 529900 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:45:49,069-Speed 2497.34 samples/sec Loss 1.6837 LearningRate 0.000161 Epoch: 25 Global Step: 529910 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:45:57,270-Speed 2497.91 samples/sec Loss 1.6587 LearningRate 0.000161 Epoch: 25 Global Step: 529920 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:46:05,423-Speed 2512.16 samples/sec Loss 1.6730 LearningRate 0.000161 Epoch: 25 Global Step: 529930 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:46:13,620-Speed 2498.73 samples/sec Loss 1.6661 LearningRate 0.000161 Epoch: 25 Global Step: 529940 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:46:21,841-Speed 2492.05 samples/sec Loss 1.6165 LearningRate 0.000161 Epoch: 25 Global Step: 529950 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:46:30,042-Speed 2497.38 samples/sec Loss 1.6770 LearningRate 0.000161 Epoch: 25 Global Step: 529960 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:46:38,239-Speed 2498.80 samples/sec Loss 1.6539 LearningRate 0.000161 Epoch: 25 Global Step: 529970 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:46:46,444-Speed 2496.47 samples/sec Loss 1.6147 LearningRate 0.000161 Epoch: 25 Global Step: 529980 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:46:54,609-Speed 2508.58 samples/sec Loss 1.6659 LearningRate 0.000161 Epoch: 25 Global Step: 529990 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:47:02,810-Speed 2497.64 samples/sec Loss 1.6339 LearningRate 0.000161 Epoch: 25 Global Step: 530000 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:47:11,021-Speed 2494.84 samples/sec Loss 1.6771 LearningRate 0.000161 Epoch: 25 Global Step: 530010 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:47:19,225-Speed 2496.63 samples/sec Loss 1.6402 LearningRate 0.000161 Epoch: 25 Global Step: 530020 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:47:27,426-Speed 2497.35 samples/sec Loss 1.5932 LearningRate 0.000161 Epoch: 25 Global Step: 530030 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:47:35,624-Speed 2498.50 samples/sec Loss 1.6405 LearningRate 0.000161 Epoch: 25 Global Step: 530040 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:47:43,771-Speed 2514.47 samples/sec Loss 1.6742 LearningRate 0.000161 Epoch: 25 Global Step: 530050 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:47:51,979-Speed 2495.49 samples/sec Loss 1.6644 LearningRate 0.000161 Epoch: 25 Global Step: 530060 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:48:00,195-Speed 2492.96 samples/sec Loss 1.6676 LearningRate 0.000161 Epoch: 25 Global Step: 530070 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:48:08,395-Speed 2497.99 samples/sec Loss 1.6365 LearningRate 0.000161 Epoch: 25 Global Step: 530080 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:48:16,598-Speed 2497.06 samples/sec Loss 1.6353 LearningRate 0.000161 Epoch: 25 Global Step: 530090 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:48:24,800-Speed 2497.41 samples/sec Loss 1.6050 LearningRate 0.000161 Epoch: 25 Global Step: 530100 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:48:32,946-Speed 2514.99 samples/sec Loss 1.6547 LearningRate 0.000161 Epoch: 25 Global Step: 530110 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:48:41,153-Speed 2495.89 samples/sec Loss 1.6690 LearningRate 0.000161 Epoch: 25 Global Step: 530120 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:48:49,369-Speed 2493.14 samples/sec Loss 1.6432 LearningRate 0.000161 Epoch: 25 Global Step: 530130 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:48:57,574-Speed 2496.51 samples/sec Loss 1.6403 LearningRate 0.000161 Epoch: 25 Global Step: 530140 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:49:05,772-Speed 2498.50 samples/sec Loss 1.6566 LearningRate 0.000161 Epoch: 25 Global Step: 530150 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:49:13,974-Speed 2497.32 samples/sec Loss 1.6629 LearningRate 0.000161 Epoch: 25 Global Step: 530160 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:49:22,122-Speed 2514.02 samples/sec Loss 1.6423 LearningRate 0.000161 Epoch: 25 Global Step: 530170 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:49:30,329-Speed 2495.71 samples/sec Loss 1.6656 LearningRate 0.000161 Epoch: 25 Global Step: 530180 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:49:38,532-Speed 2497.04 samples/sec Loss 1.6404 LearningRate 0.000161 Epoch: 25 Global Step: 530190 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:49:46,751-Speed 2492.99 samples/sec Loss 1.6537 LearningRate 0.000161 Epoch: 25 Global Step: 530200 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:49:54,965-Speed 2493.63 samples/sec Loss 1.6616 LearningRate 0.000161 Epoch: 25 Global Step: 530210 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:50:03,176-Speed 2494.32 samples/sec Loss 1.6697 LearningRate 0.000161 Epoch: 25 Global Step: 530220 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:50:11,321-Speed 2514.88 samples/sec Loss 1.6581 LearningRate 0.000161 Epoch: 25 Global Step: 530230 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:50:19,523-Speed 2497.29 samples/sec Loss 1.6300 LearningRate 0.000161 Epoch: 25 Global Step: 530240 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:50:27,742-Speed 2492.35 samples/sec Loss 1.6434 LearningRate 0.000161 Epoch: 25 Global Step: 530250 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:50:35,940-Speed 2498.65 samples/sec Loss 1.6676 LearningRate 0.000161 Epoch: 25 Global Step: 530260 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:50:44,145-Speed 2496.41 samples/sec Loss 1.6813 LearningRate 0.000161 Epoch: 25 Global Step: 530270 Fp16 Grad Scale: 16384 Required: 69 hours 
Training: 2022-07-10 15:50:52,343-Speed 2498.45 samples/sec Loss 1.6545 LearningRate 0.000161 Epoch: 25 Global Step: 530280 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:51:00,489-Speed 2514.57 samples/sec Loss 1.6796 LearningRate 0.000161 Epoch: 25 Global Step: 530290 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:51:08,687-Speed 2498.42 samples/sec Loss 1.6255 LearningRate 0.000161 Epoch: 25 Global Step: 530300 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:51:16,886-Speed 2498.43 samples/sec Loss 1.6974 LearningRate 0.000161 Epoch: 25 Global Step: 530310 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:51:25,093-Speed 2495.90 samples/sec Loss 1.6539 LearningRate 0.000161 Epoch: 25 Global Step: 530320 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:51:33,292-Speed 2498.56 samples/sec Loss 1.6775 LearningRate 0.000161 Epoch: 25 Global Step: 530330 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:51:41,506-Speed 2493.50 samples/sec Loss 1.6573 LearningRate 0.000161 Epoch: 25 Global Step: 530340 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:51:49,652-Speed 2514.75 samples/sec Loss 1.6795 LearningRate 0.000161 Epoch: 25 Global Step: 530350 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:51:57,857-Speed 2496.44 samples/sec Loss 1.6560 LearningRate 0.000161 Epoch: 25 Global Step: 530360 Fp16 Grad Scale: 16384 Required: 69 hours Training: 2022-07-10 15:52:06,061-Speed 2496.71 samples/sec Loss 1.6171 LearningRate 0.000161 Epoch: 25 Global Step: 530370 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 15:52:14,269-Speed 2495.77 samples/sec Loss 1.6538 LearningRate 0.000161 Epoch: 25 Global Step: 530380 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 15:52:22,472-Speed 2497.14 samples/sec Loss 1.6250 LearningRate 0.000161 Epoch: 25 Global Step: 530390 Fp16 Grad Scale: 16384 Required: 68 hours 
Training: 2022-07-10 15:52:30,671-Speed 2498.37 samples/sec Loss 1.6686 LearningRate 0.000161 Epoch: 25 Global Step: 530400 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 15:52:38,822-Speed 2513.09 samples/sec Loss 1.6759 LearningRate 0.000161 Epoch: 25 Global Step: 530410 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 15:52:47,021-Speed 2498.26 samples/sec Loss 1.6735 LearningRate 0.000161 Epoch: 25 Global Step: 530420 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 15:52:55,220-Speed 2498.07 samples/sec Loss 1.6464 LearningRate 0.000161 Epoch: 25 Global Step: 530430 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 15:53:03,420-Speed 2498.64 samples/sec Loss 1.6695 LearningRate 0.000161 Epoch: 25 Global Step: 530440 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 15:53:11,624-Speed 2496.75 samples/sec Loss 1.6553 LearningRate 0.000161 Epoch: 25 Global Step: 530450 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 15:53:19,828-Speed 2496.55 samples/sec Loss 1.6093 LearningRate 0.000160 Epoch: 25 Global Step: 530460 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:53:27,989-Speed 2509.94 samples/sec Loss 1.6407 LearningRate 0.000160 Epoch: 25 Global Step: 530470 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:53:36,191-Speed 2497.33 samples/sec Loss 1.6564 LearningRate 0.000160 Epoch: 25 Global Step: 530480 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:53:44,391-Speed 2498.02 samples/sec Loss 1.6478 LearningRate 0.000160 Epoch: 25 Global Step: 530490 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:53:52,601-Speed 2494.81 samples/sec Loss 1.6938 LearningRate 0.000160 Epoch: 25 Global Step: 530500 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:54:00,815-Speed 2493.75 samples/sec Loss 1.6323 LearningRate 0.000160 Epoch: 25 Global Step: 530510 Fp16 Grad Scale: 32768 Required: 68 hours 
Training: 2022-07-10 15:54:09,016-Speed 2497.53 samples/sec Loss 1.6458 LearningRate 0.000160 Epoch: 25 Global Step: 530520 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:54:17,167-Speed 2513.20 samples/sec Loss 1.6214 LearningRate 0.000160 Epoch: 25 Global Step: 530530 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:54:25,370-Speed 2496.85 samples/sec Loss 1.6251 LearningRate 0.000160 Epoch: 25 Global Step: 530540 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:54:33,569-Speed 2498.16 samples/sec Loss 1.6075 LearningRate 0.000160 Epoch: 25 Global Step: 530550 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:54:41,785-Speed 2493.14 samples/sec Loss 1.6438 LearningRate 0.000160 Epoch: 25 Global Step: 530560 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:54:49,988-Speed 2497.09 samples/sec Loss 1.6682 LearningRate 0.000160 Epoch: 25 Global Step: 530570 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:54:58,185-Speed 2498.80 samples/sec Loss 1.6675 LearningRate 0.000160 Epoch: 25 Global Step: 530580 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:55:06,333-Speed 2513.91 samples/sec Loss 1.6628 LearningRate 0.000160 Epoch: 25 Global Step: 530590 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:55:14,533-Speed 2498.13 samples/sec Loss 1.6588 LearningRate 0.000160 Epoch: 25 Global Step: 530600 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:55:22,728-Speed 2499.38 samples/sec Loss 1.6552 LearningRate 0.000160 Epoch: 25 Global Step: 530610 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:55:30,940-Speed 2494.20 samples/sec Loss 1.6273 LearningRate 0.000160 Epoch: 25 Global Step: 530620 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:55:39,140-Speed 2497.94 samples/sec Loss 1.6281 LearningRate 0.000160 Epoch: 25 Global Step: 530630 Fp16 Grad Scale: 32768 Required: 68 hours 
Training: 2022-07-10 15:55:47,343-Speed 2497.24 samples/sec Loss 1.6378 LearningRate 0.000160 Epoch: 25 Global Step: 530640 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:55:55,493-Speed 2513.59 samples/sec Loss 1.6350 LearningRate 0.000160 Epoch: 25 Global Step: 530650 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:56:03,690-Speed 2498.82 samples/sec Loss 1.6353 LearningRate 0.000160 Epoch: 25 Global Step: 530660 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:56:11,894-Speed 2496.80 samples/sec Loss 1.6252 LearningRate 0.000160 Epoch: 25 Global Step: 530670 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:56:20,090-Speed 2499.05 samples/sec Loss 1.6181 LearningRate 0.000160 Epoch: 25 Global Step: 530680 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:56:28,289-Speed 2498.26 samples/sec Loss 1.6480 LearningRate 0.000160 Epoch: 25 Global Step: 530690 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:56:36,491-Speed 2497.62 samples/sec Loss 1.6163 LearningRate 0.000160 Epoch: 25 Global Step: 530700 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:56:44,655-Speed 2509.06 samples/sec Loss 1.6574 LearningRate 0.000160 Epoch: 25 Global Step: 530710 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:56:52,859-Speed 2496.85 samples/sec Loss 1.6413 LearningRate 0.000160 Epoch: 25 Global Step: 530720 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:57:01,058-Speed 2498.19 samples/sec Loss 1.6304 LearningRate 0.000160 Epoch: 25 Global Step: 530730 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:57:09,257-Speed 2498.26 samples/sec Loss 1.6493 LearningRate 0.000160 Epoch: 25 Global Step: 530740 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:57:17,468-Speed 2494.42 samples/sec Loss 1.6278 LearningRate 0.000160 Epoch: 25 Global Step: 530750 Fp16 Grad Scale: 32768 Required: 68 hours 
Training: 2022-07-10 15:57:25,672-Speed 2496.88 samples/sec Loss 1.6369 LearningRate 0.000160 Epoch: 25 Global Step: 530760 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:57:33,816-Speed 2515.21 samples/sec Loss 1.6248 LearningRate 0.000160 Epoch: 25 Global Step: 530770 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:57:42,022-Speed 2496.01 samples/sec Loss 1.6567 LearningRate 0.000160 Epoch: 25 Global Step: 530780 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:57:50,223-Speed 2497.60 samples/sec Loss 1.6399 LearningRate 0.000160 Epoch: 25 Global Step: 530790 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:57:58,443-Speed 2492.12 samples/sec Loss 1.6762 LearningRate 0.000160 Epoch: 25 Global Step: 530800 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:58:06,646-Speed 2496.82 samples/sec Loss 1.6757 LearningRate 0.000160 Epoch: 25 Global Step: 530810 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:58:14,849-Speed 2497.23 samples/sec Loss 1.6486 LearningRate 0.000160 Epoch: 25 Global Step: 530820 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:58:23,000-Speed 2512.87 samples/sec Loss 1.6649 LearningRate 0.000160 Epoch: 25 Global Step: 530830 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:58:31,201-Speed 2497.97 samples/sec Loss 1.6901 LearningRate 0.000160 Epoch: 25 Global Step: 530840 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:58:39,403-Speed 2497.14 samples/sec Loss 1.6960 LearningRate 0.000160 Epoch: 25 Global Step: 530850 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:58:47,605-Speed 2497.58 samples/sec Loss 1.6270 LearningRate 0.000160 Epoch: 25 Global Step: 530860 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:58:55,813-Speed 2495.42 samples/sec Loss 1.6224 LearningRate 0.000160 Epoch: 25 Global Step: 530870 Fp16 Grad Scale: 32768 Required: 68 hours 
Training: 2022-07-10 15:59:04,019-Speed 2496.24 samples/sec Loss 1.6299 LearningRate 0.000160 Epoch: 25 Global Step: 530880 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:59:12,171-Speed 2512.74 samples/sec Loss 1.6471 LearningRate 0.000160 Epoch: 25 Global Step: 530890 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:59:20,380-Speed 2495.30 samples/sec Loss 1.6719 LearningRate 0.000160 Epoch: 25 Global Step: 530900 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:59:28,594-Speed 2493.61 samples/sec Loss 1.6017 LearningRate 0.000160 Epoch: 25 Global Step: 530910 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:59:36,797-Speed 2497.11 samples/sec Loss 1.6462 LearningRate 0.000160 Epoch: 25 Global Step: 530920 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:59:45,010-Speed 2493.87 samples/sec Loss 1.6696 LearningRate 0.000160 Epoch: 25 Global Step: 530930 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 15:59:53,211-Speed 2497.65 samples/sec Loss 1.6274 LearningRate 0.000160 Epoch: 25 Global Step: 530940 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:00:01,357-Speed 2514.48 samples/sec Loss 1.6118 LearningRate 0.000160 Epoch: 25 Global Step: 530950 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:00:09,563-Speed 2496.11 samples/sec Loss 1.6237 LearningRate 0.000160 Epoch: 25 Global Step: 530960 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:00:17,764-Speed 2497.68 samples/sec Loss 1.6563 LearningRate 0.000160 Epoch: 25 Global Step: 530970 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:00:25,971-Speed 2495.89 samples/sec Loss 1.6039 LearningRate 0.000160 Epoch: 25 Global Step: 530980 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:00:34,176-Speed 2496.15 samples/sec Loss 1.6936 LearningRate 0.000160 Epoch: 25 Global Step: 530990 Fp16 Grad Scale: 32768 Required: 68 hours 
Training: 2022-07-10 16:00:42,387-Speed 2494.55 samples/sec Loss 1.6362 LearningRate 0.000160 Epoch: 25 Global Step: 531000 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:00:50,536-Speed 2514.00 samples/sec Loss 1.6125 LearningRate 0.000160 Epoch: 25 Global Step: 531010 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:00:58,747-Speed 2494.58 samples/sec Loss 1.6413 LearningRate 0.000160 Epoch: 25 Global Step: 531020 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:01:06,948-Speed 2497.59 samples/sec Loss 1.6945 LearningRate 0.000160 Epoch: 25 Global Step: 531030 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:01:15,147-Speed 2498.38 samples/sec Loss 1.6126 LearningRate 0.000160 Epoch: 25 Global Step: 531040 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:01:23,357-Speed 2494.74 samples/sec Loss 1.6516 LearningRate 0.000160 Epoch: 25 Global Step: 531050 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:01:31,561-Speed 2496.53 samples/sec Loss 1.6628 LearningRate 0.000160 Epoch: 25 Global Step: 531060 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:01:39,710-Speed 2513.86 samples/sec Loss 1.6481 LearningRate 0.000160 Epoch: 25 Global Step: 531070 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:01:47,909-Speed 2498.25 samples/sec Loss 1.6462 LearningRate 0.000160 Epoch: 25 Global Step: 531080 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:01:56,105-Speed 2499.25 samples/sec Loss 1.6787 LearningRate 0.000160 Epoch: 25 Global Step: 531090 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:02:04,315-Speed 2494.61 samples/sec Loss 1.6704 LearningRate 0.000160 Epoch: 25 Global Step: 531100 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:02:12,514-Speed 2498.30 samples/sec Loss 1.6321 LearningRate 0.000160 Epoch: 25 Global Step: 531110 Fp16 Grad Scale: 32768 Required: 68 hours 
Training: 2022-07-10 16:02:20,713-Speed 2498.31 samples/sec Loss 1.6553 LearningRate 0.000160 Epoch: 25 Global Step: 531120 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:02:28,860-Speed 2514.25 samples/sec Loss 1.6507 LearningRate 0.000160 Epoch: 25 Global Step: 531130 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:02:37,063-Speed 2496.98 samples/sec Loss 1.6774 LearningRate 0.000160 Epoch: 25 Global Step: 531140 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:02:45,264-Speed 2497.73 samples/sec Loss 1.6646 LearningRate 0.000160 Epoch: 25 Global Step: 531150 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:02:53,466-Speed 2497.44 samples/sec Loss 1.6343 LearningRate 0.000160 Epoch: 25 Global Step: 531160 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:03:01,671-Speed 2496.28 samples/sec Loss 1.6198 LearningRate 0.000160 Epoch: 25 Global Step: 531170 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:03:09,873-Speed 2497.37 samples/sec Loss 1.6419 LearningRate 0.000160 Epoch: 25 Global Step: 531180 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:03:18,022-Speed 2513.65 samples/sec Loss 1.6769 LearningRate 0.000160 Epoch: 25 Global Step: 531190 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:03:26,221-Speed 2498.38 samples/sec Loss 1.6662 LearningRate 0.000160 Epoch: 25 Global Step: 531200 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:03:34,423-Speed 2497.19 samples/sec Loss 1.6536 LearningRate 0.000160 Epoch: 25 Global Step: 531210 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:03:42,624-Speed 2497.72 samples/sec Loss 1.6087 LearningRate 0.000160 Epoch: 25 Global Step: 531220 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:03:50,828-Speed 2496.73 samples/sec Loss 1.6696 LearningRate 0.000160 Epoch: 25 Global Step: 531230 Fp16 Grad Scale: 32768 Required: 68 hours 
Training: 2022-07-10 16:03:59,041-Speed 2493.99 samples/sec Loss 1.6578 LearningRate 0.000160 Epoch: 25 Global Step: 531240 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:04:07,193-Speed 2512.65 samples/sec Loss 1.6306 LearningRate 0.000160 Epoch: 25 Global Step: 531250 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:04:15,391-Speed 2498.51 samples/sec Loss 1.6632 LearningRate 0.000160 Epoch: 25 Global Step: 531260 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:04:23,591-Speed 2498.12 samples/sec Loss 1.6175 LearningRate 0.000160 Epoch: 25 Global Step: 531270 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:04:31,799-Speed 2495.75 samples/sec Loss 1.6617 LearningRate 0.000160 Epoch: 25 Global Step: 531280 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:04:40,004-Speed 2496.48 samples/sec Loss 1.6310 LearningRate 0.000160 Epoch: 25 Global Step: 531290 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:04:48,201-Speed 2499.08 samples/sec Loss 1.6700 LearningRate 0.000160 Epoch: 25 Global Step: 531300 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:04:56,344-Speed 2515.50 samples/sec Loss 1.6377 LearningRate 0.000160 Epoch: 25 Global Step: 531310 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:05:04,552-Speed 2495.31 samples/sec Loss 1.6201 LearningRate 0.000160 Epoch: 25 Global Step: 531320 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:05:12,756-Speed 2496.80 samples/sec Loss 1.6325 LearningRate 0.000160 Epoch: 25 Global Step: 531330 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:05:20,955-Speed 2498.26 samples/sec Loss 1.6560 LearningRate 0.000160 Epoch: 25 Global Step: 531340 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:05:29,166-Speed 2494.64 samples/sec Loss 1.6191 LearningRate 0.000160 Epoch: 25 Global Step: 531350 Fp16 Grad Scale: 32768 Required: 68 hours 
Training: 2022-07-10 16:05:37,367-Speed 2498.02 samples/sec Loss 1.6163 LearningRate 0.000160 Epoch: 25 Global Step: 531360 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:05:45,513-Speed 2514.46 samples/sec Loss 1.6398 LearningRate 0.000160 Epoch: 25 Global Step: 531370 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:05:53,716-Speed 2497.05 samples/sec Loss 1.6234 LearningRate 0.000160 Epoch: 25 Global Step: 531380 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:06:01,917-Speed 2497.77 samples/sec Loss 1.6135 LearningRate 0.000159 Epoch: 25 Global Step: 531390 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:06:10,119-Speed 2497.22 samples/sec Loss 1.6797 LearningRate 0.000159 Epoch: 25 Global Step: 531400 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:06:18,323-Speed 2497.08 samples/sec Loss 1.6300 LearningRate 0.000159 Epoch: 25 Global Step: 531410 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:06:26,531-Speed 2495.97 samples/sec Loss 1.6587 LearningRate 0.000159 Epoch: 25 Global Step: 531420 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:06:34,683-Speed 2512.59 samples/sec Loss 1.6542 LearningRate 0.000159 Epoch: 25 Global Step: 531430 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:06:42,884-Speed 2497.51 samples/sec Loss 1.6546 LearningRate 0.000159 Epoch: 25 Global Step: 531440 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:06:51,092-Speed 2495.58 samples/sec Loss 1.6515 LearningRate 0.000159 Epoch: 25 Global Step: 531450 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:06:59,300-Speed 2495.55 samples/sec Loss 1.6468 LearningRate 0.000159 Epoch: 25 Global Step: 531460 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:07:07,511-Speed 2494.79 samples/sec Loss 1.6519 LearningRate 0.000159 Epoch: 25 Global Step: 531470 Fp16 Grad Scale: 32768 Required: 68 hours 
Training: 2022-07-10 16:07:15,714-Speed 2497.09 samples/sec Loss 1.6258 LearningRate 0.000159 Epoch: 25 Global Step: 531480 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:07:23,858-Speed 2515.29 samples/sec Loss 1.6599 LearningRate 0.000159 Epoch: 25 Global Step: 531490 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:07:32,058-Speed 2498.02 samples/sec Loss 1.6357 LearningRate 0.000159 Epoch: 25 Global Step: 531500 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:07:40,260-Speed 2497.11 samples/sec Loss 1.6751 LearningRate 0.000159 Epoch: 25 Global Step: 531510 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:07:48,470-Speed 2495.05 samples/sec Loss 1.6529 LearningRate 0.000159 Epoch: 25 Global Step: 531520 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:07:56,673-Speed 2497.26 samples/sec Loss 1.6366 LearningRate 0.000159 Epoch: 25 Global Step: 531530 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:08:04,878-Speed 2496.63 samples/sec Loss 1.6561 LearningRate 0.000159 Epoch: 25 Global Step: 531540 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:08:13,039-Speed 2509.65 samples/sec Loss 1.6438 LearningRate 0.000159 Epoch: 25 Global Step: 531550 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:08:21,240-Speed 2497.56 samples/sec Loss 1.6062 LearningRate 0.000159 Epoch: 25 Global Step: 531560 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:08:29,440-Speed 2498.18 samples/sec Loss 1.6618 LearningRate 0.000159 Epoch: 25 Global Step: 531570 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:08:37,639-Speed 2498.40 samples/sec Loss 1.6346 LearningRate 0.000159 Epoch: 25 Global Step: 531580 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:08:45,838-Speed 2498.18 samples/sec Loss 1.6610 LearningRate 0.000159 Epoch: 25 Global Step: 531590 Fp16 Grad Scale: 32768 Required: 68 hours 
Training: 2022-07-10 16:08:54,046-Speed 2495.47 samples/sec Loss 1.6019 LearningRate 0.000159 Epoch: 25 Global Step: 531600 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:09:02,195-Speed 2513.61 samples/sec Loss 1.6418 LearningRate 0.000159 Epoch: 25 Global Step: 531610 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:09:10,400-Speed 2496.59 samples/sec Loss 1.6586 LearningRate 0.000159 Epoch: 25 Global Step: 531620 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:09:18,602-Speed 2497.25 samples/sec Loss 1.6530 LearningRate 0.000159 Epoch: 25 Global Step: 531630 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:09:26,804-Speed 2497.25 samples/sec Loss 1.6559 LearningRate 0.000159 Epoch: 25 Global Step: 531640 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:09:35,004-Speed 2498.00 samples/sec Loss 1.6841 LearningRate 0.000159 Epoch: 25 Global Step: 531650 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:09:43,215-Speed 2495.06 samples/sec Loss 1.5910 LearningRate 0.000159 Epoch: 25 Global Step: 531660 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:09:51,359-Speed 2514.83 samples/sec Loss 1.6760 LearningRate 0.000159 Epoch: 25 Global Step: 531670 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:09:59,556-Speed 2498.95 samples/sec Loss 1.6595 LearningRate 0.000159 Epoch: 25 Global Step: 531680 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:10:07,756-Speed 2498.05 samples/sec Loss 1.6184 LearningRate 0.000159 Epoch: 25 Global Step: 531690 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:10:15,957-Speed 2497.45 samples/sec Loss 1.6313 LearningRate 0.000159 Epoch: 25 Global Step: 531700 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:10:24,157-Speed 2498.15 samples/sec Loss 1.6641 LearningRate 0.000159 Epoch: 25 Global Step: 531710 Fp16 Grad Scale: 65536 Required: 68 hours 
Training: 2022-07-10 16:10:32,363-Speed 2496.21 samples/sec Loss 1.6515 LearningRate 0.000159 Epoch: 25 Global Step: 531720 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:10:40,511-Speed 2513.96 samples/sec Loss 1.6318 LearningRate 0.000159 Epoch: 25 Global Step: 531730 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:10:48,713-Speed 2497.49 samples/sec Loss 1.6407 LearningRate 0.000159 Epoch: 25 Global Step: 531740 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:10:56,912-Speed 2498.34 samples/sec Loss 1.6135 LearningRate 0.000159 Epoch: 25 Global Step: 531750 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:11:05,116-Speed 2496.81 samples/sec Loss 1.6342 LearningRate 0.000159 Epoch: 25 Global Step: 531760 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:11:13,318-Speed 2497.26 samples/sec Loss 1.6310 LearningRate 0.000159 Epoch: 25 Global Step: 531770 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:11:21,528-Speed 2494.80 samples/sec Loss 1.6270 LearningRate 0.000159 Epoch: 25 Global Step: 531780 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:11:29,675-Speed 2514.35 samples/sec Loss 1.6230 LearningRate 0.000159 Epoch: 25 Global Step: 531790 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:11:37,876-Speed 2497.70 samples/sec Loss 1.6631 LearningRate 0.000159 Epoch: 25 Global Step: 531800 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:11:46,078-Speed 2497.15 samples/sec Loss 1.6473 LearningRate 0.000159 Epoch: 25 Global Step: 531810 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:11:54,248-Speed 2507.24 samples/sec Loss 1.6659 LearningRate 0.000159 Epoch: 25 Global Step: 531820 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:12:02,450-Speed 2497.35 samples/sec Loss 1.6888 LearningRate 0.000159 Epoch: 25 Global Step: 531830 Fp16 Grad Scale: 32768 Required: 68 hours 
Training: 2022-07-10 16:12:10,650-Speed 2497.87 samples/sec Loss 1.6562 LearningRate 0.000159 Epoch: 25 Global Step: 531840 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:12:18,796-Speed 2514.54 samples/sec Loss 1.6892 LearningRate 0.000159 Epoch: 25 Global Step: 531850 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:12:26,948-Speed 2512.69 samples/sec Loss 1.6974 LearningRate 0.000159 Epoch: 25 Global Step: 531860 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:12:35,148-Speed 2498.20 samples/sec Loss 1.6690 LearningRate 0.000159 Epoch: 25 Global Step: 531870 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:12:43,353-Speed 2496.78 samples/sec Loss 1.6379 LearningRate 0.000159 Epoch: 25 Global Step: 531880 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:12:51,558-Speed 2497.05 samples/sec Loss 1.6184 LearningRate 0.000159 Epoch: 25 Global Step: 531890 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:12:59,753-Speed 2499.26 samples/sec Loss 1.6142 LearningRate 0.000159 Epoch: 25 Global Step: 531900 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:13:07,900-Speed 2514.19 samples/sec Loss 1.6480 LearningRate 0.000159 Epoch: 25 Global Step: 531910 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:13:16,099-Speed 2498.43 samples/sec Loss 1.6118 LearningRate 0.000159 Epoch: 25 Global Step: 531920 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:13:24,298-Speed 2498.46 samples/sec Loss 1.6206 LearningRate 0.000159 Epoch: 25 Global Step: 531930 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:13:32,495-Speed 2498.86 samples/sec Loss 1.6215 LearningRate 0.000159 Epoch: 25 Global Step: 531940 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:13:40,699-Speed 2496.88 samples/sec Loss 1.6381 LearningRate 0.000159 Epoch: 25 Global Step: 531950 Fp16 Grad Scale: 16384 Required: 68 hours 
Training: 2022-07-10 16:13:48,909-Speed 2494.95 samples/sec Loss 1.6304 LearningRate 0.000159 Epoch: 25 Global Step: 531960 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:13:57,054-Speed 2514.56 samples/sec Loss 1.6667 LearningRate 0.000159 Epoch: 25 Global Step: 531970 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:14:05,255-Speed 2497.57 samples/sec Loss 1.6433 LearningRate 0.000159 Epoch: 25 Global Step: 531980 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:14:13,453-Speed 2498.70 samples/sec Loss 1.6128 LearningRate 0.000159 Epoch: 25 Global Step: 531990 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:14:21,655-Speed 2497.40 samples/sec Loss 1.6332 LearningRate 0.000159 Epoch: 25 Global Step: 532000 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:14:29,857-Speed 2497.38 samples/sec Loss 1.6566 LearningRate 0.000159 Epoch: 25 Global Step: 532010 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:14:38,057-Speed 2497.83 samples/sec Loss 1.6821 LearningRate 0.000159 Epoch: 25 Global Step: 532020 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:14:46,205-Speed 2514.08 samples/sec Loss 1.6499 LearningRate 0.000159 Epoch: 25 Global Step: 532030 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:14:54,411-Speed 2496.17 samples/sec Loss 1.6755 LearningRate 0.000159 Epoch: 25 Global Step: 532040 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:15:02,621-Speed 2495.07 samples/sec Loss 1.6101 LearningRate 0.000159 Epoch: 25 Global Step: 532050 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:15:10,823-Speed 2497.18 samples/sec Loss 1.6274 LearningRate 0.000159 Epoch: 25 Global Step: 532060 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:15:19,022-Speed 2498.45 samples/sec Loss 1.7029 LearningRate 0.000159 Epoch: 25 Global Step: 532070 Fp16 Grad Scale: 16384 Required: 68 hours 
Training: 2022-07-10 16:15:27,220-Speed 2498.51 samples/sec Loss 1.6085 LearningRate 0.000159 Epoch: 25 Global Step: 532080 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:15:35,368-Speed 2513.75 samples/sec Loss 1.6043 LearningRate 0.000159 Epoch: 25 Global Step: 532090 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:15:43,568-Speed 2498.01 samples/sec Loss 1.6282 LearningRate 0.000159 Epoch: 25 Global Step: 532100 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:15:51,766-Speed 2498.39 samples/sec Loss 1.6595 LearningRate 0.000159 Epoch: 25 Global Step: 532110 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:15:59,978-Speed 2494.25 samples/sec Loss 1.6358 LearningRate 0.000159 Epoch: 25 Global Step: 532120 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:16:08,177-Speed 2498.26 samples/sec Loss 1.5966 LearningRate 0.000159 Epoch: 25 Global Step: 532130 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:16:16,381-Speed 2496.91 samples/sec Loss 1.6960 LearningRate 0.000159 Epoch: 25 Global Step: 532140 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:16:24,538-Speed 2511.08 samples/sec Loss 1.6404 LearningRate 0.000159 Epoch: 25 Global Step: 532150 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:16:32,741-Speed 2496.90 samples/sec Loss 1.6448 LearningRate 0.000159 Epoch: 25 Global Step: 532160 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:16:40,940-Speed 2498.29 samples/sec Loss 1.6452 LearningRate 0.000159 Epoch: 25 Global Step: 532170 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:16:49,139-Speed 2498.06 samples/sec Loss 1.6767 LearningRate 0.000159 Epoch: 25 Global Step: 532180 Fp16 Grad Scale: 16384 Required: 68 hours Training: 2022-07-10 16:16:57,337-Speed 2498.62 samples/sec Loss 1.6441 LearningRate 0.000159 Epoch: 25 Global Step: 532190 Fp16 Grad Scale: 16384 Required: 68 hours 
Training: 2022-07-10 16:17:05,536-Speed 2498.62 samples/sec Loss 1.6632 LearningRate 0.000159 Epoch: 25 Global Step: 532200 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:17:13,686-Speed 2513.14 samples/sec Loss 1.6942 LearningRate 0.000159 Epoch: 25 Global Step: 532210 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:17:21,887-Speed 2497.72 samples/sec Loss 1.6198 LearningRate 0.000159 Epoch: 25 Global Step: 532220 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:17:30,086-Speed 2498.16 samples/sec Loss 1.6340 LearningRate 0.000159 Epoch: 25 Global Step: 532230 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:17:38,284-Speed 2498.90 samples/sec Loss 1.6691 LearningRate 0.000159 Epoch: 25 Global Step: 532240 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:17:46,485-Speed 2497.49 samples/sec Loss 1.6732 LearningRate 0.000159 Epoch: 25 Global Step: 532250 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:17:54,696-Speed 2494.63 samples/sec Loss 1.6424 LearningRate 0.000159 Epoch: 25 Global Step: 532260 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:18:02,856-Speed 2510.10 samples/sec Loss 1.6351 LearningRate 0.000159 Epoch: 25 Global Step: 532270 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:18:11,060-Speed 2496.90 samples/sec Loss 1.6843 LearningRate 0.000159 Epoch: 25 Global Step: 532280 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:18:19,258-Speed 2498.29 samples/sec Loss 1.6458 LearningRate 0.000159 Epoch: 25 Global Step: 532290 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:18:27,459-Speed 2497.71 samples/sec Loss 1.6118 LearningRate 0.000159 Epoch: 25 Global Step: 532300 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:18:35,658-Speed 2498.71 samples/sec Loss 1.6425 LearningRate 0.000159 Epoch: 25 Global Step: 532310 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:18:43,860-Speed 2497.41 samples/sec Loss 1.6251 LearningRate 0.000159 Epoch: 25 Global Step: 532320 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:18:52,006-Speed 2514.40 samples/sec Loss 1.6856 LearningRate 0.000158 Epoch: 25 Global Step: 532330 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:19:00,207-Speed 2497.71 samples/sec Loss 1.6345 LearningRate 0.000158 Epoch: 25 Global Step: 532340 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:19:08,406-Speed 2498.40 samples/sec Loss 1.6703 LearningRate 0.000158 Epoch: 25 Global Step: 532350 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:19:16,608-Speed 2497.37 samples/sec Loss 1.6588 LearningRate 0.000158 Epoch: 25 Global Step: 532360 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:19:24,817-Speed 2494.91 samples/sec Loss 1.6620 LearningRate 0.000158 Epoch: 25 Global Step: 532370 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:19:33,016-Speed 2498.26 samples/sec Loss 1.6548 LearningRate 0.000158 Epoch: 25 Global Step: 532380 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:19:41,165-Speed 2513.59 samples/sec Loss 1.6675 LearningRate 0.000158 Epoch: 25 Global Step: 532390 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:19:49,364-Speed 2498.39 samples/sec Loss 1.6296 LearningRate 0.000158 Epoch: 25 Global Step: 532400 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:19:57,566-Speed 2497.23 samples/sec Loss 1.6650 LearningRate 0.000158 Epoch: 25 Global Step: 532410 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:20:05,762-Speed 2499.18 samples/sec Loss 1.6401 LearningRate 0.000158 Epoch: 25 Global Step: 532420 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:20:13,962-Speed 2498.14 samples/sec Loss 1.6602 LearningRate 0.000158 Epoch: 25 Global Step: 532430 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:20:22,164-Speed 2497.35 samples/sec Loss 1.6741 LearningRate 0.000158 Epoch: 25 Global Step: 532440 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:20:30,311-Speed 2514.12 samples/sec Loss 1.6643 LearningRate 0.000158 Epoch: 25 Global Step: 532450 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:20:38,509-Speed 2498.56 samples/sec Loss 1.6096 LearningRate 0.000158 Epoch: 25 Global Step: 532460 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:20:46,712-Speed 2497.15 samples/sec Loss 1.6614 LearningRate 0.000158 Epoch: 25 Global Step: 532470 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:20:54,918-Speed 2496.22 samples/sec Loss 1.6402 LearningRate 0.000158 Epoch: 25 Global Step: 532480 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:21:03,118-Speed 2497.97 samples/sec Loss 1.6387 LearningRate 0.000158 Epoch: 25 Global Step: 532490 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:21:11,320-Speed 2497.38 samples/sec Loss 1.6776 LearningRate 0.000158 Epoch: 25 Global Step: 532500 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:21:19,469-Speed 2513.75 samples/sec Loss 1.6373 LearningRate 0.000158 Epoch: 25 Global Step: 532510 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:21:27,672-Speed 2497.61 samples/sec Loss 1.6298 LearningRate 0.000158 Epoch: 25 Global Step: 532520 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:21:35,878-Speed 2495.92 samples/sec Loss 1.6740 LearningRate 0.000158 Epoch: 25 Global Step: 532530 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:21:44,081-Speed 2497.07 samples/sec Loss 1.6536 LearningRate 0.000158 Epoch: 25 Global Step: 532540 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:21:52,283-Speed 2497.78 samples/sec Loss 1.6074 LearningRate 0.000158 Epoch: 25 Global Step: 532550 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:22:00,483-Speed 2497.71 samples/sec Loss 1.6451 LearningRate 0.000158 Epoch: 25 Global Step: 532560 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:22:08,630-Speed 2514.16 samples/sec Loss 1.6260 LearningRate 0.000158 Epoch: 25 Global Step: 532570 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:22:16,831-Speed 2497.83 samples/sec Loss 1.6411 LearningRate 0.000158 Epoch: 25 Global Step: 532580 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:22:25,029-Speed 2498.58 samples/sec Loss 1.6388 LearningRate 0.000158 Epoch: 25 Global Step: 532590 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:22:33,229-Speed 2497.73 samples/sec Loss 1.6599 LearningRate 0.000158 Epoch: 25 Global Step: 532600 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:22:41,429-Speed 2497.93 samples/sec Loss 1.6607 LearningRate 0.000158 Epoch: 25 Global Step: 532610 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:22:49,629-Speed 2498.06 samples/sec Loss 1.6398 LearningRate 0.000158 Epoch: 25 Global Step: 532620 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:22:57,778-Speed 2513.35 samples/sec Loss 1.6881 LearningRate 0.000158 Epoch: 25 Global Step: 532630 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:23:05,984-Speed 2496.07 samples/sec Loss 1.6871 LearningRate 0.000158 Epoch: 25 Global Step: 532640 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:23:14,189-Speed 2496.71 samples/sec Loss 1.6756 LearningRate 0.000158 Epoch: 25 Global Step: 532650 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:23:22,397-Speed 2495.56 samples/sec Loss 1.6769 LearningRate 0.000158 Epoch: 25 Global Step: 532660 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:23:30,607-Speed 2494.93 samples/sec Loss 1.6655 LearningRate 0.000158 Epoch: 25 Global Step: 532670 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:23:38,813-Speed 2496.11 samples/sec Loss 1.6640 LearningRate 0.000158 Epoch: 25 Global Step: 532680 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:23:46,971-Speed 2510.78 samples/sec Loss 1.6545 LearningRate 0.000158 Epoch: 25 Global Step: 532690 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:23:55,176-Speed 2496.32 samples/sec Loss 1.6767 LearningRate 0.000158 Epoch: 25 Global Step: 532700 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:24:03,382-Speed 2496.05 samples/sec Loss 1.6443 LearningRate 0.000158 Epoch: 25 Global Step: 532710 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:24:11,586-Speed 2497.04 samples/sec Loss 1.6289 LearningRate 0.000158 Epoch: 25 Global Step: 532720 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:24:19,792-Speed 2496.51 samples/sec Loss 1.6576 LearningRate 0.000158 Epoch: 25 Global Step: 532730 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:24:27,994-Speed 2498.19 samples/sec Loss 1.6692 LearningRate 0.000158 Epoch: 25 Global Step: 532740 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:24:36,139-Speed 2514.60 samples/sec Loss 1.6633 LearningRate 0.000158 Epoch: 25 Global Step: 532750 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:24:44,344-Speed 2496.67 samples/sec Loss 1.6693 LearningRate 0.000158 Epoch: 25 Global Step: 532760 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:24:52,544-Speed 2498.09 samples/sec Loss 1.6530 LearningRate 0.000158 Epoch: 25 Global Step: 532770 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:25:00,742-Speed 2498.46 samples/sec Loss 1.6416 LearningRate 0.000158 Epoch: 25 Global Step: 532780 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:25:08,943-Speed 2497.49 samples/sec Loss 1.6711 LearningRate 0.000158 Epoch: 25 Global Step: 532790 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:25:17,143-Speed 2497.95 samples/sec Loss 1.6713 LearningRate 0.000158 Epoch: 25 Global Step: 532800 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:25:25,295-Speed 2512.78 samples/sec Loss 1.6200 LearningRate 0.000158 Epoch: 25 Global Step: 532810 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:25:33,505-Speed 2495.14 samples/sec Loss 1.6209 LearningRate 0.000158 Epoch: 25 Global Step: 532820 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:25:41,712-Speed 2495.56 samples/sec Loss 1.7014 LearningRate 0.000158 Epoch: 25 Global Step: 532830 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:25:49,911-Speed 2498.13 samples/sec Loss 1.7058 LearningRate 0.000158 Epoch: 25 Global Step: 532840 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:25:58,109-Speed 2498.95 samples/sec Loss 1.6069 LearningRate 0.000158 Epoch: 25 Global Step: 532850 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:26:06,312-Speed 2497.07 samples/sec Loss 1.6673 LearningRate 0.000158 Epoch: 25 Global Step: 532860 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:26:14,465-Speed 2512.33 samples/sec Loss 1.6489 LearningRate 0.000158 Epoch: 25 Global Step: 532870 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:26:22,665-Speed 2497.89 samples/sec Loss 1.6704 LearningRate 0.000158 Epoch: 25 Global Step: 532880 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:26:30,867-Speed 2497.40 samples/sec Loss 1.6725 LearningRate 0.000158 Epoch: 25 Global Step: 532890 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:26:39,065-Speed 2498.41 samples/sec Loss 1.6644 LearningRate 0.000158 Epoch: 25 Global Step: 532900 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:26:47,281-Speed 2493.04 samples/sec Loss 1.6448 LearningRate 0.000158 Epoch: 25 Global Step: 532910 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:26:55,483-Speed 2497.39 samples/sec Loss 1.6303 LearningRate 0.000158 Epoch: 25 Global Step: 532920 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:27:03,630-Speed 2514.42 samples/sec Loss 1.6320 LearningRate 0.000158 Epoch: 25 Global Step: 532930 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:27:11,830-Speed 2497.90 samples/sec Loss 1.6377 LearningRate 0.000158 Epoch: 25 Global Step: 532940 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:27:20,029-Speed 2498.17 samples/sec Loss 1.6521 LearningRate 0.000158 Epoch: 25 Global Step: 532950 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:27:28,238-Speed 2495.34 samples/sec Loss 1.6205 LearningRate 0.000158 Epoch: 25 Global Step: 532960 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:27:36,439-Speed 2497.50 samples/sec Loss 1.6402 LearningRate 0.000158 Epoch: 25 Global Step: 532970 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:27:44,637-Speed 2498.43 samples/sec Loss 1.6544 LearningRate 0.000158 Epoch: 25 Global Step: 532980 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:27:52,786-Speed 2513.98 samples/sec Loss 1.6695 LearningRate 0.000158 Epoch: 25 Global Step: 532990 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:28:00,989-Speed 2496.91 samples/sec Loss 1.6125 LearningRate 0.000158 Epoch: 25 Global Step: 533000 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:28:09,193-Speed 2496.72 samples/sec Loss 1.6414 LearningRate 0.000158 Epoch: 25 Global Step: 533010 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:28:17,392-Speed 2498.14 samples/sec Loss 1.6242 LearningRate 0.000158 Epoch: 25 Global Step: 533020 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:28:25,592-Speed 2498.11 samples/sec Loss 1.6242 LearningRate 0.000158 Epoch: 25 Global Step: 533030 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:28:33,804-Speed 2494.46 samples/sec Loss 1.6633 LearningRate 0.000158 Epoch: 25 Global Step: 533040 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:28:41,954-Speed 2513.38 samples/sec Loss 1.6268 LearningRate 0.000158 Epoch: 25 Global Step: 533050 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-07-10 16:28:50,150-Speed 2498.95 samples/sec Loss 1.5823 LearningRate 0.000158 Epoch: 25 Global Step: 533060 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:28:58,349-Speed 2498.66 samples/sec Loss 1.5863 LearningRate 0.000158 Epoch: 25 Global Step: 533070 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:29:06,557-Speed 2495.75 samples/sec Loss 1.6050 LearningRate 0.000158 Epoch: 25 Global Step: 533080 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:29:14,758-Speed 2497.81 samples/sec Loss 1.6691 LearningRate 0.000158 Epoch: 25 Global Step: 533090 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:29:22,963-Speed 2496.33 samples/sec Loss 1.6301 LearningRate 0.000158 Epoch: 25 Global Step: 533100 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:29:31,111-Speed 2514.10 samples/sec Loss 1.6056 LearningRate 0.000158 Epoch: 25 Global Step: 533110 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:29:39,315-Speed 2496.67 samples/sec Loss 1.6083 LearningRate 0.000158 Epoch: 25 Global Step: 533120 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:29:47,516-Speed 2497.60 samples/sec Loss 1.6414 LearningRate 0.000158 Epoch: 25 Global Step: 533130 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:29:55,719-Speed 2497.19 samples/sec Loss 1.6590 LearningRate 0.000158 Epoch: 25 Global Step: 533140 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:30:03,919-Speed 2497.82 samples/sec Loss 1.6020 LearningRate 0.000158 Epoch: 25 Global Step: 533150 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:30:12,125-Speed 2495.98 samples/sec Loss 1.6460 LearningRate 0.000158 Epoch: 25 Global Step: 533160 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:30:20,274-Speed 2513.86 samples/sec Loss 1.6336 LearningRate 0.000158 Epoch: 25 Global Step: 533170 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:30:28,479-Speed 2496.45 samples/sec Loss 1.6494 LearningRate 0.000158 Epoch: 25 Global Step: 533180 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:30:36,676-Speed 2498.74 samples/sec Loss 1.6794 LearningRate 0.000158 Epoch: 25 Global Step: 533190 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:30:44,878-Speed 2497.46 samples/sec Loss 1.6553 LearningRate 0.000158 Epoch: 25 Global Step: 533200 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:30:53,080-Speed 2497.27 samples/sec Loss 1.6375 LearningRate 0.000158 Epoch: 25 Global Step: 533210 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:31:01,279-Speed 2498.33 samples/sec Loss 1.6000 LearningRate 0.000158 Epoch: 25 Global Step: 533220 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:31:09,428-Speed 2513.70 samples/sec Loss 1.5968 LearningRate 0.000158 Epoch: 25 Global Step: 533230 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:31:17,628-Speed 2497.84 samples/sec Loss 1.6375 LearningRate 0.000158 Epoch: 25 Global Step: 533240 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:31:25,832-Speed 2497.05 samples/sec Loss 1.6334 LearningRate 0.000158 Epoch: 25 Global Step: 533250 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:31:34,034-Speed 2497.26 samples/sec Loss 1.6458 LearningRate 0.000158 Epoch: 25 Global Step: 533260 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:31:42,243-Speed 2495.23 samples/sec Loss 1.6728 LearningRate 0.000157 Epoch: 25 Global Step: 533270 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:31:50,449-Speed 2496.07 samples/sec Loss 1.6188 LearningRate 0.000157 Epoch: 25 Global Step: 533280 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:31:58,598-Speed 2513.33 samples/sec Loss 1.6312 LearningRate 0.000157 Epoch: 25 Global Step: 533290 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:32:06,802-Speed 2496.94 samples/sec Loss 1.5762 LearningRate 0.000157 Epoch: 25 Global Step: 533300 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:32:15,002-Speed 2498.08 samples/sec Loss 1.6729 LearningRate 0.000157 Epoch: 25 Global Step: 533310 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:32:23,205-Speed 2497.16 samples/sec Loss 1.6308 LearningRate 0.000157 Epoch: 25 Global Step: 533320 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:32:31,407-Speed 2497.68 samples/sec Loss 1.6361 LearningRate 0.000157 Epoch: 25 Global Step: 533330 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:32:39,608-Speed 2497.77 samples/sec Loss 1.6308 LearningRate 0.000157 Epoch: 25 Global Step: 533340 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:32:47,761-Speed 2512.07 samples/sec Loss 1.6258 LearningRate 0.000157 Epoch: 25 Global Step: 533350 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:32:55,964-Speed 2497.31 samples/sec Loss 1.6419 LearningRate 0.000157 Epoch: 25 Global Step: 533360 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:33:04,168-Speed 2496.57 samples/sec Loss 1.6188 LearningRate 0.000157 Epoch: 25 Global Step: 533370 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:33:12,370-Speed 2497.54 samples/sec Loss 1.6474 LearningRate 0.000157 Epoch: 25 Global Step: 533380 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:33:20,574-Speed 2496.74 samples/sec Loss 1.6615 LearningRate 0.000157 Epoch: 25 Global Step: 533390 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:33:28,773-Speed 2498.38 samples/sec Loss 1.6472 LearningRate 0.000157 Epoch: 25 Global Step: 533400 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:33:36,925-Speed 2512.62 samples/sec Loss 1.6808 LearningRate 0.000157 Epoch: 25 Global Step: 533410 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:33:45,128-Speed 2496.93 samples/sec Loss 1.6332 LearningRate 0.000157 Epoch: 25 Global Step: 533420 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:33:53,333-Speed 2496.68 samples/sec Loss 1.6353 LearningRate 0.000157 Epoch: 25 Global Step: 533430 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:34:01,532-Speed 2498.29 samples/sec Loss 1.6690 LearningRate 0.000157 Epoch: 25 Global Step: 533440 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:34:09,744-Speed 2494.44 samples/sec Loss 1.6464 LearningRate 0.000157 Epoch: 25 Global Step: 533450 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:34:17,943-Speed 2498.25 samples/sec Loss 1.6261 LearningRate 0.000157 Epoch: 25 Global Step: 533460 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:34:26,091-Speed 2513.90 samples/sec Loss 1.6739 LearningRate 0.000157 Epoch: 25 Global Step: 533470 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:34:34,291-Speed 2498.09 samples/sec Loss 1.6663 LearningRate 0.000157 Epoch: 25 Global Step: 533480 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:34:42,490-Speed 2498.28 samples/sec Loss 1.6229 LearningRate 0.000157 Epoch: 25 Global Step: 533490 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:34:50,691-Speed 2497.71 samples/sec Loss 1.6393 LearningRate 0.000157 Epoch: 25 Global Step: 533500 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:34:58,894-Speed 2497.00 samples/sec Loss 1.6841 LearningRate 0.000157 Epoch: 25 Global Step: 533510 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:35:07,096-Speed 2497.35 samples/sec Loss 1.6380 LearningRate 0.000157 Epoch: 25 Global Step: 533520 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:35:15,240-Speed 2514.89 samples/sec Loss 1.6307 LearningRate 0.000157 Epoch: 25 Global Step: 533530 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:35:23,440-Speed 2498.02 samples/sec Loss 1.6734 LearningRate 0.000157 Epoch: 25 Global Step: 533540 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:35:31,638-Speed 2498.51 samples/sec Loss 1.6232 LearningRate 0.000157 Epoch: 25 Global Step: 533550 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:35:39,853-Speed 2493.63 samples/sec Loss 1.6159 LearningRate 0.000157 Epoch: 25 Global Step: 533560 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:35:48,057-Speed 2497.01 samples/sec Loss 1.6657 LearningRate 0.000157 Epoch: 25 Global Step: 533570 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:35:56,262-Speed 2496.44 samples/sec Loss 1.6341 LearningRate 0.000157 Epoch: 25 Global Step: 533580 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:36:04,411-Speed 2513.31 samples/sec Loss 1.6312 LearningRate 0.000157 Epoch: 25 Global Step: 533590 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:36:12,617-Speed 2496.40 samples/sec Loss 1.6447 LearningRate 0.000157 Epoch: 25 Global Step: 533600 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:36:20,818-Speed 2497.43 samples/sec Loss 1.6421 LearningRate 0.000157 Epoch: 25 Global Step: 533610 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:36:29,026-Speed 2495.50 samples/sec Loss 1.6306 LearningRate 0.000157 Epoch: 25 Global Step: 533620 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:36:37,224-Speed 2498.47 samples/sec Loss 1.6174 LearningRate 0.000157 Epoch: 25 Global Step: 533630 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:36:45,442-Speed 2492.84 samples/sec Loss 1.5945 LearningRate 0.000157 Epoch: 25 Global Step: 533640 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:36:53,591-Speed 2513.49 samples/sec Loss 1.6440 LearningRate 0.000157 Epoch: 25 Global Step: 533650 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:37:01,790-Speed 2498.10 samples/sec Loss 1.6363 LearningRate 0.000157 Epoch: 25 Global Step: 533660 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:37:09,996-Speed 2496.18 samples/sec Loss 1.5996 LearningRate 0.000157 Epoch: 25 Global Step: 533670 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:37:18,200-Speed 2496.89 samples/sec Loss 1.6350 LearningRate 0.000157 Epoch: 25 Global Step: 533680 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:37:26,403-Speed 2497.12 samples/sec Loss 1.6367 LearningRate 0.000157 Epoch: 25 Global Step: 533690 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:37:34,608-Speed 2496.22 samples/sec Loss 1.6448 LearningRate 0.000157 Epoch: 25 Global Step: 533700 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:37:42,753-Speed 2515.03 samples/sec Loss 1.6624 LearningRate 0.000157 Epoch: 25 Global Step: 533710 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:37:50,957-Speed 2496.94 samples/sec Loss 1.6267 LearningRate 0.000157 Epoch: 25 Global Step: 533720 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:37:59,153-Speed 2499.73 samples/sec Loss 1.6417 LearningRate 0.000157 Epoch: 25 Global Step: 533730 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:38:07,359-Speed 2496.29 samples/sec Loss 1.6379 LearningRate 0.000157 Epoch: 25 Global Step: 533740 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:38:15,561-Speed 2497.45 samples/sec Loss 1.6211 LearningRate 0.000157 Epoch: 25 Global Step: 533750 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:38:23,763-Speed 2497.44 samples/sec Loss 1.6329 LearningRate 0.000157 Epoch: 25 Global Step: 533760 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:38:31,913-Speed 2513.11 samples/sec Loss 1.6467 LearningRate 0.000157 Epoch: 25 Global Step: 533770 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:38:40,124-Speed 2494.77 samples/sec Loss 1.6535 LearningRate 0.000157 Epoch: 25 Global Step: 533780 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:38:48,335-Speed 2494.58 samples/sec Loss 1.6536 LearningRate 0.000157 Epoch: 25 Global Step: 533790 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:38:56,543-Speed 2495.51 samples/sec Loss 1.6107 LearningRate 0.000157 Epoch: 25 Global Step: 533800 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:39:04,750-Speed 2495.77 samples/sec Loss 1.6105 LearningRate 0.000157 Epoch: 25 Global Step: 533810 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:39:12,958-Speed 2495.41 samples/sec Loss 1.6194 LearningRate 0.000157 Epoch: 25 Global Step: 533820 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:39:21,106-Speed 2513.81 samples/sec Loss 1.5911 LearningRate 0.000157 Epoch: 25 Global Step: 533830 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:39:29,310-Speed 2496.95 samples/sec Loss 1.6185 LearningRate 0.000157 Epoch: 25 Global Step: 533840 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:39:37,513-Speed 2497.08 samples/sec Loss 1.6215 LearningRate 0.000157 Epoch: 25 Global Step: 533850 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:39:45,730-Speed 2492.56 samples/sec Loss 1.6210 LearningRate 0.000157 Epoch: 25 Global Step: 533860 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:39:53,932-Speed 2497.49 samples/sec Loss 1.6749 LearningRate 0.000157 Epoch: 25 Global Step: 533870 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:40:02,136-Speed 2496.85 samples/sec Loss 1.6326 LearningRate 0.000157 Epoch: 25 Global Step: 533880 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:40:10,288-Speed 2512.70 samples/sec Loss 1.6361 LearningRate 0.000157 Epoch: 25 Global Step: 533890 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:40:18,488-Speed 2497.78 samples/sec Loss 1.6275 LearningRate 0.000157 Epoch: 25 Global Step: 533900 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:40:26,689-Speed 2497.86 samples/sec Loss 1.6157 LearningRate 0.000157 Epoch: 25 Global Step: 533910 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:40:34,888-Speed 2498.41 samples/sec Loss 1.6362 LearningRate 0.000157 Epoch: 25 Global Step: 533920 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:40:43,094-Speed 2496.24 samples/sec Loss 1.6214 LearningRate 0.000157 Epoch: 25 Global Step: 533930 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:40:51,290-Speed 2499.06 samples/sec Loss 1.6270 LearningRate 0.000157 Epoch: 25 Global Step: 533940 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:40:59,441-Speed 2513.17 samples/sec Loss 1.6757 LearningRate 0.000157 Epoch: 25 Global Step: 533950 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:41:07,644-Speed 2497.24 samples/sec Loss 1.6475 LearningRate 0.000157 Epoch: 25 Global Step: 533960 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:41:15,848-Speed 2496.63 samples/sec Loss 1.6071 LearningRate 0.000157 Epoch: 25 Global Step: 533970 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:41:24,053-Speed 2496.45 samples/sec Loss 1.6275 LearningRate 0.000157 Epoch: 25 Global Step: 533980 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:41:32,257-Speed 2496.91 samples/sec Loss 1.6176 LearningRate 0.000157 Epoch: 25 Global Step: 533990 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:41:40,460-Speed 2496.91 samples/sec Loss 1.6586 LearningRate 0.000157 Epoch: 25 Global Step: 534000 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:41:48,614-Speed 2512.15 samples/sec Loss 1.6290 LearningRate 0.000157 Epoch: 25 Global Step: 534010 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:41:56,817-Speed 2497.14 samples/sec Loss 1.6496 LearningRate 0.000157 Epoch: 25 Global Step: 534020 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:42:05,025-Speed 2495.72 samples/sec Loss 1.6050 LearningRate 0.000157 Epoch: 25 Global Step: 534030 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:42:13,225-Speed 2497.73 samples/sec Loss 1.6280 LearningRate 0.000157 Epoch: 25 Global Step: 534040 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:42:21,431-Speed 2496.08 samples/sec Loss 1.5975 LearningRate 0.000157 Epoch: 25 Global Step: 534050 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:42:29,636-Speed 2496.61 samples/sec Loss 1.6201 LearningRate 0.000157 Epoch: 25 Global Step: 534060 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:42:37,783-Speed 2514.87 samples/sec Loss 1.6381 LearningRate 0.000157 Epoch: 25 Global Step: 534070 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:42:45,993-Speed 2494.90 samples/sec Loss 1.6278 LearningRate 0.000157 Epoch: 25 Global Step: 534080 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:42:54,192-Speed 2498.16 samples/sec Loss 1.6330 LearningRate 0.000157 Epoch: 25 Global Step: 534090 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:43:02,392-Speed 2497.97 samples/sec Loss 1.6230 LearningRate 0.000157 Epoch: 25 Global Step: 534100 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:43:10,599-Speed 2496.07 samples/sec Loss 1.6265 LearningRate 0.000157 Epoch: 25 Global Step: 534110 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:43:18,800-Speed 2497.54 samples/sec Loss 1.6492 LearningRate 0.000157 Epoch: 25 Global Step: 534120 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:43:26,949-Speed 2513.87 samples/sec Loss 1.6773 LearningRate 0.000157 Epoch: 25 Global Step: 534130 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:43:35,147-Speed 2498.40 samples/sec Loss 1.5948 LearningRate 0.000157 Epoch: 25 Global Step: 534140 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:43:43,355-Speed 2495.63 samples/sec Loss 1.6166 LearningRate 0.000157 Epoch: 25 Global Step: 534150 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:43:51,557-Speed 2496.96 samples/sec Loss 1.6085 LearningRate 0.000157 Epoch: 25 Global Step: 534160 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:43:59,757-Speed 2498.26 samples/sec Loss 1.5993 LearningRate 0.000157 Epoch: 25 Global Step: 534170 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:44:07,959-Speed 2497.28 samples/sec Loss 1.6084 LearningRate 0.000157 Epoch: 25 Global Step: 534180 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:44:16,105-Speed 2514.55 samples/sec Loss 1.6477 LearningRate 0.000157 Epoch: 25 Global Step: 534190 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:44:24,306-Speed 2497.94 samples/sec Loss 1.6730 LearningRate 0.000157 Epoch: 25 Global Step: 534200 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:44:32,521-Speed 2493.31 samples/sec Loss 1.6381 LearningRate 0.000156 Epoch: 25 Global Step: 534210 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:44:40,725-Speed 2496.65 samples/sec Loss 1.5710 LearningRate 0.000156 Epoch: 25 Global Step: 534220 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:44:48,925-Speed 2498.09 samples/sec Loss 1.6159 LearningRate 0.000156 Epoch: 25 Global Step: 534230 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-07-10 16:44:57,125-Speed 2498.13 samples/sec Loss 1.6458 LearningRate 0.000156 Epoch: 25 Global Step: 534240 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:45:05,273-Speed 2513.62 samples/sec Loss 1.6122 LearningRate 0.000156 Epoch: 25 Global Step: 534250 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:45:13,490-Speed 2493.09 samples/sec Loss 1.5924 LearningRate 0.000156 Epoch: 25 Global Step: 534260 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:45:21,686-Speed 2498.96 samples/sec Loss 1.6394 LearningRate 0.000156 Epoch: 25 Global Step: 534270 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:45:29,887-Speed 2497.61 samples/sec Loss 1.6090 LearningRate 0.000156 Epoch: 25 Global Step: 534280 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:45:38,088-Speed 2497.86 samples/sec Loss 1.6022 LearningRate 0.000156 Epoch: 25 Global Step: 534290 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:45:46,298-Speed 2494.97 samples/sec Loss 1.6450 LearningRate 0.000156 Epoch: 25 Global Step: 534300 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:45:54,445-Speed 2514.21 samples/sec Loss 1.5982 LearningRate 0.000156 Epoch: 25 Global Step: 534310 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:46:02,643-Speed 2498.29 samples/sec Loss 1.6312 LearningRate 0.000156 Epoch: 25 Global Step: 534320 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:46:10,847-Speed 2496.64 samples/sec Loss 1.6359 LearningRate 0.000156 Epoch: 25 Global Step: 534330 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:46:19,058-Speed 2494.78 samples/sec Loss 1.5981 LearningRate 0.000156 Epoch: 25 Global Step: 534340 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:46:27,255-Speed 2499.12 samples/sec Loss 1.6874 LearningRate 0.000156 Epoch: 25 Global Step: 534350 Fp16 Grad Scale: 65536 Required: 68 hours 
Training: 2022-07-10 16:46:35,453-Speed 2498.47 samples/sec Loss 1.6008 LearningRate 0.000156 Epoch: 25 Global Step: 534360 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:46:43,598-Speed 2514.77 samples/sec Loss 1.6176 LearningRate 0.000156 Epoch: 25 Global Step: 534370 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:46:51,800-Speed 2497.61 samples/sec Loss 1.6351 LearningRate 0.000156 Epoch: 25 Global Step: 534380 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:47:00,004-Speed 2496.58 samples/sec Loss 1.6122 LearningRate 0.000156 Epoch: 25 Global Step: 534390 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-07-10 16:47:08,159-Speed 2511.87 samples/sec Loss 1.6159 LearningRate 0.000156 Epoch: 25 Global Step: 534400 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:47:16,361-Speed 2497.24 samples/sec Loss 1.6513 LearningRate 0.000156 Epoch: 25 Global Step: 534410 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:47:24,563-Speed 2497.44 samples/sec Loss 1.6527 LearningRate 0.000156 Epoch: 25 Global Step: 534420 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:47:32,718-Speed 2511.59 samples/sec Loss 1.6466 LearningRate 0.000156 Epoch: 25 Global Step: 534430 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:47:40,921-Speed 2497.31 samples/sec Loss 1.6843 LearningRate 0.000156 Epoch: 25 Global Step: 534440 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:47:49,119-Speed 2498.45 samples/sec Loss 1.6177 LearningRate 0.000156 Epoch: 25 Global Step: 534450 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:47:57,329-Speed 2494.99 samples/sec Loss 1.6665 LearningRate 0.000156 Epoch: 25 Global Step: 534460 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:48:05,527-Speed 2498.52 samples/sec Loss 1.6114 LearningRate 0.000156 Epoch: 25 Global Step: 534470 Fp16 Grad Scale: 32768 Required: 68 hours 
Training: 2022-07-10 16:48:13,726-Speed 2498.22 samples/sec Loss 1.6322 LearningRate 0.000156 Epoch: 25 Global Step: 534480 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:48:21,878-Speed 2512.55 samples/sec Loss 1.6136 LearningRate 0.000156 Epoch: 25 Global Step: 534490 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:48:30,092-Speed 2493.77 samples/sec Loss 1.6517 LearningRate 0.000156 Epoch: 25 Global Step: 534500 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:48:38,287-Speed 2499.62 samples/sec Loss 1.6724 LearningRate 0.000156 Epoch: 25 Global Step: 534510 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:48:46,493-Speed 2496.67 samples/sec Loss 1.6461 LearningRate 0.000156 Epoch: 25 Global Step: 534520 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:48:54,693-Speed 2497.65 samples/sec Loss 1.6510 LearningRate 0.000156 Epoch: 25 Global Step: 534530 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:49:02,903-Speed 2495.03 samples/sec Loss 1.6412 LearningRate 0.000156 Epoch: 25 Global Step: 534540 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:49:11,056-Speed 2512.35 samples/sec Loss 1.6696 LearningRate 0.000156 Epoch: 25 Global Step: 534550 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:49:19,255-Speed 2498.21 samples/sec Loss 1.6827 LearningRate 0.000156 Epoch: 25 Global Step: 534560 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:49:27,455-Speed 2498.21 samples/sec Loss 1.6858 LearningRate 0.000156 Epoch: 25 Global Step: 534570 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:49:35,656-Speed 2497.83 samples/sec Loss 1.6299 LearningRate 0.000156 Epoch: 25 Global Step: 534580 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:49:43,863-Speed 2495.65 samples/sec Loss 1.6689 LearningRate 0.000156 Epoch: 25 Global Step: 534590 Fp16 Grad Scale: 32768 Required: 68 hours 
Training: 2022-07-10 16:49:52,066-Speed 2496.86 samples/sec Loss 1.6694 LearningRate 0.000156 Epoch: 25 Global Step: 534600 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:50:00,216-Speed 2513.97 samples/sec Loss 1.6520 LearningRate 0.000156 Epoch: 25 Global Step: 534610 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:50:08,414-Speed 2498.70 samples/sec Loss 1.6500 LearningRate 0.000156 Epoch: 25 Global Step: 534620 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:50:16,615-Speed 2497.49 samples/sec Loss 1.6591 LearningRate 0.000156 Epoch: 25 Global Step: 534630 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:50:24,816-Speed 2497.85 samples/sec Loss 1.6332 LearningRate 0.000156 Epoch: 25 Global Step: 534640 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:50:33,014-Speed 2498.60 samples/sec Loss 1.6533 LearningRate 0.000156 Epoch: 25 Global Step: 534650 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:50:41,213-Speed 2498.26 samples/sec Loss 1.6132 LearningRate 0.000156 Epoch: 25 Global Step: 534660 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:50:49,361-Speed 2513.91 samples/sec Loss 1.6530 LearningRate 0.000156 Epoch: 25 Global Step: 534670 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:50:57,571-Speed 2494.84 samples/sec Loss 1.6194 LearningRate 0.000156 Epoch: 25 Global Step: 534680 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:51:05,771-Speed 2498.06 samples/sec Loss 1.6334 LearningRate 0.000156 Epoch: 25 Global Step: 534690 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:51:14,024-Speed 2499.34 samples/sec Loss 1.6589 LearningRate 0.000156 Epoch: 25 Global Step: 534700 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-07-10 16:51:22,229-Speed 2496.28 samples/sec Loss 1.6934 LearningRate 0.000156 Epoch: 25 Global Step: 534710 Fp16 Grad Scale: 32768 Required: 68 hours 
Training: 2022-07-10 16:51:30,499-Speed 2492.51 samples/sec Loss 1.6485 LearningRate 0.000156 Epoch: 25 Global Step: 534720 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:51:38,805-Speed 2515.88 samples/sec Loss 1.6647 LearningRate 0.000156 Epoch: 25 Global Step: 534730 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:51:47,016-Speed 2494.80 samples/sec Loss 1.6708 LearningRate 0.000156 Epoch: 25 Global Step: 534740 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:51:55,220-Speed 2496.67 samples/sec Loss 1.6152 LearningRate 0.000156 Epoch: 25 Global Step: 534750 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:52:03,466-Speed 2500.18 samples/sec Loss 1.6949 LearningRate 0.000156 Epoch: 25 Global Step: 534760 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:52:11,689-Speed 2499.96 samples/sec Loss 1.6340 LearningRate 0.000156 Epoch: 25 Global Step: 534770 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:52:23,270-Speed 1768.51 samples/sec Loss 1.6274 LearningRate 0.000156 Epoch: 25 Global Step: 534780 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:52:31,412-Speed 2515.91 samples/sec Loss 1.6893 LearningRate 0.000156 Epoch: 25 Global Step: 534790 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:52:39,648-Speed 2501.26 samples/sec Loss 1.6338 LearningRate 0.000156 Epoch: 25 Global Step: 534800 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:52:53,021-Speed 1537.80 samples/sec Loss 1.6492 LearningRate 0.000156 Epoch: 25 Global Step: 534810 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:53:01,686-Speed 2363.78 samples/sec Loss 1.6353 LearningRate 0.000156 Epoch: 25 Global Step: 534820 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:53:10,129-Speed 2492.23 samples/sec Loss 1.6298 LearningRate 0.000156 Epoch: 25 Global Step: 534830 Fp16 Grad Scale: 32768 Required: 67 hours 
Training: 2022-07-10 16:53:18,398-Speed 2500.74 samples/sec Loss 1.6565 LearningRate 0.000156 Epoch: 25 Global Step: 534840 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:53:31,521-Speed 1560.72 samples/sec Loss 1.6428 LearningRate 0.000156 Epoch: 25 Global Step: 534850 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:53:39,733-Speed 2501.50 samples/sec Loss 1.6105 LearningRate 0.000156 Epoch: 25 Global Step: 534860 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:53:52,930-Speed 1590.72 samples/sec Loss 1.6554 LearningRate 0.000156 Epoch: 25 Global Step: 534870 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:54:01,138-Speed 2501.55 samples/sec Loss 1.6461 LearningRate 0.000156 Epoch: 25 Global Step: 534880 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:54:09,791-Speed 2502.23 samples/sec Loss 1.6138 LearningRate 0.000156 Epoch: 25 Global Step: 534890 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:54:24,133-Speed 1449.61 samples/sec Loss 1.6432 LearningRate 0.000156 Epoch: 25 Global Step: 534900 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:54:32,271-Speed 2517.10 samples/sec Loss 1.5827 LearningRate 0.000156 Epoch: 25 Global Step: 534910 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:54:40,482-Speed 2494.40 samples/sec Loss 1.5994 LearningRate 0.000156 Epoch: 25 Global Step: 534920 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:54:51,698-Speed 1876.08 samples/sec Loss 1.5960 LearningRate 0.000156 Epoch: 25 Global Step: 534930 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:54:59,901-Speed 2497.35 samples/sec Loss 1.6057 LearningRate 0.000156 Epoch: 25 Global Step: 534940 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:55:08,110-Speed 2495.31 samples/sec Loss 1.6282 LearningRate 0.000156 Epoch: 25 Global Step: 534950 Fp16 Grad Scale: 32768 Required: 67 hours 
Training: 2022-07-10 16:55:16,314-Speed 2496.50 samples/sec Loss 1.6000 LearningRate 0.000156 Epoch: 25 Global Step: 534960 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:55:24,465-Speed 2513.13 samples/sec Loss 1.6253 LearningRate 0.000156 Epoch: 25 Global Step: 534970 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:55:32,670-Speed 2496.19 samples/sec Loss 1.6641 LearningRate 0.000156 Epoch: 25 Global Step: 534980 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:55:40,882-Speed 2494.42 samples/sec Loss 1.6258 LearningRate 0.000156 Epoch: 25 Global Step: 534990 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 16:55:49,043-Speed 2510.10 samples/sec Loss 1.6499 LearningRate 0.000156 Epoch: 25 Global Step: 535000 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:55:57,251-Speed 2495.46 samples/sec Loss 1.6540 LearningRate 0.000156 Epoch: 25 Global Step: 535010 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:56:05,456-Speed 2496.54 samples/sec Loss 1.6477 LearningRate 0.000156 Epoch: 25 Global Step: 535020 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:56:13,603-Speed 2513.92 samples/sec Loss 1.5736 LearningRate 0.000156 Epoch: 25 Global Step: 535030 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:56:21,808-Speed 2496.74 samples/sec Loss 1.6417 LearningRate 0.000156 Epoch: 25 Global Step: 535040 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:56:30,005-Speed 2498.86 samples/sec Loss 1.6532 LearningRate 0.000156 Epoch: 25 Global Step: 535050 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:56:38,234-Speed 2489.17 samples/sec Loss 1.6403 LearningRate 0.000156 Epoch: 25 Global Step: 535060 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:56:46,442-Speed 2495.66 samples/sec Loss 1.6286 LearningRate 0.000156 Epoch: 25 Global Step: 535070 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 16:56:54,645-Speed 2496.75 samples/sec Loss 1.6242 LearningRate 0.000156 Epoch: 25 Global Step: 535080 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:57:02,792-Speed 2514.21 samples/sec Loss 1.6136 LearningRate 0.000156 Epoch: 25 Global Step: 535090 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:57:10,999-Speed 2496.16 samples/sec Loss 1.6522 LearningRate 0.000156 Epoch: 25 Global Step: 535100 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:57:19,199-Speed 2498.12 samples/sec Loss 1.6309 LearningRate 0.000156 Epoch: 25 Global Step: 535110 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:57:27,397-Speed 2498.29 samples/sec Loss 1.6323 LearningRate 0.000156 Epoch: 25 Global Step: 535120 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:57:35,603-Speed 2496.14 samples/sec Loss 1.6427 LearningRate 0.000156 Epoch: 25 Global Step: 535130 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:57:43,806-Speed 2497.27 samples/sec Loss 1.6052 LearningRate 0.000156 Epoch: 25 Global Step: 535140 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:57:51,964-Speed 2510.76 samples/sec Loss 1.6315 LearningRate 0.000155 Epoch: 25 Global Step: 535150 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:58:00,161-Speed 2498.94 samples/sec Loss 1.6101 LearningRate 0.000155 Epoch: 25 Global Step: 535160 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:58:08,362-Speed 2497.51 samples/sec Loss 1.5839 LearningRate 0.000155 Epoch: 25 Global Step: 535170 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:58:16,568-Speed 2496.21 samples/sec Loss 1.6327 LearningRate 0.000155 Epoch: 25 Global Step: 535180 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:58:24,769-Speed 2497.50 samples/sec Loss 1.6635 LearningRate 0.000155 Epoch: 25 Global Step: 535190 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 16:58:32,972-Speed 2497.20 samples/sec Loss 1.6368 LearningRate 0.000155 Epoch: 25 Global Step: 535200 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:58:41,133-Speed 2510.02 samples/sec Loss 1.6644 LearningRate 0.000155 Epoch: 25 Global Step: 535210 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:58:49,335-Speed 2497.41 samples/sec Loss 1.6248 LearningRate 0.000155 Epoch: 25 Global Step: 535220 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:58:57,541-Speed 2495.90 samples/sec Loss 1.5729 LearningRate 0.000155 Epoch: 25 Global Step: 535230 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:59:05,744-Speed 2497.40 samples/sec Loss 1.6449 LearningRate 0.000155 Epoch: 25 Global Step: 535240 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:59:13,947-Speed 2496.94 samples/sec Loss 1.6330 LearningRate 0.000155 Epoch: 25 Global Step: 535250 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:59:22,152-Speed 2496.33 samples/sec Loss 1.6370 LearningRate 0.000155 Epoch: 25 Global Step: 535260 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:59:30,314-Speed 2509.59 samples/sec Loss 1.6241 LearningRate 0.000155 Epoch: 25 Global Step: 535270 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:59:38,523-Speed 2495.59 samples/sec Loss 1.6235 LearningRate 0.000155 Epoch: 25 Global Step: 535280 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:59:46,721-Speed 2498.63 samples/sec Loss 1.6443 LearningRate 0.000155 Epoch: 25 Global Step: 535290 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 16:59:54,925-Speed 2496.76 samples/sec Loss 1.6201 LearningRate 0.000155 Epoch: 25 Global Step: 535300 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:00:03,128-Speed 2496.93 samples/sec Loss 1.6257 LearningRate 0.000155 Epoch: 25 Global Step: 535310 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:00:11,328-Speed 2498.08 samples/sec Loss 1.6263 LearningRate 0.000155 Epoch: 25 Global Step: 535320 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:00:19,481-Speed 2512.29 samples/sec Loss 1.6117 LearningRate 0.000155 Epoch: 25 Global Step: 535330 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:00:27,683-Speed 2497.29 samples/sec Loss 1.6380 LearningRate 0.000155 Epoch: 25 Global Step: 535340 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:00:35,884-Speed 2497.93 samples/sec Loss 1.6304 LearningRate 0.000155 Epoch: 25 Global Step: 535350 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:00:44,088-Speed 2496.71 samples/sec Loss 1.5959 LearningRate 0.000155 Epoch: 25 Global Step: 535360 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:00:52,293-Speed 2496.62 samples/sec Loss 1.6014 LearningRate 0.000155 Epoch: 25 Global Step: 535370 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:01:00,489-Speed 2499.11 samples/sec Loss 1.6344 LearningRate 0.000155 Epoch: 25 Global Step: 535380 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:01:08,634-Speed 2514.69 samples/sec Loss 1.6329 LearningRate 0.000155 Epoch: 25 Global Step: 535390 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:01:16,833-Speed 2498.48 samples/sec Loss 1.6088 LearningRate 0.000155 Epoch: 25 Global Step: 535400 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:01:25,041-Speed 2495.72 samples/sec Loss 1.6286 LearningRate 0.000155 Epoch: 25 Global Step: 535410 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:01:33,240-Speed 2498.20 samples/sec Loss 1.6388 LearningRate 0.000155 Epoch: 25 Global Step: 535420 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:01:41,442-Speed 2497.32 samples/sec Loss 1.6229 LearningRate 0.000155 Epoch: 25 Global Step: 535430 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:01:49,650-Speed 2496.04 samples/sec Loss 1.5764 LearningRate 0.000155 Epoch: 25 Global Step: 535440 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:01:57,797-Speed 2514.16 samples/sec Loss 1.6487 LearningRate 0.000155 Epoch: 25 Global Step: 535450 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:02:05,999-Speed 2497.26 samples/sec Loss 1.5937 LearningRate 0.000155 Epoch: 25 Global Step: 535460 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:02:14,217-Speed 2492.64 samples/sec Loss 1.6291 LearningRate 0.000155 Epoch: 25 Global Step: 535470 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:02:22,414-Speed 2499.08 samples/sec Loss 1.5995 LearningRate 0.000155 Epoch: 25 Global Step: 535480 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:02:30,615-Speed 2497.36 samples/sec Loss 1.5918 LearningRate 0.000155 Epoch: 25 Global Step: 535490 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:02:38,819-Speed 2496.80 samples/sec Loss 1.6375 LearningRate 0.000155 Epoch: 25 Global Step: 535500 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:02:46,978-Speed 2510.57 samples/sec Loss 1.6064 LearningRate 0.000155 Epoch: 25 Global Step: 535510 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:02:55,179-Speed 2497.54 samples/sec Loss 1.6154 LearningRate 0.000155 Epoch: 25 Global Step: 535520 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:03:03,389-Speed 2495.12 samples/sec Loss 1.6492 LearningRate 0.000155 Epoch: 25 Global Step: 535530 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:03:11,590-Speed 2497.62 samples/sec Loss 1.6765 LearningRate 0.000155 Epoch: 25 Global Step: 535540 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:03:19,792-Speed 2497.54 samples/sec Loss 1.5991 LearningRate 0.000155 Epoch: 25 Global Step: 535550 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:03:28,003-Speed 2494.62 samples/sec Loss 1.6588 LearningRate 0.000155 Epoch: 25 Global Step: 535560 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:03:36,143-Speed 2516.50 samples/sec Loss 1.6102 LearningRate 0.000155 Epoch: 25 Global Step: 535570 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:03:44,343-Speed 2498.02 samples/sec Loss 1.6192 LearningRate 0.000155 Epoch: 25 Global Step: 535580 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:03:52,541-Speed 2498.76 samples/sec Loss 1.6173 LearningRate 0.000155 Epoch: 25 Global Step: 535590 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:04:00,741-Speed 2498.03 samples/sec Loss 1.6651 LearningRate 0.000155 Epoch: 25 Global Step: 535600 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:04:08,952-Speed 2494.65 samples/sec Loss 1.6300 LearningRate 0.000155 Epoch: 25 Global Step: 535610 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:04:17,153-Speed 2497.79 samples/sec Loss 1.6528 LearningRate 0.000155 Epoch: 25 Global Step: 535620 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:04:25,297-Speed 2515.25 samples/sec Loss 1.6340 LearningRate 0.000155 Epoch: 25 Global Step: 535630 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:04:33,512-Speed 2493.47 samples/sec Loss 1.6163 LearningRate 0.000155 Epoch: 25 Global Step: 535640 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:04:41,716-Speed 2496.55 samples/sec Loss 1.6412 LearningRate 0.000155 Epoch: 25 Global Step: 535650 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:04:49,927-Speed 2494.86 samples/sec Loss 1.6415 LearningRate 0.000155 Epoch: 25 Global Step: 535660 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:04:58,133-Speed 2496.20 samples/sec Loss 1.6392 LearningRate 0.000155 Epoch: 25 Global Step: 535670 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:05:06,330-Speed 2498.60 samples/sec Loss 1.6356 LearningRate 0.000155 Epoch: 25 Global Step: 535680 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:05:14,477-Speed 2514.21 samples/sec Loss 1.6092 LearningRate 0.000155 Epoch: 25 Global Step: 535690 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:05:22,676-Speed 2498.26 samples/sec Loss 1.6407 LearningRate 0.000155 Epoch: 25 Global Step: 535700 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:05:30,880-Speed 2496.93 samples/sec Loss 1.6319 LearningRate 0.000155 Epoch: 25 Global Step: 535710 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:05:39,074-Speed 2499.71 samples/sec Loss 1.6518 LearningRate 0.000155 Epoch: 25 Global Step: 535720 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:05:47,274-Speed 2497.84 samples/sec Loss 1.6388 LearningRate 0.000155 Epoch: 25 Global Step: 535730 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:05:55,477-Speed 2497.17 samples/sec Loss 1.6646 LearningRate 0.000155 Epoch: 25 Global Step: 535740 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:06:03,631-Speed 2512.04 samples/sec Loss 1.6019 LearningRate 0.000155 Epoch: 25 Global Step: 535750 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:06:11,840-Speed 2495.08 samples/sec Loss 1.6499 LearningRate 0.000155 Epoch: 25 Global Step: 535760 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:06:20,039-Speed 2498.40 samples/sec Loss 1.6425 LearningRate 0.000155 Epoch: 25 Global Step: 535770 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:06:28,245-Speed 2496.44 samples/sec Loss 1.6564 LearningRate 0.000155 Epoch: 25 Global Step: 535780 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:06:36,467-Speed 2491.15 samples/sec Loss 1.6597 LearningRate 0.000155 Epoch: 25 Global Step: 535790 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:06:44,668-Speed 2497.34 samples/sec Loss 1.6223 LearningRate 0.000155 Epoch: 25 Global Step: 535800 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:06:52,817-Speed 2513.86 samples/sec Loss 1.6349 LearningRate 0.000155 Epoch: 25 Global Step: 535810 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:07:01,014-Speed 2498.77 samples/sec Loss 1.6429 LearningRate 0.000155 Epoch: 25 Global Step: 535820 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:07:09,214-Speed 2498.11 samples/sec Loss 1.6008 LearningRate 0.000155 Epoch: 25 Global Step: 535830 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:07:17,416-Speed 2497.49 samples/sec Loss 1.6242 LearningRate 0.000155 Epoch: 25 Global Step: 535840 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:07:25,614-Speed 2498.53 samples/sec Loss 1.6230 LearningRate 0.000155 Epoch: 25 Global Step: 535850 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:07:33,822-Speed 2495.73 samples/sec Loss 1.6164 LearningRate 0.000155 Epoch: 25 Global Step: 535860 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:07:41,976-Speed 2511.96 samples/sec Loss 1.6016 LearningRate 0.000155 Epoch: 25 Global Step: 535870 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:07:50,176-Speed 2497.83 samples/sec Loss 1.5969 LearningRate 0.000155 Epoch: 25 Global Step: 535880 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:07:58,383-Speed 2496.02 samples/sec Loss 1.6648 LearningRate 0.000155 Epoch: 25 Global Step: 535890 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:08:06,588-Speed 2496.75 samples/sec Loss 1.6183 LearningRate 0.000155 Epoch: 25 Global Step: 535900 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:08:14,789-Speed 2497.40 samples/sec Loss 1.6479 LearningRate 0.000155 Epoch: 25 Global Step: 535910 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:08:22,990-Speed 2497.68 samples/sec Loss 1.6505 LearningRate 0.000155 Epoch: 25 Global Step: 535920 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:08:31,137-Speed 2514.36 samples/sec Loss 1.6599 LearningRate 0.000155 Epoch: 25 Global Step: 535930 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:08:39,336-Speed 2498.46 samples/sec Loss 1.6026 LearningRate 0.000155 Epoch: 25 Global Step: 535940 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:08:47,540-Speed 2496.77 samples/sec Loss 1.6278 LearningRate 0.000155 Epoch: 25 Global Step: 535950 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:08:55,741-Speed 2497.43 samples/sec Loss 1.6537 LearningRate 0.000155 Epoch: 25 Global Step: 535960 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:09:03,955-Speed 2493.72 samples/sec Loss 1.6081 LearningRate 0.000155 Epoch: 25 Global Step: 535970 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:09:12,161-Speed 2496.37 samples/sec Loss 1.5977 LearningRate 0.000155 Epoch: 25 Global Step: 535980 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:09:20,313-Speed 2512.68 samples/sec Loss 1.6309 LearningRate 0.000155 Epoch: 25 Global Step: 535990 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:09:28,520-Speed 2495.74 samples/sec Loss 1.6488 LearningRate 0.000155 Epoch: 25 Global Step: 536000 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:09:36,728-Speed 2495.83 samples/sec Loss 1.6192 LearningRate 0.000155 Epoch: 25 Global Step: 536010 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:09:44,935-Speed 2496.13 samples/sec Loss 1.6144 LearningRate 0.000155 Epoch: 25 Global Step: 536020 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:09:53,138-Speed 2496.95 samples/sec Loss 1.5972 LearningRate 0.000155 Epoch: 25 Global Step: 536030 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:10:01,340-Speed 2497.21 samples/sec Loss 1.6283 LearningRate 0.000155 Epoch: 25 Global Step: 536040 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:10:09,489-Speed 2513.47 samples/sec Loss 1.6526 LearningRate 0.000155 Epoch: 25 Global Step: 536050 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:10:17,702-Speed 2494.42 samples/sec Loss 1.6577 LearningRate 0.000155 Epoch: 25 Global Step: 536060 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:10:25,905-Speed 2496.95 samples/sec Loss 1.5981 LearningRate 0.000155 Epoch: 25 Global Step: 536070 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:10:34,130-Speed 2490.31 samples/sec Loss 1.6107 LearningRate 0.000155 Epoch: 25 Global Step: 536080 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:10:42,333-Speed 2497.44 samples/sec Loss 1.6306 LearningRate 0.000155 Epoch: 25 Global Step: 536090 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:10:50,535-Speed 2497.31 samples/sec Loss 1.6200 LearningRate 0.000154 Epoch: 25 Global Step: 536100 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:10:58,701-Speed 2508.25 samples/sec Loss 1.6251 LearningRate 0.000154 Epoch: 25 Global Step: 536110 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:11:06,911-Speed 2495.95 samples/sec Loss 1.6609 LearningRate 0.000154 Epoch: 25 Global Step: 536120 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:11:15,110-Speed 2498.30 samples/sec Loss 1.6290 LearningRate 0.000154 Epoch: 25 Global Step: 536130 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:11:23,321-Speed 2494.55 samples/sec Loss 1.6224 LearningRate 0.000154 Epoch: 25 Global Step: 536140 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:11:31,521-Speed 2497.85 samples/sec Loss 1.6222 LearningRate 0.000154 Epoch: 25 Global Step: 536150 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:11:39,724-Speed 2496.98 samples/sec Loss 1.6315 LearningRate 0.000154 Epoch: 25 Global Step: 536160 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:11:47,880-Speed 2512.03 samples/sec Loss 1.6238 LearningRate 0.000154 Epoch: 25 Global Step: 536170 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:11:56,091-Speed 2494.47 samples/sec Loss 1.6276 LearningRate 0.000154 Epoch: 25 Global Step: 536180 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:12:04,295-Speed 2497.08 samples/sec Loss 1.6256 LearningRate 0.000154 Epoch: 25 Global Step: 536190 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:12:12,493-Speed 2498.38 samples/sec Loss 1.6073 LearningRate 0.000154 Epoch: 25 Global Step: 536200 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:12:20,698-Speed 2496.47 samples/sec Loss 1.6603 LearningRate 0.000154 Epoch: 25 Global Step: 536210 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:12:28,903-Speed 2496.57 samples/sec Loss 1.6916 LearningRate 0.000154 Epoch: 25 Global Step: 536220 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:12:37,050-Speed 2514.00 samples/sec Loss 1.6429 LearningRate 0.000154 Epoch: 25 Global Step: 536230 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:12:45,258-Speed 2495.64 samples/sec Loss 1.5905 LearningRate 0.000154 Epoch: 25 Global Step: 536240 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:12:53,470-Speed 2494.62 samples/sec Loss 1.6783 LearningRate 0.000154 Epoch: 25 Global Step: 536250 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:13:01,674-Speed 2496.63 samples/sec Loss 1.6509 LearningRate 0.000154 Epoch: 25 Global Step: 536260 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:13:09,875-Speed 2497.76 samples/sec Loss 1.6554 LearningRate 0.000154 Epoch: 25 Global Step: 536270 Fp16 Grad Scale: 32768 Required: 67 hours 
Training: 2022-07-10 17:13:18,091-Speed 2493.16 samples/sec Loss 1.6186 LearningRate 0.000154 Epoch: 25 Global Step: 536280 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:13:26,238-Speed 2514.32 samples/sec Loss 1.6561 LearningRate 0.000154 Epoch: 25 Global Step: 536290 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:13:34,439-Speed 2497.74 samples/sec Loss 1.6444 LearningRate 0.000154 Epoch: 25 Global Step: 536300 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:13:42,642-Speed 2496.89 samples/sec Loss 1.6330 LearningRate 0.000154 Epoch: 25 Global Step: 536310 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:13:50,840-Speed 2498.81 samples/sec Loss 1.6749 LearningRate 0.000154 Epoch: 25 Global Step: 536320 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:13:59,041-Speed 2497.59 samples/sec Loss 1.6297 LearningRate 0.000154 Epoch: 25 Global Step: 536330 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:14:07,243-Speed 2497.39 samples/sec Loss 1.6288 LearningRate 0.000154 Epoch: 25 Global Step: 536340 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:14:15,391-Speed 2514.10 samples/sec Loss 1.6521 LearningRate 0.000154 Epoch: 25 Global Step: 536350 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:14:23,603-Speed 2494.29 samples/sec Loss 1.5732 LearningRate 0.000154 Epoch: 25 Global Step: 536360 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:14:31,809-Speed 2496.12 samples/sec Loss 1.6216 LearningRate 0.000154 Epoch: 25 Global Step: 536370 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:14:40,007-Speed 2498.59 samples/sec Loss 1.6159 LearningRate 0.000154 Epoch: 25 Global Step: 536380 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:14:48,212-Speed 2496.72 samples/sec Loss 1.6356 LearningRate 0.000154 Epoch: 25 Global Step: 536390 Fp16 Grad Scale: 32768 Required: 67 hours 
Training: 2022-07-10 17:14:56,411-Speed 2498.31 samples/sec Loss 1.5870 LearningRate 0.000154 Epoch: 25 Global Step: 536400 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:15:04,557-Speed 2514.38 samples/sec Loss 1.6395 LearningRate 0.000154 Epoch: 25 Global Step: 536410 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:15:12,764-Speed 2495.83 samples/sec Loss 1.6228 LearningRate 0.000154 Epoch: 25 Global Step: 536420 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:15:20,983-Speed 2492.07 samples/sec Loss 1.6238 LearningRate 0.000154 Epoch: 25 Global Step: 536430 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:15:29,184-Speed 2497.79 samples/sec Loss 1.6337 LearningRate 0.000154 Epoch: 25 Global Step: 536440 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-07-10 17:15:37,345-Speed 2509.77 samples/sec Loss 1.6614 LearningRate 0.000154 Epoch: 25 Global Step: 536450 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:15:45,504-Speed 2510.85 samples/sec Loss 1.6362 LearningRate 0.000154 Epoch: 25 Global Step: 536460 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:15:53,648-Speed 2515.17 samples/sec Loss 1.6764 LearningRate 0.000154 Epoch: 25 Global Step: 536470 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:16:01,844-Speed 2499.19 samples/sec Loss 1.6576 LearningRate 0.000154 Epoch: 25 Global Step: 536480 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:16:10,058-Speed 2493.76 samples/sec Loss 1.6015 LearningRate 0.000154 Epoch: 25 Global Step: 536490 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:16:18,256-Speed 2498.67 samples/sec Loss 1.6260 LearningRate 0.000154 Epoch: 25 Global Step: 536500 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:16:26,459-Speed 2496.94 samples/sec Loss 1.6320 LearningRate 0.000154 Epoch: 25 Global Step: 536510 Fp16 Grad Scale: 8192 Required: 67 hours Training: 
2022-07-10 17:16:34,658-Speed 2498.42 samples/sec Loss 1.6402 LearningRate 0.000154 Epoch: 25 Global Step: 536520 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:16:42,806-Speed 2513.93 samples/sec Loss 1.6196 LearningRate 0.000154 Epoch: 25 Global Step: 536530 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:16:51,007-Speed 2497.19 samples/sec Loss 1.6522 LearningRate 0.000154 Epoch: 25 Global Step: 536540 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:16:59,223-Speed 2493.26 samples/sec Loss 1.6455 LearningRate 0.000154 Epoch: 25 Global Step: 536550 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:17:07,436-Speed 2494.11 samples/sec Loss 1.6032 LearningRate 0.000154 Epoch: 25 Global Step: 536560 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:17:15,646-Speed 2494.86 samples/sec Loss 1.6214 LearningRate 0.000154 Epoch: 25 Global Step: 536570 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:17:23,845-Speed 2498.28 samples/sec Loss 1.6366 LearningRate 0.000154 Epoch: 25 Global Step: 536580 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:17:31,994-Speed 2513.78 samples/sec Loss 1.6367 LearningRate 0.000154 Epoch: 25 Global Step: 536590 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:17:40,191-Speed 2498.73 samples/sec Loss 1.6197 LearningRate 0.000154 Epoch: 25 Global Step: 536600 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:17:48,391-Speed 2498.12 samples/sec Loss 1.6189 LearningRate 0.000154 Epoch: 25 Global Step: 536610 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:17:56,598-Speed 2495.80 samples/sec Loss 1.6418 LearningRate 0.000154 Epoch: 25 Global Step: 536620 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:18:04,804-Speed 2496.34 samples/sec Loss 1.6045 LearningRate 0.000154 Epoch: 25 Global Step: 536630 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 
17:18:13,004-Speed 2497.74 samples/sec Loss 1.5962 LearningRate 0.000154 Epoch: 25 Global Step: 536640 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:18:21,148-Speed 2515.19 samples/sec Loss 1.6441 LearningRate 0.000154 Epoch: 25 Global Step: 536650 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:18:29,344-Speed 2499.20 samples/sec Loss 1.6307 LearningRate 0.000154 Epoch: 25 Global Step: 536660 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:18:37,543-Speed 2498.22 samples/sec Loss 1.6301 LearningRate 0.000154 Epoch: 25 Global Step: 536670 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:18:45,747-Speed 2496.58 samples/sec Loss 1.6229 LearningRate 0.000154 Epoch: 25 Global Step: 536680 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:18:53,950-Speed 2497.52 samples/sec Loss 1.6590 LearningRate 0.000154 Epoch: 25 Global Step: 536690 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:19:02,151-Speed 2497.92 samples/sec Loss 1.6244 LearningRate 0.000154 Epoch: 25 Global Step: 536700 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:19:10,310-Speed 2510.52 samples/sec Loss 1.6353 LearningRate 0.000154 Epoch: 25 Global Step: 536710 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:19:18,508-Speed 2498.88 samples/sec Loss 1.6181 LearningRate 0.000154 Epoch: 25 Global Step: 536720 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:19:26,707-Speed 2498.05 samples/sec Loss 1.6197 LearningRate 0.000154 Epoch: 25 Global Step: 536730 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:19:34,912-Speed 2496.44 samples/sec Loss 1.6221 LearningRate 0.000154 Epoch: 25 Global Step: 536740 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:19:43,115-Speed 2497.13 samples/sec Loss 1.6200 LearningRate 0.000154 Epoch: 25 Global Step: 536750 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:19:51,319-Speed 
2496.81 samples/sec Loss 1.5667 LearningRate 0.000154 Epoch: 25 Global Step: 536760 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:19:59,475-Speed 2511.56 samples/sec Loss 1.6588 LearningRate 0.000154 Epoch: 25 Global Step: 536770 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:20:07,676-Speed 2497.43 samples/sec Loss 1.6512 LearningRate 0.000154 Epoch: 25 Global Step: 536780 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:20:15,878-Speed 2497.58 samples/sec Loss 1.6115 LearningRate 0.000154 Epoch: 25 Global Step: 536790 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:20:24,092-Speed 2493.49 samples/sec Loss 1.6215 LearningRate 0.000154 Epoch: 25 Global Step: 536800 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:20:32,293-Speed 2497.84 samples/sec Loss 1.6341 LearningRate 0.000154 Epoch: 25 Global Step: 536810 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:20:40,501-Speed 2495.41 samples/sec Loss 1.6518 LearningRate 0.000154 Epoch: 25 Global Step: 536820 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:20:48,649-Speed 2514.06 samples/sec Loss 1.6044 LearningRate 0.000154 Epoch: 25 Global Step: 536830 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:20:56,848-Speed 2498.08 samples/sec Loss 1.6283 LearningRate 0.000154 Epoch: 25 Global Step: 536840 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:21:05,060-Speed 2494.55 samples/sec Loss 1.6126 LearningRate 0.000154 Epoch: 25 Global Step: 536850 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:21:13,261-Speed 2497.94 samples/sec Loss 1.6372 LearningRate 0.000154 Epoch: 25 Global Step: 536860 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:21:21,471-Speed 2494.80 samples/sec Loss 1.6323 LearningRate 0.000154 Epoch: 25 Global Step: 536870 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:21:29,673-Speed 2497.27 samples/sec 
Loss 1.5979 LearningRate 0.000154 Epoch: 25 Global Step: 536880 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:21:37,821-Speed 2513.98 samples/sec Loss 1.6186 LearningRate 0.000154 Epoch: 25 Global Step: 536890 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:21:46,030-Speed 2495.36 samples/sec Loss 1.6370 LearningRate 0.000154 Epoch: 25 Global Step: 536900 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:21:54,231-Speed 2497.51 samples/sec Loss 1.6295 LearningRate 0.000154 Epoch: 25 Global Step: 536910 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:22:02,435-Speed 2497.23 samples/sec Loss 1.6096 LearningRate 0.000154 Epoch: 25 Global Step: 536920 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:22:10,632-Speed 2498.77 samples/sec Loss 1.6221 LearningRate 0.000154 Epoch: 25 Global Step: 536930 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:22:18,835-Speed 2496.75 samples/sec Loss 1.6354 LearningRate 0.000154 Epoch: 25 Global Step: 536940 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:22:26,982-Speed 2514.09 samples/sec Loss 1.6372 LearningRate 0.000154 Epoch: 25 Global Step: 536950 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:22:35,185-Speed 2497.16 samples/sec Loss 1.6714 LearningRate 0.000154 Epoch: 25 Global Step: 536960 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:22:43,386-Speed 2497.93 samples/sec Loss 1.6199 LearningRate 0.000154 Epoch: 25 Global Step: 536970 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:22:51,586-Speed 2498.07 samples/sec Loss 1.6050 LearningRate 0.000154 Epoch: 25 Global Step: 536980 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:22:59,790-Speed 2497.12 samples/sec Loss 1.6389 LearningRate 0.000154 Epoch: 25 Global Step: 536990 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:23:07,990-Speed 2497.91 samples/sec Loss 1.6237 
LearningRate 0.000154 Epoch: 25 Global Step: 537000 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:23:16,140-Speed 2513.56 samples/sec Loss 1.6228 LearningRate 0.000154 Epoch: 25 Global Step: 537010 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:23:24,339-Speed 2498.00 samples/sec Loss 1.6054 LearningRate 0.000154 Epoch: 25 Global Step: 537020 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:23:32,535-Speed 2499.04 samples/sec Loss 1.6061 LearningRate 0.000154 Epoch: 25 Global Step: 537030 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:23:40,740-Speed 2496.86 samples/sec Loss 1.6052 LearningRate 0.000154 Epoch: 25 Global Step: 537040 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:23:48,940-Speed 2497.74 samples/sec Loss 1.6163 LearningRate 0.000153 Epoch: 25 Global Step: 537050 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:23:57,145-Speed 2496.54 samples/sec Loss 1.5919 LearningRate 0.000153 Epoch: 25 Global Step: 537060 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:24:05,291-Speed 2514.33 samples/sec Loss 1.6148 LearningRate 0.000153 Epoch: 25 Global Step: 537070 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:24:13,503-Speed 2494.28 samples/sec Loss 1.6362 LearningRate 0.000153 Epoch: 25 Global Step: 537080 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:24:21,713-Speed 2495.17 samples/sec Loss 1.6413 LearningRate 0.000153 Epoch: 25 Global Step: 537090 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:24:29,919-Speed 2495.93 samples/sec Loss 1.6021 LearningRate 0.000153 Epoch: 25 Global Step: 537100 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:24:38,118-Speed 2498.61 samples/sec Loss 1.6478 LearningRate 0.000153 Epoch: 25 Global Step: 537110 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:24:46,319-Speed 2497.80 samples/sec Loss 1.6366 LearningRate 
0.000153 Epoch: 25 Global Step: 537120 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:24:54,471-Speed 2512.61 samples/sec Loss 1.6376 LearningRate 0.000153 Epoch: 25 Global Step: 537130 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:25:02,672-Speed 2497.85 samples/sec Loss 1.6305 LearningRate 0.000153 Epoch: 25 Global Step: 537140 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:25:10,869-Speed 2498.52 samples/sec Loss 1.6290 LearningRate 0.000153 Epoch: 25 Global Step: 537150 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:25:19,069-Speed 2498.05 samples/sec Loss 1.6368 LearningRate 0.000153 Epoch: 25 Global Step: 537160 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:25:27,292-Speed 2491.01 samples/sec Loss 1.5975 LearningRate 0.000153 Epoch: 25 Global Step: 537170 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:25:35,494-Speed 2497.30 samples/sec Loss 1.6169 LearningRate 0.000153 Epoch: 25 Global Step: 537180 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:25:43,652-Speed 2511.07 samples/sec Loss 1.6441 LearningRate 0.000153 Epoch: 25 Global Step: 537190 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:25:51,847-Speed 2499.45 samples/sec Loss 1.7006 LearningRate 0.000153 Epoch: 25 Global Step: 537200 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:26:00,051-Speed 2496.63 samples/sec Loss 1.6163 LearningRate 0.000153 Epoch: 25 Global Step: 537210 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:26:08,258-Speed 2495.91 samples/sec Loss 1.6323 LearningRate 0.000153 Epoch: 25 Global Step: 537220 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:26:16,459-Speed 2497.62 samples/sec Loss 1.6458 LearningRate 0.000153 Epoch: 25 Global Step: 537230 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:26:24,658-Speed 2498.26 samples/sec Loss 1.6302 LearningRate 0.000153 Epoch: 25 
Global Step: 537240 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:26:32,801-Speed 2515.46 samples/sec Loss 1.6348 LearningRate 0.000153 Epoch: 25 Global Step: 537250 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:26:41,006-Speed 2496.50 samples/sec Loss 1.6556 LearningRate 0.000153 Epoch: 25 Global Step: 537260 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:26:49,207-Speed 2497.63 samples/sec Loss 1.6266 LearningRate 0.000153 Epoch: 25 Global Step: 537270 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:26:57,408-Speed 2497.65 samples/sec Loss 1.6239 LearningRate 0.000153 Epoch: 25 Global Step: 537280 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:27:05,609-Speed 2497.47 samples/sec Loss 1.6214 LearningRate 0.000153 Epoch: 25 Global Step: 537290 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:27:13,810-Speed 2497.54 samples/sec Loss 1.6538 LearningRate 0.000153 Epoch: 25 Global Step: 537300 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:27:21,956-Speed 2514.48 samples/sec Loss 1.6469 LearningRate 0.000153 Epoch: 25 Global Step: 537310 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:27:30,154-Speed 2498.64 samples/sec Loss 1.6215 LearningRate 0.000153 Epoch: 25 Global Step: 537320 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:27:38,357-Speed 2497.10 samples/sec Loss 1.6230 LearningRate 0.000153 Epoch: 25 Global Step: 537330 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:27:46,561-Speed 2496.65 samples/sec Loss 1.6062 LearningRate 0.000153 Epoch: 25 Global Step: 537340 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:27:54,768-Speed 2496.19 samples/sec Loss 1.6374 LearningRate 0.000153 Epoch: 25 Global Step: 537350 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:28:02,965-Speed 2499.14 samples/sec Loss 1.6190 LearningRate 0.000153 Epoch: 25 Global Step: 537360 
Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:28:11,112-Speed 2514.06 samples/sec Loss 1.6410 LearningRate 0.000153 Epoch: 25 Global Step: 537370 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:28:19,320-Speed 2495.48 samples/sec Loss 1.5938 LearningRate 0.000153 Epoch: 25 Global Step: 537380 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:28:27,523-Speed 2496.85 samples/sec Loss 1.5819 LearningRate 0.000153 Epoch: 25 Global Step: 537390 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:28:35,728-Speed 2496.43 samples/sec Loss 1.6348 LearningRate 0.000153 Epoch: 25 Global Step: 537400 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:28:43,925-Speed 2498.85 samples/sec Loss 1.6468 LearningRate 0.000153 Epoch: 25 Global Step: 537410 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:28:52,123-Speed 2498.53 samples/sec Loss 1.5937 LearningRate 0.000153 Epoch: 25 Global Step: 537420 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:29:00,267-Speed 2515.56 samples/sec Loss 1.6184 LearningRate 0.000153 Epoch: 25 Global Step: 537430 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:29:08,464-Speed 2498.68 samples/sec Loss 1.6082 LearningRate 0.000153 Epoch: 25 Global Step: 537440 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:29:16,751-Speed 2471.75 samples/sec Loss 1.6326 LearningRate 0.000153 Epoch: 25 Global Step: 537450 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:29:24,952-Speed 2497.48 samples/sec Loss 1.6047 LearningRate 0.000153 Epoch: 25 Global Step: 537460 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:29:33,158-Speed 2496.12 samples/sec Loss 1.6017 LearningRate 0.000153 Epoch: 25 Global Step: 537470 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:29:41,360-Speed 2497.79 samples/sec Loss 1.6227 LearningRate 0.000153 Epoch: 25 Global Step: 537480 Fp16 Grad Scale: 
8192 Required: 67 hours Training: 2022-07-10 17:29:49,504-Speed 2515.00 samples/sec Loss 1.6606 LearningRate 0.000153 Epoch: 25 Global Step: 537490 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:29:57,706-Speed 2497.90 samples/sec Loss 1.6417 LearningRate 0.000153 Epoch: 25 Global Step: 537500 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:30:05,904-Speed 2498.59 samples/sec Loss 1.6069 LearningRate 0.000153 Epoch: 25 Global Step: 537510 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:30:14,102-Speed 2498.49 samples/sec Loss 1.6423 LearningRate 0.000153 Epoch: 25 Global Step: 537520 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:30:22,303-Speed 2497.65 samples/sec Loss 1.6317 LearningRate 0.000153 Epoch: 25 Global Step: 537530 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:30:30,508-Speed 2496.28 samples/sec Loss 1.6409 LearningRate 0.000153 Epoch: 25 Global Step: 537540 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:30:38,654-Speed 2514.78 samples/sec Loss 1.6454 LearningRate 0.000153 Epoch: 25 Global Step: 537550 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:30:46,853-Speed 2498.04 samples/sec Loss 1.6003 LearningRate 0.000153 Epoch: 25 Global Step: 537560 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:30:55,055-Speed 2497.35 samples/sec Loss 1.6188 LearningRate 0.000153 Epoch: 25 Global Step: 537570 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:31:03,253-Speed 2498.72 samples/sec Loss 1.6376 LearningRate 0.000153 Epoch: 25 Global Step: 537580 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:31:11,449-Speed 2499.03 samples/sec Loss 1.5757 LearningRate 0.000153 Epoch: 25 Global Step: 537590 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:31:19,644-Speed 2499.42 samples/sec Loss 1.6106 LearningRate 0.000153 Epoch: 25 Global Step: 537600 Fp16 Grad Scale: 8192 Required: 67 
hours Training: 2022-07-10 17:31:27,801-Speed 2511.17 samples/sec Loss 1.6327 LearningRate 0.000153 Epoch: 25 Global Step: 537610 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:31:35,997-Speed 2499.40 samples/sec Loss 1.6360 LearningRate 0.000153 Epoch: 25 Global Step: 537620 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:31:44,197-Speed 2497.83 samples/sec Loss 1.6553 LearningRate 0.000153 Epoch: 25 Global Step: 537630 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:31:52,395-Speed 2498.64 samples/sec Loss 1.6402 LearningRate 0.000153 Epoch: 25 Global Step: 537640 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:32:00,592-Speed 2498.91 samples/sec Loss 1.5958 LearningRate 0.000153 Epoch: 25 Global Step: 537650 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:32:08,789-Speed 2499.20 samples/sec Loss 1.6139 LearningRate 0.000153 Epoch: 25 Global Step: 537660 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:32:16,937-Speed 2513.94 samples/sec Loss 1.6103 LearningRate 0.000153 Epoch: 25 Global Step: 537670 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:32:25,135-Speed 2498.60 samples/sec Loss 1.6549 LearningRate 0.000153 Epoch: 25 Global Step: 537680 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:32:33,353-Speed 2492.26 samples/sec Loss 1.6749 LearningRate 0.000153 Epoch: 25 Global Step: 537690 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:32:41,561-Speed 2495.75 samples/sec Loss 1.6147 LearningRate 0.000153 Epoch: 25 Global Step: 537700 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:32:49,760-Speed 2498.12 samples/sec Loss 1.6259 LearningRate 0.000153 Epoch: 25 Global Step: 537710 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:32:57,965-Speed 2496.32 samples/sec Loss 1.6113 LearningRate 0.000153 Epoch: 25 Global Step: 537720 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:33:06,136-Speed 2506.87 samples/sec Loss 1.6355 LearningRate 0.000153 Epoch: 25 Global Step: 537730 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:33:14,335-Speed 2498.34 samples/sec Loss 1.6063 LearningRate 0.000153 Epoch: 25 Global Step: 537740 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:33:22,545-Speed 2494.95 samples/sec Loss 1.5893 LearningRate 0.000153 Epoch: 25 Global Step: 537750 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:33:30,747-Speed 2497.52 samples/sec Loss 1.6034 LearningRate 0.000153 Epoch: 25 Global Step: 537760 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:33:38,951-Speed 2496.66 samples/sec Loss 1.6620 LearningRate 0.000153 Epoch: 25 Global Step: 537770 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:33:47,153-Speed 2497.25 samples/sec Loss 1.6598 LearningRate 0.000153 Epoch: 25 Global Step: 537780 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:33:55,304-Speed 2513.08 samples/sec Loss 1.6129 LearningRate 0.000153 Epoch: 25 Global Step: 537790 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:34:03,511-Speed 2495.51 samples/sec Loss 1.6527 LearningRate 0.000153 Epoch: 25 Global Step: 537800 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:34:11,713-Speed 2497.83 samples/sec Loss 1.6173 LearningRate 0.000153 Epoch: 25 Global Step: 537810 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:34:19,911-Speed 2498.79 samples/sec Loss 1.6393 LearningRate 0.000153 Epoch: 25 Global Step: 537820 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:34:28,115-Speed 2496.98 samples/sec Loss 1.6256 LearningRate 0.000153 Epoch: 25 Global Step: 537830 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:34:36,318-Speed 2496.98 samples/sec Loss 1.6022 LearningRate 0.000153 Epoch: 25 Global Step: 537840 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:34:44,467-Speed 2513.75 samples/sec Loss 1.6244 LearningRate 0.000153 Epoch: 25 Global Step: 537850 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:34:52,666-Speed 2498.19 samples/sec Loss 1.6102 LearningRate 0.000153 Epoch: 25 Global Step: 537860 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:35:00,881-Speed 2493.51 samples/sec Loss 1.6220 LearningRate 0.000153 Epoch: 25 Global Step: 537870 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:35:09,083-Speed 2500.38 samples/sec Loss 1.6297 LearningRate 0.000153 Epoch: 25 Global Step: 537880 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:35:17,290-Speed 2495.88 samples/sec Loss 1.5898 LearningRate 0.000153 Epoch: 25 Global Step: 537890 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:35:25,496-Speed 2496.22 samples/sec Loss 1.6126 LearningRate 0.000153 Epoch: 25 Global Step: 537900 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:35:33,652-Speed 2511.23 samples/sec Loss 1.6444 LearningRate 0.000153 Epoch: 25 Global Step: 537910 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:35:41,855-Speed 2497.49 samples/sec Loss 1.5895 LearningRate 0.000153 Epoch: 25 Global Step: 537920 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:35:50,065-Speed 2494.93 samples/sec Loss 1.6255 LearningRate 0.000153 Epoch: 25 Global Step: 537930 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:35:58,269-Speed 2496.61 samples/sec Loss 1.6297 LearningRate 0.000153 Epoch: 25 Global Step: 537940 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:36:06,471-Speed 2497.25 samples/sec Loss 1.6038 LearningRate 0.000153 Epoch: 25 Global Step: 537950 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:36:14,672-Speed 2497.74 samples/sec Loss 1.6242 LearningRate 0.000153 Epoch: 25 Global Step: 537960 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:36:22,815-Speed 2516.08 samples/sec Loss 1.6256 LearningRate 0.000153 Epoch: 25 Global Step: 537970 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:36:31,013-Speed 2498.58 samples/sec Loss 1.6243 LearningRate 0.000153 Epoch: 25 Global Step: 537980 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:36:39,212-Speed 2498.18 samples/sec Loss 1.6128 LearningRate 0.000153 Epoch: 25 Global Step: 537990 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:36:47,412-Speed 2498.18 samples/sec Loss 1.6363 LearningRate 0.000153 Epoch: 25 Global Step: 538000 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:36:55,627-Speed 2493.10 samples/sec Loss 1.6052 LearningRate 0.000152 Epoch: 25 Global Step: 538010 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:37:03,827-Speed 2497.84 samples/sec Loss 1.6715 LearningRate 0.000152 Epoch: 25 Global Step: 538020 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:37:11,987-Speed 2510.54 samples/sec Loss 1.6361 LearningRate 0.000152 Epoch: 25 Global Step: 538030 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:37:20,191-Speed 2496.70 samples/sec Loss 1.6353 LearningRate 0.000152 Epoch: 25 Global Step: 538040 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:37:28,391-Speed 2497.78 samples/sec Loss 1.6536 LearningRate 0.000152 Epoch: 25 Global Step: 538050 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:37:36,591-Speed 2497.90 samples/sec Loss 1.6351 LearningRate 0.000152 Epoch: 25 Global Step: 538060 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:37:44,797-Speed 2496.20 samples/sec Loss 1.6205 LearningRate 0.000152 Epoch: 25 Global Step: 538070 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:37:52,994-Speed 2499.40 samples/sec Loss 1.6251 LearningRate 0.000152 Epoch: 25 Global Step: 538080 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:38:01,151-Speed 2510.84 samples/sec Loss 1.6355 LearningRate 0.000152 Epoch: 25 Global Step: 538090 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:38:09,367-Speed 2493.19 samples/sec Loss 1.6278 LearningRate 0.000152 Epoch: 25 Global Step: 538100 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:38:17,568-Speed 2497.65 samples/sec Loss 1.6063 LearningRate 0.000152 Epoch: 25 Global Step: 538110 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:38:25,771-Speed 2497.08 samples/sec Loss 1.6517 LearningRate 0.000152 Epoch: 25 Global Step: 538120 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:38:33,972-Speed 2497.74 samples/sec Loss 1.5958 LearningRate 0.000152 Epoch: 25 Global Step: 538130 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:38:42,172-Speed 2498.28 samples/sec Loss 1.6283 LearningRate 0.000152 Epoch: 25 Global Step: 538140 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:38:50,319-Speed 2514.15 samples/sec Loss 1.6152 LearningRate 0.000152 Epoch: 25 Global Step: 538150 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:38:58,517-Speed 2498.37 samples/sec Loss 1.6039 LearningRate 0.000152 Epoch: 25 Global Step: 538160 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:39:06,719-Speed 2497.40 samples/sec Loss 1.5886 LearningRate 0.000152 Epoch: 25 Global Step: 538170 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:39:14,918-Speed 2498.43 samples/sec Loss 1.6169 LearningRate 0.000152 Epoch: 25 Global Step: 538180 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:39:23,122-Speed 2497.00 samples/sec Loss 1.6690 LearningRate 0.000152 Epoch: 25 Global Step: 538190 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:39:31,321-Speed 2498.21 samples/sec Loss 1.6173 LearningRate 0.000152 Epoch: 25 Global Step: 538200 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:39:39,468-Speed 2515.21 samples/sec Loss 1.6477 LearningRate 0.000152 Epoch: 25 Global Step: 538210 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:39:47,669-Speed 2497.48 samples/sec Loss 1.6653 LearningRate 0.000152 Epoch: 25 Global Step: 538220 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:39:55,881-Speed 2494.57 samples/sec Loss 1.6293 LearningRate 0.000152 Epoch: 25 Global Step: 538230 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:40:04,079-Speed 2498.49 samples/sec Loss 1.5972 LearningRate 0.000152 Epoch: 25 Global Step: 538240 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:40:12,280-Speed 2497.64 samples/sec Loss 1.6571 LearningRate 0.000152 Epoch: 25 Global Step: 538250 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:40:20,482-Speed 2497.34 samples/sec Loss 1.6339 LearningRate 0.000152 Epoch: 25 Global Step: 538260 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:40:28,641-Speed 2510.38 samples/sec Loss 1.6177 LearningRate 0.000152 Epoch: 25 Global Step: 538270 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:40:36,845-Speed 2497.12 samples/sec Loss 1.6410 LearningRate 0.000152 Epoch: 25 Global Step: 538280 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:40:45,047-Speed 2497.29 samples/sec Loss 1.6039 LearningRate 0.000152 Epoch: 25 Global Step: 538290 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:40:53,249-Speed 2497.33 samples/sec Loss 1.6308 LearningRate 0.000152 Epoch: 25 Global Step: 538300 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:41:01,458-Speed 2495.49 samples/sec Loss 1.6050 LearningRate 0.000152 Epoch: 25 Global Step: 538310 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:41:09,659-Speed 2497.57 samples/sec Loss 1.6181 LearningRate 0.000152 Epoch: 25 Global Step: 538320 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:41:17,805-Speed 2514.39 samples/sec Loss 1.5973 LearningRate 0.000152 Epoch: 25 Global Step: 538330 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:41:26,003-Speed 2498.49 samples/sec Loss 1.5872 LearningRate 0.000152 Epoch: 25 Global Step: 538340 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:41:34,203-Speed 2498.06 samples/sec Loss 1.6271 LearningRate 0.000152 Epoch: 25 Global Step: 538350 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:41:42,402-Speed 2498.05 samples/sec Loss 1.6138 LearningRate 0.000152 Epoch: 25 Global Step: 538360 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:41:50,603-Speed 2497.72 samples/sec Loss 1.6012 LearningRate 0.000152 Epoch: 25 Global Step: 538370 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:41:58,808-Speed 2496.56 samples/sec Loss 1.5963 LearningRate 0.000152 Epoch: 25 Global Step: 538380 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:42:06,956-Speed 2513.76 samples/sec Loss 1.6420 LearningRate 0.000152 Epoch: 25 Global Step: 538390 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:42:15,160-Speed 2496.71 samples/sec Loss 1.6160 LearningRate 0.000152 Epoch: 25 Global Step: 538400 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:42:23,371-Speed 2494.70 samples/sec Loss 1.5834 LearningRate 0.000152 Epoch: 25 Global Step: 538410 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:42:31,570-Speed 2498.25 samples/sec Loss 1.6110 LearningRate 0.000152 Epoch: 25 Global Step: 538420 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:42:39,772-Speed 2497.23 samples/sec Loss 1.6303 LearningRate 0.000152 Epoch: 25 Global Step: 538430 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:42:47,971-Speed 2498.02 samples/sec Loss 1.5828 LearningRate 0.000152 Epoch: 25 Global Step: 538440 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:42:56,115-Speed 2515.06 samples/sec Loss 1.6389 LearningRate 0.000152 Epoch: 25 Global Step: 538450 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:43:04,314-Speed 2498.27 samples/sec Loss 1.6082 LearningRate 0.000152 Epoch: 25 Global Step: 538460 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:43:12,516-Speed 2497.50 samples/sec Loss 1.5690 LearningRate 0.000152 Epoch: 25 Global Step: 538470 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:43:20,724-Speed 2495.52 samples/sec Loss 1.6410 LearningRate 0.000152 Epoch: 25 Global Step: 538480 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:43:28,922-Speed 2498.39 samples/sec Loss 1.6140 LearningRate 0.000152 Epoch: 25 Global Step: 538490 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:43:37,125-Speed 2497.11 samples/sec Loss 1.5820 LearningRate 0.000152 Epoch: 25 Global Step: 538500 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:43:45,277-Speed 2512.76 samples/sec Loss 1.6726 LearningRate 0.000152 Epoch: 25 Global Step: 538510 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:43:53,477-Speed 2498.13 samples/sec Loss 1.6326 LearningRate 0.000152 Epoch: 25 Global Step: 538520 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:44:01,678-Speed 2497.56 samples/sec Loss 1.6114 LearningRate 0.000152 Epoch: 25 Global Step: 538530 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:44:09,894-Speed 2493.28 samples/sec Loss 1.6184 LearningRate 0.000152 Epoch: 25 Global Step: 538540 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:44:18,090-Speed 2499.28 samples/sec Loss 1.5967 LearningRate 0.000152 Epoch: 25 Global Step: 538550 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:44:26,291-Speed 2497.64 samples/sec Loss 1.6282 LearningRate 0.000152 Epoch: 25 Global Step: 538560 Fp16 Grad Scale: 16384 Required: 67 hours 
Training: 2022-07-10 17:44:34,438-Speed 2514.19 samples/sec Loss 1.5964 LearningRate 0.000152 Epoch: 25 Global Step: 538570 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:44:42,639-Speed 2497.91 samples/sec Loss 1.6519 LearningRate 0.000152 Epoch: 25 Global Step: 538580 Fp16 Grad Scale: 16384 Required: 67 hours Training: 2022-07-10 17:44:50,805-Speed 2508.31 samples/sec Loss 1.6013 LearningRate 0.000152 Epoch: 25 Global Step: 538590 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:44:59,012-Speed 2496.35 samples/sec Loss 1.6273 LearningRate 0.000152 Epoch: 25 Global Step: 538600 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:45:07,207-Speed 2499.53 samples/sec Loss 1.5909 LearningRate 0.000152 Epoch: 25 Global Step: 538610 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:45:15,415-Speed 2495.84 samples/sec Loss 1.6308 LearningRate 0.000152 Epoch: 25 Global Step: 538620 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:45:23,560-Speed 2514.69 samples/sec Loss 1.6446 LearningRate 0.000152 Epoch: 25 Global Step: 538630 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:45:31,769-Speed 2495.40 samples/sec Loss 1.6416 LearningRate 0.000152 Epoch: 25 Global Step: 538640 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:45:39,968-Speed 2498.26 samples/sec Loss 1.6369 LearningRate 0.000152 Epoch: 25 Global Step: 538650 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:45:48,168-Speed 2498.23 samples/sec Loss 1.6044 LearningRate 0.000152 Epoch: 25 Global Step: 538660 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:45:56,366-Speed 2498.46 samples/sec Loss 1.6459 LearningRate 0.000152 Epoch: 25 Global Step: 538670 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:46:04,576-Speed 2495.00 samples/sec Loss 1.6402 LearningRate 0.000152 Epoch: 25 Global Step: 538680 Fp16 Grad Scale: 8192 Required: 67 hours Training: 
2022-07-10 17:46:12,727-Speed 2512.96 samples/sec Loss 1.6118 LearningRate 0.000152 Epoch: 25 Global Step: 538690 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:46:20,927-Speed 2497.94 samples/sec Loss 1.6244 LearningRate 0.000152 Epoch: 25 Global Step: 538700 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:46:29,126-Speed 2498.29 samples/sec Loss 1.5934 LearningRate 0.000152 Epoch: 25 Global Step: 538710 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:46:37,324-Speed 2498.47 samples/sec Loss 1.6144 LearningRate 0.000152 Epoch: 25 Global Step: 538720 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:46:45,524-Speed 2498.42 samples/sec Loss 1.6301 LearningRate 0.000152 Epoch: 25 Global Step: 538730 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:46:53,724-Speed 2497.60 samples/sec Loss 1.6624 LearningRate 0.000152 Epoch: 25 Global Step: 538740 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:47:01,869-Speed 2514.86 samples/sec Loss 1.5983 LearningRate 0.000152 Epoch: 25 Global Step: 538750 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:47:10,078-Speed 2495.36 samples/sec Loss 1.5829 LearningRate 0.000152 Epoch: 25 Global Step: 538760 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:47:18,281-Speed 2496.83 samples/sec Loss 1.6288 LearningRate 0.000152 Epoch: 25 Global Step: 538770 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:47:26,492-Speed 2494.93 samples/sec Loss 1.6077 LearningRate 0.000152 Epoch: 25 Global Step: 538780 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:47:34,693-Speed 2497.59 samples/sec Loss 1.6367 LearningRate 0.000152 Epoch: 25 Global Step: 538790 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:47:42,892-Speed 2498.38 samples/sec Loss 1.6284 LearningRate 0.000152 Epoch: 25 Global Step: 538800 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 
17:47:51,037-Speed 2514.90 samples/sec Loss 1.5939 LearningRate 0.000152 Epoch: 25 Global Step: 538810 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:47:59,237-Speed 2497.76 samples/sec Loss 1.6257 LearningRate 0.000152 Epoch: 25 Global Step: 538820 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:48:07,440-Speed 2497.15 samples/sec Loss 1.6376 LearningRate 0.000152 Epoch: 25 Global Step: 538830 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:48:15,643-Speed 2497.03 samples/sec Loss 1.6577 LearningRate 0.000152 Epoch: 25 Global Step: 538840 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:48:23,841-Speed 2498.53 samples/sec Loss 1.6113 LearningRate 0.000152 Epoch: 25 Global Step: 538850 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:48:32,059-Speed 2492.52 samples/sec Loss 1.6647 LearningRate 0.000152 Epoch: 25 Global Step: 538860 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:48:40,209-Speed 2513.45 samples/sec Loss 1.6532 LearningRate 0.000152 Epoch: 25 Global Step: 538870 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:48:48,412-Speed 2497.06 samples/sec Loss 1.6108 LearningRate 0.000152 Epoch: 25 Global Step: 538880 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:48:56,622-Speed 2494.69 samples/sec Loss 1.6267 LearningRate 0.000152 Epoch: 25 Global Step: 538890 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:49:04,825-Speed 2497.19 samples/sec Loss 1.6448 LearningRate 0.000152 Epoch: 25 Global Step: 538900 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:49:13,038-Speed 2494.03 samples/sec Loss 1.6101 LearningRate 0.000152 Epoch: 25 Global Step: 538910 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:49:21,236-Speed 2498.49 samples/sec Loss 1.6068 LearningRate 0.000152 Epoch: 25 Global Step: 538920 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:49:29,392-Speed 
2511.26 samples/sec Loss 1.6230 LearningRate 0.000152 Epoch: 25 Global Step: 538930 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:49:37,601-Speed 2495.30 samples/sec Loss 1.6481 LearningRate 0.000152 Epoch: 25 Global Step: 538940 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:49:45,803-Speed 2497.52 samples/sec Loss 1.6292 LearningRate 0.000152 Epoch: 25 Global Step: 538950 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:49:54,003-Speed 2498.05 samples/sec Loss 1.6291 LearningRate 0.000152 Epoch: 25 Global Step: 538960 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:50:02,209-Speed 2496.01 samples/sec Loss 1.6519 LearningRate 0.000151 Epoch: 25 Global Step: 538970 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:50:10,407-Speed 2498.59 samples/sec Loss 1.5818 LearningRate 0.000151 Epoch: 25 Global Step: 538980 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:50:18,558-Speed 2513.12 samples/sec Loss 1.6086 LearningRate 0.000151 Epoch: 25 Global Step: 538990 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:50:26,764-Speed 2495.84 samples/sec Loss 1.6113 LearningRate 0.000151 Epoch: 25 Global Step: 539000 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:50:34,976-Speed 2494.35 samples/sec Loss 1.5915 LearningRate 0.000151 Epoch: 25 Global Step: 539010 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:50:43,198-Speed 2491.25 samples/sec Loss 1.5970 LearningRate 0.000151 Epoch: 25 Global Step: 539020 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:50:51,395-Speed 2498.78 samples/sec Loss 1.6648 LearningRate 0.000151 Epoch: 25 Global Step: 539030 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:50:59,605-Speed 2494.98 samples/sec Loss 1.6004 LearningRate 0.000151 Epoch: 25 Global Step: 539040 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:51:07,754-Speed 2513.43 samples/sec 
Loss 1.6378 LearningRate 0.000151 Epoch: 25 Global Step: 539050 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:51:15,952-Speed 2498.67 samples/sec Loss 1.5999 LearningRate 0.000151 Epoch: 25 Global Step: 539060 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:51:24,152-Speed 2498.31 samples/sec Loss 1.6378 LearningRate 0.000151 Epoch: 25 Global Step: 539070 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:51:32,351-Speed 2497.96 samples/sec Loss 1.6721 LearningRate 0.000151 Epoch: 25 Global Step: 539080 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:51:40,556-Speed 2496.45 samples/sec Loss 1.6575 LearningRate 0.000151 Epoch: 25 Global Step: 539090 Fp16 Grad Scale: 8192 Required: 67 hours Training: 2022-07-10 17:51:48,758-Speed 2497.85 samples/sec Loss 1.6259 LearningRate 0.000151 Epoch: 25 Global Step: 539100 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:51:56,904-Speed 2514.37 samples/sec Loss 1.5548 LearningRate 0.000151 Epoch: 25 Global Step: 539110 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:52:05,103-Speed 2498.60 samples/sec Loss 1.6398 LearningRate 0.000151 Epoch: 25 Global Step: 539120 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:52:13,305-Speed 2497.62 samples/sec Loss 1.5872 LearningRate 0.000151 Epoch: 25 Global Step: 539130 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:52:21,509-Speed 2496.91 samples/sec Loss 1.6158 LearningRate 0.000151 Epoch: 25 Global Step: 539140 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:52:29,708-Speed 2498.03 samples/sec Loss 1.5877 LearningRate 0.000151 Epoch: 25 Global Step: 539150 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:52:37,921-Speed 2494.08 samples/sec Loss 1.6118 LearningRate 0.000151 Epoch: 25 Global Step: 539160 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:52:46,075-Speed 2512.08 samples/sec Loss 1.6281 
LearningRate 0.000151 Epoch: 25 Global Step: 539170 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:52:54,275-Speed 2498.13 samples/sec Loss 1.6408 LearningRate 0.000151 Epoch: 25 Global Step: 539180 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:53:02,473-Speed 2498.45 samples/sec Loss 1.6153 LearningRate 0.000151 Epoch: 25 Global Step: 539190 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:53:10,673-Speed 2497.83 samples/sec Loss 1.6198 LearningRate 0.000151 Epoch: 25 Global Step: 539200 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:53:18,872-Speed 2498.40 samples/sec Loss 1.6382 LearningRate 0.000151 Epoch: 25 Global Step: 539210 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:53:27,077-Speed 2496.74 samples/sec Loss 1.6619 LearningRate 0.000151 Epoch: 25 Global Step: 539220 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:53:37,291-Speed 2005.25 samples/sec Loss 1.6346 LearningRate 0.000151 Epoch: 26 Global Step: 539230 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:53:45,484-Speed 2500.40 samples/sec Loss 1.5829 LearningRate 0.000151 Epoch: 26 Global Step: 539240 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:53:53,682-Speed 2498.63 samples/sec Loss 1.5929 LearningRate 0.000151 Epoch: 26 Global Step: 539250 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:54:01,875-Speed 2499.98 samples/sec Loss 1.6367 LearningRate 0.000151 Epoch: 26 Global Step: 539260 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:54:10,071-Speed 2499.56 samples/sec Loss 1.6216 LearningRate 0.000151 Epoch: 26 Global Step: 539270 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:54:18,272-Speed 2497.69 samples/sec Loss 1.6578 LearningRate 0.000151 Epoch: 26 Global Step: 539280 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:54:26,422-Speed 2513.52 samples/sec Loss 1.6527 LearningRate 
0.000151 Epoch: 26 Global Step: 539290 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:54:34,624-Speed 2497.03 samples/sec Loss 1.6339 LearningRate 0.000151 Epoch: 26 Global Step: 539300 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:54:42,823-Speed 2498.09 samples/sec Loss 1.6127 LearningRate 0.000151 Epoch: 26 Global Step: 539310 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:54:51,024-Speed 2497.79 samples/sec Loss 1.5705 LearningRate 0.000151 Epoch: 26 Global Step: 539320 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:54:59,228-Speed 2496.89 samples/sec Loss 1.5871 LearningRate 0.000151 Epoch: 26 Global Step: 539330 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:55:07,433-Speed 2496.36 samples/sec Loss 1.5869 LearningRate 0.000151 Epoch: 26 Global Step: 539340 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:55:15,590-Speed 2511.03 samples/sec Loss 1.5953 LearningRate 0.000151 Epoch: 26 Global Step: 539350 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:55:23,806-Speed 2493.22 samples/sec Loss 1.6136 LearningRate 0.000151 Epoch: 26 Global Step: 539360 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:55:32,005-Speed 2498.10 samples/sec Loss 1.6186 LearningRate 0.000151 Epoch: 26 Global Step: 539370 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:55:40,205-Speed 2498.00 samples/sec Loss 1.6272 LearningRate 0.000151 Epoch: 26 Global Step: 539380 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:55:48,406-Speed 2497.66 samples/sec Loss 1.5979 LearningRate 0.000151 Epoch: 26 Global Step: 539390 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:55:56,624-Speed 2492.72 samples/sec Loss 1.6220 LearningRate 0.000151 Epoch: 26 Global Step: 539400 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:56:04,774-Speed 2513.12 samples/sec Loss 1.5835 LearningRate 0.000151 Epoch: 26 
Global Step: 539410 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:56:12,976-Speed 2497.34 samples/sec Loss 1.5939 LearningRate 0.000151 Epoch: 26 Global Step: 539420 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:56:21,180-Speed 2496.90 samples/sec Loss 1.5925 LearningRate 0.000151 Epoch: 26 Global Step: 539430 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:56:29,379-Speed 2498.18 samples/sec Loss 1.6554 LearningRate 0.000151 Epoch: 26 Global Step: 539440 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:56:37,584-Speed 2496.23 samples/sec Loss 1.6150 LearningRate 0.000151 Epoch: 26 Global Step: 539450 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:56:45,783-Speed 2498.25 samples/sec Loss 1.6337 LearningRate 0.000151 Epoch: 26 Global Step: 539460 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:56:53,927-Speed 2515.02 samples/sec Loss 1.6295 LearningRate 0.000151 Epoch: 26 Global Step: 539470 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:57:02,127-Speed 2498.10 samples/sec Loss 1.5741 LearningRate 0.000151 Epoch: 26 Global Step: 539480 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:57:10,325-Speed 2498.64 samples/sec Loss 1.6131 LearningRate 0.000151 Epoch: 26 Global Step: 539490 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:57:18,524-Speed 2498.39 samples/sec Loss 1.5880 LearningRate 0.000151 Epoch: 26 Global Step: 539500 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:57:26,726-Speed 2497.36 samples/sec Loss 1.6424 LearningRate 0.000151 Epoch: 26 Global Step: 539510 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:57:34,923-Speed 2500.02 samples/sec Loss 1.6424 LearningRate 0.000151 Epoch: 26 Global Step: 539520 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:57:43,081-Speed 2510.67 samples/sec Loss 1.6266 LearningRate 0.000151 Epoch: 26 Global Step: 539530 
Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:57:51,291-Speed 2494.70 samples/sec Loss 1.6124 LearningRate 0.000151 Epoch: 26 Global Step: 539540 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:57:59,492-Speed 2497.80 samples/sec Loss 1.6225 LearningRate 0.000151 Epoch: 26 Global Step: 539550 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:58:07,690-Speed 2498.45 samples/sec Loss 1.5775 LearningRate 0.000151 Epoch: 26 Global Step: 539560 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:58:15,893-Speed 2497.01 samples/sec Loss 1.6253 LearningRate 0.000151 Epoch: 26 Global Step: 539570 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:58:24,092-Speed 2498.26 samples/sec Loss 1.5873 LearningRate 0.000151 Epoch: 26 Global Step: 539580 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:58:32,240-Speed 2514.13 samples/sec Loss 1.5989 LearningRate 0.000151 Epoch: 26 Global Step: 539590 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:58:40,439-Speed 2498.31 samples/sec Loss 1.6026 LearningRate 0.000151 Epoch: 26 Global Step: 539600 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:58:48,650-Speed 2494.70 samples/sec Loss 1.6144 LearningRate 0.000151 Epoch: 26 Global Step: 539610 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:58:56,849-Speed 2498.17 samples/sec Loss 1.6100 LearningRate 0.000151 Epoch: 26 Global Step: 539620 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:59:05,051-Speed 2497.68 samples/sec Loss 1.5833 LearningRate 0.000151 Epoch: 26 Global Step: 539630 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:59:13,253-Speed 2497.08 samples/sec Loss 1.5978 LearningRate 0.000151 Epoch: 26 Global Step: 539640 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:59:21,399-Speed 2514.68 samples/sec Loss 1.6541 LearningRate 0.000151 Epoch: 26 Global Step: 539650 Fp16 Grad Scale: 
8192 Required: 66 hours Training: 2022-07-10 17:59:29,598-Speed 2498.35 samples/sec Loss 1.6413 LearningRate 0.000151 Epoch: 26 Global Step: 539660 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:59:37,794-Speed 2499.16 samples/sec Loss 1.5581 LearningRate 0.000151 Epoch: 26 Global Step: 539670 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:59:45,991-Speed 2498.79 samples/sec Loss 1.6192 LearningRate 0.000151 Epoch: 26 Global Step: 539680 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 17:59:54,191-Speed 2498.08 samples/sec Loss 1.6296 LearningRate 0.000151 Epoch: 26 Global Step: 539690 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 18:00:02,415-Speed 2490.60 samples/sec Loss 1.6321 LearningRate 0.000151 Epoch: 26 Global Step: 539700 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 18:00:10,568-Speed 2515.59 samples/sec Loss 1.6389 LearningRate 0.000151 Epoch: 26 Global Step: 539710 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 18:00:18,768-Speed 2498.04 samples/sec Loss 1.6171 LearningRate 0.000151 Epoch: 26 Global Step: 539720 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 18:00:26,972-Speed 2496.66 samples/sec Loss 1.6094 LearningRate 0.000151 Epoch: 26 Global Step: 539730 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 18:00:35,177-Speed 2496.67 samples/sec Loss 1.6141 LearningRate 0.000151 Epoch: 26 Global Step: 539740 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 18:00:43,378-Speed 2497.74 samples/sec Loss 1.5988 LearningRate 0.000151 Epoch: 26 Global Step: 539750 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 18:00:51,594-Speed 2492.93 samples/sec Loss 1.6036 LearningRate 0.000151 Epoch: 26 Global Step: 539760 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 18:00:59,741-Speed 2514.02 samples/sec Loss 1.6393 LearningRate 0.000151 Epoch: 26 Global Step: 539770 Fp16 Grad Scale: 8192 Required: 66 
hours Training: 2022-07-10 18:01:07,943-Speed 2497.56 samples/sec Loss 1.6044 LearningRate 0.000151 Epoch: 26 Global Step: 539780 Fp16 Grad Scale: 8192 Required: 66 hours Training: 2022-07-10 18:01:16,146-Speed 2497.05 samples/sec Loss 1.6415 LearningRate 0.000151 Epoch: 26 Global Step: 539790 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:01:24,347-Speed 2497.61 samples/sec Loss 1.6254 LearningRate 0.000151 Epoch: 26 Global Step: 539800 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:01:32,545-Speed 2498.34 samples/sec Loss 1.6212 LearningRate 0.000151 Epoch: 26 Global Step: 539810 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:01:40,749-Speed 2496.75 samples/sec Loss 1.6349 LearningRate 0.000151 Epoch: 26 Global Step: 539820 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:01:48,898-Speed 2513.54 samples/sec Loss 1.6202 LearningRate 0.000151 Epoch: 26 Global Step: 539830 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:01:57,102-Speed 2496.86 samples/sec Loss 1.6160 LearningRate 0.000151 Epoch: 26 Global Step: 539840 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:02:05,310-Speed 2495.30 samples/sec Loss 1.5864 LearningRate 0.000151 Epoch: 26 Global Step: 539850 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:02:13,511-Speed 2497.84 samples/sec Loss 1.5907 LearningRate 0.000151 Epoch: 26 Global Step: 539860 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:02:21,716-Speed 2496.45 samples/sec Loss 1.6174 LearningRate 0.000151 Epoch: 26 Global Step: 539870 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:02:29,919-Speed 2497.06 samples/sec Loss 1.6304 LearningRate 0.000151 Epoch: 26 Global Step: 539880 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:02:38,062-Speed 2515.23 samples/sec Loss 1.6686 LearningRate 0.000151 Epoch: 26 Global Step: 539890 Fp16 Grad Scale: 16384 Required: 66 hours 
Training: 2022-07-10 18:02:46,263-Speed 2497.75 samples/sec Loss 1.6251 LearningRate 0.000151 Epoch: 26 Global Step: 539900 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:02:54,467-Speed 2496.53 samples/sec Loss 1.5784 LearningRate 0.000151 Epoch: 26 Global Step: 539910 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:03:02,670-Speed 2497.06 samples/sec Loss 1.5835 LearningRate 0.000151 Epoch: 26 Global Step: 539920 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:03:10,872-Speed 2497.49 samples/sec Loss 1.5958 LearningRate 0.000150 Epoch: 26 Global Step: 539930 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:03:19,071-Speed 2498.32 samples/sec Loss 1.5772 LearningRate 0.000150 Epoch: 26 Global Step: 539940 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:03:27,219-Speed 2513.88 samples/sec Loss 1.6014 LearningRate 0.000150 Epoch: 26 Global Step: 539950 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:03:35,420-Speed 2497.49 samples/sec Loss 1.6241 LearningRate 0.000150 Epoch: 26 Global Step: 539960 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:03:43,628-Speed 2495.57 samples/sec Loss 1.6252 LearningRate 0.000150 Epoch: 26 Global Step: 539970 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:03:51,829-Speed 2497.68 samples/sec Loss 1.6202 LearningRate 0.000150 Epoch: 26 Global Step: 539980 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:04:00,031-Speed 2497.07 samples/sec Loss 1.6174 LearningRate 0.000150 Epoch: 26 Global Step: 539990 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:04:08,232-Speed 2497.71 samples/sec Loss 1.6212 LearningRate 0.000150 Epoch: 26 Global Step: 540000 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:04:16,394-Speed 2509.57 samples/sec Loss 1.6277 LearningRate 0.000150 Epoch: 26 Global Step: 540010 Fp16 Grad Scale: 16384 Required: 66 hours 
Training: 2022-07-10 18:04:24,604-Speed 2495.19 samples/sec Loss 1.6412 LearningRate 0.000150 Epoch: 26 Global Step: 540020 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:04:32,809-Speed 2496.55 samples/sec Loss 1.6218 LearningRate 0.000150 Epoch: 26 Global Step: 540030 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:04:41,017-Speed 2495.35 samples/sec Loss 1.6036 LearningRate 0.000150 Epoch: 26 Global Step: 540040 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:04:49,226-Speed 2495.68 samples/sec Loss 1.6160 LearningRate 0.000150 Epoch: 26 Global Step: 540050 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:04:57,445-Speed 2491.82 samples/sec Loss 1.6122 LearningRate 0.000150 Epoch: 26 Global Step: 540060 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:05:05,592-Speed 2514.47 samples/sec Loss 1.6312 LearningRate 0.000150 Epoch: 26 Global Step: 540070 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:05:13,794-Speed 2497.65 samples/sec Loss 1.6140 LearningRate 0.000150 Epoch: 26 Global Step: 540080 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:05:21,998-Speed 2496.79 samples/sec Loss 1.6029 LearningRate 0.000150 Epoch: 26 Global Step: 540090 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:05:30,201-Speed 2496.86 samples/sec Loss 1.6489 LearningRate 0.000150 Epoch: 26 Global Step: 540100 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:05:38,400-Speed 2498.32 samples/sec Loss 1.5956 LearningRate 0.000150 Epoch: 26 Global Step: 540110 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:05:46,608-Speed 2495.55 samples/sec Loss 1.5554 LearningRate 0.000150 Epoch: 26 Global Step: 540120 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:05:54,758-Speed 2513.15 samples/sec Loss 1.5897 LearningRate 0.000150 Epoch: 26 Global Step: 540130 Fp16 Grad Scale: 16384 Required: 66 hours 
Training: 2022-07-10 18:06:02,962-Speed 2496.80 samples/sec Loss 1.6337 LearningRate 0.000150 Epoch: 26 Global Step: 540140 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:06:11,161-Speed 2498.11 samples/sec Loss 1.6075 LearningRate 0.000150 Epoch: 26 Global Step: 540150 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:06:19,367-Speed 2496.33 samples/sec Loss 1.6118 LearningRate 0.000150 Epoch: 26 Global Step: 540160 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:06:27,568-Speed 2497.49 samples/sec Loss 1.6226 LearningRate 0.000150 Epoch: 26 Global Step: 540170 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:06:35,768-Speed 2498.04 samples/sec Loss 1.5783 LearningRate 0.000150 Epoch: 26 Global Step: 540180 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:06:43,927-Speed 2510.43 samples/sec Loss 1.6113 LearningRate 0.000150 Epoch: 26 Global Step: 540190 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:06:52,132-Speed 2496.72 samples/sec Loss 1.5892 LearningRate 0.000150 Epoch: 26 Global Step: 540200 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:07:00,334-Speed 2497.35 samples/sec Loss 1.5892 LearningRate 0.000150 Epoch: 26 Global Step: 540210 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:07:08,490-Speed 2511.42 samples/sec Loss 1.5799 LearningRate 0.000150 Epoch: 26 Global Step: 540220 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:07:16,692-Speed 2497.47 samples/sec Loss 1.5905 LearningRate 0.000150 Epoch: 26 Global Step: 540230 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:07:24,893-Speed 2497.61 samples/sec Loss 1.6288 LearningRate 0.000150 Epoch: 26 Global Step: 540240 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:07:33,054-Speed 2509.96 samples/sec Loss 1.6018 LearningRate 0.000150 Epoch: 26 Global Step: 540250 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:07:41,251-Speed 2498.80 samples/sec Loss 1.6272 LearningRate 0.000150 Epoch: 26 Global Step: 540260 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:07:49,453-Speed 2497.51 samples/sec Loss 1.6377 LearningRate 0.000150 Epoch: 26 Global Step: 540270 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:07:57,649-Speed 2499.06 samples/sec Loss 1.5966 LearningRate 0.000150 Epoch: 26 Global Step: 540280 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:08:05,850-Speed 2497.84 samples/sec Loss 1.5875 LearningRate 0.000150 Epoch: 26 Global Step: 540290 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:08:14,055-Speed 2496.76 samples/sec Loss 1.6094 LearningRate 0.000150 Epoch: 26 Global Step: 540300 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:08:22,204-Speed 2513.74 samples/sec Loss 1.5919 LearningRate 0.000150 Epoch: 26 Global Step: 540310 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:08:30,407-Speed 2496.80 samples/sec Loss 1.6585 LearningRate 0.000150 Epoch: 26 Global Step: 540320 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:08:38,611-Speed 2496.93 samples/sec Loss 1.6270 LearningRate 0.000150 Epoch: 26 Global Step: 540330 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:08:46,808-Speed 2498.84 samples/sec Loss 1.5860 LearningRate 0.000150 Epoch: 26 Global Step: 540340 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:08:55,008-Speed 2498.01 samples/sec Loss 1.6036 LearningRate 0.000150 Epoch: 26 Global Step: 540350 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:09:03,212-Speed 2496.73 samples/sec Loss 1.6078 LearningRate 0.000150 Epoch: 26 Global Step: 540360 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:09:11,362-Speed 2513.12 samples/sec Loss 1.6488 LearningRate 0.000150 Epoch: 26 Global Step: 540370 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:09:19,601-Speed 2498.40 samples/sec Loss 1.6485 LearningRate 0.000150 Epoch: 26 Global Step: 540380 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:09:27,802-Speed 2497.64 samples/sec Loss 1.6088 LearningRate 0.000150 Epoch: 26 Global Step: 540390 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:09:36,685-Speed 2500.68 samples/sec Loss 1.5959 LearningRate 0.000150 Epoch: 26 Global Step: 540400 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:09:44,903-Speed 2499.68 samples/sec Loss 1.5728 LearningRate 0.000150 Epoch: 26 Global Step: 540410 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:09:53,109-Speed 2496.08 samples/sec Loss 1.5916 LearningRate 0.000150 Epoch: 26 Global Step: 540420 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:10:01,512-Speed 2515.55 samples/sec Loss 1.6350 LearningRate 0.000150 Epoch: 26 Global Step: 540430 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:10:09,737-Speed 2499.48 samples/sec Loss 1.6361 LearningRate 0.000150 Epoch: 26 Global Step: 540440 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:10:17,934-Speed 2498.89 samples/sec Loss 1.6080 LearningRate 0.000150 Epoch: 26 Global Step: 540450 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:10:28,954-Speed 1858.62 samples/sec Loss 1.5938 LearningRate 0.000150 Epoch: 26 Global Step: 540460 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:10:37,205-Speed 2498.05 samples/sec Loss 1.6027 LearningRate 0.000150 Epoch: 26 Global Step: 540470 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:10:45,493-Speed 2500.73 samples/sec Loss 1.6181 LearningRate 0.000150 Epoch: 26 Global Step: 540480 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:10:53,639-Speed 2514.42 samples/sec Loss 1.5606 LearningRate 0.000150 Epoch: 26 Global Step: 540490 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:11:01,882-Speed 2499.90 samples/sec Loss 1.6467 LearningRate 0.000150 Epoch: 26 Global Step: 540500 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:11:14,795-Speed 1913.87 samples/sec Loss 1.6477 LearningRate 0.000150 Epoch: 26 Global Step: 540510 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:11:23,660-Speed 2503.03 samples/sec Loss 1.5938 LearningRate 0.000150 Epoch: 26 Global Step: 540520 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:11:37,468-Speed 1487.10 samples/sec Loss 1.5699 LearningRate 0.000150 Epoch: 26 Global Step: 540530 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:11:45,724-Speed 2498.27 samples/sec Loss 1.5713 LearningRate 0.000150 Epoch: 26 Global Step: 540540 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:11:53,864-Speed 2516.14 samples/sec Loss 1.6052 LearningRate 0.000150 Epoch: 26 Global Step: 540550 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:12:07,413-Speed 1695.54 samples/sec Loss 1.5885 LearningRate 0.000150 Epoch: 26 Global Step: 540560 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:12:15,661-Speed 2501.45 samples/sec Loss 1.5539 LearningRate 0.000150 Epoch: 26 Global Step: 540570 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:12:28,726-Speed 1567.66 samples/sec Loss 1.5823 LearningRate 0.000150 Epoch: 26 Global Step: 540580 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:12:36,949-Speed 2496.52 samples/sec Loss 1.5796 LearningRate 0.000150 Epoch: 26 Global Step: 540590 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:12:46,643-Speed 2120.06 samples/sec Loss 1.5747 LearningRate 0.000150 Epoch: 26 Global Step: 540600 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:12:54,877-Speed 2487.68 samples/sec Loss 1.5870 LearningRate 0.000150 Epoch: 26 Global Step: 540610 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:13:03,289-Speed 2494.36 samples/sec Loss 1.5895 LearningRate 0.000150 Epoch: 26 Global Step: 540620 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:13:11,512-Speed 2491.19 samples/sec Loss 1.5898 LearningRate 0.000150 Epoch: 26 Global Step: 540630 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:13:19,738-Speed 2490.01 samples/sec Loss 1.6445 LearningRate 0.000150 Epoch: 26 Global Step: 540640 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:13:27,963-Speed 2490.61 samples/sec Loss 1.5731 LearningRate 0.000150 Epoch: 26 Global Step: 540650 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:13:36,181-Speed 2492.41 samples/sec Loss 1.5887 LearningRate 0.000150 Epoch: 26 Global Step: 540660 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:13:44,353-Speed 2506.82 samples/sec Loss 1.6059 LearningRate 0.000150 Epoch: 26 Global Step: 540670 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:13:52,563-Speed 2494.71 samples/sec Loss 1.5977 LearningRate 0.000150 Epoch: 26 Global Step: 540680 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:14:00,769-Speed 2496.33 samples/sec Loss 1.6034 LearningRate 0.000150 Epoch: 26 Global Step: 540690 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:14:08,971-Speed 2497.26 samples/sec Loss 1.5625 LearningRate 0.000150 Epoch: 26 Global Step: 540700 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:14:17,176-Speed 2496.33 samples/sec Loss 1.5714 LearningRate 0.000150 Epoch: 26 Global Step: 540710 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:14:25,375-Speed 2498.39 samples/sec Loss 1.5524 LearningRate 0.000150 Epoch: 26 Global Step: 540720 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:14:33,519-Speed 2515.32 samples/sec Loss 1.6104 LearningRate 0.000150 Epoch: 26 Global Step: 540730 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:14:41,710-Speed 2500.71 samples/sec Loss 1.5930 LearningRate 0.000150 Epoch: 26 Global Step: 540740 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:14:49,907-Speed 2498.99 samples/sec Loss 1.5518 LearningRate 0.000150 Epoch: 26 Global Step: 540750 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:14:58,106-Speed 2498.24 samples/sec Loss 1.5935 LearningRate 0.000150 Epoch: 26 Global Step: 540760 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:15:06,300-Speed 2499.74 samples/sec Loss 1.5824 LearningRate 0.000150 Epoch: 26 Global Step: 540770 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:15:14,499-Speed 2498.23 samples/sec Loss 1.6270 LearningRate 0.000150 Epoch: 26 Global Step: 540780 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:15:22,654-Speed 2511.52 samples/sec Loss 1.5889 LearningRate 0.000150 Epoch: 26 Global Step: 540790 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:15:30,857-Speed 2497.11 samples/sec Loss 1.5727 LearningRate 0.000150 Epoch: 26 Global Step: 540800 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:15:39,082-Speed 2490.18 samples/sec Loss 1.5971 LearningRate 0.000150 Epoch: 26 Global Step: 540810 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:15:47,292-Speed 2494.94 samples/sec Loss 1.5720 LearningRate 0.000150 Epoch: 26 Global Step: 540820 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:15:55,497-Speed 2496.51 samples/sec Loss 1.5997 LearningRate 0.000150 Epoch: 26 Global Step: 540830 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:16:03,718-Speed 2491.52 samples/sec Loss 1.6072 LearningRate 0.000150 Epoch: 26 Global Step: 540840 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:16:11,873-Speed 2511.58 samples/sec Loss 1.5746 LearningRate 0.000150 Epoch: 26 Global Step: 540850 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:16:20,077-Speed 2496.74 samples/sec Loss 1.6392 LearningRate 0.000150 Epoch: 26 Global Step: 540860 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:16:28,286-Speed 2495.15 samples/sec Loss 1.5785 LearningRate 0.000150 Epoch: 26 Global Step: 540870 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:16:36,500-Speed 2494.23 samples/sec Loss 1.6147 LearningRate 0.000150 Epoch: 26 Global Step: 540880 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:16:44,704-Speed 2496.92 samples/sec Loss 1.6052 LearningRate 0.000149 Epoch: 26 Global Step: 540890 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:16:52,908-Speed 2496.59 samples/sec Loss 1.6250 LearningRate 0.000149 Epoch: 26 Global Step: 540900 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:17:01,054-Speed 2514.41 samples/sec Loss 1.5922 LearningRate 0.000149 Epoch: 26 Global Step: 540910 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:17:09,267-Speed 2494.09 samples/sec Loss 1.5987 LearningRate 0.000149 Epoch: 26 Global Step: 540920 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:17:17,472-Speed 2496.66 samples/sec Loss 1.6218 LearningRate 0.000149 Epoch: 26 Global Step: 540930 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:17:25,673-Speed 2497.80 samples/sec Loss 1.5803 LearningRate 0.000149 Epoch: 26 Global Step: 540940 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:17:33,877-Speed 2496.59 samples/sec Loss 1.6277 LearningRate 0.000149 Epoch: 26 Global Step: 540950 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:17:42,079-Speed 2497.45 samples/sec Loss 1.5839 LearningRate 0.000149 Epoch: 26 Global Step: 540960 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:17:50,229-Speed 2514.20 samples/sec Loss 1.5942 LearningRate 0.000149 Epoch: 26 Global Step: 540970 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:17:58,444-Speed 2493.20 samples/sec Loss 1.6168 LearningRate 0.000149 Epoch: 26 Global Step: 540980 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:18:06,644-Speed 2497.76 samples/sec Loss 1.6289 LearningRate 0.000149 Epoch: 26 Global Step: 540990 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:18:14,847-Speed 2497.25 samples/sec Loss 1.6451 LearningRate 0.000149 Epoch: 26 Global Step: 541000 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:18:23,053-Speed 2496.29 samples/sec Loss 1.5984 LearningRate 0.000149 Epoch: 26 Global Step: 541010 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:18:31,261-Speed 2495.69 samples/sec Loss 1.6226 LearningRate 0.000149 Epoch: 26 Global Step: 541020 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:18:39,411-Speed 2513.18 samples/sec Loss 1.6193 LearningRate 0.000149 Epoch: 26 Global Step: 541030 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:18:47,615-Speed 2496.95 samples/sec Loss 1.5855 LearningRate 0.000149 Epoch: 26 Global Step: 541040 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:18:55,820-Speed 2496.41 samples/sec Loss 1.6221 LearningRate 0.000149 Epoch: 26 Global Step: 541050 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:19:04,030-Speed 2495.20 samples/sec Loss 1.6023 LearningRate 0.000149 Epoch: 26 Global Step: 541060 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:19:12,244-Speed 2493.72 samples/sec Loss 1.5971 LearningRate 0.000149 Epoch: 26 Global Step: 541070 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:19:20,446-Speed 2497.08 samples/sec Loss 1.6313 LearningRate 0.000149 Epoch: 26 Global Step: 541080 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:19:28,596-Speed 2513.31 samples/sec Loss 1.6115 LearningRate 0.000149 Epoch: 26 Global Step: 541090 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:19:36,803-Speed 2496.22 samples/sec Loss 1.5964 LearningRate 0.000149 Epoch: 26 Global Step: 541100 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:19:45,004-Speed 2497.58 samples/sec Loss 1.6323 LearningRate 0.000149 Epoch: 26 Global Step: 541110 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:19:53,210-Speed 2496.65 samples/sec Loss 1.6324 LearningRate 0.000149 Epoch: 26 Global Step: 541120 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:20:01,409-Speed 2498.41 samples/sec Loss 1.6502 LearningRate 0.000149 Epoch: 26 Global Step: 541130 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:20:09,613-Speed 2496.34 samples/sec Loss 1.6074 LearningRate 0.000149 Epoch: 26 Global Step: 541140 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:20:17,759-Speed 2514.62 samples/sec Loss 1.5958 LearningRate 0.000149 Epoch: 26 Global Step: 541150 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:20:25,964-Speed 2496.32 samples/sec Loss 1.6174 LearningRate 0.000149 Epoch: 26 Global Step: 541160 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:20:34,169-Speed 2496.71 samples/sec Loss 1.6175 LearningRate 0.000149 Epoch: 26 Global Step: 541170 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:20:42,369-Speed 2497.70 samples/sec Loss 1.5942 LearningRate 0.000149 Epoch: 26 Global Step: 541180 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:20:50,579-Speed 2495.05 samples/sec Loss 1.6141 LearningRate 0.000149 Epoch: 26 Global Step: 541190 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:20:58,787-Speed 2495.63 samples/sec Loss 1.6147 LearningRate 0.000149 Epoch: 26 Global Step: 541200 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:21:06,939-Speed 2512.90 samples/sec Loss 1.6078 LearningRate 0.000149 Epoch: 26 Global Step: 541210 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:21:15,145-Speed 2495.88 samples/sec Loss 1.6086 LearningRate 0.000149 Epoch: 26 Global Step: 541220 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:21:23,347-Speed 2497.57 samples/sec Loss 1.6114 LearningRate 0.000149 Epoch: 26 Global Step: 541230 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:21:31,553-Speed 2496.29 samples/sec Loss 1.5810 LearningRate 0.000149 Epoch: 26 Global Step: 541240 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:21:39,756-Speed 2496.80 samples/sec Loss 1.5940 LearningRate 0.000149 Epoch: 26 Global Step: 541250 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:21:47,959-Speed 2497.22 samples/sec Loss 1.5985 LearningRate 0.000149 Epoch: 26 Global Step: 541260 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:21:56,108-Speed 2513.39 samples/sec Loss 1.6012 LearningRate 0.000149 Epoch: 26 Global Step: 541270 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:22:04,313-Speed 2496.74 samples/sec Loss 1.6081 LearningRate 0.000149 Epoch: 26 Global Step: 541280 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:22:12,519-Speed 2496.25 samples/sec Loss 1.6153 LearningRate 0.000149 Epoch: 26 Global Step: 541290 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:22:20,726-Speed 2495.76 samples/sec Loss 1.6038 LearningRate 0.000149 Epoch: 26 Global Step: 541300 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:22:28,932-Speed 2496.36 samples/sec Loss 1.6106 LearningRate 0.000149 Epoch: 26 Global Step: 541310 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:22:37,140-Speed 2495.60 samples/sec Loss 1.5872 LearningRate 0.000149 Epoch: 26 Global Step: 541320 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:22:45,293-Speed 2512.21 samples/sec Loss 1.6553 LearningRate 0.000149 Epoch: 26 Global Step: 541330 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:22:53,494-Speed 2497.76 samples/sec Loss 1.6084 LearningRate 0.000149 Epoch: 26 Global Step: 541340 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:23:01,698-Speed 2496.84 samples/sec Loss 1.6283 LearningRate 0.000149 Epoch: 26 Global Step: 541350 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:23:09,902-Speed 2496.96 samples/sec Loss 1.6041 LearningRate 0.000149 Epoch: 26 Global Step: 541360 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:23:18,104-Speed 2497.39 samples/sec Loss 1.6173 LearningRate 0.000149 Epoch: 26 Global Step: 541370 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:23:26,309-Speed 2496.15 samples/sec Loss 1.6408 LearningRate 0.000149 Epoch: 26 Global Step: 541380 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:23:34,460-Speed 2513.24 samples/sec Loss 1.5772 LearningRate 0.000149 Epoch: 26 Global Step: 541390 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:23:42,665-Speed 2496.28 samples/sec Loss 1.6035 LearningRate 0.000149 Epoch: 26 Global Step: 541400 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:23:50,866-Speed 2497.56 samples/sec Loss 1.5916 LearningRate 0.000149 Epoch: 26 Global Step: 541410 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-07-10 18:23:59,071-Speed 2496.41 samples/sec Loss 1.6564 LearningRate 0.000149 Epoch: 26 Global Step: 541420 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:24:07,290-Speed 2492.44 samples/sec Loss 1.6242 LearningRate 0.000149 Epoch: 26 Global Step: 541430 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:24:15,494-Speed 2496.43 samples/sec Loss 1.5857 LearningRate 0.000149 Epoch: 26 Global Step: 541440 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:24:23,648-Speed 2512.24 samples/sec Loss 1.6195 LearningRate 0.000149 Epoch: 26 Global Step: 541450 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:24:31,859-Speed 2494.52 samples/sec Loss 1.6313 LearningRate 0.000149 Epoch: 26 Global Step: 541460 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:24:40,062-Speed 2497.16 samples/sec Loss 1.6122 LearningRate 0.000149 Epoch: 26 Global Step: 541470 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:24:48,272-Speed 2498.71 samples/sec Loss 1.5969 LearningRate 0.000149 Epoch: 26 Global Step: 541480 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:24:56,473-Speed 2497.48 samples/sec Loss 1.6175 LearningRate 0.000149 Epoch: 26 Global Step: 541490 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:25:04,678-Speed 2497.03 samples/sec Loss 1.6249 LearningRate 0.000149 Epoch: 26 Global Step: 541500 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:25:12,827-Speed 2513.45 samples/sec Loss 1.5960 LearningRate 0.000149 Epoch: 26 Global Step: 541510 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:25:21,042-Speed 2493.60 samples/sec Loss 1.6133 LearningRate 0.000149 Epoch: 26 Global Step: 541520 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:25:29,243-Speed 2497.49 samples/sec Loss 1.6088 LearningRate 0.000149 Epoch: 26 Global Step: 541530 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:25:37,445-Speed 2497.36 samples/sec Loss 1.5904 LearningRate 0.000149 Epoch: 26 Global Step: 541540 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:25:45,645-Speed 2498.05 samples/sec Loss 1.6209 LearningRate 0.000149 Epoch: 26 Global Step: 541550 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:25:53,860-Speed 2493.31 samples/sec Loss 1.5943 LearningRate 0.000149 Epoch: 26 Global Step: 541560 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:26:02,010-Speed 2513.36 samples/sec Loss 1.6338 LearningRate 0.000149 Epoch: 26 Global Step: 541570 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:26:10,208-Speed 2499.00 samples/sec Loss 1.6009 LearningRate 0.000149 Epoch: 26 Global Step: 541580 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:26:18,409-Speed 2497.71 samples/sec Loss 1.6254 LearningRate 0.000149 Epoch: 26 Global Step: 541590 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:26:26,615-Speed 2496.02 samples/sec Loss 1.6211 LearningRate 0.000149 Epoch: 26 Global Step: 541600 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:26:34,828-Speed 2493.99 samples/sec Loss 1.6137 LearningRate 0.000149 Epoch: 26 Global Step: 541610 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:26:43,030-Speed 2497.30 samples/sec Loss 1.5896 LearningRate 0.000149 Epoch: 26 Global Step: 541620 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:26:51,178-Speed 2513.86 samples/sec Loss 1.6343 LearningRate 0.000149 Epoch: 26 Global Step: 541630 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:26:59,394-Speed 2493.10 samples/sec Loss 1.6336 LearningRate 0.000149 Epoch: 26 Global Step: 541640 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:27:07,594-Speed 2498.02 samples/sec Loss 1.5945 LearningRate 0.000149 Epoch: 26 Global Step: 541650 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:27:15,797-Speed 2497.34 samples/sec Loss 1.6038 LearningRate 0.000149 Epoch: 26 Global Step: 541660 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:27:24,002-Speed 2496.29 samples/sec Loss 1.5961 LearningRate 0.000149 Epoch: 26 Global Step: 541670 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:27:32,201-Speed 2498.25 samples/sec Loss 1.6266 LearningRate 0.000149 Epoch: 26 Global Step: 541680 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:27:40,354-Speed 2512.68 samples/sec Loss 1.6423 LearningRate 0.000149 Epoch: 26 Global Step: 541690 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:27:48,574-Speed 2491.74 samples/sec Loss 1.5966 LearningRate 0.000149 Epoch: 26 Global Step: 541700 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:27:56,783-Speed 2495.25 samples/sec Loss 1.5986 LearningRate 0.000149 Epoch: 26 Global Step: 541710 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:28:04,990-Speed 2495.91 samples/sec Loss 1.5956 LearningRate 0.000149 Epoch: 26 Global Step: 541720 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:28:13,196-Speed 2495.92 samples/sec Loss 1.6315 LearningRate 0.000149 Epoch: 26 Global Step: 541730 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:28:21,409-Speed 2494.23 samples/sec Loss 1.5579 LearningRate 0.000149 Epoch: 26 Global Step: 541740 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:28:29,579-Speed 2507.20 samples/sec Loss 1.5335 LearningRate 0.000149 Epoch: 26 Global Step: 541750 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:28:37,782-Speed 2496.92 samples/sec Loss 1.6158 LearningRate 0.000149 Epoch: 26 Global Step: 541760 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:28:45,992-Speed 2495.11 samples/sec Loss 1.6015 LearningRate 0.000149 Epoch: 26 Global Step: 541770 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:28:54,192-Speed 2497.77 samples/sec Loss 1.5881 LearningRate 0.000149 Epoch: 26 Global Step: 541780 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:29:02,395-Speed 2497.00 samples/sec Loss 1.6106 LearningRate 0.000149 Epoch: 26 Global Step: 541790 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:29:10,597-Speed 2497.26 samples/sec Loss 1.6355 LearningRate 0.000149 Epoch: 26 Global Step: 541800 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:29:18,750-Speed 2512.27 samples/sec Loss 1.5916 LearningRate 0.000149 Epoch: 26 Global Step: 541810 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:29:26,952-Speed 2497.37 samples/sec Loss 1.6091 LearningRate 0.000149 Epoch: 26 Global Step: 541820 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:29:35,157-Speed 2496.42 samples/sec Loss 1.6185 LearningRate 0.000149 Epoch: 26 Global Step: 541830 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:29:43,357-Speed 2497.87 samples/sec Loss 1.5854 LearningRate 0.000149 Epoch: 26 Global Step: 541840 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:29:51,560-Speed 2497.43 samples/sec Loss 1.6028 LearningRate 0.000149 Epoch: 26 Global Step: 541850 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:29:59,779-Speed 2492.17 samples/sec Loss 1.5673 LearningRate 0.000148 Epoch: 26 Global Step: 541860 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:30:07,928-Speed 2513.67 samples/sec Loss 1.6145 LearningRate 0.000148 Epoch: 26 Global Step: 541870 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:30:16,130-Speed 2497.30 samples/sec Loss 1.5815 LearningRate 0.000148 Epoch: 26 Global Step: 541880 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:30:24,340-Speed 2495.02 samples/sec Loss 1.6239 LearningRate 0.000148 Epoch: 26 Global Step: 541890 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:30:32,544-Speed 2496.73 samples/sec Loss 1.6061 LearningRate 0.000148 Epoch: 26 Global Step: 541900 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:30:40,743-Speed 2498.33 samples/sec Loss 1.6208 LearningRate 0.000148 Epoch: 26 Global Step: 541910 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:30:48,947-Speed 2497.00 samples/sec Loss 1.5660 LearningRate 0.000148 Epoch: 26 Global Step: 541920 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:30:57,100-Speed 2512.28 samples/sec Loss 1.6031 LearningRate 0.000148 Epoch: 26 Global Step: 541930 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:31:05,310-Speed 2494.91 samples/sec Loss 1.6391 LearningRate 0.000148 Epoch: 26 Global Step: 541940 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:31:13,515-Speed 2496.46 samples/sec Loss 1.6075 LearningRate 0.000148 Epoch: 26 Global Step: 541950 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:31:21,718-Speed 2496.85 samples/sec Loss 1.6146 LearningRate 0.000148 Epoch: 26 Global Step: 541960 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:31:29,948-Speed 2488.92 samples/sec Loss 1.6115 LearningRate 0.000148 Epoch: 26 Global Step: 541970 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:31:38,149-Speed 2497.75 samples/sec Loss 1.6341 LearningRate 0.000148 Epoch: 26 Global Step: 541980 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:31:46,302-Speed 2512.34 samples/sec Loss 1.6317 LearningRate 0.000148 Epoch: 26 Global Step: 541990 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:31:54,515-Speed 2493.97 samples/sec Loss 1.6536 LearningRate 0.000148 Epoch: 26 Global Step: 542000 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:32:02,716-Speed 2497.74 samples/sec Loss 1.6233 LearningRate 0.000148 Epoch: 26 Global Step: 542010 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:32:10,922-Speed 2496.37 samples/sec Loss 1.6121 LearningRate 0.000148 Epoch: 26 Global Step: 542020 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:32:19,127-Speed 2496.33 samples/sec Loss 1.6310 LearningRate 0.000148 Epoch: 26 Global Step: 542030 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:32:27,334-Speed 2496.21 samples/sec Loss 1.6043 LearningRate 0.000148 Epoch: 26 Global Step: 542040 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:32:35,483-Speed 2513.55 samples/sec Loss 1.6006 LearningRate 0.000148 Epoch: 26 Global Step: 542050 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:32:43,686-Speed 2497.27 samples/sec Loss 1.6301 LearningRate 0.000148 Epoch: 26 Global Step: 542060 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:32:51,888-Speed 2497.17 samples/sec Loss 1.5824 LearningRate 0.000148 Epoch: 26 Global Step: 542070 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:33:00,099-Speed 2494.69 samples/sec Loss 1.6294 LearningRate 0.000148 Epoch: 26 Global Step: 542080 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:33:08,312-Speed 2494.06 samples/sec Loss 1.6671 LearningRate 0.000148 Epoch: 26 Global Step: 542090 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:33:16,518-Speed 2496.08 samples/sec Loss 1.6553 LearningRate 0.000148 Epoch: 26 Global Step: 542100 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:33:24,670-Speed 2512.68 samples/sec Loss 1.6474 LearningRate 0.000148 Epoch: 26 Global Step: 542110 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:33:32,872-Speed 2497.16 samples/sec Loss 1.5967 LearningRate 0.000148 Epoch: 26 Global Step: 542120 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:33:41,089-Speed 2492.99 samples/sec Loss 1.6160 LearningRate 0.000148 Epoch: 26 Global Step: 542130 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:33:49,293-Speed 2497.08 samples/sec Loss 1.6113 LearningRate 0.000148 Epoch: 26 Global Step: 542140 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:33:57,494-Speed 2497.54 samples/sec Loss 1.5957 LearningRate 0.000148 Epoch: 26 Global Step: 542150 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:34:05,707-Speed 2493.75 samples/sec Loss 1.6287 LearningRate 0.000148 Epoch: 26 Global Step: 542160 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:34:13,858-Speed 2513.25 samples/sec Loss 1.5748 LearningRate 0.000148 Epoch: 26 Global Step: 542170 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:34:22,065-Speed 2496.13 samples/sec Loss 1.6252 LearningRate 0.000148 Epoch: 26 Global Step: 542180 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:34:30,267-Speed 2497.20 samples/sec Loss 1.5777 LearningRate 0.000148 Epoch: 26 Global Step: 542190 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:34:38,471-Speed 2496.84 samples/sec Loss 1.5671 LearningRate 0.000148 Epoch: 26 Global Step: 542200 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:34:46,678-Speed 2496.00 samples/sec Loss 1.6124 LearningRate 0.000148 Epoch: 26 Global Step: 542210 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:34:54,879-Speed 2497.45 samples/sec Loss 1.6104 LearningRate 0.000148 Epoch: 26 Global Step: 542220 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:35:03,029-Speed 2513.42 samples/sec Loss 1.6089 LearningRate 0.000148 Epoch: 26 Global Step: 542230 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:35:11,241-Speed 2494.52 samples/sec Loss 1.6030 LearningRate 0.000148 Epoch: 26 Global Step: 542240 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:35:19,442-Speed 2497.81 samples/sec Loss 1.5983 LearningRate 0.000148 Epoch: 26 Global Step: 542250 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:35:27,646-Speed 2496.53 samples/sec Loss 1.6010 LearningRate 0.000148 Epoch: 26 Global Step: 542260 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:35:35,860-Speed 2493.67 samples/sec Loss 1.6164 LearningRate 0.000148 Epoch: 26 Global Step: 542270 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:35:44,064-Speed 2496.92 samples/sec Loss 1.6345 LearningRate 0.000148 Epoch: 26 Global Step: 542280 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:35:52,225-Speed 2510.00 samples/sec Loss 1.6544 LearningRate 0.000148 Epoch: 26 Global Step: 542290 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:36:00,425-Speed 2497.67 samples/sec Loss 1.6078 LearningRate 0.000148 Epoch: 26 Global Step: 542300 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:36:08,628-Speed 2497.35 samples/sec Loss 1.6213 LearningRate 0.000148 Epoch: 26 Global Step: 542310 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:36:16,836-Speed 2495.46 samples/sec Loss 1.6372 LearningRate 0.000148 Epoch: 26 Global Step: 542320 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:36:25,038-Speed 2497.43 samples/sec Loss 1.6252 LearningRate 0.000148 Epoch: 26 Global Step: 542330 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:36:33,245-Speed 2495.54 samples/sec Loss 1.6436 LearningRate 0.000148 Epoch: 26 Global Step: 542340 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:36:41,397-Speed 2512.63 samples/sec Loss 1.6151 LearningRate 0.000148 Epoch: 26 Global Step: 542350 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:36:49,614-Speed 2492.83 samples/sec Loss 1.5920 LearningRate 0.000148 Epoch: 26 Global Step: 542360 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:36:57,820-Speed 2495.99 samples/sec Loss 1.6033 LearningRate 0.000148 Epoch: 26 Global Step: 542370 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:37:06,026-Speed 2496.22 samples/sec Loss 1.5948 LearningRate 0.000148 Epoch: 26 Global Step: 542380 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:37:14,231-Speed 2496.41 samples/sec Loss 1.6081 LearningRate 0.000148 Epoch: 26 Global Step: 542390 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:37:22,446-Speed 2493.53 samples/sec Loss 1.5856 LearningRate 0.000148 Epoch: 26 Global Step: 542400 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:37:30,599-Speed 2512.08 samples/sec Loss 1.6148 LearningRate 0.000148 Epoch: 26 Global Step: 542410 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:37:38,804-Speed 2496.46 samples/sec Loss 1.6122 LearningRate 0.000148 Epoch: 26 Global Step: 542420 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:37:47,005-Speed 2497.83 samples/sec Loss 1.5810 LearningRate 0.000148 Epoch: 26 Global Step: 542430 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:37:55,208-Speed 2497.35 samples/sec Loss 1.6168 LearningRate 0.000148 Epoch: 26 Global Step: 542440 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:38:03,411-Speed 2496.85 samples/sec Loss 1.6052 LearningRate 0.000148 Epoch: 26 Global Step: 542450 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:38:11,617-Speed 2496.61 samples/sec Loss 1.6457 LearningRate 0.000148 Epoch: 26 Global Step: 542460 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:38:19,767-Speed 2513.30 samples/sec Loss 1.5995 LearningRate 0.000148 Epoch: 26 Global Step: 542470 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:38:27,970-Speed 2497.26 samples/sec Loss 1.6148 LearningRate 0.000148 Epoch: 26 Global Step: 542480 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:38:36,179-Speed 2495.24 samples/sec Loss 1.5831 LearningRate 0.000148 Epoch: 26 Global Step: 542490 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:38:44,394-Speed 2493.47 samples/sec Loss 1.6196 LearningRate 0.000148 Epoch: 26 Global Step: 542500 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:38:52,598-Speed 2496.90 samples/sec Loss 1.6028 LearningRate 0.000148 Epoch: 26 Global Step: 542510 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:39:00,803-Speed 2496.52 samples/sec Loss 1.6032 LearningRate 0.000148 Epoch: 26 Global Step: 542520 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:39:08,955-Speed 2512.84 samples/sec Loss 1.6031 LearningRate 0.000148 Epoch: 26 Global Step: 542530 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:39:17,157-Speed 2496.96 samples/sec Loss 1.6468 LearningRate 0.000148 Epoch: 26 Global Step: 542540 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:39:25,360-Speed 2497.27 samples/sec Loss 1.6181 LearningRate 0.000148 Epoch: 26 Global Step: 542550 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:39:33,565-Speed 2496.65 samples/sec Loss 1.6609 LearningRate 0.000148 Epoch: 26 Global Step: 542560 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:39:41,767-Speed 2497.23 samples/sec Loss 1.6254 LearningRate 0.000148 Epoch: 26 Global Step: 542570 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:39:49,975-Speed 2495.47 samples/sec Loss 1.6136 LearningRate 0.000148 Epoch: 26 Global Step: 542580 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:39:58,130-Speed 2511.84 samples/sec Loss 1.5866 LearningRate 0.000148 Epoch: 26 Global Step: 542590 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:40:06,345-Speed 2493.59 samples/sec Loss 1.6271 LearningRate 0.000148 Epoch: 26 Global Step: 542600 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:40:14,548-Speed 2496.95 samples/sec Loss 1.6063 LearningRate 0.000148 Epoch: 26 Global Step: 542610 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-07-10 18:40:22,750-Speed 2497.27 samples/sec Loss 1.6455 LearningRate 0.000148 Epoch: 26 Global Step: 542620 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-07-10 18:40:30,956-Speed 2496.06 samples/sec Loss 1.5971 LearningRate 0.000148 Epoch: 26 Global Step: 542630 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-07-10 18:40:39,159-Speed 2497.39 samples/sec Loss 1.5426 LearningRate 0.000148 Epoch: 26 Global Step: 542640 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-07-10 18:40:47,310-Speed 2513.00 samples/sec Loss 1.5980 LearningRate 0.000148 Epoch: 26 Global Step: 542650 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-07-10 18:40:55,513-Speed 2496.89 samples/sec Loss 1.6291 LearningRate 0.000148 Epoch: 26 Global Step: 542660 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-07-10 18:41:03,721-Speed 2496.65 samples/sec Loss 1.6157 LearningRate 0.000148 Epoch: 26 Global Step: 542670 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:41:11,923-Speed 2497.37 samples/sec Loss 1.6018 LearningRate 0.000148 Epoch: 26 Global Step: 542680 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:41:20,129-Speed 2495.99 samples/sec Loss 1.6137 LearningRate 0.000148 Epoch: 26 Global Step: 542690 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:41:28,338-Speed 2495.35 samples/sec Loss 1.6247 LearningRate 0.000148 Epoch: 26 Global Step: 542700 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:41:36,501-Speed 2511.06 samples/sec Loss 1.6057 LearningRate 0.000148 Epoch: 26 Global Step: 542710 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:41:44,701-Speed 2497.92 samples/sec Loss 1.5977 LearningRate 0.000148 Epoch: 26 Global Step: 542720 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:41:52,905-Speed 2496.54 samples/sec Loss 1.6162 LearningRate 0.000148 Epoch: 26 Global Step: 542730 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:42:01,110-Speed 2496.79 samples/sec Loss 1.6248 LearningRate 0.000148 Epoch: 26 Global Step: 542740 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:42:09,316-Speed 2496.11 samples/sec Loss 1.6193 LearningRate 0.000148 Epoch: 26 Global Step: 542750 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:42:17,518-Speed 2497.44 samples/sec Loss 1.5880 LearningRate 0.000148 Epoch: 26 Global Step: 542760 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:42:25,667-Speed 2513.29 samples/sec Loss 1.6225 LearningRate 0.000148 Epoch: 26 Global Step: 542770 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:42:33,871-Speed 2496.87 samples/sec Loss 1.6253 LearningRate 0.000148 Epoch: 26 Global Step: 542780 Fp16 Grad Scale: 32768 Required: 66 hours 
Training: 2022-07-10 18:42:42,078-Speed 2495.87 samples/sec Loss 1.5722 LearningRate 0.000148 Epoch: 26 Global Step: 542790 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:42:50,285-Speed 2495.74 samples/sec Loss 1.6412 LearningRate 0.000148 Epoch: 26 Global Step: 542800 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:42:58,488-Speed 2497.20 samples/sec Loss 1.6564 LearningRate 0.000148 Epoch: 26 Global Step: 542810 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:43:06,689-Speed 2497.66 samples/sec Loss 1.6179 LearningRate 0.000148 Epoch: 26 Global Step: 542820 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:43:14,840-Speed 2512.92 samples/sec Loss 1.6199 LearningRate 0.000147 Epoch: 26 Global Step: 542830 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:43:23,045-Speed 2496.40 samples/sec Loss 1.6147 LearningRate 0.000147 Epoch: 26 Global Step: 542840 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:43:31,249-Speed 2496.86 samples/sec Loss 1.6098 LearningRate 0.000147 Epoch: 26 Global Step: 542850 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:43:39,454-Speed 2496.21 samples/sec Loss 1.6055 LearningRate 0.000147 Epoch: 26 Global Step: 542860 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:43:47,658-Speed 2496.98 samples/sec Loss 1.6114 LearningRate 0.000147 Epoch: 26 Global Step: 542870 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:43:55,864-Speed 2496.21 samples/sec Loss 1.6524 LearningRate 0.000147 Epoch: 26 Global Step: 542880 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:44:04,014-Speed 2513.20 samples/sec Loss 1.5742 LearningRate 0.000147 Epoch: 26 Global Step: 542890 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:44:12,220-Speed 2496.35 samples/sec Loss 1.6377 LearningRate 0.000147 Epoch: 26 Global Step: 542900 Fp16 Grad Scale: 32768 Required: 66 hours 
Training: 2022-07-10 18:44:20,422-Speed 2497.19 samples/sec Loss 1.6035 LearningRate 0.000147 Epoch: 26 Global Step: 542910 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:44:28,623-Speed 2497.59 samples/sec Loss 1.6403 LearningRate 0.000147 Epoch: 26 Global Step: 542920 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:44:36,824-Speed 2497.60 samples/sec Loss 1.6461 LearningRate 0.000147 Epoch: 26 Global Step: 542930 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:44:45,048-Speed 2490.88 samples/sec Loss 1.5957 LearningRate 0.000147 Epoch: 26 Global Step: 542940 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:44:53,196-Speed 2513.73 samples/sec Loss 1.6215 LearningRate 0.000147 Epoch: 26 Global Step: 542950 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:45:01,396-Speed 2497.94 samples/sec Loss 1.6093 LearningRate 0.000147 Epoch: 26 Global Step: 542960 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:45:09,594-Speed 2498.21 samples/sec Loss 1.6177 LearningRate 0.000147 Epoch: 26 Global Step: 542970 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:45:17,814-Speed 2492.22 samples/sec Loss 1.6164 LearningRate 0.000147 Epoch: 26 Global Step: 542980 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:45:26,011-Speed 2499.24 samples/sec Loss 1.6232 LearningRate 0.000147 Epoch: 26 Global Step: 542990 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:45:34,207-Speed 2498.96 samples/sec Loss 1.6289 LearningRate 0.000147 Epoch: 26 Global Step: 543000 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:45:42,354-Speed 2514.17 samples/sec Loss 1.6191 LearningRate 0.000147 Epoch: 26 Global Step: 543010 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:45:50,553-Speed 2499.11 samples/sec Loss 1.6325 LearningRate 0.000147 Epoch: 26 Global Step: 543020 Fp16 Grad Scale: 32768 Required: 66 hours 
Training: 2022-07-10 18:45:58,760-Speed 2495.83 samples/sec Loss 1.5798 LearningRate 0.000147 Epoch: 26 Global Step: 543030 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:46:06,965-Speed 2496.53 samples/sec Loss 1.5938 LearningRate 0.000147 Epoch: 26 Global Step: 543040 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:46:15,166-Speed 2497.59 samples/sec Loss 1.6093 LearningRate 0.000147 Epoch: 26 Global Step: 543050 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:46:23,366-Speed 2498.12 samples/sec Loss 1.6370 LearningRate 0.000147 Epoch: 26 Global Step: 543060 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:46:31,526-Speed 2510.09 samples/sec Loss 1.5958 LearningRate 0.000147 Epoch: 26 Global Step: 543070 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:46:39,727-Speed 2497.58 samples/sec Loss 1.6205 LearningRate 0.000147 Epoch: 26 Global Step: 543080 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:46:47,929-Speed 2497.37 samples/sec Loss 1.6304 LearningRate 0.000147 Epoch: 26 Global Step: 543090 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:46:56,146-Speed 2492.72 samples/sec Loss 1.6074 LearningRate 0.000147 Epoch: 26 Global Step: 543100 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:47:04,363-Speed 2492.85 samples/sec Loss 1.5749 LearningRate 0.000147 Epoch: 26 Global Step: 543110 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:47:12,570-Speed 2495.84 samples/sec Loss 1.5994 LearningRate 0.000147 Epoch: 26 Global Step: 543120 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:47:20,719-Speed 2513.47 samples/sec Loss 1.6195 LearningRate 0.000147 Epoch: 26 Global Step: 543130 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:47:28,922-Speed 2496.95 samples/sec Loss 1.6574 LearningRate 0.000147 Epoch: 26 Global Step: 543140 Fp16 Grad Scale: 32768 Required: 66 hours 
Training: 2022-07-10 18:47:37,125-Speed 2497.20 samples/sec Loss 1.6230 LearningRate 0.000147 Epoch: 26 Global Step: 543150 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:47:45,332-Speed 2495.84 samples/sec Loss 1.6007 LearningRate 0.000147 Epoch: 26 Global Step: 543160 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:47:53,536-Speed 2496.60 samples/sec Loss 1.6020 LearningRate 0.000147 Epoch: 26 Global Step: 543170 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:48:01,740-Speed 2496.74 samples/sec Loss 1.5881 LearningRate 0.000147 Epoch: 26 Global Step: 543180 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:48:09,894-Speed 2512.16 samples/sec Loss 1.5961 LearningRate 0.000147 Epoch: 26 Global Step: 543190 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:48:18,096-Speed 2497.21 samples/sec Loss 1.6111 LearningRate 0.000147 Epoch: 26 Global Step: 543200 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:48:26,306-Speed 2494.86 samples/sec Loss 1.6278 LearningRate 0.000147 Epoch: 26 Global Step: 543210 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:48:34,511-Speed 2496.93 samples/sec Loss 1.6092 LearningRate 0.000147 Epoch: 26 Global Step: 543220 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:48:42,715-Speed 2496.62 samples/sec Loss 1.5801 LearningRate 0.000147 Epoch: 26 Global Step: 543230 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:48:50,917-Speed 2497.35 samples/sec Loss 1.6198 LearningRate 0.000147 Epoch: 26 Global Step: 543240 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:48:59,067-Speed 2513.48 samples/sec Loss 1.6072 LearningRate 0.000147 Epoch: 26 Global Step: 543250 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:49:07,275-Speed 2495.38 samples/sec Loss 1.6194 LearningRate 0.000147 Epoch: 26 Global Step: 543260 Fp16 Grad Scale: 32768 Required: 66 hours 
Training: 2022-07-10 18:49:15,480-Speed 2496.23 samples/sec Loss 1.6063 LearningRate 0.000147 Epoch: 26 Global Step: 543270 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:49:23,685-Speed 2496.64 samples/sec Loss 1.6117 LearningRate 0.000147 Epoch: 26 Global Step: 543280 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:49:31,892-Speed 2495.85 samples/sec Loss 1.5772 LearningRate 0.000147 Epoch: 26 Global Step: 543290 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-07-10 18:49:40,053-Speed 2509.86 samples/sec Loss 1.5829 LearningRate 0.000147 Epoch: 26 Global Step: 543300 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:49:48,203-Speed 2513.29 samples/sec Loss 1.6322 LearningRate 0.000147 Epoch: 26 Global Step: 543310 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:49:56,408-Speed 2496.44 samples/sec Loss 1.5847 LearningRate 0.000147 Epoch: 26 Global Step: 543320 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:50:04,619-Speed 2494.89 samples/sec Loss 1.5951 LearningRate 0.000147 Epoch: 26 Global Step: 543330 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:50:12,820-Speed 2497.53 samples/sec Loss 1.5741 LearningRate 0.000147 Epoch: 26 Global Step: 543340 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:50:21,022-Speed 2497.28 samples/sec Loss 1.5374 LearningRate 0.000147 Epoch: 26 Global Step: 543350 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:50:29,240-Speed 2492.64 samples/sec Loss 1.5582 LearningRate 0.000147 Epoch: 26 Global Step: 543360 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:50:37,394-Speed 2512.07 samples/sec Loss 1.6092 LearningRate 0.000147 Epoch: 26 Global Step: 543370 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:50:45,596-Speed 2497.44 samples/sec Loss 1.5957 LearningRate 0.000147 Epoch: 26 Global Step: 543380 Fp16 Grad Scale: 16384 Required: 66 hours 
Training: 2022-07-10 18:50:53,801-Speed 2496.31 samples/sec Loss 1.6142 LearningRate 0.000147 Epoch: 26 Global Step: 543390 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:51:02,001-Speed 2498.24 samples/sec Loss 1.5452 LearningRate 0.000147 Epoch: 26 Global Step: 543400 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:51:10,204-Speed 2497.14 samples/sec Loss 1.6233 LearningRate 0.000147 Epoch: 26 Global Step: 543410 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:51:18,405-Speed 2497.52 samples/sec Loss 1.6018 LearningRate 0.000147 Epoch: 26 Global Step: 543420 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:51:26,550-Speed 2514.79 samples/sec Loss 1.6066 LearningRate 0.000147 Epoch: 26 Global Step: 543430 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:51:34,756-Speed 2496.06 samples/sec Loss 1.6228 LearningRate 0.000147 Epoch: 26 Global Step: 543440 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:51:42,956-Speed 2498.11 samples/sec Loss 1.6113 LearningRate 0.000147 Epoch: 26 Global Step: 543450 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:51:51,158-Speed 2497.29 samples/sec Loss 1.6001 LearningRate 0.000147 Epoch: 26 Global Step: 543460 Fp16 Grad Scale: 16384 Required: 66 hours Training: 2022-07-10 18:51:59,361-Speed 2496.98 samples/sec Loss 1.6227 LearningRate 0.000147 Epoch: 26 Global Step: 543470 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:52:07,562-Speed 2497.86 samples/sec Loss 1.6160 LearningRate 0.000147 Epoch: 26 Global Step: 543480 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:52:15,710-Speed 2513.93 samples/sec Loss 1.6107 LearningRate 0.000147 Epoch: 26 Global Step: 543490 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:52:23,913-Speed 2497.28 samples/sec Loss 1.5925 LearningRate 0.000147 Epoch: 26 Global Step: 543500 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 18:52:32,120-Speed 2495.79 samples/sec Loss 1.5938 LearningRate 0.000147 Epoch: 26 Global Step: 543510 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:52:40,326-Speed 2496.16 samples/sec Loss 1.5960 LearningRate 0.000147 Epoch: 26 Global Step: 543520 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:52:48,529-Speed 2497.36 samples/sec Loss 1.5769 LearningRate 0.000147 Epoch: 26 Global Step: 543530 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:52:56,737-Speed 2495.24 samples/sec Loss 1.6130 LearningRate 0.000147 Epoch: 26 Global Step: 543540 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:53:04,887-Speed 2513.25 samples/sec Loss 1.5891 LearningRate 0.000147 Epoch: 26 Global Step: 543550 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:53:13,088-Speed 2497.81 samples/sec Loss 1.6057 LearningRate 0.000147 Epoch: 26 Global Step: 543560 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:53:21,290-Speed 2497.45 samples/sec Loss 1.6130 LearningRate 0.000147 Epoch: 26 Global Step: 543570 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:53:29,493-Speed 2497.01 samples/sec Loss 1.6435 LearningRate 0.000147 Epoch: 26 Global Step: 543580 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:53:37,694-Speed 2497.69 samples/sec Loss 1.6119 LearningRate 0.000147 Epoch: 26 Global Step: 543590 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:53:45,906-Speed 2494.53 samples/sec Loss 1.6126 LearningRate 0.000147 Epoch: 26 Global Step: 543600 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:53:54,053-Speed 2513.90 samples/sec Loss 1.6113 LearningRate 0.000147 Epoch: 26 Global Step: 543610 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:54:02,257-Speed 2496.96 samples/sec Loss 1.6068 LearningRate 0.000147 Epoch: 26 Global Step: 543620 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 18:54:10,465-Speed 2495.58 samples/sec Loss 1.5922 LearningRate 0.000147 Epoch: 26 Global Step: 543630 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:54:18,668-Speed 2496.89 samples/sec Loss 1.6097 LearningRate 0.000147 Epoch: 26 Global Step: 543640 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:54:26,886-Speed 2492.42 samples/sec Loss 1.5893 LearningRate 0.000147 Epoch: 26 Global Step: 543650 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:54:35,090-Speed 2496.56 samples/sec Loss 1.6204 LearningRate 0.000147 Epoch: 26 Global Step: 543660 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:54:43,253-Speed 2509.51 samples/sec Loss 1.5794 LearningRate 0.000147 Epoch: 26 Global Step: 543670 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:54:51,464-Speed 2494.58 samples/sec Loss 1.6350 LearningRate 0.000147 Epoch: 26 Global Step: 543680 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:54:59,672-Speed 2495.64 samples/sec Loss 1.6219 LearningRate 0.000147 Epoch: 26 Global Step: 543690 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:55:07,877-Speed 2496.24 samples/sec Loss 1.6268 LearningRate 0.000147 Epoch: 26 Global Step: 543700 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:55:16,090-Speed 2494.59 samples/sec Loss 1.6154 LearningRate 0.000147 Epoch: 26 Global Step: 543710 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:55:24,292-Speed 2497.17 samples/sec Loss 1.6075 LearningRate 0.000147 Epoch: 26 Global Step: 543720 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:55:32,453-Speed 2509.91 samples/sec Loss 1.5806 LearningRate 0.000147 Epoch: 26 Global Step: 543730 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:55:40,664-Speed 2494.42 samples/sec Loss 1.5853 LearningRate 0.000147 Epoch: 26 Global Step: 543740 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 18:55:48,867-Speed 2497.00 samples/sec Loss 1.6094 LearningRate 0.000147 Epoch: 26 Global Step: 543750 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:55:57,069-Speed 2497.34 samples/sec Loss 1.5958 LearningRate 0.000147 Epoch: 26 Global Step: 543760 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:56:05,276-Speed 2495.89 samples/sec Loss 1.5994 LearningRate 0.000147 Epoch: 26 Global Step: 543770 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:56:13,477-Speed 2497.48 samples/sec Loss 1.6426 LearningRate 0.000147 Epoch: 26 Global Step: 543780 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:56:21,631-Speed 2512.28 samples/sec Loss 1.5987 LearningRate 0.000147 Epoch: 26 Global Step: 543790 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:56:29,832-Speed 2497.44 samples/sec Loss 1.6278 LearningRate 0.000146 Epoch: 26 Global Step: 543800 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:56:38,033-Speed 2497.43 samples/sec Loss 1.6140 LearningRate 0.000146 Epoch: 26 Global Step: 543810 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:56:46,235-Speed 2497.50 samples/sec Loss 1.5979 LearningRate 0.000146 Epoch: 26 Global Step: 543820 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:56:54,446-Speed 2494.51 samples/sec Loss 1.5819 LearningRate 0.000146 Epoch: 26 Global Step: 543830 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:57:02,659-Speed 2494.04 samples/sec Loss 1.5841 LearningRate 0.000146 Epoch: 26 Global Step: 543840 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:57:10,805-Speed 2514.49 samples/sec Loss 1.5996 LearningRate 0.000146 Epoch: 26 Global Step: 543850 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:57:19,004-Speed 2498.54 samples/sec Loss 1.5862 LearningRate 0.000146 Epoch: 26 Global Step: 543860 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 18:57:27,203-Speed 2498.11 samples/sec Loss 1.5768 LearningRate 0.000146 Epoch: 26 Global Step: 543870 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:57:35,406-Speed 2496.97 samples/sec Loss 1.6096 LearningRate 0.000146 Epoch: 26 Global Step: 543880 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:57:43,608-Speed 2497.67 samples/sec Loss 1.6434 LearningRate 0.000146 Epoch: 26 Global Step: 543890 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:57:51,811-Speed 2496.87 samples/sec Loss 1.6138 LearningRate 0.000146 Epoch: 26 Global Step: 543900 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:57:59,957-Speed 2514.41 samples/sec Loss 1.5792 LearningRate 0.000146 Epoch: 26 Global Step: 543910 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:58:08,158-Speed 2497.62 samples/sec Loss 1.6237 LearningRate 0.000146 Epoch: 26 Global Step: 543920 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:58:16,369-Speed 2494.61 samples/sec Loss 1.6021 LearningRate 0.000146 Epoch: 26 Global Step: 543930 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:58:24,570-Speed 2497.62 samples/sec Loss 1.5211 LearningRate 0.000146 Epoch: 26 Global Step: 543940 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:58:32,774-Speed 2496.83 samples/sec Loss 1.6173 LearningRate 0.000146 Epoch: 26 Global Step: 543950 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:58:40,977-Speed 2497.06 samples/sec Loss 1.6085 LearningRate 0.000146 Epoch: 26 Global Step: 543960 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:58:49,137-Speed 2510.01 samples/sec Loss 1.6187 LearningRate 0.000146 Epoch: 26 Global Step: 543970 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:58:57,341-Speed 2497.20 samples/sec Loss 1.6230 LearningRate 0.000146 Epoch: 26 Global Step: 543980 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 18:59:05,545-Speed 2496.90 samples/sec Loss 1.5874 LearningRate 0.000146 Epoch: 26 Global Step: 543990 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:59:13,752-Speed 2495.77 samples/sec Loss 1.6311 LearningRate 0.000146 Epoch: 26 Global Step: 544000 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:59:21,958-Speed 2496.12 samples/sec Loss 1.6043 LearningRate 0.000146 Epoch: 26 Global Step: 544010 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:59:30,178-Speed 2491.99 samples/sec Loss 1.5864 LearningRate 0.000146 Epoch: 26 Global Step: 544020 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:59:38,328-Speed 2513.19 samples/sec Loss 1.5943 LearningRate 0.000146 Epoch: 26 Global Step: 544030 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:59:46,531-Speed 2497.21 samples/sec Loss 1.6013 LearningRate 0.000146 Epoch: 26 Global Step: 544040 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 18:59:54,732-Speed 2497.74 samples/sec Loss 1.6275 LearningRate 0.000146 Epoch: 26 Global Step: 544050 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:00:02,933-Speed 2497.96 samples/sec Loss 1.5954 LearningRate 0.000146 Epoch: 26 Global Step: 544060 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:00:11,132-Speed 2498.23 samples/sec Loss 1.6539 LearningRate 0.000146 Epoch: 26 Global Step: 544070 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:00:19,342-Speed 2494.86 samples/sec Loss 1.6034 LearningRate 0.000146 Epoch: 26 Global Step: 544080 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:00:27,489-Speed 2514.21 samples/sec Loss 1.6114 LearningRate 0.000146 Epoch: 26 Global Step: 544090 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:00:35,693-Speed 2496.75 samples/sec Loss 1.5719 LearningRate 0.000146 Epoch: 26 Global Step: 544100 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:00:43,900-Speed 2495.59 samples/sec Loss 1.5777 LearningRate 0.000146 Epoch: 26 Global Step: 544110 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:00:52,109-Speed 2495.27 samples/sec Loss 1.6220 LearningRate 0.000146 Epoch: 26 Global Step: 544120 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:01:00,312-Speed 2497.00 samples/sec Loss 1.5903 LearningRate 0.000146 Epoch: 26 Global Step: 544130 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:01:08,525-Speed 2494.00 samples/sec Loss 1.5913 LearningRate 0.000146 Epoch: 26 Global Step: 544140 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:01:16,688-Speed 2509.34 samples/sec Loss 1.5781 LearningRate 0.000146 Epoch: 26 Global Step: 544150 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:01:24,903-Speed 2493.35 samples/sec Loss 1.6033 LearningRate 0.000146 Epoch: 26 Global Step: 544160 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:01:33,111-Speed 2495.64 samples/sec Loss 1.6001 LearningRate 0.000146 Epoch: 26 Global Step: 544170 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:01:41,325-Speed 2493.63 samples/sec Loss 1.6248 LearningRate 0.000146 Epoch: 26 Global Step: 544180 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:01:49,530-Speed 2496.34 samples/sec Loss 1.6043 LearningRate 0.000146 Epoch: 26 Global Step: 544190 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:01:57,731-Speed 2497.51 samples/sec Loss 1.6326 LearningRate 0.000146 Epoch: 26 Global Step: 544200 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:02:05,888-Speed 2511.07 samples/sec Loss 1.6058 LearningRate 0.000146 Epoch: 26 Global Step: 544210 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:02:14,118-Speed 2489.43 samples/sec Loss 1.6246 LearningRate 0.000146 Epoch: 26 Global Step: 544220 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:02:22,322-Speed 2496.57 samples/sec Loss 1.5800 LearningRate 0.000146 Epoch: 26 Global Step: 544230 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:02:30,527-Speed 2496.46 samples/sec Loss 1.5982 LearningRate 0.000146 Epoch: 26 Global Step: 544240 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:02:38,747-Speed 2492.10 samples/sec Loss 1.6115 LearningRate 0.000146 Epoch: 26 Global Step: 544250 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:02:46,947-Speed 2498.03 samples/sec Loss 1.5996 LearningRate 0.000146 Epoch: 26 Global Step: 544260 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:02:55,097-Speed 2513.30 samples/sec Loss 1.6073 LearningRate 0.000146 Epoch: 26 Global Step: 544270 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:03:03,301-Speed 2496.72 samples/sec Loss 1.5746 LearningRate 0.000146 Epoch: 26 Global Step: 544280 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:03:11,498-Speed 2498.73 samples/sec Loss 1.5856 LearningRate 0.000146 Epoch: 26 Global Step: 544290 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:03:19,700-Speed 2497.32 samples/sec Loss 1.6175 LearningRate 0.000146 Epoch: 26 Global Step: 544300 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:03:27,901-Speed 2497.68 samples/sec Loss 1.6220 LearningRate 0.000146 Epoch: 26 Global Step: 544310 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:03:36,110-Speed 2495.06 samples/sec Loss 1.5820 LearningRate 0.000146 Epoch: 26 Global Step: 544320 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:03:44,265-Speed 2512.01 samples/sec Loss 1.6101 LearningRate 0.000146 Epoch: 26 Global Step: 544330 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:03:52,477-Speed 2494.16 samples/sec Loss 1.6385 LearningRate 0.000146 Epoch: 26 Global Step: 544340 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:04:00,691-Speed 2493.74 samples/sec Loss 1.6250 LearningRate 0.000146 Epoch: 26 Global Step: 544350 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:04:08,894-Speed 2497.18 samples/sec Loss 1.6061 LearningRate 0.000146 Epoch: 26 Global Step: 544360 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:04:17,099-Speed 2496.40 samples/sec Loss 1.6270 LearningRate 0.000146 Epoch: 26 Global Step: 544370 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:04:25,302-Speed 2496.84 samples/sec Loss 1.6004 LearningRate 0.000146 Epoch: 26 Global Step: 544380 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:04:33,450-Speed 2514.07 samples/sec Loss 1.6004 LearningRate 0.000146 Epoch: 26 Global Step: 544390 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:04:41,650-Speed 2497.92 samples/sec Loss 1.6045 LearningRate 0.000146 Epoch: 26 Global Step: 544400 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:04:49,850-Speed 2498.13 samples/sec Loss 1.5969 LearningRate 0.000146 Epoch: 26 Global Step: 544410 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:04:58,059-Speed 2495.32 samples/sec Loss 1.6246 LearningRate 0.000146 Epoch: 26 Global Step: 544420 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:05:06,261-Speed 2497.28 samples/sec Loss 1.6115 LearningRate 0.000146 Epoch: 26 Global Step: 544430 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:05:14,461-Speed 2498.10 samples/sec Loss 1.6254 LearningRate 0.000146 Epoch: 26 Global Step: 544440 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:05:22,616-Speed 2512.25 samples/sec Loss 1.6237 LearningRate 0.000146 Epoch: 26 Global Step: 544450 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:05:30,818-Speed 2497.08 samples/sec Loss 1.5994 LearningRate 0.000146 Epoch: 26 Global Step: 544460 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:05:39,019-Speed 2497.65 samples/sec Loss 1.5983 LearningRate 0.000146 Epoch: 26 Global Step: 544470 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:05:47,236-Speed 2493.04 samples/sec Loss 1.6211 LearningRate 0.000146 Epoch: 26 Global Step: 544480 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:05:55,437-Speed 2497.52 samples/sec Loss 1.6308 LearningRate 0.000146 Epoch: 26 Global Step: 544490 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:06:03,637-Speed 2497.76 samples/sec Loss 1.6178 LearningRate 0.000146 Epoch: 26 Global Step: 544500 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:06:11,788-Speed 2513.22 samples/sec Loss 1.6065 LearningRate 0.000146 Epoch: 26 Global Step: 544510 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:06:19,990-Speed 2497.15 samples/sec Loss 1.6210 LearningRate 0.000146 Epoch: 26 Global Step: 544520 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:06:28,204-Speed 2494.10 samples/sec Loss 1.6021 LearningRate 0.000146 Epoch: 26 Global Step: 544530 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:06:36,407-Speed 2497.09 samples/sec Loss 1.6204 LearningRate 0.000146 Epoch: 26 Global Step: 544540 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:06:44,605-Speed 2498.28 samples/sec Loss 1.6343 LearningRate 0.000146 Epoch: 26 Global Step: 544550 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:06:52,809-Speed 2496.98 samples/sec Loss 1.5897 LearningRate 0.000146 Epoch: 26 Global Step: 544560 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:07:00,955-Speed 2514.52 samples/sec Loss 1.6234 LearningRate 0.000146 Epoch: 26 Global Step: 544570 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:07:09,157-Speed 2497.38 samples/sec Loss 1.5938 LearningRate 0.000146 Epoch: 26 Global Step: 544580 Fp16 Grad Scale: 32768 Required: 65 hours 
Training: 2022-07-10 19:07:17,355-Speed 2498.42 samples/sec Loss 1.6173 LearningRate 0.000146 Epoch: 26 Global Step: 544590 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:07:25,560-Speed 2496.45 samples/sec Loss 1.5941 LearningRate 0.000146 Epoch: 26 Global Step: 544600 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:07:33,758-Speed 2498.65 samples/sec Loss 1.5983 LearningRate 0.000146 Epoch: 26 Global Step: 544610 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:07:41,968-Speed 2495.29 samples/sec Loss 1.5819 LearningRate 0.000146 Epoch: 26 Global Step: 544620 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:07:50,117-Speed 2513.45 samples/sec Loss 1.6032 LearningRate 0.000146 Epoch: 26 Global Step: 544630 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:07:58,334-Speed 2492.80 samples/sec Loss 1.5241 LearningRate 0.000146 Epoch: 26 Global Step: 544640 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:08:06,536-Speed 2497.35 samples/sec Loss 1.6493 LearningRate 0.000146 Epoch: 26 Global Step: 544650 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:08:14,744-Speed 2495.42 samples/sec Loss 1.6143 LearningRate 0.000146 Epoch: 26 Global Step: 544660 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:08:22,958-Speed 2494.02 samples/sec Loss 1.5952 LearningRate 0.000146 Epoch: 26 Global Step: 544670 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:08:31,159-Speed 2497.44 samples/sec Loss 1.5850 LearningRate 0.000146 Epoch: 26 Global Step: 544680 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:08:39,306-Speed 2514.28 samples/sec Loss 1.6161 LearningRate 0.000146 Epoch: 26 Global Step: 544690 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:08:47,510-Speed 2496.74 samples/sec Loss 1.5971 LearningRate 0.000146 Epoch: 26 Global Step: 544700 Fp16 Grad Scale: 32768 Required: 65 hours 
Training: 2022-07-10 19:08:55,711-Speed 2497.70 samples/sec Loss 1.6044 LearningRate 0.000146 Epoch: 26 Global Step: 544710 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:09:03,915-Speed 2496.97 samples/sec Loss 1.5812 LearningRate 0.000146 Epoch: 26 Global Step: 544720 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:09:12,140-Speed 2490.26 samples/sec Loss 1.5924 LearningRate 0.000146 Epoch: 26 Global Step: 544730 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:09:20,353-Speed 2493.89 samples/sec Loss 1.5600 LearningRate 0.000146 Epoch: 26 Global Step: 544740 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:09:28,501-Speed 2513.91 samples/sec Loss 1.6094 LearningRate 0.000146 Epoch: 26 Global Step: 544750 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:09:36,708-Speed 2495.97 samples/sec Loss 1.6232 LearningRate 0.000146 Epoch: 26 Global Step: 544760 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:09:44,912-Speed 2496.73 samples/sec Loss 1.5908 LearningRate 0.000146 Epoch: 26 Global Step: 544770 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:09:53,076-Speed 2508.69 samples/sec Loss 1.6032 LearningRate 0.000145 Epoch: 26 Global Step: 544780 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:10:01,277-Speed 2497.79 samples/sec Loss 1.6107 LearningRate 0.000145 Epoch: 26 Global Step: 544790 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:10:09,479-Speed 2497.28 samples/sec Loss 1.6215 LearningRate 0.000145 Epoch: 26 Global Step: 544800 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:10:17,630-Speed 2512.82 samples/sec Loss 1.5665 LearningRate 0.000145 Epoch: 26 Global Step: 544810 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:10:25,847-Speed 2492.65 samples/sec Loss 1.6124 LearningRate 0.000145 Epoch: 26 Global Step: 544820 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:10:34,049-Speed 2497.31 samples/sec Loss 1.6281 LearningRate 0.000145 Epoch: 26 Global Step: 544830 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:10:42,252-Speed 2497.08 samples/sec Loss 1.5748 LearningRate 0.000145 Epoch: 26 Global Step: 544840 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:10:50,456-Speed 2496.77 samples/sec Loss 1.6119 LearningRate 0.000145 Epoch: 26 Global Step: 544850 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:10:58,656-Speed 2497.62 samples/sec Loss 1.6573 LearningRate 0.000145 Epoch: 26 Global Step: 544860 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:11:06,806-Speed 2513.70 samples/sec Loss 1.6035 LearningRate 0.000145 Epoch: 26 Global Step: 544870 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:11:15,019-Speed 2493.80 samples/sec Loss 1.5890 LearningRate 0.000145 Epoch: 26 Global Step: 544880 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:11:23,224-Speed 2496.38 samples/sec Loss 1.6226 LearningRate 0.000145 Epoch: 26 Global Step: 544890 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:11:31,426-Speed 2497.62 samples/sec Loss 1.6276 LearningRate 0.000145 Epoch: 26 Global Step: 544900 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:11:39,627-Speed 2497.64 samples/sec Loss 1.6798 LearningRate 0.000145 Epoch: 26 Global Step: 544910 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:11:47,831-Speed 2496.84 samples/sec Loss 1.6126 LearningRate 0.000145 Epoch: 26 Global Step: 544920 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:11:55,982-Speed 2512.98 samples/sec Loss 1.6254 LearningRate 0.000145 Epoch: 26 Global Step: 544930 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:12:04,189-Speed 2496.03 samples/sec Loss 1.5949 LearningRate 0.000145 Epoch: 26 Global Step: 544940 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:12:12,389-Speed 2497.69 samples/sec Loss 1.6104 LearningRate 0.000145 Epoch: 26 Global Step: 544950 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:12:20,594-Speed 2496.54 samples/sec Loss 1.5810 LearningRate 0.000145 Epoch: 26 Global Step: 544960 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:12:28,806-Speed 2494.67 samples/sec Loss 1.5915 LearningRate 0.000145 Epoch: 26 Global Step: 544970 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:12:37,017-Speed 2494.33 samples/sec Loss 1.6125 LearningRate 0.000145 Epoch: 26 Global Step: 544980 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:12:45,167-Speed 2513.43 samples/sec Loss 1.5951 LearningRate 0.000145 Epoch: 26 Global Step: 544990 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:12:53,366-Speed 2498.19 samples/sec Loss 1.5901 LearningRate 0.000145 Epoch: 26 Global Step: 545000 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:13:01,576-Speed 2494.94 samples/sec Loss 1.5516 LearningRate 0.000145 Epoch: 26 Global Step: 545010 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:13:09,779-Speed 2497.06 samples/sec Loss 1.5891 LearningRate 0.000145 Epoch: 26 Global Step: 545020 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:13:17,980-Speed 2497.73 samples/sec Loss 1.5444 LearningRate 0.000145 Epoch: 26 Global Step: 545030 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:13:26,184-Speed 2496.47 samples/sec Loss 1.5498 LearningRate 0.000145 Epoch: 26 Global Step: 545040 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:13:34,340-Speed 2511.55 samples/sec Loss 1.5609 LearningRate 0.000145 Epoch: 26 Global Step: 545050 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:13:42,548-Speed 2495.46 samples/sec Loss 1.6198 LearningRate 0.000145 Epoch: 26 Global Step: 545060 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:13:50,757-Speed 2495.35 samples/sec Loss 1.6035 LearningRate 0.000145 Epoch: 26 Global Step: 545070 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:13:58,963-Speed 2496.02 samples/sec Loss 1.6024 LearningRate 0.000145 Epoch: 26 Global Step: 545080 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:14:07,167-Speed 2496.82 samples/sec Loss 1.6043 LearningRate 0.000145 Epoch: 26 Global Step: 545090 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:14:15,373-Speed 2496.59 samples/sec Loss 1.6278 LearningRate 0.000145 Epoch: 26 Global Step: 545100 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:14:23,534-Speed 2509.68 samples/sec Loss 1.5737 LearningRate 0.000145 Epoch: 26 Global Step: 545110 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:14:31,743-Speed 2495.17 samples/sec Loss 1.6206 LearningRate 0.000145 Epoch: 26 Global Step: 545120 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:14:39,975-Speed 2488.44 samples/sec Loss 1.5853 LearningRate 0.000145 Epoch: 26 Global Step: 545130 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:14:48,177-Speed 2497.29 samples/sec Loss 1.5913 LearningRate 0.000145 Epoch: 26 Global Step: 545140 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:14:56,380-Speed 2496.87 samples/sec Loss 1.6174 LearningRate 0.000145 Epoch: 26 Global Step: 545150 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:15:04,590-Speed 2495.15 samples/sec Loss 1.6296 LearningRate 0.000145 Epoch: 26 Global Step: 545160 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:15:12,740-Speed 2513.76 samples/sec Loss 1.6176 LearningRate 0.000145 Epoch: 26 Global Step: 545170 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:15:20,949-Speed 2495.43 samples/sec Loss 1.5872 LearningRate 0.000145 Epoch: 26 Global Step: 545180 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:15:29,151-Speed 2497.54 samples/sec Loss 1.6243 LearningRate 0.000145 Epoch: 26 Global Step: 545190 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:15:37,354-Speed 2496.91 samples/sec Loss 1.6021 LearningRate 0.000145 Epoch: 26 Global Step: 545200 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:15:45,557-Speed 2496.73 samples/sec Loss 1.6132 LearningRate 0.000145 Epoch: 26 Global Step: 545210 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:15:53,773-Speed 2493.60 samples/sec Loss 1.6273 LearningRate 0.000145 Epoch: 26 Global Step: 545220 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:16:01,927-Speed 2512.02 samples/sec Loss 1.6134 LearningRate 0.000145 Epoch: 26 Global Step: 545230 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:16:10,130-Speed 2497.06 samples/sec Loss 1.6427 LearningRate 0.000145 Epoch: 26 Global Step: 545240 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:16:18,334-Speed 2496.77 samples/sec Loss 1.6248 LearningRate 0.000145 Epoch: 26 Global Step: 545250 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:16:26,541-Speed 2495.75 samples/sec Loss 1.6755 LearningRate 0.000145 Epoch: 26 Global Step: 545260 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:16:34,746-Speed 2496.56 samples/sec Loss 1.5970 LearningRate 0.000145 Epoch: 26 Global Step: 545270 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:16:42,951-Speed 2496.21 samples/sec Loss 1.6557 LearningRate 0.000145 Epoch: 26 Global Step: 545280 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:16:51,106-Speed 2511.91 samples/sec Loss 1.6321 LearningRate 0.000145 Epoch: 26 Global Step: 545290 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:16:59,310-Speed 2496.66 samples/sec Loss 1.6214 LearningRate 0.000145 Epoch: 26 Global Step: 545300 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:17:07,519-Speed 2495.47 samples/sec Loss 1.6083 LearningRate 0.000145 Epoch: 26 Global Step: 545310 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:17:15,731-Speed 2494.32 samples/sec Loss 1.6123 LearningRate 0.000145 Epoch: 26 Global Step: 545320 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:17:23,943-Speed 2494.22 samples/sec Loss 1.6182 LearningRate 0.000145 Epoch: 26 Global Step: 545330 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:17:32,167-Speed 2490.75 samples/sec Loss 1.5942 LearningRate 0.000145 Epoch: 26 Global Step: 545340 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:17:40,320-Speed 2512.52 samples/sec Loss 1.6134 LearningRate 0.000145 Epoch: 26 Global Step: 545350 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:17:48,524-Speed 2496.47 samples/sec Loss 1.5777 LearningRate 0.000145 Epoch: 26 Global Step: 545360 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:17:56,731-Speed 2496.04 samples/sec Loss 1.6106 LearningRate 0.000145 Epoch: 26 Global Step: 545370 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:18:04,936-Speed 2496.66 samples/sec Loss 1.5863 LearningRate 0.000145 Epoch: 26 Global Step: 545380 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:18:13,139-Speed 2496.99 samples/sec Loss 1.6181 LearningRate 0.000145 Epoch: 26 Global Step: 545390 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:18:21,342-Speed 2496.88 samples/sec Loss 1.6250 LearningRate 0.000145 Epoch: 26 Global Step: 545400 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:18:29,491-Speed 2513.78 samples/sec Loss 1.6118 LearningRate 0.000145 Epoch: 26 Global Step: 545410 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:18:37,699-Speed 2495.48 samples/sec Loss 1.5837 LearningRate 0.000145 Epoch: 26 Global Step: 545420 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:18:45,904-Speed 2496.38 samples/sec Loss 1.5755 LearningRate 0.000145 Epoch: 26 Global Step: 545430 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:18:54,107-Speed 2497.36 samples/sec Loss 1.6189 LearningRate 0.000145 Epoch: 26 Global Step: 545440 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:19:02,307-Speed 2497.84 samples/sec Loss 1.5984 LearningRate 0.000145 Epoch: 26 Global Step: 545450 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:19:10,513-Speed 2496.22 samples/sec Loss 1.6078 LearningRate 0.000145 Epoch: 26 Global Step: 545460 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:19:18,658-Speed 2514.87 samples/sec Loss 1.6341 LearningRate 0.000145 Epoch: 26 Global Step: 545470 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:19:26,861-Speed 2497.00 samples/sec Loss 1.6051 LearningRate 0.000145 Epoch: 26 Global Step: 545480 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:19:35,068-Speed 2496.08 samples/sec Loss 1.5921 LearningRate 0.000145 Epoch: 26 Global Step: 545490 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:19:43,269-Speed 2497.39 samples/sec Loss 1.5693 LearningRate 0.000145 Epoch: 26 Global Step: 545500 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:19:51,477-Speed 2495.44 samples/sec Loss 1.5587 LearningRate 0.000145 Epoch: 26 Global Step: 545510 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:19:59,689-Speed 2494.81 samples/sec Loss 1.5884 LearningRate 0.000145 Epoch: 26 Global Step: 545520 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:20:07,839-Speed 2513.33 samples/sec Loss 1.5443 LearningRate 0.000145 Epoch: 26 Global Step: 545530 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:20:16,039-Speed 2497.77 samples/sec Loss 1.5937 LearningRate 0.000145 Epoch: 26 Global Step: 545540 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:20:24,244-Speed 2496.78 samples/sec Loss 1.5775 LearningRate 0.000145 Epoch: 26 Global Step: 545550 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:20:32,450-Speed 2496.30 samples/sec Loss 1.5679 LearningRate 0.000145 Epoch: 26 Global Step: 545560 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:20:40,652-Speed 2497.29 samples/sec Loss 1.5927 LearningRate 0.000145 Epoch: 26 Global Step: 545570 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:20:48,854-Speed 2497.16 samples/sec Loss 1.5876 LearningRate 0.000145 Epoch: 26 Global Step: 545580 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:20:57,016-Speed 2509.81 samples/sec Loss 1.5806 LearningRate 0.000145 Epoch: 26 Global Step: 545590 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:21:05,216-Speed 2497.74 samples/sec Loss 1.5967 LearningRate 0.000145 Epoch: 26 Global Step: 545600 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:21:13,423-Speed 2495.99 samples/sec Loss 1.5829 LearningRate 0.000145 Epoch: 26 Global Step: 545610 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:21:21,627-Speed 2496.83 samples/sec Loss 1.5981 LearningRate 0.000145 Epoch: 26 Global Step: 545620 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:21:29,828-Speed 2497.55 samples/sec Loss 1.5868 LearningRate 0.000145 Epoch: 26 Global Step: 545630 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:21:38,031-Speed 2497.92 samples/sec Loss 1.5573 LearningRate 0.000145 Epoch: 26 Global Step: 545640 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:21:46,181-Speed 2513.21 samples/sec Loss 1.5665 LearningRate 0.000145 Epoch: 26 Global Step: 545650 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:21:54,384-Speed 2497.15 samples/sec Loss 1.5871 LearningRate 0.000145 Epoch: 26 Global Step: 545660 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:22:02,587-Speed 2496.83 samples/sec Loss 1.5889 LearningRate 0.000145 Epoch: 26 Global Step: 545670 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:22:10,802-Speed 2493.55 samples/sec Loss 1.5965 LearningRate 0.000145 Epoch: 26 Global Step: 545680 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:22:19,009-Speed 2495.79 samples/sec Loss 1.6028 LearningRate 0.000145 Epoch: 26 Global Step: 545690 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:22:27,211-Speed 2497.10 samples/sec Loss 1.6183 LearningRate 0.000145 Epoch: 26 Global Step: 545700 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:22:35,372-Speed 2510.00 samples/sec Loss 1.6046 LearningRate 0.000145 Epoch: 26 Global Step: 545710 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:22:43,584-Speed 2494.20 samples/sec Loss 1.5962 LearningRate 0.000145 Epoch: 26 Global Step: 545720 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:22:51,787-Speed 2497.19 samples/sec Loss 1.5643 LearningRate 0.000145 Epoch: 26 Global Step: 545730 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:23:00,005-Speed 2492.31 samples/sec Loss 1.6054 LearningRate 0.000145 Epoch: 26 Global Step: 545740 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:23:08,209-Speed 2496.66 samples/sec Loss 1.5517 LearningRate 0.000145 Epoch: 26 Global Step: 545750 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:23:16,422-Speed 2494.11 samples/sec Loss 1.5788 LearningRate 0.000144 Epoch: 26 Global Step: 545760 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:23:24,575-Speed 2512.09 samples/sec Loss 1.5804 LearningRate 0.000144 Epoch: 26 Global Step: 545770 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:23:32,776-Speed 2497.71 samples/sec Loss 1.5586 LearningRate 0.000144 Epoch: 26 Global Step: 545780 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:23:40,984-Speed 2495.86 samples/sec Loss 1.5521 LearningRate 0.000144 Epoch: 26 Global Step: 545790 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:23:49,189-Speed 2496.41 samples/sec Loss 1.5517 LearningRate 0.000144 Epoch: 26 Global Step: 545800 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:23:57,390-Speed 2497.64 samples/sec Loss 1.5503 LearningRate 0.000144 Epoch: 26 Global Step: 545810 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:24:05,595-Speed 2496.73 samples/sec Loss 1.6004 LearningRate 0.000144 Epoch: 26 Global Step: 545820 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:24:13,749-Speed 2512.27 samples/sec Loss 1.5696 LearningRate 0.000144 Epoch: 26 Global Step: 545830 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:24:21,950-Speed 2497.36 samples/sec Loss 1.5611 LearningRate 0.000144 Epoch: 26 Global Step: 545840 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:24:30,157-Speed 2496.04 samples/sec Loss 1.5589 LearningRate 0.000144 Epoch: 26 Global Step: 545850 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:24:38,359-Speed 2497.48 samples/sec Loss 1.5739 LearningRate 0.000144 Epoch: 26 Global Step: 545860 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:24:46,562-Speed 2497.06 samples/sec Loss 1.5673 LearningRate 0.000144 Epoch: 26 Global Step: 545870 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:24:54,775-Speed 2493.86 samples/sec Loss 1.6082 LearningRate 0.000144 Epoch: 26 Global Step: 545880 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:25:02,925-Speed 2513.51 samples/sec Loss 1.5953 LearningRate 0.000144 Epoch: 26 Global Step: 545890 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:25:11,136-Speed 2494.79 samples/sec Loss 1.6313 LearningRate 0.000144 Epoch: 26 Global Step: 545900 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:25:19,343-Speed 2495.93 samples/sec Loss 1.5956 LearningRate 0.000144 Epoch: 26 Global Step: 545910 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:25:27,543-Speed 2497.92 samples/sec Loss 1.5888 LearningRate 0.000144 Epoch: 26 Global Step: 545920 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:25:35,743-Speed 2498.20 samples/sec Loss 1.5619 LearningRate 0.000144 Epoch: 26 Global Step: 545930 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:25:43,945-Speed 2497.45 samples/sec Loss 1.5775 LearningRate 0.000144 Epoch: 26 Global Step: 545940 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:25:52,096-Speed 2512.95 samples/sec Loss 1.6110 LearningRate 0.000144 Epoch: 26 Global Step: 545950 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:26:00,305-Speed 2495.17 samples/sec Loss 1.6120 LearningRate 0.000144 Epoch: 26 Global Step: 545960 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:26:08,505-Speed 2497.80 samples/sec Loss 1.5679 LearningRate 0.000144 Epoch: 26 Global Step: 545970 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:26:16,714-Speed 2495.29 samples/sec Loss 1.6302 LearningRate 0.000144 Epoch: 26 Global Step: 545980 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:26:24,920-Speed 2496.15 samples/sec Loss 1.6137 LearningRate 0.000144 Epoch: 26 Global Step: 545990 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:26:33,124-Speed 2496.95 samples/sec Loss 1.5715 LearningRate 0.000144 Epoch: 26 Global Step: 546000 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:26:41,278-Speed 2511.87 samples/sec Loss 1.5849 LearningRate 0.000144 Epoch: 26 Global Step: 546010 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:26:49,481-Speed 2497.30 samples/sec Loss 1.6034 LearningRate 0.000144 Epoch: 26 Global Step: 546020 Fp16 Grad Scale: 32768 Required: 65 hours 
Training: 2022-07-10 19:26:57,684-Speed 2497.17 samples/sec Loss 1.6271 LearningRate 0.000144 Epoch: 26 Global Step: 546030 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:27:05,889-Speed 2496.31 samples/sec Loss 1.5865 LearningRate 0.000144 Epoch: 26 Global Step: 546040 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:27:14,089-Speed 2497.76 samples/sec Loss 1.6029 LearningRate 0.000144 Epoch: 26 Global Step: 546050 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:27:22,299-Speed 2495.06 samples/sec Loss 1.6387 LearningRate 0.000144 Epoch: 26 Global Step: 546060 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:27:30,449-Speed 2514.78 samples/sec Loss 1.6163 LearningRate 0.000144 Epoch: 26 Global Step: 546070 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:27:38,675-Speed 2490.07 samples/sec Loss 1.6056 LearningRate 0.000144 Epoch: 26 Global Step: 546080 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:27:46,878-Speed 2496.99 samples/sec Loss 1.5967 LearningRate 0.000144 Epoch: 26 Global Step: 546090 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:28:00,757-Speed 2497.37 samples/sec Loss 1.5773 LearningRate 0.000144 Epoch: 26 Global Step: 546100 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:28:08,994-Speed 2500.44 samples/sec Loss 1.5838 LearningRate 0.000144 Epoch: 26 Global Step: 546110 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:28:21,010-Speed 1704.46 samples/sec Loss 1.5850 LearningRate 0.000144 Epoch: 26 Global Step: 546120 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:28:29,177-Speed 2515.38 samples/sec Loss 1.5915 LearningRate 0.000144 Epoch: 26 Global Step: 546130 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:28:37,403-Speed 2499.27 samples/sec Loss 1.5667 LearningRate 0.000144 Epoch: 26 Global Step: 546140 Fp16 Grad Scale: 32768 Required: 65 hours 
Training: 2022-07-10 19:28:45,611-Speed 2495.46 samples/sec Loss 1.5880 LearningRate 0.000144 Epoch: 26 Global Step: 546150 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:28:56,156-Speed 2495.65 samples/sec Loss 1.6257 LearningRate 0.000144 Epoch: 26 Global Step: 546160 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:29:04,444-Speed 2493.62 samples/sec Loss 1.5786 LearningRate 0.000144 Epoch: 26 Global Step: 546170 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:29:12,689-Speed 2492.54 samples/sec Loss 1.6064 LearningRate 0.000144 Epoch: 26 Global Step: 546180 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:29:20,869-Speed 2504.00 samples/sec Loss 1.5961 LearningRate 0.000144 Epoch: 26 Global Step: 546190 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:29:29,639-Speed 2489.35 samples/sec Loss 1.6144 LearningRate 0.000144 Epoch: 26 Global Step: 546200 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:29:37,921-Speed 2489.62 samples/sec Loss 1.5942 LearningRate 0.000144 Epoch: 26 Global Step: 546210 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:29:49,647-Speed 1753.35 samples/sec Loss 1.6348 LearningRate 0.000144 Epoch: 26 Global Step: 546220 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:29:57,868-Speed 2491.40 samples/sec Loss 1.5739 LearningRate 0.000144 Epoch: 26 Global Step: 546230 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:30:07,120-Speed 2496.44 samples/sec Loss 1.5832 LearningRate 0.000144 Epoch: 26 Global Step: 546240 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:30:15,324-Speed 2513.16 samples/sec Loss 1.5785 LearningRate 0.000144 Epoch: 26 Global Step: 546250 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:30:23,528-Speed 2496.53 samples/sec Loss 1.6216 LearningRate 0.000144 Epoch: 26 Global Step: 546260 Fp16 Grad Scale: 32768 Required: 65 hours 
Training: 2022-07-10 19:30:31,762-Speed 2497.09 samples/sec Loss 1.5620 LearningRate 0.000144 Epoch: 26 Global Step: 546270 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:30:40,004-Speed 2499.16 samples/sec Loss 1.5511 LearningRate 0.000144 Epoch: 26 Global Step: 546280 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:30:48,989-Speed 2501.67 samples/sec Loss 1.5671 LearningRate 0.000144 Epoch: 26 Global Step: 546290 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:30:57,189-Speed 2497.62 samples/sec Loss 1.5611 LearningRate 0.000144 Epoch: 26 Global Step: 546300 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:31:05,729-Speed 2398.65 samples/sec Loss 1.5746 LearningRate 0.000144 Epoch: 26 Global Step: 546310 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:31:13,928-Speed 2498.21 samples/sec Loss 1.5967 LearningRate 0.000144 Epoch: 26 Global Step: 546320 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:31:22,130-Speed 2497.47 samples/sec Loss 1.5956 LearningRate 0.000144 Epoch: 26 Global Step: 546330 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:31:30,333-Speed 2496.83 samples/sec Loss 1.5678 LearningRate 0.000144 Epoch: 26 Global Step: 546340 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:31:38,537-Speed 2496.61 samples/sec Loss 1.6315 LearningRate 0.000144 Epoch: 26 Global Step: 546350 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:31:46,696-Speed 2510.54 samples/sec Loss 1.5467 LearningRate 0.000144 Epoch: 26 Global Step: 546360 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:31:54,846-Speed 2513.14 samples/sec Loss 1.6004 LearningRate 0.000144 Epoch: 26 Global Step: 546370 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:32:03,060-Speed 2493.75 samples/sec Loss 1.5858 LearningRate 0.000144 Epoch: 26 Global Step: 546380 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:32:11,262-Speed 2497.50 samples/sec Loss 1.5914 LearningRate 0.000144 Epoch: 26 Global Step: 546390 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:32:19,466-Speed 2496.83 samples/sec Loss 1.5632 LearningRate 0.000144 Epoch: 26 Global Step: 546400 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:32:27,672-Speed 2495.82 samples/sec Loss 1.5675 LearningRate 0.000144 Epoch: 26 Global Step: 546410 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:32:35,874-Speed 2497.27 samples/sec Loss 1.5922 LearningRate 0.000144 Epoch: 26 Global Step: 546420 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:32:44,023-Speed 2513.73 samples/sec Loss 1.6170 LearningRate 0.000144 Epoch: 26 Global Step: 546430 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:32:52,224-Speed 2497.63 samples/sec Loss 1.6115 LearningRate 0.000144 Epoch: 26 Global Step: 546440 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:33:00,425-Speed 2497.67 samples/sec Loss 1.5683 LearningRate 0.000144 Epoch: 26 Global Step: 546450 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:33:08,638-Speed 2494.25 samples/sec Loss 1.5477 LearningRate 0.000144 Epoch: 26 Global Step: 546460 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:33:16,840-Speed 2497.31 samples/sec Loss 1.6086 LearningRate 0.000144 Epoch: 26 Global Step: 546470 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:33:25,047-Speed 2495.85 samples/sec Loss 1.5937 LearningRate 0.000144 Epoch: 26 Global Step: 546480 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:33:33,191-Speed 2514.89 samples/sec Loss 1.5623 LearningRate 0.000144 Epoch: 26 Global Step: 546490 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:33:41,395-Speed 2496.94 samples/sec Loss 1.5859 LearningRate 0.000144 Epoch: 26 Global Step: 546500 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:33:49,596-Speed 2497.55 samples/sec Loss 1.6076 LearningRate 0.000144 Epoch: 26 Global Step: 546510 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:33:57,811-Speed 2493.37 samples/sec Loss 1.5566 LearningRate 0.000144 Epoch: 26 Global Step: 546520 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:34:06,025-Speed 2493.68 samples/sec Loss 1.5894 LearningRate 0.000144 Epoch: 26 Global Step: 546530 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:34:14,230-Speed 2496.38 samples/sec Loss 1.6138 LearningRate 0.000144 Epoch: 26 Global Step: 546540 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:34:22,391-Speed 2509.88 samples/sec Loss 1.5716 LearningRate 0.000144 Epoch: 26 Global Step: 546550 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:34:30,598-Speed 2495.99 samples/sec Loss 1.5727 LearningRate 0.000144 Epoch: 26 Global Step: 546560 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:34:38,805-Speed 2496.00 samples/sec Loss 1.5981 LearningRate 0.000144 Epoch: 26 Global Step: 546570 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:34:47,009-Speed 2496.86 samples/sec Loss 1.5779 LearningRate 0.000144 Epoch: 26 Global Step: 546580 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:34:55,211-Speed 2497.47 samples/sec Loss 1.5694 LearningRate 0.000144 Epoch: 26 Global Step: 546590 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:35:03,412-Speed 2497.35 samples/sec Loss 1.6005 LearningRate 0.000144 Epoch: 26 Global Step: 546600 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:35:11,562-Speed 2513.34 samples/sec Loss 1.5900 LearningRate 0.000144 Epoch: 26 Global Step: 546610 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:35:19,767-Speed 2496.57 samples/sec Loss 1.5922 LearningRate 0.000144 Epoch: 26 Global Step: 546620 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:35:27,986-Speed 2492.25 samples/sec Loss 1.6077 LearningRate 0.000144 Epoch: 26 Global Step: 546630 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:35:36,190-Speed 2496.75 samples/sec Loss 1.5778 LearningRate 0.000144 Epoch: 26 Global Step: 546640 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:35:44,390-Speed 2497.78 samples/sec Loss 1.6023 LearningRate 0.000144 Epoch: 26 Global Step: 546650 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:35:52,592-Speed 2497.69 samples/sec Loss 1.5729 LearningRate 0.000144 Epoch: 26 Global Step: 546660 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:36:00,741-Speed 2513.57 samples/sec Loss 1.6082 LearningRate 0.000144 Epoch: 26 Global Step: 546670 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:36:08,943-Speed 2497.35 samples/sec Loss 1.5680 LearningRate 0.000144 Epoch: 26 Global Step: 546680 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:36:17,144-Speed 2497.80 samples/sec Loss 1.5951 LearningRate 0.000144 Epoch: 26 Global Step: 546690 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:36:25,346-Speed 2497.60 samples/sec Loss 1.5934 LearningRate 0.000144 Epoch: 26 Global Step: 546700 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:36:33,561-Speed 2493.26 samples/sec Loss 1.5806 LearningRate 0.000144 Epoch: 26 Global Step: 546710 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:36:41,763-Speed 2497.06 samples/sec Loss 1.5450 LearningRate 0.000144 Epoch: 26 Global Step: 546720 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:36:49,914-Speed 2513.32 samples/sec Loss 1.5714 LearningRate 0.000144 Epoch: 26 Global Step: 546730 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:36:58,123-Speed 2495.37 samples/sec Loss 1.6048 LearningRate 0.000143 Epoch: 26 Global Step: 546740 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:37:06,321-Speed 2498.45 samples/sec Loss 1.6245 LearningRate 0.000143 Epoch: 26 Global Step: 546750 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:37:14,522-Speed 2497.79 samples/sec Loss 1.6198 LearningRate 0.000143 Epoch: 26 Global Step: 546760 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:37:22,728-Speed 2496.61 samples/sec Loss 1.5651 LearningRate 0.000143 Epoch: 26 Global Step: 546770 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:37:30,931-Speed 2498.34 samples/sec Loss 1.6095 LearningRate 0.000143 Epoch: 26 Global Step: 546780 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:37:39,085-Speed 2511.99 samples/sec Loss 1.5874 LearningRate 0.000143 Epoch: 26 Global Step: 546790 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:37:47,287-Speed 2497.20 samples/sec Loss 1.5732 LearningRate 0.000143 Epoch: 26 Global Step: 546800 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:37:55,489-Speed 2497.48 samples/sec Loss 1.5882 LearningRate 0.000143 Epoch: 26 Global Step: 546810 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:38:03,689-Speed 2497.83 samples/sec Loss 1.6449 LearningRate 0.000143 Epoch: 26 Global Step: 546820 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:38:11,892-Speed 2497.09 samples/sec Loss 1.6292 LearningRate 0.000143 Epoch: 26 Global Step: 546830 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:38:20,101-Speed 2495.19 samples/sec Loss 1.5546 LearningRate 0.000143 Epoch: 26 Global Step: 546840 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:38:28,254-Speed 2512.46 samples/sec Loss 1.5534 LearningRate 0.000143 Epoch: 26 Global Step: 546850 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:38:36,474-Speed 2491.96 samples/sec Loss 1.5860 LearningRate 0.000143 Epoch: 26 Global Step: 546860 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:38:44,684-Speed 2494.84 samples/sec Loss 1.5817 LearningRate 0.000143 Epoch: 26 Global Step: 546870 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:38:52,891-Speed 2495.70 samples/sec Loss 1.5748 LearningRate 0.000143 Epoch: 26 Global Step: 546880 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:39:01,096-Speed 2496.30 samples/sec Loss 1.6167 LearningRate 0.000143 Epoch: 26 Global Step: 546890 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:39:09,302-Speed 2496.40 samples/sec Loss 1.5715 LearningRate 0.000143 Epoch: 26 Global Step: 546900 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:39:17,452-Speed 2513.28 samples/sec Loss 1.5891 LearningRate 0.000143 Epoch: 26 Global Step: 546910 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:39:25,655-Speed 2497.00 samples/sec Loss 1.6285 LearningRate 0.000143 Epoch: 26 Global Step: 546920 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:39:33,856-Speed 2497.37 samples/sec Loss 1.6113 LearningRate 0.000143 Epoch: 26 Global Step: 546930 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:39:42,059-Speed 2497.13 samples/sec Loss 1.5840 LearningRate 0.000143 Epoch: 26 Global Step: 546940 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:39:50,260-Speed 2497.57 samples/sec Loss 1.5492 LearningRate 0.000143 Epoch: 26 Global Step: 546950 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:39:58,471-Speed 2494.60 samples/sec Loss 1.5844 LearningRate 0.000143 Epoch: 26 Global Step: 546960 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:40:06,639-Speed 2508.14 samples/sec Loss 1.5882 LearningRate 0.000143 Epoch: 26 Global Step: 546970 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:40:14,857-Speed 2492.54 samples/sec Loss 1.6075 LearningRate 0.000143 Epoch: 26 Global Step: 546980 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:40:23,061-Speed 2496.40 samples/sec Loss 1.6083 LearningRate 0.000143 Epoch: 26 Global Step: 546990 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:40:31,277-Speed 2493.19 samples/sec Loss 1.6240 LearningRate 0.000143 Epoch: 26 Global Step: 547000 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:40:39,485-Speed 2496.05 samples/sec Loss 1.6099 LearningRate 0.000143 Epoch: 26 Global Step: 547010 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:40:47,688-Speed 2496.94 samples/sec Loss 1.5375 LearningRate 0.000143 Epoch: 26 Global Step: 547020 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:40:55,841-Speed 2512.45 samples/sec Loss 1.5955 LearningRate 0.000143 Epoch: 26 Global Step: 547030 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:41:04,045-Speed 2496.79 samples/sec Loss 1.5895 LearningRate 0.000143 Epoch: 26 Global Step: 547040 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:41:12,250-Speed 2496.78 samples/sec Loss 1.6272 LearningRate 0.000143 Epoch: 26 Global Step: 547050 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:41:20,455-Speed 2496.36 samples/sec Loss 1.5579 LearningRate 0.000143 Epoch: 26 Global Step: 547060 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:41:28,658-Speed 2496.74 samples/sec Loss 1.5837 LearningRate 0.000143 Epoch: 26 Global Step: 547070 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:41:36,862-Speed 2497.03 samples/sec Loss 1.5748 LearningRate 0.000143 Epoch: 26 Global Step: 547080 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:41:45,017-Speed 2511.59 samples/sec Loss 1.5883 LearningRate 0.000143 Epoch: 26 Global Step: 547090 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:41:53,224-Speed 2495.89 samples/sec Loss 1.5997 LearningRate 0.000143 Epoch: 26 Global Step: 547100 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:42:01,427-Speed 2497.15 samples/sec Loss 1.6388 LearningRate 0.000143 Epoch: 26 Global Step: 547110 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:42:09,634-Speed 2495.98 samples/sec Loss 1.5695 LearningRate 0.000143 Epoch: 26 Global Step: 547120 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:42:17,840-Speed 2496.05 samples/sec Loss 1.6006 LearningRate 0.000143 Epoch: 26 Global Step: 547130 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:42:26,040-Speed 2497.80 samples/sec Loss 1.5753 LearningRate 0.000143 Epoch: 26 Global Step: 547140 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:42:34,194-Speed 2512.22 samples/sec Loss 1.6132 LearningRate 0.000143 Epoch: 26 Global Step: 547150 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:42:42,395-Speed 2497.70 samples/sec Loss 1.6069 LearningRate 0.000143 Epoch: 26 Global Step: 547160 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:42:50,597-Speed 2497.35 samples/sec Loss 1.5961 LearningRate 0.000143 Epoch: 26 Global Step: 547170 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:42:58,797-Speed 2497.95 samples/sec Loss 1.5886 LearningRate 0.000143 Epoch: 26 Global Step: 547180 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:43:07,003-Speed 2496.17 samples/sec Loss 1.5888 LearningRate 0.000143 Epoch: 26 Global Step: 547190 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:43:15,210-Speed 2495.99 samples/sec Loss 1.5848 LearningRate 0.000143 Epoch: 26 Global Step: 547200 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:43:23,368-Speed 2510.88 samples/sec Loss 1.5828 LearningRate 0.000143 Epoch: 26 Global Step: 547210 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:43:31,574-Speed 2495.92 samples/sec Loss 1.5730 LearningRate 0.000143 Epoch: 26 Global Step: 547220 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:43:39,778-Speed 2496.68 samples/sec Loss 1.5861 LearningRate 0.000143 Epoch: 26 Global Step: 547230 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:43:47,983-Speed 2496.49 samples/sec Loss 1.5986 LearningRate 0.000143 Epoch: 26 Global Step: 547240 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:43:56,186-Speed 2496.90 samples/sec Loss 1.5849 LearningRate 0.000143 Epoch: 26 Global Step: 547250 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:44:04,398-Speed 2494.17 samples/sec Loss 1.5616 LearningRate 0.000143 Epoch: 26 Global Step: 547260 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:44:12,554-Speed 2511.64 samples/sec Loss 1.6190 LearningRate 0.000143 Epoch: 26 Global Step: 547270 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:44:20,760-Speed 2496.29 samples/sec Loss 1.6179 LearningRate 0.000143 Epoch: 26 Global Step: 547280 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:44:28,964-Speed 2496.75 samples/sec Loss 1.6386 LearningRate 0.000143 Epoch: 26 Global Step: 547290 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:44:37,168-Speed 2497.52 samples/sec Loss 1.5995 LearningRate 0.000143 Epoch: 26 Global Step: 547300 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:44:45,371-Speed 2497.06 samples/sec Loss 1.5870 LearningRate 0.000143 Epoch: 26 Global Step: 547310 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:44:53,582-Speed 2494.80 samples/sec Loss 1.6413 LearningRate 0.000143 Epoch: 26 Global Step: 547320 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:45:01,741-Speed 2510.42 samples/sec Loss 1.5947 LearningRate 0.000143 Epoch: 26 Global Step: 547330 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:45:09,945-Speed 2496.78 samples/sec Loss 1.5984 LearningRate 0.000143 Epoch: 26 Global Step: 547340 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:45:18,153-Speed 2495.57 samples/sec Loss 1.5943 LearningRate 0.000143 Epoch: 26 Global Step: 547350 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:45:26,356-Speed 2496.80 samples/sec Loss 1.5996 LearningRate 0.000143 Epoch: 26 Global Step: 547360 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:45:34,563-Speed 2495.91 samples/sec Loss 1.6088 LearningRate 0.000143 Epoch: 26 Global Step: 547370 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:45:42,763-Speed 2497.95 samples/sec Loss 1.5713 LearningRate 0.000143 Epoch: 26 Global Step: 547380 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:45:50,913-Speed 2513.26 samples/sec Loss 1.5758 LearningRate 0.000143 Epoch: 26 Global Step: 547390 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:45:59,120-Speed 2495.86 samples/sec Loss 1.6014 LearningRate 0.000143 Epoch: 26 Global Step: 547400 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:46:07,325-Speed 2496.49 samples/sec Loss 1.5564 LearningRate 0.000143 Epoch: 26 Global Step: 547410 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:46:15,533-Speed 2495.58 samples/sec Loss 1.5796 LearningRate 0.000143 Epoch: 26 Global Step: 547420 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:46:23,738-Speed 2496.35 samples/sec Loss 1.5992 LearningRate 0.000143 Epoch: 26 Global Step: 547430 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:46:31,945-Speed 2495.88 samples/sec Loss 1.5880 LearningRate 0.000143 Epoch: 26 Global Step: 547440 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:46:40,099-Speed 2512.23 samples/sec Loss 1.6081 LearningRate 0.000143 Epoch: 26 Global Step: 547450 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:46:48,304-Speed 2496.50 samples/sec Loss 1.5637 LearningRate 0.000143 Epoch: 26 Global Step: 547460 Fp16 Grad Scale: 16384 Required: 65 hours 
Training: 2022-07-10 19:46:56,526-Speed 2491.24 samples/sec Loss 1.5932 LearningRate 0.000143 Epoch: 26 Global Step: 547470 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:47:04,739-Speed 2493.64 samples/sec Loss 1.5964 LearningRate 0.000143 Epoch: 26 Global Step: 547480 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:47:12,946-Speed 2495.93 samples/sec Loss 1.5972 LearningRate 0.000143 Epoch: 26 Global Step: 547490 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:47:21,149-Speed 2497.18 samples/sec Loss 1.5867 LearningRate 0.000143 Epoch: 26 Global Step: 547500 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:47:29,301-Speed 2512.74 samples/sec Loss 1.6350 LearningRate 0.000143 Epoch: 26 Global Step: 547510 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:47:37,500-Speed 2498.16 samples/sec Loss 1.5857 LearningRate 0.000143 Epoch: 26 Global Step: 547520 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:47:45,732-Speed 2488.25 samples/sec Loss 1.6010 LearningRate 0.000143 Epoch: 26 Global Step: 547530 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:47:53,934-Speed 2497.25 samples/sec Loss 1.6080 LearningRate 0.000143 Epoch: 26 Global Step: 547540 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:48:02,138-Speed 2496.91 samples/sec Loss 1.5787 LearningRate 0.000143 Epoch: 26 Global Step: 547550 Fp16 Grad Scale: 16384 Required: 65 hours Training: 2022-07-10 19:48:10,338-Speed 2497.59 samples/sec Loss 1.5657 LearningRate 0.000143 Epoch: 26 Global Step: 547560 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:48:18,490-Speed 2512.92 samples/sec Loss 1.5727 LearningRate 0.000143 Epoch: 26 Global Step: 547570 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:48:26,690-Speed 2497.85 samples/sec Loss 1.6009 LearningRate 0.000143 Epoch: 26 Global Step: 547580 Fp16 Grad Scale: 32768 Required: 65 hours 
Training: 2022-07-10 19:48:34,899-Speed 2495.28 samples/sec Loss 1.5723 LearningRate 0.000143 Epoch: 26 Global Step: 547590 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:48:43,106-Speed 2495.70 samples/sec Loss 1.5956 LearningRate 0.000143 Epoch: 26 Global Step: 547600 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:48:51,308-Speed 2497.26 samples/sec Loss 1.5951 LearningRate 0.000143 Epoch: 26 Global Step: 547610 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:48:59,515-Speed 2495.84 samples/sec Loss 1.6048 LearningRate 0.000143 Epoch: 26 Global Step: 547620 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:49:07,679-Speed 2508.87 samples/sec Loss 1.5872 LearningRate 0.000143 Epoch: 26 Global Step: 547630 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:49:15,880-Speed 2497.82 samples/sec Loss 1.5906 LearningRate 0.000143 Epoch: 26 Global Step: 547640 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:49:24,083-Speed 2497.47 samples/sec Loss 1.5441 LearningRate 0.000143 Epoch: 26 Global Step: 547650 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:49:32,285-Speed 2497.19 samples/sec Loss 1.5822 LearningRate 0.000143 Epoch: 26 Global Step: 547660 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:49:40,576-Speed 2470.50 samples/sec Loss 1.5698 LearningRate 0.000143 Epoch: 26 Global Step: 547670 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:49:48,782-Speed 2496.05 samples/sec Loss 1.6148 LearningRate 0.000143 Epoch: 26 Global Step: 547680 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:49:56,931-Speed 2513.49 samples/sec Loss 1.6133 LearningRate 0.000143 Epoch: 26 Global Step: 547690 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:50:05,135-Speed 2496.70 samples/sec Loss 1.5752 LearningRate 0.000143 Epoch: 26 Global Step: 547700 Fp16 Grad Scale: 32768 Required: 65 hours 
Training: 2022-07-10 19:50:13,350-Speed 2493.55 samples/sec Loss 1.6157 LearningRate 0.000143 Epoch: 26 Global Step: 547710 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:50:21,568-Speed 2492.49 samples/sec Loss 1.5721 LearningRate 0.000143 Epoch: 26 Global Step: 547720 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:50:29,772-Speed 2496.77 samples/sec Loss 1.6072 LearningRate 0.000142 Epoch: 26 Global Step: 547730 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:50:37,979-Speed 2495.96 samples/sec Loss 1.5752 LearningRate 0.000142 Epoch: 26 Global Step: 547740 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:50:46,131-Speed 2512.38 samples/sec Loss 1.5861 LearningRate 0.000142 Epoch: 26 Global Step: 547750 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:50:54,334-Speed 2497.18 samples/sec Loss 1.5567 LearningRate 0.000142 Epoch: 26 Global Step: 547760 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:51:02,543-Speed 2495.23 samples/sec Loss 1.6261 LearningRate 0.000142 Epoch: 26 Global Step: 547770 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:51:10,761-Speed 2492.72 samples/sec Loss 1.6076 LearningRate 0.000142 Epoch: 26 Global Step: 547780 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:51:18,961-Speed 2497.87 samples/sec Loss 1.5928 LearningRate 0.000142 Epoch: 26 Global Step: 547790 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:51:27,165-Speed 2496.74 samples/sec Loss 1.6202 LearningRate 0.000142 Epoch: 26 Global Step: 547800 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:51:35,320-Speed 2512.43 samples/sec Loss 1.6040 LearningRate 0.000142 Epoch: 26 Global Step: 547810 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:51:43,526-Speed 2496.11 samples/sec Loss 1.5960 LearningRate 0.000142 Epoch: 26 Global Step: 547820 Fp16 Grad Scale: 32768 Required: 65 hours 
Training: 2022-07-10 19:51:51,729-Speed 2496.91 samples/sec Loss 1.6042 LearningRate 0.000142 Epoch: 26 Global Step: 547830 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-07-10 19:51:59,934-Speed 2496.29 samples/sec Loss 1.5549 LearningRate 0.000142 Epoch: 26 Global Step: 547840 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:52:08,153-Speed 2492.16 samples/sec Loss 1.5731 LearningRate 0.000142 Epoch: 26 Global Step: 547850 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:52:16,357-Speed 2496.83 samples/sec Loss 1.5630 LearningRate 0.000142 Epoch: 26 Global Step: 547860 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:52:24,509-Speed 2512.59 samples/sec Loss 1.5618 LearningRate 0.000142 Epoch: 26 Global Step: 547870 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:52:32,718-Speed 2495.71 samples/sec Loss 1.5747 LearningRate 0.000142 Epoch: 26 Global Step: 547880 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:52:40,921-Speed 2496.66 samples/sec Loss 1.5967 LearningRate 0.000142 Epoch: 26 Global Step: 547890 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:52:49,122-Speed 2498.19 samples/sec Loss 1.5624 LearningRate 0.000142 Epoch: 26 Global Step: 547900 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:52:57,328-Speed 2496.07 samples/sec Loss 1.5638 LearningRate 0.000142 Epoch: 26 Global Step: 547910 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:53:05,533-Speed 2496.50 samples/sec Loss 1.6202 LearningRate 0.000142 Epoch: 26 Global Step: 547920 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:53:13,682-Speed 2513.59 samples/sec Loss 1.5931 LearningRate 0.000142 Epoch: 26 Global Step: 547930 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:53:21,897-Speed 2493.35 samples/sec Loss 1.6051 LearningRate 0.000142 Epoch: 26 Global Step: 547940 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 19:53:30,101-Speed 2496.82 samples/sec Loss 1.6150 LearningRate 0.000142 Epoch: 26 Global Step: 547950 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:53:38,304-Speed 2496.88 samples/sec Loss 1.6188 LearningRate 0.000142 Epoch: 26 Global Step: 547960 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:53:46,508-Speed 2496.71 samples/sec Loss 1.5891 LearningRate 0.000142 Epoch: 26 Global Step: 547970 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:53:54,710-Speed 2497.96 samples/sec Loss 1.6074 LearningRate 0.000142 Epoch: 26 Global Step: 547980 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:54:02,864-Speed 2511.82 samples/sec Loss 1.5574 LearningRate 0.000142 Epoch: 26 Global Step: 547990 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:54:11,068-Speed 2496.54 samples/sec Loss 1.5777 LearningRate 0.000142 Epoch: 26 Global Step: 548000 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:54:19,278-Speed 2495.02 samples/sec Loss 1.5704 LearningRate 0.000142 Epoch: 26 Global Step: 548010 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:54:27,495-Speed 2492.82 samples/sec Loss 1.6249 LearningRate 0.000142 Epoch: 26 Global Step: 548020 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:54:35,708-Speed 2493.96 samples/sec Loss 1.6199 LearningRate 0.000142 Epoch: 26 Global Step: 548030 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:54:43,909-Speed 2497.75 samples/sec Loss 1.5811 LearningRate 0.000142 Epoch: 26 Global Step: 548040 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:54:52,063-Speed 2512.03 samples/sec Loss 1.6132 LearningRate 0.000142 Epoch: 26 Global Step: 548050 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:55:00,264-Speed 2497.43 samples/sec Loss 1.6004 LearningRate 0.000142 Epoch: 26 Global Step: 548060 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 19:55:08,470-Speed 2496.51 samples/sec Loss 1.6138 LearningRate 0.000142 Epoch: 26 Global Step: 548070 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:55:16,679-Speed 2495.04 samples/sec Loss 1.5675 LearningRate 0.000142 Epoch: 26 Global Step: 548080 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:55:24,884-Speed 2496.82 samples/sec Loss 1.5594 LearningRate 0.000142 Epoch: 26 Global Step: 548090 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:55:33,089-Speed 2496.33 samples/sec Loss 1.6141 LearningRate 0.000142 Epoch: 26 Global Step: 548100 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:55:41,243-Speed 2512.01 samples/sec Loss 1.6008 LearningRate 0.000142 Epoch: 26 Global Step: 548110 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:55:49,449-Speed 2496.12 samples/sec Loss 1.6232 LearningRate 0.000142 Epoch: 26 Global Step: 548120 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:55:57,658-Speed 2495.11 samples/sec Loss 1.5389 LearningRate 0.000142 Epoch: 26 Global Step: 548130 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:56:05,863-Speed 2496.48 samples/sec Loss 1.5841 LearningRate 0.000142 Epoch: 26 Global Step: 548140 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:56:14,069-Speed 2495.99 samples/sec Loss 1.6237 LearningRate 0.000142 Epoch: 26 Global Step: 548150 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:56:22,271-Speed 2497.53 samples/sec Loss 1.5891 LearningRate 0.000142 Epoch: 26 Global Step: 548160 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:56:30,425-Speed 2512.29 samples/sec Loss 1.5941 LearningRate 0.000142 Epoch: 26 Global Step: 548170 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:56:38,631-Speed 2496.53 samples/sec Loss 1.5988 LearningRate 0.000142 Epoch: 26 Global Step: 548180 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 19:56:46,834-Speed 2497.43 samples/sec Loss 1.5933 LearningRate 0.000142 Epoch: 26 Global Step: 548190 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:56:55,037-Speed 2496.94 samples/sec Loss 1.6051 LearningRate 0.000142 Epoch: 26 Global Step: 548200 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:57:03,244-Speed 2495.88 samples/sec Loss 1.5968 LearningRate 0.000142 Epoch: 26 Global Step: 548210 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:57:11,446-Speed 2497.13 samples/sec Loss 1.5529 LearningRate 0.000142 Epoch: 26 Global Step: 548220 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:57:19,595-Speed 2513.67 samples/sec Loss 1.5610 LearningRate 0.000142 Epoch: 26 Global Step: 548230 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:57:27,813-Speed 2492.68 samples/sec Loss 1.5926 LearningRate 0.000142 Epoch: 26 Global Step: 548240 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:57:36,012-Speed 2498.25 samples/sec Loss 1.5862 LearningRate 0.000142 Epoch: 26 Global Step: 548250 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:57:44,215-Speed 2497.24 samples/sec Loss 1.5744 LearningRate 0.000142 Epoch: 26 Global Step: 548260 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:57:52,422-Speed 2495.95 samples/sec Loss 1.5963 LearningRate 0.000142 Epoch: 26 Global Step: 548270 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:58:00,627-Speed 2496.22 samples/sec Loss 1.5954 LearningRate 0.000142 Epoch: 26 Global Step: 548280 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:58:08,795-Speed 2507.57 samples/sec Loss 1.5796 LearningRate 0.000142 Epoch: 26 Global Step: 548290 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:58:16,999-Speed 2496.92 samples/sec Loss 1.5813 LearningRate 0.000142 Epoch: 26 Global Step: 548300 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 19:58:25,208-Speed 2495.24 samples/sec Loss 1.5812 LearningRate 0.000142 Epoch: 26 Global Step: 548310 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:58:33,412-Speed 2496.54 samples/sec Loss 1.5691 LearningRate 0.000142 Epoch: 26 Global Step: 548320 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:58:41,620-Speed 2495.75 samples/sec Loss 1.5783 LearningRate 0.000142 Epoch: 26 Global Step: 548330 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:58:49,821-Speed 2497.48 samples/sec Loss 1.5765 LearningRate 0.000142 Epoch: 26 Global Step: 548340 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:58:57,971-Speed 2513.17 samples/sec Loss 1.6167 LearningRate 0.000142 Epoch: 26 Global Step: 548350 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:59:06,176-Speed 2496.53 samples/sec Loss 1.6065 LearningRate 0.000142 Epoch: 26 Global Step: 548360 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:59:14,385-Speed 2495.31 samples/sec Loss 1.5876 LearningRate 0.000142 Epoch: 26 Global Step: 548370 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:59:22,590-Speed 2496.41 samples/sec Loss 1.6434 LearningRate 0.000142 Epoch: 26 Global Step: 548380 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:59:30,805-Speed 2493.39 samples/sec Loss 1.5847 LearningRate 0.000142 Epoch: 26 Global Step: 548390 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:59:39,009-Speed 2496.79 samples/sec Loss 1.6068 LearningRate 0.000142 Epoch: 26 Global Step: 548400 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:59:47,158-Speed 2513.29 samples/sec Loss 1.5731 LearningRate 0.000142 Epoch: 26 Global Step: 548410 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 19:59:55,378-Speed 2491.99 samples/sec Loss 1.6096 LearningRate 0.000142 Epoch: 26 Global Step: 548420 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:00:03,594-Speed 2493.11 samples/sec Loss 1.5910 LearningRate 0.000142 Epoch: 26 Global Step: 548430 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:00:11,796-Speed 2497.18 samples/sec Loss 1.5809 LearningRate 0.000142 Epoch: 26 Global Step: 548440 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:00:20,004-Speed 2495.59 samples/sec Loss 1.5875 LearningRate 0.000142 Epoch: 26 Global Step: 548450 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:00:28,209-Speed 2496.49 samples/sec Loss 1.5594 LearningRate 0.000142 Epoch: 26 Global Step: 548460 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:00:36,357-Speed 2513.90 samples/sec Loss 1.6276 LearningRate 0.000142 Epoch: 26 Global Step: 548470 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:00:44,561-Speed 2496.57 samples/sec Loss 1.5982 LearningRate 0.000142 Epoch: 26 Global Step: 548480 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:00:52,768-Speed 2495.86 samples/sec Loss 1.5960 LearningRate 0.000142 Epoch: 26 Global Step: 548490 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:01:00,974-Speed 2496.07 samples/sec Loss 1.5522 LearningRate 0.000142 Epoch: 26 Global Step: 548500 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:01:09,192-Speed 2492.68 samples/sec Loss 1.5533 LearningRate 0.000142 Epoch: 26 Global Step: 548510 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:01:17,418-Speed 2490.09 samples/sec Loss 1.6035 LearningRate 0.000142 Epoch: 26 Global Step: 548520 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:01:25,570-Speed 2512.82 samples/sec Loss 1.5937 LearningRate 0.000142 Epoch: 26 Global Step: 548530 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:01:33,776-Speed 2495.75 samples/sec Loss 1.6210 LearningRate 0.000142 Epoch: 26 Global Step: 548540 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:01:41,984-Speed 2495.47 samples/sec Loss 1.6017 LearningRate 0.000142 Epoch: 26 Global Step: 548550 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:01:50,192-Speed 2495.52 samples/sec Loss 1.5804 LearningRate 0.000142 Epoch: 26 Global Step: 548560 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:01:58,404-Speed 2494.53 samples/sec Loss 1.5879 LearningRate 0.000142 Epoch: 26 Global Step: 548570 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:02:06,610-Speed 2496.15 samples/sec Loss 1.6572 LearningRate 0.000142 Epoch: 26 Global Step: 548580 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:02:14,763-Speed 2512.37 samples/sec Loss 1.5926 LearningRate 0.000142 Epoch: 26 Global Step: 548590 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:02:22,963-Speed 2498.27 samples/sec Loss 1.5969 LearningRate 0.000142 Epoch: 26 Global Step: 548600 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:02:31,173-Speed 2495.14 samples/sec Loss 1.6269 LearningRate 0.000142 Epoch: 26 Global Step: 548610 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:02:39,375-Speed 2497.27 samples/sec Loss 1.5715 LearningRate 0.000142 Epoch: 26 Global Step: 548620 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:02:47,577-Speed 2497.59 samples/sec Loss 1.6160 LearningRate 0.000142 Epoch: 26 Global Step: 548630 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:02:55,782-Speed 2496.29 samples/sec Loss 1.5867 LearningRate 0.000142 Epoch: 26 Global Step: 548640 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:03:03,936-Speed 2512.30 samples/sec Loss 1.5625 LearningRate 0.000142 Epoch: 26 Global Step: 548650 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:03:12,141-Speed 2496.61 samples/sec Loss 1.5821 LearningRate 0.000142 Epoch: 26 Global Step: 548660 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:03:20,343-Speed 2497.76 samples/sec Loss 1.6118 LearningRate 0.000142 Epoch: 26 Global Step: 548670 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:03:28,553-Speed 2494.90 samples/sec Loss 1.5422 LearningRate 0.000142 Epoch: 26 Global Step: 548680 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:03:36,757-Speed 2496.85 samples/sec Loss 1.6342 LearningRate 0.000142 Epoch: 26 Global Step: 548690 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:03:44,962-Speed 2496.34 samples/sec Loss 1.5797 LearningRate 0.000142 Epoch: 26 Global Step: 548700 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:03:53,114-Speed 2512.71 samples/sec Loss 1.6025 LearningRate 0.000142 Epoch: 26 Global Step: 548710 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:04:01,321-Speed 2496.00 samples/sec Loss 1.5926 LearningRate 0.000141 Epoch: 26 Global Step: 548720 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:04:09,523-Speed 2497.32 samples/sec Loss 1.6109 LearningRate 0.000141 Epoch: 26 Global Step: 548730 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:04:17,727-Speed 2496.60 samples/sec Loss 1.5734 LearningRate 0.000141 Epoch: 26 Global Step: 548740 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:04:25,942-Speed 2493.43 samples/sec Loss 1.5858 LearningRate 0.000141 Epoch: 26 Global Step: 548750 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:04:34,167-Speed 2490.58 samples/sec Loss 1.6154 LearningRate 0.000141 Epoch: 26 Global Step: 548760 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:04:42,316-Speed 2513.55 samples/sec Loss 1.6031 LearningRate 0.000141 Epoch: 26 Global Step: 548770 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:04:50,523-Speed 2495.77 samples/sec Loss 1.5722 LearningRate 0.000141 Epoch: 26 Global Step: 548780 Fp16 Grad Scale: 65536 Required: 64 hours 
Training: 2022-07-10 20:04:58,728-Speed 2496.47 samples/sec Loss 1.6038 LearningRate 0.000141 Epoch: 26 Global Step: 548790 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:05:06,931-Speed 2496.95 samples/sec Loss 1.5713 LearningRate 0.000141 Epoch: 26 Global Step: 548800 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:05:15,137-Speed 2496.25 samples/sec Loss 1.6229 LearningRate 0.000141 Epoch: 26 Global Step: 548810 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:05:23,347-Speed 2494.84 samples/sec Loss 1.6284 LearningRate 0.000141 Epoch: 26 Global Step: 548820 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:05:31,496-Speed 2513.98 samples/sec Loss 1.6137 LearningRate 0.000141 Epoch: 26 Global Step: 548830 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:05:39,699-Speed 2497.13 samples/sec Loss 1.5980 LearningRate 0.000141 Epoch: 26 Global Step: 548840 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:05:47,905-Speed 2496.06 samples/sec Loss 1.5695 LearningRate 0.000141 Epoch: 26 Global Step: 548850 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:05:56,128-Speed 2491.17 samples/sec Loss 1.5661 LearningRate 0.000141 Epoch: 26 Global Step: 548860 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:06:04,329-Speed 2497.62 samples/sec Loss 1.5880 LearningRate 0.000141 Epoch: 26 Global Step: 548870 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:06:12,534-Speed 2496.13 samples/sec Loss 1.5793 LearningRate 0.000141 Epoch: 26 Global Step: 548880 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:06:20,685-Speed 2513.17 samples/sec Loss 1.5624 LearningRate 0.000141 Epoch: 26 Global Step: 548890 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:06:28,889-Speed 2496.71 samples/sec Loss 1.5593 LearningRate 0.000141 Epoch: 26 Global Step: 548900 Fp16 Grad Scale: 65536 Required: 64 hours 
Training: 2022-07-10 20:06:37,092-Speed 2496.94 samples/sec Loss 1.5646 LearningRate 0.000141 Epoch: 26 Global Step: 548910 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:06:45,303-Speed 2494.68 samples/sec Loss 1.5887 LearningRate 0.000141 Epoch: 26 Global Step: 548920 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:06:53,467-Speed 2508.91 samples/sec Loss 1.6054 LearningRate 0.000141 Epoch: 26 Global Step: 548930 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:07:01,670-Speed 2496.93 samples/sec Loss 1.5834 LearningRate 0.000141 Epoch: 26 Global Step: 548940 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:07:09,818-Speed 2514.07 samples/sec Loss 1.5772 LearningRate 0.000141 Epoch: 26 Global Step: 548950 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:07:18,021-Speed 2497.02 samples/sec Loss 1.6149 LearningRate 0.000141 Epoch: 26 Global Step: 548960 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:07:26,232-Speed 2495.27 samples/sec Loss 1.5537 LearningRate 0.000141 Epoch: 26 Global Step: 548970 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:07:34,438-Speed 2496.02 samples/sec Loss 1.5751 LearningRate 0.000141 Epoch: 26 Global Step: 548980 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:07:42,640-Speed 2497.34 samples/sec Loss 1.5608 LearningRate 0.000141 Epoch: 26 Global Step: 548990 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:07:50,847-Speed 2495.80 samples/sec Loss 1.6021 LearningRate 0.000141 Epoch: 26 Global Step: 549000 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:07:58,996-Speed 2513.63 samples/sec Loss 1.6077 LearningRate 0.000141 Epoch: 26 Global Step: 549010 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:08:07,196-Speed 2497.69 samples/sec Loss 1.6028 LearningRate 0.000141 Epoch: 26 Global Step: 549020 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:08:15,402-Speed 2496.28 samples/sec Loss 1.6062 LearningRate 0.000141 Epoch: 26 Global Step: 549030 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:08:23,565-Speed 2509.13 samples/sec Loss 1.5854 LearningRate 0.000141 Epoch: 26 Global Step: 549040 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:08:31,768-Speed 2497.09 samples/sec Loss 1.5677 LearningRate 0.000141 Epoch: 26 Global Step: 549050 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:08:39,977-Speed 2495.42 samples/sec Loss 1.6408 LearningRate 0.000141 Epoch: 26 Global Step: 549060 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:08:48,143-Speed 2508.33 samples/sec Loss 1.5922 LearningRate 0.000141 Epoch: 26 Global Step: 549070 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:08:56,354-Speed 2494.47 samples/sec Loss 1.5525 LearningRate 0.000141 Epoch: 26 Global Step: 549080 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:09:04,575-Speed 2491.45 samples/sec Loss 1.6204 LearningRate 0.000141 Epoch: 26 Global Step: 549090 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:09:12,779-Speed 2496.76 samples/sec Loss 1.6246 LearningRate 0.000141 Epoch: 26 Global Step: 549100 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:09:20,982-Speed 2496.89 samples/sec Loss 1.6216 LearningRate 0.000141 Epoch: 26 Global Step: 549110 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:09:29,184-Speed 2497.32 samples/sec Loss 1.5693 LearningRate 0.000141 Epoch: 26 Global Step: 549120 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:09:37,336-Speed 2512.85 samples/sec Loss 1.6161 LearningRate 0.000141 Epoch: 26 Global Step: 549130 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:09:45,541-Speed 2496.37 samples/sec Loss 1.6187 LearningRate 0.000141 Epoch: 26 Global Step: 549140 Fp16 Grad Scale: 16384 Required: 64 hours 
Training: 2022-07-10 20:09:53,747-Speed 2496.08 samples/sec Loss 1.5983 LearningRate 0.000141 Epoch: 26 Global Step: 549150 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:10:01,951-Speed 2497.04 samples/sec Loss 1.6370 LearningRate 0.000141 Epoch: 26 Global Step: 549160 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:10:10,155-Speed 2496.65 samples/sec Loss 1.5706 LearningRate 0.000141 Epoch: 26 Global Step: 549170 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:10:18,361-Speed 2496.06 samples/sec Loss 1.5626 LearningRate 0.000141 Epoch: 26 Global Step: 549180 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:10:26,512-Speed 2513.22 samples/sec Loss 1.5840 LearningRate 0.000141 Epoch: 26 Global Step: 549190 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:10:34,716-Speed 2496.71 samples/sec Loss 1.5780 LearningRate 0.000141 Epoch: 26 Global Step: 549200 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:10:42,919-Speed 2497.08 samples/sec Loss 1.5897 LearningRate 0.000141 Epoch: 26 Global Step: 549210 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:10:51,121-Speed 2497.44 samples/sec Loss 1.5886 LearningRate 0.000141 Epoch: 26 Global Step: 549220 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:10:59,347-Speed 2490.01 samples/sec Loss 1.5945 LearningRate 0.000141 Epoch: 26 Global Step: 549230 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:11:07,551-Speed 2497.29 samples/sec Loss 1.5280 LearningRate 0.000141 Epoch: 26 Global Step: 549240 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:11:15,702-Speed 2512.84 samples/sec Loss 1.5627 LearningRate 0.000141 Epoch: 26 Global Step: 549250 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:11:23,914-Speed 2494.70 samples/sec Loss 1.6139 LearningRate 0.000141 Epoch: 26 Global Step: 549260 Fp16 Grad Scale: 16384 Required: 64 hours 
Training: 2022-07-10 20:11:32,121-Speed 2495.52 samples/sec Loss 1.5903 LearningRate 0.000141 Epoch: 26 Global Step: 549270 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:11:40,324-Speed 2497.01 samples/sec Loss 1.5928 LearningRate 0.000141 Epoch: 26 Global Step: 549280 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:11:48,527-Speed 2497.23 samples/sec Loss 1.5966 LearningRate 0.000141 Epoch: 26 Global Step: 549290 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:11:56,728-Speed 2497.57 samples/sec Loss 1.5991 LearningRate 0.000141 Epoch: 26 Global Step: 549300 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:12:04,879-Speed 2512.81 samples/sec Loss 1.6083 LearningRate 0.000141 Epoch: 26 Global Step: 549310 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:12:13,085-Speed 2496.18 samples/sec Loss 1.6168 LearningRate 0.000141 Epoch: 26 Global Step: 549320 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:12:21,292-Speed 2495.82 samples/sec Loss 1.5774 LearningRate 0.000141 Epoch: 26 Global Step: 549330 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:12:29,495-Speed 2496.96 samples/sec Loss 1.6460 LearningRate 0.000141 Epoch: 26 Global Step: 549340 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:12:37,705-Speed 2494.78 samples/sec Loss 1.5878 LearningRate 0.000141 Epoch: 26 Global Step: 549350 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:12:45,908-Speed 2497.40 samples/sec Loss 1.5913 LearningRate 0.000141 Epoch: 26 Global Step: 549360 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:12:54,065-Speed 2511.35 samples/sec Loss 1.5663 LearningRate 0.000141 Epoch: 26 Global Step: 549370 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:13:02,271-Speed 2496.05 samples/sec Loss 1.5944 LearningRate 0.000141 Epoch: 26 Global Step: 549380 Fp16 Grad Scale: 16384 Required: 64 hours 
Training: 2022-07-10 20:13:10,475-Speed 2496.40 samples/sec Loss 1.6339 LearningRate 0.000141 Epoch: 26 Global Step: 549390 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:13:18,678-Speed 2497.21 samples/sec Loss 1.5621 LearningRate 0.000141 Epoch: 26 Global Step: 549400 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:13:26,884-Speed 2495.94 samples/sec Loss 1.6175 LearningRate 0.000141 Epoch: 26 Global Step: 549410 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:13:35,088-Speed 2496.74 samples/sec Loss 1.6040 LearningRate 0.000141 Epoch: 26 Global Step: 549420 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:13:43,237-Speed 2513.99 samples/sec Loss 1.6120 LearningRate 0.000141 Epoch: 26 Global Step: 549430 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:13:51,447-Speed 2494.96 samples/sec Loss 1.5644 LearningRate 0.000141 Epoch: 26 Global Step: 549440 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:13:59,648-Speed 2497.53 samples/sec Loss 1.6374 LearningRate 0.000141 Epoch: 26 Global Step: 549450 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:14:07,854-Speed 2495.99 samples/sec Loss 1.6006 LearningRate 0.000141 Epoch: 26 Global Step: 549460 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:14:16,057-Speed 2497.22 samples/sec Loss 1.5734 LearningRate 0.000141 Epoch: 26 Global Step: 549470 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:14:24,264-Speed 2495.98 samples/sec Loss 1.5971 LearningRate 0.000141 Epoch: 26 Global Step: 549480 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:14:32,419-Speed 2511.63 samples/sec Loss 1.5703 LearningRate 0.000141 Epoch: 26 Global Step: 549490 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:14:40,619-Speed 2498.00 samples/sec Loss 1.5814 LearningRate 0.000141 Epoch: 26 Global Step: 549500 Fp16 Grad Scale: 16384 Required: 64 hours 
Training: 2022-07-10 20:14:48,827-Speed 2496.46 samples/sec Loss 1.5612 LearningRate 0.000141 Epoch: 26 Global Step: 549510 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:14:57,028-Speed 2497.64 samples/sec Loss 1.5606 LearningRate 0.000141 Epoch: 26 Global Step: 549520 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:15:05,231-Speed 2496.70 samples/sec Loss 1.5717 LearningRate 0.000141 Epoch: 26 Global Step: 549530 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:15:13,440-Speed 2495.29 samples/sec Loss 1.5796 LearningRate 0.000141 Epoch: 26 Global Step: 549540 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:15:21,588-Speed 2514.16 samples/sec Loss 1.5746 LearningRate 0.000141 Epoch: 26 Global Step: 549550 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:15:29,788-Speed 2497.89 samples/sec Loss 1.5637 LearningRate 0.000141 Epoch: 26 Global Step: 549560 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:15:38,006-Speed 2492.46 samples/sec Loss 1.5645 LearningRate 0.000141 Epoch: 26 Global Step: 549570 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:15:46,208-Speed 2497.16 samples/sec Loss 1.5668 LearningRate 0.000141 Epoch: 26 Global Step: 549580 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:15:54,415-Speed 2495.77 samples/sec Loss 1.5972 LearningRate 0.000141 Epoch: 26 Global Step: 549590 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:16:02,622-Speed 2495.87 samples/sec Loss 1.6153 LearningRate 0.000141 Epoch: 26 Global Step: 549600 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:16:10,770-Speed 2514.04 samples/sec Loss 1.5887 LearningRate 0.000141 Epoch: 26 Global Step: 549610 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:16:18,970-Speed 2497.76 samples/sec Loss 1.5459 LearningRate 0.000141 Epoch: 26 Global Step: 549620 Fp16 Grad Scale: 16384 Required: 64 hours 
Training: 2022-07-10 20:16:27,176-Speed 2496.44 samples/sec Loss 1.5351 LearningRate 0.000141 Epoch: 26 Global Step: 549630 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:16:35,380-Speed 2496.60 samples/sec Loss 1.5860 LearningRate 0.000141 Epoch: 26 Global Step: 549640 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:16:43,589-Speed 2495.19 samples/sec Loss 1.5901 LearningRate 0.000141 Epoch: 26 Global Step: 549650 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:16:51,816-Speed 2489.70 samples/sec Loss 1.5741 LearningRate 0.000141 Epoch: 26 Global Step: 549660 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:16:59,966-Speed 2513.32 samples/sec Loss 1.5886 LearningRate 0.000141 Epoch: 26 Global Step: 549670 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:17:08,171-Speed 2496.52 samples/sec Loss 1.5581 LearningRate 0.000141 Epoch: 26 Global Step: 549680 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:17:16,374-Speed 2497.14 samples/sec Loss 1.5781 LearningRate 0.000141 Epoch: 26 Global Step: 549690 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:17:24,576-Speed 2497.36 samples/sec Loss 1.5839 LearningRate 0.000141 Epoch: 26 Global Step: 549700 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:17:32,781-Speed 2496.47 samples/sec Loss 1.5641 LearningRate 0.000140 Epoch: 26 Global Step: 549710 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:17:40,988-Speed 2496.01 samples/sec Loss 1.5784 LearningRate 0.000140 Epoch: 26 Global Step: 549720 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:17:49,139-Speed 2512.87 samples/sec Loss 1.5546 LearningRate 0.000140 Epoch: 26 Global Step: 549730 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:17:57,359-Speed 2491.81 samples/sec Loss 1.6104 LearningRate 0.000140 Epoch: 26 Global Step: 549740 Fp16 Grad Scale: 16384 Required: 64 hours 
Training: 2022-07-10 20:18:05,568-Speed 2495.35 samples/sec Loss 1.5667 LearningRate 0.000140 Epoch: 26 Global Step: 549750 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:18:13,771-Speed 2496.73 samples/sec Loss 1.5534 LearningRate 0.000140 Epoch: 26 Global Step: 549760 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:18:21,976-Speed 2496.59 samples/sec Loss 1.5865 LearningRate 0.000140 Epoch: 26 Global Step: 549770 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:18:30,178-Speed 2497.28 samples/sec Loss 1.5444 LearningRate 0.000140 Epoch: 26 Global Step: 549780 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:18:38,339-Speed 2509.89 samples/sec Loss 1.5885 LearningRate 0.000140 Epoch: 26 Global Step: 549790 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:18:46,555-Speed 2493.48 samples/sec Loss 1.5376 LearningRate 0.000140 Epoch: 26 Global Step: 549800 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:18:54,757-Speed 2497.17 samples/sec Loss 1.5837 LearningRate 0.000140 Epoch: 26 Global Step: 549810 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:19:02,964-Speed 2495.72 samples/sec Loss 1.5897 LearningRate 0.000140 Epoch: 26 Global Step: 549820 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:19:11,165-Speed 2497.78 samples/sec Loss 1.5399 LearningRate 0.000140 Epoch: 26 Global Step: 549830 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:19:19,367-Speed 2497.25 samples/sec Loss 1.5908 LearningRate 0.000140 Epoch: 26 Global Step: 549840 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:19:27,517-Speed 2513.45 samples/sec Loss 1.5803 LearningRate 0.000140 Epoch: 26 Global Step: 549850 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:19:35,719-Speed 2497.49 samples/sec Loss 1.5757 LearningRate 0.000140 Epoch: 26 Global Step: 549860 Fp16 Grad Scale: 16384 Required: 64 hours 
Training: 2022-07-10 20:19:43,918-Speed 2498.27 samples/sec Loss 1.5811 LearningRate 0.000140 Epoch: 26 Global Step: 549870 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:19:52,120-Speed 2497.30 samples/sec Loss 1.5530 LearningRate 0.000140 Epoch: 26 Global Step: 549880 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:20:00,334-Speed 2493.72 samples/sec Loss 1.5913 LearningRate 0.000140 Epoch: 26 Global Step: 549890 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:20:08,537-Speed 2497.08 samples/sec Loss 1.5715 LearningRate 0.000140 Epoch: 26 Global Step: 549900 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:20:16,686-Speed 2513.44 samples/sec Loss 1.5962 LearningRate 0.000140 Epoch: 26 Global Step: 549910 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:20:24,890-Speed 2496.96 samples/sec Loss 1.5716 LearningRate 0.000140 Epoch: 26 Global Step: 549920 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:20:33,095-Speed 2496.39 samples/sec Loss 1.5915 LearningRate 0.000140 Epoch: 26 Global Step: 549930 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:20:41,298-Speed 2497.14 samples/sec Loss 1.5780 LearningRate 0.000140 Epoch: 26 Global Step: 549940 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:20:49,502-Speed 2496.90 samples/sec Loss 1.6038 LearningRate 0.000140 Epoch: 26 Global Step: 549950 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:20:57,722-Speed 2491.92 samples/sec Loss 1.5820 LearningRate 0.000140 Epoch: 26 Global Step: 549960 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:21:05,896-Speed 2505.59 samples/sec Loss 1.6034 LearningRate 0.000140 Epoch: 26 Global Step: 549970 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:21:14,095-Speed 2498.40 samples/sec Loss 1.5863 LearningRate 0.000140 Epoch: 26 Global Step: 549980 Fp16 Grad Scale: 16384 Required: 64 hours 
Training: 2022-07-10 20:21:22,298-Speed 2496.97 samples/sec Loss 1.5955 LearningRate 0.000140 Epoch: 26 Global Step: 549990 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:21:30,509-Speed 2494.55 samples/sec Loss 1.6106 LearningRate 0.000140 Epoch: 26 Global Step: 550000 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:21:38,718-Speed 2495.29 samples/sec Loss 1.6033 LearningRate 0.000140 Epoch: 26 Global Step: 550010 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:21:46,919-Speed 2497.55 samples/sec Loss 1.6122 LearningRate 0.000140 Epoch: 26 Global Step: 550020 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:21:55,070-Speed 2513.02 samples/sec Loss 1.5869 LearningRate 0.000140 Epoch: 26 Global Step: 550030 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:22:03,273-Speed 2497.10 samples/sec Loss 1.6097 LearningRate 0.000140 Epoch: 26 Global Step: 550040 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:22:11,478-Speed 2496.51 samples/sec Loss 1.5555 LearningRate 0.000140 Epoch: 26 Global Step: 550050 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:22:19,681-Speed 2496.98 samples/sec Loss 1.5893 LearningRate 0.000140 Epoch: 26 Global Step: 550060 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:22:27,886-Speed 2496.46 samples/sec Loss 1.5574 LearningRate 0.000140 Epoch: 26 Global Step: 550070 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:22:36,088-Speed 2497.37 samples/sec Loss 1.5795 LearningRate 0.000140 Epoch: 26 Global Step: 550080 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:22:44,249-Speed 2510.18 samples/sec Loss 1.5904 LearningRate 0.000140 Epoch: 26 Global Step: 550090 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:22:52,455-Speed 2496.06 samples/sec Loss 1.5514 LearningRate 0.000140 Epoch: 26 Global Step: 550100 Fp16 Grad Scale: 16384 Required: 64 hours 
Training: 2022-07-10 20:23:00,657-Speed 2497.35 samples/sec Loss 1.5976 LearningRate 0.000140 Epoch: 26 Global Step: 550110 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:23:08,862-Speed 2496.55 samples/sec Loss 1.6274 LearningRate 0.000140 Epoch: 26 Global Step: 550120 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:23:17,083-Speed 2491.75 samples/sec Loss 1.5674 LearningRate 0.000140 Epoch: 26 Global Step: 550130 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:23:25,289-Speed 2495.98 samples/sec Loss 1.5881 LearningRate 0.000140 Epoch: 26 Global Step: 550140 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:23:33,445-Speed 2511.61 samples/sec Loss 1.5820 LearningRate 0.000140 Epoch: 26 Global Step: 550150 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:23:41,651-Speed 2496.16 samples/sec Loss 1.6038 LearningRate 0.000140 Epoch: 26 Global Step: 550160 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:23:49,857-Speed 2496.01 samples/sec Loss 1.6071 LearningRate 0.000140 Epoch: 26 Global Step: 550170 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:23:58,064-Speed 2495.76 samples/sec Loss 1.5806 LearningRate 0.000140 Epoch: 26 Global Step: 550180 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:24:06,280-Speed 2493.06 samples/sec Loss 1.5665 LearningRate 0.000140 Epoch: 26 Global Step: 550190 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:24:14,484-Speed 2496.82 samples/sec Loss 1.5836 LearningRate 0.000140 Epoch: 26 Global Step: 550200 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:24:22,639-Speed 2511.69 samples/sec Loss 1.5471 LearningRate 0.000140 Epoch: 26 Global Step: 550210 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:24:30,845-Speed 2496.67 samples/sec Loss 1.5659 LearningRate 0.000140 Epoch: 26 Global Step: 550220 Fp16 Grad Scale: 16384 Required: 64 hours 
Training: 2022-07-10 20:24:39,051-Speed 2496.25 samples/sec Loss 1.5495 LearningRate 0.000140 Epoch: 26 Global Step: 550230 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:24:47,256-Speed 2496.42 samples/sec Loss 1.5932 LearningRate 0.000140 Epoch: 26 Global Step: 550240 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:24:55,471-Speed 2493.04 samples/sec Loss 1.5905 LearningRate 0.000140 Epoch: 26 Global Step: 550250 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:25:03,677-Speed 2496.21 samples/sec Loss 1.5566 LearningRate 0.000140 Epoch: 26 Global Step: 550260 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:25:11,831-Speed 2512.39 samples/sec Loss 1.5341 LearningRate 0.000140 Epoch: 26 Global Step: 550270 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:25:20,034-Speed 2496.92 samples/sec Loss 1.6074 LearningRate 0.000140 Epoch: 26 Global Step: 550280 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:25:28,236-Speed 2497.26 samples/sec Loss 1.5983 LearningRate 0.000140 Epoch: 26 Global Step: 550290 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:25:36,443-Speed 2495.92 samples/sec Loss 1.5416 LearningRate 0.000140 Epoch: 26 Global Step: 550300 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:25:44,656-Speed 2493.87 samples/sec Loss 1.5921 LearningRate 0.000140 Epoch: 26 Global Step: 550310 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:25:52,864-Speed 2495.52 samples/sec Loss 1.5726 LearningRate 0.000140 Epoch: 26 Global Step: 550320 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:26:01,013-Speed 2513.76 samples/sec Loss 1.6077 LearningRate 0.000140 Epoch: 26 Global Step: 550330 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:26:09,221-Speed 2495.60 samples/sec Loss 1.5847 LearningRate 0.000140 Epoch: 26 Global Step: 550340 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:26:17,425-Speed 2496.89 samples/sec Loss 1.5772 LearningRate 0.000140 Epoch: 26 Global Step: 550350 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:26:25,629-Speed 2496.69 samples/sec Loss 1.5849 LearningRate 0.000140 Epoch: 26 Global Step: 550360 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:26:33,840-Speed 2495.21 samples/sec Loss 1.5377 LearningRate 0.000140 Epoch: 26 Global Step: 550370 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:26:42,045-Speed 2496.50 samples/sec Loss 1.6349 LearningRate 0.000140 Epoch: 26 Global Step: 550380 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:26:50,195-Speed 2513.29 samples/sec Loss 1.5757 LearningRate 0.000140 Epoch: 26 Global Step: 550390 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:26:58,408-Speed 2493.88 samples/sec Loss 1.5436 LearningRate 0.000140 Epoch: 26 Global Step: 550400 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:27:06,615-Speed 2496.00 samples/sec Loss 1.5669 LearningRate 0.000140 Epoch: 26 Global Step: 550410 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:27:14,818-Speed 2496.99 samples/sec Loss 1.5762 LearningRate 0.000140 Epoch: 26 Global Step: 550420 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:27:23,026-Speed 2495.58 samples/sec Loss 1.5503 LearningRate 0.000140 Epoch: 26 Global Step: 550430 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:27:31,235-Speed 2495.17 samples/sec Loss 1.5886 LearningRate 0.000140 Epoch: 26 Global Step: 550440 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:27:39,389-Speed 2511.96 samples/sec Loss 1.6064 LearningRate 0.000140 Epoch: 26 Global Step: 550450 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:27:47,592-Speed 2497.14 samples/sec Loss 1.5768 LearningRate 0.000140 Epoch: 26 Global Step: 550460 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:27:55,798-Speed 2496.10 samples/sec Loss 1.5596 LearningRate 0.000140 Epoch: 26 Global Step: 550470 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:28:03,999-Speed 2497.82 samples/sec Loss 1.5542 LearningRate 0.000140 Epoch: 26 Global Step: 550480 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:28:12,202-Speed 2496.83 samples/sec Loss 1.5837 LearningRate 0.000140 Epoch: 26 Global Step: 550490 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:28:20,408-Speed 2496.21 samples/sec Loss 1.5662 LearningRate 0.000140 Epoch: 26 Global Step: 550500 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:28:28,557-Speed 2513.67 samples/sec Loss 1.5769 LearningRate 0.000140 Epoch: 26 Global Step: 550510 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:28:36,759-Speed 2497.51 samples/sec Loss 1.5849 LearningRate 0.000140 Epoch: 26 Global Step: 550520 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:28:44,960-Speed 2497.65 samples/sec Loss 1.5690 LearningRate 0.000140 Epoch: 26 Global Step: 550530 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:28:53,167-Speed 2495.79 samples/sec Loss 1.5965 LearningRate 0.000140 Epoch: 26 Global Step: 550540 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:29:01,370-Speed 2497.17 samples/sec Loss 1.6040 LearningRate 0.000140 Epoch: 26 Global Step: 550550 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:29:09,575-Speed 2496.27 samples/sec Loss 1.5378 LearningRate 0.000140 Epoch: 26 Global Step: 550560 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:29:17,733-Speed 2510.99 samples/sec Loss 1.5292 LearningRate 0.000140 Epoch: 26 Global Step: 550570 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:29:25,935-Speed 2497.20 samples/sec Loss 1.5644 LearningRate 0.000140 Epoch: 26 Global Step: 550580 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:29:34,138-Speed 2497.08 samples/sec Loss 1.5784 LearningRate 0.000140 Epoch: 26 Global Step: 550590 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:29:42,352-Speed 2493.67 samples/sec Loss 1.5821 LearningRate 0.000140 Epoch: 26 Global Step: 550600 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:29:50,563-Speed 2494.72 samples/sec Loss 1.5545 LearningRate 0.000140 Epoch: 26 Global Step: 550610 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:29:58,768-Speed 2496.49 samples/sec Loss 1.5823 LearningRate 0.000140 Epoch: 26 Global Step: 550620 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:30:06,921-Speed 2512.23 samples/sec Loss 1.6036 LearningRate 0.000140 Epoch: 26 Global Step: 550630 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:30:15,125-Speed 2496.92 samples/sec Loss 1.6099 LearningRate 0.000140 Epoch: 26 Global Step: 550640 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:30:23,326-Speed 2497.25 samples/sec Loss 1.5776 LearningRate 0.000140 Epoch: 26 Global Step: 550650 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:30:31,532-Speed 2496.17 samples/sec Loss 1.6501 LearningRate 0.000140 Epoch: 26 Global Step: 550660 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:30:39,734-Speed 2497.63 samples/sec Loss 1.5990 LearningRate 0.000140 Epoch: 26 Global Step: 550670 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:30:47,937-Speed 2497.19 samples/sec Loss 1.5747 LearningRate 0.000140 Epoch: 26 Global Step: 550680 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:30:56,093-Speed 2511.67 samples/sec Loss 1.5697 LearningRate 0.000140 Epoch: 26 Global Step: 550690 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:31:04,293-Speed 2497.75 samples/sec Loss 1.5884 LearningRate 0.000140 Epoch: 26 Global Step: 550700 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:31:12,498-Speed 2496.47 samples/sec Loss 1.5855 LearningRate 0.000139 Epoch: 26 Global Step: 550710 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:31:20,705-Speed 2495.93 samples/sec Loss 1.6059 LearningRate 0.000139 Epoch: 26 Global Step: 550720 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:31:28,920-Speed 2493.51 samples/sec Loss 1.5655 LearningRate 0.000139 Epoch: 26 Global Step: 550730 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:31:37,125-Speed 2496.45 samples/sec Loss 1.5921 LearningRate 0.000139 Epoch: 26 Global Step: 550740 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:31:45,273-Speed 2514.05 samples/sec Loss 1.5973 LearningRate 0.000139 Epoch: 26 Global Step: 550750 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:31:53,473-Speed 2497.80 samples/sec Loss 1.5578 LearningRate 0.000139 Epoch: 26 Global Step: 550760 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:32:01,678-Speed 2496.61 samples/sec Loss 1.5926 LearningRate 0.000139 Epoch: 26 Global Step: 550770 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:32:09,894-Speed 2492.99 samples/sec Loss 1.6016 LearningRate 0.000139 Epoch: 26 Global Step: 550780 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:32:18,101-Speed 2496.01 samples/sec Loss 1.5609 LearningRate 0.000139 Epoch: 26 Global Step: 550790 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:32:26,309-Speed 2495.43 samples/sec Loss 1.5779 LearningRate 0.000139 Epoch: 26 Global Step: 550800 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:32:34,458-Speed 2513.80 samples/sec Loss 1.5637 LearningRate 0.000139 Epoch: 26 Global Step: 550810 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:32:42,660-Speed 2497.24 samples/sec Loss 1.5454 LearningRate 0.000139 Epoch: 26 Global Step: 550820 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:32:50,875-Speed 2493.55 samples/sec Loss 1.5777 LearningRate 0.000139 Epoch: 26 Global Step: 550830 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:32:59,079-Speed 2496.73 samples/sec Loss 1.5704 LearningRate 0.000139 Epoch: 26 Global Step: 550840 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:33:07,281-Speed 2497.27 samples/sec Loss 1.5950 LearningRate 0.000139 Epoch: 26 Global Step: 550850 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:33:15,483-Speed 2497.28 samples/sec Loss 1.5549 LearningRate 0.000139 Epoch: 26 Global Step: 550860 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:33:23,636-Speed 2512.65 samples/sec Loss 1.5870 LearningRate 0.000139 Epoch: 26 Global Step: 550870 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:33:31,839-Speed 2496.99 samples/sec Loss 1.5663 LearningRate 0.000139 Epoch: 26 Global Step: 550880 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:33:40,044-Speed 2496.39 samples/sec Loss 1.5900 LearningRate 0.000139 Epoch: 26 Global Step: 550890 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:33:48,248-Speed 2496.86 samples/sec Loss 1.5509 LearningRate 0.000139 Epoch: 26 Global Step: 550900 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:33:56,458-Speed 2495.05 samples/sec Loss 1.5577 LearningRate 0.000139 Epoch: 26 Global Step: 550910 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:34:04,659-Speed 2497.28 samples/sec Loss 1.5261 LearningRate 0.000139 Epoch: 26 Global Step: 550920 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:34:12,809-Speed 2513.22 samples/sec Loss 1.5851 LearningRate 0.000139 Epoch: 26 Global Step: 550930 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:34:21,017-Speed 2495.78 samples/sec Loss 1.6036 LearningRate 0.000139 Epoch: 26 Global Step: 550940 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:34:29,220-Speed 2497.01 samples/sec Loss 1.5485 LearningRate 0.000139 Epoch: 26 Global Step: 550950 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:34:37,433-Speed 2494.02 samples/sec Loss 1.5647 LearningRate 0.000139 Epoch: 26 Global Step: 550960 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:34:45,636-Speed 2497.09 samples/sec Loss 1.5955 LearningRate 0.000139 Epoch: 26 Global Step: 550970 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:34:53,842-Speed 2496.01 samples/sec Loss 1.5522 LearningRate 0.000139 Epoch: 26 Global Step: 550980 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:35:01,993-Speed 2513.23 samples/sec Loss 1.5824 LearningRate 0.000139 Epoch: 26 Global Step: 550990 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:35:10,199-Speed 2496.09 samples/sec Loss 1.5413 LearningRate 0.000139 Epoch: 26 Global Step: 551000 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:35:18,410-Speed 2494.61 samples/sec Loss 1.5552 LearningRate 0.000139 Epoch: 26 Global Step: 551010 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:35:26,616-Speed 2496.02 samples/sec Loss 1.5586 LearningRate 0.000139 Epoch: 26 Global Step: 551020 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:35:34,827-Speed 2498.47 samples/sec Loss 1.5722 LearningRate 0.000139 Epoch: 26 Global Step: 551030 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:35:43,035-Speed 2495.32 samples/sec Loss 1.5745 LearningRate 0.000139 Epoch: 26 Global Step: 551040 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:35:51,198-Speed 2509.34 samples/sec Loss 1.5776 LearningRate 0.000139 Epoch: 26 Global Step: 551050 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:35:59,405-Speed 2496.15 samples/sec Loss 1.5833 LearningRate 0.000139 Epoch: 26 Global Step: 551060 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:36:07,614-Speed 2495.20 samples/sec Loss 1.5947 LearningRate 0.000139 Epoch: 26 Global Step: 551070 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:36:15,817-Speed 2496.92 samples/sec Loss 1.5696 LearningRate 0.000139 Epoch: 26 Global Step: 551080 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:36:24,025-Speed 2495.39 samples/sec Loss 1.5834 LearningRate 0.000139 Epoch: 26 Global Step: 551090 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:36:32,232-Speed 2495.94 samples/sec Loss 1.6000 LearningRate 0.000139 Epoch: 26 Global Step: 551100 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:36:40,396-Speed 2509.08 samples/sec Loss 1.5935 LearningRate 0.000139 Epoch: 26 Global Step: 551110 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:36:48,607-Speed 2494.60 samples/sec Loss 1.5892 LearningRate 0.000139 Epoch: 26 Global Step: 551120 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:36:56,806-Speed 2498.15 samples/sec Loss 1.5943 LearningRate 0.000139 Epoch: 26 Global Step: 551130 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:37:05,010-Speed 2496.85 samples/sec Loss 1.5708 LearningRate 0.000139 Epoch: 26 Global Step: 551140 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:37:13,215-Speed 2496.44 samples/sec Loss 1.6002 LearningRate 0.000139 Epoch: 26 Global Step: 551150 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:37:21,421-Speed 2496.12 samples/sec Loss 1.6062 LearningRate 0.000139 Epoch: 26 Global Step: 551160 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:37:29,571-Speed 2513.47 samples/sec Loss 1.5534 LearningRate 0.000139 Epoch: 26 Global Step: 551170 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:37:37,773-Speed 2497.52 samples/sec Loss 1.5851 LearningRate 0.000139 Epoch: 26 Global Step: 551180 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:37:45,983-Speed 2494.68 samples/sec Loss 1.5964 LearningRate 0.000139 Epoch: 26 Global Step: 551190 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:37:54,185-Speed 2497.22 samples/sec Loss 1.5434 LearningRate 0.000139 Epoch: 26 Global Step: 551200 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:38:02,387-Speed 2497.58 samples/sec Loss 1.5749 LearningRate 0.000139 Epoch: 26 Global Step: 551210 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:38:10,593-Speed 2496.21 samples/sec Loss 1.5672 LearningRate 0.000139 Epoch: 26 Global Step: 551220 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:38:18,746-Speed 2512.33 samples/sec Loss 1.5895 LearningRate 0.000139 Epoch: 26 Global Step: 551230 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:38:26,948-Speed 2497.37 samples/sec Loss 1.5796 LearningRate 0.000139 Epoch: 26 Global Step: 551240 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:38:35,152-Speed 2496.72 samples/sec Loss 1.5475 LearningRate 0.000139 Epoch: 26 Global Step: 551250 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:38:43,355-Speed 2497.11 samples/sec Loss 1.5331 LearningRate 0.000139 Epoch: 26 Global Step: 551260 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:38:51,559-Speed 2496.80 samples/sec Loss 1.5870 LearningRate 0.000139 Epoch: 26 Global Step: 551270 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:38:59,766-Speed 2495.73 samples/sec Loss 1.5546 LearningRate 0.000139 Epoch: 26 Global Step: 551280 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:39:07,925-Speed 2510.92 samples/sec Loss 1.5797 LearningRate 0.000139 Epoch: 26 Global Step: 551290 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:39:16,149-Speed 2490.72 samples/sec Loss 1.5786 LearningRate 0.000139 Epoch: 26 Global Step: 551300 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:39:24,350-Speed 2497.40 samples/sec Loss 1.5585 LearningRate 0.000139 Epoch: 26 Global Step: 551310 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:39:32,558-Speed 2495.50 samples/sec Loss 1.5412 LearningRate 0.000139 Epoch: 26 Global Step: 551320 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:39:40,767-Speed 2495.55 samples/sec Loss 1.5600 LearningRate 0.000139 Epoch: 26 Global Step: 551330 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:39:48,975-Speed 2495.60 samples/sec Loss 1.5780 LearningRate 0.000139 Epoch: 26 Global Step: 551340 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:39:57,130-Speed 2511.83 samples/sec Loss 1.5597 LearningRate 0.000139 Epoch: 26 Global Step: 551350 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:40:05,341-Speed 2494.70 samples/sec Loss 1.6148 LearningRate 0.000139 Epoch: 26 Global Step: 551360 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:40:13,553-Speed 2494.16 samples/sec Loss 1.5698 LearningRate 0.000139 Epoch: 26 Global Step: 551370 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:40:21,761-Speed 2495.81 samples/sec Loss 1.5941 LearningRate 0.000139 Epoch: 26 Global Step: 551380 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:40:29,966-Speed 2496.55 samples/sec Loss 1.5627 LearningRate 0.000139 Epoch: 26 Global Step: 551390 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:40:38,173-Speed 2495.94 samples/sec Loss 1.5881 LearningRate 0.000139 Epoch: 26 Global Step: 551400 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:40:46,332-Speed 2510.82 samples/sec Loss 1.5869 LearningRate 0.000139 Epoch: 26 Global Step: 551410 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:40:54,541-Speed 2495.20 samples/sec Loss 1.5703 LearningRate 0.000139 Epoch: 26 Global Step: 551420 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:41:02,749-Speed 2495.50 samples/sec Loss 1.5742 LearningRate 0.000139 Epoch: 26 Global Step: 551430 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:41:10,952-Speed 2496.98 samples/sec Loss 1.5608 LearningRate 0.000139 Epoch: 26 Global Step: 551440 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:41:19,156-Speed 2496.89 samples/sec Loss 1.5710 LearningRate 0.000139 Epoch: 26 Global Step: 551450 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:41:27,364-Speed 2495.89 samples/sec Loss 1.5416 LearningRate 0.000139 Epoch: 26 Global Step: 551460 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:41:35,515-Speed 2512.71 samples/sec Loss 1.5527 LearningRate 0.000139 Epoch: 26 Global Step: 551470 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-07-10 20:41:43,674-Speed 2510.28 samples/sec Loss 1.5996 LearningRate 0.000139 Epoch: 26 Global Step: 551480 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:41:51,888-Speed 2494.20 samples/sec Loss 1.5476 LearningRate 0.000139 Epoch: 26 Global Step: 551490 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:42:00,092-Speed 2496.63 samples/sec Loss 1.5896 LearningRate 0.000139 Epoch: 26 Global Step: 551500 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:42:08,307-Speed 2493.39 samples/sec Loss 1.5350 LearningRate 0.000139 Epoch: 26 Global Step: 551510 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:42:16,510-Speed 2497.09 samples/sec Loss 1.5918 LearningRate 0.000139 Epoch: 26 Global Step: 551520 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:42:24,663-Speed 2512.52 samples/sec Loss 1.5580 LearningRate 0.000139 Epoch: 26 Global Step: 551530 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:42:32,864-Speed 2497.72 samples/sec Loss 1.5721 LearningRate 0.000139 Epoch: 26 Global Step: 551540 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:42:41,067-Speed 2496.75 samples/sec Loss 1.5739 LearningRate 0.000139 Epoch: 26 Global Step: 551550 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:42:49,269-Speed 2498.03 samples/sec Loss 1.5847 LearningRate 0.000139 Epoch: 26 Global Step: 551560 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:42:57,472-Speed 2497.29 samples/sec Loss 1.5681 LearningRate 0.000139 Epoch: 26 Global Step: 551570 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:43:05,672-Speed 2497.88 samples/sec Loss 1.5623 LearningRate 0.000139 Epoch: 26 Global Step: 551580 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:43:13,827-Speed 2511.61 samples/sec Loss 1.5352 LearningRate 0.000139 Epoch: 26 Global Step: 551590 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:43:22,025-Speed 2498.72 samples/sec Loss 1.5272 LearningRate 0.000139 Epoch: 26 Global Step: 551600 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:43:30,234-Speed 2495.21 samples/sec Loss 1.5762 LearningRate 0.000139 Epoch: 26 Global Step: 551610 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:43:38,437-Speed 2496.89 samples/sec Loss 1.5702 LearningRate 0.000139 Epoch: 26 Global Step: 551620 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:43:46,650-Speed 2493.93 samples/sec Loss 1.5127 LearningRate 0.000139 Epoch: 26 Global Step: 551630 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:43:54,851-Speed 2497.88 samples/sec Loss 1.5559 LearningRate 0.000139 Epoch: 26 Global Step: 551640 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:44:03,002-Speed 2513.54 samples/sec Loss 1.5704 LearningRate 0.000139 Epoch: 26 Global Step: 551650 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:44:11,212-Speed 2494.89 samples/sec Loss 1.5645 LearningRate 0.000139 Epoch: 26 Global Step: 551660 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:44:19,414-Speed 2497.49 samples/sec Loss 1.6170 LearningRate 0.000139 Epoch: 26 Global Step: 551670 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:44:27,614-Speed 2497.99 samples/sec Loss 1.5584 LearningRate 0.000139 Epoch: 26 Global Step: 551680 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:44:35,821-Speed 2495.91 samples/sec Loss 1.5358 LearningRate 0.000139 Epoch: 26 Global Step: 551690 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:44:44,031-Speed 2494.86 samples/sec Loss 1.5444 LearningRate 0.000139 Epoch: 26 Global Step: 551700 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:44:52,181-Speed 2513.36 samples/sec Loss 1.5912 LearningRate 0.000138 Epoch: 26 Global Step: 551710 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:45:00,397-Speed 2493.08 samples/sec Loss 1.5571 LearningRate 0.000138 Epoch: 26 Global Step: 551720 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:45:08,604-Speed 2496.08 samples/sec Loss 1.5671 LearningRate 0.000138 Epoch: 26 Global Step: 551730 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:45:16,810-Speed 2496.32 samples/sec Loss 1.5646 LearningRate 0.000138 Epoch: 26 Global Step: 551740 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:45:25,013-Speed 2496.81 samples/sec Loss 1.5647 LearningRate 0.000138 Epoch: 26 Global Step: 551750 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:45:33,213-Speed 2497.90 samples/sec Loss 1.5995 LearningRate 0.000138 Epoch: 26 Global Step: 551760 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:45:41,389-Speed 2505.55 samples/sec Loss 1.5914 LearningRate 0.000138 Epoch: 26 Global Step: 551770 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:45:49,604-Speed 2493.27 samples/sec Loss 1.5387 LearningRate 0.000138 Epoch: 26 Global Step: 551780 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:45:57,805-Speed 2497.47 samples/sec Loss 1.5890 LearningRate 0.000138 Epoch: 26 Global Step: 551790 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:46:10,528-Speed 1639.77 samples/sec Loss 1.5439 LearningRate 0.000138 Epoch: 26 Global Step: 551800 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:46:18,766-Speed 2500.03 samples/sec Loss 1.5806 LearningRate 0.000138 Epoch: 26 Global Step: 551810 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:46:26,966-Speed 2497.90 samples/sec Loss 1.5925 LearningRate 0.000138 Epoch: 26 Global Step: 551820 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:46:35,151-Speed 2518.14 samples/sec Loss 1.5609 LearningRate 0.000138 Epoch: 26 Global Step: 551830 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:46:44,209-Speed 2501.25 samples/sec Loss 1.5643 LearningRate 0.000138 Epoch: 26 Global Step: 551840 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:46:52,405-Speed 2499.13 samples/sec Loss 1.5982 LearningRate 0.000138 Epoch: 26 Global Step: 551850 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:47:00,806-Speed 2500.87 samples/sec Loss 1.5513 LearningRate 0.000138 Epoch: 26 Global Step: 551860 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:47:09,046-Speed 2500.10 samples/sec Loss 1.5534 LearningRate 0.000138 Epoch: 26 Global Step: 551870 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:47:17,276-Speed 2499.21 samples/sec Loss 1.5625 LearningRate 0.000138 Epoch: 26 Global Step: 551880 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:47:25,438-Speed 2509.68 samples/sec Loss 1.5836 LearningRate 0.000138 Epoch: 26 Global Step: 551890 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:47:33,700-Speed 2497.31 samples/sec Loss 1.5743 LearningRate 0.000138 Epoch: 26 Global Step: 551900 Fp16 Grad Scale: 32768 Required: 64 hours 
Training: 2022-07-10 20:47:42,775-Speed 2499.73 samples/sec Loss 1.5569 LearningRate 0.000138 Epoch: 26 Global Step: 551910 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:47:51,019-Speed 2494.99 samples/sec Loss 1.5665 LearningRate 0.000138 Epoch: 26 Global Step: 551920 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:47:59,225-Speed 2496.25 samples/sec Loss 1.5567 LearningRate 0.000138 Epoch: 26 Global Step: 551930 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:48:07,460-Speed 2497.73 samples/sec Loss 1.5918 LearningRate 0.000138 Epoch: 26 Global Step: 551940 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:48:19,234-Speed 1739.56 samples/sec Loss 1.5388 LearningRate 0.000138 Epoch: 26 Global Step: 551950 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:48:27,433-Speed 2498.78 samples/sec Loss 1.5662 LearningRate 0.000138 Epoch: 26 Global Step: 551960 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:48:35,679-Speed 2497.47 samples/sec Loss 1.5721 LearningRate 0.000138 Epoch: 26 Global Step: 551970 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:48:43,879-Speed 2497.87 samples/sec Loss 1.6082 LearningRate 0.000138 Epoch: 26 Global Step: 551980 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-07-10 20:48:52,209-Speed 2515.06 samples/sec Loss 1.5723 LearningRate 0.000138 Epoch: 26 Global Step: 551990 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:49:00,444-Speed 2501.25 samples/sec Loss 1.5671 LearningRate 0.000138 Epoch: 26 Global Step: 552000 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:49:08,592-Speed 2514.12 samples/sec Loss 1.5832 LearningRate 0.000138 Epoch: 26 Global Step: 552010 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:49:17,015-Speed 2443.78 samples/sec Loss 1.5648 LearningRate 0.000138 Epoch: 26 Global Step: 552020 Fp16 Grad Scale: 16384 Required: 64 hours 
Training: 2022-07-10 20:49:26,178-Speed 2239.89 samples/sec Loss 1.5943 LearningRate 0.000138 Epoch: 26 Global Step: 552030 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:49:34,379-Speed 2498.21 samples/sec Loss 1.5643 LearningRate 0.000138 Epoch: 26 Global Step: 552040 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:49:42,580-Speed 2497.40 samples/sec Loss 1.5729 LearningRate 0.000138 Epoch: 26 Global Step: 552050 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:49:50,785-Speed 2496.37 samples/sec Loss 1.5514 LearningRate 0.000138 Epoch: 26 Global Step: 552060 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:49:58,937-Speed 2512.92 samples/sec Loss 1.5893 LearningRate 0.000138 Epoch: 26 Global Step: 552070 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:50:07,149-Speed 2494.30 samples/sec Loss 1.6030 LearningRate 0.000138 Epoch: 26 Global Step: 552080 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:50:15,355-Speed 2496.17 samples/sec Loss 1.5799 LearningRate 0.000138 Epoch: 26 Global Step: 552090 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:50:23,562-Speed 2496.03 samples/sec Loss 1.5545 LearningRate 0.000138 Epoch: 26 Global Step: 552100 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:50:31,765-Speed 2496.87 samples/sec Loss 1.6179 LearningRate 0.000138 Epoch: 26 Global Step: 552110 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:50:39,968-Speed 2497.58 samples/sec Loss 1.6183 LearningRate 0.000138 Epoch: 26 Global Step: 552120 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:50:48,122-Speed 2512.09 samples/sec Loss 1.6133 LearningRate 0.000138 Epoch: 26 Global Step: 552130 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:50:56,323-Speed 2497.50 samples/sec Loss 1.5785 LearningRate 0.000138 Epoch: 26 Global Step: 552140 Fp16 Grad Scale: 16384 Required: 64 hours 
Training: 2022-07-10 20:51:04,530-Speed 2496.03 samples/sec Loss 1.5932 LearningRate 0.000138 Epoch: 26 Global Step: 552150 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:51:12,734-Speed 2496.86 samples/sec Loss 1.6342 LearningRate 0.000138 Epoch: 26 Global Step: 552160 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:51:20,943-Speed 2495.40 samples/sec Loss 1.5959 LearningRate 0.000138 Epoch: 26 Global Step: 552170 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:51:29,146-Speed 2496.98 samples/sec Loss 1.5942 LearningRate 0.000138 Epoch: 26 Global Step: 552180 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:51:37,297-Speed 2512.98 samples/sec Loss 1.5415 LearningRate 0.000138 Epoch: 26 Global Step: 552190 Fp16 Grad Scale: 16384 Required: 64 hours Training: 2022-07-10 20:51:45,505-Speed 2495.64 samples/sec Loss 1.5927 LearningRate 0.000138 Epoch: 26 Global Step: 552200 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:51:53,708-Speed 2497.10 samples/sec Loss 1.5849 LearningRate 0.000138 Epoch: 26 Global Step: 552210 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:52:01,913-Speed 2496.25 samples/sec Loss 1.5890 LearningRate 0.000138 Epoch: 26 Global Step: 552220 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:52:10,123-Speed 2495.08 samples/sec Loss 1.5677 LearningRate 0.000138 Epoch: 26 Global Step: 552230 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:52:18,330-Speed 2495.78 samples/sec Loss 1.5678 LearningRate 0.000138 Epoch: 26 Global Step: 552240 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:52:26,484-Speed 2512.03 samples/sec Loss 1.5359 LearningRate 0.000138 Epoch: 26 Global Step: 552250 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:52:34,693-Speed 2495.20 samples/sec Loss 1.5685 LearningRate 0.000138 Epoch: 26 Global Step: 552260 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 20:52:42,900-Speed 2496.16 samples/sec Loss 1.6044 LearningRate 0.000138 Epoch: 26 Global Step: 552270 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:52:51,110-Speed 2495.00 samples/sec Loss 1.5881 LearningRate 0.000138 Epoch: 26 Global Step: 552280 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:52:59,316-Speed 2495.94 samples/sec Loss 1.5534 LearningRate 0.000138 Epoch: 26 Global Step: 552290 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:53:07,525-Speed 2495.19 samples/sec Loss 1.5696 LearningRate 0.000138 Epoch: 26 Global Step: 552300 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:53:15,681-Speed 2511.45 samples/sec Loss 1.5403 LearningRate 0.000138 Epoch: 26 Global Step: 552310 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:53:23,887-Speed 2496.12 samples/sec Loss 1.5858 LearningRate 0.000138 Epoch: 26 Global Step: 552320 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:53:32,112-Speed 2490.49 samples/sec Loss 1.6036 LearningRate 0.000138 Epoch: 26 Global Step: 552330 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:53:40,320-Speed 2495.62 samples/sec Loss 1.5606 LearningRate 0.000138 Epoch: 26 Global Step: 552340 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:53:48,533-Speed 2494.20 samples/sec Loss 1.5714 LearningRate 0.000138 Epoch: 26 Global Step: 552350 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:53:56,744-Speed 2494.31 samples/sec Loss 1.5886 LearningRate 0.000138 Epoch: 26 Global Step: 552360 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:54:04,898-Speed 2512.02 samples/sec Loss 1.5848 LearningRate 0.000138 Epoch: 26 Global Step: 552370 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:54:13,111-Speed 2494.19 samples/sec Loss 1.5858 LearningRate 0.000138 Epoch: 26 Global Step: 552380 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 20:54:21,321-Speed 2494.75 samples/sec Loss 1.5416 LearningRate 0.000138 Epoch: 26 Global Step: 552390 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:54:29,528-Speed 2495.96 samples/sec Loss 1.6189 LearningRate 0.000138 Epoch: 26 Global Step: 552400 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:54:37,735-Speed 2495.81 samples/sec Loss 1.6116 LearningRate 0.000138 Epoch: 26 Global Step: 552410 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:54:45,945-Speed 2495.05 samples/sec Loss 1.5735 LearningRate 0.000138 Epoch: 26 Global Step: 552420 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:54:54,111-Speed 2508.18 samples/sec Loss 1.5828 LearningRate 0.000138 Epoch: 26 Global Step: 552430 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:55:02,318-Speed 2495.97 samples/sec Loss 1.5995 LearningRate 0.000138 Epoch: 26 Global Step: 552440 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:55:10,524-Speed 2496.24 samples/sec Loss 1.6009 LearningRate 0.000138 Epoch: 26 Global Step: 552450 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:55:18,735-Speed 2494.80 samples/sec Loss 1.5724 LearningRate 0.000138 Epoch: 26 Global Step: 552460 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:55:26,940-Speed 2496.35 samples/sec Loss 1.5480 LearningRate 0.000138 Epoch: 26 Global Step: 552470 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:55:35,152-Speed 2494.35 samples/sec Loss 1.5630 LearningRate 0.000138 Epoch: 26 Global Step: 552480 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:55:43,305-Speed 2512.43 samples/sec Loss 1.5590 LearningRate 0.000138 Epoch: 26 Global Step: 552490 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:55:51,512-Speed 2495.99 samples/sec Loss 1.6041 LearningRate 0.000138 Epoch: 26 Global Step: 552500 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 20:55:59,717-Speed 2496.38 samples/sec Loss 1.5734 LearningRate 0.000138 Epoch: 26 Global Step: 552510 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:56:07,927-Speed 2494.86 samples/sec Loss 1.5921 LearningRate 0.000138 Epoch: 26 Global Step: 552520 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:56:16,133-Speed 2496.07 samples/sec Loss 1.5844 LearningRate 0.000138 Epoch: 26 Global Step: 552530 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:56:24,343-Speed 2494.79 samples/sec Loss 1.6051 LearningRate 0.000138 Epoch: 26 Global Step: 552540 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:56:32,512-Speed 2507.45 samples/sec Loss 1.5963 LearningRate 0.000138 Epoch: 26 Global Step: 552550 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:56:40,716-Speed 2496.52 samples/sec Loss 1.6006 LearningRate 0.000138 Epoch: 26 Global Step: 552560 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:56:48,922-Speed 2496.42 samples/sec Loss 1.5768 LearningRate 0.000138 Epoch: 26 Global Step: 552570 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:56:57,128-Speed 2496.06 samples/sec Loss 1.5893 LearningRate 0.000138 Epoch: 26 Global Step: 552580 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:57:05,332-Speed 2496.51 samples/sec Loss 1.5803 LearningRate 0.000138 Epoch: 26 Global Step: 552590 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:57:13,536-Speed 2496.95 samples/sec Loss 1.6135 LearningRate 0.000138 Epoch: 26 Global Step: 552600 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:57:21,690-Speed 2511.84 samples/sec Loss 1.6182 LearningRate 0.000138 Epoch: 26 Global Step: 552610 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:57:29,899-Speed 2495.31 samples/sec Loss 1.5651 LearningRate 0.000138 Epoch: 26 Global Step: 552620 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 20:57:38,106-Speed 2495.89 samples/sec Loss 1.5868 LearningRate 0.000138 Epoch: 26 Global Step: 552630 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:57:46,324-Speed 2492.15 samples/sec Loss 1.5911 LearningRate 0.000138 Epoch: 26 Global Step: 552640 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:57:54,529-Speed 2496.61 samples/sec Loss 1.5650 LearningRate 0.000138 Epoch: 26 Global Step: 552650 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:58:02,732-Speed 2497.05 samples/sec Loss 1.5882 LearningRate 0.000138 Epoch: 26 Global Step: 552660 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:58:10,889-Speed 2510.99 samples/sec Loss 1.5731 LearningRate 0.000138 Epoch: 26 Global Step: 552670 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:58:19,098-Speed 2494.96 samples/sec Loss 1.6067 LearningRate 0.000138 Epoch: 26 Global Step: 552680 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:58:27,303-Speed 2496.61 samples/sec Loss 1.5769 LearningRate 0.000138 Epoch: 26 Global Step: 552690 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:58:35,516-Speed 2494.00 samples/sec Loss 1.5470 LearningRate 0.000138 Epoch: 26 Global Step: 552700 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:58:43,722-Speed 2496.00 samples/sec Loss 1.5661 LearningRate 0.000138 Epoch: 26 Global Step: 552710 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:58:51,927-Speed 2496.23 samples/sec Loss 1.5782 LearningRate 0.000137 Epoch: 26 Global Step: 552720 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:59:00,088-Speed 2510.17 samples/sec Loss 1.6004 LearningRate 0.000137 Epoch: 26 Global Step: 552730 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:59:08,290-Speed 2497.05 samples/sec Loss 1.5822 LearningRate 0.000137 Epoch: 26 Global Step: 552740 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 20:59:16,496-Speed 2496.20 samples/sec Loss 1.5586 LearningRate 0.000137 Epoch: 26 Global Step: 552750 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:59:24,703-Speed 2495.75 samples/sec Loss 1.5466 LearningRate 0.000137 Epoch: 26 Global Step: 552760 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:59:32,905-Speed 2497.21 samples/sec Loss 1.5818 LearningRate 0.000137 Epoch: 26 Global Step: 552770 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:59:41,108-Speed 2497.47 samples/sec Loss 1.5490 LearningRate 0.000137 Epoch: 26 Global Step: 552780 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:59:49,258-Speed 2513.28 samples/sec Loss 1.5925 LearningRate 0.000137 Epoch: 26 Global Step: 552790 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 20:59:57,462-Speed 2496.93 samples/sec Loss 1.5591 LearningRate 0.000137 Epoch: 26 Global Step: 552800 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:00:05,680-Speed 2492.77 samples/sec Loss 1.5533 LearningRate 0.000137 Epoch: 26 Global Step: 552810 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:00:13,890-Speed 2494.87 samples/sec Loss 1.6065 LearningRate 0.000137 Epoch: 26 Global Step: 552820 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:00:22,095-Speed 2496.45 samples/sec Loss 1.6186 LearningRate 0.000137 Epoch: 26 Global Step: 552830 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:00:30,302-Speed 2495.79 samples/sec Loss 1.6136 LearningRate 0.000137 Epoch: 26 Global Step: 552840 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:00:38,458-Speed 2511.39 samples/sec Loss 1.5686 LearningRate 0.000137 Epoch: 26 Global Step: 552850 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:00:46,662-Speed 2496.58 samples/sec Loss 1.5903 LearningRate 0.000137 Epoch: 26 Global Step: 552860 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 21:00:54,880-Speed 2492.50 samples/sec Loss 1.5646 LearningRate 0.000137 Epoch: 26 Global Step: 552870 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:01:03,085-Speed 2496.60 samples/sec Loss 1.5837 LearningRate 0.000137 Epoch: 26 Global Step: 552880 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:01:11,293-Speed 2495.68 samples/sec Loss 1.5954 LearningRate 0.000137 Epoch: 26 Global Step: 552890 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:01:19,505-Speed 2494.17 samples/sec Loss 1.5311 LearningRate 0.000137 Epoch: 26 Global Step: 552900 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:01:27,653-Speed 2514.01 samples/sec Loss 1.5543 LearningRate 0.000137 Epoch: 26 Global Step: 552910 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:01:35,860-Speed 2495.99 samples/sec Loss 1.5361 LearningRate 0.000137 Epoch: 26 Global Step: 552920 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:01:44,062-Speed 2497.18 samples/sec Loss 1.6039 LearningRate 0.000137 Epoch: 26 Global Step: 552930 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:01:52,265-Speed 2496.99 samples/sec Loss 1.5703 LearningRate 0.000137 Epoch: 26 Global Step: 552940 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:02:00,471-Speed 2496.38 samples/sec Loss 1.5424 LearningRate 0.000137 Epoch: 26 Global Step: 552950 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:02:08,689-Speed 2492.52 samples/sec Loss 1.5848 LearningRate 0.000137 Epoch: 26 Global Step: 552960 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:02:16,841-Speed 2512.39 samples/sec Loss 1.5463 LearningRate 0.000137 Epoch: 26 Global Step: 552970 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:02:25,044-Speed 2497.23 samples/sec Loss 1.5434 LearningRate 0.000137 Epoch: 26 Global Step: 552980 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 21:02:33,248-Speed 2496.77 samples/sec Loss 1.5509 LearningRate 0.000137 Epoch: 26 Global Step: 552990 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:02:41,448-Speed 2497.71 samples/sec Loss 1.5236 LearningRate 0.000137 Epoch: 26 Global Step: 553000 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:02:49,653-Speed 2496.43 samples/sec Loss 1.5802 LearningRate 0.000137 Epoch: 26 Global Step: 553010 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:02:57,860-Speed 2495.95 samples/sec Loss 1.5573 LearningRate 0.000137 Epoch: 26 Global Step: 553020 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:03:06,008-Speed 2513.82 samples/sec Loss 1.5338 LearningRate 0.000137 Epoch: 26 Global Step: 553030 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:03:14,207-Speed 2498.45 samples/sec Loss 1.5541 LearningRate 0.000137 Epoch: 26 Global Step: 553040 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:03:22,416-Speed 2495.12 samples/sec Loss 1.5770 LearningRate 0.000137 Epoch: 26 Global Step: 553050 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:03:30,623-Speed 2495.87 samples/sec Loss 1.5378 LearningRate 0.000137 Epoch: 26 Global Step: 553060 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:03:38,824-Speed 2497.83 samples/sec Loss 1.5672 LearningRate 0.000137 Epoch: 26 Global Step: 553070 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:03:47,029-Speed 2496.69 samples/sec Loss 1.5836 LearningRate 0.000137 Epoch: 26 Global Step: 553080 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:03:55,180-Speed 2512.60 samples/sec Loss 1.5384 LearningRate 0.000137 Epoch: 26 Global Step: 553090 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:04:03,400-Speed 2491.83 samples/sec Loss 1.5673 LearningRate 0.000137 Epoch: 26 Global Step: 553100 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 21:04:11,611-Speed 2494.76 samples/sec Loss 1.5486 LearningRate 0.000137 Epoch: 26 Global Step: 553110 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:04:19,814-Speed 2497.17 samples/sec Loss 1.5573 LearningRate 0.000137 Epoch: 26 Global Step: 553120 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:04:28,019-Speed 2496.36 samples/sec Loss 1.5372 LearningRate 0.000137 Epoch: 26 Global Step: 553130 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:04:36,223-Speed 2496.52 samples/sec Loss 1.5514 LearningRate 0.000137 Epoch: 26 Global Step: 553140 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:04:44,372-Speed 2513.66 samples/sec Loss 1.5972 LearningRate 0.000137 Epoch: 26 Global Step: 553150 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:04:52,579-Speed 2495.79 samples/sec Loss 1.5467 LearningRate 0.000137 Epoch: 26 Global Step: 553160 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:05:00,783-Speed 2496.86 samples/sec Loss 1.5439 LearningRate 0.000137 Epoch: 26 Global Step: 553170 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:05:08,990-Speed 2495.74 samples/sec Loss 1.5593 LearningRate 0.000137 Epoch: 26 Global Step: 553180 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:05:17,194-Speed 2496.64 samples/sec Loss 1.5948 LearningRate 0.000137 Epoch: 26 Global Step: 553190 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:05:25,403-Speed 2495.48 samples/sec Loss 1.6116 LearningRate 0.000137 Epoch: 26 Global Step: 553200 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:05:33,554-Speed 2513.12 samples/sec Loss 1.5887 LearningRate 0.000137 Epoch: 26 Global Step: 553210 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:05:41,762-Speed 2495.39 samples/sec Loss 1.5954 LearningRate 0.000137 Epoch: 26 Global Step: 553220 Fp16 Grad Scale: 32768 Required: 63 hours 
Training: 2022-07-10 21:05:49,990-Speed 2489.77 samples/sec Loss 1.5828 LearningRate 0.000137 Epoch: 26 Global Step: 553230 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:05:58,194-Speed 2496.78 samples/sec Loss 1.6020 LearningRate 0.000137 Epoch: 26 Global Step: 553240 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:06:06,412-Speed 2492.46 samples/sec Loss 1.5284 LearningRate 0.000137 Epoch: 26 Global Step: 553250 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:06:14,620-Speed 2495.59 samples/sec Loss 1.5780 LearningRate 0.000137 Epoch: 26 Global Step: 553260 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:06:22,770-Speed 2513.05 samples/sec Loss 1.5722 LearningRate 0.000137 Epoch: 26 Global Step: 553270 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:06:30,973-Speed 2497.08 samples/sec Loss 1.5166 LearningRate 0.000137 Epoch: 26 Global Step: 553280 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:06:39,180-Speed 2495.55 samples/sec Loss 1.5687 LearningRate 0.000137 Epoch: 26 Global Step: 553290 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:06:47,387-Speed 2496.11 samples/sec Loss 1.5779 LearningRate 0.000137 Epoch: 26 Global Step: 553300 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:06:55,588-Speed 2497.53 samples/sec Loss 1.5703 LearningRate 0.000137 Epoch: 26 Global Step: 553310 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:07:03,806-Speed 2492.29 samples/sec Loss 1.5704 LearningRate 0.000137 Epoch: 26 Global Step: 553320 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:07:11,962-Speed 2511.57 samples/sec Loss 1.5465 LearningRate 0.000137 Epoch: 26 Global Step: 553330 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:07:20,167-Speed 2496.39 samples/sec Loss 1.5805 LearningRate 0.000137 Epoch: 26 Global Step: 553340 Fp16 Grad Scale: 32768 Required: 63 hours 
Training: 2022-07-10 21:07:28,370-Speed 2497.09 samples/sec Loss 1.5354 LearningRate 0.000137 Epoch: 26 Global Step: 553350 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:07:36,574-Speed 2496.67 samples/sec Loss 1.5467 LearningRate 0.000137 Epoch: 26 Global Step: 553360 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:07:44,778-Speed 2496.81 samples/sec Loss 1.5408 LearningRate 0.000137 Epoch: 26 Global Step: 553370 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:07:52,983-Speed 2496.57 samples/sec Loss 1.5668 LearningRate 0.000137 Epoch: 26 Global Step: 553380 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:08:01,138-Speed 2511.86 samples/sec Loss 1.5768 LearningRate 0.000137 Epoch: 26 Global Step: 553390 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:08:09,343-Speed 2496.08 samples/sec Loss 1.5947 LearningRate 0.000137 Epoch: 26 Global Step: 553400 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:08:17,550-Speed 2496.07 samples/sec Loss 1.5731 LearningRate 0.000137 Epoch: 26 Global Step: 553410 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:08:25,754-Speed 2496.61 samples/sec Loss 1.5833 LearningRate 0.000137 Epoch: 26 Global Step: 553420 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:08:33,957-Speed 2496.94 samples/sec Loss 1.5809 LearningRate 0.000137 Epoch: 26 Global Step: 553430 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:08:42,162-Speed 2496.52 samples/sec Loss 1.6090 LearningRate 0.000137 Epoch: 26 Global Step: 553440 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:08:50,313-Speed 2512.87 samples/sec Loss 1.5741 LearningRate 0.000137 Epoch: 26 Global Step: 553450 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:08:58,522-Speed 2495.37 samples/sec Loss 1.5339 LearningRate 0.000137 Epoch: 26 Global Step: 553460 Fp16 Grad Scale: 32768 Required: 63 hours 
Training: 2022-07-10 21:09:06,726-Speed 2496.61 samples/sec Loss 1.5748 LearningRate 0.000137 Epoch: 26 Global Step: 553470 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:09:14,929-Speed 2497.04 samples/sec Loss 1.5756 LearningRate 0.000137 Epoch: 26 Global Step: 553480 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:09:23,124-Speed 2499.45 samples/sec Loss 1.5696 LearningRate 0.000137 Epoch: 26 Global Step: 553490 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:09:31,335-Speed 2494.66 samples/sec Loss 1.5807 LearningRate 0.000137 Epoch: 26 Global Step: 553500 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:09:39,486-Speed 2512.86 samples/sec Loss 1.5426 LearningRate 0.000137 Epoch: 26 Global Step: 553510 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:09:47,700-Speed 2493.72 samples/sec Loss 1.5683 LearningRate 0.000137 Epoch: 26 Global Step: 553520 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:09:55,913-Speed 2493.95 samples/sec Loss 1.5784 LearningRate 0.000137 Epoch: 26 Global Step: 553530 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:10:04,119-Speed 2496.41 samples/sec Loss 1.5840 LearningRate 0.000137 Epoch: 26 Global Step: 553540 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:10:12,326-Speed 2495.71 samples/sec Loss 1.5283 LearningRate 0.000137 Epoch: 26 Global Step: 553550 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:10:20,536-Speed 2494.87 samples/sec Loss 1.6169 LearningRate 0.000137 Epoch: 26 Global Step: 553560 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:10:28,691-Speed 2511.96 samples/sec Loss 1.5640 LearningRate 0.000137 Epoch: 26 Global Step: 553570 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:10:36,896-Speed 2496.45 samples/sec Loss 1.5102 LearningRate 0.000137 Epoch: 26 Global Step: 553580 Fp16 Grad Scale: 32768 Required: 63 hours 
Training: 2022-07-10 21:10:45,098-Speed 2497.54 samples/sec Loss 1.5625 LearningRate 0.000137 Epoch: 26 Global Step: 553590 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:10:53,303-Speed 2496.46 samples/sec Loss 1.5475 LearningRate 0.000137 Epoch: 26 Global Step: 553600 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:11:01,509-Speed 2496.06 samples/sec Loss 1.5652 LearningRate 0.000137 Epoch: 26 Global Step: 553610 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:11:09,719-Speed 2495.14 samples/sec Loss 1.5472 LearningRate 0.000137 Epoch: 26 Global Step: 553620 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:11:17,870-Speed 2512.92 samples/sec Loss 1.5616 LearningRate 0.000137 Epoch: 26 Global Step: 553630 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:11:26,072-Speed 2497.24 samples/sec Loss 1.5822 LearningRate 0.000137 Epoch: 26 Global Step: 553640 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:11:34,280-Speed 2495.86 samples/sec Loss 1.5985 LearningRate 0.000137 Epoch: 26 Global Step: 553650 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:11:42,485-Speed 2496.45 samples/sec Loss 1.5580 LearningRate 0.000137 Epoch: 26 Global Step: 553660 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:11:50,689-Speed 2496.84 samples/sec Loss 1.5580 LearningRate 0.000137 Epoch: 26 Global Step: 553670 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:11:58,891-Speed 2497.05 samples/sec Loss 1.5747 LearningRate 0.000137 Epoch: 26 Global Step: 553680 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:12:07,060-Speed 2507.61 samples/sec Loss 1.5836 LearningRate 0.000137 Epoch: 26 Global Step: 553690 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:12:15,264-Speed 2496.68 samples/sec Loss 1.5721 LearningRate 0.000137 Epoch: 26 Global Step: 553700 Fp16 Grad Scale: 32768 Required: 63 hours 
Training: 2022-07-10 21:12:23,471-Speed 2495.91 samples/sec Loss 1.5698 LearningRate 0.000137 Epoch: 26 Global Step: 553710 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:12:31,675-Speed 2496.45 samples/sec Loss 1.5684 LearningRate 0.000137 Epoch: 26 Global Step: 553720 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:12:39,884-Speed 2495.49 samples/sec Loss 1.5743 LearningRate 0.000136 Epoch: 26 Global Step: 553730 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:12:48,102-Speed 2492.35 samples/sec Loss 1.5491 LearningRate 0.000136 Epoch: 26 Global Step: 553740 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:12:56,260-Speed 2510.77 samples/sec Loss 1.5617 LearningRate 0.000136 Epoch: 26 Global Step: 553750 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:13:04,469-Speed 2495.18 samples/sec Loss 1.5354 LearningRate 0.000136 Epoch: 26 Global Step: 553760 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:13:12,677-Speed 2495.67 samples/sec Loss 1.5560 LearningRate 0.000136 Epoch: 26 Global Step: 553770 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:13:20,891-Speed 2493.70 samples/sec Loss 1.5664 LearningRate 0.000136 Epoch: 26 Global Step: 553780 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:13:29,101-Speed 2494.90 samples/sec Loss 1.5441 LearningRate 0.000136 Epoch: 26 Global Step: 553790 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:13:37,306-Speed 2496.81 samples/sec Loss 1.5395 LearningRate 0.000136 Epoch: 26 Global Step: 553800 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:13:45,459-Speed 2512.42 samples/sec Loss 1.5857 LearningRate 0.000136 Epoch: 26 Global Step: 553810 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:13:53,665-Speed 2496.12 samples/sec Loss 1.5376 LearningRate 0.000136 Epoch: 26 Global Step: 553820 Fp16 Grad Scale: 32768 Required: 63 hours 
Training: 2022-07-10 21:14:01,873-Speed 2495.36 samples/sec Loss 1.5751 LearningRate 0.000136 Epoch: 26 Global Step: 553830 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:14:10,077-Speed 2497.19 samples/sec Loss 1.5795 LearningRate 0.000136 Epoch: 26 Global Step: 553840 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:14:18,284-Speed 2495.65 samples/sec Loss 1.5571 LearningRate 0.000136 Epoch: 26 Global Step: 553850 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:14:26,491-Speed 2495.74 samples/sec Loss 1.5570 LearningRate 0.000136 Epoch: 26 Global Step: 553860 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:14:34,659-Speed 2507.61 samples/sec Loss 1.5414 LearningRate 0.000136 Epoch: 26 Global Step: 553870 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:14:42,865-Speed 2496.28 samples/sec Loss 1.5586 LearningRate 0.000136 Epoch: 26 Global Step: 553880 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:14:51,081-Speed 2492.97 samples/sec Loss 1.5957 LearningRate 0.000136 Epoch: 26 Global Step: 553890 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:14:59,287-Speed 2496.09 samples/sec Loss 1.5521 LearningRate 0.000136 Epoch: 26 Global Step: 553900 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:15:07,493-Speed 2496.45 samples/sec Loss 1.5787 LearningRate 0.000136 Epoch: 26 Global Step: 553910 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:15:15,705-Speed 2494.39 samples/sec Loss 1.5540 LearningRate 0.000136 Epoch: 26 Global Step: 553920 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:15:23,865-Speed 2509.98 samples/sec Loss 1.5918 LearningRate 0.000136 Epoch: 26 Global Step: 553930 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:15:32,070-Speed 2496.43 samples/sec Loss 1.6099 LearningRate 0.000136 Epoch: 26 Global Step: 553940 Fp16 Grad Scale: 32768 Required: 63 hours 
Training: 2022-07-10 21:15:40,274-Speed 2497.02 samples/sec Loss 1.5533 LearningRate 0.000136 Epoch: 26 Global Step: 553950 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:15:48,478-Speed 2496.74 samples/sec Loss 1.5577 LearningRate 0.000136 Epoch: 26 Global Step: 553960 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:15:56,683-Speed 2496.24 samples/sec Loss 1.5396 LearningRate 0.000136 Epoch: 26 Global Step: 553970 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:16:04,890-Speed 2495.89 samples/sec Loss 1.5830 LearningRate 0.000136 Epoch: 26 Global Step: 553980 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:16:13,041-Speed 2513.11 samples/sec Loss 1.5496 LearningRate 0.000136 Epoch: 26 Global Step: 553990 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:16:21,247-Speed 2496.19 samples/sec Loss 1.5807 LearningRate 0.000136 Epoch: 26 Global Step: 554000 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:16:29,466-Speed 2492.03 samples/sec Loss 1.5655 LearningRate 0.000136 Epoch: 26 Global Step: 554010 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:16:37,671-Speed 2496.63 samples/sec Loss 1.5775 LearningRate 0.000136 Epoch: 26 Global Step: 554020 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:16:45,874-Speed 2496.87 samples/sec Loss 1.5775 LearningRate 0.000136 Epoch: 26 Global Step: 554030 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:16:54,078-Speed 2496.95 samples/sec Loss 1.5334 LearningRate 0.000136 Epoch: 26 Global Step: 554040 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:17:02,229-Speed 2512.86 samples/sec Loss 1.5371 LearningRate 0.000136 Epoch: 26 Global Step: 554050 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:17:10,432-Speed 2496.77 samples/sec Loss 1.5785 LearningRate 0.000136 Epoch: 26 Global Step: 554060 Fp16 Grad Scale: 32768 Required: 63 hours 
Training: 2022-07-10 21:17:18,636-Speed 2497.18 samples/sec Loss 1.5764 LearningRate 0.000136 Epoch: 26 Global Step: 554070 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:17:26,844-Speed 2495.87 samples/sec Loss 1.5961 LearningRate 0.000136 Epoch: 26 Global Step: 554080 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:17:35,048-Speed 2496.53 samples/sec Loss 1.5461 LearningRate 0.000136 Epoch: 26 Global Step: 554090 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:17:43,268-Speed 2492.12 samples/sec Loss 1.5460 LearningRate 0.000136 Epoch: 26 Global Step: 554100 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:17:51,418-Speed 2513.29 samples/sec Loss 1.5993 LearningRate 0.000136 Epoch: 26 Global Step: 554110 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:17:59,622-Speed 2496.83 samples/sec Loss 1.5661 LearningRate 0.000136 Epoch: 26 Global Step: 554120 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:18:07,828-Speed 2496.21 samples/sec Loss 1.5975 LearningRate 0.000136 Epoch: 26 Global Step: 554130 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:18:16,031-Speed 2496.84 samples/sec Loss 1.5862 LearningRate 0.000136 Epoch: 26 Global Step: 554140 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:18:24,236-Speed 2496.80 samples/sec Loss 1.5893 LearningRate 0.000136 Epoch: 26 Global Step: 554150 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:18:32,442-Speed 2496.28 samples/sec Loss 1.5696 LearningRate 0.000136 Epoch: 26 Global Step: 554160 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:18:40,616-Speed 2505.57 samples/sec Loss 1.5885 LearningRate 0.000136 Epoch: 26 Global Step: 554170 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:18:48,828-Speed 2494.50 samples/sec Loss 1.5516 LearningRate 0.000136 Epoch: 26 Global Step: 554180 Fp16 Grad Scale: 32768 Required: 63 hours 
Training: 2022-07-10 21:18:57,032-Speed 2496.81 samples/sec Loss 1.5466 LearningRate 0.000136 Epoch: 26 Global Step: 554190 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:19:05,237-Speed 2496.11 samples/sec Loss 1.5577 LearningRate 0.000136 Epoch: 26 Global Step: 554200 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:19:13,464-Speed 2489.84 samples/sec Loss 1.5699 LearningRate 0.000136 Epoch: 26 Global Step: 554210 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:19:21,673-Speed 2495.29 samples/sec Loss 1.5641 LearningRate 0.000136 Epoch: 26 Global Step: 554220 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:19:29,823-Speed 2513.13 samples/sec Loss 1.5899 LearningRate 0.000136 Epoch: 26 Global Step: 554230 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:19:38,028-Speed 2496.35 samples/sec Loss 1.5422 LearningRate 0.000136 Epoch: 26 Global Step: 554240 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:19:46,236-Speed 2495.77 samples/sec Loss 1.5828 LearningRate 0.000136 Epoch: 26 Global Step: 554250 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:19:54,457-Speed 2491.60 samples/sec Loss 1.5746 LearningRate 0.000136 Epoch: 26 Global Step: 554260 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:20:02,665-Speed 2495.41 samples/sec Loss 1.5926 LearningRate 0.000136 Epoch: 26 Global Step: 554270 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:20:10,878-Speed 2493.98 samples/sec Loss 1.5772 LearningRate 0.000136 Epoch: 26 Global Step: 554280 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:20:19,028-Speed 2513.08 samples/sec Loss 1.5502 LearningRate 0.000136 Epoch: 26 Global Step: 554290 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:20:27,234-Speed 2496.27 samples/sec Loss 1.5471 LearningRate 0.000136 Epoch: 26 Global Step: 554300 Fp16 Grad Scale: 32768 Required: 63 hours 
Training: 2022-07-10 21:20:35,446-Speed 2494.25 samples/sec Loss 1.5919 LearningRate 0.000136 Epoch: 26 Global Step: 554310 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:20:43,654-Speed 2495.37 samples/sec Loss 1.5617 LearningRate 0.000136 Epoch: 26 Global Step: 554320 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:20:51,859-Speed 2496.45 samples/sec Loss 1.5591 LearningRate 0.000136 Epoch: 26 Global Step: 554330 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:21:00,087-Speed 2489.53 samples/sec Loss 1.5755 LearningRate 0.000136 Epoch: 26 Global Step: 554340 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:21:08,244-Speed 2511.27 samples/sec Loss 1.6109 LearningRate 0.000136 Epoch: 26 Global Step: 554350 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-07-10 21:21:16,402-Speed 2510.89 samples/sec Loss 1.5610 LearningRate 0.000136 Epoch: 26 Global Step: 554360 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:21:24,607-Speed 2496.42 samples/sec Loss 1.5757 LearningRate 0.000136 Epoch: 26 Global Step: 554370 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:21:32,813-Speed 2496.05 samples/sec Loss 1.5840 LearningRate 0.000136 Epoch: 26 Global Step: 554380 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:21:41,017-Speed 2496.85 samples/sec Loss 1.5784 LearningRate 0.000136 Epoch: 26 Global Step: 554390 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:21:49,220-Speed 2497.05 samples/sec Loss 1.5590 LearningRate 0.000136 Epoch: 26 Global Step: 554400 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:21:57,373-Speed 2512.59 samples/sec Loss 1.5921 LearningRate 0.000136 Epoch: 26 Global Step: 554410 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:22:05,579-Speed 2496.06 samples/sec Loss 1.5599 LearningRate 0.000136 Epoch: 26 Global Step: 554420 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 21:22:13,783-Speed 2496.54 samples/sec Loss 1.5416 LearningRate 0.000136 Epoch: 26 Global Step: 554430 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:22:21,993-Speed 2494.88 samples/sec Loss 1.5568 LearningRate 0.000136 Epoch: 26 Global Step: 554440 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:22:30,211-Speed 2492.43 samples/sec Loss 1.5418 LearningRate 0.000136 Epoch: 26 Global Step: 554450 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:22:38,417-Speed 2496.18 samples/sec Loss 1.5602 LearningRate 0.000136 Epoch: 26 Global Step: 554460 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:22:46,571-Speed 2512.58 samples/sec Loss 1.5289 LearningRate 0.000136 Epoch: 26 Global Step: 554470 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:22:54,786-Speed 2493.49 samples/sec Loss 1.5332 LearningRate 0.000136 Epoch: 26 Global Step: 554480 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:23:02,991-Speed 2496.64 samples/sec Loss 1.4929 LearningRate 0.000136 Epoch: 26 Global Step: 554490 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:23:11,194-Speed 2496.84 samples/sec Loss 1.5370 LearningRate 0.000136 Epoch: 26 Global Step: 554500 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:23:19,419-Speed 2490.60 samples/sec Loss 1.5520 LearningRate 0.000136 Epoch: 26 Global Step: 554510 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:23:27,628-Speed 2495.46 samples/sec Loss 1.5516 LearningRate 0.000136 Epoch: 26 Global Step: 554520 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:23:35,780-Speed 2512.74 samples/sec Loss 1.5311 LearningRate 0.000136 Epoch: 26 Global Step: 554530 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:23:43,987-Speed 2495.80 samples/sec Loss 1.5670 LearningRate 0.000136 Epoch: 26 Global Step: 554540 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 21:23:52,192-Speed 2496.43 samples/sec Loss 1.5328 LearningRate 0.000136 Epoch: 26 Global Step: 554550 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:24:00,400-Speed 2495.31 samples/sec Loss 1.5252 LearningRate 0.000136 Epoch: 26 Global Step: 554560 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:24:08,587-Speed 2501.90 samples/sec Loss 1.5709 LearningRate 0.000136 Epoch: 26 Global Step: 554570 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:24:16,797-Speed 2494.87 samples/sec Loss 1.5684 LearningRate 0.000136 Epoch: 26 Global Step: 554580 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:24:24,971-Speed 2505.83 samples/sec Loss 1.5682 LearningRate 0.000136 Epoch: 26 Global Step: 554590 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:24:33,179-Speed 2495.38 samples/sec Loss 1.5771 LearningRate 0.000136 Epoch: 26 Global Step: 554600 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:24:41,387-Speed 2495.66 samples/sec Loss 1.5643 LearningRate 0.000136 Epoch: 26 Global Step: 554610 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:24:49,597-Speed 2494.74 samples/sec Loss 1.5751 LearningRate 0.000136 Epoch: 26 Global Step: 554620 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:24:57,806-Speed 2495.30 samples/sec Loss 1.5535 LearningRate 0.000136 Epoch: 26 Global Step: 554630 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:25:06,012-Speed 2496.08 samples/sec Loss 1.5485 LearningRate 0.000136 Epoch: 26 Global Step: 554640 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:25:14,161-Speed 2513.68 samples/sec Loss 1.5311 LearningRate 0.000136 Epoch: 26 Global Step: 554650 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:25:22,364-Speed 2496.90 samples/sec Loss 1.5408 LearningRate 0.000136 Epoch: 26 Global Step: 554660 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-07-10 21:25:30,573-Speed 2495.25 samples/sec Loss 1.5698 LearningRate 0.000136 Epoch: 26 Global Step: 554670 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:25:38,787-Speed 2493.95 samples/sec Loss 1.5447 LearningRate 0.000136 Epoch: 26 Global Step: 554680 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:25:46,991-Speed 2496.57 samples/sec Loss 1.5405 LearningRate 0.000136 Epoch: 26 Global Step: 554690 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:25:55,205-Speed 2493.49 samples/sec Loss 1.5794 LearningRate 0.000136 Epoch: 26 Global Step: 554700 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:26:03,356-Speed 2513.10 samples/sec Loss 1.5551 LearningRate 0.000136 Epoch: 26 Global Step: 554710 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:26:11,559-Speed 2496.86 samples/sec Loss 1.5630 LearningRate 0.000136 Epoch: 26 Global Step: 554720 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:26:19,762-Speed 2497.12 samples/sec Loss 1.5359 LearningRate 0.000136 Epoch: 26 Global Step: 554730 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:26:27,973-Speed 2494.70 samples/sec Loss 1.5736 LearningRate 0.000135 Epoch: 26 Global Step: 554740 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:26:36,183-Speed 2495.00 samples/sec Loss 1.5451 LearningRate 0.000135 Epoch: 26 Global Step: 554750 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:26:44,393-Speed 2494.98 samples/sec Loss 1.5592 LearningRate 0.000135 Epoch: 26 Global Step: 554760 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:26:52,548-Speed 2511.87 samples/sec Loss 1.5946 LearningRate 0.000135 Epoch: 26 Global Step: 554770 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:27:00,747-Speed 2498.07 samples/sec Loss 1.5629 LearningRate 0.000135 Epoch: 26 Global Step: 554780 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-07-10 21:27:08,951-Speed 2497.03 samples/sec Loss 1.5522 LearningRate 0.000135 Epoch: 26 Global Step: 554790 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:27:17,155-Speed 2496.70 samples/sec Loss 1.5525 LearningRate 0.000135 Epoch: 26 Global Step: 554800 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:27:25,357-Speed 2497.29 samples/sec Loss 1.5715 LearningRate 0.000135 Epoch: 26 Global Step: 554810 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:27:33,561-Speed 2496.67 samples/sec Loss 1.5458 LearningRate 0.000135 Epoch: 26 Global Step: 554820 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:27:41,713-Speed 2512.97 samples/sec Loss 1.5602 LearningRate 0.000135 Epoch: 26 Global Step: 554830 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:27:49,917-Speed 2496.77 samples/sec Loss 1.5333 LearningRate 0.000135 Epoch: 26 Global Step: 554840 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:27:58,126-Speed 2495.15 samples/sec Loss 1.5410 LearningRate 0.000135 Epoch: 26 Global Step: 554850 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:28:06,329-Speed 2496.88 samples/sec Loss 1.5666 LearningRate 0.000135 Epoch: 26 Global Step: 554860 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:28:14,534-Speed 2496.71 samples/sec Loss 1.5862 LearningRate 0.000135 Epoch: 26 Global Step: 554870 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:28:22,738-Speed 2496.96 samples/sec Loss 1.5438 LearningRate 0.000135 Epoch: 26 Global Step: 554880 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:28:30,887-Speed 2513.28 samples/sec Loss 1.5403 LearningRate 0.000135 Epoch: 26 Global Step: 554890 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:28:39,095-Speed 2495.86 samples/sec Loss 1.5615 LearningRate 0.000135 Epoch: 26 Global Step: 554900 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:28:47,300-Speed
2496.75 samples/sec Loss 1.5733 LearningRate 0.000135 Epoch: 26 Global Step: 554910 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:28:55,509-Speed 2495.16 samples/sec Loss 1.5420 LearningRate 0.000135 Epoch: 26 Global Step: 554920 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:29:03,724-Speed 2493.53 samples/sec Loss 1.5429 LearningRate 0.000135 Epoch: 26 Global Step: 554930 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:29:11,931-Speed 2496.07 samples/sec Loss 1.5540 LearningRate 0.000135 Epoch: 26 Global Step: 554940 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:29:20,081-Speed 2513.23 samples/sec Loss 1.5731 LearningRate 0.000135 Epoch: 26 Global Step: 554950 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:29:28,292-Speed 2494.58 samples/sec Loss 1.5241 LearningRate 0.000135 Epoch: 26 Global Step: 554960 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:29:36,494-Speed 2497.40 samples/sec Loss 1.5774 LearningRate 0.000135 Epoch: 26 Global Step: 554970 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:29:44,699-Speed 2496.48 samples/sec Loss 1.5331 LearningRate 0.000135 Epoch: 26 Global Step: 554980 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:29:52,914-Speed 2493.31 samples/sec Loss 1.5058 LearningRate 0.000135 Epoch: 26 Global Step: 554990 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:30:01,116-Speed 2497.66 samples/sec Loss 1.5525 LearningRate 0.000135 Epoch: 26 Global Step: 555000 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:30:09,269-Speed 2512.13 samples/sec Loss 1.5659 LearningRate 0.000135 Epoch: 26 Global Step: 555010 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:30:17,472-Speed 2496.82 samples/sec Loss 1.5089 LearningRate 0.000135 Epoch: 26 Global Step: 555020 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:30:25,675-Speed 2497.32 samples/sec 
Loss 1.5761 LearningRate 0.000135 Epoch: 26 Global Step: 555030 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:30:33,878-Speed 2497.12 samples/sec Loss 1.5817 LearningRate 0.000135 Epoch: 26 Global Step: 555040 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:30:42,081-Speed 2496.95 samples/sec Loss 1.5204 LearningRate 0.000135 Epoch: 26 Global Step: 555050 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:30:50,283-Speed 2497.27 samples/sec Loss 1.5694 LearningRate 0.000135 Epoch: 26 Global Step: 555060 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:30:58,430-Speed 2514.18 samples/sec Loss 1.5799 LearningRate 0.000135 Epoch: 26 Global Step: 555070 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:31:06,633-Speed 2497.08 samples/sec Loss 1.5616 LearningRate 0.000135 Epoch: 26 Global Step: 555080 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:31:14,835-Speed 2497.33 samples/sec Loss 1.5543 LearningRate 0.000135 Epoch: 26 Global Step: 555090 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:31:23,037-Speed 2498.28 samples/sec Loss 1.5677 LearningRate 0.000135 Epoch: 26 Global Step: 555100 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:31:31,236-Speed 2498.19 samples/sec Loss 1.5919 LearningRate 0.000135 Epoch: 26 Global Step: 555110 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:31:39,449-Speed 2493.76 samples/sec Loss 1.5385 LearningRate 0.000135 Epoch: 26 Global Step: 555120 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:31:47,599-Speed 2513.32 samples/sec Loss 1.5552 LearningRate 0.000135 Epoch: 26 Global Step: 555130 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:31:55,800-Speed 2497.39 samples/sec Loss 1.5856 LearningRate 0.000135 Epoch: 26 Global Step: 555140 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:32:04,005-Speed 2496.43 samples/sec Loss 1.5879 
LearningRate 0.000135 Epoch: 26 Global Step: 555150 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:32:12,210-Speed 2496.41 samples/sec Loss 1.5733 LearningRate 0.000135 Epoch: 26 Global Step: 555160 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:32:20,414-Speed 2496.85 samples/sec Loss 1.5896 LearningRate 0.000135 Epoch: 26 Global Step: 555170 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:32:28,623-Speed 2495.40 samples/sec Loss 1.5304 LearningRate 0.000135 Epoch: 26 Global Step: 555180 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:32:36,777-Speed 2512.20 samples/sec Loss 1.5840 LearningRate 0.000135 Epoch: 26 Global Step: 555190 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:32:44,992-Speed 2493.27 samples/sec Loss 1.5746 LearningRate 0.000135 Epoch: 26 Global Step: 555200 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:32:53,203-Speed 2494.67 samples/sec Loss 1.5776 LearningRate 0.000135 Epoch: 26 Global Step: 555210 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:33:01,402-Speed 2498.17 samples/sec Loss 1.5931 LearningRate 0.000135 Epoch: 26 Global Step: 555220 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:33:09,603-Speed 2497.56 samples/sec Loss 1.5690 LearningRate 0.000135 Epoch: 26 Global Step: 555230 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:33:17,812-Speed 2495.18 samples/sec Loss 1.5837 LearningRate 0.000135 Epoch: 26 Global Step: 555240 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:33:25,961-Speed 2513.53 samples/sec Loss 1.5749 LearningRate 0.000135 Epoch: 26 Global Step: 555250 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:33:34,164-Speed 2497.15 samples/sec Loss 1.5857 LearningRate 0.000135 Epoch: 26 Global Step: 555260 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:33:42,366-Speed 2497.17 samples/sec Loss 1.5652 LearningRate 
0.000135 Epoch: 26 Global Step: 555270 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:33:50,575-Speed 2495.30 samples/sec Loss 1.5705 LearningRate 0.000135 Epoch: 26 Global Step: 555280 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:33:58,776-Speed 2497.40 samples/sec Loss 1.5680 LearningRate 0.000135 Epoch: 26 Global Step: 555290 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:34:06,978-Speed 2497.60 samples/sec Loss 1.5920 LearningRate 0.000135 Epoch: 26 Global Step: 555300 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:34:15,130-Speed 2512.51 samples/sec Loss 1.5422 LearningRate 0.000135 Epoch: 26 Global Step: 555310 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:34:23,341-Speed 2494.53 samples/sec Loss 1.5602 LearningRate 0.000135 Epoch: 26 Global Step: 555320 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:34:31,547-Speed 2496.48 samples/sec Loss 1.5719 LearningRate 0.000135 Epoch: 26 Global Step: 555330 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:34:39,751-Speed 2496.94 samples/sec Loss 1.5840 LearningRate 0.000135 Epoch: 26 Global Step: 555340 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:34:47,957-Speed 2496.01 samples/sec Loss 1.5759 LearningRate 0.000135 Epoch: 26 Global Step: 555350 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:34:56,218-Speed 2479.51 samples/sec Loss 1.5722 LearningRate 0.000135 Epoch: 26 Global Step: 555360 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:35:04,373-Speed 2512.48 samples/sec Loss 1.5856 LearningRate 0.000135 Epoch: 26 Global Step: 555370 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:35:12,581-Speed 2495.92 samples/sec Loss 1.5803 LearningRate 0.000135 Epoch: 26 Global Step: 555380 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:35:20,785-Speed 2496.91 samples/sec Loss 1.5933 LearningRate 0.000135 Epoch: 26 
Global Step: 555390 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:35:28,993-Speed 2495.51 samples/sec Loss 1.5965 LearningRate 0.000135 Epoch: 26 Global Step: 555400 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:35:37,198-Speed 2496.42 samples/sec Loss 1.5563 LearningRate 0.000135 Epoch: 26 Global Step: 555410 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:35:45,414-Speed 2493.21 samples/sec Loss 1.5575 LearningRate 0.000135 Epoch: 26 Global Step: 555420 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:35:53,559-Speed 2514.84 samples/sec Loss 1.5674 LearningRate 0.000135 Epoch: 26 Global Step: 555430 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:36:01,768-Speed 2495.21 samples/sec Loss 1.5657 LearningRate 0.000135 Epoch: 26 Global Step: 555440 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:36:09,969-Speed 2497.87 samples/sec Loss 1.5735 LearningRate 0.000135 Epoch: 26 Global Step: 555450 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:36:18,174-Speed 2496.57 samples/sec Loss 1.5502 LearningRate 0.000135 Epoch: 26 Global Step: 555460 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:36:26,379-Speed 2496.15 samples/sec Loss 1.5730 LearningRate 0.000135 Epoch: 26 Global Step: 555470 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:36:34,583-Speed 2497.23 samples/sec Loss 1.5718 LearningRate 0.000135 Epoch: 26 Global Step: 555480 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:36:42,735-Speed 2512.83 samples/sec Loss 1.5475 LearningRate 0.000135 Epoch: 26 Global Step: 555490 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:36:50,941-Speed 2495.96 samples/sec Loss 1.5467 LearningRate 0.000135 Epoch: 26 Global Step: 555500 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:36:59,144-Speed 2497.01 samples/sec Loss 1.5807 LearningRate 0.000135 Epoch: 26 Global Step: 555510 
Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:37:07,347-Speed 2497.01 samples/sec Loss 1.5525 LearningRate 0.000135 Epoch: 26 Global Step: 555520 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:37:15,551-Speed 2497.00 samples/sec Loss 1.5644 LearningRate 0.000135 Epoch: 26 Global Step: 555530 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:37:23,752-Speed 2497.63 samples/sec Loss 1.5499 LearningRate 0.000135 Epoch: 26 Global Step: 555540 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:37:31,901-Speed 2513.62 samples/sec Loss 1.5070 LearningRate 0.000135 Epoch: 26 Global Step: 555550 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:37:40,100-Speed 2498.30 samples/sec Loss 1.5392 LearningRate 0.000135 Epoch: 26 Global Step: 555560 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:37:48,308-Speed 2495.68 samples/sec Loss 1.5599 LearningRate 0.000135 Epoch: 26 Global Step: 555570 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:37:56,509-Speed 2497.53 samples/sec Loss 1.5674 LearningRate 0.000135 Epoch: 26 Global Step: 555580 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:38:04,711-Speed 2497.36 samples/sec Loss 1.6201 LearningRate 0.000135 Epoch: 26 Global Step: 555590 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:38:12,922-Speed 2494.65 samples/sec Loss 1.5832 LearningRate 0.000135 Epoch: 26 Global Step: 555600 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:38:21,076-Speed 2512.23 samples/sec Loss 1.5498 LearningRate 0.000135 Epoch: 26 Global Step: 555610 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:38:29,287-Speed 2494.62 samples/sec Loss 1.5865 LearningRate 0.000135 Epoch: 26 Global Step: 555620 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:38:37,494-Speed 2495.57 samples/sec Loss 1.5900 LearningRate 0.000135 Epoch: 26 Global Step: 555630 Fp16 Grad Scale: 
8192 Required: 63 hours Training: 2022-07-10 21:38:45,708-Speed 2493.98 samples/sec Loss 1.5911 LearningRate 0.000135 Epoch: 26 Global Step: 555640 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:38:53,907-Speed 2498.07 samples/sec Loss 1.5506 LearningRate 0.000135 Epoch: 26 Global Step: 555650 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:39:02,110-Speed 2497.15 samples/sec Loss 1.5531 LearningRate 0.000135 Epoch: 26 Global Step: 555660 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:39:10,273-Speed 2509.34 samples/sec Loss 1.6031 LearningRate 0.000135 Epoch: 26 Global Step: 555670 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:39:18,488-Speed 2493.31 samples/sec Loss 1.5430 LearningRate 0.000135 Epoch: 26 Global Step: 555680 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:39:26,693-Speed 2496.92 samples/sec Loss 1.5814 LearningRate 0.000135 Epoch: 26 Global Step: 555690 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:39:34,895-Speed 2497.45 samples/sec Loss 1.5703 LearningRate 0.000135 Epoch: 26 Global Step: 555700 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:39:43,099-Speed 2496.94 samples/sec Loss 1.5740 LearningRate 0.000135 Epoch: 26 Global Step: 555710 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:39:51,300-Speed 2497.59 samples/sec Loss 1.5562 LearningRate 0.000135 Epoch: 26 Global Step: 555720 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:39:59,453-Speed 2512.61 samples/sec Loss 1.5879 LearningRate 0.000135 Epoch: 26 Global Step: 555730 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:40:07,655-Speed 2497.37 samples/sec Loss 1.5430 LearningRate 0.000135 Epoch: 26 Global Step: 555740 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:40:15,856-Speed 2497.47 samples/sec Loss 1.5619 LearningRate 0.000134 Epoch: 26 Global Step: 555750 Fp16 Grad Scale: 8192 Required: 63 
hours Training: 2022-07-10 21:40:24,060-Speed 2496.92 samples/sec Loss 1.5858 LearningRate 0.000134 Epoch: 26 Global Step: 555760 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:40:32,276-Speed 2493.00 samples/sec Loss 1.5898 LearningRate 0.000134 Epoch: 26 Global Step: 555770 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:40:40,477-Speed 2497.63 samples/sec Loss 1.5862 LearningRate 0.000134 Epoch: 26 Global Step: 555780 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:40:48,627-Speed 2513.18 samples/sec Loss 1.5663 LearningRate 0.000134 Epoch: 26 Global Step: 555790 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:40:56,828-Speed 2497.85 samples/sec Loss 1.5836 LearningRate 0.000134 Epoch: 26 Global Step: 555800 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:41:05,036-Speed 2495.53 samples/sec Loss 1.5841 LearningRate 0.000134 Epoch: 26 Global Step: 555810 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:41:13,238-Speed 2497.41 samples/sec Loss 1.5892 LearningRate 0.000134 Epoch: 26 Global Step: 555820 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:41:21,441-Speed 2497.38 samples/sec Loss 1.5597 LearningRate 0.000134 Epoch: 26 Global Step: 555830 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:41:29,645-Speed 2496.80 samples/sec Loss 1.6171 LearningRate 0.000134 Epoch: 26 Global Step: 555840 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:41:37,794-Speed 2513.57 samples/sec Loss 1.6019 LearningRate 0.000134 Epoch: 26 Global Step: 555850 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:41:45,997-Speed 2497.15 samples/sec Loss 1.5524 LearningRate 0.000134 Epoch: 26 Global Step: 555860 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:41:54,199-Speed 2497.24 samples/sec Loss 1.5704 LearningRate 0.000134 Epoch: 26 Global Step: 555870 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 21:42:02,399-Speed 2497.96 samples/sec Loss 1.5361 LearningRate 0.000134 Epoch: 26 Global Step: 555880 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:42:10,604-Speed 2496.54 samples/sec Loss 1.5878 LearningRate 0.000134 Epoch: 26 Global Step: 555890 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:42:18,821-Speed 2492.76 samples/sec Loss 1.5777 LearningRate 0.000134 Epoch: 26 Global Step: 555900 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:42:26,970-Speed 2513.79 samples/sec Loss 1.5863 LearningRate 0.000134 Epoch: 26 Global Step: 555910 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:42:35,184-Speed 2494.10 samples/sec Loss 1.6165 LearningRate 0.000134 Epoch: 26 Global Step: 555920 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:42:43,390-Speed 2496.03 samples/sec Loss 1.5522 LearningRate 0.000134 Epoch: 26 Global Step: 555930 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:42:51,596-Speed 2496.03 samples/sec Loss 1.5458 LearningRate 0.000134 Epoch: 26 Global Step: 555940 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:42:59,809-Speed 2494.45 samples/sec Loss 1.5748 LearningRate 0.000134 Epoch: 26 Global Step: 555950 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:43:08,014-Speed 2496.47 samples/sec Loss 1.5420 LearningRate 0.000134 Epoch: 26 Global Step: 555960 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:43:16,166-Speed 2512.72 samples/sec Loss 1.5507 LearningRate 0.000134 Epoch: 26 Global Step: 555970 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:43:24,369-Speed 2497.37 samples/sec Loss 1.5968 LearningRate 0.000134 Epoch: 26 Global Step: 555980 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:43:32,579-Speed 2494.93 samples/sec Loss 1.5340 LearningRate 0.000134 Epoch: 26 Global Step: 555990 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 21:43:40,781-Speed 2497.35 samples/sec Loss 1.5804 LearningRate 0.000134 Epoch: 26 Global Step: 556000 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:43:48,996-Speed 2493.38 samples/sec Loss 1.5505 LearningRate 0.000134 Epoch: 26 Global Step: 556010 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:43:57,202-Speed 2496.23 samples/sec Loss 1.5640 LearningRate 0.000134 Epoch: 26 Global Step: 556020 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:44:05,348-Speed 2514.56 samples/sec Loss 1.5592 LearningRate 0.000134 Epoch: 26 Global Step: 556030 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:44:13,551-Speed 2497.19 samples/sec Loss 1.5380 LearningRate 0.000134 Epoch: 26 Global Step: 556040 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:44:21,751-Speed 2497.76 samples/sec Loss 1.5296 LearningRate 0.000134 Epoch: 26 Global Step: 556050 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:44:29,953-Speed 2497.58 samples/sec Loss 1.5226 LearningRate 0.000134 Epoch: 26 Global Step: 556060 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:44:38,153-Speed 2497.86 samples/sec Loss 1.5509 LearningRate 0.000134 Epoch: 26 Global Step: 556070 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:44:46,360-Speed 2496.20 samples/sec Loss 1.5632 LearningRate 0.000134 Epoch: 26 Global Step: 556080 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:44:54,513-Speed 2512.22 samples/sec Loss 1.5475 LearningRate 0.000134 Epoch: 26 Global Step: 556090 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:45:02,716-Speed 2497.02 samples/sec Loss 1.5753 LearningRate 0.000134 Epoch: 26 Global Step: 556100 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:45:10,920-Speed 2496.86 samples/sec Loss 1.5221 LearningRate 0.000134 Epoch: 26 Global Step: 556110 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 21:45:19,128-Speed 2495.42 samples/sec Loss 1.5622 LearningRate 0.000134 Epoch: 26 Global Step: 556120 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:45:27,340-Speed 2494.44 samples/sec Loss 1.5524 LearningRate 0.000134 Epoch: 26 Global Step: 556130 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:45:35,548-Speed 2495.50 samples/sec Loss 1.5292 LearningRate 0.000134 Epoch: 26 Global Step: 556140 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:45:43,699-Speed 2513.52 samples/sec Loss 1.5514 LearningRate 0.000134 Epoch: 26 Global Step: 556150 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:45:51,909-Speed 2495.20 samples/sec Loss 1.5257 LearningRate 0.000134 Epoch: 26 Global Step: 556160 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:46:00,117-Speed 2495.21 samples/sec Loss 1.5430 LearningRate 0.000134 Epoch: 26 Global Step: 556170 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:46:08,333-Speed 2493.15 samples/sec Loss 1.5537 LearningRate 0.000134 Epoch: 26 Global Step: 556180 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:46:16,538-Speed 2496.28 samples/sec Loss 1.5274 LearningRate 0.000134 Epoch: 26 Global Step: 556190 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:46:24,747-Speed 2495.43 samples/sec Loss 1.5422 LearningRate 0.000134 Epoch: 26 Global Step: 556200 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:46:32,898-Speed 2512.83 samples/sec Loss 1.5099 LearningRate 0.000134 Epoch: 26 Global Step: 556210 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:46:41,103-Speed 2496.36 samples/sec Loss 1.5493 LearningRate 0.000134 Epoch: 26 Global Step: 556220 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:46:49,317-Speed 2494.08 samples/sec Loss 1.5304 LearningRate 0.000134 Epoch: 26 Global Step: 556230 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 21:46:57,521-Speed 2496.54 samples/sec Loss 1.5899 LearningRate 0.000134 Epoch: 26 Global Step: 556240 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:47:05,725-Speed 2496.80 samples/sec Loss 1.5698 LearningRate 0.000134 Epoch: 26 Global Step: 556250 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:47:13,939-Speed 2493.54 samples/sec Loss 1.5222 LearningRate 0.000134 Epoch: 26 Global Step: 556260 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:47:22,104-Speed 2508.90 samples/sec Loss 1.5623 LearningRate 0.000134 Epoch: 26 Global Step: 556270 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:47:30,317-Speed 2493.94 samples/sec Loss 1.5395 LearningRate 0.000134 Epoch: 26 Global Step: 556280 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:47:38,524-Speed 2495.83 samples/sec Loss 1.5416 LearningRate 0.000134 Epoch: 26 Global Step: 556290 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:47:46,726-Speed 2497.74 samples/sec Loss 1.5557 LearningRate 0.000134 Epoch: 26 Global Step: 556300 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:47:54,933-Speed 2495.96 samples/sec Loss 1.5431 LearningRate 0.000134 Epoch: 26 Global Step: 556310 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:48:03,137-Speed 2496.76 samples/sec Loss 1.5506 LearningRate 0.000134 Epoch: 26 Global Step: 556320 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:48:11,286-Speed 2513.64 samples/sec Loss 1.5734 LearningRate 0.000134 Epoch: 26 Global Step: 556330 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:48:19,490-Speed 2496.87 samples/sec Loss 1.5335 LearningRate 0.000134 Epoch: 26 Global Step: 556340 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:48:27,701-Speed 2494.35 samples/sec Loss 1.5494 LearningRate 0.000134 Epoch: 26 Global Step: 556350 Fp16 Grad Scale: 16384 Required: 63 hours 
Training: 2022-07-10 21:48:35,920-Speed 2492.32 samples/sec Loss 1.5501 LearningRate 0.000134 Epoch: 26 Global Step: 556360 Fp16 Grad Scale: 16384 Required: 63 hours Training: 2022-07-10 21:48:44,091-Speed 2506.81 samples/sec Loss 1.5383 LearningRate 0.000134 Epoch: 26 Global Step: 556370 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:48:52,299-Speed 2495.39 samples/sec Loss 1.5499 LearningRate 0.000134 Epoch: 26 Global Step: 556380 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:49:00,450-Speed 2512.84 samples/sec Loss 1.5729 LearningRate 0.000134 Epoch: 26 Global Step: 556390 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:49:08,661-Speed 2494.69 samples/sec Loss 1.5565 LearningRate 0.000134 Epoch: 26 Global Step: 556400 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:49:16,863-Speed 2497.54 samples/sec Loss 1.5783 LearningRate 0.000134 Epoch: 26 Global Step: 556410 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:49:25,064-Speed 2497.66 samples/sec Loss 1.5439 LearningRate 0.000134 Epoch: 26 Global Step: 556420 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:49:33,267-Speed 2497.02 samples/sec Loss 1.5783 LearningRate 0.000134 Epoch: 26 Global Step: 556430 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:49:41,468-Speed 2497.66 samples/sec Loss 1.5438 LearningRate 0.000134 Epoch: 26 Global Step: 556440 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:49:49,619-Speed 2512.99 samples/sec Loss 1.5587 LearningRate 0.000134 Epoch: 26 Global Step: 556450 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:49:57,819-Speed 2498.95 samples/sec Loss 1.5178 LearningRate 0.000134 Epoch: 26 Global Step: 556460 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:50:06,016-Speed 2498.66 samples/sec Loss 1.5414 LearningRate 0.000134 Epoch: 26 Global Step: 556470 Fp16 Grad Scale: 8192 Required: 63 hours Training: 
2022-07-10 21:50:14,218-Speed 2497.63 samples/sec Loss 1.5056 LearningRate 0.000134 Epoch: 26 Global Step: 556480 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:50:22,417-Speed 2499.01 samples/sec Loss 1.5359 LearningRate 0.000134 Epoch: 26 Global Step: 556490 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:50:30,619-Speed 2497.37 samples/sec Loss 1.5669 LearningRate 0.000134 Epoch: 26 Global Step: 556500 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:50:38,770-Speed 2513.14 samples/sec Loss 1.5381 LearningRate 0.000134 Epoch: 26 Global Step: 556510 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:50:46,968-Speed 2498.41 samples/sec Loss 1.5413 LearningRate 0.000134 Epoch: 26 Global Step: 556520 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:50:55,171-Speed 2497.62 samples/sec Loss 1.5462 LearningRate 0.000134 Epoch: 26 Global Step: 556530 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:51:03,369-Speed 2499.20 samples/sec Loss 1.5279 LearningRate 0.000134 Epoch: 26 Global Step: 556540 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:51:11,569-Speed 2497.92 samples/sec Loss 1.5615 LearningRate 0.000134 Epoch: 26 Global Step: 556550 Fp16 Grad Scale: 8192 Required: 63 hours Training: 2022-07-10 21:51:19,769-Speed 2498.20 samples/sec Loss 1.5205 LearningRate 0.000134 Epoch: 26 Global Step: 556560 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:51:27,917-Speed 2513.92 samples/sec Loss 1.5557 LearningRate 0.000134 Epoch: 26 Global Step: 556570 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:51:36,132-Speed 2493.61 samples/sec Loss 1.5442 LearningRate 0.000134 Epoch: 26 Global Step: 556580 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:51:44,340-Speed 2495.48 samples/sec Loss 1.5515 LearningRate 0.000134 Epoch: 26 Global Step: 556590 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 
21:51:52,541-Speed 2497.70 samples/sec Loss 1.5452 LearningRate 0.000134 Epoch: 26 Global Step: 556600 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:52:00,745-Speed 2496.77 samples/sec Loss 1.5820 LearningRate 0.000134 Epoch: 26 Global Step: 556610 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:52:08,947-Speed 2497.64 samples/sec Loss 1.5737 LearningRate 0.000134 Epoch: 26 Global Step: 556620 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:52:17,095-Speed 2513.81 samples/sec Loss 1.5541 LearningRate 0.000134 Epoch: 26 Global Step: 556630 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:52:25,294-Speed 2498.38 samples/sec Loss 1.5979 LearningRate 0.000134 Epoch: 26 Global Step: 556640 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:52:33,492-Speed 2498.28 samples/sec Loss 1.5706 LearningRate 0.000134 Epoch: 26 Global Step: 556650 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:52:41,707-Speed 2493.68 samples/sec Loss 1.5313 LearningRate 0.000134 Epoch: 26 Global Step: 556660 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:52:49,908-Speed 2497.62 samples/sec Loss 1.5375 LearningRate 0.000134 Epoch: 26 Global Step: 556670 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:52:58,107-Speed 2498.29 samples/sec Loss 1.5589 LearningRate 0.000134 Epoch: 26 Global Step: 556680 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:53:06,255-Speed 2513.79 samples/sec Loss 1.5637 LearningRate 0.000134 Epoch: 26 Global Step: 556690 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:53:14,453-Speed 2498.69 samples/sec Loss 1.5124 LearningRate 0.000134 Epoch: 26 Global Step: 556700 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:53:22,651-Speed 2498.39 samples/sec Loss 1.5709 LearningRate 0.000134 Epoch: 26 Global Step: 556710 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:53:30,852-Speed 
2497.82 samples/sec Loss 1.5783 LearningRate 0.000134 Epoch: 26 Global Step: 556720 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:53:39,059-Speed 2495.75 samples/sec Loss 1.5660 LearningRate 0.000134 Epoch: 26 Global Step: 556730 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:53:47,263-Speed 2496.73 samples/sec Loss 1.5634 LearningRate 0.000134 Epoch: 26 Global Step: 556740 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:53:55,408-Speed 2514.67 samples/sec Loss 1.5581 LearningRate 0.000134 Epoch: 26 Global Step: 556750 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:54:03,635-Speed 2489.79 samples/sec Loss 1.5734 LearningRate 0.000134 Epoch: 26 Global Step: 556760 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:54:11,834-Speed 2498.30 samples/sec Loss 1.5804 LearningRate 0.000133 Epoch: 26 Global Step: 556770 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:54:20,034-Speed 2497.88 samples/sec Loss 1.5921 LearningRate 0.000133 Epoch: 26 Global Step: 556780 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:54:28,233-Speed 2498.08 samples/sec Loss 1.6017 LearningRate 0.000133 Epoch: 26 Global Step: 556790 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:54:36,436-Speed 2497.40 samples/sec Loss 1.5584 LearningRate 0.000133 Epoch: 26 Global Step: 556800 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:54:44,584-Speed 2513.87 samples/sec Loss 1.6050 LearningRate 0.000133 Epoch: 26 Global Step: 556810 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:54:52,784-Speed 2497.92 samples/sec Loss 1.6304 LearningRate 0.000133 Epoch: 26 Global Step: 556820 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:55:00,985-Speed 2497.34 samples/sec Loss 1.5737 LearningRate 0.000133 Epoch: 26 Global Step: 556830 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:55:09,184-Speed 2498.27 samples/sec 
Loss 1.5953 LearningRate 0.000133 Epoch: 26 Global Step: 556840 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:55:17,390-Speed 2496.09 samples/sec Loss 1.5648 LearningRate 0.000133 Epoch: 26 Global Step: 556850 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:55:25,596-Speed 2495.93 samples/sec Loss 1.5507 LearningRate 0.000133 Epoch: 26 Global Step: 556860 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:55:33,743-Speed 2514.48 samples/sec Loss 1.6506 LearningRate 0.000133 Epoch: 26 Global Step: 556870 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:55:41,945-Speed 2497.40 samples/sec Loss 1.5682 LearningRate 0.000133 Epoch: 26 Global Step: 556880 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:55:50,145-Speed 2498.00 samples/sec Loss 1.5467 LearningRate 0.000133 Epoch: 26 Global Step: 556890 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:55:58,345-Speed 2498.21 samples/sec Loss 1.5452 LearningRate 0.000133 Epoch: 26 Global Step: 556900 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:56:06,546-Speed 2497.57 samples/sec Loss 1.5526 LearningRate 0.000133 Epoch: 26 Global Step: 556910 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:56:14,749-Speed 2496.97 samples/sec Loss 1.5029 LearningRate 0.000133 Epoch: 26 Global Step: 556920 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:56:22,897-Speed 2513.80 samples/sec Loss 1.5889 LearningRate 0.000133 Epoch: 26 Global Step: 556930 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:56:31,099-Speed 2497.35 samples/sec Loss 1.5261 LearningRate 0.000133 Epoch: 26 Global Step: 556940 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:56:39,301-Speed 2497.50 samples/sec Loss 1.5409 LearningRate 0.000133 Epoch: 26 Global Step: 556950 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:56:47,504-Speed 2496.96 samples/sec Loss 1.5665 
LearningRate 0.000133 Epoch: 26 Global Step: 556960 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:56:55,712-Speed 2495.67 samples/sec Loss 1.5230 LearningRate 0.000133 Epoch: 26 Global Step: 556970 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:57:03,915-Speed 2496.87 samples/sec Loss 1.5394 LearningRate 0.000133 Epoch: 26 Global Step: 556980 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:57:12,069-Speed 2511.94 samples/sec Loss 1.5262 LearningRate 0.000133 Epoch: 26 Global Step: 556990 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:57:20,271-Speed 2497.43 samples/sec Loss 1.5628 LearningRate 0.000133 Epoch: 26 Global Step: 557000 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:57:28,485-Speed 2493.56 samples/sec Loss 1.5605 LearningRate 0.000133 Epoch: 26 Global Step: 557010 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:57:36,686-Speed 2497.67 samples/sec Loss 1.5352 LearningRate 0.000133 Epoch: 26 Global Step: 557020 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:57:44,884-Speed 2498.25 samples/sec Loss 1.5486 LearningRate 0.000133 Epoch: 26 Global Step: 557030 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:57:53,088-Speed 2496.88 samples/sec Loss 1.5159 LearningRate 0.000133 Epoch: 26 Global Step: 557040 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:58:01,234-Speed 2514.50 samples/sec Loss 1.5521 LearningRate 0.000133 Epoch: 26 Global Step: 557050 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:58:09,431-Speed 2498.66 samples/sec Loss 1.5495 LearningRate 0.000133 Epoch: 26 Global Step: 557060 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:58:17,644-Speed 2494.25 samples/sec Loss 1.5707 LearningRate 0.000133 Epoch: 26 Global Step: 557070 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:58:25,852-Speed 2495.64 samples/sec Loss 1.5771 LearningRate 
0.000133 Epoch: 26 Global Step: 557080 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:58:34,060-Speed 2495.57 samples/sec Loss 1.5434 LearningRate 0.000133 Epoch: 26 Global Step: 557090 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:58:42,262-Speed 2497.21 samples/sec Loss 1.5164 LearningRate 0.000133 Epoch: 26 Global Step: 557100 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:58:50,410-Speed 2513.79 samples/sec Loss 1.5707 LearningRate 0.000133 Epoch: 26 Global Step: 557110 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:58:58,616-Speed 2496.65 samples/sec Loss 1.5705 LearningRate 0.000133 Epoch: 26 Global Step: 557120 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:59:06,820-Speed 2496.56 samples/sec Loss 1.5398 LearningRate 0.000133 Epoch: 26 Global Step: 557130 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:59:15,022-Speed 2497.34 samples/sec Loss 1.5529 LearningRate 0.000133 Epoch: 26 Global Step: 557140 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:59:23,228-Speed 2496.09 samples/sec Loss 1.5520 LearningRate 0.000133 Epoch: 26 Global Step: 557150 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:59:31,430-Speed 2497.48 samples/sec Loss 1.5506 LearningRate 0.000133 Epoch: 26 Global Step: 557160 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:59:39,579-Speed 2513.43 samples/sec Loss 1.5287 LearningRate 0.000133 Epoch: 26 Global Step: 557170 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:59:47,781-Speed 2497.06 samples/sec Loss 1.5820 LearningRate 0.000133 Epoch: 26 Global Step: 557180 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 21:59:55,987-Speed 2496.22 samples/sec Loss 1.5760 LearningRate 0.000133 Epoch: 26 Global Step: 557190 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:00:04,190-Speed 2497.50 samples/sec Loss 1.5915 LearningRate 0.000133 Epoch: 26 
Global Step: 557200 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:00:12,389-Speed 2497.96 samples/sec Loss 1.5429 LearningRate 0.000133 Epoch: 26 Global Step: 557210 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:00:20,588-Speed 2498.50 samples/sec Loss 1.5944 LearningRate 0.000133 Epoch: 26 Global Step: 557220 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:00:28,739-Speed 2512.98 samples/sec Loss 1.5942 LearningRate 0.000133 Epoch: 26 Global Step: 557230 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:00:36,942-Speed 2497.17 samples/sec Loss 1.5711 LearningRate 0.000133 Epoch: 26 Global Step: 557240 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:00:45,139-Speed 2499.03 samples/sec Loss 1.5584 LearningRate 0.000133 Epoch: 26 Global Step: 557250 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:00:53,339-Speed 2497.68 samples/sec Loss 1.5959 LearningRate 0.000133 Epoch: 26 Global Step: 557260 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:01:01,540-Speed 2497.90 samples/sec Loss 1.5612 LearningRate 0.000133 Epoch: 26 Global Step: 557270 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:01:09,740-Speed 2498.06 samples/sec Loss 1.5180 LearningRate 0.000133 Epoch: 26 Global Step: 557280 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:01:17,888-Speed 2513.83 samples/sec Loss 1.5654 LearningRate 0.000133 Epoch: 26 Global Step: 557290 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:01:26,087-Speed 2498.17 samples/sec Loss 1.6056 LearningRate 0.000133 Epoch: 26 Global Step: 557300 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:01:34,289-Speed 2497.45 samples/sec Loss 1.5749 LearningRate 0.000133 Epoch: 26 Global Step: 557310 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:01:42,490-Speed 2497.66 samples/sec Loss 1.5469 LearningRate 0.000133 Epoch: 26 Global Step: 557320 
Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:01:50,688-Speed 2498.31 samples/sec Loss 1.5532 LearningRate 0.000133 Epoch: 26 Global Step: 557330 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:01:58,887-Speed 2498.45 samples/sec Loss 1.5835 LearningRate 0.000133 Epoch: 26 Global Step: 557340 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:02:07,037-Speed 2513.35 samples/sec Loss 1.5644 LearningRate 0.000133 Epoch: 26 Global Step: 557350 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:02:15,238-Speed 2497.54 samples/sec Loss 1.5835 LearningRate 0.000133 Epoch: 26 Global Step: 557360 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:02:23,438-Speed 2498.06 samples/sec Loss 1.5836 LearningRate 0.000133 Epoch: 26 Global Step: 557370 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:02:31,637-Speed 2498.12 samples/sec Loss 1.5633 LearningRate 0.000133 Epoch: 26 Global Step: 557380 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:02:39,838-Speed 2497.69 samples/sec Loss 1.5399 LearningRate 0.000133 Epoch: 26 Global Step: 557390 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:02:48,040-Speed 2497.09 samples/sec Loss 1.5899 LearningRate 0.000133 Epoch: 26 Global Step: 557400 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:02:56,189-Speed 2513.87 samples/sec Loss 1.5714 LearningRate 0.000133 Epoch: 26 Global Step: 557410 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:03:04,396-Speed 2495.67 samples/sec Loss 1.5184 LearningRate 0.000133 Epoch: 26 Global Step: 557420 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:03:12,598-Speed 2497.25 samples/sec Loss 1.5839 LearningRate 0.000133 Epoch: 26 Global Step: 557430 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:03:20,801-Speed 2497.04 samples/sec Loss 1.5292 LearningRate 0.000133 Epoch: 26 Global Step: 557440 Fp16 Grad Scale: 
8192 Required: 62 hours Training: 2022-07-10 22:03:29,002-Speed 2497.78 samples/sec Loss 1.5244 LearningRate 0.000133 Epoch: 26 Global Step: 557450 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:03:37,206-Speed 2496.84 samples/sec Loss 1.5916 LearningRate 0.000133 Epoch: 26 Global Step: 557460 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:03:45,366-Speed 2510.03 samples/sec Loss 1.5207 LearningRate 0.000133 Epoch: 26 Global Step: 557470 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:03:53,572-Speed 2496.05 samples/sec Loss 1.5050 LearningRate 0.000133 Epoch: 26 Global Step: 557480 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:04:01,780-Speed 2495.58 samples/sec Loss 1.5584 LearningRate 0.000133 Epoch: 26 Global Step: 557490 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:04:09,984-Speed 2496.92 samples/sec Loss 1.5294 LearningRate 0.000133 Epoch: 26 Global Step: 557500 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:04:18,195-Speed 2494.63 samples/sec Loss 1.5727 LearningRate 0.000133 Epoch: 26 Global Step: 557510 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:04:26,404-Speed 2495.11 samples/sec Loss 1.5739 LearningRate 0.000133 Epoch: 26 Global Step: 557520 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:04:34,674-Speed 2512.36 samples/sec Loss 1.5355 LearningRate 0.000133 Epoch: 26 Global Step: 557530 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:04:42,876-Speed 2497.06 samples/sec Loss 1.5553 LearningRate 0.000133 Epoch: 26 Global Step: 557540 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:04:54,350-Speed 1796.05 samples/sec Loss 1.5324 LearningRate 0.000133 Epoch: 26 Global Step: 557550 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:05:02,554-Speed 2497.62 samples/sec Loss 1.5937 LearningRate 0.000133 Epoch: 26 Global Step: 557560 Fp16 Grad Scale: 8192 Required: 62 
hours Training: 2022-07-10 22:05:10,770-Speed 2499.40 samples/sec Loss 1.5729 LearningRate 0.000133 Epoch: 26 Global Step: 557570 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:05:18,972-Speed 2497.30 samples/sec Loss 1.5653 LearningRate 0.000133 Epoch: 26 Global Step: 557580 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:05:32,352-Speed 2513.51 samples/sec Loss 1.5931 LearningRate 0.000133 Epoch: 26 Global Step: 557590 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:05:40,567-Speed 2501.60 samples/sec Loss 1.5765 LearningRate 0.000133 Epoch: 26 Global Step: 557600 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:05:48,773-Speed 2500.03 samples/sec Loss 1.5534 LearningRate 0.000133 Epoch: 26 Global Step: 557610 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:05:56,977-Speed 2496.52 samples/sec Loss 1.5273 LearningRate 0.000133 Epoch: 26 Global Step: 557620 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:06:07,635-Speed 1921.66 samples/sec Loss 1.6049 LearningRate 0.000133 Epoch: 26 Global Step: 557630 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:06:15,900-Speed 2501.11 samples/sec Loss 1.5624 LearningRate 0.000133 Epoch: 26 Global Step: 557640 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:06:24,053-Speed 2515.67 samples/sec Loss 1.5496 LearningRate 0.000133 Epoch: 26 Global Step: 557650 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:06:32,274-Speed 2491.51 samples/sec Loss 1.5493 LearningRate 0.000133 Epoch: 26 Global Step: 557660 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:06:40,537-Speed 2492.59 samples/sec Loss 1.5248 LearningRate 0.000133 Epoch: 26 Global Step: 557670 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:06:48,787-Speed 2497.43 samples/sec Loss 1.5378 LearningRate 0.000133 Epoch: 26 Global Step: 557680 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:07:00,410-Speed 1762.29 samples/sec Loss 1.5561 LearningRate 0.000133 Epoch: 26 Global Step: 557690 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:07:08,613-Speed 2496.85 samples/sec Loss 1.5923 LearningRate 0.000133 Epoch: 26 Global Step: 557700 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:07:21,105-Speed 1648.66 samples/sec Loss 1.5583 LearningRate 0.000133 Epoch: 26 Global Step: 557710 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:07:29,302-Speed 2499.50 samples/sec Loss 1.5292 LearningRate 0.000133 Epoch: 26 Global Step: 557720 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:07:37,573-Speed 2476.27 samples/sec Loss 1.5797 LearningRate 0.000133 Epoch: 26 Global Step: 557730 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:07:45,803-Speed 2490.52 samples/sec Loss 1.5342 LearningRate 0.000133 Epoch: 26 Global Step: 557740 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:07:54,022-Speed 2492.23 samples/sec Loss 1.5891 LearningRate 0.000133 Epoch: 26 Global Step: 557750 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:08:02,678-Speed 2366.30 samples/sec Loss 1.5470 LearningRate 0.000133 Epoch: 26 Global Step: 557760 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:08:11,255-Speed 2413.81 samples/sec Loss 1.5569 LearningRate 0.000133 Epoch: 26 Global Step: 557770 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:08:19,477-Speed 2491.29 samples/sec Loss 1.5741 LearningRate 0.000133 Epoch: 26 Global Step: 557780 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:08:27,703-Speed 2489.82 samples/sec Loss 1.5682 LearningRate 0.000133 Epoch: 26 Global Step: 557790 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:08:35,939-Speed 2487.15 samples/sec Loss 1.5624 LearningRate 0.000132 Epoch: 26 Global Step: 557800 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:08:44,171-Speed 2488.28 samples/sec Loss 1.5020 LearningRate 0.000132 Epoch: 26 Global Step: 557810 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:08:52,390-Speed 2492.52 samples/sec Loss 1.5182 LearningRate 0.000132 Epoch: 26 Global Step: 557820 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:09:00,551-Speed 2509.89 samples/sec Loss 1.5702 LearningRate 0.000132 Epoch: 26 Global Step: 557830 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:09:08,760-Speed 2495.08 samples/sec Loss 1.5749 LearningRate 0.000132 Epoch: 26 Global Step: 557840 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:09:16,970-Speed 2494.98 samples/sec Loss 1.5298 LearningRate 0.000132 Epoch: 26 Global Step: 557850 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:09:25,170-Speed 2497.83 samples/sec Loss 1.5354 LearningRate 0.000132 Epoch: 26 Global Step: 557860 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:09:33,371-Speed 2497.73 samples/sec Loss 1.5452 LearningRate 0.000132 Epoch: 26 Global Step: 557870 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:09:41,569-Speed 2498.34 samples/sec Loss 1.5428 LearningRate 0.000132 Epoch: 26 Global Step: 557880 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:09:49,718-Speed 2513.79 samples/sec Loss 1.5632 LearningRate 0.000132 Epoch: 26 Global Step: 557890 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:09:57,918-Speed 2497.95 samples/sec Loss 1.5154 LearningRate 0.000132 Epoch: 26 Global Step: 557900 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:10:06,214-Speed 2468.95 samples/sec Loss 1.5434 LearningRate 0.000132 Epoch: 26 Global Step: 557910 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:10:14,416-Speed 2497.33 samples/sec Loss 1.5227 LearningRate 0.000132 Epoch: 26 Global Step: 557920 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:10:22,621-Speed 2496.41 samples/sec Loss 1.5580 LearningRate 0.000132 Epoch: 26 Global Step: 557930 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:10:30,824-Speed 2497.31 samples/sec Loss 1.5521 LearningRate 0.000132 Epoch: 26 Global Step: 557940 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:10:38,973-Speed 2513.37 samples/sec Loss 1.5678 LearningRate 0.000132 Epoch: 26 Global Step: 557950 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:10:47,181-Speed 2495.60 samples/sec Loss 1.5434 LearningRate 0.000132 Epoch: 26 Global Step: 557960 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:10:55,383-Speed 2497.61 samples/sec Loss 1.5267 LearningRate 0.000132 Epoch: 26 Global Step: 557970 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:11:03,588-Speed 2496.67 samples/sec Loss 1.5689 LearningRate 0.000132 Epoch: 26 Global Step: 557980 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:11:11,791-Speed 2496.78 samples/sec Loss 1.5573 LearningRate 0.000132 Epoch: 26 Global Step: 557990 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:11:19,997-Speed 2496.20 samples/sec Loss 1.5369 LearningRate 0.000132 Epoch: 26 Global Step: 558000 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:11:28,147-Speed 2513.19 samples/sec Loss 1.5428 LearningRate 0.000132 Epoch: 26 Global Step: 558010 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:11:36,352-Speed 2496.51 samples/sec Loss 1.5803 LearningRate 0.000132 Epoch: 26 Global Step: 558020 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:11:44,553-Speed 2497.59 samples/sec Loss 1.5445 LearningRate 0.000132 Epoch: 26 Global Step: 558030 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:11:52,755-Speed 2497.62 samples/sec Loss 1.5380 LearningRate 0.000132 Epoch: 26 Global Step: 558040 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:12:00,955-Speed 2497.74 samples/sec Loss 1.5292 LearningRate 0.000132 Epoch: 26 Global Step: 558050 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:12:09,157-Speed 2497.57 samples/sec Loss 1.5717 LearningRate 0.000132 Epoch: 26 Global Step: 558060 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:12:17,308-Speed 2512.99 samples/sec Loss 1.5432 LearningRate 0.000132 Epoch: 26 Global Step: 558070 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:12:25,510-Speed 2497.29 samples/sec Loss 1.5825 LearningRate 0.000132 Epoch: 26 Global Step: 558080 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:12:33,723-Speed 2494.14 samples/sec Loss 1.5408 LearningRate 0.000132 Epoch: 26 Global Step: 558090 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:12:41,924-Speed 2497.43 samples/sec Loss 1.5496 LearningRate 0.000132 Epoch: 26 Global Step: 558100 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:12:50,130-Speed 2496.13 samples/sec Loss 1.5470 LearningRate 0.000132 Epoch: 26 Global Step: 558110 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:12:58,336-Speed 2496.20 samples/sec Loss 1.5557 LearningRate 0.000132 Epoch: 26 Global Step: 558120 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:13:06,482-Speed 2514.57 samples/sec Loss 1.5608 LearningRate 0.000132 Epoch: 26 Global Step: 558130 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:13:14,683-Speed 2497.76 samples/sec Loss 1.5357 LearningRate 0.000132 Epoch: 26 Global Step: 558140 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:13:22,889-Speed 2496.24 samples/sec Loss 1.5677 LearningRate 0.000132 Epoch: 26 Global Step: 558150 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:13:31,090-Speed 2497.74 samples/sec Loss 1.5562 LearningRate 0.000132 Epoch: 26 Global Step: 558160 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:13:39,291-Speed 2497.83 samples/sec Loss 1.5644 LearningRate 0.000132 Epoch: 26 Global Step: 558170 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:13:47,492-Speed 2497.50 samples/sec Loss 1.6083 LearningRate 0.000132 Epoch: 26 Global Step: 558180 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:13:55,638-Speed 2514.46 samples/sec Loss 1.5580 LearningRate 0.000132 Epoch: 26 Global Step: 558190 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:14:03,843-Speed 2496.48 samples/sec Loss 1.5553 LearningRate 0.000132 Epoch: 26 Global Step: 558200 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:14:12,046-Speed 2497.29 samples/sec Loss 1.5233 LearningRate 0.000132 Epoch: 26 Global Step: 558210 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:14:20,249-Speed 2496.95 samples/sec Loss 1.5707 LearningRate 0.000132 Epoch: 26 Global Step: 558220 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:14:28,454-Speed 2496.46 samples/sec Loss 1.5276 LearningRate 0.000132 Epoch: 26 Global Step: 558230 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:14:36,661-Speed 2495.98 samples/sec Loss 1.5280 LearningRate 0.000132 Epoch: 26 Global Step: 558240 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:14:44,808-Speed 2514.25 samples/sec Loss 1.5671 LearningRate 0.000132 Epoch: 26 Global Step: 558250 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:14:53,011-Speed 2496.95 samples/sec Loss 1.5814 LearningRate 0.000132 Epoch: 26 Global Step: 558260 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:15:01,211-Speed 2498.09 samples/sec Loss 1.5295 LearningRate 0.000132 Epoch: 26 Global Step: 558270 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:15:09,414-Speed 2497.03 samples/sec Loss 1.5333 LearningRate 0.000132 Epoch: 26 Global Step: 558280 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:15:17,618-Speed 2496.86 samples/sec Loss 1.5682 LearningRate 0.000132 Epoch: 26 Global Step: 558290 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:15:25,820-Speed 2497.39 samples/sec Loss 1.5374 LearningRate 0.000132 Epoch: 26 Global Step: 558300 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:15:33,970-Speed 2513.18 samples/sec Loss 1.5313 LearningRate 0.000132 Epoch: 26 Global Step: 558310 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:15:42,172-Speed 2497.48 samples/sec Loss 1.5510 LearningRate 0.000132 Epoch: 26 Global Step: 558320 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:15:50,374-Speed 2497.34 samples/sec Loss 1.5399 LearningRate 0.000132 Epoch: 26 Global Step: 558330 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:15:58,578-Speed 2496.73 samples/sec Loss 1.5802 LearningRate 0.000132 Epoch: 26 Global Step: 558340 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:16:06,780-Speed 2497.14 samples/sec Loss 1.5682 LearningRate 0.000132 Epoch: 26 Global Step: 558350 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:16:14,983-Speed 2497.01 samples/sec Loss 1.5536 LearningRate 0.000132 Epoch: 26 Global Step: 558360 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:16:23,136-Speed 2512.91 samples/sec Loss 1.5240 LearningRate 0.000132 Epoch: 26 Global Step: 558370 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:16:31,339-Speed 2496.94 samples/sec Loss 1.5237 LearningRate 0.000132 Epoch: 26 Global Step: 558380 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:16:39,545-Speed 2496.26 samples/sec Loss 1.5796 LearningRate 0.000132 Epoch: 26 Global Step: 558390 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:16:47,749-Speed 2496.73 samples/sec Loss 1.5315 LearningRate 0.000132 Epoch: 26 Global Step: 558400 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:16:55,952-Speed 2497.16 samples/sec Loss 1.5458 LearningRate 0.000132 Epoch: 26 Global Step: 558410 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:17:04,157-Speed 2496.35 samples/sec Loss 1.5790 LearningRate 0.000132 Epoch: 26 Global Step: 558420 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:17:12,313-Speed 2511.61 samples/sec Loss 1.5523 LearningRate 0.000132 Epoch: 26 Global Step: 558430 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:17:20,526-Speed 2493.80 samples/sec Loss 1.5694 LearningRate 0.000132 Epoch: 26 Global Step: 558440 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:17:28,728-Speed 2497.84 samples/sec Loss 1.5941 LearningRate 0.000132 Epoch: 26 Global Step: 558450 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:17:36,886-Speed 2510.78 samples/sec Loss 1.5757 LearningRate 0.000132 Epoch: 26 Global Step: 558460 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:17:45,085-Speed 2498.23 samples/sec Loss 1.5492 LearningRate 0.000132 Epoch: 26 Global Step: 558470 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:17:53,284-Speed 2498.18 samples/sec Loss 1.5652 LearningRate 0.000132 Epoch: 26 Global Step: 558480 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:18:01,434-Speed 2513.55 samples/sec Loss 1.5430 LearningRate 0.000132 Epoch: 26 Global Step: 558490 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:18:09,636-Speed 2497.38 samples/sec Loss 1.5662 LearningRate 0.000132 Epoch: 26 Global Step: 558500 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:18:17,835-Speed 2498.36 samples/sec Loss 1.5859 LearningRate 0.000132 Epoch: 26 Global Step: 558510 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:18:26,034-Speed 2498.08 samples/sec Loss 1.5715 LearningRate 0.000132 Epoch: 26 Global Step: 558520 Fp16 Grad Scale: 8192 Required: 62 hours Training: 
2022-07-10 22:18:34,238-Speed 2496.81 samples/sec Loss 1.5153 LearningRate 0.000132 Epoch: 26 Global Step: 558530 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:18:42,440-Speed 2497.77 samples/sec Loss 1.5527 LearningRate 0.000132 Epoch: 26 Global Step: 558540 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:18:50,585-Speed 2514.74 samples/sec Loss 1.5436 LearningRate 0.000132 Epoch: 26 Global Step: 558550 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:18:58,796-Speed 2494.81 samples/sec Loss 1.5730 LearningRate 0.000132 Epoch: 26 Global Step: 558560 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:19:06,999-Speed 2497.25 samples/sec Loss 1.5699 LearningRate 0.000132 Epoch: 26 Global Step: 558570 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:19:15,201-Speed 2497.21 samples/sec Loss 1.5656 LearningRate 0.000132 Epoch: 26 Global Step: 558580 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:19:23,400-Speed 2498.36 samples/sec Loss 1.5419 LearningRate 0.000132 Epoch: 26 Global Step: 558590 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:19:31,600-Speed 2497.86 samples/sec Loss 1.5294 LearningRate 0.000132 Epoch: 26 Global Step: 558600 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:19:39,747-Speed 2514.23 samples/sec Loss 1.5335 LearningRate 0.000132 Epoch: 26 Global Step: 558610 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:19:47,949-Speed 2497.34 samples/sec Loss 1.5488 LearningRate 0.000132 Epoch: 26 Global Step: 558620 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:19:56,154-Speed 2496.43 samples/sec Loss 1.5579 LearningRate 0.000132 Epoch: 26 Global Step: 558630 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:20:04,355-Speed 2497.58 samples/sec Loss 1.5525 LearningRate 0.000132 Epoch: 26 Global Step: 558640 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 
22:20:12,553-Speed 2498.52 samples/sec Loss 1.5396 LearningRate 0.000132 Epoch: 26 Global Step: 558650 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:20:20,758-Speed 2496.46 samples/sec Loss 1.5593 LearningRate 0.000132 Epoch: 26 Global Step: 558660 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:20:28,912-Speed 2512.31 samples/sec Loss 1.5474 LearningRate 0.000132 Epoch: 26 Global Step: 558670 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:20:37,108-Speed 2499.25 samples/sec Loss 1.5447 LearningRate 0.000132 Epoch: 26 Global Step: 558680 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:20:45,308-Speed 2498.00 samples/sec Loss 1.5390 LearningRate 0.000132 Epoch: 26 Global Step: 558690 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:20:53,509-Speed 2497.47 samples/sec Loss 1.5344 LearningRate 0.000132 Epoch: 26 Global Step: 558700 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:21:01,714-Speed 2496.67 samples/sec Loss 1.5178 LearningRate 0.000132 Epoch: 26 Global Step: 558710 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:21:09,914-Speed 2497.78 samples/sec Loss 1.5468 LearningRate 0.000132 Epoch: 26 Global Step: 558720 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:21:18,063-Speed 2513.81 samples/sec Loss 1.5259 LearningRate 0.000132 Epoch: 26 Global Step: 558730 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:21:26,263-Speed 2497.89 samples/sec Loss 1.5275 LearningRate 0.000132 Epoch: 26 Global Step: 558740 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:21:34,464-Speed 2497.60 samples/sec Loss 1.5749 LearningRate 0.000132 Epoch: 26 Global Step: 558750 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:21:42,664-Speed 2498.05 samples/sec Loss 1.5516 LearningRate 0.000132 Epoch: 26 Global Step: 558760 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:21:50,863-Speed 
2498.32 samples/sec Loss 1.5299 LearningRate 0.000132 Epoch: 26 Global Step: 558770 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:21:59,067-Speed 2497.08 samples/sec Loss 1.5433 LearningRate 0.000132 Epoch: 26 Global Step: 558780 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:22:07,213-Speed 2514.61 samples/sec Loss 1.5655 LearningRate 0.000132 Epoch: 26 Global Step: 558790 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:22:15,426-Speed 2494.08 samples/sec Loss 1.5063 LearningRate 0.000132 Epoch: 26 Global Step: 558800 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:22:23,628-Speed 2497.27 samples/sec Loss 1.5514 LearningRate 0.000132 Epoch: 26 Global Step: 558810 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:22:31,833-Speed 2496.51 samples/sec Loss 1.5366 LearningRate 0.000132 Epoch: 26 Global Step: 558820 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:22:40,032-Speed 2498.22 samples/sec Loss 1.5591 LearningRate 0.000131 Epoch: 26 Global Step: 558830 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:22:48,232-Speed 2497.97 samples/sec Loss 1.5278 LearningRate 0.000131 Epoch: 26 Global Step: 558840 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:22:56,381-Speed 2513.89 samples/sec Loss 1.5565 LearningRate 0.000131 Epoch: 26 Global Step: 558850 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:23:04,585-Speed 2496.69 samples/sec Loss 1.5838 LearningRate 0.000131 Epoch: 26 Global Step: 558860 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:23:12,785-Speed 2497.85 samples/sec Loss 1.5179 LearningRate 0.000131 Epoch: 26 Global Step: 558870 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:23:20,986-Speed 2497.62 samples/sec Loss 1.5382 LearningRate 0.000131 Epoch: 26 Global Step: 558880 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:23:29,188-Speed 2497.94 samples/sec 
Loss 1.5968 LearningRate 0.000131 Epoch: 26 Global Step: 558890 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:23:37,387-Speed 2497.99 samples/sec Loss 1.5465 LearningRate 0.000131 Epoch: 26 Global Step: 558900 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:23:45,536-Speed 2513.70 samples/sec Loss 1.5245 LearningRate 0.000131 Epoch: 26 Global Step: 558910 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:23:53,738-Speed 2497.11 samples/sec Loss 1.5459 LearningRate 0.000131 Epoch: 26 Global Step: 558920 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:24:01,941-Speed 2497.30 samples/sec Loss 1.5294 LearningRate 0.000131 Epoch: 26 Global Step: 558930 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:24:10,142-Speed 2497.69 samples/sec Loss 1.5525 LearningRate 0.000131 Epoch: 26 Global Step: 558940 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:24:18,346-Speed 2496.83 samples/sec Loss 1.5845 LearningRate 0.000131 Epoch: 26 Global Step: 558950 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:24:26,549-Speed 2496.92 samples/sec Loss 1.5690 LearningRate 0.000131 Epoch: 26 Global Step: 558960 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:24:34,700-Speed 2512.88 samples/sec Loss 1.5551 LearningRate 0.000131 Epoch: 26 Global Step: 558970 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:24:42,900-Speed 2498.19 samples/sec Loss 1.5280 LearningRate 0.000131 Epoch: 26 Global Step: 558980 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:24:51,099-Speed 2498.06 samples/sec Loss 1.5488 LearningRate 0.000131 Epoch: 26 Global Step: 558990 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:24:59,301-Speed 2497.63 samples/sec Loss 1.5385 LearningRate 0.000131 Epoch: 26 Global Step: 559000 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:25:07,499-Speed 2498.47 samples/sec Loss 1.5561 
LearningRate 0.000131 Epoch: 26 Global Step: 559010 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:25:15,699-Speed 2497.81 samples/sec Loss 1.5527 LearningRate 0.000131 Epoch: 26 Global Step: 559020 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:25:23,857-Speed 2510.82 samples/sec Loss 1.5428 LearningRate 0.000131 Epoch: 26 Global Step: 559030 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:25:32,056-Speed 2498.49 samples/sec Loss 1.5831 LearningRate 0.000131 Epoch: 26 Global Step: 559040 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:25:40,272-Speed 2494.16 samples/sec Loss 1.5676 LearningRate 0.000131 Epoch: 26 Global Step: 559050 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:25:48,468-Speed 2499.09 samples/sec Loss 1.5357 LearningRate 0.000131 Epoch: 26 Global Step: 559060 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:25:56,668-Speed 2497.91 samples/sec Loss 1.5729 LearningRate 0.000131 Epoch: 26 Global Step: 559070 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:26:04,866-Speed 2498.71 samples/sec Loss 1.5665 LearningRate 0.000131 Epoch: 26 Global Step: 559080 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:26:13,015-Speed 2513.38 samples/sec Loss 1.5615 LearningRate 0.000131 Epoch: 26 Global Step: 559090 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:26:21,215-Speed 2498.07 samples/sec Loss 1.5790 LearningRate 0.000131 Epoch: 26 Global Step: 559100 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:26:29,417-Speed 2497.67 samples/sec Loss 1.5603 LearningRate 0.000131 Epoch: 26 Global Step: 559110 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:26:37,617-Speed 2497.90 samples/sec Loss 1.5134 LearningRate 0.000131 Epoch: 26 Global Step: 559120 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:26:45,817-Speed 2498.19 samples/sec Loss 1.5477 LearningRate 
0.000131 Epoch: 26 Global Step: 559130 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:26:54,019-Speed 2497.15 samples/sec Loss 1.5312 LearningRate 0.000131 Epoch: 26 Global Step: 559140 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:27:02,166-Speed 2514.09 samples/sec Loss 1.5668 LearningRate 0.000131 Epoch: 26 Global Step: 559150 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:27:10,373-Speed 2496.26 samples/sec Loss 1.5318 LearningRate 0.000131 Epoch: 26 Global Step: 559160 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:27:18,571-Speed 2498.51 samples/sec Loss 1.5426 LearningRate 0.000131 Epoch: 26 Global Step: 559170 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:27:26,774-Speed 2496.88 samples/sec Loss 1.5555 LearningRate 0.000131 Epoch: 26 Global Step: 559180 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:27:34,976-Speed 2497.44 samples/sec Loss 1.5545 LearningRate 0.000131 Epoch: 26 Global Step: 559190 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:27:43,178-Speed 2497.26 samples/sec Loss 1.5554 LearningRate 0.000131 Epoch: 26 Global Step: 559200 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:27:51,325-Speed 2514.34 samples/sec Loss 1.5267 LearningRate 0.000131 Epoch: 26 Global Step: 559210 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:27:59,532-Speed 2495.87 samples/sec Loss 1.5507 LearningRate 0.000131 Epoch: 26 Global Step: 559220 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:28:07,735-Speed 2497.23 samples/sec Loss 1.5644 LearningRate 0.000131 Epoch: 26 Global Step: 559230 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:28:15,934-Speed 2498.17 samples/sec Loss 1.5572 LearningRate 0.000131 Epoch: 26 Global Step: 559240 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:28:24,136-Speed 2497.45 samples/sec Loss 1.5225 LearningRate 0.000131 Epoch: 26 
Global Step: 559250 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:28:32,338-Speed 2497.05 samples/sec Loss 1.5197 LearningRate 0.000131 Epoch: 26 Global Step: 559260 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:28:40,491-Speed 2512.70 samples/sec Loss 1.5441 LearningRate 0.000131 Epoch: 26 Global Step: 559270 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:28:48,690-Speed 2498.24 samples/sec Loss 1.5466 LearningRate 0.000131 Epoch: 26 Global Step: 559280 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:28:56,890-Speed 2498.01 samples/sec Loss 1.5687 LearningRate 0.000131 Epoch: 26 Global Step: 559290 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:29:05,092-Speed 2497.33 samples/sec Loss 1.5379 LearningRate 0.000131 Epoch: 26 Global Step: 559300 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:29:13,294-Speed 2497.51 samples/sec Loss 1.5597 LearningRate 0.000131 Epoch: 26 Global Step: 559310 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:29:21,498-Speed 2496.83 samples/sec Loss 1.5740 LearningRate 0.000131 Epoch: 26 Global Step: 559320 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:29:29,656-Speed 2510.65 samples/sec Loss 1.5210 LearningRate 0.000131 Epoch: 26 Global Step: 559330 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:29:37,858-Speed 2497.21 samples/sec Loss 1.5486 LearningRate 0.000131 Epoch: 26 Global Step: 559340 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:29:46,057-Speed 2498.59 samples/sec Loss 1.5598 LearningRate 0.000131 Epoch: 26 Global Step: 559350 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:29:54,258-Speed 2497.49 samples/sec Loss 1.5752 LearningRate 0.000131 Epoch: 26 Global Step: 559360 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:30:02,459-Speed 2497.67 samples/sec Loss 1.6125 LearningRate 0.000131 Epoch: 26 Global Step: 559370 
Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:30:10,660-Speed 2498.28 samples/sec Loss 1.5284 LearningRate 0.000131 Epoch: 26 Global Step: 559380 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:30:18,809-Speed 2513.40 samples/sec Loss 1.5746 LearningRate 0.000131 Epoch: 26 Global Step: 559390 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:30:27,024-Speed 2493.35 samples/sec Loss 1.5397 LearningRate 0.000131 Epoch: 26 Global Step: 559400 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:30:35,225-Speed 2497.57 samples/sec Loss 1.5219 LearningRate 0.000131 Epoch: 26 Global Step: 559410 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:30:43,426-Speed 2498.04 samples/sec Loss 1.5484 LearningRate 0.000131 Epoch: 26 Global Step: 559420 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:30:51,641-Speed 2493.31 samples/sec Loss 1.5800 LearningRate 0.000131 Epoch: 26 Global Step: 559430 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:30:59,843-Speed 2497.40 samples/sec Loss 1.5550 LearningRate 0.000131 Epoch: 26 Global Step: 559440 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:31:07,998-Speed 2511.95 samples/sec Loss 1.5298 LearningRate 0.000131 Epoch: 26 Global Step: 559450 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:31:16,197-Speed 2498.40 samples/sec Loss 1.5666 LearningRate 0.000131 Epoch: 26 Global Step: 559460 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:31:24,398-Speed 2497.67 samples/sec Loss 1.5505 LearningRate 0.000131 Epoch: 26 Global Step: 559470 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:31:32,612-Speed 2493.62 samples/sec Loss 1.5537 LearningRate 0.000131 Epoch: 26 Global Step: 559480 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:31:40,822-Speed 2495.11 samples/sec Loss 1.5289 LearningRate 0.000131 Epoch: 26 Global Step: 559490 Fp16 Grad Scale: 
8192 Required: 62 hours Training: 2022-07-10 22:31:49,027-Speed 2496.60 samples/sec Loss 1.5469 LearningRate 0.000131 Epoch: 26 Global Step: 559500 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:31:57,172-Speed 2514.84 samples/sec Loss 1.5435 LearningRate 0.000131 Epoch: 26 Global Step: 559510 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:32:05,375-Speed 2497.20 samples/sec Loss 1.5397 LearningRate 0.000131 Epoch: 26 Global Step: 559520 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:32:13,578-Speed 2497.12 samples/sec Loss 1.5793 LearningRate 0.000131 Epoch: 26 Global Step: 559530 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:32:21,779-Speed 2498.03 samples/sec Loss 1.5447 LearningRate 0.000131 Epoch: 26 Global Step: 559540 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:32:29,976-Speed 2498.82 samples/sec Loss 1.5130 LearningRate 0.000131 Epoch: 26 Global Step: 559550 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:32:38,185-Speed 2495.10 samples/sec Loss 1.5322 LearningRate 0.000131 Epoch: 26 Global Step: 559560 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:32:46,340-Speed 2511.82 samples/sec Loss 1.5757 LearningRate 0.000131 Epoch: 26 Global Step: 559570 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:32:54,538-Speed 2498.48 samples/sec Loss 1.5565 LearningRate 0.000131 Epoch: 26 Global Step: 559580 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:33:02,739-Speed 2497.84 samples/sec Loss 1.5499 LearningRate 0.000131 Epoch: 26 Global Step: 559590 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:33:10,935-Speed 2499.06 samples/sec Loss 1.5657 LearningRate 0.000131 Epoch: 26 Global Step: 559600 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:33:19,135-Speed 2498.39 samples/sec Loss 1.5315 LearningRate 0.000131 Epoch: 26 Global Step: 559610 Fp16 Grad Scale: 8192 Required: 62 
hours Training: 2022-07-10 22:33:27,343-Speed 2495.46 samples/sec Loss 1.5714 LearningRate 0.000131 Epoch: 26 Global Step: 559620 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:33:35,489-Speed 2514.54 samples/sec Loss 1.5390 LearningRate 0.000131 Epoch: 26 Global Step: 559630 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:33:43,685-Speed 2499.31 samples/sec Loss 1.5469 LearningRate 0.000131 Epoch: 26 Global Step: 559640 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:33:51,886-Speed 2497.58 samples/sec Loss 1.5413 LearningRate 0.000131 Epoch: 26 Global Step: 559650 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-07-10 22:34:00,091-Speed 2496.39 samples/sec Loss 1.5502 LearningRate 0.000131 Epoch: 26 Global Step: 559660 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:34:08,286-Speed 2499.56 samples/sec Loss 1.5498 LearningRate 0.000131 Epoch: 26 Global Step: 559670 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:34:16,499-Speed 2494.10 samples/sec Loss 1.5341 LearningRate 0.000131 Epoch: 26 Global Step: 559680 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:34:24,658-Speed 2510.74 samples/sec Loss 1.5909 LearningRate 0.000131 Epoch: 26 Global Step: 559690 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:34:32,860-Speed 2497.24 samples/sec Loss 1.5650 LearningRate 0.000131 Epoch: 26 Global Step: 559700 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:34:41,066-Speed 2496.30 samples/sec Loss 1.5402 LearningRate 0.000131 Epoch: 26 Global Step: 559710 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:34:49,269-Speed 2497.25 samples/sec Loss 1.5360 LearningRate 0.000131 Epoch: 26 Global Step: 559720 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:34:57,472-Speed 2496.92 samples/sec Loss 1.5380 LearningRate 0.000131 Epoch: 26 Global Step: 559730 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:35:05,690-Speed 2492.43 samples/sec Loss 1.5536 LearningRate 0.000131 Epoch: 26 Global Step: 559740 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:35:13,839-Speed 2515.21 samples/sec Loss 1.5658 LearningRate 0.000131 Epoch: 26 Global Step: 559750 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:35:22,047-Speed 2495.71 samples/sec Loss 1.5417 LearningRate 0.000131 Epoch: 26 Global Step: 559760 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:35:30,242-Speed 2499.23 samples/sec Loss 1.5781 LearningRate 0.000131 Epoch: 26 Global Step: 559770 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:35:38,454-Speed 2494.30 samples/sec Loss 1.6025 LearningRate 0.000131 Epoch: 26 Global Step: 559780 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:35:46,656-Speed 2497.59 samples/sec Loss 1.5627 LearningRate 0.000131 Epoch: 26 Global Step: 559790 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:35:54,858-Speed 2497.24 samples/sec Loss 1.5471 LearningRate 0.000131 Epoch: 26 Global Step: 559800 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:36:03,017-Speed 2510.36 samples/sec Loss 1.5598 LearningRate 0.000131 Epoch: 26 Global Step: 559810 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:36:11,218-Speed 2497.60 samples/sec Loss 1.5478 LearningRate 0.000131 Epoch: 26 Global Step: 559820 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:36:19,417-Speed 2499.12 samples/sec Loss 1.5397 LearningRate 0.000131 Epoch: 26 Global Step: 559830 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:36:27,618-Speed 2497.72 samples/sec Loss 1.5709 LearningRate 0.000131 Epoch: 26 Global Step: 559840 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:36:35,831-Speed 2493.69 samples/sec Loss 1.5433 LearningRate 0.000131 Epoch: 26 Global Step: 559850 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:36:44,033-Speed 2497.48 samples/sec Loss 1.5276 LearningRate 0.000130 Epoch: 26 Global Step: 559860 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:36:52,199-Speed 2508.70 samples/sec Loss 1.5421 LearningRate 0.000130 Epoch: 26 Global Step: 559870 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:37:00,400-Speed 2497.61 samples/sec Loss 1.5672 LearningRate 0.000130 Epoch: 26 Global Step: 559880 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:37:08,603-Speed 2497.16 samples/sec Loss 1.5462 LearningRate 0.000130 Epoch: 26 Global Step: 559890 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:37:16,816-Speed 2493.87 samples/sec Loss 1.5726 LearningRate 0.000130 Epoch: 26 Global Step: 559900 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:37:25,023-Speed 2495.83 samples/sec Loss 1.5596 LearningRate 0.000130 Epoch: 26 Global Step: 559910 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:37:33,222-Speed 2498.37 samples/sec Loss 1.5098 LearningRate 0.000130 Epoch: 26 Global Step: 559920 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:37:41,367-Speed 2514.94 samples/sec Loss 1.5516 LearningRate 0.000130 Epoch: 26 Global Step: 559930 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:37:49,567-Speed 2497.67 samples/sec Loss 1.5375 LearningRate 0.000130 Epoch: 26 Global Step: 559940 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:37:57,770-Speed 2497.16 samples/sec Loss 1.5585 LearningRate 0.000130 Epoch: 26 Global Step: 559950 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:38:05,969-Speed 2498.49 samples/sec Loss 1.5832 LearningRate 0.000130 Epoch: 26 Global Step: 559960 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:38:16,311-Speed 1980.41 samples/sec Loss 1.5541 LearningRate 0.000130 Epoch: 27 Global Step: 559970 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:38:24,509-Speed 2498.74 samples/sec Loss 1.5395 LearningRate 0.000130 Epoch: 27 Global Step: 559980 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:38:32,652-Speed 2515.71 samples/sec Loss 1.5509 LearningRate 0.000130 Epoch: 27 Global Step: 559990 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:38:40,856-Speed 2496.56 samples/sec Loss 1.5233 LearningRate 0.000130 Epoch: 27 Global Step: 560000 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:38:49,062-Speed 2496.40 samples/sec Loss 1.5588 LearningRate 0.000130 Epoch: 27 Global Step: 560010 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:38:57,266-Speed 2496.70 samples/sec Loss 1.5214 LearningRate 0.000130 Epoch: 27 Global Step: 560020 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:39:05,469-Speed 2497.39 samples/sec Loss 1.5783 LearningRate 0.000130 Epoch: 27 Global Step: 560030 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:39:13,673-Speed 2496.69 samples/sec Loss 1.5806 LearningRate 0.000130 Epoch: 27 Global Step: 560040 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:39:21,823-Speed 2513.61 samples/sec Loss 1.5675 LearningRate 0.000130 Epoch: 27 Global Step: 560050 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:39:30,028-Speed 2496.46 samples/sec Loss 1.5356 LearningRate 0.000130 Epoch: 27 Global Step: 560060 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:39:38,232-Speed 2496.81 samples/sec Loss 1.5344 LearningRate 0.000130 Epoch: 27 Global Step: 560070 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:39:46,445-Speed 2494.06 samples/sec Loss 1.5819 LearningRate 0.000130 Epoch: 27 Global Step: 560080 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:39:54,651-Speed 2496.19 samples/sec Loss 1.5598 LearningRate 0.000130 Epoch: 27 Global Step: 560090 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:40:02,856-Speed 2496.35 samples/sec Loss 1.5217 LearningRate 0.000130 Epoch: 27 Global Step: 560100 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:40:11,004-Speed 2513.86 samples/sec Loss 1.4876 LearningRate 0.000130 Epoch: 27 Global Step: 560110 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:40:19,206-Speed 2497.38 samples/sec Loss 1.5556 LearningRate 0.000130 Epoch: 27 Global Step: 560120 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:40:27,407-Speed 2497.86 samples/sec Loss 1.5326 LearningRate 0.000130 Epoch: 27 Global Step: 560130 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:40:35,608-Speed 2497.59 samples/sec Loss 1.5142 LearningRate 0.000130 Epoch: 27 Global Step: 560140 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:40:43,810-Speed 2497.26 samples/sec Loss 1.5268 LearningRate 0.000130 Epoch: 27 Global Step: 560150 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:40:52,014-Speed 2496.86 samples/sec Loss 1.5418 LearningRate 0.000130 Epoch: 27 Global Step: 560160 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:41:00,173-Speed 2510.54 samples/sec Loss 1.5482 LearningRate 0.000130 Epoch: 27 Global Step: 560170 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:41:08,376-Speed 2496.94 samples/sec Loss 1.5388 LearningRate 0.000130 Epoch: 27 Global Step: 560180 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:41:16,575-Speed 2498.35 samples/sec Loss 1.5311 LearningRate 0.000130 Epoch: 27 Global Step: 560190 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:41:24,780-Speed 2496.22 samples/sec Loss 1.5105 LearningRate 0.000130 Epoch: 27 Global Step: 560200 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:41:32,989-Speed 2495.26 samples/sec Loss 1.5220 LearningRate 0.000130 Epoch: 27 Global Step: 560210 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:41:41,193-Speed 2496.83 samples/sec Loss 1.5421 LearningRate 0.000130 Epoch: 27 Global Step: 560220 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:41:49,343-Speed 2513.55 samples/sec Loss 1.5147 LearningRate 0.000130 Epoch: 27 Global Step: 560230 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:41:57,543-Speed 2497.67 samples/sec Loss 1.5507 LearningRate 0.000130 Epoch: 27 Global Step: 560240 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:42:05,743-Speed 2497.98 samples/sec Loss 1.5371 LearningRate 0.000130 Epoch: 27 Global Step: 560250 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:42:13,945-Speed 2497.47 samples/sec Loss 1.5070 LearningRate 0.000130 Epoch: 27 Global Step: 560260 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:42:22,147-Speed 2497.44 samples/sec Loss 1.4936 LearningRate 0.000130 Epoch: 27 Global Step: 560270 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:42:30,348-Speed 2497.56 samples/sec Loss 1.5534 LearningRate 0.000130 Epoch: 27 Global Step: 560280 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:42:38,498-Speed 2513.35 samples/sec Loss 1.5300 LearningRate 0.000130 Epoch: 27 Global Step: 560290 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:42:46,704-Speed 2496.12 samples/sec Loss 1.5661 LearningRate 0.000130 Epoch: 27 Global Step: 560300 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:42:54,913-Speed 2495.18 samples/sec Loss 1.5249 LearningRate 0.000130 Epoch: 27 Global Step: 560310 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:43:03,116-Speed 2497.23 samples/sec Loss 1.5075 LearningRate 0.000130 Epoch: 27 Global Step: 560320 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:43:11,316-Speed 2497.74 samples/sec Loss 1.5421 LearningRate 0.000130 Epoch: 27 Global Step: 560330 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:43:19,521-Speed 2496.39 samples/sec Loss 1.5619 LearningRate 0.000130 Epoch: 27 Global Step: 560340 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:43:27,675-Speed 2512.13 samples/sec Loss 1.5206 LearningRate 0.000130 Epoch: 27 Global Step: 560350 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:43:35,878-Speed 2497.06 samples/sec Loss 1.5166 LearningRate 0.000130 Epoch: 27 Global Step: 560360 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:43:44,090-Speed 2494.27 samples/sec Loss 1.5051 LearningRate 0.000130 Epoch: 27 Global Step: 560370 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:43:52,292-Speed 2497.53 samples/sec Loss 1.5461 LearningRate 0.000130 Epoch: 27 Global Step: 560380 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:44:00,496-Speed 2496.75 samples/sec Loss 1.5301 LearningRate 0.000130 Epoch: 27 Global Step: 560390 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:44:08,699-Speed 2496.97 samples/sec Loss 1.5476 LearningRate 0.000130 Epoch: 27 Global Step: 560400 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:44:16,849-Speed 2513.62 samples/sec Loss 1.5161 LearningRate 0.000130 Epoch: 27 Global Step: 560410 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:44:25,072-Speed 2490.81 samples/sec Loss 1.5405 LearningRate 0.000130 Epoch: 27 Global Step: 560420 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:44:33,289-Speed 2492.99 samples/sec Loss 1.5256 LearningRate 0.000130 Epoch: 27 Global Step: 560430 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:44:41,493-Speed 2496.94 samples/sec Loss 1.5501 LearningRate 0.000130 Epoch: 27 Global Step: 560440 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:44:49,698-Speed 2496.30 samples/sec Loss 1.5208 LearningRate 0.000130 Epoch: 27 Global Step: 560450 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:44:57,915-Speed 2492.76 samples/sec Loss 1.5002 LearningRate 0.000130 Epoch: 27 Global Step: 560460 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:45:06,067-Speed 2512.64 samples/sec Loss 1.5430 LearningRate 0.000130 Epoch: 27 Global Step: 560470 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:45:14,268-Speed 2497.57 samples/sec Loss 1.5396 LearningRate 0.000130 Epoch: 27 Global Step: 560480 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:45:22,470-Speed 2497.89 samples/sec Loss 1.5392 LearningRate 0.000130 Epoch: 27 Global Step: 560490 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:45:30,673-Speed 2496.75 samples/sec Loss 1.5223 LearningRate 0.000130 Epoch: 27 Global Step: 560500 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:45:38,877-Speed 2496.92 samples/sec Loss 1.5268 LearningRate 0.000130 Epoch: 27 Global Step: 560510 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:45:47,082-Speed 2496.24 samples/sec Loss 1.5096 LearningRate 0.000130 Epoch: 27 Global Step: 560520 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:45:55,232-Speed 2513.55 samples/sec Loss 1.5183 LearningRate 0.000130 Epoch: 27 Global Step: 560530 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:46:03,440-Speed 2495.62 samples/sec Loss 1.5466 LearningRate 0.000130 Epoch: 27 Global Step: 560540 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:46:11,647-Speed 2495.66 samples/sec Loss 1.5407 LearningRate 0.000130 Epoch: 27 Global Step: 560550 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:46:19,849-Speed 2497.52 samples/sec Loss 1.5480 LearningRate 0.000130 Epoch: 27 Global Step: 560560 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:46:28,053-Speed 2496.79 samples/sec Loss 1.5124 LearningRate 0.000130 Epoch: 27 Global Step: 560570 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:46:36,256-Speed 2497.14 samples/sec Loss 1.5259 LearningRate 0.000130 Epoch: 27 Global Step: 560580 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:46:44,404-Speed 2514.06 samples/sec Loss 1.5151 LearningRate 0.000130 Epoch: 27 Global Step: 560590 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:46:52,606-Speed 2497.34 samples/sec Loss 1.5524 LearningRate 0.000130 Epoch: 27 Global Step: 560600 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:47:00,810-Speed 2496.53 samples/sec Loss 1.5552 LearningRate 0.000130 Epoch: 27 Global Step: 560610 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:47:09,013-Speed 2497.23 samples/sec Loss 1.5895 LearningRate 0.000130 Epoch: 27 Global Step: 560620 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:47:17,217-Speed 2496.76 samples/sec Loss 1.5517 LearningRate 0.000130 Epoch: 27 Global Step: 560630 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:47:25,421-Speed 2497.11 samples/sec Loss 1.5405 LearningRate 0.000130 Epoch: 27 Global Step: 560640 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:47:33,571-Speed 2513.21 samples/sec Loss 1.5298 LearningRate 0.000130 Epoch: 27 Global Step: 560650 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:47:41,773-Speed 2497.55 samples/sec Loss 1.5586 LearningRate 0.000130 Epoch: 27 Global Step: 560660 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:47:49,975-Speed 2497.54 samples/sec Loss 1.5829 LearningRate 0.000130 Epoch: 27 Global Step: 560670 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:47:58,191-Speed 2493.41 samples/sec Loss 1.5589 LearningRate 0.000130 Epoch: 27 Global Step: 560680 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:48:06,395-Speed 2496.84 samples/sec Loss 1.5558 LearningRate 0.000130 Epoch: 27 Global Step: 560690 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:48:14,593-Speed 2498.32 samples/sec Loss 1.5514 LearningRate 0.000130 Epoch: 27 Global Step: 560700 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:48:22,741-Speed 2514.30 samples/sec Loss 1.5865 LearningRate 0.000130 Epoch: 27 Global Step: 560710 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:48:30,955-Speed 2493.88 samples/sec Loss 1.5419 LearningRate 0.000130 Epoch: 27 Global Step: 560720 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:48:39,160-Speed 2496.38 samples/sec Loss 1.5244 LearningRate 0.000130 Epoch: 27 Global Step: 560730 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:48:47,361-Speed 2497.41 samples/sec Loss 1.5590 LearningRate 0.000130 Epoch: 27 Global Step: 560740 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:48:55,562-Speed 2497.88 samples/sec Loss 1.5584 LearningRate 0.000130 Epoch: 27 Global Step: 560750 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:49:03,762-Speed 2497.73 samples/sec Loss 1.5474 LearningRate 0.000130 Epoch: 27 Global Step: 560760 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:49:11,914-Speed 2512.73 samples/sec Loss 1.5648 LearningRate 0.000130 Epoch: 27 Global Step: 560770 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:49:20,122-Speed 2495.52 samples/sec Loss 1.5254 LearningRate 0.000130 Epoch: 27 Global Step: 560780 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:49:28,324-Speed 2497.41 samples/sec Loss 1.5693 LearningRate 0.000130 Epoch: 27 Global Step: 560790 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:49:36,525-Speed 2497.61 samples/sec Loss 1.5566 LearningRate 0.000130 Epoch: 27 Global Step: 560800 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:49:44,733-Speed 2495.23 samples/sec Loss 1.5260 LearningRate 0.000130 Epoch: 27 Global Step: 560810 Fp16 Grad Scale: 16384 Required: 62 hours 
Training: 2022-07-10 22:49:52,939-Speed 2495.95 samples/sec Loss 1.5275 LearningRate 0.000130 Epoch: 27 Global Step: 560820 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:50:01,086-Speed 2514.29 samples/sec Loss 1.5659 LearningRate 0.000130 Epoch: 27 Global Step: 560830 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:50:09,293-Speed 2495.89 samples/sec Loss 1.5088 LearningRate 0.000130 Epoch: 27 Global Step: 560840 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:50:17,493-Speed 2497.92 samples/sec Loss 1.5555 LearningRate 0.000130 Epoch: 27 Global Step: 560850 Fp16 Grad Scale: 16384 Required: 62 hours Training: 2022-07-10 22:50:25,693-Speed 2498.16 samples/sec Loss 1.5610 LearningRate 0.000130 Epoch: 27 Global Step: 560860 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-07-10 22:50:33,894-Speed 2497.65 samples/sec Loss 1.5589 LearningRate 0.000130 Epoch: 27 Global Step: 560870 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-07-10 22:50:42,094-Speed 2498.17 samples/sec Loss 1.5499 LearningRate 0.000130 Epoch: 27 Global Step: 560880 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-07-10 22:50:50,241-Speed 2514.36 samples/sec Loss 1.5380 LearningRate 0.000129 Epoch: 27 Global Step: 560890 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-07-10 22:50:58,463-Speed 2491.46 samples/sec Loss 1.5718 LearningRate 0.000129 Epoch: 27 Global Step: 560900 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-07-10 22:51:06,662-Speed 2498.34 samples/sec Loss 1.5281 LearningRate 0.000129 Epoch: 27 Global Step: 560910 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-07-10 22:51:14,871-Speed 2495.01 samples/sec Loss 1.5718 LearningRate 0.000129 Epoch: 27 Global Step: 560920 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-07-10 22:51:23,071-Speed 2498.24 samples/sec Loss 1.5915 LearningRate 0.000129 Epoch: 27 Global Step: 560930 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 22:51:31,274-Speed 2496.87 samples/sec Loss 1.5789 LearningRate 0.000129 Epoch: 27 Global Step: 560940 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:51:39,419-Speed 2514.69 samples/sec Loss 1.5733 LearningRate 0.000129 Epoch: 27 Global Step: 560950 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:51:47,632-Speed 2494.40 samples/sec Loss 1.5475 LearningRate 0.000129 Epoch: 27 Global Step: 560960 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:51:55,836-Speed 2497.03 samples/sec Loss 1.5630 LearningRate 0.000129 Epoch: 27 Global Step: 560970 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:52:04,037-Speed 2498.19 samples/sec Loss 1.5652 LearningRate 0.000129 Epoch: 27 Global Step: 560980 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:52:12,235-Speed 2498.22 samples/sec Loss 1.5902 LearningRate 0.000129 Epoch: 27 Global Step: 560990 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:52:20,436-Speed 2497.97 samples/sec Loss 1.5647 LearningRate 0.000129 Epoch: 27 Global Step: 561000 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:52:28,586-Speed 2513.20 samples/sec Loss 1.5185 LearningRate 0.000129 Epoch: 27 Global Step: 561010 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:52:36,788-Speed 2497.57 samples/sec Loss 1.5589 LearningRate 0.000129 Epoch: 27 Global Step: 561020 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:52:44,988-Speed 2497.99 samples/sec Loss 1.5554 LearningRate 0.000129 Epoch: 27 Global Step: 561030 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:52:53,189-Speed 2497.43 samples/sec Loss 1.5356 LearningRate 0.000129 Epoch: 27 Global Step: 561040 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:53:01,391-Speed 2497.62 samples/sec Loss 1.5703 LearningRate 0.000129 Epoch: 27 Global Step: 561050 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 22:53:09,605-Speed 2493.80 samples/sec Loss 1.5457 LearningRate 0.000129 Epoch: 27 Global Step: 561060 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:53:17,758-Speed 2512.32 samples/sec Loss 1.5582 LearningRate 0.000129 Epoch: 27 Global Step: 561070 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:53:25,961-Speed 2497.50 samples/sec Loss 1.5579 LearningRate 0.000129 Epoch: 27 Global Step: 561080 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:53:34,163-Speed 2497.34 samples/sec Loss 1.5539 LearningRate 0.000129 Epoch: 27 Global Step: 561090 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:53:42,364-Speed 2498.00 samples/sec Loss 1.5358 LearningRate 0.000129 Epoch: 27 Global Step: 561100 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:53:50,565-Speed 2497.53 samples/sec Loss 1.5286 LearningRate 0.000129 Epoch: 27 Global Step: 561110 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:53:58,769-Speed 2496.68 samples/sec Loss 1.5908 LearningRate 0.000129 Epoch: 27 Global Step: 561120 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:54:06,919-Speed 2513.54 samples/sec Loss 1.5554 LearningRate 0.000129 Epoch: 27 Global Step: 561130 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:54:15,116-Speed 2498.86 samples/sec Loss 1.5303 LearningRate 0.000129 Epoch: 27 Global Step: 561140 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:54:23,319-Speed 2496.96 samples/sec Loss 1.5686 LearningRate 0.000129 Epoch: 27 Global Step: 561150 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:54:31,525-Speed 2496.19 samples/sec Loss 1.5272 LearningRate 0.000129 Epoch: 27 Global Step: 561160 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:54:39,724-Speed 2498.37 samples/sec Loss 1.4966 LearningRate 0.000129 Epoch: 27 Global Step: 561170 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 22:54:47,925-Speed 2497.47 samples/sec Loss 1.5326 LearningRate 0.000129 Epoch: 27 Global Step: 561180 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:54:56,075-Speed 2513.40 samples/sec Loss 1.5313 LearningRate 0.000129 Epoch: 27 Global Step: 561190 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:55:04,275-Speed 2497.89 samples/sec Loss 1.5329 LearningRate 0.000129 Epoch: 27 Global Step: 561200 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:55:12,477-Speed 2497.37 samples/sec Loss 1.5292 LearningRate 0.000129 Epoch: 27 Global Step: 561210 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:55:20,678-Speed 2497.56 samples/sec Loss 1.5659 LearningRate 0.000129 Epoch: 27 Global Step: 561220 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:55:28,890-Speed 2494.61 samples/sec Loss 1.5027 LearningRate 0.000129 Epoch: 27 Global Step: 561230 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:55:37,091-Speed 2497.75 samples/sec Loss 1.5440 LearningRate 0.000129 Epoch: 27 Global Step: 561240 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:55:45,247-Speed 2511.69 samples/sec Loss 1.5415 LearningRate 0.000129 Epoch: 27 Global Step: 561250 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:55:53,453-Speed 2495.83 samples/sec Loss 1.5421 LearningRate 0.000129 Epoch: 27 Global Step: 561260 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:56:01,654-Speed 2497.69 samples/sec Loss 1.4957 LearningRate 0.000129 Epoch: 27 Global Step: 561270 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:56:09,868-Speed 2493.66 samples/sec Loss 1.5335 LearningRate 0.000129 Epoch: 27 Global Step: 561280 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:56:18,076-Speed 2495.39 samples/sec Loss 1.5228 LearningRate 0.000129 Epoch: 27 Global Step: 561290 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 22:56:26,279-Speed 2497.31 samples/sec Loss 1.4683 LearningRate 0.000129 Epoch: 27 Global Step: 561300 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:56:34,427-Speed 2514.01 samples/sec Loss 1.5372 LearningRate 0.000129 Epoch: 27 Global Step: 561310 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:56:42,624-Speed 2498.69 samples/sec Loss 1.5316 LearningRate 0.000129 Epoch: 27 Global Step: 561320 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:56:50,827-Speed 2497.41 samples/sec Loss 1.4989 LearningRate 0.000129 Epoch: 27 Global Step: 561330 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:56:59,025-Speed 2498.51 samples/sec Loss 1.5597 LearningRate 0.000129 Epoch: 27 Global Step: 561340 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:57:07,228-Speed 2497.14 samples/sec Loss 1.5351 LearningRate 0.000129 Epoch: 27 Global Step: 561350 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:57:15,428-Speed 2497.91 samples/sec Loss 1.4975 LearningRate 0.000129 Epoch: 27 Global Step: 561360 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:57:23,580-Speed 2512.77 samples/sec Loss 1.5555 LearningRate 0.000129 Epoch: 27 Global Step: 561370 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:57:31,785-Speed 2496.47 samples/sec Loss 1.5209 LearningRate 0.000129 Epoch: 27 Global Step: 561380 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:57:39,985-Speed 2497.64 samples/sec Loss 1.5551 LearningRate 0.000129 Epoch: 27 Global Step: 561390 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:57:48,193-Speed 2495.57 samples/sec Loss 1.5360 LearningRate 0.000129 Epoch: 27 Global Step: 561400 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:57:56,395-Speed 2497.73 samples/sec Loss 1.5088 LearningRate 0.000129 Epoch: 27 Global Step: 561410 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 22:58:04,594-Speed 2498.21 samples/sec Loss 1.4951 LearningRate 0.000129 Epoch: 27 Global Step: 561420 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:58:12,741-Speed 2513.93 samples/sec Loss 1.5163 LearningRate 0.000129 Epoch: 27 Global Step: 561430 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:58:20,942-Speed 2497.97 samples/sec Loss 1.5095 LearningRate 0.000129 Epoch: 27 Global Step: 561440 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:58:29,142-Speed 2497.88 samples/sec Loss 1.5320 LearningRate 0.000129 Epoch: 27 Global Step: 561450 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:58:37,350-Speed 2495.57 samples/sec Loss 1.5053 LearningRate 0.000129 Epoch: 27 Global Step: 561460 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:58:45,548-Speed 2498.42 samples/sec Loss 1.5352 LearningRate 0.000129 Epoch: 27 Global Step: 561470 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:58:53,752-Speed 2496.94 samples/sec Loss 1.5503 LearningRate 0.000129 Epoch: 27 Global Step: 561480 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:59:01,898-Speed 2514.63 samples/sec Loss 1.5250 LearningRate 0.000129 Epoch: 27 Global Step: 561490 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:59:10,100-Speed 2497.35 samples/sec Loss 1.4983 LearningRate 0.000129 Epoch: 27 Global Step: 561500 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:59:18,301-Speed 2497.56 samples/sec Loss 1.5167 LearningRate 0.000129 Epoch: 27 Global Step: 561510 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:59:26,501-Speed 2498.08 samples/sec Loss 1.5320 LearningRate 0.000129 Epoch: 27 Global Step: 561520 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:59:34,704-Speed 2497.08 samples/sec Loss 1.5228 LearningRate 0.000129 Epoch: 27 Global Step: 561530 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 22:59:42,915-Speed 2494.61 samples/sec Loss 1.5072 LearningRate 0.000129 Epoch: 27 Global Step: 561540 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:59:51,067-Speed 2513.25 samples/sec Loss 1.5346 LearningRate 0.000129 Epoch: 27 Global Step: 561550 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 22:59:59,270-Speed 2497.08 samples/sec Loss 1.5594 LearningRate 0.000129 Epoch: 27 Global Step: 561560 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:00:07,475-Speed 2496.32 samples/sec Loss 1.5302 LearningRate 0.000129 Epoch: 27 Global Step: 561570 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:00:15,678-Speed 2497.29 samples/sec Loss 1.5378 LearningRate 0.000129 Epoch: 27 Global Step: 561580 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:00:23,880-Speed 2497.08 samples/sec Loss 1.5541 LearningRate 0.000129 Epoch: 27 Global Step: 561590 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:00:32,089-Speed 2495.55 samples/sec Loss 1.5238 LearningRate 0.000129 Epoch: 27 Global Step: 561600 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:00:40,247-Speed 2510.92 samples/sec Loss 1.5400 LearningRate 0.000129 Epoch: 27 Global Step: 561610 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:00:48,449-Speed 2497.16 samples/sec Loss 1.5395 LearningRate 0.000129 Epoch: 27 Global Step: 561620 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:00:56,652-Speed 2497.00 samples/sec Loss 1.5615 LearningRate 0.000129 Epoch: 27 Global Step: 561630 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:01:04,854-Speed 2497.41 samples/sec Loss 1.5216 LearningRate 0.000129 Epoch: 27 Global Step: 561640 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:01:13,055-Speed 2497.54 samples/sec Loss 1.5510 LearningRate 0.000129 Epoch: 27 Global Step: 561650 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:01:21,259-Speed 2496.84 samples/sec Loss 1.5468 LearningRate 0.000129 Epoch: 27 Global Step: 561660 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:01:29,420-Speed 2510.00 samples/sec Loss 1.5727 LearningRate 0.000129 Epoch: 27 Global Step: 561670 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:01:37,627-Speed 2495.98 samples/sec Loss 1.5348 LearningRate 0.000129 Epoch: 27 Global Step: 561680 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:01:45,830-Speed 2497.16 samples/sec Loss 1.5321 LearningRate 0.000129 Epoch: 27 Global Step: 561690 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:01:54,030-Speed 2497.73 samples/sec Loss 1.4999 LearningRate 0.000129 Epoch: 27 Global Step: 561700 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:02:02,230-Speed 2498.01 samples/sec Loss 1.5323 LearningRate 0.000129 Epoch: 27 Global Step: 561710 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:02:10,441-Speed 2494.71 samples/sec Loss 1.5425 LearningRate 0.000129 Epoch: 27 Global Step: 561720 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:02:18,592-Speed 2512.82 samples/sec Loss 1.5426 LearningRate 0.000129 Epoch: 27 Global Step: 561730 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:02:26,793-Speed 2497.71 samples/sec Loss 1.5303 LearningRate 0.000129 Epoch: 27 Global Step: 561740 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:02:35,001-Speed 2495.83 samples/sec Loss 1.5341 LearningRate 0.000129 Epoch: 27 Global Step: 561750 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:02:43,199-Speed 2498.53 samples/sec Loss 1.5496 LearningRate 0.000129 Epoch: 27 Global Step: 561760 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:02:51,402-Speed 2497.02 samples/sec Loss 1.5564 LearningRate 0.000129 Epoch: 27 Global Step: 561770 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:02:59,605-Speed 2497.22 samples/sec Loss 1.5129 LearningRate 0.000129 Epoch: 27 Global Step: 561780 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:03:07,752-Speed 2514.12 samples/sec Loss 1.5388 LearningRate 0.000129 Epoch: 27 Global Step: 561790 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:03:15,954-Speed 2497.69 samples/sec Loss 1.5541 LearningRate 0.000129 Epoch: 27 Global Step: 561800 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:03:24,156-Speed 2497.19 samples/sec Loss 1.5536 LearningRate 0.000129 Epoch: 27 Global Step: 561810 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:03:32,359-Speed 2496.85 samples/sec Loss 1.5449 LearningRate 0.000129 Epoch: 27 Global Step: 561820 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:03:40,564-Speed 2496.42 samples/sec Loss 1.5018 LearningRate 0.000129 Epoch: 27 Global Step: 561830 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:03:48,777-Speed 2494.02 samples/sec Loss 1.5375 LearningRate 0.000129 Epoch: 27 Global Step: 561840 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:03:56,936-Speed 2510.79 samples/sec Loss 1.5337 LearningRate 0.000129 Epoch: 27 Global Step: 561850 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:04:05,136-Speed 2497.71 samples/sec Loss 1.5016 LearningRate 0.000129 Epoch: 27 Global Step: 561860 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:04:13,338-Speed 2497.49 samples/sec Loss 1.5625 LearningRate 0.000129 Epoch: 27 Global Step: 561870 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:04:21,541-Speed 2496.97 samples/sec Loss 1.6043 LearningRate 0.000129 Epoch: 27 Global Step: 561880 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:04:29,740-Speed 2498.15 samples/sec Loss 1.5494 LearningRate 0.000129 Epoch: 27 Global Step: 561890 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:04:37,945-Speed 2496.60 samples/sec Loss 1.5345 LearningRate 0.000129 Epoch: 27 Global Step: 561900 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:04:46,090-Speed 2514.86 samples/sec Loss 1.5448 LearningRate 0.000129 Epoch: 27 Global Step: 561910 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:04:54,295-Speed 2496.41 samples/sec Loss 1.5262 LearningRate 0.000129 Epoch: 27 Global Step: 561920 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:05:02,519-Speed 2490.75 samples/sec Loss 1.5040 LearningRate 0.000128 Epoch: 27 Global Step: 561930 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:05:10,718-Speed 2498.06 samples/sec Loss 1.5634 LearningRate 0.000128 Epoch: 27 Global Step: 561940 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:05:18,919-Speed 2497.72 samples/sec Loss 1.5347 LearningRate 0.000128 Epoch: 27 Global Step: 561950 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:05:27,119-Speed 2497.70 samples/sec Loss 1.5241 LearningRate 0.000128 Epoch: 27 Global Step: 561960 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:05:35,267-Speed 2514.09 samples/sec Loss 1.5158 LearningRate 0.000128 Epoch: 27 Global Step: 561970 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:05:43,469-Speed 2497.21 samples/sec Loss 1.5502 LearningRate 0.000128 Epoch: 27 Global Step: 561980 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:05:51,672-Speed 2497.17 samples/sec Loss 1.5327 LearningRate 0.000128 Epoch: 27 Global Step: 561990 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:05:59,872-Speed 2497.68 samples/sec Loss 1.5331 LearningRate 0.000128 Epoch: 27 Global Step: 562000 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:06:08,080-Speed 2495.55 samples/sec Loss 1.5224 LearningRate 0.000128 Epoch: 27 Global Step: 562010 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:06:16,293-Speed 2494.12 samples/sec Loss 1.5321 LearningRate 0.000128 Epoch: 27 Global Step: 562020 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:06:24,438-Speed 2514.71 samples/sec Loss 1.5275 LearningRate 0.000128 Epoch: 27 Global Step: 562030 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:06:32,645-Speed 2495.68 samples/sec Loss 1.5195 LearningRate 0.000128 Epoch: 27 Global Step: 562040 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:06:40,857-Speed 2494.46 samples/sec Loss 1.4876 LearningRate 0.000128 Epoch: 27 Global Step: 562050 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:06:49,057-Speed 2497.88 samples/sec Loss 1.5273 LearningRate 0.000128 Epoch: 27 Global Step: 562060 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-07-10 23:06:57,258-Speed 2497.72 samples/sec Loss 1.5362 LearningRate 0.000128 Epoch: 27 Global Step: 562070 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-07-10 23:07:05,458-Speed 2497.90 samples/sec Loss 1.5288 LearningRate 0.000128 Epoch: 27 Global Step: 562080 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-07-10 23:07:13,605-Speed 2514.24 samples/sec Loss 1.5434 LearningRate 0.000128 Epoch: 27 Global Step: 562090 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-07-10 23:07:21,803-Speed 2498.48 samples/sec Loss 1.4843 LearningRate 0.000128 Epoch: 27 Global Step: 562100 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-07-10 23:07:30,009-Speed 2496.18 samples/sec Loss 1.5245 LearningRate 0.000128 Epoch: 27 Global Step: 562110 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-07-10 23:07:38,211-Speed 2497.46 samples/sec Loss 1.5121 LearningRate 0.000128 Epoch: 27 Global Step: 562120 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-07-10 23:07:46,417-Speed 2496.06 samples/sec Loss 1.5307 LearningRate 0.000128 Epoch: 27 Global Step: 562130 Fp16 Grad Scale: 65536 Required: 61 hours 
Training: 2022-07-10 23:07:54,618-Speed 2497.80 samples/sec Loss 1.5149 LearningRate 0.000128 Epoch: 27 Global Step: 562140 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-07-10 23:08:02,769-Speed 2512.72 samples/sec Loss 1.5374 LearningRate 0.000128 Epoch: 27 Global Step: 562150 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-07-10 23:08:10,969-Speed 2498.00 samples/sec Loss 1.5188 LearningRate 0.000128 Epoch: 27 Global Step: 562160 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-07-10 23:08:19,129-Speed 2510.30 samples/sec Loss 1.5845 LearningRate 0.000128 Epoch: 27 Global Step: 562170 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:08:27,330-Speed 2497.71 samples/sec Loss 1.4824 LearningRate 0.000128 Epoch: 27 Global Step: 562180 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:08:35,530-Speed 2498.08 samples/sec Loss 1.5154 LearningRate 0.000128 Epoch: 27 Global Step: 562190 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:08:43,732-Speed 2497.17 samples/sec Loss 1.5146 LearningRate 0.000128 Epoch: 27 Global Step: 562200 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:08:51,879-Speed 2514.27 samples/sec Loss 1.5434 LearningRate 0.000128 Epoch: 27 Global Step: 562210 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:09:00,076-Speed 2498.76 samples/sec Loss 1.5450 LearningRate 0.000128 Epoch: 27 Global Step: 562220 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:09:08,280-Speed 2496.77 samples/sec Loss 1.4921 LearningRate 0.000128 Epoch: 27 Global Step: 562230 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:09:16,478-Speed 2498.75 samples/sec Loss 1.5272 LearningRate 0.000128 Epoch: 27 Global Step: 562240 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:09:24,676-Speed 2498.50 samples/sec Loss 1.5782 LearningRate 0.000128 Epoch: 27 Global Step: 562250 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:09:32,875-Speed 2498.22 samples/sec Loss 1.5402 LearningRate 0.000128 Epoch: 27 Global Step: 562260 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:09:41,032-Speed 2511.07 samples/sec Loss 1.5236 LearningRate 0.000128 Epoch: 27 Global Step: 562270 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:09:49,238-Speed 2496.27 samples/sec Loss 1.5294 LearningRate 0.000128 Epoch: 27 Global Step: 562280 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:09:57,442-Speed 2496.86 samples/sec Loss 1.5379 LearningRate 0.000128 Epoch: 27 Global Step: 562290 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:10:05,642-Speed 2498.14 samples/sec Loss 1.5384 LearningRate 0.000128 Epoch: 27 Global Step: 562300 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:10:13,841-Speed 2498.09 samples/sec Loss 1.5137 LearningRate 0.000128 Epoch: 27 Global Step: 562310 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:10:22,039-Speed 2498.80 samples/sec Loss 1.5182 LearningRate 0.000128 Epoch: 27 Global Step: 562320 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:10:30,189-Speed 2513.35 samples/sec Loss 1.5310 LearningRate 0.000128 Epoch: 27 Global Step: 562330 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:10:38,396-Speed 2495.79 samples/sec Loss 1.5497 LearningRate 0.000128 Epoch: 27 Global Step: 562340 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:10:46,597-Speed 2497.63 samples/sec Loss 1.5521 LearningRate 0.000128 Epoch: 27 Global Step: 562350 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:10:54,799-Speed 2497.40 samples/sec Loss 1.5712 LearningRate 0.000128 Epoch: 27 Global Step: 562360 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:11:03,005-Speed 2496.44 samples/sec Loss 1.5535 LearningRate 0.000128 Epoch: 27 Global Step: 562370 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:11:11,206-Speed 2497.41 samples/sec Loss 1.5167 LearningRate 0.000128 Epoch: 27 Global Step: 562380 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:11:19,356-Speed 2513.12 samples/sec Loss 1.5809 LearningRate 0.000128 Epoch: 27 Global Step: 562390 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:11:27,568-Speed 2494.42 samples/sec Loss 1.5410 LearningRate 0.000128 Epoch: 27 Global Step: 562400 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:11:35,772-Speed 2496.97 samples/sec Loss 1.5347 LearningRate 0.000128 Epoch: 27 Global Step: 562410 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:11:43,974-Speed 2497.23 samples/sec Loss 1.5359 LearningRate 0.000128 Epoch: 27 Global Step: 562420 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:11:52,178-Speed 2496.79 samples/sec Loss 1.5504 LearningRate 0.000128 Epoch: 27 Global Step: 562430 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:12:00,390-Speed 2494.33 samples/sec Loss 1.5557 LearningRate 0.000128 Epoch: 27 Global Step: 562440 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:12:08,539-Speed 2513.60 samples/sec Loss 1.5189 LearningRate 0.000128 Epoch: 27 Global Step: 562450 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:12:16,739-Speed 2498.06 samples/sec Loss 1.5587 LearningRate 0.000128 Epoch: 27 Global Step: 562460 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:12:24,941-Speed 2497.21 samples/sec Loss 1.5718 LearningRate 0.000128 Epoch: 27 Global Step: 562470 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:12:33,140-Speed 2498.32 samples/sec Loss 1.5395 LearningRate 0.000128 Epoch: 27 Global Step: 562480 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:12:41,345-Speed 2496.42 samples/sec Loss 1.5393 LearningRate 0.000128 Epoch: 27 Global Step: 562490 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:12:49,546-Speed 2497.73 samples/sec Loss 1.5642 LearningRate 0.000128 Epoch: 27 Global Step: 562500 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:12:57,694-Speed 2513.99 samples/sec Loss 1.5370 LearningRate 0.000128 Epoch: 27 Global Step: 562510 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:13:05,897-Speed 2496.91 samples/sec Loss 1.5583 LearningRate 0.000128 Epoch: 27 Global Step: 562520 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:13:14,097-Speed 2497.95 samples/sec Loss 1.5131 LearningRate 0.000128 Epoch: 27 Global Step: 562530 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:13:22,300-Speed 2497.06 samples/sec Loss 1.5218 LearningRate 0.000128 Epoch: 27 Global Step: 562540 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:13:30,499-Speed 2498.21 samples/sec Loss 1.5525 LearningRate 0.000128 Epoch: 27 Global Step: 562550 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:13:38,701-Speed 2497.49 samples/sec Loss 1.5492 LearningRate 0.000128 Epoch: 27 Global Step: 562560 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:13:46,849-Speed 2513.75 samples/sec Loss 1.5317 LearningRate 0.000128 Epoch: 27 Global Step: 562570 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:13:55,057-Speed 2495.68 samples/sec Loss 1.5660 LearningRate 0.000128 Epoch: 27 Global Step: 562580 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:14:03,254-Speed 2498.91 samples/sec Loss 1.5591 LearningRate 0.000128 Epoch: 27 Global Step: 562590 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:14:11,463-Speed 2495.28 samples/sec Loss 1.5168 LearningRate 0.000128 Epoch: 27 Global Step: 562600 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:14:19,662-Speed 2498.19 samples/sec Loss 1.5173 LearningRate 0.000128 Epoch: 27 Global Step: 562610 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:14:27,864-Speed 2497.57 samples/sec Loss 1.5682 LearningRate 0.000128 Epoch: 27 Global Step: 562620 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:14:36,012-Speed 2513.63 samples/sec Loss 1.5534 LearningRate 0.000128 Epoch: 27 Global Step: 562630 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:14:44,217-Speed 2496.86 samples/sec Loss 1.5469 LearningRate 0.000128 Epoch: 27 Global Step: 562640 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:14:52,419-Speed 2497.24 samples/sec Loss 1.5269 LearningRate 0.000128 Epoch: 27 Global Step: 562650 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:15:00,621-Speed 2497.38 samples/sec Loss 1.5205 LearningRate 0.000128 Epoch: 27 Global Step: 562660 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:15:08,824-Speed 2497.02 samples/sec Loss 1.5502 LearningRate 0.000128 Epoch: 27 Global Step: 562670 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:15:17,028-Speed 2496.98 samples/sec Loss 1.5503 LearningRate 0.000128 Epoch: 27 Global Step: 562680 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:15:25,178-Speed 2513.33 samples/sec Loss 1.5344 LearningRate 0.000128 Epoch: 27 Global Step: 562690 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:15:33,382-Speed 2496.91 samples/sec Loss 1.5157 LearningRate 0.000128 Epoch: 27 Global Step: 562700 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:15:41,585-Speed 2497.18 samples/sec Loss 1.5290 LearningRate 0.000128 Epoch: 27 Global Step: 562710 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:15:49,792-Speed 2496.11 samples/sec Loss 1.5024 LearningRate 0.000128 Epoch: 27 Global Step: 562720 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:15:57,994-Speed 2497.01 samples/sec Loss 1.5244 LearningRate 0.000128 Epoch: 27 Global Step: 562730 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:16:06,198-Speed 2496.81 samples/sec Loss 1.5155 LearningRate 0.000128 Epoch: 27 Global Step: 562740 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:16:14,346-Speed 2514.04 samples/sec Loss 1.5308 LearningRate 0.000128 Epoch: 27 Global Step: 562750 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:16:22,547-Speed 2497.66 samples/sec Loss 1.5750 LearningRate 0.000128 Epoch: 27 Global Step: 562760 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:16:30,749-Speed 2497.17 samples/sec Loss 1.5259 LearningRate 0.000128 Epoch: 27 Global Step: 562770 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:16:38,958-Speed 2495.60 samples/sec Loss 1.5116 LearningRate 0.000128 Epoch: 27 Global Step: 562780 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:16:47,159-Speed 2497.97 samples/sec Loss 1.5049 LearningRate 0.000128 Epoch: 27 Global Step: 562790 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:16:55,363-Speed 2496.66 samples/sec Loss 1.5581 LearningRate 0.000128 Epoch: 27 Global Step: 562800 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:17:03,510-Speed 2514.36 samples/sec Loss 1.5827 LearningRate 0.000128 Epoch: 27 Global Step: 562810 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:17:11,712-Speed 2497.12 samples/sec Loss 1.5224 LearningRate 0.000128 Epoch: 27 Global Step: 562820 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:17:19,915-Speed 2497.34 samples/sec Loss 1.5133 LearningRate 0.000128 Epoch: 27 Global Step: 562830 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:17:28,117-Speed 2497.23 samples/sec Loss 1.5438 LearningRate 0.000128 Epoch: 27 Global Step: 562840 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:17:36,323-Speed 2496.51 samples/sec Loss 1.5206 LearningRate 0.000128 Epoch: 27 Global Step: 562850 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:17:44,525-Speed 2497.11 samples/sec Loss 1.5418 LearningRate 0.000128 Epoch: 27 Global Step: 562860 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:17:52,674-Speed 2513.71 samples/sec Loss 1.5651 LearningRate 0.000128 Epoch: 27 Global Step: 562870 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:18:00,875-Speed 2497.45 samples/sec Loss 1.5232 LearningRate 0.000128 Epoch: 27 Global Step: 562880 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:18:09,076-Speed 2497.72 samples/sec Loss 1.5843 LearningRate 0.000128 Epoch: 27 Global Step: 562890 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:18:17,283-Speed 2495.77 samples/sec Loss 1.4987 LearningRate 0.000128 Epoch: 27 Global Step: 562900 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:18:25,490-Speed 2496.07 samples/sec Loss 1.5092 LearningRate 0.000128 Epoch: 27 Global Step: 562910 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:18:33,692-Speed 2496.91 samples/sec Loss 1.5020 LearningRate 0.000128 Epoch: 27 Global Step: 562920 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:18:41,843-Speed 2513.17 samples/sec Loss 1.5234 LearningRate 0.000128 Epoch: 27 Global Step: 562930 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:18:50,070-Speed 2489.85 samples/sec Loss 1.5109 LearningRate 0.000128 Epoch: 27 Global Step: 562940 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:18:58,285-Speed 2494.62 samples/sec Loss 1.4964 LearningRate 0.000128 Epoch: 27 Global Step: 562950 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:19:06,488-Speed 2496.95 samples/sec Loss 1.5443 LearningRate 0.000128 Epoch: 27 Global Step: 562960 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:19:14,690-Speed 2497.52 samples/sec Loss 1.5417 LearningRate 0.000128 Epoch: 27 Global Step: 562970 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:19:22,897-Speed 2495.79 samples/sec Loss 1.5377 LearningRate 0.000127 Epoch: 27 Global Step: 562980 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:19:31,050-Speed 2512.50 samples/sec Loss 1.5582 LearningRate 0.000127 Epoch: 27 Global Step: 562990 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:19:39,252-Speed 2497.03 samples/sec Loss 1.5436 LearningRate 0.000127 Epoch: 27 Global Step: 563000 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:19:47,453-Speed 2497.51 samples/sec Loss 1.5493 LearningRate 0.000127 Epoch: 27 Global Step: 563010 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:19:55,654-Speed 2497.93 samples/sec Loss 1.5194 LearningRate 0.000127 Epoch: 27 Global Step: 563020 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:20:03,857-Speed 2497.11 samples/sec Loss 1.5201 LearningRate 0.000127 Epoch: 27 Global Step: 563030 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:20:12,056-Speed 2498.17 samples/sec Loss 1.5831 LearningRate 0.000127 Epoch: 27 Global Step: 563040 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:20:20,205-Speed 2513.64 samples/sec Loss 1.5423 LearningRate 0.000127 Epoch: 27 Global Step: 563050 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:20:28,408-Speed 2497.05 samples/sec Loss 1.5522 LearningRate 0.000127 Epoch: 27 Global Step: 563060 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:20:36,613-Speed 2496.32 samples/sec Loss 1.5599 LearningRate 0.000127 Epoch: 27 Global Step: 563070 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:20:44,815-Speed 2497.46 samples/sec Loss 1.5361 LearningRate 0.000127 Epoch: 27 Global Step: 563080 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:20:53,018-Speed 2496.87 samples/sec Loss 1.5398 LearningRate 0.000127 Epoch: 27 Global Step: 563090 Fp16 Grad Scale: 32768 Required: 61 hours 
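The LearningRate column fits lr 0.001 with a linear warmup over warmup_step 82956 steps, followed by a quadratic polynomial decay to zero at total_step 829560. The power of 2 is an assumption checked only against the printed values. A linear decay would give about 0.000358 at step 562620 instead of the logged 0.000128. The sketch below reproduces the six-decimal values at sampled steps away from rounding boundaries. Right at a boundary, it may not match the log to the exact step.

```python
BASE_LR = 0.001
WARMUP_STEP = 82_956
TOTAL_STEP = 829_560
POWER = 2  # assumption: quadratic decay, inferred from the logged values


def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then polynomial decay to 0 at TOTAL_STEP."""
    if step < WARMUP_STEP:
        return BASE_LR * step / WARMUP_STEP
    progress = (step - WARMUP_STEP) / (TOTAL_STEP - WARMUP_STEP)
    return BASE_LR * (1.0 - progress) ** POWER


for step in (40, 50, 562620, 563500, 564500, 565130):
    print(step, f"{lr_at(step):.6f}")
# 40 0.000000
# 50 0.000001
# 562620 0.000128
# 563500 0.000127
# 564500 0.000126
# 565130 0.000125
```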
Training: 2022-07-10 23:21:01,220-Speed 2497.79 samples/sec Loss 1.5477 LearningRate 0.000127 Epoch: 27 Global Step: 563100 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:21:09,373-Speed 2512.30 samples/sec Loss 1.5374 LearningRate 0.000127 Epoch: 27 Global Step: 563110 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:21:17,573-Speed 2497.83 samples/sec Loss 1.5662 LearningRate 0.000127 Epoch: 27 Global Step: 563120 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:21:25,774-Speed 2497.70 samples/sec Loss 1.5238 LearningRate 0.000127 Epoch: 27 Global Step: 563130 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:21:33,978-Speed 2496.92 samples/sec Loss 1.5423 LearningRate 0.000127 Epoch: 27 Global Step: 563140 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:21:42,180-Speed 2497.42 samples/sec Loss 1.4887 LearningRate 0.000127 Epoch: 27 Global Step: 563150 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:21:50,404-Speed 2490.68 samples/sec Loss 1.5308 LearningRate 0.000127 Epoch: 27 Global Step: 563160 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:21:58,564-Speed 2510.05 samples/sec Loss 1.5259 LearningRate 0.000127 Epoch: 27 Global Step: 563170 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:22:06,767-Speed 2497.12 samples/sec Loss 1.5248 LearningRate 0.000127 Epoch: 27 Global Step: 563180 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:22:14,969-Speed 2497.31 samples/sec Loss 1.5257 LearningRate 0.000127 Epoch: 27 Global Step: 563190 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:22:23,198-Speed 2498.38 samples/sec Loss 1.5462 LearningRate 0.000127 Epoch: 27 Global Step: 563200 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:22:32,240-Speed 2500.39 samples/sec Loss 1.5156 LearningRate 0.000127 Epoch: 27 Global Step: 563210 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:22:40,487-Speed 2500.07 samples/sec Loss 1.5068 LearningRate 0.000127 Epoch: 27 Global Step: 563220 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:22:48,635-Speed 2513.78 samples/sec Loss 1.5632 LearningRate 0.000127 Epoch: 27 Global Step: 563230 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:22:56,884-Speed 2499.89 samples/sec Loss 1.5228 LearningRate 0.000127 Epoch: 27 Global Step: 563240 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:23:05,089-Speed 2496.66 samples/sec Loss 1.5297 LearningRate 0.000127 Epoch: 27 Global Step: 563250 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:23:13,319-Speed 2512.29 samples/sec Loss 1.5263 LearningRate 0.000127 Epoch: 27 Global Step: 563260 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:23:21,550-Speed 2498.24 samples/sec Loss 1.5370 LearningRate 0.000127 Epoch: 27 Global Step: 563270 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:23:29,752-Speed 2497.27 samples/sec Loss 1.4977 LearningRate 0.000127 Epoch: 27 Global Step: 563280 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:23:37,937-Speed 2515.82 samples/sec Loss 1.5188 LearningRate 0.000127 Epoch: 27 Global Step: 563290 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:23:46,192-Speed 2498.94 samples/sec Loss 1.5278 LearningRate 0.000127 Epoch: 27 Global Step: 563300 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:23:56,826-Speed 1926.20 samples/sec Loss 1.5559 LearningRate 0.000127 Epoch: 27 Global Step: 563310 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:24:05,046-Speed 2500.75 samples/sec Loss 1.5688 LearningRate 0.000127 Epoch: 27 Global Step: 563320 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:24:13,294-Speed 2499.30 samples/sec Loss 1.5693 LearningRate 0.000127 Epoch: 27 Global Step: 563330 Fp16 Grad Scale: 16384 Required: 61 hours 
Training: 2022-07-10 23:24:21,542-Speed 2499.63 samples/sec Loss 1.5532 LearningRate 0.000127 Epoch: 27 Global Step: 563340 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:24:29,686-Speed 2515.10 samples/sec Loss 1.5219 LearningRate 0.000127 Epoch: 27 Global Step: 563350 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:24:37,890-Speed 2496.75 samples/sec Loss 1.5424 LearningRate 0.000127 Epoch: 27 Global Step: 563360 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:24:46,119-Speed 2498.68 samples/sec Loss 1.5555 LearningRate 0.000127 Epoch: 27 Global Step: 563370 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:24:54,363-Speed 2497.49 samples/sec Loss 1.5574 LearningRate 0.000127 Epoch: 27 Global Step: 563380 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:25:02,565-Speed 2497.27 samples/sec Loss 1.5122 LearningRate 0.000127 Epoch: 27 Global Step: 563390 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:25:10,817-Speed 2499.46 samples/sec Loss 1.5075 LearningRate 0.000127 Epoch: 27 Global Step: 563400 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:25:23,499-Speed 1638.48 samples/sec Loss 1.4864 LearningRate 0.000127 Epoch: 27 Global Step: 563410 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:25:31,724-Speed 2500.13 samples/sec Loss 1.5625 LearningRate 0.000127 Epoch: 27 Global Step: 563420 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:25:42,980-Speed 1819.79 samples/sec Loss 1.5626 LearningRate 0.000127 Epoch: 27 Global Step: 563430 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:25:51,168-Speed 2501.53 samples/sec Loss 1.5205 LearningRate 0.000127 Epoch: 27 Global Step: 563440 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:26:00,474-Speed 2201.00 samples/sec Loss 1.5590 LearningRate 0.000127 Epoch: 27 Global Step: 563450 Fp16 Grad Scale: 16384 Required: 61 hours 
Training: 2022-07-10 23:26:08,689-Speed 2493.35 samples/sec Loss 1.5259 LearningRate 0.000127 Epoch: 27 Global Step: 563460 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:26:16,842-Speed 2512.56 samples/sec Loss 1.5222 LearningRate 0.000127 Epoch: 27 Global Step: 563470 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:26:25,048-Speed 2495.94 samples/sec Loss 1.5722 LearningRate 0.000127 Epoch: 27 Global Step: 563480 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:26:33,256-Speed 2495.60 samples/sec Loss 1.5257 LearningRate 0.000127 Epoch: 27 Global Step: 563490 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:26:41,470-Speed 2493.48 samples/sec Loss 1.5287 LearningRate 0.000127 Epoch: 27 Global Step: 563500 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:26:49,687-Speed 2492.81 samples/sec Loss 1.5530 LearningRate 0.000127 Epoch: 27 Global Step: 563510 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:26:57,905-Speed 2492.51 samples/sec Loss 1.4950 LearningRate 0.000127 Epoch: 27 Global Step: 563520 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:27:06,062-Speed 2511.20 samples/sec Loss 1.5181 LearningRate 0.000127 Epoch: 27 Global Step: 563530 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:27:14,271-Speed 2495.28 samples/sec Loss 1.5287 LearningRate 0.000127 Epoch: 27 Global Step: 563540 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:27:22,476-Speed 2496.21 samples/sec Loss 1.5588 LearningRate 0.000127 Epoch: 27 Global Step: 563550 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:27:30,687-Speed 2494.79 samples/sec Loss 1.5476 LearningRate 0.000127 Epoch: 27 Global Step: 563560 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:27:38,888-Speed 2497.68 samples/sec Loss 1.5145 LearningRate 0.000127 Epoch: 27 Global Step: 563570 Fp16 Grad Scale: 16384 Required: 61 hours 
Training: 2022-07-10 23:27:47,090-Speed 2497.35 samples/sec Loss 1.5013 LearningRate 0.000127 Epoch: 27 Global Step: 563580 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:27:55,252-Speed 2509.66 samples/sec Loss 1.5528 LearningRate 0.000127 Epoch: 27 Global Step: 563590 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:28:03,465-Speed 2494.11 samples/sec Loss 1.5287 LearningRate 0.000127 Epoch: 27 Global Step: 563600 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:28:11,668-Speed 2497.04 samples/sec Loss 1.5307 LearningRate 0.000127 Epoch: 27 Global Step: 563610 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:28:19,868-Speed 2498.10 samples/sec Loss 1.5131 LearningRate 0.000127 Epoch: 27 Global Step: 563620 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:28:28,078-Speed 2494.74 samples/sec Loss 1.5507 LearningRate 0.000127 Epoch: 27 Global Step: 563630 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:28:36,282-Speed 2496.82 samples/sec Loss 1.5107 LearningRate 0.000127 Epoch: 27 Global Step: 563640 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:28:44,432-Speed 2513.30 samples/sec Loss 1.5809 LearningRate 0.000127 Epoch: 27 Global Step: 563650 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:28:52,633-Speed 2497.65 samples/sec Loss 1.5276 LearningRate 0.000127 Epoch: 27 Global Step: 563660 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:29:00,835-Speed 2497.46 samples/sec Loss 1.5101 LearningRate 0.000127 Epoch: 27 Global Step: 563670 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:29:09,043-Speed 2495.43 samples/sec Loss 1.5164 LearningRate 0.000127 Epoch: 27 Global Step: 563680 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:29:17,249-Speed 2496.09 samples/sec Loss 1.5478 LearningRate 0.000127 Epoch: 27 Global Step: 563690 Fp16 Grad Scale: 16384 Required: 61 hours 
Training: 2022-07-10 23:29:25,463-Speed 2493.54 samples/sec Loss 1.5480 LearningRate 0.000127 Epoch: 27 Global Step: 563700 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:29:33,616-Speed 2512.26 samples/sec Loss 1.5341 LearningRate 0.000127 Epoch: 27 Global Step: 563710 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:29:41,823-Speed 2496.43 samples/sec Loss 1.5197 LearningRate 0.000127 Epoch: 27 Global Step: 563720 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:29:50,025-Speed 2497.24 samples/sec Loss 1.5302 LearningRate 0.000127 Epoch: 27 Global Step: 563730 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:29:58,231-Speed 2496.14 samples/sec Loss 1.5070 LearningRate 0.000127 Epoch: 27 Global Step: 563740 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:30:06,437-Speed 2495.99 samples/sec Loss 1.5266 LearningRate 0.000127 Epoch: 27 Global Step: 563750 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:30:14,642-Speed 2496.47 samples/sec Loss 1.5431 LearningRate 0.000127 Epoch: 27 Global Step: 563760 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:30:22,795-Speed 2512.27 samples/sec Loss 1.5123 LearningRate 0.000127 Epoch: 27 Global Step: 563770 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:30:31,012-Speed 2493.02 samples/sec Loss 1.5636 LearningRate 0.000127 Epoch: 27 Global Step: 563780 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:30:39,218-Speed 2496.08 samples/sec Loss 1.4825 LearningRate 0.000127 Epoch: 27 Global Step: 563790 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:30:47,422-Speed 2496.86 samples/sec Loss 1.5277 LearningRate 0.000127 Epoch: 27 Global Step: 563800 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:30:55,625-Speed 2496.89 samples/sec Loss 1.5246 LearningRate 0.000127 Epoch: 27 Global Step: 563810 Fp16 Grad Scale: 16384 Required: 61 hours 
Training: 2022-07-10 23:31:03,832-Speed 2495.89 samples/sec Loss 1.5073 LearningRate 0.000127 Epoch: 27 Global Step: 563820 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:31:12,000-Speed 2507.80 samples/sec Loss 1.5489 LearningRate 0.000127 Epoch: 27 Global Step: 563830 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:31:20,205-Speed 2496.44 samples/sec Loss 1.5269 LearningRate 0.000127 Epoch: 27 Global Step: 563840 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:31:28,416-Speed 2494.64 samples/sec Loss 1.5278 LearningRate 0.000127 Epoch: 27 Global Step: 563850 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:31:36,623-Speed 2495.82 samples/sec Loss 1.5268 LearningRate 0.000127 Epoch: 27 Global Step: 563860 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:31:44,849-Speed 2490.37 samples/sec Loss 1.5060 LearningRate 0.000127 Epoch: 27 Global Step: 563870 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:31:53,055-Speed 2495.83 samples/sec Loss 1.5191 LearningRate 0.000127 Epoch: 27 Global Step: 563880 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:32:01,224-Speed 2507.66 samples/sec Loss 1.5464 LearningRate 0.000127 Epoch: 27 Global Step: 563890 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:32:09,427-Speed 2496.84 samples/sec Loss 1.5250 LearningRate 0.000127 Epoch: 27 Global Step: 563900 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:32:17,634-Speed 2496.08 samples/sec Loss 1.5048 LearningRate 0.000127 Epoch: 27 Global Step: 563910 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:32:25,843-Speed 2495.19 samples/sec Loss 1.5119 LearningRate 0.000127 Epoch: 27 Global Step: 563920 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:32:34,044-Speed 2497.81 samples/sec Loss 1.5262 LearningRate 0.000127 Epoch: 27 Global Step: 563930 Fp16 Grad Scale: 16384 Required: 61 hours 
Training: 2022-07-10 23:32:42,253-Speed 2495.31 samples/sec Loss 1.5476 LearningRate 0.000127 Epoch: 27 Global Step: 563940 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:32:50,399-Speed 2514.58 samples/sec Loss 1.4914 LearningRate 0.000127 Epoch: 27 Global Step: 563950 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:32:58,602-Speed 2496.90 samples/sec Loss 1.5151 LearningRate 0.000127 Epoch: 27 Global Step: 563960 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:33:06,803-Speed 2497.63 samples/sec Loss 1.5438 LearningRate 0.000127 Epoch: 27 Global Step: 563970 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:33:15,007-Speed 2496.64 samples/sec Loss 1.5664 LearningRate 0.000127 Epoch: 27 Global Step: 563980 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:33:23,212-Speed 2496.98 samples/sec Loss 1.5725 LearningRate 0.000127 Epoch: 27 Global Step: 563990 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:33:31,415-Speed 2496.96 samples/sec Loss 1.5100 LearningRate 0.000127 Epoch: 27 Global Step: 564000 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:33:39,566-Speed 2512.83 samples/sec Loss 1.5460 LearningRate 0.000127 Epoch: 27 Global Step: 564010 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:33:47,784-Speed 2492.71 samples/sec Loss 1.5187 LearningRate 0.000126 Epoch: 27 Global Step: 564020 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:33:55,985-Speed 2497.62 samples/sec Loss 1.5562 LearningRate 0.000126 Epoch: 27 Global Step: 564030 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:34:04,191-Speed 2496.03 samples/sec Loss 1.5202 LearningRate 0.000126 Epoch: 27 Global Step: 564040 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:34:12,394-Speed 2497.17 samples/sec Loss 1.5066 LearningRate 0.000126 Epoch: 27 Global Step: 564050 Fp16 Grad Scale: 16384 Required: 61 hours 
Training: 2022-07-10 23:34:20,596-Speed 2497.46 samples/sec Loss 1.5110 LearningRate 0.000126 Epoch: 27 Global Step: 564060 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:34:28,745-Speed 2513.27 samples/sec Loss 1.5727 LearningRate 0.000126 Epoch: 27 Global Step: 564070 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:34:36,951-Speed 2496.38 samples/sec Loss 1.5474 LearningRate 0.000126 Epoch: 27 Global Step: 564080 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:34:45,162-Speed 2494.55 samples/sec Loss 1.5521 LearningRate 0.000126 Epoch: 27 Global Step: 564090 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:34:53,367-Speed 2496.27 samples/sec Loss 1.5251 LearningRate 0.000126 Epoch: 27 Global Step: 564100 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:35:01,569-Speed 2497.43 samples/sec Loss 1.5212 LearningRate 0.000126 Epoch: 27 Global Step: 564110 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:35:09,771-Speed 2497.33 samples/sec Loss 1.5116 LearningRate 0.000126 Epoch: 27 Global Step: 564120 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:35:17,919-Speed 2513.74 samples/sec Loss 1.5038 LearningRate 0.000126 Epoch: 27 Global Step: 564130 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:35:26,123-Speed 2496.64 samples/sec Loss 1.5693 LearningRate 0.000126 Epoch: 27 Global Step: 564140 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:35:34,325-Speed 2497.51 samples/sec Loss 1.5756 LearningRate 0.000126 Epoch: 27 Global Step: 564150 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:35:42,525-Speed 2498.02 samples/sec Loss 1.5665 LearningRate 0.000126 Epoch: 27 Global Step: 564160 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:35:50,739-Speed 2493.48 samples/sec Loss 1.5060 LearningRate 0.000126 Epoch: 27 Global Step: 564170 Fp16 Grad Scale: 16384 Required: 61 hours 
Training: 2022-07-10 23:35:58,944-Speed 2496.53 samples/sec Loss 1.5201 LearningRate 0.000126 Epoch: 27 Global Step: 564180 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:36:07,094-Speed 2513.11 samples/sec Loss 1.5310 LearningRate 0.000126 Epoch: 27 Global Step: 564190 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:36:15,302-Speed 2495.90 samples/sec Loss 1.5021 LearningRate 0.000126 Epoch: 27 Global Step: 564200 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:36:23,503-Speed 2497.63 samples/sec Loss 1.5249 LearningRate 0.000126 Epoch: 27 Global Step: 564210 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:36:31,727-Speed 2490.48 samples/sec Loss 1.5365 LearningRate 0.000126 Epoch: 27 Global Step: 564220 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:36:39,927-Speed 2498.12 samples/sec Loss 1.5211 LearningRate 0.000126 Epoch: 27 Global Step: 564230 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:36:48,129-Speed 2497.75 samples/sec Loss 1.5236 LearningRate 0.000126 Epoch: 27 Global Step: 564240 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:36:56,278-Speed 2513.39 samples/sec Loss 1.5625 LearningRate 0.000126 Epoch: 27 Global Step: 564250 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:37:04,479-Speed 2497.98 samples/sec Loss 1.5114 LearningRate 0.000126 Epoch: 27 Global Step: 564260 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:37:12,680-Speed 2497.83 samples/sec Loss 1.5052 LearningRate 0.000126 Epoch: 27 Global Step: 564270 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:37:20,880-Speed 2498.07 samples/sec Loss 1.5115 LearningRate 0.000126 Epoch: 27 Global Step: 564280 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:37:29,102-Speed 2491.22 samples/sec Loss 1.5174 LearningRate 0.000126 Epoch: 27 Global Step: 564290 Fp16 Grad Scale: 16384 Required: 61 hours 
Training: 2022-07-10 23:37:37,306-Speed 2496.96 samples/sec Loss 1.5078 LearningRate 0.000126 Epoch: 27 Global Step: 564300 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:37:45,456-Speed 2513.19 samples/sec Loss 1.5158 LearningRate 0.000126 Epoch: 27 Global Step: 564310 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:37:53,659-Speed 2497.01 samples/sec Loss 1.5199 LearningRate 0.000126 Epoch: 27 Global Step: 564320 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:38:01,858-Speed 2498.10 samples/sec Loss 1.5075 LearningRate 0.000126 Epoch: 27 Global Step: 564330 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:38:10,066-Speed 2495.49 samples/sec Loss 1.4867 LearningRate 0.000126 Epoch: 27 Global Step: 564340 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:38:18,270-Speed 2496.96 samples/sec Loss 1.5141 LearningRate 0.000126 Epoch: 27 Global Step: 564350 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:38:26,472-Speed 2497.16 samples/sec Loss 1.5121 LearningRate 0.000126 Epoch: 27 Global Step: 564360 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:38:34,622-Speed 2513.18 samples/sec Loss 1.5076 LearningRate 0.000126 Epoch: 27 Global Step: 564370 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:38:42,821-Speed 2498.38 samples/sec Loss 1.5499 LearningRate 0.000126 Epoch: 27 Global Step: 564380 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:38:51,023-Speed 2497.49 samples/sec Loss 1.5506 LearningRate 0.000126 Epoch: 27 Global Step: 564390 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:38:59,224-Speed 2497.68 samples/sec Loss 1.5246 LearningRate 0.000126 Epoch: 27 Global Step: 564400 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:39:07,429-Speed 2496.69 samples/sec Loss 1.5418 LearningRate 0.000126 Epoch: 27 Global Step: 564410 Fp16 Grad Scale: 16384 Required: 61 hours 
Training: 2022-07-10 23:39:15,629-Speed 2497.58 samples/sec Loss 1.5537 LearningRate 0.000126 Epoch: 27 Global Step: 564420 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:39:23,779-Speed 2513.47 samples/sec Loss 1.5111 LearningRate 0.000126 Epoch: 27 Global Step: 564430 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:39:31,980-Speed 2497.55 samples/sec Loss 1.5268 LearningRate 0.000126 Epoch: 27 Global Step: 564440 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:39:40,183-Speed 2497.09 samples/sec Loss 1.5339 LearningRate 0.000126 Epoch: 27 Global Step: 564450 Fp16 Grad Scale: 16384 Required: 61 hours Training: 2022-07-10 23:39:48,394-Speed 2494.62 samples/sec Loss 1.5013 LearningRate 0.000126 Epoch: 27 Global Step: 564460 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:39:56,594-Speed 2498.03 samples/sec Loss 1.5360 LearningRate 0.000126 Epoch: 27 Global Step: 564470 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:40:04,807-Speed 2494.08 samples/sec Loss 1.5181 LearningRate 0.000126 Epoch: 27 Global Step: 564480 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:40:12,955-Speed 2513.94 samples/sec Loss 1.5405 LearningRate 0.000126 Epoch: 27 Global Step: 564490 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:40:21,172-Speed 2492.71 samples/sec Loss 1.5300 LearningRate 0.000126 Epoch: 27 Global Step: 564500 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:40:29,374-Speed 2497.34 samples/sec Loss 1.4720 LearningRate 0.000126 Epoch: 27 Global Step: 564510 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:40:37,576-Speed 2497.32 samples/sec Loss 1.5297 LearningRate 0.000126 Epoch: 27 Global Step: 564520 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:40:45,783-Speed 2495.96 samples/sec Loss 1.4717 LearningRate 0.000126 Epoch: 27 Global Step: 564530 Fp16 Grad Scale: 32768 Required: 61 hours 
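The Fp16 Grad Scale column halves from 32768 to 16384 between global steps 563250 and 563260. It doubles back between 564450 and 564460, 1200 global steps later. That pattern matches dynamic loss scaling as in PyTorch's `torch.cuda.amp.GradScaler`, which halves the scale on an inf/NaN gradient and doubles it after `growth_interval` consecutive clean updates. The fit assumes two things: the scaler is updated once per optimizer step (every gradient_acc = 12 global steps), and `growth_interval` is 100 rather than PyTorch's default of 2000. The pure-Python model below encodes both assumptions. The overflow at step 563256 is hypothetical, chosen because it is the only multiple of 12 in that logging window.

```python
from dataclasses import dataclass


@dataclass
class LossScale:
    """Pure-Python model of dynamic loss scaling: halve on overflow,
    double after `growth_interval` consecutive overflow-free updates."""
    scale: float = 32768.0
    growth_factor: float = 2.0
    backoff_factor: float = 0.5
    growth_interval: int = 100  # assumption inferred from this log
    good_steps: int = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale *= self.backoff_factor
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps == self.growth_interval:
                self.scale *= self.growth_factor
                self.good_steps = 0


GRADIENT_ACC = 12  # from the config header


def simulate(first_step: int, last_step: int, overflow_steps: set,
             growth_interval: int = 100) -> dict:
    """Return the loss scale visible at every logged (multiple-of-10) step."""
    scaler = LossScale(growth_interval=growth_interval)
    seen = {}
    for step in range(first_step, last_step + 1):
        if step % GRADIENT_ACC == 0:  # assumed optimizer step + scaler update
            scaler.update(found_inf=step in overflow_steps)
        if step % 10 == 0:
            seen[step] = scaler.scale
    return seen


seen = simulate(562_620, 565_130, overflow_steps={563_256})
for step in (563_250, 563_260, 564_450, 564_460, 565_130):
    print(step, int(seen[step]))
# 563250 32768
# 563260 16384
# 564450 16384
# 564460 32768
# 565130 32768
```

With the default interval of 2000, the scale would still read 16384 at step 564460, so this log alone points to a shorter growth interval.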
Training: 2022-07-10 23:40:53,986-Speed 2497.01 samples/sec Loss 1.5501 LearningRate 0.000126 Epoch: 27 Global Step: 564540 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:41:02,138-Speed 2512.81 samples/sec Loss 1.5624 LearningRate 0.000126 Epoch: 27 Global Step: 564550 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:41:10,346-Speed 2495.58 samples/sec Loss 1.5282 LearningRate 0.000126 Epoch: 27 Global Step: 564560 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:41:18,546-Speed 2497.77 samples/sec Loss 1.5185 LearningRate 0.000126 Epoch: 27 Global Step: 564570 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:41:26,749-Speed 2497.43 samples/sec Loss 1.5525 LearningRate 0.000126 Epoch: 27 Global Step: 564580 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:41:34,951-Speed 2497.44 samples/sec Loss 1.5579 LearningRate 0.000126 Epoch: 27 Global Step: 564590 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:41:43,156-Speed 2496.29 samples/sec Loss 1.5176 LearningRate 0.000126 Epoch: 27 Global Step: 564600 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:41:51,305-Speed 2513.53 samples/sec Loss 1.5126 LearningRate 0.000126 Epoch: 27 Global Step: 564610 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:41:59,507-Speed 2497.43 samples/sec Loss 1.5445 LearningRate 0.000126 Epoch: 27 Global Step: 564620 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:42:07,715-Speed 2495.61 samples/sec Loss 1.5019 LearningRate 0.000126 Epoch: 27 Global Step: 564630 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:42:15,917-Speed 2497.25 samples/sec Loss 1.5378 LearningRate 0.000126 Epoch: 27 Global Step: 564640 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:42:24,132-Speed 2493.52 samples/sec Loss 1.5118 LearningRate 0.000126 Epoch: 27 Global Step: 564650 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:42:32,331-Speed 2498.11 samples/sec Loss 1.5564 LearningRate 0.000126 Epoch: 27 Global Step: 564660 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:42:40,479-Speed 2513.81 samples/sec Loss 1.5253 LearningRate 0.000126 Epoch: 27 Global Step: 564670 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:42:48,683-Speed 2497.12 samples/sec Loss 1.5253 LearningRate 0.000126 Epoch: 27 Global Step: 564680 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:42:56,882-Speed 2498.12 samples/sec Loss 1.4703 LearningRate 0.000126 Epoch: 27 Global Step: 564690 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:43:05,089-Speed 2496.13 samples/sec Loss 1.5526 LearningRate 0.000126 Epoch: 27 Global Step: 564700 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:43:13,293-Speed 2496.68 samples/sec Loss 1.5399 LearningRate 0.000126 Epoch: 27 Global Step: 564710 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:43:21,493-Speed 2497.81 samples/sec Loss 1.5731 LearningRate 0.000126 Epoch: 27 Global Step: 564720 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:43:29,646-Speed 2512.61 samples/sec Loss 1.5512 LearningRate 0.000126 Epoch: 27 Global Step: 564730 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:43:37,847-Speed 2497.66 samples/sec Loss 1.5353 LearningRate 0.000126 Epoch: 27 Global Step: 564740 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:43:46,059-Speed 2494.46 samples/sec Loss 1.5140 LearningRate 0.000126 Epoch: 27 Global Step: 564750 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:43:54,261-Speed 2497.31 samples/sec Loss 1.5088 LearningRate 0.000126 Epoch: 27 Global Step: 564760 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:44:02,465-Speed 2496.62 samples/sec Loss 1.5169 LearningRate 0.000126 Epoch: 27 Global Step: 564770 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:44:10,668-Speed 2497.23 samples/sec Loss 1.5371 LearningRate 0.000126 Epoch: 27 Global Step: 564780 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:44:18,818-Speed 2513.40 samples/sec Loss 1.4867 LearningRate 0.000126 Epoch: 27 Global Step: 564790 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:44:27,019-Speed 2497.73 samples/sec Loss 1.5212 LearningRate 0.000126 Epoch: 27 Global Step: 564800 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:44:35,226-Speed 2495.72 samples/sec Loss 1.5436 LearningRate 0.000126 Epoch: 27 Global Step: 564810 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:44:43,436-Speed 2495.03 samples/sec Loss 1.4889 LearningRate 0.000126 Epoch: 27 Global Step: 564820 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:44:51,638-Speed 2497.21 samples/sec Loss 1.5380 LearningRate 0.000126 Epoch: 27 Global Step: 564830 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:44:59,842-Speed 2496.89 samples/sec Loss 1.5102 LearningRate 0.000126 Epoch: 27 Global Step: 564840 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:45:07,998-Speed 2511.61 samples/sec Loss 1.5065 LearningRate 0.000126 Epoch: 27 Global Step: 564850 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:45:16,199-Speed 2497.66 samples/sec Loss 1.5050 LearningRate 0.000126 Epoch: 27 Global Step: 564860 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:45:24,400-Speed 2497.51 samples/sec Loss 1.5219 LearningRate 0.000126 Epoch: 27 Global Step: 564870 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:45:32,610-Speed 2495.04 samples/sec Loss 1.5116 LearningRate 0.000126 Epoch: 27 Global Step: 564880 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:45:40,817-Speed 2495.66 samples/sec Loss 1.5388 LearningRate 0.000126 Epoch: 27 Global Step: 564890 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:45:49,020-Speed 2497.25 samples/sec Loss 1.5322 LearningRate 0.000126 Epoch: 27 Global Step: 564900 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:45:57,186-Speed 2508.26 samples/sec Loss 1.5282 LearningRate 0.000126 Epoch: 27 Global Step: 564910 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:46:05,385-Speed 2498.43 samples/sec Loss 1.5006 LearningRate 0.000126 Epoch: 27 Global Step: 564920 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:46:13,610-Speed 2490.57 samples/sec Loss 1.5166 LearningRate 0.000126 Epoch: 27 Global Step: 564930 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:46:21,809-Speed 2498.25 samples/sec Loss 1.5044 LearningRate 0.000126 Epoch: 27 Global Step: 564940 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:46:30,013-Speed 2496.64 samples/sec Loss 1.5334 LearningRate 0.000126 Epoch: 27 Global Step: 564950 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:46:38,217-Speed 2496.84 samples/sec Loss 1.5608 LearningRate 0.000126 Epoch: 27 Global Step: 564960 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:46:46,367-Speed 2513.26 samples/sec Loss 1.5436 LearningRate 0.000126 Epoch: 27 Global Step: 564970 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:46:54,569-Speed 2497.33 samples/sec Loss 1.5438 LearningRate 0.000126 Epoch: 27 Global Step: 564980 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:47:02,770-Speed 2497.55 samples/sec Loss 1.5422 LearningRate 0.000126 Epoch: 27 Global Step: 564990 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:47:10,969-Speed 2498.08 samples/sec Loss 1.5491 LearningRate 0.000126 Epoch: 27 Global Step: 565000 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:47:19,172-Speed 2497.35 samples/sec Loss 1.5177 LearningRate 0.000126 Epoch: 27 Global Step: 565010 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:47:27,375-Speed 2496.93 samples/sec Loss 1.4920 LearningRate 0.000126 Epoch: 27 Global Step: 565020 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:47:35,531-Speed 2511.40 samples/sec Loss 1.5088 LearningRate 0.000126 Epoch: 27 Global Step: 565030 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:47:43,735-Speed 2496.89 samples/sec Loss 1.4964 LearningRate 0.000126 Epoch: 27 Global Step: 565040 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:47:51,942-Speed 2495.83 samples/sec Loss 1.5323 LearningRate 0.000126 Epoch: 27 Global Step: 565050 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:48:00,143-Speed 2497.39 samples/sec Loss 1.5081 LearningRate 0.000126 Epoch: 27 Global Step: 565060 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:48:08,344-Speed 2497.69 samples/sec Loss 1.5351 LearningRate 0.000125 Epoch: 27 Global Step: 565070 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:48:16,545-Speed 2498.00 samples/sec Loss 1.5327 LearningRate 0.000125 Epoch: 27 Global Step: 565080 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:48:24,697-Speed 2512.68 samples/sec Loss 1.5246 LearningRate 0.000125 Epoch: 27 Global Step: 565090 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:48:32,898-Speed 2497.70 samples/sec Loss 1.5320 LearningRate 0.000125 Epoch: 27 Global Step: 565100 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:48:41,099-Speed 2498.02 samples/sec Loss 1.5322 LearningRate 0.000125 Epoch: 27 Global Step: 565110 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:48:49,300-Speed 2497.59 samples/sec Loss 1.5191 LearningRate 0.000125 Epoch: 27 Global Step: 565120 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:48:57,508-Speed 2495.56 samples/sec Loss 1.5557 LearningRate 0.000125 Epoch: 27 Global Step: 565130 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:49:05,711-Speed 2496.92 samples/sec Loss 1.5176 LearningRate 0.000125 Epoch: 27 Global Step: 565140 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:49:13,858-Speed 2514.18 samples/sec Loss 1.5558 LearningRate 0.000125 Epoch: 27 Global Step: 565150 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:49:22,059-Speed 2497.79 samples/sec Loss 1.5716 LearningRate 0.000125 Epoch: 27 Global Step: 565160 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:49:30,262-Speed 2496.99 samples/sec Loss 1.5438 LearningRate 0.000125 Epoch: 27 Global Step: 565170 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:49:38,469-Speed 2495.91 samples/sec Loss 1.5493 LearningRate 0.000125 Epoch: 27 Global Step: 565180 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:49:46,677-Speed 2495.71 samples/sec Loss 1.4821 LearningRate 0.000125 Epoch: 27 Global Step: 565190 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:49:54,880-Speed 2497.29 samples/sec Loss 1.5166 LearningRate 0.000125 Epoch: 27 Global Step: 565200 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:50:03,029-Speed 2513.38 samples/sec Loss 1.5476 LearningRate 0.000125 Epoch: 27 Global Step: 565210 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:50:11,240-Speed 2494.57 samples/sec Loss 1.5456 LearningRate 0.000125 Epoch: 27 Global Step: 565220 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:50:19,452-Speed 2494.41 samples/sec Loss 1.5052 LearningRate 0.000125 Epoch: 27 Global Step: 565230 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:50:27,667-Speed 2493.55 samples/sec Loss 1.5496 LearningRate 0.000125 Epoch: 27 Global Step: 565240 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:50:35,869-Speed 2497.19 samples/sec Loss 1.5149 LearningRate 0.000125 Epoch: 27 Global Step: 565250 Fp16 Grad Scale: 32768 Required: 61 hours 
Training: 2022-07-10 23:50:44,073-Speed 2496.76 samples/sec Loss 1.5183 LearningRate 0.000125 Epoch: 27 Global Step: 565260 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:50:52,227-Speed 2512.15 samples/sec Loss 1.5706 LearningRate 0.000125 Epoch: 27 Global Step: 565270 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:51:00,426-Speed 2498.15 samples/sec Loss 1.5182 LearningRate 0.000125 Epoch: 27 Global Step: 565280 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-07-10 23:51:08,638-Speed 2494.30 samples/sec Loss 1.4682 LearningRate 0.000125 Epoch: 27 Global Step: 565290 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:51:16,838-Speed 2497.79 samples/sec Loss 1.5444 LearningRate 0.000125 Epoch: 27 Global Step: 565300 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:51:25,041-Speed 2497.23 samples/sec Loss 1.5484 LearningRate 0.000125 Epoch: 27 Global Step: 565310 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:51:33,245-Speed 2496.76 samples/sec Loss 1.5089 LearningRate 0.000125 Epoch: 27 Global Step: 565320 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:51:41,400-Speed 2511.62 samples/sec Loss 1.5743 LearningRate 0.000125 Epoch: 27 Global Step: 565330 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:51:49,609-Speed 2495.48 samples/sec Loss 1.5350 LearningRate 0.000125 Epoch: 27 Global Step: 565340 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:51:57,833-Speed 2490.54 samples/sec Loss 1.5430 LearningRate 0.000125 Epoch: 27 Global Step: 565350 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:52:06,039-Speed 2496.12 samples/sec Loss 1.5575 LearningRate 0.000125 Epoch: 27 Global Step: 565360 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:52:14,247-Speed 2495.60 samples/sec Loss 1.5334 LearningRate 0.000125 Epoch: 27 Global Step: 565370 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-10 23:52:22,458-Speed 2494.49 samples/sec Loss 1.5580 LearningRate 0.000125 Epoch: 27 Global Step: 565380 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:52:30,642-Speed 2503.17 samples/sec Loss 1.5388 LearningRate 0.000125 Epoch: 27 Global Step: 565390 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:52:38,846-Speed 2496.62 samples/sec Loss 1.5133 LearningRate 0.000125 Epoch: 27 Global Step: 565400 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:52:47,048-Speed 2497.33 samples/sec Loss 1.5241 LearningRate 0.000125 Epoch: 27 Global Step: 565410 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:52:55,249-Speed 2497.79 samples/sec Loss 1.4970 LearningRate 0.000125 Epoch: 27 Global Step: 565420 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:53:03,455-Speed 2495.97 samples/sec Loss 1.5534 LearningRate 0.000125 Epoch: 27 Global Step: 565430 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:53:11,671-Speed 2493.24 samples/sec Loss 1.5403 LearningRate 0.000125 Epoch: 27 Global Step: 565440 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:53:19,821-Speed 2513.42 samples/sec Loss 1.5152 LearningRate 0.000125 Epoch: 27 Global Step: 565450 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:53:28,027-Speed 2496.20 samples/sec Loss 1.5283 LearningRate 0.000125 Epoch: 27 Global Step: 565460 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:53:36,228-Speed 2497.49 samples/sec Loss 1.5573 LearningRate 0.000125 Epoch: 27 Global Step: 565470 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:53:44,433-Speed 2496.47 samples/sec Loss 1.5691 LearningRate 0.000125 Epoch: 27 Global Step: 565480 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:53:52,636-Speed 2497.04 samples/sec Loss 1.5641 LearningRate 0.000125 Epoch: 27 Global Step: 565490 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-10 23:54:00,838-Speed 2497.30 samples/sec Loss 1.5400 LearningRate 0.000125 Epoch: 27 Global Step: 565500 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:54:08,991-Speed 2512.19 samples/sec Loss 1.5138 LearningRate 0.000125 Epoch: 27 Global Step: 565510 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:54:17,198-Speed 2495.87 samples/sec Loss 1.5846 LearningRate 0.000125 Epoch: 27 Global Step: 565520 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:54:25,403-Speed 2496.52 samples/sec Loss 1.5324 LearningRate 0.000125 Epoch: 27 Global Step: 565530 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:54:33,607-Speed 2496.67 samples/sec Loss 1.5449 LearningRate 0.000125 Epoch: 27 Global Step: 565540 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:54:41,808-Speed 2497.63 samples/sec Loss 1.5704 LearningRate 0.000125 Epoch: 27 Global Step: 565550 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:54:50,014-Speed 2496.21 samples/sec Loss 1.5161 LearningRate 0.000125 Epoch: 27 Global Step: 565560 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:54:58,166-Speed 2512.34 samples/sec Loss 1.5108 LearningRate 0.000125 Epoch: 27 Global Step: 565570 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:55:06,366-Speed 2498.18 samples/sec Loss 1.5740 LearningRate 0.000125 Epoch: 27 Global Step: 565580 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:55:14,568-Speed 2497.26 samples/sec Loss 1.5436 LearningRate 0.000125 Epoch: 27 Global Step: 565590 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:55:22,768-Speed 2497.92 samples/sec Loss 1.5418 LearningRate 0.000125 Epoch: 27 Global Step: 565600 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:55:30,974-Speed 2496.21 samples/sec Loss 1.5198 LearningRate 0.000125 Epoch: 27 Global Step: 565610 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-10 23:55:39,176-Speed 2497.30 samples/sec Loss 1.5179 LearningRate 0.000125 Epoch: 27 Global Step: 565620 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:55:47,339-Speed 2509.30 samples/sec Loss 1.5278 LearningRate 0.000125 Epoch: 27 Global Step: 565630 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:55:55,541-Speed 2497.43 samples/sec Loss 1.5470 LearningRate 0.000125 Epoch: 27 Global Step: 565640 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:56:03,749-Speed 2495.41 samples/sec Loss 1.5107 LearningRate 0.000125 Epoch: 27 Global Step: 565650 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:56:11,967-Speed 2492.94 samples/sec Loss 1.5293 LearningRate 0.000125 Epoch: 27 Global Step: 565660 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-07-10 23:56:20,172-Speed 2496.20 samples/sec Loss 1.5026 LearningRate 0.000125 Epoch: 27 Global Step: 565670 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-07-10 23:56:28,377-Speed 2496.43 samples/sec Loss 1.5100 LearningRate 0.000125 Epoch: 27 Global Step: 565680 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-07-10 23:56:36,528-Speed 2513.00 samples/sec Loss 1.4956 LearningRate 0.000125 Epoch: 27 Global Step: 565690 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-07-10 23:56:44,735-Speed 2495.88 samples/sec Loss 1.5250 LearningRate 0.000125 Epoch: 27 Global Step: 565700 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-07-10 23:56:52,895-Speed 2510.23 samples/sec Loss 1.5574 LearningRate 0.000125 Epoch: 27 Global Step: 565710 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:57:01,099-Speed 2496.78 samples/sec Loss 1.5141 LearningRate 0.000125 Epoch: 27 Global Step: 565720 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:57:09,306-Speed 2495.91 samples/sec Loss 1.5281 LearningRate 0.000125 Epoch: 27 Global Step: 565730 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-10 23:57:17,508-Speed 2497.55 samples/sec Loss 1.5002 LearningRate 0.000125 Epoch: 27 Global Step: 565740 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:57:25,660-Speed 2512.39 samples/sec Loss 1.5017 LearningRate 0.000125 Epoch: 27 Global Step: 565750 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:57:33,867-Speed 2496.03 samples/sec Loss 1.5154 LearningRate 0.000125 Epoch: 27 Global Step: 565760 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:57:42,070-Speed 2496.89 samples/sec Loss 1.5109 LearningRate 0.000125 Epoch: 27 Global Step: 565770 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:57:50,283-Speed 2493.97 samples/sec Loss 1.5308 LearningRate 0.000125 Epoch: 27 Global Step: 565780 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:57:58,487-Speed 2496.86 samples/sec Loss 1.4934 LearningRate 0.000125 Epoch: 27 Global Step: 565790 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:58:06,692-Speed 2496.54 samples/sec Loss 1.5396 LearningRate 0.000125 Epoch: 27 Global Step: 565800 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:58:14,844-Speed 2512.52 samples/sec Loss 1.5287 LearningRate 0.000125 Epoch: 27 Global Step: 565810 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:58:23,048-Speed 2496.96 samples/sec Loss 1.5444 LearningRate 0.000125 Epoch: 27 Global Step: 565820 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:58:31,251-Speed 2497.02 samples/sec Loss 1.5104 LearningRate 0.000125 Epoch: 27 Global Step: 565830 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:58:39,455-Speed 2496.53 samples/sec Loss 1.5012 LearningRate 0.000125 Epoch: 27 Global Step: 565840 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:58:47,661-Speed 2496.31 samples/sec Loss 1.5302 LearningRate 0.000125 Epoch: 27 Global Step: 565850 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-10 23:58:55,870-Speed 2495.11 samples/sec Loss 1.5617 LearningRate 0.000125 Epoch: 27 Global Step: 565860 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:59:04,017-Speed 2514.24 samples/sec Loss 1.5295 LearningRate 0.000125 Epoch: 27 Global Step: 565870 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:59:12,222-Speed 2496.47 samples/sec Loss 1.5188 LearningRate 0.000125 Epoch: 27 Global Step: 565880 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:59:20,434-Speed 2494.17 samples/sec Loss 1.5432 LearningRate 0.000125 Epoch: 27 Global Step: 565890 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:59:28,638-Speed 2496.93 samples/sec Loss 1.5558 LearningRate 0.000125 Epoch: 27 Global Step: 565900 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:59:36,843-Speed 2496.55 samples/sec Loss 1.5439 LearningRate 0.000125 Epoch: 27 Global Step: 565910 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:59:45,048-Speed 2496.65 samples/sec Loss 1.5570 LearningRate 0.000125 Epoch: 27 Global Step: 565920 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-10 23:59:53,191-Speed 2515.28 samples/sec Loss 1.5572 LearningRate 0.000125 Epoch: 27 Global Step: 565930 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:00:01,400-Speed 2495.33 samples/sec Loss 1.5246 LearningRate 0.000125 Epoch: 27 Global Step: 565940 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:00:09,605-Speed 2496.30 samples/sec Loss 1.5331 LearningRate 0.000125 Epoch: 27 Global Step: 565950 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:00:17,808-Speed 2496.84 samples/sec Loss 1.5621 LearningRate 0.000125 Epoch: 27 Global Step: 565960 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:00:26,012-Speed 2496.72 samples/sec Loss 1.5523 LearningRate 0.000125 Epoch: 27 Global Step: 565970 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-11 00:00:34,216-Speed 2496.89 samples/sec Loss 1.5384 LearningRate 0.000125 Epoch: 27 Global Step: 565980 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:00:42,364-Speed 2513.61 samples/sec Loss 1.5627 LearningRate 0.000125 Epoch: 27 Global Step: 565990 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:00:50,569-Speed 2496.69 samples/sec Loss 1.4959 LearningRate 0.000125 Epoch: 27 Global Step: 566000 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:00:58,770-Speed 2497.94 samples/sec Loss 1.5120 LearningRate 0.000125 Epoch: 27 Global Step: 566010 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:01:06,977-Speed 2495.64 samples/sec Loss 1.5594 LearningRate 0.000125 Epoch: 27 Global Step: 566020 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:01:15,181-Speed 2496.71 samples/sec Loss 1.5501 LearningRate 0.000125 Epoch: 27 Global Step: 566030 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:01:23,380-Speed 2498.26 samples/sec Loss 1.5129 LearningRate 0.000125 Epoch: 27 Global Step: 566040 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:01:31,530-Speed 2513.51 samples/sec Loss 1.5065 LearningRate 0.000125 Epoch: 27 Global Step: 566050 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:01:39,731-Speed 2497.52 samples/sec Loss 1.5314 LearningRate 0.000125 Epoch: 27 Global Step: 566060 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:01:47,934-Speed 2496.97 samples/sec Loss 1.5461 LearningRate 0.000125 Epoch: 27 Global Step: 566070 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:01:56,140-Speed 2496.49 samples/sec Loss 1.5180 LearningRate 0.000125 Epoch: 27 Global Step: 566080 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:02:04,340-Speed 2497.80 samples/sec Loss 1.5550 LearningRate 0.000125 Epoch: 27 Global Step: 566090 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-11 00:02:12,556-Speed 2493.17 samples/sec Loss 1.5316 LearningRate 0.000125 Epoch: 27 Global Step: 566100 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:02:20,710-Speed 2512.06 samples/sec Loss 1.5424 LearningRate 0.000125 Epoch: 27 Global Step: 566110 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:02:28,918-Speed 2495.33 samples/sec Loss 1.4931 LearningRate 0.000125 Epoch: 27 Global Step: 566120 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:02:37,134-Speed 2493.23 samples/sec Loss 1.5108 LearningRate 0.000124 Epoch: 27 Global Step: 566130 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:02:45,335-Speed 2497.83 samples/sec Loss 1.5134 LearningRate 0.000124 Epoch: 27 Global Step: 566140 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:02:53,542-Speed 2495.95 samples/sec Loss 1.5445 LearningRate 0.000124 Epoch: 27 Global Step: 566150 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:03:01,746-Speed 2496.76 samples/sec Loss 1.5455 LearningRate 0.000124 Epoch: 27 Global Step: 566160 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:03:09,899-Speed 2512.36 samples/sec Loss 1.5299 LearningRate 0.000124 Epoch: 27 Global Step: 566170 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:03:18,067-Speed 2507.78 samples/sec Loss 1.5280 LearningRate 0.000124 Epoch: 27 Global Step: 566180 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:03:26,276-Speed 2495.62 samples/sec Loss 1.5517 LearningRate 0.000124 Epoch: 27 Global Step: 566190 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:03:34,493-Speed 2492.79 samples/sec Loss 1.5663 LearningRate 0.000124 Epoch: 27 Global Step: 566200 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:03:42,694-Speed 2497.69 samples/sec Loss 1.5105 LearningRate 0.000124 Epoch: 27 Global Step: 566210 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:03:50,895-Speed 2497.64 samples/sec Loss 1.4924 LearningRate 0.000124 Epoch: 27 Global Step: 566220 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:03:59,055-Speed 2510.21 samples/sec Loss 1.5488 LearningRate 0.000124 Epoch: 27 Global Step: 566230 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:04:07,254-Speed 2498.10 samples/sec Loss 1.5262 LearningRate 0.000124 Epoch: 27 Global Step: 566240 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:04:15,456-Speed 2497.32 samples/sec Loss 1.4947 LearningRate 0.000124 Epoch: 27 Global Step: 566250 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:04:23,658-Speed 2497.42 samples/sec Loss 1.4932 LearningRate 0.000124 Epoch: 27 Global Step: 566260 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:04:31,863-Speed 2496.52 samples/sec Loss 1.5371 LearningRate 0.000124 Epoch: 27 Global Step: 566270 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:04:40,066-Speed 2496.85 samples/sec Loss 1.5499 LearningRate 0.000124 Epoch: 27 Global Step: 566280 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:04:48,223-Speed 2511.12 samples/sec Loss 1.5092 LearningRate 0.000124 Epoch: 27 Global Step: 566290 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:04:56,424-Speed 2497.75 samples/sec Loss 1.5354 LearningRate 0.000124 Epoch: 27 Global Step: 566300 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:05:04,625-Speed 2497.64 samples/sec Loss 1.5670 LearningRate 0.000124 Epoch: 27 Global Step: 566310 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:05:12,828-Speed 2497.16 samples/sec Loss 1.5386 LearningRate 0.000124 Epoch: 27 Global Step: 566320 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:05:21,033-Speed 2496.48 samples/sec Loss 1.4888 LearningRate 0.000124 Epoch: 27 Global Step: 566330 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:05:29,248-Speed 2493.44 samples/sec Loss 1.5220 LearningRate 0.000124 Epoch: 27 Global Step: 566340 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:05:37,395-Speed 2514.12 samples/sec Loss 1.5076 LearningRate 0.000124 Epoch: 27 Global Step: 566350 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:05:45,596-Speed 2497.90 samples/sec Loss 1.5614 LearningRate 0.000124 Epoch: 27 Global Step: 566360 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:05:53,799-Speed 2496.96 samples/sec Loss 1.5521 LearningRate 0.000124 Epoch: 27 Global Step: 566370 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:06:02,005-Speed 2496.19 samples/sec Loss 1.4906 LearningRate 0.000124 Epoch: 27 Global Step: 566380 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:06:10,207-Speed 2497.22 samples/sec Loss 1.5121 LearningRate 0.000124 Epoch: 27 Global Step: 566390 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:06:18,410-Speed 2496.99 samples/sec Loss 1.5246 LearningRate 0.000124 Epoch: 27 Global Step: 566400 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:06:26,563-Speed 2512.41 samples/sec Loss 1.4976 LearningRate 0.000124 Epoch: 27 Global Step: 566410 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:06:34,765-Speed 2497.47 samples/sec Loss 1.4893 LearningRate 0.000124 Epoch: 27 Global Step: 566420 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:06:42,969-Speed 2497.20 samples/sec Loss 1.5503 LearningRate 0.000124 Epoch: 27 Global Step: 566430 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:06:51,171-Speed 2497.81 samples/sec Loss 1.5103 LearningRate 0.000124 Epoch: 27 Global Step: 566440 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:06:59,378-Speed 2495.91 samples/sec Loss 1.5250 LearningRate 0.000124 Epoch: 27 Global Step: 566450 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:07:07,579-Speed 2497.55 samples/sec Loss 1.5067 LearningRate 0.000124 Epoch: 27 Global Step: 566460 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:07:15,730-Speed 2513.05 samples/sec Loss 1.5095 LearningRate 0.000124 Epoch: 27 Global Step: 566470 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:07:23,935-Speed 2496.75 samples/sec Loss 1.4731 LearningRate 0.000124 Epoch: 27 Global Step: 566480 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:07:32,138-Speed 2497.00 samples/sec Loss 1.5395 LearningRate 0.000124 Epoch: 27 Global Step: 566490 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:07:40,342-Speed 2496.63 samples/sec Loss 1.5289 LearningRate 0.000124 Epoch: 27 Global Step: 566500 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:07:48,551-Speed 2495.27 samples/sec Loss 1.5121 LearningRate 0.000124 Epoch: 27 Global Step: 566510 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:07:56,756-Speed 2496.68 samples/sec Loss 1.5207 LearningRate 0.000124 Epoch: 27 Global Step: 566520 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:08:04,903-Speed 2513.95 samples/sec Loss 1.5014 LearningRate 0.000124 Epoch: 27 Global Step: 566530 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:08:13,104-Speed 2497.68 samples/sec Loss 1.5435 LearningRate 0.000124 Epoch: 27 Global Step: 566540 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:08:21,315-Speed 2494.94 samples/sec Loss 1.5209 LearningRate 0.000124 Epoch: 27 Global Step: 566550 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:08:29,512-Speed 2498.87 samples/sec Loss 1.5200 LearningRate 0.000124 Epoch: 27 Global Step: 566560 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:08:37,711-Speed 2497.95 samples/sec Loss 1.5556 LearningRate 0.000124 Epoch: 27 Global Step: 566570 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:08:45,922-Speed 2494.44 samples/sec Loss 1.5610 LearningRate 0.000124 Epoch: 27 Global Step: 566580 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:08:54,071-Speed 2513.67 samples/sec Loss 1.5582 LearningRate 0.000124 Epoch: 27 Global Step: 566590 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:09:02,274-Speed 2496.92 samples/sec Loss 1.4920 LearningRate 0.000124 Epoch: 27 Global Step: 566600 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:09:10,479-Speed 2496.49 samples/sec Loss 1.4857 LearningRate 0.000124 Epoch: 27 Global Step: 566610 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:09:18,678-Speed 2498.15 samples/sec Loss 1.5609 LearningRate 0.000124 Epoch: 27 Global Step: 566620 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:09:26,881-Speed 2497.47 samples/sec Loss 1.4874 LearningRate 0.000124 Epoch: 27 Global Step: 566630 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:09:35,082-Speed 2497.53 samples/sec Loss 1.5139 LearningRate 0.000124 Epoch: 27 Global Step: 566640 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:09:43,229-Speed 2514.25 samples/sec Loss 1.5395 LearningRate 0.000124 Epoch: 27 Global Step: 566650 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:09:51,430-Speed 2497.59 samples/sec Loss 1.5459 LearningRate 0.000124 Epoch: 27 Global Step: 566660 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:09:59,641-Speed 2494.67 samples/sec Loss 1.5150 LearningRate 0.000124 Epoch: 27 Global Step: 566670 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:10:07,847-Speed 2496.14 samples/sec Loss 1.5345 LearningRate 0.000124 Epoch: 27 Global Step: 566680 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:10:16,054-Speed 2495.70 samples/sec Loss 1.5543 LearningRate 0.000124 Epoch: 27 Global Step: 566690 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:10:24,259-Speed 2496.78 samples/sec Loss 1.5444 LearningRate 0.000124 Epoch: 27 Global Step: 566700 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:10:32,413-Speed 2512.06 samples/sec Loss 1.5341 LearningRate 0.000124 Epoch: 27 Global Step: 566710 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:10:40,617-Speed 2496.81 samples/sec Loss 1.5078 LearningRate 0.000124 Epoch: 27 Global Step: 566720 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:10:48,820-Speed 2496.93 samples/sec Loss 1.5247 LearningRate 0.000124 Epoch: 27 Global Step: 566730 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:10:57,020-Speed 2498.05 samples/sec Loss 1.5354 LearningRate 0.000124 Epoch: 27 Global Step: 566740 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:11:05,222-Speed 2497.60 samples/sec Loss 1.5411 LearningRate 0.000124 Epoch: 27 Global Step: 566750 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:11:13,428-Speed 2496.22 samples/sec Loss 1.5668 LearningRate 0.000124 Epoch: 27 Global Step: 566760 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:11:21,575-Speed 2514.30 samples/sec Loss 1.5228 LearningRate 0.000124 Epoch: 27 Global Step: 566770 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:11:29,778-Speed 2496.99 samples/sec Loss 1.5428 LearningRate 0.000124 Epoch: 27 Global Step: 566780 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:11:37,979-Speed 2497.50 samples/sec Loss 1.5397 LearningRate 0.000124 Epoch: 27 Global Step: 566790 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:11:46,180-Speed 2497.66 samples/sec Loss 1.5243 LearningRate 0.000124 Epoch: 27 Global Step: 566800 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:11:54,388-Speed 2495.43 samples/sec Loss 1.5344 LearningRate 0.000124 Epoch: 27 Global Step: 566810 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:12:02,588-Speed 2497.97 samples/sec Loss 1.4959 LearningRate 0.000124 Epoch: 27 Global Step: 566820 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:12:10,735-Speed 2514.25 samples/sec Loss 1.5155 LearningRate 0.000124 Epoch: 27 Global Step: 566830 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:12:18,939-Speed 2497.02 samples/sec Loss 1.4827 LearningRate 0.000124 Epoch: 27 Global Step: 566840 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:12:27,142-Speed 2497.00 samples/sec Loss 1.5340 LearningRate 0.000124 Epoch: 27 Global Step: 566850 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:12:35,346-Speed 2496.62 samples/sec Loss 1.5403 LearningRate 0.000124 Epoch: 27 Global Step: 566860 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:12:43,549-Speed 2497.14 samples/sec Loss 1.5169 LearningRate 0.000124 Epoch: 27 Global Step: 566870 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:12:51,753-Speed 2496.82 samples/sec Loss 1.5123 LearningRate 0.000124 Epoch: 27 Global Step: 566880 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:12:59,903-Speed 2513.21 samples/sec Loss 1.5080 LearningRate 0.000124 Epoch: 27 Global Step: 566890 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:13:08,109-Speed 2496.14 samples/sec Loss 1.5589 LearningRate 0.000124 Epoch: 27 Global Step: 566900 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:13:16,310-Speed 2497.82 samples/sec Loss 1.5389 LearningRate 0.000124 Epoch: 27 Global Step: 566910 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:13:24,512-Speed 2497.53 samples/sec Loss 1.5400 LearningRate 0.000124 Epoch: 27 Global Step: 566920 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:13:32,720-Speed 2495.24 samples/sec Loss 1.5288 LearningRate 0.000124 Epoch: 27 Global Step: 566930 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:13:40,926-Speed 2496.14 samples/sec Loss 1.5424 LearningRate 0.000124 Epoch: 27 Global Step: 566940 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:13:49,079-Speed 2512.66 samples/sec Loss 1.5113 LearningRate 0.000124 Epoch: 27 Global Step: 566950 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:13:57,284-Speed 2496.21 samples/sec Loss 1.5109 LearningRate 0.000124 Epoch: 27 Global Step: 566960 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:14:05,489-Speed 2496.65 samples/sec Loss 1.5222 LearningRate 0.000124 Epoch: 27 Global Step: 566970 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:14:13,691-Speed 2497.39 samples/sec Loss 1.5399 LearningRate 0.000124 Epoch: 27 Global Step: 566980 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:14:21,893-Speed 2497.33 samples/sec Loss 1.5427 LearningRate 0.000124 Epoch: 27 Global Step: 566990 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:14:30,095-Speed 2497.28 samples/sec Loss 1.5420 LearningRate 0.000124 Epoch: 27 Global Step: 567000 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:14:38,246-Speed 2513.00 samples/sec Loss 1.5281 LearningRate 0.000124 Epoch: 27 Global Step: 567010 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:14:46,446-Speed 2498.16 samples/sec Loss 1.4978 LearningRate 0.000124 Epoch: 27 Global Step: 567020 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:14:54,650-Speed 2496.67 samples/sec Loss 1.5161 LearningRate 0.000124 Epoch: 27 Global Step: 567030 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:15:02,853-Speed 2497.42 samples/sec Loss 1.5064 LearningRate 0.000124 Epoch: 27 Global Step: 567040 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:15:11,056-Speed 2497.00 samples/sec Loss 1.5093 LearningRate 0.000124 Epoch: 27 Global Step: 567050 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:15:19,262-Speed 2496.06 samples/sec Loss 1.5131 LearningRate 0.000124 Epoch: 27 Global Step: 567060 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:15:27,415-Speed 2512.41 samples/sec Loss 1.5448 LearningRate 0.000124 Epoch: 27 Global Step: 567070 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:15:35,617-Speed 2497.38 samples/sec Loss 1.5215 LearningRate 0.000124 Epoch: 27 Global Step: 567080 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:15:43,829-Speed 2494.75 samples/sec Loss 1.5186 LearningRate 0.000124 Epoch: 27 Global Step: 567090 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:15:52,039-Speed 2494.97 samples/sec Loss 1.5365 LearningRate 0.000124 Epoch: 27 Global Step: 567100 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:16:00,267-Speed 2489.55 samples/sec Loss 1.5241 LearningRate 0.000124 Epoch: 27 Global Step: 567110 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:16:08,471-Speed 2496.56 samples/sec Loss 1.5669 LearningRate 0.000124 Epoch: 27 Global Step: 567120 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:16:16,619-Speed 2514.06 samples/sec Loss 1.5493 LearningRate 0.000124 Epoch: 27 Global Step: 567130 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:16:24,822-Speed 2497.17 samples/sec Loss 1.4892 LearningRate 0.000124 Epoch: 27 Global Step: 567140 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:16:33,026-Speed 2496.79 samples/sec Loss 1.5343 LearningRate 0.000124 Epoch: 27 Global Step: 567150 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:16:41,227-Speed 2497.57 samples/sec Loss 1.5249 LearningRate 0.000124 Epoch: 27 Global Step: 567160 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:16:49,434-Speed 2495.82 samples/sec Loss 1.5353 LearningRate 0.000124 Epoch: 27 Global Step: 567170 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:16:57,640-Speed 2496.16 samples/sec Loss 1.5181 LearningRate 0.000124 Epoch: 27 Global Step: 567180 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:17:05,791-Speed 2513.23 samples/sec Loss 1.5477 LearningRate 0.000123 Epoch: 27 Global Step: 567190 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:17:13,996-Speed 2496.37 samples/sec Loss 1.5215 LearningRate 0.000123 Epoch: 27 Global Step: 567200 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:17:22,195-Speed 2498.40 samples/sec Loss 1.5077 LearningRate 0.000123 Epoch: 27 Global Step: 567210 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:17:30,401-Speed 2496.28 samples/sec Loss 1.5330 LearningRate 0.000123 Epoch: 27 Global Step: 567220 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:17:38,608-Speed 2495.53 samples/sec Loss 1.5228 LearningRate 0.000123 Epoch: 27 Global Step: 567230 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:17:46,813-Speed 2496.68 samples/sec Loss 1.5449 LearningRate 0.000123 Epoch: 27 Global Step: 567240 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:17:54,965-Speed 2512.84 samples/sec Loss 1.5341 LearningRate 0.000123 Epoch: 27 Global Step: 567250 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:18:03,170-Speed 2496.68 samples/sec Loss 1.5357 LearningRate 0.000123 Epoch: 27 Global Step: 567260 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:18:11,372-Speed 2496.95 samples/sec Loss 1.4994 LearningRate 0.000123 Epoch: 27 Global Step: 567270 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:18:19,576-Speed 2496.89 samples/sec Loss 1.4956 LearningRate 0.000123 Epoch: 27 Global Step: 567280 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:18:27,778-Speed 2497.30 samples/sec Loss 1.5325 LearningRate 0.000123 Epoch: 27 Global Step: 567290 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:18:35,982-Speed 2496.70 samples/sec Loss 1.5495 LearningRate 0.000123 Epoch: 27 Global Step: 567300 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:18:44,134-Speed 2512.77 samples/sec Loss 1.5304 LearningRate 0.000123 Epoch: 27 Global Step: 567310 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:18:52,338-Speed 2496.57 samples/sec Loss 1.5014 LearningRate 0.000123 Epoch: 27 Global Step: 567320 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:19:00,542-Speed 2496.98 samples/sec Loss 1.5343 LearningRate 0.000123 Epoch: 27 Global Step: 567330 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:19:08,748-Speed 2496.06 samples/sec Loss 1.5143 LearningRate 0.000123 Epoch: 27 Global Step: 567340 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:19:16,961-Speed 2494.46 samples/sec Loss 1.5081 LearningRate 0.000123 Epoch: 27 Global Step: 567350 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:19:25,164-Speed 2496.87 samples/sec Loss 1.5093 LearningRate 0.000123 Epoch: 27 Global Step: 567360 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:19:33,311-Speed 2514.22 samples/sec Loss 1.5185 LearningRate 0.000123 Epoch: 27 Global Step: 567370 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:19:41,516-Speed 2496.68 samples/sec Loss 1.4852 LearningRate 0.000123 Epoch: 27 Global Step: 567380 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:19:49,718-Speed 2497.08 samples/sec Loss 1.4884 LearningRate 0.000123 Epoch: 27 Global Step: 567390 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:19:57,922-Speed 2496.91 samples/sec Loss 1.5185 LearningRate 0.000123 Epoch: 27 Global Step: 567400 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:20:06,130-Speed 2495.56 samples/sec Loss 1.5159 LearningRate 0.000123 Epoch: 27 Global Step: 567410 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-11 00:20:14,332-Speed 2497.24 samples/sec Loss 1.5103 LearningRate 0.000123 Epoch: 27 Global Step: 567420 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:20:22,479-Speed 2514.07 samples/sec Loss 1.4947 LearningRate 0.000123 Epoch: 27 Global Step: 567430 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:20:30,681-Speed 2497.73 samples/sec Loss 1.4860 LearningRate 0.000123 Epoch: 27 Global Step: 567440 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:20:38,884-Speed 2497.07 samples/sec Loss 1.4970 LearningRate 0.000123 Epoch: 27 Global Step: 567450 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:20:47,087-Speed 2497.00 samples/sec Loss 1.5303 LearningRate 0.000123 Epoch: 27 Global Step: 567460 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:20:55,290-Speed 2497.20 samples/sec Loss 1.5162 LearningRate 0.000123 Epoch: 27 Global Step: 567470 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:21:03,494-Speed 2496.62 samples/sec Loss 1.5237 LearningRate 0.000123 Epoch: 27 Global Step: 567480 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:21:11,641-Speed 2514.20 samples/sec Loss 1.5028 LearningRate 0.000123 Epoch: 27 Global Step: 567490 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:21:19,850-Speed 2495.20 samples/sec Loss 1.5428 LearningRate 0.000123 Epoch: 27 Global Step: 567500 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:21:28,052-Speed 2497.48 samples/sec Loss 1.5389 LearningRate 0.000123 Epoch: 27 Global Step: 567510 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:21:36,257-Speed 2496.15 samples/sec Loss 1.5041 LearningRate 0.000123 Epoch: 27 Global Step: 567520 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:21:44,468-Speed 2494.71 samples/sec Loss 1.4813 LearningRate 0.000123 Epoch: 27 Global Step: 567530 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-11 00:21:52,674-Speed 2496.64 samples/sec Loss 1.5444 LearningRate 0.000123 Epoch: 27 Global Step: 567540 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:22:00,820-Speed 2514.67 samples/sec Loss 1.5134 LearningRate 0.000123 Epoch: 27 Global Step: 567550 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:22:09,024-Speed 2496.96 samples/sec Loss 1.5350 LearningRate 0.000123 Epoch: 27 Global Step: 567560 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:22:17,227-Speed 2496.99 samples/sec Loss 1.5306 LearningRate 0.000123 Epoch: 27 Global Step: 567570 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:22:25,429-Speed 2497.61 samples/sec Loss 1.5342 LearningRate 0.000123 Epoch: 27 Global Step: 567580 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:22:33,631-Speed 2497.59 samples/sec Loss 1.5200 LearningRate 0.000123 Epoch: 27 Global Step: 567590 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:22:41,838-Speed 2495.77 samples/sec Loss 1.5087 LearningRate 0.000123 Epoch: 27 Global Step: 567600 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:22:49,988-Speed 2513.26 samples/sec Loss 1.4971 LearningRate 0.000123 Epoch: 27 Global Step: 567610 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:22:58,147-Speed 2510.43 samples/sec Loss 1.5154 LearningRate 0.000123 Epoch: 27 Global Step: 567620 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:23:06,346-Speed 2498.32 samples/sec Loss 1.5074 LearningRate 0.000123 Epoch: 27 Global Step: 567630 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:23:14,546-Speed 2498.39 samples/sec Loss 1.5652 LearningRate 0.000123 Epoch: 27 Global Step: 567640 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:23:22,747-Speed 2497.65 samples/sec Loss 1.4951 LearningRate 0.000123 Epoch: 27 Global Step: 567650 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:23:30,949-Speed 2497.38 samples/sec Loss 1.5138 LearningRate 0.000123 Epoch: 27 Global Step: 567660 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:23:39,096-Speed 2514.03 samples/sec Loss 1.5150 LearningRate 0.000123 Epoch: 27 Global Step: 567670 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:23:47,298-Speed 2497.38 samples/sec Loss 1.4864 LearningRate 0.000123 Epoch: 27 Global Step: 567680 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:23:55,497-Speed 2498.42 samples/sec Loss 1.4824 LearningRate 0.000123 Epoch: 27 Global Step: 567690 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:24:03,698-Speed 2497.68 samples/sec Loss 1.5480 LearningRate 0.000123 Epoch: 27 Global Step: 567700 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:24:11,901-Speed 2497.01 samples/sec Loss 1.5295 LearningRate 0.000123 Epoch: 27 Global Step: 567710 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:24:20,103-Speed 2497.39 samples/sec Loss 1.5265 LearningRate 0.000123 Epoch: 27 Global Step: 567720 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:24:28,254-Speed 2512.87 samples/sec Loss 1.4810 LearningRate 0.000123 Epoch: 27 Global Step: 567730 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:24:36,456-Speed 2497.60 samples/sec Loss 1.5054 LearningRate 0.000123 Epoch: 27 Global Step: 567740 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:24:44,663-Speed 2495.79 samples/sec Loss 1.5390 LearningRate 0.000123 Epoch: 27 Global Step: 567750 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:24:52,868-Speed 2496.53 samples/sec Loss 1.5328 LearningRate 0.000123 Epoch: 27 Global Step: 567760 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:25:01,069-Speed 2497.82 samples/sec Loss 1.5086 LearningRate 0.000123 Epoch: 27 Global Step: 567770 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:25:09,269-Speed 2497.70 samples/sec Loss 1.5000 LearningRate 0.000123 Epoch: 27 Global Step: 567780 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:25:17,418-Speed 2513.76 samples/sec Loss 1.5158 LearningRate 0.000123 Epoch: 27 Global Step: 567790 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:25:25,619-Speed 2497.51 samples/sec Loss 1.5374 LearningRate 0.000123 Epoch: 27 Global Step: 567800 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:25:33,823-Speed 2496.97 samples/sec Loss 1.5355 LearningRate 0.000123 Epoch: 27 Global Step: 567810 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:25:42,027-Speed 2496.83 samples/sec Loss 1.5091 LearningRate 0.000123 Epoch: 27 Global Step: 567820 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:25:50,227-Speed 2497.77 samples/sec Loss 1.5055 LearningRate 0.000123 Epoch: 27 Global Step: 567830 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:25:58,429-Speed 2497.53 samples/sec Loss 1.5080 LearningRate 0.000123 Epoch: 27 Global Step: 567840 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:26:06,572-Speed 2515.36 samples/sec Loss 1.5208 LearningRate 0.000123 Epoch: 27 Global Step: 567850 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:26:14,774-Speed 2497.38 samples/sec Loss 1.5413 LearningRate 0.000123 Epoch: 27 Global Step: 567860 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:26:22,972-Speed 2498.56 samples/sec Loss 1.5373 LearningRate 0.000123 Epoch: 27 Global Step: 567870 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:26:31,171-Speed 2498.57 samples/sec Loss 1.5055 LearningRate 0.000123 Epoch: 27 Global Step: 567880 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:26:39,374-Speed 2496.94 samples/sec Loss 1.4826 LearningRate 0.000123 Epoch: 27 Global Step: 567890 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:26:47,575-Speed 2497.51 samples/sec Loss 1.5568 LearningRate 0.000123 Epoch: 27 Global Step: 567900 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:26:55,724-Speed 2513.84 samples/sec Loss 1.5048 LearningRate 0.000123 Epoch: 27 Global Step: 567910 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:27:03,925-Speed 2497.46 samples/sec Loss 1.5065 LearningRate 0.000123 Epoch: 27 Global Step: 567920 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:27:12,125-Speed 2497.93 samples/sec Loss 1.5077 LearningRate 0.000123 Epoch: 27 Global Step: 567930 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:27:20,327-Speed 2497.43 samples/sec Loss 1.5083 LearningRate 0.000123 Epoch: 27 Global Step: 567940 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:27:28,528-Speed 2497.86 samples/sec Loss 1.5076 LearningRate 0.000123 Epoch: 27 Global Step: 567950 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:27:36,728-Speed 2497.92 samples/sec Loss 1.4930 LearningRate 0.000123 Epoch: 27 Global Step: 567960 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:27:44,889-Speed 2510.03 samples/sec Loss 1.5560 LearningRate 0.000123 Epoch: 27 Global Step: 567970 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:27:53,095-Speed 2495.97 samples/sec Loss 1.5610 LearningRate 0.000123 Epoch: 27 Global Step: 567980 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:28:01,295-Speed 2497.99 samples/sec Loss 1.5613 LearningRate 0.000123 Epoch: 27 Global Step: 567990 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:28:09,495-Speed 2498.09 samples/sec Loss 1.5332 LearningRate 0.000123 Epoch: 27 Global Step: 568000 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:28:17,695-Speed 2497.94 samples/sec Loss 1.5329 LearningRate 0.000123 Epoch: 27 Global Step: 568010 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:28:25,904-Speed 2495.25 samples/sec Loss 1.5219 LearningRate 0.000123 Epoch: 27 Global Step: 568020 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:28:34,057-Speed 2512.23 samples/sec Loss 1.5371 LearningRate 0.000123 Epoch: 27 Global Step: 568030 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:28:42,260-Speed 2496.98 samples/sec Loss 1.5175 LearningRate 0.000123 Epoch: 27 Global Step: 568040 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:28:50,465-Speed 2496.55 samples/sec Loss 1.5321 LearningRate 0.000123 Epoch: 27 Global Step: 568050 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:28:58,666-Speed 2497.86 samples/sec Loss 1.5238 LearningRate 0.000123 Epoch: 27 Global Step: 568060 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:29:06,874-Speed 2495.73 samples/sec Loss 1.5727 LearningRate 0.000123 Epoch: 27 Global Step: 568070 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:29:15,093-Speed 2492.12 samples/sec Loss 1.5495 LearningRate 0.000123 Epoch: 27 Global Step: 568080 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:29:23,241-Speed 2513.86 samples/sec Loss 1.5456 LearningRate 0.000123 Epoch: 27 Global Step: 568090 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:29:31,447-Speed 2495.86 samples/sec Loss 1.5190 LearningRate 0.000123 Epoch: 27 Global Step: 568100 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:29:39,650-Speed 2497.24 samples/sec Loss 1.5464 LearningRate 0.000123 Epoch: 27 Global Step: 568110 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:29:47,857-Speed 2495.85 samples/sec Loss 1.4903 LearningRate 0.000123 Epoch: 27 Global Step: 568120 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:29:56,072-Speed 2493.36 samples/sec Loss 1.5648 LearningRate 0.000123 Epoch: 27 Global Step: 568130 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:30:04,279-Speed 2495.99 samples/sec Loss 1.5216 LearningRate 0.000123 Epoch: 27 Global Step: 568140 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:30:12,424-Speed 2514.76 samples/sec Loss 1.5013 LearningRate 0.000123 Epoch: 27 Global Step: 568150 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:30:20,623-Speed 2498.09 samples/sec Loss 1.5166 LearningRate 0.000123 Epoch: 27 Global Step: 568160 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:30:28,829-Speed 2496.26 samples/sec Loss 1.5507 LearningRate 0.000123 Epoch: 27 Global Step: 568170 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:30:37,031-Speed 2497.48 samples/sec Loss 1.4936 LearningRate 0.000123 Epoch: 27 Global Step: 568180 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:30:45,229-Speed 2498.26 samples/sec Loss 1.5207 LearningRate 0.000123 Epoch: 27 Global Step: 568190 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:30:53,433-Speed 2497.06 samples/sec Loss 1.5206 LearningRate 0.000123 Epoch: 27 Global Step: 568200 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:31:01,579-Speed 2514.40 samples/sec Loss 1.5345 LearningRate 0.000123 Epoch: 27 Global Step: 568210 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:31:09,788-Speed 2495.65 samples/sec Loss 1.5184 LearningRate 0.000123 Epoch: 27 Global Step: 568220 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:31:17,991-Speed 2496.74 samples/sec Loss 1.5147 LearningRate 0.000123 Epoch: 27 Global Step: 568230 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:31:26,195-Speed 2496.83 samples/sec Loss 1.5265 LearningRate 0.000123 Epoch: 27 Global Step: 568240 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:31:34,398-Speed 2497.25 samples/sec Loss 1.5671 LearningRate 0.000122 Epoch: 27 Global Step: 568250 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:31:42,600-Speed 2497.44 samples/sec Loss 1.5433 LearningRate 0.000122 Epoch: 27 Global Step: 568260 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:31:50,745-Speed 2514.55 samples/sec Loss 1.5050 LearningRate 0.000122 Epoch: 27 Global Step: 568270 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:31:58,945-Speed 2498.61 samples/sec Loss 1.5101 LearningRate 0.000122 Epoch: 27 Global Step: 568280 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:32:07,145-Speed 2497.84 samples/sec Loss 1.4890 LearningRate 0.000122 Epoch: 27 Global Step: 568290 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:32:15,366-Speed 2491.92 samples/sec Loss 1.5390 LearningRate 0.000122 Epoch: 27 Global Step: 568300 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:32:23,569-Speed 2496.90 samples/sec Loss 1.5041 LearningRate 0.000122 Epoch: 27 Global Step: 568310 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:32:31,770-Speed 2497.53 samples/sec Loss 1.5180 LearningRate 0.000122 Epoch: 27 Global Step: 568320 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:32:39,915-Speed 2514.79 samples/sec Loss 1.5140 LearningRate 0.000122 Epoch: 27 Global Step: 568330 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:32:48,123-Speed 2495.86 samples/sec Loss 1.5383 LearningRate 0.000122 Epoch: 27 Global Step: 568340 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:32:56,322-Speed 2498.04 samples/sec Loss 1.5342 LearningRate 0.000122 Epoch: 27 Global Step: 568350 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:33:04,522-Speed 2498.04 samples/sec Loss 1.5176 LearningRate 0.000122 Epoch: 27 Global Step: 568360 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:33:12,721-Speed 2498.25 samples/sec Loss 1.5197 LearningRate 0.000122 Epoch: 27 Global Step: 568370 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:33:20,922-Speed 2497.64 samples/sec Loss 1.5552 LearningRate 0.000122 Epoch: 27 Global Step: 568380 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:33:29,072-Speed 2513.55 samples/sec Loss 1.5197 LearningRate 0.000122 Epoch: 27 Global Step: 568390 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:33:37,299-Speed 2489.76 samples/sec Loss 1.5082 LearningRate 0.000122 Epoch: 27 Global Step: 568400 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:33:45,500-Speed 2497.73 samples/sec Loss 1.5304 LearningRate 0.000122 Epoch: 27 Global Step: 568410 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:33:53,706-Speed 2496.02 samples/sec Loss 1.5815 LearningRate 0.000122 Epoch: 27 Global Step: 568420 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:34:01,907-Speed 2497.51 samples/sec Loss 1.5082 LearningRate 0.000122 Epoch: 27 Global Step: 568430 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:34:10,110-Speed 2496.98 samples/sec Loss 1.5001 LearningRate 0.000122 Epoch: 27 Global Step: 568440 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:34:18,258-Speed 2514.08 samples/sec Loss 1.5113 LearningRate 0.000122 Epoch: 27 Global Step: 568450 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:34:26,458-Speed 2497.85 samples/sec Loss 1.4749 LearningRate 0.000122 Epoch: 27 Global Step: 568460 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:34:34,670-Speed 2494.42 samples/sec Loss 1.5340 LearningRate 0.000122 Epoch: 27 Global Step: 568470 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:34:42,881-Speed 2494.90 samples/sec Loss 1.5494 LearningRate 0.000122 Epoch: 27 Global Step: 568480 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:34:51,086-Speed 2496.41 samples/sec Loss 1.5213 LearningRate 0.000122 Epoch: 27 Global Step: 568490 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:34:59,290-Speed 2496.81 samples/sec Loss 1.4993 LearningRate 0.000122 Epoch: 27 Global Step: 568500 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:35:07,439-Speed 2513.72 samples/sec Loss 1.5248 LearningRate 0.000122 Epoch: 27 Global Step: 568510 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:35:15,649-Speed 2495.21 samples/sec Loss 1.5077 LearningRate 0.000122 Epoch: 27 Global Step: 568520 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:35:23,861-Speed 2494.41 samples/sec Loss 1.5219 LearningRate 0.000122 Epoch: 27 Global Step: 568530 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:35:32,065-Speed 2496.63 samples/sec Loss 1.5522 LearningRate 0.000122 Epoch: 27 Global Step: 568540 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:35:40,265-Speed 2497.93 samples/sec Loss 1.5075 LearningRate 0.000122 Epoch: 27 Global Step: 568550 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:35:48,470-Speed 2496.31 samples/sec Loss 1.5492 LearningRate 0.000122 Epoch: 27 Global Step: 568560 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:35:56,618-Speed 2513.70 samples/sec Loss 1.4912 LearningRate 0.000122 Epoch: 27 Global Step: 568570 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:36:04,830-Speed 2494.41 samples/sec Loss 1.5165 LearningRate 0.000122 Epoch: 27 Global Step: 568580 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:36:13,038-Speed 2495.78 samples/sec Loss 1.4755 LearningRate 0.000122 Epoch: 27 Global Step: 568590 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:36:21,248-Speed 2494.86 samples/sec Loss 1.4808 LearningRate 0.000122 Epoch: 27 Global Step: 568600 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:36:29,449-Speed 2497.70 samples/sec Loss 1.5139 LearningRate 0.000122 Epoch: 27 Global Step: 568610 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:36:37,650-Speed 2497.59 samples/sec Loss 1.5378 LearningRate 0.000122 Epoch: 27 Global Step: 568620 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:36:45,795-Speed 2514.88 samples/sec Loss 1.5151 LearningRate 0.000122 Epoch: 27 Global Step: 568630 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:36:53,997-Speed 2497.27 samples/sec Loss 1.5340 LearningRate 0.000122 Epoch: 27 Global Step: 568640 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:37:02,201-Speed 2496.69 samples/sec Loss 1.5022 LearningRate 0.000122 Epoch: 27 Global Step: 568650 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:37:10,415-Speed 2494.01 samples/sec Loss 1.4814 LearningRate 0.000122 Epoch: 27 Global Step: 568660 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:37:18,614-Speed 2498.21 samples/sec Loss 1.5330 LearningRate 0.000122 Epoch: 27 Global Step: 568670 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:37:26,828-Speed 2493.73 samples/sec Loss 1.4812 LearningRate 0.000122 Epoch: 27 Global Step: 568680 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:37:34,976-Speed 2513.90 samples/sec Loss 1.5023 LearningRate 0.000122 Epoch: 27 Global Step: 568690 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:37:43,191-Speed 2493.73 samples/sec Loss 1.5138 LearningRate 0.000122 Epoch: 27 Global Step: 568700 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:37:51,392-Speed 2497.38 samples/sec Loss 1.5048 LearningRate 0.000122 Epoch: 27 Global Step: 568710 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:37:59,591-Speed 2498.61 samples/sec Loss 1.5093 LearningRate 0.000122 Epoch: 27 Global Step: 568720 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:38:07,792-Speed 2497.58 samples/sec Loss 1.5229 LearningRate 0.000122 Epoch: 27 Global Step: 568730 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:38:15,994-Speed 2497.39 samples/sec Loss 1.5323 LearningRate 0.000122 Epoch: 27 Global Step: 568740 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:38:24,141-Speed 2514.08 samples/sec Loss 1.4902 LearningRate 0.000122 Epoch: 27 Global Step: 568750 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:38:32,345-Speed 2496.94 samples/sec Loss 1.5436 LearningRate 0.000122 Epoch: 27 Global Step: 568760 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:38:40,548-Speed 2496.99 samples/sec Loss 1.5570 LearningRate 0.000122 Epoch: 27 Global Step: 568770 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:38:48,750-Speed 2497.62 samples/sec Loss 1.5453 LearningRate 0.000122 Epoch: 27 Global Step: 568780 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:38:56,951-Speed 2497.69 samples/sec Loss 1.5429 LearningRate 0.000122 Epoch: 27 Global Step: 568790 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:39:05,156-Speed 2496.31 samples/sec Loss 1.5097 LearningRate 0.000122 Epoch: 27 Global Step: 568800 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:39:13,305-Speed 2513.57 samples/sec Loss 1.5114 LearningRate 0.000122 Epoch: 27 Global Step: 568810 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:39:21,508-Speed 2497.16 samples/sec Loss 1.5038 LearningRate 0.000122 Epoch: 27 Global Step: 568820 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:39:29,713-Speed 2496.37 samples/sec Loss 1.5505 LearningRate 0.000122 Epoch: 27 Global Step: 568830 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:39:37,915-Speed 2497.10 samples/sec Loss 1.5411 LearningRate 0.000122 Epoch: 27 Global Step: 568840 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:39:46,118-Speed 2496.98 samples/sec Loss 1.5201 LearningRate 0.000122 Epoch: 27 Global Step: 568850 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-11 00:39:54,321-Speed 2497.35 samples/sec Loss 1.5153 LearningRate 0.000122 Epoch: 27 Global Step: 568860 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:40:02,469-Speed 2513.77 samples/sec Loss 1.5229 LearningRate 0.000122 Epoch: 27 Global Step: 568870 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:40:10,674-Speed 2496.37 samples/sec Loss 1.5138 LearningRate 0.000122 Epoch: 27 Global Step: 568880 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:40:18,876-Speed 2497.60 samples/sec Loss 1.5199 LearningRate 0.000122 Epoch: 27 Global Step: 568890 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:40:27,203-Speed 2498.53 samples/sec Loss 1.5468 LearningRate 0.000122 Epoch: 27 Global Step: 568900 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:40:35,405-Speed 2497.13 samples/sec Loss 1.5103 LearningRate 0.000122 Epoch: 27 Global Step: 568910 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:40:43,655-Speed 2498.59 samples/sec Loss 1.5268 LearningRate 0.000122 Epoch: 27 Global Step: 568920 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:40:51,841-Speed 2514.95 samples/sec Loss 1.5102 LearningRate 0.000122 Epoch: 27 Global Step: 568930 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:41:00,059-Speed 2492.53 samples/sec Loss 1.5403 LearningRate 0.000122 Epoch: 27 Global Step: 568940 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:41:11,760-Speed 1753.84 samples/sec Loss 1.4822 LearningRate 0.000122 Epoch: 27 Global Step: 568950 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:41:20,015-Speed 2499.60 samples/sec Loss 1.5299 LearningRate 0.000122 Epoch: 27 Global Step: 568960 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:41:28,267-Speed 2501.23 samples/sec Loss 1.4952 LearningRate 0.000122 Epoch: 27 Global Step: 568970 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-11 00:41:36,462-Speed 2499.17 samples/sec Loss 1.5008 LearningRate 0.000122 Epoch: 27 Global Step: 568980 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:41:44,624-Speed 2517.72 samples/sec Loss 1.5006 LearningRate 0.000122 Epoch: 27 Global Step: 568990 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:41:57,053-Speed 1652.74 samples/sec Loss 1.5170 LearningRate 0.000122 Epoch: 27 Global Step: 569000 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:42:05,252-Speed 2503.31 samples/sec Loss 1.5104 LearningRate 0.000122 Epoch: 27 Global Step: 569010 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:42:13,449-Speed 2498.67 samples/sec Loss 1.5064 LearningRate 0.000122 Epoch: 27 Global Step: 569020 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:42:22,832-Speed 2192.40 samples/sec Loss 1.5015 LearningRate 0.000122 Epoch: 27 Global Step: 569030 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:42:34,770-Speed 1718.18 samples/sec Loss 1.4535 LearningRate 0.000122 Epoch: 27 Global Step: 569040 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:42:42,921-Speed 2512.98 samples/sec Loss 1.5099 LearningRate 0.000122 Epoch: 27 Global Step: 569050 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:42:51,153-Speed 2496.05 samples/sec Loss 1.4887 LearningRate 0.000122 Epoch: 27 Global Step: 569060 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:43:02,111-Speed 1881.39 samples/sec Loss 1.4865 LearningRate 0.000122 Epoch: 27 Global Step: 569070 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:43:10,323-Speed 2494.15 samples/sec Loss 1.4667 LearningRate 0.000122 Epoch: 27 Global Step: 569080 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:43:18,583-Speed 2496.91 samples/sec Loss 1.5686 LearningRate 0.000122 Epoch: 27 Global Step: 569090 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-11 00:43:26,838-Speed 2496.33 samples/sec Loss 1.4884 LearningRate 0.000122 Epoch: 27 Global Step: 569100 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:43:34,991-Speed 2512.39 samples/sec Loss 1.5175 LearningRate 0.000122 Epoch: 27 Global Step: 569110 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:43:48,309-Speed 1652.22 samples/sec Loss 1.5325 LearningRate 0.000122 Epoch: 27 Global Step: 569120 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:43:56,703-Speed 2441.49 samples/sec Loss 1.5179 LearningRate 0.000122 Epoch: 27 Global Step: 569130 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:44:04,899-Speed 2499.23 samples/sec Loss 1.4922 LearningRate 0.000122 Epoch: 27 Global Step: 569140 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:44:13,094-Speed 2499.25 samples/sec Loss 1.5139 LearningRate 0.000122 Epoch: 27 Global Step: 569150 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:44:21,291-Speed 2499.09 samples/sec Loss 1.5142 LearningRate 0.000122 Epoch: 27 Global Step: 569160 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:44:29,435-Speed 2515.06 samples/sec Loss 1.4788 LearningRate 0.000122 Epoch: 27 Global Step: 569170 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:44:37,633-Speed 2498.95 samples/sec Loss 1.5033 LearningRate 0.000122 Epoch: 27 Global Step: 569180 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:44:45,839-Speed 2496.04 samples/sec Loss 1.5146 LearningRate 0.000122 Epoch: 27 Global Step: 569190 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:44:54,039-Speed 2497.80 samples/sec Loss 1.4943 LearningRate 0.000122 Epoch: 27 Global Step: 569200 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:45:02,241-Speed 2497.81 samples/sec Loss 1.4789 LearningRate 0.000122 Epoch: 27 Global Step: 569210 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-11 00:45:10,447-Speed 2496.05 samples/sec Loss 1.5212 LearningRate 0.000122 Epoch: 27 Global Step: 569220 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:45:18,607-Speed 2510.10 samples/sec Loss 1.4870 LearningRate 0.000122 Epoch: 27 Global Step: 569230 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:45:26,816-Speed 2495.34 samples/sec Loss 1.4872 LearningRate 0.000122 Epoch: 27 Global Step: 569240 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:45:35,022-Speed 2496.36 samples/sec Loss 1.5048 LearningRate 0.000122 Epoch: 27 Global Step: 569250 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:45:43,227-Speed 2496.22 samples/sec Loss 1.5337 LearningRate 0.000122 Epoch: 27 Global Step: 569260 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:45:51,430-Speed 2497.03 samples/sec Loss 1.5353 LearningRate 0.000122 Epoch: 27 Global Step: 569270 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:45:59,635-Speed 2496.50 samples/sec Loss 1.5233 LearningRate 0.000122 Epoch: 27 Global Step: 569280 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:46:07,787-Speed 2512.78 samples/sec Loss 1.4933 LearningRate 0.000122 Epoch: 27 Global Step: 569290 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:46:15,994-Speed 2495.76 samples/sec Loss 1.5140 LearningRate 0.000122 Epoch: 27 Global Step: 569300 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:46:24,200-Speed 2495.93 samples/sec Loss 1.5233 LearningRate 0.000122 Epoch: 27 Global Step: 569310 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:46:32,405-Speed 2496.69 samples/sec Loss 1.5447 LearningRate 0.000121 Epoch: 27 Global Step: 569320 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:46:40,611-Speed 2496.05 samples/sec Loss 1.5282 LearningRate 0.000121 Epoch: 27 Global Step: 569330 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-11 00:46:48,812-Speed 2497.48 samples/sec Loss 1.5208 LearningRate 0.000121 Epoch: 27 Global Step: 569340 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:46:56,964-Speed 2512.97 samples/sec Loss 1.5410 LearningRate 0.000121 Epoch: 27 Global Step: 569350 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:47:05,165-Speed 2497.43 samples/sec Loss 1.5231 LearningRate 0.000121 Epoch: 27 Global Step: 569360 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:47:13,369-Speed 2497.18 samples/sec Loss 1.5035 LearningRate 0.000121 Epoch: 27 Global Step: 569370 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:47:21,573-Speed 2496.79 samples/sec Loss 1.4850 LearningRate 0.000121 Epoch: 27 Global Step: 569380 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:47:29,789-Speed 2493.20 samples/sec Loss 1.5457 LearningRate 0.000121 Epoch: 27 Global Step: 569390 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:47:37,996-Speed 2495.43 samples/sec Loss 1.5204 LearningRate 0.000121 Epoch: 27 Global Step: 569400 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:47:46,148-Speed 2512.73 samples/sec Loss 1.5305 LearningRate 0.000121 Epoch: 27 Global Step: 569410 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:47:54,351-Speed 2497.20 samples/sec Loss 1.5006 LearningRate 0.000121 Epoch: 27 Global Step: 569420 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:48:02,557-Speed 2496.10 samples/sec Loss 1.4971 LearningRate 0.000121 Epoch: 27 Global Step: 569430 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:48:10,761-Speed 2496.66 samples/sec Loss 1.5201 LearningRate 0.000121 Epoch: 27 Global Step: 569440 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:48:18,964-Speed 2497.14 samples/sec Loss 1.5151 LearningRate 0.000121 Epoch: 27 Global Step: 569450 Fp16 Grad Scale: 32768 Required: 60 hours 
Training: 2022-07-11 00:48:27,165-Speed 2497.53 samples/sec Loss 1.5187 LearningRate 0.000121 Epoch: 27 Global Step: 569460 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:48:35,321-Speed 2511.64 samples/sec Loss 1.5445 LearningRate 0.000121 Epoch: 27 Global Step: 569470 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:48:43,522-Speed 2497.66 samples/sec Loss 1.5265 LearningRate 0.000121 Epoch: 27 Global Step: 569480 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-07-11 00:48:51,682-Speed 2510.40 samples/sec Loss 1.4874 LearningRate 0.000121 Epoch: 27 Global Step: 569490 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:48:59,883-Speed 2497.39 samples/sec Loss 1.5215 LearningRate 0.000121 Epoch: 27 Global Step: 569500 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:49:08,084-Speed 2497.78 samples/sec Loss 1.5152 LearningRate 0.000121 Epoch: 27 Global Step: 569510 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:49:16,283-Speed 2498.18 samples/sec Loss 1.5196 LearningRate 0.000121 Epoch: 27 Global Step: 569520 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:49:24,435-Speed 2512.86 samples/sec Loss 1.4982 LearningRate 0.000121 Epoch: 27 Global Step: 569530 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:49:32,640-Speed 2496.39 samples/sec Loss 1.5186 LearningRate 0.000121 Epoch: 27 Global Step: 569540 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:49:40,858-Speed 2492.70 samples/sec Loss 1.4882 LearningRate 0.000121 Epoch: 27 Global Step: 569550 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:49:49,062-Speed 2496.70 samples/sec Loss 1.5344 LearningRate 0.000121 Epoch: 27 Global Step: 569560 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:49:57,276-Speed 2494.58 samples/sec Loss 1.5134 LearningRate 0.000121 Epoch: 27 Global Step: 569570 Fp16 Grad Scale: 16384 Required: 60 hours 
Training: 2022-07-11 00:50:05,477-Speed 2497.59 samples/sec Loss 1.5240 LearningRate 0.000121 Epoch: 27 Global Step: 569580 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:50:13,625-Speed 2513.76 samples/sec Loss 1.5149 LearningRate 0.000121 Epoch: 27 Global Step: 569590 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:50:21,826-Speed 2497.95 samples/sec Loss 1.5089 LearningRate 0.000121 Epoch: 27 Global Step: 569600 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:50:30,031-Speed 2496.26 samples/sec Loss 1.5192 LearningRate 0.000121 Epoch: 27 Global Step: 569610 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:50:38,233-Speed 2497.34 samples/sec Loss 1.5179 LearningRate 0.000121 Epoch: 27 Global Step: 569620 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:50:46,461-Speed 2489.51 samples/sec Loss 1.5331 LearningRate 0.000121 Epoch: 27 Global Step: 569630 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:50:54,677-Speed 2492.84 samples/sec Loss 1.5278 LearningRate 0.000121 Epoch: 27 Global Step: 569640 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:51:02,829-Speed 2512.75 samples/sec Loss 1.5305 LearningRate 0.000121 Epoch: 27 Global Step: 569650 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-07-11 00:51:11,035-Speed 2496.24 samples/sec Loss 1.5200 LearningRate 0.000121 Epoch: 27 Global Step: 569660 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:51:19,244-Speed 2495.44 samples/sec Loss 1.5186 LearningRate 0.000121 Epoch: 27 Global Step: 569670 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:51:27,451-Speed 2495.70 samples/sec Loss 1.5195 LearningRate 0.000121 Epoch: 27 Global Step: 569680 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:51:35,655-Speed 2496.59 samples/sec Loss 1.5030 LearningRate 0.000121 Epoch: 27 Global Step: 569690 Fp16 Grad Scale: 16384 Required: 59 hours 
Training: 2022-07-11 00:51:43,856-Speed 2497.86 samples/sec Loss 1.4857 LearningRate 0.000121 Epoch: 27 Global Step: 569700 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:51:52,007-Speed 2513.13 samples/sec Loss 1.4815 LearningRate 0.000121 Epoch: 27 Global Step: 569710 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:52:00,210-Speed 2496.85 samples/sec Loss 1.4977 LearningRate 0.000121 Epoch: 27 Global Step: 569720 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:52:08,412-Speed 2497.51 samples/sec Loss 1.5471 LearningRate 0.000121 Epoch: 27 Global Step: 569730 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:52:16,613-Speed 2497.54 samples/sec Loss 1.5344 LearningRate 0.000121 Epoch: 27 Global Step: 569740 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:52:24,818-Speed 2496.66 samples/sec Loss 1.5232 LearningRate 0.000121 Epoch: 27 Global Step: 569750 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:52:33,020-Speed 2497.35 samples/sec Loss 1.5092 LearningRate 0.000121 Epoch: 27 Global Step: 569760 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:52:41,172-Speed 2512.51 samples/sec Loss 1.5066 LearningRate 0.000121 Epoch: 27 Global Step: 569770 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:52:49,380-Speed 2495.68 samples/sec Loss 1.5107 LearningRate 0.000121 Epoch: 27 Global Step: 569780 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:52:57,581-Speed 2497.85 samples/sec Loss 1.5184 LearningRate 0.000121 Epoch: 27 Global Step: 569790 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:53:05,786-Speed 2496.54 samples/sec Loss 1.5469 LearningRate 0.000121 Epoch: 27 Global Step: 569800 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:53:13,987-Speed 2497.61 samples/sec Loss 1.5076 LearningRate 0.000121 Epoch: 27 Global Step: 569810 Fp16 Grad Scale: 16384 Required: 59 hours 
Training: 2022-07-11 00:53:22,188-Speed 2498.06 samples/sec Loss 1.5443 LearningRate 0.000121 Epoch: 27 Global Step: 569820 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:53:30,337-Speed 2513.39 samples/sec Loss 1.4980 LearningRate 0.000121 Epoch: 27 Global Step: 569830 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:53:38,537-Speed 2497.96 samples/sec Loss 1.4959 LearningRate 0.000121 Epoch: 27 Global Step: 569840 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:53:46,745-Speed 2495.76 samples/sec Loss 1.5728 LearningRate 0.000121 Epoch: 27 Global Step: 569850 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:53:54,949-Speed 2496.77 samples/sec Loss 1.5353 LearningRate 0.000121 Epoch: 27 Global Step: 569860 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:54:03,153-Speed 2496.68 samples/sec Loss 1.4973 LearningRate 0.000121 Epoch: 27 Global Step: 569870 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:54:11,355-Speed 2497.20 samples/sec Loss 1.5512 LearningRate 0.000121 Epoch: 27 Global Step: 569880 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:54:19,512-Speed 2511.38 samples/sec Loss 1.5348 LearningRate 0.000121 Epoch: 27 Global Step: 569890 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:54:27,716-Speed 2496.72 samples/sec Loss 1.5064 LearningRate 0.000121 Epoch: 27 Global Step: 569900 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:54:35,922-Speed 2496.23 samples/sec Loss 1.5175 LearningRate 0.000121 Epoch: 27 Global Step: 569910 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:54:44,140-Speed 2492.67 samples/sec Loss 1.5075 LearningRate 0.000121 Epoch: 27 Global Step: 569920 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:54:52,343-Speed 2497.02 samples/sec Loss 1.5252 LearningRate 0.000121 Epoch: 27 Global Step: 569930 Fp16 Grad Scale: 16384 Required: 59 hours 
Training: 2022-07-11 00:55:00,547-Speed 2496.51 samples/sec Loss 1.5086 LearningRate 0.000121 Epoch: 27 Global Step: 569940 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:55:08,696-Speed 2513.75 samples/sec Loss 1.4806 LearningRate 0.000121 Epoch: 27 Global Step: 569950 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:55:16,904-Speed 2495.50 samples/sec Loss 1.5077 LearningRate 0.000121 Epoch: 27 Global Step: 569960 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:55:25,110-Speed 2496.22 samples/sec Loss 1.4997 LearningRate 0.000121 Epoch: 27 Global Step: 569970 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:55:33,314-Speed 2496.63 samples/sec Loss 1.4906 LearningRate 0.000121 Epoch: 27 Global Step: 569980 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:55:41,517-Speed 2496.74 samples/sec Loss 1.5492 LearningRate 0.000121 Epoch: 27 Global Step: 569990 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:55:49,719-Speed 2497.31 samples/sec Loss 1.4884 LearningRate 0.000121 Epoch: 27 Global Step: 570000 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:55:57,870-Speed 2513.07 samples/sec Loss 1.5263 LearningRate 0.000121 Epoch: 27 Global Step: 570010 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 00:56:06,032-Speed 2509.47 samples/sec Loss 1.5119 LearningRate 0.000121 Epoch: 27 Global Step: 570020 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:56:14,232-Speed 2497.98 samples/sec Loss 1.5117 LearningRate 0.000121 Epoch: 27 Global Step: 570030 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:56:22,434-Speed 2497.44 samples/sec Loss 1.4717 LearningRate 0.000121 Epoch: 27 Global Step: 570040 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:56:30,635-Speed 2497.71 samples/sec Loss 1.5267 LearningRate 0.000121 Epoch: 27 Global Step: 570050 Fp16 Grad Scale: 8192 Required: 59 hours Training: 
2022-07-11 00:56:38,841-Speed 2495.93 samples/sec Loss 1.5107 LearningRate 0.000121 Epoch: 27 Global Step: 570060 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:56:46,991-Speed 2513.60 samples/sec Loss 1.5672 LearningRate 0.000121 Epoch: 27 Global Step: 570070 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:56:55,193-Speed 2497.42 samples/sec Loss 1.4680 LearningRate 0.000121 Epoch: 27 Global Step: 570080 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:57:03,395-Speed 2497.39 samples/sec Loss 1.5265 LearningRate 0.000121 Epoch: 27 Global Step: 570090 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:57:11,598-Speed 2497.18 samples/sec Loss 1.4875 LearningRate 0.000121 Epoch: 27 Global Step: 570100 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:57:19,796-Speed 2498.29 samples/sec Loss 1.5099 LearningRate 0.000121 Epoch: 27 Global Step: 570110 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:57:28,020-Speed 2490.67 samples/sec Loss 1.5264 LearningRate 0.000121 Epoch: 27 Global Step: 570120 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:57:36,166-Speed 2514.72 samples/sec Loss 1.4991 LearningRate 0.000121 Epoch: 27 Global Step: 570130 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:57:44,365-Speed 2498.57 samples/sec Loss 1.5078 LearningRate 0.000121 Epoch: 27 Global Step: 570140 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:57:52,563-Speed 2498.57 samples/sec Loss 1.5010 LearningRate 0.000121 Epoch: 27 Global Step: 570150 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:58:00,764-Speed 2497.91 samples/sec Loss 1.5170 LearningRate 0.000121 Epoch: 27 Global Step: 570160 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:58:08,962-Speed 2498.49 samples/sec Loss 1.5077 LearningRate 0.000121 Epoch: 27 Global Step: 570170 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 
00:58:17,163-Speed 2497.53 samples/sec Loss 1.5050 LearningRate 0.000121 Epoch: 27 Global Step: 570180 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:58:25,309-Speed 2514.69 samples/sec Loss 1.5090 LearningRate 0.000121 Epoch: 27 Global Step: 570190 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:58:33,515-Speed 2495.93 samples/sec Loss 1.4832 LearningRate 0.000121 Epoch: 27 Global Step: 570200 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:58:41,714-Speed 2498.20 samples/sec Loss 1.5107 LearningRate 0.000121 Epoch: 27 Global Step: 570210 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:58:49,915-Speed 2497.85 samples/sec Loss 1.5117 LearningRate 0.000121 Epoch: 27 Global Step: 570220 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:58:58,117-Speed 2497.57 samples/sec Loss 1.5333 LearningRate 0.000121 Epoch: 27 Global Step: 570230 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:59:06,313-Speed 2499.03 samples/sec Loss 1.5255 LearningRate 0.000121 Epoch: 27 Global Step: 570240 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:59:14,460-Speed 2514.34 samples/sec Loss 1.5008 LearningRate 0.000121 Epoch: 27 Global Step: 570250 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:59:22,663-Speed 2497.16 samples/sec Loss 1.5095 LearningRate 0.000121 Epoch: 27 Global Step: 570260 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:59:30,865-Speed 2497.50 samples/sec Loss 1.4968 LearningRate 0.000121 Epoch: 27 Global Step: 570270 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:59:39,066-Speed 2497.81 samples/sec Loss 1.5350 LearningRate 0.000121 Epoch: 27 Global Step: 570280 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:59:47,267-Speed 2497.69 samples/sec Loss 1.4739 LearningRate 0.000121 Epoch: 27 Global Step: 570290 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 00:59:55,465-Speed 
2498.68 samples/sec Loss 1.4840 LearningRate 0.000121 Epoch: 27 Global Step: 570300 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:00:03,625-Speed 2510.12 samples/sec Loss 1.4917 LearningRate 0.000121 Epoch: 27 Global Step: 570310 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:00:11,821-Speed 2499.35 samples/sec Loss 1.5083 LearningRate 0.000121 Epoch: 27 Global Step: 570320 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:00:20,022-Speed 2497.64 samples/sec Loss 1.5018 LearningRate 0.000121 Epoch: 27 Global Step: 570330 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:00:28,223-Speed 2497.82 samples/sec Loss 1.5127 LearningRate 0.000121 Epoch: 27 Global Step: 570340 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:00:36,438-Speed 2493.24 samples/sec Loss 1.4948 LearningRate 0.000121 Epoch: 27 Global Step: 570350 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:00:44,639-Speed 2497.69 samples/sec Loss 1.4614 LearningRate 0.000121 Epoch: 27 Global Step: 570360 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:00:52,786-Speed 2514.28 samples/sec Loss 1.5008 LearningRate 0.000121 Epoch: 27 Global Step: 570370 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:01:00,985-Speed 2498.12 samples/sec Loss 1.4731 LearningRate 0.000121 Epoch: 27 Global Step: 570380 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:01:09,199-Speed 2493.94 samples/sec Loss 1.5055 LearningRate 0.000121 Epoch: 27 Global Step: 570390 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:01:17,410-Speed 2494.49 samples/sec Loss 1.4711 LearningRate 0.000120 Epoch: 27 Global Step: 570400 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:01:25,612-Speed 2497.36 samples/sec Loss 1.4878 LearningRate 0.000120 Epoch: 27 Global Step: 570410 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:01:33,815-Speed 2497.36 samples/sec 
Loss 1.4780 LearningRate 0.000120 Epoch: 27 Global Step: 570420 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:01:41,967-Speed 2513.02 samples/sec Loss 1.4980 LearningRate 0.000120 Epoch: 27 Global Step: 570430 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:01:50,172-Speed 2496.70 samples/sec Loss 1.5055 LearningRate 0.000120 Epoch: 27 Global Step: 570440 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:01:58,373-Speed 2498.06 samples/sec Loss 1.5161 LearningRate 0.000120 Epoch: 27 Global Step: 570450 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:02:06,573-Speed 2497.88 samples/sec Loss 1.4831 LearningRate 0.000120 Epoch: 27 Global Step: 570460 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:02:14,774-Speed 2497.91 samples/sec Loss 1.4791 LearningRate 0.000120 Epoch: 27 Global Step: 570470 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:02:22,976-Speed 2497.22 samples/sec Loss 1.5158 LearningRate 0.000120 Epoch: 27 Global Step: 570480 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:02:31,124-Speed 2513.92 samples/sec Loss 1.5186 LearningRate 0.000120 Epoch: 27 Global Step: 570490 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:02:39,324-Speed 2497.88 samples/sec Loss 1.4905 LearningRate 0.000120 Epoch: 27 Global Step: 570500 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:02:47,525-Speed 2497.64 samples/sec Loss 1.5217 LearningRate 0.000120 Epoch: 27 Global Step: 570510 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:02:55,729-Speed 2496.97 samples/sec Loss 1.5256 LearningRate 0.000120 Epoch: 27 Global Step: 570520 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:03:03,926-Speed 2498.61 samples/sec Loss 1.5132 LearningRate 0.000120 Epoch: 27 Global Step: 570530 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:03:12,127-Speed 2497.91 samples/sec Loss 1.4857 
LearningRate 0.000120 Epoch: 27 Global Step: 570540 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:03:20,274-Speed 2513.95 samples/sec Loss 1.4983 LearningRate 0.000120 Epoch: 27 Global Step: 570550 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:03:28,475-Speed 2497.73 samples/sec Loss 1.5035 LearningRate 0.000120 Epoch: 27 Global Step: 570560 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:03:36,676-Speed 2497.95 samples/sec Loss 1.4902 LearningRate 0.000120 Epoch: 27 Global Step: 570570 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:03:44,876-Speed 2498.05 samples/sec Loss 1.4961 LearningRate 0.000120 Epoch: 27 Global Step: 570580 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:03:53,076-Speed 2497.74 samples/sec Loss 1.5051 LearningRate 0.000120 Epoch: 27 Global Step: 570590 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:04:01,275-Speed 2498.38 samples/sec Loss 1.5345 LearningRate 0.000120 Epoch: 27 Global Step: 570600 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:04:09,445-Speed 2507.41 samples/sec Loss 1.4559 LearningRate 0.000120 Epoch: 27 Global Step: 570610 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:04:17,644-Speed 2498.07 samples/sec Loss 1.4915 LearningRate 0.000120 Epoch: 27 Global Step: 570620 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:04:25,849-Speed 2496.59 samples/sec Loss 1.4894 LearningRate 0.000120 Epoch: 27 Global Step: 570630 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:04:34,051-Speed 2497.30 samples/sec Loss 1.5057 LearningRate 0.000120 Epoch: 27 Global Step: 570640 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:04:42,253-Speed 2497.68 samples/sec Loss 1.5493 LearningRate 0.000120 Epoch: 27 Global Step: 570650 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:04:50,452-Speed 2498.31 samples/sec Loss 1.5155 LearningRate 
0.000120 Epoch: 27 Global Step: 570660 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:04:58,603-Speed 2513.08 samples/sec Loss 1.5567 LearningRate 0.000120 Epoch: 27 Global Step: 570670 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:05:06,803-Speed 2497.87 samples/sec Loss 1.4991 LearningRate 0.000120 Epoch: 27 Global Step: 570680 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:05:15,010-Speed 2495.84 samples/sec Loss 1.5150 LearningRate 0.000120 Epoch: 27 Global Step: 570690 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:05:23,212-Speed 2497.41 samples/sec Loss 1.4716 LearningRate 0.000120 Epoch: 27 Global Step: 570700 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:05:31,425-Speed 2494.04 samples/sec Loss 1.5350 LearningRate 0.000120 Epoch: 27 Global Step: 570710 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:05:39,625-Speed 2498.01 samples/sec Loss 1.4897 LearningRate 0.000120 Epoch: 27 Global Step: 570720 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:05:47,771-Speed 2514.51 samples/sec Loss 1.5008 LearningRate 0.000120 Epoch: 27 Global Step: 570730 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:05:55,978-Speed 2495.89 samples/sec Loss 1.5481 LearningRate 0.000120 Epoch: 27 Global Step: 570740 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:06:04,177-Speed 2498.26 samples/sec Loss 1.5196 LearningRate 0.000120 Epoch: 27 Global Step: 570750 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:06:12,377-Speed 2498.09 samples/sec Loss 1.4959 LearningRate 0.000120 Epoch: 27 Global Step: 570760 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:06:20,576-Speed 2498.14 samples/sec Loss 1.5339 LearningRate 0.000120 Epoch: 27 Global Step: 570770 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:06:28,774-Speed 2498.37 samples/sec Loss 1.5521 LearningRate 0.000120 Epoch: 27 
Global Step: 570780 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:06:36,923-Speed 2513.80 samples/sec Loss 1.5477 LearningRate 0.000120 Epoch: 27 Global Step: 570790 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:06:45,130-Speed 2495.78 samples/sec Loss 1.5031 LearningRate 0.000120 Epoch: 27 Global Step: 570800 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:06:53,332-Speed 2497.59 samples/sec Loss 1.5177 LearningRate 0.000120 Epoch: 27 Global Step: 570810 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:07:01,543-Speed 2494.28 samples/sec Loss 1.5281 LearningRate 0.000120 Epoch: 27 Global Step: 570820 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:07:09,743-Speed 2498.30 samples/sec Loss 1.5141 LearningRate 0.000120 Epoch: 27 Global Step: 570830 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:07:17,956-Speed 2494.05 samples/sec Loss 1.5477 LearningRate 0.000120 Epoch: 27 Global Step: 570840 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:07:26,103-Speed 2514.07 samples/sec Loss 1.5507 LearningRate 0.000120 Epoch: 27 Global Step: 570850 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:07:34,303-Speed 2497.93 samples/sec Loss 1.5286 LearningRate 0.000120 Epoch: 27 Global Step: 570860 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:07:42,503-Speed 2497.83 samples/sec Loss 1.5311 LearningRate 0.000120 Epoch: 27 Global Step: 570870 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:07:50,701-Speed 2498.45 samples/sec Loss 1.5216 LearningRate 0.000120 Epoch: 27 Global Step: 570880 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:07:58,905-Speed 2496.73 samples/sec Loss 1.5137 LearningRate 0.000120 Epoch: 27 Global Step: 570890 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:08:07,108-Speed 2496.96 samples/sec Loss 1.5307 LearningRate 0.000120 Epoch: 27 Global Step: 570900 
Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:08:15,264-Speed 2511.75 samples/sec Loss 1.5389 LearningRate 0.000120 Epoch: 27 Global Step: 570910 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:08:23,465-Speed 2497.64 samples/sec Loss 1.4964 LearningRate 0.000120 Epoch: 27 Global Step: 570920 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:08:31,661-Speed 2499.30 samples/sec Loss 1.5080 LearningRate 0.000120 Epoch: 27 Global Step: 570930 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:08:39,883-Speed 2491.17 samples/sec Loss 1.4957 LearningRate 0.000120 Epoch: 27 Global Step: 570940 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:08:48,085-Speed 2497.47 samples/sec Loss 1.4881 LearningRate 0.000120 Epoch: 27 Global Step: 570950 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:08:56,297-Speed 2494.38 samples/sec Loss 1.4966 LearningRate 0.000120 Epoch: 27 Global Step: 570960 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:09:04,441-Speed 2514.96 samples/sec Loss 1.5555 LearningRate 0.000120 Epoch: 27 Global Step: 570970 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:09:12,645-Speed 2496.82 samples/sec Loss 1.5126 LearningRate 0.000120 Epoch: 27 Global Step: 570980 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:09:20,841-Speed 2499.08 samples/sec Loss 1.5043 LearningRate 0.000120 Epoch: 27 Global Step: 570990 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:09:29,054-Speed 2493.79 samples/sec Loss 1.4901 LearningRate 0.000120 Epoch: 27 Global Step: 571000 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:09:37,277-Speed 2490.95 samples/sec Loss 1.5261 LearningRate 0.000120 Epoch: 27 Global Step: 571010 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:09:45,476-Speed 2498.28 samples/sec Loss 1.4942 LearningRate 0.000120 Epoch: 27 Global Step: 571020 Fp16 Grad Scale: 
8192 Required: 59 hours Training: 2022-07-11 01:09:53,635-Speed 2510.60 samples/sec Loss 1.5020 LearningRate 0.000120 Epoch: 27 Global Step: 571030 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:10:01,833-Speed 2498.50 samples/sec Loss 1.5106 LearningRate 0.000120 Epoch: 27 Global Step: 571040 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:10:10,033-Speed 2498.07 samples/sec Loss 1.5390 LearningRate 0.000120 Epoch: 27 Global Step: 571050 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:10:18,232-Speed 2498.18 samples/sec Loss 1.5597 LearningRate 0.000120 Epoch: 27 Global Step: 571060 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:10:26,440-Speed 2496.11 samples/sec Loss 1.4872 LearningRate 0.000120 Epoch: 27 Global Step: 571070 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:10:34,641-Speed 2497.49 samples/sec Loss 1.5015 LearningRate 0.000120 Epoch: 27 Global Step: 571080 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:10:42,798-Speed 2511.28 samples/sec Loss 1.4739 LearningRate 0.000120 Epoch: 27 Global Step: 571090 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:10:51,000-Speed 2497.57 samples/sec Loss 1.5047 LearningRate 0.000120 Epoch: 27 Global Step: 571100 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:10:59,199-Speed 2498.29 samples/sec Loss 1.4971 LearningRate 0.000120 Epoch: 27 Global Step: 571110 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:11:07,398-Speed 2498.40 samples/sec Loss 1.5186 LearningRate 0.000120 Epoch: 27 Global Step: 571120 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:11:15,595-Speed 2498.61 samples/sec Loss 1.4828 LearningRate 0.000120 Epoch: 27 Global Step: 571130 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:11:23,795-Speed 2498.03 samples/sec Loss 1.5218 LearningRate 0.000120 Epoch: 27 Global Step: 571140 Fp16 Grad Scale: 8192 Required: 59 
hours Training: 2022-07-11 01:11:31,948-Speed 2512.48 samples/sec Loss 1.5152 LearningRate 0.000120 Epoch: 27 Global Step: 571150 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:11:40,175-Speed 2489.72 samples/sec Loss 1.5005 LearningRate 0.000120 Epoch: 27 Global Step: 571160 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:11:48,375-Speed 2498.07 samples/sec Loss 1.4810 LearningRate 0.000120 Epoch: 27 Global Step: 571170 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:11:56,573-Speed 2498.67 samples/sec Loss 1.5030 LearningRate 0.000120 Epoch: 27 Global Step: 571180 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:12:04,771-Speed 2498.39 samples/sec Loss 1.4605 LearningRate 0.000120 Epoch: 27 Global Step: 571190 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:12:12,970-Speed 2498.46 samples/sec Loss 1.5258 LearningRate 0.000120 Epoch: 27 Global Step: 571200 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:12:21,116-Speed 2514.33 samples/sec Loss 1.5130 LearningRate 0.000120 Epoch: 27 Global Step: 571210 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:12:29,319-Speed 2497.22 samples/sec Loss 1.4761 LearningRate 0.000120 Epoch: 27 Global Step: 571220 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:12:37,523-Speed 2496.82 samples/sec Loss 1.5265 LearningRate 0.000120 Epoch: 27 Global Step: 571230 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:12:45,725-Speed 2497.28 samples/sec Loss 1.5098 LearningRate 0.000120 Epoch: 27 Global Step: 571240 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:12:53,928-Speed 2496.90 samples/sec Loss 1.4892 LearningRate 0.000120 Epoch: 27 Global Step: 571250 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:13:02,135-Speed 2495.90 samples/sec Loss 1.5049 LearningRate 0.000120 Epoch: 27 Global Step: 571260 Fp16 Grad Scale: 16384 Required: 59 hours 
Training: 2022-07-11 01:13:10,286-Speed 2513.11 samples/sec Loss 1.5233 LearningRate 0.000120 Epoch: 27 Global Step: 571270 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:13:18,488-Speed 2497.53 samples/sec Loss 1.5138 LearningRate 0.000120 Epoch: 27 Global Step: 571280 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:13:26,695-Speed 2495.88 samples/sec Loss 1.4915 LearningRate 0.000120 Epoch: 27 Global Step: 571290 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:13:34,898-Speed 2497.05 samples/sec Loss 1.5082 LearningRate 0.000120 Epoch: 27 Global Step: 571300 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:13:43,097-Speed 2498.29 samples/sec Loss 1.5382 LearningRate 0.000120 Epoch: 27 Global Step: 571310 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:13:51,256-Speed 2510.56 samples/sec Loss 1.4803 LearningRate 0.000120 Epoch: 27 Global Step: 571320 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:13:59,404-Speed 2513.98 samples/sec Loss 1.5318 LearningRate 0.000120 Epoch: 27 Global Step: 571330 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:14:07,602-Speed 2498.37 samples/sec Loss 1.4846 LearningRate 0.000120 Epoch: 27 Global Step: 571340 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:14:15,805-Speed 2497.26 samples/sec Loss 1.5063 LearningRate 0.000120 Epoch: 27 Global Step: 571350 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:14:24,005-Speed 2497.82 samples/sec Loss 1.4792 LearningRate 0.000120 Epoch: 27 Global Step: 571360 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:14:32,204-Speed 2498.38 samples/sec Loss 1.5166 LearningRate 0.000120 Epoch: 27 Global Step: 571370 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:14:40,406-Speed 2497.23 samples/sec Loss 1.5151 LearningRate 0.000120 Epoch: 27 Global Step: 571380 Fp16 Grad Scale: 8192 Required: 59 hours Training: 
2022-07-11 01:14:48,565-Speed 2510.61 samples/sec Loss 1.5107 LearningRate 0.000120 Epoch: 27 Global Step: 571390 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:14:56,763-Speed 2498.60 samples/sec Loss 1.5014 LearningRate 0.000120 Epoch: 27 Global Step: 571400 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:15:04,963-Speed 2497.79 samples/sec Loss 1.5099 LearningRate 0.000120 Epoch: 27 Global Step: 571410 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:15:13,161-Speed 2498.54 samples/sec Loss 1.4886 LearningRate 0.000120 Epoch: 27 Global Step: 571420 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:15:21,379-Speed 2492.80 samples/sec Loss 1.5101 LearningRate 0.000120 Epoch: 27 Global Step: 571430 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:15:29,578-Speed 2498.03 samples/sec Loss 1.4910 LearningRate 0.000120 Epoch: 27 Global Step: 571440 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:15:37,729-Speed 2513.25 samples/sec Loss 1.5089 LearningRate 0.000120 Epoch: 27 Global Step: 571450 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:15:45,944-Speed 2493.52 samples/sec Loss 1.5537 LearningRate 0.000120 Epoch: 27 Global Step: 571460 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:15:54,147-Speed 2496.75 samples/sec Loss 1.5176 LearningRate 0.000119 Epoch: 27 Global Step: 571470 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:16:02,347-Speed 2498.23 samples/sec Loss 1.5080 LearningRate 0.000119 Epoch: 27 Global Step: 571480 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:16:10,544-Speed 2498.79 samples/sec Loss 1.4925 LearningRate 0.000119 Epoch: 27 Global Step: 571490 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:16:18,744-Speed 2498.03 samples/sec Loss 1.5217 LearningRate 0.000119 Epoch: 27 Global Step: 571500 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 
01:16:26,894-Speed 2513.00 samples/sec Loss 1.5021 LearningRate 0.000119 Epoch: 27 Global Step: 571510 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:16:35,097-Speed 2497.18 samples/sec Loss 1.4880 LearningRate 0.000119 Epoch: 27 Global Step: 571520 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:16:43,297-Speed 2498.11 samples/sec Loss 1.5025 LearningRate 0.000119 Epoch: 27 Global Step: 571530 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:16:51,499-Speed 2497.47 samples/sec Loss 1.5354 LearningRate 0.000119 Epoch: 27 Global Step: 571540 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:16:59,698-Speed 2498.18 samples/sec Loss 1.5471 LearningRate 0.000119 Epoch: 27 Global Step: 571550 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:17:07,905-Speed 2495.91 samples/sec Loss 1.4978 LearningRate 0.000119 Epoch: 27 Global Step: 571560 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:17:16,055-Speed 2513.05 samples/sec Loss 1.5278 LearningRate 0.000119 Epoch: 27 Global Step: 571570 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:17:24,259-Speed 2496.82 samples/sec Loss 1.5234 LearningRate 0.000119 Epoch: 27 Global Step: 571580 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:17:32,461-Speed 2497.21 samples/sec Loss 1.5150 LearningRate 0.000119 Epoch: 27 Global Step: 571590 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:17:40,666-Speed 2496.53 samples/sec Loss 1.5019 LearningRate 0.000119 Epoch: 27 Global Step: 571600 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:17:48,872-Speed 2496.00 samples/sec Loss 1.4735 LearningRate 0.000119 Epoch: 27 Global Step: 571610 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:17:57,078-Speed 2495.99 samples/sec Loss 1.5099 LearningRate 0.000119 Epoch: 27 Global Step: 571620 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:18:05,230-Speed 
2512.97 samples/sec Loss 1.4784 LearningRate 0.000119 Epoch: 27 Global Step: 571630 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:18:13,434-Speed 2497.14 samples/sec Loss 1.4981 LearningRate 0.000119 Epoch: 27 Global Step: 571640 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:18:21,637-Speed 2496.98 samples/sec Loss 1.4566 LearningRate 0.000119 Epoch: 27 Global Step: 571650 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:18:29,839-Speed 2497.36 samples/sec Loss 1.4646 LearningRate 0.000119 Epoch: 27 Global Step: 571660 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:18:38,042-Speed 2497.13 samples/sec Loss 1.5158 LearningRate 0.000119 Epoch: 27 Global Step: 571670 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:18:46,245-Speed 2497.12 samples/sec Loss 1.4695 LearningRate 0.000119 Epoch: 27 Global Step: 571680 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:18:54,395-Speed 2513.32 samples/sec Loss 1.5230 LearningRate 0.000119 Epoch: 27 Global Step: 571690 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:19:02,600-Speed 2496.38 samples/sec Loss 1.5269 LearningRate 0.000119 Epoch: 27 Global Step: 571700 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:19:10,800-Speed 2498.29 samples/sec Loss 1.4955 LearningRate 0.000119 Epoch: 27 Global Step: 571710 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:19:19,009-Speed 2495.24 samples/sec Loss 1.5421 LearningRate 0.000119 Epoch: 27 Global Step: 571720 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:19:27,211-Speed 2497.03 samples/sec Loss 1.4960 LearningRate 0.000119 Epoch: 27 Global Step: 571730 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:19:35,414-Speed 2497.22 samples/sec Loss 1.5299 LearningRate 0.000119 Epoch: 27 Global Step: 571740 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:19:43,565-Speed 2512.94 samples/sec 
Loss 1.4166 LearningRate 0.000119 Epoch: 27 Global Step: 571750 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:19:51,771-Speed 2496.09 samples/sec Loss 1.5001 LearningRate 0.000119 Epoch: 27 Global Step: 571760 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:19:59,974-Speed 2497.04 samples/sec Loss 1.5334 LearningRate 0.000119 Epoch: 27 Global Step: 571770 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:20:08,181-Speed 2495.79 samples/sec Loss 1.4955 LearningRate 0.000119 Epoch: 27 Global Step: 571780 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:20:16,396-Speed 2493.34 samples/sec Loss 1.5206 LearningRate 0.000119 Epoch: 27 Global Step: 571790 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:20:24,598-Speed 2497.56 samples/sec Loss 1.4924 LearningRate 0.000119 Epoch: 27 Global Step: 571800 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:20:32,747-Speed 2513.56 samples/sec Loss 1.4779 LearningRate 0.000119 Epoch: 27 Global Step: 571810 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:20:40,947-Speed 2498.22 samples/sec Loss 1.5321 LearningRate 0.000119 Epoch: 27 Global Step: 571820 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:20:49,149-Speed 2497.20 samples/sec Loss 1.5129 LearningRate 0.000119 Epoch: 27 Global Step: 571830 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:20:57,355-Speed 2496.07 samples/sec Loss 1.5144 LearningRate 0.000119 Epoch: 27 Global Step: 571840 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:21:05,566-Speed 2494.80 samples/sec Loss 1.4898 LearningRate 0.000119 Epoch: 27 Global Step: 571850 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:21:13,779-Speed 2494.08 samples/sec Loss 1.4992 LearningRate 0.000119 Epoch: 27 Global Step: 571860 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:21:21,924-Speed 2514.84 samples/sec Loss 1.4973 
LearningRate 0.000119 Epoch: 27 Global Step: 571870 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:21:30,128-Speed 2496.63 samples/sec Loss 1.4995 LearningRate 0.000119 Epoch: 27 Global Step: 571880 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:21:38,330-Speed 2497.25 samples/sec Loss 1.4825 LearningRate 0.000119 Epoch: 27 Global Step: 571890 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:21:46,528-Speed 2498.65 samples/sec Loss 1.5157 LearningRate 0.000119 Epoch: 27 Global Step: 571900 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:21:54,730-Speed 2497.47 samples/sec Loss 1.5284 LearningRate 0.000119 Epoch: 27 Global Step: 571910 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:22:02,935-Speed 2496.61 samples/sec Loss 1.4637 LearningRate 0.000119 Epoch: 27 Global Step: 571920 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:22:11,085-Speed 2513.37 samples/sec Loss 1.4896 LearningRate 0.000119 Epoch: 27 Global Step: 571930 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:22:19,285-Speed 2497.95 samples/sec Loss 1.5259 LearningRate 0.000119 Epoch: 27 Global Step: 571940 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:22:27,487-Speed 2497.18 samples/sec Loss 1.4857 LearningRate 0.000119 Epoch: 27 Global Step: 571950 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:22:35,691-Speed 2496.81 samples/sec Loss 1.5089 LearningRate 0.000119 Epoch: 27 Global Step: 571960 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:22:43,889-Speed 2498.53 samples/sec Loss 1.5466 LearningRate 0.000119 Epoch: 27 Global Step: 571970 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:22:52,093-Speed 2496.95 samples/sec Loss 1.5232 LearningRate 0.000119 Epoch: 27 Global Step: 571980 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:23:00,239-Speed 2514.49 samples/sec Loss 1.4884 LearningRate 
0.000119 Epoch: 27 Global Step: 571990 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:23:08,444-Speed 2496.38 samples/sec Loss 1.4629 LearningRate 0.000119 Epoch: 27 Global Step: 572000 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:23:16,667-Speed 2491.08 samples/sec Loss 1.5423 LearningRate 0.000119 Epoch: 27 Global Step: 572010 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:23:24,866-Speed 2498.15 samples/sec Loss 1.4661 LearningRate 0.000119 Epoch: 27 Global Step: 572020 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:23:33,070-Speed 2496.88 samples/sec Loss 1.4950 LearningRate 0.000119 Epoch: 27 Global Step: 572030 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:23:41,270-Speed 2498.00 samples/sec Loss 1.5064 LearningRate 0.000119 Epoch: 27 Global Step: 572040 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:23:49,423-Speed 2512.54 samples/sec Loss 1.4833 LearningRate 0.000119 Epoch: 27 Global Step: 572050 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:23:57,622-Speed 2497.99 samples/sec Loss 1.5189 LearningRate 0.000119 Epoch: 27 Global Step: 572060 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:24:05,829-Speed 2495.87 samples/sec Loss 1.5108 LearningRate 0.000119 Epoch: 27 Global Step: 572070 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:24:14,029-Speed 2498.10 samples/sec Loss 1.4664 LearningRate 0.000119 Epoch: 27 Global Step: 572080 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:24:22,229-Speed 2498.27 samples/sec Loss 1.4894 LearningRate 0.000119 Epoch: 27 Global Step: 572090 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:24:30,434-Speed 2496.48 samples/sec Loss 1.4694 LearningRate 0.000119 Epoch: 27 Global Step: 572100 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:24:38,583-Speed 2513.47 samples/sec Loss 1.5124 LearningRate 0.000119 Epoch: 27 
Global Step: 572110 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:24:46,784-Speed 2497.71 samples/sec Loss 1.4898 LearningRate 0.000119 Epoch: 27 Global Step: 572120 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:24:54,982-Speed 2498.93 samples/sec Loss 1.5286 LearningRate 0.000119 Epoch: 27 Global Step: 572130 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:25:03,179-Speed 2498.65 samples/sec Loss 1.5134 LearningRate 0.000119 Epoch: 27 Global Step: 572140 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:25:11,380-Speed 2497.57 samples/sec Loss 1.4692 LearningRate 0.000119 Epoch: 27 Global Step: 572150 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:25:19,582-Speed 2497.51 samples/sec Loss 1.5004 LearningRate 0.000119 Epoch: 27 Global Step: 572160 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:25:27,730-Speed 2513.75 samples/sec Loss 1.5458 LearningRate 0.000119 Epoch: 27 Global Step: 572170 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:25:35,935-Speed 2496.50 samples/sec Loss 1.5486 LearningRate 0.000119 Epoch: 27 Global Step: 572180 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:25:44,146-Speed 2494.42 samples/sec Loss 1.5348 LearningRate 0.000119 Epoch: 27 Global Step: 572190 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:25:52,342-Speed 2499.14 samples/sec Loss 1.5345 LearningRate 0.000119 Epoch: 27 Global Step: 572200 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:26:00,554-Speed 2494.41 samples/sec Loss 1.5173 LearningRate 0.000119 Epoch: 27 Global Step: 572210 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:26:08,757-Speed 2496.93 samples/sec Loss 1.4882 LearningRate 0.000119 Epoch: 27 Global Step: 572220 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:26:16,915-Speed 2510.94 samples/sec Loss 1.5122 LearningRate 0.000119 Epoch: 27 Global Step: 572230 
Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:26:25,117-Speed 2497.20 samples/sec Loss 1.4846 LearningRate 0.000119 Epoch: 27 Global Step: 572240 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:26:33,320-Speed 2497.27 samples/sec Loss 1.4989 LearningRate 0.000119 Epoch: 27 Global Step: 572250 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:26:41,522-Speed 2497.28 samples/sec Loss 1.5191 LearningRate 0.000119 Epoch: 27 Global Step: 572260 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:26:49,724-Speed 2497.56 samples/sec Loss 1.4946 LearningRate 0.000119 Epoch: 27 Global Step: 572270 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:26:57,925-Speed 2497.55 samples/sec Loss 1.4993 LearningRate 0.000119 Epoch: 27 Global Step: 572280 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:27:06,072-Speed 2514.10 samples/sec Loss 1.4947 LearningRate 0.000119 Epoch: 27 Global Step: 572290 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:27:14,271-Speed 2498.26 samples/sec Loss 1.5127 LearningRate 0.000119 Epoch: 27 Global Step: 572300 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:27:22,471-Speed 2497.95 samples/sec Loss 1.5087 LearningRate 0.000119 Epoch: 27 Global Step: 572310 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:27:30,672-Speed 2498.08 samples/sec Loss 1.4859 LearningRate 0.000119 Epoch: 27 Global Step: 572320 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:27:38,869-Speed 2498.54 samples/sec Loss 1.4936 LearningRate 0.000119 Epoch: 27 Global Step: 572330 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:27:47,070-Speed 2497.71 samples/sec Loss 1.4882 LearningRate 0.000119 Epoch: 27 Global Step: 572340 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:27:55,215-Speed 2514.74 samples/sec Loss 1.5138 LearningRate 0.000119 Epoch: 27 Global Step: 572350 Fp16 Grad Scale: 
8192 Required: 59 hours Training: 2022-07-11 01:28:03,415-Speed 2498.30 samples/sec Loss 1.5217 LearningRate 0.000119 Epoch: 27 Global Step: 572360 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:28:11,614-Speed 2498.03 samples/sec Loss 1.5261 LearningRate 0.000119 Epoch: 27 Global Step: 572370 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:28:19,825-Speed 2494.65 samples/sec Loss 1.4798 LearningRate 0.000119 Epoch: 27 Global Step: 572380 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:28:28,024-Speed 2498.38 samples/sec Loss 1.5063 LearningRate 0.000119 Epoch: 27 Global Step: 572390 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:28:36,227-Speed 2496.77 samples/sec Loss 1.4916 LearningRate 0.000119 Epoch: 27 Global Step: 572400 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:28:44,371-Speed 2515.22 samples/sec Loss 1.4876 LearningRate 0.000119 Epoch: 27 Global Step: 572410 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:28:52,571-Speed 2497.84 samples/sec Loss 1.5154 LearningRate 0.000119 Epoch: 27 Global Step: 572420 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:29:00,773-Speed 2497.65 samples/sec Loss 1.4955 LearningRate 0.000119 Epoch: 27 Global Step: 572430 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:29:08,978-Speed 2496.32 samples/sec Loss 1.5194 LearningRate 0.000119 Epoch: 27 Global Step: 572440 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:29:17,177-Speed 2498.19 samples/sec Loss 1.5097 LearningRate 0.000119 Epoch: 27 Global Step: 572450 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:29:25,376-Speed 2498.40 samples/sec Loss 1.4904 LearningRate 0.000119 Epoch: 27 Global Step: 572460 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:29:33,523-Speed 2514.17 samples/sec Loss 1.5190 LearningRate 0.000119 Epoch: 27 Global Step: 572470 Fp16 Grad Scale: 8192 Required: 59 
hours Training: 2022-07-11 01:29:41,724-Speed 2497.55 samples/sec Loss 1.5219 LearningRate 0.000119 Epoch: 27 Global Step: 572480 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:29:49,935-Speed 2494.63 samples/sec Loss 1.5143 LearningRate 0.000119 Epoch: 27 Global Step: 572490 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:29:58,148-Speed 2493.95 samples/sec Loss 1.5064 LearningRate 0.000119 Epoch: 27 Global Step: 572500 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:30:06,353-Speed 2496.50 samples/sec Loss 1.4545 LearningRate 0.000119 Epoch: 27 Global Step: 572510 Fp16 Grad Scale: 8192 Required: 59 hours Training: 2022-07-11 01:30:14,552-Speed 2498.30 samples/sec Loss 1.4578 LearningRate 0.000119 Epoch: 27 Global Step: 572520 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:30:22,703-Speed 2512.91 samples/sec Loss 1.4870 LearningRate 0.000119 Epoch: 27 Global Step: 572530 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:30:30,906-Speed 2497.35 samples/sec Loss 1.4937 LearningRate 0.000119 Epoch: 27 Global Step: 572540 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:30:39,111-Speed 2496.40 samples/sec Loss 1.5436 LearningRate 0.000119 Epoch: 27 Global Step: 572550 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:30:47,312-Speed 2497.72 samples/sec Loss 1.4881 LearningRate 0.000118 Epoch: 27 Global Step: 572560 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:30:55,513-Speed 2497.73 samples/sec Loss 1.5174 LearningRate 0.000118 Epoch: 27 Global Step: 572570 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:31:03,713-Speed 2497.65 samples/sec Loss 1.5324 LearningRate 0.000118 Epoch: 27 Global Step: 572580 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:31:11,862-Speed 2513.64 samples/sec Loss 1.5140 LearningRate 0.000118 Epoch: 27 Global Step: 572590 Fp16 Grad Scale: 16384 Required: 59 hours 
Training: 2022-07-11 01:31:20,063-Speed 2497.55 samples/sec Loss 1.4973 LearningRate 0.000118 Epoch: 27 Global Step: 572600 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:31:28,265-Speed 2497.54 samples/sec Loss 1.5000 LearningRate 0.000118 Epoch: 27 Global Step: 572610 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:31:36,464-Speed 2498.24 samples/sec Loss 1.4818 LearningRate 0.000118 Epoch: 27 Global Step: 572620 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:31:44,664-Speed 2498.02 samples/sec Loss 1.5328 LearningRate 0.000118 Epoch: 27 Global Step: 572630 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:31:52,868-Speed 2496.44 samples/sec Loss 1.4886 LearningRate 0.000118 Epoch: 27 Global Step: 572640 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:32:01,021-Speed 2512.44 samples/sec Loss 1.4935 LearningRate 0.000118 Epoch: 27 Global Step: 572650 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:32:09,220-Speed 2498.07 samples/sec Loss 1.5138 LearningRate 0.000118 Epoch: 27 Global Step: 572660 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:32:17,422-Speed 2497.46 samples/sec Loss 1.5112 LearningRate 0.000118 Epoch: 27 Global Step: 572670 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:32:25,629-Speed 2495.67 samples/sec Loss 1.5377 LearningRate 0.000118 Epoch: 27 Global Step: 572680 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:32:33,831-Speed 2497.18 samples/sec Loss 1.5040 LearningRate 0.000118 Epoch: 27 Global Step: 572690 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:32:42,034-Speed 2497.34 samples/sec Loss 1.5352 LearningRate 0.000118 Epoch: 27 Global Step: 572700 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:32:50,182-Speed 2514.04 samples/sec Loss 1.5069 LearningRate 0.000118 Epoch: 27 Global Step: 572710 Fp16 Grad Scale: 16384 Required: 59 hours 
Training: 2022-07-11 01:32:58,391-Speed 2495.09 samples/sec Loss 1.5265 LearningRate 0.000118 Epoch: 27 Global Step: 572720 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:33:06,589-Speed 2498.86 samples/sec Loss 1.5404 LearningRate 0.000118 Epoch: 27 Global Step: 572730 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:33:14,791-Speed 2497.36 samples/sec Loss 1.5219 LearningRate 0.000118 Epoch: 27 Global Step: 572740 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:33:22,992-Speed 2497.54 samples/sec Loss 1.4808 LearningRate 0.000118 Epoch: 27 Global Step: 572750 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:33:31,190-Speed 2498.43 samples/sec Loss 1.5123 LearningRate 0.000118 Epoch: 27 Global Step: 572760 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:33:39,341-Speed 2513.19 samples/sec Loss 1.4913 LearningRate 0.000118 Epoch: 27 Global Step: 572770 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:33:47,540-Speed 2498.50 samples/sec Loss 1.4999 LearningRate 0.000118 Epoch: 27 Global Step: 572780 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:33:55,740-Speed 2497.75 samples/sec Loss 1.5043 LearningRate 0.000118 Epoch: 27 Global Step: 572790 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:34:03,942-Speed 2497.26 samples/sec Loss 1.5050 LearningRate 0.000118 Epoch: 27 Global Step: 572800 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:34:12,148-Speed 2496.29 samples/sec Loss 1.4812 LearningRate 0.000118 Epoch: 27 Global Step: 572810 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:34:20,351-Speed 2497.37 samples/sec Loss 1.5251 LearningRate 0.000118 Epoch: 27 Global Step: 572820 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:34:28,499-Speed 2513.81 samples/sec Loss 1.5105 LearningRate 0.000118 Epoch: 27 Global Step: 572830 Fp16 Grad Scale: 16384 Required: 59 hours 
Training: 2022-07-11 01:34:36,701-Speed 2497.43 samples/sec Loss 1.4862 LearningRate 0.000118 Epoch: 27 Global Step: 572840 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:34:44,911-Speed 2494.71 samples/sec Loss 1.5163 LearningRate 0.000118 Epoch: 27 Global Step: 572850 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:34:53,112-Speed 2497.66 samples/sec Loss 1.4984 LearningRate 0.000118 Epoch: 27 Global Step: 572860 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:35:01,313-Speed 2497.78 samples/sec Loss 1.4875 LearningRate 0.000118 Epoch: 27 Global Step: 572870 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:35:09,515-Speed 2497.29 samples/sec Loss 1.5245 LearningRate 0.000118 Epoch: 27 Global Step: 572880 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:35:17,662-Speed 2514.21 samples/sec Loss 1.5218 LearningRate 0.000118 Epoch: 27 Global Step: 572890 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:35:25,863-Speed 2497.68 samples/sec Loss 1.5266 LearningRate 0.000118 Epoch: 27 Global Step: 572900 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:35:34,065-Speed 2497.55 samples/sec Loss 1.4760 LearningRate 0.000118 Epoch: 27 Global Step: 572910 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:35:42,267-Speed 2497.23 samples/sec Loss 1.5033 LearningRate 0.000118 Epoch: 27 Global Step: 572920 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:35:50,481-Speed 2493.78 samples/sec Loss 1.5430 LearningRate 0.000118 Epoch: 27 Global Step: 572930 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:35:58,679-Speed 2498.56 samples/sec Loss 1.4824 LearningRate 0.000118 Epoch: 27 Global Step: 572940 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:36:06,825-Speed 2514.27 samples/sec Loss 1.5280 LearningRate 0.000118 Epoch: 27 Global Step: 572950 Fp16 Grad Scale: 16384 Required: 59 hours 
Training: 2022-07-11 01:36:15,027-Speed 2497.35 samples/sec Loss 1.5272 LearningRate 0.000118 Epoch: 27 Global Step: 572960 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:36:23,229-Speed 2497.41 samples/sec Loss 1.5143 LearningRate 0.000118 Epoch: 27 Global Step: 572970 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:36:31,428-Speed 2498.31 samples/sec Loss 1.4979 LearningRate 0.000118 Epoch: 27 Global Step: 572980 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:36:39,640-Speed 2494.47 samples/sec Loss 1.4654 LearningRate 0.000118 Epoch: 27 Global Step: 572990 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:36:47,849-Speed 2495.58 samples/sec Loss 1.5036 LearningRate 0.000118 Epoch: 27 Global Step: 573000 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:36:55,997-Speed 2513.77 samples/sec Loss 1.5113 LearningRate 0.000118 Epoch: 27 Global Step: 573010 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:37:04,200-Speed 2497.06 samples/sec Loss 1.5302 LearningRate 0.000118 Epoch: 27 Global Step: 573020 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:37:12,403-Speed 2497.09 samples/sec Loss 1.4791 LearningRate 0.000118 Epoch: 27 Global Step: 573030 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:37:20,606-Speed 2497.08 samples/sec Loss 1.4872 LearningRate 0.000118 Epoch: 27 Global Step: 573040 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:37:28,810-Speed 2496.73 samples/sec Loss 1.4958 LearningRate 0.000118 Epoch: 27 Global Step: 573050 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:37:37,012-Speed 2497.42 samples/sec Loss 1.5010 LearningRate 0.000118 Epoch: 27 Global Step: 573060 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:37:45,161-Speed 2513.63 samples/sec Loss 1.5565 LearningRate 0.000118 Epoch: 27 Global Step: 573070 Fp16 Grad Scale: 16384 Required: 59 hours 
Training: 2022-07-11 01:37:53,364-Speed 2496.97 samples/sec Loss 1.4984 LearningRate 0.000118 Epoch: 27 Global Step: 573080 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:38:01,566-Speed 2497.49 samples/sec Loss 1.5211 LearningRate 0.000118 Epoch: 27 Global Step: 573090 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:38:09,768-Speed 2497.28 samples/sec Loss 1.5009 LearningRate 0.000118 Epoch: 27 Global Step: 573100 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:38:17,971-Speed 2497.14 samples/sec Loss 1.4966 LearningRate 0.000118 Epoch: 27 Global Step: 573110 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:38:26,177-Speed 2496.14 samples/sec Loss 1.4906 LearningRate 0.000118 Epoch: 27 Global Step: 573120 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:38:34,327-Speed 2513.50 samples/sec Loss 1.4810 LearningRate 0.000118 Epoch: 27 Global Step: 573130 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:38:42,525-Speed 2498.36 samples/sec Loss 1.4487 LearningRate 0.000118 Epoch: 27 Global Step: 573140 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:38:50,724-Speed 2497.96 samples/sec Loss 1.5236 LearningRate 0.000118 Epoch: 27 Global Step: 573150 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:38:58,929-Speed 2496.55 samples/sec Loss 1.5024 LearningRate 0.000118 Epoch: 27 Global Step: 573160 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:39:07,129-Speed 2497.91 samples/sec Loss 1.4653 LearningRate 0.000118 Epoch: 27 Global Step: 573170 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:39:15,353-Speed 2490.65 samples/sec Loss 1.5064 LearningRate 0.000118 Epoch: 27 Global Step: 573180 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:39:23,499-Speed 2514.52 samples/sec Loss 1.4708 LearningRate 0.000118 Epoch: 27 Global Step: 573190 Fp16 Grad Scale: 16384 Required: 59 hours 
Training: 2022-07-11 01:39:31,705-Speed 2495.88 samples/sec Loss 1.4887 LearningRate 0.000118 Epoch: 27 Global Step: 573200 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:39:39,907-Speed 2497.41 samples/sec Loss 1.4634 LearningRate 0.000118 Epoch: 27 Global Step: 573210 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:39:48,109-Speed 2497.47 samples/sec Loss 1.4858 LearningRate 0.000118 Epoch: 27 Global Step: 573220 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:39:56,309-Speed 2498.07 samples/sec Loss 1.5035 LearningRate 0.000118 Epoch: 27 Global Step: 573230 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:40:04,509-Speed 2497.79 samples/sec Loss 1.4729 LearningRate 0.000118 Epoch: 27 Global Step: 573240 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:40:12,656-Speed 2514.13 samples/sec Loss 1.5039 LearningRate 0.000118 Epoch: 27 Global Step: 573250 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:40:20,859-Speed 2496.97 samples/sec Loss 1.5107 LearningRate 0.000118 Epoch: 27 Global Step: 573260 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:40:29,059-Speed 2498.12 samples/sec Loss 1.5313 LearningRate 0.000118 Epoch: 27 Global Step: 573270 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:40:37,261-Speed 2497.35 samples/sec Loss 1.5182 LearningRate 0.000118 Epoch: 27 Global Step: 573280 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:40:45,463-Speed 2497.09 samples/sec Loss 1.4915 LearningRate 0.000118 Epoch: 27 Global Step: 573290 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:40:53,663-Speed 2498.12 samples/sec Loss 1.5207 LearningRate 0.000118 Epoch: 27 Global Step: 573300 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:41:01,812-Speed 2513.64 samples/sec Loss 1.5005 LearningRate 0.000118 Epoch: 27 Global Step: 573310 Fp16 Grad Scale: 16384 Required: 59 hours 
Training: 2022-07-11 01:41:10,017-Speed 2496.38 samples/sec Loss 1.4948 LearningRate 0.000118 Epoch: 27 Global Step: 573320 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:41:18,215-Speed 2498.37 samples/sec Loss 1.5110 LearningRate 0.000118 Epoch: 27 Global Step: 573330 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:41:26,419-Speed 2496.96 samples/sec Loss 1.5021 LearningRate 0.000118 Epoch: 27 Global Step: 573340 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:41:34,620-Speed 2497.71 samples/sec Loss 1.5174 LearningRate 0.000118 Epoch: 27 Global Step: 573350 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:41:42,819-Speed 2498.02 samples/sec Loss 1.5152 LearningRate 0.000118 Epoch: 27 Global Step: 573360 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:41:50,966-Speed 2514.24 samples/sec Loss 1.5211 LearningRate 0.000118 Epoch: 27 Global Step: 573370 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:41:59,167-Speed 2497.70 samples/sec Loss 1.4791 LearningRate 0.000118 Epoch: 27 Global Step: 573380 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:42:07,369-Speed 2497.19 samples/sec Loss 1.5083 LearningRate 0.000118 Epoch: 27 Global Step: 573390 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:42:15,569-Speed 2498.11 samples/sec Loss 1.5241 LearningRate 0.000118 Epoch: 27 Global Step: 573400 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:42:23,769-Speed 2497.83 samples/sec Loss 1.5127 LearningRate 0.000118 Epoch: 27 Global Step: 573410 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:42:31,979-Speed 2494.84 samples/sec Loss 1.5350 LearningRate 0.000118 Epoch: 27 Global Step: 573420 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:42:40,138-Speed 2510.47 samples/sec Loss 1.4526 LearningRate 0.000118 Epoch: 27 Global Step: 573430 Fp16 Grad Scale: 16384 Required: 59 hours 
Training: 2022-07-11 01:42:48,339-Speed 2497.72 samples/sec Loss 1.4716 LearningRate 0.000118 Epoch: 27 Global Step: 573440 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:42:56,538-Speed 2498.14 samples/sec Loss 1.5065 LearningRate 0.000118 Epoch: 27 Global Step: 573450 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:43:04,736-Speed 2498.68 samples/sec Loss 1.4592 LearningRate 0.000118 Epoch: 27 Global Step: 573460 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:43:12,937-Speed 2497.75 samples/sec Loss 1.5084 LearningRate 0.000118 Epoch: 27 Global Step: 573470 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:43:21,140-Speed 2497.36 samples/sec Loss 1.5432 LearningRate 0.000118 Epoch: 27 Global Step: 573480 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:43:29,305-Speed 2508.53 samples/sec Loss 1.5129 LearningRate 0.000118 Epoch: 27 Global Step: 573490 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:43:37,514-Speed 2495.12 samples/sec Loss 1.5247 LearningRate 0.000118 Epoch: 27 Global Step: 573500 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:43:45,720-Speed 2496.19 samples/sec Loss 1.4959 LearningRate 0.000118 Epoch: 27 Global Step: 573510 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:43:53,928-Speed 2495.49 samples/sec Loss 1.4826 LearningRate 0.000118 Epoch: 27 Global Step: 573520 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:44:02,129-Speed 2497.49 samples/sec Loss 1.5240 LearningRate 0.000118 Epoch: 27 Global Step: 573530 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:44:10,332-Speed 2497.14 samples/sec Loss 1.5259 LearningRate 0.000118 Epoch: 27 Global Step: 573540 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:44:18,482-Speed 2513.42 samples/sec Loss 1.4703 LearningRate 0.000118 Epoch: 27 Global Step: 573550 Fp16 Grad Scale: 16384 Required: 59 hours 
Training: 2022-07-11 01:44:26,687-Speed 2496.28 samples/sec Loss 1.5002 LearningRate 0.000118 Epoch: 27 Global Step: 573560 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:44:34,895-Speed 2495.82 samples/sec Loss 1.4978 LearningRate 0.000118 Epoch: 27 Global Step: 573570 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:44:43,113-Speed 2492.67 samples/sec Loss 1.4880 LearningRate 0.000118 Epoch: 27 Global Step: 573580 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:44:51,314-Speed 2497.52 samples/sec Loss 1.4863 LearningRate 0.000118 Epoch: 27 Global Step: 573590 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:44:59,516-Speed 2497.35 samples/sec Loss 1.5075 LearningRate 0.000118 Epoch: 27 Global Step: 573600 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:45:07,662-Speed 2514.47 samples/sec Loss 1.4840 LearningRate 0.000118 Epoch: 27 Global Step: 573610 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:45:15,862-Speed 2498.14 samples/sec Loss 1.5230 LearningRate 0.000118 Epoch: 27 Global Step: 573620 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:45:24,082-Speed 2491.86 samples/sec Loss 1.4847 LearningRate 0.000118 Epoch: 27 Global Step: 573630 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:45:32,282-Speed 2497.81 samples/sec Loss 1.4877 LearningRate 0.000117 Epoch: 27 Global Step: 573640 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:45:40,483-Speed 2497.73 samples/sec Loss 1.4976 LearningRate 0.000117 Epoch: 27 Global Step: 573650 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:45:48,682-Speed 2498.36 samples/sec Loss 1.4997 LearningRate 0.000117 Epoch: 27 Global Step: 573660 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:45:56,827-Speed 2514.61 samples/sec Loss 1.5014 LearningRate 0.000117 Epoch: 27 Global Step: 573670 Fp16 Grad Scale: 16384 Required: 59 hours 
Training: 2022-07-11 01:46:05,031-Speed 2496.79 samples/sec Loss 1.4980 LearningRate 0.000117 Epoch: 27 Global Step: 573680 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:46:13,235-Speed 2496.92 samples/sec Loss 1.4997 LearningRate 0.000117 Epoch: 27 Global Step: 573690 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:46:21,436-Speed 2497.65 samples/sec Loss 1.4707 LearningRate 0.000117 Epoch: 27 Global Step: 573700 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:46:29,639-Speed 2496.88 samples/sec Loss 1.5185 LearningRate 0.000117 Epoch: 27 Global Step: 573710 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:46:37,845-Speed 2496.08 samples/sec Loss 1.5043 LearningRate 0.000117 Epoch: 27 Global Step: 573720 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:46:45,999-Speed 2512.57 samples/sec Loss 1.4809 LearningRate 0.000117 Epoch: 27 Global Step: 573730 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:46:54,205-Speed 2496.16 samples/sec Loss 1.4870 LearningRate 0.000117 Epoch: 27 Global Step: 573740 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:47:02,421-Speed 2493.06 samples/sec Loss 1.4838 LearningRate 0.000117 Epoch: 27 Global Step: 573750 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:47:10,626-Speed 2496.61 samples/sec Loss 1.4832 LearningRate 0.000117 Epoch: 27 Global Step: 573760 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:47:18,831-Speed 2496.48 samples/sec Loss 1.5481 LearningRate 0.000117 Epoch: 27 Global Step: 573770 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:47:27,033-Speed 2497.43 samples/sec Loss 1.4909 LearningRate 0.000117 Epoch: 27 Global Step: 573780 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:47:35,204-Speed 2506.54 samples/sec Loss 1.4991 LearningRate 0.000117 Epoch: 27 Global Step: 573790 Fp16 Grad Scale: 32768 Required: 59 hours 
Training: 2022-07-11 01:47:43,416-Speed 2494.38 samples/sec Loss 1.5455 LearningRate 0.000117 Epoch: 27 Global Step: 573800 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:47:51,621-Speed 2496.65 samples/sec Loss 1.4991 LearningRate 0.000117 Epoch: 27 Global Step: 573810 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:47:59,823-Speed 2497.22 samples/sec Loss 1.4755 LearningRate 0.000117 Epoch: 27 Global Step: 573820 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:48:08,026-Speed 2497.21 samples/sec Loss 1.5106 LearningRate 0.000117 Epoch: 27 Global Step: 573830 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:48:16,227-Speed 2497.73 samples/sec Loss 1.5370 LearningRate 0.000117 Epoch: 27 Global Step: 573840 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:48:24,377-Speed 2513.01 samples/sec Loss 1.5061 LearningRate 0.000117 Epoch: 27 Global Step: 573850 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:48:32,590-Speed 2494.07 samples/sec Loss 1.4502 LearningRate 0.000117 Epoch: 27 Global Step: 573860 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:48:40,797-Speed 2496.19 samples/sec Loss 1.4747 LearningRate 0.000117 Epoch: 27 Global Step: 573870 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:48:49,001-Speed 2496.83 samples/sec Loss 1.4962 LearningRate 0.000117 Epoch: 27 Global Step: 573880 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:48:57,201-Speed 2497.83 samples/sec Loss 1.5208 LearningRate 0.000117 Epoch: 27 Global Step: 573890 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:49:05,402-Speed 2497.78 samples/sec Loss 1.5010 LearningRate 0.000117 Epoch: 27 Global Step: 573900 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:49:13,551-Speed 2513.66 samples/sec Loss 1.4691 LearningRate 0.000117 Epoch: 27 Global Step: 573910 Fp16 Grad Scale: 32768 Required: 59 hours 
Training: 2022-07-11 01:49:21,753-Speed 2497.40 samples/sec Loss 1.5197 LearningRate 0.000117 Epoch: 27 Global Step: 573920 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:49:29,966-Speed 2493.98 samples/sec Loss 1.4767 LearningRate 0.000117 Epoch: 27 Global Step: 573930 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:49:38,166-Speed 2497.80 samples/sec Loss 1.4939 LearningRate 0.000117 Epoch: 27 Global Step: 573940 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:49:46,366-Speed 2498.09 samples/sec Loss 1.5223 LearningRate 0.000117 Epoch: 27 Global Step: 573950 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-07-11 01:49:54,524-Speed 2510.74 samples/sec Loss 1.4909 LearningRate 0.000117 Epoch: 27 Global Step: 573960 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:50:02,675-Speed 2513.15 samples/sec Loss 1.5017 LearningRate 0.000117 Epoch: 27 Global Step: 573970 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:50:10,878-Speed 2497.10 samples/sec Loss 1.5280 LearningRate 0.000117 Epoch: 27 Global Step: 573980 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:50:19,076-Speed 2498.57 samples/sec Loss 1.5139 LearningRate 0.000117 Epoch: 27 Global Step: 573990 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:50:27,279-Speed 2497.11 samples/sec Loss 1.5021 LearningRate 0.000117 Epoch: 27 Global Step: 574000 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:50:35,482-Speed 2497.13 samples/sec Loss 1.4744 LearningRate 0.000117 Epoch: 27 Global Step: 574010 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-07-11 01:50:43,682-Speed 2497.92 samples/sec Loss 1.4834 LearningRate 0.000117 Epoch: 27 Global Step: 574020 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:50:51,832-Speed 2513.34 samples/sec Loss 1.5345 LearningRate 0.000117 Epoch: 27 Global Step: 574030 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 01:51:00,037-Speed 2496.57 samples/sec Loss 1.4878 LearningRate 0.000117 Epoch: 27 Global Step: 574040 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:51:08,236-Speed 2498.30 samples/sec Loss 1.4699 LearningRate 0.000117 Epoch: 27 Global Step: 574050 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:51:16,441-Speed 2496.42 samples/sec Loss 1.4978 LearningRate 0.000117 Epoch: 27 Global Step: 574060 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:51:24,643-Speed 2497.20 samples/sec Loss 1.4970 LearningRate 0.000117 Epoch: 27 Global Step: 574070 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:51:32,843-Speed 2497.98 samples/sec Loss 1.4844 LearningRate 0.000117 Epoch: 27 Global Step: 574080 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:51:40,990-Speed 2514.46 samples/sec Loss 1.5228 LearningRate 0.000117 Epoch: 27 Global Step: 574090 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:51:49,194-Speed 2496.54 samples/sec Loss 1.5067 LearningRate 0.000117 Epoch: 27 Global Step: 574100 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:51:57,400-Speed 2496.35 samples/sec Loss 1.4776 LearningRate 0.000117 Epoch: 27 Global Step: 574110 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:52:05,604-Speed 2496.69 samples/sec Loss 1.5312 LearningRate 0.000117 Epoch: 27 Global Step: 574120 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:52:13,808-Speed 2496.87 samples/sec Loss 1.5036 LearningRate 0.000117 Epoch: 27 Global Step: 574130 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:52:22,011-Speed 2497.10 samples/sec Loss 1.5325 LearningRate 0.000117 Epoch: 27 Global Step: 574140 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:52:30,162-Speed 2512.94 samples/sec Loss 1.5257 LearningRate 0.000117 Epoch: 27 Global Step: 574150 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 01:52:38,363-Speed 2498.04 samples/sec Loss 1.5312 LearningRate 0.000117 Epoch: 27 Global Step: 574160 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:52:46,568-Speed 2496.22 samples/sec Loss 1.5316 LearningRate 0.000117 Epoch: 27 Global Step: 574170 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:52:54,770-Speed 2497.17 samples/sec Loss 1.5431 LearningRate 0.000117 Epoch: 27 Global Step: 574180 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:53:02,974-Speed 2497.08 samples/sec Loss 1.4740 LearningRate 0.000117 Epoch: 27 Global Step: 574190 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:53:11,176-Speed 2497.50 samples/sec Loss 1.4881 LearningRate 0.000117 Epoch: 27 Global Step: 574200 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:53:19,322-Speed 2514.45 samples/sec Loss 1.4960 LearningRate 0.000117 Epoch: 27 Global Step: 574210 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:53:27,529-Speed 2495.59 samples/sec Loss 1.4877 LearningRate 0.000117 Epoch: 27 Global Step: 574220 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:53:35,738-Speed 2495.52 samples/sec Loss 1.5059 LearningRate 0.000117 Epoch: 27 Global Step: 574230 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:53:43,944-Speed 2496.30 samples/sec Loss 1.4845 LearningRate 0.000117 Epoch: 27 Global Step: 574240 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:53:52,149-Speed 2496.40 samples/sec Loss 1.4689 LearningRate 0.000117 Epoch: 27 Global Step: 574250 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:54:00,352-Speed 2496.97 samples/sec Loss 1.4808 LearningRate 0.000117 Epoch: 27 Global Step: 574260 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:54:08,503-Speed 2512.95 samples/sec Loss 1.4595 LearningRate 0.000117 Epoch: 27 Global Step: 574270 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 01:54:16,720-Speed 2492.84 samples/sec Loss 1.4942 LearningRate 0.000117 Epoch: 27 Global Step: 574280 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:54:24,927-Speed 2495.76 samples/sec Loss 1.4724 LearningRate 0.000117 Epoch: 27 Global Step: 574290 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:54:33,135-Speed 2495.57 samples/sec Loss 1.4621 LearningRate 0.000117 Epoch: 27 Global Step: 574300 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:54:41,337-Speed 2497.27 samples/sec Loss 1.4829 LearningRate 0.000117 Epoch: 27 Global Step: 574310 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:54:49,548-Speed 2494.91 samples/sec Loss 1.4838 LearningRate 0.000117 Epoch: 27 Global Step: 574320 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:54:57,702-Speed 2512.11 samples/sec Loss 1.4757 LearningRate 0.000117 Epoch: 27 Global Step: 574330 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:55:05,906-Speed 2496.74 samples/sec Loss 1.4752 LearningRate 0.000117 Epoch: 27 Global Step: 574340 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:55:14,105-Speed 2498.03 samples/sec Loss 1.5195 LearningRate 0.000117 Epoch: 27 Global Step: 574350 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:55:22,320-Speed 2493.54 samples/sec Loss 1.4906 LearningRate 0.000117 Epoch: 27 Global Step: 574360 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:55:30,520-Speed 2497.90 samples/sec Loss 1.5000 LearningRate 0.000117 Epoch: 27 Global Step: 574370 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:55:38,722-Speed 2497.32 samples/sec Loss 1.5082 LearningRate 0.000117 Epoch: 27 Global Step: 574380 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:55:46,868-Speed 2514.59 samples/sec Loss 1.4844 LearningRate 0.000117 Epoch: 27 Global Step: 574390 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 01:55:55,069-Speed 2497.67 samples/sec Loss 1.5372 LearningRate 0.000117 Epoch: 27 Global Step: 574400 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:56:03,280-Speed 2494.49 samples/sec Loss 1.5167 LearningRate 0.000117 Epoch: 27 Global Step: 574410 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:56:11,490-Speed 2494.62 samples/sec Loss 1.5148 LearningRate 0.000117 Epoch: 27 Global Step: 574420 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:56:19,695-Speed 2497.06 samples/sec Loss 1.4965 LearningRate 0.000117 Epoch: 27 Global Step: 574430 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:56:27,897-Speed 2497.39 samples/sec Loss 1.5258 LearningRate 0.000117 Epoch: 27 Global Step: 574440 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:56:36,044-Speed 2514.04 samples/sec Loss 1.5141 LearningRate 0.000117 Epoch: 27 Global Step: 574450 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:56:44,245-Speed 2497.82 samples/sec Loss 1.5216 LearningRate 0.000117 Epoch: 27 Global Step: 574460 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:56:52,443-Speed 2498.96 samples/sec Loss 1.4740 LearningRate 0.000117 Epoch: 27 Global Step: 574470 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:57:00,642-Speed 2498.23 samples/sec Loss 1.5034 LearningRate 0.000117 Epoch: 27 Global Step: 574480 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:57:08,844-Speed 2497.32 samples/sec Loss 1.4885 LearningRate 0.000117 Epoch: 27 Global Step: 574490 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:57:17,043-Speed 2498.35 samples/sec Loss 1.4956 LearningRate 0.000117 Epoch: 27 Global Step: 574500 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:57:25,188-Speed 2514.78 samples/sec Loss 1.4856 LearningRate 0.000117 Epoch: 27 Global Step: 574510 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 01:57:33,389-Speed 2497.67 samples/sec Loss 1.4918 LearningRate 0.000117 Epoch: 27 Global Step: 574520 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:57:41,589-Speed 2497.79 samples/sec Loss 1.4620 LearningRate 0.000117 Epoch: 27 Global Step: 574530 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:57:49,791-Speed 2497.62 samples/sec Loss 1.4782 LearningRate 0.000117 Epoch: 27 Global Step: 574540 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:57:57,997-Speed 2496.17 samples/sec Loss 1.4762 LearningRate 0.000117 Epoch: 27 Global Step: 574550 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:58:06,197-Speed 2497.79 samples/sec Loss 1.4943 LearningRate 0.000117 Epoch: 27 Global Step: 574560 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:58:14,344-Speed 2514.33 samples/sec Loss 1.4700 LearningRate 0.000117 Epoch: 27 Global Step: 574570 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:58:22,551-Speed 2496.02 samples/sec Loss 1.5007 LearningRate 0.000117 Epoch: 27 Global Step: 574580 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:58:30,749-Speed 2498.64 samples/sec Loss 1.5021 LearningRate 0.000117 Epoch: 27 Global Step: 574590 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:58:38,951-Speed 2497.59 samples/sec Loss 1.4560 LearningRate 0.000117 Epoch: 27 Global Step: 574600 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:58:47,292-Speed 2499.53 samples/sec Loss 1.4928 LearningRate 0.000117 Epoch: 27 Global Step: 574610 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:58:55,541-Speed 2499.27 samples/sec Loss 1.5440 LearningRate 0.000117 Epoch: 27 Global Step: 574620 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:59:04,127-Speed 2517.09 samples/sec Loss 1.4918 LearningRate 0.000117 Epoch: 27 Global Step: 574630 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 01:59:12,326-Speed 2498.10 samples/sec Loss 1.4767 LearningRate 0.000117 Epoch: 27 Global Step: 574640 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:59:20,575-Speed 2499.66 samples/sec Loss 1.5064 LearningRate 0.000117 Epoch: 27 Global Step: 574650 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:59:28,773-Speed 2498.49 samples/sec Loss 1.5048 LearningRate 0.000117 Epoch: 27 Global Step: 574660 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:59:37,004-Speed 2498.08 samples/sec Loss 1.5210 LearningRate 0.000117 Epoch: 27 Global Step: 574670 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:59:45,220-Speed 2499.47 samples/sec Loss 1.5079 LearningRate 0.000117 Epoch: 27 Global Step: 574680 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 01:59:53,368-Speed 2513.91 samples/sec Loss 1.4965 LearningRate 0.000117 Epoch: 27 Global Step: 574690 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:00:05,018-Speed 1832.02 samples/sec Loss 1.4947 LearningRate 0.000117 Epoch: 27 Global Step: 574700 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:00:13,446-Speed 2499.88 samples/sec Loss 1.5100 LearningRate 0.000117 Epoch: 27 Global Step: 574710 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:00:21,664-Speed 2500.27 samples/sec Loss 1.4696 LearningRate 0.000117 Epoch: 27 Global Step: 574720 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:00:32,020-Speed 1977.85 samples/sec Loss 1.4831 LearningRate 0.000116 Epoch: 27 Global Step: 574730 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:00:40,772-Speed 2345.36 samples/sec Loss 1.5140 LearningRate 0.000116 Epoch: 27 Global Step: 574740 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:00:48,934-Speed 2516.53 samples/sec Loss 1.4876 LearningRate 0.000116 Epoch: 27 Global Step: 574750 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:01:00,225-Speed 2495.56 samples/sec Loss 1.4753 LearningRate 0.000116 Epoch: 27 Global Step: 574760 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:01:08,427-Speed 2497.43 samples/sec Loss 1.4908 LearningRate 0.000116 Epoch: 27 Global Step: 574770 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:01:16,663-Speed 2499.40 samples/sec Loss 1.4979 LearningRate 0.000116 Epoch: 27 Global Step: 574780 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:01:24,910-Speed 2497.55 samples/sec Loss 1.5201 LearningRate 0.000116 Epoch: 27 Global Step: 574790 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:01:36,266-Speed 1803.69 samples/sec Loss 1.4981 LearningRate 0.000116 Epoch: 27 Global Step: 574800 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:01:44,753-Speed 2418.24 samples/sec Loss 1.4646 LearningRate 0.000116 Epoch: 27 Global Step: 574810 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:01:53,023-Speed 2499.17 samples/sec Loss 1.5209 LearningRate 0.000116 Epoch: 27 Global Step: 574820 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:02:01,223-Speed 2498.74 samples/sec Loss 1.5007 LearningRate 0.000116 Epoch: 27 Global Step: 574830 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:02:11,007-Speed 2093.38 samples/sec Loss 1.4861 LearningRate 0.000116 Epoch: 27 Global Step: 574840 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:02:19,211-Speed 2496.58 samples/sec Loss 1.4673 LearningRate 0.000116 Epoch: 27 Global Step: 574850 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:02:27,419-Speed 2495.42 samples/sec Loss 1.4793 LearningRate 0.000116 Epoch: 27 Global Step: 574860 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:02:35,589-Speed 2507.17 samples/sec Loss 1.4864 LearningRate 0.000116 Epoch: 27 Global Step: 574870 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:02:43,793-Speed 2496.80 samples/sec Loss 1.4927 LearningRate 0.000116 Epoch: 27 Global Step: 574880 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:02:52,002-Speed 2495.43 samples/sec Loss 1.5117 LearningRate 0.000116 Epoch: 27 Global Step: 574890 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:03:00,208-Speed 2496.30 samples/sec Loss 1.5126 LearningRate 0.000116 Epoch: 27 Global Step: 574900 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:03:08,412-Speed 2496.84 samples/sec Loss 1.4981 LearningRate 0.000116 Epoch: 27 Global Step: 574910 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:03:16,622-Speed 2494.92 samples/sec Loss 1.5000 LearningRate 0.000116 Epoch: 27 Global Step: 574920 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:03:24,779-Speed 2511.16 samples/sec Loss 1.5059 LearningRate 0.000116 Epoch: 27 Global Step: 574930 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:03:32,983-Speed 2496.71 samples/sec Loss 1.4952 LearningRate 0.000116 Epoch: 27 Global Step: 574940 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:03:41,188-Speed 2496.37 samples/sec Loss 1.4583 LearningRate 0.000116 Epoch: 27 Global Step: 574950 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:03:49,391-Speed 2497.20 samples/sec Loss 1.4803 LearningRate 0.000116 Epoch: 27 Global Step: 574960 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:03:57,607-Speed 2492.94 samples/sec Loss 1.5117 LearningRate 0.000116 Epoch: 27 Global Step: 574970 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:04:05,810-Speed 2496.98 samples/sec Loss 1.4956 LearningRate 0.000116 Epoch: 27 Global Step: 574980 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:04:13,958-Speed 2513.85 samples/sec Loss 1.4918 LearningRate 0.000116 Epoch: 27 Global Step: 574990 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:04:22,163-Speed 2496.24 samples/sec Loss 1.4773 LearningRate 0.000116 Epoch: 27 Global Step: 575000 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:04:30,368-Speed 2496.74 samples/sec Loss 1.5042 LearningRate 0.000116 Epoch: 27 Global Step: 575010 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:04:38,573-Speed 2496.33 samples/sec Loss 1.4948 LearningRate 0.000116 Epoch: 27 Global Step: 575020 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:04:46,778-Speed 2496.42 samples/sec Loss 1.4664 LearningRate 0.000116 Epoch: 27 Global Step: 575030 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:04:54,983-Speed 2496.29 samples/sec Loss 1.4858 LearningRate 0.000116 Epoch: 27 Global Step: 575040 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:05:03,137-Speed 2512.26 samples/sec Loss 1.5126 LearningRate 0.000116 Epoch: 27 Global Step: 575050 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:05:11,345-Speed 2495.59 samples/sec Loss 1.4935 LearningRate 0.000116 Epoch: 27 Global Step: 575060 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:05:19,546-Speed 2497.41 samples/sec Loss 1.4961 LearningRate 0.000116 Epoch: 27 Global Step: 575070 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:05:27,751-Speed 2496.44 samples/sec Loss 1.4806 LearningRate 0.000116 Epoch: 27 Global Step: 575080 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:05:35,954-Speed 2496.77 samples/sec Loss 1.5305 LearningRate 0.000116 Epoch: 27 Global Step: 575090 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:05:44,162-Speed 2495.69 samples/sec Loss 1.5205 LearningRate 0.000116 Epoch: 27 Global Step: 575100 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:05:52,312-Speed 2513.27 samples/sec Loss 1.4962 LearningRate 0.000116 Epoch: 27 Global Step: 575110 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:06:00,515-Speed 2497.09 samples/sec Loss 1.5175 LearningRate 0.000116 Epoch: 27 Global Step: 575120 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:06:08,727-Speed 2494.60 samples/sec Loss 1.5206 LearningRate 0.000116 Epoch: 27 Global Step: 575130 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:06:16,923-Speed 2499.13 samples/sec Loss 1.4935 LearningRate 0.000116 Epoch: 27 Global Step: 575140 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:06:25,120-Speed 2498.74 samples/sec Loss 1.4923 LearningRate 0.000116 Epoch: 27 Global Step: 575150 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:06:33,324-Speed 2496.61 samples/sec Loss 1.5087 LearningRate 0.000116 Epoch: 27 Global Step: 575160 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:06:41,481-Speed 2511.40 samples/sec Loss 1.4997 LearningRate 0.000116 Epoch: 27 Global Step: 575170 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:06:49,686-Speed 2496.68 samples/sec Loss 1.5226 LearningRate 0.000116 Epoch: 27 Global Step: 575180 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:06:57,885-Speed 2497.87 samples/sec Loss 1.4985 LearningRate 0.000116 Epoch: 27 Global Step: 575190 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:07:06,086-Speed 2497.76 samples/sec Loss 1.5344 LearningRate 0.000116 Epoch: 27 Global Step: 575200 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:07:14,288-Speed 2497.25 samples/sec Loss 1.5535 LearningRate 0.000116 Epoch: 27 Global Step: 575210 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:07:22,497-Speed 2495.64 samples/sec Loss 1.5043 LearningRate 0.000116 Epoch: 27 Global Step: 575220 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:07:30,647-Speed 2513.24 samples/sec Loss 1.5131 LearningRate 0.000116 Epoch: 27 Global Step: 575230 Fp16 Grad Scale: 32768 Required: 58 hours 
Training: 2022-07-11 02:07:38,847-Speed 2497.66 samples/sec Loss 1.5034 LearningRate 0.000116 Epoch: 27 Global Step: 575240 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:07:47,058-Speed 2495.18 samples/sec Loss 1.5100 LearningRate 0.000116 Epoch: 27 Global Step: 575250 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:07:55,265-Speed 2495.58 samples/sec Loss 1.4944 LearningRate 0.000116 Epoch: 27 Global Step: 575260 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:08:03,472-Speed 2495.74 samples/sec Loss 1.5168 LearningRate 0.000116 Epoch: 27 Global Step: 575270 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:08:11,675-Speed 2497.35 samples/sec Loss 1.4632 LearningRate 0.000116 Epoch: 27 Global Step: 575280 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:08:19,861-Speed 2502.57 samples/sec Loss 1.5334 LearningRate 0.000116 Epoch: 27 Global Step: 575290 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:08:28,063-Speed 2497.31 samples/sec Loss 1.5203 LearningRate 0.000116 Epoch: 27 Global Step: 575300 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:08:36,262-Speed 2498.24 samples/sec Loss 1.5157 LearningRate 0.000116 Epoch: 27 Global Step: 575310 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:08:44,463-Speed 2497.71 samples/sec Loss 1.5317 LearningRate 0.000116 Epoch: 27 Global Step: 575320 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:08:52,681-Speed 2492.53 samples/sec Loss 1.5230 LearningRate 0.000116 Epoch: 27 Global Step: 575330 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:09:00,884-Speed 2496.72 samples/sec Loss 1.5038 LearningRate 0.000116 Epoch: 27 Global Step: 575340 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:09:09,033-Speed 2513.59 samples/sec Loss 1.4994 LearningRate 0.000116 Epoch: 27 Global Step: 575350 Fp16 Grad Scale: 32768 Required: 58 hours 
Training: 2022-07-11 02:09:17,234-Speed 2497.95 samples/sec Loss 1.5110 LearningRate 0.000116 Epoch: 27 Global Step: 575360 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:09:25,441-Speed 2495.57 samples/sec Loss 1.5270 LearningRate 0.000116 Epoch: 27 Global Step: 575370 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:09:33,644-Speed 2497.16 samples/sec Loss 1.5255 LearningRate 0.000116 Epoch: 27 Global Step: 575380 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:09:41,855-Speed 2494.47 samples/sec Loss 1.5024 LearningRate 0.000116 Epoch: 27 Global Step: 575390 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:09:50,057-Speed 2497.28 samples/sec Loss 1.4672 LearningRate 0.000116 Epoch: 27 Global Step: 575400 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:09:58,214-Speed 2511.27 samples/sec Loss 1.5203 LearningRate 0.000116 Epoch: 27 Global Step: 575410 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:10:06,415-Speed 2498.04 samples/sec Loss 1.4828 LearningRate 0.000116 Epoch: 27 Global Step: 575420 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:10:14,616-Speed 2497.58 samples/sec Loss 1.5049 LearningRate 0.000116 Epoch: 27 Global Step: 575430 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:10:22,815-Speed 2498.25 samples/sec Loss 1.5044 LearningRate 0.000116 Epoch: 27 Global Step: 575440 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:10:30,975-Speed 2510.23 samples/sec Loss 1.5190 LearningRate 0.000116 Epoch: 27 Global Step: 575450 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:10:39,178-Speed 2496.99 samples/sec Loss 1.5087 LearningRate 0.000116 Epoch: 27 Global Step: 575460 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:10:47,335-Speed 2510.87 samples/sec Loss 1.5072 LearningRate 0.000116 Epoch: 27 Global Step: 575470 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:10:55,537-Speed 2497.56 samples/sec Loss 1.5012 LearningRate 0.000116 Epoch: 27 Global Step: 575480 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:11:03,737-Speed 2498.01 samples/sec Loss 1.5047 LearningRate 0.000116 Epoch: 27 Global Step: 575490 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:11:11,938-Speed 2497.39 samples/sec Loss 1.5011 LearningRate 0.000116 Epoch: 27 Global Step: 575500 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:11:20,140-Speed 2497.30 samples/sec Loss 1.4891 LearningRate 0.000116 Epoch: 27 Global Step: 575510 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:11:28,349-Speed 2495.49 samples/sec Loss 1.4691 LearningRate 0.000116 Epoch: 27 Global Step: 575520 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:11:36,498-Speed 2513.45 samples/sec Loss 1.4446 LearningRate 0.000116 Epoch: 27 Global Step: 575530 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:11:44,700-Speed 2497.39 samples/sec Loss 1.4638 LearningRate 0.000116 Epoch: 27 Global Step: 575540 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:11:52,902-Speed 2497.49 samples/sec Loss 1.4917 LearningRate 0.000116 Epoch: 27 Global Step: 575550 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:12:01,125-Speed 2490.89 samples/sec Loss 1.4800 LearningRate 0.000116 Epoch: 27 Global Step: 575560 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:12:09,331-Speed 2496.20 samples/sec Loss 1.5242 LearningRate 0.000116 Epoch: 27 Global Step: 575570 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:12:17,536-Speed 2496.30 samples/sec Loss 1.4827 LearningRate 0.000116 Epoch: 27 Global Step: 575580 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:12:25,688-Speed 2513.17 samples/sec Loss 1.4809 LearningRate 0.000116 Epoch: 27 Global Step: 575590 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:12:33,895-Speed 2495.87 samples/sec Loss 1.4970 LearningRate 0.000116 Epoch: 27 Global Step: 575600 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:12:42,094-Speed 2498.32 samples/sec Loss 1.5232 LearningRate 0.000116 Epoch: 27 Global Step: 575610 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:12:50,296-Speed 2497.48 samples/sec Loss 1.4505 LearningRate 0.000116 Epoch: 27 Global Step: 575620 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:12:58,504-Speed 2495.84 samples/sec Loss 1.4460 LearningRate 0.000116 Epoch: 27 Global Step: 575630 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:13:06,707-Speed 2496.97 samples/sec Loss 1.5008 LearningRate 0.000116 Epoch: 27 Global Step: 575640 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:13:14,857-Speed 2513.25 samples/sec Loss 1.4611 LearningRate 0.000116 Epoch: 27 Global Step: 575650 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:13:23,061-Speed 2496.92 samples/sec Loss 1.5255 LearningRate 0.000116 Epoch: 27 Global Step: 575660 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:13:31,273-Speed 2494.29 samples/sec Loss 1.4513 LearningRate 0.000116 Epoch: 27 Global Step: 575670 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:13:39,476-Speed 2497.14 samples/sec Loss 1.5145 LearningRate 0.000116 Epoch: 27 Global Step: 575680 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:13:47,695-Speed 2492.24 samples/sec Loss 1.4787 LearningRate 0.000116 Epoch: 27 Global Step: 575690 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:13:55,902-Speed 2495.78 samples/sec Loss 1.4865 LearningRate 0.000116 Epoch: 27 Global Step: 575700 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:14:04,058-Speed 2511.44 samples/sec Loss 1.4629 LearningRate 0.000116 Epoch: 27 Global Step: 575710 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:14:12,262-Speed 2496.51 samples/sec Loss 1.4887 LearningRate 0.000116 Epoch: 27 Global Step: 575720 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:14:20,464-Speed 2497.41 samples/sec Loss 1.4979 LearningRate 0.000116 Epoch: 27 Global Step: 575730 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:14:28,666-Speed 2497.40 samples/sec Loss 1.4472 LearningRate 0.000116 Epoch: 27 Global Step: 575740 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:14:36,865-Speed 2498.42 samples/sec Loss 1.5140 LearningRate 0.000116 Epoch: 27 Global Step: 575750 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:14:45,067-Speed 2497.31 samples/sec Loss 1.5252 LearningRate 0.000116 Epoch: 27 Global Step: 575760 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:14:53,220-Speed 2512.57 samples/sec Loss 1.5102 LearningRate 0.000116 Epoch: 27 Global Step: 575770 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:15:01,419-Speed 2498.11 samples/sec Loss 1.5120 LearningRate 0.000116 Epoch: 27 Global Step: 575780 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:15:09,646-Speed 2490.26 samples/sec Loss 1.4851 LearningRate 0.000116 Epoch: 27 Global Step: 575790 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:15:17,849-Speed 2496.98 samples/sec Loss 1.5003 LearningRate 0.000116 Epoch: 27 Global Step: 575800 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:15:26,048-Speed 2498.13 samples/sec Loss 1.4906 LearningRate 0.000116 Epoch: 27 Global Step: 575810 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:15:34,253-Speed 2496.45 samples/sec Loss 1.4802 LearningRate 0.000116 Epoch: 27 Global Step: 575820 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:15:42,405-Speed 2512.97 samples/sec Loss 1.5199 LearningRate 0.000115 Epoch: 27 Global Step: 575830 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:15:50,607-Speed 2497.40 samples/sec Loss 1.5011 LearningRate 0.000115 Epoch: 27 Global Step: 575840 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:15:58,810-Speed 2497.06 samples/sec Loss 1.4820 LearningRate 0.000115 Epoch: 27 Global Step: 575850 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:16:07,018-Speed 2495.44 samples/sec Loss 1.4977 LearningRate 0.000115 Epoch: 27 Global Step: 575860 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:16:15,227-Speed 2495.69 samples/sec Loss 1.5477 LearningRate 0.000115 Epoch: 27 Global Step: 575870 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:16:23,434-Speed 2495.94 samples/sec Loss 1.4954 LearningRate 0.000115 Epoch: 27 Global Step: 575880 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:16:31,590-Speed 2511.37 samples/sec Loss 1.5028 LearningRate 0.000115 Epoch: 27 Global Step: 575890 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:16:39,816-Speed 2490.08 samples/sec Loss 1.4902 LearningRate 0.000115 Epoch: 27 Global Step: 575900 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:16:48,024-Speed 2495.47 samples/sec Loss 1.4682 LearningRate 0.000115 Epoch: 27 Global Step: 575910 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:16:56,223-Speed 2498.28 samples/sec Loss 1.5037 LearningRate 0.000115 Epoch: 27 Global Step: 575920 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:17:04,429-Speed 2496.15 samples/sec Loss 1.4734 LearningRate 0.000115 Epoch: 27 Global Step: 575930 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:17:12,629-Speed 2497.88 samples/sec Loss 1.5300 LearningRate 0.000115 Epoch: 27 Global Step: 575940 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:17:20,775-Speed 2514.43 samples/sec Loss 1.4934 LearningRate 0.000115 Epoch: 27 Global Step: 575950 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:17:28,991-Speed 2493.19 samples/sec Loss 1.4705 LearningRate 0.000115 Epoch: 27 Global Step: 575960 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:17:37,190-Speed 2498.45 samples/sec Loss 1.5230 LearningRate 0.000115 Epoch: 27 Global Step: 575970 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:17:45,388-Speed 2498.55 samples/sec Loss 1.4690 LearningRate 0.000115 Epoch: 27 Global Step: 575980 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:17:53,588-Speed 2497.87 samples/sec Loss 1.5019 LearningRate 0.000115 Epoch: 27 Global Step: 575990 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:18:01,788-Speed 2497.88 samples/sec Loss 1.5155 LearningRate 0.000115 Epoch: 27 Global Step: 576000 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:18:09,937-Speed 2513.55 samples/sec Loss 1.4929 LearningRate 0.000115 Epoch: 27 Global Step: 576010 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:18:18,136-Speed 2498.41 samples/sec Loss 1.5158 LearningRate 0.000115 Epoch: 27 Global Step: 576020 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:18:26,335-Speed 2498.01 samples/sec Loss 1.5016 LearningRate 0.000115 Epoch: 27 Global Step: 576030 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:18:34,536-Speed 2497.84 samples/sec Loss 1.4706 LearningRate 0.000115 Epoch: 27 Global Step: 576040 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:18:42,741-Speed 2496.45 samples/sec Loss 1.4961 LearningRate 0.000115 Epoch: 27 Global Step: 576050 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:18:50,943-Speed 2497.30 samples/sec Loss 1.4925 LearningRate 0.000115 Epoch: 27 Global Step: 576060 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:18:59,095-Speed 2512.82 samples/sec Loss 1.4536 LearningRate 0.000115 Epoch: 27 Global Step: 576070 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:19:07,302-Speed 2496.01 samples/sec Loss 1.5608 LearningRate 0.000115 Epoch: 27 Global Step: 576080 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:19:15,498-Speed 2499.11 samples/sec Loss 1.4880 LearningRate 0.000115 Epoch: 27 Global Step: 576090 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:19:23,701-Speed 2496.98 samples/sec Loss 1.5083 LearningRate 0.000115 Epoch: 27 Global Step: 576100 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:19:31,915-Speed 2493.71 samples/sec Loss 1.4935 LearningRate 0.000115 Epoch: 27 Global Step: 576110 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:19:40,119-Speed 2496.49 samples/sec Loss 1.5050 LearningRate 0.000115 Epoch: 27 Global Step: 576120 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:19:48,272-Speed 2512.64 samples/sec Loss 1.5132 LearningRate 0.000115 Epoch: 27 Global Step: 576130 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:19:56,472-Speed 2497.94 samples/sec Loss 1.5299 LearningRate 0.000115 Epoch: 27 Global Step: 576140 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:20:04,673-Speed 2497.53 samples/sec Loss 1.4640 LearningRate 0.000115 Epoch: 27 Global Step: 576150 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:20:12,874-Speed 2497.96 samples/sec Loss 1.5108 LearningRate 0.000115 Epoch: 27 Global Step: 576160 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:20:21,077-Speed 2496.92 samples/sec Loss 1.4962 LearningRate 0.000115 Epoch: 27 Global Step: 576170 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:20:29,292-Speed 2493.38 samples/sec Loss 1.4771 LearningRate 0.000115 Epoch: 27 Global Step: 576180 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:20:37,439-Speed 2514.09 samples/sec Loss 1.5088 LearningRate 0.000115 Epoch: 27 Global Step: 576190 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:20:45,654-Speed 2493.36 samples/sec Loss 1.4888 LearningRate 0.000115 Epoch: 27 Global Step: 576200 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:20:53,860-Speed 2496.32 samples/sec Loss 1.4817 LearningRate 0.000115 Epoch: 27 Global Step: 576210 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:21:02,060-Speed 2497.81 samples/sec Loss 1.4843 LearningRate 0.000115 Epoch: 27 Global Step: 576220 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:21:10,265-Speed 2496.55 samples/sec Loss 1.4913 LearningRate 0.000115 Epoch: 27 Global Step: 576230 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:21:18,465-Speed 2498.17 samples/sec Loss 1.5198 LearningRate 0.000115 Epoch: 27 Global Step: 576240 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:21:26,622-Speed 2511.15 samples/sec Loss 1.4673 LearningRate 0.000115 Epoch: 27 Global Step: 576250 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:21:34,831-Speed 2495.21 samples/sec Loss 1.4971 LearningRate 0.000115 Epoch: 27 Global Step: 576260 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:21:43,032-Speed 2497.64 samples/sec Loss 1.5293 LearningRate 0.000115 Epoch: 27 Global Step: 576270 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:21:51,234-Speed 2497.60 samples/sec Loss 1.4949 LearningRate 0.000115 Epoch: 27 Global Step: 576280 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:21:59,441-Speed 2495.64 samples/sec Loss 1.4826 LearningRate 0.000115 Epoch: 27 Global Step: 576290 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:22:07,644-Speed 2497.07 samples/sec Loss 1.5068 LearningRate 0.000115 Epoch: 27 Global Step: 576300 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:22:15,798-Speed 2512.24 samples/sec Loss 1.4969 LearningRate 0.000115 Epoch: 27 Global Step: 576310 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:22:24,000-Speed 2497.32 samples/sec Loss 1.4882 LearningRate 0.000115 Epoch: 27 Global Step: 576320 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:22:32,215-Speed 2493.54 samples/sec Loss 1.5095 LearningRate 0.000115 Epoch: 27 Global Step: 576330 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:22:40,429-Speed 2493.47 samples/sec Loss 1.4741 LearningRate 0.000115 Epoch: 27 Global Step: 576340 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:22:48,657-Speed 2489.49 samples/sec Loss 1.5039 LearningRate 0.000115 Epoch: 27 Global Step: 576350 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:22:56,857-Speed 2497.94 samples/sec Loss 1.5143 LearningRate 0.000115 Epoch: 27 Global Step: 576360 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:23:05,005-Speed 2513.93 samples/sec Loss 1.5047 LearningRate 0.000115 Epoch: 27 Global Step: 576370 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:23:13,207-Speed 2497.65 samples/sec Loss 1.5060 LearningRate 0.000115 Epoch: 27 Global Step: 576380 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:23:21,409-Speed 2497.29 samples/sec Loss 1.4735 LearningRate 0.000115 Epoch: 27 Global Step: 576390 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:23:29,614-Speed 2496.54 samples/sec Loss 1.4857 LearningRate 0.000115 Epoch: 27 Global Step: 576400 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:23:37,815-Speed 2497.57 samples/sec Loss 1.4366 LearningRate 0.000115 Epoch: 27 Global Step: 576410 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:23:46,028-Speed 2494.10 samples/sec Loss 1.4764 LearningRate 0.000115 Epoch: 27 Global Step: 576420 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:23:54,176-Speed 2514.15 samples/sec Loss 1.5161 LearningRate 0.000115 Epoch: 27 Global Step: 576430 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:24:02,376-Speed 2498.01 samples/sec Loss 1.5175 LearningRate 0.000115 Epoch: 27 Global Step: 576440 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:24:10,577-Speed 2497.67 samples/sec Loss 1.4711 LearningRate 0.000115 Epoch: 27 Global Step: 576450 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:24:18,777-Speed 2497.67 samples/sec Loss 1.5052 LearningRate 0.000115 Epoch: 27 Global Step: 576460 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:24:26,981-Speed 2496.86 samples/sec Loss 1.4855 LearningRate 0.000115 Epoch: 27 Global Step: 576470 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:24:35,184-Speed 2496.98 samples/sec Loss 1.5373 LearningRate 0.000115 Epoch: 27 Global Step: 576480 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:24:43,336-Speed 2512.75 samples/sec Loss 1.4709 LearningRate 0.000115 Epoch: 27 Global Step: 576490 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:24:51,539-Speed 2497.37 samples/sec Loss 1.5064 LearningRate 0.000115 Epoch: 27 Global Step: 576500 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:24:59,745-Speed 2496.37 samples/sec Loss 1.4999 LearningRate 0.000115 Epoch: 27 Global Step: 576510 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:25:07,947-Speed 2497.10 samples/sec Loss 1.5040 LearningRate 0.000115 Epoch: 27 Global Step: 576520 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:25:16,155-Speed 2495.85 samples/sec Loss 1.4977 LearningRate 0.000115 Epoch: 27 Global Step: 576530 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:25:24,354-Speed 2498.22 samples/sec Loss 1.4909 LearningRate 0.000115 Epoch: 27 Global Step: 576540 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:25:32,505-Speed 2512.96 samples/sec Loss 1.5231 LearningRate 0.000115 Epoch: 27 Global Step: 576550 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:25:40,706-Speed 2497.89 samples/sec Loss 1.4841 LearningRate 0.000115 Epoch: 27 Global Step: 576560 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:25:48,908-Speed 2497.14 samples/sec Loss 1.4688 LearningRate 0.000115 Epoch: 27 Global Step: 576570 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:25:57,111-Speed 2497.31 samples/sec Loss 1.4844 LearningRate 0.000115 Epoch: 27 Global Step: 576580 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:26:05,314-Speed 2497.86 samples/sec Loss 1.5103 LearningRate 0.000115 Epoch: 27 Global Step: 576590 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:26:13,519-Speed 2496.18 samples/sec Loss 1.5113 LearningRate 0.000115 Epoch: 27 Global Step: 576600 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:26:21,676-Speed 2511.32 samples/sec Loss 1.5060 LearningRate 0.000115 Epoch: 27 Global Step: 576610 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:26:29,882-Speed 2496.07 samples/sec Loss 1.5511 LearningRate 0.000115 Epoch: 27 Global Step: 576620 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:26:38,096-Speed 2493.51 samples/sec Loss 1.5002 LearningRate 0.000115 Epoch: 27 Global Step: 576630 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:26:46,302-Speed 2496.16 samples/sec Loss 1.4602 LearningRate 0.000115 Epoch: 27 Global Step: 576640 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:26:54,508-Speed 2496.22 samples/sec Loss 1.4860 LearningRate 0.000115 Epoch: 27 Global Step: 576650 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:27:02,715-Speed 2495.71 samples/sec Loss 1.4878 LearningRate 0.000115 Epoch: 27 Global Step: 576660 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:27:10,870-Speed 2511.74 samples/sec Loss 1.5347 LearningRate 0.000115 Epoch: 27 Global Step: 576670 Fp16 Grad Scale: 32768 Required: 58 hours 
Training: 2022-07-11 02:27:19,072-Speed 2497.27 samples/sec Loss 1.5010 LearningRate 0.000115 Epoch: 27 Global Step: 576680 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:27:27,275-Speed 2497.03 samples/sec Loss 1.4919 LearningRate 0.000115 Epoch: 27 Global Step: 576690 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:27:35,476-Speed 2497.59 samples/sec Loss 1.4827 LearningRate 0.000115 Epoch: 27 Global Step: 576700 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:27:43,689-Speed 2494.00 samples/sec Loss 1.4867 LearningRate 0.000115 Epoch: 27 Global Step: 576710 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:27:51,844-Speed 2511.67 samples/sec Loss 1.5005 LearningRate 0.000115 Epoch: 27 Global Step: 576720 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:27:59,992-Speed 2514.12 samples/sec Loss 1.5301 LearningRate 0.000115 Epoch: 27 Global Step: 576730 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:28:08,196-Speed 2497.14 samples/sec Loss 1.5210 LearningRate 0.000115 Epoch: 27 Global Step: 576740 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:28:16,396-Speed 2497.56 samples/sec Loss 1.5241 LearningRate 0.000115 Epoch: 27 Global Step: 576750 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:28:24,599-Speed 2497.37 samples/sec Loss 1.4972 LearningRate 0.000115 Epoch: 27 Global Step: 576760 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:28:32,802-Speed 2496.78 samples/sec Loss 1.4897 LearningRate 0.000115 Epoch: 27 Global Step: 576770 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:28:41,004-Speed 2497.62 samples/sec Loss 1.5212 LearningRate 0.000115 Epoch: 27 Global Step: 576780 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:28:49,157-Speed 2512.15 samples/sec Loss 1.4882 LearningRate 0.000115 Epoch: 27 Global Step: 576790 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:28:57,359-Speed 2497.48 samples/sec Loss 1.5162 LearningRate 0.000115 Epoch: 27 Global Step: 576800 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:29:05,568-Speed 2495.00 samples/sec Loss 1.4989 LearningRate 0.000115 Epoch: 27 Global Step: 576810 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:29:13,767-Speed 2498.31 samples/sec Loss 1.5252 LearningRate 0.000115 Epoch: 27 Global Step: 576820 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:29:21,972-Speed 2496.43 samples/sec Loss 1.4850 LearningRate 0.000115 Epoch: 27 Global Step: 576830 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:29:30,184-Speed 2494.42 samples/sec Loss 1.4676 LearningRate 0.000115 Epoch: 27 Global Step: 576840 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:29:38,335-Speed 2513.03 samples/sec Loss 1.5176 LearningRate 0.000115 Epoch: 27 Global Step: 576850 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:29:46,539-Speed 2497.04 samples/sec Loss 1.5034 LearningRate 0.000115 Epoch: 27 Global Step: 576860 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:29:54,751-Speed 2494.03 samples/sec Loss 1.4718 LearningRate 0.000115 Epoch: 27 Global Step: 576870 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:30:02,955-Speed 2496.85 samples/sec Loss 1.4876 LearningRate 0.000115 Epoch: 27 Global Step: 576880 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:30:11,155-Speed 2498.13 samples/sec Loss 1.5044 LearningRate 0.000115 Epoch: 27 Global Step: 576890 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:30:19,355-Speed 2497.97 samples/sec Loss 1.4681 LearningRate 0.000115 Epoch: 27 Global Step: 576900 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:30:27,501-Speed 2514.44 samples/sec Loss 1.4814 LearningRate 0.000115 Epoch: 27 Global Step: 576910 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:30:35,713-Speed 2494.17 samples/sec Loss 1.4967 LearningRate 0.000115 Epoch: 27 Global Step: 576920 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:30:43,911-Speed 2498.93 samples/sec Loss 1.5129 LearningRate 0.000114 Epoch: 27 Global Step: 576930 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:30:52,112-Speed 2497.57 samples/sec Loss 1.4934 LearningRate 0.000114 Epoch: 27 Global Step: 576940 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:31:00,314-Speed 2497.74 samples/sec Loss 1.4746 LearningRate 0.000114 Epoch: 27 Global Step: 576950 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:31:08,515-Speed 2497.51 samples/sec Loss 1.4427 LearningRate 0.000114 Epoch: 27 Global Step: 576960 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:31:16,665-Speed 2513.26 samples/sec Loss 1.4777 LearningRate 0.000114 Epoch: 27 Global Step: 576970 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:31:24,874-Speed 2495.30 samples/sec Loss 1.4768 LearningRate 0.000114 Epoch: 27 Global Step: 576980 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:31:33,090-Speed 2493.01 samples/sec Loss 1.4671 LearningRate 0.000114 Epoch: 27 Global Step: 576990 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:31:41,294-Speed 2497.03 samples/sec Loss 1.4746 LearningRate 0.000114 Epoch: 27 Global Step: 577000 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:31:49,495-Speed 2497.70 samples/sec Loss 1.4948 LearningRate 0.000114 Epoch: 27 Global Step: 577010 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:31:57,696-Speed 2497.39 samples/sec Loss 1.4701 LearningRate 0.000114 Epoch: 27 Global Step: 577020 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:32:05,850-Speed 2512.13 samples/sec Loss 1.5130 LearningRate 0.000114 Epoch: 27 Global Step: 577030 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:32:14,050-Speed 2498.07 samples/sec Loss 1.5225 LearningRate 0.000114 Epoch: 27 Global Step: 577040 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:32:22,249-Speed 2497.97 samples/sec Loss 1.4817 LearningRate 0.000114 Epoch: 27 Global Step: 577050 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:32:30,456-Speed 2495.86 samples/sec Loss 1.4775 LearningRate 0.000114 Epoch: 27 Global Step: 577060 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:32:38,662-Speed 2496.21 samples/sec Loss 1.4698 LearningRate 0.000114 Epoch: 27 Global Step: 577070 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:32:46,860-Speed 2498.80 samples/sec Loss 1.5189 LearningRate 0.000114 Epoch: 27 Global Step: 577080 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:32:55,005-Speed 2514.97 samples/sec Loss 1.4911 LearningRate 0.000114 Epoch: 27 Global Step: 577090 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:33:03,207-Speed 2497.32 samples/sec Loss 1.5084 LearningRate 0.000114 Epoch: 27 Global Step: 577100 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:33:11,409-Speed 2497.58 samples/sec Loss 1.5010 LearningRate 0.000114 Epoch: 27 Global Step: 577110 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:33:19,611-Speed 2497.40 samples/sec Loss 1.4979 LearningRate 0.000114 Epoch: 27 Global Step: 577120 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:33:27,811-Speed 2498.02 samples/sec Loss 1.4980 LearningRate 0.000114 Epoch: 27 Global Step: 577130 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:33:36,015-Speed 2496.43 samples/sec Loss 1.4774 LearningRate 0.000114 Epoch: 27 Global Step: 577140 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:33:44,164-Speed 2513.72 samples/sec Loss 1.5074 LearningRate 0.000114 Epoch: 27 Global Step: 577150 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:33:52,368-Speed 2496.84 samples/sec Loss 1.4852 LearningRate 0.000114 Epoch: 27 Global Step: 577160 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:34:00,574-Speed 2496.05 samples/sec Loss 1.4772 LearningRate 0.000114 Epoch: 27 Global Step: 577170 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:34:08,785-Speed 2494.75 samples/sec Loss 1.4880 LearningRate 0.000114 Epoch: 27 Global Step: 577180 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:34:17,009-Speed 2491.06 samples/sec Loss 1.5191 LearningRate 0.000114 Epoch: 27 Global Step: 577190 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:34:25,213-Speed 2496.86 samples/sec Loss 1.4825 LearningRate 0.000114 Epoch: 27 Global Step: 577200 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:34:33,365-Speed 2512.52 samples/sec Loss 1.4786 LearningRate 0.000114 Epoch: 27 Global Step: 577210 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:34:41,570-Speed 2496.39 samples/sec Loss 1.4964 LearningRate 0.000114 Epoch: 27 Global Step: 577220 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:34:49,771-Speed 2497.77 samples/sec Loss 1.5127 LearningRate 0.000114 Epoch: 27 Global Step: 577230 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:34:57,976-Speed 2496.58 samples/sec Loss 1.5101 LearningRate 0.000114 Epoch: 27 Global Step: 577240 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:35:06,178-Speed 2497.33 samples/sec Loss 1.5017 LearningRate 0.000114 Epoch: 27 Global Step: 577250 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:35:14,390-Speed 2494.33 samples/sec Loss 1.5174 LearningRate 0.000114 Epoch: 27 Global Step: 577260 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:35:22,540-Speed 2513.65 samples/sec Loss 1.4959 LearningRate 0.000114 Epoch: 27 Global Step: 577270 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:35:30,742-Speed 2497.54 samples/sec Loss 1.5282 LearningRate 0.000114 Epoch: 27 Global Step: 577280 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:35:38,944-Speed 2497.57 samples/sec Loss 1.4664 LearningRate 0.000114 Epoch: 27 Global Step: 577290 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:35:47,146-Speed 2497.84 samples/sec Loss 1.5239 LearningRate 0.000114 Epoch: 27 Global Step: 577300 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:35:55,345-Speed 2498.04 samples/sec Loss 1.4767 LearningRate 0.000114 Epoch: 27 Global Step: 577310 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:36:03,546-Speed 2497.68 samples/sec Loss 1.5109 LearningRate 0.000114 Epoch: 27 Global Step: 577320 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:36:11,692-Speed 2515.34 samples/sec Loss 1.4587 LearningRate 0.000114 Epoch: 27 Global Step: 577330 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:36:19,896-Speed 2496.89 samples/sec Loss 1.4943 LearningRate 0.000114 Epoch: 27 Global Step: 577340 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:36:28,100-Speed 2496.95 samples/sec Loss 1.4998 LearningRate 0.000114 Epoch: 27 Global Step: 577350 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:36:36,304-Speed 2496.84 samples/sec Loss 1.5012 LearningRate 0.000114 Epoch: 27 Global Step: 577360 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:36:44,504-Speed 2497.75 samples/sec Loss 1.4768 LearningRate 0.000114 Epoch: 27 Global Step: 577370 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:36:52,707-Speed 2497.09 samples/sec Loss 1.5183 LearningRate 0.000114 Epoch: 27 Global Step: 577380 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:37:00,853-Speed 2514.56 samples/sec Loss 1.4648 LearningRate 0.000114 Epoch: 27 Global Step: 577390 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:37:09,051-Speed 2498.47 samples/sec Loss 1.5051 LearningRate 0.000114 Epoch: 27 Global Step: 577400 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:37:17,256-Speed 2496.26 samples/sec Loss 1.4617 LearningRate 0.000114 Epoch: 27 Global Step: 577410 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:37:25,456-Speed 2497.95 samples/sec Loss 1.4808 LearningRate 0.000114 Epoch: 27 Global Step: 577420 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:37:33,657-Speed 2497.63 samples/sec Loss 1.5169 LearningRate 0.000114 Epoch: 27 Global Step: 577430 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:37:41,856-Speed 2498.72 samples/sec Loss 1.5400 LearningRate 0.000114 Epoch: 27 Global Step: 577440 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:37:50,010-Speed 2512.13 samples/sec Loss 1.4946 LearningRate 0.000114 Epoch: 27 Global Step: 577450 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:37:58,208-Speed 2498.65 samples/sec Loss 1.4573 LearningRate 0.000114 Epoch: 27 Global Step: 577460 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:38:06,420-Speed 2494.21 samples/sec Loss 1.4806 LearningRate 0.000114 Epoch: 27 Global Step: 577470 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:38:14,621-Speed 2497.64 samples/sec Loss 1.4984 LearningRate 0.000114 Epoch: 27 Global Step: 577480 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:38:22,836-Speed 2493.48 samples/sec Loss 1.5291 LearningRate 0.000114 Epoch: 27 Global Step: 577490 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:38:31,041-Speed 2496.35 samples/sec Loss 1.4571 LearningRate 0.000114 Epoch: 27 Global Step: 577500 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:38:39,190-Speed 2513.70 samples/sec Loss 1.5254 LearningRate 0.000114 Epoch: 27 Global Step: 577510 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:38:47,393-Speed 2496.97 samples/sec Loss 1.4886 LearningRate 0.000114 Epoch: 27 Global Step: 577520 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:38:55,593-Speed 2497.93 samples/sec Loss 1.5120 LearningRate 0.000114 Epoch: 27 Global Step: 577530 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:39:03,797-Speed 2496.57 samples/sec Loss 1.4333 LearningRate 0.000114 Epoch: 27 Global Step: 577540 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:39:12,004-Speed 2496.00 samples/sec Loss 1.4940 LearningRate 0.000114 Epoch: 27 Global Step: 577550 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:39:20,220-Speed 2493.10 samples/sec Loss 1.4784 LearningRate 0.000114 Epoch: 27 Global Step: 577560 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:39:28,367-Speed 2514.36 samples/sec Loss 1.4881 LearningRate 0.000114 Epoch: 27 Global Step: 577570 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:39:36,571-Speed 2496.74 samples/sec Loss 1.4698 LearningRate 0.000114 Epoch: 27 Global Step: 577580 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:39:44,775-Speed 2496.81 samples/sec Loss 1.4613 LearningRate 0.000114 Epoch: 27 Global Step: 577590 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:39:52,977-Speed 2497.39 samples/sec Loss 1.5022 LearningRate 0.000114 Epoch: 27 Global Step: 577600 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:40:01,179-Speed 2497.13 samples/sec Loss 1.4784 LearningRate 0.000114 Epoch: 27 Global Step: 577610 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:40:09,384-Speed 2496.66 samples/sec Loss 1.4773 LearningRate 0.000114 Epoch: 27 Global Step: 577620 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:40:17,532-Speed 2513.95 samples/sec Loss 1.4747 LearningRate 0.000114 Epoch: 27 Global Step: 577630 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:40:25,728-Speed 2498.94 samples/sec Loss 1.4669 LearningRate 0.000114 Epoch: 27 Global Step: 577640 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:40:33,926-Speed 2498.57 samples/sec Loss 1.4920 LearningRate 0.000114 Epoch: 27 Global Step: 577650 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:40:42,125-Speed 2498.32 samples/sec Loss 1.4911 LearningRate 0.000114 Epoch: 27 Global Step: 577660 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:40:50,325-Speed 2498.05 samples/sec Loss 1.4567 LearningRate 0.000114 Epoch: 27 Global Step: 577670 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:40:58,526-Speed 2497.64 samples/sec Loss 1.5214 LearningRate 0.000114 Epoch: 27 Global Step: 577680 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:41:06,673-Speed 2514.88 samples/sec Loss 1.5008 LearningRate 0.000114 Epoch: 27 Global Step: 577690 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:41:14,881-Speed 2495.48 samples/sec Loss 1.4722 LearningRate 0.000114 Epoch: 27 Global Step: 577700 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:41:23,091-Speed 2494.73 samples/sec Loss 1.4601 LearningRate 0.000114 Epoch: 27 Global Step: 577710 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:41:31,289-Speed 2498.59 samples/sec Loss 1.4797 LearningRate 0.000114 Epoch: 27 Global Step: 577720 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:41:39,490-Speed 2497.85 samples/sec Loss 1.4529 LearningRate 0.000114 Epoch: 27 Global Step: 577730 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:41:47,693-Speed 2496.74 samples/sec Loss 1.5047 LearningRate 0.000114 Epoch: 27 Global Step: 577740 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:41:55,852-Speed 2510.50 samples/sec Loss 1.5219 LearningRate 0.000114 Epoch: 27 Global Step: 577750 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:42:04,050-Speed 2498.84 samples/sec Loss 1.4934 LearningRate 0.000114 Epoch: 27 Global Step: 577760 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:42:12,251-Speed 2497.76 samples/sec Loss 1.4954 LearningRate 0.000114 Epoch: 27 Global Step: 577770 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:42:20,452-Speed 2497.54 samples/sec Loss 1.4855 LearningRate 0.000114 Epoch: 27 Global Step: 577780 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:42:28,651-Speed 2498.35 samples/sec Loss 1.5156 LearningRate 0.000114 Epoch: 27 Global Step: 577790 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:42:36,865-Speed 2493.48 samples/sec Loss 1.4937 LearningRate 0.000114 Epoch: 27 Global Step: 577800 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:42:45,013-Speed 2514.01 samples/sec Loss 1.4818 LearningRate 0.000114 Epoch: 27 Global Step: 577810 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:42:53,212-Speed 2498.24 samples/sec Loss 1.4914 LearningRate 0.000114 Epoch: 27 Global Step: 577820 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:43:01,414-Speed 2497.23 samples/sec Loss 1.4643 LearningRate 0.000114 Epoch: 27 Global Step: 577830 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:43:09,615-Speed 2497.77 samples/sec Loss 1.4927 LearningRate 0.000114 Epoch: 27 Global Step: 577840 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:43:17,813-Speed 2498.59 samples/sec Loss 1.4846 LearningRate 0.000114 Epoch: 27 Global Step: 577850 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:43:26,026-Speed 2494.01 samples/sec Loss 1.4913 LearningRate 0.000114 Epoch: 27 Global Step: 577860 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:43:34,171-Speed 2514.56 samples/sec Loss 1.4915 LearningRate 0.000114 Epoch: 27 Global Step: 577870 Fp16 Grad Scale: 16384 Required: 58 hours 
Training: 2022-07-11 02:43:42,372-Speed 2497.75 samples/sec Loss 1.4842 LearningRate 0.000114 Epoch: 27 Global Step: 577880 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:43:50,574-Speed 2497.35 samples/sec Loss 1.4860 LearningRate 0.000114 Epoch: 27 Global Step: 577890 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:43:58,777-Speed 2497.13 samples/sec Loss 1.4936 LearningRate 0.000114 Epoch: 27 Global Step: 577900 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:44:06,978-Speed 2497.49 samples/sec Loss 1.4750 LearningRate 0.000114 Epoch: 27 Global Step: 577910 Fp16 Grad Scale: 16384 Required: 58 hours Training: 2022-07-11 02:44:15,177-Speed 2498.24 samples/sec Loss 1.4861 LearningRate 0.000114 Epoch: 27 Global Step: 577920 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:44:23,325-Speed 2513.81 samples/sec Loss 1.4986 LearningRate 0.000114 Epoch: 27 Global Step: 577930 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:44:31,527-Speed 2497.27 samples/sec Loss 1.4579 LearningRate 0.000114 Epoch: 27 Global Step: 577940 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:44:39,743-Speed 2493.15 samples/sec Loss 1.4963 LearningRate 0.000114 Epoch: 27 Global Step: 577950 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:44:47,942-Speed 2498.21 samples/sec Loss 1.4836 LearningRate 0.000114 Epoch: 27 Global Step: 577960 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:44:56,142-Speed 2498.31 samples/sec Loss 1.4916 LearningRate 0.000114 Epoch: 27 Global Step: 577970 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:45:04,339-Speed 2498.72 samples/sec Loss 1.5188 LearningRate 0.000114 Epoch: 27 Global Step: 577980 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:45:12,501-Speed 2509.62 samples/sec Loss 1.5167 LearningRate 0.000114 Epoch: 27 Global Step: 577990 Fp16 Grad Scale: 32768 Required: 58 hours 
Training: 2022-07-11 02:45:20,700-Speed 2498.09 samples/sec Loss 1.4795 LearningRate 0.000114 Epoch: 27 Global Step: 578000 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:45:28,902-Speed 2497.47 samples/sec Loss 1.4741 LearningRate 0.000114 Epoch: 27 Global Step: 578010 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:45:37,103-Speed 2497.70 samples/sec Loss 1.4966 LearningRate 0.000114 Epoch: 27 Global Step: 578020 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:45:45,307-Speed 2496.59 samples/sec Loss 1.4918 LearningRate 0.000114 Epoch: 27 Global Step: 578030 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:45:53,518-Speed 2494.84 samples/sec Loss 1.4980 LearningRate 0.000113 Epoch: 27 Global Step: 578040 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:46:01,664-Speed 2514.65 samples/sec Loss 1.4980 LearningRate 0.000113 Epoch: 27 Global Step: 578050 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:46:09,865-Speed 2497.48 samples/sec Loss 1.4767 LearningRate 0.000113 Epoch: 27 Global Step: 578060 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:46:18,079-Speed 2493.96 samples/sec Loss 1.4850 LearningRate 0.000113 Epoch: 27 Global Step: 578070 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:46:26,280-Speed 2497.53 samples/sec Loss 1.5028 LearningRate 0.000113 Epoch: 27 Global Step: 578080 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:46:34,480-Speed 2498.14 samples/sec Loss 1.4954 LearningRate 0.000113 Epoch: 27 Global Step: 578090 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:46:42,694-Speed 2493.72 samples/sec Loss 1.4301 LearningRate 0.000113 Epoch: 27 Global Step: 578100 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:46:50,839-Speed 2514.69 samples/sec Loss 1.5019 LearningRate 0.000113 Epoch: 27 Global Step: 578110 Fp16 Grad Scale: 32768 Required: 58 hours 
Training: 2022-07-11 02:46:59,056-Speed 2492.74 samples/sec Loss 1.4689 LearningRate 0.000113 Epoch: 27 Global Step: 578120 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:47:07,257-Speed 2497.75 samples/sec Loss 1.4850 LearningRate 0.000113 Epoch: 27 Global Step: 578130 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:47:15,457-Speed 2498.09 samples/sec Loss 1.4358 LearningRate 0.000113 Epoch: 27 Global Step: 578140 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:47:23,659-Speed 2497.23 samples/sec Loss 1.4393 LearningRate 0.000113 Epoch: 27 Global Step: 578150 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:47:31,862-Speed 2496.74 samples/sec Loss 1.4914 LearningRate 0.000113 Epoch: 27 Global Step: 578160 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:47:40,008-Speed 2514.86 samples/sec Loss 1.4517 LearningRate 0.000113 Epoch: 27 Global Step: 578170 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:47:48,208-Speed 2497.68 samples/sec Loss 1.4464 LearningRate 0.000113 Epoch: 27 Global Step: 578180 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:47:56,411-Speed 2497.37 samples/sec Loss 1.4701 LearningRate 0.000113 Epoch: 27 Global Step: 578190 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:48:04,616-Speed 2496.45 samples/sec Loss 1.4540 LearningRate 0.000113 Epoch: 27 Global Step: 578200 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:48:12,816-Speed 2497.96 samples/sec Loss 1.4790 LearningRate 0.000113 Epoch: 27 Global Step: 578210 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:48:21,029-Speed 2493.99 samples/sec Loss 1.4689 LearningRate 0.000113 Epoch: 27 Global Step: 578220 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:48:29,178-Speed 2513.44 samples/sec Loss 1.4470 LearningRate 0.000113 Epoch: 27 Global Step: 578230 Fp16 Grad Scale: 32768 Required: 58 hours 
Training: 2022-07-11 02:48:37,380-Speed 2497.37 samples/sec Loss 1.4825 LearningRate 0.000113 Epoch: 27 Global Step: 578240 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:48:45,583-Speed 2497.24 samples/sec Loss 1.4622 LearningRate 0.000113 Epoch: 27 Global Step: 578250 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:48:53,791-Speed 2495.46 samples/sec Loss 1.4877 LearningRate 0.000113 Epoch: 27 Global Step: 578260 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:49:01,990-Speed 2498.06 samples/sec Loss 1.4833 LearningRate 0.000113 Epoch: 27 Global Step: 578270 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:49:10,191-Speed 2497.72 samples/sec Loss 1.4894 LearningRate 0.000113 Epoch: 27 Global Step: 578280 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:49:18,343-Speed 2512.74 samples/sec Loss 1.4943 LearningRate 0.000113 Epoch: 27 Global Step: 578290 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:49:26,539-Speed 2499.53 samples/sec Loss 1.4621 LearningRate 0.000113 Epoch: 27 Global Step: 578300 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:49:34,745-Speed 2497.04 samples/sec Loss 1.4704 LearningRate 0.000113 Epoch: 27 Global Step: 578310 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:49:42,946-Speed 2497.65 samples/sec Loss 1.4908 LearningRate 0.000113 Epoch: 27 Global Step: 578320 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:49:51,151-Speed 2496.42 samples/sec Loss 1.4655 LearningRate 0.000113 Epoch: 27 Global Step: 578330 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:49:59,361-Speed 2495.15 samples/sec Loss 1.5170 LearningRate 0.000113 Epoch: 27 Global Step: 578340 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:50:07,506-Speed 2514.57 samples/sec Loss 1.5030 LearningRate 0.000113 Epoch: 27 Global Step: 578350 Fp16 Grad Scale: 32768 Required: 58 hours 
Training: 2022-07-11 02:50:15,712-Speed 2496.36 samples/sec Loss 1.4489 LearningRate 0.000113 Epoch: 27 Global Step: 578360 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:50:24,000-Speed 2471.17 samples/sec Loss 1.5240 LearningRate 0.000113 Epoch: 27 Global Step: 578370 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-07-11 02:50:32,210-Speed 2494.94 samples/sec Loss 1.4560 LearningRate 0.000113 Epoch: 27 Global Step: 578380 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-07-11 02:50:40,413-Speed 2497.17 samples/sec Loss 1.4512 LearningRate 0.000113 Epoch: 27 Global Step: 578390 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-07-11 02:50:48,622-Speed 2495.72 samples/sec Loss 1.4547 LearningRate 0.000113 Epoch: 27 Global Step: 578400 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-07-11 02:50:56,766-Speed 2514.95 samples/sec Loss 1.5307 LearningRate 0.000113 Epoch: 27 Global Step: 578410 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-07-11 02:51:04,935-Speed 2507.45 samples/sec Loss 1.4687 LearningRate 0.000113 Epoch: 27 Global Step: 578420 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:51:13,141-Speed 2496.06 samples/sec Loss 1.4943 LearningRate 0.000113 Epoch: 27 Global Step: 578430 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:51:21,350-Speed 2495.22 samples/sec Loss 1.5161 LearningRate 0.000113 Epoch: 27 Global Step: 578440 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:51:29,557-Speed 2495.98 samples/sec Loss 1.4602 LearningRate 0.000113 Epoch: 27 Global Step: 578450 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:51:37,762-Speed 2496.37 samples/sec Loss 1.4783 LearningRate 0.000113 Epoch: 27 Global Step: 578460 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:51:45,911-Speed 2513.72 samples/sec Loss 1.4967 LearningRate 0.000113 Epoch: 27 Global Step: 578470 Fp16 Grad Scale: 16384 Required: 57 hours 
Training: 2022-07-11 02:51:54,113-Speed 2497.33 samples/sec Loss 1.4512 LearningRate 0.000113 Epoch: 27 Global Step: 578480 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:52:02,316-Speed 2496.85 samples/sec Loss 1.4822 LearningRate 0.000113 Epoch: 27 Global Step: 578490 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:52:10,525-Speed 2495.24 samples/sec Loss 1.4661 LearningRate 0.000113 Epoch: 27 Global Step: 578500 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:52:18,732-Speed 2495.79 samples/sec Loss 1.4437 LearningRate 0.000113 Epoch: 27 Global Step: 578510 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:52:26,938-Speed 2496.11 samples/sec Loss 1.5117 LearningRate 0.000113 Epoch: 27 Global Step: 578520 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:52:35,089-Speed 2513.08 samples/sec Loss 1.4607 LearningRate 0.000113 Epoch: 27 Global Step: 578530 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:52:43,292-Speed 2497.07 samples/sec Loss 1.4852 LearningRate 0.000113 Epoch: 27 Global Step: 578540 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:52:51,496-Speed 2496.63 samples/sec Loss 1.4991 LearningRate 0.000113 Epoch: 27 Global Step: 578550 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:52:59,700-Speed 2496.66 samples/sec Loss 1.4549 LearningRate 0.000113 Epoch: 27 Global Step: 578560 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:53:07,909-Speed 2495.33 samples/sec Loss 1.5131 LearningRate 0.000113 Epoch: 27 Global Step: 578570 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:53:16,114-Speed 2496.33 samples/sec Loss 1.4594 LearningRate 0.000113 Epoch: 27 Global Step: 578580 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:53:24,265-Speed 2513.00 samples/sec Loss 1.4793 LearningRate 0.000113 Epoch: 27 Global Step: 578590 Fp16 Grad Scale: 16384 Required: 57 hours 
Training: 2022-07-11 02:53:32,469-Speed 2496.76 samples/sec Loss 1.5110 LearningRate 0.000113 Epoch: 27 Global Step: 578600 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:53:40,673-Speed 2496.80 samples/sec Loss 1.4778 LearningRate 0.000113 Epoch: 27 Global Step: 578610 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:53:48,876-Speed 2497.19 samples/sec Loss 1.4816 LearningRate 0.000113 Epoch: 27 Global Step: 578620 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:53:57,086-Speed 2495.11 samples/sec Loss 1.4363 LearningRate 0.000113 Epoch: 27 Global Step: 578630 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:54:05,294-Speed 2495.63 samples/sec Loss 1.5146 LearningRate 0.000113 Epoch: 27 Global Step: 578640 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:54:13,444-Speed 2513.27 samples/sec Loss 1.4825 LearningRate 0.000113 Epoch: 27 Global Step: 578650 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:54:21,649-Speed 2496.56 samples/sec Loss 1.4623 LearningRate 0.000113 Epoch: 27 Global Step: 578660 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:54:29,849-Speed 2498.03 samples/sec Loss 1.4632 LearningRate 0.000113 Epoch: 27 Global Step: 578670 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:54:38,063-Speed 2493.56 samples/sec Loss 1.4802 LearningRate 0.000113 Epoch: 27 Global Step: 578680 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:54:46,267-Speed 2496.56 samples/sec Loss 1.4880 LearningRate 0.000113 Epoch: 27 Global Step: 578690 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:54:54,475-Speed 2495.76 samples/sec Loss 1.4925 LearningRate 0.000113 Epoch: 27 Global Step: 578700 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:55:02,623-Speed 2513.60 samples/sec Loss 1.5042 LearningRate 0.000113 Epoch: 27 Global Step: 578710 Fp16 Grad Scale: 16384 Required: 57 hours 
Training: 2022-07-11 02:55:10,839-Speed 2493.17 samples/sec Loss 1.4960 LearningRate 0.000113 Epoch: 27 Global Step: 578720 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:55:19,039-Speed 2497.83 samples/sec Loss 1.4596 LearningRate 0.000113 Epoch: 27 Global Step: 578730 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:55:27,240-Speed 2497.84 samples/sec Loss 1.4872 LearningRate 0.000113 Epoch: 27 Global Step: 578740 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:55:35,444-Speed 2496.70 samples/sec Loss 1.4585 LearningRate 0.000113 Epoch: 27 Global Step: 578750 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:55:43,646-Speed 2497.37 samples/sec Loss 1.4748 LearningRate 0.000113 Epoch: 27 Global Step: 578760 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:55:51,802-Speed 2511.45 samples/sec Loss 1.4802 LearningRate 0.000113 Epoch: 27 Global Step: 578770 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:56:00,004-Speed 2497.54 samples/sec Loss 1.5080 LearningRate 0.000113 Epoch: 27 Global Step: 578780 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:56:08,209-Speed 2496.42 samples/sec Loss 1.4631 LearningRate 0.000113 Epoch: 27 Global Step: 578790 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:56:16,409-Speed 2497.76 samples/sec Loss 1.4710 LearningRate 0.000113 Epoch: 27 Global Step: 578800 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:56:24,614-Speed 2496.62 samples/sec Loss 1.4368 LearningRate 0.000113 Epoch: 27 Global Step: 578810 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:56:32,817-Speed 2497.04 samples/sec Loss 1.4711 LearningRate 0.000113 Epoch: 27 Global Step: 578820 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:56:40,965-Speed 2513.86 samples/sec Loss 1.4689 LearningRate 0.000113 Epoch: 27 Global Step: 578830 Fp16 Grad Scale: 16384 Required: 57 hours 
Training: 2022-07-11 02:56:49,170-Speed 2496.58 samples/sec Loss 1.5037 LearningRate 0.000113 Epoch: 27 Global Step: 578840 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:56:57,385-Speed 2493.40 samples/sec Loss 1.4972 LearningRate 0.000113 Epoch: 27 Global Step: 578850 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:57:05,587-Speed 2497.38 samples/sec Loss 1.4675 LearningRate 0.000113 Epoch: 27 Global Step: 578860 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:57:13,799-Speed 2493.97 samples/sec Loss 1.4876 LearningRate 0.000113 Epoch: 27 Global Step: 578870 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 02:57:21,961-Speed 2509.61 samples/sec Loss 1.4855 LearningRate 0.000113 Epoch: 27 Global Step: 578880 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:57:30,109-Speed 2513.93 samples/sec Loss 1.5226 LearningRate 0.000113 Epoch: 27 Global Step: 578890 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:57:38,310-Speed 2497.95 samples/sec Loss 1.4852 LearningRate 0.000113 Epoch: 27 Global Step: 578900 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:57:46,509-Speed 2498.21 samples/sec Loss 1.5107 LearningRate 0.000113 Epoch: 27 Global Step: 578910 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:57:54,707-Speed 2498.58 samples/sec Loss 1.4953 LearningRate 0.000113 Epoch: 27 Global Step: 578920 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:58:02,910-Speed 2497.11 samples/sec Loss 1.4773 LearningRate 0.000113 Epoch: 27 Global Step: 578930 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:58:11,113-Speed 2497.19 samples/sec Loss 1.5015 LearningRate 0.000113 Epoch: 27 Global Step: 578940 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:58:19,262-Speed 2513.47 samples/sec Loss 1.4828 LearningRate 0.000113 Epoch: 27 Global Step: 578950 Fp16 Grad Scale: 8192 Required: 57 hours Training: 
2022-07-11 02:58:27,490-Speed 2489.65 samples/sec Loss 1.4443 LearningRate 0.000113 Epoch: 27 Global Step: 578960 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:58:35,688-Speed 2498.38 samples/sec Loss 1.4712 LearningRate 0.000113 Epoch: 27 Global Step: 578970 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:58:43,889-Speed 2497.63 samples/sec Loss 1.4615 LearningRate 0.000113 Epoch: 27 Global Step: 578980 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:58:52,092-Speed 2497.21 samples/sec Loss 1.5255 LearningRate 0.000113 Epoch: 27 Global Step: 578990 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:59:00,290-Speed 2498.34 samples/sec Loss 1.5086 LearningRate 0.000113 Epoch: 27 Global Step: 579000 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:59:08,435-Speed 2514.86 samples/sec Loss 1.4806 LearningRate 0.000113 Epoch: 27 Global Step: 579010 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:59:16,636-Speed 2497.53 samples/sec Loss 1.4606 LearningRate 0.000113 Epoch: 27 Global Step: 579020 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:59:24,837-Speed 2497.76 samples/sec Loss 1.4710 LearningRate 0.000113 Epoch: 27 Global Step: 579030 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:59:33,037-Speed 2498.06 samples/sec Loss 1.4763 LearningRate 0.000113 Epoch: 27 Global Step: 579040 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:59:41,245-Speed 2495.30 samples/sec Loss 1.4890 LearningRate 0.000113 Epoch: 27 Global Step: 579050 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:59:49,443-Speed 2498.86 samples/sec Loss 1.4640 LearningRate 0.000113 Epoch: 27 Global Step: 579060 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 02:59:57,591-Speed 2513.91 samples/sec Loss 1.4806 LearningRate 0.000113 Epoch: 27 Global Step: 579070 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 
03:00:05,788-Speed 2499.04 samples/sec Loss 1.5207 LearningRate 0.000113 Epoch: 27 Global Step: 579080 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:00:13,990-Speed 2497.12 samples/sec Loss 1.4790 LearningRate 0.000113 Epoch: 27 Global Step: 579090 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:00:22,193-Speed 2497.37 samples/sec Loss 1.4795 LearningRate 0.000113 Epoch: 27 Global Step: 579100 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:00:30,393-Speed 2498.22 samples/sec Loss 1.4886 LearningRate 0.000113 Epoch: 27 Global Step: 579110 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:00:38,591-Speed 2498.85 samples/sec Loss 1.4572 LearningRate 0.000113 Epoch: 27 Global Step: 579120 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:00:46,743-Speed 2512.32 samples/sec Loss 1.4731 LearningRate 0.000113 Epoch: 27 Global Step: 579130 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:00:54,943-Speed 2498.30 samples/sec Loss 1.5236 LearningRate 0.000113 Epoch: 27 Global Step: 579140 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:01:03,142-Speed 2498.33 samples/sec Loss 1.4805 LearningRate 0.000112 Epoch: 27 Global Step: 579150 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:01:11,340-Speed 2498.48 samples/sec Loss 1.5022 LearningRate 0.000112 Epoch: 27 Global Step: 579160 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:01:19,547-Speed 2495.67 samples/sec Loss 1.5336 LearningRate 0.000112 Epoch: 27 Global Step: 579170 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:01:27,754-Speed 2495.91 samples/sec Loss 1.4728 LearningRate 0.000112 Epoch: 27 Global Step: 579180 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:01:35,903-Speed 2513.80 samples/sec Loss 1.5104 LearningRate 0.000112 Epoch: 27 Global Step: 579190 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:01:44,101-Speed 
2498.44 samples/sec Loss 1.5122 LearningRate 0.000112 Epoch: 27 Global Step: 579200 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:01:52,301-Speed 2497.96 samples/sec Loss 1.5314 LearningRate 0.000112 Epoch: 27 Global Step: 579210 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:02:00,503-Speed 2497.32 samples/sec Loss 1.4658 LearningRate 0.000112 Epoch: 27 Global Step: 579220 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:02:08,705-Speed 2497.30 samples/sec Loss 1.5026 LearningRate 0.000112 Epoch: 27 Global Step: 579230 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:02:16,910-Speed 2496.35 samples/sec Loss 1.4820 LearningRate 0.000112 Epoch: 27 Global Step: 579240 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:02:25,069-Speed 2510.79 samples/sec Loss 1.5195 LearningRate 0.000112 Epoch: 27 Global Step: 579250 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:02:33,271-Speed 2497.39 samples/sec Loss 1.5240 LearningRate 0.000112 Epoch: 27 Global Step: 579260 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:02:41,489-Speed 2492.34 samples/sec Loss 1.4759 LearningRate 0.000112 Epoch: 27 Global Step: 579270 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:02:49,701-Speed 2494.56 samples/sec Loss 1.4894 LearningRate 0.000112 Epoch: 27 Global Step: 579280 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:02:57,903-Speed 2497.35 samples/sec Loss 1.4907 LearningRate 0.000112 Epoch: 27 Global Step: 579290 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:03:06,109-Speed 2495.96 samples/sec Loss 1.5134 LearningRate 0.000112 Epoch: 27 Global Step: 579300 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:03:14,264-Speed 2511.73 samples/sec Loss 1.5119 LearningRate 0.000112 Epoch: 27 Global Step: 579310 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:03:22,464-Speed 2498.09 samples/sec 
Loss 1.4829 LearningRate 0.000112 Epoch: 27 Global Step: 579320 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:03:30,667-Speed 2496.89 samples/sec Loss 1.4912 LearningRate 0.000112 Epoch: 27 Global Step: 579330 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:03:38,867-Speed 2497.99 samples/sec Loss 1.5022 LearningRate 0.000112 Epoch: 27 Global Step: 579340 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:03:47,071-Speed 2497.06 samples/sec Loss 1.4762 LearningRate 0.000112 Epoch: 27 Global Step: 579350 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:03:55,269-Speed 2498.44 samples/sec Loss 1.4725 LearningRate 0.000112 Epoch: 27 Global Step: 579360 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:04:03,417-Speed 2514.01 samples/sec Loss 1.4856 LearningRate 0.000112 Epoch: 27 Global Step: 579370 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:04:11,620-Speed 2497.02 samples/sec Loss 1.4860 LearningRate 0.000112 Epoch: 27 Global Step: 579380 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:04:19,816-Speed 2499.32 samples/sec Loss 1.5049 LearningRate 0.000112 Epoch: 27 Global Step: 579390 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:04:28,030-Speed 2493.29 samples/sec Loss 1.4468 LearningRate 0.000112 Epoch: 27 Global Step: 579400 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:04:36,232-Speed 2497.48 samples/sec Loss 1.4661 LearningRate 0.000112 Epoch: 27 Global Step: 579410 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:04:44,435-Speed 2497.03 samples/sec Loss 1.5083 LearningRate 0.000112 Epoch: 27 Global Step: 579420 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:04:52,585-Speed 2513.21 samples/sec Loss 1.4608 LearningRate 0.000112 Epoch: 27 Global Step: 579430 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:05:00,786-Speed 2497.97 samples/sec Loss 1.4995 
LearningRate 0.000112 Epoch: 27 Global Step: 579440 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:05:08,985-Speed 2498.30 samples/sec Loss 1.4858 LearningRate 0.000112 Epoch: 27 Global Step: 579450 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:05:17,206-Speed 2491.45 samples/sec Loss 1.4907 LearningRate 0.000112 Epoch: 27 Global Step: 579460 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:05:25,417-Speed 2494.73 samples/sec Loss 1.4549 LearningRate 0.000112 Epoch: 27 Global Step: 579470 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:05:33,617-Speed 2498.11 samples/sec Loss 1.4376 LearningRate 0.000112 Epoch: 27 Global Step: 579480 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:05:41,773-Speed 2511.15 samples/sec Loss 1.4895 LearningRate 0.000112 Epoch: 27 Global Step: 579490 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:05:49,976-Speed 2497.14 samples/sec Loss 1.4337 LearningRate 0.000112 Epoch: 27 Global Step: 579500 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:05:58,183-Speed 2495.83 samples/sec Loss 1.4916 LearningRate 0.000112 Epoch: 27 Global Step: 579510 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:06:06,378-Speed 2499.59 samples/sec Loss 1.4923 LearningRate 0.000112 Epoch: 27 Global Step: 579520 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:06:14,578-Speed 2497.98 samples/sec Loss 1.4639 LearningRate 0.000112 Epoch: 27 Global Step: 579530 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:06:22,780-Speed 2497.62 samples/sec Loss 1.4334 LearningRate 0.000112 Epoch: 27 Global Step: 579540 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:06:30,926-Speed 2514.43 samples/sec Loss 1.4708 LearningRate 0.000112 Epoch: 27 Global Step: 579550 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:06:39,129-Speed 2497.21 samples/sec Loss 1.4814 LearningRate 
0.000112 Epoch: 27 Global Step: 579560 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:06:47,334-Speed 2496.38 samples/sec Loss 1.4622 LearningRate 0.000112 Epoch: 27 Global Step: 579570 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:06:55,540-Speed 2496.34 samples/sec Loss 1.4673 LearningRate 0.000112 Epoch: 27 Global Step: 579580 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:07:03,751-Speed 2494.78 samples/sec Loss 1.4323 LearningRate 0.000112 Epoch: 27 Global Step: 579590 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:07:11,952-Speed 2497.56 samples/sec Loss 1.4834 LearningRate 0.000112 Epoch: 27 Global Step: 579600 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:07:20,102-Speed 2513.19 samples/sec Loss 1.4585 LearningRate 0.000112 Epoch: 27 Global Step: 579610 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:07:28,306-Speed 2497.18 samples/sec Loss 1.4626 LearningRate 0.000112 Epoch: 27 Global Step: 579620 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:07:36,507-Speed 2497.70 samples/sec Loss 1.4471 LearningRate 0.000112 Epoch: 27 Global Step: 579630 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:07:44,710-Speed 2496.97 samples/sec Loss 1.4760 LearningRate 0.000112 Epoch: 27 Global Step: 579640 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:07:52,918-Speed 2495.61 samples/sec Loss 1.4744 LearningRate 0.000112 Epoch: 27 Global Step: 579650 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:08:01,120-Speed 2497.36 samples/sec Loss 1.4935 LearningRate 0.000112 Epoch: 27 Global Step: 579660 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:08:09,275-Speed 2511.73 samples/sec Loss 1.4988 LearningRate 0.000112 Epoch: 27 Global Step: 579670 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:08:17,478-Speed 2496.88 samples/sec Loss 1.4759 LearningRate 0.000112 Epoch: 27 
Global Step: 579680 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:08:25,679-Speed 2497.89 samples/sec Loss 1.4969 LearningRate 0.000112 Epoch: 27 Global Step: 579690 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:08:33,889-Speed 2495.28 samples/sec Loss 1.4818 LearningRate 0.000112 Epoch: 27 Global Step: 579700 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:08:42,090-Speed 2497.60 samples/sec Loss 1.4795 LearningRate 0.000112 Epoch: 27 Global Step: 579710 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:08:50,291-Speed 2497.53 samples/sec Loss 1.5202 LearningRate 0.000112 Epoch: 27 Global Step: 579720 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:08:58,448-Speed 2511.15 samples/sec Loss 1.4973 LearningRate 0.000112 Epoch: 27 Global Step: 579730 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:09:06,656-Speed 2495.53 samples/sec Loss 1.4794 LearningRate 0.000112 Epoch: 27 Global Step: 579740 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:09:14,858-Speed 2497.30 samples/sec Loss 1.4633 LearningRate 0.000112 Epoch: 27 Global Step: 579750 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:09:23,057-Speed 2498.52 samples/sec Loss 1.4470 LearningRate 0.000112 Epoch: 27 Global Step: 579760 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:09:31,252-Speed 2499.22 samples/sec Loss 1.4841 LearningRate 0.000112 Epoch: 27 Global Step: 579770 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:09:39,453-Speed 2497.78 samples/sec Loss 1.4550 LearningRate 0.000112 Epoch: 27 Global Step: 579780 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:09:47,595-Speed 2515.53 samples/sec Loss 1.4384 LearningRate 0.000112 Epoch: 27 Global Step: 579790 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:09:55,794-Speed 2498.24 samples/sec Loss 1.4847 LearningRate 0.000112 Epoch: 27 Global Step: 579800 
Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:10:03,990-Speed 2499.29 samples/sec Loss 1.4463 LearningRate 0.000112 Epoch: 27 Global Step: 579810 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:10:12,188-Speed 2498.61 samples/sec Loss 1.4732 LearningRate 0.000112 Epoch: 27 Global Step: 579820 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:10:20,388-Speed 2497.79 samples/sec Loss 1.5011 LearningRate 0.000112 Epoch: 27 Global Step: 579830 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:10:28,588-Speed 2498.47 samples/sec Loss 1.4682 LearningRate 0.000112 Epoch: 27 Global Step: 579840 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:10:36,731-Speed 2515.66 samples/sec Loss 1.4460 LearningRate 0.000112 Epoch: 27 Global Step: 579850 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:10:44,933-Speed 2497.11 samples/sec Loss 1.5025 LearningRate 0.000112 Epoch: 27 Global Step: 579860 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:10:53,133-Speed 2498.18 samples/sec Loss 1.5116 LearningRate 0.000112 Epoch: 27 Global Step: 579870 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:11:01,335-Speed 2497.27 samples/sec Loss 1.5076 LearningRate 0.000112 Epoch: 27 Global Step: 579880 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:11:09,538-Speed 2497.04 samples/sec Loss 1.4730 LearningRate 0.000112 Epoch: 27 Global Step: 579890 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:11:17,749-Speed 2494.53 samples/sec Loss 1.4810 LearningRate 0.000112 Epoch: 27 Global Step: 579900 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:11:25,895-Speed 2514.65 samples/sec Loss 1.4822 LearningRate 0.000112 Epoch: 27 Global Step: 579910 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:11:34,107-Speed 2494.24 samples/sec Loss 1.4957 LearningRate 0.000112 Epoch: 27 Global Step: 579920 Fp16 Grad Scale: 
8192 Required: 57 hours Training: 2022-07-11 03:11:42,307-Speed 2498.15 samples/sec Loss 1.4974 LearningRate 0.000112 Epoch: 27 Global Step: 579930 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:11:50,522-Speed 2493.37 samples/sec Loss 1.4485 LearningRate 0.000112 Epoch: 27 Global Step: 579940 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:11:58,721-Speed 2498.35 samples/sec Loss 1.4850 LearningRate 0.000112 Epoch: 27 Global Step: 579950 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:12:06,929-Speed 2495.72 samples/sec Loss 1.5029 LearningRate 0.000112 Epoch: 27 Global Step: 579960 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:12:15,075-Speed 2514.45 samples/sec Loss 1.4635 LearningRate 0.000112 Epoch: 27 Global Step: 579970 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:12:23,290-Speed 2493.40 samples/sec Loss 1.4868 LearningRate 0.000112 Epoch: 27 Global Step: 579980 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:12:31,489-Speed 2498.44 samples/sec Loss 1.4751 LearningRate 0.000112 Epoch: 27 Global Step: 579990 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:12:39,689-Speed 2497.88 samples/sec Loss 1.4884 LearningRate 0.000112 Epoch: 27 Global Step: 580000 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:12:47,891-Speed 2497.25 samples/sec Loss 1.5033 LearningRate 0.000112 Epoch: 27 Global Step: 580010 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:12:56,095-Speed 2496.70 samples/sec Loss 1.4928 LearningRate 0.000112 Epoch: 27 Global Step: 580020 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:13:04,250-Speed 2511.81 samples/sec Loss 1.4849 LearningRate 0.000112 Epoch: 27 Global Step: 580030 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:13:12,447-Speed 2498.85 samples/sec Loss 1.4742 LearningRate 0.000112 Epoch: 27 Global Step: 580040 Fp16 Grad Scale: 8192 Required: 57 
hours Training: 2022-07-11 03:13:20,650-Speed 2497.02 samples/sec Loss 1.4910 LearningRate 0.000112 Epoch: 27 Global Step: 580050 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:13:28,849-Speed 2498.34 samples/sec Loss 1.4537 LearningRate 0.000112 Epoch: 27 Global Step: 580060 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:13:37,051-Speed 2497.70 samples/sec Loss 1.5227 LearningRate 0.000112 Epoch: 27 Global Step: 580070 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:13:45,250-Speed 2498.28 samples/sec Loss 1.4755 LearningRate 0.000112 Epoch: 27 Global Step: 580080 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:13:53,399-Speed 2513.33 samples/sec Loss 1.4775 LearningRate 0.000112 Epoch: 27 Global Step: 580090 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:14:01,598-Speed 2498.28 samples/sec Loss 1.4933 LearningRate 0.000112 Epoch: 27 Global Step: 580100 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:14:09,801-Speed 2497.06 samples/sec Loss 1.4867 LearningRate 0.000112 Epoch: 27 Global Step: 580110 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:14:18,005-Speed 2497.04 samples/sec Loss 1.4652 LearningRate 0.000112 Epoch: 27 Global Step: 580120 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:14:26,208-Speed 2497.17 samples/sec Loss 1.5109 LearningRate 0.000112 Epoch: 27 Global Step: 580130 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:14:34,418-Speed 2494.56 samples/sec Loss 1.5089 LearningRate 0.000112 Epoch: 27 Global Step: 580140 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:14:42,567-Speed 2513.52 samples/sec Loss 1.4704 LearningRate 0.000112 Epoch: 27 Global Step: 580150 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:14:50,767-Speed 2497.99 samples/sec Loss 1.4804 LearningRate 0.000112 Epoch: 27 Global Step: 580160 Fp16 Grad Scale: 16384 Required: 57 hours 
Training: 2022-07-11 03:14:58,970-Speed 2497.18 samples/sec Loss 1.4828 LearningRate 0.000112 Epoch: 27 Global Step: 580170 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:15:07,176-Speed 2496.28 samples/sec Loss 1.4964 LearningRate 0.000112 Epoch: 27 Global Step: 580180 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:15:15,380-Speed 2497.03 samples/sec Loss 1.4635 LearningRate 0.000112 Epoch: 27 Global Step: 580190 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:15:23,580-Speed 2497.80 samples/sec Loss 1.4655 LearningRate 0.000112 Epoch: 27 Global Step: 580200 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:15:31,733-Speed 2512.30 samples/sec Loss 1.5041 LearningRate 0.000112 Epoch: 27 Global Step: 580210 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:15:39,938-Speed 2496.54 samples/sec Loss 1.4976 LearningRate 0.000112 Epoch: 27 Global Step: 580220 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:15:48,139-Speed 2497.95 samples/sec Loss 1.4825 LearningRate 0.000112 Epoch: 27 Global Step: 580230 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:15:56,342-Speed 2497.08 samples/sec Loss 1.4744 LearningRate 0.000112 Epoch: 27 Global Step: 580240 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:16:04,548-Speed 2496.28 samples/sec Loss 1.4756 LearningRate 0.000112 Epoch: 27 Global Step: 580250 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:16:12,751-Speed 2496.97 samples/sec Loss 1.4971 LearningRate 0.000111 Epoch: 27 Global Step: 580260 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:16:20,906-Speed 2511.87 samples/sec Loss 1.4766 LearningRate 0.000111 Epoch: 27 Global Step: 580270 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:16:29,111-Speed 2496.58 samples/sec Loss 1.5042 LearningRate 0.000111 Epoch: 27 Global Step: 580280 Fp16 Grad Scale: 16384 Required: 57 hours 
Training: 2022-07-11 03:16:37,315-Speed 2496.53 samples/sec Loss 1.4900 LearningRate 0.000111 Epoch: 27 Global Step: 580290 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:16:45,532-Speed 2495.32 samples/sec Loss 1.5051 LearningRate 0.000111 Epoch: 27 Global Step: 580300 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:16:54,063-Speed 2499.27 samples/sec Loss 1.4858 LearningRate 0.000111 Epoch: 27 Global Step: 580310 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:17:02,265-Speed 2497.29 samples/sec Loss 1.4938 LearningRate 0.000111 Epoch: 27 Global Step: 580320 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:17:10,466-Speed 2511.18 samples/sec Loss 1.4836 LearningRate 0.000111 Epoch: 27 Global Step: 580330 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:17:20,183-Speed 2230.01 samples/sec Loss 1.4800 LearningRate 0.000111 Epoch: 27 Global Step: 580340 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:17:28,380-Speed 2498.92 samples/sec Loss 1.4690 LearningRate 0.000111 Epoch: 27 Global Step: 580350 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-07-11 03:17:36,586-Speed 2513.22 samples/sec Loss 1.4802 LearningRate 0.000111 Epoch: 27 Global Step: 580360 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:17:44,791-Speed 2496.68 samples/sec Loss 1.4899 LearningRate 0.000111 Epoch: 27 Global Step: 580370 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:17:55,546-Speed 1912.47 samples/sec Loss 1.4859 LearningRate 0.000111 Epoch: 27 Global Step: 580380 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:18:03,753-Speed 2513.58 samples/sec Loss 1.4950 LearningRate 0.000111 Epoch: 27 Global Step: 580390 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:18:11,965-Speed 2494.27 samples/sec Loss 1.4731 LearningRate 0.000111 Epoch: 27 Global Step: 580400 Fp16 Grad Scale: 8192 Required: 57 hours Training: 
2022-07-11 03:18:23,984-Speed 1705.83 samples/sec Loss 1.4957 LearningRate 0.000111 Epoch: 27 Global Step: 580410 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:18:32,202-Speed 2500.53 samples/sec Loss 1.5057 LearningRate 0.000111 Epoch: 27 Global Step: 580420 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:18:40,467-Speed 2501.74 samples/sec Loss 1.5008 LearningRate 0.000111 Epoch: 27 Global Step: 580430 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:18:49,826-Speed 2188.51 samples/sec Loss 1.4888 LearningRate 0.000111 Epoch: 27 Global Step: 580440 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:18:58,013-Speed 2518.44 samples/sec Loss 1.4844 LearningRate 0.000111 Epoch: 27 Global Step: 580450 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:19:09,865-Speed 1733.20 samples/sec Loss 1.4700 LearningRate 0.000111 Epoch: 27 Global Step: 580460 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:19:18,063-Speed 2498.46 samples/sec Loss 1.4481 LearningRate 0.000111 Epoch: 27 Global Step: 580470 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:19:26,288-Speed 2500.15 samples/sec Loss 1.4864 LearningRate 0.000111 Epoch: 27 Global Step: 580480 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:19:40,672-Speed 2394.07 samples/sec Loss 1.4803 LearningRate 0.000111 Epoch: 27 Global Step: 580490 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:19:49,010-Speed 2502.26 samples/sec Loss 1.5193 LearningRate 0.000111 Epoch: 27 Global Step: 580500 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:19:57,391-Speed 2470.08 samples/sec Loss 1.4893 LearningRate 0.000111 Epoch: 27 Global Step: 580510 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:20:05,618-Speed 2499.61 samples/sec Loss 1.4743 LearningRate 0.000111 Epoch: 27 Global Step: 580520 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 
03:20:14,964-Speed 2191.53 samples/sec Loss 1.4337 LearningRate 0.000111 Epoch: 27 Global Step: 580530 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:20:23,163-Speed 2498.31 samples/sec Loss 1.4650 LearningRate 0.000111 Epoch: 27 Global Step: 580540 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:20:31,362-Speed 2498.19 samples/sec Loss 1.4983 LearningRate 0.000111 Epoch: 27 Global Step: 580550 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:20:39,558-Speed 2498.95 samples/sec Loss 1.4741 LearningRate 0.000111 Epoch: 27 Global Step: 580560 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:20:47,706-Speed 2514.18 samples/sec Loss 1.5048 LearningRate 0.000111 Epoch: 27 Global Step: 580570 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:20:55,914-Speed 2495.41 samples/sec Loss 1.4989 LearningRate 0.000111 Epoch: 27 Global Step: 580580 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:21:04,115-Speed 2497.67 samples/sec Loss 1.4972 LearningRate 0.000111 Epoch: 27 Global Step: 580590 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:21:12,319-Speed 2496.92 samples/sec Loss 1.5014 LearningRate 0.000111 Epoch: 27 Global Step: 580600 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:21:20,524-Speed 2496.20 samples/sec Loss 1.4796 LearningRate 0.000111 Epoch: 27 Global Step: 580610 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:21:28,732-Speed 2495.35 samples/sec Loss 1.4560 LearningRate 0.000111 Epoch: 27 Global Step: 580620 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:21:36,884-Speed 2512.80 samples/sec Loss 1.5117 LearningRate 0.000111 Epoch: 27 Global Step: 580630 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:21:45,082-Speed 2498.58 samples/sec Loss 1.4509 LearningRate 0.000111 Epoch: 27 Global Step: 580640 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:21:53,284-Speed 
2497.39 samples/sec Loss 1.4656 LearningRate 0.000111 Epoch: 27 Global Step: 580650 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:22:01,489-Speed 2496.39 samples/sec Loss 1.4918 LearningRate 0.000111 Epoch: 27 Global Step: 580660 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:22:09,701-Speed 2494.55 samples/sec Loss 1.4810 LearningRate 0.000111 Epoch: 27 Global Step: 580670 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:22:17,904-Speed 2497.30 samples/sec Loss 1.4808 LearningRate 0.000111 Epoch: 27 Global Step: 580680 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:22:26,054-Speed 2513.02 samples/sec Loss 1.5240 LearningRate 0.000111 Epoch: 27 Global Step: 580690 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:22:34,256-Speed 2497.28 samples/sec Loss 1.4598 LearningRate 0.000111 Epoch: 27 Global Step: 580700 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:22:45,041-Speed 1899.19 samples/sec Loss 1.4766 LearningRate 0.000111 Epoch: 28 Global Step: 580710 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:22:53,252-Speed 2494.94 samples/sec Loss 1.4986 LearningRate 0.000111 Epoch: 28 Global Step: 580720 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:23:01,455-Speed 2497.17 samples/sec Loss 1.4774 LearningRate 0.000111 Epoch: 28 Global Step: 580730 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:23:09,655-Speed 2497.81 samples/sec Loss 1.4941 LearningRate 0.000111 Epoch: 28 Global Step: 580740 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:23:17,798-Speed 2515.39 samples/sec Loss 1.4982 LearningRate 0.000111 Epoch: 28 Global Step: 580750 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:23:25,997-Speed 2498.44 samples/sec Loss 1.4716 LearningRate 0.000111 Epoch: 28 Global Step: 580760 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:23:34,197-Speed 2497.85 samples/sec 
Loss 1.4619 LearningRate 0.000111 Epoch: 28 Global Step: 580770 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:23:42,397-Speed 2498.02 samples/sec Loss 1.4634 LearningRate 0.000111 Epoch: 28 Global Step: 580780 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:23:50,598-Speed 2497.74 samples/sec Loss 1.4919 LearningRate 0.000111 Epoch: 28 Global Step: 580790 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:23:58,810-Speed 2494.00 samples/sec Loss 1.4877 LearningRate 0.000111 Epoch: 28 Global Step: 580800 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:24:06,959-Speed 2513.71 samples/sec Loss 1.4927 LearningRate 0.000111 Epoch: 28 Global Step: 580810 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:24:15,163-Speed 2496.76 samples/sec Loss 1.4662 LearningRate 0.000111 Epoch: 28 Global Step: 580820 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:24:23,364-Speed 2497.89 samples/sec Loss 1.4481 LearningRate 0.000111 Epoch: 28 Global Step: 580830 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:24:31,566-Speed 2497.53 samples/sec Loss 1.4583 LearningRate 0.000111 Epoch: 28 Global Step: 580840 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:24:39,764-Speed 2498.52 samples/sec Loss 1.4985 LearningRate 0.000111 Epoch: 28 Global Step: 580850 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:24:47,977-Speed 2494.04 samples/sec Loss 1.4898 LearningRate 0.000111 Epoch: 28 Global Step: 580860 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:24:56,138-Speed 2509.90 samples/sec Loss 1.4470 LearningRate 0.000111 Epoch: 28 Global Step: 580870 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:25:04,339-Speed 2497.83 samples/sec Loss 1.4470 LearningRate 0.000111 Epoch: 28 Global Step: 580880 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:25:12,539-Speed 2497.86 samples/sec Loss 1.4595 
LearningRate 0.000111 Epoch: 28 Global Step: 580890 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:25:20,740-Speed 2497.77 samples/sec Loss 1.4893 LearningRate 0.000111 Epoch: 28 Global Step: 580900 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:25:28,944-Speed 2496.75 samples/sec Loss 1.4936 LearningRate 0.000111 Epoch: 28 Global Step: 580910 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:25:37,147-Speed 2497.15 samples/sec Loss 1.4731 LearningRate 0.000111 Epoch: 28 Global Step: 580920 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:25:45,296-Speed 2513.49 samples/sec Loss 1.4914 LearningRate 0.000111 Epoch: 28 Global Step: 580930 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:25:53,496-Speed 2497.70 samples/sec Loss 1.4460 LearningRate 0.000111 Epoch: 28 Global Step: 580940 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:26:01,701-Speed 2496.47 samples/sec Loss 1.4572 LearningRate 0.000111 Epoch: 28 Global Step: 580950 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:26:09,902-Speed 2498.08 samples/sec Loss 1.4859 LearningRate 0.000111 Epoch: 28 Global Step: 580960 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:26:18,103-Speed 2497.54 samples/sec Loss 1.4773 LearningRate 0.000111 Epoch: 28 Global Step: 580970 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:26:26,304-Speed 2497.72 samples/sec Loss 1.4659 LearningRate 0.000111 Epoch: 28 Global Step: 580980 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:26:34,453-Speed 2513.64 samples/sec Loss 1.4594 LearningRate 0.000111 Epoch: 28 Global Step: 580990 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:26:42,652-Speed 2498.16 samples/sec Loss 1.4935 LearningRate 0.000111 Epoch: 28 Global Step: 581000 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:26:50,852-Speed 2497.93 samples/sec Loss 1.4124 LearningRate 
0.000111 Epoch: 28 Global Step: 581010 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:26:59,053-Speed 2497.57 samples/sec Loss 1.4602 LearningRate 0.000111 Epoch: 28 Global Step: 581020 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:27:07,255-Speed 2497.43 samples/sec Loss 1.4840 LearningRate 0.000111 Epoch: 28 Global Step: 581030 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:27:15,454-Speed 2498.21 samples/sec Loss 1.4277 LearningRate 0.000111 Epoch: 28 Global Step: 581040 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:27:23,603-Speed 2513.48 samples/sec Loss 1.4367 LearningRate 0.000111 Epoch: 28 Global Step: 581050 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:27:31,805-Speed 2497.51 samples/sec Loss 1.4413 LearningRate 0.000111 Epoch: 28 Global Step: 581060 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:27:40,009-Speed 2496.69 samples/sec Loss 1.4832 LearningRate 0.000111 Epoch: 28 Global Step: 581070 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:27:48,213-Speed 2496.59 samples/sec Loss 1.4430 LearningRate 0.000111 Epoch: 28 Global Step: 581080 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:27:56,412-Speed 2498.31 samples/sec Loss 1.4471 LearningRate 0.000111 Epoch: 28 Global Step: 581090 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:28:04,615-Speed 2497.56 samples/sec Loss 1.4389 LearningRate 0.000111 Epoch: 28 Global Step: 581100 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:28:12,762-Speed 2514.02 samples/sec Loss 1.4326 LearningRate 0.000111 Epoch: 28 Global Step: 581110 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:28:20,966-Speed 2496.78 samples/sec Loss 1.4982 LearningRate 0.000111 Epoch: 28 Global Step: 581120 Fp16 Grad Scale: 8192 Required: 57 hours Training: 2022-07-11 03:28:29,174-Speed 2495.60 samples/sec Loss 1.4357 LearningRate 0.000111 Epoch: 28 
Global Step: 581130 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:28:37,381-Speed 2495.86 samples/sec Loss 1.4352 LearningRate 0.000111 Epoch: 28 Global Step: 581140 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:28:45,585-Speed 2496.62 samples/sec Loss 1.4234 LearningRate 0.000111 Epoch: 28 Global Step: 581150 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:28:53,788-Speed 2497.04 samples/sec Loss 1.4358 LearningRate 0.000111 Epoch: 28 Global Step: 581160 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:29:01,933-Speed 2514.81 samples/sec Loss 1.4688 LearningRate 0.000111 Epoch: 28 Global Step: 581170 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:29:10,131-Speed 2498.58 samples/sec Loss 1.4315 LearningRate 0.000111 Epoch: 28 Global Step: 581180 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:29:18,339-Speed 2495.84 samples/sec Loss 1.4741 LearningRate 0.000111 Epoch: 28 Global Step: 581190 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:29:26,536-Speed 2498.68 samples/sec Loss 1.4523 LearningRate 0.000111 Epoch: 28 Global Step: 581200 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:29:34,738-Speed 2497.49 samples/sec Loss 1.4481 LearningRate 0.000111 Epoch: 28 Global Step: 581210 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:29:42,940-Speed 2497.31 samples/sec Loss 1.4495 LearningRate 0.000111 Epoch: 28 Global Step: 581220 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:29:51,089-Speed 2513.67 samples/sec Loss 1.4824 LearningRate 0.000111 Epoch: 28 Global Step: 581230 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:29:59,296-Speed 2495.96 samples/sec Loss 1.4508 LearningRate 0.000111 Epoch: 28 Global Step: 581240 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:30:07,496-Speed 2498.01 samples/sec Loss 1.4323 LearningRate 0.000111 Epoch: 28 Global Step: 581250 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:30:15,700-Speed 2497.07 samples/sec Loss 1.4585 LearningRate 0.000111 Epoch: 28 Global Step: 581260 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:30:23,900-Speed 2497.67 samples/sec Loss 1.4699 LearningRate 0.000111 Epoch: 28 Global Step: 581270 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:30:32,108-Speed 2495.75 samples/sec Loss 1.4890 LearningRate 0.000111 Epoch: 28 Global Step: 581280 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:30:40,259-Speed 2512.94 samples/sec Loss 1.4639 LearningRate 0.000111 Epoch: 28 Global Step: 581290 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:30:48,463-Speed 2497.04 samples/sec Loss 1.5182 LearningRate 0.000111 Epoch: 28 Global Step: 581300 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:30:56,665-Speed 2497.06 samples/sec Loss 1.4474 LearningRate 0.000111 Epoch: 28 Global Step: 581310 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:31:04,869-Speed 2496.95 samples/sec Loss 1.4725 LearningRate 0.000111 Epoch: 28 Global Step: 581320 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:31:13,067-Speed 2498.74 samples/sec Loss 1.4994 LearningRate 0.000111 Epoch: 28 Global Step: 581330 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:31:21,288-Speed 2491.54 samples/sec Loss 1.4228 LearningRate 0.000111 Epoch: 28 Global Step: 581340 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:31:29,436-Speed 2514.09 samples/sec Loss 1.4835 LearningRate 0.000111 Epoch: 28 Global Step: 581350 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:31:37,634-Speed 2498.43 samples/sec Loss 1.4947 LearningRate 0.000111 Epoch: 28 Global Step: 581360 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:31:45,840-Speed 2496.19 samples/sec Loss 1.4752 LearningRate 0.000111 Epoch: 28 Global Step: 581370 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:31:54,044-Speed 2496.72 samples/sec Loss 1.4769 LearningRate 0.000110 Epoch: 28 Global Step: 581380 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:32:02,254-Speed 2494.88 samples/sec Loss 1.4350 LearningRate 0.000110 Epoch: 28 Global Step: 581390 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:32:10,464-Speed 2494.68 samples/sec Loss 1.4723 LearningRate 0.000110 Epoch: 28 Global Step: 581400 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:32:18,612-Speed 2513.96 samples/sec Loss 1.4698 LearningRate 0.000110 Epoch: 28 Global Step: 581410 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:32:26,811-Speed 2498.21 samples/sec Loss 1.4989 LearningRate 0.000110 Epoch: 28 Global Step: 581420 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:32:35,025-Speed 2493.87 samples/sec Loss 1.4718 LearningRate 0.000110 Epoch: 28 Global Step: 581430 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:32:43,227-Speed 2497.38 samples/sec Loss 1.4559 LearningRate 0.000110 Epoch: 28 Global Step: 581440 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:32:51,431-Speed 2496.74 samples/sec Loss 1.4816 LearningRate 0.000110 Epoch: 28 Global Step: 581450 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:32:59,644-Speed 2493.81 samples/sec Loss 1.4872 LearningRate 0.000110 Epoch: 28 Global Step: 581460 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:33:07,791-Speed 2514.26 samples/sec Loss 1.4732 LearningRate 0.000110 Epoch: 28 Global Step: 581470 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:33:15,999-Speed 2495.67 samples/sec Loss 1.4914 LearningRate 0.000110 Epoch: 28 Global Step: 581480 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:33:24,202-Speed 2496.70 samples/sec Loss 1.5035 LearningRate 0.000110 Epoch: 28 Global Step: 581490 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:33:32,408-Speed 2496.35 samples/sec Loss 1.5130 LearningRate 0.000110 Epoch: 28 Global Step: 581500 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:33:40,628-Speed 2491.88 samples/sec Loss 1.4922 LearningRate 0.000110 Epoch: 28 Global Step: 581510 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:33:48,831-Speed 2497.36 samples/sec Loss 1.4670 LearningRate 0.000110 Epoch: 28 Global Step: 581520 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:33:56,977-Speed 2514.46 samples/sec Loss 1.4940 LearningRate 0.000110 Epoch: 28 Global Step: 581530 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:34:05,179-Speed 2497.26 samples/sec Loss 1.4854 LearningRate 0.000110 Epoch: 28 Global Step: 581540 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:34:13,379-Speed 2498.00 samples/sec Loss 1.4759 LearningRate 0.000110 Epoch: 28 Global Step: 581550 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-07-11 03:34:21,580-Speed 2497.57 samples/sec Loss 1.4807 LearningRate 0.000110 Epoch: 28 Global Step: 581560 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:34:29,786-Speed 2496.13 samples/sec Loss 1.4999 LearningRate 0.000110 Epoch: 28 Global Step: 581570 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:34:37,990-Speed 2496.74 samples/sec Loss 1.4739 LearningRate 0.000110 Epoch: 28 Global Step: 581580 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:34:46,142-Speed 2512.87 samples/sec Loss 1.5286 LearningRate 0.000110 Epoch: 28 Global Step: 581590 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:34:54,346-Speed 2496.67 samples/sec Loss 1.4813 LearningRate 0.000110 Epoch: 28 Global Step: 581600 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:35:02,548-Speed 2497.38 samples/sec Loss 1.4844 LearningRate 0.000110 Epoch: 28 Global Step: 581610 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:35:10,760-Speed 2494.62 samples/sec Loss 1.5096 LearningRate 0.000110 Epoch: 28 Global Step: 581620 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:35:18,962-Speed 2497.23 samples/sec Loss 1.4865 LearningRate 0.000110 Epoch: 28 Global Step: 581630 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:35:27,163-Speed 2497.80 samples/sec Loss 1.4674 LearningRate 0.000110 Epoch: 28 Global Step: 581640 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:35:35,320-Speed 2510.94 samples/sec Loss 1.4542 LearningRate 0.000110 Epoch: 28 Global Step: 581650 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:35:43,524-Speed 2496.98 samples/sec Loss 1.5311 LearningRate 0.000110 Epoch: 28 Global Step: 581660 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:35:51,727-Speed 2496.77 samples/sec Loss 1.4835 LearningRate 0.000110 Epoch: 28 Global Step: 581670 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:35:59,932-Speed 2496.60 samples/sec Loss 1.4714 LearningRate 0.000110 Epoch: 28 Global Step: 581680 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:36:08,133-Speed 2497.57 samples/sec Loss 1.4716 LearningRate 0.000110 Epoch: 28 Global Step: 581690 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:36:16,351-Speed 2492.71 samples/sec Loss 1.4552 LearningRate 0.000110 Epoch: 28 Global Step: 581700 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:36:24,502-Speed 2512.85 samples/sec Loss 1.4483 LearningRate 0.000110 Epoch: 28 Global Step: 581710 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:36:32,711-Speed 2495.28 samples/sec Loss 1.4350 LearningRate 0.000110 Epoch: 28 Global Step: 581720 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:36:40,911-Speed 2498.16 samples/sec Loss 1.4304 LearningRate 0.000110 Epoch: 28 Global Step: 581730 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:36:49,122-Speed 2494.39 samples/sec Loss 1.4764 LearningRate 0.000110 Epoch: 28 Global Step: 581740 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:36:57,329-Speed 2496.03 samples/sec Loss 1.5133 LearningRate 0.000110 Epoch: 28 Global Step: 581750 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:37:05,537-Speed 2495.68 samples/sec Loss 1.4695 LearningRate 0.000110 Epoch: 28 Global Step: 581760 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:37:13,688-Speed 2512.96 samples/sec Loss 1.4611 LearningRate 0.000110 Epoch: 28 Global Step: 581770 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:37:21,889-Speed 2497.95 samples/sec Loss 1.4722 LearningRate 0.000110 Epoch: 28 Global Step: 581780 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:37:30,089-Speed 2497.72 samples/sec Loss 1.4623 LearningRate 0.000110 Epoch: 28 Global Step: 581790 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:37:38,295-Speed 2496.05 samples/sec Loss 1.4926 LearningRate 0.000110 Epoch: 28 Global Step: 581800 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:37:46,496-Speed 2498.12 samples/sec Loss 1.4774 LearningRate 0.000110 Epoch: 28 Global Step: 581810 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:37:54,697-Speed 2497.66 samples/sec Loss 1.4475 LearningRate 0.000110 Epoch: 28 Global Step: 581820 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:38:02,846-Speed 2513.52 samples/sec Loss 1.4929 LearningRate 0.000110 Epoch: 28 Global Step: 581830 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:38:11,052-Speed 2495.97 samples/sec Loss 1.4678 LearningRate 0.000110 Epoch: 28 Global Step: 581840 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:38:19,252-Speed 2498.02 samples/sec Loss 1.4671 LearningRate 0.000110 Epoch: 28 Global Step: 581850 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:38:27,452-Speed 2497.90 samples/sec Loss 1.4364 LearningRate 0.000110 Epoch: 28 Global Step: 581860 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:38:35,660-Speed 2495.49 samples/sec Loss 1.4388 LearningRate 0.000110 Epoch: 28 Global Step: 581870 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:38:43,870-Speed 2494.93 samples/sec Loss 1.5254 LearningRate 0.000110 Epoch: 28 Global Step: 581880 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:38:52,016-Speed 2514.59 samples/sec Loss 1.4697 LearningRate 0.000110 Epoch: 28 Global Step: 581890 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:39:00,223-Speed 2495.96 samples/sec Loss 1.4876 LearningRate 0.000110 Epoch: 28 Global Step: 581900 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:39:08,429-Speed 2495.90 samples/sec Loss 1.4338 LearningRate 0.000110 Epoch: 28 Global Step: 581910 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:39:16,645-Speed 2493.02 samples/sec Loss 1.4906 LearningRate 0.000110 Epoch: 28 Global Step: 581920 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:39:24,848-Speed 2496.91 samples/sec Loss 1.4301 LearningRate 0.000110 Epoch: 28 Global Step: 581930 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:39:33,048-Speed 2497.89 samples/sec Loss 1.4424 LearningRate 0.000110 Epoch: 28 Global Step: 581940 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:39:41,199-Speed 2513.07 samples/sec Loss 1.4482 LearningRate 0.000110 Epoch: 28 Global Step: 581950 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:39:49,401-Speed 2497.38 samples/sec Loss 1.4551 LearningRate 0.000110 Epoch: 28 Global Step: 581960 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:39:57,608-Speed 2495.72 samples/sec Loss 1.4931 LearningRate 0.000110 Epoch: 28 Global Step: 581970 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:40:05,810-Speed 2497.53 samples/sec Loss 1.4628 LearningRate 0.000110 Epoch: 28 Global Step: 581980 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:40:14,011-Speed 2497.64 samples/sec Loss 1.4364 LearningRate 0.000110 Epoch: 28 Global Step: 581990 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:40:22,215-Speed 2496.50 samples/sec Loss 1.4581 LearningRate 0.000110 Epoch: 28 Global Step: 582000 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:40:30,366-Speed 2513.27 samples/sec Loss 1.4697 LearningRate 0.000110 Epoch: 28 Global Step: 582010 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:40:38,587-Speed 2491.80 samples/sec Loss 1.4804 LearningRate 0.000110 Epoch: 28 Global Step: 582020 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:40:46,795-Speed 2495.39 samples/sec Loss 1.4574 LearningRate 0.000110 Epoch: 28 Global Step: 582030 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:40:55,001-Speed 2495.97 samples/sec Loss 1.4096 LearningRate 0.000110 Epoch: 28 Global Step: 582040 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:41:03,203-Speed 2497.52 samples/sec Loss 1.4567 LearningRate 0.000110 Epoch: 28 Global Step: 582050 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:41:11,408-Speed 2496.43 samples/sec Loss 1.4824 LearningRate 0.000110 Epoch: 28 Global Step: 582060 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:41:19,556-Speed 2513.75 samples/sec Loss 1.4468 LearningRate 0.000110 Epoch: 28 Global Step: 582070 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:41:27,755-Speed 2498.22 samples/sec Loss 1.4276 LearningRate 0.000110 Epoch: 28 Global Step: 582080 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:41:35,971-Speed 2493.26 samples/sec Loss 1.4577 LearningRate 0.000110 Epoch: 28 Global Step: 582090 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:41:44,171-Speed 2497.96 samples/sec Loss 1.4513 LearningRate 0.000110 Epoch: 28 Global Step: 582100 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:41:52,375-Speed 2496.82 samples/sec Loss 1.4396 LearningRate 0.000110 Epoch: 28 Global Step: 582110 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:42:00,579-Speed 2496.76 samples/sec Loss 1.4125 LearningRate 0.000110 Epoch: 28 Global Step: 582120 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:42:08,730-Speed 2512.82 samples/sec Loss 1.4449 LearningRate 0.000110 Epoch: 28 Global Step: 582130 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:42:16,935-Speed 2496.54 samples/sec Loss 1.4948 LearningRate 0.000110 Epoch: 28 Global Step: 582140 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:42:25,141-Speed 2495.92 samples/sec Loss 1.4915 LearningRate 0.000110 Epoch: 28 Global Step: 582150 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:42:33,348-Speed 2495.78 samples/sec Loss 1.4742 LearningRate 0.000110 Epoch: 28 Global Step: 582160 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:42:41,560-Speed 2494.47 samples/sec Loss 1.4958 LearningRate 0.000110 Epoch: 28 Global Step: 582170 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:42:49,768-Speed 2495.49 samples/sec Loss 1.4743 LearningRate 0.000110 Epoch: 28 Global Step: 582180 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:42:57,919-Speed 2513.12 samples/sec Loss 1.4449 LearningRate 0.000110 Epoch: 28 Global Step: 582190 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:43:06,121-Speed 2497.10 samples/sec Loss 1.4478 LearningRate 0.000110 Epoch: 28 Global Step: 582200 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:43:14,325-Speed 2496.90 samples/sec Loss 1.4222 LearningRate 0.000110 Epoch: 28 Global Step: 582210 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:43:22,526-Speed 2497.93 samples/sec Loss 1.4787 LearningRate 0.000110 Epoch: 28 Global Step: 582220 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:43:30,729-Speed 2497.03 samples/sec Loss 1.4418 LearningRate 0.000110 Epoch: 28 Global Step: 582230 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:43:38,935-Speed 2496.01 samples/sec Loss 1.4627 LearningRate 0.000110 Epoch: 28 Global Step: 582240 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:43:47,084-Speed 2513.71 samples/sec Loss 1.4493 LearningRate 0.000110 Epoch: 28 Global Step: 582250 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:43:55,284-Speed 2497.97 samples/sec Loss 1.4451 LearningRate 0.000110 Epoch: 28 Global Step: 582260 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:44:03,484-Speed 2497.76 samples/sec Loss 1.4651 LearningRate 0.000110 Epoch: 28 Global Step: 582270 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:44:11,686-Speed 2497.38 samples/sec Loss 1.4633 LearningRate 0.000110 Epoch: 28 Global Step: 582280 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:44:19,909-Speed 2490.92 samples/sec Loss 1.4311 LearningRate 0.000110 Epoch: 28 Global Step: 582290 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:44:28,115-Speed 2496.15 samples/sec Loss 1.4653 LearningRate 0.000110 Epoch: 28 Global Step: 582300 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:44:36,262-Speed 2514.15 samples/sec Loss 1.4724 LearningRate 0.000110 Epoch: 28 Global Step: 582310 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:44:44,476-Speed 2493.80 samples/sec Loss 1.4949 LearningRate 0.000110 Epoch: 28 Global Step: 582320 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:44:52,679-Speed 2497.07 samples/sec Loss 1.4806 LearningRate 0.000110 Epoch: 28 Global Step: 582330 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:45:00,881-Speed 2497.49 samples/sec Loss 1.5100 LearningRate 0.000110 Epoch: 28 Global Step: 582340 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:45:09,082-Speed 2497.55 samples/sec Loss 1.5222 LearningRate 0.000110 Epoch: 28 Global Step: 582350 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:45:17,285-Speed 2497.20 samples/sec Loss 1.4651 LearningRate 0.000110 Epoch: 28 Global Step: 582360 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:45:25,433-Speed 2513.73 samples/sec Loss 1.4581 LearningRate 0.000110 Epoch: 28 Global Step: 582370 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:45:33,637-Speed 2496.94 samples/sec Loss 1.5121 LearningRate 0.000110 Epoch: 28 Global Step: 582380 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:45:41,840-Speed 2496.86 samples/sec Loss 1.4673 LearningRate 0.000110 Epoch: 28 Global Step: 582390 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:45:50,044-Speed 2496.81 samples/sec Loss 1.4419 LearningRate 0.000110 Epoch: 28 Global Step: 582400 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:45:58,250-Speed 2496.32 samples/sec Loss 1.4783 LearningRate 0.000110 Epoch: 28 Global Step: 582410 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:46:06,464-Speed 2493.60 samples/sec Loss 1.4496 LearningRate 0.000110 Epoch: 28 Global Step: 582420 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:46:14,610-Speed 2514.34 samples/sec Loss 1.4951 LearningRate 0.000110 Epoch: 28 Global Step: 582430 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:46:22,822-Speed 2494.30 samples/sec Loss 1.4446 LearningRate 0.000110 Epoch: 28 Global Step: 582440 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:46:31,026-Speed 2497.25 samples/sec Loss 1.4796 LearningRate 0.000110 Epoch: 28 Global Step: 582450 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:46:39,240-Speed 2493.55 samples/sec Loss 1.4910 LearningRate 0.000110 Epoch: 28 Global Step: 582460 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:46:47,443-Speed 2496.98 samples/sec Loss 1.4604 LearningRate 0.000110 Epoch: 28 Global Step: 582470 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:46:55,646-Speed 2497.26 samples/sec Loss 1.4489 LearningRate 0.000110 Epoch: 28 Global Step: 582480 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:47:03,802-Speed 2511.49 samples/sec Loss 1.4799 LearningRate 0.000110 Epoch: 28 Global Step: 582490 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:47:12,002-Speed 2497.87 samples/sec Loss 1.4352 LearningRate 0.000110 Epoch: 28 Global Step: 582500 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:47:20,203-Speed 2497.68 samples/sec Loss 1.4705 LearningRate 0.000109 Epoch: 28 Global Step: 582510 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:47:28,407-Speed 2496.83 samples/sec Loss 1.4597 LearningRate 0.000109 Epoch: 28 Global Step: 582520 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:47:36,608-Speed 2497.44 samples/sec Loss 1.4394 LearningRate 0.000109 Epoch: 28 Global Step: 582530 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:47:44,815-Speed 2495.84 samples/sec Loss 1.4592 LearningRate 0.000109 Epoch: 28 Global Step: 582540 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:47:52,964-Speed 2513.56 samples/sec Loss 1.4439 LearningRate 0.000109 Epoch: 28 Global Step: 582550 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:48:01,168-Speed 2496.82 samples/sec Loss 1.5126 LearningRate 0.000109 Epoch: 28 Global Step: 582560 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:48:09,371-Speed 2497.15 samples/sec Loss 1.4714 LearningRate 0.000109 Epoch: 28 Global Step: 582570 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:48:17,569-Speed 2498.54 samples/sec Loss 1.4839 LearningRate 0.000109 Epoch: 28 Global Step: 582580 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:48:25,767-Speed 2498.52 samples/sec Loss 1.4908 LearningRate 0.000109 Epoch: 28 Global Step: 582590 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:48:33,968-Speed 2497.72 samples/sec Loss 1.4851 LearningRate 0.000109 Epoch: 28 Global Step: 582600 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:48:42,111-Speed 2516.20 samples/sec Loss 1.4715 LearningRate 0.000109 Epoch: 28 Global Step: 582610 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:48:50,317-Speed 2496.04 samples/sec Loss 1.5148 LearningRate 0.000109 Epoch: 28 Global Step: 582620 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:48:58,524-Speed 2495.79 samples/sec Loss 1.4822 LearningRate 0.000109 Epoch: 28 Global Step: 582630 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:49:06,724-Speed 2497.92 samples/sec Loss 1.4652 LearningRate 0.000109 Epoch: 28 Global Step: 582640 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:49:14,926-Speed 2497.38 samples/sec Loss 1.4666 LearningRate 0.000109 Epoch: 28 Global Step: 582650 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:49:23,129-Speed 2497.21 samples/sec Loss 1.4689 LearningRate 0.000109 Epoch: 28 Global Step: 582660 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:49:31,289-Speed 2510.33 samples/sec Loss 1.4300 LearningRate 0.000109 Epoch: 28 Global Step: 582670 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:49:39,497-Speed 2495.54 samples/sec Loss 1.4794 LearningRate 0.000109 Epoch: 28 Global Step: 582680 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:49:47,697-Speed 2497.94 samples/sec Loss 1.4353 LearningRate 0.000109 Epoch: 28 Global Step: 582690 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:49:55,903-Speed 2496.21 samples/sec Loss 1.4626 LearningRate 0.000109 Epoch: 28 Global Step: 582700 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:50:04,108-Speed 2496.93 samples/sec Loss 1.4545 LearningRate 0.000109 Epoch: 28 Global Step: 582710 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:50:12,309-Speed 2497.56 samples/sec Loss 1.4552 LearningRate 0.000109 Epoch: 28 Global Step: 582720 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:50:20,458-Speed 2513.71 samples/sec Loss 1.4698 LearningRate 0.000109 Epoch: 28 Global Step: 582730 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:50:28,664-Speed 2496.36 samples/sec Loss 1.4866 LearningRate 0.000109 Epoch: 28 Global Step: 582740 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-07-11 03:50:36,867-Speed 2496.82 samples/sec Loss 1.4740 LearningRate 0.000109 Epoch: 28 Global Step: 582750 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-07-11 03:50:45,073-Speed 2496.25 samples/sec Loss 1.4922 LearningRate 0.000109 Epoch: 28 Global Step: 582760 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:50:53,272-Speed 2498.54 samples/sec Loss 1.4647 LearningRate 0.000109 Epoch: 28 Global Step: 582770 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:51:01,475-Speed 2497.03 samples/sec Loss 1.4270 LearningRate 0.000109 Epoch: 28 Global Step: 582780 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:51:09,622-Speed 2514.12 samples/sec Loss 1.5064 LearningRate 0.000109 Epoch: 28 Global Step: 582790 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:51:17,824-Speed 2497.44 samples/sec Loss 1.4255 LearningRate 0.000109 Epoch: 28 Global Step: 582800 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:51:26,032-Speed 2495.81 samples/sec Loss 1.4869 LearningRate 0.000109 Epoch: 28 Global Step: 582810 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:51:34,231-Speed 2498.52 samples/sec Loss 1.4616 LearningRate 0.000109 Epoch: 28 Global Step: 582820 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:51:42,433-Speed 2497.10 samples/sec Loss 1.5171 LearningRate 0.000109 Epoch: 28 Global Step: 582830 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:51:50,641-Speed 2495.66 samples/sec Loss 1.5059 LearningRate 0.000109 Epoch: 28 Global Step: 582840 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:51:58,788-Speed 2514.33 samples/sec Loss 1.4937 LearningRate 0.000109 Epoch: 28 Global Step: 582850 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:52:06,989-Speed 2497.55 samples/sec Loss 1.4734 LearningRate 0.000109 Epoch: 28 Global Step: 582860 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:52:15,192-Speed 2497.21 samples/sec Loss 1.5006 LearningRate 0.000109 Epoch: 28 Global Step: 582870 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:52:23,394-Speed 2497.59 samples/sec Loss 1.4745 LearningRate 0.000109 Epoch: 28 Global Step: 582880 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:52:31,596-Speed 2497.15 samples/sec Loss 1.4763 LearningRate 0.000109 Epoch: 28 Global Step: 582890 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:52:39,800-Speed 2496.78 samples/sec Loss 1.5019 LearningRate 0.000109 Epoch: 28 Global Step: 582900 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:52:47,964-Speed 2509.22 samples/sec Loss 1.4675 LearningRate 0.000109 Epoch: 28 Global Step: 582910 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:52:56,181-Speed 2492.76 samples/sec Loss 1.5438 LearningRate 0.000109 Epoch: 28 Global Step: 582920 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:53:04,389-Speed 2495.52 samples/sec Loss 1.4992 LearningRate 0.000109 Epoch: 28 Global Step: 582930 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:53:12,591-Speed 2497.61 samples/sec Loss 1.4687 LearningRate 0.000109 Epoch: 28 Global Step: 582940 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:53:20,799-Speed 2495.49 samples/sec Loss 1.4885 LearningRate 0.000109 Epoch: 28 Global Step: 582950 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:53:29,016-Speed 2492.57 samples/sec Loss 1.4981 LearningRate 0.000109 Epoch: 28 Global Step: 582960 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:53:37,176-Speed 2510.39 samples/sec Loss 1.4542 LearningRate 0.000109 Epoch: 28 Global Step: 582970 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:53:45,386-Speed 2494.92 samples/sec Loss 1.4763 LearningRate 0.000109 Epoch: 28 Global Step: 582980 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:53:53,587-Speed 2497.38 samples/sec Loss 1.4940 LearningRate 0.000109 Epoch: 28 Global Step: 582990 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:54:01,797-Speed 2495.04 samples/sec Loss 1.4595 LearningRate 0.000109 Epoch: 28 Global Step: 583000 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:54:10,004-Speed 2495.90 samples/sec Loss 1.4415 LearningRate 0.000109 Epoch: 28 Global Step: 583010 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:54:18,204-Speed 2497.91 samples/sec Loss 1.4336 LearningRate 0.000109 Epoch: 28 Global Step: 583020 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:54:26,357-Speed 2512.06 samples/sec Loss 1.4825 LearningRate 0.000109 Epoch: 28 Global Step: 583030 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:54:34,568-Speed 2494.70 samples/sec Loss 1.4650 LearningRate 0.000109 Epoch: 28 Global Step: 583040 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:54:42,773-Speed 2496.50 samples/sec Loss 1.4683 LearningRate 0.000109 Epoch: 28 Global Step: 583050 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:54:50,988-Speed 2493.20 samples/sec Loss 1.4984 LearningRate 0.000109 Epoch: 28 Global Step: 583060 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:54:59,189-Speed 2497.69 samples/sec Loss 1.4395 LearningRate 0.000109 Epoch: 28 Global Step: 583070 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:55:07,394-Speed 2496.54 samples/sec Loss 1.4573 LearningRate 0.000109 Epoch: 28 Global Step: 583080 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:55:15,545-Speed 2512.86 samples/sec Loss 1.4671 LearningRate 0.000109 Epoch: 28 Global Step: 583090 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:55:23,749-Speed 2496.61 samples/sec Loss 1.4657 LearningRate 0.000109 Epoch: 28 Global Step: 583100 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:55:31,954-Speed 2496.36 samples/sec Loss 1.4768 LearningRate 0.000109 Epoch: 28 Global Step: 583110 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:55:40,169-Speed 2493.78 samples/sec Loss 1.5161 LearningRate 0.000109 Epoch: 28 Global Step: 583120 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:55:48,377-Speed 2495.89 samples/sec Loss 1.4779 LearningRate 0.000109 Epoch: 28 Global Step: 583130 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:55:56,587-Speed 2495.65 samples/sec Loss 1.4933 LearningRate 0.000109 Epoch: 28 Global Step: 583140 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:56:04,743-Speed 2511.46 samples/sec Loss 1.4822 LearningRate 0.000109 Epoch: 28 Global Step: 583150 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:56:12,946-Speed 2496.79 samples/sec Loss 1.4439 LearningRate 0.000109 Epoch: 28 Global Step: 583160 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:56:21,148-Speed 2497.35 samples/sec Loss 1.4522 LearningRate 0.000109 Epoch: 28 Global Step: 583170 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:56:29,352-Speed 2497.32 samples/sec Loss 1.4687 LearningRate 0.000109 Epoch: 28 Global Step: 583180 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:56:37,557-Speed 2496.41 samples/sec Loss 1.4668 LearningRate 0.000109 Epoch: 28 Global Step: 583190 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:56:45,764-Speed 2496.02 samples/sec Loss 1.4748 LearningRate 0.000109 Epoch: 28 Global Step: 583200 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:56:53,913-Speed 2513.57 samples/sec Loss 1.4758 LearningRate 0.000109 Epoch: 28 Global Step: 583210 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:57:02,120-Speed 2495.72 samples/sec Loss 1.4692 LearningRate 0.000109 Epoch: 28 Global Step: 583220 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:57:10,336-Speed 2493.00 samples/sec Loss 1.4436 LearningRate 0.000109 Epoch: 28 Global Step: 583230 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:57:18,541-Speed 2496.67 samples/sec Loss 1.4666 LearningRate 0.000109 Epoch: 28 Global Step: 583240 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:57:26,747-Speed 2496.36 samples/sec Loss 1.4730 LearningRate 0.000109 Epoch: 28 Global Step: 583250 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:57:34,950-Speed 2497.02 samples/sec Loss 1.4516 LearningRate 0.000109 Epoch: 28 Global Step: 583260 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:57:43,103-Speed 2512.20 samples/sec Loss 1.4701 LearningRate 0.000109 Epoch: 28 Global Step: 583270 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:57:51,310-Speed 2496.15 samples/sec Loss 1.4577 LearningRate 0.000109 Epoch: 28 Global Step: 583280 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:57:59,521-Speed 2494.52 samples/sec Loss 1.4702 LearningRate 0.000109 Epoch: 28 Global Step: 583290 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:58:07,731-Speed 2494.93 samples/sec Loss 1.5103 LearningRate 0.000109 Epoch: 28 Global Step: 583300 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:58:15,938-Speed 2496.14 samples/sec Loss 1.4628 LearningRate 0.000109 Epoch: 28 Global Step: 583310 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:58:24,143-Speed 2496.19 samples/sec Loss 1.4357 LearningRate 0.000109 Epoch: 28 Global Step: 583320 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:58:32,306-Speed 2509.47 samples/sec Loss 1.4416 LearningRate 0.000109 Epoch: 28 Global Step: 583330 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:58:40,513-Speed 2495.73 samples/sec Loss 1.4732 LearningRate 0.000109 Epoch: 28 Global Step: 583340 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:58:48,721-Speed 2495.52 samples/sec Loss 1.4645 LearningRate 0.000109 Epoch: 28 Global Step: 583350 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:58:56,931-Speed 2494.97 samples/sec Loss 1.4632 LearningRate 0.000109 Epoch: 28 Global Step: 583360 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:59:05,139-Speed 2495.55 samples/sec Loss 1.4763 LearningRate 0.000109 Epoch: 28 Global Step: 583370 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:59:13,347-Speed 2495.54 samples/sec Loss 1.4738 LearningRate 0.000109 Epoch: 28 Global Step: 583380 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:59:21,500-Speed 2512.43 samples/sec Loss 1.4537 LearningRate 0.000109 Epoch: 28 Global Step: 583390 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:59:29,710-Speed 2494.97 samples/sec Loss 1.4743 LearningRate 0.000109 Epoch: 28 Global Step: 583400 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:59:37,916-Speed 2495.93 samples/sec Loss 1.4685 LearningRate 0.000109 Epoch: 28 Global Step: 583410 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:59:46,118-Speed 2497.72 samples/sec Loss 1.4570 LearningRate 0.000109 Epoch: 28 Global Step: 583420 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 03:59:54,331-Speed 2493.95 samples/sec Loss 1.4524 LearningRate 0.000109 Epoch: 28 Global Step: 583430 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 04:00:02,533-Speed 2497.54 samples/sec Loss 1.4590 LearningRate 0.000109 Epoch: 28 Global Step: 583440 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 04:00:10,678-Speed 2514.78 samples/sec Loss 1.4900 LearningRate 0.000109 Epoch: 28 Global Step: 583450 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 04:00:18,884-Speed 2496.17 samples/sec Loss 1.4602 LearningRate 0.000109 Epoch: 28 Global Step: 583460 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 04:00:27,085-Speed 2497.59 samples/sec Loss 1.4708 LearningRate 0.000109 Epoch: 28 Global Step: 583470 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 04:00:35,288-Speed 2497.26 samples/sec Loss 1.4808 LearningRate 0.000109 Epoch: 28 Global Step: 583480 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 04:00:43,494-Speed 2495.77 samples/sec Loss 1.4711 LearningRate 0.000109 Epoch: 28 Global Step: 583490 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 04:00:51,710-Speed 2493.06 samples/sec Loss 1.4428 LearningRate 0.000109 Epoch: 28 Global Step: 583500 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 04:00:59,862-Speed 2512.76 samples/sec Loss 1.4527 LearningRate 0.000109 Epoch: 28 Global Step: 583510 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 04:01:08,064-Speed 2497.37 samples/sec Loss 1.5037 LearningRate 0.000109 Epoch: 28 Global Step: 583520 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 04:01:16,268-Speed 2496.64 samples/sec Loss 1.4647 LearningRate 0.000109 Epoch: 28 Global Step: 583530 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-07-11 04:01:24,475-Speed 2495.86 samples/sec Loss 1.4597 LearningRate 0.000109 Epoch: 28 Global Step: 583540 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:01:32,684-Speed 2495.19 samples/sec Loss 1.4334 LearningRate 0.000109 Epoch: 28 Global Step: 583550 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:01:40,892-Speed 2495.80 samples/sec Loss 1.4409 LearningRate 0.000109 Epoch: 28 Global Step: 583560 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:01:49,041-Speed 2513.42 samples/sec Loss 1.4795 LearningRate 0.000109 Epoch: 28 Global Step: 583570 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:01:57,245-Speed 2497.02 samples/sec Loss 1.4625 LearningRate 0.000109 Epoch: 28 Global Step: 583580 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:02:05,447-Speed 2497.31 samples/sec Loss 1.5202 LearningRate 0.000109 Epoch: 28 Global Step: 583590 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:02:13,646-Speed 2498.19 samples/sec Loss 1.4877 LearningRate 0.000109 Epoch: 28 Global Step: 583600 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:02:21,863-Speed 2492.67 samples/sec Loss 1.4455 LearningRate 0.000109 Epoch: 28 Global Step: 583610 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:02:30,060-Speed 2498.63 samples/sec Loss 1.4460 LearningRate 0.000109 Epoch: 28 Global Step: 583620 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:02:38,230-Speed 2507.45 samples/sec Loss 1.4679 LearningRate 0.000109 Epoch: 28 Global Step: 583630 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:02:46,432-Speed 2497.11 samples/sec Loss 1.4920 LearningRate 0.000108 Epoch: 28 Global Step: 583640 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:02:54,646-Speed 2493.59 samples/sec Loss 1.4951 LearningRate 0.000108 Epoch: 28 Global Step: 583650 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-07-11 04:03:02,854-Speed 2495.80 samples/sec Loss 1.4439 LearningRate 0.000108 Epoch: 28 Global Step: 583660 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:03:11,056-Speed 2497.36 samples/sec Loss 1.4709 LearningRate 0.000108 Epoch: 28 Global Step: 583670 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:03:19,267-Speed 2494.30 samples/sec Loss 1.4294 LearningRate 0.000108 Epoch: 28 Global Step: 583680 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:03:27,421-Speed 2512.07 samples/sec Loss 1.4689 LearningRate 0.000108 Epoch: 28 Global Step: 583690 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:03:35,626-Speed 2496.72 samples/sec Loss 1.4987 LearningRate 0.000108 Epoch: 28 Global Step: 583700 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:03:43,841-Speed 2493.23 samples/sec Loss 1.4632 LearningRate 0.000108 Epoch: 28 Global Step: 583710 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:03:52,046-Speed 2496.47 samples/sec Loss 1.4753 LearningRate 0.000108 Epoch: 28 Global Step: 583720 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:04:00,250-Speed 2496.79 samples/sec Loss 1.4738 LearningRate 0.000108 Epoch: 28 Global Step: 583730 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:04:08,454-Speed 2497.12 samples/sec Loss 1.4740 LearningRate 0.000108 Epoch: 28 Global Step: 583740 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:04:16,603-Speed 2513.45 samples/sec Loss 1.4718 LearningRate 0.000108 Epoch: 28 Global Step: 583750 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:04:24,807-Speed 2496.91 samples/sec Loss 1.4496 LearningRate 0.000108 Epoch: 28 Global Step: 583760 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:04:33,009-Speed 2497.30 samples/sec Loss 1.4795 LearningRate 0.000108 Epoch: 28 Global Step: 583770 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-07-11 04:04:41,213-Speed 2496.79 samples/sec Loss 1.4753 LearningRate 0.000108 Epoch: 28 Global Step: 583780 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:04:49,420-Speed 2495.95 samples/sec Loss 1.4969 LearningRate 0.000108 Epoch: 28 Global Step: 583790 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:04:57,622-Speed 2497.04 samples/sec Loss 1.4387 LearningRate 0.000108 Epoch: 28 Global Step: 583800 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:05:05,771-Speed 2513.79 samples/sec Loss 1.4658 LearningRate 0.000108 Epoch: 28 Global Step: 583810 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:05:13,982-Speed 2495.06 samples/sec Loss 1.4574 LearningRate 0.000108 Epoch: 28 Global Step: 583820 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:05:22,192-Speed 2494.67 samples/sec Loss 1.4877 LearningRate 0.000108 Epoch: 28 Global Step: 583830 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:05:30,391-Speed 2498.40 samples/sec Loss 1.4577 LearningRate 0.000108 Epoch: 28 Global Step: 583840 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:05:38,599-Speed 2495.54 samples/sec Loss 1.4456 LearningRate 0.000108 Epoch: 28 Global Step: 583850 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:05:46,801-Speed 2497.30 samples/sec Loss 1.4450 LearningRate 0.000108 Epoch: 28 Global Step: 583860 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:05:54,951-Speed 2513.27 samples/sec Loss 1.5019 LearningRate 0.000108 Epoch: 28 Global Step: 583870 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:06:03,152-Speed 2497.58 samples/sec Loss 1.4909 LearningRate 0.000108 Epoch: 28 Global Step: 583880 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:06:11,363-Speed 2494.67 samples/sec Loss 1.4872 LearningRate 0.000108 Epoch: 28 Global Step: 583890 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-07-11 04:06:19,568-Speed 2496.69 samples/sec Loss 1.4706 LearningRate 0.000108 Epoch: 28 Global Step: 583900 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:06:27,772-Speed 2496.79 samples/sec Loss 1.4441 LearningRate 0.000108 Epoch: 28 Global Step: 583910 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:06:35,976-Speed 2496.80 samples/sec Loss 1.4739 LearningRate 0.000108 Epoch: 28 Global Step: 583920 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:06:44,123-Speed 2514.07 samples/sec Loss 1.4706 LearningRate 0.000108 Epoch: 28 Global Step: 583930 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:06:52,324-Speed 2497.53 samples/sec Loss 1.4396 LearningRate 0.000108 Epoch: 28 Global Step: 583940 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:07:00,524-Speed 2498.15 samples/sec Loss 1.4715 LearningRate 0.000108 Epoch: 28 Global Step: 583950 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:07:08,730-Speed 2496.01 samples/sec Loss 1.4833 LearningRate 0.000108 Epoch: 28 Global Step: 583960 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-07-11 04:07:16,936-Speed 2496.31 samples/sec Loss 1.4688 LearningRate 0.000108 Epoch: 28 Global Step: 583970 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-07-11 04:07:25,139-Speed 2496.71 samples/sec Loss 1.4929 LearningRate 0.000108 Epoch: 28 Global Step: 583980 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-07-11 04:07:33,288-Speed 2513.57 samples/sec Loss 1.5109 LearningRate 0.000108 Epoch: 28 Global Step: 583990 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-07-11 04:07:41,491-Speed 2497.41 samples/sec Loss 1.4446 LearningRate 0.000108 Epoch: 28 Global Step: 584000 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-07-11 04:07:49,653-Speed 2509.75 samples/sec Loss 1.4554 LearningRate 0.000108 Epoch: 28 Global Step: 584010 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-07-11 04:07:57,863-Speed 2494.74 samples/sec Loss 1.4623 LearningRate 0.000108 Epoch: 28 Global Step: 584020 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:08:06,064-Speed 2497.73 samples/sec Loss 1.4844 LearningRate 0.000108 Epoch: 28 Global Step: 584030 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:08:14,266-Speed 2497.55 samples/sec Loss 1.4525 LearningRate 0.000108 Epoch: 28 Global Step: 584040 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:08:22,418-Speed 2512.44 samples/sec Loss 1.4968 LearningRate 0.000108 Epoch: 28 Global Step: 584050 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:08:30,620-Speed 2497.61 samples/sec Loss 1.5268 LearningRate 0.000108 Epoch: 28 Global Step: 584060 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:08:38,826-Speed 2496.19 samples/sec Loss 1.4359 LearningRate 0.000108 Epoch: 28 Global Step: 584070 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:08:47,030-Speed 2496.76 samples/sec Loss 1.4669 LearningRate 0.000108 Epoch: 28 Global Step: 584080 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:08:55,189-Speed 2510.48 samples/sec Loss 1.4404 LearningRate 0.000108 Epoch: 28 Global Step: 584090 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:09:03,391-Speed 2497.25 samples/sec Loss 1.4467 LearningRate 0.000108 Epoch: 28 Global Step: 584100 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:09:11,539-Speed 2513.93 samples/sec Loss 1.4690 LearningRate 0.000108 Epoch: 28 Global Step: 584110 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:09:19,742-Speed 2497.07 samples/sec Loss 1.4706 LearningRate 0.000108 Epoch: 28 Global Step: 584120 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:09:27,945-Speed 2496.89 samples/sec Loss 1.4721 LearningRate 0.000108 Epoch: 28 Global Step: 584130 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:09:36,148-Speed 2497.20 samples/sec Loss 1.4746 LearningRate 0.000108 Epoch: 28 Global Step: 584140 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:09:44,351-Speed 2497.10 samples/sec Loss 1.4959 LearningRate 0.000108 Epoch: 28 Global Step: 584150 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:09:52,552-Speed 2497.67 samples/sec Loss 1.4854 LearningRate 0.000108 Epoch: 28 Global Step: 584160 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:10:00,705-Speed 2512.18 samples/sec Loss 1.4543 LearningRate 0.000108 Epoch: 28 Global Step: 584170 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:10:08,907-Speed 2497.40 samples/sec Loss 1.4953 LearningRate 0.000108 Epoch: 28 Global Step: 584180 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:10:17,109-Speed 2497.69 samples/sec Loss 1.4466 LearningRate 0.000108 Epoch: 28 Global Step: 584190 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:10:25,313-Speed 2496.75 samples/sec Loss 1.4650 LearningRate 0.000108 Epoch: 28 Global Step: 584200 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:10:33,518-Speed 2496.54 samples/sec Loss 1.5068 LearningRate 0.000108 Epoch: 28 Global Step: 584210 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:10:41,732-Speed 2493.56 samples/sec Loss 1.4639 LearningRate 0.000108 Epoch: 28 Global Step: 584220 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:10:49,884-Speed 2512.72 samples/sec Loss 1.4517 LearningRate 0.000108 Epoch: 28 Global Step: 584230 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:10:58,101-Speed 2492.50 samples/sec Loss 1.4501 LearningRate 0.000108 Epoch: 28 Global Step: 584240 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:11:06,302-Speed 2497.67 samples/sec Loss 1.4637 LearningRate 0.000108 Epoch: 28 Global Step: 584250 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:11:14,504-Speed 2497.31 samples/sec Loss 1.4934 LearningRate 0.000108 Epoch: 28 Global Step: 584260 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:11:22,704-Speed 2498.38 samples/sec Loss 1.4840 LearningRate 0.000108 Epoch: 28 Global Step: 584270 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:11:30,920-Speed 2493.02 samples/sec Loss 1.4743 LearningRate 0.000108 Epoch: 28 Global Step: 584280 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:11:39,068-Speed 2513.70 samples/sec Loss 1.5035 LearningRate 0.000108 Epoch: 28 Global Step: 584290 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:11:47,270-Speed 2497.90 samples/sec Loss 1.4718 LearningRate 0.000108 Epoch: 28 Global Step: 584300 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:11:55,471-Speed 2497.77 samples/sec Loss 1.4765 LearningRate 0.000108 Epoch: 28 Global Step: 584310 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:12:03,671-Speed 2497.72 samples/sec Loss 1.4733 LearningRate 0.000108 Epoch: 28 Global Step: 584320 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:12:11,873-Speed 2497.61 samples/sec Loss 1.4870 LearningRate 0.000108 Epoch: 28 Global Step: 584330 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:12:20,076-Speed 2497.08 samples/sec Loss 1.4516 LearningRate 0.000108 Epoch: 28 Global Step: 584340 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:12:28,235-Speed 2510.52 samples/sec Loss 1.4710 LearningRate 0.000108 Epoch: 28 Global Step: 584350 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:12:36,442-Speed 2495.88 samples/sec Loss 1.4358 LearningRate 0.000108 Epoch: 28 Global Step: 584360 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:12:44,643-Speed 2497.68 samples/sec Loss 1.4895 LearningRate 0.000108 Epoch: 28 Global Step: 584370 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:12:52,845-Speed 2497.29 samples/sec Loss 1.4574 LearningRate 0.000108 Epoch: 28 Global Step: 584380 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:13:01,047-Speed 2497.44 samples/sec Loss 1.4570 LearningRate 0.000108 Epoch: 28 Global Step: 584390 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:13:09,246-Speed 2497.95 samples/sec Loss 1.4527 LearningRate 0.000108 Epoch: 28 Global Step: 584400 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:13:17,397-Speed 2513.77 samples/sec Loss 1.4211 LearningRate 0.000108 Epoch: 28 Global Step: 584410 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:13:25,599-Speed 2497.25 samples/sec Loss 1.4301 LearningRate 0.000108 Epoch: 28 Global Step: 584420 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:13:33,805-Speed 2496.34 samples/sec Loss 1.4709 LearningRate 0.000108 Epoch: 28 Global Step: 584430 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:13:42,008-Speed 2497.08 samples/sec Loss 1.4535 LearningRate 0.000108 Epoch: 28 Global Step: 584440 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:13:50,209-Speed 2497.50 samples/sec Loss 1.4738 LearningRate 0.000108 Epoch: 28 Global Step: 584450 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:13:58,408-Speed 2498.25 samples/sec Loss 1.4394 LearningRate 0.000108 Epoch: 28 Global Step: 584460 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:14:06,555-Speed 2514.29 samples/sec Loss 1.4251 LearningRate 0.000108 Epoch: 28 Global Step: 584470 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:14:14,760-Speed 2496.49 samples/sec Loss 1.4365 LearningRate 0.000108 Epoch: 28 Global Step: 584480 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:14:22,962-Speed 2497.32 samples/sec Loss 1.4673 LearningRate 0.000108 Epoch: 28 Global Step: 584490 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:14:31,164-Speed 2497.54 samples/sec Loss 1.4598 LearningRate 0.000108 Epoch: 28 Global Step: 584500 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:14:39,364-Speed 2497.84 samples/sec Loss 1.4427 LearningRate 0.000108 Epoch: 28 Global Step: 584510 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:14:47,568-Speed 2496.63 samples/sec Loss 1.4787 LearningRate 0.000108 Epoch: 28 Global Step: 584520 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:14:55,719-Speed 2512.99 samples/sec Loss 1.4705 LearningRate 0.000108 Epoch: 28 Global Step: 584530 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:15:03,936-Speed 2493.01 samples/sec Loss 1.4608 LearningRate 0.000108 Epoch: 28 Global Step: 584540 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:15:12,140-Speed 2496.62 samples/sec Loss 1.4897 LearningRate 0.000108 Epoch: 28 Global Step: 584550 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:15:20,359-Speed 2492.20 samples/sec Loss 1.4717 LearningRate 0.000108 Epoch: 28 Global Step: 584560 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:15:28,565-Speed 2496.28 samples/sec Loss 1.4401 LearningRate 0.000108 Epoch: 28 Global Step: 584570 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:15:36,768-Speed 2496.95 samples/sec Loss 1.4719 LearningRate 0.000108 Epoch: 28 Global Step: 584580 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:15:44,920-Speed 2512.67 samples/sec Loss 1.4489 LearningRate 0.000108 Epoch: 28 Global Step: 584590 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:15:53,124-Speed 2496.82 samples/sec Loss 1.4724 LearningRate 0.000108 Epoch: 28 Global Step: 584600 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:16:01,327-Speed 2496.98 samples/sec Loss 1.4496 LearningRate 0.000108 Epoch: 28 Global Step: 584610 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:16:09,529-Speed 2497.08 samples/sec Loss 1.4562 LearningRate 0.000108 Epoch: 28 Global Step: 584620 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:16:17,742-Speed 2494.55 samples/sec Loss 1.4718 LearningRate 0.000108 Epoch: 28 Global Step: 584630 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:16:25,949-Speed 2495.88 samples/sec Loss 1.4713 LearningRate 0.000108 Epoch: 28 Global Step: 584640 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:16:34,099-Speed 2513.21 samples/sec Loss 1.4495 LearningRate 0.000108 Epoch: 28 Global Step: 584650 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:16:42,313-Speed 2493.76 samples/sec Loss 1.4324 LearningRate 0.000108 Epoch: 28 Global Step: 584660 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:16:50,532-Speed 2492.15 samples/sec Loss 1.4742 LearningRate 0.000108 Epoch: 28 Global Step: 584670 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:16:58,734-Speed 2497.30 samples/sec Loss 1.4526 LearningRate 0.000108 Epoch: 28 Global Step: 584680 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:17:06,937-Speed 2496.98 samples/sec Loss 1.4745 LearningRate 0.000108 Epoch: 28 Global Step: 584690 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:17:15,138-Speed 2497.56 samples/sec Loss 1.4290 LearningRate 0.000108 Epoch: 28 Global Step: 584700 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:17:23,291-Speed 2512.35 samples/sec Loss 1.4639 LearningRate 0.000108 Epoch: 28 Global Step: 584710 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:17:31,490-Speed 2498.33 samples/sec Loss 1.5064 LearningRate 0.000108 Epoch: 28 Global Step: 584720 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:17:39,691-Speed 2497.76 samples/sec Loss 1.4593 LearningRate 0.000108 Epoch: 28 Global Step: 584730 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:17:47,898-Speed 2495.67 samples/sec Loss 1.4750 LearningRate 0.000108 Epoch: 28 Global Step: 584740 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:17:56,098-Speed 2498.02 samples/sec Loss 1.4375 LearningRate 0.000108 Epoch: 28 Global Step: 584750 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:18:04,297-Speed 2498.38 samples/sec Loss 1.4554 LearningRate 0.000108 Epoch: 28 Global Step: 584760 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:18:12,443-Speed 2514.28 samples/sec Loss 1.4796 LearningRate 0.000108 Epoch: 28 Global Step: 584770 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:18:20,651-Speed 2495.51 samples/sec Loss 1.4791 LearningRate 0.000107 Epoch: 28 Global Step: 584780 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:18:28,864-Speed 2493.89 samples/sec Loss 1.4623 LearningRate 0.000107 Epoch: 28 Global Step: 584790 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:18:37,067-Speed 2497.01 samples/sec Loss 1.4045 LearningRate 0.000107 Epoch: 28 Global Step: 584800 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:18:45,270-Speed 2497.07 samples/sec Loss 1.4283 LearningRate 0.000107 Epoch: 28 Global Step: 584810 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:18:53,475-Speed 2496.30 samples/sec Loss 1.4877 LearningRate 0.000107 Epoch: 28 Global Step: 584820 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:19:01,622-Speed 2514.36 samples/sec Loss 1.4663 LearningRate 0.000107 Epoch: 28 Global Step: 584830 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:19:09,825-Speed 2496.85 samples/sec Loss 1.4905 LearningRate 0.000107 Epoch: 28 Global Step: 584840 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:19:18,048-Speed 2491.00 samples/sec Loss 1.4402 LearningRate 0.000107 Epoch: 28 Global Step: 584850 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:19:26,247-Speed 2498.11 samples/sec Loss 1.4659 LearningRate 0.000107 Epoch: 28 Global Step: 584860 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:19:34,451-Speed 2496.90 samples/sec Loss 1.4636 LearningRate 0.000107 Epoch: 28 Global Step: 584870 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:19:42,651-Speed 2497.81 samples/sec Loss 1.4740 LearningRate 0.000107 Epoch: 28 Global Step: 584880 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:19:50,801-Speed 2513.38 samples/sec Loss 1.4586 LearningRate 0.000107 Epoch: 28 Global Step: 584890 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:19:59,001-Speed 2497.90 samples/sec Loss 1.4482 LearningRate 0.000107 Epoch: 28 Global Step: 584900 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:20:07,201-Speed 2497.97 samples/sec Loss 1.4536 LearningRate 0.000107 Epoch: 28 Global Step: 584910 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:20:15,407-Speed 2496.03 samples/sec Loss 1.4719 LearningRate 0.000107 Epoch: 28 Global Step: 584920 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:20:23,620-Speed 2493.82 samples/sec Loss 1.4597 LearningRate 0.000107 Epoch: 28 Global Step: 584930 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:20:31,821-Speed 2497.72 samples/sec Loss 1.4416 LearningRate 0.000107 Epoch: 28 Global Step: 584940 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:20:39,971-Speed 2513.23 samples/sec Loss 1.4631 LearningRate 0.000107 Epoch: 28 Global Step: 584950 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:20:48,178-Speed 2495.99 samples/sec Loss 1.4122 LearningRate 0.000107 Epoch: 28 Global Step: 584960 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:20:56,386-Speed 2495.47 samples/sec Loss 1.4553 LearningRate 0.000107 Epoch: 28 Global Step: 584970 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:21:04,592-Speed 2496.72 samples/sec Loss 1.4732 LearningRate 0.000107 Epoch: 28 Global Step: 584980 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:21:12,799-Speed 2495.73 samples/sec Loss 1.4909 LearningRate 0.000107 Epoch: 28 Global Step: 584990 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:21:20,999-Speed 2497.96 samples/sec Loss 1.4505 LearningRate 0.000107 Epoch: 28 Global Step: 585000 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:21:29,152-Speed 2512.98 samples/sec Loss 1.4632 LearningRate 0.000107 Epoch: 28 Global Step: 585010 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:21:37,354-Speed 2497.35 samples/sec Loss 1.4714 LearningRate 0.000107 Epoch: 28 Global Step: 585020 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:21:45,558-Speed 2496.76 samples/sec Loss 1.4493 LearningRate 0.000107 Epoch: 28 Global Step: 585030 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:21:53,761-Speed 2496.72 samples/sec Loss 1.4577 LearningRate 0.000107 Epoch: 28 Global Step: 585040 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:22:01,965-Speed 2496.94 samples/sec Loss 1.4617 LearningRate 0.000107 Epoch: 28 Global Step: 585050 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:22:10,168-Speed 2496.93 samples/sec Loss 1.4658 LearningRate 0.000107 Epoch: 28 Global Step: 585060 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:22:18,316-Speed 2513.99 samples/sec Loss 1.4902 LearningRate 0.000107 Epoch: 28 Global Step: 585070 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:22:26,518-Speed 2497.16 samples/sec Loss 1.4696 LearningRate 0.000107 Epoch: 28 Global Step: 585080 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:22:34,721-Speed 2497.17 samples/sec Loss 1.4262 LearningRate 0.000107 Epoch: 28 Global Step: 585090 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:22:42,937-Speed 2493.27 samples/sec Loss 1.4692 LearningRate 0.000107 Epoch: 28 Global Step: 585100 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:22:51,140-Speed 2497.08 samples/sec Loss 1.4600 LearningRate 0.000107 Epoch: 28 Global Step: 585110 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:22:59,339-Speed 2498.05 samples/sec Loss 1.4265 LearningRate 0.000107 Epoch: 28 Global Step: 585120 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:23:07,488-Speed 2513.68 samples/sec Loss 1.4989 LearningRate 0.000107 Epoch: 28 Global Step: 585130 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:23:15,689-Speed 2497.70 samples/sec Loss 1.4634 LearningRate 0.000107 Epoch: 28 Global Step: 585140 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:23:23,889-Speed 2498.11 samples/sec Loss 1.5183 LearningRate 0.000107 Epoch: 28 Global Step: 585150 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:23:32,090-Speed 2497.44 samples/sec Loss 1.4388 LearningRate 0.000107 Epoch: 28 Global Step: 585160 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:23:40,293-Speed 2497.12 samples/sec Loss 1.4778 LearningRate 0.000107 Epoch: 28 Global Step: 585170 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:23:48,494-Speed 2497.51 samples/sec Loss 1.4602 LearningRate 0.000107 Epoch: 28 Global Step: 585180 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:23:56,643-Speed 2513.71 samples/sec Loss 1.4311 LearningRate 0.000107 Epoch: 28 Global Step: 585190 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:24:04,841-Speed 2498.38 samples/sec Loss 1.4593 LearningRate 0.000107 Epoch: 28 Global Step: 585200 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:24:13,056-Speed 2493.73 samples/sec Loss 1.4414 LearningRate 0.000107 Epoch: 28 Global Step: 585210 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:24:21,262-Speed 2496.45 samples/sec Loss 1.4770 LearningRate 0.000107 Epoch: 28 Global Step: 585220 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:24:29,466-Speed 2496.64 samples/sec Loss 1.4542 LearningRate 0.000107 Epoch: 28 Global Step: 585230 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:24:37,665-Speed 2498.08 samples/sec Loss 1.4608 LearningRate 0.000107 Epoch: 28 Global Step: 585240 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:24:45,812-Speed 2514.15 samples/sec Loss 1.4385 LearningRate 0.000107 Epoch: 28 Global Step: 585250 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:24:54,014-Speed 2497.37 samples/sec Loss 1.4584 LearningRate 0.000107 Epoch: 28 Global Step: 585260 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:25:02,226-Speed 2494.44 samples/sec Loss 1.4603 LearningRate 0.000107 Epoch: 28 Global Step: 585270 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:25:10,429-Speed 2497.00 samples/sec Loss 1.4198 LearningRate 0.000107 Epoch: 28 Global Step: 585280 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:25:18,629-Speed 2498.14 samples/sec Loss 1.4755 LearningRate 0.000107 Epoch: 28 Global Step: 585290 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:25:26,834-Speed 2496.47 samples/sec Loss 1.4858 LearningRate 0.000107 Epoch: 28 Global Step: 585300 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:25:35,000-Speed 2508.60 samples/sec Loss 1.4256 LearningRate 0.000107 Epoch: 28 Global Step: 585310 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:25:43,205-Speed 2496.57 samples/sec Loss 1.4637 LearningRate 0.000107 Epoch: 28 Global Step: 585320 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:25:51,408-Speed 2496.91 samples/sec Loss 1.4607 LearningRate 0.000107 Epoch: 28 Global Step: 585330 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-07-11 04:25:59,619-Speed 2494.68 samples/sec Loss 1.4431 LearningRate 0.000107 Epoch: 28 Global Step: 585340 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:26:07,819-Speed 2497.85 samples/sec Loss 1.4393 LearningRate 0.000107 Epoch: 28 Global Step: 585350 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:26:16,017-Speed 2498.47 samples/sec Loss 1.4139 LearningRate 0.000107 Epoch: 28 Global Step: 585360 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:26:24,175-Speed 2511.06 samples/sec Loss 1.4694 LearningRate 0.000107 Epoch: 28 Global Step: 585370 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:26:32,375-Speed 2497.73 samples/sec Loss 1.3924 LearningRate 0.000107 Epoch: 28 Global Step: 585380 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:26:40,575-Speed 2497.92 samples/sec Loss 1.4165 LearningRate 0.000107 Epoch: 28 Global Step: 585390 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:26:48,783-Speed 2496.28 samples/sec Loss 1.4628 LearningRate 0.000107 Epoch: 28 Global Step: 585400 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:26:56,985-Speed 2497.56 samples/sec Loss 1.4618 LearningRate 0.000107 Epoch: 28 Global Step: 585410 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:27:05,185-Speed 2497.90 samples/sec Loss 1.4770 LearningRate 0.000107 Epoch: 28 Global Step: 585420 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:27:13,336-Speed 2512.89 samples/sec Loss 1.4565 LearningRate 0.000107 Epoch: 28 Global Step: 585430 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:27:21,537-Speed 2497.73 samples/sec Loss 1.4748 LearningRate 0.000107 Epoch: 28 Global Step: 585440 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:27:29,742-Speed 2496.45 samples/sec Loss 1.4670 LearningRate 0.000107 Epoch: 28 Global Step: 585450 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-07-11 04:27:37,951-Speed 2495.11 samples/sec Loss 1.4611 LearningRate 0.000107 Epoch: 28 Global Step: 585460 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:27:46,159-Speed 2495.74 samples/sec Loss 1.4951 LearningRate 0.000107 Epoch: 28 Global Step: 585470 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:27:54,367-Speed 2495.69 samples/sec Loss 1.4779 LearningRate 0.000107 Epoch: 28 Global Step: 585480 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:28:02,521-Speed 2511.88 samples/sec Loss 1.4341 LearningRate 0.000107 Epoch: 28 Global Step: 585490 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:28:10,725-Speed 2496.88 samples/sec Loss 1.4740 LearningRate 0.000107 Epoch: 28 Global Step: 585500 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:28:18,936-Speed 2494.53 samples/sec Loss 1.4642 LearningRate 0.000107 Epoch: 28 Global Step: 585510 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:28:27,142-Speed 2496.22 samples/sec Loss 1.4865 LearningRate 0.000107 Epoch: 28 Global Step: 585520 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:28:35,351-Speed 2495.03 samples/sec Loss 1.4421 LearningRate 0.000107 Epoch: 28 Global Step: 585530 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:28:43,552-Speed 2497.61 samples/sec Loss 1.4143 LearningRate 0.000107 Epoch: 28 Global Step: 585540 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:28:51,699-Speed 2514.13 samples/sec Loss 1.4685 LearningRate 0.000107 Epoch: 28 Global Step: 585550 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:28:59,899-Speed 2497.89 samples/sec Loss 1.4855 LearningRate 0.000107 Epoch: 28 Global Step: 585560 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:29:08,103-Speed 2496.82 samples/sec Loss 1.4303 LearningRate 0.000107 Epoch: 28 Global Step: 585570 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-07-11 04:29:16,304-Speed 2497.59 samples/sec Loss 1.4215 LearningRate 0.000107 Epoch: 28 Global Step: 585580 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:29:24,519-Speed 2493.41 samples/sec Loss 1.4432 LearningRate 0.000107 Epoch: 28 Global Step: 585590 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:29:32,724-Speed 2496.26 samples/sec Loss 1.4370 LearningRate 0.000107 Epoch: 28 Global Step: 585600 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:29:40,887-Speed 2509.55 samples/sec Loss 1.4428 LearningRate 0.000107 Epoch: 28 Global Step: 585610 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:29:49,086-Speed 2498.01 samples/sec Loss 1.4911 LearningRate 0.000107 Epoch: 28 Global Step: 585620 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:29:57,287-Speed 2497.67 samples/sec Loss 1.4348 LearningRate 0.000107 Epoch: 28 Global Step: 585630 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:30:05,508-Speed 2492.56 samples/sec Loss 1.4602 LearningRate 0.000107 Epoch: 28 Global Step: 585640 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:30:13,711-Speed 2497.13 samples/sec Loss 1.4560 LearningRate 0.000107 Epoch: 28 Global Step: 585650 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:30:21,911-Speed 2497.81 samples/sec Loss 1.5081 LearningRate 0.000107 Epoch: 28 Global Step: 585660 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:30:30,060-Speed 2513.55 samples/sec Loss 1.4715 LearningRate 0.000107 Epoch: 28 Global Step: 585670 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:30:38,260-Speed 2498.16 samples/sec Loss 1.4776 LearningRate 0.000107 Epoch: 28 Global Step: 585680 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:30:46,474-Speed 2493.57 samples/sec Loss 1.4285 LearningRate 0.000107 Epoch: 28 Global Step: 585690 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-07-11 04:30:54,692-Speed 2492.36 samples/sec Loss 1.4961 LearningRate 0.000107 Epoch: 28 Global Step: 585700 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:31:02,896-Speed 2496.76 samples/sec Loss 1.4401 LearningRate 0.000107 Epoch: 28 Global Step: 585710 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:31:11,111-Speed 2493.67 samples/sec Loss 1.4316 LearningRate 0.000107 Epoch: 28 Global Step: 585720 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:31:19,272-Speed 2509.88 samples/sec Loss 1.4473 LearningRate 0.000107 Epoch: 28 Global Step: 585730 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:31:27,477-Speed 2496.47 samples/sec Loss 1.5037 LearningRate 0.000107 Epoch: 28 Global Step: 585740 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:31:35,685-Speed 2495.47 samples/sec Loss 1.4605 LearningRate 0.000107 Epoch: 28 Global Step: 585750 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:31:43,889-Speed 2497.07 samples/sec Loss 1.4672 LearningRate 0.000107 Epoch: 28 Global Step: 585760 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:31:52,089-Speed 2497.84 samples/sec Loss 1.4618 LearningRate 0.000107 Epoch: 28 Global Step: 585770 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:32:00,292-Speed 2496.75 samples/sec Loss 1.4685 LearningRate 0.000107 Epoch: 28 Global Step: 585780 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:32:08,451-Speed 2510.49 samples/sec Loss 1.4817 LearningRate 0.000107 Epoch: 28 Global Step: 585790 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:32:16,663-Speed 2494.18 samples/sec Loss 1.4706 LearningRate 0.000107 Epoch: 28 Global Step: 585800 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:32:24,865-Speed 2497.42 samples/sec Loss 1.4510 LearningRate 0.000107 Epoch: 28 Global Step: 585810 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-07-11 04:32:33,072-Speed 2495.73 samples/sec Loss 1.4420 LearningRate 0.000107 Epoch: 28 Global Step: 585820 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:32:41,275-Speed 2497.19 samples/sec Loss 1.4916 LearningRate 0.000107 Epoch: 28 Global Step: 585830 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:32:49,474-Speed 2498.23 samples/sec Loss 1.4512 LearningRate 0.000107 Epoch: 28 Global Step: 585840 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:32:57,640-Speed 2508.26 samples/sec Loss 1.4636 LearningRate 0.000107 Epoch: 28 Global Step: 585850 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:33:05,846-Speed 2495.91 samples/sec Loss 1.4673 LearningRate 0.000107 Epoch: 28 Global Step: 585860 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:33:14,048-Speed 2497.42 samples/sec Loss 1.4514 LearningRate 0.000107 Epoch: 28 Global Step: 585870 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:33:22,268-Speed 2492.08 samples/sec Loss 1.4676 LearningRate 0.000107 Epoch: 28 Global Step: 585880 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:33:30,468-Speed 2497.75 samples/sec Loss 1.4856 LearningRate 0.000107 Epoch: 28 Global Step: 585890 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:33:38,686-Speed 2492.67 samples/sec Loss 1.4762 LearningRate 0.000107 Epoch: 28 Global Step: 585900 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:33:46,833-Speed 2514.28 samples/sec Loss 1.4953 LearningRate 0.000107 Epoch: 28 Global Step: 585910 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:33:55,044-Speed 2494.56 samples/sec Loss 1.4440 LearningRate 0.000106 Epoch: 28 Global Step: 585920 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:34:03,257-Speed 2494.02 samples/sec Loss 1.4627 LearningRate 0.000106 Epoch: 28 Global Step: 585930 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-07-11 04:34:11,464-Speed 2496.13 samples/sec Loss 1.5010 LearningRate 0.000106 Epoch: 28 Global Step: 585940 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:34:19,665-Speed 2497.54 samples/sec Loss 1.4596 LearningRate 0.000106 Epoch: 28 Global Step: 585950 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:34:27,864-Speed 2498.17 samples/sec Loss 1.4371 LearningRate 0.000106 Epoch: 28 Global Step: 585960 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:34:36,009-Speed 2514.70 samples/sec Loss 1.4559 LearningRate 0.000106 Epoch: 28 Global Step: 585970 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:34:44,211-Speed 2497.50 samples/sec Loss 1.4525 LearningRate 0.000106 Epoch: 28 Global Step: 585980 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:34:52,409-Speed 2498.70 samples/sec Loss 1.4507 LearningRate 0.000106 Epoch: 28 Global Step: 585990 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:35:00,615-Speed 2495.87 samples/sec Loss 1.4755 LearningRate 0.000106 Epoch: 28 Global Step: 586000 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:35:08,851-Speed 2498.74 samples/sec Loss 1.4802 LearningRate 0.000106 Epoch: 28 Global Step: 586010 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:35:17,234-Speed 2500.34 samples/sec Loss 1.4810 LearningRate 0.000106 Epoch: 28 Global Step: 586020 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:35:25,383-Speed 2513.54 samples/sec Loss 1.4811 LearningRate 0.000106 Epoch: 28 Global Step: 586030 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:35:33,671-Speed 2499.60 samples/sec Loss 1.4708 LearningRate 0.000106 Epoch: 28 Global Step: 586040 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:35:41,871-Speed 2500.73 samples/sec Loss 1.4218 LearningRate 0.000106 Epoch: 28 Global Step: 586050 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-07-11 04:35:50,116-Speed 2499.10 samples/sec Loss 1.4830 LearningRate 0.000106 Epoch: 28 Global Step: 586060 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:35:58,321-Speed 2496.21 samples/sec Loss 1.4380 LearningRate 0.000106 Epoch: 28 Global Step: 586070 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:36:06,521-Speed 2498.08 samples/sec Loss 1.4542 LearningRate 0.000106 Epoch: 28 Global Step: 586080 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:36:14,697-Speed 2515.73 samples/sec Loss 1.4625 LearningRate 0.000106 Epoch: 28 Global Step: 586090 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:36:22,935-Speed 2499.36 samples/sec Loss 1.4397 LearningRate 0.000106 Epoch: 28 Global Step: 586100 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:36:31,138-Speed 2496.87 samples/sec Loss 1.4072 LearningRate 0.000106 Epoch: 28 Global Step: 586110 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:36:39,385-Speed 2498.29 samples/sec Loss 1.4746 LearningRate 0.000106 Epoch: 28 Global Step: 586120 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:36:49,704-Speed 2500.31 samples/sec Loss 1.4386 LearningRate 0.000106 Epoch: 28 Global Step: 586130 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:36:57,899-Speed 2499.51 samples/sec Loss 1.4569 LearningRate 0.000106 Epoch: 28 Global Step: 586140 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:37:06,044-Speed 2514.85 samples/sec Loss 1.4759 LearningRate 0.000106 Epoch: 28 Global Step: 586150 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:37:14,259-Speed 2500.24 samples/sec Loss 1.4730 LearningRate 0.000106 Epoch: 28 Global Step: 586160 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:37:26,579-Speed 2501.02 samples/sec Loss 1.4644 LearningRate 0.000106 Epoch: 28 Global Step: 586170 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-07-11 04:37:34,820-Speed 2501.68 samples/sec Loss 1.4699 LearningRate 0.000106 Epoch: 28 Global Step: 586180 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:37:43,922-Speed 2262.94 samples/sec Loss 1.4747 LearningRate 0.000106 Epoch: 28 Global Step: 586190 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:37:52,120-Speed 2499.15 samples/sec Loss 1.4542 LearningRate 0.000106 Epoch: 28 Global Step: 586200 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:38:00,261-Speed 2516.19 samples/sec Loss 1.4399 LearningRate 0.000106 Epoch: 28 Global Step: 586210 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:38:08,518-Speed 2488.15 samples/sec Loss 1.4721 LearningRate 0.000106 Epoch: 28 Global Step: 586220 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:38:18,000-Speed 2222.26 samples/sec Loss 1.4479 LearningRate 0.000106 Epoch: 28 Global Step: 586230 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:38:26,247-Speed 2501.99 samples/sec Loss 1.4768 LearningRate 0.000106 Epoch: 28 Global Step: 586240 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-07-11 04:38:34,904-Speed 2365.78 samples/sec Loss 1.4715 LearningRate 0.000106 Epoch: 28 Global Step: 586250 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:38:43,953-Speed 2263.70 samples/sec Loss 1.4864 LearningRate 0.000106 Epoch: 28 Global Step: 586260 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:38:52,107-Speed 2512.08 samples/sec Loss 1.4640 LearningRate 0.000106 Epoch: 28 Global Step: 586270 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:39:00,304-Speed 2498.94 samples/sec Loss 1.4665 LearningRate 0.000106 Epoch: 28 Global Step: 586280 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:39:08,511-Speed 2495.72 samples/sec Loss 1.4647 LearningRate 0.000106 Epoch: 28 Global Step: 586290 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:39:16,712-Speed 2497.88 samples/sec Loss 1.4594 LearningRate 0.000106 Epoch: 28 Global Step: 586300 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:39:24,913-Speed 2497.68 samples/sec Loss 1.4732 LearningRate 0.000106 Epoch: 28 Global Step: 586310 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:39:33,121-Speed 2495.61 samples/sec Loss 1.4606 LearningRate 0.000106 Epoch: 28 Global Step: 586320 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:39:41,303-Speed 2503.45 samples/sec Loss 1.4909 LearningRate 0.000106 Epoch: 28 Global Step: 586330 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:39:49,504-Speed 2498.09 samples/sec Loss 1.4773 LearningRate 0.000106 Epoch: 28 Global Step: 586340 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:39:57,716-Speed 2494.23 samples/sec Loss 1.4652 LearningRate 0.000106 Epoch: 28 Global Step: 586350 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:40:05,917-Speed 2497.46 samples/sec Loss 1.4817 LearningRate 0.000106 Epoch: 28 Global Step: 586360 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:40:14,130-Speed 2494.22 samples/sec Loss 1.4688 LearningRate 0.000106 Epoch: 28 Global Step: 586370 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:40:22,335-Speed 2496.43 samples/sec Loss 1.4750 LearningRate 0.000106 Epoch: 28 Global Step: 586380 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:40:30,481-Speed 2514.70 samples/sec Loss 1.4592 LearningRate 0.000106 Epoch: 28 Global Step: 586390 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:40:38,682-Speed 2497.58 samples/sec Loss 1.4554 LearningRate 0.000106 Epoch: 28 Global Step: 586400 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:40:46,884-Speed 2497.44 samples/sec Loss 1.4655 LearningRate 0.000106 Epoch: 28 Global Step: 586410 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:40:55,091-Speed 2496.11 samples/sec Loss 1.4336 LearningRate 0.000106 Epoch: 28 Global Step: 586420 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:41:03,298-Speed 2495.61 samples/sec Loss 1.4508 LearningRate 0.000106 Epoch: 28 Global Step: 586430 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:41:11,514-Speed 2493.18 samples/sec Loss 1.4590 LearningRate 0.000106 Epoch: 28 Global Step: 586440 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:41:19,665-Speed 2512.82 samples/sec Loss 1.4743 LearningRate 0.000106 Epoch: 28 Global Step: 586450 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:41:27,868-Speed 2497.24 samples/sec Loss 1.4361 LearningRate 0.000106 Epoch: 28 Global Step: 586460 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:41:36,072-Speed 2496.48 samples/sec Loss 1.4855 LearningRate 0.000106 Epoch: 28 Global Step: 586470 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:41:44,276-Speed 2496.99 samples/sec Loss 1.4950 LearningRate 0.000106 Epoch: 28 Global Step: 586480 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:41:52,477-Speed 2497.48 samples/sec Loss 1.4577 LearningRate 0.000106 Epoch: 28 Global Step: 586490 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:42:00,682-Speed 2496.70 samples/sec Loss 1.4652 LearningRate 0.000106 Epoch: 28 Global Step: 586500 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:42:08,834-Speed 2512.77 samples/sec Loss 1.4629 LearningRate 0.000106 Epoch: 28 Global Step: 586510 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:42:17,037-Speed 2497.01 samples/sec Loss 1.4763 LearningRate 0.000106 Epoch: 28 Global Step: 586520 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:42:25,239-Speed 2497.25 samples/sec Loss 1.4642 LearningRate 0.000106 Epoch: 28 Global Step: 586530 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:42:33,438-Speed 2498.21 samples/sec Loss 1.4838 LearningRate 0.000106 Epoch: 28 Global Step: 586540 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:42:41,638-Speed 2498.02 samples/sec Loss 1.4688 LearningRate 0.000106 Epoch: 28 Global Step: 586550 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:42:49,841-Speed 2497.16 samples/sec Loss 1.4949 LearningRate 0.000106 Epoch: 28 Global Step: 586560 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:42:57,988-Speed 2514.12 samples/sec Loss 1.4691 LearningRate 0.000106 Epoch: 28 Global Step: 586570 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:43:06,190-Speed 2497.35 samples/sec Loss 1.5000 LearningRate 0.000106 Epoch: 28 Global Step: 586580 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:43:14,400-Speed 2494.85 samples/sec Loss 1.4654 LearningRate 0.000106 Epoch: 28 Global Step: 586590 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:43:22,612-Speed 2494.32 samples/sec Loss 1.4851 LearningRate 0.000106 Epoch: 28 Global Step: 586600 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:43:30,814-Speed 2497.50 samples/sec Loss 1.4908 LearningRate 0.000106 Epoch: 28 Global Step: 586610 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:43:39,015-Speed 2497.50 samples/sec Loss 1.4798 LearningRate 0.000106 Epoch: 28 Global Step: 586620 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:43:47,165-Speed 2513.55 samples/sec Loss 1.4796 LearningRate 0.000106 Epoch: 28 Global Step: 586630 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:43:55,367-Speed 2497.41 samples/sec Loss 1.4745 LearningRate 0.000106 Epoch: 28 Global Step: 586640 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:44:03,569-Speed 2497.15 samples/sec Loss 1.4688 LearningRate 0.000106 Epoch: 28 Global Step: 586650 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:44:11,770-Speed 2498.17 samples/sec Loss 1.4731 LearningRate 0.000106 Epoch: 28 Global Step: 586660 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:44:19,983-Speed 2493.91 samples/sec Loss 1.4806 LearningRate 0.000106 Epoch: 28 Global Step: 586670 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:44:28,186-Speed 2497.35 samples/sec Loss 1.4694 LearningRate 0.000106 Epoch: 28 Global Step: 586680 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:44:36,354-Speed 2507.69 samples/sec Loss 1.4812 LearningRate 0.000106 Epoch: 28 Global Step: 586690 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:44:44,558-Speed 2496.92 samples/sec Loss 1.4773 LearningRate 0.000106 Epoch: 28 Global Step: 586700 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:44:52,755-Speed 2498.72 samples/sec Loss 1.4716 LearningRate 0.000106 Epoch: 28 Global Step: 586710 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:45:00,956-Speed 2497.74 samples/sec Loss 1.4649 LearningRate 0.000106 Epoch: 28 Global Step: 586720 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:45:09,161-Speed 2496.30 samples/sec Loss 1.4852 LearningRate 0.000106 Epoch: 28 Global Step: 586730 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:45:17,363-Speed 2497.54 samples/sec Loss 1.4824 LearningRate 0.000106 Epoch: 28 Global Step: 586740 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:45:25,522-Speed 2510.63 samples/sec Loss 1.4773 LearningRate 0.000106 Epoch: 28 Global Step: 586750 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:45:33,726-Speed 2497.11 samples/sec Loss 1.4733 LearningRate 0.000106 Epoch: 28 Global Step: 586760 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:45:41,926-Speed 2498.00 samples/sec Loss 1.4398 LearningRate 0.000106 Epoch: 28 Global Step: 586770 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:45:50,129-Speed 2496.86 samples/sec Loss 1.4897 LearningRate 0.000106 Epoch: 28 Global Step: 586780 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:45:58,345-Speed 2493.34 samples/sec Loss 1.4502 LearningRate 0.000106 Epoch: 28 Global Step: 586790 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:46:06,545-Speed 2498.00 samples/sec Loss 1.4586 LearningRate 0.000106 Epoch: 28 Global Step: 586800 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:46:14,690-Speed 2514.71 samples/sec Loss 1.4791 LearningRate 0.000106 Epoch: 28 Global Step: 586810 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:46:22,888-Speed 2498.49 samples/sec Loss 1.4652 LearningRate 0.000106 Epoch: 28 Global Step: 586820 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:46:31,101-Speed 2494.20 samples/sec Loss 1.4547 LearningRate 0.000106 Epoch: 28 Global Step: 586830 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:46:39,296-Speed 2499.33 samples/sec Loss 1.4543 LearningRate 0.000106 Epoch: 28 Global Step: 586840 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:46:47,494-Speed 2498.59 samples/sec Loss 1.4549 LearningRate 0.000106 Epoch: 28 Global Step: 586850 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:46:55,695-Speed 2497.53 samples/sec Loss 1.4654 LearningRate 0.000106 Epoch: 28 Global Step: 586860 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:47:03,846-Speed 2513.21 samples/sec Loss 1.4395 LearningRate 0.000106 Epoch: 28 Global Step: 586870 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:47:12,045-Speed 2498.07 samples/sec Loss 1.4885 LearningRate 0.000106 Epoch: 28 Global Step: 586880 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:47:20,245-Speed 2498.06 samples/sec Loss 1.4835 LearningRate 0.000106 Epoch: 28 Global Step: 586890 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:47:28,447-Speed 2497.36 samples/sec Loss 1.4506 LearningRate 0.000106 Epoch: 28 Global Step: 586900 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:47:36,651-Speed 2496.78 samples/sec Loss 1.4723 LearningRate 0.000106 Epoch: 28 Global Step: 586910 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:47:44,851-Speed 2498.00 samples/sec Loss 1.4936 LearningRate 0.000106 Epoch: 28 Global Step: 586920 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:47:52,998-Speed 2514.21 samples/sec Loss 1.4733 LearningRate 0.000106 Epoch: 28 Global Step: 586930 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:48:01,196-Speed 2498.58 samples/sec Loss 1.4234 LearningRate 0.000106 Epoch: 28 Global Step: 586940 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:48:09,396-Speed 2497.84 samples/sec Loss 1.4625 LearningRate 0.000106 Epoch: 28 Global Step: 586950 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:48:17,597-Speed 2497.80 samples/sec Loss 1.4034 LearningRate 0.000106 Epoch: 28 Global Step: 586960 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:48:25,800-Speed 2497.27 samples/sec Loss 1.4864 LearningRate 0.000106 Epoch: 28 Global Step: 586970 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:48:34,019-Speed 2492.20 samples/sec Loss 1.4348 LearningRate 0.000106 Epoch: 28 Global Step: 586980 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:48:42,173-Speed 2511.86 samples/sec Loss 1.4773 LearningRate 0.000106 Epoch: 28 Global Step: 586990 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:48:50,376-Speed 2497.17 samples/sec Loss 1.4612 LearningRate 0.000106 Epoch: 28 Global Step: 587000 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:48:58,590-Speed 2493.76 samples/sec Loss 1.4657 LearningRate 0.000106 Epoch: 28 Global Step: 587010 Fp16 Grad Scale: 16384 Required: 56 hours 
Training: 2022-07-11 04:49:06,794-Speed 2496.70 samples/sec Loss 1.4226 LearningRate 0.000106 Epoch: 28 Global Step: 587020 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:49:14,997-Speed 2497.20 samples/sec Loss 1.4387 LearningRate 0.000106 Epoch: 28 Global Step: 587030 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:49:23,203-Speed 2495.99 samples/sec Loss 1.4585 LearningRate 0.000106 Epoch: 28 Global Step: 587040 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:49:31,359-Speed 2511.69 samples/sec Loss 1.4478 LearningRate 0.000106 Epoch: 28 Global Step: 587050 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:49:39,574-Speed 2493.94 samples/sec Loss 1.4887 LearningRate 0.000105 Epoch: 28 Global Step: 587060 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:49:47,779-Speed 2496.43 samples/sec Loss 1.4593 LearningRate 0.000105 Epoch: 28 Global Step: 587070 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:49:55,985-Speed 2496.38 samples/sec Loss 1.4566 LearningRate 0.000105 Epoch: 28 Global Step: 587080 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:50:04,191-Speed 2495.92 samples/sec Loss 1.4515 LearningRate 0.000105 Epoch: 28 Global Step: 587090 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:50:12,395-Speed 2496.85 samples/sec Loss 1.4586 LearningRate 0.000105 Epoch: 28 Global Step: 587100 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:50:20,548-Speed 2512.22 samples/sec Loss 1.4284 LearningRate 0.000105 Epoch: 28 Global Step: 587110 Fp16 Grad Scale: 16384 Required: 56 hours Training: 2022-07-11 04:50:28,763-Speed 2493.40 samples/sec Loss 1.4604 LearningRate 0.000105 Epoch: 28 Global Step: 587120 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:50:36,968-Speed 2496.64 samples/sec Loss 1.4460 LearningRate 0.000105 Epoch: 28 Global Step: 587130 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 04:50:45,170-Speed 2497.38 samples/sec Loss 1.3985 LearningRate 0.000105 Epoch: 28 Global Step: 587140 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:50:53,386-Speed 2492.88 samples/sec Loss 1.4512 LearningRate 0.000105 Epoch: 28 Global Step: 587150 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:51:01,589-Speed 2497.09 samples/sec Loss 1.4402 LearningRate 0.000105 Epoch: 28 Global Step: 587160 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:51:09,742-Speed 2512.34 samples/sec Loss 1.4372 LearningRate 0.000105 Epoch: 28 Global Step: 587170 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:51:17,943-Speed 2497.80 samples/sec Loss 1.4326 LearningRate 0.000105 Epoch: 28 Global Step: 587180 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:51:26,144-Speed 2497.61 samples/sec Loss 1.5050 LearningRate 0.000105 Epoch: 28 Global Step: 587190 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:51:34,347-Speed 2497.03 samples/sec Loss 1.4218 LearningRate 0.000105 Epoch: 28 Global Step: 587200 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:51:42,565-Speed 2492.74 samples/sec Loss 1.4154 LearningRate 0.000105 Epoch: 28 Global Step: 587210 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:51:50,768-Speed 2496.99 samples/sec Loss 1.4806 LearningRate 0.000105 Epoch: 28 Global Step: 587220 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:51:58,930-Speed 2509.87 samples/sec Loss 1.4786 LearningRate 0.000105 Epoch: 28 Global Step: 587230 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:52:07,131-Speed 2497.64 samples/sec Loss 1.4465 LearningRate 0.000105 Epoch: 28 Global Step: 587240 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:52:15,332-Speed 2497.71 samples/sec Loss 1.4033 LearningRate 0.000105 Epoch: 28 Global Step: 587250 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 04:52:23,534-Speed 2497.10 samples/sec Loss 1.4637 LearningRate 0.000105 Epoch: 28 Global Step: 587260 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:52:31,741-Speed 2495.79 samples/sec Loss 1.4657 LearningRate 0.000105 Epoch: 28 Global Step: 587270 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:52:39,949-Speed 2495.57 samples/sec Loss 1.4184 LearningRate 0.000105 Epoch: 28 Global Step: 587280 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:52:48,097-Speed 2514.43 samples/sec Loss 1.4732 LearningRate 0.000105 Epoch: 28 Global Step: 587290 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:52:56,300-Speed 2496.93 samples/sec Loss 1.4611 LearningRate 0.000105 Epoch: 28 Global Step: 587300 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:53:04,505-Speed 2496.59 samples/sec Loss 1.4438 LearningRate 0.000105 Epoch: 28 Global Step: 587310 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:53:12,724-Speed 2492.11 samples/sec Loss 1.4611 LearningRate 0.000105 Epoch: 28 Global Step: 587320 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:53:20,920-Speed 2499.00 samples/sec Loss 1.4689 LearningRate 0.000105 Epoch: 28 Global Step: 587330 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:53:29,122-Speed 2497.52 samples/sec Loss 1.4772 LearningRate 0.000105 Epoch: 28 Global Step: 587340 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:53:37,269-Speed 2514.17 samples/sec Loss 1.4692 LearningRate 0.000105 Epoch: 28 Global Step: 587350 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:53:45,474-Speed 2496.49 samples/sec Loss 1.4733 LearningRate 0.000105 Epoch: 28 Global Step: 587360 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:53:53,679-Speed 2496.67 samples/sec Loss 1.4342 LearningRate 0.000105 Epoch: 28 Global Step: 587370 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 04:54:01,883-Speed 2496.84 samples/sec Loss 1.4420 LearningRate 0.000105 Epoch: 28 Global Step: 587380 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:54:10,085-Speed 2497.23 samples/sec Loss 1.4664 LearningRate 0.000105 Epoch: 28 Global Step: 587390 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:54:18,286-Speed 2497.73 samples/sec Loss 1.4960 LearningRate 0.000105 Epoch: 28 Global Step: 587400 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:54:26,433-Speed 2514.09 samples/sec Loss 1.4261 LearningRate 0.000105 Epoch: 28 Global Step: 587410 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:54:34,636-Speed 2497.04 samples/sec Loss 1.4575 LearningRate 0.000105 Epoch: 28 Global Step: 587420 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:54:42,837-Speed 2498.31 samples/sec Loss 1.4805 LearningRate 0.000105 Epoch: 28 Global Step: 587430 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:54:51,059-Speed 2491.45 samples/sec Loss 1.4316 LearningRate 0.000105 Epoch: 28 Global Step: 587440 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 04:54:59,265-Speed 2496.33 samples/sec Loss 1.4097 LearningRate 0.000105 Epoch: 28 Global Step: 587450 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:55:07,463-Speed 2498.45 samples/sec Loss 1.4745 LearningRate 0.000105 Epoch: 28 Global Step: 587460 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:55:15,612-Speed 2513.44 samples/sec Loss 1.4470 LearningRate 0.000105 Epoch: 28 Global Step: 587470 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:55:23,812-Speed 2498.42 samples/sec Loss 1.4730 LearningRate 0.000105 Epoch: 28 Global Step: 587480 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:55:32,015-Speed 2496.78 samples/sec Loss 1.4918 LearningRate 0.000105 Epoch: 28 Global Step: 587490 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-07-11 04:55:40,218-Speed 2497.28 samples/sec Loss 1.4689 LearningRate 0.000105 Epoch: 28 Global Step: 587500 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:55:48,420-Speed 2497.40 samples/sec Loss 1.4658 LearningRate 0.000105 Epoch: 28 Global Step: 587510 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:55:56,623-Speed 2496.76 samples/sec Loss 1.4575 LearningRate 0.000105 Epoch: 28 Global Step: 587520 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:56:04,775-Speed 2512.81 samples/sec Loss 1.4653 LearningRate 0.000105 Epoch: 28 Global Step: 587530 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:56:12,975-Speed 2498.27 samples/sec Loss 1.4546 LearningRate 0.000105 Epoch: 28 Global Step: 587540 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:56:21,175-Speed 2497.81 samples/sec Loss 1.4147 LearningRate 0.000105 Epoch: 28 Global Step: 587550 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:56:29,374-Speed 2498.26 samples/sec Loss 1.4479 LearningRate 0.000105 Epoch: 28 Global Step: 587560 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:56:37,573-Speed 2498.14 samples/sec Loss 1.4650 LearningRate 0.000105 Epoch: 28 Global Step: 587570 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:56:45,785-Speed 2494.63 samples/sec Loss 1.4588 LearningRate 0.000105 Epoch: 28 Global Step: 587580 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:56:53,934-Speed 2513.55 samples/sec Loss 1.4355 LearningRate 0.000105 Epoch: 28 Global Step: 587590 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:57:02,133-Speed 2497.87 samples/sec Loss 1.4583 LearningRate 0.000105 Epoch: 28 Global Step: 587600 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:57:10,333-Speed 2498.38 samples/sec Loss 1.4429 LearningRate 0.000105 Epoch: 28 Global Step: 587610 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-07-11 04:57:18,529-Speed 2499.23 samples/sec Loss 1.4758 LearningRate 0.000105 Epoch: 28 Global Step: 587620 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:57:26,746-Speed 2492.64 samples/sec Loss 1.4305 LearningRate 0.000105 Epoch: 28 Global Step: 587630 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:57:34,947-Speed 2497.68 samples/sec Loss 1.4676 LearningRate 0.000105 Epoch: 28 Global Step: 587640 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:57:43,098-Speed 2513.02 samples/sec Loss 1.4385 LearningRate 0.000105 Epoch: 28 Global Step: 587650 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:57:51,298-Speed 2498.08 samples/sec Loss 1.4544 LearningRate 0.000105 Epoch: 28 Global Step: 587660 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:57:59,502-Speed 2496.93 samples/sec Loss 1.4431 LearningRate 0.000105 Epoch: 28 Global Step: 587670 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:58:07,702-Speed 2497.76 samples/sec Loss 1.4523 LearningRate 0.000105 Epoch: 28 Global Step: 587680 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:58:15,903-Speed 2498.17 samples/sec Loss 1.4562 LearningRate 0.000105 Epoch: 28 Global Step: 587690 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:58:24,104-Speed 2497.60 samples/sec Loss 1.4153 LearningRate 0.000105 Epoch: 28 Global Step: 587700 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:58:32,255-Speed 2512.98 samples/sec Loss 1.4592 LearningRate 0.000105 Epoch: 28 Global Step: 587710 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:58:40,454-Speed 2498.36 samples/sec Loss 1.4630 LearningRate 0.000105 Epoch: 28 Global Step: 587720 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:58:48,657-Speed 2497.02 samples/sec Loss 1.4404 LearningRate 0.000105 Epoch: 28 Global Step: 587730 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-07-11 04:58:56,860-Speed 2497.17 samples/sec Loss 1.4372 LearningRate 0.000105 Epoch: 28 Global Step: 587740 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:59:05,062-Speed 2497.32 samples/sec Loss 1.4671 LearningRate 0.000105 Epoch: 28 Global Step: 587750 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:59:13,267-Speed 2496.45 samples/sec Loss 1.4060 LearningRate 0.000105 Epoch: 28 Global Step: 587760 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:59:21,417-Speed 2513.36 samples/sec Loss 1.4473 LearningRate 0.000105 Epoch: 28 Global Step: 587770 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:59:29,617-Speed 2497.86 samples/sec Loss 1.4782 LearningRate 0.000105 Epoch: 28 Global Step: 587780 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:59:37,819-Speed 2497.63 samples/sec Loss 1.4815 LearningRate 0.000105 Epoch: 28 Global Step: 587790 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:59:46,031-Speed 2494.19 samples/sec Loss 1.4204 LearningRate 0.000105 Epoch: 28 Global Step: 587800 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 04:59:54,243-Speed 2494.44 samples/sec Loss 1.4457 LearningRate 0.000105 Epoch: 28 Global Step: 587810 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:00:02,451-Speed 2495.61 samples/sec Loss 1.4700 LearningRate 0.000105 Epoch: 28 Global Step: 587820 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:00:10,598-Speed 2514.03 samples/sec Loss 1.4840 LearningRate 0.000105 Epoch: 28 Global Step: 587830 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:00:18,803-Speed 2496.70 samples/sec Loss 1.4620 LearningRate 0.000105 Epoch: 28 Global Step: 587840 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:00:27,008-Speed 2496.32 samples/sec Loss 1.4474 LearningRate 0.000105 Epoch: 28 Global Step: 587850 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-07-11 05:00:35,221-Speed 2494.18 samples/sec Loss 1.4354 LearningRate 0.000105 Epoch: 28 Global Step: 587860 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:00:43,420-Speed 2498.20 samples/sec Loss 1.4955 LearningRate 0.000105 Epoch: 28 Global Step: 587870 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:00:51,621-Speed 2497.55 samples/sec Loss 1.4707 LearningRate 0.000105 Epoch: 28 Global Step: 587880 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:00:59,769-Speed 2514.05 samples/sec Loss 1.4571 LearningRate 0.000105 Epoch: 28 Global Step: 587890 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:01:07,968-Speed 2498.14 samples/sec Loss 1.4477 LearningRate 0.000105 Epoch: 28 Global Step: 587900 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:01:16,173-Speed 2496.27 samples/sec Loss 1.4938 LearningRate 0.000105 Epoch: 28 Global Step: 587910 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:01:24,374-Speed 2497.74 samples/sec Loss 1.4705 LearningRate 0.000105 Epoch: 28 Global Step: 587920 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:01:32,573-Speed 2498.21 samples/sec Loss 1.4471 LearningRate 0.000105 Epoch: 28 Global Step: 587930 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:01:40,773-Speed 2498.07 samples/sec Loss 1.4135 LearningRate 0.000105 Epoch: 28 Global Step: 587940 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:01:48,922-Speed 2513.55 samples/sec Loss 1.4484 LearningRate 0.000105 Epoch: 28 Global Step: 587950 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:01:57,119-Speed 2499.10 samples/sec Loss 1.4638 LearningRate 0.000105 Epoch: 28 Global Step: 587960 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:02:05,319-Speed 2497.74 samples/sec Loss 1.4224 LearningRate 0.000105 Epoch: 28 Global Step: 587970 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-07-11 05:02:13,519-Speed 2498.11 samples/sec Loss 1.4541 LearningRate 0.000105 Epoch: 28 Global Step: 587980 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:02:21,722-Speed 2496.94 samples/sec Loss 1.4861 LearningRate 0.000105 Epoch: 28 Global Step: 587990 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:02:29,927-Speed 2496.81 samples/sec Loss 1.4509 LearningRate 0.000105 Epoch: 28 Global Step: 588000 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:02:38,077-Speed 2513.29 samples/sec Loss 1.4660 LearningRate 0.000105 Epoch: 28 Global Step: 588010 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:02:46,274-Speed 2498.61 samples/sec Loss 1.4683 LearningRate 0.000105 Epoch: 28 Global Step: 588020 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:02:54,483-Speed 2495.23 samples/sec Loss 1.4478 LearningRate 0.000105 Epoch: 28 Global Step: 588030 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:03:02,687-Speed 2496.95 samples/sec Loss 1.4750 LearningRate 0.000105 Epoch: 28 Global Step: 588040 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:03:10,895-Speed 2495.70 samples/sec Loss 1.4660 LearningRate 0.000105 Epoch: 28 Global Step: 588050 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:03:19,112-Speed 2492.61 samples/sec Loss 1.4577 LearningRate 0.000105 Epoch: 28 Global Step: 588060 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:03:27,258-Speed 2514.56 samples/sec Loss 1.4365 LearningRate 0.000105 Epoch: 28 Global Step: 588070 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:03:35,460-Speed 2497.51 samples/sec Loss 1.4674 LearningRate 0.000105 Epoch: 28 Global Step: 588080 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:03:43,661-Speed 2497.40 samples/sec Loss 1.4540 LearningRate 0.000105 Epoch: 28 Global Step: 588090 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-07-11 05:03:51,865-Speed 2497.00 samples/sec Loss 1.4709 LearningRate 0.000105 Epoch: 28 Global Step: 588100 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:04:00,065-Speed 2497.90 samples/sec Loss 1.4392 LearningRate 0.000105 Epoch: 28 Global Step: 588110 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:04:08,265-Speed 2498.26 samples/sec Loss 1.4476 LearningRate 0.000105 Epoch: 28 Global Step: 588120 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:04:16,420-Speed 2511.57 samples/sec Loss 1.4537 LearningRate 0.000105 Epoch: 28 Global Step: 588130 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:04:24,635-Speed 2493.40 samples/sec Loss 1.4186 LearningRate 0.000105 Epoch: 28 Global Step: 588140 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:04:32,836-Speed 2497.71 samples/sec Loss 1.4506 LearningRate 0.000105 Epoch: 28 Global Step: 588150 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:04:40,994-Speed 2510.68 samples/sec Loss 1.4705 LearningRate 0.000105 Epoch: 28 Global Step: 588160 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:04:49,192-Speed 2498.56 samples/sec Loss 1.4713 LearningRate 0.000105 Epoch: 28 Global Step: 588170 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:04:57,399-Speed 2496.10 samples/sec Loss 1.4596 LearningRate 0.000105 Epoch: 28 Global Step: 588180 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:05:05,545-Speed 2514.46 samples/sec Loss 1.4785 LearningRate 0.000105 Epoch: 28 Global Step: 588190 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:05:13,758-Speed 2493.87 samples/sec Loss 1.4614 LearningRate 0.000105 Epoch: 28 Global Step: 588200 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:05:21,962-Speed 2496.97 samples/sec Loss 1.4739 LearningRate 0.000105 Epoch: 28 Global Step: 588210 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:05:30,159-Speed 2498.88 samples/sec Loss 1.4380 LearningRate 0.000104 Epoch: 28 Global Step: 588220 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:05:38,369-Speed 2494.89 samples/sec Loss 1.4427 LearningRate 0.000104 Epoch: 28 Global Step: 588230 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:05:46,577-Speed 2495.43 samples/sec Loss 1.5011 LearningRate 0.000104 Epoch: 28 Global Step: 588240 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:05:54,748-Speed 2507.12 samples/sec Loss 1.4641 LearningRate 0.000104 Epoch: 28 Global Step: 588250 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:06:02,949-Speed 2497.49 samples/sec Loss 1.4455 LearningRate 0.000104 Epoch: 28 Global Step: 588260 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:06:11,159-Speed 2495.48 samples/sec Loss 1.4644 LearningRate 0.000104 Epoch: 28 Global Step: 588270 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:06:19,362-Speed 2497.02 samples/sec Loss 1.4403 LearningRate 0.000104 Epoch: 28 Global Step: 588280 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:06:27,566-Speed 2496.64 samples/sec Loss 1.4522 LearningRate 0.000104 Epoch: 28 Global Step: 588290 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:06:35,779-Speed 2494.25 samples/sec Loss 1.4810 LearningRate 0.000104 Epoch: 28 Global Step: 588300 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:06:43,922-Speed 2515.16 samples/sec Loss 1.4926 LearningRate 0.000104 Epoch: 28 Global Step: 588310 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:06:52,117-Speed 2499.45 samples/sec Loss 1.4816 LearningRate 0.000104 Epoch: 28 Global Step: 588320 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:07:00,322-Speed 2496.39 samples/sec Loss 1.4403 LearningRate 0.000104 Epoch: 28 Global Step: 588330 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:07:08,520-Speed 2498.61 samples/sec Loss 1.4700 LearningRate 0.000104 Epoch: 28 Global Step: 588340 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:07:16,716-Speed 2499.47 samples/sec Loss 1.4610 LearningRate 0.000104 Epoch: 28 Global Step: 588350 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:07:24,917-Speed 2497.55 samples/sec Loss 1.4468 LearningRate 0.000104 Epoch: 28 Global Step: 588360 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:07:33,063-Speed 2514.41 samples/sec Loss 1.4605 LearningRate 0.000104 Epoch: 28 Global Step: 588370 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:07:41,271-Speed 2495.94 samples/sec Loss 1.4763 LearningRate 0.000104 Epoch: 28 Global Step: 588380 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:07:49,473-Speed 2497.29 samples/sec Loss 1.4046 LearningRate 0.000104 Epoch: 28 Global Step: 588390 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:07:57,671-Speed 2498.67 samples/sec Loss 1.4597 LearningRate 0.000104 Epoch: 28 Global Step: 588400 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:08:05,866-Speed 2499.29 samples/sec Loss 1.4409 LearningRate 0.000104 Epoch: 28 Global Step: 588410 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:08:14,068-Speed 2497.53 samples/sec Loss 1.4465 LearningRate 0.000104 Epoch: 28 Global Step: 588420 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:08:22,217-Speed 2513.41 samples/sec Loss 1.4810 LearningRate 0.000104 Epoch: 28 Global Step: 588430 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:08:30,418-Speed 2497.82 samples/sec Loss 1.4551 LearningRate 0.000104 Epoch: 28 Global Step: 588440 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:08:38,616-Speed 2498.60 samples/sec Loss 1.4991 LearningRate 0.000104 Epoch: 28 Global Step: 588450 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:08:46,815-Speed 2498.24 samples/sec Loss 1.4318 LearningRate 0.000104 Epoch: 28 Global Step: 588460 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:08:55,018-Speed 2497.13 samples/sec Loss 1.4547 LearningRate 0.000104 Epoch: 28 Global Step: 588470 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:09:03,219-Speed 2497.95 samples/sec Loss 1.4779 LearningRate 0.000104 Epoch: 28 Global Step: 588480 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:09:11,367-Speed 2513.89 samples/sec Loss 1.4525 LearningRate 0.000104 Epoch: 28 Global Step: 588490 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:09:19,575-Speed 2495.70 samples/sec Loss 1.4649 LearningRate 0.000104 Epoch: 28 Global Step: 588500 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:09:27,778-Speed 2497.06 samples/sec Loss 1.4616 LearningRate 0.000104 Epoch: 28 Global Step: 588510 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:09:35,982-Speed 2496.68 samples/sec Loss 1.4495 LearningRate 0.000104 Epoch: 28 Global Step: 588520 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:09:44,183-Speed 2497.65 samples/sec Loss 1.4426 LearningRate 0.000104 Epoch: 28 Global Step: 588530 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:09:52,391-Speed 2495.60 samples/sec Loss 1.4439 LearningRate 0.000104 Epoch: 28 Global Step: 588540 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:10:00,539-Speed 2513.90 samples/sec Loss 1.4603 LearningRate 0.000104 Epoch: 28 Global Step: 588550 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:10:08,751-Speed 2494.36 samples/sec Loss 1.4337 LearningRate 0.000104 Epoch: 28 Global Step: 588560 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:10:16,954-Speed 2496.97 samples/sec Loss 1.4426 LearningRate 0.000104 Epoch: 28 Global Step: 588570 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:10:25,242-Speed 2471.49 samples/sec Loss 1.4292 LearningRate 0.000104 Epoch: 28 Global Step: 588580 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:10:33,440-Speed 2498.63 samples/sec Loss 1.4513 LearningRate 0.000104 Epoch: 28 Global Step: 588590 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:10:41,638-Speed 2498.35 samples/sec Loss 1.4421 LearningRate 0.000104 Epoch: 28 Global Step: 588600 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:10:49,784-Speed 2514.51 samples/sec Loss 1.4163 LearningRate 0.000104 Epoch: 28 Global Step: 588610 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:10:58,001-Speed 2492.84 samples/sec Loss 1.4591 LearningRate 0.000104 Epoch: 28 Global Step: 588620 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:11:06,205-Speed 2496.71 samples/sec Loss 1.4689 LearningRate 0.000104 Epoch: 28 Global Step: 588630 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:11:14,407-Speed 2497.40 samples/sec Loss 1.4485 LearningRate 0.000104 Epoch: 28 Global Step: 588640 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:11:22,609-Speed 2497.51 samples/sec Loss 1.4035 LearningRate 0.000104 Epoch: 28 Global Step: 588650 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:11:30,822-Speed 2494.06 samples/sec Loss 1.4839 LearningRate 0.000104 Epoch: 28 Global Step: 588660 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:11:38,971-Speed 2513.46 samples/sec Loss 1.4823 LearningRate 0.000104 Epoch: 28 Global Step: 588670 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:11:47,172-Speed 2497.58 samples/sec Loss 1.4722 LearningRate 0.000104 Epoch: 28 Global Step: 588680 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:11:55,381-Speed 2495.49 samples/sec Loss 1.4635 LearningRate 0.000104 Epoch: 28 Global Step: 588690 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:12:03,593-Speed 2494.28 samples/sec Loss 1.4453 LearningRate 0.000104 Epoch: 28 Global Step: 588700 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:12:11,793-Speed 2497.94 samples/sec Loss 1.4412 LearningRate 0.000104 Epoch: 28 Global Step: 588710 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:12:19,997-Speed 2496.73 samples/sec Loss 1.4495 LearningRate 0.000104 Epoch: 28 Global Step: 588720 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:12:28,151-Speed 2512.10 samples/sec Loss 1.4442 LearningRate 0.000104 Epoch: 28 Global Step: 588730 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:12:36,349-Speed 2498.81 samples/sec Loss 1.4632 LearningRate 0.000104 Epoch: 28 Global Step: 588740 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:12:44,555-Speed 2496.07 samples/sec Loss 1.4533 LearningRate 0.000104 Epoch: 28 Global Step: 588750 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:12:52,758-Speed 2496.88 samples/sec Loss 1.4448 LearningRate 0.000104 Epoch: 28 Global Step: 588760 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:13:00,959-Speed 2497.80 samples/sec Loss 1.4584 LearningRate 0.000104 Epoch: 28 Global Step: 588770 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:13:09,159-Speed 2497.94 samples/sec Loss 1.4331 LearningRate 0.000104 Epoch: 28 Global Step: 588780 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:13:17,317-Speed 2510.87 samples/sec Loss 1.4749 LearningRate 0.000104 Epoch: 28 Global Step: 588790 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:13:25,517-Speed 2497.95 samples/sec Loss 1.4525 LearningRate 0.000104 Epoch: 28 Global Step: 588800 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:13:33,713-Speed 2499.16 samples/sec Loss 1.4785 LearningRate 0.000104 Epoch: 28 Global Step: 588810 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:13:41,914-Speed 2497.55 samples/sec Loss 1.4881 LearningRate 0.000104 Epoch: 28 Global Step: 588820 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:13:50,114-Speed 2498.07 samples/sec Loss 1.4245 LearningRate 0.000104 Epoch: 28 Global Step: 588830 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:13:58,316-Speed 2497.13 samples/sec Loss 1.4496 LearningRate 0.000104 Epoch: 28 Global Step: 588840 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:14:06,464-Speed 2514.33 samples/sec Loss 1.4651 LearningRate 0.000104 Epoch: 28 Global Step: 588850 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:14:14,665-Speed 2497.59 samples/sec Loss 1.4469 LearningRate 0.000104 Epoch: 28 Global Step: 588860 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:14:22,867-Speed 2497.36 samples/sec Loss 1.4882 LearningRate 0.000104 Epoch: 28 Global Step: 588870 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:14:31,082-Speed 2493.28 samples/sec Loss 1.4382 LearningRate 0.000104 Epoch: 28 Global Step: 588880 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:14:39,286-Speed 2496.74 samples/sec Loss 1.4770 LearningRate 0.000104 Epoch: 28 Global Step: 588890 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:14:47,483-Speed 2498.77 samples/sec Loss 1.4620 LearningRate 0.000104 Epoch: 28 Global Step: 588900 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:14:55,632-Speed 2513.48 samples/sec Loss 1.4476 LearningRate 0.000104 Epoch: 28 Global Step: 588910 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:15:03,835-Speed 2497.22 samples/sec Loss 1.4448 LearningRate 0.000104 Epoch: 28 Global Step: 588920 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:15:12,032-Speed 2499.03 samples/sec Loss 1.4442 LearningRate 0.000104 Epoch: 28 Global Step: 588930 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:15:20,232-Speed 2497.78 samples/sec Loss 1.4422 LearningRate 0.000104 Epoch: 28 Global Step: 588940 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:15:28,431-Speed 2498.37 samples/sec Loss 1.4643 LearningRate 0.000104 Epoch: 28 Global Step: 588950 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:15:36,632-Speed 2497.67 samples/sec Loss 1.4577 LearningRate 0.000104 Epoch: 28 Global Step: 588960 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:15:44,781-Speed 2513.79 samples/sec Loss 1.4354 LearningRate 0.000104 Epoch: 28 Global Step: 588970 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:15:52,986-Speed 2496.42 samples/sec Loss 1.4234 LearningRate 0.000104 Epoch: 28 Global Step: 588980 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:16:01,193-Speed 2496.23 samples/sec Loss 1.4475 LearningRate 0.000104 Epoch: 28 Global Step: 588990 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:16:09,398-Speed 2496.97 samples/sec Loss 1.4514 LearningRate 0.000104 Epoch: 28 Global Step: 589000 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:16:17,605-Speed 2495.87 samples/sec Loss 1.4497 LearningRate 0.000104 Epoch: 28 Global Step: 589010 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:16:25,809-Speed 2496.77 samples/sec Loss 1.4494 LearningRate 0.000104 Epoch: 28 Global Step: 589020 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:16:33,961-Speed 2512.80 samples/sec Loss 1.4462 LearningRate 0.000104 Epoch: 28 Global Step: 589030 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:16:42,165-Speed 2496.67 samples/sec Loss 1.4519 LearningRate 0.000104 Epoch: 28 Global Step: 589040 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:16:50,369-Speed 2496.83 samples/sec Loss 1.4959 LearningRate 0.000104 Epoch: 28 Global Step: 589050 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:16:58,573-Speed 2496.78 samples/sec Loss 1.4039 LearningRate 0.000104 Epoch: 28 Global Step: 589060 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:17:06,775-Speed 2497.25 samples/sec Loss 1.4517 LearningRate 0.000104 Epoch: 28 Global Step: 589070 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:17:14,985-Speed 2494.81 samples/sec Loss 1.4277 LearningRate 0.000104 Epoch: 28 Global Step: 589080 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:17:23,137-Speed 2512.62 samples/sec Loss 1.5019 LearningRate 0.000104 Epoch: 28 Global Step: 589090 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:17:31,358-Speed 2491.50 samples/sec Loss 1.4326 LearningRate 0.000104 Epoch: 28 Global Step: 589100 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:17:39,563-Speed 2496.51 samples/sec Loss 1.4281 LearningRate 0.000104 Epoch: 28 Global Step: 589110 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:17:47,765-Speed 2497.34 samples/sec Loss 1.4279 LearningRate 0.000104 Epoch: 28 Global Step: 589120 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:17:55,966-Speed 2497.57 samples/sec Loss 1.4330 LearningRate 0.000104 Epoch: 28 Global Step: 589130 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:18:04,169-Speed 2496.90 samples/sec Loss 1.4496 LearningRate 0.000104 Epoch: 28 Global Step: 589140 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:18:12,321-Speed 2512.67 samples/sec Loss 1.4320 LearningRate 0.000104 Epoch: 28 Global Step: 589150 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:18:20,533-Speed 2494.22 samples/sec Loss 1.4396 LearningRate 0.000104 Epoch: 28 Global Step: 589160 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:18:28,732-Speed 2498.45 samples/sec Loss 1.4432 LearningRate 0.000104 Epoch: 28 Global Step: 589170 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:18:36,928-Speed 2499.21 samples/sec Loss 1.4492 LearningRate 0.000104 Epoch: 28 Global Step: 589180 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:18:45,129-Speed 2497.67 samples/sec Loss 1.4222 LearningRate 0.000104 Epoch: 28 Global Step: 589190 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:18:53,330-Speed 2497.85 samples/sec Loss 1.5063 LearningRate 0.000104 Epoch: 28 Global Step: 589200 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:19:01,494-Speed 2508.84 samples/sec Loss 1.4738 LearningRate 0.000104 Epoch: 28 Global Step: 589210 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:19:09,696-Speed 2497.62 samples/sec Loss 1.4081 LearningRate 0.000104 Epoch: 28 Global Step: 589220 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:19:17,891-Speed 2499.41 samples/sec Loss 1.4186 LearningRate 0.000104 Epoch: 28 Global Step: 589230 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:19:26,090-Speed 2498.29 samples/sec Loss 1.4686 LearningRate 0.000104 Epoch: 28 Global Step: 589240 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:19:34,295-Speed 2496.56 samples/sec Loss 1.4477 LearningRate 0.000104 Epoch: 28 Global Step: 589250 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:19:42,494-Speed 2498.16 samples/sec Loss 1.4391 LearningRate 0.000104 Epoch: 28 Global Step: 589260 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:19:50,641-Speed 2514.36 samples/sec Loss 1.4554 LearningRate 0.000104 Epoch: 28 Global Step: 589270 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:19:58,840-Speed 2498.09 samples/sec Loss 1.4705 LearningRate 0.000104 Epoch: 28 Global Step: 589280 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:20:07,042-Speed 2497.31 samples/sec Loss 1.4427 LearningRate 0.000104 Epoch: 28 Global Step: 589290 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:20:15,244-Speed 2497.38 samples/sec Loss 1.4931 LearningRate 0.000104 Epoch: 28 Global Step: 589300 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:20:23,460-Speed 2493.22 samples/sec Loss 1.4528 LearningRate 0.000104 Epoch: 28 Global Step: 589310 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:20:31,662-Speed 2497.29 samples/sec Loss 1.4969 LearningRate 0.000104 Epoch: 28 Global Step: 589320 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:20:39,811-Speed 2513.50 samples/sec Loss 1.4515 LearningRate 0.000104 Epoch: 28 Global Step: 589330 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:20:48,022-Speed 2494.61 samples/sec Loss 1.4279 LearningRate 0.000104 Epoch: 28 Global Step: 589340 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:20:56,227-Speed 2496.63 samples/sec Loss 1.4416 LearningRate 0.000104 Epoch: 28 Global Step: 589350 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:21:04,445-Speed 2492.33 samples/sec Loss 1.4580 LearningRate 0.000104 Epoch: 28 Global Step: 589360 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:21:12,649-Speed 2496.81 samples/sec Loss 1.4673 LearningRate 0.000103 Epoch: 28 Global Step: 589370 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:21:20,852-Speed 2497.02 samples/sec Loss 1.4189 LearningRate 0.000103 Epoch: 28 Global Step: 589380 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:21:29,002-Speed 2513.11 samples/sec Loss 1.4549 LearningRate 0.000103 Epoch: 28 Global Step: 589390 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:21:37,204-Speed 2497.55 samples/sec Loss 1.4267 LearningRate 0.000103 Epoch: 28 Global Step: 589400 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:21:45,405-Speed 2498.00 samples/sec Loss 1.4296 LearningRate 0.000103 Epoch: 28 Global Step: 589410 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-07-11 05:21:53,618-Speed 2493.89 samples/sec Loss 1.4409 LearningRate 0.000103 Epoch: 28 Global Step: 589420 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:22:01,821-Speed 2497.03 samples/sec Loss 1.4394 LearningRate 0.000103 Epoch: 28 Global Step: 589430 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:22:10,027-Speed 2495.95 samples/sec Loss 1.4490 LearningRate 0.000103 Epoch: 28 Global Step: 589440 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:22:18,178-Speed 2513.01 samples/sec Loss 1.4479 LearningRate 0.000103 Epoch: 28 Global Step: 589450 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:22:26,380-Speed 2497.28 samples/sec Loss 1.4308 LearningRate 0.000103 Epoch: 28 Global Step: 589460 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:22:34,584-Speed 2496.83 samples/sec Loss 1.4670 LearningRate 0.000103 Epoch: 28 Global Step: 589470 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:22:42,786-Speed 2497.20 samples/sec Loss 1.4274 LearningRate 0.000103 Epoch: 28 Global Step: 589480 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:22:51,004-Speed 2492.47 samples/sec Loss 1.4555 LearningRate 0.000103 Epoch: 28 Global Step: 589490 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:22:59,234-Speed 2488.92 samples/sec Loss 1.4779 LearningRate 0.000103 Epoch: 28 Global Step: 589500 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:23:07,378-Speed 2515.01 samples/sec Loss 1.4617 LearningRate 0.000103 Epoch: 28 Global Step: 589510 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:23:15,594-Speed 2493.21 samples/sec Loss 1.4887 LearningRate 0.000103 Epoch: 28 Global Step: 589520 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:23:23,797-Speed 2496.87 samples/sec Loss 1.4532 LearningRate 0.000103 Epoch: 28 Global Step: 589530 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-07-11 05:23:32,002-Speed 2496.91 samples/sec Loss 1.4159 LearningRate 0.000103 Epoch: 28 Global Step: 589540 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:23:40,214-Speed 2494.25 samples/sec Loss 1.4409 LearningRate 0.000103 Epoch: 28 Global Step: 589550 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:23:48,422-Speed 2495.93 samples/sec Loss 1.4529 LearningRate 0.000103 Epoch: 28 Global Step: 589560 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:23:56,577-Speed 2511.80 samples/sec Loss 1.4653 LearningRate 0.000103 Epoch: 28 Global Step: 589570 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:24:04,780-Speed 2497.13 samples/sec Loss 1.4244 LearningRate 0.000103 Epoch: 28 Global Step: 589580 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:24:12,981-Speed 2497.61 samples/sec Loss 1.4846 LearningRate 0.000103 Epoch: 28 Global Step: 589590 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:24:21,187-Speed 2496.27 samples/sec Loss 1.4581 LearningRate 0.000103 Epoch: 28 Global Step: 589600 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:24:29,391-Speed 2496.76 samples/sec Loss 1.4407 LearningRate 0.000103 Epoch: 28 Global Step: 589610 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:24:37,592-Speed 2497.55 samples/sec Loss 1.4522 LearningRate 0.000103 Epoch: 28 Global Step: 589620 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:24:45,756-Speed 2509.24 samples/sec Loss 1.4510 LearningRate 0.000103 Epoch: 28 Global Step: 589630 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:24:53,957-Speed 2497.40 samples/sec Loss 1.4409 LearningRate 0.000103 Epoch: 28 Global Step: 589640 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:25:02,160-Speed 2497.15 samples/sec Loss 1.4480 LearningRate 0.000103 Epoch: 28 Global Step: 589650 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-07-11 05:25:10,364-Speed 2497.03 samples/sec Loss 1.4141 LearningRate 0.000103 Epoch: 28 Global Step: 589660 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:25:18,578-Speed 2493.74 samples/sec Loss 1.4352 LearningRate 0.000103 Epoch: 28 Global Step: 589670 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:25:26,783-Speed 2496.57 samples/sec Loss 1.4567 LearningRate 0.000103 Epoch: 28 Global Step: 589680 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:25:34,935-Speed 2512.65 samples/sec Loss 1.4502 LearningRate 0.000103 Epoch: 28 Global Step: 589690 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:25:43,140-Speed 2496.24 samples/sec Loss 1.4504 LearningRate 0.000103 Epoch: 28 Global Step: 589700 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:25:51,338-Speed 2498.61 samples/sec Loss 1.4497 LearningRate 0.000103 Epoch: 28 Global Step: 589710 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:25:59,545-Speed 2495.95 samples/sec Loss 1.4549 LearningRate 0.000103 Epoch: 28 Global Step: 589720 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:26:07,745-Speed 2497.69 samples/sec Loss 1.4604 LearningRate 0.000103 Epoch: 28 Global Step: 589730 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:26:15,951-Speed 2496.21 samples/sec Loss 1.4296 LearningRate 0.000103 Epoch: 28 Global Step: 589740 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:26:24,098-Speed 2514.52 samples/sec Loss 1.4469 LearningRate 0.000103 Epoch: 28 Global Step: 589750 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:26:32,305-Speed 2495.81 samples/sec Loss 1.4683 LearningRate 0.000103 Epoch: 28 Global Step: 589760 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:26:40,507-Speed 2497.35 samples/sec Loss 1.4112 LearningRate 0.000103 Epoch: 28 Global Step: 589770 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-07-11 05:26:48,717-Speed 2494.79 samples/sec Loss 1.4365 LearningRate 0.000103 Epoch: 28 Global Step: 589780 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:26:56,926-Speed 2495.66 samples/sec Loss 1.4094 LearningRate 0.000103 Epoch: 28 Global Step: 589790 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:27:05,129-Speed 2496.87 samples/sec Loss 1.4910 LearningRate 0.000103 Epoch: 28 Global Step: 589800 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:27:13,280-Speed 2512.93 samples/sec Loss 1.4675 LearningRate 0.000103 Epoch: 28 Global Step: 589810 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:27:21,481-Speed 2497.89 samples/sec Loss 1.4481 LearningRate 0.000103 Epoch: 28 Global Step: 589820 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:27:29,683-Speed 2497.35 samples/sec Loss 1.4549 LearningRate 0.000103 Epoch: 28 Global Step: 589830 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:27:37,887-Speed 2496.79 samples/sec Loss 1.4426 LearningRate 0.000103 Epoch: 28 Global Step: 589840 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:27:46,097-Speed 2495.22 samples/sec Loss 1.4308 LearningRate 0.000103 Epoch: 28 Global Step: 589850 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:27:54,298-Speed 2497.43 samples/sec Loss 1.4455 LearningRate 0.000103 Epoch: 28 Global Step: 589860 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:28:02,448-Speed 2513.30 samples/sec Loss 1.4994 LearningRate 0.000103 Epoch: 28 Global Step: 589870 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:28:10,658-Speed 2494.85 samples/sec Loss 1.4276 LearningRate 0.000103 Epoch: 28 Global Step: 589880 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:28:18,862-Speed 2496.86 samples/sec Loss 1.4451 LearningRate 0.000103 Epoch: 28 Global Step: 589890 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-07-11 05:28:27,067-Speed 2496.42 samples/sec Loss 1.4312 LearningRate 0.000103 Epoch: 28 Global Step: 589900 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:28:35,272-Speed 2496.27 samples/sec Loss 1.4436 LearningRate 0.000103 Epoch: 28 Global Step: 589910 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:28:43,480-Speed 2495.76 samples/sec Loss 1.4611 LearningRate 0.000103 Epoch: 28 Global Step: 589920 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:28:51,637-Speed 2510.92 samples/sec Loss 1.4396 LearningRate 0.000103 Epoch: 28 Global Step: 589930 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-07-11 05:28:59,796-Speed 2510.60 samples/sec Loss 1.4492 LearningRate 0.000103 Epoch: 28 Global Step: 589940 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:29:07,997-Speed 2497.75 samples/sec Loss 1.4228 LearningRate 0.000103 Epoch: 28 Global Step: 589950 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:29:16,197-Speed 2497.96 samples/sec Loss 1.4295 LearningRate 0.000103 Epoch: 28 Global Step: 589960 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:29:24,397-Speed 2498.03 samples/sec Loss 1.4472 LearningRate 0.000103 Epoch: 28 Global Step: 589970 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:29:32,600-Speed 2497.06 samples/sec Loss 1.4253 LearningRate 0.000103 Epoch: 28 Global Step: 589980 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:29:40,751-Speed 2512.96 samples/sec Loss 1.4807 LearningRate 0.000103 Epoch: 28 Global Step: 589990 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:29:48,952-Speed 2497.70 samples/sec Loss 1.4172 LearningRate 0.000103 Epoch: 28 Global Step: 590000 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:29:57,153-Speed 2497.41 samples/sec Loss 1.4446 LearningRate 0.000103 Epoch: 28 Global Step: 590010 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:30:05,358-Speed 2496.47 samples/sec Loss 1.4508 LearningRate 0.000103 Epoch: 28 Global Step: 590020 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:30:13,566-Speed 2495.73 samples/sec Loss 1.4671 LearningRate 0.000103 Epoch: 28 Global Step: 590030 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:30:21,765-Speed 2498.05 samples/sec Loss 1.4319 LearningRate 0.000103 Epoch: 28 Global Step: 590040 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:30:29,921-Speed 2512.13 samples/sec Loss 1.4531 LearningRate 0.000103 Epoch: 28 Global Step: 590050 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:30:38,138-Speed 2492.39 samples/sec Loss 1.4535 LearningRate 0.000103 Epoch: 28 Global Step: 590060 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:30:46,340-Speed 2497.37 samples/sec Loss 1.4727 LearningRate 0.000103 Epoch: 28 Global Step: 590070 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:30:54,550-Speed 2494.99 samples/sec Loss 1.4486 LearningRate 0.000103 Epoch: 28 Global Step: 590080 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:31:02,754-Speed 2496.88 samples/sec Loss 1.4656 LearningRate 0.000103 Epoch: 28 Global Step: 590090 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:31:10,954-Speed 2498.00 samples/sec Loss 1.4912 LearningRate 0.000103 Epoch: 28 Global Step: 590100 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:31:19,107-Speed 2512.32 samples/sec Loss 1.4868 LearningRate 0.000103 Epoch: 28 Global Step: 590110 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:31:27,324-Speed 2492.92 samples/sec Loss 1.4454 LearningRate 0.000103 Epoch: 28 Global Step: 590120 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:31:35,529-Speed 2496.39 samples/sec Loss 1.4543 LearningRate 0.000103 Epoch: 28 Global Step: 590130 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:31:43,728-Speed 2498.22 samples/sec Loss 1.4778 LearningRate 0.000103 Epoch: 28 Global Step: 590140 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:31:51,928-Speed 2498.09 samples/sec Loss 1.4622 LearningRate 0.000103 Epoch: 28 Global Step: 590150 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:32:00,127-Speed 2497.93 samples/sec Loss 1.4456 LearningRate 0.000103 Epoch: 28 Global Step: 590160 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:32:08,280-Speed 2512.40 samples/sec Loss 1.4893 LearningRate 0.000103 Epoch: 28 Global Step: 590170 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:32:16,483-Speed 2496.91 samples/sec Loss 1.4445 LearningRate 0.000103 Epoch: 28 Global Step: 590180 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:32:24,685-Speed 2498.16 samples/sec Loss 1.4540 LearningRate 0.000103 Epoch: 28 Global Step: 590190 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:32:32,890-Speed 2496.35 samples/sec Loss 1.4686 LearningRate 0.000103 Epoch: 28 Global Step: 590200 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:32:41,095-Speed 2496.67 samples/sec Loss 1.4453 LearningRate 0.000103 Epoch: 28 Global Step: 590210 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:32:49,312-Speed 2492.90 samples/sec Loss 1.4529 LearningRate 0.000103 Epoch: 28 Global Step: 590220 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:32:57,462-Speed 2513.26 samples/sec Loss 1.4957 LearningRate 0.000103 Epoch: 28 Global Step: 590230 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:33:05,665-Speed 2497.05 samples/sec Loss 1.4807 LearningRate 0.000103 Epoch: 28 Global Step: 590240 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:33:13,870-Speed 2496.97 samples/sec Loss 1.4476 LearningRate 0.000103 Epoch: 28 Global Step: 590250 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:33:22,072-Speed 2497.58 samples/sec Loss 1.4541 LearningRate 0.000103 Epoch: 28 Global Step: 590260 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:33:30,271-Speed 2497.98 samples/sec Loss 1.4953 LearningRate 0.000103 Epoch: 28 Global Step: 590270 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:33:38,478-Speed 2495.99 samples/sec Loss 1.4390 LearningRate 0.000103 Epoch: 28 Global Step: 590280 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:33:46,630-Speed 2512.74 samples/sec Loss 1.4715 LearningRate 0.000103 Epoch: 28 Global Step: 590290 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:33:54,836-Speed 2495.89 samples/sec Loss 1.4373 LearningRate 0.000103 Epoch: 28 Global Step: 590300 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:34:03,038-Speed 2497.56 samples/sec Loss 1.4818 LearningRate 0.000103 Epoch: 28 Global Step: 590310 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:34:11,239-Speed 2497.52 samples/sec Loss 1.4523 LearningRate 0.000103 Epoch: 28 Global Step: 590320 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:34:19,444-Speed 2496.57 samples/sec Loss 1.4705 LearningRate 0.000103 Epoch: 28 Global Step: 590330 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:34:27,649-Speed 2496.46 samples/sec Loss 1.4729 LearningRate 0.000103 Epoch: 28 Global Step: 590340 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:34:35,796-Speed 2514.07 samples/sec Loss 1.4570 LearningRate 0.000103 Epoch: 28 Global Step: 590350 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:34:43,999-Speed 2497.06 samples/sec Loss 1.4894 LearningRate 0.000103 Epoch: 28 Global Step: 590360 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:34:52,203-Speed 2496.79 samples/sec Loss 1.4276 LearningRate 0.000103 Epoch: 28 Global Step: 590370 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:35:00,416-Speed 2493.96 samples/sec Loss 1.4931 LearningRate 0.000103 Epoch: 28 Global Step: 590380 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:35:08,619-Speed 2497.09 samples/sec Loss 1.4598 LearningRate 0.000103 Epoch: 28 Global Step: 590390 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:35:16,822-Speed 2497.22 samples/sec Loss 1.4157 LearningRate 0.000103 Epoch: 28 Global Step: 590400 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:35:24,966-Speed 2515.03 samples/sec Loss 1.4634 LearningRate 0.000103 Epoch: 28 Global Step: 590410 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:35:33,167-Speed 2498.01 samples/sec Loss 1.4405 LearningRate 0.000103 Epoch: 28 Global Step: 590420 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:35:41,364-Speed 2498.96 samples/sec Loss 1.4518 LearningRate 0.000103 Epoch: 28 Global Step: 590430 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:35:49,564-Speed 2498.07 samples/sec Loss 1.4464 LearningRate 0.000103 Epoch: 28 Global Step: 590440 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:35:57,767-Speed 2497.37 samples/sec Loss 1.4941 LearningRate 0.000103 Epoch: 28 Global Step: 590450 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:36:05,964-Speed 2498.77 samples/sec Loss 1.4339 LearningRate 0.000103 Epoch: 28 Global Step: 590460 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:36:14,113-Speed 2513.78 samples/sec Loss 1.4968 LearningRate 0.000103 Epoch: 28 Global Step: 590470 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:36:22,316-Speed 2496.74 samples/sec Loss 1.4373 LearningRate 0.000103 Epoch: 28 Global Step: 590480 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:36:30,514-Speed 2498.61 samples/sec Loss 1.4438 LearningRate 0.000103 Epoch: 28 Global Step: 590490 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:36:38,726-Speed 2494.36 samples/sec Loss 1.4727 LearningRate 0.000103 Epoch: 28 Global Step: 590500 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:36:46,938-Speed 2494.61 samples/sec Loss 1.4530 LearningRate 0.000103 Epoch: 28 Global Step: 590510 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:36:55,138-Speed 2497.93 samples/sec Loss 1.4510 LearningRate 0.000103 Epoch: 28 Global Step: 590520 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:37:03,284-Speed 2514.34 samples/sec Loss 1.4677 LearningRate 0.000103 Epoch: 28 Global Step: 590530 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:37:11,480-Speed 2499.49 samples/sec Loss 1.4317 LearningRate 0.000102 Epoch: 28 Global Step: 590540 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:37:19,683-Speed 2497.21 samples/sec Loss 1.4276 LearningRate 0.000102 Epoch: 28 Global Step: 590550 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:37:27,880-Speed 2498.75 samples/sec Loss 1.4654 LearningRate 0.000102 Epoch: 28 Global Step: 590560 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:37:36,083-Speed 2498.01 samples/sec Loss 1.4082 LearningRate 0.000102 Epoch: 28 Global Step: 590570 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:37:44,284-Speed 2497.81 samples/sec Loss 1.4766 LearningRate 0.000102 Epoch: 28 Global Step: 590580 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:37:52,444-Speed 2510.30 samples/sec Loss 1.4590 LearningRate 0.000102 Epoch: 28 Global Step: 590590 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:38:00,644-Speed 2497.86 samples/sec Loss 1.4569 LearningRate 0.000102 Epoch: 28 Global Step: 590600 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:38:08,859-Speed 2493.12 samples/sec Loss 1.4808 LearningRate 0.000102 Epoch: 28 Global Step: 590610 Fp16 Grad Scale: 16384 Required: 55 hours 
Training: 2022-07-11 05:38:17,071-Speed 2494.25 samples/sec Loss 1.4125 LearningRate 0.000102 Epoch: 28 Global Step: 590620 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:38:25,280-Speed 2495.39 samples/sec Loss 1.4407 LearningRate 0.000102 Epoch: 28 Global Step: 590630 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:38:33,484-Speed 2496.77 samples/sec Loss 1.4356 LearningRate 0.000102 Epoch: 28 Global Step: 590640 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:38:41,636-Speed 2512.67 samples/sec Loss 1.4516 LearningRate 0.000102 Epoch: 28 Global Step: 590650 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:38:49,841-Speed 2496.67 samples/sec Loss 1.4925 LearningRate 0.000102 Epoch: 28 Global Step: 590660 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:38:58,040-Speed 2498.06 samples/sec Loss 1.4353 LearningRate 0.000102 Epoch: 28 Global Step: 590670 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:39:06,244-Speed 2496.93 samples/sec Loss 1.4522 LearningRate 0.000102 Epoch: 28 Global Step: 590680 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-07-11 05:39:14,402-Speed 2510.69 samples/sec Loss 1.4753 LearningRate 0.000102 Epoch: 28 Global Step: 590690 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:39:22,602-Speed 2498.11 samples/sec Loss 1.4789 LearningRate 0.000102 Epoch: 28 Global Step: 590700 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:39:30,749-Speed 2514.04 samples/sec Loss 1.4290 LearningRate 0.000102 Epoch: 28 Global Step: 590710 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:39:38,950-Speed 2497.80 samples/sec Loss 1.4735 LearningRate 0.000102 Epoch: 28 Global Step: 590720 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:39:47,149-Speed 2498.12 samples/sec Loss 1.4139 LearningRate 0.000102 Epoch: 28 Global Step: 590730 Fp16 Grad Scale: 8192 Required: 55 hours Training: 
2022-07-11 05:39:55,347-Speed 2498.53 samples/sec Loss 1.4766 LearningRate 0.000102 Epoch: 28 Global Step: 590740 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:40:03,544-Speed 2498.97 samples/sec Loss 1.4517 LearningRate 0.000102 Epoch: 28 Global Step: 590750 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:40:11,750-Speed 2495.85 samples/sec Loss 1.5168 LearningRate 0.000102 Epoch: 28 Global Step: 590760 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:40:19,895-Speed 2515.04 samples/sec Loss 1.4505 LearningRate 0.000102 Epoch: 28 Global Step: 590770 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:40:28,095-Speed 2497.94 samples/sec Loss 1.4374 LearningRate 0.000102 Epoch: 28 Global Step: 590780 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:40:36,298-Speed 2496.90 samples/sec Loss 1.4481 LearningRate 0.000102 Epoch: 28 Global Step: 590790 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:40:44,500-Speed 2497.23 samples/sec Loss 1.4574 LearningRate 0.000102 Epoch: 28 Global Step: 590800 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:40:52,712-Speed 2494.27 samples/sec Loss 1.4771 LearningRate 0.000102 Epoch: 28 Global Step: 590810 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:41:00,911-Speed 2498.09 samples/sec Loss 1.4270 LearningRate 0.000102 Epoch: 28 Global Step: 590820 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:41:09,053-Speed 2515.89 samples/sec Loss 1.4412 LearningRate 0.000102 Epoch: 28 Global Step: 590830 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:41:17,248-Speed 2499.22 samples/sec Loss 1.4464 LearningRate 0.000102 Epoch: 28 Global Step: 590840 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:41:25,448-Speed 2497.97 samples/sec Loss 1.4549 LearningRate 0.000102 Epoch: 28 Global Step: 590850 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 
05:41:33,641-Speed 2499.92 samples/sec Loss 1.4448 LearningRate 0.000102 Epoch: 28 Global Step: 590860 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:41:41,838-Speed 2498.80 samples/sec Loss 1.4482 LearningRate 0.000102 Epoch: 28 Global Step: 590870 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:41:50,035-Speed 2498.82 samples/sec Loss 1.4673 LearningRate 0.000102 Epoch: 28 Global Step: 590880 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:41:58,181-Speed 2514.65 samples/sec Loss 1.4574 LearningRate 0.000102 Epoch: 28 Global Step: 590890 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:42:06,395-Speed 2493.64 samples/sec Loss 1.4453 LearningRate 0.000102 Epoch: 28 Global Step: 590900 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:42:14,598-Speed 2497.29 samples/sec Loss 1.4408 LearningRate 0.000102 Epoch: 28 Global Step: 590910 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:42:22,797-Speed 2498.24 samples/sec Loss 1.4360 LearningRate 0.000102 Epoch: 28 Global Step: 590920 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:42:31,000-Speed 2497.32 samples/sec Loss 1.4648 LearningRate 0.000102 Epoch: 28 Global Step: 590930 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:42:39,207-Speed 2495.80 samples/sec Loss 1.4284 LearningRate 0.000102 Epoch: 28 Global Step: 590940 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:42:47,357-Speed 2513.27 samples/sec Loss 1.4360 LearningRate 0.000102 Epoch: 28 Global Step: 590950 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:42:55,551-Speed 2499.68 samples/sec Loss 1.4373 LearningRate 0.000102 Epoch: 28 Global Step: 590960 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:43:03,747-Speed 2499.11 samples/sec Loss 1.4379 LearningRate 0.000102 Epoch: 28 Global Step: 590970 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:43:11,959-Speed 
2494.72 samples/sec Loss 1.4416 LearningRate 0.000102 Epoch: 28 Global Step: 590980 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:43:20,157-Speed 2498.24 samples/sec Loss 1.4149 LearningRate 0.000102 Epoch: 28 Global Step: 590990 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:43:28,367-Speed 2494.89 samples/sec Loss 1.4160 LearningRate 0.000102 Epoch: 28 Global Step: 591000 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:43:36,512-Speed 2514.98 samples/sec Loss 1.4543 LearningRate 0.000102 Epoch: 28 Global Step: 591010 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:43:44,712-Speed 2497.61 samples/sec Loss 1.4594 LearningRate 0.000102 Epoch: 28 Global Step: 591020 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:43:52,928-Speed 2493.17 samples/sec Loss 1.4317 LearningRate 0.000102 Epoch: 28 Global Step: 591030 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:44:01,132-Speed 2496.94 samples/sec Loss 1.4363 LearningRate 0.000102 Epoch: 28 Global Step: 591040 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:44:09,346-Speed 2493.57 samples/sec Loss 1.4498 LearningRate 0.000102 Epoch: 28 Global Step: 591050 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:44:17,546-Speed 2497.86 samples/sec Loss 1.4346 LearningRate 0.000102 Epoch: 28 Global Step: 591060 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:44:25,690-Speed 2515.10 samples/sec Loss 1.4478 LearningRate 0.000102 Epoch: 28 Global Step: 591070 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:44:33,891-Speed 2497.91 samples/sec Loss 1.4371 LearningRate 0.000102 Epoch: 28 Global Step: 591080 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:44:42,092-Speed 2497.68 samples/sec Loss 1.4351 LearningRate 0.000102 Epoch: 28 Global Step: 591090 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:44:50,289-Speed 2498.61 samples/sec 
Loss 1.4289 LearningRate 0.000102 Epoch: 28 Global Step: 591100 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:44:58,493-Speed 2496.63 samples/sec Loss 1.4383 LearningRate 0.000102 Epoch: 28 Global Step: 591110 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:45:06,695-Speed 2497.29 samples/sec Loss 1.4248 LearningRate 0.000102 Epoch: 28 Global Step: 591120 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:45:14,849-Speed 2512.02 samples/sec Loss 1.4611 LearningRate 0.000102 Epoch: 28 Global Step: 591130 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:45:23,058-Speed 2495.73 samples/sec Loss 1.4227 LearningRate 0.000102 Epoch: 28 Global Step: 591140 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:45:31,261-Speed 2497.04 samples/sec Loss 1.4172 LearningRate 0.000102 Epoch: 28 Global Step: 591150 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:45:39,465-Speed 2496.77 samples/sec Loss 1.4719 LearningRate 0.000102 Epoch: 28 Global Step: 591160 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:45:47,664-Speed 2498.01 samples/sec Loss 1.4837 LearningRate 0.000102 Epoch: 28 Global Step: 591170 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:45:55,865-Speed 2497.85 samples/sec Loss 1.4568 LearningRate 0.000102 Epoch: 28 Global Step: 591180 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:46:04,014-Speed 2513.57 samples/sec Loss 1.4397 LearningRate 0.000102 Epoch: 28 Global Step: 591190 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:46:12,213-Speed 2498.22 samples/sec Loss 1.4196 LearningRate 0.000102 Epoch: 28 Global Step: 591200 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:46:20,412-Speed 2497.97 samples/sec Loss 1.4394 LearningRate 0.000102 Epoch: 28 Global Step: 591210 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:46:28,613-Speed 2497.48 samples/sec Loss 1.4387 
LearningRate 0.000102 Epoch: 28 Global Step: 591220 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:46:36,816-Speed 2497.36 samples/sec Loss 1.3947 LearningRate 0.000102 Epoch: 28 Global Step: 591230 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:46:45,018-Speed 2497.24 samples/sec Loss 1.4447 LearningRate 0.000102 Epoch: 28 Global Step: 591240 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:46:53,164-Speed 2514.55 samples/sec Loss 1.4626 LearningRate 0.000102 Epoch: 28 Global Step: 591250 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:47:01,363-Speed 2498.27 samples/sec Loss 1.4521 LearningRate 0.000102 Epoch: 28 Global Step: 591260 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:47:09,563-Speed 2498.23 samples/sec Loss 1.4566 LearningRate 0.000102 Epoch: 28 Global Step: 591270 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:47:17,764-Speed 2497.82 samples/sec Loss 1.4340 LearningRate 0.000102 Epoch: 28 Global Step: 591280 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:47:25,968-Speed 2496.44 samples/sec Loss 1.4406 LearningRate 0.000102 Epoch: 28 Global Step: 591290 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:47:34,168-Speed 2498.12 samples/sec Loss 1.4389 LearningRate 0.000102 Epoch: 28 Global Step: 591300 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:47:42,313-Speed 2514.75 samples/sec Loss 1.4282 LearningRate 0.000102 Epoch: 28 Global Step: 591310 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:47:50,511-Speed 2498.81 samples/sec Loss 1.4236 LearningRate 0.000102 Epoch: 28 Global Step: 591320 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:47:58,712-Speed 2497.45 samples/sec Loss 1.4351 LearningRate 0.000102 Epoch: 28 Global Step: 591330 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:48:06,921-Speed 2495.15 samples/sec Loss 1.4162 LearningRate 
0.000102 Epoch: 28 Global Step: 591340 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:48:15,116-Speed 2499.50 samples/sec Loss 1.4296 LearningRate 0.000102 Epoch: 28 Global Step: 591350 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:48:23,312-Speed 2499.05 samples/sec Loss 1.4237 LearningRate 0.000102 Epoch: 28 Global Step: 591360 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:48:31,457-Speed 2515.50 samples/sec Loss 1.4571 LearningRate 0.000102 Epoch: 28 Global Step: 591370 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:48:39,656-Speed 2498.37 samples/sec Loss 1.4528 LearningRate 0.000102 Epoch: 28 Global Step: 591380 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:48:47,854-Speed 2498.80 samples/sec Loss 1.4590 LearningRate 0.000102 Epoch: 28 Global Step: 591390 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:48:56,079-Speed 2490.20 samples/sec Loss 1.4039 LearningRate 0.000102 Epoch: 28 Global Step: 591400 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:49:04,277-Speed 2498.42 samples/sec Loss 1.4834 LearningRate 0.000102 Epoch: 28 Global Step: 591410 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:49:12,483-Speed 2496.25 samples/sec Loss 1.4457 LearningRate 0.000102 Epoch: 28 Global Step: 591420 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:49:20,629-Speed 2514.39 samples/sec Loss 1.4590 LearningRate 0.000102 Epoch: 28 Global Step: 591430 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:49:28,827-Speed 2498.56 samples/sec Loss 1.4313 LearningRate 0.000102 Epoch: 28 Global Step: 591440 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:49:37,028-Speed 2497.70 samples/sec Loss 1.4278 LearningRate 0.000102 Epoch: 28 Global Step: 591450 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:49:45,227-Speed 2498.36 samples/sec Loss 1.4327 LearningRate 0.000102 Epoch: 28 
Global Step: 591460 Fp16 Grad Scale: 8192 Required: 55 hours Training: 2022-07-11 05:49:53,426-Speed 2498.31 samples/sec Loss 1.4563 LearningRate 0.000102 Epoch: 28 Global Step: 591470 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:50:01,628-Speed 2497.19 samples/sec Loss 1.4670 LearningRate 0.000102 Epoch: 28 Global Step: 591480 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:50:09,778-Speed 2513.28 samples/sec Loss 1.4779 LearningRate 0.000102 Epoch: 28 Global Step: 591490 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:50:17,976-Speed 2499.07 samples/sec Loss 1.4354 LearningRate 0.000102 Epoch: 28 Global Step: 591500 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:50:26,172-Speed 2499.02 samples/sec Loss 1.4549 LearningRate 0.000102 Epoch: 28 Global Step: 591510 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:50:34,371-Speed 2498.30 samples/sec Loss 1.4278 LearningRate 0.000102 Epoch: 28 Global Step: 591520 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:50:42,570-Speed 2498.22 samples/sec Loss 1.4473 LearningRate 0.000102 Epoch: 28 Global Step: 591530 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:50:50,770-Speed 2497.70 samples/sec Loss 1.4614 LearningRate 0.000102 Epoch: 28 Global Step: 591540 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:50:58,919-Speed 2513.77 samples/sec Loss 1.4591 LearningRate 0.000102 Epoch: 28 Global Step: 591550 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:51:07,121-Speed 2497.14 samples/sec Loss 1.4449 LearningRate 0.000102 Epoch: 28 Global Step: 591560 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:51:15,316-Speed 2499.77 samples/sec Loss 1.4379 LearningRate 0.000102 Epoch: 28 Global Step: 591570 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:51:23,515-Speed 2498.11 samples/sec Loss 1.3904 LearningRate 0.000102 Epoch: 28 Global Step: 591580 
Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:51:31,713-Speed 2498.70 samples/sec Loss 1.4722 LearningRate 0.000102 Epoch: 28 Global Step: 591590 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:51:39,910-Speed 2498.59 samples/sec Loss 1.4357 LearningRate 0.000102 Epoch: 28 Global Step: 591600 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:51:48,059-Speed 2513.81 samples/sec Loss 1.4425 LearningRate 0.000102 Epoch: 28 Global Step: 591610 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:51:56,275-Speed 2493.00 samples/sec Loss 1.4448 LearningRate 0.000102 Epoch: 28 Global Step: 591620 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:52:04,477-Speed 2497.24 samples/sec Loss 1.4401 LearningRate 0.000102 Epoch: 28 Global Step: 591630 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:52:12,679-Speed 2497.26 samples/sec Loss 1.4736 LearningRate 0.000102 Epoch: 28 Global Step: 591640 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:52:20,882-Speed 2497.03 samples/sec Loss 1.4497 LearningRate 0.000102 Epoch: 28 Global Step: 591650 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:52:29,086-Speed 2496.80 samples/sec Loss 1.4460 LearningRate 0.000102 Epoch: 28 Global Step: 591660 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:52:37,232-Speed 2514.39 samples/sec Loss 1.4337 LearningRate 0.000102 Epoch: 28 Global Step: 591670 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:52:45,433-Speed 2497.78 samples/sec Loss 1.4742 LearningRate 0.000102 Epoch: 28 Global Step: 591680 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:52:53,641-Speed 2495.34 samples/sec Loss 1.4665 LearningRate 0.000102 Epoch: 28 Global Step: 591690 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:53:01,840-Speed 2498.26 samples/sec Loss 1.4529 LearningRate 0.000101 Epoch: 28 Global Step: 591700 Fp16 Grad Scale: 
8192 Required: 54 hours Training: 2022-07-11 05:53:10,039-Speed 2498.12 samples/sec Loss 1.4499 LearningRate 0.000101 Epoch: 28 Global Step: 591710 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:53:18,241-Speed 2497.62 samples/sec Loss 1.4410 LearningRate 0.000101 Epoch: 28 Global Step: 591720 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:53:26,388-Speed 2514.19 samples/sec Loss 1.4997 LearningRate 0.000101 Epoch: 28 Global Step: 591730 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:53:34,647-Speed 2499.15 samples/sec Loss 1.4236 LearningRate 0.000101 Epoch: 28 Global Step: 591740 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:53:42,851-Speed 2496.75 samples/sec Loss 1.4155 LearningRate 0.000101 Epoch: 28 Global Step: 591750 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:53:51,047-Speed 2498.89 samples/sec Loss 1.4247 LearningRate 0.000101 Epoch: 28 Global Step: 591760 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:53:59,318-Speed 2499.42 samples/sec Loss 1.4340 LearningRate 0.000101 Epoch: 28 Global Step: 591770 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:54:07,560-Speed 2497.65 samples/sec Loss 1.4242 LearningRate 0.000101 Epoch: 28 Global Step: 591780 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:54:15,706-Speed 2514.53 samples/sec Loss 1.4040 LearningRate 0.000101 Epoch: 28 Global Step: 591790 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:54:23,933-Speed 2498.96 samples/sec Loss 1.4202 LearningRate 0.000101 Epoch: 28 Global Step: 591800 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:54:32,149-Speed 2501.07 samples/sec Loss 1.4317 LearningRate 0.000101 Epoch: 28 Global Step: 591810 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:54:40,349-Speed 2497.98 samples/sec Loss 1.4485 LearningRate 0.000101 Epoch: 28 Global Step: 591820 Fp16 Grad Scale: 8192 Required: 54 
hours Training: 2022-07-11 05:54:48,548-Speed 2498.00 samples/sec Loss 1.4379 LearningRate 0.000101 Epoch: 28 Global Step: 591830 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:54:56,783-Speed 2499.20 samples/sec Loss 1.4205 LearningRate 0.000101 Epoch: 28 Global Step: 591840 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:55:04,956-Speed 2516.60 samples/sec Loss 1.4224 LearningRate 0.000101 Epoch: 28 Global Step: 591850 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:55:13,168-Speed 2494.34 samples/sec Loss 1.4394 LearningRate 0.000101 Epoch: 28 Global Step: 591860 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:55:22,378-Speed 2500.37 samples/sec Loss 1.4674 LearningRate 0.000101 Epoch: 28 Global Step: 591870 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:55:30,713-Speed 2499.08 samples/sec Loss 1.4607 LearningRate 0.000101 Epoch: 28 Global Step: 591880 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-07-11 05:55:38,917-Speed 2497.26 samples/sec Loss 1.4636 LearningRate 0.000101 Epoch: 28 Global Step: 591890 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:55:47,126-Speed 2494.82 samples/sec Loss 1.4547 LearningRate 0.000101 Epoch: 28 Global Step: 591900 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:55:55,298-Speed 2509.51 samples/sec Loss 1.4404 LearningRate 0.000101 Epoch: 28 Global Step: 591910 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:56:07,697-Speed 1656.66 samples/sec Loss 1.4353 LearningRate 0.000101 Epoch: 28 Global Step: 591920 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:56:16,614-Speed 2500.15 samples/sec Loss 1.4296 LearningRate 0.000101 Epoch: 28 Global Step: 591930 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:56:24,822-Speed 2495.26 samples/sec Loss 1.4536 LearningRate 0.000101 Epoch: 28 Global Step: 591940 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 05:56:34,265-Speed 2265.16 samples/sec Loss 1.4490 LearningRate 0.000101 Epoch: 28 Global Step: 591950 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:56:42,533-Speed 2492.93 samples/sec Loss 1.4154 LearningRate 0.000101 Epoch: 28 Global Step: 591960 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:56:50,684-Speed 2513.00 samples/sec Loss 1.4344 LearningRate 0.000101 Epoch: 28 Global Step: 591970 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:57:02,696-Speed 2500.50 samples/sec Loss 1.4148 LearningRate 0.000101 Epoch: 28 Global Step: 591980 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:57:12,014-Speed 2500.31 samples/sec Loss 1.4474 LearningRate 0.000101 Epoch: 28 Global Step: 591990 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:57:20,210-Speed 2499.20 samples/sec Loss 1.4401 LearningRate 0.000101 Epoch: 28 Global Step: 592000 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:57:28,408-Speed 2498.35 samples/sec Loss 1.4493 LearningRate 0.000101 Epoch: 28 Global Step: 592010 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:57:36,610-Speed 2497.37 samples/sec Loss 1.4565 LearningRate 0.000101 Epoch: 28 Global Step: 592020 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:57:44,753-Speed 2515.43 samples/sec Loss 1.4191 LearningRate 0.000101 Epoch: 28 Global Step: 592030 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:57:52,953-Speed 2498.08 samples/sec Loss 1.4417 LearningRate 0.000101 Epoch: 28 Global Step: 592040 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:58:01,151-Speed 2498.58 samples/sec Loss 1.4691 LearningRate 0.000101 Epoch: 28 Global Step: 592050 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:58:09,357-Speed 2496.38 samples/sec Loss 1.4495 LearningRate 0.000101 Epoch: 28 Global Step: 592060 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 05:58:17,564-Speed 2496.00 samples/sec Loss 1.4498 LearningRate 0.000101 Epoch: 28 Global Step: 592070 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:58:25,764-Speed 2497.83 samples/sec Loss 1.4182 LearningRate 0.000101 Epoch: 28 Global Step: 592080 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:58:33,920-Speed 2511.45 samples/sec Loss 1.4193 LearningRate 0.000101 Epoch: 28 Global Step: 592090 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:58:42,127-Speed 2495.78 samples/sec Loss 1.4292 LearningRate 0.000101 Epoch: 28 Global Step: 592100 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:58:50,328-Speed 2497.62 samples/sec Loss 1.4435 LearningRate 0.000101 Epoch: 28 Global Step: 592110 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:58:58,536-Speed 2495.63 samples/sec Loss 1.4080 LearningRate 0.000101 Epoch: 28 Global Step: 592120 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:59:06,738-Speed 2497.10 samples/sec Loss 1.4525 LearningRate 0.000101 Epoch: 28 Global Step: 592130 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:59:14,942-Speed 2496.67 samples/sec Loss 1.4373 LearningRate 0.000101 Epoch: 28 Global Step: 592140 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:59:23,096-Speed 2512.07 samples/sec Loss 1.4279 LearningRate 0.000101 Epoch: 28 Global Step: 592150 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:59:31,297-Speed 2497.53 samples/sec Loss 1.4301 LearningRate 0.000101 Epoch: 28 Global Step: 592160 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:59:39,498-Speed 2497.71 samples/sec Loss 1.4281 LearningRate 0.000101 Epoch: 28 Global Step: 592170 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 05:59:47,697-Speed 2498.06 samples/sec Loss 1.3816 LearningRate 0.000101 Epoch: 28 Global Step: 592180 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 05:59:55,896-Speed 2498.33 samples/sec Loss 1.4424 LearningRate 0.000101 Epoch: 28 Global Step: 592190 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:00:04,098-Speed 2497.38 samples/sec Loss 1.4371 LearningRate 0.000101 Epoch: 28 Global Step: 592200 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:00:12,244-Speed 2514.48 samples/sec Loss 1.4307 LearningRate 0.000101 Epoch: 28 Global Step: 592210 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:00:20,445-Speed 2497.74 samples/sec Loss 1.4425 LearningRate 0.000101 Epoch: 28 Global Step: 592220 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:00:28,672-Speed 2490.17 samples/sec Loss 1.4707 LearningRate 0.000101 Epoch: 28 Global Step: 592230 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:00:36,873-Speed 2497.52 samples/sec Loss 1.4465 LearningRate 0.000101 Epoch: 28 Global Step: 592240 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:00:45,087-Speed 2493.70 samples/sec Loss 1.4567 LearningRate 0.000101 Epoch: 28 Global Step: 592250 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:00:53,294-Speed 2495.62 samples/sec Loss 1.4415 LearningRate 0.000101 Epoch: 28 Global Step: 592260 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:01:01,443-Speed 2513.78 samples/sec Loss 1.4561 LearningRate 0.000101 Epoch: 28 Global Step: 592270 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:01:09,657-Speed 2494.03 samples/sec Loss 1.4208 LearningRate 0.000101 Epoch: 28 Global Step: 592280 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:01:17,859-Speed 2497.10 samples/sec Loss 1.4287 LearningRate 0.000101 Epoch: 28 Global Step: 592290 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:01:26,063-Speed 2497.10 samples/sec Loss 1.4842 LearningRate 0.000101 Epoch: 28 Global Step: 592300 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 06:01:34,262-Speed 2498.13 samples/sec Loss 1.4253 LearningRate 0.000101 Epoch: 28 Global Step: 592310 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:01:42,462-Speed 2497.74 samples/sec Loss 1.4279 LearningRate 0.000101 Epoch: 28 Global Step: 592320 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:01:50,609-Speed 2514.13 samples/sec Loss 1.4229 LearningRate 0.000101 Epoch: 28 Global Step: 592330 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:01:58,812-Speed 2497.40 samples/sec Loss 1.4384 LearningRate 0.000101 Epoch: 28 Global Step: 592340 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:02:07,013-Speed 2497.64 samples/sec Loss 1.4521 LearningRate 0.000101 Epoch: 28 Global Step: 592350 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:02:15,217-Speed 2496.83 samples/sec Loss 1.4622 LearningRate 0.000101 Epoch: 28 Global Step: 592360 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:02:23,416-Speed 2498.22 samples/sec Loss 1.4281 LearningRate 0.000101 Epoch: 28 Global Step: 592370 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:02:31,615-Speed 2498.17 samples/sec Loss 1.4246 LearningRate 0.000101 Epoch: 28 Global Step: 592380 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:02:39,755-Speed 2516.36 samples/sec Loss 1.4338 LearningRate 0.000101 Epoch: 28 Global Step: 592390 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:02:47,953-Speed 2498.60 samples/sec Loss 1.4472 LearningRate 0.000101 Epoch: 28 Global Step: 592400 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:02:56,153-Speed 2497.93 samples/sec Loss 1.3960 LearningRate 0.000101 Epoch: 28 Global Step: 592410 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:03:04,349-Speed 2499.22 samples/sec Loss 1.4350 LearningRate 0.000101 Epoch: 28 Global Step: 592420 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 06:03:12,561-Speed 2494.23 samples/sec Loss 1.4252 LearningRate 0.000101 Epoch: 28 Global Step: 592430 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:03:20,761-Speed 2497.91 samples/sec Loss 1.4397 LearningRate 0.000101 Epoch: 28 Global Step: 592440 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:03:28,909-Speed 2514.10 samples/sec Loss 1.4834 LearningRate 0.000101 Epoch: 28 Global Step: 592450 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:03:37,109-Speed 2497.66 samples/sec Loss 1.4410 LearningRate 0.000101 Epoch: 28 Global Step: 592460 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:03:45,309-Speed 2497.78 samples/sec Loss 1.4506 LearningRate 0.000101 Epoch: 28 Global Step: 592470 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:03:53,510-Speed 2497.75 samples/sec Loss 1.4334 LearningRate 0.000101 Epoch: 28 Global Step: 592480 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:04:01,710-Speed 2497.88 samples/sec Loss 1.3965 LearningRate 0.000101 Epoch: 28 Global Step: 592490 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:04:09,910-Speed 2497.95 samples/sec Loss 1.4284 LearningRate 0.000101 Epoch: 28 Global Step: 592500 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:04:18,057-Speed 2514.07 samples/sec Loss 1.4363 LearningRate 0.000101 Epoch: 28 Global Step: 592510 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:04:26,264-Speed 2495.69 samples/sec Loss 1.4275 LearningRate 0.000101 Epoch: 28 Global Step: 592520 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:04:34,470-Speed 2496.22 samples/sec Loss 1.4196 LearningRate 0.000101 Epoch: 28 Global Step: 592530 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:04:42,678-Speed 2495.56 samples/sec Loss 1.4127 LearningRate 0.000101 Epoch: 28 Global Step: 592540 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 06:04:50,889-Speed 2494.33 samples/sec Loss 1.4835 LearningRate 0.000101 Epoch: 28 Global Step: 592550 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:04:59,088-Speed 2498.14 samples/sec Loss 1.4568 LearningRate 0.000101 Epoch: 28 Global Step: 592560 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:05:07,243-Speed 2511.96 samples/sec Loss 1.4564 LearningRate 0.000101 Epoch: 28 Global Step: 592570 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:05:15,444-Speed 2497.51 samples/sec Loss 1.4434 LearningRate 0.000101 Epoch: 28 Global Step: 592580 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:05:23,646-Speed 2497.47 samples/sec Loss 1.4034 LearningRate 0.000101 Epoch: 28 Global Step: 592590 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:05:31,850-Speed 2496.79 samples/sec Loss 1.4440 LearningRate 0.000101 Epoch: 28 Global Step: 592600 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:05:40,054-Speed 2496.58 samples/sec Loss 1.4295 LearningRate 0.000101 Epoch: 28 Global Step: 592610 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:05:48,256-Speed 2497.16 samples/sec Loss 1.4191 LearningRate 0.000101 Epoch: 28 Global Step: 592620 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:05:56,416-Speed 2510.25 samples/sec Loss 1.4344 LearningRate 0.000101 Epoch: 28 Global Step: 592630 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:06:04,619-Speed 2497.02 samples/sec Loss 1.4301 LearningRate 0.000101 Epoch: 28 Global Step: 592640 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:06:12,818-Speed 2498.29 samples/sec Loss 1.4160 LearningRate 0.000101 Epoch: 28 Global Step: 592650 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:06:21,036-Speed 2492.46 samples/sec Loss 1.4217 LearningRate 0.000101 Epoch: 28 Global Step: 592660 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 06:06:29,259-Speed 2490.98 samples/sec Loss 1.4465 LearningRate 0.000101 Epoch: 28 Global Step: 592670 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:06:37,463-Speed 2496.55 samples/sec Loss 1.4501 LearningRate 0.000101 Epoch: 28 Global Step: 592680 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:06:45,606-Speed 2515.32 samples/sec Loss 1.4476 LearningRate 0.000101 Epoch: 28 Global Step: 592690 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:06:53,809-Speed 2497.23 samples/sec Loss 1.4414 LearningRate 0.000101 Epoch: 28 Global Step: 592700 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:07:02,009-Speed 2498.04 samples/sec Loss 1.4398 LearningRate 0.000101 Epoch: 28 Global Step: 592710 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:07:10,209-Speed 2497.75 samples/sec Loss 1.4289 LearningRate 0.000101 Epoch: 28 Global Step: 592720 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:07:18,435-Speed 2490.42 samples/sec Loss 1.4031 LearningRate 0.000101 Epoch: 28 Global Step: 592730 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:07:26,633-Speed 2498.33 samples/sec Loss 1.4871 LearningRate 0.000101 Epoch: 28 Global Step: 592740 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:07:34,803-Speed 2507.27 samples/sec Loss 1.4371 LearningRate 0.000101 Epoch: 28 Global Step: 592750 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:07:43,005-Speed 2497.25 samples/sec Loss 1.3973 LearningRate 0.000101 Epoch: 28 Global Step: 592760 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:07:51,208-Speed 2497.25 samples/sec Loss 1.4186 LearningRate 0.000101 Epoch: 28 Global Step: 592770 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:07:59,407-Speed 2498.21 samples/sec Loss 1.4270 LearningRate 0.000101 Epoch: 28 Global Step: 592780 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 06:08:07,611-Speed 2496.62 samples/sec Loss 1.4271 LearningRate 0.000101 Epoch: 28 Global Step: 592790 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:08:15,810-Speed 2498.57 samples/sec Loss 1.4515 LearningRate 0.000101 Epoch: 28 Global Step: 592800 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:08:23,958-Speed 2514.05 samples/sec Loss 1.4195 LearningRate 0.000101 Epoch: 28 Global Step: 592810 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:08:32,155-Speed 2498.76 samples/sec Loss 1.3951 LearningRate 0.000101 Epoch: 28 Global Step: 592820 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:08:40,359-Speed 2496.66 samples/sec Loss 1.4422 LearningRate 0.000101 Epoch: 28 Global Step: 592830 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:08:48,560-Speed 2498.05 samples/sec Loss 1.4474 LearningRate 0.000101 Epoch: 28 Global Step: 592840 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:08:56,759-Speed 2498.55 samples/sec Loss 1.4464 LearningRate 0.000101 Epoch: 28 Global Step: 592850 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:09:04,959-Speed 2497.78 samples/sec Loss 1.4079 LearningRate 0.000101 Epoch: 28 Global Step: 592860 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:09:13,109-Speed 2513.29 samples/sec Loss 1.4473 LearningRate 0.000101 Epoch: 28 Global Step: 592870 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:09:21,317-Speed 2495.60 samples/sec Loss 1.4253 LearningRate 0.000100 Epoch: 28 Global Step: 592880 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:09:29,532-Speed 2493.38 samples/sec Loss 1.4253 LearningRate 0.000100 Epoch: 28 Global Step: 592890 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:09:37,747-Speed 2493.53 samples/sec Loss 1.4523 LearningRate 0.000100 Epoch: 28 Global Step: 592900 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 06:09:45,947-Speed 2497.95 samples/sec Loss 1.4189 LearningRate 0.000100 Epoch: 28 Global Step: 592910 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:09:54,158-Speed 2494.49 samples/sec Loss 1.4126 LearningRate 0.000100 Epoch: 28 Global Step: 592920 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:10:02,312-Speed 2512.27 samples/sec Loss 1.4476 LearningRate 0.000100 Epoch: 28 Global Step: 592930 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:10:10,513-Speed 2497.34 samples/sec Loss 1.4365 LearningRate 0.000100 Epoch: 28 Global Step: 592940 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:10:18,712-Speed 2498.54 samples/sec Loss 1.4511 LearningRate 0.000100 Epoch: 28 Global Step: 592950 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:10:26,909-Speed 2498.69 samples/sec Loss 1.4691 LearningRate 0.000100 Epoch: 28 Global Step: 592960 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:10:35,106-Speed 2498.80 samples/sec Loss 1.4353 LearningRate 0.000100 Epoch: 28 Global Step: 592970 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:10:43,306-Speed 2498.03 samples/sec Loss 1.4472 LearningRate 0.000100 Epoch: 28 Global Step: 592980 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:10:51,452-Speed 2514.28 samples/sec Loss 1.4415 LearningRate 0.000100 Epoch: 28 Global Step: 592990 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:10:59,654-Speed 2497.59 samples/sec Loss 1.4376 LearningRate 0.000100 Epoch: 28 Global Step: 593000 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:11:07,863-Speed 2495.44 samples/sec Loss 1.4195 LearningRate 0.000100 Epoch: 28 Global Step: 593010 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:11:16,064-Speed 2497.68 samples/sec Loss 1.3710 LearningRate 0.000100 Epoch: 28 Global Step: 593020 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 06:11:24,276-Speed 2494.31 samples/sec Loss 1.4244 LearningRate 0.000100 Epoch: 28 Global Step: 593030 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:11:32,482-Speed 2496.26 samples/sec Loss 1.3949 LearningRate 0.000100 Epoch: 28 Global Step: 593040 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:11:40,634-Speed 2513.14 samples/sec Loss 1.4263 LearningRate 0.000100 Epoch: 28 Global Step: 593050 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:11:48,839-Speed 2496.52 samples/sec Loss 1.4512 LearningRate 0.000100 Epoch: 28 Global Step: 593060 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:11:57,040-Speed 2497.48 samples/sec Loss 1.4311 LearningRate 0.000100 Epoch: 28 Global Step: 593070 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:12:05,243-Speed 2497.13 samples/sec Loss 1.4471 LearningRate 0.000100 Epoch: 28 Global Step: 593080 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:12:13,450-Speed 2496.36 samples/sec Loss 1.4136 LearningRate 0.000100 Epoch: 28 Global Step: 593090 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:12:21,654-Speed 2496.55 samples/sec Loss 1.3888 LearningRate 0.000100 Epoch: 28 Global Step: 593100 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:12:29,803-Speed 2513.60 samples/sec Loss 1.4423 LearningRate 0.000100 Epoch: 28 Global Step: 593110 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:12:38,006-Speed 2497.11 samples/sec Loss 1.4516 LearningRate 0.000100 Epoch: 28 Global Step: 593120 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:12:46,219-Speed 2494.07 samples/sec Loss 1.4564 LearningRate 0.000100 Epoch: 28 Global Step: 593130 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:12:54,420-Speed 2497.74 samples/sec Loss 1.4620 LearningRate 0.000100 Epoch: 28 Global Step: 593140 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:13:02,629-Speed 2495.26 samples/sec Loss 1.4264 LearningRate 0.000100 Epoch: 28 Global Step: 593150 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:13:10,831-Speed 2497.65 samples/sec Loss 1.4422 LearningRate 0.000100 Epoch: 28 Global Step: 593160 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:13:18,980-Speed 2513.67 samples/sec Loss 1.3874 LearningRate 0.000100 Epoch: 28 Global Step: 593170 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:13:27,177-Speed 2498.75 samples/sec Loss 1.4053 LearningRate 0.000100 Epoch: 28 Global Step: 593180 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:13:35,376-Speed 2498.22 samples/sec Loss 1.4285 LearningRate 0.000100 Epoch: 28 Global Step: 593190 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:13:43,596-Speed 2492.09 samples/sec Loss 1.3977 LearningRate 0.000100 Epoch: 28 Global Step: 593200 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:13:51,795-Speed 2498.28 samples/sec Loss 1.4341 LearningRate 0.000100 Epoch: 28 Global Step: 593210 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:14:00,004-Speed 2495.15 samples/sec Loss 1.4591 LearningRate 0.000100 Epoch: 28 Global Step: 593220 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:14:08,155-Speed 2512.99 samples/sec Loss 1.4325 LearningRate 0.000100 Epoch: 28 Global Step: 593230 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:14:16,363-Speed 2495.37 samples/sec Loss 1.4157 LearningRate 0.000100 Epoch: 28 Global Step: 593240 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:14:24,563-Speed 2498.19 samples/sec Loss 1.3999 LearningRate 0.000100 Epoch: 28 Global Step: 593250 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:14:32,766-Speed 2497.40 samples/sec Loss 1.4091 LearningRate 0.000100 Epoch: 28 Global Step: 593260 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:14:40,967-Speed 2497.78 samples/sec Loss 1.4703 LearningRate 0.000100 Epoch: 28 Global Step: 593270 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:14:49,168-Speed 2497.41 samples/sec Loss 1.4598 LearningRate 0.000100 Epoch: 28 Global Step: 593280 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:14:57,317-Speed 2513.55 samples/sec Loss 1.4368 LearningRate 0.000100 Epoch: 28 Global Step: 593290 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:15:05,520-Speed 2497.04 samples/sec Loss 1.4518 LearningRate 0.000100 Epoch: 28 Global Step: 593300 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:15:13,720-Speed 2498.17 samples/sec Loss 1.4888 LearningRate 0.000100 Epoch: 28 Global Step: 593310 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:15:21,919-Speed 2498.05 samples/sec Loss 1.4294 LearningRate 0.000100 Epoch: 28 Global Step: 593320 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:15:30,121-Speed 2497.25 samples/sec Loss 1.4453 LearningRate 0.000100 Epoch: 28 Global Step: 593330 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:15:38,325-Speed 2496.84 samples/sec Loss 1.4553 LearningRate 0.000100 Epoch: 28 Global Step: 593340 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:15:46,475-Speed 2513.48 samples/sec Loss 1.4390 LearningRate 0.000100 Epoch: 28 Global Step: 593350 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:15:54,690-Speed 2493.26 samples/sec Loss 1.4316 LearningRate 0.000100 Epoch: 28 Global Step: 593360 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:16:02,892-Speed 2497.61 samples/sec Loss 1.4520 LearningRate 0.000100 Epoch: 28 Global Step: 593370 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:16:11,092-Speed 2497.72 samples/sec Loss 1.4636 LearningRate 0.000100 Epoch: 28 Global Step: 593380 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:16:19,292-Speed 2498.14 samples/sec Loss 1.4307 LearningRate 0.000100 Epoch: 28 Global Step: 593390 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:16:27,490-Speed 2498.27 samples/sec Loss 1.4346 LearningRate 0.000100 Epoch: 28 Global Step: 593400 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:16:35,639-Speed 2513.97 samples/sec Loss 1.4511 LearningRate 0.000100 Epoch: 28 Global Step: 593410 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:16:43,837-Speed 2498.62 samples/sec Loss 1.4341 LearningRate 0.000100 Epoch: 28 Global Step: 593420 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:16:52,040-Speed 2497.15 samples/sec Loss 1.4294 LearningRate 0.000100 Epoch: 28 Global Step: 593430 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:17:00,255-Speed 2493.49 samples/sec Loss 1.4323 LearningRate 0.000100 Epoch: 28 Global Step: 593440 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:17:08,455-Speed 2497.84 samples/sec Loss 1.4165 LearningRate 0.000100 Epoch: 28 Global Step: 593450 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:17:16,655-Speed 2498.17 samples/sec Loss 1.5061 LearningRate 0.000100 Epoch: 28 Global Step: 593460 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:17:24,801-Speed 2514.43 samples/sec Loss 1.4444 LearningRate 0.000100 Epoch: 28 Global Step: 593470 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:17:33,004-Speed 2496.89 samples/sec Loss 1.4210 LearningRate 0.000100 Epoch: 28 Global Step: 593480 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:17:41,203-Speed 2498.30 samples/sec Loss 1.4432 LearningRate 0.000100 Epoch: 28 Global Step: 593490 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:17:49,404-Speed 2497.69 samples/sec Loss 1.4316 LearningRate 0.000100 Epoch: 28 Global Step: 593500 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:17:57,605-Speed 2498.07 samples/sec Loss 1.4341 LearningRate 0.000100 Epoch: 28 Global Step: 593510 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:18:05,806-Speed 2497.80 samples/sec Loss 1.4215 LearningRate 0.000100 Epoch: 28 Global Step: 593520 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:18:13,953-Speed 2514.18 samples/sec Loss 1.4368 LearningRate 0.000100 Epoch: 28 Global Step: 593530 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:18:22,154-Speed 2497.73 samples/sec Loss 1.3983 LearningRate 0.000100 Epoch: 28 Global Step: 593540 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:18:30,358-Speed 2497.01 samples/sec Loss 1.4168 LearningRate 0.000100 Epoch: 28 Global Step: 593550 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:18:38,558-Speed 2497.83 samples/sec Loss 1.4296 LearningRate 0.000100 Epoch: 28 Global Step: 593560 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:18:46,758-Speed 2497.88 samples/sec Loss 1.4643 LearningRate 0.000100 Epoch: 28 Global Step: 593570 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:18:54,964-Speed 2496.38 samples/sec Loss 1.4423 LearningRate 0.000100 Epoch: 28 Global Step: 593580 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:19:03,112-Speed 2513.80 samples/sec Loss 1.3842 LearningRate 0.000100 Epoch: 28 Global Step: 593590 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:19:11,308-Speed 2499.10 samples/sec Loss 1.4354 LearningRate 0.000100 Epoch: 28 Global Step: 593600 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:19:19,510-Speed 2497.47 samples/sec Loss 1.4316 LearningRate 0.000100 Epoch: 28 Global Step: 593610 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:19:27,714-Speed 2497.05 samples/sec Loss 1.4361 LearningRate 0.000100 Epoch: 28 Global Step: 593620 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:19:35,925-Speed 2494.45 samples/sec Loss 1.4808 LearningRate 0.000100 Epoch: 28 Global Step: 593630 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:19:44,125-Speed 2498.12 samples/sec Loss 1.4566 LearningRate 0.000100 Epoch: 28 Global Step: 593640 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:19:52,274-Speed 2513.62 samples/sec Loss 1.4393 LearningRate 0.000100 Epoch: 28 Global Step: 593650 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:20:00,474-Speed 2498.09 samples/sec Loss 1.4248 LearningRate 0.000100 Epoch: 28 Global Step: 593660 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:20:08,674-Speed 2497.81 samples/sec Loss 1.4340 LearningRate 0.000100 Epoch: 28 Global Step: 593670 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:20:16,874-Speed 2498.05 samples/sec Loss 1.4581 LearningRate 0.000100 Epoch: 28 Global Step: 593680 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:20:25,071-Speed 2498.90 samples/sec Loss 1.4353 LearningRate 0.000100 Epoch: 28 Global Step: 593690 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:20:33,274-Speed 2496.78 samples/sec Loss 1.4344 LearningRate 0.000100 Epoch: 28 Global Step: 593700 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:20:41,422-Speed 2513.94 samples/sec Loss 1.4635 LearningRate 0.000100 Epoch: 28 Global Step: 593710 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:20:49,623-Speed 2497.67 samples/sec Loss 1.4580 LearningRate 0.000100 Epoch: 28 Global Step: 593720 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:20:57,825-Speed 2497.53 samples/sec Loss 1.4103 LearningRate 0.000100 Epoch: 28 Global Step: 593730 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:21:06,022-Speed 2498.62 samples/sec Loss 1.4340 LearningRate 0.000100 Epoch: 28 Global Step: 593740 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:21:14,224-Speed 2497.46 samples/sec Loss 1.4386 LearningRate 0.000100 Epoch: 28 Global Step: 593750 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:21:22,428-Speed 2496.69 samples/sec Loss 1.4369 LearningRate 0.000100 Epoch: 28 Global Step: 593760 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:21:30,577-Speed 2513.66 samples/sec Loss 1.4483 LearningRate 0.000100 Epoch: 28 Global Step: 593770 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:21:38,774-Speed 2498.71 samples/sec Loss 1.4248 LearningRate 0.000100 Epoch: 28 Global Step: 593780 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:21:46,975-Speed 2497.65 samples/sec Loss 1.4125 LearningRate 0.000100 Epoch: 28 Global Step: 593790 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:21:55,176-Speed 2497.64 samples/sec Loss 1.4457 LearningRate 0.000100 Epoch: 28 Global Step: 593800 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:22:03,378-Speed 2497.65 samples/sec Loss 1.4149 LearningRate 0.000100 Epoch: 28 Global Step: 593810 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:22:11,591-Speed 2494.02 samples/sec Loss 1.4616 LearningRate 0.000100 Epoch: 28 Global Step: 593820 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:22:19,739-Speed 2513.87 samples/sec Loss 1.4546 LearningRate 0.000100 Epoch: 28 Global Step: 593830 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:22:27,939-Speed 2497.99 samples/sec Loss 1.4653 LearningRate 0.000100 Epoch: 28 Global Step: 593840 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:22:36,140-Speed 2497.71 samples/sec Loss 1.4695 LearningRate 0.000100 Epoch: 28 Global Step: 593850 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:22:44,337-Speed 2498.76 samples/sec Loss 1.4135 LearningRate 0.000100 Epoch: 28 Global Step: 593860 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:22:52,535-Speed 2498.80 samples/sec Loss 1.4509 LearningRate 0.000100 Epoch: 28 Global Step: 593870 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:23:00,737-Speed 2497.20 samples/sec Loss 1.4312 LearningRate 0.000100 Epoch: 28 Global Step: 593880 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:23:08,882-Speed 2514.82 samples/sec Loss 1.4348 LearningRate 0.000100 Epoch: 28 Global Step: 593890 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:23:17,083-Speed 2497.77 samples/sec Loss 1.4230 LearningRate 0.000100 Epoch: 28 Global Step: 593900 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:23:25,283-Speed 2498.06 samples/sec Loss 1.4224 LearningRate 0.000100 Epoch: 28 Global Step: 593910 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:23:33,483-Speed 2497.95 samples/sec Loss 1.4380 LearningRate 0.000100 Epoch: 28 Global Step: 593920 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:23:41,689-Speed 2496.01 samples/sec Loss 1.4419 LearningRate 0.000100 Epoch: 28 Global Step: 593930 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:23:49,888-Speed 2498.53 samples/sec Loss 1.3993 LearningRate 0.000100 Epoch: 28 Global Step: 593940 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:23:58,037-Speed 2513.72 samples/sec Loss 1.4046 LearningRate 0.000100 Epoch: 28 Global Step: 593950 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:24:06,247-Speed 2495.07 samples/sec Loss 1.4363 LearningRate 0.000100 Epoch: 28 Global Step: 593960 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:24:14,448-Speed 2497.65 samples/sec Loss 1.4445 LearningRate 0.000100 Epoch: 28 Global Step: 593970 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:24:22,646-Speed 2498.60 samples/sec Loss 1.4572 LearningRate 0.000100 Epoch: 28 Global Step: 593980 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:24:30,845-Speed 2497.96 samples/sec Loss 1.4571 LearningRate 0.000100 Epoch: 28 Global Step: 593990 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:24:39,056-Speed 2494.56 samples/sec Loss 1.4588 LearningRate 0.000100 Epoch: 28 Global Step: 594000 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:24:47,203-Speed 2514.57 samples/sec Loss 1.4270 LearningRate 0.000100 Epoch: 28 Global Step: 594010 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:24:55,401-Speed 2498.39 samples/sec Loss 1.4436 LearningRate 0.000100 Epoch: 28 Global Step: 594020 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:25:03,602-Speed 2497.44 samples/sec Loss 1.4917 LearningRate 0.000100 Epoch: 28 Global Step: 594030 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:25:11,806-Speed 2496.67 samples/sec Loss 1.4771 LearningRate 0.000100 Epoch: 28 Global Step: 594040 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:25:20,008-Speed 2497.31 samples/sec Loss 1.4418 LearningRate 0.000100 Epoch: 28 Global Step: 594050 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:25:28,212-Speed 2496.77 samples/sec Loss 1.4062 LearningRate 0.000099 Epoch: 28 Global Step: 594060 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:25:36,364-Speed 2512.84 samples/sec Loss 1.4526 LearningRate 0.000099 Epoch: 28 Global Step: 594070 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:25:44,564-Speed 2497.80 samples/sec Loss 1.4290 LearningRate 0.000099 Epoch: 28 Global Step: 594080 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:25:52,766-Speed 2497.58 samples/sec Loss 1.4100 LearningRate 0.000099 Epoch: 28 Global Step: 594090 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:26:00,968-Speed 2497.03 samples/sec Loss 1.4691 LearningRate 0.000099 Epoch: 28 Global Step: 594100 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:26:09,168-Speed 2498.07 samples/sec Loss 1.4233 LearningRate 0.000099 Epoch: 28 Global Step: 594110 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:26:17,375-Speed 2495.82 samples/sec Loss 1.4479 LearningRate 0.000099 Epoch: 28 Global Step: 594120 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:26:25,534-Speed 2510.58 samples/sec Loss 1.4466 LearningRate 0.000099 Epoch: 28 Global Step: 594130 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:26:33,744-Speed 2495.17 samples/sec Loss 1.4235 LearningRate 0.000099 Epoch: 28 Global Step: 594140 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:26:41,945-Speed 2497.76 samples/sec Loss 1.3976 LearningRate 0.000099 Epoch: 28 Global Step: 594150 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:26:50,143-Speed 2498.33 samples/sec Loss 1.4568 LearningRate 0.000099 Epoch: 28 Global Step: 594160 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:26:58,345-Speed 2497.48 samples/sec Loss 1.4428 LearningRate 0.000099 Epoch: 28 Global Step: 594170 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:27:06,548-Speed 2497.03 samples/sec Loss 1.4775 LearningRate 0.000099 Epoch: 28 Global Step: 594180 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:27:14,698-Speed 2513.59 samples/sec Loss 1.4911 LearningRate 0.000099 Epoch: 28 Global Step: 594190 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:27:22,898-Speed 2497.69 samples/sec Loss 1.4358 LearningRate 0.000099 Epoch: 28 Global Step: 594200 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:27:31,099-Speed 2497.74 samples/sec Loss 1.4493 LearningRate 0.000099 Epoch: 28 Global Step: 594210 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:27:39,298-Speed 2498.20 samples/sec Loss 1.4584 LearningRate 0.000099 Epoch: 28 Global Step: 594220 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:27:47,502-Speed 2496.92 samples/sec Loss 1.4581 LearningRate 0.000099 Epoch: 28 Global Step: 594230 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:27:55,701-Speed 2498.41 samples/sec Loss 1.4355 LearningRate 0.000099 Epoch: 28 Global Step: 594240 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:28:03,848-Speed 2514.23 samples/sec Loss 1.4279 LearningRate 0.000099 Epoch: 28 Global Step: 594250 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:28:12,045-Speed 2498.78 samples/sec Loss 1.4315 LearningRate 0.000099 Epoch: 28 Global Step: 594260 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:28:20,257-Speed 2494.46 samples/sec Loss 1.4008 LearningRate 0.000099 Epoch: 28 Global Step: 594270 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:28:28,461-Speed 2496.79 samples/sec Loss 1.4444 LearningRate 0.000099 Epoch: 28 Global Step: 594280 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:28:36,658-Speed 2498.82 samples/sec Loss 1.4521 LearningRate 0.000099 Epoch: 28 Global Step: 594290 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-07-11 06:28:44,859-Speed 2497.58 samples/sec Loss 1.4528 LearningRate 0.000099 Epoch: 28 Global Step: 594300 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-07-11 06:28:53,005-Speed 2514.62 samples/sec Loss 1.4561 LearningRate 0.000099 Epoch: 28 Global Step: 594310 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-07-11 06:29:01,162-Speed 2511.11 samples/sec Loss 1.4433 LearningRate 0.000099 Epoch: 28 Global Step: 594320 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:29:09,362-Speed 2497.96 samples/sec Loss 1.4289 LearningRate 0.000099 Epoch: 28 Global Step: 594330 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:29:17,568-Speed 2495.80 samples/sec Loss 1.4276 LearningRate 0.000099 Epoch: 28 Global Step: 594340 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:29:25,769-Speed 2497.87 samples/sec Loss 1.4546 LearningRate 0.000099 Epoch: 28 Global Step: 594350 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:29:33,983-Speed 2493.69 samples/sec Loss 1.4143 LearningRate 0.000099 Epoch: 28 Global Step: 594360 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:29:42,130-Speed 2514.20 samples/sec Loss 1.4226 LearningRate 0.000099 Epoch: 28 Global Step: 594370 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:29:50,327-Speed 2499.03 samples/sec Loss 1.4449 LearningRate 0.000099 Epoch: 28 Global Step: 594380 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:29:58,529-Speed 2497.18 samples/sec Loss 1.4303 LearningRate 0.000099 Epoch: 28 Global Step: 594390 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:30:06,729-Speed 2497.96 samples/sec Loss 1.4326 LearningRate 0.000099 Epoch: 28 Global Step: 594400 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:30:14,929-Speed 2497.88 samples/sec Loss 1.4515 LearningRate 0.000099 Epoch: 28 Global Step: 594410 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:30:23,129-Speed 2497.87 samples/sec Loss 1.4372 LearningRate 0.000099 Epoch: 28 Global Step: 594420 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:30:31,275-Speed 2514.65 samples/sec Loss 1.4511 LearningRate 0.000099 Epoch: 28 Global Step: 594430 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:30:39,479-Speed 2496.85 samples/sec Loss 1.4451 LearningRate 0.000099 Epoch: 28 Global Step: 594440 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:30:47,676-Speed 2498.98 samples/sec Loss 1.4611 LearningRate 0.000099 Epoch: 28 Global Step: 594450 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:30:55,875-Speed 2498.19 samples/sec Loss 1.4484 LearningRate 0.000099 Epoch: 28 Global Step: 594460 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:31:04,077-Speed 2497.61 samples/sec Loss 1.4380 LearningRate 0.000099 Epoch: 28 Global Step: 594470 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:31:12,284-Speed 2495.59 samples/sec Loss 1.4307 LearningRate 0.000099 Epoch: 28 Global Step: 594480 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:31:20,434-Speed 2513.48 samples/sec Loss 1.4251 LearningRate 0.000099 Epoch: 28 Global Step: 594490 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:31:28,632-Speed 2498.80 samples/sec Loss 1.4522 LearningRate 0.000099 Epoch: 28 Global Step: 594500 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:31:36,829-Speed 2499.00 samples/sec Loss 1.4478 LearningRate 0.000099 Epoch: 28 Global Step: 594510 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:31:45,028-Speed 2498.42 samples/sec Loss 1.4391 LearningRate 0.000099 Epoch: 28 Global Step: 594520 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:31:53,239-Speed 2494.79 samples/sec Loss 1.3977 LearningRate 0.000099 Epoch: 28 Global Step: 594530 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:32:01,436-Speed 2498.71 samples/sec Loss 1.4554 LearningRate 0.000099 Epoch: 28 Global Step: 594540 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:32:09,588-Speed 2512.61 samples/sec Loss 1.4326 LearningRate 0.000099 Epoch: 28 Global Step: 594550 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:32:17,788-Speed 2498.08 samples/sec Loss 1.4234 LearningRate 0.000099 Epoch: 28 Global Step: 594560 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:32:25,987-Speed 2498.40 samples/sec Loss 1.4548 LearningRate 0.000099 Epoch: 28 Global Step: 594570 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:32:34,187-Speed 2498.02 samples/sec Loss 1.4297 LearningRate 0.000099 Epoch: 28 Global Step: 594580 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:32:42,396-Speed 2495.21 samples/sec Loss 1.4401 LearningRate 0.000099 Epoch: 28 Global Step: 594590 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:32:50,597-Speed 2497.88 samples/sec Loss 1.4547 LearningRate 0.000099 Epoch: 28 Global Step: 594600 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:32:58,743-Speed 2514.90 samples/sec Loss 1.4569 LearningRate 0.000099 Epoch: 28 Global Step: 594610 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:33:06,942-Speed 2497.93 samples/sec Loss 1.4494 LearningRate 0.000099 Epoch: 28 Global Step: 594620 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:33:15,146-Speed 2496.99 samples/sec Loss 1.4552 LearningRate 0.000099 Epoch: 28 Global Step: 594630 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:33:23,345-Speed 2498.38 samples/sec Loss 1.4848 LearningRate 0.000099 Epoch: 28 Global Step: 594640 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:33:31,542-Speed 2498.99 samples/sec Loss 1.4510 LearningRate 0.000099 Epoch: 28 Global Step: 594650 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:33:39,743-Speed 2497.66 samples/sec Loss 1.4263 LearningRate 0.000099 Epoch: 28 Global Step: 594660 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:33:47,889-Speed 2514.26 samples/sec Loss 1.3968 LearningRate 0.000099 Epoch: 28 Global Step: 594670 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:33:56,091-Speed 2497.42 samples/sec Loss 1.4740 LearningRate 0.000099 Epoch: 28 Global Step: 594680 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:34:04,289-Speed 2498.70 samples/sec Loss 1.4228 LearningRate 0.000099 Epoch: 28 Global Step: 594690 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:34:12,496-Speed 2495.61 samples/sec Loss 1.4468 LearningRate 0.000099 Epoch: 28 Global Step: 594700 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:34:20,696-Speed 2497.97 samples/sec Loss 1.4630 LearningRate 0.000099 Epoch: 28 Global Step: 594710 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:34:28,896-Speed 2498.01 samples/sec Loss 1.4315 LearningRate 0.000099 Epoch: 28 Global Step: 594720 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:34:37,044-Speed 2514.04 samples/sec Loss 1.4376 LearningRate 0.000099 Epoch: 28 Global Step: 594730 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:34:45,243-Speed 2498.23 samples/sec Loss 1.4800 LearningRate 0.000099 Epoch: 28 Global Step: 594740 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:34:53,442-Speed 2498.50 samples/sec Loss 1.4607 LearningRate 0.000099 Epoch: 28 Global Step: 594750 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:35:01,641-Speed 2498.26 samples/sec Loss 1.4582 LearningRate 0.000099 Epoch: 28 Global Step: 594760 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:35:09,839-Speed 2498.51 samples/sec Loss 1.4398 LearningRate 0.000099 Epoch: 28 Global Step: 594770 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:35:18,041-Speed 2497.19 samples/sec Loss 1.4220 LearningRate 0.000099 Epoch: 28 Global Step: 594780 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:35:26,189-Speed 2514.03 samples/sec Loss 1.4148 LearningRate 0.000099 Epoch: 28 Global Step: 594790 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:35:34,398-Speed 2495.22 samples/sec Loss 1.4220 LearningRate 0.000099 Epoch: 28 Global Step: 594800 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:35:42,603-Speed 2496.25 samples/sec Loss 1.4063 LearningRate 0.000099 Epoch: 28 Global Step: 594810 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:35:50,805-Speed 2497.60 samples/sec Loss 1.4195 LearningRate 0.000099 Epoch: 28 Global Step: 594820 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:35:59,003-Speed 2498.19 samples/sec Loss 1.4301 LearningRate 0.000099 Epoch: 28 Global Step: 594830 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:36:07,204-Speed 2497.86 samples/sec Loss 1.4458 LearningRate 0.000099 Epoch: 28 Global Step: 594840 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:36:15,356-Speed 2512.66 samples/sec Loss 1.4483 LearningRate 0.000099 Epoch: 28 Global Step: 594850 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:36:23,556-Speed 2497.83 samples/sec Loss 1.4136 LearningRate 0.000099 Epoch: 28 Global Step: 594860 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:36:31,755-Speed 2498.52 samples/sec Loss 1.4444 LearningRate 0.000099 Epoch: 28 Global Step: 594870 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:36:39,956-Speed 2497.87 samples/sec Loss 1.4409 LearningRate 0.000099 Epoch: 28 Global Step: 594880 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:36:48,158-Speed 2497.44 samples/sec Loss 1.4038 LearningRate 0.000099 Epoch: 28 Global Step: 594890 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:36:56,358-Speed 2497.98 samples/sec Loss 1.4230 LearningRate 0.000099 Epoch: 28 Global Step: 594900 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:37:04,504-Speed 2514.38 samples/sec Loss 1.4320 LearningRate 0.000099 Epoch: 28 Global Step: 594910 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:37:12,705-Speed 2497.64 samples/sec Loss 1.4069 LearningRate 0.000099 Epoch: 28 Global Step: 594920 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:37:20,903-Speed 2498.83 samples/sec Loss 1.4105 LearningRate 0.000099 Epoch: 28 Global Step: 594930 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:37:29,115-Speed 2494.19 samples/sec Loss 1.4223 LearningRate 0.000099 Epoch: 28 Global Step: 594940 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:37:37,316-Speed 2497.66 samples/sec Loss 1.4484 LearningRate 0.000099 Epoch: 28 Global Step: 594950 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:37:45,519-Speed 2497.02 samples/sec Loss 1.3862 LearningRate 0.000099 Epoch: 28 Global Step: 594960 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:37:53,681-Speed 2509.73 samples/sec Loss 1.4528 LearningRate 0.000099 Epoch: 28 Global Step: 594970 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:38:01,885-Speed 2496.58 samples/sec Loss 1.4417 LearningRate 0.000099 Epoch: 28 Global Step: 594980 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:38:10,086-Speed 2497.62 samples/sec Loss 1.4143 LearningRate 0.000099 Epoch: 28 Global Step: 594990 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:38:18,294-Speed 2495.67 samples/sec Loss 1.4032 LearningRate 0.000099 Epoch: 28 Global Step: 595000 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:38:26,496-Speed 2497.35 samples/sec Loss 1.4541 LearningRate 0.000099 Epoch: 28 Global Step: 595010 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:38:34,699-Speed 2497.29 samples/sec Loss 1.4165 LearningRate 0.000099 Epoch: 28 Global Step: 595020 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:38:42,853-Speed 2511.96 samples/sec Loss 1.4225 LearningRate 0.000099 Epoch: 28 Global Step: 595030 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:38:51,055-Speed 2497.61 samples/sec Loss 1.4099 LearningRate 0.000099 Epoch: 28 Global Step: 595040 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:38:59,251-Speed 2499.10 samples/sec Loss 1.4381 LearningRate 0.000099 Epoch: 28 Global Step: 595050 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:39:07,452-Speed 2497.73 samples/sec Loss 1.4306 LearningRate 0.000099 Epoch: 28 Global Step: 595060 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-07-11 06:39:15,650-Speed 2498.78 samples/sec Loss 1.4616 LearningRate 0.000099 Epoch: 28 Global Step: 595070 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:39:23,851-Speed 2497.87 samples/sec Loss 1.3970 LearningRate 0.000099 Epoch: 28 Global Step: 595080 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:39:31,993-Speed 2515.58 samples/sec Loss 1.4297 LearningRate 0.000099 Epoch: 28 Global Step: 595090 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:39:40,193-Speed 2498.02 samples/sec Loss 1.4149 LearningRate 0.000099 Epoch: 28 Global Step: 595100 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:39:48,395-Speed 2497.68 samples/sec Loss 1.4702 LearningRate 0.000099 Epoch: 28 Global Step: 595110 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:39:56,593-Speed 2498.44 samples/sec Loss 1.4430 LearningRate 0.000099 Epoch: 28 Global Step: 595120 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-07-11 06:40:04,749-Speed 2511.31 samples/sec Loss 1.4221 LearningRate 0.000099 Epoch: 28 Global Step: 595130 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:40:12,949-Speed 2498.31 samples/sec Loss 1.4039 LearningRate 0.000099 Epoch: 28 Global Step: 595140 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:40:21,093-Speed 2514.92 samples/sec Loss 1.4401 LearningRate 0.000099 Epoch: 28 Global Step: 595150 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:40:29,291-Speed 2498.42 samples/sec Loss 1.4288 LearningRate 0.000099 Epoch: 28 Global Step: 595160 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:40:37,490-Speed 2498.27 samples/sec Loss 1.4104 LearningRate 0.000099 Epoch: 28 Global Step: 595170 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:40:45,694-Speed 2497.05 samples/sec Loss 1.4335 LearningRate 0.000099 Epoch: 28 Global Step: 595180 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 06:40:53,903-Speed 2495.18 samples/sec Loss 1.4330 LearningRate 0.000099 Epoch: 28 Global Step: 595190 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:41:02,103-Speed 2497.95 samples/sec Loss 1.4039 LearningRate 0.000099 Epoch: 28 Global Step: 595200 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:41:10,251-Speed 2513.83 samples/sec Loss 1.4025 LearningRate 0.000099 Epoch: 28 Global Step: 595210 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:41:18,451-Speed 2498.16 samples/sec Loss 1.4255 LearningRate 0.000099 Epoch: 28 Global Step: 595220 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:41:26,650-Speed 2498.09 samples/sec Loss 1.3844 LearningRate 0.000099 Epoch: 28 Global Step: 595230 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:41:34,848-Speed 2498.59 samples/sec Loss 1.4090 LearningRate 0.000099 Epoch: 28 Global Step: 595240 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:41:43,045-Speed 2498.93 samples/sec Loss 1.3943 LearningRate 0.000098 Epoch: 28 Global Step: 595250 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:41:51,246-Speed 2497.71 samples/sec Loss 1.4439 LearningRate 0.000098 Epoch: 28 Global Step: 595260 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:41:59,393-Speed 2514.30 samples/sec Loss 1.4507 LearningRate 0.000098 Epoch: 28 Global Step: 595270 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:42:07,590-Speed 2498.80 samples/sec Loss 1.4169 LearningRate 0.000098 Epoch: 28 Global Step: 595280 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:42:15,794-Speed 2496.58 samples/sec Loss 1.4467 LearningRate 0.000098 Epoch: 28 Global Step: 595290 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:42:23,992-Speed 2499.27 samples/sec Loss 1.4220 LearningRate 0.000098 Epoch: 28 Global Step: 595300 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 06:42:32,190-Speed 2498.73 samples/sec Loss 1.4572 LearningRate 0.000098 Epoch: 28 Global Step: 595310 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:42:40,390-Speed 2497.86 samples/sec Loss 1.4351 LearningRate 0.000098 Epoch: 28 Global Step: 595320 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:42:48,546-Speed 2512.10 samples/sec Loss 1.4463 LearningRate 0.000098 Epoch: 28 Global Step: 595330 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:42:56,753-Speed 2495.89 samples/sec Loss 1.4079 LearningRate 0.000098 Epoch: 28 Global Step: 595340 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:43:04,952-Speed 2498.22 samples/sec Loss 1.4067 LearningRate 0.000098 Epoch: 28 Global Step: 595350 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:43:13,150-Speed 2498.42 samples/sec Loss 1.4428 LearningRate 0.000098 Epoch: 28 Global Step: 595360 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:43:21,350-Speed 2497.88 samples/sec Loss 1.4229 LearningRate 0.000098 Epoch: 28 Global Step: 595370 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:43:29,549-Speed 2498.43 samples/sec Loss 1.3979 LearningRate 0.000098 Epoch: 28 Global Step: 595380 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:43:37,709-Speed 2510.54 samples/sec Loss 1.4699 LearningRate 0.000098 Epoch: 28 Global Step: 595390 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:43:45,907-Speed 2498.48 samples/sec Loss 1.4278 LearningRate 0.000098 Epoch: 28 Global Step: 595400 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:43:54,103-Speed 2499.14 samples/sec Loss 1.4313 LearningRate 0.000098 Epoch: 28 Global Step: 595410 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:44:02,308-Speed 2496.59 samples/sec Loss 1.4489 LearningRate 0.000098 Epoch: 28 Global Step: 595420 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 06:44:10,505-Speed 2498.59 samples/sec Loss 1.4021 LearningRate 0.000098 Epoch: 28 Global Step: 595430 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:44:18,706-Speed 2497.70 samples/sec Loss 1.4390 LearningRate 0.000098 Epoch: 28 Global Step: 595440 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:44:26,862-Speed 2511.59 samples/sec Loss 1.4385 LearningRate 0.000098 Epoch: 28 Global Step: 595450 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:44:35,058-Speed 2499.16 samples/sec Loss 1.4131 LearningRate 0.000098 Epoch: 28 Global Step: 595460 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:44:43,258-Speed 2497.93 samples/sec Loss 1.4431 LearningRate 0.000098 Epoch: 28 Global Step: 595470 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:44:51,456-Speed 2498.52 samples/sec Loss 1.4316 LearningRate 0.000098 Epoch: 28 Global Step: 595480 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:44:59,654-Speed 2498.58 samples/sec Loss 1.4462 LearningRate 0.000098 Epoch: 28 Global Step: 595490 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:45:07,857-Speed 2497.14 samples/sec Loss 1.4153 LearningRate 0.000098 Epoch: 28 Global Step: 595500 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:45:16,001-Speed 2515.16 samples/sec Loss 1.4460 LearningRate 0.000098 Epoch: 28 Global Step: 595510 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:45:24,200-Speed 2498.24 samples/sec Loss 1.4309 LearningRate 0.000098 Epoch: 28 Global Step: 595520 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:45:32,400-Speed 2498.05 samples/sec Loss 1.4426 LearningRate 0.000098 Epoch: 28 Global Step: 595530 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:45:40,599-Speed 2498.21 samples/sec Loss 1.4293 LearningRate 0.000098 Epoch: 28 Global Step: 595540 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 06:45:48,809-Speed 2494.65 samples/sec Loss 1.4467 LearningRate 0.000098 Epoch: 28 Global Step: 595550 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:45:57,009-Speed 2498.12 samples/sec Loss 1.4609 LearningRate 0.000098 Epoch: 28 Global Step: 595560 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:46:05,156-Speed 2514.27 samples/sec Loss 1.4558 LearningRate 0.000098 Epoch: 28 Global Step: 595570 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:46:13,356-Speed 2497.73 samples/sec Loss 1.4264 LearningRate 0.000098 Epoch: 28 Global Step: 595580 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:46:21,562-Speed 2496.24 samples/sec Loss 1.4718 LearningRate 0.000098 Epoch: 28 Global Step: 595590 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:46:29,768-Speed 2495.98 samples/sec Loss 1.4285 LearningRate 0.000098 Epoch: 28 Global Step: 595600 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:46:37,970-Speed 2497.32 samples/sec Loss 1.4367 LearningRate 0.000098 Epoch: 28 Global Step: 595610 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:46:46,170-Speed 2498.08 samples/sec Loss 1.4455 LearningRate 0.000098 Epoch: 28 Global Step: 595620 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:46:54,317-Speed 2514.15 samples/sec Loss 1.4475 LearningRate 0.000098 Epoch: 28 Global Step: 595630 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:47:02,520-Speed 2497.04 samples/sec Loss 1.4676 LearningRate 0.000098 Epoch: 28 Global Step: 595640 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:47:10,720-Speed 2498.27 samples/sec Loss 1.4575 LearningRate 0.000098 Epoch: 28 Global Step: 595650 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:47:18,921-Speed 2497.87 samples/sec Loss 1.3980 LearningRate 0.000098 Epoch: 28 Global Step: 595660 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 06:47:27,120-Speed 2498.43 samples/sec Loss 1.4328 LearningRate 0.000098 Epoch: 28 Global Step: 595670 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:47:35,317-Speed 2499.02 samples/sec Loss 1.4044 LearningRate 0.000098 Epoch: 28 Global Step: 595680 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:47:43,465-Speed 2513.77 samples/sec Loss 1.4494 LearningRate 0.000098 Epoch: 28 Global Step: 595690 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:47:51,666-Speed 2497.64 samples/sec Loss 1.4159 LearningRate 0.000098 Epoch: 28 Global Step: 595700 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:47:59,879-Speed 2494.05 samples/sec Loss 1.4401 LearningRate 0.000098 Epoch: 28 Global Step: 595710 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:48:08,079-Speed 2497.92 samples/sec Loss 1.4248 LearningRate 0.000098 Epoch: 28 Global Step: 595720 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:48:16,276-Speed 2498.99 samples/sec Loss 1.4552 LearningRate 0.000098 Epoch: 28 Global Step: 595730 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:48:24,476-Speed 2497.89 samples/sec Loss 1.4216 LearningRate 0.000098 Epoch: 28 Global Step: 595740 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:48:32,625-Speed 2513.41 samples/sec Loss 1.4477 LearningRate 0.000098 Epoch: 28 Global Step: 595750 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:48:40,834-Speed 2495.21 samples/sec Loss 1.4452 LearningRate 0.000098 Epoch: 28 Global Step: 595760 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:48:49,035-Speed 2497.90 samples/sec Loss 1.4455 LearningRate 0.000098 Epoch: 28 Global Step: 595770 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:48:57,234-Speed 2498.20 samples/sec Loss 1.4510 LearningRate 0.000098 Epoch: 28 Global Step: 595780 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-07-11 06:49:05,432-Speed 2498.41 samples/sec Loss 1.4332 LearningRate 0.000098 Epoch: 28 Global Step: 595790 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:49:13,631-Speed 2498.27 samples/sec Loss 1.4518 LearningRate 0.000098 Epoch: 28 Global Step: 595800 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:49:21,776-Speed 2515.03 samples/sec Loss 1.4232 LearningRate 0.000098 Epoch: 28 Global Step: 595810 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:49:29,993-Speed 2492.84 samples/sec Loss 1.4667 LearningRate 0.000098 Epoch: 28 Global Step: 595820 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:49:38,189-Speed 2498.95 samples/sec Loss 1.4170 LearningRate 0.000098 Epoch: 28 Global Step: 595830 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-07-11 06:49:46,389-Speed 2498.20 samples/sec Loss 1.4564 LearningRate 0.000098 Epoch: 28 Global Step: 595840 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:49:54,587-Speed 2498.61 samples/sec Loss 1.4401 LearningRate 0.000098 Epoch: 28 Global Step: 595850 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:50:02,787-Speed 2498.02 samples/sec Loss 1.4329 LearningRate 0.000098 Epoch: 28 Global Step: 595860 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:50:10,940-Speed 2512.54 samples/sec Loss 1.4472 LearningRate 0.000098 Epoch: 28 Global Step: 595870 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:50:19,141-Speed 2497.81 samples/sec Loss 1.4922 LearningRate 0.000098 Epoch: 28 Global Step: 595880 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:50:27,348-Speed 2495.92 samples/sec Loss 1.4508 LearningRate 0.000098 Epoch: 28 Global Step: 595890 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:50:35,545-Speed 2498.76 samples/sec Loss 1.4837 LearningRate 0.000098 Epoch: 28 Global Step: 595900 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 06:50:43,748-Speed 2497.35 samples/sec Loss 1.4737 LearningRate 0.000098 Epoch: 28 Global Step: 595910 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:50:51,951-Speed 2498.10 samples/sec Loss 1.5180 LearningRate 0.000098 Epoch: 28 Global Step: 595920 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:51:00,094-Speed 2515.10 samples/sec Loss 1.4601 LearningRate 0.000098 Epoch: 28 Global Step: 595930 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:51:08,296-Speed 2497.51 samples/sec Loss 1.4175 LearningRate 0.000098 Epoch: 28 Global Step: 595940 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:51:16,493-Speed 2498.75 samples/sec Loss 1.4133 LearningRate 0.000098 Epoch: 28 Global Step: 595950 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:51:24,693-Speed 2498.05 samples/sec Loss 1.4424 LearningRate 0.000098 Epoch: 28 Global Step: 595960 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:51:32,892-Speed 2498.22 samples/sec Loss 1.4588 LearningRate 0.000098 Epoch: 28 Global Step: 595970 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:51:41,095-Speed 2497.09 samples/sec Loss 1.4510 LearningRate 0.000098 Epoch: 28 Global Step: 595980 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:51:49,242-Speed 2514.35 samples/sec Loss 1.4641 LearningRate 0.000098 Epoch: 28 Global Step: 595990 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:51:57,473-Speed 2488.49 samples/sec Loss 1.4188 LearningRate 0.000098 Epoch: 28 Global Step: 596000 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:52:05,673-Speed 2498.20 samples/sec Loss 1.4005 LearningRate 0.000098 Epoch: 28 Global Step: 596010 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:52:13,871-Speed 2498.27 samples/sec Loss 1.4155 LearningRate 0.000098 Epoch: 28 Global Step: 596020 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 06:52:22,069-Speed 2498.68 samples/sec Loss 1.4256 LearningRate 0.000098 Epoch: 28 Global Step: 596030 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:52:30,265-Speed 2499.14 samples/sec Loss 1.4194 LearningRate 0.000098 Epoch: 28 Global Step: 596040 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:52:38,422-Speed 2511.07 samples/sec Loss 1.4205 LearningRate 0.000098 Epoch: 28 Global Step: 596050 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:52:46,620-Speed 2498.62 samples/sec Loss 1.4574 LearningRate 0.000098 Epoch: 28 Global Step: 596060 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:52:54,820-Speed 2498.17 samples/sec Loss 1.4136 LearningRate 0.000098 Epoch: 28 Global Step: 596070 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:53:03,017-Speed 2498.71 samples/sec Loss 1.4957 LearningRate 0.000098 Epoch: 28 Global Step: 596080 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:53:11,227-Speed 2494.70 samples/sec Loss 1.4694 LearningRate 0.000098 Epoch: 28 Global Step: 596090 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:53:19,430-Speed 2497.65 samples/sec Loss 1.4544 LearningRate 0.000098 Epoch: 28 Global Step: 596100 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:53:27,573-Speed 2515.57 samples/sec Loss 1.4427 LearningRate 0.000098 Epoch: 28 Global Step: 596110 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:53:35,772-Speed 2498.24 samples/sec Loss 1.4586 LearningRate 0.000098 Epoch: 28 Global Step: 596120 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:53:43,972-Speed 2498.09 samples/sec Loss 1.4071 LearningRate 0.000098 Epoch: 28 Global Step: 596130 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:53:52,170-Speed 2499.61 samples/sec Loss 1.4572 LearningRate 0.000098 Epoch: 28 Global Step: 596140 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 06:54:00,378-Speed 2495.50 samples/sec Loss 1.4867 LearningRate 0.000098 Epoch: 28 Global Step: 596150 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:54:08,578-Speed 2498.00 samples/sec Loss 1.4787 LearningRate 0.000098 Epoch: 28 Global Step: 596160 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:54:16,725-Speed 2514.01 samples/sec Loss 1.4604 LearningRate 0.000098 Epoch: 28 Global Step: 596170 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:54:24,923-Speed 2498.58 samples/sec Loss 1.4666 LearningRate 0.000098 Epoch: 28 Global Step: 596180 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:54:33,122-Speed 2498.38 samples/sec Loss 1.4377 LearningRate 0.000098 Epoch: 28 Global Step: 596190 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:54:41,324-Speed 2497.47 samples/sec Loss 1.4265 LearningRate 0.000098 Epoch: 28 Global Step: 596200 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:54:49,521-Speed 2498.81 samples/sec Loss 1.4024 LearningRate 0.000098 Epoch: 28 Global Step: 596210 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:54:57,719-Speed 2498.44 samples/sec Loss 1.4860 LearningRate 0.000098 Epoch: 28 Global Step: 596220 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:55:05,865-Speed 2514.63 samples/sec Loss 1.4416 LearningRate 0.000098 Epoch: 28 Global Step: 596230 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:55:14,077-Speed 2494.34 samples/sec Loss 1.4376 LearningRate 0.000098 Epoch: 28 Global Step: 596240 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:55:22,272-Speed 2499.27 samples/sec Loss 1.4574 LearningRate 0.000098 Epoch: 28 Global Step: 596250 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:55:30,482-Speed 2495.13 samples/sec Loss 1.4312 LearningRate 0.000098 Epoch: 28 Global Step: 596260 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 06:55:38,682-Speed 2497.88 samples/sec Loss 1.4466 LearningRate 0.000098 Epoch: 28 Global Step: 596270 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:55:46,881-Speed 2498.12 samples/sec Loss 1.4171 LearningRate 0.000098 Epoch: 28 Global Step: 596280 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:55:55,027-Speed 2514.70 samples/sec Loss 1.4186 LearningRate 0.000098 Epoch: 28 Global Step: 596290 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:56:03,226-Speed 2498.25 samples/sec Loss 1.4073 LearningRate 0.000098 Epoch: 28 Global Step: 596300 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:56:11,425-Speed 2498.16 samples/sec Loss 1.4197 LearningRate 0.000098 Epoch: 28 Global Step: 596310 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:56:19,624-Speed 2497.99 samples/sec Loss 1.4480 LearningRate 0.000098 Epoch: 28 Global Step: 596320 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 06:56:27,824-Speed 2498.88 samples/sec Loss 1.4338 LearningRate 0.000098 Epoch: 28 Global Step: 596330 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:56:36,021-Speed 2499.11 samples/sec Loss 1.4174 LearningRate 0.000098 Epoch: 28 Global Step: 596340 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:56:44,181-Speed 2509.97 samples/sec Loss 1.4223 LearningRate 0.000098 Epoch: 28 Global Step: 596350 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:56:52,388-Speed 2495.97 samples/sec Loss 1.4332 LearningRate 0.000098 Epoch: 28 Global Step: 596360 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:57:00,602-Speed 2493.42 samples/sec Loss 1.4167 LearningRate 0.000098 Epoch: 28 Global Step: 596370 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:57:08,805-Speed 2497.61 samples/sec Loss 1.3855 LearningRate 0.000098 Epoch: 28 Global Step: 596380 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-07-11 06:57:17,003-Speed 2498.36 samples/sec Loss 1.4068 LearningRate 0.000098 Epoch: 28 Global Step: 596390 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:57:25,207-Speed 2496.76 samples/sec Loss 1.3944 LearningRate 0.000098 Epoch: 28 Global Step: 596400 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:57:33,351-Speed 2515.20 samples/sec Loss 1.4281 LearningRate 0.000098 Epoch: 28 Global Step: 596410 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:57:41,549-Speed 2498.80 samples/sec Loss 1.3904 LearningRate 0.000098 Epoch: 28 Global Step: 596420 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:57:49,748-Speed 2498.05 samples/sec Loss 1.4284 LearningRate 0.000098 Epoch: 28 Global Step: 596430 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:57:57,951-Speed 2497.43 samples/sec Loss 1.3766 LearningRate 0.000097 Epoch: 28 Global Step: 596440 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:58:06,156-Speed 2496.66 samples/sec Loss 1.4115 LearningRate 0.000097 Epoch: 28 Global Step: 596450 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:58:14,355-Speed 2498.11 samples/sec Loss 1.3780 LearningRate 0.000097 Epoch: 28 Global Step: 596460 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:58:22,512-Speed 2511.00 samples/sec Loss 1.4445 LearningRate 0.000097 Epoch: 28 Global Step: 596470 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:58:30,732-Speed 2492.19 samples/sec Loss 1.3772 LearningRate 0.000097 Epoch: 28 Global Step: 596480 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:58:38,930-Speed 2498.52 samples/sec Loss 1.4602 LearningRate 0.000097 Epoch: 28 Global Step: 596490 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:58:47,127-Speed 2498.69 samples/sec Loss 1.4607 LearningRate 0.000097 Epoch: 28 Global Step: 596500 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-07-11 06:58:55,330-Speed 2497.07 samples/sec Loss 1.4190 LearningRate 0.000097 Epoch: 28 Global Step: 596510 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:59:03,530-Speed 2498.27 samples/sec Loss 1.4006 LearningRate 0.000097 Epoch: 28 Global Step: 596520 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:59:11,674-Speed 2515.03 samples/sec Loss 1.4481 LearningRate 0.000097 Epoch: 28 Global Step: 596530 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:59:19,874-Speed 2497.90 samples/sec Loss 1.4166 LearningRate 0.000097 Epoch: 28 Global Step: 596540 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:59:28,075-Speed 2497.81 samples/sec Loss 1.4614 LearningRate 0.000097 Epoch: 28 Global Step: 596550 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:59:36,286-Speed 2494.59 samples/sec Loss 1.4104 LearningRate 0.000097 Epoch: 28 Global Step: 596560 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:59:44,484-Speed 2498.67 samples/sec Loss 1.4331 LearningRate 0.000097 Epoch: 28 Global Step: 596570 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 06:59:52,683-Speed 2498.27 samples/sec Loss 1.4292 LearningRate 0.000097 Epoch: 28 Global Step: 596580 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:00:00,830-Speed 2514.43 samples/sec Loss 1.4465 LearningRate 0.000097 Epoch: 28 Global Step: 596590 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:00:09,028-Speed 2498.72 samples/sec Loss 1.3908 LearningRate 0.000097 Epoch: 28 Global Step: 596600 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:00:17,228-Speed 2497.87 samples/sec Loss 1.4278 LearningRate 0.000097 Epoch: 28 Global Step: 596610 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:00:25,426-Speed 2498.37 samples/sec Loss 1.4254 LearningRate 0.000097 Epoch: 28 Global Step: 596620 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-07-11 07:00:33,628-Speed 2497.65 samples/sec Loss 1.4606 LearningRate 0.000097 Epoch: 28 Global Step: 596630 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:00:41,831-Speed 2496.96 samples/sec Loss 1.4346 LearningRate 0.000097 Epoch: 28 Global Step: 596640 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:00:49,973-Speed 2515.72 samples/sec Loss 1.4181 LearningRate 0.000097 Epoch: 28 Global Step: 596650 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:00:58,170-Speed 2499.19 samples/sec Loss 1.4226 LearningRate 0.000097 Epoch: 28 Global Step: 596660 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:01:06,381-Speed 2494.45 samples/sec Loss 1.4258 LearningRate 0.000097 Epoch: 28 Global Step: 596670 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:01:14,582-Speed 2498.13 samples/sec Loss 1.4512 LearningRate 0.000097 Epoch: 28 Global Step: 596680 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:01:22,785-Speed 2497.26 samples/sec Loss 1.4600 LearningRate 0.000097 Epoch: 28 Global Step: 596690 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:01:30,985-Speed 2497.95 samples/sec Loss 1.4195 LearningRate 0.000097 Epoch: 28 Global Step: 596700 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:01:39,131-Speed 2514.68 samples/sec Loss 1.4798 LearningRate 0.000097 Epoch: 28 Global Step: 596710 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:01:47,329-Speed 2498.78 samples/sec Loss 1.4264 LearningRate 0.000097 Epoch: 28 Global Step: 596720 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:01:55,527-Speed 2498.58 samples/sec Loss 1.4721 LearningRate 0.000097 Epoch: 28 Global Step: 596730 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:02:03,727-Speed 2497.62 samples/sec Loss 1.4336 LearningRate 0.000097 Epoch: 28 Global Step: 596740 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-07-11 07:02:11,924-Speed 2498.94 samples/sec Loss 1.4301 LearningRate 0.000097 Epoch: 28 Global Step: 596750 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:02:20,084-Speed 2510.28 samples/sec Loss 1.4168 LearningRate 0.000097 Epoch: 28 Global Step: 596760 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:02:28,228-Speed 2515.10 samples/sec Loss 1.4057 LearningRate 0.000097 Epoch: 28 Global Step: 596770 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:02:36,430-Speed 2497.46 samples/sec Loss 1.4435 LearningRate 0.000097 Epoch: 28 Global Step: 596780 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:02:44,628-Speed 2498.49 samples/sec Loss 1.4251 LearningRate 0.000097 Epoch: 28 Global Step: 596790 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:02:52,828-Speed 2497.95 samples/sec Loss 1.4206 LearningRate 0.000097 Epoch: 28 Global Step: 596800 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:03:01,029-Speed 2497.78 samples/sec Loss 1.4112 LearningRate 0.000097 Epoch: 28 Global Step: 596810 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:03:09,225-Speed 2499.20 samples/sec Loss 1.4327 LearningRate 0.000097 Epoch: 28 Global Step: 596820 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:03:17,373-Speed 2513.78 samples/sec Loss 1.4209 LearningRate 0.000097 Epoch: 28 Global Step: 596830 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:03:25,572-Speed 2498.67 samples/sec Loss 1.4274 LearningRate 0.000097 Epoch: 28 Global Step: 596840 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:03:33,769-Speed 2498.72 samples/sec Loss 1.4337 LearningRate 0.000097 Epoch: 28 Global Step: 596850 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:03:41,968-Speed 2498.20 samples/sec Loss 1.4031 LearningRate 0.000097 Epoch: 28 Global Step: 596860 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:03:50,169-Speed 2497.73 samples/sec Loss 1.3958 LearningRate 0.000097 Epoch: 28 Global Step: 596870 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:03:58,372-Speed 2497.59 samples/sec Loss 1.4240 LearningRate 0.000097 Epoch: 28 Global Step: 596880 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:04:06,518-Speed 2514.28 samples/sec Loss 1.4372 LearningRate 0.000097 Epoch: 28 Global Step: 596890 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:04:14,718-Speed 2497.94 samples/sec Loss 1.4236 LearningRate 0.000097 Epoch: 28 Global Step: 596900 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:04:22,933-Speed 2493.39 samples/sec Loss 1.4173 LearningRate 0.000097 Epoch: 28 Global Step: 596910 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:04:31,135-Speed 2497.55 samples/sec Loss 1.4167 LearningRate 0.000097 Epoch: 28 Global Step: 596920 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:04:39,336-Speed 2497.71 samples/sec Loss 1.4254 LearningRate 0.000097 Epoch: 28 Global Step: 596930 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:04:47,539-Speed 2496.98 samples/sec Loss 1.4323 LearningRate 0.000097 Epoch: 28 Global Step: 596940 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:04:55,685-Speed 2514.68 samples/sec Loss 1.4311 LearningRate 0.000097 Epoch: 28 Global Step: 596950 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:05:03,884-Speed 2498.16 samples/sec Loss 1.4110 LearningRate 0.000097 Epoch: 28 Global Step: 596960 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:05:12,084-Speed 2498.07 samples/sec Loss 1.3961 LearningRate 0.000097 Epoch: 28 Global Step: 596970 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:05:20,286-Speed 2497.20 samples/sec Loss 1.4064 LearningRate 0.000097 Epoch: 28 Global Step: 596980 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:05:28,484-Speed 2498.56 samples/sec Loss 1.3978 LearningRate 0.000097 Epoch: 28 Global Step: 596990 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:05:36,686-Speed 2497.39 samples/sec Loss 1.4440 LearningRate 0.000097 Epoch: 28 Global Step: 597000 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:05:44,830-Speed 2514.97 samples/sec Loss 1.4456 LearningRate 0.000097 Epoch: 28 Global Step: 597010 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:05:53,032-Speed 2497.56 samples/sec Loss 1.4378 LearningRate 0.000097 Epoch: 28 Global Step: 597020 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:06:01,236-Speed 2496.73 samples/sec Loss 1.4482 LearningRate 0.000097 Epoch: 28 Global Step: 597030 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:06:09,434-Speed 2498.74 samples/sec Loss 1.4312 LearningRate 0.000097 Epoch: 28 Global Step: 597040 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:06:17,634-Speed 2497.74 samples/sec Loss 1.4525 LearningRate 0.000097 Epoch: 28 Global Step: 597050 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:06:25,836-Speed 2497.61 samples/sec Loss 1.4094 LearningRate 0.000097 Epoch: 28 Global Step: 597060 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:06:33,986-Speed 2513.42 samples/sec Loss 1.3819 LearningRate 0.000097 Epoch: 28 Global Step: 597070 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:06:42,189-Speed 2497.02 samples/sec Loss 1.4168 LearningRate 0.000097 Epoch: 28 Global Step: 597080 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:06:50,391-Speed 2497.24 samples/sec Loss 1.4126 LearningRate 0.000097 Epoch: 28 Global Step: 597090 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:06:58,590-Speed 2498.08 samples/sec Loss 1.4529 LearningRate 0.000097 Epoch: 28 Global Step: 597100 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:07:06,790-Speed 2497.87 samples/sec Loss 1.4475 LearningRate 0.000097 Epoch: 28 Global Step: 597110 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:07:14,989-Speed 2498.44 samples/sec Loss 1.4359 LearningRate 0.000097 Epoch: 28 Global Step: 597120 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:07:23,135-Speed 2514.31 samples/sec Loss 1.4096 LearningRate 0.000097 Epoch: 28 Global Step: 597130 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:07:31,339-Speed 2496.98 samples/sec Loss 1.4354 LearningRate 0.000097 Epoch: 28 Global Step: 597140 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:07:39,538-Speed 2498.05 samples/sec Loss 1.4372 LearningRate 0.000097 Epoch: 28 Global Step: 597150 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:07:47,740-Speed 2497.45 samples/sec Loss 1.4101 LearningRate 0.000097 Epoch: 28 Global Step: 597160 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:07:55,950-Speed 2494.96 samples/sec Loss 1.4323 LearningRate 0.000097 Epoch: 28 Global Step: 597170 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:08:04,163-Speed 2493.82 samples/sec Loss 1.4170 LearningRate 0.000097 Epoch: 28 Global Step: 597180 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:08:12,327-Speed 2509.20 samples/sec Loss 1.4297 LearningRate 0.000097 Epoch: 28 Global Step: 597190 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:08:20,536-Speed 2495.05 samples/sec Loss 1.4159 LearningRate 0.000097 Epoch: 28 Global Step: 597200 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:08:28,738-Speed 2497.44 samples/sec Loss 1.4083 LearningRate 0.000097 Epoch: 28 Global Step: 597210 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:08:36,939-Speed 2497.63 samples/sec Loss 1.4447 LearningRate 0.000097 Epoch: 28 Global Step: 597220 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:08:45,152-Speed 2493.90 samples/sec Loss 1.3966 LearningRate 0.000097 Epoch: 28 Global Step: 597230 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:08:53,352-Speed 2498.29 samples/sec Loss 1.4373 LearningRate 0.000097 Epoch: 28 Global Step: 597240 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:09:01,496-Speed 2514.97 samples/sec Loss 1.4683 LearningRate 0.000097 Epoch: 28 Global Step: 597250 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:09:09,698-Speed 2497.48 samples/sec Loss 1.4423 LearningRate 0.000097 Epoch: 28 Global Step: 597260 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:09:17,899-Speed 2497.38 samples/sec Loss 1.4172 LearningRate 0.000097 Epoch: 28 Global Step: 597270 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:09:26,098-Speed 2498.37 samples/sec Loss 1.4165 LearningRate 0.000097 Epoch: 28 Global Step: 597280 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:09:34,304-Speed 2495.99 samples/sec Loss 1.3658 LearningRate 0.000097 Epoch: 28 Global Step: 597290 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:09:42,505-Speed 2497.98 samples/sec Loss 1.4280 LearningRate 0.000097 Epoch: 28 Global Step: 597300 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:09:50,658-Speed 2512.50 samples/sec Loss 1.4223 LearningRate 0.000097 Epoch: 28 Global Step: 597310 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:09:58,857-Speed 2498.24 samples/sec Loss 1.4240 LearningRate 0.000097 Epoch: 28 Global Step: 597320 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:10:07,061-Speed 2496.99 samples/sec Loss 1.4400 LearningRate 0.000097 Epoch: 28 Global Step: 597330 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:10:15,263-Speed 2497.28 samples/sec Loss 1.4561 LearningRate 0.000097 Epoch: 28 Global Step: 597340 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:10:23,468-Speed 2496.51 samples/sec Loss 1.4293 LearningRate 0.000097 Epoch: 28 Global Step: 597350 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:10:31,669-Speed 2497.49 samples/sec Loss 1.4756 LearningRate 0.000097 Epoch: 28 Global Step: 597360 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:10:39,816-Speed 2514.45 samples/sec Loss 1.4165 LearningRate 0.000097 Epoch: 28 Global Step: 597370 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:10:48,021-Speed 2496.28 samples/sec Loss 1.4309 LearningRate 0.000097 Epoch: 28 Global Step: 597380 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:10:56,225-Speed 2496.98 samples/sec Loss 1.4278 LearningRate 0.000097 Epoch: 28 Global Step: 597390 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:11:04,424-Speed 2498.43 samples/sec Loss 1.4368 LearningRate 0.000097 Epoch: 28 Global Step: 597400 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:11:12,624-Speed 2497.82 samples/sec Loss 1.4322 LearningRate 0.000097 Epoch: 28 Global Step: 597410 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:11:20,836-Speed 2494.15 samples/sec Loss 1.4541 LearningRate 0.000097 Epoch: 28 Global Step: 597420 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:11:28,988-Speed 2512.63 samples/sec Loss 1.4629 LearningRate 0.000097 Epoch: 28 Global Step: 597430 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:11:37,190-Speed 2497.37 samples/sec Loss 1.4316 LearningRate 0.000097 Epoch: 28 Global Step: 597440 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:11:45,393-Speed 2497.21 samples/sec Loss 1.4433 LearningRate 0.000097 Epoch: 28 Global Step: 597450 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:11:53,591-Speed 2498.47 samples/sec Loss 1.4271 LearningRate 0.000097 Epoch: 28 Global Step: 597460 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:12:01,791-Speed 2497.89 samples/sec Loss 1.4303 LearningRate 0.000097 Epoch: 28 Global Step: 597470 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:12:10,009-Speed 2500.08 samples/sec Loss 1.4050 LearningRate 0.000097 Epoch: 28 Global Step: 597480 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:12:18,183-Speed 2516.72 samples/sec Loss 1.4556 LearningRate 0.000097 Epoch: 28 Global Step: 597490 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:12:26,383-Speed 2498.13 samples/sec Loss 1.4179 LearningRate 0.000097 Epoch: 28 Global Step: 597500 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:12:34,611-Speed 2499.53 samples/sec Loss 1.4464 LearningRate 0.000097 Epoch: 28 Global Step: 597510 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:12:42,892-Speed 2498.55 samples/sec Loss 1.4561 LearningRate 0.000097 Epoch: 28 Global Step: 597520 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:12:51,118-Speed 2495.29 samples/sec Loss 1.4636 LearningRate 0.000097 Epoch: 28 Global Step: 597530 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:12:59,315-Speed 2498.56 samples/sec Loss 1.4443 LearningRate 0.000097 Epoch: 28 Global Step: 597540 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:13:07,520-Speed 2516.33 samples/sec Loss 1.3826 LearningRate 0.000097 Epoch: 28 Global Step: 597550 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:13:56,675-Speed 416.65 samples/sec Loss 1.3962 LearningRate 0.000097 Epoch: 28 Global Step: 597560 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:14:04,881-Speed 2508.74 samples/sec Loss 1.4289 LearningRate 0.000097 Epoch: 28 Global Step: 597570 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:14:13,057-Speed 2505.14 samples/sec Loss 1.3950 LearningRate 0.000097 Epoch: 28 Global Step: 597580 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:14:21,283-Speed 2504.40 samples/sec Loss 1.3929 LearningRate 0.000097 Epoch: 28 Global Step: 597590 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:14:29,522-Speed 2502.58 samples/sec Loss 1.4345 LearningRate 0.000097 Epoch: 28 Global Step: 597600 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:14:37,667-Speed 2514.61 samples/sec Loss 1.3789 LearningRate 0.000097 Epoch: 28 Global Step: 597610 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:14:45,873-Speed 2496.07 samples/sec Loss 1.4278 LearningRate 0.000097 Epoch: 28 Global Step: 597620 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:14:54,121-Speed 2496.55 samples/sec Loss 1.4139 LearningRate 0.000097 Epoch: 28 Global Step: 597630 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:15:02,481-Speed 2494.31 samples/sec Loss 1.3992 LearningRate 0.000096 Epoch: 28 Global Step: 597640 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:15:14,502-Speed 1703.91 samples/sec Loss 1.3960 LearningRate 0.000096 Epoch: 28 Global Step: 597650 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:15:22,765-Speed 2491.29 samples/sec Loss 1.4596 LearningRate 0.000096 Epoch: 28 Global Step: 597660 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:15:35,965-Speed 2506.13 samples/sec Loss 1.4392 LearningRate 0.000096 Epoch: 28 Global Step: 597670 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:15:44,198-Speed 2488.00 samples/sec Loss 1.4263 LearningRate 0.000096 Epoch: 28 Global Step: 597680 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:15:52,438-Speed 2485.81 samples/sec Loss 1.4152 LearningRate 0.000096 Epoch: 28 Global Step: 597690 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:16:00,785-Speed 2453.82 samples/sec Loss 1.4109 LearningRate 0.000096 Epoch: 28 Global Step: 597700 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:16:09,077-Speed 2470.14 samples/sec Loss 1.4140 LearningRate 0.000096 Epoch: 28 Global Step: 597710 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:16:17,508-Speed 2429.30 samples/sec Loss 1.4253 LearningRate 0.000096 Epoch: 28 Global Step: 597720 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:16:25,742-Speed 2487.85 samples/sec Loss 1.4226 LearningRate 0.000096 Epoch: 28 Global Step: 597730 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:16:34,011-Speed 2477.01 samples/sec Loss 1.3904 LearningRate 0.000096 Epoch: 28 Global Step: 597740 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:16:42,431-Speed 2432.71 samples/sec Loss 1.4163 LearningRate 0.000096 Epoch: 28 Global Step: 597750 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:16:50,681-Speed 2482.73 samples/sec Loss 1.4295 LearningRate 0.000096 Epoch: 28 Global Step: 597760 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:16:58,934-Speed 2481.88 samples/sec Loss 1.4143 LearningRate 0.000096 Epoch: 28 Global Step: 597770 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:17:07,183-Speed 2483.08 samples/sec Loss 1.3989 LearningRate 0.000096 Epoch: 28 Global Step: 597780 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:17:15,380-Speed 2498.69 samples/sec Loss 1.4160 LearningRate 0.000096 Epoch: 28 Global Step: 597790 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:17:23,630-Speed 2483.00 samples/sec Loss 1.4402 LearningRate 0.000096 Epoch: 28 Global Step: 597800 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:17:31,881-Speed 2482.63 samples/sec Loss 1.4062 LearningRate 0.000096 Epoch: 28 Global Step: 597810 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:17:40,127-Speed 2483.76 samples/sec Loss 1.4236 LearningRate 0.000096 Epoch: 28 Global Step: 597820 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:17:48,391-Speed 2478.67 samples/sec Loss 1.4269 LearningRate 0.000096 Epoch: 28 Global Step: 597830 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:17:56,640-Speed 2483.04 samples/sec Loss 1.4532 LearningRate 0.000096 Epoch: 28 Global Step: 597840 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:18:04,831-Speed 2500.72 samples/sec Loss 1.4119 LearningRate 0.000096 Epoch: 28 Global Step: 597850 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:18:13,075-Speed 2484.31 samples/sec Loss 1.4406 LearningRate 0.000096 Epoch: 28 Global Step: 597860 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:18:21,316-Speed 2485.60 samples/sec Loss 1.4048 LearningRate 0.000096 Epoch: 28 Global Step: 597870 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:18:29,553-Speed 2486.72 samples/sec Loss 1.4339 LearningRate 0.000096 Epoch: 28 Global Step: 597880 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:18:37,789-Speed 2486.92 samples/sec Loss 1.4208 LearningRate 0.000096 Epoch: 28 Global Step: 597890 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:18:46,030-Speed 2485.61 samples/sec Loss 1.4671 LearningRate 0.000096 Epoch: 28 Global Step: 597900 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:18:54,215-Speed 2502.44 samples/sec Loss 1.3864 LearningRate 0.000096 Epoch: 28 Global Step: 597910 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:19:02,456-Speed 2485.46 samples/sec Loss 1.4635 LearningRate 0.000096 Epoch: 28 Global Step: 597920 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:19:10,693-Speed 2486.64 samples/sec Loss 1.4234 LearningRate 0.000096 Epoch: 28 Global Step: 597930 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:19:18,928-Speed 2487.28 samples/sec Loss 1.4366 LearningRate 0.000096 Epoch: 28 Global Step: 597940 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:19:27,162-Speed 2487.93 samples/sec Loss 1.4209 LearningRate 0.000096 Epoch: 28 Global Step: 597950 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:19:35,397-Speed 2487.20 samples/sec Loss 1.4174 LearningRate 0.000096 Epoch: 28 Global Step: 597960 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:19:43,579-Speed 2503.72 samples/sec Loss 1.4170 LearningRate 0.000096 Epoch: 28 Global Step: 597970 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:19:51,812-Speed 2487.91 samples/sec Loss 1.4244 LearningRate 0.000096 Epoch: 28 Global Step: 597980 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:20:00,042-Speed 2488.95 samples/sec Loss 1.4499 LearningRate 0.000096 Epoch: 28 Global Step: 597990 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:20:08,274-Speed 2488.29 samples/sec Loss 1.3929 LearningRate 0.000096 Epoch: 28 Global Step: 598000 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:20:16,506-Speed 2488.02 samples/sec Loss 1.4477 LearningRate 0.000096 Epoch: 28 Global Step: 598010 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:20:24,736-Speed 2488.99 samples/sec Loss 1.4268 LearningRate 0.000096 Epoch: 28 Global Step: 598020 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:20:32,913-Speed 2504.98 samples/sec Loss 1.4419 LearningRate 0.000096 Epoch: 28 Global Step: 598030 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:20:41,150-Speed 2488.24 samples/sec Loss 1.4074 LearningRate 0.000096 Epoch: 28 Global Step: 598040 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:20:49,374-Speed 2490.64 samples/sec Loss 1.4556 LearningRate 0.000096 Epoch: 28 Global Step: 598050 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:20:57,609-Speed 2487.14 samples/sec Loss 1.4639 LearningRate 0.000096 Epoch: 28 Global Step: 598060 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-07-11 07:21:05,838-Speed 2489.04 samples/sec Loss 1.4282 LearningRate 0.000096 Epoch: 28 Global Step: 598070 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:21:14,070-Speed 2488.43 samples/sec Loss 1.4186 LearningRate 0.000096 Epoch: 28 Global Step: 598080 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:21:22,251-Speed 2503.86 samples/sec Loss 1.4282 LearningRate 0.000096 Epoch: 28 Global Step: 598090 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:21:30,476-Speed 2490.05 samples/sec Loss 1.4104 LearningRate 0.000096 Epoch: 28 Global Step: 598100 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:21:38,703-Speed 2490.15 samples/sec Loss 1.3842 LearningRate 0.000096 Epoch: 28 Global Step: 598110 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:21:46,941-Speed 2486.47 samples/sec Loss 1.4016 LearningRate 0.000096 Epoch: 28 Global Step: 598120 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:21:55,170-Speed 2489.09 samples/sec Loss 1.4035 LearningRate 0.000096 Epoch: 28 Global Step: 598130 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:22:03,410-Speed 2485.65 samples/sec Loss 1.4227 LearningRate 0.000096 Epoch: 28 Global Step: 598140 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:22:11,584-Speed 2506.15 samples/sec Loss 1.4341 LearningRate 0.000096 Epoch: 28 Global Step: 598150 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:22:19,809-Speed 2490.28 samples/sec Loss 1.4198 LearningRate 0.000096 Epoch: 28 Global Step: 598160 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:22:28,034-Speed 2490.68 samples/sec Loss 1.4456 LearningRate 0.000096 Epoch: 28 Global Step: 598170 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:22:36,254-Speed 2491.50 samples/sec Loss 1.4343 LearningRate 0.000096 Epoch: 28 Global Step: 598180 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-07-11 07:22:44,489-Speed 2487.61 samples/sec Loss 1.4149 LearningRate 0.000096 Epoch: 28 Global Step: 598190 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:22:52,710-Speed 2491.32 samples/sec Loss 1.4191 LearningRate 0.000096 Epoch: 28 Global Step: 598200 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:23:00,894-Speed 2503.02 samples/sec Loss 1.4478 LearningRate 0.000096 Epoch: 28 Global Step: 598210 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:23:09,119-Speed 2490.27 samples/sec Loss 1.4322 LearningRate 0.000096 Epoch: 28 Global Step: 598220 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:23:17,340-Speed 2491.94 samples/sec Loss 1.4483 LearningRate 0.000096 Epoch: 28 Global Step: 598230 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:23:25,566-Speed 2490.22 samples/sec Loss 1.4278 LearningRate 0.000096 Epoch: 28 Global Step: 598240 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:23:33,789-Speed 2490.84 samples/sec Loss 1.4181 LearningRate 0.000096 Epoch: 28 Global Step: 598250 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-07-11 07:23:41,966-Speed 2505.08 samples/sec Loss 1.4214 LearningRate 0.000096 Epoch: 28 Global Step: 598260 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:23:50,135-Speed 2507.64 samples/sec Loss 1.4335 LearningRate 0.000096 Epoch: 28 Global Step: 598270 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:23:58,356-Speed 2491.98 samples/sec Loss 1.4106 LearningRate 0.000096 Epoch: 28 Global Step: 598280 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:24:06,577-Speed 2491.34 samples/sec Loss 1.4342 LearningRate 0.000096 Epoch: 28 Global Step: 598290 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:24:14,810-Speed 2488.06 samples/sec Loss 1.4366 LearningRate 0.000096 Epoch: 28 Global Step: 598300 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:24:23,044-Speed 2487.55 samples/sec Loss 1.4015 LearningRate 0.000096 Epoch: 28 Global Step: 598310 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:24:31,281-Speed 2486.63 samples/sec Loss 1.4029 LearningRate 0.000096 Epoch: 28 Global Step: 598320 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:24:39,448-Speed 2508.16 samples/sec Loss 1.4103 LearningRate 0.000096 Epoch: 28 Global Step: 598330 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:24:47,669-Speed 2491.44 samples/sec Loss 1.4215 LearningRate 0.000096 Epoch: 28 Global Step: 598340 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:24:55,911-Speed 2485.12 samples/sec Loss 1.4190 LearningRate 0.000096 Epoch: 28 Global Step: 598350 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:25:04,129-Speed 2492.90 samples/sec Loss 1.4111 LearningRate 0.000096 Epoch: 28 Global Step: 598360 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:25:12,346-Speed 2492.50 samples/sec Loss 1.4197 LearningRate 0.000096 Epoch: 28 Global Step: 598370 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:25:20,571-Speed 2490.52 samples/sec Loss 1.3906 LearningRate 0.000096 Epoch: 28 Global Step: 598380 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:25:28,738-Speed 2508.24 samples/sec Loss 1.4499 LearningRate 0.000096 Epoch: 28 Global Step: 598390 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:25:36,957-Speed 2492.08 samples/sec Loss 1.4284 LearningRate 0.000096 Epoch: 28 Global Step: 598400 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:25:45,184-Speed 2490.00 samples/sec Loss 1.4246 LearningRate 0.000096 Epoch: 28 Global Step: 598410 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:25:53,411-Speed 2490.01 samples/sec Loss 1.3808 LearningRate 0.000096 Epoch: 28 Global Step: 598420 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:26:01,633-Speed 2491.35 samples/sec Loss 1.4303 LearningRate 0.000096 Epoch: 28 Global Step: 598430 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:26:09,849-Speed 2492.86 samples/sec Loss 1.4282 LearningRate 0.000096 Epoch: 28 Global Step: 598440 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:26:18,011-Speed 2509.63 samples/sec Loss 1.3926 LearningRate 0.000096 Epoch: 28 Global Step: 598450 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:26:26,237-Speed 2490.31 samples/sec Loss 1.4306 LearningRate 0.000096 Epoch: 28 Global Step: 598460 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:26:34,452-Speed 2493.12 samples/sec Loss 1.4372 LearningRate 0.000096 Epoch: 28 Global Step: 598470 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:26:42,679-Speed 2489.87 samples/sec Loss 1.4167 LearningRate 0.000096 Epoch: 28 Global Step: 598480 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:26:50,890-Speed 2494.90 samples/sec Loss 1.4189 LearningRate 0.000096 Epoch: 28 Global Step: 598490 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:26:59,106-Speed 2493.25 samples/sec Loss 1.4033 LearningRate 0.000096 Epoch: 28 Global Step: 598500 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:27:07,269-Speed 2508.98 samples/sec Loss 1.3855 LearningRate 0.000096 Epoch: 28 Global Step: 598510 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:27:15,483-Speed 2493.87 samples/sec Loss 1.4152 LearningRate 0.000096 Epoch: 28 Global Step: 598520 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:27:23,698-Speed 2493.34 samples/sec Loss 1.4029 LearningRate 0.000096 Epoch: 28 Global Step: 598530 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:27:31,917-Speed 2492.35 samples/sec Loss 1.4084 LearningRate 0.000096 Epoch: 28 Global Step: 598540 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:27:40,134-Speed 2492.86 samples/sec Loss 1.4071 LearningRate 0.000096 Epoch: 28 Global Step: 598550 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:27:48,363-Speed 2488.84 samples/sec Loss 1.4458 LearningRate 0.000096 Epoch: 28 Global Step: 598560 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:27:56,530-Speed 2508.03 samples/sec Loss 1.4321 LearningRate 0.000096 Epoch: 28 Global Step: 598570 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:28:04,749-Speed 2492.21 samples/sec Loss 1.4049 LearningRate 0.000096 Epoch: 28 Global Step: 598580 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:28:12,963-Speed 2493.87 samples/sec Loss 1.4112 LearningRate 0.000096 Epoch: 28 Global Step: 598590 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:28:21,188-Speed 2490.44 samples/sec Loss 1.4397 LearningRate 0.000096 Epoch: 28 Global Step: 598600 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:28:29,400-Speed 2494.57 samples/sec Loss 1.4154 LearningRate 0.000096 Epoch: 28 Global Step: 598610 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:28:37,613-Speed 2493.95 samples/sec Loss 1.4414 LearningRate 0.000096 Epoch: 28 Global Step: 598620 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:28:45,770-Speed 2510.99 samples/sec Loss 1.3855 LearningRate 0.000096 Epoch: 28 Global Step: 598630 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:28:53,992-Speed 2491.32 samples/sec Loss 1.3994 LearningRate 0.000096 Epoch: 28 Global Step: 598640 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:29:02,204-Speed 2494.43 samples/sec Loss 1.4212 LearningRate 0.000096 Epoch: 28 Global Step: 598650 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:29:10,421-Speed 2492.75 samples/sec Loss 1.4394 LearningRate 0.000096 Epoch: 28 Global Step: 598660 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:29:18,647-Speed 2489.99 samples/sec Loss 1.4484 LearningRate 0.000096 Epoch: 28 Global Step: 598670 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:29:26,858-Speed 2494.41 samples/sec Loss 1.4396 LearningRate 0.000096 Epoch: 28 Global Step: 598680 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:29:35,018-Speed 2510.39 samples/sec Loss 1.4295 LearningRate 0.000096 Epoch: 28 Global Step: 598690 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:29:43,228-Speed 2494.81 samples/sec Loss 1.4363 LearningRate 0.000096 Epoch: 28 Global Step: 598700 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:29:51,439-Speed 2494.56 samples/sec Loss 1.4497 LearningRate 0.000096 Epoch: 28 Global Step: 598710 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:29:59,653-Speed 2493.78 samples/sec Loss 1.4355 LearningRate 0.000096 Epoch: 28 Global Step: 598720 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:30:07,863-Speed 2494.96 samples/sec Loss 1.3911 LearningRate 0.000096 Epoch: 28 Global Step: 598730 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:30:16,079-Speed 2493.20 samples/sec Loss 1.4464 LearningRate 0.000096 Epoch: 28 Global Step: 598740 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:30:24,236-Speed 2510.95 samples/sec Loss 1.4283 LearningRate 0.000096 Epoch: 28 Global Step: 598750 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:30:32,446-Speed 2495.07 samples/sec Loss 1.4321 LearningRate 0.000096 Epoch: 28 Global Step: 598760 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:30:40,656-Speed 2494.91 samples/sec Loss 1.3778 LearningRate 0.000096 Epoch: 28 Global Step: 598770 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:30:48,865-Speed 2495.20 samples/sec Loss 1.4652 LearningRate 0.000096 Epoch: 28 Global Step: 598780 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:30:57,076-Speed 2494.73 samples/sec Loss 1.4016 LearningRate 0.000096 Epoch: 28 Global Step: 598790 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:31:05,288-Speed 2494.26 samples/sec Loss 1.4226 LearningRate 0.000096 Epoch: 28 Global Step: 598800 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:31:13,445-Speed 2511.23 samples/sec Loss 1.4567 LearningRate 0.000096 Epoch: 28 Global Step: 598810 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:31:21,660-Speed 2493.56 samples/sec Loss 1.4108 LearningRate 0.000096 Epoch: 28 Global Step: 598820 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:31:29,960-Speed 2467.87 samples/sec Loss 1.3950 LearningRate 0.000096 Epoch: 28 Global Step: 598830 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:31:38,173-Speed 2494.04 samples/sec Loss 1.4189 LearningRate 0.000095 Epoch: 28 Global Step: 598840 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:31:46,386-Speed 2494.08 samples/sec Loss 1.3742 LearningRate 0.000095 Epoch: 28 Global Step: 598850 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:31:54,599-Speed 2493.99 samples/sec Loss 1.4241 LearningRate 0.000095 Epoch: 28 Global Step: 598860 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:32:02,765-Speed 2508.50 samples/sec Loss 1.3959 LearningRate 0.000095 Epoch: 28 Global Step: 598870 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:32:10,975-Speed 2494.85 samples/sec Loss 1.3962 LearningRate 0.000095 Epoch: 28 Global Step: 598880 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:32:19,185-Speed 2494.96 samples/sec Loss 1.4571 LearningRate 0.000095 Epoch: 28 Global Step: 598890 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:32:27,405-Speed 2491.71 samples/sec Loss 1.4111 LearningRate 0.000095 Epoch: 28 Global Step: 598900 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:32:35,611-Speed 2496.50 samples/sec Loss 1.4194 LearningRate 0.000095 Epoch: 28 Global Step: 598910 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:32:43,825-Speed 2493.84 samples/sec Loss 1.3910 LearningRate 0.000095 Epoch: 28 Global Step: 598920 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:32:51,982-Speed 2511.36 samples/sec Loss 1.4223 LearningRate 0.000095 Epoch: 28 Global Step: 598930 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:33:00,189-Speed 2495.59 samples/sec Loss 1.4170 LearningRate 0.000095 Epoch: 28 Global Step: 598940 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:33:08,399-Speed 2494.71 samples/sec Loss 1.4113 LearningRate 0.000095 Epoch: 28 Global Step: 598950 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:33:16,607-Speed 2496.05 samples/sec Loss 1.4354 LearningRate 0.000095 Epoch: 28 Global Step: 598960 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:33:24,828-Speed 2491.48 samples/sec Loss 1.4333 LearningRate 0.000095 Epoch: 28 Global Step: 598970 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:33:33,036-Speed 2495.52 samples/sec Loss 1.4261 LearningRate 0.000095 Epoch: 28 Global Step: 598980 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:33:41,193-Speed 2511.17 samples/sec Loss 1.4393 LearningRate 0.000095 Epoch: 28 Global Step: 598990 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:33:49,399-Speed 2496.24 samples/sec Loss 1.4231 LearningRate 0.000095 Epoch: 28 Global Step: 599000 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:33:57,611-Speed 2494.19 samples/sec Loss 1.4564 LearningRate 0.000095 Epoch: 28 Global Step: 599010 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:34:05,826-Speed 2493.28 samples/sec Loss 1.3877 LearningRate 0.000095 Epoch: 28 Global Step: 599020 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:34:14,032-Speed 2496.32 samples/sec Loss 1.4030 LearningRate 0.000095 Epoch: 28 Global Step: 599030 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:34:22,238-Speed 2496.26 samples/sec Loss 1.4323 LearningRate 0.000095 Epoch: 28 Global Step: 599040 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:34:30,406-Speed 2507.61 samples/sec Loss 1.4500 LearningRate 0.000095 Epoch: 28 Global Step: 599050 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:34:38,622-Speed 2493.02 samples/sec Loss 1.4138 LearningRate 0.000095 Epoch: 28 Global Step: 599060 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:34:46,833-Speed 2494.94 samples/sec Loss 1.4325 LearningRate 0.000095 Epoch: 28 Global Step: 599070 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:34:55,040-Speed 2496.00 samples/sec Loss 1.4171 LearningRate 0.000095 Epoch: 28 Global Step: 599080 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:35:03,246-Speed 2495.98 samples/sec Loss 1.4456 LearningRate 0.000095 Epoch: 28 Global Step: 599090 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:35:11,455-Speed 2495.41 samples/sec Loss 1.4314 LearningRate 0.000095 Epoch: 28 Global Step: 599100 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:35:19,623-Speed 2508.12 samples/sec Loss 1.4272 LearningRate 0.000095 Epoch: 28 Global Step: 599110 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:35:27,834-Speed 2494.33 samples/sec Loss 1.3975 LearningRate 0.000095 Epoch: 28 Global Step: 599120 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:35:36,043-Speed 2495.52 samples/sec Loss 1.4141 LearningRate 0.000095 Epoch: 28 Global Step: 599130 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:35:44,250-Speed 2495.90 samples/sec Loss 1.4248 LearningRate 0.000095 Epoch: 28 Global Step: 599140 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:35:52,459-Speed 2495.55 samples/sec Loss 1.4047 LearningRate 0.000095 Epoch: 28 Global Step: 599150 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:36:00,674-Speed 2493.10 samples/sec Loss 1.4532 LearningRate 0.000095 Epoch: 28 Global Step: 599160 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:36:08,827-Speed 2512.47 samples/sec Loss 1.4180 LearningRate 0.000095 Epoch: 28 Global Step: 599170 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:36:17,038-Speed 2494.90 samples/sec Loss 1.4402 LearningRate 0.000095 Epoch: 28 Global Step: 599180 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:36:25,244-Speed 2496.13 samples/sec Loss 1.3857 LearningRate 0.000095 Epoch: 28 Global Step: 599190 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:36:33,459-Speed 2493.34 samples/sec Loss 1.4163 LearningRate 0.000095 Epoch: 28 Global Step: 599200 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:36:41,668-Speed 2495.28 samples/sec Loss 1.4126 LearningRate 0.000095 Epoch: 28 Global Step: 599210 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:36:49,876-Speed 2495.42 samples/sec Loss 1.4223 LearningRate 0.000095 Epoch: 28 Global Step: 599220 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:36:58,036-Speed 2510.49 samples/sec Loss 1.4233 LearningRate 0.000095 Epoch: 28 Global Step: 599230 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:37:06,242-Speed 2496.10 samples/sec Loss 1.4127 LearningRate 0.000095 Epoch: 28 Global Step: 599240 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:37:14,450-Speed 2495.39 samples/sec Loss 1.4205 LearningRate 0.000095 Epoch: 28 Global Step: 599250 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:37:22,674-Speed 2491.01 samples/sec Loss 1.4368 LearningRate 0.000095 Epoch: 28 Global Step: 599260 Fp16 Grad Scale: 16384 Required: 53 hours 
Training: 2022-07-11 07:37:30,882-Speed 2495.44 samples/sec Loss 1.4457 LearningRate 0.000095 Epoch: 28 Global Step: 599270 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:37:39,092-Speed 2494.79 samples/sec Loss 1.4196 LearningRate 0.000095 Epoch: 28 Global Step: 599280 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:37:47,257-Speed 2508.44 samples/sec Loss 1.4136 LearningRate 0.000095 Epoch: 28 Global Step: 599290 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:37:55,475-Speed 2492.51 samples/sec Loss 1.4609 LearningRate 0.000095 Epoch: 28 Global Step: 599300 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:38:03,681-Speed 2496.21 samples/sec Loss 1.4000 LearningRate 0.000095 Epoch: 28 Global Step: 599310 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:38:11,894-Speed 2493.89 samples/sec Loss 1.4337 LearningRate 0.000095 Epoch: 28 Global Step: 599320 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:38:20,103-Speed 2495.02 samples/sec Loss 1.4098 LearningRate 0.000095 Epoch: 28 Global Step: 599330 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-07-11 07:38:28,267-Speed 2509.06 samples/sec Loss 1.4180 LearningRate 0.000095 Epoch: 28 Global Step: 599340 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:38:36,429-Speed 2509.82 samples/sec Loss 1.4038 LearningRate 0.000095 Epoch: 28 Global Step: 599350 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:38:44,632-Speed 2497.15 samples/sec Loss 1.4165 LearningRate 0.000095 Epoch: 28 Global Step: 599360 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:38:52,839-Speed 2495.89 samples/sec Loss 1.4069 LearningRate 0.000095 Epoch: 28 Global Step: 599370 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:39:01,042-Speed 2497.03 samples/sec Loss 1.4436 LearningRate 0.000095 Epoch: 28 Global Step: 599380 Fp16 Grad Scale: 8192 Required: 53 hours 
Training: 2022-07-11 07:39:09,257-Speed 2493.54 samples/sec Loss 1.4436 LearningRate 0.000095 Epoch: 28 Global Step: 599390 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:39:17,460-Speed 2496.86 samples/sec Loss 1.4633 LearningRate 0.000095 Epoch: 28 Global Step: 599400 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:39:25,613-Speed 2512.79 samples/sec Loss 1.4358 LearningRate 0.000095 Epoch: 28 Global Step: 599410 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:39:33,815-Speed 2497.64 samples/sec Loss 1.4378 LearningRate 0.000095 Epoch: 28 Global Step: 599420 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:39:42,031-Speed 2492.84 samples/sec Loss 1.4192 LearningRate 0.000095 Epoch: 28 Global Step: 599430 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:39:50,234-Speed 2497.19 samples/sec Loss 1.4369 LearningRate 0.000095 Epoch: 28 Global Step: 599440 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:39:58,441-Speed 2495.74 samples/sec Loss 1.4295 LearningRate 0.000095 Epoch: 28 Global Step: 599450 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:40:06,670-Speed 2489.57 samples/sec Loss 1.3782 LearningRate 0.000095 Epoch: 28 Global Step: 599460 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:40:14,839-Speed 2507.41 samples/sec Loss 1.3872 LearningRate 0.000095 Epoch: 28 Global Step: 599470 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:40:23,051-Speed 2494.22 samples/sec Loss 1.4117 LearningRate 0.000095 Epoch: 28 Global Step: 599480 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:40:31,259-Speed 2495.39 samples/sec Loss 1.4484 LearningRate 0.000095 Epoch: 28 Global Step: 599490 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:40:39,464-Speed 2496.67 samples/sec Loss 1.4047 LearningRate 0.000095 Epoch: 28 Global Step: 599500 Fp16 Grad Scale: 8192 Required: 53 hours 
Training: 2022-07-11 07:40:47,680-Speed 2493.05 samples/sec Loss 1.4392 LearningRate 0.000095 Epoch: 28 Global Step: 599510 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:40:55,890-Speed 2495.02 samples/sec Loss 1.3954 LearningRate 0.000095 Epoch: 28 Global Step: 599520 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:41:04,040-Speed 2513.60 samples/sec Loss 1.4450 LearningRate 0.000095 Epoch: 28 Global Step: 599530 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:41:12,246-Speed 2496.04 samples/sec Loss 1.4136 LearningRate 0.000095 Epoch: 28 Global Step: 599540 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:41:20,453-Speed 2495.94 samples/sec Loss 1.4545 LearningRate 0.000095 Epoch: 28 Global Step: 599550 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:41:28,655-Speed 2497.14 samples/sec Loss 1.4636 LearningRate 0.000095 Epoch: 28 Global Step: 599560 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:41:36,864-Speed 2495.51 samples/sec Loss 1.4432 LearningRate 0.000095 Epoch: 28 Global Step: 599570 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:41:45,073-Speed 2495.07 samples/sec Loss 1.4271 LearningRate 0.000095 Epoch: 28 Global Step: 599580 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:41:53,224-Speed 2513.11 samples/sec Loss 1.4265 LearningRate 0.000095 Epoch: 28 Global Step: 599590 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:42:01,431-Speed 2495.81 samples/sec Loss 1.4145 LearningRate 0.000095 Epoch: 28 Global Step: 599600 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:42:09,637-Speed 2496.26 samples/sec Loss 1.4338 LearningRate 0.000095 Epoch: 28 Global Step: 599610 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:42:17,841-Speed 2496.82 samples/sec Loss 1.3990 LearningRate 0.000095 Epoch: 28 Global Step: 599620 Fp16 Grad Scale: 8192 Required: 53 hours 
Training: 2022-07-11 07:42:26,044-Speed 2496.86 samples/sec Loss 1.4288 LearningRate 0.000095 Epoch: 28 Global Step: 599630 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:42:34,256-Speed 2494.56 samples/sec Loss 1.4486 LearningRate 0.000095 Epoch: 28 Global Step: 599640 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:42:42,421-Speed 2508.86 samples/sec Loss 1.4399 LearningRate 0.000095 Epoch: 28 Global Step: 599650 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:42:50,625-Speed 2496.53 samples/sec Loss 1.4354 LearningRate 0.000095 Epoch: 28 Global Step: 599660 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:42:58,826-Speed 2497.63 samples/sec Loss 1.4278 LearningRate 0.000095 Epoch: 28 Global Step: 599670 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:43:07,027-Speed 2497.68 samples/sec Loss 1.4281 LearningRate 0.000095 Epoch: 28 Global Step: 599680 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:43:15,230-Speed 2497.30 samples/sec Loss 1.4186 LearningRate 0.000095 Epoch: 28 Global Step: 599690 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:43:23,435-Speed 2496.40 samples/sec Loss 1.4375 LearningRate 0.000095 Epoch: 28 Global Step: 599700 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:43:31,585-Speed 2513.18 samples/sec Loss 1.4425 LearningRate 0.000095 Epoch: 28 Global Step: 599710 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:43:39,791-Speed 2496.11 samples/sec Loss 1.4273 LearningRate 0.000095 Epoch: 28 Global Step: 599720 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:43:47,995-Speed 2496.88 samples/sec Loss 1.4198 LearningRate 0.000095 Epoch: 28 Global Step: 599730 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:43:56,196-Speed 2497.51 samples/sec Loss 1.4399 LearningRate 0.000095 Epoch: 28 Global Step: 599740 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:44:04,399-Speed 2497.40 samples/sec 
Loss 1.4133 LearningRate 0.000095 Epoch: 28 Global Step: 599750 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:44:12,604-Speed 2496.59 samples/sec Loss 1.3878 LearningRate 0.000095 Epoch: 28 Global Step: 599760 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:44:20,754-Speed 2513.44 samples/sec Loss 1.4391 LearningRate 0.000095 Epoch: 28 Global Step: 599770 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:44:28,964-Speed 2494.79 samples/sec Loss 1.4050 LearningRate 0.000095 Epoch: 28 Global Step: 599780 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:44:37,170-Speed 2496.08 samples/sec Loss 1.3868 LearningRate 0.000095 Epoch: 28 Global Step: 599790 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:44:45,371-Speed 2497.53 samples/sec Loss 1.3790 LearningRate 0.000095 Epoch: 28 Global Step: 599800 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:44:53,576-Speed 2496.34 samples/sec Loss 1.4066 LearningRate 0.000095 Epoch: 28 Global Step: 599810 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:45:01,785-Speed 2495.24 samples/sec Loss 1.4222 LearningRate 0.000095 Epoch: 28 Global Step: 599820 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:45:09,936-Speed 2513.02 samples/sec Loss 1.4545 LearningRate 0.000095 Epoch: 28 Global Step: 599830 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:45:18,143-Speed 2495.85 samples/sec Loss 1.4522 LearningRate 0.000095 Epoch: 28 Global Step: 599840 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:45:26,345-Speed 2497.11 samples/sec Loss 1.4364 LearningRate 0.000095 Epoch: 28 Global Step: 599850 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:45:34,551-Speed 2496.45 samples/sec Loss 1.4144 LearningRate 0.000095 Epoch: 28 Global Step: 599860 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:45:42,758-Speed 2495.48 samples/sec Loss 1.3902 
LearningRate 0.000095 Epoch: 28 Global Step: 599870 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:45:50,964-Speed 2496.43 samples/sec Loss 1.4489 LearningRate 0.000095 Epoch: 28 Global Step: 599880 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:45:59,120-Speed 2511.42 samples/sec Loss 1.3983 LearningRate 0.000095 Epoch: 28 Global Step: 599890 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:46:07,325-Speed 2496.30 samples/sec Loss 1.4353 LearningRate 0.000095 Epoch: 28 Global Step: 599900 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:46:15,532-Speed 2495.97 samples/sec Loss 1.4452 LearningRate 0.000095 Epoch: 28 Global Step: 599910 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:46:23,737-Speed 2496.47 samples/sec Loss 1.4174 LearningRate 0.000095 Epoch: 28 Global Step: 599920 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:46:31,940-Speed 2496.89 samples/sec Loss 1.4228 LearningRate 0.000095 Epoch: 28 Global Step: 599930 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:46:40,158-Speed 2492.29 samples/sec Loss 1.4521 LearningRate 0.000095 Epoch: 28 Global Step: 599940 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:46:48,313-Speed 2512.02 samples/sec Loss 1.4227 LearningRate 0.000095 Epoch: 28 Global Step: 599950 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:46:56,523-Speed 2494.70 samples/sec Loss 1.4392 LearningRate 0.000095 Epoch: 28 Global Step: 599960 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:47:04,727-Speed 2496.56 samples/sec Loss 1.4137 LearningRate 0.000095 Epoch: 28 Global Step: 599970 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:47:12,930-Speed 2497.03 samples/sec Loss 1.3962 LearningRate 0.000095 Epoch: 28 Global Step: 599980 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:47:21,139-Speed 2495.34 samples/sec Loss 1.3717 LearningRate 
0.000095 Epoch: 28 Global Step: 599990 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:47:29,345-Speed 2496.37 samples/sec Loss 1.4309 LearningRate 0.000095 Epoch: 28 Global Step: 600000 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:47:37,501-Speed 2511.40 samples/sec Loss 1.4163 LearningRate 0.000095 Epoch: 28 Global Step: 600010 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:47:45,704-Speed 2496.93 samples/sec Loss 1.4473 LearningRate 0.000095 Epoch: 28 Global Step: 600020 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:47:53,912-Speed 2495.52 samples/sec Loss 1.4236 LearningRate 0.000095 Epoch: 28 Global Step: 600030 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:48:02,132-Speed 2491.94 samples/sec Loss 1.4224 LearningRate 0.000095 Epoch: 28 Global Step: 600040 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:48:10,335-Speed 2497.06 samples/sec Loss 1.4184 LearningRate 0.000094 Epoch: 28 Global Step: 600050 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:48:18,541-Speed 2495.93 samples/sec Loss 1.4147 LearningRate 0.000094 Epoch: 28 Global Step: 600060 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:48:26,695-Speed 2512.02 samples/sec Loss 1.4302 LearningRate 0.000094 Epoch: 28 Global Step: 600070 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:48:34,900-Speed 2496.70 samples/sec Loss 1.4289 LearningRate 0.000094 Epoch: 28 Global Step: 600080 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:48:43,134-Speed 2487.49 samples/sec Loss 1.3916 LearningRate 0.000094 Epoch: 28 Global Step: 600090 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:48:51,337-Speed 2496.99 samples/sec Loss 1.4352 LearningRate 0.000094 Epoch: 28 Global Step: 600100 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:48:59,537-Speed 2498.24 samples/sec Loss 1.3954 LearningRate 0.000094 Epoch: 28 
Global Step: 600110 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:49:07,740-Speed 2496.93 samples/sec Loss 1.4555 LearningRate 0.000094 Epoch: 28 Global Step: 600120 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:49:15,889-Speed 2513.82 samples/sec Loss 1.4044 LearningRate 0.000094 Epoch: 28 Global Step: 600130 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:49:24,094-Speed 2496.46 samples/sec Loss 1.4190 LearningRate 0.000094 Epoch: 28 Global Step: 600140 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:49:32,297-Speed 2497.20 samples/sec Loss 1.3933 LearningRate 0.000094 Epoch: 28 Global Step: 600150 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:49:40,496-Speed 2498.21 samples/sec Loss 1.4244 LearningRate 0.000094 Epoch: 28 Global Step: 600160 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:49:48,698-Speed 2497.30 samples/sec Loss 1.3917 LearningRate 0.000094 Epoch: 28 Global Step: 600170 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:49:56,900-Speed 2497.29 samples/sec Loss 1.4378 LearningRate 0.000094 Epoch: 28 Global Step: 600180 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:50:05,050-Speed 2513.37 samples/sec Loss 1.4035 LearningRate 0.000094 Epoch: 28 Global Step: 600190 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:50:13,260-Speed 2494.86 samples/sec Loss 1.3733 LearningRate 0.000094 Epoch: 28 Global Step: 600200 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:50:21,463-Speed 2497.02 samples/sec Loss 1.4350 LearningRate 0.000094 Epoch: 28 Global Step: 600210 Fp16 Grad Scale: 8192 Required: 53 hours Training: 2022-07-11 07:50:29,682-Speed 2492.34 samples/sec Loss 1.3934 LearningRate 0.000094 Epoch: 28 Global Step: 600220 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:50:37,889-Speed 2495.95 samples/sec Loss 1.3733 LearningRate 0.000094 Epoch: 28 Global Step: 600230 
Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:50:46,094-Speed 2496.26 samples/sec Loss 1.3966 LearningRate 0.000094 Epoch: 28 Global Step: 600240 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:50:54,243-Speed 2514.16 samples/sec Loss 1.3986 LearningRate 0.000094 Epoch: 28 Global Step: 600250 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:51:02,449-Speed 2496.21 samples/sec Loss 1.3878 LearningRate 0.000094 Epoch: 28 Global Step: 600260 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:51:10,660-Speed 2494.64 samples/sec Loss 1.4009 LearningRate 0.000094 Epoch: 28 Global Step: 600270 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:51:18,867-Speed 2495.60 samples/sec Loss 1.4168 LearningRate 0.000094 Epoch: 28 Global Step: 600280 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:51:27,075-Speed 2495.46 samples/sec Loss 1.4220 LearningRate 0.000094 Epoch: 28 Global Step: 600290 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:51:35,292-Speed 2492.78 samples/sec Loss 1.3895 LearningRate 0.000094 Epoch: 28 Global Step: 600300 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:51:43,451-Speed 2510.42 samples/sec Loss 1.4148 LearningRate 0.000094 Epoch: 28 Global Step: 600310 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:51:51,657-Speed 2496.20 samples/sec Loss 1.4267 LearningRate 0.000094 Epoch: 28 Global Step: 600320 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:51:59,861-Speed 2496.78 samples/sec Loss 1.4371 LearningRate 0.000094 Epoch: 28 Global Step: 600330 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:52:08,071-Speed 2494.85 samples/sec Loss 1.4071 LearningRate 0.000094 Epoch: 28 Global Step: 600340 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:52:16,288-Speed 2492.89 samples/sec Loss 1.3924 LearningRate 0.000094 Epoch: 28 Global Step: 600350 Fp16 Grad Scale: 
8192 Required: 52 hours Training: 2022-07-11 07:52:24,497-Speed 2494.95 samples/sec Loss 1.4462 LearningRate 0.000094 Epoch: 28 Global Step: 600360 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:52:32,649-Speed 2512.76 samples/sec Loss 1.4137 LearningRate 0.000094 Epoch: 28 Global Step: 600370 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:52:40,855-Speed 2496.02 samples/sec Loss 1.3834 LearningRate 0.000094 Epoch: 28 Global Step: 600380 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:52:49,059-Speed 2496.88 samples/sec Loss 1.4217 LearningRate 0.000094 Epoch: 28 Global Step: 600390 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:52:57,274-Speed 2493.34 samples/sec Loss 1.4013 LearningRate 0.000094 Epoch: 28 Global Step: 600400 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:53:05,477-Speed 2496.98 samples/sec Loss 1.3927 LearningRate 0.000094 Epoch: 28 Global Step: 600410 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:53:13,681-Speed 2496.62 samples/sec Loss 1.4266 LearningRate 0.000094 Epoch: 28 Global Step: 600420 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:53:21,834-Speed 2512.52 samples/sec Loss 1.4280 LearningRate 0.000094 Epoch: 28 Global Step: 600430 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:53:30,038-Speed 2496.70 samples/sec Loss 1.4149 LearningRate 0.000094 Epoch: 28 Global Step: 600440 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:53:38,240-Speed 2497.36 samples/sec Loss 1.3771 LearningRate 0.000094 Epoch: 28 Global Step: 600450 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:53:46,441-Speed 2498.13 samples/sec Loss 1.4446 LearningRate 0.000094 Epoch: 28 Global Step: 600460 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:53:54,645-Speed 2496.52 samples/sec Loss 1.4084 LearningRate 0.000094 Epoch: 28 Global Step: 600470 Fp16 Grad Scale: 8192 Required: 52 
hours Training: 2022-07-11 07:54:02,851-Speed 2496.23 samples/sec Loss 1.4033 LearningRate 0.000094 Epoch: 28 Global Step: 600480 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:54:11,015-Speed 2509.06 samples/sec Loss 1.4292 LearningRate 0.000094 Epoch: 28 Global Step: 600490 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:54:19,226-Speed 2494.77 samples/sec Loss 1.4363 LearningRate 0.000094 Epoch: 28 Global Step: 600500 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:54:27,430-Speed 2496.51 samples/sec Loss 1.4414 LearningRate 0.000094 Epoch: 28 Global Step: 600510 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:54:35,642-Speed 2494.38 samples/sec Loss 1.4215 LearningRate 0.000094 Epoch: 28 Global Step: 600520 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:54:43,845-Speed 2497.25 samples/sec Loss 1.4084 LearningRate 0.000094 Epoch: 28 Global Step: 600530 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 07:54:52,047-Speed 2497.21 samples/sec Loss 1.4167 LearningRate 0.000094 Epoch: 28 Global Step: 600540 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:55:00,200-Speed 2512.57 samples/sec Loss 1.4349 LearningRate 0.000094 Epoch: 28 Global Step: 600550 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:55:08,420-Speed 2491.97 samples/sec Loss 1.4499 LearningRate 0.000094 Epoch: 28 Global Step: 600560 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:55:16,623-Speed 2497.29 samples/sec Loss 1.4325 LearningRate 0.000094 Epoch: 28 Global Step: 600570 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:55:24,827-Speed 2496.50 samples/sec Loss 1.4283 LearningRate 0.000094 Epoch: 28 Global Step: 600580 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:55:33,049-Speed 2491.25 samples/sec Loss 1.3780 LearningRate 0.000094 Epoch: 28 Global Step: 600590 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 07:55:41,255-Speed 2496.38 samples/sec Loss 1.4103 LearningRate 0.000094 Epoch: 28 Global Step: 600600 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:55:49,411-Speed 2511.54 samples/sec Loss 1.3857 LearningRate 0.000094 Epoch: 28 Global Step: 600610 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:55:57,615-Speed 2496.48 samples/sec Loss 1.4121 LearningRate 0.000094 Epoch: 28 Global Step: 600620 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:56:05,827-Speed 2494.56 samples/sec Loss 1.3900 LearningRate 0.000094 Epoch: 28 Global Step: 600630 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:56:14,032-Speed 2496.56 samples/sec Loss 1.4174 LearningRate 0.000094 Epoch: 28 Global Step: 600640 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:56:22,239-Speed 2495.98 samples/sec Loss 1.3679 LearningRate 0.000094 Epoch: 28 Global Step: 600650 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:56:30,446-Speed 2495.71 samples/sec Loss 1.4167 LearningRate 0.000094 Epoch: 28 Global Step: 600660 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:56:38,603-Speed 2511.14 samples/sec Loss 1.4190 LearningRate 0.000094 Epoch: 28 Global Step: 600670 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:56:46,824-Speed 2491.70 samples/sec Loss 1.3751 LearningRate 0.000094 Epoch: 28 Global Step: 600680 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:56:55,027-Speed 2497.02 samples/sec Loss 1.4097 LearningRate 0.000094 Epoch: 28 Global Step: 600690 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:57:03,231-Speed 2496.60 samples/sec Loss 1.4159 LearningRate 0.000094 Epoch: 28 Global Step: 600700 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:57:11,442-Speed 2494.52 samples/sec Loss 1.3769 LearningRate 0.000094 Epoch: 28 Global Step: 600710 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 07:57:19,660-Speed 2492.75 samples/sec Loss 1.4426 LearningRate 0.000094 Epoch: 28 Global Step: 600720 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:57:27,813-Speed 2512.39 samples/sec Loss 1.4195 LearningRate 0.000094 Epoch: 28 Global Step: 600730 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:57:36,022-Speed 2495.16 samples/sec Loss 1.3559 LearningRate 0.000094 Epoch: 28 Global Step: 600740 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:57:44,226-Speed 2496.79 samples/sec Loss 1.4341 LearningRate 0.000094 Epoch: 28 Global Step: 600750 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:57:52,437-Speed 2495.15 samples/sec Loss 1.4073 LearningRate 0.000094 Epoch: 28 Global Step: 600760 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:58:00,642-Speed 2496.49 samples/sec Loss 1.4454 LearningRate 0.000094 Epoch: 28 Global Step: 600770 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:58:08,854-Speed 2493.94 samples/sec Loss 1.4171 LearningRate 0.000094 Epoch: 28 Global Step: 600780 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:58:17,009-Speed 2511.93 samples/sec Loss 1.3852 LearningRate 0.000094 Epoch: 28 Global Step: 600790 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:58:25,215-Speed 2496.29 samples/sec Loss 1.4233 LearningRate 0.000094 Epoch: 28 Global Step: 600800 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:58:33,440-Speed 2490.39 samples/sec Loss 1.4233 LearningRate 0.000094 Epoch: 28 Global Step: 600810 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:58:41,651-Speed 2494.69 samples/sec Loss 1.4076 LearningRate 0.000094 Epoch: 28 Global Step: 600820 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:58:49,857-Speed 2495.94 samples/sec Loss 1.4010 LearningRate 0.000094 Epoch: 28 Global Step: 600830 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 07:58:58,064-Speed 2496.48 samples/sec Loss 1.4315 LearningRate 0.000094 Epoch: 28 Global Step: 600840 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:59:06,227-Speed 2509.24 samples/sec Loss 1.4074 LearningRate 0.000094 Epoch: 28 Global Step: 600850 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:59:14,430-Speed 2496.86 samples/sec Loss 1.3999 LearningRate 0.000094 Epoch: 28 Global Step: 600860 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:59:22,638-Speed 2496.18 samples/sec Loss 1.4092 LearningRate 0.000094 Epoch: 28 Global Step: 600870 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:59:30,842-Speed 2496.54 samples/sec Loss 1.3925 LearningRate 0.000094 Epoch: 28 Global Step: 600880 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:59:39,049-Speed 2495.63 samples/sec Loss 1.4238 LearningRate 0.000094 Epoch: 28 Global Step: 600890 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:59:47,252-Speed 2497.00 samples/sec Loss 1.3695 LearningRate 0.000094 Epoch: 28 Global Step: 600900 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 07:59:55,405-Speed 2513.07 samples/sec Loss 1.4001 LearningRate 0.000094 Epoch: 28 Global Step: 600910 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:00:03,613-Speed 2495.78 samples/sec Loss 1.4028 LearningRate 0.000094 Epoch: 28 Global Step: 600920 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:00:11,815-Speed 2497.23 samples/sec Loss 1.4188 LearningRate 0.000094 Epoch: 28 Global Step: 600930 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:00:20,032-Speed 2492.97 samples/sec Loss 1.4360 LearningRate 0.000094 Epoch: 28 Global Step: 600940 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:00:28,236-Speed 2496.76 samples/sec Loss 1.3956 LearningRate 0.000094 Epoch: 28 Global Step: 600950 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:00:36,439-Speed 2496.88 samples/sec Loss 1.3869 LearningRate 0.000094 Epoch: 28 Global Step: 600960 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:00:44,596-Speed 2511.35 samples/sec Loss 1.3980 LearningRate 0.000094 Epoch: 28 Global Step: 600970 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:00:52,801-Speed 2496.24 samples/sec Loss 1.4210 LearningRate 0.000094 Epoch: 28 Global Step: 600980 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:01:01,004-Speed 2497.08 samples/sec Loss 1.4001 LearningRate 0.000094 Epoch: 28 Global Step: 600990 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:01:09,209-Speed 2496.35 samples/sec Loss 1.4234 LearningRate 0.000094 Epoch: 28 Global Step: 601000 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:01:17,415-Speed 2496.33 samples/sec Loss 1.4204 LearningRate 0.000094 Epoch: 28 Global Step: 601010 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:01:25,623-Speed 2495.85 samples/sec Loss 1.4163 LearningRate 0.000094 Epoch: 28 Global Step: 601020 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:01:33,777-Speed 2512.15 samples/sec Loss 1.4525 LearningRate 0.000094 Epoch: 28 Global Step: 601030 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:01:41,978-Speed 2497.66 samples/sec Loss 1.4062 LearningRate 0.000094 Epoch: 28 Global Step: 601040 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:01:50,181-Speed 2497.01 samples/sec Loss 1.4323 LearningRate 0.000094 Epoch: 28 Global Step: 601050 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:01:58,391-Speed 2494.97 samples/sec Loss 1.4392 LearningRate 0.000094 Epoch: 28 Global Step: 601060 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:02:06,597-Speed 2495.84 samples/sec Loss 1.3829 LearningRate 0.000094 Epoch: 28 Global Step: 601070 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:02:14,799-Speed 2497.24 samples/sec Loss 1.4627 LearningRate 0.000094 Epoch: 28 Global Step: 601080 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:02:22,950-Speed 2512.98 samples/sec Loss 1.4255 LearningRate 0.000094 Epoch: 28 Global Step: 601090 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:02:31,158-Speed 2495.64 samples/sec Loss 1.4365 LearningRate 0.000094 Epoch: 28 Global Step: 601100 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:02:39,362-Speed 2496.97 samples/sec Loss 1.4172 LearningRate 0.000094 Epoch: 28 Global Step: 601110 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:02:47,566-Speed 2496.52 samples/sec Loss 1.3589 LearningRate 0.000094 Epoch: 28 Global Step: 601120 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:02:55,773-Speed 2495.86 samples/sec Loss 1.4390 LearningRate 0.000094 Epoch: 28 Global Step: 601130 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:03:03,979-Speed 2496.08 samples/sec Loss 1.4429 LearningRate 0.000094 Epoch: 28 Global Step: 601140 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:03:12,126-Speed 2514.16 samples/sec Loss 1.4458 LearningRate 0.000094 Epoch: 28 Global Step: 601150 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:03:20,327-Speed 2497.46 samples/sec Loss 1.4496 LearningRate 0.000094 Epoch: 28 Global Step: 601160 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:03:28,534-Speed 2495.83 samples/sec Loss 1.4469 LearningRate 0.000094 Epoch: 28 Global Step: 601170 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:03:36,742-Speed 2495.64 samples/sec Loss 1.4643 LearningRate 0.000094 Epoch: 28 Global Step: 601180 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:03:44,947-Speed 2496.42 samples/sec Loss 1.4708 LearningRate 0.000094 Epoch: 28 Global Step: 601190 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:03:53,148-Speed 2497.88 samples/sec Loss 1.4115 LearningRate 0.000094 Epoch: 28 Global Step: 601200 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:04:01,320-Speed 2506.41 samples/sec Loss 1.3947 LearningRate 0.000094 Epoch: 28 Global Step: 601210 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:04:09,535-Speed 2493.92 samples/sec Loss 1.4251 LearningRate 0.000094 Epoch: 28 Global Step: 601220 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:04:17,741-Speed 2495.95 samples/sec Loss 1.4271 LearningRate 0.000094 Epoch: 28 Global Step: 601230 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:04:25,954-Speed 2493.92 samples/sec Loss 1.4124 LearningRate 0.000094 Epoch: 28 Global Step: 601240 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:04:34,162-Speed 2495.61 samples/sec Loss 1.4202 LearningRate 0.000094 Epoch: 28 Global Step: 601250 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:04:42,372-Speed 2495.03 samples/sec Loss 1.4514 LearningRate 0.000094 Epoch: 28 Global Step: 601260 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:04:50,540-Speed 2507.47 samples/sec Loss 1.3764 LearningRate 0.000093 Epoch: 28 Global Step: 601270 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:04:58,745-Speed 2496.67 samples/sec Loss 1.4185 LearningRate 0.000093 Epoch: 28 Global Step: 601280 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:05:06,952-Speed 2495.71 samples/sec Loss 1.4071 LearningRate 0.000093 Epoch: 28 Global Step: 601290 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:05:15,171-Speed 2492.37 samples/sec Loss 1.4624 LearningRate 0.000093 Epoch: 28 Global Step: 601300 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:05:23,379-Speed 2495.40 samples/sec Loss 1.4291 LearningRate 0.000093 Epoch: 28 Global Step: 601310 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:05:31,593-Speed 2493.77 samples/sec Loss 1.3935 LearningRate 0.000093 Epoch: 28 Global Step: 601320 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:05:39,742-Speed 2513.62 samples/sec Loss 1.4158 LearningRate 0.000093 Epoch: 28 Global Step: 601330 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:05:47,947-Speed 2496.35 samples/sec Loss 1.4313 LearningRate 0.000093 Epoch: 28 Global Step: 601340 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:05:56,150-Speed 2497.04 samples/sec Loss 1.4149 LearningRate 0.000093 Epoch: 28 Global Step: 601350 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:06:04,311-Speed 2510.08 samples/sec Loss 1.4097 LearningRate 0.000093 Epoch: 28 Global Step: 601360 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:06:12,513-Speed 2497.46 samples/sec Loss 1.4141 LearningRate 0.000093 Epoch: 28 Global Step: 601370 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:06:20,713-Speed 2497.75 samples/sec Loss 1.4377 LearningRate 0.000093 Epoch: 28 Global Step: 601380 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:06:28,864-Speed 2513.30 samples/sec Loss 1.4164 LearningRate 0.000093 Epoch: 28 Global Step: 601390 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:06:37,066-Speed 2497.11 samples/sec Loss 1.4469 LearningRate 0.000093 Epoch: 28 Global Step: 601400 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:06:45,268-Speed 2497.37 samples/sec Loss 1.4323 LearningRate 0.000093 Epoch: 28 Global Step: 601410 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:06:53,474-Speed 2497.85 samples/sec Loss 1.4243 LearningRate 0.000093 Epoch: 28 Global Step: 601420 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:07:01,679-Speed 2497.00 samples/sec Loss 1.4122 LearningRate 0.000093 Epoch: 28 Global Step: 601430 Fp16 Grad Scale: 8192 Required: 52 hours Training: 
2022-07-11 08:07:09,896-Speed 2492.63 samples/sec Loss 1.4283 LearningRate 0.000093 Epoch: 28 Global Step: 601440 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:07:20,324-Speed 1964.18 samples/sec Loss 1.4314 LearningRate 0.000093 Epoch: 29 Global Step: 601450 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:07:28,519-Speed 2499.53 samples/sec Loss 1.4017 LearningRate 0.000093 Epoch: 29 Global Step: 601460 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:07:36,719-Speed 2498.09 samples/sec Loss 1.4375 LearningRate 0.000093 Epoch: 29 Global Step: 601470 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:07:44,915-Speed 2499.04 samples/sec Loss 1.4105 LearningRate 0.000093 Epoch: 29 Global Step: 601480 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:07:53,109-Speed 2499.70 samples/sec Loss 1.4322 LearningRate 0.000093 Epoch: 29 Global Step: 601490 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:08:01,313-Speed 2497.56 samples/sec Loss 1.3595 LearningRate 0.000093 Epoch: 29 Global Step: 601500 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:08:09,462-Speed 2513.82 samples/sec Loss 1.4248 LearningRate 0.000093 Epoch: 29 Global Step: 601510 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:08:17,664-Speed 2497.24 samples/sec Loss 1.4129 LearningRate 0.000093 Epoch: 29 Global Step: 601520 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:08:25,860-Speed 2499.04 samples/sec Loss 1.4483 LearningRate 0.000093 Epoch: 29 Global Step: 601530 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:08:34,063-Speed 2497.25 samples/sec Loss 1.4350 LearningRate 0.000093 Epoch: 29 Global Step: 601540 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:08:42,260-Speed 2498.96 samples/sec Loss 1.3905 LearningRate 0.000093 Epoch: 29 Global Step: 601550 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 
08:08:50,463-Speed 2497.08 samples/sec Loss 1.4133 LearningRate 0.000093 Epoch: 29 Global Step: 601560 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:08:58,606-Speed 2515.23 samples/sec Loss 1.3990 LearningRate 0.000093 Epoch: 29 Global Step: 601570 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:09:06,804-Speed 2498.43 samples/sec Loss 1.4136 LearningRate 0.000093 Epoch: 29 Global Step: 601580 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:09:15,003-Speed 2498.83 samples/sec Loss 1.4178 LearningRate 0.000093 Epoch: 29 Global Step: 601590 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:09:23,200-Speed 2498.63 samples/sec Loss 1.4052 LearningRate 0.000093 Epoch: 29 Global Step: 601600 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:09:31,403-Speed 2496.97 samples/sec Loss 1.4122 LearningRate 0.000093 Epoch: 29 Global Step: 601610 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:09:39,606-Speed 2497.15 samples/sec Loss 1.4156 LearningRate 0.000093 Epoch: 29 Global Step: 601620 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:09:47,754-Speed 2513.81 samples/sec Loss 1.3994 LearningRate 0.000093 Epoch: 29 Global Step: 601630 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:09:55,955-Speed 2497.73 samples/sec Loss 1.3879 LearningRate 0.000093 Epoch: 29 Global Step: 601640 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:10:04,154-Speed 2498.19 samples/sec Loss 1.3959 LearningRate 0.000093 Epoch: 29 Global Step: 601650 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:10:12,356-Speed 2497.29 samples/sec Loss 1.3586 LearningRate 0.000093 Epoch: 29 Global Step: 601660 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:10:20,558-Speed 2497.47 samples/sec Loss 1.4238 LearningRate 0.000093 Epoch: 29 Global Step: 601670 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:10:28,762-Speed 
2496.69 samples/sec Loss 1.4162 LearningRate 0.000093 Epoch: 29 Global Step: 601680 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:10:36,910-Speed 2513.83 samples/sec Loss 1.4284 LearningRate 0.000093 Epoch: 29 Global Step: 601690 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:10:45,115-Speed 2496.54 samples/sec Loss 1.3861 LearningRate 0.000093 Epoch: 29 Global Step: 601700 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:10:53,315-Speed 2498.00 samples/sec Loss 1.3971 LearningRate 0.000093 Epoch: 29 Global Step: 601710 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:11:01,528-Speed 2493.99 samples/sec Loss 1.4196 LearningRate 0.000093 Epoch: 29 Global Step: 601720 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:11:09,733-Speed 2496.45 samples/sec Loss 1.4032 LearningRate 0.000093 Epoch: 29 Global Step: 601730 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:11:17,932-Speed 2498.45 samples/sec Loss 1.4104 LearningRate 0.000093 Epoch: 29 Global Step: 601740 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:11:26,080-Speed 2514.22 samples/sec Loss 1.3938 LearningRate 0.000093 Epoch: 29 Global Step: 601750 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:11:34,277-Speed 2499.03 samples/sec Loss 1.4306 LearningRate 0.000093 Epoch: 29 Global Step: 601760 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:11:42,473-Speed 2498.90 samples/sec Loss 1.3854 LearningRate 0.000093 Epoch: 29 Global Step: 601770 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:11:50,673-Speed 2497.92 samples/sec Loss 1.3716 LearningRate 0.000093 Epoch: 29 Global Step: 601780 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:11:58,871-Speed 2498.83 samples/sec Loss 1.3660 LearningRate 0.000093 Epoch: 29 Global Step: 601790 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:12:07,074-Speed 2496.97 samples/sec 
Loss 1.4330 LearningRate 0.000093 Epoch: 29 Global Step: 601800 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:12:15,220-Speed 2514.61 samples/sec Loss 1.3698 LearningRate 0.000093 Epoch: 29 Global Step: 601810 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:12:23,439-Speed 2491.94 samples/sec Loss 1.3733 LearningRate 0.000093 Epoch: 29 Global Step: 601820 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:12:31,649-Speed 2495.36 samples/sec Loss 1.3770 LearningRate 0.000093 Epoch: 29 Global Step: 601830 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:12:39,844-Speed 2499.44 samples/sec Loss 1.3745 LearningRate 0.000093 Epoch: 29 Global Step: 601840 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:12:48,044-Speed 2497.99 samples/sec Loss 1.4039 LearningRate 0.000093 Epoch: 29 Global Step: 601850 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:12:56,243-Speed 2498.29 samples/sec Loss 1.3994 LearningRate 0.000093 Epoch: 29 Global Step: 601860 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:13:04,388-Speed 2514.51 samples/sec Loss 1.3977 LearningRate 0.000093 Epoch: 29 Global Step: 601870 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:13:12,587-Speed 2498.50 samples/sec Loss 1.4319 LearningRate 0.000093 Epoch: 29 Global Step: 601880 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:13:20,782-Speed 2499.62 samples/sec Loss 1.4123 LearningRate 0.000093 Epoch: 29 Global Step: 601890 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:13:28,980-Speed 2498.47 samples/sec Loss 1.3860 LearningRate 0.000093 Epoch: 29 Global Step: 601900 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:13:37,183-Speed 2497.08 samples/sec Loss 1.3851 LearningRate 0.000093 Epoch: 29 Global Step: 601910 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:13:45,384-Speed 2497.49 samples/sec Loss 1.4167 
LearningRate 0.000093 Epoch: 29 Global Step: 601920 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:13:53,532-Speed 2514.09 samples/sec Loss 1.4057 LearningRate 0.000093 Epoch: 29 Global Step: 601930 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:14:01,747-Speed 2494.11 samples/sec Loss 1.3776 LearningRate 0.000093 Epoch: 29 Global Step: 601940 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:14:09,955-Speed 2495.28 samples/sec Loss 1.3956 LearningRate 0.000093 Epoch: 29 Global Step: 601950 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:14:18,153-Speed 2498.71 samples/sec Loss 1.4010 LearningRate 0.000093 Epoch: 29 Global Step: 601960 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:14:26,352-Speed 2498.45 samples/sec Loss 1.4070 LearningRate 0.000093 Epoch: 29 Global Step: 601970 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:14:34,553-Speed 2497.46 samples/sec Loss 1.3781 LearningRate 0.000093 Epoch: 29 Global Step: 601980 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:14:42,698-Speed 2515.11 samples/sec Loss 1.4293 LearningRate 0.000093 Epoch: 29 Global Step: 601990 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:14:50,898-Speed 2497.97 samples/sec Loss 1.4339 LearningRate 0.000093 Epoch: 29 Global Step: 602000 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:14:59,100-Speed 2497.58 samples/sec Loss 1.4136 LearningRate 0.000093 Epoch: 29 Global Step: 602010 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:15:07,308-Speed 2495.65 samples/sec Loss 1.4127 LearningRate 0.000093 Epoch: 29 Global Step: 602020 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:15:15,503-Speed 2499.23 samples/sec Loss 1.4279 LearningRate 0.000093 Epoch: 29 Global Step: 602030 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:15:23,707-Speed 2496.95 samples/sec Loss 1.3945 LearningRate 
0.000093 Epoch: 29 Global Step: 602040 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:15:31,851-Speed 2515.31 samples/sec Loss 1.3838 LearningRate 0.000093 Epoch: 29 Global Step: 602050 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:15:40,050-Speed 2498.36 samples/sec Loss 1.3922 LearningRate 0.000093 Epoch: 29 Global Step: 602060 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:15:48,251-Speed 2497.66 samples/sec Loss 1.3913 LearningRate 0.000093 Epoch: 29 Global Step: 602070 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:15:56,449-Speed 2498.50 samples/sec Loss 1.4064 LearningRate 0.000093 Epoch: 29 Global Step: 602080 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:16:04,651-Speed 2497.47 samples/sec Loss 1.4112 LearningRate 0.000093 Epoch: 29 Global Step: 602090 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:16:12,849-Speed 2498.43 samples/sec Loss 1.4471 LearningRate 0.000093 Epoch: 29 Global Step: 602100 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:16:20,993-Speed 2515.43 samples/sec Loss 1.4169 LearningRate 0.000093 Epoch: 29 Global Step: 602110 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:16:29,194-Speed 2497.66 samples/sec Loss 1.3835 LearningRate 0.000093 Epoch: 29 Global Step: 602120 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:16:37,395-Speed 2497.66 samples/sec Loss 1.4116 LearningRate 0.000093 Epoch: 29 Global Step: 602130 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:16:45,602-Speed 2495.83 samples/sec Loss 1.4166 LearningRate 0.000093 Epoch: 29 Global Step: 602140 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:16:53,802-Speed 2497.82 samples/sec Loss 1.4586 LearningRate 0.000093 Epoch: 29 Global Step: 602150 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:17:02,003-Speed 2498.05 samples/sec Loss 1.4039 LearningRate 0.000093 Epoch: 29 
Global Step: 602160 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:17:10,147-Speed 2515.06 samples/sec Loss 1.4162 LearningRate 0.000093 Epoch: 29 Global Step: 602170 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:17:18,345-Speed 2498.66 samples/sec Loss 1.4019 LearningRate 0.000093 Epoch: 29 Global Step: 602180 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:17:26,543-Speed 2498.56 samples/sec Loss 1.3893 LearningRate 0.000093 Epoch: 29 Global Step: 602190 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:17:34,746-Speed 2497.13 samples/sec Loss 1.4218 LearningRate 0.000093 Epoch: 29 Global Step: 602200 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:17:42,945-Speed 2498.49 samples/sec Loss 1.4324 LearningRate 0.000093 Epoch: 29 Global Step: 602210 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:17:51,144-Speed 2498.22 samples/sec Loss 1.4183 LearningRate 0.000093 Epoch: 29 Global Step: 602220 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:17:59,291-Speed 2514.15 samples/sec Loss 1.4285 LearningRate 0.000093 Epoch: 29 Global Step: 602230 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:18:07,493-Speed 2497.54 samples/sec Loss 1.4155 LearningRate 0.000093 Epoch: 29 Global Step: 602240 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:18:15,692-Speed 2498.25 samples/sec Loss 1.4217 LearningRate 0.000093 Epoch: 29 Global Step: 602250 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:18:23,889-Speed 2498.92 samples/sec Loss 1.4038 LearningRate 0.000093 Epoch: 29 Global Step: 602260 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:18:32,086-Speed 2498.54 samples/sec Loss 1.3938 LearningRate 0.000093 Epoch: 29 Global Step: 602270 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:18:40,283-Speed 2499.10 samples/sec Loss 1.3989 LearningRate 0.000093 Epoch: 29 Global Step: 602280 
Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:18:48,425-Speed 2515.51 samples/sec Loss 1.4098 LearningRate 0.000093 Epoch: 29 Global Step: 602290 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:18:56,638-Speed 2494.06 samples/sec Loss 1.4016 LearningRate 0.000093 Epoch: 29 Global Step: 602300 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:19:04,837-Speed 2498.16 samples/sec Loss 1.4014 LearningRate 0.000093 Epoch: 29 Global Step: 602310 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:19:13,036-Speed 2498.55 samples/sec Loss 1.4167 LearningRate 0.000093 Epoch: 29 Global Step: 602320 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:19:21,254-Speed 2492.25 samples/sec Loss 1.4272 LearningRate 0.000093 Epoch: 29 Global Step: 602330 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:19:29,453-Speed 2498.31 samples/sec Loss 1.4106 LearningRate 0.000093 Epoch: 29 Global Step: 602340 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:19:37,603-Speed 2513.56 samples/sec Loss 1.4181 LearningRate 0.000093 Epoch: 29 Global Step: 602350 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:19:45,801-Speed 2498.44 samples/sec Loss 1.3764 LearningRate 0.000093 Epoch: 29 Global Step: 602360 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:19:54,001-Speed 2498.18 samples/sec Loss 1.4390 LearningRate 0.000093 Epoch: 29 Global Step: 602370 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:20:02,202-Speed 2497.44 samples/sec Loss 1.4103 LearningRate 0.000093 Epoch: 29 Global Step: 602380 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:20:10,399-Speed 2498.87 samples/sec Loss 1.4117 LearningRate 0.000093 Epoch: 29 Global Step: 602390 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:20:18,599-Speed 2498.13 samples/sec Loss 1.4096 LearningRate 0.000093 Epoch: 29 Global Step: 602400 Fp16 Grad Scale: 
8192 Required: 52 hours Training: 2022-07-11 08:20:26,748-Speed 2513.49 samples/sec Loss 1.3892 LearningRate 0.000093 Epoch: 29 Global Step: 602410 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:20:34,947-Speed 2498.35 samples/sec Loss 1.4130 LearningRate 0.000093 Epoch: 29 Global Step: 602420 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:20:43,146-Speed 2498.45 samples/sec Loss 1.4055 LearningRate 0.000093 Epoch: 29 Global Step: 602430 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:20:51,347-Speed 2497.65 samples/sec Loss 1.4370 LearningRate 0.000093 Epoch: 29 Global Step: 602440 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:20:59,543-Speed 2498.90 samples/sec Loss 1.4125 LearningRate 0.000093 Epoch: 29 Global Step: 602450 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:21:07,742-Speed 2498.29 samples/sec Loss 1.4155 LearningRate 0.000093 Epoch: 29 Global Step: 602460 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:21:15,887-Speed 2516.30 samples/sec Loss 1.3711 LearningRate 0.000093 Epoch: 29 Global Step: 602470 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:21:24,085-Speed 2498.30 samples/sec Loss 1.4103 LearningRate 0.000093 Epoch: 29 Global Step: 602480 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:21:32,285-Speed 2498.05 samples/sec Loss 1.4017 LearningRate 0.000093 Epoch: 29 Global Step: 602490 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:21:40,488-Speed 2497.00 samples/sec Loss 1.4160 LearningRate 0.000092 Epoch: 29 Global Step: 602500 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:21:48,686-Speed 2498.78 samples/sec Loss 1.4002 LearningRate 0.000092 Epoch: 29 Global Step: 602510 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:21:56,885-Speed 2498.01 samples/sec Loss 1.4218 LearningRate 0.000092 Epoch: 29 Global Step: 602520 Fp16 Grad Scale: 8192 Required: 52 
hours Training: 2022-07-11 08:22:05,032-Speed 2514.16 samples/sec Loss 1.3684 LearningRate 0.000092 Epoch: 29 Global Step: 602530 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:22:13,230-Speed 2498.62 samples/sec Loss 1.4040 LearningRate 0.000092 Epoch: 29 Global Step: 602540 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:22:21,429-Speed 2498.16 samples/sec Loss 1.4030 LearningRate 0.000092 Epoch: 29 Global Step: 602550 Fp16 Grad Scale: 8192 Required: 52 hours Training: 2022-07-11 08:22:29,630-Speed 2497.82 samples/sec Loss 1.4049 LearningRate 0.000092 Epoch: 29 Global Step: 602560 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:22:37,829-Speed 2498.15 samples/sec Loss 1.4127 LearningRate 0.000092 Epoch: 29 Global Step: 602570 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:22:46,044-Speed 2493.62 samples/sec Loss 1.4322 LearningRate 0.000092 Epoch: 29 Global Step: 602580 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:22:54,193-Speed 2513.63 samples/sec Loss 1.4196 LearningRate 0.000092 Epoch: 29 Global Step: 602590 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:23:02,394-Speed 2497.64 samples/sec Loss 1.4333 LearningRate 0.000092 Epoch: 29 Global Step: 602600 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:23:10,595-Speed 2497.54 samples/sec Loss 1.3972 LearningRate 0.000092 Epoch: 29 Global Step: 602610 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:23:18,798-Speed 2497.21 samples/sec Loss 1.3840 LearningRate 0.000092 Epoch: 29 Global Step: 602620 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:23:27,000-Speed 2497.33 samples/sec Loss 1.3970 LearningRate 0.000092 Epoch: 29 Global Step: 602630 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:23:35,201-Speed 2497.62 samples/sec Loss 1.4041 LearningRate 0.000092 Epoch: 29 Global Step: 602640 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:23:43,352-Speed 2512.90 samples/sec Loss 1.4012 LearningRate 0.000092 Epoch: 29 Global Step: 602650 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:23:51,561-Speed 2495.22 samples/sec Loss 1.3824 LearningRate 0.000092 Epoch: 29 Global Step: 602660 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:23:59,765-Speed 2496.89 samples/sec Loss 1.3938 LearningRate 0.000092 Epoch: 29 Global Step: 602670 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:24:07,969-Speed 2496.72 samples/sec Loss 1.3558 LearningRate 0.000092 Epoch: 29 Global Step: 602680 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:24:16,170-Speed 2497.61 samples/sec Loss 1.4100 LearningRate 0.000092 Epoch: 29 Global Step: 602690 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:24:24,375-Speed 2496.69 samples/sec Loss 1.3774 LearningRate 0.000092 Epoch: 29 Global Step: 602700 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:24:32,537-Speed 2509.53 samples/sec Loss 1.4266 LearningRate 0.000092 Epoch: 29 Global Step: 602710 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:24:40,746-Speed 2495.16 samples/sec Loss 1.4422 LearningRate 0.000092 Epoch: 29 Global Step: 602720 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:24:48,965-Speed 2491.98 samples/sec Loss 1.3984 LearningRate 0.000092 Epoch: 29 Global Step: 602730 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:24:57,167-Speed 2497.63 samples/sec Loss 1.3816 LearningRate 0.000092 Epoch: 29 Global Step: 602740 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:25:05,365-Speed 2498.41 samples/sec Loss 1.4119 LearningRate 0.000092 Epoch: 29 Global Step: 602750 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:25:13,567-Speed 2497.53 samples/sec Loss 1.3793 LearningRate 0.000092 Epoch: 29 Global Step: 602760 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:25:21,709-Speed 2515.74 samples/sec Loss 1.4531 LearningRate 0.000092 Epoch: 29 Global Step: 602770 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:25:29,909-Speed 2497.91 samples/sec Loss 1.3608 LearningRate 0.000092 Epoch: 29 Global Step: 602780 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:25:38,110-Speed 2497.50 samples/sec Loss 1.3585 LearningRate 0.000092 Epoch: 29 Global Step: 602790 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:25:46,308-Speed 2498.78 samples/sec Loss 1.4199 LearningRate 0.000092 Epoch: 29 Global Step: 602800 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:25:54,509-Speed 2497.79 samples/sec Loss 1.4261 LearningRate 0.000092 Epoch: 29 Global Step: 602810 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:26:02,709-Speed 2497.95 samples/sec Loss 1.3282 LearningRate 0.000092 Epoch: 29 Global Step: 602820 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:26:10,854-Speed 2514.70 samples/sec Loss 1.4207 LearningRate 0.000092 Epoch: 29 Global Step: 602830 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:26:19,053-Speed 2498.32 samples/sec Loss 1.3890 LearningRate 0.000092 Epoch: 29 Global Step: 602840 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:26:27,251-Speed 2498.41 samples/sec Loss 1.3974 LearningRate 0.000092 Epoch: 29 Global Step: 602850 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:26:35,453-Speed 2497.66 samples/sec Loss 1.4137 LearningRate 0.000092 Epoch: 29 Global Step: 602860 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:26:43,667-Speed 2493.42 samples/sec Loss 1.3673 LearningRate 0.000092 Epoch: 29 Global Step: 602870 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:26:51,866-Speed 2498.24 samples/sec Loss 1.4329 LearningRate 0.000092 Epoch: 29 Global Step: 602880 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:27:00,019-Speed 2512.57 samples/sec Loss 1.3903 LearningRate 0.000092 Epoch: 29 Global Step: 602890 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:27:08,220-Speed 2497.62 samples/sec Loss 1.4542 LearningRate 0.000092 Epoch: 29 Global Step: 602900 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:27:16,421-Speed 2497.65 samples/sec Loss 1.4305 LearningRate 0.000092 Epoch: 29 Global Step: 602910 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:27:24,619-Speed 2498.50 samples/sec Loss 1.4327 LearningRate 0.000092 Epoch: 29 Global Step: 602920 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:27:32,825-Speed 2496.35 samples/sec Loss 1.3899 LearningRate 0.000092 Epoch: 29 Global Step: 602930 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:27:41,034-Speed 2495.16 samples/sec Loss 1.4284 LearningRate 0.000092 Epoch: 29 Global Step: 602940 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:27:49,186-Speed 2512.49 samples/sec Loss 1.3885 LearningRate 0.000092 Epoch: 29 Global Step: 602950 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:27:57,387-Speed 2497.62 samples/sec Loss 1.3861 LearningRate 0.000092 Epoch: 29 Global Step: 602960 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:28:05,593-Speed 2496.13 samples/sec Loss 1.4169 LearningRate 0.000092 Epoch: 29 Global Step: 602970 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:28:13,794-Speed 2497.48 samples/sec Loss 1.3701 LearningRate 0.000092 Epoch: 29 Global Step: 602980 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:28:21,995-Speed 2497.86 samples/sec Loss 1.4133 LearningRate 0.000092 Epoch: 29 Global Step: 602990 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:28:30,193-Speed 2498.46 samples/sec Loss 1.4191 LearningRate 0.000092 Epoch: 29 Global Step: 603000 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:28:38,342-Speed 2513.69 samples/sec Loss 1.3798 LearningRate 0.000092 Epoch: 29 Global Step: 603010 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:28:46,541-Speed 2498.28 samples/sec Loss 1.3687 LearningRate 0.000092 Epoch: 29 Global Step: 603020 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:28:54,742-Speed 2497.65 samples/sec Loss 1.4136 LearningRate 0.000092 Epoch: 29 Global Step: 603030 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:29:02,943-Speed 2497.50 samples/sec Loss 1.3870 LearningRate 0.000092 Epoch: 29 Global Step: 603040 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:29:11,144-Speed 2497.75 samples/sec Loss 1.4271 LearningRate 0.000092 Epoch: 29 Global Step: 603050 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:29:19,346-Speed 2497.29 samples/sec Loss 1.4281 LearningRate 0.000092 Epoch: 29 Global Step: 603060 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:29:27,491-Speed 2514.71 samples/sec Loss 1.4162 LearningRate 0.000092 Epoch: 29 Global Step: 603070 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:29:35,693-Speed 2497.33 samples/sec Loss 1.4074 LearningRate 0.000092 Epoch: 29 Global Step: 603080 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:29:43,890-Speed 2498.91 samples/sec Loss 1.4349 LearningRate 0.000092 Epoch: 29 Global Step: 603090 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:29:52,091-Speed 2497.81 samples/sec Loss 1.4155 LearningRate 0.000092 Epoch: 29 Global Step: 603100 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:30:00,292-Speed 2497.66 samples/sec Loss 1.3844 LearningRate 0.000092 Epoch: 29 Global Step: 603110 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:30:08,495-Speed 2497.14 samples/sec Loss 1.4462 LearningRate 0.000092 Epoch: 29 Global Step: 603120 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:30:16,644-Speed 2513.53 samples/sec Loss 1.4123 LearningRate 0.000092 Epoch: 29 Global Step: 603130 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:30:24,847-Speed 2497.24 samples/sec Loss 1.4184 LearningRate 0.000092 Epoch: 29 Global Step: 603140 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:30:33,048-Speed 2497.58 samples/sec Loss 1.4040 LearningRate 0.000092 Epoch: 29 Global Step: 603150 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:30:41,251-Speed 2497.17 samples/sec Loss 1.3973 LearningRate 0.000092 Epoch: 29 Global Step: 603160 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:30:49,454-Speed 2497.11 samples/sec Loss 1.4411 LearningRate 0.000092 Epoch: 29 Global Step: 603170 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:30:57,651-Speed 2498.72 samples/sec Loss 1.3541 LearningRate 0.000092 Epoch: 29 Global Step: 603180 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:31:05,799-Speed 2513.83 samples/sec Loss 1.3976 LearningRate 0.000092 Epoch: 29 Global Step: 603190 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:31:14,030-Speed 2500.20 samples/sec Loss 1.3833 LearningRate 0.000092 Epoch: 29 Global Step: 603200 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:31:22,248-Speed 2498.93 samples/sec Loss 1.3898 LearningRate 0.000092 Epoch: 29 Global Step: 603210 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:31:33,934-Speed 1752.69 samples/sec Loss 1.3957 LearningRate 0.000092 Epoch: 29 Global Step: 603220 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:31:42,170-Speed 2500.75 samples/sec Loss 1.4111 LearningRate 0.000092 Epoch: 29 Global Step: 603230 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:31:54,304-Speed 1771.58 samples/sec Loss 1.3738 LearningRate 0.000092 Epoch: 29 Global Step: 603240 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:32:02,499-Speed 2517.78 samples/sec Loss 1.3864 LearningRate 0.000092 Epoch: 29 Global Step: 603250 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:32:10,699-Speed 2497.68 samples/sec Loss 1.4050 LearningRate 0.000092 Epoch: 29 Global Step: 603260 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:32:23,853-Speed 2497.43 samples/sec Loss 1.4173 LearningRate 0.000092 Epoch: 29 Global Step: 603270 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:32:32,824-Speed 2400.52 samples/sec Loss 1.4138 LearningRate 0.000092 Epoch: 29 Global Step: 603280 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:32:41,020-Speed 2499.07 samples/sec Loss 1.4289 LearningRate 0.000092 Epoch: 29 Global Step: 603290 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:32:55,204-Speed 2502.25 samples/sec Loss 1.4199 LearningRate 0.000092 Epoch: 29 Global Step: 603300 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:33:03,450-Speed 2518.15 samples/sec Loss 1.4250 LearningRate 0.000092 Epoch: 29 Global Step: 603310 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:33:11,647-Speed 2498.95 samples/sec Loss 1.3747 LearningRate 0.000092 Epoch: 29 Global Step: 603320 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:33:22,759-Speed 2498.77 samples/sec Loss 1.4003 LearningRate 0.000092 Epoch: 29 Global Step: 603330 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:33:30,970-Speed 2499.58 samples/sec Loss 1.4218 LearningRate 0.000092 Epoch: 29 Global Step: 603340 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:33:39,170-Speed 2497.76 samples/sec Loss 1.3910 LearningRate 0.000092 Epoch: 29 Global Step: 603350 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:33:51,974-Speed 2500.87 samples/sec Loss 1.4320 LearningRate 0.000092 Epoch: 29 Global Step: 603360 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:34:00,292-Speed 2515.91 samples/sec Loss 1.3571 LearningRate 0.000092 Epoch: 29 Global Step: 603370 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:34:08,489-Speed 2498.79 samples/sec Loss 1.3690 LearningRate 0.000092 Epoch: 29 Global Step: 603380 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:34:20,571-Speed 2203.98 samples/sec Loss 1.3696 LearningRate 0.000092 Epoch: 29 Global Step: 603390 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:34:28,777-Speed 2496.21 samples/sec Loss 1.4580 LearningRate 0.000092 Epoch: 29 Global Step: 603400 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:34:37,093-Speed 2498.48 samples/sec Loss 1.3915 LearningRate 0.000092 Epoch: 29 Global Step: 603410 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:34:45,300-Speed 2495.73 samples/sec Loss 1.4091 LearningRate 0.000092 Epoch: 29 Global Step: 603420 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:34:53,452-Speed 2512.71 samples/sec Loss 1.4387 LearningRate 0.000092 Epoch: 29 Global Step: 603430 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:35:01,656-Speed 2496.59 samples/sec Loss 1.4136 LearningRate 0.000092 Epoch: 29 Global Step: 603440 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:35:09,858-Speed 2497.50 samples/sec Loss 1.4059 LearningRate 0.000092 Epoch: 29 Global Step: 603450 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:35:18,063-Speed 2496.30 samples/sec Loss 1.4231 LearningRate 0.000092 Epoch: 29 Global Step: 603460 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:35:26,267-Speed 2496.79 samples/sec Loss 1.4136 LearningRate 0.000092 Epoch: 29 Global Step: 603470 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:35:34,473-Speed 2495.88 samples/sec Loss 1.3947 LearningRate 0.000092 Epoch: 29 Global Step: 603480 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:35:42,624-Speed 2513.21 samples/sec Loss 1.4412 LearningRate 0.000092 Epoch: 29 Global Step: 603490 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:35:50,830-Speed 2495.93 samples/sec Loss 1.4495 LearningRate 0.000092 Epoch: 29 Global Step: 603500 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:35:59,036-Speed 2496.34 samples/sec Loss 1.4201 LearningRate 0.000092 Epoch: 29 Global Step: 603510 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:36:07,243-Speed 2495.80 samples/sec Loss 1.3808 LearningRate 0.000092 Epoch: 29 Global Step: 603520 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:36:15,443-Speed 2497.76 samples/sec Loss 1.3808 LearningRate 0.000092 Epoch: 29 Global Step: 603530 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:36:23,657-Speed 2493.61 samples/sec Loss 1.4040 LearningRate 0.000092 Epoch: 29 Global Step: 603540 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:36:31,812-Speed 2511.93 samples/sec Loss 1.4391 LearningRate 0.000092 Epoch: 29 Global Step: 603550 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:36:40,024-Speed 2494.35 samples/sec Loss 1.4030 LearningRate 0.000092 Epoch: 29 Global Step: 603560 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:36:48,251-Speed 2489.66 samples/sec Loss 1.3817 LearningRate 0.000092 Epoch: 29 Global Step: 603570 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:36:56,454-Speed 2497.18 samples/sec Loss 1.4274 LearningRate 0.000092 Epoch: 29 Global Step: 603580 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:37:04,658-Speed 2496.70 samples/sec Loss 1.3977 LearningRate 0.000092 Epoch: 29 Global Step: 603590 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:37:12,862-Speed 2496.70 samples/sec Loss 1.4380 LearningRate 0.000092 Epoch: 29 Global Step: 603600 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:37:21,012-Speed 2513.19 samples/sec Loss 1.4247 LearningRate 0.000092 Epoch: 29 Global Step: 603610 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:37:29,216-Speed 2496.86 samples/sec Loss 1.4271 LearningRate 0.000092 Epoch: 29 Global Step: 603620 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:37:37,421-Speed 2496.44 samples/sec Loss 1.4260 LearningRate 0.000092 Epoch: 29 Global Step: 603630 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:37:45,625-Speed 2496.93 samples/sec Loss 1.3748 LearningRate 0.000092 Epoch: 29 Global Step: 603640 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:37:53,828-Speed 2496.92 samples/sec Loss 1.3888 LearningRate 0.000092 Epoch: 29 Global Step: 603650 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:38:02,031-Speed 2497.23 samples/sec Loss 1.4117 LearningRate 0.000092 Epoch: 29 Global Step: 603660 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:38:10,181-Speed 2513.34 samples/sec Loss 1.4071 LearningRate 0.000092 Epoch: 29 Global Step: 603670 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:38:18,384-Speed 2496.98 samples/sec Loss 1.4201 LearningRate 0.000092 Epoch: 29 Global Step: 603680 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:38:26,586-Speed 2497.18 samples/sec Loss 1.4204 LearningRate 0.000092 Epoch: 29 Global Step: 603690 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:38:34,791-Speed 2496.68 samples/sec Loss 1.4284 LearningRate 0.000092 Epoch: 29 Global Step: 603700 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:38:42,990-Speed 2498.15 samples/sec Loss 1.4076 LearningRate 0.000092 Epoch: 29 Global Step: 603710 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:38:51,194-Speed 2496.90 samples/sec Loss 1.4004 LearningRate 0.000092 Epoch: 29 Global Step: 603720 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:38:59,346-Speed 2512.63 samples/sec Loss 1.3774 LearningRate 0.000091 Epoch: 29 Global Step: 603730 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:39:07,550-Speed 2496.87 samples/sec Loss 1.3728 LearningRate 0.000091 Epoch: 29 Global Step: 603740 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:39:15,750-Speed 2497.96 samples/sec Loss 1.4280 LearningRate 0.000091 Epoch: 29 Global Step: 603750 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:39:23,950-Speed 2497.90 samples/sec Loss 1.3757 LearningRate 0.000091 Epoch: 29 Global Step: 603760 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:39:32,152-Speed 2497.35 samples/sec Loss 1.4010 LearningRate 0.000091 Epoch: 29 Global Step: 603770 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:39:40,353-Speed 2497.45 samples/sec Loss 1.4355 LearningRate 0.000091 Epoch: 29 Global Step: 603780 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:39:48,505-Speed 2512.87 samples/sec Loss 1.4051 LearningRate 0.000091 Epoch: 29 Global Step: 603790 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:39:56,708-Speed 2496.83 samples/sec Loss 1.3919 LearningRate 0.000091 Epoch: 29 Global Step: 603800 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:40:04,924-Speed 2493.61 samples/sec Loss 1.3922 LearningRate 0.000091 Epoch: 29 Global Step: 603810 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:40:13,125-Speed 2497.59 samples/sec Loss 1.4165 LearningRate 0.000091 Epoch: 29 Global Step: 603820 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:40:21,325-Speed 2498.00 samples/sec Loss 1.3756 LearningRate 0.000091 Epoch: 29 Global Step: 603830 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:40:29,527-Speed 2497.30 samples/sec Loss 1.4073 LearningRate 0.000091 Epoch: 29 Global Step: 603840 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-07-11 08:40:37,675-Speed 2513.91 samples/sec Loss 1.4211 LearningRate 0.000091 Epoch: 29 Global Step: 603850 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:40:45,876-Speed 2497.40 samples/sec Loss 1.3980 LearningRate 0.000091 Epoch: 29 Global Step: 603860 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:40:54,077-Speed 2498.01 samples/sec Loss 1.3937 LearningRate 0.000091 Epoch: 29 Global Step: 603870 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:41:02,287-Speed 2494.98 samples/sec Loss 1.3766 LearningRate 0.000091 Epoch: 29 Global Step: 603880 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:41:10,490-Speed 2496.95 samples/sec Loss 1.3887 LearningRate 0.000091 Epoch: 29 Global Step: 603890 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:41:18,696-Speed 2496.07 samples/sec Loss 1.4010 LearningRate 0.000091 Epoch: 29 Global Step: 603900 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:41:26,843-Speed 2514.27 samples/sec Loss 1.3908 LearningRate 0.000091 Epoch: 29 Global Step: 603910 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:41:35,042-Speed 2498.05 samples/sec Loss 1.4100 LearningRate 0.000091 Epoch: 29 Global Step: 603920 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:41:43,241-Speed 2498.45 samples/sec Loss 1.4322 LearningRate 0.000091 Epoch: 29 Global Step: 603930 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:41:51,455-Speed 2493.86 samples/sec Loss 1.3877 LearningRate 0.000091 Epoch: 29 Global Step: 603940 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:41:59,655-Speed 2498.05 samples/sec Loss 1.3719 LearningRate 0.000091 Epoch: 29 Global Step: 603950 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-07-11 08:42:07,817-Speed 2509.61 samples/sec Loss 1.4134 LearningRate 0.000091 Epoch: 29 Global Step: 603960 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:42:15,964-Speed 2514.08 samples/sec Loss 1.3931 LearningRate 0.000091 Epoch: 29 Global Step: 603970 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:42:24,166-Speed 2497.44 samples/sec Loss 1.3908 LearningRate 0.000091 Epoch: 29 Global Step: 603980 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:42:32,365-Speed 2498.38 samples/sec Loss 1.4140 LearningRate 0.000091 Epoch: 29 Global Step: 603990 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:42:40,563-Speed 2498.43 samples/sec Loss 1.3485 LearningRate 0.000091 Epoch: 29 Global Step: 604000 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:42:48,764-Speed 2498.12 samples/sec Loss 1.3899 LearningRate 0.000091 Epoch: 29 Global Step: 604010 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:42:56,964-Speed 2497.85 samples/sec Loss 1.4299 LearningRate 0.000091 Epoch: 29 Global Step: 604020 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:43:05,112-Speed 2513.79 samples/sec Loss 1.3730 LearningRate 0.000091 Epoch: 29 Global Step: 604030 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:43:13,312-Speed 2498.10 samples/sec Loss 1.4391 LearningRate 0.000091 Epoch: 29 Global Step: 604040 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:43:21,511-Speed 2498.62 samples/sec Loss 1.3757 LearningRate 0.000091 Epoch: 29 Global Step: 604050 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:43:29,712-Speed 2497.54 samples/sec Loss 1.3754 LearningRate 0.000091 Epoch: 29 Global Step: 604060 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:43:37,914-Speed 2497.43 samples/sec Loss 1.4105 LearningRate 0.000091 Epoch: 29 Global Step: 604070 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:43:46,117-Speed 2497.01 samples/sec Loss 1.4073 LearningRate 0.000091 Epoch: 29 Global Step: 604080 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:43:54,262-Speed 2514.80 samples/sec Loss 1.3980 LearningRate 0.000091 Epoch: 29 Global Step: 604090 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:44:02,462-Speed 2497.86 samples/sec Loss 1.3718 LearningRate 0.000091 Epoch: 29 Global Step: 604100 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:44:10,667-Speed 2496.53 samples/sec Loss 1.4077 LearningRate 0.000091 Epoch: 29 Global Step: 604110 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:44:18,871-Speed 2496.71 samples/sec Loss 1.3805 LearningRate 0.000091 Epoch: 29 Global Step: 604120 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:44:27,079-Speed 2495.49 samples/sec Loss 1.4089 LearningRate 0.000091 Epoch: 29 Global Step: 604130 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:44:35,279-Speed 2498.09 samples/sec Loss 1.3937 LearningRate 0.000091 Epoch: 29 Global Step: 604140 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:44:43,439-Speed 2510.08 samples/sec Loss 1.3826 LearningRate 0.000091 Epoch: 29 Global Step: 604150 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:44:51,637-Speed 2498.59 samples/sec Loss 1.3919 LearningRate 0.000091 Epoch: 29 Global Step: 604160 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:44:59,841-Speed 2497.14 samples/sec Loss 1.3911 LearningRate 0.000091 Epoch: 29 Global Step: 604170 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:45:08,045-Speed 2496.86 samples/sec Loss 1.4016 LearningRate 0.000091 Epoch: 29 Global Step: 604180 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:45:16,242-Speed 2498.88 samples/sec Loss 1.4147 LearningRate 0.000091 Epoch: 29 Global Step: 604190 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:45:24,447-Speed 2496.28 samples/sec Loss 1.3862 LearningRate 0.000091 Epoch: 29 Global Step: 604200 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:45:32,595-Speed 2513.96 samples/sec Loss 1.3974 LearningRate 0.000091 Epoch: 29 Global Step: 604210 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:45:40,796-Speed 2497.79 samples/sec Loss 1.3894 LearningRate 0.000091 Epoch: 29 Global Step: 604220 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:45:49,000-Speed 2496.66 samples/sec Loss 1.4011 LearningRate 0.000091 Epoch: 29 Global Step: 604230 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:45:57,204-Speed 2496.97 samples/sec Loss 1.4061 LearningRate 0.000091 Epoch: 29 Global Step: 604240 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:46:05,405-Speed 2497.56 samples/sec Loss 1.4037 LearningRate 0.000091 Epoch: 29 Global Step: 604250 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:46:13,603-Speed 2498.46 samples/sec Loss 1.4203 LearningRate 0.000091 Epoch: 29 Global Step: 604260 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:46:21,754-Speed 2513.00 samples/sec Loss 1.4022 LearningRate 0.000091 Epoch: 29 Global Step: 604270 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:46:29,955-Speed 2497.61 samples/sec Loss 1.3989 LearningRate 0.000091 Epoch: 29 Global Step: 604280 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:46:38,161-Speed 2496.39 samples/sec Loss 1.3782 LearningRate 0.000091 Epoch: 29 Global Step: 604290 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:46:46,360-Speed 2498.03 samples/sec Loss 1.4236 LearningRate 0.000091 Epoch: 29 Global Step: 604300 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:46:54,563-Speed 2497.20 samples/sec Loss 1.4349 LearningRate 0.000091 Epoch: 29 Global Step: 604310 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:47:02,762-Speed 2498.62 samples/sec Loss 1.3933 LearningRate 0.000091 Epoch: 29 Global Step: 604320 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:47:10,907-Speed 2515.04 samples/sec Loss 1.3974 LearningRate 0.000091 Epoch: 29 Global Step: 604330 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:47:19,106-Speed 2498.40 samples/sec Loss 1.3937 LearningRate 0.000091 Epoch: 29 Global Step: 604340 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:47:27,315-Speed 2495.09 samples/sec Loss 1.4113 LearningRate 0.000091 Epoch: 29 Global Step: 604350 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:47:35,515-Speed 2497.76 samples/sec Loss 1.3950 LearningRate 0.000091 Epoch: 29 Global Step: 604360 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:47:43,718-Speed 2497.46 samples/sec Loss 1.4466 LearningRate 0.000091 Epoch: 29 Global Step: 604370 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:47:51,922-Speed 2496.71 samples/sec Loss 1.3980 LearningRate 0.000091 Epoch: 29 Global Step: 604380 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:48:00,073-Speed 2513.13 samples/sec Loss 1.3988 LearningRate 0.000091 Epoch: 29 Global Step: 604390 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:48:08,276-Speed 2497.27 samples/sec Loss 1.4072 LearningRate 0.000091 Epoch: 29 Global Step: 604400 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:48:16,481-Speed 2496.33 samples/sec Loss 1.3918 LearningRate 0.000091 Epoch: 29 Global Step: 604410 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:48:24,685-Speed 2496.61 samples/sec Loss 1.4075 LearningRate 0.000091 Epoch: 29 Global Step: 604420 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:48:32,887-Speed 2497.65 samples/sec Loss 1.3821 LearningRate 0.000091 Epoch: 29 Global Step: 604430 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:48:41,097-Speed 2495.13 samples/sec Loss 1.4061 LearningRate 0.000091 Epoch: 29 Global Step: 604440 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:48:49,250-Speed 2512.56 samples/sec Loss 1.4324 LearningRate 0.000091 Epoch: 29 Global Step: 604450 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:48:57,450-Speed 2498.05 samples/sec Loss 1.4285 LearningRate 0.000091 Epoch: 29 Global Step: 604460 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:49:05,653-Speed 2497.00 samples/sec Loss 1.3770 LearningRate 0.000091 Epoch: 29 Global Step: 604470 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:49:13,858-Speed 2496.45 samples/sec Loss 1.4276 LearningRate 0.000091 Epoch: 29 Global Step: 604480 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:49:22,060-Speed 2497.38 samples/sec Loss 1.4093 LearningRate 0.000091 Epoch: 29 Global Step: 604490 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:49:30,262-Speed 2497.04 samples/sec Loss 1.3883 LearningRate 0.000091 Epoch: 29 Global Step: 604500 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:49:38,413-Speed 2513.12 samples/sec Loss 1.4117 LearningRate 0.000091 Epoch: 29 Global Step: 604510 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:49:46,620-Speed 2495.79 samples/sec Loss 1.3870 LearningRate 0.000091 Epoch: 29 Global Step: 604520 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:49:54,824-Speed 2496.72 samples/sec Loss 1.3682 LearningRate 0.000091 Epoch: 29 Global Step: 604530 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:50:03,029-Speed 2496.35 samples/sec Loss 1.4098 LearningRate 0.000091 Epoch: 29 Global Step: 604540 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:50:11,263-Speed 2487.48 samples/sec Loss 1.4054 LearningRate 0.000091 Epoch: 29 Global Step: 604550 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:50:19,467-Speed 2496.96 samples/sec Loss 1.4057 LearningRate 0.000091 Epoch: 29 Global Step: 604560 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-07-11 08:50:27,616-Speed 2513.63 samples/sec Loss 1.4535 LearningRate 0.000091 Epoch: 29 Global Step: 604570 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:50:35,818-Speed 2497.20 samples/sec Loss 1.4026 LearningRate 0.000091 Epoch: 29 Global Step: 604580 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:50:44,021-Speed 2497.15 samples/sec Loss 1.4338 LearningRate 0.000091 Epoch: 29 Global Step: 604590 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-07-11 08:50:52,222-Speed 2497.62 samples/sec Loss 1.4145 LearningRate 0.000091 Epoch: 29 Global Step: 604600 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:51:00,423-Speed 2497.49 samples/sec Loss 1.4284 LearningRate 0.000091 Epoch: 29 Global Step: 604610 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:51:08,638-Speed 2493.49 samples/sec Loss 1.3977 LearningRate 0.000091 Epoch: 29 Global Step: 604620 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:51:16,783-Speed 2514.64 samples/sec Loss 1.4205 LearningRate 0.000091 Epoch: 29 Global Step: 604630 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:51:24,986-Speed 2497.88 samples/sec Loss 1.4312 LearningRate 0.000091 Epoch: 29 Global Step: 604640 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:51:33,190-Speed 2496.69 samples/sec Loss 1.3826 LearningRate 0.000091 Epoch: 29 Global Step: 604650 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:51:41,392-Speed 2497.43 samples/sec Loss 1.4004 LearningRate 0.000091 Epoch: 29 Global Step: 604660 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:51:49,594-Speed 2497.76 samples/sec Loss 1.4279 LearningRate 0.000091 Epoch: 29 Global Step: 604670 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:51:57,798-Speed 2496.75 samples/sec Loss 1.4215 LearningRate 0.000091 Epoch: 29 Global Step: 604680 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 08:52:05,942-Speed 2514.94 samples/sec Loss 1.4471 LearningRate 0.000091 Epoch: 29 Global Step: 604690 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:52:14,141-Speed 2498.11 samples/sec Loss 1.4180 LearningRate 0.000091 Epoch: 29 Global Step: 604700 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:52:22,343-Speed 2497.41 samples/sec Loss 1.4207 LearningRate 0.000091 Epoch: 29 Global Step: 604710 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:52:30,541-Speed 2498.59 samples/sec Loss 1.3744 LearningRate 0.000091 Epoch: 29 Global Step: 604720 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:52:38,745-Speed 2496.89 samples/sec Loss 1.4251 LearningRate 0.000091 Epoch: 29 Global Step: 604730 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:52:46,957-Speed 2494.11 samples/sec Loss 1.4159 LearningRate 0.000091 Epoch: 29 Global Step: 604740 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:52:55,104-Speed 2514.61 samples/sec Loss 1.4009 LearningRate 0.000091 Epoch: 29 Global Step: 604750 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:53:03,300-Speed 2499.23 samples/sec Loss 1.3877 LearningRate 0.000091 Epoch: 29 Global Step: 604760 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:53:11,499-Speed 2498.16 samples/sec Loss 1.4526 LearningRate 0.000091 Epoch: 29 Global Step: 604770 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:53:19,700-Speed 2497.52 samples/sec Loss 1.4077 LearningRate 0.000091 Epoch: 29 Global Step: 604780 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:53:27,902-Speed 2497.56 samples/sec Loss 1.4413 LearningRate 0.000091 Epoch: 29 Global Step: 604790 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:53:36,102-Speed 2498.05 samples/sec Loss 1.4041 LearningRate 0.000091 Epoch: 29 Global Step: 604800 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 08:53:44,251-Speed 2513.63 samples/sec Loss 1.4679 LearningRate 0.000091 Epoch: 29 Global Step: 604810 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:53:52,454-Speed 2497.08 samples/sec Loss 1.4100 LearningRate 0.000091 Epoch: 29 Global Step: 604820 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:54:00,657-Speed 2497.12 samples/sec Loss 1.4362 LearningRate 0.000091 Epoch: 29 Global Step: 604830 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:54:08,860-Speed 2496.85 samples/sec Loss 1.4166 LearningRate 0.000091 Epoch: 29 Global Step: 604840 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:54:17,062-Speed 2497.30 samples/sec Loss 1.4108 LearningRate 0.000091 Epoch: 29 Global Step: 604850 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:54:25,266-Speed 2497.04 samples/sec Loss 1.4511 LearningRate 0.000091 Epoch: 29 Global Step: 604860 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:54:33,425-Speed 2510.60 samples/sec Loss 1.3878 LearningRate 0.000091 Epoch: 29 Global Step: 604870 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:54:41,626-Speed 2497.43 samples/sec Loss 1.4039 LearningRate 0.000091 Epoch: 29 Global Step: 604880 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:54:49,829-Speed 2497.29 samples/sec Loss 1.4391 LearningRate 0.000091 Epoch: 29 Global Step: 604890 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:54:58,028-Speed 2498.16 samples/sec Loss 1.4042 LearningRate 0.000091 Epoch: 29 Global Step: 604900 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:55:06,225-Speed 2498.84 samples/sec Loss 1.3529 LearningRate 0.000091 Epoch: 29 Global Step: 604910 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:55:14,426-Speed 2497.75 samples/sec Loss 1.3940 LearningRate 0.000091 Epoch: 29 Global Step: 604920 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 08:55:22,571-Speed 2514.53 samples/sec Loss 1.3822 LearningRate 0.000091 Epoch: 29 Global Step: 604930 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:55:30,772-Speed 2497.88 samples/sec Loss 1.4161 LearningRate 0.000091 Epoch: 29 Global Step: 604940 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:55:38,973-Speed 2497.76 samples/sec Loss 1.4117 LearningRate 0.000091 Epoch: 29 Global Step: 604950 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:55:47,170-Speed 2498.63 samples/sec Loss 1.4216 LearningRate 0.000090 Epoch: 29 Global Step: 604960 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:55:55,384-Speed 2493.92 samples/sec Loss 1.4485 LearningRate 0.000090 Epoch: 29 Global Step: 604970 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:56:03,594-Speed 2494.92 samples/sec Loss 1.3919 LearningRate 0.000090 Epoch: 29 Global Step: 604980 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:56:11,738-Speed 2514.90 samples/sec Loss 1.4259 LearningRate 0.000090 Epoch: 29 Global Step: 604990 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:56:19,938-Speed 2498.01 samples/sec Loss 1.3768 LearningRate 0.000090 Epoch: 29 Global Step: 605000 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:56:28,137-Speed 2498.29 samples/sec Loss 1.4174 LearningRate 0.000090 Epoch: 29 Global Step: 605010 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:56:36,341-Speed 2496.85 samples/sec Loss 1.4116 LearningRate 0.000090 Epoch: 29 Global Step: 605020 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:56:44,557-Speed 2492.77 samples/sec Loss 1.4191 LearningRate 0.000090 Epoch: 29 Global Step: 605030 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:56:52,759-Speed 2497.33 samples/sec Loss 1.3900 LearningRate 0.000090 Epoch: 29 Global Step: 605040 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 08:57:00,909-Speed 2513.51 samples/sec Loss 1.4312 LearningRate 0.000090 Epoch: 29 Global Step: 605050 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:57:09,108-Speed 2498.34 samples/sec Loss 1.3864 LearningRate 0.000090 Epoch: 29 Global Step: 605060 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:57:17,310-Speed 2497.02 samples/sec Loss 1.4273 LearningRate 0.000090 Epoch: 29 Global Step: 605070 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:57:25,510-Speed 2497.93 samples/sec Loss 1.3908 LearningRate 0.000090 Epoch: 29 Global Step: 605080 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:57:33,709-Speed 2498.43 samples/sec Loss 1.3909 LearningRate 0.000090 Epoch: 29 Global Step: 605090 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:57:41,911-Speed 2497.64 samples/sec Loss 1.4279 LearningRate 0.000090 Epoch: 29 Global Step: 605100 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:57:50,061-Speed 2513.44 samples/sec Loss 1.4233 LearningRate 0.000090 Epoch: 29 Global Step: 605110 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:57:58,263-Speed 2497.51 samples/sec Loss 1.3999 LearningRate 0.000090 Epoch: 29 Global Step: 605120 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:58:06,466-Speed 2497.06 samples/sec Loss 1.3558 LearningRate 0.000090 Epoch: 29 Global Step: 605130 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:58:14,671-Speed 2496.54 samples/sec Loss 1.3694 LearningRate 0.000090 Epoch: 29 Global Step: 605140 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:58:22,870-Speed 2498.21 samples/sec Loss 1.3757 LearningRate 0.000090 Epoch: 29 Global Step: 605150 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:58:31,070-Speed 2497.96 samples/sec Loss 1.4136 LearningRate 0.000090 Epoch: 29 Global Step: 605160 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-07-11 08:58:39,215-Speed 2514.78 samples/sec Loss 1.3969 LearningRate 0.000090 Epoch: 29 Global Step: 605170 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 08:58:47,418-Speed 2497.10 samples/sec Loss 1.4218 LearningRate 0.000090 Epoch: 29 Global Step: 605180 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 08:58:55,620-Speed 2497.27 samples/sec Loss 1.4404 LearningRate 0.000090 Epoch: 29 Global Step: 605190 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 08:59:03,821-Speed 2497.65 samples/sec Loss 1.3926 LearningRate 0.000090 Epoch: 29 Global Step: 605200 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 08:59:12,022-Speed 2497.91 samples/sec Loss 1.4077 LearningRate 0.000090 Epoch: 29 Global Step: 605210 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 08:59:20,223-Speed 2497.98 samples/sec Loss 1.4119 LearningRate 0.000090 Epoch: 29 Global Step: 605220 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 08:59:28,369-Speed 2514.18 samples/sec Loss 1.4085 LearningRate 0.000090 Epoch: 29 Global Step: 605230 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 08:59:36,529-Speed 2510.27 samples/sec Loss 1.4189 LearningRate 0.000090 Epoch: 29 Global Step: 605240 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:59:44,728-Speed 2498.67 samples/sec Loss 1.4022 LearningRate 0.000090 Epoch: 29 Global Step: 605250 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 08:59:52,927-Speed 2498.00 samples/sec Loss 1.4149 LearningRate 0.000090 Epoch: 29 Global Step: 605260 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:00:01,128-Speed 2497.81 samples/sec Loss 1.3911 LearningRate 0.000090 Epoch: 29 Global Step: 605270 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:00:09,333-Speed 2496.33 samples/sec Loss 1.3992 LearningRate 0.000090 Epoch: 29 Global Step: 605280 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:00:17,477-Speed 2515.07 samples/sec Loss 1.4079 LearningRate 0.000090 Epoch: 29 Global Step: 605290 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:00:25,705-Speed 2489.72 samples/sec Loss 1.4250 LearningRate 0.000090 Epoch: 29 Global Step: 605300 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:00:33,908-Speed 2496.76 samples/sec Loss 1.3930 LearningRate 0.000090 Epoch: 29 Global Step: 605310 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:00:42,116-Speed 2495.96 samples/sec Loss 1.3716 LearningRate 0.000090 Epoch: 29 Global Step: 605320 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:00:50,321-Speed 2496.30 samples/sec Loss 1.3900 LearningRate 0.000090 Epoch: 29 Global Step: 605330 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:00:58,521-Speed 2497.96 samples/sec Loss 1.4129 LearningRate 0.000090 Epoch: 29 Global Step: 605340 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:01:06,671-Speed 2513.54 samples/sec Loss 1.4204 LearningRate 0.000090 Epoch: 29 Global Step: 605350 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:01:14,870-Speed 2498.23 samples/sec Loss 1.4068 LearningRate 0.000090 Epoch: 29 Global Step: 605360 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:01:23,073-Speed 2496.95 samples/sec Loss 1.4714 LearningRate 0.000090 Epoch: 29 Global Step: 605370 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:01:31,277-Speed 2496.83 samples/sec Loss 1.3782 LearningRate 0.000090 Epoch: 29 Global Step: 605380 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:01:39,483-Speed 2496.01 samples/sec Loss 1.3944 LearningRate 0.000090 Epoch: 29 Global Step: 605390 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:01:47,688-Speed 2496.35 samples/sec Loss 1.3874 LearningRate 0.000090 Epoch: 29 Global Step: 605400 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:01:55,837-Speed 2513.93 samples/sec Loss 1.4214 LearningRate 0.000090 Epoch: 29 Global Step: 605410 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:02:04,039-Speed 2497.72 samples/sec Loss 1.3747 LearningRate 0.000090 Epoch: 29 Global Step: 605420 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:02:12,238-Speed 2498.16 samples/sec Loss 1.3802 LearningRate 0.000090 Epoch: 29 Global Step: 605430 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:02:20,439-Speed 2497.84 samples/sec Loss 1.3692 LearningRate 0.000090 Epoch: 29 Global Step: 605440 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:02:28,637-Speed 2498.28 samples/sec Loss 1.4252 LearningRate 0.000090 Epoch: 29 Global Step: 605450 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:02:36,842-Speed 2496.37 samples/sec Loss 1.4220 LearningRate 0.000090 Epoch: 29 Global Step: 605460 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:02:44,999-Speed 2511.21 samples/sec Loss 1.3721 LearningRate 0.000090 Epoch: 29 Global Step: 605470 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:02:53,202-Speed 2497.60 samples/sec Loss 1.4007 LearningRate 0.000090 Epoch: 29 Global Step: 605480 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:03:01,400-Speed 2498.53 samples/sec Loss 1.3744 LearningRate 0.000090 Epoch: 29 Global Step: 605490 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:03:09,601-Speed 2497.39 samples/sec Loss 1.4170 LearningRate 0.000090 Epoch: 29 Global Step: 605500 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:03:17,805-Speed 2496.80 samples/sec Loss 1.3780 LearningRate 0.000090 Epoch: 29 Global Step: 605510 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:03:26,005-Speed 2498.11 samples/sec Loss 1.3926 LearningRate 0.000090 Epoch: 29 Global Step: 605520 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:03:34,153-Speed 2513.96 samples/sec Loss 1.4108 LearningRate 0.000090 Epoch: 29 Global Step: 605530 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:03:42,357-Speed 2496.78 samples/sec Loss 1.4468 LearningRate 0.000090 Epoch: 29 Global Step: 605540 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:03:50,569-Speed 2494.27 samples/sec Loss 1.4292 LearningRate 0.000090 Epoch: 29 Global Step: 605550 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:03:58,776-Speed 2495.97 samples/sec Loss 1.4092 LearningRate 0.000090 Epoch: 29 Global Step: 605560 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:04:06,975-Speed 2498.38 samples/sec Loss 1.4118 LearningRate 0.000090 Epoch: 29 Global Step: 605570 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:04:15,189-Speed 2493.72 samples/sec Loss 1.3795 LearningRate 0.000090 Epoch: 29 Global Step: 605580 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:04:23,336-Speed 2514.09 samples/sec Loss 1.4309 LearningRate 0.000090 Epoch: 29 Global Step: 605590 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:04:31,536-Speed 2498.12 samples/sec Loss 1.3641 LearningRate 0.000090 Epoch: 29 Global Step: 605600 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:04:39,739-Speed 2496.90 samples/sec Loss 1.3831 LearningRate 0.000090 Epoch: 29 Global Step: 605610 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:04:47,940-Speed 2497.99 samples/sec Loss 1.4148 LearningRate 0.000090 Epoch: 29 Global Step: 605620 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:04:56,141-Speed 2497.50 samples/sec Loss 1.4279 LearningRate 0.000090 Epoch: 29 Global Step: 605630 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:05:04,343-Speed 2497.45 samples/sec Loss 1.4131 LearningRate 0.000090 Epoch: 29 Global Step: 605640 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:05:12,489-Speed 2514.33 samples/sec Loss 1.3858 LearningRate 0.000090 Epoch: 29 Global Step: 605650 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:05:20,688-Speed 2498.21 samples/sec Loss 1.3607 LearningRate 0.000090 Epoch: 29 Global Step: 605660 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:05:28,889-Speed 2497.61 samples/sec Loss 1.4194 LearningRate 0.000090 Epoch: 29 Global Step: 605670 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:05:37,090-Speed 2497.64 samples/sec Loss 1.3823 LearningRate 0.000090 Epoch: 29 Global Step: 605680 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:05:45,289-Speed 2498.34 samples/sec Loss 1.3989 LearningRate 0.000090 Epoch: 29 Global Step: 605690 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:05:53,491-Speed 2497.33 samples/sec Loss 1.3829 LearningRate 0.000090 Epoch: 29 Global Step: 605700 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:06:01,638-Speed 2514.14 samples/sec Loss 1.3756 LearningRate 0.000090 Epoch: 29 Global Step: 605710 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:06:09,849-Speed 2494.53 samples/sec Loss 1.3993 LearningRate 0.000090 Epoch: 29 Global Step: 605720 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:06:18,052-Speed 2496.94 samples/sec Loss 1.3754 LearningRate 0.000090 Epoch: 29 Global Step: 605730 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:06:26,254-Speed 2497.26 samples/sec Loss 1.3777 LearningRate 0.000090 Epoch: 29 Global Step: 605740 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:06:34,456-Speed 2497.51 samples/sec Loss 1.3992 LearningRate 0.000090 Epoch: 29 Global Step: 605750 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:06:42,657-Speed 2497.43 samples/sec Loss 1.4249 LearningRate 0.000090 Epoch: 29 Global Step: 605760 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:06:50,807-Speed 2513.30 samples/sec Loss 1.4312 LearningRate 0.000090 Epoch: 29 Global Step: 605770 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:06:59,005-Speed 2498.82 samples/sec Loss 1.3992 LearningRate 0.000090 Epoch: 29 Global Step: 605780 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:07:07,204-Speed 2498.09 samples/sec Loss 1.3915 LearningRate 0.000090 Epoch: 29 Global Step: 605790 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:07:15,406-Speed 2497.09 samples/sec Loss 1.3682 LearningRate 0.000090 Epoch: 29 Global Step: 605800 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:07:23,607-Speed 2497.69 samples/sec Loss 1.3824 LearningRate 0.000090 Epoch: 29 Global Step: 605810 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:07:31,810-Speed 2497.17 samples/sec Loss 1.4085 LearningRate 0.000090 Epoch: 29 Global Step: 605820 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:07:39,968-Speed 2510.76 samples/sec Loss 1.3735 LearningRate 0.000090 Epoch: 29 Global Step: 605830 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:07:48,166-Speed 2498.46 samples/sec Loss 1.3970 LearningRate 0.000090 Epoch: 29 Global Step: 605840 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:07:56,371-Speed 2496.36 samples/sec Loss 1.3622 LearningRate 0.000090 Epoch: 29 Global Step: 605850 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:08:04,569-Speed 2499.06 samples/sec Loss 1.4077 LearningRate 0.000090 Epoch: 29 Global Step: 605860 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:08:12,765-Speed 2499.42 samples/sec Loss 1.4054 LearningRate 0.000090 Epoch: 29 Global Step: 605870 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:08:20,963-Speed 2498.40 samples/sec Loss 1.4117 LearningRate 0.000090 Epoch: 29 Global Step: 605880 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:08:29,109-Speed 2514.47 samples/sec Loss 1.4044 LearningRate 0.000090 Epoch: 29 Global Step: 605890 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:08:37,308-Speed 2498.12 samples/sec Loss 1.4062 LearningRate 0.000090 Epoch: 29 Global Step: 605900 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:08:45,508-Speed 2498.11 samples/sec Loss 1.4164 LearningRate 0.000090 Epoch: 29 Global Step: 605910 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:08:53,720-Speed 2494.19 samples/sec Loss 1.3983 LearningRate 0.000090 Epoch: 29 Global Step: 605920 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:09:01,920-Speed 2497.96 samples/sec Loss 1.4019 LearningRate 0.000090 Epoch: 29 Global Step: 605930 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:09:10,122-Speed 2497.43 samples/sec Loss 1.3887 LearningRate 0.000090 Epoch: 29 Global Step: 605940 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:09:18,266-Speed 2515.09 samples/sec Loss 1.3935 LearningRate 0.000090 Epoch: 29 Global Step: 605950 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:09:26,468-Speed 2497.50 samples/sec Loss 1.3865 LearningRate 0.000090 Epoch: 29 Global Step: 605960 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:09:34,673-Speed 2496.38 samples/sec Loss 1.3687 LearningRate 0.000090 Epoch: 29 Global Step: 605970 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:09:42,874-Speed 2497.82 samples/sec Loss 1.3719 LearningRate 0.000090 Epoch: 29 Global Step: 605980 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:09:51,081-Speed 2495.57 samples/sec Loss 1.3842 LearningRate 0.000090 Epoch: 29 Global Step: 605990 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:09:59,281-Speed 2498.06 samples/sec Loss 1.4064 LearningRate 0.000090 Epoch: 29 Global Step: 606000 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:10:07,428-Speed 2514.15 samples/sec Loss 1.3780 LearningRate 0.000090 Epoch: 29 Global Step: 606010 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:10:15,634-Speed 2496.17 samples/sec Loss 1.4233 LearningRate 0.000090 Epoch: 29 Global Step: 606020 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:10:23,831-Speed 2498.76 samples/sec Loss 1.3749 LearningRate 0.000090 Epoch: 29 Global Step: 606030 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:10:32,033-Speed 2497.55 samples/sec Loss 1.3998 LearningRate 0.000090 Epoch: 29 Global Step: 606040 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:10:40,234-Speed 2497.49 samples/sec Loss 1.4070 LearningRate 0.000090 Epoch: 29 Global Step: 606050 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:10:48,438-Speed 2496.91 samples/sec Loss 1.3895 LearningRate 0.000090 Epoch: 29 Global Step: 606060 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:10:56,582-Speed 2514.79 samples/sec Loss 1.3557 LearningRate 0.000090 Epoch: 29 Global Step: 606070 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:11:04,793-Speed 2494.59 samples/sec Loss 1.4143 LearningRate 0.000090 Epoch: 29 Global Step: 606080 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:11:12,995-Speed 2497.32 samples/sec Loss 1.3969 LearningRate 0.000090 Epoch: 29 Global Step: 606090 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:11:21,197-Speed 2497.45 samples/sec Loss 1.3801 LearningRate 0.000090 Epoch: 29 Global Step: 606100 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:11:29,401-Speed 2496.71 samples/sec Loss 1.3667 LearningRate 0.000090 Epoch: 29 Global Step: 606110 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:11:37,610-Speed 2495.27 samples/sec Loss 1.3940 LearningRate 0.000090 Epoch: 29 Global Step: 606120 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:11:45,764-Speed 2511.98 samples/sec Loss 1.3743 LearningRate 0.000090 Epoch: 29 Global Step: 606130 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:11:53,970-Speed 2496.11 samples/sec Loss 1.3815 LearningRate 0.000090 Epoch: 29 Global Step: 606140 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:12:02,172-Speed 2497.38 samples/sec Loss 1.3897 LearningRate 0.000090 Epoch: 29 Global Step: 606150 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:12:10,376-Speed 2496.82 samples/sec Loss 1.4016 LearningRate 0.000090 Epoch: 29 Global Step: 606160 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:12:18,581-Speed 2496.60 samples/sec Loss 1.3951 LearningRate 0.000090 Epoch: 29 Global Step: 606170 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:12:26,786-Speed 2496.61 samples/sec Loss 1.4084 LearningRate 0.000090 Epoch: 29 Global Step: 606180 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:12:34,948-Speed 2509.25 samples/sec Loss 1.3594 LearningRate 0.000090 Epoch: 29 Global Step: 606190 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:12:43,152-Speed 2496.87 samples/sec Loss 1.3930 LearningRate 0.000090 Epoch: 29 Global Step: 606200 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:12:51,352-Speed 2498.05 samples/sec Loss 1.3784 LearningRate 0.000089 Epoch: 29 Global Step: 606210 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:12:59,555-Speed 2496.88 samples/sec Loss 1.3910 LearningRate 0.000089 Epoch: 29 Global Step: 606220 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:13:07,754-Speed 2498.07 samples/sec Loss 1.4073 LearningRate 0.000089 Epoch: 29 Global Step: 606230 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:13:15,957-Speed 2497.30 samples/sec Loss 1.3984 LearningRate 0.000089 Epoch: 29 Global Step: 606240 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:13:24,102-Speed 2514.76 samples/sec Loss 1.3722 LearningRate 0.000089 Epoch: 29 Global Step: 606250 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:13:32,302-Speed 2497.81 samples/sec Loss 1.3848 LearningRate 0.000089 Epoch: 29 Global Step: 606260 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:13:40,505-Speed 2497.38 samples/sec Loss 1.4045 LearningRate 0.000089 Epoch: 29 Global Step: 606270 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:13:48,709-Speed 2496.86 samples/sec Loss 1.4200 LearningRate 0.000089 Epoch: 29 Global Step: 606280 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:13:56,907-Speed 2498.47 samples/sec Loss 1.3769 LearningRate 0.000089 Epoch: 29 Global Step: 606290 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:14:05,106-Speed 2498.18 samples/sec Loss 1.3860 LearningRate 0.000089 Epoch: 29 Global Step: 606300 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:14:13,253-Speed 2514.19 samples/sec Loss 1.3629 LearningRate 0.000089 Epoch: 29 Global Step: 606310 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:14:21,452-Speed 2498.39 samples/sec Loss 1.4170 LearningRate 0.000089 Epoch: 29 Global Step: 606320 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:14:29,651-Speed 2498.22 samples/sec Loss 1.3743 LearningRate 0.000089 Epoch: 29 Global Step: 606330 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:14:37,850-Speed 2497.97 samples/sec Loss 1.3696 LearningRate 0.000089 Epoch: 29 Global Step: 606340 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:14:46,074-Speed 2490.80 samples/sec Loss 1.3980 LearningRate 0.000089 Epoch: 29 Global Step: 606350 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:14:54,277-Speed 2497.09 samples/sec Loss 1.3909 LearningRate 0.000089 Epoch: 29 Global Step: 606360 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:15:02,422-Speed 2514.85 samples/sec Loss 1.3709 LearningRate 0.000089 Epoch: 29 Global Step: 606370 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:15:10,622-Speed 2498.01 samples/sec Loss 1.3786 LearningRate 0.000089 Epoch: 29 Global Step: 606380 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:15:18,821-Speed 2498.19 samples/sec Loss 1.3938 LearningRate 0.000089 Epoch: 29 Global Step: 606390 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:15:27,021-Speed 2497.90 samples/sec Loss 1.3774 LearningRate 0.000089 Epoch: 29 Global Step: 606400 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:15:35,226-Speed 2496.45 samples/sec Loss 1.4442 LearningRate 0.000089 Epoch: 29 Global Step: 606410 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:15:43,423-Speed 2498.67 samples/sec Loss 1.3866 LearningRate 0.000089 Epoch: 29 Global Step: 606420 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:15:51,574-Speed 2512.94 samples/sec Loss 1.4006 LearningRate 0.000089 Epoch: 29 Global Step: 606430 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:15:59,779-Speed 2496.51 samples/sec Loss 1.3942 LearningRate 0.000089 Epoch: 29 Global Step: 606440 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:16:07,979-Speed 2497.95 samples/sec Loss 1.3928 LearningRate 0.000089 Epoch: 29 Global Step: 606450 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:16:16,181-Speed 2497.28 samples/sec Loss 1.3806 LearningRate 0.000089 Epoch: 29 Global Step: 606460 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:16:24,387-Speed 2496.45 samples/sec Loss 1.3749 LearningRate 0.000089 Epoch: 29 Global Step: 606470 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:16:32,589-Speed 2497.35 samples/sec Loss 1.4025 LearningRate 0.000089 Epoch: 29 Global Step: 606480 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-07-11 09:16:40,739-Speed 2513.29 samples/sec Loss 1.3863 LearningRate 0.000089 Epoch: 29 Global Step: 606490 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:16:48,940-Speed 2497.54 samples/sec Loss 1.4163 LearningRate 0.000089 Epoch: 29 Global Step: 606500 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:16:57,139-Speed 2498.30 samples/sec Loss 1.3731 LearningRate 0.000089 Epoch: 29 Global Step: 606510 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:17:05,341-Speed 2497.34 samples/sec Loss 1.3967 LearningRate 0.000089 Epoch: 29 Global Step: 606520 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:17:13,542-Speed 2497.69 samples/sec Loss 1.4211 LearningRate 0.000089 Epoch: 29 Global Step: 606530 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:17:21,743-Speed 2497.63 samples/sec Loss 1.4182 LearningRate 0.000089 Epoch: 29 Global Step: 606540 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:17:29,886-Speed 2515.13 samples/sec Loss 1.3659 LearningRate 0.000089 Epoch: 29 Global Step: 606550 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:17:38,115-Speed 2489.29 samples/sec Loss 1.4413 LearningRate 0.000089 Epoch: 29 Global Step: 606560 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:17:46,316-Speed 2497.64 samples/sec Loss 1.3992 LearningRate 0.000089 Epoch: 29 Global Step: 606570 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:17:54,531-Speed 2493.28 samples/sec Loss 1.4040 LearningRate 0.000089 Epoch: 29 Global Step: 606580 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:18:02,732-Speed 2497.76 samples/sec Loss 1.4024 LearningRate 0.000089 Epoch: 29 Global Step: 606590 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:18:10,931-Speed 2498.53 samples/sec Loss 1.3861 LearningRate 0.000089 Epoch: 29 Global Step: 606600 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-07-11 09:18:19,078-Speed 2514.21 samples/sec Loss 1.4092 LearningRate 0.000089 Epoch: 29 Global Step: 606610 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:18:27,278-Speed 2497.95 samples/sec Loss 1.4061 LearningRate 0.000089 Epoch: 29 Global Step: 606620 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:18:35,476-Speed 2498.70 samples/sec Loss 1.3866 LearningRate 0.000089 Epoch: 29 Global Step: 606630 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:18:43,678-Speed 2497.25 samples/sec Loss 1.4181 LearningRate 0.000089 Epoch: 29 Global Step: 606640 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:18:51,880-Speed 2497.47 samples/sec Loss 1.4041 LearningRate 0.000089 Epoch: 29 Global Step: 606650 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:19:00,080-Speed 2497.63 samples/sec Loss 1.3993 LearningRate 0.000089 Epoch: 29 Global Step: 606660 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:19:08,227-Speed 2514.16 samples/sec Loss 1.3804 LearningRate 0.000089 Epoch: 29 Global Step: 606670 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:19:16,429-Speed 2497.67 samples/sec Loss 1.4286 LearningRate 0.000089 Epoch: 29 Global Step: 606680 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:19:24,629-Speed 2497.83 samples/sec Loss 1.3731 LearningRate 0.000089 Epoch: 29 Global Step: 606690 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:19:32,844-Speed 2493.34 samples/sec Loss 1.3963 LearningRate 0.000089 Epoch: 29 Global Step: 606700 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:19:41,055-Speed 2494.51 samples/sec Loss 1.3978 LearningRate 0.000089 Epoch: 29 Global Step: 606710 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:19:49,253-Speed 2498.58 samples/sec Loss 1.3796 LearningRate 0.000089 Epoch: 29 Global Step: 606720 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-07-11 09:19:57,410-Speed 2511.26 samples/sec Loss 1.4083 LearningRate 0.000089 Epoch: 29 Global Step: 606730 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:20:05,609-Speed 2498.24 samples/sec Loss 1.4378 LearningRate 0.000089 Epoch: 29 Global Step: 606740 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:20:13,809-Speed 2498.05 samples/sec Loss 1.3873 LearningRate 0.000089 Epoch: 29 Global Step: 606750 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:20:22,013-Speed 2496.71 samples/sec Loss 1.4037 LearningRate 0.000089 Epoch: 29 Global Step: 606760 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:20:30,221-Speed 2495.26 samples/sec Loss 1.3899 LearningRate 0.000089 Epoch: 29 Global Step: 606770 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:20:38,421-Speed 2498.27 samples/sec Loss 1.3685 LearningRate 0.000089 Epoch: 29 Global Step: 606780 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:20:46,575-Speed 2512.28 samples/sec Loss 1.4002 LearningRate 0.000089 Epoch: 29 Global Step: 606790 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:20:54,777-Speed 2497.41 samples/sec Loss 1.4137 LearningRate 0.000089 Epoch: 29 Global Step: 606800 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:21:02,978-Speed 2497.54 samples/sec Loss 1.3960 LearningRate 0.000089 Epoch: 29 Global Step: 606810 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:21:11,177-Speed 2498.22 samples/sec Loss 1.4130 LearningRate 0.000089 Epoch: 29 Global Step: 606820 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:21:19,378-Speed 2497.68 samples/sec Loss 1.3655 LearningRate 0.000089 Epoch: 29 Global Step: 606830 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:21:27,577-Speed 2498.21 samples/sec Loss 1.3763 LearningRate 0.000089 Epoch: 29 Global Step: 606840 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-07-11 09:21:35,720-Speed 2515.40 samples/sec Loss 1.4646 LearningRate 0.000089 Epoch: 29 Global Step: 606850 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:21:43,925-Speed 2496.72 samples/sec Loss 1.3896 LearningRate 0.000089 Epoch: 29 Global Step: 606860 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:21:52,122-Speed 2498.90 samples/sec Loss 1.3908 LearningRate 0.000089 Epoch: 29 Global Step: 606870 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:22:00,326-Speed 2496.55 samples/sec Loss 1.3853 LearningRate 0.000089 Epoch: 29 Global Step: 606880 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:22:08,526-Speed 2498.04 samples/sec Loss 1.4076 LearningRate 0.000089 Epoch: 29 Global Step: 606890 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:22:16,724-Speed 2498.34 samples/sec Loss 1.3984 LearningRate 0.000089 Epoch: 29 Global Step: 606900 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:22:24,874-Speed 2513.45 samples/sec Loss 1.3996 LearningRate 0.000089 Epoch: 29 Global Step: 606910 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-07-11 09:22:33,030-Speed 2511.59 samples/sec Loss 1.4268 LearningRate 0.000089 Epoch: 29 Global Step: 606920 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:22:41,228-Speed 2498.46 samples/sec Loss 1.3820 LearningRate 0.000089 Epoch: 29 Global Step: 606930 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:22:49,429-Speed 2499.58 samples/sec Loss 1.4140 LearningRate 0.000089 Epoch: 29 Global Step: 606940 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:22:57,629-Speed 2498.03 samples/sec Loss 1.3899 LearningRate 0.000089 Epoch: 29 Global Step: 606950 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:23:05,829-Speed 2497.89 samples/sec Loss 1.3998 LearningRate 0.000089 Epoch: 29 Global Step: 606960 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:23:13,978-Speed 2513.32 samples/sec Loss 1.4190 LearningRate 0.000089 Epoch: 29 Global Step: 606970 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:23:22,180-Speed 2497.57 samples/sec Loss 1.4419 LearningRate 0.000089 Epoch: 29 Global Step: 606980 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:23:30,381-Speed 2497.60 samples/sec Loss 1.4165 LearningRate 0.000089 Epoch: 29 Global Step: 606990 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:23:38,594-Speed 2494.24 samples/sec Loss 1.3867 LearningRate 0.000089 Epoch: 29 Global Step: 607000 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:23:46,792-Speed 2498.19 samples/sec Loss 1.3985 LearningRate 0.000089 Epoch: 29 Global Step: 607010 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:23:54,992-Speed 2498.24 samples/sec Loss 1.4238 LearningRate 0.000089 Epoch: 29 Global Step: 607020 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:24:03,139-Speed 2514.44 samples/sec Loss 1.4095 LearningRate 0.000089 Epoch: 29 Global Step: 607030 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:24:11,339-Speed 2497.82 samples/sec Loss 1.4176 LearningRate 0.000089 Epoch: 29 Global Step: 607040 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:24:19,538-Speed 2498.29 samples/sec Loss 1.3969 LearningRate 0.000089 Epoch: 29 Global Step: 607050 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:24:27,736-Speed 2498.85 samples/sec Loss 1.4224 LearningRate 0.000089 Epoch: 29 Global Step: 607060 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:24:35,939-Speed 2497.08 samples/sec Loss 1.4018 LearningRate 0.000089 Epoch: 29 Global Step: 607070 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:24:44,139-Speed 2497.76 samples/sec Loss 1.4048 LearningRate 0.000089 Epoch: 29 Global Step: 607080 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:24:52,289-Speed 2513.43 samples/sec Loss 1.4148 LearningRate 0.000089 Epoch: 29 Global Step: 607090 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:25:00,489-Speed 2498.06 samples/sec Loss 1.4579 LearningRate 0.000089 Epoch: 29 Global Step: 607100 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:25:08,694-Speed 2496.32 samples/sec Loss 1.3832 LearningRate 0.000089 Epoch: 29 Global Step: 607110 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:25:16,894-Speed 2498.44 samples/sec Loss 1.3979 LearningRate 0.000089 Epoch: 29 Global Step: 607120 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:25:25,096-Speed 2497.42 samples/sec Loss 1.4052 LearningRate 0.000089 Epoch: 29 Global Step: 607130 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:25:33,297-Speed 2497.62 samples/sec Loss 1.3896 LearningRate 0.000089 Epoch: 29 Global Step: 607140 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:25:41,447-Speed 2513.18 samples/sec Loss 1.4069 LearningRate 0.000089 Epoch: 29 Global Step: 607150 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:25:49,650-Speed 2497.27 samples/sec Loss 1.4247 LearningRate 0.000089 Epoch: 29 Global Step: 607160 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:25:57,855-Speed 2496.21 samples/sec Loss 1.4013 LearningRate 0.000089 Epoch: 29 Global Step: 607170 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:26:06,062-Speed 2496.01 samples/sec Loss 1.4298 LearningRate 0.000089 Epoch: 29 Global Step: 607180 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:26:14,268-Speed 2496.09 samples/sec Loss 1.4136 LearningRate 0.000089 Epoch: 29 Global Step: 607190 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:26:22,471-Speed 2497.02 samples/sec Loss 1.4061 LearningRate 0.000089 Epoch: 29 Global Step: 607200 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:26:30,625-Speed 2512.07 samples/sec Loss 1.3850 LearningRate 0.000089 Epoch: 29 Global Step: 607210 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:26:38,830-Speed 2496.56 samples/sec Loss 1.4112 LearningRate 0.000089 Epoch: 29 Global Step: 607220 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:26:47,035-Speed 2496.46 samples/sec Loss 1.3788 LearningRate 0.000089 Epoch: 29 Global Step: 607230 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:26:55,239-Speed 2496.32 samples/sec Loss 1.4207 LearningRate 0.000089 Epoch: 29 Global Step: 607240 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:27:03,442-Speed 2497.30 samples/sec Loss 1.4498 LearningRate 0.000089 Epoch: 29 Global Step: 607250 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:27:11,646-Speed 2497.11 samples/sec Loss 1.4087 LearningRate 0.000089 Epoch: 29 Global Step: 607260 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:27:19,791-Speed 2514.58 samples/sec Loss 1.4110 LearningRate 0.000089 Epoch: 29 Global Step: 607270 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:27:27,995-Speed 2497.04 samples/sec Loss 1.3812 LearningRate 0.000089 Epoch: 29 Global Step: 607280 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:27:36,197-Speed 2497.11 samples/sec Loss 1.3770 LearningRate 0.000089 Epoch: 29 Global Step: 607290 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:27:44,405-Speed 2495.65 samples/sec Loss 1.4026 LearningRate 0.000089 Epoch: 29 Global Step: 607300 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:27:52,607-Speed 2497.62 samples/sec Loss 1.4071 LearningRate 0.000089 Epoch: 29 Global Step: 607310 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:28:00,806-Speed 2498.17 samples/sec Loss 1.4069 LearningRate 0.000089 Epoch: 29 Global Step: 607320 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:28:08,949-Speed 2515.55 samples/sec Loss 1.3825 LearningRate 0.000089 Epoch: 29 Global Step: 607330 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:28:17,156-Speed 2495.62 samples/sec Loss 1.3892 LearningRate 0.000089 Epoch: 29 Global Step: 607340 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:28:25,355-Speed 2498.42 samples/sec Loss 1.4015 LearningRate 0.000089 Epoch: 29 Global Step: 607350 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:28:33,556-Speed 2497.48 samples/sec Loss 1.3922 LearningRate 0.000089 Epoch: 29 Global Step: 607360 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:28:41,755-Speed 2498.42 samples/sec Loss 1.3471 LearningRate 0.000089 Epoch: 29 Global Step: 607370 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:28:49,956-Speed 2497.49 samples/sec Loss 1.4136 LearningRate 0.000089 Epoch: 29 Global Step: 607380 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:28:58,105-Speed 2513.61 samples/sec Loss 1.4193 LearningRate 0.000089 Epoch: 29 Global Step: 607390 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:29:06,305-Speed 2498.06 samples/sec Loss 1.4381 LearningRate 0.000089 Epoch: 29 Global Step: 607400 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:29:14,505-Speed 2497.98 samples/sec Loss 1.4072 LearningRate 0.000089 Epoch: 29 Global Step: 607410 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:29:22,709-Speed 2496.70 samples/sec Loss 1.3666 LearningRate 0.000089 Epoch: 29 Global Step: 607420 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:29:30,914-Speed 2496.38 samples/sec Loss 1.3994 LearningRate 0.000089 Epoch: 29 Global Step: 607430 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:29:39,116-Speed 2497.34 samples/sec Loss 1.3624 LearningRate 0.000089 Epoch: 29 Global Step: 607440 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:29:47,265-Speed 2513.70 samples/sec Loss 1.3870 LearningRate 0.000089 Epoch: 29 Global Step: 607450 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:29:55,464-Speed 2498.25 samples/sec Loss 1.4159 LearningRate 0.000088 Epoch: 29 Global Step: 607460 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:30:03,667-Speed 2496.99 samples/sec Loss 1.3912 LearningRate 0.000088 Epoch: 29 Global Step: 607470 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:30:11,871-Speed 2496.63 samples/sec Loss 1.4407 LearningRate 0.000088 Epoch: 29 Global Step: 607480 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:30:20,077-Speed 2496.11 samples/sec Loss 1.4050 LearningRate 0.000088 Epoch: 29 Global Step: 607490 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:30:28,280-Speed 2496.95 samples/sec Loss 1.3797 LearningRate 0.000088 Epoch: 29 Global Step: 607500 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:30:36,424-Speed 2515.30 samples/sec Loss 1.4162 LearningRate 0.000088 Epoch: 29 Global Step: 607510 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:30:44,633-Speed 2495.38 samples/sec Loss 1.3859 LearningRate 0.000088 Epoch: 29 Global Step: 607520 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:30:52,830-Speed 2498.61 samples/sec Loss 1.3880 LearningRate 0.000088 Epoch: 29 Global Step: 607530 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:31:01,030-Speed 2498.15 samples/sec Loss 1.3822 LearningRate 0.000088 Epoch: 29 Global Step: 607540 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:31:09,229-Speed 2498.19 samples/sec Loss 1.3994 LearningRate 0.000088 Epoch: 29 Global Step: 607550 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:31:17,427-Speed 2498.60 samples/sec Loss 1.3652 LearningRate 0.000088 Epoch: 29 Global Step: 607560 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:31:25,574-Speed 2514.17 samples/sec Loss 1.3884 LearningRate 0.000088 Epoch: 29 Global Step: 607570 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:31:33,784-Speed 2494.80 samples/sec Loss 1.4080 LearningRate 0.000088 Epoch: 29 Global Step: 607580 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:31:41,981-Speed 2498.87 samples/sec Loss 1.4072 LearningRate 0.000088 Epoch: 29 Global Step: 607590 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:31:50,182-Speed 2497.82 samples/sec Loss 1.3832 LearningRate 0.000088 Epoch: 29 Global Step: 607600 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:31:58,383-Speed 2497.92 samples/sec Loss 1.3720 LearningRate 0.000088 Epoch: 29 Global Step: 607610 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:32:06,585-Speed 2497.05 samples/sec Loss 1.3795 LearningRate 0.000088 Epoch: 29 Global Step: 607620 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:32:14,737-Speed 2512.74 samples/sec Loss 1.4387 LearningRate 0.000088 Epoch: 29 Global Step: 607630 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:32:22,937-Speed 2498.03 samples/sec Loss 1.4175 LearningRate 0.000088 Epoch: 29 Global Step: 607640 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:32:31,135-Speed 2498.54 samples/sec Loss 1.4042 LearningRate 0.000088 Epoch: 29 Global Step: 607650 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:32:39,337-Speed 2497.43 samples/sec Loss 1.4024 LearningRate 0.000088 Epoch: 29 Global Step: 607660 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:32:47,540-Speed 2496.93 samples/sec Loss 1.3824 LearningRate 0.000088 Epoch: 29 Global Step: 607670 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:32:55,737-Speed 2499.06 samples/sec Loss 1.4015 LearningRate 0.000088 Epoch: 29 Global Step: 607680 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:33:03,881-Speed 2514.91 samples/sec Loss 1.3873 LearningRate 0.000088 Epoch: 29 Global Step: 607690 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:33:12,082-Speed 2497.78 samples/sec Loss 1.3824 LearningRate 0.000088 Epoch: 29 Global Step: 607700 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:33:20,277-Speed 2499.34 samples/sec Loss 1.3699 LearningRate 0.000088 Epoch: 29 Global Step: 607710 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:33:28,475-Speed 2498.49 samples/sec Loss 1.4070 LearningRate 0.000088 Epoch: 29 Global Step: 607720 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:33:36,673-Speed 2498.56 samples/sec Loss 1.3864 LearningRate 0.000088 Epoch: 29 Global Step: 607730 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:33:44,881-Speed 2495.93 samples/sec Loss 1.4004 LearningRate 0.000088 Epoch: 29 Global Step: 607740 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:33:53,026-Speed 2514.79 samples/sec Loss 1.4621 LearningRate 0.000088 Epoch: 29 Global Step: 607750 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:34:01,228-Speed 2497.31 samples/sec Loss 1.4049 LearningRate 0.000088 Epoch: 29 Global Step: 607760 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:34:09,430-Speed 2497.45 samples/sec Loss 1.3741 LearningRate 0.000088 Epoch: 29 Global Step: 607770 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:34:17,638-Speed 2495.47 samples/sec Loss 1.3852 LearningRate 0.000088 Epoch: 29 Global Step: 607780 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:34:25,838-Speed 2498.09 samples/sec Loss 1.4248 LearningRate 0.000088 Epoch: 29 Global Step: 607790 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-07-11 09:34:34,037-Speed 2498.24 samples/sec Loss 1.3900 LearningRate 0.000088 Epoch: 29 Global Step: 607800 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-07-11 09:34:42,184-Speed 2514.19 samples/sec Loss 1.4277 LearningRate 0.000088 Epoch: 29 Global Step: 607810 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:34:50,386-Speed 2497.26 samples/sec Loss 1.3757 LearningRate 0.000088 Epoch: 29 Global Step: 607820 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:34:58,589-Speed 2497.18 samples/sec Loss 1.3843 LearningRate 0.000088 Epoch: 29 Global Step: 607830 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:35:06,788-Speed 2498.12 samples/sec Loss 1.3966 LearningRate 0.000088 Epoch: 29 Global Step: 607840 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:35:14,986-Speed 2498.35 samples/sec Loss 1.3844 LearningRate 0.000088 Epoch: 29 Global Step: 607850 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:35:23,188-Speed 2497.51 samples/sec Loss 1.3601 LearningRate 0.000088 Epoch: 29 Global Step: 607860 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:35:31,334-Speed 2514.50 samples/sec Loss 1.4105 LearningRate 0.000088 Epoch: 29 Global Step: 607870 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:35:39,537-Speed 2496.78 samples/sec Loss 1.4027 LearningRate 0.000088 Epoch: 29 Global Step: 607880 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:35:47,745-Speed 2495.60 samples/sec Loss 1.3688 LearningRate 0.000088 Epoch: 29 Global Step: 607890 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:35:55,945-Speed 2497.92 samples/sec Loss 1.4313 LearningRate 0.000088 Epoch: 29 Global Step: 607900 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:36:04,145-Speed 2498.13 samples/sec Loss 1.3733 LearningRate 0.000088 Epoch: 29 Global Step: 607910 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:36:12,349-Speed 2496.66 samples/sec Loss 1.3531 LearningRate 0.000088 Epoch: 29 Global Step: 607920 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:36:20,496-Speed 2514.03 samples/sec Loss 1.3806 LearningRate 0.000088 Epoch: 29 Global Step: 607930 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:36:28,697-Speed 2497.87 samples/sec Loss 1.3751 LearningRate 0.000088 Epoch: 29 Global Step: 607940 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:36:36,900-Speed 2496.93 samples/sec Loss 1.3759 LearningRate 0.000088 Epoch: 29 Global Step: 607950 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:36:45,101-Speed 2497.57 samples/sec Loss 1.3847 LearningRate 0.000088 Epoch: 29 Global Step: 607960 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:36:53,302-Speed 2497.59 samples/sec Loss 1.4015 LearningRate 0.000088 Epoch: 29 Global Step: 607970 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:37:01,505-Speed 2496.98 samples/sec Loss 1.3884 LearningRate 0.000088 Epoch: 29 Global Step: 607980 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:37:09,650-Speed 2514.94 samples/sec Loss 1.3999 LearningRate 0.000088 Epoch: 29 Global Step: 607990 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:37:17,853-Speed 2497.16 samples/sec Loss 1.3912 LearningRate 0.000088 Epoch: 29 Global Step: 608000 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:37:26,052-Speed 2498.25 samples/sec Loss 1.3875 LearningRate 0.000088 Epoch: 29 Global Step: 608010 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:37:34,251-Speed 2498.37 samples/sec Loss 1.4150 LearningRate 0.000088 Epoch: 29 Global Step: 608020 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:37:42,450-Speed 2498.11 samples/sec Loss 1.3725 LearningRate 0.000088 Epoch: 29 Global Step: 608030 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:37:50,650-Speed 2497.94 samples/sec Loss 1.3968 LearningRate 0.000088 Epoch: 29 Global Step: 608040 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:37:58,798-Speed 2513.92 samples/sec Loss 1.4007 LearningRate 0.000088 Epoch: 29 Global Step: 608050 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:38:07,010-Speed 2494.35 samples/sec Loss 1.3685 LearningRate 0.000088 Epoch: 29 Global Step: 608060 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:38:15,209-Speed 2498.21 samples/sec Loss 1.3989 LearningRate 0.000088 Epoch: 29 Global Step: 608070 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:38:23,408-Speed 2498.30 samples/sec Loss 1.3790 LearningRate 0.000088 Epoch: 29 Global Step: 608080 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:38:31,607-Speed 2498.53 samples/sec Loss 1.3676 LearningRate 0.000088 Epoch: 29 Global Step: 608090 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:38:39,811-Speed 2497.25 samples/sec Loss 1.4031 LearningRate 0.000088 Epoch: 29 Global Step: 608100 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:38:47,959-Speed 2513.59 samples/sec Loss 1.3769 LearningRate 0.000088 Epoch: 29 Global Step: 608110 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:38:56,173-Speed 2493.93 samples/sec Loss 1.4474 LearningRate 0.000088 Epoch: 29 Global Step: 608120 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:39:04,370-Speed 2498.73 samples/sec Loss 1.3902 LearningRate 0.000088 Epoch: 29 Global Step: 608130 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:39:12,574-Speed 2497.05 samples/sec Loss 1.4071 LearningRate 0.000088 Epoch: 29 Global Step: 608140 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:39:20,774-Speed 2497.88 samples/sec Loss 1.4231 LearningRate 0.000088 Epoch: 29 Global Step: 608150 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:39:28,971-Speed 2498.59 samples/sec Loss 1.4125 LearningRate 0.000088 Epoch: 29 Global Step: 608160 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:39:37,119-Speed 2514.08 samples/sec Loss 1.4053 LearningRate 0.000088 Epoch: 29 Global Step: 608170 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:39:45,318-Speed 2498.31 samples/sec Loss 1.4032 LearningRate 0.000088 Epoch: 29 Global Step: 608180 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:39:53,524-Speed 2496.18 samples/sec Loss 1.4416 LearningRate 0.000088 Epoch: 29 Global Step: 608190 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:40:01,722-Speed 2498.37 samples/sec Loss 1.3849 LearningRate 0.000088 Epoch: 29 Global Step: 608200 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:40:09,921-Speed 2498.73 samples/sec Loss 1.4310 LearningRate 0.000088 Epoch: 29 Global Step: 608210 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:40:18,121-Speed 2497.92 samples/sec Loss 1.3976 LearningRate 0.000088 Epoch: 29 Global Step: 608220 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:40:26,270-Speed 2513.60 samples/sec Loss 1.4016 LearningRate 0.000088 Epoch: 29 Global Step: 608230 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:40:34,469-Speed 2498.17 samples/sec Loss 1.4024 LearningRate 0.000088 Epoch: 29 Global Step: 608240 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:40:42,670-Speed 2498.11 samples/sec Loss 1.4390 LearningRate 0.000088 Epoch: 29 Global Step: 608250 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:40:50,890-Speed 2491.57 samples/sec Loss 1.4338 LearningRate 0.000088 Epoch: 29 Global Step: 608260 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:40:59,090-Speed 2498.11 samples/sec Loss 1.3798 LearningRate 0.000088 Epoch: 29 Global Step: 608270 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:41:07,291-Speed 2497.71 samples/sec Loss 1.3709 LearningRate 0.000088 Epoch: 29 Global Step: 608280 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:41:15,435-Speed 2515.29 samples/sec Loss 1.3674 LearningRate 0.000088 Epoch: 29 Global Step: 608290 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:41:23,633-Speed 2498.60 samples/sec Loss 1.3863 LearningRate 0.000088 Epoch: 29 Global Step: 608300 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:41:31,835-Speed 2497.33 samples/sec Loss 1.3889 LearningRate 0.000088 Epoch: 29 Global Step: 608310 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:41:40,040-Speed 2496.52 samples/sec Loss 1.3867 LearningRate 0.000088 Epoch: 29 Global Step: 608320 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:41:48,241-Speed 2497.67 samples/sec Loss 1.4025 LearningRate 0.000088 Epoch: 29 Global Step: 608330 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:41:56,439-Speed 2498.60 samples/sec Loss 1.3559 LearningRate 0.000088 Epoch: 29 Global Step: 608340 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:42:04,585-Speed 2514.52 samples/sec Loss 1.3686 LearningRate 0.000088 Epoch: 29 Global Step: 608350 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:42:12,786-Speed 2497.41 samples/sec Loss 1.3684 LearningRate 0.000088 Epoch: 29 Global Step: 608360 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:42:20,989-Speed 2497.56 samples/sec Loss 1.3784 LearningRate 0.000088 Epoch: 29 Global Step: 608370 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:42:29,191-Speed 2497.54 samples/sec Loss 1.3452 LearningRate 0.000088 Epoch: 29 Global Step: 608380 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:42:37,387-Speed 2499.02 samples/sec Loss 1.3756 LearningRate 0.000088 Epoch: 29 Global Step: 608390 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:42:45,589-Speed 2497.62 samples/sec Loss 1.4146 LearningRate 0.000088 Epoch: 29 Global Step: 608400 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:42:53,738-Speed 2513.65 samples/sec Loss 1.3779 LearningRate 0.000088 Epoch: 29 Global Step: 608410 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:43:01,948-Speed 2494.77 samples/sec Loss 1.3991 LearningRate 0.000088 Epoch: 29 Global Step: 608420 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:43:10,146-Speed 2498.73 samples/sec Loss 1.3893 LearningRate 0.000088 Epoch: 29 Global Step: 608430 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:43:18,350-Speed 2496.91 samples/sec Loss 1.3807 LearningRate 0.000088 Epoch: 29 Global Step: 608440 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:43:26,560-Speed 2494.81 samples/sec Loss 1.4182 LearningRate 0.000088 Epoch: 29 Global Step: 608450 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:43:34,761-Speed 2497.55 samples/sec Loss 1.4267 LearningRate 0.000088 Epoch: 29 Global Step: 608460 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:43:42,906-Speed 2514.93 samples/sec Loss 1.3975 LearningRate 0.000088 Epoch: 29 Global Step: 608470 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:43:51,103-Speed 2498.89 samples/sec Loss 1.3339 LearningRate 0.000088 Epoch: 29 Global Step: 608480 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:43:59,312-Speed 2495.01 samples/sec Loss 1.3883 LearningRate 0.000088 Epoch: 29 Global Step: 608490 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:44:07,523-Speed 2494.40 samples/sec Loss 1.3989 LearningRate 0.000088 Epoch: 29 Global Step: 608500 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:44:15,722-Speed 2498.38 samples/sec Loss 1.3728 LearningRate 0.000088 Epoch: 29 Global Step: 608510 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:44:23,921-Speed 2498.45 samples/sec Loss 1.3772 LearningRate 0.000088 Epoch: 29 Global Step: 608520 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:44:32,069-Speed 2513.75 samples/sec Loss 1.4061 LearningRate 0.000088 Epoch: 29 Global Step: 608530 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:44:40,269-Speed 2498.02 samples/sec Loss 1.4274 LearningRate 0.000088 Epoch: 29 Global Step: 608540 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:44:48,468-Speed 2498.31 samples/sec Loss 1.4036 LearningRate 0.000088 Epoch: 29 Global Step: 608550 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:44:56,667-Speed 2498.16 samples/sec Loss 1.4279 LearningRate 0.000088 Epoch: 29 Global Step: 608560 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:45:04,870-Speed 2497.02 samples/sec Loss 1.4031 LearningRate 0.000088 Epoch: 29 Global Step: 608570 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:45:13,072-Speed 2497.79 samples/sec Loss 1.3907 LearningRate 0.000088 Epoch: 29 Global Step: 608580 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:45:21,230-Speed 2511.41 samples/sec Loss 1.4454 LearningRate 0.000088 Epoch: 29 Global Step: 608590 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:45:29,426-Speed 2498.92 samples/sec Loss 1.4327 LearningRate 0.000088 Epoch: 29 Global Step: 608600 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:45:37,628-Speed 2497.24 samples/sec Loss 1.4093 LearningRate 0.000088 Epoch: 29 Global Step: 608610 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:45:45,828-Speed 2498.02 samples/sec Loss 1.3916 LearningRate 0.000088 Epoch: 29 Global Step: 608620 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:45:54,030-Speed 2497.26 samples/sec Loss 1.4433 LearningRate 0.000088 Epoch: 29 Global Step: 608630 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:46:02,233-Speed 2497.26 samples/sec Loss 1.4026 LearningRate 0.000088 Epoch: 29 Global Step: 608640 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:46:10,387-Speed 2511.97 samples/sec Loss 1.3685 LearningRate 0.000088 Epoch: 29 Global Step: 608650 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:46:18,598-Speed 2494.58 samples/sec Loss 1.3586 LearningRate 0.000088 Epoch: 29 Global Step: 608660 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:46:26,797-Speed 2498.31 samples/sec Loss 1.3909 LearningRate 0.000088 Epoch: 29 Global Step: 608670 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:46:34,995-Speed 2498.72 samples/sec Loss 1.3898 LearningRate 0.000088 Epoch: 29 Global Step: 608680 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:46:43,197-Speed 2497.38 samples/sec Loss 1.3904 LearningRate 0.000088 Epoch: 29 Global Step: 608690 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:46:51,397-Speed 2498.01 samples/sec Loss 1.4122 LearningRate 0.000088 Epoch: 29 Global Step: 608700 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:46:59,547-Speed 2513.36 samples/sec Loss 1.3942 LearningRate 0.000088 Epoch: 29 Global Step: 608710 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:47:07,743-Speed 2499.24 samples/sec Loss 1.3514 LearningRate 0.000087 Epoch: 29 Global Step: 608720 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:47:15,942-Speed 2498.27 samples/sec Loss 1.3851 LearningRate 0.000087 Epoch: 29 Global Step: 608730 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:47:24,143-Speed 2497.46 samples/sec Loss 1.4012 LearningRate 0.000087 Epoch: 29 Global Step: 608740 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:47:32,343-Speed 2498.17 samples/sec Loss 1.3821 LearningRate 0.000087 Epoch: 29 Global Step: 608750 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-07-11 09:47:40,503-Speed 2510.15 samples/sec Loss 1.4094 LearningRate 0.000087 Epoch: 29 Global Step: 608760 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:47:48,649-Speed 2514.48 samples/sec Loss 1.3905 LearningRate 0.000087 Epoch: 29 Global Step: 608770 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:47:56,852-Speed 2497.06 samples/sec Loss 1.3868 LearningRate 0.000087 Epoch: 29 Global Step: 608780 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:48:05,052-Speed 2498.13 samples/sec Loss 1.4241 LearningRate 0.000087 Epoch: 29 Global Step: 608790 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:48:13,250-Speed 2498.21 samples/sec Loss 1.4195 LearningRate 0.000087 Epoch: 29 Global Step: 608800 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:48:21,450-Speed 2498.06 samples/sec Loss 1.3779 LearningRate 0.000087 Epoch: 29 Global Step: 608810 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:48:29,649-Speed 2498.13 samples/sec Loss 1.4088 LearningRate 0.000087 Epoch: 29 Global Step: 608820 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:48:37,796-Speed 2514.46 samples/sec Loss 1.3874 LearningRate 0.000087 Epoch: 29 Global Step: 608830 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:48:45,995-Speed 2498.07 samples/sec Loss 1.4002 LearningRate 0.000087 Epoch: 29 Global Step: 608840 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:48:54,202-Speed 2495.67 samples/sec Loss 1.3672 LearningRate 0.000087 Epoch: 29 Global Step: 608850 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:49:02,401-Speed 2498.47 samples/sec Loss 1.4148 LearningRate 0.000087 Epoch: 29 Global Step: 608860 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:49:10,598-Speed 2498.95 samples/sec Loss 1.4413 LearningRate 0.000087 Epoch: 29 Global Step: 608870 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:49:18,795-Speed 2498.68 samples/sec Loss 1.4160 LearningRate 0.000087 Epoch: 29 Global Step: 608880 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:49:26,940-Speed 2514.96 samples/sec Loss 1.4016 LearningRate 0.000087 Epoch: 29 Global Step: 608890 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:49:35,174-Speed 2500.44 samples/sec Loss 1.3672 LearningRate 0.000087 Epoch: 29 Global Step: 608900 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:49:43,393-Speed 2491.95 samples/sec Loss 1.3883 LearningRate 0.000087 Epoch: 29 Global Step: 608910 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:49:51,603-Speed 2499.61 samples/sec Loss 1.4458 LearningRate 0.000087 Epoch: 29 Global Step: 608920 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:49:59,923-Speed 2475.35 samples/sec Loss 1.3579 LearningRate 0.000087 Epoch: 29 Global Step: 608930 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:50:08,157-Speed 2498.81 samples/sec Loss 1.4166 LearningRate 0.000087 Epoch: 29 Global Step: 608940 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:50:16,307-Speed 2513.24 samples/sec Loss 1.4091 LearningRate 0.000087 Epoch: 29 Global Step: 608950 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-07-11 09:50:24,505-Speed 2498.33 samples/sec Loss 1.3611 LearningRate 0.000087 Epoch: 29 Global Step: 608960 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:50:32,733-Speed 2499.25 samples/sec Loss 1.4151 LearningRate 0.000087 Epoch: 29 Global Step: 608970 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:50:42,721-Speed 2155.46 samples/sec Loss 1.3964 LearningRate 0.000087 Epoch: 29 Global Step: 608980 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:50:50,934-Speed 2493.77 samples/sec Loss 1.3753 LearningRate 0.000087 Epoch: 29 Global Step: 608990 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:51:04,770-Speed 1482.61 samples/sec Loss 1.3898 LearningRate 0.000087 Epoch: 29 Global Step: 609000 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:51:12,944-Speed 2516.73 samples/sec Loss 1.3654 LearningRate 0.000087 Epoch: 29 Global Step: 609010 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:51:21,140-Speed 2499.26 samples/sec Loss 1.3936 LearningRate 0.000087 Epoch: 29 Global Step: 609020 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:51:30,218-Speed 2256.12 samples/sec Loss 1.4266 LearningRate 0.000087 Epoch: 29 Global Step: 609030 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:51:38,419-Speed 2501.08 samples/sec Loss 1.3982 LearningRate 0.000087 Epoch: 29 Global Step: 609040 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:51:46,820-Speed 2456.18 samples/sec Loss 1.4106 LearningRate 0.000087 Epoch: 29 Global Step: 609050 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:51:59,229-Speed 1650.55 samples/sec Loss 1.3698 LearningRate 0.000087 Epoch: 29 Global Step: 609060 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:52:07,386-Speed 2515.79 samples/sec Loss 1.3588 LearningRate 0.000087 Epoch: 29 Global Step: 609070 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:52:15,650-Speed 2499.35 samples/sec Loss 1.3972 LearningRate 0.000087 Epoch: 29 Global Step: 609080 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:52:23,872-Speed 2491.12 samples/sec Loss 1.3497 LearningRate 0.000087 Epoch: 29 Global Step: 609090 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:52:32,086-Speed 2499.21 samples/sec Loss 1.4024 LearningRate 0.000087 Epoch: 29 Global Step: 609100 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:52:40,321-Speed 2496.56 samples/sec Loss 1.4057 LearningRate 0.000087 Epoch: 29 Global Step: 609110 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:52:48,524-Speed 2496.90 samples/sec Loss 1.3715 LearningRate 0.000087 Epoch: 29 Global Step: 609120 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:52:58,667-Speed 2074.02 samples/sec Loss 1.3870 LearningRate 0.000087 Epoch: 29 Global Step: 609130 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:53:07,095-Speed 2430.40 samples/sec Loss 1.3720 LearningRate 0.000087 Epoch: 29 Global Step: 609140 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:53:15,758-Speed 2496.95 samples/sec Loss 1.3665 LearningRate 0.000087 Epoch: 29 Global Step: 609150 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:53:23,963-Speed 2496.34 samples/sec Loss 1.3890 LearningRate 0.000087 Epoch: 29 Global Step: 609160 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:53:32,166-Speed 2497.13 samples/sec Loss 1.4093 LearningRate 0.000087 Epoch: 29 Global Step: 609170 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:53:40,366-Speed 2498.15 samples/sec Loss 1.4369 LearningRate 0.000087 Epoch: 29 Global Step: 609180 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:53:48,516-Speed 2513.08 samples/sec Loss 1.4024 LearningRate 0.000087 Epoch: 29 Global Step: 609190 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:53:56,716-Speed 2498.24 samples/sec Loss 1.3680 LearningRate 0.000087 Epoch: 29 Global Step: 609200 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:54:04,918-Speed 2497.09 samples/sec Loss 1.3880 LearningRate 0.000087 Epoch: 29 Global Step: 609210 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:54:13,115-Speed 2499.03 samples/sec Loss 1.4072 LearningRate 0.000087 Epoch: 29 Global Step: 609220 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:54:21,312-Speed 2498.82 samples/sec Loss 1.3938 LearningRate 0.000087 Epoch: 29 Global Step: 609230 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:54:29,512-Speed 2498.07 samples/sec Loss 1.3639 LearningRate 0.000087 Epoch: 29 Global Step: 609240 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:54:37,661-Speed 2513.36 samples/sec Loss 1.3612 LearningRate 0.000087 Epoch: 29 Global Step: 609250 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:54:45,865-Speed 2496.67 samples/sec Loss 1.3818 LearningRate 0.000087 Epoch: 29 Global Step: 609260 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:54:54,072-Speed 2495.97 samples/sec Loss 1.3884 LearningRate 0.000087 Epoch: 29 Global Step: 609270 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:55:02,273-Speed 2497.69 samples/sec Loss 1.3813 LearningRate 0.000087 Epoch: 29 Global Step: 609280 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:55:10,482-Speed 2495.50 samples/sec Loss 1.3759 LearningRate 0.000087 Epoch: 29 Global Step: 609290 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:55:18,697-Speed 2493.31 samples/sec Loss 1.4057 LearningRate 0.000087 Epoch: 29 Global Step: 609300 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:55:26,848-Speed 2512.71 samples/sec Loss 1.3861 LearningRate 0.000087 Epoch: 29 Global Step: 609310 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:55:35,050-Speed 2497.81 samples/sec Loss 1.3907 LearningRate 0.000087 Epoch: 29 Global Step: 609320 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:55:43,247-Speed 2499.21 samples/sec Loss 1.3921 LearningRate 0.000087 Epoch: 29 Global Step: 609330 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:55:51,449-Speed 2497.40 samples/sec Loss 1.3793 LearningRate 0.000087 Epoch: 29 Global Step: 609340 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:55:59,648-Speed 2499.09 samples/sec Loss 1.3669 LearningRate 0.000087 Epoch: 29 Global Step: 609350 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:56:07,845-Speed 2498.56 samples/sec Loss 1.3614 LearningRate 0.000087 Epoch: 29 Global Step: 609360 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:56:15,990-Speed 2514.90 samples/sec Loss 1.3684 LearningRate 0.000087 Epoch: 29 Global Step: 609370 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:56:24,188-Speed 2498.58 samples/sec Loss 1.4188 LearningRate 0.000087 Epoch: 29 Global Step: 609380 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:56:32,387-Speed 2498.28 samples/sec Loss 1.3744 LearningRate 0.000087 Epoch: 29 Global Step: 609390 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:56:40,590-Speed 2497.11 samples/sec Loss 1.4006 LearningRate 0.000087 Epoch: 29 Global Step: 609400 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:56:48,793-Speed 2497.01 samples/sec Loss 1.3938 LearningRate 0.000087 Epoch: 29 Global Step: 609410 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:56:56,994-Speed 2497.65 samples/sec Loss 1.3649 LearningRate 0.000087 Epoch: 29 Global Step: 609420 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:57:05,141-Speed 2514.29 samples/sec Loss 1.3792 LearningRate 0.000087 Epoch: 29 Global Step: 609430 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:57:13,341-Speed 2497.78 samples/sec Loss 1.3624 LearningRate 0.000087 Epoch: 29 Global Step: 609440 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:57:21,543-Speed 2497.51 samples/sec Loss 1.3680 LearningRate 0.000087 Epoch: 29 Global Step: 609450 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:57:29,743-Speed 2497.74 samples/sec Loss 1.4193 LearningRate 0.000087 Epoch: 29 Global Step: 609460 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:57:37,945-Speed 2497.39 samples/sec Loss 1.3815 LearningRate 0.000087 Epoch: 29 Global Step: 609470 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:57:46,148-Speed 2497.21 samples/sec Loss 1.4259 LearningRate 0.000087 Epoch: 29 Global Step: 609480 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:57:54,295-Speed 2514.12 samples/sec Loss 1.3798 LearningRate 0.000087 Epoch: 29 Global Step: 609490 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:58:02,498-Speed 2496.77 samples/sec Loss 1.3798 LearningRate 0.000087 Epoch: 29 Global Step: 609500 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:58:10,703-Speed 2496.18 samples/sec Loss 1.3965 LearningRate 0.000087 Epoch: 29 Global Step: 609510 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:58:18,915-Speed 2494.61 samples/sec Loss 1.3884 LearningRate 0.000087 Epoch: 29 Global Step: 609520 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:58:27,114-Speed 2498.46 samples/sec Loss 1.4221 LearningRate 0.000087 Epoch: 29 Global Step: 609530 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:58:35,319-Speed 2496.34 samples/sec Loss 1.3918 LearningRate 0.000087 Epoch: 29 Global Step: 609540 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:58:43,489-Speed 2507.22 samples/sec Loss 1.3950 LearningRate 0.000087 Epoch: 29 Global Step: 609550 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:58:51,687-Speed 2498.38 samples/sec Loss 1.4017 LearningRate 0.000087 Epoch: 29 Global Step: 609560 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:58:59,893-Speed 2496.40 samples/sec Loss 1.4155 LearningRate 0.000087 Epoch: 29 Global Step: 609570 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:59:08,105-Speed 2494.07 samples/sec Loss 1.4125 LearningRate 0.000087 Epoch: 29 Global Step: 609580 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:59:16,307-Speed 2497.56 samples/sec Loss 1.3996 LearningRate 0.000087 Epoch: 29 Global Step: 609590 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:59:24,516-Speed 2495.28 samples/sec Loss 1.4116 LearningRate 0.000087 Epoch: 29 Global Step: 609600 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:59:32,677-Speed 2510.04 samples/sec Loss 1.3791 LearningRate 0.000087 Epoch: 29 Global Step: 609610 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:59:40,876-Speed 2498.16 samples/sec Loss 1.3680 LearningRate 0.000087 Epoch: 29 Global Step: 609620 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:59:49,074-Speed 2498.60 samples/sec Loss 1.3983 LearningRate 0.000087 Epoch: 29 Global Step: 609630 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 09:59:57,277-Speed 2497.41 samples/sec Loss 1.3728 LearningRate 0.000087 Epoch: 29 Global Step: 609640 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:00:05,477-Speed 2497.73 samples/sec Loss 1.3988 LearningRate 0.000087 Epoch: 29 Global Step: 609650 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:00:13,679-Speed 2497.25 samples/sec Loss 1.4193 LearningRate 0.000087 Epoch: 29 Global Step: 609660 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:00:21,829-Speed 2513.29 samples/sec Loss 1.3706 LearningRate 0.000087 Epoch: 29 Global Step: 609670 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:00:30,028-Speed 2498.27 samples/sec Loss 1.3808 LearningRate 0.000087 Epoch: 29 Global Step: 609680 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:00:38,228-Speed 2497.82 samples/sec Loss 1.3704 LearningRate 0.000087 Epoch: 29 Global Step: 609690 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:00:46,432-Speed 2496.74 samples/sec Loss 1.4030 LearningRate 0.000087 Epoch: 29 Global Step: 609700 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:00:54,636-Speed 2496.92 samples/sec Loss 1.3730 LearningRate 0.000087 Epoch: 29 Global Step: 609710 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:01:02,838-Speed 2497.28 samples/sec Loss 1.3716 LearningRate 0.000087 Epoch: 29 Global Step: 609720 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:01:10,986-Speed 2513.67 samples/sec Loss 1.3736 LearningRate 0.000087 Epoch: 29 Global Step: 609730 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:01:19,187-Speed 2497.80 samples/sec Loss 1.4204 LearningRate 0.000087 Epoch: 29 Global Step: 609740 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:01:27,388-Speed 2497.96 samples/sec Loss 1.4125 LearningRate 0.000087 Epoch: 29 Global Step: 609750 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:01:35,586-Speed 2498.29 samples/sec Loss 1.3903 LearningRate 0.000087 Epoch: 29 Global Step: 609760 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:01:43,783-Speed 2498.66 samples/sec Loss 1.3798 LearningRate 0.000087 Epoch: 29 Global Step: 609770 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:01:51,985-Speed 2497.87 samples/sec Loss 1.3884 LearningRate 0.000087 Epoch: 29 Global Step: 609780 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:02:00,130-Speed 2514.89 samples/sec Loss 1.3746 LearningRate 0.000087 Epoch: 29 Global Step: 609790 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:02:08,329-Speed 2498.05 samples/sec Loss 1.3652 LearningRate 0.000087 Epoch: 29 Global Step: 609800 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:02:16,529-Speed 2498.13 samples/sec Loss 1.3891 LearningRate 0.000087 Epoch: 29 Global Step: 609810 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:02:24,733-Speed 2496.71 samples/sec Loss 1.3751 LearningRate 0.000087 Epoch: 29 Global Step: 609820 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:02:32,940-Speed 2495.80 samples/sec Loss 1.3511 LearningRate 0.000087 Epoch: 29 Global Step: 609830 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:02:41,139-Speed 2498.24 samples/sec Loss 1.3904 LearningRate 0.000087 Epoch: 29 Global Step: 609840 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-07-11 10:02:49,288-Speed 2513.37 samples/sec Loss 1.4018 LearningRate 0.000087 Epoch: 29 Global Step: 609850 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:02:57,490-Speed 2497.76 samples/sec Loss 1.4107 LearningRate 0.000087 Epoch: 29 Global Step: 609860 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:03:05,694-Speed 2496.88 samples/sec Loss 1.3894 LearningRate 0.000087 Epoch: 29 Global Step: 609870 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:03:13,899-Speed 2496.41 samples/sec Loss 1.4107 LearningRate 0.000087 Epoch: 29 Global Step: 609880 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:03:22,112-Speed 2493.96 samples/sec Loss 1.3985 LearningRate 0.000087 Epoch: 29 Global Step: 609890 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:03:30,309-Speed 2498.97 samples/sec Loss 1.4071 LearningRate 0.000087 Epoch: 29 Global Step: 609900 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:03:38,457-Speed 2513.79 samples/sec Loss 1.3624 LearningRate 0.000087 Epoch: 29 Global Step: 609910 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:03:46,653-Speed 2499.13 samples/sec Loss 1.3790 LearningRate 0.000087 Epoch: 29 Global Step: 609920 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:03:54,866-Speed 2493.97 samples/sec Loss 1.3821 LearningRate 0.000087 Epoch: 29 Global Step: 609930 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:04:03,065-Speed 2498.28 samples/sec Loss 1.4128 LearningRate 0.000087 Epoch: 29 Global Step: 609940 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:04:11,270-Speed 2496.56 samples/sec Loss 1.3945 LearningRate 0.000087 Epoch: 29 Global Step: 609950 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:04:19,470-Speed 2497.78 samples/sec Loss 1.4230 LearningRate 0.000087 Epoch: 29 Global Step: 609960 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-07-11 10:04:27,621-Speed 2513.07 samples/sec Loss 1.3698 LearningRate 0.000087 Epoch: 29 Global Step: 609970 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:04:35,823-Speed 2497.43 samples/sec Loss 1.3857 LearningRate 0.000086 Epoch: 29 Global Step: 609980 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:04:44,025-Speed 2497.34 samples/sec Loss 1.3886 LearningRate 0.000086 Epoch: 29 Global Step: 609990 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:04:52,224-Speed 2498.09 samples/sec Loss 1.3894 LearningRate 0.000086 Epoch: 29 Global Step: 610000 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:05:00,424-Speed 2497.92 samples/sec Loss 1.4183 LearningRate 0.000086 Epoch: 29 Global Step: 610010 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:05:08,624-Speed 2497.99 samples/sec Loss 1.3807 LearningRate 0.000086 Epoch: 29 Global Step: 610020 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:05:16,774-Speed 2513.33 samples/sec Loss 1.3590 LearningRate 0.000086 Epoch: 29 Global Step: 610030 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:05:24,978-Speed 2496.76 samples/sec Loss 1.4125 LearningRate 0.000086 Epoch: 29 Global Step: 610040 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:05:33,177-Speed 2498.17 samples/sec Loss 1.4372 LearningRate 0.000086 Epoch: 29 Global Step: 610050 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:05:41,382-Speed 2496.41 samples/sec Loss 1.3602 LearningRate 0.000086 Epoch: 29 Global Step: 610060 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:05:49,584-Speed 2497.35 samples/sec Loss 1.4123 LearningRate 0.000086 Epoch: 29 Global Step: 610070 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:05:57,786-Speed 2497.14 samples/sec Loss 1.3528 LearningRate 0.000086 Epoch: 29 Global Step: 610080 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-07-11 10:06:05,938-Speed 2512.67 samples/sec Loss 1.4106 LearningRate 0.000086 Epoch: 29 Global Step: 610090 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:06:14,101-Speed 2509.53 samples/sec Loss 1.3683 LearningRate 0.000086 Epoch: 29 Global Step: 610100 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:06:22,304-Speed 2496.91 samples/sec Loss 1.3913 LearningRate 0.000086 Epoch: 29 Global Step: 610110 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:06:30,508-Speed 2496.71 samples/sec Loss 1.3696 LearningRate 0.000086 Epoch: 29 Global Step: 610120 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:06:38,709-Speed 2497.69 samples/sec Loss 1.3752 LearningRate 0.000086 Epoch: 29 Global Step: 610130 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:06:46,919-Speed 2495.04 samples/sec Loss 1.3657 LearningRate 0.000086 Epoch: 29 Global Step: 610140 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:06:55,063-Speed 2514.84 samples/sec Loss 1.3636 LearningRate 0.000086 Epoch: 29 Global Step: 610150 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:07:03,266-Speed 2497.28 samples/sec Loss 1.4174 LearningRate 0.000086 Epoch: 29 Global Step: 610160 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:07:11,467-Speed 2497.64 samples/sec Loss 1.3917 LearningRate 0.000086 Epoch: 29 Global Step: 610170 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:07:19,670-Speed 2497.10 samples/sec Loss 1.4034 LearningRate 0.000086 Epoch: 29 Global Step: 610180 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:07:27,867-Speed 2498.81 samples/sec Loss 1.4056 LearningRate 0.000086 Epoch: 29 Global Step: 610190 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:07:36,067-Speed 2497.84 samples/sec Loss 1.3817 LearningRate 0.000086 Epoch: 29 Global Step: 610200 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:07:44,218-Speed 2513.13 samples/sec Loss 1.4078 LearningRate 0.000086 Epoch: 29 Global Step: 610210 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:07:52,419-Speed 2497.61 samples/sec Loss 1.3998 LearningRate 0.000086 Epoch: 29 Global Step: 610220 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:08:00,619-Speed 2498.02 samples/sec Loss 1.3503 LearningRate 0.000086 Epoch: 29 Global Step: 610230 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:08:08,821-Speed 2497.11 samples/sec Loss 1.3821 LearningRate 0.000086 Epoch: 29 Global Step: 610240 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:08:17,022-Speed 2497.53 samples/sec Loss 1.3732 LearningRate 0.000086 Epoch: 29 Global Step: 610250 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:08:25,225-Speed 2497.15 samples/sec Loss 1.3876 LearningRate 0.000086 Epoch: 29 Global Step: 610260 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:08:33,371-Speed 2514.68 samples/sec Loss 1.3588 LearningRate 0.000086 Epoch: 29 Global Step: 610270 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:08:41,571-Speed 2497.89 samples/sec Loss 1.3969 LearningRate 0.000086 Epoch: 29 Global Step: 610280 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:08:49,770-Speed 2498.91 samples/sec Loss 1.3737 LearningRate 0.000086 Epoch: 29 Global Step: 610290 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:08:57,969-Speed 2498.11 samples/sec Loss 1.4194 LearningRate 0.000086 Epoch: 29 Global Step: 610300 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:09:06,171-Speed 2497.55 samples/sec Loss 1.3846 LearningRate 0.000086 Epoch: 29 Global Step: 610310 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:09:14,382-Speed 2494.36 samples/sec Loss 1.3820 LearningRate 0.000086 Epoch: 29 Global Step: 610320 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:09:22,535-Speed 2512.49 samples/sec Loss 1.3875 LearningRate 0.000086 Epoch: 29 Global Step: 610330 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:09:30,737-Speed 2497.69 samples/sec Loss 1.3815 LearningRate 0.000086 Epoch: 29 Global Step: 610340 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:09:38,942-Speed 2496.28 samples/sec Loss 1.3681 LearningRate 0.000086 Epoch: 29 Global Step: 610350 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:09:47,142-Speed 2498.11 samples/sec Loss 1.3782 LearningRate 0.000086 Epoch: 29 Global Step: 610360 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:09:55,357-Speed 2493.06 samples/sec Loss 1.3475 LearningRate 0.000086 Epoch: 29 Global Step: 610370 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:10:03,558-Speed 2497.69 samples/sec Loss 1.3870 LearningRate 0.000086 Epoch: 29 Global Step: 610380 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:10:11,707-Speed 2513.52 samples/sec Loss 1.3933 LearningRate 0.000086 Epoch: 29 Global Step: 610390 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:10:19,905-Speed 2498.50 samples/sec Loss 1.3768 LearningRate 0.000086 Epoch: 29 Global Step: 610400 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:10:28,116-Speed 2494.90 samples/sec Loss 1.3863 LearningRate 0.000086 Epoch: 29 Global Step: 610410 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:10:36,322-Speed 2496.01 samples/sec Loss 1.3876 LearningRate 0.000086 Epoch: 29 Global Step: 610420 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:10:44,544-Speed 2491.18 samples/sec Loss 1.3975 LearningRate 0.000086 Epoch: 29 Global Step: 610430 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:10:52,745-Speed 2497.69 samples/sec Loss 1.3896 LearningRate 0.000086 Epoch: 29 Global Step: 610440 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:11:00,893-Speed 2513.98 samples/sec Loss 1.4192 LearningRate 0.000086 Epoch: 29 Global Step: 610450 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:11:09,092-Speed 2498.22 samples/sec Loss 1.3850 LearningRate 0.000086 Epoch: 29 Global Step: 610460 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:11:17,295-Speed 2497.00 samples/sec Loss 1.3983 LearningRate 0.000086 Epoch: 29 Global Step: 610470 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:11:25,496-Speed 2497.55 samples/sec Loss 1.4017 LearningRate 0.000086 Epoch: 29 Global Step: 610480 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:11:33,700-Speed 2496.81 samples/sec Loss 1.3818 LearningRate 0.000086 Epoch: 29 Global Step: 610490 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:11:41,902-Speed 2497.29 samples/sec Loss 1.3862 LearningRate 0.000086 Epoch: 29 Global Step: 610500 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:11:50,054-Speed 2512.72 samples/sec Loss 1.3717 LearningRate 0.000086 Epoch: 29 Global Step: 610510 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:11:58,253-Speed 2498.59 samples/sec Loss 1.3813 LearningRate 0.000086 Epoch: 29 Global Step: 610520 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:12:06,459-Speed 2496.05 samples/sec Loss 1.4013 LearningRate 0.000086 Epoch: 29 Global Step: 610530 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:12:14,670-Speed 2494.46 samples/sec Loss 1.3633 LearningRate 0.000086 Epoch: 29 Global Step: 610540 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:12:22,872-Speed 2497.40 samples/sec Loss 1.3429 LearningRate 0.000086 Epoch: 29 Global Step: 610550 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:12:31,072-Speed 2498.06 samples/sec Loss 1.3629 LearningRate 0.000086 Epoch: 29 Global Step: 610560 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:12:39,217-Speed 2514.63 samples/sec Loss 1.3851 LearningRate 0.000086 Epoch: 29 Global Step: 610570 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:12:47,420-Speed 2496.95 samples/sec Loss 1.4312 LearningRate 0.000086 Epoch: 29 Global Step: 610580 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:12:55,625-Speed 2496.43 samples/sec Loss 1.4022 LearningRate 0.000086 Epoch: 29 Global Step: 610590 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:13:03,827-Speed 2497.52 samples/sec Loss 1.3939 LearningRate 0.000086 Epoch: 29 Global Step: 610600 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:13:12,026-Speed 2498.14 samples/sec Loss 1.3985 LearningRate 0.000086 Epoch: 29 Global Step: 610610 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:13:20,227-Speed 2497.60 samples/sec Loss 1.4131 LearningRate 0.000086 Epoch: 29 Global Step: 610620 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:13:28,379-Speed 2512.51 samples/sec Loss 1.4397 LearningRate 0.000086 Epoch: 29 Global Step: 610630 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:13:36,580-Speed 2498.07 samples/sec Loss 1.3885 LearningRate 0.000086 Epoch: 29 Global Step: 610640 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:13:44,782-Speed 2497.15 samples/sec Loss 1.4173 LearningRate 0.000086 Epoch: 29 Global Step: 610650 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:13:52,986-Speed 2496.83 samples/sec Loss 1.3813 LearningRate 0.000086 Epoch: 29 Global Step: 610660 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:14:01,190-Speed 2496.91 samples/sec Loss 1.4482 LearningRate 0.000086 Epoch: 29 Global Step: 610670 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:14:09,390-Speed 2497.96 samples/sec Loss 1.3679 LearningRate 0.000086 Epoch: 29 Global Step: 610680 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:14:17,538-Speed 2513.94 samples/sec Loss 1.3995 LearningRate 0.000086 Epoch: 29 Global Step: 610690 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:14:25,738-Speed 2498.14 samples/sec Loss 1.3988 LearningRate 0.000086 Epoch: 29 Global Step: 610700 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:14:33,938-Speed 2497.83 samples/sec Loss 1.3861 LearningRate 0.000086 Epoch: 29 Global Step: 610710 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:14:42,148-Speed 2494.80 samples/sec Loss 1.3718 LearningRate 0.000086 Epoch: 29 Global Step: 610720 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:14:50,361-Speed 2494.25 samples/sec Loss 1.3938 LearningRate 0.000086 Epoch: 29 Global Step: 610730 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:14:58,562-Speed 2497.51 samples/sec Loss 1.4049 LearningRate 0.000086 Epoch: 29 Global Step: 610740 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:15:06,722-Speed 2510.17 samples/sec Loss 1.4464 LearningRate 0.000086 Epoch: 29 Global Step: 610750 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:15:14,923-Speed 2497.86 samples/sec Loss 1.4383 LearningRate 0.000086 Epoch: 29 Global Step: 610760 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:15:23,121-Speed 2498.40 samples/sec Loss 1.3736 LearningRate 0.000086 Epoch: 29 Global Step: 610770 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:15:31,325-Speed 2496.79 samples/sec Loss 1.3669 LearningRate 0.000086 Epoch: 29 Global Step: 610780 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:15:39,529-Speed 2497.00 samples/sec Loss 1.3716 LearningRate 0.000086 Epoch: 29 Global Step: 610790 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:15:47,735-Speed 2496.26 samples/sec Loss 1.3947 LearningRate 0.000086 Epoch: 29 Global Step: 610800 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:15:55,883-Speed 2513.69 samples/sec Loss 1.3711 LearningRate 0.000086 Epoch: 29 Global Step: 610810 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:16:04,099-Speed 2493.08 samples/sec Loss 1.3733 LearningRate 0.000086 Epoch: 29 Global Step: 610820 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:16:12,319-Speed 2491.94 samples/sec Loss 1.3799 LearningRate 0.000086 Epoch: 29 Global Step: 610830 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:16:20,518-Speed 2498.27 samples/sec Loss 1.3886 LearningRate 0.000086 Epoch: 29 Global Step: 610840 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:16:28,718-Speed 2497.96 samples/sec Loss 1.3593 LearningRate 0.000086 Epoch: 29 Global Step: 610850 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:16:36,921-Speed 2497.15 samples/sec Loss 1.3742 LearningRate 0.000086 Epoch: 29 Global Step: 610860 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:16:45,068-Speed 2514.16 samples/sec Loss 1.4307 LearningRate 0.000086 Epoch: 29 Global Step: 610870 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:16:53,272-Speed 2496.61 samples/sec Loss 1.4062 LearningRate 0.000086 Epoch: 29 Global Step: 610880 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:17:01,473-Speed 2497.61 samples/sec Loss 1.3957 LearningRate 0.000086 Epoch: 29 Global Step: 610890 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:17:09,676-Speed 2497.17 samples/sec Loss 1.3870 LearningRate 0.000086 Epoch: 29 Global Step: 610900 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:17:17,877-Speed 2497.51 samples/sec Loss 1.4308 LearningRate 0.000086 Epoch: 29 Global Step: 610910 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:17:26,078-Speed 2497.87 samples/sec Loss 1.4034 LearningRate 0.000086 Epoch: 29 Global Step: 610920 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:17:34,227-Speed 2513.39 samples/sec Loss 1.3823 LearningRate 0.000086 Epoch: 29 Global Step: 610930 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:17:42,431-Speed 2496.96 samples/sec Loss 1.4122 LearningRate 0.000086 Epoch: 29 Global Step: 610940 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:17:50,646-Speed 2493.23 samples/sec Loss 1.4020 LearningRate 0.000086 Epoch: 29 Global Step: 610950 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:17:58,847-Speed 2497.72 samples/sec Loss 1.3568 LearningRate 0.000086 Epoch: 29 Global Step: 610960 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:18:07,045-Speed 2498.40 samples/sec Loss 1.3682 LearningRate 0.000086 Epoch: 29 Global Step: 610970 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:18:15,246-Speed 2497.90 samples/sec Loss 1.3740 LearningRate 0.000086 Epoch: 29 Global Step: 610980 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:18:23,395-Speed 2513.37 samples/sec Loss 1.3925 LearningRate 0.000086 Epoch: 29 Global Step: 610990 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:18:31,607-Speed 2494.35 samples/sec Loss 1.3916 LearningRate 0.000086 Epoch: 29 Global Step: 611000 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:18:39,810-Speed 2497.15 samples/sec Loss 1.3953 LearningRate 0.000086 Epoch: 29 Global Step: 611010 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:18:48,011-Speed 2497.74 samples/sec Loss 1.3956 LearningRate 0.000086 Epoch: 29 Global Step: 611020 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:18:56,217-Speed 2496.29 samples/sec Loss 1.3857 LearningRate 0.000086 Epoch: 29 Global Step: 611030 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:19:04,430-Speed 2493.81 samples/sec Loss 1.4180 LearningRate 0.000086 Epoch: 29 Global Step: 611040 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:19:12,577-Speed 2514.20 samples/sec Loss 1.3909 LearningRate 0.000086 Epoch: 29 Global Step: 611050 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:19:20,784-Speed 2495.84 samples/sec Loss 1.3756 LearningRate 0.000086 Epoch: 29 Global Step: 611060 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:19:28,986-Speed 2497.59 samples/sec Loss 1.3767 LearningRate 0.000086 Epoch: 29 Global Step: 611070 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:19:37,186-Speed 2497.70 samples/sec Loss 1.4002 LearningRate 0.000086 Epoch: 29 Global Step: 611080 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:19:45,390-Speed 2496.83 samples/sec Loss 1.3777 LearningRate 0.000086 Epoch: 29 Global Step: 611090 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:19:53,592-Speed 2497.50 samples/sec Loss 1.3804 LearningRate 0.000086 Epoch: 29 Global Step: 611100 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:20:01,746-Speed 2512.01 samples/sec Loss 1.3785 LearningRate 0.000086 Epoch: 29 Global Step: 611110 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:20:09,946-Speed 2497.87 samples/sec Loss 1.3932 LearningRate 0.000086 Epoch: 29 Global Step: 611120 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:20:18,148-Speed 2497.88 samples/sec Loss 1.3937 LearningRate 0.000086 Epoch: 29 Global Step: 611130 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:20:26,350-Speed 2497.30 samples/sec Loss 1.3934 LearningRate 0.000086 Epoch: 29 Global Step: 611140 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:20:34,555-Speed 2496.55 samples/sec Loss 1.4331 LearningRate 0.000086 Epoch: 29 Global Step: 611150 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:20:42,770-Speed 2493.32 samples/sec Loss 1.3565 LearningRate 0.000086 Epoch: 29 Global Step: 611160 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:20:50,918-Speed 2514.18 samples/sec Loss 1.4285 LearningRate 0.000086 Epoch: 29 Global Step: 611170 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:20:59,129-Speed 2494.85 samples/sec Loss 1.3959 LearningRate 0.000086 Epoch: 29 Global Step: 611180 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:21:07,335-Speed 2496.26 samples/sec Loss 1.3773 LearningRate 0.000086 Epoch: 29 Global Step: 611190 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:21:15,534-Speed 2498.05 samples/sec Loss 1.3959 LearningRate 0.000086 Epoch: 29 Global Step: 611200 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:21:23,732-Speed 2498.57 samples/sec Loss 1.4073 LearningRate 0.000086 Epoch: 29 Global Step: 611210 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:21:31,930-Speed 2498.25 samples/sec Loss 1.3828 LearningRate 0.000086 Epoch: 29 Global Step: 611220 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:21:40,076-Speed 2514.64 samples/sec Loss 1.3805 LearningRate 0.000086 Epoch: 29 Global Step: 611230 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:21:48,272-Speed 2499.30 samples/sec Loss 1.3772 LearningRate 0.000086 Epoch: 29 Global Step: 611240 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:21:56,468-Speed 2499.30 samples/sec Loss 1.3960 LearningRate 0.000086 Epoch: 29 Global Step: 611250 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:22:04,669-Speed 2497.50 samples/sec Loss 1.3989 LearningRate 0.000085 Epoch: 29 Global Step: 611260 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:22:12,874-Speed 2496.66 samples/sec Loss 1.3733 LearningRate 0.000085 Epoch: 29 Global Step: 611270 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:22:21,072-Speed 2498.37 samples/sec Loss 1.3989 LearningRate 0.000085 Epoch: 29 Global Step: 611280 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:22:29,218-Speed 2514.62 samples/sec Loss 1.3740 LearningRate 0.000085 Epoch: 29 Global Step: 611290 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:22:37,418-Speed 2497.85 samples/sec Loss 1.4131 LearningRate 0.000085 Epoch: 29 Global Step: 611300 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:22:45,632-Speed 2493.52 samples/sec Loss 1.4072 LearningRate 0.000085 Epoch: 29 Global Step: 611310 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:22:53,830-Speed 2498.61 samples/sec Loss 1.3490 LearningRate 0.000085 Epoch: 29 Global Step: 611320 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:23:02,029-Speed 2498.35 samples/sec Loss 1.3865 LearningRate 0.000085 Epoch: 29 Global Step: 611330 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:23:10,227-Speed 2498.31 samples/sec Loss 1.3570 LearningRate 0.000085 Epoch: 29 Global Step: 611340 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:23:18,374-Speed 2514.24 samples/sec Loss 1.4400 LearningRate 0.000085 Epoch: 29 Global Step: 611350 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:23:26,572-Speed 2498.50 samples/sec Loss 1.4008 LearningRate 0.000085 Epoch: 29 Global Step: 611360 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:23:34,781-Speed 2498.95 samples/sec Loss 1.3884 LearningRate 0.000085 Epoch: 29 Global Step: 611370 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:23:42,982-Speed 2497.56 samples/sec Loss 1.4131 LearningRate 0.000085 Epoch: 29 Global Step: 611380 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:23:51,180-Speed 2498.92 samples/sec Loss 1.3893 LearningRate 0.000085 Epoch: 29 Global Step: 611390 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:23:59,378-Speed 2498.47 samples/sec Loss 1.3888 LearningRate 0.000085 Epoch: 29 Global Step: 611400 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-07-11 10:24:07,529-Speed 2513.31 samples/sec Loss 1.3757 LearningRate 0.000085 Epoch: 29 Global Step: 611410 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:24:15,729-Speed 2497.87 samples/sec Loss 1.4241 LearningRate 0.000085 Epoch: 29 Global Step: 611420 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:24:23,929-Speed 2497.82 samples/sec Loss 1.3673 LearningRate 0.000085 Epoch: 29 Global Step: 611430 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:24:32,127-Speed 2498.48 samples/sec Loss 1.3733 LearningRate 0.000085 Epoch: 29 Global Step: 611440 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:24:40,329-Speed 2497.21 samples/sec Loss 1.3827 LearningRate 0.000085 Epoch: 29 Global Step: 611450 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:24:48,531-Speed 2497.39 samples/sec Loss 1.3536 LearningRate 0.000085 Epoch: 29 Global Step: 611460 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:24:56,677-Speed 2514.61 samples/sec Loss 1.3567 LearningRate 0.000085 Epoch: 29 Global Step: 611470 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:25:04,879-Speed 2497.35 samples/sec Loss 1.3789 LearningRate 0.000085 Epoch: 29 Global Step: 611480 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:25:13,078-Speed 2498.32 samples/sec Loss 1.3862 LearningRate 0.000085 Epoch: 29 Global Step: 611490 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:25:21,288-Speed 2495.00 samples/sec Loss 1.4397 LearningRate 0.000085 Epoch: 29 Global Step: 611500 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:25:29,487-Speed 2498.26 samples/sec Loss 1.4538 LearningRate 0.000085 Epoch: 29 Global Step: 611510 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:25:37,685-Speed 2498.49 samples/sec Loss 1.3912 LearningRate 0.000085 Epoch: 29 Global Step: 611520 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-07-11 10:25:45,843-Speed 2510.83 samples/sec Loss 1.3898 LearningRate 0.000085 Epoch: 29 Global Step: 611530 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:25:54,043-Speed 2498.09 samples/sec Loss 1.3519 LearningRate 0.000085 Epoch: 29 Global Step: 611540 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:26:02,239-Speed 2499.19 samples/sec Loss 1.4049 LearningRate 0.000085 Epoch: 29 Global Step: 611550 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:26:10,453-Speed 2493.75 samples/sec Loss 1.3748 LearningRate 0.000085 Epoch: 29 Global Step: 611560 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:26:18,654-Speed 2497.75 samples/sec Loss 1.3694 LearningRate 0.000085 Epoch: 29 Global Step: 611570 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:26:26,854-Speed 2497.77 samples/sec Loss 1.3809 LearningRate 0.000085 Epoch: 29 Global Step: 611580 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:26:35,013-Speed 2510.70 samples/sec Loss 1.3564 LearningRate 0.000085 Epoch: 29 Global Step: 611590 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:26:43,215-Speed 2497.18 samples/sec Loss 1.3755 LearningRate 0.000085 Epoch: 29 Global Step: 611600 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:26:51,429-Speed 2493.63 samples/sec Loss 1.3664 LearningRate 0.000085 Epoch: 29 Global Step: 611610 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:26:59,643-Speed 2493.61 samples/sec Loss 1.3973 LearningRate 0.000085 Epoch: 29 Global Step: 611620 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:27:07,843-Speed 2498.16 samples/sec Loss 1.3561 LearningRate 0.000085 Epoch: 29 Global Step: 611630 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:27:16,044-Speed 2497.87 samples/sec Loss 1.3978 LearningRate 0.000085 Epoch: 29 Global Step: 611640 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-07-11 10:27:24,192-Speed 2513.58 samples/sec Loss 1.4060 LearningRate 0.000085 Epoch: 29 Global Step: 611650 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:27:32,399-Speed 2495.73 samples/sec Loss 1.3890 LearningRate 0.000085 Epoch: 29 Global Step: 611660 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:27:40,605-Speed 2496.10 samples/sec Loss 1.3920 LearningRate 0.000085 Epoch: 29 Global Step: 611670 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:27:48,804-Speed 2498.27 samples/sec Loss 1.4018 LearningRate 0.000085 Epoch: 29 Global Step: 611680 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:27:57,005-Speed 2497.53 samples/sec Loss 1.4075 LearningRate 0.000085 Epoch: 29 Global Step: 611690 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:28:05,210-Speed 2496.64 samples/sec Loss 1.3716 LearningRate 0.000085 Epoch: 29 Global Step: 611700 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:28:13,360-Speed 2513.06 samples/sec Loss 1.3841 LearningRate 0.000085 Epoch: 29 Global Step: 611710 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:28:21,563-Speed 2497.40 samples/sec Loss 1.4125 LearningRate 0.000085 Epoch: 29 Global Step: 611720 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:28:29,761-Speed 2498.51 samples/sec Loss 1.3836 LearningRate 0.000085 Epoch: 29 Global Step: 611730 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:28:37,962-Speed 2497.74 samples/sec Loss 1.3705 LearningRate 0.000085 Epoch: 29 Global Step: 611740 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:28:46,162-Speed 2497.95 samples/sec Loss 1.4193 LearningRate 0.000085 Epoch: 29 Global Step: 611750 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:28:54,361-Speed 2498.22 samples/sec Loss 1.4314 LearningRate 0.000085 Epoch: 29 Global Step: 611760 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-07-11 10:29:02,514-Speed 2512.68 samples/sec Loss 1.3849 LearningRate 0.000085 Epoch: 29 Global Step: 611770 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:29:10,714-Speed 2498.13 samples/sec Loss 1.3950 LearningRate 0.000085 Epoch: 29 Global Step: 611780 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:29:18,912-Speed 2498.33 samples/sec Loss 1.4156 LearningRate 0.000085 Epoch: 29 Global Step: 611790 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:29:27,115-Speed 2497.16 samples/sec Loss 1.3910 LearningRate 0.000085 Epoch: 29 Global Step: 611800 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:29:35,317-Speed 2497.55 samples/sec Loss 1.3777 LearningRate 0.000085 Epoch: 29 Global Step: 611810 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:29:43,513-Speed 2498.92 samples/sec Loss 1.3921 LearningRate 0.000085 Epoch: 29 Global Step: 611820 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:29:51,659-Speed 2514.56 samples/sec Loss 1.3789 LearningRate 0.000085 Epoch: 29 Global Step: 611830 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:29:59,859-Speed 2498.12 samples/sec Loss 1.3757 LearningRate 0.000085 Epoch: 29 Global Step: 611840 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:30:08,068-Speed 2495.14 samples/sec Loss 1.3423 LearningRate 0.000085 Epoch: 29 Global Step: 611850 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:30:16,278-Speed 2495.00 samples/sec Loss 1.3628 LearningRate 0.000085 Epoch: 29 Global Step: 611860 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:30:24,485-Speed 2495.73 samples/sec Loss 1.3731 LearningRate 0.000085 Epoch: 29 Global Step: 611870 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:30:32,682-Speed 2499.09 samples/sec Loss 1.3790 LearningRate 0.000085 Epoch: 29 Global Step: 611880 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-07-11 10:30:40,830-Speed 2513.72 samples/sec Loss 1.3932 LearningRate 0.000085 Epoch: 29 Global Step: 611890 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:30:49,029-Speed 2498.49 samples/sec Loss 1.4038 LearningRate 0.000085 Epoch: 29 Global Step: 611900 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:30:57,228-Speed 2498.18 samples/sec Loss 1.4030 LearningRate 0.000085 Epoch: 29 Global Step: 611910 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:31:05,429-Speed 2497.69 samples/sec Loss 1.3786 LearningRate 0.000085 Epoch: 29 Global Step: 611920 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:31:13,584-Speed 2511.63 samples/sec Loss 1.4066 LearningRate 0.000085 Epoch: 29 Global Step: 611930 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:31:21,784-Speed 2498.12 samples/sec Loss 1.4019 LearningRate 0.000085 Epoch: 29 Global Step: 611940 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:31:29,943-Speed 2510.51 samples/sec Loss 1.3700 LearningRate 0.000085 Epoch: 29 Global Step: 611950 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:31:38,141-Speed 2498.48 samples/sec Loss 1.3644 LearningRate 0.000085 Epoch: 29 Global Step: 611960 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:31:46,342-Speed 2497.63 samples/sec Loss 1.3763 LearningRate 0.000085 Epoch: 29 Global Step: 611970 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:31:54,542-Speed 2498.04 samples/sec Loss 1.3945 LearningRate 0.000085 Epoch: 29 Global Step: 611980 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:32:02,739-Speed 2498.93 samples/sec Loss 1.3472 LearningRate 0.000085 Epoch: 29 Global Step: 611990 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:32:10,940-Speed 2497.83 samples/sec Loss 1.3723 LearningRate 0.000085 Epoch: 29 Global Step: 612000 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:32:19,084-Speed 2515.04 samples/sec Loss 1.3810 LearningRate 0.000085 Epoch: 29 Global Step: 612010 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:32:27,289-Speed 2496.82 samples/sec Loss 1.3921 LearningRate 0.000085 Epoch: 29 Global Step: 612020 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:32:35,487-Speed 2498.96 samples/sec Loss 1.4020 LearningRate 0.000085 Epoch: 29 Global Step: 612030 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:32:43,683-Speed 2498.89 samples/sec Loss 1.3953 LearningRate 0.000085 Epoch: 29 Global Step: 612040 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:32:51,880-Speed 2499.05 samples/sec Loss 1.3600 LearningRate 0.000085 Epoch: 29 Global Step: 612050 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:33:00,080-Speed 2498.09 samples/sec Loss 1.3473 LearningRate 0.000085 Epoch: 29 Global Step: 612060 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:33:08,229-Speed 2513.75 samples/sec Loss 1.3807 LearningRate 0.000085 Epoch: 29 Global Step: 612070 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:33:16,433-Speed 2496.55 samples/sec Loss 1.3951 LearningRate 0.000085 Epoch: 29 Global Step: 612080 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:33:24,632-Speed 2498.02 samples/sec Loss 1.3807 LearningRate 0.000085 Epoch: 29 Global Step: 612090 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:33:32,842-Speed 2494.97 samples/sec Loss 1.3827 LearningRate 0.000085 Epoch: 29 Global Step: 612100 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:33:41,046-Speed 2496.84 samples/sec Loss 1.3637 LearningRate 0.000085 Epoch: 29 Global Step: 612110 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:33:49,243-Speed 2498.81 samples/sec Loss 1.3759 LearningRate 0.000085 Epoch: 29 Global Step: 612120 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:33:57,392-Speed 2513.75 samples/sec Loss 1.3669 LearningRate 0.000085 Epoch: 29 Global Step: 612130 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:34:05,592-Speed 2497.85 samples/sec Loss 1.3731 LearningRate 0.000085 Epoch: 29 Global Step: 612140 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:34:13,790-Speed 2498.54 samples/sec Loss 1.3797 LearningRate 0.000085 Epoch: 29 Global Step: 612150 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:34:22,003-Speed 2494.47 samples/sec Loss 1.3722 LearningRate 0.000085 Epoch: 29 Global Step: 612160 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:34:30,201-Speed 2498.57 samples/sec Loss 1.3719 LearningRate 0.000085 Epoch: 29 Global Step: 612170 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:34:38,406-Speed 2496.39 samples/sec Loss 1.3934 LearningRate 0.000085 Epoch: 29 Global Step: 612180 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:34:46,564-Speed 2510.67 samples/sec Loss 1.3907 LearningRate 0.000085 Epoch: 29 Global Step: 612190 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:34:54,765-Speed 2497.80 samples/sec Loss 1.3594 LearningRate 0.000085 Epoch: 29 Global Step: 612200 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:35:02,965-Speed 2497.67 samples/sec Loss 1.3822 LearningRate 0.000085 Epoch: 29 Global Step: 612210 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:35:11,169-Speed 2497.20 samples/sec Loss 1.3664 LearningRate 0.000085 Epoch: 29 Global Step: 612220 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:35:19,367-Speed 2498.54 samples/sec Loss 1.3828 LearningRate 0.000085 Epoch: 29 Global Step: 612230 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:35:27,566-Speed 2498.36 samples/sec Loss 1.3654 LearningRate 0.000085 Epoch: 29 Global Step: 612240 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:35:35,712-Speed 2514.54 samples/sec Loss 1.3910 LearningRate 0.000085 Epoch: 29 Global Step: 612250 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:35:43,911-Speed 2498.32 samples/sec Loss 1.3710 LearningRate 0.000085 Epoch: 29 Global Step: 612260 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:35:52,119-Speed 2495.65 samples/sec Loss 1.3764 LearningRate 0.000085 Epoch: 29 Global Step: 612270 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:36:00,325-Speed 2496.12 samples/sec Loss 1.3928 LearningRate 0.000085 Epoch: 29 Global Step: 612280 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:36:08,537-Speed 2494.23 samples/sec Loss 1.3884 LearningRate 0.000085 Epoch: 29 Global Step: 612290 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:36:16,741-Speed 2497.02 samples/sec Loss 1.4015 LearningRate 0.000085 Epoch: 29 Global Step: 612300 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:36:24,885-Speed 2515.00 samples/sec Loss 1.3795 LearningRate 0.000085 Epoch: 29 Global Step: 612310 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:36:33,084-Speed 2498.00 samples/sec Loss 1.4167 LearningRate 0.000085 Epoch: 29 Global Step: 612320 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:36:41,283-Speed 2498.35 samples/sec Loss 1.3475 LearningRate 0.000085 Epoch: 29 Global Step: 612330 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:36:49,485-Speed 2497.74 samples/sec Loss 1.3941 LearningRate 0.000085 Epoch: 29 Global Step: 612340 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:36:57,686-Speed 2497.57 samples/sec Loss 1.3870 LearningRate 0.000085 Epoch: 29 Global Step: 612350 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:37:05,886-Speed 2497.80 samples/sec Loss 1.4183 LearningRate 0.000085 Epoch: 29 Global Step: 612360 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:37:14,035-Speed 2513.52 samples/sec Loss 1.4040 LearningRate 0.000085 Epoch: 29 Global Step: 612370 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:37:22,238-Speed 2497.20 samples/sec Loss 1.3474 LearningRate 0.000085 Epoch: 29 Global Step: 612380 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:37:30,440-Speed 2497.34 samples/sec Loss 1.3422 LearningRate 0.000085 Epoch: 29 Global Step: 612390 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:37:38,643-Speed 2496.81 samples/sec Loss 1.3633 LearningRate 0.000085 Epoch: 29 Global Step: 612400 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:37:46,848-Speed 2496.69 samples/sec Loss 1.3925 LearningRate 0.000085 Epoch: 29 Global Step: 612410 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:37:55,053-Speed 2496.32 samples/sec Loss 1.3912 LearningRate 0.000085 Epoch: 29 Global Step: 612420 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:38:03,201-Speed 2513.95 samples/sec Loss 1.3757 LearningRate 0.000085 Epoch: 29 Global Step: 612430 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:38:11,407-Speed 2495.98 samples/sec Loss 1.4156 LearningRate 0.000085 Epoch: 29 Global Step: 612440 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:38:19,610-Speed 2497.22 samples/sec Loss 1.3931 LearningRate 0.000085 Epoch: 29 Global Step: 612450 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:38:27,810-Speed 2498.00 samples/sec Loss 1.3542 LearningRate 0.000085 Epoch: 29 Global Step: 612460 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:38:36,009-Speed 2498.27 samples/sec Loss 1.3644 LearningRate 0.000085 Epoch: 29 Global Step: 612470 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:38:44,216-Speed 2495.73 samples/sec Loss 1.4013 LearningRate 0.000085 Epoch: 29 Global Step: 612480 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:38:52,359-Speed 2515.62 samples/sec Loss 1.4010 LearningRate 0.000085 Epoch: 29 Global Step: 612490 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:39:00,557-Speed 2498.47 samples/sec Loss 1.3619 LearningRate 0.000085 Epoch: 29 Global Step: 612500 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:39:08,756-Speed 2498.31 samples/sec Loss 1.3793 LearningRate 0.000085 Epoch: 29 Global Step: 612510 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:39:16,954-Speed 2498.69 samples/sec Loss 1.3642 LearningRate 0.000085 Epoch: 29 Global Step: 612520 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:39:25,156-Speed 2497.31 samples/sec Loss 1.3863 LearningRate 0.000085 Epoch: 29 Global Step: 612530 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:39:33,354-Speed 2498.45 samples/sec Loss 1.3245 LearningRate 0.000084 Epoch: 29 Global Step: 612540 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:39:41,505-Speed 2513.11 samples/sec Loss 1.3749 LearningRate 0.000084 Epoch: 29 Global Step: 612550 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:39:49,703-Speed 2498.42 samples/sec Loss 1.3546 LearningRate 0.000084 Epoch: 29 Global Step: 612560 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:39:57,903-Speed 2498.02 samples/sec Loss 1.3837 LearningRate 0.000084 Epoch: 29 Global Step: 612570 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:40:06,099-Speed 2498.89 samples/sec Loss 1.3825 LearningRate 0.000084 Epoch: 29 Global Step: 612580 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:40:14,298-Speed 2498.33 samples/sec Loss 1.3778 LearningRate 0.000084 Epoch: 29 Global Step: 612590 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:40:22,499-Speed 2498.10 samples/sec Loss 1.3781 LearningRate 0.000084 Epoch: 29 Global Step: 612600 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:40:30,644-Speed 2514.64 samples/sec Loss 1.3971 LearningRate 0.000084 Epoch: 29 Global Step: 612610 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:40:38,842-Speed 2498.63 samples/sec Loss 1.4040 LearningRate 0.000084 Epoch: 29 Global Step: 612620 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:40:47,041-Speed 2498.46 samples/sec Loss 1.3723 LearningRate 0.000084 Epoch: 29 Global Step: 612630 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:40:55,242-Speed 2497.73 samples/sec Loss 1.3702 LearningRate 0.000084 Epoch: 29 Global Step: 612640 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:41:03,440-Speed 2498.64 samples/sec Loss 1.3662 LearningRate 0.000084 Epoch: 29 Global Step: 612650 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:41:11,638-Speed 2498.63 samples/sec Loss 1.3392 LearningRate 0.000084 Epoch: 29 Global Step: 612660 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:41:19,788-Speed 2513.20 samples/sec Loss 1.3782 LearningRate 0.000084 Epoch: 29 Global Step: 612670 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:41:27,986-Speed 2498.59 samples/sec Loss 1.3803 LearningRate 0.000084 Epoch: 29 Global Step: 612680 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:41:36,188-Speed 2497.41 samples/sec Loss 1.3677 LearningRate 0.000084 Epoch: 29 Global Step: 612690 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:41:44,385-Speed 2498.65 samples/sec Loss 1.4028 LearningRate 0.000084 Epoch: 29 Global Step: 612700 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:41:52,587-Speed 2497.46 samples/sec Loss 1.3928 LearningRate 0.000084 Epoch: 29 Global Step: 612710 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:42:00,785-Speed 2498.49 samples/sec Loss 1.3912 LearningRate 0.000084 Epoch: 29 Global Step: 612720 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:42:08,937-Speed 2512.55 samples/sec Loss 1.3876 LearningRate 0.000084 Epoch: 29 Global Step: 612730 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:42:17,143-Speed 2496.35 samples/sec Loss 1.3559 LearningRate 0.000084 Epoch: 29 Global Step: 612740 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:42:25,341-Speed 2498.49 samples/sec Loss 1.3669 LearningRate 0.000084 Epoch: 29 Global Step: 612750 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:42:33,539-Speed 2498.37 samples/sec Loss 1.3737 LearningRate 0.000084 Epoch: 29 Global Step: 612760 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:42:41,738-Speed 2498.22 samples/sec Loss 1.3581 LearningRate 0.000084 Epoch: 29 Global Step: 612770 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:42:49,938-Speed 2498.01 samples/sec Loss 1.3821 LearningRate 0.000084 Epoch: 29 Global Step: 612780 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:42:58,082-Speed 2515.19 samples/sec Loss 1.3739 LearningRate 0.000084 Epoch: 29 Global Step: 612790 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:43:06,283-Speed 2497.50 samples/sec Loss 1.4134 LearningRate 0.000084 Epoch: 29 Global Step: 612800 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:43:14,482-Speed 2498.40 samples/sec Loss 1.4029 LearningRate 0.000084 Epoch: 29 Global Step: 612810 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:43:22,680-Speed 2498.65 samples/sec Loss 1.3705 LearningRate 0.000084 Epoch: 29 Global Step: 612820 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:43:30,895-Speed 2493.54 samples/sec Loss 1.4024 LearningRate 0.000084 Epoch: 29 Global Step: 612830 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:43:39,106-Speed 2494.48 samples/sec Loss 1.4063 LearningRate 0.000084 Epoch: 29 Global Step: 612840 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:43:47,260-Speed 2512.05 samples/sec Loss 1.3711 LearningRate 0.000084 Epoch: 29 Global Step: 612850 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:43:55,470-Speed 2495.06 samples/sec Loss 1.3634 LearningRate 0.000084 Epoch: 29 Global Step: 612860 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:44:03,669-Speed 2498.31 samples/sec Loss 1.3724 LearningRate 0.000084 Epoch: 29 Global Step: 612870 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:44:11,868-Speed 2498.24 samples/sec Loss 1.4056 LearningRate 0.000084 Epoch: 29 Global Step: 612880 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:44:20,067-Speed 2498.14 samples/sec Loss 1.3678 LearningRate 0.000084 Epoch: 29 Global Step: 612890 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:44:28,268-Speed 2497.52 samples/sec Loss 1.3484 LearningRate 0.000084 Epoch: 29 Global Step: 612900 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:44:36,418-Speed 2513.31 samples/sec Loss 1.3992 LearningRate 0.000084 Epoch: 29 Global Step: 612910 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:44:44,628-Speed 2494.76 samples/sec Loss 1.3670 LearningRate 0.000084 Epoch: 29 Global Step: 612920 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:44:52,834-Speed 2496.21 samples/sec Loss 1.3608 LearningRate 0.000084 Epoch: 29 Global Step: 612930 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:45:01,041-Speed 2495.90 samples/sec Loss 1.3631 LearningRate 0.000084 Epoch: 29 Global Step: 612940 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:45:09,240-Speed 2497.94 samples/sec Loss 1.3653 LearningRate 0.000084 Epoch: 29 Global Step: 612950 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:45:17,440-Speed 2498.06 samples/sec Loss 1.3548 LearningRate 0.000084 Epoch: 29 Global Step: 612960 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:45:25,587-Speed 2514.31 samples/sec Loss 1.3300 LearningRate 0.000084 Epoch: 29 Global Step: 612970 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:45:33,785-Speed 2498.39 samples/sec Loss 1.3876 LearningRate 0.000084 Epoch: 29 Global Step: 612980 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:45:41,984-Speed 2498.34 samples/sec Loss 1.3934 LearningRate 0.000084 Epoch: 29 Global Step: 612990 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:45:50,180-Speed 2499.21 samples/sec Loss 1.4108 LearningRate 0.000084 Epoch: 29 Global Step: 613000 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:45:58,379-Speed 2498.37 samples/sec Loss 1.3884 LearningRate 0.000084 Epoch: 29 Global Step: 613010 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:46:06,583-Speed 2497.09 samples/sec Loss 1.3833 LearningRate 0.000084 Epoch: 29 Global Step: 613020 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:46:14,728-Speed 2514.67 samples/sec Loss 1.3469 LearningRate 0.000084 Epoch: 29 Global Step: 613030 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:46:22,932-Speed 2496.67 samples/sec Loss 1.3644 LearningRate 0.000084 Epoch: 29 Global Step: 613040 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:46:31,135-Speed 2497.05 samples/sec Loss 1.4027 LearningRate 0.000084 Epoch: 29 Global Step: 613050 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:46:39,335-Speed 2497.79 samples/sec Loss 1.3736 LearningRate 0.000084 Epoch: 29 Global Step: 613060 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:46:47,538-Speed 2497.00 samples/sec Loss 1.4121 LearningRate 0.000084 Epoch: 29 Global Step: 613070 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:46:55,738-Speed 2498.15 samples/sec Loss 1.3702 LearningRate 0.000084 Epoch: 29 Global Step: 613080 Fp16 Grad Scale: 16384 Required: 50 hours 
Training: 2022-07-11 10:47:03,886-Speed 2514.03 samples/sec Loss 1.4202 LearningRate 0.000084 Epoch: 29 Global Step: 613090 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:47:12,087-Speed 2497.60 samples/sec Loss 1.3770 LearningRate 0.000084 Epoch: 29 Global Step: 613100 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:47:20,302-Speed 2493.33 samples/sec Loss 1.3695 LearningRate 0.000084 Epoch: 29 Global Step: 613110 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:47:28,501-Speed 2498.14 samples/sec Loss 1.3894 LearningRate 0.000084 Epoch: 29 Global Step: 613120 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-07-11 10:47:36,700-Speed 2498.52 samples/sec Loss 1.3957 LearningRate 0.000084 Epoch: 29 Global Step: 613130 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:47:44,905-Speed 2496.52 samples/sec Loss 1.3802 LearningRate 0.000084 Epoch: 29 Global Step: 613140 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:47:53,050-Speed 2514.69 samples/sec Loss 1.3668 LearningRate 0.000084 Epoch: 29 Global Step: 613150 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:48:01,248-Speed 2498.70 samples/sec Loss 1.3541 LearningRate 0.000084 Epoch: 29 Global Step: 613160 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:48:09,450-Speed 2497.52 samples/sec Loss 1.3965 LearningRate 0.000084 Epoch: 29 Global Step: 613170 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:48:17,649-Speed 2498.17 samples/sec Loss 1.3785 LearningRate 0.000084 Epoch: 29 Global Step: 613180 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:48:25,847-Speed 2498.65 samples/sec Loss 1.3868 LearningRate 0.000084 Epoch: 29 Global Step: 613190 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:48:34,050-Speed 2497.07 samples/sec Loss 1.3555 LearningRate 0.000084 Epoch: 29 Global Step: 613200 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-07-11 10:48:42,210-Speed 2510.06 samples/sec Loss 1.3642 LearningRate 0.000084 Epoch: 29 Global Step: 613210 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:48:50,415-Speed 2496.47 samples/sec Loss 1.3921 LearningRate 0.000084 Epoch: 29 Global Step: 613220 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:48:58,614-Speed 2498.61 samples/sec Loss 1.3492 LearningRate 0.000084 Epoch: 29 Global Step: 613230 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:49:06,813-Speed 2498.25 samples/sec Loss 1.3671 LearningRate 0.000084 Epoch: 29 Global Step: 613240 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:49:15,012-Speed 2498.31 samples/sec Loss 1.3557 LearningRate 0.000084 Epoch: 29 Global Step: 613250 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:49:23,216-Speed 2496.88 samples/sec Loss 1.3517 LearningRate 0.000084 Epoch: 29 Global Step: 613260 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:49:31,361-Speed 2514.63 samples/sec Loss 1.3740 LearningRate 0.000084 Epoch: 29 Global Step: 613270 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:49:39,562-Speed 2497.88 samples/sec Loss 1.3702 LearningRate 0.000084 Epoch: 29 Global Step: 613280 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:49:47,761-Speed 2498.41 samples/sec Loss 1.3917 LearningRate 0.000084 Epoch: 29 Global Step: 613290 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:49:55,963-Speed 2497.77 samples/sec Loss 1.4074 LearningRate 0.000084 Epoch: 29 Global Step: 613300 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:50:04,162-Speed 2497.98 samples/sec Loss 1.4105 LearningRate 0.000084 Epoch: 29 Global Step: 613310 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-07-11 10:50:12,363-Speed 2497.61 samples/sec Loss 1.3810 LearningRate 0.000084 Epoch: 29 Global Step: 613320 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 10:50:20,521-Speed 2511.01 samples/sec Loss 1.3885 LearningRate 0.000084 Epoch: 29 Global Step: 613330 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:50:28,730-Speed 2495.12 samples/sec Loss 1.4103 LearningRate 0.000084 Epoch: 29 Global Step: 613340 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:50:36,928-Speed 2498.40 samples/sec Loss 1.3990 LearningRate 0.000084 Epoch: 29 Global Step: 613350 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:50:45,139-Speed 2494.58 samples/sec Loss 1.3971 LearningRate 0.000084 Epoch: 29 Global Step: 613360 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:50:53,339-Speed 2498.17 samples/sec Loss 1.3735 LearningRate 0.000084 Epoch: 29 Global Step: 613370 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:51:01,537-Speed 2498.50 samples/sec Loss 1.3766 LearningRate 0.000084 Epoch: 29 Global Step: 613380 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:51:09,683-Speed 2514.52 samples/sec Loss 1.4256 LearningRate 0.000084 Epoch: 29 Global Step: 613390 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:51:17,882-Speed 2498.46 samples/sec Loss 1.3620 LearningRate 0.000084 Epoch: 29 Global Step: 613400 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:51:26,077-Speed 2499.33 samples/sec Loss 1.3572 LearningRate 0.000084 Epoch: 29 Global Step: 613410 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:51:34,290-Speed 2494.00 samples/sec Loss 1.3728 LearningRate 0.000084 Epoch: 29 Global Step: 613420 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:51:42,500-Speed 2494.96 samples/sec Loss 1.3741 LearningRate 0.000084 Epoch: 29 Global Step: 613430 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:51:50,697-Speed 2498.75 samples/sec Loss 1.3834 LearningRate 0.000084 Epoch: 29 Global Step: 613440 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 10:51:58,843-Speed 2514.65 samples/sec Loss 1.3797 LearningRate 0.000084 Epoch: 29 Global Step: 613450 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:52:07,043-Speed 2497.84 samples/sec Loss 1.3755 LearningRate 0.000084 Epoch: 29 Global Step: 613460 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:52:15,255-Speed 2494.54 samples/sec Loss 1.3395 LearningRate 0.000084 Epoch: 29 Global Step: 613470 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:52:23,452-Speed 2498.63 samples/sec Loss 1.3617 LearningRate 0.000084 Epoch: 29 Global Step: 613480 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:52:31,654-Speed 2497.55 samples/sec Loss 1.3061 LearningRate 0.000084 Epoch: 29 Global Step: 613490 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:52:39,852-Speed 2498.40 samples/sec Loss 1.3619 LearningRate 0.000084 Epoch: 29 Global Step: 613500 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:52:48,001-Speed 2513.83 samples/sec Loss 1.3648 LearningRate 0.000084 Epoch: 29 Global Step: 613510 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:52:56,202-Speed 2497.43 samples/sec Loss 1.3295 LearningRate 0.000084 Epoch: 29 Global Step: 613520 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:53:04,404-Speed 2497.24 samples/sec Loss 1.3946 LearningRate 0.000084 Epoch: 29 Global Step: 613530 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:53:12,604-Speed 2498.01 samples/sec Loss 1.3568 LearningRate 0.000084 Epoch: 29 Global Step: 613540 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:53:20,806-Speed 2497.20 samples/sec Loss 1.4090 LearningRate 0.000084 Epoch: 29 Global Step: 613550 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:53:29,005-Speed 2498.40 samples/sec Loss 1.3817 LearningRate 0.000084 Epoch: 29 Global Step: 613560 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 10:53:37,152-Speed 2514.33 samples/sec Loss 1.3725 LearningRate 0.000084 Epoch: 29 Global Step: 613570 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:53:45,350-Speed 2499.31 samples/sec Loss 1.4071 LearningRate 0.000084 Epoch: 29 Global Step: 613580 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:53:53,549-Speed 2498.44 samples/sec Loss 1.3666 LearningRate 0.000084 Epoch: 29 Global Step: 613590 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:54:01,751-Speed 2497.37 samples/sec Loss 1.3862 LearningRate 0.000084 Epoch: 29 Global Step: 613600 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:54:09,951-Speed 2498.45 samples/sec Loss 1.3467 LearningRate 0.000084 Epoch: 29 Global Step: 613610 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:54:18,153-Speed 2497.53 samples/sec Loss 1.3908 LearningRate 0.000084 Epoch: 29 Global Step: 613620 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:54:26,298-Speed 2514.77 samples/sec Loss 1.3550 LearningRate 0.000084 Epoch: 29 Global Step: 613630 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:54:34,510-Speed 2494.28 samples/sec Loss 1.3720 LearningRate 0.000084 Epoch: 29 Global Step: 613640 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:54:42,715-Speed 2496.34 samples/sec Loss 1.3613 LearningRate 0.000084 Epoch: 29 Global Step: 613650 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:54:50,914-Speed 2498.17 samples/sec Loss 1.3628 LearningRate 0.000084 Epoch: 29 Global Step: 613660 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:54:59,112-Speed 2498.48 samples/sec Loss 1.3784 LearningRate 0.000084 Epoch: 29 Global Step: 613670 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:55:07,314-Speed 2497.58 samples/sec Loss 1.3898 LearningRate 0.000084 Epoch: 29 Global Step: 613680 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 10:55:15,460-Speed 2514.30 samples/sec Loss 1.3803 LearningRate 0.000084 Epoch: 29 Global Step: 613690 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:55:23,677-Speed 2493.00 samples/sec Loss 1.3606 LearningRate 0.000084 Epoch: 29 Global Step: 613700 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:55:31,876-Speed 2497.94 samples/sec Loss 1.3574 LearningRate 0.000084 Epoch: 29 Global Step: 613710 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:55:40,076-Speed 2497.95 samples/sec Loss 1.3805 LearningRate 0.000084 Epoch: 29 Global Step: 613720 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:55:48,275-Speed 2498.60 samples/sec Loss 1.3681 LearningRate 0.000084 Epoch: 29 Global Step: 613730 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:55:56,475-Speed 2498.06 samples/sec Loss 1.3726 LearningRate 0.000084 Epoch: 29 Global Step: 613740 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:56:04,622-Speed 2514.04 samples/sec Loss 1.3845 LearningRate 0.000084 Epoch: 29 Global Step: 613750 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:56:12,824-Speed 2497.54 samples/sec Loss 1.3825 LearningRate 0.000084 Epoch: 29 Global Step: 613760 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:56:21,022-Speed 2498.30 samples/sec Loss 1.3750 LearningRate 0.000084 Epoch: 29 Global Step: 613770 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:56:29,247-Speed 2490.55 samples/sec Loss 1.3574 LearningRate 0.000084 Epoch: 29 Global Step: 613780 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:56:37,450-Speed 2497.04 samples/sec Loss 1.4059 LearningRate 0.000084 Epoch: 29 Global Step: 613790 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:56:45,650-Speed 2497.69 samples/sec Loss 1.3681 LearningRate 0.000084 Epoch: 29 Global Step: 613800 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 10:56:53,796-Speed 2514.56 samples/sec Loss 1.3515 LearningRate 0.000084 Epoch: 29 Global Step: 613810 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:57:01,996-Speed 2498.06 samples/sec Loss 1.3625 LearningRate 0.000083 Epoch: 29 Global Step: 613820 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 10:57:10,150-Speed 2512.08 samples/sec Loss 1.3608 LearningRate 0.000083 Epoch: 29 Global Step: 613830 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:57:18,351-Speed 2497.78 samples/sec Loss 1.3650 LearningRate 0.000083 Epoch: 29 Global Step: 613840 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:57:26,549-Speed 2498.76 samples/sec Loss 1.3888 LearningRate 0.000083 Epoch: 29 Global Step: 613850 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:57:34,749-Speed 2497.66 samples/sec Loss 1.3408 LearningRate 0.000083 Epoch: 29 Global Step: 613860 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:57:42,895-Speed 2514.60 samples/sec Loss 1.3455 LearningRate 0.000083 Epoch: 29 Global Step: 613870 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:57:51,096-Speed 2497.48 samples/sec Loss 1.3919 LearningRate 0.000083 Epoch: 29 Global Step: 613880 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:57:59,296-Speed 2497.93 samples/sec Loss 1.3712 LearningRate 0.000083 Epoch: 29 Global Step: 613890 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:58:07,501-Speed 2496.44 samples/sec Loss 1.3732 LearningRate 0.000083 Epoch: 29 Global Step: 613900 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:58:15,714-Speed 2494.10 samples/sec Loss 1.3911 LearningRate 0.000083 Epoch: 29 Global Step: 613910 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:58:23,913-Speed 2498.31 samples/sec Loss 1.3583 LearningRate 0.000083 Epoch: 29 Global Step: 613920 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 10:58:32,060-Speed 2514.20 samples/sec Loss 1.3576 LearningRate 0.000083 Epoch: 29 Global Step: 613930 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:58:40,261-Speed 2497.61 samples/sec Loss 1.3435 LearningRate 0.000083 Epoch: 29 Global Step: 613940 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:58:48,462-Speed 2497.92 samples/sec Loss 1.3573 LearningRate 0.000083 Epoch: 29 Global Step: 613950 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:58:56,662-Speed 2497.93 samples/sec Loss 1.3391 LearningRate 0.000083 Epoch: 29 Global Step: 613960 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:59:04,862-Speed 2497.77 samples/sec Loss 1.3541 LearningRate 0.000083 Epoch: 29 Global Step: 613970 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:59:13,063-Speed 2497.72 samples/sec Loss 1.3468 LearningRate 0.000083 Epoch: 29 Global Step: 613980 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:59:21,212-Speed 2513.75 samples/sec Loss 1.4000 LearningRate 0.000083 Epoch: 29 Global Step: 613990 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:59:29,412-Speed 2497.90 samples/sec Loss 1.3564 LearningRate 0.000083 Epoch: 29 Global Step: 614000 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:59:37,610-Speed 2498.43 samples/sec Loss 1.3843 LearningRate 0.000083 Epoch: 29 Global Step: 614010 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:59:45,815-Speed 2497.00 samples/sec Loss 1.3816 LearningRate 0.000083 Epoch: 29 Global Step: 614020 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 10:59:54,013-Speed 2498.57 samples/sec Loss 1.3684 LearningRate 0.000083 Epoch: 29 Global Step: 614030 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:00:02,209-Speed 2499.14 samples/sec Loss 1.3730 LearningRate 0.000083 Epoch: 29 Global Step: 614040 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:00:10,360-Speed 2513.14 samples/sec Loss 1.3906 LearningRate 0.000083 Epoch: 29 Global Step: 614050 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:00:18,560-Speed 2497.96 samples/sec Loss 1.3905 LearningRate 0.000083 Epoch: 29 Global Step: 614060 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:00:26,759-Speed 2498.12 samples/sec Loss 1.4053 LearningRate 0.000083 Epoch: 29 Global Step: 614070 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:00:34,980-Speed 2492.28 samples/sec Loss 1.3885 LearningRate 0.000083 Epoch: 29 Global Step: 614080 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:00:43,194-Speed 2493.61 samples/sec Loss 1.3877 LearningRate 0.000083 Epoch: 29 Global Step: 614090 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:00:51,397-Speed 2496.95 samples/sec Loss 1.3784 LearningRate 0.000083 Epoch: 29 Global Step: 614100 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:00:59,549-Speed 2512.65 samples/sec Loss 1.3445 LearningRate 0.000083 Epoch: 29 Global Step: 614110 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:01:07,750-Speed 2497.65 samples/sec Loss 1.4022 LearningRate 0.000083 Epoch: 29 Global Step: 614120 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:01:15,951-Speed 2497.72 samples/sec Loss 1.3683 LearningRate 0.000083 Epoch: 29 Global Step: 614130 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:01:24,150-Speed 2498.16 samples/sec Loss 1.3638 LearningRate 0.000083 Epoch: 29 Global Step: 614140 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:01:32,352-Speed 2497.44 samples/sec Loss 1.4060 LearningRate 0.000083 Epoch: 29 Global Step: 614150 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:01:40,555-Speed 2497.31 samples/sec Loss 1.3793 LearningRate 0.000083 Epoch: 29 Global Step: 614160 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:01:48,703-Speed 2513.85 samples/sec Loss 1.3702 LearningRate 0.000083 Epoch: 29 Global Step: 614170 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:01:56,904-Speed 2497.50 samples/sec Loss 1.3902 LearningRate 0.000083 Epoch: 29 Global Step: 614180 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:02:05,107-Speed 2497.04 samples/sec Loss 1.3677 LearningRate 0.000083 Epoch: 29 Global Step: 614190 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:02:13,309-Speed 2497.51 samples/sec Loss 1.3622 LearningRate 0.000083 Epoch: 29 Global Step: 614200 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:02:21,509-Speed 2498.13 samples/sec Loss 1.3929 LearningRate 0.000083 Epoch: 29 Global Step: 614210 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:02:29,730-Speed 2491.58 samples/sec Loss 1.3732 LearningRate 0.000083 Epoch: 29 Global Step: 614220 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:02:37,881-Speed 2512.67 samples/sec Loss 1.3585 LearningRate 0.000083 Epoch: 29 Global Step: 614230 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:02:46,081-Speed 2498.17 samples/sec Loss 1.4204 LearningRate 0.000083 Epoch: 29 Global Step: 614240 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:02:54,293-Speed 2494.36 samples/sec Loss 1.3749 LearningRate 0.000083 Epoch: 29 Global Step: 614250 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:03:02,491-Speed 2498.42 samples/sec Loss 1.3732 LearningRate 0.000083 Epoch: 29 Global Step: 614260 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:03:10,690-Speed 2498.39 samples/sec Loss 1.3602 LearningRate 0.000083 Epoch: 29 Global Step: 614270 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:03:18,891-Speed 2497.80 samples/sec Loss 1.3566 LearningRate 0.000083 Epoch: 29 Global Step: 614280 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:03:27,043-Speed 2512.61 samples/sec Loss 1.3812 LearningRate 0.000083 Epoch: 29 Global Step: 614290 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:03:35,247-Speed 2496.56 samples/sec Loss 1.3918 LearningRate 0.000083 Epoch: 29 Global Step: 614300 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:03:43,448-Speed 2497.97 samples/sec Loss 1.3702 LearningRate 0.000083 Epoch: 29 Global Step: 614310 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:03:51,647-Speed 2498.47 samples/sec Loss 1.3759 LearningRate 0.000083 Epoch: 29 Global Step: 614320 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:03:59,845-Speed 2498.88 samples/sec Loss 1.3863 LearningRate 0.000083 Epoch: 29 Global Step: 614330 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:04:08,043-Speed 2498.46 samples/sec Loss 1.3934 LearningRate 0.000083 Epoch: 29 Global Step: 614340 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:04:16,197-Speed 2512.10 samples/sec Loss 1.3704 LearningRate 0.000083 Epoch: 29 Global Step: 614350 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:04:24,394-Speed 2498.91 samples/sec Loss 1.3821 LearningRate 0.000083 Epoch: 29 Global Step: 614360 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:04:32,592-Speed 2498.50 samples/sec Loss 1.3838 LearningRate 0.000083 Epoch: 29 Global Step: 614370 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:04:40,790-Speed 2498.64 samples/sec Loss 1.3513 LearningRate 0.000083 Epoch: 29 Global Step: 614380 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:04:48,994-Speed 2496.69 samples/sec Loss 1.3875 LearningRate 0.000083 Epoch: 29 Global Step: 614390 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:04:57,217-Speed 2490.82 samples/sec Loss 1.3548 LearningRate 0.000083 Epoch: 29 Global Step: 614400 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:05:05,363-Speed 2514.46 samples/sec Loss 1.3642 LearningRate 0.000083 Epoch: 29 Global Step: 614410 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:05:13,564-Speed 2497.81 samples/sec Loss 1.3621 LearningRate 0.000083 Epoch: 29 Global Step: 614420 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:05:21,772-Speed 2495.43 samples/sec Loss 1.4061 LearningRate 0.000083 Epoch: 29 Global Step: 614430 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:05:29,968-Speed 2499.20 samples/sec Loss 1.3722 LearningRate 0.000083 Epoch: 29 Global Step: 614440 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:05:38,172-Speed 2496.74 samples/sec Loss 1.3931 LearningRate 0.000083 Epoch: 29 Global Step: 614450 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:05:46,375-Speed 2496.93 samples/sec Loss 1.3714 LearningRate 0.000083 Epoch: 29 Global Step: 614460 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:05:54,526-Speed 2512.93 samples/sec Loss 1.3638 LearningRate 0.000083 Epoch: 29 Global Step: 614470 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:06:02,726-Speed 2498.11 samples/sec Loss 1.4004 LearningRate 0.000083 Epoch: 29 Global Step: 614480 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:06:10,930-Speed 2496.83 samples/sec Loss 1.3731 LearningRate 0.000083 Epoch: 29 Global Step: 614490 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:06:19,132-Speed 2497.41 samples/sec Loss 1.3908 LearningRate 0.000083 Epoch: 29 Global Step: 614500 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:06:27,334-Speed 2497.24 samples/sec Loss 1.3433 LearningRate 0.000083 Epoch: 29 Global Step: 614510 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:06:35,554-Speed 2491.86 samples/sec Loss 1.3811 LearningRate 0.000083 Epoch: 29 Global Step: 614520 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:06:43,699-Speed 2514.74 samples/sec Loss 1.3636 LearningRate 0.000083 Epoch: 29 Global Step: 614530 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:06:51,900-Speed 2497.87 samples/sec Loss 1.3955 LearningRate 0.000083 Epoch: 29 Global Step: 614540 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:07:00,105-Speed 2496.52 samples/sec Loss 1.3607 LearningRate 0.000083 Epoch: 29 Global Step: 614550 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:07:08,307-Speed 2497.21 samples/sec Loss 1.3956 LearningRate 0.000083 Epoch: 29 Global Step: 614560 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:07:16,506-Speed 2498.22 samples/sec Loss 1.3563 LearningRate 0.000083 Epoch: 29 Global Step: 614570 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:07:24,705-Speed 2498.11 samples/sec Loss 1.3963 LearningRate 0.000083 Epoch: 29 Global Step: 614580 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:07:32,850-Speed 2514.95 samples/sec Loss 1.3910 LearningRate 0.000083 Epoch: 29 Global Step: 614590 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:07:41,054-Speed 2496.86 samples/sec Loss 1.3804 LearningRate 0.000083 Epoch: 29 Global Step: 614600 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:07:49,256-Speed 2497.34 samples/sec Loss 1.3852 LearningRate 0.000083 Epoch: 29 Global Step: 614610 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:07:57,457-Speed 2497.59 samples/sec Loss 1.3559 LearningRate 0.000083 Epoch: 29 Global Step: 614620 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:08:05,659-Speed 2497.36 samples/sec Loss 1.3758 LearningRate 0.000083 Epoch: 29 Global Step: 614630 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:08:13,861-Speed 2497.36 samples/sec Loss 1.3873 LearningRate 0.000083 Epoch: 29 Global Step: 614640 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:08:22,007-Speed 2514.35 samples/sec Loss 1.3606 LearningRate 0.000083 Epoch: 29 Global Step: 614650 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:08:30,212-Speed 2496.70 samples/sec Loss 1.3756 LearningRate 0.000083 Epoch: 29 Global Step: 614660 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:08:38,468-Speed 2499.43 samples/sec Loss 1.3715 LearningRate 0.000083 Epoch: 29 Global Step: 614670 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:08:46,671-Speed 2496.72 samples/sec Loss 1.3903 LearningRate 0.000083 Epoch: 29 Global Step: 614680 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:08:54,888-Speed 2498.73 samples/sec Loss 1.3854 LearningRate 0.000083 Epoch: 29 Global Step: 614690 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:09:04,759-Speed 2167.62 samples/sec Loss 1.3641 LearningRate 0.000083 Epoch: 29 Global Step: 614700 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:09:13,402-Speed 2516.03 samples/sec Loss 1.3848 LearningRate 0.000083 Epoch: 29 Global Step: 614710 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:09:21,603-Speed 2497.69 samples/sec Loss 1.3852 LearningRate 0.000083 Epoch: 29 Global Step: 614720 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:09:29,822-Speed 2492.14 samples/sec Loss 1.4011 LearningRate 0.000083 Epoch: 29 Global Step: 614730 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:09:38,035-Speed 2499.96 samples/sec Loss 1.3690 LearningRate 0.000083 Epoch: 29 Global Step: 614740 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:09:46,284-Speed 2497.14 samples/sec Loss 1.3519 LearningRate 0.000083 Epoch: 29 Global Step: 614750 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:09:55,008-Speed 2347.86 samples/sec Loss 1.4222 LearningRate 0.000083 Epoch: 29 Global Step: 614760 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:10:03,152-Speed 2514.88 samples/sec Loss 1.3680 LearningRate 0.000083 Epoch: 29 Global Step: 614770 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:10:11,999-Speed 2497.07 samples/sec Loss 1.3950 LearningRate 0.000083 Epoch: 29 Global Step: 614780 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:10:21,630-Speed 2487.38 samples/sec Loss 1.3705 LearningRate 0.000083 Epoch: 29 Global Step: 614790 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:10:29,827-Speed 2498.68 samples/sec Loss 1.4043 LearningRate 0.000083 Epoch: 29 Global Step: 614800 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:10:43,075-Speed 1546.04 samples/sec Loss 1.3832 LearningRate 0.000083 Epoch: 29 Global Step: 614810 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:10:51,644-Speed 2494.88 samples/sec Loss 1.3776 LearningRate 0.000083 Epoch: 29 Global Step: 614820 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:10:59,819-Speed 2517.10 samples/sec Loss 1.3426 LearningRate 0.000083 Epoch: 29 Global Step: 614830 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:11:08,013-Speed 2499.71 samples/sec Loss 1.4004 LearningRate 0.000083 Epoch: 29 Global Step: 614840 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:11:16,237-Speed 2500.87 samples/sec Loss 1.3807 LearningRate 0.000083 Epoch: 29 Global Step: 614850 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:11:28,594-Speed 1738.54 samples/sec Loss 1.3673 LearningRate 0.000083 Epoch: 29 Global Step: 614860 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:11:36,991-Speed 2449.21 samples/sec Loss 1.3548 LearningRate 0.000083 Epoch: 29 Global Step: 614870 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:11:45,671-Speed 2359.61 samples/sec Loss 1.3875 LearningRate 0.000083 Epoch: 29 Global Step: 614880 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:11:57,960-Speed 2394.52 samples/sec Loss 1.3767 LearningRate 0.000083 Epoch: 29 Global Step: 614890 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:12:06,719-Speed 2393.24 samples/sec Loss 1.3721 LearningRate 0.000083 Epoch: 29 Global Step: 614900 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:12:14,914-Speed 2499.27 samples/sec Loss 1.3908 LearningRate 0.000083 Epoch: 29 Global Step: 614910 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:12:23,108-Speed 2499.75 samples/sec Loss 1.3658 LearningRate 0.000083 Epoch: 29 Global Step: 614920 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:12:31,303-Speed 2499.40 samples/sec Loss 1.3856 LearningRate 0.000083 Epoch: 29 Global Step: 614930 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:12:39,504-Speed 2497.72 samples/sec Loss 1.3567 LearningRate 0.000083 Epoch: 29 Global Step: 614940 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:12:47,649-Speed 2514.85 samples/sec Loss 1.3881 LearningRate 0.000083 Epoch: 29 Global Step: 614950 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:12:55,845-Speed 2499.11 samples/sec Loss 1.3582 LearningRate 0.000083 Epoch: 29 Global Step: 614960 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:13:04,050-Speed 2496.83 samples/sec Loss 1.4035 LearningRate 0.000083 Epoch: 29 Global Step: 614970 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:13:12,252-Speed 2497.45 samples/sec Loss 1.3822 LearningRate 0.000083 Epoch: 29 Global Step: 614980 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:13:20,453-Speed 2497.64 samples/sec Loss 1.3856 LearningRate 0.000083 Epoch: 29 Global Step: 614990 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:13:28,653-Speed 2498.26 samples/sec Loss 1.3496 LearningRate 0.000083 Epoch: 29 Global Step: 615000 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:13:36,804-Speed 2512.87 samples/sec Loss 1.3598 LearningRate 0.000083 Epoch: 29 Global Step: 615010 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:13:45,006-Speed 2497.22 samples/sec Loss 1.3959 LearningRate 0.000083 Epoch: 29 Global Step: 615020 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:13:53,206-Speed 2498.14 samples/sec Loss 1.3744 LearningRate 0.000083 Epoch: 29 Global Step: 615030 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:14:01,406-Speed 2497.95 samples/sec Loss 1.3767 LearningRate 0.000083 Epoch: 29 Global Step: 615040 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:14:09,604-Speed 2498.51 samples/sec Loss 1.3972 LearningRate 0.000083 Epoch: 29 Global Step: 615050 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:14:17,805-Speed 2497.62 samples/sec Loss 1.3529 LearningRate 0.000083 Epoch: 29 Global Step: 615060 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:14:25,949-Speed 2514.94 samples/sec Loss 1.3613 LearningRate 0.000083 Epoch: 29 Global Step: 615070 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:14:34,151-Speed 2497.42 samples/sec Loss 1.4091 LearningRate 0.000083 Epoch: 29 Global Step: 615080 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:14:42,348-Speed 2498.86 samples/sec Loss 1.3657 LearningRate 0.000083 Epoch: 29 Global Step: 615090 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:14:50,560-Speed 2494.44 samples/sec Loss 1.4026 LearningRate 0.000083 Epoch: 29 Global Step: 615100 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:14:58,763-Speed 2497.00 samples/sec Loss 1.3729 LearningRate 0.000083 Epoch: 29 Global Step: 615110 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:15:06,964-Speed 2497.58 samples/sec Loss 1.3648 LearningRate 0.000082 Epoch: 29 Global Step: 615120 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 11:15:15,115-Speed 2513.11 samples/sec Loss 1.3830 LearningRate 0.000082 Epoch: 29 Global Step: 615130 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:15:23,320-Speed 2496.29 samples/sec Loss 1.4389 LearningRate 0.000082 Epoch: 29 Global Step: 615140 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:15:31,535-Speed 2493.46 samples/sec Loss 1.4080 LearningRate 0.000082 Epoch: 29 Global Step: 615150 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:15:39,740-Speed 2496.54 samples/sec Loss 1.3780 LearningRate 0.000082 Epoch: 29 Global Step: 615160 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:15:47,956-Speed 2493.06 samples/sec Loss 1.3514 LearningRate 0.000082 Epoch: 29 Global Step: 615170 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:15:56,159-Speed 2497.02 samples/sec Loss 1.3732 LearningRate 0.000082 Epoch: 29 Global Step: 615180 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:16:04,309-Speed 2513.15 samples/sec Loss 1.3707 LearningRate 0.000082 Epoch: 29 Global Step: 615190 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:16:12,512-Speed 2496.90 samples/sec Loss 1.3848 LearningRate 0.000082 Epoch: 29 Global Step: 615200 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:16:20,712-Speed 2497.88 samples/sec Loss 1.3654 LearningRate 0.000082 Epoch: 29 Global Step: 615210 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:16:28,910-Speed 2498.66 samples/sec Loss 1.3629 LearningRate 0.000082 Epoch: 29 Global Step: 615220 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:16:37,107-Speed 2498.86 samples/sec Loss 1.3868 LearningRate 0.000082 Epoch: 29 Global Step: 615230 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:16:45,307-Speed 2497.87 samples/sec Loss 1.3798 LearningRate 0.000082 Epoch: 29 Global Step: 615240 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 11:16:53,453-Speed 2514.64 samples/sec Loss 1.3638 LearningRate 0.000082 Epoch: 29 Global Step: 615250 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:17:01,654-Speed 2497.53 samples/sec Loss 1.3837 LearningRate 0.000082 Epoch: 29 Global Step: 615260 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:17:09,853-Speed 2498.47 samples/sec Loss 1.3752 LearningRate 0.000082 Epoch: 29 Global Step: 615270 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:17:18,051-Speed 2498.43 samples/sec Loss 1.3688 LearningRate 0.000082 Epoch: 29 Global Step: 615280 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:17:26,256-Speed 2496.41 samples/sec Loss 1.3542 LearningRate 0.000082 Epoch: 29 Global Step: 615290 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:17:34,456-Speed 2497.93 samples/sec Loss 1.3531 LearningRate 0.000082 Epoch: 29 Global Step: 615300 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:17:42,603-Speed 2514.32 samples/sec Loss 1.3647 LearningRate 0.000082 Epoch: 29 Global Step: 615310 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:17:50,803-Speed 2497.80 samples/sec Loss 1.3774 LearningRate 0.000082 Epoch: 29 Global Step: 615320 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:17:59,006-Speed 2496.99 samples/sec Loss 1.3301 LearningRate 0.000082 Epoch: 29 Global Step: 615330 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:18:07,172-Speed 2508.39 samples/sec Loss 1.3651 LearningRate 0.000082 Epoch: 29 Global Step: 615340 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:18:15,371-Speed 2498.53 samples/sec Loss 1.3605 LearningRate 0.000082 Epoch: 29 Global Step: 615350 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:18:23,571-Speed 2497.92 samples/sec Loss 1.3958 LearningRate 0.000082 Epoch: 29 Global Step: 615360 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:18:31,730-Speed 2510.44 samples/sec Loss 1.3701 LearningRate 0.000082 Epoch: 29 Global Step: 615370 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:18:39,930-Speed 2498.16 samples/sec Loss 1.3988 LearningRate 0.000082 Epoch: 29 Global Step: 615380 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:18:48,132-Speed 2497.46 samples/sec Loss 1.3565 LearningRate 0.000082 Epoch: 29 Global Step: 615390 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:18:56,336-Speed 2496.86 samples/sec Loss 1.3611 LearningRate 0.000082 Epoch: 29 Global Step: 615400 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:19:04,538-Speed 2497.37 samples/sec Loss 1.3738 LearningRate 0.000082 Epoch: 29 Global Step: 615410 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:19:12,742-Speed 2496.68 samples/sec Loss 1.4042 LearningRate 0.000082 Epoch: 29 Global Step: 615420 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:19:20,890-Speed 2513.86 samples/sec Loss 1.4046 LearningRate 0.000082 Epoch: 29 Global Step: 615430 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:19:29,094-Speed 2497.07 samples/sec Loss 1.3558 LearningRate 0.000082 Epoch: 29 Global Step: 615440 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:19:37,298-Speed 2497.01 samples/sec Loss 1.3906 LearningRate 0.000082 Epoch: 29 Global Step: 615450 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:19:45,497-Speed 2498.20 samples/sec Loss 1.4112 LearningRate 0.000082 Epoch: 29 Global Step: 615460 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:19:53,699-Speed 2497.35 samples/sec Loss 1.3727 LearningRate 0.000082 Epoch: 29 Global Step: 615470 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:20:01,902-Speed 2497.17 samples/sec Loss 1.3758 LearningRate 0.000082 Epoch: 29 Global Step: 615480 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:20:10,050-Speed 2513.84 samples/sec Loss 1.3547 LearningRate 0.000082 Epoch: 29 Global Step: 615490 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:20:18,253-Speed 2497.24 samples/sec Loss 1.3859 LearningRate 0.000082 Epoch: 29 Global Step: 615500 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:20:26,454-Speed 2497.67 samples/sec Loss 1.4058 LearningRate 0.000082 Epoch: 29 Global Step: 615510 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:20:34,657-Speed 2497.09 samples/sec Loss 1.3970 LearningRate 0.000082 Epoch: 29 Global Step: 615520 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:20:42,862-Speed 2496.38 samples/sec Loss 1.3663 LearningRate 0.000082 Epoch: 29 Global Step: 615530 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:20:51,063-Speed 2497.55 samples/sec Loss 1.3683 LearningRate 0.000082 Epoch: 29 Global Step: 615540 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:20:59,209-Speed 2514.47 samples/sec Loss 1.3576 LearningRate 0.000082 Epoch: 29 Global Step: 615550 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:21:07,414-Speed 2496.74 samples/sec Loss 1.3627 LearningRate 0.000082 Epoch: 29 Global Step: 615560 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:21:15,617-Speed 2496.86 samples/sec Loss 1.3868 LearningRate 0.000082 Epoch: 29 Global Step: 615570 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:21:23,822-Speed 2496.42 samples/sec Loss 1.3626 LearningRate 0.000082 Epoch: 29 Global Step: 615580 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:21:32,035-Speed 2494.27 samples/sec Loss 1.3899 LearningRate 0.000082 Epoch: 29 Global Step: 615590 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:21:40,236-Speed 2497.69 samples/sec Loss 1.3828 LearningRate 0.000082 Epoch: 29 Global Step: 615600 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:21:48,386-Speed 2513.19 samples/sec Loss 1.3738 LearningRate 0.000082 Epoch: 29 Global Step: 615610 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:21:56,589-Speed 2496.86 samples/sec Loss 1.3824 LearningRate 0.000082 Epoch: 29 Global Step: 615620 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:22:04,816-Speed 2490.20 samples/sec Loss 1.3902 LearningRate 0.000082 Epoch: 29 Global Step: 615630 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:22:13,016-Speed 2497.85 samples/sec Loss 1.3242 LearningRate 0.000082 Epoch: 29 Global Step: 615640 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:22:21,217-Speed 2497.53 samples/sec Loss 1.3695 LearningRate 0.000082 Epoch: 29 Global Step: 615650 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:22:29,418-Speed 2497.68 samples/sec Loss 1.4093 LearningRate 0.000082 Epoch: 29 Global Step: 615660 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:22:37,567-Speed 2513.76 samples/sec Loss 1.3761 LearningRate 0.000082 Epoch: 29 Global Step: 615670 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:22:45,768-Speed 2497.39 samples/sec Loss 1.3653 LearningRate 0.000082 Epoch: 29 Global Step: 615680 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:22:53,976-Speed 2495.46 samples/sec Loss 1.3553 LearningRate 0.000082 Epoch: 29 Global Step: 615690 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:23:02,177-Speed 2498.11 samples/sec Loss 1.3580 LearningRate 0.000082 Epoch: 29 Global Step: 615700 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:23:10,377-Speed 2497.98 samples/sec Loss 1.3762 LearningRate 0.000082 Epoch: 29 Global Step: 615710 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:23:18,580-Speed 2497.14 samples/sec Loss 1.3723 LearningRate 0.000082 Epoch: 29 Global Step: 615720 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:23:26,728-Speed 2514.00 samples/sec Loss 1.3656 LearningRate 0.000082 Epoch: 29 Global Step: 615730 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:23:34,930-Speed 2497.51 samples/sec Loss 1.3797 LearningRate 0.000082 Epoch: 29 Global Step: 615740 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:23:43,136-Speed 2496.28 samples/sec Loss 1.3295 LearningRate 0.000082 Epoch: 29 Global Step: 615750 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:23:51,334-Speed 2498.54 samples/sec Loss 1.4032 LearningRate 0.000082 Epoch: 29 Global Step: 615760 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:23:59,535-Speed 2497.79 samples/sec Loss 1.3684 LearningRate 0.000082 Epoch: 29 Global Step: 615770 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:24:07,734-Speed 2498.21 samples/sec Loss 1.3751 LearningRate 0.000082 Epoch: 29 Global Step: 615780 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:24:15,886-Speed 2512.49 samples/sec Loss 1.3783 LearningRate 0.000082 Epoch: 29 Global Step: 615790 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:24:24,099-Speed 2494.11 samples/sec Loss 1.3859 LearningRate 0.000082 Epoch: 29 Global Step: 615800 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:24:32,302-Speed 2497.27 samples/sec Loss 1.3617 LearningRate 0.000082 Epoch: 29 Global Step: 615810 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:24:40,504-Speed 2497.48 samples/sec Loss 1.3623 LearningRate 0.000082 Epoch: 29 Global Step: 615820 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:24:48,706-Speed 2497.16 samples/sec Loss 1.3711 LearningRate 0.000082 Epoch: 29 Global Step: 615830 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:24:56,906-Speed 2498.03 samples/sec Loss 1.3546 LearningRate 0.000082 Epoch: 29 Global Step: 615840 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:25:05,052-Speed 2514.71 samples/sec Loss 1.3609 LearningRate 0.000082 Epoch: 29 Global Step: 615850 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:25:13,253-Speed 2497.47 samples/sec Loss 1.4076 LearningRate 0.000082 Epoch: 29 Global Step: 615860 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:25:21,454-Speed 2497.59 samples/sec Loss 1.4040 LearningRate 0.000082 Epoch: 29 Global Step: 615870 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:25:29,666-Speed 2494.25 samples/sec Loss 1.4195 LearningRate 0.000082 Epoch: 29 Global Step: 615880 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:25:37,866-Speed 2497.97 samples/sec Loss 1.3607 LearningRate 0.000082 Epoch: 29 Global Step: 615890 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:25:46,068-Speed 2497.32 samples/sec Loss 1.3423 LearningRate 0.000082 Epoch: 29 Global Step: 615900 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:25:54,218-Speed 2513.24 samples/sec Loss 1.3563 LearningRate 0.000082 Epoch: 29 Global Step: 615910 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:26:02,417-Speed 2498.82 samples/sec Loss 1.3930 LearningRate 0.000082 Epoch: 29 Global Step: 615920 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:26:10,617-Speed 2497.87 samples/sec Loss 1.3876 LearningRate 0.000082 Epoch: 29 Global Step: 615930 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:26:18,817-Speed 2498.17 samples/sec Loss 1.3857 LearningRate 0.000082 Epoch: 29 Global Step: 615940 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:26:27,018-Speed 2497.55 samples/sec Loss 1.3964 LearningRate 0.000082 Epoch: 29 Global Step: 615950 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:26:35,231-Speed 2494.18 samples/sec Loss 1.3892 LearningRate 0.000082 Epoch: 29 Global Step: 615960 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:26:43,378-Speed 2514.25 samples/sec Loss 1.3569 LearningRate 0.000082 Epoch: 29 Global Step: 615970 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:26:51,578-Speed 2497.89 samples/sec Loss 1.3906 LearningRate 0.000082 Epoch: 29 Global Step: 615980 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:26:59,799-Speed 2491.57 samples/sec Loss 1.4036 LearningRate 0.000082 Epoch: 29 Global Step: 615990 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:27:07,999-Speed 2497.94 samples/sec Loss 1.3770 LearningRate 0.000082 Epoch: 29 Global Step: 616000 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:27:16,203-Speed 2496.66 samples/sec Loss 1.3557 LearningRate 0.000082 Epoch: 29 Global Step: 616010 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:27:24,403-Speed 2497.86 samples/sec Loss 1.3190 LearningRate 0.000082 Epoch: 29 Global Step: 616020 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:27:32,555-Speed 2512.71 samples/sec Loss 1.3491 LearningRate 0.000082 Epoch: 29 Global Step: 616030 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:27:40,755-Speed 2497.96 samples/sec Loss 1.3757 LearningRate 0.000082 Epoch: 29 Global Step: 616040 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:27:48,956-Speed 2497.67 samples/sec Loss 1.3641 LearningRate 0.000082 Epoch: 29 Global Step: 616050 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:27:57,155-Speed 2498.41 samples/sec Loss 1.3757 LearningRate 0.000082 Epoch: 29 Global Step: 616060 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:28:05,359-Speed 2496.92 samples/sec Loss 1.3420 LearningRate 0.000082 Epoch: 29 Global Step: 616070 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:28:13,558-Speed 2498.11 samples/sec Loss 1.3541 LearningRate 0.000082 Epoch: 29 Global Step: 616080 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:28:21,704-Speed 2514.53 samples/sec Loss 1.3657 LearningRate 0.000082 Epoch: 29 Global Step: 616090 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:28:29,914-Speed 2494.70 samples/sec Loss 1.3589 LearningRate 0.000082 Epoch: 29 Global Step: 616100 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:28:38,125-Speed 2494.53 samples/sec Loss 1.3774 LearningRate 0.000082 Epoch: 29 Global Step: 616110 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:28:46,335-Speed 2494.97 samples/sec Loss 1.3463 LearningRate 0.000082 Epoch: 29 Global Step: 616120 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:28:54,532-Speed 2498.79 samples/sec Loss 1.4067 LearningRate 0.000082 Epoch: 29 Global Step: 616130 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:29:02,746-Speed 2493.82 samples/sec Loss 1.3787 LearningRate 0.000082 Epoch: 29 Global Step: 616140 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:29:10,887-Speed 2516.30 samples/sec Loss 1.3905 LearningRate 0.000082 Epoch: 29 Global Step: 616150 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:29:19,088-Speed 2497.62 samples/sec Loss 1.3766 LearningRate 0.000082 Epoch: 29 Global Step: 616160 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:29:27,289-Speed 2497.54 samples/sec Loss 1.3880 LearningRate 0.000082 Epoch: 29 Global Step: 616170 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:29:35,490-Speed 2497.81 samples/sec Loss 1.3835 LearningRate 0.000082 Epoch: 29 Global Step: 616180 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:29:43,691-Speed 2497.64 samples/sec Loss 1.3945 LearningRate 0.000082 Epoch: 29 Global Step: 616190 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:29:51,890-Speed 2498.26 samples/sec Loss 1.3833 LearningRate 0.000082 Epoch: 29 Global Step: 616200 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:30:00,036-Speed 2514.60 samples/sec Loss 1.3835 LearningRate 0.000082 Epoch: 29 Global Step: 616210 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:30:08,240-Speed 2496.53 samples/sec Loss 1.3877 LearningRate 0.000082 Epoch: 29 Global Step: 616220 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:30:16,442-Speed 2497.32 samples/sec Loss 1.3652 LearningRate 0.000082 Epoch: 29 Global Step: 616230 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:30:24,647-Speed 2497.01 samples/sec Loss 1.3942 LearningRate 0.000082 Epoch: 29 Global Step: 616240 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:30:32,853-Speed 2496.01 samples/sec Loss 1.3546 LearningRate 0.000082 Epoch: 29 Global Step: 616250 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:30:41,053-Speed 2497.83 samples/sec Loss 1.3971 LearningRate 0.000082 Epoch: 29 Global Step: 616260 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:30:49,202-Speed 2513.81 samples/sec Loss 1.3878 LearningRate 0.000082 Epoch: 29 Global Step: 616270 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:30:57,416-Speed 2493.64 samples/sec Loss 1.3949 LearningRate 0.000082 Epoch: 29 Global Step: 616280 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:31:05,620-Speed 2496.75 samples/sec Loss 1.3914 LearningRate 0.000082 Epoch: 29 Global Step: 616290 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:31:13,830-Speed 2494.78 samples/sec Loss 1.3814 LearningRate 0.000082 Epoch: 29 Global Step: 616300 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:31:22,034-Speed 2496.94 samples/sec Loss 1.3707 LearningRate 0.000082 Epoch: 29 Global Step: 616310 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:31:30,239-Speed 2496.40 samples/sec Loss 1.3686 LearningRate 0.000082 Epoch: 29 Global Step: 616320 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:31:38,395-Speed 2511.45 samples/sec Loss 1.3887 LearningRate 0.000082 Epoch: 29 Global Step: 616330 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:31:46,597-Speed 2497.70 samples/sec Loss 1.3866 LearningRate 0.000082 Epoch: 29 Global Step: 616340 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:31:54,800-Speed 2496.93 samples/sec Loss 1.4331 LearningRate 0.000082 Epoch: 29 Global Step: 616350 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:32:03,002-Speed 2497.13 samples/sec Loss 1.3711 LearningRate 0.000082 Epoch: 29 Global Step: 616360 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:32:11,203-Speed 2497.60 samples/sec Loss 1.3702 LearningRate 0.000082 Epoch: 29 Global Step: 616370 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:32:19,403-Speed 2498.35 samples/sec Loss 1.3612 LearningRate 0.000082 Epoch: 29 Global Step: 616380 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:32:27,552-Speed 2513.90 samples/sec Loss 1.3834 LearningRate 0.000082 Epoch: 29 Global Step: 616390 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:32:35,753-Speed 2497.53 samples/sec Loss 1.3327 LearningRate 0.000082 Epoch: 29 Global Step: 616400 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:32:43,965-Speed 2494.32 samples/sec Loss 1.3248 LearningRate 0.000082 Epoch: 29 Global Step: 616410 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:32:52,169-Speed 2496.84 samples/sec Loss 1.4070 LearningRate 0.000081 Epoch: 29 Global Step: 616420 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:33:00,367-Speed 2498.76 samples/sec Loss 1.3637 LearningRate 0.000081 Epoch: 29 Global Step: 616430 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:33:08,566-Speed 2498.31 samples/sec Loss 1.3698 LearningRate 0.000081 Epoch: 29 Global Step: 616440 Fp16 Grad Scale: 16384 Required: 49 hours 
Training: 2022-07-11 11:33:16,714-Speed 2513.68 samples/sec Loss 1.3668 LearningRate 0.000081 Epoch: 29 Global Step: 616450 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:33:24,916-Speed 2497.38 samples/sec Loss 1.3531 LearningRate 0.000081 Epoch: 29 Global Step: 616460 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:33:33,116-Speed 2497.83 samples/sec Loss 1.3793 LearningRate 0.000081 Epoch: 29 Global Step: 616470 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:33:41,320-Speed 2497.27 samples/sec Loss 1.3622 LearningRate 0.000081 Epoch: 29 Global Step: 616480 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:33:49,521-Speed 2497.72 samples/sec Loss 1.3987 LearningRate 0.000081 Epoch: 29 Global Step: 616490 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:33:57,719-Speed 2498.50 samples/sec Loss 1.3613 LearningRate 0.000081 Epoch: 29 Global Step: 616500 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:34:05,865-Speed 2514.42 samples/sec Loss 1.4106 LearningRate 0.000081 Epoch: 29 Global Step: 616510 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:34:14,064-Speed 2498.09 samples/sec Loss 1.3864 LearningRate 0.000081 Epoch: 29 Global Step: 616520 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:34:22,265-Speed 2497.93 samples/sec Loss 1.3738 LearningRate 0.000081 Epoch: 29 Global Step: 616530 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-07-11 11:34:30,464-Speed 2498.25 samples/sec Loss 1.3793 LearningRate 0.000081 Epoch: 29 Global Step: 616540 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:34:38,663-Speed 2498.29 samples/sec Loss 1.3964 LearningRate 0.000081 Epoch: 29 Global Step: 616550 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:34:46,862-Speed 2498.00 samples/sec Loss 1.3713 LearningRate 0.000081 Epoch: 29 Global Step: 616560 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 11:34:55,010-Speed 2513.91 samples/sec Loss 1.3731 LearningRate 0.000081 Epoch: 29 Global Step: 616570 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:35:03,210-Speed 2498.08 samples/sec Loss 1.3536 LearningRate 0.000081 Epoch: 29 Global Step: 616580 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:35:11,414-Speed 2496.91 samples/sec Loss 1.3987 LearningRate 0.000081 Epoch: 29 Global Step: 616590 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:35:19,613-Speed 2497.98 samples/sec Loss 1.3268 LearningRate 0.000081 Epoch: 29 Global Step: 616600 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:35:27,814-Speed 2497.82 samples/sec Loss 1.4005 LearningRate 0.000081 Epoch: 29 Global Step: 616610 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:35:36,014-Speed 2497.85 samples/sec Loss 1.3871 LearningRate 0.000081 Epoch: 29 Global Step: 616620 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:35:44,163-Speed 2513.83 samples/sec Loss 1.3533 LearningRate 0.000081 Epoch: 29 Global Step: 616630 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:35:52,363-Speed 2498.00 samples/sec Loss 1.4028 LearningRate 0.000081 Epoch: 29 Global Step: 616640 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:36:00,563-Speed 2497.78 samples/sec Loss 1.3937 LearningRate 0.000081 Epoch: 29 Global Step: 616650 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:36:08,768-Speed 2496.67 samples/sec Loss 1.3676 LearningRate 0.000081 Epoch: 29 Global Step: 616660 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:36:16,966-Speed 2498.53 samples/sec Loss 1.3832 LearningRate 0.000081 Epoch: 29 Global Step: 616670 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:36:25,164-Speed 2498.48 samples/sec Loss 1.3974 LearningRate 0.000081 Epoch: 29 Global Step: 616680 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 11:36:33,325-Speed 2509.78 samples/sec Loss 1.3529 LearningRate 0.000081 Epoch: 29 Global Step: 616690 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:36:41,540-Speed 2493.43 samples/sec Loss 1.3744 LearningRate 0.000081 Epoch: 29 Global Step: 616700 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:36:49,757-Speed 2492.94 samples/sec Loss 1.3851 LearningRate 0.000081 Epoch: 29 Global Step: 616710 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:36:57,953-Speed 2499.08 samples/sec Loss 1.3810 LearningRate 0.000081 Epoch: 29 Global Step: 616720 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:37:06,152-Speed 2498.13 samples/sec Loss 1.3899 LearningRate 0.000081 Epoch: 29 Global Step: 616730 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:37:14,353-Speed 2497.71 samples/sec Loss 1.3793 LearningRate 0.000081 Epoch: 29 Global Step: 616740 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:37:22,513-Speed 2510.20 samples/sec Loss 1.3355 LearningRate 0.000081 Epoch: 29 Global Step: 616750 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:37:30,714-Speed 2497.37 samples/sec Loss 1.3786 LearningRate 0.000081 Epoch: 29 Global Step: 616760 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:37:38,914-Speed 2498.00 samples/sec Loss 1.3745 LearningRate 0.000081 Epoch: 29 Global Step: 616770 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:37:47,129-Speed 2493.78 samples/sec Loss 1.3462 LearningRate 0.000081 Epoch: 29 Global Step: 616780 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:37:55,340-Speed 2494.81 samples/sec Loss 1.3557 LearningRate 0.000081 Epoch: 29 Global Step: 616790 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:38:03,540-Speed 2497.98 samples/sec Loss 1.3761 LearningRate 0.000081 Epoch: 29 Global Step: 616800 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 11:38:11,701-Speed 2510.08 samples/sec Loss 1.3831 LearningRate 0.000081 Epoch: 29 Global Step: 616810 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:38:19,899-Speed 2498.44 samples/sec Loss 1.3841 LearningRate 0.000081 Epoch: 29 Global Step: 616820 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:38:28,107-Speed 2495.35 samples/sec Loss 1.3397 LearningRate 0.000081 Epoch: 29 Global Step: 616830 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:38:36,311-Speed 2496.88 samples/sec Loss 1.3729 LearningRate 0.000081 Epoch: 29 Global Step: 616840 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:38:44,509-Speed 2498.72 samples/sec Loss 1.3579 LearningRate 0.000081 Epoch: 29 Global Step: 616850 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:38:52,707-Speed 2498.24 samples/sec Loss 1.3718 LearningRate 0.000081 Epoch: 29 Global Step: 616860 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:39:00,860-Speed 2512.44 samples/sec Loss 1.3472 LearningRate 0.000081 Epoch: 29 Global Step: 616870 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:39:09,062-Speed 2497.53 samples/sec Loss 1.3935 LearningRate 0.000081 Epoch: 29 Global Step: 616880 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:39:17,263-Speed 2497.69 samples/sec Loss 1.3447 LearningRate 0.000081 Epoch: 29 Global Step: 616890 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:39:25,462-Speed 2498.20 samples/sec Loss 1.3732 LearningRate 0.000081 Epoch: 29 Global Step: 616900 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:39:33,664-Speed 2497.31 samples/sec Loss 1.3887 LearningRate 0.000081 Epoch: 29 Global Step: 616910 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:39:41,864-Speed 2497.97 samples/sec Loss 1.3704 LearningRate 0.000081 Epoch: 29 Global Step: 616920 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 11:39:50,013-Speed 2514.07 samples/sec Loss 1.3825 LearningRate 0.000081 Epoch: 29 Global Step: 616930 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:39:58,212-Speed 2498.07 samples/sec Loss 1.3818 LearningRate 0.000081 Epoch: 29 Global Step: 616940 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:40:06,412-Speed 2497.86 samples/sec Loss 1.3909 LearningRate 0.000081 Epoch: 29 Global Step: 616950 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:40:14,612-Speed 2498.17 samples/sec Loss 1.3705 LearningRate 0.000081 Epoch: 29 Global Step: 616960 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:40:22,811-Speed 2498.06 samples/sec Loss 1.3550 LearningRate 0.000081 Epoch: 29 Global Step: 616970 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:40:31,012-Speed 2497.66 samples/sec Loss 1.3685 LearningRate 0.000081 Epoch: 29 Global Step: 616980 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:40:39,164-Speed 2512.57 samples/sec Loss 1.3932 LearningRate 0.000081 Epoch: 29 Global Step: 616990 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:40:47,365-Speed 2497.75 samples/sec Loss 1.3867 LearningRate 0.000081 Epoch: 29 Global Step: 617000 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:40:55,568-Speed 2497.15 samples/sec Loss 1.3853 LearningRate 0.000081 Epoch: 29 Global Step: 617010 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:41:03,768-Speed 2498.16 samples/sec Loss 1.3808 LearningRate 0.000081 Epoch: 29 Global Step: 617020 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:41:11,969-Speed 2497.51 samples/sec Loss 1.3970 LearningRate 0.000081 Epoch: 29 Global Step: 617030 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:41:20,170-Speed 2497.61 samples/sec Loss 1.3925 LearningRate 0.000081 Epoch: 29 Global Step: 617040 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 11:41:28,317-Speed 2514.21 samples/sec Loss 1.3523 LearningRate 0.000081 Epoch: 29 Global Step: 617050 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:41:36,516-Speed 2498.45 samples/sec Loss 1.3699 LearningRate 0.000081 Epoch: 29 Global Step: 617060 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:41:44,719-Speed 2496.87 samples/sec Loss 1.3717 LearningRate 0.000081 Epoch: 29 Global Step: 617070 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:41:52,920-Speed 2497.85 samples/sec Loss 1.3404 LearningRate 0.000081 Epoch: 29 Global Step: 617080 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:42:01,124-Speed 2496.76 samples/sec Loss 1.3361 LearningRate 0.000081 Epoch: 29 Global Step: 617090 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:42:09,329-Speed 2496.50 samples/sec Loss 1.3313 LearningRate 0.000081 Epoch: 29 Global Step: 617100 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:42:17,479-Speed 2513.25 samples/sec Loss 1.3713 LearningRate 0.000081 Epoch: 29 Global Step: 617110 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:42:25,683-Speed 2496.88 samples/sec Loss 1.3213 LearningRate 0.000081 Epoch: 29 Global Step: 617120 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:42:33,887-Speed 2496.94 samples/sec Loss 1.3473 LearningRate 0.000081 Epoch: 29 Global Step: 617130 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:42:42,086-Speed 2498.18 samples/sec Loss 1.3582 LearningRate 0.000081 Epoch: 29 Global Step: 617140 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:42:50,289-Speed 2497.30 samples/sec Loss 1.3698 LearningRate 0.000081 Epoch: 29 Global Step: 617150 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:42:58,490-Speed 2497.52 samples/sec Loss 1.3520 LearningRate 0.000081 Epoch: 29 Global Step: 617160 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 11:43:06,637-Speed 2514.31 samples/sec Loss 1.3616 LearningRate 0.000081 Epoch: 29 Global Step: 617170 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:43:14,840-Speed 2497.36 samples/sec Loss 1.3511 LearningRate 0.000081 Epoch: 29 Global Step: 617180 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:43:23,042-Speed 2497.17 samples/sec Loss 1.3525 LearningRate 0.000081 Epoch: 29 Global Step: 617190 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:43:31,269-Speed 2489.89 samples/sec Loss 1.3492 LearningRate 0.000081 Epoch: 29 Global Step: 617200 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:43:39,471-Speed 2497.37 samples/sec Loss 1.3421 LearningRate 0.000081 Epoch: 29 Global Step: 617210 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:43:47,672-Speed 2497.56 samples/sec Loss 1.4056 LearningRate 0.000081 Epoch: 29 Global Step: 617220 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:43:55,820-Speed 2513.89 samples/sec Loss 1.3450 LearningRate 0.000081 Epoch: 29 Global Step: 617230 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:44:04,022-Speed 2497.33 samples/sec Loss 1.3318 LearningRate 0.000081 Epoch: 29 Global Step: 617240 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:44:12,222-Speed 2498.15 samples/sec Loss 1.3610 LearningRate 0.000081 Epoch: 29 Global Step: 617250 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:44:20,426-Speed 2496.63 samples/sec Loss 1.3319 LearningRate 0.000081 Epoch: 29 Global Step: 617260 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:44:28,625-Speed 2498.10 samples/sec Loss 1.3715 LearningRate 0.000081 Epoch: 29 Global Step: 617270 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:44:36,832-Speed 2495.99 samples/sec Loss 1.3840 LearningRate 0.000081 Epoch: 29 Global Step: 617280 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 11:44:44,981-Speed 2513.59 samples/sec Loss 1.3408 LearningRate 0.000081 Epoch: 29 Global Step: 617290 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:44:53,182-Speed 2497.73 samples/sec Loss 1.3619 LearningRate 0.000081 Epoch: 29 Global Step: 617300 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:45:01,385-Speed 2497.18 samples/sec Loss 1.3909 LearningRate 0.000081 Epoch: 29 Global Step: 617310 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:45:09,585-Speed 2497.94 samples/sec Loss 1.3452 LearningRate 0.000081 Epoch: 29 Global Step: 617320 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:45:17,806-Speed 2491.31 samples/sec Loss 1.3576 LearningRate 0.000081 Epoch: 29 Global Step: 617330 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:45:26,008-Speed 2497.41 samples/sec Loss 1.3536 LearningRate 0.000081 Epoch: 29 Global Step: 617340 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:45:34,155-Speed 2514.09 samples/sec Loss 1.3472 LearningRate 0.000081 Epoch: 29 Global Step: 617350 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:45:42,357-Speed 2497.35 samples/sec Loss 1.3940 LearningRate 0.000081 Epoch: 29 Global Step: 617360 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:45:50,559-Speed 2497.48 samples/sec Loss 1.3369 LearningRate 0.000081 Epoch: 29 Global Step: 617370 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:45:58,758-Speed 2498.21 samples/sec Loss 1.3703 LearningRate 0.000081 Epoch: 29 Global Step: 617380 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:46:06,958-Speed 2497.91 samples/sec Loss 1.3869 LearningRate 0.000081 Epoch: 29 Global Step: 617390 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:46:15,157-Speed 2498.09 samples/sec Loss 1.3944 LearningRate 0.000081 Epoch: 29 Global Step: 617400 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 11:46:23,304-Speed 2514.23 samples/sec Loss 1.3612 LearningRate 0.000081 Epoch: 29 Global Step: 617410 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:46:31,504-Speed 2497.86 samples/sec Loss 1.3655 LearningRate 0.000081 Epoch: 29 Global Step: 617420 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:46:39,705-Speed 2497.98 samples/sec Loss 1.3894 LearningRate 0.000081 Epoch: 29 Global Step: 617430 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:46:47,914-Speed 2495.11 samples/sec Loss 1.3211 LearningRate 0.000081 Epoch: 29 Global Step: 617440 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:46:56,136-Speed 2491.38 samples/sec Loss 1.3712 LearningRate 0.000081 Epoch: 29 Global Step: 617450 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:47:04,336-Speed 2497.90 samples/sec Loss 1.3666 LearningRate 0.000081 Epoch: 29 Global Step: 617460 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:47:12,483-Speed 2514.38 samples/sec Loss 1.3610 LearningRate 0.000081 Epoch: 29 Global Step: 617470 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:47:20,683-Speed 2497.88 samples/sec Loss 1.3849 LearningRate 0.000081 Epoch: 29 Global Step: 617480 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:47:28,884-Speed 2497.45 samples/sec Loss 1.4143 LearningRate 0.000081 Epoch: 29 Global Step: 617490 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:47:37,083-Speed 2498.68 samples/sec Loss 1.3863 LearningRate 0.000081 Epoch: 29 Global Step: 617500 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:47:45,288-Speed 2496.53 samples/sec Loss 1.4085 LearningRate 0.000081 Epoch: 29 Global Step: 617510 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:47:53,489-Speed 2497.59 samples/sec Loss 1.4294 LearningRate 0.000081 Epoch: 29 Global Step: 617520 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 11:48:01,639-Speed 2513.22 samples/sec Loss 1.3498 LearningRate 0.000081 Epoch: 29 Global Step: 617530 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:48:09,844-Speed 2496.50 samples/sec Loss 1.3906 LearningRate 0.000081 Epoch: 29 Global Step: 617540 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:48:18,043-Speed 2498.30 samples/sec Loss 1.3753 LearningRate 0.000081 Epoch: 29 Global Step: 617550 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:48:26,244-Speed 2497.76 samples/sec Loss 1.3666 LearningRate 0.000081 Epoch: 29 Global Step: 617560 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:48:34,441-Speed 2498.75 samples/sec Loss 1.3573 LearningRate 0.000081 Epoch: 29 Global Step: 617570 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:48:42,642-Speed 2497.55 samples/sec Loss 1.3811 LearningRate 0.000081 Epoch: 29 Global Step: 617580 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:48:50,788-Speed 2514.63 samples/sec Loss 1.4201 LearningRate 0.000081 Epoch: 29 Global Step: 617590 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:48:58,986-Speed 2498.36 samples/sec Loss 1.3344 LearningRate 0.000081 Epoch: 29 Global Step: 617600 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:49:07,192-Speed 2496.19 samples/sec Loss 1.3743 LearningRate 0.000081 Epoch: 29 Global Step: 617610 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:49:15,395-Speed 2497.05 samples/sec Loss 1.3929 LearningRate 0.000081 Epoch: 29 Global Step: 617620 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:49:23,603-Speed 2495.70 samples/sec Loss 1.3561 LearningRate 0.000081 Epoch: 29 Global Step: 617630 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:49:31,804-Speed 2497.35 samples/sec Loss 1.3238 LearningRate 0.000081 Epoch: 29 Global Step: 617640 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-07-11 11:49:39,954-Speed 2513.22 samples/sec Loss 1.3694 LearningRate 0.000081 Epoch: 29 Global Step: 617650 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:49:48,167-Speed 2494.05 samples/sec Loss 1.3744 LearningRate 0.000081 Epoch: 29 Global Step: 617660 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:49:56,367-Speed 2497.84 samples/sec Loss 1.4032 LearningRate 0.000081 Epoch: 29 Global Step: 617670 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:50:04,570-Speed 2496.84 samples/sec Loss 1.3774 LearningRate 0.000081 Epoch: 29 Global Step: 617680 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-07-11 11:50:12,785-Speed 2493.20 samples/sec Loss 1.3755 LearningRate 0.000081 Epoch: 29 Global Step: 617690 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:50:20,990-Speed 2496.44 samples/sec Loss 1.3794 LearningRate 0.000081 Epoch: 29 Global Step: 617700 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:50:29,143-Speed 2512.53 samples/sec Loss 1.3801 LearningRate 0.000081 Epoch: 29 Global Step: 617710 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:50:37,345-Speed 2497.21 samples/sec Loss 1.3691 LearningRate 0.000081 Epoch: 29 Global Step: 617720 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:50:45,548-Speed 2497.34 samples/sec Loss 1.3504 LearningRate 0.000081 Epoch: 29 Global Step: 617730 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:50:53,748-Speed 2498.04 samples/sec Loss 1.3559 LearningRate 0.000080 Epoch: 29 Global Step: 617740 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-07-11 11:51:01,946-Speed 2498.50 samples/sec Loss 1.3293 LearningRate 0.000080 Epoch: 29 Global Step: 617750 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-07-11 11:51:10,102-Speed 2511.49 samples/sec Loss 1.3578 LearningRate 0.000080 Epoch: 29 Global Step: 617760 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-07-11 11:51:18,264-Speed 2509.56 samples/sec Loss 1.3860 LearningRate 0.000080 Epoch: 29 Global Step: 617770 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:51:26,462-Speed 2498.61 samples/sec Loss 1.3769 LearningRate 0.000080 Epoch: 29 Global Step: 617780 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:51:34,664-Speed 2498.24 samples/sec Loss 1.3541 LearningRate 0.000080 Epoch: 29 Global Step: 617790 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:51:42,861-Speed 2498.69 samples/sec Loss 1.3924 LearningRate 0.000080 Epoch: 29 Global Step: 617800 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:51:51,066-Speed 2496.78 samples/sec Loss 1.3754 LearningRate 0.000080 Epoch: 29 Global Step: 617810 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:51:59,281-Speed 2493.59 samples/sec Loss 1.3823 LearningRate 0.000080 Epoch: 29 Global Step: 617820 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:52:07,430-Speed 2513.42 samples/sec Loss 1.3696 LearningRate 0.000080 Epoch: 29 Global Step: 617830 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:52:15,632-Speed 2497.80 samples/sec Loss 1.3691 LearningRate 0.000080 Epoch: 29 Global Step: 617840 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:52:23,833-Speed 2497.62 samples/sec Loss 1.3315 LearningRate 0.000080 Epoch: 29 Global Step: 617850 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:52:32,032-Speed 2497.97 samples/sec Loss 1.4006 LearningRate 0.000080 Epoch: 29 Global Step: 617860 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:52:40,235-Speed 2497.39 samples/sec Loss 1.3388 LearningRate 0.000080 Epoch: 29 Global Step: 617870 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:52:48,434-Speed 2498.63 samples/sec Loss 1.3671 LearningRate 0.000080 Epoch: 29 Global Step: 617880 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-07-11 11:52:56,585-Speed 2514.55 samples/sec Loss 1.3843 LearningRate 0.000080 Epoch: 29 Global Step: 617890 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:53:04,788-Speed 2497.20 samples/sec Loss 1.3765 LearningRate 0.000080 Epoch: 29 Global Step: 617900 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:53:12,987-Speed 2498.33 samples/sec Loss 1.3640 LearningRate 0.000080 Epoch: 29 Global Step: 617910 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:53:21,187-Speed 2497.78 samples/sec Loss 1.3336 LearningRate 0.000080 Epoch: 29 Global Step: 617920 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:53:29,391-Speed 2499.30 samples/sec Loss 1.3464 LearningRate 0.000080 Epoch: 29 Global Step: 617930 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:53:37,589-Speed 2498.43 samples/sec Loss 1.3841 LearningRate 0.000080 Epoch: 29 Global Step: 617940 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:53:45,736-Speed 2514.49 samples/sec Loss 1.3533 LearningRate 0.000080 Epoch: 29 Global Step: 617950 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:53:53,933-Speed 2498.84 samples/sec Loss 1.3764 LearningRate 0.000080 Epoch: 29 Global Step: 617960 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:54:02,133-Speed 2497.95 samples/sec Loss 1.3685 LearningRate 0.000080 Epoch: 29 Global Step: 617970 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:54:10,332-Speed 2498.12 samples/sec Loss 1.3949 LearningRate 0.000080 Epoch: 29 Global Step: 617980 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 11:54:18,492-Speed 2510.42 samples/sec Loss 1.3737 LearningRate 0.000080 Epoch: 29 Global Step: 617990 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:54:26,691-Speed 2498.21 samples/sec Loss 1.3732 LearningRate 0.000080 Epoch: 29 Global Step: 618000 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 11:54:34,848-Speed 2511.23 samples/sec Loss 1.3627 LearningRate 0.000080 Epoch: 29 Global Step: 618010 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:54:43,049-Speed 2497.49 samples/sec Loss 1.3642 LearningRate 0.000080 Epoch: 29 Global Step: 618020 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:54:51,260-Speed 2494.83 samples/sec Loss 1.3730 LearningRate 0.000080 Epoch: 29 Global Step: 618030 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:54:59,459-Speed 2498.69 samples/sec Loss 1.3716 LearningRate 0.000080 Epoch: 29 Global Step: 618040 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:55:07,663-Speed 2496.48 samples/sec Loss 1.4161 LearningRate 0.000080 Epoch: 29 Global Step: 618050 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:55:15,861-Speed 2498.34 samples/sec Loss 1.3582 LearningRate 0.000080 Epoch: 29 Global Step: 618060 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:55:24,385-Speed 2513.35 samples/sec Loss 1.3564 LearningRate 0.000080 Epoch: 29 Global Step: 618070 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:55:32,584-Speed 2498.11 samples/sec Loss 1.3652 LearningRate 0.000080 Epoch: 29 Global Step: 618080 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:55:40,781-Speed 2498.88 samples/sec Loss 1.3675 LearningRate 0.000080 Epoch: 29 Global Step: 618090 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:55:48,978-Speed 2499.02 samples/sec Loss 1.3492 LearningRate 0.000080 Epoch: 29 Global Step: 618100 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:55:57,177-Speed 2498.19 samples/sec Loss 1.3952 LearningRate 0.000080 Epoch: 29 Global Step: 618110 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:56:05,378-Speed 2497.59 samples/sec Loss 1.3815 LearningRate 0.000080 Epoch: 29 Global Step: 618120 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 11:56:13,526-Speed 2515.39 samples/sec Loss 1.3740 LearningRate 0.000080 Epoch: 29 Global Step: 618130 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:56:21,730-Speed 2496.68 samples/sec Loss 1.3628 LearningRate 0.000080 Epoch: 29 Global Step: 618140 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:56:29,941-Speed 2494.66 samples/sec Loss 1.3426 LearningRate 0.000080 Epoch: 29 Global Step: 618150 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:56:38,145-Speed 2496.49 samples/sec Loss 1.3821 LearningRate 0.000080 Epoch: 29 Global Step: 618160 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:56:46,347-Speed 2497.34 samples/sec Loss 1.3527 LearningRate 0.000080 Epoch: 29 Global Step: 618170 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:56:54,548-Speed 2497.86 samples/sec Loss 1.3622 LearningRate 0.000080 Epoch: 29 Global Step: 618180 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:57:02,708-Speed 2510.17 samples/sec Loss 1.3494 LearningRate 0.000080 Epoch: 29 Global Step: 618190 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:57:10,909-Speed 2497.80 samples/sec Loss 1.3636 LearningRate 0.000080 Epoch: 29 Global Step: 618200 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:57:19,110-Speed 2499.32 samples/sec Loss 1.3452 LearningRate 0.000080 Epoch: 29 Global Step: 618210 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:57:27,311-Speed 2497.39 samples/sec Loss 1.3456 LearningRate 0.000080 Epoch: 29 Global Step: 618220 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:57:35,523-Speed 2494.23 samples/sec Loss 1.3576 LearningRate 0.000080 Epoch: 29 Global Step: 618230 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:57:43,723-Speed 2498.05 samples/sec Loss 1.3726 LearningRate 0.000080 Epoch: 29 Global Step: 618240 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 11:57:51,885-Speed 2513.03 samples/sec Loss 1.4045 LearningRate 0.000080 Epoch: 29 Global Step: 618250 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:58:00,084-Speed 2498.37 samples/sec Loss 1.3613 LearningRate 0.000080 Epoch: 29 Global Step: 618260 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:58:08,293-Speed 2494.99 samples/sec Loss 1.3645 LearningRate 0.000080 Epoch: 29 Global Step: 618270 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:58:16,496-Speed 2497.23 samples/sec Loss 1.3754 LearningRate 0.000080 Epoch: 29 Global Step: 618280 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:58:24,695-Speed 2498.07 samples/sec Loss 1.3821 LearningRate 0.000080 Epoch: 29 Global Step: 618290 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:58:32,903-Speed 2495.63 samples/sec Loss 1.3521 LearningRate 0.000080 Epoch: 29 Global Step: 618300 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:58:41,146-Speed 2512.52 samples/sec Loss 1.3272 LearningRate 0.000080 Epoch: 29 Global Step: 618310 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:58:49,354-Speed 2495.56 samples/sec Loss 1.4050 LearningRate 0.000080 Epoch: 29 Global Step: 618320 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:58:57,558-Speed 2496.67 samples/sec Loss 1.3638 LearningRate 0.000080 Epoch: 29 Global Step: 618330 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:59:05,758-Speed 2498.90 samples/sec Loss 1.3802 LearningRate 0.000080 Epoch: 29 Global Step: 618340 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:59:13,963-Speed 2496.57 samples/sec Loss 1.3706 LearningRate 0.000080 Epoch: 29 Global Step: 618350 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:59:22,167-Speed 2496.65 samples/sec Loss 1.3893 LearningRate 0.000080 Epoch: 29 Global Step: 618360 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 11:59:30,315-Speed 2514.11 samples/sec Loss 1.3676 LearningRate 0.000080 Epoch: 29 Global Step: 618370 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:59:38,529-Speed 2496.71 samples/sec Loss 1.3541 LearningRate 0.000080 Epoch: 29 Global Step: 618380 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:59:46,732-Speed 2496.89 samples/sec Loss 1.3409 LearningRate 0.000080 Epoch: 29 Global Step: 618390 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 11:59:54,933-Speed 2497.77 samples/sec Loss 1.3656 LearningRate 0.000080 Epoch: 29 Global Step: 618400 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:00:03,132-Speed 2498.16 samples/sec Loss 1.3650 LearningRate 0.000080 Epoch: 29 Global Step: 618410 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:00:11,332-Speed 2500.43 samples/sec Loss 1.3715 LearningRate 0.000080 Epoch: 29 Global Step: 618420 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:00:19,490-Speed 2510.99 samples/sec Loss 1.3485 LearningRate 0.000080 Epoch: 29 Global Step: 618430 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:00:27,704-Speed 2496.58 samples/sec Loss 1.3387 LearningRate 0.000080 Epoch: 29 Global Step: 618440 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:00:35,914-Speed 2498.37 samples/sec Loss 1.3385 LearningRate 0.000080 Epoch: 29 Global Step: 618450 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:00:44,112-Speed 2498.80 samples/sec Loss 1.3628 LearningRate 0.000080 Epoch: 29 Global Step: 618460 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:00:52,315-Speed 2496.99 samples/sec Loss 1.3546 LearningRate 0.000080 Epoch: 29 Global Step: 618470 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:01:00,515-Speed 2498.70 samples/sec Loss 1.3328 LearningRate 0.000080 Epoch: 29 Global Step: 618480 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 12:01:08,673-Speed 2510.78 samples/sec Loss 1.3549 LearningRate 0.000080 Epoch: 29 Global Step: 618490 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:01:16,874-Speed 2497.73 samples/sec Loss 1.3522 LearningRate 0.000080 Epoch: 29 Global Step: 618500 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:01:25,081-Speed 2495.95 samples/sec Loss 1.3501 LearningRate 0.000080 Epoch: 29 Global Step: 618510 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:01:33,310-Speed 2498.71 samples/sec Loss 1.3469 LearningRate 0.000080 Epoch: 29 Global Step: 618520 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:01:41,513-Speed 2497.22 samples/sec Loss 1.3499 LearningRate 0.000080 Epoch: 29 Global Step: 618530 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:01:49,717-Speed 2496.64 samples/sec Loss 1.3378 LearningRate 0.000080 Epoch: 29 Global Step: 618540 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:01:57,869-Speed 2512.61 samples/sec Loss 1.4332 LearningRate 0.000080 Epoch: 29 Global Step: 618550 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:02:06,148-Speed 2498.59 samples/sec Loss 1.4063 LearningRate 0.000080 Epoch: 29 Global Step: 618560 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:02:14,366-Speed 2492.48 samples/sec Loss 1.3604 LearningRate 0.000080 Epoch: 29 Global Step: 618570 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:02:22,566-Speed 2497.66 samples/sec Loss 1.3941 LearningRate 0.000080 Epoch: 29 Global Step: 618580 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:02:30,781-Speed 2494.90 samples/sec Loss 1.3703 LearningRate 0.000080 Epoch: 29 Global Step: 618590 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:02:38,995-Speed 2493.66 samples/sec Loss 1.3623 LearningRate 0.000080 Epoch: 29 Global Step: 618600 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 12:02:47,145-Speed 2513.31 samples/sec Loss 1.3837 LearningRate 0.000080 Epoch: 29 Global Step: 618610 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:02:55,348-Speed 2496.79 samples/sec Loss 1.3832 LearningRate 0.000080 Epoch: 29 Global Step: 618620 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:03:03,553-Speed 2498.70 samples/sec Loss 1.4090 LearningRate 0.000080 Epoch: 29 Global Step: 618630 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:03:11,761-Speed 2495.35 samples/sec Loss 1.4002 LearningRate 0.000080 Epoch: 29 Global Step: 618640 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:03:19,970-Speed 2495.61 samples/sec Loss 1.3520 LearningRate 0.000080 Epoch: 29 Global Step: 618650 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:03:28,175-Speed 2496.23 samples/sec Loss 1.3645 LearningRate 0.000080 Epoch: 29 Global Step: 618660 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:03:36,328-Speed 2512.43 samples/sec Loss 1.3642 LearningRate 0.000080 Epoch: 29 Global Step: 618670 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:03:44,532-Speed 2496.88 samples/sec Loss 1.3901 LearningRate 0.000080 Epoch: 29 Global Step: 618680 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:03:52,733-Speed 2497.49 samples/sec Loss 1.3897 LearningRate 0.000080 Epoch: 29 Global Step: 618690 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:04:00,937-Speed 2496.99 samples/sec Loss 1.3082 LearningRate 0.000080 Epoch: 29 Global Step: 618700 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:04:09,151-Speed 2493.37 samples/sec Loss 1.3691 LearningRate 0.000080 Epoch: 29 Global Step: 618710 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:04:17,355-Speed 2496.88 samples/sec Loss 1.3379 LearningRate 0.000080 Epoch: 29 Global Step: 618720 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 12:04:25,504-Speed 2513.55 samples/sec Loss 1.3590 LearningRate 0.000080 Epoch: 29 Global Step: 618730 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:04:33,705-Speed 2497.63 samples/sec Loss 1.3503 LearningRate 0.000080 Epoch: 29 Global Step: 618740 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:04:41,903-Speed 2498.37 samples/sec Loss 1.3203 LearningRate 0.000080 Epoch: 29 Global Step: 618750 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:04:50,108-Speed 2496.48 samples/sec Loss 1.3826 LearningRate 0.000080 Epoch: 29 Global Step: 618760 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:04:58,307-Speed 2498.28 samples/sec Loss 1.3511 LearningRate 0.000080 Epoch: 29 Global Step: 618770 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:05:06,510-Speed 2496.97 samples/sec Loss 1.3648 LearningRate 0.000080 Epoch: 29 Global Step: 618780 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:05:14,654-Speed 2515.14 samples/sec Loss 1.3616 LearningRate 0.000080 Epoch: 29 Global Step: 618790 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:05:22,856-Speed 2497.45 samples/sec Loss 1.3398 LearningRate 0.000080 Epoch: 29 Global Step: 618800 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:05:31,057-Speed 2497.56 samples/sec Loss 1.3520 LearningRate 0.000080 Epoch: 29 Global Step: 618810 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:05:39,258-Speed 2497.68 samples/sec Loss 1.3720 LearningRate 0.000080 Epoch: 29 Global Step: 618820 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:05:47,456-Speed 2498.47 samples/sec Loss 1.3800 LearningRate 0.000080 Epoch: 29 Global Step: 618830 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:05:55,655-Speed 2498.45 samples/sec Loss 1.3608 LearningRate 0.000080 Epoch: 29 Global Step: 618840 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 12:06:03,802-Speed 2514.32 samples/sec Loss 1.3911 LearningRate 0.000080 Epoch: 29 Global Step: 618850 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:06:12,007-Speed 2496.64 samples/sec Loss 1.3675 LearningRate 0.000080 Epoch: 29 Global Step: 618860 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:06:20,207-Speed 2497.86 samples/sec Loss 1.3689 LearningRate 0.000080 Epoch: 29 Global Step: 618870 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:06:28,409-Speed 2497.34 samples/sec Loss 1.3435 LearningRate 0.000080 Epoch: 29 Global Step: 618880 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:06:36,613-Speed 2496.74 samples/sec Loss 1.3865 LearningRate 0.000080 Epoch: 29 Global Step: 618890 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:06:44,813-Speed 2498.08 samples/sec Loss 1.3840 LearningRate 0.000080 Epoch: 29 Global Step: 618900 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:06:52,965-Speed 2512.79 samples/sec Loss 1.3340 LearningRate 0.000080 Epoch: 29 Global Step: 618910 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:07:01,173-Speed 2495.44 samples/sec Loss 1.3613 LearningRate 0.000080 Epoch: 29 Global Step: 618920 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:07:09,375-Speed 2497.41 samples/sec Loss 1.3438 LearningRate 0.000080 Epoch: 29 Global Step: 618930 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:07:17,575-Speed 2497.97 samples/sec Loss 1.3591 LearningRate 0.000080 Epoch: 29 Global Step: 618940 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:07:25,778-Speed 2497.19 samples/sec Loss 1.3819 LearningRate 0.000080 Epoch: 29 Global Step: 618950 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:07:33,981-Speed 2496.95 samples/sec Loss 1.3999 LearningRate 0.000080 Epoch: 29 Global Step: 618960 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 12:07:42,132-Speed 2513.08 samples/sec Loss 1.4017 LearningRate 0.000080 Epoch: 29 Global Step: 618970 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:07:50,332-Speed 2498.11 samples/sec Loss 1.3803 LearningRate 0.000080 Epoch: 29 Global Step: 618980 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:07:58,534-Speed 2497.28 samples/sec Loss 1.4004 LearningRate 0.000080 Epoch: 29 Global Step: 618990 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:08:06,733-Speed 2498.13 samples/sec Loss 1.3535 LearningRate 0.000080 Epoch: 29 Global Step: 619000 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:08:14,933-Speed 2497.83 samples/sec Loss 1.3711 LearningRate 0.000080 Epoch: 29 Global Step: 619010 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:08:23,138-Speed 2496.57 samples/sec Loss 1.3960 LearningRate 0.000080 Epoch: 29 Global Step: 619020 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:08:31,283-Speed 2514.77 samples/sec Loss 1.3499 LearningRate 0.000080 Epoch: 29 Global Step: 619030 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:08:39,498-Speed 2493.35 samples/sec Loss 1.4028 LearningRate 0.000080 Epoch: 29 Global Step: 619040 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:08:47,704-Speed 2496.38 samples/sec Loss 1.3626 LearningRate 0.000080 Epoch: 29 Global Step: 619050 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:08:55,914-Speed 2494.78 samples/sec Loss 1.3308 LearningRate 0.000079 Epoch: 29 Global Step: 619060 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:09:04,116-Speed 2497.38 samples/sec Loss 1.3581 LearningRate 0.000079 Epoch: 29 Global Step: 619070 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:09:12,321-Speed 2496.21 samples/sec Loss 1.3858 LearningRate 0.000079 Epoch: 29 Global Step: 619080 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 12:09:20,469-Speed 2513.91 samples/sec Loss 1.3530 LearningRate 0.000079 Epoch: 29 Global Step: 619090 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:09:28,676-Speed 2495.70 samples/sec Loss 1.3683 LearningRate 0.000079 Epoch: 29 Global Step: 619100 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:09:36,877-Speed 2497.90 samples/sec Loss 1.3303 LearningRate 0.000079 Epoch: 29 Global Step: 619110 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:09:45,074-Speed 2498.79 samples/sec Loss 1.3446 LearningRate 0.000079 Epoch: 29 Global Step: 619120 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:09:53,275-Speed 2497.81 samples/sec Loss 1.3454 LearningRate 0.000079 Epoch: 29 Global Step: 619130 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:10:01,480-Speed 2496.48 samples/sec Loss 1.3742 LearningRate 0.000079 Epoch: 29 Global Step: 619140 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:10:09,627-Speed 2514.43 samples/sec Loss 1.3423 LearningRate 0.000079 Epoch: 29 Global Step: 619150 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:10:17,829-Speed 2497.19 samples/sec Loss 1.3468 LearningRate 0.000079 Epoch: 29 Global Step: 619160 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:10:26,116-Speed 2471.65 samples/sec Loss 1.3846 LearningRate 0.000079 Epoch: 29 Global Step: 619170 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:10:34,315-Speed 2498.11 samples/sec Loss 1.3370 LearningRate 0.000079 Epoch: 29 Global Step: 619180 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:10:42,515-Speed 2497.99 samples/sec Loss 1.3645 LearningRate 0.000079 Epoch: 29 Global Step: 619190 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 12:10:50,718-Speed 2496.97 samples/sec Loss 1.3454 LearningRate 0.000079 Epoch: 29 Global Step: 619200 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-07-11 12:10:58,872-Speed 2511.97 samples/sec Loss 1.3915 LearningRate 0.000079 Epoch: 29 Global Step: 619210 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-07-11 12:11:07,027-Speed 2511.73 samples/sec Loss 1.3703 LearningRate 0.000079 Epoch: 29 Global Step: 619220 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:11:15,229-Speed 2497.49 samples/sec Loss 1.3426 LearningRate 0.000079 Epoch: 29 Global Step: 619230 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:11:23,434-Speed 2496.14 samples/sec Loss 1.4032 LearningRate 0.000079 Epoch: 29 Global Step: 619240 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:11:31,634-Speed 2498.19 samples/sec Loss 1.3565 LearningRate 0.000079 Epoch: 29 Global Step: 619250 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:11:39,835-Speed 2497.61 samples/sec Loss 1.3664 LearningRate 0.000079 Epoch: 29 Global Step: 619260 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:11:47,982-Speed 2514.14 samples/sec Loss 1.3666 LearningRate 0.000079 Epoch: 29 Global Step: 619270 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:11:56,183-Speed 2497.60 samples/sec Loss 1.4072 LearningRate 0.000079 Epoch: 29 Global Step: 619280 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:12:04,388-Speed 2496.33 samples/sec Loss 1.3383 LearningRate 0.000079 Epoch: 29 Global Step: 619290 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:12:12,593-Speed 2496.37 samples/sec Loss 1.3569 LearningRate 0.000079 Epoch: 29 Global Step: 619300 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:12:20,805-Speed 2494.51 samples/sec Loss 1.3782 LearningRate 0.000079 Epoch: 29 Global Step: 619310 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:12:29,008-Speed 2496.97 samples/sec Loss 1.3735 LearningRate 0.000079 Epoch: 29 Global Step: 619320 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 12:12:37,163-Speed 2512.78 samples/sec Loss 1.3564 LearningRate 0.000079 Epoch: 29 Global Step: 619330 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:12:45,382-Speed 2492.26 samples/sec Loss 1.3633 LearningRate 0.000079 Epoch: 29 Global Step: 619340 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:12:53,581-Speed 2498.24 samples/sec Loss 1.3548 LearningRate 0.000079 Epoch: 29 Global Step: 619350 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:13:01,782-Speed 2497.84 samples/sec Loss 1.3697 LearningRate 0.000079 Epoch: 29 Global Step: 619360 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:13:09,986-Speed 2496.52 samples/sec Loss 1.3400 LearningRate 0.000079 Epoch: 29 Global Step: 619370 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:13:18,185-Speed 2498.40 samples/sec Loss 1.3628 LearningRate 0.000079 Epoch: 29 Global Step: 619380 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:13:26,331-Speed 2514.69 samples/sec Loss 1.3455 LearningRate 0.000079 Epoch: 29 Global Step: 619390 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:13:34,531-Speed 2498.04 samples/sec Loss 1.3470 LearningRate 0.000079 Epoch: 29 Global Step: 619400 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:13:42,731-Speed 2498.04 samples/sec Loss 1.3907 LearningRate 0.000079 Epoch: 29 Global Step: 619410 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:13:50,935-Speed 2496.89 samples/sec Loss 1.3782 LearningRate 0.000079 Epoch: 29 Global Step: 619420 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:13:59,133-Speed 2498.55 samples/sec Loss 1.3607 LearningRate 0.000079 Epoch: 29 Global Step: 619430 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:14:07,335-Speed 2497.30 samples/sec Loss 1.3636 LearningRate 0.000079 Epoch: 29 Global Step: 619440 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 12:14:15,481-Speed 2514.62 samples/sec Loss 1.3774 LearningRate 0.000079 Epoch: 29 Global Step: 619450 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:14:23,681-Speed 2498.11 samples/sec Loss 1.3696 LearningRate 0.000079 Epoch: 29 Global Step: 619460 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:14:31,881-Speed 2497.78 samples/sec Loss 1.3360 LearningRate 0.000079 Epoch: 29 Global Step: 619470 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:14:40,081-Speed 2498.29 samples/sec Loss 1.3620 LearningRate 0.000079 Epoch: 29 Global Step: 619480 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:14:48,284-Speed 2497.34 samples/sec Loss 1.3949 LearningRate 0.000079 Epoch: 29 Global Step: 619490 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:14:56,480-Speed 2499.01 samples/sec Loss 1.4019 LearningRate 0.000079 Epoch: 29 Global Step: 619500 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:15:04,624-Speed 2515.33 samples/sec Loss 1.3839 LearningRate 0.000079 Epoch: 29 Global Step: 619510 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:15:12,827-Speed 2496.92 samples/sec Loss 1.3707 LearningRate 0.000079 Epoch: 29 Global Step: 619520 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:15:21,027-Speed 2497.87 samples/sec Loss 1.3447 LearningRate 0.000079 Epoch: 29 Global Step: 619530 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:15:29,226-Speed 2498.37 samples/sec Loss 1.3595 LearningRate 0.000079 Epoch: 29 Global Step: 619540 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:15:37,425-Speed 2498.14 samples/sec Loss 1.3349 LearningRate 0.000079 Epoch: 29 Global Step: 619550 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:15:45,626-Speed 2497.71 samples/sec Loss 1.3617 LearningRate 0.000079 Epoch: 29 Global Step: 619560 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 12:15:53,772-Speed 2514.48 samples/sec Loss 1.3332 LearningRate 0.000079 Epoch: 29 Global Step: 619570 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:16:01,968-Speed 2499.16 samples/sec Loss 1.3251 LearningRate 0.000079 Epoch: 29 Global Step: 619580 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:16:10,166-Speed 2499.50 samples/sec Loss 1.3621 LearningRate 0.000079 Epoch: 29 Global Step: 619590 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:16:18,368-Speed 2497.58 samples/sec Loss 1.3421 LearningRate 0.000079 Epoch: 29 Global Step: 619600 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:16:26,570-Speed 2497.33 samples/sec Loss 1.3564 LearningRate 0.000079 Epoch: 29 Global Step: 619610 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:16:34,771-Speed 2497.83 samples/sec Loss 1.3372 LearningRate 0.000079 Epoch: 29 Global Step: 619620 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:16:42,919-Speed 2513.74 samples/sec Loss 1.3852 LearningRate 0.000079 Epoch: 29 Global Step: 619630 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:16:51,120-Speed 2497.82 samples/sec Loss 1.3483 LearningRate 0.000079 Epoch: 29 Global Step: 619640 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:16:59,318-Speed 2498.49 samples/sec Loss 1.3354 LearningRate 0.000079 Epoch: 29 Global Step: 619650 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:17:07,517-Speed 2498.06 samples/sec Loss 1.3637 LearningRate 0.000079 Epoch: 29 Global Step: 619660 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:17:15,716-Speed 2498.27 samples/sec Loss 1.3720 LearningRate 0.000079 Epoch: 29 Global Step: 619670 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:17:23,915-Speed 2498.64 samples/sec Loss 1.3358 LearningRate 0.000079 Epoch: 29 Global Step: 619680 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 12:17:32,064-Speed 2513.37 samples/sec Loss 1.3766 LearningRate 0.000079 Epoch: 29 Global Step: 619690 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:17:40,272-Speed 2495.73 samples/sec Loss 1.3344 LearningRate 0.000079 Epoch: 29 Global Step: 619700 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:17:48,471-Speed 2498.16 samples/sec Loss 1.3545 LearningRate 0.000079 Epoch: 29 Global Step: 619710 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:17:56,674-Speed 2497.14 samples/sec Loss 1.3415 LearningRate 0.000079 Epoch: 29 Global Step: 619720 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:18:04,880-Speed 2496.14 samples/sec Loss 1.3584 LearningRate 0.000079 Epoch: 29 Global Step: 619730 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:18:13,077-Speed 2499.06 samples/sec Loss 1.3728 LearningRate 0.000079 Epoch: 29 Global Step: 619740 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:18:21,223-Speed 2514.45 samples/sec Loss 1.3947 LearningRate 0.000079 Epoch: 29 Global Step: 619750 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:18:29,424-Speed 2497.56 samples/sec Loss 1.3766 LearningRate 0.000079 Epoch: 29 Global Step: 619760 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:18:37,621-Speed 2498.98 samples/sec Loss 1.3561 LearningRate 0.000079 Epoch: 29 Global Step: 619770 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:18:45,821-Speed 2497.68 samples/sec Loss 1.3357 LearningRate 0.000079 Epoch: 29 Global Step: 619780 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:18:54,019-Speed 2498.74 samples/sec Loss 1.3338 LearningRate 0.000079 Epoch: 29 Global Step: 619790 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:19:02,220-Speed 2497.44 samples/sec Loss 1.3413 LearningRate 0.000079 Epoch: 29 Global Step: 619800 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 12:19:10,369-Speed 2513.82 samples/sec Loss 1.3673 LearningRate 0.000079 Epoch: 29 Global Step: 619810 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:19:18,572-Speed 2496.96 samples/sec Loss 1.3704 LearningRate 0.000079 Epoch: 29 Global Step: 619820 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:19:26,776-Speed 2496.95 samples/sec Loss 1.3833 LearningRate 0.000079 Epoch: 29 Global Step: 619830 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:19:34,975-Speed 2498.11 samples/sec Loss 1.3254 LearningRate 0.000079 Epoch: 29 Global Step: 619840 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:19:43,185-Speed 2494.99 samples/sec Loss 1.3756 LearningRate 0.000079 Epoch: 29 Global Step: 619850 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:19:51,379-Speed 2499.66 samples/sec Loss 1.3867 LearningRate 0.000079 Epoch: 29 Global Step: 619860 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:19:59,524-Speed 2514.89 samples/sec Loss 1.3602 LearningRate 0.000079 Epoch: 29 Global Step: 619870 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:20:07,727-Speed 2497.11 samples/sec Loss 1.3521 LearningRate 0.000079 Epoch: 29 Global Step: 619880 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:20:15,925-Speed 2498.40 samples/sec Loss 1.3545 LearningRate 0.000079 Epoch: 29 Global Step: 619890 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:20:24,126-Speed 2497.86 samples/sec Loss 1.3606 LearningRate 0.000079 Epoch: 29 Global Step: 619900 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:20:32,335-Speed 2495.16 samples/sec Loss 1.3668 LearningRate 0.000079 Epoch: 29 Global Step: 619910 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:20:40,538-Speed 2496.96 samples/sec Loss 1.3421 LearningRate 0.000079 Epoch: 29 Global Step: 619920 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 12:20:48,684-Speed 2514.51 samples/sec Loss 1.3560 LearningRate 0.000079 Epoch: 29 Global Step: 619930 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:20:56,886-Speed 2497.20 samples/sec Loss 1.3660 LearningRate 0.000079 Epoch: 29 Global Step: 619940 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:21:05,086-Speed 2498.09 samples/sec Loss 1.3897 LearningRate 0.000079 Epoch: 29 Global Step: 619950 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:21:13,291-Speed 2496.30 samples/sec Loss 1.3718 LearningRate 0.000079 Epoch: 29 Global Step: 619960 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:21:21,492-Speed 2497.67 samples/sec Loss 1.3898 LearningRate 0.000079 Epoch: 29 Global Step: 619970 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:21:29,691-Speed 2498.27 samples/sec Loss 1.3603 LearningRate 0.000079 Epoch: 29 Global Step: 619980 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:21:37,838-Speed 2514.39 samples/sec Loss 1.3636 LearningRate 0.000079 Epoch: 29 Global Step: 619990 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:21:46,035-Speed 2498.72 samples/sec Loss 1.3567 LearningRate 0.000079 Epoch: 29 Global Step: 620000 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:21:54,248-Speed 2494.02 samples/sec Loss 1.3423 LearningRate 0.000079 Epoch: 29 Global Step: 620010 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:22:02,452-Speed 2496.79 samples/sec Loss 1.3869 LearningRate 0.000079 Epoch: 29 Global Step: 620020 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:22:10,661-Speed 2495.23 samples/sec Loss 1.3596 LearningRate 0.000079 Epoch: 29 Global Step: 620030 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-07-11 12:22:18,866-Speed 2496.55 samples/sec Loss 1.3786 LearningRate 0.000079 Epoch: 29 Global Step: 620040 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-07-11 12:22:27,017-Speed 2512.92 samples/sec Loss 1.3254 LearningRate 0.000079 Epoch: 29 Global Step: 620050 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:22:35,219-Speed 2497.23 samples/sec Loss 1.4231 LearningRate 0.000079 Epoch: 29 Global Step: 620060 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:22:43,425-Speed 2496.00 samples/sec Loss 1.3820 LearningRate 0.000079 Epoch: 29 Global Step: 620070 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:22:51,627-Speed 2497.48 samples/sec Loss 1.3710 LearningRate 0.000079 Epoch: 29 Global Step: 620080 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:22:59,832-Speed 2496.87 samples/sec Loss 1.3821 LearningRate 0.000079 Epoch: 29 Global Step: 620090 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:23:08,051-Speed 2492.19 samples/sec Loss 1.3917 LearningRate 0.000079 Epoch: 29 Global Step: 620100 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:23:16,198-Speed 2514.03 samples/sec Loss 1.3597 LearningRate 0.000079 Epoch: 29 Global Step: 620110 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:23:24,405-Speed 2495.99 samples/sec Loss 1.3768 LearningRate 0.000079 Epoch: 29 Global Step: 620120 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:23:32,619-Speed 2493.65 samples/sec Loss 1.3859 LearningRate 0.000079 Epoch: 29 Global Step: 620130 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:23:40,818-Speed 2498.36 samples/sec Loss 1.3429 LearningRate 0.000079 Epoch: 29 Global Step: 620140 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:23:49,015-Speed 2498.75 samples/sec Loss 1.3348 LearningRate 0.000079 Epoch: 29 Global Step: 620150 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:23:57,224-Speed 2495.36 samples/sec Loss 1.3905 LearningRate 0.000079 Epoch: 29 Global Step: 620160 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:24:05,372-Speed 2513.84 samples/sec Loss 1.3788 LearningRate 0.000079 Epoch: 29 Global Step: 620170 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:24:13,571-Speed 2498.33 samples/sec Loss 1.3421 LearningRate 0.000079 Epoch: 29 Global Step: 620180 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:24:21,770-Speed 2498.30 samples/sec Loss 1.3530 LearningRate 0.000079 Epoch: 29 Global Step: 620190 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:24:29,973-Speed 2497.15 samples/sec Loss 1.3411 LearningRate 0.000079 Epoch: 29 Global Step: 620200 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:24:38,178-Speed 2496.54 samples/sec Loss 1.3402 LearningRate 0.000079 Epoch: 29 Global Step: 620210 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:24:46,381-Speed 2497.03 samples/sec Loss 1.3654 LearningRate 0.000079 Epoch: 29 Global Step: 620220 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:24:54,530-Speed 2513.39 samples/sec Loss 1.3543 LearningRate 0.000079 Epoch: 29 Global Step: 620230 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:25:02,745-Speed 2493.64 samples/sec Loss 1.3355 LearningRate 0.000079 Epoch: 29 Global Step: 620240 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:25:10,947-Speed 2497.37 samples/sec Loss 1.3876 LearningRate 0.000079 Epoch: 29 Global Step: 620250 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:25:19,165-Speed 2492.46 samples/sec Loss 1.3623 LearningRate 0.000079 Epoch: 29 Global Step: 620260 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:25:27,364-Speed 2498.31 samples/sec Loss 1.3788 LearningRate 0.000079 Epoch: 29 Global Step: 620270 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:25:35,566-Speed 2497.53 samples/sec Loss 1.3729 LearningRate 0.000079 Epoch: 29 Global Step: 620280 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:25:43,716-Speed 2513.16 samples/sec Loss 1.3345 LearningRate 0.000079 Epoch: 29 Global Step: 620290 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:25:51,922-Speed 2496.48 samples/sec Loss 1.3539 LearningRate 0.000079 Epoch: 29 Global Step: 620300 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:26:00,121-Speed 2498.25 samples/sec Loss 1.3465 LearningRate 0.000079 Epoch: 29 Global Step: 620310 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:26:08,327-Speed 2496.20 samples/sec Loss 1.3538 LearningRate 0.000079 Epoch: 29 Global Step: 620320 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:26:16,523-Speed 2498.86 samples/sec Loss 1.3385 LearningRate 0.000079 Epoch: 29 Global Step: 620330 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:26:24,723-Speed 2498.00 samples/sec Loss 1.3885 LearningRate 0.000079 Epoch: 29 Global Step: 620340 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:26:32,879-Speed 2511.39 samples/sec Loss 1.3488 LearningRate 0.000079 Epoch: 29 Global Step: 620350 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:26:41,080-Speed 2497.70 samples/sec Loss 1.3517 LearningRate 0.000079 Epoch: 29 Global Step: 620360 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:26:49,283-Speed 2496.95 samples/sec Loss 1.3425 LearningRate 0.000079 Epoch: 29 Global Step: 620370 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:26:57,486-Speed 2497.06 samples/sec Loss 1.3538 LearningRate 0.000078 Epoch: 29 Global Step: 620380 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:27:05,698-Speed 2494.07 samples/sec Loss 1.3415 LearningRate 0.000078 Epoch: 29 Global Step: 620390 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:27:13,899-Speed 2497.90 samples/sec Loss 1.3406 LearningRate 0.000078 Epoch: 29 Global Step: 620400 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:27:22,043-Speed 2515.17 samples/sec Loss 1.3596 LearningRate 0.000078 Epoch: 29 Global Step: 620410 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:27:30,243-Speed 2497.95 samples/sec Loss 1.3581 LearningRate 0.000078 Epoch: 29 Global Step: 620420 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:27:38,441-Speed 2498.40 samples/sec Loss 1.3263 LearningRate 0.000078 Epoch: 29 Global Step: 620430 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:27:46,686-Speed 2498.95 samples/sec Loss 1.3071 LearningRate 0.000078 Epoch: 29 Global Step: 620440 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:27:54,896-Speed 2498.83 samples/sec Loss 1.3441 LearningRate 0.000078 Epoch: 29 Global Step: 620450 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:28:03,095-Speed 2498.31 samples/sec Loss 1.3511 LearningRate 0.000078 Epoch: 29 Global Step: 620460 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:28:11,272-Speed 2516.76 samples/sec Loss 1.3546 LearningRate 0.000078 Epoch: 29 Global Step: 620470 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:28:19,506-Speed 2500.20 samples/sec Loss 1.3699 LearningRate 0.000078 Epoch: 29 Global Step: 620480 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:28:27,706-Speed 2497.98 samples/sec Loss 1.3739 LearningRate 0.000078 Epoch: 29 Global Step: 620490 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:28:35,905-Speed 2498.14 samples/sec Loss 1.3618 LearningRate 0.000078 Epoch: 29 Global Step: 620500 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:28:44,144-Speed 2499.91 samples/sec Loss 1.3832 LearningRate 0.000078 Epoch: 29 Global Step: 620510 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:28:53,035-Speed 2497.60 samples/sec Loss 1.3486 LearningRate 0.000078 Epoch: 29 Global Step: 620520 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:29:01,181-Speed 2514.20 samples/sec Loss 1.3501 LearningRate 0.000078 Epoch: 29 Global Step: 620530 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:29:11,332-Speed 2501.61 samples/sec Loss 1.3504 LearningRate 0.000078 Epoch: 29 Global Step: 620540 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:29:19,560-Speed 2499.47 samples/sec Loss 1.3949 LearningRate 0.000078 Epoch: 29 Global Step: 620550 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:29:27,786-Speed 2500.62 samples/sec Loss 1.3770 LearningRate 0.000078 Epoch: 29 Global Step: 620560 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:29:40,434-Speed 1619.44 samples/sec Loss 1.3482 LearningRate 0.000078 Epoch: 29 Global Step: 620570 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:29:48,629-Speed 2499.26 samples/sec Loss 1.3750 LearningRate 0.000078 Epoch: 29 Global Step: 620580 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:29:56,816-Speed 2514.81 samples/sec Loss 1.3831 LearningRate 0.000078 Epoch: 29 Global Step: 620590 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:30:07,499-Speed 2075.19 samples/sec Loss 1.3126 LearningRate 0.000078 Epoch: 29 Global Step: 620600 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:30:15,710-Speed 2494.67 samples/sec Loss 1.3293 LearningRate 0.000078 Epoch: 29 Global Step: 620610 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:30:23,984-Speed 2489.29 samples/sec Loss 1.3521 LearningRate 0.000078 Epoch: 29 Global Step: 620620 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:30:37,237-Speed 2491.33 samples/sec Loss 1.4144 LearningRate 0.000078 Epoch: 29 Global Step: 620630 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:30:46,311-Speed 2493.45 samples/sec Loss 1.3593 LearningRate 0.000078 Epoch: 29 Global Step: 620640 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:30:58,358-Speed 1713.86 samples/sec Loss 1.3851 LearningRate 0.000078 Epoch: 29 Global Step: 620650 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:31:07,002-Speed 2378.86 samples/sec Loss 1.3697 LearningRate 0.000078 Epoch: 29 Global Step: 620660 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:31:15,165-Speed 2509.02 samples/sec Loss 1.3675 LearningRate 0.000078 Epoch: 29 Global Step: 620670 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:31:23,368-Speed 2497.13 samples/sec Loss 1.3695 LearningRate 0.000078 Epoch: 29 Global Step: 620680 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:31:31,572-Speed 2496.72 samples/sec Loss 1.3458 LearningRate 0.000078 Epoch: 29 Global Step: 620690 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:31:39,774-Speed 2497.22 samples/sec Loss 1.3778 LearningRate 0.000078 Epoch: 29 Global Step: 620700 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:31:47,923-Speed 2513.57 samples/sec Loss 1.3711 LearningRate 0.000078 Epoch: 29 Global Step: 620710 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:31:56,126-Speed 2497.41 samples/sec Loss 1.3782 LearningRate 0.000078 Epoch: 29 Global Step: 620720 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:32:04,323-Speed 2498.78 samples/sec Loss 1.3908 LearningRate 0.000078 Epoch: 29 Global Step: 620730 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:32:12,525-Speed 2497.30 samples/sec Loss 1.3291 LearningRate 0.000078 Epoch: 29 Global Step: 620740 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:32:20,738-Speed 2494.15 samples/sec Loss 1.3532 LearningRate 0.000078 Epoch: 29 Global Step: 620750 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:32:28,942-Speed 2496.65 samples/sec Loss 1.3522 LearningRate 0.000078 Epoch: 29 Global Step: 620760 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:32:37,098-Speed 2511.77 samples/sec Loss 1.3832 LearningRate 0.000078 Epoch: 29 Global Step: 620770 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:32:45,312-Speed 2493.70 samples/sec Loss 1.3463 LearningRate 0.000078 Epoch: 29 Global Step: 620780 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:32:53,532-Speed 2491.95 samples/sec Loss 1.3901 LearningRate 0.000078 Epoch: 29 Global Step: 620790 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:33:01,740-Speed 2495.52 samples/sec Loss 1.3901 LearningRate 0.000078 Epoch: 29 Global Step: 620800 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:33:09,948-Speed 2495.74 samples/sec Loss 1.3578 LearningRate 0.000078 Epoch: 29 Global Step: 620810 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:33:18,165-Speed 2492.68 samples/sec Loss 1.3615 LearningRate 0.000078 Epoch: 29 Global Step: 620820 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:33:26,320-Speed 2511.94 samples/sec Loss 1.3880 LearningRate 0.000078 Epoch: 29 Global Step: 620830 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:33:34,528-Speed 2495.31 samples/sec Loss 1.3520 LearningRate 0.000078 Epoch: 29 Global Step: 620840 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:33:42,737-Speed 2495.34 samples/sec Loss 1.3517 LearningRate 0.000078 Epoch: 29 Global Step: 620850 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:33:50,949-Speed 2494.18 samples/sec Loss 1.3619 LearningRate 0.000078 Epoch: 29 Global Step: 620860 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:33:59,154-Speed 2496.40 samples/sec Loss 1.3379 LearningRate 0.000078 Epoch: 29 Global Step: 620870 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:34:07,367-Speed 2493.86 samples/sec Loss 1.3236 LearningRate 0.000078 Epoch: 29 Global Step: 620880 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:34:15,518-Speed 2513.03 samples/sec Loss 1.3556 LearningRate 0.000078 Epoch: 29 Global Step: 620890 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:34:23,722-Speed 2496.75 samples/sec Loss 1.3484 LearningRate 0.000078 Epoch: 29 Global Step: 620900 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:34:31,929-Speed 2495.82 samples/sec Loss 1.3232 LearningRate 0.000078 Epoch: 29 Global Step: 620910 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:34:40,140-Speed 2494.49 samples/sec Loss 1.3321 LearningRate 0.000078 Epoch: 29 Global Step: 620920 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:34:48,347-Speed 2496.23 samples/sec Loss 1.3349 LearningRate 0.000078 Epoch: 29 Global Step: 620930 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:34:56,553-Speed 2495.92 samples/sec Loss 1.3872 LearningRate 0.000078 Epoch: 29 Global Step: 620940 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:35:04,700-Speed 2514.17 samples/sec Loss 1.3639 LearningRate 0.000078 Epoch: 29 Global Step: 620950 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:35:12,900-Speed 2498.21 samples/sec Loss 1.3804 LearningRate 0.000078 Epoch: 29 Global Step: 620960 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:35:21,105-Speed 2496.18 samples/sec Loss 1.3240 LearningRate 0.000078 Epoch: 29 Global Step: 620970 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:35:29,308-Speed 2496.95 samples/sec Loss 1.3590 LearningRate 0.000078 Epoch: 29 Global Step: 620980 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:35:37,512-Speed 2497.00 samples/sec Loss 1.3777 LearningRate 0.000078 Epoch: 29 Global Step: 620990 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:35:45,717-Speed 2496.32 samples/sec Loss 1.3824 LearningRate 0.000078 Epoch: 29 Global Step: 621000 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:35:53,865-Speed 2514.02 samples/sec Loss 1.3554 LearningRate 0.000078 Epoch: 29 Global Step: 621010 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:36:02,078-Speed 2493.81 samples/sec Loss 1.3365 LearningRate 0.000078 Epoch: 29 Global Step: 621020 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:36:10,294-Speed 2493.41 samples/sec Loss 1.3562 LearningRate 0.000078 Epoch: 29 Global Step: 621030 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:36:18,499-Speed 2496.50 samples/sec Loss 1.3447 LearningRate 0.000078 Epoch: 29 Global Step: 621040 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:36:26,701-Speed 2497.48 samples/sec Loss 1.3432 LearningRate 0.000078 Epoch: 29 Global Step: 621050 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:36:34,913-Speed 2494.61 samples/sec Loss 1.3550 LearningRate 0.000078 Epoch: 29 Global Step: 621060 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:36:43,061-Speed 2513.93 samples/sec Loss 1.3388 LearningRate 0.000078 Epoch: 29 Global Step: 621070 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:36:51,266-Speed 2496.30 samples/sec Loss 1.3101 LearningRate 0.000078 Epoch: 29 Global Step: 621080 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:36:59,470-Speed 2496.74 samples/sec Loss 1.3347 LearningRate 0.000078 Epoch: 29 Global Step: 621090 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:37:07,669-Speed 2498.28 samples/sec Loss 1.3737 LearningRate 0.000078 Epoch: 29 Global Step: 621100 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:37:15,872-Speed 2496.89 samples/sec Loss 1.3533 LearningRate 0.000078 Epoch: 29 Global Step: 621110 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:37:24,081-Speed 2495.13 samples/sec Loss 1.3329 LearningRate 0.000078 Epoch: 29 Global Step: 621120 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:37:32,229-Speed 2513.94 samples/sec Loss 1.3861 LearningRate 0.000078 Epoch: 29 Global Step: 621130 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:37:40,445-Speed 2493.36 samples/sec Loss 1.3524 LearningRate 0.000078 Epoch: 29 Global Step: 621140 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:37:48,646-Speed 2497.88 samples/sec Loss 1.3238 LearningRate 0.000078 Epoch: 29 Global Step: 621150 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:37:56,846-Speed 2498.11 samples/sec Loss 1.3625 LearningRate 0.000078 Epoch: 29 Global Step: 621160 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:38:05,053-Speed 2495.82 samples/sec Loss 1.4026 LearningRate 0.000078 Epoch: 29 Global Step: 621170 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:38:13,253-Speed 2498.10 samples/sec Loss 1.3029 LearningRate 0.000078 Epoch: 29 Global Step: 621180 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:38:21,414-Speed 2509.93 samples/sec Loss 1.3327 LearningRate 0.000078 Epoch: 29 Global Step: 621190 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:38:29,615-Speed 2497.64 samples/sec Loss 1.3573 LearningRate 0.000078 Epoch: 29 Global Step: 621200 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:38:37,819-Speed 2496.60 samples/sec Loss 1.3515 LearningRate 0.000078 Epoch: 29 Global Step: 621210 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:38:46,022-Speed 2497.20 samples/sec Loss 1.3659 LearningRate 0.000078 Epoch: 29 Global Step: 621220 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:38:54,235-Speed 2493.83 samples/sec Loss 1.3531 LearningRate 0.000078 Epoch: 29 Global Step: 621230 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:39:02,440-Speed 2496.35 samples/sec Loss 1.3296 LearningRate 0.000078 Epoch: 29 Global Step: 621240 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:39:10,593-Speed 2512.50 samples/sec Loss 1.3073 LearningRate 0.000078 Epoch: 29 Global Step: 621250 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:39:18,793-Speed 2497.89 samples/sec Loss 1.3527 LearningRate 0.000078 Epoch: 29 Global Step: 621260 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:39:26,998-Speed 2496.30 samples/sec Loss 1.3544 LearningRate 0.000078 Epoch: 29 Global Step: 621270 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:39:35,200-Speed 2497.46 samples/sec Loss 1.3388 LearningRate 0.000078 Epoch: 29 Global Step: 621280 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:39:43,405-Speed 2496.31 samples/sec Loss 1.3635 LearningRate 0.000078 Epoch: 29 Global Step: 621290 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:39:51,612-Speed 2495.74 samples/sec Loss 1.3619 LearningRate 0.000078 Epoch: 29 Global Step: 621300 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:39:59,760-Speed 2514.13 samples/sec Loss 1.3656 LearningRate 0.000078 Epoch: 29 Global Step: 621310 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:40:07,967-Speed 2495.88 samples/sec Loss 1.3489 LearningRate 0.000078 Epoch: 29 Global Step: 621320 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:40:16,171-Speed 2496.68 samples/sec Loss 1.3605 LearningRate 0.000078 Epoch: 29 Global Step: 621330 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:40:24,372-Speed 2497.52 samples/sec Loss 1.3202 LearningRate 0.000078 Epoch: 29 Global Step: 621340 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:40:32,587-Speed 2493.52 samples/sec Loss 1.3486 LearningRate 0.000078 Epoch: 29 Global Step: 621350 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:40:40,795-Speed 2495.71 samples/sec Loss 1.3641 LearningRate 0.000078 Epoch: 29 Global Step: 621360 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:40:48,940-Speed 2514.56 samples/sec Loss 1.3863 LearningRate 0.000078 Epoch: 29 Global Step: 621370 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:40:57,152-Speed 2494.43 samples/sec Loss 1.3617 LearningRate 0.000078 Epoch: 29 Global Step: 621380 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:41:05,352-Speed 2498.18 samples/sec Loss 1.3572 LearningRate 0.000078 Epoch: 29 Global Step: 621390 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:41:13,550-Speed 2498.24 samples/sec Loss 1.3212 LearningRate 0.000078 Epoch: 29 Global Step: 621400 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:41:21,750-Speed 2498.07 samples/sec Loss 1.3480 LearningRate 0.000078 Epoch: 29 Global Step: 621410 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:41:29,954-Speed 2496.91 samples/sec Loss 1.3435 LearningRate 0.000078 Epoch: 29 Global Step: 621420 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:41:38,101-Speed 2513.98 samples/sec Loss 1.3501 LearningRate 0.000078 Epoch: 29 Global Step: 621430 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:41:46,317-Speed 2493.07 samples/sec Loss 1.3578 LearningRate 0.000078 Epoch: 29 Global Step: 621440 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:41:54,520-Speed 2497.58 samples/sec Loss 1.3892 LearningRate 0.000078 Epoch: 29 Global Step: 621450 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:42:02,731-Speed 2494.61 samples/sec Loss 1.3443 LearningRate 0.000078 Epoch: 29 Global Step: 621460 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:42:10,938-Speed 2495.94 samples/sec Loss 1.3739 LearningRate 0.000078 Epoch: 29 Global Step: 621470 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:42:19,141-Speed 2496.81 samples/sec Loss 1.3597 LearningRate 0.000078 Epoch: 29 Global Step: 621480 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:42:27,301-Speed 2510.51 samples/sec Loss 1.3786 LearningRate 0.000078 Epoch: 29 Global Step: 621490 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:42:35,505-Speed 2496.62 samples/sec Loss 1.3154 LearningRate 0.000078 Epoch: 29 Global Step: 621500 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:42:43,707-Speed 2497.43 samples/sec Loss 1.3447 LearningRate 0.000078 Epoch: 29 Global Step: 621510 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:42:51,911-Speed 2496.72 samples/sec Loss 1.3619 LearningRate 0.000078 Epoch: 29 Global Step: 621520 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:43:00,112-Speed 2497.60 samples/sec Loss 1.3458 LearningRate 0.000078 Epoch: 29 Global Step: 621530 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:43:08,312-Speed 2497.98 samples/sec Loss 1.3343 LearningRate 0.000078 Epoch: 29 Global Step: 621540 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:43:16,458-Speed 2514.45 samples/sec Loss 1.3926 LearningRate 0.000078 Epoch: 29 Global Step: 621550 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:43:24,658-Speed 2497.99 samples/sec Loss 1.3657 LearningRate 0.000078 Epoch: 29 Global Step: 621560 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:43:32,875-Speed 2492.81 samples/sec Loss 1.3813 LearningRate 0.000078 Epoch: 29 Global Step: 621570 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:43:41,076-Speed 2497.61 samples/sec Loss 1.3634 LearningRate 0.000078 Epoch: 29 Global Step: 621580 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:43:49,276-Speed 2498.12 samples/sec Loss 1.3479 LearningRate 0.000078 Epoch: 29 Global Step: 621590 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:43:57,475-Speed 2498.04 samples/sec Loss 1.3419 LearningRate 0.000078 Epoch: 29 Global Step: 621600 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:44:05,627-Speed 2512.74 samples/sec Loss 1.3630 LearningRate 0.000078 Epoch: 29 Global Step: 621610 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:44:13,828-Speed 2497.72 samples/sec Loss 1.3276 LearningRate 0.000078 Epoch: 29 Global Step: 621620 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:44:22,031-Speed 2497.01 samples/sec Loss 1.3317 LearningRate 0.000078 Epoch: 29 Global Step: 621630 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:44:30,230-Speed 2498.20 samples/sec Loss 1.3763 LearningRate 0.000078 Epoch: 29 Global Step: 621640 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:44:38,432-Speed 2497.38 samples/sec Loss 1.3740 LearningRate 0.000078 Epoch: 29 Global Step: 621650 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:44:46,635-Speed 2497.11 samples/sec Loss 1.3587 LearningRate 0.000078 Epoch: 29 Global Step: 621660 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:44:54,783-Speed 2513.77 samples/sec Loss 1.3687 LearningRate 0.000078 Epoch: 29 Global Step: 621670 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:45:02,986-Speed 2497.41 samples/sec Loss 1.3670 LearningRate 0.000078 Epoch: 29 Global Step: 621680 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:45:11,185-Speed 2498.17 samples/sec Loss 1.3469 LearningRate 0.000078 Epoch: 29 Global Step: 621690 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:45:19,386-Speed 2497.89 samples/sec Loss 1.3605 LearningRate 0.000078 Epoch: 29 Global Step: 621700 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:45:27,592-Speed 2496.14 samples/sec Loss 1.3380 LearningRate 0.000078 Epoch: 29 Global Step: 621710 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:45:35,797-Speed 2496.31 samples/sec Loss 1.3678 LearningRate 0.000077 Epoch: 29 Global Step: 621720 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:45:43,957-Speed 2510.41 samples/sec Loss 1.3654 LearningRate 0.000077 Epoch: 29 Global Step: 621730 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:45:52,159-Speed 2497.40 samples/sec Loss 1.3634 LearningRate 0.000077 Epoch: 29 Global Step: 621740 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:46:00,360-Speed 2497.53 samples/sec Loss 1.3431 LearningRate 0.000077 Epoch: 29 Global Step: 621750 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:46:08,563-Speed 2497.08 samples/sec Loss 1.3900 LearningRate 0.000077 Epoch: 29 Global Step: 621760 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:46:16,765-Speed 2497.42 samples/sec Loss 1.3760 LearningRate 0.000077 Epoch: 29 Global Step: 621770 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:46:24,965-Speed 2497.79 samples/sec Loss 1.3939 LearningRate 0.000077 Epoch: 29 Global Step: 621780 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:46:33,115-Speed 2513.17 samples/sec Loss 1.3449 LearningRate 0.000077 Epoch: 29 Global Step: 621790 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:46:41,323-Speed 2495.56 samples/sec Loss 1.3809 LearningRate 0.000077 Epoch: 29 Global Step: 621800 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:46:49,528-Speed 2496.52 samples/sec Loss 1.3331 LearningRate 0.000077 Epoch: 29 Global Step: 621810 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:46:57,745-Speed 2492.79 samples/sec Loss 1.3888 LearningRate 0.000077 Epoch: 29 Global Step: 621820 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:47:05,945-Speed 2497.86 samples/sec Loss 1.3922 LearningRate 0.000077 Epoch: 29 Global Step: 621830 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:47:14,149-Speed 2496.79 samples/sec Loss 1.3643 LearningRate 0.000077 Epoch: 29 Global Step: 621840 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:47:22,298-Speed 2513.71 samples/sec Loss 1.3785 LearningRate 0.000077 Epoch: 29 Global Step: 621850 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:47:30,501-Speed 2496.80 samples/sec Loss 1.3965 LearningRate 0.000077 Epoch: 29 Global Step: 621860 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:47:38,704-Speed 2497.11 samples/sec Loss 1.3772 LearningRate 0.000077 Epoch: 29 Global Step: 621870 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:47:46,903-Speed 2498.08 samples/sec Loss 1.3958 LearningRate 0.000077 Epoch: 29 Global Step: 621880 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:47:55,105-Speed 2497.46 samples/sec Loss 1.3520 LearningRate 0.000077 Epoch: 29 Global Step: 621890 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-07-11 12:48:03,261-Speed 2511.26 samples/sec Loss 1.3544 LearningRate 0.000077 Epoch: 29 Global Step: 621900 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:48:11,416-Speed 2511.87 samples/sec Loss 1.3459 LearningRate 0.000077 Epoch: 29 Global Step: 621910 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:48:19,622-Speed 2496.15 samples/sec Loss 1.3970 LearningRate 0.000077 Epoch: 29 Global Step: 621920 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:48:27,826-Speed 2496.55 samples/sec Loss 1.3419 LearningRate 0.000077 Epoch: 29 Global Step: 621930 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:48:36,030-Speed 2496.76 samples/sec Loss 1.3728 LearningRate 0.000077 Epoch: 29 Global Step: 621940 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:48:44,234-Speed 2496.71 samples/sec Loss 1.3554 LearningRate 0.000077 Epoch: 29 Global Step: 621950 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:48:52,438-Speed 2496.61 samples/sec Loss 1.3984 LearningRate 0.000077 Epoch: 29 Global Step: 621960 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:49:00,589-Speed 2513.36 samples/sec Loss 1.3489 LearningRate 0.000077 Epoch: 29 Global Step: 621970 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:49:08,803-Speed 2493.50 samples/sec Loss 1.3315 LearningRate 0.000077 Epoch: 29 Global Step: 621980 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:49:17,028-Speed 2490.44 samples/sec Loss 1.3710 LearningRate 0.000077 Epoch: 29 Global Step: 621990 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:49:25,233-Speed 2496.53 samples/sec Loss 1.3462 LearningRate 0.000077 Epoch: 29 Global Step: 622000 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:49:33,434-Speed 2497.62 samples/sec Loss 1.3403 LearningRate 0.000077 Epoch: 29 Global Step: 622010 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:49:41,634-Speed 2497.81 samples/sec Loss 1.3827 LearningRate 0.000077 Epoch: 29 Global Step: 622020 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:49:49,785-Speed 2512.86 samples/sec Loss 1.3729 LearningRate 0.000077 Epoch: 29 Global Step: 622030 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:49:57,984-Speed 2498.45 samples/sec Loss 1.3820 LearningRate 0.000077 Epoch: 29 Global Step: 622040 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:50:06,185-Speed 2497.49 samples/sec Loss 1.3495 LearningRate 0.000077 Epoch: 29 Global Step: 622050 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-07-11 12:50:14,386-Speed 2497.55 samples/sec Loss 1.3764 LearningRate 0.000077 Epoch: 29 Global Step: 622060 Fp16 Grad Scale: 16384 Required: 47 hours
Training: 2022-07-11 12:50:22,586-Speed 2497.92 samples/sec Loss 1.3810 LearningRate 0.000077 Epoch: 29 Global Step: 622070 Fp16 Grad Scale: 16384 Required: 47 hours
Training: 2022-07-11 12:50:30,791-Speed 2496.59 samples/sec Loss 1.3749 LearningRate 0.000077 Epoch: 29 Global Step: 622080 Fp16 Grad Scale: 16384 Required: 47 hours
Training: 2022-07-11 12:50:38,940-Speed 2513.68 samples/sec Loss 1.3595 LearningRate 0.000077 Epoch: 29 Global Step: 622090 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:50:47,143-Speed 2496.87 samples/sec Loss 1.3819 LearningRate 0.000077 Epoch: 29 Global Step: 622100 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:50:55,347-Speed 2496.66 samples/sec Loss 1.3762 LearningRate 0.000077 Epoch: 29 Global Step: 622110 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:51:03,545-Speed 2498.61 samples/sec Loss 1.3915 LearningRate 0.000077 Epoch: 29 Global Step: 622120 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:51:11,748-Speed 2497.27 samples/sec Loss 1.3570 LearningRate 0.000077 Epoch: 29 Global Step: 622130 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:51:19,967-Speed 2492.15 samples/sec Loss 1.3785 LearningRate 0.000077 Epoch: 29 Global Step: 622140 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:51:28,116-Speed 2513.74 samples/sec Loss 1.3783 LearningRate 0.000077 Epoch: 29 Global Step: 622150 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:51:36,316-Speed 2498.02 samples/sec Loss 1.3713 LearningRate 0.000077 Epoch: 29 Global Step: 622160 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:51:44,528-Speed 2494.45 samples/sec Loss 1.3748 LearningRate 0.000077 Epoch: 29 Global Step: 622170 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:51:52,729-Speed 2497.41 samples/sec Loss 1.3562 LearningRate 0.000077 Epoch: 29 Global Step: 622180 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:52:03,065-Speed 1981.66 samples/sec Loss 1.3484 LearningRate 0.000077 Epoch: 30 Global Step: 622190 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:52:11,270-Speed 2496.62 samples/sec Loss 1.3514 LearningRate 0.000077 Epoch: 30 Global Step: 622200 Fp16 Grad Scale: 16384 Required: 47 hours 
Training: 2022-07-11 12:52:19,432-Speed 2509.65 samples/sec Loss 1.3827 LearningRate 0.000077 Epoch: 30 Global Step: 622210 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:52:27,633-Speed 2497.29 samples/sec Loss 1.3959 LearningRate 0.000077 Epoch: 30 Global Step: 622220 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:52:35,832-Speed 2498.65 samples/sec Loss 1.3548 LearningRate 0.000077 Epoch: 30 Global Step: 622230 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:52:44,048-Speed 2492.94 samples/sec Loss 1.3483 LearningRate 0.000077 Epoch: 30 Global Step: 622240 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:52:52,254-Speed 2496.31 samples/sec Loss 1.3937 LearningRate 0.000077 Epoch: 30 Global Step: 622250 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:53:00,455-Speed 2497.62 samples/sec Loss 1.3453 LearningRate 0.000077 Epoch: 30 Global Step: 622260 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:53:08,605-Speed 2513.06 samples/sec Loss 1.3323 LearningRate 0.000077 Epoch: 30 Global Step: 622270 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:53:16,808-Speed 2497.03 samples/sec Loss 1.3620 LearningRate 0.000077 Epoch: 30 Global Step: 622280 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:53:25,011-Speed 2497.11 samples/sec Loss 1.3646 LearningRate 0.000077 Epoch: 30 Global Step: 622290 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:53:33,218-Speed 2495.94 samples/sec Loss 1.3651 LearningRate 0.000077 Epoch: 30 Global Step: 622300 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:53:41,426-Speed 2495.46 samples/sec Loss 1.3719 LearningRate 0.000077 Epoch: 30 Global Step: 622310 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:53:49,631-Speed 2496.38 samples/sec Loss 1.3900 LearningRate 0.000077 Epoch: 30 Global Step: 622320 Fp16 Grad Scale: 16384 Required: 47 hours 
Training: 2022-07-11 12:53:57,781-Speed 2513.19 samples/sec Loss 1.3742 LearningRate 0.000077 Epoch: 30 Global Step: 622330 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:54:05,984-Speed 2497.10 samples/sec Loss 1.3595 LearningRate 0.000077 Epoch: 30 Global Step: 622340 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:54:14,188-Speed 2496.85 samples/sec Loss 1.3472 LearningRate 0.000077 Epoch: 30 Global Step: 622350 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:54:22,387-Speed 2497.85 samples/sec Loss 1.3462 LearningRate 0.000077 Epoch: 30 Global Step: 622360 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:54:30,591-Speed 2497.04 samples/sec Loss 1.3995 LearningRate 0.000077 Epoch: 30 Global Step: 622370 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:54:38,794-Speed 2496.99 samples/sec Loss 1.3463 LearningRate 0.000077 Epoch: 30 Global Step: 622380 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:54:46,938-Speed 2514.80 samples/sec Loss 1.3330 LearningRate 0.000077 Epoch: 30 Global Step: 622390 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:54:55,137-Speed 2498.31 samples/sec Loss 1.3550 LearningRate 0.000077 Epoch: 30 Global Step: 622400 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:55:03,336-Speed 2498.83 samples/sec Loss 1.3637 LearningRate 0.000077 Epoch: 30 Global Step: 622410 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:55:11,533-Speed 2499.04 samples/sec Loss 1.3590 LearningRate 0.000077 Epoch: 30 Global Step: 622420 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:55:19,739-Speed 2496.20 samples/sec Loss 1.3380 LearningRate 0.000077 Epoch: 30 Global Step: 622430 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:55:27,940-Speed 2497.58 samples/sec Loss 1.3454 LearningRate 0.000077 Epoch: 30 Global Step: 622440 Fp16 Grad Scale: 16384 Required: 47 hours 
Training: 2022-07-11 12:55:36,089-Speed 2513.88 samples/sec Loss 1.3453 LearningRate 0.000077 Epoch: 30 Global Step: 622450 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:55:44,294-Speed 2496.52 samples/sec Loss 1.3280 LearningRate 0.000077 Epoch: 30 Global Step: 622460 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:55:52,494-Speed 2497.75 samples/sec Loss 1.3291 LearningRate 0.000077 Epoch: 30 Global Step: 622470 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:56:00,699-Speed 2496.31 samples/sec Loss 1.3479 LearningRate 0.000077 Epoch: 30 Global Step: 622480 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:56:08,902-Speed 2497.25 samples/sec Loss 1.3575 LearningRate 0.000077 Epoch: 30 Global Step: 622490 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:56:17,103-Speed 2497.45 samples/sec Loss 1.3321 LearningRate 0.000077 Epoch: 30 Global Step: 622500 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:56:25,251-Speed 2514.10 samples/sec Loss 1.2917 LearningRate 0.000077 Epoch: 30 Global Step: 622510 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:56:33,464-Speed 2494.02 samples/sec Loss 1.3147 LearningRate 0.000077 Epoch: 30 Global Step: 622520 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:56:41,664-Speed 2497.99 samples/sec Loss 1.3128 LearningRate 0.000077 Epoch: 30 Global Step: 622530 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:56:49,863-Speed 2498.10 samples/sec Loss 1.3093 LearningRate 0.000077 Epoch: 30 Global Step: 622540 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:56:58,065-Speed 2497.39 samples/sec Loss 1.3588 LearningRate 0.000077 Epoch: 30 Global Step: 622550 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:57:06,271-Speed 2496.11 samples/sec Loss 1.3420 LearningRate 0.000077 Epoch: 30 Global Step: 622560 Fp16 Grad Scale: 16384 Required: 47 hours 
Training: 2022-07-11 12:57:14,417-Speed 2514.72 samples/sec Loss 1.3210 LearningRate 0.000077 Epoch: 30 Global Step: 622570 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:57:22,618-Speed 2497.43 samples/sec Loss 1.3409 LearningRate 0.000077 Epoch: 30 Global Step: 622580 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:57:30,845-Speed 2489.83 samples/sec Loss 1.3376 LearningRate 0.000077 Epoch: 30 Global Step: 622590 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:57:39,042-Speed 2498.71 samples/sec Loss 1.3331 LearningRate 0.000077 Epoch: 30 Global Step: 622600 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:57:47,248-Speed 2496.23 samples/sec Loss 1.3524 LearningRate 0.000077 Epoch: 30 Global Step: 622610 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:57:55,452-Speed 2496.77 samples/sec Loss 1.3590 LearningRate 0.000077 Epoch: 30 Global Step: 622620 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:58:03,601-Speed 2513.63 samples/sec Loss 1.3393 LearningRate 0.000077 Epoch: 30 Global Step: 622630 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:58:11,808-Speed 2495.83 samples/sec Loss 1.3553 LearningRate 0.000077 Epoch: 30 Global Step: 622640 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:58:20,004-Speed 2499.13 samples/sec Loss 1.3112 LearningRate 0.000077 Epoch: 30 Global Step: 622650 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:58:28,202-Speed 2498.61 samples/sec Loss 1.3389 LearningRate 0.000077 Epoch: 30 Global Step: 622660 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:58:36,403-Speed 2497.64 samples/sec Loss 1.3084 LearningRate 0.000077 Epoch: 30 Global Step: 622670 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 12:58:44,565-Speed 2509.77 samples/sec Loss 1.3410 LearningRate 0.000077 Epoch: 30 Global Step: 622680 Fp16 Grad Scale: 8192 Required: 47 hours 
Training: 2022-07-11 12:58:52,712-Speed 2514.48 samples/sec Loss 1.3370 LearningRate 0.000077 Epoch: 30 Global Step: 622690 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 12:59:00,908-Speed 2499.25 samples/sec Loss 1.3306 LearningRate 0.000077 Epoch: 30 Global Step: 622700 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 12:59:09,126-Speed 2492.57 samples/sec Loss 1.3406 LearningRate 0.000077 Epoch: 30 Global Step: 622710 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 12:59:17,324-Speed 2498.62 samples/sec Loss 1.3286 LearningRate 0.000077 Epoch: 30 Global Step: 622720 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 12:59:25,524-Speed 2497.98 samples/sec Loss 1.3315 LearningRate 0.000077 Epoch: 30 Global Step: 622730 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 12:59:33,725-Speed 2497.59 samples/sec Loss 1.3322 LearningRate 0.000077 Epoch: 30 Global Step: 622740 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 12:59:41,873-Speed 2514.27 samples/sec Loss 1.3725 LearningRate 0.000077 Epoch: 30 Global Step: 622750 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 12:59:50,071-Speed 2498.52 samples/sec Loss 1.3817 LearningRate 0.000077 Epoch: 30 Global Step: 622760 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 12:59:58,276-Speed 2496.65 samples/sec Loss 1.3885 LearningRate 0.000077 Epoch: 30 Global Step: 622770 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:00:06,477-Speed 2497.85 samples/sec Loss 1.3249 LearningRate 0.000077 Epoch: 30 Global Step: 622780 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:00:14,678-Speed 2497.43 samples/sec Loss 1.3339 LearningRate 0.000077 Epoch: 30 Global Step: 622790 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:00:22,879-Speed 2497.65 samples/sec Loss 1.3392 LearningRate 0.000077 Epoch: 30 Global Step: 622800 Fp16 Grad Scale: 8192 Required: 47 hours Training: 
2022-07-11 13:00:31,026-Speed 2514.12 samples/sec Loss 1.3671 LearningRate 0.000077 Epoch: 30 Global Step: 622810 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:00:39,229-Speed 2497.18 samples/sec Loss 1.3309 LearningRate 0.000077 Epoch: 30 Global Step: 622820 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:00:47,429-Speed 2498.08 samples/sec Loss 1.3667 LearningRate 0.000077 Epoch: 30 Global Step: 622830 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:00:55,629-Speed 2497.98 samples/sec Loss 1.3265 LearningRate 0.000077 Epoch: 30 Global Step: 622840 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:01:03,827-Speed 2498.61 samples/sec Loss 1.3377 LearningRate 0.000077 Epoch: 30 Global Step: 622850 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:01:12,029-Speed 2497.22 samples/sec Loss 1.3265 LearningRate 0.000077 Epoch: 30 Global Step: 622860 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:01:20,175-Speed 2514.76 samples/sec Loss 1.3726 LearningRate 0.000077 Epoch: 30 Global Step: 622870 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:01:28,333-Speed 2510.68 samples/sec Loss 1.3670 LearningRate 0.000077 Epoch: 30 Global Step: 622880 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:01:36,532-Speed 2498.35 samples/sec Loss 1.3353 LearningRate 0.000077 Epoch: 30 Global Step: 622890 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:01:44,736-Speed 2496.83 samples/sec Loss 1.3619 LearningRate 0.000077 Epoch: 30 Global Step: 622900 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:01:52,939-Speed 2497.13 samples/sec Loss 1.3297 LearningRate 0.000077 Epoch: 30 Global Step: 622910 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:02:01,136-Speed 2498.66 samples/sec Loss 1.3466 LearningRate 0.000077 Epoch: 30 Global Step: 622920 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 
13:02:09,284-Speed 2514.19 samples/sec Loss 1.3602 LearningRate 0.000077 Epoch: 30 Global Step: 622930 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:02:17,517-Speed 2487.70 samples/sec Loss 1.3349 LearningRate 0.000077 Epoch: 30 Global Step: 622940 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:02:25,715-Speed 2498.44 samples/sec Loss 1.3377 LearningRate 0.000077 Epoch: 30 Global Step: 622950 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:02:33,921-Speed 2496.44 samples/sec Loss 1.3675 LearningRate 0.000077 Epoch: 30 Global Step: 622960 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:02:42,120-Speed 2498.33 samples/sec Loss 1.3704 LearningRate 0.000077 Epoch: 30 Global Step: 622970 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:02:50,319-Speed 2498.34 samples/sec Loss 1.3265 LearningRate 0.000077 Epoch: 30 Global Step: 622980 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:02:58,469-Speed 2513.43 samples/sec Loss 1.3447 LearningRate 0.000077 Epoch: 30 Global Step: 622990 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:03:06,666-Speed 2498.63 samples/sec Loss 1.3522 LearningRate 0.000077 Epoch: 30 Global Step: 623000 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:03:14,864-Speed 2498.70 samples/sec Loss 1.3709 LearningRate 0.000077 Epoch: 30 Global Step: 623010 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:03:23,059-Speed 2499.24 samples/sec Loss 1.3292 LearningRate 0.000077 Epoch: 30 Global Step: 623020 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:03:31,254-Speed 2499.57 samples/sec Loss 1.3810 LearningRate 0.000077 Epoch: 30 Global Step: 623030 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:03:39,452-Speed 2498.92 samples/sec Loss 1.3494 LearningRate 0.000077 Epoch: 30 Global Step: 623040 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:03:47,596-Speed 
2515.36 samples/sec Loss 1.3674 LearningRate 0.000077 Epoch: 30 Global Step: 623050 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:03:55,792-Speed 2499.10 samples/sec Loss 1.3484 LearningRate 0.000077 Epoch: 30 Global Step: 623060 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:04:03,999-Speed 2495.80 samples/sec Loss 1.3378 LearningRate 0.000076 Epoch: 30 Global Step: 623070 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:04:12,196-Speed 2498.98 samples/sec Loss 1.3292 LearningRate 0.000076 Epoch: 30 Global Step: 623080 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:04:20,394-Speed 2498.42 samples/sec Loss 1.3808 LearningRate 0.000076 Epoch: 30 Global Step: 623090 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:04:28,595-Speed 2497.82 samples/sec Loss 1.3550 LearningRate 0.000076 Epoch: 30 Global Step: 623100 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:04:36,739-Speed 2514.92 samples/sec Loss 1.3425 LearningRate 0.000076 Epoch: 30 Global Step: 623110 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:04:44,941-Speed 2497.41 samples/sec Loss 1.3459 LearningRate 0.000076 Epoch: 30 Global Step: 623120 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:04:53,151-Speed 2494.94 samples/sec Loss 1.3601 LearningRate 0.000076 Epoch: 30 Global Step: 623130 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:05:01,347-Speed 2499.33 samples/sec Loss 1.3612 LearningRate 0.000076 Epoch: 30 Global Step: 623140 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:05:09,547-Speed 2497.96 samples/sec Loss 1.3416 LearningRate 0.000076 Epoch: 30 Global Step: 623150 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:05:17,747-Speed 2497.86 samples/sec Loss 1.3759 LearningRate 0.000076 Epoch: 30 Global Step: 623160 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:05:25,894-Speed 2514.42 samples/sec 
Loss 1.3636 LearningRate 0.000076 Epoch: 30 Global Step: 623170 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:05:34,093-Speed 2498.32 samples/sec Loss 1.3565 LearningRate 0.000076 Epoch: 30 Global Step: 623180 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:05:42,290-Speed 2498.89 samples/sec Loss 1.3625 LearningRate 0.000076 Epoch: 30 Global Step: 623190 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:05:50,501-Speed 2494.74 samples/sec Loss 1.3318 LearningRate 0.000076 Epoch: 30 Global Step: 623200 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:05:58,702-Speed 2497.85 samples/sec Loss 1.3835 LearningRate 0.000076 Epoch: 30 Global Step: 623210 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:06:06,900-Speed 2498.65 samples/sec Loss 1.3678 LearningRate 0.000076 Epoch: 30 Global Step: 623220 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:06:15,044-Speed 2515.16 samples/sec Loss 1.3410 LearningRate 0.000076 Epoch: 30 Global Step: 623230 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:06:23,240-Speed 2499.11 samples/sec Loss 1.3647 LearningRate 0.000076 Epoch: 30 Global Step: 623240 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:06:31,434-Speed 2499.72 samples/sec Loss 1.3464 LearningRate 0.000076 Epoch: 30 Global Step: 623250 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:06:39,630-Speed 2499.39 samples/sec Loss 1.3438 LearningRate 0.000076 Epoch: 30 Global Step: 623260 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:06:47,825-Speed 2499.29 samples/sec Loss 1.3308 LearningRate 0.000076 Epoch: 30 Global Step: 623270 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:06:56,026-Speed 2497.94 samples/sec Loss 1.3406 LearningRate 0.000076 Epoch: 30 Global Step: 623280 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:07:04,182-Speed 2511.38 samples/sec Loss 1.3402 
LearningRate 0.000076 Epoch: 30 Global Step: 623290 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:07:12,378-Speed 2499.15 samples/sec Loss 1.3469 LearningRate 0.000076 Epoch: 30 Global Step: 623300 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:07:20,576-Speed 2498.62 samples/sec Loss 1.3281 LearningRate 0.000076 Epoch: 30 Global Step: 623310 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:07:28,773-Speed 2498.76 samples/sec Loss 1.3404 LearningRate 0.000076 Epoch: 30 Global Step: 623320 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:07:36,969-Speed 2499.37 samples/sec Loss 1.3706 LearningRate 0.000076 Epoch: 30 Global Step: 623330 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:07:45,165-Speed 2498.91 samples/sec Loss 1.3584 LearningRate 0.000076 Epoch: 30 Global Step: 623340 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:07:53,308-Speed 2515.32 samples/sec Loss 1.2990 LearningRate 0.000076 Epoch: 30 Global Step: 623350 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:08:01,506-Speed 2498.63 samples/sec Loss 1.3122 LearningRate 0.000076 Epoch: 30 Global Step: 623360 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:08:09,709-Speed 2497.02 samples/sec Loss 1.3347 LearningRate 0.000076 Epoch: 30 Global Step: 623370 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:08:17,907-Speed 2498.59 samples/sec Loss 1.3452 LearningRate 0.000076 Epoch: 30 Global Step: 623380 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:08:26,106-Speed 2498.43 samples/sec Loss 1.3124 LearningRate 0.000076 Epoch: 30 Global Step: 623390 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:08:34,305-Speed 2498.20 samples/sec Loss 1.3591 LearningRate 0.000076 Epoch: 30 Global Step: 623400 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:08:42,453-Speed 2513.90 samples/sec Loss 1.3636 LearningRate 
0.000076 Epoch: 30 Global Step: 623410 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:08:50,661-Speed 2495.73 samples/sec Loss 1.3607 LearningRate 0.000076 Epoch: 30 Global Step: 623420 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:08:58,856-Speed 2499.44 samples/sec Loss 1.3430 LearningRate 0.000076 Epoch: 30 Global Step: 623430 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:09:07,061-Speed 2496.44 samples/sec Loss 1.3435 LearningRate 0.000076 Epoch: 30 Global Step: 623440 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:09:15,262-Speed 2497.83 samples/sec Loss 1.3281 LearningRate 0.000076 Epoch: 30 Global Step: 623450 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:09:23,464-Speed 2497.24 samples/sec Loss 1.3474 LearningRate 0.000076 Epoch: 30 Global Step: 623460 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:09:31,608-Speed 2515.27 samples/sec Loss 1.3345 LearningRate 0.000076 Epoch: 30 Global Step: 623470 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:09:39,804-Speed 2499.02 samples/sec Loss 1.3588 LearningRate 0.000076 Epoch: 30 Global Step: 623480 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:09:48,002-Speed 2498.60 samples/sec Loss 1.3342 LearningRate 0.000076 Epoch: 30 Global Step: 623490 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:09:56,201-Speed 2498.12 samples/sec Loss 1.3501 LearningRate 0.000076 Epoch: 30 Global Step: 623500 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:10:04,401-Speed 2498.04 samples/sec Loss 1.3551 LearningRate 0.000076 Epoch: 30 Global Step: 623510 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:10:12,602-Speed 2497.79 samples/sec Loss 1.3627 LearningRate 0.000076 Epoch: 30 Global Step: 623520 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:10:20,749-Speed 2514.02 samples/sec Loss 1.3089 LearningRate 0.000076 Epoch: 30 
Global Step: 623530 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:10:28,946-Speed 2499.09 samples/sec Loss 1.3413 LearningRate 0.000076 Epoch: 30 Global Step: 623540 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:10:37,144-Speed 2498.82 samples/sec Loss 1.3470 LearningRate 0.000076 Epoch: 30 Global Step: 623550 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:10:45,339-Speed 2499.48 samples/sec Loss 1.3315 LearningRate 0.000076 Epoch: 30 Global Step: 623560 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:10:53,546-Speed 2495.63 samples/sec Loss 1.3545 LearningRate 0.000076 Epoch: 30 Global Step: 623570 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:11:01,744-Speed 2498.55 samples/sec Loss 1.3236 LearningRate 0.000076 Epoch: 30 Global Step: 623580 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:11:09,888-Speed 2515.16 samples/sec Loss 1.3223 LearningRate 0.000076 Epoch: 30 Global Step: 623590 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:11:18,092-Speed 2497.03 samples/sec Loss 1.3619 LearningRate 0.000076 Epoch: 30 Global Step: 623600 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:11:26,292-Speed 2497.76 samples/sec Loss 1.3448 LearningRate 0.000076 Epoch: 30 Global Step: 623610 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:11:34,488-Speed 2499.36 samples/sec Loss 1.3437 LearningRate 0.000076 Epoch: 30 Global Step: 623620 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:11:42,700-Speed 2494.54 samples/sec Loss 1.3596 LearningRate 0.000076 Epoch: 30 Global Step: 623630 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:11:50,902-Speed 2497.12 samples/sec Loss 1.3213 LearningRate 0.000076 Epoch: 30 Global Step: 623640 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:11:59,045-Speed 2516.36 samples/sec Loss 1.3303 LearningRate 0.000076 Epoch: 30 Global Step: 623650 
Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:12:07,241-Speed 2499.46 samples/sec Loss 1.3845 LearningRate 0.000076 Epoch: 30 Global Step: 623660 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:12:15,443-Speed 2497.31 samples/sec Loss 1.3040 LearningRate 0.000076 Epoch: 30 Global Step: 623670 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:12:23,640-Speed 2498.87 samples/sec Loss 1.3311 LearningRate 0.000076 Epoch: 30 Global Step: 623680 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:12:31,840-Speed 2498.04 samples/sec Loss 1.3196 LearningRate 0.000076 Epoch: 30 Global Step: 623690 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:12:40,039-Speed 2498.13 samples/sec Loss 1.3120 LearningRate 0.000076 Epoch: 30 Global Step: 623700 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:12:48,184-Speed 2514.79 samples/sec Loss 1.3159 LearningRate 0.000076 Epoch: 30 Global Step: 623710 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:12:56,394-Speed 2494.89 samples/sec Loss 1.3170 LearningRate 0.000076 Epoch: 30 Global Step: 623720 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:13:04,592-Speed 2498.45 samples/sec Loss 1.3284 LearningRate 0.000076 Epoch: 30 Global Step: 623730 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:13:12,793-Speed 2497.80 samples/sec Loss 1.3520 LearningRate 0.000076 Epoch: 30 Global Step: 623740 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:13:20,994-Speed 2497.64 samples/sec Loss 1.3304 LearningRate 0.000076 Epoch: 30 Global Step: 623750 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:13:29,195-Speed 2497.55 samples/sec Loss 1.3519 LearningRate 0.000076 Epoch: 30 Global Step: 623760 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:13:37,343-Speed 2513.82 samples/sec Loss 1.3379 LearningRate 0.000076 Epoch: 30 Global Step: 623770 Fp16 Grad Scale: 
4096 Required: 47 hours Training: 2022-07-11 13:13:45,541-Speed 2498.65 samples/sec Loss 1.3476 LearningRate 0.000076 Epoch: 30 Global Step: 623780 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:13:53,737-Speed 2499.16 samples/sec Loss 1.3237 LearningRate 0.000076 Epoch: 30 Global Step: 623790 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:14:01,935-Speed 2498.74 samples/sec Loss 1.3751 LearningRate 0.000076 Epoch: 30 Global Step: 623800 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:14:10,135-Speed 2497.76 samples/sec Loss 1.3648 LearningRate 0.000076 Epoch: 30 Global Step: 623810 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:14:18,336-Speed 2498.17 samples/sec Loss 1.3206 LearningRate 0.000076 Epoch: 30 Global Step: 623820 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:14:26,479-Speed 2515.41 samples/sec Loss 1.3457 LearningRate 0.000076 Epoch: 30 Global Step: 623830 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:14:34,674-Speed 2499.33 samples/sec Loss 1.3176 LearningRate 0.000076 Epoch: 30 Global Step: 623840 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:14:42,870-Speed 2499.30 samples/sec Loss 1.3957 LearningRate 0.000076 Epoch: 30 Global Step: 623850 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:14:51,083-Speed 2494.33 samples/sec Loss 1.3531 LearningRate 0.000076 Epoch: 30 Global Step: 623860 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:14:59,280-Speed 2498.65 samples/sec Loss 1.3688 LearningRate 0.000076 Epoch: 30 Global Step: 623870 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:15:07,476-Speed 2499.27 samples/sec Loss 1.3520 LearningRate 0.000076 Epoch: 30 Global Step: 623880 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:15:15,617-Speed 2516.28 samples/sec Loss 1.3501 LearningRate 0.000076 Epoch: 30 Global Step: 623890 Fp16 Grad Scale: 4096 Required: 47 
hours Training: 2022-07-11 13:15:23,815-Speed 2498.53 samples/sec Loss 1.3466 LearningRate 0.000076 Epoch: 30 Global Step: 623900 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:15:32,012-Speed 2498.78 samples/sec Loss 1.3090 LearningRate 0.000076 Epoch: 30 Global Step: 623910 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:15:40,223-Speed 2494.85 samples/sec Loss 1.3604 LearningRate 0.000076 Epoch: 30 Global Step: 623920 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:15:48,421-Speed 2498.26 samples/sec Loss 1.3626 LearningRate 0.000076 Epoch: 30 Global Step: 623930 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:15:56,625-Speed 2496.87 samples/sec Loss 1.3184 LearningRate 0.000076 Epoch: 30 Global Step: 623940 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:16:04,768-Speed 2515.61 samples/sec Loss 1.3138 LearningRate 0.000076 Epoch: 30 Global Step: 623950 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:16:12,967-Speed 2498.14 samples/sec Loss 1.3447 LearningRate 0.000076 Epoch: 30 Global Step: 623960 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:16:21,165-Speed 2498.52 samples/sec Loss 1.3239 LearningRate 0.000076 Epoch: 30 Global Step: 623970 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:16:29,377-Speed 2494.57 samples/sec Loss 1.3341 LearningRate 0.000076 Epoch: 30 Global Step: 623980 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:16:37,570-Speed 2499.89 samples/sec Loss 1.3121 LearningRate 0.000076 Epoch: 30 Global Step: 623990 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:16:45,772-Speed 2497.24 samples/sec Loss 1.3218 LearningRate 0.000076 Epoch: 30 Global Step: 624000 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:16:53,917-Speed 2515.05 samples/sec Loss 1.3665 LearningRate 0.000076 Epoch: 30 Global Step: 624010 Fp16 Grad Scale: 4096 Required: 47 hours Training: 
2022-07-11 13:17:02,113-Speed 2499.08 samples/sec Loss 1.3499 LearningRate 0.000076 Epoch: 30 Global Step: 624020 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:17:10,310-Speed 2498.99 samples/sec Loss 1.3332 LearningRate 0.000076 Epoch: 30 Global Step: 624030 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:17:18,506-Speed 2498.96 samples/sec Loss 1.3172 LearningRate 0.000076 Epoch: 30 Global Step: 624040 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:17:26,702-Speed 2499.17 samples/sec Loss 1.3517 LearningRate 0.000076 Epoch: 30 Global Step: 624050 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:17:34,899-Speed 2499.36 samples/sec Loss 1.3256 LearningRate 0.000076 Epoch: 30 Global Step: 624060 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:17:43,049-Speed 2513.07 samples/sec Loss 1.3611 LearningRate 0.000076 Epoch: 30 Global Step: 624070 Fp16 Grad Scale: 4096 Required: 47 hours Training: 2022-07-11 13:17:51,256-Speed 2496.05 samples/sec Loss 1.3523 LearningRate 0.000076 Epoch: 30 Global Step: 624080 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:17:59,453-Speed 2498.76 samples/sec Loss 1.3475 LearningRate 0.000076 Epoch: 30 Global Step: 624090 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:18:07,653-Speed 2498.05 samples/sec Loss 1.3432 LearningRate 0.000076 Epoch: 30 Global Step: 624100 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:18:15,855-Speed 2497.37 samples/sec Loss 1.3767 LearningRate 0.000076 Epoch: 30 Global Step: 624110 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:18:24,053-Speed 2498.55 samples/sec Loss 1.3721 LearningRate 0.000076 Epoch: 30 Global Step: 624120 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:18:32,205-Speed 2512.63 samples/sec Loss 1.3311 LearningRate 0.000076 Epoch: 30 Global Step: 624130 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 
13:18:40,406-Speed 2497.72 samples/sec Loss 1.3752 LearningRate 0.000076 Epoch: 30 Global Step: 624140 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:18:48,605-Speed 2498.16 samples/sec Loss 1.3596 LearningRate 0.000076 Epoch: 30 Global Step: 624150 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:18:56,809-Speed 2496.84 samples/sec Loss 1.3263 LearningRate 0.000076 Epoch: 30 Global Step: 624160 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:19:05,009-Speed 2497.82 samples/sec Loss 1.3504 LearningRate 0.000076 Epoch: 30 Global Step: 624170 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:19:13,219-Speed 2495.20 samples/sec Loss 1.3674 LearningRate 0.000076 Epoch: 30 Global Step: 624180 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:19:21,374-Speed 2511.65 samples/sec Loss 1.3800 LearningRate 0.000076 Epoch: 30 Global Step: 624190 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:19:29,571-Speed 2498.87 samples/sec Loss 1.3658 LearningRate 0.000076 Epoch: 30 Global Step: 624200 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:19:37,770-Speed 2498.61 samples/sec Loss 1.3782 LearningRate 0.000076 Epoch: 30 Global Step: 624210 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:19:45,971-Speed 2497.52 samples/sec Loss 1.3619 LearningRate 0.000076 Epoch: 30 Global Step: 624220 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:19:54,179-Speed 2495.47 samples/sec Loss 1.3737 LearningRate 0.000076 Epoch: 30 Global Step: 624230 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:20:02,377-Speed 2498.62 samples/sec Loss 1.3357 LearningRate 0.000076 Epoch: 30 Global Step: 624240 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:20:10,526-Speed 2513.68 samples/sec Loss 1.3423 LearningRate 0.000076 Epoch: 30 Global Step: 624250 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:20:18,724-Speed 
2498.51 samples/sec Loss 1.3742 LearningRate 0.000076 Epoch: 30 Global Step: 624260 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:20:26,924-Speed 2497.82 samples/sec Loss 1.3586 LearningRate 0.000076 Epoch: 30 Global Step: 624270 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:20:35,125-Speed 2497.73 samples/sec Loss 1.3305 LearningRate 0.000076 Epoch: 30 Global Step: 624280 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:20:43,336-Speed 2494.71 samples/sec Loss 1.3169 LearningRate 0.000076 Epoch: 30 Global Step: 624290 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:20:51,530-Speed 2499.67 samples/sec Loss 1.3603 LearningRate 0.000076 Epoch: 30 Global Step: 624300 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:20:59,677-Speed 2514.19 samples/sec Loss 1.2886 LearningRate 0.000076 Epoch: 30 Global Step: 624310 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:21:07,876-Speed 2498.25 samples/sec Loss 1.3447 LearningRate 0.000076 Epoch: 30 Global Step: 624320 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:21:16,079-Speed 2497.23 samples/sec Loss 1.3472 LearningRate 0.000076 Epoch: 30 Global Step: 624330 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:21:24,278-Speed 2498.30 samples/sec Loss 1.3561 LearningRate 0.000076 Epoch: 30 Global Step: 624340 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:21:32,476-Speed 2498.39 samples/sec Loss 1.3627 LearningRate 0.000076 Epoch: 30 Global Step: 624350 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:21:40,680-Speed 2496.69 samples/sec Loss 1.3479 LearningRate 0.000076 Epoch: 30 Global Step: 624360 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:21:48,824-Speed 2515.14 samples/sec Loss 1.3531 LearningRate 0.000076 Epoch: 30 Global Step: 624370 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:21:57,031-Speed 2495.95 samples/sec 
Loss 1.3815 LearningRate 0.000076 Epoch: 30 Global Step: 624380 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:22:05,228-Speed 2498.99 samples/sec Loss 1.3431 LearningRate 0.000076 Epoch: 30 Global Step: 624390 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:22:13,426-Speed 2498.66 samples/sec Loss 1.3745 LearningRate 0.000076 Epoch: 30 Global Step: 624400 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:22:21,624-Speed 2498.82 samples/sec Loss 1.3601 LearningRate 0.000076 Epoch: 30 Global Step: 624410 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:22:29,822-Speed 2498.43 samples/sec Loss 1.3486 LearningRate 0.000075 Epoch: 30 Global Step: 624420 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:22:37,975-Speed 2512.31 samples/sec Loss 1.3538 LearningRate 0.000075 Epoch: 30 Global Step: 624430 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:22:46,173-Speed 2498.45 samples/sec Loss 1.3225 LearningRate 0.000075 Epoch: 30 Global Step: 624440 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:22:54,368-Speed 2499.78 samples/sec Loss 1.3700 LearningRate 0.000075 Epoch: 30 Global Step: 624450 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:23:02,565-Speed 2498.82 samples/sec Loss 1.3015 LearningRate 0.000075 Epoch: 30 Global Step: 624460 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:23:10,766-Speed 2497.73 samples/sec Loss 1.3842 LearningRate 0.000075 Epoch: 30 Global Step: 624470 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:23:18,964-Speed 2498.51 samples/sec Loss 1.3524 LearningRate 0.000075 Epoch: 30 Global Step: 624480 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:23:27,111-Speed 2514.34 samples/sec Loss 1.3356 LearningRate 0.000075 Epoch: 30 Global Step: 624490 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:23:35,320-Speed 2495.17 samples/sec Loss 1.3456 
LearningRate 0.000075 Epoch: 30 Global Step: 624500 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:23:43,519-Speed 2498.03 samples/sec Loss 1.3485 LearningRate 0.000075 Epoch: 30 Global Step: 624510 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:23:51,727-Speed 2495.59 samples/sec Loss 1.3263 LearningRate 0.000075 Epoch: 30 Global Step: 624520 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:23:59,925-Speed 2498.61 samples/sec Loss 1.3316 LearningRate 0.000075 Epoch: 30 Global Step: 624530 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:24:08,127-Speed 2497.34 samples/sec Loss 1.3195 LearningRate 0.000075 Epoch: 30 Global Step: 624540 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:24:16,274-Speed 2514.16 samples/sec Loss 1.3500 LearningRate 0.000075 Epoch: 30 Global Step: 624550 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:24:24,472-Speed 2498.52 samples/sec Loss 1.3233 LearningRate 0.000075 Epoch: 30 Global Step: 624560 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:24:32,670-Speed 2498.53 samples/sec Loss 1.2936 LearningRate 0.000075 Epoch: 30 Global Step: 624570 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:24:40,869-Speed 2498.34 samples/sec Loss 1.3789 LearningRate 0.000075 Epoch: 30 Global Step: 624580 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:24:49,067-Speed 2498.91 samples/sec Loss 1.3667 LearningRate 0.000075 Epoch: 30 Global Step: 624590 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:24:57,277-Speed 2494.81 samples/sec Loss 1.3595 LearningRate 0.000075 Epoch: 30 Global Step: 624600 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:25:05,426-Speed 2513.76 samples/sec Loss 1.3693 LearningRate 0.000075 Epoch: 30 Global Step: 624610 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:25:13,626-Speed 2497.78 samples/sec Loss 1.3565 LearningRate 
0.000075 Epoch: 30 Global Step: 624620 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:25:21,827-Speed 2497.55 samples/sec Loss 1.3608 LearningRate 0.000075 Epoch: 30 Global Step: 624630 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:25:30,027-Speed 2498.00 samples/sec Loss 1.3281 LearningRate 0.000075 Epoch: 30 Global Step: 624640 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:25:38,238-Speed 2494.70 samples/sec Loss 1.3113 LearningRate 0.000075 Epoch: 30 Global Step: 624650 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:25:46,449-Speed 2494.63 samples/sec Loss 1.3615 LearningRate 0.000075 Epoch: 30 Global Step: 624660 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:25:54,606-Speed 2511.06 samples/sec Loss 1.3376 LearningRate 0.000075 Epoch: 30 Global Step: 624670 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:26:02,804-Speed 2498.56 samples/sec Loss 1.3353 LearningRate 0.000075 Epoch: 30 Global Step: 624680 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:26:11,002-Speed 2500.01 samples/sec Loss 1.3683 LearningRate 0.000075 Epoch: 30 Global Step: 624690 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:26:19,206-Speed 2496.87 samples/sec Loss 1.3554 LearningRate 0.000075 Epoch: 30 Global Step: 624700 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:26:27,404-Speed 2498.56 samples/sec Loss 1.3374 LearningRate 0.000075 Epoch: 30 Global Step: 624710 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:26:35,602-Speed 2498.35 samples/sec Loss 1.3507 LearningRate 0.000075 Epoch: 30 Global Step: 624720 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:26:43,749-Speed 2514.28 samples/sec Loss 1.3400 LearningRate 0.000075 Epoch: 30 Global Step: 624730 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:26:51,950-Speed 2497.82 samples/sec Loss 1.3521 LearningRate 0.000075 Epoch: 30 
Global Step: 624740 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:27:00,161-Speed 2494.55 samples/sec Loss 1.3293 LearningRate 0.000075 Epoch: 30 Global Step: 624750 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:27:08,364-Speed 2497.24 samples/sec Loss 1.3634 LearningRate 0.000075 Epoch: 30 Global Step: 624760 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:27:16,563-Speed 2498.36 samples/sec Loss 1.3367 LearningRate 0.000075 Epoch: 30 Global Step: 624770 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:27:24,764-Speed 2497.44 samples/sec Loss 1.2835 LearningRate 0.000075 Epoch: 30 Global Step: 624780 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:27:32,912-Speed 2514.03 samples/sec Loss 1.3206 LearningRate 0.000075 Epoch: 30 Global Step: 624790 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:27:41,113-Speed 2497.73 samples/sec Loss 1.3454 LearningRate 0.000075 Epoch: 30 Global Step: 624800 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:27:49,315-Speed 2497.18 samples/sec Loss 1.3454 LearningRate 0.000075 Epoch: 30 Global Step: 624810 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:27:57,514-Speed 2498.21 samples/sec Loss 1.3535 LearningRate 0.000075 Epoch: 30 Global Step: 624820 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:28:05,728-Speed 2493.71 samples/sec Loss 1.3753 LearningRate 0.000075 Epoch: 30 Global Step: 624830 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:28:13,929-Speed 2497.54 samples/sec Loss 1.3467 LearningRate 0.000075 Epoch: 30 Global Step: 624840 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:28:22,075-Speed 2514.53 samples/sec Loss 1.3261 LearningRate 0.000075 Epoch: 30 Global Step: 624850 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:28:30,274-Speed 2498.40 samples/sec Loss 1.3326 LearningRate 0.000075 Epoch: 30 Global Step: 624860 
Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:28:38,473-Speed 2498.52 samples/sec Loss 1.3399 LearningRate 0.000075 Epoch: 30 Global Step: 624870 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:28:46,669-Speed 2498.91 samples/sec Loss 1.3443 LearningRate 0.000075 Epoch: 30 Global Step: 624880 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:28:54,872-Speed 2497.30 samples/sec Loss 1.3354 LearningRate 0.000075 Epoch: 30 Global Step: 624890 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:29:03,071-Speed 2498.23 samples/sec Loss 1.3386 LearningRate 0.000075 Epoch: 30 Global Step: 624900 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:29:11,217-Speed 2514.26 samples/sec Loss 1.3876 LearningRate 0.000075 Epoch: 30 Global Step: 624910 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:29:19,419-Speed 2497.57 samples/sec Loss 1.3449 LearningRate 0.000075 Epoch: 30 Global Step: 624920 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:29:27,617-Speed 2498.35 samples/sec Loss 1.3484 LearningRate 0.000075 Epoch: 30 Global Step: 624930 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:29:35,819-Speed 2497.71 samples/sec Loss 1.3436 LearningRate 0.000075 Epoch: 30 Global Step: 624940 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:29:44,029-Speed 2494.82 samples/sec Loss 1.3267 LearningRate 0.000075 Epoch: 30 Global Step: 624950 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:29:52,244-Speed 2493.36 samples/sec Loss 1.3467 LearningRate 0.000075 Epoch: 30 Global Step: 624960 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:30:00,389-Speed 2514.70 samples/sec Loss 1.3217 LearningRate 0.000075 Epoch: 30 Global Step: 624970 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:30:08,596-Speed 2495.98 samples/sec Loss 1.3504 LearningRate 0.000075 Epoch: 30 Global Step: 624980 Fp16 Grad Scale: 
8192 Required: 47 hours Training: 2022-07-11 13:30:16,797-Speed 2497.56 samples/sec Loss 1.3241 LearningRate 0.000075 Epoch: 30 Global Step: 624990 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:30:24,999-Speed 2498.07 samples/sec Loss 1.3536 LearningRate 0.000075 Epoch: 30 Global Step: 625000 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:30:33,203-Speed 2496.67 samples/sec Loss 1.3514 LearningRate 0.000075 Epoch: 30 Global Step: 625010 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:30:41,400-Speed 2498.85 samples/sec Loss 1.3446 LearningRate 0.000075 Epoch: 30 Global Step: 625020 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:30:49,556-Speed 2511.60 samples/sec Loss 1.3698 LearningRate 0.000075 Epoch: 30 Global Step: 625030 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:30:57,757-Speed 2498.14 samples/sec Loss 1.3579 LearningRate 0.000075 Epoch: 30 Global Step: 625040 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:31:05,954-Speed 2498.67 samples/sec Loss 1.3065 LearningRate 0.000075 Epoch: 30 Global Step: 625050 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:31:14,155-Speed 2497.91 samples/sec Loss 1.3054 LearningRate 0.000075 Epoch: 30 Global Step: 625060 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:31:22,357-Speed 2497.08 samples/sec Loss 1.3410 LearningRate 0.000075 Epoch: 30 Global Step: 625070 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:31:30,558-Speed 2497.88 samples/sec Loss 1.3565 LearningRate 0.000075 Epoch: 30 Global Step: 625080 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:31:38,708-Speed 2513.40 samples/sec Loss 1.3388 LearningRate 0.000075 Epoch: 30 Global Step: 625090 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:31:46,910-Speed 2497.32 samples/sec Loss 1.3381 LearningRate 0.000075 Epoch: 30 Global Step: 625100 Fp16 Grad Scale: 8192 Required: 47 
hours Training: 2022-07-11 13:31:55,112-Speed 2497.27 samples/sec Loss 1.3389 LearningRate 0.000075 Epoch: 30 Global Step: 625110 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:32:03,313-Speed 2497.90 samples/sec Loss 1.3368 LearningRate 0.000075 Epoch: 30 Global Step: 625120 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:32:11,517-Speed 2497.19 samples/sec Loss 1.3203 LearningRate 0.000075 Epoch: 30 Global Step: 625130 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:32:19,717-Speed 2497.66 samples/sec Loss 1.3541 LearningRate 0.000075 Epoch: 30 Global Step: 625140 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:32:27,871-Speed 2512.37 samples/sec Loss 1.3137 LearningRate 0.000075 Epoch: 30 Global Step: 625150 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:32:36,074-Speed 2497.00 samples/sec Loss 1.3781 LearningRate 0.000075 Epoch: 30 Global Step: 625160 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:32:44,271-Speed 2498.81 samples/sec Loss 1.3301 LearningRate 0.000075 Epoch: 30 Global Step: 625170 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:32:52,475-Speed 2497.24 samples/sec Loss 1.3372 LearningRate 0.000075 Epoch: 30 Global Step: 625180 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:33:00,673-Speed 2498.51 samples/sec Loss 1.3239 LearningRate 0.000075 Epoch: 30 Global Step: 625190 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:33:08,873-Speed 2497.97 samples/sec Loss 1.3331 LearningRate 0.000075 Epoch: 30 Global Step: 625200 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:33:17,016-Speed 2515.20 samples/sec Loss 1.3649 LearningRate 0.000075 Epoch: 30 Global Step: 625210 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:33:25,225-Speed 2495.26 samples/sec Loss 1.3717 LearningRate 0.000075 Epoch: 30 Global Step: 625220 Fp16 Grad Scale: 8192 Required: 47 hours Training: 
2022-07-11 13:33:33,423-Speed 2498.46 samples/sec Loss 1.3503 LearningRate 0.000075 Epoch: 30 Global Step: 625230 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:33:41,635-Speed 2494.35 samples/sec Loss 1.3338 LearningRate 0.000075 Epoch: 30 Global Step: 625240 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:33:49,834-Speed 2498.13 samples/sec Loss 1.3582 LearningRate 0.000075 Epoch: 30 Global Step: 625250 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:33:58,034-Speed 2498.20 samples/sec Loss 1.3493 LearningRate 0.000075 Epoch: 30 Global Step: 625260 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:34:06,179-Speed 2514.71 samples/sec Loss 1.3823 LearningRate 0.000075 Epoch: 30 Global Step: 625270 Fp16 Grad Scale: 8192 Required: 47 hours Training: 2022-07-11 13:34:14,380-Speed 2497.73 samples/sec Loss 1.3453 LearningRate 0.000075 Epoch: 30 Global Step: 625280 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:34:22,584-Speed 2496.69 samples/sec Loss 1.3406 LearningRate 0.000075 Epoch: 30 Global Step: 625290 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:34:30,784-Speed 2497.96 samples/sec Loss 1.3680 LearningRate 0.000075 Epoch: 30 Global Step: 625300 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:34:38,990-Speed 2495.94 samples/sec Loss 1.3295 LearningRate 0.000075 Epoch: 30 Global Step: 625310 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:34:47,204-Speed 2493.72 samples/sec Loss 1.3600 LearningRate 0.000075 Epoch: 30 Global Step: 625320 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:34:55,353-Speed 2513.54 samples/sec Loss 1.3402 LearningRate 0.000075 Epoch: 30 Global Step: 625330 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:35:03,553-Speed 2498.18 samples/sec Loss 1.3720 LearningRate 0.000075 Epoch: 30 Global Step: 625340 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 
13:35:11,755-Speed 2497.26 samples/sec Loss 1.3316 LearningRate 0.000075 Epoch: 30 Global Step: 625350 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:35:19,956-Speed 2497.99 samples/sec Loss 1.3398 LearningRate 0.000075 Epoch: 30 Global Step: 625360 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:35:28,156-Speed 2497.75 samples/sec Loss 1.3550 LearningRate 0.000075 Epoch: 30 Global Step: 625370 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:35:36,364-Speed 2495.67 samples/sec Loss 1.3620 LearningRate 0.000075 Epoch: 30 Global Step: 625380 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:35:44,515-Speed 2512.94 samples/sec Loss 1.3713 LearningRate 0.000075 Epoch: 30 Global Step: 625390 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:35:52,715-Speed 2498.16 samples/sec Loss 1.3492 LearningRate 0.000075 Epoch: 30 Global Step: 625400 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:36:00,910-Speed 2499.32 samples/sec Loss 1.3730 LearningRate 0.000075 Epoch: 30 Global Step: 625410 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:36:09,127-Speed 2492.77 samples/sec Loss 1.3402 LearningRate 0.000075 Epoch: 30 Global Step: 625420 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:36:17,325-Speed 2498.55 samples/sec Loss 1.3577 LearningRate 0.000075 Epoch: 30 Global Step: 625430 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:36:25,534-Speed 2495.00 samples/sec Loss 1.3573 LearningRate 0.000075 Epoch: 30 Global Step: 625440 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:36:33,685-Speed 2513.24 samples/sec Loss 1.3349 LearningRate 0.000075 Epoch: 30 Global Step: 625450 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:36:41,884-Speed 2498.15 samples/sec Loss 1.3540 LearningRate 0.000075 Epoch: 30 Global Step: 625460 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 
13:36:50,085-Speed 2497.65 samples/sec Loss 1.3535 LearningRate 0.000075 Epoch: 30 Global Step: 625470 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:36:58,292-Speed 2496.03 samples/sec Loss 1.3121 LearningRate 0.000075 Epoch: 30 Global Step: 625480 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:37:06,492-Speed 2498.00 samples/sec Loss 1.3662 LearningRate 0.000075 Epoch: 30 Global Step: 625490 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:37:14,696-Speed 2496.53 samples/sec Loss 1.3248 LearningRate 0.000075 Epoch: 30 Global Step: 625500 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:37:22,845-Speed 2514.56 samples/sec Loss 1.3787 LearningRate 0.000075 Epoch: 30 Global Step: 625510 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:37:31,045-Speed 2497.89 samples/sec Loss 1.3580 LearningRate 0.000075 Epoch: 30 Global Step: 625520 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:37:39,253-Speed 2495.59 samples/sec Loss 1.3437 LearningRate 0.000075 Epoch: 30 Global Step: 625530 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:37:47,461-Speed 2495.75 samples/sec Loss 1.3268 LearningRate 0.000075 Epoch: 30 Global Step: 625540 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:37:55,662-Speed 2497.62 samples/sec Loss 1.3748 LearningRate 0.000075 Epoch: 30 Global Step: 625550 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:38:03,874-Speed 2494.34 samples/sec Loss 1.3691 LearningRate 0.000075 Epoch: 30 Global Step: 625560 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:38:12,024-Speed 2513.29 samples/sec Loss 1.3808 LearningRate 0.000075 Epoch: 30 Global Step: 625570 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:38:20,222-Speed 2498.51 samples/sec Loss 1.3302 LearningRate 0.000075 Epoch: 30 Global Step: 625580 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 
13:38:28,425-Speed 2497.17 samples/sec Loss 1.3241 LearningRate 0.000075 Epoch: 30 Global Step: 625590 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:38:36,625-Speed 2497.83 samples/sec Loss 1.2982 LearningRate 0.000075 Epoch: 30 Global Step: 625600 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:38:44,833-Speed 2495.44 samples/sec Loss 1.3606 LearningRate 0.000075 Epoch: 30 Global Step: 625610 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:38:53,030-Speed 2499.18 samples/sec Loss 1.3458 LearningRate 0.000075 Epoch: 30 Global Step: 625620 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:39:01,179-Speed 2513.82 samples/sec Loss 1.3843 LearningRate 0.000075 Epoch: 30 Global Step: 625630 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:39:09,379-Speed 2497.91 samples/sec Loss 1.3332 LearningRate 0.000075 Epoch: 30 Global Step: 625640 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:39:17,577-Speed 2498.37 samples/sec Loss 1.3901 LearningRate 0.000075 Epoch: 30 Global Step: 625650 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:39:25,782-Speed 2496.83 samples/sec Loss 1.3556 LearningRate 0.000075 Epoch: 30 Global Step: 625660 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:39:33,980-Speed 2498.64 samples/sec Loss 1.3582 LearningRate 0.000075 Epoch: 30 Global Step: 625670 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:39:42,177-Speed 2498.79 samples/sec Loss 1.3453 LearningRate 0.000075 Epoch: 30 Global Step: 625680 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:39:50,326-Speed 2513.76 samples/sec Loss 1.3264 LearningRate 0.000075 Epoch: 30 Global Step: 625690 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:39:58,524-Speed 2498.66 samples/sec Loss 1.3358 LearningRate 0.000075 Epoch: 30 Global Step: 625700 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 
13:40:06,736-Speed 2494.43 samples/sec Loss 1.3642 LearningRate 0.000075 Epoch: 30 Global Step: 625710 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:40:14,938-Speed 2497.44 samples/sec Loss 1.3467 LearningRate 0.000075 Epoch: 30 Global Step: 625720 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:40:23,138-Speed 2498.08 samples/sec Loss 1.3709 LearningRate 0.000075 Epoch: 30 Global Step: 625730 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:40:31,337-Speed 2498.35 samples/sec Loss 1.3707 LearningRate 0.000075 Epoch: 30 Global Step: 625740 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:40:39,496-Speed 2510.54 samples/sec Loss 1.3638 LearningRate 0.000075 Epoch: 30 Global Step: 625750 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:40:47,696-Speed 2497.73 samples/sec Loss 1.3383 LearningRate 0.000075 Epoch: 30 Global Step: 625760 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:40:55,909-Speed 2494.28 samples/sec Loss 1.3672 LearningRate 0.000075 Epoch: 30 Global Step: 625770 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:41:04,108-Speed 2498.52 samples/sec Loss 1.3726 LearningRate 0.000074 Epoch: 30 Global Step: 625780 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:41:12,306-Speed 2498.32 samples/sec Loss 1.3358 LearningRate 0.000074 Epoch: 30 Global Step: 625790 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:41:20,507-Speed 2497.77 samples/sec Loss 1.3650 LearningRate 0.000074 Epoch: 30 Global Step: 625800 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:41:28,654-Speed 2514.24 samples/sec Loss 1.3470 LearningRate 0.000074 Epoch: 30 Global Step: 625810 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:41:36,853-Speed 2498.52 samples/sec Loss 1.3268 LearningRate 0.000074 Epoch: 30 Global Step: 625820 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 
13:41:45,049-Speed 2499.28 samples/sec Loss 1.3329 LearningRate 0.000074 Epoch: 30 Global Step: 625830 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:41:53,251-Speed 2497.44 samples/sec Loss 1.3251 LearningRate 0.000074 Epoch: 30 Global Step: 625840 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:42:01,448-Speed 2499.41 samples/sec Loss 1.3425 LearningRate 0.000074 Epoch: 30 Global Step: 625850 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:42:09,645-Speed 2498.94 samples/sec Loss 1.3264 LearningRate 0.000074 Epoch: 30 Global Step: 625860 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:42:17,787-Speed 2515.56 samples/sec Loss 1.3525 LearningRate 0.000074 Epoch: 30 Global Step: 625870 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:42:25,984-Speed 2498.89 samples/sec Loss 1.3599 LearningRate 0.000074 Epoch: 30 Global Step: 625880 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:42:34,181-Speed 2499.06 samples/sec Loss 1.3265 LearningRate 0.000074 Epoch: 30 Global Step: 625890 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:42:42,396-Speed 2493.31 samples/sec Loss 1.3556 LearningRate 0.000074 Epoch: 30 Global Step: 625900 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:42:50,594-Speed 2499.10 samples/sec Loss 1.3682 LearningRate 0.000074 Epoch: 30 Global Step: 625910 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:42:58,795-Speed 2497.77 samples/sec Loss 1.3224 LearningRate 0.000074 Epoch: 30 Global Step: 625920 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:43:06,945-Speed 2513.27 samples/sec Loss 1.3841 LearningRate 0.000074 Epoch: 30 Global Step: 625930 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:43:15,144-Speed 2498.33 samples/sec Loss 1.3520 LearningRate 0.000074 Epoch: 30 Global Step: 625940 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 
13:43:23,350-Speed 2496.08 samples/sec Loss 1.3804 LearningRate 0.000074 Epoch: 30 Global Step: 625950 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:43:31,548-Speed 2498.62 samples/sec Loss 1.3209 LearningRate 0.000074 Epoch: 30 Global Step: 625960 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:43:39,746-Speed 2498.60 samples/sec Loss 1.3497 LearningRate 0.000074 Epoch: 30 Global Step: 625970 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:43:47,945-Speed 2498.38 samples/sec Loss 1.2947 LearningRate 0.000074 Epoch: 30 Global Step: 625980 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:43:56,090-Speed 2514.63 samples/sec Loss 1.3561 LearningRate 0.000074 Epoch: 30 Global Step: 625990 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:44:04,290-Speed 2498.08 samples/sec Loss 1.3409 LearningRate 0.000074 Epoch: 30 Global Step: 626000 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:44:12,488-Speed 2498.58 samples/sec Loss 1.3367 LearningRate 0.000074 Epoch: 30 Global Step: 626010 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:44:20,688-Speed 2498.01 samples/sec Loss 1.3507 LearningRate 0.000074 Epoch: 30 Global Step: 626020 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:44:28,885-Speed 2498.78 samples/sec Loss 1.3487 LearningRate 0.000074 Epoch: 30 Global Step: 626030 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:44:37,086-Speed 2497.74 samples/sec Loss 1.3282 LearningRate 0.000074 Epoch: 30 Global Step: 626040 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:44:45,241-Speed 2511.73 samples/sec Loss 1.3439 LearningRate 0.000074 Epoch: 30 Global Step: 626050 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:44:53,439-Speed 2498.45 samples/sec Loss 1.3803 LearningRate 0.000074 Epoch: 30 Global Step: 626060 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 
13:45:01,638-Speed 2498.53 samples/sec Loss 1.3498 LearningRate 0.000074 Epoch: 30 Global Step: 626070 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:45:09,836-Speed 2498.56 samples/sec Loss 1.3224 LearningRate 0.000074 Epoch: 30 Global Step: 626080 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:45:18,038-Speed 2497.50 samples/sec Loss 1.3104 LearningRate 0.000074 Epoch: 30 Global Step: 626090 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:45:26,238-Speed 2497.88 samples/sec Loss 1.3159 LearningRate 0.000074 Epoch: 30 Global Step: 626100 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:45:34,382-Speed 2515.33 samples/sec Loss 1.3473 LearningRate 0.000074 Epoch: 30 Global Step: 626110 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:45:42,584-Speed 2497.34 samples/sec Loss 1.3120 LearningRate 0.000074 Epoch: 30 Global Step: 626120 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:45:50,783-Speed 2498.49 samples/sec Loss 1.3476 LearningRate 0.000074 Epoch: 30 Global Step: 626130 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:45:59,061-Speed 2500.20 samples/sec Loss 1.3424 LearningRate 0.000074 Epoch: 30 Global Step: 626140 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:46:07,267-Speed 2499.72 samples/sec Loss 1.3169 LearningRate 0.000074 Epoch: 30 Global Step: 626150 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:46:15,487-Speed 2499.53 samples/sec Loss 1.3678 LearningRate 0.000074 Epoch: 30 Global Step: 626160 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:46:23,630-Speed 2515.21 samples/sec Loss 1.3415 LearningRate 0.000074 Epoch: 30 Global Step: 626170 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:46:31,920-Speed 2500.08 samples/sec Loss 1.3339 LearningRate 0.000074 Epoch: 30 Global Step: 626180 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 
13:46:40,143-Speed 2499.43 samples/sec Loss 1.3356 LearningRate 0.000074 Epoch: 30 Global Step: 626190 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:46:48,413-Speed 2494.48 samples/sec Loss 1.3456 LearningRate 0.000074 Epoch: 30 Global Step: 626200 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:46:56,607-Speed 2499.70 samples/sec Loss 1.3475 LearningRate 0.000074 Epoch: 30 Global Step: 626210 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:47:04,807-Speed 2497.77 samples/sec Loss 1.3405 LearningRate 0.000074 Epoch: 30 Global Step: 626220 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:47:12,982-Speed 2515.03 samples/sec Loss 1.3340 LearningRate 0.000074 Epoch: 30 Global Step: 626230 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:47:21,208-Speed 2498.99 samples/sec Loss 1.3389 LearningRate 0.000074 Epoch: 30 Global Step: 626240 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:47:33,481-Speed 1669.14 samples/sec Loss 1.3503 LearningRate 0.000074 Epoch: 30 Global Step: 626250 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:47:41,720-Speed 2500.67 samples/sec Loss 1.3448 LearningRate 0.000074 Epoch: 30 Global Step: 626260 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:47:54,331-Speed 1698.66 samples/sec Loss 1.3578 LearningRate 0.000074 Epoch: 30 Global Step: 626270 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:48:02,542-Speed 2497.97 samples/sec Loss 1.3224 LearningRate 0.000074 Epoch: 30 Global Step: 626280 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:48:10,690-Speed 2513.74 samples/sec Loss 1.3361 LearningRate 0.000074 Epoch: 30 Global Step: 626290 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:48:18,921-Speed 2496.75 samples/sec Loss 1.3390 LearningRate 0.000074 Epoch: 30 Global Step: 626300 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 
13:48:31,375-Speed 1808.73 samples/sec Loss 1.3343 LearningRate 0.000074 Epoch: 30 Global Step: 626310 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:48:39,735-Speed 2495.10 samples/sec Loss 1.3392 LearningRate 0.000074 Epoch: 30 Global Step: 626320 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:48:47,961-Speed 2492.56 samples/sec Loss 1.3358 LearningRate 0.000074 Epoch: 30 Global Step: 626330 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:49:01,266-Speed 2487.84 samples/sec Loss 1.3312 LearningRate 0.000074 Epoch: 30 Global Step: 626340 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:49:09,468-Speed 2507.71 samples/sec Loss 1.3159 LearningRate 0.000074 Epoch: 30 Global Step: 626350 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:49:18,487-Speed 2270.97 samples/sec Loss 1.3370 LearningRate 0.000074 Epoch: 30 Global Step: 626360 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:49:28,087-Speed 2391.30 samples/sec Loss 1.3398 LearningRate 0.000074 Epoch: 30 Global Step: 626370 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:49:36,311-Speed 2490.66 samples/sec Loss 1.3243 LearningRate 0.000074 Epoch: 30 Global Step: 626380 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:49:44,522-Speed 2494.36 samples/sec Loss 1.3209 LearningRate 0.000074 Epoch: 30 Global Step: 626390 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:49:52,734-Speed 2494.65 samples/sec Loss 1.3406 LearningRate 0.000074 Epoch: 30 Global Step: 626400 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:50:00,889-Speed 2511.78 samples/sec Loss 1.3363 LearningRate 0.000074 Epoch: 30 Global Step: 626410 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 13:50:09,097-Speed 2495.60 samples/sec Loss 1.3551 LearningRate 0.000074 Epoch: 30 Global Step: 626420 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-07-11 
13:50:17,302-Speed 2496.42 samples/sec Loss 1.3477 LearningRate 0.000074 Epoch: 30 Global Step: 626430 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:50:25,508-Speed 2496.14 samples/sec Loss 1.3543 LearningRate 0.000074 Epoch: 30 Global Step: 626440 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:50:33,716-Speed 2495.34 samples/sec Loss 1.3187 LearningRate 0.000074 Epoch: 30 Global Step: 626450 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:50:41,924-Speed 2495.64 samples/sec Loss 1.3493 LearningRate 0.000074 Epoch: 30 Global Step: 626460 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:50:50,080-Speed 2511.56 samples/sec Loss 1.3488 LearningRate 0.000074 Epoch: 30 Global Step: 626470 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:50:58,289-Speed 2495.07 samples/sec Loss 1.3516 LearningRate 0.000074 Epoch: 30 Global Step: 626480 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:51:06,502-Speed 2493.98 samples/sec Loss 1.3642 LearningRate 0.000074 Epoch: 30 Global Step: 626490 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:51:14,712-Speed 2495.02 samples/sec Loss 1.3163 LearningRate 0.000074 Epoch: 30 Global Step: 626500 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:51:22,923-Speed 2494.55 samples/sec Loss 1.2866 LearningRate 0.000074 Epoch: 30 Global Step: 626510 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:51:31,138-Speed 2493.54 samples/sec Loss 1.3461 LearningRate 0.000074 Epoch: 30 Global Step: 626520 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:51:39,294-Speed 2511.26 samples/sec Loss 1.3408 LearningRate 0.000074 Epoch: 30 Global Step: 626530 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:51:47,503-Speed 2495.22 samples/sec Loss 1.3305 LearningRate 0.000074 Epoch: 30 Global Step: 626540 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
13:51:55,714-Speed 2494.69 samples/sec Loss 1.3305 LearningRate 0.000074 Epoch: 30 Global Step: 626550 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:52:03,920-Speed 2495.87 samples/sec Loss 1.3194 LearningRate 0.000074 Epoch: 30 Global Step: 626560 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:52:12,130-Speed 2494.96 samples/sec Loss 1.3460 LearningRate 0.000074 Epoch: 30 Global Step: 626570 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:52:20,337-Speed 2495.51 samples/sec Loss 1.3391 LearningRate 0.000074 Epoch: 30 Global Step: 626580 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:52:28,491-Speed 2512.32 samples/sec Loss 1.3724 LearningRate 0.000074 Epoch: 30 Global Step: 626590 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:52:36,697-Speed 2496.45 samples/sec Loss 1.3625 LearningRate 0.000074 Epoch: 30 Global Step: 626600 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:52:44,917-Speed 2491.88 samples/sec Loss 1.3315 LearningRate 0.000074 Epoch: 30 Global Step: 626610 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:52:53,128-Speed 2494.58 samples/sec Loss 1.3365 LearningRate 0.000074 Epoch: 30 Global Step: 626620 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:53:01,335-Speed 2496.06 samples/sec Loss 1.3251 LearningRate 0.000074 Epoch: 30 Global Step: 626630 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 13:53:09,503-Speed 2507.73 samples/sec Loss 1.3389 LearningRate 0.000074 Epoch: 30 Global Step: 626640 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:53:17,662-Speed 2510.52 samples/sec Loss 1.3341 LearningRate 0.000074 Epoch: 30 Global Step: 626650 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:53:25,871-Speed 2495.19 samples/sec Loss 1.3120 LearningRate 0.000074 Epoch: 30 Global Step: 626660 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
13:53:34,078-Speed 2495.77 samples/sec Loss 1.3399 LearningRate 0.000074 Epoch: 30 Global Step: 626670 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:53:42,287-Speed 2495.30 samples/sec Loss 1.3605 LearningRate 0.000074 Epoch: 30 Global Step: 626680 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:53:50,495-Speed 2495.53 samples/sec Loss 1.3215 LearningRate 0.000074 Epoch: 30 Global Step: 626690 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:53:58,700-Speed 2496.60 samples/sec Loss 1.3646 LearningRate 0.000074 Epoch: 30 Global Step: 626700 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:54:06,856-Speed 2511.55 samples/sec Loss 1.3449 LearningRate 0.000074 Epoch: 30 Global Step: 626710 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:54:15,066-Speed 2495.10 samples/sec Loss 1.3542 LearningRate 0.000074 Epoch: 30 Global Step: 626720 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:54:23,270-Speed 2496.73 samples/sec Loss 1.3621 LearningRate 0.000074 Epoch: 30 Global Step: 626730 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:54:31,483-Speed 2493.98 samples/sec Loss 1.3222 LearningRate 0.000074 Epoch: 30 Global Step: 626740 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:54:39,689-Speed 2496.13 samples/sec Loss 1.3244 LearningRate 0.000074 Epoch: 30 Global Step: 626750 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:54:47,898-Speed 2495.23 samples/sec Loss 1.3424 LearningRate 0.000074 Epoch: 30 Global Step: 626760 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:54:56,050-Speed 2512.55 samples/sec Loss 1.3430 LearningRate 0.000074 Epoch: 30 Global Step: 626770 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:55:04,257-Speed 2495.82 samples/sec Loss 1.3211 LearningRate 0.000074 Epoch: 30 Global Step: 626780 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
13:55:12,464-Speed 2496.06 samples/sec Loss 1.3154 LearningRate 0.000074 Epoch: 30 Global Step: 626790 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:55:20,670-Speed 2496.08 samples/sec Loss 1.3111 LearningRate 0.000074 Epoch: 30 Global Step: 626800 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:55:28,875-Speed 2496.51 samples/sec Loss 1.3651 LearningRate 0.000074 Epoch: 30 Global Step: 626810 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:55:37,078-Speed 2497.00 samples/sec Loss 1.3315 LearningRate 0.000074 Epoch: 30 Global Step: 626820 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:55:45,230-Speed 2512.50 samples/sec Loss 1.3169 LearningRate 0.000074 Epoch: 30 Global Step: 626830 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:55:53,436-Speed 2496.15 samples/sec Loss 1.3478 LearningRate 0.000074 Epoch: 30 Global Step: 626840 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:56:01,646-Speed 2495.09 samples/sec Loss 1.3470 LearningRate 0.000074 Epoch: 30 Global Step: 626850 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:56:09,852-Speed 2496.18 samples/sec Loss 1.3119 LearningRate 0.000074 Epoch: 30 Global Step: 626860 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:56:18,056-Speed 2496.81 samples/sec Loss 1.3106 LearningRate 0.000074 Epoch: 30 Global Step: 626870 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:56:26,263-Speed 2495.98 samples/sec Loss 1.3393 LearningRate 0.000074 Epoch: 30 Global Step: 626880 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:56:34,413-Speed 2513.19 samples/sec Loss 1.3118 LearningRate 0.000074 Epoch: 30 Global Step: 626890 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:56:42,623-Speed 2495.00 samples/sec Loss 1.3395 LearningRate 0.000074 Epoch: 30 Global Step: 626900 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
13:56:50,830-Speed 2495.55 samples/sec Loss 1.3401 LearningRate 0.000074 Epoch: 30 Global Step: 626910 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:56:59,039-Speed 2495.36 samples/sec Loss 1.3344 LearningRate 0.000074 Epoch: 30 Global Step: 626920 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:57:07,247-Speed 2495.47 samples/sec Loss 1.3293 LearningRate 0.000074 Epoch: 30 Global Step: 626930 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:57:15,453-Speed 2495.99 samples/sec Loss 1.3495 LearningRate 0.000074 Epoch: 30 Global Step: 626940 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:57:23,609-Speed 2511.60 samples/sec Loss 1.3173 LearningRate 0.000074 Epoch: 30 Global Step: 626950 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:57:31,815-Speed 2496.21 samples/sec Loss 1.3247 LearningRate 0.000074 Epoch: 30 Global Step: 626960 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:57:40,021-Speed 2496.12 samples/sec Loss 1.3310 LearningRate 0.000074 Epoch: 30 Global Step: 626970 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:57:48,233-Speed 2494.09 samples/sec Loss 1.3646 LearningRate 0.000074 Epoch: 30 Global Step: 626980 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:57:56,440-Speed 2495.82 samples/sec Loss 1.3411 LearningRate 0.000074 Epoch: 30 Global Step: 626990 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:58:04,648-Speed 2495.43 samples/sec Loss 1.3363 LearningRate 0.000074 Epoch: 30 Global Step: 627000 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:58:12,805-Speed 2510.99 samples/sec Loss 1.2947 LearningRate 0.000074 Epoch: 30 Global Step: 627010 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:58:21,015-Speed 2495.08 samples/sec Loss 1.3697 LearningRate 0.000074 Epoch: 30 Global Step: 627020 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
13:58:29,222-Speed 2495.59 samples/sec Loss 1.3539 LearningRate 0.000074 Epoch: 30 Global Step: 627030 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:58:37,434-Speed 2494.30 samples/sec Loss 1.3537 LearningRate 0.000074 Epoch: 30 Global Step: 627040 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:58:45,652-Speed 2492.62 samples/sec Loss 1.3354 LearningRate 0.000074 Epoch: 30 Global Step: 627050 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:58:53,856-Speed 2496.58 samples/sec Loss 1.3570 LearningRate 0.000074 Epoch: 30 Global Step: 627060 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:59:02,038-Speed 2503.63 samples/sec Loss 1.3178 LearningRate 0.000074 Epoch: 30 Global Step: 627070 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:59:10,242-Speed 2496.61 samples/sec Loss 1.3226 LearningRate 0.000074 Epoch: 30 Global Step: 627080 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:59:18,447-Speed 2496.42 samples/sec Loss 1.3357 LearningRate 0.000074 Epoch: 30 Global Step: 627090 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:59:26,652-Speed 2496.38 samples/sec Loss 1.3198 LearningRate 0.000074 Epoch: 30 Global Step: 627100 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:59:34,855-Speed 2497.22 samples/sec Loss 1.3321 LearningRate 0.000074 Epoch: 30 Global Step: 627110 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:59:43,062-Speed 2495.77 samples/sec Loss 1.3220 LearningRate 0.000074 Epoch: 30 Global Step: 627120 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:59:51,215-Speed 2512.38 samples/sec Loss 1.3368 LearningRate 0.000074 Epoch: 30 Global Step: 627130 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 13:59:59,420-Speed 2496.34 samples/sec Loss 1.3290 LearningRate 0.000074 Epoch: 30 Global Step: 627140 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:00:07,633-Speed 2494.19 samples/sec Loss 1.3301 LearningRate 0.000074 Epoch: 30 Global Step: 627150 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:00:15,838-Speed 2496.31 samples/sec Loss 1.3397 LearningRate 0.000073 Epoch: 30 Global Step: 627160 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:00:24,043-Speed 2496.43 samples/sec Loss 1.3370 LearningRate 0.000073 Epoch: 30 Global Step: 627170 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:00:32,255-Speed 2494.23 samples/sec Loss 1.3294 LearningRate 0.000073 Epoch: 30 Global Step: 627180 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:00:40,410-Speed 2511.63 samples/sec Loss 1.3538 LearningRate 0.000073 Epoch: 30 Global Step: 627190 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:00:48,629-Speed 2492.30 samples/sec Loss 1.3387 LearningRate 0.000073 Epoch: 30 Global Step: 627200 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:00:56,836-Speed 2495.88 samples/sec Loss 1.3613 LearningRate 0.000073 Epoch: 30 Global Step: 627210 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:01:05,044-Speed 2495.44 samples/sec Loss 1.3151 LearningRate 0.000073 Epoch: 30 Global Step: 627220 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:01:13,248-Speed 2496.80 samples/sec Loss 1.3222 LearningRate 0.000073 Epoch: 30 Global Step: 627230 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:01:21,453-Speed 2496.46 samples/sec Loss 1.3805 LearningRate 0.000073 Epoch: 30 Global Step: 627240 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:01:29,622-Speed 2507.13 samples/sec Loss 1.3655 LearningRate 0.000073 Epoch: 30 Global Step: 627250 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:01:37,826-Speed 2496.83 samples/sec Loss 1.3365 LearningRate 0.000073 Epoch: 30 Global Step: 627260 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:01:46,031-Speed 2496.53 samples/sec Loss 1.3212 LearningRate 0.000073 Epoch: 30 Global Step: 627270 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:01:54,235-Speed 2496.48 samples/sec Loss 1.3280 LearningRate 0.000073 Epoch: 30 Global Step: 627280 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:02:02,445-Speed 2495.46 samples/sec Loss 1.3044 LearningRate 0.000073 Epoch: 30 Global Step: 627290 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:02:10,651-Speed 2496.11 samples/sec Loss 1.3096 LearningRate 0.000073 Epoch: 30 Global Step: 627300 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:02:18,812-Speed 2509.93 samples/sec Loss 1.3266 LearningRate 0.000073 Epoch: 30 Global Step: 627310 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:02:27,015-Speed 2496.87 samples/sec Loss 1.3275 LearningRate 0.000073 Epoch: 30 Global Step: 627320 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:02:35,229-Speed 2493.69 samples/sec Loss 1.3652 LearningRate 0.000073 Epoch: 30 Global Step: 627330 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:02:43,438-Speed 2495.29 samples/sec Loss 1.3678 LearningRate 0.000073 Epoch: 30 Global Step: 627340 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:02:51,644-Speed 2496.27 samples/sec Loss 1.3198 LearningRate 0.000073 Epoch: 30 Global Step: 627350 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:02:59,854-Speed 2494.83 samples/sec Loss 1.3487 LearningRate 0.000073 Epoch: 30 Global Step: 627360 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:03:08,003-Speed 2513.51 samples/sec Loss 1.3466 LearningRate 0.000073 Epoch: 30 Global Step: 627370 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:03:16,209-Speed 2496.26 samples/sec Loss 1.3595 LearningRate 0.000073 Epoch: 30 Global Step: 627380 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:03:24,412-Speed 2496.83 samples/sec Loss 1.3545 LearningRate 0.000073 Epoch: 30 Global Step: 627390 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:03:32,625-Speed 2494.45 samples/sec Loss 1.3269 LearningRate 0.000073 Epoch: 30 Global Step: 627400 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:03:40,834-Speed 2495.14 samples/sec Loss 1.3676 LearningRate 0.000073 Epoch: 30 Global Step: 627410 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:03:49,044-Speed 2494.85 samples/sec Loss 1.3335 LearningRate 0.000073 Epoch: 30 Global Step: 627420 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:03:57,198-Speed 2511.96 samples/sec Loss 1.3335 LearningRate 0.000073 Epoch: 30 Global Step: 627430 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:04:05,408-Speed 2494.82 samples/sec Loss 1.3358 LearningRate 0.000073 Epoch: 30 Global Step: 627440 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:04:13,613-Speed 2496.31 samples/sec Loss 1.3663 LearningRate 0.000073 Epoch: 30 Global Step: 627450 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:04:21,823-Speed 2494.97 samples/sec Loss 1.3484 LearningRate 0.000073 Epoch: 30 Global Step: 627460 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:04:30,029-Speed 2495.88 samples/sec Loss 1.3432 LearningRate 0.000073 Epoch: 30 Global Step: 627470 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:04:38,235-Speed 2496.59 samples/sec Loss 1.3270 LearningRate 0.000073 Epoch: 30 Global Step: 627480 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:04:46,390-Speed 2511.75 samples/sec Loss 1.3645 LearningRate 0.000073 Epoch: 30 Global Step: 627490 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:04:54,599-Speed 2495.25 samples/sec Loss 1.3739 LearningRate 0.000073 Epoch: 30 Global Step: 627500 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:05:02,806-Speed 2495.81 samples/sec Loss 1.3215 LearningRate 0.000073 Epoch: 30 Global Step: 627510 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:05:11,014-Speed 2495.60 samples/sec Loss 1.3439 LearningRate 0.000073 Epoch: 30 Global Step: 627520 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:05:19,223-Speed 2495.31 samples/sec Loss 1.3548 LearningRate 0.000073 Epoch: 30 Global Step: 627530 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:05:27,429-Speed 2496.42 samples/sec Loss 1.3641 LearningRate 0.000073 Epoch: 30 Global Step: 627540 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:05:35,584-Speed 2511.63 samples/sec Loss 1.2905 LearningRate 0.000073 Epoch: 30 Global Step: 627550 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:05:43,793-Speed 2495.26 samples/sec Loss 1.3606 LearningRate 0.000073 Epoch: 30 Global Step: 627560 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:05:52,000-Speed 2495.92 samples/sec Loss 1.3651 LearningRate 0.000073 Epoch: 30 Global Step: 627570 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:06:00,207-Speed 2495.72 samples/sec Loss 1.3697 LearningRate 0.000073 Epoch: 30 Global Step: 627580 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:06:08,413-Speed 2495.98 samples/sec Loss 1.3731 LearningRate 0.000073 Epoch: 30 Global Step: 627590 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:06:16,625-Speed 2494.79 samples/sec Loss 1.3432 LearningRate 0.000073 Epoch: 30 Global Step: 627600 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:06:24,776-Speed 2512.79 samples/sec Loss 1.3751 LearningRate 0.000073 Epoch: 30 Global Step: 627610 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:06:32,982-Speed 2496.03 samples/sec Loss 1.3596 LearningRate 0.000073 Epoch: 30 Global Step: 627620 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:06:41,189-Speed 2495.91 samples/sec Loss 1.3574 LearningRate 0.000073 Epoch: 30 Global Step: 627630 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:06:49,402-Speed 2494.11 samples/sec Loss 1.3454 LearningRate 0.000073 Epoch: 30 Global Step: 627640 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:06:57,609-Speed 2495.86 samples/sec Loss 1.3431 LearningRate 0.000073 Epoch: 30 Global Step: 627650 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:07:05,819-Speed 2494.97 samples/sec Loss 1.3814 LearningRate 0.000073 Epoch: 30 Global Step: 627660 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:07:13,968-Speed 2513.66 samples/sec Loss 1.3334 LearningRate 0.000073 Epoch: 30 Global Step: 627670 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:07:22,177-Speed 2495.38 samples/sec Loss 1.3586 LearningRate 0.000073 Epoch: 30 Global Step: 627680 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:07:30,381-Speed 2496.83 samples/sec Loss 1.3268 LearningRate 0.000073 Epoch: 30 Global Step: 627690 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:07:38,586-Speed 2496.07 samples/sec Loss 1.3717 LearningRate 0.000073 Epoch: 30 Global Step: 627700 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:07:46,804-Speed 2492.67 samples/sec Loss 1.3914 LearningRate 0.000073 Epoch: 30 Global Step: 627710 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:07:55,012-Speed 2495.83 samples/sec Loss 1.3703 LearningRate 0.000073 Epoch: 30 Global Step: 627720 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:08:03,162-Speed 2512.99 samples/sec Loss 1.3802 LearningRate 0.000073 Epoch: 30 Global Step: 627730 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:08:11,368-Speed 2496.05 samples/sec Loss 1.3556 LearningRate 0.000073 Epoch: 30 Global Step: 627740 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:08:19,575-Speed 2495.96 samples/sec Loss 1.3721 LearningRate 0.000073 Epoch: 30 Global Step: 627750 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:08:27,784-Speed 2495.11 samples/sec Loss 1.3884 LearningRate 0.000073 Epoch: 30 Global Step: 627760 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:08:35,993-Speed 2495.16 samples/sec Loss 1.3265 LearningRate 0.000073 Epoch: 30 Global Step: 627770 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:08:44,215-Speed 2491.50 samples/sec Loss 1.3371 LearningRate 0.000073 Epoch: 30 Global Step: 627780 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:08:52,369-Speed 2512.08 samples/sec Loss 1.3130 LearningRate 0.000073 Epoch: 30 Global Step: 627790 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:09:00,574-Speed 2496.26 samples/sec Loss 1.3510 LearningRate 0.000073 Epoch: 30 Global Step: 627800 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:09:08,781-Speed 2495.90 samples/sec Loss 1.3406 LearningRate 0.000073 Epoch: 30 Global Step: 627810 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:09:17,002-Speed 2491.52 samples/sec Loss 1.3296 LearningRate 0.000073 Epoch: 30 Global Step: 627820 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:09:25,207-Speed 2496.38 samples/sec Loss 1.3498 LearningRate 0.000073 Epoch: 30 Global Step: 627830 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:09:33,415-Speed 2495.65 samples/sec Loss 1.3471 LearningRate 0.000073 Epoch: 30 Global Step: 627840 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:09:41,565-Speed 2513.31 samples/sec Loss 1.3194 LearningRate 0.000073 Epoch: 30 Global Step: 627850 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:09:49,769-Speed 2496.77 samples/sec Loss 1.3867 LearningRate 0.000073 Epoch: 30 Global Step: 627860 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:09:57,973-Speed 2496.72 samples/sec Loss 1.3366 LearningRate 0.000073 Epoch: 30 Global Step: 627870 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:10:06,180-Speed 2495.67 samples/sec Loss 1.3461 LearningRate 0.000073 Epoch: 30 Global Step: 627880 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:10:14,384-Speed 2496.64 samples/sec Loss 1.3843 LearningRate 0.000073 Epoch: 30 Global Step: 627890 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:10:22,591-Speed 2495.97 samples/sec Loss 1.3298 LearningRate 0.000073 Epoch: 30 Global Step: 627900 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:10:30,746-Speed 2511.80 samples/sec Loss 1.3135 LearningRate 0.000073 Epoch: 30 Global Step: 627910 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:10:38,950-Speed 2496.56 samples/sec Loss 1.3567 LearningRate 0.000073 Epoch: 30 Global Step: 627920 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:10:47,158-Speed 2495.76 samples/sec Loss 1.3227 LearningRate 0.000073 Epoch: 30 Global Step: 627930 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:10:55,363-Speed 2496.28 samples/sec Loss 1.3507 LearningRate 0.000073 Epoch: 30 Global Step: 627940 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:11:03,569-Speed 2496.41 samples/sec Loss 1.3318 LearningRate 0.000073 Epoch: 30 Global Step: 627950 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:11:11,777-Speed 2495.32 samples/sec Loss 1.3539 LearningRate 0.000073 Epoch: 30 Global Step: 627960 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:11:19,931-Speed 2512.11 samples/sec Loss 1.3466 LearningRate 0.000073 Epoch: 30 Global Step: 627970 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:11:28,154-Speed 2491.20 samples/sec Loss 1.3322 LearningRate 0.000073 Epoch: 30 Global Step: 627980 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:11:36,359-Speed 2496.30 samples/sec Loss 1.3457 LearningRate 0.000073 Epoch: 30 Global Step: 627990 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:11:44,583-Speed 2490.64 samples/sec Loss 1.3437 LearningRate 0.000073 Epoch: 30 Global Step: 628000 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:11:52,788-Speed 2496.37 samples/sec Loss 1.3339 LearningRate 0.000073 Epoch: 30 Global Step: 628010 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:12:00,951-Speed 2509.29 samples/sec Loss 1.3203 LearningRate 0.000073 Epoch: 30 Global Step: 628020 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:12:09,105-Speed 2512.06 samples/sec Loss 1.3643 LearningRate 0.000073 Epoch: 30 Global Step: 628030 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:12:17,309-Speed 2496.80 samples/sec Loss 1.3304 LearningRate 0.000073 Epoch: 30 Global Step: 628040 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:12:25,522-Speed 2494.13 samples/sec Loss 1.3221 LearningRate 0.000073 Epoch: 30 Global Step: 628050 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:12:33,729-Speed 2495.97 samples/sec Loss 1.3379 LearningRate 0.000073 Epoch: 30 Global Step: 628060 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:12:41,932-Speed 2496.94 samples/sec Loss 1.3746 LearningRate 0.000073 Epoch: 30 Global Step: 628070 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:12:50,139-Speed 2495.85 samples/sec Loss 1.3148 LearningRate 0.000073 Epoch: 30 Global Step: 628080 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:12:58,289-Speed 2513.36 samples/sec Loss 1.3488 LearningRate 0.000073 Epoch: 30 Global Step: 628090 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:13:06,495-Speed 2496.07 samples/sec Loss 1.3354 LearningRate 0.000073 Epoch: 30 Global Step: 628100 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:13:14,699-Speed 2497.23 samples/sec Loss 1.3579 LearningRate 0.000073 Epoch: 30 Global Step: 628110 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:13:22,902-Speed 2496.79 samples/sec Loss 1.3795 LearningRate 0.000073 Epoch: 30 Global Step: 628120 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:13:31,118-Speed 2493.42 samples/sec Loss 1.3527 LearningRate 0.000073 Epoch: 30 Global Step: 628130 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:13:39,320-Speed 2497.19 samples/sec Loss 1.3761 LearningRate 0.000073 Epoch: 30 Global Step: 628140 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:13:47,473-Speed 2512.53 samples/sec Loss 1.3082 LearningRate 0.000073 Epoch: 30 Global Step: 628150 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:13:55,680-Speed 2495.66 samples/sec Loss 1.3122 LearningRate 0.000073 Epoch: 30 Global Step: 628160 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:14:03,886-Speed 2496.04 samples/sec Loss 1.3457 LearningRate 0.000073 Epoch: 30 Global Step: 628170 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:14:12,096-Speed 2495.15 samples/sec Loss 1.3535 LearningRate 0.000073 Epoch: 30 Global Step: 628180 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:14:20,313-Speed 2492.49 samples/sec Loss 1.3216 LearningRate 0.000073 Epoch: 30 Global Step: 628190 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:14:28,518-Speed 2496.49 samples/sec Loss 1.3254 LearningRate 0.000073 Epoch: 30 Global Step: 628200 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:14:36,671-Speed 2512.63 samples/sec Loss 1.3473 LearningRate 0.000073 Epoch: 30 Global Step: 628210 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:14:44,886-Speed 2493.33 samples/sec Loss 1.3138 LearningRate 0.000073 Epoch: 30 Global Step: 628220 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:14:53,089-Speed 2497.18 samples/sec Loss 1.3647 LearningRate 0.000073 Epoch: 30 Global Step: 628230 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:15:01,299-Speed 2495.13 samples/sec Loss 1.3518 LearningRate 0.000073 Epoch: 30 Global Step: 628240 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:15:09,504-Speed 2496.65 samples/sec Loss 1.3380 LearningRate 0.000073 Epoch: 30 Global Step: 628250 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:15:17,722-Speed 2492.52 samples/sec Loss 1.3594 LearningRate 0.000073 Epoch: 30 Global Step: 628260 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:15:25,877-Speed 2511.39 samples/sec Loss 1.3216 LearningRate 0.000073 Epoch: 30 Global Step: 628270 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:15:34,082-Speed 2496.65 samples/sec Loss 1.3319 LearningRate 0.000073 Epoch: 30 Global Step: 628280 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:15:42,319-Speed 2486.49 samples/sec Loss 1.3173 LearningRate 0.000073 Epoch: 30 Global Step: 628290 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:15:50,523-Speed 2496.81 samples/sec Loss 1.3849 LearningRate 0.000073 Epoch: 30 Global Step: 628300 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:15:58,725-Speed 2497.21 samples/sec Loss 1.3665 LearningRate 0.000073 Epoch: 30 Global Step: 628310 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:16:06,932-Speed 2495.91 samples/sec Loss 1.3140 LearningRate 0.000073 Epoch: 30 Global Step: 628320 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:16:15,082-Speed 2513.25 samples/sec Loss 1.3377 LearningRate 0.000073 Epoch: 30 Global Step: 628330 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:16:23,287-Speed 2496.67 samples/sec Loss 1.3726 LearningRate 0.000073 Epoch: 30 Global Step: 628340 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:16:31,489-Speed 2497.38 samples/sec Loss 1.3350 LearningRate 0.000073 Epoch: 30 Global Step: 628350 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:16:39,693-Speed 2496.65 samples/sec Loss 1.3602 LearningRate 0.000073 Epoch: 30 Global Step: 628360 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:16:47,899-Speed 2496.06 samples/sec Loss 1.3526 LearningRate 0.000073 Epoch: 30 Global Step: 628370 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:16:56,116-Speed 2493.06 samples/sec Loss 1.3587 LearningRate 0.000073 Epoch: 30 Global Step: 628380 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:17:04,266-Speed 2513.13 samples/sec Loss 1.3277 LearningRate 0.000073 Epoch: 30 Global Step: 628390 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:17:12,470-Speed 2496.58 samples/sec Loss 1.3363 LearningRate 0.000073 Epoch: 30 Global Step: 628400 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:17:20,675-Speed 2496.62 samples/sec Loss 1.3308 LearningRate 0.000073 Epoch: 30 Global Step: 628410 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:17:28,879-Speed 2496.73 samples/sec Loss 1.3393 LearningRate 0.000073 Epoch: 30 Global Step: 628420 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:17:37,082-Speed 2497.01 samples/sec Loss 1.3058 LearningRate 0.000073 Epoch: 30 Global Step: 628430 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:17:45,288-Speed 2495.94 samples/sec Loss 1.3444 LearningRate 0.000073 Epoch: 30 Global Step: 628440 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:17:53,435-Speed 2514.36 samples/sec Loss 1.3560 LearningRate 0.000073 Epoch: 30 Global Step: 628450 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:18:01,639-Speed 2496.93 samples/sec Loss 1.3123 LearningRate 0.000073 Epoch: 30 Global Step: 628460 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:18:09,843-Speed 2496.97 samples/sec Loss 1.3457 LearningRate 0.000073 Epoch: 30 Global Step: 628470 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:18:18,047-Speed 2496.66 samples/sec Loss 1.3525 LearningRate 0.000073 Epoch: 30 Global Step: 628480 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:18:26,251-Speed 2496.94 samples/sec Loss 1.3533 LearningRate 0.000073 Epoch: 30 Global Step: 628490 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:18:34,453-Speed 2497.08 samples/sec Loss 1.3080 LearningRate 0.000073 Epoch: 30 Global Step: 628500 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:18:42,604-Speed 2513.09 samples/sec Loss 1.3296 LearningRate 0.000073 Epoch: 30 Global Step: 628510 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:18:50,814-Speed 2494.68 samples/sec Loss 1.3337 LearningRate 0.000073 Epoch: 30 Global Step: 628520 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:18:59,038-Speed 2490.78 samples/sec Loss 1.3541 LearningRate 0.000073 Epoch: 30 Global Step: 628530 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:19:07,246-Speed 2495.52 samples/sec Loss 1.3446 LearningRate 0.000072 Epoch: 30 Global Step: 628540 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:19:15,451-Speed 2496.53 samples/sec Loss 1.3415 LearningRate 0.000072 Epoch: 30 Global Step: 628550 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:19:23,656-Speed 2496.33 samples/sec Loss 1.3416 LearningRate 0.000072 Epoch: 30 Global Step: 628560 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:19:31,809-Speed 2512.48 samples/sec Loss 1.2955 LearningRate 0.000072 Epoch: 30 Global Step: 628570 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:19:40,014-Speed 2496.42 samples/sec Loss 1.3486 LearningRate 0.000072 Epoch: 30 Global Step: 628580 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:19:48,222-Speed 2495.99 samples/sec Loss 1.3129 LearningRate 0.000072 Epoch: 30 Global Step: 628590 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:19:56,429-Speed 2495.87 samples/sec Loss 1.3191 LearningRate 0.000072 Epoch: 30 Global Step: 628600 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:20:04,635-Speed 2495.99 samples/sec Loss 1.3239 LearningRate 0.000072 Epoch: 30 Global Step: 628610 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:20:12,845-Speed 2494.86 samples/sec Loss 1.3264 LearningRate 0.000072 Epoch: 30 Global Step: 628620 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:20:20,997-Speed 2512.81 samples/sec Loss 1.3203 LearningRate 0.000072 Epoch: 30 Global Step: 628630 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:20:29,202-Speed 2496.36 samples/sec Loss 1.3360 LearningRate 0.000072 Epoch: 30 Global Step: 628640 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:20:37,410-Speed 2495.64 samples/sec Loss 1.2928 LearningRate 0.000072 Epoch: 30 Global Step: 628650 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:20:45,622-Speed 2494.21 samples/sec Loss 1.3297 LearningRate 0.000072 Epoch: 30 Global Step: 628660 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:20:53,830-Speed 2495.61 samples/sec Loss 1.3189 LearningRate 0.000072 Epoch: 30 Global Step: 628670 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:21:02,036-Speed 2496.12 samples/sec Loss 1.3575 LearningRate 0.000072 Epoch: 30 Global Step: 628680 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:21:10,190-Speed 2512.18 samples/sec Loss 1.3133 LearningRate 0.000072 Epoch: 30 Global Step: 628690 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:21:18,397-Speed 2495.46 samples/sec Loss 1.3474 LearningRate 0.000072 Epoch: 30 Global Step: 628700 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:21:26,602-Speed 2496.56 samples/sec Loss 1.3429 LearningRate 0.000072 Epoch: 30 Global Step: 628710 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:21:34,808-Speed 2496.22 samples/sec Loss 1.3103 LearningRate 0.000072 Epoch: 30 Global Step: 628720 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:21:43,020-Speed 2494.54 samples/sec Loss 1.3180 LearningRate 0.000072 Epoch: 30 Global Step: 628730 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:21:51,231-Speed 2494.45 samples/sec Loss 1.3204 LearningRate 0.000072 Epoch: 30 Global Step: 628740 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:21:59,384-Speed 2512.63 samples/sec Loss 1.3339 LearningRate 0.000072 Epoch: 30 Global Step: 628750 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:22:07,600-Speed 2493.23 samples/sec Loss 1.3561 LearningRate 0.000072 Epoch: 30 Global Step: 628760 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:22:15,802-Speed 2497.04 samples/sec Loss 1.3197 LearningRate 0.000072 Epoch: 30 Global Step: 628770 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:22:24,010-Speed 2495.61 samples/sec Loss 1.3341 LearningRate 0.000072 Epoch: 30 Global Step: 628780 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:22:32,216-Speed 2496.22 samples/sec Loss 1.3348 LearningRate 0.000072 Epoch: 30 Global Step: 628790 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:22:40,419-Speed 2497.01 samples/sec Loss 1.3201 LearningRate 0.000072 Epoch: 30 Global Step: 628800 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:22:48,574-Speed 2511.65 samples/sec Loss 1.3294 LearningRate 0.000072 Epoch: 30 Global Step: 628810 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:22:56,781-Speed 2495.91 samples/sec Loss 1.3317 LearningRate 0.000072 Epoch: 30 Global Step: 628820 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:23:04,986-Speed 2496.34 samples/sec Loss 1.3371 LearningRate 0.000072 Epoch: 30 Global Step: 628830 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:23:13,196-Speed 2495.03 samples/sec Loss 1.3795 LearningRate 0.000072 Epoch: 30 Global Step: 628840 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:23:21,407-Speed 2494.95 samples/sec Loss 1.3378 LearningRate 0.000072 Epoch: 30 Global Step: 628850 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:23:29,622-Speed 2493.59 samples/sec Loss 1.3660 LearningRate 0.000072 Epoch: 30 Global Step: 628860 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:23:37,780-Speed 2510.87 samples/sec Loss 1.3474 LearningRate 0.000072 Epoch: 30 Global Step: 628870 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:23:46,006-Speed 2489.93 samples/sec Loss 1.3457 LearningRate 0.000072 Epoch: 30 Global Step: 628880 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:23:54,219-Speed 2494.08 samples/sec Loss 1.3092 LearningRate 0.000072 Epoch: 30 Global Step: 628890 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:24:02,429-Speed 2494.84 samples/sec Loss 1.3021 LearningRate 0.000072 Epoch: 30 Global Step: 628900 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:24:10,636-Speed 2496.09 samples/sec Loss 1.3141 LearningRate 0.000072 Epoch: 30 Global Step: 628910 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:24:18,844-Speed 2495.45 samples/sec Loss 1.3439 LearningRate 0.000072 Epoch: 30 Global Step: 628920 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:24:26,994-Speed 2513.17 samples/sec Loss 1.3205 LearningRate 0.000072 Epoch: 30 Global Step: 628930 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:24:35,200-Speed 2495.93 samples/sec Loss 1.3290 LearningRate 0.000072 Epoch: 30 Global Step: 628940 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:24:43,408-Speed 2495.78 samples/sec Loss 1.3561 LearningRate 0.000072 Epoch: 30 Global Step: 628950 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:24:51,620-Speed 2494.02 samples/sec Loss 1.3179 LearningRate 0.000072 Epoch: 30 Global Step: 628960 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:24:59,827-Speed 2495.87 samples/sec Loss 1.3308 LearningRate 0.000072 Epoch: 30 Global Step: 628970 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:25:08,031-Speed 2496.74 samples/sec Loss 1.3469 LearningRate 0.000072 Epoch: 30 Global Step: 628980 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:25:16,180-Speed 2513.41 samples/sec Loss 1.3647 LearningRate 0.000072 Epoch: 30 Global Step: 628990 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:25:24,384-Speed 2496.85 samples/sec Loss 1.3452 LearningRate 0.000072 Epoch: 30 Global Step: 629000 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:25:32,593-Speed 2495.44 samples/sec Loss 1.3442 LearningRate 0.000072 Epoch: 30 Global Step: 629010 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:25:40,798-Speed 2496.57 samples/sec Loss 1.3232 LearningRate 0.000072 Epoch: 30 Global Step: 629020 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:25:49,001-Speed 2497.01 samples/sec Loss 1.3386 LearningRate 0.000072 Epoch: 30 Global Step: 629030 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:25:57,207-Speed 2496.07 samples/sec Loss 1.3600 LearningRate 0.000072 Epoch: 30 Global Step: 629040 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:26:05,361-Speed 2512.11 samples/sec Loss 1.3371 LearningRate 0.000072 Epoch: 30 Global Step: 629050 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:26:13,564-Speed 2496.99 samples/sec Loss 1.3567 LearningRate 0.000072 Epoch: 30 Global Step: 629060 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:26:21,767-Speed 2497.11 samples/sec Loss 1.3597 LearningRate 0.000072 Epoch: 30 Global Step: 629070 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:26:29,974-Speed 2496.24 samples/sec Loss 1.3706 LearningRate 0.000072 Epoch: 30 Global Step: 629080 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:26:38,179-Speed 2496.64 samples/sec Loss 1.3435 LearningRate 0.000072 Epoch: 30 Global Step: 629090 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:26:46,394-Speed 2493.26 samples/sec Loss 1.3590 LearningRate 0.000072 Epoch: 30 Global Step: 629100 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:26:54,545-Speed 2513.03 samples/sec Loss 1.3535 LearningRate 0.000072 Epoch: 30 Global Step: 629110 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:27:02,748-Speed 2496.94 samples/sec Loss 1.3160 LearningRate 0.000072 Epoch: 30 Global Step: 629120 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:27:10,952-Speed 2496.84 samples/sec Loss 1.3185 LearningRate 0.000072 Epoch: 30 Global Step: 629130 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:27:19,157-Speed 2496.31 samples/sec Loss 1.3188 LearningRate 0.000072 Epoch: 30 Global Step: 629140 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:27:27,360-Speed 2497.04 samples/sec Loss 1.3597 LearningRate 0.000072 Epoch: 30 Global Step: 629150 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:27:35,565-Speed 2496.40 samples/sec Loss 1.3399 LearningRate 0.000072 Epoch: 30 Global Step: 629160 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:27:43,730-Speed 2508.67 samples/sec Loss 1.3111 LearningRate 0.000072 Epoch: 30 Global Step: 629170 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:27:51,936-Speed 2496.25 samples/sec Loss 1.3105 LearningRate 0.000072 Epoch: 30 Global Step: 629180 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 
14:28:00,151-Speed 2493.43 samples/sec Loss 1.3333 LearningRate 0.000072 Epoch: 30 Global Step: 629190 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:28:08,356-Speed 2496.35 samples/sec Loss 1.3066 LearningRate 0.000072 Epoch: 30 Global Step: 629200 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:28:16,575-Speed 2492.06 samples/sec Loss 1.2842 LearningRate 0.000072 Epoch: 30 Global Step: 629210 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-07-11 14:28:24,782-Speed 2495.78 samples/sec Loss 1.3344 LearningRate 0.000072 Epoch: 30 Global Step: 629220 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:28:32,939-Speed 2511.08 samples/sec Loss 1.3364 LearningRate 0.000072 Epoch: 30 Global Step: 629230 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:28:41,152-Speed 2494.16 samples/sec Loss 1.3120 LearningRate 0.000072 Epoch: 30 Global Step: 629240 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:28:49,359-Speed 2495.80 samples/sec Loss 1.3517 LearningRate 0.000072 Epoch: 30 Global Step: 629250 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:28:57,577-Speed 2492.51 samples/sec Loss 1.3298 LearningRate 0.000072 Epoch: 30 Global Step: 629260 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:29:05,780-Speed 2496.91 samples/sec Loss 1.3158 LearningRate 0.000072 Epoch: 30 Global Step: 629270 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:29:13,983-Speed 2497.16 samples/sec Loss 1.3135 LearningRate 0.000072 Epoch: 30 Global Step: 629280 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:29:22,134-Speed 2512.94 samples/sec Loss 1.3352 LearningRate 0.000072 Epoch: 30 Global Step: 629290 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:29:30,345-Speed 2494.62 samples/sec Loss 1.3293 LearningRate 0.000072 Epoch: 30 Global Step: 629300 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:29:38,548-Speed 2497.49 samples/sec Loss 1.3261 LearningRate 0.000072 Epoch: 30 Global Step: 629310 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:29:46,754-Speed 2495.96 samples/sec Loss 1.3181 LearningRate 0.000072 Epoch: 30 Global Step: 629320 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:29:54,960-Speed 2496.31 samples/sec Loss 1.3633 LearningRate 0.000072 Epoch: 30 Global Step: 629330 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:30:03,165-Speed 2496.20 samples/sec Loss 1.3479 LearningRate 0.000072 Epoch: 30 Global Step: 629340 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:30:11,329-Speed 2509.19 samples/sec Loss 1.2987 LearningRate 0.000072 Epoch: 30 Global Step: 629350 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:30:19,533-Speed 2496.41 samples/sec Loss 1.3273 LearningRate 0.000072 Epoch: 30 Global Step: 629360 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:30:27,744-Speed 2494.76 samples/sec Loss 1.3454 LearningRate 0.000072 Epoch: 30 Global Step: 629370 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:30:35,962-Speed 2492.37 samples/sec Loss 1.3271 LearningRate 0.000072 Epoch: 30 Global Step: 629380 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:30:44,255-Speed 2470.00 samples/sec Loss 1.3437 LearningRate 0.000072 Epoch: 30 Global Step: 629390 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:30:52,460-Speed 2496.31 samples/sec Loss 1.3382 LearningRate 0.000072 Epoch: 30 Global Step: 629400 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:31:00,617-Speed 2511.05 samples/sec Loss 1.3486 LearningRate 0.000072 Epoch: 30 Global Step: 629410 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:31:08,821-Speed 2496.80 samples/sec Loss 1.3837 LearningRate 0.000072 Epoch: 30 Global Step: 629420 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:31:17,025-Speed 2496.51 samples/sec Loss 1.3139 LearningRate 0.000072 Epoch: 30 Global Step: 629430 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:31:25,227-Speed 2497.33 samples/sec Loss 1.3583 LearningRate 0.000072 Epoch: 30 Global Step: 629440 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:31:33,435-Speed 2495.64 samples/sec Loss 1.3387 LearningRate 0.000072 Epoch: 30 Global Step: 629450 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:31:41,650-Speed 2493.30 samples/sec Loss 1.3399 LearningRate 0.000072 Epoch: 30 Global Step: 629460 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:31:49,800-Speed 2513.22 samples/sec Loss 1.3579 LearningRate 0.000072 Epoch: 30 Global Step: 629470 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:31:58,006-Speed 2496.51 samples/sec Loss 1.3494 LearningRate 0.000072 Epoch: 30 Global Step: 629480 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:32:06,211-Speed 2496.40 samples/sec Loss 1.3122 LearningRate 0.000072 Epoch: 30 Global Step: 629490 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:32:14,417-Speed 2496.04 samples/sec Loss 1.3139 LearningRate 0.000072 Epoch: 30 Global Step: 629500 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:32:22,622-Speed 2496.15 samples/sec Loss 1.3237 LearningRate 0.000072 Epoch: 30 Global Step: 629510 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:32:30,826-Speed 2496.77 samples/sec Loss 1.3396 LearningRate 0.000072 Epoch: 30 Global Step: 629520 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:32:38,985-Speed 2510.82 samples/sec Loss 1.3506 LearningRate 0.000072 Epoch: 30 Global Step: 629530 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:32:47,192-Speed 2495.81 samples/sec Loss 1.3416 LearningRate 0.000072 Epoch: 30 Global Step: 629540 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:32:55,397-Speed 2496.14 samples/sec Loss 1.3489 LearningRate 0.000072 Epoch: 30 Global Step: 629550 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:33:03,606-Speed 2495.50 samples/sec Loss 1.3611 LearningRate 0.000072 Epoch: 30 Global Step: 629560 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:33:11,815-Speed 2495.08 samples/sec Loss 1.3646 LearningRate 0.000072 Epoch: 30 Global Step: 629570 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:33:20,026-Speed 2494.56 samples/sec Loss 1.3784 LearningRate 0.000072 Epoch: 30 Global Step: 629580 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:33:28,176-Speed 2513.26 samples/sec Loss 1.3453 LearningRate 0.000072 Epoch: 30 Global Step: 629590 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:33:36,383-Speed 2496.10 samples/sec Loss 1.3305 LearningRate 0.000072 Epoch: 30 Global Step: 629600 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:33:44,594-Speed 2494.42 samples/sec Loss 1.3407 LearningRate 0.000072 Epoch: 30 Global Step: 629610 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:33:52,804-Speed 2494.72 samples/sec Loss 1.3523 LearningRate 0.000072 Epoch: 30 Global Step: 629620 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:34:01,009-Speed 2496.75 samples/sec Loss 1.3719 LearningRate 0.000072 Epoch: 30 Global Step: 629630 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:34:09,214-Speed 2496.31 samples/sec Loss 1.3414 LearningRate 0.000072 Epoch: 30 Global Step: 629640 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:34:17,368-Speed 2512.22 samples/sec Loss 1.3559 LearningRate 0.000072 Epoch: 30 Global Step: 629650 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:34:25,576-Speed 2495.39 samples/sec Loss 1.3445 LearningRate 0.000072 Epoch: 30 Global Step: 629660 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:34:33,791-Speed 2493.42 samples/sec Loss 1.3704 LearningRate 0.000072 Epoch: 30 Global Step: 629670 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:34:41,996-Speed 2496.24 samples/sec Loss 1.3271 LearningRate 0.000072 Epoch: 30 Global Step: 629680 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:34:50,200-Speed 2496.65 samples/sec Loss 1.3397 LearningRate 0.000072 Epoch: 30 Global Step: 629690 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:34:58,404-Speed 2496.61 samples/sec Loss 1.3518 LearningRate 0.000072 Epoch: 30 Global Step: 629700 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:35:06,553-Speed 2513.53 samples/sec Loss 1.3602 LearningRate 0.000072 Epoch: 30 Global Step: 629710 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:35:14,757-Speed 2496.66 samples/sec Loss 1.3468 LearningRate 0.000072 Epoch: 30 Global Step: 629720 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:35:22,959-Speed 2497.51 samples/sec Loss 1.3268 LearningRate 0.000072 Epoch: 30 Global Step: 629730 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:35:31,164-Speed 2496.33 samples/sec Loss 1.3555 LearningRate 0.000072 Epoch: 30 Global Step: 629740 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:35:39,368-Speed 2496.70 samples/sec Loss 1.3259 LearningRate 0.000072 Epoch: 30 Global Step: 629750 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:35:47,573-Speed 2496.39 samples/sec Loss 1.3029 LearningRate 0.000072 Epoch: 30 Global Step: 629760 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:35:55,724-Speed 2513.11 samples/sec Loss 1.3219 LearningRate 0.000072 Epoch: 30 Global Step: 629770 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:36:03,930-Speed 2496.19 samples/sec Loss 1.2924 LearningRate 0.000072 Epoch: 30 Global Step: 629780 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:36:12,133-Speed 2496.67 samples/sec Loss 1.2813 LearningRate 0.000072 Epoch: 30 Global Step: 629790 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:36:20,343-Speed 2494.96 samples/sec Loss 1.3346 LearningRate 0.000072 Epoch: 30 Global Step: 629800 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:36:28,550-Speed 2495.89 samples/sec Loss 1.3418 LearningRate 0.000072 Epoch: 30 Global Step: 629810 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:36:36,755-Speed 2496.48 samples/sec Loss 1.3420 LearningRate 0.000072 Epoch: 30 Global Step: 629820 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:36:44,908-Speed 2512.29 samples/sec Loss 1.3507 LearningRate 0.000072 Epoch: 30 Global Step: 629830 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:36:53,114-Speed 2496.29 samples/sec Loss 1.3383 LearningRate 0.000072 Epoch: 30 Global Step: 629840 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:37:01,321-Speed 2495.91 samples/sec Loss 1.3306 LearningRate 0.000072 Epoch: 30 Global Step: 629850 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:37:09,527-Speed 2496.19 samples/sec Loss 1.3583 LearningRate 0.000072 Epoch: 30 Global Step: 629860 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:37:17,729-Speed 2497.11 samples/sec Loss 1.3372 LearningRate 0.000072 Epoch: 30 Global Step: 629870 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:37:25,935-Speed 2496.20 samples/sec Loss 1.3135 LearningRate 0.000072 Epoch: 30 Global Step: 629880 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:37:34,092-Speed 2511.33 samples/sec Loss 1.3274 LearningRate 0.000072 Epoch: 30 Global Step: 629890 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:37:42,299-Speed 2495.42 samples/sec Loss 1.3252 LearningRate 0.000072 Epoch: 30 Global Step: 629900 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:37:50,507-Speed 2495.66 samples/sec Loss 1.3773 LearningRate 0.000072 Epoch: 30 Global Step: 629910 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:37:58,719-Speed 2494.43 samples/sec Loss 1.3204 LearningRate 0.000072 Epoch: 30 Global Step: 629920 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:38:06,928-Speed 2495.08 samples/sec Loss 1.3428 LearningRate 0.000071 Epoch: 30 Global Step: 629930 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:38:15,132-Speed 2496.81 samples/sec Loss 1.3599 LearningRate 0.000071 Epoch: 30 Global Step: 629940 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:38:23,282-Speed 2513.33 samples/sec Loss 1.3261 LearningRate 0.000071 Epoch: 30 Global Step: 629950 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:38:31,493-Speed 2494.63 samples/sec Loss 1.3461 LearningRate 0.000071 Epoch: 30 Global Step: 629960 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:38:39,697-Speed 2496.70 samples/sec Loss 1.3538 LearningRate 0.000071 Epoch: 30 Global Step: 629970 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:38:47,913-Speed 2492.96 samples/sec Loss 1.3522 LearningRate 0.000071 Epoch: 30 Global Step: 629980 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:38:56,122-Speed 2495.22 samples/sec Loss 1.3491 LearningRate 0.000071 Epoch: 30 Global Step: 629990 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:39:04,338-Speed 2493.13 samples/sec Loss 1.3110 LearningRate 0.000071 Epoch: 30 Global Step: 630000 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:39:12,502-Speed 2508.69 samples/sec Loss 1.3260 LearningRate 0.000071 Epoch: 30 Global Step: 630010 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:39:20,709-Speed 2495.84 samples/sec Loss 1.3168 LearningRate 0.000071 Epoch: 30 Global Step: 630020 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:39:28,914-Speed 2496.63 samples/sec Loss 1.3012 LearningRate 0.000071 Epoch: 30 Global Step: 630030 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:39:37,117-Speed 2496.91 samples/sec Loss 1.3460 LearningRate 0.000071 Epoch: 30 Global Step: 630040 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:39:45,324-Speed 2495.64 samples/sec Loss 1.3235 LearningRate 0.000071 Epoch: 30 Global Step: 630050 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:39:53,529-Speed 2496.90 samples/sec Loss 1.3386 LearningRate 0.000071 Epoch: 30 Global Step: 630060 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:40:01,675-Speed 2514.22 samples/sec Loss 1.3337 LearningRate 0.000071 Epoch: 30 Global Step: 630070 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:40:09,879-Speed 2496.80 samples/sec Loss 1.3699 LearningRate 0.000071 Epoch: 30 Global Step: 630080 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:40:18,084-Speed 2496.31 samples/sec Loss 1.3411 LearningRate 0.000071 Epoch: 30 Global Step: 630090 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:40:26,290-Speed 2496.20 samples/sec Loss 1.3588 LearningRate 0.000071 Epoch: 30 Global Step: 630100 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:40:34,501-Speed 2494.44 samples/sec Loss 1.3079 LearningRate 0.000071 Epoch: 30 Global Step: 630110 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:40:42,707-Speed 2496.12 samples/sec Loss 1.3381 LearningRate 0.000071 Epoch: 30 Global Step: 630120 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:40:50,861-Speed 2512.05 samples/sec Loss 1.3172 LearningRate 0.000071 Epoch: 30 Global Step: 630130 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:40:59,067-Speed 2496.37 samples/sec Loss 1.3345 LearningRate 0.000071 Epoch: 30 Global Step: 630140 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:41:07,275-Speed 2495.42 samples/sec Loss 1.3106 LearningRate 0.000071 Epoch: 30 Global Step: 630150 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:41:15,481-Speed 2496.11 samples/sec Loss 1.3398 LearningRate 0.000071 Epoch: 30 Global Step: 630160 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:41:23,688-Speed 2496.05 samples/sec Loss 1.3134 LearningRate 0.000071 Epoch: 30 Global Step: 630170 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:41:31,894-Speed 2495.88 samples/sec Loss 1.3200 LearningRate 0.000071 Epoch: 30 Global Step: 630180 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:41:40,047-Speed 2512.45 samples/sec Loss 1.3141 LearningRate 0.000071 Epoch: 30 Global Step: 630190 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:41:48,254-Speed 2495.87 samples/sec Loss 1.3560 LearningRate 0.000071 Epoch: 30 Global Step: 630200 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:41:56,456-Speed 2497.20 samples/sec Loss 1.3082 LearningRate 0.000071 Epoch: 30 Global Step: 630210 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:42:04,664-Speed 2495.68 samples/sec Loss 1.3354 LearningRate 0.000071 Epoch: 30 Global Step: 630220 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:42:12,868-Speed 2496.57 samples/sec Loss 1.3202 LearningRate 0.000071 Epoch: 30 Global Step: 630230 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:42:21,074-Speed 2496.23 samples/sec Loss 1.3540 LearningRate 0.000071 Epoch: 30 Global Step: 630240 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:42:29,237-Speed 2509.27 samples/sec Loss 1.3095 LearningRate 0.000071 Epoch: 30 Global Step: 630250 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:42:37,445-Speed 2495.49 samples/sec Loss 1.3363 LearningRate 0.000071 Epoch: 30 Global Step: 630260 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:42:45,650-Speed 2496.55 samples/sec Loss 1.3043 LearningRate 0.000071 Epoch: 30 Global Step: 630270 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:42:53,861-Speed 2494.61 samples/sec Loss 1.2930 LearningRate 0.000071 Epoch: 30 Global Step: 630280 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:43:02,069-Speed 2495.69 samples/sec Loss 1.3278 LearningRate 0.000071 Epoch: 30 Global Step: 630290 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:43:10,276-Speed 2495.93 samples/sec Loss 1.3286 LearningRate 0.000071 Epoch: 30 Global Step: 630300 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:43:18,435-Speed 2510.55 samples/sec Loss 1.3143 LearningRate 0.000071 Epoch: 30 Global Step: 630310 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:43:26,637-Speed 2497.19 samples/sec Loss 1.3727 LearningRate 0.000071 Epoch: 30 Global Step: 630320 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:43:34,856-Speed 2493.15 samples/sec Loss 1.3626 LearningRate 0.000071 Epoch: 30 Global Step: 630330 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:43:43,059-Speed 2497.03 samples/sec Loss 1.2974 LearningRate 0.000071 Epoch: 30 Global Step: 630340 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:43:51,263-Speed 2496.53 samples/sec Loss 1.3468 LearningRate 0.000071 Epoch: 30 Global Step: 630350 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:43:59,472-Speed 2495.64 samples/sec Loss 1.3407 LearningRate 0.000071 Epoch: 30 Global Step: 630360 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:44:07,623-Speed 2512.95 samples/sec Loss 1.3243 LearningRate 0.000071 Epoch: 30 Global Step: 630370 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:44:15,827-Speed 2496.49 samples/sec Loss 1.3368 LearningRate 0.000071 Epoch: 30 Global Step: 630380 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:44:24,032-Speed 2496.53 samples/sec Loss 1.3159 LearningRate 0.000071 Epoch: 30 Global Step: 630390 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:44:32,235-Speed 2497.08 samples/sec Loss 1.3519 LearningRate 0.000071 Epoch: 30 Global Step: 630400 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:44:40,450-Speed 2493.27 samples/sec Loss 1.3161 LearningRate 0.000071 Epoch: 30 Global Step: 630410 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:44:48,655-Speed 2496.27 samples/sec Loss 1.2790 LearningRate 0.000071 Epoch: 30 Global Step: 630420 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-07-11 14:44:56,825-Speed 2507.07 samples/sec Loss 1.3125 LearningRate 0.000071 Epoch: 30 Global Step: 630430 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-07-11 14:45:05,034-Speed 2495.31 samples/sec Loss 1.3235 LearningRate 0.000071 Epoch: 30 Global Step: 630440 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-07-11 14:45:13,196-Speed 2509.59 samples/sec Loss 1.3205 LearningRate 0.000071 Epoch: 30 Global Step: 630450 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:45:21,402-Speed 2496.02 samples/sec Loss 1.3508 LearningRate 0.000071 Epoch: 30 Global Step: 630460 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:45:29,606-Speed 2496.78 samples/sec Loss 1.3112 LearningRate 0.000071 Epoch: 30 Global Step: 630470 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:45:37,812-Speed 2496.38 samples/sec Loss 1.3413 LearningRate 0.000071 Epoch: 30 Global Step: 630480 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:45:45,965-Speed 2512.12 samples/sec Loss 1.3177 LearningRate 0.000071 Epoch: 30 Global Step: 630490 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:45:54,171-Speed 2496.23 samples/sec Loss 1.3022 LearningRate 0.000071 Epoch: 30 Global Step: 630500 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:46:02,374-Speed 2497.15 samples/sec Loss 1.3361 LearningRate 0.000071 Epoch: 30 Global Step: 630510 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:46:10,583-Speed 2495.43 samples/sec Loss 1.3536 LearningRate 0.000071 Epoch: 30 Global Step: 630520 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:46:18,792-Speed 2495.18 samples/sec Loss 1.3379 LearningRate 0.000071 Epoch: 30 Global Step: 630530 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:46:27,003-Speed 2494.48 samples/sec Loss 1.3113 LearningRate 0.000071 Epoch: 30 Global Step: 630540 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:46:35,166-Speed 2509.63 samples/sec Loss 1.2730 LearningRate 0.000071 Epoch: 30 Global Step: 630550 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:46:43,369-Speed 2497.07 samples/sec Loss 1.2983 LearningRate 0.000071 Epoch: 30 Global Step: 630560 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:46:51,574-Speed 2496.24 samples/sec Loss 1.3153 LearningRate 0.000071 Epoch: 30 Global Step: 630570 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:46:59,778-Speed 2496.93 samples/sec Loss 1.3194 LearningRate 0.000071 Epoch: 30 Global Step: 630580 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:47:07,981-Speed 2497.05 samples/sec Loss 1.3481 LearningRate 0.000071 Epoch: 30 Global Step: 630590 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:47:16,187-Speed 2495.96 samples/sec Loss 1.3596 LearningRate 0.000071 Epoch: 30 Global Step: 630600 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:47:24,341-Speed 2512.27 samples/sec Loss 1.2964 LearningRate 0.000071 Epoch: 30 Global Step: 630610 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:47:32,547-Speed 2495.89 samples/sec Loss 1.3140 LearningRate 0.000071 Epoch: 30 Global Step: 630620 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:47:40,756-Speed 2495.40 samples/sec Loss 1.3006 LearningRate 0.000071 Epoch: 30 Global Step: 630630 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:47:48,963-Speed 2495.80 samples/sec Loss 1.3467 LearningRate 0.000071 Epoch: 30 Global Step: 630640 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:47:57,169-Speed 2496.13 samples/sec Loss 1.3510 LearningRate 0.000071 Epoch: 30 Global Step: 630650 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:48:05,377-Speed 2495.51 samples/sec Loss 1.3322 LearningRate 0.000071 Epoch: 30 Global Step: 630660 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:48:13,530-Speed 2512.40 samples/sec Loss 1.3564 LearningRate 0.000071 Epoch: 30 Global Step: 630670 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:48:21,736-Speed 2496.40 samples/sec Loss 1.3212 LearningRate 0.000071 Epoch: 30 Global Step: 630680 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:48:29,941-Speed 2496.25 samples/sec Loss 1.3618 LearningRate 0.000071 Epoch: 30 Global Step: 630690 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:48:38,145-Speed 2496.50 samples/sec Loss 1.3180 LearningRate 0.000071 Epoch: 30 Global Step: 630700 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:48:46,368-Speed 2491.09 samples/sec Loss 1.3422 LearningRate 0.000071 Epoch: 30 Global Step: 630710 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:48:54,577-Speed 2495.03 samples/sec Loss 1.3323 LearningRate 0.000071 Epoch: 30 Global Step: 630720 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:49:02,731-Speed 2512.19 samples/sec Loss 1.3356 LearningRate 0.000071 Epoch: 30 Global Step: 630730 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:49:10,935-Speed 2496.59 samples/sec Loss 1.3274 LearningRate 0.000071 Epoch: 30 Global Step: 630740 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 
14:49:19,153-Speed 2492.70 samples/sec Loss 1.3114 LearningRate 0.000071 Epoch: 30 Global Step: 630750 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:49:27,359-Speed 2496.14 samples/sec Loss 1.3400 LearningRate 0.000071 Epoch: 30 Global Step: 630760 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:49:35,565-Speed 2496.22 samples/sec Loss 1.3716 LearningRate 0.000071 Epoch: 30 Global Step: 630770 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:49:43,780-Speed 2493.22 samples/sec Loss 1.3416 LearningRate 0.000071 Epoch: 30 Global Step: 630780 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-07-11 14:49:51,931-Speed 2513.08 samples/sec Loss 1.3722 LearningRate 0.000071 Epoch: 30 Global Step: 630790 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-07-11 14:50:00,100-Speed 2507.26 samples/sec Loss 1.3534 LearningRate 0.000071 Epoch: 30 Global Step: 630800 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:50:08,309-Speed 2495.34 samples/sec Loss 1.3395 LearningRate 0.000071 Epoch: 30 Global Step: 630810 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:50:16,511-Speed 2497.26 samples/sec Loss 1.3246 LearningRate 0.000071 Epoch: 30 Global Step: 630820 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:50:24,714-Speed 2496.97 samples/sec Loss 1.2883 LearningRate 0.000071 Epoch: 30 Global Step: 630830 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:50:32,919-Speed 2496.56 samples/sec Loss 1.3328 LearningRate 0.000071 Epoch: 30 Global Step: 630840 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:50:41,074-Speed 2512.35 samples/sec Loss 1.3394 LearningRate 0.000071 Epoch: 30 Global Step: 630850 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:50:49,280-Speed 2496.33 samples/sec Loss 1.3571 LearningRate 0.000071 Epoch: 30 Global Step: 630860 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 
14:50:57,488-Speed 2495.35 samples/sec Loss 1.3234 LearningRate 0.000071 Epoch: 30 Global Step: 630870 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:51:05,694-Speed 2496.19 samples/sec Loss 1.3069 LearningRate 0.000071 Epoch: 30 Global Step: 630880 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:51:13,899-Speed 2496.33 samples/sec Loss 1.3033 LearningRate 0.000071 Epoch: 30 Global Step: 630890 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:51:22,103-Speed 2496.73 samples/sec Loss 1.3321 LearningRate 0.000071 Epoch: 30 Global Step: 630900 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:51:30,258-Speed 2511.96 samples/sec Loss 1.3425 LearningRate 0.000071 Epoch: 30 Global Step: 630910 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:51:38,466-Speed 2495.48 samples/sec Loss 1.3325 LearningRate 0.000071 Epoch: 30 Global Step: 630920 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:51:46,671-Speed 2496.79 samples/sec Loss 1.3480 LearningRate 0.000071 Epoch: 30 Global Step: 630930 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:51:54,880-Speed 2495.03 samples/sec Loss 1.3348 LearningRate 0.000071 Epoch: 30 Global Step: 630940 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:52:03,087-Speed 2495.98 samples/sec Loss 1.3593 LearningRate 0.000071 Epoch: 30 Global Step: 630950 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:52:11,291-Speed 2496.64 samples/sec Loss 1.3534 LearningRate 0.000071 Epoch: 30 Global Step: 630960 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:52:19,445-Speed 2512.27 samples/sec Loss 1.3706 LearningRate 0.000071 Epoch: 30 Global Step: 630970 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:52:27,649-Speed 2497.01 samples/sec Loss 1.3219 LearningRate 0.000071 Epoch: 30 Global Step: 630980 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 
14:52:35,853-Speed 2496.66 samples/sec Loss 1.3613 LearningRate 0.000071 Epoch: 30 Global Step: 630990 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:52:44,060-Speed 2495.65 samples/sec Loss 1.3354 LearningRate 0.000071 Epoch: 30 Global Step: 631000 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:52:52,270-Speed 2495.25 samples/sec Loss 1.3160 LearningRate 0.000071 Epoch: 30 Global Step: 631010 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:53:00,474-Speed 2496.49 samples/sec Loss 1.3232 LearningRate 0.000071 Epoch: 30 Global Step: 631020 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:53:08,627-Speed 2512.36 samples/sec Loss 1.3298 LearningRate 0.000071 Epoch: 30 Global Step: 631030 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:53:16,832-Speed 2496.71 samples/sec Loss 1.3201 LearningRate 0.000071 Epoch: 30 Global Step: 631040 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:53:25,041-Speed 2495.30 samples/sec Loss 1.3348 LearningRate 0.000071 Epoch: 30 Global Step: 631050 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:53:33,259-Speed 2492.48 samples/sec Loss 1.3253 LearningRate 0.000071 Epoch: 30 Global Step: 631060 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:53:41,465-Speed 2496.07 samples/sec Loss 1.3082 LearningRate 0.000071 Epoch: 30 Global Step: 631070 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:53:49,669-Speed 2496.69 samples/sec Loss 1.3507 LearningRate 0.000071 Epoch: 30 Global Step: 631080 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:53:57,819-Speed 2513.33 samples/sec Loss 1.3187 LearningRate 0.000071 Epoch: 30 Global Step: 631090 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:54:06,023-Speed 2496.75 samples/sec Loss 1.3180 LearningRate 0.000071 Epoch: 30 Global Step: 631100 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 
14:54:14,230-Speed 2495.84 samples/sec Loss 1.3310 LearningRate 0.000071 Epoch: 30 Global Step: 631110 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:54:22,433-Speed 2496.96 samples/sec Loss 1.3567 LearningRate 0.000071 Epoch: 30 Global Step: 631120 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:54:30,635-Speed 2497.39 samples/sec Loss 1.3332 LearningRate 0.000071 Epoch: 30 Global Step: 631130 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:54:38,839-Speed 2496.80 samples/sec Loss 1.2886 LearningRate 0.000071 Epoch: 30 Global Step: 631140 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:54:46,996-Speed 2511.20 samples/sec Loss 1.3385 LearningRate 0.000071 Epoch: 30 Global Step: 631150 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:54:55,204-Speed 2495.57 samples/sec Loss 1.3342 LearningRate 0.000071 Epoch: 30 Global Step: 631160 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:55:03,408-Speed 2497.09 samples/sec Loss 1.3604 LearningRate 0.000071 Epoch: 30 Global Step: 631170 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:55:11,613-Speed 2496.47 samples/sec Loss 1.3143 LearningRate 0.000071 Epoch: 30 Global Step: 631180 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:55:19,819-Speed 2496.30 samples/sec Loss 1.3341 LearningRate 0.000071 Epoch: 30 Global Step: 631190 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:55:28,023-Speed 2496.71 samples/sec Loss 1.3297 LearningRate 0.000071 Epoch: 30 Global Step: 631200 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:55:36,178-Speed 2512.05 samples/sec Loss 1.2997 LearningRate 0.000071 Epoch: 30 Global Step: 631210 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:55:44,381-Speed 2496.74 samples/sec Loss 1.3214 LearningRate 0.000071 Epoch: 30 Global Step: 631220 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 
14:55:52,586-Speed 2496.63 samples/sec Loss 1.3321 LearningRate 0.000071 Epoch: 30 Global Step: 631230 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:56:00,791-Speed 2496.68 samples/sec Loss 1.2849 LearningRate 0.000071 Epoch: 30 Global Step: 631240 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:56:09,002-Speed 2494.63 samples/sec Loss 1.3033 LearningRate 0.000071 Epoch: 30 Global Step: 631250 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:56:17,209-Speed 2495.87 samples/sec Loss 1.3208 LearningRate 0.000071 Epoch: 30 Global Step: 631260 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:56:25,363-Speed 2512.19 samples/sec Loss 1.3473 LearningRate 0.000071 Epoch: 30 Global Step: 631270 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:56:33,574-Speed 2494.53 samples/sec Loss 1.3163 LearningRate 0.000071 Epoch: 30 Global Step: 631280 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:56:41,783-Speed 2495.05 samples/sec Loss 1.3344 LearningRate 0.000071 Epoch: 30 Global Step: 631290 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:56:49,991-Speed 2495.64 samples/sec Loss 1.3122 LearningRate 0.000071 Epoch: 30 Global Step: 631300 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:56:58,207-Speed 2492.97 samples/sec Loss 1.3303 LearningRate 0.000071 Epoch: 30 Global Step: 631310 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:57:06,414-Speed 2495.99 samples/sec Loss 1.3017 LearningRate 0.000071 Epoch: 30 Global Step: 631320 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:57:14,579-Speed 2508.85 samples/sec Loss 1.3273 LearningRate 0.000070 Epoch: 30 Global Step: 631330 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:57:22,797-Speed 2492.36 samples/sec Loss 1.3206 LearningRate 0.000070 Epoch: 30 Global Step: 631340 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 
14:57:31,010-Speed 2494.12 samples/sec Loss 1.3442 LearningRate 0.000070 Epoch: 30 Global Step: 631350 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:57:39,229-Speed 2492.25 samples/sec Loss 1.3231 LearningRate 0.000070 Epoch: 30 Global Step: 631360 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:57:47,444-Speed 2493.17 samples/sec Loss 1.3397 LearningRate 0.000070 Epoch: 30 Global Step: 631370 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:57:55,649-Speed 2496.51 samples/sec Loss 1.3512 LearningRate 0.000070 Epoch: 30 Global Step: 631380 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:58:03,808-Speed 2510.72 samples/sec Loss 1.3492 LearningRate 0.000070 Epoch: 30 Global Step: 631390 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:58:12,014-Speed 2496.14 samples/sec Loss 1.3278 LearningRate 0.000070 Epoch: 30 Global Step: 631400 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:58:20,218-Speed 2496.36 samples/sec Loss 1.2996 LearningRate 0.000070 Epoch: 30 Global Step: 631410 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:58:28,422-Speed 2496.67 samples/sec Loss 1.3418 LearningRate 0.000070 Epoch: 30 Global Step: 631420 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:58:36,627-Speed 2496.44 samples/sec Loss 1.3241 LearningRate 0.000070 Epoch: 30 Global Step: 631430 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:58:44,839-Speed 2494.49 samples/sec Loss 1.3187 LearningRate 0.000070 Epoch: 30 Global Step: 631440 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:58:52,991-Speed 2512.52 samples/sec Loss 1.3318 LearningRate 0.000070 Epoch: 30 Global Step: 631450 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:59:01,199-Speed 2495.54 samples/sec Loss 1.3114 LearningRate 0.000070 Epoch: 30 Global Step: 631460 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 
14:59:09,406-Speed 2495.95 samples/sec Loss 1.3319 LearningRate 0.000070 Epoch: 30 Global Step: 631470 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:59:17,611-Speed 2496.34 samples/sec Loss 1.3455 LearningRate 0.000070 Epoch: 30 Global Step: 631480 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:59:25,817-Speed 2496.05 samples/sec Loss 1.3277 LearningRate 0.000070 Epoch: 30 Global Step: 631490 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:59:34,037-Speed 2492.03 samples/sec Loss 1.3461 LearningRate 0.000070 Epoch: 30 Global Step: 631500 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:59:42,190-Speed 2512.23 samples/sec Loss 1.3231 LearningRate 0.000070 Epoch: 30 Global Step: 631510 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:59:50,427-Speed 2486.79 samples/sec Loss 1.3119 LearningRate 0.000070 Epoch: 30 Global Step: 631520 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 14:59:58,633-Speed 2496.22 samples/sec Loss 1.3073 LearningRate 0.000070 Epoch: 30 Global Step: 631530 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:00:06,845-Speed 2494.43 samples/sec Loss 1.3096 LearningRate 0.000070 Epoch: 30 Global Step: 631540 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:00:15,047-Speed 2497.09 samples/sec Loss 1.3528 LearningRate 0.000070 Epoch: 30 Global Step: 631550 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:00:23,252-Speed 2496.31 samples/sec Loss 1.3385 LearningRate 0.000070 Epoch: 30 Global Step: 631560 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:00:31,403-Speed 2513.29 samples/sec Loss 1.3283 LearningRate 0.000070 Epoch: 30 Global Step: 631570 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:00:39,611-Speed 2495.37 samples/sec Loss 1.3384 LearningRate 0.000070 Epoch: 30 Global Step: 631580 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 
15:00:47,820-Speed 2495.56 samples/sec Loss 1.3555 LearningRate 0.000070 Epoch: 30 Global Step: 631590 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:00:56,022-Speed 2497.16 samples/sec Loss 1.3490 LearningRate 0.000070 Epoch: 30 Global Step: 631600 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:01:04,239-Speed 2492.88 samples/sec Loss 1.3629 LearningRate 0.000070 Epoch: 30 Global Step: 631610 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:01:12,446-Speed 2496.02 samples/sec Loss 1.2914 LearningRate 0.000070 Epoch: 30 Global Step: 631620 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:01:20,608-Speed 2509.43 samples/sec Loss 1.3573 LearningRate 0.000070 Epoch: 30 Global Step: 631630 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:01:28,815-Speed 2495.86 samples/sec Loss 1.3017 LearningRate 0.000070 Epoch: 30 Global Step: 631640 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:01:37,017-Speed 2497.15 samples/sec Loss 1.3090 LearningRate 0.000070 Epoch: 30 Global Step: 631650 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:01:45,223-Speed 2496.59 samples/sec Loss 1.3183 LearningRate 0.000070 Epoch: 30 Global Step: 631660 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:01:53,431-Speed 2495.44 samples/sec Loss 1.3207 LearningRate 0.000070 Epoch: 30 Global Step: 631670 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:02:01,634-Speed 2497.20 samples/sec Loss 1.3301 LearningRate 0.000070 Epoch: 30 Global Step: 631680 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:02:09,787-Speed 2512.25 samples/sec Loss 1.3220 LearningRate 0.000070 Epoch: 30 Global Step: 631690 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:02:17,992-Speed 2496.32 samples/sec Loss 1.3265 LearningRate 0.000070 Epoch: 30 Global Step: 631700 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 
15:02:26,210-Speed 2492.75 samples/sec Loss 1.3278 LearningRate 0.000070 Epoch: 30 Global Step: 631710 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:02:34,419-Speed 2495.07 samples/sec Loss 1.3396 LearningRate 0.000070 Epoch: 30 Global Step: 631720 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:02:42,624-Speed 2496.77 samples/sec Loss 1.3284 LearningRate 0.000070 Epoch: 30 Global Step: 631730 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:02:50,829-Speed 2496.40 samples/sec Loss 1.3435 LearningRate 0.000070 Epoch: 30 Global Step: 631740 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:02:58,982-Speed 2512.36 samples/sec Loss 1.3529 LearningRate 0.000070 Epoch: 30 Global Step: 631750 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:03:07,186-Speed 2496.76 samples/sec Loss 1.3471 LearningRate 0.000070 Epoch: 30 Global Step: 631760 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:03:15,401-Speed 2493.59 samples/sec Loss 1.3338 LearningRate 0.000070 Epoch: 30 Global Step: 631770 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:03:23,603-Speed 2497.35 samples/sec Loss 1.3326 LearningRate 0.000070 Epoch: 30 Global Step: 631780 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:03:31,813-Speed 2494.84 samples/sec Loss 1.3492 LearningRate 0.000070 Epoch: 30 Global Step: 631790 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:03:40,016-Speed 2497.32 samples/sec Loss 1.3393 LearningRate 0.000070 Epoch: 30 Global Step: 631800 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:03:48,170-Speed 2511.97 samples/sec Loss 1.3235 LearningRate 0.000070 Epoch: 30 Global Step: 631810 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:03:56,382-Speed 2494.50 samples/sec Loss 1.3182 LearningRate 0.000070 Epoch: 30 Global Step: 631820 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 
15:04:04,587-Speed 2496.24 samples/sec Loss 1.3099 LearningRate 0.000070 Epoch: 30 Global Step: 631830 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:04:12,792-Speed 2496.46 samples/sec Loss 1.3376 LearningRate 0.000070 Epoch: 30 Global Step: 631840 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:04:20,996-Speed 2496.73 samples/sec Loss 1.3092 LearningRate 0.000070 Epoch: 30 Global Step: 631850 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:04:29,238-Speed 2495.92 samples/sec Loss 1.3330 LearningRate 0.000070 Epoch: 30 Global Step: 631860 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:04:37,433-Speed 2512.55 samples/sec Loss 1.3290 LearningRate 0.000070 Epoch: 30 Global Step: 631870 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:04:45,638-Speed 2496.36 samples/sec Loss 1.3265 LearningRate 0.000070 Epoch: 30 Global Step: 631880 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:04:53,884-Speed 2498.88 samples/sec Loss 1.3277 LearningRate 0.000070 Epoch: 30 Global Step: 631890 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:05:02,137-Speed 2496.84 samples/sec Loss 1.3422 LearningRate 0.000070 Epoch: 30 Global Step: 631900 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:05:10,341-Speed 2496.59 samples/sec Loss 1.2949 LearningRate 0.000070 Epoch: 30 Global Step: 631910 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:05:24,284-Speed 2498.47 samples/sec Loss 1.3315 LearningRate 0.000070 Epoch: 30 Global Step: 631920 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:05:32,475-Speed 2516.25 samples/sec Loss 1.3575 LearningRate 0.000070 Epoch: 30 Global Step: 631930 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:05:42,760-Speed 2002.07 samples/sec Loss 1.3390 LearningRate 0.000070 Epoch: 30 Global Step: 631940 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 
15:05:50,957-Speed 2498.73 samples/sec Loss 1.3515 LearningRate 0.000070 Epoch: 30 Global Step: 631950 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:05:59,198-Speed 2498.07 samples/sec Loss 1.3271 LearningRate 0.000070 Epoch: 30 Global Step: 631960 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:06:13,199-Speed 2499.52 samples/sec Loss 1.3380 LearningRate 0.000070 Epoch: 30 Global Step: 631970 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:06:21,586-Speed 2442.17 samples/sec Loss 1.3133 LearningRate 0.000070 Epoch: 30 Global Step: 631980 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:06:33,051-Speed 1797.18 samples/sec Loss 1.3365 LearningRate 0.000070 Epoch: 30 Global Step: 631990 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:06:41,245-Speed 2502.05 samples/sec Loss 1.3483 LearningRate 0.000070 Epoch: 30 Global Step: 632000 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-07-11 15:06:49,447-Speed 2497.36 samples/sec Loss 1.3193 LearningRate 0.000070 Epoch: 30 Global Step: 632010 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-07-11 15:06:57,729-Speed 2497.22 samples/sec Loss 1.3265 LearningRate 0.000070 Epoch: 30 Global Step: 632020 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-07-11 15:07:17,016-Speed 1064.68 samples/sec Loss 1.3206 LearningRate 0.000070 Epoch: 30 Global Step: 632030 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-07-11 15:07:25,213-Speed 2498.74 samples/sec Loss 1.3327 LearningRate 0.000070 Epoch: 30 Global Step: 632040 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-07-11 15:07:33,395-Speed 2517.30 samples/sec Loss 1.3534 LearningRate 0.000070 Epoch: 30 Global Step: 632050 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-07-11 15:07:44,350-Speed 1877.20 samples/sec Loss 1.3419 LearningRate 0.000070 Epoch: 30 Global Step: 632060 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-07-11 
15:07:55,844-Speed 1781.87 samples/sec Loss 1.3181 LearningRate 0.000070 Epoch: 30 Global Step: 632070 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:08:04,045-Speed 2497.97 samples/sec Loss 1.3265 LearningRate 0.000070 Epoch: 30 Global Step: 632080 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:08:12,256-Speed 2494.39 samples/sec Loss 1.3321 LearningRate 0.000070 Epoch: 30 Global Step: 632090 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:08:20,460-Speed 2496.95 samples/sec Loss 1.3498 LearningRate 0.000070 Epoch: 30 Global Step: 632100 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:08:28,609-Speed 2513.37 samples/sec Loss 1.3158 LearningRate 0.000070 Epoch: 30 Global Step: 632110 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:08:36,821-Speed 2494.42 samples/sec Loss 1.3302 LearningRate 0.000070 Epoch: 30 Global Step: 632120 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:08:45,026-Speed 2496.37 samples/sec Loss 1.3079 LearningRate 0.000070 Epoch: 30 Global Step: 632130 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:08:53,251-Speed 2490.56 samples/sec Loss 1.3058 LearningRate 0.000070 Epoch: 30 Global Step: 632140 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:09:01,453-Speed 2497.11 samples/sec Loss 1.3507 LearningRate 0.000070 Epoch: 30 Global Step: 632150 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:09:09,659-Speed 2496.27 samples/sec Loss 1.3311 LearningRate 0.000070 Epoch: 30 Global Step: 632160 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:09:17,808-Speed 2513.75 samples/sec Loss 1.3279 LearningRate 0.000070 Epoch: 30 Global Step: 632170 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:09:26,018-Speed 2494.84 samples/sec Loss 1.3130 LearningRate 0.000070 Epoch: 30 Global Step: 632180 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:09:34,219-Speed 2497.93 samples/sec Loss 1.3182 LearningRate 0.000070 Epoch: 30 Global Step: 632190 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:09:42,422-Speed 2497.14 samples/sec Loss 1.3282 LearningRate 0.000070 Epoch: 30 Global Step: 632200 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:09:50,626-Speed 2496.82 samples/sec Loss 1.3239 LearningRate 0.000070 Epoch: 30 Global Step: 632210 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:09:58,831-Speed 2496.27 samples/sec Loss 1.3277 LearningRate 0.000070 Epoch: 30 Global Step: 632220 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:10:06,981-Speed 2513.22 samples/sec Loss 1.3340 LearningRate 0.000070 Epoch: 30 Global Step: 632230 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:10:15,195-Speed 2494.11 samples/sec Loss 1.3092 LearningRate 0.000070 Epoch: 30 Global Step: 632240 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:10:23,397-Speed 2497.20 samples/sec Loss 1.2984 LearningRate 0.000070 Epoch: 30 Global Step: 632250 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:10:31,599-Speed 2497.21 samples/sec Loss 1.3411 LearningRate 0.000070 Epoch: 30 Global Step: 632260 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:10:39,819-Speed 2492.13 samples/sec Loss 1.3727 LearningRate 0.000070 Epoch: 30 Global Step: 632270 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:10:48,024-Speed 2496.41 samples/sec Loss 1.3521 LearningRate 0.000070 Epoch: 30 Global Step: 632280 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:10:56,190-Speed 2508.23 samples/sec Loss 1.2837 LearningRate 0.000070 Epoch: 30 Global Step: 632290 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:11:04,418-Speed 2489.62 samples/sec Loss 1.3603 LearningRate 0.000070 Epoch: 30 Global Step: 632300 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:11:12,621-Speed 2497.04 samples/sec Loss 1.3270 LearningRate 0.000070 Epoch: 30 Global Step: 632310 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:11:20,828-Speed 2495.84 samples/sec Loss 1.3173 LearningRate 0.000070 Epoch: 30 Global Step: 632320 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:11:29,031-Speed 2496.97 samples/sec Loss 1.3316 LearningRate 0.000070 Epoch: 30 Global Step: 632330 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:11:37,235-Speed 2496.62 samples/sec Loss 1.3421 LearningRate 0.000070 Epoch: 30 Global Step: 632340 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:11:45,398-Speed 2509.28 samples/sec Loss 1.3083 LearningRate 0.000070 Epoch: 30 Global Step: 632350 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-07-11 15:11:53,559-Speed 2509.95 samples/sec Loss 1.2910 LearningRate 0.000070 Epoch: 30 Global Step: 632360 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:12:01,780-Speed 2491.56 samples/sec Loss 1.2878 LearningRate 0.000070 Epoch: 30 Global Step: 632370 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:12:09,988-Speed 2495.58 samples/sec Loss 1.3179 LearningRate 0.000070 Epoch: 30 Global Step: 632380 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:12:18,187-Speed 2497.97 samples/sec Loss 1.3480 LearningRate 0.000070 Epoch: 30 Global Step: 632390 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:12:26,395-Speed 2495.43 samples/sec Loss 1.3277 LearningRate 0.000070 Epoch: 30 Global Step: 632400 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:12:34,560-Speed 2509.54 samples/sec Loss 1.3211 LearningRate 0.000070 Epoch: 30 Global Step: 632410 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:12:42,764-Speed 2496.86 samples/sec Loss 1.3265 LearningRate 0.000070 Epoch: 30 Global Step: 632420 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:12:50,983-Speed 2491.99 samples/sec Loss 1.3206 LearningRate 0.000070 Epoch: 30 Global Step: 632430 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:12:59,190-Speed 2495.88 samples/sec Loss 1.3391 LearningRate 0.000070 Epoch: 30 Global Step: 632440 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:13:07,394-Speed 2496.60 samples/sec Loss 1.3359 LearningRate 0.000070 Epoch: 30 Global Step: 632450 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:13:15,600-Speed 2496.26 samples/sec Loss 1.3131 LearningRate 0.000070 Epoch: 30 Global Step: 632460 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:13:23,748-Speed 2513.65 samples/sec Loss 1.3264 LearningRate 0.000070 Epoch: 30 Global Step: 632470 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:13:31,952-Speed 2496.79 samples/sec Loss 1.2892 LearningRate 0.000070 Epoch: 30 Global Step: 632480 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:13:40,164-Speed 2494.39 samples/sec Loss 1.3387 LearningRate 0.000070 Epoch: 30 Global Step: 632490 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:13:48,371-Speed 2495.73 samples/sec Loss 1.3256 LearningRate 0.000070 Epoch: 30 Global Step: 632500 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:13:56,571-Speed 2498.11 samples/sec Loss 1.3200 LearningRate 0.000070 Epoch: 30 Global Step: 632510 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:14:04,771-Speed 2497.86 samples/sec Loss 1.3251 LearningRate 0.000070 Epoch: 30 Global Step: 632520 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:14:12,923-Speed 2512.56 samples/sec Loss 1.3322 LearningRate 0.000070 Epoch: 30 Global Step: 632530 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:14:21,123-Speed 2498.10 samples/sec Loss 1.3477 LearningRate 0.000070 Epoch: 30 Global Step: 632540 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:14:29,333-Speed 2494.89 samples/sec Loss 1.3412 LearningRate 0.000070 Epoch: 30 Global Step: 632550 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:14:37,543-Speed 2494.77 samples/sec Loss 1.3153 LearningRate 0.000070 Epoch: 30 Global Step: 632560 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:14:45,759-Speed 2493.10 samples/sec Loss 1.3266 LearningRate 0.000070 Epoch: 30 Global Step: 632570 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:14:53,962-Speed 2497.25 samples/sec Loss 1.3393 LearningRate 0.000070 Epoch: 30 Global Step: 632580 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:15:02,116-Speed 2511.96 samples/sec Loss 1.3533 LearningRate 0.000070 Epoch: 30 Global Step: 632590 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:15:10,327-Speed 2494.52 samples/sec Loss 1.3298 LearningRate 0.000070 Epoch: 30 Global Step: 632600 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:15:18,530-Speed 2497.41 samples/sec Loss 1.3012 LearningRate 0.000070 Epoch: 30 Global Step: 632610 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:15:26,734-Speed 2496.95 samples/sec Loss 1.3108 LearningRate 0.000070 Epoch: 30 Global Step: 632620 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:15:34,933-Speed 2498.05 samples/sec Loss 1.2965 LearningRate 0.000070 Epoch: 30 Global Step: 632630 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:15:43,135-Speed 2497.28 samples/sec Loss 1.3031 LearningRate 0.000070 Epoch: 30 Global Step: 632640 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:15:51,286-Speed 2513.07 samples/sec Loss 1.3555 LearningRate 0.000070 Epoch: 30 Global Step: 632650 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:15:59,489-Speed 2497.38 samples/sec Loss 1.3186 LearningRate 0.000070 Epoch: 30 Global Step: 632660 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:16:07,692-Speed 2496.82 samples/sec Loss 1.3326 LearningRate 0.000070 Epoch: 30 Global Step: 632670 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:16:15,899-Speed 2496.10 samples/sec Loss 1.3186 LearningRate 0.000070 Epoch: 30 Global Step: 632680 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:16:24,100-Speed 2497.74 samples/sec Loss 1.3593 LearningRate 0.000070 Epoch: 30 Global Step: 632690 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:16:32,315-Speed 2493.41 samples/sec Loss 1.2939 LearningRate 0.000070 Epoch: 30 Global Step: 632700 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:16:40,467-Speed 2512.69 samples/sec Loss 1.3339 LearningRate 0.000070 Epoch: 30 Global Step: 632710 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:16:48,676-Speed 2495.62 samples/sec Loss 1.3198 LearningRate 0.000070 Epoch: 30 Global Step: 632720 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:16:56,879-Speed 2497.16 samples/sec Loss 1.2882 LearningRate 0.000070 Epoch: 30 Global Step: 632730 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:17:05,081-Speed 2497.13 samples/sec Loss 1.3129 LearningRate 0.000069 Epoch: 30 Global Step: 632740 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:17:13,296-Speed 2493.74 samples/sec Loss 1.2908 LearningRate 0.000069 Epoch: 30 Global Step: 632750 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:17:21,493-Speed 2499.04 samples/sec Loss 1.3813 LearningRate 0.000069 Epoch: 30 Global Step: 632760 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:17:29,643-Speed 2513.21 samples/sec Loss 1.3016 LearningRate 0.000069 Epoch: 30 Global Step: 632770 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:17:37,797-Speed 2511.96 samples/sec Loss 1.3270 LearningRate 0.000069 Epoch: 30 Global Step: 632780 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:17:46,001-Speed 2497.18 samples/sec Loss 1.3590 LearningRate 0.000069 Epoch: 30 Global Step: 632790 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:17:54,198-Speed 2498.73 samples/sec Loss 1.3128 LearningRate 0.000069 Epoch: 30 Global Step: 632800 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:18:02,399-Speed 2497.92 samples/sec Loss 1.3377 LearningRate 0.000069 Epoch: 30 Global Step: 632810 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:18:10,598-Speed 2498.22 samples/sec Loss 1.2956 LearningRate 0.000069 Epoch: 30 Global Step: 632820 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:18:18,745-Speed 2514.05 samples/sec Loss 1.3381 LearningRate 0.000069 Epoch: 30 Global Step: 632830 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:18:26,942-Speed 2498.79 samples/sec Loss 1.3073 LearningRate 0.000069 Epoch: 30 Global Step: 632840 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:18:35,142-Speed 2498.01 samples/sec Loss 1.2683 LearningRate 0.000069 Epoch: 30 Global Step: 632850 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:18:43,339-Speed 2498.87 samples/sec Loss 1.3009 LearningRate 0.000069 Epoch: 30 Global Step: 632860 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:18:51,542-Speed 2497.07 samples/sec Loss 1.3465 LearningRate 0.000069 Epoch: 30 Global Step: 632870 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:18:59,745-Speed 2497.19 samples/sec Loss 1.3354 LearningRate 0.000069 Epoch: 30 Global Step: 632880 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:19:07,890-Speed 2514.62 samples/sec Loss 1.3503 LearningRate 0.000069 Epoch: 30 Global Step: 632890 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:19:16,095-Speed 2496.50 samples/sec Loss 1.2961 LearningRate 0.000069 Epoch: 30 Global Step: 632900 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:19:24,305-Speed 2495.01 samples/sec Loss 1.3291 LearningRate 0.000069 Epoch: 30 Global Step: 632910 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:19:32,515-Speed 2494.92 samples/sec Loss 1.3237 LearningRate 0.000069 Epoch: 30 Global Step: 632920 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:19:40,720-Speed 2496.79 samples/sec Loss 1.3273 LearningRate 0.000069 Epoch: 30 Global Step: 632930 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:19:48,922-Speed 2497.24 samples/sec Loss 1.3249 LearningRate 0.000069 Epoch: 30 Global Step: 632940 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:19:57,070-Speed 2513.73 samples/sec Loss 1.3113 LearningRate 0.000069 Epoch: 30 Global Step: 632950 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:20:05,272-Speed 2497.42 samples/sec Loss 1.2973 LearningRate 0.000069 Epoch: 30 Global Step: 632960 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:20:13,477-Speed 2496.44 samples/sec Loss 1.2899 LearningRate 0.000069 Epoch: 30 Global Step: 632970 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:20:21,678-Speed 2497.47 samples/sec Loss 1.3070 LearningRate 0.000069 Epoch: 30 Global Step: 632980 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:20:29,879-Speed 2497.77 samples/sec Loss 1.3254 LearningRate 0.000069 Epoch: 30 Global Step: 632990 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:20:38,077-Speed 2498.36 samples/sec Loss 1.3412 LearningRate 0.000069 Epoch: 30 Global Step: 633000 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:20:46,239-Speed 2509.64 samples/sec Loss 1.3270 LearningRate 0.000069 Epoch: 30 Global Step: 633010 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:20:54,444-Speed 2496.53 samples/sec Loss 1.3232 LearningRate 0.000069 Epoch: 30 Global Step: 633020 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:21:02,652-Speed 2495.62 samples/sec Loss 1.3113 LearningRate 0.000069 Epoch: 30 Global Step: 633030 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:21:10,852-Speed 2498.19 samples/sec Loss 1.2875 LearningRate 0.000069 Epoch: 30 Global Step: 633040 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:21:19,048-Speed 2499.14 samples/sec Loss 1.3278 LearningRate 0.000069 Epoch: 30 Global Step: 633050 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:21:27,258-Speed 2495.38 samples/sec Loss 1.3476 LearningRate 0.000069 Epoch: 30 Global Step: 633060 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:21:35,400-Speed 2515.62 samples/sec Loss 1.3751 LearningRate 0.000069 Epoch: 30 Global Step: 633070 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:21:43,600-Speed 2498.02 samples/sec Loss 1.2988 LearningRate 0.000069 Epoch: 30 Global Step: 633080 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:21:51,807-Speed 2496.00 samples/sec Loss 1.3304 LearningRate 0.000069 Epoch: 30 Global Step: 633090 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:22:00,008-Speed 2497.75 samples/sec Loss 1.3428 LearningRate 0.000069 Epoch: 30 Global Step: 633100 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:22:08,217-Speed 2495.29 samples/sec Loss 1.3676 LearningRate 0.000069 Epoch: 30 Global Step: 633110 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:22:16,421-Speed 2496.77 samples/sec Loss 1.3357 LearningRate 0.000069 Epoch: 30 Global Step: 633120 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:22:24,570-Speed 2513.65 samples/sec Loss 1.3238 LearningRate 0.000069 Epoch: 30 Global Step: 633130 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:22:32,782-Speed 2494.37 samples/sec Loss 1.3248 LearningRate 0.000069 Epoch: 30 Global Step: 633140 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:22:40,986-Speed 2496.75 samples/sec Loss 1.3299 LearningRate 0.000069 Epoch: 30 Global Step: 633150 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:22:49,191-Speed 2496.61 samples/sec Loss 1.3324 LearningRate 0.000069 Epoch: 30 Global Step: 633160 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:22:57,403-Speed 2494.04 samples/sec Loss 1.3174 LearningRate 0.000069 Epoch: 30 Global Step: 633170 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:23:05,602-Speed 2498.64 samples/sec Loss 1.2994 LearningRate 0.000069 Epoch: 30 Global Step: 633180 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:23:13,751-Speed 2514.31 samples/sec Loss 1.3361 LearningRate 0.000069 Epoch: 30 Global Step: 633190 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:23:21,947-Speed 2499.05 samples/sec Loss 1.3368 LearningRate 0.000069 Epoch: 30 Global Step: 633200 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:23:30,148-Speed 2497.83 samples/sec Loss 1.3170 LearningRate 0.000069 Epoch: 30 Global Step: 633210 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:23:38,349-Speed 2497.67 samples/sec Loss 1.3262 LearningRate 0.000069 Epoch: 30 Global Step: 633220 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:23:46,570-Speed 2491.84 samples/sec Loss 1.3431 LearningRate 0.000069 Epoch: 30 Global Step: 633230 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:23:54,772-Speed 2497.13 samples/sec Loss 1.3721 LearningRate 0.000069 Epoch: 30 Global Step: 633240 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:24:02,918-Speed 2514.64 samples/sec Loss 1.3244 LearningRate 0.000069 Epoch: 30 Global Step: 633250 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:24:11,118-Speed 2497.75 samples/sec Loss 1.2970 LearningRate 0.000069 Epoch: 30 Global Step: 633260 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:24:19,316-Speed 2498.55 samples/sec Loss 1.3502 LearningRate 0.000069 Epoch: 30 Global Step: 633270 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:24:27,516-Speed 2498.17 samples/sec Loss 1.3112 LearningRate 0.000069 Epoch: 30 Global Step: 633280 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:24:35,719-Speed 2496.92 samples/sec Loss 1.3161 LearningRate 0.000069 Epoch: 30 Global Step: 633290 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:24:43,921-Speed 2497.43 samples/sec Loss 1.3085 LearningRate 0.000069 Epoch: 30 Global Step: 633300 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:24:52,065-Speed 2515.37 samples/sec Loss 1.3130 LearningRate 0.000069 Epoch: 30 Global Step: 633310 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:25:00,266-Speed 2497.80 samples/sec Loss 1.3199 LearningRate 0.000069 Epoch: 30 Global Step: 633320 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:25:08,466-Speed 2497.72 samples/sec Loss 1.3166 LearningRate 0.000069 Epoch: 30 Global Step: 633330 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:25:16,665-Speed 2498.32 samples/sec Loss 1.3348 LearningRate 0.000069 Epoch: 30 Global Step: 633340 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:25:24,866-Speed 2498.25 samples/sec Loss 1.3210 LearningRate 0.000069 Epoch: 30 Global Step: 633350 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:25:33,064-Speed 2498.45 samples/sec Loss 1.3361 LearningRate 0.000069 Epoch: 30 Global Step: 633360 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:25:41,215-Speed 2512.76 samples/sec Loss 1.3025 LearningRate 0.000069 Epoch: 30 Global Step: 633370 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:25:49,415-Speed 2497.97 samples/sec Loss 1.2979 LearningRate 0.000069 Epoch: 30 Global Step: 633380 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:25:57,614-Speed 2498.40 samples/sec Loss 1.3417 LearningRate 0.000069 Epoch: 30 Global Step: 633390 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:26:05,832-Speed 2492.52 samples/sec Loss 1.3287 LearningRate 0.000069 Epoch: 30 Global Step: 633400 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:26:14,031-Speed 2498.27 samples/sec Loss 1.3256 LearningRate 0.000069 Epoch: 30 Global Step: 633410 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:26:22,230-Speed 2498.32 samples/sec Loss 1.3234 LearningRate 0.000069 Epoch: 30 Global Step: 633420 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:26:30,378-Speed 2513.84 samples/sec Loss 1.3339 LearningRate 0.000069 Epoch: 30 Global Step: 633430 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:26:38,579-Speed 2497.95 samples/sec Loss 1.3275 LearningRate 0.000069 Epoch: 30 Global Step: 633440 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:26:46,780-Speed 2497.50 samples/sec Loss 1.3316 LearningRate 0.000069 Epoch: 30 Global Step: 633450 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:26:54,991-Speed 2494.73 samples/sec Loss 1.3164 LearningRate 0.000069 Epoch: 30 Global Step: 633460 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:27:03,191-Speed 2497.73 samples/sec Loss 1.3347 LearningRate 0.000069 Epoch: 30 Global Step: 633470 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:27:11,401-Speed 2495.03 samples/sec Loss 1.3187 LearningRate 0.000069 Epoch: 30 Global Step: 633480 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:27:19,569-Speed 2507.82 samples/sec Loss 1.3046 LearningRate 0.000069 Epoch: 30 Global Step: 633490 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:27:27,771-Speed 2497.35 samples/sec Loss 1.3139 LearningRate 0.000069 Epoch: 30 Global Step: 633500 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:27:35,969-Speed 2498.41 samples/sec Loss 1.3314 LearningRate 0.000069 Epoch: 30 Global Step: 633510 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:27:44,166-Speed 2498.94 samples/sec Loss 1.3202 LearningRate 0.000069 Epoch: 30 Global Step: 633520 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:27:52,370-Speed 2496.92 samples/sec Loss 1.3160 LearningRate 0.000069 Epoch: 30 Global Step: 633530 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:28:00,566-Speed 2498.92 samples/sec Loss 1.3130 LearningRate 0.000069 Epoch: 30 Global Step: 633540 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:28:08,710-Speed 2515.43 samples/sec Loss 1.2754 LearningRate 0.000069 Epoch: 30 Global Step: 633550 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:28:16,911-Speed 2497.76 samples/sec Loss 1.3273 LearningRate 0.000069 Epoch: 30 Global Step: 633560 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:28:25,105-Speed 2499.62 samples/sec Loss 1.3529 LearningRate 0.000069 Epoch: 30 Global Step: 633570 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:28:33,309-Speed 2496.86 samples/sec Loss 1.2967 LearningRate 0.000069 Epoch: 30 Global Step: 633580 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:28:41,511-Speed 2497.15 samples/sec Loss 1.3010 LearningRate 0.000069 Epoch: 30 Global Step: 633590 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:28:49,732-Speed 2491.73 samples/sec Loss 1.3446 LearningRate 0.000069 Epoch: 30 Global Step: 633600 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:28:57,886-Speed 2511.97 samples/sec Loss 1.3233 LearningRate 0.000069 Epoch: 30 Global Step: 633610 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:29:06,086-Speed 2497.97 samples/sec Loss 1.3382 LearningRate 0.000069 Epoch: 30 Global Step: 633620 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:29:14,286-Speed 2497.66 samples/sec Loss 1.3153 LearningRate 0.000069 Epoch: 30 Global Step: 633630 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:29:22,486-Speed 2498.13 samples/sec Loss 1.3356 LearningRate 0.000069 Epoch: 30 Global Step: 633640 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:29:30,681-Speed 2499.48 samples/sec Loss 1.2954 LearningRate 0.000069 Epoch: 30 Global Step: 633650 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:29:38,887-Speed 2496.00 samples/sec Loss 1.3400 LearningRate 0.000069 Epoch: 30 Global Step: 633660 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:29:47,034-Speed 2514.28 samples/sec Loss 1.3347 LearningRate 0.000069 Epoch: 30 Global Step: 633670 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:29:55,233-Speed 2498.56 samples/sec Loss 1.2909 LearningRate 0.000069 Epoch: 30 Global Step: 633680 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:30:03,430-Speed 2498.62 samples/sec Loss 1.3085 LearningRate 0.000069 Epoch: 30 Global Step: 633690 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:30:11,631-Speed 2497.53 samples/sec Loss 1.2966 LearningRate 0.000069 Epoch: 30 Global Step: 633700 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:30:19,836-Speed 2496.40 samples/sec Loss 1.3343 LearningRate 0.000069 Epoch: 30 Global Step: 633710 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:30:28,038-Speed 2497.35 samples/sec Loss 1.3376 LearningRate 0.000069 Epoch: 30 Global Step: 633720 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:30:36,190-Speed 2512.64 samples/sec Loss 1.3203 LearningRate 0.000069 Epoch: 30 Global Step: 633730 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:30:44,403-Speed 2494.21 samples/sec Loss 1.3016 LearningRate 0.000069 Epoch: 30 Global Step: 633740 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:30:52,602-Speed 2498.15 samples/sec Loss 1.3227 LearningRate 0.000069 Epoch: 30 Global Step: 633750 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:31:00,810-Speed 2495.38 samples/sec Loss 1.3132 LearningRate 0.000069 Epoch: 30 Global Step: 633760 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:31:09,008-Speed 2498.84 samples/sec Loss 1.3384 LearningRate 0.000069 Epoch: 30 Global Step: 633770 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:31:17,211-Speed 2497.01 samples/sec Loss 1.3178 LearningRate 0.000069 Epoch: 30 Global Step: 633780 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:31:25,359-Speed 2513.60 samples/sec Loss 1.2937 LearningRate 0.000069 Epoch: 30 Global Step: 633790 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:31:33,561-Speed 2497.67 samples/sec Loss 1.3056 LearningRate 0.000069 Epoch: 30 Global Step: 633800 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:31:41,758-Speed 2498.85 samples/sec Loss 1.3315 LearningRate 0.000069 Epoch: 30 Global Step: 633810 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:31:49,959-Speed 2497.57 samples/sec Loss 1.3298 LearningRate 0.000069 Epoch: 30 Global Step: 633820 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:31:58,158-Speed 2498.26 samples/sec Loss 1.3000 LearningRate 0.000069 Epoch: 30 Global Step: 633830 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:32:06,358-Speed 2497.87 samples/sec Loss 1.3448 LearningRate 0.000069 Epoch: 30 Global Step: 633840 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:32:14,508-Speed 2513.58 samples/sec Loss 1.3223 LearningRate 0.000069 Epoch: 30 Global Step: 633850 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:32:22,707-Speed 2498.32 samples/sec Loss 1.2844 LearningRate 0.000069 Epoch: 30 Global Step: 633860 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:32:30,907-Speed 2497.72 samples/sec Loss 1.3407 LearningRate 0.000069 Epoch: 30 Global Step: 633870 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:32:39,108-Speed 2497.92 samples/sec Loss 1.3331 LearningRate 0.000069 Epoch: 30 Global Step: 633880 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:32:47,315-Speed 2495.69 samples/sec Loss 1.2915 LearningRate 0.000069 Epoch: 30 Global Step: 633890 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:32:55,517-Speed 2497.25 samples/sec Loss 1.3123 LearningRate 0.000069 Epoch: 30 Global Step: 633900 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:33:03,674-Speed 2511.21 samples/sec Loss 1.2924 LearningRate 0.000069 Epoch: 30 Global Step: 633910 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:33:11,874-Speed 2498.32 samples/sec Loss 1.3216 LearningRate 0.000069 Epoch: 30 Global Step: 633920 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:33:20,073-Speed 2498.28 samples/sec Loss 1.3235 LearningRate 0.000069 Epoch: 30 Global Step: 633930 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:33:28,271-Speed 2498.51 samples/sec Loss 1.3094 LearningRate 0.000069 Epoch: 30 Global Step: 633940 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:33:36,470-Speed 2498.36 samples/sec Loss 1.2950 LearningRate 0.000069 Epoch: 30 Global Step: 633950 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:33:44,668-Speed 2498.47 samples/sec Loss 1.3085 LearningRate 0.000069 Epoch: 30 Global Step: 633960 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:33:52,815-Speed 2514.24 samples/sec Loss 1.2920 LearningRate 0.000069 Epoch: 30 Global Step: 633970 Fp16 Grad Scale: 8192 Required: 45 hours
Training: 2022-07-11 15:34:01,014-Speed 2498.18 samples/sec Loss 1.3409 LearningRate 0.000069 Epoch: 30 Global Step: 633980 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:34:09,215-Speed 2497.68 samples/sec Loss 1.3046 LearningRate 0.000069 Epoch: 30 Global Step: 633990 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:34:17,414-Speed 2498.17 samples/sec Loss 1.3859 LearningRate 0.000069 Epoch: 30 Global Step: 634000 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:34:25,612-Speed 2498.71 samples/sec Loss 1.2862 LearningRate 0.000069 Epoch: 30 Global Step: 634010 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:34:33,814-Speed 2497.24 samples/sec Loss 1.3195 LearningRate 0.000069 Epoch: 30 Global Step: 634020 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:34:41,974-Speed 2510.52 samples/sec Loss 1.3179 LearningRate 0.000069 Epoch: 30 Global Step: 634030 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:34:50,172-Speed 2498.43 samples/sec Loss 1.3152 LearningRate 0.000069 Epoch: 30 Global Step: 634040 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:34:58,375-Speed 2497.13 samples/sec Loss 1.3200 LearningRate 0.000069 Epoch: 30 Global Step: 634050 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:35:06,575-Speed 2498.16 samples/sec Loss 1.3440 LearningRate 0.000069 Epoch: 30 Global Step: 634060 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:35:14,772-Speed 2499.16 samples/sec Loss 1.3145 LearningRate 0.000069 Epoch: 30 Global Step: 634070 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:35:22,978-Speed 2496.07 samples/sec Loss 1.3398 LearningRate 0.000069 Epoch: 30 Global Step: 634080 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:35:31,128-Speed 2513.36 samples/sec Loss 1.3246 LearningRate 0.000069 Epoch: 30 Global Step: 634090 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:35:39,327-Speed 2498.19 samples/sec Loss 1.2982 LearningRate 0.000069 Epoch: 30 Global Step: 634100 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:35:47,529-Speed 2497.53 samples/sec Loss 1.3148 LearningRate 0.000069 Epoch: 30 Global Step: 634110 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:35:55,727-Speed 2498.48 samples/sec Loss 1.3219 LearningRate 0.000069 Epoch: 30 Global Step: 634120 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:36:03,927-Speed 2497.98 samples/sec Loss 1.3052 LearningRate 0.000069 Epoch: 30 Global Step: 634130 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:36:12,129-Speed 2497.72 samples/sec Loss 1.3454 LearningRate 0.000069 Epoch: 30 Global Step: 634140 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:36:20,274-Speed 2514.73 samples/sec Loss 1.3300 LearningRate 0.000069 Epoch: 30 Global Step: 634150 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:36:28,507-Speed 2487.93 samples/sec Loss 1.3210 LearningRate 0.000068 Epoch: 30 Global Step: 634160 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:36:36,712-Speed 2496.60 samples/sec Loss 1.3375 LearningRate 0.000068 Epoch: 30 Global Step: 634170 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:36:44,913-Speed 2497.67 samples/sec Loss 1.2841 LearningRate 0.000068 Epoch: 30 Global Step: 634180 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:36:53,114-Speed 2497.59 samples/sec Loss 1.3119 LearningRate 0.000068 Epoch: 30 Global Step: 634190 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:37:01,314-Speed 2498.11 samples/sec Loss 1.3113 LearningRate 0.000068 Epoch: 30 Global Step: 634200 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:37:09,470-Speed 2511.60 samples/sec Loss 1.3337 LearningRate 0.000068 Epoch: 30 Global Step: 634210 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:37:17,669-Speed 2498.12 samples/sec Loss 1.3350 LearningRate 0.000068 Epoch: 30 Global Step: 634220 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:37:25,869-Speed 2498.47 samples/sec Loss 1.3235 LearningRate 0.000068 Epoch: 30 Global Step: 634230 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:37:34,068-Speed 2498.16 samples/sec Loss 1.3030 LearningRate 0.000068 Epoch: 30 Global Step: 634240 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:37:42,299-Speed 2488.67 samples/sec Loss 1.3309 LearningRate 0.000068 Epoch: 30 Global Step: 634250 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:37:50,497-Speed 2498.44 samples/sec Loss 1.3370 LearningRate 0.000068 Epoch: 30 Global Step: 634260 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:37:58,645-Speed 2514.07 samples/sec Loss 1.3374 LearningRate 0.000068 Epoch: 30 Global Step: 634270 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:38:06,857-Speed 2494.13 samples/sec Loss 1.3054 LearningRate 0.000068 Epoch: 30 Global Step: 634280 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:38:15,057-Speed 2497.91 samples/sec Loss 1.3035 LearningRate 0.000068 Epoch: 30 Global Step: 634290 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:38:23,258-Speed 2497.87 samples/sec Loss 1.3299 LearningRate 0.000068 Epoch: 30 Global Step: 634300 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:38:31,472-Speed 2493.65 samples/sec Loss 1.3199 LearningRate 0.000068 Epoch: 30 Global Step: 634310 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:38:39,673-Speed 2497.35 samples/sec Loss 1.3309 LearningRate 0.000068 Epoch: 30 Global Step: 634320 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:38:47,827-Speed 2512.24 samples/sec Loss 1.3019 LearningRate 0.000068 Epoch: 30 Global Step: 634330 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:38:56,033-Speed 2496.14 samples/sec Loss 1.3093 LearningRate 0.000068 Epoch: 30 Global Step: 634340 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:39:04,245-Speed 2494.18 samples/sec Loss 1.3237 LearningRate 0.000068 Epoch: 30 Global Step: 634350 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:39:12,445-Speed 2497.84 samples/sec Loss 1.3289 LearningRate 0.000068 Epoch: 30 Global Step: 634360 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:39:20,648-Speed 2497.09 samples/sec Loss 1.3118 LearningRate 0.000068 Epoch: 30 Global Step: 634370 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:39:28,856-Speed 2495.47 samples/sec Loss 1.2998 LearningRate 0.000068 Epoch: 30 Global Step: 634380 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:39:37,001-Speed 2515.08 samples/sec Loss 1.3451 LearningRate 0.000068 Epoch: 30 Global Step: 634390 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:39:45,200-Speed 2498.14 samples/sec Loss 1.2982 LearningRate 0.000068 Epoch: 30 Global Step: 634400 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:39:53,398-Speed 2498.60 samples/sec Loss 1.3201 LearningRate 0.000068 Epoch: 30 Global Step: 634410 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:40:01,602-Speed 2496.84 samples/sec Loss 1.3416 LearningRate 0.000068 Epoch: 30 Global Step: 634420 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:40:09,806-Speed 2496.92 samples/sec Loss 1.3096 LearningRate 0.000068 Epoch: 30 Global Step: 634430 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:40:18,009-Speed 2496.85 samples/sec Loss 1.3354 LearningRate 0.000068 Epoch: 30 Global Step: 634440 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:40:26,158-Speed 2513.78 samples/sec Loss 1.3307 LearningRate 0.000068 Epoch: 30 Global Step: 634450 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:40:34,362-Speed 2497.44 samples/sec Loss 1.3133 LearningRate 0.000068 Epoch: 30 Global Step: 634460 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:40:42,566-Speed 2496.78 samples/sec Loss 1.2932 LearningRate 0.000068 Epoch: 30 Global Step: 634470 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:40:50,782-Speed 2493.54 samples/sec Loss 1.3207 LearningRate 0.000068 Epoch: 30 Global Step: 634480 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:40:58,984-Speed 2497.45 samples/sec Loss 1.3229 LearningRate 0.000068 Epoch: 30 Global Step: 634490 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:41:07,186-Speed 2497.44 samples/sec Loss 1.3184 LearningRate 0.000068 Epoch: 30 Global Step: 634500 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:41:15,332-Speed 2514.21 samples/sec Loss 1.3004 LearningRate 0.000068 Epoch: 30 Global Step: 634510 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:41:23,535-Speed 2497.15 samples/sec Loss 1.3112 LearningRate 0.000068 Epoch: 30 Global Step: 634520 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:41:31,741-Speed 2495.86 samples/sec Loss 1.3202 LearningRate 0.000068 Epoch: 30 Global Step: 634530 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:41:39,952-Speed 2494.95 samples/sec Loss 1.3357 LearningRate 0.000068 Epoch: 30 Global Step: 634540 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:41:48,150-Speed 2498.33 samples/sec Loss 1.3325 LearningRate 0.000068 Epoch: 30 Global Step: 634550 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:41:56,355-Speed 2496.50 samples/sec Loss 1.3104 LearningRate 0.000068 Epoch: 30 Global Step: 634560 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:42:04,502-Speed 2514.80 samples/sec Loss 1.2949 LearningRate 0.000068 Epoch: 30 Global Step: 634570 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:42:12,699-Speed 2498.69 samples/sec Loss 1.2923 LearningRate 0.000068 Epoch: 30 Global Step: 634580 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11 15:42:20,900-Speed 2497.66 samples/sec Loss 1.3050 LearningRate 0.000068 Epoch: 30 Global Step: 634590 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-07-11
15:42:29,100-Speed 2498.18 samples/sec Loss 1.3073 LearningRate 0.000068 Epoch: 30 Global Step: 634600 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:42:37,299-Speed 2498.16 samples/sec Loss 1.3178 LearningRate 0.000068 Epoch: 30 Global Step: 634610 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:42:45,495-Speed 2499.19 samples/sec Loss 1.2959 LearningRate 0.000068 Epoch: 30 Global Step: 634620 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:42:53,643-Speed 2514.13 samples/sec Loss 1.3206 LearningRate 0.000068 Epoch: 30 Global Step: 634630 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:43:01,843-Speed 2497.80 samples/sec Loss 1.3204 LearningRate 0.000068 Epoch: 30 Global Step: 634640 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:43:10,045-Speed 2497.47 samples/sec Loss 1.2915 LearningRate 0.000068 Epoch: 30 Global Step: 634650 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:43:18,249-Speed 2496.75 samples/sec Loss 1.2984 LearningRate 0.000068 Epoch: 30 Global Step: 634660 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:43:26,449-Speed 2497.95 samples/sec Loss 1.3028 LearningRate 0.000068 Epoch: 30 Global Step: 634670 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:43:34,650-Speed 2498.00 samples/sec Loss 1.3012 LearningRate 0.000068 Epoch: 30 Global Step: 634680 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:43:42,798-Speed 2513.91 samples/sec Loss 1.2911 LearningRate 0.000068 Epoch: 30 Global Step: 634690 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:43:51,001-Speed 2496.96 samples/sec Loss 1.2964 LearningRate 0.000068 Epoch: 30 Global Step: 634700 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:43:59,202-Speed 2497.87 samples/sec Loss 1.3375 LearningRate 0.000068 Epoch: 30 Global Step: 634710 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 
15:44:07,404-Speed 2497.25 samples/sec Loss 1.3034 LearningRate 0.000068 Epoch: 30 Global Step: 634720 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:44:15,604-Speed 2498.10 samples/sec Loss 1.2957 LearningRate 0.000068 Epoch: 30 Global Step: 634730 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:44:23,818-Speed 2493.50 samples/sec Loss 1.3032 LearningRate 0.000068 Epoch: 30 Global Step: 634740 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:44:31,963-Speed 2514.97 samples/sec Loss 1.3492 LearningRate 0.000068 Epoch: 30 Global Step: 634750 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:44:40,161-Speed 2498.32 samples/sec Loss 1.2871 LearningRate 0.000068 Epoch: 30 Global Step: 634760 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:44:48,361-Speed 2498.26 samples/sec Loss 1.3385 LearningRate 0.000068 Epoch: 30 Global Step: 634770 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:44:56,561-Speed 2497.99 samples/sec Loss 1.3480 LearningRate 0.000068 Epoch: 30 Global Step: 634780 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:45:04,765-Speed 2496.89 samples/sec Loss 1.3113 LearningRate 0.000068 Epoch: 30 Global Step: 634790 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:45:12,968-Speed 2496.81 samples/sec Loss 1.2980 LearningRate 0.000068 Epoch: 30 Global Step: 634800 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:45:21,112-Speed 2515.15 samples/sec Loss 1.3026 LearningRate 0.000068 Epoch: 30 Global Step: 634810 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:45:29,318-Speed 2496.26 samples/sec Loss 1.3193 LearningRate 0.000068 Epoch: 30 Global Step: 634820 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:45:37,514-Speed 2499.47 samples/sec Loss 1.3220 LearningRate 0.000068 Epoch: 30 Global Step: 634830 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 
15:45:45,720-Speed 2496.27 samples/sec Loss 1.3284 LearningRate 0.000068 Epoch: 30 Global Step: 634840 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:45:53,919-Speed 2498.32 samples/sec Loss 1.3539 LearningRate 0.000068 Epoch: 30 Global Step: 634850 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:46:02,120-Speed 2497.66 samples/sec Loss 1.3034 LearningRate 0.000068 Epoch: 30 Global Step: 634860 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:46:10,272-Speed 2512.56 samples/sec Loss 1.3189 LearningRate 0.000068 Epoch: 30 Global Step: 634870 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:46:18,471-Speed 2498.44 samples/sec Loss 1.3556 LearningRate 0.000068 Epoch: 30 Global Step: 634880 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:46:26,679-Speed 2495.73 samples/sec Loss 1.2904 LearningRate 0.000068 Epoch: 30 Global Step: 634890 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:46:34,880-Speed 2497.52 samples/sec Loss 1.3299 LearningRate 0.000068 Epoch: 30 Global Step: 634900 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:46:43,083-Speed 2497.14 samples/sec Loss 1.3247 LearningRate 0.000068 Epoch: 30 Global Step: 634910 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:46:51,282-Speed 2498.38 samples/sec Loss 1.3530 LearningRate 0.000068 Epoch: 30 Global Step: 634920 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:46:59,426-Speed 2514.98 samples/sec Loss 1.3431 LearningRate 0.000068 Epoch: 30 Global Step: 634930 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:47:07,624-Speed 2498.62 samples/sec Loss 1.3043 LearningRate 0.000068 Epoch: 30 Global Step: 634940 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:47:15,826-Speed 2497.51 samples/sec Loss 1.3221 LearningRate 0.000068 Epoch: 30 Global Step: 634950 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 
15:47:24,026-Speed 2497.81 samples/sec Loss 1.3361 LearningRate 0.000068 Epoch: 30 Global Step: 634960 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:47:32,227-Speed 2497.78 samples/sec Loss 1.3371 LearningRate 0.000068 Epoch: 30 Global Step: 634970 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:47:40,447-Speed 2492.15 samples/sec Loss 1.3053 LearningRate 0.000068 Epoch: 30 Global Step: 634980 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:47:48,595-Speed 2513.76 samples/sec Loss 1.3402 LearningRate 0.000068 Epoch: 30 Global Step: 634990 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:47:56,807-Speed 2494.21 samples/sec Loss 1.3194 LearningRate 0.000068 Epoch: 30 Global Step: 635000 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:48:05,006-Speed 2498.22 samples/sec Loss 1.3506 LearningRate 0.000068 Epoch: 30 Global Step: 635010 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:48:13,220-Speed 2493.67 samples/sec Loss 1.3299 LearningRate 0.000068 Epoch: 30 Global Step: 635020 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:48:21,433-Speed 2494.07 samples/sec Loss 1.3007 LearningRate 0.000068 Epoch: 30 Global Step: 635030 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:48:29,632-Speed 2498.24 samples/sec Loss 1.3535 LearningRate 0.000068 Epoch: 30 Global Step: 635040 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:48:37,777-Speed 2514.84 samples/sec Loss 1.3068 LearningRate 0.000068 Epoch: 30 Global Step: 635050 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:48:45,978-Speed 2497.83 samples/sec Loss 1.3019 LearningRate 0.000068 Epoch: 30 Global Step: 635060 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:48:54,176-Speed 2498.54 samples/sec Loss 1.3333 LearningRate 0.000068 Epoch: 30 Global Step: 635070 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 
15:49:02,376-Speed 2498.10 samples/sec Loss 1.3358 LearningRate 0.000068 Epoch: 30 Global Step: 635080 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:49:10,578-Speed 2497.44 samples/sec Loss 1.3037 LearningRate 0.000068 Epoch: 30 Global Step: 635090 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:49:18,781-Speed 2497.17 samples/sec Loss 1.3255 LearningRate 0.000068 Epoch: 30 Global Step: 635100 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:49:26,931-Speed 2513.36 samples/sec Loss 1.2948 LearningRate 0.000068 Epoch: 30 Global Step: 635110 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:49:35,145-Speed 2493.91 samples/sec Loss 1.3127 LearningRate 0.000068 Epoch: 30 Global Step: 635120 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:49:43,349-Speed 2496.68 samples/sec Loss 1.2953 LearningRate 0.000068 Epoch: 30 Global Step: 635130 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:49:51,551-Speed 2497.38 samples/sec Loss 1.3048 LearningRate 0.000068 Epoch: 30 Global Step: 635140 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:49:59,752-Speed 2497.90 samples/sec Loss 1.2746 LearningRate 0.000068 Epoch: 30 Global Step: 635150 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-07-11 15:50:07,953-Speed 2497.37 samples/sec Loss 1.2612 LearningRate 0.000068 Epoch: 30 Global Step: 635160 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 15:50:16,099-Speed 2514.59 samples/sec Loss 1.3187 LearningRate 0.000068 Epoch: 30 Global Step: 635170 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 15:50:24,296-Speed 2499.01 samples/sec Loss 1.3078 LearningRate 0.000068 Epoch: 30 Global Step: 635180 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:50:32,496-Speed 2498.09 samples/sec Loss 1.2970 LearningRate 0.000068 Epoch: 30 Global Step: 635190 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
15:50:40,695-Speed 2498.02 samples/sec Loss 1.3032 LearningRate 0.000068 Epoch: 30 Global Step: 635200 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:50:48,899-Speed 2496.82 samples/sec Loss 1.3121 LearningRate 0.000068 Epoch: 30 Global Step: 635210 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:50:57,119-Speed 2492.04 samples/sec Loss 1.2950 LearningRate 0.000068 Epoch: 30 Global Step: 635220 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:51:05,259-Speed 2516.28 samples/sec Loss 1.3190 LearningRate 0.000068 Epoch: 30 Global Step: 635230 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:51:13,461-Speed 2497.46 samples/sec Loss 1.3395 LearningRate 0.000068 Epoch: 30 Global Step: 635240 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:51:21,658-Speed 2498.48 samples/sec Loss 1.3106 LearningRate 0.000068 Epoch: 30 Global Step: 635250 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:51:29,861-Speed 2497.03 samples/sec Loss 1.3182 LearningRate 0.000068 Epoch: 30 Global Step: 635260 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:51:38,062-Speed 2497.67 samples/sec Loss 1.3545 LearningRate 0.000068 Epoch: 30 Global Step: 635270 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:51:46,260-Speed 2498.67 samples/sec Loss 1.2970 LearningRate 0.000068 Epoch: 30 Global Step: 635280 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:51:54,407-Speed 2514.24 samples/sec Loss 1.3368 LearningRate 0.000068 Epoch: 30 Global Step: 635290 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:52:02,606-Speed 2498.27 samples/sec Loss 1.3123 LearningRate 0.000068 Epoch: 30 Global Step: 635300 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:52:10,821-Speed 2493.29 samples/sec Loss 1.3132 LearningRate 0.000068 Epoch: 30 Global Step: 635310 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
15:52:19,020-Speed 2498.21 samples/sec Loss 1.3219 LearningRate 0.000068 Epoch: 30 Global Step: 635320 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:52:27,222-Speed 2497.38 samples/sec Loss 1.2999 LearningRate 0.000068 Epoch: 30 Global Step: 635330 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:52:35,434-Speed 2494.25 samples/sec Loss 1.2983 LearningRate 0.000068 Epoch: 30 Global Step: 635340 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:52:43,582-Speed 2513.83 samples/sec Loss 1.3341 LearningRate 0.000068 Epoch: 30 Global Step: 635350 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:52:51,779-Speed 2498.94 samples/sec Loss 1.3348 LearningRate 0.000068 Epoch: 30 Global Step: 635360 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:53:00,000-Speed 2491.75 samples/sec Loss 1.3037 LearningRate 0.000068 Epoch: 30 Global Step: 635370 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:53:08,200-Speed 2497.78 samples/sec Loss 1.3281 LearningRate 0.000068 Epoch: 30 Global Step: 635380 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:53:16,404-Speed 2497.03 samples/sec Loss 1.3563 LearningRate 0.000068 Epoch: 30 Global Step: 635390 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:53:24,604-Speed 2497.88 samples/sec Loss 1.3067 LearningRate 0.000068 Epoch: 30 Global Step: 635400 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:53:32,767-Speed 2509.34 samples/sec Loss 1.3382 LearningRate 0.000068 Epoch: 30 Global Step: 635410 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:53:40,966-Speed 2498.17 samples/sec Loss 1.3321 LearningRate 0.000068 Epoch: 30 Global Step: 635420 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:53:49,170-Speed 2496.84 samples/sec Loss 1.2971 LearningRate 0.000068 Epoch: 30 Global Step: 635430 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
15:53:57,369-Speed 2498.31 samples/sec Loss 1.3026 LearningRate 0.000068 Epoch: 30 Global Step: 635440 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:54:05,569-Speed 2497.89 samples/sec Loss 1.3263 LearningRate 0.000068 Epoch: 30 Global Step: 635450 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:54:13,768-Speed 2498.08 samples/sec Loss 1.2996 LearningRate 0.000068 Epoch: 30 Global Step: 635460 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:54:21,915-Speed 2514.12 samples/sec Loss 1.2993 LearningRate 0.000068 Epoch: 30 Global Step: 635470 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:54:30,129-Speed 2493.58 samples/sec Loss 1.3258 LearningRate 0.000068 Epoch: 30 Global Step: 635480 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:54:38,330-Speed 2497.77 samples/sec Loss 1.3393 LearningRate 0.000068 Epoch: 30 Global Step: 635490 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:54:46,534-Speed 2496.99 samples/sec Loss 1.2976 LearningRate 0.000068 Epoch: 30 Global Step: 635500 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:54:54,739-Speed 2496.40 samples/sec Loss 1.3000 LearningRate 0.000068 Epoch: 30 Global Step: 635510 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:55:02,942-Speed 2496.95 samples/sec Loss 1.3321 LearningRate 0.000068 Epoch: 30 Global Step: 635520 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:55:11,094-Speed 2512.54 samples/sec Loss 1.3275 LearningRate 0.000068 Epoch: 30 Global Step: 635530 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:55:19,295-Speed 2497.65 samples/sec Loss 1.3285 LearningRate 0.000068 Epoch: 30 Global Step: 635540 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:55:27,501-Speed 2496.02 samples/sec Loss 1.3221 LearningRate 0.000068 Epoch: 30 Global Step: 635550 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
15:55:35,704-Speed 2497.04 samples/sec Loss 1.2980 LearningRate 0.000068 Epoch: 30 Global Step: 635560 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:55:43,913-Speed 2495.13 samples/sec Loss 1.2778 LearningRate 0.000068 Epoch: 30 Global Step: 635570 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:55:52,114-Speed 2498.07 samples/sec Loss 1.3193 LearningRate 0.000068 Epoch: 30 Global Step: 635580 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:56:00,263-Speed 2513.54 samples/sec Loss 1.3007 LearningRate 0.000067 Epoch: 30 Global Step: 635590 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:56:08,468-Speed 2496.65 samples/sec Loss 1.3088 LearningRate 0.000067 Epoch: 30 Global Step: 635600 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:56:16,673-Speed 2496.52 samples/sec Loss 1.2952 LearningRate 0.000067 Epoch: 30 Global Step: 635610 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:56:24,879-Speed 2496.16 samples/sec Loss 1.3078 LearningRate 0.000067 Epoch: 30 Global Step: 635620 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:56:33,092-Speed 2493.71 samples/sec Loss 1.3358 LearningRate 0.000067 Epoch: 30 Global Step: 635630 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:56:41,297-Speed 2496.52 samples/sec Loss 1.3182 LearningRate 0.000067 Epoch: 30 Global Step: 635640 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:56:49,448-Speed 2513.10 samples/sec Loss 1.3153 LearningRate 0.000067 Epoch: 30 Global Step: 635650 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:56:57,671-Speed 2490.88 samples/sec Loss 1.2821 LearningRate 0.000067 Epoch: 30 Global Step: 635660 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:57:05,872-Speed 2497.69 samples/sec Loss 1.3216 LearningRate 0.000067 Epoch: 30 Global Step: 635670 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
15:57:14,075-Speed 2496.98 samples/sec Loss 1.2960 LearningRate 0.000067 Epoch: 30 Global Step: 635680 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:57:22,280-Speed 2496.24 samples/sec Loss 1.3385 LearningRate 0.000067 Epoch: 30 Global Step: 635690 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:57:30,489-Speed 2495.25 samples/sec Loss 1.3104 LearningRate 0.000067 Epoch: 30 Global Step: 635700 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:57:38,653-Speed 2509.22 samples/sec Loss 1.3587 LearningRate 0.000067 Epoch: 30 Global Step: 635710 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:57:46,856-Speed 2497.08 samples/sec Loss 1.3256 LearningRate 0.000067 Epoch: 30 Global Step: 635720 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:57:55,060-Speed 2496.56 samples/sec Loss 1.3084 LearningRate 0.000067 Epoch: 30 Global Step: 635730 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:58:03,266-Speed 2496.41 samples/sec Loss 1.3070 LearningRate 0.000067 Epoch: 30 Global Step: 635740 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:58:11,472-Speed 2496.02 samples/sec Loss 1.3437 LearningRate 0.000067 Epoch: 30 Global Step: 635750 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:58:19,676-Speed 2496.77 samples/sec Loss 1.3130 LearningRate 0.000067 Epoch: 30 Global Step: 635760 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:58:27,826-Speed 2513.00 samples/sec Loss 1.3106 LearningRate 0.000067 Epoch: 30 Global Step: 635770 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:58:36,040-Speed 2494.12 samples/sec Loss 1.3082 LearningRate 0.000067 Epoch: 30 Global Step: 635780 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:58:44,248-Speed 2495.37 samples/sec Loss 1.3514 LearningRate 0.000067 Epoch: 30 Global Step: 635790 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
15:58:52,469-Speed 2491.61 samples/sec Loss 1.3148 LearningRate 0.000067 Epoch: 30 Global Step: 635800 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:59:00,678-Speed 2495.35 samples/sec Loss 1.2952 LearningRate 0.000067 Epoch: 30 Global Step: 635810 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:59:08,889-Speed 2494.64 samples/sec Loss 1.3252 LearningRate 0.000067 Epoch: 30 Global Step: 635820 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:59:17,041-Speed 2512.60 samples/sec Loss 1.2962 LearningRate 0.000067 Epoch: 30 Global Step: 635830 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:59:25,248-Speed 2495.98 samples/sec Loss 1.2888 LearningRate 0.000067 Epoch: 30 Global Step: 635840 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:59:33,456-Speed 2495.60 samples/sec Loss 1.3069 LearningRate 0.000067 Epoch: 30 Global Step: 635850 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:59:41,658-Speed 2497.21 samples/sec Loss 1.2523 LearningRate 0.000067 Epoch: 30 Global Step: 635860 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:59:49,867-Speed 2495.16 samples/sec Loss 1.2880 LearningRate 0.000067 Epoch: 30 Global Step: 635870 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 15:59:58,069-Speed 2497.31 samples/sec Loss 1.3346 LearningRate 0.000067 Epoch: 30 Global Step: 635880 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:00:06,222-Speed 2512.36 samples/sec Loss 1.3147 LearningRate 0.000067 Epoch: 30 Global Step: 635890 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:00:14,429-Speed 2495.95 samples/sec Loss 1.3567 LearningRate 0.000067 Epoch: 30 Global Step: 635900 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:00:22,631-Speed 2497.37 samples/sec Loss 1.3232 LearningRate 0.000067 Epoch: 30 Global Step: 635910 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
16:00:30,791-Speed 2510.27 samples/sec Loss 1.3123 LearningRate 0.000067 Epoch: 30 Global Step: 635920 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:00:38,995-Speed 2496.76 samples/sec Loss 1.2976 LearningRate 0.000067 Epoch: 30 Global Step: 635930 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:00:47,195-Speed 2497.75 samples/sec Loss 1.3174 LearningRate 0.000067 Epoch: 30 Global Step: 635940 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:00:55,347-Speed 2512.70 samples/sec Loss 1.2997 LearningRate 0.000067 Epoch: 30 Global Step: 635950 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:01:03,553-Speed 2496.33 samples/sec Loss 1.3445 LearningRate 0.000067 Epoch: 30 Global Step: 635960 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:01:11,753-Speed 2497.96 samples/sec Loss 1.3048 LearningRate 0.000067 Epoch: 30 Global Step: 635970 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:01:19,964-Speed 2494.42 samples/sec Loss 1.2989 LearningRate 0.000067 Epoch: 30 Global Step: 635980 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:01:28,169-Speed 2496.66 samples/sec Loss 1.3079 LearningRate 0.000067 Epoch: 30 Global Step: 635990 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:01:36,373-Speed 2496.66 samples/sec Loss 1.3019 LearningRate 0.000067 Epoch: 30 Global Step: 636000 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:01:44,525-Speed 2512.68 samples/sec Loss 1.3258 LearningRate 0.000067 Epoch: 30 Global Step: 636010 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:01:52,729-Speed 2496.77 samples/sec Loss 1.3234 LearningRate 0.000067 Epoch: 30 Global Step: 636020 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:02:00,929-Speed 2497.98 samples/sec Loss 1.3073 LearningRate 0.000067 Epoch: 30 Global Step: 636030 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:02:09,131-Speed 2497.25 samples/sec Loss 1.3095 LearningRate 0.000067 Epoch: 30 Global Step: 636040 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:02:17,333-Speed 2497.60 samples/sec Loss 1.3474 LearningRate 0.000067 Epoch: 30 Global Step: 636050 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:02:25,536-Speed 2496.76 samples/sec Loss 1.3032 LearningRate 0.000067 Epoch: 30 Global Step: 636060 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:02:33,685-Speed 2513.81 samples/sec Loss 1.2983 LearningRate 0.000067 Epoch: 30 Global Step: 636070 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:02:41,885-Speed 2497.81 samples/sec Loss 1.3159 LearningRate 0.000067 Epoch: 30 Global Step: 636080 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:02:50,090-Speed 2496.70 samples/sec Loss 1.3107 LearningRate 0.000067 Epoch: 30 Global Step: 636090 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:02:58,292-Speed 2497.27 samples/sec Loss 1.3261 LearningRate 0.000067 Epoch: 30 Global Step: 636100 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:03:06,495-Speed 2496.96 samples/sec Loss 1.3497 LearningRate 0.000067 Epoch: 30 Global Step: 636110 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:03:14,696-Speed 2497.69 samples/sec Loss 1.3345 LearningRate 0.000067 Epoch: 30 Global Step: 636120 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:03:22,844-Speed 2514.06 samples/sec Loss 1.3154 LearningRate 0.000067 Epoch: 30 Global Step: 636130 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:03:31,059-Speed 2493.39 samples/sec Loss 1.2944 LearningRate 0.000067 Epoch: 30 Global Step: 636140 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:03:39,260-Speed 2497.65 samples/sec Loss 1.3121 LearningRate 0.000067 Epoch: 30 Global Step: 636150 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:03:47,462-Speed 2497.43 samples/sec Loss 1.3606 LearningRate 0.000067 Epoch: 30 Global Step: 636160 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:03:55,661-Speed 2498.26 samples/sec Loss 1.3126 LearningRate 0.000067 Epoch: 30 Global Step: 636170 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:04:03,863-Speed 2497.72 samples/sec Loss 1.3344 LearningRate 0.000067 Epoch: 30 Global Step: 636180 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:04:12,019-Speed 2511.19 samples/sec Loss 1.3254 LearningRate 0.000067 Epoch: 30 Global Step: 636190 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:04:20,233-Speed 2493.65 samples/sec Loss 1.3282 LearningRate 0.000067 Epoch: 30 Global Step: 636200 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:04:28,439-Speed 2496.13 samples/sec Loss 1.3390 LearningRate 0.000067 Epoch: 30 Global Step: 636210 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:04:36,648-Speed 2495.41 samples/sec Loss 1.3227 LearningRate 0.000067 Epoch: 30 Global Step: 636220 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:04:44,850-Speed 2497.50 samples/sec Loss 1.3399 LearningRate 0.000067 Epoch: 30 Global Step: 636230 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:04:53,053-Speed 2496.83 samples/sec Loss 1.3267 LearningRate 0.000067 Epoch: 30 Global Step: 636240 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:05:01,204-Speed 2513.04 samples/sec Loss 1.3016 LearningRate 0.000067 Epoch: 30 Global Step: 636250 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:05:09,409-Speed 2496.34 samples/sec Loss 1.3261 LearningRate 0.000067 Epoch: 30 Global Step: 636260 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:05:17,612-Speed 2497.50 samples/sec Loss 1.3328 LearningRate 0.000067 Epoch: 30 Global Step: 636270 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:05:25,814-Speed 2497.51 samples/sec Loss 1.3187 LearningRate 0.000067 Epoch: 30 Global Step: 636280 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:05:34,019-Speed 2496.42 samples/sec Loss 1.3043 LearningRate 0.000067 Epoch: 30 Global Step: 636290 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:05:42,221-Speed 2497.32 samples/sec Loss 1.3301 LearningRate 0.000067 Epoch: 30 Global Step: 636300 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:05:50,372-Speed 2513.09 samples/sec Loss 1.3013 LearningRate 0.000067 Epoch: 30 Global Step: 636310 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:05:58,576-Speed 2497.08 samples/sec Loss 1.3382 LearningRate 0.000067 Epoch: 30 Global Step: 636320 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:06:06,785-Speed 2495.12 samples/sec Loss 1.3018 LearningRate 0.000067 Epoch: 30 Global Step: 636330 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:06:14,987-Speed 2497.41 samples/sec Loss 1.3258 LearningRate 0.000067 Epoch: 30 Global Step: 636340 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:06:23,188-Speed 2498.09 samples/sec Loss 1.3038 LearningRate 0.000067 Epoch: 30 Global Step: 636350 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:06:31,385-Speed 2498.66 samples/sec Loss 1.3378 LearningRate 0.000067 Epoch: 30 Global Step: 636360 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:06:39,533-Speed 2513.82 samples/sec Loss 1.3136 LearningRate 0.000067 Epoch: 30 Global Step: 636370 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:06:47,738-Speed 2496.43 samples/sec Loss 1.3534 LearningRate 0.000067 Epoch: 30 Global Step: 636380 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:06:55,938-Speed 2498.13 samples/sec Loss 1.2945 LearningRate 0.000067 Epoch: 30 Global Step: 636390 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:07:04,137-Speed 2498.04 samples/sec Loss 1.3262 LearningRate 0.000067 Epoch: 30 Global Step: 636400 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:07:12,352-Speed 2493.65 samples/sec Loss 1.3460 LearningRate 0.000067 Epoch: 30 Global Step: 636410 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:07:20,556-Speed 2496.89 samples/sec Loss 1.3054 LearningRate 0.000067 Epoch: 30 Global Step: 636420 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:07:28,704-Speed 2514.16 samples/sec Loss 1.3009 LearningRate 0.000067 Epoch: 30 Global Step: 636430 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:07:36,904-Speed 2497.91 samples/sec Loss 1.3002 LearningRate 0.000067 Epoch: 30 Global Step: 636440 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:07:45,107-Speed 2496.85 samples/sec Loss 1.3266 LearningRate 0.000067 Epoch: 30 Global Step: 636450 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:07:53,305-Speed 2498.44 samples/sec Loss 1.3027 LearningRate 0.000067 Epoch: 30 Global Step: 636460 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:08:01,504-Speed 2498.44 samples/sec Loss 1.3184 LearningRate 0.000067 Epoch: 30 Global Step: 636470 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:08:09,719-Speed 2493.35 samples/sec Loss 1.3316 LearningRate 0.000067 Epoch: 30 Global Step: 636480 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:08:17,865-Speed 2514.66 samples/sec Loss 1.3362 LearningRate 0.000067 Epoch: 30 Global Step: 636490 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:08:26,064-Speed 2498.37 samples/sec Loss 1.3245 LearningRate 0.000067 Epoch: 30 Global Step: 636500 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:08:34,268-Speed 2496.64 samples/sec Loss 1.3231 LearningRate 0.000067 Epoch: 30 Global Step: 636510 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:08:42,468-Speed 2497.91 samples/sec Loss 1.3318 LearningRate 0.000067 Epoch: 30 Global Step: 636520 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:08:50,668-Speed 2497.71 samples/sec Loss 1.2803 LearningRate 0.000067 Epoch: 30 Global Step: 636530 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:08:58,867-Speed 2498.35 samples/sec Loss 1.3320 LearningRate 0.000067 Epoch: 30 Global Step: 636540 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:09:07,012-Speed 2514.99 samples/sec Loss 1.3137 LearningRate 0.000067 Epoch: 30 Global Step: 636550 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:09:15,216-Speed 2496.71 samples/sec Loss 1.3053 LearningRate 0.000067 Epoch: 30 Global Step: 636560 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:09:23,419-Speed 2497.01 samples/sec Loss 1.2956 LearningRate 0.000067 Epoch: 30 Global Step: 636570 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:09:31,619-Speed 2497.93 samples/sec Loss 1.3308 LearningRate 0.000067 Epoch: 30 Global Step: 636580 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:09:39,823-Speed 2496.66 samples/sec Loss 1.3197 LearningRate 0.000067 Epoch: 30 Global Step: 636590 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:09:48,027-Speed 2496.80 samples/sec Loss 1.3083 LearningRate 0.000067 Epoch: 30 Global Step: 636600 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:09:56,178-Speed 2512.89 samples/sec Loss 1.3242 LearningRate 0.000067 Epoch: 30 Global Step: 636610 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:10:04,380-Speed 2497.28 samples/sec Loss 1.2849 LearningRate 0.000067 Epoch: 30 Global Step: 636620 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:10:12,582-Speed 2497.71 samples/sec Loss 1.3161 LearningRate 0.000067 Epoch: 30 Global Step: 636630 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:10:20,788-Speed 2496.17 samples/sec Loss 1.3014 LearningRate 0.000067 Epoch: 30 Global Step: 636640 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:10:28,992-Speed 2496.52 samples/sec Loss 1.2980 LearningRate 0.000067 Epoch: 30 Global Step: 636650 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:10:37,196-Speed 2496.82 samples/sec Loss 1.3341 LearningRate 0.000067 Epoch: 30 Global Step: 636660 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:10:45,361-Speed 2508.68 samples/sec Loss 1.3411 LearningRate 0.000067 Epoch: 30 Global Step: 636670 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:10:53,567-Speed 2496.16 samples/sec Loss 1.3216 LearningRate 0.000067 Epoch: 30 Global Step: 636680 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:11:01,783-Speed 2493.36 samples/sec Loss 1.3178 LearningRate 0.000067 Epoch: 30 Global Step: 636690 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:11:09,989-Speed 2496.01 samples/sec Loss 1.3081 LearningRate 0.000067 Epoch: 30 Global Step: 636700 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:11:18,193-Speed 2496.82 samples/sec Loss 1.3212 LearningRate 0.000067 Epoch: 30 Global Step: 636710 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:11:26,402-Speed 2495.24 samples/sec Loss 1.3052 LearningRate 0.000067 Epoch: 30 Global Step: 636720 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:11:34,551-Speed 2513.62 samples/sec Loss 1.2935 LearningRate 0.000067 Epoch: 30 Global Step: 636730 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:11:42,757-Speed 2496.13 samples/sec Loss 1.3262 LearningRate 0.000067 Epoch: 30 Global Step: 636740 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:11:50,962-Speed 2496.33 samples/sec Loss 1.3093 LearningRate 0.000067 Epoch: 30 Global Step: 636750 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:11:59,168-Speed 2496.27 samples/sec Loss 1.3195 LearningRate 0.000067 Epoch: 30 Global Step: 636760 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:12:07,393-Speed 2490.52 samples/sec Loss 1.3248 LearningRate 0.000067 Epoch: 30 Global Step: 636770 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:12:15,597-Speed 2496.84 samples/sec Loss 1.2893 LearningRate 0.000067 Epoch: 30 Global Step: 636780 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:12:23,746-Speed 2513.43 samples/sec Loss 1.2712 LearningRate 0.000067 Epoch: 30 Global Step: 636790 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:12:31,947-Speed 2497.88 samples/sec Loss 1.3333 LearningRate 0.000067 Epoch: 30 Global Step: 636800 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:12:40,148-Speed 2497.55 samples/sec Loss 1.3088 LearningRate 0.000067 Epoch: 30 Global Step: 636810 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:12:48,361-Speed 2493.94 samples/sec Loss 1.3080 LearningRate 0.000067 Epoch: 30 Global Step: 636820 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:12:56,565-Speed 2496.70 samples/sec Loss 1.3241 LearningRate 0.000067 Epoch: 30 Global Step: 636830 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:13:04,771-Speed 2496.59 samples/sec Loss 1.2870 LearningRate 0.000067 Epoch: 30 Global Step: 636840 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:13:12,932-Speed 2509.68 samples/sec Loss 1.3067 LearningRate 0.000067 Epoch: 30 Global Step: 636850 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:13:21,137-Speed 2496.51 samples/sec Loss 1.3401 LearningRate 0.000067 Epoch: 30 Global Step: 636860 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:13:29,340-Speed 2496.77 samples/sec Loss 1.3310 LearningRate 0.000067 Epoch: 30 Global Step: 636870 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:13:37,540-Speed 2498.00 samples/sec Loss 1.3008 LearningRate 0.000067 Epoch: 30 Global Step: 636880 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:13:45,745-Speed 2496.49 samples/sec Loss 1.2633 LearningRate 0.000067 Epoch: 30 Global Step: 636890 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:13:53,948-Speed 2496.85 samples/sec Loss 1.3057 LearningRate 0.000067 Epoch: 30 Global Step: 636900 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:14:02,097-Speed 2513.67 samples/sec Loss 1.3475 LearningRate 0.000067 Epoch: 30 Global Step: 636910 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:14:10,300-Speed 2497.07 samples/sec Loss 1.3013 LearningRate 0.000067 Epoch: 30 Global Step: 636920 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:14:18,501-Speed 2497.73 samples/sec Loss 1.3476 LearningRate 0.000067 Epoch: 30 Global Step: 636930 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:14:26,704-Speed 2496.80 samples/sec Loss 1.2695 LearningRate 0.000067 Epoch: 30 Global Step: 636940 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:14:34,905-Speed 2497.90 samples/sec Loss 1.3199 LearningRate 0.000067 Epoch: 30 Global Step: 636950 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:14:43,108-Speed 2497.01 samples/sec Loss 1.3202 LearningRate 0.000067 Epoch: 30 Global Step: 636960 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:14:51,259-Speed 2513.03 samples/sec Loss 1.2953 LearningRate 0.000067 Epoch: 30 Global Step: 636970 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:14:59,466-Speed 2495.66 samples/sec Loss 1.3177 LearningRate 0.000067 Epoch: 30 Global Step: 636980 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:15:07,665-Speed 2498.20 samples/sec Loss 1.3323 LearningRate 0.000067 Epoch: 30 Global Step: 636990 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:15:15,875-Speed 2494.88 samples/sec Loss 1.3240 LearningRate 0.000067 Epoch: 30 Global Step: 637000 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:15:24,081-Speed 2496.00 samples/sec Loss 1.3006 LearningRate 0.000067 Epoch: 30 Global Step: 637010 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:15:32,282-Speed 2497.81 samples/sec Loss 1.3297 LearningRate 0.000067 Epoch: 30 Global Step: 637020 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:15:40,435-Speed 2512.45 samples/sec Loss 1.3364 LearningRate 0.000066 Epoch: 30 Global Step: 637030 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:15:48,633-Speed 2498.60 samples/sec Loss 1.3070 LearningRate 0.000066 Epoch: 30 Global Step: 637040 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:15:56,840-Speed 2495.78 samples/sec Loss 1.3255 LearningRate 0.000066 Epoch: 30 Global Step: 637050 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:16:05,041-Speed 2497.87 samples/sec Loss 1.3394 LearningRate 0.000066 Epoch: 30 Global Step: 637060 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:16:13,241-Speed 2498.23 samples/sec Loss 1.3325 LearningRate 0.000066 Epoch: 30 Global Step: 637070 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:16:21,440-Speed 2498.06 samples/sec Loss 1.3242 LearningRate 0.000066 Epoch: 30 Global Step: 637080 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:16:29,587-Speed 2514.27 samples/sec Loss 1.3340 LearningRate 0.000066 Epoch: 30 Global Step: 637090 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:16:37,787-Speed 2497.80 samples/sec Loss 1.3400 LearningRate 0.000066 Epoch: 30 Global Step: 637100 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:16:46,008-Speed 2491.69 samples/sec Loss 1.3179 LearningRate 0.000066 Epoch: 30 Global Step: 637110 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:16:54,209-Speed 2497.60 samples/sec Loss 1.2995 LearningRate 0.000066 Epoch: 30 Global Step: 637120 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:17:02,425-Speed 2493.11 samples/sec Loss 1.2952 LearningRate 0.000066 Epoch: 30 Global Step: 637130 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:17:10,627-Speed 2497.43 samples/sec Loss 1.3033 LearningRate 0.000066 Epoch: 30 Global Step: 637140 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:17:18,785-Speed 2511.00 samples/sec Loss 1.3245 LearningRate 0.000066 Epoch: 30 Global Step: 637150 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:17:26,993-Speed 2495.48 samples/sec Loss 1.2970 LearningRate 0.000066 Epoch: 30 Global Step: 637160 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:17:35,192-Speed 2498.09 samples/sec Loss 1.3469 LearningRate 0.000066 Epoch: 30 Global Step: 637170 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:17:43,393-Speed 2497.85 samples/sec Loss 1.2655 LearningRate 0.000066 Epoch: 30 Global Step: 637180 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:17:51,595-Speed 2497.29 samples/sec Loss 1.3353 LearningRate 0.000066 Epoch: 30 Global Step: 637190 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:17:59,795-Speed 2498.10 samples/sec Loss 1.3060 LearningRate 0.000066 Epoch: 30 Global Step: 637200 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:18:07,946-Speed 2513.04 samples/sec Loss 1.3046 LearningRate 0.000066 Epoch: 30 Global Step: 637210 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:18:16,147-Speed 2497.60 samples/sec Loss 1.3207 LearningRate 0.000066 Epoch: 30 Global Step: 637220 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:18:24,346-Speed 2498.12 samples/sec Loss 1.3282 LearningRate 0.000066 Epoch: 30 Global Step: 637230 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
16:18:32,549-Speed 2497.08 samples/sec Loss 1.3036 LearningRate 0.000066 Epoch: 30 Global Step: 637240 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:18:40,748-Speed 2498.32 samples/sec Loss 1.3064 LearningRate 0.000066 Epoch: 30 Global Step: 637250 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:18:48,956-Speed 2495.42 samples/sec Loss 1.3270 LearningRate 0.000066 Epoch: 30 Global Step: 637260 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:18:57,105-Speed 2513.62 samples/sec Loss 1.3213 LearningRate 0.000066 Epoch: 30 Global Step: 637270 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:19:05,309-Speed 2496.82 samples/sec Loss 1.2978 LearningRate 0.000066 Epoch: 30 Global Step: 637280 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:19:13,510-Speed 2497.94 samples/sec Loss 1.3039 LearningRate 0.000066 Epoch: 30 Global Step: 637290 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:19:21,712-Speed 2497.32 samples/sec Loss 1.2981 LearningRate 0.000066 Epoch: 30 Global Step: 637300 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:19:29,911-Speed 2498.04 samples/sec Loss 1.3031 LearningRate 0.000066 Epoch: 30 Global Step: 637310 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:19:38,114-Speed 2497.05 samples/sec Loss 1.3290 LearningRate 0.000066 Epoch: 30 Global Step: 637320 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:19:46,267-Speed 2513.58 samples/sec Loss 1.2964 LearningRate 0.000066 Epoch: 30 Global Step: 637330 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:19:54,468-Speed 2497.92 samples/sec Loss 1.3150 LearningRate 0.000066 Epoch: 30 Global Step: 637340 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:20:02,680-Speed 2493.99 samples/sec Loss 1.3164 LearningRate 0.000066 Epoch: 30 Global Step: 637350 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
16:20:10,880-Speed 2498.02 samples/sec Loss 1.3153 LearningRate 0.000066 Epoch: 30 Global Step: 637360 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:20:19,085-Speed 2496.35 samples/sec Loss 1.3478 LearningRate 0.000066 Epoch: 30 Global Step: 637370 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:20:27,284-Speed 2498.37 samples/sec Loss 1.3098 LearningRate 0.000066 Epoch: 30 Global Step: 637380 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:20:35,434-Speed 2513.22 samples/sec Loss 1.3185 LearningRate 0.000066 Epoch: 30 Global Step: 637390 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:20:43,635-Speed 2497.96 samples/sec Loss 1.3093 LearningRate 0.000066 Epoch: 30 Global Step: 637400 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:20:51,834-Speed 2498.66 samples/sec Loss 1.3156 LearningRate 0.000066 Epoch: 30 Global Step: 637410 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:21:00,065-Speed 2488.46 samples/sec Loss 1.3114 LearningRate 0.000066 Epoch: 30 Global Step: 637420 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:21:08,266-Speed 2497.62 samples/sec Loss 1.3322 LearningRate 0.000066 Epoch: 30 Global Step: 637430 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:21:16,463-Speed 2498.73 samples/sec Loss 1.3293 LearningRate 0.000066 Epoch: 30 Global Step: 637440 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:21:24,615-Speed 2512.69 samples/sec Loss 1.2919 LearningRate 0.000066 Epoch: 30 Global Step: 637450 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:21:32,814-Speed 2498.23 samples/sec Loss 1.3295 LearningRate 0.000066 Epoch: 30 Global Step: 637460 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:21:41,020-Speed 2496.22 samples/sec Loss 1.3320 LearningRate 0.000066 Epoch: 30 Global Step: 637470 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
16:21:49,226-Speed 2496.21 samples/sec Loss 1.3592 LearningRate 0.000066 Epoch: 30 Global Step: 637480 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:21:57,427-Speed 2497.36 samples/sec Loss 1.3248 LearningRate 0.000066 Epoch: 30 Global Step: 637490 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:22:05,631-Speed 2496.70 samples/sec Loss 1.3233 LearningRate 0.000066 Epoch: 30 Global Step: 637500 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:22:13,781-Speed 2513.40 samples/sec Loss 1.3066 LearningRate 0.000066 Epoch: 30 Global Step: 637510 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:22:21,981-Speed 2498.00 samples/sec Loss 1.2746 LearningRate 0.000066 Epoch: 30 Global Step: 637520 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:22:30,180-Speed 2498.44 samples/sec Loss 1.3037 LearningRate 0.000066 Epoch: 30 Global Step: 637530 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:22:38,389-Speed 2495.22 samples/sec Loss 1.3086 LearningRate 0.000066 Epoch: 30 Global Step: 637540 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:22:46,602-Speed 2494.25 samples/sec Loss 1.3020 LearningRate 0.000066 Epoch: 30 Global Step: 637550 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:22:54,804-Speed 2497.26 samples/sec Loss 1.3071 LearningRate 0.000066 Epoch: 30 Global Step: 637560 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:23:02,960-Speed 2511.37 samples/sec Loss 1.3241 LearningRate 0.000066 Epoch: 30 Global Step: 637570 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:23:11,170-Speed 2495.21 samples/sec Loss 1.3486 LearningRate 0.000066 Epoch: 30 Global Step: 637580 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:23:19,413-Speed 2498.67 samples/sec Loss 1.3314 LearningRate 0.000066 Epoch: 30 Global Step: 637590 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
16:23:27,652-Speed 2499.47 samples/sec Loss 1.3278 LearningRate 0.000066 Epoch: 30 Global Step: 637600 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:23:35,854-Speed 2497.56 samples/sec Loss 1.3079 LearningRate 0.000066 Epoch: 30 Global Step: 637610 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:23:44,097-Speed 2498.09 samples/sec Loss 1.3680 LearningRate 0.000066 Epoch: 30 Global Step: 637620 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:23:56,547-Speed 1655.91 samples/sec Loss 1.3111 LearningRate 0.000066 Epoch: 30 Global Step: 637630 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:24:05,016-Speed 2501.03 samples/sec Loss 1.3327 LearningRate 0.000066 Epoch: 30 Global Step: 637640 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:24:13,217-Speed 2497.54 samples/sec Loss 1.3236 LearningRate 0.000066 Epoch: 30 Global Step: 637650 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:24:24,539-Speed 2491.37 samples/sec Loss 1.3331 LearningRate 0.000066 Epoch: 30 Global Step: 637660 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:24:32,755-Speed 2503.00 samples/sec Loss 1.3234 LearningRate 0.000066 Epoch: 30 Global Step: 637670 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:24:40,950-Speed 2499.43 samples/sec Loss 1.3185 LearningRate 0.000066 Epoch: 30 Global Step: 637680 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:24:53,936-Speed 1606.43 samples/sec Loss 1.2747 LearningRate 0.000066 Epoch: 30 Global Step: 637690 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:25:02,132-Speed 2502.03 samples/sec Loss 1.3707 LearningRate 0.000066 Epoch: 30 Global Step: 637700 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:25:11,000-Speed 2326.38 samples/sec Loss 1.3243 LearningRate 0.000066 Epoch: 30 Global Step: 637710 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
16:25:21,048-Speed 2038.23 samples/sec Loss 1.3344 LearningRate 0.000066 Epoch: 30 Global Step: 637720 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:25:29,363-Speed 2500.34 samples/sec Loss 1.3420 LearningRate 0.000066 Epoch: 30 Global Step: 637730 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:25:37,617-Speed 2499.86 samples/sec Loss 1.3306 LearningRate 0.000066 Epoch: 30 Global Step: 637740 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:25:49,979-Speed 2515.56 samples/sec Loss 1.3104 LearningRate 0.000066 Epoch: 30 Global Step: 637750 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:25:58,220-Speed 2500.18 samples/sec Loss 1.3109 LearningRate 0.000066 Epoch: 30 Global Step: 637760 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:26:10,795-Speed 1628.75 samples/sec Loss 1.2921 LearningRate 0.000066 Epoch: 30 Global Step: 637770 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:26:19,810-Speed 2306.70 samples/sec Loss 1.3179 LearningRate 0.000066 Epoch: 30 Global Step: 637780 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:26:32,070-Speed 1677.82 samples/sec Loss 1.3362 LearningRate 0.000066 Epoch: 30 Global Step: 637790 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:26:40,918-Speed 2314.96 samples/sec Loss 1.3100 LearningRate 0.000066 Epoch: 30 Global Step: 637800 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:26:49,855-Speed 2329.12 samples/sec Loss 1.3392 LearningRate 0.000066 Epoch: 30 Global Step: 637810 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:26:58,069-Speed 2493.94 samples/sec Loss 1.2872 LearningRate 0.000066 Epoch: 30 Global Step: 637820 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:27:06,284-Speed 2493.07 samples/sec Loss 1.3458 LearningRate 0.000066 Epoch: 30 Global Step: 637830 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
16:27:14,500-Speed 2493.26 samples/sec Loss 1.3021 LearningRate 0.000066 Epoch: 30 Global Step: 637840 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:27:22,711-Speed 2494.79 samples/sec Loss 1.3078 LearningRate 0.000066 Epoch: 30 Global Step: 637850 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:27:30,920-Speed 2495.05 samples/sec Loss 1.2742 LearningRate 0.000066 Epoch: 30 Global Step: 637860 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:27:39,089-Speed 2507.56 samples/sec Loss 1.3046 LearningRate 0.000066 Epoch: 30 Global Step: 637870 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:27:47,299-Speed 2494.89 samples/sec Loss 1.2785 LearningRate 0.000066 Epoch: 30 Global Step: 637880 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:27:55,505-Speed 2496.20 samples/sec Loss 1.2784 LearningRate 0.000066 Epoch: 30 Global Step: 637890 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:28:03,711-Speed 2495.96 samples/sec Loss 1.3102 LearningRate 0.000066 Epoch: 30 Global Step: 637900 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:28:11,918-Speed 2496.16 samples/sec Loss 1.2898 LearningRate 0.000066 Epoch: 30 Global Step: 637910 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:28:20,123-Speed 2496.32 samples/sec Loss 1.3127 LearningRate 0.000066 Epoch: 30 Global Step: 637920 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:28:28,287-Speed 2509.52 samples/sec Loss 1.3019 LearningRate 0.000066 Epoch: 30 Global Step: 637930 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:28:36,497-Speed 2494.84 samples/sec Loss 1.3029 LearningRate 0.000066 Epoch: 30 Global Step: 637940 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:28:44,717-Speed 2491.97 samples/sec Loss 1.3022 LearningRate 0.000066 Epoch: 30 Global Step: 637950 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
16:28:52,928-Speed 2494.54 samples/sec Loss 1.3099 LearningRate 0.000066 Epoch: 30 Global Step: 637960 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:29:01,139-Speed 2495.71 samples/sec Loss 1.2897 LearningRate 0.000066 Epoch: 30 Global Step: 637970 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:29:09,343-Speed 2496.49 samples/sec Loss 1.3288 LearningRate 0.000066 Epoch: 30 Global Step: 637980 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:29:17,498-Speed 2511.81 samples/sec Loss 1.2986 LearningRate 0.000066 Epoch: 30 Global Step: 637990 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:29:25,706-Speed 2495.42 samples/sec Loss 1.3057 LearningRate 0.000066 Epoch: 30 Global Step: 638000 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:29:33,914-Speed 2495.64 samples/sec Loss 1.3342 LearningRate 0.000066 Epoch: 30 Global Step: 638010 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:29:42,127-Speed 2493.99 samples/sec Loss 1.3334 LearningRate 0.000066 Epoch: 30 Global Step: 638020 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:29:50,331-Speed 2496.57 samples/sec Loss 1.2997 LearningRate 0.000066 Epoch: 30 Global Step: 638030 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:29:58,543-Speed 2494.29 samples/sec Loss 1.3087 LearningRate 0.000066 Epoch: 30 Global Step: 638040 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:30:06,698-Speed 2511.80 samples/sec Loss 1.2926 LearningRate 0.000066 Epoch: 30 Global Step: 638050 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:30:14,907-Speed 2495.17 samples/sec Loss 1.3324 LearningRate 0.000066 Epoch: 30 Global Step: 638060 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:30:23,127-Speed 2491.85 samples/sec Loss 1.2874 LearningRate 0.000066 Epoch: 30 Global Step: 638070 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
16:30:31,337-Speed 2494.94 samples/sec Loss 1.3418 LearningRate 0.000066 Epoch: 30 Global Step: 638080 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:30:39,546-Speed 2495.33 samples/sec Loss 1.2967 LearningRate 0.000066 Epoch: 30 Global Step: 638090 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:30:47,766-Speed 2491.75 samples/sec Loss 1.3391 LearningRate 0.000066 Epoch: 30 Global Step: 638100 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:30:55,932-Speed 2508.29 samples/sec Loss 1.2972 LearningRate 0.000066 Epoch: 30 Global Step: 638110 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:31:04,141-Speed 2495.84 samples/sec Loss 1.3364 LearningRate 0.000066 Epoch: 30 Global Step: 638120 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:31:12,347-Speed 2496.16 samples/sec Loss 1.2954 LearningRate 0.000066 Epoch: 30 Global Step: 638130 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:31:20,553-Speed 2495.85 samples/sec Loss 1.2933 LearningRate 0.000066 Epoch: 30 Global Step: 638140 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:31:28,761-Speed 2495.56 samples/sec Loss 1.3097 LearningRate 0.000066 Epoch: 30 Global Step: 638150 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:31:36,972-Speed 2494.78 samples/sec Loss 1.3033 LearningRate 0.000066 Epoch: 30 Global Step: 638160 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:31:45,126-Speed 2511.87 samples/sec Loss 1.3258 LearningRate 0.000066 Epoch: 30 Global Step: 638170 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:31:53,338-Speed 2494.27 samples/sec Loss 1.3269 LearningRate 0.000066 Epoch: 30 Global Step: 638180 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:32:01,543-Speed 2496.39 samples/sec Loss 1.3184 LearningRate 0.000066 Epoch: 30 Global Step: 638190 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 
16:32:09,752-Speed 2495.35 samples/sec Loss 1.2970 LearningRate 0.000066 Epoch: 30 Global Step: 638200 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-07-11 16:32:17,933-Speed 2503.78 samples/sec Loss 1.3123 LearningRate 0.000066 Epoch: 30 Global Step: 638210 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:32:26,137-Speed 2496.83 samples/sec Loss 1.3332 LearningRate 0.000066 Epoch: 30 Global Step: 638220 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:32:34,287-Speed 2513.19 samples/sec Loss 1.3214 LearningRate 0.000066 Epoch: 30 Global Step: 638230 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:32:42,492-Speed 2496.57 samples/sec Loss 1.2886 LearningRate 0.000066 Epoch: 30 Global Step: 638240 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:32:50,704-Speed 2494.31 samples/sec Loss 1.3590 LearningRate 0.000066 Epoch: 30 Global Step: 638250 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:32:58,923-Speed 2492.19 samples/sec Loss 1.2975 LearningRate 0.000066 Epoch: 30 Global Step: 638260 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:33:07,127-Speed 2496.58 samples/sec Loss 1.3366 LearningRate 0.000066 Epoch: 30 Global Step: 638270 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:33:15,347-Speed 2491.64 samples/sec Loss 1.3363 LearningRate 0.000066 Epoch: 30 Global Step: 638280 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:33:23,500-Speed 2513.11 samples/sec Loss 1.3128 LearningRate 0.000066 Epoch: 30 Global Step: 638290 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:33:31,706-Speed 2495.85 samples/sec Loss 1.3078 LearningRate 0.000066 Epoch: 30 Global Step: 638300 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:33:39,914-Speed 2495.67 samples/sec Loss 1.3412 LearningRate 0.000066 Epoch: 30 Global Step: 638310 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:33:48,129-Speed 2494.34 samples/sec Loss 1.3123 LearningRate 0.000066 Epoch: 30 Global Step: 638320 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:33:56,333-Speed 2496.53 samples/sec Loss 1.3330 LearningRate 0.000066 Epoch: 30 Global Step: 638330 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:34:04,540-Speed 2496.05 samples/sec Loss 1.2758 LearningRate 0.000066 Epoch: 30 Global Step: 638340 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:34:12,703-Speed 2509.12 samples/sec Loss 1.3008 LearningRate 0.000066 Epoch: 30 Global Step: 638350 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:34:20,912-Speed 2495.10 samples/sec Loss 1.2980 LearningRate 0.000066 Epoch: 30 Global Step: 638360 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:34:29,124-Speed 2494.30 samples/sec Loss 1.3011 LearningRate 0.000066 Epoch: 30 Global Step: 638370 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:34:37,327-Speed 2496.89 samples/sec Loss 1.3002 LearningRate 0.000066 Epoch: 30 Global Step: 638380 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:34:45,537-Speed 2494.92 samples/sec Loss 1.3136 LearningRate 0.000066 Epoch: 30 Global Step: 638390 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:34:53,742-Speed 2496.55 samples/sec Loss 1.3244 LearningRate 0.000066 Epoch: 30 Global Step: 638400 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:35:01,891-Speed 2513.45 samples/sec Loss 1.3430 LearningRate 0.000066 Epoch: 30 Global Step: 638410 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:35:10,096-Speed 2496.54 samples/sec Loss 1.3184 LearningRate 0.000066 Epoch: 30 Global Step: 638420 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:35:18,296-Speed 2498.01 samples/sec Loss 1.3117 LearningRate 0.000066 Epoch: 30 Global Step: 638430 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:35:26,498-Speed 2497.11 samples/sec Loss 1.3202 LearningRate 0.000066 Epoch: 30 Global Step: 638440 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:35:34,701-Speed 2496.98 samples/sec Loss 1.3421 LearningRate 0.000066 Epoch: 30 Global Step: 638450 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:35:42,906-Speed 2496.37 samples/sec Loss 1.3219 LearningRate 0.000066 Epoch: 30 Global Step: 638460 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:35:51,071-Speed 2508.62 samples/sec Loss 1.3236 LearningRate 0.000066 Epoch: 30 Global Step: 638470 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:35:59,277-Speed 2496.35 samples/sec Loss 1.2742 LearningRate 0.000066 Epoch: 30 Global Step: 638480 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:36:07,481-Speed 2496.79 samples/sec Loss 1.2901 LearningRate 0.000065 Epoch: 30 Global Step: 638490 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:36:15,686-Speed 2496.48 samples/sec Loss 1.3188 LearningRate 0.000065 Epoch: 30 Global Step: 638500 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:36:23,890-Speed 2496.94 samples/sec Loss 1.3394 LearningRate 0.000065 Epoch: 30 Global Step: 638510 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:36:32,096-Speed 2496.27 samples/sec Loss 1.3319 LearningRate 0.000065 Epoch: 30 Global Step: 638520 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:36:40,246-Speed 2513.23 samples/sec Loss 1.3037 LearningRate 0.000065 Epoch: 30 Global Step: 638530 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:36:48,460-Speed 2493.43 samples/sec Loss 1.3016 LearningRate 0.000065 Epoch: 30 Global Step: 638540 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:36:56,682-Speed 2491.18 samples/sec Loss 1.3133 LearningRate 0.000065 Epoch: 30 Global Step: 638550 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:37:04,899-Speed 2493.02 samples/sec Loss 1.3264 LearningRate 0.000065 Epoch: 30 Global Step: 638560 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:37:13,099-Speed 2497.67 samples/sec Loss 1.2675 LearningRate 0.000065 Epoch: 30 Global Step: 638570 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:37:21,303-Speed 2496.91 samples/sec Loss 1.3209 LearningRate 0.000065 Epoch: 30 Global Step: 638580 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:37:29,452-Speed 2513.63 samples/sec Loss 1.3209 LearningRate 0.000065 Epoch: 30 Global Step: 638590 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:37:37,656-Speed 2496.60 samples/sec Loss 1.3137 LearningRate 0.000065 Epoch: 30 Global Step: 638600 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:37:45,860-Speed 2496.59 samples/sec Loss 1.3222 LearningRate 0.000065 Epoch: 30 Global Step: 638610 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:37:54,074-Speed 2493.75 samples/sec Loss 1.3458 LearningRate 0.000065 Epoch: 30 Global Step: 638620 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:38:02,280-Speed 2496.38 samples/sec Loss 1.3185 LearningRate 0.000065 Epoch: 30 Global Step: 638630 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:38:10,496-Speed 2492.98 samples/sec Loss 1.3305 LearningRate 0.000065 Epoch: 30 Global Step: 638640 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:38:18,645-Speed 2513.40 samples/sec Loss 1.3312 LearningRate 0.000065 Epoch: 30 Global Step: 638650 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:38:26,848-Speed 2497.05 samples/sec Loss 1.3117 LearningRate 0.000065 Epoch: 30 Global Step: 638660 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:38:35,052-Speed 2496.70 samples/sec Loss 1.3069 LearningRate 0.000065 Epoch: 30 Global Step: 638670 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:38:43,256-Speed 2496.73 samples/sec Loss 1.3111 LearningRate 0.000065 Epoch: 30 Global Step: 638680 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:38:51,467-Speed 2494.72 samples/sec Loss 1.2952 LearningRate 0.000065 Epoch: 30 Global Step: 638690 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:38:59,670-Speed 2496.82 samples/sec Loss 1.3299 LearningRate 0.000065 Epoch: 30 Global Step: 638700 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:39:07,845-Speed 2505.69 samples/sec Loss 1.2886 LearningRate 0.000065 Epoch: 30 Global Step: 638710 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:39:16,050-Speed 2496.22 samples/sec Loss 1.3381 LearningRate 0.000065 Epoch: 30 Global Step: 638720 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:39:24,272-Speed 2491.75 samples/sec Loss 1.3241 LearningRate 0.000065 Epoch: 30 Global Step: 638730 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:39:32,481-Speed 2495.11 samples/sec Loss 1.3000 LearningRate 0.000065 Epoch: 30 Global Step: 638740 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:39:40,687-Speed 2495.95 samples/sec Loss 1.3318 LearningRate 0.000065 Epoch: 30 Global Step: 638750 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:39:48,887-Speed 2497.73 samples/sec Loss 1.3154 LearningRate 0.000065 Epoch: 30 Global Step: 638760 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:39:57,042-Speed 2511.75 samples/sec Loss 1.3162 LearningRate 0.000065 Epoch: 30 Global Step: 638770 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:40:05,250-Speed 2495.76 samples/sec Loss 1.3229 LearningRate 0.000065 Epoch: 30 Global Step: 638780 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:40:13,455-Speed 2496.36 samples/sec Loss 1.2938 LearningRate 0.000065 Epoch: 30 Global Step: 638790 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 
16:40:21,660-Speed 2496.48 samples/sec Loss 1.2978 LearningRate 0.000065 Epoch: 30 Global Step: 638800 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:40:29,865-Speed 2496.51 samples/sec Loss 1.2833 LearningRate 0.000065 Epoch: 30 Global Step: 638810 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:40:38,071-Speed 2496.01 samples/sec Loss 1.3072 LearningRate 0.000065 Epoch: 30 Global Step: 638820 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:40:46,222-Speed 2512.83 samples/sec Loss 1.2943 LearningRate 0.000065 Epoch: 30 Global Step: 638830 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:40:54,426-Speed 2497.02 samples/sec Loss 1.3055 LearningRate 0.000065 Epoch: 30 Global Step: 638840 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:41:02,631-Speed 2496.52 samples/sec Loss 1.3356 LearningRate 0.000065 Epoch: 30 Global Step: 638850 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:41:10,832-Speed 2497.67 samples/sec Loss 1.3164 LearningRate 0.000065 Epoch: 30 Global Step: 638860 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:41:19,035-Speed 2497.03 samples/sec Loss 1.3156 LearningRate 0.000065 Epoch: 30 Global Step: 638870 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:41:27,240-Speed 2496.40 samples/sec Loss 1.2954 LearningRate 0.000065 Epoch: 30 Global Step: 638880 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:41:35,394-Speed 2512.33 samples/sec Loss 1.3294 LearningRate 0.000065 Epoch: 30 Global Step: 638890 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-07-11 16:41:43,557-Speed 2509.08 samples/sec Loss 1.3184 LearningRate 0.000065 Epoch: 30 Global Step: 638900 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:41:51,760-Speed 2496.97 samples/sec Loss 1.3038 LearningRate 0.000065 Epoch: 30 Global Step: 638910 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 
16:41:59,965-Speed 2497.57 samples/sec Loss 1.3293 LearningRate 0.000065 Epoch: 30 Global Step: 638920 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:42:08,174-Speed 2495.37 samples/sec Loss 1.3173 LearningRate 0.000065 Epoch: 30 Global Step: 638930 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:42:16,383-Speed 2495.18 samples/sec Loss 1.3023 LearningRate 0.000065 Epoch: 30 Global Step: 638940 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:42:24,532-Speed 2513.66 samples/sec Loss 1.3399 LearningRate 0.000065 Epoch: 30 Global Step: 638950 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:42:32,751-Speed 2492.12 samples/sec Loss 1.2542 LearningRate 0.000065 Epoch: 30 Global Step: 638960 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:42:40,957-Speed 2497.74 samples/sec Loss 1.2857 LearningRate 0.000065 Epoch: 30 Global Step: 638970 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:42:49,175-Speed 2492.44 samples/sec Loss 1.3075 LearningRate 0.000065 Epoch: 30 Global Step: 638980 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:42:57,383-Speed 2495.50 samples/sec Loss 1.3199 LearningRate 0.000065 Epoch: 30 Global Step: 638990 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:43:05,588-Speed 2496.37 samples/sec Loss 1.3182 LearningRate 0.000065 Epoch: 30 Global Step: 639000 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:43:13,739-Speed 2513.20 samples/sec Loss 1.3247 LearningRate 0.000065 Epoch: 30 Global Step: 639010 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:43:21,943-Speed 2496.66 samples/sec Loss 1.3448 LearningRate 0.000065 Epoch: 30 Global Step: 639020 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:43:30,147-Speed 2496.80 samples/sec Loss 1.2839 LearningRate 0.000065 Epoch: 30 Global Step: 639030 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:43:38,351-Speed 
2496.65 samples/sec Loss 1.2836 LearningRate 0.000065 Epoch: 30 Global Step: 639040 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:43:46,560-Speed 2495.64 samples/sec Loss 1.3371 LearningRate 0.000065 Epoch: 30 Global Step: 639050 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:43:54,768-Speed 2495.36 samples/sec Loss 1.3170 LearningRate 0.000065 Epoch: 30 Global Step: 639060 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:44:02,923-Speed 2511.78 samples/sec Loss 1.2779 LearningRate 0.000065 Epoch: 30 Global Step: 639070 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:44:11,131-Speed 2495.64 samples/sec Loss 1.3068 LearningRate 0.000065 Epoch: 30 Global Step: 639080 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:44:19,345-Speed 2493.61 samples/sec Loss 1.3287 LearningRate 0.000065 Epoch: 30 Global Step: 639090 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:44:27,551-Speed 2495.95 samples/sec Loss 1.2916 LearningRate 0.000065 Epoch: 30 Global Step: 639100 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:44:35,757-Speed 2496.39 samples/sec Loss 1.3073 LearningRate 0.000065 Epoch: 30 Global Step: 639110 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:44:43,962-Speed 2496.19 samples/sec Loss 1.3107 LearningRate 0.000065 Epoch: 30 Global Step: 639120 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:44:52,114-Speed 2512.81 samples/sec Loss 1.2805 LearningRate 0.000065 Epoch: 30 Global Step: 639130 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:45:00,322-Speed 2495.50 samples/sec Loss 1.3193 LearningRate 0.000065 Epoch: 30 Global Step: 639140 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:45:08,528-Speed 2496.37 samples/sec Loss 1.3137 LearningRate 0.000065 Epoch: 30 Global Step: 639150 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:45:16,732-Speed 2496.77 samples/sec 
Loss 1.3212 LearningRate 0.000065 Epoch: 30 Global Step: 639160 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:45:24,935-Speed 2496.88 samples/sec Loss 1.3261 LearningRate 0.000065 Epoch: 30 Global Step: 639170 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:45:33,137-Speed 2497.21 samples/sec Loss 1.2957 LearningRate 0.000065 Epoch: 30 Global Step: 639180 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:45:41,293-Speed 2511.61 samples/sec Loss 1.3012 LearningRate 0.000065 Epoch: 30 Global Step: 639190 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:45:49,502-Speed 2495.38 samples/sec Loss 1.2971 LearningRate 0.000065 Epoch: 30 Global Step: 639200 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:45:57,721-Speed 2491.93 samples/sec Loss 1.2930 LearningRate 0.000065 Epoch: 30 Global Step: 639210 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:46:05,927-Speed 2496.20 samples/sec Loss 1.3211 LearningRate 0.000065 Epoch: 30 Global Step: 639220 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:46:14,134-Speed 2495.80 samples/sec Loss 1.3556 LearningRate 0.000065 Epoch: 30 Global Step: 639230 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:46:22,340-Speed 2496.21 samples/sec Loss 1.3513 LearningRate 0.000065 Epoch: 30 Global Step: 639240 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:46:30,504-Speed 2508.96 samples/sec Loss 1.2759 LearningRate 0.000065 Epoch: 30 Global Step: 639250 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:46:38,710-Speed 2496.17 samples/sec Loss 1.3157 LearningRate 0.000065 Epoch: 30 Global Step: 639260 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:46:46,921-Speed 2494.46 samples/sec Loss 1.2646 LearningRate 0.000065 Epoch: 30 Global Step: 639270 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:46:55,139-Speed 2492.52 samples/sec Loss 1.3077 
LearningRate 0.000065 Epoch: 30 Global Step: 639280 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:47:03,346-Speed 2495.58 samples/sec Loss 1.3130 LearningRate 0.000065 Epoch: 30 Global Step: 639290 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:47:11,552-Speed 2496.31 samples/sec Loss 1.2732 LearningRate 0.000065 Epoch: 30 Global Step: 639300 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:47:19,703-Speed 2512.84 samples/sec Loss 1.3102 LearningRate 0.000065 Epoch: 30 Global Step: 639310 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:47:27,912-Speed 2495.33 samples/sec Loss 1.3158 LearningRate 0.000065 Epoch: 30 Global Step: 639320 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:47:36,116-Speed 2496.63 samples/sec Loss 1.2939 LearningRate 0.000065 Epoch: 30 Global Step: 639330 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:47:44,344-Speed 2489.44 samples/sec Loss 1.3016 LearningRate 0.000065 Epoch: 30 Global Step: 639340 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:47:52,547-Speed 2497.22 samples/sec Loss 1.2977 LearningRate 0.000065 Epoch: 30 Global Step: 639350 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:48:00,764-Speed 2492.85 samples/sec Loss 1.3007 LearningRate 0.000065 Epoch: 30 Global Step: 639360 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:48:08,914-Speed 2512.98 samples/sec Loss 1.3341 LearningRate 0.000065 Epoch: 30 Global Step: 639370 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:48:17,118-Speed 2497.06 samples/sec Loss 1.3058 LearningRate 0.000065 Epoch: 30 Global Step: 639380 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:48:25,322-Speed 2496.69 samples/sec Loss 1.3182 LearningRate 0.000065 Epoch: 30 Global Step: 639390 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:48:33,532-Speed 2495.00 samples/sec Loss 1.3271 LearningRate 
0.000065 Epoch: 30 Global Step: 639400 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:48:41,739-Speed 2495.73 samples/sec Loss 1.3165 LearningRate 0.000065 Epoch: 30 Global Step: 639410 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:48:49,947-Speed 2496.61 samples/sec Loss 1.2954 LearningRate 0.000065 Epoch: 30 Global Step: 639420 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:48:58,098-Speed 2512.78 samples/sec Loss 1.3292 LearningRate 0.000065 Epoch: 30 Global Step: 639430 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:49:06,307-Speed 2495.29 samples/sec Loss 1.3030 LearningRate 0.000065 Epoch: 30 Global Step: 639440 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:49:14,508-Speed 2497.45 samples/sec Loss 1.3224 LearningRate 0.000065 Epoch: 30 Global Step: 639450 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:49:22,716-Speed 2495.61 samples/sec Loss 1.3231 LearningRate 0.000065 Epoch: 30 Global Step: 639460 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:49:30,918-Speed 2497.38 samples/sec Loss 1.3208 LearningRate 0.000065 Epoch: 30 Global Step: 639470 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:49:39,122-Speed 2496.73 samples/sec Loss 1.3271 LearningRate 0.000065 Epoch: 30 Global Step: 639480 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:49:47,297-Speed 2505.50 samples/sec Loss 1.3072 LearningRate 0.000065 Epoch: 30 Global Step: 639490 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:49:55,503-Speed 2496.47 samples/sec Loss 1.3003 LearningRate 0.000065 Epoch: 30 Global Step: 639500 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:50:03,711-Speed 2495.41 samples/sec Loss 1.2717 LearningRate 0.000065 Epoch: 30 Global Step: 639510 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:50:11,948-Speed 2486.66 samples/sec Loss 1.2950 LearningRate 0.000065 Epoch: 30 
Global Step: 639520 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:50:20,153-Speed 2496.47 samples/sec Loss 1.2957 LearningRate 0.000065 Epoch: 30 Global Step: 639530 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-07-11 16:50:28,358-Speed 2496.41 samples/sec Loss 1.2997 LearningRate 0.000065 Epoch: 30 Global Step: 639540 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:50:36,516-Speed 2511.03 samples/sec Loss 1.2784 LearningRate 0.000065 Epoch: 30 Global Step: 639550 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:50:44,720-Speed 2496.57 samples/sec Loss 1.3054 LearningRate 0.000065 Epoch: 30 Global Step: 639560 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:50:52,926-Speed 2496.15 samples/sec Loss 1.3201 LearningRate 0.000065 Epoch: 30 Global Step: 639570 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:51:01,135-Speed 2495.36 samples/sec Loss 1.3683 LearningRate 0.000065 Epoch: 30 Global Step: 639580 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:51:09,339-Speed 2496.81 samples/sec Loss 1.2606 LearningRate 0.000065 Epoch: 30 Global Step: 639590 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:51:17,544-Speed 2496.40 samples/sec Loss 1.3139 LearningRate 0.000065 Epoch: 30 Global Step: 639600 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:51:25,695-Speed 2513.03 samples/sec Loss 1.2668 LearningRate 0.000065 Epoch: 30 Global Step: 639610 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:51:33,901-Speed 2496.31 samples/sec Loss 1.3244 LearningRate 0.000065 Epoch: 30 Global Step: 639620 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:51:42,194-Speed 2469.63 samples/sec Loss 1.3114 LearningRate 0.000065 Epoch: 30 Global Step: 639630 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:51:50,406-Speed 2494.36 samples/sec Loss 1.3048 LearningRate 0.000065 Epoch: 30 Global Step: 639640 
Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:51:58,613-Speed 2495.80 samples/sec Loss 1.2963 LearningRate 0.000065 Epoch: 30 Global Step: 639650 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:52:06,821-Speed 2495.80 samples/sec Loss 1.2849 LearningRate 0.000065 Epoch: 30 Global Step: 639660 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:52:14,970-Speed 2513.48 samples/sec Loss 1.3352 LearningRate 0.000065 Epoch: 30 Global Step: 639670 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:52:23,176-Speed 2496.08 samples/sec Loss 1.3213 LearningRate 0.000065 Epoch: 30 Global Step: 639680 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:52:31,388-Speed 2494.43 samples/sec Loss 1.3413 LearningRate 0.000065 Epoch: 30 Global Step: 639690 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:52:39,595-Speed 2495.84 samples/sec Loss 1.3391 LearningRate 0.000065 Epoch: 30 Global Step: 639700 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:52:47,821-Speed 2489.97 samples/sec Loss 1.2985 LearningRate 0.000065 Epoch: 30 Global Step: 639710 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:52:56,037-Speed 2493.50 samples/sec Loss 1.2933 LearningRate 0.000065 Epoch: 30 Global Step: 639720 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:53:04,189-Speed 2512.44 samples/sec Loss 1.3526 LearningRate 0.000065 Epoch: 30 Global Step: 639730 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:53:12,393-Speed 2496.74 samples/sec Loss 1.3312 LearningRate 0.000065 Epoch: 30 Global Step: 639740 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:53:20,596-Speed 2497.00 samples/sec Loss 1.3092 LearningRate 0.000065 Epoch: 30 Global Step: 639750 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:53:28,804-Speed 2495.69 samples/sec Loss 1.2623 LearningRate 0.000065 Epoch: 30 Global Step: 639760 Fp16 Grad Scale: 
8192 Required: 43 hours Training: 2022-07-11 16:53:37,007-Speed 2497.09 samples/sec Loss 1.3305 LearningRate 0.000065 Epoch: 30 Global Step: 639770 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:53:45,211-Speed 2496.77 samples/sec Loss 1.3001 LearningRate 0.000065 Epoch: 30 Global Step: 639780 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:53:53,363-Speed 2512.48 samples/sec Loss 1.3191 LearningRate 0.000065 Epoch: 30 Global Step: 639790 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:54:01,568-Speed 2496.42 samples/sec Loss 1.2983 LearningRate 0.000065 Epoch: 30 Global Step: 639800 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:54:09,774-Speed 2496.11 samples/sec Loss 1.3324 LearningRate 0.000065 Epoch: 30 Global Step: 639810 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:54:17,981-Speed 2495.95 samples/sec Loss 1.3387 LearningRate 0.000065 Epoch: 30 Global Step: 639820 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:54:26,184-Speed 2497.25 samples/sec Loss 1.3047 LearningRate 0.000065 Epoch: 30 Global Step: 639830 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:54:34,390-Speed 2496.08 samples/sec Loss 1.3134 LearningRate 0.000065 Epoch: 30 Global Step: 639840 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:54:42,542-Speed 2512.81 samples/sec Loss 1.3065 LearningRate 0.000065 Epoch: 30 Global Step: 639850 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:54:50,756-Speed 2493.95 samples/sec Loss 1.3172 LearningRate 0.000065 Epoch: 30 Global Step: 639860 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:54:58,975-Speed 2492.14 samples/sec Loss 1.3161 LearningRate 0.000065 Epoch: 30 Global Step: 639870 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:55:07,180-Speed 2496.34 samples/sec Loss 1.3055 LearningRate 0.000065 Epoch: 30 Global Step: 639880 Fp16 Grad Scale: 8192 Required: 43 
hours Training: 2022-07-11 16:55:15,386-Speed 2496.16 samples/sec Loss 1.2892 LearningRate 0.000065 Epoch: 30 Global Step: 639890 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:55:23,598-Speed 2494.43 samples/sec Loss 1.3275 LearningRate 0.000065 Epoch: 30 Global Step: 639900 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:55:31,749-Speed 2512.97 samples/sec Loss 1.3111 LearningRate 0.000065 Epoch: 30 Global Step: 639910 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:55:39,952-Speed 2496.91 samples/sec Loss 1.3046 LearningRate 0.000065 Epoch: 30 Global Step: 639920 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:55:48,160-Speed 2495.51 samples/sec Loss 1.3226 LearningRate 0.000065 Epoch: 30 Global Step: 639930 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:55:56,363-Speed 2496.89 samples/sec Loss 1.3159 LearningRate 0.000065 Epoch: 30 Global Step: 639940 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:56:04,570-Speed 2495.86 samples/sec Loss 1.3166 LearningRate 0.000064 Epoch: 30 Global Step: 639950 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:56:12,773-Speed 2497.20 samples/sec Loss 1.3078 LearningRate 0.000064 Epoch: 30 Global Step: 639960 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:56:20,923-Speed 2513.21 samples/sec Loss 1.3035 LearningRate 0.000064 Epoch: 30 Global Step: 639970 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:56:29,140-Speed 2492.73 samples/sec Loss 1.3263 LearningRate 0.000064 Epoch: 30 Global Step: 639980 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:56:37,347-Speed 2495.91 samples/sec Loss 1.2965 LearningRate 0.000064 Epoch: 30 Global Step: 639990 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:56:45,562-Speed 2493.43 samples/sec Loss 1.2868 LearningRate 0.000064 Epoch: 30 Global Step: 640000 Fp16 Grad Scale: 8192 Required: 43 hours Training: 
2022-07-11 16:56:53,775-Speed 2493.97 samples/sec Loss 1.3054 LearningRate 0.000064 Epoch: 30 Global Step: 640010 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:57:01,982-Speed 2495.50 samples/sec Loss 1.3326 LearningRate 0.000064 Epoch: 30 Global Step: 640020 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:57:10,138-Speed 2511.85 samples/sec Loss 1.3186 LearningRate 0.000064 Epoch: 30 Global Step: 640030 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:57:18,343-Speed 2496.46 samples/sec Loss 1.2928 LearningRate 0.000064 Epoch: 30 Global Step: 640040 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:57:26,550-Speed 2495.75 samples/sec Loss 1.3272 LearningRate 0.000064 Epoch: 30 Global Step: 640050 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:57:34,756-Speed 2496.08 samples/sec Loss 1.3236 LearningRate 0.000064 Epoch: 30 Global Step: 640060 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:57:42,969-Speed 2494.09 samples/sec Loss 1.3120 LearningRate 0.000064 Epoch: 30 Global Step: 640070 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:57:51,179-Speed 2495.33 samples/sec Loss 1.3092 LearningRate 0.000064 Epoch: 30 Global Step: 640080 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:57:59,332-Speed 2512.27 samples/sec Loss 1.2852 LearningRate 0.000064 Epoch: 30 Global Step: 640090 Fp16 Grad Scale: 8192 Required: 43 hours Training: 2022-07-11 16:58:07,543-Speed 2494.72 samples/sec Loss 1.3316 LearningRate 0.000064 Epoch: 30 Global Step: 640100 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 16:58:15,751-Speed 2495.38 samples/sec Loss 1.2908 LearningRate 0.000064 Epoch: 30 Global Step: 640110 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 16:58:23,958-Speed 2496.27 samples/sec Loss 1.3095 LearningRate 0.000064 Epoch: 30 Global Step: 640120 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 
16:58:32,163-Speed 2496.27 samples/sec Loss 1.2916 LearningRate 0.000064 Epoch: 30 Global Step: 640130 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 16:58:40,378-Speed 2493.54 samples/sec Loss 1.3334 LearningRate 0.000064 Epoch: 30 Global Step: 640140 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 16:58:48,539-Speed 2510.01 samples/sec Loss 1.3062 LearningRate 0.000064 Epoch: 30 Global Step: 640150 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 16:58:56,744-Speed 2496.32 samples/sec Loss 1.2957 LearningRate 0.000064 Epoch: 30 Global Step: 640160 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 16:59:04,952-Speed 2495.57 samples/sec Loss 1.3213 LearningRate 0.000064 Epoch: 30 Global Step: 640170 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 16:59:13,155-Speed 2496.99 samples/sec Loss 1.3106 LearningRate 0.000064 Epoch: 30 Global Step: 640180 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 16:59:21,362-Speed 2495.87 samples/sec Loss 1.2781 LearningRate 0.000064 Epoch: 30 Global Step: 640190 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 16:59:29,568-Speed 2495.93 samples/sec Loss 1.3333 LearningRate 0.000064 Epoch: 30 Global Step: 640200 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 16:59:37,750-Speed 2503.47 samples/sec Loss 1.2764 LearningRate 0.000064 Epoch: 30 Global Step: 640210 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 16:59:45,956-Speed 2496.14 samples/sec Loss 1.3113 LearningRate 0.000064 Epoch: 30 Global Step: 640220 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 16:59:54,162-Speed 2496.07 samples/sec Loss 1.3560 LearningRate 0.000064 Epoch: 30 Global Step: 640230 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:00:02,375-Speed 2494.46 samples/sec Loss 1.2961 LearningRate 0.000064 Epoch: 30 Global Step: 640240 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 
17:00:10,584-Speed 2495.19 samples/sec Loss 1.3014 LearningRate 0.000064 Epoch: 30 Global Step: 640250 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:00:18,791-Speed 2495.74 samples/sec Loss 1.3040 LearningRate 0.000064 Epoch: 30 Global Step: 640260 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:00:26,947-Speed 2511.61 samples/sec Loss 1.3063 LearningRate 0.000064 Epoch: 30 Global Step: 640270 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:00:35,154-Speed 2495.68 samples/sec Loss 1.3338 LearningRate 0.000064 Epoch: 30 Global Step: 640280 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:00:43,364-Speed 2495.07 samples/sec Loss 1.2682 LearningRate 0.000064 Epoch: 30 Global Step: 640290 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:00:51,572-Speed 2495.31 samples/sec Loss 1.3144 LearningRate 0.000064 Epoch: 30 Global Step: 640300 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:00:59,783-Speed 2495.14 samples/sec Loss 1.3149 LearningRate 0.000064 Epoch: 30 Global Step: 640310 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:01:07,998-Speed 2493.65 samples/sec Loss 1.3265 LearningRate 0.000064 Epoch: 30 Global Step: 640320 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:01:16,151-Speed 2512.03 samples/sec Loss 1.3233 LearningRate 0.000064 Epoch: 30 Global Step: 640330 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:01:24,361-Speed 2495.13 samples/sec Loss 1.2860 LearningRate 0.000064 Epoch: 30 Global Step: 640340 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:01:32,568-Speed 2495.81 samples/sec Loss 1.2643 LearningRate 0.000064 Epoch: 30 Global Step: 640350 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:01:40,776-Speed 2495.32 samples/sec Loss 1.2827 LearningRate 0.000064 Epoch: 30 Global Step: 640360 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 
17:01:48,986-Speed 2494.86 samples/sec Loss 1.2667 LearningRate 0.000064 Epoch: 30 Global Step: 640370 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:01:57,193-Speed 2495.94 samples/sec Loss 1.2628 LearningRate 0.000064 Epoch: 30 Global Step: 640380 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:02:05,346-Speed 2512.19 samples/sec Loss 1.2869 LearningRate 0.000064 Epoch: 30 Global Step: 640390 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:02:13,556-Speed 2495.09 samples/sec Loss 1.3094 LearningRate 0.000064 Epoch: 30 Global Step: 640400 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:02:21,763-Speed 2495.52 samples/sec Loss 1.2800 LearningRate 0.000064 Epoch: 30 Global Step: 640410 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:02:29,968-Speed 2496.41 samples/sec Loss 1.3259 LearningRate 0.000064 Epoch: 30 Global Step: 640420 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:02:38,175-Speed 2495.81 samples/sec Loss 1.2822 LearningRate 0.000064 Epoch: 30 Global Step: 640430 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:02:46,386-Speed 2494.67 samples/sec Loss 1.3202 LearningRate 0.000064 Epoch: 30 Global Step: 640440 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:02:54,540-Speed 2512.10 samples/sec Loss 1.2755 LearningRate 0.000064 Epoch: 30 Global Step: 640450 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:03:02,749-Speed 2495.22 samples/sec Loss 1.3081 LearningRate 0.000064 Epoch: 30 Global Step: 640460 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:03:10,955-Speed 2496.21 samples/sec Loss 1.3006 LearningRate 0.000064 Epoch: 30 Global Step: 640470 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:03:19,159-Speed 2496.72 samples/sec Loss 1.3120 LearningRate 0.000064 Epoch: 30 Global Step: 640480 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 
17:03:27,365-Speed 2496.15 samples/sec Loss 1.3043 LearningRate 0.000064 Epoch: 30 Global Step: 640490 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:03:35,570-Speed 2496.26 samples/sec Loss 1.3090 LearningRate 0.000064 Epoch: 30 Global Step: 640500 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:03:43,725-Speed 2511.72 samples/sec Loss 1.3228 LearningRate 0.000064 Epoch: 30 Global Step: 640510 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:03:51,935-Speed 2494.78 samples/sec Loss 1.3379 LearningRate 0.000064 Epoch: 30 Global Step: 640520 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:04:00,142-Speed 2495.94 samples/sec Loss 1.3285 LearningRate 0.000064 Epoch: 30 Global Step: 640530 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:04:08,364-Speed 2491.56 samples/sec Loss 1.3247 LearningRate 0.000064 Epoch: 30 Global Step: 640540 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:04:16,577-Speed 2493.99 samples/sec Loss 1.2988 LearningRate 0.000064 Epoch: 30 Global Step: 640550 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:04:24,783-Speed 2496.03 samples/sec Loss 1.2877 LearningRate 0.000064 Epoch: 30 Global Step: 640560 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:04:32,939-Speed 2511.51 samples/sec Loss 1.2905 LearningRate 0.000064 Epoch: 30 Global Step: 640570 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:04:41,146-Speed 2495.89 samples/sec Loss 1.2823 LearningRate 0.000064 Epoch: 30 Global Step: 640580 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:04:49,351-Speed 2496.29 samples/sec Loss 1.2928 LearningRate 0.000064 Epoch: 30 Global Step: 640590 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:04:57,555-Speed 2496.87 samples/sec Loss 1.3368 LearningRate 0.000064 Epoch: 30 Global Step: 640600 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 
17:05:05,766-Speed 2494.69 samples/sec Loss 1.2989 LearningRate 0.000064 Epoch: 30 Global Step: 640610 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:05:13,973-Speed 2495.82 samples/sec Loss 1.3343 LearningRate 0.000064 Epoch: 30 Global Step: 640620 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:05:22,126-Speed 2512.46 samples/sec Loss 1.3044 LearningRate 0.000064 Epoch: 30 Global Step: 640630 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:05:30,348-Speed 2491.31 samples/sec Loss 1.3044 LearningRate 0.000064 Epoch: 30 Global Step: 640640 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:05:38,553-Speed 2496.33 samples/sec Loss 1.2936 LearningRate 0.000064 Epoch: 30 Global Step: 640650 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:05:46,761-Speed 2495.70 samples/sec Loss 1.3115 LearningRate 0.000064 Epoch: 30 Global Step: 640660 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:05:54,974-Speed 2494.24 samples/sec Loss 1.3195 LearningRate 0.000064 Epoch: 30 Global Step: 640670 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:06:03,181-Speed 2495.81 samples/sec Loss 1.3315 LearningRate 0.000064 Epoch: 30 Global Step: 640680 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:06:11,332-Speed 2513.08 samples/sec Loss 1.3312 LearningRate 0.000064 Epoch: 30 Global Step: 640690 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:06:19,538-Speed 2496.04 samples/sec Loss 1.3092 LearningRate 0.000064 Epoch: 30 Global Step: 640700 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:06:27,744-Speed 2495.99 samples/sec Loss 1.3332 LearningRate 0.000064 Epoch: 30 Global Step: 640710 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:06:35,952-Speed 2495.43 samples/sec Loss 1.3013 LearningRate 0.000064 Epoch: 30 Global Step: 640720 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 
17:06:44,159-Speed 2495.62 samples/sec Loss 1.3039 LearningRate 0.000064 Epoch: 30 Global Step: 640730 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:06:52,367-Speed 2495.46 samples/sec Loss 1.2952 LearningRate 0.000064 Epoch: 30 Global Step: 640740 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:07:00,522-Speed 2511.89 samples/sec Loss 1.3128 LearningRate 0.000064 Epoch: 30 Global Step: 640750 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:07:08,730-Speed 2495.27 samples/sec Loss 1.3032 LearningRate 0.000064 Epoch: 30 Global Step: 640760 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:07:16,935-Speed 2496.56 samples/sec Loss 1.3276 LearningRate 0.000064 Epoch: 30 Global Step: 640770 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:07:25,143-Speed 2495.44 samples/sec Loss 1.2895 LearningRate 0.000064 Epoch: 30 Global Step: 640780 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:07:33,348-Speed 2496.39 samples/sec Loss 1.2916 LearningRate 0.000064 Epoch: 30 Global Step: 640790 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:07:41,554-Speed 2496.08 samples/sec Loss 1.3101 LearningRate 0.000064 Epoch: 30 Global Step: 640800 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:07:49,708-Speed 2512.24 samples/sec Loss 1.3203 LearningRate 0.000064 Epoch: 30 Global Step: 640810 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:07:57,915-Speed 2495.73 samples/sec Loss 1.3351 LearningRate 0.000064 Epoch: 30 Global Step: 640820 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:08:06,119-Speed 2496.56 samples/sec Loss 1.3219 LearningRate 0.000064 Epoch: 30 Global Step: 640830 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:08:14,336-Speed 2493.00 samples/sec Loss 1.2911 LearningRate 0.000064 Epoch: 30 Global Step: 640840 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 
17:08:22,543-Speed 2495.72 samples/sec Loss 1.2978 LearningRate 0.000064 Epoch: 30 Global Step: 640850 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:08:30,749-Speed 2496.06 samples/sec Loss 1.3130 LearningRate 0.000064 Epoch: 30 Global Step: 640860 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:08:38,903-Speed 2511.96 samples/sec Loss 1.3072 LearningRate 0.000064 Epoch: 30 Global Step: 640870 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:08:47,110-Speed 2495.88 samples/sec Loss 1.3445 LearningRate 0.000064 Epoch: 30 Global Step: 640880 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:08:55,319-Speed 2495.39 samples/sec Loss 1.3074 LearningRate 0.000064 Epoch: 30 Global Step: 640890 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:09:03,524-Speed 2496.47 samples/sec Loss 1.2966 LearningRate 0.000064 Epoch: 30 Global Step: 640900 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:09:11,728-Speed 2496.67 samples/sec Loss 1.2795 LearningRate 0.000064 Epoch: 30 Global Step: 640910 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:09:19,936-Speed 2495.91 samples/sec Loss 1.3222 LearningRate 0.000064 Epoch: 30 Global Step: 640920 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:09:28,088-Speed 2512.55 samples/sec Loss 1.2917 LearningRate 0.000064 Epoch: 30 Global Step: 640930 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:09:36,292-Speed 2496.60 samples/sec Loss 1.2832 LearningRate 0.000064 Epoch: 30 Global Step: 640940 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:09:44,498-Speed 2496.03 samples/sec Loss 1.3122 LearningRate 0.000064 Epoch: 30 Global Step: 640950 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:09:52,707-Speed 2495.38 samples/sec Loss 1.2776 LearningRate 0.000064 Epoch: 30 Global Step: 640960 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 
17:10:00,911-Speed 2496.51 samples/sec Loss 1.3304 LearningRate 0.000064 Epoch: 30 Global Step: 640970 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:10:09,124-Speed 2494.12 samples/sec Loss 1.2922 LearningRate 0.000064 Epoch: 30 Global Step: 640980 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:10:17,277-Speed 2512.21 samples/sec Loss 1.3252 LearningRate 0.000064 Epoch: 30 Global Step: 640990 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:10:25,485-Speed 2495.63 samples/sec Loss 1.3114 LearningRate 0.000064 Epoch: 30 Global Step: 641000 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:10:33,689-Speed 2496.57 samples/sec Loss 1.3168 LearningRate 0.000064 Epoch: 30 Global Step: 641010 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:10:41,894-Speed 2496.59 samples/sec Loss 1.2963 LearningRate 0.000064 Epoch: 30 Global Step: 641020 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:10:50,107-Speed 2494.34 samples/sec Loss 1.3128 LearningRate 0.000064 Epoch: 30 Global Step: 641030 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:10:58,317-Speed 2494.71 samples/sec Loss 1.3225 LearningRate 0.000064 Epoch: 30 Global Step: 641040 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:11:06,475-Speed 2510.87 samples/sec Loss 1.3081 LearningRate 0.000064 Epoch: 30 Global Step: 641050 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:11:14,684-Speed 2495.36 samples/sec Loss 1.2891 LearningRate 0.000064 Epoch: 30 Global Step: 641060 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:11:22,897-Speed 2493.99 samples/sec Loss 1.2800 LearningRate 0.000064 Epoch: 30 Global Step: 641070 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:11:31,108-Speed 2495.00 samples/sec Loss 1.3042 LearningRate 0.000064 Epoch: 30 Global Step: 641080 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 
17:11:39,332-Speed 2490.45 samples/sec Loss 1.3277 LearningRate 0.000064 Epoch: 30 Global Step: 641090 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:11:47,558-Speed 2489.89 samples/sec Loss 1.3151 LearningRate 0.000064 Epoch: 30 Global Step: 641100 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:11:55,715-Speed 2511.24 samples/sec Loss 1.3201 LearningRate 0.000064 Epoch: 30 Global Step: 641110 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:12:03,934-Speed 2492.11 samples/sec Loss 1.3185 LearningRate 0.000064 Epoch: 30 Global Step: 641120 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:12:12,167-Speed 2488.14 samples/sec Loss 1.2899 LearningRate 0.000064 Epoch: 30 Global Step: 641130 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:12:20,378-Speed 2494.49 samples/sec Loss 1.3164 LearningRate 0.000064 Epoch: 30 Global Step: 641140 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:12:28,589-Speed 2494.78 samples/sec Loss 1.2749 LearningRate 0.000064 Epoch: 30 Global Step: 641150 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:12:36,801-Speed 2494.44 samples/sec Loss 1.3330 LearningRate 0.000064 Epoch: 30 Global Step: 641160 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:12:44,960-Speed 2510.37 samples/sec Loss 1.2895 LearningRate 0.000064 Epoch: 30 Global Step: 641170 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:12:53,174-Speed 2493.79 samples/sec Loss 1.2931 LearningRate 0.000064 Epoch: 30 Global Step: 641180 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:13:01,386-Speed 2494.31 samples/sec Loss 1.3166 LearningRate 0.000064 Epoch: 30 Global Step: 641190 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:13:09,594-Speed 2495.68 samples/sec Loss 1.2976 LearningRate 0.000064 Epoch: 30 Global Step: 641200 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 
17:13:17,805-Speed 2494.51 samples/sec Loss 1.2800 LearningRate 0.000064 Epoch: 30 Global Step: 641210 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:13:26,015-Speed 2494.78 samples/sec Loss 1.2965 LearningRate 0.000064 Epoch: 30 Global Step: 641220 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:13:34,174-Speed 2510.65 samples/sec Loss 1.3172 LearningRate 0.000064 Epoch: 30 Global Step: 641230 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:13:42,379-Speed 2496.31 samples/sec Loss 1.2772 LearningRate 0.000064 Epoch: 30 Global Step: 641240 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:13:50,583-Speed 2496.80 samples/sec Loss 1.3105 LearningRate 0.000064 Epoch: 30 Global Step: 641250 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:13:58,791-Speed 2495.88 samples/sec Loss 1.3167 LearningRate 0.000064 Epoch: 30 Global Step: 641260 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:14:06,996-Speed 2496.31 samples/sec Loss 1.3008 LearningRate 0.000064 Epoch: 30 Global Step: 641270 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:14:15,197-Speed 2497.68 samples/sec Loss 1.3201 LearningRate 0.000064 Epoch: 30 Global Step: 641280 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:14:23,349-Speed 2512.74 samples/sec Loss 1.2951 LearningRate 0.000064 Epoch: 30 Global Step: 641290 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:14:31,561-Speed 2494.37 samples/sec Loss 1.3259 LearningRate 0.000064 Epoch: 30 Global Step: 641300 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:14:39,765-Speed 2496.66 samples/sec Loss 1.3169 LearningRate 0.000064 Epoch: 30 Global Step: 641310 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:14:47,967-Speed 2497.17 samples/sec Loss 1.2849 LearningRate 0.000064 Epoch: 30 Global Step: 641320 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 
17:14:56,171-Speed 2496.84 samples/sec Loss 1.3632 LearningRate 0.000064 Epoch: 30 Global Step: 641330 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:15:04,380-Speed 2495.23 samples/sec Loss 1.3235 LearningRate 0.000064 Epoch: 30 Global Step: 641340 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:15:12,534-Speed 2512.12 samples/sec Loss 1.3148 LearningRate 0.000064 Epoch: 30 Global Step: 641350 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:15:20,738-Speed 2496.86 samples/sec Loss 1.2991 LearningRate 0.000064 Epoch: 30 Global Step: 641360 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:15:28,944-Speed 2495.99 samples/sec Loss 1.3012 LearningRate 0.000064 Epoch: 30 Global Step: 641370 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:15:37,152-Speed 2495.92 samples/sec Loss 1.3425 LearningRate 0.000064 Epoch: 30 Global Step: 641380 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:15:45,368-Speed 2493.00 samples/sec Loss 1.3073 LearningRate 0.000064 Epoch: 30 Global Step: 641390 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:15:53,573-Speed 2496.54 samples/sec Loss 1.3123 LearningRate 0.000064 Epoch: 30 Global Step: 641400 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:16:01,729-Speed 2511.31 samples/sec Loss 1.3019 LearningRate 0.000064 Epoch: 30 Global Step: 641410 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:16:09,934-Speed 2496.42 samples/sec Loss 1.3005 LearningRate 0.000064 Epoch: 30 Global Step: 641420 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:16:18,141-Speed 2495.71 samples/sec Loss 1.2989 LearningRate 0.000063 Epoch: 30 Global Step: 641430 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:16:26,347-Speed 2496.05 samples/sec Loss 1.3108 LearningRate 0.000063 Epoch: 30 Global Step: 641440 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 
17:16:34,551-Speed 2496.71 samples/sec Loss 1.3327 LearningRate 0.000063 Epoch: 30 Global Step: 641450 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:16:42,757-Speed 2496.24 samples/sec Loss 1.3317 LearningRate 0.000063 Epoch: 30 Global Step: 641460 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:16:50,908-Speed 2512.78 samples/sec Loss 1.2885 LearningRate 0.000063 Epoch: 30 Global Step: 641470 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:16:59,128-Speed 2491.80 samples/sec Loss 1.3432 LearningRate 0.000063 Epoch: 30 Global Step: 641480 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:17:07,333-Speed 2496.48 samples/sec Loss 1.3260 LearningRate 0.000063 Epoch: 30 Global Step: 641490 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:17:15,537-Speed 2496.74 samples/sec Loss 1.3062 LearningRate 0.000063 Epoch: 30 Global Step: 641500 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:17:23,743-Speed 2496.15 samples/sec Loss 1.3236 LearningRate 0.000063 Epoch: 30 Global Step: 641510 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:17:31,953-Speed 2495.09 samples/sec Loss 1.3239 LearningRate 0.000063 Epoch: 30 Global Step: 641520 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:17:40,105-Speed 2512.66 samples/sec Loss 1.3268 LearningRate 0.000063 Epoch: 30 Global Step: 641530 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:17:48,311-Speed 2496.00 samples/sec Loss 1.2838 LearningRate 0.000063 Epoch: 30 Global Step: 641540 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:17:56,517-Speed 2496.08 samples/sec Loss 1.2610 LearningRate 0.000063 Epoch: 30 Global Step: 641550 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:18:04,722-Speed 2496.64 samples/sec Loss 1.2993 LearningRate 0.000063 Epoch: 30 Global Step: 641560 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 
17:18:12,937-Speed 2493.05 samples/sec Loss 1.3085 LearningRate 0.000063 Epoch: 30 Global Step: 641570 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:18:21,144-Speed 2495.95 samples/sec Loss 1.3331 LearningRate 0.000063 Epoch: 30 Global Step: 641580 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:18:29,294-Speed 2513.25 samples/sec Loss 1.3065 LearningRate 0.000063 Epoch: 30 Global Step: 641590 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:18:37,501-Speed 2495.83 samples/sec Loss 1.3340 LearningRate 0.000063 Epoch: 30 Global Step: 641600 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:18:45,707-Speed 2496.06 samples/sec Loss 1.2828 LearningRate 0.000063 Epoch: 30 Global Step: 641610 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:18:53,921-Speed 2493.81 samples/sec Loss 1.2932 LearningRate 0.000063 Epoch: 30 Global Step: 641620 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:19:02,128-Speed 2495.89 samples/sec Loss 1.2801 LearningRate 0.000063 Epoch: 30 Global Step: 641630 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:19:10,332-Speed 2496.78 samples/sec Loss 1.2886 LearningRate 0.000063 Epoch: 30 Global Step: 641640 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:19:18,483-Speed 2512.76 samples/sec Loss 1.2830 LearningRate 0.000063 Epoch: 30 Global Step: 641650 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:19:26,687-Speed 2496.86 samples/sec Loss 1.3188 LearningRate 0.000063 Epoch: 30 Global Step: 641660 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:19:34,896-Speed 2495.18 samples/sec Loss 1.3064 LearningRate 0.000063 Epoch: 30 Global Step: 641670 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:19:43,112-Speed 2493.06 samples/sec Loss 1.2616 LearningRate 0.000063 Epoch: 30 Global Step: 641680 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 
17:19:51,323-Speed 2494.74 samples/sec Loss 1.3060 LearningRate 0.000063 Epoch: 30 Global Step: 641690 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:19:59,540-Speed 2493.15 samples/sec Loss 1.2916 LearningRate 0.000063 Epoch: 30 Global Step: 641700 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:20:07,694-Speed 2511.89 samples/sec Loss 1.3086 LearningRate 0.000063 Epoch: 30 Global Step: 641710 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:20:15,903-Speed 2495.14 samples/sec Loss 1.2623 LearningRate 0.000063 Epoch: 30 Global Step: 641720 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:20:24,108-Speed 2496.64 samples/sec Loss 1.2825 LearningRate 0.000063 Epoch: 30 Global Step: 641730 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:20:32,320-Speed 2494.18 samples/sec Loss 1.2960 LearningRate 0.000063 Epoch: 30 Global Step: 641740 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:20:40,524-Speed 2496.70 samples/sec Loss 1.2946 LearningRate 0.000063 Epoch: 30 Global Step: 641750 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:20:48,743-Speed 2492.40 samples/sec Loss 1.2808 LearningRate 0.000063 Epoch: 30 Global Step: 641760 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:20:56,894-Speed 2512.96 samples/sec Loss 1.3220 LearningRate 0.000063 Epoch: 30 Global Step: 641770 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:21:05,099-Speed 2496.29 samples/sec Loss 1.2880 LearningRate 0.000063 Epoch: 30 Global Step: 641780 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:21:13,306-Speed 2495.94 samples/sec Loss 1.2855 LearningRate 0.000063 Epoch: 30 Global Step: 641790 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:21:21,511-Speed 2496.17 samples/sec Loss 1.3271 LearningRate 0.000063 Epoch: 30 Global Step: 641800 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 
17:21:29,718-Speed 2495.93 samples/sec Loss 1.3078 LearningRate 0.000063 Epoch: 30 Global Step: 641810 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:21:37,921-Speed 2497.14 samples/sec Loss 1.2834 LearningRate 0.000063 Epoch: 30 Global Step: 641820 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:21:46,075-Speed 2512.25 samples/sec Loss 1.2855 LearningRate 0.000063 Epoch: 30 Global Step: 641830 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:21:54,279-Speed 2497.12 samples/sec Loss 1.3112 LearningRate 0.000063 Epoch: 30 Global Step: 641840 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:22:02,493-Speed 2493.81 samples/sec Loss 1.2832 LearningRate 0.000063 Epoch: 30 Global Step: 641850 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:22:10,697-Speed 2496.39 samples/sec Loss 1.2601 LearningRate 0.000063 Epoch: 30 Global Step: 641860 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:22:18,902-Speed 2496.74 samples/sec Loss 1.3123 LearningRate 0.000063 Epoch: 30 Global Step: 641870 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:22:27,105-Speed 2496.85 samples/sec Loss 1.3065 LearningRate 0.000063 Epoch: 30 Global Step: 641880 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:22:35,258-Speed 2512.63 samples/sec Loss 1.2869 LearningRate 0.000063 Epoch: 30 Global Step: 641890 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:22:43,466-Speed 2495.36 samples/sec Loss 1.3060 LearningRate 0.000063 Epoch: 30 Global Step: 641900 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:22:51,672-Speed 2496.39 samples/sec Loss 1.3179 LearningRate 0.000063 Epoch: 30 Global Step: 641910 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:22:59,878-Speed 2496.42 samples/sec Loss 1.3557 LearningRate 0.000063 Epoch: 30 Global Step: 641920 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 
17:23:08,083-Speed 2496.55 samples/sec Loss 1.3293 LearningRate 0.000063 Epoch: 30 Global Step: 641930 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:23:16,290-Speed 2495.64 samples/sec Loss 1.2609 LearningRate 0.000063 Epoch: 30 Global Step: 641940 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:23:24,443-Speed 2512.62 samples/sec Loss 1.2707 LearningRate 0.000063 Epoch: 30 Global Step: 641950 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:23:32,650-Speed 2495.91 samples/sec Loss 1.2818 LearningRate 0.000063 Epoch: 30 Global Step: 641960 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:23:40,854-Speed 2496.50 samples/sec Loss 1.3079 LearningRate 0.000063 Epoch: 30 Global Step: 641970 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:23:49,059-Speed 2496.35 samples/sec Loss 1.2257 LearningRate 0.000063 Epoch: 30 Global Step: 641980 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:23:57,265-Speed 2496.21 samples/sec Loss 1.2804 LearningRate 0.000063 Epoch: 30 Global Step: 641990 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:24:05,474-Speed 2494.94 samples/sec Loss 1.2716 LearningRate 0.000063 Epoch: 30 Global Step: 642000 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:24:13,628-Speed 2511.99 samples/sec Loss 1.2818 LearningRate 0.000063 Epoch: 30 Global Step: 642010 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:24:21,832-Speed 2496.79 samples/sec Loss 1.3022 LearningRate 0.000063 Epoch: 30 Global Step: 642020 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:24:30,042-Speed 2495.09 samples/sec Loss 1.3285 LearningRate 0.000063 Epoch: 30 Global Step: 642030 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:24:38,258-Speed 2492.98 samples/sec Loss 1.3118 LearningRate 0.000063 Epoch: 30 Global Step: 642040 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 
17:24:46,476-Speed 2492.56 samples/sec Loss 1.3187 LearningRate 0.000063 Epoch: 30 Global Step: 642050 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:24:54,680-Speed 2496.70 samples/sec Loss 1.2842 LearningRate 0.000063 Epoch: 30 Global Step: 642060 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:25:02,846-Speed 2508.26 samples/sec Loss 1.3059 LearningRate 0.000063 Epoch: 30 Global Step: 642070 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:25:11,051-Speed 2496.33 samples/sec Loss 1.3173 LearningRate 0.000063 Epoch: 30 Global Step: 642080 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:25:19,256-Speed 2496.38 samples/sec Loss 1.3553 LearningRate 0.000063 Epoch: 30 Global Step: 642090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:25:27,467-Speed 2494.50 samples/sec Loss 1.2913 LearningRate 0.000063 Epoch: 30 Global Step: 642100 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:25:35,671-Speed 2497.14 samples/sec Loss 1.2827 LearningRate 0.000063 Epoch: 30 Global Step: 642110 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:25:43,880-Speed 2495.34 samples/sec Loss 1.2978 LearningRate 0.000063 Epoch: 30 Global Step: 642120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:25:52,055-Speed 2505.31 samples/sec Loss 1.2645 LearningRate 0.000063 Epoch: 30 Global Step: 642130 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:26:00,261-Speed 2496.22 samples/sec Loss 1.3178 LearningRate 0.000063 Epoch: 30 Global Step: 642140 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:26:08,469-Speed 2495.64 samples/sec Loss 1.2820 LearningRate 0.000063 Epoch: 30 Global Step: 642150 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:26:16,673-Speed 2496.47 samples/sec Loss 1.3280 LearningRate 0.000063 Epoch: 30 Global Step: 642160 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 
17:26:24,879-Speed 2496.18 samples/sec Loss 1.2849 LearningRate 0.000063 Epoch: 30 Global Step: 642170 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:26:33,086-Speed 2495.83 samples/sec Loss 1.3154 LearningRate 0.000063 Epoch: 30 Global Step: 642180 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:26:41,236-Speed 2513.34 samples/sec Loss 1.2720 LearningRate 0.000063 Epoch: 30 Global Step: 642190 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:26:49,448-Speed 2494.25 samples/sec Loss 1.2925 LearningRate 0.000063 Epoch: 30 Global Step: 642200 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:26:57,671-Speed 2490.84 samples/sec Loss 1.2975 LearningRate 0.000063 Epoch: 30 Global Step: 642210 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-07-11 17:27:05,835-Speed 2509.06 samples/sec Loss 1.3182 LearningRate 0.000063 Epoch: 30 Global Step: 642220 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:27:14,042-Speed 2496.09 samples/sec Loss 1.3152 LearningRate 0.000063 Epoch: 30 Global Step: 642230 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:27:22,246-Speed 2496.77 samples/sec Loss 1.2940 LearningRate 0.000063 Epoch: 30 Global Step: 642240 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:27:30,400-Speed 2511.89 samples/sec Loss 1.2839 LearningRate 0.000063 Epoch: 30 Global Step: 642250 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:27:38,605-Speed 2496.52 samples/sec Loss 1.2918 LearningRate 0.000063 Epoch: 30 Global Step: 642260 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:27:46,826-Speed 2491.57 samples/sec Loss 1.3064 LearningRate 0.000063 Epoch: 30 Global Step: 642270 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 17:27:55,033-Speed 2495.72 samples/sec Loss 1.3134 LearningRate 0.000063 Epoch: 30 Global Step: 642280 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-07-11 
17:28:03,250-Speed 2492.67 samples/sec Loss 1.2915 LearningRate 0.000063 Epoch: 30 Global Step: 642290 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:28:11,461-Speed 2494.85 samples/sec Loss 1.3200 LearningRate 0.000063 Epoch: 30 Global Step: 642300 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:28:19,619-Speed 2510.72 samples/sec Loss 1.3196 LearningRate 0.000063 Epoch: 30 Global Step: 642310 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:28:27,830-Speed 2494.64 samples/sec Loss 1.3155 LearningRate 0.000063 Epoch: 30 Global Step: 642320 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:28:36,031-Speed 2497.70 samples/sec Loss 1.3251 LearningRate 0.000063 Epoch: 30 Global Step: 642330 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:28:44,237-Speed 2496.20 samples/sec Loss 1.3295 LearningRate 0.000063 Epoch: 30 Global Step: 642340 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:28:52,443-Speed 2496.13 samples/sec Loss 1.3091 LearningRate 0.000063 Epoch: 30 Global Step: 642350 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:29:00,650-Speed 2495.57 samples/sec Loss 1.2778 LearningRate 0.000063 Epoch: 30 Global Step: 642360 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:29:08,802-Speed 2512.79 samples/sec Loss 1.2577 LearningRate 0.000063 Epoch: 30 Global Step: 642370 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:29:17,011-Speed 2495.14 samples/sec Loss 1.3054 LearningRate 0.000063 Epoch: 30 Global Step: 642380 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:29:25,222-Speed 2494.82 samples/sec Loss 1.3341 LearningRate 0.000063 Epoch: 30 Global Step: 642390 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:29:33,427-Speed 2496.38 samples/sec Loss 1.3262 LearningRate 0.000063 Epoch: 30 Global Step: 642400 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:29:41,630-Speed 2496.91 samples/sec Loss 1.3257 LearningRate 0.000063 Epoch: 30 Global Step: 642410 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:29:49,850-Speed 2492.04 samples/sec Loss 1.3321 LearningRate 0.000063 Epoch: 30 Global Step: 642420 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:29:57,999-Speed 2513.56 samples/sec Loss 1.3108 LearningRate 0.000063 Epoch: 30 Global Step: 642430 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:30:06,207-Speed 2495.57 samples/sec Loss 1.2805 LearningRate 0.000063 Epoch: 30 Global Step: 642440 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:30:14,427-Speed 2492.09 samples/sec Loss 1.3059 LearningRate 0.000063 Epoch: 30 Global Step: 642450 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:30:22,631-Speed 2497.07 samples/sec Loss 1.2987 LearningRate 0.000063 Epoch: 30 Global Step: 642460 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:30:30,838-Speed 2495.82 samples/sec Loss 1.3202 LearningRate 0.000063 Epoch: 30 Global Step: 642470 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:30:39,043-Speed 2496.50 samples/sec Loss 1.2913 LearningRate 0.000063 Epoch: 30 Global Step: 642480 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:30:47,195-Speed 2512.62 samples/sec Loss 1.3433 LearningRate 0.000063 Epoch: 30 Global Step: 642490 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:30:55,400-Speed 2496.45 samples/sec Loss 1.3085 LearningRate 0.000063 Epoch: 30 Global Step: 642500 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:31:03,602-Speed 2497.44 samples/sec Loss 1.2718 LearningRate 0.000063 Epoch: 30 Global Step: 642510 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:31:11,824-Speed 2491.19 samples/sec Loss 1.3137 LearningRate 0.000063 Epoch: 30 Global Step: 642520 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:31:20,042-Speed 2492.38 samples/sec Loss 1.2968 LearningRate 0.000063 Epoch: 30 Global Step: 642530 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:31:28,248-Speed 2496.00 samples/sec Loss 1.3012 LearningRate 0.000063 Epoch: 30 Global Step: 642540 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:31:36,400-Speed 2512.69 samples/sec Loss 1.3175 LearningRate 0.000063 Epoch: 30 Global Step: 642550 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:31:44,609-Speed 2495.20 samples/sec Loss 1.3371 LearningRate 0.000063 Epoch: 30 Global Step: 642560 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:31:52,813-Speed 2496.80 samples/sec Loss 1.2900 LearningRate 0.000063 Epoch: 30 Global Step: 642570 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:32:01,020-Speed 2495.91 samples/sec Loss 1.3185 LearningRate 0.000063 Epoch: 30 Global Step: 642580 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:32:09,225-Speed 2496.25 samples/sec Loss 1.3352 LearningRate 0.000063 Epoch: 30 Global Step: 642590 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:32:17,429-Speed 2496.59 samples/sec Loss 1.3623 LearningRate 0.000063 Epoch: 30 Global Step: 642600 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:32:25,582-Speed 2512.31 samples/sec Loss 1.2994 LearningRate 0.000063 Epoch: 30 Global Step: 642610 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:32:33,786-Speed 2496.78 samples/sec Loss 1.3046 LearningRate 0.000063 Epoch: 30 Global Step: 642620 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:32:41,990-Speed 2496.49 samples/sec Loss 1.2817 LearningRate 0.000063 Epoch: 30 Global Step: 642630 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:32:50,193-Speed 2497.18 samples/sec Loss 1.2802 LearningRate 0.000063 Epoch: 30 Global Step: 642640 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:32:58,399-Speed 2496.27 samples/sec Loss 1.2982 LearningRate 0.000063 Epoch: 30 Global Step: 642650 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:33:06,607-Speed 2495.27 samples/sec Loss 1.3262 LearningRate 0.000063 Epoch: 30 Global Step: 642660 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:33:14,760-Speed 2512.35 samples/sec Loss 1.3232 LearningRate 0.000063 Epoch: 30 Global Step: 642670 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:33:22,967-Speed 2495.99 samples/sec Loss 1.3368 LearningRate 0.000063 Epoch: 30 Global Step: 642680 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:33:31,179-Speed 2494.67 samples/sec Loss 1.3091 LearningRate 0.000063 Epoch: 30 Global Step: 642690 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:33:39,385-Speed 2496.00 samples/sec Loss 1.3131 LearningRate 0.000063 Epoch: 30 Global Step: 642700 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:33:47,599-Speed 2493.73 samples/sec Loss 1.3103 LearningRate 0.000063 Epoch: 30 Global Step: 642710 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:33:55,829-Speed 2488.72 samples/sec Loss 1.3272 LearningRate 0.000063 Epoch: 30 Global Step: 642720 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:34:03,983-Speed 2512.12 samples/sec Loss 1.3137 LearningRate 0.000063 Epoch: 30 Global Step: 642730 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:34:12,200-Speed 2492.68 samples/sec Loss 1.2794 LearningRate 0.000063 Epoch: 30 Global Step: 642740 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:34:20,406-Speed 2496.17 samples/sec Loss 1.3343 LearningRate 0.000063 Epoch: 30 Global Step: 642750 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:34:28,611-Speed 2496.47 samples/sec Loss 1.3186 LearningRate 0.000063 Epoch: 30 Global Step: 642760 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:34:36,814-Speed 2496.90 samples/sec Loss 1.3323 LearningRate 0.000063 Epoch: 30 Global Step: 642770 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:34:45,017-Speed 2497.13 samples/sec Loss 1.3263 LearningRate 0.000063 Epoch: 30 Global Step: 642780 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:34:53,181-Speed 2508.93 samples/sec Loss 1.3425 LearningRate 0.000063 Epoch: 30 Global Step: 642790 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:35:01,386-Speed 2496.70 samples/sec Loss 1.2666 LearningRate 0.000063 Epoch: 30 Global Step: 642800 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:35:09,590-Speed 2496.51 samples/sec Loss 1.3314 LearningRate 0.000063 Epoch: 30 Global Step: 642810 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:35:17,797-Speed 2495.91 samples/sec Loss 1.2956 LearningRate 0.000063 Epoch: 30 Global Step: 642820 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:35:25,999-Speed 2497.44 samples/sec Loss 1.3204 LearningRate 0.000063 Epoch: 30 Global Step: 642830 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:35:34,201-Speed 2497.25 samples/sec Loss 1.3131 LearningRate 0.000063 Epoch: 30 Global Step: 642840 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:35:42,354-Speed 2512.27 samples/sec Loss 1.3011 LearningRate 0.000063 Epoch: 30 Global Step: 642850 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:35:50,558-Speed 2496.68 samples/sec Loss 1.2677 LearningRate 0.000063 Epoch: 30 Global Step: 642860 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:35:58,763-Speed 2496.37 samples/sec Loss 1.2993 LearningRate 0.000063 Epoch: 30 Global Step: 642870 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:36:06,965-Speed 2497.42 samples/sec Loss 1.3055 LearningRate 0.000063 Epoch: 30 Global Step: 642880 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:36:15,179-Speed 2493.87 samples/sec Loss 1.3189 LearningRate 0.000063 Epoch: 30 Global Step: 642890 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:36:23,387-Speed 2495.41 samples/sec Loss 1.3311 LearningRate 0.000063 Epoch: 30 Global Step: 642900 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:36:31,536-Speed 2513.62 samples/sec Loss 1.3126 LearningRate 0.000063 Epoch: 30 Global Step: 642910 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:36:39,743-Speed 2495.82 samples/sec Loss 1.2978 LearningRate 0.000062 Epoch: 30 Global Step: 642920 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:36:50,274-Speed 1944.93 samples/sec Loss 1.2931 LearningRate 0.000062 Epoch: 31 Global Step: 642930 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:36:58,471-Speed 2498.99 samples/sec Loss 1.2930 LearningRate 0.000062 Epoch: 31 Global Step: 642940 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:37:06,670-Speed 2498.17 samples/sec Loss 1.3080 LearningRate 0.000062 Epoch: 31 Global Step: 642950 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:37:14,882-Speed 2494.36 samples/sec Loss 1.2812 LearningRate 0.000062 Epoch: 31 Global Step: 642960 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:37:23,034-Speed 2512.68 samples/sec Loss 1.3075 LearningRate 0.000062 Epoch: 31 Global Step: 642970 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:37:31,239-Speed 2496.58 samples/sec Loss 1.3183 LearningRate 0.000062 Epoch: 31 Global Step: 642980 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:37:39,439-Speed 2497.84 samples/sec Loss 1.2890 LearningRate 0.000062 Epoch: 31 Global Step: 642990 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:37:47,642-Speed 2497.42 samples/sec Loss 1.3061 LearningRate 0.000062 Epoch: 31 Global Step: 643000 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:37:55,849-Speed 2495.81 samples/sec Loss 1.2782 LearningRate 0.000062 Epoch: 31 Global Step: 643010 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:38:04,047-Speed 2498.50 samples/sec Loss 1.2835 LearningRate 0.000062 Epoch: 31 Global Step: 643020 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:38:12,194-Speed 2514.34 samples/sec Loss 1.3014 LearningRate 0.000062 Epoch: 31 Global Step: 643030 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:38:20,395-Speed 2497.71 samples/sec Loss 1.3167 LearningRate 0.000062 Epoch: 31 Global Step: 643040 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:38:28,599-Speed 2496.70 samples/sec Loss 1.3158 LearningRate 0.000062 Epoch: 31 Global Step: 643050 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:38:36,799-Speed 2498.05 samples/sec Loss 1.3219 LearningRate 0.000062 Epoch: 31 Global Step: 643060 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:38:44,999-Speed 2497.99 samples/sec Loss 1.2734 LearningRate 0.000062 Epoch: 31 Global Step: 643070 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:38:53,197-Speed 2498.62 samples/sec Loss 1.3145 LearningRate 0.000062 Epoch: 31 Global Step: 643080 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:39:01,344-Speed 2514.70 samples/sec Loss 1.2802 LearningRate 0.000062 Epoch: 31 Global Step: 643090 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:39:09,543-Speed 2498.26 samples/sec Loss 1.2884 LearningRate 0.000062 Epoch: 31 Global Step: 643100 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:39:17,742-Speed 2498.25 samples/sec Loss 1.3066 LearningRate 0.000062 Epoch: 31 Global Step: 643110 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:39:25,944-Speed 2497.30 samples/sec Loss 1.3317 LearningRate 0.000062 Epoch: 31 Global Step: 643120 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:39:34,151-Speed 2495.58 samples/sec Loss 1.2753 LearningRate 0.000062 Epoch: 31 Global Step: 643130 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:39:42,351-Speed 2498.27 samples/sec Loss 1.2874 LearningRate 0.000062 Epoch: 31 Global Step: 643140 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:39:50,503-Speed 2512.67 samples/sec Loss 1.2990 LearningRate 0.000062 Epoch: 31 Global Step: 643150 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:39:58,703-Speed 2497.89 samples/sec Loss 1.2913 LearningRate 0.000062 Epoch: 31 Global Step: 643160 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:40:06,910-Speed 2496.10 samples/sec Loss 1.2892 LearningRate 0.000062 Epoch: 31 Global Step: 643170 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:40:15,110-Speed 2498.12 samples/sec Loss 1.3099 LearningRate 0.000062 Epoch: 31 Global Step: 643180 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:40:23,311-Speed 2497.38 samples/sec Loss 1.3199 LearningRate 0.000062 Epoch: 31 Global Step: 643190 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:40:31,510-Speed 2498.28 samples/sec Loss 1.2580 LearningRate 0.000062 Epoch: 31 Global Step: 643200 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:40:39,658-Speed 2514.21 samples/sec Loss 1.2933 LearningRate 0.000062 Epoch: 31 Global Step: 643210 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:40:47,858-Speed 2497.96 samples/sec Loss 1.3064 LearningRate 0.000062 Epoch: 31 Global Step: 643220 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:40:56,057-Speed 2498.16 samples/sec Loss 1.2799 LearningRate 0.000062 Epoch: 31 Global Step: 643230 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:41:04,259-Speed 2497.14 samples/sec Loss 1.3154 LearningRate 0.000062 Epoch: 31 Global Step: 643240 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:41:12,460-Speed 2497.97 samples/sec Loss 1.3113 LearningRate 0.000062 Epoch: 31 Global Step: 643250 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:41:20,660-Speed 2497.84 samples/sec Loss 1.2657 LearningRate 0.000062 Epoch: 31 Global Step: 643260 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:41:28,807-Speed 2514.14 samples/sec Loss 1.2479 LearningRate 0.000062 Epoch: 31 Global Step: 643270 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:41:37,008-Speed 2497.68 samples/sec Loss 1.2964 LearningRate 0.000062 Epoch: 31 Global Step: 643280 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:41:45,207-Speed 2498.48 samples/sec Loss 1.2685 LearningRate 0.000062 Epoch: 31 Global Step: 643290 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:41:53,407-Speed 2497.79 samples/sec Loss 1.2983 LearningRate 0.000062 Epoch: 31 Global Step: 643300 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:42:01,611-Speed 2496.86 samples/sec Loss 1.3062 LearningRate 0.000062 Epoch: 31 Global Step: 643310 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:42:09,811-Speed 2498.03 samples/sec Loss 1.2762 LearningRate 0.000062 Epoch: 31 Global Step: 643320 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:42:17,959-Speed 2513.94 samples/sec Loss 1.3079 LearningRate 0.000062 Epoch: 31 Global Step: 643330 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:42:26,159-Speed 2497.95 samples/sec Loss 1.2677 LearningRate 0.000062 Epoch: 31 Global Step: 643340 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:42:34,361-Speed 2497.32 samples/sec Loss 1.2998 LearningRate 0.000062 Epoch: 31 Global Step: 643350 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:42:42,563-Speed 2497.27 samples/sec Loss 1.3006 LearningRate 0.000062 Epoch: 31 Global Step: 643360 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:42:50,764-Speed 2497.73 samples/sec Loss 1.2919 LearningRate 0.000062 Epoch: 31 Global Step: 643370 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:42:58,972-Speed 2495.46 samples/sec Loss 1.3058 LearningRate 0.000062 Epoch: 31 Global Step: 643380 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:43:07,119-Speed 2514.24 samples/sec Loss 1.2841 LearningRate 0.000062 Epoch: 31 Global Step: 643390 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:43:15,317-Speed 2498.28 samples/sec Loss 1.2807 LearningRate 0.000062 Epoch: 31 Global Step: 643400 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:43:23,520-Speed 2497.12 samples/sec Loss 1.2996 LearningRate 0.000062 Epoch: 31 Global Step: 643410 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:43:31,718-Speed 2498.47 samples/sec Loss 1.2947 LearningRate 0.000062 Epoch: 31 Global Step: 643420 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:43:39,918-Speed 2498.29 samples/sec Loss 1.2979 LearningRate 0.000062 Epoch: 31 Global Step: 643430 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:43:48,130-Speed 2494.28 samples/sec Loss 1.3156 LearningRate 0.000062 Epoch: 31 Global Step: 643440 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:43:56,279-Speed 2513.69 samples/sec Loss 1.2903 LearningRate 0.000062 Epoch: 31 Global Step: 643450 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:44:04,481-Speed 2497.00 samples/sec Loss 1.3155 LearningRate 0.000062 Epoch: 31 Global Step: 643460 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:44:12,684-Speed 2497.13 samples/sec Loss 1.3184 LearningRate 0.000062 Epoch: 31 Global Step: 643470 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:44:20,884-Speed 2498.07 samples/sec Loss 1.3124 LearningRate 0.000062 Epoch: 31 Global Step: 643480 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:44:29,096-Speed 2494.34 samples/sec Loss 1.2938 LearningRate 0.000062 Epoch: 31 Global Step: 643490 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:44:37,300-Speed 2496.56 samples/sec Loss 1.3050 LearningRate 0.000062 Epoch: 31 Global Step: 643500 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:44:45,454-Speed 2512.14 samples/sec Loss 1.2703 LearningRate 0.000062 Epoch: 31 Global Step: 643510 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:44:53,673-Speed 2492.34 samples/sec Loss 1.3201 LearningRate 0.000062 Epoch: 31 Global Step: 643520 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:45:01,871-Speed 2498.48 samples/sec Loss 1.2996 LearningRate 0.000062 Epoch: 31 Global Step: 643530 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:45:10,073-Speed 2497.64 samples/sec Loss 1.2900 LearningRate 0.000062 Epoch: 31 Global Step: 643540 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:45:18,270-Speed 2498.94 samples/sec Loss 1.3273 LearningRate 0.000062 Epoch: 31 Global Step: 643550 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:45:26,473-Speed 2496.78 samples/sec Loss 1.2787 LearningRate 0.000062 Epoch: 31 Global Step: 643560 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:45:34,621-Speed 2514.12 samples/sec Loss 1.3049 LearningRate 0.000062 Epoch: 31 Global Step: 643570 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:45:42,820-Speed 2497.96 samples/sec Loss 1.2591 LearningRate 0.000062 Epoch: 31 Global Step: 643580 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:45:51,024-Speed 2496.86 samples/sec Loss 1.2887 LearningRate 0.000062 Epoch: 31 Global Step: 643590 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:45:59,238-Speed 2493.84 samples/sec Loss 1.3204 LearningRate 0.000062 Epoch: 31 Global Step: 643600 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:46:07,436-Speed 2498.29 samples/sec Loss 1.3092 LearningRate 0.000062 Epoch: 31 Global Step: 643610 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:46:15,640-Speed 2496.68 samples/sec Loss 1.2939 LearningRate 0.000062 Epoch: 31 Global Step: 643620 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:46:23,794-Speed 2512.27 samples/sec Loss 1.2845 LearningRate 0.000062 Epoch: 31 Global Step: 643630 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:46:31,992-Speed 2498.66 samples/sec Loss 1.3276 LearningRate 0.000062 Epoch: 31 Global Step: 643640 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:46:40,190-Speed 2498.32 samples/sec Loss 1.2992 LearningRate 0.000062 Epoch: 31 Global Step: 643650 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:46:48,389-Speed 2498.27 samples/sec Loss 1.3180 LearningRate 0.000062 Epoch: 31 Global Step: 643660 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:46:56,590-Speed 2497.45 samples/sec Loss 1.3140 LearningRate 0.000062 Epoch: 31 Global Step: 643670 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:47:04,792-Speed 2497.30 samples/sec Loss 1.3269 LearningRate 0.000062 Epoch: 31 Global Step: 643680 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:47:12,937-Speed 2514.88 samples/sec Loss 1.2943 LearningRate 0.000062 Epoch: 31 Global Step: 643690 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:47:21,139-Speed 2497.26 samples/sec Loss 1.3270 LearningRate 0.000062 Epoch: 31 Global Step: 643700 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:47:29,340-Speed 2497.62 samples/sec Loss 1.2697 LearningRate 0.000062 Epoch: 31 Global Step: 643710 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:47:37,540-Speed 2498.06 samples/sec Loss 1.2919 LearningRate 0.000062 Epoch: 31 Global Step: 643720 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:47:45,741-Speed 2497.61 samples/sec Loss 1.2874 LearningRate 0.000062 Epoch: 31 Global Step: 643730 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:47:53,941-Speed 2497.96 samples/sec Loss 1.2887 LearningRate 0.000062 Epoch: 31 Global Step: 643740 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:48:02,088-Speed 2514.23 samples/sec Loss 1.2878 LearningRate 0.000062 Epoch: 31 Global Step: 643750 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:48:10,300-Speed 2494.18 samples/sec Loss 1.2930 LearningRate 0.000062 Epoch: 31 Global Step: 643760 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-07-11 17:48:18,461-Speed 2510.02 samples/sec Loss 1.3330 LearningRate 0.000062 Epoch: 31 Global Step: 643770 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:48:26,677-Speed 2492.93 samples/sec Loss 1.2778 LearningRate 0.000062 Epoch: 31 Global Step: 643780 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:48:34,877-Speed 2498.17 samples/sec Loss 1.3043 LearningRate 0.000062 Epoch: 31 Global Step: 643790 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:48:43,092-Speed 2493.42 samples/sec Loss 1.3015 LearningRate 0.000062 Epoch: 31 Global Step: 643800 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:48:51,235-Speed 2515.45 samples/sec Loss 1.3087 LearningRate 0.000062 Epoch: 31 Global Step: 643810 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:48:59,461-Speed 2490.34 samples/sec Loss 1.2963 LearningRate 0.000062 Epoch: 31 Global Step: 643820 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:49:07,661-Speed 2497.67 samples/sec Loss 1.3095 LearningRate 0.000062 Epoch: 31 Global Step: 643830 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:49:15,864-Speed 2497.31 samples/sec Loss 1.2832 LearningRate 0.000062 Epoch: 31 Global Step: 643840 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:49:24,077-Speed 2494.03 samples/sec Loss 1.3120 LearningRate 0.000062 Epoch: 31 Global Step: 643850 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:49:32,282-Speed 2496.52 samples/sec Loss 1.3290 LearningRate 0.000062 Epoch: 31 Global Step: 643860 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:49:40,429-Speed 2514.46 samples/sec Loss 1.2941 LearningRate 0.000062 Epoch: 31 Global Step: 643870 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:49:48,630-Speed 2497.63 samples/sec Loss 1.3048 LearningRate 0.000062 Epoch: 31 Global Step: 643880 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:49:56,828-Speed 2498.22 samples/sec Loss 1.3072 LearningRate 0.000062 Epoch: 31 Global Step: 643890 Fp16 Grad Scale: 16384 Required: 43 hours
Training: 2022-07-11 17:50:05,033-Speed 2496.44 samples/sec Loss 1.2977 LearningRate 0.000062 Epoch: 31 Global Step: 643900 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:50:13,237-Speed 2496.88 samples/sec Loss 1.2847 LearningRate 0.000062 Epoch: 31 Global Step: 643910 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:50:21,444-Speed 2495.65 samples/sec Loss 1.2866 LearningRate 0.000062 Epoch: 31 Global Step: 643920 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:50:29,596-Speed 2512.72 samples/sec Loss 1.2760 LearningRate 0.000062 Epoch: 31 Global Step: 643930 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:50:37,800-Speed 2497.02 samples/sec Loss 1.2708 LearningRate 0.000062 Epoch: 31 Global Step: 643940 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:50:46,002-Speed 2497.31 samples/sec Loss 1.3272 LearningRate 0.000062 Epoch: 31 Global Step: 643950 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:50:54,212-Speed 2494.91 samples/sec Loss 1.2767 LearningRate 0.000062 Epoch: 31 Global Step: 643960 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:51:02,422-Speed 2494.91 samples/sec Loss 1.3007 LearningRate 0.000062 Epoch: 31 Global Step: 643970 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:51:10,624-Speed 2497.24 samples/sec Loss 1.2894 LearningRate 0.000062 Epoch: 31 Global Step: 643980 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:51:18,778-Speed 2512.12 samples/sec Loss 1.3117 LearningRate 0.000062 Epoch: 31 Global Step: 643990 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:51:26,979-Speed 2498.18 samples/sec Loss 1.2970 LearningRate 0.000062 Epoch: 31 Global Step: 644000 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:51:35,181-Speed 2497.37 samples/sec Loss 1.2855 LearningRate 0.000062 Epoch: 31 Global Step: 644010 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:51:43,381-Speed 2497.99 samples/sec Loss 1.2821 LearningRate 0.000062 Epoch: 31 Global Step: 644020 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:51:51,584-Speed 2496.94 samples/sec Loss 1.3041 LearningRate 0.000062 Epoch: 31 Global Step: 644030 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:51:59,789-Speed 2496.61 samples/sec Loss 1.3035 LearningRate 0.000062 Epoch: 31 Global Step: 644040 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:52:07,936-Speed 2514.48 samples/sec Loss 1.2757 LearningRate 0.000062 Epoch: 31 Global Step: 644050 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:52:16,134-Speed 2498.58 samples/sec Loss 1.2736 LearningRate 0.000062 Epoch: 31 Global Step: 644060 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:52:24,332-Speed 2498.50 samples/sec Loss 1.2913 LearningRate 0.000062 Epoch: 31 Global Step: 644070 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:52:32,535-Speed 2496.93 samples/sec Loss 1.2928 LearningRate 0.000062 Epoch: 31 Global Step: 644080 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:52:40,738-Speed 2497.06 samples/sec Loss 1.3006 LearningRate 0.000062 Epoch: 31 Global Step: 644090 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:52:48,945-Speed 2495.92 samples/sec Loss 1.2925 LearningRate 0.000062 Epoch: 31 Global Step: 644100 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:52:57,094-Speed 2513.59 samples/sec Loss 1.2939 LearningRate 0.000062 Epoch: 31 Global Step: 644110 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:53:05,299-Speed 2496.94 samples/sec Loss 1.2984 LearningRate 0.000062 Epoch: 31 Global Step: 644120 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:53:13,516-Speed 2492.64 samples/sec Loss 1.2997 LearningRate 0.000062 Epoch: 31 Global Step: 644130 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:53:21,728-Speed 2494.30 samples/sec Loss 1.2824 LearningRate 0.000062 Epoch: 31 Global Step: 644140 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:53:29,936-Speed 2495.58 samples/sec Loss 1.2956 LearningRate 0.000062 Epoch: 31 Global Step: 644150 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:53:38,141-Speed 2496.69 samples/sec Loss 1.2802 LearningRate 0.000062 Epoch: 31 Global Step: 644160 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:53:46,291-Speed 2513.28 samples/sec Loss 1.3110 LearningRate 0.000062 Epoch: 31 Global Step: 644170 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:53:54,491-Speed 2497.86 samples/sec Loss 1.2982 LearningRate 0.000062 Epoch: 31 Global Step: 644180 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-07-11 17:54:02,649-Speed 2510.74 samples/sec Loss 1.2510 LearningRate 0.000062 Epoch: 31 Global Step: 644190 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:54:10,860-Speed 2494.73 samples/sec Loss 1.2922 LearningRate 0.000062 Epoch: 31 Global Step: 644200 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:54:19,063-Speed 2497.02 samples/sec Loss 1.2852 LearningRate 0.000062 Epoch: 31 Global Step: 644210 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:54:27,260-Speed 2498.98 samples/sec Loss 1.2606 LearningRate 0.000062 Epoch: 31 Global Step: 644220 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:54:35,402-Speed 2515.54 samples/sec Loss 1.2712 LearningRate 0.000062 Epoch: 31 Global Step: 644230 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:54:43,617-Speed 2493.47 samples/sec Loss 1.3349 LearningRate 0.000062 Epoch: 31 Global Step: 644240 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:54:51,816-Speed 2498.30 samples/sec Loss 1.2619 LearningRate 0.000062 Epoch: 31 Global Step: 644250 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:55:00,018-Speed 2497.14 samples/sec Loss 1.2856 LearningRate 0.000062 Epoch: 31 Global Step: 644260 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:55:08,219-Speed 2497.88 samples/sec Loss 1.3121 LearningRate 0.000062 Epoch: 31 Global Step: 644270 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:55:16,417-Speed 2498.61 samples/sec Loss 1.2847 LearningRate 0.000062 Epoch: 31 Global Step: 644280 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:55:24,562-Speed 2514.76 samples/sec Loss 1.2951 LearningRate 0.000062 Epoch: 31 Global Step: 644290 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:55:32,768-Speed 2496.26 samples/sec Loss 1.2464 LearningRate 0.000062 Epoch: 31 Global Step: 644300 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:55:40,965-Speed 2499.01 samples/sec Loss 1.2828 LearningRate 0.000062 Epoch: 31 Global Step: 644310 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:55:49,164-Speed 2498.53 samples/sec Loss 1.2642 LearningRate 0.000062 Epoch: 31 Global Step: 644320 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:55:57,375-Speed 2494.49 samples/sec Loss 1.2987 LearningRate 0.000062 Epoch: 31 Global Step: 644330 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:56:05,573-Speed 2498.95 samples/sec Loss 1.2823 LearningRate 0.000062 Epoch: 31 Global Step: 644340 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:56:13,718-Speed 2514.71 samples/sec Loss 1.2868 LearningRate 0.000062 Epoch: 31 Global Step: 644350 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:56:21,917-Speed 2498.24 samples/sec Loss 1.2618 LearningRate 0.000062 Epoch: 31 Global Step: 644360 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:56:30,118-Speed 2497.51 samples/sec Loss 1.2705 LearningRate 0.000062 Epoch: 31 Global Step: 644370 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:56:38,334-Speed 2493.48 samples/sec Loss 1.2982 LearningRate 0.000062 Epoch: 31 Global Step: 644380 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:56:46,533-Speed 2498.42 samples/sec Loss 1.2838 LearningRate 0.000062 Epoch: 31 Global Step: 644390 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:56:54,732-Speed 2498.29 samples/sec Loss 1.2717 LearningRate 0.000062 Epoch: 31 Global Step: 644400 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:57:02,888-Speed 2511.27 samples/sec Loss 1.2596 LearningRate 0.000061 Epoch: 31 Global Step: 644410 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:57:11,087-Speed 2498.31 samples/sec Loss 1.3139 LearningRate 0.000061 Epoch: 31 Global Step: 644420 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:57:19,289-Speed 2497.32 samples/sec Loss 1.2838 LearningRate 0.000061 Epoch: 31 Global Step: 644430 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:57:27,489-Speed 2498.01 samples/sec Loss 1.2777 LearningRate 0.000061 Epoch: 31 Global Step: 644440 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:57:35,692-Speed 2496.99 samples/sec Loss 1.2948 LearningRate 0.000061 Epoch: 31 Global Step: 644450 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:57:43,893-Speed 2497.53 samples/sec Loss 1.3014 LearningRate 0.000061 Epoch: 31 Global Step: 644460 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:57:52,043-Speed 2513.54 samples/sec Loss 1.2880 LearningRate 0.000061 Epoch: 31 Global Step: 644470 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:58:00,244-Speed 2497.66 samples/sec Loss 1.2760 LearningRate 0.000061 Epoch: 31 Global Step: 644480 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:58:08,445-Speed 2497.54 samples/sec Loss 1.2953 LearningRate 0.000061 Epoch: 31 Global Step: 644490 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:58:16,645-Speed 2497.94 samples/sec Loss 1.2736 LearningRate 0.000061 Epoch: 31 Global Step: 644500 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:58:24,846-Speed 2497.80 samples/sec Loss 1.2637 LearningRate 0.000061 Epoch: 31 Global Step: 644510 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:58:33,048-Speed 2497.64 samples/sec Loss 1.3061 LearningRate 0.000061 Epoch: 31 Global Step: 644520 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:58:41,197-Speed 2513.65 samples/sec Loss 1.2962 LearningRate 0.000061 Epoch: 31 Global Step: 644530 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:58:49,395-Speed 2498.45 samples/sec Loss 1.2613 LearningRate 0.000061 Epoch: 31 Global Step: 644540 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:58:57,597-Speed 2497.36 samples/sec Loss 1.2876 LearningRate 0.000061 Epoch: 31 Global Step: 644550 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:59:05,801-Speed 2496.53 samples/sec Loss 1.3256 LearningRate 0.000061 Epoch: 31 Global Step: 644560 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:59:14,010-Speed 2495.33 samples/sec Loss 1.2904 LearningRate 0.000061 Epoch: 31 Global Step: 644570 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:59:22,210-Speed 2498.03 samples/sec Loss 1.2971 LearningRate 0.000061 Epoch: 31 Global Step: 644580 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:59:30,357-Speed 2514.18 samples/sec Loss 1.2827 LearningRate 0.000061 Epoch: 31 Global Step: 644590 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:59:38,557-Speed 2497.91 samples/sec Loss 1.3010 LearningRate 0.000061 Epoch: 31 Global Step: 644600 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:59:46,755-Speed 2498.60 samples/sec Loss 1.2748 LearningRate 0.000061 Epoch: 31 Global Step: 644610 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 17:59:54,958-Speed 2497.07 samples/sec Loss 1.2665 LearningRate 0.000061 Epoch: 31 Global Step: 644620 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:00:03,155-Speed 2499.15 samples/sec Loss 1.3083 LearningRate 0.000061 Epoch: 31 Global Step: 644630 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:00:11,350-Speed 2499.46 samples/sec Loss 1.3022 LearningRate 0.000061 Epoch: 31 Global Step: 644640 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:00:19,493-Speed 2515.15 samples/sec Loss 1.2658 LearningRate 0.000061 Epoch: 31 Global Step: 644650 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:00:27,694-Speed 2497.80 samples/sec Loss 1.3114 LearningRate 0.000061 Epoch: 31 Global Step: 644660 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:00:35,901-Speed 2495.82 samples/sec Loss 1.3120 LearningRate 0.000061 Epoch: 31 Global Step: 644670 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:00:44,101-Speed 2497.92 samples/sec Loss 1.2792 LearningRate 0.000061 Epoch: 31 Global Step: 644680 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:00:52,314-Speed 2494.76 samples/sec Loss 1.2868 LearningRate 0.000061 Epoch: 31 Global Step: 644690 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:01:00,513-Speed 2498.32 samples/sec Loss 1.2738 LearningRate 0.000061 Epoch: 31 Global Step: 644700 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:01:08,678-Speed 2508.90 samples/sec Loss 1.3211 LearningRate 0.000061 Epoch: 31 Global Step: 644710 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:01:16,877-Speed 2498.20 samples/sec Loss 1.2858 LearningRate 0.000061 Epoch: 31 Global Step: 644720 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:01:25,079-Speed 2497.31 samples/sec Loss 1.2687 LearningRate 0.000061 Epoch: 31 Global Step: 644730 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:01:33,281-Speed 2497.34 samples/sec Loss 1.2969 LearningRate 0.000061 Epoch: 31 Global Step: 644740 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:01:41,484-Speed 2496.90 samples/sec Loss 1.2873 LearningRate 0.000061 Epoch: 31 Global Step: 644750 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:01:49,683-Speed 2498.12 samples/sec Loss 1.2576 LearningRate 0.000061 Epoch: 31 Global Step: 644760 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:01:57,829-Speed 2514.66 samples/sec Loss 1.2572 LearningRate 0.000061 Epoch: 31 Global Step: 644770 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:02:06,031-Speed 2497.33 samples/sec Loss 1.3018 LearningRate 0.000061 Epoch: 31 Global Step: 644780 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:02:14,231-Speed 2497.84 samples/sec Loss 1.3086 LearningRate 0.000061 Epoch: 31 Global Step: 644790 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:02:22,438-Speed 2496.20 samples/sec Loss 1.2826 LearningRate 0.000061 Epoch: 31 Global Step: 644800 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-07-11 18:02:30,636-Speed 2498.61 samples/sec Loss 1.3161 LearningRate 0.000061 Epoch: 31
Global Step: 644810 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:02:38,832-Speed 2499.18 samples/sec Loss 1.2917 LearningRate 0.000061 Epoch: 31 Global Step: 644820 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:02:46,979-Speed 2514.06 samples/sec Loss 1.2572 LearningRate 0.000061 Epoch: 31 Global Step: 644830 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:02:55,178-Speed 2498.38 samples/sec Loss 1.3040 LearningRate 0.000061 Epoch: 31 Global Step: 644840 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:03:03,375-Speed 2498.86 samples/sec Loss 1.2813 LearningRate 0.000061 Epoch: 31 Global Step: 644850 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:03:11,573-Speed 2498.75 samples/sec Loss 1.2915 LearningRate 0.000061 Epoch: 31 Global Step: 644860 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:03:19,771-Speed 2498.32 samples/sec Loss 1.2959 LearningRate 0.000061 Epoch: 31 Global Step: 644870 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:03:27,970-Speed 2498.28 samples/sec Loss 1.2520 LearningRate 0.000061 Epoch: 31 Global Step: 644880 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:03:36,122-Speed 2512.71 samples/sec Loss 1.2997 LearningRate 0.000061 Epoch: 31 Global Step: 644890 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:03:44,321-Speed 2498.32 samples/sec Loss 1.2852 LearningRate 0.000061 Epoch: 31 Global Step: 644900 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:03:52,523-Speed 2497.38 samples/sec Loss 1.2781 LearningRate 0.000061 Epoch: 31 Global Step: 644910 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:04:00,721-Speed 2498.55 samples/sec Loss 1.2738 LearningRate 0.000061 Epoch: 31 Global Step: 644920 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:04:08,933-Speed 2494.22 samples/sec Loss 1.2838 LearningRate 0.000061 Epoch: 31 Global Step: 644930 
Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:04:17,135-Speed 2497.34 samples/sec Loss 1.3193 LearningRate 0.000061 Epoch: 31 Global Step: 644940 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:04:25,284-Speed 2513.74 samples/sec Loss 1.2945 LearningRate 0.000061 Epoch: 31 Global Step: 644950 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:04:33,486-Speed 2497.07 samples/sec Loss 1.2864 LearningRate 0.000061 Epoch: 31 Global Step: 644960 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:04:41,686-Speed 2498.02 samples/sec Loss 1.3057 LearningRate 0.000061 Epoch: 31 Global Step: 644970 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:04:49,898-Speed 2494.24 samples/sec Loss 1.3069 LearningRate 0.000061 Epoch: 31 Global Step: 644980 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:04:58,098-Speed 2497.87 samples/sec Loss 1.2757 LearningRate 0.000061 Epoch: 31 Global Step: 644990 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:05:06,307-Speed 2495.28 samples/sec Loss 1.2860 LearningRate 0.000061 Epoch: 31 Global Step: 645000 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:05:14,452-Speed 2514.90 samples/sec Loss 1.3038 LearningRate 0.000061 Epoch: 31 Global Step: 645010 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:05:22,653-Speed 2497.58 samples/sec Loss 1.2682 LearningRate 0.000061 Epoch: 31 Global Step: 645020 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:05:30,865-Speed 2494.60 samples/sec Loss 1.3013 LearningRate 0.000061 Epoch: 31 Global Step: 645030 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:05:39,064-Speed 2498.36 samples/sec Loss 1.3297 LearningRate 0.000061 Epoch: 31 Global Step: 645040 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:05:47,262-Speed 2498.57 samples/sec Loss 1.3339 LearningRate 0.000061 Epoch: 31 Global Step: 645050 Fp16 Grad Scale: 
8192 Required: 42 hours Training: 2022-07-11 18:05:55,465-Speed 2496.92 samples/sec Loss 1.2855 LearningRate 0.000061 Epoch: 31 Global Step: 645060 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:06:03,615-Speed 2513.63 samples/sec Loss 1.3604 LearningRate 0.000061 Epoch: 31 Global Step: 645070 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:06:11,835-Speed 2491.86 samples/sec Loss 1.2717 LearningRate 0.000061 Epoch: 31 Global Step: 645080 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:06:20,037-Speed 2497.29 samples/sec Loss 1.3104 LearningRate 0.000061 Epoch: 31 Global Step: 645090 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:06:28,235-Speed 2498.53 samples/sec Loss 1.3123 LearningRate 0.000061 Epoch: 31 Global Step: 645100 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:06:36,445-Speed 2494.71 samples/sec Loss 1.2820 LearningRate 0.000061 Epoch: 31 Global Step: 645110 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:06:44,643-Speed 2499.84 samples/sec Loss 1.2941 LearningRate 0.000061 Epoch: 31 Global Step: 645120 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:06:52,793-Speed 2513.68 samples/sec Loss 1.2873 LearningRate 0.000061 Epoch: 31 Global Step: 645130 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:07:01,001-Speed 2495.44 samples/sec Loss 1.3072 LearningRate 0.000061 Epoch: 31 Global Step: 645140 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:07:09,412-Speed 2500.47 samples/sec Loss 1.3127 LearningRate 0.000061 Epoch: 31 Global Step: 645150 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:07:17,653-Speed 2500.75 samples/sec Loss 1.2774 LearningRate 0.000061 Epoch: 31 Global Step: 645160 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:07:25,920-Speed 2499.55 samples/sec Loss 1.2777 LearningRate 0.000061 Epoch: 31 Global Step: 645170 Fp16 Grad Scale: 8192 Required: 42 
hours Training: 2022-07-11 18:07:34,123-Speed 2496.90 samples/sec Loss 1.2895 LearningRate 0.000061 Epoch: 31 Global Step: 645180 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:07:42,300-Speed 2516.06 samples/sec Loss 1.3200 LearningRate 0.000061 Epoch: 31 Global Step: 645190 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:07:50,531-Speed 2500.26 samples/sec Loss 1.2815 LearningRate 0.000061 Epoch: 31 Global Step: 645200 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:07:59,261-Speed 2499.33 samples/sec Loss 1.3102 LearningRate 0.000061 Epoch: 31 Global Step: 645210 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:08:07,462-Speed 2497.56 samples/sec Loss 1.3175 LearningRate 0.000061 Epoch: 31 Global Step: 645220 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:08:15,659-Speed 2499.04 samples/sec Loss 1.3094 LearningRate 0.000061 Epoch: 31 Global Step: 645230 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:08:23,894-Speed 2500.48 samples/sec Loss 1.2825 LearningRate 0.000061 Epoch: 31 Global Step: 645240 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:08:32,080-Speed 2513.02 samples/sec Loss 1.2901 LearningRate 0.000061 Epoch: 31 Global Step: 645250 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:08:40,282-Speed 2497.34 samples/sec Loss 1.2856 LearningRate 0.000061 Epoch: 31 Global Step: 645260 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:08:48,522-Speed 2499.97 samples/sec Loss 1.2734 LearningRate 0.000061 Epoch: 31 Global Step: 645270 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:08:56,768-Speed 2499.49 samples/sec Loss 1.2698 LearningRate 0.000061 Epoch: 31 Global Step: 645280 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:09:04,972-Speed 2496.48 samples/sec Loss 1.2814 LearningRate 0.000061 Epoch: 31 Global Step: 645290 Fp16 Grad Scale: 8192 Required: 42 hours Training: 
2022-07-11 18:09:13,213-Speed 2498.55 samples/sec Loss 1.3186 LearningRate 0.000061 Epoch: 31 Global Step: 645300 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:09:21,415-Speed 2510.65 samples/sec Loss 1.2749 LearningRate 0.000061 Epoch: 31 Global Step: 645310 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:09:29,655-Speed 2499.56 samples/sec Loss 1.3233 LearningRate 0.000061 Epoch: 31 Global Step: 645320 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:09:37,854-Speed 2497.98 samples/sec Loss 1.3065 LearningRate 0.000061 Epoch: 31 Global Step: 645330 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:09:46,089-Speed 2498.33 samples/sec Loss 1.2888 LearningRate 0.000061 Epoch: 31 Global Step: 645340 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:09:54,291-Speed 2496.99 samples/sec Loss 1.3052 LearningRate 0.000061 Epoch: 31 Global Step: 645350 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:10:02,535-Speed 2499.80 samples/sec Loss 1.2911 LearningRate 0.000061 Epoch: 31 Global Step: 645360 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:10:10,737-Speed 2514.85 samples/sec Loss 1.2636 LearningRate 0.000061 Epoch: 31 Global Step: 645370 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:10:18,935-Speed 2498.35 samples/sec Loss 1.2766 LearningRate 0.000061 Epoch: 31 Global Step: 645380 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-07-11 18:10:27,174-Speed 2498.04 samples/sec Loss 1.3002 LearningRate 0.000061 Epoch: 31 Global Step: 645390 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:10:35,415-Speed 2498.82 samples/sec Loss 1.3211 LearningRate 0.000061 Epoch: 31 Global Step: 645400 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:10:43,615-Speed 2497.69 samples/sec Loss 1.3056 LearningRate 0.000061 Epoch: 31 Global Step: 645410 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:10:51,872-Speed 2497.28 samples/sec Loss 1.2894 LearningRate 0.000061 Epoch: 31 Global Step: 645420 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:11:00,071-Speed 2515.61 samples/sec Loss 1.2971 LearningRate 0.000061 Epoch: 31 Global Step: 645430 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:11:08,322-Speed 2498.50 samples/sec Loss 1.2820 LearningRate 0.000061 Epoch: 31 Global Step: 645440 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:11:16,524-Speed 2497.43 samples/sec Loss 1.3285 LearningRate 0.000061 Epoch: 31 Global Step: 645450 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:11:24,725-Speed 2497.76 samples/sec Loss 1.2972 LearningRate 0.000061 Epoch: 31 Global Step: 645460 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:11:32,924-Speed 2497.98 samples/sec Loss 1.2962 LearningRate 0.000061 Epoch: 31 Global Step: 645470 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:11:41,129-Speed 2496.67 samples/sec Loss 1.2417 LearningRate 0.000061 Epoch: 31 Global Step: 645480 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:11:49,281-Speed 2512.56 samples/sec Loss 1.2741 LearningRate 0.000061 Epoch: 31 Global Step: 645490 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:11:57,487-Speed 2496.02 samples/sec Loss 1.2827 LearningRate 0.000061 Epoch: 31 Global Step: 645500 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:12:05,689-Speed 2497.57 samples/sec Loss 1.2927 LearningRate 0.000061 Epoch: 31 Global Step: 645510 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:12:13,900-Speed 2494.49 samples/sec Loss 1.3009 LearningRate 0.000061 Epoch: 31 Global Step: 645520 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:12:22,101-Speed 2497.66 samples/sec Loss 1.3066 LearningRate 0.000061 Epoch: 31 Global Step: 645530 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:12:30,305-Speed 2496.93 samples/sec Loss 1.2859 LearningRate 0.000061 Epoch: 31 Global Step: 645540 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:12:38,449-Speed 2515.15 samples/sec Loss 1.2469 LearningRate 0.000061 Epoch: 31 Global Step: 645550 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:12:46,652-Speed 2496.96 samples/sec Loss 1.2805 LearningRate 0.000061 Epoch: 31 Global Step: 645560 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:12:54,856-Speed 2497.06 samples/sec Loss 1.3147 LearningRate 0.000061 Epoch: 31 Global Step: 645570 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:13:03,058-Speed 2497.58 samples/sec Loss 1.2756 LearningRate 0.000061 Epoch: 31 Global Step: 645580 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:13:11,259-Speed 2497.88 samples/sec Loss 1.2900 LearningRate 0.000061 Epoch: 31 Global Step: 645590 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:13:19,459-Speed 2497.95 samples/sec Loss 1.2859 LearningRate 0.000061 Epoch: 31 Global Step: 645600 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:13:27,605-Speed 2514.20 samples/sec Loss 1.2680 LearningRate 0.000061 Epoch: 31 Global Step: 645610 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:13:35,803-Speed 2498.79 samples/sec Loss 1.3155 LearningRate 0.000061 Epoch: 31 Global Step: 645620 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:13:44,003-Speed 2498.02 samples/sec Loss 1.2944 LearningRate 0.000061 Epoch: 31 Global Step: 645630 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:13:52,205-Speed 2497.00 samples/sec Loss 1.2748 LearningRate 0.000061 Epoch: 31 Global Step: 645640 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:14:00,402-Speed 2499.13 samples/sec Loss 1.2756 LearningRate 0.000061 Epoch: 31 Global Step: 645650 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:14:08,601-Speed 2498.19 samples/sec Loss 1.2764 LearningRate 0.000061 Epoch: 31 Global Step: 645660 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:14:16,743-Speed 2515.90 samples/sec Loss 1.2596 LearningRate 0.000061 Epoch: 31 Global Step: 645670 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:14:24,945-Speed 2497.27 samples/sec Loss 1.2966 LearningRate 0.000061 Epoch: 31 Global Step: 645680 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:14:33,146-Speed 2497.48 samples/sec Loss 1.3124 LearningRate 0.000061 Epoch: 31 Global Step: 645690 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:14:41,344-Speed 2498.91 samples/sec Loss 1.2449 LearningRate 0.000061 Epoch: 31 Global Step: 645700 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:14:49,549-Speed 2497.22 samples/sec Loss 1.3009 LearningRate 0.000061 Epoch: 31 Global Step: 645710 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:14:57,755-Speed 2495.85 samples/sec Loss 1.2954 LearningRate 0.000061 Epoch: 31 Global Step: 645720 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:15:05,903-Speed 2514.16 samples/sec Loss 1.2826 LearningRate 0.000061 Epoch: 31 Global Step: 645730 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:15:14,114-Speed 2494.69 samples/sec Loss 1.3155 LearningRate 0.000061 Epoch: 31 Global Step: 645740 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:15:22,312-Speed 2498.55 samples/sec Loss 1.2428 LearningRate 0.000061 Epoch: 31 Global Step: 645750 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:15:30,521-Speed 2495.06 samples/sec Loss 1.2712 LearningRate 0.000061 Epoch: 31 Global Step: 645760 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:15:38,724-Speed 2497.10 samples/sec Loss 1.2640 LearningRate 0.000061 Epoch: 31 Global Step: 645770 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:15:46,925-Speed 2497.82 samples/sec Loss 1.2985 LearningRate 0.000061 Epoch: 31 Global Step: 645780 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:15:55,076-Speed 2512.87 samples/sec Loss 1.2837 LearningRate 0.000061 Epoch: 31 Global Step: 645790 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:16:03,275-Speed 2498.38 samples/sec Loss 1.2932 LearningRate 0.000061 Epoch: 31 Global Step: 645800 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:16:11,477-Speed 2497.18 samples/sec Loss 1.2605 LearningRate 0.000061 Epoch: 31 Global Step: 645810 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:16:19,704-Speed 2489.70 samples/sec Loss 1.2909 LearningRate 0.000061 Epoch: 31 Global Step: 645820 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:16:27,907-Speed 2496.91 samples/sec Loss 1.2711 LearningRate 0.000061 Epoch: 31 Global Step: 645830 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:16:36,113-Speed 2496.05 samples/sec Loss 1.3197 LearningRate 0.000061 Epoch: 31 Global Step: 645840 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:16:44,261-Speed 2514.09 samples/sec Loss 1.2901 LearningRate 0.000061 Epoch: 31 Global Step: 645850 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:16:52,471-Speed 2494.83 samples/sec Loss 1.2662 LearningRate 0.000061 Epoch: 31 Global Step: 645860 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:17:00,673-Speed 2497.51 samples/sec Loss 1.2783 LearningRate 0.000061 Epoch: 31 Global Step: 645870 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:17:08,875-Speed 2497.24 samples/sec Loss 1.3032 LearningRate 0.000061 Epoch: 31 Global Step: 645880 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:17:17,079-Speed 2496.79 samples/sec Loss 1.2780 LearningRate 0.000061 Epoch: 31 Global Step: 645890 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:17:25,293-Speed 2493.97 samples/sec Loss 1.2815 LearningRate 0.000061 Epoch: 31 Global Step: 645900 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:17:33,442-Speed 2513.32 samples/sec Loss 1.3121 LearningRate 0.000061 Epoch: 31 Global Step: 645910 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:17:41,644-Speed 2497.48 samples/sec Loss 1.3025 LearningRate 0.000061 Epoch: 31 Global Step: 645920 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:17:49,846-Speed 2497.25 samples/sec Loss 1.3340 LearningRate 0.000060 Epoch: 31 Global Step: 645930 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:17:58,048-Speed 2497.72 samples/sec Loss 1.2826 LearningRate 0.000060 Epoch: 31 Global Step: 645940 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:18:06,263-Speed 2493.12 samples/sec Loss 1.3182 LearningRate 0.000060 Epoch: 31 Global Step: 645950 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:18:14,466-Speed 2496.99 samples/sec Loss 1.2873 LearningRate 0.000060 Epoch: 31 Global Step: 645960 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:18:22,615-Speed 2513.92 samples/sec Loss 1.2972 LearningRate 0.000060 Epoch: 31 Global Step: 645970 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:18:30,818-Speed 2496.90 samples/sec Loss 1.2812 LearningRate 0.000060 Epoch: 31 Global Step: 645980 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:18:39,028-Speed 2494.91 samples/sec Loss 1.3083 LearningRate 0.000060 Epoch: 31 Global Step: 645990 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:18:47,236-Speed 2495.56 samples/sec Loss 1.2818 LearningRate 0.000060 Epoch: 31 Global Step: 646000 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:18:55,443-Speed 2495.66 samples/sec Loss 1.3092 LearningRate 0.000060 Epoch: 31 Global Step: 646010 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:19:03,646-Speed 2497.07 samples/sec Loss 1.2833 LearningRate 0.000060 Epoch: 31 Global Step: 646020 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:19:11,800-Speed 2512.02 samples/sec Loss 1.3164 LearningRate 0.000060 Epoch: 31 Global Step: 646030 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:19:20,003-Speed 2497.30 samples/sec Loss 1.3084 LearningRate 0.000060 Epoch: 31 Global Step: 646040 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:19:28,206-Speed 2497.12 samples/sec Loss 1.2950 LearningRate 0.000060 Epoch: 31 Global Step: 646050 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:19:36,410-Speed 2496.60 samples/sec Loss 1.3189 LearningRate 0.000060 Epoch: 31 Global Step: 646060 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:19:44,609-Speed 2498.27 samples/sec Loss 1.2999 LearningRate 0.000060 Epoch: 31 Global Step: 646070 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:19:52,817-Speed 2495.72 samples/sec Loss 1.2853 LearningRate 0.000060 Epoch: 31 Global Step: 646080 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:20:00,975-Speed 2510.95 samples/sec Loss 1.2982 LearningRate 0.000060 Epoch: 31 Global Step: 646090 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:20:09,175-Speed 2497.84 samples/sec Loss 1.2801 LearningRate 0.000060 Epoch: 31 Global Step: 646100 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:20:17,388-Speed 2494.17 samples/sec Loss 1.2798 LearningRate 0.000060 Epoch: 31 Global Step: 646110 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:20:25,594-Speed 2496.51 samples/sec Loss 1.2874 LearningRate 0.000060 Epoch: 31 Global Step: 646120 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:20:33,795-Speed 2497.52 samples/sec Loss 1.2742 LearningRate 0.000060 Epoch: 31 Global Step: 646130 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:20:41,998-Speed 2497.09 samples/sec Loss 1.2995 LearningRate 0.000060 Epoch: 31 Global Step: 646140 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:20:50,160-Speed 2509.54 samples/sec Loss 1.2964 LearningRate 0.000060 Epoch: 31 Global Step: 646150 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:20:58,366-Speed 2495.99 samples/sec Loss 1.3025 LearningRate 0.000060 Epoch: 31 Global Step: 646160 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:21:06,565-Speed 2498.37 samples/sec Loss 1.2899 LearningRate 0.000060 Epoch: 31 Global Step: 646170 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:21:14,769-Speed 2496.86 samples/sec Loss 1.2584 LearningRate 0.000060 Epoch: 31 Global Step: 646180 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:21:22,972-Speed 2496.80 samples/sec Loss 1.3110 LearningRate 0.000060 Epoch: 31 Global Step: 646190 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:21:31,171-Speed 2498.24 samples/sec Loss 1.3012 LearningRate 0.000060 Epoch: 31 Global Step: 646200 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:21:39,319-Speed 2514.07 samples/sec Loss 1.3224 LearningRate 0.000060 Epoch: 31 Global Step: 646210 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:21:47,518-Speed 2498.26 samples/sec Loss 1.2977 LearningRate 0.000060 Epoch: 31 Global Step: 646220 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:21:55,718-Speed 2498.00 samples/sec Loss 1.3087 LearningRate 0.000060 Epoch: 31 Global Step: 646230 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:22:03,927-Speed 2495.25 samples/sec Loss 1.3540 LearningRate 0.000060 Epoch: 31 Global Step: 646240 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:22:12,131-Speed 2496.80 samples/sec Loss 1.3036 LearningRate 0.000060 Epoch: 31 Global Step: 646250 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:22:20,329-Speed 2498.48 samples/sec Loss 1.2757 LearningRate 0.000060 Epoch: 31 Global Step: 646260 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:22:28,480-Speed 2512.92 samples/sec Loss 1.3030 LearningRate 0.000060 Epoch: 31 Global Step: 646270 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:22:36,683-Speed 2496.91 samples/sec Loss 1.3187 LearningRate 0.000060 Epoch: 31 Global Step: 646280 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:22:44,887-Speed 2496.84 samples/sec Loss 1.2632 LearningRate 0.000060 Epoch: 31 Global Step: 646290 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:22:53,092-Speed 2496.35 samples/sec Loss 1.3135 LearningRate 0.000060 Epoch: 31 Global Step: 646300 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:23:01,293-Speed 2497.84 samples/sec Loss 1.2721 LearningRate 0.000060 Epoch: 31 Global Step: 646310 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:23:09,495-Speed 2497.56 samples/sec Loss 1.2987 LearningRate 0.000060 Epoch: 31 Global Step: 646320 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:23:17,643-Speed 2513.55 samples/sec Loss 1.3070 LearningRate 0.000060 Epoch: 31 Global Step: 646330 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:23:25,846-Speed 2497.32 samples/sec Loss 1.2693 LearningRate 0.000060 Epoch: 31 Global Step: 646340 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:23:34,061-Speed 2493.85 samples/sec Loss 1.3056 LearningRate 0.000060 Epoch: 31 Global Step: 646350 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:23:42,261-Speed 2497.85 samples/sec Loss 1.2652 LearningRate 0.000060 Epoch: 31 Global Step: 646360 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:23:50,464-Speed 2497.16 samples/sec Loss 1.2997 LearningRate 0.000060 Epoch: 31 Global Step: 646370 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:23:58,665-Speed 2497.79 samples/sec Loss 1.3049 LearningRate 0.000060 Epoch: 31 Global Step: 646380 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:24:06,810-Speed 2514.70 samples/sec Loss 1.2751 LearningRate 0.000060 Epoch: 31 Global Step: 646390 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:24:15,015-Speed 2496.58 samples/sec Loss 1.2827 LearningRate 0.000060 Epoch: 31 Global Step: 646400 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:24:23,211-Speed 2499.19 samples/sec Loss 1.2590 LearningRate 0.000060 Epoch: 31 Global Step: 646410 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:24:31,411-Speed 2498.06 samples/sec Loss 1.2660 LearningRate 0.000060 Epoch: 31 Global Step: 646420 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:24:39,624-Speed 2494.23 samples/sec Loss 1.2875 LearningRate 0.000060 Epoch: 31 Global Step: 646430 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:24:47,834-Speed 2494.70 samples/sec Loss 1.2846 LearningRate 0.000060 Epoch: 31 Global Step: 646440 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:24:55,980-Speed 2514.29 samples/sec Loss 1.2843 LearningRate 0.000060 Epoch: 31 Global Step: 646450 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:25:04,183-Speed 2497.09 samples/sec Loss 1.3275 LearningRate 0.000060 Epoch: 31 Global Step: 646460 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:25:12,383-Speed 2497.93 samples/sec Loss 1.2918 LearningRate 0.000060 Epoch: 31 Global Step: 646470 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:25:20,588-Speed 2496.46 samples/sec Loss 1.2898 LearningRate 0.000060 Epoch: 31 Global Step: 646480 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:25:28,789-Speed 2497.46 samples/sec Loss 1.2830 LearningRate 0.000060 Epoch: 31 Global Step: 646490 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:25:36,992-Speed 2497.28 samples/sec Loss 1.2975 LearningRate 0.000060 Epoch: 31 Global Step: 646500 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:25:45,151-Speed 2510.41 samples/sec Loss 1.2765 LearningRate 0.000060 Epoch: 31 Global Step: 646510 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:25:53,365-Speed 2493.46 samples/sec Loss 1.3059 LearningRate 0.000060 Epoch: 31 Global Step: 646520 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:26:01,566-Speed 2497.86 samples/sec Loss 1.3235 LearningRate 0.000060 Epoch: 31 Global Step: 646530 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:26:09,766-Speed 2498.03 samples/sec Loss 1.2913 LearningRate 0.000060 Epoch: 31 Global Step: 646540 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:26:17,966-Speed 2497.91 samples/sec Loss 1.2903 LearningRate 0.000060 Epoch: 31 Global Step: 646550 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:26:26,169-Speed 2497.00 samples/sec Loss 1.3280 LearningRate 0.000060 Epoch: 31 Global Step: 646560 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:26:34,315-Speed 2514.47 samples/sec Loss 1.2790 LearningRate 0.000060 Epoch: 31 Global Step: 646570 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:26:42,517-Speed 2497.41 samples/sec Loss 1.2815 LearningRate 0.000060 Epoch: 31 Global Step: 646580 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:26:50,718-Speed 2497.58 samples/sec Loss 1.2405 LearningRate 0.000060 Epoch: 31 Global Step: 646590 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:26:58,919-Speed 2497.48 samples/sec Loss 1.2824 LearningRate 0.000060 Epoch: 31 Global Step: 646600 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:27:07,121-Speed 2497.49 samples/sec Loss 1.2711 LearningRate 0.000060 Epoch: 31 Global Step: 646610 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 
18:27:15,325-Speed 2496.67 samples/sec Loss 1.2890 LearningRate 0.000060 Epoch: 31 Global Step: 646620 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:27:23,474-Speed 2513.64 samples/sec Loss 1.2891 LearningRate 0.000060 Epoch: 31 Global Step: 646630 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:27:31,675-Speed 2497.99 samples/sec Loss 1.2947 LearningRate 0.000060 Epoch: 31 Global Step: 646640 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:27:39,891-Speed 2492.85 samples/sec Loss 1.2993 LearningRate 0.000060 Epoch: 31 Global Step: 646650 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:27:48,097-Speed 2496.64 samples/sec Loss 1.2799 LearningRate 0.000060 Epoch: 31 Global Step: 646660 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:27:56,299-Speed 2497.24 samples/sec Loss 1.2964 LearningRate 0.000060 Epoch: 31 Global Step: 646670 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:28:04,498-Speed 2498.16 samples/sec Loss 1.2895 LearningRate 0.000060 Epoch: 31 Global Step: 646680 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:28:12,646-Speed 2514.01 samples/sec Loss 1.3050 LearningRate 0.000060 Epoch: 31 Global Step: 646690 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:28:20,849-Speed 2496.83 samples/sec Loss 1.2957 LearningRate 0.000060 Epoch: 31 Global Step: 646700 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:28:29,048-Speed 2498.31 samples/sec Loss 1.2567 LearningRate 0.000060 Epoch: 31 Global Step: 646710 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:28:37,248-Speed 2498.02 samples/sec Loss 1.2827 LearningRate 0.000060 Epoch: 31 Global Step: 646720 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:28:45,459-Speed 2494.62 samples/sec Loss 1.3091 LearningRate 0.000060 Epoch: 31 Global Step: 646730 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 
18:28:53,658-Speed 2498.54 samples/sec Loss 1.2951 LearningRate 0.000060 Epoch: 31 Global Step: 646740 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:29:01,805-Speed 2514.16 samples/sec Loss 1.3010 LearningRate 0.000060 Epoch: 31 Global Step: 646750 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:29:10,003-Speed 2498.51 samples/sec Loss 1.2810 LearningRate 0.000060 Epoch: 31 Global Step: 646760 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:29:18,203-Speed 2498.16 samples/sec Loss 1.2585 LearningRate 0.000060 Epoch: 31 Global Step: 646770 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:29:26,406-Speed 2497.13 samples/sec Loss 1.2638 LearningRate 0.000060 Epoch: 31 Global Step: 646780 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:29:34,602-Speed 2499.02 samples/sec Loss 1.2553 LearningRate 0.000060 Epoch: 31 Global Step: 646790 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:29:42,801-Speed 2498.20 samples/sec Loss 1.2897 LearningRate 0.000060 Epoch: 31 Global Step: 646800 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:29:50,946-Speed 2514.70 samples/sec Loss 1.3193 LearningRate 0.000060 Epoch: 31 Global Step: 646810 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:29:59,148-Speed 2497.81 samples/sec Loss 1.2857 LearningRate 0.000060 Epoch: 31 Global Step: 646820 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:30:07,350-Speed 2496.98 samples/sec Loss 1.2999 LearningRate 0.000060 Epoch: 31 Global Step: 646830 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:30:15,549-Speed 2498.41 samples/sec Loss 1.3071 LearningRate 0.000060 Epoch: 31 Global Step: 646840 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:30:23,751-Speed 2497.64 samples/sec Loss 1.2671 LearningRate 0.000060 Epoch: 31 Global Step: 646850 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 
18:30:31,956-Speed 2496.50 samples/sec Loss 1.2837 LearningRate 0.000060 Epoch: 31 Global Step: 646860 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:30:40,099-Speed 2515.09 samples/sec Loss 1.2850 LearningRate 0.000060 Epoch: 31 Global Step: 646870 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:30:48,301-Speed 2497.30 samples/sec Loss 1.2458 LearningRate 0.000060 Epoch: 31 Global Step: 646880 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:30:56,501-Speed 2498.14 samples/sec Loss 1.2892 LearningRate 0.000060 Epoch: 31 Global Step: 646890 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:31:04,710-Speed 2495.23 samples/sec Loss 1.3021 LearningRate 0.000060 Epoch: 31 Global Step: 646900 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:31:12,909-Speed 2498.09 samples/sec Loss 1.2908 LearningRate 0.000060 Epoch: 31 Global Step: 646910 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:31:21,112-Speed 2497.21 samples/sec Loss 1.2777 LearningRate 0.000060 Epoch: 31 Global Step: 646920 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:31:29,261-Speed 2513.74 samples/sec Loss 1.2604 LearningRate 0.000060 Epoch: 31 Global Step: 646930 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:31:37,459-Speed 2498.85 samples/sec Loss 1.3042 LearningRate 0.000060 Epoch: 31 Global Step: 646940 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:31:45,665-Speed 2496.12 samples/sec Loss 1.3064 LearningRate 0.000060 Epoch: 31 Global Step: 646950 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:31:53,861-Speed 2499.68 samples/sec Loss 1.3103 LearningRate 0.000060 Epoch: 31 Global Step: 646960 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:32:02,064-Speed 2497.03 samples/sec Loss 1.3206 LearningRate 0.000060 Epoch: 31 Global Step: 646970 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 
18:32:10,267-Speed 2497.11 samples/sec Loss 1.2986 LearningRate 0.000060 Epoch: 31 Global Step: 646980 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:32:18,424-Speed 2510.96 samples/sec Loss 1.2579 LearningRate 0.000060 Epoch: 31 Global Step: 646990 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:32:26,622-Speed 2498.38 samples/sec Loss 1.3007 LearningRate 0.000060 Epoch: 31 Global Step: 647000 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:32:34,819-Speed 2499.19 samples/sec Loss 1.2939 LearningRate 0.000060 Epoch: 31 Global Step: 647010 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:32:43,018-Speed 2498.16 samples/sec Loss 1.2937 LearningRate 0.000060 Epoch: 31 Global Step: 647020 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:32:51,230-Speed 2494.38 samples/sec Loss 1.3156 LearningRate 0.000060 Epoch: 31 Global Step: 647030 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:32:59,430-Speed 2498.24 samples/sec Loss 1.2655 LearningRate 0.000060 Epoch: 31 Global Step: 647040 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:33:07,576-Speed 2514.54 samples/sec Loss 1.3036 LearningRate 0.000060 Epoch: 31 Global Step: 647050 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:33:15,780-Speed 2496.80 samples/sec Loss 1.2894 LearningRate 0.000060 Epoch: 31 Global Step: 647060 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:33:23,978-Speed 2498.55 samples/sec Loss 1.2836 LearningRate 0.000060 Epoch: 31 Global Step: 647070 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:33:32,182-Speed 2496.79 samples/sec Loss 1.2808 LearningRate 0.000060 Epoch: 31 Global Step: 647080 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:33:40,382-Speed 2497.82 samples/sec Loss 1.2764 LearningRate 0.000060 Epoch: 31 Global Step: 647090 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 
18:33:48,591-Speed 2495.46 samples/sec Loss 1.2143 LearningRate 0.000060 Epoch: 31 Global Step: 647100 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:33:56,736-Speed 2514.85 samples/sec Loss 1.2619 LearningRate 0.000060 Epoch: 31 Global Step: 647110 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:34:04,946-Speed 2494.85 samples/sec Loss 1.2772 LearningRate 0.000060 Epoch: 31 Global Step: 647120 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:34:13,144-Speed 2498.77 samples/sec Loss 1.2640 LearningRate 0.000060 Epoch: 31 Global Step: 647130 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:34:21,352-Speed 2495.36 samples/sec Loss 1.2907 LearningRate 0.000060 Epoch: 31 Global Step: 647140 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:34:29,564-Speed 2494.27 samples/sec Loss 1.2923 LearningRate 0.000060 Epoch: 31 Global Step: 647150 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:34:37,776-Speed 2494.39 samples/sec Loss 1.2872 LearningRate 0.000060 Epoch: 31 Global Step: 647160 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:34:45,921-Speed 2514.89 samples/sec Loss 1.2970 LearningRate 0.000060 Epoch: 31 Global Step: 647170 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:34:54,134-Speed 2493.94 samples/sec Loss 1.2751 LearningRate 0.000060 Epoch: 31 Global Step: 647180 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:35:02,333-Speed 2498.14 samples/sec Loss 1.2756 LearningRate 0.000060 Epoch: 31 Global Step: 647190 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:35:10,533-Speed 2497.86 samples/sec Loss 1.2870 LearningRate 0.000060 Epoch: 31 Global Step: 647200 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-07-11 18:35:18,689-Speed 2511.61 samples/sec Loss 1.3395 LearningRate 0.000060 Epoch: 31 Global Step: 647210 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:35:26,901-Speed 2494.41 samples/sec Loss 1.2729 LearningRate 0.000060 Epoch: 31 Global Step: 647220 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:35:35,046-Speed 2514.82 samples/sec Loss 1.2937 LearningRate 0.000060 Epoch: 31 Global Step: 647230 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:35:43,247-Speed 2497.85 samples/sec Loss 1.2822 LearningRate 0.000060 Epoch: 31 Global Step: 647240 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:35:51,443-Speed 2499.00 samples/sec Loss 1.3143 LearningRate 0.000060 Epoch: 31 Global Step: 647250 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:35:59,643-Speed 2498.01 samples/sec Loss 1.3091 LearningRate 0.000060 Epoch: 31 Global Step: 647260 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:36:07,841-Speed 2498.59 samples/sec Loss 1.3016 LearningRate 0.000060 Epoch: 31 Global Step: 647270 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:36:16,051-Speed 2494.84 samples/sec Loss 1.2663 LearningRate 0.000060 Epoch: 31 Global Step: 647280 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:36:24,194-Speed 2515.42 samples/sec Loss 1.3098 LearningRate 0.000060 Epoch: 31 Global Step: 647290 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:36:32,393-Speed 2498.24 samples/sec Loss 1.3320 LearningRate 0.000060 Epoch: 31 Global Step: 647300 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:36:40,593-Speed 2497.92 samples/sec Loss 1.2929 LearningRate 0.000060 Epoch: 31 Global Step: 647310 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:36:48,803-Speed 2495.40 samples/sec Loss 1.2981 LearningRate 0.000060 Epoch: 31 Global Step: 647320 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:36:57,005-Speed 2497.44 samples/sec Loss 1.2771 LearningRate 0.000060 Epoch: 31 Global Step: 647330 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:37:05,207-Speed 2497.45 samples/sec Loss 1.2323 LearningRate 0.000060 Epoch: 31 Global Step: 647340 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:37:13,352-Speed 2514.95 samples/sec Loss 1.3018 LearningRate 0.000060 Epoch: 31 Global Step: 647350 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:37:21,557-Speed 2496.72 samples/sec Loss 1.2683 LearningRate 0.000060 Epoch: 31 Global Step: 647360 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:37:29,755-Speed 2498.38 samples/sec Loss 1.3190 LearningRate 0.000060 Epoch: 31 Global Step: 647370 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:37:37,954-Speed 2498.38 samples/sec Loss 1.2736 LearningRate 0.000060 Epoch: 31 Global Step: 647380 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:37:46,152-Speed 2498.43 samples/sec Loss 1.2672 LearningRate 0.000060 Epoch: 31 Global Step: 647390 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:37:54,351-Speed 2498.39 samples/sec Loss 1.3102 LearningRate 0.000060 Epoch: 31 Global Step: 647400 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:38:02,497-Speed 2514.56 samples/sec Loss 1.2804 LearningRate 0.000060 Epoch: 31 Global Step: 647410 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:38:10,697-Speed 2497.89 samples/sec Loss 1.2938 LearningRate 0.000060 Epoch: 31 Global Step: 647420 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:38:18,894-Speed 2499.05 samples/sec Loss 1.2954 LearningRate 0.000060 Epoch: 31 Global Step: 647430 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:38:27,098-Speed 2496.62 samples/sec Loss 1.2735 LearningRate 0.000060 Epoch: 31 Global Step: 647440 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:38:35,298-Speed 2498.07 samples/sec Loss 1.2720 LearningRate 0.000059 Epoch: 31 Global Step: 647450 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:38:43,504-Speed 2495.97 samples/sec Loss 1.2651 LearningRate 0.000059 Epoch: 31 Global Step: 647460 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:38:51,649-Speed 2514.96 samples/sec Loss 1.2891 LearningRate 0.000059 Epoch: 31 Global Step: 647470 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:38:59,849-Speed 2497.87 samples/sec Loss 1.3240 LearningRate 0.000059 Epoch: 31 Global Step: 647480 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:39:08,071-Speed 2491.15 samples/sec Loss 1.2532 LearningRate 0.000059 Epoch: 31 Global Step: 647490 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:39:16,268-Speed 2499.21 samples/sec Loss 1.3222 LearningRate 0.000059 Epoch: 31 Global Step: 647500 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:39:24,466-Speed 2498.92 samples/sec Loss 1.2878 LearningRate 0.000059 Epoch: 31 Global Step: 647510 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:39:32,673-Speed 2495.88 samples/sec Loss 1.2727 LearningRate 0.000059 Epoch: 31 Global Step: 647520 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:39:40,820-Speed 2514.22 samples/sec Loss 1.2677 LearningRate 0.000059 Epoch: 31 Global Step: 647530 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:39:49,031-Speed 2494.55 samples/sec Loss 1.2641 LearningRate 0.000059 Epoch: 31 Global Step: 647540 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:39:57,232-Speed 2497.89 samples/sec Loss 1.2734 LearningRate 0.000059 Epoch: 31 Global Step: 647550 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:40:05,430-Speed 2498.41 samples/sec Loss 1.2853 LearningRate 0.000059 Epoch: 31 Global Step: 647560 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:40:13,630-Speed 2498.06 samples/sec Loss 1.3122 LearningRate 0.000059 Epoch: 31 Global Step: 647570 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:40:21,840-Speed 2494.85 samples/sec Loss 1.2802 LearningRate 0.000059 Epoch: 31 Global Step: 647580 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:40:29,986-Speed 2514.70 samples/sec Loss 1.2772 LearningRate 0.000059 Epoch: 31 Global Step: 647590 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:40:38,186-Speed 2497.90 samples/sec Loss 1.2822 LearningRate 0.000059 Epoch: 31 Global Step: 647600 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:40:46,393-Speed 2495.85 samples/sec Loss 1.2782 LearningRate 0.000059 Epoch: 31 Global Step: 647610 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:40:54,595-Speed 2497.27 samples/sec Loss 1.2881 LearningRate 0.000059 Epoch: 31 Global Step: 647620 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:41:02,805-Speed 2495.08 samples/sec Loss 1.2728 LearningRate 0.000059 Epoch: 31 Global Step: 647630 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:41:11,004-Speed 2498.06 samples/sec Loss 1.2633 LearningRate 0.000059 Epoch: 31 Global Step: 647640 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:41:19,166-Speed 2509.69 samples/sec Loss 1.2903 LearningRate 0.000059 Epoch: 31 Global Step: 647650 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:41:27,365-Speed 2498.31 samples/sec Loss 1.2489 LearningRate 0.000059 Epoch: 31 Global Step: 647660 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:41:35,567-Speed 2497.29 samples/sec Loss 1.2860 LearningRate 0.000059 Epoch: 31 Global Step: 647670 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:41:43,768-Speed 2498.03 samples/sec Loss 1.3174 LearningRate 0.000059 Epoch: 31 Global Step: 647680 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:41:51,971-Speed 2497.63 samples/sec Loss 1.2789 LearningRate 0.000059 Epoch: 31 Global Step: 647690 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:42:00,180-Speed 2495.65 samples/sec Loss 1.2739 LearningRate 0.000059 Epoch: 31 Global Step: 647700 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:42:08,326-Speed 2514.70 samples/sec Loss 1.2691 LearningRate 0.000059 Epoch: 31 Global Step: 647710 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:42:16,548-Speed 2491.10 samples/sec Loss 1.2528 LearningRate 0.000059 Epoch: 31 Global Step: 647720 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:42:24,747-Speed 2498.34 samples/sec Loss 1.3061 LearningRate 0.000059 Epoch: 31 Global Step: 647730 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:42:32,947-Speed 2497.77 samples/sec Loss 1.2967 LearningRate 0.000059 Epoch: 31 Global Step: 647740 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:42:41,153-Speed 2496.16 samples/sec Loss 1.2704 LearningRate 0.000059 Epoch: 31 Global Step: 647750 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:42:49,355-Speed 2497.33 samples/sec Loss 1.2913 LearningRate 0.000059 Epoch: 31 Global Step: 647760 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:42:57,501-Speed 2514.59 samples/sec Loss 1.2675 LearningRate 0.000059 Epoch: 31 Global Step: 647770 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:43:05,717-Speed 2493.04 samples/sec Loss 1.3019 LearningRate 0.000059 Epoch: 31 Global Step: 647780 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:43:13,920-Speed 2497.18 samples/sec Loss 1.3149 LearningRate 0.000059 Epoch: 31 Global Step: 647790 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:43:22,127-Speed 2495.59 samples/sec Loss 1.2912 LearningRate 0.000059 Epoch: 31 Global Step: 647800 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:43:30,330-Speed 2497.28 samples/sec Loss 1.2652 LearningRate 0.000059 Epoch: 31 Global Step: 647810 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:43:38,526-Speed 2499.17 samples/sec Loss 1.2552 LearningRate 0.000059 Epoch: 31 Global Step: 647820 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:43:46,675-Speed 2513.54 samples/sec Loss 1.2884 LearningRate 0.000059 Epoch: 31 Global Step: 647830 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:43:54,882-Speed 2495.83 samples/sec Loss 1.2852 LearningRate 0.000059 Epoch: 31 Global Step: 647840 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:44:03,087-Speed 2496.68 samples/sec Loss 1.2802 LearningRate 0.000059 Epoch: 31 Global Step: 647850 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:44:11,288-Speed 2497.69 samples/sec Loss 1.3227 LearningRate 0.000059 Epoch: 31 Global Step: 647860 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:44:19,488-Speed 2498.17 samples/sec Loss 1.2941 LearningRate 0.000059 Epoch: 31 Global Step: 647870 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:44:27,691-Speed 2497.29 samples/sec Loss 1.3118 LearningRate 0.000059 Epoch: 31 Global Step: 647880 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:44:35,837-Speed 2514.31 samples/sec Loss 1.2783 LearningRate 0.000059 Epoch: 31 Global Step: 647890 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:44:44,051-Speed 2493.89 samples/sec Loss 1.3035 LearningRate 0.000059 Epoch: 31 Global Step: 647900 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:44:52,251-Speed 2498.75 samples/sec Loss 1.2912 LearningRate 0.000059 Epoch: 31 Global Step: 647910 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:45:00,453-Speed 2497.17 samples/sec Loss 1.3097 LearningRate 0.000059 Epoch: 31 Global Step: 647920 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:45:08,649-Speed 2499.04 samples/sec Loss 1.2743 LearningRate 0.000059 Epoch: 31 Global Step: 647930 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:45:16,854-Speed 2496.80 samples/sec Loss 1.3218 LearningRate 0.000059 Epoch: 31 Global Step: 647940 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:45:24,998-Speed 2514.94 samples/sec Loss 1.2656 LearningRate 0.000059 Epoch: 31 Global Step: 647950 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:45:33,203-Speed 2496.44 samples/sec Loss 1.2677 LearningRate 0.000059 Epoch: 31 Global Step: 647960 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:45:41,402-Speed 2498.27 samples/sec Loss 1.2911 LearningRate 0.000059 Epoch: 31 Global Step: 647970 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:45:49,601-Speed 2498.39 samples/sec Loss 1.2490 LearningRate 0.000059 Epoch: 31 Global Step: 647980 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:45:57,800-Speed 2498.20 samples/sec Loss 1.2820 LearningRate 0.000059 Epoch: 31 Global Step: 647990 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:46:06,006-Speed 2496.06 samples/sec Loss 1.2589 LearningRate 0.000059 Epoch: 31 Global Step: 648000 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:46:14,163-Speed 2511.19 samples/sec Loss 1.2855 LearningRate 0.000059 Epoch: 31 Global Step: 648010 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:46:22,366-Speed 2497.22 samples/sec Loss 1.2771 LearningRate 0.000059 Epoch: 31 Global Step: 648020 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:46:30,567-Speed 2497.70 samples/sec Loss 1.2692 LearningRate 0.000059 Epoch: 31 Global Step: 648030 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:46:38,768-Speed 2497.71 samples/sec Loss 1.2735 LearningRate 0.000059 Epoch: 31 Global Step: 648040 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:46:46,979-Speed 2494.75 samples/sec Loss 1.2861 LearningRate 0.000059 Epoch: 31 Global Step: 648050 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:46:55,181-Speed 2497.26 samples/sec Loss 1.2945 LearningRate 0.000059 Epoch: 31 Global Step: 648060 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:47:03,328-Speed 2514.30 samples/sec Loss 1.2564 LearningRate 0.000059 Epoch: 31 Global Step: 648070 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:47:11,529-Speed 2497.44 samples/sec Loss 1.3092 LearningRate 0.000059 Epoch: 31 Global Step: 648080 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:47:19,737-Speed 2495.82 samples/sec Loss 1.2699 LearningRate 0.000059 Epoch: 31 Global Step: 648090 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:47:27,935-Speed 2498.29 samples/sec Loss 1.2736 LearningRate 0.000059 Epoch: 31 Global Step: 648100 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:47:36,139-Speed 2496.79 samples/sec Loss 1.3028 LearningRate 0.000059 Epoch: 31 Global Step: 648110 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:47:44,345-Speed 2496.61 samples/sec Loss 1.2802 LearningRate 0.000059 Epoch: 31 Global Step: 648120 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:47:52,493-Speed 2513.90 samples/sec Loss 1.2826 LearningRate 0.000059 Epoch: 31 Global Step: 648130 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:48:00,694-Speed 2497.40 samples/sec Loss 1.2759 LearningRate 0.000059 Epoch: 31 Global Step: 648140 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:48:08,902-Speed 2495.59 samples/sec Loss 1.2931 LearningRate 0.000059 Epoch: 31 Global Step: 648150 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:48:17,100-Speed 2498.71 samples/sec Loss 1.2872 LearningRate 0.000059 Epoch: 31 Global Step: 648160 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:48:25,300-Speed 2497.93 samples/sec Loss 1.2672 LearningRate 0.000059 Epoch: 31 Global Step: 648170 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 
18:48:33,498-Speed 2498.41 samples/sec Loss 1.3421 LearningRate 0.000059 Epoch: 31 Global Step: 648180 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:48:41,652-Speed 2512.09 samples/sec Loss 1.2586 LearningRate 0.000059 Epoch: 31 Global Step: 648190 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:48:49,855-Speed 2497.97 samples/sec Loss 1.3125 LearningRate 0.000059 Epoch: 31 Global Step: 648200 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:48:58,083-Speed 2489.64 samples/sec Loss 1.2654 LearningRate 0.000059 Epoch: 31 Global Step: 648210 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:49:06,280-Speed 2498.75 samples/sec Loss 1.2866 LearningRate 0.000059 Epoch: 31 Global Step: 648220 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:49:14,483-Speed 2497.03 samples/sec Loss 1.2939 LearningRate 0.000059 Epoch: 31 Global Step: 648230 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:49:22,695-Speed 2494.10 samples/sec Loss 1.2700 LearningRate 0.000059 Epoch: 31 Global Step: 648240 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:49:30,842-Speed 2514.34 samples/sec Loss 1.2676 LearningRate 0.000059 Epoch: 31 Global Step: 648250 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-07-11 18:49:39,043-Speed 2497.57 samples/sec Loss 1.3274 LearningRate 0.000059 Epoch: 31 Global Step: 648260 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:49:47,251-Speed 2495.71 samples/sec Loss 1.2902 LearningRate 0.000059 Epoch: 31 Global Step: 648270 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:49:55,454-Speed 2497.26 samples/sec Loss 1.2924 LearningRate 0.000059 Epoch: 31 Global Step: 648280 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:50:03,653-Speed 2498.14 samples/sec Loss 1.2981 LearningRate 0.000059 Epoch: 31 Global Step: 648290 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 
18:50:11,860-Speed 2495.91 samples/sec Loss 1.3231 LearningRate 0.000059 Epoch: 31 Global Step: 648300 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:50:20,009-Speed 2513.54 samples/sec Loss 1.2859 LearningRate 0.000059 Epoch: 31 Global Step: 648310 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:50:28,210-Speed 2497.68 samples/sec Loss 1.2793 LearningRate 0.000059 Epoch: 31 Global Step: 648320 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:50:36,413-Speed 2496.80 samples/sec Loss 1.3109 LearningRate 0.000059 Epoch: 31 Global Step: 648330 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:50:44,618-Speed 2496.59 samples/sec Loss 1.2693 LearningRate 0.000059 Epoch: 31 Global Step: 648340 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:50:52,819-Speed 2497.46 samples/sec Loss 1.2723 LearningRate 0.000059 Epoch: 31 Global Step: 648350 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:51:01,033-Speed 2493.78 samples/sec Loss 1.2778 LearningRate 0.000059 Epoch: 31 Global Step: 648360 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:51:09,184-Speed 2513.08 samples/sec Loss 1.2937 LearningRate 0.000059 Epoch: 31 Global Step: 648370 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:51:17,387-Speed 2497.00 samples/sec Loss 1.2918 LearningRate 0.000059 Epoch: 31 Global Step: 648380 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:51:25,594-Speed 2495.90 samples/sec Loss 1.3003 LearningRate 0.000059 Epoch: 31 Global Step: 648390 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:51:33,792-Speed 2498.56 samples/sec Loss 1.2767 LearningRate 0.000059 Epoch: 31 Global Step: 648400 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:51:41,996-Speed 2496.79 samples/sec Loss 1.3008 LearningRate 0.000059 Epoch: 31 Global Step: 648410 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 
18:51:50,197-Speed 2497.72 samples/sec Loss 1.2688 LearningRate 0.000059 Epoch: 31 Global Step: 648420 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:51:58,345-Speed 2513.88 samples/sec Loss 1.3030 LearningRate 0.000059 Epoch: 31 Global Step: 648430 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:52:06,545-Speed 2497.93 samples/sec Loss 1.2641 LearningRate 0.000059 Epoch: 31 Global Step: 648440 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:52:14,750-Speed 2496.57 samples/sec Loss 1.2728 LearningRate 0.000059 Epoch: 31 Global Step: 648450 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:52:22,954-Speed 2496.57 samples/sec Loss 1.3295 LearningRate 0.000059 Epoch: 31 Global Step: 648460 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:52:31,167-Speed 2494.03 samples/sec Loss 1.2880 LearningRate 0.000059 Epoch: 31 Global Step: 648470 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:52:39,365-Speed 2498.61 samples/sec Loss 1.2803 LearningRate 0.000059 Epoch: 31 Global Step: 648480 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:52:47,516-Speed 2513.07 samples/sec Loss 1.3034 LearningRate 0.000059 Epoch: 31 Global Step: 648490 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:52:55,714-Speed 2498.36 samples/sec Loss 1.2605 LearningRate 0.000059 Epoch: 31 Global Step: 648500 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:53:03,913-Speed 2498.43 samples/sec Loss 1.3128 LearningRate 0.000059 Epoch: 31 Global Step: 648510 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:53:12,110-Speed 2498.73 samples/sec Loss 1.3277 LearningRate 0.000059 Epoch: 31 Global Step: 648520 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:53:20,311-Speed 2497.67 samples/sec Loss 1.2954 LearningRate 0.000059 Epoch: 31 Global Step: 648530 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 
18:53:28,509-Speed 2498.61 samples/sec Loss 1.2882 LearningRate 0.000059 Epoch: 31 Global Step: 648540 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:53:36,654-Speed 2514.71 samples/sec Loss 1.2968 LearningRate 0.000059 Epoch: 31 Global Step: 648550 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:53:44,855-Speed 2497.78 samples/sec Loss 1.2930 LearningRate 0.000059 Epoch: 31 Global Step: 648560 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:53:53,055-Speed 2497.87 samples/sec Loss 1.2869 LearningRate 0.000059 Epoch: 31 Global Step: 648570 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:54:01,266-Speed 2494.86 samples/sec Loss 1.3081 LearningRate 0.000059 Epoch: 31 Global Step: 648580 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:54:09,464-Speed 2498.44 samples/sec Loss 1.2621 LearningRate 0.000059 Epoch: 31 Global Step: 648590 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:54:17,674-Speed 2494.82 samples/sec Loss 1.2728 LearningRate 0.000059 Epoch: 31 Global Step: 648600 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:54:25,822-Speed 2514.21 samples/sec Loss 1.2976 LearningRate 0.000059 Epoch: 31 Global Step: 648610 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:54:34,027-Speed 2496.15 samples/sec Loss 1.2885 LearningRate 0.000059 Epoch: 31 Global Step: 648620 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-07-11 18:54:42,181-Speed 2511.96 samples/sec Loss 1.2793 LearningRate 0.000059 Epoch: 31 Global Step: 648630 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:54:50,382-Speed 2497.62 samples/sec Loss 1.3408 LearningRate 0.000059 Epoch: 31 Global Step: 648640 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:54:58,576-Speed 2499.90 samples/sec Loss 1.2840 LearningRate 0.000059 Epoch: 31 Global Step: 648650 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 
18:55:06,787-Speed 2494.37 samples/sec Loss 1.2693 LearningRate 0.000059 Epoch: 31 Global Step: 648660 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:55:14,932-Speed 2514.76 samples/sec Loss 1.2965 LearningRate 0.000059 Epoch: 31 Global Step: 648670 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:55:23,146-Speed 2493.82 samples/sec Loss 1.2831 LearningRate 0.000059 Epoch: 31 Global Step: 648680 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:55:31,357-Speed 2494.45 samples/sec Loss 1.2857 LearningRate 0.000059 Epoch: 31 Global Step: 648690 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:55:39,557-Speed 2497.93 samples/sec Loss 1.2848 LearningRate 0.000059 Epoch: 31 Global Step: 648700 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:55:47,754-Speed 2498.97 samples/sec Loss 1.2941 LearningRate 0.000059 Epoch: 31 Global Step: 648710 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:55:55,952-Speed 2498.59 samples/sec Loss 1.2802 LearningRate 0.000059 Epoch: 31 Global Step: 648720 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:56:04,105-Speed 2512.17 samples/sec Loss 1.2852 LearningRate 0.000059 Epoch: 31 Global Step: 648730 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:56:12,302-Speed 2499.11 samples/sec Loss 1.2788 LearningRate 0.000059 Epoch: 31 Global Step: 648740 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:56:20,504-Speed 2497.97 samples/sec Loss 1.2948 LearningRate 0.000059 Epoch: 31 Global Step: 648750 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:56:28,702-Speed 2498.34 samples/sec Loss 1.2843 LearningRate 0.000059 Epoch: 31 Global Step: 648760 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:56:36,901-Speed 2498.54 samples/sec Loss 1.3013 LearningRate 0.000059 Epoch: 31 Global Step: 648770 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 
18:56:45,122-Speed 2491.26 samples/sec Loss 1.2456 LearningRate 0.000059 Epoch: 31 Global Step: 648780 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:56:53,271-Speed 2513.87 samples/sec Loss 1.2811 LearningRate 0.000059 Epoch: 31 Global Step: 648790 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:57:01,473-Speed 2497.41 samples/sec Loss 1.3192 LearningRate 0.000059 Epoch: 31 Global Step: 648800 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:57:09,672-Speed 2498.14 samples/sec Loss 1.2649 LearningRate 0.000059 Epoch: 31 Global Step: 648810 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:57:17,876-Speed 2496.71 samples/sec Loss 1.3042 LearningRate 0.000059 Epoch: 31 Global Step: 648820 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:57:26,077-Speed 2497.79 samples/sec Loss 1.2786 LearningRate 0.000059 Epoch: 31 Global Step: 648830 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:57:34,277-Speed 2497.92 samples/sec Loss 1.2736 LearningRate 0.000059 Epoch: 31 Global Step: 648840 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:57:42,421-Speed 2515.00 samples/sec Loss 1.2764 LearningRate 0.000059 Epoch: 31 Global Step: 648850 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:57:50,623-Speed 2497.31 samples/sec Loss 1.2880 LearningRate 0.000059 Epoch: 31 Global Step: 648860 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:57:58,827-Speed 2496.83 samples/sec Loss 1.2746 LearningRate 0.000059 Epoch: 31 Global Step: 648870 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:58:07,029-Speed 2497.52 samples/sec Loss 1.2912 LearningRate 0.000059 Epoch: 31 Global Step: 648880 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:58:15,227-Speed 2498.25 samples/sec Loss 1.3048 LearningRate 0.000059 Epoch: 31 Global Step: 648890 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 
18:58:23,426-Speed 2498.48 samples/sec Loss 1.2724 LearningRate 0.000059 Epoch: 31 Global Step: 648900 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:58:31,580-Speed 2511.87 samples/sec Loss 1.2909 LearningRate 0.000059 Epoch: 31 Global Step: 648910 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:58:39,777-Speed 2499.06 samples/sec Loss 1.2796 LearningRate 0.000059 Epoch: 31 Global Step: 648920 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:58:47,976-Speed 2498.27 samples/sec Loss 1.3205 LearningRate 0.000059 Epoch: 31 Global Step: 648930 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:58:56,181-Speed 2496.41 samples/sec Loss 1.3136 LearningRate 0.000059 Epoch: 31 Global Step: 648940 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:59:04,395-Speed 2493.85 samples/sec Loss 1.2778 LearningRate 0.000059 Epoch: 31 Global Step: 648950 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:59:12,593-Speed 2498.25 samples/sec Loss 1.2953 LearningRate 0.000059 Epoch: 31 Global Step: 648960 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:59:20,740-Speed 2514.26 samples/sec Loss 1.2637 LearningRate 0.000059 Epoch: 31 Global Step: 648970 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:59:28,934-Speed 2500.03 samples/sec Loss 1.2826 LearningRate 0.000059 Epoch: 31 Global Step: 648980 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:59:37,133-Speed 2498.31 samples/sec Loss 1.3004 LearningRate 0.000058 Epoch: 31 Global Step: 648990 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:59:45,345-Speed 2494.19 samples/sec Loss 1.2964 LearningRate 0.000058 Epoch: 31 Global Step: 649000 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 18:59:53,544-Speed 2498.39 samples/sec Loss 1.2576 LearningRate 0.000058 Epoch: 31 Global Step: 649010 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 
19:00:01,744-Speed 2497.80 samples/sec Loss 1.2740 LearningRate 0.000058 Epoch: 31 Global Step: 649020 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:00:09,892-Speed 2514.06 samples/sec Loss 1.2820 LearningRate 0.000058 Epoch: 31 Global Step: 649030 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:00:18,093-Speed 2497.56 samples/sec Loss 1.3139 LearningRate 0.000058 Epoch: 31 Global Step: 649040 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:00:26,247-Speed 2511.91 samples/sec Loss 1.2435 LearningRate 0.000058 Epoch: 31 Global Step: 649050 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:00:34,446-Speed 2498.37 samples/sec Loss 1.2836 LearningRate 0.000058 Epoch: 31 Global Step: 649060 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:00:42,647-Speed 2497.63 samples/sec Loss 1.2885 LearningRate 0.000058 Epoch: 31 Global Step: 649070 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:00:50,848-Speed 2497.77 samples/sec Loss 1.3067 LearningRate 0.000058 Epoch: 31 Global Step: 649080 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:00:58,997-Speed 2513.80 samples/sec Loss 1.2739 LearningRate 0.000058 Epoch: 31 Global Step: 649090 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:01:07,201-Speed 2497.06 samples/sec Loss 1.2838 LearningRate 0.000058 Epoch: 31 Global Step: 649100 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:01:15,400-Speed 2498.28 samples/sec Loss 1.3037 LearningRate 0.000058 Epoch: 31 Global Step: 649110 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:01:23,599-Speed 2498.38 samples/sec Loss 1.2803 LearningRate 0.000058 Epoch: 31 Global Step: 649120 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:01:31,800-Speed 2497.59 samples/sec Loss 1.2787 LearningRate 0.000058 Epoch: 31 Global Step: 649130 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 
19:01:39,998-Speed 2498.87 samples/sec Loss 1.2882 LearningRate 0.000058 Epoch: 31 Global Step: 649140 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:01:48,147-Speed 2513.43 samples/sec Loss 1.2647 LearningRate 0.000058 Epoch: 31 Global Step: 649150 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:01:56,349-Speed 2497.43 samples/sec Loss 1.2820 LearningRate 0.000058 Epoch: 31 Global Step: 649160 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:02:04,549-Speed 2497.98 samples/sec Loss 1.3032 LearningRate 0.000058 Epoch: 31 Global Step: 649170 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:02:12,753-Speed 2496.99 samples/sec Loss 1.2734 LearningRate 0.000058 Epoch: 31 Global Step: 649180 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:02:20,954-Speed 2497.55 samples/sec Loss 1.2935 LearningRate 0.000058 Epoch: 31 Global Step: 649190 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:02:29,154-Speed 2498.00 samples/sec Loss 1.2611 LearningRate 0.000058 Epoch: 31 Global Step: 649200 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:02:37,304-Speed 2513.26 samples/sec Loss 1.2537 LearningRate 0.000058 Epoch: 31 Global Step: 649210 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:02:45,510-Speed 2496.16 samples/sec Loss 1.2720 LearningRate 0.000058 Epoch: 31 Global Step: 649220 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:02:53,708-Speed 2498.53 samples/sec Loss 1.2594 LearningRate 0.000058 Epoch: 31 Global Step: 649230 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:03:01,903-Speed 2499.62 samples/sec Loss 1.2933 LearningRate 0.000058 Epoch: 31 Global Step: 649240 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:03:10,101-Speed 2498.64 samples/sec Loss 1.2867 LearningRate 0.000058 Epoch: 31 Global Step: 649250 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:03:18,297-Speed 
2498.92 samples/sec Loss 1.2799 LearningRate 0.000058 Epoch: 31 Global Step: 649260 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:03:26,443-Speed 2514.85 samples/sec Loss 1.2694 LearningRate 0.000058 Epoch: 31 Global Step: 649270 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:03:34,644-Speed 2497.42 samples/sec Loss 1.2923 LearningRate 0.000058 Epoch: 31 Global Step: 649280 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:03:42,843-Speed 2498.34 samples/sec Loss 1.2709 LearningRate 0.000058 Epoch: 31 Global Step: 649290 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:03:51,040-Speed 2499.20 samples/sec Loss 1.2722 LearningRate 0.000058 Epoch: 31 Global Step: 649300 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:03:59,237-Speed 2498.55 samples/sec Loss 1.2354 LearningRate 0.000058 Epoch: 31 Global Step: 649310 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:04:07,439-Speed 2497.31 samples/sec Loss 1.2759 LearningRate 0.000058 Epoch: 31 Global Step: 649320 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:04:15,588-Speed 2513.82 samples/sec Loss 1.2814 LearningRate 0.000058 Epoch: 31 Global Step: 649330 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:04:23,790-Speed 2497.54 samples/sec Loss 1.2987 LearningRate 0.000058 Epoch: 31 Global Step: 649340 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:04:31,990-Speed 2497.97 samples/sec Loss 1.2824 LearningRate 0.000058 Epoch: 31 Global Step: 649350 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:04:40,191-Speed 2498.02 samples/sec Loss 1.3019 LearningRate 0.000058 Epoch: 31 Global Step: 649360 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:04:48,394-Speed 2496.82 samples/sec Loss 1.3010 LearningRate 0.000058 Epoch: 31 Global Step: 649370 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:04:56,605-Speed 2494.69 samples/sec 
Loss 1.2815 LearningRate 0.000058 Epoch: 31 Global Step: 649380 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:05:04,758-Speed 2512.38 samples/sec Loss 1.2474 LearningRate 0.000058 Epoch: 31 Global Step: 649390 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:05:12,956-Speed 2498.65 samples/sec Loss 1.2807 LearningRate 0.000058 Epoch: 31 Global Step: 649400 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:05:21,159-Speed 2496.89 samples/sec Loss 1.2527 LearningRate 0.000058 Epoch: 31 Global Step: 649410 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:05:29,362-Speed 2497.15 samples/sec Loss 1.2677 LearningRate 0.000058 Epoch: 31 Global Step: 649420 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:05:37,562-Speed 2497.91 samples/sec Loss 1.3137 LearningRate 0.000058 Epoch: 31 Global Step: 649430 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:05:45,763-Speed 2497.57 samples/sec Loss 1.2659 LearningRate 0.000058 Epoch: 31 Global Step: 649440 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:05:53,922-Speed 2510.58 samples/sec Loss 1.3057 LearningRate 0.000058 Epoch: 31 Global Step: 649450 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:06:02,122-Speed 2498.07 samples/sec Loss 1.2802 LearningRate 0.000058 Epoch: 31 Global Step: 649460 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:06:10,320-Speed 2498.55 samples/sec Loss 1.2447 LearningRate 0.000058 Epoch: 31 Global Step: 649470 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:06:18,543-Speed 2492.00 samples/sec Loss 1.2832 LearningRate 0.000058 Epoch: 31 Global Step: 649480 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:06:26,740-Speed 2498.82 samples/sec Loss 1.2694 LearningRate 0.000058 Epoch: 31 Global Step: 649490 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:06:34,948-Speed 2495.80 samples/sec Loss 1.3065 
LearningRate 0.000058 Epoch: 31 Global Step: 649500 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:06:43,097-Speed 2513.50 samples/sec Loss 1.2983 LearningRate 0.000058 Epoch: 31 Global Step: 649510 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:06:51,297-Speed 2497.75 samples/sec Loss 1.2571 LearningRate 0.000058 Epoch: 31 Global Step: 649520 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:06:59,499-Speed 2497.52 samples/sec Loss 1.2659 LearningRate 0.000058 Epoch: 31 Global Step: 649530 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:07:07,705-Speed 2496.03 samples/sec Loss 1.2326 LearningRate 0.000058 Epoch: 31 Global Step: 649540 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:07:15,905-Speed 2497.99 samples/sec Loss 1.3032 LearningRate 0.000058 Epoch: 31 Global Step: 649550 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:07:24,105-Speed 2498.36 samples/sec Loss 1.2833 LearningRate 0.000058 Epoch: 31 Global Step: 649560 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:07:32,259-Speed 2511.92 samples/sec Loss 1.2531 LearningRate 0.000058 Epoch: 31 Global Step: 649570 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:07:40,470-Speed 2494.59 samples/sec Loss 1.2695 LearningRate 0.000058 Epoch: 31 Global Step: 649580 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:07:48,671-Speed 2497.89 samples/sec Loss 1.2741 LearningRate 0.000058 Epoch: 31 Global Step: 649590 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:07:56,875-Speed 2496.66 samples/sec Loss 1.2665 LearningRate 0.000058 Epoch: 31 Global Step: 649600 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:08:05,075-Speed 2497.72 samples/sec Loss 1.2605 LearningRate 0.000058 Epoch: 31 Global Step: 649610 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:08:13,364-Speed 2471.24 samples/sec Loss 1.2725 LearningRate 
0.000058 Epoch: 31 Global Step: 649620 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:08:21,511-Speed 2514.03 samples/sec Loss 1.2821 LearningRate 0.000058 Epoch: 31 Global Step: 649630 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:08:29,707-Speed 2499.39 samples/sec Loss 1.2621 LearningRate 0.000058 Epoch: 31 Global Step: 649640 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:08:37,910-Speed 2496.84 samples/sec Loss 1.2768 LearningRate 0.000058 Epoch: 31 Global Step: 649650 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:08:46,111-Speed 2497.66 samples/sec Loss 1.3112 LearningRate 0.000058 Epoch: 31 Global Step: 649660 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:08:54,310-Speed 2498.28 samples/sec Loss 1.2774 LearningRate 0.000058 Epoch: 31 Global Step: 649670 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:09:02,519-Speed 2495.16 samples/sec Loss 1.2254 LearningRate 0.000058 Epoch: 31 Global Step: 649680 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:09:10,663-Speed 2515.13 samples/sec Loss 1.2655 LearningRate 0.000058 Epoch: 31 Global Step: 649690 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:09:18,869-Speed 2496.16 samples/sec Loss 1.2559 LearningRate 0.000058 Epoch: 31 Global Step: 649700 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:09:27,065-Speed 2499.18 samples/sec Loss 1.2602 LearningRate 0.000058 Epoch: 31 Global Step: 649710 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:09:35,269-Speed 2496.71 samples/sec Loss 1.2741 LearningRate 0.000058 Epoch: 31 Global Step: 649720 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:09:43,467-Speed 2498.44 samples/sec Loss 1.2919 LearningRate 0.000058 Epoch: 31 Global Step: 649730 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:09:51,669-Speed 2497.63 samples/sec Loss 1.2718 LearningRate 0.000058 Epoch: 31 
Global Step: 649740 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:09:59,828-Speed 2510.83 samples/sec Loss 1.2841 LearningRate 0.000058 Epoch: 31 Global Step: 649750 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:10:08,027-Speed 2498.10 samples/sec Loss 1.2486 LearningRate 0.000058 Epoch: 31 Global Step: 649760 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:10:16,232-Speed 2496.63 samples/sec Loss 1.2819 LearningRate 0.000058 Epoch: 31 Global Step: 649770 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:10:24,431-Speed 2498.16 samples/sec Loss 1.2964 LearningRate 0.000058 Epoch: 31 Global Step: 649780 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:10:32,635-Speed 2497.00 samples/sec Loss 1.2971 LearningRate 0.000058 Epoch: 31 Global Step: 649790 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:10:40,840-Speed 2496.33 samples/sec Loss 1.2927 LearningRate 0.000058 Epoch: 31 Global Step: 649800 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:10:49,005-Speed 2508.50 samples/sec Loss 1.2849 LearningRate 0.000058 Epoch: 31 Global Step: 649810 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:10:57,206-Speed 2498.33 samples/sec Loss 1.2552 LearningRate 0.000058 Epoch: 31 Global Step: 649820 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:11:05,404-Speed 2498.66 samples/sec Loss 1.2602 LearningRate 0.000058 Epoch: 31 Global Step: 649830 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:11:13,603-Speed 2498.47 samples/sec Loss 1.2712 LearningRate 0.000058 Epoch: 31 Global Step: 649840 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:11:21,799-Speed 2499.09 samples/sec Loss 1.2682 LearningRate 0.000058 Epoch: 31 Global Step: 649850 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:11:29,996-Speed 2498.73 samples/sec Loss 1.3018 LearningRate 0.000058 Epoch: 31 Global Step: 649860 
Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:11:38,142-Speed 2514.49 samples/sec Loss 1.2620 LearningRate 0.000058 Epoch: 31 Global Step: 649870 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:11:46,339-Speed 2498.92 samples/sec Loss 1.2969 LearningRate 0.000058 Epoch: 31 Global Step: 649880 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:11:54,537-Speed 2498.44 samples/sec Loss 1.2797 LearningRate 0.000058 Epoch: 31 Global Step: 649890 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:12:02,734-Speed 2498.84 samples/sec Loss 1.2763 LearningRate 0.000058 Epoch: 31 Global Step: 649900 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:12:10,938-Speed 2496.90 samples/sec Loss 1.2735 LearningRate 0.000058 Epoch: 31 Global Step: 649910 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:12:19,136-Speed 2498.57 samples/sec Loss 1.2887 LearningRate 0.000058 Epoch: 31 Global Step: 649920 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:12:27,280-Speed 2515.41 samples/sec Loss 1.3147 LearningRate 0.000058 Epoch: 31 Global Step: 649930 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:12:35,476-Speed 2499.34 samples/sec Loss 1.2783 LearningRate 0.000058 Epoch: 31 Global Step: 649940 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:12:43,675-Speed 2498.29 samples/sec Loss 1.3038 LearningRate 0.000058 Epoch: 31 Global Step: 649950 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:12:51,876-Speed 2497.58 samples/sec Loss 1.2857 LearningRate 0.000058 Epoch: 31 Global Step: 649960 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:13:00,083-Speed 2495.75 samples/sec Loss 1.2915 LearningRate 0.000058 Epoch: 31 Global Step: 649970 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:13:08,285-Speed 2497.34 samples/sec Loss 1.2820 LearningRate 0.000058 Epoch: 31 Global Step: 649980 Fp16 Grad Scale: 
8192 Required: 41 hours Training: 2022-07-11 19:13:16,431-Speed 2514.68 samples/sec Loss 1.2645 LearningRate 0.000058 Epoch: 31 Global Step: 649990 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:13:24,625-Speed 2499.65 samples/sec Loss 1.2545 LearningRate 0.000058 Epoch: 31 Global Step: 650000 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:13:32,822-Speed 2498.96 samples/sec Loss 1.2869 LearningRate 0.000058 Epoch: 31 Global Step: 650010 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:13:41,019-Speed 2498.87 samples/sec Loss 1.2893 LearningRate 0.000058 Epoch: 31 Global Step: 650020 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:13:49,214-Speed 2499.40 samples/sec Loss 1.2530 LearningRate 0.000058 Epoch: 31 Global Step: 650030 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:13:57,412-Speed 2498.57 samples/sec Loss 1.3121 LearningRate 0.000058 Epoch: 31 Global Step: 650040 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:14:05,556-Speed 2515.07 samples/sec Loss 1.2603 LearningRate 0.000058 Epoch: 31 Global Step: 650050 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:14:13,754-Speed 2498.57 samples/sec Loss 1.3008 LearningRate 0.000058 Epoch: 31 Global Step: 650060 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:14:21,954-Speed 2497.87 samples/sec Loss 1.3007 LearningRate 0.000058 Epoch: 31 Global Step: 650070 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:14:30,151-Speed 2498.80 samples/sec Loss 1.2830 LearningRate 0.000058 Epoch: 31 Global Step: 650080 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:14:38,350-Speed 2498.52 samples/sec Loss 1.3265 LearningRate 0.000058 Epoch: 31 Global Step: 650090 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:14:46,552-Speed 2497.23 samples/sec Loss 1.3020 LearningRate 0.000058 Epoch: 31 Global Step: 650100 Fp16 Grad Scale: 8192 Required: 41 
hours Training: 2022-07-11 19:14:54,694-Speed 2515.66 samples/sec Loss 1.3322 LearningRate 0.000058 Epoch: 31 Global Step: 650110 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:15:02,902-Speed 2495.40 samples/sec Loss 1.3057 LearningRate 0.000058 Epoch: 31 Global Step: 650120 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:15:11,095-Speed 2500.07 samples/sec Loss 1.2495 LearningRate 0.000058 Epoch: 31 Global Step: 650130 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:15:19,296-Speed 2497.69 samples/sec Loss 1.2781 LearningRate 0.000058 Epoch: 31 Global Step: 650140 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:15:27,494-Speed 2498.77 samples/sec Loss 1.2748 LearningRate 0.000058 Epoch: 31 Global Step: 650150 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:15:35,697-Speed 2497.02 samples/sec Loss 1.3160 LearningRate 0.000058 Epoch: 31 Global Step: 650160 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:15:43,840-Speed 2516.16 samples/sec Loss 1.2854 LearningRate 0.000058 Epoch: 31 Global Step: 650170 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:15:52,041-Speed 2497.72 samples/sec Loss 1.2478 LearningRate 0.000058 Epoch: 31 Global Step: 650180 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:16:00,249-Speed 2495.43 samples/sec Loss 1.2431 LearningRate 0.000058 Epoch: 31 Global Step: 650190 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:16:08,449-Speed 2498.17 samples/sec Loss 1.2679 LearningRate 0.000058 Epoch: 31 Global Step: 650200 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:16:16,647-Speed 2498.36 samples/sec Loss 1.3151 LearningRate 0.000058 Epoch: 31 Global Step: 650210 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:16:24,846-Speed 2498.29 samples/sec Loss 1.2820 LearningRate 0.000058 Epoch: 31 Global Step: 650220 Fp16 Grad Scale: 8192 Required: 41 hours Training: 
2022-07-11 19:16:32,991-Speed 2514.67 samples/sec Loss 1.2967 LearningRate 0.000058 Epoch: 31 Global Step: 650230 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:16:41,195-Speed 2496.74 samples/sec Loss 1.2860 LearningRate 0.000058 Epoch: 31 Global Step: 650240 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-07-11 19:16:49,396-Speed 2497.63 samples/sec Loss 1.2766 LearningRate 0.000058 Epoch: 31 Global Step: 650250 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:16:57,607-Speed 2494.76 samples/sec Loss 1.2854 LearningRate 0.000058 Epoch: 31 Global Step: 650260 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:17:05,811-Speed 2496.86 samples/sec Loss 1.2519 LearningRate 0.000058 Epoch: 31 Global Step: 650270 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:17:14,011-Speed 2498.08 samples/sec Loss 1.2901 LearningRate 0.000058 Epoch: 31 Global Step: 650280 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:17:22,165-Speed 2511.99 samples/sec Loss 1.2763 LearningRate 0.000058 Epoch: 31 Global Step: 650290 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:17:30,384-Speed 2492.17 samples/sec Loss 1.2843 LearningRate 0.000058 Epoch: 31 Global Step: 650300 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:17:38,599-Speed 2495.86 samples/sec Loss 1.2882 LearningRate 0.000058 Epoch: 31 Global Step: 650310 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:17:46,802-Speed 2497.16 samples/sec Loss 1.2640 LearningRate 0.000058 Epoch: 31 Global Step: 650320 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:17:54,998-Speed 2499.04 samples/sec Loss 1.2774 LearningRate 0.000058 Epoch: 31 Global Step: 650330 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:18:03,202-Speed 2496.90 samples/sec Loss 1.2778 LearningRate 0.000058 Epoch: 31 Global Step: 650340 Fp16 Grad Scale: 16384 Required: 41 hours Training: 
2022-07-11 19:18:11,351-Speed 2513.44 samples/sec Loss 1.2890 LearningRate 0.000058 Epoch: 31 Global Step: 650350 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:18:19,549-Speed 2498.66 samples/sec Loss 1.2792 LearningRate 0.000058 Epoch: 31 Global Step: 650360 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:18:27,749-Speed 2498.00 samples/sec Loss 1.2946 LearningRate 0.000058 Epoch: 31 Global Step: 650370 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:18:35,944-Speed 2499.60 samples/sec Loss 1.2927 LearningRate 0.000058 Epoch: 31 Global Step: 650380 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:18:44,148-Speed 2496.70 samples/sec Loss 1.2842 LearningRate 0.000058 Epoch: 31 Global Step: 650390 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:18:52,352-Speed 2496.76 samples/sec Loss 1.3003 LearningRate 0.000058 Epoch: 31 Global Step: 650400 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:19:00,498-Speed 2514.52 samples/sec Loss 1.2738 LearningRate 0.000058 Epoch: 31 Global Step: 650410 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:19:08,696-Speed 2498.74 samples/sec Loss 1.2576 LearningRate 0.000058 Epoch: 31 Global Step: 650420 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:19:16,890-Speed 2499.94 samples/sec Loss 1.2779 LearningRate 0.000058 Epoch: 31 Global Step: 650430 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:19:25,092-Speed 2497.46 samples/sec Loss 1.2862 LearningRate 0.000058 Epoch: 31 Global Step: 650440 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:19:33,292-Speed 2498.01 samples/sec Loss 1.2904 LearningRate 0.000058 Epoch: 31 Global Step: 650450 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-07-11 19:19:41,487-Speed 2499.19 samples/sec Loss 1.2793 LearningRate 0.000058 Epoch: 31 Global Step: 650460 Fp16 Grad Scale: 16384 Required: 41 hours Training: 
2022-07-11 19:19:49,644-Speed 2511.14 samples/sec Loss 1.2541 LearningRate 0.000058 Epoch: 31 Global Step: 650470 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:19:57,844-Speed 2498.37 samples/sec Loss 1.2990 LearningRate 0.000058 Epoch: 31 Global Step: 650480 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:20:06,057-Speed 2493.77 samples/sec Loss 1.2866 LearningRate 0.000058 Epoch: 31 Global Step: 650490 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:20:14,263-Speed 2495.96 samples/sec Loss 1.2754 LearningRate 0.000058 Epoch: 31 Global Step: 650500 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:20:22,466-Speed 2497.24 samples/sec Loss 1.3061 LearningRate 0.000058 Epoch: 31 Global Step: 650510 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:20:30,668-Speed 2497.38 samples/sec Loss 1.2769 LearningRate 0.000058 Epoch: 31 Global Step: 650520 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:20:38,816-Speed 2513.74 samples/sec Loss 1.2747 LearningRate 0.000058 Epoch: 31 Global Step: 650530 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:20:47,017-Speed 2497.88 samples/sec Loss 1.2884 LearningRate 0.000057 Epoch: 31 Global Step: 650540 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:20:55,225-Speed 2495.69 samples/sec Loss 1.2910 LearningRate 0.000057 Epoch: 31 Global Step: 650550 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:21:03,427-Speed 2497.42 samples/sec Loss 1.2743 LearningRate 0.000057 Epoch: 31 Global Step: 650560 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:21:11,650-Speed 2491.09 samples/sec Loss 1.3046 LearningRate 0.000057 Epoch: 31 Global Step: 650570 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:21:19,871-Speed 2491.49 samples/sec Loss 1.2754 LearningRate 0.000057 Epoch: 31 Global Step: 650580 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:21:28,017-Speed 2514.73 samples/sec Loss 1.3027 LearningRate 0.000057 Epoch: 31 Global Step: 650590 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:21:36,218-Speed 2497.80 samples/sec Loss 1.2948 LearningRate 0.000057 Epoch: 31 Global Step: 650600 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:21:44,423-Speed 2496.58 samples/sec Loss 1.2878 LearningRate 0.000057 Epoch: 31 Global Step: 650610 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:21:52,624-Speed 2497.46 samples/sec Loss 1.2838 LearningRate 0.000057 Epoch: 31 Global Step: 650620 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:22:00,825-Speed 2497.74 samples/sec Loss 1.2565 LearningRate 0.000057 Epoch: 31 Global Step: 650630 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:22:09,028-Speed 2496.91 samples/sec Loss 1.2614 LearningRate 0.000057 Epoch: 31 Global Step: 650640 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:22:17,184-Speed 2511.45 samples/sec Loss 1.2746 LearningRate 0.000057 Epoch: 31 Global Step: 650650 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:22:25,390-Speed 2496.14 samples/sec Loss 1.2754 LearningRate 0.000057 Epoch: 31 Global Step: 650660 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:22:33,592-Speed 2497.35 samples/sec Loss 1.3038 LearningRate 0.000057 Epoch: 31 Global Step: 650670 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:22:41,794-Speed 2497.35 samples/sec Loss 1.2554 LearningRate 0.000057 Epoch: 31 Global Step: 650680 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:22:49,995-Speed 2497.61 samples/sec Loss 1.2851 LearningRate 0.000057 Epoch: 31 Global Step: 650690 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:22:58,205-Speed 2494.78 samples/sec Loss 1.3043 LearningRate 0.000057 Epoch: 31 Global Step: 650700 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:23:06,353-Speed 2514.11 samples/sec Loss 1.2762 LearningRate 0.000057 Epoch: 31 Global Step: 650710 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:23:14,553-Speed 2497.75 samples/sec Loss 1.2927 LearningRate 0.000057 Epoch: 31 Global Step: 650720 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:23:22,766-Speed 2493.99 samples/sec Loss 1.2700 LearningRate 0.000057 Epoch: 31 Global Step: 650730 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:23:30,970-Speed 2496.68 samples/sec Loss 1.2848 LearningRate 0.000057 Epoch: 31 Global Step: 650740 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:23:39,192-Speed 2491.29 samples/sec Loss 1.2615 LearningRate 0.000057 Epoch: 31 Global Step: 650750 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:23:47,409-Speed 2492.79 samples/sec Loss 1.2926 LearningRate 0.000057 Epoch: 31 Global Step: 650760 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:23:55,560-Speed 2512.95 samples/sec Loss 1.2512 LearningRate 0.000057 Epoch: 31 Global Step: 650770 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:24:03,765-Speed 2496.61 samples/sec Loss 1.3042 LearningRate 0.000057 Epoch: 31 Global Step: 650780 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:24:11,967-Speed 2497.68 samples/sec Loss 1.2942 LearningRate 0.000057 Epoch: 31 Global Step: 650790 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:24:20,171-Speed 2496.46 samples/sec Loss 1.2802 LearningRate 0.000057 Epoch: 31 Global Step: 650800 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:24:28,377-Speed 2496.18 samples/sec Loss 1.2591 LearningRate 0.000057 Epoch: 31 Global Step: 650810 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:24:36,580-Speed 2496.95 samples/sec Loss 1.2680 LearningRate 0.000057 Epoch: 31 Global Step: 650820 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:24:44,728-Speed 2514.15 samples/sec Loss 1.2479 LearningRate 0.000057 Epoch: 31 Global Step: 650830 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:24:52,928-Speed 2497.63 samples/sec Loss 1.2798 LearningRate 0.000057 Epoch: 31 Global Step: 650840 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:25:01,130-Speed 2497.49 samples/sec Loss 1.3271 LearningRate 0.000057 Epoch: 31 Global Step: 650850 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:25:09,330-Speed 2497.78 samples/sec Loss 1.2832 LearningRate 0.000057 Epoch: 31 Global Step: 650860 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:25:17,526-Speed 2499.10 samples/sec Loss 1.2958 LearningRate 0.000057 Epoch: 31 Global Step: 650870 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:25:25,729-Speed 2497.20 samples/sec Loss 1.2838 LearningRate 0.000057 Epoch: 31 Global Step: 650880 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:25:33,883-Speed 2512.05 samples/sec Loss 1.3049 LearningRate 0.000057 Epoch: 31 Global Step: 650890 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:25:42,082-Speed 2498.41 samples/sec Loss 1.2517 LearningRate 0.000057 Epoch: 31 Global Step: 650900 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:25:50,282-Speed 2497.50 samples/sec Loss 1.2668 LearningRate 0.000057 Epoch: 31 Global Step: 650910 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:25:58,487-Speed 2496.62 samples/sec Loss 1.2775 LearningRate 0.000057 Epoch: 31 Global Step: 650920 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:26:06,693-Speed 2495.95 samples/sec Loss 1.2671 LearningRate 0.000057 Epoch: 31 Global Step: 650930 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:26:14,896-Speed 2496.97 samples/sec Loss 1.2680 LearningRate 0.000057 Epoch: 31 Global Step: 650940 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:26:23,047-Speed 2512.97 samples/sec Loss 1.3060 LearningRate 0.000057 Epoch: 31 Global Step: 650950 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:26:31,261-Speed 2493.70 samples/sec Loss 1.2735 LearningRate 0.000057 Epoch: 31 Global Step: 650960 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:26:39,462-Speed 2497.51 samples/sec Loss 1.2874 LearningRate 0.000057 Epoch: 31 Global Step: 650970 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:26:47,675-Speed 2493.92 samples/sec Loss 1.2792 LearningRate 0.000057 Epoch: 31 Global Step: 650980 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:26:55,878-Speed 2497.65 samples/sec Loss 1.2679 LearningRate 0.000057 Epoch: 31 Global Step: 650990 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:27:04,090-Speed 2494.18 samples/sec Loss 1.2643 LearningRate 0.000057 Epoch: 31 Global Step: 651000 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:27:12,238-Speed 2513.95 samples/sec Loss 1.2647 LearningRate 0.000057 Epoch: 31 Global Step: 651010 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:27:20,437-Speed 2498.30 samples/sec Loss 1.2720 LearningRate 0.000057 Epoch: 31 Global Step: 651020 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:27:28,633-Speed 2499.20 samples/sec Loss 1.3132 LearningRate 0.000057 Epoch: 31 Global Step: 651030 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:27:36,836-Speed 2497.14 samples/sec Loss 1.2919 LearningRate 0.000057 Epoch: 31 Global Step: 651040 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:27:45,038-Speed 2497.28 samples/sec Loss 1.2898 LearningRate 0.000057 Epoch: 31 Global Step: 651050 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:27:53,237-Speed 2498.08 samples/sec Loss 1.2776 LearningRate 0.000057 Epoch: 31 Global Step: 651060 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:28:01,383-Speed 2514.56 samples/sec Loss 1.2929 LearningRate 0.000057 Epoch: 31 Global Step: 651070 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:28:09,586-Speed 2497.17 samples/sec Loss 1.2967 LearningRate 0.000057 Epoch: 31 Global Step: 651080 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:28:17,790-Speed 2496.52 samples/sec Loss 1.3050 LearningRate 0.000057 Epoch: 31 Global Step: 651090 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:28:25,990-Speed 2498.15 samples/sec Loss 1.2941 LearningRate 0.000057 Epoch: 31 Global Step: 651100 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:28:34,193-Speed 2497.04 samples/sec Loss 1.2988 LearningRate 0.000057 Epoch: 31 Global Step: 651110 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:28:42,394-Speed 2497.59 samples/sec Loss 1.3215 LearningRate 0.000057 Epoch: 31 Global Step: 651120 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:28:50,547-Speed 2512.43 samples/sec Loss 1.2936 LearningRate 0.000057 Epoch: 31 Global Step: 651130 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:28:58,748-Speed 2497.73 samples/sec Loss 1.2920 LearningRate 0.000057 Epoch: 31 Global Step: 651140 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:29:06,950-Speed 2497.06 samples/sec Loss 1.2656 LearningRate 0.000057 Epoch: 31 Global Step: 651150 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:29:15,151-Speed 2497.80 samples/sec Loss 1.2799 LearningRate 0.000057 Epoch: 31 Global Step: 651160 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:29:23,348-Speed 2498.95 samples/sec Loss 1.3045 LearningRate 0.000057 Epoch: 31 Global Step: 651170 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:29:31,549-Speed 2497.39 samples/sec Loss 1.2895 LearningRate 0.000057 Epoch: 31 Global Step: 651180 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:29:39,698-Speed 2513.78 samples/sec Loss 1.3001 LearningRate 0.000057 Epoch: 31 Global Step: 651190 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:29:47,896-Speed 2498.56 samples/sec Loss 1.2919 LearningRate 0.000057 Epoch: 31 Global Step: 651200 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:29:56,101-Speed 2496.23 samples/sec Loss 1.2688 LearningRate 0.000057 Epoch: 31 Global Step: 651210 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:30:04,315-Speed 2493.85 samples/sec Loss 1.2967 LearningRate 0.000057 Epoch: 31 Global Step: 651220 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:30:12,513-Speed 2498.50 samples/sec Loss 1.2850 LearningRate 0.000057 Epoch: 31 Global Step: 651230 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:30:20,712-Speed 2498.35 samples/sec Loss 1.2634 LearningRate 0.000057 Epoch: 31 Global Step: 651240 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:30:28,859-Speed 2514.32 samples/sec Loss 1.2873 LearningRate 0.000057 Epoch: 31 Global Step: 651250 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:30:37,056-Speed 2498.71 samples/sec Loss 1.2656 LearningRate 0.000057 Epoch: 31 Global Step: 651260 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:30:45,255-Speed 2498.05 samples/sec Loss 1.3277 LearningRate 0.000057 Epoch: 31 Global Step: 651270 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:30:53,454-Speed 2498.40 samples/sec Loss 1.3109 LearningRate 0.000057 Epoch: 31 Global Step: 651280 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:31:01,654-Speed 2497.85 samples/sec Loss 1.2909 LearningRate 0.000057 Epoch: 31 Global Step: 651290 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:31:09,853-Speed 2498.18 samples/sec Loss 1.2896 LearningRate 0.000057 Epoch: 31 Global Step: 651300 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:31:17,998-Speed 2514.78 samples/sec Loss 1.2745 LearningRate 0.000057 Epoch: 31 Global Step: 651310 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:31:26,197-Speed 2498.36 samples/sec Loss 1.2440 LearningRate 0.000057 Epoch: 31 Global Step: 651320 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:31:34,394-Speed 2499.05 samples/sec Loss 1.3007 LearningRate 0.000057 Epoch: 31 Global Step: 651330 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:31:42,594-Speed 2498.05 samples/sec Loss 1.2833 LearningRate 0.000057 Epoch: 31 Global Step: 651340 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:31:50,793-Speed 2498.18 samples/sec Loss 1.2744 LearningRate 0.000057 Epoch: 31 Global Step: 651350 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:31:58,993-Speed 2497.94 samples/sec Loss 1.2906 LearningRate 0.000057 Epoch: 31 Global Step: 651360 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:32:07,135-Speed 2515.89 samples/sec Loss 1.2661 LearningRate 0.000057 Epoch: 31 Global Step: 651370 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:32:15,331-Speed 2499.02 samples/sec Loss 1.3285 LearningRate 0.000057 Epoch: 31 Global Step: 651380 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:32:23,532-Speed 2497.68 samples/sec Loss 1.2539 LearningRate 0.000057 Epoch: 31 Global Step: 651390 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:32:31,734-Speed 2497.75 samples/sec Loss 1.2798 LearningRate 0.000057 Epoch: 31 Global Step: 651400 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:32:39,933-Speed 2498.29 samples/sec Loss 1.2622 LearningRate 0.000057 Epoch: 31 Global Step: 651410 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:32:48,133-Speed 2497.90 samples/sec Loss 1.2702 LearningRate 0.000057 Epoch: 31 Global Step: 651420 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:32:56,280-Speed 2514.27 samples/sec Loss 1.2642 LearningRate 0.000057 Epoch: 31 Global Step: 651430 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:33:04,482-Speed 2497.23 samples/sec Loss 1.2773 LearningRate 0.000057 Epoch: 31 Global Step: 651440 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:33:12,678-Speed 2499.05 samples/sec Loss 1.2760 LearningRate 0.000057 Epoch: 31 Global Step: 651450 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-07-11 19:33:20,884-Speed 2496.54 samples/sec Loss 1.2710 LearningRate 0.000057 Epoch: 31 Global Step: 651460 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-07-11 19:33:29,037-Speed 2512.22 samples/sec Loss 1.2819 LearningRate 0.000057 Epoch: 31 Global Step: 651470 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:33:37,236-Speed 2498.39 samples/sec Loss 1.2419 LearningRate 0.000057 Epoch: 31 Global Step: 651480 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:33:45,382-Speed 2514.45 samples/sec Loss 1.3212 LearningRate 0.000057 Epoch: 31 Global Step: 651490 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:33:53,589-Speed 2495.76 samples/sec Loss 1.2841 LearningRate 0.000057 Epoch: 31 Global Step: 651500 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:34:01,789-Speed 2497.75 samples/sec Loss 1.2782 LearningRate 0.000057 Epoch: 31 Global Step: 651510 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:34:09,988-Speed 2498.11 samples/sec Loss 1.2856 LearningRate 0.000057 Epoch: 31 Global Step: 651520 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:34:18,189-Speed 2497.66 samples/sec Loss 1.2673 LearningRate 0.000057 Epoch: 31 Global Step: 651530 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:34:26,385-Speed 2499.28 samples/sec Loss 1.3066 LearningRate 0.000057 Epoch: 31 Global Step: 651540 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:34:34,528-Speed 2515.10 samples/sec Loss 1.2681 LearningRate 0.000057 Epoch: 31 Global Step: 651550 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:34:42,733-Speed 2496.95 samples/sec Loss 1.2793 LearningRate 0.000057 Epoch: 31 Global Step: 651560 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:34:50,951-Speed 2492.67 samples/sec Loss 1.2808 LearningRate 0.000057 Epoch: 31 Global Step: 651570 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:34:59,151-Speed 2497.74 samples/sec Loss 1.2690 LearningRate 0.000057 Epoch: 31 Global Step: 651580 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:35:07,354-Speed 2497.00 samples/sec Loss 1.2762 LearningRate 0.000057 Epoch: 31 Global Step: 651590 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:35:15,554-Speed 2497.96 samples/sec Loss 1.3231 LearningRate 0.000057 Epoch: 31 Global Step: 651600 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:35:23,703-Speed 2513.58 samples/sec Loss 1.2831 LearningRate 0.000057 Epoch: 31 Global Step: 651610 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:35:31,904-Speed 2497.83 samples/sec Loss 1.2786 LearningRate 0.000057 Epoch: 31 Global Step: 651620 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:35:40,108-Speed 2496.92 samples/sec Loss 1.2979 LearningRate 0.000057 Epoch: 31 Global Step: 651630 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:35:48,312-Speed 2496.63 samples/sec Loss 1.2867 LearningRate 0.000057 Epoch: 31 Global Step: 651640 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:35:56,515-Speed 2497.09 samples/sec Loss 1.2807 LearningRate 0.000057 Epoch: 31 Global Step: 651650 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:36:04,720-Speed 2496.75 samples/sec Loss 1.2877 LearningRate 0.000057 Epoch: 31 Global Step: 651660 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:36:12,867-Speed 2514.14 samples/sec Loss 1.2949 LearningRate 0.000057 Epoch: 31 Global Step: 651670 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:36:21,082-Speed 2493.51 samples/sec Loss 1.2663 LearningRate 0.000057 Epoch: 31 Global Step: 651680 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:36:29,288-Speed 2496.10 samples/sec Loss 1.2602 LearningRate 0.000057 Epoch: 31 Global Step: 651690 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:36:37,488-Speed 2497.82 samples/sec Loss 1.2800 LearningRate 0.000057 Epoch: 31 Global Step: 651700 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:36:45,693-Speed 2496.79 samples/sec Loss 1.2692 LearningRate 0.000057 Epoch: 31 Global Step: 651710 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:36:53,902-Speed 2494.92 samples/sec Loss 1.2725 LearningRate 0.000057 Epoch: 31 Global Step: 651720 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:37:02,054-Speed 2512.97 samples/sec Loss 1.2511 LearningRate 0.000057 Epoch: 31 Global Step: 651730 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:37:10,257-Speed 2497.08 samples/sec Loss 1.2591 LearningRate 0.000057 Epoch: 31 Global Step: 651740 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:37:18,459-Speed 2497.33 samples/sec Loss 1.2595 LearningRate 0.000057 Epoch: 31 Global Step: 651750 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:37:26,675-Speed 2493.08 samples/sec Loss 1.2628 LearningRate 0.000057 Epoch: 31 Global Step: 651760 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:37:34,878-Speed 2497.13 samples/sec Loss 1.2466 LearningRate 0.000057 Epoch: 31 Global Step: 651770 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:37:43,080-Speed 2497.23 samples/sec Loss 1.2638 LearningRate 0.000057 Epoch: 31 Global Step: 651780 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:37:51,231-Speed 2512.74 samples/sec Loss 1.2792 LearningRate 0.000057 Epoch: 31 Global Step: 651790 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:37:59,446-Speed 2493.57 samples/sec Loss 1.2762 LearningRate 0.000057 Epoch: 31 Global Step: 651800 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:38:07,649-Speed 2497.06 samples/sec Loss 1.2869 LearningRate 0.000057 Epoch: 31 Global Step: 651810 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:38:15,854-Speed 2496.39 samples/sec Loss 1.2876 LearningRate 0.000057 Epoch: 31 Global Step: 651820 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:38:24,053-Speed 2498.48 samples/sec Loss 1.3044 LearningRate 0.000057 Epoch: 31 Global Step: 651830 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:38:32,259-Speed 2496.03 samples/sec Loss 1.3036 LearningRate 0.000057 Epoch: 31 Global Step: 651840 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:38:40,414-Speed 2511.75 samples/sec Loss 1.3089 LearningRate 0.000057 Epoch: 31 Global Step: 651850 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:38:48,618-Speed 2496.75 samples/sec Loss 1.2899 LearningRate 0.000057 Epoch: 31 Global Step: 651860 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:38:56,824-Speed 2496.46 samples/sec Loss 1.2728 LearningRate 0.000057 Epoch: 31 Global Step: 651870 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:39:05,024-Speed 2497.68 samples/sec Loss 1.2726 LearningRate 0.000057 Epoch: 31 Global Step: 651880 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:39:13,226-Speed 2497.49 samples/sec Loss 1.2477 LearningRate 0.000057 Epoch: 31 Global Step: 651890 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:39:21,435-Speed 2495.13 samples/sec Loss 1.2885 LearningRate 0.000057 Epoch: 31 Global Step: 651900 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:39:29,581-Speed 2514.67 samples/sec Loss 1.2665 LearningRate 0.000057 Epoch: 31 Global Step: 651910 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:39:37,785-Speed 2496.94 samples/sec Loss 1.2715 LearningRate 0.000057 Epoch: 31 Global Step: 651920 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:39:45,986-Speed 2497.85 samples/sec Loss 1.2680 LearningRate 0.000057 Epoch: 31 Global Step: 651930 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:39:54,185-Speed 2498.19 samples/sec Loss 1.2588 LearningRate 0.000057 Epoch: 31 Global Step: 651940 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:40:02,387-Speed 2497.29 samples/sec Loss 1.2803 LearningRate 0.000057 Epoch: 31 Global Step: 651950 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:40:10,586-Speed 2498.42 samples/sec Loss 1.2598 LearningRate 0.000057 Epoch: 31 Global Step: 651960 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:40:18,737-Speed 2512.97 samples/sec Loss 1.2737 LearningRate 0.000057 Epoch: 31 Global Step: 651970 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:40:26,945-Speed 2495.27 samples/sec Loss 1.2543 LearningRate 0.000057 Epoch: 31 Global Step: 651980 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:40:35,151-Speed 2496.34 samples/sec Loss 1.2528 LearningRate 0.000057 Epoch: 31 Global Step: 651990 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:40:43,347-Speed 2498.98 samples/sec Loss 1.3024 LearningRate 0.000057 Epoch: 31 Global Step: 652000 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:40:51,551-Speed 2496.92 samples/sec Loss 1.2785 LearningRate 0.000057 Epoch: 31 Global Step: 652010 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:40:59,768-Speed 2492.78 samples/sec Loss 1.2907 LearningRate 0.000057 Epoch: 31 Global Step: 652020 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:41:07,916-Speed 2513.98 samples/sec Loss 1.2578 LearningRate 0.000057 Epoch: 31 Global Step: 652030 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:41:16,115-Speed 2498.03 samples/sec Loss 1.2746 LearningRate 0.000057 Epoch: 31 Global Step: 652040 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:41:24,317-Speed 2497.31 samples/sec Loss 1.2651 LearningRate 0.000057 Epoch: 31 Global Step: 652050 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:41:32,516-Speed 2498.30 samples/sec Loss 1.2614 LearningRate 0.000057 Epoch: 31 Global Step: 652060 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:41:40,714-Speed 2498.56 samples/sec Loss 1.2780 LearningRate 0.000057 Epoch: 31 Global Step: 652070 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:41:48,915-Speed 2497.61 samples/sec Loss 1.2819 LearningRate 0.000057 Epoch: 31 Global Step: 652080 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:41:57,063-Speed 2513.86 samples/sec Loss 1.2746 LearningRate 0.000057 Epoch: 31 Global Step: 652090 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:42:05,267-Speed 2496.66 samples/sec Loss 1.2621 LearningRate 0.000056 Epoch: 31 Global Step: 652100 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:42:13,466-Speed 2498.19 samples/sec Loss 1.2744 LearningRate 0.000056 Epoch: 31 Global Step: 652110 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:42:21,679-Speed 2494.21 samples/sec Loss 1.2626 LearningRate 0.000056 Epoch: 31 Global Step: 652120 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:42:29,892-Speed 2493.86 samples/sec Loss 1.2822 LearningRate 0.000056 Epoch: 31 Global Step: 652130 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:42:38,092-Speed 2498.05 samples/sec Loss 1.2695 LearningRate 0.000056 Epoch: 31 Global Step: 652140 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:42:46,250-Speed 2510.86 samples/sec Loss 1.2932 LearningRate 0.000056 Epoch: 31 Global Step: 652150 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:42:54,462-Speed 2494.26 samples/sec Loss 1.2642 LearningRate 0.000056 Epoch: 31 Global Step: 652160 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:43:02,665-Speed 2496.81 samples/sec Loss 1.3162 LearningRate 0.000056 Epoch: 31 Global Step: 652170 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:43:10,874-Speed 2495.38 samples/sec Loss 1.2716 LearningRate 0.000056 Epoch: 31 Global Step: 652180 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:43:19,075-Speed 2497.76 samples/sec Loss 1.2718 LearningRate 0.000056 Epoch: 31 Global Step: 652190 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:43:27,275-Speed 2497.92 samples/sec Loss 1.3088 LearningRate 0.000056 Epoch: 31 Global Step: 652200 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:43:35,424-Speed 2513.66 samples/sec Loss 1.3099 LearningRate 0.000056 Epoch: 31 Global Step: 652210 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:43:43,620-Speed 2499.16 samples/sec Loss 1.2605 LearningRate 0.000056 Epoch: 31 Global Step: 652220 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:43:51,821-Speed 2497.81 samples/sec Loss 1.2642 LearningRate 0.000056 Epoch: 31 Global Step: 652230 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:44:00,024-Speed 2496.91 samples/sec Loss 1.3006 LearningRate 0.000056 Epoch: 31 Global Step: 652240 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:44:08,224-Speed 2498.03 samples/sec Loss 1.2784 LearningRate 0.000056 Epoch: 31 Global Step: 652250 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:44:16,426-Speed 2497.34 samples/sec Loss 1.2479 LearningRate 0.000056 Epoch: 31 Global Step: 652260 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:44:24,574-Speed 2514.11 samples/sec Loss 1.2692 LearningRate 0.000056 Epoch: 31 Global Step: 652270 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:44:32,773-Speed 2498.39 samples/sec Loss 1.3323 LearningRate 0.000056 Epoch: 31 Global Step: 652280 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:44:40,974-Speed 2497.69 samples/sec Loss 1.2858 LearningRate 0.000056 Epoch: 31 Global Step: 652290 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:44:49,174-Speed 2497.92 samples/sec Loss 1.2973 LearningRate 0.000056 Epoch: 31 Global Step: 652300 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:44:57,378-Speed 2496.63 samples/sec Loss 1.2258 LearningRate 0.000056 Epoch: 31 Global Step: 652310 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:45:05,581-Speed 2497.11 samples/sec Loss 1.2860 LearningRate 0.000056 Epoch: 31 Global Step: 652320 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:45:13,729-Speed 2513.97 samples/sec Loss 1.2542 LearningRate 0.000056 Epoch: 31 Global Step: 652330 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:45:21,932-Speed 2496.95 samples/sec Loss 1.2779 LearningRate 0.000056 Epoch: 31 Global Step: 652340 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:45:30,135-Speed 2497.07 samples/sec Loss 1.2766 LearningRate 0.000056 Epoch: 31 Global Step: 652350 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:45:38,333-Speed 2498.62 samples/sec Loss 1.2552 LearningRate 0.000056 Epoch: 31 Global Step: 652360 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:45:46,537-Speed 2496.69 samples/sec Loss 1.2896 LearningRate 0.000056 Epoch: 31 Global Step: 652370 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:45:54,750-Speed 2497.46 samples/sec Loss 1.2453 LearningRate 0.000056 Epoch: 31 Global Step: 652380 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:46:06,726-Speed 2514.03 samples/sec Loss 1.2414 LearningRate 0.000056 Epoch: 31 Global Step: 652390 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:46:14,925-Speed 2498.28 samples/sec Loss 1.3139 LearningRate 0.000056 Epoch: 31 Global Step: 652400 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:46:23,182-Speed 2498.90 samples/sec Loss 1.2717 LearningRate 0.000056 Epoch: 31 Global Step: 652410 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:48:28,684-Speed 163.23 samples/sec Loss 1.2932 LearningRate 0.000056 Epoch: 31 Global Step: 652420 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:48:36,942-Speed 2511.02 samples/sec Loss 1.2608 LearningRate 0.000056 Epoch: 31 Global Step: 652430 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:48:45,173-Speed 2507.45 samples/sec Loss 1.2897 LearningRate 0.000056 Epoch: 31 Global Step: 652440 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:48:57,518-Speed 1659.10 samples/sec Loss 1.2626 LearningRate 0.000056 Epoch: 31 Global Step: 652450 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:49:05,810-Speed 2506.24 samples/sec Loss 1.2683 LearningRate 0.000056 Epoch: 31 Global Step: 652460 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:49:14,251-Speed 2502.18 samples/sec Loss 1.2758 LearningRate 0.000056 Epoch: 31 Global Step: 652470 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:49:22,464-Speed 2502.18 samples/sec Loss 1.3102 LearningRate 0.000056 Epoch: 31 Global Step: 652480 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:49:30,662-Speed 2498.71 samples/sec Loss 1.3112 LearningRate 0.000056 Epoch: 31 Global Step: 652490 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:49:38,927-Speed 2497.10 samples/sec Loss 1.3159 LearningRate 0.000056 Epoch: 31 Global Step: 652500 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:49:49,369-Speed 1973.82 samples/sec Loss 1.2940 LearningRate 0.000056 Epoch: 31 Global Step: 652510 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:49:57,775-Speed 2493.61 samples/sec Loss 1.3041 LearningRate 0.000056 Epoch: 31 Global Step: 652520 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:50:06,725-Speed 2288.40 samples/sec Loss 1.2777 LearningRate 0.000056 Epoch: 31 Global Step: 652530 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:50:14,970-Speed 2484.42 samples/sec Loss 1.2796 LearningRate 0.000056 Epoch: 31 Global Step: 652540 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:50:23,211-Speed 2485.53 samples/sec Loss 1.2708 LearningRate 0.000056 Epoch: 31 Global Step: 652550 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:50:31,469-Speed 2480.43 samples/sec Loss 1.2868 LearningRate 0.000056 Epoch: 31 Global Step: 652560 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:50:39,660-Speed 2500.67 samples/sec Loss 1.2606 LearningRate 0.000056 Epoch: 31 Global Step: 652570 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:50:47,910-Speed 2482.69 samples/sec Loss 1.2575 LearningRate 0.000056 Epoch: 31 Global Step: 652580 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:50:56,152-Speed 2485.34 samples/sec Loss 1.2475 LearningRate 0.000056 Epoch: 31 Global Step: 652590 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:51:04,391-Speed 2485.85 samples/sec Loss 1.2923 LearningRate 0.000056 Epoch: 31 Global Step: 652600 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:51:12,631-Speed 2485.87 samples/sec Loss 1.2675 LearningRate 0.000056 Epoch: 31 Global Step: 652610 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:51:20,880-Speed 2484.33 samples/sec Loss 1.2461 LearningRate 0.000056 Epoch: 31 Global Step: 652620 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:51:29,069-Speed 2501.30 samples/sec Loss 1.2504 LearningRate 0.000056 Epoch: 31 Global Step: 652630 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:51:37,323-Speed 2481.56 samples/sec Loss 1.2913 LearningRate 0.000056 Epoch: 31 Global Step: 652640 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:51:45,568-Speed 2484.34 samples/sec Loss 1.2922 LearningRate 0.000056 Epoch: 31 Global Step: 652650 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:51:53,806-Speed 2486.47 samples/sec Loss 1.2571 LearningRate 0.000056 Epoch: 31 Global Step: 652660 Fp16 Grad Scale: 16384 Required: 41 hours
Training: 2022-07-11 19:52:02,042-Speed 2487.01 samples/sec Loss 1.2863 LearningRate 0.000056 Epoch: 31 Global Step: 652670 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:52:10,277-Speed 2487.32 samples/sec Loss 1.2684 LearningRate 0.000056 Epoch: 31 Global Step: 652680 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:52:18,462-Speed 2502.35 samples/sec Loss 1.3035 LearningRate 0.000056 Epoch: 31 Global Step: 652690 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:52:26,695-Speed 2488.39 samples/sec Loss 1.2785 LearningRate 0.000056 Epoch: 31 Global Step: 652700 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:52:34,925-Speed 2489.05 samples/sec Loss 1.2942 LearningRate 0.000056 Epoch: 31 Global Step: 652710 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:52:43,157-Speed 2488.01 samples/sec Loss 1.2717 LearningRate 0.000056 Epoch: 31 Global Step: 652720 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:52:51,385-Speed 2489.46 samples/sec Loss 1.2834 LearningRate 0.000056 Epoch: 31 Global Step: 652730 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:52:59,611-Speed 2490.08 samples/sec Loss 1.2946 LearningRate 0.000056 Epoch: 31 Global Step: 652740 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:53:07,789-Speed 2504.85 samples/sec Loss 1.2659 LearningRate 0.000056 Epoch: 31 Global Step: 652750 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:53:16,043-Speed 2481.44 samples/sec Loss 1.2513 LearningRate 0.000056 Epoch: 31 Global Step: 652760 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:53:24,269-Speed 2490.03 samples/sec Loss 1.2714 LearningRate 0.000056 Epoch: 31 Global Step: 652770 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:53:32,496-Speed 2489.71 samples/sec Loss 1.2836 LearningRate 0.000056 Epoch: 31 Global Step: 652780 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:53:40,723-Speed 2489.68 samples/sec Loss 1.2743 LearningRate 0.000056 Epoch: 31 Global Step: 652790 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:53:48,950-Speed 2489.95 samples/sec Loss 1.3159 LearningRate 0.000056 Epoch: 31 Global Step: 652800 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:53:57,124-Speed 2505.82 samples/sec Loss 1.2508 LearningRate 0.000056 Epoch: 31 Global Step: 652810 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:54:05,356-Speed 2488.30 samples/sec Loss 1.3341 LearningRate 0.000056 Epoch: 31 Global Step: 652820 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:54:13,583-Speed 2490.02 samples/sec Loss 1.3222 LearningRate 0.000056 Epoch: 31 Global Step: 652830 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:54:21,812-Speed 2488.93 samples/sec Loss 1.2711 LearningRate 0.000056 Epoch: 31 Global Step: 652840 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:54:30,048-Speed 2486.88 samples/sec Loss 1.2889 LearningRate 0.000056 Epoch: 31 Global Step: 652850 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:54:38,276-Speed 2489.43 samples/sec Loss 1.2857 LearningRate 0.000056 Epoch: 31 Global Step: 652860 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:54:46,450-Speed 2505.97 samples/sec Loss 1.3025 LearningRate 0.000056 Epoch: 31 Global Step: 652870 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:54:54,678-Speed 2489.44 samples/sec Loss 1.3080 LearningRate 0.000056 Epoch: 31 Global Step: 652880 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:55:02,899-Speed 2491.66 samples/sec Loss 1.2937 LearningRate 0.000056 Epoch: 31 Global Step: 652890 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:55:11,134-Speed 2487.62 samples/sec Loss 1.2657 LearningRate 0.000056 Epoch: 31 Global Step: 652900 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:55:19,355-Speed 2491.42 samples/sec Loss 1.2608 LearningRate 0.000056 Epoch: 31 Global Step: 652910 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:55:27,590-Speed 2487.29 samples/sec Loss 1.2744 LearningRate 0.000056 Epoch: 31 Global Step: 652920 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:55:35,759-Speed 2507.64 samples/sec Loss 1.2780 LearningRate 0.000056 Epoch: 31 Global Step: 652930 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:55:43,979-Speed 2491.78 samples/sec Loss 1.2799 LearningRate 0.000056 Epoch: 31 Global Step: 652940 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:55:52,206-Speed 2489.96 samples/sec Loss 1.2887 LearningRate 0.000056 Epoch: 31 Global Step: 652950 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:56:00,428-Speed 2491.40 samples/sec Loss 1.2785 LearningRate 0.000056 Epoch: 31 Global Step: 652960 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:56:08,650-Speed 2491.06 samples/sec Loss 1.2918 LearningRate 0.000056 Epoch: 31 Global Step: 652970 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-07-11 19:56:16,870-Speed 2491.68 samples/sec Loss 1.2787 LearningRate 0.000056 Epoch: 31 Global Step: 652980 Fp16 Grad Scale: 32768 Required: 40 hours Training: 
2022-07-11 19:56:25,037-Speed 2508.23 samples/sec Loss 1.2740 LearningRate 0.000056 Epoch: 31 Global Step: 652990 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:56:33,268-Speed 2488.61 samples/sec Loss 1.2937 LearningRate 0.000056 Epoch: 31 Global Step: 653000 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:56:41,498-Speed 2488.85 samples/sec Loss 1.2695 LearningRate 0.000056 Epoch: 31 Global Step: 653010 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:56:49,730-Speed 2488.21 samples/sec Loss 1.2416 LearningRate 0.000056 Epoch: 31 Global Step: 653020 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:56:57,951-Speed 2491.65 samples/sec Loss 1.2875 LearningRate 0.000056 Epoch: 31 Global Step: 653030 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:57:06,172-Speed 2491.69 samples/sec Loss 1.2679 LearningRate 0.000056 Epoch: 31 Global Step: 653040 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:57:14,343-Speed 2506.83 samples/sec Loss 1.2794 LearningRate 0.000056 Epoch: 31 Global Step: 653050 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:57:22,561-Speed 2492.23 samples/sec Loss 1.3204 LearningRate 0.000056 Epoch: 31 Global Step: 653060 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:57:30,780-Speed 2492.45 samples/sec Loss 1.2979 LearningRate 0.000056 Epoch: 31 Global Step: 653070 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:57:39,005-Speed 2490.21 samples/sec Loss 1.2801 LearningRate 0.000056 Epoch: 31 Global Step: 653080 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:57:47,225-Speed 2491.74 samples/sec Loss 1.2633 LearningRate 0.000056 Epoch: 31 Global Step: 653090 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:57:55,467-Speed 2485.26 samples/sec Loss 1.2833 LearningRate 0.000056 Epoch: 31 Global Step: 653100 Fp16 Grad Scale: 32768 Required: 40 hours Training: 
2022-07-11 19:58:03,644-Speed 2505.07 samples/sec Loss 1.3078 LearningRate 0.000056 Epoch: 31 Global Step: 653110 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:58:11,860-Speed 2493.29 samples/sec Loss 1.2701 LearningRate 0.000056 Epoch: 31 Global Step: 653120 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:58:20,074-Speed 2493.70 samples/sec Loss 1.2600 LearningRate 0.000056 Epoch: 31 Global Step: 653130 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:58:28,291-Speed 2492.75 samples/sec Loss 1.2793 LearningRate 0.000056 Epoch: 31 Global Step: 653140 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:58:36,512-Speed 2491.60 samples/sec Loss 1.2927 LearningRate 0.000056 Epoch: 31 Global Step: 653150 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:58:44,728-Speed 2493.02 samples/sec Loss 1.2812 LearningRate 0.000056 Epoch: 31 Global Step: 653160 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:58:52,890-Speed 2511.14 samples/sec Loss 1.2908 LearningRate 0.000056 Epoch: 31 Global Step: 653170 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:59:01,105-Speed 2493.62 samples/sec Loss 1.2660 LearningRate 0.000056 Epoch: 31 Global Step: 653180 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:59:09,324-Speed 2491.81 samples/sec Loss 1.2981 LearningRate 0.000056 Epoch: 31 Global Step: 653190 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:59:17,538-Speed 2493.94 samples/sec Loss 1.2746 LearningRate 0.000056 Epoch: 31 Global Step: 653200 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:59:25,751-Speed 2493.85 samples/sec Loss 1.2559 LearningRate 0.000056 Epoch: 31 Global Step: 653210 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:59:33,968-Speed 2492.84 samples/sec Loss 1.2576 LearningRate 0.000056 Epoch: 31 Global Step: 653220 Fp16 Grad Scale: 32768 Required: 40 hours Training: 
2022-07-11 19:59:42,129-Speed 2510.07 samples/sec Loss 1.2801 LearningRate 0.000056 Epoch: 31 Global Step: 653230 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:59:50,345-Speed 2492.98 samples/sec Loss 1.2600 LearningRate 0.000056 Epoch: 31 Global Step: 653240 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 19:59:58,561-Speed 2493.13 samples/sec Loss 1.2770 LearningRate 0.000056 Epoch: 31 Global Step: 653250 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:00:06,776-Speed 2493.40 samples/sec Loss 1.2686 LearningRate 0.000056 Epoch: 31 Global Step: 653260 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:00:15,000-Speed 2490.79 samples/sec Loss 1.3214 LearningRate 0.000056 Epoch: 31 Global Step: 653270 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:00:23,215-Speed 2493.46 samples/sec Loss 1.2722 LearningRate 0.000056 Epoch: 31 Global Step: 653280 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:00:31,378-Speed 2509.29 samples/sec Loss 1.2664 LearningRate 0.000056 Epoch: 31 Global Step: 653290 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:00:39,603-Speed 2490.53 samples/sec Loss 1.2687 LearningRate 0.000056 Epoch: 31 Global Step: 653300 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:00:47,819-Speed 2493.35 samples/sec Loss 1.2415 LearningRate 0.000056 Epoch: 31 Global Step: 653310 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:00:56,036-Speed 2493.13 samples/sec Loss 1.3009 LearningRate 0.000056 Epoch: 31 Global Step: 653320 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:01:04,252-Speed 2493.12 samples/sec Loss 1.2957 LearningRate 0.000056 Epoch: 31 Global Step: 653330 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:01:12,465-Speed 2493.92 samples/sec Loss 1.2780 LearningRate 0.000056 Epoch: 31 Global Step: 653340 Fp16 Grad Scale: 32768 Required: 40 hours Training: 
2022-07-11 20:01:20,628-Speed 2509.52 samples/sec Loss 1.3134 LearningRate 0.000056 Epoch: 31 Global Step: 653350 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:01:28,842-Speed 2493.84 samples/sec Loss 1.2828 LearningRate 0.000056 Epoch: 31 Global Step: 653360 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:01:37,053-Speed 2494.59 samples/sec Loss 1.2895 LearningRate 0.000056 Epoch: 31 Global Step: 653370 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:01:45,266-Speed 2493.92 samples/sec Loss 1.2856 LearningRate 0.000056 Epoch: 31 Global Step: 653380 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:01:53,482-Speed 2493.23 samples/sec Loss 1.2899 LearningRate 0.000056 Epoch: 31 Global Step: 653390 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:02:01,695-Speed 2493.81 samples/sec Loss 1.2956 LearningRate 0.000056 Epoch: 31 Global Step: 653400 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:02:09,855-Speed 2510.20 samples/sec Loss 1.2551 LearningRate 0.000056 Epoch: 31 Global Step: 653410 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:02:18,074-Speed 2492.32 samples/sec Loss 1.2673 LearningRate 0.000056 Epoch: 31 Global Step: 653420 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:02:26,290-Speed 2492.77 samples/sec Loss 1.2677 LearningRate 0.000056 Epoch: 31 Global Step: 653430 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:02:34,502-Speed 2494.66 samples/sec Loss 1.2879 LearningRate 0.000056 Epoch: 31 Global Step: 653440 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:02:42,722-Speed 2491.98 samples/sec Loss 1.2616 LearningRate 0.000056 Epoch: 31 Global Step: 653450 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:02:50,933-Speed 2494.80 samples/sec Loss 1.2645 LearningRate 0.000056 Epoch: 31 Global Step: 653460 Fp16 Grad Scale: 32768 Required: 40 hours Training: 
2022-07-11 20:02:59,094-Speed 2509.79 samples/sec Loss 1.2719 LearningRate 0.000056 Epoch: 31 Global Step: 653470 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:03:07,303-Speed 2495.28 samples/sec Loss 1.3117 LearningRate 0.000056 Epoch: 31 Global Step: 653480 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:03:15,514-Speed 2494.59 samples/sec Loss 1.2756 LearningRate 0.000056 Epoch: 31 Global Step: 653490 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:03:23,723-Speed 2495.24 samples/sec Loss 1.2659 LearningRate 0.000056 Epoch: 31 Global Step: 653500 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:03:31,933-Speed 2494.92 samples/sec Loss 1.2872 LearningRate 0.000056 Epoch: 31 Global Step: 653510 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:03:40,145-Speed 2494.18 samples/sec Loss 1.2919 LearningRate 0.000056 Epoch: 31 Global Step: 653520 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:03:48,301-Speed 2511.69 samples/sec Loss 1.2810 LearningRate 0.000056 Epoch: 31 Global Step: 653530 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:03:56,510-Speed 2495.05 samples/sec Loss 1.2470 LearningRate 0.000056 Epoch: 31 Global Step: 653540 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:04:04,720-Speed 2494.94 samples/sec Loss 1.2593 LearningRate 0.000056 Epoch: 31 Global Step: 653550 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:04:12,941-Speed 2491.88 samples/sec Loss 1.2688 LearningRate 0.000056 Epoch: 31 Global Step: 653560 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:04:21,152-Speed 2494.78 samples/sec Loss 1.2637 LearningRate 0.000056 Epoch: 31 Global Step: 653570 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:04:29,378-Speed 2490.24 samples/sec Loss 1.2811 LearningRate 0.000056 Epoch: 31 Global Step: 653580 Fp16 Grad Scale: 32768 Required: 40 hours Training: 
2022-07-11 20:04:37,538-Speed 2510.03 samples/sec Loss 1.2632 LearningRate 0.000056 Epoch: 31 Global Step: 653590 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:04:45,749-Speed 2494.69 samples/sec Loss 1.2881 LearningRate 0.000056 Epoch: 31 Global Step: 653600 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:04:53,971-Speed 2491.53 samples/sec Loss 1.2623 LearningRate 0.000056 Epoch: 31 Global Step: 653610 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:05:02,180-Speed 2495.53 samples/sec Loss 1.2932 LearningRate 0.000056 Epoch: 31 Global Step: 653620 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:05:10,403-Speed 2490.86 samples/sec Loss 1.2847 LearningRate 0.000056 Epoch: 31 Global Step: 653630 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:05:18,614-Speed 2494.47 samples/sec Loss 1.2611 LearningRate 0.000056 Epoch: 31 Global Step: 653640 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:05:26,776-Speed 2509.85 samples/sec Loss 1.3216 LearningRate 0.000056 Epoch: 31 Global Step: 653650 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-07-11 20:05:34,942-Speed 2508.24 samples/sec Loss 1.2882 LearningRate 0.000056 Epoch: 31 Global Step: 653660 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:05:43,151-Speed 2495.23 samples/sec Loss 1.3018 LearningRate 0.000056 Epoch: 31 Global Step: 653670 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:05:51,367-Speed 2493.09 samples/sec Loss 1.2638 LearningRate 0.000055 Epoch: 31 Global Step: 653680 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:05:59,576-Speed 2495.35 samples/sec Loss 1.2539 LearningRate 0.000055 Epoch: 31 Global Step: 653690 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:06:07,786-Speed 2494.70 samples/sec Loss 1.2861 LearningRate 0.000055 Epoch: 31 Global Step: 653700 Fp16 Grad Scale: 16384 Required: 40 hours Training: 
2022-07-11 20:06:15,957-Speed 2507.27 samples/sec Loss 1.2763 LearningRate 0.000055 Epoch: 31 Global Step: 653710 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:06:24,172-Speed 2493.40 samples/sec Loss 1.2827 LearningRate 0.000055 Epoch: 31 Global Step: 653720 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:06:32,389-Speed 2492.59 samples/sec Loss 1.2693 LearningRate 0.000055 Epoch: 31 Global Step: 653730 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:06:40,610-Speed 2491.66 samples/sec Loss 1.2433 LearningRate 0.000055 Epoch: 31 Global Step: 653740 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:06:48,821-Speed 2494.70 samples/sec Loss 1.2414 LearningRate 0.000055 Epoch: 31 Global Step: 653750 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:06:57,029-Speed 2495.30 samples/sec Loss 1.2860 LearningRate 0.000055 Epoch: 31 Global Step: 653760 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:07:05,189-Speed 2510.35 samples/sec Loss 1.2687 LearningRate 0.000055 Epoch: 31 Global Step: 653770 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:07:13,404-Speed 2493.36 samples/sec Loss 1.2691 LearningRate 0.000055 Epoch: 31 Global Step: 653780 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:07:21,613-Speed 2495.31 samples/sec Loss 1.2517 LearningRate 0.000055 Epoch: 31 Global Step: 653790 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:07:29,822-Speed 2495.40 samples/sec Loss 1.2707 LearningRate 0.000055 Epoch: 31 Global Step: 653800 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:07:38,029-Speed 2495.78 samples/sec Loss 1.2656 LearningRate 0.000055 Epoch: 31 Global Step: 653810 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:07:46,240-Speed 2494.45 samples/sec Loss 1.2771 LearningRate 0.000055 Epoch: 31 Global Step: 653820 Fp16 Grad Scale: 16384 Required: 40 hours Training: 
2022-07-11 20:07:54,399-Speed 2510.62 samples/sec Loss 1.2749 LearningRate 0.000055 Epoch: 31 Global Step: 653830 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:08:02,606-Speed 2495.88 samples/sec Loss 1.3038 LearningRate 0.000055 Epoch: 31 Global Step: 653840 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:08:10,814-Speed 2495.40 samples/sec Loss 1.2767 LearningRate 0.000055 Epoch: 31 Global Step: 653850 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:08:19,026-Speed 2494.31 samples/sec Loss 1.2675 LearningRate 0.000055 Epoch: 31 Global Step: 653860 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:08:27,237-Speed 2494.45 samples/sec Loss 1.2388 LearningRate 0.000055 Epoch: 31 Global Step: 653870 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:08:35,448-Speed 2494.50 samples/sec Loss 1.2891 LearningRate 0.000055 Epoch: 31 Global Step: 653880 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:08:43,607-Speed 2510.74 samples/sec Loss 1.2996 LearningRate 0.000055 Epoch: 31 Global Step: 653890 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:08:51,815-Speed 2495.18 samples/sec Loss 1.3129 LearningRate 0.000055 Epoch: 31 Global Step: 653900 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:09:00,024-Speed 2495.41 samples/sec Loss 1.2750 LearningRate 0.000055 Epoch: 31 Global Step: 653910 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:09:08,230-Speed 2496.21 samples/sec Loss 1.2931 LearningRate 0.000055 Epoch: 31 Global Step: 653920 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:09:16,453-Speed 2490.92 samples/sec Loss 1.2866 LearningRate 0.000055 Epoch: 31 Global Step: 653930 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:09:24,664-Speed 2494.63 samples/sec Loss 1.2800 LearningRate 0.000055 Epoch: 31 Global Step: 653940 Fp16 Grad Scale: 16384 Required: 40 hours Training: 
2022-07-11 20:09:32,820-Speed 2511.38 samples/sec Loss 1.2411 LearningRate 0.000055 Epoch: 31 Global Step: 653950 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:09:41,033-Speed 2494.41 samples/sec Loss 1.2799 LearningRate 0.000055 Epoch: 31 Global Step: 653960 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:09:49,241-Speed 2495.49 samples/sec Loss 1.2660 LearningRate 0.000055 Epoch: 31 Global Step: 653970 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:09:57,452-Speed 2494.71 samples/sec Loss 1.2568 LearningRate 0.000055 Epoch: 31 Global Step: 653980 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:10:05,661-Speed 2494.98 samples/sec Loss 1.2646 LearningRate 0.000055 Epoch: 31 Global Step: 653990 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:10:13,869-Speed 2495.71 samples/sec Loss 1.2737 LearningRate 0.000055 Epoch: 31 Global Step: 654000 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:10:22,030-Speed 2509.65 samples/sec Loss 1.2689 LearningRate 0.000055 Epoch: 31 Global Step: 654010 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:10:30,239-Speed 2495.28 samples/sec Loss 1.2797 LearningRate 0.000055 Epoch: 31 Global Step: 654020 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:10:38,452-Speed 2494.04 samples/sec Loss 1.2815 LearningRate 0.000055 Epoch: 31 Global Step: 654030 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:10:46,658-Speed 2496.15 samples/sec Loss 1.2684 LearningRate 0.000055 Epoch: 31 Global Step: 654040 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:10:54,866-Speed 2495.40 samples/sec Loss 1.2739 LearningRate 0.000055 Epoch: 31 Global Step: 654050 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:11:03,073-Speed 2496.00 samples/sec Loss 1.2666 LearningRate 0.000055 Epoch: 31 Global Step: 654060 Fp16 Grad Scale: 16384 Required: 40 hours Training: 
2022-07-11 20:11:11,229-Speed 2511.22 samples/sec Loss 1.2441 LearningRate 0.000055 Epoch: 31 Global Step: 654070 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:11:19,437-Speed 2495.60 samples/sec Loss 1.2612 LearningRate 0.000055 Epoch: 31 Global Step: 654080 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:11:27,658-Speed 2491.77 samples/sec Loss 1.2637 LearningRate 0.000055 Epoch: 31 Global Step: 654090 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:11:35,866-Speed 2495.53 samples/sec Loss 1.2998 LearningRate 0.000055 Epoch: 31 Global Step: 654100 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:11:44,079-Speed 2494.30 samples/sec Loss 1.2955 LearningRate 0.000055 Epoch: 31 Global Step: 654110 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-07-11 20:11:52,247-Speed 2507.71 samples/sec Loss 1.2607 LearningRate 0.000055 Epoch: 31 Global Step: 654120 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:12:00,400-Speed 2512.17 samples/sec Loss 1.2729 LearningRate 0.000055 Epoch: 31 Global Step: 654130 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:12:08,625-Speed 2490.56 samples/sec Loss 1.2576 LearningRate 0.000055 Epoch: 31 Global Step: 654140 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:12:16,830-Speed 2496.36 samples/sec Loss 1.2911 LearningRate 0.000055 Epoch: 31 Global Step: 654150 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:12:25,043-Speed 2494.10 samples/sec Loss 1.2633 LearningRate 0.000055 Epoch: 31 Global Step: 654160 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:12:33,255-Speed 2494.25 samples/sec Loss 1.2692 LearningRate 0.000055 Epoch: 31 Global Step: 654170 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:12:41,464-Speed 2495.32 samples/sec Loss 1.2399 LearningRate 0.000055 Epoch: 31 Global Step: 654180 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 
20:12:49,628-Speed 2509.14 samples/sec Loss 1.2613 LearningRate 0.000055 Epoch: 31 Global Step: 654190 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:12:57,843-Speed 2493.47 samples/sec Loss 1.2621 LearningRate 0.000055 Epoch: 31 Global Step: 654200 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:13:06,056-Speed 2494.16 samples/sec Loss 1.2671 LearningRate 0.000055 Epoch: 31 Global Step: 654210 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:13:14,268-Speed 2494.10 samples/sec Loss 1.2711 LearningRate 0.000055 Epoch: 31 Global Step: 654220 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:13:22,477-Speed 2495.16 samples/sec Loss 1.2988 LearningRate 0.000055 Epoch: 31 Global Step: 654230 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:13:30,684-Speed 2495.80 samples/sec Loss 1.2736 LearningRate 0.000055 Epoch: 31 Global Step: 654240 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:13:38,845-Speed 2510.21 samples/sec Loss 1.2820 LearningRate 0.000055 Epoch: 31 Global Step: 654250 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:13:47,053-Speed 2495.50 samples/sec Loss 1.2791 LearningRate 0.000055 Epoch: 31 Global Step: 654260 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:13:55,257-Speed 2496.69 samples/sec Loss 1.2500 LearningRate 0.000055 Epoch: 31 Global Step: 654270 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:14:03,467-Speed 2494.81 samples/sec Loss 1.2641 LearningRate 0.000055 Epoch: 31 Global Step: 654280 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:14:11,669-Speed 2497.18 samples/sec Loss 1.2819 LearningRate 0.000055 Epoch: 31 Global Step: 654290 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:14:19,888-Speed 2492.21 samples/sec Loss 1.2482 LearningRate 0.000055 Epoch: 31 Global Step: 654300 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:14:28,058-Speed 
2507.21 samples/sec Loss 1.2959 LearningRate 0.000055 Epoch: 31 Global Step: 654310 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:14:36,271-Speed 2494.03 samples/sec Loss 1.2807 LearningRate 0.000055 Epoch: 31 Global Step: 654320 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:14:44,476-Speed 2496.04 samples/sec Loss 1.3101 LearningRate 0.000055 Epoch: 31 Global Step: 654330 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:14:52,678-Speed 2497.50 samples/sec Loss 1.2525 LearningRate 0.000055 Epoch: 31 Global Step: 654340 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:15:00,882-Speed 2496.69 samples/sec Loss 1.2202 LearningRate 0.000055 Epoch: 31 Global Step: 654350 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:15:09,089-Speed 2495.81 samples/sec Loss 1.2383 LearningRate 0.000055 Epoch: 31 Global Step: 654360 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:15:17,241-Speed 2513.06 samples/sec Loss 1.2976 LearningRate 0.000055 Epoch: 31 Global Step: 654370 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:15:25,446-Speed 2496.39 samples/sec Loss 1.2626 LearningRate 0.000055 Epoch: 31 Global Step: 654380 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:15:33,647-Speed 2497.51 samples/sec Loss 1.2781 LearningRate 0.000055 Epoch: 31 Global Step: 654390 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:15:41,865-Speed 2492.57 samples/sec Loss 1.2524 LearningRate 0.000055 Epoch: 31 Global Step: 654400 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:15:50,068-Speed 2497.03 samples/sec Loss 1.2733 LearningRate 0.000055 Epoch: 31 Global Step: 654410 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:15:58,275-Speed 2495.84 samples/sec Loss 1.2719 LearningRate 0.000055 Epoch: 31 Global Step: 654420 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:16:06,429-Speed 2512.15 samples/sec 
Loss 1.2646 LearningRate 0.000055 Epoch: 31 Global Step: 654430 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:16:14,636-Speed 2495.95 samples/sec Loss 1.2620 LearningRate 0.000055 Epoch: 31 Global Step: 654440 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:16:22,837-Speed 2497.54 samples/sec Loss 1.2553 LearningRate 0.000055 Epoch: 31 Global Step: 654450 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:16:31,041-Speed 2496.95 samples/sec Loss 1.2845 LearningRate 0.000055 Epoch: 31 Global Step: 654460 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:16:39,244-Speed 2496.96 samples/sec Loss 1.2879 LearningRate 0.000055 Epoch: 31 Global Step: 654470 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:16:47,450-Speed 2496.06 samples/sec Loss 1.2626 LearningRate 0.000055 Epoch: 31 Global Step: 654480 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:16:55,605-Speed 2511.86 samples/sec Loss 1.2727 LearningRate 0.000055 Epoch: 31 Global Step: 654490 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:17:03,807-Speed 2497.48 samples/sec Loss 1.2753 LearningRate 0.000055 Epoch: 31 Global Step: 654500 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:17:12,010-Speed 2496.95 samples/sec Loss 1.2739 LearningRate 0.000055 Epoch: 31 Global Step: 654510 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:17:20,219-Speed 2495.20 samples/sec Loss 1.2807 LearningRate 0.000055 Epoch: 31 Global Step: 654520 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:17:28,425-Speed 2496.47 samples/sec Loss 1.2508 LearningRate 0.000055 Epoch: 31 Global Step: 654530 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:17:36,627-Speed 2497.49 samples/sec Loss 1.2537 LearningRate 0.000055 Epoch: 31 Global Step: 654540 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:17:44,787-Speed 2510.07 samples/sec Loss 1.2915 
LearningRate 0.000055 Epoch: 31 Global Step: 654550 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:17:52,995-Speed 2495.68 samples/sec Loss 1.2894 LearningRate 0.000055 Epoch: 31 Global Step: 654560 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:18:01,206-Speed 2494.45 samples/sec Loss 1.2491 LearningRate 0.000055 Epoch: 31 Global Step: 654570 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:18:09,421-Speed 2493.33 samples/sec Loss 1.2540 LearningRate 0.000055 Epoch: 31 Global Step: 654580 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:18:17,639-Speed 2492.24 samples/sec Loss 1.2502 LearningRate 0.000055 Epoch: 31 Global Step: 654590 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:18:25,845-Speed 2496.59 samples/sec Loss 1.2425 LearningRate 0.000055 Epoch: 31 Global Step: 654600 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:18:33,997-Speed 2512.82 samples/sec Loss 1.2483 LearningRate 0.000055 Epoch: 31 Global Step: 654610 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:18:42,214-Speed 2492.51 samples/sec Loss 1.2570 LearningRate 0.000055 Epoch: 31 Global Step: 654620 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:18:50,423-Speed 2495.42 samples/sec Loss 1.2747 LearningRate 0.000055 Epoch: 31 Global Step: 654630 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:18:58,636-Speed 2494.33 samples/sec Loss 1.2796 LearningRate 0.000055 Epoch: 31 Global Step: 654640 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:19:06,841-Speed 2497.38 samples/sec Loss 1.2328 LearningRate 0.000055 Epoch: 31 Global Step: 654650 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:19:15,058-Speed 2492.91 samples/sec Loss 1.2337 LearningRate 0.000055 Epoch: 31 Global Step: 654660 Fp16 Grad Scale: 8192 Required: 40 hours Training: 2022-07-11 20:19:23,211-Speed 2512.40 samples/sec Loss 1.2864 LearningRate 
0.000055 Epoch: 31 Global Step: 654670 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:19:31,416-Speed 2496.48 samples/sec Loss 1.2566 LearningRate 0.000055 Epoch: 31 Global Step: 654680 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:19:39,618-Speed 2497.37 samples/sec Loss 1.2497 LearningRate 0.000055 Epoch: 31 Global Step: 654690 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:19:47,824-Speed 2496.50 samples/sec Loss 1.2753 LearningRate 0.000055 Epoch: 31 Global Step: 654700 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:19:56,030-Speed 2496.04 samples/sec Loss 1.2678 LearningRate 0.000055 Epoch: 31 Global Step: 654710 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:20:04,233-Speed 2497.00 samples/sec Loss 1.2631 LearningRate 0.000055 Epoch: 31 Global Step: 654720 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:20:12,398-Speed 2508.67 samples/sec Loss 1.2734 LearningRate 0.000055 Epoch: 31 Global Step: 654730 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:20:20,605-Speed 2495.88 samples/sec Loss 1.2740 LearningRate 0.000055 Epoch: 31 Global Step: 654740 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:20:28,811-Speed 2495.84 samples/sec Loss 1.2356 LearningRate 0.000055 Epoch: 31 Global Step: 654750 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:20:37,018-Speed 2495.87 samples/sec Loss 1.2849 LearningRate 0.000055 Epoch: 31 Global Step: 654760 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:20:45,221-Speed 2497.18 samples/sec Loss 1.2605 LearningRate 0.000055 Epoch: 31 Global Step: 654770 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:20:53,427-Speed 2496.12 samples/sec Loss 1.2588 LearningRate 0.000055 Epoch: 31 Global Step: 654780 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:21:01,580-Speed 2512.19 samples/sec Loss 1.2846 LearningRate 0.000055 Epoch: 31 Global Step: 654790 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:21:09,792-Speed 2494.62 samples/sec Loss 1.2471 LearningRate 0.000055 Epoch: 31 Global Step: 654800 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:21:17,999-Speed 2495.99 samples/sec Loss 1.2359 LearningRate 0.000055 Epoch: 31 Global Step: 654810 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:21:26,204-Speed 2496.34 samples/sec Loss 1.2805 LearningRate 0.000055 Epoch: 31 Global Step: 654820 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:21:34,414-Speed 2495.19 samples/sec Loss 1.2887 LearningRate 0.000055 Epoch: 31 Global Step: 654830 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:21:42,624-Speed 2494.64 samples/sec Loss 1.2708 LearningRate 0.000055 Epoch: 31 Global Step: 654840 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:21:50,776-Speed 2512.79 samples/sec Loss 1.2502 LearningRate 0.000055 Epoch: 31 Global Step: 654850 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:21:58,984-Speed 2495.59 samples/sec Loss 1.2416 LearningRate 0.000055 Epoch: 31 Global Step: 654860 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:22:07,191-Speed 2495.94 samples/sec Loss 1.2515 LearningRate 0.000055 Epoch: 31 Global Step: 654870 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:22:15,396-Speed 2496.27 samples/sec Loss 1.2591 LearningRate 0.000055 Epoch: 31 Global Step: 654880 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:22:23,603-Speed 2495.90 samples/sec Loss 1.2310 LearningRate 0.000055 Epoch: 31 Global Step: 654890 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:22:31,811-Speed 2495.37 samples/sec Loss 1.2550 LearningRate 0.000055 Epoch: 31 Global Step: 654900 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:22:39,963-Speed 2512.72 samples/sec Loss 1.2626 LearningRate 0.000055 Epoch: 31 Global Step: 654910 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:22:48,168-Speed 2496.39 samples/sec Loss 1.2901 LearningRate 0.000055 Epoch: 31 Global Step: 654920 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:22:56,374-Speed 2496.12 samples/sec Loss 1.2621 LearningRate 0.000055 Epoch: 31 Global Step: 654930 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:23:04,580-Speed 2496.27 samples/sec Loss 1.2685 LearningRate 0.000055 Epoch: 31 Global Step: 654940 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:23:12,792-Speed 2494.13 samples/sec Loss 1.2624 LearningRate 0.000055 Epoch: 31 Global Step: 654950 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:23:20,998-Speed 2496.22 samples/sec Loss 1.2674 LearningRate 0.000055 Epoch: 31 Global Step: 654960 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:23:29,178-Speed 2504.26 samples/sec Loss 1.2392 LearningRate 0.000055 Epoch: 31 Global Step: 654970 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:23:37,389-Speed 2494.37 samples/sec Loss 1.2451 LearningRate 0.000055 Epoch: 31 Global Step: 654980 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:23:45,599-Speed 2495.22 samples/sec Loss 1.2476 LearningRate 0.000055 Epoch: 31 Global Step: 654990 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:23:53,805-Speed 2495.84 samples/sec Loss 1.2815 LearningRate 0.000055 Epoch: 31 Global Step: 655000 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:24:02,010-Speed 2496.40 samples/sec Loss 1.2856 LearningRate 0.000055 Epoch: 31 Global Step: 655010 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:24:10,233-Speed 2491.08 samples/sec Loss 1.2581 LearningRate 0.000055 Epoch: 31 Global Step: 655020 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:24:18,384-Speed 2513.20 samples/sec Loss 1.2311 LearningRate 0.000055 Epoch: 31 Global Step: 655030 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:24:26,589-Speed 2496.20 samples/sec Loss 1.2898 LearningRate 0.000055 Epoch: 31 Global Step: 655040 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:24:34,793-Speed 2496.71 samples/sec Loss 1.2786 LearningRate 0.000055 Epoch: 31 Global Step: 655050 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:24:42,998-Speed 2496.64 samples/sec Loss 1.2796 LearningRate 0.000055 Epoch: 31 Global Step: 655060 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:24:51,208-Speed 2494.95 samples/sec Loss 1.2802 LearningRate 0.000055 Epoch: 31 Global Step: 655070 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:24:59,425-Speed 2492.85 samples/sec Loss 1.2799 LearningRate 0.000055 Epoch: 31 Global Step: 655080 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:25:07,581-Speed 2511.35 samples/sec Loss 1.2550 LearningRate 0.000055 Epoch: 31 Global Step: 655090 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:25:15,786-Speed 2496.63 samples/sec Loss 1.2357 LearningRate 0.000055 Epoch: 31 Global Step: 655100 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:25:23,987-Speed 2497.56 samples/sec Loss 1.2503 LearningRate 0.000055 Epoch: 31 Global Step: 655110 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:25:32,199-Speed 2494.32 samples/sec Loss 1.2520 LearningRate 0.000055 Epoch: 31 Global Step: 655120 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:25:40,405-Speed 2496.06 samples/sec Loss 1.2839 LearningRate 0.000055 Epoch: 31 Global Step: 655130 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:25:48,609-Speed 2496.69 samples/sec Loss 1.2737 LearningRate 0.000055 Epoch: 31 Global Step: 655140 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:25:56,759-Speed 2513.16 samples/sec Loss 1.2445 LearningRate 0.000055 Epoch: 31 Global Step: 655150 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:26:04,963-Speed 2496.69 samples/sec Loss 1.2735 LearningRate 0.000055 Epoch: 31 Global Step: 655160 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:26:13,181-Speed 2492.58 samples/sec Loss 1.2362 LearningRate 0.000055 Epoch: 31 Global Step: 655170 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:26:21,392-Speed 2494.45 samples/sec Loss 1.2707 LearningRate 0.000055 Epoch: 31 Global Step: 655180 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:26:29,598-Speed 2496.48 samples/sec Loss 1.2854 LearningRate 0.000055 Epoch: 31 Global Step: 655190 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:26:37,802-Speed 2496.55 samples/sec Loss 1.2976 LearningRate 0.000055 Epoch: 31 Global Step: 655200 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:26:45,953-Speed 2512.77 samples/sec Loss 1.2743 LearningRate 0.000055 Epoch: 31 Global Step: 655210 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:26:54,159-Speed 2496.46 samples/sec Loss 1.2655 LearningRate 0.000055 Epoch: 31 Global Step: 655220 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:27:02,369-Speed 2494.77 samples/sec Loss 1.2429 LearningRate 0.000055 Epoch: 31 Global Step: 655230 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:27:10,573-Speed 2496.58 samples/sec Loss 1.2853 LearningRate 0.000055 Epoch: 31 Global Step: 655240 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:27:18,802-Speed 2489.25 samples/sec Loss 1.2720 LearningRate 0.000055 Epoch: 31 Global Step: 655250 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:27:27,006-Speed 2496.71 samples/sec Loss 1.2857 LearningRate 0.000055 Epoch: 31 Global Step: 655260 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:27:35,159-Speed 2512.52 samples/sec Loss 1.2703 LearningRate 0.000054 Epoch: 31 Global Step: 655270 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:27:43,362-Speed 2496.89 samples/sec Loss 1.2560 LearningRate 0.000054 Epoch: 31 Global Step: 655280 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:27:51,574-Speed 2494.28 samples/sec Loss 1.2771 LearningRate 0.000054 Epoch: 31 Global Step: 655290 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:27:59,779-Speed 2496.70 samples/sec Loss 1.2437 LearningRate 0.000054 Epoch: 31 Global Step: 655300 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:28:07,986-Speed 2496.10 samples/sec Loss 1.2500 LearningRate 0.000054 Epoch: 31 Global Step: 655310 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:28:16,186-Speed 2498.01 samples/sec Loss 1.2629 LearningRate 0.000054 Epoch: 31 Global Step: 655320 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:28:24,342-Speed 2511.51 samples/sec Loss 1.2631 LearningRate 0.000054 Epoch: 31 Global Step: 655330 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:28:32,546-Speed 2496.72 samples/sec Loss 1.2473 LearningRate 0.000054 Epoch: 31 Global Step: 655340 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:28:40,749-Speed 2497.06 samples/sec Loss 1.2798 LearningRate 0.000054 Epoch: 31 Global Step: 655350 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:28:48,969-Speed 2491.96 samples/sec Loss 1.2815 LearningRate 0.000054 Epoch: 31 Global Step: 655360 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:28:57,186-Speed 2492.96 samples/sec Loss 1.2629 LearningRate 0.000054 Epoch: 31 Global Step: 655370 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:29:05,389-Speed 2497.08 samples/sec Loss 1.2589 LearningRate 0.000054 Epoch: 31 Global Step: 655380 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:29:13,540-Speed 2512.73 samples/sec Loss 1.2187 LearningRate 0.000054 Epoch: 31 Global Step: 655390 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:29:21,746-Speed 2496.25 samples/sec Loss 1.2521 LearningRate 0.000054 Epoch: 31 Global Step: 655400 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:29:29,955-Speed 2495.41 samples/sec Loss 1.2652 LearningRate 0.000054 Epoch: 31 Global Step: 655410 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:29:38,168-Speed 2494.04 samples/sec Loss 1.2661 LearningRate 0.000054 Epoch: 31 Global Step: 655420 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:29:46,376-Speed 2495.43 samples/sec Loss 1.2358 LearningRate 0.000054 Epoch: 31 Global Step: 655430 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:29:54,583-Speed 2495.73 samples/sec Loss 1.2875 LearningRate 0.000054 Epoch: 31 Global Step: 655440 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:30:02,752-Speed 2507.74 samples/sec Loss 1.2501 LearningRate 0.000054 Epoch: 31 Global Step: 655450 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:30:10,975-Speed 2490.71 samples/sec Loss 1.2483 LearningRate 0.000054 Epoch: 31 Global Step: 655460 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:30:19,190-Speed 2493.40 samples/sec Loss 1.2299 LearningRate 0.000054 Epoch: 31 Global Step: 655470 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:30:27,396-Speed 2496.28 samples/sec Loss 1.2819 LearningRate 0.000054 Epoch: 31 Global Step: 655480 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:30:35,607-Speed 2495.06 samples/sec Loss 1.2524 LearningRate 0.000054 Epoch: 31 Global Step: 655490 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:30:43,815-Speed 2495.34 samples/sec Loss 1.2903 LearningRate 0.000054 Epoch: 31 Global Step: 655500 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:30:51,972-Speed 2511.09 samples/sec Loss 1.2886 LearningRate 0.000054 Epoch: 31 Global Step: 655510 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:31:00,182-Speed 2495.17 samples/sec Loss 1.2840 LearningRate 0.000054 Epoch: 31 Global Step: 655520 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:31:08,412-Speed 2489.07 samples/sec Loss 1.2788 LearningRate 0.000054 Epoch: 31 Global Step: 655530 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:31:16,619-Speed 2495.75 samples/sec Loss 1.2584 LearningRate 0.000054 Epoch: 31 Global Step: 655540 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:31:24,828-Speed 2495.07 samples/sec Loss 1.2559 LearningRate 0.000054 Epoch: 31 Global Step: 655550 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:31:33,042-Speed 2493.97 samples/sec Loss 1.3015 LearningRate 0.000054 Epoch: 31 Global Step: 655560 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:31:41,195-Speed 2512.39 samples/sec Loss 1.2696 LearningRate 0.000054 Epoch: 31 Global Step: 655570 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:31:49,402-Speed 2495.55 samples/sec Loss 1.2629 LearningRate 0.000054 Epoch: 31 Global Step: 655580 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:31:57,609-Speed 2495.97 samples/sec Loss 1.2801 LearningRate 0.000054 Epoch: 31 Global Step: 655590 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:32:05,820-Speed 2494.52 samples/sec Loss 1.2652 LearningRate 0.000054 Epoch: 31 Global Step: 655600 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:32:14,029-Speed 2495.24 samples/sec Loss 1.2445 LearningRate 0.000054 Epoch: 31 Global Step: 655610 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:32:22,237-Speed 2495.58 samples/sec Loss 1.2624 LearningRate 0.000054 Epoch: 31 Global Step: 655620 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:32:30,389-Speed 2512.48 samples/sec Loss 1.2730 LearningRate 0.000054 Epoch: 31 Global Step: 655630 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:32:38,597-Speed 2495.86 samples/sec Loss 1.2785 LearningRate 0.000054 Epoch: 31 Global Step: 655640 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:32:46,806-Speed 2495.14 samples/sec Loss 1.2593 LearningRate 0.000054 Epoch: 31 Global Step: 655650 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:32:55,015-Speed 2494.93 samples/sec Loss 1.2796 LearningRate 0.000054 Epoch: 31 Global Step: 655660 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:33:03,224-Speed 2495.40 samples/sec Loss 1.2726 LearningRate 0.000054 Epoch: 31 Global Step: 655670 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:33:11,428-Speed 2496.70 samples/sec Loss 1.2744 LearningRate 0.000054 Epoch: 31 Global Step: 655680 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:33:19,580-Speed 2512.87 samples/sec Loss 1.2521 LearningRate 0.000054 Epoch: 31 Global Step: 655690 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:33:27,785-Speed 2496.56 samples/sec Loss 1.2710 LearningRate 0.000054 Epoch: 31 Global Step: 655700 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:33:35,993-Speed 2495.20 samples/sec Loss 1.2868 LearningRate 0.000054 Epoch: 31 Global Step: 655710 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:33:44,200-Speed 2496.13 samples/sec Loss 1.2623 LearningRate 0.000054 Epoch: 31 Global Step: 655720 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:33:52,403-Speed 2497.07 samples/sec Loss 1.2756 LearningRate 0.000054 Epoch: 31 Global Step: 655730 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:34:00,608-Speed 2496.35 samples/sec Loss 1.2863 LearningRate 0.000054 Epoch: 31 Global Step: 655740 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:34:08,762-Speed 2512.00 samples/sec Loss 1.2407 LearningRate 0.000054 Epoch: 31 Global Step: 655750 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:34:16,966-Speed 2496.80 samples/sec Loss 1.2644 LearningRate 0.000054 Epoch: 31 Global Step: 655760 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:34:25,170-Speed 2496.79 samples/sec Loss 1.2779 LearningRate 0.000054 Epoch: 31 Global Step: 655770 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:34:33,379-Speed 2495.21 samples/sec Loss 1.2319 LearningRate 0.000054 Epoch: 31 Global Step: 655780 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:34:41,587-Speed 2495.46 samples/sec Loss 1.2329 LearningRate 0.000054 Epoch: 31 Global Step: 655790 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:34:49,794-Speed 2495.79 samples/sec Loss 1.2684 LearningRate 0.000054 Epoch: 31 Global Step: 655800 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:34:57,948-Speed 2512.17 samples/sec Loss 1.2651 LearningRate 0.000054 Epoch: 31 Global Step: 655810 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:35:06,153-Speed 2496.36 samples/sec Loss 1.2164 LearningRate 0.000054 Epoch: 31 Global Step: 655820 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:35:14,355-Speed 2497.49 samples/sec Loss 1.2702 LearningRate 0.000054 Epoch: 31 Global Step: 655830 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:35:22,564-Speed 2495.19 samples/sec Loss 1.2954 LearningRate 0.000054 Epoch: 31 Global Step: 655840 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:35:30,768-Speed 2496.66 samples/sec Loss 1.2520 LearningRate 0.000054 Epoch: 31 Global Step: 655850 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:35:38,976-Speed 2495.93 samples/sec Loss 1.2718 LearningRate 0.000054 Epoch: 31 Global Step: 655860 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:35:47,127-Speed 2512.91 samples/sec Loss 1.2647 LearningRate 0.000054 Epoch: 31 Global Step: 655870 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:35:55,335-Speed 2495.53 samples/sec Loss 1.2479 LearningRate 0.000054 Epoch: 31 Global Step: 655880 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:36:03,542-Speed 2495.81 samples/sec Loss 1.2547 LearningRate 0.000054 Epoch: 31 Global Step: 655890 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:36:11,747-Speed 2496.52 samples/sec Loss 1.2496 LearningRate 0.000054 Epoch: 31 Global Step: 655900 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:36:19,951-Speed 2497.24 samples/sec Loss 1.2476 LearningRate 0.000054 Epoch: 31 Global Step: 655910 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:36:28,158-Speed 2495.62 samples/sec Loss 1.2799 LearningRate 0.000054 Epoch: 31 Global Step: 655920 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:36:36,313-Speed 2511.86 samples/sec Loss 1.3090 LearningRate 0.000054 Epoch: 31 Global Step: 655930 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:36:44,519-Speed 2496.00 samples/sec Loss 1.2871 LearningRate 0.000054 Epoch: 31 Global Step: 655940 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:36:52,722-Speed 2497.13 samples/sec Loss 1.2653 LearningRate 0.000054 Epoch: 31 Global Step: 655950 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:37:00,931-Speed 2495.31 samples/sec Loss 1.2557 LearningRate 0.000054 Epoch: 31 Global Step: 655960 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:37:09,135-Speed 2496.95 samples/sec Loss 1.2790 LearningRate 0.000054 Epoch: 31 Global Step: 655970 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:37:17,341-Speed 2496.06 samples/sec Loss 1.2599 LearningRate 0.000054 Epoch: 31 Global Step: 655980 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:37:25,491-Speed 2513.37 samples/sec Loss 1.2320 LearningRate 0.000054 Epoch: 31 Global Step: 655990 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:37:33,703-Speed 2494.34 samples/sec Loss 1.2849 LearningRate 0.000054 Epoch: 31 Global Step: 656000 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:37:41,907-Speed 2496.91 samples/sec Loss 1.2811 LearningRate 0.000054 Epoch: 31 Global Step: 656010 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:37:50,120-Speed 2493.92 samples/sec Loss 1.2941 LearningRate 0.000054 Epoch: 31 Global Step: 656020 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:37:58,334-Speed 2493.88 samples/sec Loss 1.2666 LearningRate 0.000054 Epoch: 31 Global Step: 656030 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:38:06,540-Speed 2496.09 samples/sec Loss 1.2620 LearningRate 0.000054 Epoch: 31 Global Step: 656040 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:38:14,721-Speed 2503.98 samples/sec Loss 1.2752 LearningRate 0.000054 Epoch: 31 Global Step: 656050 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:38:22,928-Speed 2495.97 samples/sec Loss 1.2776 LearningRate 0.000054 Epoch: 31 Global Step: 656060 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:38:31,136-Speed 2495.49 samples/sec Loss 1.2787 LearningRate 0.000054 Epoch: 31 Global Step: 656070 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:38:39,357-Speed 2491.71 samples/sec Loss 1.2541 LearningRate 0.000054 Epoch: 31 Global Step: 656080 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:38:47,575-Speed 2492.73 samples/sec Loss 1.2427 LearningRate 0.000054 Epoch: 31 Global Step: 656090 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:38:55,778-Speed 2496.91 samples/sec Loss 1.2749 LearningRate 0.000054 Epoch: 31 Global Step: 656100 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:39:03,929-Speed 2512.98 samples/sec Loss 1.2859 LearningRate 0.000054 Epoch: 31 Global Step: 656110 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:39:12,148-Speed 2492.21 samples/sec Loss 1.2521 LearningRate 0.000054 Epoch: 31 Global Step: 656120 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:39:20,353-Speed 2496.64 samples/sec Loss 1.2842 LearningRate 0.000054 Epoch: 31 Global Step: 656130 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:39:28,562-Speed 2495.55 samples/sec Loss 1.2582 LearningRate 0.000054 Epoch: 31 Global Step: 656140 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:39:36,765-Speed 2497.10 samples/sec Loss 1.2797 LearningRate 0.000054 Epoch: 31 Global Step: 656150 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:39:44,970-Speed 2496.35 samples/sec Loss 1.2937 LearningRate 0.000054 Epoch: 31 Global Step: 656160 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:39:53,120-Speed 2513.11 samples/sec Loss 1.2880 LearningRate 0.000054 Epoch: 31 Global Step: 656170 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:40:01,332-Speed 2494.37 samples/sec Loss 1.2981 LearningRate 0.000054 Epoch: 31 Global Step: 656180 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:40:09,538-Speed 2495.96 samples/sec Loss 1.2617 LearningRate 0.000054 Epoch: 31 Global Step: 656190 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:40:17,745-Speed 2495.88 samples/sec Loss 1.2573 LearningRate 0.000054 Epoch: 31 Global Step: 656200 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:40:25,947-Speed 2497.08 samples/sec Loss 1.2864 LearningRate 0.000054 Epoch: 31 Global Step: 656210 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:40:34,152-Speed 2496.50 samples/sec Loss 1.2641 LearningRate 0.000054 Epoch: 31 Global Step: 656220 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:40:42,304-Speed 2512.58 samples/sec Loss 1.2878 LearningRate 0.000054 Epoch: 31 Global Step: 656230 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:40:50,513-Speed 2495.27 samples/sec Loss 1.2584 LearningRate 0.000054 Epoch: 31 Global Step: 656240 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:40:58,723-Speed 2495.00 samples/sec Loss 1.2501 LearningRate 0.000054 Epoch: 31 Global Step: 656250 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:41:06,927-Speed 2496.70 samples/sec Loss 1.2869 LearningRate 0.000054 Epoch: 31 Global Step: 656260 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:41:15,149-Speed 2491.31 samples/sec Loss 1.2886 LearningRate 0.000054 Epoch: 31 Global Step: 656270 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:41:23,352-Speed 2496.81 samples/sec Loss 1.2591 LearningRate 0.000054 Epoch: 31 Global Step: 656280 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:41:31,507-Speed 2511.83 samples/sec Loss 1.2303 LearningRate 0.000054 Epoch: 31 Global Step: 656290 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:41:39,716-Speed 2495.31 samples/sec Loss 1.2964 LearningRate 0.000054 Epoch: 31 Global Step: 656300 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:41:47,922-Speed 2496.15 samples/sec Loss 1.2790 LearningRate 0.000054 Epoch: 31 Global Step: 656310 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:41:56,124-Speed 2497.26 samples/sec Loss 1.2655 LearningRate 0.000054 Epoch: 31 Global Step: 656320 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:42:04,331-Speed 2495.81 samples/sec Loss 1.2731 LearningRate 0.000054 Epoch: 31 Global Step: 656330 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:42:12,536-Speed 2496.33 samples/sec Loss 1.2559 LearningRate 0.000054 Epoch: 31 Global Step: 656340 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:42:20,692-Speed 2511.40 samples/sec Loss 1.2780 LearningRate 0.000054 Epoch: 31 Global Step: 656350 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:42:28,900-Speed 2495.73 samples/sec Loss 1.2348 LearningRate 0.000054 Epoch: 31 Global Step: 656360 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:42:37,118-Speed 2492.29 samples/sec Loss 1.2680 LearningRate 0.000054 Epoch: 31 Global Step: 656370 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:42:45,334-Speed 2493.27 samples/sec Loss 1.2761 LearningRate 0.000054 Epoch: 31 Global Step: 656380 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:42:53,544-Speed 2494.67 samples/sec Loss 1.2777 LearningRate 0.000054 Epoch: 31 Global Step: 656390 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:43:01,750-Speed 2496.28 samples/sec Loss 1.2576 LearningRate 0.000054 Epoch: 31 Global Step: 656400 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:43:09,903-Speed 2512.23 samples/sec Loss 1.2351 LearningRate 0.000054 Epoch: 31 Global Step: 656410 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:43:18,114-Speed 2494.31 samples/sec Loss 1.2468 LearningRate 0.000054 Epoch: 31 Global Step: 656420 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:43:26,328-Speed 2493.94 samples/sec Loss 1.2705 LearningRate 0.000054 Epoch: 31 Global Step: 656430 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:43:34,536-Speed 2495.81 samples/sec Loss 1.2939 LearningRate 0.000054 Epoch: 31 Global Step: 656440 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:43:42,744-Speed 2495.64 samples/sec Loss 1.3020 LearningRate 0.000054 Epoch: 31 Global Step: 656450 Fp16 Grad Scale: 16384 Required: 40 hours
Training: 2022-07-11 20:43:50,904-Speed 2510.17 samples/sec Loss 1.2712 LearningRate 0.000054 Epoch: 31 Global Step: 656460 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:43:59,065-Speed 2509.74 samples/sec Loss 1.2776 LearningRate 0.000054 Epoch: 31 Global Step: 656470 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:44:07,274-Speed 2495.24 samples/sec Loss 1.2524 LearningRate 0.000054 Epoch: 31 Global Step: 656480 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:44:15,490-Speed 2493.24 samples/sec Loss 1.3139 LearningRate 0.000054 Epoch: 31 Global Step: 656490 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:44:23,704-Speed 2493.64 samples/sec Loss 1.2356 LearningRate 0.000054 Epoch: 31 Global Step: 656500 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:44:31,908-Speed 2496.93 samples/sec Loss 1.3064 LearningRate 0.000054 Epoch: 31 Global Step: 656510 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:44:40,110-Speed 2497.19 samples/sec Loss 1.2633 LearningRate 0.000054 Epoch: 31 Global Step: 656520 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:44:48,263-Speed 2512.38 samples/sec Loss 1.2833 LearningRate 0.000054 Epoch: 31 Global Step: 656530 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:44:56,468-Speed 2496.31 samples/sec Loss 1.2444 LearningRate 0.000054 Epoch: 31 Global Step: 656540 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:45:04,696-Speed 2489.62 samples/sec Loss 1.2567 LearningRate 0.000054 Epoch: 31 Global Step: 656550 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:45:12,905-Speed 2495.06 samples/sec Loss 1.2660 LearningRate 0.000054 Epoch: 31 Global Step: 656560 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:45:21,108-Speed 2497.04 samples/sec Loss 1.2877 LearningRate 0.000054 Epoch: 31 Global Step: 656570 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:45:29,320-Speed 2494.50 samples/sec Loss 1.2477 LearningRate 0.000054 Epoch: 31 Global Step: 656580 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:45:37,469-Speed 2513.31 samples/sec Loss 1.2864 LearningRate 0.000054 Epoch: 31 Global Step: 656590 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:45:45,676-Speed 2495.95 samples/sec Loss 1.2620 LearningRate 0.000054 Epoch: 31 Global Step: 656600 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:45:53,896-Speed 2491.76 samples/sec Loss 1.2907 LearningRate 0.000054 Epoch: 31 Global Step: 656610 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:46:02,110-Speed 2493.74 samples/sec Loss 1.2737 LearningRate 0.000054 Epoch: 31 Global Step: 656620 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:46:10,318-Speed 2495.46 samples/sec Loss 1.2900 LearningRate 0.000054 Epoch: 31 Global Step: 656630 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:46:18,527-Speed 2495.49 samples/sec Loss 1.2715 LearningRate 0.000054 Epoch: 31 Global Step: 656640 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:46:26,675-Speed 2514.27 samples/sec Loss 1.2903 LearningRate 0.000054 Epoch: 31 Global Step: 656650 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:46:34,879-Speed 2496.67 samples/sec Loss 1.2367 LearningRate 0.000054 Epoch: 31 Global Step: 656660 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:46:43,082-Speed 2497.34 samples/sec Loss 1.2821 LearningRate 0.000054 Epoch: 31 Global Step: 656670 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:46:51,288-Speed 2496.12 samples/sec Loss 1.2611 LearningRate 0.000054 Epoch: 31 Global Step: 656680 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:46:59,490-Speed 2497.15 samples/sec Loss 1.2861 LearningRate 0.000054 Epoch: 31 Global Step: 656690 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:47:07,697-Speed 2495.87 samples/sec Loss 1.2937 LearningRate 0.000054 Epoch: 31 Global Step: 656700 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:47:15,848-Speed 2512.88 samples/sec Loss 1.2620 LearningRate 0.000054 Epoch: 31 Global Step: 656710 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:47:24,050-Speed 2497.79 samples/sec Loss 1.3043 LearningRate 0.000054 Epoch: 31 Global Step: 656720 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:47:32,252-Speed 2497.31 samples/sec Loss 1.2851 LearningRate 0.000054 Epoch: 31 Global Step: 656730 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:47:40,467-Speed 2493.29 samples/sec Loss 1.2583 LearningRate 0.000054 Epoch: 31 Global Step: 656740 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:47:48,670-Speed 2496.98 samples/sec Loss 1.2986 LearningRate 0.000054 Epoch: 31 Global Step: 656750 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:47:56,875-Speed 2496.46 samples/sec Loss 1.2940 LearningRate 0.000054 Epoch: 31 Global Step: 656760 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:48:05,025-Speed 2513.46 samples/sec Loss 1.2542 LearningRate 0.000054 Epoch: 31 Global Step: 656770 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:48:13,227-Speed 2497.44 samples/sec Loss 1.2655 LearningRate 0.000054 Epoch: 31 Global Step: 656780 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:48:21,438-Speed 2494.57 samples/sec Loss 1.2543 LearningRate 0.000054 Epoch: 31 Global Step: 656790 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:48:29,643-Speed 2496.29 samples/sec Loss 1.2335 LearningRate 0.000054 Epoch: 31 Global Step: 656800 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:48:37,861-Speed 2492.60 samples/sec Loss 1.2477 LearningRate 0.000054 Epoch: 31 Global Step: 656810 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:48:46,066-Speed 2496.38 samples/sec Loss 1.3106 LearningRate 0.000054 Epoch: 31 Global Step: 656820 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:48:54,214-Speed 2513.95 samples/sec Loss 1.2835 LearningRate 0.000054 Epoch: 31 Global Step: 656830 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:49:02,420-Speed 2496.11 samples/sec Loss 1.2715 LearningRate 0.000054 Epoch: 31 Global Step: 656840 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:49:10,629-Speed 2495.40 samples/sec Loss 1.2756 LearningRate 0.000054 Epoch: 31 Global Step: 656850 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:49:18,832-Speed 2496.88 samples/sec Loss 1.2701 LearningRate 0.000054 Epoch: 31 Global Step: 656860 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:49:27,038-Speed 2496.28 samples/sec Loss 1.2431 LearningRate 0.000054 Epoch: 31 Global Step: 656870 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:49:35,242-Speed 2496.67 samples/sec Loss 1.2613 LearningRate 0.000053 Epoch: 31 Global Step: 656880 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:49:43,400-Speed 2510.93 samples/sec Loss 1.2361 LearningRate 0.000053 Epoch: 31 Global Step: 656890 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:49:51,608-Speed 2495.41 samples/sec Loss 1.2504 LearningRate 0.000053 Epoch: 31 Global Step: 656900 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:49:59,810-Speed 2497.80 samples/sec Loss 1.2665 LearningRate 0.000053 Epoch: 31 Global Step: 656910 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:50:08,020-Speed 2494.93 samples/sec Loss 1.2891 LearningRate 0.000053 Epoch: 31 Global Step: 656920 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:50:16,225-Speed 2496.23 samples/sec Loss 1.2683 LearningRate 0.000053 Epoch: 31 Global Step: 656930 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:50:24,429-Speed 2496.70 samples/sec Loss 1.3043 LearningRate 0.000053 Epoch: 31 Global Step: 656940 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:50:32,586-Speed 2511.51 samples/sec Loss 1.2661 LearningRate 0.000053 Epoch: 31 Global Step: 656950 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:50:40,789-Speed 2496.99 samples/sec Loss 1.2470 LearningRate 0.000053 Epoch: 31 Global Step: 656960 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:50:48,995-Speed 2496.14 samples/sec Loss 1.2893 LearningRate 0.000053 Epoch: 31 Global Step: 656970 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:50:57,196-Speed 2497.44 samples/sec Loss 1.2578 LearningRate 0.000053 Epoch: 31 Global Step: 656980 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:51:05,400-Speed 2496.75 samples/sec Loss 1.2591 LearningRate 0.000053 Epoch: 31 Global Step: 656990 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:51:13,614-Speed 2493.65 samples/sec Loss 1.2408 LearningRate 0.000053 Epoch: 31 Global Step: 657000 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:51:21,767-Speed 2512.45 samples/sec Loss 1.2951 LearningRate 0.000053 Epoch: 31 Global Step: 657010 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:51:29,969-Speed 2497.17 samples/sec Loss 1.2562 LearningRate 0.000053 Epoch: 31 Global Step: 657020 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-07-11 20:51:38,171-Speed 2497.36 samples/sec Loss 1.2762 LearningRate 0.000053 Epoch: 31 Global Step: 657030 Fp16 Grad Scale: 8192 Required: 39 hours
Training: 2022-07-11 20:51:46,376-Speed 2496.36 samples/sec Loss 1.2761 LearningRate 0.000053 Epoch: 31 Global Step: 657040 Fp16 Grad Scale: 8192 Required: 39 hours
Training: 2022-07-11 20:51:54,582-Speed 2496.20 samples/sec Loss 1.2901 LearningRate 0.000053 Epoch: 31 Global Step: 657050 Fp16 Grad Scale: 8192 Required: 39 hours
Training: 2022-07-11 20:52:02,785-Speed 2497.00 samples/sec Loss 1.3091 LearningRate 0.000053 Epoch: 31 Global Step: 657060 Fp16 Grad Scale: 8192 Required: 39 hours
Training: 2022-07-11 20:52:10,936-Speed 2513.02 samples/sec Loss 1.2960 LearningRate 0.000053 Epoch: 31 Global Step: 657070 Fp16 Grad Scale: 8192 Required: 39 hours
Training: 2022-07-11 20:52:19,147-Speed 2494.55 samples/sec Loss 1.2392 LearningRate 0.000053 Epoch: 31 
Global Step: 657080 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:52:27,352-Speed 2496.45 samples/sec Loss 1.2510 LearningRate 0.000053 Epoch: 31 Global Step: 657090 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:52:35,557-Speed 2496.69 samples/sec Loss 1.2645 LearningRate 0.000053 Epoch: 31 Global Step: 657100 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:52:43,774-Speed 2492.65 samples/sec Loss 1.2515 LearningRate 0.000053 Epoch: 31 Global Step: 657110 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:52:51,981-Speed 2495.81 samples/sec Loss 1.2787 LearningRate 0.000053 Epoch: 31 Global Step: 657120 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:53:00,133-Speed 2512.61 samples/sec Loss 1.2618 LearningRate 0.000053 Epoch: 31 Global Step: 657130 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:53:08,340-Speed 2495.80 samples/sec Loss 1.2653 LearningRate 0.000053 Epoch: 31 Global Step: 657140 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:53:16,556-Speed 2493.13 samples/sec Loss 1.2394 LearningRate 0.000053 Epoch: 31 Global Step: 657150 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:53:24,764-Speed 2495.43 samples/sec Loss 1.2601 LearningRate 0.000053 Epoch: 31 Global Step: 657160 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:53:32,965-Speed 2497.51 samples/sec Loss 1.2663 LearningRate 0.000053 Epoch: 31 Global Step: 657170 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:53:41,179-Speed 2493.99 samples/sec Loss 1.2605 LearningRate 0.000053 Epoch: 31 Global Step: 657180 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:53:49,336-Speed 2510.94 samples/sec Loss 1.2706 LearningRate 0.000053 Epoch: 31 Global Step: 657190 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:53:57,544-Speed 2495.45 samples/sec Loss 1.3033 LearningRate 0.000053 Epoch: 31 Global Step: 657200 
Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:54:05,750-Speed 2496.31 samples/sec Loss 1.2769 LearningRate 0.000053 Epoch: 31 Global Step: 657210 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:54:13,957-Speed 2495.81 samples/sec Loss 1.2273 LearningRate 0.000053 Epoch: 31 Global Step: 657220 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:54:22,159-Speed 2497.23 samples/sec Loss 1.2558 LearningRate 0.000053 Epoch: 31 Global Step: 657230 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:54:30,365-Speed 2495.98 samples/sec Loss 1.2511 LearningRate 0.000053 Epoch: 31 Global Step: 657240 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:54:38,517-Speed 2512.93 samples/sec Loss 1.2243 LearningRate 0.000053 Epoch: 31 Global Step: 657250 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:54:46,719-Speed 2497.22 samples/sec Loss 1.2940 LearningRate 0.000053 Epoch: 31 Global Step: 657260 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:54:54,939-Speed 2491.85 samples/sec Loss 1.2638 LearningRate 0.000053 Epoch: 31 Global Step: 657270 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:55:03,144-Speed 2496.63 samples/sec Loss 1.2557 LearningRate 0.000053 Epoch: 31 Global Step: 657280 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:55:11,350-Speed 2496.49 samples/sec Loss 1.2624 LearningRate 0.000053 Epoch: 31 Global Step: 657290 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:55:19,551-Speed 2497.48 samples/sec Loss 1.2258 LearningRate 0.000053 Epoch: 31 Global Step: 657300 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:55:27,703-Speed 2512.55 samples/sec Loss 1.2265 LearningRate 0.000053 Epoch: 31 Global Step: 657310 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:55:35,908-Speed 2496.58 samples/sec Loss 1.2709 LearningRate 0.000053 Epoch: 31 Global Step: 657320 Fp16 Grad Scale: 
8192 Required: 39 hours Training: 2022-07-11 20:55:44,111-Speed 2497.09 samples/sec Loss 1.2782 LearningRate 0.000053 Epoch: 31 Global Step: 657330 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:55:52,316-Speed 2496.43 samples/sec Loss 1.2736 LearningRate 0.000053 Epoch: 31 Global Step: 657340 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:56:00,521-Speed 2496.52 samples/sec Loss 1.2483 LearningRate 0.000053 Epoch: 31 Global Step: 657350 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:56:08,725-Speed 2497.08 samples/sec Loss 1.2812 LearningRate 0.000053 Epoch: 31 Global Step: 657360 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:56:16,876-Speed 2512.90 samples/sec Loss 1.2613 LearningRate 0.000053 Epoch: 31 Global Step: 657370 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:56:25,083-Speed 2495.82 samples/sec Loss 1.2622 LearningRate 0.000053 Epoch: 31 Global Step: 657380 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:56:33,285-Speed 2497.16 samples/sec Loss 1.2736 LearningRate 0.000053 Epoch: 31 Global Step: 657390 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:56:41,490-Speed 2496.63 samples/sec Loss 1.2532 LearningRate 0.000053 Epoch: 31 Global Step: 657400 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:56:49,694-Speed 2496.65 samples/sec Loss 1.2469 LearningRate 0.000053 Epoch: 31 Global Step: 657410 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:56:57,900-Speed 2496.42 samples/sec Loss 1.2607 LearningRate 0.000053 Epoch: 31 Global Step: 657420 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:57:06,061-Speed 2509.70 samples/sec Loss 1.2876 LearningRate 0.000053 Epoch: 31 Global Step: 657430 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:57:14,267-Speed 2496.25 samples/sec Loss 1.2641 LearningRate 0.000053 Epoch: 31 Global Step: 657440 Fp16 Grad Scale: 8192 Required: 39 
hours Training: 2022-07-11 20:57:22,472-Speed 2496.43 samples/sec Loss 1.2242 LearningRate 0.000053 Epoch: 31 Global Step: 657450 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:57:30,677-Speed 2496.62 samples/sec Loss 1.2783 LearningRate 0.000053 Epoch: 31 Global Step: 657460 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:57:38,884-Speed 2495.79 samples/sec Loss 1.2758 LearningRate 0.000053 Epoch: 31 Global Step: 657470 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:57:47,090-Speed 2496.52 samples/sec Loss 1.2487 LearningRate 0.000053 Epoch: 31 Global Step: 657480 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:57:55,241-Speed 2512.74 samples/sec Loss 1.2482 LearningRate 0.000053 Epoch: 31 Global Step: 657490 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:58:03,445-Speed 2496.81 samples/sec Loss 1.2496 LearningRate 0.000053 Epoch: 31 Global Step: 657500 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:58:11,651-Speed 2496.11 samples/sec Loss 1.2836 LearningRate 0.000053 Epoch: 31 Global Step: 657510 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:58:19,856-Speed 2496.40 samples/sec Loss 1.2517 LearningRate 0.000053 Epoch: 31 Global Step: 657520 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:58:28,094-Speed 2486.34 samples/sec Loss 1.2450 LearningRate 0.000053 Epoch: 31 Global Step: 657530 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:58:36,313-Speed 2492.10 samples/sec Loss 1.2511 LearningRate 0.000053 Epoch: 31 Global Step: 657540 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:58:44,470-Speed 2511.36 samples/sec Loss 1.2525 LearningRate 0.000053 Epoch: 31 Global Step: 657550 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:58:52,685-Speed 2493.56 samples/sec Loss 1.2593 LearningRate 0.000053 Epoch: 31 Global Step: 657560 Fp16 Grad Scale: 8192 Required: 39 hours Training: 
2022-07-11 20:59:00,890-Speed 2496.28 samples/sec Loss 1.2653 LearningRate 0.000053 Epoch: 31 Global Step: 657570 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:59:09,096-Speed 2495.99 samples/sec Loss 1.2980 LearningRate 0.000053 Epoch: 31 Global Step: 657580 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:59:17,300-Speed 2497.24 samples/sec Loss 1.2824 LearningRate 0.000053 Epoch: 31 Global Step: 657590 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:59:25,515-Speed 2493.52 samples/sec Loss 1.2591 LearningRate 0.000053 Epoch: 31 Global Step: 657600 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:59:33,668-Speed 2512.23 samples/sec Loss 1.2604 LearningRate 0.000053 Epoch: 31 Global Step: 657610 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:59:41,873-Speed 2496.50 samples/sec Loss 1.2831 LearningRate 0.000053 Epoch: 31 Global Step: 657620 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:59:50,087-Speed 2493.78 samples/sec Loss 1.2591 LearningRate 0.000053 Epoch: 31 Global Step: 657630 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 20:59:58,291-Speed 2496.79 samples/sec Loss 1.2536 LearningRate 0.000053 Epoch: 31 Global Step: 657640 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:00:06,501-Speed 2494.99 samples/sec Loss 1.2761 LearningRate 0.000053 Epoch: 31 Global Step: 657650 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:00:14,702-Speed 2497.36 samples/sec Loss 1.2810 LearningRate 0.000053 Epoch: 31 Global Step: 657660 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:00:22,859-Speed 2511.41 samples/sec Loss 1.2727 LearningRate 0.000053 Epoch: 31 Global Step: 657670 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:00:31,066-Speed 2495.73 samples/sec Loss 1.2696 LearningRate 0.000053 Epoch: 31 Global Step: 657680 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:00:39,273-Speed 2495.97 samples/sec Loss 1.2712 LearningRate 0.000053 Epoch: 31 Global Step: 657690 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:00:47,479-Speed 2496.15 samples/sec Loss 1.2326 LearningRate 0.000053 Epoch: 31 Global Step: 657700 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:00:55,682-Speed 2497.48 samples/sec Loss 1.2465 LearningRate 0.000053 Epoch: 31 Global Step: 657710 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:01:03,894-Speed 2494.01 samples/sec Loss 1.2526 LearningRate 0.000053 Epoch: 31 Global Step: 657720 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:01:12,052-Speed 2510.79 samples/sec Loss 1.2615 LearningRate 0.000053 Epoch: 31 Global Step: 657730 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:01:20,258-Speed 2496.22 samples/sec Loss 1.2643 LearningRate 0.000053 Epoch: 31 Global Step: 657740 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:01:28,465-Speed 2495.70 samples/sec Loss 1.2688 LearningRate 0.000053 Epoch: 31 Global Step: 657750 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:01:36,673-Speed 2495.54 samples/sec Loss 1.2557 LearningRate 0.000053 Epoch: 31 Global Step: 657760 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:01:44,880-Speed 2495.98 samples/sec Loss 1.2639 LearningRate 0.000053 Epoch: 31 Global Step: 657770 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:01:53,093-Speed 2494.42 samples/sec Loss 1.2608 LearningRate 0.000053 Epoch: 31 Global Step: 657780 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:02:01,244-Speed 2512.86 samples/sec Loss 1.2872 LearningRate 0.000053 Epoch: 31 Global Step: 657790 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:02:09,450-Speed 2495.98 samples/sec Loss 1.2220 LearningRate 0.000053 Epoch: 31 Global Step: 657800 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:02:17,655-Speed 2496.68 samples/sec Loss 1.2789 LearningRate 0.000053 Epoch: 31 Global Step: 657810 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:02:25,862-Speed 2496.04 samples/sec Loss 1.2337 LearningRate 0.000053 Epoch: 31 Global Step: 657820 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:02:34,066-Speed 2496.48 samples/sec Loss 1.2646 LearningRate 0.000053 Epoch: 31 Global Step: 657830 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:02:42,270-Speed 2496.89 samples/sec Loss 1.2767 LearningRate 0.000053 Epoch: 31 Global Step: 657840 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:02:50,420-Speed 2513.76 samples/sec Loss 1.2577 LearningRate 0.000053 Epoch: 31 Global Step: 657850 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:02:58,629-Speed 2495.36 samples/sec Loss 1.2818 LearningRate 0.000053 Epoch: 31 Global Step: 657860 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:03:06,837-Speed 2495.32 samples/sec Loss 1.2255 LearningRate 0.000053 Epoch: 31 Global Step: 657870 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:03:15,041-Speed 2497.12 samples/sec Loss 1.2659 LearningRate 0.000053 Epoch: 31 Global Step: 657880 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:03:23,250-Speed 2495.16 samples/sec Loss 1.2953 LearningRate 0.000053 Epoch: 31 Global Step: 657890 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:03:31,457-Speed 2496.25 samples/sec Loss 1.2762 LearningRate 0.000053 Epoch: 31 Global Step: 657900 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:03:39,609-Speed 2512.68 samples/sec Loss 1.2746 LearningRate 0.000053 Epoch: 31 Global Step: 657910 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:03:47,814-Speed 2496.49 samples/sec Loss 1.2381 LearningRate 0.000053 Epoch: 31 Global Step: 657920 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:03:56,020-Speed 2496.10 samples/sec Loss 1.2770 LearningRate 0.000053 Epoch: 31 Global Step: 657930 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:04:04,250-Speed 2489.02 samples/sec Loss 1.2818 LearningRate 0.000053 Epoch: 31 Global Step: 657940 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:04:12,456-Speed 2496.12 samples/sec Loss 1.2624 LearningRate 0.000053 Epoch: 31 Global Step: 657950 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:04:20,660-Speed 2496.51 samples/sec Loss 1.2737 LearningRate 0.000053 Epoch: 31 Global Step: 657960 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:04:28,817-Speed 2511.32 samples/sec Loss 1.2281 LearningRate 0.000053 Epoch: 31 Global Step: 657970 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:04:37,022-Speed 2496.66 samples/sec Loss 1.2495 LearningRate 0.000053 Epoch: 31 Global Step: 657980 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:04:45,226-Speed 2496.50 samples/sec Loss 1.2635 LearningRate 0.000053 Epoch: 31 Global Step: 657990 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:04:53,434-Speed 2495.39 samples/sec Loss 1.2476 LearningRate 0.000053 Epoch: 31 Global Step: 658000 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:05:01,640-Speed 2496.33 samples/sec Loss 1.2131 LearningRate 0.000053 Epoch: 31 Global Step: 658010 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:05:09,860-Speed 2491.83 samples/sec Loss 1.2606 LearningRate 0.000053 Epoch: 31 Global Step: 658020 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:05:18,013-Speed 2512.26 samples/sec Loss 1.2979 LearningRate 0.000053 Epoch: 31 Global Step: 658030 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:05:26,216-Speed 2497.28 samples/sec Loss 1.2642 LearningRate 0.000053 Epoch: 31 Global Step: 658040 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:05:34,418-Speed 2497.47 samples/sec Loss 1.2847 LearningRate 0.000053 Epoch: 31 Global Step: 658050 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:05:42,629-Speed 2494.70 samples/sec Loss 1.2670 LearningRate 0.000053 Epoch: 31 Global Step: 658060 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:05:50,832-Speed 2496.79 samples/sec Loss 1.2800 LearningRate 0.000053 Epoch: 31 Global Step: 658070 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:05:59,036-Speed 2496.81 samples/sec Loss 1.3000 LearningRate 0.000053 Epoch: 31 Global Step: 658080 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:06:07,186-Speed 2513.35 samples/sec Loss 1.2859 LearningRate 0.000053 Epoch: 31 Global Step: 658090 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:06:15,412-Speed 2490.07 samples/sec Loss 1.2655 LearningRate 0.000053 Epoch: 31 Global Step: 658100 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:06:23,618-Speed 2496.08 samples/sec Loss 1.2632 LearningRate 0.000053 Epoch: 31 Global Step: 658110 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:06:31,824-Speed 2496.14 samples/sec Loss 1.2678 LearningRate 0.000053 Epoch: 31 Global Step: 658120 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:06:40,049-Speed 2490.47 samples/sec Loss 1.2904 LearningRate 0.000053 Epoch: 31 Global Step: 658130 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:06:48,259-Speed 2494.63 samples/sec Loss 1.2569 LearningRate 0.000053 Epoch: 31 Global Step: 658140 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:06:56,409-Speed 2513.21 samples/sec Loss 1.2832 LearningRate 0.000053 Epoch: 31 Global Step: 658150 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:07:04,621-Speed 2494.32 samples/sec Loss 1.2683 LearningRate 0.000053 Epoch: 31 Global Step: 658160 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:07:12,827-Speed 2496.25 samples/sec Loss 1.2726 LearningRate 0.000053 Epoch: 31 Global Step: 658170 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:07:21,036-Speed 2495.16 samples/sec Loss 1.2574 LearningRate 0.000053 Epoch: 31 Global Step: 658180 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:07:29,243-Speed 2495.91 samples/sec Loss 1.2152 LearningRate 0.000053 Epoch: 31 Global Step: 658190 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:07:37,448-Speed 2496.54 samples/sec Loss 1.2467 LearningRate 0.000053 Epoch: 31 Global Step: 658200 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:07:45,600-Speed 2512.56 samples/sec Loss 1.2774 LearningRate 0.000053 Epoch: 31 Global Step: 658210 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:07:53,807-Speed 2495.53 samples/sec Loss 1.2797 LearningRate 0.000053 Epoch: 31 Global Step: 658220 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:08:02,023-Speed 2493.34 samples/sec Loss 1.2541 LearningRate 0.000053 Epoch: 31 Global Step: 658230 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:08:10,227-Speed 2496.63 samples/sec Loss 1.2448 LearningRate 0.000053 Epoch: 31 Global Step: 658240 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:08:18,440-Speed 2493.95 samples/sec Loss 1.2467 LearningRate 0.000053 Epoch: 31 Global Step: 658250 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:08:26,646-Speed 2496.10 samples/sec Loss 1.2559 LearningRate 0.000053 Epoch: 31 Global Step: 658260 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:08:34,797-Speed 2512.91 samples/sec Loss 1.2438 LearningRate 0.000053 Epoch: 31 Global Step: 658270 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:08:43,000-Speed 2497.18 samples/sec Loss 1.2594 LearningRate 0.000053 Epoch: 31 Global Step: 658280 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:08:51,206-Speed 2496.07 samples/sec Loss 1.2574 LearningRate 0.000053 Epoch: 31 Global Step: 658290 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:08:59,423-Speed 2492.70 samples/sec Loss 1.2581 LearningRate 0.000053 Epoch: 31 Global Step: 658300 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:09:07,626-Speed 2497.13 samples/sec Loss 1.2857 LearningRate 0.000053 Epoch: 31 Global Step: 658310 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:09:15,832-Speed 2496.26 samples/sec Loss 1.2598 LearningRate 0.000053 Epoch: 31 Global Step: 658320 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:09:23,983-Speed 2512.88 samples/sec Loss 1.2812 LearningRate 0.000053 Epoch: 31 Global Step: 658330 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:09:32,186-Speed 2497.10 samples/sec Loss 1.2579 LearningRate 0.000053 Epoch: 31 Global Step: 658340 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:09:40,348-Speed 2509.68 samples/sec Loss 1.2305 LearningRate 0.000053 Epoch: 31 Global Step: 658350 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:09:48,556-Speed 2495.93 samples/sec Loss 1.2865 LearningRate 0.000053 Epoch: 31 Global Step: 658360 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:09:56,759-Speed 2496.76 samples/sec Loss 1.2602 LearningRate 0.000053 Epoch: 31 Global Step: 658370 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:10:04,967-Speed 2495.48 samples/sec Loss 1.2981 LearningRate 0.000053 Epoch: 31 Global Step: 658380 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:10:13,121-Speed 2512.27 samples/sec Loss 1.3072 LearningRate 0.000053 Epoch: 31 Global Step: 658390 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:10:21,324-Speed 2496.98 samples/sec Loss 1.2580 LearningRate 0.000053 Epoch: 31 Global Step: 658400 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 
21:10:29,527-Speed 2497.29 samples/sec Loss 1.2520 LearningRate 0.000053 Epoch: 31 Global Step: 658410 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:10:37,742-Speed 2493.21 samples/sec Loss 1.2412 LearningRate 0.000053 Epoch: 31 Global Step: 658420 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:10:45,952-Speed 2495.04 samples/sec Loss 1.2282 LearningRate 0.000053 Epoch: 31 Global Step: 658430 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:10:54,154-Speed 2497.28 samples/sec Loss 1.2672 LearningRate 0.000053 Epoch: 31 Global Step: 658440 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:11:02,309-Speed 2511.88 samples/sec Loss 1.2710 LearningRate 0.000053 Epoch: 31 Global Step: 658450 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:11:10,516-Speed 2495.94 samples/sec Loss 1.2858 LearningRate 0.000053 Epoch: 31 Global Step: 658460 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:11:18,717-Speed 2497.62 samples/sec Loss 1.2978 LearningRate 0.000053 Epoch: 31 Global Step: 658470 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:11:26,920-Speed 2497.30 samples/sec Loss 1.2434 LearningRate 0.000053 Epoch: 31 Global Step: 658480 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:11:35,123-Speed 2496.90 samples/sec Loss 1.2438 LearningRate 0.000053 Epoch: 31 Global Step: 658490 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:11:43,328-Speed 2496.68 samples/sec Loss 1.2603 LearningRate 0.000052 Epoch: 31 Global Step: 658500 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:11:51,488-Speed 2510.20 samples/sec Loss 1.2351 LearningRate 0.000052 Epoch: 31 Global Step: 658510 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:11:59,696-Speed 2495.45 samples/sec Loss 1.2882 LearningRate 0.000052 Epoch: 31 Global Step: 658520 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:12:07,904-Speed 
2495.71 samples/sec Loss 1.2262 LearningRate 0.000052 Epoch: 31 Global Step: 658530 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:12:16,110-Speed 2496.06 samples/sec Loss 1.2646 LearningRate 0.000052 Epoch: 31 Global Step: 658540 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:12:24,317-Speed 2496.02 samples/sec Loss 1.2506 LearningRate 0.000052 Epoch: 31 Global Step: 658550 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:12:32,522-Speed 2496.68 samples/sec Loss 1.2643 LearningRate 0.000052 Epoch: 31 Global Step: 658560 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:12:40,675-Speed 2512.26 samples/sec Loss 1.3002 LearningRate 0.000052 Epoch: 31 Global Step: 658570 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:12:48,882-Speed 2495.99 samples/sec Loss 1.2632 LearningRate 0.000052 Epoch: 31 Global Step: 658580 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:12:57,091-Speed 2495.19 samples/sec Loss 1.2508 LearningRate 0.000052 Epoch: 31 Global Step: 658590 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:13:05,297-Speed 2496.12 samples/sec Loss 1.2482 LearningRate 0.000052 Epoch: 31 Global Step: 658600 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:13:13,500-Speed 2496.92 samples/sec Loss 1.2561 LearningRate 0.000052 Epoch: 31 Global Step: 658610 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:13:21,705-Speed 2496.55 samples/sec Loss 1.2343 LearningRate 0.000052 Epoch: 31 Global Step: 658620 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:13:29,854-Speed 2513.64 samples/sec Loss 1.2486 LearningRate 0.000052 Epoch: 31 Global Step: 658630 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:13:38,057-Speed 2497.00 samples/sec Loss 1.2717 LearningRate 0.000052 Epoch: 31 Global Step: 658640 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:13:46,260-Speed 2496.97 samples/sec 
Loss 1.2315 LearningRate 0.000052 Epoch: 31 Global Step: 658650 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:13:54,464-Speed 2496.77 samples/sec Loss 1.2471 LearningRate 0.000052 Epoch: 31 Global Step: 658660 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:14:02,668-Speed 2496.79 samples/sec Loss 1.2632 LearningRate 0.000052 Epoch: 31 Global Step: 658670 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:14:10,875-Speed 2495.85 samples/sec Loss 1.2763 LearningRate 0.000052 Epoch: 31 Global Step: 658680 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:14:19,026-Speed 2512.97 samples/sec Loss 1.2983 LearningRate 0.000052 Epoch: 31 Global Step: 658690 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:14:27,230-Speed 2496.96 samples/sec Loss 1.2335 LearningRate 0.000052 Epoch: 31 Global Step: 658700 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:14:35,431-Speed 2497.82 samples/sec Loss 1.2652 LearningRate 0.000052 Epoch: 31 Global Step: 658710 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:14:43,641-Speed 2494.78 samples/sec Loss 1.2486 LearningRate 0.000052 Epoch: 31 Global Step: 658720 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:14:51,855-Speed 2493.94 samples/sec Loss 1.2617 LearningRate 0.000052 Epoch: 31 Global Step: 658730 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:15:00,062-Speed 2495.50 samples/sec Loss 1.2837 LearningRate 0.000052 Epoch: 31 Global Step: 658740 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:15:08,212-Speed 2513.55 samples/sec Loss 1.2501 LearningRate 0.000052 Epoch: 31 Global Step: 658750 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:15:16,417-Speed 2496.64 samples/sec Loss 1.2699 LearningRate 0.000052 Epoch: 31 Global Step: 658760 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:15:24,628-Speed 2494.88 samples/sec Loss 1.2657 
LearningRate 0.000052 Epoch: 31 Global Step: 658770 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:15:32,830-Speed 2497.13 samples/sec Loss 1.2353 LearningRate 0.000052 Epoch: 31 Global Step: 658780 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:15:41,037-Speed 2496.02 samples/sec Loss 1.2354 LearningRate 0.000052 Epoch: 31 Global Step: 658790 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:15:49,243-Speed 2496.03 samples/sec Loss 1.2808 LearningRate 0.000052 Epoch: 31 Global Step: 658800 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:15:57,398-Speed 2511.76 samples/sec Loss 1.2749 LearningRate 0.000052 Epoch: 31 Global Step: 658810 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:16:05,604-Speed 2495.95 samples/sec Loss 1.2907 LearningRate 0.000052 Epoch: 31 Global Step: 658820 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:16:13,811-Speed 2495.98 samples/sec Loss 1.2306 LearningRate 0.000052 Epoch: 31 Global Step: 658830 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:16:22,013-Speed 2497.31 samples/sec Loss 1.2182 LearningRate 0.000052 Epoch: 31 Global Step: 658840 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:16:30,217-Speed 2496.62 samples/sec Loss 1.3148 LearningRate 0.000052 Epoch: 31 Global Step: 658850 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:16:38,423-Speed 2496.17 samples/sec Loss 1.2918 LearningRate 0.000052 Epoch: 31 Global Step: 658860 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:16:46,575-Speed 2512.86 samples/sec Loss 1.2646 LearningRate 0.000052 Epoch: 31 Global Step: 658870 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:16:54,782-Speed 2495.68 samples/sec Loss 1.2734 LearningRate 0.000052 Epoch: 31 Global Step: 658880 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:17:02,984-Speed 2497.67 samples/sec Loss 1.2815 LearningRate 
0.000052 Epoch: 31 Global Step: 658890 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:17:11,190-Speed 2496.14 samples/sec Loss 1.2471 LearningRate 0.000052 Epoch: 31 Global Step: 658900 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:17:19,394-Speed 2496.62 samples/sec Loss 1.2861 LearningRate 0.000052 Epoch: 31 Global Step: 658910 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:17:27,604-Speed 2494.98 samples/sec Loss 1.2675 LearningRate 0.000052 Epoch: 31 Global Step: 658920 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:17:35,772-Speed 2507.55 samples/sec Loss 1.2508 LearningRate 0.000052 Epoch: 31 Global Step: 658930 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:17:43,981-Speed 2495.49 samples/sec Loss 1.2790 LearningRate 0.000052 Epoch: 31 Global Step: 658940 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:17:52,184-Speed 2497.24 samples/sec Loss 1.2452 LearningRate 0.000052 Epoch: 31 Global Step: 658950 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:18:00,388-Speed 2496.54 samples/sec Loss 1.2945 LearningRate 0.000052 Epoch: 31 Global Step: 658960 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:18:08,591-Speed 2497.09 samples/sec Loss 1.2663 LearningRate 0.000052 Epoch: 31 Global Step: 658970 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:18:16,794-Speed 2497.05 samples/sec Loss 1.2653 LearningRate 0.000052 Epoch: 31 Global Step: 658980 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:18:24,944-Speed 2513.19 samples/sec Loss 1.2710 LearningRate 0.000052 Epoch: 31 Global Step: 658990 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:18:33,150-Speed 2496.36 samples/sec Loss 1.3022 LearningRate 0.000052 Epoch: 31 Global Step: 659000 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:18:41,356-Speed 2495.93 samples/sec Loss 1.3022 LearningRate 0.000052 Epoch: 31 
Global Step: 659010 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:18:49,562-Speed 2496.43 samples/sec Loss 1.2583 LearningRate 0.000052 Epoch: 31 Global Step: 659020 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:18:57,776-Speed 2493.91 samples/sec Loss 1.2827 LearningRate 0.000052 Epoch: 31 Global Step: 659030 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:19:05,978-Speed 2497.13 samples/sec Loss 1.2575 LearningRate 0.000052 Epoch: 31 Global Step: 659040 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:19:14,125-Speed 2514.20 samples/sec Loss 1.2550 LearningRate 0.000052 Epoch: 31 Global Step: 659050 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:19:22,350-Speed 2490.82 samples/sec Loss 1.2615 LearningRate 0.000052 Epoch: 31 Global Step: 659060 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:19:30,553-Speed 2496.85 samples/sec Loss 1.3164 LearningRate 0.000052 Epoch: 31 Global Step: 659070 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:19:38,760-Speed 2495.96 samples/sec Loss 1.2905 LearningRate 0.000052 Epoch: 31 Global Step: 659080 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:19:46,966-Speed 2496.33 samples/sec Loss 1.2674 LearningRate 0.000052 Epoch: 31 Global Step: 659090 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:19:55,175-Speed 2495.35 samples/sec Loss 1.2567 LearningRate 0.000052 Epoch: 31 Global Step: 659100 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:20:03,344-Speed 2507.58 samples/sec Loss 1.2710 LearningRate 0.000052 Epoch: 31 Global Step: 659110 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:20:11,554-Speed 2494.75 samples/sec Loss 1.2847 LearningRate 0.000052 Epoch: 31 Global Step: 659120 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:20:19,760-Speed 2496.36 samples/sec Loss 1.2416 LearningRate 0.000052 Epoch: 31 Global Step: 659130 
Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:20:27,966-Speed 2495.96 samples/sec Loss 1.2724 LearningRate 0.000052 Epoch: 31 Global Step: 659140 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:20:36,167-Speed 2497.58 samples/sec Loss 1.2597 LearningRate 0.000052 Epoch: 31 Global Step: 659150 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:20:44,372-Speed 2496.69 samples/sec Loss 1.2798 LearningRate 0.000052 Epoch: 31 Global Step: 659160 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:20:52,525-Speed 2512.54 samples/sec Loss 1.2916 LearningRate 0.000052 Epoch: 31 Global Step: 659170 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:21:00,727-Speed 2497.28 samples/sec Loss 1.2715 LearningRate 0.000052 Epoch: 31 Global Step: 659180 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:21:08,931-Speed 2496.49 samples/sec Loss 1.2309 LearningRate 0.000052 Epoch: 31 Global Step: 659190 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:21:17,136-Speed 2496.45 samples/sec Loss 1.2741 LearningRate 0.000052 Epoch: 31 Global Step: 659200 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:21:25,344-Speed 2495.79 samples/sec Loss 1.2645 LearningRate 0.000052 Epoch: 31 Global Step: 659210 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:21:33,550-Speed 2495.82 samples/sec Loss 1.2671 LearningRate 0.000052 Epoch: 31 Global Step: 659220 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:21:41,706-Speed 2511.61 samples/sec Loss 1.2774 LearningRate 0.000052 Epoch: 31 Global Step: 659230 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:21:49,909-Speed 2497.26 samples/sec Loss 1.2784 LearningRate 0.000052 Epoch: 31 Global Step: 659240 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:21:58,366-Speed 2499.01 samples/sec Loss 1.2426 LearningRate 0.000052 Epoch: 31 Global Step: 659250 Fp16 Grad Scale: 
8192 Required: 39 hours Training: 2022-07-11 21:22:06,568-Speed 2497.49 samples/sec Loss 1.2798 LearningRate 0.000052 Epoch: 31 Global Step: 659260 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:22:15,512-Speed 2499.92 samples/sec Loss 1.2511 LearningRate 0.000052 Epoch: 31 Global Step: 659270 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:22:25,097-Speed 2501.07 samples/sec Loss 1.2834 LearningRate 0.000052 Epoch: 31 Global Step: 659280 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:22:33,790-Speed 2517.18 samples/sec Loss 1.2780 LearningRate 0.000052 Epoch: 31 Global Step: 659290 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:22:44,097-Speed 1995.10 samples/sec Loss 1.2546 LearningRate 0.000052 Epoch: 31 Global Step: 659300 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:22:52,561-Speed 2502.09 samples/sec Loss 1.2603 LearningRate 0.000052 Epoch: 31 Global Step: 659310 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:23:00,795-Speed 2499.42 samples/sec Loss 1.2758 LearningRate 0.000052 Epoch: 31 Global Step: 659320 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:23:08,998-Speed 2497.00 samples/sec Loss 1.2388 LearningRate 0.000052 Epoch: 31 Global Step: 659330 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:23:17,243-Speed 2500.40 samples/sec Loss 1.2798 LearningRate 0.000052 Epoch: 31 Global Step: 659340 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:23:25,501-Speed 2514.25 samples/sec Loss 1.2505 LearningRate 0.000052 Epoch: 31 Global Step: 659350 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:23:33,747-Speed 2496.63 samples/sec Loss 1.2553 LearningRate 0.000052 Epoch: 31 Global Step: 659360 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:23:41,964-Speed 2498.60 samples/sec Loss 1.2800 LearningRate 0.000052 Epoch: 31 Global Step: 659370 Fp16 Grad Scale: 8192 Required: 39 
hours Training: 2022-07-11 21:23:50,187-Speed 2490.94 samples/sec Loss 1.2659 LearningRate 0.000052 Epoch: 31 Global Step: 659380 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:23:58,447-Speed 2498.14 samples/sec Loss 1.2583 LearningRate 0.000052 Epoch: 31 Global Step: 659390 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:24:06,673-Speed 2494.54 samples/sec Loss 1.2916 LearningRate 0.000052 Epoch: 31 Global Step: 659400 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:24:14,828-Speed 2511.70 samples/sec Loss 1.2503 LearningRate 0.000052 Epoch: 31 Global Step: 659410 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:24:25,638-Speed 1963.83 samples/sec Loss 1.2613 LearningRate 0.000052 Epoch: 31 Global Step: 659420 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:24:33,873-Speed 2500.04 samples/sec Loss 1.2568 LearningRate 0.000052 Epoch: 31 Global Step: 659430 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:24:42,289-Speed 2500.13 samples/sec Loss 1.2611 LearningRate 0.000052 Epoch: 31 Global Step: 659440 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:24:50,497-Speed 2495.26 samples/sec Loss 1.2791 LearningRate 0.000052 Epoch: 31 Global Step: 659450 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:24:59,954-Speed 2499.16 samples/sec Loss 1.2819 LearningRate 0.000052 Epoch: 31 Global Step: 659460 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:25:08,097-Speed 2517.98 samples/sec Loss 1.2924 LearningRate 0.000052 Epoch: 31 Global Step: 659470 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:25:20,947-Speed 1631.62 samples/sec Loss 1.2499 LearningRate 0.000052 Epoch: 31 Global Step: 659480 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:25:29,197-Speed 2501.60 samples/sec Loss 1.2522 LearningRate 0.000052 Epoch: 31 Global Step: 659490 Fp16 Grad Scale: 8192 Required: 39 hours Training: 
2022-07-11 21:25:37,641-Speed 2500.62 samples/sec Loss 1.2388 LearningRate 0.000052 Epoch: 31 Global Step: 659500 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:25:50,475-Speed 2501.82 samples/sec Loss 1.2529 LearningRate 0.000052 Epoch: 31 Global Step: 659510 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:25:59,573-Speed 2489.08 samples/sec Loss 1.2880 LearningRate 0.000052 Epoch: 31 Global Step: 659520 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:26:07,721-Speed 2514.02 samples/sec Loss 1.2763 LearningRate 0.000052 Epoch: 31 Global Step: 659530 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:26:16,128-Speed 2494.51 samples/sec Loss 1.2787 LearningRate 0.000052 Epoch: 31 Global Step: 659540 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-07-11 21:26:24,330-Speed 2497.51 samples/sec Loss 1.2549 LearningRate 0.000052 Epoch: 31 Global Step: 659550 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:26:32,532-Speed 2497.29 samples/sec Loss 1.2445 LearningRate 0.000052 Epoch: 31 Global Step: 659560 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:26:40,739-Speed 2495.92 samples/sec Loss 1.2201 LearningRate 0.000052 Epoch: 31 Global Step: 659570 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:26:48,946-Speed 2495.88 samples/sec Loss 1.2641 LearningRate 0.000052 Epoch: 31 Global Step: 659580 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:26:57,103-Speed 2511.07 samples/sec Loss 1.2571 LearningRate 0.000052 Epoch: 31 Global Step: 659590 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:27:05,311-Speed 2495.59 samples/sec Loss 1.2475 LearningRate 0.000052 Epoch: 31 Global Step: 659600 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:27:13,525-Speed 2493.55 samples/sec Loss 1.2919 LearningRate 0.000052 Epoch: 31 Global Step: 659610 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:27:21,735-Speed 2495.14 samples/sec Loss 1.2676 LearningRate 0.000052 Epoch: 31 Global Step: 659620 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:27:29,947-Speed 2494.22 samples/sec Loss 1.2823 LearningRate 0.000052 Epoch: 31 Global Step: 659630 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:27:38,155-Speed 2495.51 samples/sec Loss 1.2819 LearningRate 0.000052 Epoch: 31 Global Step: 659640 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:27:46,310-Speed 2511.92 samples/sec Loss 1.2509 LearningRate 0.000052 Epoch: 31 Global Step: 659650 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:27:54,516-Speed 2496.23 samples/sec Loss 1.2429 LearningRate 0.000052 Epoch: 31 Global Step: 659660 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:28:02,728-Speed 2494.24 samples/sec Loss 1.2949 LearningRate 0.000052 Epoch: 31 Global Step: 659670 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:28:10,938-Speed 2494.94 samples/sec Loss 1.2536 LearningRate 0.000052 Epoch: 31 Global Step: 659680 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:28:19,144-Speed 2496.12 samples/sec Loss 1.2679 LearningRate 0.000052 Epoch: 31 Global Step: 659690 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:28:27,352-Speed 2495.95 samples/sec Loss 1.2972 LearningRate 0.000052 Epoch: 31 Global Step: 659700 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:28:35,521-Speed 2511.15 samples/sec Loss 1.2618 LearningRate 0.000052 Epoch: 31 Global Step: 659710 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:28:43,722-Speed 2497.41 samples/sec Loss 1.2400 LearningRate 0.000052 Epoch: 31 Global Step: 659720 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:28:51,935-Speed 2494.10 samples/sec Loss 1.2391 LearningRate 0.000052 Epoch: 31 Global Step: 659730 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:29:00,153-Speed 2492.67 samples/sec Loss 1.2475 LearningRate 0.000052 Epoch: 31 Global Step: 659740 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:29:08,355-Speed 2497.24 samples/sec Loss 1.2450 LearningRate 0.000052 Epoch: 31 Global Step: 659750 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:29:16,558-Speed 2496.76 samples/sec Loss 1.2404 LearningRate 0.000052 Epoch: 31 Global Step: 659760 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:29:24,711-Speed 2512.67 samples/sec Loss 1.2284 LearningRate 0.000052 Epoch: 31 Global Step: 659770 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:29:32,914-Speed 2496.84 samples/sec Loss 1.2591 LearningRate 0.000052 Epoch: 31 Global Step: 659780 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:29:41,132-Speed 2492.48 samples/sec Loss 1.2578 LearningRate 0.000052 Epoch: 31 Global Step: 659790 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:29:49,336-Speed 2496.68 samples/sec Loss 1.2738 LearningRate 0.000052 Epoch: 31 Global Step: 659800 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:29:57,543-Speed 2495.95 samples/sec Loss 1.2182 LearningRate 0.000052 Epoch: 31 Global Step: 659810 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:30:05,745-Speed 2497.24 samples/sec Loss 1.2417 LearningRate 0.000052 Epoch: 31 Global Step: 659820 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:30:13,889-Speed 2515.07 samples/sec Loss 1.2710 LearningRate 0.000052 Epoch: 31 Global Step: 659830 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:30:22,092-Speed 2497.32 samples/sec Loss 1.2726 LearningRate 0.000052 Epoch: 31 Global Step: 659840 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:30:30,290-Speed 2498.95 samples/sec Loss 1.2849 LearningRate 0.000052 Epoch: 31 Global Step: 659850 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:30:38,507-Speed 2492.71 samples/sec Loss 1.2576 LearningRate 0.000052 Epoch: 31 Global Step: 659860 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:30:46,802-Speed 2469.28 samples/sec Loss 1.2642 LearningRate 0.000052 Epoch: 31 Global Step: 659870 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:30:55,019-Speed 2492.68 samples/sec Loss 1.2677 LearningRate 0.000052 Epoch: 31 Global Step: 659880 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:31:03,169-Speed 2513.24 samples/sec Loss 1.2562 LearningRate 0.000052 Epoch: 31 Global Step: 659890 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:31:11,369-Speed 2498.01 samples/sec Loss 1.2862 LearningRate 0.000052 Epoch: 31 Global Step: 659900 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:31:19,574-Speed 2496.48 samples/sec Loss 1.2374 LearningRate 0.000052 Epoch: 31 Global Step: 659910 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:31:27,778-Speed 2496.76 samples/sec Loss 1.2691 LearningRate 0.000052 Epoch: 31 Global Step: 659920 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:31:35,980-Speed 2497.53 samples/sec Loss 1.2354 LearningRate 0.000052 Epoch: 31 Global Step: 659930 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:31:44,184-Speed 2496.75 samples/sec Loss 1.2181 LearningRate 0.000052 Epoch: 31 Global Step: 659940 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:31:52,332-Speed 2513.81 samples/sec Loss 1.2610 LearningRate 0.000052 Epoch: 31 Global Step: 659950 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:32:00,534-Speed 2497.44 samples/sec Loss 1.2633 LearningRate 0.000052 Epoch: 31 Global Step: 659960 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:32:08,734-Speed 2497.91 samples/sec Loss 1.2696 LearningRate 0.000052 Epoch: 31 Global Step: 659970 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:32:16,933-Speed 2498.36 samples/sec Loss 1.2792 LearningRate 0.000052 Epoch: 31 Global Step: 659980 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:32:25,136-Speed 2497.01 samples/sec Loss 1.2519 LearningRate 0.000052 Epoch: 31 Global Step: 659990 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:32:33,352-Speed 2493.32 samples/sec Loss 1.2491 LearningRate 0.000052 Epoch: 31 Global Step: 660000 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:32:41,498-Speed 2514.51 samples/sec Loss 1.2639 LearningRate 0.000052 Epoch: 31 Global Step: 660010 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:32:49,715-Speed 2492.70 samples/sec Loss 1.2770 LearningRate 0.000052 Epoch: 31 Global Step: 660020 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:32:57,917-Speed 2497.39 samples/sec Loss 1.2553 LearningRate 0.000052 Epoch: 31 Global Step: 660030 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:33:06,116-Speed 2498.20 samples/sec Loss 1.2670 LearningRate 0.000052 Epoch: 31 Global Step: 660040 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:33:14,318-Speed 2497.57 samples/sec Loss 1.2445 LearningRate 0.000052 Epoch: 31 Global Step: 660050 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:33:22,518-Speed 2497.79 samples/sec Loss 1.2647 LearningRate 0.000052 Epoch: 31 Global Step: 660060 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:33:30,665-Speed 2514.24 samples/sec Loss 1.2789 LearningRate 0.000052 Epoch: 31 Global Step: 660070 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:33:38,864-Speed 2498.29 samples/sec Loss 1.2526 LearningRate 0.000052 Epoch: 31 Global Step: 660080 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:33:47,064-Speed 2498.07 samples/sec Loss 1.2382 LearningRate 0.000052 Epoch: 31 Global Step: 660090 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:33:55,268-Speed 2496.64 samples/sec Loss 1.2613 LearningRate 0.000052 Epoch: 31 Global Step: 660100 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:34:03,469-Speed 2498.21 samples/sec Loss 1.2557 LearningRate 0.000052 Epoch: 31 Global Step: 660110 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:34:11,677-Speed 2496.46 samples/sec Loss 1.2801 LearningRate 0.000052 Epoch: 31 Global Step: 660120 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:34:19,826-Speed 2513.31 samples/sec Loss 1.2738 LearningRate 0.000051 Epoch: 31 Global Step: 660130 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:34:28,027-Speed 2497.73 samples/sec Loss 1.2572 LearningRate 0.000051 Epoch: 31 Global Step: 660140 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:34:36,236-Speed 2495.29 samples/sec Loss 1.2583 LearningRate 0.000051 Epoch: 31 Global Step: 660150 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:34:44,453-Speed 2492.86 samples/sec Loss 1.2474 LearningRate 0.000051 Epoch: 31 Global Step: 660160 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:34:52,659-Speed 2496.02 samples/sec Loss 1.2731 LearningRate 0.000051 Epoch: 31 Global Step: 660170 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:35:00,862-Speed 2497.47 samples/sec Loss 1.3040 LearningRate 0.000051 Epoch: 31 Global Step: 660180 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:35:09,012-Speed 2513.31 samples/sec Loss 1.2777 LearningRate 0.000051 Epoch: 31 Global Step: 660190 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:35:17,218-Speed 2496.45 samples/sec Loss 1.2568 LearningRate 0.000051 Epoch: 31 Global Step: 660200 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:35:25,419-Speed 2497.51 samples/sec Loss 1.2448 LearningRate 0.000051 Epoch: 31 Global Step: 660210 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:35:33,630-Speed 2494.72 samples/sec Loss 1.2555 LearningRate 0.000051 Epoch: 31 Global Step: 660220 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:35:41,845-Speed 2493.40 samples/sec Loss 1.2858 LearningRate 0.000051 Epoch: 31 Global Step: 660230 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:35:50,054-Speed 2495.15 samples/sec Loss 1.2197 LearningRate 0.000051 Epoch: 31 Global Step: 660240 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:35:58,200-Speed 2514.25 samples/sec Loss 1.2740 LearningRate 0.000051 Epoch: 31 Global Step: 660250 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:36:06,408-Speed 2495.45 samples/sec Loss 1.2124 LearningRate 0.000051 Epoch: 31 Global Step: 660260 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:36:14,604-Speed 2499.99 samples/sec Loss 1.2503 LearningRate 0.000051 Epoch: 31 Global Step: 660270 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:36:22,798-Speed 2500.27 samples/sec Loss 1.2514 LearningRate 0.000051 Epoch: 31 Global Step: 660280 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:36:30,996-Speed 2498.39 samples/sec Loss 1.2429 LearningRate 0.000051 Epoch: 31 Global Step: 660290 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:36:39,208-Speed 2494.23 samples/sec Loss 1.2513 LearningRate 0.000051 Epoch: 31 Global Step: 660300 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:36:47,352-Speed 2515.22 samples/sec Loss 1.2623 LearningRate 0.000051 Epoch: 31 Global Step: 660310 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:36:55,558-Speed 2496.54 samples/sec Loss 1.2441 LearningRate 0.000051 Epoch: 31 Global Step: 660320 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:37:03,756-Speed 2498.34 samples/sec Loss 1.2348 LearningRate 0.000051 Epoch: 31 Global Step: 660330 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:37:11,956-Speed 2498.26 samples/sec Loss 1.2506 LearningRate 0.000051 Epoch: 31 Global Step: 660340 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:37:20,154-Speed 2498.46 samples/sec Loss 1.2379 LearningRate 0.000051 Epoch: 31 Global Step: 660350 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:37:28,355-Speed 2497.58 samples/sec Loss 1.2448 LearningRate 0.000051 Epoch: 31 Global Step: 660360 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:37:36,501-Speed 2514.55 samples/sec Loss 1.2491 LearningRate 0.000051 Epoch: 31 Global Step: 660370 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:37:44,701-Speed 2498.16 samples/sec Loss 1.2601 LearningRate 0.000051 Epoch: 31 Global Step: 660380 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:37:52,904-Speed 2497.10 samples/sec Loss 1.2472 LearningRate 0.000051 Epoch: 31 Global Step: 660390 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:38:01,102-Speed 2498.50 samples/sec Loss 1.2413 LearningRate 0.000051 Epoch: 31 Global Step: 660400 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:38:09,307-Speed 2496.71 samples/sec Loss 1.2778 LearningRate 0.000051 Epoch: 31 Global Step: 660410 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:38:17,517-Speed 2494.75 samples/sec Loss 1.2450 LearningRate 0.000051 Epoch: 31 Global Step: 660420 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:38:25,669-Speed 2512.90 samples/sec Loss 1.2707 LearningRate 0.000051 Epoch: 31 Global Step: 660430 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:38:33,893-Speed 2490.75 samples/sec Loss 1.2527 LearningRate 0.000051 Epoch: 31 Global Step: 660440 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:38:42,096-Speed 2496.94 samples/sec Loss 1.2478 LearningRate 0.000051 Epoch: 31 Global Step: 660450 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:38:50,303-Speed 2495.91 samples/sec Loss 1.2814 LearningRate 0.000051 Epoch: 31 Global Step: 660460 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:38:58,525-Speed 2491.17 samples/sec Loss 1.2798 LearningRate 0.000051 Epoch: 31 Global Step: 660470 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:39:06,726-Speed 2497.86 samples/sec Loss 1.2779 LearningRate 0.000051 Epoch: 31 Global Step: 660480 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:39:14,877-Speed 2513.02 samples/sec Loss 1.2547 LearningRate 0.000051 Epoch: 31 Global Step: 660490 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:39:23,080-Speed 2497.05 samples/sec Loss 1.2667 LearningRate 0.000051 Epoch: 31 Global Step: 660500 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:39:31,282-Speed 2497.27 samples/sec Loss 1.2752 LearningRate 0.000051 Epoch: 31 Global Step: 660510 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:39:39,484-Speed 2497.21 samples/sec Loss 1.2756 LearningRate 0.000051 Epoch: 31 Global Step: 660520 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:39:47,691-Speed 2495.95 samples/sec Loss 1.2600 LearningRate 0.000051 Epoch: 31 Global Step: 660530 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:39:55,890-Speed 2498.41 samples/sec Loss 1.2474 LearningRate 0.000051 Epoch: 31 Global Step: 660540 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:40:04,039-Speed 2513.79 samples/sec Loss 1.2626 LearningRate 0.000051 Epoch: 31 Global Step: 660550 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:40:12,238-Speed 2498.25 samples/sec Loss 1.2632 LearningRate 0.000051 Epoch: 31 Global Step: 660560 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:40:20,436-Speed 2498.44 samples/sec Loss 1.2512 LearningRate 0.000051 Epoch: 31 Global Step: 660570 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:40:28,636-Speed 2497.96 samples/sec Loss 1.2360 LearningRate 0.000051 Epoch: 31 Global Step: 660580 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:40:36,838-Speed 2497.31 samples/sec Loss 1.2483 LearningRate 0.000051 Epoch: 31 Global Step: 660590 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:40:45,037-Speed 2498.54 samples/sec Loss 1.2283 LearningRate 0.000051 Epoch: 31 Global Step: 660600 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:40:53,187-Speed 2513.40 samples/sec Loss 1.2740 LearningRate 0.000051 Epoch: 31 Global Step: 660610 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:41:01,395-Speed 2495.47 samples/sec Loss 1.2586 LearningRate 0.000051 Epoch: 31 Global Step: 660620 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:41:09,599-Speed 2496.76 samples/sec Loss 1.2392 LearningRate 0.000051 Epoch: 31 Global Step: 660630 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:41:17,799-Speed 2498.01 samples/sec Loss 1.2428 LearningRate 0.000051 Epoch: 31 Global Step: 660640 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:41:25,998-Speed 2498.36 samples/sec Loss 1.2534 LearningRate 0.000051 Epoch: 31 Global Step: 660650 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:41:34,198-Speed 2497.80 samples/sec Loss 1.2689 LearningRate 0.000051 Epoch: 31 Global Step: 660660 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:41:42,345-Speed 2513.92 samples/sec Loss 1.2938 LearningRate 0.000051 Epoch: 31 Global Step: 660670 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:41:50,545-Speed 2498.29 samples/sec Loss 1.2369 LearningRate 0.000051 Epoch: 31 Global Step: 660680 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:41:58,743-Speed 2498.55 samples/sec Loss 1.2541 LearningRate 0.000051 Epoch: 31 Global Step: 660690 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:42:06,945-Speed 2496.96 samples/sec Loss 1.3166 LearningRate 0.000051 Epoch: 31 Global Step: 660700 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:42:15,143-Speed 2498.71 samples/sec Loss 1.2625 LearningRate 0.000051 Epoch: 31 Global Step: 660710 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:42:23,348-Speed 2496.66 samples/sec Loss 1.2338 LearningRate 0.000051 Epoch: 31 Global Step: 660720 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:42:31,495-Speed 2514.04 samples/sec Loss 1.2547 LearningRate 0.000051 Epoch: 31 Global Step: 660730 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:42:39,708-Speed 2494.18 samples/sec Loss 1.2169 LearningRate 0.000051 Epoch: 31 Global Step: 660740 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:42:47,907-Speed 2498.14 samples/sec Loss 1.2494 LearningRate 0.000051 Epoch: 31 Global Step: 660750 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:42:56,110-Speed 2497.00 samples/sec Loss 1.2478 LearningRate 0.000051 Epoch: 31 Global Step: 660760 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:43:04,313-Speed 2497.08 samples/sec Loss 1.2471 LearningRate 0.000051 Epoch: 31 Global Step: 660770 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:43:12,514-Speed 2498.18 samples/sec Loss 1.2632 LearningRate 0.000051 Epoch: 31 Global Step: 660780 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:43:20,667-Speed 2512.31 samples/sec Loss 1.2253 LearningRate 0.000051 Epoch: 31 Global Step: 660790 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:43:28,866-Speed 2498.62 samples/sec Loss 1.2661 LearningRate 0.000051 Epoch: 31 Global Step: 660800 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:43:37,068-Speed 2497.31 samples/sec Loss 1.2517 LearningRate 0.000051 Epoch: 31 Global Step: 660810 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 
21:43:45,266-Speed 2498.50 samples/sec Loss 1.2550 LearningRate 0.000051 Epoch: 31 Global Step: 660820 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:43:53,464-Speed 2498.68 samples/sec Loss 1.2327 LearningRate 0.000051 Epoch: 31 Global Step: 660830 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:44:01,677-Speed 2494.21 samples/sec Loss 1.2620 LearningRate 0.000051 Epoch: 31 Global Step: 660840 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:44:09,839-Speed 2510.38 samples/sec Loss 1.2377 LearningRate 0.000051 Epoch: 31 Global Step: 660850 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:44:18,039-Speed 2497.87 samples/sec Loss 1.2311 LearningRate 0.000051 Epoch: 31 Global Step: 660860 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:44:26,243-Speed 2496.73 samples/sec Loss 1.2734 LearningRate 0.000051 Epoch: 31 Global Step: 660870 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:44:34,441-Speed 2498.81 samples/sec Loss 1.2332 LearningRate 0.000051 Epoch: 31 Global Step: 660880 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:44:42,637-Speed 2498.88 samples/sec Loss 1.2564 LearningRate 0.000051 Epoch: 31 Global Step: 660890 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:44:50,838-Speed 2497.62 samples/sec Loss 1.2509 LearningRate 0.000051 Epoch: 31 Global Step: 660900 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:44:58,988-Speed 2513.48 samples/sec Loss 1.2688 LearningRate 0.000051 Epoch: 31 Global Step: 660910 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:45:07,189-Speed 2497.59 samples/sec Loss 1.2619 LearningRate 0.000051 Epoch: 31 Global Step: 660920 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:45:15,401-Speed 2494.21 samples/sec Loss 1.2428 LearningRate 0.000051 Epoch: 31 Global Step: 660930 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 
21:45:23,601-Speed 2498.22 samples/sec Loss 1.2626 LearningRate 0.000051 Epoch: 31 Global Step: 660940 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:45:31,801-Speed 2497.98 samples/sec Loss 1.2562 LearningRate 0.000051 Epoch: 31 Global Step: 660950 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:45:40,003-Speed 2497.27 samples/sec Loss 1.2428 LearningRate 0.000051 Epoch: 31 Global Step: 660960 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:45:48,153-Speed 2513.34 samples/sec Loss 1.2456 LearningRate 0.000051 Epoch: 31 Global Step: 660970 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-07-11 21:45:56,316-Speed 2509.51 samples/sec Loss 1.2384 LearningRate 0.000051 Epoch: 31 Global Step: 660980 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:46:04,518-Speed 2497.41 samples/sec Loss 1.2754 LearningRate 0.000051 Epoch: 31 Global Step: 660990 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:46:12,715-Speed 2498.66 samples/sec Loss 1.2953 LearningRate 0.000051 Epoch: 31 Global Step: 661000 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:46:20,922-Speed 2496.03 samples/sec Loss 1.2306 LearningRate 0.000051 Epoch: 31 Global Step: 661010 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:46:29,121-Speed 2498.02 samples/sec Loss 1.2806 LearningRate 0.000051 Epoch: 31 Global Step: 661020 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:46:37,267-Speed 2514.53 samples/sec Loss 1.2962 LearningRate 0.000051 Epoch: 31 Global Step: 661030 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:46:45,471-Speed 2497.04 samples/sec Loss 1.2558 LearningRate 0.000051 Epoch: 31 Global Step: 661040 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:46:53,691-Speed 2491.77 samples/sec Loss 1.2695 LearningRate 0.000051 Epoch: 31 Global Step: 661050 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:47:01,898-Speed 2495.97 samples/sec Loss 1.2540 LearningRate 0.000051 Epoch: 31 Global Step: 661060 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:47:10,101-Speed 2497.32 samples/sec Loss 1.2616 LearningRate 0.000051 Epoch: 31 Global Step: 661070 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:47:18,305-Speed 2496.75 samples/sec Loss 1.2437 LearningRate 0.000051 Epoch: 31 Global Step: 661080 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:47:26,454-Speed 2513.69 samples/sec Loss 1.2465 LearningRate 0.000051 Epoch: 31 Global Step: 661090 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:47:34,652-Speed 2498.41 samples/sec Loss 1.2649 LearningRate 0.000051 Epoch: 31 Global Step: 661100 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:47:42,865-Speed 2494.26 samples/sec Loss 1.2272 LearningRate 0.000051 Epoch: 31 Global Step: 661110 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:47:51,066-Speed 2497.83 samples/sec Loss 1.2870 LearningRate 0.000051 Epoch: 31 Global Step: 661120 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:47:59,266-Speed 2497.86 samples/sec Loss 1.2714 LearningRate 0.000051 Epoch: 31 Global Step: 661130 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:48:07,468-Speed 2497.21 samples/sec Loss 1.2405 LearningRate 0.000051 Epoch: 31 Global Step: 661140 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:48:15,615-Speed 2514.40 samples/sec Loss 1.2582 LearningRate 0.000051 Epoch: 31 Global Step: 661150 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:48:23,819-Speed 2496.79 samples/sec Loss 1.2324 LearningRate 0.000051 Epoch: 31 Global Step: 661160 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:48:32,027-Speed 2495.22 samples/sec Loss 1.2506 LearningRate 0.000051 Epoch: 31 Global Step: 661170 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:48:40,238-Speed 2494.88 samples/sec Loss 1.2384 LearningRate 0.000051 Epoch: 31 Global Step: 661180 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:48:48,441-Speed 2498.13 samples/sec Loss 1.2035 LearningRate 0.000051 Epoch: 31 Global Step: 661190 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:48:56,640-Speed 2498.36 samples/sec Loss 1.2476 LearningRate 0.000051 Epoch: 31 Global Step: 661200 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:49:04,791-Speed 2513.04 samples/sec Loss 1.2384 LearningRate 0.000051 Epoch: 31 Global Step: 661210 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:49:12,991-Speed 2497.91 samples/sec Loss 1.2473 LearningRate 0.000051 Epoch: 31 Global Step: 661220 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:49:21,191-Speed 2497.96 samples/sec Loss 1.2257 LearningRate 0.000051 Epoch: 31 Global Step: 661230 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:49:29,394-Speed 2496.98 samples/sec Loss 1.2602 LearningRate 0.000051 Epoch: 31 Global Step: 661240 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:49:37,594-Speed 2498.21 samples/sec Loss 1.2370 LearningRate 0.000051 Epoch: 31 Global Step: 661250 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:49:45,796-Speed 2497.16 samples/sec Loss 1.2166 LearningRate 0.000051 Epoch: 31 Global Step: 661260 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:49:53,948-Speed 2512.93 samples/sec Loss 1.2936 LearningRate 0.000051 Epoch: 31 Global Step: 661270 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:50:02,153-Speed 2496.40 samples/sec Loss 1.2438 LearningRate 0.000051 Epoch: 31 Global Step: 661280 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:50:10,358-Speed 2496.56 samples/sec Loss 1.2511 LearningRate 0.000051 Epoch: 31 Global Step: 661290 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 
21:50:18,573-Speed 2493.16 samples/sec Loss 1.2516 LearningRate 0.000051 Epoch: 31 Global Step: 661300 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:50:26,773-Speed 2498.29 samples/sec Loss 1.2834 LearningRate 0.000051 Epoch: 31 Global Step: 661310 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:50:34,976-Speed 2496.87 samples/sec Loss 1.2901 LearningRate 0.000051 Epoch: 31 Global Step: 661320 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:50:43,135-Speed 2510.54 samples/sec Loss 1.2516 LearningRate 0.000051 Epoch: 31 Global Step: 661330 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:50:51,336-Speed 2497.75 samples/sec Loss 1.2309 LearningRate 0.000051 Epoch: 31 Global Step: 661340 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:50:59,535-Speed 2498.34 samples/sec Loss 1.2878 LearningRate 0.000051 Epoch: 31 Global Step: 661350 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:51:07,734-Speed 2498.27 samples/sec Loss 1.2483 LearningRate 0.000051 Epoch: 31 Global Step: 661360 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:51:15,956-Speed 2491.53 samples/sec Loss 1.2371 LearningRate 0.000051 Epoch: 31 Global Step: 661370 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:51:24,158-Speed 2497.66 samples/sec Loss 1.2374 LearningRate 0.000051 Epoch: 31 Global Step: 661380 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:51:32,302-Speed 2514.80 samples/sec Loss 1.2936 LearningRate 0.000051 Epoch: 31 Global Step: 661390 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-07-11 21:51:40,508-Speed 2496.43 samples/sec Loss 1.2213 LearningRate 0.000051 Epoch: 31 Global Step: 661400 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:51:48,708-Speed 2498.00 samples/sec Loss 1.2391 LearningRate 0.000051 Epoch: 31 Global Step: 661410 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
21:51:56,915-Speed 2495.77 samples/sec Loss 1.2601 LearningRate 0.000051 Epoch: 31 Global Step: 661420 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:52:05,113-Speed 2498.60 samples/sec Loss 1.2740 LearningRate 0.000051 Epoch: 31 Global Step: 661430 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:52:13,312-Speed 2498.31 samples/sec Loss 1.2599 LearningRate 0.000051 Epoch: 31 Global Step: 661440 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:52:21,463-Speed 2513.03 samples/sec Loss 1.2518 LearningRate 0.000051 Epoch: 31 Global Step: 661450 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:52:29,663-Speed 2497.76 samples/sec Loss 1.2607 LearningRate 0.000051 Epoch: 31 Global Step: 661460 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:52:37,864-Speed 2497.71 samples/sec Loss 1.2314 LearningRate 0.000051 Epoch: 31 Global Step: 661470 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:52:46,059-Speed 2499.24 samples/sec Loss 1.2630 LearningRate 0.000051 Epoch: 31 Global Step: 661480 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:52:54,259-Speed 2498.00 samples/sec Loss 1.2669 LearningRate 0.000051 Epoch: 31 Global Step: 661490 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:53:02,467-Speed 2495.82 samples/sec Loss 1.2662 LearningRate 0.000051 Epoch: 31 Global Step: 661500 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:53:10,628-Speed 2509.83 samples/sec Loss 1.2496 LearningRate 0.000051 Epoch: 31 Global Step: 661510 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:53:18,826-Speed 2498.67 samples/sec Loss 1.2660 LearningRate 0.000051 Epoch: 31 Global Step: 661520 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:53:27,027-Speed 2497.58 samples/sec Loss 1.2721 LearningRate 0.000051 Epoch: 31 Global Step: 661530 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
21:53:35,231-Speed 2496.81 samples/sec Loss 1.2750 LearningRate 0.000051 Epoch: 31 Global Step: 661540 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:53:43,430-Speed 2498.01 samples/sec Loss 1.2187 LearningRate 0.000051 Epoch: 31 Global Step: 661550 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:53:51,632-Speed 2497.42 samples/sec Loss 1.2813 LearningRate 0.000051 Epoch: 31 Global Step: 661560 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:53:59,782-Speed 2513.46 samples/sec Loss 1.2594 LearningRate 0.000051 Epoch: 31 Global Step: 661570 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:54:07,984-Speed 2497.21 samples/sec Loss 1.2349 LearningRate 0.000051 Epoch: 31 Global Step: 661580 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:54:16,185-Speed 2497.87 samples/sec Loss 1.2537 LearningRate 0.000051 Epoch: 31 Global Step: 661590 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:54:24,386-Speed 2497.63 samples/sec Loss 1.2542 LearningRate 0.000051 Epoch: 31 Global Step: 661600 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:54:32,587-Speed 2497.84 samples/sec Loss 1.2571 LearningRate 0.000051 Epoch: 31 Global Step: 661610 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:54:40,790-Speed 2497.11 samples/sec Loss 1.2353 LearningRate 0.000051 Epoch: 31 Global Step: 661620 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:54:48,949-Speed 2510.36 samples/sec Loss 1.2335 LearningRate 0.000051 Epoch: 31 Global Step: 661630 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:54:57,154-Speed 2496.74 samples/sec Loss 1.2528 LearningRate 0.000051 Epoch: 31 Global Step: 661640 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:55:05,367-Speed 2493.90 samples/sec Loss 1.2667 LearningRate 0.000051 Epoch: 31 Global Step: 661650 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
21:55:13,566-Speed 2498.14 samples/sec Loss 1.2457 LearningRate 0.000051 Epoch: 31 Global Step: 661660 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:55:21,765-Speed 2498.43 samples/sec Loss 1.2554 LearningRate 0.000051 Epoch: 31 Global Step: 661670 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:55:29,980-Speed 2493.40 samples/sec Loss 1.2455 LearningRate 0.000051 Epoch: 31 Global Step: 661680 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:55:38,129-Speed 2513.60 samples/sec Loss 1.2612 LearningRate 0.000051 Epoch: 31 Global Step: 661690 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:55:46,341-Speed 2494.28 samples/sec Loss 1.2281 LearningRate 0.000051 Epoch: 31 Global Step: 661700 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:55:54,543-Speed 2497.20 samples/sec Loss 1.2198 LearningRate 0.000051 Epoch: 31 Global Step: 661710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:56:02,748-Speed 2496.77 samples/sec Loss 1.2610 LearningRate 0.000051 Epoch: 31 Global Step: 661720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:56:10,945-Speed 2498.98 samples/sec Loss 1.2112 LearningRate 0.000051 Epoch: 31 Global Step: 661730 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:56:19,162-Speed 2493.16 samples/sec Loss 1.2345 LearningRate 0.000051 Epoch: 31 Global Step: 661740 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:56:27,308-Speed 2514.42 samples/sec Loss 1.2703 LearningRate 0.000051 Epoch: 31 Global Step: 661750 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:56:35,509-Speed 2497.88 samples/sec Loss 1.2147 LearningRate 0.000051 Epoch: 31 Global Step: 661760 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:56:43,710-Speed 2497.59 samples/sec Loss 1.2619 LearningRate 0.000051 Epoch: 31 Global Step: 661770 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
21:56:51,911-Speed 2497.88 samples/sec Loss 1.2687 LearningRate 0.000051 Epoch: 31 Global Step: 661780 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:57:00,113-Speed 2497.13 samples/sec Loss 1.2395 LearningRate 0.000050 Epoch: 31 Global Step: 661790 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:57:08,316-Speed 2497.06 samples/sec Loss 1.2696 LearningRate 0.000050 Epoch: 31 Global Step: 661800 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:57:16,466-Speed 2513.11 samples/sec Loss 1.2259 LearningRate 0.000050 Epoch: 31 Global Step: 661810 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:57:24,666-Speed 2498.03 samples/sec Loss 1.2820 LearningRate 0.000050 Epoch: 31 Global Step: 661820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:57:32,877-Speed 2494.64 samples/sec Loss 1.2483 LearningRate 0.000050 Epoch: 31 Global Step: 661830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:57:41,081-Speed 2497.10 samples/sec Loss 1.2649 LearningRate 0.000050 Epoch: 31 Global Step: 661840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:57:49,284-Speed 2496.83 samples/sec Loss 1.2760 LearningRate 0.000050 Epoch: 31 Global Step: 661850 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:57:57,496-Speed 2494.35 samples/sec Loss 1.2812 LearningRate 0.000050 Epoch: 31 Global Step: 661860 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:58:05,645-Speed 2513.74 samples/sec Loss 1.2196 LearningRate 0.000050 Epoch: 31 Global Step: 661870 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:58:13,851-Speed 2496.08 samples/sec Loss 1.2507 LearningRate 0.000050 Epoch: 31 Global Step: 661880 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:58:22,051-Speed 2498.13 samples/sec Loss 1.2608 LearningRate 0.000050 Epoch: 31 Global Step: 661890 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
21:58:30,252-Speed 2497.62 samples/sec Loss 1.2486 LearningRate 0.000050 Epoch: 31 Global Step: 661900 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:58:38,452-Speed 2498.23 samples/sec Loss 1.2434 LearningRate 0.000050 Epoch: 31 Global Step: 661910 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:58:46,656-Speed 2496.51 samples/sec Loss 1.2576 LearningRate 0.000050 Epoch: 31 Global Step: 661920 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:58:54,803-Speed 2514.29 samples/sec Loss 1.2669 LearningRate 0.000050 Epoch: 31 Global Step: 661930 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:59:03,014-Speed 2494.43 samples/sec Loss 1.2869 LearningRate 0.000050 Epoch: 31 Global Step: 661940 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:59:11,217-Speed 2497.21 samples/sec Loss 1.2469 LearningRate 0.000050 Epoch: 31 Global Step: 661950 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:59:19,417-Speed 2497.80 samples/sec Loss 1.2456 LearningRate 0.000050 Epoch: 31 Global Step: 661960 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:59:27,622-Speed 2496.41 samples/sec Loss 1.2664 LearningRate 0.000050 Epoch: 31 Global Step: 661970 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:59:35,824-Speed 2497.35 samples/sec Loss 1.2397 LearningRate 0.000050 Epoch: 31 Global Step: 661980 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:59:43,986-Speed 2510.15 samples/sec Loss 1.2616 LearningRate 0.000050 Epoch: 31 Global Step: 661990 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 21:59:52,189-Speed 2496.89 samples/sec Loss 1.2453 LearningRate 0.000050 Epoch: 31 Global Step: 662000 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:00:00,389-Speed 2498.21 samples/sec Loss 1.2255 LearningRate 0.000050 Epoch: 31 Global Step: 662010 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:00:08,587-Speed 2498.43 samples/sec Loss 1.2566 LearningRate 0.000050 Epoch: 31 Global Step: 662020 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:00:16,788-Speed 2498.03 samples/sec Loss 1.2625 LearningRate 0.000050 Epoch: 31 Global Step: 662030 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:00:24,994-Speed 2496.10 samples/sec Loss 1.2434 LearningRate 0.000050 Epoch: 31 Global Step: 662040 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:00:33,139-Speed 2514.59 samples/sec Loss 1.2487 LearningRate 0.000050 Epoch: 31 Global Step: 662050 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:00:41,346-Speed 2496.01 samples/sec Loss 1.2515 LearningRate 0.000050 Epoch: 31 Global Step: 662060 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:00:49,559-Speed 2494.19 samples/sec Loss 1.2617 LearningRate 0.000050 Epoch: 31 Global Step: 662070 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:00:57,760-Speed 2497.62 samples/sec Loss 1.2772 LearningRate 0.000050 Epoch: 31 Global Step: 662080 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:01:05,958-Speed 2498.53 samples/sec Loss 1.2413 LearningRate 0.000050 Epoch: 31 Global Step: 662090 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:01:14,161-Speed 2496.92 samples/sec Loss 1.2334 LearningRate 0.000050 Epoch: 31 Global Step: 662100 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:01:22,313-Speed 2512.79 samples/sec Loss 1.2630 LearningRate 0.000050 Epoch: 31 Global Step: 662110 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:01:30,514-Speed 2497.58 samples/sec Loss 1.2854 LearningRate 0.000050 Epoch: 31 Global Step: 662120 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:01:38,719-Speed 2496.33 samples/sec Loss 1.2565 LearningRate 0.000050 Epoch: 31 Global Step: 662130 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:01:46,924-Speed 2496.47 samples/sec Loss 1.2431 LearningRate 0.000050 Epoch: 31 Global Step: 662140 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:01:55,127-Speed 2497.48 samples/sec Loss 1.2631 LearningRate 0.000050 Epoch: 31 Global Step: 662150 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:02:03,331-Speed 2496.74 samples/sec Loss 1.2587 LearningRate 0.000050 Epoch: 31 Global Step: 662160 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:02:11,479-Speed 2513.84 samples/sec Loss 1.2227 LearningRate 0.000050 Epoch: 31 Global Step: 662170 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:02:19,682-Speed 2497.15 samples/sec Loss 1.2804 LearningRate 0.000050 Epoch: 31 Global Step: 662180 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:02:27,886-Speed 2496.58 samples/sec Loss 1.2645 LearningRate 0.000050 Epoch: 31 Global Step: 662190 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:02:36,090-Speed 2496.63 samples/sec Loss 1.2638 LearningRate 0.000050 Epoch: 31 Global Step: 662200 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:02:44,296-Speed 2496.65 samples/sec Loss 1.2613 LearningRate 0.000050 Epoch: 31 Global Step: 662210 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:02:52,495-Speed 2498.42 samples/sec Loss 1.2368 LearningRate 0.000050 Epoch: 31 Global Step: 662220 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:03:00,641-Speed 2514.23 samples/sec Loss 1.2527 LearningRate 0.000050 Epoch: 31 Global Step: 662230 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:03:08,847-Speed 2496.30 samples/sec Loss 1.2291 LearningRate 0.000050 Epoch: 31 Global Step: 662240 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:03:17,059-Speed 2494.18 samples/sec Loss 1.2392 LearningRate 0.000050 Epoch: 31 Global Step: 662250 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 
22:03:25,260-Speed 2497.80 samples/sec Loss 1.2185 LearningRate 0.000050 Epoch: 31 Global Step: 662260 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:03:33,466-Speed 2496.34 samples/sec Loss 1.2762 LearningRate 0.000050 Epoch: 31 Global Step: 662270 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:03:41,670-Speed 2496.45 samples/sec Loss 1.2976 LearningRate 0.000050 Epoch: 31 Global Step: 662280 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:03:49,818-Speed 2514.83 samples/sec Loss 1.2575 LearningRate 0.000050 Epoch: 31 Global Step: 662290 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:03:58,024-Speed 2496.26 samples/sec Loss 1.2289 LearningRate 0.000050 Epoch: 31 Global Step: 662300 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:04:06,227-Speed 2497.12 samples/sec Loss 1.2708 LearningRate 0.000050 Epoch: 31 Global Step: 662310 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:04:14,430-Speed 2496.87 samples/sec Loss 1.2488 LearningRate 0.000050 Epoch: 31 Global Step: 662320 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:04:22,631-Speed 2498.08 samples/sec Loss 1.2399 LearningRate 0.000050 Epoch: 31 Global Step: 662330 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:04:30,831-Speed 2497.95 samples/sec Loss 1.2598 LearningRate 0.000050 Epoch: 31 Global Step: 662340 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:04:38,977-Speed 2514.27 samples/sec Loss 1.2779 LearningRate 0.000050 Epoch: 31 Global Step: 662350 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:04:47,179-Speed 2497.47 samples/sec Loss 1.2723 LearningRate 0.000050 Epoch: 31 Global Step: 662360 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:04:55,384-Speed 2496.68 samples/sec Loss 1.2140 LearningRate 0.000050 Epoch: 31 Global Step: 662370 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 
22:05:03,594-Speed 2494.93 samples/sec Loss 1.2269 LearningRate 0.000050 Epoch: 31 Global Step: 662380 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:05:11,796-Speed 2497.36 samples/sec Loss 1.2845 LearningRate 0.000050 Epoch: 31 Global Step: 662390 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:05:19,996-Speed 2497.96 samples/sec Loss 1.2292 LearningRate 0.000050 Epoch: 31 Global Step: 662400 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:05:28,150-Speed 2512.09 samples/sec Loss 1.2477 LearningRate 0.000050 Epoch: 31 Global Step: 662410 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:05:36,348-Speed 2498.41 samples/sec Loss 1.2156 LearningRate 0.000050 Epoch: 31 Global Step: 662420 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:05:44,555-Speed 2495.98 samples/sec Loss 1.2667 LearningRate 0.000050 Epoch: 31 Global Step: 662430 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:05:52,758-Speed 2497.19 samples/sec Loss 1.2388 LearningRate 0.000050 Epoch: 31 Global Step: 662440 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:06:00,958-Speed 2497.95 samples/sec Loss 1.2469 LearningRate 0.000050 Epoch: 31 Global Step: 662450 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:06:09,162-Speed 2496.45 samples/sec Loss 1.2641 LearningRate 0.000050 Epoch: 31 Global Step: 662460 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:06:17,311-Speed 2514.10 samples/sec Loss 1.2379 LearningRate 0.000050 Epoch: 31 Global Step: 662470 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:06:25,514-Speed 2496.88 samples/sec Loss 1.2359 LearningRate 0.000050 Epoch: 31 Global Step: 662480 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:06:33,718-Speed 2496.74 samples/sec Loss 1.2360 LearningRate 0.000050 Epoch: 31 Global Step: 662490 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 
22:06:41,920-Speed 2497.47 samples/sec Loss 1.2411 LearningRate 0.000050 Epoch: 31 Global Step: 662500 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:06:50,122-Speed 2497.47 samples/sec Loss 1.2402 LearningRate 0.000050 Epoch: 31 Global Step: 662510 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:06:58,334-Speed 2494.44 samples/sec Loss 1.2857 LearningRate 0.000050 Epoch: 31 Global Step: 662520 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:07:06,482-Speed 2514.11 samples/sec Loss 1.2385 LearningRate 0.000050 Epoch: 31 Global Step: 662530 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:07:14,688-Speed 2495.94 samples/sec Loss 1.2330 LearningRate 0.000050 Epoch: 31 Global Step: 662540 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:07:22,892-Speed 2497.06 samples/sec Loss 1.2335 LearningRate 0.000050 Epoch: 31 Global Step: 662550 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:07:31,099-Speed 2495.97 samples/sec Loss 1.2555 LearningRate 0.000050 Epoch: 31 Global Step: 662560 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:07:39,295-Speed 2499.23 samples/sec Loss 1.2228 LearningRate 0.000050 Epoch: 31 Global Step: 662570 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:07:47,496-Speed 2497.47 samples/sec Loss 1.2722 LearningRate 0.000050 Epoch: 31 Global Step: 662580 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:07:55,649-Speed 2512.59 samples/sec Loss 1.2517 LearningRate 0.000050 Epoch: 31 Global Step: 662590 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:08:03,853-Speed 2496.96 samples/sec Loss 1.2381 LearningRate 0.000050 Epoch: 31 Global Step: 662600 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:08:12,053-Speed 2497.72 samples/sec Loss 1.2317 LearningRate 0.000050 Epoch: 31 Global Step: 662610 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 
22:08:20,254-Speed 2497.52 samples/sec Loss 1.2702 LearningRate 0.000050 Epoch: 31 Global Step: 662620 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:08:28,453-Speed 2498.45 samples/sec Loss 1.2333 LearningRate 0.000050 Epoch: 31 Global Step: 662630 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:08:36,658-Speed 2496.65 samples/sec Loss 1.2673 LearningRate 0.000050 Epoch: 31 Global Step: 662640 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:08:44,805-Speed 2514.11 samples/sec Loss 1.2444 LearningRate 0.000050 Epoch: 31 Global Step: 662650 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:08:53,015-Speed 2495.73 samples/sec Loss 1.2194 LearningRate 0.000050 Epoch: 31 Global Step: 662660 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:09:01,238-Speed 2491.30 samples/sec Loss 1.2383 LearningRate 0.000050 Epoch: 31 Global Step: 662670 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:09:09,451-Speed 2494.00 samples/sec Loss 1.2530 LearningRate 0.000050 Epoch: 31 Global Step: 662680 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:09:17,657-Speed 2496.10 samples/sec Loss 1.2539 LearningRate 0.000050 Epoch: 31 Global Step: 662690 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:09:25,858-Speed 2497.68 samples/sec Loss 1.2434 LearningRate 0.000050 Epoch: 31 Global Step: 662700 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:09:34,006-Speed 2513.82 samples/sec Loss 1.2672 LearningRate 0.000050 Epoch: 31 Global Step: 662710 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:09:42,208-Speed 2497.57 samples/sec Loss 1.2468 LearningRate 0.000050 Epoch: 31 Global Step: 662720 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:09:50,417-Speed 2495.03 samples/sec Loss 1.2802 LearningRate 0.000050 Epoch: 31 Global Step: 662730 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 
22:09:58,617-Speed 2497.88 samples/sec Loss 1.2268 LearningRate 0.000050 Epoch: 31 Global Step: 662740 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:10:06,823-Speed 2495.97 samples/sec Loss 1.2696 LearningRate 0.000050 Epoch: 31 Global Step: 662750 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:10:15,028-Speed 2496.57 samples/sec Loss 1.2407 LearningRate 0.000050 Epoch: 31 Global Step: 662760 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:10:23,194-Speed 2508.25 samples/sec Loss 1.2428 LearningRate 0.000050 Epoch: 31 Global Step: 662770 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:10:31,395-Speed 2497.80 samples/sec Loss 1.2542 LearningRate 0.000050 Epoch: 31 Global Step: 662780 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:10:39,592-Speed 2498.66 samples/sec Loss 1.2222 LearningRate 0.000050 Epoch: 31 Global Step: 662790 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:10:47,803-Speed 2494.83 samples/sec Loss 1.2470 LearningRate 0.000050 Epoch: 31 Global Step: 662800 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:10:56,006-Speed 2496.86 samples/sec Loss 1.2637 LearningRate 0.000050 Epoch: 31 Global Step: 662810 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:11:04,165-Speed 2510.47 samples/sec Loss 1.2656 LearningRate 0.000050 Epoch: 31 Global Step: 662820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:11:12,315-Speed 2513.51 samples/sec Loss 1.2442 LearningRate 0.000050 Epoch: 31 Global Step: 662830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:11:20,517-Speed 2497.36 samples/sec Loss 1.2246 LearningRate 0.000050 Epoch: 31 Global Step: 662840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:11:28,721-Speed 2496.74 samples/sec Loss 1.2227 LearningRate 0.000050 Epoch: 31 Global Step: 662850 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:11:36,924-Speed 2496.96 samples/sec Loss 1.2671 LearningRate 0.000050 Epoch: 31 Global Step: 662860 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:11:45,125-Speed 2497.72 samples/sec Loss 1.2465 LearningRate 0.000050 Epoch: 31 Global Step: 662870 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:11:53,326-Speed 2497.60 samples/sec Loss 1.2273 LearningRate 0.000050 Epoch: 31 Global Step: 662880 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:12:01,472-Speed 2514.64 samples/sec Loss 1.2797 LearningRate 0.000050 Epoch: 31 Global Step: 662890 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:12:09,672-Speed 2497.85 samples/sec Loss 1.2199 LearningRate 0.000050 Epoch: 31 Global Step: 662900 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:12:17,873-Speed 2497.63 samples/sec Loss 1.2275 LearningRate 0.000050 Epoch: 31 Global Step: 662910 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:12:26,085-Speed 2494.25 samples/sec Loss 1.2509 LearningRate 0.000050 Epoch: 31 Global Step: 662920 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:12:34,287-Speed 2497.46 samples/sec Loss 1.2436 LearningRate 0.000050 Epoch: 31 Global Step: 662930 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:12:42,487-Speed 2498.07 samples/sec Loss 1.2463 LearningRate 0.000050 Epoch: 31 Global Step: 662940 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:12:50,639-Speed 2512.79 samples/sec Loss 1.2749 LearningRate 0.000050 Epoch: 31 Global Step: 662950 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:12:58,839-Speed 2497.75 samples/sec Loss 1.2492 LearningRate 0.000050 Epoch: 31 Global Step: 662960 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:13:07,039-Speed 2498.09 samples/sec Loss 1.2695 LearningRate 0.000050 Epoch: 31 Global Step: 662970 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:13:15,250-Speed 2494.33 samples/sec Loss 1.2968 LearningRate 0.000050 Epoch: 31 Global Step: 662980 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:13:23,452-Speed 2497.60 samples/sec Loss 1.2514 LearningRate 0.000050 Epoch: 31 Global Step: 662990 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:13:31,667-Speed 2493.44 samples/sec Loss 1.2038 LearningRate 0.000050 Epoch: 31 Global Step: 663000 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:13:39,813-Speed 2514.54 samples/sec Loss 1.2697 LearningRate 0.000050 Epoch: 31 Global Step: 663010 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:13:48,014-Speed 2497.99 samples/sec Loss 1.2389 LearningRate 0.000050 Epoch: 31 Global Step: 663020 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:13:56,215-Speed 2497.58 samples/sec Loss 1.2864 LearningRate 0.000050 Epoch: 31 Global Step: 663030 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:14:04,436-Speed 2491.70 samples/sec Loss 1.2200 LearningRate 0.000050 Epoch: 31 Global Step: 663040 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:14:12,636-Speed 2498.07 samples/sec Loss 1.2931 LearningRate 0.000050 Epoch: 31 Global Step: 663050 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:14:20,841-Speed 2496.31 samples/sec Loss 1.2581 LearningRate 0.000050 Epoch: 31 Global Step: 663060 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:14:28,988-Speed 2514.19 samples/sec Loss 1.2551 LearningRate 0.000050 Epoch: 31 Global Step: 663070 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:14:37,186-Speed 2498.59 samples/sec Loss 1.2498 LearningRate 0.000050 Epoch: 31 Global Step: 663080 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:14:45,388-Speed 2497.43 samples/sec Loss 1.2477 LearningRate 0.000050 Epoch: 31 Global Step: 663090 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:14:53,596-Speed 2495.62 samples/sec Loss 1.2450 LearningRate 0.000050 Epoch: 31 Global Step: 663100 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:15:01,796-Speed 2497.92 samples/sec Loss 1.2392 LearningRate 0.000050 Epoch: 31 Global Step: 663110 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:15:09,998-Speed 2497.47 samples/sec Loss 1.2591 LearningRate 0.000050 Epoch: 31 Global Step: 663120 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:15:18,150-Speed 2513.02 samples/sec Loss 1.2688 LearningRate 0.000050 Epoch: 31 Global Step: 663130 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:15:26,358-Speed 2495.47 samples/sec Loss 1.2799 LearningRate 0.000050 Epoch: 31 Global Step: 663140 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:15:34,561-Speed 2497.19 samples/sec Loss 1.2362 LearningRate 0.000050 Epoch: 31 Global Step: 663150 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:15:42,766-Speed 2496.61 samples/sec Loss 1.2581 LearningRate 0.000050 Epoch: 31 Global Step: 663160 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:15:50,982-Speed 2493.18 samples/sec Loss 1.2282 LearningRate 0.000050 Epoch: 31 Global Step: 663170 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:15:59,183-Speed 2497.54 samples/sec Loss 1.2478 LearningRate 0.000050 Epoch: 31 Global Step: 663180 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:16:07,332-Speed 2513.39 samples/sec Loss 1.2883 LearningRate 0.000050 Epoch: 31 Global Step: 663190 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:16:15,537-Speed 2496.45 samples/sec Loss 1.2587 LearningRate 0.000050 Epoch: 31 Global Step: 663200 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:16:23,736-Speed 2498.28 samples/sec Loss 1.2416 LearningRate 0.000050 Epoch: 31 Global Step: 663210 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:16:31,938-Speed 2497.27 samples/sec Loss 1.2601 LearningRate 0.000050 Epoch: 31 Global Step: 663220 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:16:40,138-Speed 2498.05 samples/sec Loss 1.2373 LearningRate 0.000050 Epoch: 31 Global Step: 663230 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:16:48,338-Speed 2498.20 samples/sec Loss 1.2646 LearningRate 0.000050 Epoch: 31 Global Step: 663240 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:16:56,486-Speed 2513.82 samples/sec Loss 1.2752 LearningRate 0.000050 Epoch: 31 Global Step: 663250 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:17:04,696-Speed 2494.76 samples/sec Loss 1.2518 LearningRate 0.000050 Epoch: 31 Global Step: 663260 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:17:12,895-Speed 2498.53 samples/sec Loss 1.2309 LearningRate 0.000050 Epoch: 31 Global Step: 663270 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:17:21,095-Speed 2498.12 samples/sec Loss 1.2498 LearningRate 0.000050 Epoch: 31 Global Step: 663280 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:17:29,295-Speed 2497.84 samples/sec Loss 1.2141 LearningRate 0.000050 Epoch: 31 Global Step: 663290 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:17:37,497-Speed 2497.72 samples/sec Loss 1.2575 LearningRate 0.000050 Epoch: 31 Global Step: 663300 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:17:45,642-Speed 2514.60 samples/sec Loss 1.2766 LearningRate 0.000050 Epoch: 31 Global Step: 663310 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:17:53,843-Speed 2497.89 samples/sec Loss 1.2719 LearningRate 0.000050 Epoch: 31 Global Step: 663320 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:18:02,047-Speed 2496.62 samples/sec Loss 1.2759 LearningRate 0.000050 Epoch: 31 Global Step: 663330 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:18:10,246-Speed 2498.46 samples/sec Loss 1.2459 LearningRate 0.000050 Epoch: 31 Global Step: 663340 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:18:18,444-Speed 2498.65 samples/sec Loss 1.2517 LearningRate 0.000050 Epoch: 31 Global Step: 663350 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:18:26,644-Speed 2498.09 samples/sec Loss 1.2838 LearningRate 0.000050 Epoch: 31 Global Step: 663360 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:18:34,803-Speed 2510.70 samples/sec Loss 1.2623 LearningRate 0.000050 Epoch: 31 Global Step: 663370 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:18:43,004-Speed 2497.51 samples/sec Loss 1.2700 LearningRate 0.000050 Epoch: 31 Global Step: 663380 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:18:51,205-Speed 2497.66 samples/sec Loss 1.2584 LearningRate 0.000050 Epoch: 31 Global Step: 663390 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:18:59,418-Speed 2494.03 samples/sec Loss 1.2613 LearningRate 0.000050 Epoch: 31 Global Step: 663400 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:19:07,619-Speed 2497.66 samples/sec Loss 1.2869 LearningRate 0.000050 Epoch: 31 Global Step: 663410 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:19:15,819-Speed 2497.84 samples/sec Loss 1.2747 LearningRate 0.000050 Epoch: 31 Global Step: 663420 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:19:23,970-Speed 2513.12 samples/sec Loss 1.2398 LearningRate 0.000050 Epoch: 31 Global Step: 663430 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:19:32,172-Speed 2497.52 samples/sec Loss 1.2304 LearningRate 0.000050 Epoch: 31 Global Step: 663440 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:19:40,372-Speed 2497.82 samples/sec Loss 1.2639 LearningRate 0.000050 Epoch: 31 Global Step: 663450 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:19:48,586-Speed 2493.77 samples/sec Loss 1.2714 LearningRate 0.000049 Epoch: 31 Global Step: 663460 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:19:56,793-Speed 2495.88 samples/sec Loss 1.2695 LearningRate 0.000049 Epoch: 31 Global Step: 663470 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:20:04,997-Speed 2496.82 samples/sec Loss 1.2685 LearningRate 0.000049 Epoch: 31 Global Step: 663480 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:20:13,142-Speed 2514.79 samples/sec Loss 1.2548 LearningRate 0.000049 Epoch: 31 Global Step: 663490 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:20:21,352-Speed 2494.81 samples/sec Loss 1.2472 LearningRate 0.000049 Epoch: 31 Global Step: 663500 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:20:29,553-Speed 2497.62 samples/sec Loss 1.2683 LearningRate 0.000049 Epoch: 31 Global Step: 663510 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:20:37,764-Speed 2494.60 samples/sec Loss 1.2422 LearningRate 0.000049 Epoch: 31 Global Step: 663520 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:20:45,965-Speed 2497.65 samples/sec Loss 1.2561 LearningRate 0.000049 Epoch: 31 Global Step: 663530 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:20:54,179-Speed 2493.65 samples/sec Loss 1.2614 LearningRate 0.000049 Epoch: 31 Global Step: 663540 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:21:02,325-Speed 2514.42 samples/sec Loss 1.2616 LearningRate 0.000049 Epoch: 31 Global Step: 663550 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:21:10,524-Speed 2498.43 samples/sec Loss 1.2469 LearningRate 0.000049 Epoch: 31 Global Step: 663560 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:21:18,721-Speed 2498.69 samples/sec Loss 1.2834 LearningRate 0.000049 Epoch: 31 Global Step: 663570 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:21:26,932-Speed 2494.56 samples/sec Loss 1.2457 LearningRate 0.000049 Epoch: 31 Global Step: 663580 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:21:35,134-Speed 2499.12 samples/sec Loss 1.2507 LearningRate 0.000049 Epoch: 31 Global Step: 663590 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:21:43,337-Speed 2496.84 samples/sec Loss 1.2250 LearningRate 0.000049 Epoch: 31 Global Step: 663600 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:21:51,486-Speed 2513.57 samples/sec Loss 1.2594 LearningRate 0.000049 Epoch: 31 Global Step: 663610 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:21:59,689-Speed 2497.10 samples/sec Loss 1.2683 LearningRate 0.000049 Epoch: 31 Global Step: 663620 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:22:07,890-Speed 2497.66 samples/sec Loss 1.2481 LearningRate 0.000049 Epoch: 31 Global Step: 663630 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:22:16,089-Speed 2498.54 samples/sec Loss 1.2699 LearningRate 0.000049 Epoch: 31 Global Step: 663640 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:22:24,292-Speed 2496.91 samples/sec Loss 1.2683 LearningRate 0.000049 Epoch: 31 Global Step: 663650 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:22:32,492-Speed 2498.10 samples/sec Loss 1.2617 LearningRate 0.000049 Epoch: 31 Global Step: 663660 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:22:42,868-Speed 1974.15 samples/sec Loss 1.2376 LearningRate 0.000049 Epoch: 32 Global Step: 663670 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:22:51,071-Speed 2496.67 samples/sec Loss 1.3001 LearningRate 0.000049 Epoch: 32 Global Step: 663680 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:22:59,278-Speed 2495.84 samples/sec Loss 1.2705 LearningRate 0.000049 Epoch: 32 Global Step: 663690 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:23:07,482-Speed 2496.85 samples/sec Loss 1.2654 LearningRate 0.000049 Epoch: 32 Global Step: 663700 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:23:15,683-Speed 2497.48 samples/sec Loss 1.2475 LearningRate 0.000049 Epoch: 32 Global Step: 663710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:23:23,893-Speed 2495.09 samples/sec Loss 1.2116 LearningRate 0.000049 Epoch: 32 Global Step: 663720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:23:32,046-Speed 2512.32 samples/sec Loss 1.2567 LearningRate 0.000049 Epoch: 32 Global Step: 663730 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:23:40,261-Speed 2493.55 samples/sec Loss 1.2822 LearningRate 0.000049 Epoch: 32 Global Step: 663740 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:23:48,465-Speed 2496.64 samples/sec Loss 1.2623 LearningRate 0.000049 Epoch: 32 Global Step: 663750 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:23:56,674-Speed 2495.38 samples/sec Loss 1.2076 LearningRate 0.000049 Epoch: 32 Global Step: 663760 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:24:04,880-Speed 2496.07 samples/sec Loss 1.2607 LearningRate 0.000049 Epoch: 32 Global Step: 663770 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:24:13,085-Speed 2496.84 samples/sec Loss 1.2931 LearningRate 0.000049 Epoch: 32 Global Step: 663780 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:24:21,233-Speed 2513.53 samples/sec Loss 1.2759 LearningRate 0.000049 Epoch: 32 Global Step: 663790 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:24:29,432-Speed 2498.17 samples/sec Loss 1.2249 LearningRate 0.000049 Epoch: 32 Global Step: 663800 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:24:37,631-Speed 2498.41 samples/sec Loss 1.2371 LearningRate 0.000049 Epoch: 32 Global Step: 663810 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:24:45,830-Speed 2498.90 samples/sec Loss 1.2522 LearningRate 0.000049 Epoch: 32 Global Step: 663820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:24:54,046-Speed 2493.17 samples/sec Loss 1.2296 LearningRate 0.000049 Epoch: 32 Global Step: 663830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:25:02,253-Speed 2495.92 samples/sec Loss 1.2384 LearningRate 0.000049 Epoch: 32 Global Step: 663840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:25:10,403-Speed 2513.26 samples/sec Loss 1.2384 LearningRate 0.000049 Epoch: 32 Global Step: 663850 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:25:18,605-Speed 2497.20 samples/sec Loss 1.2576 LearningRate 0.000049 Epoch: 32 Global Step: 663860 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:25:26,804-Speed 2498.24 samples/sec Loss 1.2395 LearningRate 0.000049 Epoch: 32 Global Step: 663870 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:25:35,003-Speed 2498.45 samples/sec Loss 1.2526 LearningRate 0.000049 Epoch: 32 Global Step: 663880 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:25:43,206-Speed 2497.09 samples/sec Loss 1.2076 LearningRate 0.000049 Epoch: 32 Global Step: 663890 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:25:51,406-Speed 2497.89 samples/sec Loss 1.2612 LearningRate 0.000049 Epoch: 32 Global Step: 663900 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:25:59,557-Speed 2512.98 samples/sec Loss 1.2202 LearningRate 0.000049 Epoch: 32 Global Step: 663910 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:26:07,761-Speed 2496.77 samples/sec Loss 1.2380 LearningRate 0.000049 Epoch: 32 Global Step: 663920 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:26:15,958-Speed 2498.83 samples/sec Loss 1.2748 LearningRate 0.000049 Epoch: 32 Global Step: 663930 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:26:24,158-Speed 2498.17 samples/sec Loss 1.1781 LearningRate 0.000049 Epoch: 32 Global Step: 663940 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:26:32,369-Speed 2494.41 samples/sec Loss 1.2422 LearningRate 0.000049 Epoch: 32 Global Step: 663950 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:26:40,568-Speed 2498.21 samples/sec Loss 1.2445 LearningRate 0.000049 Epoch: 32 Global Step: 663960 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:26:48,717-Speed 2513.74 samples/sec Loss 1.2530 LearningRate 0.000049 Epoch: 32 Global Step: 663970 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:26:56,922-Speed 2496.22 samples/sec Loss 1.2407 LearningRate 0.000049 Epoch: 32 Global Step: 663980 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:27:05,127-Speed 2496.74 samples/sec Loss 1.2592 LearningRate 0.000049 Epoch: 32 Global Step: 663990 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:27:13,324-Speed 2498.93 samples/sec Loss 1.2457 LearningRate 0.000049 Epoch: 32 Global Step: 664000 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:27:21,527-Speed 2497.17 samples/sec Loss 1.2144 LearningRate 0.000049 Epoch: 32 Global Step: 664010 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:27:29,728-Speed 2497.31 samples/sec Loss 1.2296 LearningRate 0.000049 Epoch: 32 Global Step: 664020 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:27:37,876-Speed 2513.86 samples/sec Loss 1.2334 LearningRate 0.000049 Epoch: 32 Global Step: 664030 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:27:46,080-Speed 2496.86 samples/sec Loss 1.2337 LearningRate 0.000049 Epoch: 32 Global Step: 664040 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:27:54,283-Speed 2497.04 samples/sec Loss 1.2405 LearningRate 0.000049 Epoch: 32 Global Step: 664050 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 
22:28:02,488-Speed 2496.26 samples/sec Loss 1.2603 LearningRate 0.000049 Epoch: 32 Global Step: 664060 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:28:10,694-Speed 2496.35 samples/sec Loss 1.2150 LearningRate 0.000049 Epoch: 32 Global Step: 664070 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:28:18,896-Speed 2497.32 samples/sec Loss 1.2626 LearningRate 0.000049 Epoch: 32 Global Step: 664080 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:28:27,041-Speed 2514.67 samples/sec Loss 1.2130 LearningRate 0.000049 Epoch: 32 Global Step: 664090 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:28:35,244-Speed 2497.43 samples/sec Loss 1.2561 LearningRate 0.000049 Epoch: 32 Global Step: 664100 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:28:43,442-Speed 2498.77 samples/sec Loss 1.2843 LearningRate 0.000049 Epoch: 32 Global Step: 664110 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:28:51,642-Speed 2498.02 samples/sec Loss 1.2165 LearningRate 0.000049 Epoch: 32 Global Step: 664120 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:28:59,845-Speed 2497.12 samples/sec Loss 1.2989 LearningRate 0.000049 Epoch: 32 Global Step: 664130 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:29:08,042-Speed 2498.63 samples/sec Loss 1.2438 LearningRate 0.000049 Epoch: 32 Global Step: 664140 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:29:16,189-Speed 2514.25 samples/sec Loss 1.2476 LearningRate 0.000049 Epoch: 32 Global Step: 664150 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:29:24,403-Speed 2493.97 samples/sec Loss 1.2572 LearningRate 0.000049 Epoch: 32 Global Step: 664160 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:29:32,604-Speed 2497.42 samples/sec Loss 1.2190 LearningRate 0.000049 Epoch: 32 Global Step: 664170 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 
22:29:40,810-Speed 2496.13 samples/sec Loss 1.2457 LearningRate 0.000049 Epoch: 32 Global Step: 664180 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:29:49,010-Speed 2497.95 samples/sec Loss 1.2380 LearningRate 0.000049 Epoch: 32 Global Step: 664190 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:29:57,214-Speed 2496.86 samples/sec Loss 1.2324 LearningRate 0.000049 Epoch: 32 Global Step: 664200 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:30:05,361-Speed 2514.01 samples/sec Loss 1.2412 LearningRate 0.000049 Epoch: 32 Global Step: 664210 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:30:13,575-Speed 2494.20 samples/sec Loss 1.2483 LearningRate 0.000049 Epoch: 32 Global Step: 664220 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:30:21,783-Speed 2495.56 samples/sec Loss 1.2561 LearningRate 0.000049 Epoch: 32 Global Step: 664230 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:30:29,987-Speed 2496.52 samples/sec Loss 1.2270 LearningRate 0.000049 Epoch: 32 Global Step: 664240 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:30:38,190-Speed 2497.28 samples/sec Loss 1.2497 LearningRate 0.000049 Epoch: 32 Global Step: 664250 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:30:46,395-Speed 2496.56 samples/sec Loss 1.2769 LearningRate 0.000049 Epoch: 32 Global Step: 664260 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:30:54,543-Speed 2514.00 samples/sec Loss 1.2392 LearningRate 0.000049 Epoch: 32 Global Step: 664270 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:31:02,743-Speed 2497.90 samples/sec Loss 1.2180 LearningRate 0.000049 Epoch: 32 Global Step: 664280 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:31:10,944-Speed 2497.56 samples/sec Loss 1.2908 LearningRate 0.000049 Epoch: 32 Global Step: 664290 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 
22:31:19,161-Speed 2492.68 samples/sec Loss 1.2536 LearningRate 0.000049 Epoch: 32 Global Step: 664300 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:31:27,367-Speed 2496.15 samples/sec Loss 1.2491 LearningRate 0.000049 Epoch: 32 Global Step: 664310 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:31:35,568-Speed 2497.83 samples/sec Loss 1.2443 LearningRate 0.000049 Epoch: 32 Global Step: 664320 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:31:43,711-Speed 2515.36 samples/sec Loss 1.2798 LearningRate 0.000049 Epoch: 32 Global Step: 664330 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:31:51,918-Speed 2495.89 samples/sec Loss 1.2709 LearningRate 0.000049 Epoch: 32 Global Step: 664340 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:32:00,118-Speed 2498.22 samples/sec Loss 1.2424 LearningRate 0.000049 Epoch: 32 Global Step: 664350 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:32:08,319-Speed 2497.78 samples/sec Loss 1.2578 LearningRate 0.000049 Epoch: 32 Global Step: 664360 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:32:16,519-Speed 2498.03 samples/sec Loss 1.2627 LearningRate 0.000049 Epoch: 32 Global Step: 664370 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:32:24,716-Speed 2498.46 samples/sec Loss 1.2605 LearningRate 0.000049 Epoch: 32 Global Step: 664380 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:32:32,866-Speed 2513.49 samples/sec Loss 1.2423 LearningRate 0.000049 Epoch: 32 Global Step: 664390 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:32:41,067-Speed 2497.71 samples/sec Loss 1.2335 LearningRate 0.000049 Epoch: 32 Global Step: 664400 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:32:49,268-Speed 2497.40 samples/sec Loss 1.2666 LearningRate 0.000049 Epoch: 32 Global Step: 664410 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 
22:32:57,473-Speed 2496.78 samples/sec Loss 1.2439 LearningRate 0.000049 Epoch: 32 Global Step: 664420 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:33:05,674-Speed 2497.36 samples/sec Loss 1.2756 LearningRate 0.000049 Epoch: 32 Global Step: 664430 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:33:13,888-Speed 2493.76 samples/sec Loss 1.2699 LearningRate 0.000049 Epoch: 32 Global Step: 664440 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:33:22,049-Speed 2509.95 samples/sec Loss 1.2675 LearningRate 0.000049 Epoch: 32 Global Step: 664450 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:33:30,254-Speed 2496.36 samples/sec Loss 1.2354 LearningRate 0.000049 Epoch: 32 Global Step: 664460 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:33:38,460-Speed 2496.22 samples/sec Loss 1.2395 LearningRate 0.000049 Epoch: 32 Global Step: 664470 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:33:46,673-Speed 2493.80 samples/sec Loss 1.2498 LearningRate 0.000049 Epoch: 32 Global Step: 664480 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:33:54,871-Speed 2498.47 samples/sec Loss 1.2722 LearningRate 0.000049 Epoch: 32 Global Step: 664490 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:34:03,085-Speed 2493.87 samples/sec Loss 1.2576 LearningRate 0.000049 Epoch: 32 Global Step: 664500 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:34:11,234-Speed 2513.62 samples/sec Loss 1.2661 LearningRate 0.000049 Epoch: 32 Global Step: 664510 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:34:19,436-Speed 2497.17 samples/sec Loss 1.2635 LearningRate 0.000049 Epoch: 32 Global Step: 664520 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:34:27,635-Speed 2498.38 samples/sec Loss 1.2510 LearningRate 0.000049 Epoch: 32 Global Step: 664530 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 
22:34:35,842-Speed 2495.70 samples/sec Loss 1.2586 LearningRate 0.000049 Epoch: 32 Global Step: 664540 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:34:44,044-Speed 2497.54 samples/sec Loss 1.2798 LearningRate 0.000049 Epoch: 32 Global Step: 664550 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:34:52,245-Speed 2497.52 samples/sec Loss 1.2685 LearningRate 0.000049 Epoch: 32 Global Step: 664560 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:35:00,398-Speed 2512.34 samples/sec Loss 1.2205 LearningRate 0.000049 Epoch: 32 Global Step: 664570 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:35:08,606-Speed 2495.67 samples/sec Loss 1.2455 LearningRate 0.000049 Epoch: 32 Global Step: 664580 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:35:16,808-Speed 2497.05 samples/sec Loss 1.2142 LearningRate 0.000049 Epoch: 32 Global Step: 664590 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:35:25,012-Speed 2496.85 samples/sec Loss 1.2699 LearningRate 0.000049 Epoch: 32 Global Step: 664600 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:35:33,214-Speed 2497.30 samples/sec Loss 1.2599 LearningRate 0.000049 Epoch: 32 Global Step: 664610 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:35:41,429-Speed 2493.31 samples/sec Loss 1.2289 LearningRate 0.000049 Epoch: 32 Global Step: 664620 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:35:49,581-Speed 2512.69 samples/sec Loss 1.2613 LearningRate 0.000049 Epoch: 32 Global Step: 664630 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:35:57,783-Speed 2497.41 samples/sec Loss 1.2384 LearningRate 0.000049 Epoch: 32 Global Step: 664640 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:36:05,988-Speed 2496.51 samples/sec Loss 1.2860 LearningRate 0.000049 Epoch: 32 Global Step: 664650 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 
22:36:14,188-Speed 2498.11 samples/sec Loss 1.2608 LearningRate 0.000049 Epoch: 32 Global Step: 664660 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:36:22,390-Speed 2497.26 samples/sec Loss 1.2364 LearningRate 0.000049 Epoch: 32 Global Step: 664670 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:36:30,591-Speed 2497.55 samples/sec Loss 1.2134 LearningRate 0.000049 Epoch: 32 Global Step: 664680 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:36:38,737-Speed 2514.62 samples/sec Loss 1.2491 LearningRate 0.000049 Epoch: 32 Global Step: 664690 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-07-11 22:36:46,892-Speed 2511.67 samples/sec Loss 1.2530 LearningRate 0.000049 Epoch: 32 Global Step: 664700 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:36:55,097-Speed 2496.51 samples/sec Loss 1.2685 LearningRate 0.000049 Epoch: 32 Global Step: 664710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:37:03,298-Speed 2497.89 samples/sec Loss 1.2358 LearningRate 0.000049 Epoch: 32 Global Step: 664720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:37:11,495-Speed 2498.57 samples/sec Loss 1.2627 LearningRate 0.000049 Epoch: 32 Global Step: 664730 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:37:19,696-Speed 2497.84 samples/sec Loss 1.2608 LearningRate 0.000049 Epoch: 32 Global Step: 664740 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:37:27,845-Speed 2513.47 samples/sec Loss 1.2781 LearningRate 0.000049 Epoch: 32 Global Step: 664750 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:37:36,046-Speed 2497.70 samples/sec Loss 1.2208 LearningRate 0.000049 Epoch: 32 Global Step: 664760 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:37:44,246-Speed 2497.76 samples/sec Loss 1.2447 LearningRate 0.000049 Epoch: 32 Global Step: 664770 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:37:52,450-Speed 2496.96 samples/sec Loss 1.2530 LearningRate 0.000049 Epoch: 32 Global Step: 664780 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:38:00,652-Speed 2497.29 samples/sec Loss 1.2627 LearningRate 0.000049 Epoch: 32 Global Step: 664790 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:38:08,854-Speed 2497.25 samples/sec Loss 1.2440 LearningRate 0.000049 Epoch: 32 Global Step: 664800 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:38:17,002-Speed 2514.14 samples/sec Loss 1.2266 LearningRate 0.000049 Epoch: 32 Global Step: 664810 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:38:25,200-Speed 2498.44 samples/sec Loss 1.2763 LearningRate 0.000049 Epoch: 32 Global Step: 664820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:38:33,407-Speed 2495.88 samples/sec Loss 1.2304 LearningRate 0.000049 Epoch: 32 Global Step: 664830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:38:41,607-Speed 2497.71 samples/sec Loss 1.2396 LearningRate 0.000049 Epoch: 32 Global Step: 664840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:38:49,821-Speed 2493.90 samples/sec Loss 1.2181 LearningRate 0.000049 Epoch: 32 Global Step: 664850 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:38:58,022-Speed 2497.75 samples/sec Loss 1.2333 LearningRate 0.000049 Epoch: 32 Global Step: 664860 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:39:06,171-Speed 2513.42 samples/sec Loss 1.2474 LearningRate 0.000049 Epoch: 32 Global Step: 664870 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:39:14,373-Speed 2497.40 samples/sec Loss 1.2277 LearningRate 0.000049 Epoch: 32 Global Step: 664880 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:39:22,576-Speed 2497.28 samples/sec Loss 1.2455 LearningRate 0.000049 Epoch: 32 Global Step: 664890 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:39:30,778-Speed 2497.37 samples/sec Loss 1.2376 LearningRate 0.000049 Epoch: 32 Global Step: 664900 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:39:38,978-Speed 2497.92 samples/sec Loss 1.2342 LearningRate 0.000049 Epoch: 32 Global Step: 664910 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:39:47,187-Speed 2496.12 samples/sec Loss 1.2561 LearningRate 0.000049 Epoch: 32 Global Step: 664920 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:39:55,339-Speed 2512.82 samples/sec Loss 1.1845 LearningRate 0.000049 Epoch: 32 Global Step: 664930 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:40:03,539-Speed 2497.78 samples/sec Loss 1.2431 LearningRate 0.000049 Epoch: 32 Global Step: 664940 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:40:11,738-Speed 2498.58 samples/sec Loss 1.2473 LearningRate 0.000049 Epoch: 32 Global Step: 664950 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:40:19,947-Speed 2495.22 samples/sec Loss 1.2283 LearningRate 0.000049 Epoch: 32 Global Step: 664960 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:40:28,148-Speed 2497.55 samples/sec Loss 1.2723 LearningRate 0.000049 Epoch: 32 Global Step: 664970 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:40:36,351-Speed 2497.41 samples/sec Loss 1.2418 LearningRate 0.000049 Epoch: 32 Global Step: 664980 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:40:44,502-Speed 2512.78 samples/sec Loss 1.2352 LearningRate 0.000049 Epoch: 32 Global Step: 664990 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:40:52,707-Speed 2496.51 samples/sec Loss 1.2347 LearningRate 0.000049 Epoch: 32 Global Step: 665000 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:41:00,922-Speed 2493.34 samples/sec Loss 1.2380 LearningRate 0.000049 Epoch: 32 Global Step: 665010 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:41:09,128-Speed 2496.13 samples/sec Loss 1.2820 LearningRate 0.000049 Epoch: 32 Global Step: 665020 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:41:17,333-Speed 2496.49 samples/sec Loss 1.2359 LearningRate 0.000049 Epoch: 32 Global Step: 665030 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:41:25,531-Speed 2498.78 samples/sec Loss 1.2370 LearningRate 0.000049 Epoch: 32 Global Step: 665040 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:41:33,678-Speed 2514.06 samples/sec Loss 1.2336 LearningRate 0.000049 Epoch: 32 Global Step: 665050 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:41:41,881-Speed 2497.04 samples/sec Loss 1.2206 LearningRate 0.000049 Epoch: 32 Global Step: 665060 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:41:50,081-Speed 2497.92 samples/sec Loss 1.2487 LearningRate 0.000049 Epoch: 32 Global Step: 665070 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:41:58,284-Speed 2497.35 samples/sec Loss 1.2069 LearningRate 0.000049 Epoch: 32 Global Step: 665080 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:42:06,490-Speed 2495.84 samples/sec Loss 1.2411 LearningRate 0.000049 Epoch: 32 Global Step: 665090 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:42:14,698-Speed 2495.70 samples/sec Loss 1.2132 LearningRate 0.000049 Epoch: 32 Global Step: 665100 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:42:22,850-Speed 2512.54 samples/sec Loss 1.2501 LearningRate 0.000049 Epoch: 32 Global Step: 665110 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:42:31,050-Speed 2498.09 samples/sec Loss 1.2988 LearningRate 0.000049 Epoch: 32 Global Step: 665120 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:42:39,258-Speed 2495.24 samples/sec Loss 1.2000 LearningRate 0.000049 Epoch: 32 Global Step: 665130 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:42:47,460-Speed 2497.48 samples/sec Loss 1.2171 LearningRate 0.000048 Epoch: 32 Global Step: 665140 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:42:55,673-Speed 2494.45 samples/sec Loss 1.2314 LearningRate 0.000048 Epoch: 32 Global Step: 665150 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:43:03,886-Speed 2493.84 samples/sec Loss 1.2655 LearningRate 0.000048 Epoch: 32 Global Step: 665160 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:43:12,033-Speed 2514.10 samples/sec Loss 1.2226 LearningRate 0.000048 Epoch: 32 Global Step: 665170 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:43:20,234-Speed 2498.14 samples/sec Loss 1.2346 LearningRate 0.000048 Epoch: 32 Global Step: 665180 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:43:28,449-Speed 2493.29 samples/sec Loss 1.2653 LearningRate 0.000048 Epoch: 32 Global Step: 665190 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:43:36,654-Speed 2496.36 samples/sec Loss 1.2593 LearningRate 0.000048 Epoch: 32 Global Step: 665200 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:43:44,857-Speed 2496.96 samples/sec Loss 1.2137 LearningRate 0.000048 Epoch: 32 Global Step: 665210 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:43:53,059-Speed 2497.35 samples/sec Loss 1.2049 LearningRate 0.000048 Epoch: 32 Global Step: 665220 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:44:01,208-Speed 2513.48 samples/sec Loss 1.2366 LearningRate 0.000048 Epoch: 32 Global Step: 665230 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:44:09,407-Speed 2498.50 samples/sec Loss 1.2450 LearningRate 0.000048 Epoch: 32 Global Step: 665240 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:44:17,605-Speed 2498.32 samples/sec Loss 1.2557 LearningRate 0.000048 Epoch: 32 Global Step: 665250 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:44:25,804-Speed 2498.36 samples/sec Loss 1.2278 LearningRate 0.000048 Epoch: 32 Global Step: 665260 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:44:34,003-Speed 2498.37 samples/sec Loss 1.2447 LearningRate 0.000048 Epoch: 32 Global Step: 665270 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:44:42,204-Speed 2497.48 samples/sec Loss 1.2648 LearningRate 0.000048 Epoch: 32 Global Step: 665280 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:44:50,354-Speed 2513.27 samples/sec Loss 1.2317 LearningRate 0.000048 Epoch: 32 Global Step: 665290 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:44:58,562-Speed 2495.47 samples/sec Loss 1.2293 LearningRate 0.000048 Epoch: 32 Global Step: 665300 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:45:06,781-Speed 2492.06 samples/sec Loss 1.2575 LearningRate 0.000048 Epoch: 32 Global Step: 665310 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:45:14,985-Speed 2496.75 samples/sec Loss 1.2201 LearningRate 0.000048 Epoch: 32 Global Step: 665320 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:45:23,188-Speed 2497.24 samples/sec Loss 1.2591 LearningRate 0.000048 Epoch: 32 Global Step: 665330 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:45:31,403-Speed 2493.23 samples/sec Loss 1.2154 LearningRate 0.000048 Epoch: 32 Global Step: 665340 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:45:39,552-Speed 2513.62 samples/sec Loss 1.2724 LearningRate 0.000048 Epoch: 32 Global Step: 665350 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:45:47,754-Speed 2497.19 samples/sec Loss 1.2470 LearningRate 0.000048 Epoch: 32 Global Step: 665360 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:45:55,957-Speed 2496.97 samples/sec Loss 1.2335 LearningRate 0.000048 Epoch: 32 Global Step: 665370 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:46:04,163-Speed 2496.30 samples/sec Loss 1.2262 LearningRate 0.000048 Epoch: 32 Global Step: 665380 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:46:12,371-Speed 2495.55 samples/sec Loss 1.2277 LearningRate 0.000048 Epoch: 32 Global Step: 665390 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:46:20,571-Speed 2498.01 samples/sec Loss 1.2302 LearningRate 0.000048 Epoch: 32 Global Step: 665400 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:46:28,718-Speed 2514.28 samples/sec Loss 1.2494 LearningRate 0.000048 Epoch: 32 Global Step: 665410 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:46:36,922-Speed 2496.81 samples/sec Loss 1.2906 LearningRate 0.000048 Epoch: 32 Global Step: 665420 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:46:45,121-Speed 2498.05 samples/sec Loss 1.2637 LearningRate 0.000048 Epoch: 32 Global Step: 665430 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:46:53,324-Speed 2497.11 samples/sec Loss 1.2223 LearningRate 0.000048 Epoch: 32 Global Step: 665440 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:47:01,522-Speed 2498.29 samples/sec Loss 1.2214 LearningRate 0.000048 Epoch: 32 Global Step: 665450 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:47:09,726-Speed 2496.74 samples/sec Loss 1.2461 LearningRate 0.000048 Epoch: 32 Global Step: 665460 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:47:17,879-Speed 2512.70 samples/sec Loss 1.2421 LearningRate 0.000048 Epoch: 32 Global Step: 665470 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:47:26,081-Speed 2497.18 samples/sec Loss 1.2156 LearningRate 0.000048 Epoch: 32 Global Step: 665480 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:47:34,296-Speed 2493.44 samples/sec Loss 1.2348 LearningRate 0.000048 Epoch: 32 Global Step: 665490 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:47:42,501-Speed 2496.37 samples/sec Loss 1.2416 LearningRate 0.000048 Epoch: 32 Global Step: 665500 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:47:50,706-Speed 2496.58 samples/sec Loss 1.2504 LearningRate 0.000048 Epoch: 32 Global Step: 665510 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:47:58,918-Speed 2494.37 samples/sec Loss 1.2276 LearningRate 0.000048 Epoch: 32 Global Step: 665520 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:48:07,065-Speed 2514.18 samples/sec Loss 1.2521 LearningRate 0.000048 Epoch: 32 Global Step: 665530 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:48:15,270-Speed 2496.48 samples/sec Loss 1.2300 LearningRate 0.000048 Epoch: 32 Global Step: 665540 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:48:23,470-Speed 2498.00 samples/sec Loss 1.2373 LearningRate 0.000048 Epoch: 32 Global Step: 665550 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:48:31,667-Speed 2498.59 samples/sec Loss 1.2513 LearningRate 0.000048 Epoch: 32 Global Step: 665560 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:48:39,864-Speed 2498.98 samples/sec Loss 1.2633 LearningRate 0.000048 Epoch: 32 Global Step: 665570 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:48:48,065-Speed 2497.63 samples/sec Loss 1.2456 LearningRate 0.000048 Epoch: 32 Global Step: 665580 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:48:56,225-Speed 2510.19 samples/sec Loss 1.2347 LearningRate 0.000048 Epoch: 32 Global Step: 665590 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:49:04,428-Speed 2497.18 samples/sec Loss 1.2403 LearningRate 0.000048 Epoch: 32 Global Step: 665600 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:49:12,631-Speed 2497.06 samples/sec Loss 1.2407 LearningRate 0.000048 Epoch: 32 Global Step: 665610 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:49:20,831-Speed 2497.89 samples/sec Loss 1.2319 LearningRate 0.000048 Epoch: 32 Global Step: 665620 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:49:29,036-Speed 2496.44 samples/sec Loss 1.2356 LearningRate 0.000048 Epoch: 32 Global Step: 665630 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:49:37,241-Speed 2496.62 samples/sec Loss 1.2427 LearningRate 0.000048 Epoch: 32 Global Step: 665640 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:49:45,385-Speed 2515.13 samples/sec Loss 1.2627 LearningRate 0.000048 Epoch: 32 Global Step: 665650 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:49:53,588-Speed 2496.94 samples/sec Loss 1.2204 LearningRate 0.000048 Epoch: 32 Global Step: 665660 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:50:01,787-Speed 2498.33 samples/sec Loss 1.2482 LearningRate 0.000048 Epoch: 32 Global Step: 665670 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:50:09,989-Speed 2497.62 samples/sec Loss 1.2402 LearningRate 0.000048 Epoch: 32 Global Step: 665680 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:50:18,190-Speed 2497.66 samples/sec Loss 1.2513 LearningRate 0.000048 Epoch: 32 Global Step: 665690 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:50:26,388-Speed 2498.21 samples/sec Loss 1.2300 LearningRate 0.000048 Epoch: 32 Global Step: 665700 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:50:34,539-Speed 2513.04 samples/sec Loss 1.2128 LearningRate 0.000048 Epoch: 32 Global Step: 665710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:50:42,749-Speed 2495.03 samples/sec Loss 1.2083 LearningRate 0.000048 Epoch: 32 Global Step: 665720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:50:50,953-Speed 2496.89 samples/sec Loss 1.2515 LearningRate 0.000048 Epoch: 32 Global Step: 665730 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 
22:50:59,155-Speed 2497.35 samples/sec Loss 1.2215 LearningRate 0.000048 Epoch: 32 Global Step: 665740 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:51:07,359-Speed 2496.64 samples/sec Loss 1.2463 LearningRate 0.000048 Epoch: 32 Global Step: 665750 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-07-11 22:51:15,558-Speed 2498.18 samples/sec Loss 1.2267 LearningRate 0.000048 Epoch: 32 Global Step: 665760 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 22:51:23,708-Speed 2513.36 samples/sec Loss 1.2437 LearningRate 0.000048 Epoch: 32 Global Step: 665770 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 22:51:31,907-Speed 2498.42 samples/sec Loss 1.2544 LearningRate 0.000048 Epoch: 32 Global Step: 665780 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 22:51:40,106-Speed 2498.02 samples/sec Loss 1.2625 LearningRate 0.000048 Epoch: 32 Global Step: 665790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 22:51:48,308-Speed 2497.84 samples/sec Loss 1.3202 LearningRate 0.000048 Epoch: 32 Global Step: 665800 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 22:51:56,507-Speed 2498.27 samples/sec Loss 1.2520 LearningRate 0.000048 Epoch: 32 Global Step: 665810 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 22:52:04,708-Speed 2497.75 samples/sec Loss 1.2797 LearningRate 0.000048 Epoch: 32 Global Step: 665820 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 22:52:12,854-Speed 2514.34 samples/sec Loss 1.2627 LearningRate 0.000048 Epoch: 32 Global Step: 665830 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 22:52:21,056-Speed 2497.55 samples/sec Loss 1.2277 LearningRate 0.000048 Epoch: 32 Global Step: 665840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 22:52:29,259-Speed 2497.37 samples/sec Loss 1.2372 LearningRate 0.000048 Epoch: 32 Global Step: 665850 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
22:52:37,457-Speed 2498.19 samples/sec Loss 1.2367 LearningRate 0.000048 Epoch: 32 Global Step: 665860 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 22:52:45,656-Speed 2498.29 samples/sec Loss 1.2362 LearningRate 0.000048 Epoch: 32 Global Step: 665870 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 22:52:53,861-Speed 2496.72 samples/sec Loss 1.2426 LearningRate 0.000048 Epoch: 32 Global Step: 665880 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 22:53:02,006-Speed 2514.69 samples/sec Loss 1.2167 LearningRate 0.000048 Epoch: 32 Global Step: 665890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 22:53:10,209-Speed 2497.01 samples/sec Loss 1.2253 LearningRate 0.000048 Epoch: 32 Global Step: 665900 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:53:18,408-Speed 2498.60 samples/sec Loss 1.2222 LearningRate 0.000048 Epoch: 32 Global Step: 665910 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:53:26,610-Speed 2497.64 samples/sec Loss 1.2345 LearningRate 0.000048 Epoch: 32 Global Step: 665920 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:53:34,812-Speed 2497.39 samples/sec Loss 1.2522 LearningRate 0.000048 Epoch: 32 Global Step: 665930 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:53:43,011-Speed 2498.11 samples/sec Loss 1.2485 LearningRate 0.000048 Epoch: 32 Global Step: 665940 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:53:51,166-Speed 2511.95 samples/sec Loss 1.2543 LearningRate 0.000048 Epoch: 32 Global Step: 665950 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:53:59,369-Speed 2496.95 samples/sec Loss 1.2111 LearningRate 0.000048 Epoch: 32 Global Step: 665960 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:54:07,572-Speed 2497.13 samples/sec Loss 1.2370 LearningRate 0.000048 Epoch: 32 Global Step: 665970 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 
22:54:15,776-Speed 2496.72 samples/sec Loss 1.2306 LearningRate 0.000048 Epoch: 32 Global Step: 665980 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:54:23,982-Speed 2495.99 samples/sec Loss 1.2356 LearningRate 0.000048 Epoch: 32 Global Step: 665990 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:54:32,183-Speed 2497.63 samples/sec Loss 1.2519 LearningRate 0.000048 Epoch: 32 Global Step: 666000 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:54:40,332-Speed 2513.58 samples/sec Loss 1.2499 LearningRate 0.000048 Epoch: 32 Global Step: 666010 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:54:48,538-Speed 2496.16 samples/sec Loss 1.2467 LearningRate 0.000048 Epoch: 32 Global Step: 666020 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:54:56,739-Speed 2497.57 samples/sec Loss 1.2196 LearningRate 0.000048 Epoch: 32 Global Step: 666030 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:55:04,941-Speed 2497.45 samples/sec Loss 1.2478 LearningRate 0.000048 Epoch: 32 Global Step: 666040 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:55:13,140-Speed 2498.14 samples/sec Loss 1.2501 LearningRate 0.000048 Epoch: 32 Global Step: 666050 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:55:21,342-Speed 2497.39 samples/sec Loss 1.2322 LearningRate 0.000048 Epoch: 32 Global Step: 666060 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:55:29,486-Speed 2515.07 samples/sec Loss 1.2323 LearningRate 0.000048 Epoch: 32 Global Step: 666070 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:55:37,703-Speed 2492.91 samples/sec Loss 1.2462 LearningRate 0.000048 Epoch: 32 Global Step: 666080 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:55:45,907-Speed 2496.61 samples/sec Loss 1.2450 LearningRate 0.000048 Epoch: 32 Global Step: 666090 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 
22:55:54,112-Speed 2496.80 samples/sec Loss 1.2619 LearningRate 0.000048 Epoch: 32 Global Step: 666100 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:56:02,317-Speed 2496.44 samples/sec Loss 1.2461 LearningRate 0.000048 Epoch: 32 Global Step: 666110 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:56:10,516-Speed 2498.28 samples/sec Loss 1.2406 LearningRate 0.000048 Epoch: 32 Global Step: 666120 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:56:18,661-Speed 2514.44 samples/sec Loss 1.2403 LearningRate 0.000048 Epoch: 32 Global Step: 666130 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:56:26,865-Speed 2496.94 samples/sec Loss 1.2363 LearningRate 0.000048 Epoch: 32 Global Step: 666140 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:56:35,066-Speed 2497.85 samples/sec Loss 1.2680 LearningRate 0.000048 Epoch: 32 Global Step: 666150 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:56:43,269-Speed 2497.02 samples/sec Loss 1.2833 LearningRate 0.000048 Epoch: 32 Global Step: 666160 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:56:51,472-Speed 2497.10 samples/sec Loss 1.2404 LearningRate 0.000048 Epoch: 32 Global Step: 666170 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:56:59,672-Speed 2497.92 samples/sec Loss 1.2370 LearningRate 0.000048 Epoch: 32 Global Step: 666180 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:57:07,819-Speed 2514.27 samples/sec Loss 1.2221 LearningRate 0.000048 Epoch: 32 Global Step: 666190 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:57:16,020-Speed 2497.64 samples/sec Loss 1.2386 LearningRate 0.000048 Epoch: 32 Global Step: 666200 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:57:24,222-Speed 2497.41 samples/sec Loss 1.2646 LearningRate 0.000048 Epoch: 32 Global Step: 666210 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 
22:57:32,421-Speed 2497.95 samples/sec Loss 1.2755 LearningRate 0.000048 Epoch: 32 Global Step: 666220 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:57:40,622-Speed 2497.72 samples/sec Loss 1.2195 LearningRate 0.000048 Epoch: 32 Global Step: 666230 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:57:48,831-Speed 2495.54 samples/sec Loss 1.2478 LearningRate 0.000048 Epoch: 32 Global Step: 666240 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:57:56,977-Speed 2514.42 samples/sec Loss 1.2352 LearningRate 0.000048 Epoch: 32 Global Step: 666250 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:58:05,180-Speed 2497.14 samples/sec Loss 1.2604 LearningRate 0.000048 Epoch: 32 Global Step: 666260 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:58:13,382-Speed 2497.46 samples/sec Loss 1.2369 LearningRate 0.000048 Epoch: 32 Global Step: 666270 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:58:21,585-Speed 2496.79 samples/sec Loss 1.2873 LearningRate 0.000048 Epoch: 32 Global Step: 666280 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:58:30,917-Speed 2411.44 samples/sec Loss 1.2348 LearningRate 0.000048 Epoch: 32 Global Step: 666290 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:58:39,169-Speed 2500.05 samples/sec Loss 1.2373 LearningRate 0.000048 Epoch: 32 Global Step: 666300 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:58:47,389-Speed 2516.45 samples/sec Loss 1.2697 LearningRate 0.000048 Epoch: 32 Global Step: 666310 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:58:55,602-Speed 2493.99 samples/sec Loss 1.2338 LearningRate 0.000048 Epoch: 32 Global Step: 666320 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:59:03,845-Speed 2500.51 samples/sec Loss 1.2019 LearningRate 0.000048 Epoch: 32 Global Step: 666330 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 
22:59:12,125-Speed 2491.80 samples/sec Loss 1.2497 LearningRate 0.000048 Epoch: 32 Global Step: 666340 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:59:20,450-Speed 2496.74 samples/sec Loss 1.2297 LearningRate 0.000048 Epoch: 32 Global Step: 666350 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:59:28,656-Speed 2495.95 samples/sec Loss 1.2494 LearningRate 0.000048 Epoch: 32 Global Step: 666360 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:59:36,803-Speed 2514.15 samples/sec Loss 1.2257 LearningRate 0.000048 Epoch: 32 Global Step: 666370 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:59:49,306-Speed 1648.49 samples/sec Loss 1.2295 LearningRate 0.000048 Epoch: 32 Global Step: 666380 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 22:59:57,536-Speed 2498.16 samples/sec Loss 1.2416 LearningRate 0.000048 Epoch: 32 Global Step: 666390 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:00:05,753-Speed 2500.15 samples/sec Loss 1.2197 LearningRate 0.000048 Epoch: 32 Global Step: 666400 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:00:14,002-Speed 2499.11 samples/sec Loss 1.2728 LearningRate 0.000048 Epoch: 32 Global Step: 666410 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:00:22,238-Speed 2499.99 samples/sec Loss 1.2491 LearningRate 0.000048 Epoch: 32 Global Step: 666420 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:00:30,390-Speed 2512.57 samples/sec Loss 1.2554 LearningRate 0.000048 Epoch: 32 Global Step: 666430 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:00:38,664-Speed 2498.67 samples/sec Loss 1.2512 LearningRate 0.000048 Epoch: 32 Global Step: 666440 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:00:52,236-Speed 2494.83 samples/sec Loss 1.2461 LearningRate 0.000048 Epoch: 32 Global Step: 666450 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 
23:01:00,485-Speed 2500.22 samples/sec Loss 1.2440 LearningRate 0.000048 Epoch: 32 Global Step: 666460 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:01:08,679-Speed 2499.57 samples/sec Loss 1.2363 LearningRate 0.000048 Epoch: 32 Global Step: 666470 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:01:16,911-Speed 2500.53 samples/sec Loss 1.2152 LearningRate 0.000048 Epoch: 32 Global Step: 666480 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:01:25,116-Speed 2517.50 samples/sec Loss 1.2397 LearningRate 0.000048 Epoch: 32 Global Step: 666490 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:01:37,316-Speed 1678.84 samples/sec Loss 1.2405 LearningRate 0.000048 Epoch: 32 Global Step: 666500 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:01:47,766-Speed 2501.17 samples/sec Loss 1.2617 LearningRate 0.000048 Epoch: 32 Global Step: 666510 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:01:59,279-Speed 1782.61 samples/sec Loss 1.2405 LearningRate 0.000048 Epoch: 32 Global Step: 666520 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:02:07,615-Speed 2457.30 samples/sec Loss 1.2284 LearningRate 0.000048 Epoch: 32 Global Step: 666530 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:02:17,880-Speed 2092.61 samples/sec Loss 1.2451 LearningRate 0.000048 Epoch: 32 Global Step: 666540 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:02:26,038-Speed 2510.83 samples/sec Loss 1.1963 LearningRate 0.000048 Epoch: 32 Global Step: 666550 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:02:34,395-Speed 2499.24 samples/sec Loss 1.2179 LearningRate 0.000048 Epoch: 32 Global Step: 666560 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:02:42,597-Speed 2497.40 samples/sec Loss 1.2769 LearningRate 0.000048 Epoch: 32 Global Step: 666570 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 
23:02:50,805-Speed 2495.34 samples/sec Loss 1.2407 LearningRate 0.000048 Epoch: 32 Global Step: 666580 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:02:59,015-Speed 2495.73 samples/sec Loss 1.2356 LearningRate 0.000048 Epoch: 32 Global Step: 666590 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:03:07,224-Speed 2495.30 samples/sec Loss 1.2482 LearningRate 0.000048 Epoch: 32 Global Step: 666600 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:03:15,383-Speed 2510.41 samples/sec Loss 1.2555 LearningRate 0.000048 Epoch: 32 Global Step: 666610 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:03:23,603-Speed 2491.91 samples/sec Loss 1.2398 LearningRate 0.000048 Epoch: 32 Global Step: 666620 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:03:31,814-Speed 2494.52 samples/sec Loss 1.2330 LearningRate 0.000048 Epoch: 32 Global Step: 666630 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:03:40,023-Speed 2495.25 samples/sec Loss 1.2382 LearningRate 0.000048 Epoch: 32 Global Step: 666640 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:03:48,242-Speed 2492.20 samples/sec Loss 1.2448 LearningRate 0.000048 Epoch: 32 Global Step: 666650 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:03:56,450-Speed 2495.60 samples/sec Loss 1.2443 LearningRate 0.000048 Epoch: 32 Global Step: 666660 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:04:04,607-Speed 2511.28 samples/sec Loss 1.2332 LearningRate 0.000048 Epoch: 32 Global Step: 666670 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:04:12,811-Speed 2496.58 samples/sec Loss 1.2613 LearningRate 0.000048 Epoch: 32 Global Step: 666680 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:04:21,016-Speed 2496.48 samples/sec Loss 1.2365 LearningRate 0.000048 Epoch: 32 Global Step: 666690 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 
23:04:29,222-Speed 2496.07 samples/sec Loss 1.2269 LearningRate 0.000048 Epoch: 32 Global Step: 666700 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:04:37,428-Speed 2496.29 samples/sec Loss 1.2242 LearningRate 0.000048 Epoch: 32 Global Step: 666710 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:04:45,634-Speed 2495.82 samples/sec Loss 1.2504 LearningRate 0.000048 Epoch: 32 Global Step: 666720 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:04:53,789-Speed 2511.63 samples/sec Loss 1.2218 LearningRate 0.000048 Epoch: 32 Global Step: 666730 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:05:01,998-Speed 2495.31 samples/sec Loss 1.2564 LearningRate 0.000048 Epoch: 32 Global Step: 666740 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:05:10,210-Speed 2494.13 samples/sec Loss 1.2541 LearningRate 0.000048 Epoch: 32 Global Step: 666750 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:05:18,417-Speed 2496.07 samples/sec Loss 1.2605 LearningRate 0.000048 Epoch: 32 Global Step: 666760 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:05:26,622-Speed 2496.39 samples/sec Loss 1.2507 LearningRate 0.000048 Epoch: 32 Global Step: 666770 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:05:34,784-Speed 2509.23 samples/sec Loss 1.2445 LearningRate 0.000048 Epoch: 32 Global Step: 666780 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:05:42,941-Speed 2511.32 samples/sec Loss 1.2849 LearningRate 0.000048 Epoch: 32 Global Step: 666790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:05:51,147-Speed 2496.14 samples/sec Loss 1.2464 LearningRate 0.000048 Epoch: 32 Global Step: 666800 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:05:59,351-Speed 2496.59 samples/sec Loss 1.2398 LearningRate 0.000048 Epoch: 32 Global Step: 666810 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:06:07,557-Speed 2496.08 samples/sec Loss 1.2298 LearningRate 0.000048 Epoch: 32 Global Step: 666820 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:06:15,762-Speed 2496.43 samples/sec Loss 1.2518 LearningRate 0.000048 Epoch: 32 Global Step: 666830 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:06:23,966-Speed 2496.89 samples/sec Loss 1.2350 LearningRate 0.000048 Epoch: 32 Global Step: 666840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:06:32,119-Speed 2512.39 samples/sec Loss 1.2750 LearningRate 0.000047 Epoch: 32 Global Step: 666850 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:06:40,329-Speed 2495.04 samples/sec Loss 1.2465 LearningRate 0.000047 Epoch: 32 Global Step: 666860 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:06:48,546-Speed 2492.82 samples/sec Loss 1.2691 LearningRate 0.000047 Epoch: 32 Global Step: 666870 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:06:56,752-Speed 2496.18 samples/sec Loss 1.2186 LearningRate 0.000047 Epoch: 32 Global Step: 666880 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:07:04,961-Speed 2495.60 samples/sec Loss 1.2630 LearningRate 0.000047 Epoch: 32 Global Step: 666890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:07:13,165-Speed 2496.58 samples/sec Loss 1.2315 LearningRate 0.000047 Epoch: 32 Global Step: 666900 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:07:21,319-Speed 2512.21 samples/sec Loss 1.2433 LearningRate 0.000047 Epoch: 32 Global Step: 666910 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:07:29,539-Speed 2491.78 samples/sec Loss 1.2530 LearningRate 0.000047 Epoch: 32 Global Step: 666920 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:07:37,749-Speed 2494.87 samples/sec Loss 1.2559 LearningRate 0.000047 Epoch: 32 Global Step: 666930 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:07:45,958-Speed 2495.57 samples/sec Loss 1.2403 LearningRate 0.000047 Epoch: 32 Global Step: 666940 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:07:54,165-Speed 2495.70 samples/sec Loss 1.2759 LearningRate 0.000047 Epoch: 32 Global Step: 666950 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:08:02,373-Speed 2495.44 samples/sec Loss 1.2563 LearningRate 0.000047 Epoch: 32 Global Step: 666960 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:08:10,522-Speed 2513.68 samples/sec Loss 1.2574 LearningRate 0.000047 Epoch: 32 Global Step: 666970 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:08:18,728-Speed 2496.08 samples/sec Loss 1.2267 LearningRate 0.000047 Epoch: 32 Global Step: 666980 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:08:26,935-Speed 2496.01 samples/sec Loss 1.2173 LearningRate 0.000047 Epoch: 32 Global Step: 666990 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:08:35,153-Speed 2492.41 samples/sec Loss 1.2674 LearningRate 0.000047 Epoch: 32 Global Step: 667000 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:08:43,365-Speed 2494.52 samples/sec Loss 1.2336 LearningRate 0.000047 Epoch: 32 Global Step: 667010 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:08:51,575-Speed 2494.85 samples/sec Loss 1.2475 LearningRate 0.000047 Epoch: 32 Global Step: 667020 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:08:59,726-Speed 2512.81 samples/sec Loss 1.2739 LearningRate 0.000047 Epoch: 32 Global Step: 667030 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:09:07,935-Speed 2495.35 samples/sec Loss 1.2468 LearningRate 0.000047 Epoch: 32 Global Step: 667040 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:09:16,149-Speed 2493.75 samples/sec Loss 1.2297 LearningRate 0.000047 Epoch: 32 Global Step: 667050 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:09:24,353-Speed 2496.53 samples/sec Loss 1.2273 LearningRate 0.000047 Epoch: 32 Global Step: 667060 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:09:32,559-Speed 2496.36 samples/sec Loss 1.2402 LearningRate 0.000047 Epoch: 32 Global Step: 667070 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:09:40,761-Speed 2497.18 samples/sec Loss 1.2465 LearningRate 0.000047 Epoch: 32 Global Step: 667080 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:09:48,912-Speed 2512.77 samples/sec Loss 1.2463 LearningRate 0.000047 Epoch: 32 Global Step: 667090 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:09:57,116-Speed 2496.83 samples/sec Loss 1.2440 LearningRate 0.000047 Epoch: 32 Global Step: 667100 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:10:05,318-Speed 2497.48 samples/sec Loss 1.2019 LearningRate 0.000047 Epoch: 32 Global Step: 667110 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:10:13,524-Speed 2496.06 samples/sec Loss 1.2278 LearningRate 0.000047 Epoch: 32 Global Step: 667120 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:10:21,686-Speed 2509.62 samples/sec Loss 1.2279 LearningRate 0.000047 Epoch: 32 Global Step: 667130 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:10:29,891-Speed 2496.60 samples/sec Loss 1.2299 LearningRate 0.000047 Epoch: 32 Global Step: 667140 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:10:38,054-Speed 2509.19 samples/sec Loss 1.2037 LearningRate 0.000047 Epoch: 32 Global Step: 667150 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:10:46,266-Speed 2494.40 samples/sec Loss 1.2357 LearningRate 0.000047 Epoch: 32 Global Step: 667160 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:10:54,471-Speed 2496.61 samples/sec Loss 1.2336 LearningRate 0.000047 Epoch: 32 Global Step: 667170 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 
23:11:02,676-Speed 2496.40 samples/sec Loss 1.2625 LearningRate 0.000047 Epoch: 32 Global Step: 667180 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:11:10,883-Speed 2496.23 samples/sec Loss 1.2402 LearningRate 0.000047 Epoch: 32 Global Step: 667190 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:11:19,087-Speed 2496.57 samples/sec Loss 1.2353 LearningRate 0.000047 Epoch: 32 Global Step: 667200 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:11:27,240-Speed 2512.45 samples/sec Loss 1.2484 LearningRate 0.000047 Epoch: 32 Global Step: 667210 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:11:35,444-Speed 2496.81 samples/sec Loss 1.2409 LearningRate 0.000047 Epoch: 32 Global Step: 667220 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:11:43,647-Speed 2497.15 samples/sec Loss 1.2412 LearningRate 0.000047 Epoch: 32 Global Step: 667230 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:11:51,849-Speed 2497.35 samples/sec Loss 1.2366 LearningRate 0.000047 Epoch: 32 Global Step: 667240 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:12:00,060-Speed 2494.74 samples/sec Loss 1.2444 LearningRate 0.000047 Epoch: 32 Global Step: 667250 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:12:08,266-Speed 2496.07 samples/sec Loss 1.2898 LearningRate 0.000047 Epoch: 32 Global Step: 667260 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:12:16,437-Speed 2507.08 samples/sec Loss 1.2531 LearningRate 0.000047 Epoch: 32 Global Step: 667270 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:12:24,644-Speed 2495.97 samples/sec Loss 1.2355 LearningRate 0.000047 Epoch: 32 Global Step: 667280 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:12:32,850-Speed 2495.84 samples/sec Loss 1.2285 LearningRate 0.000047 Epoch: 32 Global Step: 667290 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:12:41,068-Speed 
2492.58 samples/sec Loss 1.2621 LearningRate 0.000047 Epoch: 32 Global Step: 667300 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:12:49,274-Speed 2496.16 samples/sec Loss 1.2608 LearningRate 0.000047 Epoch: 32 Global Step: 667310 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:12:57,489-Speed 2493.36 samples/sec Loss 1.2590 LearningRate 0.000047 Epoch: 32 Global Step: 667320 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:13:05,640-Speed 2512.83 samples/sec Loss 1.2310 LearningRate 0.000047 Epoch: 32 Global Step: 667330 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:13:13,846-Speed 2496.23 samples/sec Loss 1.2625 LearningRate 0.000047 Epoch: 32 Global Step: 667340 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:13:22,055-Speed 2495.17 samples/sec Loss 1.1973 LearningRate 0.000047 Epoch: 32 Global Step: 667350 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:13:30,260-Speed 2496.67 samples/sec Loss 1.2125 LearningRate 0.000047 Epoch: 32 Global Step: 667360 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:13:38,466-Speed 2495.82 samples/sec Loss 1.2159 LearningRate 0.000047 Epoch: 32 Global Step: 667370 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:13:46,674-Speed 2495.54 samples/sec Loss 1.2468 LearningRate 0.000047 Epoch: 32 Global Step: 667380 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:13:54,829-Speed 2512.23 samples/sec Loss 1.2246 LearningRate 0.000047 Epoch: 32 Global Step: 667390 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:14:03,034-Speed 2496.30 samples/sec Loss 1.2858 LearningRate 0.000047 Epoch: 32 Global Step: 667400 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:14:11,240-Speed 2495.86 samples/sec Loss 1.2280 LearningRate 0.000047 Epoch: 32 Global Step: 667410 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:14:19,444-Speed 2496.87 samples/sec 
Loss 1.2690 LearningRate 0.000047 Epoch: 32 Global Step: 667420 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:14:27,661-Speed 2492.96 samples/sec Loss 1.2539 LearningRate 0.000047 Epoch: 32 Global Step: 667430 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:14:35,884-Speed 2490.74 samples/sec Loss 1.2297 LearningRate 0.000047 Epoch: 32 Global Step: 667440 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:14:44,037-Speed 2512.31 samples/sec Loss 1.2358 LearningRate 0.000047 Epoch: 32 Global Step: 667450 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:14:52,260-Speed 2491.50 samples/sec Loss 1.2553 LearningRate 0.000047 Epoch: 32 Global Step: 667460 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:15:00,465-Speed 2496.49 samples/sec Loss 1.2392 LearningRate 0.000047 Epoch: 32 Global Step: 667470 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:15:08,669-Speed 2496.72 samples/sec Loss 1.1949 LearningRate 0.000047 Epoch: 32 Global Step: 667480 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:15:16,874-Speed 2496.60 samples/sec Loss 1.2267 LearningRate 0.000047 Epoch: 32 Global Step: 667490 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:15:25,080-Speed 2496.27 samples/sec Loss 1.2411 LearningRate 0.000047 Epoch: 32 Global Step: 667500 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:15:33,231-Speed 2512.70 samples/sec Loss 1.2363 LearningRate 0.000047 Epoch: 32 Global Step: 667510 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:15:41,447-Speed 2493.02 samples/sec Loss 1.2403 LearningRate 0.000047 Epoch: 32 Global Step: 667520 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:15:49,650-Speed 2497.23 samples/sec Loss 1.2121 LearningRate 0.000047 Epoch: 32 Global Step: 667530 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:15:57,855-Speed 2496.24 samples/sec Loss 1.2450 
LearningRate 0.000047 Epoch: 32 Global Step: 667540 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:16:06,072-Speed 2492.89 samples/sec Loss 1.2463 LearningRate 0.000047 Epoch: 32 Global Step: 667550 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:16:14,279-Speed 2495.91 samples/sec Loss 1.2423 LearningRate 0.000047 Epoch: 32 Global Step: 667560 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:16:22,429-Speed 2513.07 samples/sec Loss 1.2502 LearningRate 0.000047 Epoch: 32 Global Step: 667570 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:16:30,637-Speed 2495.74 samples/sec Loss 1.2440 LearningRate 0.000047 Epoch: 32 Global Step: 667580 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:16:38,842-Speed 2496.15 samples/sec Loss 1.2582 LearningRate 0.000047 Epoch: 32 Global Step: 667590 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:16:47,048-Speed 2496.77 samples/sec Loss 1.2516 LearningRate 0.000047 Epoch: 32 Global Step: 667600 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:16:55,252-Speed 2496.63 samples/sec Loss 1.2387 LearningRate 0.000047 Epoch: 32 Global Step: 667610 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:17:03,459-Speed 2495.80 samples/sec Loss 1.2380 LearningRate 0.000047 Epoch: 32 Global Step: 667620 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:17:11,609-Speed 2513.42 samples/sec Loss 1.2034 LearningRate 0.000047 Epoch: 32 Global Step: 667630 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:17:19,820-Speed 2494.60 samples/sec Loss 1.2221 LearningRate 0.000047 Epoch: 32 Global Step: 667640 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:17:28,023-Speed 2497.25 samples/sec Loss 1.2291 LearningRate 0.000047 Epoch: 32 Global Step: 667650 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:17:36,224-Speed 2497.65 samples/sec Loss 1.1936 LearningRate 
0.000047 Epoch: 32 Global Step: 667660 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:17:44,426-Speed 2497.27 samples/sec Loss 1.2207 LearningRate 0.000047 Epoch: 32 Global Step: 667670 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:17:52,628-Speed 2497.69 samples/sec Loss 1.2121 LearningRate 0.000047 Epoch: 32 Global Step: 667680 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:18:00,780-Speed 2512.63 samples/sec Loss 1.2698 LearningRate 0.000047 Epoch: 32 Global Step: 667690 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:18:08,991-Speed 2494.74 samples/sec Loss 1.2383 LearningRate 0.000047 Epoch: 32 Global Step: 667700 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:18:17,191-Speed 2497.76 samples/sec Loss 1.2601 LearningRate 0.000047 Epoch: 32 Global Step: 667710 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:18:25,401-Speed 2495.15 samples/sec Loss 1.2415 LearningRate 0.000047 Epoch: 32 Global Step: 667720 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:18:33,614-Speed 2493.96 samples/sec Loss 1.2269 LearningRate 0.000047 Epoch: 32 Global Step: 667730 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:18:41,818-Speed 2496.92 samples/sec Loss 1.2167 LearningRate 0.000047 Epoch: 32 Global Step: 667740 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:18:49,968-Speed 2513.20 samples/sec Loss 1.2144 LearningRate 0.000047 Epoch: 32 Global Step: 667750 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:18:58,172-Speed 2496.92 samples/sec Loss 1.2489 LearningRate 0.000047 Epoch: 32 Global Step: 667760 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:19:06,380-Speed 2495.81 samples/sec Loss 1.2469 LearningRate 0.000047 Epoch: 32 Global Step: 667770 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:19:14,581-Speed 2497.49 samples/sec Loss 1.2736 LearningRate 0.000047 Epoch: 32 
Global Step: 667780 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:19:22,796-Speed 2493.26 samples/sec Loss 1.2610 LearningRate 0.000047 Epoch: 32 Global Step: 667790 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:19:31,015-Speed 2492.24 samples/sec Loss 1.2450 LearningRate 0.000047 Epoch: 32 Global Step: 667800 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:19:39,167-Speed 2512.63 samples/sec Loss 1.2435 LearningRate 0.000047 Epoch: 32 Global Step: 667810 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:19:47,394-Speed 2489.83 samples/sec Loss 1.2480 LearningRate 0.000047 Epoch: 32 Global Step: 667820 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:19:55,596-Speed 2497.13 samples/sec Loss 1.2384 LearningRate 0.000047 Epoch: 32 Global Step: 667830 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:20:03,800-Speed 2496.81 samples/sec Loss 1.2368 LearningRate 0.000047 Epoch: 32 Global Step: 667840 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:20:12,024-Speed 2490.54 samples/sec Loss 1.2703 LearningRate 0.000047 Epoch: 32 Global Step: 667850 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:20:20,227-Speed 2496.92 samples/sec Loss 1.2304 LearningRate 0.000047 Epoch: 32 Global Step: 667860 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:20:28,382-Speed 2511.67 samples/sec Loss 1.2276 LearningRate 0.000047 Epoch: 32 Global Step: 667870 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:20:36,594-Speed 2494.34 samples/sec Loss 1.2333 LearningRate 0.000047 Epoch: 32 Global Step: 667880 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:20:44,796-Speed 2497.27 samples/sec Loss 1.2452 LearningRate 0.000047 Epoch: 32 Global Step: 667890 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:20:52,998-Speed 2497.55 samples/sec Loss 1.2326 LearningRate 0.000047 Epoch: 32 Global Step: 667900 
Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:21:01,200-Speed 2497.49 samples/sec Loss 1.2619 LearningRate 0.000047 Epoch: 32 Global Step: 667910 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:21:09,404-Speed 2496.85 samples/sec Loss 1.2482 LearningRate 0.000047 Epoch: 32 Global Step: 667920 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:21:17,552-Speed 2513.77 samples/sec Loss 1.2252 LearningRate 0.000047 Epoch: 32 Global Step: 667930 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:21:25,756-Speed 2496.65 samples/sec Loss 1.2325 LearningRate 0.000047 Epoch: 32 Global Step: 667940 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:21:33,959-Speed 2497.07 samples/sec Loss 1.2459 LearningRate 0.000047 Epoch: 32 Global Step: 667950 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:21:42,159-Speed 2498.11 samples/sec Loss 1.2869 LearningRate 0.000047 Epoch: 32 Global Step: 667960 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:21:50,364-Speed 2496.41 samples/sec Loss 1.2370 LearningRate 0.000047 Epoch: 32 Global Step: 667970 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:21:58,573-Speed 2495.22 samples/sec Loss 1.2433 LearningRate 0.000047 Epoch: 32 Global Step: 667980 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:22:06,751-Speed 2504.78 samples/sec Loss 1.2372 LearningRate 0.000047 Epoch: 32 Global Step: 667990 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:22:14,957-Speed 2496.25 samples/sec Loss 1.2374 LearningRate 0.000047 Epoch: 32 Global Step: 668000 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:22:23,168-Speed 2494.45 samples/sec Loss 1.2515 LearningRate 0.000047 Epoch: 32 Global Step: 668010 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:22:31,378-Speed 2494.98 samples/sec Loss 1.2590 LearningRate 0.000047 Epoch: 32 Global Step: 668020 Fp16 Grad Scale: 
8192 Required: 37 hours Training: 2022-07-11 23:22:39,588-Speed 2494.75 samples/sec Loss 1.2306 LearningRate 0.000047 Epoch: 32 Global Step: 668030 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:22:47,797-Speed 2495.50 samples/sec Loss 1.2597 LearningRate 0.000047 Epoch: 32 Global Step: 668040 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:22:55,952-Speed 2511.68 samples/sec Loss 1.2314 LearningRate 0.000047 Epoch: 32 Global Step: 668050 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:23:04,172-Speed 2492.35 samples/sec Loss 1.2366 LearningRate 0.000047 Epoch: 32 Global Step: 668060 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:23:12,376-Speed 2496.59 samples/sec Loss 1.2203 LearningRate 0.000047 Epoch: 32 Global Step: 668070 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:23:20,586-Speed 2494.85 samples/sec Loss 1.2477 LearningRate 0.000047 Epoch: 32 Global Step: 668080 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:23:28,800-Speed 2493.86 samples/sec Loss 1.2647 LearningRate 0.000047 Epoch: 32 Global Step: 668090 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:23:37,006-Speed 2496.25 samples/sec Loss 1.2451 LearningRate 0.000047 Epoch: 32 Global Step: 668100 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:23:45,161-Speed 2511.95 samples/sec Loss 1.2410 LearningRate 0.000047 Epoch: 32 Global Step: 668110 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:23:53,370-Speed 2495.08 samples/sec Loss 1.2313 LearningRate 0.000047 Epoch: 32 Global Step: 668120 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:24:01,579-Speed 2495.35 samples/sec Loss 1.2297 LearningRate 0.000047 Epoch: 32 Global Step: 668130 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:24:09,804-Speed 2490.19 samples/sec Loss 1.2477 LearningRate 0.000047 Epoch: 32 Global Step: 668140 Fp16 Grad Scale: 8192 Required: 37 
hours Training: 2022-07-11 23:24:18,009-Speed 2496.66 samples/sec Loss 1.2704 LearningRate 0.000047 Epoch: 32 Global Step: 668150 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:24:26,214-Speed 2496.27 samples/sec Loss 1.2387 LearningRate 0.000047 Epoch: 32 Global Step: 668160 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:24:34,371-Speed 2511.18 samples/sec Loss 1.2158 LearningRate 0.000047 Epoch: 32 Global Step: 668170 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:24:42,579-Speed 2495.38 samples/sec Loss 1.2597 LearningRate 0.000047 Epoch: 32 Global Step: 668180 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:24:50,791-Speed 2494.37 samples/sec Loss 1.2472 LearningRate 0.000047 Epoch: 32 Global Step: 668190 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:24:58,998-Speed 2495.93 samples/sec Loss 1.1803 LearningRate 0.000047 Epoch: 32 Global Step: 668200 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:25:07,214-Speed 2492.88 samples/sec Loss 1.2486 LearningRate 0.000047 Epoch: 32 Global Step: 668210 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:25:15,427-Speed 2494.03 samples/sec Loss 1.2285 LearningRate 0.000047 Epoch: 32 Global Step: 668220 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:25:23,586-Speed 2510.39 samples/sec Loss 1.2708 LearningRate 0.000047 Epoch: 32 Global Step: 668230 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:25:31,791-Speed 2496.59 samples/sec Loss 1.2487 LearningRate 0.000047 Epoch: 32 Global Step: 668240 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:25:39,996-Speed 2496.45 samples/sec Loss 1.2273 LearningRate 0.000047 Epoch: 32 Global Step: 668250 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:25:48,203-Speed 2495.92 samples/sec Loss 1.2189 LearningRate 0.000047 Epoch: 32 Global Step: 668260 Fp16 Grad Scale: 8192 Required: 37 hours Training: 
2022-07-11 23:25:56,410-Speed 2495.87 samples/sec Loss 1.2254 LearningRate 0.000047 Epoch: 32 Global Step: 668270 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:26:04,617-Speed 2495.97 samples/sec Loss 1.2021 LearningRate 0.000047 Epoch: 32 Global Step: 668280 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:26:12,773-Speed 2511.27 samples/sec Loss 1.2467 LearningRate 0.000047 Epoch: 32 Global Step: 668290 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:26:20,981-Speed 2495.36 samples/sec Loss 1.2514 LearningRate 0.000047 Epoch: 32 Global Step: 668300 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:26:29,187-Speed 2496.20 samples/sec Loss 1.2431 LearningRate 0.000047 Epoch: 32 Global Step: 668310 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:26:37,393-Speed 2496.38 samples/sec Loss 1.2405 LearningRate 0.000047 Epoch: 32 Global Step: 668320 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-07-11 23:26:45,598-Speed 2496.46 samples/sec Loss 1.2541 LearningRate 0.000047 Epoch: 32 Global Step: 668330 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:26:53,808-Speed 2494.65 samples/sec Loss 1.2481 LearningRate 0.000047 Epoch: 32 Global Step: 668340 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:27:01,962-Speed 2512.17 samples/sec Loss 1.2116 LearningRate 0.000047 Epoch: 32 Global Step: 668350 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:27:10,181-Speed 2492.23 samples/sec Loss 1.2283 LearningRate 0.000047 Epoch: 32 Global Step: 668360 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:27:18,383-Speed 2497.20 samples/sec Loss 1.2473 LearningRate 0.000047 Epoch: 32 Global Step: 668370 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:27:26,586-Speed 2496.93 samples/sec Loss 1.2403 LearningRate 0.000047 Epoch: 32 Global Step: 668380 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:27:34,789-Speed 2497.13 samples/sec Loss 1.2535 LearningRate 0.000047 Epoch: 32 Global Step: 668390 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:27:42,994-Speed 2496.56 samples/sec Loss 1.2116 LearningRate 0.000047 Epoch: 32 Global Step: 668400 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:27:51,149-Speed 2511.49 samples/sec Loss 1.2276 LearningRate 0.000047 Epoch: 32 Global Step: 668410 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:27:59,353-Speed 2496.82 samples/sec Loss 1.2358 LearningRate 0.000047 Epoch: 32 Global Step: 668420 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:28:07,577-Speed 2490.75 samples/sec Loss 1.2520 LearningRate 0.000047 Epoch: 32 Global Step: 668430 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:28:15,784-Speed 2495.91 samples/sec Loss 1.2589 LearningRate 0.000047 Epoch: 32 Global Step: 668440 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:28:24,001-Speed 2493.04 samples/sec Loss 1.2512 LearningRate 0.000047 Epoch: 32 Global Step: 668450 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:28:32,204-Speed 2496.99 samples/sec Loss 1.2216 LearningRate 0.000047 Epoch: 32 Global Step: 668460 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:28:40,354-Speed 2513.31 samples/sec Loss 1.2274 LearningRate 0.000047 Epoch: 32 Global Step: 668470 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:28:48,560-Speed 2496.07 samples/sec Loss 1.2266 LearningRate 0.000047 Epoch: 32 Global Step: 668480 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:28:56,789-Speed 2489.27 samples/sec Loss 1.2722 LearningRate 0.000047 Epoch: 32 Global Step: 668490 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:29:04,994-Speed 2496.24 samples/sec Loss 1.2057 LearningRate 0.000047 Epoch: 32 Global Step: 668500 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:29:13,213-Speed 2492.24 samples/sec Loss 1.2347 LearningRate 0.000047 Epoch: 32 Global Step: 668510 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:29:21,417-Speed 2496.74 samples/sec Loss 1.2116 LearningRate 0.000047 Epoch: 32 Global Step: 668520 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:29:29,572-Speed 2511.91 samples/sec Loss 1.2595 LearningRate 0.000047 Epoch: 32 Global Step: 668530 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:29:37,778-Speed 2496.17 samples/sec Loss 1.2361 LearningRate 0.000047 Epoch: 32 Global Step: 668540 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:29:45,985-Speed 2495.97 samples/sec Loss 1.2333 LearningRate 0.000047 Epoch: 32 Global Step: 668550 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:29:54,191-Speed 2495.89 samples/sec Loss 1.2368 LearningRate 0.000047 Epoch: 32 Global Step: 668560 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:30:02,397-Speed 2495.96 samples/sec Loss 1.1932 LearningRate 0.000046 Epoch: 32 Global Step: 668570 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:30:10,614-Speed 2493.34 samples/sec Loss 1.2447 LearningRate 0.000046 Epoch: 32 Global Step: 668580 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:30:18,764-Speed 2513.41 samples/sec Loss 1.2113 LearningRate 0.000046 Epoch: 32 Global Step: 668590 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:30:26,967-Speed 2496.89 samples/sec Loss 1.2150 LearningRate 0.000046 Epoch: 32 Global Step: 668600 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:30:35,191-Speed 2490.69 samples/sec Loss 1.2592 LearningRate 0.000046 Epoch: 32 Global Step: 668610 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:30:43,397-Speed 2496.26 samples/sec Loss 1.2138 LearningRate 0.000046 Epoch: 32 Global Step: 668620 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:30:51,599-Speed 2497.04 samples/sec Loss 1.2383 LearningRate 0.000046 Epoch: 32 Global Step: 668630 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:30:59,802-Speed 2497.01 samples/sec Loss 1.2261 LearningRate 0.000046 Epoch: 32 Global Step: 668640 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:31:07,953-Speed 2512.92 samples/sec Loss 1.2235 LearningRate 0.000046 Epoch: 32 Global Step: 668650 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:31:16,165-Speed 2494.41 samples/sec Loss 1.2596 LearningRate 0.000046 Epoch: 32 Global Step: 668660 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:31:24,379-Speed 2493.52 samples/sec Loss 1.2248 LearningRate 0.000046 Epoch: 32 Global Step: 668670 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:31:32,584-Speed 2496.60 samples/sec Loss 1.2433 LearningRate 0.000046 Epoch: 32 Global Step: 668680 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:31:40,788-Speed 2496.69 samples/sec Loss 1.2182 LearningRate 0.000046 Epoch: 32 Global Step: 668690 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:31:48,991-Speed 2497.04 samples/sec Loss 1.2432 LearningRate 0.000046 Epoch: 32 Global Step: 668700 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:31:57,142-Speed 2512.83 samples/sec Loss 1.2632 LearningRate 0.000046 Epoch: 32 Global Step: 668710 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:32:05,346-Speed 2496.59 samples/sec Loss 1.2341 LearningRate 0.000046 Epoch: 32 Global Step: 668720 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:32:13,549-Speed 2497.07 samples/sec Loss 1.1862 LearningRate 0.000046 Epoch: 32 Global Step: 668730 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:32:21,755-Speed 2496.30 samples/sec Loss 1.1995 LearningRate 0.000046 Epoch: 32 Global Step: 668740 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:32:29,961-Speed 2495.80 samples/sec Loss 1.2229 LearningRate 0.000046 Epoch: 32 Global Step: 668750 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:32:38,165-Speed 2496.90 samples/sec Loss 1.2293 LearningRate 0.000046 Epoch: 32 Global Step: 668760 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:32:46,318-Speed 2512.28 samples/sec Loss 1.2330 LearningRate 0.000046 Epoch: 32 Global Step: 668770 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:32:54,522-Speed 2496.98 samples/sec Loss 1.2402 LearningRate 0.000046 Epoch: 32 Global Step: 668780 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:33:02,742-Speed 2491.68 samples/sec Loss 1.2056 LearningRate 0.000046 Epoch: 32 Global Step: 668790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:33:10,947-Speed 2496.52 samples/sec Loss 1.2456 LearningRate 0.000046 Epoch: 32 Global Step: 668800 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:33:19,152-Speed 2496.33 samples/sec Loss 1.2526 LearningRate 0.000046 Epoch: 32 Global Step: 668810 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:33:27,355-Speed 2496.99 samples/sec Loss 1.2514 LearningRate 0.000046 Epoch: 32 Global Step: 668820 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:33:35,509-Speed 2512.11 samples/sec Loss 1.2158 LearningRate 0.000046 Epoch: 32 Global Step: 668830 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:33:43,712-Speed 2496.85 samples/sec Loss 1.2472 LearningRate 0.000046 Epoch: 32 Global Step: 668840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:33:51,916-Speed 2496.91 samples/sec Loss 1.2693 LearningRate 0.000046 Epoch: 32 Global Step: 668850 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:34:00,123-Speed 2495.99 samples/sec Loss 1.2344 LearningRate 0.000046 Epoch: 32 Global Step: 668860 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:34:08,328-Speed 2496.28 samples/sec Loss 1.2225 LearningRate 0.000046 Epoch: 32 Global Step: 668870 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:34:16,534-Speed 2496.23 samples/sec Loss 1.2234 LearningRate 0.000046 Epoch: 32 Global Step: 668880 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:34:24,684-Speed 2513.29 samples/sec Loss 1.2317 LearningRate 0.000046 Epoch: 32 Global Step: 668890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:34:32,886-Speed 2497.30 samples/sec Loss 1.2458 LearningRate 0.000046 Epoch: 32 Global Step: 668900 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:34:41,098-Speed 2494.46 samples/sec Loss 1.2386 LearningRate 0.000046 Epoch: 32 Global Step: 668910 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:34:49,301-Speed 2497.17 samples/sec Loss 1.2355 LearningRate 0.000046 Epoch: 32 Global Step: 668920 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:34:57,508-Speed 2495.87 samples/sec Loss 1.2399 LearningRate 0.000046 Epoch: 32 Global Step: 668930 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:35:05,714-Speed 2496.08 samples/sec Loss 1.2649 LearningRate 0.000046 Epoch: 32 Global Step: 668940 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:35:13,869-Speed 2511.60 samples/sec Loss 1.2134 LearningRate 0.000046 Epoch: 32 Global Step: 668950 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:35:22,087-Speed 2492.59 samples/sec Loss 1.2118 LearningRate 0.000046 Epoch: 32 Global Step: 668960 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:35:30,305-Speed 2492.59 samples/sec Loss 1.2388 LearningRate 0.000046 Epoch: 32 Global Step: 668970 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:35:38,512-Speed 2495.74 samples/sec Loss 1.2315 LearningRate 0.000046 Epoch: 32 Global Step: 668980 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:35:46,718-Speed 2496.03 samples/sec Loss 1.2044 LearningRate 0.000046 Epoch: 32 Global Step: 668990 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:35:54,920-Speed 2497.37 samples/sec Loss 1.2587 LearningRate 0.000046 Epoch: 32 Global Step: 669000 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:36:03,074-Speed 2512.29 samples/sec Loss 1.2503 LearningRate 0.000046 Epoch: 32 Global Step: 669010 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:36:11,280-Speed 2496.04 samples/sec Loss 1.2394 LearningRate 0.000046 Epoch: 32 Global Step: 669020 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:36:19,487-Speed 2496.12 samples/sec Loss 1.2259 LearningRate 0.000046 Epoch: 32 Global Step: 669030 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:36:27,694-Speed 2495.85 samples/sec Loss 1.2332 LearningRate 0.000046 Epoch: 32 Global Step: 669040 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:36:35,905-Speed 2494.72 samples/sec Loss 1.1996 LearningRate 0.000046 Epoch: 32 Global Step: 669050 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:36:44,121-Speed 2492.97 samples/sec Loss 1.2429 LearningRate 0.000046 Epoch: 32 Global Step: 669060 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:36:52,273-Speed 2512.57 samples/sec Loss 1.2281 LearningRate 0.000046 Epoch: 32 Global Step: 669070 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:37:00,476-Speed 2497.28 samples/sec Loss 1.2454 LearningRate 0.000046 Epoch: 32 Global Step: 669080 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:37:08,681-Speed 2496.52 samples/sec Loss 1.2688 LearningRate 0.000046 Epoch: 32 Global Step: 669090 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:37:16,885-Speed 2496.73 samples/sec Loss 1.2182 LearningRate 0.000046 Epoch: 32 Global Step: 669100 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:37:25,088-Speed 2497.25 samples/sec Loss 1.2370 LearningRate 0.000046 Epoch: 32 Global Step: 669110 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:37:33,292-Speed 2496.85 samples/sec Loss 1.2565 LearningRate 0.000046 Epoch: 32 Global Step: 669120 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:37:41,442-Speed 2513.26 samples/sec Loss 1.2560 LearningRate 0.000046 Epoch: 32 Global Step: 669130 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:37:49,651-Speed 2495.06 samples/sec Loss 1.2588 LearningRate 0.000046 Epoch: 32 Global Step: 669140 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:37:57,856-Speed 2496.35 samples/sec Loss 1.2499 LearningRate 0.000046 Epoch: 32 Global Step: 669150 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:38:06,060-Speed 2497.14 samples/sec Loss 1.2413 LearningRate 0.000046 Epoch: 32 Global Step: 669160 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:38:14,270-Speed 2494.69 samples/sec Loss 1.2532 LearningRate 0.000046 Epoch: 32 Global Step: 669170 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:38:22,473-Speed 2496.75 samples/sec Loss 1.2334 LearningRate 0.000046 Epoch: 32 Global Step: 669180 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:38:30,627-Speed 2512.10 samples/sec Loss 1.2537 LearningRate 0.000046 Epoch: 32 Global Step: 669190 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:38:38,837-Speed 2495.65 samples/sec Loss 1.2478 LearningRate 0.000046 Epoch: 32 Global Step: 669200 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:38:47,039-Speed 2497.20 samples/sec Loss 1.2330 LearningRate 0.000046 Epoch: 32 Global Step: 669210 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:38:55,243-Speed 2496.79 samples/sec Loss 1.2541 LearningRate 0.000046 Epoch: 32 Global Step: 669220 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:39:03,450-Speed 2495.89 samples/sec Loss 1.2575 LearningRate 0.000046 Epoch: 32 Global Step: 669230 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:39:11,655-Speed 2496.55 samples/sec Loss 1.2561 LearningRate 0.000046 Epoch: 32 Global Step: 669240 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:39:19,808-Speed 2512.17 samples/sec Loss 1.2428 LearningRate 0.000046 Epoch: 32 Global Step: 669250 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:39:28,015-Speed 2496.03 samples/sec Loss 1.2519 LearningRate 0.000046 Epoch: 32 Global Step: 669260 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:39:36,219-Speed 2496.71 samples/sec Loss 1.2144 LearningRate 0.000046 Epoch: 32 Global Step: 669270 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:39:44,427-Speed 2495.37 samples/sec Loss 1.2312 LearningRate 0.000046 Epoch: 32 Global Step: 669280 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:39:52,633-Speed 2496.06 samples/sec Loss 1.2390 LearningRate 0.000046 Epoch: 32 Global Step: 669290 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:40:00,839-Speed 2496.15 samples/sec Loss 1.2335 LearningRate 0.000046 Epoch: 32 Global Step: 669300 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:40:08,994-Speed 2512.18 samples/sec Loss 1.2399 LearningRate 0.000046 Epoch: 32 Global Step: 669310 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:40:17,201-Speed 2495.93 samples/sec Loss 1.2674 LearningRate 0.000046 Epoch: 32 Global Step: 669320 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:40:25,410-Speed 2494.97 samples/sec Loss 1.2086 LearningRate 0.000046 Epoch: 32 Global Step: 669330 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:40:33,618-Speed 2495.77 samples/sec Loss 1.2486 LearningRate 0.000046 Epoch: 32 Global Step: 669340 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:40:41,826-Speed 2496.40 samples/sec Loss 1.2681 LearningRate 0.000046 Epoch: 32 Global Step: 669350 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:40:50,035-Speed 2495.41 samples/sec Loss 1.2490 LearningRate 0.000046 Epoch: 32 Global Step: 669360 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:40:58,191-Speed 2511.45 samples/sec Loss 1.2583 LearningRate 0.000046 Epoch: 32 Global Step: 669370 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:41:06,415-Speed 2490.51 samples/sec Loss 1.2647 LearningRate 0.000046 Epoch: 32 Global Step: 669380 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:41:14,624-Speed 2495.59 samples/sec Loss 1.2458 LearningRate 0.000046 Epoch: 32 Global Step: 669390 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:41:22,828-Speed 2496.44 samples/sec Loss 1.2533 LearningRate 0.000046 Epoch: 32 Global Step: 669400 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:41:31,034-Speed 2496.19 samples/sec Loss 1.2613 LearningRate 0.000046 Epoch: 32 Global Step: 669410 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:41:39,239-Speed 2496.26 samples/sec Loss 1.2573 LearningRate 0.000046 Epoch: 32 Global Step: 669420 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:41:47,401-Speed 2509.73 samples/sec Loss 1.2369 LearningRate 0.000046 Epoch: 32 Global Step: 669430 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:41:55,618-Speed 2493.18 samples/sec Loss 1.2522 LearningRate 0.000046 Epoch: 32 Global Step: 669440 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:42:03,820-Speed 2497.17 samples/sec Loss 1.2797 LearningRate 0.000046 Epoch: 32 Global Step: 669450 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:42:12,023-Speed 2497.24 samples/sec Loss 1.2358 LearningRate 0.000046 Epoch: 32 Global Step: 669460 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:42:20,227-Speed 2496.61 samples/sec Loss 1.2517 LearningRate 0.000046 Epoch: 32 Global Step: 669470 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:42:28,429-Speed 2497.22 samples/sec Loss 1.2264 LearningRate 0.000046 Epoch: 32 Global Step: 669480 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:42:36,582-Speed 2512.13 samples/sec Loss 1.2330 LearningRate 0.000046 Epoch: 32 Global Step: 669490 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:42:44,787-Speed 2496.78 samples/sec Loss 1.2436 LearningRate 0.000046 Epoch: 32 Global Step: 669500 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:42:52,997-Speed 2494.96 samples/sec Loss 1.2633 LearningRate 0.000046 Epoch: 32 Global Step: 669510 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:43:01,206-Speed 2495.24 samples/sec Loss 1.2406 LearningRate 0.000046 Epoch: 32 Global Step: 669520 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:43:09,414-Speed 2495.36 samples/sec Loss 1.2817 LearningRate 0.000046 Epoch: 32 Global Step: 669530 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:43:17,618-Speed 2496.74 samples/sec Loss 1.2648 LearningRate 0.000046 Epoch: 32 Global Step: 669540 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:43:25,771-Speed 2512.55 samples/sec Loss 1.2729 LearningRate 0.000046 Epoch: 32 Global Step: 669550 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:43:33,975-Speed 2496.82 samples/sec Loss 1.2617 LearningRate 0.000046 Epoch: 32 Global Step: 669560 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:43:42,178-Speed 2496.97 samples/sec Loss 1.2435 LearningRate 0.000046 Epoch: 32 Global Step: 669570 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:43:50,381-Speed 2496.98 samples/sec Loss 1.2155 LearningRate 0.000046 Epoch: 32 Global Step: 669580 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 
23:43:58,599-Speed 2492.60 samples/sec Loss 1.2432 LearningRate 0.000046 Epoch: 32 Global Step: 669590 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:44:06,805-Speed 2496.11 samples/sec Loss 1.2476 LearningRate 0.000046 Epoch: 32 Global Step: 669600 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:44:14,955-Speed 2513.20 samples/sec Loss 1.2336 LearningRate 0.000046 Epoch: 32 Global Step: 669610 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:44:23,158-Speed 2497.22 samples/sec Loss 1.2618 LearningRate 0.000046 Epoch: 32 Global Step: 669620 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-07-11 23:44:31,321-Speed 2509.46 samples/sec Loss 1.2473 LearningRate 0.000046 Epoch: 32 Global Step: 669630 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:44:39,524-Speed 2496.86 samples/sec Loss 1.2094 LearningRate 0.000046 Epoch: 32 Global Step: 669640 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:44:47,730-Speed 2496.46 samples/sec Loss 1.2072 LearningRate 0.000046 Epoch: 32 Global Step: 669650 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:44:55,938-Speed 2495.56 samples/sec Loss 1.2361 LearningRate 0.000046 Epoch: 32 Global Step: 669660 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:45:04,087-Speed 2513.64 samples/sec Loss 1.2053 LearningRate 0.000046 Epoch: 32 Global Step: 669670 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:45:12,292-Speed 2496.25 samples/sec Loss 1.2342 LearningRate 0.000046 Epoch: 32 Global Step: 669680 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:45:20,496-Speed 2496.77 samples/sec Loss 1.2464 LearningRate 0.000046 Epoch: 32 Global Step: 669690 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:45:28,709-Speed 2494.08 samples/sec Loss 1.2390 LearningRate 0.000046 Epoch: 32 Global Step: 669700 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:45:36,912-Speed 2496.89 samples/sec Loss 1.2660 LearningRate 0.000046 Epoch: 32 Global Step: 669710 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:45:45,115-Speed 2497.13 samples/sec Loss 1.2552 LearningRate 0.000046 Epoch: 32 Global Step: 669720 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:45:53,268-Speed 2512.65 samples/sec Loss 1.2739 LearningRate 0.000046 Epoch: 32 Global Step: 669730 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:46:01,470-Speed 2497.34 samples/sec Loss 1.2540 LearningRate 0.000046 Epoch: 32 Global Step: 669740 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:46:09,674-Speed 2496.48 samples/sec Loss 1.1903 LearningRate 0.000046 Epoch: 32 Global Step: 669750 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:46:17,879-Speed 2496.66 samples/sec Loss 1.2655 LearningRate 0.000046 Epoch: 32 Global Step: 669760 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:46:26,095-Speed 2493.20 samples/sec Loss 1.2319 LearningRate 0.000046 Epoch: 32 Global Step: 669770 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:46:34,298-Speed 2496.86 samples/sec Loss 1.2186 LearningRate 0.000046 Epoch: 32 Global Step: 669780 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:46:42,449-Speed 2513.19 samples/sec Loss 1.2331 LearningRate 0.000046 Epoch: 32 Global Step: 669790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:46:50,657-Speed 2495.34 samples/sec Loss 1.2252 LearningRate 0.000046 Epoch: 32 Global Step: 669800 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:46:58,861-Speed 2496.56 samples/sec Loss 1.2332 LearningRate 0.000046 Epoch: 32 Global Step: 669810 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:47:07,078-Speed 2492.67 samples/sec Loss 1.2296 LearningRate 0.000046 Epoch: 32 Global Step: 669820 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:47:15,279-Speed 2497.88 samples/sec Loss 1.2380 LearningRate 0.000046 Epoch: 32 Global Step: 669830 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:47:23,483-Speed 2496.74 samples/sec Loss 1.2373 LearningRate 0.000046 Epoch: 32 Global Step: 669840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:47:31,635-Speed 2512.69 samples/sec Loss 1.2318 LearningRate 0.000046 Epoch: 32 Global Step: 669850 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:47:39,849-Speed 2493.56 samples/sec Loss 1.2484 LearningRate 0.000046 Epoch: 32 Global Step: 669860 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:47:48,091-Speed 2485.45 samples/sec Loss 1.2381 LearningRate 0.000046 Epoch: 32 Global Step: 669870 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:47:56,297-Speed 2496.32 samples/sec Loss 1.2226 LearningRate 0.000046 Epoch: 32 Global Step: 669880 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:48:04,505-Speed 2495.50 samples/sec Loss 1.2462 LearningRate 0.000046 Epoch: 32 Global Step: 669890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:48:12,711-Speed 2496.13 samples/sec Loss 1.2641 LearningRate 0.000046 Epoch: 32 Global Step: 669900 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:48:20,878-Speed 2508.06 samples/sec Loss 1.2210 LearningRate 0.000046 Epoch: 32 Global Step: 669910 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:48:29,087-Speed 2495.23 samples/sec Loss 1.2452 LearningRate 0.000046 Epoch: 32 Global Step: 669920 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:48:37,293-Speed 2496.33 samples/sec Loss 1.2390 LearningRate 0.000046 Epoch: 32 Global Step: 669930 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:48:45,498-Speed 2496.16 samples/sec Loss 1.2416 LearningRate 0.000046 Epoch: 32 Global Step: 669940 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:48:53,717-Speed 2492.14 samples/sec Loss 1.2581 LearningRate 0.000046 Epoch: 32 Global Step: 669950 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:49:01,934-Speed 2492.56 samples/sec Loss 1.2401 LearningRate 0.000046 Epoch: 32 Global Step: 669960 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:49:10,089-Speed 2512.06 samples/sec Loss 1.2042 LearningRate 0.000046 Epoch: 32 Global Step: 669970 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:49:18,295-Speed 2496.11 samples/sec Loss 1.2519 LearningRate 0.000046 Epoch: 32 Global Step: 669980 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:49:26,514-Speed 2491.96 samples/sec Loss 1.2057 LearningRate 0.000046 Epoch: 32 Global Step: 669990 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:49:34,718-Speed 2496.81 samples/sec Loss 1.2031 LearningRate 0.000046 Epoch: 32 Global Step: 670000 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:49:42,921-Speed 2497.19 samples/sec Loss 1.2392 LearningRate 0.000046 Epoch: 32 Global Step: 670010 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:49:51,124-Speed 2496.95 samples/sec Loss 1.2170 LearningRate 0.000046 Epoch: 32 Global Step: 670020 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:49:59,272-Speed 2514.41 samples/sec Loss 1.2051 LearningRate 0.000046 Epoch: 32 Global Step: 670030 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:50:07,474-Speed 2497.50 samples/sec Loss 1.2152 LearningRate 0.000046 Epoch: 32 Global Step: 670040 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:50:15,678-Speed 2496.71 samples/sec Loss 1.2328 LearningRate 0.000046 Epoch: 32 Global Step: 670050 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:50:23,881-Speed 2496.98 samples/sec Loss 1.2291 LearningRate 0.000046 Epoch: 32 Global Step: 670060 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 
23:50:32,086-Speed 2496.60 samples/sec Loss 1.2405 LearningRate 0.000046 Epoch: 32 Global Step: 670070 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:50:40,383-Speed 2468.51 samples/sec Loss 1.2149 LearningRate 0.000046 Epoch: 32 Global Step: 670080 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:50:48,537-Speed 2512.09 samples/sec Loss 1.2151 LearningRate 0.000046 Epoch: 32 Global Step: 670090 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:50:56,741-Speed 2496.83 samples/sec Loss 1.2702 LearningRate 0.000046 Epoch: 32 Global Step: 670100 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:51:04,959-Speed 2492.35 samples/sec Loss 1.2407 LearningRate 0.000046 Epoch: 32 Global Step: 670110 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:51:13,171-Speed 2494.68 samples/sec Loss 1.2140 LearningRate 0.000046 Epoch: 32 Global Step: 670120 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-07-11 23:51:21,376-Speed 2496.06 samples/sec Loss 1.2359 LearningRate 0.000046 Epoch: 32 Global Step: 670130 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:51:29,585-Speed 2495.31 samples/sec Loss 1.2135 LearningRate 0.000046 Epoch: 32 Global Step: 670140 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:51:37,737-Speed 2512.95 samples/sec Loss 1.2307 LearningRate 0.000046 Epoch: 32 Global Step: 670150 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:51:45,942-Speed 2496.32 samples/sec Loss 1.2121 LearningRate 0.000046 Epoch: 32 Global Step: 670160 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:51:54,151-Speed 2494.99 samples/sec Loss 1.2080 LearningRate 0.000046 Epoch: 32 Global Step: 670170 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:52:02,356-Speed 2496.58 samples/sec Loss 1.2209 LearningRate 0.000046 Epoch: 32 Global Step: 670180 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 
23:52:10,569-Speed 2494.42 samples/sec Loss 1.1823 LearningRate 0.000046 Epoch: 32 Global Step: 670190 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:52:18,772-Speed 2496.97 samples/sec Loss 1.2234 LearningRate 0.000046 Epoch: 32 Global Step: 670200 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:52:26,921-Speed 2513.53 samples/sec Loss 1.1968 LearningRate 0.000046 Epoch: 32 Global Step: 670210 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:52:35,124-Speed 2497.26 samples/sec Loss 1.2149 LearningRate 0.000046 Epoch: 32 Global Step: 670220 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:52:43,333-Speed 2495.58 samples/sec Loss 1.2195 LearningRate 0.000046 Epoch: 32 Global Step: 670230 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:52:51,536-Speed 2496.98 samples/sec Loss 1.2415 LearningRate 0.000046 Epoch: 32 Global Step: 670240 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:52:59,739-Speed 2497.00 samples/sec Loss 1.2099 LearningRate 0.000046 Epoch: 32 Global Step: 670250 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:53:07,943-Speed 2497.00 samples/sec Loss 1.2401 LearningRate 0.000046 Epoch: 32 Global Step: 670260 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:53:16,094-Speed 2512.79 samples/sec Loss 1.2328 LearningRate 0.000046 Epoch: 32 Global Step: 670270 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:53:24,305-Speed 2494.55 samples/sec Loss 1.2128 LearningRate 0.000046 Epoch: 32 Global Step: 670280 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:53:32,510-Speed 2496.65 samples/sec Loss 1.2215 LearningRate 0.000046 Epoch: 32 Global Step: 670290 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:53:40,725-Speed 2493.26 samples/sec Loss 1.2319 LearningRate 0.000046 Epoch: 32 Global Step: 670300 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 
23:53:48,929-Speed 2497.00 samples/sec Loss 1.2486 LearningRate 0.000045 Epoch: 32 Global Step: 670310 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:53:57,134-Speed 2496.20 samples/sec Loss 1.2344 LearningRate 0.000045 Epoch: 32 Global Step: 670320 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:54:05,285-Speed 2513.06 samples/sec Loss 1.2314 LearningRate 0.000045 Epoch: 32 Global Step: 670330 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:54:13,487-Speed 2497.17 samples/sec Loss 1.2552 LearningRate 0.000045 Epoch: 32 Global Step: 670340 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:54:21,704-Speed 2492.78 samples/sec Loss 1.2292 LearningRate 0.000045 Epoch: 32 Global Step: 670350 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:54:29,909-Speed 2496.45 samples/sec Loss 1.2424 LearningRate 0.000045 Epoch: 32 Global Step: 670360 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:54:38,115-Speed 2496.09 samples/sec Loss 1.2354 LearningRate 0.000045 Epoch: 32 Global Step: 670370 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:54:46,324-Speed 2495.57 samples/sec Loss 1.2491 LearningRate 0.000045 Epoch: 32 Global Step: 670380 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:54:54,483-Speed 2510.55 samples/sec Loss 1.2443 LearningRate 0.000045 Epoch: 32 Global Step: 670390 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:55:02,687-Speed 2496.69 samples/sec Loss 1.1955 LearningRate 0.000045 Epoch: 32 Global Step: 670400 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:55:10,891-Speed 2496.60 samples/sec Loss 1.2045 LearningRate 0.000045 Epoch: 32 Global Step: 670410 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:55:19,094-Speed 2497.01 samples/sec Loss 1.2319 LearningRate 0.000045 Epoch: 32 Global Step: 670420 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 
23:55:27,299-Speed 2496.55 samples/sec Loss 1.2616 LearningRate 0.000045 Epoch: 32 Global Step: 670430 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:55:35,515-Speed 2493.15 samples/sec Loss 1.2040 LearningRate 0.000045 Epoch: 32 Global Step: 670440 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:55:43,668-Speed 2512.17 samples/sec Loss 1.2302 LearningRate 0.000045 Epoch: 32 Global Step: 670450 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:55:51,873-Speed 2496.65 samples/sec Loss 1.2361 LearningRate 0.000045 Epoch: 32 Global Step: 670460 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:56:00,076-Speed 2497.00 samples/sec Loss 1.2205 LearningRate 0.000045 Epoch: 32 Global Step: 670470 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:56:08,281-Speed 2496.64 samples/sec Loss 1.2457 LearningRate 0.000045 Epoch: 32 Global Step: 670480 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:56:16,499-Speed 2492.61 samples/sec Loss 1.2182 LearningRate 0.000045 Epoch: 32 Global Step: 670490 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:56:24,718-Speed 2492.10 samples/sec Loss 1.2197 LearningRate 0.000045 Epoch: 32 Global Step: 670500 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:56:32,870-Speed 2512.54 samples/sec Loss 1.2236 LearningRate 0.000045 Epoch: 32 Global Step: 670510 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:56:41,075-Speed 2496.41 samples/sec Loss 1.2247 LearningRate 0.000045 Epoch: 32 Global Step: 670520 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:56:49,281-Speed 2496.25 samples/sec Loss 1.2545 LearningRate 0.000045 Epoch: 32 Global Step: 670530 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:56:57,486-Speed 2496.38 samples/sec Loss 1.2288 LearningRate 0.000045 Epoch: 32 Global Step: 670540 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 
23:57:05,693-Speed 2495.69 samples/sec Loss 1.2513 LearningRate 0.000045 Epoch: 32 Global Step: 670550 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:57:13,908-Speed 2493.38 samples/sec Loss 1.2306 LearningRate 0.000045 Epoch: 32 Global Step: 670560 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:57:22,055-Speed 2514.79 samples/sec Loss 1.2216 LearningRate 0.000045 Epoch: 32 Global Step: 670570 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:57:30,256-Speed 2497.59 samples/sec Loss 1.2086 LearningRate 0.000045 Epoch: 32 Global Step: 670580 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:57:38,459-Speed 2497.04 samples/sec Loss 1.2564 LearningRate 0.000045 Epoch: 32 Global Step: 670590 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:57:46,662-Speed 2497.27 samples/sec Loss 1.2603 LearningRate 0.000045 Epoch: 32 Global Step: 670600 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:57:54,878-Speed 2493.08 samples/sec Loss 1.2035 LearningRate 0.000045 Epoch: 32 Global Step: 670610 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:58:03,091-Speed 2494.05 samples/sec Loss 1.2264 LearningRate 0.000045 Epoch: 32 Global Step: 670620 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:58:11,255-Speed 2509.00 samples/sec Loss 1.2159 LearningRate 0.000045 Epoch: 32 Global Step: 670630 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:58:19,458-Speed 2497.14 samples/sec Loss 1.2100 LearningRate 0.000045 Epoch: 32 Global Step: 670640 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:58:27,669-Speed 2494.70 samples/sec Loss 1.2489 LearningRate 0.000045 Epoch: 32 Global Step: 670650 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:58:35,874-Speed 2496.48 samples/sec Loss 1.2064 LearningRate 0.000045 Epoch: 32 Global Step: 670660 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 
23:58:44,079-Speed 2496.29 samples/sec Loss 1.2426 LearningRate 0.000045 Epoch: 32 Global Step: 670670 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:58:52,297-Speed 2492.36 samples/sec Loss 1.2383 LearningRate 0.000045 Epoch: 32 Global Step: 670680 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:59:00,446-Speed 2513.56 samples/sec Loss 1.2478 LearningRate 0.000045 Epoch: 32 Global Step: 670690 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:59:08,655-Speed 2495.44 samples/sec Loss 1.2491 LearningRate 0.000045 Epoch: 32 Global Step: 670700 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:59:16,866-Speed 2494.64 samples/sec Loss 1.2490 LearningRate 0.000045 Epoch: 32 Global Step: 670710 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:59:25,073-Speed 2495.55 samples/sec Loss 1.2340 LearningRate 0.000045 Epoch: 32 Global Step: 670720 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:59:33,278-Speed 2496.88 samples/sec Loss 1.2263 LearningRate 0.000045 Epoch: 32 Global Step: 670730 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:59:41,487-Speed 2495.12 samples/sec Loss 1.2667 LearningRate 0.000045 Epoch: 32 Global Step: 670740 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:59:49,666-Speed 2504.13 samples/sec Loss 1.2362 LearningRate 0.000045 Epoch: 32 Global Step: 670750 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-11 23:59:57,870-Speed 2496.86 samples/sec Loss 1.2256 LearningRate 0.000045 Epoch: 32 Global Step: 670760 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:00:06,075-Speed 2496.41 samples/sec Loss 1.2268 LearningRate 0.000045 Epoch: 32 Global Step: 670770 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:00:14,281-Speed 2496.27 samples/sec Loss 1.2413 LearningRate 0.000045 Epoch: 32 Global Step: 670780 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 
00:00:22,486-Speed 2496.49 samples/sec Loss 1.2357 LearningRate 0.000045 Epoch: 32 Global Step: 670790 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:00:30,695-Speed 2495.46 samples/sec Loss 1.2159 LearningRate 0.000045 Epoch: 32 Global Step: 670800 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:00:38,851-Speed 2511.57 samples/sec Loss 1.2424 LearningRate 0.000045 Epoch: 32 Global Step: 670810 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:00:47,068-Speed 2492.93 samples/sec Loss 1.2591 LearningRate 0.000045 Epoch: 32 Global Step: 670820 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:00:55,272-Speed 2496.43 samples/sec Loss 1.2581 LearningRate 0.000045 Epoch: 32 Global Step: 670830 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:01:03,483-Speed 2494.91 samples/sec Loss 1.2312 LearningRate 0.000045 Epoch: 32 Global Step: 670840 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:01:11,686-Speed 2496.87 samples/sec Loss 1.2226 LearningRate 0.000045 Epoch: 32 Global Step: 670850 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:01:19,890-Speed 2496.81 samples/sec Loss 1.2477 LearningRate 0.000045 Epoch: 32 Global Step: 670860 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:01:28,042-Speed 2512.38 samples/sec Loss 1.2545 LearningRate 0.000045 Epoch: 32 Global Step: 670870 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:01:36,248-Speed 2497.29 samples/sec Loss 1.2270 LearningRate 0.000045 Epoch: 32 Global Step: 670880 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:01:44,465-Speed 2492.91 samples/sec Loss 1.2210 LearningRate 0.000045 Epoch: 32 Global Step: 670890 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:01:52,671-Speed 2496.11 samples/sec Loss 1.2694 LearningRate 0.000045 Epoch: 32 Global Step: 670900 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 
00:02:00,876-Speed 2496.44 samples/sec Loss 1.2232 LearningRate 0.000045 Epoch: 32 Global Step: 670910 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:02:09,094-Speed 2492.52 samples/sec Loss 1.2175 LearningRate 0.000045 Epoch: 32 Global Step: 670920 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:02:17,245-Speed 2513.00 samples/sec Loss 1.2565 LearningRate 0.000045 Epoch: 32 Global Step: 670930 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:02:25,450-Speed 2496.21 samples/sec Loss 1.2071 LearningRate 0.000045 Epoch: 32 Global Step: 670940 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:02:33,654-Speed 2497.19 samples/sec Loss 1.2383 LearningRate 0.000045 Epoch: 32 Global Step: 670950 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:02:41,857-Speed 2496.78 samples/sec Loss 1.2190 LearningRate 0.000045 Epoch: 32 Global Step: 670960 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:02:50,061-Speed 2496.82 samples/sec Loss 1.2459 LearningRate 0.000045 Epoch: 32 Global Step: 670970 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:02:58,266-Speed 2496.29 samples/sec Loss 1.2102 LearningRate 0.000045 Epoch: 32 Global Step: 670980 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:03:06,416-Speed 2513.39 samples/sec Loss 1.2230 LearningRate 0.000045 Epoch: 32 Global Step: 670990 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:03:14,621-Speed 2496.52 samples/sec Loss 1.2645 LearningRate 0.000045 Epoch: 32 Global Step: 671000 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:03:22,842-Speed 2491.49 samples/sec Loss 1.2511 LearningRate 0.000045 Epoch: 32 Global Step: 671010 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:03:31,044-Speed 2497.03 samples/sec Loss 1.2387 LearningRate 0.000045 Epoch: 32 Global Step: 671020 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 
00:03:39,250-Speed 2496.32 samples/sec Loss 1.2503 LearningRate 0.000045 Epoch: 32 Global Step: 671030 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:03:47,458-Speed 2495.31 samples/sec Loss 1.2607 LearningRate 0.000045 Epoch: 32 Global Step: 671040 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:03:55,614-Speed 2511.67 samples/sec Loss 1.2336 LearningRate 0.000045 Epoch: 32 Global Step: 671050 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:04:03,818-Speed 2496.86 samples/sec Loss 1.2378 LearningRate 0.000045 Epoch: 32 Global Step: 671060 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:04:12,023-Speed 2496.20 samples/sec Loss 1.2422 LearningRate 0.000045 Epoch: 32 Global Step: 671070 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:04:20,228-Speed 2496.44 samples/sec Loss 1.2173 LearningRate 0.000045 Epoch: 32 Global Step: 671080 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:04:28,434-Speed 2496.19 samples/sec Loss 1.2751 LearningRate 0.000045 Epoch: 32 Global Step: 671090 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:04:36,639-Speed 2496.58 samples/sec Loss 1.2058 LearningRate 0.000045 Epoch: 32 Global Step: 671100 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:04:44,790-Speed 2513.21 samples/sec Loss 1.2051 LearningRate 0.000045 Epoch: 32 Global Step: 671110 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:04:52,995-Speed 2496.36 samples/sec Loss 1.2333 LearningRate 0.000045 Epoch: 32 Global Step: 671120 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:05:01,199-Speed 2496.50 samples/sec Loss 1.2411 LearningRate 0.000045 Epoch: 32 Global Step: 671130 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:05:09,403-Speed 2496.79 samples/sec Loss 1.2238 LearningRate 0.000045 Epoch: 32 Global Step: 671140 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 
00:05:17,608-Speed 2496.57 samples/sec Loss 1.2025 LearningRate 0.000045 Epoch: 32 Global Step: 671150 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:05:25,815-Speed 2495.71 samples/sec Loss 1.2363 LearningRate 0.000045 Epoch: 32 Global Step: 671160 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:05:33,964-Speed 2513.33 samples/sec Loss 1.1980 LearningRate 0.000045 Epoch: 32 Global Step: 671170 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:05:42,174-Speed 2494.94 samples/sec Loss 1.2796 LearningRate 0.000045 Epoch: 32 Global Step: 671180 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:05:50,377-Speed 2497.25 samples/sec Loss 1.2314 LearningRate 0.000045 Epoch: 32 Global Step: 671190 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:05:58,596-Speed 2491.94 samples/sec Loss 1.2575 LearningRate 0.000045 Epoch: 32 Global Step: 671200 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:06:06,803-Speed 2495.91 samples/sec Loss 1.2392 LearningRate 0.000045 Epoch: 32 Global Step: 671210 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:06:15,007-Speed 2497.14 samples/sec Loss 1.2561 LearningRate 0.000045 Epoch: 32 Global Step: 671220 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:06:23,163-Speed 2511.25 samples/sec Loss 1.2507 LearningRate 0.000045 Epoch: 32 Global Step: 671230 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:06:31,370-Speed 2495.81 samples/sec Loss 1.2410 LearningRate 0.000045 Epoch: 32 Global Step: 671240 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:06:39,547-Speed 2505.27 samples/sec Loss 1.2613 LearningRate 0.000045 Epoch: 32 Global Step: 671250 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:06:47,775-Speed 2489.54 samples/sec Loss 1.2015 LearningRate 0.000045 Epoch: 32 Global Step: 671260 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 
00:06:55,982-Speed 2495.85 samples/sec Loss 1.2233 LearningRate 0.000045 Epoch: 32 Global Step: 671270 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:07:04,187-Speed 2496.62 samples/sec Loss 1.2277 LearningRate 0.000045 Epoch: 32 Global Step: 671280 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:07:12,339-Speed 2512.72 samples/sec Loss 1.2450 LearningRate 0.000045 Epoch: 32 Global Step: 671290 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:07:20,555-Speed 2493.13 samples/sec Loss 1.2312 LearningRate 0.000045 Epoch: 32 Global Step: 671300 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:07:28,758-Speed 2496.77 samples/sec Loss 1.2228 LearningRate 0.000045 Epoch: 32 Global Step: 671310 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:07:36,962-Speed 2496.94 samples/sec Loss 1.2370 LearningRate 0.000045 Epoch: 32 Global Step: 671320 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:07:45,169-Speed 2495.79 samples/sec Loss 1.2445 LearningRate 0.000045 Epoch: 32 Global Step: 671330 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:07:53,374-Speed 2496.30 samples/sec Loss 1.1951 LearningRate 0.000045 Epoch: 32 Global Step: 671340 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:08:01,523-Speed 2513.47 samples/sec Loss 1.2090 LearningRate 0.000045 Epoch: 32 Global Step: 671350 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:08:09,735-Speed 2494.48 samples/sec Loss 1.2208 LearningRate 0.000045 Epoch: 32 Global Step: 671360 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:08:17,945-Speed 2494.85 samples/sec Loss 1.2497 LearningRate 0.000045 Epoch: 32 Global Step: 671370 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:08:26,149-Speed 2496.49 samples/sec Loss 1.2312 LearningRate 0.000045 Epoch: 32 Global Step: 671380 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 
00:08:34,352-Speed 2497.27 samples/sec Loss 1.2340 LearningRate 0.000045 Epoch: 32 Global Step: 671390 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:08:42,562-Speed 2494.80 samples/sec Loss 1.2367 LearningRate 0.000045 Epoch: 32 Global Step: 671400 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:08:50,712-Speed 2513.11 samples/sec Loss 1.2319 LearningRate 0.000045 Epoch: 32 Global Step: 671410 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:08:58,916-Speed 2496.86 samples/sec Loss 1.2577 LearningRate 0.000045 Epoch: 32 Global Step: 671420 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:09:07,118-Speed 2497.34 samples/sec Loss 1.2319 LearningRate 0.000045 Epoch: 32 Global Step: 671430 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:09:15,322-Speed 2496.63 samples/sec Loss 1.2172 LearningRate 0.000045 Epoch: 32 Global Step: 671440 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:09:23,536-Speed 2494.24 samples/sec Loss 1.2360 LearningRate 0.000045 Epoch: 32 Global Step: 671450 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:09:31,739-Speed 2497.09 samples/sec Loss 1.2415 LearningRate 0.000045 Epoch: 32 Global Step: 671460 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:09:39,891-Speed 2512.58 samples/sec Loss 1.1966 LearningRate 0.000045 Epoch: 32 Global Step: 671470 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:09:48,109-Speed 2492.59 samples/sec Loss 1.2506 LearningRate 0.000045 Epoch: 32 Global Step: 671480 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:09:56,313-Speed 2496.69 samples/sec Loss 1.2103 LearningRate 0.000045 Epoch: 32 Global Step: 671490 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:10:04,532-Speed 2492.10 samples/sec Loss 1.2542 LearningRate 0.000045 Epoch: 32 Global Step: 671500 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 
00:10:12,739-Speed 2495.68 samples/sec Loss 1.2234 LearningRate 0.000045 Epoch: 32 Global Step: 671510 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:10:20,945-Speed 2496.08 samples/sec Loss 1.2314 LearningRate 0.000045 Epoch: 32 Global Step: 671520 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:10:29,098-Speed 2512.37 samples/sec Loss 1.1886 LearningRate 0.000045 Epoch: 32 Global Step: 671530 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:10:37,303-Speed 2496.40 samples/sec Loss 1.2463 LearningRate 0.000045 Epoch: 32 Global Step: 671540 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:10:45,507-Speed 2496.68 samples/sec Loss 1.2625 LearningRate 0.000045 Epoch: 32 Global Step: 671550 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:10:53,710-Speed 2496.95 samples/sec Loss 1.1770 LearningRate 0.000045 Epoch: 32 Global Step: 671560 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:11:01,923-Speed 2494.16 samples/sec Loss 1.2311 LearningRate 0.000045 Epoch: 32 Global Step: 671570 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:11:10,128-Speed 2496.35 samples/sec Loss 1.2305 LearningRate 0.000045 Epoch: 32 Global Step: 671580 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:11:18,282-Speed 2512.28 samples/sec Loss 1.1986 LearningRate 0.000045 Epoch: 32 Global Step: 671590 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:11:26,487-Speed 2496.37 samples/sec Loss 1.1976 LearningRate 0.000045 Epoch: 32 Global Step: 671600 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:11:34,694-Speed 2495.95 samples/sec Loss 1.2676 LearningRate 0.000045 Epoch: 32 Global Step: 671610 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:11:42,901-Speed 2495.82 samples/sec Loss 1.2445 LearningRate 0.000045 Epoch: 32 Global Step: 671620 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 
00:11:51,103-Speed 2497.51 samples/sec Loss 1.2004 LearningRate 0.000045 Epoch: 32 Global Step: 671630 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:11:59,309-Speed 2496.19 samples/sec Loss 1.2114 LearningRate 0.000045 Epoch: 32 Global Step: 671640 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:12:07,460-Speed 2512.85 samples/sec Loss 1.2139 LearningRate 0.000045 Epoch: 32 Global Step: 671650 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:12:15,664-Speed 2496.81 samples/sec Loss 1.2624 LearningRate 0.000045 Epoch: 32 Global Step: 671660 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:12:23,866-Speed 2497.49 samples/sec Loss 1.2465 LearningRate 0.000045 Epoch: 32 Global Step: 671670 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:12:32,071-Speed 2496.43 samples/sec Loss 1.2363 LearningRate 0.000045 Epoch: 32 Global Step: 671680 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:12:40,275-Speed 2496.72 samples/sec Loss 1.2443 LearningRate 0.000045 Epoch: 32 Global Step: 671690 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:12:48,486-Speed 2494.74 samples/sec Loss 1.2289 LearningRate 0.000045 Epoch: 32 Global Step: 671700 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:12:56,636-Speed 2513.06 samples/sec Loss 1.2403 LearningRate 0.000045 Epoch: 32 Global Step: 671710 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:13:04,848-Speed 2494.74 samples/sec Loss 1.2386 LearningRate 0.000045 Epoch: 32 Global Step: 671720 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:13:13,051-Speed 2496.82 samples/sec Loss 1.2435 LearningRate 0.000045 Epoch: 32 Global Step: 671730 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:13:21,258-Speed 2496.07 samples/sec Loss 1.2173 LearningRate 0.000045 Epoch: 32 Global Step: 671740 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 
00:13:29,464-Speed 2496.25 samples/sec Loss 1.2199 LearningRate 0.000045 Epoch: 32 Global Step: 671750 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:13:37,667-Speed 2496.87 samples/sec Loss 1.2274 LearningRate 0.000045 Epoch: 32 Global Step: 671760 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:13:45,818-Speed 2512.94 samples/sec Loss 1.2622 LearningRate 0.000045 Epoch: 32 Global Step: 671770 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:13:54,022-Speed 2496.76 samples/sec Loss 1.2399 LearningRate 0.000045 Epoch: 32 Global Step: 671780 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:14:02,224-Speed 2497.15 samples/sec Loss 1.2279 LearningRate 0.000045 Epoch: 32 Global Step: 671790 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:14:10,430-Speed 2496.30 samples/sec Loss 1.2301 LearningRate 0.000045 Epoch: 32 Global Step: 671800 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:14:18,636-Speed 2496.17 samples/sec Loss 1.2453 LearningRate 0.000045 Epoch: 32 Global Step: 671810 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:14:26,839-Speed 2496.83 samples/sec Loss 1.2243 LearningRate 0.000045 Epoch: 32 Global Step: 671820 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:14:35,004-Speed 2510.74 samples/sec Loss 1.2224 LearningRate 0.000045 Epoch: 32 Global Step: 671830 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:14:43,214-Speed 2494.96 samples/sec Loss 1.2091 LearningRate 0.000045 Epoch: 32 Global Step: 671840 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:14:51,417-Speed 2496.90 samples/sec Loss 1.2904 LearningRate 0.000045 Epoch: 32 Global Step: 671850 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:14:59,622-Speed 2496.39 samples/sec Loss 1.2355 LearningRate 0.000045 Epoch: 32 Global Step: 671860 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 
00:15:07,830-Speed 2495.77 samples/sec Loss 1.2405 LearningRate 0.000045 Epoch: 32 Global Step: 671870 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:15:16,034-Speed 2496.63 samples/sec Loss 1.2427 LearningRate 0.000045 Epoch: 32 Global Step: 671880 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:15:24,186-Speed 2512.59 samples/sec Loss 1.2180 LearningRate 0.000045 Epoch: 32 Global Step: 671890 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:15:32,394-Speed 2495.73 samples/sec Loss 1.2390 LearningRate 0.000045 Epoch: 32 Global Step: 671900 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:15:40,603-Speed 2495.15 samples/sec Loss 1.2305 LearningRate 0.000045 Epoch: 32 Global Step: 671910 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:15:48,811-Speed 2495.53 samples/sec Loss 1.2332 LearningRate 0.000045 Epoch: 32 Global Step: 671920 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:15:57,015-Speed 2496.50 samples/sec Loss 1.2435 LearningRate 0.000045 Epoch: 32 Global Step: 671930 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:16:05,223-Speed 2495.54 samples/sec Loss 1.3000 LearningRate 0.000045 Epoch: 32 Global Step: 671940 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:16:13,374-Speed 2513.25 samples/sec Loss 1.2673 LearningRate 0.000045 Epoch: 32 Global Step: 671950 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:16:21,578-Speed 2496.60 samples/sec Loss 1.2568 LearningRate 0.000045 Epoch: 32 Global Step: 671960 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:16:29,790-Speed 2494.23 samples/sec Loss 1.2148 LearningRate 0.000045 Epoch: 32 Global Step: 671970 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:16:37,992-Speed 2497.36 samples/sec Loss 1.2259 LearningRate 0.000045 Epoch: 32 Global Step: 671980 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 
00:16:46,203-Speed 2495.21 samples/sec Loss 1.2373 LearningRate 0.000045 Epoch: 32 Global Step: 671990 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:16:54,412-Speed 2495.14 samples/sec Loss 1.2255 LearningRate 0.000045 Epoch: 32 Global Step: 672000 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:17:02,565-Speed 2512.12 samples/sec Loss 1.2349 LearningRate 0.000045 Epoch: 32 Global Step: 672010 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:17:10,768-Speed 2497.53 samples/sec Loss 1.2290 LearningRate 0.000045 Epoch: 32 Global Step: 672020 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:17:18,971-Speed 2496.86 samples/sec Loss 1.2401 LearningRate 0.000045 Epoch: 32 Global Step: 672030 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:17:27,175-Speed 2496.72 samples/sec Loss 1.2327 LearningRate 0.000045 Epoch: 32 Global Step: 672040 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:17:35,380-Speed 2496.60 samples/sec Loss 1.2446 LearningRate 0.000045 Epoch: 32 Global Step: 672050 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:17:43,588-Speed 2495.35 samples/sec Loss 1.2724 LearningRate 0.000045 Epoch: 32 Global Step: 672060 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:17:51,741-Speed 2512.23 samples/sec Loss 1.2066 LearningRate 0.000044 Epoch: 32 Global Step: 672070 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:17:59,945-Speed 2496.67 samples/sec Loss 1.2031 LearningRate 0.000044 Epoch: 32 Global Step: 672080 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:18:08,154-Speed 2495.20 samples/sec Loss 1.2333 LearningRate 0.000044 Epoch: 32 Global Step: 672090 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:18:16,366-Speed 2494.54 samples/sec Loss 1.2566 LearningRate 0.000044 Epoch: 32 Global Step: 672100 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 
00:18:24,568-Speed 2497.24 samples/sec Loss 1.2442 LearningRate 0.000044 Epoch: 32 Global Step: 672110 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:18:32,772-Speed 2496.89 samples/sec Loss 1.2745 LearningRate 0.000044 Epoch: 32 Global Step: 672120 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:18:40,923-Speed 2512.85 samples/sec Loss 1.2159 LearningRate 0.000044 Epoch: 32 Global Step: 672130 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:18:49,125-Speed 2497.45 samples/sec Loss 1.2417 LearningRate 0.000044 Epoch: 32 Global Step: 672140 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:18:57,342-Speed 2492.96 samples/sec Loss 1.2037 LearningRate 0.000044 Epoch: 32 Global Step: 672150 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:19:05,545-Speed 2496.86 samples/sec Loss 1.2311 LearningRate 0.000044 Epoch: 32 Global Step: 672160 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:19:13,752-Speed 2496.00 samples/sec Loss 1.2351 LearningRate 0.000044 Epoch: 32 Global Step: 672170 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:19:21,956-Speed 2496.71 samples/sec Loss 1.2091 LearningRate 0.000044 Epoch: 32 Global Step: 672180 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:19:30,121-Speed 2508.75 samples/sec Loss 1.2482 LearningRate 0.000044 Epoch: 32 Global Step: 672190 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:19:38,324-Speed 2496.87 samples/sec Loss 1.2433 LearningRate 0.000044 Epoch: 32 Global Step: 672200 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:19:46,529-Speed 2496.30 samples/sec Loss 1.2651 LearningRate 0.000044 Epoch: 32 Global Step: 672210 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:19:54,743-Speed 2493.72 samples/sec Loss 1.2183 LearningRate 0.000044 Epoch: 32 Global Step: 672220 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 
00:20:02,950-Speed 2495.82 samples/sec Loss 1.2763 LearningRate 0.000044 Epoch: 32 Global Step: 672230 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:20:11,162-Speed 2494.34 samples/sec Loss 1.2277 LearningRate 0.000044 Epoch: 32 Global Step: 672240 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:20:19,315-Speed 2512.58 samples/sec Loss 1.2383 LearningRate 0.000044 Epoch: 32 Global Step: 672250 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:20:27,535-Speed 2491.96 samples/sec Loss 1.2086 LearningRate 0.000044 Epoch: 32 Global Step: 672260 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:20:35,739-Speed 2497.10 samples/sec Loss 1.2533 LearningRate 0.000044 Epoch: 32 Global Step: 672270 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:20:43,951-Speed 2494.18 samples/sec Loss 1.2568 LearningRate 0.000044 Epoch: 32 Global Step: 672280 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:20:52,162-Speed 2494.75 samples/sec Loss 1.2210 LearningRate 0.000044 Epoch: 32 Global Step: 672290 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:21:00,365-Speed 2496.84 samples/sec Loss 1.2450 LearningRate 0.000044 Epoch: 32 Global Step: 672300 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:21:08,520-Speed 2511.73 samples/sec Loss 1.2173 LearningRate 0.000044 Epoch: 32 Global Step: 672310 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:21:16,734-Speed 2493.93 samples/sec Loss 1.2274 LearningRate 0.000044 Epoch: 32 Global Step: 672320 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:21:24,939-Speed 2496.73 samples/sec Loss 1.2701 LearningRate 0.000044 Epoch: 32 Global Step: 672330 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:21:33,159-Speed 2492.01 samples/sec Loss 1.2715 LearningRate 0.000044 Epoch: 32 Global Step: 672340 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 
00:21:41,364-Speed 2496.25 samples/sec Loss 1.2463 LearningRate 0.000044 Epoch: 32 Global Step: 672350 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:21:49,588-Speed 2490.89 samples/sec Loss 1.2392 LearningRate 0.000044 Epoch: 32 Global Step: 672360 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:21:57,752-Speed 2508.87 samples/sec Loss 1.2402 LearningRate 0.000044 Epoch: 32 Global Step: 672370 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:22:05,956-Speed 2496.87 samples/sec Loss 1.2556 LearningRate 0.000044 Epoch: 32 Global Step: 672380 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:22:14,166-Speed 2494.73 samples/sec Loss 1.2427 LearningRate 0.000044 Epoch: 32 Global Step: 672390 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:22:22,372-Speed 2496.13 samples/sec Loss 1.2164 LearningRate 0.000044 Epoch: 32 Global Step: 672400 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:22:30,579-Speed 2495.88 samples/sec Loss 1.2465 LearningRate 0.000044 Epoch: 32 Global Step: 672410 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:22:38,787-Speed 2495.61 samples/sec Loss 1.2359 LearningRate 0.000044 Epoch: 32 Global Step: 672420 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:22:46,937-Speed 2513.20 samples/sec Loss 1.2421 LearningRate 0.000044 Epoch: 32 Global Step: 672430 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:22:55,140-Speed 2496.91 samples/sec Loss 1.2085 LearningRate 0.000044 Epoch: 32 Global Step: 672440 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:23:03,347-Speed 2495.99 samples/sec Loss 1.2151 LearningRate 0.000044 Epoch: 32 Global Step: 672450 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:23:11,548-Speed 2497.64 samples/sec Loss 1.2174 LearningRate 0.000044 Epoch: 32 Global Step: 672460 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 
00:23:19,758-Speed 2495.20 samples/sec Loss 1.2460 LearningRate 0.000044 Epoch: 32 Global Step: 672470 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:23:27,960-Speed 2497.18 samples/sec Loss 1.2444 LearningRate 0.000044 Epoch: 32 Global Step: 672480 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:23:36,109-Speed 2513.77 samples/sec Loss 1.2265 LearningRate 0.000044 Epoch: 32 Global Step: 672490 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:23:44,321-Speed 2494.12 samples/sec Loss 1.2550 LearningRate 0.000044 Epoch: 32 Global Step: 672500 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:23:52,522-Speed 2497.88 samples/sec Loss 1.1912 LearningRate 0.000044 Epoch: 32 Global Step: 672510 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:24:00,724-Speed 2497.26 samples/sec Loss 1.2309 LearningRate 0.000044 Epoch: 32 Global Step: 672520 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:24:08,931-Speed 2495.97 samples/sec Loss 1.2518 LearningRate 0.000044 Epoch: 32 Global Step: 672530 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:24:17,133-Speed 2497.14 samples/sec Loss 1.2316 LearningRate 0.000044 Epoch: 32 Global Step: 672540 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:24:25,297-Speed 2508.72 samples/sec Loss 1.2153 LearningRate 0.000044 Epoch: 32 Global Step: 672550 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:24:33,501-Speed 2496.84 samples/sec Loss 1.2210 LearningRate 0.000044 Epoch: 32 Global Step: 672560 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:24:41,704-Speed 2497.17 samples/sec Loss 1.2291 LearningRate 0.000044 Epoch: 32 Global Step: 672570 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:24:49,915-Speed 2494.82 samples/sec Loss 1.2386 LearningRate 0.000044 Epoch: 32 Global Step: 672580 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 
00:24:58,117-Speed 2497.19 samples/sec Loss 1.2394 LearningRate 0.000044 Epoch: 32 Global Step: 672590 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:25:06,320-Speed 2497.11 samples/sec Loss 1.2373 LearningRate 0.000044 Epoch: 32 Global Step: 672600 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:25:14,474-Speed 2512.23 samples/sec Loss 1.2575 LearningRate 0.000044 Epoch: 32 Global Step: 672610 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:25:22,680-Speed 2495.99 samples/sec Loss 1.2374 LearningRate 0.000044 Epoch: 32 Global Step: 672620 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:25:30,884-Speed 2496.74 samples/sec Loss 1.2519 LearningRate 0.000044 Epoch: 32 Global Step: 672630 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:25:39,089-Speed 2496.57 samples/sec Loss 1.2073 LearningRate 0.000044 Epoch: 32 Global Step: 672640 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:25:47,296-Speed 2495.85 samples/sec Loss 1.2123 LearningRate 0.000044 Epoch: 32 Global Step: 672650 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:25:55,505-Speed 2495.13 samples/sec Loss 1.2164 LearningRate 0.000044 Epoch: 32 Global Step: 672660 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:26:03,663-Speed 2510.92 samples/sec Loss 1.2204 LearningRate 0.000044 Epoch: 32 Global Step: 672670 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:26:11,867-Speed 2496.77 samples/sec Loss 1.2508 LearningRate 0.000044 Epoch: 32 Global Step: 672680 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:26:20,075-Speed 2495.32 samples/sec Loss 1.2260 LearningRate 0.000044 Epoch: 32 Global Step: 672690 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:26:28,281-Speed 2496.32 samples/sec Loss 1.2195 LearningRate 0.000044 Epoch: 32 Global Step: 672700 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 
00:26:36,496-Speed 2493.13 samples/sec Loss 1.2151 LearningRate 0.000044 Epoch: 32 Global Step: 672710 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:26:44,703-Speed 2495.74 samples/sec Loss 1.2420 LearningRate 0.000044 Epoch: 32 Global Step: 672720 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:26:52,858-Speed 2511.86 samples/sec Loss 1.2140 LearningRate 0.000044 Epoch: 32 Global Step: 672730 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:27:01,067-Speed 2495.42 samples/sec Loss 1.2046 LearningRate 0.000044 Epoch: 32 Global Step: 672740 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:27:09,280-Speed 2493.79 samples/sec Loss 1.2410 LearningRate 0.000044 Epoch: 32 Global Step: 672750 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:27:17,486-Speed 2496.10 samples/sec Loss 1.2107 LearningRate 0.000044 Epoch: 32 Global Step: 672760 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:27:25,708-Speed 2491.62 samples/sec Loss 1.1918 LearningRate 0.000044 Epoch: 32 Global Step: 672770 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:27:33,917-Speed 2495.42 samples/sec Loss 1.2108 LearningRate 0.000044 Epoch: 32 Global Step: 672780 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:27:42,071-Speed 2512.02 samples/sec Loss 1.2037 LearningRate 0.000044 Epoch: 32 Global Step: 672790 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:27:50,278-Speed 2496.00 samples/sec Loss 1.2221 LearningRate 0.000044 Epoch: 32 Global Step: 672800 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:27:58,487-Speed 2495.10 samples/sec Loss 1.1744 LearningRate 0.000044 Epoch: 32 Global Step: 672810 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:28:06,694-Speed 2495.67 samples/sec Loss 1.2330 LearningRate 0.000044 Epoch: 32 Global Step: 672820 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 
00:28:14,903-Speed 2495.55 samples/sec Loss 1.2285 LearningRate 0.000044 Epoch: 32 Global Step: 672830 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:28:23,109-Speed 2496.08 samples/sec Loss 1.2063 LearningRate 0.000044 Epoch: 32 Global Step: 672840 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:28:31,262-Speed 2512.24 samples/sec Loss 1.2315 LearningRate 0.000044 Epoch: 32 Global Step: 672850 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:28:39,466-Speed 2496.57 samples/sec Loss 1.2025 LearningRate 0.000044 Epoch: 32 Global Step: 672860 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:28:47,674-Speed 2495.57 samples/sec Loss 1.2390 LearningRate 0.000044 Epoch: 32 Global Step: 672870 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:28:55,879-Speed 2496.41 samples/sec Loss 1.1935 LearningRate 0.000044 Epoch: 32 Global Step: 672880 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:29:04,084-Speed 2496.56 samples/sec Loss 1.2217 LearningRate 0.000044 Epoch: 32 Global Step: 672890 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:29:12,293-Speed 2495.34 samples/sec Loss 1.2352 LearningRate 0.000044 Epoch: 32 Global Step: 672900 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:29:20,446-Speed 2512.32 samples/sec Loss 1.2446 LearningRate 0.000044 Epoch: 32 Global Step: 672910 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:29:28,652-Speed 2496.20 samples/sec Loss 1.2265 LearningRate 0.000044 Epoch: 32 Global Step: 672920 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:29:36,857-Speed 2496.39 samples/sec Loss 1.2248 LearningRate 0.000044 Epoch: 32 Global Step: 672930 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:29:45,067-Speed 2495.13 samples/sec Loss 1.2182 LearningRate 0.000044 Epoch: 32 Global Step: 672940 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 
00:29:53,272-Speed 2496.46 samples/sec Loss 1.2141 LearningRate 0.000044 Epoch: 32 Global Step: 672950 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:30:01,479-Speed 2495.68 samples/sec Loss 1.2203 LearningRate 0.000044 Epoch: 32 Global Step: 672960 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:30:09,628-Speed 2513.55 samples/sec Loss 1.2103 LearningRate 0.000044 Epoch: 32 Global Step: 672970 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:30:17,836-Speed 2495.83 samples/sec Loss 1.2370 LearningRate 0.000044 Epoch: 32 Global Step: 672980 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:30:26,042-Speed 2496.10 samples/sec Loss 1.2576 LearningRate 0.000044 Epoch: 32 Global Step: 672990 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:30:34,250-Speed 2495.30 samples/sec Loss 1.2498 LearningRate 0.000044 Epoch: 32 Global Step: 673000 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:30:42,455-Speed 2496.26 samples/sec Loss 1.2460 LearningRate 0.000044 Epoch: 32 Global Step: 673010 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:30:50,659-Speed 2497.01 samples/sec Loss 1.2422 LearningRate 0.000044 Epoch: 32 Global Step: 673020 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:30:58,810-Speed 2513.06 samples/sec Loss 1.2501 LearningRate 0.000044 Epoch: 32 Global Step: 673030 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:31:07,023-Speed 2493.98 samples/sec Loss 1.2158 LearningRate 0.000044 Epoch: 32 Global Step: 673040 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:31:15,232-Speed 2495.33 samples/sec Loss 1.2001 LearningRate 0.000044 Epoch: 32 Global Step: 673050 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:31:23,441-Speed 2495.27 samples/sec Loss 1.2342 LearningRate 0.000044 Epoch: 32 Global Step: 673060 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 
00:31:31,651-Speed 2494.91 samples/sec Loss 1.2548 LearningRate 0.000044 Epoch: 32 Global Step: 673070 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:31:39,868-Speed 2492.80 samples/sec Loss 1.2636 LearningRate 0.000044 Epoch: 32 Global Step: 673080 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:31:48,026-Speed 2510.80 samples/sec Loss 1.2219 LearningRate 0.000044 Epoch: 32 Global Step: 673090 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:31:56,230-Speed 2496.54 samples/sec Loss 1.2210 LearningRate 0.000044 Epoch: 32 Global Step: 673100 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:32:04,434-Speed 2496.72 samples/sec Loss 1.2103 LearningRate 0.000044 Epoch: 32 Global Step: 673110 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:32:12,645-Speed 2494.80 samples/sec Loss 1.2021 LearningRate 0.000044 Epoch: 32 Global Step: 673120 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:32:20,862-Speed 2492.68 samples/sec Loss 1.2348 LearningRate 0.000044 Epoch: 32 Global Step: 673130 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:32:29,068-Speed 2496.01 samples/sec Loss 1.2355 LearningRate 0.000044 Epoch: 32 Global Step: 673140 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:32:37,224-Speed 2511.60 samples/sec Loss 1.2443 LearningRate 0.000044 Epoch: 32 Global Step: 673150 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:32:45,431-Speed 2495.75 samples/sec Loss 1.2500 LearningRate 0.000044 Epoch: 32 Global Step: 673160 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:32:53,640-Speed 2495.17 samples/sec Loss 1.2413 LearningRate 0.000044 Epoch: 32 Global Step: 673170 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:33:01,847-Speed 2496.07 samples/sec Loss 1.2566 LearningRate 0.000044 Epoch: 32 Global Step: 673180 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 
00:33:10,054-Speed 2495.66 samples/sec Loss 1.2261 LearningRate 0.000044 Epoch: 32 Global Step: 673190 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:33:18,261-Speed 2495.90 samples/sec Loss 1.2375 LearningRate 0.000044 Epoch: 32 Global Step: 673200 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:33:26,412-Speed 2512.89 samples/sec Loss 1.2752 LearningRate 0.000044 Epoch: 32 Global Step: 673210 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:33:34,617-Speed 2496.39 samples/sec Loss 1.2134 LearningRate 0.000044 Epoch: 32 Global Step: 673220 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:33:42,823-Speed 2496.35 samples/sec Loss 1.2490 LearningRate 0.000044 Epoch: 32 Global Step: 673230 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:33:51,026-Speed 2496.99 samples/sec Loss 1.2341 LearningRate 0.000044 Epoch: 32 Global Step: 673240 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:33:59,233-Speed 2495.80 samples/sec Loss 1.2200 LearningRate 0.000044 Epoch: 32 Global Step: 673250 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:34:07,438-Speed 2496.21 samples/sec Loss 1.2506 LearningRate 0.000044 Epoch: 32 Global Step: 673260 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:34:15,593-Speed 2511.84 samples/sec Loss 1.2384 LearningRate 0.000044 Epoch: 32 Global Step: 673270 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:34:23,797-Speed 2496.73 samples/sec Loss 1.2216 LearningRate 0.000044 Epoch: 32 Global Step: 673280 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:34:32,010-Speed 2494.21 samples/sec Loss 1.2274 LearningRate 0.000044 Epoch: 32 Global Step: 673290 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:34:40,830-Speed 2322.53 samples/sec Loss 1.2578 LearningRate 0.000044 Epoch: 32 Global Step: 673300 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 
00:34:49,048-Speed 2492.41 samples/sec Loss 1.2600 LearningRate 0.000044 Epoch: 32 Global Step: 673310 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:34:57,266-Speed 2498.29 samples/sec Loss 1.1967 LearningRate 0.000044 Epoch: 32 Global Step: 673320 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:35:05,462-Speed 2515.55 samples/sec Loss 1.2129 LearningRate 0.000044 Epoch: 32 Global Step: 673330 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:35:16,589-Speed 1840.71 samples/sec Loss 1.2217 LearningRate 0.000044 Epoch: 32 Global Step: 673340 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:35:26,002-Speed 2502.18 samples/sec Loss 1.2309 LearningRate 0.000044 Epoch: 32 Global Step: 673350 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:35:34,252-Speed 2499.11 samples/sec Loss 1.2087 LearningRate 0.000044 Epoch: 32 Global Step: 673360 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:35:42,553-Speed 2500.05 samples/sec Loss 1.2422 LearningRate 0.000044 Epoch: 32 Global Step: 673370 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:35:54,852-Speed 1665.22 samples/sec Loss 1.2332 LearningRate 0.000044 Epoch: 32 Global Step: 673380 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:36:03,026-Speed 2518.54 samples/sec Loss 1.2582 LearningRate 0.000044 Epoch: 32 Global Step: 673390 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-07-12 00:36:11,204-Speed 2511.94 samples/sec Loss 1.2210 LearningRate 0.000044 Epoch: 32 Global Step: 673400 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-07-12 00:36:22,015-Speed 1894.58 samples/sec Loss 1.2239 LearningRate 0.000044 Epoch: 32 Global Step: 673410 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:36:30,211-Speed 2500.69 samples/sec Loss 1.2130 LearningRate 0.000044 Epoch: 32 Global Step: 673420 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 
00:36:38,463-Speed 2501.19 samples/sec Loss 1.2882 LearningRate 0.000044 Epoch: 32 Global Step: 673430 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:36:46,704-Speed 2495.88 samples/sec Loss 1.2667 LearningRate 0.000044 Epoch: 32 Global Step: 673440 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:36:59,306-Speed 2516.20 samples/sec Loss 1.2293 LearningRate 0.000044 Epoch: 32 Global Step: 673450 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:37:07,543-Speed 2500.53 samples/sec Loss 1.2299 LearningRate 0.000044 Epoch: 32 Global Step: 673460 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:37:16,065-Speed 2413.53 samples/sec Loss 1.2307 LearningRate 0.000044 Epoch: 32 Global Step: 673470 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:37:24,281-Speed 2493.20 samples/sec Loss 1.2498 LearningRate 0.000044 Epoch: 32 Global Step: 673480 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:37:32,523-Speed 2500.44 samples/sec Loss 1.2425 LearningRate 0.000044 Epoch: 32 Global Step: 673490 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:37:44,516-Speed 1716.86 samples/sec Loss 1.2186 LearningRate 0.000044 Epoch: 32 Global Step: 673500 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:37:52,664-Speed 2513.70 samples/sec Loss 1.2598 LearningRate 0.000044 Epoch: 32 Global Step: 673510 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:38:00,862-Speed 2498.51 samples/sec Loss 1.2210 LearningRate 0.000044 Epoch: 32 Global Step: 673520 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:38:12,128-Speed 1825.45 samples/sec Loss 1.2393 LearningRate 0.000044 Epoch: 32 Global Step: 673530 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:38:20,373-Speed 2502.02 samples/sec Loss 1.2346 LearningRate 0.000044 Epoch: 32 Global Step: 673540 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:38:28,573-Speed 
2497.98 samples/sec Loss 1.2373 LearningRate 0.000044 Epoch: 32 Global Step: 673550 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:38:42,000-Speed 1525.47 samples/sec Loss 1.2486 LearningRate 0.000044 Epoch: 32 Global Step: 673560 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:38:50,857-Speed 2318.29 samples/sec Loss 1.2412 LearningRate 0.000044 Epoch: 32 Global Step: 673570 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:39:00,480-Speed 2496.20 samples/sec Loss 1.2266 LearningRate 0.000044 Epoch: 32 Global Step: 673580 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:39:08,676-Speed 2499.28 samples/sec Loss 1.2399 LearningRate 0.000044 Epoch: 32 Global Step: 673590 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:39:16,878-Speed 2497.19 samples/sec Loss 1.2329 LearningRate 0.000044 Epoch: 32 Global Step: 673600 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:39:25,084-Speed 2496.10 samples/sec Loss 1.2443 LearningRate 0.000044 Epoch: 32 Global Step: 673610 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:39:33,281-Speed 2498.89 samples/sec Loss 1.2068 LearningRate 0.000044 Epoch: 32 Global Step: 673620 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:39:41,426-Speed 2514.93 samples/sec Loss 1.2230 LearningRate 0.000044 Epoch: 32 Global Step: 673630 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:39:49,630-Speed 2496.89 samples/sec Loss 1.2441 LearningRate 0.000044 Epoch: 32 Global Step: 673640 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:39:57,829-Speed 2498.22 samples/sec Loss 1.2107 LearningRate 0.000044 Epoch: 32 Global Step: 673650 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:40:06,031-Speed 2497.36 samples/sec Loss 1.2450 LearningRate 0.000044 Epoch: 32 Global Step: 673660 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:40:14,238-Speed 2495.99 samples/sec 
Loss 1.2669 LearningRate 0.000044 Epoch: 32 Global Step: 673670 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:40:22,435-Speed 2498.88 samples/sec Loss 1.2288 LearningRate 0.000044 Epoch: 32 Global Step: 673680 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:40:30,580-Speed 2514.65 samples/sec Loss 1.2131 LearningRate 0.000044 Epoch: 32 Global Step: 673690 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:40:38,793-Speed 2494.17 samples/sec Loss 1.2244 LearningRate 0.000044 Epoch: 32 Global Step: 673700 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:40:46,995-Speed 2497.41 samples/sec Loss 1.2340 LearningRate 0.000044 Epoch: 32 Global Step: 673710 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:40:55,198-Speed 2497.38 samples/sec Loss 1.2423 LearningRate 0.000044 Epoch: 32 Global Step: 673720 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:41:03,399-Speed 2497.53 samples/sec Loss 1.2304 LearningRate 0.000044 Epoch: 32 Global Step: 673730 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:41:11,605-Speed 2496.04 samples/sec Loss 1.2484 LearningRate 0.000044 Epoch: 32 Global Step: 673740 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:41:19,754-Speed 2513.79 samples/sec Loss 1.2323 LearningRate 0.000044 Epoch: 32 Global Step: 673750 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:41:27,961-Speed 2495.73 samples/sec Loss 1.2434 LearningRate 0.000044 Epoch: 32 Global Step: 673760 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:41:36,163-Speed 2497.41 samples/sec Loss 1.2626 LearningRate 0.000044 Epoch: 32 Global Step: 673770 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:41:44,364-Speed 2497.87 samples/sec Loss 1.2009 LearningRate 0.000044 Epoch: 32 Global Step: 673780 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:41:52,568-Speed 2496.61 samples/sec Loss 1.2433 
LearningRate 0.000044 Epoch: 32 Global Step: 673790 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:42:00,772-Speed 2496.73 samples/sec Loss 1.2152 LearningRate 0.000044 Epoch: 32 Global Step: 673800 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:42:08,920-Speed 2513.80 samples/sec Loss 1.2252 LearningRate 0.000044 Epoch: 32 Global Step: 673810 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:42:17,143-Speed 2491.13 samples/sec Loss 1.2054 LearningRate 0.000044 Epoch: 32 Global Step: 673820 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:42:25,346-Speed 2497.16 samples/sec Loss 1.2761 LearningRate 0.000044 Epoch: 32 Global Step: 673830 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:42:33,547-Speed 2497.56 samples/sec Loss 1.2384 LearningRate 0.000044 Epoch: 32 Global Step: 673840 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:42:41,749-Speed 2497.40 samples/sec Loss 1.2404 LearningRate 0.000043 Epoch: 32 Global Step: 673850 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:42:49,961-Speed 2494.42 samples/sec Loss 1.2119 LearningRate 0.000043 Epoch: 32 Global Step: 673860 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:42:58,115-Speed 2512.60 samples/sec Loss 1.1945 LearningRate 0.000043 Epoch: 32 Global Step: 673870 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:43:06,316-Speed 2497.63 samples/sec Loss 1.2731 LearningRate 0.000043 Epoch: 32 Global Step: 673880 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:43:14,522-Speed 2496.09 samples/sec Loss 1.2361 LearningRate 0.000043 Epoch: 32 Global Step: 673890 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:43:22,724-Speed 2497.29 samples/sec Loss 1.2405 LearningRate 0.000043 Epoch: 32 Global Step: 673900 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:43:30,925-Speed 2497.75 samples/sec Loss 1.2146 LearningRate 
0.000043 Epoch: 32 Global Step: 673910 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:43:39,123-Speed 2498.37 samples/sec Loss 1.2171 LearningRate 0.000043 Epoch: 32 Global Step: 673920 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:43:47,273-Speed 2513.56 samples/sec Loss 1.2230 LearningRate 0.000043 Epoch: 32 Global Step: 673930 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:43:55,472-Speed 2498.15 samples/sec Loss 1.2209 LearningRate 0.000043 Epoch: 32 Global Step: 673940 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:44:03,678-Speed 2496.26 samples/sec Loss 1.2163 LearningRate 0.000043 Epoch: 32 Global Step: 673950 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:44:11,912-Speed 2487.93 samples/sec Loss 1.2144 LearningRate 0.000043 Epoch: 32 Global Step: 673960 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:44:20,118-Speed 2495.94 samples/sec Loss 1.2351 LearningRate 0.000043 Epoch: 32 Global Step: 673970 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:44:28,323-Speed 2496.43 samples/sec Loss 1.2109 LearningRate 0.000043 Epoch: 32 Global Step: 673980 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:44:36,471-Speed 2514.07 samples/sec Loss 1.2231 LearningRate 0.000043 Epoch: 32 Global Step: 673990 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:44:44,673-Speed 2497.26 samples/sec Loss 1.2234 LearningRate 0.000043 Epoch: 32 Global Step: 674000 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:44:52,875-Speed 2497.39 samples/sec Loss 1.2445 LearningRate 0.000043 Epoch: 32 Global Step: 674010 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:45:01,079-Speed 2497.26 samples/sec Loss 1.1967 LearningRate 0.000043 Epoch: 32 Global Step: 674020 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:45:09,293-Speed 2493.44 samples/sec Loss 1.2031 LearningRate 0.000043 Epoch: 32 
Global Step: 674030 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:45:17,499-Speed 2495.96 samples/sec Loss 1.2368 LearningRate 0.000043 Epoch: 32 Global Step: 674040 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:45:25,650-Speed 2513.20 samples/sec Loss 1.2510 LearningRate 0.000043 Epoch: 32 Global Step: 674050 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:45:33,852-Speed 2497.31 samples/sec Loss 1.1886 LearningRate 0.000043 Epoch: 32 Global Step: 674060 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:45:42,052-Speed 2497.86 samples/sec Loss 1.2257 LearningRate 0.000043 Epoch: 32 Global Step: 674070 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:45:50,258-Speed 2495.96 samples/sec Loss 1.2067 LearningRate 0.000043 Epoch: 32 Global Step: 674080 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:45:58,477-Speed 2492.28 samples/sec Loss 1.2139 LearningRate 0.000043 Epoch: 32 Global Step: 674090 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:46:06,678-Speed 2497.57 samples/sec Loss 1.2510 LearningRate 0.000043 Epoch: 32 Global Step: 674100 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:46:14,829-Speed 2513.26 samples/sec Loss 1.2343 LearningRate 0.000043 Epoch: 32 Global Step: 674110 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:46:23,029-Speed 2498.00 samples/sec Loss 1.2429 LearningRate 0.000043 Epoch: 32 Global Step: 674120 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:46:31,233-Speed 2497.24 samples/sec Loss 1.2127 LearningRate 0.000043 Epoch: 32 Global Step: 674130 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:46:39,433-Speed 2497.75 samples/sec Loss 1.2026 LearningRate 0.000043 Epoch: 32 Global Step: 674140 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:46:47,631-Speed 2498.59 samples/sec Loss 1.2010 LearningRate 0.000043 Epoch: 32 Global Step: 674150 
Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:46:55,833-Speed 2497.31 samples/sec Loss 1.2337 LearningRate 0.000043 Epoch: 32 Global Step: 674160 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:47:03,979-Speed 2514.78 samples/sec Loss 1.2257 LearningRate 0.000043 Epoch: 32 Global Step: 674170 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:47:12,179-Speed 2497.73 samples/sec Loss 1.2231 LearningRate 0.000043 Epoch: 32 Global Step: 674180 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:47:20,392-Speed 2493.89 samples/sec Loss 1.2156 LearningRate 0.000043 Epoch: 32 Global Step: 674190 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:47:28,595-Speed 2497.17 samples/sec Loss 1.2419 LearningRate 0.000043 Epoch: 32 Global Step: 674200 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:47:36,803-Speed 2495.35 samples/sec Loss 1.2563 LearningRate 0.000043 Epoch: 32 Global Step: 674210 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:47:45,021-Speed 2492.59 samples/sec Loss 1.2397 LearningRate 0.000043 Epoch: 32 Global Step: 674220 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:47:53,167-Speed 2514.32 samples/sec Loss 1.1908 LearningRate 0.000043 Epoch: 32 Global Step: 674230 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:48:01,365-Speed 2498.84 samples/sec Loss 1.1987 LearningRate 0.000043 Epoch: 32 Global Step: 674240 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:48:09,566-Speed 2497.70 samples/sec Loss 1.2369 LearningRate 0.000043 Epoch: 32 Global Step: 674250 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:48:17,764-Speed 2498.57 samples/sec Loss 1.2328 LearningRate 0.000043 Epoch: 32 Global Step: 674260 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:48:25,983-Speed 2492.08 samples/sec Loss 1.2206 LearningRate 0.000043 Epoch: 32 Global Step: 674270 Fp16 Grad Scale: 
8192 Required: 36 hours Training: 2022-07-12 00:48:34,183-Speed 2498.18 samples/sec Loss 1.2168 LearningRate 0.000043 Epoch: 32 Global Step: 674280 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:48:42,337-Speed 2512.10 samples/sec Loss 1.2029 LearningRate 0.000043 Epoch: 32 Global Step: 674290 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:48:50,534-Speed 2498.59 samples/sec Loss 1.2447 LearningRate 0.000043 Epoch: 32 Global Step: 674300 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:48:58,734-Speed 2497.85 samples/sec Loss 1.2064 LearningRate 0.000043 Epoch: 32 Global Step: 674310 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:49:06,932-Speed 2498.74 samples/sec Loss 1.2395 LearningRate 0.000043 Epoch: 32 Global Step: 674320 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:49:15,128-Speed 2499.42 samples/sec Loss 1.2004 LearningRate 0.000043 Epoch: 32 Global Step: 674330 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:49:23,327-Speed 2498.10 samples/sec Loss 1.2316 LearningRate 0.000043 Epoch: 32 Global Step: 674340 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:49:31,494-Speed 2508.29 samples/sec Loss 1.2277 LearningRate 0.000043 Epoch: 32 Global Step: 674350 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:49:39,693-Speed 2498.49 samples/sec Loss 1.2402 LearningRate 0.000043 Epoch: 32 Global Step: 674360 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:49:47,895-Speed 2497.27 samples/sec Loss 1.2030 LearningRate 0.000043 Epoch: 32 Global Step: 674370 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:49:56,096-Speed 2497.47 samples/sec Loss 1.2232 LearningRate 0.000043 Epoch: 32 Global Step: 674380 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:50:04,296-Speed 2498.28 samples/sec Loss 1.2285 LearningRate 0.000043 Epoch: 32 Global Step: 674390 Fp16 Grad Scale: 8192 Required: 36 
hours Training: 2022-07-12 00:50:12,503-Speed 2495.83 samples/sec Loss 1.2199 LearningRate 0.000043 Epoch: 32 Global Step: 674400 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:50:20,652-Speed 2513.55 samples/sec Loss 1.2372 LearningRate 0.000043 Epoch: 32 Global Step: 674410 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:50:28,850-Speed 2498.50 samples/sec Loss 1.2591 LearningRate 0.000043 Epoch: 32 Global Step: 674420 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:50:37,048-Speed 2498.50 samples/sec Loss 1.2076 LearningRate 0.000043 Epoch: 32 Global Step: 674430 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:50:45,261-Speed 2494.06 samples/sec Loss 1.2385 LearningRate 0.000043 Epoch: 32 Global Step: 674440 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:50:53,458-Speed 2498.80 samples/sec Loss 1.2333 LearningRate 0.000043 Epoch: 32 Global Step: 674450 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:51:01,673-Speed 2493.58 samples/sec Loss 1.2274 LearningRate 0.000043 Epoch: 32 Global Step: 674460 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:51:09,831-Speed 2510.90 samples/sec Loss 1.2048 LearningRate 0.000043 Epoch: 32 Global Step: 674470 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:51:18,032-Speed 2497.74 samples/sec Loss 1.2362 LearningRate 0.000043 Epoch: 32 Global Step: 674480 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:51:26,239-Speed 2495.80 samples/sec Loss 1.2270 LearningRate 0.000043 Epoch: 32 Global Step: 674490 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-07-12 00:51:34,441-Speed 2497.29 samples/sec Loss 1.2525 LearningRate 0.000043 Epoch: 32 Global Step: 674500 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 00:51:42,654-Speed 2494.08 samples/sec Loss 1.2553 LearningRate 0.000043 Epoch: 32 Global Step: 674510 Fp16 Grad Scale: 8192 Required: 35 hours Training: 
2022-07-12 00:51:50,860-Speed 2496.12 samples/sec Loss 1.2020 LearningRate 0.000043 Epoch: 32 Global Step: 674520 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 00:51:59,006-Speed 2514.48 samples/sec Loss 1.2087 LearningRate 0.000043 Epoch: 32 Global Step: 674530 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 00:52:07,216-Speed 2494.87 samples/sec Loss 1.2671 LearningRate 0.000043 Epoch: 32 Global Step: 674540 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 00:52:15,416-Speed 2497.91 samples/sec Loss 1.2552 LearningRate 0.000043 Epoch: 32 Global Step: 674550 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 00:52:23,618-Speed 2497.52 samples/sec Loss 1.2424 LearningRate 0.000043 Epoch: 32 Global Step: 674560 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 00:52:31,817-Speed 2498.18 samples/sec Loss 1.2216 LearningRate 0.000043 Epoch: 32 Global Step: 674570 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 00:52:40,017-Speed 2497.98 samples/sec Loss 1.2283 LearningRate 0.000043 Epoch: 32 Global Step: 674580 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 00:52:48,164-Speed 2514.28 samples/sec Loss 1.2687 LearningRate 0.000043 Epoch: 32 Global Step: 674590 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 00:52:56,368-Speed 2496.69 samples/sec Loss 1.2314 LearningRate 0.000043 Epoch: 32 Global Step: 674600 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 00:53:04,565-Speed 2498.67 samples/sec Loss 1.2099 LearningRate 0.000043 Epoch: 32 Global Step: 674610 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:53:12,769-Speed 2497.06 samples/sec Loss 1.2417 LearningRate 0.000043 Epoch: 32 Global Step: 674620 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:53:20,980-Speed 2494.61 samples/sec Loss 1.2397 LearningRate 0.000043 Epoch: 32 Global Step: 674630 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 
00:53:29,187-Speed 2495.63 samples/sec Loss 1.2237 LearningRate 0.000043 Epoch: 32 Global Step: 674640 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:53:37,334-Speed 2514.26 samples/sec Loss 1.2725 LearningRate 0.000043 Epoch: 32 Global Step: 674650 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:53:45,533-Speed 2498.32 samples/sec Loss 1.2022 LearningRate 0.000043 Epoch: 32 Global Step: 674660 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:53:53,747-Speed 2493.68 samples/sec Loss 1.2146 LearningRate 0.000043 Epoch: 32 Global Step: 674670 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:54:01,953-Speed 2496.03 samples/sec Loss 1.2307 LearningRate 0.000043 Epoch: 32 Global Step: 674680 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:54:10,165-Speed 2494.47 samples/sec Loss 1.2269 LearningRate 0.000043 Epoch: 32 Global Step: 674690 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:54:18,370-Speed 2496.33 samples/sec Loss 1.2117 LearningRate 0.000043 Epoch: 32 Global Step: 674700 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:54:26,512-Speed 2515.62 samples/sec Loss 1.2229 LearningRate 0.000043 Epoch: 32 Global Step: 674710 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:54:34,712-Speed 2497.88 samples/sec Loss 1.1994 LearningRate 0.000043 Epoch: 32 Global Step: 674720 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:54:42,915-Speed 2497.10 samples/sec Loss 1.2466 LearningRate 0.000043 Epoch: 32 Global Step: 674730 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:54:51,128-Speed 2494.13 samples/sec Loss 1.2264 LearningRate 0.000043 Epoch: 32 Global Step: 674740 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:54:59,331-Speed 2496.87 samples/sec Loss 1.2284 LearningRate 0.000043 Epoch: 32 Global Step: 674750 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 
00:55:07,532-Speed 2497.80 samples/sec Loss 1.2368 LearningRate 0.000043 Epoch: 32 Global Step: 674760 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:55:15,680-Speed 2513.80 samples/sec Loss 1.2066 LearningRate 0.000043 Epoch: 32 Global Step: 674770 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:55:23,883-Speed 2497.18 samples/sec Loss 1.2253 LearningRate 0.000043 Epoch: 32 Global Step: 674780 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:55:32,089-Speed 2496.26 samples/sec Loss 1.2106 LearningRate 0.000043 Epoch: 32 Global Step: 674790 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:55:40,294-Speed 2496.56 samples/sec Loss 1.2313 LearningRate 0.000043 Epoch: 32 Global Step: 674800 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:55:48,499-Speed 2496.47 samples/sec Loss 1.2121 LearningRate 0.000043 Epoch: 32 Global Step: 674810 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:55:56,700-Speed 2497.92 samples/sec Loss 1.2502 LearningRate 0.000043 Epoch: 32 Global Step: 674820 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:56:04,848-Speed 2513.72 samples/sec Loss 1.2137 LearningRate 0.000043 Epoch: 32 Global Step: 674830 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:56:13,051-Speed 2497.15 samples/sec Loss 1.1899 LearningRate 0.000043 Epoch: 32 Global Step: 674840 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:56:21,253-Speed 2497.39 samples/sec Loss 1.2301 LearningRate 0.000043 Epoch: 32 Global Step: 674850 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:56:29,465-Speed 2494.25 samples/sec Loss 1.2446 LearningRate 0.000043 Epoch: 32 Global Step: 674860 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:56:37,664-Speed 2498.16 samples/sec Loss 1.2194 LearningRate 0.000043 Epoch: 32 Global Step: 674870 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 
00:56:45,862-Speed 2498.51 samples/sec Loss 1.2306 LearningRate 0.000043 Epoch: 32 Global Step: 674880 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:56:54,012-Speed 2513.66 samples/sec Loss 1.2479 LearningRate 0.000043 Epoch: 32 Global Step: 674890 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:57:02,209-Speed 2499.06 samples/sec Loss 1.2443 LearningRate 0.000043 Epoch: 32 Global Step: 674900 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:57:10,410-Speed 2497.83 samples/sec Loss 1.2512 LearningRate 0.000043 Epoch: 32 Global Step: 674910 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:57:18,604-Speed 2499.76 samples/sec Loss 1.2560 LearningRate 0.000043 Epoch: 32 Global Step: 674920 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:57:26,804-Speed 2498.20 samples/sec Loss 1.2235 LearningRate 0.000043 Epoch: 32 Global Step: 674930 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:57:35,002-Speed 2498.25 samples/sec Loss 1.2323 LearningRate 0.000043 Epoch: 32 Global Step: 674940 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:57:43,162-Speed 2510.26 samples/sec Loss 1.2230 LearningRate 0.000043 Epoch: 32 Global Step: 674950 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:57:51,364-Speed 2497.87 samples/sec Loss 1.2222 LearningRate 0.000043 Epoch: 32 Global Step: 674960 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:57:59,562-Speed 2498.38 samples/sec Loss 1.2384 LearningRate 0.000043 Epoch: 32 Global Step: 674970 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:58:07,764-Speed 2497.49 samples/sec Loss 1.2350 LearningRate 0.000043 Epoch: 32 Global Step: 674980 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:58:15,972-Speed 2495.31 samples/sec Loss 1.2105 LearningRate 0.000043 Epoch: 32 Global Step: 674990 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 
00:58:24,172-Speed 2498.25 samples/sec Loss 1.2022 LearningRate 0.000043 Epoch: 32 Global Step: 675000 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:58:32,328-Speed 2511.81 samples/sec Loss 1.2125 LearningRate 0.000043 Epoch: 32 Global Step: 675010 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:58:40,530-Speed 2497.02 samples/sec Loss 1.2380 LearningRate 0.000043 Epoch: 32 Global Step: 675020 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:58:48,731-Speed 2497.68 samples/sec Loss 1.2351 LearningRate 0.000043 Epoch: 32 Global Step: 675030 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:58:56,937-Speed 2496.31 samples/sec Loss 1.2232 LearningRate 0.000043 Epoch: 32 Global Step: 675040 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:59:05,144-Speed 2495.88 samples/sec Loss 1.2293 LearningRate 0.000043 Epoch: 32 Global Step: 675050 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:59:13,355-Speed 2494.48 samples/sec Loss 1.2118 LearningRate 0.000043 Epoch: 32 Global Step: 675060 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:59:21,505-Speed 2513.39 samples/sec Loss 1.2265 LearningRate 0.000043 Epoch: 32 Global Step: 675070 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:59:29,706-Speed 2497.64 samples/sec Loss 1.2263 LearningRate 0.000043 Epoch: 32 Global Step: 675080 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:59:37,910-Speed 2496.95 samples/sec Loss 1.2325 LearningRate 0.000043 Epoch: 32 Global Step: 675090 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:59:46,111-Speed 2497.94 samples/sec Loss 1.2072 LearningRate 0.000043 Epoch: 32 Global Step: 675100 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 00:59:54,311-Speed 2498.03 samples/sec Loss 1.2251 LearningRate 0.000043 Epoch: 32 Global Step: 675110 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 
01:00:02,510-Speed 2498.06 samples/sec Loss 1.2297 LearningRate 0.000043 Epoch: 32 Global Step: 675120 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:00:10,658-Speed 2513.98 samples/sec Loss 1.2297 LearningRate 0.000043 Epoch: 32 Global Step: 675130 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:00:18,887-Speed 2489.29 samples/sec Loss 1.2397 LearningRate 0.000043 Epoch: 32 Global Step: 675140 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:00:27,086-Speed 2498.10 samples/sec Loss 1.1967 LearningRate 0.000043 Epoch: 32 Global Step: 675150 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:00:35,289-Speed 2497.32 samples/sec Loss 1.2129 LearningRate 0.000043 Epoch: 32 Global Step: 675160 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:00:43,488-Speed 2498.13 samples/sec Loss 1.2028 LearningRate 0.000043 Epoch: 32 Global Step: 675170 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:00:51,701-Speed 2493.97 samples/sec Loss 1.2183 LearningRate 0.000043 Epoch: 32 Global Step: 675180 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:00:59,852-Speed 2513.12 samples/sec Loss 1.2276 LearningRate 0.000043 Epoch: 32 Global Step: 675190 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:01:08,050-Speed 2498.59 samples/sec Loss 1.1957 LearningRate 0.000043 Epoch: 32 Global Step: 675200 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:01:16,253-Speed 2497.13 samples/sec Loss 1.2045 LearningRate 0.000043 Epoch: 32 Global Step: 675210 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:01:24,452-Speed 2498.21 samples/sec Loss 1.2328 LearningRate 0.000043 Epoch: 32 Global Step: 675220 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:01:32,651-Speed 2498.27 samples/sec Loss 1.2261 LearningRate 0.000043 Epoch: 32 Global Step: 675230 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 
01:01:40,861-Speed 2494.89 samples/sec Loss 1.2337 LearningRate 0.000043 Epoch: 32 Global Step: 675240 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:01:49,009-Speed 2513.92 samples/sec Loss 1.2192 LearningRate 0.000043 Epoch: 32 Global Step: 675250 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:01:57,213-Speed 2496.70 samples/sec Loss 1.1958 LearningRate 0.000043 Epoch: 32 Global Step: 675260 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:02:05,411-Speed 2498.43 samples/sec Loss 1.1975 LearningRate 0.000043 Epoch: 32 Global Step: 675270 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:02:13,610-Speed 2498.42 samples/sec Loss 1.2603 LearningRate 0.000043 Epoch: 32 Global Step: 675280 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:02:21,816-Speed 2495.91 samples/sec Loss 1.2402 LearningRate 0.000043 Epoch: 32 Global Step: 675290 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:02:30,018-Speed 2497.43 samples/sec Loss 1.2240 LearningRate 0.000043 Epoch: 32 Global Step: 675300 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:02:38,179-Speed 2509.91 samples/sec Loss 1.2186 LearningRate 0.000043 Epoch: 32 Global Step: 675310 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:02:46,382-Speed 2497.47 samples/sec Loss 1.2047 LearningRate 0.000043 Epoch: 32 Global Step: 675320 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:02:54,584-Speed 2497.37 samples/sec Loss 1.2194 LearningRate 0.000043 Epoch: 32 Global Step: 675330 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:03:02,786-Speed 2497.29 samples/sec Loss 1.2390 LearningRate 0.000043 Epoch: 32 Global Step: 675340 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:03:10,987-Speed 2497.89 samples/sec Loss 1.1870 LearningRate 0.000043 Epoch: 32 Global Step: 675350 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 
01:03:19,192-Speed 2496.58 samples/sec Loss 1.1841 LearningRate 0.000043 Epoch: 32 Global Step: 675360 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:03:27,346-Speed 2511.95 samples/sec Loss 1.1824 LearningRate 0.000043 Epoch: 32 Global Step: 675370 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:03:35,544-Speed 2498.69 samples/sec Loss 1.2327 LearningRate 0.000043 Epoch: 32 Global Step: 675380 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:03:43,755-Speed 2494.37 samples/sec Loss 1.2412 LearningRate 0.000043 Epoch: 32 Global Step: 675390 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:03:51,961-Speed 2496.24 samples/sec Loss 1.2194 LearningRate 0.000043 Epoch: 32 Global Step: 675400 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:04:00,159-Speed 2498.70 samples/sec Loss 1.2033 LearningRate 0.000043 Epoch: 32 Global Step: 675410 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:04:08,359-Speed 2497.88 samples/sec Loss 1.2050 LearningRate 0.000043 Epoch: 32 Global Step: 675420 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:04:16,520-Speed 2510.01 samples/sec Loss 1.2254 LearningRate 0.000043 Epoch: 32 Global Step: 675430 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:04:24,734-Speed 2493.71 samples/sec Loss 1.2373 LearningRate 0.000043 Epoch: 32 Global Step: 675440 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:04:32,939-Speed 2496.29 samples/sec Loss 1.2083 LearningRate 0.000043 Epoch: 32 Global Step: 675450 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:04:41,138-Speed 2498.31 samples/sec Loss 1.2118 LearningRate 0.000043 Epoch: 32 Global Step: 675460 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:04:49,338-Speed 2497.89 samples/sec Loss 1.2045 LearningRate 0.000043 Epoch: 32 Global Step: 675470 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 
01:04:57,543-Speed 2496.67 samples/sec Loss 1.2443 LearningRate 0.000043 Epoch: 32 Global Step: 675480 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:05:05,690-Speed 2514.09 samples/sec Loss 1.2014 LearningRate 0.000043 Epoch: 32 Global Step: 675490 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:05:13,899-Speed 2495.57 samples/sec Loss 1.2066 LearningRate 0.000043 Epoch: 32 Global Step: 675500 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:05:22,096-Speed 2498.88 samples/sec Loss 1.2151 LearningRate 0.000043 Epoch: 32 Global Step: 675510 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:05:30,296-Speed 2498.09 samples/sec Loss 1.2287 LearningRate 0.000043 Epoch: 32 Global Step: 675520 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:05:38,499-Speed 2496.96 samples/sec Loss 1.2032 LearningRate 0.000043 Epoch: 32 Global Step: 675530 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:05:46,701-Speed 2497.58 samples/sec Loss 1.2133 LearningRate 0.000043 Epoch: 32 Global Step: 675540 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:05:54,850-Speed 2513.44 samples/sec Loss 1.2462 LearningRate 0.000043 Epoch: 32 Global Step: 675550 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:06:03,050-Speed 2497.86 samples/sec Loss 1.2387 LearningRate 0.000043 Epoch: 32 Global Step: 675560 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:06:11,250-Speed 2498.09 samples/sec Loss 1.2024 LearningRate 0.000043 Epoch: 32 Global Step: 675570 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:06:19,449-Speed 2498.37 samples/sec Loss 1.2082 LearningRate 0.000043 Epoch: 32 Global Step: 675580 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:06:27,647-Speed 2498.50 samples/sec Loss 1.1874 LearningRate 0.000043 Epoch: 32 Global Step: 675590 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 
01:06:35,847-Speed 2497.84 samples/sec Loss 1.2277 LearningRate 0.000043 Epoch: 32 Global Step: 675600 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:06:43,999-Speed 2512.55 samples/sec Loss 1.2442 LearningRate 0.000043 Epoch: 32 Global Step: 675610 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:06:52,199-Speed 2498.02 samples/sec Loss 1.1915 LearningRate 0.000043 Epoch: 32 Global Step: 675620 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:07:00,404-Speed 2496.38 samples/sec Loss 1.2437 LearningRate 0.000043 Epoch: 32 Global Step: 675630 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:07:08,604-Speed 2497.95 samples/sec Loss 1.2325 LearningRate 0.000043 Epoch: 32 Global Step: 675640 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:07:16,806-Speed 2497.30 samples/sec Loss 1.2145 LearningRate 0.000042 Epoch: 32 Global Step: 675650 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:07:25,007-Speed 2497.73 samples/sec Loss 1.2349 LearningRate 0.000042 Epoch: 32 Global Step: 675660 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:07:33,154-Speed 2514.25 samples/sec Loss 1.1802 LearningRate 0.000042 Epoch: 32 Global Step: 675670 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:07:41,354-Speed 2497.98 samples/sec Loss 1.1930 LearningRate 0.000042 Epoch: 32 Global Step: 675680 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:07:49,553-Speed 2498.01 samples/sec Loss 1.2069 LearningRate 0.000042 Epoch: 32 Global Step: 675690 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:07:57,754-Speed 2497.81 samples/sec Loss 1.2205 LearningRate 0.000042 Epoch: 32 Global Step: 675700 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:08:05,970-Speed 2493.09 samples/sec Loss 1.2185 LearningRate 0.000042 Epoch: 32 Global Step: 675710 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 
01:08:14,170-Speed 2497.85 samples/sec Loss 1.2059 LearningRate 0.000042 Epoch: 32 Global Step: 675720 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:08:22,316-Speed 2514.76 samples/sec Loss 1.2275 LearningRate 0.000042 Epoch: 32 Global Step: 675730 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:08:30,518-Speed 2497.24 samples/sec Loss 1.2256 LearningRate 0.000042 Epoch: 32 Global Step: 675740 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:08:38,717-Speed 2498.33 samples/sec Loss 1.2196 LearningRate 0.000042 Epoch: 32 Global Step: 675750 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:08:46,916-Speed 2498.37 samples/sec Loss 1.2405 LearningRate 0.000042 Epoch: 32 Global Step: 675760 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:08:55,117-Speed 2497.79 samples/sec Loss 1.2437 LearningRate 0.000042 Epoch: 32 Global Step: 675770 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:09:03,322-Speed 2496.13 samples/sec Loss 1.2221 LearningRate 0.000042 Epoch: 32 Global Step: 675780 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:09:11,474-Speed 2512.72 samples/sec Loss 1.1981 LearningRate 0.000042 Epoch: 32 Global Step: 675790 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:09:19,673-Speed 2498.17 samples/sec Loss 1.2149 LearningRate 0.000042 Epoch: 32 Global Step: 675800 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:09:27,873-Speed 2497.97 samples/sec Loss 1.2146 LearningRate 0.000042 Epoch: 32 Global Step: 675810 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:09:36,074-Speed 2497.92 samples/sec Loss 1.2394 LearningRate 0.000042 Epoch: 32 Global Step: 675820 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:09:44,273-Speed 2497.89 samples/sec Loss 1.2050 LearningRate 0.000042 Epoch: 32 Global Step: 675830 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:09:52,474-Speed 2497.94 samples/sec Loss 1.2447 LearningRate 0.000042 Epoch: 32 Global Step: 675840 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:10:00,622-Speed 2513.77 samples/sec Loss 1.1778 LearningRate 0.000042 Epoch: 32 Global Step: 675850 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:10:08,831-Speed 2495.09 samples/sec Loss 1.2179 LearningRate 0.000042 Epoch: 32 Global Step: 675860 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:10:17,033-Speed 2497.37 samples/sec Loss 1.1990 LearningRate 0.000042 Epoch: 32 Global Step: 675870 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:10:25,232-Speed 2498.22 samples/sec Loss 1.2057 LearningRate 0.000042 Epoch: 32 Global Step: 675880 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:10:33,438-Speed 2496.16 samples/sec Loss 1.2311 LearningRate 0.000042 Epoch: 32 Global Step: 675890 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:10:41,642-Speed 2496.84 samples/sec Loss 1.2140 LearningRate 0.000042 Epoch: 32 Global Step: 675900 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:10:49,790-Speed 2513.66 samples/sec Loss 1.2319 LearningRate 0.000042 Epoch: 32 Global Step: 675910 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:10:58,001-Speed 2494.94 samples/sec Loss 1.2232 LearningRate 0.000042 Epoch: 32 Global Step: 675920 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:11:06,201-Speed 2497.92 samples/sec Loss 1.2172 LearningRate 0.000042 Epoch: 32 Global Step: 675930 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:11:14,402-Speed 2497.70 samples/sec Loss 1.2160 LearningRate 0.000042 Epoch: 32 Global Step: 675940 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:11:22,602-Speed 2497.89 samples/sec Loss 1.2352 LearningRate 0.000042 Epoch: 32 Global Step: 675950 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:11:30,817-Speed 2493.45 samples/sec Loss 1.2042 LearningRate 0.000042 Epoch: 32 Global Step: 675960 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:11:38,963-Speed 2514.39 samples/sec Loss 1.2045 LearningRate 0.000042 Epoch: 32 Global Step: 675970 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:11:47,166-Speed 2497.23 samples/sec Loss 1.2011 LearningRate 0.000042 Epoch: 32 Global Step: 675980 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:11:55,371-Speed 2496.46 samples/sec Loss 1.2153 LearningRate 0.000042 Epoch: 32 Global Step: 675990 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:12:03,584-Speed 2494.03 samples/sec Loss 1.2018 LearningRate 0.000042 Epoch: 32 Global Step: 676000 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:12:11,788-Speed 2496.71 samples/sec Loss 1.1976 LearningRate 0.000042 Epoch: 32 Global Step: 676010 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:12:19,992-Speed 2497.02 samples/sec Loss 1.2000 LearningRate 0.000042 Epoch: 32 Global Step: 676020 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:12:28,143-Speed 2512.81 samples/sec Loss 1.1577 LearningRate 0.000042 Epoch: 32 Global Step: 676030 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:12:36,345-Speed 2497.57 samples/sec Loss 1.2235 LearningRate 0.000042 Epoch: 32 Global Step: 676040 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:12:44,559-Speed 2493.83 samples/sec Loss 1.2312 LearningRate 0.000042 Epoch: 32 Global Step: 676050 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:12:52,757-Speed 2498.38 samples/sec Loss 1.2116 LearningRate 0.000042 Epoch: 32 Global Step: 676060 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:13:00,959-Speed 2497.52 samples/sec Loss 1.2092 LearningRate 0.000042 Epoch: 32 Global Step: 676070 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:13:09,160-Speed 2497.55 samples/sec Loss 1.1923 LearningRate 0.000042 Epoch: 32 Global Step: 676080 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:13:17,305-Speed 2514.90 samples/sec Loss 1.2296 LearningRate 0.000042 Epoch: 32 Global Step: 676090 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:13:25,506-Speed 2497.95 samples/sec Loss 1.2291 LearningRate 0.000042 Epoch: 32 Global Step: 676100 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:13:33,711-Speed 2496.24 samples/sec Loss 1.2280 LearningRate 0.000042 Epoch: 32 Global Step: 676110 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:13:41,913-Speed 2497.52 samples/sec Loss 1.2022 LearningRate 0.000042 Epoch: 32 Global Step: 676120 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:13:50,117-Speed 2496.53 samples/sec Loss 1.1874 LearningRate 0.000042 Epoch: 32 Global Step: 676130 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:13:58,317-Speed 2497.83 samples/sec Loss 1.2131 LearningRate 0.000042 Epoch: 32 Global Step: 676140 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:14:06,470-Speed 2512.72 samples/sec Loss 1.2132 LearningRate 0.000042 Epoch: 32 Global Step: 676150 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:14:14,670-Speed 2498.00 samples/sec Loss 1.2102 LearningRate 0.000042 Epoch: 32 Global Step: 676160 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:14:22,895-Speed 2490.31 samples/sec Loss 1.2569 LearningRate 0.000042 Epoch: 32 Global Step: 676170 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:14:31,108-Speed 2494.10 samples/sec Loss 1.2152 LearningRate 0.000042 Epoch: 32 Global Step: 676180 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:14:39,316-Speed 2496.27 samples/sec Loss 1.2228 LearningRate 0.000042 Epoch: 32 Global Step: 676190 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:14:47,522-Speed 2496.21 samples/sec Loss 1.2020 LearningRate 0.000042 Epoch: 32 Global Step: 676200 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:14:55,675-Speed 2512.44 samples/sec Loss 1.2003 LearningRate 0.000042 Epoch: 32 Global Step: 676210 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:15:03,883-Speed 2495.36 samples/sec Loss 1.2618 LearningRate 0.000042 Epoch: 32 Global Step: 676220 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:15:12,092-Speed 2495.46 samples/sec Loss 1.2116 LearningRate 0.000042 Epoch: 32 Global Step: 676230 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:15:20,311-Speed 2492.26 samples/sec Loss 1.2021 LearningRate 0.000042 Epoch: 32 Global Step: 676240 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:15:28,520-Speed 2495.28 samples/sec Loss 1.2142 LearningRate 0.000042 Epoch: 32 Global Step: 676250 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:15:36,725-Speed 2496.24 samples/sec Loss 1.2119 LearningRate 0.000042 Epoch: 32 Global Step: 676260 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:15:44,877-Speed 2512.67 samples/sec Loss 1.2089 LearningRate 0.000042 Epoch: 32 Global Step: 676270 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:15:53,085-Speed 2495.51 samples/sec Loss 1.1930 LearningRate 0.000042 Epoch: 32 Global Step: 676280 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:16:01,289-Speed 2496.82 samples/sec Loss 1.2262 LearningRate 0.000042 Epoch: 32 Global Step: 676290 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:16:09,492-Speed 2496.99 samples/sec Loss 1.2055 LearningRate 0.000042 Epoch: 32 Global Step: 676300 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:16:17,698-Speed 2496.25 samples/sec Loss 1.2295 LearningRate 0.000042 Epoch: 32 Global Step: 676310 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:16:25,899-Speed 2497.66 samples/sec Loss 1.1860 LearningRate 0.000042 Epoch: 32 Global Step: 676320 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:16:34,062-Speed 2509.45 samples/sec Loss 1.2426 LearningRate 0.000042 Epoch: 32 Global Step: 676330 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:16:42,264-Speed 2497.18 samples/sec Loss 1.2232 LearningRate 0.000042 Epoch: 32 Global Step: 676340 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:16:50,471-Speed 2496.42 samples/sec Loss 1.2125 LearningRate 0.000042 Epoch: 32 Global Step: 676350 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:16:58,677-Speed 2495.94 samples/sec Loss 1.2476 LearningRate 0.000042 Epoch: 32 Global Step: 676360 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:17:06,881-Speed 2496.78 samples/sec Loss 1.2064 LearningRate 0.000042 Epoch: 32 Global Step: 676370 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:17:15,087-Speed 2496.36 samples/sec Loss 1.1938 LearningRate 0.000042 Epoch: 32 Global Step: 676380 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:17:23,235-Speed 2513.98 samples/sec Loss 1.2508 LearningRate 0.000042 Epoch: 32 Global Step: 676390 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:17:31,442-Speed 2495.87 samples/sec Loss 1.2403 LearningRate 0.000042 Epoch: 32 Global Step: 676400 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:17:39,662-Speed 2491.67 samples/sec Loss 1.2226 LearningRate 0.000042 Epoch: 32 Global Step: 676410 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:17:47,879-Speed 2493.13 samples/sec Loss 1.2277 LearningRate 0.000042 Epoch: 32 Global Step: 676420 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:17:56,083-Speed 2496.56 samples/sec Loss 1.2128 LearningRate 0.000042 Epoch: 32 Global Step: 676430 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:18:04,287-Speed 2496.71 samples/sec Loss 1.1908 LearningRate 0.000042 Epoch: 32 Global Step: 676440 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:18:12,442-Speed 2512.04 samples/sec Loss 1.2398 LearningRate 0.000042 Epoch: 32 Global Step: 676450 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:18:20,647-Speed 2496.32 samples/sec Loss 1.2117 LearningRate 0.000042 Epoch: 32 Global Step: 676460 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:18:28,853-Speed 2495.98 samples/sec Loss 1.2122 LearningRate 0.000042 Epoch: 32 Global Step: 676470 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:18:37,060-Speed 2495.84 samples/sec Loss 1.2234 LearningRate 0.000042 Epoch: 32 Global Step: 676480 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:18:45,261-Speed 2497.65 samples/sec Loss 1.2071 LearningRate 0.000042 Epoch: 32 Global Step: 676490 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:18:53,468-Speed 2495.70 samples/sec Loss 1.2404 LearningRate 0.000042 Epoch: 32 Global Step: 676500 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:19:01,620-Speed 2512.79 samples/sec Loss 1.1982 LearningRate 0.000042 Epoch: 32 Global Step: 676510 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:19:09,831-Speed 2494.52 samples/sec Loss 1.2159 LearningRate 0.000042 Epoch: 32 Global Step: 676520 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:19:18,034-Speed 2497.04 samples/sec Loss 1.2111 LearningRate 0.000042 Epoch: 32 Global Step: 676530 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:19:26,235-Speed 2497.60 samples/sec Loss 1.2189 LearningRate 0.000042 Epoch: 32 Global Step: 676540 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:19:34,438-Speed 2496.90 samples/sec Loss 1.1980 LearningRate 0.000042 Epoch: 32 Global Step: 676550 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:19:42,651-Speed 2494.04 samples/sec Loss 1.2176 LearningRate 0.000042 Epoch: 32 Global Step: 676560 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:19:50,800-Speed 2513.61 samples/sec Loss 1.2210 LearningRate 0.000042 Epoch: 32 Global Step: 676570 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:19:59,026-Speed 2490.29 samples/sec Loss 1.2452 LearningRate 0.000042 Epoch: 32 Global Step: 676580 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:20:07,233-Speed 2495.80 samples/sec Loss 1.2136 LearningRate 0.000042 Epoch: 32 Global Step: 676590 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:20:15,432-Speed 2498.17 samples/sec Loss 1.2424 LearningRate 0.000042 Epoch: 32 Global Step: 676600 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:20:23,637-Speed 2496.71 samples/sec Loss 1.2024 LearningRate 0.000042 Epoch: 32 Global Step: 676610 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:20:31,841-Speed 2496.93 samples/sec Loss 1.2286 LearningRate 0.000042 Epoch: 32 Global Step: 676620 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:20:39,993-Speed 2512.63 samples/sec Loss 1.2059 LearningRate 0.000042 Epoch: 32 Global Step: 676630 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:20:48,197-Speed 2496.96 samples/sec Loss 1.2501 LearningRate 0.000042 Epoch: 32 Global Step: 676640 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:20:56,400-Speed 2497.04 samples/sec Loss 1.2020 LearningRate 0.000042 Epoch: 32 Global Step: 676650 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:21:04,604-Speed 2496.69 samples/sec Loss 1.1861 LearningRate 0.000042 Epoch: 32 Global Step: 676660 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:21:12,808-Speed 2496.81 samples/sec Loss 1.2342 LearningRate 0.000042 Epoch: 32 Global Step: 676670 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:21:21,011-Speed 2496.97 samples/sec Loss 1.1988 LearningRate 0.000042 Epoch: 32 Global Step: 676680 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:21:29,161-Speed 2513.43 samples/sec Loss 1.2035 LearningRate 0.000042 Epoch: 32 Global Step: 676690 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:21:37,365-Speed 2496.78 samples/sec Loss 1.2126 LearningRate 0.000042 Epoch: 32 Global Step: 676700 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:21:45,576-Speed 2494.68 samples/sec Loss 1.2119 LearningRate 0.000042 Epoch: 32 Global Step: 676710 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:21:53,776-Speed 2497.85 samples/sec Loss 1.2237 LearningRate 0.000042 Epoch: 32 Global Step: 676720 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:22:01,978-Speed 2497.26 samples/sec Loss 1.2061 LearningRate 0.000042 Epoch: 32 Global Step: 676730 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:22:10,189-Speed 2494.45 samples/sec Loss 1.2151 LearningRate 0.000042 Epoch: 32 Global Step: 676740 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:22:18,341-Speed 2512.80 samples/sec Loss 1.2304 LearningRate 0.000042 Epoch: 32 Global Step: 676750 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:22:26,549-Speed 2495.71 samples/sec Loss 1.2428 LearningRate 0.000042 Epoch: 32 Global Step: 676760 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:22:34,752-Speed 2496.89 samples/sec Loss 1.1898 LearningRate 0.000042 Epoch: 32 Global Step: 676770 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:22:42,955-Speed 2496.90 samples/sec Loss 1.2363 LearningRate 0.000042 Epoch: 32 Global Step: 676780 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:22:51,156-Speed 2497.48 samples/sec Loss 1.2003 LearningRate 0.000042 Epoch: 32 Global Step: 676790 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:22:59,363-Speed 2496.18 samples/sec Loss 1.2098 LearningRate 0.000042 Epoch: 32 Global Step: 676800 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:23:07,516-Speed 2512.39 samples/sec Loss 1.1989 LearningRate 0.000042 Epoch: 32 Global Step: 676810 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:23:15,720-Speed 2496.93 samples/sec Loss 1.2473 LearningRate 0.000042 Epoch: 32 Global Step: 676820 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:23:23,927-Speed 2495.56 samples/sec Loss 1.2056 LearningRate 0.000042 Epoch: 32 Global Step: 676830 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:23:32,132-Speed 2496.48 samples/sec Loss 1.2253 LearningRate 0.000042 Epoch: 32 Global Step: 676840 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:23:40,350-Speed 2492.44 samples/sec Loss 1.1868 LearningRate 0.000042 Epoch: 32 Global Step: 676850 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:23:48,553-Speed 2497.25 samples/sec Loss 1.2297 LearningRate 0.000042 Epoch: 32 Global Step: 676860 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:23:56,704-Speed 2513.00 samples/sec Loss 1.2503 LearningRate 0.000042 Epoch: 32 Global Step: 676870 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:24:04,910-Speed 2496.00 samples/sec Loss 1.2027 LearningRate 0.000042 Epoch: 32 Global Step: 676880 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:24:13,115-Speed 2496.44 samples/sec Loss 1.2266 LearningRate 0.000042 Epoch: 32 Global Step: 676890 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:24:21,319-Speed 2496.87 samples/sec Loss 1.2258 LearningRate 0.000042 Epoch: 32 Global Step: 676900 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:24:29,523-Speed 2496.93 samples/sec Loss 1.2113 LearningRate 0.000042 Epoch: 32 Global Step: 676910 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:24:37,727-Speed 2496.76 samples/sec Loss 1.2614 LearningRate 0.000042 Epoch: 32 Global Step: 676920 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:24:45,877-Speed 2513.18 samples/sec Loss 1.2346 LearningRate 0.000042 Epoch: 32 Global Step: 676930 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:24:54,078-Speed 2498.02 samples/sec Loss 1.2176 LearningRate 0.000042 Epoch: 32 Global Step: 676940 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:25:02,281-Speed 2497.13 samples/sec Loss 1.2103 LearningRate 0.000042 Epoch: 32 Global Step: 676950 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:25:10,486-Speed 2496.30 samples/sec Loss 1.1817 LearningRate 0.000042 Epoch: 32 Global Step: 676960 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:25:18,699-Speed 2494.03 samples/sec Loss 1.2160 LearningRate 0.000042 Epoch: 32 Global Step: 676970 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:25:26,911-Speed 2494.33 samples/sec Loss 1.2166 LearningRate 0.000042 Epoch: 32 Global Step: 676980 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:25:35,063-Speed 2512.86 samples/sec Loss 1.2390 LearningRate 0.000042 Epoch: 32 Global Step: 676990 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:25:43,265-Speed 2497.18 samples/sec Loss 1.2067 LearningRate 0.000042 Epoch: 32 Global Step: 677000 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:25:51,467-Speed 2497.45 samples/sec Loss 1.2515 LearningRate 0.000042 Epoch: 32 Global Step: 677010 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-07-12 01:25:59,627-Speed 2510.15 samples/sec Loss 1.2158 LearningRate 0.000042 Epoch: 32 Global Step: 677020 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:26:07,837-Speed 2495.10 samples/sec Loss 1.2080 LearningRate 0.000042 Epoch: 32 Global Step: 677030 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:26:16,043-Speed 2496.09 samples/sec Loss 1.2329 LearningRate 0.000042 Epoch: 32 Global Step: 677040 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:26:24,197-Speed 2512.22 samples/sec Loss 1.2341 LearningRate 0.000042 Epoch: 32 Global Step: 677050 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:26:32,401-Speed 2496.72 samples/sec Loss 1.2068 LearningRate 0.000042 Epoch: 32 Global Step: 677060 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:26:40,605-Speed 2496.67 samples/sec Loss 1.2305 LearningRate 0.000042 Epoch: 32 Global Step: 677070 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:26:48,812-Speed 2495.93 samples/sec Loss 1.1903 LearningRate 0.000042 Epoch: 32 Global Step: 677080 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:26:57,021-Speed 2495.33 samples/sec Loss 1.1685 LearningRate 0.000042 Epoch: 32 Global Step: 677090 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:27:05,228-Speed 2495.56 samples/sec Loss 1.2135 LearningRate 0.000042 Epoch: 32 Global Step: 677100 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:27:13,396-Speed 2507.82 samples/sec Loss 1.2360 LearningRate 0.000042 Epoch: 32 Global Step: 677110 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:27:21,601-Speed 2496.59 samples/sec Loss 1.2063 LearningRate 0.000042 Epoch: 32 Global Step: 677120 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:27:29,803-Speed 2497.53 samples/sec Loss 1.2554 LearningRate 0.000042 Epoch: 32 Global Step: 677130 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:27:38,004-Speed 2497.55 samples/sec Loss 1.2311 LearningRate 0.000042 Epoch: 32 Global Step: 677140 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:27:46,208-Speed 2496.69 samples/sec Loss 1.2308 LearningRate 0.000042 Epoch: 32 Global Step: 677150 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:27:54,408-Speed 2498.53 samples/sec Loss 1.2545 LearningRate 0.000042 Epoch: 32 Global Step: 677160 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:28:02,562-Speed 2512.19 samples/sec Loss 1.2060 LearningRate 0.000042 Epoch: 32 Global Step: 677170 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:28:10,767-Speed 2496.60 samples/sec Loss 1.1928 LearningRate 0.000042 Epoch: 32 Global Step: 677180 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:28:18,975-Speed 2495.56 samples/sec Loss 1.2530 LearningRate 0.000042 Epoch: 32 Global Step: 677190 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:28:27,176-Speed 2497.44 samples/sec Loss 1.2679 LearningRate 0.000042 Epoch: 32 Global Step: 677200 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:28:35,379-Speed 2497.53 samples/sec Loss 1.2080 LearningRate 0.000042 Epoch: 32 Global Step: 677210 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:28:43,584-Speed 2496.27 samples/sec Loss 1.2268 LearningRate 0.000042 Epoch: 32 Global Step: 677220 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:28:51,738-Speed 2512.08 samples/sec Loss 1.2200 LearningRate 0.000042 Epoch: 32 Global Step: 677230 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:28:59,945-Speed 2495.76 samples/sec Loss 1.2458 LearningRate 0.000042 Epoch: 32 Global Step: 677240 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:29:08,150-Speed 2496.78 samples/sec Loss 1.2483 LearningRate 0.000042 Epoch: 32 Global Step: 677250 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:29:16,354-Speed 2496.93 samples/sec Loss 1.2258 LearningRate 0.000042 Epoch: 32 Global Step: 677260 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:29:24,558-Speed 2496.76 samples/sec Loss 1.2387 LearningRate 0.000042 Epoch: 32 Global Step: 677270 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:29:32,766-Speed 2495.44 samples/sec Loss 1.2042 LearningRate 0.000042 Epoch: 32 Global Step: 677280 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:29:40,917-Speed 2513.25 samples/sec Loss 1.2004 LearningRate 0.000042 Epoch: 32 Global Step: 677290 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:29:49,118-Speed 2497.35 samples/sec Loss 1.2308 LearningRate 0.000042 Epoch: 32 Global Step: 677300 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:29:57,321-Speed 2497.32 samples/sec Loss 1.1950 LearningRate 0.000042 Epoch: 32 Global Step: 677310 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:30:05,523-Speed 2497.35 samples/sec Loss 1.2370 LearningRate 0.000042 Epoch: 32 Global Step: 677320 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:30:13,727-Speed 2496.86 samples/sec Loss 1.2490 LearningRate 0.000042 Epoch: 32 Global Step: 677330 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:30:21,931-Speed 2496.75 samples/sec Loss 1.2403 LearningRate 0.000042 Epoch: 32 Global Step: 677340 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:30:30,079-Speed 2513.63 samples/sec Loss 1.2458 LearningRate 0.000042 Epoch: 32 Global Step: 677350 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:30:38,283-Speed 2496.87 samples/sec Loss 1.1959 LearningRate 0.000042 Epoch: 32 Global Step: 677360 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:30:46,489-Speed 2496.27 samples/sec Loss 1.2299 LearningRate 0.000042 Epoch: 32 Global Step: 677370 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:30:54,690-Speed 2497.46 samples/sec Loss 1.2288 LearningRate 0.000042 Epoch: 32 Global Step: 677380 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:31:02,896-Speed 2496.23 samples/sec Loss 1.2273 LearningRate 0.000042 Epoch: 32 Global Step: 677390 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:31:11,111-Speed 2493.61 samples/sec Loss 1.2247 LearningRate 0.000042 Epoch: 32 Global Step: 677400 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:31:19,262-Speed 2512.97 samples/sec Loss 1.2296 LearningRate 0.000042 Epoch: 32 Global Step: 677410 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:31:27,467-Speed 2496.37 samples/sec Loss 1.2092 LearningRate 0.000042 Epoch: 32 Global Step: 677420 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:31:35,668-Speed 2498.11 samples/sec Loss 1.2114 LearningRate 0.000042 Epoch: 32 Global Step: 677430 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:31:43,874-Speed 2496.30 samples/sec Loss 1.2022 LearningRate 0.000042 Epoch: 32 Global Step: 677440 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:31:52,075-Speed 2497.59 samples/sec Loss 1.2502 LearningRate 0.000042 Epoch: 32 Global Step: 677450 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:32:00,280-Speed 2496.17 samples/sec Loss 1.2260 LearningRate 0.000042 Epoch: 32 Global Step: 677460 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:32:08,437-Speed 2511.29 samples/sec Loss 1.2522 LearningRate 0.000041 Epoch: 32 Global Step: 677470 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:32:16,637-Speed 2498.11 samples/sec Loss 1.2150 LearningRate 0.000041 Epoch: 32 Global Step: 677480 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:32:24,840-Speed 2497.04 samples/sec Loss 1.1950 LearningRate 0.000041 Epoch: 32 Global Step: 677490 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:32:33,044-Speed 2496.80 samples/sec Loss 1.2164 LearningRate 0.000041 Epoch: 32 Global Step: 677500 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:32:41,247-Speed 2497.27 samples/sec Loss 1.2399 LearningRate 0.000041 Epoch: 32 Global Step: 677510 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 
01:32:49,448-Speed 2497.72 samples/sec Loss 1.2197 LearningRate 0.000041 Epoch: 32 Global Step: 677520 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:32:57,608-Speed 2510.22 samples/sec Loss 1.2391 LearningRate 0.000041 Epoch: 32 Global Step: 677530 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:33:05,814-Speed 2496.15 samples/sec Loss 1.2335 LearningRate 0.000041 Epoch: 32 Global Step: 677540 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:33:14,020-Speed 2496.39 samples/sec Loss 1.2618 LearningRate 0.000041 Epoch: 32 Global Step: 677550 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:33:22,230-Speed 2494.90 samples/sec Loss 1.2549 LearningRate 0.000041 Epoch: 32 Global Step: 677560 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:33:30,435-Speed 2496.52 samples/sec Loss 1.2245 LearningRate 0.000041 Epoch: 32 Global Step: 677570 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:33:38,651-Speed 2493.13 samples/sec Loss 1.2239 LearningRate 0.000041 Epoch: 32 Global Step: 677580 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:33:46,802-Speed 2512.91 samples/sec Loss 1.1888 LearningRate 0.000041 Epoch: 32 Global Step: 677590 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:33:55,008-Speed 2496.02 samples/sec Loss 1.2126 LearningRate 0.000041 Epoch: 32 Global Step: 677600 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:34:03,215-Speed 2496.14 samples/sec Loss 1.2237 LearningRate 0.000041 Epoch: 32 Global Step: 677610 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:34:11,430-Speed 2493.70 samples/sec Loss 1.2311 LearningRate 0.000041 Epoch: 32 Global Step: 677620 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-07-12 01:34:19,593-Speed 2509.41 samples/sec Loss 1.2190 LearningRate 0.000041 Epoch: 32 Global Step: 677630 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 
01:34:27,807-Speed 2493.83 samples/sec Loss 1.2032 LearningRate 0.000041 Epoch: 32 Global Step: 677640 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:34:35,961-Speed 2512.08 samples/sec Loss 1.2230 LearningRate 0.000041 Epoch: 32 Global Step: 677650 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:34:44,164-Speed 2497.03 samples/sec Loss 1.2253 LearningRate 0.000041 Epoch: 32 Global Step: 677660 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:34:52,373-Speed 2495.33 samples/sec Loss 1.2294 LearningRate 0.000041 Epoch: 32 Global Step: 677670 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:35:00,574-Speed 2497.58 samples/sec Loss 1.1920 LearningRate 0.000041 Epoch: 32 Global Step: 677680 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:35:08,775-Speed 2497.58 samples/sec Loss 1.2204 LearningRate 0.000041 Epoch: 32 Global Step: 677690 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:35:16,978-Speed 2497.19 samples/sec Loss 1.2123 LearningRate 0.000041 Epoch: 32 Global Step: 677700 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:35:25,131-Speed 2512.24 samples/sec Loss 1.2025 LearningRate 0.000041 Epoch: 32 Global Step: 677710 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:35:33,333-Speed 2497.34 samples/sec Loss 1.2309 LearningRate 0.000041 Epoch: 32 Global Step: 677720 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:35:41,537-Speed 2496.96 samples/sec Loss 1.2537 LearningRate 0.000041 Epoch: 32 Global Step: 677730 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:35:49,739-Speed 2497.35 samples/sec Loss 1.1949 LearningRate 0.000041 Epoch: 32 Global Step: 677740 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:35:57,940-Speed 2497.45 samples/sec Loss 1.2363 LearningRate 0.000041 Epoch: 32 Global Step: 677750 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 
01:36:06,144-Speed 2496.93 samples/sec Loss 1.2429 LearningRate 0.000041 Epoch: 32 Global Step: 677760 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:36:14,292-Speed 2514.01 samples/sec Loss 1.2181 LearningRate 0.000041 Epoch: 32 Global Step: 677770 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:36:22,515-Speed 2491.25 samples/sec Loss 1.2113 LearningRate 0.000041 Epoch: 32 Global Step: 677780 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:36:30,716-Speed 2497.43 samples/sec Loss 1.2152 LearningRate 0.000041 Epoch: 32 Global Step: 677790 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:36:38,921-Speed 2496.34 samples/sec Loss 1.2387 LearningRate 0.000041 Epoch: 32 Global Step: 677800 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:36:47,130-Speed 2495.62 samples/sec Loss 1.2258 LearningRate 0.000041 Epoch: 32 Global Step: 677810 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:36:55,337-Speed 2495.86 samples/sec Loss 1.1930 LearningRate 0.000041 Epoch: 32 Global Step: 677820 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:37:03,481-Speed 2515.16 samples/sec Loss 1.2117 LearningRate 0.000041 Epoch: 32 Global Step: 677830 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:37:11,692-Speed 2494.63 samples/sec Loss 1.2330 LearningRate 0.000041 Epoch: 32 Global Step: 677840 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:37:19,894-Speed 2497.55 samples/sec Loss 1.2301 LearningRate 0.000041 Epoch: 32 Global Step: 677850 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:37:28,102-Speed 2495.62 samples/sec Loss 1.2314 LearningRate 0.000041 Epoch: 32 Global Step: 677860 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:37:36,310-Speed 2495.34 samples/sec Loss 1.2097 LearningRate 0.000041 Epoch: 32 Global Step: 677870 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 
01:37:44,521-Speed 2494.73 samples/sec Loss 1.2220 LearningRate 0.000041 Epoch: 32 Global Step: 677880 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:37:52,671-Speed 2513.34 samples/sec Loss 1.2274 LearningRate 0.000041 Epoch: 32 Global Step: 677890 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:38:00,876-Speed 2496.46 samples/sec Loss 1.2282 LearningRate 0.000041 Epoch: 32 Global Step: 677900 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:38:09,082-Speed 2496.78 samples/sec Loss 1.2225 LearningRate 0.000041 Epoch: 32 Global Step: 677910 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-07-12 01:38:17,251-Speed 2507.74 samples/sec Loss 1.2133 LearningRate 0.000041 Epoch: 32 Global Step: 677920 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:38:25,457-Speed 2495.97 samples/sec Loss 1.1879 LearningRate 0.000041 Epoch: 32 Global Step: 677930 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:38:33,667-Speed 2494.86 samples/sec Loss 1.2231 LearningRate 0.000041 Epoch: 32 Global Step: 677940 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:38:41,819-Speed 2512.82 samples/sec Loss 1.1944 LearningRate 0.000041 Epoch: 32 Global Step: 677950 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:38:50,023-Speed 2496.70 samples/sec Loss 1.2109 LearningRate 0.000041 Epoch: 32 Global Step: 677960 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:38:58,228-Speed 2496.47 samples/sec Loss 1.1831 LearningRate 0.000041 Epoch: 32 Global Step: 677970 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:39:06,430-Speed 2497.36 samples/sec Loss 1.1828 LearningRate 0.000041 Epoch: 32 Global Step: 677980 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:39:14,631-Speed 2497.56 samples/sec Loss 1.2127 LearningRate 0.000041 Epoch: 32 Global Step: 677990 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 
01:39:22,832-Speed 2497.72 samples/sec Loss 1.1839 LearningRate 0.000041 Epoch: 32 Global Step: 678000 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:39:30,982-Speed 2513.20 samples/sec Loss 1.2033 LearningRate 0.000041 Epoch: 32 Global Step: 678010 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:39:39,185-Speed 2497.13 samples/sec Loss 1.2282 LearningRate 0.000041 Epoch: 32 Global Step: 678020 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:39:47,386-Speed 2497.77 samples/sec Loss 1.1935 LearningRate 0.000041 Epoch: 32 Global Step: 678030 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:39:55,587-Speed 2497.65 samples/sec Loss 1.2086 LearningRate 0.000041 Epoch: 32 Global Step: 678040 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:40:03,790-Speed 2497.12 samples/sec Loss 1.1936 LearningRate 0.000041 Epoch: 32 Global Step: 678050 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:40:11,997-Speed 2495.70 samples/sec Loss 1.2259 LearningRate 0.000041 Epoch: 32 Global Step: 678060 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:40:20,147-Speed 2513.21 samples/sec Loss 1.2176 LearningRate 0.000041 Epoch: 32 Global Step: 678070 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:40:28,352-Speed 2496.67 samples/sec Loss 1.2145 LearningRate 0.000041 Epoch: 32 Global Step: 678080 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:40:36,553-Speed 2497.32 samples/sec Loss 1.2133 LearningRate 0.000041 Epoch: 32 Global Step: 678090 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:40:44,753-Speed 2498.37 samples/sec Loss 1.2041 LearningRate 0.000041 Epoch: 32 Global Step: 678100 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:40:52,969-Speed 2493.39 samples/sec Loss 1.2727 LearningRate 0.000041 Epoch: 32 Global Step: 678110 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:41:01,169-Speed 
2497.59 samples/sec Loss 1.2156 LearningRate 0.000041 Epoch: 32 Global Step: 678120 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:41:09,320-Speed 2513.38 samples/sec Loss 1.1875 LearningRate 0.000041 Epoch: 32 Global Step: 678130 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:41:17,526-Speed 2496.14 samples/sec Loss 1.1941 LearningRate 0.000041 Epoch: 32 Global Step: 678140 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:41:25,727-Speed 2497.89 samples/sec Loss 1.2138 LearningRate 0.000041 Epoch: 32 Global Step: 678150 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:41:33,932-Speed 2496.42 samples/sec Loss 1.2293 LearningRate 0.000041 Epoch: 32 Global Step: 678160 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:41:42,136-Speed 2496.49 samples/sec Loss 1.2035 LearningRate 0.000041 Epoch: 32 Global Step: 678170 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:41:50,336-Speed 2497.94 samples/sec Loss 1.2207 LearningRate 0.000041 Epoch: 32 Global Step: 678180 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:41:58,487-Speed 2513.01 samples/sec Loss 1.2059 LearningRate 0.000041 Epoch: 32 Global Step: 678190 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:42:06,689-Speed 2497.22 samples/sec Loss 1.2013 LearningRate 0.000041 Epoch: 32 Global Step: 678200 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:42:14,893-Speed 2496.81 samples/sec Loss 1.2171 LearningRate 0.000041 Epoch: 32 Global Step: 678210 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:42:23,095-Speed 2497.68 samples/sec Loss 1.2072 LearningRate 0.000041 Epoch: 32 Global Step: 678220 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:42:31,296-Speed 2497.46 samples/sec Loss 1.2164 LearningRate 0.000041 Epoch: 32 Global Step: 678230 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:42:39,497-Speed 2497.81 samples/sec 
Loss 1.2080 LearningRate 0.000041 Epoch: 32 Global Step: 678240 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:42:47,660-Speed 2509.38 samples/sec Loss 1.2182 LearningRate 0.000041 Epoch: 32 Global Step: 678250 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:42:55,862-Speed 2497.60 samples/sec Loss 1.2234 LearningRate 0.000041 Epoch: 32 Global Step: 678260 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:43:04,068-Speed 2495.80 samples/sec Loss 1.2169 LearningRate 0.000041 Epoch: 32 Global Step: 678270 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:43:12,275-Speed 2495.83 samples/sec Loss 1.2005 LearningRate 0.000041 Epoch: 32 Global Step: 678280 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:43:20,478-Speed 2497.04 samples/sec Loss 1.1987 LearningRate 0.000041 Epoch: 32 Global Step: 678290 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:43:28,679-Speed 2497.76 samples/sec Loss 1.1923 LearningRate 0.000041 Epoch: 32 Global Step: 678300 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:43:36,840-Speed 2509.90 samples/sec Loss 1.2126 LearningRate 0.000041 Epoch: 32 Global Step: 678310 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:43:45,043-Speed 2497.08 samples/sec Loss 1.1930 LearningRate 0.000041 Epoch: 32 Global Step: 678320 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:43:53,246-Speed 2497.08 samples/sec Loss 1.2036 LearningRate 0.000041 Epoch: 32 Global Step: 678330 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:44:01,472-Speed 2490.18 samples/sec Loss 1.2214 LearningRate 0.000041 Epoch: 32 Global Step: 678340 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:44:09,676-Speed 2496.59 samples/sec Loss 1.2337 LearningRate 0.000041 Epoch: 32 Global Step: 678350 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:44:17,876-Speed 2498.08 samples/sec Loss 1.2303 
LearningRate 0.000041 Epoch: 32 Global Step: 678360 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:44:26,029-Speed 2512.45 samples/sec Loss 1.2149 LearningRate 0.000041 Epoch: 32 Global Step: 678370 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:44:34,245-Speed 2493.05 samples/sec Loss 1.1972 LearningRate 0.000041 Epoch: 32 Global Step: 678380 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:44:42,460-Speed 2493.38 samples/sec Loss 1.2196 LearningRate 0.000041 Epoch: 32 Global Step: 678390 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:44:50,662-Speed 2497.47 samples/sec Loss 1.1859 LearningRate 0.000041 Epoch: 32 Global Step: 678400 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:44:58,866-Speed 2496.63 samples/sec Loss 1.2068 LearningRate 0.000041 Epoch: 32 Global Step: 678410 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:45:07,068-Speed 2497.31 samples/sec Loss 1.2008 LearningRate 0.000041 Epoch: 32 Global Step: 678420 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:45:15,218-Speed 2513.34 samples/sec Loss 1.2196 LearningRate 0.000041 Epoch: 32 Global Step: 678430 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:45:23,436-Speed 2492.48 samples/sec Loss 1.2212 LearningRate 0.000041 Epoch: 32 Global Step: 678440 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:45:31,648-Speed 2494.09 samples/sec Loss 1.2174 LearningRate 0.000041 Epoch: 32 Global Step: 678450 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:45:39,848-Speed 2498.17 samples/sec Loss 1.2230 LearningRate 0.000041 Epoch: 32 Global Step: 678460 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:45:48,048-Speed 2497.94 samples/sec Loss 1.2209 LearningRate 0.000041 Epoch: 32 Global Step: 678470 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:45:56,253-Speed 2495.99 samples/sec Loss 1.2157 LearningRate 
0.000041 Epoch: 32 Global Step: 678480 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:46:04,416-Speed 2509.40 samples/sec Loss 1.2255 LearningRate 0.000041 Epoch: 32 Global Step: 678490 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:46:12,620-Speed 2496.87 samples/sec Loss 1.2198 LearningRate 0.000041 Epoch: 32 Global Step: 678500 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:46:20,824-Speed 2496.78 samples/sec Loss 1.2002 LearningRate 0.000041 Epoch: 32 Global Step: 678510 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:46:29,025-Speed 2497.58 samples/sec Loss 1.1865 LearningRate 0.000041 Epoch: 32 Global Step: 678520 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:46:37,226-Speed 2497.64 samples/sec Loss 1.2442 LearningRate 0.000041 Epoch: 32 Global Step: 678530 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:46:45,430-Speed 2496.66 samples/sec Loss 1.2114 LearningRate 0.000041 Epoch: 32 Global Step: 678540 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:46:53,579-Speed 2513.62 samples/sec Loss 1.2166 LearningRate 0.000041 Epoch: 32 Global Step: 678550 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:47:01,779-Speed 2498.07 samples/sec Loss 1.1923 LearningRate 0.000041 Epoch: 32 Global Step: 678560 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:47:09,979-Speed 2497.96 samples/sec Loss 1.2237 LearningRate 0.000041 Epoch: 32 Global Step: 678570 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:47:18,180-Speed 2497.56 samples/sec Loss 1.2051 LearningRate 0.000041 Epoch: 32 Global Step: 678580 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:47:26,381-Speed 2497.65 samples/sec Loss 1.1930 LearningRate 0.000041 Epoch: 32 Global Step: 678590 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:47:34,585-Speed 2497.21 samples/sec Loss 1.2209 LearningRate 0.000041 Epoch: 32 
Global Step: 678600 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:47:42,733-Speed 2514.01 samples/sec Loss 1.2369 LearningRate 0.000041 Epoch: 32 Global Step: 678610 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:47:50,935-Speed 2497.33 samples/sec Loss 1.2076 LearningRate 0.000041 Epoch: 32 Global Step: 678620 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:47:59,137-Speed 2497.13 samples/sec Loss 1.2120 LearningRate 0.000041 Epoch: 32 Global Step: 678630 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:48:07,348-Speed 2494.50 samples/sec Loss 1.1968 LearningRate 0.000041 Epoch: 32 Global Step: 678640 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:48:15,551-Speed 2497.24 samples/sec Loss 1.2059 LearningRate 0.000041 Epoch: 32 Global Step: 678650 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:48:23,754-Speed 2496.85 samples/sec Loss 1.2020 LearningRate 0.000041 Epoch: 32 Global Step: 678660 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:48:31,907-Speed 2512.40 samples/sec Loss 1.1933 LearningRate 0.000041 Epoch: 32 Global Step: 678670 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:48:40,108-Speed 2497.71 samples/sec Loss 1.2193 LearningRate 0.000041 Epoch: 32 Global Step: 678680 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:48:48,311-Speed 2497.31 samples/sec Loss 1.2474 LearningRate 0.000041 Epoch: 32 Global Step: 678690 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:48:56,523-Speed 2494.80 samples/sec Loss 1.2240 LearningRate 0.000041 Epoch: 32 Global Step: 678700 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:49:04,723-Speed 2498.01 samples/sec Loss 1.2060 LearningRate 0.000041 Epoch: 32 Global Step: 678710 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:49:12,940-Speed 2492.70 samples/sec Loss 1.2039 LearningRate 0.000041 Epoch: 32 Global Step: 678720 
Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:49:21,093-Speed 2512.64 samples/sec Loss 1.1996 LearningRate 0.000041 Epoch: 32 Global Step: 678730 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:49:29,539-Speed 2424.90 samples/sec Loss 1.2198 LearningRate 0.000041 Epoch: 32 Global Step: 678740 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:49:37,746-Speed 2496.01 samples/sec Loss 1.2162 LearningRate 0.000041 Epoch: 32 Global Step: 678750 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:49:45,961-Speed 2493.28 samples/sec Loss 1.2386 LearningRate 0.000041 Epoch: 32 Global Step: 678760 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:49:54,165-Speed 2497.25 samples/sec Loss 1.2061 LearningRate 0.000041 Epoch: 32 Global Step: 678770 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:50:02,369-Speed 2496.49 samples/sec Loss 1.2358 LearningRate 0.000041 Epoch: 32 Global Step: 678780 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:50:10,520-Speed 2513.16 samples/sec Loss 1.2378 LearningRate 0.000041 Epoch: 32 Global Step: 678790 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:50:18,722-Speed 2497.25 samples/sec Loss 1.2042 LearningRate 0.000041 Epoch: 32 Global Step: 678800 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:50:26,925-Speed 2497.03 samples/sec Loss 1.1855 LearningRate 0.000041 Epoch: 32 Global Step: 678810 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:50:35,138-Speed 2493.84 samples/sec Loss 1.2184 LearningRate 0.000041 Epoch: 32 Global Step: 678820 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:50:43,343-Speed 2496.48 samples/sec Loss 1.1946 LearningRate 0.000041 Epoch: 32 Global Step: 678830 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:50:51,549-Speed 2496.15 samples/sec Loss 1.1924 LearningRate 0.000041 Epoch: 32 Global Step: 678840 Fp16 Grad Scale: 
8192 Required: 35 hours Training: 2022-07-12 01:50:59,717-Speed 2507.59 samples/sec Loss 1.2642 LearningRate 0.000041 Epoch: 32 Global Step: 678850 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-07-12 01:51:07,920-Speed 2497.25 samples/sec Loss 1.2031 LearningRate 0.000041 Epoch: 32 Global Step: 678860 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:51:16,118-Speed 2498.54 samples/sec Loss 1.2410 LearningRate 0.000041 Epoch: 32 Global Step: 678870 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:51:24,322-Speed 2496.46 samples/sec Loss 1.2295 LearningRate 0.000041 Epoch: 32 Global Step: 678880 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:51:32,529-Speed 2495.93 samples/sec Loss 1.2353 LearningRate 0.000041 Epoch: 32 Global Step: 678890 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:51:40,731-Speed 2497.23 samples/sec Loss 1.2137 LearningRate 0.000041 Epoch: 32 Global Step: 678900 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:51:48,884-Speed 2512.54 samples/sec Loss 1.2310 LearningRate 0.000041 Epoch: 32 Global Step: 678910 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:51:57,083-Speed 2498.08 samples/sec Loss 1.2377 LearningRate 0.000041 Epoch: 32 Global Step: 678920 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:52:05,303-Speed 2491.99 samples/sec Loss 1.1992 LearningRate 0.000041 Epoch: 32 Global Step: 678930 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:52:13,502-Speed 2498.46 samples/sec Loss 1.1903 LearningRate 0.000041 Epoch: 32 Global Step: 678940 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:52:21,701-Speed 2498.71 samples/sec Loss 1.2400 LearningRate 0.000041 Epoch: 32 Global Step: 678950 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:52:29,901-Speed 2497.76 samples/sec Loss 1.1960 LearningRate 0.000041 Epoch: 32 Global Step: 678960 Fp16 Grad Scale: 8192 Required: 34 
hours Training: 2022-07-12 01:52:38,053-Speed 2512.91 samples/sec Loss 1.2491 LearningRate 0.000041 Epoch: 32 Global Step: 678970 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:52:46,255-Speed 2497.55 samples/sec Loss 1.2079 LearningRate 0.000041 Epoch: 32 Global Step: 678980 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:52:54,459-Speed 2496.65 samples/sec Loss 1.2010 LearningRate 0.000041 Epoch: 32 Global Step: 678990 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:53:02,662-Speed 2497.11 samples/sec Loss 1.1832 LearningRate 0.000041 Epoch: 32 Global Step: 679000 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:53:10,868-Speed 2496.25 samples/sec Loss 1.2136 LearningRate 0.000041 Epoch: 32 Global Step: 679010 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:53:19,072-Speed 2496.66 samples/sec Loss 1.2053 LearningRate 0.000041 Epoch: 32 Global Step: 679020 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:53:27,219-Speed 2514.30 samples/sec Loss 1.2525 LearningRate 0.000041 Epoch: 32 Global Step: 679030 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:53:35,421-Speed 2497.40 samples/sec Loss 1.2048 LearningRate 0.000041 Epoch: 32 Global Step: 679040 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:53:43,623-Speed 2497.28 samples/sec Loss 1.2444 LearningRate 0.000041 Epoch: 32 Global Step: 679050 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:53:51,828-Speed 2496.37 samples/sec Loss 1.1980 LearningRate 0.000041 Epoch: 32 Global Step: 679060 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:54:00,030-Speed 2497.51 samples/sec Loss 1.2231 LearningRate 0.000041 Epoch: 32 Global Step: 679070 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:54:08,234-Speed 2496.65 samples/sec Loss 1.1983 LearningRate 0.000041 Epoch: 32 Global Step: 679080 Fp16 Grad Scale: 8192 Required: 34 hours Training: 
2022-07-12 01:54:16,384-Speed 2513.49 samples/sec Loss 1.2617 LearningRate 0.000041 Epoch: 32 Global Step: 679090 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:54:24,585-Speed 2497.25 samples/sec Loss 1.2092 LearningRate 0.000041 Epoch: 32 Global Step: 679100 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:54:32,789-Speed 2496.96 samples/sec Loss 1.2351 LearningRate 0.000041 Epoch: 32 Global Step: 679110 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-07-12 01:54:40,993-Speed 2496.82 samples/sec Loss 1.1837 LearningRate 0.000041 Epoch: 32 Global Step: 679120 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:54:49,198-Speed 2496.44 samples/sec Loss 1.2007 LearningRate 0.000041 Epoch: 32 Global Step: 679130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:54:57,402-Speed 2496.74 samples/sec Loss 1.2361 LearningRate 0.000041 Epoch: 32 Global Step: 679140 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:55:05,554-Speed 2512.52 samples/sec Loss 1.2260 LearningRate 0.000041 Epoch: 32 Global Step: 679150 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:55:13,756-Speed 2497.42 samples/sec Loss 1.2363 LearningRate 0.000041 Epoch: 32 Global Step: 679160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:55:21,971-Speed 2493.52 samples/sec Loss 1.2086 LearningRate 0.000041 Epoch: 32 Global Step: 679170 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:55:30,173-Speed 2497.29 samples/sec Loss 1.2416 LearningRate 0.000041 Epoch: 32 Global Step: 679180 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:55:38,382-Speed 2495.41 samples/sec Loss 1.2416 LearningRate 0.000041 Epoch: 32 Global Step: 679190 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:55:46,586-Speed 2496.88 samples/sec Loss 1.2115 LearningRate 0.000041 Epoch: 32 Global Step: 679200 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
01:55:54,737-Speed 2512.75 samples/sec Loss 1.2297 LearningRate 0.000041 Epoch: 32 Global Step: 679210 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:56:02,937-Speed 2498.15 samples/sec Loss 1.2057 LearningRate 0.000041 Epoch: 32 Global Step: 679220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:56:11,140-Speed 2496.75 samples/sec Loss 1.2234 LearningRate 0.000041 Epoch: 32 Global Step: 679230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:56:19,345-Speed 2496.68 samples/sec Loss 1.2341 LearningRate 0.000041 Epoch: 32 Global Step: 679240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:56:27,545-Speed 2497.66 samples/sec Loss 1.2187 LearningRate 0.000041 Epoch: 32 Global Step: 679250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:56:35,758-Speed 2493.99 samples/sec Loss 1.2168 LearningRate 0.000041 Epoch: 32 Global Step: 679260 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:56:43,908-Speed 2513.47 samples/sec Loss 1.1818 LearningRate 0.000041 Epoch: 32 Global Step: 679270 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:56:52,116-Speed 2495.60 samples/sec Loss 1.2416 LearningRate 0.000041 Epoch: 32 Global Step: 679280 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:57:00,333-Speed 2493.00 samples/sec Loss 1.2318 LearningRate 0.000041 Epoch: 32 Global Step: 679290 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:57:08,536-Speed 2496.99 samples/sec Loss 1.2087 LearningRate 0.000041 Epoch: 32 Global Step: 679300 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:57:16,739-Speed 2497.27 samples/sec Loss 1.1797 LearningRate 0.000040 Epoch: 32 Global Step: 679310 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:57:24,944-Speed 2496.39 samples/sec Loss 1.2210 LearningRate 0.000040 Epoch: 32 Global Step: 679320 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
01:57:33,094-Speed 2513.35 samples/sec Loss 1.1812 LearningRate 0.000040 Epoch: 32 Global Step: 679330 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:57:41,296-Speed 2497.30 samples/sec Loss 1.1784 LearningRate 0.000040 Epoch: 32 Global Step: 679340 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:57:49,501-Speed 2496.38 samples/sec Loss 1.2236 LearningRate 0.000040 Epoch: 32 Global Step: 679350 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:57:57,702-Speed 2497.62 samples/sec Loss 1.2279 LearningRate 0.000040 Epoch: 32 Global Step: 679360 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:58:05,907-Speed 2496.44 samples/sec Loss 1.1718 LearningRate 0.000040 Epoch: 32 Global Step: 679370 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:58:14,117-Speed 2494.92 samples/sec Loss 1.2007 LearningRate 0.000040 Epoch: 32 Global Step: 679380 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:58:22,281-Speed 2509.04 samples/sec Loss 1.1890 LearningRate 0.000040 Epoch: 32 Global Step: 679390 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:58:30,484-Speed 2496.84 samples/sec Loss 1.2087 LearningRate 0.000040 Epoch: 32 Global Step: 679400 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:58:38,688-Speed 2496.94 samples/sec Loss 1.2050 LearningRate 0.000040 Epoch: 32 Global Step: 679410 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:58:46,894-Speed 2495.96 samples/sec Loss 1.1905 LearningRate 0.000040 Epoch: 32 Global Step: 679420 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:58:55,099-Speed 2496.36 samples/sec Loss 1.1925 LearningRate 0.000040 Epoch: 32 Global Step: 679430 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:59:03,302-Speed 2497.03 samples/sec Loss 1.2156 LearningRate 0.000040 Epoch: 32 Global Step: 679440 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
01:59:11,452-Speed 2513.33 samples/sec Loss 1.2166 LearningRate 0.000040 Epoch: 32 Global Step: 679450 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:59:19,667-Speed 2493.53 samples/sec Loss 1.2067 LearningRate 0.000040 Epoch: 32 Global Step: 679460 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:59:27,875-Speed 2495.43 samples/sec Loss 1.2160 LearningRate 0.000040 Epoch: 32 Global Step: 679470 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:59:36,077-Speed 2497.24 samples/sec Loss 1.2135 LearningRate 0.000040 Epoch: 32 Global Step: 679480 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:59:44,293-Speed 2493.33 samples/sec Loss 1.2069 LearningRate 0.000040 Epoch: 32 Global Step: 679490 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 01:59:52,496-Speed 2497.00 samples/sec Loss 1.1856 LearningRate 0.000040 Epoch: 32 Global Step: 679500 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:00:00,650-Speed 2511.93 samples/sec Loss 1.2411 LearningRate 0.000040 Epoch: 32 Global Step: 679510 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:00:08,863-Speed 2494.17 samples/sec Loss 1.1978 LearningRate 0.000040 Epoch: 32 Global Step: 679520 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:00:17,065-Speed 2497.55 samples/sec Loss 1.1829 LearningRate 0.000040 Epoch: 32 Global Step: 679530 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:00:25,270-Speed 2496.34 samples/sec Loss 1.2497 LearningRate 0.000040 Epoch: 32 Global Step: 679540 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:00:33,474-Speed 2496.98 samples/sec Loss 1.1951 LearningRate 0.000040 Epoch: 32 Global Step: 679550 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:00:41,676-Speed 2497.21 samples/sec Loss 1.2073 LearningRate 0.000040 Epoch: 32 Global Step: 679560 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:00:49,830-Speed 2512.06 samples/sec Loss 1.1889 LearningRate 0.000040 Epoch: 32 Global Step: 679570 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:00:58,033-Speed 2497.02 samples/sec Loss 1.2366 LearningRate 0.000040 Epoch: 32 Global Step: 679580 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:01:06,235-Speed 2497.43 samples/sec Loss 1.2424 LearningRate 0.000040 Epoch: 32 Global Step: 679590 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:01:14,439-Speed 2496.73 samples/sec Loss 1.1978 LearningRate 0.000040 Epoch: 32 Global Step: 679600 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:01:22,643-Speed 2497.15 samples/sec Loss 1.2359 LearningRate 0.000040 Epoch: 32 Global Step: 679610 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:01:30,845-Speed 2497.47 samples/sec Loss 1.2119 LearningRate 0.000040 Epoch: 32 Global Step: 679620 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:01:38,994-Speed 2513.48 samples/sec Loss 1.2096 LearningRate 0.000040 Epoch: 32 Global Step: 679630 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:01:47,193-Speed 2498.29 samples/sec Loss 1.2090 LearningRate 0.000040 Epoch: 32 Global Step: 679640 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:01:55,396-Speed 2497.09 samples/sec Loss 1.2045 LearningRate 0.000040 Epoch: 32 Global Step: 679650 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:02:03,601-Speed 2496.44 samples/sec Loss 1.2051 LearningRate 0.000040 Epoch: 32 Global Step: 679660 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:02:11,804-Speed 2497.03 samples/sec Loss 1.2139 LearningRate 0.000040 Epoch: 32 Global Step: 679670 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:02:20,010-Speed 2496.05 samples/sec Loss 1.2122 LearningRate 0.000040 Epoch: 32 Global Step: 679680 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:02:28,163-Speed 2512.56 samples/sec Loss 1.2359 LearningRate 0.000040 Epoch: 32 Global Step: 679690 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:02:36,365-Speed 2496.96 samples/sec Loss 1.2287 LearningRate 0.000040 Epoch: 32 Global Step: 679700 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:02:44,570-Speed 2496.52 samples/sec Loss 1.2216 LearningRate 0.000040 Epoch: 32 Global Step: 679710 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:02:52,778-Speed 2495.86 samples/sec Loss 1.2201 LearningRate 0.000040 Epoch: 32 Global Step: 679720 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:03:00,981-Speed 2497.05 samples/sec Loss 1.1934 LearningRate 0.000040 Epoch: 32 Global Step: 679730 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:03:09,183-Speed 2497.31 samples/sec Loss 1.2289 LearningRate 0.000040 Epoch: 32 Global Step: 679740 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:03:17,342-Speed 2510.53 samples/sec Loss 1.1826 LearningRate 0.000040 Epoch: 32 Global Step: 679750 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:03:25,549-Speed 2495.75 samples/sec Loss 1.2573 LearningRate 0.000040 Epoch: 32 Global Step: 679760 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:03:33,749-Speed 2497.96 samples/sec Loss 1.2358 LearningRate 0.000040 Epoch: 32 Global Step: 679770 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:03:41,952-Speed 2496.98 samples/sec Loss 1.1898 LearningRate 0.000040 Epoch: 32 Global Step: 679780 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:03:50,155-Speed 2496.97 samples/sec Loss 1.2317 LearningRate 0.000040 Epoch: 32 Global Step: 679790 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:03:58,358-Speed 2497.19 samples/sec Loss 1.1983 LearningRate 0.000040 Epoch: 32 Global Step: 679800 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:04:06,511-Speed 2512.29 samples/sec Loss 1.2283 LearningRate 0.000040 Epoch: 32 Global Step: 679810 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:04:14,713-Speed 2497.46 samples/sec Loss 1.2583 LearningRate 0.000040 Epoch: 32 Global Step: 679820 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:04:22,919-Speed 2496.15 samples/sec Loss 1.2391 LearningRate 0.000040 Epoch: 32 Global Step: 679830 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:04:31,123-Speed 2496.87 samples/sec Loss 1.1970 LearningRate 0.000040 Epoch: 32 Global Step: 679840 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:04:39,325-Speed 2497.37 samples/sec Loss 1.2321 LearningRate 0.000040 Epoch: 32 Global Step: 679850 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:04:47,528-Speed 2497.03 samples/sec Loss 1.2557 LearningRate 0.000040 Epoch: 32 Global Step: 679860 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:04:55,681-Speed 2512.66 samples/sec Loss 1.2249 LearningRate 0.000040 Epoch: 32 Global Step: 679870 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:05:03,883-Speed 2497.23 samples/sec Loss 1.2080 LearningRate 0.000040 Epoch: 32 Global Step: 679880 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:05:12,089-Speed 2495.96 samples/sec Loss 1.2301 LearningRate 0.000040 Epoch: 32 Global Step: 679890 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:05:20,293-Speed 2497.07 samples/sec Loss 1.2052 LearningRate 0.000040 Epoch: 32 Global Step: 679900 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:05:28,494-Speed 2498.00 samples/sec Loss 1.2159 LearningRate 0.000040 Epoch: 32 Global Step: 679910 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:05:36,703-Speed 2495.23 samples/sec Loss 1.2113 LearningRate 0.000040 Epoch: 32 Global Step: 679920 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:05:44,850-Speed 2514.18 samples/sec Loss 1.2406 LearningRate 0.000040 Epoch: 32 Global Step: 679930 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:05:53,076-Speed 2490.14 samples/sec Loss 1.2074 LearningRate 0.000040 Epoch: 32 Global Step: 679940 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:06:01,278-Speed 2497.25 samples/sec Loss 1.2242 LearningRate 0.000040 Epoch: 32 Global Step: 679950 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:06:09,478-Speed 2497.88 samples/sec Loss 1.2355 LearningRate 0.000040 Epoch: 32 Global Step: 679960 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:06:17,680-Speed 2497.64 samples/sec Loss 1.2109 LearningRate 0.000040 Epoch: 32 Global Step: 679970 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:06:25,884-Speed 2496.80 samples/sec Loss 1.2248 LearningRate 0.000040 Epoch: 32 Global Step: 679980 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:06:34,030-Speed 2514.46 samples/sec Loss 1.2255 LearningRate 0.000040 Epoch: 32 Global Step: 679990 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:06:42,235-Speed 2496.66 samples/sec Loss 1.2316 LearningRate 0.000040 Epoch: 32 Global Step: 680000 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:06:50,438-Speed 2496.99 samples/sec Loss 1.2183 LearningRate 0.000040 Epoch: 32 Global Step: 680010 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:06:58,640-Speed 2497.43 samples/sec Loss 1.2074 LearningRate 0.000040 Epoch: 32 Global Step: 680020 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:07:06,857-Speed 2492.68 samples/sec Loss 1.2265 LearningRate 0.000040 Epoch: 32 Global Step: 680030 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:07:15,060-Speed 2497.22 samples/sec Loss 1.1787 LearningRate 0.000040 Epoch: 32 Global Step: 680040 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:07:23,212-Speed 2512.42 samples/sec Loss 1.2126 LearningRate 0.000040 Epoch: 32 Global Step: 680050 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:07:31,414-Speed 2497.66 samples/sec Loss 1.2223 LearningRate 0.000040 Epoch: 32 Global Step: 680060 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:07:39,617-Speed 2497.03 samples/sec Loss 1.2217 LearningRate 0.000040 Epoch: 32 Global Step: 680070 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:07:47,819-Speed 2497.16 samples/sec Loss 1.2145 LearningRate 0.000040 Epoch: 32 Global Step: 680080 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:07:56,026-Speed 2495.97 samples/sec Loss 1.2125 LearningRate 0.000040 Epoch: 32 Global Step: 680090 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:08:04,230-Speed 2496.71 samples/sec Loss 1.1724 LearningRate 0.000040 Epoch: 32 Global Step: 680100 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:08:12,383-Speed 2512.46 samples/sec Loss 1.2200 LearningRate 0.000040 Epoch: 32 Global Step: 680110 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:08:20,591-Speed 2495.48 samples/sec Loss 1.2145 LearningRate 0.000040 Epoch: 32 Global Step: 680120 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:08:28,799-Speed 2495.66 samples/sec Loss 1.1807 LearningRate 0.000040 Epoch: 32 Global Step: 680130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:08:37,005-Speed 2496.13 samples/sec Loss 1.2143 LearningRate 0.000040 Epoch: 32 Global Step: 680140 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:08:45,205-Speed 2497.87 samples/sec Loss 1.2082 LearningRate 0.000040 Epoch: 32 Global Step: 680150 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:08:53,418-Speed 2494.16 samples/sec Loss 1.2148 LearningRate 0.000040 Epoch: 32 Global Step: 680160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:09:01,577-Speed 2510.38 samples/sec Loss 1.2016 LearningRate 0.000040 Epoch: 32 Global Step: 680170 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:09:09,781-Speed 2496.85 samples/sec Loss 1.2197 LearningRate 0.000040 Epoch: 32 Global Step: 680180 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:09:17,981-Speed 2497.82 samples/sec Loss 1.2221 LearningRate 0.000040 Epoch: 32 Global Step: 680190 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:09:26,181-Speed 2497.96 samples/sec Loss 1.2769 LearningRate 0.000040 Epoch: 32 Global Step: 680200 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:09:34,384-Speed 2497.05 samples/sec Loss 1.1972 LearningRate 0.000040 Epoch: 32 Global Step: 680210 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:09:42,597-Speed 2493.97 samples/sec Loss 1.2206 LearningRate 0.000040 Epoch: 32 Global Step: 680220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:09:50,772-Speed 2505.95 samples/sec Loss 1.2178 LearningRate 0.000040 Epoch: 32 Global Step: 680230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:09:58,974-Speed 2497.27 samples/sec Loss 1.2117 LearningRate 0.000040 Epoch: 32 Global Step: 680240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:10:07,177-Speed 2497.04 samples/sec Loss 1.2030 LearningRate 0.000040 Epoch: 32 Global Step: 680250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:10:15,381-Speed 2496.67 samples/sec Loss 1.2675 LearningRate 0.000040 Epoch: 32 Global Step: 680260 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:10:23,585-Speed 2496.55 samples/sec Loss 1.2250 LearningRate 0.000040 Epoch: 32 Global Step: 680270 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:10:31,786-Speed 2498.03 samples/sec Loss 1.2081 LearningRate 0.000040 Epoch: 32 Global Step: 680280 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:10:39,936-Speed 2513.14 samples/sec Loss 1.2076 LearningRate 0.000040 Epoch: 32 Global Step: 680290 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:10:48,137-Speed 2497.51 samples/sec Loss 1.2377 LearningRate 0.000040 Epoch: 32 Global Step: 680300 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:10:56,356-Speed 2492.64 samples/sec Loss 1.2175 LearningRate 0.000040 Epoch: 32 Global Step: 680310 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:11:04,557-Speed 2497.55 samples/sec Loss 1.2011 LearningRate 0.000040 Epoch: 32 Global Step: 680320 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:11:12,846-Speed 2470.93 samples/sec Loss 1.2178 LearningRate 0.000040 Epoch: 32 Global Step: 680330 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:11:21,054-Speed 2496.87 samples/sec Loss 1.2305 LearningRate 0.000040 Epoch: 32 Global Step: 680340 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:11:29,318-Speed 2514.91 samples/sec Loss 1.2242 LearningRate 0.000040 Epoch: 32 Global Step: 680350 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:11:37,529-Speed 2494.66 samples/sec Loss 1.2210 LearningRate 0.000040 Epoch: 32 Global Step: 680360 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:11:45,738-Speed 2494.93 samples/sec Loss 1.2437 LearningRate 0.000040 Epoch: 32 Global Step: 680370 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:11:54,121-Speed 2498.99 samples/sec Loss 1.2200 LearningRate 0.000040 Epoch: 32 Global Step: 680380 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:12:02,346-Speed 2497.92 samples/sec Loss 1.2352 LearningRate 0.000040 Epoch: 32 Global Step: 680390 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:12:10,554-Speed 2495.34 samples/sec Loss 1.2213 LearningRate 0.000040 Epoch: 32 Global Step: 680400 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 
02:12:18,739-Speed 2514.56 samples/sec Loss 1.2252 LearningRate 0.000040 Epoch: 32 Global Step: 680410 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:12:27,009-Speed 2498.92 samples/sec Loss 1.2196 LearningRate 0.000040 Epoch: 32 Global Step: 680420 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:12:39,294-Speed 1779.09 samples/sec Loss 1.2073 LearningRate 0.000040 Epoch: 32 Global Step: 680430 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:12:47,553-Speed 2500.25 samples/sec Loss 1.2148 LearningRate 0.000040 Epoch: 32 Global Step: 680440 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:12:55,749-Speed 2498.92 samples/sec Loss 1.2309 LearningRate 0.000040 Epoch: 32 Global Step: 680450 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:13:07,820-Speed 1713.32 samples/sec Loss 1.2241 LearningRate 0.000040 Epoch: 32 Global Step: 680460 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:13:16,003-Speed 2516.28 samples/sec Loss 1.2069 LearningRate 0.000040 Epoch: 32 Global Step: 680470 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:13:29,788-Speed 1485.76 samples/sec Loss 1.2200 LearningRate 0.000040 Epoch: 32 Global Step: 680480 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:13:39,099-Speed 2500.09 samples/sec Loss 1.2244 LearningRate 0.000040 Epoch: 32 Global Step: 680490 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:13:47,321-Speed 2498.98 samples/sec Loss 1.2102 LearningRate 0.000040 Epoch: 32 Global Step: 680500 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:13:55,512-Speed 2500.78 samples/sec Loss 1.2250 LearningRate 0.000040 Epoch: 32 Global Step: 680510 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:14:07,469-Speed 2503.98 samples/sec Loss 1.1966 LearningRate 0.000040 Epoch: 32 Global Step: 680520 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 
02:14:15,632-Speed 2518.02 samples/sec Loss 1.2232 LearningRate 0.000040 Epoch: 32 Global Step: 680530 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:14:30,554-Speed 1378.98 samples/sec Loss 1.2375 LearningRate 0.000040 Epoch: 32 Global Step: 680540 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:14:43,351-Speed 2500.73 samples/sec Loss 1.2305 LearningRate 0.000040 Epoch: 32 Global Step: 680550 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:14:55,179-Speed 2493.69 samples/sec Loss 1.2124 LearningRate 0.000040 Epoch: 32 Global Step: 680560 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:15:03,566-Speed 2503.22 samples/sec Loss 1.2148 LearningRate 0.000040 Epoch: 32 Global Step: 680570 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:15:11,769-Speed 2496.91 samples/sec Loss 1.2420 LearningRate 0.000040 Epoch: 32 Global Step: 680580 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:15:23,341-Speed 2517.27 samples/sec Loss 1.2184 LearningRate 0.000040 Epoch: 32 Global Step: 680590 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:15:31,548-Speed 2499.58 samples/sec Loss 1.1861 LearningRate 0.000040 Epoch: 32 Global Step: 680600 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:15:40,348-Speed 2498.56 samples/sec Loss 1.1824 LearningRate 0.000040 Epoch: 32 Global Step: 680610 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:15:48,554-Speed 2496.33 samples/sec Loss 1.2173 LearningRate 0.000040 Epoch: 32 Global Step: 680620 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:15:56,764-Speed 2494.84 samples/sec Loss 1.1737 LearningRate 0.000040 Epoch: 32 Global Step: 680630 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:16:04,968-Speed 2496.59 samples/sec Loss 1.1922 LearningRate 0.000040 Epoch: 32 Global Step: 680640 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 
02:16:13,121-Speed 2512.49 samples/sec Loss 1.1867 LearningRate 0.000040 Epoch: 32 Global Step: 680650 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:16:21,329-Speed 2495.35 samples/sec Loss 1.2051 LearningRate 0.000040 Epoch: 32 Global Step: 680660 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:16:29,537-Speed 2495.89 samples/sec Loss 1.2077 LearningRate 0.000040 Epoch: 32 Global Step: 680670 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:16:37,749-Speed 2494.46 samples/sec Loss 1.2038 LearningRate 0.000040 Epoch: 32 Global Step: 680680 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:16:45,955-Speed 2496.01 samples/sec Loss 1.2290 LearningRate 0.000040 Epoch: 32 Global Step: 680690 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:16:54,162-Speed 2496.01 samples/sec Loss 1.2213 LearningRate 0.000040 Epoch: 32 Global Step: 680700 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:17:02,319-Speed 2511.32 samples/sec Loss 1.1805 LearningRate 0.000040 Epoch: 32 Global Step: 680710 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:17:10,524-Speed 2496.38 samples/sec Loss 1.2173 LearningRate 0.000040 Epoch: 32 Global Step: 680720 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:17:18,730-Speed 2495.93 samples/sec Loss 1.1739 LearningRate 0.000040 Epoch: 32 Global Step: 680730 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:17:26,937-Speed 2495.91 samples/sec Loss 1.2190 LearningRate 0.000040 Epoch: 32 Global Step: 680740 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:17:35,142-Speed 2496.35 samples/sec Loss 1.2240 LearningRate 0.000040 Epoch: 32 Global Step: 680750 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:17:43,347-Speed 2496.67 samples/sec Loss 1.1955 LearningRate 0.000040 Epoch: 32 Global Step: 680760 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 
02:17:51,503-Speed 2511.27 samples/sec Loss 1.2006 LearningRate 0.000040 Epoch: 32 Global Step: 680770 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:17:59,714-Speed 2494.72 samples/sec Loss 1.2197 LearningRate 0.000040 Epoch: 32 Global Step: 680780 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:18:07,922-Speed 2495.76 samples/sec Loss 1.2196 LearningRate 0.000040 Epoch: 32 Global Step: 680790 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:18:16,130-Speed 2495.63 samples/sec Loss 1.2311 LearningRate 0.000040 Epoch: 32 Global Step: 680800 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:18:24,347-Speed 2492.76 samples/sec Loss 1.2375 LearningRate 0.000040 Epoch: 32 Global Step: 680810 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:18:32,550-Speed 2497.26 samples/sec Loss 1.2326 LearningRate 0.000040 Epoch: 32 Global Step: 680820 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:18:40,702-Speed 2512.47 samples/sec Loss 1.2156 LearningRate 0.000040 Epoch: 32 Global Step: 680830 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:18:48,908-Speed 2496.62 samples/sec Loss 1.2284 LearningRate 0.000040 Epoch: 32 Global Step: 680840 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:18:57,115-Speed 2495.93 samples/sec Loss 1.2121 LearningRate 0.000040 Epoch: 32 Global Step: 680850 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:19:05,322-Speed 2495.64 samples/sec Loss 1.2198 LearningRate 0.000040 Epoch: 32 Global Step: 680860 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:19:13,527-Speed 2496.53 samples/sec Loss 1.2493 LearningRate 0.000040 Epoch: 32 Global Step: 680870 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:19:21,741-Speed 2493.64 samples/sec Loss 1.2221 LearningRate 0.000040 Epoch: 32 Global Step: 680880 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 
02:19:29,897-Speed 2511.26 samples/sec Loss 1.2494 LearningRate 0.000040 Epoch: 32 Global Step: 680890 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:19:38,106-Speed 2495.29 samples/sec Loss 1.2182 LearningRate 0.000040 Epoch: 32 Global Step: 680900 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:19:46,334-Speed 2489.80 samples/sec Loss 1.2215 LearningRate 0.000040 Epoch: 32 Global Step: 680910 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:19:54,544-Speed 2494.87 samples/sec Loss 1.2215 LearningRate 0.000040 Epoch: 32 Global Step: 680920 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:20:02,753-Speed 2495.22 samples/sec Loss 1.2068 LearningRate 0.000040 Epoch: 32 Global Step: 680930 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:20:10,976-Speed 2490.87 samples/sec Loss 1.2361 LearningRate 0.000040 Epoch: 32 Global Step: 680940 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:20:19,128-Speed 2512.69 samples/sec Loss 1.2126 LearningRate 0.000040 Epoch: 32 Global Step: 680950 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:20:27,337-Speed 2495.36 samples/sec Loss 1.2058 LearningRate 0.000040 Epoch: 32 Global Step: 680960 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:20:35,543-Speed 2495.87 samples/sec Loss 1.2363 LearningRate 0.000040 Epoch: 32 Global Step: 680970 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:20:43,749-Speed 2496.28 samples/sec Loss 1.2166 LearningRate 0.000040 Epoch: 32 Global Step: 680980 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:20:51,964-Speed 2493.38 samples/sec Loss 1.2255 LearningRate 0.000040 Epoch: 32 Global Step: 680990 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:21:00,175-Speed 2494.61 samples/sec Loss 1.2139 LearningRate 0.000040 Epoch: 32 Global Step: 681000 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 
02:21:08,330-Speed 2511.86 samples/sec Loss 1.2207 LearningRate 0.000040 Epoch: 32 Global Step: 681010 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:21:16,535-Speed 2496.38 samples/sec Loss 1.1904 LearningRate 0.000040 Epoch: 32 Global Step: 681020 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:21:24,745-Speed 2494.92 samples/sec Loss 1.2163 LearningRate 0.000040 Epoch: 32 Global Step: 681030 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:21:32,952-Speed 2495.83 samples/sec Loss 1.2494 LearningRate 0.000040 Epoch: 32 Global Step: 681040 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:21:41,155-Speed 2497.24 samples/sec Loss 1.2394 LearningRate 0.000040 Epoch: 32 Global Step: 681050 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:21:49,363-Speed 2495.68 samples/sec Loss 1.1815 LearningRate 0.000040 Epoch: 32 Global Step: 681060 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:21:57,520-Speed 2511.15 samples/sec Loss 1.2218 LearningRate 0.000040 Epoch: 32 Global Step: 681070 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:22:05,729-Speed 2495.32 samples/sec Loss 1.1836 LearningRate 0.000040 Epoch: 32 Global Step: 681080 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:22:13,944-Speed 2493.24 samples/sec Loss 1.2101 LearningRate 0.000040 Epoch: 32 Global Step: 681090 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:22:22,151-Speed 2495.79 samples/sec Loss 1.1965 LearningRate 0.000040 Epoch: 32 Global Step: 681100 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:22:30,361-Speed 2495.03 samples/sec Loss 1.2139 LearningRate 0.000040 Epoch: 32 Global Step: 681110 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:22:38,571-Speed 2495.01 samples/sec Loss 1.2099 LearningRate 0.000040 Epoch: 32 Global Step: 681120 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 
02:22:46,721-Speed 2513.21 samples/sec Loss 1.1950 LearningRate 0.000040 Epoch: 32 Global Step: 681130 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:22:54,932-Speed 2494.71 samples/sec Loss 1.2075 LearningRate 0.000040 Epoch: 32 Global Step: 681140 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:23:03,135-Speed 2497.02 samples/sec Loss 1.1820 LearningRate 0.000040 Epoch: 32 Global Step: 681150 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:23:11,339-Speed 2496.53 samples/sec Loss 1.2126 LearningRate 0.000040 Epoch: 32 Global Step: 681160 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:23:19,545-Speed 2496.23 samples/sec Loss 1.1913 LearningRate 0.000040 Epoch: 32 Global Step: 681170 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:23:27,750-Speed 2496.37 samples/sec Loss 1.2112 LearningRate 0.000039 Epoch: 32 Global Step: 681180 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:23:35,906-Speed 2511.59 samples/sec Loss 1.2025 LearningRate 0.000039 Epoch: 32 Global Step: 681190 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:23:44,123-Speed 2492.78 samples/sec Loss 1.2565 LearningRate 0.000039 Epoch: 32 Global Step: 681200 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:23:52,327-Speed 2496.83 samples/sec Loss 1.2232 LearningRate 0.000039 Epoch: 32 Global Step: 681210 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:24:00,532-Speed 2496.43 samples/sec Loss 1.2267 LearningRate 0.000039 Epoch: 32 Global Step: 681220 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:24:08,737-Speed 2496.40 samples/sec Loss 1.1843 LearningRate 0.000039 Epoch: 32 Global Step: 681230 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:24:16,947-Speed 2495.03 samples/sec Loss 1.2220 LearningRate 0.000039 Epoch: 32 Global Step: 681240 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 
02:24:25,099-Speed 2512.74 samples/sec Loss 1.1776 LearningRate 0.000039 Epoch: 32 Global Step: 681250 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:24:33,304-Speed 2496.51 samples/sec Loss 1.1743 LearningRate 0.000039 Epoch: 32 Global Step: 681260 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:24:41,506-Speed 2497.02 samples/sec Loss 1.2012 LearningRate 0.000039 Epoch: 32 Global Step: 681270 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:24:49,713-Speed 2496.00 samples/sec Loss 1.2056 LearningRate 0.000039 Epoch: 32 Global Step: 681280 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:24:57,917-Speed 2496.68 samples/sec Loss 1.2303 LearningRate 0.000039 Epoch: 32 Global Step: 681290 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:25:06,126-Speed 2495.33 samples/sec Loss 1.2061 LearningRate 0.000039 Epoch: 32 Global Step: 681300 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:25:14,278-Speed 2512.61 samples/sec Loss 1.2044 LearningRate 0.000039 Epoch: 32 Global Step: 681310 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:25:22,498-Speed 2491.75 samples/sec Loss 1.2067 LearningRate 0.000039 Epoch: 32 Global Step: 681320 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:25:30,716-Speed 2492.82 samples/sec Loss 1.1904 LearningRate 0.000039 Epoch: 32 Global Step: 681330 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:25:38,921-Speed 2496.29 samples/sec Loss 1.1852 LearningRate 0.000039 Epoch: 32 Global Step: 681340 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:25:47,154-Speed 2487.88 samples/sec Loss 1.1754 LearningRate 0.000039 Epoch: 32 Global Step: 681350 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:25:55,365-Speed 2494.78 samples/sec Loss 1.2026 LearningRate 0.000039 Epoch: 32 Global Step: 681360 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 
02:26:03,514-Speed 2513.76 samples/sec Loss 1.2201 LearningRate 0.000039 Epoch: 32 Global Step: 681370 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:26:11,717-Speed 2497.35 samples/sec Loss 1.2416 LearningRate 0.000039 Epoch: 32 Global Step: 681380 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:26:19,919-Speed 2497.19 samples/sec Loss 1.2013 LearningRate 0.000039 Epoch: 32 Global Step: 681390 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:26:28,125-Speed 2496.39 samples/sec Loss 1.2180 LearningRate 0.000039 Epoch: 32 Global Step: 681400 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:26:36,331-Speed 2496.21 samples/sec Loss 1.1889 LearningRate 0.000039 Epoch: 32 Global Step: 681410 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:26:44,535-Speed 2496.55 samples/sec Loss 1.2320 LearningRate 0.000039 Epoch: 32 Global Step: 681420 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:26:52,687-Speed 2512.77 samples/sec Loss 1.2455 LearningRate 0.000039 Epoch: 32 Global Step: 681430 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:27:00,889-Speed 2497.24 samples/sec Loss 1.2133 LearningRate 0.000039 Epoch: 32 Global Step: 681440 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:27:09,093-Speed 2496.78 samples/sec Loss 1.2351 LearningRate 0.000039 Epoch: 32 Global Step: 681450 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:27:17,302-Speed 2495.32 samples/sec Loss 1.2195 LearningRate 0.000039 Epoch: 32 Global Step: 681460 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:27:25,505-Speed 2496.88 samples/sec Loss 1.2263 LearningRate 0.000039 Epoch: 32 Global Step: 681470 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:27:33,715-Speed 2495.04 samples/sec Loss 1.2044 LearningRate 0.000039 Epoch: 32 Global Step: 681480 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 
02:27:41,873-Speed 2511.16 samples/sec Loss 1.2145 LearningRate 0.000039 Epoch: 32 Global Step: 681490 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:27:50,080-Speed 2495.52 samples/sec Loss 1.1864 LearningRate 0.000039 Epoch: 32 Global Step: 681500 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:27:58,286-Speed 2496.15 samples/sec Loss 1.2276 LearningRate 0.000039 Epoch: 32 Global Step: 681510 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:28:06,489-Speed 2497.12 samples/sec Loss 1.2180 LearningRate 0.000039 Epoch: 32 Global Step: 681520 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-07-12 02:28:14,695-Speed 2496.29 samples/sec Loss 1.1996 LearningRate 0.000039 Epoch: 32 Global Step: 681530 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-07-12 02:28:22,863-Speed 2507.61 samples/sec Loss 1.2203 LearningRate 0.000039 Epoch: 32 Global Step: 681540 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:28:31,015-Speed 2512.59 samples/sec Loss 1.2170 LearningRate 0.000039 Epoch: 32 Global Step: 681550 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:28:39,224-Speed 2495.45 samples/sec Loss 1.2211 LearningRate 0.000039 Epoch: 32 Global Step: 681560 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:28:47,430-Speed 2496.03 samples/sec Loss 1.2130 LearningRate 0.000039 Epoch: 32 Global Step: 681570 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:28:55,637-Speed 2496.79 samples/sec Loss 1.1936 LearningRate 0.000039 Epoch: 32 Global Step: 681580 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:29:03,845-Speed 2495.66 samples/sec Loss 1.1972 LearningRate 0.000039 Epoch: 32 Global Step: 681590 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:29:12,051-Speed 2496.51 samples/sec Loss 1.2448 LearningRate 0.000039 Epoch: 32 Global Step: 681600 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 
02:29:20,204-Speed 2512.27 samples/sec Loss 1.1813 LearningRate 0.000039 Epoch: 32 Global Step: 681610 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:29:28,408-Speed 2496.81 samples/sec Loss 1.1748 LearningRate 0.000039 Epoch: 32 Global Step: 681620 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:29:36,609-Speed 2497.35 samples/sec Loss 1.2309 LearningRate 0.000039 Epoch: 32 Global Step: 681630 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:29:44,812-Speed 2497.29 samples/sec Loss 1.2108 LearningRate 0.000039 Epoch: 32 Global Step: 681640 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:29:53,015-Speed 2497.73 samples/sec Loss 1.2013 LearningRate 0.000039 Epoch: 32 Global Step: 681650 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:30:01,216-Speed 2497.65 samples/sec Loss 1.2324 LearningRate 0.000039 Epoch: 32 Global Step: 681660 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:30:09,367-Speed 2513.07 samples/sec Loss 1.2354 LearningRate 0.000039 Epoch: 32 Global Step: 681670 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:30:17,569-Speed 2497.38 samples/sec Loss 1.2318 LearningRate 0.000039 Epoch: 32 Global Step: 681680 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:30:25,777-Speed 2495.66 samples/sec Loss 1.2361 LearningRate 0.000039 Epoch: 32 Global Step: 681690 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:30:33,981-Speed 2496.64 samples/sec Loss 1.2227 LearningRate 0.000039 Epoch: 32 Global Step: 681700 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:30:42,189-Speed 2495.57 samples/sec Loss 1.2221 LearningRate 0.000039 Epoch: 32 Global Step: 681710 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:30:50,397-Speed 2495.43 samples/sec Loss 1.2137 LearningRate 0.000039 Epoch: 32 Global Step: 681720 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 
02:30:58,556-Speed 2510.51 samples/sec Loss 1.2155 LearningRate 0.000039 Epoch: 32 Global Step: 681730 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:31:06,761-Speed 2496.40 samples/sec Loss 1.2213 LearningRate 0.000039 Epoch: 32 Global Step: 681740 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:31:14,965-Speed 2496.60 samples/sec Loss 1.2267 LearningRate 0.000039 Epoch: 32 Global Step: 681750 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:31:23,171-Speed 2496.12 samples/sec Loss 1.1843 LearningRate 0.000039 Epoch: 32 Global Step: 681760 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:31:31,381-Speed 2494.81 samples/sec Loss 1.1995 LearningRate 0.000039 Epoch: 32 Global Step: 681770 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:31:39,590-Speed 2495.43 samples/sec Loss 1.2377 LearningRate 0.000039 Epoch: 32 Global Step: 681780 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:31:47,750-Speed 2510.03 samples/sec Loss 1.1936 LearningRate 0.000039 Epoch: 32 Global Step: 681790 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:31:55,959-Speed 2495.19 samples/sec Loss 1.1702 LearningRate 0.000039 Epoch: 32 Global Step: 681800 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:32:04,164-Speed 2496.10 samples/sec Loss 1.2014 LearningRate 0.000039 Epoch: 32 Global Step: 681810 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:32:12,375-Speed 2494.84 samples/sec Loss 1.1951 LearningRate 0.000039 Epoch: 32 Global Step: 681820 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:32:20,583-Speed 2495.47 samples/sec Loss 1.1841 LearningRate 0.000039 Epoch: 32 Global Step: 681830 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:32:28,794-Speed 2494.45 samples/sec Loss 1.2106 LearningRate 0.000039 Epoch: 32 Global Step: 681840 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 
02:32:36,949-Speed 2511.72 samples/sec Loss 1.2084 LearningRate 0.000039 Epoch: 32 Global Step: 681850 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:32:45,153-Speed 2496.84 samples/sec Loss 1.1827 LearningRate 0.000039 Epoch: 32 Global Step: 681860 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:32:53,358-Speed 2496.58 samples/sec Loss 1.2297 LearningRate 0.000039 Epoch: 32 Global Step: 681870 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:33:01,572-Speed 2493.52 samples/sec Loss 1.2386 LearningRate 0.000039 Epoch: 32 Global Step: 681880 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:33:09,775-Speed 2496.93 samples/sec Loss 1.2081 LearningRate 0.000039 Epoch: 32 Global Step: 681890 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:33:17,977-Speed 2497.26 samples/sec Loss 1.2198 LearningRate 0.000039 Epoch: 32 Global Step: 681900 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:33:26,128-Speed 2512.94 samples/sec Loss 1.2073 LearningRate 0.000039 Epoch: 32 Global Step: 681910 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:33:34,349-Speed 2491.58 samples/sec Loss 1.2129 LearningRate 0.000039 Epoch: 32 Global Step: 681920 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:33:42,555-Speed 2496.05 samples/sec Loss 1.1987 LearningRate 0.000039 Epoch: 32 Global Step: 681930 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:33:50,763-Speed 2495.57 samples/sec Loss 1.2078 LearningRate 0.000039 Epoch: 32 Global Step: 681940 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:33:58,966-Speed 2497.05 samples/sec Loss 1.2132 LearningRate 0.000039 Epoch: 32 Global Step: 681950 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:34:07,173-Speed 2496.00 samples/sec Loss 1.2216 LearningRate 0.000039 Epoch: 32 Global Step: 681960 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 
02:34:15,323-Speed 2513.26 samples/sec Loss 1.2273 LearningRate 0.000039 Epoch: 32 Global Step: 681970 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:34:23,531-Speed 2495.41 samples/sec Loss 1.1873 LearningRate 0.000039 Epoch: 32 Global Step: 681980 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:34:31,735-Speed 2496.50 samples/sec Loss 1.1849 LearningRate 0.000039 Epoch: 32 Global Step: 681990 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:34:39,941-Speed 2496.36 samples/sec Loss 1.2276 LearningRate 0.000039 Epoch: 32 Global Step: 682000 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:34:48,146-Speed 2496.37 samples/sec Loss 1.1966 LearningRate 0.000039 Epoch: 32 Global Step: 682010 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:34:56,353-Speed 2495.88 samples/sec Loss 1.2013 LearningRate 0.000039 Epoch: 32 Global Step: 682020 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:35:04,507-Speed 2511.92 samples/sec Loss 1.2145 LearningRate 0.000039 Epoch: 32 Global Step: 682030 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-07-12 02:35:12,670-Speed 2509.13 samples/sec Loss 1.2258 LearningRate 0.000039 Epoch: 32 Global Step: 682040 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:35:20,875-Speed 2496.87 samples/sec Loss 1.2306 LearningRate 0.000039 Epoch: 32 Global Step: 682050 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:35:29,091-Speed 2493.14 samples/sec Loss 1.1964 LearningRate 0.000039 Epoch: 32 Global Step: 682060 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:35:37,295-Speed 2496.84 samples/sec Loss 1.2041 LearningRate 0.000039 Epoch: 32 Global Step: 682070 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:35:45,499-Speed 2496.67 samples/sec Loss 1.1943 LearningRate 0.000039 Epoch: 32 Global Step: 682080 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:35:53,649-Speed 2513.50 samples/sec Loss 1.2246 LearningRate 0.000039 Epoch: 32 Global Step: 682090 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:36:01,853-Speed 2496.82 samples/sec Loss 1.1827 LearningRate 0.000039 Epoch: 32 Global Step: 682100 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:36:10,070-Speed 2492.51 samples/sec Loss 1.1747 LearningRate 0.000039 Epoch: 32 Global Step: 682110 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:36:18,276-Speed 2496.12 samples/sec Loss 1.2182 LearningRate 0.000039 Epoch: 32 Global Step: 682120 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:36:26,481-Speed 2496.26 samples/sec Loss 1.2512 LearningRate 0.000039 Epoch: 32 Global Step: 682130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:36:34,685-Speed 2496.99 samples/sec Loss 1.2200 LearningRate 0.000039 Epoch: 32 Global Step: 682140 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:36:42,836-Speed 2512.89 samples/sec Loss 1.2044 LearningRate 0.000039 Epoch: 32 Global Step: 682150 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:36:51,040-Speed 2497.24 samples/sec Loss 1.2261 LearningRate 0.000039 Epoch: 32 Global Step: 682160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:36:59,257-Speed 2492.68 samples/sec Loss 1.1950 LearningRate 0.000039 Epoch: 32 Global Step: 682170 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:37:07,464-Speed 2495.99 samples/sec Loss 1.2444 LearningRate 0.000039 Epoch: 32 Global Step: 682180 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:37:15,669-Speed 2496.63 samples/sec Loss 1.2229 LearningRate 0.000039 Epoch: 32 Global Step: 682190 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:37:23,872-Speed 2496.87 samples/sec Loss 1.1432 LearningRate 0.000039 Epoch: 32 Global Step: 682200 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:37:32,024-Speed 2512.83 samples/sec Loss 1.2167 LearningRate 0.000039 Epoch: 32 Global Step: 682210 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:37:40,228-Speed 2496.82 samples/sec Loss 1.2131 LearningRate 0.000039 Epoch: 32 Global Step: 682220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:37:48,438-Speed 2494.89 samples/sec Loss 1.2066 LearningRate 0.000039 Epoch: 32 Global Step: 682230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:37:56,643-Speed 2496.41 samples/sec Loss 1.2046 LearningRate 0.000039 Epoch: 32 Global Step: 682240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:38:04,849-Speed 2496.27 samples/sec Loss 1.2374 LearningRate 0.000039 Epoch: 32 Global Step: 682250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:38:13,052-Speed 2496.95 samples/sec Loss 1.2111 LearningRate 0.000039 Epoch: 32 Global Step: 682260 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:38:21,200-Speed 2513.92 samples/sec Loss 1.2111 LearningRate 0.000039 Epoch: 32 Global Step: 682270 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:38:29,406-Speed 2496.34 samples/sec Loss 1.1769 LearningRate 0.000039 Epoch: 32 Global Step: 682280 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:38:37,608-Speed 2497.43 samples/sec Loss 1.1950 LearningRate 0.000039 Epoch: 32 Global Step: 682290 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:38:45,812-Speed 2496.59 samples/sec Loss 1.2076 LearningRate 0.000039 Epoch: 32 Global Step: 682300 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:38:54,015-Speed 2497.19 samples/sec Loss 1.2312 LearningRate 0.000039 Epoch: 32 Global Step: 682310 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:39:02,225-Speed 2495.30 samples/sec Loss 1.2004 LearningRate 0.000039 Epoch: 32 Global Step: 682320 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:39:10,375-Speed 2513.17 samples/sec Loss 1.2157 LearningRate 0.000039 Epoch: 32 Global Step: 682330 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:39:18,581-Speed 2496.16 samples/sec Loss 1.2286 LearningRate 0.000039 Epoch: 32 Global Step: 682340 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:39:26,792-Speed 2494.47 samples/sec Loss 1.1905 LearningRate 0.000039 Epoch: 32 Global Step: 682350 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:39:34,998-Speed 2496.27 samples/sec Loss 1.2355 LearningRate 0.000039 Epoch: 32 Global Step: 682360 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:39:43,201-Speed 2496.83 samples/sec Loss 1.2279 LearningRate 0.000039 Epoch: 32 Global Step: 682370 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:39:51,405-Speed 2496.56 samples/sec Loss 1.2158 LearningRate 0.000039 Epoch: 32 Global Step: 682380 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:39:59,561-Speed 2511.65 samples/sec Loss 1.2049 LearningRate 0.000039 Epoch: 32 Global Step: 682390 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:40:07,764-Speed 2496.91 samples/sec Loss 1.2283 LearningRate 0.000039 Epoch: 32 Global Step: 682400 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:40:15,972-Speed 2495.48 samples/sec Loss 1.1997 LearningRate 0.000039 Epoch: 32 Global Step: 682410 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:40:24,178-Speed 2496.19 samples/sec Loss 1.2014 LearningRate 0.000039 Epoch: 32 Global Step: 682420 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:40:32,385-Speed 2495.87 samples/sec Loss 1.2129 LearningRate 0.000039 Epoch: 32 Global Step: 682430 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:40:40,590-Speed 2496.80 samples/sec Loss 1.2081 LearningRate 0.000039 Epoch: 32 Global Step: 682440 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:40:48,742-Speed 2512.59 samples/sec Loss 1.1687 LearningRate 0.000039 Epoch: 32 Global Step: 682450 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:40:56,944-Speed 2497.38 samples/sec Loss 1.2016 LearningRate 0.000039 Epoch: 32 Global Step: 682460 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:41:05,148-Speed 2496.73 samples/sec Loss 1.2394 LearningRate 0.000039 Epoch: 32 Global Step: 682470 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:41:13,353-Speed 2496.17 samples/sec Loss 1.2297 LearningRate 0.000039 Epoch: 32 Global Step: 682480 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:41:21,579-Speed 2490.33 samples/sec Loss 1.2035 LearningRate 0.000039 Epoch: 32 Global Step: 682490 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:41:29,788-Speed 2495.02 samples/sec Loss 1.2086 LearningRate 0.000039 Epoch: 32 Global Step: 682500 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:41:37,935-Speed 2514.36 samples/sec Loss 1.2220 LearningRate 0.000039 Epoch: 32 Global Step: 682510 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:41:46,135-Speed 2497.80 samples/sec Loss 1.2052 LearningRate 0.000039 Epoch: 32 Global Step: 682520 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:41:54,337-Speed 2497.28 samples/sec Loss 1.2044 LearningRate 0.000039 Epoch: 32 Global Step: 682530 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:42:02,539-Speed 2497.41 samples/sec Loss 1.2106 LearningRate 0.000039 Epoch: 32 Global Step: 682540 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:42:10,742-Speed 2497.03 samples/sec Loss 1.2192 LearningRate 0.000039 Epoch: 32 Global Step: 682550 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:42:18,947-Speed 2496.54 samples/sec Loss 1.2227 LearningRate 0.000039 Epoch: 32 Global Step: 682560 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:42:27,099-Speed 2512.46 samples/sec Loss 1.1827 LearningRate 0.000039 Epoch: 32 Global Step: 682570 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:42:35,305-Speed 2496.20 samples/sec Loss 1.2039 LearningRate 0.000039 Epoch: 32 Global Step: 682580 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:42:43,509-Speed 2496.71 samples/sec Loss 1.2164 LearningRate 0.000039 Epoch: 32 Global Step: 682590 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:42:51,715-Speed 2496.08 samples/sec Loss 1.1934 LearningRate 0.000039 Epoch: 32 Global Step: 682600 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:42:59,922-Speed 2495.88 samples/sec Loss 1.1901 LearningRate 0.000039 Epoch: 32 Global Step: 682610 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:43:08,133-Speed 2494.64 samples/sec Loss 1.1929 LearningRate 0.000039 Epoch: 32 Global Step: 682620 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:43:16,286-Speed 2512.59 samples/sec Loss 1.2147 LearningRate 0.000039 Epoch: 32 Global Step: 682630 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:43:24,491-Speed 2496.41 samples/sec Loss 1.2017 LearningRate 0.000039 Epoch: 32 Global Step: 682640 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:43:32,700-Speed 2494.93 samples/sec Loss 1.2044 LearningRate 0.000039 Epoch: 32 Global Step: 682650 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:43:40,918-Speed 2492.78 samples/sec Loss 1.1942 LearningRate 0.000039 Epoch: 32 Global Step: 682660 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:43:49,123-Speed 2496.31 samples/sec Loss 1.2220 LearningRate 0.000039 Epoch: 32 Global Step: 682670 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:43:57,334-Speed 2494.52 samples/sec Loss 1.1801 LearningRate 0.000039 Epoch: 32 Global Step: 682680 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:44:05,490-Speed 2511.75 samples/sec Loss 1.2064 LearningRate 0.000039 Epoch: 32 Global Step: 682690 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:44:13,692-Speed 2497.21 samples/sec Loss 1.1989 LearningRate 0.000039 Epoch: 32 Global Step: 682700 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:44:21,897-Speed 2496.77 samples/sec Loss 1.2099 LearningRate 0.000039 Epoch: 32 Global Step: 682710 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:44:30,101-Speed 2496.74 samples/sec Loss 1.1996 LearningRate 0.000039 Epoch: 32 Global Step: 682720 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:44:38,305-Speed 2496.74 samples/sec Loss 1.2154 LearningRate 0.000039 Epoch: 32 Global Step: 682730 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:44:46,508-Speed 2497.40 samples/sec Loss 1.2398 LearningRate 0.000039 Epoch: 32 Global Step: 682740 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:44:54,656-Speed 2513.74 samples/sec Loss 1.2008 LearningRate 0.000039 Epoch: 32 Global Step: 682750 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:45:02,858-Speed 2497.31 samples/sec Loss 1.1903 LearningRate 0.000039 Epoch: 32 Global Step: 682760 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:45:11,060-Speed 2497.45 samples/sec Loss 1.2302 LearningRate 0.000039 Epoch: 32 Global Step: 682770 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:45:19,263-Speed 2497.08 samples/sec Loss 1.2092 LearningRate 0.000039 Epoch: 32 Global Step: 682780 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:45:27,464-Speed 2498.09 samples/sec Loss 1.2257 LearningRate 0.000039 Epoch: 32 Global Step: 682790 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:45:35,669-Speed 2496.34 samples/sec Loss 1.1896 LearningRate 0.000039 Epoch: 32 Global Step: 682800 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:45:43,821-Speed 2512.41 samples/sec Loss 1.1915 LearningRate 0.000039 Epoch: 32 Global Step: 682810 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:45:52,025-Speed 2496.89 samples/sec Loss 1.1958 LearningRate 0.000039 Epoch: 32 Global Step: 682820 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:46:00,229-Speed 2496.93 samples/sec Loss 1.1961 LearningRate 0.000039 Epoch: 32 Global Step: 682830 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:46:08,431-Speed 2497.07 samples/sec Loss 1.2020 LearningRate 0.000039 Epoch: 32 Global Step: 682840 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:46:16,637-Speed 2496.24 samples/sec Loss 1.2099 LearningRate 0.000039 Epoch: 32 Global Step: 682850 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:46:24,840-Speed 2496.93 samples/sec Loss 1.2173 LearningRate 0.000039 Epoch: 32 Global Step: 682860 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:46:32,991-Speed 2513.45 samples/sec Loss 1.2195 LearningRate 0.000039 Epoch: 32 Global Step: 682870 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:46:41,194-Speed 2497.06 samples/sec Loss 1.2153 LearningRate 0.000039 Epoch: 32 Global Step: 682880 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:46:49,401-Speed 2495.94 samples/sec Loss 1.2055 LearningRate 0.000039 Epoch: 32 Global Step: 682890 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:46:57,606-Speed 2496.56 samples/sec Loss 1.2138 LearningRate 0.000039 Epoch: 32 Global Step: 682900 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:47:05,818-Speed 2494.41 samples/sec Loss 1.2198 LearningRate 0.000039 Epoch: 32 Global Step: 682910 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:47:14,021-Speed 2497.12 samples/sec Loss 1.2080 LearningRate 0.000039 Epoch: 32 Global Step: 682920 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:47:22,169-Speed 2513.94 samples/sec Loss 1.2347 LearningRate 0.000039 Epoch: 32 Global Step: 682930 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:47:30,375-Speed 2496.17 samples/sec Loss 1.2359 LearningRate 0.000039 Epoch: 32 Global Step: 682940 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:47:38,579-Speed 2496.77 samples/sec Loss 1.2318 LearningRate 0.000039 Epoch: 32 Global Step: 682950 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:47:46,781-Speed 2497.51 samples/sec Loss 1.1923 LearningRate 0.000039 Epoch: 32 Global Step: 682960 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:47:54,985-Speed 2496.63 samples/sec Loss 1.2231 LearningRate 0.000039 Epoch: 32 Global Step: 682970 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:48:03,195-Speed 2494.80 samples/sec Loss 1.2101 LearningRate 0.000039 Epoch: 32 Global Step: 682980 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:48:11,338-Speed 2515.97 samples/sec Loss 1.2333 LearningRate 0.000039 Epoch: 32 Global Step: 682990 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:48:19,544-Speed 2496.05 samples/sec Loss 1.1822 LearningRate 0.000039 Epoch: 32 Global Step: 683000 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:48:27,748-Speed 2496.60 samples/sec Loss 1.1924 LearningRate 0.000039 Epoch: 32 Global Step: 683010 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:48:35,952-Speed 2496.78 samples/sec Loss 1.1772 LearningRate 0.000039 Epoch: 32 Global Step: 683020 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:48:44,169-Speed 2492.88 samples/sec Loss 1.2074 LearningRate 0.000039 Epoch: 32 Global Step: 683030 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:48:52,384-Speed 2493.28 samples/sec Loss 1.2213 LearningRate 0.000039 Epoch: 32 Global Step: 683040 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:49:00,537-Speed 2512.53 samples/sec Loss 1.1978 LearningRate 0.000039 Epoch: 32 Global Step: 683050 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:49:08,742-Speed 2496.42 samples/sec Loss 1.2285 LearningRate 0.000039 Epoch: 32 Global Step: 683060 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:49:16,945-Speed 2496.93 samples/sec Loss 1.1845 LearningRate 0.000038 Epoch: 32 Global Step: 683070 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:49:25,151-Speed 2496.09 samples/sec Loss 1.2007 LearningRate 0.000038 Epoch: 32 Global Step: 683080 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:49:33,355-Speed 2496.97 samples/sec Loss 1.1993 LearningRate 0.000038 Epoch: 32 Global Step: 683090 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:49:41,568-Speed 2493.92 samples/sec Loss 1.1985 LearningRate 0.000038 Epoch: 32 Global Step: 683100 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:49:49,721-Speed 2512.56 samples/sec Loss 1.1895 LearningRate 0.000038 Epoch: 32 Global Step: 683110 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:49:57,929-Speed 2495.39 samples/sec Loss 1.2095 LearningRate 0.000038 Epoch: 32 Global Step: 683120 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:50:06,136-Speed 2495.91 samples/sec Loss 1.1987 LearningRate 0.000038 Epoch: 32 Global Step: 683130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:50:14,337-Speed 2497.56 samples/sec Loss 1.1804 LearningRate 0.000038 Epoch: 32 Global Step: 683140 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:50:22,538-Speed 2497.55 samples/sec Loss 1.2141 LearningRate 0.000038 Epoch: 32 Global Step: 683150 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 02:50:30,741-Speed 2497.13 samples/sec Loss 1.1880 LearningRate 0.000038 Epoch: 32 Global Step: 683160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-07-12 
02:50:38,891-Speed 2513.17 samples/sec Loss 1.1981 LearningRate 0.000038 Epoch: 32 Global Step: 683170 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-07-12 02:50:47,097-Speed 2496.18 samples/sec Loss 1.2180 LearningRate 0.000038 Epoch: 32 Global Step: 683180 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-07-12 02:50:55,309-Speed 2494.55 samples/sec Loss 1.1843 LearningRate 0.000038 Epoch: 32 Global Step: 683190 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-07-12 02:51:03,515-Speed 2496.44 samples/sec Loss 1.1891 LearningRate 0.000038 Epoch: 32 Global Step: 683200 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-07-12 02:51:11,719-Speed 2496.56 samples/sec Loss 1.1700 LearningRate 0.000038 Epoch: 32 Global Step: 683210 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-07-12 02:51:19,921-Speed 2497.15 samples/sec Loss 1.1938 LearningRate 0.000038 Epoch: 32 Global Step: 683220 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-07-12 02:51:28,070-Speed 2513.74 samples/sec Loss 1.2308 LearningRate 0.000038 Epoch: 32 Global Step: 683230 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-07-12 02:51:36,274-Speed 2496.52 samples/sec Loss 1.2216 LearningRate 0.000038 Epoch: 32 Global Step: 683240 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:51:44,477-Speed 2497.13 samples/sec Loss 1.2484 LearningRate 0.000038 Epoch: 32 Global Step: 683250 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:51:52,680-Speed 2496.98 samples/sec Loss 1.2192 LearningRate 0.000038 Epoch: 32 Global Step: 683260 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:52:00,885-Speed 2496.58 samples/sec Loss 1.2030 LearningRate 0.000038 Epoch: 32 Global Step: 683270 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:52:09,090-Speed 2496.27 samples/sec Loss 1.1963 LearningRate 0.000038 Epoch: 32 Global Step: 683280 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:52:17,244-Speed 2512.26 samples/sec Loss 1.2710 LearningRate 0.000038 Epoch: 32 Global Step: 683290 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:52:25,448-Speed 2496.79 samples/sec Loss 1.2018 LearningRate 0.000038 Epoch: 32 Global Step: 683300 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:52:33,653-Speed 2496.23 samples/sec Loss 1.1982 LearningRate 0.000038 Epoch: 32 Global Step: 683310 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:52:41,860-Speed 2496.00 samples/sec Loss 1.2135 LearningRate 0.000038 Epoch: 32 Global Step: 683320 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:52:50,065-Speed 2496.06 samples/sec Loss 1.2086 LearningRate 0.000038 Epoch: 32 Global Step: 683330 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:52:58,273-Speed 2495.56 samples/sec Loss 1.1663 LearningRate 0.000038 Epoch: 32 Global Step: 683340 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:53:06,427-Speed 2512.12 samples/sec Loss 1.2113 LearningRate 0.000038 Epoch: 32 Global Step: 683350 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:53:14,634-Speed 2495.79 samples/sec Loss 1.2116 LearningRate 0.000038 Epoch: 32 Global Step: 683360 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:53:22,845-Speed 2494.55 samples/sec Loss 1.2129 LearningRate 0.000038 Epoch: 32 Global Step: 683370 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:53:31,049-Speed 2496.79 samples/sec Loss 1.1931 LearningRate 0.000038 Epoch: 32 Global Step: 683380 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:53:39,259-Speed 2495.14 samples/sec Loss 1.1856 LearningRate 0.000038 Epoch: 32 Global Step: 683390 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:53:47,465-Speed 2495.84 samples/sec Loss 1.1738 LearningRate 0.000038 Epoch: 32 Global Step: 683400 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:53:55,617-Speed 2512.81 samples/sec Loss 1.2129 LearningRate 0.000038 Epoch: 32 Global Step: 683410 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:54:03,820-Speed 2496.93 samples/sec Loss 1.1989 LearningRate 0.000038 Epoch: 32 Global Step: 683420 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:54:12,022-Speed 2497.52 samples/sec Loss 1.2312 LearningRate 0.000038 Epoch: 32 Global Step: 683430 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:54:20,227-Speed 2496.31 samples/sec Loss 1.2119 LearningRate 0.000038 Epoch: 32 Global Step: 683440 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:54:28,448-Speed 2491.50 samples/sec Loss 1.2508 LearningRate 0.000038 Epoch: 32 Global Step: 683450 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:54:36,654-Speed 2496.56 samples/sec Loss 1.2142 LearningRate 0.000038 Epoch: 32 Global Step: 683460 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:54:44,819-Speed 2508.64 samples/sec Loss 1.2077 LearningRate 0.000038 Epoch: 32 Global Step: 683470 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:54:53,035-Speed 2493.10 samples/sec Loss 1.1941 LearningRate 0.000038 Epoch: 32 Global Step: 683480 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:55:01,240-Speed 2496.47 samples/sec Loss 1.1798 LearningRate 0.000038 Epoch: 32 Global Step: 683490 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:55:09,448-Speed 2495.42 samples/sec Loss 1.1785 LearningRate 0.000038 Epoch: 32 Global Step: 683500 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:55:17,663-Speed 2493.78 samples/sec Loss 1.2109 LearningRate 0.000038 Epoch: 32 Global Step: 683510 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:55:25,869-Speed 2496.23 samples/sec Loss 1.1814 LearningRate 0.000038 Epoch: 32 Global Step: 683520 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:55:34,028-Speed 2510.52 samples/sec Loss 1.2206 LearningRate 0.000038 Epoch: 32 Global Step: 683530 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:55:42,233-Speed 2496.38 samples/sec Loss 1.2069 LearningRate 0.000038 Epoch: 32 Global Step: 683540 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:55:50,437-Speed 2496.60 samples/sec Loss 1.2213 LearningRate 0.000038 Epoch: 32 Global Step: 683550 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:55:58,641-Speed 2496.92 samples/sec Loss 1.2246 LearningRate 0.000038 Epoch: 32 Global Step: 683560 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:56:06,851-Speed 2494.97 samples/sec Loss 1.2155 LearningRate 0.000038 Epoch: 32 Global Step: 683570 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:56:15,052-Speed 2497.48 samples/sec Loss 1.2221 LearningRate 0.000038 Epoch: 32 Global Step: 683580 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:56:23,206-Speed 2512.29 samples/sec Loss 1.1892 LearningRate 0.000038 Epoch: 32 Global Step: 683590 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:56:31,413-Speed 2495.73 samples/sec Loss 1.2148 LearningRate 0.000038 Epoch: 32 Global Step: 683600 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:56:39,617-Speed 2496.59 samples/sec Loss 1.2218 LearningRate 0.000038 Epoch: 32 Global Step: 683610 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:56:47,824-Speed 2495.89 samples/sec Loss 1.2004 LearningRate 0.000038 Epoch: 32 Global Step: 683620 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:56:56,037-Speed 2494.36 samples/sec Loss 1.1886 LearningRate 0.000038 Epoch: 32 Global Step: 683630 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:57:04,242-Speed 2496.30 samples/sec Loss 1.2307 LearningRate 0.000038 Epoch: 32 Global Step: 683640 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:57:12,393-Speed 2513.17 samples/sec Loss 1.2000 LearningRate 0.000038 Epoch: 32 Global Step: 683650 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:57:20,608-Speed 2493.36 samples/sec Loss 1.1983 LearningRate 0.000038 Epoch: 32 Global Step: 683660 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:57:28,814-Speed 2496.22 samples/sec Loss 1.2150 LearningRate 0.000038 Epoch: 32 Global Step: 683670 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:57:37,021-Speed 2495.68 samples/sec Loss 1.1673 LearningRate 0.000038 Epoch: 32 Global Step: 683680 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:57:45,231-Speed 2495.04 samples/sec Loss 1.1988 LearningRate 0.000038 Epoch: 32 Global Step: 683690 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:57:53,437-Speed 2496.32 samples/sec Loss 1.2408 LearningRate 0.000038 Epoch: 32 Global Step: 683700 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:58:01,591-Speed 2511.92 samples/sec Loss 1.2301 LearningRate 0.000038 Epoch: 32 Global Step: 683710 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:58:09,800-Speed 2495.13 samples/sec Loss 1.2137 LearningRate 0.000038 Epoch: 32 Global Step: 683720 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:58:18,008-Speed 2495.79 samples/sec Loss 1.1767 LearningRate 0.000038 Epoch: 32 Global Step: 683730 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:58:26,216-Speed 2495.58 samples/sec Loss 1.2172 LearningRate 0.000038 Epoch: 32 Global Step: 683740 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:58:34,424-Speed 2495.30 samples/sec Loss 1.2146 LearningRate 0.000038 Epoch: 32 Global Step: 683750 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:58:42,632-Speed 2495.65 samples/sec Loss 1.2096 LearningRate 0.000038 Epoch: 32 Global Step: 683760 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:58:50,786-Speed 2512.91 samples/sec Loss 1.1902 LearningRate 0.000038 Epoch: 32 Global Step: 683770 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:58:58,996-Speed 2494.82 samples/sec Loss 1.2341 LearningRate 0.000038 Epoch: 32 Global Step: 683780 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:59:07,204-Speed 2495.57 samples/sec Loss 1.2026 LearningRate 0.000038 Epoch: 32 Global Step: 683790 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:59:15,412-Speed 2495.48 samples/sec Loss 1.2166 LearningRate 0.000038 Epoch: 32 Global Step: 683800 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:59:23,621-Speed 2495.24 samples/sec Loss 1.1879 LearningRate 0.000038 Epoch: 32 Global Step: 683810 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:59:31,831-Speed 2494.94 samples/sec Loss 1.2324 LearningRate 0.000038 Epoch: 32 Global Step: 683820 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:59:39,987-Speed 2511.24 samples/sec Loss 1.2096 LearningRate 0.000038 Epoch: 32 Global Step: 683830 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:59:48,199-Speed 2494.43 samples/sec Loss 1.2166 LearningRate 0.000038 Epoch: 32 Global Step: 683840 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 02:59:56,408-Speed 2495.30 samples/sec Loss 1.2185 LearningRate 0.000038 Epoch: 32 Global Step: 683850 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:00:04,618-Speed 2494.77 samples/sec Loss 1.2187 LearningRate 0.000038 Epoch: 32 Global Step: 683860 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:00:12,824-Speed 2496.09 samples/sec Loss 1.2325 LearningRate 0.000038 Epoch: 32 Global Step: 683870 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:00:21,033-Speed 2495.19 samples/sec Loss 1.2211 LearningRate 0.000038 Epoch: 32 Global Step: 683880 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:00:29,187-Speed 2512.17 samples/sec Loss 1.1968 LearningRate 0.000038 Epoch: 32 Global Step: 683890 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:00:37,408-Speed 2491.47 samples/sec Loss 1.1997 LearningRate 0.000038 Epoch: 32 Global Step: 683900 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:00:45,619-Speed 2494.53 samples/sec Loss 1.2216 LearningRate 0.000038 Epoch: 32 Global Step: 683910 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:00:53,838-Speed 2492.29 samples/sec Loss 1.1898 LearningRate 0.000038 Epoch: 32 Global Step: 683920 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:01:02,042-Speed 2496.72 samples/sec Loss 1.1869 LearningRate 0.000038 Epoch: 32 Global Step: 683930 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:01:10,250-Speed 2495.51 samples/sec Loss 1.2169 LearningRate 0.000038 Epoch: 32 Global Step: 683940 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:01:18,401-Speed 2513.30 samples/sec Loss 1.2109 LearningRate 0.000038 Epoch: 32 Global Step: 683950 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:01:26,606-Speed 2496.39 samples/sec Loss 1.1892 LearningRate 0.000038 Epoch: 32 Global Step: 683960 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:01:34,811-Speed 2496.36 samples/sec Loss 1.2276 LearningRate 0.000038 Epoch: 32 Global Step: 683970 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:01:43,036-Speed 2490.37 samples/sec Loss 1.2001 LearningRate 0.000038 Epoch: 32 Global Step: 683980 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:01:51,238-Speed 2497.27 samples/sec Loss 1.2044 LearningRate 0.000038 Epoch: 32 Global Step: 683990 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:01:59,441-Speed 2496.91 samples/sec Loss 1.2198 LearningRate 0.000038 Epoch: 32 Global Step: 684000 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:02:07,607-Speed 2508.63 samples/sec Loss 1.1972 LearningRate 0.000038 Epoch: 32 Global Step: 684010 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:02:15,816-Speed 2494.97 samples/sec Loss 1.2026 LearningRate 0.000038 Epoch: 32 Global Step: 684020 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:02:24,019-Speed 2496.91 samples/sec Loss 1.1894 LearningRate 0.000038 Epoch: 32 Global Step: 684030 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:02:32,222-Speed 2497.20 samples/sec Loss 1.2272 LearningRate 0.000038 Epoch: 32 Global Step: 684040 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:02:40,428-Speed 2496.31 samples/sec Loss 1.1770 LearningRate 0.000038 Epoch: 32 Global Step: 684050 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:02:48,632-Speed 2496.73 samples/sec Loss 1.2065 LearningRate 0.000038 Epoch: 32 Global Step: 684060 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:02:56,781-Speed 2513.63 samples/sec Loss 1.2254 LearningRate 0.000038 Epoch: 32 Global Step: 684070 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:03:04,985-Speed 2496.86 samples/sec Loss 1.2554 LearningRate 0.000038 Epoch: 32 Global Step: 684080 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:03:13,191-Speed 2496.21 samples/sec Loss 1.2182 LearningRate 0.000038 Epoch: 32 Global Step: 684090 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:03:21,395-Speed 2496.65 samples/sec Loss 1.1749 LearningRate 0.000038 Epoch: 32 Global Step: 684100 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:03:29,601-Speed 2496.15 samples/sec Loss 1.2454 LearningRate 0.000038 Epoch: 32 Global Step: 684110 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:03:37,805-Speed 2496.69 samples/sec Loss 1.1941 LearningRate 0.000038 Epoch: 32 Global Step: 684120 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:03:45,961-Speed 2511.10 samples/sec Loss 1.2049 LearningRate 0.000038 Epoch: 32 Global Step: 684130 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:03:54,178-Speed 2492.90 samples/sec Loss 1.2129 LearningRate 0.000038 Epoch: 32 Global Step: 684140 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:04:02,382-Speed 2496.96 samples/sec Loss 1.1714 LearningRate 0.000038 Epoch: 32 Global Step: 684150 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:04:10,589-Speed 2495.69 samples/sec Loss 1.2458 LearningRate 0.000038 Epoch: 32 Global Step: 684160 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:04:18,793-Speed 2496.97 samples/sec Loss 1.2319 LearningRate 0.000038 Epoch: 32 Global Step: 684170 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:04:26,997-Speed 2496.31 samples/sec Loss 1.2424 LearningRate 0.000038 Epoch: 32 Global Step: 684180 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:04:35,146-Speed 2513.71 samples/sec Loss 1.2120 LearningRate 0.000038 Epoch: 32 Global Step: 684190 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:04:43,358-Speed 2494.40 samples/sec Loss 1.2113 LearningRate 0.000038 Epoch: 32 Global Step: 684200 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:04:51,567-Speed 2495.08 samples/sec Loss 1.2182 LearningRate 0.000038 Epoch: 32 Global Step: 684210 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:04:59,775-Speed 2495.66 samples/sec Loss 1.2226 LearningRate 0.000038 Epoch: 32 Global Step: 684220 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:05:07,982-Speed 2496.07 samples/sec Loss 1.2164 LearningRate 0.000038 Epoch: 32 Global Step: 684230 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:05:16,188-Speed 2496.09 samples/sec Loss 1.2171 LearningRate 0.000038 Epoch: 32 Global Step: 684240 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:05:24,339-Speed 2512.78 samples/sec Loss 1.2085 LearningRate 0.000038 Epoch: 32 Global Step: 684250 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:05:32,544-Speed 2496.58 samples/sec Loss 1.1785 LearningRate 0.000038 Epoch: 32 Global Step: 684260 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:05:40,750-Speed 2496.41 samples/sec Loss 1.2076 LearningRate 0.000038 Epoch: 32 Global Step: 684270 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:05:48,958-Speed 2495.41 samples/sec Loss 1.2179 LearningRate 0.000038 Epoch: 32 Global Step: 684280 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:05:57,163-Speed 2496.31 samples/sec Loss 1.2262 LearningRate 0.000038 Epoch: 32 Global Step: 684290 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:06:05,367-Speed 2496.97 samples/sec Loss 1.2085 LearningRate 0.000038 Epoch: 32 Global Step: 684300 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:06:13,521-Speed 2512.20 samples/sec Loss 1.1818 LearningRate 0.000038 Epoch: 32 Global Step: 684310 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:06:21,724-Speed 2496.92 samples/sec Loss 1.1698 LearningRate 0.000038 Epoch: 32 Global Step: 684320 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-07-12 03:06:29,894-Speed 2506.89 samples/sec Loss 1.1955 LearningRate 0.000038 Epoch: 32 Global Step: 684330 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:06:38,099-Speed 2496.48 samples/sec Loss 1.2233 LearningRate 0.000038 Epoch: 32 Global Step: 684340 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:06:46,302-Speed 2496.98 samples/sec Loss 1.1942 LearningRate 0.000038 Epoch: 32 Global Step: 684350 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:06:54,509-Speed 2495.92 samples/sec Loss 1.2552 LearningRate 0.000038 Epoch: 32 Global Step: 684360 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:07:02,663-Speed 2511.90 samples/sec Loss 1.2057 LearningRate 0.000038 Epoch: 32 Global Step: 684370 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:07:10,880-Speed 2492.98 samples/sec Loss 1.1931 LearningRate 0.000038 Epoch: 32 Global Step: 684380 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:07:19,083-Speed 2496.89 samples/sec Loss 1.2247 LearningRate 0.000038 Epoch: 32 Global Step: 684390 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:07:27,298-Speed 2493.45 samples/sec Loss 1.2103 LearningRate 0.000038 Epoch: 32 Global Step: 684400 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:07:37,782-Speed 1954.01 samples/sec Loss 1.2440 LearningRate 0.000038 Epoch: 33 Global Step: 684410 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:07:45,985-Speed 2496.96 samples/sec Loss 1.2303 LearningRate 0.000038 Epoch: 33 Global Step: 684420 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:07:54,134-Speed 2513.70 samples/sec Loss 1.2235 LearningRate 0.000038 Epoch: 33 Global Step: 684430 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:08:02,331-Speed 2498.71 samples/sec Loss 1.2136 LearningRate 0.000038 Epoch: 33 Global Step: 684440 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:08:10,538-Speed 2496.20 samples/sec Loss 1.2150 LearningRate 0.000038 Epoch: 33 Global Step: 684450 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:08:18,737-Speed 2497.99 samples/sec Loss 1.2268 LearningRate 0.000038 Epoch: 33 Global Step: 684460 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:08:26,938-Speed 2497.67 samples/sec Loss 1.2222 LearningRate 0.000038 Epoch: 33 Global Step: 684470 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:08:35,141-Speed 2496.90 samples/sec Loss 1.1781 LearningRate 0.000038 Epoch: 33 Global Step: 684480 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:08:43,289-Speed 2513.91 samples/sec Loss 1.1814 LearningRate 0.000038 Epoch: 33 Global Step: 684490 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:08:51,489-Speed 2498.50 samples/sec Loss 1.2241 LearningRate 0.000038 Epoch: 33 Global Step: 684500 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:08:59,723-Speed 2487.56 samples/sec Loss 1.2045 LearningRate 0.000038 Epoch: 33 Global Step: 684510 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:09:07,924-Speed 2497.62 samples/sec Loss 1.1952 LearningRate 0.000038 Epoch: 33 Global Step: 684520 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:09:16,125-Speed 2497.64 samples/sec Loss 1.2004 LearningRate 0.000038 Epoch: 33 Global Step: 684530 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:09:24,326-Speed 2497.49 samples/sec Loss 1.1927 LearningRate 0.000038 Epoch: 33 Global Step: 684540 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:09:32,477-Speed 2513.08 samples/sec Loss 1.1820 LearningRate 0.000038 Epoch: 33 Global Step: 684550 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:09:40,681-Speed 2496.68 samples/sec Loss 1.2149 LearningRate 0.000038 Epoch: 33 Global Step: 684560 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:09:48,884-Speed 2497.13 samples/sec Loss 1.1796 LearningRate 0.000038 Epoch: 33 Global Step: 684570 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:09:57,089-Speed 2496.27 samples/sec Loss 1.1807 LearningRate 0.000038 Epoch: 33 Global Step: 684580 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:10:05,297-Speed 2495.63 samples/sec Loss 1.2481 LearningRate 0.000038 Epoch: 33 Global Step: 684590 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:10:13,496-Speed 2498.20 samples/sec Loss 1.1940 LearningRate 0.000038 Epoch: 33 Global Step: 684600 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:10:21,652-Speed 2511.58 samples/sec Loss 1.1920 LearningRate 0.000038 Epoch: 33 Global Step: 684610 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:10:29,864-Speed 2494.33 samples/sec Loss 1.2005 LearningRate 0.000038 Epoch: 33 Global Step: 684620 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:10:38,069-Speed 2496.24 samples/sec Loss 1.1864 LearningRate 0.000038 Epoch: 33 Global Step: 684630 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:10:46,277-Speed 2495.61 samples/sec Loss 1.1883 LearningRate 0.000038 Epoch: 33 Global Step: 684640 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:10:54,480-Speed 2497.16 samples/sec Loss 1.2096 LearningRate 0.000038 Epoch: 33 Global Step: 684650 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:11:02,689-Speed 2495.55 samples/sec Loss 1.2173 LearningRate 0.000038 Epoch: 33 Global Step: 684660 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:11:10,859-Speed 2507.59 samples/sec Loss 1.2074 LearningRate 0.000038 Epoch: 33 Global Step: 684670 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:11:19,068-Speed 2495.04 samples/sec Loss 1.2296 LearningRate 0.000038 Epoch: 33 Global Step: 684680 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:11:27,275-Speed 2496.01 samples/sec Loss 1.2073 LearningRate 0.000038 Epoch: 33 Global Step: 684690 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:11:35,479-Speed 2496.65 samples/sec Loss 1.1773 LearningRate 0.000038 Epoch: 33 Global Step: 684700 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:11:43,686-Speed 2495.78 samples/sec Loss 1.2058 LearningRate 0.000038 Epoch: 33 Global Step: 684710 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:11:51,890-Speed 2496.69 samples/sec Loss 1.1992 LearningRate 0.000038 Epoch: 33 Global Step: 684720 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:12:00,047-Speed 2511.36 samples/sec Loss 1.1803 LearningRate 0.000038 Epoch: 33 Global Step: 684730 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:12:08,258-Speed 2495.11 samples/sec Loss 1.1545 LearningRate 0.000038 Epoch: 33 Global Step: 684740 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-07-12 03:12:16,419-Speed 2510.17 samples/sec Loss 1.1869 LearningRate 0.000038 Epoch: 33 Global Step: 684750 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:12:24,619-Speed 2497.96 samples/sec Loss 1.1982 LearningRate 0.000038 Epoch: 33 Global Step: 684760 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:12:32,822-Speed 2497.27 samples/sec Loss 1.2031 LearningRate 0.000038 Epoch: 33 Global Step: 684770 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:12:41,025-Speed 2497.19 samples/sec Loss 1.2248 LearningRate 0.000038 Epoch: 33 Global Step: 684780 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:12:49,188-Speed 2509.30 samples/sec Loss 1.1781 LearningRate 0.000038 Epoch: 33 Global Step: 684790 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:12:57,397-Speed 2495.44 samples/sec Loss 1.2107 LearningRate 0.000038 Epoch: 33 Global Step: 684800 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:13:05,603-Speed 2496.23 samples/sec Loss 1.2201 LearningRate 0.000038 Epoch: 33 Global Step: 684810 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:13:13,807-Speed 2496.53 samples/sec Loss 1.1918 LearningRate 0.000038 Epoch: 33 Global Step: 684820 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:13:22,025-Speed 2492.71 samples/sec Loss 1.1899 LearningRate 0.000038 Epoch: 33 Global Step: 684830 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:13:30,226-Speed 2497.66 samples/sec Loss 1.1652 LearningRate 0.000038 Epoch: 33 Global Step: 684840 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:13:38,375-Speed 2513.44 samples/sec Loss 1.1732 LearningRate 0.000038 Epoch: 33 Global Step: 684850 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:13:46,577-Speed 2497.25 samples/sec Loss 1.1974 LearningRate 0.000038 Epoch: 33 Global Step: 684860 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:13:54,793-Speed 2493.04 samples/sec Loss 1.1894 LearningRate 0.000038 Epoch: 33 Global Step: 684870 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:14:02,998-Speed 2496.75 samples/sec Loss 1.2090 LearningRate 0.000038 Epoch: 33 Global Step: 684880 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:14:11,215-Speed 2492.78 samples/sec Loss 1.1945 LearningRate 0.000038 Epoch: 33 Global Step: 684890 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:14:19,421-Speed 2496.07 samples/sec Loss 1.1947 LearningRate 0.000038 Epoch: 33 Global Step: 684900 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:14:27,573-Speed 2512.76 samples/sec Loss 1.2173 LearningRate 0.000038 Epoch: 33 Global Step: 684910 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:14:35,781-Speed 2495.81 samples/sec Loss 1.2085 LearningRate 0.000038 Epoch: 33 Global Step: 684920 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:14:43,984-Speed 2497.06 samples/sec Loss 1.1774 LearningRate 0.000038 Epoch: 33 Global Step: 684930 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:14:52,185-Speed 2497.53 samples/sec Loss 1.1799 LearningRate 0.000038 Epoch: 33 Global Step: 684940 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:15:00,388-Speed 2497.11 samples/sec Loss 1.1956 LearningRate 0.000038 Epoch: 33 Global Step: 684950 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:15:08,593-Speed 2496.44 samples/sec Loss 1.2052 LearningRate 0.000038 Epoch: 33 Global Step: 684960 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:15:16,744-Speed 2513.33 samples/sec Loss 1.1595 LearningRate 0.000038 Epoch: 33 Global Step: 684970 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:15:24,950-Speed 2495.92 samples/sec Loss 1.2096 LearningRate 0.000038 Epoch: 33 Global Step: 684980 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:15:33,154-Speed 2496.91 samples/sec Loss 1.2012 LearningRate 0.000037 Epoch: 33 Global Step: 684990 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:15:41,355-Speed 2497.33 samples/sec Loss 1.2148 LearningRate 0.000037 Epoch: 33 Global Step: 685000 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:15:49,560-Speed 2496.53 samples/sec Loss 1.1639 LearningRate 0.000037 Epoch: 33 Global Step: 685010 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:15:57,761-Speed 2497.35 samples/sec Loss 1.2516 LearningRate 0.000037 Epoch: 33 Global Step: 685020 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:16:05,911-Speed 2513.43 samples/sec Loss 1.2329 LearningRate 0.000037 Epoch: 33 Global Step: 685030 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:16:14,115-Speed 2496.86 samples/sec Loss 1.2128 LearningRate 0.000037 Epoch: 33 Global Step: 685040 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:16:22,317-Speed 2497.30 samples/sec Loss 1.2039 LearningRate 0.000037 Epoch: 33 Global Step: 685050 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:16:30,531-Speed 2493.63 samples/sec Loss 1.2029 LearningRate 0.000037 Epoch: 33 Global Step: 685060 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:16:38,736-Speed 2496.40 samples/sec Loss 1.1974 LearningRate 0.000037 Epoch: 33 Global Step: 685070 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:16:46,943-Speed 2495.92 samples/sec Loss 1.2264 LearningRate 0.000037 Epoch: 33 Global Step: 685080 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:16:55,094-Speed 2512.91 samples/sec Loss 1.1903 LearningRate 0.000037 Epoch: 33 Global Step: 685090 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:17:03,307-Speed 2494.24 samples/sec Loss 1.2024 LearningRate 0.000037 Epoch: 33 Global Step: 685100 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:17:11,509-Speed 2497.12 samples/sec Loss 1.1888 LearningRate 0.000037 Epoch: 33 Global Step: 685110 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:17:19,713-Speed 2496.75 samples/sec Loss 1.2265 LearningRate 0.000037 Epoch: 33 Global Step: 685120 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:17:27,920-Speed 2495.98 samples/sec Loss 1.2593 LearningRate 0.000037 Epoch: 33 Global Step: 685130 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:17:36,122-Speed 2497.00 samples/sec Loss 1.2065 LearningRate 0.000037 Epoch: 33 Global Step: 685140 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:17:44,274-Speed 2512.90 samples/sec Loss 1.2187 LearningRate 0.000037 Epoch: 33 Global Step: 685150 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:17:52,475-Speed 2497.73 samples/sec Loss 1.1805 LearningRate 0.000037 Epoch: 33 Global Step: 685160 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:18:00,677-Speed 2497.10 samples/sec Loss 1.2265 LearningRate 0.000037 Epoch: 33 Global Step: 685170 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:18:08,878-Speed 2497.52 samples/sec Loss 1.2102 LearningRate 0.000037 Epoch: 33 Global Step: 685180 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:18:17,084-Speed 2496.41 samples/sec Loss 1.2020 LearningRate 0.000037 Epoch: 33 Global Step: 685190 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:18:25,289-Speed 2496.25 samples/sec Loss 1.2058 LearningRate 0.000037 Epoch: 33 Global Step: 685200 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:18:33,443-Speed 2512.22 samples/sec Loss 1.2338 LearningRate 0.000037 Epoch: 33 Global Step: 685210 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:18:41,645-Speed 2497.20 samples/sec Loss 1.2058 LearningRate 0.000037 Epoch: 33 Global Step: 685220 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:18:49,852-Speed 2495.87 samples/sec Loss 1.1960 LearningRate 0.000037 Epoch: 33 Global Step: 685230 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:18:58,059-Speed 2495.81 samples/sec Loss 1.1936 LearningRate 0.000037 Epoch: 33 Global Step: 685240 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:19:06,265-Speed 2496.18 samples/sec Loss 1.2053 LearningRate 0.000037 Epoch: 33 Global Step: 685250 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:19:14,465-Speed 2498.05 samples/sec Loss 1.1996 LearningRate 0.000037 Epoch: 33 Global Step: 685260 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:19:22,617-Speed 2512.84 samples/sec Loss 1.2318 LearningRate 0.000037 Epoch: 33 Global Step: 685270 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:19:30,821-Speed 2496.66 samples/sec Loss 1.2290 LearningRate 0.000037 Epoch: 33 Global Step: 685280 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:19:39,022-Speed 2497.52 samples/sec Loss 1.2018 LearningRate 0.000037 Epoch: 33 Global Step: 685290 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:19:47,229-Speed 2495.93 samples/sec Loss 1.2027 LearningRate 0.000037 Epoch: 33 Global Step: 685300 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:19:55,433-Speed 2496.98 samples/sec Loss 1.2263 LearningRate 0.000037 Epoch: 33 Global Step: 685310 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:20:03,647-Speed 2493.47 samples/sec Loss 1.2059 LearningRate 0.000037 Epoch: 33 Global Step: 685320 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:20:11,798-Speed 2513.03 samples/sec Loss 1.2031 LearningRate 0.000037 Epoch: 33 Global Step: 685330 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:20:19,998-Speed 2497.83 samples/sec Loss 1.2081 LearningRate 0.000037 Epoch: 33 Global Step: 685340 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:20:28,202-Speed 2496.83 samples/sec Loss 1.2095 LearningRate 0.000037 Epoch: 33 Global Step: 685350 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:20:36,409-Speed 2495.86 samples/sec Loss 1.2216 LearningRate 0.000037 Epoch: 33 Global Step: 685360 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:20:44,612-Speed 2496.85 samples/sec Loss 1.2092 LearningRate 0.000037 Epoch: 33 Global Step: 685370 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:20:52,818-Speed 2496.23 samples/sec Loss 1.2197 LearningRate 0.000037 Epoch: 33 Global Step: 685380 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:21:00,971-Speed 2512.38 samples/sec Loss 1.2157 LearningRate 0.000037 Epoch: 33 Global Step: 685390 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:21:09,176-Speed 2496.55 samples/sec Loss 1.1868 LearningRate 0.000037 Epoch: 33 Global Step: 685400 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:21:17,381-Speed 2496.38 samples/sec Loss 1.1930 LearningRate 0.000037 Epoch: 33 Global Step: 685410 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:21:25,585-Speed 2496.79 samples/sec Loss 1.2054 LearningRate 0.000037 Epoch: 33 Global Step: 685420 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:21:33,786-Speed 2497.65 samples/sec Loss 1.2203 LearningRate 0.000037 Epoch: 33 Global Step: 685430 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:21:41,993-Speed 2495.74 samples/sec Loss 1.2071 LearningRate 0.000037 Epoch: 33 Global Step: 685440 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:21:50,144-Speed 2513.10 samples/sec Loss 1.2001 LearningRate 0.000037 Epoch: 33 Global Step: 685450 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:21:58,348-Speed 2496.89 samples/sec Loss 1.2264 LearningRate 0.000037 Epoch: 33 Global Step: 685460 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:22:06,549-Speed 2497.70 samples/sec Loss 1.1866 LearningRate 0.000037 Epoch: 33 Global Step: 685470 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:22:14,748-Speed 2498.25 samples/sec Loss 1.2200 LearningRate 0.000037 Epoch: 33 Global Step: 685480 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:22:22,949-Speed 2497.40 samples/sec Loss 1.1701 LearningRate 0.000037 Epoch: 33 Global Step: 685490 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:22:31,152-Speed 2497.21 samples/sec Loss 1.1702 LearningRate 0.000037 Epoch: 33 Global Step: 685500 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:22:39,300-Speed 2514.20 samples/sec Loss 1.2267 LearningRate 0.000037 Epoch: 33 Global Step: 685510 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:22:47,505-Speed 2496.17 samples/sec Loss 1.2176 LearningRate 0.000037 Epoch: 33 Global Step: 685520 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:22:55,708-Speed 2497.35 samples/sec Loss 1.1858 LearningRate 0.000037 Epoch: 33 Global Step: 685530 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:23:03,916-Speed 2495.57 samples/sec Loss 1.1722 LearningRate 0.000037 Epoch: 33 Global Step: 685540 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:23:12,118-Speed 2497.38 samples/sec Loss 1.2066 LearningRate 0.000037 Epoch: 33 Global Step: 685550 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:23:20,327-Speed 2495.08 samples/sec Loss 1.1862 LearningRate 0.000037 Epoch: 33 Global Step: 685560 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-07-12 03:23:28,477-Speed 2513.16 samples/sec Loss 1.2070 LearningRate 0.000037 Epoch: 33 Global Step: 685570
Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:23:36,681-Speed 2496.85 samples/sec Loss 1.2126 LearningRate 0.000037 Epoch: 33 Global Step: 685580 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:23:44,901-Speed 2491.99 samples/sec Loss 1.1974 LearningRate 0.000037 Epoch: 33 Global Step: 685590 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:23:53,106-Speed 2496.08 samples/sec Loss 1.2160 LearningRate 0.000037 Epoch: 33 Global Step: 685600 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:24:01,310-Speed 2496.87 samples/sec Loss 1.1779 LearningRate 0.000037 Epoch: 33 Global Step: 685610 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:24:09,514-Speed 2497.08 samples/sec Loss 1.2047 LearningRate 0.000037 Epoch: 33 Global Step: 685620 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:24:17,663-Speed 2513.39 samples/sec Loss 1.2043 LearningRate 0.000037 Epoch: 33 Global Step: 685630 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:24:25,864-Speed 2497.67 samples/sec Loss 1.1992 LearningRate 0.000037 Epoch: 33 Global Step: 685640 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:24:34,074-Speed 2495.13 samples/sec Loss 1.2034 LearningRate 0.000037 Epoch: 33 Global Step: 685650 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:24:42,273-Speed 2497.98 samples/sec Loss 1.2065 LearningRate 0.000037 Epoch: 33 Global Step: 685660 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:24:50,473-Speed 2498.05 samples/sec Loss 1.1859 LearningRate 0.000037 Epoch: 33 Global Step: 685670 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:24:58,681-Speed 2495.43 samples/sec Loss 1.1867 LearningRate 0.000037 Epoch: 33 Global Step: 685680 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:25:06,828-Speed 2514.37 samples/sec Loss 1.2209 LearningRate 0.000037 Epoch: 33 Global Step: 685690 Fp16 Grad Scale: 
8192 Required: 33 hours Training: 2022-07-12 03:25:15,038-Speed 2494.94 samples/sec Loss 1.1945 LearningRate 0.000037 Epoch: 33 Global Step: 685700 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:25:23,241-Speed 2496.98 samples/sec Loss 1.2127 LearningRate 0.000037 Epoch: 33 Global Step: 685710 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:25:31,442-Speed 2497.29 samples/sec Loss 1.1720 LearningRate 0.000037 Epoch: 33 Global Step: 685720 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:25:39,643-Speed 2497.83 samples/sec Loss 1.1816 LearningRate 0.000037 Epoch: 33 Global Step: 685730 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:25:47,844-Speed 2497.58 samples/sec Loss 1.1781 LearningRate 0.000037 Epoch: 33 Global Step: 685740 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:25:56,005-Speed 2509.89 samples/sec Loss 1.2085 LearningRate 0.000037 Epoch: 33 Global Step: 685750 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:26:04,210-Speed 2496.64 samples/sec Loss 1.2217 LearningRate 0.000037 Epoch: 33 Global Step: 685760 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:26:12,409-Speed 2498.36 samples/sec Loss 1.1886 LearningRate 0.000037 Epoch: 33 Global Step: 685770 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:26:20,612-Speed 2497.02 samples/sec Loss 1.1958 LearningRate 0.000037 Epoch: 33 Global Step: 685780 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:26:28,815-Speed 2497.16 samples/sec Loss 1.1661 LearningRate 0.000037 Epoch: 33 Global Step: 685790 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:26:37,015-Speed 2497.97 samples/sec Loss 1.1909 LearningRate 0.000037 Epoch: 33 Global Step: 685800 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:26:45,162-Speed 2514.22 samples/sec Loss 1.1553 LearningRate 0.000037 Epoch: 33 Global Step: 685810 Fp16 Grad Scale: 8192 Required: 33 
hours Training: 2022-07-12 03:26:53,366-Speed 2496.70 samples/sec Loss 1.2020 LearningRate 0.000037 Epoch: 33 Global Step: 685820 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:27:01,572-Speed 2496.19 samples/sec Loss 1.2054 LearningRate 0.000037 Epoch: 33 Global Step: 685830 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:27:09,783-Speed 2494.71 samples/sec Loss 1.1809 LearningRate 0.000037 Epoch: 33 Global Step: 685840 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:27:17,998-Speed 2493.47 samples/sec Loss 1.2069 LearningRate 0.000037 Epoch: 33 Global Step: 685850 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:27:26,202-Speed 2496.46 samples/sec Loss 1.2049 LearningRate 0.000037 Epoch: 33 Global Step: 685860 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:27:34,359-Speed 2511.37 samples/sec Loss 1.1921 LearningRate 0.000037 Epoch: 33 Global Step: 685870 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:27:42,566-Speed 2496.04 samples/sec Loss 1.2146 LearningRate 0.000037 Epoch: 33 Global Step: 685880 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:27:50,770-Speed 2496.66 samples/sec Loss 1.1985 LearningRate 0.000037 Epoch: 33 Global Step: 685890 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:27:58,975-Speed 2496.56 samples/sec Loss 1.2015 LearningRate 0.000037 Epoch: 33 Global Step: 685900 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:28:07,179-Speed 2497.03 samples/sec Loss 1.1899 LearningRate 0.000037 Epoch: 33 Global Step: 685910 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:28:15,383-Speed 2496.65 samples/sec Loss 1.2089 LearningRate 0.000037 Epoch: 33 Global Step: 685920 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:28:23,530-Speed 2514.04 samples/sec Loss 1.1890 LearningRate 0.000037 Epoch: 33 Global Step: 685930 Fp16 Grad Scale: 8192 Required: 33 hours Training: 
2022-07-12 03:28:31,738-Speed 2495.69 samples/sec Loss 1.1960 LearningRate 0.000037 Epoch: 33 Global Step: 685940 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-07-12 03:28:39,945-Speed 2495.87 samples/sec Loss 1.1752 LearningRate 0.000037 Epoch: 33 Global Step: 685950 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:28:48,154-Speed 2495.34 samples/sec Loss 1.2200 LearningRate 0.000037 Epoch: 33 Global Step: 685960 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:28:56,357-Speed 2496.99 samples/sec Loss 1.1583 LearningRate 0.000037 Epoch: 33 Global Step: 685970 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:29:04,565-Speed 2495.67 samples/sec Loss 1.1832 LearningRate 0.000037 Epoch: 33 Global Step: 685980 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:29:12,720-Speed 2511.80 samples/sec Loss 1.2008 LearningRate 0.000037 Epoch: 33 Global Step: 685990 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:29:20,923-Speed 2496.86 samples/sec Loss 1.1882 LearningRate 0.000037 Epoch: 33 Global Step: 686000 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:29:29,125-Speed 2497.56 samples/sec Loss 1.2023 LearningRate 0.000037 Epoch: 33 Global Step: 686010 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:29:37,331-Speed 2496.08 samples/sec Loss 1.2241 LearningRate 0.000037 Epoch: 33 Global Step: 686020 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:29:45,533-Speed 2497.56 samples/sec Loss 1.2006 LearningRate 0.000037 Epoch: 33 Global Step: 686030 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:29:53,736-Speed 2496.90 samples/sec Loss 1.2134 LearningRate 0.000037 Epoch: 33 Global Step: 686040 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:30:01,887-Speed 2513.27 samples/sec Loss 1.1949 LearningRate 0.000037 Epoch: 33 Global Step: 686050 Fp16 Grad Scale: 16384 Required: 33 hours Training: 
2022-07-12 03:30:10,092-Speed 2496.17 samples/sec Loss 1.2261 LearningRate 0.000037 Epoch: 33 Global Step: 686060 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:30:18,294-Speed 2497.67 samples/sec Loss 1.2075 LearningRate 0.000037 Epoch: 33 Global Step: 686070 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:30:26,497-Speed 2497.43 samples/sec Loss 1.1950 LearningRate 0.000037 Epoch: 33 Global Step: 686080 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:30:34,704-Speed 2495.65 samples/sec Loss 1.2110 LearningRate 0.000037 Epoch: 33 Global Step: 686090 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:30:42,905-Speed 2498.18 samples/sec Loss 1.1616 LearningRate 0.000037 Epoch: 33 Global Step: 686100 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:30:51,058-Speed 2512.28 samples/sec Loss 1.1827 LearningRate 0.000037 Epoch: 33 Global Step: 686110 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:30:59,264-Speed 2496.50 samples/sec Loss 1.1707 LearningRate 0.000037 Epoch: 33 Global Step: 686120 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:31:07,470-Speed 2496.05 samples/sec Loss 1.2057 LearningRate 0.000037 Epoch: 33 Global Step: 686130 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:31:15,671-Speed 2497.54 samples/sec Loss 1.1652 LearningRate 0.000037 Epoch: 33 Global Step: 686140 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:31:23,874-Speed 2497.39 samples/sec Loss 1.2155 LearningRate 0.000037 Epoch: 33 Global Step: 686150 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:31:32,079-Speed 2496.66 samples/sec Loss 1.1969 LearningRate 0.000037 Epoch: 33 Global Step: 686160 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:31:40,233-Speed 2511.80 samples/sec Loss 1.1952 LearningRate 0.000037 Epoch: 33 Global Step: 686170 Fp16 Grad Scale: 16384 Required: 33 hours Training: 
2022-07-12 03:31:48,440-Speed 2495.81 samples/sec Loss 1.1894 LearningRate 0.000037 Epoch: 33 Global Step: 686180 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:31:56,649-Speed 2495.21 samples/sec Loss 1.2075 LearningRate 0.000037 Epoch: 33 Global Step: 686190 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:32:04,855-Speed 2496.21 samples/sec Loss 1.1875 LearningRate 0.000037 Epoch: 33 Global Step: 686200 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:32:13,076-Speed 2491.57 samples/sec Loss 1.2114 LearningRate 0.000037 Epoch: 33 Global Step: 686210 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:32:21,285-Speed 2495.26 samples/sec Loss 1.2013 LearningRate 0.000037 Epoch: 33 Global Step: 686220 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:32:29,437-Speed 2512.77 samples/sec Loss 1.1814 LearningRate 0.000037 Epoch: 33 Global Step: 686230 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:32:37,650-Speed 2493.98 samples/sec Loss 1.1993 LearningRate 0.000037 Epoch: 33 Global Step: 686240 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:32:45,855-Speed 2496.22 samples/sec Loss 1.1854 LearningRate 0.000037 Epoch: 33 Global Step: 686250 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:32:54,062-Speed 2495.98 samples/sec Loss 1.1692 LearningRate 0.000037 Epoch: 33 Global Step: 686260 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:33:02,271-Speed 2495.10 samples/sec Loss 1.2242 LearningRate 0.000037 Epoch: 33 Global Step: 686270 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:33:10,475-Speed 2496.75 samples/sec Loss 1.2109 LearningRate 0.000037 Epoch: 33 Global Step: 686280 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:33:18,634-Speed 2510.53 samples/sec Loss 1.1774 LearningRate 0.000037 Epoch: 33 Global Step: 686290 Fp16 Grad Scale: 16384 Required: 33 hours Training: 
2022-07-12 03:33:26,841-Speed 2495.75 samples/sec Loss 1.1823 LearningRate 0.000037 Epoch: 33 Global Step: 686300 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:33:35,046-Speed 2496.44 samples/sec Loss 1.1917 LearningRate 0.000037 Epoch: 33 Global Step: 686310 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:33:43,253-Speed 2495.65 samples/sec Loss 1.1806 LearningRate 0.000037 Epoch: 33 Global Step: 686320 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:33:51,471-Speed 2492.43 samples/sec Loss 1.1956 LearningRate 0.000037 Epoch: 33 Global Step: 686330 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:33:59,676-Speed 2497.36 samples/sec Loss 1.1855 LearningRate 0.000037 Epoch: 33 Global Step: 686340 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:34:07,831-Speed 2511.70 samples/sec Loss 1.1768 LearningRate 0.000037 Epoch: 33 Global Step: 686350 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:34:16,045-Speed 2493.94 samples/sec Loss 1.1786 LearningRate 0.000037 Epoch: 33 Global Step: 686360 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:34:24,276-Speed 2488.46 samples/sec Loss 1.2219 LearningRate 0.000037 Epoch: 33 Global Step: 686370 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:34:32,485-Speed 2495.20 samples/sec Loss 1.1961 LearningRate 0.000037 Epoch: 33 Global Step: 686380 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:34:40,690-Speed 2496.57 samples/sec Loss 1.2139 LearningRate 0.000037 Epoch: 33 Global Step: 686390 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:34:48,898-Speed 2495.57 samples/sec Loss 1.1775 LearningRate 0.000037 Epoch: 33 Global Step: 686400 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:34:57,051-Speed 2512.04 samples/sec Loss 1.1986 LearningRate 0.000037 Epoch: 33 Global Step: 686410 Fp16 Grad Scale: 16384 Required: 33 hours Training: 
2022-07-12 03:35:05,256-Speed 2496.87 samples/sec Loss 1.1971 LearningRate 0.000037 Epoch: 33 Global Step: 686420 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:35:13,464-Speed 2495.23 samples/sec Loss 1.1930 LearningRate 0.000037 Epoch: 33 Global Step: 686430 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:35:21,679-Speed 2493.64 samples/sec Loss 1.1902 LearningRate 0.000037 Epoch: 33 Global Step: 686440 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:35:29,886-Speed 2495.73 samples/sec Loss 1.2313 LearningRate 0.000037 Epoch: 33 Global Step: 686450 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:35:38,096-Speed 2494.63 samples/sec Loss 1.2072 LearningRate 0.000037 Epoch: 33 Global Step: 686460 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:35:46,256-Speed 2510.45 samples/sec Loss 1.2149 LearningRate 0.000037 Epoch: 33 Global Step: 686470 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:35:54,462-Speed 2496.21 samples/sec Loss 1.2130 LearningRate 0.000037 Epoch: 33 Global Step: 686480 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:36:02,663-Speed 2497.54 samples/sec Loss 1.2080 LearningRate 0.000037 Epoch: 33 Global Step: 686490 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:36:10,865-Speed 2497.42 samples/sec Loss 1.2032 LearningRate 0.000037 Epoch: 33 Global Step: 686500 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:36:19,068-Speed 2496.92 samples/sec Loss 1.1819 LearningRate 0.000037 Epoch: 33 Global Step: 686510 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:36:27,273-Speed 2496.55 samples/sec Loss 1.1954 LearningRate 0.000037 Epoch: 33 Global Step: 686520 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:36:35,421-Speed 2513.87 samples/sec Loss 1.2005 LearningRate 0.000037 Epoch: 33 Global Step: 686530 Fp16 Grad Scale: 16384 Required: 33 hours Training: 
2022-07-12 03:36:43,625-Speed 2496.89 samples/sec Loss 1.1699 LearningRate 0.000037 Epoch: 33 Global Step: 686540 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:36:51,834-Speed 2495.25 samples/sec Loss 1.2275 LearningRate 0.000037 Epoch: 33 Global Step: 686550 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:37:00,037-Speed 2496.90 samples/sec Loss 1.1945 LearningRate 0.000037 Epoch: 33 Global Step: 686560 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:37:08,241-Speed 2496.75 samples/sec Loss 1.2094 LearningRate 0.000037 Epoch: 33 Global Step: 686570 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:37:16,445-Speed 2496.63 samples/sec Loss 1.1887 LearningRate 0.000037 Epoch: 33 Global Step: 686580 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:37:24,595-Speed 2513.18 samples/sec Loss 1.1652 LearningRate 0.000037 Epoch: 33 Global Step: 686590 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:37:32,796-Speed 2497.59 samples/sec Loss 1.1988 LearningRate 0.000037 Epoch: 33 Global Step: 686600 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:37:41,008-Speed 2494.37 samples/sec Loss 1.2181 LearningRate 0.000037 Epoch: 33 Global Step: 686610 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:37:49,213-Speed 2496.58 samples/sec Loss 1.2298 LearningRate 0.000037 Epoch: 33 Global Step: 686620 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:37:57,412-Speed 2498.19 samples/sec Loss 1.2206 LearningRate 0.000037 Epoch: 33 Global Step: 686630 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:38:05,616-Speed 2496.59 samples/sec Loss 1.1950 LearningRate 0.000037 Epoch: 33 Global Step: 686640 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:38:13,766-Speed 2513.31 samples/sec Loss 1.1900 LearningRate 0.000037 Epoch: 33 Global Step: 686650 Fp16 Grad Scale: 16384 Required: 33 hours Training: 
2022-07-12 03:38:21,979-Speed 2494.28 samples/sec Loss 1.2207 LearningRate 0.000037 Epoch: 33 Global Step: 686660 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:38:30,187-Speed 2495.39 samples/sec Loss 1.1822 LearningRate 0.000037 Epoch: 33 Global Step: 686670 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:38:38,390-Speed 2496.99 samples/sec Loss 1.1636 LearningRate 0.000037 Epoch: 33 Global Step: 686680 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:38:46,591-Speed 2497.82 samples/sec Loss 1.2132 LearningRate 0.000037 Epoch: 33 Global Step: 686690 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:38:54,795-Speed 2496.98 samples/sec Loss 1.2060 LearningRate 0.000037 Epoch: 33 Global Step: 686700 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:39:02,943-Speed 2513.86 samples/sec Loss 1.1931 LearningRate 0.000037 Epoch: 33 Global Step: 686710 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:39:11,149-Speed 2495.82 samples/sec Loss 1.1868 LearningRate 0.000037 Epoch: 33 Global Step: 686720 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:39:19,351-Speed 2497.75 samples/sec Loss 1.1657 LearningRate 0.000037 Epoch: 33 Global Step: 686730 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:39:27,555-Speed 2497.12 samples/sec Loss 1.2071 LearningRate 0.000037 Epoch: 33 Global Step: 686740 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:39:35,770-Speed 2493.16 samples/sec Loss 1.1781 LearningRate 0.000037 Epoch: 33 Global Step: 686750 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:39:43,974-Speed 2496.63 samples/sec Loss 1.1660 LearningRate 0.000037 Epoch: 33 Global Step: 686760 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:39:52,124-Speed 2513.46 samples/sec Loss 1.2108 LearningRate 0.000037 Epoch: 33 Global Step: 686770 Fp16 Grad Scale: 16384 Required: 33 hours Training: 
2022-07-12 03:40:00,331-Speed 2495.99 samples/sec Loss 1.1970 LearningRate 0.000037 Epoch: 33 Global Step: 686780 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:40:08,539-Speed 2495.44 samples/sec Loss 1.1869 LearningRate 0.000037 Epoch: 33 Global Step: 686790 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:40:16,754-Speed 2493.27 samples/sec Loss 1.1855 LearningRate 0.000037 Epoch: 33 Global Step: 686800 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:40:24,963-Speed 2495.50 samples/sec Loss 1.2319 LearningRate 0.000037 Epoch: 33 Global Step: 686810 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:40:33,165-Speed 2497.40 samples/sec Loss 1.1941 LearningRate 0.000037 Epoch: 33 Global Step: 686820 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:40:41,313-Speed 2513.69 samples/sec Loss 1.1906 LearningRate 0.000037 Epoch: 33 Global Step: 686830 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:40:49,515-Speed 2497.42 samples/sec Loss 1.1678 LearningRate 0.000037 Epoch: 33 Global Step: 686840 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:40:57,722-Speed 2495.87 samples/sec Loss 1.2230 LearningRate 0.000037 Epoch: 33 Global Step: 686850 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:41:05,925-Speed 2497.18 samples/sec Loss 1.2103 LearningRate 0.000037 Epoch: 33 Global Step: 686860 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:41:14,126-Speed 2497.85 samples/sec Loss 1.1737 LearningRate 0.000037 Epoch: 33 Global Step: 686870 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:41:22,329-Speed 2497.11 samples/sec Loss 1.2115 LearningRate 0.000037 Epoch: 33 Global Step: 686880 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:41:30,480-Speed 2512.78 samples/sec Loss 1.2265 LearningRate 0.000037 Epoch: 33 Global Step: 686890 Fp16 Grad Scale: 16384 Required: 33 hours Training: 
2022-07-12 03:41:38,682-Speed 2497.32 samples/sec Loss 1.2205 LearningRate 0.000037 Epoch: 33 Global Step: 686900 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:41:46,886-Speed 2496.73 samples/sec Loss 1.1751 LearningRate 0.000037 Epoch: 33 Global Step: 686910 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:41:55,086-Speed 2497.88 samples/sec Loss 1.1857 LearningRate 0.000037 Epoch: 33 Global Step: 686920 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:42:03,285-Speed 2498.27 samples/sec Loss 1.2023 LearningRate 0.000036 Epoch: 33 Global Step: 686930 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:42:11,488-Speed 2496.98 samples/sec Loss 1.1871 LearningRate 0.000036 Epoch: 33 Global Step: 686940 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:42:19,639-Speed 2513.00 samples/sec Loss 1.2057 LearningRate 0.000036 Epoch: 33 Global Step: 686950 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:42:27,843-Speed 2496.77 samples/sec Loss 1.1626 LearningRate 0.000036 Epoch: 33 Global Step: 686960 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:42:36,044-Speed 2497.60 samples/sec Loss 1.1686 LearningRate 0.000036 Epoch: 33 Global Step: 686970 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:42:44,246-Speed 2497.35 samples/sec Loss 1.2179 LearningRate 0.000036 Epoch: 33 Global Step: 686980 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:42:52,447-Speed 2497.82 samples/sec Loss 1.1988 LearningRate 0.000036 Epoch: 33 Global Step: 686990 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:43:00,653-Speed 2496.04 samples/sec Loss 1.2066 LearningRate 0.000036 Epoch: 33 Global Step: 687000 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:43:08,803-Speed 2513.34 samples/sec Loss 1.1827 LearningRate 0.000036 Epoch: 33 Global Step: 687010 Fp16 Grad Scale: 16384 Required: 33 hours Training: 
2022-07-12 03:43:17,015-Speed 2494.47 samples/sec Loss 1.1905 LearningRate 0.000036 Epoch: 33 Global Step: 687020 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:43:25,218-Speed 2496.83 samples/sec Loss 1.2073 LearningRate 0.000036 Epoch: 33 Global Step: 687030 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:43:33,422-Speed 2496.99 samples/sec Loss 1.2129 LearningRate 0.000036 Epoch: 33 Global Step: 687040 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:43:41,626-Speed 2496.86 samples/sec Loss 1.2167 LearningRate 0.000036 Epoch: 33 Global Step: 687050 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:43:49,831-Speed 2496.39 samples/sec Loss 1.1710 LearningRate 0.000036 Epoch: 33 Global Step: 687060 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:43:57,982-Speed 2513.09 samples/sec Loss 1.1690 LearningRate 0.000036 Epoch: 33 Global Step: 687070 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:44:06,190-Speed 2495.61 samples/sec Loss 1.1951 LearningRate 0.000036 Epoch: 33 Global Step: 687080 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:44:14,396-Speed 2496.11 samples/sec Loss 1.2000 LearningRate 0.000036 Epoch: 33 Global Step: 687090 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:44:22,598-Speed 2497.11 samples/sec Loss 1.1888 LearningRate 0.000036 Epoch: 33 Global Step: 687100 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:44:30,800-Speed 2497.65 samples/sec Loss 1.2237 LearningRate 0.000036 Epoch: 33 Global Step: 687110 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:44:39,002-Speed 2497.32 samples/sec Loss 1.1879 LearningRate 0.000036 Epoch: 33 Global Step: 687120 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:44:47,163-Speed 2509.99 samples/sec Loss 1.2198 LearningRate 0.000036 Epoch: 33 Global Step: 687130 Fp16 Grad Scale: 16384 Required: 33 hours Training: 
2022-07-12 03:44:55,364-Speed 2497.64 samples/sec Loss 1.1711 LearningRate 0.000036 Epoch: 33 Global Step: 687140 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-07-12 03:45:03,572-Speed 2495.46 samples/sec Loss 1.2100 LearningRate 0.000036 Epoch: 33 Global Step: 687150 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:45:11,784-Speed 2494.79 samples/sec Loss 1.1354 LearningRate 0.000036 Epoch: 33 Global Step: 687160 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:45:19,987-Speed 2497.12 samples/sec Loss 1.2052 LearningRate 0.000036 Epoch: 33 Global Step: 687170 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:45:28,191-Speed 2496.45 samples/sec Loss 1.2311 LearningRate 0.000036 Epoch: 33 Global Step: 687180 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:45:36,339-Speed 2514.12 samples/sec Loss 1.1854 LearningRate 0.000036 Epoch: 33 Global Step: 687190 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:45:44,548-Speed 2495.33 samples/sec Loss 1.1977 LearningRate 0.000036 Epoch: 33 Global Step: 687200 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:45:52,752-Speed 2496.71 samples/sec Loss 1.2061 LearningRate 0.000036 Epoch: 33 Global Step: 687210 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:46:00,956-Speed 2496.74 samples/sec Loss 1.1928 LearningRate 0.000036 Epoch: 33 Global Step: 687220 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:46:09,158-Speed 2497.20 samples/sec Loss 1.1755 LearningRate 0.000036 Epoch: 33 Global Step: 687230 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:46:17,363-Speed 2496.86 samples/sec Loss 1.2283 LearningRate 0.000036 Epoch: 33 Global Step: 687240 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:46:25,510-Speed 2513.88 samples/sec Loss 1.2133 LearningRate 0.000036 Epoch: 33 Global Step: 687250 Fp16 Grad Scale: 32768 Required: 33 hours Training: 
2022-07-12 03:46:33,713-Speed 2497.14 samples/sec Loss 1.1747 LearningRate 0.000036 Epoch: 33 Global Step: 687260 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:46:41,915-Speed 2497.35 samples/sec Loss 1.2255 LearningRate 0.000036 Epoch: 33 Global Step: 687270 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:46:50,132-Speed 2492.88 samples/sec Loss 1.2062 LearningRate 0.000036 Epoch: 33 Global Step: 687280 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:46:58,335-Speed 2497.03 samples/sec Loss 1.1802 LearningRate 0.000036 Epoch: 33 Global Step: 687290 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:47:06,538-Speed 2497.22 samples/sec Loss 1.1764 LearningRate 0.000036 Epoch: 33 Global Step: 687300 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:47:14,688-Speed 2513.45 samples/sec Loss 1.1790 LearningRate 0.000036 Epoch: 33 Global Step: 687310 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:47:22,892-Speed 2496.88 samples/sec Loss 1.2214 LearningRate 0.000036 Epoch: 33 Global Step: 687320 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:47:31,094-Speed 2497.30 samples/sec Loss 1.2253 LearningRate 0.000036 Epoch: 33 Global Step: 687330 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:47:39,296-Speed 2497.40 samples/sec Loss 1.1933 LearningRate 0.000036 Epoch: 33 Global Step: 687340 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:47:47,497-Speed 2497.53 samples/sec Loss 1.1966 LearningRate 0.000036 Epoch: 33 Global Step: 687350 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:47:55,700-Speed 2496.82 samples/sec Loss 1.1924 LearningRate 0.000036 Epoch: 33 Global Step: 687360 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:48:03,851-Speed 2513.20 samples/sec Loss 1.1826 LearningRate 0.000036 Epoch: 33 Global Step: 687370 Fp16 Grad Scale: 32768 Required: 33 hours Training: 
2022-07-12 03:48:12,061-Speed 2495.08 samples/sec Loss 1.2125 LearningRate 0.000036 Epoch: 33 Global Step: 687380 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:48:20,262-Speed 2497.61 samples/sec Loss 1.2043 LearningRate 0.000036 Epoch: 33 Global Step: 687390 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:48:28,476-Speed 2493.59 samples/sec Loss 1.2043 LearningRate 0.000036 Epoch: 33 Global Step: 687400 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:48:36,678-Speed 2497.41 samples/sec Loss 1.1919 LearningRate 0.000036 Epoch: 33 Global Step: 687410 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:48:44,884-Speed 2496.00 samples/sec Loss 1.1892 LearningRate 0.000036 Epoch: 33 Global Step: 687420 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:48:58,417-Speed 1517.01 samples/sec Loss 1.1921 LearningRate 0.000036 Epoch: 33 Global Step: 687430 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:49:06,635-Speed 2501.88 samples/sec Loss 1.2095 LearningRate 0.000036 Epoch: 33 Global Step: 687440 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:49:14,873-Speed 2501.17 samples/sec Loss 1.2074 LearningRate 0.000036 Epoch: 33 Global Step: 687450 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:49:23,133-Speed 2501.39 samples/sec Loss 1.2236 LearningRate 0.000036 Epoch: 33 Global Step: 687460 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:49:34,207-Speed 2499.96 samples/sec Loss 1.2034 LearningRate 0.000036 Epoch: 33 Global Step: 687470 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:49:42,403-Speed 2498.95 samples/sec Loss 1.2106 LearningRate 0.000036 Epoch: 33 Global Step: 687480 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:49:50,588-Speed 2514.82 samples/sec Loss 1.1923 LearningRate 0.000036 Epoch: 33 Global Step: 687490 Fp16 Grad Scale: 32768 Required: 33 hours Training: 
2022-07-12 03:50:03,152-Speed 1635.11 samples/sec Loss 1.1817 LearningRate 0.000036 Epoch: 33 Global Step: 687500 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:50:14,415-Speed 2494.64 samples/sec Loss 1.2222 LearningRate 0.000036 Epoch: 33 Global Step: 687510 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:50:27,759-Speed 1534.88 samples/sec Loss 1.1993 LearningRate 0.000036 Epoch: 33 Global Step: 687520 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:50:39,768-Speed 2500.64 samples/sec Loss 1.2120 LearningRate 0.000036 Epoch: 33 Global Step: 687530 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:50:53,649-Speed 1482.70 samples/sec Loss 1.2046 LearningRate 0.000036 Epoch: 33 Global Step: 687540 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:51:01,891-Speed 2519.00 samples/sec Loss 1.2057 LearningRate 0.000036 Epoch: 33 Global Step: 687550 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:51:10,130-Speed 2499.75 samples/sec Loss 1.2164 LearningRate 0.000036 Epoch: 33 Global Step: 687560 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:51:23,444-Speed 1542.37 samples/sec Loss 1.2448 LearningRate 0.000036 Epoch: 33 Global Step: 687570 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:51:35,921-Speed 2499.82 samples/sec Loss 1.2359 LearningRate 0.000036 Epoch: 33 Global Step: 687580 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:51:47,716-Speed 1742.47 samples/sec Loss 1.2135 LearningRate 0.000036 Epoch: 33 Global Step: 687590 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:52:01,279-Speed 1514.72 samples/sec Loss 1.1891 LearningRate 0.000036 Epoch: 33 Global Step: 687600 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-07-12 03:52:09,432-Speed 2512.17 samples/sec Loss 1.2107 LearningRate 0.000036 Epoch: 33 Global Step: 687610 Fp16 Grad Scale: 32768 Required: 32 hours Training: 
2022-07-12 03:52:21,535-Speed 1700.82 samples/sec Loss 1.1904 LearningRate 0.000036 Epoch: 33 Global Step: 687620 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:52:29,738-Speed 2498.66 samples/sec Loss 1.1675 LearningRate 0.000036 Epoch: 33 Global Step: 687630 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:52:38,164-Speed 2431.03 samples/sec Loss 1.1942 LearningRate 0.000036 Epoch: 33 Global Step: 687640 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:52:46,563-Speed 2495.73 samples/sec Loss 1.1932 LearningRate 0.000036 Epoch: 33 Global Step: 687650 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:52:59,051-Speed 1834.97 samples/sec Loss 1.2157 LearningRate 0.000036 Epoch: 33 Global Step: 687660 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:53:07,257-Speed 2513.22 samples/sec Loss 1.2101 LearningRate 0.000036 Epoch: 33 Global Step: 687670 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:53:15,471-Speed 2493.91 samples/sec Loss 1.1899 LearningRate 0.000036 Epoch: 33 Global Step: 687680 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:53:23,692-Speed 2491.73 samples/sec Loss 1.1942 LearningRate 0.000036 Epoch: 33 Global Step: 687690 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:53:31,907-Speed 2493.54 samples/sec Loss 1.1990 LearningRate 0.000036 Epoch: 33 Global Step: 687700 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:53:40,126-Speed 2492.13 samples/sec Loss 1.1989 LearningRate 0.000036 Epoch: 33 Global Step: 687710 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:53:48,338-Speed 2494.50 samples/sec Loss 1.2123 LearningRate 0.000036 Epoch: 33 Global Step: 687720 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:53:56,495-Speed 2511.08 samples/sec Loss 1.1952 LearningRate 0.000036 Epoch: 33 Global Step: 687730 Fp16 Grad Scale: 32768 Required: 32 hours Training: 
2022-07-12 03:54:04,707-Speed 2494.18 samples/sec Loss 1.2073 LearningRate 0.000036 Epoch: 33 Global Step: 687740 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:54:12,919-Speed 2494.45 samples/sec Loss 1.2098 LearningRate 0.000036 Epoch: 33 Global Step: 687750 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:54:21,128-Speed 2495.28 samples/sec Loss 1.1850 LearningRate 0.000036 Epoch: 33 Global Step: 687760 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:54:29,341-Speed 2493.96 samples/sec Loss 1.2113 LearningRate 0.000036 Epoch: 33 Global Step: 687770 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:54:37,554-Speed 2493.91 samples/sec Loss 1.1924 LearningRate 0.000036 Epoch: 33 Global Step: 687780 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:54:45,713-Speed 2510.45 samples/sec Loss 1.2055 LearningRate 0.000036 Epoch: 33 Global Step: 687790 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:54:53,923-Speed 2495.01 samples/sec Loss 1.2251 LearningRate 0.000036 Epoch: 33 Global Step: 687800 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:55:02,132-Speed 2495.30 samples/sec Loss 1.2029 LearningRate 0.000036 Epoch: 33 Global Step: 687810 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:55:10,346-Speed 2493.75 samples/sec Loss 1.2024 LearningRate 0.000036 Epoch: 33 Global Step: 687820 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:55:18,557-Speed 2494.81 samples/sec Loss 1.2125 LearningRate 0.000036 Epoch: 33 Global Step: 687830 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:55:26,766-Speed 2495.26 samples/sec Loss 1.2203 LearningRate 0.000036 Epoch: 33 Global Step: 687840 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:55:34,921-Speed 2511.95 samples/sec Loss 1.2051 LearningRate 0.000036 Epoch: 33 Global Step: 687850 Fp16 Grad Scale: 32768 Required: 32 hours Training: 
2022-07-12 03:55:43,131-Speed 2495.01 samples/sec Loss 1.1830 LearningRate 0.000036 Epoch: 33 Global Step: 687860 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:55:51,343-Speed 2494.35 samples/sec Loss 1.2295 LearningRate 0.000036 Epoch: 33 Global Step: 687870 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:55:59,553-Speed 2494.88 samples/sec Loss 1.1897 LearningRate 0.000036 Epoch: 33 Global Step: 687880 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:56:07,769-Speed 2493.19 samples/sec Loss 1.2210 LearningRate 0.000036 Epoch: 33 Global Step: 687890 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:56:15,981-Speed 2494.16 samples/sec Loss 1.1873 LearningRate 0.000036 Epoch: 33 Global Step: 687900 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:56:24,135-Speed 2511.87 samples/sec Loss 1.2077 LearningRate 0.000036 Epoch: 33 Global Step: 687910 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:56:32,343-Speed 2495.80 samples/sec Loss 1.1899 LearningRate 0.000036 Epoch: 33 Global Step: 687920 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:56:40,552-Speed 2495.55 samples/sec Loss 1.1988 LearningRate 0.000036 Epoch: 33 Global Step: 687930 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:56:48,758-Speed 2495.84 samples/sec Loss 1.1807 LearningRate 0.000036 Epoch: 33 Global Step: 687940 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:56:56,976-Speed 2492.45 samples/sec Loss 1.2094 LearningRate 0.000036 Epoch: 33 Global Step: 687950 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:57:05,181-Speed 2496.44 samples/sec Loss 1.2009 LearningRate 0.000036 Epoch: 33 Global Step: 687960 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:57:13,339-Speed 2511.69 samples/sec Loss 1.2127 LearningRate 0.000036 Epoch: 33 Global Step: 687970 Fp16 Grad Scale: 32768 Required: 32 hours Training: 
2022-07-12 03:57:21,544-Speed 2496.31 samples/sec Loss 1.2165 LearningRate 0.000036 Epoch: 33 Global Step: 687980 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:57:29,754-Speed 2495.13 samples/sec Loss 1.2016 LearningRate 0.000036 Epoch: 33 Global Step: 687990 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:57:37,962-Speed 2495.45 samples/sec Loss 1.2191 LearningRate 0.000036 Epoch: 33 Global Step: 688000 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:57:46,169-Speed 2495.77 samples/sec Loss 1.1808 LearningRate 0.000036 Epoch: 33 Global Step: 688010 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:57:54,376-Speed 2495.71 samples/sec Loss 1.2168 LearningRate 0.000036 Epoch: 33 Global Step: 688020 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:58:02,544-Speed 2507.85 samples/sec Loss 1.1845 LearningRate 0.000036 Epoch: 33 Global Step: 688030 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:58:10,755-Speed 2494.65 samples/sec Loss 1.2105 LearningRate 0.000036 Epoch: 33 Global Step: 688040 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:58:18,976-Speed 2491.63 samples/sec Loss 1.1844 LearningRate 0.000036 Epoch: 33 Global Step: 688050 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:58:27,194-Speed 2492.46 samples/sec Loss 1.1956 LearningRate 0.000036 Epoch: 33 Global Step: 688060 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:58:35,401-Speed 2495.91 samples/sec Loss 1.1719 LearningRate 0.000036 Epoch: 33 Global Step: 688070 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:58:43,610-Speed 2495.43 samples/sec Loss 1.2369 LearningRate 0.000036 Epoch: 33 Global Step: 688080 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:58:51,763-Speed 2512.30 samples/sec Loss 1.1863 LearningRate 0.000036 Epoch: 33 Global Step: 688090 Fp16 Grad Scale: 32768 Required: 32 hours Training: 
2022-07-12 03:58:59,968-Speed 2496.92 samples/sec Loss 1.2095 LearningRate 0.000036 Epoch: 33 Global Step: 688100 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:59:08,187-Speed 2492.03 samples/sec Loss 1.1822 LearningRate 0.000036 Epoch: 33 Global Step: 688110 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:59:16,396-Speed 2495.21 samples/sec Loss 1.2019 LearningRate 0.000036 Epoch: 33 Global Step: 688120 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:59:24,601-Speed 2496.29 samples/sec Loss 1.2039 LearningRate 0.000036 Epoch: 33 Global Step: 688130 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:59:32,807-Speed 2496.25 samples/sec Loss 1.2374 LearningRate 0.000036 Epoch: 33 Global Step: 688140 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:59:40,964-Speed 2511.09 samples/sec Loss 1.1633 LearningRate 0.000036 Epoch: 33 Global Step: 688150 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:59:49,169-Speed 2496.61 samples/sec Loss 1.1814 LearningRate 0.000036 Epoch: 33 Global Step: 688160 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 03:59:57,377-Speed 2495.51 samples/sec Loss 1.1960 LearningRate 0.000036 Epoch: 33 Global Step: 688170 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:00:05,586-Speed 2495.03 samples/sec Loss 1.1852 LearningRate 0.000036 Epoch: 33 Global Step: 688180 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:00:13,798-Speed 2494.56 samples/sec Loss 1.1788 LearningRate 0.000036 Epoch: 33 Global Step: 688190 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:00:22,005-Speed 2495.97 samples/sec Loss 1.1782 LearningRate 0.000036 Epoch: 33 Global Step: 688200 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:00:30,161-Speed 2511.11 samples/sec Loss 1.2119 LearningRate 0.000036 Epoch: 33 Global Step: 688210 Fp16 Grad Scale: 32768 Required: 32 hours Training: 
2022-07-12 04:00:38,392-Speed 2488.66 samples/sec Loss 1.2123 LearningRate 0.000036 Epoch: 33 Global Step: 688220 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:00:46,603-Speed 2494.82 samples/sec Loss 1.1676 LearningRate 0.000036 Epoch: 33 Global Step: 688230 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:00:54,811-Speed 2495.64 samples/sec Loss 1.1856 LearningRate 0.000036 Epoch: 33 Global Step: 688240 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:01:03,016-Speed 2496.14 samples/sec Loss 1.2172 LearningRate 0.000036 Epoch: 33 Global Step: 688250 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:01:11,225-Speed 2495.19 samples/sec Loss 1.1910 LearningRate 0.000036 Epoch: 33 Global Step: 688260 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:01:19,383-Speed 2510.77 samples/sec Loss 1.1794 LearningRate 0.000036 Epoch: 33 Global Step: 688270 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:01:27,596-Speed 2494.24 samples/sec Loss 1.2156 LearningRate 0.000036 Epoch: 33 Global Step: 688280 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:01:35,802-Speed 2495.84 samples/sec Loss 1.1948 LearningRate 0.000036 Epoch: 33 Global Step: 688290 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:01:44,010-Speed 2496.09 samples/sec Loss 1.1898 LearningRate 0.000036 Epoch: 33 Global Step: 688300 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:01:52,216-Speed 2496.26 samples/sec Loss 1.1896 LearningRate 0.000036 Epoch: 33 Global Step: 688310 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:02:00,438-Speed 2491.19 samples/sec Loss 1.1735 LearningRate 0.000036 Epoch: 33 Global Step: 688320 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:02:08,590-Speed 2512.55 samples/sec Loss 1.2134 LearningRate 0.000036 Epoch: 33 Global Step: 688330 Fp16 Grad Scale: 32768 Required: 32 hours Training: 
2022-07-12 04:02:16,795-Speed 2496.63 samples/sec Loss 1.1773 LearningRate 0.000036 Epoch: 33 Global Step: 688340 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:02:25,002-Speed 2496.06 samples/sec Loss 1.1706 LearningRate 0.000036 Epoch: 33 Global Step: 688350 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-07-12 04:02:33,210-Speed 2495.61 samples/sec Loss 1.2159 LearningRate 0.000036 Epoch: 33 Global Step: 688360 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-07-12 04:02:41,377-Speed 2508.09 samples/sec Loss 1.1585 LearningRate 0.000036 Epoch: 33 Global Step: 688370 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:02:49,580-Speed 2496.88 samples/sec Loss 1.2002 LearningRate 0.000036 Epoch: 33 Global Step: 688380 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:02:57,736-Speed 2511.22 samples/sec Loss 1.1967 LearningRate 0.000036 Epoch: 33 Global Step: 688390 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:03:05,948-Speed 2494.30 samples/sec Loss 1.1682 LearningRate 0.000036 Epoch: 33 Global Step: 688400 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:03:14,153-Speed 2496.41 samples/sec Loss 1.2189 LearningRate 0.000036 Epoch: 33 Global Step: 688410 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:03:22,355-Speed 2497.37 samples/sec Loss 1.2155 LearningRate 0.000036 Epoch: 33 Global Step: 688420 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:03:30,560-Speed 2496.46 samples/sec Loss 1.1789 LearningRate 0.000036 Epoch: 33 Global Step: 688430 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:03:38,765-Speed 2496.37 samples/sec Loss 1.2008 LearningRate 0.000036 Epoch: 33 Global Step: 688440 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:03:46,920-Speed 2511.87 samples/sec Loss 1.2065 LearningRate 0.000036 Epoch: 33 Global Step: 688450 Fp16 Grad Scale: 32768 Required: 32 hours Training: 
2022-07-12 04:03:55,127-Speed 2495.84 samples/sec Loss 1.2108 LearningRate 0.000036 Epoch: 33 Global Step: 688460 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:04:03,330-Speed 2496.88 samples/sec Loss 1.1994 LearningRate 0.000036 Epoch: 33 Global Step: 688470 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:04:11,538-Speed 2495.57 samples/sec Loss 1.1833 LearningRate 0.000036 Epoch: 33 Global Step: 688480 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:04:19,741-Speed 2496.96 samples/sec Loss 1.1882 LearningRate 0.000036 Epoch: 33 Global Step: 688490 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:04:27,959-Speed 2492.60 samples/sec Loss 1.2140 LearningRate 0.000036 Epoch: 33 Global Step: 688500 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:04:36,122-Speed 2509.36 samples/sec Loss 1.2041 LearningRate 0.000036 Epoch: 33 Global Step: 688510 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:04:44,328-Speed 2496.07 samples/sec Loss 1.1758 LearningRate 0.000036 Epoch: 33 Global Step: 688520 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:04:52,538-Speed 2494.85 samples/sec Loss 1.2154 LearningRate 0.000036 Epoch: 33 Global Step: 688530 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:05:00,742-Speed 2497.00 samples/sec Loss 1.1545 LearningRate 0.000036 Epoch: 33 Global Step: 688540 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:05:08,957-Speed 2493.46 samples/sec Loss 1.1696 LearningRate 0.000036 Epoch: 33 Global Step: 688550 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:05:17,161-Speed 2496.78 samples/sec Loss 1.2117 LearningRate 0.000036 Epoch: 33 Global Step: 688560 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:05:25,317-Speed 2511.15 samples/sec Loss 1.1732 LearningRate 0.000036 Epoch: 33 Global Step: 688570 Fp16 Grad Scale: 32768 Required: 32 hours Training: 
2022-07-12 04:05:33,544-Speed 2489.94 samples/sec Loss 1.2179 LearningRate 0.000036 Epoch: 33 Global Step: 688580 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:05:41,755-Speed 2494.73 samples/sec Loss 1.2274 LearningRate 0.000036 Epoch: 33 Global Step: 688590 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:05:49,966-Speed 2494.81 samples/sec Loss 1.2104 LearningRate 0.000036 Epoch: 33 Global Step: 688600 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:05:58,177-Speed 2494.77 samples/sec Loss 1.2237 LearningRate 0.000036 Epoch: 33 Global Step: 688610 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:06:06,342-Speed 2508.45 samples/sec Loss 1.1789 LearningRate 0.000036 Epoch: 33 Global Step: 688620 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:06:14,503-Speed 2509.78 samples/sec Loss 1.2004 LearningRate 0.000036 Epoch: 33 Global Step: 688630 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:06:22,716-Speed 2493.92 samples/sec Loss 1.1750 LearningRate 0.000036 Epoch: 33 Global Step: 688640 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:06:30,922-Speed 2496.49 samples/sec Loss 1.1717 LearningRate 0.000036 Epoch: 33 Global Step: 688650 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:06:39,126-Speed 2496.96 samples/sec Loss 1.2157 LearningRate 0.000036 Epoch: 33 Global Step: 688660 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:06:47,328-Speed 2497.33 samples/sec Loss 1.2054 LearningRate 0.000036 Epoch: 33 Global Step: 688670 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:06:55,533-Speed 2496.40 samples/sec Loss 1.1757 LearningRate 0.000036 Epoch: 33 Global Step: 688680 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:07:03,686-Speed 2512.24 samples/sec Loss 1.2337 LearningRate 0.000036 Epoch: 33 Global Step: 688690 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:07:11,888-Speed 2497.71 samples/sec Loss 1.1931 LearningRate 0.000036 Epoch: 33 Global Step: 688700 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:07:20,094-Speed 2495.99 samples/sec Loss 1.2097 LearningRate 0.000036 Epoch: 33 Global Step: 688710 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:07:28,311-Speed 2492.65 samples/sec Loss 1.1819 LearningRate 0.000036 Epoch: 33 Global Step: 688720 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:07:36,516-Speed 2496.40 samples/sec Loss 1.2201 LearningRate 0.000036 Epoch: 33 Global Step: 688730 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:07:44,720-Speed 2497.08 samples/sec Loss 1.2069 LearningRate 0.000036 Epoch: 33 Global Step: 688740 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:07:52,884-Speed 2508.91 samples/sec Loss 1.1943 LearningRate 0.000036 Epoch: 33 Global Step: 688750 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:08:01,091-Speed 2495.71 samples/sec Loss 1.1599 LearningRate 0.000036 Epoch: 33 Global Step: 688760 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:08:09,296-Speed 2496.58 samples/sec Loss 1.1623 LearningRate 0.000036 Epoch: 33 Global Step: 688770 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:08:17,501-Speed 2496.08 samples/sec Loss 1.1847 LearningRate 0.000036 Epoch: 33 Global Step: 688780 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:08:25,714-Speed 2493.96 samples/sec Loss 1.1949 LearningRate 0.000036 Epoch: 33 Global Step: 688790 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:08:33,919-Speed 2496.48 samples/sec Loss 1.1781 LearningRate 0.000036 Epoch: 33 Global Step: 688800 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:08:42,076-Speed 2511.07 samples/sec Loss 1.2088 LearningRate 0.000036 Epoch: 33 Global Step: 688810 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:08:50,284-Speed 2495.58 samples/sec Loss 1.1924 LearningRate 0.000036 Epoch: 33 Global Step: 688820 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:08:58,495-Speed 2494.54 samples/sec Loss 1.1965 LearningRate 0.000036 Epoch: 33 Global Step: 688830 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:09:06,702-Speed 2495.98 samples/sec Loss 1.2040 LearningRate 0.000036 Epoch: 33 Global Step: 688840 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:09:14,909-Speed 2495.72 samples/sec Loss 1.2199 LearningRate 0.000036 Epoch: 33 Global Step: 688850 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:09:23,122-Speed 2493.65 samples/sec Loss 1.1813 LearningRate 0.000036 Epoch: 33 Global Step: 688860 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:09:31,277-Speed 2512.05 samples/sec Loss 1.2197 LearningRate 0.000036 Epoch: 33 Global Step: 688870 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:09:39,480-Speed 2497.11 samples/sec Loss 1.2001 LearningRate 0.000036 Epoch: 33 Global Step: 688880 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:09:47,687-Speed 2495.61 samples/sec Loss 1.1756 LearningRate 0.000036 Epoch: 33 Global Step: 688890 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:09:55,893-Speed 2496.41 samples/sec Loss 1.2044 LearningRate 0.000035 Epoch: 33 Global Step: 688900 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:10:04,095-Speed 2497.12 samples/sec Loss 1.1787 LearningRate 0.000035 Epoch: 33 Global Step: 688910 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:10:12,301-Speed 2496.52 samples/sec Loss 1.1879 LearningRate 0.000035 Epoch: 33 Global Step: 688920 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:10:20,450-Speed 2513.38 samples/sec Loss 1.1749 LearningRate 0.000035 Epoch: 33 Global Step: 688930 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:10:28,653-Speed 2496.89 samples/sec Loss 1.2077 LearningRate 0.000035 Epoch: 33 Global Step: 688940 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:10:36,861-Speed 2495.95 samples/sec Loss 1.2089 LearningRate 0.000035 Epoch: 33 Global Step: 688950 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:10:45,065-Speed 2496.91 samples/sec Loss 1.1640 LearningRate 0.000035 Epoch: 33 Global Step: 688960 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:10:53,275-Speed 2494.77 samples/sec Loss 1.1983 LearningRate 0.000035 Epoch: 33 Global Step: 688970 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:11:01,492-Speed 2492.90 samples/sec Loss 1.1713 LearningRate 0.000035 Epoch: 33 Global Step: 688980 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:11:09,642-Speed 2513.08 samples/sec Loss 1.1998 LearningRate 0.000035 Epoch: 33 Global Step: 688990 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:11:17,849-Speed 2496.00 samples/sec Loss 1.1869 LearningRate 0.000035 Epoch: 33 Global Step: 689000 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:11:26,051-Speed 2497.27 samples/sec Loss 1.1939 LearningRate 0.000035 Epoch: 33 Global Step: 689010 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:11:34,256-Speed 2496.49 samples/sec Loss 1.1533 LearningRate 0.000035 Epoch: 33 Global Step: 689020 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:11:42,463-Speed 2495.71 samples/sec Loss 1.1981 LearningRate 0.000035 Epoch: 33 Global Step: 689030 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:11:50,668-Speed 2496.47 samples/sec Loss 1.1970 LearningRate 0.000035 Epoch: 33 Global Step: 689040 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:11:58,820-Speed 2512.70 samples/sec Loss 1.1629 LearningRate 0.000035 Epoch: 33 Global Step: 689050 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:12:07,027-Speed 2496.19 samples/sec Loss 1.2059 LearningRate 0.000035 Epoch: 33 Global Step: 689060 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:12:15,230-Speed 2497.06 samples/sec Loss 1.2007 LearningRate 0.000035 Epoch: 33 Global Step: 689070 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:12:23,434-Speed 2497.00 samples/sec Loss 1.1833 LearningRate 0.000035 Epoch: 33 Global Step: 689080 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:12:31,638-Speed 2496.69 samples/sec Loss 1.1666 LearningRate 0.000035 Epoch: 33 Global Step: 689090 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:12:39,841-Speed 2496.98 samples/sec Loss 1.1724 LearningRate 0.000035 Epoch: 33 Global Step: 689100 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:12:47,995-Speed 2512.22 samples/sec Loss 1.1817 LearningRate 0.000035 Epoch: 33 Global Step: 689110 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:12:56,200-Speed 2496.32 samples/sec Loss 1.1995 LearningRate 0.000035 Epoch: 33 Global Step: 689120 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:13:04,404-Speed 2496.69 samples/sec Loss 1.1965 LearningRate 0.000035 Epoch: 33 Global Step: 689130 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:13:12,612-Speed 2495.27 samples/sec Loss 1.2027 LearningRate 0.000035 Epoch: 33 Global Step: 689140 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:13:20,814-Speed 2497.45 samples/sec Loss 1.2086 LearningRate 0.000035 Epoch: 33 Global Step: 689150 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:13:29,016-Speed 2497.49 samples/sec Loss 1.1909 LearningRate 0.000035 Epoch: 33 Global Step: 689160 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:13:37,166-Speed 2513.12 samples/sec Loss 1.1905 LearningRate 0.000035 Epoch: 33 Global Step: 689170 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:13:45,371-Speed 2496.77 samples/sec Loss 1.1946 LearningRate 0.000035 Epoch: 33 Global Step: 689180 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:13:53,584-Speed 2493.87 samples/sec Loss 1.2339 LearningRate 0.000035 Epoch: 33 Global Step: 689190 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:14:01,810-Speed 2489.94 samples/sec Loss 1.1751 LearningRate 0.000035 Epoch: 33 Global Step: 689200 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:14:10,014-Speed 2496.66 samples/sec Loss 1.1728 LearningRate 0.000035 Epoch: 33 Global Step: 689210 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:14:18,226-Speed 2494.57 samples/sec Loss 1.2056 LearningRate 0.000035 Epoch: 33 Global Step: 689220 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:14:26,377-Speed 2512.76 samples/sec Loss 1.1850 LearningRate 0.000035 Epoch: 33 Global Step: 689230 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:14:34,583-Speed 2496.23 samples/sec Loss 1.1850 LearningRate 0.000035 Epoch: 33 Global Step: 689240 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:14:42,786-Speed 2496.95 samples/sec Loss 1.1737 LearningRate 0.000035 Epoch: 33 Global Step: 689250 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:14:51,003-Speed 2493.26 samples/sec Loss 1.1997 LearningRate 0.000035 Epoch: 33 Global Step: 689260 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:14:59,207-Speed 2496.72 samples/sec Loss 1.2093 LearningRate 0.000035 Epoch: 33 Global Step: 689270 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:15:07,414-Speed 2495.71 samples/sec Loss 1.2106 LearningRate 0.000035 Epoch: 33 Global Step: 689280 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:15:15,564-Speed 2513.47 samples/sec Loss 1.2014 LearningRate 0.000035 Epoch: 33 Global Step: 689290 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:15:23,768-Speed 2496.72 samples/sec Loss 1.2000 LearningRate 0.000035 Epoch: 33 Global Step: 689300 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:15:31,972-Speed 2496.80 samples/sec Loss 1.1756 LearningRate 0.000035 Epoch: 33 Global Step: 689310 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:15:40,184-Speed 2494.34 samples/sec Loss 1.1869 LearningRate 0.000035 Epoch: 33 Global Step: 689320 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:15:48,388-Speed 2496.85 samples/sec Loss 1.1906 LearningRate 0.000035 Epoch: 33 Global Step: 689330 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:15:56,595-Speed 2495.79 samples/sec Loss 1.2078 LearningRate 0.000035 Epoch: 33 Global Step: 689340 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:16:04,747-Speed 2512.74 samples/sec Loss 1.1778 LearningRate 0.000035 Epoch: 33 Global Step: 689350 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:16:12,955-Speed 2495.60 samples/sec Loss 1.1888 LearningRate 0.000035 Epoch: 33 Global Step: 689360 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:16:21,159-Speed 2496.72 samples/sec Loss 1.1705 LearningRate 0.000035 Epoch: 33 Global Step: 689370 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:16:29,363-Speed 2496.70 samples/sec Loss 1.1815 LearningRate 0.000035 Epoch: 33 Global Step: 689380 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:16:37,566-Speed 2497.21 samples/sec Loss 1.1732 LearningRate 0.000035 Epoch: 33 Global Step: 689390 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:16:45,779-Speed 2493.91 samples/sec Loss 1.1900 LearningRate 0.000035 Epoch: 33 Global Step: 689400 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:16:53,930-Speed 2512.85 samples/sec Loss 1.2116 LearningRate 0.000035 Epoch: 33 Global Step: 689410 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:17:02,138-Speed 2495.62 samples/sec Loss 1.1671 LearningRate 0.000035 Epoch: 33 Global Step: 689420 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:17:10,343-Speed 2496.37 samples/sec Loss 1.1843 LearningRate 0.000035 Epoch: 33 Global Step: 689430 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:17:18,545-Speed 2497.25 samples/sec Loss 1.2193 LearningRate 0.000035 Epoch: 33 Global Step: 689440 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:17:26,748-Speed 2497.07 samples/sec Loss 1.1885 LearningRate 0.000035 Epoch: 33 Global Step: 689450 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:17:34,953-Speed 2496.87 samples/sec Loss 1.1621 LearningRate 0.000035 Epoch: 33 Global Step: 689460 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:17:43,102-Speed 2513.40 samples/sec Loss 1.1798 LearningRate 0.000035 Epoch: 33 Global Step: 689470 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:17:51,312-Speed 2494.86 samples/sec Loss 1.2120 LearningRate 0.000035 Epoch: 33 Global Step: 689480 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:17:59,515-Speed 2497.23 samples/sec Loss 1.1625 LearningRate 0.000035 Epoch: 33 Global Step: 689490 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:18:07,719-Speed 2496.90 samples/sec Loss 1.2115 LearningRate 0.000035 Epoch: 33 Global Step: 689500 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:18:15,925-Speed 2496.14 samples/sec Loss 1.2126 LearningRate 0.000035 Epoch: 33 Global Step: 689510 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:18:24,139-Speed 2493.42 samples/sec Loss 1.2159 LearningRate 0.000035 Epoch: 33 Global Step: 689520 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:18:32,292-Speed 2512.57 samples/sec Loss 1.1995 LearningRate 0.000035 Epoch: 33 Global Step: 689530 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:18:40,496-Speed 2496.69 samples/sec Loss 1.2039 LearningRate 0.000035 Epoch: 33 Global Step: 689540 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:18:48,702-Speed 2496.36 samples/sec Loss 1.1681 LearningRate 0.000035 Epoch: 33 Global Step: 689550 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:18:56,905-Speed 2496.77 samples/sec Loss 1.1696 LearningRate 0.000035 Epoch: 33 Global Step: 689560 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:19:05,110-Speed 2496.93 samples/sec Loss 1.1964 LearningRate 0.000035 Epoch: 33 Global Step: 689570 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:19:13,315-Speed 2496.29 samples/sec Loss 1.1665 LearningRate 0.000035 Epoch: 33 Global Step: 689580 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:19:21,464-Speed 2513.41 samples/sec Loss 1.1944 LearningRate 0.000035 Epoch: 33 Global Step: 689590 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:19:29,670-Speed 2496.46 samples/sec Loss 1.1974 LearningRate 0.000035 Epoch: 33 Global Step: 689600 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:19:37,876-Speed 2496.25 samples/sec Loss 1.1904 LearningRate 0.000035 Epoch: 33 Global Step: 689610 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:19:46,091-Speed 2493.15 samples/sec Loss 1.1718 LearningRate 0.000035 Epoch: 33 Global Step: 689620 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:19:54,294-Speed 2497.04 samples/sec Loss 1.2137 LearningRate 0.000035 Epoch: 33 Global Step: 689630 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:20:02,502-Speed 2495.46 samples/sec Loss 1.1893 LearningRate 0.000035 Epoch: 33 Global Step: 689640 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:20:10,653-Speed 2513.15 samples/sec Loss 1.2314 LearningRate 0.000035 Epoch: 33 Global Step: 689650 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:20:18,864-Speed 2494.49 samples/sec Loss 1.2149 LearningRate 0.000035 Epoch: 33 Global Step: 689660 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:20:27,067-Speed 2496.96 samples/sec Loss 1.1692 LearningRate 0.000035 Epoch: 33 Global Step: 689670 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:20:35,271-Speed 2496.88 samples/sec Loss 1.1969 LearningRate 0.000035 Epoch: 33 Global Step: 689680 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:20:43,476-Speed 2496.66 samples/sec Loss 1.1785 LearningRate 0.000035 Epoch: 33 Global Step: 689690 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:20:51,681-Speed 2496.27 samples/sec Loss 1.2005 LearningRate 0.000035 Epoch: 33 Global Step: 689700 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:20:59,830-Speed 2513.51 samples/sec Loss 1.1693 LearningRate 0.000035 Epoch: 33 Global Step: 689710 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:21:08,037-Speed 2495.57 samples/sec Loss 1.1835 LearningRate 0.000035 Epoch: 33 Global Step: 689720 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:21:16,253-Speed 2493.41 samples/sec Loss 1.2055 LearningRate 0.000035 Epoch: 33 Global Step: 689730 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:21:24,459-Speed 2496.05 samples/sec Loss 1.1744 LearningRate 0.000035 Epoch: 33 Global Step: 689740 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:21:32,665-Speed 2495.98 samples/sec Loss 1.1851 LearningRate 0.000035 Epoch: 33 Global Step: 689750 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:21:40,873-Speed 2495.65 samples/sec Loss 1.1687 LearningRate 0.000035 Epoch: 33 Global Step: 689760 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:21:49,024-Speed 2513.17 samples/sec Loss 1.1985 LearningRate 0.000035 Epoch: 33 Global Step: 689770 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:21:57,229-Speed 2496.62 samples/sec Loss 1.2100 LearningRate 0.000035 Epoch: 33 Global Step: 689780 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:22:05,432-Speed 2497.01 samples/sec Loss 1.1859 LearningRate 0.000035 Epoch: 33 Global Step: 689790 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:22:13,636-Speed 2496.53 samples/sec Loss 1.1987 LearningRate 0.000035 Epoch: 33 Global Step: 689800 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:22:21,839-Speed 2497.00 samples/sec Loss 1.2093 LearningRate 0.000035 Epoch: 33 Global Step: 689810 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:22:30,048-Speed 2495.37 samples/sec Loss 1.2047 LearningRate 0.000035 Epoch: 33 Global Step: 689820 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:22:38,203-Speed 2511.49 samples/sec Loss 1.2205 LearningRate 0.000035 Epoch: 33 Global Step: 689830 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:22:46,420-Speed 2492.98 samples/sec Loss 1.1930 LearningRate 0.000035 Epoch: 33 Global Step: 689840 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:22:54,627-Speed 2495.91 samples/sec Loss 1.2099 LearningRate 0.000035 Epoch: 33 Global Step: 689850 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:23:02,833-Speed 2495.95 samples/sec Loss 1.1942 LearningRate 0.000035 Epoch: 33 Global Step: 689860 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:23:11,040-Speed 2495.93 samples/sec Loss 1.2081 LearningRate 0.000035 Epoch: 33 Global Step: 689870 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:23:19,247-Speed 2495.71 samples/sec Loss 1.1902 LearningRate 0.000035 Epoch: 33 Global Step: 689880 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:23:27,396-Speed 2513.40 samples/sec Loss 1.2012 LearningRate 0.000035 Epoch: 33 Global Step: 689890 Fp16 Grad Scale: 32768 Required: 32 hours Training: 
2022-07-12 04:23:35,609-Speed 2494.09 samples/sec Loss 1.1781 LearningRate 0.000035 Epoch: 33 Global Step: 689900 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:23:43,812-Speed 2497.23 samples/sec Loss 1.2330 LearningRate 0.000035 Epoch: 33 Global Step: 689910 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:23:52,016-Speed 2496.98 samples/sec Loss 1.1980 LearningRate 0.000035 Epoch: 33 Global Step: 689920 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:24:00,220-Speed 2496.53 samples/sec Loss 1.1794 LearningRate 0.000035 Epoch: 33 Global Step: 689930 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:24:08,429-Speed 2495.09 samples/sec Loss 1.1857 LearningRate 0.000035 Epoch: 33 Global Step: 689940 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:24:16,598-Speed 2507.72 samples/sec Loss 1.1977 LearningRate 0.000035 Epoch: 33 Global Step: 689950 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:24:24,806-Speed 2495.50 samples/sec Loss 1.2018 LearningRate 0.000035 Epoch: 33 Global Step: 689960 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:24:33,010-Speed 2496.79 samples/sec Loss 1.1770 LearningRate 0.000035 Epoch: 33 Global Step: 689970 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:24:41,213-Speed 2497.11 samples/sec Loss 1.1857 LearningRate 0.000035 Epoch: 33 Global Step: 689980 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:24:49,420-Speed 2495.90 samples/sec Loss 1.1970 LearningRate 0.000035 Epoch: 33 Global Step: 689990 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:24:57,627-Speed 2495.77 samples/sec Loss 1.2158 LearningRate 0.000035 Epoch: 33 Global Step: 690000 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-07-12 04:25:05,783-Speed 2511.73 samples/sec Loss 1.2204 LearningRate 0.000035 Epoch: 33 Global Step: 690010 Fp16 Grad Scale: 32768 Required: 32 hours Training: 
2022-07-12 04:25:13,944-Speed 2510.02 samples/sec Loss 1.2244 LearningRate 0.000035 Epoch: 33 Global Step: 690020 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:25:22,148-Speed 2496.84 samples/sec Loss 1.2097 LearningRate 0.000035 Epoch: 33 Global Step: 690030 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:25:30,354-Speed 2495.83 samples/sec Loss 1.1978 LearningRate 0.000035 Epoch: 33 Global Step: 690040 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:25:38,560-Speed 2496.51 samples/sec Loss 1.1940 LearningRate 0.000035 Epoch: 33 Global Step: 690050 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:25:46,769-Speed 2495.02 samples/sec Loss 1.1915 LearningRate 0.000035 Epoch: 33 Global Step: 690060 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:25:54,919-Speed 2513.47 samples/sec Loss 1.1721 LearningRate 0.000035 Epoch: 33 Global Step: 690070 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:26:03,124-Speed 2496.41 samples/sec Loss 1.2040 LearningRate 0.000035 Epoch: 33 Global Step: 690080 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:26:11,329-Speed 2496.34 samples/sec Loss 1.1920 LearningRate 0.000035 Epoch: 33 Global Step: 690090 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:26:19,537-Speed 2495.67 samples/sec Loss 1.1772 LearningRate 0.000035 Epoch: 33 Global Step: 690100 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:26:27,741-Speed 2496.70 samples/sec Loss 1.2269 LearningRate 0.000035 Epoch: 33 Global Step: 690110 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:26:35,950-Speed 2495.32 samples/sec Loss 1.2054 LearningRate 0.000035 Epoch: 33 Global Step: 690120 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:26:44,103-Speed 2512.20 samples/sec Loss 1.1961 LearningRate 0.000035 Epoch: 33 Global Step: 690130 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:26:52,309-Speed 2496.09 samples/sec Loss 1.1763 LearningRate 0.000035 Epoch: 33 Global Step: 690140 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:27:00,515-Speed 2496.37 samples/sec Loss 1.2059 LearningRate 0.000035 Epoch: 33 Global Step: 690150 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:27:08,717-Speed 2497.06 samples/sec Loss 1.2424 LearningRate 0.000035 Epoch: 33 Global Step: 690160 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:27:16,927-Speed 2494.95 samples/sec Loss 1.2011 LearningRate 0.000035 Epoch: 33 Global Step: 690170 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:27:25,129-Speed 2497.30 samples/sec Loss 1.1873 LearningRate 0.000035 Epoch: 33 Global Step: 690180 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:27:33,289-Speed 2510.34 samples/sec Loss 1.2123 LearningRate 0.000035 Epoch: 33 Global Step: 690190 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:27:41,489-Speed 2497.70 samples/sec Loss 1.1730 LearningRate 0.000035 Epoch: 33 Global Step: 690200 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:27:49,692-Speed 2497.26 samples/sec Loss 1.2184 LearningRate 0.000035 Epoch: 33 Global Step: 690210 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:27:57,898-Speed 2496.19 samples/sec Loss 1.2061 LearningRate 0.000035 Epoch: 33 Global Step: 690220 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:28:06,104-Speed 2496.23 samples/sec Loss 1.1890 LearningRate 0.000035 Epoch: 33 Global Step: 690230 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:28:14,308-Speed 2496.48 samples/sec Loss 1.2151 LearningRate 0.000035 Epoch: 33 Global Step: 690240 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:28:22,471-Speed 2509.41 samples/sec Loss 1.2000 LearningRate 0.000035 Epoch: 33 Global Step: 690250 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:28:30,674-Speed 2496.88 samples/sec Loss 1.1837 LearningRate 0.000035 Epoch: 33 Global Step: 690260 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:28:38,877-Speed 2497.22 samples/sec Loss 1.2119 LearningRate 0.000035 Epoch: 33 Global Step: 690270 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:28:47,079-Speed 2497.58 samples/sec Loss 1.2269 LearningRate 0.000035 Epoch: 33 Global Step: 690280 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:28:55,281-Speed 2497.29 samples/sec Loss 1.1808 LearningRate 0.000035 Epoch: 33 Global Step: 690290 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:29:03,487-Speed 2496.57 samples/sec Loss 1.1933 LearningRate 0.000035 Epoch: 33 Global Step: 690300 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:29:11,643-Speed 2511.12 samples/sec Loss 1.2286 LearningRate 0.000035 Epoch: 33 Global Step: 690310 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:29:19,846-Speed 2497.01 samples/sec Loss 1.2318 LearningRate 0.000035 Epoch: 33 Global Step: 690320 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:29:28,049-Speed 2497.26 samples/sec Loss 1.2307 LearningRate 0.000035 Epoch: 33 Global Step: 690330 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:29:36,250-Speed 2497.29 samples/sec Loss 1.1950 LearningRate 0.000035 Epoch: 33 Global Step: 690340 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:29:44,477-Speed 2489.68 samples/sec Loss 1.2302 LearningRate 0.000035 Epoch: 33 Global Step: 690350 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:29:52,681-Speed 2496.97 samples/sec Loss 1.2074 LearningRate 0.000035 Epoch: 33 Global Step: 690360 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:30:00,830-Speed 2513.90 samples/sec Loss 1.1893 LearningRate 0.000035 Epoch: 33 Global Step: 690370 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:30:09,033-Speed 2497.01 samples/sec Loss 1.2177 LearningRate 0.000035 Epoch: 33 Global Step: 690380 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:30:17,233-Speed 2497.63 samples/sec Loss 1.1644 LearningRate 0.000035 Epoch: 33 Global Step: 690390 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:30:25,436-Speed 2497.21 samples/sec Loss 1.2043 LearningRate 0.000035 Epoch: 33 Global Step: 690400 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:30:33,645-Speed 2495.22 samples/sec Loss 1.2112 LearningRate 0.000035 Epoch: 33 Global Step: 690410 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:30:41,848-Speed 2496.99 samples/sec Loss 1.2054 LearningRate 0.000035 Epoch: 33 Global Step: 690420 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:30:50,000-Speed 2512.82 samples/sec Loss 1.1864 LearningRate 0.000035 Epoch: 33 Global Step: 690430 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:30:58,205-Speed 2496.25 samples/sec Loss 1.1857 LearningRate 0.000035 Epoch: 33 Global Step: 690440 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:31:06,414-Speed 2495.32 samples/sec Loss 1.2080 LearningRate 0.000035 Epoch: 33 Global Step: 690450 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:31:14,619-Speed 2496.56 samples/sec Loss 1.1906 LearningRate 0.000035 Epoch: 33 Global Step: 690460 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:31:22,824-Speed 2496.24 samples/sec Loss 1.1823 LearningRate 0.000035 Epoch: 33 Global Step: 690470 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:31:31,026-Speed 2497.62 samples/sec Loss 1.1774 LearningRate 0.000035 Epoch: 33 Global Step: 690480 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:31:39,174-Speed 2513.75 samples/sec Loss 1.1806 LearningRate 0.000035 Epoch: 33 Global Step: 690490 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:31:47,377-Speed 2497.13 samples/sec Loss 1.2027 LearningRate 0.000035 Epoch: 33 Global Step: 690500 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:31:55,578-Speed 2497.79 samples/sec Loss 1.1634 LearningRate 0.000035 Epoch: 33 Global Step: 690510 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:32:03,791-Speed 2494.01 samples/sec Loss 1.1881 LearningRate 0.000035 Epoch: 33 Global Step: 690520 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:32:11,993-Speed 2497.23 samples/sec Loss 1.1862 LearningRate 0.000035 Epoch: 33 Global Step: 690530 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:32:20,208-Speed 2493.22 samples/sec Loss 1.1792 LearningRate 0.000035 Epoch: 33 Global Step: 690540 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:32:28,352-Speed 2515.15 samples/sec Loss 1.1750 LearningRate 0.000035 Epoch: 33 Global Step: 690550 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:32:36,558-Speed 2496.26 samples/sec Loss 1.1907 LearningRate 0.000035 Epoch: 33 Global Step: 690560 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:32:44,760-Speed 2497.27 samples/sec Loss 1.2038 LearningRate 0.000035 Epoch: 33 Global Step: 690570 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:32:52,976-Speed 2493.03 samples/sec Loss 1.2129 LearningRate 0.000035 Epoch: 33 Global Step: 690580 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:33:01,184-Speed 2495.34 samples/sec Loss 1.1914 LearningRate 0.000035 Epoch: 33 Global Step: 690590 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:33:09,389-Speed 2496.61 samples/sec Loss 1.1806 LearningRate 0.000035 Epoch: 33 Global Step: 690600 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:33:17,541-Speed 2512.48 samples/sec Loss 1.1688 LearningRate 0.000035 Epoch: 33 Global Step: 690610 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-07-12 04:33:25,744-Speed 2496.99 samples/sec Loss 1.2025 LearningRate 0.000035 Epoch: 33 Global Step: 690620 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:33:33,959-Speed 2493.38 samples/sec Loss 1.2086 LearningRate 0.000035 Epoch: 33 Global Step: 690630 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:33:42,164-Speed 2496.48 samples/sec Loss 1.1780 LearningRate 0.000035 Epoch: 33 Global Step: 690640 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:33:50,375-Speed 2494.72 samples/sec Loss 1.1771 LearningRate 0.000035 Epoch: 33 Global Step: 690650 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:33:58,579-Speed 2496.42 samples/sec Loss 1.2066 LearningRate 0.000035 Epoch: 33 Global Step: 690660 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:34:06,735-Speed 2511.42 samples/sec Loss 1.1920 LearningRate 0.000035 Epoch: 33 Global Step: 690670 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:34:14,942-Speed 2495.90 samples/sec Loss 1.1889 LearningRate 0.000035 Epoch: 33 Global Step: 690680 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:34:23,105-Speed 2509.27 samples/sec Loss 1.1989 LearningRate 0.000035 Epoch: 33 Global Step: 690690 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:34:31,316-Speed 2494.73 samples/sec Loss 1.2084 LearningRate 0.000035 Epoch: 33 Global Step: 690700 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:34:39,526-Speed 2495.02 samples/sec Loss 1.1629 LearningRate 0.000035 Epoch: 33 Global Step: 690710 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:34:47,733-Speed 2495.93 samples/sec Loss 1.1860 LearningRate 0.000035 Epoch: 33 Global Step: 690720 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:34:55,888-Speed 2511.76 samples/sec Loss 1.1735 LearningRate 0.000035 Epoch: 33 Global Step: 690730 Fp16 Grad Scale: 8192 Required: 32 hours Training: 
2022-07-12 04:35:04,100-Speed 2494.50 samples/sec Loss 1.1920 LearningRate 0.000035 Epoch: 33 Global Step: 690740 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:35:12,302-Speed 2497.24 samples/sec Loss 1.1926 LearningRate 0.000035 Epoch: 33 Global Step: 690750 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:35:20,510-Speed 2495.62 samples/sec Loss 1.1915 LearningRate 0.000035 Epoch: 33 Global Step: 690760 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:35:28,713-Speed 2496.90 samples/sec Loss 1.1720 LearningRate 0.000035 Epoch: 33 Global Step: 690770 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:35:36,919-Speed 2496.15 samples/sec Loss 1.1854 LearningRate 0.000035 Epoch: 33 Global Step: 690780 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:35:45,080-Speed 2509.85 samples/sec Loss 1.1907 LearningRate 0.000035 Epoch: 33 Global Step: 690790 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:35:53,282-Speed 2497.68 samples/sec Loss 1.1636 LearningRate 0.000035 Epoch: 33 Global Step: 690800 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:36:01,487-Speed 2496.60 samples/sec Loss 1.1811 LearningRate 0.000035 Epoch: 33 Global Step: 690810 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:36:09,692-Speed 2496.44 samples/sec Loss 1.1815 LearningRate 0.000035 Epoch: 33 Global Step: 690820 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:36:17,898-Speed 2495.92 samples/sec Loss 1.1925 LearningRate 0.000035 Epoch: 33 Global Step: 690830 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:36:26,102-Speed 2496.77 samples/sec Loss 1.1909 LearningRate 0.000035 Epoch: 33 Global Step: 690840 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:36:34,257-Speed 2512.03 samples/sec Loss 1.1704 LearningRate 0.000035 Epoch: 33 Global Step: 690850 Fp16 Grad Scale: 8192 Required: 32 hours Training: 
2022-07-12 04:36:42,467-Speed 2494.74 samples/sec Loss 1.1864 LearningRate 0.000035 Epoch: 33 Global Step: 690860 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:36:50,674-Speed 2495.85 samples/sec Loss 1.1874 LearningRate 0.000035 Epoch: 33 Global Step: 690870 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:36:58,879-Speed 2496.60 samples/sec Loss 1.2001 LearningRate 0.000035 Epoch: 33 Global Step: 690880 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:37:07,086-Speed 2495.84 samples/sec Loss 1.1503 LearningRate 0.000034 Epoch: 33 Global Step: 690890 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:37:15,289-Speed 2496.86 samples/sec Loss 1.1993 LearningRate 0.000034 Epoch: 33 Global Step: 690900 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:37:23,448-Speed 2510.57 samples/sec Loss 1.2146 LearningRate 0.000034 Epoch: 33 Global Step: 690910 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:37:31,660-Speed 2494.78 samples/sec Loss 1.1947 LearningRate 0.000034 Epoch: 33 Global Step: 690920 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:37:39,866-Speed 2496.21 samples/sec Loss 1.1926 LearningRate 0.000034 Epoch: 33 Global Step: 690930 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:37:48,072-Speed 2495.97 samples/sec Loss 1.1681 LearningRate 0.000034 Epoch: 33 Global Step: 690940 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:37:56,278-Speed 2496.37 samples/sec Loss 1.1730 LearningRate 0.000034 Epoch: 33 Global Step: 690950 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:38:04,486-Speed 2495.82 samples/sec Loss 1.1881 LearningRate 0.000034 Epoch: 33 Global Step: 690960 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:38:12,638-Speed 2512.72 samples/sec Loss 1.1859 LearningRate 0.000034 Epoch: 33 Global Step: 690970 Fp16 Grad Scale: 8192 Required: 32 hours Training: 
2022-07-12 04:38:20,841-Speed 2496.94 samples/sec Loss 1.1657 LearningRate 0.000034 Epoch: 33 Global Step: 690980 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:38:29,046-Speed 2496.65 samples/sec Loss 1.1622 LearningRate 0.000034 Epoch: 33 Global Step: 690990 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:38:37,250-Speed 2497.05 samples/sec Loss 1.1806 LearningRate 0.000034 Epoch: 33 Global Step: 691000 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:38:45,452-Speed 2497.06 samples/sec Loss 1.1596 LearningRate 0.000034 Epoch: 33 Global Step: 691010 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:38:53,657-Speed 2496.43 samples/sec Loss 1.1956 LearningRate 0.000034 Epoch: 33 Global Step: 691020 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:39:01,809-Speed 2512.67 samples/sec Loss 1.1740 LearningRate 0.000034 Epoch: 33 Global Step: 691030 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:39:10,014-Speed 2497.00 samples/sec Loss 1.2013 LearningRate 0.000034 Epoch: 33 Global Step: 691040 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:39:18,221-Speed 2495.70 samples/sec Loss 1.1973 LearningRate 0.000034 Epoch: 33 Global Step: 691050 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:39:26,424-Speed 2497.16 samples/sec Loss 1.1702 LearningRate 0.000034 Epoch: 33 Global Step: 691060 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:39:34,630-Speed 2496.43 samples/sec Loss 1.1875 LearningRate 0.000034 Epoch: 33 Global Step: 691070 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:39:42,832-Speed 2497.37 samples/sec Loss 1.1921 LearningRate 0.000034 Epoch: 33 Global Step: 691080 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:39:50,984-Speed 2512.52 samples/sec Loss 1.1755 LearningRate 0.000034 Epoch: 33 Global Step: 691090 Fp16 Grad Scale: 8192 Required: 32 hours Training: 
2022-07-12 04:39:59,187-Speed 2497.44 samples/sec Loss 1.1752 LearningRate 0.000034 Epoch: 33 Global Step: 691100 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:40:07,388-Speed 2497.37 samples/sec Loss 1.2133 LearningRate 0.000034 Epoch: 33 Global Step: 691110 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:40:15,604-Speed 2493.20 samples/sec Loss 1.2063 LearningRate 0.000034 Epoch: 33 Global Step: 691120 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:40:23,809-Speed 2496.50 samples/sec Loss 1.1874 LearningRate 0.000034 Epoch: 33 Global Step: 691130 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:40:32,014-Speed 2496.12 samples/sec Loss 1.2042 LearningRate 0.000034 Epoch: 33 Global Step: 691140 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:40:40,167-Speed 2512.65 samples/sec Loss 1.1883 LearningRate 0.000034 Epoch: 33 Global Step: 691150 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:40:48,371-Speed 2496.73 samples/sec Loss 1.1748 LearningRate 0.000034 Epoch: 33 Global Step: 691160 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:40:56,589-Speed 2492.29 samples/sec Loss 1.2023 LearningRate 0.000034 Epoch: 33 Global Step: 691170 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:41:04,800-Speed 2494.65 samples/sec Loss 1.1975 LearningRate 0.000034 Epoch: 33 Global Step: 691180 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:41:13,000-Speed 2498.04 samples/sec Loss 1.1495 LearningRate 0.000034 Epoch: 33 Global Step: 691190 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:41:21,216-Speed 2493.11 samples/sec Loss 1.1891 LearningRate 0.000034 Epoch: 33 Global Step: 691200 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:41:29,384-Speed 2507.70 samples/sec Loss 1.2140 LearningRate 0.000034 Epoch: 33 Global Step: 691210 Fp16 Grad Scale: 8192 Required: 32 hours Training: 
2022-07-12 04:41:37,586-Speed 2497.23 samples/sec Loss 1.1709 LearningRate 0.000034 Epoch: 33 Global Step: 691220 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:41:45,795-Speed 2495.61 samples/sec Loss 1.1855 LearningRate 0.000034 Epoch: 33 Global Step: 691230 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:41:53,998-Speed 2496.94 samples/sec Loss 1.1829 LearningRate 0.000034 Epoch: 33 Global Step: 691240 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:42:02,202-Speed 2496.80 samples/sec Loss 1.1963 LearningRate 0.000034 Epoch: 33 Global Step: 691250 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:42:10,413-Speed 2494.44 samples/sec Loss 1.1516 LearningRate 0.000034 Epoch: 33 Global Step: 691260 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:42:18,571-Speed 2510.79 samples/sec Loss 1.1965 LearningRate 0.000034 Epoch: 33 Global Step: 691270 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:42:26,776-Speed 2496.41 samples/sec Loss 1.1526 LearningRate 0.000034 Epoch: 33 Global Step: 691280 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:42:34,984-Speed 2495.63 samples/sec Loss 1.1730 LearningRate 0.000034 Epoch: 33 Global Step: 691290 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:42:43,190-Speed 2495.96 samples/sec Loss 1.1658 LearningRate 0.000034 Epoch: 33 Global Step: 691300 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:42:51,397-Speed 2496.17 samples/sec Loss 1.1808 LearningRate 0.000034 Epoch: 33 Global Step: 691310 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:42:59,618-Speed 2491.28 samples/sec Loss 1.1891 LearningRate 0.000034 Epoch: 33 Global Step: 691320 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:43:07,769-Speed 2512.95 samples/sec Loss 1.1812 LearningRate 0.000034 Epoch: 33 Global Step: 691330 Fp16 Grad Scale: 8192 Required: 32 hours Training: 
2022-07-12 04:43:15,975-Speed 2496.20 samples/sec Loss 1.2223 LearningRate 0.000034 Epoch: 33 Global Step: 691340 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:43:24,178-Speed 2497.14 samples/sec Loss 1.1957 LearningRate 0.000034 Epoch: 33 Global Step: 691350 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:43:32,383-Speed 2496.27 samples/sec Loss 1.1891 LearningRate 0.000034 Epoch: 33 Global Step: 691360 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:43:40,597-Speed 2493.74 samples/sec Loss 1.1663 LearningRate 0.000034 Epoch: 33 Global Step: 691370 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:43:48,808-Speed 2494.68 samples/sec Loss 1.1972 LearningRate 0.000034 Epoch: 33 Global Step: 691380 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:43:56,958-Speed 2512.94 samples/sec Loss 1.1785 LearningRate 0.000034 Epoch: 33 Global Step: 691390 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:44:05,163-Speed 2496.45 samples/sec Loss 1.1901 LearningRate 0.000034 Epoch: 33 Global Step: 691400 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:44:13,366-Speed 2497.18 samples/sec Loss 1.2012 LearningRate 0.000034 Epoch: 33 Global Step: 691410 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:44:21,569-Speed 2496.73 samples/sec Loss 1.2062 LearningRate 0.000034 Epoch: 33 Global Step: 691420 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:44:29,784-Speed 2493.54 samples/sec Loss 1.1904 LearningRate 0.000034 Epoch: 33 Global Step: 691430 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:44:37,986-Speed 2497.17 samples/sec Loss 1.1859 LearningRate 0.000034 Epoch: 33 Global Step: 691440 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:44:46,145-Speed 2510.56 samples/sec Loss 1.1722 LearningRate 0.000034 Epoch: 33 Global Step: 691450 Fp16 Grad Scale: 8192 Required: 32 hours Training: 
2022-07-12 04:44:54,349-Speed 2496.86 samples/sec Loss 1.1810 LearningRate 0.000034 Epoch: 33 Global Step: 691460 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:45:02,561-Speed 2494.34 samples/sec Loss 1.2020 LearningRate 0.000034 Epoch: 33 Global Step: 691470 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:45:10,780-Speed 2491.96 samples/sec Loss 1.1963 LearningRate 0.000034 Epoch: 33 Global Step: 691480 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:45:18,987-Speed 2495.94 samples/sec Loss 1.1921 LearningRate 0.000034 Epoch: 33 Global Step: 691490 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:45:27,191-Speed 2496.67 samples/sec Loss 1.1858 LearningRate 0.000034 Epoch: 33 Global Step: 691500 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:45:35,342-Speed 2513.04 samples/sec Loss 1.1655 LearningRate 0.000034 Epoch: 33 Global Step: 691510 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:45:43,547-Speed 2496.68 samples/sec Loss 1.2124 LearningRate 0.000034 Epoch: 33 Global Step: 691520 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:45:51,753-Speed 2496.02 samples/sec Loss 1.1929 LearningRate 0.000034 Epoch: 33 Global Step: 691530 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:45:59,954-Speed 2497.80 samples/sec Loss 1.1633 LearningRate 0.000034 Epoch: 33 Global Step: 691540 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:46:08,157-Speed 2497.06 samples/sec Loss 1.1813 LearningRate 0.000034 Epoch: 33 Global Step: 691550 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:46:16,361-Speed 2496.54 samples/sec Loss 1.1848 LearningRate 0.000034 Epoch: 33 Global Step: 691560 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:46:24,513-Speed 2512.61 samples/sec Loss 1.1691 LearningRate 0.000034 Epoch: 33 Global Step: 691570 Fp16 Grad Scale: 8192 Required: 32 hours Training: 
2022-07-12 04:46:32,716-Speed 2497.07 samples/sec Loss 1.1582 LearningRate 0.000034 Epoch: 33 Global Step: 691580 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:46:40,919-Speed 2497.05 samples/sec Loss 1.1998 LearningRate 0.000034 Epoch: 33 Global Step: 691590 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:46:49,124-Speed 2496.41 samples/sec Loss 1.2288 LearningRate 0.000034 Epoch: 33 Global Step: 691600 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:46:57,342-Speed 2492.50 samples/sec Loss 1.1714 LearningRate 0.000034 Epoch: 33 Global Step: 691610 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:47:05,547-Speed 2496.44 samples/sec Loss 1.1914 LearningRate 0.000034 Epoch: 33 Global Step: 691620 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:47:13,702-Speed 2511.75 samples/sec Loss 1.1794 LearningRate 0.000034 Epoch: 33 Global Step: 691630 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:47:21,919-Speed 2492.60 samples/sec Loss 1.1976 LearningRate 0.000034 Epoch: 33 Global Step: 691640 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:47:30,127-Speed 2495.45 samples/sec Loss 1.2139 LearningRate 0.000034 Epoch: 33 Global Step: 691650 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:47:38,332-Speed 2496.59 samples/sec Loss 1.2136 LearningRate 0.000034 Epoch: 33 Global Step: 691660 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:47:46,536-Speed 2496.76 samples/sec Loss 1.1795 LearningRate 0.000034 Epoch: 33 Global Step: 691670 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:47:54,742-Speed 2495.87 samples/sec Loss 1.2099 LearningRate 0.000034 Epoch: 33 Global Step: 691680 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:48:02,891-Speed 2513.73 samples/sec Loss 1.1572 LearningRate 0.000034 Epoch: 33 Global Step: 691690 Fp16 Grad Scale: 8192 Required: 32 hours Training: 
2022-07-12 04:48:11,095-Speed 2496.70 samples/sec Loss 1.2024 LearningRate 0.000034 Epoch: 33 Global Step: 691700 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:48:19,299-Speed 2496.57 samples/sec Loss 1.1899 LearningRate 0.000034 Epoch: 33 Global Step: 691710 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:48:27,505-Speed 2496.28 samples/sec Loss 1.1821 LearningRate 0.000034 Epoch: 33 Global Step: 691720 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:48:35,720-Speed 2493.23 samples/sec Loss 1.1784 LearningRate 0.000034 Epoch: 33 Global Step: 691730 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:48:43,932-Speed 2494.24 samples/sec Loss 1.2095 LearningRate 0.000034 Epoch: 33 Global Step: 691740 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:48:52,081-Speed 2513.48 samples/sec Loss 1.2010 LearningRate 0.000034 Epoch: 33 Global Step: 691750 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:49:00,296-Speed 2493.55 samples/sec Loss 1.1842 LearningRate 0.000034 Epoch: 33 Global Step: 691760 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:49:08,499-Speed 2497.25 samples/sec Loss 1.2071 LearningRate 0.000034 Epoch: 33 Global Step: 691770 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:49:16,699-Speed 2497.90 samples/sec Loss 1.2072 LearningRate 0.000034 Epoch: 33 Global Step: 691780 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:49:24,903-Speed 2496.69 samples/sec Loss 1.1692 LearningRate 0.000034 Epoch: 33 Global Step: 691790 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:49:33,107-Speed 2496.85 samples/sec Loss 1.1951 LearningRate 0.000034 Epoch: 33 Global Step: 691800 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:49:41,256-Speed 2513.53 samples/sec Loss 1.2023 LearningRate 0.000034 Epoch: 33 Global Step: 691810 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:49:49,460-Speed 2496.69 samples/sec Loss 1.1756 LearningRate 0.000034 Epoch: 33 Global Step: 691820 Fp16 Grad Scale: 8192 Required: 32 hours Training: 
2022-07-12 04:49:57,662-Speed 2497.56 samples/sec Loss 1.1888 LearningRate 0.000034 Epoch: 33 Global Step: 691830 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:50:05,871-Speed 2495.82 samples/sec Loss 1.1784 LearningRate 0.000034 Epoch: 33 Global Step: 691840 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:50:14,088-Speed 2492.77 samples/sec Loss 1.1891 LearningRate 0.000034 Epoch: 33 Global Step: 691850 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:50:22,289-Speed 2497.49 samples/sec Loss 1.1767 LearningRate 0.000034 Epoch: 33 Global Step: 691860 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:50:30,441-Speed 2512.95 samples/sec Loss 1.1917 LearningRate 0.000034 Epoch: 33 Global Step: 691870 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:50:38,642-Speed 2497.78 samples/sec Loss 1.1942 LearningRate 0.000034 Epoch: 33 Global Step: 691880 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-07-12 04:50:46,855-Speed 2493.86 samples/sec Loss 1.2157 LearningRate 0.000034 Epoch: 33 Global Step: 691890 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:50:55,064-Speed 2495.14 samples/sec Loss 1.2141 LearningRate 0.000034 Epoch: 33 Global Step: 691900 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:51:03,268-Speed 2497.06 samples/sec Loss 1.1807 LearningRate 0.000034 Epoch: 33 Global Step: 691910 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:51:11,474-Speed 2496.15 samples/sec Loss 1.2111 LearningRate 0.000034 Epoch: 33 Global Step: 691920 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:51:19,625-Speed 2512.70 samples/sec Loss 1.1773 LearningRate 0.000034 Epoch: 33 Global Step: 691930 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:51:27,831-Speed 2496.23 samples/sec Loss 1.1716 LearningRate 0.000034 Epoch: 33 Global Step: 691940 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 
04:51:36,037-Speed 2496.82 samples/sec Loss 1.1755 LearningRate 0.000034 Epoch: 33 Global Step: 691950 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:51:44,241-Speed 2496.66 samples/sec Loss 1.2031 LearningRate 0.000034 Epoch: 33 Global Step: 691960 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:51:52,443-Speed 2497.36 samples/sec Loss 1.1998 LearningRate 0.000034 Epoch: 33 Global Step: 691970 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-07-12 04:52:00,651-Speed 2495.73 samples/sec Loss 1.1905 LearningRate 0.000034 Epoch: 33 Global Step: 691980 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:52:08,805-Speed 2512.04 samples/sec Loss 1.1958 LearningRate 0.000034 Epoch: 33 Global Step: 691990 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:52:17,011-Speed 2496.41 samples/sec Loss 1.2111 LearningRate 0.000034 Epoch: 33 Global Step: 692000 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:52:25,215-Speed 2496.38 samples/sec Loss 1.1902 LearningRate 0.000034 Epoch: 33 Global Step: 692010 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:52:33,420-Speed 2496.69 samples/sec Loss 1.1822 LearningRate 0.000034 Epoch: 33 Global Step: 692020 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:52:41,624-Speed 2496.50 samples/sec Loss 1.1970 LearningRate 0.000034 Epoch: 33 Global Step: 692030 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:52:49,828-Speed 2496.85 samples/sec Loss 1.1764 LearningRate 0.000034 Epoch: 33 Global Step: 692040 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:52:57,977-Speed 2513.65 samples/sec Loss 1.1816 LearningRate 0.000034 Epoch: 33 Global Step: 692050 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:53:06,183-Speed 2496.00 samples/sec Loss 1.1904 LearningRate 0.000034 Epoch: 33 Global Step: 692060 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
04:53:14,389-Speed 2496.00 samples/sec Loss 1.1914 LearningRate 0.000034 Epoch: 33 Global Step: 692070 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:53:22,597-Speed 2495.65 samples/sec Loss 1.1877 LearningRate 0.000034 Epoch: 33 Global Step: 692080 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:53:30,804-Speed 2496.04 samples/sec Loss 1.2137 LearningRate 0.000034 Epoch: 33 Global Step: 692090 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:53:39,010-Speed 2496.04 samples/sec Loss 1.2159 LearningRate 0.000034 Epoch: 33 Global Step: 692100 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:53:47,168-Speed 2510.89 samples/sec Loss 1.1988 LearningRate 0.000034 Epoch: 33 Global Step: 692110 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:53:55,373-Speed 2496.56 samples/sec Loss 1.1831 LearningRate 0.000034 Epoch: 33 Global Step: 692120 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:54:03,576-Speed 2497.19 samples/sec Loss 1.1983 LearningRate 0.000034 Epoch: 33 Global Step: 692130 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:54:11,780-Speed 2496.59 samples/sec Loss 1.2173 LearningRate 0.000034 Epoch: 33 Global Step: 692140 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:54:19,985-Speed 2496.22 samples/sec Loss 1.2040 LearningRate 0.000034 Epoch: 33 Global Step: 692150 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:54:28,189-Speed 2497.12 samples/sec Loss 1.1969 LearningRate 0.000034 Epoch: 33 Global Step: 692160 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:54:36,350-Speed 2509.71 samples/sec Loss 1.1822 LearningRate 0.000034 Epoch: 33 Global Step: 692170 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:54:44,556-Speed 2496.30 samples/sec Loss 1.1657 LearningRate 0.000034 Epoch: 33 Global Step: 692180 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
04:54:52,758-Speed 2497.29 samples/sec Loss 1.2261 LearningRate 0.000034 Epoch: 33 Global Step: 692190 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:55:00,964-Speed 2496.05 samples/sec Loss 1.1906 LearningRate 0.000034 Epoch: 33 Global Step: 692200 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:55:09,173-Speed 2495.12 samples/sec Loss 1.2144 LearningRate 0.000034 Epoch: 33 Global Step: 692210 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:55:17,377-Speed 2496.91 samples/sec Loss 1.1645 LearningRate 0.000034 Epoch: 33 Global Step: 692220 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:55:25,528-Speed 2513.13 samples/sec Loss 1.1882 LearningRate 0.000034 Epoch: 33 Global Step: 692230 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:55:33,735-Speed 2495.51 samples/sec Loss 1.1765 LearningRate 0.000034 Epoch: 33 Global Step: 692240 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:55:41,939-Speed 2496.79 samples/sec Loss 1.1715 LearningRate 0.000034 Epoch: 33 Global Step: 692250 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:55:50,139-Speed 2498.13 samples/sec Loss 1.1805 LearningRate 0.000034 Epoch: 33 Global Step: 692260 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:55:58,350-Speed 2494.58 samples/sec Loss 1.1778 LearningRate 0.000034 Epoch: 33 Global Step: 692270 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:56:06,556-Speed 2496.13 samples/sec Loss 1.1748 LearningRate 0.000034 Epoch: 33 Global Step: 692280 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:56:14,710-Speed 2512.16 samples/sec Loss 1.1597 LearningRate 0.000034 Epoch: 33 Global Step: 692290 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:56:22,913-Speed 2496.89 samples/sec Loss 1.1971 LearningRate 0.000034 Epoch: 33 Global Step: 692300 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
04:56:31,116-Speed 2497.21 samples/sec Loss 1.1728 LearningRate 0.000034 Epoch: 33 Global Step: 692310 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:56:39,321-Speed 2496.39 samples/sec Loss 1.1853 LearningRate 0.000034 Epoch: 33 Global Step: 692320 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:56:47,524-Speed 2496.87 samples/sec Loss 1.1744 LearningRate 0.000034 Epoch: 33 Global Step: 692330 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:56:55,728-Speed 2496.86 samples/sec Loss 1.1586 LearningRate 0.000034 Epoch: 33 Global Step: 692340 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:57:03,878-Speed 2513.56 samples/sec Loss 1.1896 LearningRate 0.000034 Epoch: 33 Global Step: 692350 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:57:12,086-Speed 2495.20 samples/sec Loss 1.1854 LearningRate 0.000034 Epoch: 33 Global Step: 692360 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:57:20,304-Speed 2492.70 samples/sec Loss 1.1869 LearningRate 0.000034 Epoch: 33 Global Step: 692370 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:57:28,522-Speed 2492.31 samples/sec Loss 1.1842 LearningRate 0.000034 Epoch: 33 Global Step: 692380 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:57:36,725-Speed 2497.04 samples/sec Loss 1.1785 LearningRate 0.000034 Epoch: 33 Global Step: 692390 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:57:44,927-Speed 2497.54 samples/sec Loss 1.1532 LearningRate 0.000034 Epoch: 33 Global Step: 692400 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:57:53,079-Speed 2512.46 samples/sec Loss 1.2294 LearningRate 0.000034 Epoch: 33 Global Step: 692410 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:58:01,283-Speed 2496.75 samples/sec Loss 1.1974 LearningRate 0.000034 Epoch: 33 Global Step: 692420 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
04:58:09,487-Speed 2496.78 samples/sec Loss 1.1921 LearningRate 0.000034 Epoch: 33 Global Step: 692430 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:58:17,690-Speed 2496.90 samples/sec Loss 1.1827 LearningRate 0.000034 Epoch: 33 Global Step: 692440 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:58:25,893-Speed 2497.18 samples/sec Loss 1.2185 LearningRate 0.000034 Epoch: 33 Global Step: 692450 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:58:34,097-Speed 2496.63 samples/sec Loss 1.1851 LearningRate 0.000034 Epoch: 33 Global Step: 692460 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:58:42,250-Speed 2512.57 samples/sec Loss 1.1875 LearningRate 0.000034 Epoch: 33 Global Step: 692470 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:58:50,454-Speed 2496.60 samples/sec Loss 1.1847 LearningRate 0.000034 Epoch: 33 Global Step: 692480 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:58:58,666-Speed 2494.24 samples/sec Loss 1.1932 LearningRate 0.000034 Epoch: 33 Global Step: 692490 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:59:06,867-Speed 2497.58 samples/sec Loss 1.1862 LearningRate 0.000034 Epoch: 33 Global Step: 692500 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:59:15,072-Speed 2496.65 samples/sec Loss 1.2099 LearningRate 0.000034 Epoch: 33 Global Step: 692510 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:59:23,282-Speed 2495.00 samples/sec Loss 1.1810 LearningRate 0.000034 Epoch: 33 Global Step: 692520 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:59:31,436-Speed 2512.04 samples/sec Loss 1.1729 LearningRate 0.000034 Epoch: 33 Global Step: 692530 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:59:39,643-Speed 2495.72 samples/sec Loss 1.2228 LearningRate 0.000034 Epoch: 33 Global Step: 692540 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
04:59:47,851-Speed 2495.56 samples/sec Loss 1.2058 LearningRate 0.000034 Epoch: 33 Global Step: 692550 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 04:59:56,053-Speed 2497.17 samples/sec Loss 1.2042 LearningRate 0.000034 Epoch: 33 Global Step: 692560 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:00:04,260-Speed 2495.78 samples/sec Loss 1.1667 LearningRate 0.000034 Epoch: 33 Global Step: 692570 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:00:12,467-Speed 2495.70 samples/sec Loss 1.1923 LearningRate 0.000034 Epoch: 33 Global Step: 692580 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:00:20,620-Speed 2512.45 samples/sec Loss 1.1922 LearningRate 0.000034 Epoch: 33 Global Step: 692590 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:00:28,829-Speed 2495.25 samples/sec Loss 1.1713 LearningRate 0.000034 Epoch: 33 Global Step: 692600 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:00:37,033-Speed 2496.88 samples/sec Loss 1.2159 LearningRate 0.000034 Epoch: 33 Global Step: 692610 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:00:45,193-Speed 2510.33 samples/sec Loss 1.2033 LearningRate 0.000034 Epoch: 33 Global Step: 692620 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:00:53,400-Speed 2495.78 samples/sec Loss 1.1816 LearningRate 0.000034 Epoch: 33 Global Step: 692630 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:01:01,608-Speed 2495.46 samples/sec Loss 1.1963 LearningRate 0.000034 Epoch: 33 Global Step: 692640 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:01:09,761-Speed 2512.18 samples/sec Loss 1.2080 LearningRate 0.000034 Epoch: 33 Global Step: 692650 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:01:17,967-Speed 2496.13 samples/sec Loss 1.1803 LearningRate 0.000034 Epoch: 33 Global Step: 692660 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 
05:01:26,171-Speed 2496.72 samples/sec Loss 1.1856 LearningRate 0.000034 Epoch: 33 Global Step: 692670 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:01:34,374-Speed 2497.07 samples/sec Loss 1.1920 LearningRate 0.000034 Epoch: 33 Global Step: 692680 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:01:42,576-Speed 2497.41 samples/sec Loss 1.2078 LearningRate 0.000034 Epoch: 33 Global Step: 692690 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:01:50,783-Speed 2495.88 samples/sec Loss 1.2060 LearningRate 0.000034 Epoch: 33 Global Step: 692700 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:01:58,947-Speed 2508.91 samples/sec Loss 1.2247 LearningRate 0.000034 Epoch: 33 Global Step: 692710 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:02:07,148-Speed 2497.31 samples/sec Loss 1.1841 LearningRate 0.000034 Epoch: 33 Global Step: 692720 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:02:15,352-Speed 2496.80 samples/sec Loss 1.2089 LearningRate 0.000034 Epoch: 33 Global Step: 692730 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:02:23,556-Speed 2497.12 samples/sec Loss 1.1563 LearningRate 0.000034 Epoch: 33 Global Step: 692740 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:02:31,771-Speed 2492.99 samples/sec Loss 1.1761 LearningRate 0.000034 Epoch: 33 Global Step: 692750 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:02:39,977-Speed 2496.51 samples/sec Loss 1.1957 LearningRate 0.000034 Epoch: 33 Global Step: 692760 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:02:48,129-Speed 2512.66 samples/sec Loss 1.1836 LearningRate 0.000034 Epoch: 33 Global Step: 692770 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:02:56,335-Speed 2496.18 samples/sec Loss 1.1740 LearningRate 0.000034 Epoch: 33 Global Step: 692780 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:03:04,545-Speed 
2495.08 samples/sec Loss 1.1960 LearningRate 0.000034 Epoch: 33 Global Step: 692790 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:03:12,750-Speed 2496.56 samples/sec Loss 1.2053 LearningRate 0.000034 Epoch: 33 Global Step: 692800 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:03:20,953-Speed 2496.72 samples/sec Loss 1.1753 LearningRate 0.000034 Epoch: 33 Global Step: 692810 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:03:29,156-Speed 2497.15 samples/sec Loss 1.1733 LearningRate 0.000034 Epoch: 33 Global Step: 692820 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:03:37,310-Speed 2511.93 samples/sec Loss 1.1818 LearningRate 0.000034 Epoch: 33 Global Step: 692830 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:03:45,524-Speed 2493.81 samples/sec Loss 1.1840 LearningRate 0.000034 Epoch: 33 Global Step: 692840 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:03:53,730-Speed 2496.16 samples/sec Loss 1.1991 LearningRate 0.000034 Epoch: 33 Global Step: 692850 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:04:01,934-Speed 2496.81 samples/sec Loss 1.1494 LearningRate 0.000034 Epoch: 33 Global Step: 692860 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:04:10,138-Speed 2496.81 samples/sec Loss 1.2261 LearningRate 0.000034 Epoch: 33 Global Step: 692870 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:04:18,340-Speed 2497.24 samples/sec Loss 1.1778 LearningRate 0.000034 Epoch: 33 Global Step: 692880 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:04:26,496-Speed 2511.28 samples/sec Loss 1.1952 LearningRate 0.000034 Epoch: 33 Global Step: 692890 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:04:34,702-Speed 2496.37 samples/sec Loss 1.1753 LearningRate 0.000034 Epoch: 33 Global Step: 692900 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:04:42,907-Speed 2496.75 samples/sec 
Loss 1.2032 LearningRate 0.000034 Epoch: 33 Global Step: 692910 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:04:51,109-Speed 2497.63 samples/sec Loss 1.1923 LearningRate 0.000033 Epoch: 33 Global Step: 692920 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:04:59,313-Speed 2496.54 samples/sec Loss 1.1636 LearningRate 0.000033 Epoch: 33 Global Step: 692930 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:05:07,516-Speed 2496.86 samples/sec Loss 1.2180 LearningRate 0.000033 Epoch: 33 Global Step: 692940 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:05:15,665-Speed 2513.69 samples/sec Loss 1.2278 LearningRate 0.000033 Epoch: 33 Global Step: 692950 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:05:23,873-Speed 2495.61 samples/sec Loss 1.2160 LearningRate 0.000033 Epoch: 33 Global Step: 692960 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:05:32,084-Speed 2494.67 samples/sec Loss 1.2333 LearningRate 0.000033 Epoch: 33 Global Step: 692970 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:05:40,288-Speed 2496.58 samples/sec Loss 1.2035 LearningRate 0.000033 Epoch: 33 Global Step: 692980 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:05:48,492-Speed 2496.73 samples/sec Loss 1.2202 LearningRate 0.000033 Epoch: 33 Global Step: 692990 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:05:56,699-Speed 2495.97 samples/sec Loss 1.1948 LearningRate 0.000033 Epoch: 33 Global Step: 693000 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:06:04,854-Speed 2511.75 samples/sec Loss 1.1903 LearningRate 0.000033 Epoch: 33 Global Step: 693010 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:06:13,067-Speed 2494.20 samples/sec Loss 1.1882 LearningRate 0.000033 Epoch: 33 Global Step: 693020 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:06:21,267-Speed 2497.86 samples/sec Loss 1.1726 
LearningRate 0.000033 Epoch: 33 Global Step: 693030 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:06:29,472-Speed 2496.69 samples/sec Loss 1.2084 LearningRate 0.000033 Epoch: 33 Global Step: 693040 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:06:37,676-Speed 2496.39 samples/sec Loss 1.1598 LearningRate 0.000033 Epoch: 33 Global Step: 693050 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:06:45,885-Speed 2495.41 samples/sec Loss 1.2142 LearningRate 0.000033 Epoch: 33 Global Step: 693060 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:06:54,034-Speed 2513.50 samples/sec Loss 1.1895 LearningRate 0.000033 Epoch: 33 Global Step: 693070 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:07:02,244-Speed 2495.08 samples/sec Loss 1.2076 LearningRate 0.000033 Epoch: 33 Global Step: 693080 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:07:10,449-Speed 2496.28 samples/sec Loss 1.1597 LearningRate 0.000033 Epoch: 33 Global Step: 693090 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:07:18,652-Speed 2497.15 samples/sec Loss 1.1594 LearningRate 0.000033 Epoch: 33 Global Step: 693100 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:07:26,860-Speed 2495.63 samples/sec Loss 1.1797 LearningRate 0.000033 Epoch: 33 Global Step: 693110 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:07:35,077-Speed 2492.60 samples/sec Loss 1.2029 LearningRate 0.000033 Epoch: 33 Global Step: 693120 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:07:43,240-Speed 2509.37 samples/sec Loss 1.1519 LearningRate 0.000033 Epoch: 33 Global Step: 693130 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:07:51,447-Speed 2495.83 samples/sec Loss 1.2106 LearningRate 0.000033 Epoch: 33 Global Step: 693140 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:07:59,652-Speed 2496.33 samples/sec Loss 1.1990 LearningRate 
0.000033 Epoch: 33 Global Step: 693150 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:08:07,853-Speed 2497.80 samples/sec Loss 1.1732 LearningRate 0.000033 Epoch: 33 Global Step: 693160 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:08:16,057-Speed 2497.03 samples/sec Loss 1.1821 LearningRate 0.000033 Epoch: 33 Global Step: 693170 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:08:24,260-Speed 2496.93 samples/sec Loss 1.2344 LearningRate 0.000033 Epoch: 33 Global Step: 693180 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:08:32,409-Speed 2513.57 samples/sec Loss 1.1894 LearningRate 0.000033 Epoch: 33 Global Step: 693190 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:08:40,612-Speed 2497.04 samples/sec Loss 1.1978 LearningRate 0.000033 Epoch: 33 Global Step: 693200 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:08:48,817-Speed 2496.41 samples/sec Loss 1.2002 LearningRate 0.000033 Epoch: 33 Global Step: 693210 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:08:57,024-Speed 2495.97 samples/sec Loss 1.1769 LearningRate 0.000033 Epoch: 33 Global Step: 693220 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:09:05,228-Speed 2497.07 samples/sec Loss 1.1969 LearningRate 0.000033 Epoch: 33 Global Step: 693230 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:09:13,431-Speed 2496.84 samples/sec Loss 1.1808 LearningRate 0.000033 Epoch: 33 Global Step: 693240 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:09:21,582-Speed 2512.91 samples/sec Loss 1.1697 LearningRate 0.000033 Epoch: 33 Global Step: 693250 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:09:29,787-Speed 2496.72 samples/sec Loss 1.2149 LearningRate 0.000033 Epoch: 33 Global Step: 693260 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:09:37,988-Speed 2497.82 samples/sec Loss 1.1772 LearningRate 0.000033 Epoch: 33 
Global Step: 693270 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:09:46,194-Speed 2496.20 samples/sec Loss 1.1922 LearningRate 0.000033 Epoch: 33 Global Step: 693280 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:09:54,398-Speed 2496.56 samples/sec Loss 1.1638 LearningRate 0.000033 Epoch: 33 Global Step: 693290 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:10:02,605-Speed 2496.10 samples/sec Loss 1.2120 LearningRate 0.000033 Epoch: 33 Global Step: 693300 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:10:10,768-Speed 2509.03 samples/sec Loss 1.2077 LearningRate 0.000033 Epoch: 33 Global Step: 693310 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:10:18,973-Speed 2496.46 samples/sec Loss 1.1664 LearningRate 0.000033 Epoch: 33 Global Step: 693320 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:10:27,181-Speed 2495.60 samples/sec Loss 1.1739 LearningRate 0.000033 Epoch: 33 Global Step: 693330 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:10:35,386-Speed 2496.48 samples/sec Loss 1.1617 LearningRate 0.000033 Epoch: 33 Global Step: 693340 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:10:43,593-Speed 2496.01 samples/sec Loss 1.2133 LearningRate 0.000033 Epoch: 33 Global Step: 693350 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:10:51,798-Speed 2496.34 samples/sec Loss 1.1911 LearningRate 0.000033 Epoch: 33 Global Step: 693360 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:10:59,954-Speed 2511.53 samples/sec Loss 1.1935 LearningRate 0.000033 Epoch: 33 Global Step: 693370 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:11:08,157-Speed 2496.86 samples/sec Loss 1.2141 LearningRate 0.000033 Epoch: 33 Global Step: 693380 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:11:16,361-Speed 2496.78 samples/sec Loss 1.1611 LearningRate 0.000033 Epoch: 33 Global Step: 693390 
Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:11:24,565-Speed 2496.56 samples/sec Loss 1.2161 LearningRate 0.000033 Epoch: 33 Global Step: 693400 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:11:32,769-Speed 2496.81 samples/sec Loss 1.1698 LearningRate 0.000033 Epoch: 33 Global Step: 693410 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:11:40,976-Speed 2496.17 samples/sec Loss 1.1825 LearningRate 0.000033 Epoch: 33 Global Step: 693420 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:11:49,127-Speed 2512.92 samples/sec Loss 1.1586 LearningRate 0.000033 Epoch: 33 Global Step: 693430 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:11:57,330-Speed 2496.93 samples/sec Loss 1.2164 LearningRate 0.000033 Epoch: 33 Global Step: 693440 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:12:05,538-Speed 2495.75 samples/sec Loss 1.1825 LearningRate 0.000033 Epoch: 33 Global Step: 693450 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:12:13,753-Speed 2493.27 samples/sec Loss 1.1525 LearningRate 0.000033 Epoch: 33 Global Step: 693460 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:12:21,956-Speed 2497.02 samples/sec Loss 1.1943 LearningRate 0.000033 Epoch: 33 Global Step: 693470 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:12:30,158-Speed 2497.48 samples/sec Loss 1.1759 LearningRate 0.000033 Epoch: 33 Global Step: 693480 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:12:38,308-Speed 2513.32 samples/sec Loss 1.1795 LearningRate 0.000033 Epoch: 33 Global Step: 693490 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:12:46,514-Speed 2496.10 samples/sec Loss 1.1852 LearningRate 0.000033 Epoch: 33 Global Step: 693500 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:12:54,718-Speed 2496.69 samples/sec Loss 1.1861 LearningRate 0.000033 Epoch: 33 Global Step: 693510 Fp16 Grad Scale: 
8192 Required: 31 hours Training: 2022-07-12 05:13:02,934-Speed 2493.11 samples/sec Loss 1.1778 LearningRate 0.000033 Epoch: 33 Global Step: 693520 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:13:11,150-Speed 2493.21 samples/sec Loss 1.1836 LearningRate 0.000033 Epoch: 33 Global Step: 693530 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:13:19,355-Speed 2496.53 samples/sec Loss 1.2091 LearningRate 0.000033 Epoch: 33 Global Step: 693540 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:13:27,509-Speed 2512.10 samples/sec Loss 1.1647 LearningRate 0.000033 Epoch: 33 Global Step: 693550 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:13:35,713-Speed 2497.36 samples/sec Loss 1.1770 LearningRate 0.000033 Epoch: 33 Global Step: 693560 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:13:43,915-Speed 2497.23 samples/sec Loss 1.2170 LearningRate 0.000033 Epoch: 33 Global Step: 693570 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:13:52,119-Speed 2496.63 samples/sec Loss 1.1709 LearningRate 0.000033 Epoch: 33 Global Step: 693580 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:14:00,323-Speed 2497.03 samples/sec Loss 1.1734 LearningRate 0.000033 Epoch: 33 Global Step: 693590 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:14:08,527-Speed 2496.72 samples/sec Loss 1.2121 LearningRate 0.000033 Epoch: 33 Global Step: 693600 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:14:16,679-Speed 2512.67 samples/sec Loss 1.2143 LearningRate 0.000033 Epoch: 33 Global Step: 693610 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:14:24,884-Speed 2496.13 samples/sec Loss 1.1893 LearningRate 0.000033 Epoch: 33 Global Step: 693620 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:14:33,092-Speed 2495.57 samples/sec Loss 1.1400 LearningRate 0.000033 Epoch: 33 Global Step: 693630 Fp16 Grad Scale: 8192 Required: 31 
hours Training: 2022-07-12 05:14:41,298-Speed 2496.24 samples/sec Loss 1.1606 LearningRate 0.000033 Epoch: 33 Global Step: 693640 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:14:49,503-Speed 2496.34 samples/sec Loss 1.1829 LearningRate 0.000033 Epoch: 33 Global Step: 693650 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:14:57,711-Speed 2495.63 samples/sec Loss 1.1591 LearningRate 0.000033 Epoch: 33 Global Step: 693660 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:15:05,865-Speed 2512.18 samples/sec Loss 1.1894 LearningRate 0.000033 Epoch: 33 Global Step: 693670 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:15:14,070-Speed 2496.16 samples/sec Loss 1.2160 LearningRate 0.000033 Epoch: 33 Global Step: 693680 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:15:22,277-Speed 2496.12 samples/sec Loss 1.1839 LearningRate 0.000033 Epoch: 33 Global Step: 693690 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:15:30,483-Speed 2496.16 samples/sec Loss 1.1997 LearningRate 0.000033 Epoch: 33 Global Step: 693700 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:15:38,689-Speed 2496.04 samples/sec Loss 1.1850 LearningRate 0.000033 Epoch: 33 Global Step: 693710 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:15:46,893-Speed 2496.94 samples/sec Loss 1.1775 LearningRate 0.000033 Epoch: 33 Global Step: 693720 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:15:55,052-Speed 2510.44 samples/sec Loss 1.1901 LearningRate 0.000033 Epoch: 33 Global Step: 693730 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:16:03,254-Speed 2497.27 samples/sec Loss 1.1690 LearningRate 0.000033 Epoch: 33 Global Step: 693740 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:16:11,456-Speed 2497.26 samples/sec Loss 1.1721 LearningRate 0.000033 Epoch: 33 Global Step: 693750 Fp16 Grad Scale: 8192 Required: 31 hours Training: 
2022-07-12 05:16:19,660-Speed 2496.96 samples/sec Loss 1.1851 LearningRate 0.000033 Epoch: 33 Global Step: 693760 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:16:27,864-Speed 2496.84 samples/sec Loss 1.1674 LearningRate 0.000033 Epoch: 33 Global Step: 693770 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:16:36,082-Speed 2492.55 samples/sec Loss 1.1930 LearningRate 0.000033 Epoch: 33 Global Step: 693780 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:16:44,233-Speed 2512.93 samples/sec Loss 1.1957 LearningRate 0.000033 Epoch: 33 Global Step: 693790 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:16:52,466-Speed 2487.94 samples/sec Loss 1.1759 LearningRate 0.000033 Epoch: 33 Global Step: 693800 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:17:00,672-Speed 2496.09 samples/sec Loss 1.1992 LearningRate 0.000033 Epoch: 33 Global Step: 693810 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:17:08,879-Speed 2495.99 samples/sec Loss 1.1812 LearningRate 0.000033 Epoch: 33 Global Step: 693820 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:17:17,082-Speed 2496.86 samples/sec Loss 1.1951 LearningRate 0.000033 Epoch: 33 Global Step: 693830 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:17:25,289-Speed 2495.93 samples/sec Loss 1.1674 LearningRate 0.000033 Epoch: 33 Global Step: 693840 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:17:33,442-Speed 2512.44 samples/sec Loss 1.1382 LearningRate 0.000033 Epoch: 33 Global Step: 693850 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:17:41,657-Speed 2493.25 samples/sec Loss 1.1719 LearningRate 0.000033 Epoch: 33 Global Step: 693860 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:17:49,862-Speed 2496.50 samples/sec Loss 1.1800 LearningRate 0.000033 Epoch: 33 Global Step: 693870 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
05:17:58,065-Speed 2497.02 samples/sec Loss 1.2320 LearningRate 0.000033 Epoch: 33 Global Step: 693880 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:18:06,279-Speed 2493.40 samples/sec Loss 1.2081 LearningRate 0.000033 Epoch: 33 Global Step: 693890 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:18:14,491-Speed 2494.49 samples/sec Loss 1.1866 LearningRate 0.000033 Epoch: 33 Global Step: 693900 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:18:22,643-Speed 2512.47 samples/sec Loss 1.1822 LearningRate 0.000033 Epoch: 33 Global Step: 693910 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:18:30,849-Speed 2496.01 samples/sec Loss 1.1802 LearningRate 0.000033 Epoch: 33 Global Step: 693920 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:18:39,058-Speed 2495.42 samples/sec Loss 1.2034 LearningRate 0.000033 Epoch: 33 Global Step: 693930 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:18:47,269-Speed 2494.73 samples/sec Loss 1.1906 LearningRate 0.000033 Epoch: 33 Global Step: 693940 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:18:55,480-Speed 2494.47 samples/sec Loss 1.1918 LearningRate 0.000033 Epoch: 33 Global Step: 693950 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:19:03,686-Speed 2496.19 samples/sec Loss 1.1778 LearningRate 0.000033 Epoch: 33 Global Step: 693960 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:19:11,839-Speed 2512.32 samples/sec Loss 1.2263 LearningRate 0.000033 Epoch: 33 Global Step: 693970 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:19:20,042-Speed 2496.88 samples/sec Loss 1.1768 LearningRate 0.000033 Epoch: 33 Global Step: 693980 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:19:28,260-Speed 2492.67 samples/sec Loss 1.1953 LearningRate 0.000033 Epoch: 33 Global Step: 693990 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
05:19:36,464-Speed 2496.62 samples/sec Loss 1.2020 LearningRate 0.000033 Epoch: 33 Global Step: 694000 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:19:44,668-Speed 2496.74 samples/sec Loss 1.2072 LearningRate 0.000033 Epoch: 33 Global Step: 694010 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:19:52,878-Speed 2494.88 samples/sec Loss 1.1684 LearningRate 0.000033 Epoch: 33 Global Step: 694020 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:20:01,029-Speed 2512.82 samples/sec Loss 1.2161 LearningRate 0.000033 Epoch: 33 Global Step: 694030 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:20:09,233-Speed 2496.86 samples/sec Loss 1.1752 LearningRate 0.000033 Epoch: 33 Global Step: 694040 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:20:17,437-Speed 2496.71 samples/sec Loss 1.1952 LearningRate 0.000033 Epoch: 33 Global Step: 694050 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:20:25,643-Speed 2496.06 samples/sec Loss 1.1904 LearningRate 0.000033 Epoch: 33 Global Step: 694060 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:20:33,847-Speed 2496.48 samples/sec Loss 1.2026 LearningRate 0.000033 Epoch: 33 Global Step: 694070 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:20:42,055-Speed 2495.75 samples/sec Loss 1.1580 LearningRate 0.000033 Epoch: 33 Global Step: 694080 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:20:50,209-Speed 2512.04 samples/sec Loss 1.1629 LearningRate 0.000033 Epoch: 33 Global Step: 694090 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:20:58,442-Speed 2487.84 samples/sec Loss 1.2100 LearningRate 0.000033 Epoch: 33 Global Step: 694100 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:21:06,645-Speed 2497.16 samples/sec Loss 1.2038 LearningRate 0.000033 Epoch: 33 Global Step: 694110 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
05:21:14,865-Speed 2491.94 samples/sec Loss 1.2230 LearningRate 0.000033 Epoch: 33 Global Step: 694120 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:21:23,077-Speed 2494.73 samples/sec Loss 1.1900 LearningRate 0.000033 Epoch: 33 Global Step: 694130 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:21:31,282-Speed 2496.34 samples/sec Loss 1.1881 LearningRate 0.000033 Epoch: 33 Global Step: 694140 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:21:39,434-Speed 2512.47 samples/sec Loss 1.1530 LearningRate 0.000033 Epoch: 33 Global Step: 694150 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:21:47,643-Speed 2495.19 samples/sec Loss 1.1952 LearningRate 0.000033 Epoch: 33 Global Step: 694160 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:21:55,849-Speed 2496.31 samples/sec Loss 1.1649 LearningRate 0.000033 Epoch: 33 Global Step: 694170 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:22:04,051-Speed 2497.16 samples/sec Loss 1.1958 LearningRate 0.000033 Epoch: 33 Global Step: 694180 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:22:12,262-Speed 2494.73 samples/sec Loss 1.1895 LearningRate 0.000033 Epoch: 33 Global Step: 694190 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:22:20,468-Speed 2495.92 samples/sec Loss 1.1918 LearningRate 0.000033 Epoch: 33 Global Step: 694200 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:22:28,619-Speed 2513.42 samples/sec Loss 1.2228 LearningRate 0.000033 Epoch: 33 Global Step: 694210 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:22:36,823-Speed 2496.65 samples/sec Loss 1.1800 LearningRate 0.000033 Epoch: 33 Global Step: 694220 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:22:45,026-Speed 2497.13 samples/sec Loss 1.2027 LearningRate 0.000033 Epoch: 33 Global Step: 694230 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
05:22:53,231-Speed 2496.49 samples/sec Loss 1.1967 LearningRate 0.000033 Epoch: 33 Global Step: 694240 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:23:01,445-Speed 2493.83 samples/sec Loss 1.2102 LearningRate 0.000033 Epoch: 33 Global Step: 694250 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:23:09,656-Speed 2494.63 samples/sec Loss 1.1738 LearningRate 0.000033 Epoch: 33 Global Step: 694260 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:23:17,817-Speed 2509.76 samples/sec Loss 1.2172 LearningRate 0.000033 Epoch: 33 Global Step: 694270 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:23:26,025-Speed 2495.55 samples/sec Loss 1.2155 LearningRate 0.000033 Epoch: 33 Global Step: 694280 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:23:34,227-Speed 2497.21 samples/sec Loss 1.1884 LearningRate 0.000033 Epoch: 33 Global Step: 694290 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:23:42,433-Speed 2496.01 samples/sec Loss 1.1529 LearningRate 0.000033 Epoch: 33 Global Step: 694300 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:23:50,653-Speed 2491.87 samples/sec Loss 1.1875 LearningRate 0.000033 Epoch: 33 Global Step: 694310 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:23:58,858-Speed 2496.48 samples/sec Loss 1.1890 LearningRate 0.000033 Epoch: 33 Global Step: 694320 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:24:07,025-Speed 2508.18 samples/sec Loss 1.1657 LearningRate 0.000033 Epoch: 33 Global Step: 694330 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:24:15,229-Speed 2496.78 samples/sec Loss 1.2095 LearningRate 0.000033 Epoch: 33 Global Step: 694340 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:24:23,434-Speed 2496.49 samples/sec Loss 1.1722 LearningRate 0.000033 Epoch: 33 Global Step: 694350 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
05:24:31,639-Speed 2496.49 samples/sec Loss 1.1944 LearningRate 0.000033 Epoch: 33 Global Step: 694360 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:24:39,844-Speed 2496.33 samples/sec Loss 1.2024 LearningRate 0.000033 Epoch: 33 Global Step: 694370 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:24:48,049-Speed 2496.33 samples/sec Loss 1.1864 LearningRate 0.000033 Epoch: 33 Global Step: 694380 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:24:56,200-Speed 2513.46 samples/sec Loss 1.1832 LearningRate 0.000033 Epoch: 33 Global Step: 694390 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:25:04,404-Speed 2496.68 samples/sec Loss 1.1857 LearningRate 0.000033 Epoch: 33 Global Step: 694400 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:25:12,615-Speed 2494.61 samples/sec Loss 1.1735 LearningRate 0.000033 Epoch: 33 Global Step: 694410 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:25:20,820-Speed 2496.16 samples/sec Loss 1.1827 LearningRate 0.000033 Epoch: 33 Global Step: 694420 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:25:29,030-Speed 2495.04 samples/sec Loss 1.2031 LearningRate 0.000033 Epoch: 33 Global Step: 694430 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:25:37,237-Speed 2495.89 samples/sec Loss 1.2304 LearningRate 0.000033 Epoch: 33 Global Step: 694440 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:25:45,389-Speed 2512.73 samples/sec Loss 1.1810 LearningRate 0.000033 Epoch: 33 Global Step: 694450 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:25:53,597-Speed 2495.66 samples/sec Loss 1.2043 LearningRate 0.000033 Epoch: 33 Global Step: 694460 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:26:01,828-Speed 2497.84 samples/sec Loss 1.2026 LearningRate 0.000033 Epoch: 33 Global Step: 694470 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
05:26:10,098-Speed 2498.17 samples/sec Loss 1.1996 LearningRate 0.000033 Epoch: 33 Global Step: 694480 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:26:18,303-Speed 2496.27 samples/sec Loss 1.2058 LearningRate 0.000033 Epoch: 33 Global Step: 694490 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:26:27,449-Speed 2496.51 samples/sec Loss 1.2078 LearningRate 0.000033 Epoch: 33 Global Step: 694500 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:26:40,712-Speed 2386.06 samples/sec Loss 1.1991 LearningRate 0.000033 Epoch: 33 Global Step: 694510 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:26:48,937-Speed 2498.56 samples/sec Loss 1.1869 LearningRate 0.000033 Epoch: 33 Global Step: 694520 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:26:57,148-Speed 2494.52 samples/sec Loss 1.2011 LearningRate 0.000033 Epoch: 33 Global Step: 694530 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:27:10,450-Speed 1539.82 samples/sec Loss 1.1805 LearningRate 0.000033 Epoch: 33 Global Step: 694540 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:27:18,667-Speed 2494.00 samples/sec Loss 1.2098 LearningRate 0.000033 Epoch: 33 Global Step: 694550 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:27:26,932-Speed 2491.47 samples/sec Loss 1.1701 LearningRate 0.000033 Epoch: 33 Global Step: 694560 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:27:35,112-Speed 2503.89 samples/sec Loss 1.1653 LearningRate 0.000033 Epoch: 33 Global Step: 694570 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:27:49,209-Speed 2489.04 samples/sec Loss 1.1864 LearningRate 0.000033 Epoch: 33 Global Step: 694580 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:27:57,463-Speed 2490.78 samples/sec Loss 1.1687 LearningRate 0.000033 Epoch: 33 Global Step: 694590 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
05:28:05,706-Speed 2484.88 samples/sec Loss 1.2194 LearningRate 0.000033 Epoch: 33 Global Step: 694600 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:28:19,046-Speed 1538.22 samples/sec Loss 1.1708 LearningRate 0.000033 Epoch: 33 Global Step: 694610 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:28:27,300-Speed 2492.83 samples/sec Loss 1.1726 LearningRate 0.000033 Epoch: 33 Global Step: 694620 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:28:35,890-Speed 2384.46 samples/sec Loss 1.1713 LearningRate 0.000033 Epoch: 33 Global Step: 694630 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:28:48,259-Speed 1707.35 samples/sec Loss 1.2104 LearningRate 0.000033 Epoch: 33 Global Step: 694640 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:28:57,835-Speed 2490.52 samples/sec Loss 1.1942 LearningRate 0.000033 Epoch: 33 Global Step: 694650 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:29:06,089-Speed 2492.60 samples/sec Loss 1.1986 LearningRate 0.000033 Epoch: 33 Global Step: 694660 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:29:19,039-Speed 1581.57 samples/sec Loss 1.1958 LearningRate 0.000033 Epoch: 33 Global Step: 694670 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:29:27,291-Speed 2492.61 samples/sec Loss 1.1580 LearningRate 0.000033 Epoch: 33 Global Step: 694680 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:29:35,513-Speed 2507.90 samples/sec Loss 1.1931 LearningRate 0.000033 Epoch: 33 Global Step: 694690 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:29:47,625-Speed 1691.00 samples/sec Loss 1.1530 LearningRate 0.000033 Epoch: 33 Global Step: 694700 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:29:56,478-Speed 2471.73 samples/sec Loss 1.1709 LearningRate 0.000033 Epoch: 33 Global Step: 694710 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
05:30:06,019-Speed 2490.11 samples/sec Loss 1.1844 LearningRate 0.000033 Epoch: 33 Global Step: 694720 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:30:15,811-Speed 2493.69 samples/sec Loss 1.1930 LearningRate 0.000033 Epoch: 33 Global Step: 694730 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:30:24,037-Speed 2490.16 samples/sec Loss 1.2140 LearningRate 0.000033 Epoch: 33 Global Step: 694740 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:30:32,205-Speed 2507.67 samples/sec Loss 1.1777 LearningRate 0.000033 Epoch: 33 Global Step: 694750 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:30:40,426-Speed 2491.43 samples/sec Loss 1.1959 LearningRate 0.000033 Epoch: 33 Global Step: 694760 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:30:48,646-Speed 2491.88 samples/sec Loss 1.1636 LearningRate 0.000033 Epoch: 33 Global Step: 694770 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:30:56,882-Speed 2487.18 samples/sec Loss 1.1758 LearningRate 0.000033 Epoch: 33 Global Step: 694780 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:31:05,102-Speed 2491.83 samples/sec Loss 1.1943 LearningRate 0.000033 Epoch: 33 Global Step: 694790 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:31:13,326-Speed 2490.58 samples/sec Loss 1.1780 LearningRate 0.000033 Epoch: 33 Global Step: 694800 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:31:21,512-Speed 2502.19 samples/sec Loss 1.1870 LearningRate 0.000033 Epoch: 33 Global Step: 694810 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:31:29,733-Speed 2491.55 samples/sec Loss 1.1753 LearningRate 0.000033 Epoch: 33 Global Step: 694820 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:31:37,948-Speed 2493.38 samples/sec Loss 1.1628 LearningRate 0.000033 Epoch: 33 Global Step: 694830 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
05:31:46,162-Speed 2493.68 samples/sec Loss 1.1750 LearningRate 0.000033 Epoch: 33 Global Step: 694840 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:31:54,377-Speed 2493.44 samples/sec Loss 1.1707 LearningRate 0.000033 Epoch: 33 Global Step: 694850 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:32:02,587-Speed 2494.90 samples/sec Loss 1.1871 LearningRate 0.000033 Epoch: 33 Global Step: 694860 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:32:10,743-Speed 2511.72 samples/sec Loss 1.2267 LearningRate 0.000033 Epoch: 33 Global Step: 694870 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:32:18,952-Speed 2495.29 samples/sec Loss 1.1792 LearningRate 0.000033 Epoch: 33 Global Step: 694880 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:32:27,162-Speed 2494.69 samples/sec Loss 1.2162 LearningRate 0.000033 Epoch: 33 Global Step: 694890 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:32:35,376-Speed 2494.03 samples/sec Loss 1.1857 LearningRate 0.000033 Epoch: 33 Global Step: 694900 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:32:43,587-Speed 2494.76 samples/sec Loss 1.2138 LearningRate 0.000033 Epoch: 33 Global Step: 694910 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:32:51,798-Speed 2494.54 samples/sec Loss 1.2054 LearningRate 0.000033 Epoch: 33 Global Step: 694920 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:32:59,955-Speed 2511.23 samples/sec Loss 1.1735 LearningRate 0.000033 Epoch: 33 Global Step: 694930 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:33:08,165-Speed 2494.82 samples/sec Loss 1.1918 LearningRate 0.000033 Epoch: 33 Global Step: 694940 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:33:16,387-Speed 2491.13 samples/sec Loss 1.2081 LearningRate 0.000033 Epoch: 33 Global Step: 694950 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 
05:33:24,600-Speed 2494.24 samples/sec Loss 1.1675 LearningRate 0.000033 Epoch: 33 Global Step: 694960 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:33:32,811-Speed 2494.51 samples/sec Loss 1.1964 LearningRate 0.000032 Epoch: 33 Global Step: 694970 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:33:41,022-Speed 2494.47 samples/sec Loss 1.1761 LearningRate 0.000032 Epoch: 33 Global Step: 694980 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:33:49,180-Speed 2510.81 samples/sec Loss 1.1667 LearningRate 0.000032 Epoch: 33 Global Step: 694990 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:33:57,398-Speed 2492.57 samples/sec Loss 1.1734 LearningRate 0.000032 Epoch: 33 Global Step: 695000 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:34:05,605-Speed 2495.80 samples/sec Loss 1.1753 LearningRate 0.000032 Epoch: 33 Global Step: 695010 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:34:13,821-Speed 2493.07 samples/sec Loss 1.2031 LearningRate 0.000032 Epoch: 33 Global Step: 695020 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:34:22,030-Speed 2495.24 samples/sec Loss 1.1979 LearningRate 0.000032 Epoch: 33 Global Step: 695030 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:34:30,241-Speed 2494.58 samples/sec Loss 1.1782 LearningRate 0.000032 Epoch: 33 Global Step: 695040 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:34:38,396-Speed 2511.49 samples/sec Loss 1.1671 LearningRate 0.000032 Epoch: 33 Global Step: 695050 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:34:46,604-Speed 2495.63 samples/sec Loss 1.1899 LearningRate 0.000032 Epoch: 33 Global Step: 695060 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:34:54,813-Speed 2495.63 samples/sec Loss 1.2027 LearningRate 0.000032 Epoch: 33 Global Step: 695070 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 
05:35:03,021-Speed 2495.59 samples/sec Loss 1.1914 LearningRate 0.000032 Epoch: 33 Global Step: 695080 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:35:11,235-Speed 2493.60 samples/sec Loss 1.1683 LearningRate 0.000032 Epoch: 33 Global Step: 695090 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:35:19,448-Speed 2494.16 samples/sec Loss 1.2175 LearningRate 0.000032 Epoch: 33 Global Step: 695100 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:35:27,606-Speed 2510.56 samples/sec Loss 1.2136 LearningRate 0.000032 Epoch: 33 Global Step: 695110 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:35:35,816-Speed 2494.98 samples/sec Loss 1.1956 LearningRate 0.000032 Epoch: 33 Global Step: 695120 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:35:44,024-Speed 2495.72 samples/sec Loss 1.1834 LearningRate 0.000032 Epoch: 33 Global Step: 695130 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:35:52,231-Speed 2495.73 samples/sec Loss 1.1770 LearningRate 0.000032 Epoch: 33 Global Step: 695140 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:36:00,447-Speed 2493.19 samples/sec Loss 1.1472 LearningRate 0.000032 Epoch: 33 Global Step: 695150 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:36:08,666-Speed 2491.98 samples/sec Loss 1.1871 LearningRate 0.000032 Epoch: 33 Global Step: 695160 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:36:16,820-Speed 2512.02 samples/sec Loss 1.1756 LearningRate 0.000032 Epoch: 33 Global Step: 695170 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:36:25,029-Speed 2495.16 samples/sec Loss 1.2019 LearningRate 0.000032 Epoch: 33 Global Step: 695180 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:36:33,236-Speed 2495.78 samples/sec Loss 1.1622 LearningRate 0.000032 Epoch: 33 Global Step: 695190 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 
05:36:41,447-Speed 2494.76 samples/sec Loss 1.2122 LearningRate 0.000032 Epoch: 33 Global Step: 695200 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:36:49,655-Speed 2495.54 samples/sec Loss 1.1988 LearningRate 0.000032 Epoch: 33 Global Step: 695210 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:36:57,863-Speed 2495.69 samples/sec Loss 1.1553 LearningRate 0.000032 Epoch: 33 Global Step: 695220 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:37:06,021-Speed 2511.06 samples/sec Loss 1.1888 LearningRate 0.000032 Epoch: 33 Global Step: 695230 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:37:14,228-Speed 2495.73 samples/sec Loss 1.1752 LearningRate 0.000032 Epoch: 33 Global Step: 695240 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:37:22,444-Speed 2492.94 samples/sec Loss 1.1669 LearningRate 0.000032 Epoch: 33 Global Step: 695250 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:37:30,662-Speed 2492.58 samples/sec Loss 1.1869 LearningRate 0.000032 Epoch: 33 Global Step: 695260 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:37:38,871-Speed 2495.20 samples/sec Loss 1.1997 LearningRate 0.000032 Epoch: 33 Global Step: 695270 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:37:47,082-Speed 2494.36 samples/sec Loss 1.2274 LearningRate 0.000032 Epoch: 33 Global Step: 695280 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:37:55,235-Speed 2512.42 samples/sec Loss 1.1801 LearningRate 0.000032 Epoch: 33 Global Step: 695290 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:38:03,443-Speed 2495.42 samples/sec Loss 1.1571 LearningRate 0.000032 Epoch: 33 Global Step: 695300 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:38:11,649-Speed 2496.26 samples/sec Loss 1.1788 LearningRate 0.000032 Epoch: 33 Global Step: 695310 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 
05:38:19,856-Speed 2495.87 samples/sec Loss 1.1897 LearningRate 0.000032 Epoch: 33 Global Step: 695320 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:38:28,058-Speed 2497.22 samples/sec Loss 1.2213 LearningRate 0.000032 Epoch: 33 Global Step: 695330 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:38:36,262-Speed 2496.74 samples/sec Loss 1.2042 LearningRate 0.000032 Epoch: 33 Global Step: 695340 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:38:44,413-Speed 2512.89 samples/sec Loss 1.1774 LearningRate 0.000032 Epoch: 33 Global Step: 695350 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:38:52,620-Speed 2495.94 samples/sec Loss 1.1862 LearningRate 0.000032 Epoch: 33 Global Step: 695360 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:39:00,836-Speed 2492.88 samples/sec Loss 1.1829 LearningRate 0.000032 Epoch: 33 Global Step: 695370 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:39:09,041-Speed 2496.52 samples/sec Loss 1.2048 LearningRate 0.000032 Epoch: 33 Global Step: 695380 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:39:17,249-Speed 2495.65 samples/sec Loss 1.1993 LearningRate 0.000032 Epoch: 33 Global Step: 695390 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:39:25,458-Speed 2495.21 samples/sec Loss 1.1825 LearningRate 0.000032 Epoch: 33 Global Step: 695400 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:39:33,619-Speed 2509.94 samples/sec Loss 1.1675 LearningRate 0.000032 Epoch: 33 Global Step: 695410 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:39:41,826-Speed 2495.90 samples/sec Loss 1.1504 LearningRate 0.000032 Epoch: 33 Global Step: 695420 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:39:50,035-Speed 2495.21 samples/sec Loss 1.2027 LearningRate 0.000032 Epoch: 33 Global Step: 695430 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 
05:39:58,242-Speed 2495.61 samples/sec Loss 1.1874 LearningRate 0.000032 Epoch: 33 Global Step: 695440 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:40:06,450-Speed 2495.76 samples/sec Loss 1.1750 LearningRate 0.000032 Epoch: 33 Global Step: 695450 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:40:14,654-Speed 2496.61 samples/sec Loss 1.1778 LearningRate 0.000032 Epoch: 33 Global Step: 695460 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:40:22,806-Speed 2512.71 samples/sec Loss 1.1612 LearningRate 0.000032 Epoch: 33 Global Step: 695470 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:40:31,014-Speed 2495.67 samples/sec Loss 1.1893 LearningRate 0.000032 Epoch: 33 Global Step: 695480 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:40:39,222-Speed 2495.37 samples/sec Loss 1.1963 LearningRate 0.000032 Epoch: 33 Global Step: 695490 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:40:47,430-Speed 2496.02 samples/sec Loss 1.1708 LearningRate 0.000032 Epoch: 33 Global Step: 695500 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:40:55,637-Speed 2495.86 samples/sec Loss 1.1919 LearningRate 0.000032 Epoch: 33 Global Step: 695510 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:41:03,843-Speed 2495.89 samples/sec Loss 1.1736 LearningRate 0.000032 Epoch: 33 Global Step: 695520 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:41:11,995-Speed 2512.83 samples/sec Loss 1.2201 LearningRate 0.000032 Epoch: 33 Global Step: 695530 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:41:20,201-Speed 2496.18 samples/sec Loss 1.2215 LearningRate 0.000032 Epoch: 33 Global Step: 695540 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:41:28,418-Speed 2492.53 samples/sec Loss 1.1620 LearningRate 0.000032 Epoch: 33 Global Step: 695550 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 
05:41:36,626-Speed 2495.66 samples/sec Loss 1.2046 LearningRate 0.000032 Epoch: 33 Global Step: 695560 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:41:44,844-Speed 2492.37 samples/sec Loss 1.1702 LearningRate 0.000032 Epoch: 33 Global Step: 695570 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:41:53,051-Speed 2496.12 samples/sec Loss 1.1801 LearningRate 0.000032 Epoch: 33 Global Step: 695580 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:42:01,216-Speed 2508.54 samples/sec Loss 1.1939 LearningRate 0.000032 Epoch: 33 Global Step: 695590 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:42:09,422-Speed 2496.11 samples/sec Loss 1.1861 LearningRate 0.000032 Epoch: 33 Global Step: 695600 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:42:17,627-Speed 2496.42 samples/sec Loss 1.1829 LearningRate 0.000032 Epoch: 33 Global Step: 695610 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:42:25,834-Speed 2495.75 samples/sec Loss 1.1325 LearningRate 0.000032 Epoch: 33 Global Step: 695620 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:42:34,041-Speed 2495.77 samples/sec Loss 1.1791 LearningRate 0.000032 Epoch: 33 Global Step: 695630 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:42:42,251-Speed 2494.95 samples/sec Loss 1.1705 LearningRate 0.000032 Epoch: 33 Global Step: 695640 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:42:50,406-Speed 2511.74 samples/sec Loss 1.1784 LearningRate 0.000032 Epoch: 33 Global Step: 695650 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:42:58,614-Speed 2495.41 samples/sec Loss 1.1755 LearningRate 0.000032 Epoch: 33 Global Step: 695660 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:43:06,833-Speed 2492.19 samples/sec Loss 1.1737 LearningRate 0.000032 Epoch: 33 Global Step: 695670 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 
05:43:15,035-Speed 2497.34 samples/sec Loss 1.1850 LearningRate 0.000032 Epoch: 33 Global Step: 695680 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:43:23,240-Speed 2496.52 samples/sec Loss 1.1886 LearningRate 0.000032 Epoch: 33 Global Step: 695690 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:43:31,443-Speed 2496.88 samples/sec Loss 1.2033 LearningRate 0.000032 Epoch: 33 Global Step: 695700 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:43:39,596-Speed 2512.65 samples/sec Loss 1.1910 LearningRate 0.000032 Epoch: 33 Global Step: 695710 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:43:47,799-Speed 2496.92 samples/sec Loss 1.1776 LearningRate 0.000032 Epoch: 33 Global Step: 695720 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-07-12 05:43:55,960-Speed 2510.24 samples/sec Loss 1.1820 LearningRate 0.000032 Epoch: 33 Global Step: 695730 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:44:04,164-Speed 2496.82 samples/sec Loss 1.1909 LearningRate 0.000032 Epoch: 33 Global Step: 695740 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:44:12,369-Speed 2496.37 samples/sec Loss 1.2065 LearningRate 0.000032 Epoch: 33 Global Step: 695750 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:44:20,578-Speed 2495.23 samples/sec Loss 1.1311 LearningRate 0.000032 Epoch: 33 Global Step: 695760 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:44:28,742-Speed 2509.08 samples/sec Loss 1.2123 LearningRate 0.000032 Epoch: 33 Global Step: 695770 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:44:36,947-Speed 2496.29 samples/sec Loss 1.2036 LearningRate 0.000032 Epoch: 33 Global Step: 695780 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-07-12 05:44:45,108-Speed 2510.51 samples/sec Loss 1.1534 LearningRate 0.000032 Epoch: 33 Global Step: 695790 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 
05:44:53,315-Speed 2495.72 samples/sec Loss 1.1932 LearningRate 0.000032 Epoch: 33 Global Step: 695800 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:45:01,518-Speed 2497.00 samples/sec Loss 1.1795 LearningRate 0.000032 Epoch: 33 Global Step: 695810 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:45:09,737-Speed 2492.37 samples/sec Loss 1.1994 LearningRate 0.000032 Epoch: 33 Global Step: 695820 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:45:17,888-Speed 2512.95 samples/sec Loss 1.1692 LearningRate 0.000032 Epoch: 33 Global Step: 695830 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:45:26,088-Speed 2497.95 samples/sec Loss 1.2182 LearningRate 0.000032 Epoch: 33 Global Step: 695840 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:45:34,303-Speed 2493.50 samples/sec Loss 1.1856 LearningRate 0.000032 Epoch: 33 Global Step: 695850 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:45:42,507-Speed 2496.72 samples/sec Loss 1.1778 LearningRate 0.000032 Epoch: 33 Global Step: 695860 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:45:50,711-Speed 2496.69 samples/sec Loss 1.1808 LearningRate 0.000032 Epoch: 33 Global Step: 695870 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:45:58,913-Speed 2497.63 samples/sec Loss 1.1791 LearningRate 0.000032 Epoch: 33 Global Step: 695880 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:46:07,062-Speed 2513.65 samples/sec Loss 1.1714 LearningRate 0.000032 Epoch: 33 Global Step: 695890 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:46:15,262-Speed 2498.04 samples/sec Loss 1.1658 LearningRate 0.000032 Epoch: 33 Global Step: 695900 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:46:23,463-Speed 2497.70 samples/sec Loss 1.1905 LearningRate 0.000032 Epoch: 33 Global Step: 695910 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:46:31,667-Speed 
2496.62 samples/sec Loss 1.1874 LearningRate 0.000032 Epoch: 33 Global Step: 695920 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:46:39,875-Speed 2495.57 samples/sec Loss 1.2077 LearningRate 0.000032 Epoch: 33 Global Step: 695930 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:46:48,077-Speed 2497.28 samples/sec Loss 1.1906 LearningRate 0.000032 Epoch: 33 Global Step: 695940 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:46:56,232-Speed 2511.84 samples/sec Loss 1.1922 LearningRate 0.000032 Epoch: 33 Global Step: 695950 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:47:04,434-Speed 2497.46 samples/sec Loss 1.1706 LearningRate 0.000032 Epoch: 33 Global Step: 695960 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:47:12,638-Speed 2496.38 samples/sec Loss 1.1431 LearningRate 0.000032 Epoch: 33 Global Step: 695970 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:47:20,839-Speed 2497.92 samples/sec Loss 1.1596 LearningRate 0.000032 Epoch: 33 Global Step: 695980 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:47:29,041-Speed 2497.24 samples/sec Loss 1.1472 LearningRate 0.000032 Epoch: 33 Global Step: 695990 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:47:37,245-Speed 2496.69 samples/sec Loss 1.2152 LearningRate 0.000032 Epoch: 33 Global Step: 696000 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:47:45,395-Speed 2513.23 samples/sec Loss 1.1822 LearningRate 0.000032 Epoch: 33 Global Step: 696010 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:47:53,598-Speed 2497.30 samples/sec Loss 1.1648 LearningRate 0.000032 Epoch: 33 Global Step: 696020 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:48:01,801-Speed 2496.85 samples/sec Loss 1.1750 LearningRate 0.000032 Epoch: 33 Global Step: 696030 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:48:10,011-Speed 2495.17 samples/sec 
Loss 1.1814 LearningRate 0.000032 Epoch: 33 Global Step: 696040 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:48:18,215-Speed 2496.80 samples/sec Loss 1.1912 LearningRate 0.000032 Epoch: 33 Global Step: 696050 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:48:26,418-Speed 2497.00 samples/sec Loss 1.1807 LearningRate 0.000032 Epoch: 33 Global Step: 696060 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:48:34,566-Speed 2513.83 samples/sec Loss 1.1920 LearningRate 0.000032 Epoch: 33 Global Step: 696070 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:48:42,768-Speed 2497.33 samples/sec Loss 1.1722 LearningRate 0.000032 Epoch: 33 Global Step: 696080 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:48:51,003-Speed 2487.34 samples/sec Loss 1.1649 LearningRate 0.000032 Epoch: 33 Global Step: 696090 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:48:59,206-Speed 2497.89 samples/sec Loss 1.1785 LearningRate 0.000032 Epoch: 33 Global Step: 696100 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:49:07,410-Speed 2496.78 samples/sec Loss 1.1874 LearningRate 0.000032 Epoch: 33 Global Step: 696110 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:49:15,612-Speed 2497.29 samples/sec Loss 1.1679 LearningRate 0.000032 Epoch: 33 Global Step: 696120 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:49:23,759-Speed 2514.17 samples/sec Loss 1.1678 LearningRate 0.000032 Epoch: 33 Global Step: 696130 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:49:31,961-Speed 2497.31 samples/sec Loss 1.1881 LearningRate 0.000032 Epoch: 33 Global Step: 696140 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:49:40,164-Speed 2497.07 samples/sec Loss 1.1582 LearningRate 0.000032 Epoch: 33 Global Step: 696150 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:49:48,367-Speed 2497.18 samples/sec Loss 1.1527 
LearningRate 0.000032 Epoch: 33 Global Step: 696160 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:49:56,570-Speed 2496.97 samples/sec Loss 1.1819 LearningRate 0.000032 Epoch: 33 Global Step: 696170 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:50:04,770-Speed 2497.83 samples/sec Loss 1.1532 LearningRate 0.000032 Epoch: 33 Global Step: 696180 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:50:12,916-Speed 2514.79 samples/sec Loss 1.1585 LearningRate 0.000032 Epoch: 33 Global Step: 696190 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:50:21,122-Speed 2496.15 samples/sec Loss 1.1702 LearningRate 0.000032 Epoch: 33 Global Step: 696200 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:50:29,323-Speed 2497.55 samples/sec Loss 1.1682 LearningRate 0.000032 Epoch: 33 Global Step: 696210 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:50:37,527-Speed 2496.86 samples/sec Loss 1.1544 LearningRate 0.000032 Epoch: 33 Global Step: 696220 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:50:45,727-Speed 2497.98 samples/sec Loss 1.1943 LearningRate 0.000032 Epoch: 33 Global Step: 696230 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:50:53,927-Speed 2497.91 samples/sec Loss 1.1889 LearningRate 0.000032 Epoch: 33 Global Step: 696240 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:51:02,078-Speed 2513.04 samples/sec Loss 1.1999 LearningRate 0.000032 Epoch: 33 Global Step: 696250 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:51:10,270-Speed 2500.32 samples/sec Loss 1.1948 LearningRate 0.000032 Epoch: 33 Global Step: 696260 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:51:18,482-Speed 2494.28 samples/sec Loss 1.1931 LearningRate 0.000032 Epoch: 33 Global Step: 696270 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:51:26,683-Speed 2497.59 samples/sec Loss 1.1881 LearningRate 
0.000032 Epoch: 33 Global Step: 696280 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:51:34,884-Speed 2497.63 samples/sec Loss 1.1865 LearningRate 0.000032 Epoch: 33 Global Step: 696290 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:51:43,101-Speed 2493.12 samples/sec Loss 1.1921 LearningRate 0.000032 Epoch: 33 Global Step: 696300 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:51:51,265-Speed 2509.06 samples/sec Loss 1.1911 LearningRate 0.000032 Epoch: 33 Global Step: 696310 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:51:59,469-Speed 2496.69 samples/sec Loss 1.1861 LearningRate 0.000032 Epoch: 33 Global Step: 696320 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:52:07,673-Speed 2496.54 samples/sec Loss 1.1549 LearningRate 0.000032 Epoch: 33 Global Step: 696330 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:52:15,871-Speed 2498.72 samples/sec Loss 1.1305 LearningRate 0.000032 Epoch: 33 Global Step: 696340 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-07-12 05:52:24,071-Speed 2498.09 samples/sec Loss 1.1638 LearningRate 0.000032 Epoch: 33 Global Step: 696350 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:52:32,275-Speed 2496.56 samples/sec Loss 1.1536 LearningRate 0.000032 Epoch: 33 Global Step: 696360 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:52:40,423-Speed 2514.02 samples/sec Loss 1.1851 LearningRate 0.000032 Epoch: 33 Global Step: 696370 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:52:48,624-Speed 2497.46 samples/sec Loss 1.1632 LearningRate 0.000032 Epoch: 33 Global Step: 696380 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:52:56,832-Speed 2495.65 samples/sec Loss 1.1426 LearningRate 0.000032 Epoch: 33 Global Step: 696390 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:53:05,030-Speed 2498.49 samples/sec Loss 1.1613 LearningRate 0.000032 Epoch: 33 
Global Step: 696400 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:53:13,235-Speed 2496.71 samples/sec Loss 1.1875 LearningRate 0.000032 Epoch: 33 Global Step: 696410 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:53:21,438-Speed 2497.00 samples/sec Loss 1.1434 LearningRate 0.000032 Epoch: 33 Global Step: 696420 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:53:29,589-Speed 2512.53 samples/sec Loss 1.1763 LearningRate 0.000032 Epoch: 33 Global Step: 696430 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:53:37,790-Speed 2497.62 samples/sec Loss 1.1612 LearningRate 0.000032 Epoch: 33 Global Step: 696440 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:53:45,987-Speed 2498.93 samples/sec Loss 1.1895 LearningRate 0.000032 Epoch: 33 Global Step: 696450 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:53:54,191-Speed 2496.91 samples/sec Loss 1.1621 LearningRate 0.000032 Epoch: 33 Global Step: 696460 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:54:02,397-Speed 2495.79 samples/sec Loss 1.1747 LearningRate 0.000032 Epoch: 33 Global Step: 696470 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:54:10,602-Speed 2496.43 samples/sec Loss 1.1707 LearningRate 0.000032 Epoch: 33 Global Step: 696480 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:54:18,758-Speed 2511.77 samples/sec Loss 1.1961 LearningRate 0.000032 Epoch: 33 Global Step: 696490 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:54:26,959-Speed 2497.61 samples/sec Loss 1.1881 LearningRate 0.000032 Epoch: 33 Global Step: 696500 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:54:35,163-Speed 2496.70 samples/sec Loss 1.1771 LearningRate 0.000032 Epoch: 33 Global Step: 696510 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:54:43,366-Speed 2497.36 samples/sec Loss 1.1437 LearningRate 0.000032 Epoch: 33 Global Step: 696520 
Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:54:51,576-Speed 2494.68 samples/sec Loss 1.1734 LearningRate 0.000032 Epoch: 33 Global Step: 696530 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:54:59,778-Speed 2497.43 samples/sec Loss 1.1401 LearningRate 0.000032 Epoch: 33 Global Step: 696540 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:55:07,939-Speed 2510.03 samples/sec Loss 1.2005 LearningRate 0.000032 Epoch: 33 Global Step: 696550 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:55:16,153-Speed 2493.70 samples/sec Loss 1.1777 LearningRate 0.000032 Epoch: 33 Global Step: 696560 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:55:24,355-Speed 2497.29 samples/sec Loss 1.1654 LearningRate 0.000032 Epoch: 33 Global Step: 696570 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:55:32,556-Speed 2497.60 samples/sec Loss 1.1616 LearningRate 0.000032 Epoch: 33 Global Step: 696580 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:55:40,759-Speed 2496.93 samples/sec Loss 1.1962 LearningRate 0.000032 Epoch: 33 Global Step: 696590 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:55:48,971-Speed 2494.17 samples/sec Loss 1.1950 LearningRate 0.000032 Epoch: 33 Global Step: 696600 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:55:57,121-Speed 2513.53 samples/sec Loss 1.1923 LearningRate 0.000032 Epoch: 33 Global Step: 696610 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:56:05,328-Speed 2496.00 samples/sec Loss 1.2015 LearningRate 0.000032 Epoch: 33 Global Step: 696620 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:56:13,536-Speed 2495.57 samples/sec Loss 1.1740 LearningRate 0.000032 Epoch: 33 Global Step: 696630 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:56:21,737-Speed 2497.53 samples/sec Loss 1.2009 LearningRate 0.000032 Epoch: 33 Global Step: 696640 Fp16 Grad Scale: 
8192 Required: 30 hours Training: 2022-07-12 05:56:29,939-Speed 2497.33 samples/sec Loss 1.1474 LearningRate 0.000032 Epoch: 33 Global Step: 696650 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:56:38,139-Speed 2498.26 samples/sec Loss 1.1589 LearningRate 0.000032 Epoch: 33 Global Step: 696660 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:56:46,287-Speed 2513.92 samples/sec Loss 1.1378 LearningRate 0.000032 Epoch: 33 Global Step: 696670 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:56:54,487-Speed 2497.90 samples/sec Loss 1.1833 LearningRate 0.000032 Epoch: 33 Global Step: 696680 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:57:02,693-Speed 2496.28 samples/sec Loss 1.2119 LearningRate 0.000032 Epoch: 33 Global Step: 696690 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:57:10,895-Speed 2497.45 samples/sec Loss 1.1850 LearningRate 0.000032 Epoch: 33 Global Step: 696700 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:57:19,097-Speed 2496.99 samples/sec Loss 1.1957 LearningRate 0.000032 Epoch: 33 Global Step: 696710 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:57:27,303-Speed 2496.19 samples/sec Loss 1.1518 LearningRate 0.000032 Epoch: 33 Global Step: 696720 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:57:35,463-Speed 2510.49 samples/sec Loss 1.1789 LearningRate 0.000032 Epoch: 33 Global Step: 696730 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:57:43,662-Speed 2498.01 samples/sec Loss 1.1700 LearningRate 0.000032 Epoch: 33 Global Step: 696740 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:57:51,864-Speed 2497.20 samples/sec Loss 1.1761 LearningRate 0.000032 Epoch: 33 Global Step: 696750 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:58:00,104-Speed 2486.11 samples/sec Loss 1.1836 LearningRate 0.000032 Epoch: 33 Global Step: 696760 Fp16 Grad Scale: 8192 Required: 30 
hours Training: 2022-07-12 05:58:08,305-Speed 2497.69 samples/sec Loss 1.1711 LearningRate 0.000032 Epoch: 33 Global Step: 696770 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:58:16,508-Speed 2496.84 samples/sec Loss 1.1993 LearningRate 0.000032 Epoch: 33 Global Step: 696780 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:58:24,656-Speed 2513.90 samples/sec Loss 1.1975 LearningRate 0.000032 Epoch: 33 Global Step: 696790 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:58:32,856-Speed 2497.84 samples/sec Loss 1.1419 LearningRate 0.000032 Epoch: 33 Global Step: 696800 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:58:41,060-Speed 2496.78 samples/sec Loss 1.2033 LearningRate 0.000032 Epoch: 33 Global Step: 696810 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:58:49,261-Speed 2497.77 samples/sec Loss 1.1827 LearningRate 0.000032 Epoch: 33 Global Step: 696820 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:58:57,474-Speed 2494.14 samples/sec Loss 1.1553 LearningRate 0.000032 Epoch: 33 Global Step: 696830 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:59:05,674-Speed 2498.20 samples/sec Loss 1.1824 LearningRate 0.000032 Epoch: 33 Global Step: 696840 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:59:13,823-Speed 2513.63 samples/sec Loss 1.1674 LearningRate 0.000032 Epoch: 33 Global Step: 696850 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:59:22,025-Speed 2497.38 samples/sec Loss 1.1825 LearningRate 0.000032 Epoch: 33 Global Step: 696860 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:59:30,225-Speed 2497.80 samples/sec Loss 1.1765 LearningRate 0.000032 Epoch: 33 Global Step: 696870 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:59:38,427-Speed 2497.20 samples/sec Loss 1.1414 LearningRate 0.000032 Epoch: 33 Global Step: 696880 Fp16 Grad Scale: 8192 Required: 30 hours Training: 
2022-07-12 05:59:46,642-Speed 2493.39 samples/sec Loss 1.1750 LearningRate 0.000032 Epoch: 33 Global Step: 696890 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 05:59:54,848-Speed 2496.55 samples/sec Loss 1.1878 LearningRate 0.000032 Epoch: 33 Global Step: 696900 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 06:00:02,998-Speed 2513.13 samples/sec Loss 1.1941 LearningRate 0.000032 Epoch: 33 Global Step: 696910 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 06:00:11,199-Speed 2497.79 samples/sec Loss 1.1991 LearningRate 0.000032 Epoch: 33 Global Step: 696920 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 06:00:19,405-Speed 2496.32 samples/sec Loss 1.1765 LearningRate 0.000032 Epoch: 33 Global Step: 696930 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 06:00:27,606-Speed 2497.35 samples/sec Loss 1.1565 LearningRate 0.000032 Epoch: 33 Global Step: 696940 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 06:00:35,811-Speed 2496.61 samples/sec Loss 1.1414 LearningRate 0.000032 Epoch: 33 Global Step: 696950 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 06:00:44,015-Speed 2496.87 samples/sec Loss 1.1683 LearningRate 0.000032 Epoch: 33 Global Step: 696960 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 06:00:52,173-Speed 2510.82 samples/sec Loss 1.1718 LearningRate 0.000032 Epoch: 33 Global Step: 696970 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 06:01:00,377-Speed 2496.78 samples/sec Loss 1.1561 LearningRate 0.000032 Epoch: 33 Global Step: 696980 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-07-12 06:01:08,584-Speed 2495.72 samples/sec Loss 1.1572 LearningRate 0.000032 Epoch: 33 Global Step: 696990 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:01:16,788-Speed 2497.07 samples/sec Loss 1.1820 LearningRate 0.000032 Epoch: 33 Global Step: 697000 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 
06:01:24,988-Speed 2498.15 samples/sec Loss 1.1641 LearningRate 0.000032 Epoch: 33 Global Step: 697010 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:01:33,192-Speed 2496.68 samples/sec Loss 1.1795 LearningRate 0.000032 Epoch: 33 Global Step: 697020 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:01:41,348-Speed 2511.86 samples/sec Loss 1.1985 LearningRate 0.000032 Epoch: 33 Global Step: 697030 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:01:49,555-Speed 2495.84 samples/sec Loss 1.1611 LearningRate 0.000032 Epoch: 33 Global Step: 697040 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:01:57,759-Speed 2496.63 samples/sec Loss 1.1794 LearningRate 0.000032 Epoch: 33 Global Step: 697050 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:02:05,962-Speed 2496.93 samples/sec Loss 1.1819 LearningRate 0.000031 Epoch: 33 Global Step: 697060 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:02:14,165-Speed 2497.20 samples/sec Loss 1.1521 LearningRate 0.000031 Epoch: 33 Global Step: 697070 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:02:22,368-Speed 2497.18 samples/sec Loss 1.1631 LearningRate 0.000031 Epoch: 33 Global Step: 697080 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:02:30,519-Speed 2512.93 samples/sec Loss 1.2082 LearningRate 0.000031 Epoch: 33 Global Step: 697090 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:02:38,722-Speed 2496.94 samples/sec Loss 1.1640 LearningRate 0.000031 Epoch: 33 Global Step: 697100 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:02:46,925-Speed 2497.36 samples/sec Loss 1.1865 LearningRate 0.000031 Epoch: 33 Global Step: 697110 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:02:55,131-Speed 2496.23 samples/sec Loss 1.1995 LearningRate 0.000031 Epoch: 33 Global Step: 697120 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 
06:03:03,334-Speed 2497.12 samples/sec Loss 1.1801 LearningRate 0.000031 Epoch: 33 Global Step: 697130 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:03:11,534-Speed 2497.73 samples/sec Loss 1.1779 LearningRate 0.000031 Epoch: 33 Global Step: 697140 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:03:19,684-Speed 2513.43 samples/sec Loss 1.1679 LearningRate 0.000031 Epoch: 33 Global Step: 697150 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:03:27,886-Speed 2497.21 samples/sec Loss 1.2128 LearningRate 0.000031 Epoch: 33 Global Step: 697160 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:03:36,105-Speed 2492.21 samples/sec Loss 1.1580 LearningRate 0.000031 Epoch: 33 Global Step: 697170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:03:44,311-Speed 2496.12 samples/sec Loss 1.1890 LearningRate 0.000031 Epoch: 33 Global Step: 697180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:03:52,513-Speed 2497.36 samples/sec Loss 1.1694 LearningRate 0.000031 Epoch: 33 Global Step: 697190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:04:00,715-Speed 2497.55 samples/sec Loss 1.1732 LearningRate 0.000031 Epoch: 33 Global Step: 697200 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:04:08,867-Speed 2512.65 samples/sec Loss 1.1768 LearningRate 0.000031 Epoch: 33 Global Step: 697210 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:04:17,069-Speed 2497.21 samples/sec Loss 1.1823 LearningRate 0.000031 Epoch: 33 Global Step: 697220 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:04:25,279-Speed 2494.98 samples/sec Loss 1.1666 LearningRate 0.000031 Epoch: 33 Global Step: 697230 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:04:33,482-Speed 2497.18 samples/sec Loss 1.1677 LearningRate 0.000031 Epoch: 33 Global Step: 697240 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 
06:04:41,684-Speed 2497.42 samples/sec Loss 1.1773 LearningRate 0.000031 Epoch: 33 Global Step: 697250 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:04:49,888-Speed 2496.63 samples/sec Loss 1.1789 LearningRate 0.000031 Epoch: 33 Global Step: 697260 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:04:58,036-Speed 2514.28 samples/sec Loss 1.1823 LearningRate 0.000031 Epoch: 33 Global Step: 697270 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:05:06,238-Speed 2497.29 samples/sec Loss 1.1995 LearningRate 0.000031 Epoch: 33 Global Step: 697280 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:05:14,439-Speed 2497.46 samples/sec Loss 1.1553 LearningRate 0.000031 Epoch: 33 Global Step: 697290 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:05:22,642-Speed 2497.76 samples/sec Loss 1.1697 LearningRate 0.000031 Epoch: 33 Global Step: 697300 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:05:30,850-Speed 2495.55 samples/sec Loss 1.1768 LearningRate 0.000031 Epoch: 33 Global Step: 697310 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:05:39,052-Speed 2497.22 samples/sec Loss 1.1202 LearningRate 0.000031 Epoch: 33 Global Step: 697320 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:05:47,201-Speed 2513.85 samples/sec Loss 1.1759 LearningRate 0.000031 Epoch: 33 Global Step: 697330 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:05:55,407-Speed 2496.00 samples/sec Loss 1.1891 LearningRate 0.000031 Epoch: 33 Global Step: 697340 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:06:03,608-Speed 2497.69 samples/sec Loss 1.1763 LearningRate 0.000031 Epoch: 33 Global Step: 697350 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:06:11,811-Speed 2496.94 samples/sec Loss 1.1940 LearningRate 0.000031 Epoch: 33 Global Step: 697360 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 
06:06:20,026-Speed 2493.40 samples/sec Loss 1.1707 LearningRate 0.000031 Epoch: 33 Global Step: 697370 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:06:28,226-Speed 2497.99 samples/sec Loss 1.1977 LearningRate 0.000031 Epoch: 33 Global Step: 697380 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:06:36,378-Speed 2512.93 samples/sec Loss 1.1810 LearningRate 0.000031 Epoch: 33 Global Step: 697390 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:06:44,581-Speed 2496.89 samples/sec Loss 1.1729 LearningRate 0.000031 Epoch: 33 Global Step: 697400 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:06:52,789-Speed 2495.76 samples/sec Loss 1.1602 LearningRate 0.000031 Epoch: 33 Global Step: 697410 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:07:00,991-Speed 2497.26 samples/sec Loss 1.1529 LearningRate 0.000031 Epoch: 33 Global Step: 697420 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:07:09,192-Speed 2497.81 samples/sec Loss 1.1721 LearningRate 0.000031 Epoch: 33 Global Step: 697430 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:07:17,393-Speed 2497.58 samples/sec Loss 1.1386 LearningRate 0.000031 Epoch: 33 Global Step: 697440 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:07:25,557-Speed 2508.78 samples/sec Loss 1.1924 LearningRate 0.000031 Epoch: 33 Global Step: 697450 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:07:33,758-Speed 2497.85 samples/sec Loss 1.1899 LearningRate 0.000031 Epoch: 33 Global Step: 697460 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:07:41,960-Speed 2497.32 samples/sec Loss 1.1638 LearningRate 0.000031 Epoch: 33 Global Step: 697470 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:07:50,174-Speed 2493.79 samples/sec Loss 1.1355 LearningRate 0.000031 Epoch: 33 Global Step: 697480 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 
06:07:58,375-Speed 2497.44 samples/sec Loss 1.1589 LearningRate 0.000031 Epoch: 33 Global Step: 697490 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:08:06,578-Speed 2496.92 samples/sec Loss 1.1541 LearningRate 0.000031 Epoch: 33 Global Step: 697500 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:08:14,727-Speed 2513.71 samples/sec Loss 1.1726 LearningRate 0.000031 Epoch: 33 Global Step: 697510 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:08:22,941-Speed 2493.51 samples/sec Loss 1.1645 LearningRate 0.000031 Epoch: 33 Global Step: 697520 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:08:31,170-Speed 2489.11 samples/sec Loss 1.1511 LearningRate 0.000031 Epoch: 33 Global Step: 697530 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:08:39,374-Speed 2496.67 samples/sec Loss 1.1960 LearningRate 0.000031 Epoch: 33 Global Step: 697540 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:08:47,575-Speed 2497.61 samples/sec Loss 1.1865 LearningRate 0.000031 Epoch: 33 Global Step: 697550 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:08:55,777-Speed 2497.24 samples/sec Loss 1.2053 LearningRate 0.000031 Epoch: 33 Global Step: 697560 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:09:03,926-Speed 2513.70 samples/sec Loss 1.1792 LearningRate 0.000031 Epoch: 33 Global Step: 697570 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:09:12,130-Speed 2496.69 samples/sec Loss 1.1605 LearningRate 0.000031 Epoch: 33 Global Step: 697580 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:09:20,334-Speed 2496.95 samples/sec Loss 1.1925 LearningRate 0.000031 Epoch: 33 Global Step: 697590 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:09:28,540-Speed 2495.93 samples/sec Loss 1.1958 LearningRate 0.000031 Epoch: 33 Global Step: 697600 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 
06:09:36,743-Speed 2497.42 samples/sec Loss 1.1840 LearningRate 0.000031 Epoch: 33 Global Step: 697610 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:09:44,948-Speed 2496.46 samples/sec Loss 1.1806 LearningRate 0.000031 Epoch: 33 Global Step: 697620 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:09:53,100-Speed 2512.38 samples/sec Loss 1.1764 LearningRate 0.000031 Epoch: 33 Global Step: 697630 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:10:01,304-Speed 2496.84 samples/sec Loss 1.1986 LearningRate 0.000031 Epoch: 33 Global Step: 697640 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:10:09,508-Speed 2496.74 samples/sec Loss 1.1783 LearningRate 0.000031 Epoch: 33 Global Step: 697650 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:10:17,724-Speed 2493.04 samples/sec Loss 1.2192 LearningRate 0.000031 Epoch: 33 Global Step: 697660 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:10:25,937-Speed 2494.00 samples/sec Loss 1.1698 LearningRate 0.000031 Epoch: 33 Global Step: 697670 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:10:34,150-Speed 2494.49 samples/sec Loss 1.1930 LearningRate 0.000031 Epoch: 33 Global Step: 697680 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:10:42,303-Speed 2512.31 samples/sec Loss 1.2044 LearningRate 0.000031 Epoch: 33 Global Step: 697690 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:10:50,516-Speed 2494.20 samples/sec Loss 1.1612 LearningRate 0.000031 Epoch: 33 Global Step: 697700 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:10:58,749-Speed 2487.99 samples/sec Loss 1.1782 LearningRate 0.000031 Epoch: 33 Global Step: 697710 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:11:06,953-Speed 2496.53 samples/sec Loss 1.1898 LearningRate 0.000031 Epoch: 33 Global Step: 697720 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 
06:11:15,158-Speed 2496.54 samples/sec Loss 1.1845 LearningRate 0.000031 Epoch: 33 Global Step: 697730 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:11:23,368-Speed 2494.73 samples/sec Loss 1.1784 LearningRate 0.000031 Epoch: 33 Global Step: 697740 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:11:31,519-Speed 2512.86 samples/sec Loss 1.1874 LearningRate 0.000031 Epoch: 33 Global Step: 697750 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:11:39,741-Speed 2491.49 samples/sec Loss 1.1799 LearningRate 0.000031 Epoch: 33 Global Step: 697760 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:11:47,946-Speed 2496.24 samples/sec Loss 1.1735 LearningRate 0.000031 Epoch: 33 Global Step: 697770 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:11:56,171-Speed 2490.60 samples/sec Loss 1.1770 LearningRate 0.000031 Epoch: 33 Global Step: 697780 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:12:04,378-Speed 2495.88 samples/sec Loss 1.1915 LearningRate 0.000031 Epoch: 33 Global Step: 697790 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:12:12,582-Speed 2496.37 samples/sec Loss 1.2090 LearningRate 0.000031 Epoch: 33 Global Step: 697800 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:12:20,734-Speed 2512.89 samples/sec Loss 1.2042 LearningRate 0.000031 Epoch: 33 Global Step: 697810 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:12:28,944-Speed 2494.88 samples/sec Loss 1.2057 LearningRate 0.000031 Epoch: 33 Global Step: 697820 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:12:37,152-Speed 2495.46 samples/sec Loss 1.1610 LearningRate 0.000031 Epoch: 33 Global Step: 697830 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:12:45,356-Speed 2496.83 samples/sec Loss 1.1533 LearningRate 0.000031 Epoch: 33 Global Step: 697840 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 
06:12:53,560-Speed 2496.51 samples/sec Loss 1.1723 LearningRate 0.000031 Epoch: 33 Global Step: 697850 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:13:01,771-Speed 2494.52 samples/sec Loss 1.1924 LearningRate 0.000031 Epoch: 33 Global Step: 697860 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:13:09,928-Speed 2511.17 samples/sec Loss 1.1961 LearningRate 0.000031 Epoch: 33 Global Step: 697870 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:13:18,133-Speed 2496.57 samples/sec Loss 1.1809 LearningRate 0.000031 Epoch: 33 Global Step: 697880 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:13:26,335-Speed 2497.41 samples/sec Loss 1.1793 LearningRate 0.000031 Epoch: 33 Global Step: 697890 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:13:34,540-Speed 2496.18 samples/sec Loss 1.2086 LearningRate 0.000031 Epoch: 33 Global Step: 697900 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:13:42,747-Speed 2495.90 samples/sec Loss 1.1829 LearningRate 0.000031 Epoch: 33 Global Step: 697910 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:13:50,956-Speed 2495.45 samples/sec Loss 1.2158 LearningRate 0.000031 Epoch: 33 Global Step: 697920 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:13:59,106-Speed 2513.05 samples/sec Loss 1.1676 LearningRate 0.000031 Epoch: 33 Global Step: 697930 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:14:07,325-Speed 2492.18 samples/sec Loss 1.1844 LearningRate 0.000031 Epoch: 33 Global Step: 697940 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:14:15,531-Speed 2496.13 samples/sec Loss 1.1795 LearningRate 0.000031 Epoch: 33 Global Step: 697950 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:14:23,743-Speed 2494.69 samples/sec Loss 1.2119 LearningRate 0.000031 Epoch: 33 Global Step: 697960 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 
06:14:31,948-Speed 2496.21 samples/sec Loss 1.1659 LearningRate 0.000031 Epoch: 33 Global Step: 697970 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:14:40,159-Speed 2494.53 samples/sec Loss 1.1724 LearningRate 0.000031 Epoch: 33 Global Step: 697980 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:14:48,315-Speed 2511.58 samples/sec Loss 1.1699 LearningRate 0.000031 Epoch: 33 Global Step: 697990 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:14:56,523-Speed 2495.59 samples/sec Loss 1.2108 LearningRate 0.000031 Epoch: 33 Global Step: 698000 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:15:04,731-Speed 2495.27 samples/sec Loss 1.1894 LearningRate 0.000031 Epoch: 33 Global Step: 698010 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:15:12,936-Speed 2496.66 samples/sec Loss 1.1994 LearningRate 0.000031 Epoch: 33 Global Step: 698020 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:15:21,142-Speed 2496.03 samples/sec Loss 1.1605 LearningRate 0.000031 Epoch: 33 Global Step: 698030 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:15:29,348-Speed 2496.01 samples/sec Loss 1.1897 LearningRate 0.000031 Epoch: 33 Global Step: 698040 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:15:37,499-Speed 2513.00 samples/sec Loss 1.1621 LearningRate 0.000031 Epoch: 33 Global Step: 698050 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:15:45,708-Speed 2495.44 samples/sec Loss 1.2149 LearningRate 0.000031 Epoch: 33 Global Step: 698060 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:15:53,910-Speed 2497.24 samples/sec Loss 1.1754 LearningRate 0.000031 Epoch: 33 Global Step: 698070 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:16:02,118-Speed 2495.53 samples/sec Loss 1.1657 LearningRate 0.000031 Epoch: 33 Global Step: 698080 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 
06:16:10,324-Speed 2497.25 samples/sec Loss 1.1511 LearningRate 0.000031 Epoch: 33 Global Step: 698090 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:16:18,530-Speed 2495.91 samples/sec Loss 1.2061 LearningRate 0.000031 Epoch: 33 Global Step: 698100 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:16:26,683-Speed 2512.51 samples/sec Loss 1.1696 LearningRate 0.000031 Epoch: 33 Global Step: 698110 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:16:34,897-Speed 2493.75 samples/sec Loss 1.1888 LearningRate 0.000031 Epoch: 33 Global Step: 698120 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:16:43,103-Speed 2496.17 samples/sec Loss 1.1666 LearningRate 0.000031 Epoch: 33 Global Step: 698130 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:16:51,324-Speed 2491.60 samples/sec Loss 1.1991 LearningRate 0.000031 Epoch: 33 Global Step: 698140 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:16:59,546-Speed 2491.41 samples/sec Loss 1.1662 LearningRate 0.000031 Epoch: 33 Global Step: 698150 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:17:07,751-Speed 2496.66 samples/sec Loss 1.2061 LearningRate 0.000031 Epoch: 33 Global Step: 698160 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:17:15,901-Speed 2513.11 samples/sec Loss 1.2297 LearningRate 0.000031 Epoch: 33 Global Step: 698170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:17:24,107-Speed 2496.07 samples/sec Loss 1.2042 LearningRate 0.000031 Epoch: 33 Global Step: 698180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:17:32,315-Speed 2495.81 samples/sec Loss 1.1971 LearningRate 0.000031 Epoch: 33 Global Step: 698190 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:17:40,523-Speed 2495.25 samples/sec Loss 1.1788 LearningRate 0.000031 Epoch: 33 Global Step: 698200 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 
06:17:48,726-Speed 2496.96 samples/sec Loss 1.1593 LearningRate 0.000031 Epoch: 33 Global Step: 698210 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:17:56,949-Speed 2490.99 samples/sec Loss 1.1584 LearningRate 0.000031 Epoch: 33 Global Step: 698220 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:18:05,104-Speed 2511.68 samples/sec Loss 1.1653 LearningRate 0.000031 Epoch: 33 Global Step: 698230 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:18:13,314-Speed 2494.86 samples/sec Loss 1.1824 LearningRate 0.000031 Epoch: 33 Global Step: 698240 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:18:21,523-Speed 2495.26 samples/sec Loss 1.1710 LearningRate 0.000031 Epoch: 33 Global Step: 698250 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:18:29,728-Speed 2496.48 samples/sec Loss 1.1580 LearningRate 0.000031 Epoch: 33 Global Step: 698260 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:18:37,942-Speed 2493.81 samples/sec Loss 1.2019 LearningRate 0.000031 Epoch: 33 Global Step: 698270 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:18:46,147-Speed 2496.59 samples/sec Loss 1.1804 LearningRate 0.000031 Epoch: 33 Global Step: 698280 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:18:54,298-Speed 2513.00 samples/sec Loss 1.1909 LearningRate 0.000031 Epoch: 33 Global Step: 698290 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:19:02,502-Speed 2496.43 samples/sec Loss 1.1748 LearningRate 0.000031 Epoch: 33 Global Step: 698300 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:19:10,709-Speed 2495.87 samples/sec Loss 1.1666 LearningRate 0.000031 Epoch: 33 Global Step: 698310 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:19:18,918-Speed 2495.25 samples/sec Loss 1.1660 LearningRate 0.000031 Epoch: 33 Global Step: 698320 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 
06:19:27,134-Speed 2493.34 samples/sec Loss 1.1880 LearningRate 0.000031 Epoch: 33 Global Step: 698330 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:19:35,338-Speed 2496.72 samples/sec Loss 1.1946 LearningRate 0.000031 Epoch: 33 Global Step: 698340 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:19:43,486-Speed 2513.61 samples/sec Loss 1.1670 LearningRate 0.000031 Epoch: 33 Global Step: 698350 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:19:51,690-Speed 2496.96 samples/sec Loss 1.2038 LearningRate 0.000031 Epoch: 33 Global Step: 698360 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:19:59,897-Speed 2495.91 samples/sec Loss 1.1869 LearningRate 0.000031 Epoch: 33 Global Step: 698370 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:20:08,114-Speed 2492.50 samples/sec Loss 1.1965 LearningRate 0.000031 Epoch: 33 Global Step: 698380 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:20:16,318-Speed 2497.00 samples/sec Loss 1.1817 LearningRate 0.000031 Epoch: 33 Global Step: 698390 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:20:24,521-Speed 2496.87 samples/sec Loss 1.1961 LearningRate 0.000031 Epoch: 33 Global Step: 698400 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:20:32,670-Speed 2513.37 samples/sec Loss 1.1733 LearningRate 0.000031 Epoch: 33 Global Step: 698410 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:20:40,878-Speed 2495.63 samples/sec Loss 1.1921 LearningRate 0.000031 Epoch: 33 Global Step: 698420 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:20:49,083-Speed 2496.33 samples/sec Loss 1.1961 LearningRate 0.000031 Epoch: 33 Global Step: 698430 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:20:57,296-Speed 2494.01 samples/sec Loss 1.1757 LearningRate 0.000031 Epoch: 33 Global Step: 698440 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 
06:21:05,497-Speed 2497.99 samples/sec Loss 1.1795 LearningRate 0.000031 Epoch: 33 Global Step: 698450 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:21:13,699-Speed 2497.51 samples/sec Loss 1.1730 LearningRate 0.000031 Epoch: 33 Global Step: 698460 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:21:21,849-Speed 2513.07 samples/sec Loss 1.2043 LearningRate 0.000031 Epoch: 33 Global Step: 698470 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:21:30,055-Speed 2496.06 samples/sec Loss 1.1985 LearningRate 0.000031 Epoch: 33 Global Step: 698480 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:21:38,263-Speed 2495.75 samples/sec Loss 1.2021 LearningRate 0.000031 Epoch: 33 Global Step: 698490 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:21:46,464-Speed 2497.60 samples/sec Loss 1.2051 LearningRate 0.000031 Epoch: 33 Global Step: 698500 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:21:54,670-Speed 2496.02 samples/sec Loss 1.1872 LearningRate 0.000031 Epoch: 33 Global Step: 698510 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:22:02,872-Speed 2497.14 samples/sec Loss 1.2050 LearningRate 0.000031 Epoch: 33 Global Step: 698520 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:22:11,025-Speed 2512.42 samples/sec Loss 1.1927 LearningRate 0.000031 Epoch: 33 Global Step: 698530 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:22:19,233-Speed 2495.74 samples/sec Loss 1.1704 LearningRate 0.000031 Epoch: 33 Global Step: 698540 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:22:27,435-Speed 2497.17 samples/sec Loss 1.2171 LearningRate 0.000031 Epoch: 33 Global Step: 698550 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:22:35,639-Speed 2496.79 samples/sec Loss 1.1831 LearningRate 0.000031 Epoch: 33 Global Step: 698560 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 
06:22:43,849-Speed 2495.27 samples/sec Loss 1.1946 LearningRate 0.000031 Epoch: 33 Global Step: 698570 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:22:52,050-Speed 2497.68 samples/sec Loss 1.1473 LearningRate 0.000031 Epoch: 33 Global Step: 698580 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:23:00,203-Speed 2512.54 samples/sec Loss 1.1695 LearningRate 0.000031 Epoch: 33 Global Step: 698590 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:23:08,404-Speed 2497.46 samples/sec Loss 1.1606 LearningRate 0.000031 Epoch: 33 Global Step: 698600 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:23:16,610-Speed 2496.55 samples/sec Loss 1.1965 LearningRate 0.000031 Epoch: 33 Global Step: 698610 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:23:24,829-Speed 2492.33 samples/sec Loss 1.1913 LearningRate 0.000031 Epoch: 33 Global Step: 698620 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:23:33,035-Speed 2496.06 samples/sec Loss 1.1627 LearningRate 0.000031 Epoch: 33 Global Step: 698630 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:23:41,241-Speed 2496.33 samples/sec Loss 1.1674 LearningRate 0.000031 Epoch: 33 Global Step: 698640 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:23:49,395-Speed 2512.08 samples/sec Loss 1.1904 LearningRate 0.000031 Epoch: 33 Global Step: 698650 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:23:57,599-Speed 2496.70 samples/sec Loss 1.1974 LearningRate 0.000031 Epoch: 33 Global Step: 698660 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:24:05,808-Speed 2495.36 samples/sec Loss 1.1509 LearningRate 0.000031 Epoch: 33 Global Step: 698670 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:24:14,013-Speed 2496.40 samples/sec Loss 1.1683 LearningRate 0.000031 Epoch: 33 Global Step: 698680 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 
06:24:22,220-Speed 2495.86 samples/sec Loss 1.1623 LearningRate 0.000031 Epoch: 33 Global Step: 698690 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:24:30,427-Speed 2495.83 samples/sec Loss 1.1698 LearningRate 0.000031 Epoch: 33 Global Step: 698700 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:24:38,578-Speed 2512.89 samples/sec Loss 1.1765 LearningRate 0.000031 Epoch: 33 Global Step: 698710 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:24:46,788-Speed 2494.99 samples/sec Loss 1.1944 LearningRate 0.000031 Epoch: 33 Global Step: 698720 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:24:54,991-Speed 2497.07 samples/sec Loss 1.1649 LearningRate 0.000031 Epoch: 33 Global Step: 698730 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:25:03,199-Speed 2495.58 samples/sec Loss 1.1619 LearningRate 0.000031 Epoch: 33 Global Step: 698740 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:25:11,404-Speed 2496.33 samples/sec Loss 1.1988 LearningRate 0.000031 Epoch: 33 Global Step: 698750 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:25:19,613-Speed 2495.26 samples/sec Loss 1.1639 LearningRate 0.000031 Epoch: 33 Global Step: 698760 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:25:27,766-Speed 2512.26 samples/sec Loss 1.1801 LearningRate 0.000031 Epoch: 33 Global Step: 698770 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:25:35,974-Speed 2495.64 samples/sec Loss 1.1914 LearningRate 0.000031 Epoch: 33 Global Step: 698780 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:25:44,174-Speed 2497.75 samples/sec Loss 1.1756 LearningRate 0.000031 Epoch: 33 Global Step: 698790 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:25:52,376-Speed 2497.43 samples/sec Loss 1.1495 LearningRate 0.000031 Epoch: 33 Global Step: 698800 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 
06:26:00,579-Speed 2497.55 samples/sec Loss 1.1721 LearningRate 0.000031 Epoch: 33 Global Step: 698810 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:26:08,786-Speed 2495.60 samples/sec Loss 1.1936 LearningRate 0.000031 Epoch: 33 Global Step: 698820 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:26:16,940-Speed 2512.29 samples/sec Loss 1.1655 LearningRate 0.000031 Epoch: 33 Global Step: 698830 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:26:25,141-Speed 2497.70 samples/sec Loss 1.1639 LearningRate 0.000031 Epoch: 33 Global Step: 698840 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:26:33,347-Speed 2496.02 samples/sec Loss 1.1561 LearningRate 0.000031 Epoch: 33 Global Step: 698850 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:26:41,551-Speed 2496.61 samples/sec Loss 1.1566 LearningRate 0.000031 Epoch: 33 Global Step: 698860 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:26:49,756-Speed 2496.61 samples/sec Loss 1.1618 LearningRate 0.000031 Epoch: 33 Global Step: 698870 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:26:57,960-Speed 2497.42 samples/sec Loss 1.1817 LearningRate 0.000031 Epoch: 33 Global Step: 698880 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:27:06,110-Speed 2513.05 samples/sec Loss 1.1575 LearningRate 0.000031 Epoch: 33 Global Step: 698890 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:27:14,324-Speed 2493.73 samples/sec Loss 1.1735 LearningRate 0.000031 Epoch: 33 Global Step: 698900 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:27:22,527-Speed 2497.31 samples/sec Loss 1.1756 LearningRate 0.000031 Epoch: 33 Global Step: 698910 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:27:30,728-Speed 2497.77 samples/sec Loss 1.1846 LearningRate 0.000031 Epoch: 33 Global Step: 698920 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 
06:27:38,931-Speed 2496.96 samples/sec Loss 1.1514 LearningRate 0.000031 Epoch: 33 Global Step: 698930 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:27:47,136-Speed 2496.17 samples/sec Loss 1.2034 LearningRate 0.000031 Epoch: 33 Global Step: 698940 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:27:55,284-Speed 2514.20 samples/sec Loss 1.1527 LearningRate 0.000031 Epoch: 33 Global Step: 698950 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:28:03,494-Speed 2495.26 samples/sec Loss 1.1545 LearningRate 0.000031 Epoch: 33 Global Step: 698960 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:28:11,710-Speed 2492.78 samples/sec Loss 1.1949 LearningRate 0.000031 Epoch: 33 Global Step: 698970 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:28:19,912-Speed 2497.62 samples/sec Loss 1.1755 LearningRate 0.000031 Epoch: 33 Global Step: 698980 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:28:28,114-Speed 2497.32 samples/sec Loss 1.1752 LearningRate 0.000031 Epoch: 33 Global Step: 698990 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:28:36,317-Speed 2497.04 samples/sec Loss 1.1872 LearningRate 0.000031 Epoch: 33 Global Step: 699000 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:28:44,466-Speed 2513.48 samples/sec Loss 1.1768 LearningRate 0.000031 Epoch: 33 Global Step: 699010 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:28:52,667-Speed 2497.80 samples/sec Loss 1.1605 LearningRate 0.000031 Epoch: 33 Global Step: 699020 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:29:00,870-Speed 2497.85 samples/sec Loss 1.1720 LearningRate 0.000031 Epoch: 33 Global Step: 699030 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:29:09,079-Speed 2495.17 samples/sec Loss 1.1448 LearningRate 0.000031 Epoch: 33 Global Step: 699040 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 
06:29:17,281-Speed 2497.39 samples/sec Loss 1.1790 LearningRate 0.000031 Epoch: 33 Global Step: 699050 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:29:25,486-Speed 2496.79 samples/sec Loss 1.1950 LearningRate 0.000031 Epoch: 33 Global Step: 699060 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:29:33,642-Speed 2511.33 samples/sec Loss 1.1847 LearningRate 0.000031 Epoch: 33 Global Step: 699070 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:29:41,850-Speed 2495.72 samples/sec Loss 1.1888 LearningRate 0.000031 Epoch: 33 Global Step: 699080 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:29:50,055-Speed 2496.52 samples/sec Loss 1.1953 LearningRate 0.000031 Epoch: 33 Global Step: 699090 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:29:58,260-Speed 2496.34 samples/sec Loss 1.1674 LearningRate 0.000031 Epoch: 33 Global Step: 699100 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:30:06,467-Speed 2495.69 samples/sec Loss 1.1618 LearningRate 0.000031 Epoch: 33 Global Step: 699110 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:30:14,673-Speed 2496.66 samples/sec Loss 1.1719 LearningRate 0.000031 Epoch: 33 Global Step: 699120 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:30:22,830-Speed 2511.68 samples/sec Loss 1.1730 LearningRate 0.000031 Epoch: 33 Global Step: 699130 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:30:31,036-Speed 2496.07 samples/sec Loss 1.1660 LearningRate 0.000031 Epoch: 33 Global Step: 699140 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:30:39,238-Speed 2497.06 samples/sec Loss 1.1605 LearningRate 0.000031 Epoch: 33 Global Step: 699150 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:30:47,440-Speed 2497.35 samples/sec Loss 1.1621 LearningRate 0.000031 Epoch: 33 Global Step: 699160 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 
06:30:55,642-Speed 2497.40 samples/sec Loss 1.2093 LearningRate 0.000031 Epoch: 33 Global Step: 699170 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:31:03,856-Speed 2493.72 samples/sec Loss 1.1629 LearningRate 0.000030 Epoch: 33 Global Step: 699180 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:31:12,003-Speed 2514.25 samples/sec Loss 1.1499 LearningRate 0.000030 Epoch: 33 Global Step: 699190 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-07-12 06:31:20,177-Speed 2505.96 samples/sec Loss 1.1493 LearningRate 0.000030 Epoch: 33 Global Step: 699200 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:31:28,378-Speed 2497.46 samples/sec Loss 1.2041 LearningRate 0.000030 Epoch: 33 Global Step: 699210 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:31:36,585-Speed 2496.29 samples/sec Loss 1.1997 LearningRate 0.000030 Epoch: 33 Global Step: 699220 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:31:44,788-Speed 2496.79 samples/sec Loss 1.1607 LearningRate 0.000030 Epoch: 33 Global Step: 699230 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:31:52,990-Speed 2497.64 samples/sec Loss 1.1686 LearningRate 0.000030 Epoch: 33 Global Step: 699240 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:32:01,143-Speed 2512.37 samples/sec Loss 1.1905 LearningRate 0.000030 Epoch: 33 Global Step: 699250 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:32:09,346-Speed 2497.11 samples/sec Loss 1.1626 LearningRate 0.000030 Epoch: 33 Global Step: 699260 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:32:17,545-Speed 2498.31 samples/sec Loss 1.1835 LearningRate 0.000030 Epoch: 33 Global Step: 699270 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:32:25,747-Speed 2497.39 samples/sec Loss 1.1456 LearningRate 0.000030 Epoch: 33 Global Step: 699280 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 
06:32:33,948-Speed 2497.52 samples/sec Loss 1.1796 LearningRate 0.000030 Epoch: 33 Global Step: 699290 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:32:42,151-Speed 2497.22 samples/sec Loss 1.1750 LearningRate 0.000030 Epoch: 33 Global Step: 699300 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:32:50,299-Speed 2513.78 samples/sec Loss 1.1730 LearningRate 0.000030 Epoch: 33 Global Step: 699310 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:32:58,504-Speed 2496.64 samples/sec Loss 1.1553 LearningRate 0.000030 Epoch: 33 Global Step: 699320 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:33:06,709-Speed 2496.55 samples/sec Loss 1.1955 LearningRate 0.000030 Epoch: 33 Global Step: 699330 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:33:14,911-Speed 2497.13 samples/sec Loss 1.1592 LearningRate 0.000030 Epoch: 33 Global Step: 699340 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:33:23,117-Speed 2496.30 samples/sec Loss 1.1535 LearningRate 0.000030 Epoch: 33 Global Step: 699350 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:33:31,323-Speed 2496.28 samples/sec Loss 1.1704 LearningRate 0.000030 Epoch: 33 Global Step: 699360 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:33:39,476-Speed 2512.13 samples/sec Loss 1.1662 LearningRate 0.000030 Epoch: 33 Global Step: 699370 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:33:47,678-Speed 2497.50 samples/sec Loss 1.1834 LearningRate 0.000030 Epoch: 33 Global Step: 699380 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:33:55,885-Speed 2495.71 samples/sec Loss 1.2188 LearningRate 0.000030 Epoch: 33 Global Step: 699390 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:34:04,088-Speed 2497.27 samples/sec Loss 1.1253 LearningRate 0.000030 Epoch: 33 Global Step: 699400 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 
06:34:12,294-Speed 2496.19 samples/sec Loss 1.1803 LearningRate 0.000030 Epoch: 33 Global Step: 699410 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:34:20,496-Speed 2497.32 samples/sec Loss 1.1825 LearningRate 0.000030 Epoch: 33 Global Step: 699420 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:34:28,643-Speed 2514.03 samples/sec Loss 1.1573 LearningRate 0.000030 Epoch: 33 Global Step: 699430 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:34:36,847-Speed 2496.99 samples/sec Loss 1.2065 LearningRate 0.000030 Epoch: 33 Global Step: 699440 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:34:45,049-Speed 2497.37 samples/sec Loss 1.1848 LearningRate 0.000030 Epoch: 33 Global Step: 699450 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:34:53,255-Speed 2496.19 samples/sec Loss 1.1839 LearningRate 0.000030 Epoch: 33 Global Step: 699460 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:35:01,473-Speed 2492.59 samples/sec Loss 1.1748 LearningRate 0.000030 Epoch: 33 Global Step: 699470 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:35:09,676-Speed 2497.11 samples/sec Loss 1.1952 LearningRate 0.000030 Epoch: 33 Global Step: 699480 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:35:17,827-Speed 2513.21 samples/sec Loss 1.1536 LearningRate 0.000030 Epoch: 33 Global Step: 699490 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:35:26,030-Speed 2497.06 samples/sec Loss 1.1567 LearningRate 0.000030 Epoch: 33 Global Step: 699500 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:35:34,232-Speed 2497.22 samples/sec Loss 1.1902 LearningRate 0.000030 Epoch: 33 Global Step: 699510 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 06:35:42,436-Speed 2496.96 samples/sec Loss 1.1675 LearningRate 0.000030 Epoch: 33 Global Step: 699520 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-07-12 
06:35:50,642-Speed 2496.13 samples/sec Loss 1.1812 LearningRate 0.000030 Epoch: 33 Global Step: 699530 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:35:58,843-Speed 2497.53 samples/sec Loss 1.1608 LearningRate 0.000030 Epoch: 33 Global Step: 699540 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:36:06,992-Speed 2513.52 samples/sec Loss 1.1950 LearningRate 0.000030 Epoch: 33 Global Step: 699550 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:36:15,215-Speed 2490.92 samples/sec Loss 1.1627 LearningRate 0.000030 Epoch: 33 Global Step: 699560 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:36:23,418-Speed 2497.22 samples/sec Loss 1.1866 LearningRate 0.000030 Epoch: 33 Global Step: 699570 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:36:31,620-Speed 2497.16 samples/sec Loss 1.1851 LearningRate 0.000030 Epoch: 33 Global Step: 699580 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:36:39,823-Speed 2497.04 samples/sec Loss 1.1828 LearningRate 0.000030 Epoch: 33 Global Step: 699590 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:36:48,034-Speed 2494.50 samples/sec Loss 1.1734 LearningRate 0.000030 Epoch: 33 Global Step: 699600 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:36:56,180-Speed 2514.71 samples/sec Loss 1.1395 LearningRate 0.000030 Epoch: 33 Global Step: 699610 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:37:04,383-Speed 2497.34 samples/sec Loss 1.1797 LearningRate 0.000030 Epoch: 33 Global Step: 699620 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:37:12,585-Speed 2497.06 samples/sec Loss 1.1857 LearningRate 0.000030 Epoch: 33 Global Step: 699630 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:37:20,786-Speed 2497.84 samples/sec Loss 1.1885 LearningRate 0.000030 Epoch: 33 Global Step: 699640 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:37:28,990-Speed 2496.72 samples/sec Loss 1.1669 LearningRate 0.000030 Epoch: 33 Global Step: 699650 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:37:37,193-Speed 2497.11 samples/sec Loss 1.1570 LearningRate 0.000030 Epoch: 33 Global Step: 699660 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:37:45,343-Speed 2513.24 samples/sec Loss 1.1580 LearningRate 0.000030 Epoch: 33 Global Step: 699670 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:37:53,554-Speed 2494.63 samples/sec Loss 1.1786 LearningRate 0.000030 Epoch: 33 Global Step: 699680 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:38:01,758-Speed 2496.81 samples/sec Loss 1.1957 LearningRate 0.000030 Epoch: 33 Global Step: 699690 Fp16 Grad Scale: 16384 Required: 30 hours
Training: 2022-07-12 06:38:09,917-Speed 2510.34 samples/sec Loss 1.1981 LearningRate 0.000030 Epoch: 33 Global Step: 699700 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:38:18,123-Speed 2496.12 samples/sec Loss 1.1576 LearningRate 0.000030 Epoch: 33 Global Step: 699710 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:38:26,326-Speed 2497.20 samples/sec Loss 1.1602 LearningRate 0.000030 Epoch: 33 Global Step: 699720 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:38:34,474-Speed 2513.76 samples/sec Loss 1.1622 LearningRate 0.000030 Epoch: 33 Global Step: 699730 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:38:42,679-Speed 2496.31 samples/sec Loss 1.1763 LearningRate 0.000030 Epoch: 33 Global Step: 699740 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:38:50,881-Speed 2497.30 samples/sec Loss 1.1876 LearningRate 0.000030 Epoch: 33 Global Step: 699750 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:38:59,083-Speed 2497.82 samples/sec Loss 1.1823 LearningRate 0.000030 Epoch: 33 Global Step: 699760 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:39:07,289-Speed 2495.87 samples/sec Loss 1.1938 LearningRate 0.000030 Epoch: 33 Global Step: 699770 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:39:15,490-Speed 2497.78 samples/sec Loss 1.1898 LearningRate 0.000030 Epoch: 33 Global Step: 699780 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:39:23,646-Speed 2511.70 samples/sec Loss 1.2038 LearningRate 0.000030 Epoch: 33 Global Step: 699790 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:39:31,848-Speed 2497.32 samples/sec Loss 1.1598 LearningRate 0.000030 Epoch: 33 Global Step: 699800 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:39:40,053-Speed 2496.36 samples/sec Loss 1.2086 LearningRate 0.000030 Epoch: 33 Global Step: 699810 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:39:48,269-Speed 2493.16 samples/sec Loss 1.2039 LearningRate 0.000030 Epoch: 33 Global Step: 699820 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:39:56,470-Speed 2497.44 samples/sec Loss 1.2021 LearningRate 0.000030 Epoch: 33 Global Step: 699830 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:40:04,681-Speed 2494.93 samples/sec Loss 1.1927 LearningRate 0.000030 Epoch: 33 Global Step: 699840 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:40:12,834-Speed 2512.11 samples/sec Loss 1.1936 LearningRate 0.000030 Epoch: 33 Global Step: 699850 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:40:21,040-Speed 2496.45 samples/sec Loss 1.1872 LearningRate 0.000030 Epoch: 33 Global Step: 699860 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:40:29,246-Speed 2496.22 samples/sec Loss 1.2126 LearningRate 0.000030 Epoch: 33 Global Step: 699870 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:40:37,451-Speed 2496.45 samples/sec Loss 1.1931 LearningRate 0.000030 Epoch: 33 Global Step: 699880 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:40:45,658-Speed 2495.75 samples/sec Loss 1.1939 LearningRate 0.000030 Epoch: 33 Global Step: 699890 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:40:53,868-Speed 2494.93 samples/sec Loss 1.1700 LearningRate 0.000030 Epoch: 33 Global Step: 699900 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:41:02,024-Speed 2511.83 samples/sec Loss 1.1955 LearningRate 0.000030 Epoch: 33 Global Step: 699910 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:41:10,230-Speed 2496.29 samples/sec Loss 1.1711 LearningRate 0.000030 Epoch: 33 Global Step: 699920 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:41:18,438-Speed 2495.28 samples/sec Loss 1.1875 LearningRate 0.000030 Epoch: 33 Global Step: 699930 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:41:26,646-Speed 2495.61 samples/sec Loss 1.1919 LearningRate 0.000030 Epoch: 33 Global Step: 699940 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:41:34,858-Speed 2494.70 samples/sec Loss 1.1571 LearningRate 0.000030 Epoch: 33 Global Step: 699950 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:41:43,062-Speed 2496.46 samples/sec Loss 1.1860 LearningRate 0.000030 Epoch: 33 Global Step: 699960 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:41:51,223-Speed 2509.97 samples/sec Loss 1.1821 LearningRate 0.000030 Epoch: 33 Global Step: 699970 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:41:59,441-Speed 2492.88 samples/sec Loss 1.1789 LearningRate 0.000030 Epoch: 33 Global Step: 699980 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:42:07,639-Speed 2498.41 samples/sec Loss 1.1850 LearningRate 0.000030 Epoch: 33 Global Step: 699990 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:42:15,842-Speed 2497.25 samples/sec Loss 1.1409 LearningRate 0.000030 Epoch: 33 Global Step: 700000 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:42:24,051-Speed 2495.15 samples/sec Loss 1.1801 LearningRate 0.000030 Epoch: 33 Global Step: 700010 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:42:32,254-Speed 2497.01 samples/sec Loss 1.1631 LearningRate 0.000030 Epoch: 33 Global Step: 700020 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:42:40,405-Speed 2512.95 samples/sec Loss 1.1807 LearningRate 0.000030 Epoch: 33 Global Step: 700030 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:42:48,608-Speed 2497.17 samples/sec Loss 1.1722 LearningRate 0.000030 Epoch: 33 Global Step: 700040 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:42:56,809-Speed 2497.53 samples/sec Loss 1.1598 LearningRate 0.000030 Epoch: 33 Global Step: 700050 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:43:05,015-Speed 2496.33 samples/sec Loss 1.1964 LearningRate 0.000030 Epoch: 33 Global Step: 700060 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:43:13,218-Speed 2496.77 samples/sec Loss 1.1356 LearningRate 0.000030 Epoch: 33 Global Step: 700070 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:43:21,421-Speed 2496.89 samples/sec Loss 1.1451 LearningRate 0.000030 Epoch: 33 Global Step: 700080 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:43:29,570-Speed 2513.80 samples/sec Loss 1.1657 LearningRate 0.000030 Epoch: 33 Global Step: 700090 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:43:37,773-Speed 2497.57 samples/sec Loss 1.1663 LearningRate 0.000030 Epoch: 33 Global Step: 700100 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:43:45,979-Speed 2495.88 samples/sec Loss 1.1529 LearningRate 0.000030 Epoch: 33 Global Step: 700110 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:43:54,194-Speed 2493.49 samples/sec Loss 1.1606 LearningRate 0.000030 Epoch: 33 Global Step: 700120 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:44:02,396-Speed 2497.36 samples/sec Loss 1.1857 LearningRate 0.000030 Epoch: 33 Global Step: 700130 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:44:10,598-Speed 2497.27 samples/sec Loss 1.1622 LearningRate 0.000030 Epoch: 33 Global Step: 700140 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:44:18,746-Speed 2513.95 samples/sec Loss 1.1790 LearningRate 0.000030 Epoch: 33 Global Step: 700150 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:44:26,949-Speed 2497.01 samples/sec Loss 1.1968 LearningRate 0.000030 Epoch: 33 Global Step: 700160 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:44:35,150-Speed 2497.61 samples/sec Loss 1.1293 LearningRate 0.000030 Epoch: 33 Global Step: 700170 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:44:43,351-Speed 2497.52 samples/sec Loss 1.1868 LearningRate 0.000030 Epoch: 33 Global Step: 700180 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:44:51,553-Speed 2497.44 samples/sec Loss 1.1686 LearningRate 0.000030 Epoch: 33 Global Step: 700190 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:44:59,755-Speed 2497.47 samples/sec Loss 1.1698 LearningRate 0.000030 Epoch: 33 Global Step: 700200 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:45:07,907-Speed 2512.76 samples/sec Loss 1.1591 LearningRate 0.000030 Epoch: 33 Global Step: 700210 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:45:16,107-Speed 2497.87 samples/sec Loss 1.1398 LearningRate 0.000030 Epoch: 33 Global Step: 700220 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:45:24,311-Speed 2496.86 samples/sec Loss 1.1413 LearningRate 0.000030 Epoch: 33 Global Step: 700230 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:45:32,517-Speed 2496.23 samples/sec Loss 1.1619 LearningRate 0.000030 Epoch: 33 Global Step: 700240 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:45:40,717-Speed 2498.00 samples/sec Loss 1.1733 LearningRate 0.000030 Epoch: 33 Global Step: 700250 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:45:48,920-Speed 2496.99 samples/sec Loss 1.1612 LearningRate 0.000030 Epoch: 33 Global Step: 700260 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:45:57,073-Speed 2512.38 samples/sec Loss 1.1736 LearningRate 0.000030 Epoch: 33 Global Step: 700270 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:46:05,276-Speed 2496.98 samples/sec Loss 1.1596 LearningRate 0.000030 Epoch: 33 Global Step: 700280 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:46:13,483-Speed 2495.95 samples/sec Loss 1.1950 LearningRate 0.000030 Epoch: 33 Global Step: 700290 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:46:21,688-Speed 2496.38 samples/sec Loss 1.1513 LearningRate 0.000030 Epoch: 33 Global Step: 700300 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:46:29,894-Speed 2495.83 samples/sec Loss 1.1831 LearningRate 0.000030 Epoch: 33 Global Step: 700310 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:46:38,098-Speed 2496.90 samples/sec Loss 1.1717 LearningRate 0.000030 Epoch: 33 Global Step: 700320 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:46:46,247-Speed 2513.36 samples/sec Loss 1.1760 LearningRate 0.000030 Epoch: 33 Global Step: 700330 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:46:54,452-Speed 2496.31 samples/sec Loss 1.1829 LearningRate 0.000030 Epoch: 33 Global Step: 700340 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:47:02,653-Speed 2497.52 samples/sec Loss 1.2020 LearningRate 0.000030 Epoch: 33 Global Step: 700350 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:47:10,855-Speed 2497.79 samples/sec Loss 1.1765 LearningRate 0.000030 Epoch: 33 Global Step: 700360 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:47:19,060-Speed 2496.22 samples/sec Loss 1.1970 LearningRate 0.000030 Epoch: 33 Global Step: 700370 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:47:27,264-Speed 2496.70 samples/sec Loss 1.2219 LearningRate 0.000030 Epoch: 33 Global Step: 700380 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:47:35,411-Speed 2514.09 samples/sec Loss 1.1699 LearningRate 0.000030 Epoch: 33 Global Step: 700390 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:47:43,615-Speed 2496.91 samples/sec Loss 1.1667 LearningRate 0.000030 Epoch: 33 Global Step: 700400 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:47:51,815-Speed 2498.22 samples/sec Loss 1.1768 LearningRate 0.000030 Epoch: 33 Global Step: 700410 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:48:00,017-Speed 2497.23 samples/sec Loss 1.1714 LearningRate 0.000030 Epoch: 33 Global Step: 700420 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:48:08,243-Speed 2489.97 samples/sec Loss 1.1922 LearningRate 0.000030 Epoch: 33 Global Step: 700430 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:48:16,458-Speed 2493.61 samples/sec Loss 1.2062 LearningRate 0.000030 Epoch: 33 Global Step: 700440 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:48:24,607-Speed 2513.63 samples/sec Loss 1.1820 LearningRate 0.000030 Epoch: 33 Global Step: 700450 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:48:32,806-Speed 2498.10 samples/sec Loss 1.1597 LearningRate 0.000030 Epoch: 33 Global Step: 700460 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:48:41,015-Speed 2495.33 samples/sec Loss 1.1753 LearningRate 0.000030 Epoch: 33 Global Step: 700470 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:48:49,218-Speed 2497.49 samples/sec Loss 1.1584 LearningRate 0.000030 Epoch: 33 Global Step: 700480 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:48:57,424-Speed 2495.83 samples/sec Loss 1.1565 LearningRate 0.000030 Epoch: 33 Global Step: 700490 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:49:05,633-Speed 2495.52 samples/sec Loss 1.1716 LearningRate 0.000030 Epoch: 33 Global Step: 700500 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:49:13,806-Speed 2506.56 samples/sec Loss 1.1752 LearningRate 0.000030 Epoch: 33 Global Step: 700510 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:49:22,006-Speed 2498.08 samples/sec Loss 1.1753 LearningRate 0.000030 Epoch: 33 Global Step: 700520 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:49:30,208-Speed 2497.45 samples/sec Loss 1.1601 LearningRate 0.000030 Epoch: 33 Global Step: 700530 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:49:38,422-Speed 2493.76 samples/sec Loss 1.1909 LearningRate 0.000030 Epoch: 33 Global Step: 700540 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:49:46,625-Speed 2496.85 samples/sec Loss 1.1852 LearningRate 0.000030 Epoch: 33 Global Step: 700550 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:49:54,841-Speed 2493.00 samples/sec Loss 1.1612 LearningRate 0.000030 Epoch: 33 Global Step: 700560 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:50:03,006-Speed 2508.74 samples/sec Loss 1.1853 LearningRate 0.000030 Epoch: 33 Global Step: 700570 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:50:11,211-Speed 2496.69 samples/sec Loss 1.1682 LearningRate 0.000030 Epoch: 33 Global Step: 700580 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:50:19,413-Speed 2497.36 samples/sec Loss 1.2173 LearningRate 0.000030 Epoch: 33 Global Step: 700590 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:50:27,614-Speed 2497.43 samples/sec Loss 1.1713 LearningRate 0.000030 Epoch: 33 Global Step: 700600 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:50:35,818-Speed 2496.77 samples/sec Loss 1.1814 LearningRate 0.000030 Epoch: 33 Global Step: 700610 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:50:44,024-Speed 2496.44 samples/sec Loss 1.1601 LearningRate 0.000030 Epoch: 33 Global Step: 700620 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:50:52,172-Speed 2514.51 samples/sec Loss 1.1819 LearningRate 0.000030 Epoch: 33 Global Step: 700630 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:51:00,377-Speed 2496.41 samples/sec Loss 1.1936 LearningRate 0.000030 Epoch: 33 Global Step: 700640 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:51:08,581-Speed 2496.72 samples/sec Loss 1.1693 LearningRate 0.000030 Epoch: 33 Global Step: 700650 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:51:16,785-Speed 2496.56 samples/sec Loss 1.2079 LearningRate 0.000030 Epoch: 33 Global Step: 700660 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:51:24,989-Speed 2496.87 samples/sec Loss 1.1793 LearningRate 0.000030 Epoch: 33 Global Step: 700670 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:51:33,195-Speed 2496.22 samples/sec Loss 1.1768 LearningRate 0.000030 Epoch: 33 Global Step: 700680 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:51:41,348-Speed 2512.60 samples/sec Loss 1.1727 LearningRate 0.000030 Epoch: 33 Global Step: 700690 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:51:49,553-Speed 2496.49 samples/sec Loss 1.1740 LearningRate 0.000030 Epoch: 33 Global Step: 700700 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-07-12 06:51:57,758-Speed 2496.48 samples/sec Loss 1.1739 LearningRate 0.000030 Epoch: 33 Global Step: 700710 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:52:05,975-Speed 2492.57 samples/sec Loss 1.1714 LearningRate 0.000030 Epoch: 33 Global Step: 700720 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:52:14,179-Speed 2496.90 samples/sec Loss 1.1774 LearningRate 0.000030 Epoch: 33 Global Step: 700730 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:52:22,382-Speed 2497.16 samples/sec Loss 1.1989 LearningRate 0.000030 Epoch: 33 Global Step: 700740 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:52:30,534-Speed 2512.62 samples/sec Loss 1.2020 LearningRate 0.000030 Epoch: 33 Global Step: 700750 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:52:38,735-Speed 2497.45 samples/sec Loss 1.1867 LearningRate 0.000030 Epoch: 33 Global Step: 700760 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:52:46,939-Speed 2497.00 samples/sec Loss 1.1742 LearningRate 0.000030 Epoch: 33 Global Step: 700770 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:52:55,140-Speed 2497.59 samples/sec Loss 1.2081 LearningRate 0.000030 Epoch: 33 Global Step: 700780 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:53:03,434-Speed 2469.66 samples/sec Loss 1.1930 LearningRate 0.000030 Epoch: 33 Global Step: 700790 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:53:11,636-Speed 2497.17 samples/sec Loss 1.1873 LearningRate 0.000030 Epoch: 33 Global Step: 700800 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:53:19,787-Speed 2513.28 samples/sec Loss 1.1768 LearningRate 0.000030 Epoch: 33 Global Step: 700810 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:53:27,986-Speed 2498.17 samples/sec Loss 1.1486 LearningRate 0.000030 Epoch: 33 Global Step: 700820 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:53:36,189-Speed 2497.05 samples/sec Loss 1.1682 LearningRate 0.000030 Epoch: 33 Global Step: 700830 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:53:44,403-Speed 2493.91 samples/sec Loss 1.1826 LearningRate 0.000030 Epoch: 33 Global Step: 700840 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:53:52,604-Speed 2497.76 samples/sec Loss 1.1707 LearningRate 0.000030 Epoch: 33 Global Step: 700850 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:54:00,805-Speed 2497.53 samples/sec Loss 1.1533 LearningRate 0.000030 Epoch: 33 Global Step: 700860 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:54:08,954-Speed 2513.56 samples/sec Loss 1.1832 LearningRate 0.000030 Epoch: 33 Global Step: 700870 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:54:17,160-Speed 2496.26 samples/sec Loss 1.1811 LearningRate 0.000030 Epoch: 33 Global Step: 700880 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:54:25,366-Speed 2496.09 samples/sec Loss 1.1684 LearningRate 0.000030 Epoch: 33 Global Step: 700890 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 06:54:33,569-Speed 2496.84 samples/sec Loss 1.1517 LearningRate 0.000030 Epoch: 33 Global Step: 700900 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:54:41,778-Speed 2495.21 samples/sec Loss 1.1766 LearningRate 0.000030 Epoch: 33 Global Step: 700910 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:54:49,980-Speed 2497.69 samples/sec Loss 1.1810 LearningRate 0.000030 Epoch: 33 Global Step: 700920 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:54:58,152-Speed 2506.45 samples/sec Loss 1.1713 LearningRate 0.000030 Epoch: 33 Global Step: 700930 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:55:06,355-Speed 2497.08 samples/sec Loss 1.2072 LearningRate 0.000030 Epoch: 33 Global Step: 700940 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:55:14,558-Speed 2497.03 samples/sec Loss 1.1655 LearningRate 0.000030 Epoch: 33 Global Step: 700950 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:55:22,765-Speed 2495.67 samples/sec Loss 1.1726 LearningRate 0.000030 Epoch: 33 Global Step: 700960 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:55:30,967-Speed 2497.36 samples/sec Loss 1.2089 LearningRate 0.000030 Epoch: 33 Global Step: 700970 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:55:39,170-Speed 2496.83 samples/sec Loss 1.1861 LearningRate 0.000030 Epoch: 33 Global Step: 700980 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:55:47,318-Speed 2513.86 samples/sec Loss 1.1823 LearningRate 0.000030 Epoch: 33 Global Step: 700990 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:55:55,525-Speed 2496.00 samples/sec Loss 1.1662 LearningRate 0.000030 Epoch: 33 Global Step: 701000 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:56:03,730-Speed 2496.55 samples/sec Loss 1.1710 LearningRate 0.000030 Epoch: 33 Global Step: 701010 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:56:11,935-Speed 2496.57 samples/sec Loss 1.2196 LearningRate 0.000030 Epoch: 33 Global Step: 701020 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:56:20,138-Speed 2496.74 samples/sec Loss 1.1890 LearningRate 0.000030 Epoch: 33 Global Step: 701030 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:56:28,342-Speed 2497.07 samples/sec Loss 1.2115 LearningRate 0.000030 Epoch: 33 Global Step: 701040 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:56:36,488-Speed 2514.48 samples/sec Loss 1.2144 LearningRate 0.000030 Epoch: 33 Global Step: 701050 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:56:44,690-Speed 2497.24 samples/sec Loss 1.1887 LearningRate 0.000030 Epoch: 33 Global Step: 701060 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:56:52,896-Speed 2496.12 samples/sec Loss 1.2169 LearningRate 0.000030 Epoch: 33 Global Step: 701070 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:57:01,108-Speed 2494.27 samples/sec Loss 1.1846 LearningRate 0.000030 Epoch: 33 Global Step: 701080 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:57:09,311-Speed 2496.91 samples/sec Loss 1.2071 LearningRate 0.000030 Epoch: 33 Global Step: 701090 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:57:17,517-Speed 2496.13 samples/sec Loss 1.1839 LearningRate 0.000030 Epoch: 33 Global Step: 701100 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:57:25,668-Speed 2513.22 samples/sec Loss 1.1850 LearningRate 0.000030 Epoch: 33 Global Step: 701110 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:57:33,875-Speed 2495.98 samples/sec Loss 1.1543 LearningRate 0.000030 Epoch: 33 Global Step: 701120 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:57:42,080-Speed 2496.38 samples/sec Loss 1.1434 LearningRate 0.000030 Epoch: 33 Global Step: 701130 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:57:50,283-Speed 2496.91 samples/sec Loss 1.1537 LearningRate 0.000030 Epoch: 33 Global Step: 701140 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:57:58,488-Speed 2496.59 samples/sec Loss 1.1845 LearningRate 0.000030 Epoch: 33 Global Step: 701150 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:58:06,689-Speed 2497.78 samples/sec Loss 1.1837 LearningRate 0.000030 Epoch: 33 Global Step: 701160 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:58:14,838-Speed 2513.44 samples/sec Loss 1.1866 LearningRate 0.000030 Epoch: 33 Global Step: 701170 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:58:23,043-Speed 2496.45 samples/sec Loss 1.1866 LearningRate 0.000030 Epoch: 33 Global Step: 701180 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:58:31,246-Speed 2496.93 samples/sec Loss 1.1415 LearningRate 0.000030 Epoch: 33 Global Step: 701190 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:58:39,455-Speed 2495.29 samples/sec Loss 1.1836 LearningRate 0.000030 Epoch: 33 Global Step: 701200 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:58:47,659-Speed 2496.64 samples/sec Loss 1.1477 LearningRate 0.000030 Epoch: 33 Global Step: 701210 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:58:55,865-Speed 2496.12 samples/sec Loss 1.1760 LearningRate 0.000030 Epoch: 33 Global Step: 701220 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:59:04,015-Speed 2513.34 samples/sec Loss 1.1651 LearningRate 0.000030 Epoch: 33 Global Step: 701230 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:59:12,218-Speed 2497.10 samples/sec Loss 1.1926 LearningRate 0.000030 Epoch: 33 Global Step: 701240 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:59:20,442-Speed 2490.47 samples/sec Loss 1.1828 LearningRate 0.000030 Epoch: 33 Global Step: 701250 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:59:28,646-Speed 2496.94 samples/sec Loss 1.1742 LearningRate 0.000030 Epoch: 33 Global Step: 701260 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:59:36,866-Speed 2492.00 samples/sec Loss 1.1959 LearningRate 0.000030 Epoch: 33 Global Step: 701270 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:59:45,076-Speed 2495.15 samples/sec Loss 1.1657 LearningRate 0.000030 Epoch: 33 Global Step: 701280 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 06:59:53,228-Speed 2512.55 samples/sec Loss 1.1557 LearningRate 0.000030 Epoch: 33 Global Step: 701290 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 07:00:01,437-Speed 2495.17 samples/sec Loss 1.1665 LearningRate 0.000030 Epoch: 33 Global Step: 701300 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 07:00:09,644-Speed 2496.14 samples/sec Loss 1.1643 LearningRate 0.000030 Epoch: 33 Global Step: 701310 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 07:00:17,846-Speed 2497.07 samples/sec Loss 1.1506 LearningRate 0.000030 Epoch: 33 Global Step: 701320 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 07:00:26,047-Speed 2497.58 samples/sec Loss 1.1698 LearningRate 0.000029 Epoch: 33 Global Step: 701330 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 07:00:34,258-Speed 2494.80 samples/sec Loss 1.1651 LearningRate 0.000029 Epoch: 33 Global Step: 701340 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 07:00:42,404-Speed 2514.23 samples/sec Loss 1.1747 LearningRate 0.000029 Epoch: 33 Global Step: 701350 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 07:00:50,607-Speed 2497.16 samples/sec Loss 1.1670 LearningRate 0.000029 Epoch: 33 Global Step: 701360 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 07:00:58,814-Speed 2495.78 samples/sec Loss 1.1313 LearningRate 0.000029 Epoch: 33 Global Step: 701370 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 07:01:07,020-Speed 2496.29 samples/sec Loss 1.1509 LearningRate 0.000029 Epoch: 33 Global Step: 701380 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 07:01:15,229-Speed 2495.25 samples/sec Loss 1.1720 LearningRate 0.000029 Epoch: 33 Global Step: 701390 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 07:01:23,436-Speed 2495.86 samples/sec Loss 1.1797 LearningRate 0.000029 Epoch: 33 Global Step: 701400 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 07:01:31,602-Speed 2508.16 samples/sec Loss 1.1660 LearningRate 0.000029 Epoch: 33 Global Step: 701410 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 07:01:39,829-Speed 2490.01 samples/sec Loss 1.1786 LearningRate 0.000029 Epoch: 33 Global Step: 701420 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-07-12 07:01:47,988-Speed 2510.60 samples/sec Loss 1.1387 LearningRate 0.000029 Epoch: 33 Global Step: 701430 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:01:56,191-Speed 2496.68 samples/sec Loss 1.1688 LearningRate 0.000029 Epoch: 33 Global Step: 701440 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:02:04,396-Speed 2496.68 samples/sec Loss 1.2010 LearningRate 0.000029 Epoch: 33 Global Step: 701450 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:02:12,599-Speed 2496.90 samples/sec Loss 1.1602 LearningRate 0.000029 Epoch: 33 Global Step: 701460 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:02:20,752-Speed 2512.31 samples/sec Loss 1.1778 LearningRate 0.000029 Epoch: 33 Global Step: 701470 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:02:28,958-Speed 2496.37 samples/sec Loss 1.1479 LearningRate 0.000029 Epoch: 33 Global Step: 701480 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:02:37,164-Speed 2495.96 samples/sec Loss 1.1624 LearningRate 0.000029 Epoch: 33 Global Step: 701490 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:02:45,369-Speed 2496.51 samples/sec Loss 1.1664 LearningRate 0.000029 Epoch: 33 Global Step: 701500 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:02:53,571-Speed 2497.23 samples/sec Loss 1.1990 LearningRate 0.000029 Epoch: 33 Global Step: 701510 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:03:01,778-Speed 2496.02 samples/sec Loss 1.1828 LearningRate 0.000029 Epoch: 33 Global Step: 701520 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:03:09,929-Speed 2512.84 samples/sec Loss 1.1861 LearningRate 0.000029 Epoch: 33 Global Step: 701530 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:03:18,133-Speed 2496.60 samples/sec Loss 1.1828 LearningRate 0.000029 Epoch: 33 Global Step: 701540 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:03:26,335-Speed 2497.54 samples/sec Loss 1.1734 LearningRate 0.000029 Epoch: 33 Global Step: 701550 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:03:34,606-Speed 2495.79 samples/sec Loss 1.1642 LearningRate 0.000029 Epoch: 33 Global Step: 701560 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:03:42,844-Speed 2498.33 samples/sec Loss 1.1632 LearningRate 0.000029 Epoch: 33 Global Step: 701570 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:03:53,615-Speed 1901.67 samples/sec Loss 1.1896 LearningRate 0.000029 Epoch: 33 Global Step: 701580 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:04:01,759-Speed 2515.02 samples/sec Loss 1.2079 LearningRate 0.000029 Epoch: 33 Global Step: 701590 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:04:10,052-Speed 2499.14 samples/sec Loss 1.1473 LearningRate 0.000029 Epoch: 33 Global Step: 701600 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:04:18,321-Speed 2498.78 samples/sec Loss 1.2048 LearningRate 0.000029 Epoch: 33 Global Step: 701610 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:04:31,735-Speed 1526.94 samples/sec Loss 1.1914 LearningRate 0.000029 Epoch: 33 Global Step: 701620 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:04:39,989-Speed 2499.97 samples/sec Loss 1.1595 LearningRate 0.000029 Epoch: 33 Global Step: 701630 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:04:49,495-Speed 2499.19 samples/sec Loss 1.1918 LearningRate 0.000029 Epoch: 33 Global Step: 701640 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:04:58,333-Speed 2510.68 samples/sec Loss 1.1525 LearningRate 0.000029 Epoch: 33 Global Step: 701650 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:05:06,549-Speed 2492.96 samples/sec Loss 1.1596 LearningRate 0.000029 Epoch: 33 Global Step: 701660 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:05:18,034-Speed 1791.30 samples/sec Loss 1.1720 LearningRate 0.000029 Epoch: 33 Global Step: 701670 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:05:27,230-Speed 2494.52 samples/sec Loss 1.1674 LearningRate 0.000029 Epoch: 33 Global Step: 701680 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:05:35,510-Speed 2488.93 samples/sec Loss 1.1805 LearningRate 0.000029 Epoch: 33 Global Step: 701690 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:05:43,739-Speed 2488.78 samples/sec Loss 1.1695 LearningRate 0.000029 Epoch: 33 Global Step: 701700 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:05:55,267-Speed 1824.12 samples/sec Loss 1.1744 LearningRate 0.000029 Epoch: 33 Global Step: 701710 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:06:03,515-Speed 2495.68 samples/sec Loss 1.2077 LearningRate 0.000029 Epoch: 33 Global Step: 701720 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:06:11,727-Speed 2494.21 samples/sec Loss 1.1523 LearningRate 0.000029 Epoch: 33 Global Step: 701730 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:06:23,493-Speed 2499.56 samples/sec Loss 1.1694 LearningRate 0.000029 Epoch: 33 Global Step: 701740 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:06:31,749-Speed 2502.50 samples/sec Loss 1.1964 LearningRate 0.000029 Epoch: 33 Global Step: 701750 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:06:39,943-Speed 2499.56 samples/sec Loss 1.1777 LearningRate 0.000029 Epoch: 33 Global Step: 701760 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:06:48,106-Speed 2518.06 samples/sec Loss 1.1607 LearningRate 0.000029 Epoch: 33 Global Step: 701770 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:06:58,879-Speed 2498.39 samples/sec Loss 1.1734 LearningRate 0.000029 Epoch: 33 Global Step: 701780 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:07:07,107-Speed 2499.21 samples/sec Loss 1.1588 LearningRate 0.000029 Epoch: 33 Global Step: 701790 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:07:18,799-Speed 1751.66 samples/sec Loss 1.1904 LearningRate 0.000029 Epoch: 33 Global Step: 701800 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:07:26,994-Speed 2499.58 samples/sec Loss 1.1593 LearningRate 0.000029 Epoch: 33 Global Step: 701810 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:07:35,190-Speed 2499.14 samples/sec Loss 1.1846 LearningRate 0.000029 Epoch: 33 Global Step: 701820 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:07:43,336-Speed 2514.62 samples/sec Loss 1.1155 LearningRate 0.000029 Epoch: 33 Global Step: 701830 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:07:51,539-Speed 2496.86 samples/sec Loss 1.1185 LearningRate 0.000029 Epoch: 33 Global Step: 701840 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:07:59,744-Speed 2496.31 samples/sec Loss 1.2009 LearningRate 0.000029 Epoch: 33 Global Step: 701850 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:08:07,956-Speed 2494.67 samples/sec Loss 1.1818 LearningRate 0.000029 Epoch: 33 Global Step: 701860 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:08:16,175-Speed 2492.38 samples/sec Loss 1.1743 LearningRate 0.000029 Epoch: 33 Global Step: 701870 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:08:24,380-Speed 2496.20 samples/sec Loss 1.1755 LearningRate 0.000029 Epoch: 33 Global Step: 701880 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:08:32,534-Speed 2512.09 samples/sec Loss 1.1711 LearningRate 0.000029 Epoch: 33 Global Step: 701890 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:08:40,739-Speed 2496.64 samples/sec Loss 1.1602 LearningRate 0.000029 Epoch: 33 Global Step: 701900 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:08:48,944-Speed 2496.39 samples/sec Loss 1.1706 LearningRate 0.000029 Epoch: 33 Global Step: 701910 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:08:57,156-Speed 2494.47 samples/sec Loss 1.2060 LearningRate 0.000029 Epoch: 33 Global Step: 701920 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:09:05,362-Speed 2496.30 samples/sec Loss 1.1452 LearningRate 0.000029 Epoch: 33 Global Step: 701930 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:09:13,579-Speed 2492.91 samples/sec Loss 1.1527 LearningRate 0.000029 Epoch: 33 Global Step: 701940 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:09:21,770-Speed 2500.60 samples/sec Loss 1.1899 LearningRate 0.000029 Epoch: 33 Global Step: 701950 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:09:29,973-Speed 2496.86 samples/sec Loss 1.1719 LearningRate 0.000029 Epoch: 33 Global Step: 701960 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:09:38,178-Speed 2496.91 samples/sec Loss 1.1727 LearningRate 0.000029 Epoch: 33 Global Step: 701970 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:09:46,385-Speed 2496.03 samples/sec Loss 1.1550 LearningRate 0.000029 Epoch: 33 Global Step: 701980 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:09:54,598-Speed 2493.90 samples/sec Loss 1.1612 LearningRate 0.000029 Epoch: 33 Global Step: 701990 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:10:02,800-Speed 2497.35 samples/sec Loss 1.1892 LearningRate 0.000029 Epoch: 33 Global Step: 702000 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:10:10,954-Speed 2512.22 samples/sec Loss 1.1759 LearningRate 0.000029 Epoch: 33 Global Step: 702010 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:10:19,167-Speed 2493.86 samples/sec Loss 1.1713 LearningRate 0.000029 Epoch: 33 Global Step: 702020 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:10:27,374-Speed 2496.13 samples/sec Loss 1.1867 LearningRate 0.000029 Epoch: 33 Global Step: 702030 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:10:35,579-Speed 2496.14 samples/sec Loss 1.1935 LearningRate 0.000029 Epoch: 33 Global Step: 702040 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:10:43,785-Speed 2496.40 samples/sec Loss 1.1573 LearningRate 0.000029 Epoch: 33 Global Step: 702050 Fp16 Grad Scale: 8192 Required: 29 hours
Training: 2022-07-12 07:10:51,991-Speed 2496.18 samples/sec Loss 1.1476 LearningRate 0.000029 Epoch: 33
Global Step: 702060 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:11:00,144-Speed 2512.52 samples/sec Loss 1.1638 LearningRate 0.000029 Epoch: 33 Global Step: 702070 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:11:08,349-Speed 2496.23 samples/sec Loss 1.1767 LearningRate 0.000029 Epoch: 33 Global Step: 702080 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:11:16,571-Speed 2491.22 samples/sec Loss 1.1612 LearningRate 0.000029 Epoch: 33 Global Step: 702090 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:11:24,774-Speed 2497.45 samples/sec Loss 1.1352 LearningRate 0.000029 Epoch: 33 Global Step: 702100 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:11:32,991-Speed 2492.52 samples/sec Loss 1.1767 LearningRate 0.000029 Epoch: 33 Global Step: 702110 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:11:41,197-Speed 2496.18 samples/sec Loss 1.1602 LearningRate 0.000029 Epoch: 33 Global Step: 702120 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:11:49,349-Speed 2512.81 samples/sec Loss 1.1631 LearningRate 0.000029 Epoch: 33 Global Step: 702130 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:11:57,557-Speed 2495.43 samples/sec Loss 1.1764 LearningRate 0.000029 Epoch: 33 Global Step: 702140 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:12:05,760-Speed 2497.13 samples/sec Loss 1.1910 LearningRate 0.000029 Epoch: 33 Global Step: 702150 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:12:13,965-Speed 2496.45 samples/sec Loss 1.1757 LearningRate 0.000029 Epoch: 33 Global Step: 702160 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:12:22,171-Speed 2496.11 samples/sec Loss 1.1840 LearningRate 0.000029 Epoch: 33 Global Step: 702170 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:12:30,376-Speed 2496.44 samples/sec Loss 1.2038 LearningRate 0.000029 Epoch: 33 Global Step: 702180 
Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:12:38,526-Speed 2513.19 samples/sec Loss 1.1814 LearningRate 0.000029 Epoch: 33 Global Step: 702190 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:12:46,728-Speed 2497.29 samples/sec Loss 1.1467 LearningRate 0.000029 Epoch: 33 Global Step: 702200 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:12:54,934-Speed 2496.55 samples/sec Loss 1.1731 LearningRate 0.000029 Epoch: 33 Global Step: 702210 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:13:03,139-Speed 2496.41 samples/sec Loss 1.1693 LearningRate 0.000029 Epoch: 33 Global Step: 702220 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:13:11,361-Speed 2491.40 samples/sec Loss 1.1732 LearningRate 0.000029 Epoch: 33 Global Step: 702230 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:13:19,566-Speed 2496.25 samples/sec Loss 1.1690 LearningRate 0.000029 Epoch: 33 Global Step: 702240 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:13:27,724-Speed 2510.95 samples/sec Loss 1.2137 LearningRate 0.000029 Epoch: 33 Global Step: 702250 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:13:35,924-Speed 2497.92 samples/sec Loss 1.1621 LearningRate 0.000029 Epoch: 33 Global Step: 702260 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:13:44,129-Speed 2496.58 samples/sec Loss 1.1947 LearningRate 0.000029 Epoch: 33 Global Step: 702270 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:13:52,332-Speed 2496.90 samples/sec Loss 1.1820 LearningRate 0.000029 Epoch: 33 Global Step: 702280 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:14:00,534-Speed 2497.29 samples/sec Loss 1.1705 LearningRate 0.000029 Epoch: 33 Global Step: 702290 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:14:08,740-Speed 2496.45 samples/sec Loss 1.1554 LearningRate 0.000029 Epoch: 33 Global Step: 702300 Fp16 Grad Scale: 
8192 Required: 29 hours Training: 2022-07-12 07:14:16,891-Speed 2512.79 samples/sec Loss 1.1797 LearningRate 0.000029 Epoch: 33 Global Step: 702310 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:14:25,093-Speed 2497.46 samples/sec Loss 1.1725 LearningRate 0.000029 Epoch: 33 Global Step: 702320 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:14:33,294-Speed 2497.71 samples/sec Loss 1.1951 LearningRate 0.000029 Epoch: 33 Global Step: 702330 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:14:41,495-Speed 2497.72 samples/sec Loss 1.1830 LearningRate 0.000029 Epoch: 33 Global Step: 702340 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:14:49,694-Speed 2498.16 samples/sec Loss 1.2116 LearningRate 0.000029 Epoch: 33 Global Step: 702350 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:14:57,899-Speed 2496.44 samples/sec Loss 1.1718 LearningRate 0.000029 Epoch: 33 Global Step: 702360 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:15:06,048-Speed 2513.95 samples/sec Loss 1.1753 LearningRate 0.000029 Epoch: 33 Global Step: 702370 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:15:14,248-Speed 2497.80 samples/sec Loss 1.1669 LearningRate 0.000029 Epoch: 33 Global Step: 702380 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:15:22,452-Speed 2496.77 samples/sec Loss 1.1818 LearningRate 0.000029 Epoch: 33 Global Step: 702390 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:15:30,653-Speed 2497.73 samples/sec Loss 1.1243 LearningRate 0.000029 Epoch: 33 Global Step: 702400 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:15:38,855-Speed 2497.55 samples/sec Loss 1.1840 LearningRate 0.000029 Epoch: 33 Global Step: 702410 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:15:47,058-Speed 2496.80 samples/sec Loss 1.1728 LearningRate 0.000029 Epoch: 33 Global Step: 702420 Fp16 Grad Scale: 8192 Required: 29 
hours Training: 2022-07-12 07:15:55,209-Speed 2513.01 samples/sec Loss 1.1928 LearningRate 0.000029 Epoch: 33 Global Step: 702430 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:16:03,412-Speed 2497.02 samples/sec Loss 1.1738 LearningRate 0.000029 Epoch: 33 Global Step: 702440 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:16:11,612-Speed 2497.89 samples/sec Loss 1.1807 LearningRate 0.000029 Epoch: 33 Global Step: 702450 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:16:19,820-Speed 2495.64 samples/sec Loss 1.1550 LearningRate 0.000029 Epoch: 33 Global Step: 702460 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:16:28,039-Speed 2492.13 samples/sec Loss 1.1777 LearningRate 0.000029 Epoch: 33 Global Step: 702470 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:16:36,242-Speed 2496.95 samples/sec Loss 1.1512 LearningRate 0.000029 Epoch: 33 Global Step: 702480 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:16:44,408-Speed 2508.37 samples/sec Loss 1.1544 LearningRate 0.000029 Epoch: 33 Global Step: 702490 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:16:52,619-Speed 2494.84 samples/sec Loss 1.1959 LearningRate 0.000029 Epoch: 33 Global Step: 702500 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:17:00,827-Speed 2495.59 samples/sec Loss 1.1844 LearningRate 0.000029 Epoch: 33 Global Step: 702510 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:17:09,034-Speed 2495.84 samples/sec Loss 1.1480 LearningRate 0.000029 Epoch: 33 Global Step: 702520 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:17:17,239-Speed 2496.59 samples/sec Loss 1.1689 LearningRate 0.000029 Epoch: 33 Global Step: 702530 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:17:25,458-Speed 2492.03 samples/sec Loss 1.1768 LearningRate 0.000029 Epoch: 33 Global Step: 702540 Fp16 Grad Scale: 8192 Required: 29 hours Training: 
2022-07-12 07:17:33,608-Speed 2513.29 samples/sec Loss 1.1460 LearningRate 0.000029 Epoch: 33 Global Step: 702550 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:17:41,810-Speed 2497.46 samples/sec Loss 1.1558 LearningRate 0.000029 Epoch: 33 Global Step: 702560 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:17:50,014-Speed 2496.88 samples/sec Loss 1.1613 LearningRate 0.000029 Epoch: 33 Global Step: 702570 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:17:58,222-Speed 2495.51 samples/sec Loss 1.1563 LearningRate 0.000029 Epoch: 33 Global Step: 702580 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:18:06,424-Speed 2497.17 samples/sec Loss 1.1710 LearningRate 0.000029 Epoch: 33 Global Step: 702590 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:18:14,631-Speed 2495.87 samples/sec Loss 1.1872 LearningRate 0.000029 Epoch: 33 Global Step: 702600 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:18:22,782-Speed 2513.25 samples/sec Loss 1.1822 LearningRate 0.000029 Epoch: 33 Global Step: 702610 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:18:30,989-Speed 2495.53 samples/sec Loss 1.1610 LearningRate 0.000029 Epoch: 33 Global Step: 702620 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:18:39,193-Speed 2497.01 samples/sec Loss 1.1583 LearningRate 0.000029 Epoch: 33 Global Step: 702630 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:18:47,396-Speed 2497.24 samples/sec Loss 1.1736 LearningRate 0.000029 Epoch: 33 Global Step: 702640 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:18:55,617-Speed 2491.52 samples/sec Loss 1.1859 LearningRate 0.000029 Epoch: 33 Global Step: 702650 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:19:03,831-Speed 2493.71 samples/sec Loss 1.1505 LearningRate 0.000029 Epoch: 33 Global Step: 702660 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 
07:19:11,984-Speed 2512.51 samples/sec Loss 1.1990 LearningRate 0.000029 Epoch: 33 Global Step: 702670 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:19:20,188-Speed 2496.64 samples/sec Loss 1.1652 LearningRate 0.000029 Epoch: 33 Global Step: 702680 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:19:28,391-Speed 2497.02 samples/sec Loss 1.1821 LearningRate 0.000029 Epoch: 33 Global Step: 702690 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:19:36,598-Speed 2495.74 samples/sec Loss 1.1506 LearningRate 0.000029 Epoch: 33 Global Step: 702700 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:19:44,794-Speed 2499.26 samples/sec Loss 1.1530 LearningRate 0.000029 Epoch: 33 Global Step: 702710 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:19:52,998-Speed 2496.90 samples/sec Loss 1.2020 LearningRate 0.000029 Epoch: 33 Global Step: 702720 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:20:01,148-Speed 2513.17 samples/sec Loss 1.1460 LearningRate 0.000029 Epoch: 33 Global Step: 702730 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:20:09,355-Speed 2495.87 samples/sec Loss 1.1873 LearningRate 0.000029 Epoch: 33 Global Step: 702740 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:20:17,557-Speed 2497.34 samples/sec Loss 1.1631 LearningRate 0.000029 Epoch: 33 Global Step: 702750 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:20:25,760-Speed 2496.88 samples/sec Loss 1.1397 LearningRate 0.000029 Epoch: 33 Global Step: 702760 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:20:33,963-Speed 2497.53 samples/sec Loss 1.1950 LearningRate 0.000029 Epoch: 33 Global Step: 702770 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:20:42,166-Speed 2497.00 samples/sec Loss 1.1965 LearningRate 0.000029 Epoch: 33 Global Step: 702780 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 
07:20:50,317-Speed 2512.94 samples/sec Loss 1.1626 LearningRate 0.000029 Epoch: 33 Global Step: 702790 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:20:58,531-Speed 2493.84 samples/sec Loss 1.1952 LearningRate 0.000029 Epoch: 33 Global Step: 702800 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:21:06,737-Speed 2496.04 samples/sec Loss 1.1540 LearningRate 0.000029 Epoch: 33 Global Step: 702810 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:21:14,939-Speed 2497.35 samples/sec Loss 1.1527 LearningRate 0.000029 Epoch: 33 Global Step: 702820 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:21:23,148-Speed 2495.59 samples/sec Loss 1.1836 LearningRate 0.000029 Epoch: 33 Global Step: 702830 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:21:31,355-Speed 2495.61 samples/sec Loss 1.1720 LearningRate 0.000029 Epoch: 33 Global Step: 702840 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:21:39,504-Speed 2513.45 samples/sec Loss 1.2233 LearningRate 0.000029 Epoch: 33 Global Step: 702850 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:21:47,708-Speed 2496.86 samples/sec Loss 1.1364 LearningRate 0.000029 Epoch: 33 Global Step: 702860 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:21:55,927-Speed 2492.48 samples/sec Loss 1.1903 LearningRate 0.000029 Epoch: 33 Global Step: 702870 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:22:04,133-Speed 2495.86 samples/sec Loss 1.1790 LearningRate 0.000029 Epoch: 33 Global Step: 702880 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:22:12,341-Speed 2495.57 samples/sec Loss 1.1664 LearningRate 0.000029 Epoch: 33 Global Step: 702890 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:22:20,543-Speed 2497.33 samples/sec Loss 1.1689 LearningRate 0.000029 Epoch: 33 Global Step: 702900 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 
07:22:28,695-Speed 2512.74 samples/sec Loss 1.1705 LearningRate 0.000029 Epoch: 33 Global Step: 702910 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:22:36,898-Speed 2497.01 samples/sec Loss 1.1856 LearningRate 0.000029 Epoch: 33 Global Step: 702920 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:22:45,103-Speed 2496.54 samples/sec Loss 1.1616 LearningRate 0.000029 Epoch: 33 Global Step: 702930 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:22:53,308-Speed 2496.22 samples/sec Loss 1.1501 LearningRate 0.000029 Epoch: 33 Global Step: 702940 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:23:01,514-Speed 2496.19 samples/sec Loss 1.1590 LearningRate 0.000029 Epoch: 33 Global Step: 702950 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:23:09,720-Speed 2496.00 samples/sec Loss 1.1809 LearningRate 0.000029 Epoch: 33 Global Step: 702960 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:23:17,871-Speed 2513.20 samples/sec Loss 1.1523 LearningRate 0.000029 Epoch: 33 Global Step: 702970 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:23:26,075-Speed 2496.73 samples/sec Loss 1.1688 LearningRate 0.000029 Epoch: 33 Global Step: 702980 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:23:34,276-Speed 2497.55 samples/sec Loss 1.1489 LearningRate 0.000029 Epoch: 33 Global Step: 702990 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:23:42,477-Speed 2497.46 samples/sec Loss 1.1622 LearningRate 0.000029 Epoch: 33 Global Step: 703000 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:23:50,694-Speed 2493.11 samples/sec Loss 1.1507 LearningRate 0.000029 Epoch: 33 Global Step: 703010 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:23:58,897-Speed 2497.26 samples/sec Loss 1.1822 LearningRate 0.000029 Epoch: 33 Global Step: 703020 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 
07:24:07,048-Speed 2512.75 samples/sec Loss 1.1816 LearningRate 0.000029 Epoch: 33 Global Step: 703030 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:24:15,249-Speed 2497.97 samples/sec Loss 1.1718 LearningRate 0.000029 Epoch: 33 Global Step: 703040 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:24:23,455-Speed 2496.18 samples/sec Loss 1.1772 LearningRate 0.000029 Epoch: 33 Global Step: 703050 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:24:31,663-Speed 2495.30 samples/sec Loss 1.1802 LearningRate 0.000029 Epoch: 33 Global Step: 703060 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:24:39,869-Speed 2496.55 samples/sec Loss 1.1679 LearningRate 0.000029 Epoch: 33 Global Step: 703070 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:24:48,071-Speed 2497.38 samples/sec Loss 1.1439 LearningRate 0.000029 Epoch: 33 Global Step: 703080 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:24:56,230-Speed 2510.47 samples/sec Loss 1.1530 LearningRate 0.000029 Epoch: 33 Global Step: 703090 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:25:04,435-Speed 2496.51 samples/sec Loss 1.1599 LearningRate 0.000029 Epoch: 33 Global Step: 703100 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:25:12,641-Speed 2496.06 samples/sec Loss 1.1607 LearningRate 0.000029 Epoch: 33 Global Step: 703110 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:25:20,845-Speed 2496.60 samples/sec Loss 1.1708 LearningRate 0.000029 Epoch: 33 Global Step: 703120 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:25:29,069-Speed 2490.95 samples/sec Loss 1.1630 LearningRate 0.000029 Epoch: 33 Global Step: 703130 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:25:37,275-Speed 2496.08 samples/sec Loss 1.1991 LearningRate 0.000029 Epoch: 33 Global Step: 703140 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 
07:25:45,428-Speed 2512.30 samples/sec Loss 1.1653 LearningRate 0.000029 Epoch: 33 Global Step: 703150 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:25:53,631-Speed 2496.99 samples/sec Loss 1.1684 LearningRate 0.000029 Epoch: 33 Global Step: 703160 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:26:01,833-Speed 2497.36 samples/sec Loss 1.1709 LearningRate 0.000029 Epoch: 33 Global Step: 703170 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:26:10,039-Speed 2496.26 samples/sec Loss 1.1734 LearningRate 0.000029 Epoch: 33 Global Step: 703180 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:26:18,244-Speed 2496.29 samples/sec Loss 1.1614 LearningRate 0.000029 Epoch: 33 Global Step: 703190 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:26:26,447-Speed 2497.37 samples/sec Loss 1.1968 LearningRate 0.000029 Epoch: 33 Global Step: 703200 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:26:34,594-Speed 2514.11 samples/sec Loss 1.1118 LearningRate 0.000029 Epoch: 33 Global Step: 703210 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:26:42,794-Speed 2498.05 samples/sec Loss 1.1776 LearningRate 0.000029 Epoch: 33 Global Step: 703220 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:26:50,996-Speed 2497.30 samples/sec Loss 1.1723 LearningRate 0.000029 Epoch: 33 Global Step: 703230 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:26:59,205-Speed 2495.30 samples/sec Loss 1.1755 LearningRate 0.000029 Epoch: 33 Global Step: 703240 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:27:07,405-Speed 2497.72 samples/sec Loss 1.1561 LearningRate 0.000029 Epoch: 33 Global Step: 703250 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:27:15,608-Speed 2496.95 samples/sec Loss 1.1780 LearningRate 0.000029 Epoch: 33 Global Step: 703260 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 
07:27:23,759-Speed 2512.93 samples/sec Loss 1.1414 LearningRate 0.000029 Epoch: 33 Global Step: 703270 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:27:31,961-Speed 2497.48 samples/sec Loss 1.1962 LearningRate 0.000029 Epoch: 33 Global Step: 703280 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:27:40,165-Speed 2496.75 samples/sec Loss 1.1626 LearningRate 0.000029 Epoch: 33 Global Step: 703290 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:27:48,370-Speed 2496.35 samples/sec Loss 1.1662 LearningRate 0.000029 Epoch: 33 Global Step: 703300 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:27:56,580-Speed 2494.93 samples/sec Loss 1.1744 LearningRate 0.000029 Epoch: 33 Global Step: 703310 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:28:04,787-Speed 2495.76 samples/sec Loss 1.1652 LearningRate 0.000029 Epoch: 33 Global Step: 703320 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:28:12,943-Speed 2511.64 samples/sec Loss 1.1695 LearningRate 0.000029 Epoch: 33 Global Step: 703330 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:28:21,152-Speed 2495.33 samples/sec Loss 1.1700 LearningRate 0.000029 Epoch: 33 Global Step: 703340 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:28:29,365-Speed 2494.26 samples/sec Loss 1.1597 LearningRate 0.000029 Epoch: 33 Global Step: 703350 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:28:37,572-Speed 2495.74 samples/sec Loss 1.1854 LearningRate 0.000029 Epoch: 33 Global Step: 703360 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:28:45,778-Speed 2496.12 samples/sec Loss 1.1448 LearningRate 0.000029 Epoch: 33 Global Step: 703370 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:28:53,986-Speed 2495.61 samples/sec Loss 1.1608 LearningRate 0.000029 Epoch: 33 Global Step: 703380 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 
07:29:02,154-Speed 2507.86 samples/sec Loss 1.1775 LearningRate 0.000029 Epoch: 33 Global Step: 703390 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:29:10,357-Speed 2496.94 samples/sec Loss 1.1965 LearningRate 0.000029 Epoch: 33 Global Step: 703400 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:29:18,564-Speed 2495.85 samples/sec Loss 1.1856 LearningRate 0.000029 Epoch: 33 Global Step: 703410 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:29:26,771-Speed 2495.70 samples/sec Loss 1.1728 LearningRate 0.000029 Epoch: 33 Global Step: 703420 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:29:34,973-Speed 2497.69 samples/sec Loss 1.1770 LearningRate 0.000029 Epoch: 33 Global Step: 703430 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:29:43,179-Speed 2496.23 samples/sec Loss 1.1560 LearningRate 0.000029 Epoch: 33 Global Step: 703440 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:29:51,328-Speed 2513.45 samples/sec Loss 1.1700 LearningRate 0.000029 Epoch: 33 Global Step: 703450 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:29:59,531-Speed 2497.02 samples/sec Loss 1.1701 LearningRate 0.000029 Epoch: 33 Global Step: 703460 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:30:07,742-Speed 2494.82 samples/sec Loss 1.1751 LearningRate 0.000029 Epoch: 33 Global Step: 703470 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:30:15,945-Speed 2496.85 samples/sec Loss 1.1493 LearningRate 0.000029 Epoch: 33 Global Step: 703480 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:30:24,159-Speed 2493.80 samples/sec Loss 1.1793 LearningRate 0.000029 Epoch: 33 Global Step: 703490 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:30:32,363-Speed 2496.91 samples/sec Loss 1.1690 LearningRate 0.000029 Epoch: 33 Global Step: 703500 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 
07:30:40,523-Speed 2510.33 samples/sec Loss 1.1863 LearningRate 0.000029 Epoch: 33 Global Step: 703510 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:30:48,727-Speed 2496.57 samples/sec Loss 1.1618 LearningRate 0.000028 Epoch: 33 Global Step: 703520 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:30:56,938-Speed 2494.43 samples/sec Loss 1.1558 LearningRate 0.000028 Epoch: 33 Global Step: 703530 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:31:05,162-Speed 2490.80 samples/sec Loss 1.1508 LearningRate 0.000028 Epoch: 33 Global Step: 703540 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:31:13,370-Speed 2495.87 samples/sec Loss 1.1833 LearningRate 0.000028 Epoch: 33 Global Step: 703550 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:31:21,581-Speed 2494.44 samples/sec Loss 1.1863 LearningRate 0.000028 Epoch: 33 Global Step: 703560 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:31:29,739-Speed 2510.90 samples/sec Loss 1.1624 LearningRate 0.000028 Epoch: 33 Global Step: 703570 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:31:37,945-Speed 2496.28 samples/sec Loss 1.1619 LearningRate 0.000028 Epoch: 33 Global Step: 703580 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:31:46,149-Speed 2496.52 samples/sec Loss 1.1796 LearningRate 0.000028 Epoch: 33 Global Step: 703590 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:31:54,353-Speed 2496.65 samples/sec Loss 1.1915 LearningRate 0.000028 Epoch: 33 Global Step: 703600 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:32:02,566-Speed 2494.37 samples/sec Loss 1.1712 LearningRate 0.000028 Epoch: 33 Global Step: 703610 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:32:10,771-Speed 2496.98 samples/sec Loss 1.1710 LearningRate 0.000028 Epoch: 33 Global Step: 703620 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 
07:32:18,920-Speed 2513.38 samples/sec Loss 1.1760 LearningRate 0.000028 Epoch: 33 Global Step: 703630 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:32:27,124-Speed 2496.66 samples/sec Loss 1.1704 LearningRate 0.000028 Epoch: 33 Global Step: 703640 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:32:35,341-Speed 2493.24 samples/sec Loss 1.1698 LearningRate 0.000028 Epoch: 33 Global Step: 703650 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:32:43,543-Speed 2497.35 samples/sec Loss 1.1419 LearningRate 0.000028 Epoch: 33 Global Step: 703660 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:32:51,751-Speed 2495.64 samples/sec Loss 1.1576 LearningRate 0.000028 Epoch: 33 Global Step: 703670 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:32:59,955-Speed 2496.64 samples/sec Loss 1.1995 LearningRate 0.000028 Epoch: 33 Global Step: 703680 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:33:08,108-Speed 2512.35 samples/sec Loss 1.1980 LearningRate 0.000028 Epoch: 33 Global Step: 703690 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:33:16,316-Speed 2495.86 samples/sec Loss 1.1523 LearningRate 0.000028 Epoch: 33 Global Step: 703700 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:33:24,518-Speed 2497.11 samples/sec Loss 1.1786 LearningRate 0.000028 Epoch: 33 Global Step: 703710 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:33:32,724-Speed 2496.33 samples/sec Loss 1.1674 LearningRate 0.000028 Epoch: 33 Global Step: 703720 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:33:40,928-Speed 2496.65 samples/sec Loss 1.1577 LearningRate 0.000028 Epoch: 33 Global Step: 703730 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:33:49,130-Speed 2497.86 samples/sec Loss 1.1774 LearningRate 0.000028 Epoch: 33 Global Step: 703740 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 
07:33:57,280-Speed 2513.53 samples/sec Loss 1.1869 LearningRate 0.000028 Epoch: 33 Global Step: 703750 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:34:05,494-Speed 2493.38 samples/sec Loss 1.1902 LearningRate 0.000028 Epoch: 33 Global Step: 703760 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:34:13,699-Speed 2496.67 samples/sec Loss 1.1616 LearningRate 0.000028 Epoch: 33 Global Step: 703770 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:34:21,913-Speed 2494.09 samples/sec Loss 1.1343 LearningRate 0.000028 Epoch: 33 Global Step: 703780 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:34:30,121-Speed 2495.42 samples/sec Loss 1.1506 LearningRate 0.000028 Epoch: 33 Global Step: 703790 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:34:38,333-Speed 2494.36 samples/sec Loss 1.1616 LearningRate 0.000028 Epoch: 33 Global Step: 703800 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:34:46,480-Speed 2514.48 samples/sec Loss 1.1864 LearningRate 0.000028 Epoch: 33 Global Step: 703810 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:34:54,685-Speed 2496.73 samples/sec Loss 1.1288 LearningRate 0.000028 Epoch: 33 Global Step: 703820 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:35:02,890-Speed 2496.34 samples/sec Loss 1.1410 LearningRate 0.000028 Epoch: 33 Global Step: 703830 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:35:11,095-Speed 2496.73 samples/sec Loss 1.1429 LearningRate 0.000028 Epoch: 33 Global Step: 703840 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:35:19,300-Speed 2496.40 samples/sec Loss 1.1482 LearningRate 0.000028 Epoch: 33 Global Step: 703850 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:35:27,507-Speed 2495.84 samples/sec Loss 1.1750 LearningRate 0.000028 Epoch: 33 Global Step: 703860 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 
07:35:35,662-Speed 2511.58 samples/sec Loss 1.1316 LearningRate 0.000028 Epoch: 33 Global Step: 703870 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:35:43,868-Speed 2496.44 samples/sec Loss 1.1574 LearningRate 0.000028 Epoch: 33 Global Step: 703880 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:35:52,076-Speed 2495.45 samples/sec Loss 1.1741 LearningRate 0.000028 Epoch: 33 Global Step: 703890 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:36:00,281-Speed 2496.47 samples/sec Loss 1.1601 LearningRate 0.000028 Epoch: 33 Global Step: 703900 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:36:08,508-Speed 2489.76 samples/sec Loss 1.1611 LearningRate 0.000028 Epoch: 33 Global Step: 703910 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:36:16,714-Speed 2496.30 samples/sec Loss 1.1793 LearningRate 0.000028 Epoch: 33 Global Step: 703920 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:36:24,867-Speed 2512.68 samples/sec Loss 1.1634 LearningRate 0.000028 Epoch: 33 Global Step: 703930 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:36:33,071-Speed 2496.62 samples/sec Loss 1.1438 LearningRate 0.000028 Epoch: 33 Global Step: 703940 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:36:41,276-Speed 2496.68 samples/sec Loss 1.1435 LearningRate 0.000028 Epoch: 33 Global Step: 703950 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:36:49,483-Speed 2495.73 samples/sec Loss 1.1576 LearningRate 0.000028 Epoch: 33 Global Step: 703960 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:36:57,697-Speed 2493.86 samples/sec Loss 1.1572 LearningRate 0.000028 Epoch: 33 Global Step: 703970 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:37:05,909-Speed 2494.18 samples/sec Loss 1.1473 LearningRate 0.000028 Epoch: 33 Global Step: 703980 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 
07:37:14,061-Speed 2512.59 samples/sec Loss 1.1554 LearningRate 0.000028 Epoch: 33 Global Step: 703990 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:37:22,266-Speed 2496.48 samples/sec Loss 1.1873 LearningRate 0.000028 Epoch: 33 Global Step: 704000 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:37:30,488-Speed 2491.42 samples/sec Loss 1.1690 LearningRate 0.000028 Epoch: 33 Global Step: 704010 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:37:38,694-Speed 2496.02 samples/sec Loss 1.1371 LearningRate 0.000028 Epoch: 33 Global Step: 704020 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:37:46,899-Speed 2496.62 samples/sec Loss 1.1417 LearningRate 0.000028 Epoch: 33 Global Step: 704030 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:37:55,108-Speed 2495.50 samples/sec Loss 1.1304 LearningRate 0.000028 Epoch: 33 Global Step: 704040 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:38:03,258-Speed 2513.19 samples/sec Loss 1.1537 LearningRate 0.000028 Epoch: 33 Global Step: 704050 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:38:11,463-Speed 2496.38 samples/sec Loss 1.1458 LearningRate 0.000028 Epoch: 33 Global Step: 704060 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:38:19,670-Speed 2495.92 samples/sec Loss 1.1495 LearningRate 0.000028 Epoch: 33 Global Step: 704070 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:38:27,872-Speed 2497.26 samples/sec Loss 1.1632 LearningRate 0.000028 Epoch: 33 Global Step: 704080 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:38:36,079-Speed 2495.61 samples/sec Loss 1.1633 LearningRate 0.000028 Epoch: 33 Global Step: 704090 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:38:44,281-Speed 2497.45 samples/sec Loss 1.1410 LearningRate 0.000028 Epoch: 33 Global Step: 704100 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 
07:38:52,429-Speed 2513.77 samples/sec Loss 1.1457 LearningRate 0.000028 Epoch: 33 Global Step: 704110 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:39:00,635-Speed 2496.32 samples/sec Loss 1.1700 LearningRate 0.000028 Epoch: 33 Global Step: 704120 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:39:08,837-Speed 2497.08 samples/sec Loss 1.1593 LearningRate 0.000028 Epoch: 33 Global Step: 704130 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:39:17,040-Speed 2497.24 samples/sec Loss 1.1903 LearningRate 0.000028 Epoch: 33 Global Step: 704140 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:39:25,242-Speed 2497.47 samples/sec Loss 1.1691 LearningRate 0.000028 Epoch: 33 Global Step: 704150 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:39:33,467-Speed 2490.22 samples/sec Loss 1.1732 LearningRate 0.000028 Epoch: 33 Global Step: 704160 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:39:41,618-Speed 2512.93 samples/sec Loss 1.1866 LearningRate 0.000028 Epoch: 33 Global Step: 704170 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:39:49,825-Speed 2496.09 samples/sec Loss 1.1652 LearningRate 0.000028 Epoch: 33 Global Step: 704180 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:39:58,031-Speed 2496.20 samples/sec Loss 1.1728 LearningRate 0.000028 Epoch: 33 Global Step: 704190 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:40:06,242-Speed 2494.24 samples/sec Loss 1.1169 LearningRate 0.000028 Epoch: 33 Global Step: 704200 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:40:14,448-Speed 2496.39 samples/sec Loss 1.1719 LearningRate 0.000028 Epoch: 33 Global Step: 704210 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:40:22,664-Speed 2492.89 samples/sec Loss 1.1529 LearningRate 0.000028 Epoch: 33 Global Step: 704220 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 
07:40:30,819-Speed 2511.77 samples/sec Loss 1.1727 LearningRate 0.000028 Epoch: 33 Global Step: 704230 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-07-12 07:40:38,983-Speed 2509.03 samples/sec Loss 1.1637 LearningRate 0.000028 Epoch: 33 Global Step: 704240 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:40:47,184-Speed 2497.59 samples/sec Loss 1.1893 LearningRate 0.000028 Epoch: 33 Global Step: 704250 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:40:55,388-Speed 2496.75 samples/sec Loss 1.1748 LearningRate 0.000028 Epoch: 33 Global Step: 704260 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:41:03,594-Speed 2496.34 samples/sec Loss 1.1763 LearningRate 0.000028 Epoch: 33 Global Step: 704270 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:41:11,801-Speed 2495.68 samples/sec Loss 1.1793 LearningRate 0.000028 Epoch: 33 Global Step: 704280 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:41:19,963-Speed 2509.69 samples/sec Loss 1.2089 LearningRate 0.000028 Epoch: 33 Global Step: 704290 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:41:28,165-Speed 2497.40 samples/sec Loss 1.1812 LearningRate 0.000028 Epoch: 33 Global Step: 704300 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:41:36,368-Speed 2496.97 samples/sec Loss 1.1516 LearningRate 0.000028 Epoch: 33 Global Step: 704310 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:41:44,570-Speed 2497.38 samples/sec Loss 1.1762 LearningRate 0.000028 Epoch: 33 Global Step: 704320 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:41:52,776-Speed 2496.12 samples/sec Loss 1.1642 LearningRate 0.000028 Epoch: 33 Global Step: 704330 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:42:00,982-Speed 2497.09 samples/sec Loss 1.1573 LearningRate 0.000028 Epoch: 33 Global Step: 704340 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 
07:42:09,133-Speed 2512.83 samples/sec Loss 1.1675 LearningRate 0.000028 Epoch: 33 Global Step: 704350 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-07-12 07:42:17,297-Speed 2508.94 samples/sec Loss 1.1602 LearningRate 0.000028 Epoch: 33 Global Step: 704360 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:42:25,502-Speed 2496.61 samples/sec Loss 1.1722 LearningRate 0.000028 Epoch: 33 Global Step: 704370 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:42:33,705-Speed 2496.90 samples/sec Loss 1.1453 LearningRate 0.000028 Epoch: 33 Global Step: 704380 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:42:41,912-Speed 2495.84 samples/sec Loss 1.1777 LearningRate 0.000028 Epoch: 33 Global Step: 704390 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:42:50,118-Speed 2496.54 samples/sec Loss 1.1372 LearningRate 0.000028 Epoch: 33 Global Step: 704400 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:42:58,273-Speed 2511.68 samples/sec Loss 1.2022 LearningRate 0.000028 Epoch: 33 Global Step: 704410 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:43:06,482-Speed 2495.16 samples/sec Loss 1.1708 LearningRate 0.000028 Epoch: 33 Global Step: 704420 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:43:14,699-Speed 2493.10 samples/sec Loss 1.1654 LearningRate 0.000028 Epoch: 33 Global Step: 704430 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:43:22,907-Speed 2495.39 samples/sec Loss 1.1967 LearningRate 0.000028 Epoch: 33 Global Step: 704440 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:43:31,108-Speed 2497.74 samples/sec Loss 1.1918 LearningRate 0.000028 Epoch: 33 Global Step: 704450 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:43:39,311-Speed 2497.25 samples/sec Loss 1.1801 LearningRate 0.000028 Epoch: 33 Global Step: 704460 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:43:47,459-Speed 
2513.64 samples/sec Loss 1.1736 LearningRate 0.000028 Epoch: 33 Global Step: 704470 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:43:55,671-Speed 2494.46 samples/sec Loss 1.1453 LearningRate 0.000028 Epoch: 33 Global Step: 704480 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:44:03,883-Speed 2494.38 samples/sec Loss 1.1758 LearningRate 0.000028 Epoch: 33 Global Step: 704490 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:44:12,088-Speed 2496.62 samples/sec Loss 1.1600 LearningRate 0.000028 Epoch: 33 Global Step: 704500 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:44:20,288-Speed 2497.76 samples/sec Loss 1.1451 LearningRate 0.000028 Epoch: 33 Global Step: 704510 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:44:28,493-Speed 2496.36 samples/sec Loss 1.1457 LearningRate 0.000028 Epoch: 33 Global Step: 704520 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:44:36,649-Speed 2511.63 samples/sec Loss 1.1697 LearningRate 0.000028 Epoch: 33 Global Step: 704530 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:44:44,850-Speed 2497.73 samples/sec Loss 1.1850 LearningRate 0.000028 Epoch: 33 Global Step: 704540 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:44:53,054-Speed 2496.66 samples/sec Loss 1.1898 LearningRate 0.000028 Epoch: 33 Global Step: 704550 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:45:01,258-Speed 2496.81 samples/sec Loss 1.1579 LearningRate 0.000028 Epoch: 33 Global Step: 704560 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:45:09,456-Speed 2498.59 samples/sec Loss 1.1661 LearningRate 0.000028 Epoch: 33 Global Step: 704570 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:45:17,674-Speed 2492.50 samples/sec Loss 1.1792 LearningRate 0.000028 Epoch: 33 Global Step: 704580 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:45:25,835-Speed 2510.04 samples/sec 
Loss 1.1710 LearningRate 0.000028 Epoch: 33 Global Step: 704590 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:45:34,035-Speed 2497.82 samples/sec Loss 1.1544 LearningRate 0.000028 Epoch: 33 Global Step: 704600 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:45:42,238-Speed 2497.28 samples/sec Loss 1.1730 LearningRate 0.000028 Epoch: 33 Global Step: 704610 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:45:50,440-Speed 2497.42 samples/sec Loss 1.1926 LearningRate 0.000028 Epoch: 33 Global Step: 704620 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:45:58,642-Speed 2497.26 samples/sec Loss 1.1576 LearningRate 0.000028 Epoch: 33 Global Step: 704630 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:46:06,845-Speed 2497.60 samples/sec Loss 1.1946 LearningRate 0.000028 Epoch: 33 Global Step: 704640 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:46:14,994-Speed 2513.57 samples/sec Loss 1.1836 LearningRate 0.000028 Epoch: 33 Global Step: 704650 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:46:23,194-Speed 2497.80 samples/sec Loss 1.1652 LearningRate 0.000028 Epoch: 33 Global Step: 704660 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:46:31,393-Speed 2498.38 samples/sec Loss 1.1502 LearningRate 0.000028 Epoch: 33 Global Step: 704670 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:46:39,599-Speed 2496.34 samples/sec Loss 1.1726 LearningRate 0.000028 Epoch: 33 Global Step: 704680 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:46:47,804-Speed 2496.32 samples/sec Loss 1.1860 LearningRate 0.000028 Epoch: 33 Global Step: 704690 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:46:56,007-Speed 2497.53 samples/sec Loss 1.1473 LearningRate 0.000028 Epoch: 33 Global Step: 704700 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:47:04,152-Speed 2514.53 samples/sec Loss 1.1760 
LearningRate 0.000028 Epoch: 33 Global Step: 704710 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:47:12,355-Speed 2497.13 samples/sec Loss 1.1797 LearningRate 0.000028 Epoch: 33 Global Step: 704720 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:47:20,558-Speed 2497.20 samples/sec Loss 1.1696 LearningRate 0.000028 Epoch: 33 Global Step: 704730 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:47:28,759-Speed 2497.65 samples/sec Loss 1.1749 LearningRate 0.000028 Epoch: 33 Global Step: 704740 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:47:36,964-Speed 2496.28 samples/sec Loss 1.1833 LearningRate 0.000028 Epoch: 33 Global Step: 704750 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:47:45,165-Speed 2497.79 samples/sec Loss 1.1709 LearningRate 0.000028 Epoch: 33 Global Step: 704760 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:47:53,321-Speed 2511.60 samples/sec Loss 1.1621 LearningRate 0.000028 Epoch: 33 Global Step: 704770 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:48:01,520-Speed 2498.05 samples/sec Loss 1.1740 LearningRate 0.000028 Epoch: 33 Global Step: 704780 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:48:09,723-Speed 2497.06 samples/sec Loss 1.1739 LearningRate 0.000028 Epoch: 33 Global Step: 704790 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:48:17,924-Speed 2497.70 samples/sec Loss 1.1729 LearningRate 0.000028 Epoch: 33 Global Step: 704800 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:48:26,126-Speed 2497.48 samples/sec Loss 1.1858 LearningRate 0.000028 Epoch: 33 Global Step: 704810 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:48:34,328-Speed 2497.46 samples/sec Loss 1.1599 LearningRate 0.000028 Epoch: 33 Global Step: 704820 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:48:42,475-Speed 2513.98 samples/sec Loss 1.1919 LearningRate 
0.000028 Epoch: 33 Global Step: 704830 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:48:50,678-Speed 2497.29 samples/sec Loss 1.1728 LearningRate 0.000028 Epoch: 33 Global Step: 704840 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:48:58,900-Speed 2491.13 samples/sec Loss 1.1885 LearningRate 0.000028 Epoch: 33 Global Step: 704850 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:49:07,101-Speed 2497.67 samples/sec Loss 1.1801 LearningRate 0.000028 Epoch: 33 Global Step: 704860 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:49:15,302-Speed 2498.01 samples/sec Loss 1.1729 LearningRate 0.000028 Epoch: 33 Global Step: 704870 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:49:23,504-Speed 2497.39 samples/sec Loss 1.1869 LearningRate 0.000028 Epoch: 33 Global Step: 704880 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:49:31,650-Speed 2514.20 samples/sec Loss 1.1814 LearningRate 0.000028 Epoch: 33 Global Step: 704890 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:49:39,852-Speed 2497.44 samples/sec Loss 1.1861 LearningRate 0.000028 Epoch: 33 Global Step: 704900 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:49:48,053-Speed 2497.68 samples/sec Loss 1.1493 LearningRate 0.000028 Epoch: 33 Global Step: 704910 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:49:56,264-Speed 2494.50 samples/sec Loss 1.1846 LearningRate 0.000028 Epoch: 33 Global Step: 704920 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:50:04,469-Speed 2496.52 samples/sec Loss 1.1333 LearningRate 0.000028 Epoch: 33 Global Step: 704930 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:50:12,673-Speed 2496.93 samples/sec Loss 1.1955 LearningRate 0.000028 Epoch: 33 Global Step: 704940 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:50:20,821-Speed 2514.01 samples/sec Loss 1.1598 LearningRate 0.000028 Epoch: 33 
Global Step: 704950 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:50:29,032-Speed 2494.53 samples/sec Loss 1.1814 LearningRate 0.000028 Epoch: 33 Global Step: 704960 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:50:37,235-Speed 2497.00 samples/sec Loss 1.1946 LearningRate 0.000028 Epoch: 33 Global Step: 704970 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:50:45,439-Speed 2499.15 samples/sec Loss 1.1659 LearningRate 0.000028 Epoch: 33 Global Step: 704980 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:50:53,644-Speed 2496.69 samples/sec Loss 1.1706 LearningRate 0.000028 Epoch: 33 Global Step: 704990 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:51:01,844-Speed 2497.99 samples/sec Loss 1.1711 LearningRate 0.000028 Epoch: 33 Global Step: 705000 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:51:09,992-Speed 2513.52 samples/sec Loss 1.1871 LearningRate 0.000028 Epoch: 33 Global Step: 705010 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:51:18,194-Speed 2497.23 samples/sec Loss 1.1781 LearningRate 0.000028 Epoch: 33 Global Step: 705020 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:51:26,400-Speed 2496.71 samples/sec Loss 1.1765 LearningRate 0.000028 Epoch: 33 Global Step: 705030 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:51:34,604-Speed 2497.04 samples/sec Loss 1.1458 LearningRate 0.000028 Epoch: 33 Global Step: 705040 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:51:42,809-Speed 2496.27 samples/sec Loss 1.2002 LearningRate 0.000028 Epoch: 33 Global Step: 705050 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:51:51,010-Speed 2497.45 samples/sec Loss 1.1786 LearningRate 0.000028 Epoch: 33 Global Step: 705060 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:51:59,160-Speed 2513.23 samples/sec Loss 1.1612 LearningRate 0.000028 Epoch: 33 Global Step: 705070 
Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-07-12 07:52:07,361-Speed 2497.45 samples/sec Loss 1.1790 LearningRate 0.000028 Epoch: 33 Global Step: 705080 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:52:15,564-Speed 2497.22 samples/sec Loss 1.1893 LearningRate 0.000028 Epoch: 33 Global Step: 705090 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:52:23,764-Speed 2498.10 samples/sec Loss 1.2026 LearningRate 0.000028 Epoch: 33 Global Step: 705100 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:52:31,966-Speed 2497.33 samples/sec Loss 1.1615 LearningRate 0.000028 Epoch: 33 Global Step: 705110 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:52:40,182-Speed 2492.90 samples/sec Loss 1.1600 LearningRate 0.000028 Epoch: 33 Global Step: 705120 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:52:48,328-Speed 2514.69 samples/sec Loss 1.1662 LearningRate 0.000028 Epoch: 33 Global Step: 705130 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:52:56,532-Speed 2496.92 samples/sec Loss 1.1678 LearningRate 0.000028 Epoch: 33 Global Step: 705140 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:53:06,828-Speed 1989.24 samples/sec Loss 1.1973 LearningRate 0.000028 Epoch: 34 Global Step: 705150 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:53:15,024-Speed 2499.37 samples/sec Loss 1.1422 LearningRate 0.000028 Epoch: 34 Global Step: 705160 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:53:23,221-Speed 2498.67 samples/sec Loss 1.1568 LearningRate 0.000028 Epoch: 34 Global Step: 705170 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:53:31,418-Speed 2498.83 samples/sec Loss 1.1645 LearningRate 0.000028 Epoch: 34 Global Step: 705180 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:53:39,560-Speed 2515.81 samples/sec Loss 1.1467 LearningRate 0.000028 Epoch: 34 Global Step: 705190 Fp16 Grad Scale: 
8192 Required: 28 hours Training: 2022-07-12 07:53:47,760-Speed 2498.11 samples/sec Loss 1.1906 LearningRate 0.000028 Epoch: 34 Global Step: 705200 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:53:55,961-Speed 2497.54 samples/sec Loss 1.1828 LearningRate 0.000028 Epoch: 34 Global Step: 705210 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:54:04,159-Speed 2499.01 samples/sec Loss 1.1439 LearningRate 0.000028 Epoch: 34 Global Step: 705220 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:54:12,361-Speed 2497.15 samples/sec Loss 1.2074 LearningRate 0.000028 Epoch: 34 Global Step: 705230 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:54:20,556-Speed 2499.51 samples/sec Loss 1.1481 LearningRate 0.000028 Epoch: 34 Global Step: 705240 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:54:28,702-Speed 2514.45 samples/sec Loss 1.1853 LearningRate 0.000028 Epoch: 34 Global Step: 705250 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:54:36,906-Speed 2497.28 samples/sec Loss 1.1546 LearningRate 0.000028 Epoch: 34 Global Step: 705260 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:54:45,107-Speed 2497.79 samples/sec Loss 1.1759 LearningRate 0.000028 Epoch: 34 Global Step: 705270 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:54:53,309-Speed 2497.29 samples/sec Loss 1.1672 LearningRate 0.000028 Epoch: 34 Global Step: 705280 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:55:01,514-Speed 2496.20 samples/sec Loss 1.1313 LearningRate 0.000028 Epoch: 34 Global Step: 705290 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:55:09,725-Speed 2494.69 samples/sec Loss 1.1505 LearningRate 0.000028 Epoch: 34 Global Step: 705300 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:55:17,873-Speed 2513.82 samples/sec Loss 1.1628 LearningRate 0.000028 Epoch: 34 Global Step: 705310 Fp16 Grad Scale: 8192 Required: 28 
hours Training: 2022-07-12 07:55:26,069-Speed 2499.18 samples/sec Loss 1.1752 LearningRate 0.000028 Epoch: 34 Global Step: 705320 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:55:34,266-Speed 2499.00 samples/sec Loss 1.1276 LearningRate 0.000028 Epoch: 34 Global Step: 705330 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:55:42,474-Speed 2495.41 samples/sec Loss 1.1797 LearningRate 0.000028 Epoch: 34 Global Step: 705340 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:55:50,673-Speed 2498.34 samples/sec Loss 1.1839 LearningRate 0.000028 Epoch: 34 Global Step: 705350 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:55:58,883-Speed 2494.86 samples/sec Loss 1.1423 LearningRate 0.000028 Epoch: 34 Global Step: 705360 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:56:07,027-Speed 2515.61 samples/sec Loss 1.1599 LearningRate 0.000028 Epoch: 34 Global Step: 705370 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:56:15,227-Speed 2497.95 samples/sec Loss 1.1411 LearningRate 0.000028 Epoch: 34 Global Step: 705380 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:56:23,427-Speed 2498.03 samples/sec Loss 1.1415 LearningRate 0.000028 Epoch: 34 Global Step: 705390 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:56:31,625-Speed 2498.64 samples/sec Loss 1.1714 LearningRate 0.000028 Epoch: 34 Global Step: 705400 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:56:39,824-Speed 2498.38 samples/sec Loss 1.1295 LearningRate 0.000028 Epoch: 34 Global Step: 705410 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:56:48,022-Speed 2498.68 samples/sec Loss 1.1415 LearningRate 0.000028 Epoch: 34 Global Step: 705420 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:56:56,180-Speed 2510.65 samples/sec Loss 1.1700 LearningRate 0.000028 Epoch: 34 Global Step: 705430 Fp16 Grad Scale: 8192 Required: 28 hours Training: 
2022-07-12 07:57:04,381-Speed 2498.06 samples/sec Loss 1.1589 LearningRate 0.000028 Epoch: 34 Global Step: 705440 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:57:12,580-Speed 2498.38 samples/sec Loss 1.1776 LearningRate 0.000028 Epoch: 34 Global Step: 705450 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:57:20,781-Speed 2497.38 samples/sec Loss 1.1568 LearningRate 0.000028 Epoch: 34 Global Step: 705460 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:57:28,983-Speed 2497.64 samples/sec Loss 1.1531 LearningRate 0.000028 Epoch: 34 Global Step: 705470 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:57:37,185-Speed 2497.35 samples/sec Loss 1.1973 LearningRate 0.000028 Epoch: 34 Global Step: 705480 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:57:45,333-Speed 2513.84 samples/sec Loss 1.1583 LearningRate 0.000028 Epoch: 34 Global Step: 705490 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:57:53,533-Speed 2497.98 samples/sec Loss 1.2018 LearningRate 0.000028 Epoch: 34 Global Step: 705500 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:58:01,733-Speed 2497.84 samples/sec Loss 1.1708 LearningRate 0.000028 Epoch: 34 Global Step: 705510 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:58:09,935-Speed 2497.64 samples/sec Loss 1.1728 LearningRate 0.000028 Epoch: 34 Global Step: 705520 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:58:18,134-Speed 2498.09 samples/sec Loss 1.1465 LearningRate 0.000028 Epoch: 34 Global Step: 705530 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:58:26,334-Speed 2498.06 samples/sec Loss 1.1505 LearningRate 0.000028 Epoch: 34 Global Step: 705540 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 07:58:34,491-Speed 2511.26 samples/sec Loss 1.1664 LearningRate 0.000028 Epoch: 34 Global Step: 705550 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 
07:58:42,693-Speed 2497.38 samples/sec Loss 1.1910 LearningRate 0.000028 Epoch: 34 Global Step: 705560 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 07:58:50,908-Speed 2493.43 samples/sec Loss 1.1539 LearningRate 0.000028 Epoch: 34 Global Step: 705570 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 07:58:59,112-Speed 2496.51 samples/sec Loss 1.1720 LearningRate 0.000028 Epoch: 34 Global Step: 705580 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 07:59:07,311-Speed 2498.35 samples/sec Loss 1.1679 LearningRate 0.000028 Epoch: 34 Global Step: 705590 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 07:59:15,521-Speed 2494.75 samples/sec Loss 1.1601 LearningRate 0.000028 Epoch: 34 Global Step: 705600 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 07:59:23,668-Speed 2514.33 samples/sec Loss 1.1124 LearningRate 0.000028 Epoch: 34 Global Step: 705610 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 07:59:31,885-Speed 2492.72 samples/sec Loss 1.1557 LearningRate 0.000028 Epoch: 34 Global Step: 705620 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 07:59:40,085-Speed 2497.91 samples/sec Loss 1.1453 LearningRate 0.000028 Epoch: 34 Global Step: 705630 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 07:59:48,286-Speed 2497.67 samples/sec Loss 1.1815 LearningRate 0.000028 Epoch: 34 Global Step: 705640 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 07:59:56,485-Speed 2498.23 samples/sec Loss 1.1532 LearningRate 0.000028 Epoch: 34 Global Step: 705650 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:00:04,689-Speed 2496.91 samples/sec Loss 1.1570 LearningRate 0.000028 Epoch: 34 Global Step: 705660 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:00:12,840-Speed 2513.46 samples/sec Loss 1.1713 LearningRate 0.000028 Epoch: 34 Global Step: 705670 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 
08:00:21,042-Speed 2497.59 samples/sec Loss 1.1641 LearningRate 0.000028 Epoch: 34 Global Step: 705680 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:00:29,241-Speed 2498.22 samples/sec Loss 1.1775 LearningRate 0.000028 Epoch: 34 Global Step: 705690 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:00:37,443-Speed 2497.26 samples/sec Loss 1.1570 LearningRate 0.000028 Epoch: 34 Global Step: 705700 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:00:45,651-Speed 2495.55 samples/sec Loss 1.1699 LearningRate 0.000028 Epoch: 34 Global Step: 705710 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:00:53,850-Speed 2498.14 samples/sec Loss 1.1619 LearningRate 0.000028 Epoch: 34 Global Step: 705720 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:01:01,997-Speed 2514.42 samples/sec Loss 1.1394 LearningRate 0.000028 Epoch: 34 Global Step: 705730 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:01:10,210-Speed 2494.15 samples/sec Loss 1.1560 LearningRate 0.000028 Epoch: 34 Global Step: 705740 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:01:18,421-Speed 2494.49 samples/sec Loss 1.1737 LearningRate 0.000028 Epoch: 34 Global Step: 705750 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:01:26,622-Speed 2497.80 samples/sec Loss 1.1667 LearningRate 0.000027 Epoch: 34 Global Step: 705760 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:01:34,826-Speed 2496.57 samples/sec Loss 1.1660 LearningRate 0.000027 Epoch: 34 Global Step: 705770 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:01:43,027-Speed 2497.49 samples/sec Loss 1.1587 LearningRate 0.000027 Epoch: 34 Global Step: 705780 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:01:51,174-Speed 2514.23 samples/sec Loss 1.1895 LearningRate 0.000027 Epoch: 34 Global Step: 705790 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:01:59,379-Speed 2496.51 samples/sec Loss 1.1585 LearningRate 0.000027 Epoch: 34 Global Step: 705800 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:02:07,583-Speed 2496.69 samples/sec Loss 1.1647 LearningRate 0.000027 Epoch: 34 Global Step: 705810 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:02:15,783-Speed 2498.28 samples/sec Loss 1.1163 LearningRate 0.000027 Epoch: 34 Global Step: 705820 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:02:23,985-Speed 2497.77 samples/sec Loss 1.2006 LearningRate 0.000027 Epoch: 34 Global Step: 705830 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:02:32,192-Speed 2495.83 samples/sec Loss 1.1636 LearningRate 0.000027 Epoch: 34 Global Step: 705840 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:02:40,338-Speed 2514.63 samples/sec Loss 1.1935 LearningRate 0.000027 Epoch: 34 Global Step: 705850 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:02:48,543-Speed 2496.28 samples/sec Loss 1.1333 LearningRate 0.000027 Epoch: 34 Global Step: 705860 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:02:56,754-Speed 2494.67 samples/sec Loss 1.1385 LearningRate 0.000027 Epoch: 34 Global Step: 705870 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:03:04,954-Speed 2498.00 samples/sec Loss 1.1773 LearningRate 0.000027 Epoch: 34 Global Step: 705880 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:03:13,162-Speed 2495.29 samples/sec Loss 1.1391 LearningRate 0.000027 Epoch: 34 Global Step: 705890 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:03:21,376-Speed 2493.82 samples/sec Loss 1.1346 LearningRate 0.000027 Epoch: 34 Global Step: 705900 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:03:29,524-Speed 2514.00 samples/sec Loss 1.1926 LearningRate 0.000027 Epoch: 34 Global Step: 705910 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:03:37,728-Speed 2496.60 samples/sec Loss 1.1797 LearningRate 0.000027 Epoch: 34 Global Step: 705920 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:03:45,931-Speed 2497.13 samples/sec Loss 1.1512 LearningRate 0.000027 Epoch: 34 Global Step: 705930 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:03:54,131-Speed 2497.81 samples/sec Loss 1.1819 LearningRate 0.000027 Epoch: 34 Global Step: 705940 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:04:02,336-Speed 2496.55 samples/sec Loss 1.1558 LearningRate 0.000027 Epoch: 34 Global Step: 705950 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:04:10,538-Speed 2497.32 samples/sec Loss 1.1397 LearningRate 0.000027 Epoch: 34 Global Step: 705960 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:04:18,686-Speed 2513.74 samples/sec Loss 1.2015 LearningRate 0.000027 Epoch: 34 Global Step: 705970 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:04:26,889-Speed 2497.28 samples/sec Loss 1.1497 LearningRate 0.000027 Epoch: 34 Global Step: 705980 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:04:35,090-Speed 2497.53 samples/sec Loss 1.1725 LearningRate 0.000027 Epoch: 34 Global Step: 705990 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:04:43,304-Speed 2493.79 samples/sec Loss 1.1839 LearningRate 0.000027 Epoch: 34 Global Step: 706000 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:04:51,511-Speed 2495.87 samples/sec Loss 1.1998 LearningRate 0.000027 Epoch: 34 Global Step: 706010 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:04:59,713-Speed 2497.51 samples/sec Loss 1.1791 LearningRate 0.000027 Epoch: 34 Global Step: 706020 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:05:07,864-Speed 2513.05 samples/sec Loss 1.1842 LearningRate 0.000027 Epoch: 34 Global Step: 706030 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:05:16,064-Speed 2497.84 samples/sec Loss 1.1610 LearningRate 0.000027 Epoch: 34 Global Step: 706040 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:05:24,266-Speed 2497.68 samples/sec Loss 1.1681 LearningRate 0.000027 Epoch: 34 Global Step: 706050 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:05:32,478-Speed 2494.42 samples/sec Loss 1.1744 LearningRate 0.000027 Epoch: 34 Global Step: 706060 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:05:40,680-Speed 2497.09 samples/sec Loss 1.1923 LearningRate 0.000027 Epoch: 34 Global Step: 706070 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:05:48,883-Speed 2497.12 samples/sec Loss 1.1613 LearningRate 0.000027 Epoch: 34 Global Step: 706080 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:05:57,029-Speed 2514.39 samples/sec Loss 1.1647 LearningRate 0.000027 Epoch: 34 Global Step: 706090 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:06:05,230-Speed 2497.84 samples/sec Loss 1.1913 LearningRate 0.000027 Epoch: 34 Global Step: 706100 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:06:13,427-Speed 2498.75 samples/sec Loss 1.1842 LearningRate 0.000027 Epoch: 34 Global Step: 706110 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:06:21,628-Speed 2497.37 samples/sec Loss 1.1702 LearningRate 0.000027 Epoch: 34 Global Step: 706120 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:06:29,848-Speed 2491.92 samples/sec Loss 1.1365 LearningRate 0.000027 Epoch: 34 Global Step: 706130 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:06:38,048-Speed 2498.04 samples/sec Loss 1.1511 LearningRate 0.000027 Epoch: 34 Global Step: 706140 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:06:46,198-Speed 2513.28 samples/sec Loss 1.1844 LearningRate 0.000027 Epoch: 34 Global Step: 706150 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:06:54,398-Speed 2497.89 samples/sec Loss 1.1617 LearningRate 0.000027 Epoch: 34 Global Step: 706160 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:07:02,603-Speed 2497.32 samples/sec Loss 1.1949 LearningRate 0.000027 Epoch: 34 Global Step: 706170 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:07:10,800-Speed 2498.98 samples/sec Loss 1.1855 LearningRate 0.000027 Epoch: 34 Global Step: 706180 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:07:19,010-Speed 2494.47 samples/sec Loss 1.1716 LearningRate 0.000027 Epoch: 34 Global Step: 706190 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:07:27,210-Speed 2498.27 samples/sec Loss 1.1467 LearningRate 0.000027 Epoch: 34 Global Step: 706200 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:07:35,358-Speed 2513.89 samples/sec Loss 1.1762 LearningRate 0.000027 Epoch: 34 Global Step: 706210 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:07:43,559-Speed 2497.75 samples/sec Loss 1.1833 LearningRate 0.000027 Epoch: 34 Global Step: 706220 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:07:51,765-Speed 2496.13 samples/sec Loss 1.1666 LearningRate 0.000027 Epoch: 34 Global Step: 706230 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:07:59,966-Speed 2497.50 samples/sec Loss 1.1282 LearningRate 0.000027 Epoch: 34 Global Step: 706240 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:08:08,169-Speed 2497.04 samples/sec Loss 1.1378 LearningRate 0.000027 Epoch: 34 Global Step: 706250 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:08:16,371-Speed 2497.58 samples/sec Loss 1.1392 LearningRate 0.000027 Epoch: 34 Global Step: 706260 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:08:24,519-Speed 2513.83 samples/sec Loss 1.1880 LearningRate 0.000027 Epoch: 34 Global Step: 706270 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:08:32,724-Speed 2496.63 samples/sec Loss 1.1807 LearningRate 0.000027 Epoch: 34 Global Step: 706280 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:08:40,926-Speed 2497.45 samples/sec Loss 1.1645 LearningRate 0.000027 Epoch: 34 Global Step: 706290 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:08:49,128-Speed 2497.52 samples/sec Loss 1.1689 LearningRate 0.000027 Epoch: 34 Global Step: 706300 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:08:57,327-Speed 2498.08 samples/sec Loss 1.1645 LearningRate 0.000027 Epoch: 34 Global Step: 706310 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:09:05,530-Speed 2496.78 samples/sec Loss 1.1297 LearningRate 0.000027 Epoch: 34 Global Step: 706320 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:09:13,684-Speed 2512.20 samples/sec Loss 1.1683 LearningRate 0.000027 Epoch: 34 Global Step: 706330 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:09:21,909-Speed 2490.48 samples/sec Loss 1.1464 LearningRate 0.000027 Epoch: 34 Global Step: 706340 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:09:30,114-Speed 2496.54 samples/sec Loss 1.2003 LearningRate 0.000027 Epoch: 34 Global Step: 706350 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:09:38,311-Speed 2498.59 samples/sec Loss 1.1657 LearningRate 0.000027 Epoch: 34 Global Step: 706360 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:09:46,517-Speed 2496.29 samples/sec Loss 1.1735 LearningRate 0.000027 Epoch: 34 Global Step: 706370 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:09:54,719-Speed 2497.09 samples/sec Loss 1.1593 LearningRate 0.000027 Epoch: 34 Global Step: 706380 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:10:02,867-Speed 2513.73 samples/sec Loss 1.1776 LearningRate 0.000027 Epoch: 34 Global Step: 706390 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:10:11,068-Speed 2497.92 samples/sec Loss 1.1711 LearningRate 0.000027 Epoch: 34 Global Step: 706400 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:10:19,270-Speed 2497.33 samples/sec Loss 1.1556 LearningRate 0.000027 Epoch: 34 Global Step: 706410 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:10:27,470-Speed 2497.66 samples/sec Loss 1.1392 LearningRate 0.000027 Epoch: 34 Global Step: 706420 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:10:35,674-Speed 2496.98 samples/sec Loss 1.1536 LearningRate 0.000027 Epoch: 34 Global Step: 706430 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:10:43,876-Speed 2497.25 samples/sec Loss 1.1638 LearningRate 0.000027 Epoch: 34 Global Step: 706440 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:10:52,023-Speed 2514.11 samples/sec Loss 1.1455 LearningRate 0.000027 Epoch: 34 Global Step: 706450 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:11:00,229-Speed 2496.27 samples/sec Loss 1.1272 LearningRate 0.000027 Epoch: 34 Global Step: 706460 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:11:08,431-Speed 2497.14 samples/sec Loss 1.1710 LearningRate 0.000027 Epoch: 34 Global Step: 706470 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:11:16,634-Speed 2497.06 samples/sec Loss 1.1607 LearningRate 0.000027 Epoch: 34 Global Step: 706480 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:11:24,838-Speed 2496.93 samples/sec Loss 1.1512 LearningRate 0.000027 Epoch: 34 Global Step: 706490 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:11:33,038-Speed 2497.89 samples/sec Loss 1.1327 LearningRate 0.000027 Epoch: 34 Global Step: 706500 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:11:41,187-Speed 2513.69 samples/sec Loss 1.1618 LearningRate 0.000027 Epoch: 34 Global Step: 706510 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:11:49,385-Speed 2498.42 samples/sec Loss 1.1581 LearningRate 0.000027 Epoch: 34 Global Step: 706520 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:11:57,587-Speed 2497.39 samples/sec Loss 1.1527 LearningRate 0.000027 Epoch: 34 Global Step: 706530 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:12:05,788-Speed 2497.79 samples/sec Loss 1.1362 LearningRate 0.000027 Epoch: 34 Global Step: 706540 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:12:13,990-Speed 2497.26 samples/sec Loss 1.1743 LearningRate 0.000027 Epoch: 34 Global Step: 706550 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:12:22,189-Speed 2498.18 samples/sec Loss 1.1571 LearningRate 0.000027 Epoch: 34 Global Step: 706560 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:12:30,337-Speed 2514.00 samples/sec Loss 1.1496 LearningRate 0.000027 Epoch: 34 Global Step: 706570 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:12:38,537-Speed 2497.77 samples/sec Loss 1.1800 LearningRate 0.000027 Epoch: 34 Global Step: 706580 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:12:46,742-Speed 2496.45 samples/sec Loss 1.1447 LearningRate 0.000027 Epoch: 34 Global Step: 706590 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:12:54,941-Speed 2498.04 samples/sec Loss 1.1395 LearningRate 0.000027 Epoch: 34 Global Step: 706600 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:13:03,141-Speed 2497.95 samples/sec Loss 1.1503 LearningRate 0.000027 Epoch: 34 Global Step: 706610 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:13:11,346-Speed 2496.31 samples/sec Loss 1.1645 LearningRate 0.000027 Epoch: 34 Global Step: 706620 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:13:19,506-Speed 2510.25 samples/sec Loss 1.1753 LearningRate 0.000027 Epoch: 34 Global Step: 706630 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:13:27,708-Speed 2497.29 samples/sec Loss 1.1776 LearningRate 0.000027 Epoch: 34 Global Step: 706640 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:13:35,909-Speed 2497.74 samples/sec Loss 1.1551 LearningRate 0.000027 Epoch: 34 Global Step: 706650 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:13:44,112-Speed 2497.20 samples/sec Loss 1.1322 LearningRate 0.000027 Epoch: 34 Global Step: 706660 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:13:52,315-Speed 2496.79 samples/sec Loss 1.1468 LearningRate 0.000027 Epoch: 34 Global Step: 706670 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:14:00,517-Speed 2497.52 samples/sec Loss 1.1532 LearningRate 0.000027 Epoch: 34 Global Step: 706680 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:14:08,666-Speed 2513.54 samples/sec Loss 1.1411 LearningRate 0.000027 Epoch: 34 Global Step: 706690 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:14:16,866-Speed 2497.83 samples/sec Loss 1.1803 LearningRate 0.000027 Epoch: 34 Global Step: 706700 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:14:25,075-Speed 2495.52 samples/sec Loss 1.1603 LearningRate 0.000027 Epoch: 34 Global Step: 706710 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:14:33,278-Speed 2496.94 samples/sec Loss 1.1411 LearningRate 0.000027 Epoch: 34 Global Step: 706720 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:14:41,482-Speed 2496.85 samples/sec Loss 1.1666 LearningRate 0.000027 Epoch: 34 Global Step: 706730 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:14:49,694-Speed 2493.97 samples/sec Loss 1.1626 LearningRate 0.000027 Epoch: 34 Global Step: 706740 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:14:57,848-Speed 2512.27 samples/sec Loss 1.1544 LearningRate 0.000027 Epoch: 34 Global Step: 706750 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:15:06,051-Speed 2496.89 samples/sec Loss 1.1855 LearningRate 0.000027 Epoch: 34 Global Step: 706760 Fp16 Grad Scale: 32768 Required: 28 hours
Training: 2022-07-12 08:15:14,254-Speed 2496.87 samples/sec Loss 1.1669 LearningRate 0.000027 Epoch: 34 Global Step: 706770 Fp16 Grad Scale: 32768 Required: 28 hours
Training: 2022-07-12 08:15:22,455-Speed 2497.95 samples/sec Loss 1.1398 LearningRate 0.000027 Epoch: 34 Global Step: 706780 Fp16 Grad Scale: 32768 Required: 28 hours
Training: 2022-07-12 08:15:30,658-Speed 2497.18 samples/sec Loss 1.1574 LearningRate 0.000027 Epoch: 34 Global Step: 706790 Fp16 Grad Scale: 32768 Required: 28 hours
Training: 2022-07-12 08:15:38,860-Speed 2497.19 samples/sec Loss 1.1381 LearningRate 0.000027 Epoch: 34 Global Step: 706800 Fp16 Grad Scale: 32768 Required: 28 hours
Training: 2022-07-12 08:15:47,011-Speed 2512.97 samples/sec Loss 1.1449 LearningRate 0.000027 Epoch: 34 Global Step: 706810 Fp16 Grad Scale: 32768 Required: 28 hours
Training: 2022-07-12 08:15:55,211-Speed 2497.71 samples/sec Loss 1.1359 LearningRate 0.000027 Epoch: 34 Global Step: 706820 Fp16 Grad Scale: 32768 Required: 28 hours
Training: 2022-07-12 08:16:03,414-Speed 2497.09 samples/sec Loss 1.1402 LearningRate 0.000027 Epoch: 34 Global Step: 706830 Fp16 Grad Scale: 32768 Required: 28 hours
Training: 2022-07-12 08:16:11,619-Speed 2496.39 samples/sec Loss 1.1867 LearningRate 0.000027 Epoch: 34 Global Step: 706840 Fp16 Grad Scale: 32768 Required: 28 hours
Training: 2022-07-12 08:16:19,819-Speed 2497.83 samples/sec Loss 1.1413 LearningRate 0.000027 Epoch: 34 Global Step: 706850 Fp16 Grad Scale: 32768 Required: 28 hours
Training: 2022-07-12 08:16:27,982-Speed 2509.76 samples/sec Loss 1.1710 LearningRate 0.000027 Epoch: 34 Global Step: 706860 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:16:36,129-Speed 2514.12 samples/sec Loss 1.1564 LearningRate 0.000027 Epoch: 34 Global Step: 706870 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:16:44,330-Speed 2497.67 samples/sec Loss 1.1558 LearningRate 0.000027 Epoch: 34 Global Step: 706880 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:16:52,530-Speed 2497.78 samples/sec Loss 1.1365 LearningRate 0.000027 Epoch: 34 Global Step: 706890 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:17:00,734-Speed 2497.12 samples/sec Loss 1.1539 LearningRate 0.000027 Epoch: 34 Global Step: 706900 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:17:08,949-Speed 2493.38 samples/sec Loss 1.1414 LearningRate 0.000027 Epoch: 34 Global Step: 706910 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:17:17,150-Speed 2497.61 samples/sec Loss 1.1806 LearningRate 0.000027 Epoch: 34 Global Step: 706920 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:17:25,298-Speed 2513.88 samples/sec Loss 1.1693 LearningRate 0.000027 Epoch: 34 Global Step: 706930 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:17:33,501-Speed 2497.22 samples/sec Loss 1.1561 LearningRate 0.000027 Epoch: 34 Global Step: 706940 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:17:41,702-Speed 2497.39 samples/sec Loss 1.1985 LearningRate 0.000027 Epoch: 34 Global Step: 706950 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:17:49,904-Speed 2497.49 samples/sec Loss 1.1408 LearningRate 0.000027 Epoch: 34 Global Step: 706960 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:17:58,109-Speed 2496.41 samples/sec Loss 1.1757 LearningRate 0.000027 Epoch: 34 Global Step: 706970 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:18:06,307-Speed 2498.69 samples/sec Loss 1.1754 LearningRate 0.000027 Epoch: 34 Global Step: 706980 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:18:14,458-Speed 2512.77 samples/sec Loss 1.1485 LearningRate 0.000027 Epoch: 34 Global Step: 706990 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:18:22,661-Speed 2497.48 samples/sec Loss 1.1436 LearningRate 0.000027 Epoch: 34 Global Step: 707000 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:18:30,861-Speed 2497.51 samples/sec Loss 1.1596 LearningRate 0.000027 Epoch: 34 Global Step: 707010 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:18:39,074-Speed 2494.06 samples/sec Loss 1.0991 LearningRate 0.000027 Epoch: 34 Global Step: 707020 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:18:47,279-Speed 2496.64 samples/sec Loss 1.1526 LearningRate 0.000027 Epoch: 34 Global Step: 707030 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:18:55,479-Speed 2497.90 samples/sec Loss 1.1692 LearningRate 0.000027 Epoch: 34 Global Step: 707040 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:19:03,628-Speed 2513.58 samples/sec Loss 1.1761 LearningRate 0.000027 Epoch: 34 Global Step: 707050 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:19:11,830-Speed 2497.42 samples/sec Loss 1.1308 LearningRate 0.000027 Epoch: 34 Global Step: 707060 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:19:20,045-Speed 2493.39 samples/sec Loss 1.1753 LearningRate 0.000027 Epoch: 34 Global Step: 707070 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:19:28,252-Speed 2495.84 samples/sec Loss 1.1710 LearningRate 0.000027 Epoch: 34 Global Step: 707080 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:19:36,454-Speed 2497.19 samples/sec Loss 1.1896 LearningRate 0.000027 Epoch: 34 Global Step: 707090 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:19:44,654-Speed 2498.13 samples/sec Loss 1.1653 LearningRate 0.000027 Epoch: 34 Global Step: 707100 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:19:52,804-Speed 2513.37 samples/sec Loss 1.1629 LearningRate 0.000027 Epoch: 34 Global Step: 707110 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:20:01,019-Speed 2493.26 samples/sec Loss 1.1667 LearningRate 0.000027 Epoch: 34 Global Step: 707120 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:20:09,233-Speed 2493.65 samples/sec Loss 1.1856 LearningRate 0.000027 Epoch: 34 Global Step: 707130 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:20:17,435-Speed 2497.33 samples/sec Loss 1.1492 LearningRate 0.000027 Epoch: 34 Global Step: 707140 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:20:25,637-Speed 2497.62 samples/sec Loss 1.1541 LearningRate 0.000027 Epoch: 34 Global Step: 707150 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:20:33,838-Speed 2497.68 samples/sec Loss 1.1467 LearningRate 0.000027 Epoch: 34 Global Step: 707160 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:20:41,999-Speed 2509.93 samples/sec Loss 1.1709 LearningRate 0.000027 Epoch: 34 Global Step: 707170 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:20:50,199-Speed 2497.81 samples/sec Loss 1.1664 LearningRate 0.000027 Epoch: 34 Global Step: 707180 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:20:58,407-Speed 2495.57 samples/sec Loss 1.1735 LearningRate 0.000027 Epoch: 34 Global Step: 707190 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:21:06,608-Speed 2497.49 samples/sec Loss 1.1535 LearningRate 0.000027 Epoch: 34 Global Step: 707200 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:21:14,807-Speed 2498.24 samples/sec Loss 1.1644 LearningRate 0.000027 Epoch: 34 Global Step: 707210 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:21:23,010-Speed 2499.27 samples/sec Loss 1.1947 LearningRate 0.000027 Epoch: 34 Global Step: 707220 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:21:31,157-Speed 2514.20 samples/sec Loss 1.1530 LearningRate 0.000027 Epoch: 34 Global Step: 707230 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:21:39,358-Speed 2497.60 samples/sec Loss 1.1484 LearningRate 0.000027 Epoch: 34 Global Step: 707240 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:21:47,561-Speed 2496.89 samples/sec Loss 1.1680 LearningRate 0.000027 Epoch: 34 Global Step: 707250 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:21:55,764-Speed 2496.91 samples/sec Loss 1.1448 LearningRate 0.000027 Epoch: 34 Global Step: 707260 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:22:03,962-Speed 2498.51 samples/sec Loss 1.1804 LearningRate 0.000027 Epoch: 34 Global Step: 707270 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:22:12,163-Speed 2497.60 samples/sec Loss 1.1733 LearningRate 0.000027 Epoch: 34 Global Step: 707280 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:22:20,314-Speed 2513.18 samples/sec Loss 1.1592 LearningRate 0.000027 Epoch: 34 Global Step: 707290 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:22:28,517-Speed 2497.25 samples/sec Loss 1.1347 LearningRate 0.000027 Epoch: 34 Global Step: 707300 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:22:36,717-Speed 2497.81 samples/sec Loss 1.1683 LearningRate 0.000027 Epoch: 34 Global Step: 707310 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:22:44,916-Speed 2498.22 samples/sec Loss 1.1579 LearningRate 0.000027 Epoch: 34 Global Step: 707320 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:22:53,117-Speed 2497.43 samples/sec Loss 1.1922 LearningRate 0.000027 Epoch: 34 Global Step: 707330 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:23:01,325-Speed 2495.68 samples/sec Loss 1.1620 LearningRate 0.000027 Epoch: 34 Global Step: 707340 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:23:09,474-Speed 2513.49 samples/sec Loss 1.1597 LearningRate 0.000027 Epoch: 34 Global Step: 707350 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:23:17,675-Speed 2497.65 samples/sec Loss 1.1572 LearningRate 0.000027 Epoch: 34 Global Step: 707360 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:23:25,877-Speed 2497.48 samples/sec Loss 1.1507 LearningRate 0.000027 Epoch: 34 Global Step: 707370 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:23:34,075-Speed 2498.50 samples/sec Loss 1.1535 LearningRate 0.000027 Epoch: 34 Global Step: 707380 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-07-12 08:23:42,233-Speed 2510.90 samples/sec Loss 1.1708 LearningRate 0.000027 Epoch: 34 Global Step: 707390 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:23:50,434-Speed 2497.66 samples/sec Loss 1.1432 LearningRate 0.000027 Epoch: 34 Global Step: 707400 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:23:58,582-Speed 2513.84 samples/sec Loss 1.1584 LearningRate 0.000027 Epoch: 34 Global Step: 707410 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:24:06,784-Speed 2497.58 samples/sec Loss 1.1551 LearningRate 0.000027 Epoch: 34 Global Step: 707420 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:24:14,985-Speed 2497.48 samples/sec Loss 1.1506 LearningRate 0.000027 Epoch: 34 Global Step: 707430 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:24:23,188-Speed 2497.24 samples/sec Loss 1.1445 LearningRate 0.000027 Epoch: 34 Global Step: 707440 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:24:31,394-Speed 2496.30 samples/sec Loss 1.1671 LearningRate 0.000027 Epoch: 34 Global Step: 707450 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:24:39,598-Speed 2496.67 samples/sec Loss 1.1396 LearningRate 0.000027 Epoch: 34 Global Step: 707460 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:24:47,748-Speed 2513.09 samples/sec Loss 1.1767 LearningRate 0.000027 Epoch: 34 Global Step: 707470 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:24:55,954-Speed 2495.94 samples/sec Loss 1.1539 LearningRate 0.000027 Epoch: 34 Global Step: 707480 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:25:04,157-Speed 2497.28 samples/sec Loss 1.1405 LearningRate 0.000027 Epoch: 34 Global Step: 707490 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:25:12,360-Speed 2497.13 samples/sec Loss 1.1612 LearningRate 0.000027 Epoch: 34 Global Step: 707500 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:25:20,562-Speed 2497.24 samples/sec Loss 1.1443 LearningRate 0.000027 Epoch: 34 Global Step: 707510 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:25:28,765-Speed 2497.13 samples/sec Loss 1.1849 LearningRate 0.000027 Epoch: 34 Global Step: 707520 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:25:36,912-Speed 2514.02 samples/sec Loss 1.1425 LearningRate 0.000027 Epoch: 34 Global Step: 707530 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:25:45,115-Speed 2496.95 samples/sec Loss 1.1104 LearningRate 0.000027 Epoch: 34 Global Step: 707540 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:25:53,321-Speed 2496.31 samples/sec Loss 1.1909 LearningRate 0.000027 Epoch: 34 Global Step: 707550 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:26:01,524-Speed 2497.12 samples/sec Loss 1.1281 LearningRate 0.000027 Epoch: 34 Global Step: 707560 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:26:09,729-Speed 2496.38 samples/sec Loss 1.1583 LearningRate 0.000027 Epoch: 34 Global Step: 707570 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:26:17,931-Speed 2497.27 samples/sec Loss 1.1558 LearningRate 0.000027 Epoch: 34 Global Step: 707580 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:26:26,084-Speed 2512.52 samples/sec Loss 1.1749 LearningRate 0.000027 Epoch: 34 Global Step: 707590 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:26:34,285-Speed 2497.46 samples/sec Loss 1.1577 LearningRate 0.000027 Epoch: 34 Global Step: 707600 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:26:42,484-Speed 2498.42 samples/sec Loss 1.1464 LearningRate 0.000027 Epoch: 34 Global Step: 707610 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:26:50,680-Speed 2498.98 samples/sec Loss 1.1647 LearningRate 0.000027 Epoch: 34 Global Step: 707620 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:26:58,877-Speed 2498.77 samples/sec Loss 1.1688 LearningRate 0.000027 Epoch: 34 Global Step: 707630 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:27:07,081-Speed 2496.93 samples/sec Loss 1.1528 LearningRate 0.000027 Epoch: 34 Global Step: 707640 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:27:15,246-Speed 2508.72 samples/sec Loss 1.1622 LearningRate 0.000027 Epoch: 34 Global Step: 707650 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:27:23,448-Speed 2497.25 samples/sec Loss 1.1984 LearningRate 0.000027 Epoch: 34 Global Step: 707660 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:27:31,673-Speed 2490.75 samples/sec Loss 1.1689 LearningRate 0.000027 Epoch: 34 Global Step: 707670 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:27:39,882-Speed 2495.19 samples/sec Loss 1.1310 LearningRate 0.000027 Epoch: 34 Global Step: 707680 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:27:48,084-Speed 2497.53 samples/sec Loss 1.1787 LearningRate 0.000027 Epoch: 34 Global Step: 707690 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:27:56,287-Speed 2496.92 samples/sec Loss 1.1459 LearningRate 0.000027 Epoch: 34 Global Step: 707700 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:28:04,432-Speed 2514.86 samples/sec Loss 1.1759 LearningRate 0.000027 Epoch: 34 Global Step: 707710 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:28:12,632-Speed 2497.93 samples/sec Loss 1.1685 LearningRate 0.000027 Epoch: 34 Global Step: 707720 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:28:20,831-Speed 2498.08 samples/sec Loss 1.1403 LearningRate 0.000027 Epoch: 34 Global Step: 707730 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:28:29,035-Speed 2496.81 samples/sec Loss 1.1472 LearningRate 0.000027 Epoch: 34 Global Step: 707740 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:28:37,235-Speed 2497.77 samples/sec Loss 1.1577 LearningRate 0.000027 Epoch: 34 Global Step: 707750 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:28:45,446-Speed 2494.48 samples/sec Loss 1.1318 LearningRate 0.000027 Epoch: 34 Global Step: 707760 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:28:53,595-Speed 2513.58 samples/sec Loss 1.1541 LearningRate 0.000027 Epoch: 34 Global Step: 707770 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:29:01,792-Speed 2498.94 samples/sec Loss 1.1825 LearningRate 0.000027 Epoch: 34 Global Step: 707780 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:29:09,995-Speed 2497.22 samples/sec Loss 1.1544 LearningRate 0.000027 Epoch: 34 Global Step: 707790 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:29:18,202-Speed 2496.14 samples/sec Loss 1.1564 LearningRate 0.000027 Epoch: 34 Global Step: 707800 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:29:26,403-Speed 2497.46 samples/sec Loss 1.1448 LearningRate 0.000027 Epoch: 34 Global Step: 707810 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:29:34,607-Speed 2496.82 samples/sec Loss 1.1525 LearningRate 0.000027 Epoch: 34 Global Step: 707820 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:29:42,753-Speed 2514.38 samples/sec Loss 1.1949 LearningRate 0.000027 Epoch: 34 Global Step: 707830 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:29:50,953-Speed 2498.31 samples/sec Loss 1.1445 LearningRate 0.000027 Epoch: 34 Global Step: 707840 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:29:59,152-Speed 2498.15 samples/sec Loss 1.1331 LearningRate 0.000027 Epoch: 34 Global Step: 707850 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:30:07,356-Speed 2496.74 samples/sec Loss 1.1245 LearningRate 0.000027 Epoch: 34 Global Step: 707860 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:30:15,574-Speed 2492.67 samples/sec Loss 1.1688 LearningRate 0.000027 Epoch: 34 Global Step: 707870 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:30:23,776-Speed 2497.11 samples/sec Loss 1.1834 LearningRate 0.000027 Epoch: 34 Global Step: 707880 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:30:31,923-Speed 2514.34 samples/sec Loss 1.1787 LearningRate 0.000027 Epoch: 34 Global Step: 707890 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:30:40,123-Speed 2497.90 samples/sec Loss 1.1634 LearningRate 0.000027 Epoch: 34 Global Step: 707900 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:30:48,328-Speed 2496.40 samples/sec Loss 1.1454 LearningRate 0.000027 Epoch: 34 Global Step: 707910 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:30:56,529-Speed 2497.84 samples/sec Loss 1.1556 LearningRate 0.000027 Epoch: 34 Global Step: 707920 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:31:04,733-Speed 2496.63 samples/sec Loss 1.1444 LearningRate 0.000027 Epoch: 34 Global Step: 707930 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:31:12,934-Speed 2498.10 samples/sec Loss 1.1894 LearningRate 0.000027 Epoch: 34 Global Step: 707940 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:31:21,081-Speed 2514.34 samples/sec Loss 1.1342 LearningRate 0.000027 Epoch: 34 Global Step: 707950 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:31:29,284-Speed 2497.22 samples/sec Loss 1.1617 LearningRate 0.000027 Epoch: 34 Global Step: 707960 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:31:37,489-Speed 2496.35 samples/sec Loss 1.1398 LearningRate 0.000027 Epoch: 34 Global Step: 707970 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:31:45,702-Speed 2494.13 samples/sec Loss 1.1457 LearningRate 0.000027 Epoch: 34 Global Step: 707980 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:31:53,904-Speed 2497.28 samples/sec Loss 1.1545 LearningRate 0.000027 Epoch: 34 Global Step: 707990 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:32:02,103-Speed 2498.19 samples/sec Loss 1.1843 LearningRate 0.000027 Epoch: 34 Global Step: 708000 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:32:10,253-Speed 2513.32 samples/sec Loss 1.1363 LearningRate 0.000027 Epoch: 34 Global Step: 708010 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:32:18,452-Speed 2498.38 samples/sec Loss 1.1583 LearningRate 0.000027 Epoch: 34 Global Step: 708020 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:32:26,654-Speed 2497.41 samples/sec Loss 1.1536 LearningRate 0.000026 Epoch: 34 Global Step: 708030 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:32:34,855-Speed 2497.52 samples/sec Loss 1.1470 LearningRate 0.000026 Epoch: 34 Global Step: 708040 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:32:43,053-Speed 2498.45 samples/sec Loss 1.1347 LearningRate 0.000026 Epoch: 34 Global Step: 708050 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:32:51,257-Speed 2496.73 samples/sec Loss 1.1470 LearningRate 0.000026 Epoch: 34 Global Step: 708060 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:32:59,421-Speed 2510.94 samples/sec Loss 1.1639 LearningRate 0.000026 Epoch: 34 Global Step: 708070 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-07-12 08:33:07,621-Speed 2497.90 samples/sec Loss 1.1729 LearningRate 0.000026 Epoch: 34 
Global Step: 708080 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:33:15,821-Speed 2497.69 samples/sec Loss 1.1545 LearningRate 0.000026 Epoch: 34 Global Step: 708090 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:33:24,021-Speed 2498.42 samples/sec Loss 1.1527 LearningRate 0.000026 Epoch: 34 Global Step: 708100 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:33:32,220-Speed 2498.12 samples/sec Loss 1.1745 LearningRate 0.000026 Epoch: 34 Global Step: 708110 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:33:40,425-Speed 2496.54 samples/sec Loss 1.1667 LearningRate 0.000026 Epoch: 34 Global Step: 708120 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:33:48,571-Speed 2514.59 samples/sec Loss 1.1432 LearningRate 0.000026 Epoch: 34 Global Step: 708130 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:33:56,777-Speed 2496.41 samples/sec Loss 1.1358 LearningRate 0.000026 Epoch: 34 Global Step: 708140 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:34:05,000-Speed 2491.04 samples/sec Loss 1.1701 LearningRate 0.000026 Epoch: 34 Global Step: 708150 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:34:13,211-Speed 2494.48 samples/sec Loss 1.1668 LearningRate 0.000026 Epoch: 34 Global Step: 708160 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:34:21,413-Speed 2497.63 samples/sec Loss 1.1434 LearningRate 0.000026 Epoch: 34 Global Step: 708170 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:34:29,618-Speed 2496.30 samples/sec Loss 1.1990 LearningRate 0.000026 Epoch: 34 Global Step: 708180 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:34:37,764-Speed 2514.59 samples/sec Loss 1.1694 LearningRate 0.000026 Epoch: 34 Global Step: 708190 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:34:45,963-Speed 2498.05 samples/sec Loss 1.1617 LearningRate 0.000026 Epoch: 34 Global Step: 708200 
Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:34:54,164-Speed 2497.76 samples/sec Loss 1.1651 LearningRate 0.000026 Epoch: 34 Global Step: 708210 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:35:02,365-Speed 2497.51 samples/sec Loss 1.1599 LearningRate 0.000026 Epoch: 34 Global Step: 708220 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:35:10,566-Speed 2497.73 samples/sec Loss 1.0933 LearningRate 0.000026 Epoch: 34 Global Step: 708230 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:35:18,766-Speed 2497.94 samples/sec Loss 1.1383 LearningRate 0.000026 Epoch: 34 Global Step: 708240 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:35:26,909-Speed 2515.11 samples/sec Loss 1.1775 LearningRate 0.000026 Epoch: 34 Global Step: 708250 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:35:35,108-Speed 2498.33 samples/sec Loss 1.1850 LearningRate 0.000026 Epoch: 34 Global Step: 708260 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:35:43,308-Speed 2498.16 samples/sec Loss 1.1805 LearningRate 0.000026 Epoch: 34 Global Step: 708270 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:35:51,508-Speed 2497.83 samples/sec Loss 1.1673 LearningRate 0.000026 Epoch: 34 Global Step: 708280 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:35:59,707-Speed 2498.15 samples/sec Loss 1.1551 LearningRate 0.000026 Epoch: 34 Global Step: 708290 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:36:07,906-Speed 2498.02 samples/sec Loss 1.1996 LearningRate 0.000026 Epoch: 34 Global Step: 708300 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:36:16,055-Speed 2513.92 samples/sec Loss 1.1522 LearningRate 0.000026 Epoch: 34 Global Step: 708310 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:36:24,253-Speed 2498.63 samples/sec Loss 1.1799 LearningRate 0.000026 Epoch: 34 Global Step: 708320 Fp16 Grad Scale: 
8192 Required: 28 hours Training: 2022-07-12 08:36:32,453-Speed 2497.64 samples/sec Loss 1.1568 LearningRate 0.000026 Epoch: 34 Global Step: 708330 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:36:40,660-Speed 2495.96 samples/sec Loss 1.1616 LearningRate 0.000026 Epoch: 34 Global Step: 708340 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:36:48,858-Speed 2498.65 samples/sec Loss 1.1611 LearningRate 0.000026 Epoch: 34 Global Step: 708350 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:36:57,062-Speed 2496.62 samples/sec Loss 1.1675 LearningRate 0.000026 Epoch: 34 Global Step: 708360 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:37:05,213-Speed 2513.28 samples/sec Loss 1.1515 LearningRate 0.000026 Epoch: 34 Global Step: 708370 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:37:13,412-Speed 2498.36 samples/sec Loss 1.2070 LearningRate 0.000026 Epoch: 34 Global Step: 708380 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:37:21,616-Speed 2496.41 samples/sec Loss 1.1589 LearningRate 0.000026 Epoch: 34 Global Step: 708390 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:37:29,817-Speed 2497.62 samples/sec Loss 1.1524 LearningRate 0.000026 Epoch: 34 Global Step: 708400 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:37:38,018-Speed 2497.72 samples/sec Loss 1.1580 LearningRate 0.000026 Epoch: 34 Global Step: 708410 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:37:46,219-Speed 2498.07 samples/sec Loss 1.1590 LearningRate 0.000026 Epoch: 34 Global Step: 708420 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:37:54,361-Speed 2515.46 samples/sec Loss 1.1488 LearningRate 0.000026 Epoch: 34 Global Step: 708430 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:38:02,563-Speed 2497.46 samples/sec Loss 1.1513 LearningRate 0.000026 Epoch: 34 Global Step: 708440 Fp16 Grad Scale: 8192 Required: 28 
hours Training: 2022-07-12 08:38:10,765-Speed 2497.34 samples/sec Loss 1.1526 LearningRate 0.000026 Epoch: 34 Global Step: 708450 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:38:18,969-Speed 2497.20 samples/sec Loss 1.1579 LearningRate 0.000026 Epoch: 34 Global Step: 708460 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:38:27,170-Speed 2497.38 samples/sec Loss 1.1729 LearningRate 0.000026 Epoch: 34 Global Step: 708470 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:38:35,379-Speed 2495.38 samples/sec Loss 1.1784 LearningRate 0.000026 Epoch: 34 Global Step: 708480 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:38:43,533-Speed 2512.19 samples/sec Loss 1.1424 LearningRate 0.000026 Epoch: 34 Global Step: 708490 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:38:51,736-Speed 2497.03 samples/sec Loss 1.1791 LearningRate 0.000026 Epoch: 34 Global Step: 708500 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:38:59,943-Speed 2495.67 samples/sec Loss 1.1528 LearningRate 0.000026 Epoch: 34 Global Step: 708510 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:39:08,147-Speed 2496.74 samples/sec Loss 1.1641 LearningRate 0.000026 Epoch: 34 Global Step: 708520 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:39:16,350-Speed 2496.97 samples/sec Loss 1.1651 LearningRate 0.000026 Epoch: 34 Global Step: 708530 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:39:24,551-Speed 2497.52 samples/sec Loss 1.1319 LearningRate 0.000026 Epoch: 34 Global Step: 708540 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:39:32,701-Speed 2513.24 samples/sec Loss 1.1550 LearningRate 0.000026 Epoch: 34 Global Step: 708550 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:39:40,904-Speed 2497.20 samples/sec Loss 1.1504 LearningRate 0.000026 Epoch: 34 Global Step: 708560 Fp16 Grad Scale: 8192 Required: 28 hours Training: 
2022-07-12 08:39:49,104-Speed 2497.91 samples/sec Loss 1.1630 LearningRate 0.000026 Epoch: 34 Global Step: 708570 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:39:57,302-Speed 2498.68 samples/sec Loss 1.1542 LearningRate 0.000026 Epoch: 34 Global Step: 708580 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-07-12 08:40:05,506-Speed 2496.64 samples/sec Loss 1.1582 LearningRate 0.000026 Epoch: 34 Global Step: 708590 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:40:13,708-Speed 2497.45 samples/sec Loss 1.1696 LearningRate 0.000026 Epoch: 34 Global Step: 708600 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:40:21,854-Speed 2514.54 samples/sec Loss 1.1773 LearningRate 0.000026 Epoch: 34 Global Step: 708610 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:40:30,054-Speed 2498.17 samples/sec Loss 1.1802 LearningRate 0.000026 Epoch: 34 Global Step: 708620 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:40:38,254-Speed 2497.94 samples/sec Loss 1.1750 LearningRate 0.000026 Epoch: 34 Global Step: 708630 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:40:46,455-Speed 2497.61 samples/sec Loss 1.1653 LearningRate 0.000026 Epoch: 34 Global Step: 708640 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:40:54,670-Speed 2493.51 samples/sec Loss 1.1074 LearningRate 0.000026 Epoch: 34 Global Step: 708650 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:41:02,887-Speed 2492.85 samples/sec Loss 1.1785 LearningRate 0.000026 Epoch: 34 Global Step: 708660 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:41:11,033-Speed 2514.53 samples/sec Loss 1.1638 LearningRate 0.000026 Epoch: 34 Global Step: 708670 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:41:19,233-Speed 2498.20 samples/sec Loss 1.1900 LearningRate 0.000026 Epoch: 34 Global Step: 708680 Fp16 Grad Scale: 16384 Required: 28 hours Training: 
2022-07-12 08:41:27,548-Speed 2500.10 samples/sec Loss 1.1730 LearningRate 0.000026 Epoch: 34 Global Step: 708690 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:41:35,751-Speed 2496.90 samples/sec Loss 1.1541 LearningRate 0.000026 Epoch: 34 Global Step: 708700 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:41:44,698-Speed 2501.25 samples/sec Loss 1.1522 LearningRate 0.000026 Epoch: 34 Global Step: 708710 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:41:52,958-Speed 2500.08 samples/sec Loss 1.1724 LearningRate 0.000026 Epoch: 34 Global Step: 708720 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:42:01,116-Speed 2515.51 samples/sec Loss 1.1361 LearningRate 0.000026 Epoch: 34 Global Step: 708730 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:42:13,123-Speed 1705.87 samples/sec Loss 1.1678 LearningRate 0.000026 Epoch: 34 Global Step: 708740 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:42:21,316-Speed 2499.79 samples/sec Loss 1.1488 LearningRate 0.000026 Epoch: 34 Global Step: 708750 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:42:29,571-Speed 2499.71 samples/sec Loss 1.1372 LearningRate 0.000026 Epoch: 34 Global Step: 708760 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:42:37,835-Speed 2500.79 samples/sec Loss 1.1674 LearningRate 0.000026 Epoch: 34 Global Step: 708770 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:42:50,050-Speed 1676.77 samples/sec Loss 1.1321 LearningRate 0.000026 Epoch: 34 Global Step: 708780 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:42:58,223-Speed 2512.24 samples/sec Loss 1.1486 LearningRate 0.000026 Epoch: 34 Global Step: 708790 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:43:09,471-Speed 2497.41 samples/sec Loss 1.1723 LearningRate 0.000026 Epoch: 34 Global Step: 708800 Fp16 Grad Scale: 16384 Required: 28 hours Training: 
2022-07-12 08:43:20,314-Speed 1899.88 samples/sec Loss 1.1769 LearningRate 0.000026 Epoch: 34 Global Step: 708810 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:43:28,505-Speed 2500.34 samples/sec Loss 1.1805 LearningRate 0.000026 Epoch: 34 Global Step: 708820 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:43:36,741-Speed 2501.72 samples/sec Loss 1.1472 LearningRate 0.000026 Epoch: 34 Global Step: 708830 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:43:44,972-Speed 2497.98 samples/sec Loss 1.1683 LearningRate 0.000026 Epoch: 34 Global Step: 708840 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:43:53,118-Speed 2514.53 samples/sec Loss 1.1653 LearningRate 0.000026 Epoch: 34 Global Step: 708850 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:44:01,326-Speed 2495.68 samples/sec Loss 1.1611 LearningRate 0.000026 Epoch: 34 Global Step: 708860 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:44:09,569-Speed 2497.56 samples/sec Loss 1.1649 LearningRate 0.000026 Epoch: 34 Global Step: 708870 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:44:17,814-Speed 2498.64 samples/sec Loss 1.1645 LearningRate 0.000026 Epoch: 34 Global Step: 708880 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:44:26,022-Speed 2495.62 samples/sec Loss 1.1570 LearningRate 0.000026 Epoch: 34 Global Step: 708890 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:44:34,276-Speed 2498.07 samples/sec Loss 1.1625 LearningRate 0.000026 Epoch: 34 Global Step: 708900 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:44:42,481-Speed 2514.94 samples/sec Loss 1.1634 LearningRate 0.000026 Epoch: 34 Global Step: 708910 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:44:55,370-Speed 1589.04 samples/sec Loss 1.1650 LearningRate 0.000026 Epoch: 34 Global Step: 708920 Fp16 Grad Scale: 16384 Required: 28 hours Training: 
2022-07-12 08:45:03,584-Speed 2500.64 samples/sec Loss 1.1417 LearningRate 0.000026 Epoch: 34 Global Step: 708930 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:45:11,811-Speed 2497.21 samples/sec Loss 1.1593 LearningRate 0.000026 Epoch: 34 Global Step: 708940 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:45:22,235-Speed 1964.93 samples/sec Loss 1.1768 LearningRate 0.000026 Epoch: 34 Global Step: 708950 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:45:30,434-Speed 2498.22 samples/sec Loss 1.1256 LearningRate 0.000026 Epoch: 34 Global Step: 708960 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:45:42,191-Speed 2514.10 samples/sec Loss 1.1523 LearningRate 0.000026 Epoch: 34 Global Step: 708970 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:45:50,385-Speed 2500.06 samples/sec Loss 1.1501 LearningRate 0.000026 Epoch: 34 Global Step: 708980 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:45:58,591-Speed 2495.92 samples/sec Loss 1.1212 LearningRate 0.000026 Epoch: 34 Global Step: 708990 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:46:06,803-Speed 2494.16 samples/sec Loss 1.1847 LearningRate 0.000026 Epoch: 34 Global Step: 709000 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:46:15,003-Speed 2498.30 samples/sec Loss 1.1672 LearningRate 0.000026 Epoch: 34 Global Step: 709010 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:46:23,209-Speed 2496.19 samples/sec Loss 1.1674 LearningRate 0.000026 Epoch: 34 Global Step: 709020 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:46:31,359-Speed 2513.05 samples/sec Loss 1.1603 LearningRate 0.000026 Epoch: 34 Global Step: 709030 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:46:39,562-Speed 2497.27 samples/sec Loss 1.1496 LearningRate 0.000026 Epoch: 34 Global Step: 709040 Fp16 Grad Scale: 16384 Required: 28 hours Training: 
2022-07-12 08:46:47,768-Speed 2496.17 samples/sec Loss 1.1236 LearningRate 0.000026 Epoch: 34 Global Step: 709050 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:46:55,977-Speed 2495.07 samples/sec Loss 1.1502 LearningRate 0.000026 Epoch: 34 Global Step: 709060 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:47:04,184-Speed 2496.15 samples/sec Loss 1.1557 LearningRate 0.000026 Epoch: 34 Global Step: 709070 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:47:12,392-Speed 2495.49 samples/sec Loss 1.1596 LearningRate 0.000026 Epoch: 34 Global Step: 709080 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:47:20,544-Speed 2512.70 samples/sec Loss 1.1725 LearningRate 0.000026 Epoch: 34 Global Step: 709090 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:47:28,749-Speed 2496.61 samples/sec Loss 1.1739 LearningRate 0.000026 Epoch: 34 Global Step: 709100 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:47:36,954-Speed 2496.44 samples/sec Loss 1.1757 LearningRate 0.000026 Epoch: 34 Global Step: 709110 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:47:45,156-Speed 2497.50 samples/sec Loss 1.1408 LearningRate 0.000026 Epoch: 34 Global Step: 709120 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:47:53,359-Speed 2497.05 samples/sec Loss 1.1668 LearningRate 0.000026 Epoch: 34 Global Step: 709130 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:48:01,563-Speed 2496.75 samples/sec Loss 1.1652 LearningRate 0.000026 Epoch: 34 Global Step: 709140 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:48:09,716-Speed 2512.80 samples/sec Loss 1.1643 LearningRate 0.000026 Epoch: 34 Global Step: 709150 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:48:17,920-Speed 2496.70 samples/sec Loss 1.1533 LearningRate 0.000026 Epoch: 34 Global Step: 709160 Fp16 Grad Scale: 16384 Required: 28 hours Training: 
2022-07-12 08:48:26,124-Speed 2496.85 samples/sec Loss 1.1592 LearningRate 0.000026 Epoch: 34 Global Step: 709170 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:48:34,330-Speed 2495.98 samples/sec Loss 1.1567 LearningRate 0.000026 Epoch: 34 Global Step: 709180 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:48:42,538-Speed 2495.85 samples/sec Loss 1.1220 LearningRate 0.000026 Epoch: 34 Global Step: 709190 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:48:50,744-Speed 2495.94 samples/sec Loss 1.1397 LearningRate 0.000026 Epoch: 34 Global Step: 709200 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:48:58,910-Speed 2508.42 samples/sec Loss 1.1457 LearningRate 0.000026 Epoch: 34 Global Step: 709210 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:49:07,118-Speed 2495.53 samples/sec Loss 1.1362 LearningRate 0.000026 Epoch: 34 Global Step: 709220 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:49:15,325-Speed 2495.78 samples/sec Loss 1.1780 LearningRate 0.000026 Epoch: 34 Global Step: 709230 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:49:23,529-Speed 2496.94 samples/sec Loss 1.1346 LearningRate 0.000026 Epoch: 34 Global Step: 709240 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:49:31,738-Speed 2495.60 samples/sec Loss 1.1463 LearningRate 0.000026 Epoch: 34 Global Step: 709250 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:49:39,941-Speed 2496.77 samples/sec Loss 1.1672 LearningRate 0.000026 Epoch: 34 Global Step: 709260 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:49:48,093-Speed 2512.77 samples/sec Loss 1.1842 LearningRate 0.000026 Epoch: 34 Global Step: 709270 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:49:56,299-Speed 2496.38 samples/sec Loss 1.1443 LearningRate 0.000026 Epoch: 34 Global Step: 709280 Fp16 Grad Scale: 16384 Required: 28 hours Training: 
2022-07-12 08:50:04,507-Speed 2495.34 samples/sec Loss 1.1627 LearningRate 0.000026 Epoch: 34 Global Step: 709290 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:50:12,713-Speed 2496.09 samples/sec Loss 1.1681 LearningRate 0.000026 Epoch: 34 Global Step: 709300 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:50:20,915-Speed 2497.22 samples/sec Loss 1.1847 LearningRate 0.000026 Epoch: 34 Global Step: 709310 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:50:29,133-Speed 2492.58 samples/sec Loss 1.1615 LearningRate 0.000026 Epoch: 34 Global Step: 709320 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:50:37,283-Speed 2513.54 samples/sec Loss 1.1578 LearningRate 0.000026 Epoch: 34 Global Step: 709330 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:50:45,487-Speed 2496.76 samples/sec Loss 1.1617 LearningRate 0.000026 Epoch: 34 Global Step: 709340 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:50:53,692-Speed 2496.44 samples/sec Loss 1.1837 LearningRate 0.000026 Epoch: 34 Global Step: 709350 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:51:01,896-Speed 2496.77 samples/sec Loss 1.1770 LearningRate 0.000026 Epoch: 34 Global Step: 709360 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:51:10,099-Speed 2496.98 samples/sec Loss 1.1445 LearningRate 0.000026 Epoch: 34 Global Step: 709370 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:51:18,330-Speed 2488.72 samples/sec Loss 1.1902 LearningRate 0.000026 Epoch: 34 Global Step: 709380 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:51:26,482-Speed 2512.55 samples/sec Loss 1.1649 LearningRate 0.000026 Epoch: 34 Global Step: 709390 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:51:34,694-Speed 2494.34 samples/sec Loss 1.1924 LearningRate 0.000026 Epoch: 34 Global Step: 709400 Fp16 Grad Scale: 16384 Required: 28 hours Training: 
2022-07-12 08:51:42,908-Speed 2493.59 samples/sec Loss 1.1572 LearningRate 0.000026 Epoch: 34 Global Step: 709410 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:51:51,115-Speed 2496.08 samples/sec Loss 1.1654 LearningRate 0.000026 Epoch: 34 Global Step: 709420 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:51:59,318-Speed 2496.92 samples/sec Loss 1.1660 LearningRate 0.000026 Epoch: 34 Global Step: 709430 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:52:07,525-Speed 2495.73 samples/sec Loss 1.1630 LearningRate 0.000026 Epoch: 34 Global Step: 709440 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-07-12 08:52:15,673-Speed 2514.16 samples/sec Loss 1.2067 LearningRate 0.000026 Epoch: 34 Global Step: 709450 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:52:23,878-Speed 2496.60 samples/sec Loss 1.1743 LearningRate 0.000026 Epoch: 34 Global Step: 709460 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:52:32,078-Speed 2498.27 samples/sec Loss 1.1379 LearningRate 0.000026 Epoch: 34 Global Step: 709470 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:52:40,281-Speed 2497.00 samples/sec Loss 1.2001 LearningRate 0.000026 Epoch: 34 Global Step: 709480 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:52:48,483-Speed 2497.33 samples/sec Loss 1.1228 LearningRate 0.000026 Epoch: 34 Global Step: 709490 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:52:56,683-Speed 2497.77 samples/sec Loss 1.1265 LearningRate 0.000026 Epoch: 34 Global Step: 709500 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:53:04,837-Speed 2512.07 samples/sec Loss 1.1576 LearningRate 0.000026 Epoch: 34 Global Step: 709510 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:53:13,041-Speed 2496.90 samples/sec Loss 1.1529 LearningRate 0.000026 Epoch: 34 Global Step: 709520 Fp16 Grad Scale: 16384 Required: 27 hours Training: 
2022-07-12 08:53:21,247-Speed 2496.12 samples/sec Loss 1.1467 LearningRate 0.000026 Epoch: 34 Global Step: 709530 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:53:29,448-Speed 2497.68 samples/sec Loss 1.1575 LearningRate 0.000026 Epoch: 34 Global Step: 709540 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:53:37,652-Speed 2496.71 samples/sec Loss 1.1374 LearningRate 0.000026 Epoch: 34 Global Step: 709550 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:53:45,855-Speed 2496.93 samples/sec Loss 1.1702 LearningRate 0.000026 Epoch: 34 Global Step: 709560 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:53:54,008-Speed 2512.14 samples/sec Loss 1.1798 LearningRate 0.000026 Epoch: 34 Global Step: 709570 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:54:02,216-Speed 2495.71 samples/sec Loss 1.1578 LearningRate 0.000026 Epoch: 34 Global Step: 709580 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:54:10,423-Speed 2495.91 samples/sec Loss 1.1798 LearningRate 0.000026 Epoch: 34 Global Step: 709590 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:54:18,630-Speed 2495.71 samples/sec Loss 1.1820 LearningRate 0.000026 Epoch: 34 Global Step: 709600 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:54:26,834-Speed 2496.85 samples/sec Loss 1.1487 LearningRate 0.000026 Epoch: 34 Global Step: 709610 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:54:35,036-Speed 2497.18 samples/sec Loss 1.1421 LearningRate 0.000026 Epoch: 34 Global Step: 709620 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:54:43,185-Speed 2513.65 samples/sec Loss 1.1340 LearningRate 0.000026 Epoch: 34 Global Step: 709630 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:54:51,385-Speed 2497.94 samples/sec Loss 1.1528 LearningRate 0.000026 Epoch: 34 Global Step: 709640 Fp16 Grad Scale: 16384 Required: 27 hours Training: 
2022-07-12 08:54:59,587-Speed 2497.68 samples/sec Loss 1.1848 LearningRate 0.000026 Epoch: 34 Global Step: 709650 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:55:07,792-Speed 2496.45 samples/sec Loss 1.1576 LearningRate 0.000026 Epoch: 34 Global Step: 709660 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:55:15,993-Speed 2497.53 samples/sec Loss 1.1256 LearningRate 0.000026 Epoch: 34 Global Step: 709670 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:55:24,195-Speed 2497.30 samples/sec Loss 1.1438 LearningRate 0.000026 Epoch: 34 Global Step: 709680 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:55:32,343-Speed 2514.17 samples/sec Loss 1.1485 LearningRate 0.000026 Epoch: 34 Global Step: 709690 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:55:40,540-Speed 2498.62 samples/sec Loss 1.1744 LearningRate 0.000026 Epoch: 34 Global Step: 709700 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:55:48,741-Speed 2497.70 samples/sec Loss 1.1360 LearningRate 0.000026 Epoch: 34 Global Step: 709710 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:55:56,942-Speed 2497.50 samples/sec Loss 1.1726 LearningRate 0.000026 Epoch: 34 Global Step: 709720 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:56:05,144-Speed 2497.54 samples/sec Loss 1.1736 LearningRate 0.000026 Epoch: 34 Global Step: 709730 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:56:13,350-Speed 2496.15 samples/sec Loss 1.1568 LearningRate 0.000026 Epoch: 34 Global Step: 709740 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:56:21,498-Speed 2513.91 samples/sec Loss 1.1688 LearningRate 0.000026 Epoch: 34 Global Step: 709750 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 08:56:29,698-Speed 2497.84 samples/sec Loss 1.1569 LearningRate 0.000026 Epoch: 34 Global Step: 709760 Fp16 Grad Scale: 16384 Required: 27 hours Training: 
2022-07-12 08:56:37,901-Speed 2497.17 samples/sec Loss 1.1253 LearningRate 0.000026 Epoch: 34 Global Step: 709770 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 08:56:46,115-Speed 2494.06 samples/sec Loss 1.1897 LearningRate 0.000026 Epoch: 34 Global Step: 709780 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 08:56:54,316-Speed 2497.66 samples/sec Loss 1.1534 LearningRate 0.000026 Epoch: 34 Global Step: 709790 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:57:02,524-Speed 2495.28 samples/sec Loss 1.1425 LearningRate 0.000026 Epoch: 34 Global Step: 709800 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:57:10,674-Speed 2513.42 samples/sec Loss 1.1289 LearningRate 0.000026 Epoch: 34 Global Step: 709810 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:57:18,885-Speed 2494.58 samples/sec Loss 1.1528 LearningRate 0.000026 Epoch: 34 Global Step: 709820 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:57:27,090-Speed 2496.54 samples/sec Loss 1.1541 LearningRate 0.000026 Epoch: 34 Global Step: 709830 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:57:35,290-Speed 2497.82 samples/sec Loss 1.1599 LearningRate 0.000026 Epoch: 34 Global Step: 709840 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:57:43,493-Speed 2497.11 samples/sec Loss 1.1465 LearningRate 0.000026 Epoch: 34 Global Step: 709850 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:57:51,700-Speed 2495.85 samples/sec Loss 1.1800 LearningRate 0.000026 Epoch: 34 Global Step: 709860 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:57:59,850-Speed 2513.39 samples/sec Loss 1.1726 LearningRate 0.000026 Epoch: 34 Global Step: 709870 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:58:08,061-Speed 2494.74 samples/sec Loss 1.1821 LearningRate 0.000026 Epoch: 34 Global Step: 709880 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:58:16,263-Speed 2497.35 samples/sec Loss 1.1457 LearningRate 0.000026 Epoch: 34 Global Step: 709890 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:58:24,466-Speed 2496.74 samples/sec Loss 1.1910 LearningRate 0.000026 Epoch: 34 Global Step: 709900 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:58:32,688-Speed 2491.53 samples/sec Loss 1.1800 LearningRate 0.000026 Epoch: 34 Global Step: 709910 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:58:40,895-Speed 2495.66 samples/sec Loss 1.1474 LearningRate 0.000026 Epoch: 34 Global Step: 709920 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:58:49,045-Speed 2513.20 samples/sec Loss 1.1456 LearningRate 0.000026 Epoch: 34 Global Step: 709930 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:58:57,250-Speed 2496.33 samples/sec Loss 1.1345 LearningRate 0.000026 Epoch: 34 Global Step: 709940 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:59:05,454-Speed 2498.09 samples/sec Loss 1.1706 LearningRate 0.000026 Epoch: 34 Global Step: 709950 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 08:59:13,615-Speed 2509.83 samples/sec Loss 1.1278 LearningRate 0.000026 Epoch: 34 Global Step: 709960 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 08:59:21,819-Speed 2496.78 samples/sec Loss 1.1344 LearningRate 0.000026 Epoch: 34 Global Step: 709970 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 08:59:30,021-Speed 2497.04 samples/sec Loss 1.1240 LearningRate 0.000026 Epoch: 34 Global Step: 709980 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 08:59:38,177-Speed 2511.62 samples/sec Loss 1.1479 LearningRate 0.000026 Epoch: 34 Global Step: 709990 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 08:59:46,382-Speed 2496.87 samples/sec Loss 1.1413 LearningRate 0.000026 Epoch: 34 Global Step: 710000 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 08:59:54,597-Speed 2493.05 samples/sec Loss 1.1548 LearningRate 0.000026 Epoch: 34 Global Step: 710010 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:00:02,800-Speed 2497.18 samples/sec Loss 1.1393 LearningRate 0.000026 Epoch: 34 Global Step: 710020 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:00:11,005-Speed 2496.71 samples/sec Loss 1.1537 LearningRate 0.000026 Epoch: 34 Global Step: 710030 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:00:19,217-Speed 2494.31 samples/sec Loss 1.1634 LearningRate 0.000026 Epoch: 34 Global Step: 710040 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:00:27,378-Speed 2510.05 samples/sec Loss 1.1727 LearningRate 0.000026 Epoch: 34 Global Step: 710050 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:00:35,593-Speed 2493.27 samples/sec Loss 1.1745 LearningRate 0.000026 Epoch: 34 Global Step: 710060 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:00:43,794-Speed 2497.53 samples/sec Loss 1.1160 LearningRate 0.000026 Epoch: 34 Global Step: 710070 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:00:51,997-Speed 2497.24 samples/sec Loss 1.1639 LearningRate 0.000026 Epoch: 34 Global Step: 710080 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:01:00,197-Speed 2497.95 samples/sec Loss 1.1782 LearningRate 0.000026 Epoch: 34 Global Step: 710090 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:01:08,404-Speed 2495.84 samples/sec Loss 1.1830 LearningRate 0.000026 Epoch: 34 Global Step: 710100 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:01:16,551-Speed 2514.10 samples/sec Loss 1.1507 LearningRate 0.000026 Epoch: 34 Global Step: 710110 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:01:24,754-Speed 2497.00 samples/sec Loss 1.1210 LearningRate 0.000026 Epoch: 34 Global Step: 710120 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:01:32,964-Speed 2495.07 samples/sec Loss 1.1326 LearningRate 0.000026 Epoch: 34 Global Step: 710130 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:01:41,164-Speed 2497.76 samples/sec Loss 1.1568 LearningRate 0.000026 Epoch: 34 Global Step: 710140 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:01:49,367-Speed 2497.32 samples/sec Loss 1.1507 LearningRate 0.000026 Epoch: 34 Global Step: 710150 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:01:57,567-Speed 2497.88 samples/sec Loss 1.1580 LearningRate 0.000026 Epoch: 34 Global Step: 710160 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:02:05,722-Speed 2511.89 samples/sec Loss 1.1331 LearningRate 0.000026 Epoch: 34 Global Step: 710170 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:02:13,923-Speed 2497.73 samples/sec Loss 1.1570 LearningRate 0.000026 Epoch: 34 Global Step: 710180 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:02:22,139-Speed 2493.11 samples/sec Loss 1.1919 LearningRate 0.000026 Epoch: 34 Global Step: 710190 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:02:30,342-Speed 2497.17 samples/sec Loss 1.1593 LearningRate 0.000026 Epoch: 34 Global Step: 710200 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:02:38,543-Speed 2497.70 samples/sec Loss 1.1471 LearningRate 0.000026 Epoch: 34 Global Step: 710210 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:02:46,745-Speed 2497.21 samples/sec Loss 1.1183 LearningRate 0.000026 Epoch: 34 Global Step: 710220 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:02:54,895-Speed 2513.31 samples/sec Loss 1.1331 LearningRate 0.000026 Epoch: 34 Global Step: 710230 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:03:03,109-Speed 2493.74 samples/sec Loss 1.1498 LearningRate 0.000026 Epoch: 34 Global Step: 710240 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:03:11,316-Speed 2495.76 samples/sec Loss 1.1643 LearningRate 0.000026 Epoch: 34 Global Step: 710250 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:03:19,518-Speed 2497.50 samples/sec Loss 1.1483 LearningRate 0.000026 Epoch: 34 Global Step: 710260 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:03:27,719-Speed 2497.67 samples/sec Loss 1.1641 LearningRate 0.000026 Epoch: 34 Global Step: 710270 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:03:35,921-Speed 2497.18 samples/sec Loss 1.1717 LearningRate 0.000026 Epoch: 34 Global Step: 710280 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:03:44,083-Speed 2509.64 samples/sec Loss 1.1506 LearningRate 0.000026 Epoch: 34 Global Step: 710290 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:03:52,289-Speed 2496.28 samples/sec Loss 1.1328 LearningRate 0.000026 Epoch: 34 Global Step: 710300 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:04:00,490-Speed 2497.50 samples/sec Loss 1.1649 LearningRate 0.000026 Epoch: 34 Global Step: 710310 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:04:08,703-Speed 2493.97 samples/sec Loss 1.1618 LearningRate 0.000026 Epoch: 34 Global Step: 710320 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:04:16,905-Speed 2497.46 samples/sec Loss 1.1779 LearningRate 0.000026 Epoch: 34 Global Step: 710330 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:04:25,108-Speed 2496.83 samples/sec Loss 1.1408 LearningRate 0.000025 Epoch: 34 Global Step: 710340 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:04:33,268-Speed 2510.18 samples/sec Loss 1.1686 LearningRate 0.000025 Epoch: 34 Global Step: 710350 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:04:41,469-Speed 2497.59 samples/sec Loss 1.1676 LearningRate 0.000025 Epoch: 34 Global Step: 710360 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:04:49,669-Speed 2498.23 samples/sec Loss 1.1568 LearningRate 0.000025 Epoch: 34 Global Step: 710370 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:04:57,882-Speed 2494.14 samples/sec Loss 1.1617 LearningRate 0.000025 Epoch: 34 Global Step: 710380 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:05:06,082-Speed 2497.84 samples/sec Loss 1.1284 LearningRate 0.000025 Epoch: 34 Global Step: 710390 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:05:14,282-Speed 2498.13 samples/sec Loss 1.1378 LearningRate 0.000025 Epoch: 34 Global Step: 710400 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:05:22,436-Speed 2511.96 samples/sec Loss 1.1623 LearningRate 0.000025 Epoch: 34 Global Step: 710410 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:05:30,638-Speed 2497.44 samples/sec Loss 1.1636 LearningRate 0.000025 Epoch: 34 Global Step: 710420 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:05:38,839-Speed 2497.69 samples/sec Loss 1.1652 LearningRate 0.000025 Epoch: 34 Global Step: 710430 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:05:47,047-Speed 2496.10 samples/sec Loss 1.1522 LearningRate 0.000025 Epoch: 34 Global Step: 710440 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:05:55,251-Speed 2496.65 samples/sec Loss 1.1541 LearningRate 0.000025 Epoch: 34 Global Step: 710450 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:06:03,455-Speed 2496.64 samples/sec Loss 1.1309 LearningRate 0.000025 Epoch: 34 Global Step: 710460 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:06:11,605-Speed 2513.33 samples/sec Loss 1.1551 LearningRate 0.000025 Epoch: 34 Global Step: 710470 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:06:19,808-Speed 2497.05 samples/sec Loss 1.1370 LearningRate 0.000025 Epoch: 34 Global Step: 710480 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:06:28,009-Speed 2497.46 samples/sec Loss 1.1728 LearningRate 0.000025 Epoch: 34 Global Step: 710490 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:06:36,212-Speed 2497.35 samples/sec Loss 1.1572 LearningRate 0.000025 Epoch: 34 Global Step: 710500 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:06:44,413-Speed 2497.48 samples/sec Loss 1.1577 LearningRate 0.000025 Epoch: 34 Global Step: 710510 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:06:52,628-Speed 2493.22 samples/sec Loss 1.1634 LearningRate 0.000025 Epoch: 34 Global Step: 710520 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:07:00,784-Speed 2511.65 samples/sec Loss 1.1673 LearningRate 0.000025 Epoch: 34 Global Step: 710530 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:07:08,984-Speed 2497.94 samples/sec Loss 1.1422 LearningRate 0.000025 Epoch: 34 Global Step: 710540 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:07:17,186-Speed 2497.31 samples/sec Loss 1.1691 LearningRate 0.000025 Epoch: 34 Global Step: 710550 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:07:25,387-Speed 2497.46 samples/sec Loss 1.1701 LearningRate 0.000025 Epoch: 34 Global Step: 710560 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:07:33,590-Speed 2497.31 samples/sec Loss 1.1365 LearningRate 0.000025 Epoch: 34 Global Step: 710570 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:07:41,792-Speed 2497.27 samples/sec Loss 1.1437 LearningRate 0.000025 Epoch: 34 Global Step: 710580 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:07:49,939-Speed 2514.30 samples/sec Loss 1.1639 LearningRate 0.000025 Epoch: 34 Global Step: 710590 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:07:58,140-Speed 2497.77 samples/sec Loss 1.1556 LearningRate 0.000025 Epoch: 34 Global Step: 710600 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:08:06,348-Speed 2495.39 samples/sec Loss 1.1954 LearningRate 0.000025 Epoch: 34 Global Step: 710610 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:08:14,548-Speed 2498.04 samples/sec Loss 1.1761 LearningRate 0.000025 Epoch: 34 Global Step: 710620 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:08:22,755-Speed 2495.89 samples/sec Loss 1.1778 LearningRate 0.000025 Epoch: 34 Global Step: 710630 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:08:30,965-Speed 2495.10 samples/sec Loss 1.1832 LearningRate 0.000025 Epoch: 34 Global Step: 710640 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:08:39,114-Speed 2513.52 samples/sec Loss 1.1502 LearningRate 0.000025 Epoch: 34 Global Step: 710650 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:08:47,317-Speed 2497.12 samples/sec Loss 1.1600 LearningRate 0.000025 Epoch: 34 Global Step: 710660 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:08:55,519-Speed 2497.56 samples/sec Loss 1.1652 LearningRate 0.000025 Epoch: 34 Global Step: 710670 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:09:03,719-Speed 2498.05 samples/sec Loss 1.1757 LearningRate 0.000025 Epoch: 34 Global Step: 710680 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:09:11,920-Speed 2497.63 samples/sec Loss 1.1487 LearningRate 0.000025 Epoch: 34 Global Step: 710690 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:09:20,126-Speed 2496.27 samples/sec Loss 1.1691 LearningRate 0.000025 Epoch: 34 Global Step: 710700 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:09:28,280-Speed 2511.94 samples/sec Loss 1.1915 LearningRate 0.000025 Epoch: 34 Global Step: 710710 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:09:36,479-Speed 2498.63 samples/sec Loss 1.1409 LearningRate 0.000025 Epoch: 34 Global Step: 710720 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:09:44,678-Speed 2498.16 samples/sec Loss 1.1614 LearningRate 0.000025 Epoch: 34 Global Step: 710730 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:09:52,878-Speed 2497.99 samples/sec Loss 1.1541 LearningRate 0.000025 Epoch: 34 Global Step: 710740 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:10:01,090-Speed 2494.45 samples/sec Loss 1.1669 LearningRate 0.000025 Epoch: 34 Global Step: 710750 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:10:09,290-Speed 2497.70 samples/sec Loss 1.1748 LearningRate 0.000025 Epoch: 34 Global Step: 710760 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:10:17,439-Speed 2513.60 samples/sec Loss 1.1817 LearningRate 0.000025 Epoch: 34 Global Step: 710770 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:10:25,638-Speed 2498.29 samples/sec Loss 1.1584 LearningRate 0.000025 Epoch: 34 Global Step: 710780 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:10:33,840-Speed 2497.54 samples/sec Loss 1.1342 LearningRate 0.000025 Epoch: 34 Global Step: 710790 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:10:42,038-Speed 2498.53 samples/sec Loss 1.1457 LearningRate 0.000025 Epoch: 34 Global Step: 710800 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:10:50,245-Speed 2496.04 samples/sec Loss 1.1700 LearningRate 0.000025 Epoch: 34 Global Step: 710810 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:10:58,448-Speed 2496.80 samples/sec Loss 1.1530 LearningRate 0.000025 Epoch: 34 Global Step: 710820 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:11:06,596-Speed 2513.93 samples/sec Loss 1.1533 LearningRate 0.000025 Epoch: 34 Global Step: 710830 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:11:14,795-Speed 2498.45 samples/sec Loss 1.1459 LearningRate 0.000025 Epoch: 34 Global Step: 710840 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:11:23,001-Speed 2496.94 samples/sec Loss 1.1916 LearningRate 0.000025 Epoch: 34 Global Step: 710850 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:11:31,206-Speed 2496.35 samples/sec Loss 1.1641 LearningRate 0.000025 Epoch: 34 Global Step: 710860 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:11:39,409-Speed 2497.09 samples/sec Loss 1.1522 LearningRate 0.000025 Epoch: 34 Global Step: 710870 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:11:47,610-Speed 2497.48 samples/sec Loss 1.1390 LearningRate 0.000025 Epoch: 34 Global Step: 710880 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:11:55,760-Speed 2513.60 samples/sec Loss 1.1553 LearningRate 0.000025 Epoch: 34 Global Step: 710890 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:12:03,964-Speed 2496.71 samples/sec Loss 1.1776 LearningRate 0.000025 Epoch: 34 Global Step: 710900 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:12:12,164-Speed 2498.04 samples/sec Loss 1.1588 LearningRate 0.000025 Epoch: 34 Global Step: 710910 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:12:20,367-Speed 2497.10 samples/sec Loss 1.1672 LearningRate 0.000025 Epoch: 34 Global Step: 710920 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:12:28,572-Speed 2496.48 samples/sec Loss 1.1611 LearningRate 0.000025 Epoch: 34 Global Step: 710930 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:12:36,773-Speed 2497.68 samples/sec Loss 1.1807 LearningRate 0.000025 Epoch: 34 Global Step: 710940 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:12:44,922-Speed 2513.74 samples/sec Loss 1.1711 LearningRate 0.000025 Epoch: 34 Global Step: 710950 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:12:53,127-Speed 2496.26 samples/sec Loss 1.1570 LearningRate 0.000025 Epoch: 34 Global Step: 710960 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:13:01,341-Speed 2493.89 samples/sec Loss 1.2017 LearningRate 0.000025 Epoch: 34 Global Step: 710970 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:13:09,547-Speed 2496.18 samples/sec Loss 1.1782 LearningRate 0.000025 Epoch: 34 Global Step: 710980 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:13:17,746-Speed 2498.44 samples/sec Loss 1.1791 LearningRate 0.000025 Epoch: 34 Global Step: 710990 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:13:26,036-Speed 2470.84 samples/sec Loss 1.1623 LearningRate 0.000025 Epoch: 34 Global Step: 711000 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:13:34,182-Speed 2514.29 samples/sec Loss 1.1436 LearningRate 0.000025 Epoch: 34 Global Step: 711010 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:13:42,383-Speed 2497.68 samples/sec Loss 1.1592 LearningRate 0.000025 Epoch: 34 Global Step: 711020 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:13:50,584-Speed 2497.84 samples/sec Loss 1.1559 LearningRate 0.000025 Epoch: 34 Global Step: 711030 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:13:58,783-Speed 2498.59 samples/sec Loss 1.1732 LearningRate 0.000025 Epoch: 34 Global Step: 711040 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:14:06,987-Speed 2496.52 samples/sec Loss 1.1665 LearningRate 0.000025 Epoch: 34 Global Step: 711050 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:14:15,197-Speed 2495.07 samples/sec Loss 1.1718 LearningRate 0.000025 Epoch: 34 Global Step: 711060 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:14:23,343-Speed 2514.34 samples/sec Loss 1.1872 LearningRate 0.000025 Epoch: 34 Global Step: 711070 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:14:31,545-Speed 2497.39 samples/sec Loss 1.1708 LearningRate 0.000025 Epoch: 34 Global Step: 711080 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:14:39,747-Speed 2497.74 samples/sec Loss 1.1554 LearningRate 0.000025 Epoch: 34 Global Step: 711090 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:14:47,947-Speed 2497.86 samples/sec Loss 1.1508 LearningRate 0.000025 Epoch: 34 Global Step: 711100 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:14:56,150-Speed 2497.24 samples/sec Loss 1.1838 LearningRate 0.000025 Epoch: 34 Global Step: 711110 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:15:04,352-Speed 2497.26 samples/sec Loss 1.1989 LearningRate 0.000025 Epoch: 34 Global Step: 711120 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:15:12,505-Speed 2512.48 samples/sec Loss 1.1629 LearningRate 0.000025 Epoch: 34 Global Step: 711130 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:15:20,716-Speed 2494.77 samples/sec Loss 1.1627 LearningRate 0.000025 Epoch: 34 Global Step: 711140 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:15:28,914-Speed 2498.37 samples/sec Loss 1.1584 LearningRate 0.000025 Epoch: 34 Global Step: 711150 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:15:37,113-Speed 2498.24 samples/sec Loss 1.1655 LearningRate 0.000025 Epoch: 34 Global Step: 711160 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:15:45,320-Speed 2495.91 samples/sec Loss 1.1477 LearningRate 0.000025 Epoch: 34 Global Step: 711170 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:15:53,524-Speed 2497.13 samples/sec Loss 1.1575 LearningRate 0.000025 Epoch: 34 Global Step: 711180 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:16:01,672-Speed 2513.72 samples/sec Loss 1.1781 LearningRate 0.000025 Epoch: 34 Global Step: 711190 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:16:09,874-Speed 2497.43 samples/sec Loss 1.1618 LearningRate 0.000025 Epoch: 34 Global Step: 711200 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:16:18,084-Speed 2495.14 samples/sec Loss 1.1560 LearningRate 0.000025 Epoch: 34 Global Step: 711210 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:16:26,298-Speed 2493.89 samples/sec Loss 1.1458 LearningRate 0.000025 Epoch: 34 Global Step: 711220 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:16:34,501-Speed 2496.84 samples/sec Loss 1.1835 LearningRate 0.000025 Epoch: 34 Global Step: 711230 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:16:42,704-Speed 2497.16 samples/sec Loss 1.1633 LearningRate 0.000025 Epoch: 34 Global Step: 711240 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:16:50,852-Speed 2514.03 samples/sec Loss 1.1893 LearningRate 0.000025 Epoch: 34 Global Step: 711250 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:16:59,083-Speed 2488.52 samples/sec Loss 1.1585 LearningRate 0.000025 Epoch: 34 Global Step: 711260 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:17:07,282-Speed 2498.23 samples/sec Loss 1.1796 LearningRate 0.000025 Epoch: 34 Global Step: 711270 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:17:15,485-Speed 2497.27 samples/sec Loss 1.1719 LearningRate 0.000025 Epoch: 34 Global Step: 711280 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:17:23,690-Speed 2496.56 samples/sec Loss 1.1576 LearningRate 0.000025 Epoch: 34 Global Step: 711290 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:17:31,895-Speed 2496.82 samples/sec Loss 1.1684 LearningRate 0.000025 Epoch: 34 Global Step: 711300 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:17:40,037-Speed 2515.58 samples/sec Loss 1.1463 LearningRate 0.000025 Epoch: 34 Global Step: 711310 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:17:48,237-Speed 2497.92 samples/sec Loss 1.1511 LearningRate 0.000025 Epoch: 34 Global Step: 711320 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:17:56,436-Speed 2498.10 samples/sec Loss 1.1869 LearningRate 0.000025 Epoch: 34 Global Step: 711330 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:18:04,643-Speed 2496.00 samples/sec Loss 1.1464 LearningRate 0.000025 Epoch: 34 Global Step: 711340 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:18:12,851-Speed 2495.33 samples/sec Loss 1.1484 LearningRate 0.000025 Epoch: 34 Global Step: 711350 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:18:21,057-Speed 2496.59 samples/sec Loss 1.1532 LearningRate 0.000025 Epoch: 34 Global Step: 711360 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:18:29,203-Speed 2514.54 samples/sec Loss 1.1521 LearningRate 0.000025 Epoch: 34 Global Step: 711370 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:18:37,407-Speed 2496.73 samples/sec Loss 1.1623 LearningRate 0.000025 Epoch: 34 Global Step: 711380 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:18:45,609-Speed 2497.48 samples/sec Loss 1.1256 LearningRate 0.000025 Epoch: 34 Global Step: 711390 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:18:53,816-Speed 2496.06 samples/sec Loss 1.1685 LearningRate 0.000025 Epoch: 34 Global Step: 711400 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:19:02,021-Speed 2496.31 samples/sec Loss 1.1603 LearningRate 0.000025 Epoch: 34 Global Step: 711410 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:19:10,229-Speed 2495.69 samples/sec Loss 1.1638 LearningRate 0.000025 Epoch: 34 Global Step: 711420 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:19:18,379-Speed 2513.39 samples/sec Loss 1.1356 LearningRate 0.000025 Epoch: 34 Global Step: 711430 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:19:26,581-Speed 2497.40 samples/sec Loss 1.1728 LearningRate 0.000025 Epoch: 34 Global Step: 711440 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:19:34,781-Speed 2498.21 samples/sec Loss 1.1723 LearningRate 0.000025 Epoch: 34 Global Step: 711450 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:19:42,983-Speed 2497.08 samples/sec Loss 1.1581 LearningRate 0.000025 Epoch: 34 Global Step: 711460 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:19:51,186-Speed 2497.18 samples/sec Loss 1.1503 LearningRate 0.000025 Epoch: 34 Global Step: 711470 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:19:59,388-Speed 2497.62 samples/sec Loss 1.1477 LearningRate 0.000025 Epoch: 34 Global Step: 711480 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:20:07,535-Speed 2514.30 samples/sec Loss 1.1739 LearningRate 0.000025 Epoch: 34 Global Step: 711490 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:20:15,736-Speed 2497.43 samples/sec Loss 1.1542 LearningRate 0.000025 Epoch: 34 Global Step: 711500 Fp16 Grad Scale: 32768 Required: 27 hours
Training: 2022-07-12 09:20:23,896-Speed 2510.32 samples/sec Loss 1.1306 LearningRate 0.000025 Epoch: 34 Global Step: 711510 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:20:32,107-Speed 2494.83 samples/sec Loss 1.1271 LearningRate 0.000025 Epoch: 34 Global Step: 711520 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:20:40,306-Speed 2498.35 samples/sec Loss 1.1345 LearningRate 0.000025 Epoch: 34 Global Step: 711530 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:20:48,506-Speed 2497.84 samples/sec Loss 1.1429 LearningRate 0.000025 Epoch: 34 Global Step: 711540 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:20:56,660-Speed 2512.39 samples/sec Loss 1.1471 LearningRate 0.000025 Epoch: 34 Global Step: 711550 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:21:04,868-Speed 2495.56 samples/sec Loss 1.1440 LearningRate 0.000025 Epoch: 34 Global Step: 711560 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:21:13,076-Speed 2495.57 samples/sec Loss 1.1572 LearningRate 0.000025 Epoch: 34 Global Step: 711570 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:21:21,276-Speed 2498.02 samples/sec Loss 1.1371 LearningRate 0.000025 Epoch: 34 Global Step: 711580 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:21:29,479-Speed 2496.89 samples/sec Loss 1.1447 LearningRate 0.000025 Epoch: 34 Global Step: 711590 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:21:37,680-Speed 2497.71 samples/sec Loss 1.1244 LearningRate 0.000025 Epoch: 34 Global Step: 711600 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:21:45,830-Speed 2513.38 samples/sec Loss 1.1758 LearningRate 0.000025 Epoch: 34 Global Step: 711610 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:21:54,033-Speed 2497.15 samples/sec Loss 1.1275 LearningRate 0.000025 Epoch: 34 Global Step: 711620 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:22:02,237-Speed 2496.85 samples/sec Loss 1.1532 LearningRate 0.000025 Epoch: 34 Global Step: 711630 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:22:10,439-Speed 2497.14 samples/sec Loss 1.1429 LearningRate 0.000025 Epoch: 34 Global Step: 711640 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:22:18,647-Speed 2495.50 samples/sec Loss 1.1200 LearningRate 0.000025 Epoch: 34 Global Step: 711650 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:22:26,850-Speed 2497.02 samples/sec Loss 1.1367 LearningRate 0.000025 Epoch: 34 Global Step: 711660 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:22:34,997-Speed 2514.04 samples/sec Loss 1.1627 LearningRate 0.000025 Epoch: 34 Global Step: 711670 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:22:43,197-Speed 2498.22 samples/sec Loss 1.1485 LearningRate 0.000025 Epoch: 34 Global Step: 711680 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:22:51,397-Speed 2498.02 samples/sec Loss 1.1228 LearningRate 0.000025 Epoch: 34 Global Step: 711690 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:22:59,606-Speed 2495.13 samples/sec Loss 1.1192 LearningRate 0.000025 Epoch: 34 Global Step: 711700 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:23:07,807-Speed 2497.86 samples/sec Loss 1.1425 LearningRate 0.000025 Epoch: 34 Global Step: 711710 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:23:16,007-Speed 2498.04 samples/sec Loss 1.1731 LearningRate 0.000025 Epoch: 34 Global Step: 711720 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:23:24,155-Speed 2513.86 samples/sec Loss 1.1613 LearningRate 0.000025 Epoch: 34 Global Step: 711730 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:23:32,353-Speed 2498.56 samples/sec Loss 1.1536 LearningRate 0.000025 Epoch: 34 Global Step: 711740 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:23:40,555-Speed 2497.75 samples/sec Loss 1.1644 LearningRate 0.000025 Epoch: 34 Global Step: 711750 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:23:48,755-Speed 2497.95 samples/sec Loss 1.1409 LearningRate 0.000025 Epoch: 34 Global Step: 711760 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:23:56,973-Speed 2492.30 samples/sec Loss 1.1152 LearningRate 0.000025 Epoch: 34 Global Step: 711770 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:24:05,185-Speed 2494.53 samples/sec Loss 1.1217 LearningRate 0.000025 Epoch: 34 Global Step: 711780 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:24:13,332-Speed 2513.87 samples/sec Loss 1.2121 LearningRate 0.000025 Epoch: 34 Global Step: 711790 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:24:21,532-Speed 2497.87 samples/sec Loss 1.1452 LearningRate 0.000025 Epoch: 34 Global Step: 711800 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:24:29,733-Speed 2497.81 samples/sec Loss 1.1383 LearningRate 0.000025 Epoch: 34 Global Step: 711810 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:24:37,936-Speed 2496.98 samples/sec Loss 1.1346 LearningRate 0.000025 Epoch: 34 Global Step: 711820 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:24:46,140-Speed 2496.47 samples/sec Loss 1.1552 LearningRate 0.000025 Epoch: 34 Global Step: 711830 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:24:54,349-Speed 2495.42 samples/sec Loss 1.1334 LearningRate 0.000025 Epoch: 34 Global Step: 711840 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:25:02,501-Speed 2512.57 samples/sec Loss 1.1420 LearningRate 0.000025 Epoch: 34 Global Step: 711850 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:25:10,700-Speed 2498.04 samples/sec Loss 1.1571 LearningRate 0.000025 Epoch: 34 Global Step: 711860 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:25:18,905-Speed 2496.62 samples/sec Loss 1.1897 LearningRate 0.000025 Epoch: 34 Global Step: 711870 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:25:27,108-Speed 2496.84 samples/sec Loss 1.1339 LearningRate 0.000025 Epoch: 34 Global Step: 711880 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:25:35,311-Speed 2497.08 samples/sec Loss 1.1242 LearningRate 0.000025 Epoch: 34 Global Step: 711890 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:25:43,510-Speed 2498.17 samples/sec Loss 1.1693 LearningRate 0.000025 Epoch: 34 Global Step: 711900 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:25:51,660-Speed 2513.20 samples/sec Loss 1.1359 LearningRate 0.000025 Epoch: 34 Global Step: 711910 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:25:59,859-Speed 2498.35 samples/sec Loss 1.1396 LearningRate 0.000025 Epoch: 34 Global Step: 711920 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:26:08,079-Speed 2491.72 samples/sec Loss 1.1278 LearningRate 0.000025 Epoch: 34 Global Step: 711930 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:26:16,282-Speed 2497.40 samples/sec Loss 1.1662 LearningRate 0.000025 Epoch: 34 Global Step: 711940 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:26:24,482-Speed 2498.06 samples/sec Loss 1.1589 LearningRate 0.000025 Epoch: 34 Global Step: 711950 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:26:32,688-Speed 2496.09 samples/sec Loss 1.1603 LearningRate 0.000025 Epoch: 34 Global Step: 711960 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:26:40,840-Speed 2512.85 samples/sec Loss 1.1438 LearningRate 0.000025 Epoch: 34 Global Step: 711970 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:26:49,040-Speed 2497.77 samples/sec Loss 1.1647 LearningRate 0.000025 Epoch: 34 Global Step: 711980 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:26:57,238-Speed 2498.71 samples/sec Loss 1.1467 LearningRate 0.000025 Epoch: 34 Global Step: 711990 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:27:05,442-Speed 2496.77 samples/sec Loss 1.1625 LearningRate 0.000025 Epoch: 34 Global Step: 712000 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:27:13,643-Speed 2497.76 samples/sec Loss 1.1146 LearningRate 0.000025 Epoch: 34 Global Step: 712010 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:27:21,845-Speed 2497.03 samples/sec Loss 1.1588 LearningRate 0.000025 Epoch: 34 Global Step: 712020 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:27:29,995-Speed 2513.61 samples/sec Loss 1.1624 LearningRate 0.000025 Epoch: 34 Global Step: 712030 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:27:38,196-Speed 2497.72 samples/sec Loss 1.1291 LearningRate 0.000025 Epoch: 34 Global Step: 712040 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:27:46,396-Speed 2498.21 samples/sec Loss 1.1441 LearningRate 0.000025 Epoch: 34 Global Step: 712050 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:27:54,600-Speed 2496.82 samples/sec Loss 1.1651 LearningRate 0.000025 Epoch: 34 Global Step: 712060 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:28:02,806-Speed 2496.10 samples/sec Loss 1.1165 LearningRate 0.000025 Epoch: 34 Global Step: 712070 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:28:11,012-Speed 2496.17 samples/sec Loss 1.1530 LearningRate 0.000025 Epoch: 34 Global Step: 712080 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:28:19,160-Speed 2513.87 samples/sec Loss 1.1389 LearningRate 0.000025 Epoch: 34 Global Step: 712090 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:28:27,358-Speed 2498.66 samples/sec Loss 1.1463 LearningRate 0.000025 Epoch: 34 Global Step: 712100 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:28:35,559-Speed 2498.16 samples/sec Loss 1.1605 LearningRate 0.000025 Epoch: 34 Global Step: 712110 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:28:43,758-Speed 2498.29 samples/sec Loss 1.1545 LearningRate 0.000025 Epoch: 34 Global Step: 712120 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:28:51,964-Speed 2496.07 samples/sec Loss 1.1623 LearningRate 0.000025 Epoch: 34 Global Step: 712130 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:29:00,178-Speed 2493.74 samples/sec Loss 1.1556 LearningRate 0.000025 Epoch: 34 Global Step: 712140 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:29:08,325-Speed 2514.24 samples/sec Loss 1.1590 LearningRate 0.000025 Epoch: 34 Global Step: 712150 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:29:16,526-Speed 2497.50 samples/sec Loss 1.1419 LearningRate 0.000025 Epoch: 34 Global Step: 712160 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:29:24,727-Speed 2497.62 samples/sec Loss 1.1668 LearningRate 0.000025 Epoch: 34 Global Step: 712170 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:29:32,929-Speed 2497.61 samples/sec Loss 1.1587 LearningRate 0.000025 Epoch: 34 Global Step: 712180 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:29:41,130-Speed 2497.78 samples/sec Loss 1.1633 LearningRate 0.000025 Epoch: 34 Global Step: 712190 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:29:49,330-Speed 2497.67 samples/sec Loss 1.1462 LearningRate 0.000025 Epoch: 34 Global Step: 712200 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:29:57,477-Speed 2514.29 samples/sec Loss 1.1264 LearningRate 0.000025 Epoch: 34 Global Step: 712210 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:30:05,687-Speed 2494.94 samples/sec Loss 1.1850 LearningRate 0.000025 Epoch: 34 Global Step: 712220 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:30:13,887-Speed 2497.84 samples/sec Loss 1.1570 LearningRate 0.000025 Epoch: 34 Global Step: 712230 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:30:22,087-Speed 2498.30 samples/sec Loss 1.1609 LearningRate 0.000025 Epoch: 34 Global Step: 712240 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:30:30,288-Speed 2497.76 samples/sec Loss 1.1695 LearningRate 0.000025 Epoch: 34 Global Step: 712250 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-07-12 09:30:38,446-Speed 2510.98 samples/sec Loss 1.1225 LearningRate 0.000025 Epoch: 34 Global Step: 712260 Fp16 Grad Scale: 8192 Required: 27 hours
Training: 2022-07-12 09:30:46,593-Speed 2514.30 samples/sec Loss 1.1746 LearningRate 0.000025 Epoch: 34 Global Step: 712270 Fp16 Grad Scale: 8192 Required: 27 hours
Training: 2022-07-12 09:30:54,803-Speed 2494.62 samples/sec Loss 1.1392 LearningRate 0.000025 Epoch: 34 Global Step: 712280 Fp16 Grad Scale: 8192 Required: 27 hours
Training: 2022-07-12 
09:31:03,019-Speed 2493.15 samples/sec Loss 1.1263 LearningRate 0.000025 Epoch: 34 Global Step: 712290 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:31:11,220-Speed 2497.84 samples/sec Loss 1.1761 LearningRate 0.000025 Epoch: 34 Global Step: 712300 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:31:19,419-Speed 2498.09 samples/sec Loss 1.1428 LearningRate 0.000025 Epoch: 34 Global Step: 712310 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:31:27,620-Speed 2497.61 samples/sec Loss 1.1468 LearningRate 0.000025 Epoch: 34 Global Step: 712320 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:31:35,766-Speed 2514.59 samples/sec Loss 1.1505 LearningRate 0.000025 Epoch: 34 Global Step: 712330 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:31:43,968-Speed 2497.29 samples/sec Loss 1.1680 LearningRate 0.000025 Epoch: 34 Global Step: 712340 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:31:52,167-Speed 2498.46 samples/sec Loss 1.1478 LearningRate 0.000025 Epoch: 34 Global Step: 712350 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:32:00,365-Speed 2498.58 samples/sec Loss 1.1673 LearningRate 0.000025 Epoch: 34 Global Step: 712360 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:32:08,589-Speed 2490.71 samples/sec Loss 1.1534 LearningRate 0.000025 Epoch: 34 Global Step: 712370 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:32:16,789-Speed 2497.94 samples/sec Loss 1.1635 LearningRate 0.000025 Epoch: 34 Global Step: 712380 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:32:24,945-Speed 2511.43 samples/sec Loss 1.1456 LearningRate 0.000025 Epoch: 34 Global Step: 712390 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:32:33,154-Speed 2495.49 samples/sec Loss 1.1554 LearningRate 0.000025 Epoch: 34 Global Step: 712400 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:32:41,358-Speed 
2496.44 samples/sec Loss 1.1690 LearningRate 0.000025 Epoch: 34 Global Step: 712410 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:32:49,561-Speed 2497.09 samples/sec Loss 1.1352 LearningRate 0.000025 Epoch: 34 Global Step: 712420 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:32:57,765-Speed 2496.74 samples/sec Loss 1.1632 LearningRate 0.000025 Epoch: 34 Global Step: 712430 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:33:05,972-Speed 2495.69 samples/sec Loss 1.1395 LearningRate 0.000025 Epoch: 34 Global Step: 712440 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:33:14,122-Speed 2513.19 samples/sec Loss 1.1540 LearningRate 0.000025 Epoch: 34 Global Step: 712450 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:33:22,323-Speed 2497.50 samples/sec Loss 1.1079 LearningRate 0.000025 Epoch: 34 Global Step: 712460 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:33:30,526-Speed 2497.33 samples/sec Loss 1.1089 LearningRate 0.000025 Epoch: 34 Global Step: 712470 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:33:38,729-Speed 2496.99 samples/sec Loss 1.1505 LearningRate 0.000025 Epoch: 34 Global Step: 712480 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:33:46,929-Speed 2498.06 samples/sec Loss 1.1500 LearningRate 0.000025 Epoch: 34 Global Step: 712490 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:33:55,128-Speed 2497.97 samples/sec Loss 1.1684 LearningRate 0.000025 Epoch: 34 Global Step: 712500 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:34:03,278-Speed 2513.43 samples/sec Loss 1.1733 LearningRate 0.000025 Epoch: 34 Global Step: 712510 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:34:11,476-Speed 2498.53 samples/sec Loss 1.1438 LearningRate 0.000025 Epoch: 34 Global Step: 712520 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:34:19,676-Speed 2498.01 samples/sec 
Loss 1.1240 LearningRate 0.000025 Epoch: 34 Global Step: 712530 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:34:27,878-Speed 2497.67 samples/sec Loss 1.1521 LearningRate 0.000025 Epoch: 34 Global Step: 712540 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:34:36,077-Speed 2498.09 samples/sec Loss 1.1603 LearningRate 0.000025 Epoch: 34 Global Step: 712550 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:34:44,281-Speed 2497.02 samples/sec Loss 1.1102 LearningRate 0.000025 Epoch: 34 Global Step: 712560 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:34:52,427-Speed 2515.09 samples/sec Loss 1.1631 LearningRate 0.000025 Epoch: 34 Global Step: 712570 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:35:00,630-Speed 2497.03 samples/sec Loss 1.1848 LearningRate 0.000025 Epoch: 34 Global Step: 712580 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:35:08,831-Speed 2497.61 samples/sec Loss 1.1349 LearningRate 0.000025 Epoch: 34 Global Step: 712590 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:35:17,037-Speed 2496.43 samples/sec Loss 1.1821 LearningRate 0.000025 Epoch: 34 Global Step: 712600 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:35:25,242-Speed 2496.64 samples/sec Loss 1.1745 LearningRate 0.000025 Epoch: 34 Global Step: 712610 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:35:33,445-Speed 2496.77 samples/sec Loss 1.1533 LearningRate 0.000025 Epoch: 34 Global Step: 712620 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:35:41,603-Speed 2510.99 samples/sec Loss 1.1424 LearningRate 0.000025 Epoch: 34 Global Step: 712630 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:35:49,804-Speed 2497.78 samples/sec Loss 1.1611 LearningRate 0.000025 Epoch: 34 Global Step: 712640 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:35:58,010-Speed 2495.92 samples/sec Loss 1.1578 
LearningRate 0.000025 Epoch: 34 Global Step: 712650 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:36:06,210-Speed 2497.84 samples/sec Loss 1.1333 LearningRate 0.000025 Epoch: 34 Global Step: 712660 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:36:14,408-Speed 2498.94 samples/sec Loss 1.1598 LearningRate 0.000025 Epoch: 34 Global Step: 712670 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:36:22,608-Speed 2498.00 samples/sec Loss 1.1420 LearningRate 0.000025 Epoch: 34 Global Step: 712680 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:36:30,751-Speed 2515.42 samples/sec Loss 1.1552 LearningRate 0.000025 Epoch: 34 Global Step: 712690 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:36:38,950-Speed 2497.98 samples/sec Loss 1.1370 LearningRate 0.000024 Epoch: 34 Global Step: 712700 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:36:47,151-Speed 2498.11 samples/sec Loss 1.1452 LearningRate 0.000024 Epoch: 34 Global Step: 712710 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:36:55,354-Speed 2497.17 samples/sec Loss 1.1549 LearningRate 0.000024 Epoch: 34 Global Step: 712720 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:37:03,562-Speed 2495.54 samples/sec Loss 1.1692 LearningRate 0.000024 Epoch: 34 Global Step: 712730 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:37:11,762-Speed 2497.69 samples/sec Loss 1.1713 LearningRate 0.000024 Epoch: 34 Global Step: 712740 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:37:19,922-Speed 2510.31 samples/sec Loss 1.1481 LearningRate 0.000024 Epoch: 34 Global Step: 712750 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:37:28,121-Speed 2498.52 samples/sec Loss 1.1574 LearningRate 0.000024 Epoch: 34 Global Step: 712760 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:37:36,324-Speed 2497.00 samples/sec Loss 1.1592 LearningRate 
0.000024 Epoch: 34 Global Step: 712770 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:37:44,523-Speed 2498.10 samples/sec Loss 1.1582 LearningRate 0.000024 Epoch: 34 Global Step: 712780 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:37:52,725-Speed 2497.67 samples/sec Loss 1.1413 LearningRate 0.000024 Epoch: 34 Global Step: 712790 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:38:00,927-Speed 2497.38 samples/sec Loss 1.1461 LearningRate 0.000024 Epoch: 34 Global Step: 712800 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:38:09,074-Speed 2514.18 samples/sec Loss 1.1493 LearningRate 0.000024 Epoch: 34 Global Step: 712810 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:38:17,286-Speed 2494.31 samples/sec Loss 1.1076 LearningRate 0.000024 Epoch: 34 Global Step: 712820 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:38:25,485-Speed 2497.99 samples/sec Loss 1.1512 LearningRate 0.000024 Epoch: 34 Global Step: 712830 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:38:33,697-Speed 2494.38 samples/sec Loss 1.1395 LearningRate 0.000024 Epoch: 34 Global Step: 712840 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:38:41,900-Speed 2497.14 samples/sec Loss 1.1081 LearningRate 0.000024 Epoch: 34 Global Step: 712850 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:38:50,102-Speed 2497.26 samples/sec Loss 1.1490 LearningRate 0.000024 Epoch: 34 Global Step: 712860 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:38:58,249-Speed 2514.49 samples/sec Loss 1.1556 LearningRate 0.000024 Epoch: 34 Global Step: 712870 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:39:06,450-Speed 2497.54 samples/sec Loss 1.1573 LearningRate 0.000024 Epoch: 34 Global Step: 712880 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:39:14,649-Speed 2498.34 samples/sec Loss 1.1670 LearningRate 0.000024 Epoch: 34 
Global Step: 712890 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:39:22,852-Speed 2497.09 samples/sec Loss 1.1598 LearningRate 0.000024 Epoch: 34 Global Step: 712900 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:39:31,052-Speed 2497.91 samples/sec Loss 1.1574 LearningRate 0.000024 Epoch: 34 Global Step: 712910 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:39:39,255-Speed 2496.92 samples/sec Loss 1.1614 LearningRate 0.000024 Epoch: 34 Global Step: 712920 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:39:47,404-Speed 2513.71 samples/sec Loss 1.1696 LearningRate 0.000024 Epoch: 34 Global Step: 712930 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:39:55,604-Speed 2497.85 samples/sec Loss 1.1422 LearningRate 0.000024 Epoch: 34 Global Step: 712940 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:40:03,809-Speed 2496.83 samples/sec Loss 1.1563 LearningRate 0.000024 Epoch: 34 Global Step: 712950 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:40:12,010-Speed 2497.78 samples/sec Loss 1.1396 LearningRate 0.000024 Epoch: 34 Global Step: 712960 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:40:20,223-Speed 2494.13 samples/sec Loss 1.1194 LearningRate 0.000024 Epoch: 34 Global Step: 712970 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:40:28,423-Speed 2497.99 samples/sec Loss 1.1508 LearningRate 0.000024 Epoch: 34 Global Step: 712980 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:40:36,566-Speed 2515.34 samples/sec Loss 1.1612 LearningRate 0.000024 Epoch: 34 Global Step: 712990 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:40:44,765-Speed 2498.16 samples/sec Loss 1.1485 LearningRate 0.000024 Epoch: 34 Global Step: 713000 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:40:52,969-Speed 2496.82 samples/sec Loss 1.1324 LearningRate 0.000024 Epoch: 34 Global Step: 713010 
Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:41:01,171-Speed 2497.29 samples/sec Loss 1.1699 LearningRate 0.000024 Epoch: 34 Global Step: 713020 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:41:09,369-Speed 2499.01 samples/sec Loss 1.1548 LearningRate 0.000024 Epoch: 34 Global Step: 713030 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:41:17,568-Speed 2497.96 samples/sec Loss 1.1460 LearningRate 0.000024 Epoch: 34 Global Step: 713040 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:41:25,716-Speed 2514.10 samples/sec Loss 1.1657 LearningRate 0.000024 Epoch: 34 Global Step: 713050 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:41:33,922-Speed 2496.32 samples/sec Loss 1.1427 LearningRate 0.000024 Epoch: 34 Global Step: 713060 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:41:42,123-Speed 2497.65 samples/sec Loss 1.1500 LearningRate 0.000024 Epoch: 34 Global Step: 713070 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:41:50,321-Speed 2498.64 samples/sec Loss 1.1517 LearningRate 0.000024 Epoch: 34 Global Step: 713080 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:41:58,521-Speed 2497.75 samples/sec Loss 1.1484 LearningRate 0.000024 Epoch: 34 Global Step: 713090 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:42:06,723-Speed 2497.50 samples/sec Loss 1.1692 LearningRate 0.000024 Epoch: 34 Global Step: 713100 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:42:14,868-Speed 2514.74 samples/sec Loss 1.1690 LearningRate 0.000024 Epoch: 34 Global Step: 713110 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:42:23,067-Speed 2498.51 samples/sec Loss 1.1511 LearningRate 0.000024 Epoch: 34 Global Step: 713120 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:42:31,267-Speed 2497.88 samples/sec Loss 1.1633 LearningRate 0.000024 Epoch: 34 Global Step: 713130 Fp16 Grad Scale: 
8192 Required: 27 hours Training: 2022-07-12 09:42:39,467-Speed 2497.93 samples/sec Loss 1.1295 LearningRate 0.000024 Epoch: 34 Global Step: 713140 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:42:47,666-Speed 2498.19 samples/sec Loss 1.1108 LearningRate 0.000024 Epoch: 34 Global Step: 713150 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:42:55,867-Speed 2497.83 samples/sec Loss 1.1477 LearningRate 0.000024 Epoch: 34 Global Step: 713160 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:43:04,018-Speed 2513.32 samples/sec Loss 1.1500 LearningRate 0.000024 Epoch: 34 Global Step: 713170 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:43:12,217-Speed 2498.08 samples/sec Loss 1.1602 LearningRate 0.000024 Epoch: 34 Global Step: 713180 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:43:20,430-Speed 2493.89 samples/sec Loss 1.1325 LearningRate 0.000024 Epoch: 34 Global Step: 713190 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:43:28,639-Speed 2495.42 samples/sec Loss 1.1437 LearningRate 0.000024 Epoch: 34 Global Step: 713200 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:43:36,842-Speed 2496.86 samples/sec Loss 1.1704 LearningRate 0.000024 Epoch: 34 Global Step: 713210 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:43:45,048-Speed 2496.23 samples/sec Loss 1.1523 LearningRate 0.000024 Epoch: 34 Global Step: 713220 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:43:53,195-Speed 2514.23 samples/sec Loss 1.1298 LearningRate 0.000024 Epoch: 34 Global Step: 713230 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:44:01,398-Speed 2497.10 samples/sec Loss 1.1737 LearningRate 0.000024 Epoch: 34 Global Step: 713240 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:44:09,599-Speed 2497.70 samples/sec Loss 1.1485 LearningRate 0.000024 Epoch: 34 Global Step: 713250 Fp16 Grad Scale: 8192 Required: 27 
hours Training: 2022-07-12 09:44:17,801-Speed 2497.48 samples/sec Loss 1.1599 LearningRate 0.000024 Epoch: 34 Global Step: 713260 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:44:26,000-Speed 2498.04 samples/sec Loss 1.1685 LearningRate 0.000024 Epoch: 34 Global Step: 713270 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:44:34,203-Speed 2497.16 samples/sec Loss 1.1494 LearningRate 0.000024 Epoch: 34 Global Step: 713280 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:44:42,346-Speed 2515.38 samples/sec Loss 1.1403 LearningRate 0.000024 Epoch: 34 Global Step: 713290 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:44:50,546-Speed 2498.02 samples/sec Loss 1.1631 LearningRate 0.000024 Epoch: 34 Global Step: 713300 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:44:58,744-Speed 2498.69 samples/sec Loss 1.1571 LearningRate 0.000024 Epoch: 34 Global Step: 713310 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:45:06,946-Speed 2497.05 samples/sec Loss 1.1820 LearningRate 0.000024 Epoch: 34 Global Step: 713320 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:45:15,145-Speed 2498.49 samples/sec Loss 1.1582 LearningRate 0.000024 Epoch: 34 Global Step: 713330 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:45:23,347-Speed 2497.32 samples/sec Loss 1.1583 LearningRate 0.000024 Epoch: 34 Global Step: 713340 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:45:31,496-Speed 2513.69 samples/sec Loss 1.1323 LearningRate 0.000024 Epoch: 34 Global Step: 713350 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:45:39,695-Speed 2498.10 samples/sec Loss 1.2068 LearningRate 0.000024 Epoch: 34 Global Step: 713360 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:45:47,894-Speed 2498.41 samples/sec Loss 1.1668 LearningRate 0.000024 Epoch: 34 Global Step: 713370 Fp16 Grad Scale: 8192 Required: 27 hours Training: 
2022-07-12 09:45:56,094-Speed 2497.96 samples/sec Loss 1.1533 LearningRate 0.000024 Epoch: 34 Global Step: 713380 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:46:04,298-Speed 2497.03 samples/sec Loss 1.1281 LearningRate 0.000024 Epoch: 34 Global Step: 713390 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:46:12,501-Speed 2497.00 samples/sec Loss 1.1658 LearningRate 0.000024 Epoch: 34 Global Step: 713400 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:46:20,649-Speed 2514.14 samples/sec Loss 1.1250 LearningRate 0.000024 Epoch: 34 Global Step: 713410 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:46:28,857-Speed 2495.41 samples/sec Loss 1.1277 LearningRate 0.000024 Epoch: 34 Global Step: 713420 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:46:37,068-Speed 2494.51 samples/sec Loss 1.1373 LearningRate 0.000024 Epoch: 34 Global Step: 713430 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:46:45,269-Speed 2497.77 samples/sec Loss 1.1466 LearningRate 0.000024 Epoch: 34 Global Step: 713440 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:46:53,468-Speed 2498.16 samples/sec Loss 1.1678 LearningRate 0.000024 Epoch: 34 Global Step: 713450 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:47:01,680-Speed 2494.37 samples/sec Loss 1.2028 LearningRate 0.000024 Epoch: 34 Global Step: 713460 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:47:09,827-Speed 2514.10 samples/sec Loss 1.1203 LearningRate 0.000024 Epoch: 34 Global Step: 713470 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:47:18,033-Speed 2496.67 samples/sec Loss 1.1583 LearningRate 0.000024 Epoch: 34 Global Step: 713480 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:47:26,236-Speed 2496.89 samples/sec Loss 1.1870 LearningRate 0.000024 Epoch: 34 Global Step: 713490 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 
09:47:34,439-Speed 2497.02 samples/sec Loss 1.1619 LearningRate 0.000024 Epoch: 34 Global Step: 713500 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:47:42,643-Speed 2496.81 samples/sec Loss 1.1405 LearningRate 0.000024 Epoch: 34 Global Step: 713510 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:47:50,844-Speed 2497.63 samples/sec Loss 1.1484 LearningRate 0.000024 Epoch: 34 Global Step: 713520 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:47:58,991-Speed 2514.15 samples/sec Loss 1.1431 LearningRate 0.000024 Epoch: 34 Global Step: 713530 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:48:07,191-Speed 2497.85 samples/sec Loss 1.1704 LearningRate 0.000024 Epoch: 34 Global Step: 713540 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:48:15,390-Speed 2498.26 samples/sec Loss 1.1347 LearningRate 0.000024 Epoch: 34 Global Step: 713550 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:48:23,596-Speed 2496.27 samples/sec Loss 1.1315 LearningRate 0.000024 Epoch: 34 Global Step: 713560 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:48:31,797-Speed 2497.68 samples/sec Loss 1.1565 LearningRate 0.000024 Epoch: 34 Global Step: 713570 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:48:40,001-Speed 2496.70 samples/sec Loss 1.1574 LearningRate 0.000024 Epoch: 34 Global Step: 713580 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:48:48,165-Speed 2509.10 samples/sec Loss 1.1591 LearningRate 0.000024 Epoch: 34 Global Step: 713590 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:48:56,368-Speed 2496.74 samples/sec Loss 1.1571 LearningRate 0.000024 Epoch: 34 Global Step: 713600 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:49:04,570-Speed 2497.33 samples/sec Loss 1.1508 LearningRate 0.000024 Epoch: 34 Global Step: 713610 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 
09:49:12,775-Speed 2496.71 samples/sec Loss 1.1382 LearningRate 0.000024 Epoch: 34 Global Step: 713620 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:49:20,976-Speed 2497.58 samples/sec Loss 1.2008 LearningRate 0.000024 Epoch: 34 Global Step: 713630 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:49:29,178-Speed 2497.29 samples/sec Loss 1.1878 LearningRate 0.000024 Epoch: 34 Global Step: 713640 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:49:37,326-Speed 2514.20 samples/sec Loss 1.1323 LearningRate 0.000024 Epoch: 34 Global Step: 713650 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:49:45,533-Speed 2495.82 samples/sec Loss 1.1532 LearningRate 0.000024 Epoch: 34 Global Step: 713660 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:49:53,731-Speed 2498.38 samples/sec Loss 1.1508 LearningRate 0.000024 Epoch: 34 Global Step: 713670 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:50:01,938-Speed 2496.04 samples/sec Loss 1.1513 LearningRate 0.000024 Epoch: 34 Global Step: 713680 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:50:10,146-Speed 2495.63 samples/sec Loss 1.1456 LearningRate 0.000024 Epoch: 34 Global Step: 713690 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:50:18,347-Speed 2497.68 samples/sec Loss 1.1654 LearningRate 0.000024 Epoch: 34 Global Step: 713700 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:50:26,499-Speed 2512.47 samples/sec Loss 1.1724 LearningRate 0.000024 Epoch: 34 Global Step: 713710 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:50:34,698-Speed 2498.39 samples/sec Loss 1.1540 LearningRate 0.000024 Epoch: 34 Global Step: 713720 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:50:42,898-Speed 2497.87 samples/sec Loss 1.1490 LearningRate 0.000024 Epoch: 34 Global Step: 713730 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 
09:50:51,097-Speed 2498.28 samples/sec Loss 1.1428 LearningRate 0.000024 Epoch: 34 Global Step: 713740 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:50:59,298-Speed 2497.76 samples/sec Loss 1.1178 LearningRate 0.000024 Epoch: 34 Global Step: 713750 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:51:07,499-Speed 2497.57 samples/sec Loss 1.1664 LearningRate 0.000024 Epoch: 34 Global Step: 713760 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:51:15,645-Speed 2514.55 samples/sec Loss 1.1838 LearningRate 0.000024 Epoch: 34 Global Step: 713770 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:51:23,844-Speed 2498.18 samples/sec Loss 1.1599 LearningRate 0.000024 Epoch: 34 Global Step: 713780 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:51:32,054-Speed 2494.96 samples/sec Loss 1.1373 LearningRate 0.000024 Epoch: 34 Global Step: 713790 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-07-12 09:51:40,212-Speed 2511.08 samples/sec Loss 1.1220 LearningRate 0.000024 Epoch: 34 Global Step: 713800 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-07-12 09:51:48,416-Speed 2496.48 samples/sec Loss 1.1797 LearningRate 0.000024 Epoch: 34 Global Step: 713810 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:51:56,628-Speed 2494.43 samples/sec Loss 1.1448 LearningRate 0.000024 Epoch: 34 Global Step: 713820 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:52:04,773-Speed 2514.75 samples/sec Loss 1.1733 LearningRate 0.000024 Epoch: 34 Global Step: 713830 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:52:12,979-Speed 2496.16 samples/sec Loss 1.1580 LearningRate 0.000024 Epoch: 34 Global Step: 713840 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:52:21,190-Speed 2495.06 samples/sec Loss 1.1483 LearningRate 0.000024 Epoch: 34 Global Step: 713850 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 
09:52:29,395-Speed 2496.24 samples/sec Loss 1.1564 LearningRate 0.000024 Epoch: 34 Global Step: 713860 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:52:37,593-Speed 2498.97 samples/sec Loss 1.1472 LearningRate 0.000024 Epoch: 34 Global Step: 713870 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:52:45,791-Speed 2498.34 samples/sec Loss 1.1406 LearningRate 0.000024 Epoch: 34 Global Step: 713880 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:52:53,941-Speed 2513.39 samples/sec Loss 1.1527 LearningRate 0.000024 Epoch: 34 Global Step: 713890 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:53:02,145-Speed 2496.57 samples/sec Loss 1.1646 LearningRate 0.000024 Epoch: 34 Global Step: 713900 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:53:10,347-Speed 2497.29 samples/sec Loss 1.1561 LearningRate 0.000024 Epoch: 34 Global Step: 713910 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:53:18,546-Speed 2498.46 samples/sec Loss 1.1457 LearningRate 0.000024 Epoch: 34 Global Step: 713920 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:53:26,747-Speed 2497.61 samples/sec Loss 1.1278 LearningRate 0.000024 Epoch: 34 Global Step: 713930 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:53:34,946-Speed 2498.06 samples/sec Loss 1.1515 LearningRate 0.000024 Epoch: 34 Global Step: 713940 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:53:43,094-Speed 2514.00 samples/sec Loss 1.1907 LearningRate 0.000024 Epoch: 34 Global Step: 713950 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:53:51,293-Speed 2498.14 samples/sec Loss 1.1293 LearningRate 0.000024 Epoch: 34 Global Step: 713960 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:53:59,494-Speed 2497.72 samples/sec Loss 1.1518 LearningRate 0.000024 Epoch: 34 Global Step: 713970 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-07-12 09:54:07,696-Speed 
2497.42 samples/sec Loss 1.1558 LearningRate 0.000024 Epoch: 34 Global Step: 713980 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:54:15,898-Speed 2497.31 samples/sec Loss 1.1630 LearningRate 0.000024 Epoch: 34 Global Step: 713990 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:54:24,099-Speed 2497.70 samples/sec Loss 1.1475 LearningRate 0.000024 Epoch: 34 Global Step: 714000 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:54:32,243-Speed 2514.83 samples/sec Loss 1.1356 LearningRate 0.000024 Epoch: 34 Global Step: 714010 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:54:40,448-Speed 2496.73 samples/sec Loss 1.1261 LearningRate 0.000024 Epoch: 34 Global Step: 714020 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:54:48,651-Speed 2497.09 samples/sec Loss 1.1406 LearningRate 0.000024 Epoch: 34 Global Step: 714030 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:54:56,858-Speed 2495.93 samples/sec Loss 1.1308 LearningRate 0.000024 Epoch: 34 Global Step: 714040 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:55:05,060-Speed 2497.35 samples/sec Loss 1.1439 LearningRate 0.000024 Epoch: 34 Global Step: 714050 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:55:13,275-Speed 2493.47 samples/sec Loss 1.1604 LearningRate 0.000024 Epoch: 34 Global Step: 714060 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:55:21,420-Speed 2514.47 samples/sec Loss 1.1417 LearningRate 0.000024 Epoch: 34 Global Step: 714070 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:55:29,622-Speed 2497.54 samples/sec Loss 1.1734 LearningRate 0.000024 Epoch: 34 Global Step: 714080 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:55:37,821-Speed 2498.50 samples/sec Loss 1.1417 LearningRate 0.000024 Epoch: 34 Global Step: 714090 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:55:46,024-Speed 2497.11 samples/sec Loss 1.1457 LearningRate 0.000024 Epoch: 34 Global Step: 714100 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:55:54,233-Speed 2494.97 samples/sec Loss 1.1664 LearningRate 0.000024 Epoch: 34 Global Step: 714110 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:56:02,439-Speed 2496.22 samples/sec Loss 1.1614 LearningRate 0.000024 Epoch: 34 Global Step: 714120 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:56:10,584-Speed 2514.76 samples/sec Loss 1.1682 LearningRate 0.000024 Epoch: 34 Global Step: 714130 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:56:18,785-Speed 2497.92 samples/sec Loss 1.1216 LearningRate 0.000024 Epoch: 34 Global Step: 714140 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:56:26,983-Speed 2499.01 samples/sec Loss 1.1454 LearningRate 0.000024 Epoch: 34 Global Step: 714150 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:56:35,183-Speed 2497.75 samples/sec Loss 1.1532 LearningRate 0.000024 Epoch: 34 Global Step: 714160 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:56:43,383-Speed 2498.11 samples/sec Loss 1.1725 LearningRate 0.000024 Epoch: 34 Global Step: 714170 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:56:51,581-Speed 2498.58 samples/sec Loss 1.1615 LearningRate 0.000024 Epoch: 34 Global Step: 714180 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:56:59,727-Speed 2514.44 samples/sec Loss 1.1530 LearningRate 0.000024 Epoch: 34 Global Step: 714190 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:57:07,944-Speed 2492.68 samples/sec Loss 1.1357 LearningRate 0.000024 Epoch: 34 Global Step: 714200 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:57:16,144-Speed 2497.98 samples/sec Loss 1.1195 LearningRate 0.000024 Epoch: 34 Global Step: 714210 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:57:24,343-Speed 2498.57 samples/sec Loss 1.1259 LearningRate 0.000024 Epoch: 34 Global Step: 714220 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:57:32,543-Speed 2497.78 samples/sec Loss 1.1553 LearningRate 0.000024 Epoch: 34 Global Step: 714230 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:57:40,744-Speed 2497.62 samples/sec Loss 1.1540 LearningRate 0.000024 Epoch: 34 Global Step: 714240 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:57:48,892-Speed 2513.85 samples/sec Loss 1.0845 LearningRate 0.000024 Epoch: 34 Global Step: 714250 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:57:57,093-Speed 2497.69 samples/sec Loss 1.1653 LearningRate 0.000024 Epoch: 34 Global Step: 714260 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:58:05,296-Speed 2496.97 samples/sec Loss 1.1541 LearningRate 0.000024 Epoch: 34 Global Step: 714270 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:58:13,517-Speed 2491.52 samples/sec Loss 1.1493 LearningRate 0.000024 Epoch: 34 Global Step: 714280 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:58:21,720-Speed 2497.40 samples/sec Loss 1.1312 LearningRate 0.000024 Epoch: 34 Global Step: 714290 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:58:29,920-Speed 2497.85 samples/sec Loss 1.1466 LearningRate 0.000024 Epoch: 34 Global Step: 714300 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:58:38,064-Speed 2514.94 samples/sec Loss 1.1374 LearningRate 0.000024 Epoch: 34 Global Step: 714310 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:58:46,264-Speed 2498.02 samples/sec Loss 1.1773 LearningRate 0.000024 Epoch: 34 Global Step: 714320 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:58:54,492-Speed 2489.22 samples/sec Loss 1.1190 LearningRate 0.000024 Epoch: 34 Global Step: 714330 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:59:02,695-Speed 2497.29 samples/sec Loss 1.1197 LearningRate 0.000024 Epoch: 34 Global Step: 714340 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:59:10,906-Speed 2494.61 samples/sec Loss 1.1409 LearningRate 0.000024 Epoch: 34 Global Step: 714350 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:59:19,124-Speed 2492.79 samples/sec Loss 1.1631 LearningRate 0.000024 Epoch: 34 Global Step: 714360 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:59:27,275-Speed 2513.04 samples/sec Loss 1.1649 LearningRate 0.000024 Epoch: 34 Global Step: 714370 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:59:35,478-Speed 2496.81 samples/sec Loss 1.1628 LearningRate 0.000024 Epoch: 34 Global Step: 714380 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:59:43,682-Speed 2496.76 samples/sec Loss 1.1607 LearningRate 0.000024 Epoch: 34 Global Step: 714390 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 09:59:51,884-Speed 2497.21 samples/sec Loss 1.1472 LearningRate 0.000024 Epoch: 34 Global Step: 714400 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:00:00,083-Speed 2498.37 samples/sec Loss 1.1449 LearningRate 0.000024 Epoch: 34 Global Step: 714410 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:00:08,284-Speed 2497.33 samples/sec Loss 1.1620 LearningRate 0.000024 Epoch: 34 Global Step: 714420 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:00:16,432-Speed 2514.01 samples/sec Loss 1.1246 LearningRate 0.000024 Epoch: 34 Global Step: 714430 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:00:24,635-Speed 2496.98 samples/sec Loss 1.1362 LearningRate 0.000024 Epoch: 34 Global Step: 714440 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:00:32,835-Speed 2498.09 samples/sec Loss 1.1588 LearningRate 0.000024 Epoch: 34 Global Step: 714450 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:00:41,035-Speed 2497.88 samples/sec Loss 1.1689 LearningRate 0.000024 Epoch: 34 Global Step: 714460 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:00:49,235-Speed 2498.10 samples/sec Loss 1.1699 LearningRate 0.000024 Epoch: 34 Global Step: 714470 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:00:57,436-Speed 2497.64 samples/sec Loss 1.1885 LearningRate 0.000024 Epoch: 34 Global Step: 714480 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:01:05,578-Speed 2515.75 samples/sec Loss 1.1568 LearningRate 0.000024 Epoch: 34 Global Step: 714490 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:01:13,775-Speed 2498.72 samples/sec Loss 1.1363 LearningRate 0.000024 Epoch: 34 Global Step: 714500 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:01:21,976-Speed 2497.74 samples/sec Loss 1.1505 LearningRate 0.000024 Epoch: 34 Global Step: 714510 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:01:30,174-Speed 2498.59 samples/sec Loss 1.1172 LearningRate 0.000024 Epoch: 34 Global Step: 714520 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:01:38,372-Speed 2498.64 samples/sec Loss 1.1563 LearningRate 0.000024 Epoch: 34 Global Step: 714530 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:01:46,594-Speed 2491.41 samples/sec Loss 1.1287 LearningRate 0.000024 Epoch: 34 Global Step: 714540 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:01:54,746-Speed 2512.86 samples/sec Loss 1.1699 LearningRate 0.000024 Epoch: 34 Global Step: 714550 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:02:02,947-Speed 2497.46 samples/sec Loss 1.1917 LearningRate 0.000024 Epoch: 34 Global Step: 714560 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:02:11,146-Speed 2498.32 samples/sec Loss 1.1907 LearningRate 0.000024 Epoch: 34 Global Step: 714570 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:02:19,348-Speed 2497.45 samples/sec Loss 1.1396 LearningRate 0.000024 Epoch: 34 Global Step: 714580 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:02:27,556-Speed 2496.04 samples/sec Loss 1.1266 LearningRate 0.000024 Epoch: 34 Global Step: 714590 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:02:35,753-Speed 2498.56 samples/sec Loss 1.1231 LearningRate 0.000024 Epoch: 34 Global Step: 714600 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:02:43,907-Speed 2512.05 samples/sec Loss 1.1713 LearningRate 0.000024 Epoch: 34 Global Step: 714610 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:02:52,127-Speed 2492.03 samples/sec Loss 1.1409 LearningRate 0.000024 Epoch: 34 Global Step: 714620 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:03:00,327-Speed 2497.98 samples/sec Loss 1.1525 LearningRate 0.000024 Epoch: 34 Global Step: 714630 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:03:08,528-Speed 2497.67 samples/sec Loss 1.1485 LearningRate 0.000024 Epoch: 34 Global Step: 714640 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:03:16,727-Speed 2498.51 samples/sec Loss 1.1772 LearningRate 0.000024 Epoch: 34 Global Step: 714650 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:03:24,924-Speed 2499.08 samples/sec Loss 1.1582 LearningRate 0.000024 Epoch: 34 Global Step: 714660 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:03:33,081-Speed 2511.06 samples/sec Loss 1.1655 LearningRate 0.000024 Epoch: 34 Global Step: 714670 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:03:41,293-Speed 2494.74 samples/sec Loss 1.1627 LearningRate 0.000024 Epoch: 34 Global Step: 714680 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:03:49,498-Speed 2496.17 samples/sec Loss 1.1427 LearningRate 0.000024 Epoch: 34 Global Step: 714690 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:03:57,703-Speed 2496.71 samples/sec Loss 1.1506 LearningRate 0.000024 Epoch: 34 Global Step: 714700 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:04:05,909-Speed 2496.13 samples/sec Loss 1.1525 LearningRate 0.000024 Epoch: 34 Global Step: 714710 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:04:14,115-Speed 2495.95 samples/sec Loss 1.1537 LearningRate 0.000024 Epoch: 34 Global Step: 714720 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:04:22,263-Speed 2513.95 samples/sec Loss 1.1513 LearningRate 0.000024 Epoch: 34 Global Step: 714730 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:04:30,463-Speed 2498.06 samples/sec Loss 1.1598 LearningRate 0.000024 Epoch: 34 Global Step: 714740 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:04:38,663-Speed 2497.73 samples/sec Loss 1.1664 LearningRate 0.000024 Epoch: 34 Global Step: 714750 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:04:46,867-Speed 2497.17 samples/sec Loss 1.1632 LearningRate 0.000024 Epoch: 34 Global Step: 714760 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:04:55,068-Speed 2497.67 samples/sec Loss 1.1329 LearningRate 0.000024 Epoch: 34 Global Step: 714770 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:05:03,271-Speed 2497.16 samples/sec Loss 1.1619 LearningRate 0.000024 Epoch: 34 Global Step: 714780 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:05:11,418-Speed 2514.08 samples/sec Loss 1.1590 LearningRate 0.000024 Epoch: 34 Global Step: 714790 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:05:19,618-Speed 2497.97 samples/sec Loss 1.1661 LearningRate 0.000024 Epoch: 34 Global Step: 714800 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:05:27,818-Speed 2498.12 samples/sec Loss 1.1056 LearningRate 0.000024 Epoch: 34 Global Step: 714810 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:05:36,021-Speed 2497.09 samples/sec Loss 1.1206 LearningRate 0.000024 Epoch: 34 Global Step: 714820 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:05:44,218-Speed 2498.85 samples/sec Loss 1.1640 LearningRate 0.000024 Epoch: 34 Global Step: 714830 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:05:52,418-Speed 2497.94 samples/sec Loss 1.1192 LearningRate 0.000024 Epoch: 34 Global Step: 714840 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:06:00,569-Speed 2512.89 samples/sec Loss 1.1448 LearningRate 0.000024 Epoch: 34 Global Step: 714850 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:06:08,772-Speed 2497.21 samples/sec Loss 1.1531 LearningRate 0.000024 Epoch: 34 Global Step: 714860 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:06:16,971-Speed 2498.31 samples/sec Loss 1.1680 LearningRate 0.000024 Epoch: 34 Global Step: 714870 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:06:25,171-Speed 2498.06 samples/sec Loss 1.1733 LearningRate 0.000024 Epoch: 34 Global Step: 714880 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:06:33,375-Speed 2496.76 samples/sec Loss 1.1384 LearningRate 0.000024 Epoch: 34 Global Step: 714890 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:06:41,572-Speed 2498.89 samples/sec Loss 1.1271 LearningRate 0.000024 Epoch: 34 Global Step: 714900 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:06:49,725-Speed 2512.17 samples/sec Loss 1.1670 LearningRate 0.000024 Epoch: 34 Global Step: 714910 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:06:57,934-Speed 2495.31 samples/sec Loss 1.1674 LearningRate 0.000024 Epoch: 34 Global Step: 714920 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:07:06,134-Speed 2497.90 samples/sec Loss 1.1476 LearningRate 0.000024 Epoch: 34 Global Step: 714930 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:07:14,334-Speed 2498.23 samples/sec Loss 1.1697 LearningRate 0.000024 Epoch: 34 Global Step: 714940 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:07:22,532-Speed 2498.42 samples/sec Loss 1.1545 LearningRate 0.000024 Epoch: 34 Global Step: 714950 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:07:30,735-Speed 2497.15 samples/sec Loss 1.1386 LearningRate 0.000024 Epoch: 34 Global Step: 714960 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:07:38,883-Speed 2513.88 samples/sec Loss 1.1664 LearningRate 0.000024 Epoch: 34 Global Step: 714970 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:07:47,082-Speed 2498.09 samples/sec Loss 1.1305 LearningRate 0.000024 Epoch: 34 Global Step: 714980 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:07:55,304-Speed 2491.63 samples/sec Loss 1.1495 LearningRate 0.000024 Epoch: 34 Global Step: 714990 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-07-12 10:08:03,506-Speed 2497.44 samples/sec Loss 1.1361 LearningRate 0.000024 Epoch: 34 Global Step: 715000 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:08:11,707-Speed 2497.63 samples/sec Loss 1.1400 LearningRate 0.000024 Epoch: 34 Global Step: 715010 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:08:19,909-Speed 2497.22 samples/sec Loss 1.1264 LearningRate 0.000024 Epoch: 34 Global Step: 715020 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:08:28,056-Speed 2514.22 samples/sec Loss 1.1889 LearningRate 0.000024 Epoch: 34 Global Step: 715030 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:08:36,270-Speed 2493.56 samples/sec Loss 1.1891 LearningRate 0.000024 Epoch: 34 Global Step: 715040 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:08:44,472-Speed 2497.45 samples/sec Loss 1.1584 LearningRate 0.000024 Epoch: 34 Global Step: 715050 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:08:52,676-Speed 2496.63 samples/sec Loss 1.1536 LearningRate 0.000024 Epoch: 34 Global Step: 715060 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:09:00,891-Speed 2493.22 samples/sec Loss 1.1482 LearningRate 0.000024 Epoch: 34 Global Step: 715070 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:09:09,096-Speed 2497.64 samples/sec Loss 1.1496 LearningRate 0.000024 Epoch: 34 Global Step: 715080 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:09:17,246-Speed 2513.18 samples/sec Loss 1.1508 LearningRate 0.000024 Epoch: 34 Global Step: 715090 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:09:25,446-Speed 2497.92 samples/sec Loss 1.1504 LearningRate 0.000024 Epoch: 34 Global Step: 715100 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:09:33,649-Speed 2496.93 samples/sec Loss 1.1301 LearningRate 0.000023 Epoch: 34 Global Step: 715110 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:09:41,852-Speed 2497.36 samples/sec Loss 1.1500 LearningRate 0.000023 Epoch: 34 Global Step: 715120 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:09:50,051-Speed 2498.49 samples/sec Loss 1.1432 LearningRate 0.000023 Epoch: 34 Global Step: 715130 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:09:58,250-Speed 2498.07 samples/sec Loss 1.1384 LearningRate 0.000023 Epoch: 34 Global Step: 715140 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:10:06,398-Speed 2513.98 samples/sec Loss 1.1268 LearningRate 0.000023 Epoch: 34 Global Step: 715150 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:10:14,596-Speed 2498.73 samples/sec Loss 1.1628 LearningRate 0.000023 Epoch: 34 Global Step: 715160 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:10:22,797-Speed 2497.56 samples/sec Loss 1.1457 LearningRate 0.000023 Epoch: 34 Global Step: 715170 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:10:30,995-Speed 2498.34 samples/sec Loss 1.2093 LearningRate 0.000023 Epoch: 34 Global Step: 715180 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:10:39,194-Speed 2498.61 samples/sec Loss 1.1577 LearningRate 0.000023 Epoch: 34 Global Step: 715190 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:10:47,410-Speed 2493.02 samples/sec Loss 1.1632 LearningRate 0.000023 Epoch: 34 Global Step: 715200 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:10:55,555-Speed 2514.86 samples/sec Loss 1.1600 LearningRate 0.000023 Epoch: 34 Global Step: 715210 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:11:03,756-Speed 2497.85 samples/sec Loss 1.1442 LearningRate 0.000023 Epoch: 34 Global Step: 715220 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:11:11,951-Speed 2499.32 samples/sec Loss 1.1774 LearningRate 0.000023 Epoch: 34 Global Step: 715230 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:11:20,152-Speed 2497.87 samples/sec Loss 1.1497 LearningRate 0.000023 Epoch: 34 Global Step: 715240 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:11:28,353-Speed 2497.53 samples/sec Loss 1.1306 LearningRate 0.000023 Epoch: 34 Global Step: 715250 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:11:36,554-Speed 2497.88 samples/sec Loss 1.1644 LearningRate 0.000023 Epoch: 34 Global Step: 715260 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:11:44,704-Speed 2513.37 samples/sec Loss 1.1413 LearningRate 0.000023 Epoch: 34 Global Step: 715270 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:11:52,909-Speed 2496.44 samples/sec Loss 1.1664 LearningRate 0.000023 Epoch: 34 Global Step: 715280 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:12:01,111-Speed 2497.52 samples/sec Loss 1.1459 LearningRate 0.000023 Epoch: 34 Global Step: 715290 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:12:09,308-Speed 2498.74 samples/sec Loss 1.1246 LearningRate 0.000023 Epoch: 34 Global Step: 715300 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:12:17,523-Speed 2493.24 samples/sec Loss 1.1343 LearningRate 0.000023 Epoch: 34 Global Step: 715310 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:12:25,721-Speed 2498.70 samples/sec Loss 1.1493 LearningRate 0.000023 Epoch: 34 Global Step: 715320 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:12:33,867-Speed 2514.45 samples/sec Loss 1.1582 LearningRate 0.000023 Epoch: 34 Global Step: 715330 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:12:42,067-Speed 2498.14 samples/sec Loss 1.1526 LearningRate 0.000023 Epoch: 34 Global Step: 715340 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:12:50,264-Speed 2498.82 samples/sec Loss 1.1677 LearningRate 0.000023 Epoch: 34 Global Step: 715350 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:12:58,465-Speed 2497.99 samples/sec Loss 1.1513 LearningRate 0.000023 Epoch: 34 Global Step: 715360 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:13:06,665-Speed 2497.69 samples/sec Loss 1.1286 LearningRate 0.000023 Epoch: 34 Global Step: 715370 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:13:14,863-Speed 2498.37 samples/sec Loss 1.1561 LearningRate 0.000023 Epoch: 34 Global Step: 715380 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:13:23,014-Speed 2513.26 samples/sec Loss 1.1441 LearningRate 0.000023 Epoch: 34 Global Step: 715390 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:13:31,212-Speed 2498.56 samples/sec Loss 1.1311 LearningRate 0.000023 Epoch: 34 Global Step: 715400 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:13:39,415-Speed 2497.24 samples/sec Loss 1.1209 LearningRate 0.000023 Epoch: 34 Global Step: 715410 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:13:47,618-Speed 2496.98 samples/sec Loss 1.1444 LearningRate 0.000023 Epoch: 34 Global Step: 715420 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:13:55,817-Speed 2498.18 samples/sec Loss 1.1633 LearningRate 0.000023 Epoch: 34 Global Step: 715430 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:14:04,020-Speed 2497.15 samples/sec Loss 1.1558 LearningRate 0.000023 Epoch: 34 Global Step: 715440 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:14:12,172-Speed 2512.77 samples/sec Loss 1.1900 LearningRate 0.000023 Epoch: 34 Global Step: 715450 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:14:20,376-Speed 2496.85 samples/sec Loss 1.1697 LearningRate 0.000023 Epoch: 34 Global Step: 715460 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:14:28,579-Speed 2497.23 samples/sec Loss 1.1481 LearningRate 0.000023 Epoch: 34 Global Step: 715470 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:14:36,780-Speed 2497.57 samples/sec Loss 1.1210 LearningRate 0.000023 Epoch: 34 Global Step: 715480 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:14:44,984-Speed 2496.75 samples/sec Loss 1.1788 LearningRate 0.000023 Epoch: 34 Global Step: 715490 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:14:53,188-Speed 2496.62 samples/sec Loss 1.1116 LearningRate 0.000023 Epoch: 34 Global Step: 715500 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:15:01,335-Speed 2514.14 samples/sec Loss 1.1477 LearningRate 0.000023 Epoch: 34 Global Step: 715510 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:15:09,538-Speed 2497.17 samples/sec Loss 1.1570 LearningRate 0.000023 Epoch: 34 Global Step: 715520 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:15:17,739-Speed 2497.79 samples/sec Loss 1.1454 LearningRate 0.000023 Epoch: 34 Global Step: 715530 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:15:25,939-Speed 2498.07 samples/sec Loss 1.1476 LearningRate 0.000023 Epoch: 34 Global Step: 715540 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:15:34,137-Speed 2498.52 samples/sec Loss 1.1725 LearningRate 0.000023 Epoch: 34 Global Step: 715550 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:15:42,340-Speed 2497.07 samples/sec Loss 1.1185 LearningRate 0.000023 Epoch: 34 Global Step: 715560 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:15:50,488-Speed 2513.84 samples/sec Loss 1.1559 LearningRate 0.000023 Epoch: 34 Global Step: 715570 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:15:58,694-Speed 2496.25 samples/sec Loss 1.1346 LearningRate 0.000023 Epoch: 34 Global Step: 715580 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:16:06,898-Speed 2496.68 samples/sec Loss 1.1225 LearningRate 0.000023 Epoch: 34 Global Step: 715590 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:16:15,095-Speed 2498.96 samples/sec Loss 1.1573 LearningRate 0.000023 Epoch: 34 Global Step: 715600 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:16:23,295-Speed 2497.82 samples/sec Loss 1.1576 LearningRate 0.000023 Epoch: 34 Global Step: 715610 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:16:31,494-Speed 2498.55 samples/sec Loss 1.1448 LearningRate 0.000023 Epoch: 34 Global Step: 715620 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:16:39,643-Speed 2513.94 samples/sec Loss 1.1395 LearningRate 0.000023 Epoch: 34 Global Step: 715630 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:16:47,838-Speed 2499.31 samples/sec Loss 1.1486 LearningRate 0.000023 Epoch: 34 Global Step: 715640 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:16:56,038-Speed 2497.93 samples/sec Loss 1.1896 LearningRate 0.000023 Epoch: 34 Global Step: 715650 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:17:04,238-Speed 2497.96 samples/sec Loss 1.1580 LearningRate 0.000023 Epoch: 34 Global Step: 715660 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:17:12,439-Speed 2497.64 samples/sec Loss 1.1351 LearningRate 0.000023 Epoch: 34 Global Step: 715670 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:17:20,641-Speed 2497.22 samples/sec Loss 1.1512 LearningRate 0.000023 Epoch: 34 Global Step: 715680 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:17:28,791-Speed 2513.56 samples/sec Loss 1.1264 LearningRate 0.000023 Epoch: 34 Global Step: 715690 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:17:36,996-Speed 2496.46 samples/sec Loss 1.1722 LearningRate 0.000023 Epoch: 34 Global Step: 715700 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:17:45,200-Speed 2496.82 samples/sec Loss 1.1507 LearningRate 0.000023 Epoch: 34 Global Step: 715710 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:17:53,410-Speed 2494.78 samples/sec Loss 1.1537 LearningRate 0.000023 Epoch: 34 Global Step: 715720 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:18:01,614-Speed 2496.64 samples/sec Loss 1.1537 LearningRate 0.000023 Epoch: 34 Global Step: 715730 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:18:09,826-Speed 2494.43 samples/sec Loss 1.1978 LearningRate 0.000023 Epoch: 34 Global Step: 715740 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:18:17,977-Speed 2513.04 samples/sec Loss 1.1455 LearningRate 0.000023 Epoch: 34 Global Step: 715750 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:18:26,180-Speed 2496.80 samples/sec Loss 1.1731 LearningRate 0.000023 Epoch: 34 Global Step: 715760 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:18:34,382-Speed 2497.51 samples/sec Loss 1.1430 LearningRate 0.000023 Epoch: 34 Global Step: 715770 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:18:42,594-Speed 2494.17 samples/sec Loss 1.1678 LearningRate 0.000023 Epoch: 34 Global Step: 715780 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:18:50,797-Speed 2497.13 samples/sec Loss 1.1611 LearningRate 0.000023 Epoch: 34 Global Step: 715790 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:18:58,998-Speed 2497.81 samples/sec Loss 1.1624 LearningRate 0.000023 Epoch: 34 Global Step: 715800 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:19:07,152-Speed 2512.20 samples/sec Loss 1.1458 LearningRate 0.000023 Epoch: 34 Global Step: 715810 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:19:15,350-Speed 2498.59 samples/sec Loss 1.1451 LearningRate 0.000023 Epoch: 34 Global Step: 715820 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:19:23,548-Speed 2498.60 samples/sec Loss 1.1451 LearningRate 0.000023 Epoch: 34 Global Step: 715830 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:19:31,762-Speed 2493.74 samples/sec Loss 1.1293 LearningRate 0.000023 Epoch: 34 Global Step: 715840 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:19:39,969-Speed 2495.88 samples/sec Loss 1.1356 LearningRate 0.000023 Epoch: 34 Global Step: 715850 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:19:48,170-Speed 2497.60 samples/sec Loss 1.1681 LearningRate 0.000023 Epoch: 34 Global Step: 715860 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:19:56,319-Speed 2513.73 samples/sec Loss 1.1868 LearningRate 0.000023 Epoch: 34 Global Step: 715870 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:20:04,517-Speed 2498.57 samples/sec Loss 1.1549 LearningRate 0.000023 Epoch: 34 Global Step: 715880 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:20:12,719-Speed 2497.35 samples/sec Loss 1.1715 LearningRate 0.000023 Epoch: 34 Global Step: 715890 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:20:20,922-Speed 2496.80 samples/sec Loss 1.1537 LearningRate 0.000023 Epoch: 34 Global Step: 715900 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:20:29,203-Speed 2499.14 samples/sec Loss 1.1321 LearningRate 0.000023 Epoch: 34 Global Step: 715910 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:20:37,460-Speed 2499.34 samples/sec Loss 1.1129 LearningRate 0.000023 Epoch: 34 Global Step: 715920 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:20:49,271-Speed 1736.68 samples/sec Loss 1.1426 LearningRate 0.000023 Epoch: 34 Global Step: 715930 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:20:57,508-Speed 2501.73 samples/sec Loss 1.1439 LearningRate 0.000023 Epoch: 34 Global Step: 715940 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:21:05,714-Speed 2495.88 samples/sec Loss 1.1724 LearningRate 0.000023 Epoch: 34 Global Step: 715950 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:21:18,571-Speed 1628.69 samples/sec Loss 1.1516 LearningRate 0.000023 Epoch: 34 Global Step: 715960 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:21:27,048-Speed 2503.96 samples/sec Loss 1.1349 LearningRate 0.000023 Epoch: 34 Global Step: 715970 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:21:35,251-Speed 2496.79 samples/sec Loss 1.1494 LearningRate 0.000023 Epoch: 34 Global Step: 715980 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:21:43,432-Speed 2516.61 samples/sec Loss 1.1777 LearningRate 0.000023 Epoch: 34 Global Step: 715990 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:21:51,665-Speed 2499.19 samples/sec Loss 1.1342 LearningRate 0.000023 Epoch: 34 Global Step: 716000 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:22:01,521-Speed 2078.07 samples/sec Loss 1.1653 LearningRate 0.000023 Epoch: 34 Global Step: 716010 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:22:09,725-Speed 2496.78 samples/sec Loss 1.1423 LearningRate 0.000023 Epoch: 34 Global Step: 716020 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:22:17,996-Speed 2493.89 samples/sec Loss 1.1593 LearningRate 0.000023 Epoch: 34 Global Step: 716030 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:22:26,228-Speed 2498.05 samples/sec Loss 1.1548 LearningRate 0.000023 Epoch: 34 Global Step: 716040 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:22:38,619-Speed 1652.95 samples/sec Loss 1.1187 LearningRate 0.000023 Epoch: 34 Global Step: 716050 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:22:46,831-Speed 2494.10 samples/sec Loss 1.1364 LearningRate 0.000023 Epoch: 34 Global Step: 716060 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:22:57,465-Speed 1946.61 samples/sec Loss 1.1379 LearningRate 0.000023 Epoch: 34 Global Step: 716070 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:23:05,676-Speed 2495.78 samples/sec Loss 1.1482 LearningRate 0.000023 Epoch: 34 Global Step: 716080 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:23:13,883-Speed 2495.90 samples/sec Loss 1.1336 LearningRate 0.000023 Epoch: 34 Global Step: 716090 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:23:24,685-Speed 1975.04 samples/sec Loss 1.1291 LearningRate 0.000023 Epoch: 34 Global Step: 716100 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:23:32,878-Speed 2516.78 samples/sec Loss 1.1328 LearningRate 0.000023 Epoch: 34 Global Step: 716110 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:23:41,098-Speed 2491.66 samples/sec Loss 1.1545 LearningRate 0.000023 Epoch: 34 Global Step: 716120 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:23:54,691-Speed 2500.93 samples/sec Loss 1.1505 LearningRate 0.000023 Epoch: 34 Global Step: 716130 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:24:02,914-Speed 2499.40 samples/sec Loss 1.1364 LearningRate 0.000023 Epoch: 34 Global Step: 716140 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:24:14,453-Speed 1774.99 samples/sec Loss 1.1265 LearningRate 0.000023 Epoch: 34 Global Step: 716150 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:24:22,687-Speed 2496.62 samples/sec Loss 1.1673 LearningRate 0.000023 Epoch: 34 Global Step: 716160 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:24:30,905-Speed 2509.45 samples/sec Loss 1.1547 LearningRate 0.000023 Epoch: 34 Global Step: 716170 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:24:39,134-Speed 2488.92 samples/sec Loss 1.1622 LearningRate 0.000023 Epoch: 34 Global Step: 716180 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:24:47,367-Speed 2487.93 samples/sec Loss 1.1645 LearningRate 0.000023 Epoch: 34 Global Step: 716190 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-07-12 10:24:55,607-Speed 2485.78 samples/sec Loss 1.1235 LearningRate 0.000023 Epoch: 34 Global Step: 716200 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:25:03,846-Speed 2486.64 samples/sec Loss 1.1561 LearningRate 0.000023 Epoch: 34 Global Step: 716210 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:25:12,081-Speed 2487.11 samples/sec Loss 1.1594 LearningRate 0.000023 Epoch: 34 Global Step: 716220 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:25:20,268-Speed 2501.96 samples/sec Loss 1.1306 LearningRate 0.000023 Epoch: 34 Global Step: 716230 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:25:28,490-Speed 2491.21 samples/sec Loss 1.1597 LearningRate 0.000023 Epoch: 34 Global Step: 716240 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:25:36,708-Speed 2492.68 samples/sec Loss 1.1585 LearningRate 0.000023 Epoch: 34 Global Step: 716250 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:25:44,924-Speed 2492.98 samples/sec Loss 1.1486 LearningRate 0.000023 Epoch: 34 Global Step: 716260 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:25:53,138-Speed 2493.61 samples/sec Loss 1.1599 LearningRate 0.000023 Epoch: 34 Global Step: 716270 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:26:01,348-Speed 2494.85 samples/sec Loss 1.1287 LearningRate 0.000023 Epoch: 34 Global Step: 716280 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:26:09,521-Speed 2506.21 samples/sec Loss 1.1608 LearningRate 0.000023 Epoch: 34 Global Step: 716290 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:26:17,733-Speed 2494.52 samples/sec Loss 1.1625 LearningRate 0.000023 Epoch: 34 Global Step: 716300 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:26:25,945-Speed 2494.38 samples/sec Loss 1.1344 LearningRate 0.000023 Epoch: 34 Global Step: 716310 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:26:34,156-Speed 2494.63 samples/sec Loss 1.1734 LearningRate 0.000023 Epoch: 34 Global Step: 716320 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:26:42,369-Speed 2493.78 samples/sec Loss 1.1920 LearningRate 0.000023 Epoch: 34 Global Step: 716330 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:26:50,582-Speed 2494.05 samples/sec Loss 1.1277 LearningRate 0.000023 Epoch: 34 Global Step: 716340 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:26:58,751-Speed 2507.24 samples/sec Loss 1.1271 LearningRate 0.000023 Epoch: 34 Global Step: 716350 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:27:06,970-Speed 2493.04 samples/sec Loss 1.1482 LearningRate 0.000023 Epoch: 34 Global Step: 716360 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:27:15,189-Speed 2492.12 samples/sec Loss 1.1516 LearningRate 0.000023 Epoch: 34 Global Step: 716370 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 10:27:23,405-Speed 2493.30 samples/sec Loss 1.1200 LearningRate 0.000023 Epoch: 34 Global Step: 716380 Fp16 Grad Scale: 32768 Required: 26 hours
Training: 2022-07-12 
10:27:31,621-Speed 2493.08 samples/sec Loss 1.1502 LearningRate 0.000023 Epoch: 34 Global Step: 716390 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:27:39,837-Speed 2493.35 samples/sec Loss 1.1647 LearningRate 0.000023 Epoch: 34 Global Step: 716400 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:27:48,004-Speed 2507.86 samples/sec Loss 1.1693 LearningRate 0.000023 Epoch: 34 Global Step: 716410 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:27:56,217-Speed 2493.96 samples/sec Loss 1.1544 LearningRate 0.000023 Epoch: 34 Global Step: 716420 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:28:04,430-Speed 2494.51 samples/sec Loss 1.1712 LearningRate 0.000023 Epoch: 34 Global Step: 716430 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:28:12,664-Speed 2487.35 samples/sec Loss 1.1721 LearningRate 0.000023 Epoch: 34 Global Step: 716440 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:28:20,877-Speed 2494.00 samples/sec Loss 1.1497 LearningRate 0.000023 Epoch: 34 Global Step: 716450 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:28:29,091-Speed 2493.75 samples/sec Loss 1.1445 LearningRate 0.000023 Epoch: 34 Global Step: 716460 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:28:37,247-Speed 2512.12 samples/sec Loss 1.1680 LearningRate 0.000023 Epoch: 34 Global Step: 716470 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:28:45,458-Speed 2494.57 samples/sec Loss 1.1504 LearningRate 0.000023 Epoch: 34 Global Step: 716480 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:28:53,668-Speed 2494.80 samples/sec Loss 1.1714 LearningRate 0.000023 Epoch: 34 Global Step: 716490 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:29:01,883-Speed 2493.95 samples/sec Loss 1.1415 LearningRate 0.000023 Epoch: 34 Global Step: 716500 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 
10:29:10,094-Speed 2494.55 samples/sec Loss 1.1249 LearningRate 0.000023 Epoch: 34 Global Step: 716510 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:29:18,317-Speed 2490.84 samples/sec Loss 1.1343 LearningRate 0.000023 Epoch: 34 Global Step: 716520 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:29:26,476-Speed 2510.52 samples/sec Loss 1.1538 LearningRate 0.000023 Epoch: 34 Global Step: 716530 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:29:34,687-Speed 2494.82 samples/sec Loss 1.1466 LearningRate 0.000023 Epoch: 34 Global Step: 716540 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:29:42,898-Speed 2494.58 samples/sec Loss 1.1432 LearningRate 0.000023 Epoch: 34 Global Step: 716550 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:29:51,112-Speed 2493.86 samples/sec Loss 1.1571 LearningRate 0.000023 Epoch: 34 Global Step: 716560 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:29:59,324-Speed 2494.28 samples/sec Loss 1.1272 LearningRate 0.000023 Epoch: 34 Global Step: 716570 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:30:07,557-Speed 2487.87 samples/sec Loss 1.1741 LearningRate 0.000023 Epoch: 34 Global Step: 716580 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:30:15,709-Speed 2512.82 samples/sec Loss 1.1347 LearningRate 0.000023 Epoch: 34 Global Step: 716590 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:30:23,918-Speed 2495.08 samples/sec Loss 1.1477 LearningRate 0.000023 Epoch: 34 Global Step: 716600 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:30:32,130-Speed 2494.33 samples/sec Loss 1.1603 LearningRate 0.000023 Epoch: 34 Global Step: 716610 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:30:40,338-Speed 2495.45 samples/sec Loss 1.1422 LearningRate 0.000023 Epoch: 34 Global Step: 716620 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 
10:30:48,547-Speed 2495.40 samples/sec Loss 1.1690 LearningRate 0.000023 Epoch: 34 Global Step: 716630 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:30:56,761-Speed 2493.60 samples/sec Loss 1.1537 LearningRate 0.000023 Epoch: 34 Global Step: 716640 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:31:04,917-Speed 2511.27 samples/sec Loss 1.1451 LearningRate 0.000023 Epoch: 34 Global Step: 716650 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:31:13,127-Speed 2495.00 samples/sec Loss 1.1434 LearningRate 0.000023 Epoch: 34 Global Step: 716660 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:31:21,335-Speed 2495.59 samples/sec Loss 1.1424 LearningRate 0.000023 Epoch: 34 Global Step: 716670 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:31:29,554-Speed 2492.27 samples/sec Loss 1.1476 LearningRate 0.000023 Epoch: 34 Global Step: 716680 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:31:37,764-Speed 2494.85 samples/sec Loss 1.1721 LearningRate 0.000023 Epoch: 34 Global Step: 716690 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:31:45,982-Speed 2492.57 samples/sec Loss 1.1317 LearningRate 0.000023 Epoch: 34 Global Step: 716700 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:31:54,138-Speed 2511.51 samples/sec Loss 1.1668 LearningRate 0.000023 Epoch: 34 Global Step: 716710 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:32:02,306-Speed 2507.64 samples/sec Loss 1.1747 LearningRate 0.000023 Epoch: 34 Global Step: 716720 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:32:10,522-Speed 2493.34 samples/sec Loss 1.1657 LearningRate 0.000023 Epoch: 34 Global Step: 716730 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:32:18,732-Speed 2494.86 samples/sec Loss 1.1210 LearningRate 0.000023 Epoch: 34 Global Step: 716740 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 
10:32:26,940-Speed 2495.48 samples/sec Loss 1.1473 LearningRate 0.000023 Epoch: 34 Global Step: 716750 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:32:35,150-Speed 2495.06 samples/sec Loss 1.1488 LearningRate 0.000023 Epoch: 34 Global Step: 716760 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:32:43,305-Speed 2511.44 samples/sec Loss 1.1654 LearningRate 0.000023 Epoch: 34 Global Step: 716770 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:32:51,515-Speed 2495.51 samples/sec Loss 1.1300 LearningRate 0.000023 Epoch: 34 Global Step: 716780 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:32:59,724-Speed 2495.04 samples/sec Loss 1.1331 LearningRate 0.000023 Epoch: 34 Global Step: 716790 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:33:07,936-Speed 2494.69 samples/sec Loss 1.1788 LearningRate 0.000023 Epoch: 34 Global Step: 716800 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:33:16,158-Speed 2491.28 samples/sec Loss 1.1680 LearningRate 0.000023 Epoch: 34 Global Step: 716810 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:33:24,381-Speed 2490.95 samples/sec Loss 1.1339 LearningRate 0.000023 Epoch: 34 Global Step: 716820 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:33:32,548-Speed 2507.97 samples/sec Loss 1.1289 LearningRate 0.000023 Epoch: 34 Global Step: 716830 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:33:40,758-Speed 2494.92 samples/sec Loss 1.1443 LearningRate 0.000023 Epoch: 34 Global Step: 716840 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:33:48,967-Speed 2495.18 samples/sec Loss 1.1388 LearningRate 0.000023 Epoch: 34 Global Step: 716850 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:33:57,175-Speed 2495.83 samples/sec Loss 1.1274 LearningRate 0.000023 Epoch: 34 Global Step: 716860 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 
10:34:05,385-Speed 2494.67 samples/sec Loss 1.1397 LearningRate 0.000023 Epoch: 34 Global Step: 716870 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:34:13,595-Speed 2494.99 samples/sec Loss 1.1207 LearningRate 0.000023 Epoch: 34 Global Step: 716880 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:34:21,751-Speed 2511.22 samples/sec Loss 1.1143 LearningRate 0.000023 Epoch: 34 Global Step: 716890 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:34:29,971-Speed 2491.75 samples/sec Loss 1.1306 LearningRate 0.000023 Epoch: 34 Global Step: 716900 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:34:38,180-Speed 2495.21 samples/sec Loss 1.1441 LearningRate 0.000023 Epoch: 34 Global Step: 716910 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:34:46,401-Speed 2491.77 samples/sec Loss 1.1179 LearningRate 0.000023 Epoch: 34 Global Step: 716920 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:34:54,610-Speed 2495.52 samples/sec Loss 1.1323 LearningRate 0.000023 Epoch: 34 Global Step: 716930 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:35:02,822-Speed 2494.57 samples/sec Loss 1.1449 LearningRate 0.000023 Epoch: 34 Global Step: 716940 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:35:10,974-Speed 2512.40 samples/sec Loss 1.1866 LearningRate 0.000023 Epoch: 34 Global Step: 716950 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:35:19,186-Speed 2494.30 samples/sec Loss 1.1338 LearningRate 0.000023 Epoch: 34 Global Step: 716960 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:35:27,397-Speed 2494.64 samples/sec Loss 1.1513 LearningRate 0.000023 Epoch: 34 Global Step: 716970 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:35:35,611-Speed 2493.59 samples/sec Loss 1.1398 LearningRate 0.000023 Epoch: 34 Global Step: 716980 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 
10:35:43,820-Speed 2494.97 samples/sec Loss 1.1443 LearningRate 0.000023 Epoch: 34 Global Step: 716990 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:35:52,029-Speed 2495.59 samples/sec Loss 1.1472 LearningRate 0.000023 Epoch: 34 Global Step: 717000 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:36:00,190-Speed 2510.09 samples/sec Loss 1.1576 LearningRate 0.000023 Epoch: 34 Global Step: 717010 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:36:08,402-Speed 2494.09 samples/sec Loss 1.1327 LearningRate 0.000023 Epoch: 34 Global Step: 717020 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:36:16,616-Speed 2493.87 samples/sec Loss 1.1369 LearningRate 0.000023 Epoch: 34 Global Step: 717030 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:36:24,826-Speed 2494.61 samples/sec Loss 1.1332 LearningRate 0.000023 Epoch: 34 Global Step: 717040 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:36:33,037-Speed 2494.90 samples/sec Loss 1.1213 LearningRate 0.000023 Epoch: 34 Global Step: 717050 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:36:41,262-Speed 2490.39 samples/sec Loss 1.1414 LearningRate 0.000023 Epoch: 34 Global Step: 717060 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:36:49,420-Speed 2510.77 samples/sec Loss 1.1659 LearningRate 0.000023 Epoch: 34 Global Step: 717070 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:36:57,630-Speed 2494.70 samples/sec Loss 1.1432 LearningRate 0.000023 Epoch: 34 Global Step: 717080 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:37:05,852-Speed 2491.32 samples/sec Loss 1.1707 LearningRate 0.000023 Epoch: 34 Global Step: 717090 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:37:14,058-Speed 2496.27 samples/sec Loss 1.1592 LearningRate 0.000023 Epoch: 34 Global Step: 717100 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 
10:37:22,268-Speed 2494.79 samples/sec Loss 1.1617 LearningRate 0.000023 Epoch: 34 Global Step: 717110 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:37:30,479-Speed 2494.75 samples/sec Loss 1.1226 LearningRate 0.000023 Epoch: 34 Global Step: 717120 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:37:38,633-Speed 2511.85 samples/sec Loss 1.1350 LearningRate 0.000023 Epoch: 34 Global Step: 717130 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:37:46,847-Speed 2493.58 samples/sec Loss 1.1255 LearningRate 0.000023 Epoch: 34 Global Step: 717140 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:37:55,066-Speed 2492.04 samples/sec Loss 1.1617 LearningRate 0.000023 Epoch: 34 Global Step: 717150 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:38:03,274-Speed 2495.48 samples/sec Loss 1.1354 LearningRate 0.000023 Epoch: 34 Global Step: 717160 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:38:11,482-Speed 2495.46 samples/sec Loss 1.1319 LearningRate 0.000023 Epoch: 34 Global Step: 717170 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:38:19,690-Speed 2495.45 samples/sec Loss 1.1204 LearningRate 0.000023 Epoch: 34 Global Step: 717180 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:38:27,847-Speed 2511.05 samples/sec Loss 1.1471 LearningRate 0.000023 Epoch: 34 Global Step: 717190 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:38:36,056-Speed 2495.13 samples/sec Loss 1.1315 LearningRate 0.000023 Epoch: 34 Global Step: 717200 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:38:44,266-Speed 2494.93 samples/sec Loss 1.1366 LearningRate 0.000023 Epoch: 34 Global Step: 717210 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:38:52,475-Speed 2495.15 samples/sec Loss 1.1582 LearningRate 0.000023 Epoch: 34 Global Step: 717220 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 
10:39:00,689-Speed 2494.01 samples/sec Loss 1.1303 LearningRate 0.000023 Epoch: 34 Global Step: 717230 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:39:08,913-Speed 2490.48 samples/sec Loss 1.1397 LearningRate 0.000023 Epoch: 34 Global Step: 717240 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:39:17,069-Speed 2511.49 samples/sec Loss 1.1357 LearningRate 0.000023 Epoch: 34 Global Step: 717250 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:39:25,275-Speed 2496.16 samples/sec Loss 1.1322 LearningRate 0.000023 Epoch: 34 Global Step: 717260 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:39:33,481-Speed 2496.05 samples/sec Loss 1.1325 LearningRate 0.000023 Epoch: 34 Global Step: 717270 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:39:41,690-Speed 2495.36 samples/sec Loss 1.1340 LearningRate 0.000023 Epoch: 34 Global Step: 717280 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:39:49,900-Speed 2495.01 samples/sec Loss 1.1305 LearningRate 0.000023 Epoch: 34 Global Step: 717290 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:39:58,109-Speed 2495.22 samples/sec Loss 1.1405 LearningRate 0.000023 Epoch: 34 Global Step: 717300 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:40:06,262-Speed 2512.43 samples/sec Loss 1.1602 LearningRate 0.000023 Epoch: 34 Global Step: 717310 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:40:14,469-Speed 2495.74 samples/sec Loss 1.1306 LearningRate 0.000023 Epoch: 34 Global Step: 717320 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:40:22,681-Speed 2494.33 samples/sec Loss 1.1512 LearningRate 0.000023 Epoch: 34 Global Step: 717330 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:40:30,886-Speed 2496.43 samples/sec Loss 1.1610 LearningRate 0.000023 Epoch: 34 Global Step: 717340 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 
10:40:39,099-Speed 2493.91 samples/sec Loss 1.1411 LearningRate 0.000023 Epoch: 34 Global Step: 717350 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:40:47,306-Speed 2495.84 samples/sec Loss 1.1552 LearningRate 0.000023 Epoch: 34 Global Step: 717360 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:40:55,461-Speed 2512.12 samples/sec Loss 1.1265 LearningRate 0.000023 Epoch: 34 Global Step: 717370 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:41:03,667-Speed 2495.90 samples/sec Loss 1.1227 LearningRate 0.000023 Epoch: 34 Global Step: 717380 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:41:11,878-Speed 2494.58 samples/sec Loss 1.1433 LearningRate 0.000023 Epoch: 34 Global Step: 717390 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:41:20,087-Speed 2495.33 samples/sec Loss 1.1385 LearningRate 0.000023 Epoch: 34 Global Step: 717400 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:41:28,293-Speed 2496.04 samples/sec Loss 1.1144 LearningRate 0.000023 Epoch: 34 Global Step: 717410 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:41:36,497-Speed 2496.50 samples/sec Loss 1.1482 LearningRate 0.000023 Epoch: 34 Global Step: 717420 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:41:44,657-Speed 2510.54 samples/sec Loss 1.1390 LearningRate 0.000023 Epoch: 34 Global Step: 717430 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:41:52,858-Speed 2497.53 samples/sec Loss 1.1417 LearningRate 0.000023 Epoch: 34 Global Step: 717440 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:42:01,060-Speed 2497.39 samples/sec Loss 1.1220 LearningRate 0.000023 Epoch: 34 Global Step: 717450 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:42:09,264-Speed 2496.36 samples/sec Loss 1.1275 LearningRate 0.000023 Epoch: 34 Global Step: 717460 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 
10:42:17,481-Speed 2493.06 samples/sec Loss 1.1397 LearningRate 0.000023 Epoch: 34 Global Step: 717470 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:42:25,697-Speed 2493.14 samples/sec Loss 1.1737 LearningRate 0.000023 Epoch: 34 Global Step: 717480 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:42:33,853-Speed 2511.25 samples/sec Loss 1.1212 LearningRate 0.000023 Epoch: 34 Global Step: 717490 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:42:42,055-Speed 2497.33 samples/sec Loss 1.1466 LearningRate 0.000023 Epoch: 34 Global Step: 717500 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:42:50,259-Speed 2496.80 samples/sec Loss 1.1503 LearningRate 0.000023 Epoch: 34 Global Step: 717510 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:42:58,466-Speed 2495.73 samples/sec Loss 1.1557 LearningRate 0.000023 Epoch: 34 Global Step: 717520 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:43:06,678-Speed 2494.22 samples/sec Loss 1.1252 LearningRate 0.000023 Epoch: 34 Global Step: 717530 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:43:14,882-Speed 2496.80 samples/sec Loss 1.1409 LearningRate 0.000023 Epoch: 34 Global Step: 717540 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:43:23,042-Speed 2510.28 samples/sec Loss 1.1540 LearningRate 0.000023 Epoch: 34 Global Step: 717550 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:43:31,244-Speed 2497.41 samples/sec Loss 1.1321 LearningRate 0.000023 Epoch: 34 Global Step: 717560 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:43:39,448-Speed 2497.04 samples/sec Loss 1.1713 LearningRate 0.000023 Epoch: 34 Global Step: 717570 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:43:47,672-Speed 2490.68 samples/sec Loss 1.1409 LearningRate 0.000022 Epoch: 34 Global Step: 717580 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 
10:43:55,882-Speed 2495.05 samples/sec Loss 1.1344 LearningRate 0.000022 Epoch: 34 Global Step: 717590 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:44:04,099-Speed 2492.56 samples/sec Loss 1.1209 LearningRate 0.000022 Epoch: 34 Global Step: 717600 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:44:12,252-Speed 2512.53 samples/sec Loss 1.1421 LearningRate 0.000022 Epoch: 34 Global Step: 717610 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:44:20,457-Speed 2496.32 samples/sec Loss 1.1453 LearningRate 0.000022 Epoch: 34 Global Step: 717620 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:44:28,662-Speed 2496.70 samples/sec Loss 1.1370 LearningRate 0.000022 Epoch: 34 Global Step: 717630 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:44:36,864-Speed 2497.22 samples/sec Loss 1.1279 LearningRate 0.000022 Epoch: 34 Global Step: 717640 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:44:45,069-Speed 2496.21 samples/sec Loss 1.1391 LearningRate 0.000022 Epoch: 34 Global Step: 717650 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:44:53,272-Speed 2497.09 samples/sec Loss 1.1325 LearningRate 0.000022 Epoch: 34 Global Step: 717660 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:45:01,425-Speed 2512.67 samples/sec Loss 1.1120 LearningRate 0.000022 Epoch: 34 Global Step: 717670 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:45:09,628-Speed 2496.77 samples/sec Loss 1.1194 LearningRate 0.000022 Epoch: 34 Global Step: 717680 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:45:17,835-Speed 2495.98 samples/sec Loss 1.1539 LearningRate 0.000022 Epoch: 34 Global Step: 717690 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:45:26,038-Speed 2497.17 samples/sec Loss 1.1283 LearningRate 0.000022 Epoch: 34 Global Step: 717700 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 
10:45:34,244-Speed 2496.10 samples/sec Loss 1.1257 LearningRate 0.000022 Epoch: 34 Global Step: 717710 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:45:42,449-Speed 2496.17 samples/sec Loss 1.1154 LearningRate 0.000022 Epoch: 34 Global Step: 717720 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:45:50,600-Speed 2512.83 samples/sec Loss 1.1679 LearningRate 0.000022 Epoch: 34 Global Step: 717730 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:45:58,802-Speed 2497.38 samples/sec Loss 1.1777 LearningRate 0.000022 Epoch: 34 Global Step: 717740 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:46:07,015-Speed 2494.10 samples/sec Loss 1.1297 LearningRate 0.000022 Epoch: 34 Global Step: 717750 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:46:15,223-Speed 2495.57 samples/sec Loss 1.1483 LearningRate 0.000022 Epoch: 34 Global Step: 717760 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:46:23,425-Speed 2497.40 samples/sec Loss 1.1266 LearningRate 0.000022 Epoch: 34 Global Step: 717770 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:46:31,633-Speed 2496.00 samples/sec Loss 1.1298 LearningRate 0.000022 Epoch: 34 Global Step: 717780 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:46:39,785-Speed 2512.73 samples/sec Loss 1.1287 LearningRate 0.000022 Epoch: 34 Global Step: 717790 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:46:47,990-Speed 2496.24 samples/sec Loss 1.1632 LearningRate 0.000022 Epoch: 34 Global Step: 717800 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:46:56,193-Speed 2496.94 samples/sec Loss 1.1476 LearningRate 0.000022 Epoch: 34 Global Step: 717810 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:47:04,396-Speed 2497.25 samples/sec Loss 1.1262 LearningRate 0.000022 Epoch: 34 Global Step: 717820 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 
10:47:12,598-Speed 2497.54 samples/sec Loss 1.1436 LearningRate 0.000022 Epoch: 34 Global Step: 717830 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:47:20,807-Speed 2495.23 samples/sec Loss 1.1252 LearningRate 0.000022 Epoch: 34 Global Step: 717840 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:47:28,956-Speed 2513.49 samples/sec Loss 1.1373 LearningRate 0.000022 Epoch: 34 Global Step: 717850 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:47:37,171-Speed 2493.43 samples/sec Loss 1.1294 LearningRate 0.000022 Epoch: 34 Global Step: 717860 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:47:45,375-Speed 2496.76 samples/sec Loss 1.1579 LearningRate 0.000022 Epoch: 34 Global Step: 717870 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:47:53,587-Speed 2494.22 samples/sec Loss 1.1460 LearningRate 0.000022 Epoch: 34 Global Step: 717880 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:48:01,803-Speed 2493.17 samples/sec Loss 1.1502 LearningRate 0.000022 Epoch: 34 Global Step: 717890 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:48:10,010-Speed 2496.86 samples/sec Loss 1.1213 LearningRate 0.000022 Epoch: 34 Global Step: 717900 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:48:18,159-Speed 2513.22 samples/sec Loss 1.1628 LearningRate 0.000022 Epoch: 34 Global Step: 717910 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:48:26,364-Speed 2496.57 samples/sec Loss 1.1206 LearningRate 0.000022 Epoch: 34 Global Step: 717920 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:48:34,566-Speed 2497.21 samples/sec Loss 1.1285 LearningRate 0.000022 Epoch: 34 Global Step: 717930 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:48:42,774-Speed 2495.76 samples/sec Loss 1.1377 LearningRate 0.000022 Epoch: 34 Global Step: 717940 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 
10:48:50,976-Speed 2497.03 samples/sec Loss 1.1272 LearningRate 0.000022 Epoch: 34 Global Step: 717950 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:48:59,186-Speed 2495.03 samples/sec Loss 1.1032 LearningRate 0.000022 Epoch: 34 Global Step: 717960 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:49:07,346-Speed 2510.13 samples/sec Loss 1.1308 LearningRate 0.000022 Epoch: 34 Global Step: 717970 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:49:15,555-Speed 2495.30 samples/sec Loss 1.1565 LearningRate 0.000022 Epoch: 34 Global Step: 717980 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:49:23,773-Speed 2492.66 samples/sec Loss 1.1357 LearningRate 0.000022 Epoch: 34 Global Step: 717990 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:49:31,979-Speed 2496.11 samples/sec Loss 1.1630 LearningRate 0.000022 Epoch: 34 Global Step: 718000 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:49:40,181-Speed 2497.46 samples/sec Loss 1.1615 LearningRate 0.000022 Epoch: 34 Global Step: 718010 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:49:48,399-Speed 2492.54 samples/sec Loss 1.1554 LearningRate 0.000022 Epoch: 34 Global Step: 718020 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:49:56,547-Speed 2513.56 samples/sec Loss 1.1513 LearningRate 0.000022 Epoch: 34 Global Step: 718030 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:50:04,756-Speed 2495.19 samples/sec Loss 1.1565 LearningRate 0.000022 Epoch: 34 Global Step: 718040 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:50:12,957-Speed 2497.89 samples/sec Loss 1.1420 LearningRate 0.000022 Epoch: 34 Global Step: 718050 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 10:50:21,158-Speed 2497.55 samples/sec Loss 1.1271 LearningRate 0.000022 Epoch: 34 Global Step: 718060 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-07-12 
10:50:29,319-Speed 2510.00 samples/sec Loss 1.1653 LearningRate 0.000022 Epoch: 34 Global Step: 718070 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:50:37,521-Speed 2497.24 samples/sec Loss 1.1020 LearningRate 0.000022 Epoch: 34 Global Step: 718080 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:50:45,670-Speed 2513.49 samples/sec Loss 1.1716 LearningRate 0.000022 Epoch: 34 Global Step: 718090 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:50:53,879-Speed 2495.55 samples/sec Loss 1.1387 LearningRate 0.000022 Epoch: 34 Global Step: 718100 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:51:02,085-Speed 2495.90 samples/sec Loss 1.1549 LearningRate 0.000022 Epoch: 34 Global Step: 718110 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:51:10,301-Speed 2493.69 samples/sec Loss 1.1263 LearningRate 0.000022 Epoch: 34 Global Step: 718120 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:51:18,503-Speed 2497.70 samples/sec Loss 1.1070 LearningRate 0.000022 Epoch: 34 Global Step: 718130 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:51:26,707-Speed 2496.40 samples/sec Loss 1.1754 LearningRate 0.000022 Epoch: 34 Global Step: 718140 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:51:34,860-Speed 2512.44 samples/sec Loss 1.1474 LearningRate 0.000022 Epoch: 34 Global Step: 718150 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:51:43,066-Speed 2496.30 samples/sec Loss 1.1634 LearningRate 0.000022 Epoch: 34 Global Step: 718160 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:51:51,270-Speed 2497.01 samples/sec Loss 1.1055 LearningRate 0.000022 Epoch: 34 Global Step: 718170 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-07-12 10:51:59,474-Speed 2496.68 samples/sec Loss 1.1523 LearningRate 0.000022 Epoch: 34 Global Step: 718180 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
10:52:07,681-Speed 2496.20 samples/sec Loss 1.1383 LearningRate 0.000022 Epoch: 34 Global Step: 718190 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:52:15,883-Speed 2497.18 samples/sec Loss 1.1567 LearningRate 0.000022 Epoch: 34 Global Step: 718200 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:52:24,033-Speed 2513.37 samples/sec Loss 1.1295 LearningRate 0.000022 Epoch: 34 Global Step: 718210 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:52:32,232-Speed 2498.39 samples/sec Loss 1.1405 LearningRate 0.000022 Epoch: 34 Global Step: 718220 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:52:40,434-Speed 2497.54 samples/sec Loss 1.1113 LearningRate 0.000022 Epoch: 34 Global Step: 718230 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:52:48,637-Speed 2497.05 samples/sec Loss 1.1679 LearningRate 0.000022 Epoch: 34 Global Step: 718240 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:52:56,836-Speed 2498.23 samples/sec Loss 1.1104 LearningRate 0.000022 Epoch: 34 Global Step: 718250 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:53:05,041-Speed 2496.36 samples/sec Loss 1.1346 LearningRate 0.000022 Epoch: 34 Global Step: 718260 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:53:13,189-Speed 2513.80 samples/sec Loss 1.1132 LearningRate 0.000022 Epoch: 34 Global Step: 718270 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:53:21,390-Speed 2497.76 samples/sec Loss 1.1682 LearningRate 0.000022 Epoch: 34 Global Step: 718280 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:53:29,590-Speed 2498.18 samples/sec Loss 1.1574 LearningRate 0.000022 Epoch: 34 Global Step: 718290 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:53:37,790-Speed 2498.12 samples/sec Loss 1.1357 LearningRate 0.000022 Epoch: 34 Global Step: 718300 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
10:53:45,991-Speed 2497.63 samples/sec Loss 1.1497 LearningRate 0.000022 Epoch: 34 Global Step: 718310 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:53:54,192-Speed 2497.92 samples/sec Loss 1.1368 LearningRate 0.000022 Epoch: 34 Global Step: 718320 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:54:02,339-Speed 2514.26 samples/sec Loss 1.1505 LearningRate 0.000022 Epoch: 34 Global Step: 718330 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:54:10,546-Speed 2495.67 samples/sec Loss 1.1632 LearningRate 0.000022 Epoch: 34 Global Step: 718340 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:54:18,751-Speed 2496.57 samples/sec Loss 1.1202 LearningRate 0.000022 Epoch: 34 Global Step: 718350 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:54:26,959-Speed 2495.45 samples/sec Loss 1.1629 LearningRate 0.000022 Epoch: 34 Global Step: 718360 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:54:35,166-Speed 2495.89 samples/sec Loss 1.1599 LearningRate 0.000022 Epoch: 34 Global Step: 718370 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:54:43,372-Speed 2496.18 samples/sec Loss 1.1412 LearningRate 0.000022 Epoch: 34 Global Step: 718380 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:54:51,528-Speed 2511.29 samples/sec Loss 1.1651 LearningRate 0.000022 Epoch: 34 Global Step: 718390 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:54:59,761-Speed 2488.29 samples/sec Loss 1.1845 LearningRate 0.000022 Epoch: 34 Global Step: 718400 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:55:07,973-Speed 2494.44 samples/sec Loss 1.1377 LearningRate 0.000022 Epoch: 34 Global Step: 718410 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:55:16,173-Speed 2497.90 samples/sec Loss 1.1388 LearningRate 0.000022 Epoch: 34 Global Step: 718420 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
10:55:24,392-Speed 2492.23 samples/sec Loss 1.1291 LearningRate 0.000022 Epoch: 34 Global Step: 718430 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:55:32,592-Speed 2497.86 samples/sec Loss 1.1626 LearningRate 0.000022 Epoch: 34 Global Step: 718440 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:55:40,752-Speed 2510.13 samples/sec Loss 1.1446 LearningRate 0.000022 Epoch: 34 Global Step: 718450 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:55:48,950-Speed 2498.85 samples/sec Loss 1.1657 LearningRate 0.000022 Epoch: 34 Global Step: 718460 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:55:57,149-Speed 2498.06 samples/sec Loss 1.1427 LearningRate 0.000022 Epoch: 34 Global Step: 718470 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:56:05,360-Speed 2494.71 samples/sec Loss 1.1558 LearningRate 0.000022 Epoch: 34 Global Step: 718480 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:56:13,560-Speed 2498.03 samples/sec Loss 1.1332 LearningRate 0.000022 Epoch: 34 Global Step: 718490 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:56:21,761-Speed 2497.50 samples/sec Loss 1.1274 LearningRate 0.000022 Epoch: 34 Global Step: 718500 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:56:29,913-Speed 2512.74 samples/sec Loss 1.1624 LearningRate 0.000022 Epoch: 34 Global Step: 718510 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:56:38,117-Speed 2496.93 samples/sec Loss 1.1613 LearningRate 0.000022 Epoch: 34 Global Step: 718520 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:56:46,319-Speed 2497.30 samples/sec Loss 1.1553 LearningRate 0.000022 Epoch: 34 Global Step: 718530 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:56:54,519-Speed 2497.94 samples/sec Loss 1.1483 LearningRate 0.000022 Epoch: 34 Global Step: 718540 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
10:57:02,745-Speed 2490.34 samples/sec Loss 1.1261 LearningRate 0.000022 Epoch: 34 Global Step: 718550 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:57:10,948-Speed 2497.16 samples/sec Loss 1.1575 LearningRate 0.000022 Epoch: 34 Global Step: 718560 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:57:19,098-Speed 2512.92 samples/sec Loss 1.1599 LearningRate 0.000022 Epoch: 34 Global Step: 718570 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:57:27,304-Speed 2496.16 samples/sec Loss 1.1486 LearningRate 0.000022 Epoch: 34 Global Step: 718580 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:57:35,520-Speed 2493.18 samples/sec Loss 1.1397 LearningRate 0.000022 Epoch: 34 Global Step: 718590 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:57:43,723-Speed 2497.13 samples/sec Loss 1.1237 LearningRate 0.000022 Epoch: 34 Global Step: 718600 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:57:51,930-Speed 2495.82 samples/sec Loss 1.1536 LearningRate 0.000022 Epoch: 34 Global Step: 718610 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:58:00,134-Speed 2496.91 samples/sec Loss 1.1507 LearningRate 0.000022 Epoch: 34 Global Step: 718620 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:58:08,283-Speed 2513.80 samples/sec Loss 1.1615 LearningRate 0.000022 Epoch: 34 Global Step: 718630 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:58:16,487-Speed 2496.71 samples/sec Loss 1.1514 LearningRate 0.000022 Epoch: 34 Global Step: 718640 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:58:24,690-Speed 2497.06 samples/sec Loss 1.1502 LearningRate 0.000022 Epoch: 34 Global Step: 718650 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:58:32,895-Speed 2496.66 samples/sec Loss 1.1492 LearningRate 0.000022 Epoch: 34 Global Step: 718660 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
10:58:41,099-Speed 2496.80 samples/sec Loss 1.1717 LearningRate 0.000022 Epoch: 34 Global Step: 718670 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:58:49,301-Speed 2497.10 samples/sec Loss 1.1582 LearningRate 0.000022 Epoch: 34 Global Step: 718680 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:58:57,456-Speed 2511.68 samples/sec Loss 1.1501 LearningRate 0.000022 Epoch: 34 Global Step: 718690 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:59:05,661-Speed 2496.47 samples/sec Loss 1.1404 LearningRate 0.000022 Epoch: 34 Global Step: 718700 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:59:13,898-Speed 2486.89 samples/sec Loss 1.1522 LearningRate 0.000022 Epoch: 34 Global Step: 718710 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:59:22,104-Speed 2496.02 samples/sec Loss 1.1336 LearningRate 0.000022 Epoch: 34 Global Step: 718720 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:59:30,333-Speed 2489.26 samples/sec Loss 1.1630 LearningRate 0.000022 Epoch: 34 Global Step: 718730 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:59:38,543-Speed 2494.97 samples/sec Loss 1.1301 LearningRate 0.000022 Epoch: 34 Global Step: 718740 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:59:46,695-Speed 2512.61 samples/sec Loss 1.1699 LearningRate 0.000022 Epoch: 34 Global Step: 718750 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 10:59:54,898-Speed 2497.19 samples/sec Loss 1.1408 LearningRate 0.000022 Epoch: 34 Global Step: 718760 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:00:03,101-Speed 2496.92 samples/sec Loss 1.1612 LearningRate 0.000022 Epoch: 34 Global Step: 718770 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:00:11,303-Speed 2497.47 samples/sec Loss 1.1072 LearningRate 0.000022 Epoch: 34 Global Step: 718780 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:00:19,503-Speed 2497.96 samples/sec Loss 1.1441 LearningRate 0.000022 Epoch: 34 Global Step: 718790 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:00:27,708-Speed 2496.45 samples/sec Loss 1.1413 LearningRate 0.000022 Epoch: 34 Global Step: 718800 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:00:35,858-Speed 2513.38 samples/sec Loss 1.1597 LearningRate 0.000022 Epoch: 34 Global Step: 718810 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:00:44,078-Speed 2491.65 samples/sec Loss 1.1291 LearningRate 0.000022 Epoch: 34 Global Step: 718820 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:00:52,280-Speed 2497.58 samples/sec Loss 1.1555 LearningRate 0.000022 Epoch: 34 Global Step: 718830 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:01:00,480-Speed 2497.74 samples/sec Loss 1.1558 LearningRate 0.000022 Epoch: 34 Global Step: 718840 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:01:08,694-Speed 2493.93 samples/sec Loss 1.1490 LearningRate 0.000022 Epoch: 34 Global Step: 718850 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:01:16,897-Speed 2497.15 samples/sec Loss 1.1317 LearningRate 0.000022 Epoch: 34 Global Step: 718860 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:01:25,043-Speed 2514.24 samples/sec Loss 1.1680 LearningRate 0.000022 Epoch: 34 Global Step: 718870 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:01:33,249-Speed 2496.15 samples/sec Loss 1.1268 LearningRate 0.000022 Epoch: 34 Global Step: 718880 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:01:41,454-Speed 2496.55 samples/sec Loss 1.1476 LearningRate 0.000022 Epoch: 34 Global Step: 718890 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:01:49,659-Speed 2496.72 samples/sec Loss 1.1243 LearningRate 0.000022 Epoch: 34 Global Step: 718900 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:01:57,864-Speed 2496.29 samples/sec Loss 1.1286 LearningRate 0.000022 Epoch: 34 Global Step: 718910 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:02:06,071-Speed 2495.79 samples/sec Loss 1.1609 LearningRate 0.000022 Epoch: 34 Global Step: 718920 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:02:14,222-Speed 2513.06 samples/sec Loss 1.1548 LearningRate 0.000022 Epoch: 34 Global Step: 718930 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:02:22,423-Speed 2497.78 samples/sec Loss 1.1219 LearningRate 0.000022 Epoch: 34 Global Step: 718940 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:02:30,624-Speed 2497.26 samples/sec Loss 1.1640 LearningRate 0.000022 Epoch: 34 Global Step: 718950 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:02:38,827-Speed 2497.06 samples/sec Loss 1.1418 LearningRate 0.000022 Epoch: 34 Global Step: 718960 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:02:47,033-Speed 2496.35 samples/sec Loss 1.1394 LearningRate 0.000022 Epoch: 34 Global Step: 718970 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:02:55,234-Speed 2497.40 samples/sec Loss 1.1704 LearningRate 0.000022 Epoch: 34 Global Step: 718980 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:03:03,389-Speed 2511.93 samples/sec Loss 1.1448 LearningRate 0.000022 Epoch: 34 Global Step: 718990 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:03:11,592-Speed 2497.25 samples/sec Loss 1.1396 LearningRate 0.000022 Epoch: 34 Global Step: 719000 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:03:19,792-Speed 2498.27 samples/sec Loss 1.1245 LearningRate 0.000022 Epoch: 34 Global Step: 719010 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:03:27,995-Speed 2496.74 samples/sec Loss 1.1504 LearningRate 0.000022 Epoch: 34 Global Step: 719020 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:03:36,195-Speed 2498.17 samples/sec Loss 1.1519 LearningRate 0.000022 Epoch: 34 Global Step: 719030 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:03:44,397-Speed 2497.32 samples/sec Loss 1.1646 LearningRate 0.000022 Epoch: 34 Global Step: 719040 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:03:52,556-Speed 2510.57 samples/sec Loss 1.1582 LearningRate 0.000022 Epoch: 34 Global Step: 719050 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:04:00,758-Speed 2497.38 samples/sec Loss 1.1403 LearningRate 0.000022 Epoch: 34 Global Step: 719060 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:04:08,974-Speed 2493.07 samples/sec Loss 1.1478 LearningRate 0.000022 Epoch: 34 Global Step: 719070 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:04:17,178-Speed 2496.60 samples/sec Loss 1.1383 LearningRate 0.000022 Epoch: 34 Global Step: 719080 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:04:25,381-Speed 2497.32 samples/sec Loss 1.1537 LearningRate 0.000022 Epoch: 34 Global Step: 719090 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:04:33,580-Speed 2498.08 samples/sec Loss 1.1419 LearningRate 0.000022 Epoch: 34 Global Step: 719100 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:04:41,730-Speed 2513.44 samples/sec Loss 1.1568 LearningRate 0.000022 Epoch: 34 Global Step: 719110 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:04:49,931-Speed 2497.67 samples/sec Loss 1.1352 LearningRate 0.000022 Epoch: 34 Global Step: 719120 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:04:58,133-Speed 2497.38 samples/sec Loss 1.1250 LearningRate 0.000022 Epoch: 34 Global Step: 719130 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:05:06,335-Speed 2497.20 samples/sec Loss 1.1628 LearningRate 0.000022 Epoch: 34 Global Step: 719140 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:05:14,539-Speed 2496.88 samples/sec Loss 1.1642 LearningRate 0.000022 Epoch: 34 Global Step: 719150 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:05:22,744-Speed 2496.18 samples/sec Loss 1.1408 LearningRate 0.000022 Epoch: 34 Global Step: 719160 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:05:30,895-Speed 2513.12 samples/sec Loss 1.1716 LearningRate 0.000022 Epoch: 34 Global Step: 719170 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:05:39,101-Speed 2496.24 samples/sec Loss 1.1729 LearningRate 0.000022 Epoch: 34 Global Step: 719180 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:05:47,303-Speed 2497.26 samples/sec Loss 1.1467 LearningRate 0.000022 Epoch: 34 Global Step: 719190 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:05:55,521-Speed 2492.59 samples/sec Loss 1.1459 LearningRate 0.000022 Epoch: 34 Global Step: 719200 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:06:03,721-Speed 2497.91 samples/sec Loss 1.1821 LearningRate 0.000022 Epoch: 34 Global Step: 719210 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:06:11,924-Speed 2496.86 samples/sec Loss 1.1343 LearningRate 0.000022 Epoch: 34 Global Step: 719220 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:06:20,073-Speed 2513.77 samples/sec Loss 1.1532 LearningRate 0.000022 Epoch: 34 Global Step: 719230 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:06:28,274-Speed 2497.57 samples/sec Loss 1.1324 LearningRate 0.000022 Epoch: 34 Global Step: 719240 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:06:36,477-Speed 2497.26 samples/sec Loss 1.1558 LearningRate 0.000022 Epoch: 34 Global Step: 719250 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:06:44,677-Speed 2498.08 samples/sec Loss 1.1214 LearningRate 0.000022 Epoch: 34 Global Step: 719260 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:06:52,879-Speed 2497.46 samples/sec Loss 1.1429 LearningRate 0.000022 Epoch: 34 Global Step: 719270 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:07:01,081-Speed 2497.28 samples/sec Loss 1.1613 LearningRate 0.000022 Epoch: 34 Global Step: 719280 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:07:09,235-Speed 2512.02 samples/sec Loss 1.1318 LearningRate 0.000022 Epoch: 34 Global Step: 719290 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:07:17,433-Speed 2498.64 samples/sec Loss 1.1746 LearningRate 0.000022 Epoch: 34 Global Step: 719300 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:07:25,632-Speed 2498.33 samples/sec Loss 1.1360 LearningRate 0.000022 Epoch: 34 Global Step: 719310 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:07:33,830-Speed 2498.53 samples/sec Loss 1.1225 LearningRate 0.000022 Epoch: 34 Global Step: 719320 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:07:42,033-Speed 2497.19 samples/sec Loss 1.1482 LearningRate 0.000022 Epoch: 34 Global Step: 719330 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:07:50,235-Speed 2497.30 samples/sec Loss 1.1801 LearningRate 0.000022 Epoch: 34 Global Step: 719340 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:07:58,386-Speed 2512.82 samples/sec Loss 1.1466 LearningRate 0.000022 Epoch: 34 Global Step: 719350 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:08:06,585-Speed 2498.30 samples/sec Loss 1.1553 LearningRate 0.000022 Epoch: 34 Global Step: 719360 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:08:14,786-Speed 2497.62 samples/sec Loss 1.1443 LearningRate 0.000022 Epoch: 34 Global Step: 719370 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:08:22,988-Speed 2498.45 samples/sec Loss 1.1152 LearningRate 0.000022 Epoch: 34 Global Step: 719380 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 
11:08:31,187-Speed 2498.10 samples/sec Loss 1.1487 LearningRate 0.000022 Epoch: 34 Global Step: 719390 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:08:39,392-Speed 2496.36 samples/sec Loss 1.1504 LearningRate 0.000022 Epoch: 34 Global Step: 719400 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:08:47,541-Speed 2513.70 samples/sec Loss 1.1379 LearningRate 0.000022 Epoch: 34 Global Step: 719410 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:08:55,744-Speed 2496.99 samples/sec Loss 1.1180 LearningRate 0.000022 Epoch: 34 Global Step: 719420 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:09:03,946-Speed 2497.75 samples/sec Loss 1.1438 LearningRate 0.000022 Epoch: 34 Global Step: 719430 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:09:12,149-Speed 2496.92 samples/sec Loss 1.1162 LearningRate 0.000022 Epoch: 34 Global Step: 719440 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:09:20,352-Speed 2496.88 samples/sec Loss 1.1485 LearningRate 0.000022 Epoch: 34 Global Step: 719450 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:09:28,555-Speed 2497.39 samples/sec Loss 1.1356 LearningRate 0.000022 Epoch: 34 Global Step: 719460 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:09:36,704-Speed 2513.56 samples/sec Loss 1.1230 LearningRate 0.000022 Epoch: 34 Global Step: 719470 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:09:44,906-Speed 2497.12 samples/sec Loss 1.1346 LearningRate 0.000022 Epoch: 34 Global Step: 719480 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:09:53,109-Speed 2497.43 samples/sec Loss 1.1343 LearningRate 0.000022 Epoch: 34 Global Step: 719490 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:10:01,310-Speed 2497.83 samples/sec Loss 1.1364 LearningRate 0.000022 Epoch: 34 Global Step: 719500 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 
11:10:09,511-Speed 2497.63 samples/sec Loss 1.1540 LearningRate 0.000022 Epoch: 34 Global Step: 719510 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:10:17,716-Speed 2496.41 samples/sec Loss 1.1631 LearningRate 0.000022 Epoch: 34 Global Step: 719520 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:10:25,865-Speed 2513.57 samples/sec Loss 1.1396 LearningRate 0.000022 Epoch: 34 Global Step: 719530 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:10:34,066-Speed 2497.86 samples/sec Loss 1.1285 LearningRate 0.000022 Epoch: 34 Global Step: 719540 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:10:42,267-Speed 2497.78 samples/sec Loss 1.1364 LearningRate 0.000022 Epoch: 34 Global Step: 719550 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:10:50,470-Speed 2496.90 samples/sec Loss 1.1269 LearningRate 0.000022 Epoch: 34 Global Step: 719560 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:10:58,677-Speed 2495.86 samples/sec Loss 1.1287 LearningRate 0.000022 Epoch: 34 Global Step: 719570 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:11:06,881-Speed 2496.74 samples/sec Loss 1.1445 LearningRate 0.000022 Epoch: 34 Global Step: 719580 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:11:15,032-Speed 2512.90 samples/sec Loss 1.1651 LearningRate 0.000022 Epoch: 34 Global Step: 719590 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:11:23,236-Speed 2496.79 samples/sec Loss 1.1577 LearningRate 0.000022 Epoch: 34 Global Step: 719600 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:11:31,438-Speed 2497.47 samples/sec Loss 1.1701 LearningRate 0.000022 Epoch: 34 Global Step: 719610 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:11:39,644-Speed 2495.86 samples/sec Loss 1.1576 LearningRate 0.000022 Epoch: 34 Global Step: 719620 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 
11:11:47,851-Speed 2496.15 samples/sec Loss 1.1412 LearningRate 0.000022 Epoch: 34 Global Step: 719630 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:11:56,053-Speed 2497.29 samples/sec Loss 1.1320 LearningRate 0.000022 Epoch: 34 Global Step: 719640 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:12:04,205-Speed 2512.73 samples/sec Loss 1.1307 LearningRate 0.000022 Epoch: 34 Global Step: 719650 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:12:12,407-Speed 2497.58 samples/sec Loss 1.1684 LearningRate 0.000022 Epoch: 34 Global Step: 719660 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:12:20,610-Speed 2496.99 samples/sec Loss 1.1429 LearningRate 0.000022 Epoch: 34 Global Step: 719670 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:12:28,817-Speed 2495.84 samples/sec Loss 1.1347 LearningRate 0.000022 Epoch: 34 Global Step: 719680 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:12:37,020-Speed 2497.08 samples/sec Loss 1.1430 LearningRate 0.000022 Epoch: 34 Global Step: 719690 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:12:45,236-Speed 2493.10 samples/sec Loss 1.1333 LearningRate 0.000022 Epoch: 34 Global Step: 719700 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:12:53,386-Speed 2513.34 samples/sec Loss 1.1361 LearningRate 0.000022 Epoch: 34 Global Step: 719710 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:13:01,588-Speed 2497.33 samples/sec Loss 1.1343 LearningRate 0.000022 Epoch: 34 Global Step: 719720 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:13:09,793-Speed 2496.45 samples/sec Loss 1.1334 LearningRate 0.000022 Epoch: 34 Global Step: 719730 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:13:17,994-Speed 2497.41 samples/sec Loss 1.1258 LearningRate 0.000022 Epoch: 34 Global Step: 719740 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 
11:13:26,203-Speed 2495.49 samples/sec Loss 1.1189 LearningRate 0.000022 Epoch: 34 Global Step: 719750 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:13:34,409-Speed 2496.02 samples/sec Loss 1.1197 LearningRate 0.000022 Epoch: 34 Global Step: 719760 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:13:42,557-Speed 2513.75 samples/sec Loss 1.1283 LearningRate 0.000022 Epoch: 34 Global Step: 719770 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:13:50,760-Speed 2497.33 samples/sec Loss 1.1178 LearningRate 0.000022 Epoch: 34 Global Step: 719780 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:13:58,962-Speed 2497.00 samples/sec Loss 1.1246 LearningRate 0.000022 Epoch: 34 Global Step: 719790 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:14:07,164-Speed 2497.79 samples/sec Loss 1.1357 LearningRate 0.000022 Epoch: 34 Global Step: 719800 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:14:15,367-Speed 2496.91 samples/sec Loss 1.1528 LearningRate 0.000022 Epoch: 34 Global Step: 719810 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:14:23,570-Speed 2497.17 samples/sec Loss 1.1426 LearningRate 0.000022 Epoch: 34 Global Step: 719820 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:14:31,719-Speed 2513.47 samples/sec Loss 1.1397 LearningRate 0.000022 Epoch: 34 Global Step: 719830 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:14:39,919-Speed 2498.10 samples/sec Loss 1.1857 LearningRate 0.000022 Epoch: 34 Global Step: 719840 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:14:48,116-Speed 2498.58 samples/sec Loss 1.1328 LearningRate 0.000022 Epoch: 34 Global Step: 719850 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:14:56,316-Speed 2497.87 samples/sec Loss 1.1425 LearningRate 0.000022 Epoch: 34 Global Step: 719860 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 
11:15:04,519-Speed 2497.36 samples/sec Loss 1.1169 LearningRate 0.000022 Epoch: 34 Global Step: 719870 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:15:12,720-Speed 2497.35 samples/sec Loss 1.1106 LearningRate 0.000022 Epoch: 34 Global Step: 719880 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:15:20,873-Speed 2512.43 samples/sec Loss 1.1391 LearningRate 0.000022 Epoch: 34 Global Step: 719890 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:15:29,073-Speed 2498.14 samples/sec Loss 1.1403 LearningRate 0.000022 Epoch: 34 Global Step: 719900 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:15:37,275-Speed 2497.31 samples/sec Loss 1.1807 LearningRate 0.000022 Epoch: 34 Global Step: 719910 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:15:45,477-Speed 2497.26 samples/sec Loss 1.1149 LearningRate 0.000022 Epoch: 34 Global Step: 719920 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:15:53,679-Speed 2497.62 samples/sec Loss 1.1318 LearningRate 0.000022 Epoch: 34 Global Step: 719930 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:16:01,883-Speed 2496.92 samples/sec Loss 1.1475 LearningRate 0.000022 Epoch: 34 Global Step: 719940 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:16:10,045-Speed 2509.61 samples/sec Loss 1.1518 LearningRate 0.000022 Epoch: 34 Global Step: 719950 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:16:18,246-Speed 2497.69 samples/sec Loss 1.1349 LearningRate 0.000022 Epoch: 34 Global Step: 719960 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:16:26,450-Speed 2496.62 samples/sec Loss 1.1330 LearningRate 0.000022 Epoch: 34 Global Step: 719970 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:16:34,654-Speed 2496.88 samples/sec Loss 1.1720 LearningRate 0.000022 Epoch: 34 Global Step: 719980 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 
11:16:42,859-Speed 2496.48 samples/sec Loss 1.1351 LearningRate 0.000022 Epoch: 34 Global Step: 719990 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:16:51,065-Speed 2496.12 samples/sec Loss 1.1585 LearningRate 0.000022 Epoch: 34 Global Step: 720000 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:16:59,226-Speed 2509.97 samples/sec Loss 1.1511 LearningRate 0.000022 Epoch: 34 Global Step: 720010 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:17:07,427-Speed 2497.73 samples/sec Loss 1.1279 LearningRate 0.000022 Epoch: 34 Global Step: 720020 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:17:15,632-Speed 2496.49 samples/sec Loss 1.1604 LearningRate 0.000022 Epoch: 34 Global Step: 720030 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:17:23,831-Speed 2498.22 samples/sec Loss 1.1161 LearningRate 0.000022 Epoch: 34 Global Step: 720040 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:17:32,031-Speed 2497.94 samples/sec Loss 1.1453 LearningRate 0.000022 Epoch: 34 Global Step: 720050 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:17:40,232-Speed 2497.55 samples/sec Loss 1.1594 LearningRate 0.000022 Epoch: 34 Global Step: 720060 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:17:48,380-Speed 2513.97 samples/sec Loss 1.1303 LearningRate 0.000022 Epoch: 34 Global Step: 720070 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:17:56,540-Speed 2510.41 samples/sec Loss 1.1568 LearningRate 0.000022 Epoch: 34 Global Step: 720080 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:18:04,741-Speed 2497.71 samples/sec Loss 1.1260 LearningRate 0.000021 Epoch: 34 Global Step: 720090 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:18:12,941-Speed 2497.99 samples/sec Loss 1.1312 LearningRate 0.000021 Epoch: 34 Global Step: 720100 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:18:21,138-Speed 2498.58 samples/sec Loss 1.1213 LearningRate 0.000021 Epoch: 34 Global Step: 720110 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:18:29,342-Speed 2496.76 samples/sec Loss 1.1266 LearningRate 0.000021 Epoch: 34 Global Step: 720120 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:18:37,491-Speed 2513.79 samples/sec Loss 1.1353 LearningRate 0.000021 Epoch: 34 Global Step: 720130 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:18:45,693-Speed 2497.35 samples/sec Loss 1.1270 LearningRate 0.000021 Epoch: 34 Global Step: 720140 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:18:53,903-Speed 2494.97 samples/sec Loss 1.1106 LearningRate 0.000021 Epoch: 34 Global Step: 720150 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:19:02,107-Speed 2496.62 samples/sec Loss 1.1314 LearningRate 0.000021 Epoch: 34 Global Step: 720160 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:19:10,311-Speed 2496.80 samples/sec Loss 1.1488 LearningRate 0.000021 Epoch: 34 Global Step: 720170 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:19:18,529-Speed 2492.43 samples/sec Loss 1.1549 LearningRate 0.000021 Epoch: 34 Global Step: 720180 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:19:26,680-Speed 2512.89 samples/sec Loss 1.1529 LearningRate 0.000021 Epoch: 34 Global Step: 720190 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:19:34,885-Speed 2496.66 samples/sec Loss 1.1450 LearningRate 0.000021 Epoch: 34 Global Step: 720200 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:19:43,085-Speed 2497.78 samples/sec Loss 1.1388 LearningRate 0.000021 Epoch: 34 Global Step: 720210 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:19:51,290-Speed 2496.30 samples/sec Loss 1.1112 LearningRate 0.000021 Epoch: 34 Global Step: 720220 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:19:59,491-Speed 2497.74 samples/sec Loss 1.1400 LearningRate 0.000021 Epoch: 34 Global Step: 720230 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:20:07,705-Speed 2493.73 samples/sec Loss 1.1612 LearningRate 0.000021 Epoch: 34 Global Step: 720240 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:20:15,855-Speed 2513.13 samples/sec Loss 1.1659 LearningRate 0.000021 Epoch: 34 Global Step: 720250 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:20:24,065-Speed 2495.13 samples/sec Loss 1.1156 LearningRate 0.000021 Epoch: 34 Global Step: 720260 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:20:32,301-Speed 2486.85 samples/sec Loss 1.1468 LearningRate 0.000021 Epoch: 34 Global Step: 720270 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:20:40,500-Speed 2498.25 samples/sec Loss 1.1142 LearningRate 0.000021 Epoch: 34 Global Step: 720280 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:20:48,701-Speed 2497.59 samples/sec Loss 1.1192 LearningRate 0.000021 Epoch: 34 Global Step: 720290 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:20:56,902-Speed 2497.77 samples/sec Loss 1.1344 LearningRate 0.000021 Epoch: 34 Global Step: 720300 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:21:05,047-Speed 2514.61 samples/sec Loss 1.2014 LearningRate 0.000021 Epoch: 34 Global Step: 720310 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:21:13,249-Speed 2497.56 samples/sec Loss 1.1498 LearningRate 0.000021 Epoch: 34 Global Step: 720320 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:21:21,454-Speed 2496.78 samples/sec Loss 1.1120 LearningRate 0.000021 Epoch: 34 Global Step: 720330 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:21:29,654-Speed 2497.97 samples/sec Loss 1.1255 LearningRate 0.000021 Epoch: 34 Global Step: 720340 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:21:37,856-Speed 2497.57 samples/sec Loss 1.1402 LearningRate 0.000021 Epoch: 34 Global Step: 720350 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:21:46,059-Speed 2497.12 samples/sec Loss 1.1639 LearningRate 0.000021 Epoch: 34 Global Step: 720360 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:21:54,214-Speed 2511.63 samples/sec Loss 1.1622 LearningRate 0.000021 Epoch: 34 Global Step: 720370 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:22:02,418-Speed 2496.62 samples/sec Loss 1.1783 LearningRate 0.000021 Epoch: 34 Global Step: 720380 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:22:10,652-Speed 2487.81 samples/sec Loss 1.1441 LearningRate 0.000021 Epoch: 34 Global Step: 720390 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:22:18,858-Speed 2496.33 samples/sec Loss 1.1404 LearningRate 0.000021 Epoch: 34 Global Step: 720400 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:22:27,060-Speed 2497.10 samples/sec Loss 1.1391 LearningRate 0.000021 Epoch: 34 Global Step: 720410 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:22:35,264-Speed 2496.68 samples/sec Loss 1.1331 LearningRate 0.000021 Epoch: 34 Global Step: 720420 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:22:43,416-Speed 2512.96 samples/sec Loss 1.1505 LearningRate 0.000021 Epoch: 34 Global Step: 720430 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:22:51,620-Speed 2497.00 samples/sec Loss 1.1373 LearningRate 0.000021 Epoch: 34 Global Step: 720440 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:22:59,821-Speed 2497.56 samples/sec Loss 1.1418 LearningRate 0.000021 Epoch: 34 Global Step: 720450 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:23:08,023-Speed 2497.38 samples/sec Loss 1.1595 LearningRate 0.000021 Epoch: 34 Global Step: 720460 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:23:16,222-Speed 2498.19 samples/sec Loss 1.1536 LearningRate 0.000021 Epoch: 34 Global Step: 720470 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:23:24,425-Speed 2496.95 samples/sec Loss 1.1561 LearningRate 0.000021 Epoch: 34 Global Step: 720480 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:23:32,573-Speed 2513.96 samples/sec Loss 1.1282 LearningRate 0.000021 Epoch: 34 Global Step: 720490 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:23:40,783-Speed 2494.92 samples/sec Loss 1.1399 LearningRate 0.000021 Epoch: 34 Global Step: 720500 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:23:48,986-Speed 2497.32 samples/sec Loss 1.1425 LearningRate 0.000021 Epoch: 34 Global Step: 720510 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:23:57,190-Speed 2496.52 samples/sec Loss 1.1508 LearningRate 0.000021 Epoch: 34 Global Step: 720520 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:24:05,395-Speed 2496.50 samples/sec Loss 1.1795 LearningRate 0.000021 Epoch: 34 Global Step: 720530 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:24:13,595-Speed 2498.10 samples/sec Loss 1.1539 LearningRate 0.000021 Epoch: 34 Global Step: 720540 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:24:21,744-Speed 2513.68 samples/sec Loss 1.1404 LearningRate 0.000021 Epoch: 34 Global Step: 720550 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:24:29,946-Speed 2497.30 samples/sec Loss 1.1690 LearningRate 0.000021 Epoch: 34 Global Step: 720560 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:24:38,146-Speed 2498.07 samples/sec Loss 1.1204 LearningRate 0.000021 Epoch: 34 Global Step: 720570 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:24:46,347-Speed 2497.66 samples/sec Loss 1.1782 LearningRate 0.000021 Epoch: 34 Global Step: 720580 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:24:54,545-Speed 2498.49 samples/sec Loss 1.1323 LearningRate 0.000021 Epoch: 34 Global Step: 720590 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:25:02,759-Speed 2493.67 samples/sec Loss 1.1522 LearningRate 0.000021 Epoch: 34 Global Step: 720600 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:25:10,908-Speed 2513.61 samples/sec Loss 1.1300 LearningRate 0.000021 Epoch: 34 Global Step: 720610 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:25:19,109-Speed 2497.53 samples/sec Loss 1.1279 LearningRate 0.000021 Epoch: 34 Global Step: 720620 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:25:27,318-Speed 2495.55 samples/sec Loss 1.1681 LearningRate 0.000021 Epoch: 34 Global Step: 720630 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:25:35,531-Speed 2494.09 samples/sec Loss 1.1335 LearningRate 0.000021 Epoch: 34 Global Step: 720640 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:25:43,752-Speed 2491.72 samples/sec Loss 1.1425 LearningRate 0.000021 Epoch: 34 Global Step: 720650 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:25:51,953-Speed 2497.64 samples/sec Loss 1.1539 LearningRate 0.000021 Epoch: 34 Global Step: 720660 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:26:00,102-Speed 2513.25 samples/sec Loss 1.1280 LearningRate 0.000021 Epoch: 34 Global Step: 720670 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:26:08,305-Speed 2497.04 samples/sec Loss 1.1571 LearningRate 0.000021 Epoch: 34 Global Step: 720680 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:26:16,509-Speed 2497.10 samples/sec Loss 1.1244 LearningRate 0.000021 Epoch: 34 Global Step: 720690 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:26:24,712-Speed 2497.19 samples/sec Loss 1.1244 LearningRate 0.000021 Epoch: 34 Global Step: 720700 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:26:32,916-Speed 2496.89 samples/sec Loss 1.1572 LearningRate 0.000021 Epoch: 34 Global Step: 720710 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:26:41,132-Speed 2493.14 samples/sec Loss 1.1365 LearningRate 0.000021 Epoch: 34 Global Step: 720720 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:26:49,293-Speed 2511.03 samples/sec Loss 1.1276 LearningRate 0.000021 Epoch: 34 Global Step: 720730 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:26:57,496-Speed 2497.17 samples/sec Loss 1.1542 LearningRate 0.000021 Epoch: 34 Global Step: 720740 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:27:05,708-Speed 2494.10 samples/sec Loss 1.1525 LearningRate 0.000021 Epoch: 34 Global Step: 720750 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:27:13,907-Speed 2498.29 samples/sec Loss 1.1262 LearningRate 0.000021 Epoch: 34 Global Step: 720760 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:27:22,123-Speed 2493.60 samples/sec Loss 1.1553 LearningRate 0.000021 Epoch: 34 Global Step: 720770 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:27:30,322-Speed 2498.54 samples/sec Loss 1.1279 LearningRate 0.000021 Epoch: 34 Global Step: 720780 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:27:38,471-Speed 2513.51 samples/sec Loss 1.1304 LearningRate 0.000021 Epoch: 34 Global Step: 720790 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:27:46,674-Speed 2497.18 samples/sec Loss 1.1402 LearningRate 0.000021 Epoch: 34 Global Step: 720800 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:27:54,877-Speed 2497.09 samples/sec Loss 1.1264 LearningRate 0.000021 Epoch: 34 Global Step: 720810 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:28:03,081-Speed 2496.80 samples/sec Loss 1.1623 LearningRate 0.000021 Epoch: 34 Global Step: 720820 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:28:11,284-Speed 2497.09 samples/sec Loss 1.1245 LearningRate 0.000021 Epoch: 34 Global Step: 720830 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:28:19,486-Speed 2497.01 samples/sec Loss 1.1571 LearningRate 0.000021 Epoch: 34 Global Step: 720840 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:28:27,637-Speed 2513.28 samples/sec Loss 1.1380 LearningRate 0.000021 Epoch: 34 Global Step: 720850 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:28:35,843-Speed 2496.26 samples/sec Loss 1.1303 LearningRate 0.000021 Epoch: 34 Global Step: 720860 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:28:44,045-Speed 2497.16 samples/sec Loss 1.1407 LearningRate 0.000021 Epoch: 34 Global Step: 720870 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:28:52,246-Speed 2498.08 samples/sec Loss 1.1342 LearningRate 0.000021 Epoch: 34 Global Step: 720880 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:29:00,447-Speed 2497.68 samples/sec Loss 1.1345 LearningRate 0.000021 Epoch: 34 Global Step: 720890 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:29:08,647-Speed 2497.91 samples/sec Loss 1.1233 LearningRate 0.000021 Epoch: 34 Global Step: 720900 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:29:16,795-Speed 2513.87 samples/sec Loss 1.1207 LearningRate 0.000021 Epoch: 34 Global Step: 720910 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:29:24,997-Speed 2497.66 samples/sec Loss 1.0852 LearningRate 0.000021 Epoch: 34 Global Step: 720920 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:29:33,196-Speed 2498.12 samples/sec Loss 1.1573 LearningRate 0.000021 Epoch: 34 Global Step: 720930 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:29:41,397-Speed 2497.56 samples/sec Loss 1.1142 LearningRate 0.000021 Epoch: 34 Global Step: 720940 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:29:49,597-Speed 2497.69 samples/sec Loss 1.1239 LearningRate 0.000021 Epoch: 34 Global Step: 720950 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:29:57,802-Speed 2496.51 samples/sec Loss 1.1516 LearningRate 0.000021 Epoch: 34 Global Step: 720960 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:30:05,963-Speed 2509.94 samples/sec Loss 1.1294 LearningRate 0.000021 Epoch: 34 Global Step: 720970 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:30:14,167-Speed 2496.92 samples/sec Loss 1.1371 LearningRate 0.000021 Epoch: 34 Global Step: 720980 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:30:22,386-Speed 2492.44 samples/sec Loss 1.1365 LearningRate 0.000021 Epoch: 34 Global Step: 720990 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:30:30,585-Speed 2498.17 samples/sec Loss 1.1071 LearningRate 0.000021 Epoch: 34 Global Step: 721000 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:30:38,792-Speed 2495.83 samples/sec Loss 1.1241 LearningRate 0.000021 Epoch: 34 Global Step: 721010 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:30:46,993-Speed 2497.77 samples/sec Loss 1.1288 LearningRate 0.000021 Epoch: 34 Global Step: 721020 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:30:55,143-Speed 2513.27 samples/sec Loss 1.1670 LearningRate 0.000021 Epoch: 34 Global Step: 721030 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:31:03,345-Speed 2497.15 samples/sec Loss 1.1511 LearningRate 0.000021 Epoch: 34 Global Step: 721040 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:31:11,549-Speed 2496.72 samples/sec Loss 1.1439 LearningRate 0.000021 Epoch: 34 Global Step: 721050 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:31:19,755-Speed 2496.17 samples/sec Loss 1.1155 LearningRate 0.000021 Epoch: 34 Global Step: 721060 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:31:27,956-Speed 2497.81 samples/sec Loss 1.1525 LearningRate 0.000021 Epoch: 34 Global Step: 721070 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:31:36,157-Speed 2497.80 samples/sec Loss 1.1331 LearningRate 0.000021 Epoch: 34 Global Step: 721080 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:31:44,308-Speed 2512.87 samples/sec Loss 1.1192 LearningRate 0.000021 Epoch: 34 Global Step: 721090 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:31:52,510-Speed 2497.52 samples/sec Loss 1.1347 LearningRate 0.000021 Epoch: 34 Global Step: 721100 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:32:00,713-Speed 2496.99 samples/sec Loss 1.1415 LearningRate 0.000021 Epoch: 34 Global Step: 721110 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:32:08,913-Speed 2497.82 samples/sec Loss 1.1514 LearningRate 0.000021 Epoch: 34 Global Step: 721120 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:32:17,115-Speed 2497.36 samples/sec Loss 1.1539 LearningRate 0.000021 Epoch: 34 Global Step: 721130 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:32:25,314-Speed 2498.29 samples/sec Loss 1.1643 LearningRate 0.000021 Epoch: 34 Global Step: 721140 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:32:33,465-Speed 2513.07 samples/sec Loss 1.1606 LearningRate 0.000021 Epoch: 34 Global Step: 721150 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:32:41,663-Speed 2498.50 samples/sec Loss 1.1324 LearningRate 0.000021 Epoch: 34 Global Step: 721160 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:32:49,875-Speed 2494.67 samples/sec Loss 1.1180 LearningRate 0.000021 Epoch: 34 Global Step: 721170 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:32:58,083-Speed 2495.58 samples/sec Loss 1.1890 LearningRate 0.000021 Epoch: 34 Global Step: 721180 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:33:06,291-Speed 2495.64 samples/sec Loss 1.1228 LearningRate 0.000021 Epoch: 34 Global Step: 721190 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:33:14,498-Speed 2495.85 samples/sec Loss 1.1628 LearningRate 0.000021 Epoch: 34 Global Step: 721200 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:33:22,648-Speed 2513.17 samples/sec Loss 1.1103 LearningRate 0.000021 Epoch: 34 Global Step: 721210 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:33:30,859-Speed 2494.65 samples/sec Loss 1.1460 LearningRate 0.000021 Epoch: 34 Global Step: 721220 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:33:39,066-Speed 2495.95 samples/sec Loss 1.1373 LearningRate 0.000021 Epoch: 34 Global Step: 721230 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:33:47,364-Speed 2468.17 samples/sec Loss 1.1642 LearningRate 0.000021 Epoch: 34 Global Step: 721240 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:33:55,568-Speed 2496.93 samples/sec Loss 1.1433 LearningRate 0.000021 Epoch: 34 Global Step: 721250 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:34:03,769-Speed 2497.51 samples/sec Loss 1.1333 LearningRate 0.000021 Epoch: 34 Global Step: 721260 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:34:11,916-Speed 2514.14 samples/sec Loss 1.1668 LearningRate 0.000021 Epoch: 34 Global Step: 721270 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:34:20,132-Speed 2493.16 samples/sec Loss 1.1837 LearningRate 0.000021 Epoch: 34 Global Step: 721280 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:34:28,335-Speed 2497.06 samples/sec Loss 1.1149 LearningRate 0.000021 Epoch: 34 Global Step: 721290 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:34:36,536-Speed 2497.80 samples/sec Loss 1.1275 LearningRate 0.000021 Epoch: 34 Global Step: 721300 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 
11:34:44,742-Speed 2495.98 samples/sec Loss 1.1557 LearningRate 0.000021 Epoch: 34 Global Step: 721310 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:34:52,945-Speed 2497.16 samples/sec Loss 1.1729 LearningRate 0.000021 Epoch: 34 Global Step: 721320 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:35:01,101-Speed 2511.12 samples/sec Loss 1.1485 LearningRate 0.000021 Epoch: 34 Global Step: 721330 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:35:09,304-Speed 2497.20 samples/sec Loss 1.1703 LearningRate 0.000021 Epoch: 34 Global Step: 721340 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:35:17,507-Speed 2497.04 samples/sec Loss 1.1328 LearningRate 0.000021 Epoch: 34 Global Step: 721350 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:35:25,708-Speed 2497.60 samples/sec Loss 1.1416 LearningRate 0.000021 Epoch: 34 Global Step: 721360 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:35:33,915-Speed 2495.88 samples/sec Loss 1.1162 LearningRate 0.000021 Epoch: 34 Global Step: 721370 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:35:42,115-Speed 2497.94 samples/sec Loss 1.1259 LearningRate 0.000021 Epoch: 34 Global Step: 721380 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:35:50,262-Speed 2514.22 samples/sec Loss 1.1352 LearningRate 0.000021 Epoch: 34 Global Step: 721390 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:35:58,465-Speed 2497.03 samples/sec Loss 1.1230 LearningRate 0.000021 Epoch: 34 Global Step: 721400 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:36:06,667-Speed 2497.24 samples/sec Loss 1.1372 LearningRate 0.000021 Epoch: 34 Global Step: 721410 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:36:14,870-Speed 2496.94 samples/sec Loss 1.1572 LearningRate 0.000021 Epoch: 34 Global Step: 721420 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 
11:36:23,074-Speed 2497.00 samples/sec Loss 1.1421 LearningRate 0.000021 Epoch: 34 Global Step: 721430 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:36:31,279-Speed 2496.45 samples/sec Loss 1.1356 LearningRate 0.000021 Epoch: 34 Global Step: 721440 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:36:39,428-Speed 2513.42 samples/sec Loss 1.1651 LearningRate 0.000021 Epoch: 34 Global Step: 721450 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:36:47,627-Speed 2498.48 samples/sec Loss 1.1355 LearningRate 0.000021 Epoch: 34 Global Step: 721460 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:36:55,826-Speed 2498.37 samples/sec Loss 1.1159 LearningRate 0.000021 Epoch: 34 Global Step: 721470 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:37:04,027-Speed 2497.67 samples/sec Loss 1.1661 LearningRate 0.000021 Epoch: 34 Global Step: 721480 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:37:12,226-Speed 2498.65 samples/sec Loss 1.1625 LearningRate 0.000021 Epoch: 34 Global Step: 721490 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:37:20,432-Speed 2496.15 samples/sec Loss 1.1315 LearningRate 0.000021 Epoch: 34 Global Step: 721500 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:37:28,581-Speed 2513.65 samples/sec Loss 1.1370 LearningRate 0.000021 Epoch: 34 Global Step: 721510 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:37:36,790-Speed 2495.07 samples/sec Loss 1.1394 LearningRate 0.000021 Epoch: 34 Global Step: 721520 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:37:44,988-Speed 2498.40 samples/sec Loss 1.1231 LearningRate 0.000021 Epoch: 34 Global Step: 721530 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:37:53,191-Speed 2497.21 samples/sec Loss 1.1362 LearningRate 0.000021 Epoch: 34 Global Step: 721540 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 
11:38:01,391-Speed 2497.71 samples/sec Loss 1.1586 LearningRate 0.000021 Epoch: 34 Global Step: 721550 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:38:09,600-Speed 2495.31 samples/sec Loss 1.1352 LearningRate 0.000021 Epoch: 34 Global Step: 721560 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:38:17,751-Speed 2512.92 samples/sec Loss 1.1175 LearningRate 0.000021 Epoch: 34 Global Step: 721570 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:38:25,954-Speed 2497.31 samples/sec Loss 1.1578 LearningRate 0.000021 Epoch: 34 Global Step: 721580 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:38:34,160-Speed 2496.23 samples/sec Loss 1.1372 LearningRate 0.000021 Epoch: 34 Global Step: 721590 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:38:42,365-Speed 2496.33 samples/sec Loss 1.1348 LearningRate 0.000021 Epoch: 34 Global Step: 721600 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:38:50,574-Speed 2495.47 samples/sec Loss 1.1412 LearningRate 0.000021 Epoch: 34 Global Step: 721610 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:38:58,790-Speed 2493.17 samples/sec Loss 1.1656 LearningRate 0.000021 Epoch: 34 Global Step: 721620 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:39:06,939-Speed 2513.59 samples/sec Loss 1.1381 LearningRate 0.000021 Epoch: 34 Global Step: 721630 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:39:15,139-Speed 2497.90 samples/sec Loss 1.1357 LearningRate 0.000021 Epoch: 34 Global Step: 721640 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:39:23,341-Speed 2497.05 samples/sec Loss 1.1681 LearningRate 0.000021 Epoch: 34 Global Step: 721650 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:39:31,545-Speed 2496.86 samples/sec Loss 1.1422 LearningRate 0.000021 Epoch: 34 Global Step: 721660 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 
11:39:39,749-Speed 2497.17 samples/sec Loss 1.1196 LearningRate 0.000021 Epoch: 34 Global Step: 721670 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:39:47,950-Speed 2497.37 samples/sec Loss 1.1479 LearningRate 0.000021 Epoch: 34 Global Step: 721680 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:39:56,098-Speed 2514.01 samples/sec Loss 1.1482 LearningRate 0.000021 Epoch: 34 Global Step: 721690 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:40:04,314-Speed 2493.06 samples/sec Loss 1.1403 LearningRate 0.000021 Epoch: 34 Global Step: 721700 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:40:12,516-Speed 2497.41 samples/sec Loss 1.1441 LearningRate 0.000021 Epoch: 34 Global Step: 721710 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:40:20,732-Speed 2493.04 samples/sec Loss 1.1430 LearningRate 0.000021 Epoch: 34 Global Step: 721720 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:40:28,932-Speed 2498.05 samples/sec Loss 1.1616 LearningRate 0.000021 Epoch: 34 Global Step: 721730 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:40:37,144-Speed 2494.30 samples/sec Loss 1.1250 LearningRate 0.000021 Epoch: 34 Global Step: 721740 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:40:45,289-Speed 2514.54 samples/sec Loss 1.1302 LearningRate 0.000021 Epoch: 34 Global Step: 721750 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:40:53,491-Speed 2497.34 samples/sec Loss 1.1741 LearningRate 0.000021 Epoch: 34 Global Step: 721760 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:41:01,691-Speed 2498.27 samples/sec Loss 1.1463 LearningRate 0.000021 Epoch: 34 Global Step: 721770 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:41:09,895-Speed 2496.59 samples/sec Loss 1.1285 LearningRate 0.000021 Epoch: 34 Global Step: 721780 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 
11:41:18,096-Speed 2497.58 samples/sec Loss 1.1722 LearningRate 0.000021 Epoch: 34 Global Step: 721790 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:41:26,309-Speed 2494.30 samples/sec Loss 1.1219 LearningRate 0.000021 Epoch: 34 Global Step: 721800 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:41:34,464-Speed 2512.87 samples/sec Loss 1.1146 LearningRate 0.000021 Epoch: 34 Global Step: 721810 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:41:42,669-Speed 2496.49 samples/sec Loss 1.1412 LearningRate 0.000021 Epoch: 34 Global Step: 721820 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:41:50,870-Speed 2497.74 samples/sec Loss 1.1278 LearningRate 0.000021 Epoch: 34 Global Step: 721830 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:41:59,071-Speed 2497.67 samples/sec Loss 1.1294 LearningRate 0.000021 Epoch: 34 Global Step: 721840 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:42:07,282-Speed 2494.55 samples/sec Loss 1.1274 LearningRate 0.000021 Epoch: 34 Global Step: 721850 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:42:15,493-Speed 2494.63 samples/sec Loss 1.1475 LearningRate 0.000021 Epoch: 34 Global Step: 721860 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:42:23,641-Speed 2514.02 samples/sec Loss 1.1581 LearningRate 0.000021 Epoch: 34 Global Step: 721870 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:42:31,841-Speed 2498.00 samples/sec Loss 1.1505 LearningRate 0.000021 Epoch: 34 Global Step: 721880 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:42:40,041-Speed 2497.81 samples/sec Loss 1.1491 LearningRate 0.000021 Epoch: 34 Global Step: 721890 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 11:42:48,241-Speed 2498.13 samples/sec Loss 1.1330 LearningRate 0.000021 Epoch: 34 Global Step: 721900 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-07-12 
11:42:56,403-Speed 2509.13 samples/sec Loss 1.1531 LearningRate 0.000021 Epoch: 34 Global Step: 721910 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:43:04,606-Speed 2497.19 samples/sec Loss 1.1367 LearningRate 0.000021 Epoch: 34 Global Step: 721920 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:43:12,754-Speed 2514.19 samples/sec Loss 1.1239 LearningRate 0.000021 Epoch: 34 Global Step: 721930 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:43:20,956-Speed 2497.25 samples/sec Loss 1.1537 LearningRate 0.000021 Epoch: 34 Global Step: 721940 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:43:29,159-Speed 2497.13 samples/sec Loss 1.1588 LearningRate 0.000021 Epoch: 34 Global Step: 721950 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:43:37,375-Speed 2493.08 samples/sec Loss 1.1717 LearningRate 0.000021 Epoch: 34 Global Step: 721960 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:43:45,576-Speed 2497.79 samples/sec Loss 1.1503 LearningRate 0.000021 Epoch: 34 Global Step: 721970 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:43:53,783-Speed 2496.01 samples/sec Loss 1.1756 LearningRate 0.000021 Epoch: 34 Global Step: 721980 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:44:01,934-Speed 2512.80 samples/sec Loss 1.1491 LearningRate 0.000021 Epoch: 34 Global Step: 721990 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:44:10,136-Speed 2497.48 samples/sec Loss 1.1252 LearningRate 0.000021 Epoch: 34 Global Step: 722000 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:44:18,334-Speed 2498.38 samples/sec Loss 1.1557 LearningRate 0.000021 Epoch: 34 Global Step: 722010 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:44:26,531-Speed 2498.82 samples/sec Loss 1.1200 LearningRate 0.000021 Epoch: 34 Global Step: 722020 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:44:34,737-Speed 2496.16 samples/sec Loss 1.1451 LearningRate 0.000021 Epoch: 34 Global Step: 722030 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:44:42,935-Speed 2498.96 samples/sec Loss 1.1601 LearningRate 0.000021 Epoch: 34 Global Step: 722040 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:44:51,088-Speed 2512.35 samples/sec Loss 1.1452 LearningRate 0.000021 Epoch: 34 Global Step: 722050 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:44:59,288-Speed 2498.04 samples/sec Loss 1.1098 LearningRate 0.000021 Epoch: 34 Global Step: 722060 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:45:07,489-Speed 2497.61 samples/sec Loss 1.1302 LearningRate 0.000021 Epoch: 34 Global Step: 722070 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:45:15,692-Speed 2497.11 samples/sec Loss 1.1265 LearningRate 0.000021 Epoch: 34 Global Step: 722080 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:45:23,894-Speed 2497.49 samples/sec Loss 1.1366 LearningRate 0.000021 Epoch: 34 Global Step: 722090 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:45:32,093-Speed 2498.36 samples/sec Loss 1.1170 LearningRate 0.000021 Epoch: 34 Global Step: 722100 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:45:40,239-Speed 2514.29 samples/sec Loss 1.1383 LearningRate 0.000021 Epoch: 34 Global Step: 722110 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:45:48,444-Speed 2496.44 samples/sec Loss 1.1204 LearningRate 0.000021 Epoch: 34 Global Step: 722120 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:45:56,648-Speed 2497.14 samples/sec Loss 1.1394 LearningRate 0.000021 Epoch: 34 Global Step: 722130 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:46:04,847-Speed 2498.07 samples/sec Loss 1.1292 LearningRate 0.000021 Epoch: 34 Global Step: 722140 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:46:13,051-Speed 2496.65 samples/sec Loss 1.1335 LearningRate 0.000021 Epoch: 34 Global Step: 722150 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:46:21,258-Speed 2495.97 samples/sec Loss 1.1191 LearningRate 0.000021 Epoch: 34 Global Step: 722160 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:46:29,408-Speed 2513.24 samples/sec Loss 1.1394 LearningRate 0.000021 Epoch: 34 Global Step: 722170 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:46:37,611-Speed 2496.80 samples/sec Loss 1.1555 LearningRate 0.000021 Epoch: 34 Global Step: 722180 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:46:45,813-Speed 2497.47 samples/sec Loss 1.1353 LearningRate 0.000021 Epoch: 34 Global Step: 722190 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:46:54,013-Speed 2498.06 samples/sec Loss 1.1358 LearningRate 0.000021 Epoch: 34 Global Step: 722200 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:47:02,217-Speed 2496.87 samples/sec Loss 1.1313 LearningRate 0.000021 Epoch: 34 Global Step: 722210 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:47:10,421-Speed 2496.70 samples/sec Loss 1.1458 LearningRate 0.000021 Epoch: 34 Global Step: 722220 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:47:18,573-Speed 2512.58 samples/sec Loss 1.1545 LearningRate 0.000021 Epoch: 34 Global Step: 722230 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:47:26,775-Speed 2497.42 samples/sec Loss 1.1303 LearningRate 0.000021 Epoch: 34 Global Step: 722240 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:47:34,975-Speed 2498.10 samples/sec Loss 1.1322 LearningRate 0.000021 Epoch: 34 Global Step: 722250 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:47:43,189-Speed 2493.72 samples/sec Loss 1.1633 LearningRate 0.000021 Epoch: 34 Global Step: 722260 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:47:51,396-Speed 2495.80 samples/sec Loss 1.1211 LearningRate 0.000021 Epoch: 34 Global Step: 722270 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:47:59,599-Speed 2496.92 samples/sec Loss 1.1693 LearningRate 0.000021 Epoch: 34 Global Step: 722280 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:48:07,747-Speed 2513.90 samples/sec Loss 1.1372 LearningRate 0.000021 Epoch: 34 Global Step: 722290 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:48:15,945-Speed 2498.54 samples/sec Loss 1.1485 LearningRate 0.000021 Epoch: 34 Global Step: 722300 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:48:24,150-Speed 2496.60 samples/sec Loss 1.1768 LearningRate 0.000021 Epoch: 34 Global Step: 722310 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:48:32,364-Speed 2493.71 samples/sec Loss 1.1491 LearningRate 0.000021 Epoch: 34 Global Step: 722320 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:48:40,569-Speed 2496.35 samples/sec Loss 1.1386 LearningRate 0.000021 Epoch: 34 Global Step: 722330 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:48:48,768-Speed 2498.28 samples/sec Loss 1.1386 LearningRate 0.000021 Epoch: 34 Global Step: 722340 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:48:56,917-Speed 2513.77 samples/sec Loss 1.1641 LearningRate 0.000021 Epoch: 34 Global Step: 722350 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:49:05,121-Speed 2497.06 samples/sec Loss 1.1582 LearningRate 0.000021 Epoch: 34 Global Step: 722360 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:49:13,321-Speed 2497.78 samples/sec Loss 1.1388 LearningRate 0.000021 Epoch: 34 Global Step: 722370 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:49:21,524-Speed 2497.53 samples/sec Loss 1.1272 LearningRate 0.000021 Epoch: 34 Global Step: 722380 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:49:29,723-Speed 2498.38 samples/sec Loss 1.1236 LearningRate 0.000021 Epoch: 34 Global Step: 722390 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:49:37,921-Speed 2498.38 samples/sec Loss 1.1311 LearningRate 0.000021 Epoch: 34 Global Step: 722400 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:49:46,082-Speed 2509.98 samples/sec Loss 1.1384 LearningRate 0.000021 Epoch: 34 Global Step: 722410 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:49:54,283-Speed 2497.98 samples/sec Loss 1.1419 LearningRate 0.000021 Epoch: 34 Global Step: 722420 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:50:02,485-Speed 2497.48 samples/sec Loss 1.1013 LearningRate 0.000021 Epoch: 34 Global Step: 722430 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:50:10,684-Speed 2497.98 samples/sec Loss 1.0905 LearningRate 0.000021 Epoch: 34 Global Step: 722440 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:50:18,880-Speed 2499.19 samples/sec Loss 1.1538 LearningRate 0.000021 Epoch: 34 Global Step: 722450 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:50:27,084-Speed 2496.67 samples/sec Loss 1.1028 LearningRate 0.000021 Epoch: 34 Global Step: 722460 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:50:35,244-Speed 2510.37 samples/sec Loss 1.1177 LearningRate 0.000021 Epoch: 34 Global Step: 722470 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:50:43,450-Speed 2496.43 samples/sec Loss 1.1350 LearningRate 0.000021 Epoch: 34 Global Step: 722480 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:50:51,648-Speed 2498.39 samples/sec Loss 1.1582 LearningRate 0.000021 Epoch: 34 Global Step: 722490 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:50:59,852-Speed 2496.95 samples/sec Loss 1.1591 LearningRate 0.000021 Epoch: 34 Global Step: 722500 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 
11:51:08,053-Speed 2497.73 samples/sec Loss 1.1600 LearningRate 0.000021 Epoch: 34 Global Step: 722510 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:51:16,257-Speed 2496.64 samples/sec Loss 1.1539 LearningRate 0.000021 Epoch: 34 Global Step: 722520 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:51:24,407-Speed 2513.34 samples/sec Loss 1.1501 LearningRate 0.000021 Epoch: 34 Global Step: 722530 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:51:32,624-Speed 2493.30 samples/sec Loss 1.1153 LearningRate 0.000021 Epoch: 34 Global Step: 722540 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-07-12 11:51:40,821-Speed 2498.97 samples/sec Loss 1.1189 LearningRate 0.000021 Epoch: 34 Global Step: 722550 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:51:49,037-Speed 2492.97 samples/sec Loss 1.1846 LearningRate 0.000021 Epoch: 34 Global Step: 722560 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:51:57,251-Speed 2493.85 samples/sec Loss 1.1370 LearningRate 0.000021 Epoch: 34 Global Step: 722570 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:52:05,451-Speed 2498.09 samples/sec Loss 1.1394 LearningRate 0.000021 Epoch: 34 Global Step: 722580 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:52:13,601-Speed 2513.06 samples/sec Loss 1.1562 LearningRate 0.000021 Epoch: 34 Global Step: 722590 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:52:21,802-Speed 2497.66 samples/sec Loss 1.1222 LearningRate 0.000021 Epoch: 34 Global Step: 722600 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:52:30,001-Speed 2498.55 samples/sec Loss 1.1656 LearningRate 0.000021 Epoch: 34 Global Step: 722610 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:52:38,203-Speed 2497.64 samples/sec Loss 1.1353 LearningRate 0.000021 Epoch: 34 Global Step: 722620 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
11:52:46,411-Speed 2495.60 samples/sec Loss 1.1181 LearningRate 0.000021 Epoch: 34 Global Step: 722630 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:52:54,614-Speed 2496.91 samples/sec Loss 1.1530 LearningRate 0.000021 Epoch: 34 Global Step: 722640 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:53:02,764-Speed 2513.25 samples/sec Loss 1.1374 LearningRate 0.000021 Epoch: 34 Global Step: 722650 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:53:10,967-Speed 2497.11 samples/sec Loss 1.1202 LearningRate 0.000021 Epoch: 34 Global Step: 722660 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:53:19,171-Speed 2496.75 samples/sec Loss 1.1267 LearningRate 0.000020 Epoch: 34 Global Step: 722670 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:53:27,374-Speed 2497.19 samples/sec Loss 1.1736 LearningRate 0.000020 Epoch: 34 Global Step: 722680 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:53:35,573-Speed 2498.33 samples/sec Loss 1.1300 LearningRate 0.000020 Epoch: 34 Global Step: 722690 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:53:43,771-Speed 2498.50 samples/sec Loss 1.1412 LearningRate 0.000020 Epoch: 34 Global Step: 722700 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:53:51,937-Speed 2508.42 samples/sec Loss 1.1100 LearningRate 0.000020 Epoch: 34 Global Step: 722710 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:54:00,137-Speed 2497.78 samples/sec Loss 1.1641 LearningRate 0.000020 Epoch: 34 Global Step: 722720 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:54:08,337-Speed 2497.95 samples/sec Loss 1.1364 LearningRate 0.000020 Epoch: 34 Global Step: 722730 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:54:16,538-Speed 2498.15 samples/sec Loss 1.1191 LearningRate 0.000020 Epoch: 34 Global Step: 722740 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
11:54:24,738-Speed 2497.59 samples/sec Loss 1.1475 LearningRate 0.000020 Epoch: 34 Global Step: 722750 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:54:32,944-Speed 2496.42 samples/sec Loss 1.1087 LearningRate 0.000020 Epoch: 34 Global Step: 722760 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:54:41,107-Speed 2509.29 samples/sec Loss 1.1277 LearningRate 0.000020 Epoch: 34 Global Step: 722770 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:54:49,307-Speed 2498.01 samples/sec Loss 1.1456 LearningRate 0.000020 Epoch: 34 Global Step: 722780 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:54:57,513-Speed 2495.93 samples/sec Loss 1.1384 LearningRate 0.000020 Epoch: 34 Global Step: 722790 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:55:05,714-Speed 2498.13 samples/sec Loss 1.1621 LearningRate 0.000020 Epoch: 34 Global Step: 722800 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:55:13,918-Speed 2496.77 samples/sec Loss 1.1291 LearningRate 0.000020 Epoch: 34 Global Step: 722810 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:55:22,128-Speed 2494.86 samples/sec Loss 1.1331 LearningRate 0.000020 Epoch: 34 Global Step: 722820 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:55:30,277-Speed 2513.59 samples/sec Loss 1.1516 LearningRate 0.000020 Epoch: 34 Global Step: 722830 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:55:38,474-Speed 2498.88 samples/sec Loss 1.1171 LearningRate 0.000020 Epoch: 34 Global Step: 722840 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:55:46,687-Speed 2494.25 samples/sec Loss 1.0781 LearningRate 0.000020 Epoch: 34 Global Step: 722850 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:55:54,901-Speed 2494.01 samples/sec Loss 1.1275 LearningRate 0.000020 Epoch: 34 Global Step: 722860 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
11:56:03,101-Speed 2497.71 samples/sec Loss 1.1639 LearningRate 0.000020 Epoch: 34 Global Step: 722870 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:56:11,299-Speed 2498.61 samples/sec Loss 1.1603 LearningRate 0.000020 Epoch: 34 Global Step: 722880 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:56:19,453-Speed 2512.43 samples/sec Loss 1.1676 LearningRate 0.000020 Epoch: 34 Global Step: 722890 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:56:27,652-Speed 2498.37 samples/sec Loss 1.1315 LearningRate 0.000020 Epoch: 34 Global Step: 722900 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:56:35,854-Speed 2497.32 samples/sec Loss 1.1356 LearningRate 0.000020 Epoch: 34 Global Step: 722910 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:56:44,057-Speed 2497.13 samples/sec Loss 1.1289 LearningRate 0.000020 Epoch: 34 Global Step: 722920 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:56:52,256-Speed 2498.18 samples/sec Loss 1.1487 LearningRate 0.000020 Epoch: 34 Global Step: 722930 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:57:00,474-Speed 2492.63 samples/sec Loss 1.1246 LearningRate 0.000020 Epoch: 34 Global Step: 722940 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:57:08,623-Speed 2513.85 samples/sec Loss 1.1203 LearningRate 0.000020 Epoch: 34 Global Step: 722950 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:57:16,823-Speed 2497.99 samples/sec Loss 1.1299 LearningRate 0.000020 Epoch: 34 Global Step: 722960 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:57:25,030-Speed 2496.07 samples/sec Loss 1.1604 LearningRate 0.000020 Epoch: 34 Global Step: 722970 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:57:33,231-Speed 2497.42 samples/sec Loss 1.1316 LearningRate 0.000020 Epoch: 34 Global Step: 722980 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
11:57:41,429-Speed 2498.34 samples/sec Loss 1.1279 LearningRate 0.000020 Epoch: 34 Global Step: 722990 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:57:49,631-Speed 2497.39 samples/sec Loss 1.1575 LearningRate 0.000020 Epoch: 34 Global Step: 723000 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:57:57,782-Speed 2513.08 samples/sec Loss 1.1130 LearningRate 0.000020 Epoch: 34 Global Step: 723010 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:58:05,985-Speed 2497.02 samples/sec Loss 1.1358 LearningRate 0.000020 Epoch: 34 Global Step: 723020 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:58:14,184-Speed 2498.18 samples/sec Loss 1.1408 LearningRate 0.000020 Epoch: 34 Global Step: 723030 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:58:22,383-Speed 2498.38 samples/sec Loss 1.1378 LearningRate 0.000020 Epoch: 34 Global Step: 723040 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:58:30,582-Speed 2498.09 samples/sec Loss 1.1448 LearningRate 0.000020 Epoch: 34 Global Step: 723050 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:58:38,794-Speed 2494.40 samples/sec Loss 1.1472 LearningRate 0.000020 Epoch: 34 Global Step: 723060 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:58:46,940-Speed 2514.57 samples/sec Loss 1.1343 LearningRate 0.000020 Epoch: 34 Global Step: 723070 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:58:55,140-Speed 2497.84 samples/sec Loss 1.1271 LearningRate 0.000020 Epoch: 34 Global Step: 723080 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:59:03,343-Speed 2497.19 samples/sec Loss 1.1442 LearningRate 0.000020 Epoch: 34 Global Step: 723090 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 11:59:11,545-Speed 2497.08 samples/sec Loss 1.1537 LearningRate 0.000020 Epoch: 34 Global Step: 723100 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
11:59:19,745-Speed 2498.19 samples/sec Loss 1.1598 LearningRate 0.000020 Epoch: 34 Global Step: 723110 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 11:59:27,942-Speed 2498.99 samples/sec Loss 1.1508 LearningRate 0.000020 Epoch: 34 Global Step: 723120 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 11:59:36,089-Speed 2514.42 samples/sec Loss 1.1643 LearningRate 0.000020 Epoch: 34 Global Step: 723130 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 11:59:44,287-Speed 2498.60 samples/sec Loss 1.1387 LearningRate 0.000020 Epoch: 34 Global Step: 723140 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 11:59:52,485-Speed 2498.40 samples/sec Loss 1.1403 LearningRate 0.000020 Epoch: 34 Global Step: 723150 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:00:00,689-Speed 2496.67 samples/sec Loss 1.1295 LearningRate 0.000020 Epoch: 34 Global Step: 723160 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:00:08,889-Speed 2498.11 samples/sec Loss 1.1494 LearningRate 0.000020 Epoch: 34 Global Step: 723170 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:00:17,092-Speed 2496.91 samples/sec Loss 1.1302 LearningRate 0.000020 Epoch: 34 Global Step: 723180 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:00:25,238-Speed 2514.39 samples/sec Loss 1.1350 LearningRate 0.000020 Epoch: 34 Global Step: 723190 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:00:33,437-Speed 2498.21 samples/sec Loss 1.1696 LearningRate 0.000020 Epoch: 34 Global Step: 723200 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:00:41,637-Speed 2498.12 samples/sec Loss 1.1484 LearningRate 0.000020 Epoch: 34 Global Step: 723210 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:00:49,836-Speed 2498.07 samples/sec Loss 1.1107 LearningRate 0.000020 Epoch: 34 Global Step: 723220 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 
12:00:58,036-Speed 2498.33 samples/sec Loss 1.1548 LearningRate 0.000020 Epoch: 34 Global Step: 723230 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:01:06,240-Speed 2496.77 samples/sec Loss 1.1017 LearningRate 0.000020 Epoch: 34 Global Step: 723240 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:01:14,383-Speed 2515.45 samples/sec Loss 1.1224 LearningRate 0.000020 Epoch: 34 Global Step: 723250 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:01:22,582-Speed 2498.04 samples/sec Loss 1.1297 LearningRate 0.000020 Epoch: 34 Global Step: 723260 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:01:30,786-Speed 2496.70 samples/sec Loss 1.1421 LearningRate 0.000020 Epoch: 34 Global Step: 723270 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:01:38,986-Speed 2498.26 samples/sec Loss 1.1365 LearningRate 0.000020 Epoch: 34 Global Step: 723280 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:01:47,187-Speed 2497.42 samples/sec Loss 1.1325 LearningRate 0.000020 Epoch: 34 Global Step: 723290 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:01:55,390-Speed 2497.53 samples/sec Loss 1.1428 LearningRate 0.000020 Epoch: 34 Global Step: 723300 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:02:03,537-Speed 2514.16 samples/sec Loss 1.1230 LearningRate 0.000020 Epoch: 34 Global Step: 723310 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:02:11,736-Speed 2498.38 samples/sec Loss 1.1339 LearningRate 0.000020 Epoch: 34 Global Step: 723320 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:02:19,936-Speed 2498.00 samples/sec Loss 1.0955 LearningRate 0.000020 Epoch: 34 Global Step: 723330 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:02:28,138-Speed 2497.10 samples/sec Loss 1.1646 LearningRate 0.000020 Epoch: 34 Global Step: 723340 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 
12:02:36,336-Speed 2498.79 samples/sec Loss 1.1527 LearningRate 0.000020 Epoch: 34 Global Step: 723350 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:02:44,534-Speed 2498.47 samples/sec Loss 1.1276 LearningRate 0.000020 Epoch: 34 Global Step: 723360 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:02:52,696-Speed 2509.56 samples/sec Loss 1.1275 LearningRate 0.000020 Epoch: 34 Global Step: 723370 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:03:00,893-Speed 2499.03 samples/sec Loss 1.1087 LearningRate 0.000020 Epoch: 34 Global Step: 723380 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:03:09,095-Speed 2497.72 samples/sec Loss 1.1170 LearningRate 0.000020 Epoch: 34 Global Step: 723390 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:03:17,293-Speed 2498.29 samples/sec Loss 1.1401 LearningRate 0.000020 Epoch: 34 Global Step: 723400 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:03:25,500-Speed 2496.06 samples/sec Loss 1.1233 LearningRate 0.000020 Epoch: 34 Global Step: 723410 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:03:33,703-Speed 2497.14 samples/sec Loss 1.1241 LearningRate 0.000020 Epoch: 34 Global Step: 723420 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:03:41,848-Speed 2514.69 samples/sec Loss 1.1382 LearningRate 0.000020 Epoch: 34 Global Step: 723430 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:03:50,045-Speed 2498.81 samples/sec Loss 1.1814 LearningRate 0.000020 Epoch: 34 Global Step: 723440 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:03:58,269-Speed 2490.87 samples/sec Loss 1.1329 LearningRate 0.000020 Epoch: 34 Global Step: 723450 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:04:06,466-Speed 2498.68 samples/sec Loss 1.1386 LearningRate 0.000020 Epoch: 34 Global Step: 723460 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 
12:04:14,678-Speed 2494.35 samples/sec Loss 1.1185 LearningRate 0.000020 Epoch: 34 Global Step: 723470 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:04:22,877-Speed 2498.54 samples/sec Loss 1.1484 LearningRate 0.000020 Epoch: 34 Global Step: 723480 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:04:31,024-Speed 2514.22 samples/sec Loss 1.1253 LearningRate 0.000020 Epoch: 34 Global Step: 723490 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:04:39,226-Speed 2497.50 samples/sec Loss 1.1277 LearningRate 0.000020 Epoch: 34 Global Step: 723500 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:04:47,426-Speed 2498.09 samples/sec Loss 1.1242 LearningRate 0.000020 Epoch: 34 Global Step: 723510 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:04:55,630-Speed 2496.86 samples/sec Loss 1.1638 LearningRate 0.000020 Epoch: 34 Global Step: 723520 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:05:03,839-Speed 2495.35 samples/sec Loss 1.1306 LearningRate 0.000020 Epoch: 34 Global Step: 723530 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:05:12,037-Speed 2498.28 samples/sec Loss 1.1258 LearningRate 0.000020 Epoch: 34 Global Step: 723540 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:05:20,192-Speed 2511.87 samples/sec Loss 1.1373 LearningRate 0.000020 Epoch: 34 Global Step: 723550 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:05:28,392-Speed 2498.05 samples/sec Loss 1.1538 LearningRate 0.000020 Epoch: 34 Global Step: 723560 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:05:36,589-Speed 2498.75 samples/sec Loss 1.1252 LearningRate 0.000020 Epoch: 34 Global Step: 723570 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:05:44,792-Speed 2497.02 samples/sec Loss 1.1279 LearningRate 0.000020 Epoch: 34 Global Step: 723580 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 
12:05:52,992-Speed 2497.99 samples/sec Loss 1.1386 LearningRate 0.000020 Epoch: 34 Global Step: 723590 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:06:01,192-Speed 2498.04 samples/sec Loss 1.1358 LearningRate 0.000020 Epoch: 34 Global Step: 723600 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:06:09,339-Speed 2514.13 samples/sec Loss 1.1705 LearningRate 0.000020 Epoch: 34 Global Step: 723610 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:06:17,554-Speed 2493.54 samples/sec Loss 1.0963 LearningRate 0.000020 Epoch: 34 Global Step: 723620 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:06:25,757-Speed 2497.03 samples/sec Loss 1.1718 LearningRate 0.000020 Epoch: 34 Global Step: 723630 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:06:33,969-Speed 2494.33 samples/sec Loss 1.1554 LearningRate 0.000020 Epoch: 34 Global Step: 723640 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:06:42,173-Speed 2497.01 samples/sec Loss 1.1823 LearningRate 0.000020 Epoch: 34 Global Step: 723650 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:06:50,375-Speed 2497.34 samples/sec Loss 1.1557 LearningRate 0.000020 Epoch: 34 Global Step: 723660 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:06:58,545-Speed 2507.05 samples/sec Loss 1.1482 LearningRate 0.000020 Epoch: 34 Global Step: 723670 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:07:06,748-Speed 2496.96 samples/sec Loss 1.1226 LearningRate 0.000020 Epoch: 34 Global Step: 723680 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:07:14,951-Speed 2496.99 samples/sec Loss 1.1550 LearningRate 0.000020 Epoch: 34 Global Step: 723690 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:07:23,176-Speed 2490.52 samples/sec Loss 1.1396 LearningRate 0.000020 Epoch: 34 Global Step: 723700 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 
12:07:31,377-Speed 2497.50 samples/sec Loss 1.1335 LearningRate 0.000020 Epoch: 34 Global Step: 723710 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:07:39,586-Speed 2495.43 samples/sec Loss 1.1211 LearningRate 0.000020 Epoch: 34 Global Step: 723720 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:07:47,732-Speed 2514.73 samples/sec Loss 1.1284 LearningRate 0.000020 Epoch: 34 Global Step: 723730 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:07:55,936-Speed 2496.62 samples/sec Loss 1.1570 LearningRate 0.000020 Epoch: 34 Global Step: 723740 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:08:04,136-Speed 2497.85 samples/sec Loss 1.1409 LearningRate 0.000020 Epoch: 34 Global Step: 723750 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:08:12,336-Speed 2497.95 samples/sec Loss 1.1323 LearningRate 0.000020 Epoch: 34 Global Step: 723760 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:08:20,540-Speed 2496.90 samples/sec Loss 1.1242 LearningRate 0.000020 Epoch: 34 Global Step: 723770 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:08:28,745-Speed 2496.46 samples/sec Loss 1.1336 LearningRate 0.000020 Epoch: 34 Global Step: 723780 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:08:36,897-Speed 2512.87 samples/sec Loss 1.1539 LearningRate 0.000020 Epoch: 34 Global Step: 723790 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:08:45,101-Speed 2496.65 samples/sec Loss 1.1319 LearningRate 0.000020 Epoch: 34 Global Step: 723800 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:08:53,304-Speed 2497.12 samples/sec Loss 1.1079 LearningRate 0.000020 Epoch: 34 Global Step: 723810 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:09:01,505-Speed 2497.60 samples/sec Loss 1.1150 LearningRate 0.000020 Epoch: 34 Global Step: 723820 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 
12:09:09,706-Speed 2497.65 samples/sec Loss 1.1099 LearningRate 0.000020 Epoch: 34 Global Step: 723830 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:09:17,905-Speed 2498.29 samples/sec Loss 1.1142 LearningRate 0.000020 Epoch: 34 Global Step: 723840 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:09:26,054-Speed 2513.40 samples/sec Loss 1.1530 LearningRate 0.000020 Epoch: 34 Global Step: 723850 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:09:34,254-Speed 2498.07 samples/sec Loss 1.1312 LearningRate 0.000020 Epoch: 34 Global Step: 723860 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:09:42,459-Speed 2496.33 samples/sec Loss 1.1432 LearningRate 0.000020 Epoch: 34 Global Step: 723870 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:09:50,665-Speed 2496.26 samples/sec Loss 1.1358 LearningRate 0.000020 Epoch: 34 Global Step: 723880 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:09:58,870-Speed 2496.53 samples/sec Loss 1.1211 LearningRate 0.000020 Epoch: 34 Global Step: 723890 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:10:07,073-Speed 2497.27 samples/sec Loss 1.1398 LearningRate 0.000020 Epoch: 34 Global Step: 723900 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:10:15,220-Speed 2514.13 samples/sec Loss 1.1347 LearningRate 0.000020 Epoch: 34 Global Step: 723910 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:10:23,425-Speed 2496.28 samples/sec Loss 1.1280 LearningRate 0.000020 Epoch: 34 Global Step: 723920 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:10:31,584-Speed 2510.75 samples/sec Loss 1.1150 LearningRate 0.000020 Epoch: 34 Global Step: 723930 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:10:39,791-Speed 2495.79 samples/sec Loss 1.1572 LearningRate 0.000020 Epoch: 34 Global Step: 723940 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:10:47,995-Speed 2496.88 samples/sec Loss 1.1570 LearningRate 0.000020 Epoch: 34 Global Step: 723950 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:10:56,198-Speed 2496.78 samples/sec Loss 1.1595 LearningRate 0.000020 Epoch: 34 Global Step: 723960 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:11:04,358-Speed 2510.83 samples/sec Loss 1.1363 LearningRate 0.000020 Epoch: 34 Global Step: 723970 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:11:12,560-Speed 2497.03 samples/sec Loss 1.1287 LearningRate 0.000020 Epoch: 34 Global Step: 723980 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:11:20,761-Speed 2497.65 samples/sec Loss 1.1534 LearningRate 0.000020 Epoch: 34 Global Step: 723990 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:11:28,968-Speed 2496.02 samples/sec Loss 1.1609 LearningRate 0.000020 Epoch: 34 Global Step: 724000 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:11:37,175-Speed 2496.08 samples/sec Loss 1.1474 LearningRate 0.000020 Epoch: 34 Global Step: 724010 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:11:45,374-Speed 2498.10 samples/sec Loss 1.1368 LearningRate 0.000020 Epoch: 34 Global Step: 724020 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:11:53,526-Speed 2512.71 samples/sec Loss 1.1306 LearningRate 0.000020 Epoch: 34 Global Step: 724030 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:12:01,732-Speed 2496.28 samples/sec Loss 1.1296 LearningRate 0.000020 Epoch: 34 Global Step: 724040 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:12:09,946-Speed 2493.64 samples/sec Loss 1.1156 LearningRate 0.000020 Epoch: 34 Global Step: 724050 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:12:18,150-Speed 2496.66 samples/sec Loss 1.1348 LearningRate 0.000020 Epoch: 34 Global Step: 724060 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:12:26,351-Speed 2497.68 samples/sec Loss 1.1380 LearningRate 0.000020 Epoch: 34 Global Step: 724070 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:12:34,564-Speed 2494.33 samples/sec Loss 1.0932 LearningRate 0.000020 Epoch: 34 Global Step: 724080 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:12:42,712-Speed 2513.89 samples/sec Loss 1.1414 LearningRate 0.000020 Epoch: 34 Global Step: 724090 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:12:50,913-Speed 2497.84 samples/sec Loss 1.1160 LearningRate 0.000020 Epoch: 34 Global Step: 724100 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:12:59,117-Speed 2496.60 samples/sec Loss 1.1215 LearningRate 0.000020 Epoch: 34 Global Step: 724110 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:13:07,323-Speed 2496.22 samples/sec Loss 1.1150 LearningRate 0.000020 Epoch: 34 Global Step: 724120 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:13:15,521-Speed 2498.34 samples/sec Loss 1.1380 LearningRate 0.000020 Epoch: 34 Global Step: 724130 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:13:23,726-Speed 2496.70 samples/sec Loss 1.1288 LearningRate 0.000020 Epoch: 34 Global Step: 724140 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:13:31,887-Speed 2510.00 samples/sec Loss 1.1540 LearningRate 0.000020 Epoch: 34 Global Step: 724150 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:13:40,102-Speed 2493.39 samples/sec Loss 1.1282 LearningRate 0.000020 Epoch: 34 Global Step: 724160 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:13:48,306-Speed 2496.68 samples/sec Loss 1.1500 LearningRate 0.000020 Epoch: 34 Global Step: 724170 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:13:56,508-Speed 2497.58 samples/sec Loss 1.1542 LearningRate 0.000020 Epoch: 34 Global Step: 724180 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:14:04,709-Speed 2497.37 samples/sec Loss 1.0990 LearningRate 0.000020 Epoch: 34 Global Step: 724190 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:14:12,910-Speed 2497.87 samples/sec Loss 1.1241 LearningRate 0.000020 Epoch: 34 Global Step: 724200 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:14:21,059-Speed 2513.69 samples/sec Loss 1.1199 LearningRate 0.000020 Epoch: 34 Global Step: 724210 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:14:29,261-Speed 2497.39 samples/sec Loss 1.1289 LearningRate 0.000020 Epoch: 34 Global Step: 724220 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:14:37,463-Speed 2497.42 samples/sec Loss 1.1513 LearningRate 0.000020 Epoch: 34 Global Step: 724230 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:14:45,664-Speed 2497.67 samples/sec Loss 1.1264 LearningRate 0.000020 Epoch: 34 Global Step: 724240 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:14:53,865-Speed 2497.74 samples/sec Loss 1.1104 LearningRate 0.000020 Epoch: 34 Global Step: 724250 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:15:02,066-Speed 2497.41 samples/sec Loss 1.1488 LearningRate 0.000020 Epoch: 34 Global Step: 724260 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:15:10,217-Speed 2513.12 samples/sec Loss 1.1124 LearningRate 0.000020 Epoch: 34 Global Step: 724270 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:15:18,419-Speed 2497.44 samples/sec Loss 1.1397 LearningRate 0.000020 Epoch: 34 Global Step: 724280 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:15:26,620-Speed 2497.59 samples/sec Loss 1.1237 LearningRate 0.000020 Epoch: 34 Global Step: 724290 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:15:34,823-Speed 2496.97 samples/sec Loss 1.1441 LearningRate 0.000020 Epoch: 34 Global Step: 724300 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:15:43,055-Speed 2488.22 samples/sec Loss 1.1048 LearningRate 0.000020 Epoch: 34 Global Step: 724310 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:15:51,259-Speed 2496.65 samples/sec Loss 1.1554 LearningRate 0.000020 Epoch: 34 Global Step: 724320 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:15:59,407-Speed 2513.89 samples/sec Loss 1.1378 LearningRate 0.000020 Epoch: 34 Global Step: 724330 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:16:07,606-Speed 2498.19 samples/sec Loss 1.1187 LearningRate 0.000020 Epoch: 34 Global Step: 724340 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:16:15,813-Speed 2496.20 samples/sec Loss 1.1540 LearningRate 0.000020 Epoch: 34 Global Step: 724350 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:16:24,016-Speed 2496.85 samples/sec Loss 1.1390 LearningRate 0.000020 Epoch: 34 Global Step: 724360 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:16:32,229-Speed 2494.04 samples/sec Loss 1.1467 LearningRate 0.000020 Epoch: 34 Global Step: 724370 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:16:40,434-Speed 2496.66 samples/sec Loss 1.1223 LearningRate 0.000020 Epoch: 34 Global Step: 724380 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:16:48,582-Speed 2513.83 samples/sec Loss 1.1619 LearningRate 0.000020 Epoch: 34 Global Step: 724390 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:16:56,783-Speed 2497.72 samples/sec Loss 1.1396 LearningRate 0.000020 Epoch: 34 Global Step: 724400 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:17:04,990-Speed 2495.84 samples/sec Loss 1.1556 LearningRate 0.000020 Epoch: 34 Global Step: 724410 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:17:13,194-Speed 2496.84 samples/sec Loss 1.1292 LearningRate 0.000020 Epoch: 34 Global Step: 724420 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:17:21,395-Speed 2497.70 samples/sec Loss 1.1463 LearningRate 0.000020 Epoch: 34 Global Step: 724430 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:17:29,601-Speed 2496.12 samples/sec Loss 1.1504 LearningRate 0.000020 Epoch: 34 Global Step: 724440 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:17:37,752-Speed 2512.95 samples/sec Loss 1.1288 LearningRate 0.000020 Epoch: 34 Global Step: 724450 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:17:45,957-Speed 2496.47 samples/sec Loss 1.1338 LearningRate 0.000020 Epoch: 34 Global Step: 724460 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:17:54,157-Speed 2497.85 samples/sec Loss 1.1295 LearningRate 0.000020 Epoch: 34 Global Step: 724470 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:18:02,361-Speed 2496.61 samples/sec Loss 1.1274 LearningRate 0.000020 Epoch: 34 Global Step: 724480 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:18:10,568-Speed 2496.09 samples/sec Loss 1.1159 LearningRate 0.000020 Epoch: 34 Global Step: 724490 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:18:18,771-Speed 2496.73 samples/sec Loss 1.1560 LearningRate 0.000020 Epoch: 34 Global Step: 724500 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:18:26,922-Speed 2513.10 samples/sec Loss 1.1380 LearningRate 0.000020 Epoch: 34 Global Step: 724510 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:18:35,123-Speed 2497.47 samples/sec Loss 1.1402 LearningRate 0.000020 Epoch: 34 Global Step: 724520 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:18:43,325-Speed 2497.31 samples/sec Loss 1.1213 LearningRate 0.000020 Epoch: 34 Global Step: 724530 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:18:51,527-Speed 2497.23 samples/sec Loss 1.1460 LearningRate 0.000020 Epoch: 34 Global Step: 724540 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:18:59,733-Speed 2496.40 samples/sec Loss 1.1037 LearningRate 0.000020 Epoch: 34 Global Step: 724550 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:19:07,938-Speed 2496.25 samples/sec Loss 1.1345 LearningRate 0.000020 Epoch: 34 Global Step: 724560 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:19:16,090-Speed 2512.72 samples/sec Loss 1.1077 LearningRate 0.000020 Epoch: 34 Global Step: 724570 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:19:24,290-Speed 2498.08 samples/sec Loss 1.1270 LearningRate 0.000020 Epoch: 34 Global Step: 724580 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:19:32,502-Speed 2494.19 samples/sec Loss 1.1285 LearningRate 0.000020 Epoch: 34 Global Step: 724590 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:19:40,703-Speed 2497.71 samples/sec Loss 1.1053 LearningRate 0.000020 Epoch: 34 Global Step: 724600 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:19:48,914-Speed 2494.54 samples/sec Loss 1.1498 LearningRate 0.000020 Epoch: 34 Global Step: 724610 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:19:57,128-Speed 2493.80 samples/sec Loss 1.1000 LearningRate 0.000020 Epoch: 34 Global Step: 724620 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:20:05,276-Speed 2514.08 samples/sec Loss 1.1437 LearningRate 0.000020 Epoch: 34 Global Step: 724630 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:20:13,475-Speed 2498.05 samples/sec Loss 1.1157 LearningRate 0.000020 Epoch: 34 Global Step: 724640 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:20:21,675-Speed 2498.18 samples/sec Loss 1.0900 LearningRate 0.000020 Epoch: 34 Global Step: 724650 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:20:29,876-Speed 2497.62 samples/sec Loss 1.1582 LearningRate 0.000020 Epoch: 34 Global Step: 724660 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:20:38,074-Speed 2498.58 samples/sec Loss 1.1291 LearningRate 0.000020 Epoch: 34 Global Step: 724670 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:20:46,288-Speed 2493.59 samples/sec Loss 1.1571 LearningRate 0.000020 Epoch: 34 Global Step: 724680 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:20:54,440-Speed 2512.72 samples/sec Loss 1.1211 LearningRate 0.000020 Epoch: 34 Global Step: 724690 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:21:02,655-Speed 2493.47 samples/sec Loss 1.1233 LearningRate 0.000020 Epoch: 34 Global Step: 724700 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:21:10,863-Speed 2495.62 samples/sec Loss 1.1374 LearningRate 0.000020 Epoch: 34 Global Step: 724710 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:21:19,067-Speed 2496.66 samples/sec Loss 1.1338 LearningRate 0.000020 Epoch: 34 Global Step: 724720 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:21:27,267-Speed 2497.72 samples/sec Loss 1.1644 LearningRate 0.000020 Epoch: 34 Global Step: 724730 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:21:35,468-Speed 2497.77 samples/sec Loss 1.1333 LearningRate 0.000020 Epoch: 34 Global Step: 724740 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:21:43,616-Speed 2513.77 samples/sec Loss 1.1217 LearningRate 0.000020 Epoch: 34 Global Step: 724750 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:21:51,815-Speed 2498.24 samples/sec Loss 1.1118 LearningRate 0.000020 Epoch: 34 Global Step: 724760 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:22:00,017-Speed 2497.55 samples/sec Loss 1.1124 LearningRate 0.000020 Epoch: 34 Global Step: 724770 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:22:08,223-Speed 2495.94 samples/sec Loss 1.1345 LearningRate 0.000020 Epoch: 34 Global Step: 724780 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:22:16,427-Speed 2496.96 samples/sec Loss 1.1502 LearningRate 0.000020 Epoch: 34 Global Step: 724790 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:22:24,629-Speed 2497.32 samples/sec Loss 1.1405 LearningRate 0.000020 Epoch: 34 Global Step: 724800 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:22:32,779-Speed 2513.26 samples/sec Loss 1.1201 LearningRate 0.000020 Epoch: 34 Global Step: 724810 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:22:40,980-Speed 2497.59 samples/sec Loss 1.1499 LearningRate 0.000020 Epoch: 34 Global Step: 724820 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:22:49,181-Speed 2497.41 samples/sec Loss 1.1270 LearningRate 0.000020 Epoch: 34 Global Step: 724830 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:22:57,382-Speed 2497.82 samples/sec Loss 1.1474 LearningRate 0.000020 Epoch: 34 Global Step: 724840 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:23:05,586-Speed 2496.95 samples/sec Loss 1.1439 LearningRate 0.000020 Epoch: 34 Global Step: 724850 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:23:13,786-Speed 2498.01 samples/sec Loss 1.1729 LearningRate 0.000020 Epoch: 34 Global Step: 724860 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:23:21,935-Speed 2513.39 samples/sec Loss 1.1406 LearningRate 0.000020 Epoch: 34 Global Step: 724870 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:23:30,137-Speed 2497.53 samples/sec Loss 1.1384 LearningRate 0.000020 Epoch: 34 Global Step: 724880 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:23:38,339-Speed 2497.61 samples/sec Loss 1.1553 LearningRate 0.000020 Epoch: 34 Global Step: 724890 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:23:46,538-Speed 2498.37 samples/sec Loss 1.1513 LearningRate 0.000020 Epoch: 34 Global Step: 724900 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:23:54,740-Speed 2497.33 samples/sec Loss 1.1489 LearningRate 0.000020 Epoch: 34 Global Step: 724910 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:24:02,941-Speed 2497.59 samples/sec Loss 1.1274 LearningRate 0.000020 Epoch: 34 Global Step: 724920 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:24:11,088-Speed 2514.19 samples/sec Loss 1.1340 LearningRate 0.000020 Epoch: 34 Global Step: 724930 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:24:19,291-Speed 2497.21 samples/sec Loss 1.1508 LearningRate 0.000020 Epoch: 34 Global Step: 724940 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:24:27,494-Speed 2497.16 samples/sec Loss 1.1358 LearningRate 0.000020 Epoch: 34 Global Step: 724950 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:24:35,698-Speed 2496.81 samples/sec Loss 1.1344 LearningRate 0.000020 Epoch: 34 Global Step: 724960 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:24:43,897-Speed 2497.89 samples/sec Loss 1.1339 LearningRate 0.000020 Epoch: 34 Global Step: 724970 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:24:52,102-Speed 2496.63 samples/sec Loss 1.1326 LearningRate 0.000020 Epoch: 34 Global Step: 724980 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:25:00,247-Speed 2514.75 samples/sec Loss 1.1415 LearningRate 0.000020 Epoch: 34 Global Step: 724990 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:25:08,449-Speed 2497.15 samples/sec Loss 1.1174 LearningRate 0.000020 Epoch: 34 Global Step: 725000 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:25:16,653-Speed 2496.86 samples/sec Loss 1.1219 LearningRate 0.000020 Epoch: 34 Global Step: 725010 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:25:24,865-Speed 2494.69 samples/sec Loss 1.1059 LearningRate 0.000020 Epoch: 34 Global Step: 725020 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:25:33,068-Speed 2496.80 samples/sec Loss 1.1594 LearningRate 0.000020 Epoch: 34 Global Step: 725030 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:25:41,274-Speed 2496.11 samples/sec Loss 1.1341 LearningRate 0.000020 Epoch: 34 Global Step: 725040 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:25:49,428-Speed 2512.36 samples/sec Loss 1.1263 LearningRate 0.000020 Epoch: 34 Global Step: 725050 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:25:57,630-Speed 2497.11 samples/sec Loss 1.1535 LearningRate 0.000020 Epoch: 34 Global Step: 725060 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:26:05,833-Speed 2497.20 samples/sec Loss 1.1272 LearningRate 0.000020 Epoch: 34 Global Step: 725070 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:26:14,037-Speed 2496.78 samples/sec Loss 1.1496 LearningRate 0.000020 Epoch: 34 Global Step: 725080 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:26:22,240-Speed 2497.03 samples/sec Loss 1.1322 LearningRate 0.000020 Epoch: 34 Global Step: 725090 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:26:30,445-Speed 2496.50 samples/sec Loss 1.1386 LearningRate 0.000020 Epoch: 34 Global Step: 725100 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:26:38,602-Speed 2511.40 samples/sec Loss 1.1273 LearningRate 0.000020 Epoch: 34 Global Step: 725110 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:26:46,804-Speed 2497.09 samples/sec Loss 1.1199 LearningRate 0.000020 Epoch: 34 Global Step: 725120 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:26:55,007-Speed 2496.99 samples/sec Loss 1.1685 LearningRate 0.000020 Epoch: 34 Global Step: 725130 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:27:03,210-Speed 2497.25 samples/sec Loss 1.1322 LearningRate 0.000020 Epoch: 34 Global Step: 725140 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 
12:27:11,413-Speed 2497.02 samples/sec Loss 1.1217 LearningRate 0.000020 Epoch: 34 Global Step: 725150 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:27:19,619-Speed 2496.40 samples/sec Loss 1.1224 LearningRate 0.000020 Epoch: 34 Global Step: 725160 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:27:27,766-Speed 2514.02 samples/sec Loss 1.1411 LearningRate 0.000020 Epoch: 34 Global Step: 725170 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:27:35,971-Speed 2496.43 samples/sec Loss 1.1198 LearningRate 0.000020 Epoch: 34 Global Step: 725180 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:27:44,179-Speed 2495.54 samples/sec Loss 1.1127 LearningRate 0.000020 Epoch: 34 Global Step: 725190 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:27:52,380-Speed 2497.63 samples/sec Loss 1.1080 LearningRate 0.000020 Epoch: 34 Global Step: 725200 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:28:00,594-Speed 2493.81 samples/sec Loss 1.1356 LearningRate 0.000020 Epoch: 34 Global Step: 725210 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:28:08,800-Speed 2496.25 samples/sec Loss 1.1800 LearningRate 0.000020 Epoch: 34 Global Step: 725220 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:28:16,949-Speed 2513.39 samples/sec Loss 1.1323 LearningRate 0.000020 Epoch: 34 Global Step: 725230 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:28:25,152-Speed 2496.92 samples/sec Loss 1.1875 LearningRate 0.000020 Epoch: 34 Global Step: 725240 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:28:33,358-Speed 2496.19 samples/sec Loss 1.1381 LearningRate 0.000020 Epoch: 34 Global Step: 725250 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:28:41,573-Speed 2493.32 samples/sec Loss 1.1244 LearningRate 0.000020 Epoch: 34 Global Step: 725260 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 
12:28:49,781-Speed 2495.50 samples/sec Loss 1.1158 LearningRate 0.000020 Epoch: 34 Global Step: 725270 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:28:57,986-Speed 2496.80 samples/sec Loss 1.1322 LearningRate 0.000020 Epoch: 34 Global Step: 725280 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:29:06,140-Speed 2512.12 samples/sec Loss 1.1481 LearningRate 0.000020 Epoch: 34 Global Step: 725290 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:29:14,341-Speed 2497.48 samples/sec Loss 1.1117 LearningRate 0.000020 Epoch: 34 Global Step: 725300 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:29:22,544-Speed 2497.33 samples/sec Loss 1.1076 LearningRate 0.000019 Epoch: 34 Global Step: 725310 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:29:30,750-Speed 2496.25 samples/sec Loss 1.1471 LearningRate 0.000019 Epoch: 34 Global Step: 725320 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:29:38,908-Speed 2510.97 samples/sec Loss 1.1323 LearningRate 0.000019 Epoch: 34 Global Step: 725330 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:29:47,110-Speed 2497.36 samples/sec Loss 1.1504 LearningRate 0.000019 Epoch: 34 Global Step: 725340 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:29:55,256-Speed 2514.31 samples/sec Loss 1.1496 LearningRate 0.000019 Epoch: 34 Global Step: 725350 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:30:03,458-Speed 2497.51 samples/sec Loss 1.1681 LearningRate 0.000019 Epoch: 34 Global Step: 725360 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:30:11,661-Speed 2497.04 samples/sec Loss 1.1355 LearningRate 0.000019 Epoch: 34 Global Step: 725370 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:30:19,865-Speed 2496.81 samples/sec Loss 1.1406 LearningRate 0.000019 Epoch: 34 Global Step: 725380 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:30:28,065-Speed 2497.68 samples/sec Loss 1.1384 LearningRate 0.000019 Epoch: 34 Global Step: 725390 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:30:36,267-Speed 2497.62 samples/sec Loss 1.1563 LearningRate 0.000019 Epoch: 34 Global Step: 725400 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:30:44,415-Speed 2514.05 samples/sec Loss 1.1480 LearningRate 0.000019 Epoch: 34 Global Step: 725410 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:30:52,647-Speed 2488.12 samples/sec Loss 1.1220 LearningRate 0.000019 Epoch: 34 Global Step: 725420 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:31:00,866-Speed 2492.26 samples/sec Loss 1.1250 LearningRate 0.000019 Epoch: 34 Global Step: 725430 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:31:09,073-Speed 2495.78 samples/sec Loss 1.1516 LearningRate 0.000019 Epoch: 34 Global Step: 725440 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:31:17,275-Speed 2497.17 samples/sec Loss 1.1056 LearningRate 0.000019 Epoch: 34 Global Step: 725450 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:31:25,476-Speed 2497.70 samples/sec Loss 1.1252 LearningRate 0.000019 Epoch: 34 Global Step: 725460 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:31:33,624-Speed 2513.96 samples/sec Loss 1.1456 LearningRate 0.000019 Epoch: 34 Global Step: 725470 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:31:41,826-Speed 2497.74 samples/sec Loss 1.1287 LearningRate 0.000019 Epoch: 34 Global Step: 725480 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:31:50,027-Speed 2497.59 samples/sec Loss 1.1491 LearningRate 0.000019 Epoch: 34 Global Step: 725490 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:31:58,238-Speed 2494.60 samples/sec Loss 1.1467 LearningRate 0.000019 Epoch: 34 Global Step: 725500 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:32:06,438-Speed 2497.93 samples/sec Loss 1.1290 LearningRate 0.000019 Epoch: 34 Global Step: 725510 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:32:14,638-Speed 2497.71 samples/sec Loss 1.1428 LearningRate 0.000019 Epoch: 34 Global Step: 725520 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:32:22,789-Speed 2513.14 samples/sec Loss 1.1583 LearningRate 0.000019 Epoch: 34 Global Step: 725530 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:32:30,989-Speed 2497.82 samples/sec Loss 1.1484 LearningRate 0.000019 Epoch: 34 Global Step: 725540 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:32:39,204-Speed 2493.39 samples/sec Loss 1.1422 LearningRate 0.000019 Epoch: 34 Global Step: 725550 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:32:47,402-Speed 2498.32 samples/sec Loss 1.1502 LearningRate 0.000019 Epoch: 34 Global Step: 725560 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:32:55,602-Speed 2497.97 samples/sec Loss 1.1590 LearningRate 0.000019 Epoch: 34 Global Step: 725570 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:33:03,804-Speed 2497.19 samples/sec Loss 1.1479 LearningRate 0.000019 Epoch: 34 Global Step: 725580 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:33:11,964-Speed 2510.23 samples/sec Loss 1.1597 LearningRate 0.000019 Epoch: 34 Global Step: 725590 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:33:20,179-Speed 2493.62 samples/sec Loss 1.1365 LearningRate 0.000019 Epoch: 34 Global Step: 725600 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:33:28,380-Speed 2497.36 samples/sec Loss 1.1358 LearningRate 0.000019 Epoch: 34 Global Step: 725610 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:33:36,580-Speed 2497.97 samples/sec Loss 1.1621 LearningRate 0.000019 Epoch: 34 Global Step: 725620 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:33:44,793-Speed 2493.96 samples/sec Loss 1.1111 LearningRate 0.000019 Epoch: 34 Global Step: 725630 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:33:52,994-Speed 2497.38 samples/sec Loss 1.0958 LearningRate 0.000019 Epoch: 34 Global Step: 725640 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:34:01,142-Speed 2514.07 samples/sec Loss 1.1440 LearningRate 0.000019 Epoch: 34 Global Step: 725650 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:34:09,342-Speed 2497.89 samples/sec Loss 1.1493 LearningRate 0.000019 Epoch: 34 Global Step: 725660 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:34:17,541-Speed 2498.38 samples/sec Loss 1.1274 LearningRate 0.000019 Epoch: 34 Global Step: 725670 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:34:25,743-Speed 2497.24 samples/sec Loss 1.1368 LearningRate 0.000019 Epoch: 34 Global Step: 725680 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:34:33,943-Speed 2497.79 samples/sec Loss 1.1752 LearningRate 0.000019 Epoch: 34 Global Step: 725690 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:34:42,150-Speed 2495.99 samples/sec Loss 1.1389 LearningRate 0.000019 Epoch: 34 Global Step: 725700 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:34:50,296-Speed 2514.52 samples/sec Loss 1.1406 LearningRate 0.000019 Epoch: 34 Global Step: 725710 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:34:58,496-Speed 2498.06 samples/sec Loss 1.1136 LearningRate 0.000019 Epoch: 34 Global Step: 725720 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:35:06,694-Speed 2498.40 samples/sec Loss 1.1538 LearningRate 0.000019 Epoch: 34 Global Step: 725730 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:35:14,894-Speed 2498.02 samples/sec Loss 1.1570 LearningRate 0.000019 Epoch: 34 Global Step: 725740 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:35:23,100-Speed 2496.17 samples/sec Loss 1.1461 LearningRate 0.000019 Epoch: 34 Global Step: 725750 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:35:31,302-Speed 2497.42 samples/sec Loss 1.1459 LearningRate 0.000019 Epoch: 34 Global Step: 725760 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:35:39,446-Speed 2515.07 samples/sec Loss 1.1284 LearningRate 0.000019 Epoch: 34 Global Step: 725770 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:35:47,646-Speed 2498.42 samples/sec Loss 1.1154 LearningRate 0.000019 Epoch: 34 Global Step: 725780 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:35:55,842-Speed 2499.00 samples/sec Loss 1.1418 LearningRate 0.000019 Epoch: 34 Global Step: 725790 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:36:04,043-Speed 2497.86 samples/sec Loss 1.1459 LearningRate 0.000019 Epoch: 34 Global Step: 725800 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:36:12,254-Speed 2494.68 samples/sec Loss 1.1609 LearningRate 0.000019 Epoch: 34 Global Step: 725810 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:36:20,453-Speed 2497.95 samples/sec Loss 1.1238 LearningRate 0.000019 Epoch: 34 Global Step: 725820 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:36:28,600-Speed 2514.34 samples/sec Loss 1.1637 LearningRate 0.000019 Epoch: 34 Global Step: 725830 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:36:36,803-Speed 2497.59 samples/sec Loss 1.1157 LearningRate 0.000019 Epoch: 34 Global Step: 725840 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:36:45,007-Speed 2496.86 samples/sec Loss 1.1091 LearningRate 0.000019 Epoch: 34 Global Step: 725850 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:36:53,206-Speed 2497.97 samples/sec Loss 1.1437 LearningRate 0.000019 Epoch: 34 Global Step: 725860 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:37:01,410-Speed 2496.78 samples/sec Loss 1.1000 LearningRate 0.000019 Epoch: 34 Global Step: 725870 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:37:09,616-Speed 2496.27 samples/sec Loss 1.1621 LearningRate 0.000019 Epoch: 34 Global Step: 725880 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:37:20,535-Speed 1875.75 samples/sec Loss 1.1464 LearningRate 0.000019 Epoch: 35 Global Step: 725890 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:37:28,743-Speed 2495.67 samples/sec Loss 1.1462 LearningRate 0.000019 Epoch: 35 Global Step: 725900 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:37:36,953-Speed 2495.04 samples/sec Loss 1.1383 LearningRate 0.000019 Epoch: 35 Global Step: 725910 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:37:45,159-Speed 2495.72 samples/sec Loss 1.1195 LearningRate 0.000019 Epoch: 35 Global Step: 725920 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:37:53,372-Speed 2494.20 samples/sec Loss 1.1660 LearningRate 0.000019 Epoch: 35 Global Step: 725930 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:38:01,581-Speed 2495.00 samples/sec Loss 1.1338 LearningRate 0.000019 Epoch: 35 Global Step: 725940 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:38:09,746-Speed 2508.65 samples/sec Loss 1.1583 LearningRate 0.000019 Epoch: 35 Global Step: 725950 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:38:17,963-Speed 2492.96 samples/sec Loss 1.1609 LearningRate 0.000019 Epoch: 35 Global Step: 725960 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:38:26,178-Speed 2493.14 samples/sec Loss 1.1511 LearningRate 0.000019 Epoch: 35 Global Step: 725970 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:38:34,392-Speed 2493.91 samples/sec Loss 1.1196 LearningRate 0.000019 Epoch: 35 Global Step: 725980 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:38:42,603-Speed 2494.65 samples/sec Loss 1.1463 LearningRate 0.000019 Epoch: 35 Global Step: 725990 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:38:50,811-Speed 2495.67 samples/sec Loss 1.1166 LearningRate 0.000019 Epoch: 35 Global Step: 726000 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:38:58,965-Speed 2511.87 samples/sec Loss 1.0773 LearningRate 0.000019 Epoch: 35 Global Step: 726010 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:39:07,171-Speed 2496.40 samples/sec Loss 1.1564 LearningRate 0.000019 Epoch: 35 Global Step: 726020 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:39:15,372-Speed 2498.00 samples/sec Loss 1.1446 LearningRate 0.000019 Epoch: 35 Global Step: 726030 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:39:23,575-Speed 2497.01 samples/sec Loss 1.1503 LearningRate 0.000019 Epoch: 35 Global Step: 726040 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:39:31,775-Speed 2497.89 samples/sec Loss 1.1383 LearningRate 0.000019 Epoch: 35 Global Step: 726050 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:39:39,976-Speed 2497.60 samples/sec Loss 1.1044 LearningRate 0.000019 Epoch: 35 Global Step: 726060 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:39:48,124-Speed 2514.05 samples/sec Loss 1.1304 LearningRate 0.000019 Epoch: 35 Global Step: 726070 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:39:56,327-Speed 2496.78 samples/sec Loss 1.1349 LearningRate 0.000019 Epoch: 35 Global Step: 726080 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:40:04,536-Speed 2495.32 samples/sec Loss 1.1511 LearningRate 0.000019 Epoch: 35 Global Step: 726090 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:40:12,743-Speed 2495.81 samples/sec Loss 1.1417 LearningRate 0.000019 Epoch: 35 Global Step: 726100 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:40:20,945-Speed 2497.12 samples/sec Loss 1.1002 LearningRate 0.000019 Epoch: 35 Global Step: 726110 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:40:29,149-Speed 2496.86 samples/sec Loss 1.1287 LearningRate 0.000019 Epoch: 35 Global Step: 726120 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:40:37,299-Speed 2513.61 samples/sec Loss 1.1114 LearningRate 0.000019 Epoch: 35 Global Step: 726130 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:40:45,503-Speed 2496.91 samples/sec Loss 1.0989 LearningRate 0.000019 Epoch: 35 Global Step: 726140 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:40:53,712-Speed 2495.28 samples/sec Loss 1.1378 LearningRate 0.000019 Epoch: 35 Global Step: 726150 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:41:01,940-Speed 2489.36 samples/sec Loss 1.1228 LearningRate 0.000019 Epoch: 35 Global Step: 726160 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:41:10,146-Speed 2496.11 samples/sec Loss 1.1080 LearningRate 0.000019 Epoch: 35 Global Step: 726170 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:41:18,351-Speed 2496.49 samples/sec Loss 1.1063 LearningRate 0.000019 Epoch: 35 Global Step: 726180 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:41:26,500-Speed 2513.56 samples/sec Loss 1.1238 LearningRate 0.000019 Epoch: 35 Global Step: 726190 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:41:34,707-Speed 2495.68 samples/sec Loss 1.1060 LearningRate 0.000019 Epoch: 35 Global Step: 726200 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:41:42,907-Speed 2497.91 samples/sec Loss 1.1409 LearningRate 0.000019 Epoch: 35 Global Step: 726210 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:41:51,115-Speed 2495.50 samples/sec Loss 1.1524 LearningRate 0.000019 Epoch: 35 Global Step: 726220 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:41:59,321-Speed 2496.20 samples/sec Loss 1.1117 LearningRate 0.000019 Epoch: 35 Global Step: 726230 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:42:07,525-Speed 2496.54 samples/sec Loss 1.1284 LearningRate 0.000019 Epoch: 35 Global Step: 726240 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:42:15,673-Speed 2513.79 samples/sec Loss 1.1575 LearningRate 0.000019 Epoch: 35 Global Step: 726250 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:42:23,876-Speed 2497.24 samples/sec Loss 1.1438 LearningRate 0.000019 Epoch: 35 Global Step: 726260 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:42:32,086-Speed 2494.78 samples/sec Loss 1.1020 LearningRate 0.000019 Epoch: 35 Global Step: 726270 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:42:40,289-Speed 2497.07 samples/sec Loss 1.1374 LearningRate 0.000019 Epoch: 35 Global Step: 726280 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:42:48,497-Speed 2497.15 samples/sec Loss 1.1074 LearningRate 0.000019 Epoch: 35 Global Step: 726290 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:42:56,697-Speed 2498.13 samples/sec Loss 1.1357 LearningRate 0.000019 Epoch: 35 Global Step: 726300 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:43:04,864-Speed 2507.96 samples/sec Loss 1.1298 LearningRate 0.000019 Epoch: 35 Global Step: 726310 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:43:13,067-Speed 2497.54 samples/sec Loss 1.1035 LearningRate 0.000019 Epoch: 35 Global Step: 726320 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:43:21,275-Speed 2495.33 samples/sec Loss 1.1236 LearningRate 0.000019 Epoch: 35 Global Step: 726330 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:43:29,478-Speed 2497.22 samples/sec Loss 1.1471 LearningRate 0.000019 Epoch: 35 Global Step: 726340 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:43:37,682-Speed 2496.51 samples/sec Loss 1.1240 LearningRate 0.000019 Epoch: 35 Global Step: 726350 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:43:45,885-Speed 2497.06 samples/sec Loss 1.1292 LearningRate 0.000019 Epoch: 35 Global Step: 726360 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:43:54,040-Speed 2511.71 samples/sec Loss 1.1257 LearningRate 0.000019 Epoch: 35 Global Step: 726370 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:44:02,243-Speed 2497.26 samples/sec Loss 1.1338 LearningRate 0.000019 Epoch: 35 Global Step: 726380 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:44:10,462-Speed 2492.29 samples/sec Loss 1.1416 LearningRate 0.000019 Epoch: 35 Global Step: 726390 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:44:18,664-Speed 2497.09 samples/sec Loss 1.1344 LearningRate 0.000019 Epoch: 35 Global Step: 726400 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:44:26,870-Speed 2496.39 samples/sec Loss 1.1338 LearningRate 0.000019 Epoch: 35 Global Step: 726410 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:44:35,072-Speed 2497.43 samples/sec Loss 1.1427 LearningRate 0.000019 Epoch: 35 Global Step: 726420 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:44:43,236-Speed 2508.89 samples/sec Loss 1.1302 LearningRate 0.000019 Epoch: 35 Global Step: 726430 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:44:51,441-Speed 2496.46 samples/sec Loss 1.1175 LearningRate 0.000019 Epoch: 35 Global Step: 726440 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:44:59,653-Speed 2494.56 samples/sec Loss 1.0844 LearningRate 0.000019 Epoch: 35 Global Step: 726450 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:45:07,855-Speed 2497.39 samples/sec Loss 1.1598 LearningRate 0.000019 Epoch: 35 Global Step: 726460 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 
12:45:16,068-Speed 2494.05 samples/sec Loss 1.0931 LearningRate 0.000019 Epoch: 35 Global Step: 726470 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:45:24,274-Speed 2496.06 samples/sec Loss 1.1483 LearningRate 0.000019 Epoch: 35 Global Step: 726480 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:45:32,438-Speed 2509.18 samples/sec Loss 1.1177 LearningRate 0.000019 Epoch: 35 Global Step: 726490 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:45:40,640-Speed 2497.25 samples/sec Loss 1.1088 LearningRate 0.000019 Epoch: 35 Global Step: 726500 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:45:48,842-Speed 2497.39 samples/sec Loss 1.1262 LearningRate 0.000019 Epoch: 35 Global Step: 726510 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:45:57,046-Speed 2497.11 samples/sec Loss 1.1593 LearningRate 0.000019 Epoch: 35 Global Step: 726520 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-07-12 12:46:05,251-Speed 2496.31 samples/sec Loss 1.1489 LearningRate 0.000019 Epoch: 35 Global Step: 726530 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:46:13,459-Speed 2495.49 samples/sec Loss 1.1398 LearningRate 0.000019 Epoch: 35 Global Step: 726540 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:46:21,609-Speed 2513.12 samples/sec Loss 1.0988 LearningRate 0.000019 Epoch: 35 Global Step: 726550 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:46:29,819-Speed 2495.14 samples/sec Loss 1.1341 LearningRate 0.000019 Epoch: 35 Global Step: 726560 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:46:38,021-Speed 2497.36 samples/sec Loss 1.1382 LearningRate 0.000019 Epoch: 35 Global Step: 726570 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:46:46,227-Speed 2496.11 samples/sec Loss 1.1330 LearningRate 0.000019 Epoch: 35 Global Step: 726580 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 
12:46:54,427-Speed 2497.68 samples/sec Loss 1.1300 LearningRate 0.000019 Epoch: 35 Global Step: 726590 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:47:02,633-Speed 2496.25 samples/sec Loss 1.1461 LearningRate 0.000019 Epoch: 35 Global Step: 726600 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:47:10,780-Speed 2514.09 samples/sec Loss 1.1295 LearningRate 0.000019 Epoch: 35 Global Step: 726610 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:47:18,983-Speed 2497.18 samples/sec Loss 1.1112 LearningRate 0.000019 Epoch: 35 Global Step: 726620 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:47:27,186-Speed 2497.16 samples/sec Loss 1.1253 LearningRate 0.000019 Epoch: 35 Global Step: 726630 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:47:35,387-Speed 2497.69 samples/sec Loss 1.1644 LearningRate 0.000019 Epoch: 35 Global Step: 726640 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:47:43,589-Speed 2497.19 samples/sec Loss 1.1249 LearningRate 0.000019 Epoch: 35 Global Step: 726650 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:47:51,790-Speed 2497.59 samples/sec Loss 1.1264 LearningRate 0.000019 Epoch: 35 Global Step: 726660 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:47:59,939-Speed 2513.80 samples/sec Loss 1.1344 LearningRate 0.000019 Epoch: 35 Global Step: 726670 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:48:08,153-Speed 2493.75 samples/sec Loss 1.1428 LearningRate 0.000019 Epoch: 35 Global Step: 726680 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:48:16,352-Speed 2498.11 samples/sec Loss 1.1289 LearningRate 0.000019 Epoch: 35 Global Step: 726690 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:48:24,554-Speed 2497.87 samples/sec Loss 1.1182 LearningRate 0.000019 Epoch: 35 Global Step: 726700 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 
12:48:32,754-Speed 2497.96 samples/sec Loss 1.1272 LearningRate 0.000019 Epoch: 35 Global Step: 726710 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:48:40,953-Speed 2498.07 samples/sec Loss 1.1490 LearningRate 0.000019 Epoch: 35 Global Step: 726720 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:48:49,115-Speed 2509.85 samples/sec Loss 1.1640 LearningRate 0.000019 Epoch: 35 Global Step: 726730 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:48:57,320-Speed 2496.43 samples/sec Loss 1.1158 LearningRate 0.000019 Epoch: 35 Global Step: 726740 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:49:05,521-Speed 2497.74 samples/sec Loss 1.1460 LearningRate 0.000019 Epoch: 35 Global Step: 726750 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:49:13,725-Speed 2496.60 samples/sec Loss 1.1425 LearningRate 0.000019 Epoch: 35 Global Step: 726760 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:49:21,930-Speed 2496.67 samples/sec Loss 1.1026 LearningRate 0.000019 Epoch: 35 Global Step: 726770 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:49:30,131-Speed 2497.38 samples/sec Loss 1.1231 LearningRate 0.000019 Epoch: 35 Global Step: 726780 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:49:38,286-Speed 2512.33 samples/sec Loss 1.1741 LearningRate 0.000019 Epoch: 35 Global Step: 726790 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:49:46,493-Speed 2495.88 samples/sec Loss 1.1234 LearningRate 0.000019 Epoch: 35 Global Step: 726800 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:49:54,696-Speed 2497.04 samples/sec Loss 1.1116 LearningRate 0.000019 Epoch: 35 Global Step: 726810 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:50:02,897-Speed 2497.78 samples/sec Loss 1.1420 LearningRate 0.000019 Epoch: 35 Global Step: 726820 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 
12:50:11,100-Speed 2497.20 samples/sec Loss 1.1160 LearningRate 0.000019 Epoch: 35 Global Step: 726830 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:50:19,300-Speed 2497.82 samples/sec Loss 1.1172 LearningRate 0.000019 Epoch: 35 Global Step: 726840 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:50:27,450-Speed 2513.35 samples/sec Loss 1.1554 LearningRate 0.000019 Epoch: 35 Global Step: 726850 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:50:35,653-Speed 2497.21 samples/sec Loss 1.1121 LearningRate 0.000019 Epoch: 35 Global Step: 726860 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:50:43,852-Speed 2498.00 samples/sec Loss 1.1108 LearningRate 0.000019 Epoch: 35 Global Step: 726870 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:50:52,056-Speed 2496.91 samples/sec Loss 1.1211 LearningRate 0.000019 Epoch: 35 Global Step: 726880 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:51:00,264-Speed 2495.41 samples/sec Loss 1.1546 LearningRate 0.000019 Epoch: 35 Global Step: 726890 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:51:08,469-Speed 2496.41 samples/sec Loss 1.1314 LearningRate 0.000019 Epoch: 35 Global Step: 726900 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-07-12 12:51:16,623-Speed 2511.98 samples/sec Loss 1.1540 LearningRate 0.000019 Epoch: 35 Global Step: 726910 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:51:24,827-Speed 2496.85 samples/sec Loss 1.1182 LearningRate 0.000019 Epoch: 35 Global Step: 726920 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:51:33,029-Speed 2497.37 samples/sec Loss 1.1115 LearningRate 0.000019 Epoch: 35 Global Step: 726930 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:51:41,233-Speed 2496.73 samples/sec Loss 1.1433 LearningRate 0.000019 Epoch: 35 Global Step: 726940 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 
12:51:49,436-Speed 2497.16 samples/sec Loss 1.1382 LearningRate 0.000019 Epoch: 35 Global Step: 726950 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:51:57,637-Speed 2497.54 samples/sec Loss 1.1357 LearningRate 0.000019 Epoch: 35 Global Step: 726960 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:52:05,786-Speed 2513.41 samples/sec Loss 1.1328 LearningRate 0.000019 Epoch: 35 Global Step: 726970 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:52:13,989-Speed 2497.12 samples/sec Loss 1.1315 LearningRate 0.000019 Epoch: 35 Global Step: 726980 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:52:22,190-Speed 2497.64 samples/sec Loss 1.1376 LearningRate 0.000019 Epoch: 35 Global Step: 726990 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:52:30,411-Speed 2491.56 samples/sec Loss 1.1106 LearningRate 0.000019 Epoch: 35 Global Step: 727000 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:52:38,612-Speed 2497.82 samples/sec Loss 1.0920 LearningRate 0.000019 Epoch: 35 Global Step: 727010 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:52:46,815-Speed 2497.36 samples/sec Loss 1.1296 LearningRate 0.000019 Epoch: 35 Global Step: 727020 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:52:54,962-Speed 2514.17 samples/sec Loss 1.1488 LearningRate 0.000019 Epoch: 35 Global Step: 727030 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:53:03,164-Speed 2497.09 samples/sec Loss 1.1052 LearningRate 0.000019 Epoch: 35 Global Step: 727040 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:53:11,370-Speed 2496.22 samples/sec Loss 1.1273 LearningRate 0.000019 Epoch: 35 Global Step: 727050 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:53:19,571-Speed 2497.66 samples/sec Loss 1.1009 LearningRate 0.000019 Epoch: 35 Global Step: 727060 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 
12:53:27,777-Speed 2496.11 samples/sec Loss 1.1613 LearningRate 0.000019 Epoch: 35 Global Step: 727070 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:53:35,985-Speed 2495.62 samples/sec Loss 1.1519 LearningRate 0.000019 Epoch: 35 Global Step: 727080 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:53:44,138-Speed 2512.26 samples/sec Loss 1.1157 LearningRate 0.000019 Epoch: 35 Global Step: 727090 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:53:52,340-Speed 2497.18 samples/sec Loss 1.1377 LearningRate 0.000019 Epoch: 35 Global Step: 727100 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:54:00,544-Speed 2497.06 samples/sec Loss 1.1177 LearningRate 0.000019 Epoch: 35 Global Step: 727110 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:54:08,741-Speed 2498.74 samples/sec Loss 1.1254 LearningRate 0.000019 Epoch: 35 Global Step: 727120 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:54:16,942-Speed 2497.96 samples/sec Loss 1.1770 LearningRate 0.000019 Epoch: 35 Global Step: 727130 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:54:25,141-Speed 2497.91 samples/sec Loss 1.1349 LearningRate 0.000019 Epoch: 35 Global Step: 727140 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:54:33,292-Speed 2513.15 samples/sec Loss 1.1353 LearningRate 0.000019 Epoch: 35 Global Step: 727150 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:54:41,491-Speed 2498.14 samples/sec Loss 1.1397 LearningRate 0.000019 Epoch: 35 Global Step: 727160 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:54:49,691-Speed 2498.26 samples/sec Loss 1.1353 LearningRate 0.000019 Epoch: 35 Global Step: 727170 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:54:57,909-Speed 2492.34 samples/sec Loss 1.1401 LearningRate 0.000019 Epoch: 35 Global Step: 727180 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 
12:55:06,113-Speed 2497.07 samples/sec Loss 1.1335 LearningRate 0.000019 Epoch: 35 Global Step: 727190 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:55:14,316-Speed 2497.00 samples/sec Loss 1.1271 LearningRate 0.000019 Epoch: 35 Global Step: 727200 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:55:22,466-Speed 2513.44 samples/sec Loss 1.1183 LearningRate 0.000019 Epoch: 35 Global Step: 727210 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:55:30,681-Speed 2493.37 samples/sec Loss 1.1567 LearningRate 0.000019 Epoch: 35 Global Step: 727220 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:55:38,884-Speed 2497.02 samples/sec Loss 1.1327 LearningRate 0.000019 Epoch: 35 Global Step: 727230 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:55:47,082-Speed 2498.87 samples/sec Loss 1.0792 LearningRate 0.000019 Epoch: 35 Global Step: 727240 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:55:55,282-Speed 2497.88 samples/sec Loss 1.1202 LearningRate 0.000019 Epoch: 35 Global Step: 727250 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:56:03,485-Speed 2496.88 samples/sec Loss 1.1428 LearningRate 0.000019 Epoch: 35 Global Step: 727260 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:56:11,634-Speed 2513.78 samples/sec Loss 1.0991 LearningRate 0.000019 Epoch: 35 Global Step: 727270 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:56:19,833-Speed 2498.20 samples/sec Loss 1.1287 LearningRate 0.000019 Epoch: 35 Global Step: 727280 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:56:28,035-Speed 2497.28 samples/sec Loss 1.1058 LearningRate 0.000019 Epoch: 35 Global Step: 727290 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:56:36,237-Speed 2497.42 samples/sec Loss 1.1159 LearningRate 0.000019 Epoch: 35 Global Step: 727300 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 
12:56:44,443-Speed 2496.10 samples/sec Loss 1.1402 LearningRate 0.000019 Epoch: 35 Global Step: 727310 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:56:52,647-Speed 2496.63 samples/sec Loss 1.1264 LearningRate 0.000019 Epoch: 35 Global Step: 727320 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:57:00,795-Speed 2513.96 samples/sec Loss 1.1144 LearningRate 0.000019 Epoch: 35 Global Step: 727330 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:57:08,995-Speed 2498.14 samples/sec Loss 1.1212 LearningRate 0.000019 Epoch: 35 Global Step: 727340 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:57:17,194-Speed 2498.08 samples/sec Loss 1.1367 LearningRate 0.000019 Epoch: 35 Global Step: 727350 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:57:25,395-Speed 2497.78 samples/sec Loss 1.1393 LearningRate 0.000019 Epoch: 35 Global Step: 727360 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:57:33,604-Speed 2495.17 samples/sec Loss 1.1224 LearningRate 0.000019 Epoch: 35 Global Step: 727370 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:57:41,809-Speed 2496.45 samples/sec Loss 1.1218 LearningRate 0.000019 Epoch: 35 Global Step: 727380 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:57:49,967-Speed 2510.97 samples/sec Loss 1.1083 LearningRate 0.000019 Epoch: 35 Global Step: 727390 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:57:58,174-Speed 2496.09 samples/sec Loss 1.1234 LearningRate 0.000019 Epoch: 35 Global Step: 727400 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:58:06,375-Speed 2497.47 samples/sec Loss 1.1252 LearningRate 0.000019 Epoch: 35 Global Step: 727410 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:58:14,578-Speed 2497.29 samples/sec Loss 1.1019 LearningRate 0.000019 Epoch: 35 Global Step: 727420 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 
12:58:22,784-Speed 2496.16 samples/sec Loss 1.1001 LearningRate 0.000019 Epoch: 35 Global Step: 727430 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:58:30,985-Speed 2497.80 samples/sec Loss 1.1497 LearningRate 0.000019 Epoch: 35 Global Step: 727440 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:58:39,136-Speed 2512.81 samples/sec Loss 1.1125 LearningRate 0.000019 Epoch: 35 Global Step: 727450 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:58:47,333-Speed 2498.80 samples/sec Loss 1.1389 LearningRate 0.000019 Epoch: 35 Global Step: 727460 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:58:55,552-Speed 2492.31 samples/sec Loss 1.1063 LearningRate 0.000019 Epoch: 35 Global Step: 727470 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:59:03,761-Speed 2495.44 samples/sec Loss 1.1091 LearningRate 0.000019 Epoch: 35 Global Step: 727480 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:59:11,969-Speed 2495.53 samples/sec Loss 1.1115 LearningRate 0.000019 Epoch: 35 Global Step: 727490 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:59:20,173-Speed 2496.66 samples/sec Loss 1.1417 LearningRate 0.000019 Epoch: 35 Global Step: 727500 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:59:28,331-Speed 2510.86 samples/sec Loss 1.1188 LearningRate 0.000019 Epoch: 35 Global Step: 727510 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:59:36,539-Speed 2495.51 samples/sec Loss 1.1670 LearningRate 0.000019 Epoch: 35 Global Step: 727520 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:59:44,745-Speed 2496.11 samples/sec Loss 1.1253 LearningRate 0.000019 Epoch: 35 Global Step: 727530 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 12:59:52,950-Speed 2496.38 samples/sec Loss 1.1351 LearningRate 0.000019 Epoch: 35 Global Step: 727540 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 
13:00:01,155-Speed 2496.67 samples/sec Loss 1.0992 LearningRate 0.000019 Epoch: 35 Global Step: 727550 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:00:09,362-Speed 2495.59 samples/sec Loss 1.1330 LearningRate 0.000019 Epoch: 35 Global Step: 727560 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:00:17,516-Speed 2512.42 samples/sec Loss 1.1181 LearningRate 0.000019 Epoch: 35 Global Step: 727570 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:00:25,724-Speed 2495.41 samples/sec Loss 1.1438 LearningRate 0.000019 Epoch: 35 Global Step: 727580 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:00:33,931-Speed 2496.04 samples/sec Loss 1.1420 LearningRate 0.000019 Epoch: 35 Global Step: 727590 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:00:42,138-Speed 2495.65 samples/sec Loss 1.1411 LearningRate 0.000019 Epoch: 35 Global Step: 727600 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:00:50,347-Speed 2495.17 samples/sec Loss 1.1361 LearningRate 0.000019 Epoch: 35 Global Step: 727610 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:00:58,564-Speed 2492.81 samples/sec Loss 1.1321 LearningRate 0.000019 Epoch: 35 Global Step: 727620 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:01:06,715-Speed 2513.16 samples/sec Loss 1.1270 LearningRate 0.000019 Epoch: 35 Global Step: 727630 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:01:14,922-Speed 2495.77 samples/sec Loss 1.1555 LearningRate 0.000019 Epoch: 35 Global Step: 727640 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:01:23,127-Speed 2496.42 samples/sec Loss 1.1459 LearningRate 0.000019 Epoch: 35 Global Step: 727650 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:01:31,336-Speed 2495.41 samples/sec Loss 1.1252 LearningRate 0.000019 Epoch: 35 Global Step: 727660 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 
13:01:39,554-Speed 2492.46 samples/sec Loss 1.1038 LearningRate 0.000019 Epoch: 35 Global Step: 727670 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:01:47,759-Speed 2496.29 samples/sec Loss 1.1330 LearningRate 0.000019 Epoch: 35 Global Step: 727680 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:01:55,915-Speed 2511.76 samples/sec Loss 1.0895 LearningRate 0.000019 Epoch: 35 Global Step: 727690 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:02:04,115-Speed 2497.74 samples/sec Loss 1.1047 LearningRate 0.000019 Epoch: 35 Global Step: 727700 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:02:12,322-Speed 2495.82 samples/sec Loss 1.1038 LearningRate 0.000019 Epoch: 35 Global Step: 727710 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:02:20,532-Speed 2494.92 samples/sec Loss 1.1199 LearningRate 0.000019 Epoch: 35 Global Step: 727720 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:02:28,740-Speed 2495.46 samples/sec Loss 1.1137 LearningRate 0.000019 Epoch: 35 Global Step: 727730 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-07-12 13:02:36,906-Speed 2508.43 samples/sec Loss 1.1202 LearningRate 0.000019 Epoch: 35 Global Step: 727740 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:02:45,060-Speed 2512.17 samples/sec Loss 1.1338 LearningRate 0.000019 Epoch: 35 Global Step: 727750 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:02:53,266-Speed 2496.02 samples/sec Loss 1.1219 LearningRate 0.000019 Epoch: 35 Global Step: 727760 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:03:01,468-Speed 2497.24 samples/sec Loss 1.1397 LearningRate 0.000019 Epoch: 35 Global Step: 727770 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:03:09,670-Speed 2497.41 samples/sec Loss 1.1471 LearningRate 0.000019 Epoch: 35 Global Step: 727780 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 
13:03:17,874-Speed 2497.04 samples/sec Loss 1.1283 LearningRate 0.000019 Epoch: 35 Global Step: 727790 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:03:26,097-Speed 2490.87 samples/sec Loss 1.1438 LearningRate 0.000019 Epoch: 35 Global Step: 727800 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:03:34,242-Speed 2514.67 samples/sec Loss 1.1440 LearningRate 0.000019 Epoch: 35 Global Step: 727810 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:03:42,451-Speed 2495.73 samples/sec Loss 1.1199 LearningRate 0.000019 Epoch: 35 Global Step: 727820 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:03:50,655-Speed 2496.60 samples/sec Loss 1.1345 LearningRate 0.000019 Epoch: 35 Global Step: 727830 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:03:58,856-Speed 2497.41 samples/sec Loss 1.1034 LearningRate 0.000019 Epoch: 35 Global Step: 727840 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:04:07,058-Speed 2497.55 samples/sec Loss 1.1388 LearningRate 0.000019 Epoch: 35 Global Step: 727850 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:04:15,263-Speed 2496.52 samples/sec Loss 1.1001 LearningRate 0.000019 Epoch: 35 Global Step: 727860 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:04:23,411-Speed 2513.80 samples/sec Loss 1.1243 LearningRate 0.000019 Epoch: 35 Global Step: 727870 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:04:31,630-Speed 2492.49 samples/sec Loss 1.1529 LearningRate 0.000019 Epoch: 35 Global Step: 727880 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:04:39,842-Speed 2494.28 samples/sec Loss 1.0910 LearningRate 0.000019 Epoch: 35 Global Step: 727890 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:04:48,046-Speed 2496.87 samples/sec Loss 1.1241 LearningRate 0.000019 Epoch: 35 Global Step: 727900 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 
13:04:56,260-Speed 2493.69 samples/sec Loss 1.1259 LearningRate 0.000019 Epoch: 35 Global Step: 727910 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:05:04,462-Speed 2497.11 samples/sec Loss 1.0644 LearningRate 0.000019 Epoch: 35 Global Step: 727920 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:05:12,613-Speed 2512.91 samples/sec Loss 1.1330 LearningRate 0.000019 Epoch: 35 Global Step: 727930 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:05:20,813-Speed 2497.98 samples/sec Loss 1.1350 LearningRate 0.000019 Epoch: 35 Global Step: 727940 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:05:29,017-Speed 2496.73 samples/sec Loss 1.1238 LearningRate 0.000019 Epoch: 35 Global Step: 727950 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:05:37,220-Speed 2497.07 samples/sec Loss 1.1431 LearningRate 0.000019 Epoch: 35 Global Step: 727960 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:05:45,426-Speed 2496.47 samples/sec Loss 1.1512 LearningRate 0.000019 Epoch: 35 Global Step: 727970 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:05:53,626-Speed 2498.00 samples/sec Loss 1.1527 LearningRate 0.000019 Epoch: 35 Global Step: 727980 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:06:01,777-Speed 2512.94 samples/sec Loss 1.1219 LearningRate 0.000019 Epoch: 35 Global Step: 727990 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:06:09,979-Speed 2497.41 samples/sec Loss 1.1046 LearningRate 0.000019 Epoch: 35 Global Step: 728000 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:06:18,181-Speed 2497.22 samples/sec Loss 1.1549 LearningRate 0.000019 Epoch: 35 Global Step: 728010 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:06:26,385-Speed 2496.60 samples/sec Loss 1.1330 LearningRate 0.000018 Epoch: 35 Global Step: 728020 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 
13:06:34,590-Speed 2496.66 samples/sec Loss 1.1470 LearningRate 0.000018 Epoch: 35 Global Step: 728030 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:06:42,803-Speed 2493.70 samples/sec Loss 1.1398 LearningRate 0.000018 Epoch: 35 Global Step: 728040 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:06:50,953-Speed 2513.31 samples/sec Loss 1.1588 LearningRate 0.000018 Epoch: 35 Global Step: 728050 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:06:59,156-Speed 2497.23 samples/sec Loss 1.1149 LearningRate 0.000018 Epoch: 35 Global Step: 728060 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:07:07,360-Speed 2496.80 samples/sec Loss 1.1612 LearningRate 0.000018 Epoch: 35 Global Step: 728070 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:07:15,562-Speed 2497.41 samples/sec Loss 1.1297 LearningRate 0.000018 Epoch: 35 Global Step: 728080 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:07:23,767-Speed 2496.31 samples/sec Loss 1.1437 LearningRate 0.000018 Epoch: 35 Global Step: 728090 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:07:31,970-Speed 2496.94 samples/sec Loss 1.1112 LearningRate 0.000018 Epoch: 35 Global Step: 728100 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:07:40,139-Speed 2507.51 samples/sec Loss 1.1284 LearningRate 0.000018 Epoch: 35 Global Step: 728110 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:07:48,342-Speed 2496.97 samples/sec Loss 1.1409 LearningRate 0.000018 Epoch: 35 Global Step: 728120 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:07:56,543-Speed 2497.60 samples/sec Loss 1.1340 LearningRate 0.000018 Epoch: 35 Global Step: 728130 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 13:08:04,752-Speed 2495.39 samples/sec Loss 1.1079 LearningRate 0.000018 Epoch: 35 Global Step: 728140 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-07-12 
13:08:12,954-Speed 2497.17 samples/sec Loss 1.1274 LearningRate 0.000018 Epoch: 35 Global Step: 728150 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:08:21,164-Speed 2495.23 samples/sec Loss 1.1340 LearningRate 0.000018 Epoch: 35 Global Step: 728160 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:08:29,312-Speed 2513.66 samples/sec Loss 1.1223 LearningRate 0.000018 Epoch: 35 Global Step: 728170 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:08:37,520-Speed 2495.58 samples/sec Loss 1.1567 LearningRate 0.000018 Epoch: 35 Global Step: 728180 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:08:45,727-Speed 2496.28 samples/sec Loss 1.1118 LearningRate 0.000018 Epoch: 35 Global Step: 728190 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:08:53,935-Speed 2495.25 samples/sec Loss 1.1160 LearningRate 0.000018 Epoch: 35 Global Step: 728200 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:09:02,144-Speed 2495.41 samples/sec Loss 1.1207 LearningRate 0.000018 Epoch: 35 Global Step: 728210 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:09:10,352-Speed 2495.58 samples/sec Loss 1.1377 LearningRate 0.000018 Epoch: 35 Global Step: 728220 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:09:18,507-Speed 2511.65 samples/sec Loss 1.1454 LearningRate 0.000018 Epoch: 35 Global Step: 728230 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:09:26,710-Speed 2497.24 samples/sec Loss 1.1264 LearningRate 0.000018 Epoch: 35 Global Step: 728240 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:09:34,930-Speed 2491.86 samples/sec Loss 1.1278 LearningRate 0.000018 Epoch: 35 Global Step: 728250 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:09:43,137-Speed 2495.70 samples/sec Loss 1.1191 LearningRate 0.000018 Epoch: 35 Global Step: 728260 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:09:51,338-Speed 2497.65 samples/sec Loss 1.1439 LearningRate 0.000018 Epoch: 35 Global Step: 728270 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:09:59,542-Speed 2496.88 samples/sec Loss 1.1369 LearningRate 0.000018 Epoch: 35 Global Step: 728280 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:10:07,700-Speed 2510.70 samples/sec Loss 1.1086 LearningRate 0.000018 Epoch: 35 Global Step: 728290 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:10:15,910-Speed 2494.85 samples/sec Loss 1.1026 LearningRate 0.000018 Epoch: 35 Global Step: 728300 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:10:24,114-Speed 2496.60 samples/sec Loss 1.1240 LearningRate 0.000018 Epoch: 35 Global Step: 728310 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:10:32,312-Speed 2498.71 samples/sec Loss 1.1321 LearningRate 0.000018 Epoch: 35 Global Step: 728320 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:10:40,515-Speed 2496.93 samples/sec Loss 1.1676 LearningRate 0.000018 Epoch: 35 Global Step: 728330 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:10:48,675-Speed 2510.24 samples/sec Loss 1.1230 LearningRate 0.000018 Epoch: 35 Global Step: 728340 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:10:56,826-Speed 2512.85 samples/sec Loss 1.1274 LearningRate 0.000018 Epoch: 35 Global Step: 728350 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:11:05,033-Speed 2496.13 samples/sec Loss 1.1136 LearningRate 0.000018 Epoch: 35 Global Step: 728360 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:11:13,246-Speed 2493.83 samples/sec Loss 1.1267 LearningRate 0.000018 Epoch: 35 Global Step: 728370 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:11:21,451-Speed 2496.31 samples/sec Loss 1.1040 LearningRate 0.000018 Epoch: 35 Global Step: 728380 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:11:29,653-Speed 2497.28 samples/sec Loss 1.1170 LearningRate 0.000018 Epoch: 35 Global Step: 728390 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:11:37,857-Speed 2496.68 samples/sec Loss 1.1167 LearningRate 0.000018 Epoch: 35 Global Step: 728400 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:11:46,010-Speed 2512.42 samples/sec Loss 1.1350 LearningRate 0.000018 Epoch: 35 Global Step: 728410 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:11:54,219-Speed 2495.19 samples/sec Loss 1.1217 LearningRate 0.000018 Epoch: 35 Global Step: 728420 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:12:02,423-Speed 2496.96 samples/sec Loss 1.1257 LearningRate 0.000018 Epoch: 35 Global Step: 728430 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:12:10,639-Speed 2493.31 samples/sec Loss 1.1028 LearningRate 0.000018 Epoch: 35 Global Step: 728440 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:12:18,842-Speed 2497.08 samples/sec Loss 1.1219 LearningRate 0.000018 Epoch: 35 Global Step: 728450 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:12:27,044-Speed 2497.29 samples/sec Loss 1.1088 LearningRate 0.000018 Epoch: 35 Global Step: 728460 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:12:35,195-Speed 2512.89 samples/sec Loss 1.1229 LearningRate 0.000018 Epoch: 35 Global Step: 728470 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:12:43,398-Speed 2497.30 samples/sec Loss 1.1181 LearningRate 0.000018 Epoch: 35 Global Step: 728480 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:12:51,600-Speed 2497.25 samples/sec Loss 1.1273 LearningRate 0.000018 Epoch: 35 Global Step: 728490 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:12:59,807-Speed 2495.98 samples/sec Loss 1.1351 LearningRate 0.000018 Epoch: 35 Global Step: 728500 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:13:08,008-Speed 2497.67 samples/sec Loss 1.1446 LearningRate 0.000018 Epoch: 35 Global Step: 728510 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:13:16,211-Speed 2497.06 samples/sec Loss 1.1244 LearningRate 0.000018 Epoch: 35 Global Step: 728520 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:13:24,357-Speed 2514.43 samples/sec Loss 1.1275 LearningRate 0.000018 Epoch: 35 Global Step: 728530 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:13:32,559-Speed 2497.51 samples/sec Loss 1.0987 LearningRate 0.000018 Epoch: 35 Global Step: 728540 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:13:40,761-Speed 2497.33 samples/sec Loss 1.1559 LearningRate 0.000018 Epoch: 35 Global Step: 728550 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:13:48,958-Speed 2498.84 samples/sec Loss 1.1132 LearningRate 0.000018 Epoch: 35 Global Step: 728560 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:13:57,159-Speed 2497.81 samples/sec Loss 1.1255 LearningRate 0.000018 Epoch: 35 Global Step: 728570 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:14:05,359-Speed 2497.82 samples/sec Loss 1.1327 LearningRate 0.000018 Epoch: 35 Global Step: 728580 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:14:13,506-Speed 2514.46 samples/sec Loss 1.1393 LearningRate 0.000018 Epoch: 35 Global Step: 728590 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:14:21,710-Speed 2496.66 samples/sec Loss 1.1173 LearningRate 0.000018 Epoch: 35 Global Step: 728600 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:14:29,913-Speed 2497.13 samples/sec Loss 1.1317 LearningRate 0.000018 Epoch: 35 Global Step: 728610 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:14:38,127-Speed 2493.58 samples/sec Loss 1.0946 LearningRate 0.000018 Epoch: 35 Global Step: 728620 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:14:46,340-Speed 2494.03 samples/sec Loss 1.1114 LearningRate 0.000018 Epoch: 35 Global Step: 728630 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:14:54,545-Speed 2496.59 samples/sec Loss 1.1173 LearningRate 0.000018 Epoch: 35 Global Step: 728640 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:15:02,695-Speed 2513.09 samples/sec Loss 1.1306 LearningRate 0.000018 Epoch: 35 Global Step: 728650 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:15:10,906-Speed 2494.59 samples/sec Loss 1.1205 LearningRate 0.000018 Epoch: 35 Global Step: 728660 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:15:19,105-Speed 2498.29 samples/sec Loss 1.1154 LearningRate 0.000018 Epoch: 35 Global Step: 728670 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:15:27,307-Speed 2497.64 samples/sec Loss 1.1074 LearningRate 0.000018 Epoch: 35 Global Step: 728680 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:15:35,509-Speed 2497.20 samples/sec Loss 1.1464 LearningRate 0.000018 Epoch: 35 Global Step: 728690 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:15:43,722-Speed 2493.82 samples/sec Loss 1.1512 LearningRate 0.000018 Epoch: 35 Global Step: 728700 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:15:51,874-Speed 2513.01 samples/sec Loss 1.1357 LearningRate 0.000018 Epoch: 35 Global Step: 728710 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:16:00,084-Speed 2494.71 samples/sec Loss 1.1601 LearningRate 0.000018 Epoch: 35 Global Step: 728720 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:16:08,284-Speed 2497.92 samples/sec Loss 1.1104 LearningRate 0.000018 Epoch: 35 Global Step: 728730 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:16:16,492-Speed 2495.81 samples/sec Loss 1.1321 LearningRate 0.000018 Epoch: 35 Global Step: 728740 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:16:24,693-Speed 2497.48 samples/sec Loss 1.1205 LearningRate 0.000018 Epoch: 35 Global Step: 728750 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:16:32,896-Speed 2496.95 samples/sec Loss 1.0844 LearningRate 0.000018 Epoch: 35 Global Step: 728760 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:16:41,043-Speed 2514.19 samples/sec Loss 1.1498 LearningRate 0.000018 Epoch: 35 Global Step: 728770 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:16:49,245-Speed 2497.20 samples/sec Loss 1.1182 LearningRate 0.000018 Epoch: 35 Global Step: 728780 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:16:57,450-Speed 2496.74 samples/sec Loss 1.1170 LearningRate 0.000018 Epoch: 35 Global Step: 728790 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:17:05,649-Speed 2498.05 samples/sec Loss 1.0987 LearningRate 0.000018 Epoch: 35 Global Step: 728800 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:17:13,852-Speed 2496.88 samples/sec Loss 1.0954 LearningRate 0.000018 Epoch: 35 Global Step: 728810 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:17:22,057-Speed 2496.42 samples/sec Loss 1.1309 LearningRate 0.000018 Epoch: 35 Global Step: 728820 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:17:30,224-Speed 2509.22 samples/sec Loss 1.1259 LearningRate 0.000018 Epoch: 35 Global Step: 728830 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:17:38,425-Speed 2497.66 samples/sec Loss 1.1391 LearningRate 0.000018 Epoch: 35 Global Step: 728840 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:17:46,627-Speed 2498.06 samples/sec Loss 1.1093 LearningRate 0.000018 Epoch: 35 Global Step: 728850 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:17:54,828-Speed 2497.40 samples/sec Loss 1.1761 LearningRate 0.000018 Epoch: 35 Global Step: 728860 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:18:03,032-Speed 2496.94 samples/sec Loss 1.1159 LearningRate 0.000018 Epoch: 35 Global Step: 728870 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:18:11,245-Speed 2493.80 samples/sec Loss 1.1022 LearningRate 0.000018 Epoch: 35 Global Step: 728880 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:18:19,407-Speed 2509.84 samples/sec Loss 1.1378 LearningRate 0.000018 Epoch: 35 Global Step: 728890 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:18:27,609-Speed 2497.09 samples/sec Loss 1.1054 LearningRate 0.000018 Epoch: 35 Global Step: 728900 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:18:35,810-Speed 2497.63 samples/sec Loss 1.1196 LearningRate 0.000018 Epoch: 35 Global Step: 728910 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:18:44,015-Speed 2496.73 samples/sec Loss 1.1510 LearningRate 0.000018 Epoch: 35 Global Step: 728920 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:18:52,216-Speed 2497.57 samples/sec Loss 1.1300 LearningRate 0.000018 Epoch: 35 Global Step: 728930 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:19:00,438-Speed 2491.08 samples/sec Loss 1.1449 LearningRate 0.000018 Epoch: 35 Global Step: 728940 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:19:08,592-Speed 2512.25 samples/sec Loss 1.1238 LearningRate 0.000018 Epoch: 35 Global Step: 728950 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:19:16,792-Speed 2497.94 samples/sec Loss 1.1186 LearningRate 0.000018 Epoch: 35 Global Step: 728960 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:19:24,996-Speed 2496.66 samples/sec Loss 1.1455 LearningRate 0.000018 Epoch: 35 Global Step: 728970 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:19:33,198-Speed 2497.36 samples/sec Loss 1.1297 LearningRate 0.000018 Epoch: 35 Global Step: 728980 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:19:41,398-Speed 2497.85 samples/sec Loss 1.1488 LearningRate 0.000018 Epoch: 35 Global Step: 728990 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:19:49,602-Speed 2496.91 samples/sec Loss 1.1241 LearningRate 0.000018 Epoch: 35 Global Step: 729000 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:19:57,755-Speed 2512.58 samples/sec Loss 1.1099 LearningRate 0.000018 Epoch: 35 Global Step: 729010 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:20:05,956-Speed 2497.42 samples/sec Loss 1.1212 LearningRate 0.000018 Epoch: 35 Global Step: 729020 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:20:14,171-Speed 2493.70 samples/sec Loss 1.1260 LearningRate 0.000018 Epoch: 35 Global Step: 729030 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:20:22,373-Speed 2497.22 samples/sec Loss 1.1242 LearningRate 0.000018 Epoch: 35 Global Step: 729040 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:20:30,587-Speed 2493.99 samples/sec Loss 1.1428 LearningRate 0.000018 Epoch: 35 Global Step: 729050 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:20:38,788-Speed 2497.89 samples/sec Loss 1.1004 LearningRate 0.000018 Epoch: 35 Global Step: 729060 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:20:46,935-Speed 2513.97 samples/sec Loss 1.1234 LearningRate 0.000018 Epoch: 35 Global Step: 729070 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:20:55,136-Speed 2497.84 samples/sec Loss 1.1341 LearningRate 0.000018 Epoch: 35 Global Step: 729080 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:21:03,338-Speed 2497.52 samples/sec Loss 1.1652 LearningRate 0.000018 Epoch: 35 Global Step: 729090 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:21:11,539-Speed 2497.66 samples/sec Loss 1.1384 LearningRate 0.000018 Epoch: 35 Global Step: 729100 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:21:19,741-Speed 2497.16 samples/sec Loss 1.1411 LearningRate 0.000018 Epoch: 35 Global Step: 729110 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:21:27,948-Speed 2496.11 samples/sec Loss 1.1435 LearningRate 0.000018 Epoch: 35 Global Step: 729120 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:21:36,097-Speed 2513.79 samples/sec Loss 1.1530 LearningRate 0.000018 Epoch: 35 Global Step: 729130 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:21:44,305-Speed 2495.47 samples/sec Loss 1.1280 LearningRate 0.000018 Epoch: 35 Global Step: 729140 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:21:52,508-Speed 2497.31 samples/sec Loss 1.1457 LearningRate 0.000018 Epoch: 35 Global Step: 729150 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:22:00,711-Speed 2497.12 samples/sec Loss 1.1129 LearningRate 0.000018 Epoch: 35 Global Step: 729160 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:22:08,912-Speed 2497.74 samples/sec Loss 1.1162 LearningRate 0.000018 Epoch: 35 Global Step: 729170 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:22:17,114-Speed 2497.37 samples/sec Loss 1.1221 LearningRate 0.000018 Epoch: 35 Global Step: 729180 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:22:25,261-Speed 2514.22 samples/sec Loss 1.0943 LearningRate 0.000018 Epoch: 35 Global Step: 729190 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:22:33,460-Speed 2498.02 samples/sec Loss 1.1278 LearningRate 0.000018 Epoch: 35 Global Step: 729200 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:22:41,664-Speed 2496.92 samples/sec Loss 1.1204 LearningRate 0.000018 Epoch: 35 Global Step: 729210 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:22:49,862-Speed 2498.55 samples/sec Loss 1.1104 LearningRate 0.000018 Epoch: 35 Global Step: 729220 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:22:58,101-Speed 2485.98 samples/sec Loss 1.1365 LearningRate 0.000018 Epoch: 35 Global Step: 729230 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:23:06,307-Speed 2496.34 samples/sec Loss 1.1394 LearningRate 0.000018 Epoch: 35 Global Step: 729240 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:23:14,455-Speed 2513.67 samples/sec Loss 1.1336 LearningRate 0.000018 Epoch: 35 Global Step: 729250 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:23:22,656-Speed 2498.72 samples/sec Loss 1.1496 LearningRate 0.000018 Epoch: 35 Global Step: 729260 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:23:30,857-Speed 2497.52 samples/sec Loss 1.1180 LearningRate 0.000018 Epoch: 35 Global Step: 729270 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:23:39,056-Speed 2498.19 samples/sec Loss 1.1357 LearningRate 0.000018 Epoch: 35 Global Step: 729280 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:23:47,258-Speed 2497.64 samples/sec Loss 1.1117 LearningRate 0.000018 Epoch: 35 Global Step: 729290 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:23:55,461-Speed 2497.10 samples/sec Loss 1.1451 LearningRate 0.000018 Epoch: 35 Global Step: 729300 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:24:03,610-Speed 2513.61 samples/sec Loss 1.1083 LearningRate 0.000018 Epoch: 35 Global Step: 729310 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:24:11,812-Speed 2497.36 samples/sec Loss 1.1215 LearningRate 0.000018 Epoch: 35 Global Step: 729320 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:24:20,017-Speed 2496.43 samples/sec Loss 1.1424 LearningRate 0.000018 Epoch: 35 Global Step: 729330 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:24:28,217-Speed 2497.70 samples/sec Loss 1.1392 LearningRate 0.000018 Epoch: 35 Global Step: 729340 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:24:36,421-Speed 2496.99 samples/sec Loss 1.1533 LearningRate 0.000018 Epoch: 35 Global Step: 729350 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:24:44,622-Speed 2497.48 samples/sec Loss 1.1194 LearningRate 0.000018 Epoch: 35 Global Step: 729360 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:24:52,770-Speed 2513.68 samples/sec Loss 1.1171 LearningRate 0.000018 Epoch: 35 Global Step: 729370 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:25:00,977-Speed 2495.91 samples/sec Loss 1.1141 LearningRate 0.000018 Epoch: 35 Global Step: 729380 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:25:09,176-Speed 2498.65 samples/sec Loss 1.1224 LearningRate 0.000018 Epoch: 35 Global Step: 729390 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:25:17,377-Speed 2497.58 samples/sec Loss 1.1360 LearningRate 0.000018 Epoch: 35 Global Step: 729400 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:25:25,581-Speed 2496.58 samples/sec Loss 1.1108 LearningRate 0.000018 Epoch: 35 Global Step: 729410 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:25:33,785-Speed 2497.06 samples/sec Loss 1.1192 LearningRate 0.000018 Epoch: 35 Global Step: 729420 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:25:41,932-Speed 2514.31 samples/sec Loss 1.1423 LearningRate 0.000018 Epoch: 35 Global Step: 729430 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:25:50,134-Speed 2497.18 samples/sec Loss 1.1166 LearningRate 0.000018 Epoch: 35 Global Step: 729440 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:25:58,333-Speed 2498.36 samples/sec Loss 1.1043 LearningRate 0.000018 Epoch: 35 Global Step: 729450 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:26:06,535-Speed 2497.15 samples/sec Loss 1.1121 LearningRate 0.000018 Epoch: 35 Global Step: 729460 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:26:14,745-Speed 2495.15 samples/sec Loss 1.1036 LearningRate 0.000018 Epoch: 35 Global Step: 729470 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:26:22,955-Speed 2494.97 samples/sec Loss 1.1092 LearningRate 0.000018 Epoch: 35 Global Step: 729480 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:26:31,106-Speed 2513.12 samples/sec Loss 1.1638 LearningRate 0.000018 Epoch: 35 Global Step: 729490 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:26:39,309-Speed 2496.79 samples/sec Loss 1.0824 LearningRate 0.000018 Epoch: 35 Global Step: 729500 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:26:47,515-Speed 2496.38 samples/sec Loss 1.1475 LearningRate 0.000018 Epoch: 35 Global Step: 729510 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:26:55,720-Speed 2496.49 samples/sec Loss 1.1116 LearningRate 0.000018 Epoch: 35 Global Step: 729520 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:27:03,923-Speed 2497.13 samples/sec Loss 1.1524 LearningRate 0.000018 Epoch: 35 Global Step: 729530 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:27:12,124-Speed 2497.55 samples/sec Loss 1.1544 LearningRate 0.000018 Epoch: 35 Global Step: 729540 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:27:20,274-Speed 2513.52 samples/sec Loss 1.1565 LearningRate 0.000018 Epoch: 35 Global Step: 729550 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:27:28,485-Speed 2494.42 samples/sec Loss 1.1446 LearningRate 0.000018 Epoch: 35 Global Step: 729560 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:27:36,691-Speed 2496.03 samples/sec Loss 1.1093 LearningRate 0.000018 Epoch: 35 Global Step: 729570 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:27:44,896-Speed 2496.63 samples/sec Loss 1.1247 LearningRate 0.000018 Epoch: 35 Global Step: 729580 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:27:53,098-Speed 2497.31 samples/sec Loss 1.1356 LearningRate 0.000018 Epoch: 35 Global Step: 729590 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:28:01,304-Speed 2496.10 samples/sec Loss 1.1178 LearningRate 0.000018 Epoch: 35 Global Step: 729600 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:28:09,452-Speed 2513.97 samples/sec Loss 1.1605 LearningRate 0.000018 Epoch: 35 Global Step: 729610 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:28:17,655-Speed 2496.94 samples/sec Loss 1.1544 LearningRate 0.000018 Epoch: 35 Global Step: 729620 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:28:25,857-Speed 2497.35 samples/sec Loss 1.1686 LearningRate 0.000018 Epoch: 35 Global Step: 729630 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:28:34,057-Speed 2497.84 samples/sec Loss 1.1385 LearningRate 0.000018 Epoch: 35 Global Step: 729640 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:28:42,262-Speed 2496.40 samples/sec Loss 1.1250 LearningRate 0.000018 Epoch: 35 Global Step: 729650 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:28:50,464-Speed 2497.74 samples/sec Loss 1.1180 LearningRate 0.000018 Epoch: 35 Global Step: 729660 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:28:58,612-Speed 2513.84 samples/sec Loss 1.1214 LearningRate 0.000018 Epoch: 35 Global Step: 729670 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:29:06,813-Speed 2497.50 samples/sec Loss 1.0930 LearningRate 0.000018 Epoch: 35 Global Step: 729680 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:29:15,016-Speed 2497.12 samples/sec Loss 1.1322 LearningRate 0.000018 Epoch: 35 Global Step: 729690 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:29:23,219-Speed 2496.86 samples/sec Loss 1.1025 LearningRate 0.000018 Epoch: 35 Global Step: 729700 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:29:31,449-Speed 2488.93 samples/sec Loss 1.1301 LearningRate 0.000018 Epoch: 35 Global Step: 729710 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:29:39,657-Speed 2495.61 samples/sec Loss 1.1347 LearningRate 0.000018 Epoch: 35 Global Step: 729720 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:29:47,810-Speed 2512.38 samples/sec Loss 1.1319 LearningRate 0.000018 Epoch: 35 Global Step: 729730 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-07-12 13:29:55,975-Speed 2508.76 samples/sec Loss 1.1184 LearningRate 0.000018 Epoch: 35 Global Step: 729740 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:30:04,189-Speed 2493.71 samples/sec Loss 1.1035 LearningRate 0.000018 Epoch: 35 Global Step: 729750 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:30:12,393-Speed 2496.63 samples/sec Loss 1.1333 LearningRate 0.000018 Epoch: 35 Global Step: 729760 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:30:20,600-Speed 2495.81 samples/sec Loss 1.1160 LearningRate 0.000018 Epoch: 35 Global Step: 729770 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:30:28,824-Speed 2490.68 samples/sec Loss 1.1169 LearningRate 0.000018 Epoch: 35 Global Step: 729780 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:30:36,978-Speed 2512.00 samples/sec Loss 1.1438 LearningRate 0.000018 Epoch: 35 Global Step: 729790 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:30:45,186-Speed 2495.67 samples/sec Loss 1.1197 LearningRate 0.000018 Epoch: 35 Global Step: 729800 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:30:53,392-Speed 2496.16 samples/sec Loss 1.0951 LearningRate 0.000018 Epoch: 35 Global Step: 729810 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:31:01,602-Speed 2494.79 samples/sec Loss 1.1362 LearningRate 0.000018 Epoch: 35 Global Step: 729820 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:31:09,827-Speed 2490.44 samples/sec Loss 1.1521 LearningRate 0.000018 Epoch: 35 Global Step: 729830 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:31:18,033-Speed 2496.10 samples/sec Loss 1.1213 LearningRate 0.000018 Epoch: 35 Global Step: 729840 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:31:26,186-Speed 2512.43 samples/sec Loss 1.1120 LearningRate 0.000018 Epoch: 35 Global Step: 729850 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:31:34,396-Speed 2495.01 samples/sec Loss 1.1444 LearningRate 0.000018 Epoch: 35 Global Step: 729860 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:31:42,600-Speed 2496.72 samples/sec Loss 1.1557 LearningRate 0.000018 Epoch: 35 Global Step: 729870 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:31:50,808-Speed 2495.67 samples/sec Loss 1.1115 LearningRate 0.000018 Epoch: 35 Global Step: 729880 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:31:59,038-Speed 2488.84 samples/sec Loss 1.1144 LearningRate 0.000018 Epoch: 35 Global Step: 729890 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:32:07,242-Speed 2496.84 samples/sec Loss 1.1294 LearningRate 0.000018 Epoch: 35 Global Step: 729900 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:32:15,393-Speed 2513.12 samples/sec Loss 1.1293 LearningRate 0.000018 Epoch: 35 Global Step: 729910 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:32:23,595-Speed 2497.25 samples/sec Loss 1.1456 LearningRate 0.000018 Epoch: 35 Global Step: 729920 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:32:31,797-Speed 2497.23 samples/sec Loss 1.1299 LearningRate 0.000018 Epoch: 35 Global Step: 729930 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:32:40,001-Speed 2496.41 samples/sec Loss 1.1331 LearningRate 0.000018 Epoch: 35 Global Step: 729940 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:32:48,206-Speed 2496.80 samples/sec Loss 1.1228 LearningRate 0.000018 Epoch: 35 Global Step: 729950 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:32:56,410-Speed 2496.91 samples/sec Loss 1.1256 LearningRate 0.000018 Epoch: 35 Global Step: 729960 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:33:04,565-Speed 2511.61 samples/sec Loss 1.0912 LearningRate 0.000018 Epoch: 35 Global Step: 729970 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:33:12,771-Speed 2495.97 samples/sec Loss 1.1237 LearningRate 0.000018 Epoch: 35 Global Step: 729980 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:33:20,975-Speed 2496.95 samples/sec Loss 1.1006 LearningRate 0.000018 Epoch: 35 Global Step: 729990 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:33:29,180-Speed 2496.49 samples/sec Loss 1.0915 LearningRate 0.000018 Epoch: 35 Global Step: 730000 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:33:37,381-Speed 2497.32 samples/sec Loss 1.1701 LearningRate 0.000018 Epoch: 35 Global Step: 730010 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:33:45,586-Speed 2496.51 samples/sec Loss 1.1319 LearningRate 0.000018 Epoch: 35 Global Step: 730020 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:33:53,739-Speed 2512.52 samples/sec Loss 1.1133 LearningRate 0.000018 Epoch: 35 Global Step: 730030 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:34:01,941-Speed 2497.30 samples/sec Loss 1.1304 LearningRate 0.000018 Epoch: 35 Global Step: 730040 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:34:10,147-Speed 2496.05 samples/sec Loss 1.1360 LearningRate 0.000018 Epoch: 35 Global Step: 730050 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:34:18,348-Speed 2498.02 samples/sec Loss 1.1108 LearningRate 0.000018 Epoch: 35 Global Step: 730060 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:34:26,554-Speed 2496.05 samples/sec Loss 1.1361 LearningRate 0.000018 Epoch: 35 Global Step: 730070 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:34:34,760-Speed 2495.96 samples/sec Loss 1.0998 LearningRate 0.000018 Epoch: 35 Global Step: 730080 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:34:42,914-Speed 2512.31 samples/sec Loss 1.1362 LearningRate 0.000018 Epoch: 35 Global Step: 730090 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:34:51,116-Speed 2497.56 samples/sec Loss 1.0859 LearningRate 0.000018 Epoch: 35 Global Step: 730100 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:34:59,320-Speed 2496.67 samples/sec Loss 1.1244 LearningRate 0.000018 Epoch: 35 Global Step: 730110 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:35:07,522-Speed 2497.64 samples/sec Loss 1.1289 LearningRate 0.000018 Epoch: 35 Global Step: 730120 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:35:15,735-Speed 2493.80 samples/sec Loss 1.0965 LearningRate 0.000018 Epoch: 35 Global Step: 730130 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:35:23,945-Speed 2495.07 samples/sec Loss 1.1451 LearningRate 0.000018 Epoch: 35 Global Step: 730140 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:35:32,095-Speed 2513.39 samples/sec Loss 1.1101 LearningRate 0.000018 Epoch: 35 Global Step: 730150 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:35:40,293-Speed 2498.41 samples/sec Loss 1.1116 LearningRate 0.000018 Epoch: 35 Global Step: 730160 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:35:48,496-Speed 2497.00 samples/sec Loss 1.1365 LearningRate 0.000018 Epoch: 35 Global Step: 730170 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:35:56,697-Speed 2497.64 samples/sec Loss 1.1189 LearningRate 0.000018 Epoch: 35 Global Step: 730180 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:36:04,903-Speed 2496.40 samples/sec Loss 1.1364 LearningRate 0.000018 Epoch: 35 Global Step: 730190 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:36:13,104-Speed 2497.56 samples/sec Loss 1.1140 LearningRate 0.000018 Epoch: 35 Global Step: 730200 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:36:21,254-Speed 2513.26 samples/sec Loss 1.1235 LearningRate 0.000018 Epoch: 35 Global Step: 730210 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:36:29,468-Speed 2493.85 samples/sec Loss 1.1504 LearningRate 0.000018 Epoch: 35 Global Step: 730220 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:36:37,672-Speed 2497.02 samples/sec Loss 1.1428 LearningRate 0.000018 Epoch: 35 Global Step: 730230 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:36:45,871-Speed 2497.94 samples/sec Loss 1.1120 LearningRate 0.000018 Epoch: 35 Global Step: 730240 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:36:54,073-Speed 2497.63 samples/sec Loss 1.1222 LearningRate 0.000018 Epoch: 35 Global Step: 730250 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:37:02,275-Speed 2497.10 samples/sec Loss 1.1434 LearningRate 0.000018 Epoch: 35 Global Step: 730260 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:37:10,423-Speed 2513.94 samples/sec Loss 1.1182 LearningRate 0.000018 Epoch: 35 Global Step: 730270 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:37:18,627-Speed 2496.87 samples/sec Loss 1.1453 LearningRate 0.000018 Epoch: 35 Global Step: 730280 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:37:26,837-Speed 2495.05 samples/sec Loss 1.1297 LearningRate 0.000018 Epoch: 35 Global Step: 730290 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:37:35,039-Speed 2497.73 samples/sec Loss 1.1166 LearningRate 0.000018 Epoch: 35 Global Step: 730300 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:37:43,244-Speed 2496.13 samples/sec Loss 1.1035 LearningRate 0.000018 Epoch: 35 Global Step: 730310 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:37:51,451-Speed 2495.88 samples/sec Loss 1.1154 LearningRate 0.000018 Epoch: 35 Global Step: 730320 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:37:59,599-Speed 2514.12 samples/sec Loss 1.1462 LearningRate 0.000018 Epoch: 35 Global Step: 730330 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:38:07,802-Speed 2497.07 samples/sec Loss 1.1393 LearningRate 0.000018 Epoch: 35 Global Step: 730340 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:38:16,005-Speed 2496.92 samples/sec Loss 1.1545 LearningRate 0.000018 Epoch: 35 Global Step: 730350 Fp16 Grad Scale: 16384 Required: 23 hours
Training: 2022-07-12 13:38:24,162-Speed 2511.52 samples/sec Loss 1.1433 LearningRate 0.000018 Epoch: 35 Global Step: 730360 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:38:32,369-Speed 2496.06 samples/sec Loss 1.1326 LearningRate 0.000018 Epoch: 35 Global Step: 730370 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:38:40,581-Speed 2494.29 samples/sec Loss 1.1347 LearningRate 0.000018 Epoch: 35 Global Step: 730380 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:38:48,730-Speed 2513.63 samples/sec Loss 1.1119 LearningRate 0.000018 Epoch: 35 Global Step: 730390 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:38:56,930-Speed 2498.04 samples/sec Loss 1.1247 LearningRate 0.000018 Epoch: 35 Global Step: 730400 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:39:05,130-Speed 2498.03 samples/sec Loss 1.1034 LearningRate 0.000018 Epoch: 35 Global Step: 730410 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:39:13,330-Speed 2498.03 samples/sec Loss 1.1249 LearningRate 0.000018 Epoch: 35 Global Step: 730420 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:39:21,527-Speed 2498.67 samples/sec Loss 1.1371 LearningRate 0.000018 Epoch: 35 Global Step: 730430 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:39:29,725-Speed 2498.72 samples/sec Loss 1.1471 LearningRate 0.000018 Epoch: 35 Global Step: 730440 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:39:37,872-Speed 2514.23 samples/sec Loss 1.1310 LearningRate 0.000018 Epoch: 35 Global Step: 730450 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:39:46,070-Speed 2498.53 samples/sec Loss 1.0855 LearningRate 0.000018 Epoch: 35 Global Step: 730460 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:39:54,269-Speed 2498.13 samples/sec Loss 1.1344 LearningRate 0.000018 Epoch: 35 Global Step: 730470 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:40:02,468-Speed 2498.69 samples/sec Loss 1.0911 LearningRate 0.000018 Epoch: 35 Global Step: 730480 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:40:10,667-Speed 2497.92 samples/sec Loss 1.1211 LearningRate 0.000018 Epoch: 35 Global Step: 730490 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:40:18,874-Speed 2495.90 samples/sec Loss 1.1148 LearningRate 0.000018 Epoch: 35 Global Step: 730500 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:40:27,026-Speed 2512.73 samples/sec Loss 1.1227 LearningRate 0.000018 Epoch: 35 Global Step: 730510 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:40:35,228-Speed 2497.79 samples/sec Loss 1.1362 LearningRate 0.000018 Epoch: 35 Global Step: 730520 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:40:43,430-Speed 2497.45 samples/sec Loss 1.1267 LearningRate 0.000018 Epoch: 35 Global Step: 730530 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:40:51,630-Speed 2497.92 samples/sec Loss 1.1159 LearningRate 0.000018 Epoch: 35 Global Step: 730540 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-07-12 13:40:59,834-Speed 
2496.72 samples/sec Loss 1.0988 LearningRate 0.000018 Epoch: 35 Global Step: 730550 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:41:08,045-Speed 2494.56 samples/sec Loss 1.1424 LearningRate 0.000018 Epoch: 35 Global Step: 730560 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:41:16,190-Speed 2514.82 samples/sec Loss 1.1163 LearningRate 0.000018 Epoch: 35 Global Step: 730570 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:41:24,389-Speed 2498.66 samples/sec Loss 1.1049 LearningRate 0.000018 Epoch: 35 Global Step: 730580 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:41:32,596-Speed 2495.88 samples/sec Loss 1.1443 LearningRate 0.000018 Epoch: 35 Global Step: 730590 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:41:40,819-Speed 2491.24 samples/sec Loss 1.1285 LearningRate 0.000018 Epoch: 35 Global Step: 730600 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:41:49,036-Speed 2492.64 samples/sec Loss 1.1138 LearningRate 0.000018 Epoch: 35 Global Step: 730610 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:41:57,239-Speed 2497.10 samples/sec Loss 1.1088 LearningRate 0.000018 Epoch: 35 Global Step: 730620 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:42:05,406-Speed 2508.02 samples/sec Loss 1.1326 LearningRate 0.000018 Epoch: 35 Global Step: 730630 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:42:13,605-Speed 2498.35 samples/sec Loss 1.1161 LearningRate 0.000018 Epoch: 35 Global Step: 730640 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:42:21,817-Speed 2494.36 samples/sec Loss 1.1637 LearningRate 0.000018 Epoch: 35 Global Step: 730650 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:42:30,016-Speed 2499.10 samples/sec Loss 1.1332 LearningRate 0.000018 Epoch: 35 Global Step: 730660 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:42:38,216-Speed 2498.08 samples/sec 
Loss 1.1141 LearningRate 0.000018 Epoch: 35 Global Step: 730670 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:42:46,418-Speed 2497.04 samples/sec Loss 1.1002 LearningRate 0.000018 Epoch: 35 Global Step: 730680 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:42:54,566-Speed 2513.96 samples/sec Loss 1.1396 LearningRate 0.000018 Epoch: 35 Global Step: 730690 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:43:02,772-Speed 2496.47 samples/sec Loss 1.1200 LearningRate 0.000018 Epoch: 35 Global Step: 730700 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:43:10,976-Speed 2496.56 samples/sec Loss 1.1337 LearningRate 0.000018 Epoch: 35 Global Step: 730710 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:43:19,179-Speed 2497.23 samples/sec Loss 1.1482 LearningRate 0.000018 Epoch: 35 Global Step: 730720 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:43:27,378-Speed 2498.28 samples/sec Loss 1.1056 LearningRate 0.000018 Epoch: 35 Global Step: 730730 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:43:35,579-Speed 2497.44 samples/sec Loss 1.1224 LearningRate 0.000018 Epoch: 35 Global Step: 730740 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:43:43,736-Speed 2511.32 samples/sec Loss 1.1288 LearningRate 0.000018 Epoch: 35 Global Step: 730750 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:43:51,931-Speed 2499.49 samples/sec Loss 1.1369 LearningRate 0.000018 Epoch: 35 Global Step: 730760 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:44:00,131-Speed 2497.93 samples/sec Loss 1.1048 LearningRate 0.000018 Epoch: 35 Global Step: 730770 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:44:08,337-Speed 2496.24 samples/sec Loss 1.0957 LearningRate 0.000018 Epoch: 35 Global Step: 730780 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:44:16,536-Speed 2498.78 samples/sec Loss 1.1083 
LearningRate 0.000018 Epoch: 35 Global Step: 730790 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:44:24,732-Speed 2498.97 samples/sec Loss 1.1209 LearningRate 0.000017 Epoch: 35 Global Step: 730800 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:44:32,890-Speed 2510.92 samples/sec Loss 1.1110 LearningRate 0.000017 Epoch: 35 Global Step: 730810 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:44:41,116-Speed 2489.88 samples/sec Loss 1.1107 LearningRate 0.000017 Epoch: 35 Global Step: 730820 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:44:49,319-Speed 2497.23 samples/sec Loss 1.1242 LearningRate 0.000017 Epoch: 35 Global Step: 730830 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:44:57,518-Speed 2497.96 samples/sec Loss 1.1335 LearningRate 0.000017 Epoch: 35 Global Step: 730840 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:45:05,722-Speed 2496.74 samples/sec Loss 1.1472 LearningRate 0.000017 Epoch: 35 Global Step: 730850 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:45:13,925-Speed 2497.19 samples/sec Loss 1.1180 LearningRate 0.000017 Epoch: 35 Global Step: 730860 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:45:22,077-Speed 2512.61 samples/sec Loss 1.1149 LearningRate 0.000017 Epoch: 35 Global Step: 730870 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:45:30,281-Speed 2496.67 samples/sec Loss 1.1369 LearningRate 0.000017 Epoch: 35 Global Step: 730880 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:45:38,484-Speed 2497.15 samples/sec Loss 1.1159 LearningRate 0.000017 Epoch: 35 Global Step: 730890 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:45:46,690-Speed 2496.58 samples/sec Loss 1.0903 LearningRate 0.000017 Epoch: 35 Global Step: 730900 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:45:54,890-Speed 2497.96 samples/sec Loss 1.1067 LearningRate 
0.000017 Epoch: 35 Global Step: 730910 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:46:03,088-Speed 2498.47 samples/sec Loss 1.1195 LearningRate 0.000017 Epoch: 35 Global Step: 730920 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:46:11,235-Speed 2514.20 samples/sec Loss 1.1185 LearningRate 0.000017 Epoch: 35 Global Step: 730930 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:46:19,434-Speed 2498.12 samples/sec Loss 1.1522 LearningRate 0.000017 Epoch: 35 Global Step: 730940 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:46:27,636-Speed 2497.75 samples/sec Loss 1.1348 LearningRate 0.000017 Epoch: 35 Global Step: 730950 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:46:35,833-Speed 2498.63 samples/sec Loss 1.0959 LearningRate 0.000017 Epoch: 35 Global Step: 730960 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:46:44,032-Speed 2498.42 samples/sec Loss 1.1350 LearningRate 0.000017 Epoch: 35 Global Step: 730970 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:46:52,242-Speed 2494.74 samples/sec Loss 1.1109 LearningRate 0.000017 Epoch: 35 Global Step: 730980 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:47:00,389-Speed 2514.30 samples/sec Loss 1.1281 LearningRate 0.000017 Epoch: 35 Global Step: 730990 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:47:08,587-Speed 2498.45 samples/sec Loss 1.1244 LearningRate 0.000017 Epoch: 35 Global Step: 731000 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:47:16,786-Speed 2498.30 samples/sec Loss 1.1186 LearningRate 0.000017 Epoch: 35 Global Step: 731010 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:47:24,986-Speed 2498.10 samples/sec Loss 1.1212 LearningRate 0.000017 Epoch: 35 Global Step: 731020 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:47:33,183-Speed 2498.76 samples/sec Loss 1.0970 LearningRate 0.000017 Epoch: 35 
Global Step: 731030 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:47:41,387-Speed 2496.71 samples/sec Loss 1.1348 LearningRate 0.000017 Epoch: 35 Global Step: 731040 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:47:49,536-Speed 2513.37 samples/sec Loss 1.1200 LearningRate 0.000017 Epoch: 35 Global Step: 731050 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:47:57,735-Speed 2498.11 samples/sec Loss 1.1364 LearningRate 0.000017 Epoch: 35 Global Step: 731060 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:48:05,938-Speed 2497.22 samples/sec Loss 1.1145 LearningRate 0.000017 Epoch: 35 Global Step: 731070 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:48:14,137-Speed 2498.26 samples/sec Loss 1.1668 LearningRate 0.000017 Epoch: 35 Global Step: 731080 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:48:22,342-Speed 2496.58 samples/sec Loss 1.0921 LearningRate 0.000017 Epoch: 35 Global Step: 731090 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:48:30,557-Speed 2493.57 samples/sec Loss 1.1218 LearningRate 0.000017 Epoch: 35 Global Step: 731100 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:48:38,705-Speed 2513.74 samples/sec Loss 1.1278 LearningRate 0.000017 Epoch: 35 Global Step: 731110 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:48:46,909-Speed 2496.83 samples/sec Loss 1.1159 LearningRate 0.000017 Epoch: 35 Global Step: 731120 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:48:55,115-Speed 2496.19 samples/sec Loss 1.1051 LearningRate 0.000017 Epoch: 35 Global Step: 731130 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:49:03,315-Speed 2498.12 samples/sec Loss 1.1167 LearningRate 0.000017 Epoch: 35 Global Step: 731140 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:49:11,519-Speed 2496.53 samples/sec Loss 1.1317 LearningRate 0.000017 Epoch: 35 Global Step: 731150 
Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:49:19,722-Speed 2497.52 samples/sec Loss 1.1210 LearningRate 0.000017 Epoch: 35 Global Step: 731160 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:49:27,869-Speed 2514.10 samples/sec Loss 1.1301 LearningRate 0.000017 Epoch: 35 Global Step: 731170 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:49:36,068-Speed 2498.21 samples/sec Loss 1.1575 LearningRate 0.000017 Epoch: 35 Global Step: 731180 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:49:44,266-Speed 2498.46 samples/sec Loss 1.1076 LearningRate 0.000017 Epoch: 35 Global Step: 731190 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:49:52,468-Speed 2497.28 samples/sec Loss 1.1474 LearningRate 0.000017 Epoch: 35 Global Step: 731200 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:50:00,668-Speed 2498.10 samples/sec Loss 1.1479 LearningRate 0.000017 Epoch: 35 Global Step: 731210 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:50:08,881-Speed 2494.07 samples/sec Loss 1.1096 LearningRate 0.000017 Epoch: 35 Global Step: 731220 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:50:17,039-Speed 2510.77 samples/sec Loss 1.1411 LearningRate 0.000017 Epoch: 35 Global Step: 731230 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:50:25,239-Speed 2498.16 samples/sec Loss 1.1297 LearningRate 0.000017 Epoch: 35 Global Step: 731240 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:50:33,439-Speed 2498.09 samples/sec Loss 1.1258 LearningRate 0.000017 Epoch: 35 Global Step: 731250 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:50:41,638-Speed 2498.25 samples/sec Loss 1.1200 LearningRate 0.000017 Epoch: 35 Global Step: 731260 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-07-12 13:50:49,843-Speed 2496.45 samples/sec Loss 1.1364 LearningRate 0.000017 Epoch: 35 Global Step: 731270 Fp16 Grad Scale: 
8192 Required: 23 hours Training: 2022-07-12 13:50:58,057-Speed 2493.74 samples/sec Loss 1.1544 LearningRate 0.000017 Epoch: 35 Global Step: 731280 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:51:06,209-Speed 2512.63 samples/sec Loss 1.1227 LearningRate 0.000017 Epoch: 35 Global Step: 731290 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:51:14,411-Speed 2497.54 samples/sec Loss 1.1062 LearningRate 0.000017 Epoch: 35 Global Step: 731300 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:51:22,610-Speed 2498.23 samples/sec Loss 1.1224 LearningRate 0.000017 Epoch: 35 Global Step: 731310 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:51:30,820-Speed 2494.75 samples/sec Loss 1.1117 LearningRate 0.000017 Epoch: 35 Global Step: 731320 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:51:39,033-Speed 2494.21 samples/sec Loss 1.1455 LearningRate 0.000017 Epoch: 35 Global Step: 731330 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:51:47,236-Speed 2496.98 samples/sec Loss 1.1460 LearningRate 0.000017 Epoch: 35 Global Step: 731340 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:51:55,388-Speed 2512.76 samples/sec Loss 1.1157 LearningRate 0.000017 Epoch: 35 Global Step: 731350 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:52:03,589-Speed 2497.72 samples/sec Loss 1.0947 LearningRate 0.000017 Epoch: 35 Global Step: 731360 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:52:11,787-Speed 2499.38 samples/sec Loss 1.1389 LearningRate 0.000017 Epoch: 35 Global Step: 731370 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:52:19,987-Speed 2497.95 samples/sec Loss 1.1397 LearningRate 0.000017 Epoch: 35 Global Step: 731380 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:52:28,187-Speed 2498.19 samples/sec Loss 1.1248 LearningRate 0.000017 Epoch: 35 Global Step: 731390 Fp16 Grad Scale: 8192 Required: 22 
hours Training: 2022-07-12 13:52:36,391-Speed 2496.72 samples/sec Loss 1.1218 LearningRate 0.000017 Epoch: 35 Global Step: 731400 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:52:44,540-Speed 2513.62 samples/sec Loss 1.1151 LearningRate 0.000017 Epoch: 35 Global Step: 731410 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:52:52,742-Speed 2497.34 samples/sec Loss 1.1449 LearningRate 0.000017 Epoch: 35 Global Step: 731420 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:53:00,941-Speed 2497.89 samples/sec Loss 1.1310 LearningRate 0.000017 Epoch: 35 Global Step: 731430 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:53:09,143-Speed 2497.62 samples/sec Loss 1.0920 LearningRate 0.000017 Epoch: 35 Global Step: 731440 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:53:17,354-Speed 2494.62 samples/sec Loss 1.1345 LearningRate 0.000017 Epoch: 35 Global Step: 731450 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:53:25,648-Speed 2469.40 samples/sec Loss 1.1180 LearningRate 0.000017 Epoch: 35 Global Step: 731460 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:53:33,797-Speed 2513.59 samples/sec Loss 1.1352 LearningRate 0.000017 Epoch: 35 Global Step: 731470 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:53:41,999-Speed 2497.66 samples/sec Loss 1.1331 LearningRate 0.000017 Epoch: 35 Global Step: 731480 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:53:50,202-Speed 2496.98 samples/sec Loss 1.1477 LearningRate 0.000017 Epoch: 35 Global Step: 731490 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:53:58,403-Speed 2497.99 samples/sec Loss 1.1325 LearningRate 0.000017 Epoch: 35 Global Step: 731500 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:54:06,603-Speed 2497.97 samples/sec Loss 1.1183 LearningRate 0.000017 Epoch: 35 Global Step: 731510 Fp16 Grad Scale: 8192 Required: 22 hours Training: 
2022-07-12 13:54:14,805-Speed 2497.26 samples/sec Loss 1.1228 LearningRate 0.000017 Epoch: 35 Global Step: 731520 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:54:22,952-Speed 2514.15 samples/sec Loss 1.1253 LearningRate 0.000017 Epoch: 35 Global Step: 731530 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:54:31,153-Speed 2497.39 samples/sec Loss 1.1103 LearningRate 0.000017 Epoch: 35 Global Step: 731540 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:54:39,356-Speed 2497.30 samples/sec Loss 1.1359 LearningRate 0.000017 Epoch: 35 Global Step: 731550 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-07-12 13:54:47,557-Speed 2497.72 samples/sec Loss 1.1406 LearningRate 0.000017 Epoch: 35 Global Step: 731560 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:54:55,760-Speed 2497.05 samples/sec Loss 1.1697 LearningRate 0.000017 Epoch: 35 Global Step: 731570 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:55:03,968-Speed 2495.68 samples/sec Loss 1.1099 LearningRate 0.000017 Epoch: 35 Global Step: 731580 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:55:12,115-Speed 2514.18 samples/sec Loss 1.1179 LearningRate 0.000017 Epoch: 35 Global Step: 731590 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:55:20,324-Speed 2495.20 samples/sec Loss 1.1348 LearningRate 0.000017 Epoch: 35 Global Step: 731600 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:55:28,526-Speed 2497.60 samples/sec Loss 1.0725 LearningRate 0.000017 Epoch: 35 Global Step: 731610 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:55:36,726-Speed 2498.07 samples/sec Loss 1.1121 LearningRate 0.000017 Epoch: 35 Global Step: 731620 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:55:44,940-Speed 2493.61 samples/sec Loss 1.1161 LearningRate 0.000017 Epoch: 35 Global Step: 731630 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
13:55:53,140-Speed 2497.79 samples/sec Loss 1.1366 LearningRate 0.000017 Epoch: 35 Global Step: 731640 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:56:01,294-Speed 2512.30 samples/sec Loss 1.1160 LearningRate 0.000017 Epoch: 35 Global Step: 731650 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:56:09,506-Speed 2494.21 samples/sec Loss 1.1128 LearningRate 0.000017 Epoch: 35 Global Step: 731660 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:56:17,707-Speed 2497.54 samples/sec Loss 1.1170 LearningRate 0.000017 Epoch: 35 Global Step: 731670 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:56:25,915-Speed 2495.68 samples/sec Loss 1.1463 LearningRate 0.000017 Epoch: 35 Global Step: 731680 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:56:34,116-Speed 2497.63 samples/sec Loss 1.1520 LearningRate 0.000017 Epoch: 35 Global Step: 731690 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:56:42,318-Speed 2497.28 samples/sec Loss 1.1154 LearningRate 0.000017 Epoch: 35 Global Step: 731700 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:56:50,471-Speed 2512.39 samples/sec Loss 1.1259 LearningRate 0.000017 Epoch: 35 Global Step: 731710 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:56:58,670-Speed 2498.15 samples/sec Loss 1.1194 LearningRate 0.000017 Epoch: 35 Global Step: 731720 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:57:06,873-Speed 2497.20 samples/sec Loss 1.1221 LearningRate 0.000017 Epoch: 35 Global Step: 731730 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:57:15,075-Speed 2497.19 samples/sec Loss 1.1420 LearningRate 0.000017 Epoch: 35 Global Step: 731740 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:57:23,276-Speed 2497.61 samples/sec Loss 1.1438 LearningRate 0.000017 Epoch: 35 Global Step: 731750 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
13:57:31,477-Speed 2497.65 samples/sec Loss 1.1456 LearningRate 0.000017 Epoch: 35 Global Step: 731760 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:57:39,628-Speed 2513.03 samples/sec Loss 1.1161 LearningRate 0.000017 Epoch: 35 Global Step: 731770 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:57:47,830-Speed 2497.21 samples/sec Loss 1.1801 LearningRate 0.000017 Epoch: 35 Global Step: 731780 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:57:56,039-Speed 2495.25 samples/sec Loss 1.1403 LearningRate 0.000017 Epoch: 35 Global Step: 731790 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:58:04,241-Speed 2497.26 samples/sec Loss 1.1127 LearningRate 0.000017 Epoch: 35 Global Step: 731800 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:58:12,446-Speed 2496.64 samples/sec Loss 1.1376 LearningRate 0.000017 Epoch: 35 Global Step: 731810 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:58:20,651-Speed 2496.38 samples/sec Loss 1.1293 LearningRate 0.000017 Epoch: 35 Global Step: 731820 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:58:28,803-Speed 2512.54 samples/sec Loss 1.1491 LearningRate 0.000017 Epoch: 35 Global Step: 731830 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:58:37,008-Speed 2496.46 samples/sec Loss 1.1110 LearningRate 0.000017 Epoch: 35 Global Step: 731840 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:58:45,211-Speed 2496.94 samples/sec Loss 1.1170 LearningRate 0.000017 Epoch: 35 Global Step: 731850 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:58:53,444-Speed 2488.00 samples/sec Loss 1.1366 LearningRate 0.000017 Epoch: 35 Global Step: 731860 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:59:01,651-Speed 2495.95 samples/sec Loss 1.1314 LearningRate 0.000017 Epoch: 35 Global Step: 731870 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
13:59:09,856-Speed 2496.41 samples/sec Loss 1.1424 LearningRate 0.000017 Epoch: 35 Global Step: 731880 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:59:18,022-Speed 2508.33 samples/sec Loss 1.1576 LearningRate 0.000017 Epoch: 35 Global Step: 731890 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:59:26,223-Speed 2497.66 samples/sec Loss 1.1632 LearningRate 0.000017 Epoch: 35 Global Step: 731900 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:59:34,423-Speed 2497.80 samples/sec Loss 1.1309 LearningRate 0.000017 Epoch: 35 Global Step: 731910 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:59:42,627-Speed 2496.61 samples/sec Loss 1.1579 LearningRate 0.000017 Epoch: 35 Global Step: 731920 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:59:50,829-Speed 2497.34 samples/sec Loss 1.1643 LearningRate 0.000017 Epoch: 35 Global Step: 731930 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 13:59:59,031-Speed 2497.55 samples/sec Loss 1.1407 LearningRate 0.000017 Epoch: 35 Global Step: 731940 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:00:07,193-Speed 2509.38 samples/sec Loss 1.0889 LearningRate 0.000017 Epoch: 35 Global Step: 731950 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:00:15,393-Speed 2498.03 samples/sec Loss 1.1206 LearningRate 0.000017 Epoch: 35 Global Step: 731960 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:00:23,607-Speed 2493.87 samples/sec Loss 1.1061 LearningRate 0.000017 Epoch: 35 Global Step: 731970 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:00:31,808-Speed 2497.37 samples/sec Loss 1.1152 LearningRate 0.000017 Epoch: 35 Global Step: 731980 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:00:40,010-Speed 2497.31 samples/sec Loss 1.1301 LearningRate 0.000017 Epoch: 35 Global Step: 731990 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:00:48,214-Speed 2496.83 samples/sec Loss 1.1529 LearningRate 0.000017 Epoch: 35 Global Step: 732000 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:00:56,375-Speed 2509.99 samples/sec Loss 1.1446 LearningRate 0.000017 Epoch: 35 Global Step: 732010 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:01:04,576-Speed 2497.45 samples/sec Loss 1.1231 LearningRate 0.000017 Epoch: 35 Global Step: 732020 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:01:12,777-Speed 2497.73 samples/sec Loss 1.1062 LearningRate 0.000017 Epoch: 35 Global Step: 732030 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:01:20,980-Speed 2497.01 samples/sec Loss 1.1045 LearningRate 0.000017 Epoch: 35 Global Step: 732040 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:01:29,188-Speed 2495.54 samples/sec Loss 1.1652 LearningRate 0.000017 Epoch: 35 Global Step: 732050 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:01:37,390-Speed 2497.28 samples/sec Loss 1.1135 LearningRate 0.000017 Epoch: 35 Global Step: 732060 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:01:45,538-Speed 2513.97 samples/sec Loss 1.1176 LearningRate 0.000017 Epoch: 35 Global Step: 732070 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:01:53,741-Speed 2497.12 samples/sec Loss 1.1383 LearningRate 0.000017 Epoch: 35 Global Step: 732080 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:02:01,950-Speed 2495.38 samples/sec Loss 1.1481 LearningRate 0.000017 Epoch: 35 Global Step: 732090 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:02:10,158-Speed 2495.56 samples/sec Loss 1.1097 LearningRate 0.000017 Epoch: 35 Global Step: 732100 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:02:18,361-Speed 2496.93 samples/sec Loss 1.1416 LearningRate 0.000017 Epoch: 35 Global Step: 732110 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:02:26,563-Speed 2497.50 samples/sec Loss 1.1046 LearningRate 0.000017 Epoch: 35 Global Step: 732120 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:02:34,713-Speed 2513.23 samples/sec Loss 1.1351 LearningRate 0.000017 Epoch: 35 Global Step: 732130 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:02:42,916-Speed 2497.15 samples/sec Loss 1.0999 LearningRate 0.000017 Epoch: 35 Global Step: 732140 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:02:51,118-Speed 2497.36 samples/sec Loss 1.1469 LearningRate 0.000017 Epoch: 35 Global Step: 732150 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:02:59,325-Speed 2495.90 samples/sec Loss 1.1260 LearningRate 0.000017 Epoch: 35 Global Step: 732160 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:03:07,527-Speed 2497.23 samples/sec Loss 1.1097 LearningRate 0.000017 Epoch: 35 Global Step: 732170 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:03:15,734-Speed 2495.79 samples/sec Loss 1.1264 LearningRate 0.000017 Epoch: 35 Global Step: 732180 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:03:23,880-Speed 2514.40 samples/sec Loss 1.1277 LearningRate 0.000017 Epoch: 35 Global Step: 732190 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:03:32,081-Speed 2497.76 samples/sec Loss 1.1329 LearningRate 0.000017 Epoch: 35 Global Step: 732200 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:03:40,284-Speed 2496.87 samples/sec Loss 1.1489 LearningRate 0.000017 Epoch: 35 Global Step: 732210 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:03:48,484-Speed 2497.94 samples/sec Loss 1.1314 LearningRate 0.000017 Epoch: 35 Global Step: 732220 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:03:56,683-Speed 2498.37 samples/sec Loss 1.1337 LearningRate 0.000017 Epoch: 35 Global Step: 732230 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:04:04,885-Speed 2497.33 samples/sec Loss 1.1379 LearningRate 0.000017 Epoch: 35 Global Step: 732240 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:04:13,035-Speed 2513.42 samples/sec Loss 1.1162 LearningRate 0.000017 Epoch: 35 Global Step: 732250 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:04:21,236-Speed 2497.48 samples/sec Loss 1.1352 LearningRate 0.000017 Epoch: 35 Global Step: 732260 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:04:29,453-Speed 2492.92 samples/sec Loss 1.1257 LearningRate 0.000017 Epoch: 35 Global Step: 732270 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:04:37,655-Speed 2497.10 samples/sec Loss 1.1269 LearningRate 0.000017 Epoch: 35 Global Step: 732280 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:04:45,858-Speed 2497.19 samples/sec Loss 1.0941 LearningRate 0.000017 Epoch: 35 Global Step: 732290 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:04:54,062-Speed 2496.77 samples/sec Loss 1.1299 LearningRate 0.000017 Epoch: 35 Global Step: 732300 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:05:02,210-Speed 2513.81 samples/sec Loss 1.1114 LearningRate 0.000017 Epoch: 35 Global Step: 732310 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:05:10,415-Speed 2496.62 samples/sec Loss 1.1328 LearningRate 0.000017 Epoch: 35 Global Step: 732320 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:05:18,614-Speed 2498.20 samples/sec Loss 1.0999 LearningRate 0.000017 Epoch: 35 Global Step: 732330 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:05:26,816-Speed 2497.25 samples/sec Loss 1.0811 LearningRate 0.000017 Epoch: 35 Global Step: 732340 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:05:35,019-Speed 2497.17 samples/sec Loss 1.1155 LearningRate 0.000017 Epoch: 35 Global Step: 732350 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:05:43,222-Speed 2497.04 samples/sec Loss 1.1072 LearningRate 0.000017 Epoch: 35 Global Step: 732360 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:05:51,373-Speed 2512.72 samples/sec Loss 1.1215 LearningRate 0.000017 Epoch: 35 Global Step: 732370 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:05:59,582-Speed 2495.08 samples/sec Loss 1.1243 LearningRate 0.000017 Epoch: 35 Global Step: 732380 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:06:07,783-Speed 2497.69 samples/sec Loss 1.1044 LearningRate 0.000017 Epoch: 35 Global Step: 732390 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:06:15,983-Speed 2498.18 samples/sec Loss 1.1087 LearningRate 0.000017 Epoch: 35 Global Step: 732400 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:06:24,185-Speed 2497.10 samples/sec Loss 1.1170 LearningRate 0.000017 Epoch: 35 Global Step: 732410 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:06:32,389-Speed 2496.88 samples/sec Loss 1.0983 LearningRate 0.000017 Epoch: 35 Global Step: 732420 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:06:40,535-Speed 2514.27 samples/sec Loss 1.1428 LearningRate 0.000017 Epoch: 35 Global Step: 732430 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:06:48,734-Speed 2498.52 samples/sec Loss 1.0827 LearningRate 0.000017 Epoch: 35 Global Step: 732440 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:06:56,935-Speed 2497.62 samples/sec Loss 1.0974 LearningRate 0.000017 Epoch: 35 Global Step: 732450 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:07:05,133-Speed 2498.89 samples/sec Loss 1.1031 LearningRate 0.000017 Epoch: 35 Global Step: 732460 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:07:13,337-Speed 2496.66 samples/sec Loss 1.1375 LearningRate 0.000017 Epoch: 35 Global Step: 732470 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:07:21,540-Speed 2496.84 samples/sec Loss 1.1485 LearningRate 0.000017 Epoch: 35 Global Step: 732480 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:07:29,687-Speed 2514.38 samples/sec Loss 1.1205 LearningRate 0.000017 Epoch: 35 Global Step: 732490 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:07:37,894-Speed 2495.78 samples/sec Loss 1.0985 LearningRate 0.000017 Epoch: 35 Global Step: 732500 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:07:46,103-Speed 2495.37 samples/sec Loss 1.1383 LearningRate 0.000017 Epoch: 35 Global Step: 732510 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:07:54,309-Speed 2496.40 samples/sec Loss 1.1259 LearningRate 0.000017 Epoch: 35 Global Step: 732520 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:08:02,511-Speed 2497.43 samples/sec Loss 1.0984 LearningRate 0.000017 Epoch: 35 Global Step: 732530 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:08:10,716-Speed 2496.48 samples/sec Loss 1.1311 LearningRate 0.000017 Epoch: 35 Global Step: 732540 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:08:18,860-Speed 2514.95 samples/sec Loss 1.1345 LearningRate 0.000017 Epoch: 35 Global Step: 732550 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:08:27,063-Speed 2497.04 samples/sec Loss 1.1326 LearningRate 0.000017 Epoch: 35 Global Step: 732560 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:08:35,264-Speed 2497.70 samples/sec Loss 1.1148 LearningRate 0.000017 Epoch: 35 Global Step: 732570 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:08:43,465-Speed 2497.75 samples/sec Loss 1.1348 LearningRate 0.000017 Epoch: 35 Global Step: 732580 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:08:51,671-Speed 2496.33 samples/sec Loss 1.1385 LearningRate 0.000017 Epoch: 35 Global Step: 732590 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:08:59,869-Speed 2498.90 samples/sec Loss 1.1201 LearningRate 0.000017 Epoch: 35 Global Step: 732600 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:09:08,017-Speed 2513.94 samples/sec Loss 1.1400 LearningRate 0.000017 Epoch: 35 Global Step: 732610 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:09:16,217-Speed 2498.01 samples/sec Loss 1.1230 LearningRate 0.000017 Epoch: 35 Global Step: 732620 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:09:24,417-Speed 2497.96 samples/sec Loss 1.1224 LearningRate 0.000017 Epoch: 35 Global Step: 732630 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:09:32,619-Speed 2497.33 samples/sec Loss 1.1053 LearningRate 0.000017 Epoch: 35 Global Step: 732640 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:09:40,821-Speed 2497.37 samples/sec Loss 1.1120 LearningRate 0.000017 Epoch: 35 Global Step: 732650 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:09:49,020-Speed 2498.11 samples/sec Loss 1.1230 LearningRate 0.000017 Epoch: 35 Global Step: 732660 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:09:57,169-Speed 2513.90 samples/sec Loss 1.1349 LearningRate 0.000017 Epoch: 35 Global Step: 732670 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:10:05,380-Speed 2494.52 samples/sec Loss 1.0936 LearningRate 0.000017 Epoch: 35 Global Step: 732680 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:10:13,581-Speed 2497.76 samples/sec Loss 1.1417 LearningRate 0.000017 Epoch: 35 Global Step: 732690 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:10:21,783-Speed 2497.00 samples/sec Loss 1.0912 LearningRate 0.000017 Epoch: 35 Global Step: 732700 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:10:29,986-Speed 2497.08 samples/sec Loss 1.1271 LearningRate 0.000017 Epoch: 35 Global Step: 732710 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:10:38,188-Speed 2497.35 samples/sec Loss 1.1034 LearningRate 0.000017 Epoch: 35 Global Step: 732720 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:10:46,333-Speed 2514.74 samples/sec Loss 1.1167 LearningRate 0.000017 Epoch: 35 Global Step: 732730 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:10:54,534-Speed 2497.74 samples/sec Loss 1.1388 LearningRate 0.000017 Epoch: 35 Global Step: 732740 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:11:02,731-Speed 2498.99 samples/sec Loss 1.1247 LearningRate 0.000017 Epoch: 35 Global Step: 732750 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:11:10,931-Speed 2497.84 samples/sec Loss 1.1050 LearningRate 0.000017 Epoch: 35 Global Step: 732760 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:11:19,132-Speed 2497.72 samples/sec Loss 1.1282 LearningRate 0.000017 Epoch: 35 Global Step: 732770 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:11:27,333-Speed 2497.64 samples/sec Loss 1.1033 LearningRate 0.000017 Epoch: 35 Global Step: 732780 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:11:35,491-Speed 2510.61 samples/sec Loss 1.1163 LearningRate 0.000017 Epoch: 35 Global Step: 732790 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:11:43,699-Speed 2495.47 samples/sec Loss 1.1069 LearningRate 0.000017 Epoch: 35 Global Step: 732800 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:11:51,901-Speed 2497.54 samples/sec Loss 1.1089 LearningRate 0.000017 Epoch: 35 Global Step: 732810 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:12:00,106-Speed 2496.63 samples/sec Loss 1.1118 LearningRate 0.000017 Epoch: 35 Global Step: 732820 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:12:08,307-Speed 2497.54 samples/sec Loss 1.1060 LearningRate 0.000017 Epoch: 35 Global Step: 732830 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 
14:12:16,508-Speed 2497.55 samples/sec Loss 1.1365 LearningRate 0.000017 Epoch: 35 Global Step: 732840 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:12:24,657-Speed 2513.65 samples/sec Loss 1.1166 LearningRate 0.000017 Epoch: 35 Global Step: 732850 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:12:32,860-Speed 2497.09 samples/sec Loss 1.1250 LearningRate 0.000017 Epoch: 35 Global Step: 732860 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:12:41,063-Speed 2496.90 samples/sec Loss 1.0823 LearningRate 0.000017 Epoch: 35 Global Step: 732870 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:12:49,264-Speed 2497.90 samples/sec Loss 1.0815 LearningRate 0.000017 Epoch: 35 Global Step: 732880 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:12:57,462-Speed 2498.47 samples/sec Loss 1.1573 LearningRate 0.000017 Epoch: 35 Global Step: 732890 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:13:05,660-Speed 2498.71 samples/sec Loss 1.1101 LearningRate 0.000017 Epoch: 35 Global Step: 732900 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:13:13,810-Speed 2513.16 samples/sec Loss 1.1291 LearningRate 0.000017 Epoch: 35 Global Step: 732910 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:13:22,010-Speed 2498.16 samples/sec Loss 1.1368 LearningRate 0.000017 Epoch: 35 Global Step: 732920 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:13:30,211-Speed 2497.41 samples/sec Loss 1.1092 LearningRate 0.000017 Epoch: 35 Global Step: 732930 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:13:38,415-Speed 2496.87 samples/sec Loss 1.1277 LearningRate 0.000017 Epoch: 35 Global Step: 732940 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:13:46,616-Speed 2498.06 samples/sec Loss 1.1253 LearningRate 0.000017 Epoch: 35 Global Step: 732950 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 
14:13:54,817-Speed 2497.49 samples/sec Loss 1.1387 LearningRate 0.000017 Epoch: 35 Global Step: 732960 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:14:02,964-Speed 2514.05 samples/sec Loss 1.1246 LearningRate 0.000017 Epoch: 35 Global Step: 732970 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:14:11,163-Speed 2498.25 samples/sec Loss 1.1362 LearningRate 0.000017 Epoch: 35 Global Step: 732980 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:14:19,367-Speed 2496.87 samples/sec Loss 1.1244 LearningRate 0.000017 Epoch: 35 Global Step: 732990 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:14:27,565-Speed 2498.51 samples/sec Loss 1.1187 LearningRate 0.000017 Epoch: 35 Global Step: 733000 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:14:35,767-Speed 2497.53 samples/sec Loss 1.1154 LearningRate 0.000017 Epoch: 35 Global Step: 733010 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:14:43,971-Speed 2496.81 samples/sec Loss 1.1228 LearningRate 0.000017 Epoch: 35 Global Step: 733020 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:14:52,123-Speed 2512.67 samples/sec Loss 1.1372 LearningRate 0.000017 Epoch: 35 Global Step: 733030 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:15:00,329-Speed 2496.06 samples/sec Loss 1.1033 LearningRate 0.000017 Epoch: 35 Global Step: 733040 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:15:08,531-Speed 2497.37 samples/sec Loss 1.1350 LearningRate 0.000017 Epoch: 35 Global Step: 733050 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:15:16,734-Speed 2497.00 samples/sec Loss 1.1010 LearningRate 0.000017 Epoch: 35 Global Step: 733060 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:15:24,938-Speed 2496.85 samples/sec Loss 1.1267 LearningRate 0.000017 Epoch: 35 Global Step: 733070 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 
14:15:33,136-Speed 2498.52 samples/sec Loss 1.1530 LearningRate 0.000017 Epoch: 35 Global Step: 733080 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:15:41,288-Speed 2512.81 samples/sec Loss 1.1230 LearningRate 0.000017 Epoch: 35 Global Step: 733090 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:15:49,489-Speed 2497.47 samples/sec Loss 1.1422 LearningRate 0.000017 Epoch: 35 Global Step: 733100 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-07-12 14:15:57,646-Speed 2511.53 samples/sec Loss 1.1291 LearningRate 0.000017 Epoch: 35 Global Step: 733110 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:16:05,847-Speed 2497.82 samples/sec Loss 1.1283 LearningRate 0.000017 Epoch: 35 Global Step: 733120 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:16:14,061-Speed 2493.92 samples/sec Loss 1.1114 LearningRate 0.000017 Epoch: 35 Global Step: 733130 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:16:22,261-Speed 2497.61 samples/sec Loss 1.1464 LearningRate 0.000017 Epoch: 35 Global Step: 733140 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:16:30,414-Speed 2512.65 samples/sec Loss 1.1209 LearningRate 0.000017 Epoch: 35 Global Step: 733150 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:16:38,619-Speed 2496.33 samples/sec Loss 1.1345 LearningRate 0.000017 Epoch: 35 Global Step: 733160 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:16:46,822-Speed 2496.93 samples/sec Loss 1.1582 LearningRate 0.000017 Epoch: 35 Global Step: 733170 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:16:55,031-Speed 2495.29 samples/sec Loss 1.0954 LearningRate 0.000017 Epoch: 35 Global Step: 733180 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:17:03,232-Speed 2497.74 samples/sec Loss 1.1259 LearningRate 0.000017 Epoch: 35 Global Step: 733190 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:17:11,435-Speed 2497.12 samples/sec Loss 1.1600 LearningRate 0.000017 Epoch: 35 Global Step: 733200 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:17:19,584-Speed 2513.44 samples/sec Loss 1.1407 LearningRate 0.000017 Epoch: 35 Global Step: 733210 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:17:27,788-Speed 2496.59 samples/sec Loss 1.1108 LearningRate 0.000017 Epoch: 35 Global Step: 733220 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:17:36,009-Speed 2491.77 samples/sec Loss 1.1271 LearningRate 0.000017 Epoch: 35 Global Step: 733230 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:17:44,208-Speed 2498.24 samples/sec Loss 1.1157 LearningRate 0.000017 Epoch: 35 Global Step: 733240 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:17:52,412-Speed 2496.66 samples/sec Loss 1.1138 LearningRate 0.000017 Epoch: 35 Global Step: 733250 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:18:00,615-Speed 2497.24 samples/sec Loss 1.1176 LearningRate 0.000017 Epoch: 35 Global Step: 733260 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:18:08,763-Speed 2513.96 samples/sec Loss 1.1438 LearningRate 0.000017 Epoch: 35 Global Step: 733270 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:18:16,961-Speed 2498.49 samples/sec Loss 1.1273 LearningRate 0.000017 Epoch: 35 Global Step: 733280 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:18:25,165-Speed 2496.94 samples/sec Loss 1.1268 LearningRate 0.000017 Epoch: 35 Global Step: 733290 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:18:33,368-Speed 2496.78 samples/sec Loss 1.1393 LearningRate 0.000017 Epoch: 35 Global Step: 733300 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:18:41,568-Speed 2498.32 samples/sec Loss 1.0983 LearningRate 0.000017 Epoch: 35 Global Step: 733310 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:18:49,771-Speed 2497.26 samples/sec Loss 1.1128 LearningRate 0.000017 Epoch: 35 Global Step: 733320 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:18:57,919-Speed 2513.75 samples/sec Loss 1.1250 LearningRate 0.000017 Epoch: 35 Global Step: 733330 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:19:06,134-Speed 2493.63 samples/sec Loss 1.1110 LearningRate 0.000017 Epoch: 35 Global Step: 733340 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:19:14,335-Speed 2497.39 samples/sec Loss 1.1250 LearningRate 0.000017 Epoch: 35 Global Step: 733350 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:19:22,538-Speed 2497.40 samples/sec Loss 1.1474 LearningRate 0.000017 Epoch: 35 Global Step: 733360 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:19:30,742-Speed 2496.84 samples/sec Loss 1.1080 LearningRate 0.000017 Epoch: 35 Global Step: 733370 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:19:38,948-Speed 2496.58 samples/sec Loss 1.1427 LearningRate 0.000017 Epoch: 35 Global Step: 733380 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:19:47,098-Speed 2513.32 samples/sec Loss 1.1371 LearningRate 0.000017 Epoch: 35 Global Step: 733390 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:19:55,301-Speed 2497.17 samples/sec Loss 1.1104 LearningRate 0.000017 Epoch: 35 Global Step: 733400 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:20:03,504-Speed 2497.11 samples/sec Loss 1.0859 LearningRate 0.000017 Epoch: 35 Global Step: 733410 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:20:11,707-Speed 2497.00 samples/sec Loss 1.1242 LearningRate 0.000017 Epoch: 35 Global Step: 733420 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:20:19,907-Speed 2497.80 samples/sec Loss 1.1111 LearningRate 0.000017 Epoch: 35 Global Step: 733430 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:20:28,109-Speed 2497.41 samples/sec Loss 1.1569 LearningRate 0.000017 Epoch: 35 Global Step: 733440 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:20:36,265-Speed 2511.39 samples/sec Loss 1.1109 LearningRate 0.000017 Epoch: 35 Global Step: 733450 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:20:44,472-Speed 2496.14 samples/sec Loss 1.1321 LearningRate 0.000017 Epoch: 35 Global Step: 733460 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:20:52,673-Speed 2497.48 samples/sec Loss 1.1238 LearningRate 0.000017 Epoch: 35 Global Step: 733470 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:21:00,878-Speed 2496.93 samples/sec Loss 1.1041 LearningRate 0.000017 Epoch: 35 Global Step: 733480 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:21:09,079-Speed 2497.31 samples/sec Loss 1.1643 LearningRate 0.000017 Epoch: 35 Global Step: 733490 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:21:17,283-Speed 2496.97 samples/sec Loss 1.1297 LearningRate 0.000017 Epoch: 35 Global Step: 733500 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:21:25,428-Speed 2514.57 samples/sec Loss 1.1117 LearningRate 0.000017 Epoch: 35 Global Step: 733510 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:21:33,629-Speed 2497.70 samples/sec Loss 1.1463 LearningRate 0.000017 Epoch: 35 Global Step: 733520 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:21:41,828-Speed 2498.34 samples/sec Loss 1.0892 LearningRate 0.000017 Epoch: 35 Global Step: 733530 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:21:50,029-Speed 2497.60 samples/sec Loss 1.1194 LearningRate 0.000017 Epoch: 35 Global Step: 733540 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:21:58,230-Speed 2497.69 samples/sec Loss 1.1786 LearningRate 0.000017 Epoch: 35 Global Step: 733550 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:22:06,429-Speed 2498.09 samples/sec Loss 1.1238 LearningRate 0.000017 Epoch: 35 Global Step: 733560 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:22:14,580-Speed 2513.16 samples/sec Loss 1.0968 LearningRate 0.000017 Epoch: 35 Global Step: 733570 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:22:22,781-Speed 2497.36 samples/sec Loss 1.1154 LearningRate 0.000017 Epoch: 35 Global Step: 733580 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:22:30,985-Speed 2496.93 samples/sec Loss 1.1145 LearningRate 0.000017 Epoch: 35 Global Step: 733590 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:22:39,189-Speed 2496.53 samples/sec Loss 1.1574 LearningRate 0.000017 Epoch: 35 Global Step: 733600 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:22:47,390-Speed 2497.68 samples/sec Loss 1.1163 LearningRate 0.000017 Epoch: 35 Global Step: 733610 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:22:55,592-Speed 2497.47 samples/sec Loss 1.1007 LearningRate 0.000017 Epoch: 35 Global Step: 733620 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:23:03,741-Speed 2513.66 samples/sec Loss 1.1177 LearningRate 0.000017 Epoch: 35 Global Step: 733630 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:23:11,941-Speed 2497.83 samples/sec Loss 1.1200 LearningRate 0.000017 Epoch: 35 Global Step: 733640 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:23:20,143-Speed 2497.27 samples/sec Loss 1.1147 LearningRate 0.000017 Epoch: 35 Global Step: 733650 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:23:28,349-Speed 2496.29 samples/sec Loss 1.0977 LearningRate 0.000016 Epoch: 35 Global Step: 733660 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:23:36,551-Speed 2497.38 samples/sec Loss 1.0987 LearningRate 0.000016 Epoch: 35 Global Step: 733670 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:23:44,754-Speed 2496.92 samples/sec Loss 1.1420 LearningRate 0.000016 Epoch: 35 Global Step: 733680 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:23:52,908-Speed 2511.85 samples/sec Loss 1.1061 LearningRate 0.000016 Epoch: 35 Global Step: 733690 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:24:01,110-Speed 2497.43 samples/sec Loss 1.0954 LearningRate 0.000016 Epoch: 35 Global Step: 733700 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:24:09,318-Speed 2495.64 samples/sec Loss 1.1467 LearningRate 0.000016 Epoch: 35 Global Step: 733710 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:24:17,520-Speed 2497.19 samples/sec Loss 1.1194 LearningRate 0.000016 Epoch: 35 Global Step: 733720 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:24:25,734-Speed 2493.83 samples/sec Loss 1.1131 LearningRate 0.000016 Epoch: 35 Global Step: 733730 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:24:33,937-Speed 2496.88 samples/sec Loss 1.1004 LearningRate 0.000016 Epoch: 35 Global Step: 733740 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:24:42,087-Speed 2513.33 samples/sec Loss 1.1133 LearningRate 0.000016 Epoch: 35 Global Step: 733750 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:24:50,290-Speed 2497.02 samples/sec Loss 1.0917 LearningRate 0.000016 Epoch: 35 Global Step: 733760 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:24:58,501-Speed 2494.70 samples/sec Loss 1.1204 LearningRate 0.000016 Epoch: 35 Global Step: 733770 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:25:06,702-Speed 2497.39 samples/sec Loss 1.1404 LearningRate 0.000016 Epoch: 35 Global Step: 733780 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:25:14,910-Speed 2495.67 samples/sec Loss 1.1232 LearningRate 0.000016 Epoch: 35 Global Step: 733790 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:25:23,115-Speed 2496.51 samples/sec Loss 1.1119 LearningRate 0.000016 Epoch: 35 Global Step: 733800 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:25:31,262-Speed 2514.16 samples/sec Loss 1.1307 LearningRate 0.000016 Epoch: 35 Global Step: 733810 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:25:39,467-Speed 2496.08 samples/sec Loss 1.1041 LearningRate 0.000016 Epoch: 35 Global Step: 733820 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:25:47,677-Speed 2495.25 samples/sec Loss 1.0994 LearningRate 0.000016 Epoch: 35 Global Step: 733830 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:25:55,876-Speed 2498.22 samples/sec Loss 1.0867 LearningRate 0.000016 Epoch: 35 Global Step: 733840 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:26:04,078-Speed 2497.15 samples/sec Loss 1.1143 LearningRate 0.000016 Epoch: 35 Global Step: 733850 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:26:12,284-Speed 2496.21 samples/sec Loss 1.1260 LearningRate 0.000016 Epoch: 35 Global Step: 733860 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:26:20,437-Speed 2512.37 samples/sec Loss 1.1002 LearningRate 0.000016 Epoch: 35 Global Step: 733870 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:26:28,637-Speed 2497.93 samples/sec Loss 1.1033 LearningRate 0.000016 Epoch: 35 Global Step: 733880 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:26:36,838-Speed 2497.71 samples/sec Loss 1.1313 LearningRate 0.000016 Epoch: 35 Global Step: 733890 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:26:45,040-Speed 2497.32 samples/sec Loss 1.1005 LearningRate 0.000016 Epoch: 35 Global Step: 733900 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:26:53,241-Speed 2497.53 samples/sec Loss 1.1574 LearningRate 0.000016 Epoch: 35 Global Step: 733910 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:27:01,447-Speed 2496.15 samples/sec Loss 1.1330 LearningRate 0.000016 Epoch: 35 Global Step: 733920 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:27:09,599-Speed 2513.48 samples/sec Loss 1.1364 LearningRate 0.000016 Epoch: 35 Global Step: 733930 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:27:17,800-Speed 2497.90 samples/sec Loss 1.1292 LearningRate 0.000016 Epoch: 35 Global Step: 733940 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:27:26,005-Speed 2496.45 samples/sec Loss 1.1356 LearningRate 0.000016 Epoch: 35 Global Step: 733950 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:27:34,207-Speed 2497.42 samples/sec Loss 1.1238 LearningRate 0.000016 Epoch: 35 Global Step: 733960 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:27:42,408-Speed 2497.59 samples/sec Loss 1.0954 LearningRate 0.000016 Epoch: 35 Global Step: 733970 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:27:50,609-Speed 2497.87 samples/sec Loss 1.1139 LearningRate 0.000016 Epoch: 35 Global Step: 733980 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:27:58,755-Speed 2514.32 samples/sec Loss 1.1111 LearningRate 0.000016 Epoch: 35 Global Step: 733990 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:28:06,957-Speed 2497.13 samples/sec Loss 1.1367 LearningRate 0.000016 Epoch: 35 Global Step: 734000 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:28:15,162-Speed 2496.61 samples/sec Loss 1.1476 LearningRate 0.000016 Epoch: 35 Global Step: 734010 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:28:23,360-Speed 2498.51 samples/sec Loss 1.1283 LearningRate 0.000016 Epoch: 35 Global Step: 734020 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:28:31,575-Speed 2493.26 samples/sec Loss 1.1154 LearningRate 0.000016 Epoch: 35 Global Step: 734030 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:28:39,784-Speed 2495.63 samples/sec Loss 1.1004 LearningRate 0.000016 Epoch: 35 Global Step: 734040 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:28:47,934-Speed 2513.19 samples/sec Loss 1.1089 LearningRate 0.000016 Epoch: 35 Global Step: 734050 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:28:56,138-Speed 2496.68 samples/sec Loss 1.1186 LearningRate 0.000016 Epoch: 35 Global Step: 734060 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:29:04,347-Speed 2495.20 samples/sec Loss 1.1478 LearningRate 0.000016 Epoch: 35 Global Step: 734070 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:29:12,546-Speed 2498.28 samples/sec Loss 1.1430 LearningRate 0.000016 Epoch: 35 Global Step: 734080 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:29:20,748-Speed 2497.26 samples/sec Loss 1.1369 LearningRate 0.000016 Epoch: 35 Global Step: 734090 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:29:28,949-Speed 2497.59 samples/sec Loss 1.1735 LearningRate 0.000016 Epoch: 35 Global Step: 734100 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:29:37,105-Speed 2511.40 samples/sec Loss 1.1754 LearningRate 0.000016 Epoch: 35 Global Step: 734110 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:29:45,309-Speed 2496.70 samples/sec Loss 1.1253 LearningRate 0.000016 Epoch: 35 Global Step: 734120 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:29:53,511-Speed 2497.48 samples/sec Loss 1.1060 LearningRate 0.000016 Epoch: 35 Global Step: 734130 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:30:01,711-Speed 2497.82 samples/sec Loss 1.1130 LearningRate 0.000016 Epoch: 35 Global Step: 734140 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:30:09,928-Speed 2492.88 samples/sec Loss 1.1264 LearningRate 0.000016 Epoch: 35 Global Step: 734150 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:30:18,130-Speed 2497.25 samples/sec Loss 1.1244 LearningRate 0.000016 Epoch: 35 Global Step: 734160 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:30:26,282-Speed 2512.98 samples/sec Loss 1.1375 LearningRate 0.000016 Epoch: 35 Global Step: 734170 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:30:34,491-Speed 2495.12 samples/sec Loss 1.1449 LearningRate 0.000016 Epoch: 35 Global Step: 734180 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:30:42,692-Speed 2497.55 samples/sec Loss 1.0868 LearningRate 0.000016 Epoch: 35 Global Step: 734190 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:30:50,892-Speed 2498.05 samples/sec Loss 1.1066 LearningRate 0.000016 Epoch: 35 Global Step: 734200 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:30:59,093-Speed 2497.53 samples/sec Loss 1.1199 LearningRate 0.000016 Epoch: 35 Global Step: 734210 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:31:07,297-Speed 2496.80 samples/sec Loss 1.0979 LearningRate 0.000016 Epoch: 35 Global Step: 734220 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:31:15,449-Speed 2512.59 samples/sec Loss 1.1290 LearningRate 0.000016 Epoch: 35 Global Step: 734230 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:31:23,653-Speed 2496.85 samples/sec Loss 1.0940 LearningRate 0.000016 Epoch: 35 Global Step: 734240 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:31:31,858-Speed 2496.84 samples/sec Loss 1.1435 LearningRate 0.000016 Epoch: 35 Global Step: 734250 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:31:40,064-Speed 2496.02 samples/sec Loss 1.1406 LearningRate 0.000016 Epoch: 35 Global Step: 734260 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 14:31:48,272-Speed 2495.46 samples/sec Loss 1.1111 LearningRate 0.000016 Epoch: 35 Global Step: 734270 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-07-12 
14:31:56,476-Speed 2496.77 samples/sec Loss 1.1429 LearningRate 0.000016 Epoch: 35 Global Step: 734280 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:32:04,628-Speed 2513.00 samples/sec Loss 1.1070 LearningRate 0.000016 Epoch: 35 Global Step: 734290 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:32:12,835-Speed 2495.72 samples/sec Loss 1.1423 LearningRate 0.000016 Epoch: 35 Global Step: 734300 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:32:21,036-Speed 2497.68 samples/sec Loss 1.1180 LearningRate 0.000016 Epoch: 35 Global Step: 734310 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-07-12 14:32:29,203-Speed 2508.19 samples/sec Loss 1.1312 LearningRate 0.000016 Epoch: 35 Global Step: 734320 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:32:37,400-Speed 2499.11 samples/sec Loss 1.1317 LearningRate 0.000016 Epoch: 35 Global Step: 734330 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:32:45,619-Speed 2492.17 samples/sec Loss 1.1323 LearningRate 0.000016 Epoch: 35 Global Step: 734340 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:32:53,766-Speed 2513.88 samples/sec Loss 1.1270 LearningRate 0.000016 Epoch: 35 Global Step: 734350 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:33:01,967-Speed 2497.79 samples/sec Loss 1.1128 LearningRate 0.000016 Epoch: 35 Global Step: 734360 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:33:10,168-Speed 2497.64 samples/sec Loss 1.1351 LearningRate 0.000016 Epoch: 35 Global Step: 734370 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:33:18,369-Speed 2497.64 samples/sec Loss 1.1154 LearningRate 0.000016 Epoch: 35 Global Step: 734380 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:33:26,571-Speed 2497.50 samples/sec Loss 1.1353 LearningRate 0.000016 Epoch: 35 Global Step: 734390 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:33:34,776-Speed 2496.59 samples/sec Loss 1.1196 LearningRate 0.000016 Epoch: 35 Global Step: 734400 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:33:42,926-Speed 2513.15 samples/sec Loss 1.1255 LearningRate 0.000016 Epoch: 35 Global Step: 734410 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:33:51,133-Speed 2495.76 samples/sec Loss 1.1534 LearningRate 0.000016 Epoch: 35 Global Step: 734420 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:33:59,342-Speed 2495.34 samples/sec Loss 1.1415 LearningRate 0.000016 Epoch: 35 Global Step: 734430 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:34:07,543-Speed 2497.83 samples/sec Loss 1.1383 LearningRate 0.000016 Epoch: 35 Global Step: 734440 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:34:15,743-Speed 2497.98 samples/sec Loss 1.1276 LearningRate 0.000016 Epoch: 35 Global Step: 734450 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:34:23,960-Speed 2492.80 samples/sec Loss 1.1273 LearningRate 0.000016 Epoch: 35 Global Step: 734460 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:34:32,111-Speed 2512.79 samples/sec Loss 1.1183 LearningRate 0.000016 Epoch: 35 Global Step: 734470 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:34:40,317-Speed 2496.32 samples/sec Loss 1.1303 LearningRate 0.000016 Epoch: 35 Global Step: 734480 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:34:48,520-Speed 2496.69 samples/sec Loss 1.1094 LearningRate 0.000016 Epoch: 35 Global Step: 734490 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:34:56,724-Speed 2497.35 samples/sec Loss 1.1442 LearningRate 0.000016 Epoch: 35 Global Step: 734500 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:35:04,931-Speed 2495.98 samples/sec Loss 1.1400 LearningRate 0.000016 Epoch: 35 Global Step: 734510 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:35:13,136-Speed 2496.39 samples/sec Loss 1.1426 LearningRate 0.000016 Epoch: 35 Global Step: 734520 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:35:21,289-Speed 2512.50 samples/sec Loss 1.1299 LearningRate 0.000016 Epoch: 35 Global Step: 734530 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:35:29,494-Speed 2496.19 samples/sec Loss 1.1048 LearningRate 0.000016 Epoch: 35 Global Step: 734540 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:35:37,697-Speed 2497.11 samples/sec Loss 1.1112 LearningRate 0.000016 Epoch: 35 Global Step: 734550 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:35:45,897-Speed 2498.00 samples/sec Loss 1.1257 LearningRate 0.000016 Epoch: 35 Global Step: 734560 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:35:54,105-Speed 2495.36 samples/sec Loss 1.1455 LearningRate 0.000016 Epoch: 35 Global Step: 734570 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:36:02,309-Speed 2496.91 samples/sec Loss 1.1322 LearningRate 0.000016 Epoch: 35 Global Step: 734580 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:36:10,461-Speed 2512.88 samples/sec Loss 1.1274 LearningRate 0.000016 Epoch: 35 Global Step: 734590 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:36:18,678-Speed 2492.67 samples/sec Loss 1.1444 LearningRate 0.000016 Epoch: 35 Global Step: 734600 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:36:26,889-Speed 2494.63 samples/sec Loss 1.1139 LearningRate 0.000016 Epoch: 35 Global Step: 734610 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:36:35,095-Speed 2496.30 samples/sec Loss 1.1220 LearningRate 0.000016 Epoch: 35 Global Step: 734620 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:36:43,297-Speed 2497.08 samples/sec Loss 1.1496 LearningRate 0.000016 Epoch: 35 Global Step: 734630 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:36:51,499-Speed 2497.49 samples/sec Loss 1.1238 LearningRate 0.000016 Epoch: 35 Global Step: 734640 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:36:59,669-Speed 2507.11 samples/sec Loss 1.1205 LearningRate 0.000016 Epoch: 35 Global Step: 734650 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:37:07,872-Speed 2496.81 samples/sec Loss 1.1536 LearningRate 0.000016 Epoch: 35 Global Step: 734660 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:37:16,078-Speed 2496.44 samples/sec Loss 1.1098 LearningRate 0.000016 Epoch: 35 Global Step: 734670 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:37:24,280-Speed 2497.15 samples/sec Loss 1.1048 LearningRate 0.000016 Epoch: 35 Global Step: 734680 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:37:32,494-Speed 2493.81 samples/sec Loss 1.1286 LearningRate 0.000016 Epoch: 35 Global Step: 734690 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:37:40,703-Speed 2495.08 samples/sec Loss 1.1030 LearningRate 0.000016 Epoch: 35 Global Step: 734700 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:37:48,851-Speed 2514.32 samples/sec Loss 1.1318 LearningRate 0.000016 Epoch: 35 Global Step: 734710 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:37:57,052-Speed 2497.97 samples/sec Loss 1.1150 LearningRate 0.000016 Epoch: 35 Global Step: 734720 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:38:05,259-Speed 2495.62 samples/sec Loss 1.1334 LearningRate 0.000016 Epoch: 35 Global Step: 734730 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:38:13,460-Speed 2497.77 samples/sec Loss 1.1529 LearningRate 0.000016 Epoch: 35 Global Step: 734740 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:38:21,663-Speed 2496.91 samples/sec Loss 1.0969 LearningRate 0.000016 Epoch: 35 Global Step: 734750 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:38:29,860-Speed 2498.79 samples/sec Loss 1.0948 LearningRate 0.000016 Epoch: 35 Global Step: 734760 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:38:38,008-Speed 2513.79 samples/sec Loss 1.1179 LearningRate 0.000016 Epoch: 35 Global Step: 734770 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:38:46,211-Speed 2497.23 samples/sec Loss 1.1270 LearningRate 0.000016 Epoch: 35 Global Step: 734780 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:38:54,415-Speed 2496.72 samples/sec Loss 1.1239 LearningRate 0.000016 Epoch: 35 Global Step: 734790 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:39:02,617-Speed 2497.22 samples/sec Loss 1.1304 LearningRate 0.000016 Epoch: 35 Global Step: 734800 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:39:10,825-Speed 2495.57 samples/sec Loss 1.0978 LearningRate 0.000016 Epoch: 35 Global Step: 734810 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:39:19,037-Speed 2494.47 samples/sec Loss 1.1223 LearningRate 0.000016 Epoch: 35 Global Step: 734820 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:39:27,187-Speed 2513.46 samples/sec Loss 1.1449 LearningRate 0.000016 Epoch: 35 Global Step: 734830 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:39:35,400-Speed 2493.75 samples/sec Loss 1.0982 LearningRate 0.000016 Epoch: 35 Global Step: 734840 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:39:43,612-Speed 2494.41 samples/sec Loss 1.1084 LearningRate 0.000016 Epoch: 35 Global Step: 734850 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:39:51,812-Speed 2497.82 samples/sec Loss 1.0984 LearningRate 0.000016 Epoch: 35 Global Step: 734860 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:40:00,010-Speed 2498.77 samples/sec Loss 1.0851 LearningRate 0.000016 Epoch: 35 Global Step: 734870 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:40:08,208-Speed 2498.27 samples/sec Loss 1.1336 LearningRate 0.000016 Epoch: 35 Global Step: 734880 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:40:16,355-Speed 2514.25 samples/sec Loss 1.1384 LearningRate 0.000016 Epoch: 35 Global Step: 734890 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:40:24,557-Speed 2497.62 samples/sec Loss 1.1098 LearningRate 0.000016 Epoch: 35 Global Step: 734900 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:40:32,755-Speed 2498.32 samples/sec Loss 1.0931 LearningRate 0.000016 Epoch: 35 Global Step: 734910 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:40:40,957-Speed 2497.20 samples/sec Loss 1.1130 LearningRate 0.000016 Epoch: 35 Global Step: 734920 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:40:49,158-Speed 2497.86 samples/sec Loss 1.1307 LearningRate 0.000016 Epoch: 35 Global Step: 734930 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:40:57,357-Speed 2498.09 samples/sec Loss 1.1094 LearningRate 0.000016 Epoch: 35 Global Step: 734940 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:41:05,505-Speed 2514.04 samples/sec Loss 1.1009 LearningRate 0.000016 Epoch: 35 Global Step: 734950 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:41:13,704-Speed 2498.31 samples/sec Loss 1.1145 LearningRate 0.000016 Epoch: 35 Global Step: 734960 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:41:21,904-Speed 2498.11 samples/sec Loss 1.1101 LearningRate 0.000016 Epoch: 35 Global Step: 734970 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:41:30,106-Speed 2497.50 samples/sec Loss 1.1564 LearningRate 0.000016 Epoch: 35 Global Step: 734980 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:41:38,304-Speed 2498.24 samples/sec Loss 1.1113 LearningRate 0.000016 Epoch: 35 Global Step: 734990 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:41:46,506-Speed 2497.45 samples/sec Loss 1.1291 LearningRate 0.000016 Epoch: 35 Global Step: 735000 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:41:54,651-Speed 2514.85 samples/sec Loss 1.1192 LearningRate 0.000016 Epoch: 35 Global Step: 735010 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:42:02,852-Speed 2497.81 samples/sec Loss 1.1141 LearningRate 0.000016 Epoch: 35 Global Step: 735020 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:42:11,048-Speed 2499.14 samples/sec Loss 1.1192 LearningRate 0.000016 Epoch: 35 Global Step: 735030 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:42:19,246-Speed 2498.68 samples/sec Loss 1.1268 LearningRate 0.000016 Epoch: 35 Global Step: 735040 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:42:27,464-Speed 2492.63 samples/sec Loss 1.1319 LearningRate 0.000016 Epoch: 35 Global Step: 735050 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:42:35,664-Speed 2497.94 samples/sec Loss 1.1271 LearningRate 0.000016 Epoch: 35 Global Step: 735060 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:42:43,810-Speed 2514.60 samples/sec Loss 1.1221 LearningRate 0.000016 Epoch: 35 Global Step: 735070 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:42:52,010-Speed 2497.73 samples/sec Loss 1.0912 LearningRate 0.000016 Epoch: 35 Global Step: 735080 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:43:00,209-Speed 2498.38 samples/sec Loss 1.1371 LearningRate 0.000016 Epoch: 35 Global Step: 735090 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:43:08,409-Speed 2498.09 samples/sec Loss 1.1408 LearningRate 0.000016 Epoch: 35 Global Step: 735100 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:43:16,608-Speed 2498.07 samples/sec Loss 1.1007 LearningRate 0.000016 Epoch: 35 Global Step: 735110 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:43:24,810-Speed 2497.41 samples/sec Loss 1.1027 LearningRate 0.000016 Epoch: 35 Global Step: 735120 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:43:32,958-Speed 2513.80 samples/sec Loss 1.0910 LearningRate 0.000016 Epoch: 35 Global Step: 735130 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-07-12 14:43:41,118-Speed 2510.27 samples/sec Loss 1.1095 LearningRate 0.000016 Epoch: 35 Global Step: 735140 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:43:49,317-Speed 2498.53 samples/sec Loss 1.1365 LearningRate 0.000016 Epoch: 35 Global Step: 735150 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:43:57,528-Speed 2494.41 samples/sec Loss 1.1354 LearningRate 0.000016 Epoch: 35 Global Step: 735160 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:44:05,727-Speed 2498.32 samples/sec Loss 1.1018 LearningRate 0.000016 Epoch: 35 Global Step: 735170 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:44:13,929-Speed 2497.10 samples/sec Loss 1.1280 LearningRate 0.000016 Epoch: 35 Global Step: 735180 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:44:22,071-Speed 2515.86 samples/sec Loss 1.1331 LearningRate 0.000016 Epoch: 35 Global Step: 735190 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:44:30,271-Speed 2498.37 samples/sec Loss 1.0709 LearningRate 0.000016 Epoch: 35 Global Step: 735200 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:44:38,477-Speed 2496.15 samples/sec Loss 1.1205 LearningRate 0.000016 Epoch: 35 Global Step: 735210 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:44:46,677-Speed 2497.74 samples/sec Loss 1.1498 LearningRate 0.000016 Epoch: 35 Global Step: 735220 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:44:54,877-Speed 2498.05 samples/sec Loss 1.0916 LearningRate 0.000016 Epoch: 35 Global Step: 735230 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:45:03,089-Speed 2494.44 samples/sec Loss 1.1167 LearningRate 0.000016 Epoch: 35 Global Step: 735240 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:45:11,234-Speed 2514.70 samples/sec Loss 1.1394 LearningRate 0.000016 Epoch: 35 Global Step: 735250 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:45:19,435-Speed 2497.88 samples/sec Loss 1.1285 LearningRate 0.000016 Epoch: 35 Global Step: 735260 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:45:27,635-Speed 2497.98 samples/sec Loss 1.1400 LearningRate 0.000016 Epoch: 35 Global Step: 735270 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:45:35,842-Speed 2495.92 samples/sec Loss 1.1137 LearningRate 0.000016 Epoch: 35 Global Step: 735280 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:45:44,045-Speed 2496.92 samples/sec Loss 1.1053 LearningRate 0.000016 Epoch: 35 Global Step: 735290 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:45:52,256-Speed 2494.64 samples/sec Loss 1.0782 LearningRate 0.000016 Epoch: 35 Global Step: 735300 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:46:00,405-Speed 2513.67 samples/sec Loss 1.1166 LearningRate 0.000016 Epoch: 35 Global Step: 735310 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:46:08,607-Speed 2497.14 samples/sec Loss 1.1511 LearningRate 0.000016 Epoch: 35 Global Step: 735320 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:46:16,807-Speed 2498.15 samples/sec Loss 1.1240 LearningRate 0.000016 Epoch: 35 Global Step: 735330 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:46:25,012-Speed 2496.35 samples/sec Loss 1.1136 LearningRate 0.000016 Epoch: 35 Global Step: 735340 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:46:33,213-Speed 2497.54 samples/sec Loss 1.0872 LearningRate 0.000016 Epoch: 35 Global Step: 735350 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:46:41,409-Speed 2499.35 samples/sec Loss 1.1278 LearningRate 0.000016 Epoch: 35 Global Step: 735360 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:46:49,556-Speed 2514.09 samples/sec Loss 1.1351 LearningRate 0.000016 Epoch: 35 Global Step: 735370 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:46:57,759-Speed 2497.13 samples/sec Loss 1.0922 LearningRate 0.000016 Epoch: 35 Global Step: 735380 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:47:05,960-Speed 2497.72 samples/sec Loss 1.0978 LearningRate 0.000016 Epoch: 35 Global Step: 735390 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:47:14,162-Speed 2497.74 samples/sec Loss 1.1141 LearningRate 0.000016 Epoch: 35 Global Step: 735400 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:47:22,361-Speed 2498.02 samples/sec Loss 1.1172 LearningRate 0.000016 Epoch: 35 Global Step: 735410 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:47:30,563-Speed 2497.43 samples/sec Loss 1.1468 LearningRate 0.000016 Epoch: 35 Global Step: 735420 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:47:38,711-Speed 2513.92 samples/sec Loss 1.1095 LearningRate 0.000016 Epoch: 35 Global Step: 735430 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:47:46,914-Speed 2497.31 samples/sec Loss 1.1436 LearningRate 0.000016 Epoch: 35 Global Step: 735440 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:47:55,124-Speed 2494.84 samples/sec Loss 1.1418 LearningRate 0.000016 Epoch: 35 Global Step: 735450 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:48:03,335-Speed 2494.62 samples/sec Loss 1.1351 LearningRate 0.000016 Epoch: 35 Global Step: 735460 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:48:11,536-Speed 2497.52 samples/sec Loss 1.1052 LearningRate 0.000016 Epoch: 35 Global Step: 735470 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:48:19,738-Speed 2497.31 samples/sec Loss 1.1178 LearningRate 0.000016 Epoch: 35 Global Step: 735480 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:48:27,882-Speed 2514.84 samples/sec Loss 1.1400 LearningRate 0.000016 Epoch: 35 Global Step: 735490 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:48:36,085-Speed 2497.05 samples/sec Loss 1.0884 LearningRate 0.000016 Epoch: 35 Global Step: 735500 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:48:44,290-Speed 2496.71 samples/sec Loss 1.1138 LearningRate 0.000016 Epoch: 35 Global Step: 735510 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:48:52,495-Speed 2496.28 samples/sec Loss 1.1046 LearningRate 0.000016 Epoch: 35 Global Step: 735520 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:49:00,698-Speed 2497.18 samples/sec Loss 1.1597 LearningRate 0.000016 Epoch: 35 Global Step: 735530 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:49:08,902-Speed 2496.88 samples/sec Loss 1.1143 LearningRate 0.000016 Epoch: 35 Global Step: 735540 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:49:17,051-Speed 2513.70 samples/sec Loss 1.1140 LearningRate 0.000016 Epoch: 35 Global Step: 735550 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:49:25,249-Speed 2498.51 samples/sec Loss 1.1184 LearningRate 0.000016 Epoch: 35 Global Step: 735560 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:49:33,456-Speed 2495.79 samples/sec Loss 1.1149 LearningRate 0.000016 Epoch: 35 Global Step: 735570 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:49:41,655-Speed 2498.17 samples/sec Loss 1.1272 LearningRate 0.000016 Epoch: 35 Global Step: 735580 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:49:49,855-Speed 2498.48 samples/sec Loss 1.1055 LearningRate 0.000016 Epoch: 35 Global Step: 735590 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:49:58,055-Speed 2497.94 samples/sec Loss 1.0925 LearningRate 0.000016 Epoch: 35 Global Step: 735600 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:50:06,204-Speed 2513.44 samples/sec Loss 1.1370 LearningRate 0.000016 Epoch: 35 Global Step: 735610 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:50:14,404-Speed 2498.14 samples/sec Loss 1.1672 LearningRate 0.000016 Epoch: 35 Global Step: 735620 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:50:22,605-Speed 2497.69 samples/sec Loss 1.1262 LearningRate 0.000016 Epoch: 35 Global Step: 735630 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-07-12 14:50:30,808-Speed 2496.78 samples/sec Loss 1.1118 LearningRate 0.000016 Epoch: 35 Global Step: 735640 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:50:39,009-Speed 2497.93 samples/sec Loss 1.1518 LearningRate 0.000016 Epoch: 35 Global Step: 735650 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:50:47,212-Speed 2497.35 samples/sec Loss 1.1291 LearningRate 0.000016 Epoch: 35 Global Step: 735660 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:50:55,364-Speed 2512.60 samples/sec Loss 1.1237 LearningRate 0.000016 Epoch: 35 Global Step: 735670 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:51:03,572-Speed 2495.68 samples/sec Loss 1.0900 LearningRate 0.000016 Epoch: 35 Global Step: 735680 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:51:11,771-Speed 2498.28 samples/sec Loss 1.1225 LearningRate 0.000016 Epoch: 35 Global Step: 735690 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:51:19,981-Speed 2495.01 samples/sec Loss 1.1087 LearningRate 0.000016 Epoch: 35 Global Step: 735700 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:51:28,180-Speed 2498.02 samples/sec Loss 1.0996 LearningRate 0.000016 Epoch: 35 Global Step: 735710 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:51:36,381-Speed 2497.93 samples/sec Loss 1.1077 LearningRate 0.000016 Epoch: 35 Global Step: 735720 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:51:44,551-Speed 2507.29 samples/sec Loss 1.1139 LearningRate 0.000016 Epoch: 35 Global Step: 735730 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:51:52,752-Speed 2497.67 samples/sec Loss 1.1130 LearningRate 0.000016 Epoch: 35 Global Step: 735740 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:52:00,951-Speed 2498.13 samples/sec Loss 1.1026 LearningRate 0.000016 Epoch: 35 Global Step: 735750 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:52:09,158-Speed 2496.01 samples/sec Loss 1.1152 LearningRate 0.000016 Epoch: 35 Global Step: 735760 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:52:17,358-Speed 2497.97 samples/sec Loss 1.1238 LearningRate 0.000016 Epoch: 35 Global Step: 735770 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:52:25,563-Speed 2496.27 samples/sec Loss 1.1275 LearningRate 0.000016 Epoch: 35 Global Step: 735780 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:52:33,716-Speed 2512.34 samples/sec Loss 1.1347 LearningRate 0.000016 Epoch: 35 Global Step: 735790 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:52:41,914-Speed 2498.56 samples/sec Loss 1.1466 LearningRate 0.000016 Epoch: 35 Global Step: 735800 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:52:50,114-Speed 2498.10 samples/sec Loss 1.1496 LearningRate 0.000016 Epoch: 35 Global Step: 735810 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:52:58,315-Speed 2497.56 samples/sec Loss 1.1346 LearningRate 0.000016 Epoch: 35 Global Step: 735820 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:53:06,515-Speed 2497.74 samples/sec Loss 1.1142 LearningRate 0.000016 Epoch: 35 Global Step: 735830 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:53:14,715-Speed 2498.84 samples/sec Loss 1.1295 LearningRate 0.000016 Epoch: 35 Global Step: 735840 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:53:22,860-Speed 2514.75 samples/sec Loss 1.1299 LearningRate 0.000016 Epoch: 35 Global Step: 735850 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:53:31,056-Speed 2499.32 samples/sec Loss 1.1237 LearningRate 0.000016 Epoch: 35 Global Step: 735860 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 14:53:39,210-Speed 2512.32 samples/sec Loss 1.1270 LearningRate 0.000016 Epoch: 35 Global Step: 735870 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:53:47,409-Speed 2498.44 samples/sec Loss 1.0922 LearningRate 0.000016 Epoch: 35 Global Step: 735880 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:53:55,607-Speed 2498.35 samples/sec Loss 1.1513 LearningRate 0.000016 Epoch: 35 Global Step: 735890 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:54:03,806-Speed 2498.17 samples/sec Loss 1.1632 LearningRate 0.000016 Epoch: 35 Global Step: 735900 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:54:11,955-Speed 2513.82 samples/sec Loss 1.1104 LearningRate 0.000016 Epoch: 35 Global Step: 735910 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:54:20,150-Speed 2499.37 samples/sec Loss 1.1102 LearningRate 0.000016 Epoch: 35 Global Step: 735920 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:54:28,352-Speed 2497.35 samples/sec Loss 1.1155 LearningRate 0.000016 Epoch: 35 Global Step: 735930 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:54:36,550-Speed 2498.82 samples/sec Loss 1.1152 LearningRate 0.000016 Epoch: 35 Global Step: 735940 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:54:44,759-Speed 2495.17 samples/sec Loss 1.1219 LearningRate 0.000016 Epoch: 35 Global Step: 735950 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:54:52,959-Speed 2498.09 samples/sec Loss 1.1454 LearningRate 0.000016 Epoch: 35 Global Step: 735960 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:55:01,105-Speed 2514.65 samples/sec Loss 1.1259 LearningRate 0.000016 Epoch: 35 Global Step: 735970 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:55:09,313-Speed 2495.30 samples/sec Loss 1.1330 LearningRate 0.000016 Epoch: 35 Global Step: 735980 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:55:17,510-Speed 2498.96 samples/sec Loss 1.1278 LearningRate 0.000016 Epoch: 35 Global Step: 735990 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:55:25,715-Speed 2496.74 samples/sec Loss 1.1337 LearningRate 0.000016 Epoch: 35 Global Step: 736000 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:55:33,913-Speed 2498.56 samples/sec Loss 1.1227 LearningRate 0.000016 Epoch: 35 Global Step: 736010 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:55:42,120-Speed 2495.47 samples/sec Loss 1.1248 LearningRate 0.000016 Epoch: 35 Global Step: 736020 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:55:50,266-Speed 2514.68 samples/sec Loss 1.1252 LearningRate 0.000016 Epoch: 35 Global Step: 736030 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:55:58,467-Speed 2498.04 samples/sec Loss 1.0683 LearningRate 0.000016 Epoch: 35 Global Step: 736040 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:56:06,664-Speed 2498.67 samples/sec Loss 1.1047 LearningRate 0.000016 Epoch: 35 Global Step: 736050 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:56:14,866-Speed 2497.51 samples/sec Loss 1.1140 LearningRate 0.000016 Epoch: 35 Global Step: 736060 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:56:23,063-Speed 2498.68 samples/sec Loss 1.1050 LearningRate 0.000016 Epoch: 35 Global Step: 736070 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:56:31,274-Speed 2494.82 samples/sec Loss 1.1493 LearningRate 0.000016 Epoch: 35 Global Step: 736080 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:56:39,418-Speed 2514.90 samples/sec Loss 1.1544 LearningRate 0.000016 Epoch: 35 Global Step: 736090 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:56:47,618-Speed 2498.06 samples/sec Loss 1.1151 LearningRate 0.000016 Epoch: 35 Global Step: 736100 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:56:55,822-Speed 2496.78 samples/sec Loss 1.1264 LearningRate 0.000016 Epoch: 35 Global Step: 736110 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:57:04,020-Speed 2498.58 samples/sec Loss 1.1296 LearningRate 0.000016 Epoch: 35 Global Step: 736120 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:57:12,220-Speed 2497.93 samples/sec Loss 1.0865 LearningRate 0.000016 Epoch: 35 Global Step: 736130 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:57:20,419-Speed 2498.32 samples/sec Loss 1.1063 LearningRate 0.000016 Epoch: 35 Global Step: 736140 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:57:28,570-Speed 2513.15 samples/sec Loss 1.1153 LearningRate 0.000016 Epoch: 35 Global Step: 736150 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:57:36,776-Speed 2495.90 samples/sec Loss 1.1155 LearningRate 0.000016 Epoch: 35 Global Step: 736160 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:57:44,985-Speed 2495.48 samples/sec Loss 1.1016 LearningRate 0.000016 Epoch: 35 Global Step: 736170 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:57:53,185-Speed 2497.93 samples/sec Loss 1.1138 LearningRate 0.000016 Epoch: 35 Global Step: 736180 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:58:01,384-Speed 2498.22 samples/sec Loss 1.1214 LearningRate 0.000016 Epoch: 35 Global Step: 736190 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:58:09,593-Speed 2495.26 samples/sec Loss 1.0893 LearningRate 0.000016 Epoch: 35 Global Step: 736200 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:58:17,738-Speed 2514.78 samples/sec Loss 1.1328 LearningRate 0.000016 Epoch: 35 Global Step: 736210 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:58:25,953-Speed 2493.45 samples/sec Loss 1.1187 LearningRate 0.000016 Epoch: 35 Global Step: 736220 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:58:34,148-Speed 2499.56 samples/sec Loss 1.1037 LearningRate 0.000016 Epoch: 35 Global Step: 736230 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:58:42,352-Speed 2496.68 samples/sec Loss 1.1114 LearningRate 0.000016 Epoch: 35 Global Step: 736240 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:58:50,550-Speed 2498.37 samples/sec Loss 1.1254 LearningRate 0.000016 Epoch: 35 Global Step: 736250 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:58:58,749-Speed 2498.34 samples/sec Loss 1.1021 LearningRate 0.000016 Epoch: 35 Global Step: 736260 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:59:06,898-Speed 2513.81 samples/sec Loss 1.1393 LearningRate 0.000016 Epoch: 35 Global Step: 736270 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:59:15,101-Speed 2496.90 samples/sec Loss 1.1226 LearningRate 0.000016 Epoch: 35 Global Step: 736280 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:59:23,305-Speed 2496.80 samples/sec Loss 1.1459 LearningRate 0.000016 Epoch: 35 Global Step: 736290 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:59:31,510-Speed 2496.38 samples/sec Loss 1.1087 LearningRate 0.000016 Epoch: 35 Global Step: 736300 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:59:39,715-Speed 2496.44 samples/sec Loss 1.1576 LearningRate 0.000016 Epoch: 35 Global Step: 736310 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:59:47,917-Speed 2497.58 samples/sec Loss 1.1322 LearningRate 0.000016 Epoch: 35 Global Step: 736320 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 14:59:56,068-Speed 2512.80 samples/sec Loss 1.1133 LearningRate 0.000016 Epoch: 35 Global Step: 736330 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:00:04,274-Speed 2496.32 samples/sec Loss 1.1141 LearningRate 0.000016 Epoch: 35 Global Step: 736340 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:00:12,476-Speed 2497.43 samples/sec Loss 1.1265 LearningRate 0.000016 Epoch: 35 Global Step: 736350 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:00:20,679-Speed 2497.00 samples/sec Loss 1.1204 LearningRate 0.000016 Epoch: 35 Global Step: 736360 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:00:28,882-Speed 2496.92 samples/sec Loss 1.1313 LearningRate 0.000016 Epoch: 35 Global Step: 736370 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:00:37,085-Speed 2497.07 samples/sec Loss 1.0956 LearningRate 0.000016 Epoch: 35 Global Step: 736380 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:00:45,236-Speed 2513.16 samples/sec Loss 1.1269 LearningRate 0.000016 Epoch: 35 Global Step: 736390 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:00:53,457-Speed 2491.53 samples/sec Loss 1.0884 LearningRate 0.000016 Epoch: 35 Global Step: 736400 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:01:01,662-Speed 2496.35 samples/sec Loss 1.1310 LearningRate 0.000016 Epoch: 35 Global Step: 736410 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:01:09,870-Speed 2495.68 samples/sec Loss 1.1054 LearningRate 0.000016 Epoch: 35 Global Step: 736420 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:01:18,073-Speed 2496.87 samples/sec Loss 1.1035 LearningRate 0.000016 Epoch: 35 Global Step: 736430 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:01:26,276-Speed 2497.28 samples/sec Loss 1.1297 LearningRate 0.000016 Epoch: 35 Global Step: 736440 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:01:34,426-Speed 2513.23 samples/sec Loss 1.1242 LearningRate 0.000016 Epoch: 35 Global Step: 736450 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:01:42,627-Speed 2498.04 samples/sec Loss 1.1155 LearningRate 0.000016 Epoch: 35 Global Step: 736460 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:01:50,829-Speed 2497.42 samples/sec Loss 1.1486 LearningRate 0.000016 Epoch: 35 Global Step: 736470 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:01:59,035-Speed 2496.08 samples/sec Loss 1.1065 LearningRate 0.000016 Epoch: 35 Global Step: 736480 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:02:07,234-Speed 2498.22 samples/sec Loss 1.1132 LearningRate 0.000016 Epoch: 35 Global Step: 736490 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:02:15,439-Speed 2496.85 samples/sec Loss 1.0889 LearningRate 0.000016 Epoch: 35 Global Step: 736500 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:02:23,588-Speed 2513.56 samples/sec Loss 1.1502 LearningRate 0.000016 Epoch: 35 Global Step: 736510 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:02:31,789-Speed 2497.79 samples/sec Loss 1.1138 LearningRate 0.000016 Epoch: 35 Global Step: 736520 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:02:39,992-Speed 2496.93 samples/sec Loss 1.1025 LearningRate 0.000016 Epoch: 35 Global Step: 736530 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:02:48,193-Speed 2497.76 samples/sec Loss 1.1227 LearningRate 0.000016 Epoch: 35 Global Step: 736540 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:02:56,398-Speed 2496.39 samples/sec Loss 1.1055 LearningRate 0.000016 Epoch: 35 Global Step: 736550 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:03:04,601-Speed 2496.90 samples/sec Loss 1.1090 LearningRate 0.000016 Epoch: 35 Global Step: 736560 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:03:12,749-Speed 2514.03 samples/sec Loss 1.1187 LearningRate 0.000016 Epoch: 35 Global Step: 736570 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:03:20,950-Speed 2498.18 samples/sec Loss 1.1200 LearningRate 0.000016 Epoch: 35 Global Step: 736580 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:03:29,154-Speed 2496.38 samples/sec Loss 1.1242 LearningRate 0.000016 Epoch: 35 Global Step: 736590 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:03:37,360-Speed 2496.73 samples/sec Loss 1.1358 LearningRate 0.000016 Epoch: 35 Global Step: 736600 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:03:45,561-Speed 2497.60 samples/sec Loss 1.1347 LearningRate 0.000015 Epoch: 35 Global Step: 736610 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:03:53,766-Speed 2496.27 samples/sec Loss 1.1270 LearningRate 0.000015 Epoch: 35 Global Step: 736620 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:04:01,918-Speed 2512.57 samples/sec Loss 1.0591 LearningRate 0.000015 Epoch: 35 Global Step: 736630 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:04:10,121-Speed 2497.23 samples/sec Loss 1.1246 LearningRate 0.000015 Epoch: 35 Global Step: 736640 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:04:18,324-Speed 2497.10 samples/sec Loss 1.1386 LearningRate 0.000015 Epoch: 35 Global Step: 736650 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:04:26,530-Speed 2496.21 samples/sec Loss 1.1105 LearningRate 0.000015 Epoch: 35 Global Step: 736660 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:04:34,734-Speed 2496.74 samples/sec Loss 1.1156 LearningRate 0.000015 Epoch: 35 Global Step: 736670 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:04:42,937-Speed 2496.89 samples/sec Loss 1.1151 LearningRate 0.000015 Epoch: 35 Global Step: 736680 Fp16 Grad Scale: 4096 Required: 21 hours
Training: 2022-07-12 15:04:51,086-Speed 2513.79 samples/sec
Loss 1.1237 LearningRate 0.000015 Epoch: 35 Global Step: 736690 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:04:59,293-Speed 2495.94 samples/sec Loss 1.1333 LearningRate 0.000015 Epoch: 35 Global Step: 736700 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:05:07,499-Speed 2495.83 samples/sec Loss 1.1143 LearningRate 0.000015 Epoch: 35 Global Step: 736710 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:05:15,704-Speed 2496.48 samples/sec Loss 1.1222 LearningRate 0.000015 Epoch: 35 Global Step: 736720 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:05:23,904-Speed 2498.17 samples/sec Loss 1.0946 LearningRate 0.000015 Epoch: 35 Global Step: 736730 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:05:32,108-Speed 2496.85 samples/sec Loss 1.1140 LearningRate 0.000015 Epoch: 35 Global Step: 736740 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:05:40,258-Speed 2513.11 samples/sec Loss 1.1404 LearningRate 0.000015 Epoch: 35 Global Step: 736750 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:05:48,458-Speed 2497.94 samples/sec Loss 1.1177 LearningRate 0.000015 Epoch: 35 Global Step: 736760 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:05:56,659-Speed 2497.81 samples/sec Loss 1.0997 LearningRate 0.000015 Epoch: 35 Global Step: 736770 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:06:04,860-Speed 2497.77 samples/sec Loss 1.1293 LearningRate 0.000015 Epoch: 35 Global Step: 736780 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:06:13,061-Speed 2497.82 samples/sec Loss 1.1083 LearningRate 0.000015 Epoch: 35 Global Step: 736790 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:06:21,265-Speed 2496.56 samples/sec Loss 1.1338 LearningRate 0.000015 Epoch: 35 Global Step: 736800 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:06:29,415-Speed 2513.26 samples/sec Loss 1.1208 
LearningRate 0.000015 Epoch: 35 Global Step: 736810 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:06:37,624-Speed 2495.39 samples/sec Loss 1.1110 LearningRate 0.000015 Epoch: 35 Global Step: 736820 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:06:45,836-Speed 2494.29 samples/sec Loss 1.1221 LearningRate 0.000015 Epoch: 35 Global Step: 736830 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:06:54,038-Speed 2497.50 samples/sec Loss 1.1183 LearningRate 0.000015 Epoch: 35 Global Step: 736840 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:07:02,239-Speed 2497.71 samples/sec Loss 1.1041 LearningRate 0.000015 Epoch: 35 Global Step: 736850 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:07:10,441-Speed 2497.20 samples/sec Loss 1.1682 LearningRate 0.000015 Epoch: 35 Global Step: 736860 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:07:18,588-Speed 2514.45 samples/sec Loss 1.1068 LearningRate 0.000015 Epoch: 35 Global Step: 736870 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:07:26,790-Speed 2497.57 samples/sec Loss 1.1041 LearningRate 0.000015 Epoch: 35 Global Step: 736880 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:07:34,991-Speed 2497.79 samples/sec Loss 1.0994 LearningRate 0.000015 Epoch: 35 Global Step: 736890 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:07:43,198-Speed 2496.30 samples/sec Loss 1.1151 LearningRate 0.000015 Epoch: 35 Global Step: 736900 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:07:51,398-Speed 2498.02 samples/sec Loss 1.1205 LearningRate 0.000015 Epoch: 35 Global Step: 736910 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:07:59,598-Speed 2497.81 samples/sec Loss 1.1108 LearningRate 0.000015 Epoch: 35 Global Step: 736920 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:08:07,745-Speed 2514.06 samples/sec Loss 1.1362 LearningRate 
0.000015 Epoch: 35 Global Step: 736930 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:08:15,948-Speed 2497.18 samples/sec Loss 1.1418 LearningRate 0.000015 Epoch: 35 Global Step: 736940 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:08:24,149-Speed 2497.62 samples/sec Loss 1.1464 LearningRate 0.000015 Epoch: 35 Global Step: 736950 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:08:32,355-Speed 2496.24 samples/sec Loss 1.1079 LearningRate 0.000015 Epoch: 35 Global Step: 736960 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:08:40,555-Speed 2498.10 samples/sec Loss 1.1234 LearningRate 0.000015 Epoch: 35 Global Step: 736970 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:08:48,756-Speed 2497.74 samples/sec Loss 1.0981 LearningRate 0.000015 Epoch: 35 Global Step: 736980 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:08:56,899-Speed 2515.56 samples/sec Loss 1.1308 LearningRate 0.000015 Epoch: 35 Global Step: 736990 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:09:05,100-Speed 2497.61 samples/sec Loss 1.1178 LearningRate 0.000015 Epoch: 35 Global Step: 737000 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:09:13,298-Speed 2498.60 samples/sec Loss 1.1430 LearningRate 0.000015 Epoch: 35 Global Step: 737010 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:09:21,502-Speed 2496.59 samples/sec Loss 1.1121 LearningRate 0.000015 Epoch: 35 Global Step: 737020 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:09:29,702-Speed 2498.00 samples/sec Loss 1.1300 LearningRate 0.000015 Epoch: 35 Global Step: 737030 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:09:37,901-Speed 2498.26 samples/sec Loss 1.1174 LearningRate 0.000015 Epoch: 35 Global Step: 737040 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:09:46,064-Speed 2509.43 samples/sec Loss 1.1132 LearningRate 0.000015 Epoch: 35 
Global Step: 737050 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:09:54,264-Speed 2498.02 samples/sec Loss 1.1126 LearningRate 0.000015 Epoch: 35 Global Step: 737060 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-07-12 15:10:02,470-Speed 2496.17 samples/sec Loss 1.1248 LearningRate 0.000015 Epoch: 35 Global Step: 737070 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:10:10,682-Speed 2494.28 samples/sec Loss 1.1036 LearningRate 0.000015 Epoch: 35 Global Step: 737080 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:10:18,881-Speed 2498.34 samples/sec Loss 1.0998 LearningRate 0.000015 Epoch: 35 Global Step: 737090 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:10:27,093-Speed 2494.30 samples/sec Loss 1.1175 LearningRate 0.000015 Epoch: 35 Global Step: 737100 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:10:35,241-Speed 2514.00 samples/sec Loss 1.0996 LearningRate 0.000015 Epoch: 35 Global Step: 737110 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:10:43,447-Speed 2496.28 samples/sec Loss 1.1064 LearningRate 0.000015 Epoch: 35 Global Step: 737120 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:10:51,652-Speed 2496.35 samples/sec Loss 1.1256 LearningRate 0.000015 Epoch: 35 Global Step: 737130 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:10:59,860-Speed 2495.47 samples/sec Loss 1.1064 LearningRate 0.000015 Epoch: 35 Global Step: 737140 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:11:08,062-Speed 2497.60 samples/sec Loss 1.1358 LearningRate 0.000015 Epoch: 35 Global Step: 737150 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:11:16,265-Speed 2497.04 samples/sec Loss 1.1218 LearningRate 0.000015 Epoch: 35 Global Step: 737160 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:11:24,411-Speed 2514.44 samples/sec Loss 1.0798 LearningRate 0.000015 Epoch: 35 Global Step: 737170 
Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:11:32,617-Speed 2496.18 samples/sec Loss 1.1373 LearningRate 0.000015 Epoch: 35 Global Step: 737180 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:11:40,819-Speed 2497.46 samples/sec Loss 1.1233 LearningRate 0.000015 Epoch: 35 Global Step: 737190 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:11:49,022-Speed 2497.06 samples/sec Loss 1.1181 LearningRate 0.000015 Epoch: 35 Global Step: 737200 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:11:57,222-Speed 2498.17 samples/sec Loss 1.1422 LearningRate 0.000015 Epoch: 35 Global Step: 737210 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:12:05,424-Speed 2497.84 samples/sec Loss 1.1031 LearningRate 0.000015 Epoch: 35 Global Step: 737220 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:12:13,579-Speed 2512.34 samples/sec Loss 1.1213 LearningRate 0.000015 Epoch: 35 Global Step: 737230 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:12:21,774-Speed 2499.35 samples/sec Loss 1.1547 LearningRate 0.000015 Epoch: 35 Global Step: 737240 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:12:29,976-Speed 2497.31 samples/sec Loss 1.1274 LearningRate 0.000015 Epoch: 35 Global Step: 737250 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:12:38,179-Speed 2497.59 samples/sec Loss 1.1509 LearningRate 0.000015 Epoch: 35 Global Step: 737260 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:12:46,416-Speed 2486.69 samples/sec Loss 1.1091 LearningRate 0.000015 Epoch: 35 Global Step: 737270 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:12:54,632-Speed 2493.16 samples/sec Loss 1.1247 LearningRate 0.000015 Epoch: 35 Global Step: 737280 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:13:02,778-Speed 2514.45 samples/sec Loss 1.1392 LearningRate 0.000015 Epoch: 35 Global Step: 737290 Fp16 Grad Scale: 
8192 Required: 21 hours Training: 2022-07-12 15:13:10,977-Speed 2498.26 samples/sec Loss 1.0960 LearningRate 0.000015 Epoch: 35 Global Step: 737300 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:13:19,180-Speed 2497.27 samples/sec Loss 1.1749 LearningRate 0.000015 Epoch: 35 Global Step: 737310 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:13:27,380-Speed 2497.64 samples/sec Loss 1.1432 LearningRate 0.000015 Epoch: 35 Global Step: 737320 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:13:35,582-Speed 2497.49 samples/sec Loss 1.1044 LearningRate 0.000015 Epoch: 35 Global Step: 737330 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:13:43,784-Speed 2497.27 samples/sec Loss 1.1003 LearningRate 0.000015 Epoch: 35 Global Step: 737340 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:13:51,938-Speed 2511.98 samples/sec Loss 1.1269 LearningRate 0.000015 Epoch: 35 Global Step: 737350 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:14:00,143-Speed 2496.34 samples/sec Loss 1.1173 LearningRate 0.000015 Epoch: 35 Global Step: 737360 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:14:08,346-Speed 2496.96 samples/sec Loss 1.1252 LearningRate 0.000015 Epoch: 35 Global Step: 737370 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:14:16,553-Speed 2496.08 samples/sec Loss 1.1209 LearningRate 0.000015 Epoch: 35 Global Step: 737380 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:14:24,757-Speed 2496.82 samples/sec Loss 1.1015 LearningRate 0.000015 Epoch: 35 Global Step: 737390 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:14:32,961-Speed 2496.70 samples/sec Loss 1.1078 LearningRate 0.000015 Epoch: 35 Global Step: 737400 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:14:41,110-Speed 2513.51 samples/sec Loss 1.1133 LearningRate 0.000015 Epoch: 35 Global Step: 737410 Fp16 Grad Scale: 8192 Required: 21 
hours Training: 2022-07-12 15:14:49,312-Speed 2497.35 samples/sec Loss 1.1134 LearningRate 0.000015 Epoch: 35 Global Step: 737420 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:14:57,536-Speed 2490.75 samples/sec Loss 1.1011 LearningRate 0.000015 Epoch: 35 Global Step: 737430 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:15:05,736-Speed 2497.81 samples/sec Loss 1.1241 LearningRate 0.000015 Epoch: 35 Global Step: 737440 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:15:13,939-Speed 2497.35 samples/sec Loss 1.1082 LearningRate 0.000015 Epoch: 35 Global Step: 737450 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:15:22,140-Speed 2497.78 samples/sec Loss 1.0922 LearningRate 0.000015 Epoch: 35 Global Step: 737460 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:15:30,298-Speed 2510.89 samples/sec Loss 1.1272 LearningRate 0.000015 Epoch: 35 Global Step: 737470 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:15:38,497-Speed 2498.38 samples/sec Loss 1.1560 LearningRate 0.000015 Epoch: 35 Global Step: 737480 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:15:46,700-Speed 2497.17 samples/sec Loss 1.1236 LearningRate 0.000015 Epoch: 35 Global Step: 737490 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:15:54,903-Speed 2497.08 samples/sec Loss 1.1044 LearningRate 0.000015 Epoch: 35 Global Step: 737500 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:16:03,115-Speed 2494.10 samples/sec Loss 1.1031 LearningRate 0.000015 Epoch: 35 Global Step: 737510 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:16:11,317-Speed 2497.48 samples/sec Loss 1.0950 LearningRate 0.000015 Epoch: 35 Global Step: 737520 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:16:19,463-Speed 2514.39 samples/sec Loss 1.1444 LearningRate 0.000015 Epoch: 35 Global Step: 737530 Fp16 Grad Scale: 8192 Required: 21 hours Training: 
2022-07-12 15:16:27,665-Speed 2497.39 samples/sec Loss 1.1378 LearningRate 0.000015 Epoch: 35 Global Step: 737540 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:16:35,867-Speed 2497.35 samples/sec Loss 1.0992 LearningRate 0.000015 Epoch: 35 Global Step: 737550 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:16:44,071-Speed 2496.98 samples/sec Loss 1.0834 LearningRate 0.000015 Epoch: 35 Global Step: 737560 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:16:52,269-Speed 2498.51 samples/sec Loss 1.1293 LearningRate 0.000015 Epoch: 35 Global Step: 737570 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:17:00,469-Speed 2497.79 samples/sec Loss 1.1338 LearningRate 0.000015 Epoch: 35 Global Step: 737580 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:17:08,619-Speed 2513.67 samples/sec Loss 1.0992 LearningRate 0.000015 Epoch: 35 Global Step: 737590 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:17:16,827-Speed 2495.74 samples/sec Loss 1.1117 LearningRate 0.000015 Epoch: 35 Global Step: 737600 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:17:25,028-Speed 2497.97 samples/sec Loss 1.1211 LearningRate 0.000015 Epoch: 35 Global Step: 737610 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:17:33,248-Speed 2491.85 samples/sec Loss 1.0960 LearningRate 0.000015 Epoch: 35 Global Step: 737620 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:17:41,454-Speed 2496.15 samples/sec Loss 1.0819 LearningRate 0.000015 Epoch: 35 Global Step: 737630 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:17:49,659-Speed 2496.89 samples/sec Loss 1.0994 LearningRate 0.000015 Epoch: 35 Global Step: 737640 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:17:57,812-Speed 2512.52 samples/sec Loss 1.1061 LearningRate 0.000015 Epoch: 35 Global Step: 737650 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 
15:18:06,013-Speed 2497.54 samples/sec Loss 1.1208 LearningRate 0.000015 Epoch: 35 Global Step: 737660 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:18:14,217-Speed 2496.57 samples/sec Loss 1.1283 LearningRate 0.000015 Epoch: 35 Global Step: 737670 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:18:22,428-Speed 2494.59 samples/sec Loss 1.1316 LearningRate 0.000015 Epoch: 35 Global Step: 737680 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:18:30,636-Speed 2495.62 samples/sec Loss 1.1398 LearningRate 0.000015 Epoch: 35 Global Step: 737690 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:18:38,835-Speed 2498.33 samples/sec Loss 1.1196 LearningRate 0.000015 Epoch: 35 Global Step: 737700 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:18:46,983-Speed 2513.72 samples/sec Loss 1.1112 LearningRate 0.000015 Epoch: 35 Global Step: 737710 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:18:55,188-Speed 2497.17 samples/sec Loss 1.1437 LearningRate 0.000015 Epoch: 35 Global Step: 737720 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:19:03,393-Speed 2496.63 samples/sec Loss 1.1305 LearningRate 0.000015 Epoch: 35 Global Step: 737730 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:19:11,593-Speed 2497.65 samples/sec Loss 1.1297 LearningRate 0.000015 Epoch: 35 Global Step: 737740 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:19:19,792-Speed 2498.42 samples/sec Loss 1.0970 LearningRate 0.000015 Epoch: 35 Global Step: 737750 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:19:27,991-Speed 2498.23 samples/sec Loss 1.1055 LearningRate 0.000015 Epoch: 35 Global Step: 737760 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:19:36,137-Speed 2514.56 samples/sec Loss 1.1233 LearningRate 0.000015 Epoch: 35 Global Step: 737770 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:19:44,344-Speed 
2495.73 samples/sec Loss 1.1260 LearningRate 0.000015 Epoch: 35 Global Step: 737780 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:19:52,544-Speed 2498.23 samples/sec Loss 1.0819 LearningRate 0.000015 Epoch: 35 Global Step: 737790 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:20:00,742-Speed 2498.66 samples/sec Loss 1.1015 LearningRate 0.000015 Epoch: 35 Global Step: 737800 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:20:08,942-Speed 2497.93 samples/sec Loss 1.1216 LearningRate 0.000015 Epoch: 35 Global Step: 737810 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:20:17,148-Speed 2495.98 samples/sec Loss 1.0526 LearningRate 0.000015 Epoch: 35 Global Step: 737820 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:20:25,294-Speed 2514.61 samples/sec Loss 1.1143 LearningRate 0.000015 Epoch: 35 Global Step: 737830 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:20:33,497-Speed 2497.17 samples/sec Loss 1.1099 LearningRate 0.000015 Epoch: 35 Global Step: 737840 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:20:41,698-Speed 2497.37 samples/sec Loss 1.1117 LearningRate 0.000015 Epoch: 35 Global Step: 737850 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:20:49,902-Speed 2496.47 samples/sec Loss 1.1036 LearningRate 0.000015 Epoch: 35 Global Step: 737860 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:20:58,110-Speed 2495.60 samples/sec Loss 1.1272 LearningRate 0.000015 Epoch: 35 Global Step: 737870 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:21:06,309-Speed 2498.17 samples/sec Loss 1.1003 LearningRate 0.000015 Epoch: 35 Global Step: 737880 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:21:14,459-Speed 2513.13 samples/sec Loss 1.1207 LearningRate 0.000015 Epoch: 35 Global Step: 737890 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:21:22,661-Speed 2497.61 samples/sec 
Loss 1.1099 LearningRate 0.000015 Epoch: 35 Global Step: 737900 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:21:30,861-Speed 2498.28 samples/sec Loss 1.1056 LearningRate 0.000015 Epoch: 35 Global Step: 737910 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:21:39,060-Speed 2498.42 samples/sec Loss 1.1173 LearningRate 0.000015 Epoch: 35 Global Step: 737920 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:21:47,258-Speed 2498.45 samples/sec Loss 1.1129 LearningRate 0.000015 Epoch: 35 Global Step: 737930 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:21:55,455-Speed 2498.69 samples/sec Loss 1.1022 LearningRate 0.000015 Epoch: 35 Global Step: 737940 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:22:03,598-Speed 2515.43 samples/sec Loss 1.1200 LearningRate 0.000015 Epoch: 35 Global Step: 737950 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:22:11,801-Speed 2497.38 samples/sec Loss 1.0955 LearningRate 0.000015 Epoch: 35 Global Step: 737960 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:22:20,001-Speed 2497.87 samples/sec Loss 1.1111 LearningRate 0.000015 Epoch: 35 Global Step: 737970 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:22:28,203-Speed 2497.25 samples/sec Loss 1.1186 LearningRate 0.000015 Epoch: 35 Global Step: 737980 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:22:36,402-Speed 2498.63 samples/sec Loss 1.0824 LearningRate 0.000015 Epoch: 35 Global Step: 737990 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:22:44,600-Speed 2498.59 samples/sec Loss 1.1141 LearningRate 0.000015 Epoch: 35 Global Step: 738000 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:22:52,754-Speed 2512.60 samples/sec Loss 1.1127 LearningRate 0.000015 Epoch: 35 Global Step: 738010 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:23:00,956-Speed 2497.53 samples/sec Loss 1.0727 
LearningRate 0.000015 Epoch: 35 Global Step: 738020 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:23:09,169-Speed 2493.98 samples/sec Loss 1.1058 LearningRate 0.000015 Epoch: 35 Global Step: 738030 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:23:17,368-Speed 2498.10 samples/sec Loss 1.0983 LearningRate 0.000015 Epoch: 35 Global Step: 738040 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:23:25,569-Speed 2497.60 samples/sec Loss 1.0819 LearningRate 0.000015 Epoch: 35 Global Step: 738050 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:23:33,778-Speed 2495.02 samples/sec Loss 1.1418 LearningRate 0.000015 Epoch: 35 Global Step: 738060 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:23:41,938-Speed 2510.44 samples/sec Loss 1.1200 LearningRate 0.000015 Epoch: 35 Global Step: 738070 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:23:50,140-Speed 2497.62 samples/sec Loss 1.1230 LearningRate 0.000015 Epoch: 35 Global Step: 738080 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:23:58,337-Speed 2498.61 samples/sec Loss 1.1224 LearningRate 0.000015 Epoch: 35 Global Step: 738090 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:24:06,540-Speed 2497.31 samples/sec Loss 1.1413 LearningRate 0.000015 Epoch: 35 Global Step: 738100 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:24:14,741-Speed 2497.72 samples/sec Loss 1.0935 LearningRate 0.000015 Epoch: 35 Global Step: 738110 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:24:22,940-Speed 2498.12 samples/sec Loss 1.0981 LearningRate 0.000015 Epoch: 35 Global Step: 738120 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:24:31,090-Speed 2513.41 samples/sec Loss 1.1193 LearningRate 0.000015 Epoch: 35 Global Step: 738130 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:24:39,289-Speed 2498.17 samples/sec Loss 1.0719 LearningRate 
0.000015 Epoch: 35 Global Step: 738140 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:24:47,490-Speed 2497.82 samples/sec Loss 1.1214 LearningRate 0.000015 Epoch: 35 Global Step: 738150 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:24:55,694-Speed 2496.67 samples/sec Loss 1.0941 LearningRate 0.000015 Epoch: 35 Global Step: 738160 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:25:03,891-Speed 2498.84 samples/sec Loss 1.0931 LearningRate 0.000015 Epoch: 35 Global Step: 738170 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:25:12,094-Speed 2497.30 samples/sec Loss 1.1083 LearningRate 0.000015 Epoch: 35 Global Step: 738180 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:25:20,240-Speed 2514.62 samples/sec Loss 1.0977 LearningRate 0.000015 Epoch: 35 Global Step: 738190 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:25:28,447-Speed 2495.68 samples/sec Loss 1.1362 LearningRate 0.000015 Epoch: 35 Global Step: 738200 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:25:36,657-Speed 2494.78 samples/sec Loss 1.1100 LearningRate 0.000015 Epoch: 35 Global Step: 738210 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:25:44,854-Speed 2498.95 samples/sec Loss 1.1146 LearningRate 0.000015 Epoch: 35 Global Step: 738220 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:25:53,057-Speed 2497.25 samples/sec Loss 1.1440 LearningRate 0.000015 Epoch: 35 Global Step: 738230 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:26:01,256-Speed 2498.02 samples/sec Loss 1.1223 LearningRate 0.000015 Epoch: 35 Global Step: 738240 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:26:09,402-Speed 2514.58 samples/sec Loss 1.1286 LearningRate 0.000015 Epoch: 35 Global Step: 738250 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:26:17,603-Speed 2497.74 samples/sec Loss 1.0697 LearningRate 0.000015 Epoch: 35 
Global Step: 738260 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-07-12 15:26:25,809-Speed 2496.35 samples/sec Loss 1.1045 LearningRate 0.000015 Epoch: 35 Global Step: 738270 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-07-12 15:26:34,009-Speed 2498.21 samples/sec Loss 1.0919 LearningRate 0.000015 Epoch: 35 Global Step: 738280 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-07-12 15:26:42,211-Speed 2497.18 samples/sec Loss 1.1058 LearningRate 0.000015 Epoch: 35 Global Step: 738290 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-07-12 15:26:50,411-Speed 2497.97 samples/sec Loss 1.1061 LearningRate 0.000015 Epoch: 35 Global Step: 738300 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-07-12 15:26:58,563-Speed 2512.57 samples/sec Loss 1.1077 LearningRate 0.000015 Epoch: 35 Global Step: 738310 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-07-12 15:27:06,769-Speed 2496.01 samples/sec Loss 1.1088 LearningRate 0.000015 Epoch: 35 Global Step: 738320 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-07-12 15:27:14,968-Speed 2498.22 samples/sec Loss 1.1043 LearningRate 0.000015 Epoch: 35 Global Step: 738330 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-07-12 15:27:23,171-Speed 2497.17 samples/sec Loss 1.0935 LearningRate 0.000015 Epoch: 35 Global Step: 738340 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-07-12 15:27:31,370-Speed 2498.08 samples/sec Loss 1.1023 LearningRate 0.000015 Epoch: 35 Global Step: 738350 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-07-12 15:27:39,573-Speed 2497.23 samples/sec Loss 1.1312 LearningRate 0.000015 Epoch: 35 Global Step: 738360 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-07-12 15:27:47,721-Speed 2513.78 samples/sec Loss 1.0860 LearningRate 0.000015 Epoch: 35 Global Step: 738370 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-07-12 15:27:55,927-Speed 2496.47 samples/sec Loss 1.0884 LearningRate 0.000015 Epoch: 35 Global 
Step: 738380 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:28:04,137-Speed 2494.82 samples/sec Loss 1.0940 LearningRate 0.000015 Epoch: 35 Global Step: 738390 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:28:12,337-Speed 2497.93 samples/sec Loss 1.1037 LearningRate 0.000015 Epoch: 35 Global Step: 738400 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:28:20,535-Speed 2498.50 samples/sec Loss 1.1245 LearningRate 0.000015 Epoch: 35 Global Step: 738410 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:28:28,737-Speed 2497.69 samples/sec Loss 1.1006 LearningRate 0.000015 Epoch: 35 Global Step: 738420 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:28:36,892-Speed 2511.44 samples/sec Loss 1.1062 LearningRate 0.000015 Epoch: 35 Global Step: 738430 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:28:45,090-Speed 2498.54 samples/sec Loss 1.0992 LearningRate 0.000015 Epoch: 35 Global Step: 738440 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:28:53,294-Speed 2497.06 samples/sec Loss 1.1322 LearningRate 0.000015 Epoch: 35 Global Step: 738450 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:29:01,506-Speed 2494.16 samples/sec Loss 1.1198 LearningRate 0.000015 Epoch: 35 Global Step: 738460 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:29:09,664-Speed 2510.75 samples/sec Loss 1.1172 LearningRate 0.000015 Epoch: 35 Global Step: 738470 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:29:17,868-Speed 2496.96 samples/sec Loss 1.1097 LearningRate 0.000015 Epoch: 35 Global Step: 738480 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:29:26,027-Speed 2510.36 samples/sec Loss 1.1014 LearningRate 0.000015 Epoch: 35 Global Step: 738490 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:29:34,233-Speed 2495.94 samples/sec Loss 1.1163 LearningRate 0.000015 Epoch: 35 Global Step: 738500 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:29:42,433-Speed 2498.09 samples/sec Loss 1.0951 LearningRate 0.000015 Epoch: 35 Global Step: 738510 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:29:50,633-Speed 2498.09 samples/sec Loss 1.0885 LearningRate 0.000015 Epoch: 35 Global Step: 738520 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:29:58,834-Speed 2497.68 samples/sec Loss 1.0791 LearningRate 0.000015 Epoch: 35 Global Step: 738530 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:30:07,034-Speed 2497.95 samples/sec Loss 1.1418 LearningRate 0.000015 Epoch: 35 Global Step: 738540 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:30:15,186-Speed 2512.41 samples/sec Loss 1.0923 LearningRate 0.000015 Epoch: 35 Global Step: 738550 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:30:23,399-Speed 2494.03 samples/sec Loss 1.1365 LearningRate 0.000015 Epoch: 35 Global Step: 738560 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:30:31,597-Speed 2498.60 samples/sec Loss 1.1246 LearningRate 0.000015 Epoch: 35 Global Step: 738570 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:30:39,804-Speed 2495.78 samples/sec Loss 1.0961 LearningRate 0.000015 Epoch: 35 Global Step: 738580 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:30:48,009-Speed 2496.36 samples/sec Loss 1.0975 LearningRate 0.000015 Epoch: 35 Global Step: 738590 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:30:56,204-Speed 2499.58 samples/sec Loss 1.1268 LearningRate 0.000015 Epoch: 35 Global Step: 738600 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:31:04,352-Speed 2513.87 samples/sec Loss 1.1400 LearningRate 0.000015 Epoch: 35 Global Step: 738610 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:31:12,550-Speed 2498.36 samples/sec Loss 1.1274 LearningRate 0.000015 Epoch: 35 Global Step: 738620 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:31:20,749-Speed 2498.56 samples/sec Loss 1.1032 LearningRate 0.000015 Epoch: 35 Global Step: 738630 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:31:28,946-Speed 2498.80 samples/sec Loss 1.0982 LearningRate 0.000015 Epoch: 35 Global Step: 738640 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:31:37,145-Speed 2498.39 samples/sec Loss 1.1033 LearningRate 0.000015 Epoch: 35 Global Step: 738650 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:31:45,345-Speed 2497.92 samples/sec Loss 1.1380 LearningRate 0.000015 Epoch: 35 Global Step: 738660 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:31:53,493-Speed 2514.08 samples/sec Loss 1.1068 LearningRate 0.000015 Epoch: 35 Global Step: 738670 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:32:01,694-Speed 2497.89 samples/sec Loss 1.1519 LearningRate 0.000015 Epoch: 35 Global Step: 738680 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:32:09,895-Speed 2497.64 samples/sec Loss 1.1111 LearningRate 0.000015 Epoch: 35 Global Step: 738690 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:32:18,109-Speed 2493.75 samples/sec Loss 1.1063 LearningRate 0.000015 Epoch: 35 Global Step: 738700 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:32:26,311-Speed 2497.68 samples/sec Loss 1.1267 LearningRate 0.000015 Epoch: 35 Global Step: 738710 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:32:34,521-Speed 2494.91 samples/sec Loss 1.1095 LearningRate 0.000015 Epoch: 35 Global Step: 738720 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:32:42,665-Speed 2515.39 samples/sec Loss 1.1307 LearningRate 0.000015 Epoch: 35 Global Step: 738730 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:32:50,867-Speed 2497.32 samples/sec Loss 1.1040 LearningRate 0.000015 Epoch: 35 Global Step: 738740 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:32:59,068-Speed 2497.70 samples/sec Loss 1.1210 LearningRate 0.000015 Epoch: 35 Global Step: 738750 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:33:07,267-Speed 2498.51 samples/sec Loss 1.1035 LearningRate 0.000015 Epoch: 35 Global Step: 738760 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:33:15,473-Speed 2496.32 samples/sec Loss 1.1134 LearningRate 0.000015 Epoch: 35 Global Step: 738770 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:33:23,678-Speed 2496.38 samples/sec Loss 1.1145 LearningRate 0.000015 Epoch: 35 Global Step: 738780 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:33:31,825-Speed 2514.00 samples/sec Loss 1.1125 LearningRate 0.000015 Epoch: 35 Global Step: 738790 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:33:40,027-Speed 2497.44 samples/sec Loss 1.1312 LearningRate 0.000015 Epoch: 35 Global Step: 738800 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:33:48,228-Speed 2498.08 samples/sec Loss 1.1029 LearningRate 0.000015 Epoch: 35 Global Step: 738810 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:33:56,443-Speed 2493.37 samples/sec Loss 1.0999 LearningRate 0.000015 Epoch: 35 Global Step: 738820 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:34:04,654-Speed 2494.63 samples/sec Loss 1.1198 LearningRate 0.000015 Epoch: 35 Global Step: 738830 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:34:12,854-Speed 2497.93 samples/sec Loss 1.1250 LearningRate 0.000015 Epoch: 35 Global Step: 738840 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:34:20,999-Speed 2514.69 samples/sec Loss 1.1169 LearningRate 0.000015 Epoch: 35 Global Step: 738850 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:34:29,197-Speed 2498.66 samples/sec Loss 1.1164 LearningRate 0.000015 Epoch: 35 Global Step: 738860 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:34:37,396-Speed 2498.14 samples/sec Loss 1.1142 LearningRate 0.000015 Epoch: 35 Global Step: 738870 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:34:45,599-Speed 2497.00 samples/sec Loss 1.1228 LearningRate 0.000015 Epoch: 35 Global Step: 738880 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:34:53,797-Speed 2498.76 samples/sec Loss 1.1267 LearningRate 0.000015 Epoch: 35 Global Step: 738890 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:35:01,999-Speed 2497.26 samples/sec Loss 1.1369 LearningRate 0.000015 Epoch: 35 Global Step: 738900 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:35:10,146-Speed 2514.30 samples/sec Loss 1.0966 LearningRate 0.000015 Epoch: 35 Global Step: 738910 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:35:18,344-Speed 2498.55 samples/sec Loss 1.1095 LearningRate 0.000015 Epoch: 35 Global Step: 738920 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:35:26,558-Speed 2493.65 samples/sec Loss 1.0990 LearningRate 0.000015 Epoch: 35 Global Step: 738930 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:35:34,754-Speed 2499.22 samples/sec Loss 1.1036 LearningRate 0.000015 Epoch: 35 Global Step: 738940 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:35:42,956-Speed 2497.18 samples/sec Loss 1.1345 LearningRate 0.000015 Epoch: 35 Global Step: 738950 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:35:51,156-Speed 2498.38 samples/sec Loss 1.1175 LearningRate 0.000015 Epoch: 35 Global Step: 738960 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:35:59,306-Speed 2513.25 samples/sec Loss 1.1172 LearningRate 0.000015 Epoch: 35 Global Step: 738970 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:36:07,512-Speed 2496.06 samples/sec Loss 1.1237 LearningRate 0.000015 Epoch: 35 Global Step: 738980 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:36:15,716-Speed 2496.74 samples/sec Loss 1.0920 LearningRate 0.000015 Epoch: 35 Global Step: 738990 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:36:23,918-Speed 2497.55 samples/sec Loss 1.1106 LearningRate 0.000015 Epoch: 35 Global Step: 739000 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:36:32,120-Speed 2497.30 samples/sec Loss 1.0855 LearningRate 0.000015 Epoch: 35 Global Step: 739010 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:36:40,321-Speed 2497.84 samples/sec Loss 1.0837 LearningRate 0.000015 Epoch: 35 Global Step: 739020 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:36:48,468-Speed 2514.28 samples/sec Loss 1.1377 LearningRate 0.000015 Epoch: 35 Global Step: 739030 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:36:56,669-Speed 2497.63 samples/sec Loss 1.1324 LearningRate 0.000015 Epoch: 35 Global Step: 739040 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:37:04,870-Speed 2497.58 samples/sec Loss 1.1213 LearningRate 0.000015 Epoch: 35 Global Step: 739050 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:37:13,071-Speed 2497.56 samples/sec Loss 1.1143 LearningRate 0.000015 Epoch: 35 Global Step: 739060 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:37:21,268-Speed 2498.92 samples/sec Loss 1.1070 LearningRate 0.000015 Epoch: 35 Global Step: 739070 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:37:29,473-Speed 2496.19 samples/sec Loss 1.1421 LearningRate 0.000015 Epoch: 35 Global Step: 739080 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:37:37,624-Speed 2512.95 samples/sec Loss 1.1120 LearningRate 0.000015 Epoch: 35 Global Step: 739090 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:37:45,825-Speed 2497.64 samples/sec Loss 1.1074 LearningRate 0.000015 Epoch: 35 Global Step: 739100 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:37:54,026-Speed 2497.81 samples/sec Loss 1.1157 LearningRate 0.000015 Epoch: 35 Global Step: 739110 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:38:02,225-Speed 2497.96 samples/sec Loss 1.0817 LearningRate 0.000015 Epoch: 35 Global Step: 739120 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:38:10,429-Speed 2496.75 samples/sec Loss 1.1150 LearningRate 0.000015 Epoch: 35 Global Step: 739130 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:38:18,635-Speed 2496.19 samples/sec Loss 1.1006 LearningRate 0.000015 Epoch: 35 Global Step: 739140 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:38:26,789-Speed 2512.25 samples/sec Loss 1.0993 LearningRate 0.000015 Epoch: 35 Global Step: 739150 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:38:34,987-Speed 2498.35 samples/sec Loss 1.1490 LearningRate 0.000015 Epoch: 35 Global Step: 739160 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:38:43,187-Speed 2497.98 samples/sec Loss 1.0996 LearningRate 0.000015 Epoch: 35 Global Step: 739170 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:38:51,388-Speed 2497.69 samples/sec Loss 1.1040 LearningRate 0.000015 Epoch: 35 Global Step: 739180 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:38:59,584-Speed 2499.07 samples/sec Loss 1.1160 LearningRate 0.000015 Epoch: 35 Global Step: 739190 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:39:07,799-Speed 2493.39 samples/sec Loss 1.1322 LearningRate 0.000015 Epoch: 35 Global Step: 739200 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:39:15,943-Speed 2516.14 samples/sec Loss 1.1002 LearningRate 0.000015 Epoch: 35 Global Step: 739210 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:39:24,144-Speed 2497.94 samples/sec Loss 1.1214 LearningRate 0.000015 Epoch: 35 Global Step: 739220 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:39:32,345-Speed 2497.39 samples/sec Loss 1.1085 LearningRate 0.000015 Epoch: 35 Global Step: 739230 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:39:40,543-Speed 2499.00 samples/sec Loss 1.0975 LearningRate 0.000015 Epoch: 35 Global Step: 739240 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:39:48,740-Speed 2498.82 samples/sec Loss 1.1182 LearningRate 0.000015 Epoch: 35 Global Step: 739250 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:39:56,943-Speed 2496.93 samples/sec Loss 1.1583 LearningRate 0.000015 Epoch: 35 Global Step: 739260 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:40:05,104-Speed 2509.71 samples/sec Loss 1.0892 LearningRate 0.000015 Epoch: 35 Global Step: 739270 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:40:13,304-Speed 2498.09 samples/sec Loss 1.1091 LearningRate 0.000015 Epoch: 35 Global Step: 739280 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:40:21,507-Speed 2496.98 samples/sec Loss 1.1270 LearningRate 0.000015 Epoch: 35 Global Step: 739290 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:40:29,707-Speed 2498.03 samples/sec Loss 1.1028 LearningRate 0.000015 Epoch: 35 Global Step: 739300 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:40:37,911-Speed 2496.33 samples/sec Loss 1.1360 LearningRate 0.000015 Epoch: 35 Global Step: 739310 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:40:46,128-Speed 2492.80 samples/sec Loss 1.1454 LearningRate 0.000015 Epoch: 35 Global Step: 739320 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:40:54,276-Speed 2514.39 samples/sec Loss 1.1639 LearningRate 0.000015 Epoch: 35 Global Step: 739330 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:41:02,489-Speed 2494.09 samples/sec Loss 1.1201 LearningRate 0.000015 Epoch: 35 Global Step: 739340 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:41:10,695-Speed 2496.24 samples/sec Loss 1.0925 LearningRate 0.000015 Epoch: 35 Global Step: 739350 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:41:18,896-Speed 2497.83 samples/sec Loss 1.1060 LearningRate 0.000015 Epoch: 35 Global Step: 739360 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:41:27,097-Speed 2497.43 samples/sec Loss 1.1052 LearningRate 0.000015 Epoch: 35 Global Step: 739370 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:41:35,298-Speed 2497.54 samples/sec Loss 1.1034 LearningRate 0.000015 Epoch: 35 Global Step: 739380 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:41:43,464-Speed 2508.65 samples/sec Loss 1.0980 LearningRate 0.000015 Epoch: 35 Global Step: 739390 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:41:51,661-Speed 2498.85 samples/sec Loss 1.1095 LearningRate 0.000015 Epoch: 35 Global Step: 739400 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:41:59,857-Speed 2499.13 samples/sec Loss 1.1213 LearningRate 0.000015 Epoch: 35 Global Step: 739410 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:42:08,055-Speed 2498.40 samples/sec Loss 1.1275 LearningRate 0.000015 Epoch: 35 Global Step: 739420 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:42:16,255-Speed 2497.90 samples/sec Loss 1.1168 LearningRate 0.000015 Epoch: 35 Global Step: 739430 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:42:24,455-Speed 2498.01 samples/sec Loss 1.1249 LearningRate 0.000015 Epoch: 35 Global Step: 739440 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:42:32,605-Speed 2513.15 samples/sec Loss 1.1340 LearningRate 0.000015 Epoch: 35 Global Step: 739450 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:42:40,803-Speed 2498.60 samples/sec Loss 1.1050 LearningRate 0.000015 Epoch: 35 Global Step: 739460 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:42:49,005-Speed 2497.44 samples/sec Loss 1.1048 LearningRate 0.000015 Epoch: 35 Global Step: 739470 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:42:57,204-Speed 2498.35 samples/sec Loss 1.1291 LearningRate 0.000015 Epoch: 35 Global Step: 739480 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:43:05,403-Speed 2498.46 samples/sec Loss 1.1299 LearningRate 0.000015 Epoch: 35 Global Step: 739490 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:43:13,604-Speed 2497.82 samples/sec Loss 1.1090 LearningRate 0.000015 Epoch: 35 Global Step: 739500 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:43:21,753-Speed 2513.24 samples/sec Loss 1.1278 LearningRate 0.000015 Epoch: 35 Global Step: 739510 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:43:29,953-Speed 2498.15 samples/sec Loss 1.1370 LearningRate 0.000015 Epoch: 35 Global Step: 739520 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:43:38,162-Speed 2495.24 samples/sec Loss 1.1476 LearningRate 0.000015 Epoch: 35 Global Step: 739530 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:43:46,363-Speed 2497.87 samples/sec Loss 1.1186 LearningRate 0.000015 Epoch: 35 Global Step: 739540 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:43:54,563-Speed 2497.79 samples/sec Loss 1.1205 LearningRate 0.000015 Epoch: 35 Global Step: 739550 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:44:02,765-Speed 2497.63 samples/sec Loss 1.1172 LearningRate 0.000015 Epoch: 35 Global Step: 739560 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:44:10,915-Speed 2513.10 samples/sec Loss 1.1394 LearningRate 0.000015 Epoch: 35 Global Step: 739570 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:44:19,118-Speed 2497.14 samples/sec Loss 1.1254 LearningRate 0.000015 Epoch: 35 Global Step: 739580 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:44:27,323-Speed 2496.49 samples/sec Loss 1.1137 LearningRate 0.000015 Epoch: 35 Global Step: 739590 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:44:35,528-Speed 2496.24 samples/sec Loss 1.1140 LearningRate 0.000015 Epoch: 35 Global Step: 739600 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:44:43,733-Speed 2496.33 samples/sec Loss 1.1103 LearningRate 0.000015 Epoch: 35 Global Step: 739610 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:44:51,935-Speed 2497.46 samples/sec Loss 1.0921 LearningRate 0.000015 Epoch: 35 Global Step: 739620 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:45:00,083-Speed 2513.69 samples/sec Loss 1.0910 LearningRate 0.000015 Epoch: 35 Global Step: 739630 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:45:08,282-Speed 2498.14 samples/sec Loss 1.1144 LearningRate 0.000015 Epoch: 35 Global Step: 739640 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:45:16,482-Speed 2498.04 samples/sec Loss 1.0886 LearningRate 0.000015 Epoch: 35 Global Step: 739650 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:45:24,694-Speed 2494.20 samples/sec Loss 1.1128 LearningRate 0.000014 Epoch: 35 Global Step: 739660 Fp16 Grad Scale: 8192 Required: 21 hours
Training: 2022-07-12 15:45:32,898-Speed 2496.66 samples/sec Loss 1.1114 LearningRate 0.000014 Epoch: 35 Global Step: 739670 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:45:41,102-Speed 2496.70 samples/sec Loss 1.1026 LearningRate 0.000014 Epoch: 35 Global Step: 739680 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:45:49,249-Speed 2514.06 samples/sec Loss 1.1415 LearningRate 0.000014 Epoch: 35 Global Step: 739690 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:45:57,451-Speed 2497.51 samples/sec Loss 1.1037 LearningRate 0.000014 Epoch: 35 Global Step: 739700 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:46:05,666-Speed 2493.39 samples/sec Loss 1.1056 LearningRate 0.000014 Epoch: 35 Global Step: 739710 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:46:13,870-Speed 2496.63 samples/sec Loss 1.1227 LearningRate 0.000014 Epoch: 35 Global Step: 739720 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:46:22,083-Speed 2494.05 samples/sec Loss 1.1037 LearningRate 0.000014 Epoch: 35 Global Step: 739730 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:46:30,280-Speed 2498.71 samples/sec Loss 1.1194 LearningRate 0.000014 Epoch: 35 Global Step: 739740 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:46:38,429-Speed 2513.60 samples/sec Loss 1.1028 LearningRate 0.000014 Epoch: 35 Global Step: 739750 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:46:46,637-Speed 2495.50 samples/sec Loss 1.1512 LearningRate 0.000014 Epoch: 35 Global Step: 739760 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:46:54,848-Speed 2494.76 samples/sec Loss 1.0897 LearningRate 0.000014 Epoch: 35 Global Step: 739770 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:47:03,051-Speed 2497.11 samples/sec Loss 1.1273 LearningRate 0.000014 Epoch: 35 Global Step: 739780 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:47:11,252-Speed 2497.69 samples/sec Loss 1.1176 LearningRate 0.000014 Epoch: 35 Global Step: 739790 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:47:19,460-Speed 2495.62 samples/sec Loss 1.1153 LearningRate 0.000014 Epoch: 35 Global Step: 739800 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:47:27,612-Speed 2512.59 samples/sec Loss 1.1164 LearningRate 0.000014 Epoch: 35 Global Step: 739810 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:47:35,817-Speed 2496.49 samples/sec Loss 1.1058 LearningRate 0.000014 Epoch: 35 Global Step: 739820 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:47:44,021-Speed 2496.83 samples/sec Loss 1.1208 LearningRate 0.000014 Epoch: 35 Global Step: 739830 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:47:52,226-Speed 2496.37 samples/sec Loss 1.1214 LearningRate 0.000014 Epoch: 35 Global Step: 739840 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:48:00,427-Speed 2497.59 samples/sec Loss 1.1059 LearningRate 0.000014 Epoch: 35 Global Step: 739850 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:48:08,629-Speed 2497.54 samples/sec Loss 1.1656 LearningRate 0.000014 Epoch: 35 Global Step: 739860 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:48:16,794-Speed 2508.61 samples/sec Loss 1.1352 LearningRate 0.000014 Epoch: 35 Global Step: 739870 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:48:24,994-Speed 2497.77 samples/sec Loss 1.1414 LearningRate 0.000014 Epoch: 35 Global Step: 739880 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:48:33,200-Speed 2496.20 samples/sec Loss 1.1440 LearningRate 0.000014 Epoch: 35 Global Step: 739890 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:48:41,398-Speed 2498.54 samples/sec Loss 1.1138 LearningRate 0.000014 Epoch: 35 Global Step: 739900 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:48:49,599-Speed 2497.71 samples/sec Loss 1.1374 LearningRate 0.000014 Epoch: 35 Global Step: 739910 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:48:57,800-Speed 2497.42 samples/sec Loss 1.1038 LearningRate 0.000014 Epoch: 35 Global Step: 739920 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:49:05,947-Speed 2514.32 samples/sec Loss 1.1128 LearningRate 0.000014 Epoch: 35 Global Step: 739930 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:49:14,158-Speed 2494.93 samples/sec Loss 1.1312 LearningRate 0.000014 Epoch: 35 Global Step: 739940 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:49:22,357-Speed 2498.07 samples/sec Loss 1.1233 LearningRate 0.000014 Epoch: 35 Global Step: 739950 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:49:30,558-Speed 2497.58 samples/sec Loss 1.1024 LearningRate 0.000014 Epoch: 35 Global Step: 739960 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:49:38,758-Speed 2498.14 samples/sec Loss 1.1343 LearningRate 0.000014 Epoch: 35 Global Step: 739970 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:49:46,958-Speed 2498.03 samples/sec Loss 1.0947 LearningRate 0.000014 Epoch: 35 Global Step: 739980 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:49:55,108-Speed 2513.53 samples/sec Loss 1.1212 LearningRate 0.000014 Epoch: 35 Global Step: 739990 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:50:03,311-Speed 2496.90 samples/sec Loss 1.1089 LearningRate 0.000014 Epoch: 35 Global Step: 740000 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-07-12 15:50:11,510-Speed 2498.18 samples/sec Loss 1.1130 LearningRate 0.000014 Epoch: 35 Global Step: 740010 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:50:19,715-Speed 2496.62 samples/sec Loss 1.1626 LearningRate 0.000014 Epoch: 35 Global Step: 740020 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:50:27,917-Speed 2497.40 samples/sec Loss 1.1317 LearningRate 0.000014 Epoch: 35 Global Step: 740030 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:50:36,123-Speed 2496.07 samples/sec Loss 1.1055 LearningRate 0.000014 Epoch: 35 Global Step: 740040 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:50:44,296-Speed 2506.17 samples/sec Loss 1.1165 LearningRate 0.000014 Epoch: 35 Global Step: 740050 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:50:52,500-Speed 2497.10 samples/sec Loss 1.1431 LearningRate 0.000014 Epoch: 35 Global Step: 740060 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:51:00,699-Speed 2498.38 samples/sec Loss 1.1139 LearningRate 0.000014 Epoch: 35 Global Step: 740070 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:51:08,901-Speed 2497.12 samples/sec Loss 1.1273 LearningRate 0.000014 Epoch: 35 Global Step: 740080 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:51:17,105-Speed 2497.13 samples/sec Loss 1.1088 LearningRate 0.000014 Epoch: 35 Global Step: 740090 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:51:25,305-Speed 2498.04 samples/sec Loss 1.1027 LearningRate 0.000014 Epoch: 35 Global Step: 740100 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:51:33,452-Speed 2514.21 samples/sec Loss 1.1077 LearningRate 0.000014 Epoch: 35 Global Step: 740110 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:51:41,669-Speed 2492.75 samples/sec Loss 1.1268 LearningRate 0.000014 Epoch: 35 Global Step: 740120 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:51:49,876-Speed 2495.95 samples/sec Loss 1.1103 LearningRate 0.000014 Epoch: 35 Global Step: 740130 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:51:58,075-Speed 2498.65 samples/sec Loss 1.1229 LearningRate 0.000014 Epoch: 35 Global Step: 740140 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:52:06,287-Speed 2494.48 samples/sec Loss 1.1198 LearningRate 0.000014 Epoch: 35 Global Step: 740150 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:52:14,484-Speed 2498.77 samples/sec Loss 1.0958 LearningRate 0.000014 Epoch: 35 Global Step: 740160 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:52:22,632-Speed 2514.57 samples/sec Loss 1.1189 LearningRate 0.000014 Epoch: 35 Global Step: 740170 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:52:30,832-Speed 2498.13 samples/sec Loss 1.1305 LearningRate 0.000014 Epoch: 35 Global Step: 740180 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:52:39,029-Speed 2498.70 samples/sec Loss 1.1447 LearningRate 0.000014 Epoch: 35 Global Step: 740190 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:52:47,230-Speed 2497.65 samples/sec Loss 1.0702 LearningRate 0.000014 Epoch: 35 Global Step: 740200 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:52:55,428-Speed 2498.75 samples/sec Loss 1.1264 LearningRate 0.000014 Epoch: 35 Global Step: 740210 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:53:03,628-Speed 2498.82 samples/sec Loss 1.1068 LearningRate 0.000014 Epoch: 35 Global Step: 740220 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:53:11,786-Speed 2510.77 samples/sec Loss 1.1219 LearningRate 0.000014 Epoch: 35 Global Step: 740230 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:53:19,985-Speed 2498.25 samples/sec Loss 1.0660 LearningRate 0.000014 Epoch: 35 Global Step: 740240 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:53:28,188-Speed 2496.85 samples/sec Loss 1.1059 LearningRate 0.000014 Epoch: 35 Global Step: 740250 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:53:36,389-Speed 2497.67 samples/sec Loss 1.0891 LearningRate 0.000014 Epoch: 35 Global Step: 740260 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:53:44,588-Speed 2498.23 samples/sec Loss 1.0691 LearningRate 0.000014 Epoch: 35 Global Step: 740270 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:53:52,790-Speed 2497.48 samples/sec Loss 1.0971 LearningRate 0.000014 Epoch: 35 Global Step: 740280 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:54:00,936-Speed 2514.79 samples/sec Loss 1.1243 LearningRate 0.000014 Epoch: 35 Global Step: 740290 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:54:09,146-Speed 2494.77 samples/sec Loss 1.1571 LearningRate 0.000014 Epoch: 35 Global Step: 740300 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:54:17,348-Speed 2497.49 samples/sec Loss 1.1230 LearningRate 0.000014 Epoch: 35 Global Step: 740310 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:54:25,548-Speed 2497.81 samples/sec Loss 1.1080 LearningRate 0.000014 Epoch: 35 Global Step: 740320 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:54:33,748-Speed 2498.15 samples/sec Loss 1.1229 LearningRate 0.000014 Epoch: 35 Global Step: 740330 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:54:41,947-Speed 2498.11 samples/sec Loss 1.1261 LearningRate 0.000014 Epoch: 35 Global Step: 740340 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:54:50,092-Speed 2514.91 samples/sec Loss 1.1051 LearningRate 0.000014 Epoch: 35 Global Step: 740350 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:54:58,300-Speed 2495.31 samples/sec Loss 1.1489 LearningRate 0.000014 Epoch: 35 Global Step: 740360 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:55:06,499-Speed 2498.34 samples/sec Loss 1.1072 LearningRate 0.000014 Epoch: 35 Global Step: 740370 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:55:14,712-Speed 2494.03 samples/sec Loss 1.0957 LearningRate 0.000014 Epoch: 35 Global Step: 740380 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:55:22,914-Speed 2497.41 samples/sec Loss 1.1172 LearningRate 0.000014 Epoch: 35 Global Step: 740390 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:55:31,115-Speed 2497.68 samples/sec Loss 1.1253 LearningRate 0.000014 Epoch: 35 Global Step: 740400 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:55:39,270-Speed 2511.93 samples/sec Loss 1.1013 LearningRate 0.000014 Epoch: 35 Global Step: 740410 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:55:47,476-Speed 2496.17 samples/sec Loss 1.1253 LearningRate 0.000014 Epoch: 35 Global Step: 740420 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:55:55,678-Speed 2497.33 samples/sec Loss 1.1033 LearningRate 0.000014 Epoch: 35 Global Step: 740430 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:56:03,882-Speed 2497.03 samples/sec Loss 1.1132 LearningRate 0.000014 Epoch: 35 Global Step: 740440 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:56:12,084-Speed 2497.60 samples/sec Loss 1.1201 LearningRate 0.000014 Epoch: 35 Global Step: 740450 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:56:20,285-Speed 2497.70 samples/sec Loss 1.1024 LearningRate 0.000014 Epoch: 35 Global Step: 740460 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:56:28,430-Speed 2514.60 samples/sec Loss 1.1179 LearningRate 0.000014 Epoch: 35 Global Step: 740470 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:56:36,634-Speed 2497.33 samples/sec Loss 1.1058 LearningRate 0.000014 Epoch: 35 Global Step: 740480 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:56:44,835-Speed 2497.86 samples/sec Loss 1.1075 LearningRate 0.000014 Epoch: 35 Global Step: 740490 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:56:53,032-Speed 2498.69 samples/sec Loss 1.0881 LearningRate 0.000014 Epoch: 35 Global Step: 740500 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:57:01,245-Speed 2493.90 samples/sec Loss 1.1076 LearningRate 0.000014 Epoch: 35 Global Step: 740510 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:57:09,453-Speed 2495.76 samples/sec Loss 1.0916 LearningRate 0.000014 Epoch: 35 Global Step: 740520 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:57:17,602-Speed 2513.56 samples/sec Loss 1.1133 LearningRate 0.000014 Epoch: 35 Global Step: 740530 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:57:25,802-Speed 2498.02 samples/sec Loss 1.1111 LearningRate 0.000014 Epoch: 35 Global Step: 740540 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:57:34,009-Speed 2496.13 samples/sec Loss 1.1028 LearningRate 0.000014 Epoch: 35 Global Step: 740550 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:57:42,211-Speed 2497.40 samples/sec Loss 1.1079 LearningRate 0.000014 Epoch: 35 Global Step: 740560 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:57:50,410-Speed 2498.15 samples/sec Loss 1.1215 LearningRate 0.000014 Epoch: 35 Global Step: 740570 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:57:58,610-Speed 2497.89 samples/sec Loss 1.1196 LearningRate 0.000014 Epoch: 35 Global Step: 740580 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:58:06,761-Speed 2512.78 samples/sec Loss 1.1303 LearningRate 0.000014 Epoch: 35 Global Step: 740590 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:58:14,964-Speed 2497.63 samples/sec Loss 1.1201 LearningRate 0.000014 Epoch: 35 Global Step: 740600 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:58:23,162-Speed 2498.49 samples/sec Loss 1.1236 LearningRate 0.000014 Epoch: 35 Global Step: 740610 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:58:31,370-Speed 2495.38 samples/sec Loss 1.1097 LearningRate 0.000014 Epoch: 35 Global Step: 740620 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:58:39,572-Speed 2497.41 samples/sec Loss 1.0913 LearningRate 0.000014 Epoch: 35 Global Step: 740630 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:58:47,787-Speed 2493.29 samples/sec Loss 1.1336 LearningRate 0.000014 Epoch: 35 Global Step: 740640 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:58:55,950-Speed 2509.13 samples/sec Loss 1.0857 LearningRate 0.000014 Epoch: 35 Global Step: 740650 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:59:04,154-Speed 2496.73 samples/sec Loss 1.1274 LearningRate 0.000014 Epoch: 35 Global Step: 740660 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:59:12,360-Speed 2496.39 samples/sec Loss 1.1083 LearningRate 0.000014 Epoch: 35 Global Step: 740670 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:59:20,574-Speed 2493.63 samples/sec Loss 1.1304 LearningRate 0.000014 Epoch: 35 Global Step: 740680 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:59:28,776-Speed 2497.21 samples/sec Loss 1.0900 LearningRate 0.000014 Epoch: 35 Global Step: 740690 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:59:36,979-Speed 2496.89 samples/sec Loss 1.0987 LearningRate 0.000014 Epoch: 35 Global Step: 740700 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:59:45,152-Speed 2506.38 samples/sec Loss 1.1124 LearningRate 0.000014 Epoch: 35 Global Step: 740710 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 15:59:53,353-Speed 2498.14 samples/sec Loss 1.1114 LearningRate 0.000014 Epoch: 35 Global Step: 740720 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 16:00:01,556-Speed 2497.02 samples/sec Loss 1.1236 LearningRate 0.000014 Epoch: 35 Global Step: 740730 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 16:00:09,757-Speed 2497.69 samples/sec Loss 1.1373 LearningRate 0.000014 Epoch: 35 Global Step: 740740 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 16:00:17,960-Speed 2497.25 samples/sec Loss 1.1010 LearningRate 0.000014 Epoch: 35 Global Step: 740750 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 16:00:26,161-Speed 2497.60 samples/sec Loss 1.0935 LearningRate 0.000014 Epoch: 35 Global Step: 740760 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 16:00:34,317-Speed 2511.46 samples/sec Loss 1.1476 LearningRate 0.000014 Epoch: 35 Global Step: 740770 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 16:00:42,517-Speed 2498.10 samples/sec Loss 1.1140 LearningRate 0.000014 Epoch: 35 Global Step: 740780 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 16:00:50,729-Speed 2494.61 samples/sec Loss 1.1415 LearningRate 0.000014 Epoch: 35 Global Step: 740790 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 16:00:58,929-Speed 2497.70 samples/sec Loss 1.1272 LearningRate 0.000014 Epoch: 35 Global Step: 740800 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 16:01:07,134-Speed 2496.59 samples/sec Loss 1.0980 LearningRate 0.000014 Epoch: 35 Global Step: 740810 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 16:01:15,337-Speed 2496.96 samples/sec Loss 1.1043 LearningRate 0.000014 Epoch: 35 Global Step: 740820 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 16:01:23,481-Speed 2515.25 samples/sec Loss 1.1180 LearningRate 0.000014 Epoch: 35 Global Step: 740830 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-07-12 16:01:31,637-Speed 2511.42 samples/sec Loss 1.0988 LearningRate 0.000014 Epoch: 35 Global Step: 740840 Fp16 Grad Scale: 8192 Required: 20 hours
Training: 2022-07-12 16:01:39,835-Speed 2498.86 samples/sec Loss 1.1090 LearningRate 0.000014 Epoch: 35 Global Step: 740850 Fp16 Grad Scale: 8192 Required: 20 hours
Training: 2022-07-12 16:01:48,031-Speed 2499.11 samples/sec Loss 1.1249 LearningRate 0.000014 Epoch: 35 Global Step: 740860 Fp16 Grad Scale: 8192 Required: 20 hours
Training: 2022-07-12 16:01:56,230-Speed 2498.28 samples/sec Loss 1.1291 LearningRate 0.000014 Epoch: 35 Global Step: 740870 Fp16 Grad Scale: 8192 Required: 20 hours
Training: 2022-07-12 16:02:04,431-Speed 2497.57 samples/sec Loss 1.1191 LearningRate 0.000014 Epoch: 35 Global Step: 740880 Fp16 Grad Scale: 8192 Required: 20 hours
Training: 2022-07-12 16:02:12,580-Speed 2513.70 samples/sec Loss 1.0858 LearningRate 0.000014 Epoch: 35 Global Step: 740890 Fp16 Grad Scale: 8192 Required: 20 hours
Training: 2022-07-12 16:02:20,781-Speed 2497.83 samples/sec Loss 1.1026 LearningRate 0.000014 Epoch: 35 Global Step: 740900 Fp16 Grad Scale: 8192 Required: 20 hours
Training: 2022-07-12 16:02:28,981-Speed 2498.13 samples/sec Loss 1.1312 LearningRate 0.000014 Epoch: 35 Global Step: 740910 Fp16 Grad Scale: 8192 Required: 20
hours Training: 2022-07-12 16:02:37,181-Speed 2497.82 samples/sec Loss 1.1690 LearningRate 0.000014 Epoch: 35 Global Step: 740920 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:02:45,382-Speed 2497.93 samples/sec Loss 1.1045 LearningRate 0.000014 Epoch: 35 Global Step: 740930 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:02:53,598-Speed 2493.15 samples/sec Loss 1.1106 LearningRate 0.000014 Epoch: 35 Global Step: 740940 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:03:01,755-Speed 2511.37 samples/sec Loss 1.0845 LearningRate 0.000014 Epoch: 35 Global Step: 740950 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:03:09,960-Speed 2496.35 samples/sec Loss 1.1404 LearningRate 0.000014 Epoch: 35 Global Step: 740960 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:03:18,164-Speed 2496.93 samples/sec Loss 1.1269 LearningRate 0.000014 Epoch: 35 Global Step: 740970 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:03:26,394-Speed 2488.99 samples/sec Loss 1.1313 LearningRate 0.000014 Epoch: 35 Global Step: 740980 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:03:34,592-Speed 2498.64 samples/sec Loss 1.1071 LearningRate 0.000014 Epoch: 35 Global Step: 740990 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:03:42,805-Speed 2494.10 samples/sec Loss 1.0959 LearningRate 0.000014 Epoch: 35 Global Step: 741000 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:03:50,949-Speed 2515.04 samples/sec Loss 1.1103 LearningRate 0.000014 Epoch: 35 Global Step: 741010 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:03:59,150-Speed 2498.09 samples/sec Loss 1.0917 LearningRate 0.000014 Epoch: 35 Global Step: 741020 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:04:07,354-Speed 2496.63 samples/sec Loss 1.0996 LearningRate 0.000014 Epoch: 35 Global Step: 741030 Fp16 Grad Scale: 8192 Required: 20 hours Training: 
2022-07-12 16:04:15,558-Speed 2496.53 samples/sec Loss 1.1110 LearningRate 0.000014 Epoch: 35 Global Step: 741040 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:04:23,765-Speed 2496.24 samples/sec Loss 1.1118 LearningRate 0.000014 Epoch: 35 Global Step: 741050 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:04:31,969-Speed 2496.90 samples/sec Loss 1.1252 LearningRate 0.000014 Epoch: 35 Global Step: 741060 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:04:40,126-Speed 2510.96 samples/sec Loss 1.1168 LearningRate 0.000014 Epoch: 35 Global Step: 741070 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:04:48,330-Speed 2497.12 samples/sec Loss 1.1054 LearningRate 0.000014 Epoch: 35 Global Step: 741080 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:04:56,542-Speed 2494.19 samples/sec Loss 1.1277 LearningRate 0.000014 Epoch: 35 Global Step: 741090 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:05:04,765-Speed 2490.84 samples/sec Loss 1.1158 LearningRate 0.000014 Epoch: 35 Global Step: 741100 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:05:12,965-Speed 2498.16 samples/sec Loss 1.1288 LearningRate 0.000014 Epoch: 35 Global Step: 741110 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:05:21,186-Speed 2491.46 samples/sec Loss 1.1158 LearningRate 0.000014 Epoch: 35 Global Step: 741120 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:05:29,336-Speed 2514.69 samples/sec Loss 1.1274 LearningRate 0.000014 Epoch: 35 Global Step: 741130 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:05:37,536-Speed 2498.02 samples/sec Loss 1.1405 LearningRate 0.000014 Epoch: 35 Global Step: 741140 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:05:45,738-Speed 2497.25 samples/sec Loss 1.0985 LearningRate 0.000014 Epoch: 35 Global Step: 741150 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 
16:05:53,946-Speed 2495.77 samples/sec Loss 1.1125 LearningRate 0.000014 Epoch: 35 Global Step: 741160 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:06:02,148-Speed 2497.14 samples/sec Loss 1.1098 LearningRate 0.000014 Epoch: 35 Global Step: 741170 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:06:10,349-Speed 2497.71 samples/sec Loss 1.0994 LearningRate 0.000014 Epoch: 35 Global Step: 741180 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:06:18,496-Speed 2514.19 samples/sec Loss 1.0790 LearningRate 0.000014 Epoch: 35 Global Step: 741190 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:06:26,695-Speed 2498.24 samples/sec Loss 1.1258 LearningRate 0.000014 Epoch: 35 Global Step: 741200 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:06:34,894-Speed 2498.41 samples/sec Loss 1.0937 LearningRate 0.000014 Epoch: 35 Global Step: 741210 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:06:43,111-Speed 2493.12 samples/sec Loss 1.1160 LearningRate 0.000014 Epoch: 35 Global Step: 741220 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:06:51,316-Speed 2496.40 samples/sec Loss 1.0980 LearningRate 0.000014 Epoch: 35 Global Step: 741230 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:06:59,521-Speed 2496.81 samples/sec Loss 1.1188 LearningRate 0.000014 Epoch: 35 Global Step: 741240 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:07:07,681-Speed 2510.04 samples/sec Loss 1.0968 LearningRate 0.000014 Epoch: 35 Global Step: 741250 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:07:15,881-Speed 2497.95 samples/sec Loss 1.1031 LearningRate 0.000014 Epoch: 35 Global Step: 741260 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:07:24,080-Speed 2497.85 samples/sec Loss 1.1018 LearningRate 0.000014 Epoch: 35 Global Step: 741270 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:07:32,287-Speed 
2495.86 samples/sec Loss 1.1270 LearningRate 0.000014 Epoch: 35 Global Step: 741280 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:07:40,492-Speed 2496.58 samples/sec Loss 1.0894 LearningRate 0.000014 Epoch: 35 Global Step: 741290 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:07:48,696-Speed 2496.90 samples/sec Loss 1.1350 LearningRate 0.000014 Epoch: 35 Global Step: 741300 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:07:56,849-Speed 2512.42 samples/sec Loss 1.1451 LearningRate 0.000014 Epoch: 35 Global Step: 741310 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:08:05,051-Speed 2497.36 samples/sec Loss 1.1216 LearningRate 0.000014 Epoch: 35 Global Step: 741320 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:08:13,258-Speed 2495.86 samples/sec Loss 1.1013 LearningRate 0.000014 Epoch: 35 Global Step: 741330 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:08:21,468-Speed 2494.83 samples/sec Loss 1.1228 LearningRate 0.000014 Epoch: 35 Global Step: 741340 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:08:29,676-Speed 2495.79 samples/sec Loss 1.0959 LearningRate 0.000014 Epoch: 35 Global Step: 741350 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:08:37,888-Speed 2494.08 samples/sec Loss 1.1260 LearningRate 0.000014 Epoch: 35 Global Step: 741360 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:08:46,036-Speed 2513.95 samples/sec Loss 1.0863 LearningRate 0.000014 Epoch: 35 Global Step: 741370 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:08:54,240-Speed 2496.79 samples/sec Loss 1.0936 LearningRate 0.000014 Epoch: 35 Global Step: 741380 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:09:02,446-Speed 2496.16 samples/sec Loss 1.1204 LearningRate 0.000014 Epoch: 35 Global Step: 741390 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:09:10,646-Speed 2498.15 samples/sec 
Loss 1.1030 LearningRate 0.000014 Epoch: 35 Global Step: 741400 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:09:18,846-Speed 2497.65 samples/sec Loss 1.1345 LearningRate 0.000014 Epoch: 35 Global Step: 741410 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:09:27,052-Speed 2496.10 samples/sec Loss 1.1329 LearningRate 0.000014 Epoch: 35 Global Step: 741420 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:09:35,209-Speed 2511.53 samples/sec Loss 1.1265 LearningRate 0.000014 Epoch: 35 Global Step: 741430 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:09:43,409-Speed 2497.95 samples/sec Loss 1.1255 LearningRate 0.000014 Epoch: 35 Global Step: 741440 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:09:51,608-Speed 2498.07 samples/sec Loss 1.1167 LearningRate 0.000014 Epoch: 35 Global Step: 741450 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:09:59,811-Speed 2497.03 samples/sec Loss 1.0986 LearningRate 0.000014 Epoch: 35 Global Step: 741460 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:10:08,013-Speed 2497.32 samples/sec Loss 1.0992 LearningRate 0.000014 Epoch: 35 Global Step: 741470 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:10:16,213-Speed 2497.99 samples/sec Loss 1.1421 LearningRate 0.000014 Epoch: 35 Global Step: 741480 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:10:24,370-Speed 2511.38 samples/sec Loss 1.0939 LearningRate 0.000014 Epoch: 35 Global Step: 741490 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:10:32,567-Speed 2498.66 samples/sec Loss 1.1086 LearningRate 0.000014 Epoch: 35 Global Step: 741500 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:10:40,768-Speed 2497.76 samples/sec Loss 1.1229 LearningRate 0.000014 Epoch: 35 Global Step: 741510 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:10:48,971-Speed 2497.08 samples/sec Loss 1.0704 
LearningRate 0.000014 Epoch: 35 Global Step: 741520 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:10:57,174-Speed 2497.14 samples/sec Loss 1.0913 LearningRate 0.000014 Epoch: 35 Global Step: 741530 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:11:05,377-Speed 2497.06 samples/sec Loss 1.1402 LearningRate 0.000014 Epoch: 35 Global Step: 741540 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:11:13,524-Speed 2514.13 samples/sec Loss 1.1164 LearningRate 0.000014 Epoch: 35 Global Step: 741550 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:11:21,724-Speed 2498.07 samples/sec Loss 1.1177 LearningRate 0.000014 Epoch: 35 Global Step: 741560 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:11:29,929-Speed 2496.44 samples/sec Loss 1.0962 LearningRate 0.000014 Epoch: 35 Global Step: 741570 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:11:38,142-Speed 2494.05 samples/sec Loss 1.1309 LearningRate 0.000014 Epoch: 35 Global Step: 741580 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:11:46,343-Speed 2497.56 samples/sec Loss 1.0620 LearningRate 0.000014 Epoch: 35 Global Step: 741590 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:11:54,548-Speed 2496.23 samples/sec Loss 1.0965 LearningRate 0.000014 Epoch: 35 Global Step: 741600 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:12:02,697-Speed 2513.75 samples/sec Loss 1.0956 LearningRate 0.000014 Epoch: 35 Global Step: 741610 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:12:10,897-Speed 2498.12 samples/sec Loss 1.1013 LearningRate 0.000014 Epoch: 35 Global Step: 741620 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:12:19,098-Speed 2497.72 samples/sec Loss 1.0901 LearningRate 0.000014 Epoch: 35 Global Step: 741630 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:12:27,297-Speed 2498.57 samples/sec Loss 1.0913 LearningRate 
0.000014 Epoch: 35 Global Step: 741640 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:12:35,498-Speed 2497.72 samples/sec Loss 1.1206 LearningRate 0.000014 Epoch: 35 Global Step: 741650 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:12:43,697-Speed 2498.22 samples/sec Loss 1.1250 LearningRate 0.000014 Epoch: 35 Global Step: 741660 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:12:51,858-Speed 2509.98 samples/sec Loss 1.1130 LearningRate 0.000014 Epoch: 35 Global Step: 741670 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:13:00,065-Speed 2495.78 samples/sec Loss 1.0827 LearningRate 0.000014 Epoch: 35 Global Step: 741680 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:13:08,266-Speed 2497.75 samples/sec Loss 1.1069 LearningRate 0.000014 Epoch: 35 Global Step: 741690 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:13:16,549-Speed 2472.75 samples/sec Loss 1.1213 LearningRate 0.000014 Epoch: 35 Global Step: 741700 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:13:24,750-Speed 2497.57 samples/sec Loss 1.0965 LearningRate 0.000014 Epoch: 35 Global Step: 741710 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:13:32,968-Speed 2492.76 samples/sec Loss 1.1018 LearningRate 0.000014 Epoch: 35 Global Step: 741720 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:13:41,118-Speed 2513.34 samples/sec Loss 1.1004 LearningRate 0.000014 Epoch: 35 Global Step: 741730 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:13:49,325-Speed 2495.96 samples/sec Loss 1.0958 LearningRate 0.000014 Epoch: 35 Global Step: 741740 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:13:57,536-Speed 2494.40 samples/sec Loss 1.1110 LearningRate 0.000014 Epoch: 35 Global Step: 741750 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:14:05,737-Speed 2498.59 samples/sec Loss 1.1115 LearningRate 0.000014 Epoch: 35 
Global Step: 741760 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:14:13,937-Speed 2497.84 samples/sec Loss 1.1368 LearningRate 0.000014 Epoch: 35 Global Step: 741770 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:14:22,136-Speed 2498.09 samples/sec Loss 1.1034 LearningRate 0.000014 Epoch: 35 Global Step: 741780 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:14:30,285-Speed 2514.07 samples/sec Loss 1.0993 LearningRate 0.000014 Epoch: 35 Global Step: 741790 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:14:38,488-Speed 2496.91 samples/sec Loss 1.0907 LearningRate 0.000014 Epoch: 35 Global Step: 741800 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:14:46,690-Speed 2497.53 samples/sec Loss 1.1229 LearningRate 0.000014 Epoch: 35 Global Step: 741810 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:14:54,890-Speed 2497.91 samples/sec Loss 1.1353 LearningRate 0.000014 Epoch: 35 Global Step: 741820 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:15:03,095-Speed 2496.27 samples/sec Loss 1.1295 LearningRate 0.000014 Epoch: 35 Global Step: 741830 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:15:11,298-Speed 2497.24 samples/sec Loss 1.1093 LearningRate 0.000014 Epoch: 35 Global Step: 741840 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:15:19,465-Speed 2508.03 samples/sec Loss 1.1095 LearningRate 0.000014 Epoch: 35 Global Step: 741850 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:15:27,685-Speed 2492.28 samples/sec Loss 1.1092 LearningRate 0.000014 Epoch: 35 Global Step: 741860 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:15:35,890-Speed 2496.49 samples/sec Loss 1.1260 LearningRate 0.000014 Epoch: 35 Global Step: 741870 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:15:44,088-Speed 2498.41 samples/sec Loss 1.0837 LearningRate 0.000014 Epoch: 35 Global Step: 741880 
Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:15:52,290-Speed 2497.60 samples/sec Loss 1.1359 LearningRate 0.000014 Epoch: 35 Global Step: 741890 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:16:00,490-Speed 2497.95 samples/sec Loss 1.0997 LearningRate 0.000014 Epoch: 35 Global Step: 741900 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:16:08,647-Speed 2511.02 samples/sec Loss 1.1495 LearningRate 0.000014 Epoch: 35 Global Step: 741910 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:16:16,848-Speed 2497.53 samples/sec Loss 1.1156 LearningRate 0.000014 Epoch: 35 Global Step: 741920 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:16:25,046-Speed 2498.40 samples/sec Loss 1.1113 LearningRate 0.000014 Epoch: 35 Global Step: 741930 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:16:33,250-Speed 2496.80 samples/sec Loss 1.0979 LearningRate 0.000014 Epoch: 35 Global Step: 741940 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:16:41,451-Speed 2497.49 samples/sec Loss 1.1165 LearningRate 0.000014 Epoch: 35 Global Step: 741950 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:16:49,652-Speed 2497.59 samples/sec Loss 1.0995 LearningRate 0.000014 Epoch: 35 Global Step: 741960 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:16:57,812-Speed 2510.51 samples/sec Loss 1.1086 LearningRate 0.000014 Epoch: 35 Global Step: 741970 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:17:06,013-Speed 2497.54 samples/sec Loss 1.1116 LearningRate 0.000014 Epoch: 35 Global Step: 741980 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:17:14,213-Speed 2498.13 samples/sec Loss 1.1159 LearningRate 0.000014 Epoch: 35 Global Step: 741990 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:17:22,413-Speed 2497.78 samples/sec Loss 1.1103 LearningRate 0.000014 Epoch: 35 Global Step: 742000 Fp16 Grad Scale: 
8192 Required: 20 hours Training: 2022-07-12 16:17:30,616-Speed 2496.89 samples/sec Loss 1.1307 LearningRate 0.000014 Epoch: 35 Global Step: 742010 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:17:38,818-Speed 2497.35 samples/sec Loss 1.1346 LearningRate 0.000014 Epoch: 35 Global Step: 742020 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:17:46,977-Speed 2510.60 samples/sec Loss 1.1279 LearningRate 0.000014 Epoch: 35 Global Step: 742030 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:17:55,177-Speed 2497.93 samples/sec Loss 1.1168 LearningRate 0.000014 Epoch: 35 Global Step: 742040 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:18:03,386-Speed 2495.31 samples/sec Loss 1.1045 LearningRate 0.000014 Epoch: 35 Global Step: 742050 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:18:11,589-Speed 2496.83 samples/sec Loss 1.1099 LearningRate 0.000014 Epoch: 35 Global Step: 742060 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:18:19,789-Speed 2498.29 samples/sec Loss 1.1138 LearningRate 0.000014 Epoch: 35 Global Step: 742070 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:18:27,992-Speed 2497.16 samples/sec Loss 1.0972 LearningRate 0.000014 Epoch: 35 Global Step: 742080 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:18:36,151-Speed 2510.86 samples/sec Loss 1.1221 LearningRate 0.000014 Epoch: 35 Global Step: 742090 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:18:44,357-Speed 2496.03 samples/sec Loss 1.0834 LearningRate 0.000014 Epoch: 35 Global Step: 742100 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:18:52,560-Speed 2497.07 samples/sec Loss 1.1265 LearningRate 0.000014 Epoch: 35 Global Step: 742110 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:19:00,761-Speed 2497.80 samples/sec Loss 1.1012 LearningRate 0.000014 Epoch: 35 Global Step: 742120 Fp16 Grad Scale: 16384 
Required: 20 hours Training: 2022-07-12 16:19:08,963-Speed 2497.67 samples/sec Loss 1.1168 LearningRate 0.000014 Epoch: 35 Global Step: 742130 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:19:17,168-Speed 2496.44 samples/sec Loss 1.1141 LearningRate 0.000014 Epoch: 35 Global Step: 742140 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:19:25,314-Speed 2515.50 samples/sec Loss 1.1263 LearningRate 0.000014 Epoch: 35 Global Step: 742150 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:19:33,520-Speed 2496.16 samples/sec Loss 1.0888 LearningRate 0.000014 Epoch: 35 Global Step: 742160 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:19:41,720-Speed 2497.94 samples/sec Loss 1.1149 LearningRate 0.000014 Epoch: 35 Global Step: 742170 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:19:49,921-Speed 2497.93 samples/sec Loss 1.1187 LearningRate 0.000014 Epoch: 35 Global Step: 742180 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:19:58,136-Speed 2493.42 samples/sec Loss 1.0917 LearningRate 0.000014 Epoch: 35 Global Step: 742190 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:20:06,344-Speed 2495.57 samples/sec Loss 1.1282 LearningRate 0.000014 Epoch: 35 Global Step: 742200 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:20:14,489-Speed 2514.56 samples/sec Loss 1.1079 LearningRate 0.000014 Epoch: 35 Global Step: 742210 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:20:22,692-Speed 2497.23 samples/sec Loss 1.0980 LearningRate 0.000014 Epoch: 35 Global Step: 742220 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:20:30,894-Speed 2497.43 samples/sec Loss 1.1170 LearningRate 0.000014 Epoch: 35 Global Step: 742230 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:20:39,095-Speed 2497.45 samples/sec Loss 1.1488 LearningRate 0.000014 Epoch: 35 Global Step: 742240 Fp16 Grad Scale: 16384 
Required: 20 hours Training: 2022-07-12 16:20:47,298-Speed 2497.07 samples/sec Loss 1.1139 LearningRate 0.000014 Epoch: 35 Global Step: 742250 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:20:55,505-Speed 2495.98 samples/sec Loss 1.1088 LearningRate 0.000014 Epoch: 35 Global Step: 742260 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:21:03,649-Speed 2515.08 samples/sec Loss 1.1054 LearningRate 0.000014 Epoch: 35 Global Step: 742270 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:21:12,014-Speed 2498.38 samples/sec Loss 1.1259 LearningRate 0.000014 Epoch: 35 Global Step: 742280 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:21:20,215-Speed 2497.45 samples/sec Loss 1.1363 LearningRate 0.000014 Epoch: 35 Global Step: 742290 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:21:28,417-Speed 2497.31 samples/sec Loss 1.1357 LearningRate 0.000014 Epoch: 35 Global Step: 742300 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:21:36,616-Speed 2498.45 samples/sec Loss 1.1345 LearningRate 0.000014 Epoch: 35 Global Step: 742310 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:21:44,818-Speed 2497.29 samples/sec Loss 1.0930 LearningRate 0.000014 Epoch: 35 Global Step: 742320 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:21:52,975-Speed 2511.16 samples/sec Loss 1.1293 LearningRate 0.000014 Epoch: 35 Global Step: 742330 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:22:01,177-Speed 2497.23 samples/sec Loss 1.1119 LearningRate 0.000014 Epoch: 35 Global Step: 742340 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:22:09,378-Speed 2497.49 samples/sec Loss 1.1262 LearningRate 0.000014 Epoch: 35 Global Step: 742350 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:22:17,577-Speed 2498.41 samples/sec Loss 1.1154 LearningRate 0.000014 Epoch: 35 Global Step: 742360 Fp16 Grad Scale: 16384 
Required: 20 hours Training: 2022-07-12 16:22:25,793-Speed 2493.03 samples/sec Loss 1.0911 LearningRate 0.000014 Epoch: 35 Global Step: 742370 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:22:33,997-Speed 2496.86 samples/sec Loss 1.0938 LearningRate 0.000014 Epoch: 35 Global Step: 742380 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:22:42,152-Speed 2511.75 samples/sec Loss 1.0937 LearningRate 0.000014 Epoch: 35 Global Step: 742390 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:22:50,358-Speed 2495.96 samples/sec Loss 1.0970 LearningRate 0.000014 Epoch: 35 Global Step: 742400 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:22:58,563-Speed 2496.56 samples/sec Loss 1.1031 LearningRate 0.000014 Epoch: 35 Global Step: 742410 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:23:06,766-Speed 2497.36 samples/sec Loss 1.0882 LearningRate 0.000014 Epoch: 35 Global Step: 742420 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:23:14,965-Speed 2498.18 samples/sec Loss 1.1139 LearningRate 0.000014 Epoch: 35 Global Step: 742430 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:23:23,171-Speed 2496.05 samples/sec Loss 1.1232 LearningRate 0.000014 Epoch: 35 Global Step: 742440 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:23:31,321-Speed 2513.42 samples/sec Loss 1.1092 LearningRate 0.000014 Epoch: 35 Global Step: 742450 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:23:39,537-Speed 2493.15 samples/sec Loss 1.1244 LearningRate 0.000014 Epoch: 35 Global Step: 742460 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:23:47,739-Speed 2497.52 samples/sec Loss 1.1281 LearningRate 0.000014 Epoch: 35 Global Step: 742470 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:23:55,940-Speed 2497.63 samples/sec Loss 1.1021 LearningRate 0.000014 Epoch: 35 Global Step: 742480 Fp16 Grad Scale: 16384 
Required: 20 hours Training: 2022-07-12 16:24:04,140-Speed 2498.13 samples/sec Loss 1.0736 LearningRate 0.000014 Epoch: 35 Global Step: 742490 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:24:12,339-Speed 2498.16 samples/sec Loss 1.1066 LearningRate 0.000014 Epoch: 35 Global Step: 742500 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:24:20,488-Speed 2513.70 samples/sec Loss 1.1425 LearningRate 0.000014 Epoch: 35 Global Step: 742510 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:24:28,699-Speed 2494.49 samples/sec Loss 1.0990 LearningRate 0.000014 Epoch: 35 Global Step: 742520 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:24:36,900-Speed 2497.75 samples/sec Loss 1.1060 LearningRate 0.000014 Epoch: 35 Global Step: 742530 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:24:45,112-Speed 2494.08 samples/sec Loss 1.1415 LearningRate 0.000014 Epoch: 35 Global Step: 742540 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:24:53,312-Speed 2498.21 samples/sec Loss 1.1249 LearningRate 0.000014 Epoch: 35 Global Step: 742550 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:25:01,514-Speed 2497.21 samples/sec Loss 1.1225 LearningRate 0.000014 Epoch: 35 Global Step: 742560 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:25:09,660-Speed 2514.47 samples/sec Loss 1.1056 LearningRate 0.000014 Epoch: 35 Global Step: 742570 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:25:17,860-Speed 2498.05 samples/sec Loss 1.1172 LearningRate 0.000014 Epoch: 35 Global Step: 742580 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:25:26,070-Speed 2494.96 samples/sec Loss 1.1305 LearningRate 0.000014 Epoch: 35 Global Step: 742590 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:25:34,269-Speed 2498.04 samples/sec Loss 1.1364 LearningRate 0.000014 Epoch: 35 Global Step: 742600 Fp16 Grad Scale: 16384 
Required: 20 hours Training: 2022-07-12 16:25:42,474-Speed 2496.46 samples/sec Loss 1.1056 LearningRate 0.000014 Epoch: 35 Global Step: 742610 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:25:50,680-Speed 2496.18 samples/sec Loss 1.1197 LearningRate 0.000014 Epoch: 35 Global Step: 742620 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:25:58,828-Speed 2514.07 samples/sec Loss 1.1077 LearningRate 0.000014 Epoch: 35 Global Step: 742630 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:26:07,033-Speed 2496.28 samples/sec Loss 1.1125 LearningRate 0.000014 Epoch: 35 Global Step: 742640 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:26:15,232-Speed 2498.34 samples/sec Loss 1.1337 LearningRate 0.000014 Epoch: 35 Global Step: 742650 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:26:23,434-Speed 2497.50 samples/sec Loss 1.1385 LearningRate 0.000014 Epoch: 35 Global Step: 742660 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:26:31,641-Speed 2495.58 samples/sec Loss 1.1250 LearningRate 0.000014 Epoch: 35 Global Step: 742670 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:26:39,839-Speed 2498.58 samples/sec Loss 1.1061 LearningRate 0.000014 Epoch: 35 Global Step: 742680 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:26:47,992-Speed 2512.77 samples/sec Loss 1.1113 LearningRate 0.000014 Epoch: 35 Global Step: 742690 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:26:56,204-Speed 2494.28 samples/sec Loss 1.1101 LearningRate 0.000014 Epoch: 35 Global Step: 742700 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:27:04,405-Speed 2497.40 samples/sec Loss 1.0947 LearningRate 0.000014 Epoch: 35 Global Step: 742710 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:27:12,607-Speed 2497.44 samples/sec Loss 1.1137 LearningRate 0.000014 Epoch: 35 Global Step: 742720 Fp16 Grad Scale: 16384 
Required: 20 hours Training: 2022-07-12 16:27:20,807-Speed 2497.97 samples/sec Loss 1.1110 LearningRate 0.000014 Epoch: 35 Global Step: 742730 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:27:29,007-Speed 2497.70 samples/sec Loss 1.1105 LearningRate 0.000014 Epoch: 35 Global Step: 742740 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:27:37,157-Speed 2513.48 samples/sec Loss 1.0894 LearningRate 0.000014 Epoch: 35 Global Step: 742750 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:27:45,357-Speed 2497.89 samples/sec Loss 1.1047 LearningRate 0.000014 Epoch: 35 Global Step: 742760 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:27:53,561-Speed 2496.68 samples/sec Loss 1.1043 LearningRate 0.000014 Epoch: 35 Global Step: 742770 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:28:01,761-Speed 2497.89 samples/sec Loss 1.1112 LearningRate 0.000014 Epoch: 35 Global Step: 742780 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:28:09,962-Speed 2497.78 samples/sec Loss 1.1001 LearningRate 0.000014 Epoch: 35 Global Step: 742790 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:28:18,162-Speed 2497.71 samples/sec Loss 1.0952 LearningRate 0.000014 Epoch: 35 Global Step: 742800 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:28:26,325-Speed 2509.43 samples/sec Loss 1.0759 LearningRate 0.000014 Epoch: 35 Global Step: 742810 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:28:34,524-Speed 2498.12 samples/sec Loss 1.1279 LearningRate 0.000013 Epoch: 35 Global Step: 742820 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:28:42,731-Speed 2495.91 samples/sec Loss 1.1145 LearningRate 0.000013 Epoch: 35 Global Step: 742830 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:28:50,933-Speed 2497.32 samples/sec Loss 1.1217 LearningRate 0.000013 Epoch: 35 Global Step: 742840 Fp16 Grad Scale: 16384 
Required: 20 hours Training: 2022-07-12 16:28:59,142-Speed 2495.17 samples/sec Loss 1.1281 LearningRate 0.000013 Epoch: 35 Global Step: 742850 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:29:07,344-Speed 2497.19 samples/sec Loss 1.1192 LearningRate 0.000013 Epoch: 35 Global Step: 742860 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:29:15,490-Speed 2515.02 samples/sec Loss 1.1347 LearningRate 0.000013 Epoch: 35 Global Step: 742870 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:29:23,698-Speed 2495.54 samples/sec Loss 1.0819 LearningRate 0.000013 Epoch: 35 Global Step: 742880 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:29:31,901-Speed 2497.33 samples/sec Loss 1.1098 LearningRate 0.000013 Epoch: 35 Global Step: 742890 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:29:40,102-Speed 2497.63 samples/sec Loss 1.0938 LearningRate 0.000013 Epoch: 35 Global Step: 742900 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:29:48,325-Speed 2490.91 samples/sec Loss 1.0978 LearningRate 0.000013 Epoch: 35 Global Step: 742910 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:29:56,526-Speed 2497.28 samples/sec Loss 1.1028 LearningRate 0.000013 Epoch: 35 Global Step: 742920 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:30:04,679-Speed 2512.52 samples/sec Loss 1.0943 LearningRate 0.000013 Epoch: 35 Global Step: 742930 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:30:12,882-Speed 2497.18 samples/sec Loss 1.0911 LearningRate 0.000013 Epoch: 35 Global Step: 742940 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:30:21,084-Speed 2497.30 samples/sec Loss 1.1143 LearningRate 0.000013 Epoch: 35 Global Step: 742950 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:30:29,285-Speed 2497.67 samples/sec Loss 1.1342 LearningRate 0.000013 Epoch: 35 Global Step: 742960 Fp16 Grad Scale: 16384 
Required: 20 hours Training: 2022-07-12 16:30:37,507-Speed 2491.62 samples/sec Loss 1.1028 LearningRate 0.000013 Epoch: 35 Global Step: 742970 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:30:45,717-Speed 2494.81 samples/sec Loss 1.1276 LearningRate 0.000013 Epoch: 35 Global Step: 742980 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:30:53,866-Speed 2513.58 samples/sec Loss 1.1276 LearningRate 0.000013 Epoch: 35 Global Step: 742990 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:31:02,072-Speed 2496.31 samples/sec Loss 1.0921 LearningRate 0.000013 Epoch: 35 Global Step: 743000 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:31:10,273-Speed 2497.70 samples/sec Loss 1.1045 LearningRate 0.000013 Epoch: 35 Global Step: 743010 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:31:18,506-Speed 2487.87 samples/sec Loss 1.1616 LearningRate 0.000013 Epoch: 35 Global Step: 743020 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:31:26,707-Speed 2497.90 samples/sec Loss 1.1066 LearningRate 0.000013 Epoch: 35 Global Step: 743030 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:31:34,922-Speed 2493.30 samples/sec Loss 1.1186 LearningRate 0.000013 Epoch: 35 Global Step: 743040 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:31:43,067-Speed 2514.68 samples/sec Loss 1.1167 LearningRate 0.000013 Epoch: 35 Global Step: 743050 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:31:51,270-Speed 2497.01 samples/sec Loss 1.1020 LearningRate 0.000013 Epoch: 35 Global Step: 743060 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:31:59,472-Speed 2497.30 samples/sec Loss 1.1162 LearningRate 0.000013 Epoch: 35 Global Step: 743070 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:32:07,671-Speed 2498.72 samples/sec Loss 1.0705 LearningRate 0.000013 Epoch: 35 Global Step: 743080 Fp16 Grad Scale: 16384 
Required: 20 hours Training: 2022-07-12 16:32:15,874-Speed 2497.35 samples/sec Loss 1.1020 LearningRate 0.000013 Epoch: 35 Global Step: 743090 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:32:24,080-Speed 2495.77 samples/sec Loss 1.1238 LearningRate 0.000013 Epoch: 35 Global Step: 743100 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:32:32,228-Speed 2513.90 samples/sec Loss 1.0724 LearningRate 0.000013 Epoch: 35 Global Step: 743110 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:32:40,430-Speed 2497.53 samples/sec Loss 1.1290 LearningRate 0.000013 Epoch: 35 Global Step: 743120 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:32:48,632-Speed 2497.06 samples/sec Loss 1.0952 LearningRate 0.000013 Epoch: 35 Global Step: 743130 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:32:56,835-Speed 2497.29 samples/sec Loss 1.1097 LearningRate 0.000013 Epoch: 35 Global Step: 743140 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:33:05,037-Speed 2497.40 samples/sec Loss 1.1311 LearningRate 0.000013 Epoch: 35 Global Step: 743150 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:33:13,239-Speed 2497.30 samples/sec Loss 1.0970 LearningRate 0.000013 Epoch: 35 Global Step: 743160 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:33:21,386-Speed 2514.04 samples/sec Loss 1.1393 LearningRate 0.000013 Epoch: 35 Global Step: 743170 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:33:29,586-Speed 2498.03 samples/sec Loss 1.1015 LearningRate 0.000013 Epoch: 35 Global Step: 743180 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:33:37,787-Speed 2497.99 samples/sec Loss 1.0780 LearningRate 0.000013 Epoch: 35 Global Step: 743190 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:33:45,989-Speed 2497.18 samples/sec Loss 1.1213 LearningRate 0.000013 Epoch: 35 Global Step: 743200 Fp16 Grad Scale: 16384 
Required: 20 hours Training: 2022-07-12 16:33:54,207-Speed 2492.66 samples/sec Loss 1.0632 LearningRate 0.000013 Epoch: 35 Global Step: 743210 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:34:02,406-Speed 2498.24 samples/sec Loss 1.1458 LearningRate 0.000013 Epoch: 35 Global Step: 743220 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:34:10,558-Speed 2512.47 samples/sec Loss 1.1065 LearningRate 0.000013 Epoch: 35 Global Step: 743230 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:34:18,755-Speed 2498.72 samples/sec Loss 1.1164 LearningRate 0.000013 Epoch: 35 Global Step: 743240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:34:26,957-Speed 2497.61 samples/sec Loss 1.1105 LearningRate 0.000013 Epoch: 35 Global Step: 743250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:34:35,158-Speed 2497.52 samples/sec Loss 1.1142 LearningRate 0.000013 Epoch: 35 Global Step: 743260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:34:43,361-Speed 2497.24 samples/sec Loss 1.1071 LearningRate 0.000013 Epoch: 35 Global Step: 743270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:34:51,565-Speed 2496.65 samples/sec Loss 1.1336 LearningRate 0.000013 Epoch: 35 Global Step: 743280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:34:59,710-Speed 2514.86 samples/sec Loss 1.1377 LearningRate 0.000013 Epoch: 35 Global Step: 743290 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:35:07,910-Speed 2497.94 samples/sec Loss 1.0826 LearningRate 0.000013 Epoch: 35 Global Step: 743300 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:35:16,110-Speed 2498.09 samples/sec Loss 1.1057 LearningRate 0.000013 Epoch: 35 Global Step: 743310 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:35:24,316-Speed 2496.13 samples/sec Loss 1.0949 LearningRate 0.000013 Epoch: 35 Global Step: 743320 Fp16 Grad Scale: 32768 
Required: 20 hours Training: 2022-07-12 16:35:32,518-Speed 2497.26 samples/sec Loss 1.1056 LearningRate 0.000013 Epoch: 35 Global Step: 743330 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:35:40,716-Speed 2498.82 samples/sec Loss 1.0900 LearningRate 0.000013 Epoch: 35 Global Step: 743340 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:35:48,878-Speed 2509.35 samples/sec Loss 1.1319 LearningRate 0.000013 Epoch: 35 Global Step: 743350 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:35:57,082-Speed 2496.97 samples/sec Loss 1.0869 LearningRate 0.000013 Epoch: 35 Global Step: 743360 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:36:05,283-Speed 2497.74 samples/sec Loss 1.1127 LearningRate 0.000013 Epoch: 35 Global Step: 743370 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:36:13,489-Speed 2496.17 samples/sec Loss 1.1105 LearningRate 0.000013 Epoch: 35 Global Step: 743380 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:36:21,695-Speed 2495.91 samples/sec Loss 1.1015 LearningRate 0.000013 Epoch: 35 Global Step: 743390 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:36:29,899-Speed 2497.05 samples/sec Loss 1.1185 LearningRate 0.000013 Epoch: 35 Global Step: 743400 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:36:38,048-Speed 2513.52 samples/sec Loss 1.1255 LearningRate 0.000013 Epoch: 35 Global Step: 743410 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:36:46,256-Speed 2495.92 samples/sec Loss 1.1177 LearningRate 0.000013 Epoch: 35 Global Step: 743420 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:36:54,455-Speed 2498.13 samples/sec Loss 1.1129 LearningRate 0.000013 Epoch: 35 Global Step: 743430 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:37:02,662-Speed 2496.16 samples/sec Loss 1.0940 LearningRate 0.000013 Epoch: 35 Global Step: 743440 Fp16 Grad Scale: 32768 
Required: 20 hours Training: 2022-07-12 16:37:10,864-Speed 2497.39 samples/sec Loss 1.1038 LearningRate 0.000013 Epoch: 35 Global Step: 743450 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:37:19,066-Speed 2497.50 samples/sec Loss 1.1046 LearningRate 0.000013 Epoch: 35 Global Step: 743460 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:37:27,228-Speed 2509.54 samples/sec Loss 1.1262 LearningRate 0.000013 Epoch: 35 Global Step: 743470 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:37:35,431-Speed 2497.04 samples/sec Loss 1.1112 LearningRate 0.000013 Epoch: 35 Global Step: 743480 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:37:43,645-Speed 2493.53 samples/sec Loss 1.1005 LearningRate 0.000013 Epoch: 35 Global Step: 743490 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:37:51,846-Speed 2497.77 samples/sec Loss 1.1194 LearningRate 0.000013 Epoch: 35 Global Step: 743500 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:38:00,048-Speed 2497.35 samples/sec Loss 1.1038 LearningRate 0.000013 Epoch: 35 Global Step: 743510 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:38:08,252-Speed 2496.50 samples/sec Loss 1.0858 LearningRate 0.000013 Epoch: 35 Global Step: 743520 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:38:16,398-Speed 2514.46 samples/sec Loss 1.1065 LearningRate 0.000013 Epoch: 35 Global Step: 743530 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:38:24,603-Speed 2496.47 samples/sec Loss 1.1046 LearningRate 0.000013 Epoch: 35 Global Step: 743540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:38:32,807-Speed 2496.86 samples/sec Loss 1.1051 LearningRate 0.000013 Epoch: 35 Global Step: 743550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:38:41,006-Speed 2498.29 samples/sec Loss 1.1196 LearningRate 0.000013 Epoch: 35 Global Step: 743560 Fp16 Grad Scale: 32768 
Required: 20 hours Training: 2022-07-12 16:38:49,206-Speed 2497.82 samples/sec Loss 1.1184 LearningRate 0.000013 Epoch: 35 Global Step: 743570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:38:57,409-Speed 2496.86 samples/sec Loss 1.1178 LearningRate 0.000013 Epoch: 35 Global Step: 743580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:39:05,558-Speed 2514.75 samples/sec Loss 1.1143 LearningRate 0.000013 Epoch: 35 Global Step: 743590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:39:13,782-Speed 2490.51 samples/sec Loss 1.1208 LearningRate 0.000013 Epoch: 35 Global Step: 743600 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:39:21,986-Speed 2496.52 samples/sec Loss 1.1208 LearningRate 0.000013 Epoch: 35 Global Step: 743610 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:39:30,197-Speed 2494.83 samples/sec Loss 1.0635 LearningRate 0.000013 Epoch: 35 Global Step: 743620 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:39:38,401-Speed 2496.61 samples/sec Loss 1.1121 LearningRate 0.000013 Epoch: 35 Global Step: 743630 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:39:46,604-Speed 2496.92 samples/sec Loss 1.1090 LearningRate 0.000013 Epoch: 35 Global Step: 743640 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:39:54,752-Speed 2514.25 samples/sec Loss 1.0796 LearningRate 0.000013 Epoch: 35 Global Step: 743650 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:40:02,953-Speed 2497.78 samples/sec Loss 1.1069 LearningRate 0.000013 Epoch: 35 Global Step: 743660 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:40:11,156-Speed 2497.50 samples/sec Loss 1.1026 LearningRate 0.000013 Epoch: 35 Global Step: 743670 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:40:19,359-Speed 2497.09 samples/sec Loss 1.0521 LearningRate 0.000013 Epoch: 35 Global Step: 743680 Fp16 Grad Scale: 32768 
Required: 20 hours Training: 2022-07-12 16:40:27,567-Speed 2495.51 samples/sec Loss 1.1142 LearningRate 0.000013 Epoch: 35 Global Step: 743690 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-07-12 16:40:35,732-Speed 2509.01 samples/sec Loss 1.0933 LearningRate 0.000013 Epoch: 35 Global Step: 743700 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:40:43,881-Speed 2514.01 samples/sec Loss 1.1167 LearningRate 0.000013 Epoch: 35 Global Step: 743710 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:40:52,085-Speed 2496.58 samples/sec Loss 1.1414 LearningRate 0.000013 Epoch: 35 Global Step: 743720 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:41:00,286-Speed 2497.70 samples/sec Loss 1.1005 LearningRate 0.000013 Epoch: 35 Global Step: 743730 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:41:08,484-Speed 2498.52 samples/sec Loss 1.1172 LearningRate 0.000013 Epoch: 35 Global Step: 743740 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:41:16,689-Speed 2496.23 samples/sec Loss 1.1502 LearningRate 0.000013 Epoch: 35 Global Step: 743750 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:41:24,902-Speed 2494.10 samples/sec Loss 1.1004 LearningRate 0.000013 Epoch: 35 Global Step: 743760 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:41:33,049-Speed 2514.27 samples/sec Loss 1.1279 LearningRate 0.000013 Epoch: 35 Global Step: 743770 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:41:41,249-Speed 2498.19 samples/sec Loss 1.1136 LearningRate 0.000013 Epoch: 35 Global Step: 743780 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-07-12 16:41:49,403-Speed 2511.88 samples/sec Loss 1.0918 LearningRate 0.000013 Epoch: 35 Global Step: 743790 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:41:57,607-Speed 2496.83 samples/sec Loss 1.1082 LearningRate 0.000013 Epoch: 35 Global Step: 743800 Fp16 Grad Scale: 8192 Required: 
20 hours Training: 2022-07-12 16:42:05,809-Speed 2497.16 samples/sec Loss 1.0939 LearningRate 0.000013 Epoch: 35 Global Step: 743810 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:42:14,013-Speed 2497.17 samples/sec Loss 1.0962 LearningRate 0.000013 Epoch: 35 Global Step: 743820 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:42:22,179-Speed 2508.10 samples/sec Loss 1.1387 LearningRate 0.000013 Epoch: 35 Global Step: 743830 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:42:30,384-Speed 2496.53 samples/sec Loss 1.1140 LearningRate 0.000013 Epoch: 35 Global Step: 743840 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:42:38,588-Speed 2496.63 samples/sec Loss 1.1362 LearningRate 0.000013 Epoch: 35 Global Step: 743850 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:42:46,791-Speed 2497.11 samples/sec Loss 1.1161 LearningRate 0.000013 Epoch: 35 Global Step: 743860 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:42:55,000-Speed 2495.24 samples/sec Loss 1.1097 LearningRate 0.000013 Epoch: 35 Global Step: 743870 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:43:03,208-Speed 2495.85 samples/sec Loss 1.1122 LearningRate 0.000013 Epoch: 35 Global Step: 743880 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:43:11,358-Speed 2513.30 samples/sec Loss 1.1124 LearningRate 0.000013 Epoch: 35 Global Step: 743890 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:43:19,560-Speed 2497.46 samples/sec Loss 1.1339 LearningRate 0.000013 Epoch: 35 Global Step: 743900 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:43:27,766-Speed 2496.10 samples/sec Loss 1.1061 LearningRate 0.000013 Epoch: 35 Global Step: 743910 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:43:35,970-Speed 2496.68 samples/sec Loss 1.1341 LearningRate 0.000013 Epoch: 35 Global Step: 743920 Fp16 Grad Scale: 8192 Required: 20 hours Training: 
2022-07-12 16:43:44,172-Speed 2497.69 samples/sec Loss 1.1270 LearningRate 0.000013 Epoch: 35 Global Step: 743930 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:43:52,383-Speed 2494.50 samples/sec Loss 1.1096 LearningRate 0.000013 Epoch: 35 Global Step: 743940 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:44:00,538-Speed 2511.94 samples/sec Loss 1.1034 LearningRate 0.000013 Epoch: 35 Global Step: 743950 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:44:08,739-Speed 2497.50 samples/sec Loss 1.1330 LearningRate 0.000013 Epoch: 35 Global Step: 743960 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:44:16,938-Speed 2498.52 samples/sec Loss 1.0993 LearningRate 0.000013 Epoch: 35 Global Step: 743970 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:44:25,141-Speed 2497.15 samples/sec Loss 1.1091 LearningRate 0.000013 Epoch: 35 Global Step: 743980 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:44:33,341-Speed 2498.15 samples/sec Loss 1.1180 LearningRate 0.000013 Epoch: 35 Global Step: 743990 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:44:41,543-Speed 2497.18 samples/sec Loss 1.1160 LearningRate 0.000013 Epoch: 35 Global Step: 744000 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:44:49,692-Speed 2513.82 samples/sec Loss 1.1162 LearningRate 0.000013 Epoch: 35 Global Step: 744010 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:44:57,891-Speed 2498.07 samples/sec Loss 1.1185 LearningRate 0.000013 Epoch: 35 Global Step: 744020 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:45:06,106-Speed 2493.68 samples/sec Loss 1.1069 LearningRate 0.000013 Epoch: 35 Global Step: 744030 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:45:14,312-Speed 2496.15 samples/sec Loss 1.1115 LearningRate 0.000013 Epoch: 35 Global Step: 744040 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 
16:45:22,515-Speed 2497.00 samples/sec Loss 1.1147 LearningRate 0.000013 Epoch: 35 Global Step: 744050 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:45:30,717-Speed 2497.38 samples/sec Loss 1.1116 LearningRate 0.000013 Epoch: 35 Global Step: 744060 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:45:38,865-Speed 2514.00 samples/sec Loss 1.1122 LearningRate 0.000013 Epoch: 35 Global Step: 744070 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:45:47,070-Speed 2496.77 samples/sec Loss 1.0870 LearningRate 0.000013 Epoch: 35 Global Step: 744080 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:45:55,282-Speed 2494.57 samples/sec Loss 1.0853 LearningRate 0.000013 Epoch: 35 Global Step: 744090 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:46:03,490-Speed 2495.36 samples/sec Loss 1.0971 LearningRate 0.000013 Epoch: 35 Global Step: 744100 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:46:11,693-Speed 2497.17 samples/sec Loss 1.0847 LearningRate 0.000013 Epoch: 35 Global Step: 744110 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:46:19,892-Speed 2498.12 samples/sec Loss 1.1346 LearningRate 0.000013 Epoch: 35 Global Step: 744120 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:46:28,040-Speed 2513.79 samples/sec Loss 1.1383 LearningRate 0.000013 Epoch: 35 Global Step: 744130 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:46:36,239-Speed 2498.59 samples/sec Loss 1.0944 LearningRate 0.000013 Epoch: 35 Global Step: 744140 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:46:44,440-Speed 2497.72 samples/sec Loss 1.1104 LearningRate 0.000013 Epoch: 35 Global Step: 744150 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:46:52,634-Speed 2499.80 samples/sec Loss 1.1039 LearningRate 0.000013 Epoch: 35 Global Step: 744160 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:47:00,831-Speed 
2498.92 samples/sec Loss 1.0825 LearningRate 0.000013 Epoch: 35 Global Step: 744170 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:47:09,043-Speed 2494.28 samples/sec Loss 1.0943 LearningRate 0.000013 Epoch: 35 Global Step: 744180 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:47:17,192-Speed 2513.49 samples/sec Loss 1.1114 LearningRate 0.000013 Epoch: 35 Global Step: 744190 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:47:25,392-Speed 2498.26 samples/sec Loss 1.1037 LearningRate 0.000013 Epoch: 35 Global Step: 744200 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:47:33,591-Speed 2497.91 samples/sec Loss 1.1219 LearningRate 0.000013 Epoch: 35 Global Step: 744210 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:47:41,804-Speed 2494.11 samples/sec Loss 1.0768 LearningRate 0.000013 Epoch: 35 Global Step: 744220 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:47:50,005-Speed 2497.67 samples/sec Loss 1.1245 LearningRate 0.000013 Epoch: 35 Global Step: 744230 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:47:58,206-Speed 2497.79 samples/sec Loss 1.0958 LearningRate 0.000013 Epoch: 35 Global Step: 744240 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:48:06,352-Speed 2514.38 samples/sec Loss 1.1111 LearningRate 0.000013 Epoch: 35 Global Step: 744250 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:48:14,549-Speed 2498.94 samples/sec Loss 1.1157 LearningRate 0.000013 Epoch: 35 Global Step: 744260 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:48:22,745-Speed 2499.22 samples/sec Loss 1.1173 LearningRate 0.000013 Epoch: 35 Global Step: 744270 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:48:30,949-Speed 2496.84 samples/sec Loss 1.0999 LearningRate 0.000013 Epoch: 35 Global Step: 744280 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:48:39,148-Speed 2498.07 samples/sec 
Loss 1.1093 LearningRate 0.000013 Epoch: 35 Global Step: 744290 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:48:47,351-Speed 2497.17 samples/sec Loss 1.1055 LearningRate 0.000013 Epoch: 35 Global Step: 744300 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:48:55,503-Speed 2512.86 samples/sec Loss 1.1053 LearningRate 0.000013 Epoch: 35 Global Step: 744310 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:49:03,704-Speed 2497.53 samples/sec Loss 1.1110 LearningRate 0.000013 Epoch: 35 Global Step: 744320 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:49:11,912-Speed 2495.89 samples/sec Loss 1.1044 LearningRate 0.000013 Epoch: 35 Global Step: 744330 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:49:20,112-Speed 2498.06 samples/sec Loss 1.0802 LearningRate 0.000013 Epoch: 35 Global Step: 744340 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:49:28,309-Speed 2498.70 samples/sec Loss 1.1152 LearningRate 0.000013 Epoch: 35 Global Step: 744350 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:49:36,511-Speed 2497.59 samples/sec Loss 1.1271 LearningRate 0.000013 Epoch: 35 Global Step: 744360 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-07-12 16:49:44,657-Speed 2514.35 samples/sec Loss 1.1154 LearningRate 0.000013 Epoch: 35 Global Step: 744370 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:49:52,857-Speed 2498.18 samples/sec Loss 1.1364 LearningRate 0.000013 Epoch: 35 Global Step: 744380 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:50:01,059-Speed 2497.29 samples/sec Loss 1.1214 LearningRate 0.000013 Epoch: 35 Global Step: 744390 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:50:09,255-Speed 2499.35 samples/sec Loss 1.1237 LearningRate 0.000013 Epoch: 35 Global Step: 744400 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:50:17,459-Speed 2496.88 samples/sec Loss 1.1157 
LearningRate 0.000013 Epoch: 35 Global Step: 744410 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:50:25,658-Speed 2498.17 samples/sec Loss 1.1028 LearningRate 0.000013 Epoch: 35 Global Step: 744420 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:50:33,806-Speed 2514.26 samples/sec Loss 1.1377 LearningRate 0.000013 Epoch: 35 Global Step: 744430 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:50:42,026-Speed 2491.93 samples/sec Loss 1.0963 LearningRate 0.000013 Epoch: 35 Global Step: 744440 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:50:50,233-Speed 2495.88 samples/sec Loss 1.1315 LearningRate 0.000013 Epoch: 35 Global Step: 744450 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:50:58,436-Speed 2496.99 samples/sec Loss 1.0870 LearningRate 0.000013 Epoch: 35 Global Step: 744460 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:51:06,634-Speed 2498.61 samples/sec Loss 1.0999 LearningRate 0.000013 Epoch: 35 Global Step: 744470 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:51:14,846-Speed 2494.48 samples/sec Loss 1.1031 LearningRate 0.000013 Epoch: 35 Global Step: 744480 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:51:22,991-Speed 2514.75 samples/sec Loss 1.0841 LearningRate 0.000013 Epoch: 35 Global Step: 744490 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:51:31,191-Speed 2497.88 samples/sec Loss 1.1085 LearningRate 0.000013 Epoch: 35 Global Step: 744500 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:51:39,395-Speed 2496.91 samples/sec Loss 1.0923 LearningRate 0.000013 Epoch: 35 Global Step: 744510 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:51:47,598-Speed 2496.65 samples/sec Loss 1.1040 LearningRate 0.000013 Epoch: 35 Global Step: 744520 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:51:55,804-Speed 2496.12 samples/sec Loss 1.1289 LearningRate 
0.000013 Epoch: 35 Global Step: 744530 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:52:04,017-Speed 2494.16 samples/sec Loss 1.1329 LearningRate 0.000013 Epoch: 35 Global Step: 744540 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:52:12,175-Speed 2510.81 samples/sec Loss 1.1338 LearningRate 0.000013 Epoch: 35 Global Step: 744550 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:52:20,380-Speed 2496.38 samples/sec Loss 1.1238 LearningRate 0.000013 Epoch: 35 Global Step: 744560 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:52:28,584-Speed 2496.73 samples/sec Loss 1.1272 LearningRate 0.000013 Epoch: 35 Global Step: 744570 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:52:36,788-Speed 2496.93 samples/sec Loss 1.1115 LearningRate 0.000013 Epoch: 35 Global Step: 744580 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:52:44,990-Speed 2497.18 samples/sec Loss 1.1007 LearningRate 0.000013 Epoch: 35 Global Step: 744590 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:52:53,193-Speed 2497.04 samples/sec Loss 1.0696 LearningRate 0.000013 Epoch: 35 Global Step: 744600 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:53:01,343-Speed 2513.33 samples/sec Loss 1.0905 LearningRate 0.000013 Epoch: 35 Global Step: 744610 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:53:09,560-Speed 2492.80 samples/sec Loss 1.1205 LearningRate 0.000013 Epoch: 35 Global Step: 744620 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:53:17,760-Speed 2497.85 samples/sec Loss 1.1180 LearningRate 0.000013 Epoch: 35 Global Step: 744630 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:53:25,975-Speed 2493.46 samples/sec Loss 1.1082 LearningRate 0.000013 Epoch: 35 Global Step: 744640 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:53:34,178-Speed 2497.35 samples/sec Loss 1.1162 LearningRate 0.000013 Epoch: 35 
Global Step: 744650 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:53:42,387-Speed 2495.34 samples/sec Loss 1.0853 LearningRate 0.000013 Epoch: 35 Global Step: 744660 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:53:50,536-Speed 2513.49 samples/sec Loss 1.0880 LearningRate 0.000013 Epoch: 35 Global Step: 744670 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:53:58,739-Speed 2496.96 samples/sec Loss 1.0906 LearningRate 0.000013 Epoch: 35 Global Step: 744680 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:54:06,955-Speed 2493.17 samples/sec Loss 1.0776 LearningRate 0.000013 Epoch: 35 Global Step: 744690 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:54:15,157-Speed 2497.50 samples/sec Loss 1.0986 LearningRate 0.000013 Epoch: 35 Global Step: 744700 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:54:23,362-Speed 2496.47 samples/sec Loss 1.1280 LearningRate 0.000013 Epoch: 35 Global Step: 744710 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:54:31,563-Speed 2497.50 samples/sec Loss 1.1259 LearningRate 0.000013 Epoch: 35 Global Step: 744720 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:54:39,713-Speed 2513.37 samples/sec Loss 1.1110 LearningRate 0.000013 Epoch: 35 Global Step: 744730 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:54:47,919-Speed 2497.00 samples/sec Loss 1.1203 LearningRate 0.000013 Epoch: 35 Global Step: 744740 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:54:56,127-Speed 2495.31 samples/sec Loss 1.0981 LearningRate 0.000013 Epoch: 35 Global Step: 744750 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:55:04,334-Speed 2496.08 samples/sec Loss 1.0771 LearningRate 0.000013 Epoch: 35 Global Step: 744760 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:55:12,541-Speed 2495.72 samples/sec Loss 1.1117 LearningRate 0.000013 Epoch: 35 Global Step: 744770 
Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:55:20,747-Speed 2496.13 samples/sec Loss 1.1123 LearningRate 0.000013 Epoch: 35 Global Step: 744780 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:55:28,893-Speed 2514.34 samples/sec Loss 1.1222 LearningRate 0.000013 Epoch: 35 Global Step: 744790 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:55:37,102-Speed 2495.54 samples/sec Loss 1.1365 LearningRate 0.000013 Epoch: 35 Global Step: 744800 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:55:45,302-Speed 2498.00 samples/sec Loss 1.1038 LearningRate 0.000013 Epoch: 35 Global Step: 744810 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:55:53,505-Speed 2496.90 samples/sec Loss 1.1212 LearningRate 0.000013 Epoch: 35 Global Step: 744820 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:56:01,712-Speed 2495.79 samples/sec Loss 1.0821 LearningRate 0.000013 Epoch: 35 Global Step: 744830 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:56:09,918-Speed 2496.67 samples/sec Loss 1.0894 LearningRate 0.000013 Epoch: 35 Global Step: 744840 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:56:18,071-Speed 2512.38 samples/sec Loss 1.1358 LearningRate 0.000013 Epoch: 35 Global Step: 744850 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:56:26,277-Speed 2496.08 samples/sec Loss 1.0789 LearningRate 0.000013 Epoch: 35 Global Step: 744860 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:56:34,487-Speed 2495.13 samples/sec Loss 1.1160 LearningRate 0.000013 Epoch: 35 Global Step: 744870 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:56:42,692-Speed 2496.47 samples/sec Loss 1.0842 LearningRate 0.000013 Epoch: 35 Global Step: 744880 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:56:50,899-Speed 2496.08 samples/sec Loss 1.1376 LearningRate 0.000013 Epoch: 35 Global Step: 744890 Fp16 Grad Scale: 
8192 Required: 19 hours Training: 2022-07-12 16:56:59,105-Speed 2497.48 samples/sec Loss 1.1084 LearningRate 0.000013 Epoch: 35 Global Step: 744900 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:57:07,265-Speed 2510.13 samples/sec Loss 1.0601 LearningRate 0.000013 Epoch: 35 Global Step: 744910 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:57:15,475-Speed 2494.94 samples/sec Loss 1.1122 LearningRate 0.000013 Epoch: 35 Global Step: 744920 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:57:23,685-Speed 2495.26 samples/sec Loss 1.1293 LearningRate 0.000013 Epoch: 35 Global Step: 744930 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:57:31,893-Speed 2495.23 samples/sec Loss 1.1236 LearningRate 0.000013 Epoch: 35 Global Step: 744940 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:57:40,101-Speed 2495.63 samples/sec Loss 1.0913 LearningRate 0.000013 Epoch: 35 Global Step: 744950 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:57:48,309-Speed 2495.55 samples/sec Loss 1.1138 LearningRate 0.000013 Epoch: 35 Global Step: 744960 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:57:56,462-Speed 2512.17 samples/sec Loss 1.1114 LearningRate 0.000013 Epoch: 35 Global Step: 744970 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:58:04,667-Speed 2496.65 samples/sec Loss 1.0939 LearningRate 0.000013 Epoch: 35 Global Step: 744980 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 16:58:12,882-Speed 2493.33 samples/sec Loss 1.0980 LearningRate 0.000013 Epoch: 35 Global Step: 744990 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 16:58:21,086-Speed 2496.46 samples/sec Loss 1.1178 LearningRate 0.000013 Epoch: 35 Global Step: 745000 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 16:58:29,308-Speed 2491.44 samples/sec Loss 1.1081 LearningRate 0.000013 Epoch: 35 Global Step: 745010 Fp16 Grad Scale: 16384 Required: 
19 hours Training: 2022-07-12 16:58:37,514-Speed 2495.94 samples/sec Loss 1.1169 LearningRate 0.000013 Epoch: 35 Global Step: 745020 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 16:58:45,668-Speed 2512.21 samples/sec Loss 1.1005 LearningRate 0.000013 Epoch: 35 Global Step: 745030 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 16:58:53,874-Speed 2496.30 samples/sec Loss 1.1035 LearningRate 0.000013 Epoch: 35 Global Step: 745040 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 16:59:02,079-Speed 2496.22 samples/sec Loss 1.0975 LearningRate 0.000013 Epoch: 35 Global Step: 745050 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 16:59:10,296-Speed 2492.84 samples/sec Loss 1.1295 LearningRate 0.000013 Epoch: 35 Global Step: 745060 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 16:59:18,503-Speed 2495.85 samples/sec Loss 1.1359 LearningRate 0.000013 Epoch: 35 Global Step: 745070 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 16:59:26,709-Speed 2496.22 samples/sec Loss 1.0862 LearningRate 0.000013 Epoch: 35 Global Step: 745080 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 16:59:34,862-Speed 2512.20 samples/sec Loss 1.1287 LearningRate 0.000013 Epoch: 35 Global Step: 745090 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 16:59:43,069-Speed 2495.96 samples/sec Loss 1.1213 LearningRate 0.000013 Epoch: 35 Global Step: 745100 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 16:59:51,276-Speed 2496.15 samples/sec Loss 1.1199 LearningRate 0.000013 Epoch: 35 Global Step: 745110 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 16:59:59,484-Speed 2495.52 samples/sec Loss 1.1071 LearningRate 0.000013 Epoch: 35 Global Step: 745120 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:00:07,692-Speed 2495.62 samples/sec Loss 1.1476 LearningRate 0.000013 Epoch: 35 Global Step: 745130 Fp16 Grad Scale: 16384 Required: 19 
hours Training: 2022-07-12 17:00:15,899-Speed 2495.75 samples/sec Loss 1.0809 LearningRate 0.000013 Epoch: 35 Global Step: 745140 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:00:24,053-Speed 2512.00 samples/sec Loss 1.1298 LearningRate 0.000013 Epoch: 35 Global Step: 745150 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:00:32,259-Speed 2496.18 samples/sec Loss 1.0790 LearningRate 0.000013 Epoch: 35 Global Step: 745160 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:00:40,472-Speed 2494.12 samples/sec Loss 1.1109 LearningRate 0.000013 Epoch: 35 Global Step: 745170 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:00:48,680-Speed 2495.61 samples/sec Loss 1.1024 LearningRate 0.000013 Epoch: 35 Global Step: 745180 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:00:56,890-Speed 2494.95 samples/sec Loss 1.0852 LearningRate 0.000013 Epoch: 35 Global Step: 745190 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:01:05,098-Speed 2495.28 samples/sec Loss 1.1032 LearningRate 0.000013 Epoch: 35 Global Step: 745200 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:01:13,254-Speed 2511.72 samples/sec Loss 1.1320 LearningRate 0.000013 Epoch: 35 Global Step: 745210 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:01:21,471-Speed 2492.49 samples/sec Loss 1.1174 LearningRate 0.000013 Epoch: 35 Global Step: 745220 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:01:29,682-Speed 2494.90 samples/sec Loss 1.1064 LearningRate 0.000013 Epoch: 35 Global Step: 745230 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:01:37,890-Speed 2495.48 samples/sec Loss 1.1373 LearningRate 0.000013 Epoch: 35 Global Step: 745240 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:01:46,104-Speed 2493.80 samples/sec Loss 1.0942 LearningRate 0.000013 Epoch: 35 Global Step: 745250 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:01:54,318-Speed 2493.64 samples/sec Loss 1.1178 LearningRate 0.000013 Epoch: 35 Global Step: 745260 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:02:02,477-Speed 2510.65 samples/sec Loss 1.1253 LearningRate 0.000013 Epoch: 35 Global Step: 745270 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:02:10,701-Speed 2490.64 samples/sec Loss 1.1018 LearningRate 0.000013 Epoch: 35 Global Step: 745280 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:02:18,921-Speed 2491.98 samples/sec Loss 1.1216 LearningRate 0.000013 Epoch: 35 Global Step: 745290 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:02:27,131-Speed 2494.97 samples/sec Loss 1.1327 LearningRate 0.000013 Epoch: 35 Global Step: 745300 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:02:35,341-Speed 2494.84 samples/sec Loss 1.0662 LearningRate 0.000013 Epoch: 35 Global Step: 745310 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:02:43,551-Speed 2494.95 samples/sec Loss 1.1071 LearningRate 0.000013 Epoch: 35 Global Step: 745320 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:02:51,704-Speed 2512.18 samples/sec Loss 1.0984 LearningRate 0.000013 Epoch: 35 Global Step: 745330 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:02:59,922-Speed 2493.04 samples/sec Loss 1.1005 LearningRate 0.000013 Epoch: 35 Global Step: 745340 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:03:08,130-Speed 2495.53 samples/sec Loss 1.0940 LearningRate 0.000013 Epoch: 35 Global Step: 745350 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:03:16,341-Speed 2494.55 samples/sec Loss 1.1005 LearningRate 0.000013 Epoch: 35 Global Step: 745360 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:03:24,548-Speed 2496.04 samples/sec Loss 1.0958 LearningRate 0.000013 Epoch: 35 Global Step: 745370 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:03:32,766-Speed 2492.56 samples/sec Loss 1.1223 LearningRate 0.000013 Epoch: 35 Global Step: 745380 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:03:40,918-Speed 2512.53 samples/sec Loss 1.0992 LearningRate 0.000013 Epoch: 35 Global Step: 745390 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:03:49,130-Speed 2494.44 samples/sec Loss 1.1375 LearningRate 0.000013 Epoch: 35 Global Step: 745400 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:03:57,344-Speed 2493.62 samples/sec Loss 1.1001 LearningRate 0.000013 Epoch: 35 Global Step: 745410 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:04:05,574-Speed 2489.16 samples/sec Loss 1.0834 LearningRate 0.000013 Epoch: 35 Global Step: 745420 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:04:13,781-Speed 2495.69 samples/sec Loss 1.0882 LearningRate 0.000013 Epoch: 35 Global Step: 745430 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:04:21,990-Speed 2495.45 samples/sec Loss 1.1227 LearningRate 0.000013 Epoch: 35 Global Step: 745440 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:04:30,144-Speed 2512.28 samples/sec Loss 1.1082 LearningRate 0.000013 Epoch: 35 Global Step: 745450 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:04:38,352-Speed 2495.50 samples/sec Loss 1.1252 LearningRate 0.000013 Epoch: 35 Global Step: 745460 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:04:46,560-Speed 2495.59 samples/sec Loss 1.1073 LearningRate 0.000013 Epoch: 35 Global Step: 745470 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:04:54,769-Speed 2495.24 samples/sec Loss 1.1019 LearningRate 0.000013 Epoch: 35 Global Step: 745480 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:05:02,981-Speed 2494.25 samples/sec Loss 1.0982 LearningRate 0.000013 Epoch: 35 Global Step: 745490 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:05:11,197-Speed 2493.46 samples/sec Loss 1.0935 LearningRate 0.000013 Epoch: 35 Global Step: 745500 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:05:19,362-Speed 2508.43 samples/sec Loss 1.1041 LearningRate 0.000013 Epoch: 35 Global Step: 745510 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:05:27,604-Speed 2485.25 samples/sec Loss 1.1144 LearningRate 0.000013 Epoch: 35 Global Step: 745520 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:05:35,811-Speed 2495.86 samples/sec Loss 1.0944 LearningRate 0.000013 Epoch: 35 Global Step: 745530 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:05:44,020-Speed 2495.38 samples/sec Loss 1.0846 LearningRate 0.000013 Epoch: 35 Global Step: 745540 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:05:52,230-Speed 2494.93 samples/sec Loss 1.1053 LearningRate 0.000013 Epoch: 35 Global Step: 745550 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:06:00,438-Speed 2495.26 samples/sec Loss 1.0989 LearningRate 0.000013 Epoch: 35 Global Step: 745560 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:06:08,599-Speed 2510.04 samples/sec Loss 1.0948 LearningRate 0.000013 Epoch: 35 Global Step: 745570 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:06:16,819-Speed 2491.77 samples/sec Loss 1.1168 LearningRate 0.000013 Epoch: 35 Global Step: 745580 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:06:25,028-Speed 2495.03 samples/sec Loss 1.0922 LearningRate 0.000013 Epoch: 35 Global Step: 745590 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:06:33,238-Speed 2495.04 samples/sec Loss 1.0965 LearningRate 0.000013 Epoch: 35 Global Step: 745600 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:06:41,443-Speed 2496.22 samples/sec Loss 1.0862 LearningRate 0.000013 Epoch: 35 Global Step: 745610 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:06:49,655-Speed 2494.59 samples/sec Loss 1.1318 LearningRate 0.000013 Epoch: 35 Global Step: 745620 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:06:57,808-Speed 2512.31 samples/sec Loss 1.0812 LearningRate 0.000013 Epoch: 35 Global Step: 745630 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:07:06,014-Speed 2496.21 samples/sec Loss 1.0646 LearningRate 0.000013 Epoch: 35 Global Step: 745640 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:07:14,220-Speed 2496.26 samples/sec Loss 1.1305 LearningRate 0.000013 Epoch: 35 Global Step: 745650 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:07:22,428-Speed 2495.97 samples/sec Loss 1.0742 LearningRate 0.000013 Epoch: 35 Global Step: 745660 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:07:30,635-Speed 2495.74 samples/sec Loss 1.1130 LearningRate 0.000013 Epoch: 35 Global Step: 745670 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:07:38,841-Speed 2496.19 samples/sec Loss 1.0920 LearningRate 0.000013 Epoch: 35 Global Step: 745680 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:07:47,000-Speed 2510.59 samples/sec Loss 1.1035 LearningRate 0.000013 Epoch: 35 Global Step: 745690 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:07:55,221-Speed 2491.34 samples/sec Loss 1.1087 LearningRate 0.000013 Epoch: 35 Global Step: 745700 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:08:03,446-Speed 2490.45 samples/sec Loss 1.1046 LearningRate 0.000013 Epoch: 35 Global Step: 745710 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:08:11,652-Speed 2496.13 samples/sec Loss 1.1316 LearningRate 0.000013 Epoch: 35 Global Step: 745720 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:08:19,860-Speed 2495.72 samples/sec Loss 1.0993 LearningRate 0.000013 Epoch: 35 Global Step: 745730 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:08:28,071-Speed 2494.81 samples/sec Loss 1.0827 LearningRate 0.000013 Epoch: 35 Global Step: 745740 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:08:36,223-Speed 2512.60 samples/sec Loss 1.1058 LearningRate 0.000013 Epoch: 35 Global Step: 745750 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:08:44,432-Speed 2495.28 samples/sec Loss 1.1562 LearningRate 0.000013 Epoch: 35 Global Step: 745760 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:08:52,637-Speed 2496.31 samples/sec Loss 1.1155 LearningRate 0.000013 Epoch: 35 Global Step: 745770 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:09:00,849-Speed 2494.30 samples/sec Loss 1.1248 LearningRate 0.000013 Epoch: 35 Global Step: 745780 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:09:09,056-Speed 2495.98 samples/sec Loss 1.0771 LearningRate 0.000013 Epoch: 35 Global Step: 745790 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:09:17,267-Speed 2494.26 samples/sec Loss 1.1447 LearningRate 0.000013 Epoch: 35 Global Step: 745800 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:09:25,424-Speed 2511.34 samples/sec Loss 1.1356 LearningRate 0.000013 Epoch: 35 Global Step: 745810 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:09:33,633-Speed 2495.41 samples/sec Loss 1.1200 LearningRate 0.000013 Epoch: 35 Global Step: 745820 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:09:41,838-Speed 2496.36 samples/sec Loss 1.0940 LearningRate 0.000013 Epoch: 35 Global Step: 745830 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:09:50,046-Speed 2495.63 samples/sec Loss 1.1020 LearningRate 0.000013 Epoch: 35 Global Step: 745840 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:09:58,262-Speed 2493.16 samples/sec Loss 1.1096 LearningRate 0.000013 Epoch: 35 Global Step: 745850 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:10:06,477-Speed 2493.32 samples/sec Loss 1.1124 LearningRate 0.000013 Epoch: 35 Global Step: 745860 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:10:14,629-Speed 2512.52 samples/sec Loss 1.0955 LearningRate 0.000013 Epoch: 35 Global Step: 745870 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:10:22,835-Speed 2496.13 samples/sec Loss 1.1485 LearningRate 0.000013 Epoch: 35 Global Step: 745880 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:10:31,041-Speed 2496.34 samples/sec Loss 1.1189 LearningRate 0.000013 Epoch: 35 Global Step: 745890 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:10:39,245-Speed 2496.58 samples/sec Loss 1.1073 LearningRate 0.000013 Epoch: 35 Global Step: 745900 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:10:47,450-Speed 2496.61 samples/sec Loss 1.1350 LearningRate 0.000013 Epoch: 35 Global Step: 745910 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:10:55,654-Speed 2496.59 samples/sec Loss 1.0912 LearningRate 0.000013 Epoch: 35 Global Step: 745920 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:11:03,805-Speed 2512.97 samples/sec Loss 1.1068 LearningRate 0.000013 Epoch: 35 Global Step: 745930 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:11:12,010-Speed 2496.59 samples/sec Loss 1.1352 LearningRate 0.000013 Epoch: 35 Global Step: 745940 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:11:20,212-Speed 2497.18 samples/sec Loss 1.1190 LearningRate 0.000013 Epoch: 35 Global Step: 745950 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:11:28,416-Speed 2496.79 samples/sec Loss 1.0862 LearningRate 0.000013 Epoch: 35 Global Step: 745960 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:11:36,637-Speed 2491.64 samples/sec Loss 1.1193 LearningRate 0.000013 Epoch: 35 Global Step: 745970 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:11:44,846-Speed 2495.61 samples/sec Loss 1.0914 LearningRate 0.000013 Epoch: 35 Global Step: 745980 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:11:52,998-Speed 2512.33 samples/sec Loss 1.1092 LearningRate 0.000013 Epoch: 35 Global Step: 745990 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:12:01,205-Speed 2496.25 samples/sec Loss 1.1135 LearningRate 0.000013 Epoch: 35 Global Step: 746000 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:12:09,414-Speed 2495.30 samples/sec Loss 1.0758 LearningRate 0.000013 Epoch: 35 Global Step: 746010 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:12:17,631-Speed 2492.80 samples/sec Loss 1.1023 LearningRate 0.000013 Epoch: 35 Global Step: 746020 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:12:25,840-Speed 2495.16 samples/sec Loss 1.1271 LearningRate 0.000013 Epoch: 35 Global Step: 746030 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:12:34,071-Speed 2488.54 samples/sec Loss 1.1197 LearningRate 0.000013 Epoch: 35 Global Step: 746040 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:12:42,248-Speed 2505.13 samples/sec Loss 1.0995 LearningRate 0.000013 Epoch: 35 Global Step: 746050 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:12:50,453-Speed 2496.50 samples/sec Loss 1.1041 LearningRate 0.000013 Epoch: 35 Global Step: 746060 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:12:58,665-Speed 2494.01 samples/sec Loss 1.1344 LearningRate 0.000013 Epoch: 35 Global Step: 746070 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:13:06,882-Speed 2492.99 samples/sec Loss 1.1144 LearningRate 0.000013 Epoch: 35 Global Step: 746080 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:13:15,086-Speed 2496.75 samples/sec Loss 1.1054 LearningRate 0.000012 Epoch: 35 Global Step: 746090 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:13:23,297-Speed 2494.37 samples/sec Loss 1.1346 LearningRate 0.000012 Epoch: 35 Global Step: 746100 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:13:31,449-Speed 2512.88 samples/sec Loss 1.0855 LearningRate 0.000012 Epoch: 35 Global Step: 746110 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:13:39,657-Speed 2495.67 samples/sec Loss 1.1408 LearningRate 0.000012 Epoch: 35 Global Step: 746120 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:13:47,862-Speed 2496.22 samples/sec Loss 1.1079 LearningRate 0.000012 Epoch: 35 Global Step: 746130 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:13:56,065-Speed 2497.19 samples/sec Loss 1.0966 LearningRate 0.000012 Epoch: 35 Global Step: 746140 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:14:04,268-Speed 2497.18 samples/sec Loss 1.1046 LearningRate 0.000012 Epoch: 35 Global Step: 746150 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:14:12,472-Speed 2496.54 samples/sec Loss 1.1014 LearningRate 0.000012 Epoch: 35 Global Step: 746160 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:14:20,627-Speed 2511.83 samples/sec Loss 1.0879 LearningRate 0.000012 Epoch: 35 Global Step: 746170 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:14:28,853-Speed 2489.87 samples/sec Loss 1.0740 LearningRate 0.000012 Epoch: 35 Global Step: 746180 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:14:37,058-Speed 2496.45 samples/sec Loss 1.1277 LearningRate 0.000012 Epoch: 35 Global Step: 746190 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:14:45,260-Speed 2497.29 samples/sec Loss 1.1186 LearningRate 0.000012 Epoch: 35 Global Step: 746200 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:14:53,465-Speed 2496.43 samples/sec Loss 1.1228 LearningRate 0.000012 Epoch: 35 Global Step: 746210 Fp16 Grad Scale: 32768 Required: 19 hours 
Training: 2022-07-12 17:15:01,674-Speed 2495.17 samples/sec Loss 1.1130 LearningRate 0.000012 Epoch: 35 Global Step: 746220 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:15:09,826-Speed 2512.65 samples/sec Loss 1.1200 LearningRate 0.000012 Epoch: 35 Global Step: 746230 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:15:18,032-Speed 2496.01 samples/sec Loss 1.1167 LearningRate 0.000012 Epoch: 35 Global Step: 746240 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:15:26,235-Speed 2497.22 samples/sec Loss 1.1127 LearningRate 0.000012 Epoch: 35 Global Step: 746250 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:15:34,444-Speed 2495.14 samples/sec Loss 1.1023 LearningRate 0.000012 Epoch: 35 Global Step: 746260 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:15:42,652-Speed 2496.32 samples/sec Loss 1.1302 LearningRate 0.000012 Epoch: 35 Global Step: 746270 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:15:50,860-Speed 2495.49 samples/sec Loss 1.1044 LearningRate 0.000012 Epoch: 35 Global Step: 746280 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:15:59,011-Speed 2512.84 samples/sec Loss 1.1016 LearningRate 0.000012 Epoch: 35 Global Step: 746290 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:16:07,218-Speed 2495.78 samples/sec Loss 1.1411 LearningRate 0.000012 Epoch: 35 Global Step: 746300 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:16:15,422-Speed 2496.84 samples/sec Loss 1.0940 LearningRate 0.000012 Epoch: 35 Global Step: 746310 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:16:23,625-Speed 2496.91 samples/sec Loss 1.0969 LearningRate 0.000012 Epoch: 35 Global Step: 746320 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:16:31,829-Speed 2496.75 samples/sec Loss 1.1255 LearningRate 0.000012 Epoch: 35 Global Step: 746330 Fp16 Grad Scale: 32768 Required: 19 hours 
Training: 2022-07-12 17:16:40,042-Speed 2494.17 samples/sec Loss 1.1156 LearningRate 0.000012 Epoch: 35 Global Step: 746340 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:16:48,192-Speed 2513.10 samples/sec Loss 1.1087 LearningRate 0.000012 Epoch: 35 Global Step: 746350 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:16:56,395-Speed 2497.13 samples/sec Loss 1.1191 LearningRate 0.000012 Epoch: 35 Global Step: 746360 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:17:04,604-Speed 2495.13 samples/sec Loss 1.1106 LearningRate 0.000012 Epoch: 35 Global Step: 746370 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:17:12,809-Speed 2496.60 samples/sec Loss 1.0971 LearningRate 0.000012 Epoch: 35 Global Step: 746380 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:17:21,012-Speed 2497.21 samples/sec Loss 1.1685 LearningRate 0.000012 Epoch: 35 Global Step: 746390 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:17:29,216-Speed 2496.52 samples/sec Loss 1.1100 LearningRate 0.000012 Epoch: 35 Global Step: 746400 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:17:37,367-Speed 2512.86 samples/sec Loss 1.1009 LearningRate 0.000012 Epoch: 35 Global Step: 746410 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:17:45,573-Speed 2496.14 samples/sec Loss 1.1202 LearningRate 0.000012 Epoch: 35 Global Step: 746420 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:17:53,779-Speed 2496.32 samples/sec Loss 1.0847 LearningRate 0.000012 Epoch: 35 Global Step: 746430 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:18:01,984-Speed 2496.62 samples/sec Loss 1.0837 LearningRate 0.000012 Epoch: 35 Global Step: 746440 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:18:10,188-Speed 2496.59 samples/sec Loss 1.1038 LearningRate 0.000012 Epoch: 35 Global Step: 746450 Fp16 Grad Scale: 32768 Required: 19 hours 
Training: 2022-07-12 17:18:18,395-Speed 2495.93 samples/sec Loss 1.1123 LearningRate 0.000012 Epoch: 35 Global Step: 746460 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:18:26,546-Speed 2513.29 samples/sec Loss 1.1440 LearningRate 0.000012 Epoch: 35 Global Step: 746470 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:18:34,760-Speed 2493.76 samples/sec Loss 1.1077 LearningRate 0.000012 Epoch: 35 Global Step: 746480 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:18:42,972-Speed 2494.08 samples/sec Loss 1.1244 LearningRate 0.000012 Epoch: 35 Global Step: 746490 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:18:51,179-Speed 2495.76 samples/sec Loss 1.0996 LearningRate 0.000012 Epoch: 35 Global Step: 746500 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:18:59,382-Speed 2497.06 samples/sec Loss 1.1010 LearningRate 0.000012 Epoch: 35 Global Step: 746510 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:19:07,584-Speed 2497.37 samples/sec Loss 1.1117 LearningRate 0.000012 Epoch: 35 Global Step: 746520 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:19:15,731-Speed 2514.16 samples/sec Loss 1.1129 LearningRate 0.000012 Epoch: 35 Global Step: 746530 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:19:23,935-Speed 2497.15 samples/sec Loss 1.1042 LearningRate 0.000012 Epoch: 35 Global Step: 746540 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:19:32,140-Speed 2496.66 samples/sec Loss 1.0959 LearningRate 0.000012 Epoch: 35 Global Step: 746550 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:19:40,345-Speed 2496.16 samples/sec Loss 1.1110 LearningRate 0.000012 Epoch: 35 Global Step: 746560 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:19:48,552-Speed 2495.94 samples/sec Loss 1.1131 LearningRate 0.000012 Epoch: 35 Global Step: 746570 Fp16 Grad Scale: 32768 Required: 19 hours 
Training: 2022-07-12 17:19:56,755-Speed 2497.09 samples/sec Loss 1.1370 LearningRate 0.000012 Epoch: 35 Global Step: 746580 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:20:04,909-Speed 2512.20 samples/sec Loss 1.1113 LearningRate 0.000012 Epoch: 35 Global Step: 746590 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:20:13,110-Speed 2497.33 samples/sec Loss 1.1140 LearningRate 0.000012 Epoch: 35 Global Step: 746600 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:20:21,314-Speed 2496.72 samples/sec Loss 1.1130 LearningRate 0.000012 Epoch: 35 Global Step: 746610 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:20:29,520-Speed 2496.28 samples/sec Loss 1.1009 LearningRate 0.000012 Epoch: 35 Global Step: 746620 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:20:40,338-Speed 1893.48 samples/sec Loss 1.1506 LearningRate 0.000012 Epoch: 36 Global Step: 746630 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:20:48,537-Speed 2498.39 samples/sec Loss 1.1156 LearningRate 0.000012 Epoch: 36 Global Step: 746640 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:20:56,688-Speed 2513.18 samples/sec Loss 1.1207 LearningRate 0.000012 Epoch: 36 Global Step: 746650 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:21:04,888-Speed 2497.76 samples/sec Loss 1.1122 LearningRate 0.000012 Epoch: 36 Global Step: 746660 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:21:13,091-Speed 2496.84 samples/sec Loss 1.1194 LearningRate 0.000012 Epoch: 36 Global Step: 746670 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:21:21,295-Speed 2497.10 samples/sec Loss 1.1285 LearningRate 0.000012 Epoch: 36 Global Step: 746680 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:21:29,496-Speed 2497.71 samples/sec Loss 1.1153 LearningRate 0.000012 Epoch: 36 Global Step: 746690 Fp16 Grad Scale: 32768 Required: 19 hours 
Training: 2022-07-12 17:21:37,698-Speed 2497.13 samples/sec Loss 1.1309 LearningRate 0.000012 Epoch: 36 Global Step: 746700 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:21:45,851-Speed 2512.24 samples/sec Loss 1.1025 LearningRate 0.000012 Epoch: 36 Global Step: 746710 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:21:54,055-Speed 2496.78 samples/sec Loss 1.0847 LearningRate 0.000012 Epoch: 36 Global Step: 746720 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:22:02,257-Speed 2497.34 samples/sec Loss 1.0960 LearningRate 0.000012 Epoch: 36 Global Step: 746730 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:22:10,458-Speed 2497.70 samples/sec Loss 1.1253 LearningRate 0.000012 Epoch: 36 Global Step: 746740 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:22:18,663-Speed 2496.48 samples/sec Loss 1.1024 LearningRate 0.000012 Epoch: 36 Global Step: 746750 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:22:26,866-Speed 2496.94 samples/sec Loss 1.0952 LearningRate 0.000012 Epoch: 36 Global Step: 746760 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:22:35,035-Speed 2507.37 samples/sec Loss 1.0965 LearningRate 0.000012 Epoch: 36 Global Step: 746770 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:22:43,242-Speed 2496.25 samples/sec Loss 1.1126 LearningRate 0.000012 Epoch: 36 Global Step: 746780 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:22:51,448-Speed 2495.94 samples/sec Loss 1.1039 LearningRate 0.000012 Epoch: 36 Global Step: 746790 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:22:59,653-Speed 2496.78 samples/sec Loss 1.0653 LearningRate 0.000012 Epoch: 36 Global Step: 746800 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:23:07,856-Speed 2496.98 samples/sec Loss 1.0814 LearningRate 0.000012 Epoch: 36 Global Step: 746810 Fp16 Grad Scale: 32768 Required: 19 hours 
Training: 2022-07-12 17:23:16,057-Speed 2497.54 samples/sec Loss 1.0802 LearningRate 0.000012 Epoch: 36 Global Step: 746820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:23:24,209-Speed 2512.77 samples/sec Loss 1.1291 LearningRate 0.000012 Epoch: 36 Global Step: 746830 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:23:32,412-Speed 2496.99 samples/sec Loss 1.1008 LearningRate 0.000012 Epoch: 36 Global Step: 746840 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:23:40,616-Speed 2496.83 samples/sec Loss 1.1079 LearningRate 0.000012 Epoch: 36 Global Step: 746850 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:23:48,819-Speed 2496.84 samples/sec Loss 1.0658 LearningRate 0.000012 Epoch: 36 Global Step: 746860 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:23:57,025-Speed 2496.42 samples/sec Loss 1.0948 LearningRate 0.000012 Epoch: 36 Global Step: 746870 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:24:05,231-Speed 2496.38 samples/sec Loss 1.1037 LearningRate 0.000012 Epoch: 36 Global Step: 746880 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:24:13,381-Speed 2513.09 samples/sec Loss 1.1068 LearningRate 0.000012 Epoch: 36 Global Step: 746890 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:24:21,585-Speed 2496.81 samples/sec Loss 1.0770 LearningRate 0.000012 Epoch: 36 Global Step: 746900 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:24:29,794-Speed 2495.27 samples/sec Loss 1.1162 LearningRate 0.000012 Epoch: 36 Global Step: 746910 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:24:38,004-Speed 2494.92 samples/sec Loss 1.1204 LearningRate 0.000012 Epoch: 36 Global Step: 746920 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:24:46,213-Speed 2495.20 samples/sec Loss 1.1008 LearningRate 0.000012 Epoch: 36 Global Step: 746930 Fp16 Grad Scale: 32768 Required: 19 hours 
Training: 2022-07-12 17:24:54,420-Speed 2495.74 samples/sec Loss 1.1054 LearningRate 0.000012 Epoch: 36 Global Step: 746940 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:25:02,569-Speed 2513.70 samples/sec Loss 1.1080 LearningRate 0.000012 Epoch: 36 Global Step: 746950 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:25:10,775-Speed 2496.11 samples/sec Loss 1.1064 LearningRate 0.000012 Epoch: 36 Global Step: 746960 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:25:18,984-Speed 2495.15 samples/sec Loss 1.0898 LearningRate 0.000012 Epoch: 36 Global Step: 746970 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:25:27,148-Speed 2509.16 samples/sec Loss 1.1027 LearningRate 0.000012 Epoch: 36 Global Step: 746980 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:25:35,353-Speed 2496.36 samples/sec Loss 1.1178 LearningRate 0.000012 Epoch: 36 Global Step: 746990 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:25:43,557-Speed 2496.79 samples/sec Loss 1.0829 LearningRate 0.000012 Epoch: 36 Global Step: 747000 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:25:51,709-Speed 2512.52 samples/sec Loss 1.1109 LearningRate 0.000012 Epoch: 36 Global Step: 747010 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:25:59,914-Speed 2496.84 samples/sec Loss 1.0777 LearningRate 0.000012 Epoch: 36 Global Step: 747020 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:26:08,122-Speed 2495.75 samples/sec Loss 1.1105 LearningRate 0.000012 Epoch: 36 Global Step: 747030 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:26:16,327-Speed 2496.12 samples/sec Loss 1.1032 LearningRate 0.000012 Epoch: 36 Global Step: 747040 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:26:24,533-Speed 2496.28 samples/sec Loss 1.0892 LearningRate 0.000012 Epoch: 36 Global Step: 747050 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:26:32,738-Speed 2496.69 samples/sec Loss 1.1144 LearningRate 0.000012 Epoch: 36 Global Step: 747060 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:26:40,888-Speed 2513.75 samples/sec Loss 1.1303 LearningRate 0.000012 Epoch: 36 Global Step: 747070 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:26:49,097-Speed 2495.12 samples/sec Loss 1.0904 LearningRate 0.000012 Epoch: 36 Global Step: 747080 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:26:57,313-Speed 2493.19 samples/sec Loss 1.1131 LearningRate 0.000012 Epoch: 36 Global Step: 747090 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:27:05,520-Speed 2495.67 samples/sec Loss 1.0839 LearningRate 0.000012 Epoch: 36 Global Step: 747100 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:27:13,736-Speed 2493.35 samples/sec Loss 1.1088 LearningRate 0.000012 Epoch: 36 Global Step: 747110 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:27:21,950-Speed 2493.60 samples/sec Loss 1.1105 LearningRate 0.000012 Epoch: 36 Global Step: 747120 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:27:30,103-Speed 2512.55 samples/sec Loss 1.0835 LearningRate 0.000012 Epoch: 36 Global Step: 747130 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:27:38,308-Speed 2496.25 samples/sec Loss 1.0997 LearningRate 0.000012 Epoch: 36 Global Step: 747140 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:27:46,521-Speed 2493.97 samples/sec Loss 1.0872 LearningRate 0.000012 Epoch: 36 Global Step: 747150 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:27:54,729-Speed 2495.71 samples/sec Loss 1.1243 LearningRate 0.000012 Epoch: 36 Global Step: 747160 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:28:02,933-Speed 2496.76 samples/sec Loss 1.0785 LearningRate 0.000012 Epoch: 36 Global Step: 747170 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:28:11,143-Speed 2494.79 samples/sec Loss 1.0854 LearningRate 0.000012 Epoch: 36 Global Step: 747180 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:28:19,294-Speed 2512.99 samples/sec Loss 1.0897 LearningRate 0.000012 Epoch: 36 Global Step: 747190 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:28:27,500-Speed 2495.98 samples/sec Loss 1.1071 LearningRate 0.000012 Epoch: 36 Global Step: 747200 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:28:35,704-Speed 2496.90 samples/sec Loss 1.1330 LearningRate 0.000012 Epoch: 36 Global Step: 747210 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:28:43,908-Speed 2496.62 samples/sec Loss 1.1016 LearningRate 0.000012 Epoch: 36 Global Step: 747220 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:28:52,110-Speed 2497.01 samples/sec Loss 1.0860 LearningRate 0.000012 Epoch: 36 Global Step: 747230 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:29:00,316-Speed 2496.24 samples/sec Loss 1.0885 LearningRate 0.000012 Epoch: 36 Global Step: 747240 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:29:08,467-Speed 2512.83 samples/sec Loss 1.0991 LearningRate 0.000012 Epoch: 36 Global Step: 747250 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:29:16,676-Speed 2495.19 samples/sec Loss 1.0948 LearningRate 0.000012 Epoch: 36 Global Step: 747260 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:29:24,880-Speed 2496.72 samples/sec Loss 1.1137 LearningRate 0.000012 Epoch: 36 Global Step: 747270 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:29:33,098-Speed 2493.08 samples/sec Loss 1.1144 LearningRate 0.000012 Epoch: 36 Global Step: 747280 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:29:41,301-Speed 2496.83 samples/sec Loss 1.1083 LearningRate 0.000012 Epoch: 36 Global Step: 747290 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:29:49,507-Speed 2496.21 samples/sec Loss 1.0984 LearningRate 0.000012 Epoch: 36 Global Step: 747300 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:29:57,658-Speed 2512.92 samples/sec Loss 1.1052 LearningRate 0.000012 Epoch: 36 Global Step: 747310 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:30:05,864-Speed 2495.94 samples/sec Loss 1.1216 LearningRate 0.000012 Epoch: 36 Global Step: 747320 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:30:14,066-Speed 2497.49 samples/sec Loss 1.0987 LearningRate 0.000012 Epoch: 36 Global Step: 747330 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:30:22,271-Speed 2496.42 samples/sec Loss 1.1009 LearningRate 0.000012 Epoch: 36 Global Step: 747340 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:30:30,485-Speed 2493.75 samples/sec Loss 1.0935 LearningRate 0.000012 Epoch: 36 Global Step: 747350 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:30:38,697-Speed 2494.64 samples/sec Loss 1.0993 LearningRate 0.000012 Epoch: 36 Global Step: 747360 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:30:46,847-Speed 2513.06 samples/sec Loss 1.1189 LearningRate 0.000012 Epoch: 36 Global Step: 747370 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:30:55,066-Speed 2492.38 samples/sec Loss 1.1215 LearningRate 0.000012 Epoch: 36 Global Step: 747380 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:31:03,270-Speed 2496.68 samples/sec Loss 1.1095 LearningRate 0.000012 Epoch: 36 Global Step: 747390 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:31:11,475-Speed 2496.39 samples/sec Loss 1.0658 LearningRate 0.000012 Epoch: 36 Global Step: 747400 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:31:19,683-Speed 2495.71 samples/sec Loss 1.1180 LearningRate 0.000012 Epoch: 36 Global Step: 747410 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:31:27,887-Speed 2496.50 samples/sec Loss 1.1312 LearningRate 0.000012 Epoch: 36 Global Step: 747420 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:31:36,040-Speed 2512.65 samples/sec Loss 1.0839 LearningRate 0.000012 Epoch: 36 Global Step: 747430 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:31:44,247-Speed 2495.94 samples/sec Loss 1.1340 LearningRate 0.000012 Epoch: 36 Global Step: 747440 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:31:52,456-Speed 2495.08 samples/sec Loss 1.1221 LearningRate 0.000012 Epoch: 36 Global Step: 747450 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:32:00,659-Speed 2497.53 samples/sec Loss 1.0998 LearningRate 0.000012 Epoch: 36 Global Step: 747460 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:32:08,869-Speed 2495.04 samples/sec Loss 1.1067 LearningRate 0.000012 Epoch: 36 Global Step: 747470 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:32:17,073-Speed 2496.69 samples/sec Loss 1.0914 LearningRate 0.000012 Epoch: 36 Global Step: 747480 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:32:25,223-Speed 2513.23 samples/sec Loss 1.1082 LearningRate 0.000012 Epoch: 36 Global Step: 747490 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:32:33,436-Speed 2494.16 samples/sec Loss 1.0888 LearningRate 0.000012 Epoch: 36 Global Step: 747500 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:32:41,638-Speed 2497.42 samples/sec Loss 1.1184 LearningRate 0.000012 Epoch: 36 Global Step: 747510 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:32:49,853-Speed 2493.30 samples/sec Loss 1.1038 LearningRate 0.000012 Epoch: 36 Global Step: 747520 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:32:58,060-Speed 2496.07 samples/sec Loss 1.1052 LearningRate 0.000012 Epoch: 36 Global Step: 747530 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:33:06,273-Speed 2493.93 samples/sec Loss 1.0958 LearningRate 0.000012 Epoch: 36 Global Step: 747540 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:33:14,427-Speed 2511.84 samples/sec Loss 1.1082 LearningRate 0.000012 Epoch: 36 Global Step: 747550 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:33:22,632-Speed 2496.36 samples/sec Loss 1.1066 LearningRate 0.000012 Epoch: 36 Global Step: 747560 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:33:30,848-Speed 2493.20 samples/sec Loss 1.1461 LearningRate 0.000012 Epoch: 36 Global Step: 747570 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:33:39,056-Speed 2495.60 samples/sec Loss 1.0682 LearningRate 0.000012 Epoch: 36 Global Step: 747580 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:33:47,261-Speed 2496.43 samples/sec Loss 1.1093 LearningRate 0.000012 Epoch: 36 Global Step: 747590 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:33:55,467-Speed 2496.18 samples/sec Loss 1.1290 LearningRate 0.000012 Epoch: 36 Global Step: 747600 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:34:03,620-Speed 2512.40 samples/sec Loss 1.1017 LearningRate 0.000012 Epoch: 36 Global Step: 747610 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:34:11,847-Speed 2489.62 samples/sec Loss 1.1072 LearningRate 0.000012 Epoch: 36 Global Step: 747620 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:34:20,064-Speed 2492.79 samples/sec Loss 1.0842 LearningRate 0.000012 Epoch: 36 Global Step: 747630 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:34:28,271-Speed 2495.99 samples/sec Loss 1.1145 LearningRate 0.000012 Epoch: 36 Global Step: 747640 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:34:36,478-Speed 2495.77 samples/sec Loss 1.1117 LearningRate 0.000012 Epoch: 36 Global Step: 747650 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:34:44,686-Speed 2495.41 samples/sec Loss 1.0835 LearningRate 0.000012 Epoch: 36 Global Step: 747660 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:34:52,839-Speed 2512.34 samples/sec Loss 1.1196 LearningRate 0.000012 Epoch: 36 Global Step: 747670 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:35:01,052-Speed 2494.10 samples/sec Loss 1.1095 LearningRate 0.000012 Epoch: 36 Global Step: 747680 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:35:09,258-Speed 2496.10 samples/sec Loss 1.1013 LearningRate 0.000012 Epoch: 36 Global Step: 747690 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:35:17,460-Speed 2497.63 samples/sec Loss 1.0941 LearningRate 0.000012 Epoch: 36 Global Step: 747700 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:35:25,664-Speed 2497.05 samples/sec Loss 1.0854 LearningRate 0.000012 Epoch: 36 Global Step: 747710 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:35:33,872-Speed 2495.49 samples/sec Loss 1.1092 LearningRate 0.000012 Epoch: 36 Global Step: 747720 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:35:42,037-Speed 2508.67 samples/sec Loss 1.1159 LearningRate 0.000012 Epoch: 36 Global Step: 747730 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:35:50,255-Speed 2492.74 samples/sec Loss 1.1099 LearningRate 0.000012 Epoch: 36 Global Step: 747740 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:35:58,461-Speed 2496.35 samples/sec Loss 1.0947 LearningRate 0.000012 Epoch: 36 Global Step: 747750 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:36:06,668-Speed 2495.76 samples/sec Loss 1.0815 LearningRate 0.000012 Epoch: 36 Global Step: 747760 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:36:14,875-Speed 2495.91 samples/sec Loss 1.0714 LearningRate 0.000012 Epoch: 36 Global Step: 747770 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:36:23,083-Speed 2495.43 samples/sec Loss 1.0879 LearningRate 0.000012 Epoch: 36 Global Step: 747780 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:36:31,238-Speed 2511.63 samples/sec Loss 1.0900 LearningRate 0.000012 Epoch: 36 Global Step: 747790 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:36:39,446-Speed 2495.59 samples/sec Loss 1.1125 LearningRate 0.000012 Epoch: 36 Global Step: 747800 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:36:47,650-Speed 2496.81 samples/sec Loss 1.1181 LearningRate 0.000012 Epoch: 36 Global Step: 747810 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:36:55,855-Speed 2496.57 samples/sec Loss 1.0884 LearningRate 0.000012 Epoch: 36 Global Step: 747820 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:37:04,060-Speed 2496.35 samples/sec Loss 1.0992 LearningRate 0.000012 Epoch: 36 Global Step: 747830 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:37:12,276-Speed 2492.99 samples/sec Loss 1.1091 LearningRate 0.000012 Epoch: 36 Global Step: 747840 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:37:20,430-Speed 2512.24 samples/sec Loss 1.1131 LearningRate 0.000012 Epoch: 36 Global Step: 747850 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:37:28,633-Speed 2496.96 samples/sec Loss 1.0783 LearningRate 0.000012 Epoch: 36 Global Step: 747860 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:37:36,842-Speed 2495.28 samples/sec Loss 1.1117 LearningRate 0.000012 Epoch: 36 Global Step: 747870 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:37:45,046-Speed 2496.79 samples/sec Loss 1.1099 LearningRate 0.000012 Epoch: 36 Global Step: 747880 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:37:53,249-Speed 2497.01 samples/sec Loss 1.1354 LearningRate 0.000012 Epoch: 36 Global Step: 747890 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:38:01,455-Speed 2496.03 samples/sec Loss 1.1303 LearningRate 0.000012 Epoch: 36 Global Step: 747900 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:38:09,607-Speed 2513.03 samples/sec Loss 1.0852 LearningRate 0.000012 Epoch: 36 Global Step: 747910 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:38:17,809-Speed 2497.14 samples/sec Loss 1.0986 LearningRate 0.000012 Epoch: 36 Global Step: 747920 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:38:26,014-Speed 2496.65 samples/sec Loss 1.0968 LearningRate 0.000012 Epoch: 36 Global Step: 747930 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:38:34,217-Speed 2496.99 samples/sec Loss 1.0933 LearningRate 0.000012 Epoch: 36 Global Step: 747940 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:38:42,419-Speed 2497.35 samples/sec Loss 1.0837 LearningRate 0.000012 Epoch: 36 Global Step: 747950 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:38:50,622-Speed 2496.84 samples/sec Loss 1.1017 LearningRate 0.000012 Epoch: 36 Global Step: 747960 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:38:58,771-Speed 2513.76 samples/sec Loss 1.0810 LearningRate 0.000012 Epoch: 36 Global Step: 747970 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:39:06,971-Speed 2498.11 samples/sec Loss 1.1005 LearningRate 0.000012 Epoch: 36 Global Step: 747980 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:39:15,174-Speed 2497.07 samples/sec Loss 1.0818 LearningRate 0.000012 Epoch: 36 Global Step: 747990 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:39:23,376-Speed 2497.62 samples/sec Loss 1.1116 LearningRate 0.000012 Epoch: 36 Global Step: 748000 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:39:31,580-Speed 2496.63 samples/sec Loss 1.0892 LearningRate 0.000012 Epoch: 36 Global Step: 748010 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:39:39,783-Speed 2497.20 samples/sec Loss 1.1107 LearningRate 0.000012 Epoch: 36 Global Step: 748020 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:39:47,931-Speed 2513.77 samples/sec Loss 1.0897 LearningRate 0.000012 Epoch: 36 Global Step: 748030 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:39:56,147-Speed 2493.10 samples/sec Loss 1.0624 LearningRate 0.000012 Epoch: 36 Global Step: 748040 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:40:04,350-Speed 2496.98 samples/sec Loss 1.0605 LearningRate 0.000012 Epoch: 36 Global Step: 748050 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:40:12,552-Speed 2497.22 samples/sec Loss 1.1039 LearningRate 0.000012 Epoch: 36 Global Step: 748060 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:40:20,756-Speed 2496.78 samples/sec Loss 1.1032 LearningRate 0.000012 Epoch: 36 Global Step: 748070 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:40:28,958-Speed 2497.02 samples/sec Loss 1.1072 LearningRate 0.000012 Epoch: 36 Global Step: 748080 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:40:37,109-Speed 2513.08 samples/sec Loss 1.1250 LearningRate 0.000012 Epoch: 36 Global Step: 748090 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:40:45,310-Speed 2497.59 samples/sec Loss 1.1062 LearningRate 0.000012 Epoch: 36 Global Step: 748100 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:40:53,514-Speed 2496.77 samples/sec Loss 1.0668 LearningRate 0.000012 Epoch: 36 Global Step: 748110 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:41:01,717-Speed 2496.92 samples/sec Loss 1.0770 LearningRate 0.000012 Epoch: 36 Global Step: 748120 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:41:09,933-Speed 2493.27 samples/sec Loss 1.1071 LearningRate 0.000012 Epoch: 36 Global Step: 748130 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:41:18,137-Speed 2496.53 samples/sec Loss 1.0887 LearningRate 0.000012 Epoch: 36 Global Step: 748140 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:41:26,289-Speed 2512.56 samples/sec Loss 1.0995 LearningRate 0.000012 Epoch: 36 Global Step: 748150 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:41:34,501-Speed 2494.28 samples/sec Loss 1.0878 LearningRate 0.000012 Epoch: 36 Global Step: 748160 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:41:42,711-Speed 2495.04 samples/sec Loss 1.1032 LearningRate 0.000012 Epoch: 36 Global Step: 748170 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:41:50,912-Speed 2497.74 samples/sec Loss 1.0793 LearningRate 0.000012 Epoch: 36 Global Step: 748180 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:41:59,116-Speed 2496.46 samples/sec Loss 1.0982 LearningRate 0.000012 Epoch: 36 Global Step: 748190 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:42:07,319-Speed 2497.25 samples/sec Loss 1.0868 LearningRate 0.000012 Epoch: 36 Global Step: 748200 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:42:15,477-Speed 2510.94 samples/sec Loss 1.1010 LearningRate 0.000012 Epoch: 36 Global Step: 748210 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:42:23,696-Speed 2492.02 samples/sec Loss 1.0915 LearningRate 0.000012 Epoch: 36 Global Step: 748220 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:42:31,902-Speed 2496.71 samples/sec Loss 1.1036 LearningRate 0.000012 Epoch: 36 Global Step: 748230 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:42:40,109-Speed 2495.93 samples/sec Loss 1.1213 LearningRate 0.000012 Epoch: 36 Global Step: 748240 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:42:48,327-Speed 2492.71 samples/sec Loss 1.0848 LearningRate 0.000012 Epoch: 36 Global Step: 748250 Fp16 Grad Scale: 32768 Required: 19 hours 
Training: 2022-07-12 17:42:56,537-Speed 2494.88 samples/sec Loss 1.1174 LearningRate 0.000012 Epoch: 36 Global Step: 748260 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:43:04,690-Speed 2512.41 samples/sec Loss 1.1055 LearningRate 0.000012 Epoch: 36 Global Step: 748270 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:43:12,902-Speed 2494.21 samples/sec Loss 1.1293 LearningRate 0.000012 Epoch: 36 Global Step: 748280 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:43:21,111-Speed 2495.25 samples/sec Loss 1.1302 LearningRate 0.000012 Epoch: 36 Global Step: 748290 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:43:29,320-Speed 2495.30 samples/sec Loss 1.0970 LearningRate 0.000012 Epoch: 36 Global Step: 748300 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:43:37,525-Speed 2496.19 samples/sec Loss 1.1022 LearningRate 0.000012 Epoch: 36 Global Step: 748310 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:43:45,740-Speed 2493.46 samples/sec Loss 1.0797 LearningRate 0.000012 Epoch: 36 Global Step: 748320 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:43:53,891-Speed 2513.08 samples/sec Loss 1.1199 LearningRate 0.000012 Epoch: 36 Global Step: 748330 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:44:02,094-Speed 2496.98 samples/sec Loss 1.1044 LearningRate 0.000012 Epoch: 36 Global Step: 748340 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:44:10,303-Speed 2495.21 samples/sec Loss 1.0675 LearningRate 0.000012 Epoch: 36 Global Step: 748350 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:44:18,507-Speed 2496.75 samples/sec Loss 1.1171 LearningRate 0.000012 Epoch: 36 Global Step: 748360 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-07-12 17:44:26,709-Speed 2497.34 samples/sec Loss 1.1395 LearningRate 0.000012 Epoch: 36 Global Step: 748370 Fp16 Grad Scale: 32768 Required: 19 hours 
Training: 2022-07-12 17:44:34,870-Speed 2509.88 samples/sec Loss 1.1015 LearningRate 0.000012 Epoch: 36 Global Step: 748380 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:44:43,020-Speed 2513.30 samples/sec Loss 1.0963 LearningRate 0.000012 Epoch: 36 Global Step: 748390 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:44:51,226-Speed 2496.19 samples/sec Loss 1.0990 LearningRate 0.000012 Epoch: 36 Global Step: 748400 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:44:59,432-Speed 2496.30 samples/sec Loss 1.1172 LearningRate 0.000012 Epoch: 36 Global Step: 748410 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:45:07,638-Speed 2496.28 samples/sec Loss 1.0895 LearningRate 0.000012 Epoch: 36 Global Step: 748420 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:45:15,846-Speed 2495.37 samples/sec Loss 1.0834 LearningRate 0.000012 Epoch: 36 Global Step: 748430 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:45:24,048-Speed 2497.34 samples/sec Loss 1.1186 LearningRate 0.000012 Epoch: 36 Global Step: 748440 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:45:32,195-Speed 2514.61 samples/sec Loss 1.0786 LearningRate 0.000012 Epoch: 36 Global Step: 748450 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:45:40,410-Speed 2493.49 samples/sec Loss 1.1294 LearningRate 0.000012 Epoch: 36 Global Step: 748460 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:45:48,642-Speed 2488.28 samples/sec Loss 1.0729 LearningRate 0.000012 Epoch: 36 Global Step: 748470 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:45:56,847-Speed 2496.44 samples/sec Loss 1.1216 LearningRate 0.000012 Epoch: 36 Global Step: 748480 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:46:05,049-Speed 2497.46 samples/sec Loss 1.1102 LearningRate 0.000012 Epoch: 36 Global Step: 748490 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-07-12 17:46:13,248-Speed 2498.01 samples/sec Loss 1.1106 LearningRate 0.000012 Epoch: 36 Global Step: 748500 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:46:21,396-Speed 2513.83 samples/sec Loss 1.1071 LearningRate 0.000012 Epoch: 36 Global Step: 748510 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:46:29,609-Speed 2494.08 samples/sec Loss 1.0856 LearningRate 0.000012 Epoch: 36 Global Step: 748520 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:46:37,814-Speed 2496.64 samples/sec Loss 1.1267 LearningRate 0.000012 Epoch: 36 Global Step: 748530 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-07-12 17:46:45,975-Speed 2509.86 samples/sec Loss 1.0755 LearningRate 0.000012 Epoch: 36 Global Step: 748540 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:46:54,178-Speed 2497.05 samples/sec Loss 1.0735 LearningRate 0.000012 Epoch: 36 Global Step: 748550 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:47:02,381-Speed 2497.11 samples/sec Loss 1.1298 LearningRate 0.000012 Epoch: 36 Global Step: 748560 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:47:10,538-Speed 2511.27 samples/sec Loss 1.1243 LearningRate 0.000012 Epoch: 36 Global Step: 748570 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:47:18,754-Speed 2493.15 samples/sec Loss 1.1186 LearningRate 0.000012 Epoch: 36 Global Step: 748580 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:47:26,958-Speed 2496.72 samples/sec Loss 1.0749 LearningRate 0.000012 Epoch: 36 Global Step: 748590 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:47:35,163-Speed 2496.38 samples/sec Loss 1.1260 LearningRate 0.000012 Epoch: 36 Global Step: 748600 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:47:43,374-Speed 2494.69 samples/sec Loss 1.1055 LearningRate 0.000012 Epoch: 36 Global Step: 748610 Fp16 Grad Scale: 8192 Required: 19 hours Training: 
2022-07-12 17:47:51,579-Speed 2496.41 samples/sec Loss 1.1013 LearningRate 0.000012 Epoch: 36 Global Step: 748620 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:47:59,731-Speed 2512.45 samples/sec Loss 1.0977 LearningRate 0.000012 Epoch: 36 Global Step: 748630 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:48:07,940-Speed 2495.39 samples/sec Loss 1.0902 LearningRate 0.000012 Epoch: 36 Global Step: 748640 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:48:16,156-Speed 2493.25 samples/sec Loss 1.0830 LearningRate 0.000012 Epoch: 36 Global Step: 748650 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:48:24,358-Speed 2497.16 samples/sec Loss 1.1060 LearningRate 0.000012 Epoch: 36 Global Step: 748660 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:48:32,575-Speed 2493.11 samples/sec Loss 1.0931 LearningRate 0.000012 Epoch: 36 Global Step: 748670 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:48:40,778-Speed 2497.05 samples/sec Loss 1.0825 LearningRate 0.000012 Epoch: 36 Global Step: 748680 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:48:48,929-Speed 2512.73 samples/sec Loss 1.1187 LearningRate 0.000012 Epoch: 36 Global Step: 748690 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:48:57,131-Speed 2497.39 samples/sec Loss 1.0899 LearningRate 0.000012 Epoch: 36 Global Step: 748700 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:49:05,335-Speed 2496.67 samples/sec Loss 1.1318 LearningRate 0.000012 Epoch: 36 Global Step: 748710 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:49:13,536-Speed 2497.58 samples/sec Loss 1.1113 LearningRate 0.000012 Epoch: 36 Global Step: 748720 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 17:49:21,740-Speed 2496.53 samples/sec Loss 1.1295 LearningRate 0.000012 Epoch: 36 Global Step: 748730 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-07-12 
17:49:29,943-Speed 2497.21 samples/sec Loss 1.0906 LearningRate 0.000012 Epoch: 36 Global Step: 748740 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:49:38,090-Speed 2514.02 samples/sec Loss 1.1091 LearningRate 0.000012 Epoch: 36 Global Step: 748750 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:49:46,298-Speed 2495.79 samples/sec Loss 1.1065 LearningRate 0.000012 Epoch: 36 Global Step: 748760 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:49:54,501-Speed 2496.77 samples/sec Loss 1.1191 LearningRate 0.000012 Epoch: 36 Global Step: 748770 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:50:02,708-Speed 2496.02 samples/sec Loss 1.1076 LearningRate 0.000012 Epoch: 36 Global Step: 748780 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:50:10,912-Speed 2496.90 samples/sec Loss 1.1197 LearningRate 0.000012 Epoch: 36 Global Step: 748790 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:50:19,113-Speed 2497.73 samples/sec Loss 1.1064 LearningRate 0.000012 Epoch: 36 Global Step: 748800 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:50:27,264-Speed 2512.65 samples/sec Loss 1.0922 LearningRate 0.000012 Epoch: 36 Global Step: 748810 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:50:35,468-Speed 2496.91 samples/sec Loss 1.1120 LearningRate 0.000012 Epoch: 36 Global Step: 748820 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:50:43,685-Speed 2492.78 samples/sec Loss 1.0933 LearningRate 0.000012 Epoch: 36 Global Step: 748830 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:50:51,889-Speed 2496.68 samples/sec Loss 1.1006 LearningRate 0.000012 Epoch: 36 Global Step: 748840 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:51:00,092-Speed 2497.03 samples/sec Loss 1.1028 LearningRate 0.000012 Epoch: 36 Global Step: 748850 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:51:08,295-Speed 2496.91 samples/sec Loss 1.1108 LearningRate 0.000012 Epoch: 36 Global Step: 748860 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:51:16,448-Speed 2512.57 samples/sec Loss 1.1183 LearningRate 0.000012 Epoch: 36 Global Step: 748870 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:51:24,652-Speed 2496.56 samples/sec Loss 1.0960 LearningRate 0.000012 Epoch: 36 Global Step: 748880 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:51:32,856-Speed 2496.97 samples/sec Loss 1.1018 LearningRate 0.000012 Epoch: 36 Global Step: 748890 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:51:41,060-Speed 2496.97 samples/sec Loss 1.0984 LearningRate 0.000012 Epoch: 36 Global Step: 748900 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:51:49,262-Speed 2497.25 samples/sec Loss 1.1167 LearningRate 0.000012 Epoch: 36 Global Step: 748910 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:51:57,476-Speed 2493.43 samples/sec Loss 1.0883 LearningRate 0.000012 Epoch: 36 Global Step: 748920 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:52:05,625-Speed 2513.81 samples/sec Loss 1.0732 LearningRate 0.000012 Epoch: 36 Global Step: 748930 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:52:13,826-Speed 2497.69 samples/sec Loss 1.1005 LearningRate 0.000012 Epoch: 36 Global Step: 748940 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:52:22,051-Speed 2490.62 samples/sec Loss 1.0996 LearningRate 0.000012 Epoch: 36 Global Step: 748950 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:52:30,254-Speed 2496.97 samples/sec Loss 1.1181 LearningRate 0.000012 Epoch: 36 Global Step: 748960 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:52:38,459-Speed 2496.08 samples/sec Loss 1.1048 LearningRate 0.000012 Epoch: 36 Global Step: 748970 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:52:46,666-Speed 2495.73 samples/sec Loss 1.1374 LearningRate 0.000012 Epoch: 36 Global Step: 748980 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:52:54,815-Speed 2513.79 samples/sec Loss 1.1391 LearningRate 0.000012 Epoch: 36 Global Step: 748990 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:53:03,017-Speed 2497.18 samples/sec Loss 1.1089 LearningRate 0.000012 Epoch: 36 Global Step: 749000 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:53:11,220-Speed 2496.96 samples/sec Loss 1.0820 LearningRate 0.000012 Epoch: 36 Global Step: 749010 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:53:19,422-Speed 2497.49 samples/sec Loss 1.1176 LearningRate 0.000012 Epoch: 36 Global Step: 749020 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:53:27,622-Speed 2497.98 samples/sec Loss 1.0952 LearningRate 0.000012 Epoch: 36 Global Step: 749030 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:53:35,826-Speed 2496.52 samples/sec Loss 1.1101 LearningRate 0.000012 Epoch: 36 Global Step: 749040 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:53:43,979-Speed 2512.74 samples/sec Loss 1.1070 LearningRate 0.000012 Epoch: 36 Global Step: 749050 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:53:52,182-Speed 2497.01 samples/sec Loss 1.0855 LearningRate 0.000012 Epoch: 36 Global Step: 749060 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:54:00,385-Speed 2496.87 samples/sec Loss 1.0873 LearningRate 0.000012 Epoch: 36 Global Step: 749070 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:54:08,587-Speed 2497.41 samples/sec Loss 1.1302 LearningRate 0.000012 Epoch: 36 Global Step: 749080 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:54:16,791-Speed 2496.92 samples/sec Loss 1.1128 LearningRate 0.000012 Epoch: 36 Global Step: 749090 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:54:24,993-Speed 2497.58 samples/sec Loss 1.1068 LearningRate 0.000012 Epoch: 36 Global Step: 749100 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:54:33,145-Speed 2512.66 samples/sec Loss 1.0894 LearningRate 0.000012 Epoch: 36 Global Step: 749110 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:54:41,355-Speed 2495.13 samples/sec Loss 1.0876 LearningRate 0.000012 Epoch: 36 Global Step: 749120 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:54:49,564-Speed 2495.26 samples/sec Loss 1.0812 LearningRate 0.000012 Epoch: 36 Global Step: 749130 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:54:57,779-Speed 2493.29 samples/sec Loss 1.1058 LearningRate 0.000012 Epoch: 36 Global Step: 749140 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:55:05,982-Speed 2497.08 samples/sec Loss 1.0907 LearningRate 0.000012 Epoch: 36 Global Step: 749150 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:55:14,188-Speed 2495.94 samples/sec Loss 1.1189 LearningRate 0.000012 Epoch: 36 Global Step: 749160 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:55:22,333-Speed 2515.02 samples/sec Loss 1.0942 LearningRate 0.000012 Epoch: 36 Global Step: 749170 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:55:30,536-Speed 2497.05 samples/sec Loss 1.1049 LearningRate 0.000012 Epoch: 36 Global Step: 749180 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:55:38,739-Speed 2497.11 samples/sec Loss 1.0992 LearningRate 0.000012 Epoch: 36 Global Step: 749190 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:55:46,943-Speed 2496.71 samples/sec Loss 1.1327 LearningRate 0.000012 Epoch: 36 Global Step: 749200 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:55:55,144-Speed 2497.66 samples/sec Loss 1.1126 LearningRate 0.000012 Epoch: 36 Global Step: 749210 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:56:03,347-Speed 2497.20 samples/sec Loss 1.0782 LearningRate 0.000012 Epoch: 36 Global Step: 749220 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:56:11,498-Speed 2512.93 samples/sec Loss 1.1020 LearningRate 0.000012 Epoch: 36 Global Step: 749230 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:56:19,699-Speed 2497.91 samples/sec Loss 1.1066 LearningRate 0.000012 Epoch: 36 Global Step: 749240 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:56:27,898-Speed 2498.41 samples/sec Loss 1.0966 LearningRate 0.000012 Epoch: 36 Global Step: 749250 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:56:36,100-Speed 2497.19 samples/sec Loss 1.0721 LearningRate 0.000012 Epoch: 36 Global Step: 749260 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:56:44,297-Speed 2498.82 samples/sec Loss 1.0722 LearningRate 0.000012 Epoch: 36 Global Step: 749270 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:56:52,501-Speed 2496.64 samples/sec Loss 1.0709 LearningRate 0.000012 Epoch: 36 Global Step: 749280 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:57:00,663-Speed 2509.69 samples/sec Loss 1.0979 LearningRate 0.000012 Epoch: 36 Global Step: 749290 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:57:08,868-Speed 2496.45 samples/sec Loss 1.0904 LearningRate 0.000012 Epoch: 36 Global Step: 749300 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:57:17,081-Speed 2494.07 samples/sec Loss 1.0885 LearningRate 0.000012 Epoch: 36 Global Step: 749310 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:57:25,283-Speed 2497.61 samples/sec Loss 1.0860 LearningRate 0.000012 Epoch: 36 Global Step: 749320 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:57:33,496-Speed 2494.24 samples/sec Loss 1.1023 LearningRate 0.000012 Epoch: 36 Global Step: 749330 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:57:41,710-Speed 2493.62 samples/sec Loss 1.1116 LearningRate 0.000012 Epoch: 36 Global Step: 749340 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:57:49,857-Speed 2514.32 samples/sec Loss 1.0887 LearningRate 0.000012 Epoch: 36 Global Step: 749350 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:57:58,061-Speed 2496.96 samples/sec Loss 1.0815 LearningRate 0.000012 Epoch: 36 Global Step: 749360 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:58:06,265-Speed 2496.93 samples/sec Loss 1.0801 LearningRate 0.000012 Epoch: 36 Global Step: 749370 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:58:14,467-Speed 2497.34 samples/sec Loss 1.1314 LearningRate 0.000012 Epoch: 36 Global Step: 749380 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:58:22,668-Speed 2497.35 samples/sec Loss 1.1054 LearningRate 0.000012 Epoch: 36 Global Step: 749390 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:58:30,870-Speed 2497.30 samples/sec Loss 1.0769 LearningRate 0.000012 Epoch: 36 Global Step: 749400 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:58:39,019-Speed 2513.84 samples/sec Loss 1.1222 LearningRate 0.000012 Epoch: 36 Global Step: 749410 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:58:47,227-Speed 2495.31 samples/sec Loss 1.0513 LearningRate 0.000012 Epoch: 36 Global Step: 749420 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:58:55,428-Speed 2497.57 samples/sec Loss 1.1255 LearningRate 0.000012 Epoch: 36 Global Step: 749430 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:59:03,648-Speed 2491.87 samples/sec Loss 1.1033 LearningRate 0.000012 Epoch: 36 Global Step: 749440 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:59:11,852-Speed 2496.82 samples/sec Loss 1.1044 LearningRate 0.000012 Epoch: 36 Global Step: 749450 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:59:20,059-Speed 2496.07 samples/sec Loss 1.0984 LearningRate 0.000012 Epoch: 36 Global Step: 749460 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:59:28,207-Speed 2513.84 samples/sec Loss 1.1031 LearningRate 0.000012 Epoch: 36 Global Step: 749470 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:59:36,407-Speed 2498.17 samples/sec Loss 1.1140 LearningRate 0.000012 Epoch: 36 Global Step: 749480 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:59:44,609-Speed 2497.09 samples/sec Loss 1.1157 LearningRate 0.000012 Epoch: 36 Global Step: 749490 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 17:59:52,809-Speed 2498.18 samples/sec Loss 1.0861 LearningRate 0.000011 Epoch: 36 Global Step: 749500 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:00:01,009-Speed 2497.88 samples/sec Loss 1.0797 LearningRate 0.000011 Epoch: 36 Global Step: 749510 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:00:09,212-Speed 2497.02 samples/sec Loss 1.1115 LearningRate 0.000011 Epoch: 36 Global Step: 749520 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:00:17,362-Speed 2513.19 samples/sec Loss 1.1131 LearningRate 0.000011 Epoch: 36 Global Step: 749530 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:00:25,566-Speed 2497.04 samples/sec Loss 1.0999 LearningRate 0.000011 Epoch: 36 Global Step: 749540 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:00:33,773-Speed 2495.70 samples/sec Loss 1.1447 LearningRate 0.000011 Epoch: 36 Global Step: 749550 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:00:41,980-Speed 2495.90 samples/sec Loss 1.0631 LearningRate 0.000011 Epoch: 36 Global Step: 749560 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:00:50,182-Speed 2497.46 samples/sec Loss 1.0728 LearningRate 0.000011 Epoch: 36 Global Step: 749570 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:00:58,383-Speed 2498.30 samples/sec Loss 1.1033 LearningRate 0.000011 Epoch: 36 Global Step: 749580 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:01:06,530-Speed 2514.14 samples/sec Loss 1.0996 LearningRate 0.000011 Epoch: 36 Global Step: 749590 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:01:14,731-Speed 2497.52 samples/sec Loss 1.0954 LearningRate 0.000011 Epoch: 36 Global Step: 749600 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:01:22,934-Speed 2497.13 samples/sec Loss 1.1286 LearningRate 0.000011 Epoch: 36 Global Step: 749610 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:01:31,132-Speed 2498.73 samples/sec Loss 1.0804 LearningRate 0.000011 Epoch: 36 Global Step: 749620 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:01:39,333-Speed 2497.81 samples/sec Loss 1.0940 LearningRate 0.000011 Epoch: 36 Global Step: 749630 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:01:47,538-Speed 2496.42 samples/sec Loss 1.0951 LearningRate 0.000011 Epoch: 36 Global Step: 749640 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:01:55,685-Speed 2514.36 samples/sec Loss 1.0921 LearningRate 0.000011 Epoch: 36 Global Step: 749650 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:02:03,888-Speed 2497.14 samples/sec Loss 1.1082 LearningRate 0.000011 Epoch: 36 Global Step: 749660 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:02:12,092-Speed 2496.70 samples/sec Loss 1.1307 LearningRate 0.000011 Epoch: 36 Global Step: 749670 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:02:20,292-Speed 2497.75 samples/sec Loss 1.1130 LearningRate 0.000011 Epoch: 36 Global Step: 749680 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:02:28,492-Speed 2498.21 samples/sec Loss 1.0809 LearningRate 0.000011 Epoch: 36 Global Step: 749690 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:02:36,697-Speed 2496.47 samples/sec Loss 1.1041 LearningRate 0.000011 Epoch: 36 Global Step: 749700 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:02:44,850-Speed 2512.43 samples/sec Loss 1.1214 LearningRate 0.000011 Epoch: 36 Global Step: 749710 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:02:53,052-Speed 2497.65 samples/sec Loss 1.1121 LearningRate 0.000011 Epoch: 36 Global Step: 749720 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:03:01,262-Speed 2494.70 samples/sec Loss 1.1091 LearningRate 0.000011 Epoch: 36 Global Step: 749730 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-07-12 18:03:09,467-Speed 2496.60 samples/sec Loss 1.1131 LearningRate 0.000011 Epoch: 36 Global Step: 749740 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:03:17,673-Speed 2496.08 samples/sec Loss 1.1310 LearningRate 0.000011 Epoch: 36 Global Step: 749750 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:03:25,873-Speed 2497.82 samples/sec Loss 1.1142 LearningRate 0.000011 Epoch: 36 Global Step: 749760 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:03:34,028-Speed 2511.93 samples/sec Loss 1.1039 LearningRate 0.000011 Epoch: 36 Global Step: 749770 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:03:42,228-Speed 2497.73 samples/sec Loss 1.1103 LearningRate 0.000011 Epoch: 36 Global Step: 749780 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:03:50,433-Speed 2496.47 samples/sec Loss 1.1009 LearningRate 0.000011 Epoch: 36 Global Step: 749790 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:03:58,635-Speed 2497.45 samples/sec Loss 1.0830 LearningRate 0.000011 Epoch: 36 Global Step: 749800 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:04:06,837-Speed 2497.31 samples/sec Loss 1.1021 LearningRate 0.000011 Epoch: 36 Global Step: 749810 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:04:15,044-Speed 2495.97 samples/sec Loss 1.1270 LearningRate 0.000011 Epoch: 36 Global Step: 749820 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:04:23,192-Speed 2514.03 samples/sec Loss 1.1351 LearningRate 0.000011 Epoch: 36 Global Step: 749830 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:04:31,396-Speed 2496.71 samples/sec Loss 1.0990 LearningRate 0.000011 Epoch: 36 Global Step: 749840 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:04:39,608-Speed 2494.58 samples/sec Loss 1.1099 LearningRate 0.000011 Epoch: 36 Global Step: 749850 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:04:47,813-Speed 2496.48 samples/sec Loss 1.1221 LearningRate 0.000011 Epoch: 36 Global Step: 749860 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:04:56,014-Speed 2497.70 samples/sec Loss 1.0993 LearningRate 0.000011 Epoch: 36 Global Step: 749870 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:05:04,217-Speed 2496.92 samples/sec Loss 1.1187 LearningRate 0.000011 Epoch: 36 Global Step: 749880 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:05:12,366-Speed 2513.54 samples/sec Loss 1.1274 LearningRate 0.000011 Epoch: 36 Global Step: 749890 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:05:20,572-Speed 2496.05 samples/sec Loss 1.1027 LearningRate 0.000011 Epoch: 36 Global Step: 749900 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:05:28,774-Speed 2497.40 samples/sec Loss 1.0871 LearningRate 0.000011 Epoch: 36 Global Step: 749910 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:05:36,990-Speed 2493.11 samples/sec Loss 1.0893 LearningRate 0.000011 Epoch: 36 Global Step: 749920 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:05:45,191-Speed 2497.83 samples/sec Loss 1.1358 LearningRate 0.000011 Epoch: 36 Global Step: 749930 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:05:53,396-Speed 2496.25 samples/sec Loss 1.1081 LearningRate 0.000011 Epoch: 36 Global Step: 749940 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:06:01,555-Speed 2510.63 samples/sec Loss 1.1096 LearningRate 0.000011 Epoch: 36 Global Step: 749950 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:06:09,759-Speed 2496.52 samples/sec Loss 1.0860 LearningRate 0.000011 Epoch: 36 Global Step: 749960 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:06:17,962-Speed 2497.03 samples/sec Loss 1.1166 LearningRate 0.000011 Epoch: 36 Global Step: 749970 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:06:26,166-Speed 2496.99 samples/sec Loss 1.0921 LearningRate 0.000011 Epoch: 36 Global Step: 749980 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:06:34,363-Speed 2498.67 samples/sec Loss 1.0824 LearningRate 0.000011 Epoch: 36 Global Step: 749990 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:06:42,566-Speed 2497.11 samples/sec Loss 1.1201 LearningRate 0.000011 Epoch: 36 Global Step: 750000 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:06:50,715-Speed 2513.73 samples/sec Loss 1.0946 LearningRate 0.000011 Epoch: 36 Global Step: 750010 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:06:58,921-Speed 2496.14 samples/sec Loss 1.0946 LearningRate 0.000011 Epoch: 36 Global Step: 750020 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:07:07,127-Speed 2496.22 samples/sec Loss 1.1161 LearningRate 0.000011 Epoch: 36 Global Step: 750030 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:07:15,329-Speed 2497.57 samples/sec Loss 1.0987 LearningRate 0.000011 Epoch: 36 Global Step: 750040 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:07:23,530-Speed 2497.69 samples/sec Loss 1.1112 LearningRate 0.000011 Epoch: 36 Global Step: 750050 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:07:31,737-Speed 2495.69 samples/sec Loss 1.0986 LearningRate 0.000011 Epoch: 36 Global Step: 750060 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:07:39,883-Speed 2514.42 samples/sec Loss 1.0719 LearningRate 0.000011 Epoch: 36 Global Step: 750070 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:07:48,087-Speed 2497.26 samples/sec Loss 1.0691 LearningRate 0.000011 Epoch: 36 Global Step: 750080 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:07:56,291-Speed 2496.61 samples/sec Loss 1.1109 LearningRate 0.000011 Epoch: 36 Global Step: 750090 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:08:04,504-Speed 2493.94 samples/sec Loss 1.0879 LearningRate 0.000011 Epoch: 36 Global Step: 750100 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:08:12,706-Speed 2497.63 samples/sec Loss 1.1151 LearningRate 0.000011 Epoch: 36 Global Step: 750110 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:08:20,908-Speed 2497.49 samples/sec Loss 1.1101 LearningRate 0.000011 Epoch: 36 Global Step: 750120 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:08:29,058-Speed 2513.31 samples/sec Loss 1.0926 LearningRate 0.000011 Epoch: 36 Global Step: 750130 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:08:37,259-Speed 2497.51 samples/sec Loss 1.1141 LearningRate 0.000011 Epoch: 36 Global Step: 750140 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:08:45,465-Speed 2496.09 samples/sec Loss 1.0733 LearningRate 0.000011 Epoch: 36 Global Step: 750150 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:08:53,669-Speed 2497.05 samples/sec Loss 1.0723 LearningRate 0.000011 Epoch: 36 Global Step: 750160 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:09:01,871-Speed 2497.17 samples/sec Loss 1.0942 LearningRate 0.000011 Epoch: 36 Global Step: 750170 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:09:10,074-Speed 2496.95 samples/sec Loss 1.0961 LearningRate 0.000011 Epoch: 36 Global Step: 750180 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:09:18,225-Speed 2512.98 samples/sec Loss 1.0995 LearningRate 0.000011 Epoch: 36 Global Step: 750190 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:09:26,434-Speed 2495.24 samples/sec Loss 1.0826 LearningRate 0.000011 Epoch: 36 Global Step: 750200 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:09:34,660-Speed 2489.96 samples/sec Loss 1.0935 LearningRate 0.000011 Epoch: 36 Global Step: 750210 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:09:42,864-Speed 2496.73 samples/sec Loss 1.1324 LearningRate 0.000011 Epoch: 36 Global Step: 750220 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:09:51,067-Speed 2497.18 samples/sec Loss 1.1383 LearningRate 0.000011 Epoch: 36 Global Step: 750230 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:09:59,268-Speed 2497.50 samples/sec Loss 1.0794 LearningRate 0.000011 Epoch: 36 Global Step: 750240 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:10:07,425-Speed 2511.23 samples/sec Loss 1.1016 LearningRate 0.000011 Epoch: 36 Global Step: 750250 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:10:15,625-Speed 2497.82 samples/sec Loss 1.1123 LearningRate 0.000011 Epoch: 36 Global Step: 750260 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:10:23,833-Speed 2495.80 samples/sec Loss 1.1162 LearningRate 0.000011 Epoch: 36 Global Step: 750270 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:10:32,034-Speed 2497.53 samples/sec Loss 1.1188 LearningRate 0.000011 Epoch: 36 Global Step: 750280 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:10:40,240-Speed 2495.80 samples/sec Loss 1.0955 LearningRate 0.000011 Epoch: 36 Global Step: 750290 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:10:48,443-Speed 2496.98 samples/sec Loss 1.1042 LearningRate 0.000011 Epoch: 36 Global Step: 750300 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:10:56,593-Speed 2513.31 samples/sec Loss 1.0810 LearningRate 0.000011 Epoch: 36 Global Step: 750310 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:11:04,796-Speed 2496.95 samples/sec Loss 1.1158 LearningRate 0.000011 Epoch: 36 Global Step: 750320 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:11:13,001-Speed 2496.58 samples/sec Loss 1.1235 LearningRate 0.000011 Epoch: 36 Global Step: 750330 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:11:21,204-Speed 2497.24 samples/sec Loss 1.0788 LearningRate 0.000011 Epoch: 36 Global Step: 750340 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:11:29,408-Speed 2496.68 samples/sec Loss 1.0899 LearningRate 0.000011 Epoch: 36 Global Step: 750350 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:11:37,626-Speed 2492.26 samples/sec Loss 1.1003 LearningRate 0.000011 Epoch: 36 Global Step: 750360 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:11:45,778-Speed 2512.92 samples/sec Loss 1.0986 LearningRate 0.000011 Epoch: 36 Global Step: 750370 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:11:53,980-Speed 2497.27 samples/sec Loss 1.0981 LearningRate 0.000011 Epoch: 36 Global Step: 750380 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:12:02,186-Speed 2496.26 samples/sec Loss 1.0906 LearningRate 0.000011 Epoch: 36 Global Step: 750390 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:12:10,391-Speed 2496.33 samples/sec Loss 1.0634 LearningRate 0.000011 Epoch: 36 Global Step: 750400 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:12:18,594-Speed 2497.18 samples/sec Loss 1.1061 LearningRate 0.000011 Epoch: 36 Global Step: 750410 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:12:26,795-Speed 2497.65 samples/sec Loss 1.1044 LearningRate 0.000011 Epoch: 36 Global Step: 750420 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:12:34,945-Speed 2513.37 samples/sec Loss 1.0918 LearningRate 0.000011 Epoch: 36 Global Step: 750430 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:12:43,148-Speed 2496.82 samples/sec Loss 1.0809 LearningRate 0.000011 Epoch: 36 Global Step: 750440 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:12:51,353-Speed 2496.74 samples/sec Loss 1.1392 LearningRate 0.000011 Epoch: 36 Global Step: 750450 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:12:59,551-Speed 2498.28 samples/sec Loss 1.0689 LearningRate 0.000011 Epoch: 36 Global Step: 750460 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:13:07,748-Speed 2499.18 samples/sec Loss 1.0819 LearningRate 0.000011 Epoch: 36 Global Step: 750470 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:13:15,963-Speed 2493.74 samples/sec Loss 1.1067 LearningRate 0.000011 Epoch: 36 Global Step: 750480 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:13:24,115-Speed 2512.56 samples/sec Loss 1.0940 LearningRate 0.000011 Epoch: 36 Global Step: 750490 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:13:32,321-Speed 2496.26 samples/sec Loss 1.1320 LearningRate 0.000011 Epoch: 36 Global Step: 750500 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:13:40,524-Speed 2496.86 samples/sec Loss 1.1407 LearningRate 0.000011 Epoch: 36 Global Step: 750510 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:13:48,730-Speed 2495.89 samples/sec Loss 1.0887 LearningRate 0.000011 Epoch: 36 Global Step: 750520 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:13:56,941-Speed 2494.96 samples/sec Loss 1.0962 LearningRate 0.000011 Epoch: 36 Global Step: 750530 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:14:05,147-Speed 2496.33 samples/sec Loss 1.1006 LearningRate 0.000011 Epoch: 36 Global Step: 750540 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:14:13,296-Speed 2513.63 samples/sec Loss 1.1181 LearningRate 0.000011 Epoch: 36 Global Step: 750550 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:14:21,498-Speed 2497.27 samples/sec Loss 1.0927 LearningRate 0.000011 Epoch: 36 Global Step: 750560 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:14:29,723-Speed 2490.38 samples/sec Loss 1.1026 LearningRate 0.000011 Epoch: 36 Global Step: 750570 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:14:37,928-Speed 2496.54 samples/sec Loss 1.1037 LearningRate 0.000011 Epoch: 36 Global Step: 750580 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:14:46,128-Speed 2498.16 samples/sec Loss 1.1067 LearningRate 0.000011 Epoch: 36 Global Step: 750590 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:14:54,333-Speed 2496.61 samples/sec Loss 1.1142 LearningRate 0.000011 Epoch: 36 Global Step: 750600 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:15:02,484-Speed 2512.86 samples/sec Loss 1.1305 LearningRate 0.000011 Epoch: 36 Global Step: 750610 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:15:10,688-Speed 2496.87 samples/sec Loss 1.0697 LearningRate 0.000011 Epoch: 36 Global Step: 750620 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:15:18,892-Speed 2496.62 samples/sec Loss 1.0826 LearningRate 0.000011 Epoch: 36 Global Step: 750630 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:15:27,097-Speed 2496.56 samples/sec Loss 1.1210 LearningRate 0.000011 Epoch: 36 Global Step: 750640 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:15:35,301-Speed 2496.95 samples/sec Loss 1.0754 LearningRate 0.000011 Epoch: 36 Global Step: 750650 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:15:43,504-Speed 2496.98 samples/sec Loss 1.0878 LearningRate 0.000011 Epoch: 36 Global Step: 750660 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:15:51,654-Speed 2513.06 samples/sec Loss 1.0912 LearningRate 0.000011 Epoch: 36 Global Step: 750670 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:15:59,859-Speed 2496.74 samples/sec Loss 1.0940 LearningRate 0.000011 Epoch: 36 Global Step: 750680 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:16:08,061-Speed 2497.26 samples/sec Loss 1.0973 LearningRate 0.000011 Epoch: 36 Global Step: 750690 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:16:16,265-Speed 2496.84 samples/sec Loss 1.0948 LearningRate 0.000011 Epoch: 36 Global Step: 750700 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:16:24,468-Speed 2497.12 samples/sec Loss 1.1291 LearningRate 0.000011 Epoch: 36 Global Step: 750710 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:16:32,678-Speed 2494.66 samples/sec Loss 1.0906 LearningRate 0.000011 Epoch: 36 Global Step: 750720 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:16:40,833-Speed 2511.64 samples/sec Loss 1.1104 LearningRate 0.000011 Epoch: 36 Global Step: 750730 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:16:49,037-Speed 2496.87 samples/sec Loss 1.1051 LearningRate 0.000011 Epoch: 36 Global Step: 750740 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:16:57,243-Speed 2496.24 samples/sec Loss 1.1369 LearningRate 0.000011 Epoch: 36 Global Step: 750750 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:17:05,446-Speed 2496.81 samples/sec Loss 1.1417 LearningRate 0.000011 Epoch: 36 Global Step: 750760 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:17:13,648-Speed 2497.64 samples/sec Loss 1.1050 LearningRate 0.000011 Epoch: 36 Global Step: 750770 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:17:21,848-Speed 2497.83 samples/sec Loss 1.1031 LearningRate 0.000011 Epoch: 36 Global Step: 750780 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:17:29,996-Speed 2513.99 samples/sec Loss 1.1064 LearningRate 0.000011 Epoch: 36 Global Step: 750790 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:17:38,197-Speed 2497.63 samples/sec Loss 1.1044 LearningRate 0.000011 Epoch: 36 Global Step: 750800 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:17:46,399-Speed 2497.23 samples/sec Loss 1.0806 LearningRate 0.000011 Epoch: 36 Global Step: 750810 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:17:54,599-Speed 2497.85 samples/sec Loss 1.0974 LearningRate 0.000011 Epoch: 36 Global Step: 750820 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:18:02,803-Speed 2496.80 samples/sec Loss 1.0934 LearningRate 0.000011 Epoch: 36 Global Step: 750830 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:18:11,005-Speed 2497.22 samples/sec Loss 1.1048 LearningRate 0.000011 Epoch: 36 Global Step: 750840 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:18:19,156-Speed 2513.34 samples/sec Loss 1.0960 LearningRate 0.000011 Epoch: 36 Global Step: 750850 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:18:27,363-Speed 2496.01 samples/sec Loss 1.0770 LearningRate 0.000011 Epoch: 36 Global Step: 750860 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:18:35,565-Speed 2497.16 samples/sec Loss 1.0658 LearningRate 0.000011 Epoch: 36 Global Step: 750870 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:18:43,770-Speed 2496.49 samples/sec Loss 1.1067 LearningRate 0.000011 Epoch: 36 Global Step: 750880 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:18:51,972-Speed 2497.41 samples/sec Loss 1.0931 LearningRate 0.000011 Epoch: 36 Global Step: 750890 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:19:00,177-Speed 2496.32 samples/sec Loss 1.0966 LearningRate 0.000011 Epoch: 36 Global Step: 750900 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:19:08,328-Speed 2513.08 samples/sec Loss 1.1006 LearningRate 0.000011 Epoch: 36 Global Step: 750910 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:19:16,531-Speed 2497.00 samples/sec Loss 1.0947 LearningRate 0.000011 Epoch: 36 Global Step: 750920 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:19:24,747-Speed 2493.01 samples/sec Loss 1.1029 LearningRate 0.000011 Epoch: 36 Global Step: 750930 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-07-12 18:19:32,948-Speed 2497.71 samples/sec Loss 1.1096 LearningRate 0.000011 Epoch: 36 Global Step: 750940 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:19:41,148-Speed 2497.89 samples/sec Loss 1.0931 LearningRate 0.000011 Epoch: 36 Global Step: 750950 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:19:49,349-Speed 2497.54 samples/sec Loss 1.1166 LearningRate 0.000011 Epoch: 36 Global Step: 750960 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:19:57,498-Speed 2513.83 samples/sec Loss 1.0881 LearningRate 0.000011 Epoch: 36 Global Step: 750970 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:20:05,699-Speed 2497.80 samples/sec Loss 1.0757 LearningRate 0.000011 Epoch: 36 Global Step: 750980 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:20:13,902-Speed 2496.82 samples/sec Loss 1.1213 LearningRate 0.000011 Epoch: 36 Global Step: 750990 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:20:22,102-Speed 2498.01 samples/sec Loss 1.1258 LearningRate 0.000011 Epoch: 36 Global Step: 751000 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:20:30,302-Speed 2498.03 samples/sec Loss 1.0895 LearningRate 0.000011 Epoch: 36 Global Step: 751010 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:20:38,502-Speed 2497.99 samples/sec Loss 1.0868 LearningRate 0.000011 Epoch: 36 Global Step: 751020 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:20:46,647-Speed 2514.52 samples/sec Loss 1.0897 LearningRate 0.000011 Epoch: 36 Global Step: 751030 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:20:54,856-Speed 2495.29 samples/sec Loss 1.1171 LearningRate 0.000011 Epoch: 36 Global Step: 751040 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:21:03,059-Speed 2497.01 samples/sec Loss 1.1077 LearningRate 0.000011 Epoch: 36 Global Step: 751050 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:21:11,260-Speed 2497.71 samples/sec Loss 1.0787 LearningRate 0.000011 Epoch: 36 Global Step: 751060 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:21:19,464-Speed 2496.69 samples/sec Loss 1.0976 LearningRate 0.000011 Epoch: 36 Global Step: 751070 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:21:27,667-Speed 2497.09 samples/sec Loss 1.1044 LearningRate 0.000011 Epoch: 36 Global Step: 751080 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:21:35,818-Speed 2513.07 samples/sec Loss 1.1004 LearningRate 0.000011 Epoch: 36 Global Step: 751090 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:21:44,018-Speed 2497.90 samples/sec Loss 1.1051 LearningRate 0.000011 Epoch: 36 Global Step: 751100 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:21:52,221-Speed 2497.25 samples/sec Loss 1.1216 LearningRate 0.000011 Epoch: 36 Global Step: 751110 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:22:00,421-Speed 2498.02 samples/sec Loss 1.0683 LearningRate 0.000011 Epoch: 36 Global Step: 751120 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:22:08,624-Speed 2496.79 samples/sec Loss 1.0923 LearningRate 0.000011 Epoch: 36 Global Step: 751130 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:22:16,827-Speed 2497.19 samples/sec Loss 1.1233 LearningRate 0.000011 Epoch: 36 Global Step: 751140 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-07-12 18:22:24,979-Speed 2512.93 samples/sec Loss 1.1086 LearningRate 0.000011 Epoch: 36 Global Step: 751150 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:22:33,185-Speed 2496.28 samples/sec Loss 1.1118 LearningRate 0.000011 Epoch: 36 Global Step: 751160 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:22:41,389-Speed 2496.66 samples/sec Loss 1.0836 LearningRate 0.000011 Epoch: 36 Global Step: 751170 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:22:49,597-Speed 2495.37 samples/sec Loss 1.1013 LearningRate 0.000011 Epoch: 36 Global Step: 751180 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:22:57,797-Speed 2498.25 samples/sec Loss 1.1327 LearningRate 0.000011 Epoch: 36 Global Step: 751190 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:23:06,000-Speed 2497.03 samples/sec Loss 1.0978 LearningRate 0.000011 Epoch: 36 Global Step: 751200 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:23:14,149-Speed 2513.65 samples/sec Loss 1.0843 LearningRate 0.000011 Epoch: 36 Global Step: 751210 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:23:22,356-Speed 2495.76 samples/sec Loss 1.0897 LearningRate 0.000011 Epoch: 36 Global Step: 751220 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:23:30,562-Speed 2496.22 samples/sec Loss 1.1123 LearningRate 0.000011 Epoch: 36 Global Step: 751230 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:23:38,763-Speed 2497.62 samples/sec Loss 1.1015 LearningRate 0.000011 Epoch: 36 Global Step: 751240 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:23:46,962-Speed 2498.03 samples/sec Loss 1.0972 LearningRate 0.000011 Epoch: 36 Global Step: 751250 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:23:55,166-Speed 2496.98 samples/sec Loss 1.0895 LearningRate 0.000011 Epoch: 36 Global Step: 751260 Fp16 Grad Scale: 32768 Required: 18 hours 
Training: 2022-07-12 18:24:03,315-Speed 2513.32 samples/sec Loss 1.1053 LearningRate 0.000011 Epoch: 36 Global Step: 751270 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:24:11,518-Speed 2497.54 samples/sec Loss 1.0885 LearningRate 0.000011 Epoch: 36 Global Step: 751280 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:24:19,717-Speed 2498.12 samples/sec Loss 1.1007 LearningRate 0.000011 Epoch: 36 Global Step: 751290 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:24:27,931-Speed 2493.81 samples/sec Loss 1.0961 LearningRate 0.000011 Epoch: 36 Global Step: 751300 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:24:36,134-Speed 2497.19 samples/sec Loss 1.1036 LearningRate 0.000011 Epoch: 36 Global Step: 751310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:24:44,337-Speed 2497.20 samples/sec Loss 1.1005 LearningRate 0.000011 Epoch: 36 Global Step: 751320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:24:52,488-Speed 2512.81 samples/sec Loss 1.0843 LearningRate 0.000011 Epoch: 36 Global Step: 751330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:25:00,716-Speed 2489.45 samples/sec Loss 1.0805 LearningRate 0.000011 Epoch: 36 Global Step: 751340 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:25:08,921-Speed 2496.44 samples/sec Loss 1.0847 LearningRate 0.000011 Epoch: 36 Global Step: 751350 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:25:17,126-Speed 2496.46 samples/sec Loss 1.1004 LearningRate 0.000011 Epoch: 36 Global Step: 751360 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:25:25,334-Speed 2495.45 samples/sec Loss 1.1147 LearningRate 0.000011 Epoch: 36 Global Step: 751370 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:25:33,543-Speed 2495.23 samples/sec Loss 1.0586 LearningRate 0.000011 Epoch: 36 Global Step: 751380 Fp16 Grad Scale: 32768 Required: 18 hours 
Training: 2022-07-12 18:25:41,693-Speed 2513.29 samples/sec Loss 1.1068 LearningRate 0.000011 Epoch: 36 Global Step: 751390 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:25:49,897-Speed 2496.88 samples/sec Loss 1.0805 LearningRate 0.000011 Epoch: 36 Global Step: 751400 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:25:58,061-Speed 2508.84 samples/sec Loss 1.0906 LearningRate 0.000011 Epoch: 36 Global Step: 751410 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:26:06,275-Speed 2493.85 samples/sec Loss 1.1073 LearningRate 0.000011 Epoch: 36 Global Step: 751420 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:26:14,480-Speed 2496.52 samples/sec Loss 1.1057 LearningRate 0.000011 Epoch: 36 Global Step: 751430 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:26:22,684-Speed 2496.73 samples/sec Loss 1.1196 LearningRate 0.000011 Epoch: 36 Global Step: 751440 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:26:30,841-Speed 2510.82 samples/sec Loss 1.1007 LearningRate 0.000011 Epoch: 36 Global Step: 751450 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:26:39,054-Speed 2494.21 samples/sec Loss 1.1311 LearningRate 0.000011 Epoch: 36 Global Step: 751460 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:26:47,265-Speed 2494.69 samples/sec Loss 1.0947 LearningRate 0.000011 Epoch: 36 Global Step: 751470 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:26:55,492-Speed 2489.51 samples/sec Loss 1.1146 LearningRate 0.000011 Epoch: 36 Global Step: 751480 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:27:03,699-Speed 2496.02 samples/sec Loss 1.0954 LearningRate 0.000011 Epoch: 36 Global Step: 751490 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:27:11,906-Speed 2495.59 samples/sec Loss 1.1209 LearningRate 0.000011 Epoch: 36 Global Step: 751500 Fp16 Grad Scale: 16384 Required: 18 hours 
Training: 2022-07-12 18:27:20,056-Speed 2513.53 samples/sec Loss 1.0761 LearningRate 0.000011 Epoch: 36 Global Step: 751510 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:27:28,264-Speed 2495.42 samples/sec Loss 1.0631 LearningRate 0.000011 Epoch: 36 Global Step: 751520 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:27:36,466-Speed 2497.51 samples/sec Loss 1.0933 LearningRate 0.000011 Epoch: 36 Global Step: 751530 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:27:44,668-Speed 2497.37 samples/sec Loss 1.0729 LearningRate 0.000011 Epoch: 36 Global Step: 751540 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:27:52,883-Speed 2493.35 samples/sec Loss 1.0783 LearningRate 0.000011 Epoch: 36 Global Step: 751550 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:28:01,086-Speed 2497.05 samples/sec Loss 1.1278 LearningRate 0.000011 Epoch: 36 Global Step: 751560 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:28:09,236-Speed 2513.22 samples/sec Loss 1.1171 LearningRate 0.000011 Epoch: 36 Global Step: 751570 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:28:17,452-Speed 2493.20 samples/sec Loss 1.1190 LearningRate 0.000011 Epoch: 36 Global Step: 751580 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:28:25,653-Speed 2497.68 samples/sec Loss 1.0922 LearningRate 0.000011 Epoch: 36 Global Step: 751590 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:28:33,857-Speed 2496.39 samples/sec Loss 1.1058 LearningRate 0.000011 Epoch: 36 Global Step: 751600 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:28:42,062-Speed 2496.57 samples/sec Loss 1.1232 LearningRate 0.000011 Epoch: 36 Global Step: 751610 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:28:50,265-Speed 2497.00 samples/sec Loss 1.1159 LearningRate 0.000011 Epoch: 36 Global Step: 751620 Fp16 Grad Scale: 16384 Required: 18 hours 
Training: 2022-07-12 18:28:58,418-Speed 2512.39 samples/sec Loss 1.0880 LearningRate 0.000011 Epoch: 36 Global Step: 751630 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:29:06,624-Speed 2496.41 samples/sec Loss 1.0823 LearningRate 0.000011 Epoch: 36 Global Step: 751640 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:29:14,828-Speed 2496.70 samples/sec Loss 1.1203 LearningRate 0.000011 Epoch: 36 Global Step: 751650 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:29:23,033-Speed 2496.66 samples/sec Loss 1.0993 LearningRate 0.000011 Epoch: 36 Global Step: 751660 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:29:31,238-Speed 2496.58 samples/sec Loss 1.0873 LearningRate 0.000011 Epoch: 36 Global Step: 751670 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:29:39,445-Speed 2495.51 samples/sec Loss 1.0674 LearningRate 0.000011 Epoch: 36 Global Step: 751680 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:29:47,609-Speed 2508.96 samples/sec Loss 1.0786 LearningRate 0.000011 Epoch: 36 Global Step: 751690 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:29:55,812-Speed 2497.19 samples/sec Loss 1.0733 LearningRate 0.000011 Epoch: 36 Global Step: 751700 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:30:04,019-Speed 2495.85 samples/sec Loss 1.1014 LearningRate 0.000011 Epoch: 36 Global Step: 751710 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:30:12,222-Speed 2496.96 samples/sec Loss 1.1065 LearningRate 0.000011 Epoch: 36 Global Step: 751720 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:30:20,429-Speed 2495.57 samples/sec Loss 1.0850 LearningRate 0.000011 Epoch: 36 Global Step: 751730 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:30:28,636-Speed 2495.80 samples/sec Loss 1.1229 LearningRate 0.000011 Epoch: 36 Global Step: 751740 Fp16 Grad Scale: 16384 Required: 18 hours 
Training: 2022-07-12 18:30:36,791-Speed 2511.72 samples/sec Loss 1.1009 LearningRate 0.000011 Epoch: 36 Global Step: 751750 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:30:45,008-Speed 2492.72 samples/sec Loss 1.0914 LearningRate 0.000011 Epoch: 36 Global Step: 751760 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:30:53,212-Speed 2496.92 samples/sec Loss 1.1008 LearningRate 0.000011 Epoch: 36 Global Step: 751770 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:31:01,419-Speed 2496.13 samples/sec Loss 1.1028 LearningRate 0.000011 Epoch: 36 Global Step: 751780 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:31:09,628-Speed 2495.18 samples/sec Loss 1.0769 LearningRate 0.000011 Epoch: 36 Global Step: 751790 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:31:17,934-Speed 2466.02 samples/sec Loss 1.1124 LearningRate 0.000011 Epoch: 36 Global Step: 751800 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:31:26,085-Speed 2512.70 samples/sec Loss 1.1090 LearningRate 0.000011 Epoch: 36 Global Step: 751810 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:31:34,292-Speed 2495.93 samples/sec Loss 1.1162 LearningRate 0.000011 Epoch: 36 Global Step: 751820 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:31:42,498-Speed 2496.31 samples/sec Loss 1.1324 LearningRate 0.000011 Epoch: 36 Global Step: 751830 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:31:50,720-Speed 2491.24 samples/sec Loss 1.0973 LearningRate 0.000011 Epoch: 36 Global Step: 751840 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:31:58,926-Speed 2496.07 samples/sec Loss 1.1055 LearningRate 0.000011 Epoch: 36 Global Step: 751850 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:32:07,131-Speed 2496.79 samples/sec Loss 1.0914 LearningRate 0.000011 Epoch: 36 Global Step: 751860 Fp16 Grad Scale: 16384 Required: 18 hours 
Training: 2022-07-12 18:32:15,284-Speed 2512.38 samples/sec Loss 1.1033 LearningRate 0.000011 Epoch: 36 Global Step: 751870 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:32:23,490-Speed 2496.22 samples/sec Loss 1.1131 LearningRate 0.000011 Epoch: 36 Global Step: 751880 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:32:31,708-Speed 2492.14 samples/sec Loss 1.0953 LearningRate 0.000011 Epoch: 36 Global Step: 751890 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:32:39,918-Speed 2495.13 samples/sec Loss 1.0906 LearningRate 0.000011 Epoch: 36 Global Step: 751900 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:32:48,124-Speed 2496.50 samples/sec Loss 1.0741 LearningRate 0.000011 Epoch: 36 Global Step: 751910 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:32:56,334-Speed 2494.65 samples/sec Loss 1.0971 LearningRate 0.000011 Epoch: 36 Global Step: 751920 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:33:04,490-Speed 2511.66 samples/sec Loss 1.0926 LearningRate 0.000011 Epoch: 36 Global Step: 751930 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:33:12,706-Speed 2493.21 samples/sec Loss 1.1132 LearningRate 0.000011 Epoch: 36 Global Step: 751940 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:33:20,914-Speed 2495.71 samples/sec Loss 1.0988 LearningRate 0.000011 Epoch: 36 Global Step: 751950 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:33:29,118-Speed 2496.51 samples/sec Loss 1.0731 LearningRate 0.000011 Epoch: 36 Global Step: 751960 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:33:37,324-Speed 2496.22 samples/sec Loss 1.1194 LearningRate 0.000011 Epoch: 36 Global Step: 751970 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:33:45,540-Speed 2493.11 samples/sec Loss 1.0926 LearningRate 0.000011 Epoch: 36 Global Step: 751980 Fp16 Grad Scale: 16384 Required: 18 hours 
Training: 2022-07-12 18:33:53,692-Speed 2512.56 samples/sec Loss 1.0970 LearningRate 0.000011 Epoch: 36 Global Step: 751990 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:34:01,906-Speed 2493.86 samples/sec Loss 1.0940 LearningRate 0.000011 Epoch: 36 Global Step: 752000 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:34:10,111-Speed 2496.36 samples/sec Loss 1.1154 LearningRate 0.000011 Epoch: 36 Global Step: 752010 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:34:18,317-Speed 2496.33 samples/sec Loss 1.1039 LearningRate 0.000011 Epoch: 36 Global Step: 752020 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:34:26,523-Speed 2495.84 samples/sec Loss 1.0578 LearningRate 0.000011 Epoch: 36 Global Step: 752030 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:34:34,734-Speed 2494.68 samples/sec Loss 1.0971 LearningRate 0.000011 Epoch: 36 Global Step: 752040 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:34:42,893-Speed 2510.66 samples/sec Loss 1.1113 LearningRate 0.000011 Epoch: 36 Global Step: 752050 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:34:51,101-Speed 2495.53 samples/sec Loss 1.1393 LearningRate 0.000011 Epoch: 36 Global Step: 752060 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:34:59,310-Speed 2494.96 samples/sec Loss 1.0981 LearningRate 0.000011 Epoch: 36 Global Step: 752070 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:35:07,519-Speed 2495.52 samples/sec Loss 1.0923 LearningRate 0.000011 Epoch: 36 Global Step: 752080 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:35:15,727-Speed 2495.70 samples/sec Loss 1.1180 LearningRate 0.000011 Epoch: 36 Global Step: 752090 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:35:23,945-Speed 2492.08 samples/sec Loss 1.0900 LearningRate 0.000011 Epoch: 36 Global Step: 752100 Fp16 Grad Scale: 16384 Required: 18 hours 
Training: 2022-07-12 18:35:32,102-Speed 2511.28 samples/sec Loss 1.0846 LearningRate 0.000011 Epoch: 36 Global Step: 752110 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:35:40,308-Speed 2496.89 samples/sec Loss 1.1108 LearningRate 0.000011 Epoch: 36 Global Step: 752120 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:35:48,521-Speed 2494.22 samples/sec Loss 1.1084 LearningRate 0.000011 Epoch: 36 Global Step: 752130 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:35:56,730-Speed 2495.32 samples/sec Loss 1.1133 LearningRate 0.000011 Epoch: 36 Global Step: 752140 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:36:04,941-Speed 2494.71 samples/sec Loss 1.0826 LearningRate 0.000011 Epoch: 36 Global Step: 752150 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:36:13,146-Speed 2496.33 samples/sec Loss 1.1132 LearningRate 0.000011 Epoch: 36 Global Step: 752160 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:36:21,298-Speed 2512.84 samples/sec Loss 1.1140 LearningRate 0.000011 Epoch: 36 Global Step: 752170 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:36:29,503-Speed 2496.57 samples/sec Loss 1.1212 LearningRate 0.000011 Epoch: 36 Global Step: 752180 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:36:37,723-Speed 2491.67 samples/sec Loss 1.0954 LearningRate 0.000011 Epoch: 36 Global Step: 752190 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:36:45,925-Speed 2497.34 samples/sec Loss 1.0966 LearningRate 0.000011 Epoch: 36 Global Step: 752200 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:36:54,127-Speed 2497.48 samples/sec Loss 1.0910 LearningRate 0.000011 Epoch: 36 Global Step: 752210 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:37:02,334-Speed 2495.67 samples/sec Loss 1.1249 LearningRate 0.000011 Epoch: 36 Global Step: 752220 Fp16 Grad Scale: 16384 Required: 18 hours 
Training: 2022-07-12 18:37:10,488-Speed 2511.92 samples/sec Loss 1.0887 LearningRate 0.000011 Epoch: 36 Global Step: 752230 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:37:18,693-Speed 2496.56 samples/sec Loss 1.1264 LearningRate 0.000011 Epoch: 36 Global Step: 752240 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:37:26,907-Speed 2493.74 samples/sec Loss 1.0965 LearningRate 0.000011 Epoch: 36 Global Step: 752250 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:37:35,110-Speed 2497.08 samples/sec Loss 1.0661 LearningRate 0.000011 Epoch: 36 Global Step: 752260 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:37:43,312-Speed 2497.14 samples/sec Loss 1.0787 LearningRate 0.000011 Epoch: 36 Global Step: 752270 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:37:51,518-Speed 2496.10 samples/sec Loss 1.0983 LearningRate 0.000011 Epoch: 36 Global Step: 752280 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:37:59,669-Speed 2513.09 samples/sec Loss 1.0827 LearningRate 0.000011 Epoch: 36 Global Step: 752290 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:38:07,871-Speed 2497.38 samples/sec Loss 1.1132 LearningRate 0.000011 Epoch: 36 Global Step: 752300 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:38:16,075-Speed 2496.91 samples/sec Loss 1.1013 LearningRate 0.000011 Epoch: 36 Global Step: 752310 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:38:24,284-Speed 2495.26 samples/sec Loss 1.0889 LearningRate 0.000011 Epoch: 36 Global Step: 752320 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:38:32,501-Speed 2492.67 samples/sec Loss 1.1126 LearningRate 0.000011 Epoch: 36 Global Step: 752330 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:38:40,706-Speed 2496.41 samples/sec Loss 1.0891 LearningRate 0.000011 Epoch: 36 Global Step: 752340 Fp16 Grad Scale: 16384 Required: 18 hours 
Training: 2022-07-12 18:38:48,857-Speed 2512.95 samples/sec Loss 1.0849 LearningRate 0.000011 Epoch: 36 Global Step: 752350 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:38:57,065-Speed 2495.59 samples/sec Loss 1.1152 LearningRate 0.000011 Epoch: 36 Global Step: 752360 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:39:05,269-Speed 2497.00 samples/sec Loss 1.1135 LearningRate 0.000011 Epoch: 36 Global Step: 752370 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:39:13,472-Speed 2497.06 samples/sec Loss 1.0899 LearningRate 0.000011 Epoch: 36 Global Step: 752380 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:39:21,677-Speed 2496.59 samples/sec Loss 1.0894 LearningRate 0.000011 Epoch: 36 Global Step: 752390 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:39:29,893-Speed 2492.91 samples/sec Loss 1.0990 LearningRate 0.000011 Epoch: 36 Global Step: 752400 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:39:38,047-Speed 2512.02 samples/sec Loss 1.1042 LearningRate 0.000011 Epoch: 36 Global Step: 752410 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:39:46,250-Speed 2497.21 samples/sec Loss 1.0919 LearningRate 0.000011 Epoch: 36 Global Step: 752420 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:39:54,465-Speed 2493.50 samples/sec Loss 1.0991 LearningRate 0.000011 Epoch: 36 Global Step: 752430 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:40:02,675-Speed 2494.88 samples/sec Loss 1.0968 LearningRate 0.000011 Epoch: 36 Global Step: 752440 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:40:10,886-Speed 2494.92 samples/sec Loss 1.1295 LearningRate 0.000011 Epoch: 36 Global Step: 752450 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:40:19,090-Speed 2496.50 samples/sec Loss 1.0951 LearningRate 0.000011 Epoch: 36 Global Step: 752460 Fp16 Grad Scale: 16384 Required: 18 hours 
Training: 2022-07-12 18:40:27,253-Speed 2509.70 samples/sec Loss 1.0952 LearningRate 0.000011 Epoch: 36 Global Step: 752470 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:40:35,458-Speed 2496.22 samples/sec Loss 1.1046 LearningRate 0.000011 Epoch: 36 Global Step: 752480 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:40:43,661-Speed 2496.97 samples/sec Loss 1.1198 LearningRate 0.000011 Epoch: 36 Global Step: 752490 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:40:51,867-Speed 2496.19 samples/sec Loss 1.1168 LearningRate 0.000011 Epoch: 36 Global Step: 752500 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:41:00,073-Speed 2496.23 samples/sec Loss 1.1263 LearningRate 0.000011 Epoch: 36 Global Step: 752510 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:41:08,279-Speed 2496.55 samples/sec Loss 1.0921 LearningRate 0.000011 Epoch: 36 Global Step: 752520 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:41:16,432-Speed 2512.44 samples/sec Loss 1.0850 LearningRate 0.000011 Epoch: 36 Global Step: 752530 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:41:24,635-Speed 2497.17 samples/sec Loss 1.1173 LearningRate 0.000011 Epoch: 36 Global Step: 752540 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:41:32,837-Speed 2497.32 samples/sec Loss 1.1002 LearningRate 0.000011 Epoch: 36 Global Step: 752550 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:41:41,041-Speed 2496.94 samples/sec Loss 1.1330 LearningRate 0.000011 Epoch: 36 Global Step: 752560 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:41:49,250-Speed 2495.16 samples/sec Loss 1.1184 LearningRate 0.000011 Epoch: 36 Global Step: 752570 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:41:57,453-Speed 2497.03 samples/sec Loss 1.1089 LearningRate 0.000011 Epoch: 36 Global Step: 752580 Fp16 Grad Scale: 16384 Required: 18 hours 
Training: 2022-07-12 18:42:05,605-Speed 2512.63 samples/sec Loss 1.0917 LearningRate 0.000011 Epoch: 36 Global Step: 752590 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:42:13,822-Speed 2492.83 samples/sec Loss 1.1395 LearningRate 0.000011 Epoch: 36 Global Step: 752600 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-07-12 18:42:22,043-Speed 2491.64 samples/sec Loss 1.0937 LearningRate 0.000011 Epoch: 36 Global Step: 752610 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:42:30,247-Speed 2496.84 samples/sec Loss 1.1102 LearningRate 0.000011 Epoch: 36 Global Step: 752620 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:42:38,461-Speed 2493.79 samples/sec Loss 1.1101 LearningRate 0.000011 Epoch: 36 Global Step: 752630 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:42:46,665-Speed 2496.33 samples/sec Loss 1.0876 LearningRate 0.000011 Epoch: 36 Global Step: 752640 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:42:54,817-Speed 2512.63 samples/sec Loss 1.0903 LearningRate 0.000011 Epoch: 36 Global Step: 752650 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:43:03,022-Speed 2496.61 samples/sec Loss 1.0981 LearningRate 0.000011 Epoch: 36 Global Step: 752660 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:43:11,230-Speed 2495.52 samples/sec Loss 1.1343 LearningRate 0.000011 Epoch: 36 Global Step: 752670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:43:19,433-Speed 2496.99 samples/sec Loss 1.1135 LearningRate 0.000011 Epoch: 36 Global Step: 752680 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:43:27,646-Speed 2494.34 samples/sec Loss 1.1175 LearningRate 0.000011 Epoch: 36 Global Step: 752690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:43:35,861-Speed 2493.53 samples/sec Loss 1.0741 LearningRate 0.000011 Epoch: 36 Global Step: 752700 Fp16 Grad Scale: 32768 Required: 18 hours 
Training: 2022-07-12 18:43:44,017-Speed 2511.44 samples/sec Loss 1.0704 LearningRate 0.000011 Epoch: 36 Global Step: 752710 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:43:52,232-Speed 2493.39 samples/sec Loss 1.1161 LearningRate 0.000011 Epoch: 36 Global Step: 752720 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:44:00,436-Speed 2496.77 samples/sec Loss 1.1198 LearningRate 0.000011 Epoch: 36 Global Step: 752730 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:44:08,657-Speed 2491.75 samples/sec Loss 1.1501 LearningRate 0.000011 Epoch: 36 Global Step: 752740 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:44:16,859-Speed 2497.41 samples/sec Loss 1.0858 LearningRate 0.000011 Epoch: 36 Global Step: 752750 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:44:25,063-Speed 2496.67 samples/sec Loss 1.1182 LearningRate 0.000011 Epoch: 36 Global Step: 752760 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:44:33,213-Speed 2513.20 samples/sec Loss 1.0958 LearningRate 0.000011 Epoch: 36 Global Step: 752770 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:44:41,419-Speed 2496.38 samples/sec Loss 1.0728 LearningRate 0.000011 Epoch: 36 Global Step: 752780 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:44:49,625-Speed 2496.13 samples/sec Loss 1.1100 LearningRate 0.000011 Epoch: 36 Global Step: 752790 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:44:57,829-Speed 2496.89 samples/sec Loss 1.1033 LearningRate 0.000011 Epoch: 36 Global Step: 752800 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:45:06,035-Speed 2495.91 samples/sec Loss 1.0877 LearningRate 0.000011 Epoch: 36 Global Step: 752810 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:45:14,244-Speed 2495.51 samples/sec Loss 1.1073 LearningRate 0.000011 Epoch: 36 Global Step: 752820 Fp16 Grad Scale: 32768 Required: 18 hours 
Training: 2022-07-12 18:45:22,401-Speed 2511.07 samples/sec Loss 1.0913 LearningRate 0.000011 Epoch: 36 Global Step: 752830 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:45:30,619-Speed 2492.90 samples/sec Loss 1.0863 LearningRate 0.000011 Epoch: 36 Global Step: 752840 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:45:38,826-Speed 2495.91 samples/sec Loss 1.1010 LearningRate 0.000011 Epoch: 36 Global Step: 752850 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:45:47,033-Speed 2495.95 samples/sec Loss 1.1103 LearningRate 0.000011 Epoch: 36 Global Step: 752860 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:45:55,243-Speed 2494.91 samples/sec Loss 1.0959 LearningRate 0.000011 Epoch: 36 Global Step: 752870 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:46:03,462-Speed 2492.34 samples/sec Loss 1.1018 LearningRate 0.000011 Epoch: 36 Global Step: 752880 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:46:11,618-Speed 2511.32 samples/sec Loss 1.1140 LearningRate 0.000011 Epoch: 36 Global Step: 752890 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:46:19,828-Speed 2494.92 samples/sec Loss 1.1398 LearningRate 0.000011 Epoch: 36 Global Step: 752900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:46:28,035-Speed 2496.16 samples/sec Loss 1.1300 LearningRate 0.000011 Epoch: 36 Global Step: 752910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:46:36,239-Speed 2496.59 samples/sec Loss 1.0886 LearningRate 0.000011 Epoch: 36 Global Step: 752920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:46:44,449-Speed 2494.84 samples/sec Loss 1.1253 LearningRate 0.000011 Epoch: 36 Global Step: 752930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:46:52,656-Speed 2496.10 samples/sec Loss 1.0862 LearningRate 0.000011 Epoch: 36 Global Step: 752940 Fp16 Grad Scale: 32768 Required: 18 hours 
Training: 2022-07-12 18:47:00,810-Speed 2511.93 samples/sec Loss 1.0909 LearningRate 0.000011 Epoch: 36 Global Step: 752950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:47:09,015-Speed 2496.65 samples/sec Loss 1.0739 LearningRate 0.000011 Epoch: 36 Global Step: 752960 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:47:17,223-Speed 2495.37 samples/sec Loss 1.0993 LearningRate 0.000011 Epoch: 36 Global Step: 752970 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:47:25,432-Speed 2495.27 samples/sec Loss 1.1089 LearningRate 0.000011 Epoch: 36 Global Step: 752980 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:47:33,638-Speed 2496.08 samples/sec Loss 1.1003 LearningRate 0.000011 Epoch: 36 Global Step: 752990 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:47:41,842-Speed 2496.70 samples/sec Loss 1.1061 LearningRate 0.000011 Epoch: 36 Global Step: 753000 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:47:49,996-Speed 2511.89 samples/sec Loss 1.1152 LearningRate 0.000011 Epoch: 36 Global Step: 753010 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:47:58,202-Speed 2496.22 samples/sec Loss 1.0968 LearningRate 0.000011 Epoch: 36 Global Step: 753020 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:48:06,413-Speed 2494.67 samples/sec Loss 1.0916 LearningRate 0.000011 Epoch: 36 Global Step: 753030 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:48:14,619-Speed 2496.24 samples/sec Loss 1.0900 LearningRate 0.000011 Epoch: 36 Global Step: 753040 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:48:22,824-Speed 2496.78 samples/sec Loss 1.0797 LearningRate 0.000011 Epoch: 36 Global Step: 753050 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:48:31,030-Speed 2495.98 samples/sec Loss 1.1328 LearningRate 0.000010 Epoch: 36 Global Step: 753060 Fp16 Grad Scale: 32768 Required: 18 hours 
Training: 2022-07-12 18:48:39,188-Speed 2510.81 samples/sec Loss 1.0669 LearningRate 0.000010 Epoch: 36 Global Step: 753070 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:48:47,394-Speed 2496.12 samples/sec Loss 1.0557 LearningRate 0.000010 Epoch: 36 Global Step: 753080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:48:55,602-Speed 2495.50 samples/sec Loss 1.0550 LearningRate 0.000010 Epoch: 36 Global Step: 753090 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:49:03,811-Speed 2495.44 samples/sec Loss 1.1084 LearningRate 0.000010 Epoch: 36 Global Step: 753100 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-07-12 18:49:12,014-Speed 2496.88 samples/sec Loss 1.1009 LearningRate 0.000010 Epoch: 36 Global Step: 753110 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 18:49:20,217-Speed 2497.34 samples/sec Loss 1.0897 LearningRate 0.000010 Epoch: 36 Global Step: 753120 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 18:49:28,366-Speed 2513.43 samples/sec Loss 1.1213 LearningRate 0.000010 Epoch: 36 Global Step: 753130 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 18:49:36,570-Speed 2496.86 samples/sec Loss 1.0601 LearningRate 0.000010 Epoch: 36 Global Step: 753140 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 18:49:44,778-Speed 2495.46 samples/sec Loss 1.1275 LearningRate 0.000010 Epoch: 36 Global Step: 753150 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 18:49:52,982-Speed 2496.92 samples/sec Loss 1.0804 LearningRate 0.000010 Epoch: 36 Global Step: 753160 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 18:50:01,185-Speed 2497.10 samples/sec Loss 1.0904 LearningRate 0.000010 Epoch: 36 Global Step: 753170 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 18:50:09,393-Speed 2495.63 samples/sec Loss 1.0885 LearningRate 0.000010 Epoch: 36 Global Step: 753180 Fp16 Grad Scale: 32768 Required: 17 hours 
Training: 2022-07-12 18:50:17,546-Speed 2512.19 samples/sec Loss 1.0843 LearningRate 0.000010 Epoch: 36 Global Step: 753190 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 18:50:25,749-Speed 2497.55 samples/sec Loss 1.1092 LearningRate 0.000010 Epoch: 36 Global Step: 753200 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 18:50:33,955-Speed 2495.94 samples/sec Loss 1.0773 LearningRate 0.000010 Epoch: 36 Global Step: 753210 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 18:50:42,158-Speed 2497.00 samples/sec Loss 1.0850 LearningRate 0.000010 Epoch: 36 Global Step: 753220 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 18:50:50,366-Speed 2495.75 samples/sec Loss 1.0641 LearningRate 0.000010 Epoch: 36 Global Step: 753230 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 18:50:58,569-Speed 2497.00 samples/sec Loss 1.0889 LearningRate 0.000010 Epoch: 36 Global Step: 753240 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 18:51:06,725-Speed 2511.18 samples/sec Loss 1.1287 LearningRate 0.000010 Epoch: 36 Global Step: 753250 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 18:51:14,895-Speed 2507.32 samples/sec Loss 1.1036 LearningRate 0.000010 Epoch: 36 Global Step: 753260 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:51:23,106-Speed 2494.44 samples/sec Loss 1.1087 LearningRate 0.000010 Epoch: 36 Global Step: 753270 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:51:31,332-Speed 2490.35 samples/sec Loss 1.0917 LearningRate 0.000010 Epoch: 36 Global Step: 753280 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:51:39,538-Speed 2496.27 samples/sec Loss 1.0803 LearningRate 0.000010 Epoch: 36 Global Step: 753290 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:51:47,744-Speed 2496.45 samples/sec Loss 1.0587 LearningRate 0.000010 Epoch: 36 Global Step: 753300 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 18:51:55,899-Speed 2511.97 samples/sec Loss 1.1039 LearningRate 0.000010 Epoch: 36 Global Step: 753310 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:52:04,102-Speed 2496.83 samples/sec Loss 1.1006 LearningRate 0.000010 Epoch: 36 Global Step: 753320 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:52:12,307-Speed 2496.39 samples/sec Loss 1.0781 LearningRate 0.000010 Epoch: 36 Global Step: 753330 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:52:20,515-Speed 2495.82 samples/sec Loss 1.0791 LearningRate 0.000010 Epoch: 36 Global Step: 753340 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:52:28,724-Speed 2494.96 samples/sec Loss 1.1113 LearningRate 0.000010 Epoch: 36 Global Step: 753350 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:52:36,928-Speed 2496.79 samples/sec Loss 1.0945 LearningRate 0.000010 Epoch: 36 Global Step: 753360 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:52:45,078-Speed 2513.43 samples/sec Loss 1.0912 LearningRate 0.000010 Epoch: 36 Global Step: 753370 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:52:53,281-Speed 2496.62 samples/sec Loss 1.0871 LearningRate 0.000010 Epoch: 36 Global Step: 753380 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:53:01,489-Speed 2495.69 samples/sec Loss 1.0850 LearningRate 0.000010 Epoch: 36 Global Step: 753390 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:53:09,692-Speed 2496.88 samples/sec Loss 1.0749 LearningRate 0.000010 Epoch: 36 Global Step: 753400 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:53:17,898-Speed 2495.98 samples/sec Loss 1.1119 LearningRate 0.000010 Epoch: 36 Global Step: 753410 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:53:26,103-Speed 2496.50 samples/sec Loss 1.0846 LearningRate 0.000010 Epoch: 36 Global Step: 753420 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 18:53:34,253-Speed 2513.41 samples/sec Loss 1.0686 LearningRate 0.000010 Epoch: 36 Global Step: 753430 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:53:42,454-Speed 2497.51 samples/sec Loss 1.0815 LearningRate 0.000010 Epoch: 36 Global Step: 753440 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:53:50,657-Speed 2496.84 samples/sec Loss 1.0854 LearningRate 0.000010 Epoch: 36 Global Step: 753450 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:53:58,862-Speed 2496.49 samples/sec Loss 1.0995 LearningRate 0.000010 Epoch: 36 Global Step: 753460 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:54:07,068-Speed 2496.11 samples/sec Loss 1.1112 LearningRate 0.000010 Epoch: 36 Global Step: 753470 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:54:15,268-Speed 2497.95 samples/sec Loss 1.0659 LearningRate 0.000010 Epoch: 36 Global Step: 753480 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:54:23,420-Speed 2512.47 samples/sec Loss 1.1148 LearningRate 0.000010 Epoch: 36 Global Step: 753490 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:54:31,629-Speed 2495.17 samples/sec Loss 1.1062 LearningRate 0.000010 Epoch: 36 Global Step: 753500 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:54:39,834-Speed 2496.75 samples/sec Loss 1.1032 LearningRate 0.000010 Epoch: 36 Global Step: 753510 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:54:48,039-Speed 2496.46 samples/sec Loss 1.1062 LearningRate 0.000010 Epoch: 36 Global Step: 753520 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:54:56,247-Speed 2495.46 samples/sec Loss 1.0832 LearningRate 0.000010 Epoch: 36 Global Step: 753530 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:55:04,449-Speed 2497.26 samples/sec Loss 1.0970 LearningRate 0.000010 Epoch: 36 Global Step: 753540 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 18:55:12,598-Speed 2513.84 samples/sec Loss 1.1045 LearningRate 0.000010 Epoch: 36 Global Step: 753550 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:55:20,801-Speed 2496.75 samples/sec Loss 1.0897 LearningRate 0.000010 Epoch: 36 Global Step: 753560 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:55:29,004-Speed 2497.06 samples/sec Loss 1.1047 LearningRate 0.000010 Epoch: 36 Global Step: 753570 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:55:37,205-Speed 2497.77 samples/sec Loss 1.0991 LearningRate 0.000010 Epoch: 36 Global Step: 753580 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:55:45,407-Speed 2497.61 samples/sec Loss 1.1027 LearningRate 0.000010 Epoch: 36 Global Step: 753590 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:55:53,611-Speed 2496.56 samples/sec Loss 1.0830 LearningRate 0.000010 Epoch: 36 Global Step: 753600 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:56:01,760-Speed 2513.47 samples/sec Loss 1.1066 LearningRate 0.000010 Epoch: 36 Global Step: 753610 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 18:56:09,918-Speed 2510.98 samples/sec Loss 1.1086 LearningRate 0.000010 Epoch: 36 Global Step: 753620 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:56:18,134-Speed 2493.13 samples/sec Loss 1.0869 LearningRate 0.000010 Epoch: 36 Global Step: 753630 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:56:26,338-Speed 2496.70 samples/sec Loss 1.1241 LearningRate 0.000010 Epoch: 36 Global Step: 753640 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:56:34,541-Speed 2496.82 samples/sec Loss 1.0819 LearningRate 0.000010 Epoch: 36 Global Step: 753650 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:56:42,743-Speed 2497.54 samples/sec Loss 1.0853 LearningRate 0.000010 Epoch: 36 Global Step: 753660 Fp16 Grad Scale: 8192 Required: 17 hours Training: 
2022-07-12 18:56:50,892-Speed 2513.67 samples/sec Loss 1.1086 LearningRate 0.000010 Epoch: 36 Global Step: 753670 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:56:59,093-Speed 2497.62 samples/sec Loss 1.0885 LearningRate 0.000010 Epoch: 36 Global Step: 753680 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:57:07,294-Speed 2497.58 samples/sec Loss 1.0912 LearningRate 0.000010 Epoch: 36 Global Step: 753690 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:57:15,498-Speed 2496.65 samples/sec Loss 1.1006 LearningRate 0.000010 Epoch: 36 Global Step: 753700 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:57:23,704-Speed 2496.34 samples/sec Loss 1.0840 LearningRate 0.000010 Epoch: 36 Global Step: 753710 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:57:31,908-Speed 2496.89 samples/sec Loss 1.1125 LearningRate 0.000010 Epoch: 36 Global Step: 753720 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:57:40,055-Speed 2513.96 samples/sec Loss 1.1203 LearningRate 0.000010 Epoch: 36 Global Step: 753730 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:57:48,260-Speed 2496.71 samples/sec Loss 1.1138 LearningRate 0.000010 Epoch: 36 Global Step: 753740 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:57:56,460-Speed 2497.84 samples/sec Loss 1.1058 LearningRate 0.000010 Epoch: 36 Global Step: 753750 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:58:04,667-Speed 2495.80 samples/sec Loss 1.1201 LearningRate 0.000010 Epoch: 36 Global Step: 753760 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:58:12,880-Speed 2494.19 samples/sec Loss 1.1143 LearningRate 0.000010 Epoch: 36 Global Step: 753770 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:58:21,083-Speed 2497.18 samples/sec Loss 1.1250 LearningRate 0.000010 Epoch: 36 Global Step: 753780 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 
18:58:29,230-Speed 2514.26 samples/sec Loss 1.1125 LearningRate 0.000010 Epoch: 36 Global Step: 753790 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:58:37,435-Speed 2496.55 samples/sec Loss 1.1057 LearningRate 0.000010 Epoch: 36 Global Step: 753800 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:58:45,639-Speed 2496.81 samples/sec Loss 1.0932 LearningRate 0.000010 Epoch: 36 Global Step: 753810 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:58:53,843-Speed 2496.75 samples/sec Loss 1.0947 LearningRate 0.000010 Epoch: 36 Global Step: 753820 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:59:02,044-Speed 2497.60 samples/sec Loss 1.1092 LearningRate 0.000010 Epoch: 36 Global Step: 753830 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:59:10,248-Speed 2496.97 samples/sec Loss 1.0831 LearningRate 0.000010 Epoch: 36 Global Step: 753840 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:59:18,395-Speed 2514.09 samples/sec Loss 1.0944 LearningRate 0.000010 Epoch: 36 Global Step: 753850 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:59:26,610-Speed 2493.58 samples/sec Loss 1.1159 LearningRate 0.000010 Epoch: 36 Global Step: 753860 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:59:34,813-Speed 2496.95 samples/sec Loss 1.0883 LearningRate 0.000010 Epoch: 36 Global Step: 753870 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:59:43,026-Speed 2494.17 samples/sec Loss 1.1041 LearningRate 0.000010 Epoch: 36 Global Step: 753880 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:59:51,236-Speed 2495.21 samples/sec Loss 1.1015 LearningRate 0.000010 Epoch: 36 Global Step: 753890 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 18:59:59,439-Speed 2496.96 samples/sec Loss 1.1105 LearningRate 0.000010 Epoch: 36 Global Step: 753900 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:00:07,589-Speed 
2513.11 samples/sec Loss 1.1081 LearningRate 0.000010 Epoch: 36 Global Step: 753910 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:00:15,795-Speed 2496.42 samples/sec Loss 1.0679 LearningRate 0.000010 Epoch: 36 Global Step: 753920 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:00:23,999-Speed 2496.68 samples/sec Loss 1.0903 LearningRate 0.000010 Epoch: 36 Global Step: 753930 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:00:32,204-Speed 2496.46 samples/sec Loss 1.0954 LearningRate 0.000010 Epoch: 36 Global Step: 753940 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:00:40,432-Speed 2489.55 samples/sec Loss 1.1021 LearningRate 0.000010 Epoch: 36 Global Step: 753950 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:00:48,635-Speed 2496.84 samples/sec Loss 1.0782 LearningRate 0.000010 Epoch: 36 Global Step: 753960 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:00:56,787-Speed 2512.82 samples/sec Loss 1.1203 LearningRate 0.000010 Epoch: 36 Global Step: 753970 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:01:04,990-Speed 2497.09 samples/sec Loss 1.1217 LearningRate 0.000010 Epoch: 36 Global Step: 753980 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:01:13,187-Speed 2498.74 samples/sec Loss 1.0914 LearningRate 0.000010 Epoch: 36 Global Step: 753990 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:01:21,393-Speed 2496.07 samples/sec Loss 1.0925 LearningRate 0.000010 Epoch: 36 Global Step: 754000 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:01:29,597-Speed 2496.79 samples/sec Loss 1.0944 LearningRate 0.000010 Epoch: 36 Global Step: 754010 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:01:37,799-Speed 2497.36 samples/sec Loss 1.0933 LearningRate 0.000010 Epoch: 36 Global Step: 754020 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:01:45,959-Speed 2510.05 samples/sec 
Loss 1.1014 LearningRate 0.000010 Epoch: 36 Global Step: 754030 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:01:54,158-Speed 2498.07 samples/sec Loss 1.1239 LearningRate 0.000010 Epoch: 36 Global Step: 754040 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:02:02,373-Speed 2493.60 samples/sec Loss 1.1195 LearningRate 0.000010 Epoch: 36 Global Step: 754050 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:02:10,576-Speed 2497.33 samples/sec Loss 1.1283 LearningRate 0.000010 Epoch: 36 Global Step: 754060 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:02:18,779-Speed 2496.91 samples/sec Loss 1.0853 LearningRate 0.000010 Epoch: 36 Global Step: 754070 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:02:26,985-Speed 2496.42 samples/sec Loss 1.0938 LearningRate 0.000010 Epoch: 36 Global Step: 754080 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:02:35,142-Speed 2511.36 samples/sec Loss 1.1045 LearningRate 0.000010 Epoch: 36 Global Step: 754090 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:02:43,348-Speed 2496.02 samples/sec Loss 1.0865 LearningRate 0.000010 Epoch: 36 Global Step: 754100 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:02:51,549-Speed 2497.59 samples/sec Loss 1.0701 LearningRate 0.000010 Epoch: 36 Global Step: 754110 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:02:59,755-Speed 2496.57 samples/sec Loss 1.1168 LearningRate 0.000010 Epoch: 36 Global Step: 754120 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:03:07,957-Speed 2497.24 samples/sec Loss 1.1147 LearningRate 0.000010 Epoch: 36 Global Step: 754130 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:03:16,159-Speed 2497.47 samples/sec Loss 1.0931 LearningRate 0.000010 Epoch: 36 Global Step: 754140 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:03:24,309-Speed 2513.21 samples/sec Loss 1.0892 
LearningRate 0.000010 Epoch: 36 Global Step: 754150 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:03:32,511-Speed 2497.51 samples/sec Loss 1.0749 LearningRate 0.000010 Epoch: 36 Global Step: 754160 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:03:40,709-Speed 2498.51 samples/sec Loss 1.1139 LearningRate 0.000010 Epoch: 36 Global Step: 754170 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:03:48,912-Speed 2497.16 samples/sec Loss 1.1189 LearningRate 0.000010 Epoch: 36 Global Step: 754180 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:03:57,116-Speed 2496.94 samples/sec Loss 1.0785 LearningRate 0.000010 Epoch: 36 Global Step: 754190 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:04:05,329-Speed 2493.80 samples/sec Loss 1.1101 LearningRate 0.000010 Epoch: 36 Global Step: 754200 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:04:13,482-Speed 2512.73 samples/sec Loss 1.0779 LearningRate 0.000010 Epoch: 36 Global Step: 754210 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:04:21,681-Speed 2498.18 samples/sec Loss 1.0918 LearningRate 0.000010 Epoch: 36 Global Step: 754220 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:04:29,885-Speed 2496.78 samples/sec Loss 1.1030 LearningRate 0.000010 Epoch: 36 Global Step: 754230 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:04:38,098-Speed 2494.14 samples/sec Loss 1.1057 LearningRate 0.000010 Epoch: 36 Global Step: 754240 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:04:46,299-Speed 2497.64 samples/sec Loss 1.1053 LearningRate 0.000010 Epoch: 36 Global Step: 754250 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:04:54,506-Speed 2495.80 samples/sec Loss 1.0758 LearningRate 0.000010 Epoch: 36 Global Step: 754260 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:05:02,656-Speed 2513.07 samples/sec Loss 1.1105 LearningRate 
0.000010 Epoch: 36 Global Step: 754270 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:05:10,857-Speed 2497.84 samples/sec Loss 1.1262 LearningRate 0.000010 Epoch: 36 Global Step: 754280 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:05:19,057-Speed 2498.09 samples/sec Loss 1.0906 LearningRate 0.000010 Epoch: 36 Global Step: 754290 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:05:27,258-Speed 2497.68 samples/sec Loss 1.0892 LearningRate 0.000010 Epoch: 36 Global Step: 754300 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:05:35,458-Speed 2498.04 samples/sec Loss 1.1366 LearningRate 0.000010 Epoch: 36 Global Step: 754310 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:05:43,682-Speed 2490.61 samples/sec Loss 1.1063 LearningRate 0.000010 Epoch: 36 Global Step: 754320 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:05:51,832-Speed 2513.31 samples/sec Loss 1.0998 LearningRate 0.000010 Epoch: 36 Global Step: 754330 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:06:00,033-Speed 2497.48 samples/sec Loss 1.1027 LearningRate 0.000010 Epoch: 36 Global Step: 754340 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:06:08,238-Speed 2496.56 samples/sec Loss 1.0583 LearningRate 0.000010 Epoch: 36 Global Step: 754350 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:06:16,441-Speed 2497.01 samples/sec Loss 1.1162 LearningRate 0.000010 Epoch: 36 Global Step: 754360 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:06:24,655-Speed 2493.66 samples/sec Loss 1.1036 LearningRate 0.000010 Epoch: 36 Global Step: 754370 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:06:32,856-Speed 2497.49 samples/sec Loss 1.0891 LearningRate 0.000010 Epoch: 36 Global Step: 754380 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:06:41,012-Speed 2511.59 samples/sec Loss 1.0848 LearningRate 0.000010 Epoch: 36 
Global Step: 754390 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:06:49,213-Speed 2497.43 samples/sec Loss 1.0775 LearningRate 0.000010 Epoch: 36 Global Step: 754400 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:06:57,416-Speed 2497.15 samples/sec Loss 1.0562 LearningRate 0.000010 Epoch: 36 Global Step: 754410 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:07:05,616-Speed 2498.04 samples/sec Loss 1.0669 LearningRate 0.000010 Epoch: 36 Global Step: 754420 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:07:13,819-Speed 2496.96 samples/sec Loss 1.0826 LearningRate 0.000010 Epoch: 36 Global Step: 754430 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:07:22,020-Speed 2497.38 samples/sec Loss 1.1233 LearningRate 0.000010 Epoch: 36 Global Step: 754440 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:07:30,171-Speed 2513.27 samples/sec Loss 1.1240 LearningRate 0.000010 Epoch: 36 Global Step: 754450 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:07:38,371-Speed 2497.89 samples/sec Loss 1.0963 LearningRate 0.000010 Epoch: 36 Global Step: 754460 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:07:46,577-Speed 2496.35 samples/sec Loss 1.0675 LearningRate 0.000010 Epoch: 36 Global Step: 754470 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:07:54,779-Speed 2497.29 samples/sec Loss 1.1013 LearningRate 0.000010 Epoch: 36 Global Step: 754480 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:08:02,980-Speed 2497.54 samples/sec Loss 1.0782 LearningRate 0.000010 Epoch: 36 Global Step: 754490 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:08:11,185-Speed 2497.03 samples/sec Loss 1.0970 LearningRate 0.000010 Epoch: 36 Global Step: 754500 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:08:19,336-Speed 2513.07 samples/sec Loss 1.0595 LearningRate 0.000010 Epoch: 36 Global Step: 754510 
Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:08:27,539-Speed 2497.30 samples/sec Loss 1.0835 LearningRate 0.000010 Epoch: 36 Global Step: 754520 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:08:35,741-Speed 2497.38 samples/sec Loss 1.1081 LearningRate 0.000010 Epoch: 36 Global Step: 754530 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:08:43,944-Speed 2497.13 samples/sec Loss 1.0926 LearningRate 0.000010 Epoch: 36 Global Step: 754540 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:08:52,146-Speed 2497.20 samples/sec Loss 1.0724 LearningRate 0.000010 Epoch: 36 Global Step: 754550 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:09:00,358-Speed 2494.19 samples/sec Loss 1.0992 LearningRate 0.000010 Epoch: 36 Global Step: 754560 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:09:08,511-Speed 2512.42 samples/sec Loss 1.0943 LearningRate 0.000010 Epoch: 36 Global Step: 754570 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:09:16,713-Speed 2497.46 samples/sec Loss 1.1010 LearningRate 0.000010 Epoch: 36 Global Step: 754580 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:09:24,921-Speed 2495.49 samples/sec Loss 1.0762 LearningRate 0.000010 Epoch: 36 Global Step: 754590 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:09:33,135-Speed 2493.58 samples/sec Loss 1.1044 LearningRate 0.000010 Epoch: 36 Global Step: 754600 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:09:41,334-Speed 2498.20 samples/sec Loss 1.1238 LearningRate 0.000010 Epoch: 36 Global Step: 754610 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:09:49,535-Speed 2497.75 samples/sec Loss 1.1239 LearningRate 0.000010 Epoch: 36 Global Step: 754620 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:09:57,684-Speed 2513.61 samples/sec Loss 1.0992 LearningRate 0.000010 Epoch: 36 Global Step: 754630 Fp16 Grad Scale: 
8192 Required: 17 hours Training: 2022-07-12 19:10:05,893-Speed 2495.23 samples/sec Loss 1.0833 LearningRate 0.000010 Epoch: 36 Global Step: 754640 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:10:14,098-Speed 2496.59 samples/sec Loss 1.0969 LearningRate 0.000010 Epoch: 36 Global Step: 754650 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:10:22,303-Speed 2496.54 samples/sec Loss 1.0793 LearningRate 0.000010 Epoch: 36 Global Step: 754660 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:10:30,508-Speed 2496.39 samples/sec Loss 1.0675 LearningRate 0.000010 Epoch: 36 Global Step: 754670 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:10:38,715-Speed 2495.74 samples/sec Loss 1.1166 LearningRate 0.000010 Epoch: 36 Global Step: 754680 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:10:46,868-Speed 2512.43 samples/sec Loss 1.1029 LearningRate 0.000010 Epoch: 36 Global Step: 754690 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:10:55,076-Speed 2495.43 samples/sec Loss 1.0820 LearningRate 0.000010 Epoch: 36 Global Step: 754700 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:11:03,284-Speed 2495.47 samples/sec Loss 1.0862 LearningRate 0.000010 Epoch: 36 Global Step: 754710 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:11:11,504-Speed 2491.69 samples/sec Loss 1.1082 LearningRate 0.000010 Epoch: 36 Global Step: 754720 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:11:19,711-Speed 2496.46 samples/sec Loss 1.0892 LearningRate 0.000010 Epoch: 36 Global Step: 754730 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:11:27,915-Speed 2496.68 samples/sec Loss 1.0867 LearningRate 0.000010 Epoch: 36 Global Step: 754740 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:11:36,068-Speed 2512.19 samples/sec Loss 1.1043 LearningRate 0.000010 Epoch: 36 Global Step: 754750 Fp16 Grad Scale: 8192 Required: 17 
hours Training: 2022-07-12 19:11:44,269-Speed 2497.54 samples/sec Loss 1.0854 LearningRate 0.000010 Epoch: 36 Global Step: 754760 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:11:52,473-Speed 2497.29 samples/sec Loss 1.1101 LearningRate 0.000010 Epoch: 36 Global Step: 754770 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:12:00,682-Speed 2495.14 samples/sec Loss 1.1005 LearningRate 0.000010 Epoch: 36 Global Step: 754780 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:12:08,884-Speed 2497.03 samples/sec Loss 1.0818 LearningRate 0.000010 Epoch: 36 Global Step: 754790 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:12:17,090-Speed 2496.55 samples/sec Loss 1.1089 LearningRate 0.000010 Epoch: 36 Global Step: 754800 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:12:25,240-Speed 2513.47 samples/sec Loss 1.1302 LearningRate 0.000010 Epoch: 36 Global Step: 754810 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:12:33,446-Speed 2495.85 samples/sec Loss 1.1155 LearningRate 0.000010 Epoch: 36 Global Step: 754820 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:12:41,656-Speed 2495.14 samples/sec Loss 1.0878 LearningRate 0.000010 Epoch: 36 Global Step: 754830 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:12:49,858-Speed 2497.24 samples/sec Loss 1.1348 LearningRate 0.000010 Epoch: 36 Global Step: 754840 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:12:58,080-Speed 2491.39 samples/sec Loss 1.0995 LearningRate 0.000010 Epoch: 36 Global Step: 754850 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:13:06,285-Speed 2496.35 samples/sec Loss 1.1232 LearningRate 0.000010 Epoch: 36 Global Step: 754860 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:13:14,441-Speed 2511.55 samples/sec Loss 1.1067 LearningRate 0.000010 Epoch: 36 Global Step: 754870 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:13:22,652-Speed 2494.57 samples/sec Loss 1.0898 LearningRate 0.000010 Epoch: 36 Global Step: 754880 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:13:30,852-Speed 2498.19 samples/sec Loss 1.0951 LearningRate 0.000010 Epoch: 36 Global Step: 754890 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:13:39,056-Speed 2496.57 samples/sec Loss 1.1241 LearningRate 0.000010 Epoch: 36 Global Step: 754900 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:13:47,261-Speed 2496.54 samples/sec Loss 1.0697 LearningRate 0.000010 Epoch: 36 Global Step: 754910 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:13:55,468-Speed 2495.75 samples/sec Loss 1.0972 LearningRate 0.000010 Epoch: 36 Global Step: 754920 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:14:03,620-Speed 2513.02 samples/sec Loss 1.1318 LearningRate 0.000010 Epoch: 36 Global Step: 754930 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:14:11,827-Speed 2495.76 samples/sec Loss 1.0788 LearningRate 0.000010 Epoch: 36 Global Step: 754940 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:14:20,030-Speed 2496.73 samples/sec Loss 1.1047 LearningRate 0.000010 Epoch: 36 Global Step: 754950 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:14:28,232-Speed 2497.34 samples/sec Loss 1.1025 LearningRate 0.000010 Epoch: 36 Global Step: 754960 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:14:36,451-Speed 2492.13 samples/sec Loss 1.1013 LearningRate 0.000010 Epoch: 36 Global Step: 754970 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:14:44,653-Speed 2497.34 samples/sec Loss 1.0996 LearningRate 0.000010 Epoch: 36 Global Step: 754980 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:14:52,814-Speed 2509.92 samples/sec Loss 1.1029 LearningRate 0.000010 Epoch: 36 Global Step: 754990 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:15:01,018-Speed 2496.75 samples/sec Loss 1.0896 LearningRate 0.000010 Epoch: 36 Global Step: 755000 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:15:09,220-Speed 2497.48 samples/sec Loss 1.0984 LearningRate 0.000010 Epoch: 36 Global Step: 755010 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:15:17,420-Speed 2497.70 samples/sec Loss 1.1137 LearningRate 0.000010 Epoch: 36 Global Step: 755020 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:15:25,623-Speed 2496.88 samples/sec Loss 1.1095 LearningRate 0.000010 Epoch: 36 Global Step: 755030 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:15:33,829-Speed 2496.31 samples/sec Loss 1.0968 LearningRate 0.000010 Epoch: 36 Global Step: 755040 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:15:41,977-Speed 2513.89 samples/sec Loss 1.1128 LearningRate 0.000010 Epoch: 36 Global Step: 755050 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:15:50,178-Speed 2497.72 samples/sec Loss 1.0890 LearningRate 0.000010 Epoch: 36 Global Step: 755060 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:15:58,381-Speed 2497.02 samples/sec Loss 1.1044 LearningRate 0.000010 Epoch: 36 Global Step: 755070 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:16:06,582-Speed 2497.66 samples/sec Loss 1.1325 LearningRate 0.000010 Epoch: 36 Global Step: 755080 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:16:14,785-Speed 2497.13 samples/sec Loss 1.1061 LearningRate 0.000010 Epoch: 36 Global Step: 755090 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:16:22,991-Speed 2496.07 samples/sec Loss 1.1244 LearningRate 0.000010 Epoch: 36 Global Step: 755100 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:16:31,139-Speed 2514.08 samples/sec Loss 1.0884 LearningRate 0.000010 Epoch: 36 Global Step: 755110 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:16:39,339-Speed 2497.96 samples/sec Loss 1.1272 LearningRate 0.000010 Epoch: 36 Global Step: 755120 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:16:47,547-Speed 2495.25 samples/sec Loss 1.1066 LearningRate 0.000010 Epoch: 36 Global Step: 755130 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:16:55,747-Speed 2498.10 samples/sec Loss 1.0626 LearningRate 0.000010 Epoch: 36 Global Step: 755140 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:17:03,953-Speed 2495.93 samples/sec Loss 1.1215 LearningRate 0.000010 Epoch: 36 Global Step: 755150 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:17:12,157-Speed 2496.85 samples/sec Loss 1.1077 LearningRate 0.000010 Epoch: 36 Global Step: 755160 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:17:20,310-Speed 2512.31 samples/sec Loss 1.1253 LearningRate 0.000010 Epoch: 36 Global Step: 755170 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:17:28,517-Speed 2496.07 samples/sec Loss 1.0961 LearningRate 0.000010 Epoch: 36 Global Step: 755180 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:17:36,719-Speed 2497.48 samples/sec Loss 1.1068 LearningRate 0.000010 Epoch: 36 Global Step: 755190 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:17:44,924-Speed 2496.45 samples/sec Loss 1.0755 LearningRate 0.000010 Epoch: 36 Global Step: 755200 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:17:53,134-Speed 2494.92 samples/sec Loss 1.1117 LearningRate 0.000010 Epoch: 36 Global Step: 755210 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:18:01,339-Speed 2496.25 samples/sec Loss 1.1408 LearningRate 0.000010 Epoch: 36 Global Step: 755220 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:18:09,490-Speed 2513.24 samples/sec Loss 1.1054 LearningRate 0.000010 Epoch: 36 Global Step: 755230 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:18:17,697-Speed 2495.81 samples/sec Loss 1.0937 LearningRate 0.000010 Epoch: 36 Global Step: 755240 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:18:25,904-Speed 2495.88 samples/sec Loss 1.1035 LearningRate 0.000010 Epoch: 36 Global Step: 755250 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:18:34,108-Speed 2496.70 samples/sec Loss 1.0972 LearningRate 0.000010 Epoch: 36 Global Step: 755260 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:18:42,323-Speed 2493.37 samples/sec Loss 1.1224 LearningRate 0.000010 Epoch: 36 Global Step: 755270 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:18:50,527-Speed 2496.57 samples/sec Loss 1.1232 LearningRate 0.000010 Epoch: 36 Global Step: 755280 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:18:58,679-Speed 2512.75 samples/sec Loss 1.0853 LearningRate 0.000010 Epoch: 36 Global Step: 755290 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:19:06,884-Speed 2496.30 samples/sec Loss 1.1182 LearningRate 0.000010 Epoch: 36 Global Step: 755300 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:19:15,087-Speed 2497.02 samples/sec Loss 1.1164 LearningRate 0.000010 Epoch: 36 Global Step: 755310 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:19:23,291-Speed 2497.04 samples/sec Loss 1.0904 LearningRate 0.000010 Epoch: 36 Global Step: 755320 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:19:31,505-Speed 2493.60 samples/sec Loss 1.0838 LearningRate 0.000010 Epoch: 36 Global Step: 755330 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:19:39,707-Speed 2497.20 samples/sec Loss 1.0524 LearningRate 0.000010 Epoch: 36 Global Step: 755340 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:19:47,856-Speed 2513.87 samples/sec Loss 1.0868 LearningRate 0.000010 Epoch: 36 Global Step: 755350 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:19:56,059-Speed 2497.20 samples/sec Loss 1.0856 LearningRate 0.000010 Epoch: 36 Global Step: 755360 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:20:04,264-Speed 2496.44 samples/sec Loss 1.0930 LearningRate 0.000010 Epoch: 36 Global Step: 755370 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:20:12,465-Speed 2497.63 samples/sec Loss 1.1009 LearningRate 0.000010 Epoch: 36 Global Step: 755380 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:20:20,670-Speed 2496.33 samples/sec Loss 1.0853 LearningRate 0.000010 Epoch: 36 Global Step: 755390 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:20:28,872-Speed 2497.22 samples/sec Loss 1.0951 LearningRate 0.000010 Epoch: 36 Global Step: 755400 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:20:37,020-Speed 2514.01 samples/sec Loss 1.1098 LearningRate 0.000010 Epoch: 36 Global Step: 755410 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:20:45,234-Speed 2493.85 samples/sec Loss 1.0856 LearningRate 0.000010 Epoch: 36 Global Step: 755420 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:20:53,446-Speed 2494.41 samples/sec Loss 1.0735 LearningRate 0.000010 Epoch: 36 Global Step: 755430 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:21:01,650-Speed 2496.78 samples/sec Loss 1.0688 LearningRate 0.000010 Epoch: 36 Global Step: 755440 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:21:09,851-Speed 2497.75 samples/sec Loss 1.0793 LearningRate 0.000010 Epoch: 36 Global Step: 755450 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:21:18,067-Speed 2493.37 samples/sec Loss 1.1049 LearningRate 0.000010 Epoch: 36 Global Step: 755460 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:21:26,221-Speed 2511.93 samples/sec Loss 1.0522 LearningRate 0.000010 Epoch: 36 Global Step: 755470 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:21:34,420-Speed 2498.43 samples/sec Loss 1.0860 LearningRate 0.000010 Epoch: 36 Global Step: 755480 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:21:42,624-Speed 2496.64 samples/sec Loss 1.0929 LearningRate 0.000010 Epoch: 36 Global Step: 755490 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:21:50,829-Speed 2496.33 samples/sec Loss 1.0882 LearningRate 0.000010 Epoch: 36 Global Step: 755500 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:21:59,033-Speed 2496.83 samples/sec Loss 1.1256 LearningRate 0.000010 Epoch: 36 Global Step: 755510 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:22:07,237-Speed 2496.81 samples/sec Loss 1.1242 LearningRate 0.000010 Epoch: 36 Global Step: 755520 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:22:15,384-Speed 2514.16 samples/sec Loss 1.1075 LearningRate 0.000010 Epoch: 36 Global Step: 755530 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:22:23,604-Speed 2491.87 samples/sec Loss 1.1230 LearningRate 0.000010 Epoch: 36 Global Step: 755540 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:22:31,806-Speed 2497.31 samples/sec Loss 1.0950 LearningRate 0.000010 Epoch: 36 Global Step: 755550 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:22:40,007-Speed 2497.78 samples/sec Loss 1.0968 LearningRate 0.000010 Epoch: 36 Global Step: 755560 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:22:48,211-Speed 2496.48 samples/sec Loss 1.0968 LearningRate 0.000010 Epoch: 36 Global Step: 755570 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:22:56,427-Speed 2493.05 samples/sec Loss 1.1002 LearningRate 0.000010 Epoch: 36 Global Step: 755580 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:23:04,579-Speed 2512.85 samples/sec Loss 1.0977 LearningRate 0.000010 Epoch: 36 Global Step: 755590 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:23:12,780-Speed 2497.65 samples/sec Loss 1.1149 LearningRate 0.000010 Epoch: 36 Global Step: 755600 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:23:20,984-Speed 2496.72 samples/sec Loss 1.0970 LearningRate 0.000010 Epoch: 36 Global Step: 755610 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:23:29,187-Speed 2497.22 samples/sec Loss 1.0705 LearningRate 0.000010 Epoch: 36 Global Step: 755620 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:23:37,388-Speed 2498.92 samples/sec Loss 1.0903 LearningRate 0.000010 Epoch: 36 Global Step: 755630 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:23:45,592-Speed 2496.70 samples/sec Loss 1.0482 LearningRate 0.000010 Epoch: 36 Global Step: 755640 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:23:53,738-Speed 2514.54 samples/sec Loss 1.0674 LearningRate 0.000010 Epoch: 36 Global Step: 755650 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:24:01,940-Speed 2497.57 samples/sec Loss 1.1314 LearningRate 0.000010 Epoch: 36 Global Step: 755660 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:24:10,140-Speed 2497.80 samples/sec Loss 1.0918 LearningRate 0.000010 Epoch: 36 Global Step: 755670 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:24:18,344-Speed 2496.76 samples/sec Loss 1.1011 LearningRate 0.000010 Epoch: 36 Global Step: 755680 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:24:26,546-Speed 2497.24 samples/sec Loss 1.0640 LearningRate 0.000010 Epoch: 36 Global Step: 755690 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:24:34,756-Speed 2495.30 samples/sec Loss 1.1256 LearningRate 0.000010 Epoch: 36 Global Step: 755700 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:24:42,907-Speed 2512.72 samples/sec Loss 1.0751 LearningRate 0.000010 Epoch: 36 Global Step: 755710 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:24:51,111-Speed 2496.79 samples/sec Loss 1.0769 LearningRate 0.000010 Epoch: 36 Global Step: 755720 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:24:59,322-Speed 2494.79 samples/sec Loss 1.0792 LearningRate 0.000010 Epoch: 36 Global Step: 755730 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:25:07,529-Speed 2496.22 samples/sec Loss 1.1132 LearningRate 0.000010 Epoch: 36 Global Step: 755740 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:25:15,732-Speed 2496.96 samples/sec Loss 1.1154 LearningRate 0.000010 Epoch: 36 Global Step: 755750 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:25:23,938-Speed 2496.58 samples/sec Loss 1.0895 LearningRate 0.000010 Epoch: 36 Global Step: 755760 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:25:32,103-Speed 2508.64 samples/sec Loss 1.1107 LearningRate 0.000010 Epoch: 36 Global Step: 755770 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:25:40,317-Speed 2493.65 samples/sec Loss 1.1231 LearningRate 0.000010 Epoch: 36 Global Step: 755780 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:25:48,520-Speed 2497.04 samples/sec Loss 1.1042 LearningRate 0.000010 Epoch: 36 Global Step: 755790 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:25:56,723-Speed 2497.25 samples/sec Loss 1.1112 LearningRate 0.000010 Epoch: 36 Global Step: 755800 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:26:04,927-Speed 2496.60 samples/sec Loss 1.0716 LearningRate 0.000010 Epoch: 36 Global Step: 755810 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:26:13,132-Speed 2496.40 samples/sec Loss 1.1170 LearningRate 0.000010 Epoch: 36 Global Step: 755820 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:26:21,279-Speed 2514.34 samples/sec Loss 1.1115 LearningRate 0.000010 Epoch: 36 Global Step: 755830 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:26:29,487-Speed 2495.48 samples/sec Loss 1.0891 LearningRate 0.000010 Epoch: 36 Global Step: 755840 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:26:37,689-Speed 2497.34 samples/sec Loss 1.0865 LearningRate 0.000010 Epoch: 36 Global Step: 755850 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:26:45,892-Speed 2497.11 samples/sec Loss 1.1054 LearningRate 0.000010 Epoch: 36 Global Step: 755860 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:26:54,093-Speed 2497.54 samples/sec Loss 1.0738 LearningRate 0.000010 Epoch: 36 Global Step: 755870 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:27:02,293-Speed 2497.89 samples/sec Loss 1.0741 LearningRate 0.000010 Epoch: 36 Global Step: 755880 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:27:10,444-Speed 2513.32 samples/sec Loss 1.0783 LearningRate 0.000010 Epoch: 36 Global Step: 755890 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:27:18,657-Speed 2493.96 samples/sec Loss 1.0712 LearningRate 0.000010 Epoch: 36 Global Step: 755900 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:27:26,859-Speed 2497.41 samples/sec Loss 1.0820 LearningRate 0.000010 Epoch: 36 Global Step: 755910 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:27:35,066-Speed 2495.93 samples/sec Loss 1.0949 LearningRate 0.000010 Epoch: 36 Global Step: 755920 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:27:43,269-Speed 2496.91 samples/sec Loss 1.1139 LearningRate 0.000010 Epoch: 36 Global Step: 755930 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:27:51,476-Speed 2495.80 samples/sec Loss 1.1218 LearningRate 0.000010 Epoch: 36 Global Step: 755940 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:27:59,625-Speed 2513.68 samples/sec Loss 1.0856 LearningRate 0.000010 Epoch: 36 Global Step: 755950 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:28:07,828-Speed 2497.08 samples/sec Loss 1.0922 LearningRate 0.000010 Epoch: 36 Global Step: 755960 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:28:16,029-Speed 2497.59 samples/sec Loss 1.0962 LearningRate 0.000010 Epoch: 36 Global Step: 755970 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:28:24,231-Speed 2497.46 samples/sec Loss 1.1019 LearningRate 0.000010 Epoch: 36 Global Step: 755980 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:28:32,433-Speed 2497.46 samples/sec Loss 1.0836 LearningRate 0.000010 Epoch: 36 Global Step: 755990 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:28:40,641-Speed 2495.24 samples/sec Loss 1.1002 LearningRate 0.000010 Epoch: 36 Global Step: 756000 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:28:48,795-Speed 2512.37 samples/sec Loss 1.0967 LearningRate 0.000010 Epoch: 36 Global Step: 756010 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:28:56,996-Speed 2497.34 samples/sec Loss 1.0995 LearningRate 0.000010 Epoch: 36 Global Step: 756020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 19:29:05,202-Speed 2496.32 samples/sec Loss 1.0930 LearningRate 0.000010 Epoch: 36 Global Step: 756030 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 19:29:13,401-Speed 2498.22 samples/sec Loss 1.0732 LearningRate 0.000010 Epoch: 36 Global Step: 756040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 19:29:21,607-Speed 2496.31 samples/sec Loss 1.0780 LearningRate 0.000010 Epoch: 36 Global Step: 756050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 19:29:29,822-Speed 2493.27 samples/sec Loss 1.0864 LearningRate 0.000010 Epoch: 36 Global Step: 756060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 19:29:37,973-Speed 2513.72 samples/sec Loss 1.0707 LearningRate 0.000010 Epoch: 36 Global Step: 756070 Fp16 Grad Scale: 32768 Required: 17 hours 
Training: 2022-07-12 19:29:46,182-Speed 2495.28 samples/sec Loss 1.0791 LearningRate 0.000010 Epoch: 36 Global Step: 756080 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 19:29:54,390-Speed 2495.26 samples/sec Loss 1.0846 LearningRate 0.000010 Epoch: 36 Global Step: 756090 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 19:30:02,592-Speed 2497.37 samples/sec Loss 1.0974 LearningRate 0.000010 Epoch: 36 Global Step: 756100 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 19:30:10,811-Speed 2492.16 samples/sec Loss 1.1224 LearningRate 0.000010 Epoch: 36 Global Step: 756110 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 19:30:19,014-Speed 2497.21 samples/sec Loss 1.1091 LearningRate 0.000010 Epoch: 36 Global Step: 756120 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 19:30:27,170-Speed 2511.23 samples/sec Loss 1.0938 LearningRate 0.000010 Epoch: 36 Global Step: 756130 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 19:30:35,373-Speed 2497.05 samples/sec Loss 1.0924 LearningRate 0.000010 Epoch: 36 Global Step: 756140 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 19:30:43,579-Speed 2496.48 samples/sec Loss 1.1165 LearningRate 0.000010 Epoch: 36 Global Step: 756150 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-07-12 19:30:51,742-Speed 2509.22 samples/sec Loss 1.1015 LearningRate 0.000010 Epoch: 36 Global Step: 756160 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:30:59,946-Speed 2496.82 samples/sec Loss 1.1120 LearningRate 0.000010 Epoch: 36 Global Step: 756170 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:31:08,162-Speed 2493.32 samples/sec Loss 1.1037 LearningRate 0.000010 Epoch: 36 Global Step: 756180 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:31:16,314-Speed 2512.79 samples/sec Loss 1.1041 LearningRate 0.000010 Epoch: 36 Global Step: 756190 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:31:24,517-Speed 2497.07 samples/sec Loss 1.1061 LearningRate 0.000010 Epoch: 36 Global Step: 756200 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:31:32,722-Speed 2496.59 samples/sec Loss 1.0990 LearningRate 0.000010 Epoch: 36 Global Step: 756210 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:31:40,930-Speed 2495.46 samples/sec Loss 1.1153 LearningRate 0.000010 Epoch: 36 Global Step: 756220 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:31:49,131-Speed 2497.52 samples/sec Loss 1.0800 LearningRate 0.000010 Epoch: 36 Global Step: 756230 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:31:57,335-Speed 2496.78 samples/sec Loss 1.0968 LearningRate 0.000010 Epoch: 36 Global Step: 756240 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:32:05,490-Speed 2511.82 samples/sec Loss 1.1021 LearningRate 0.000010 Epoch: 36 Global Step: 756250 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:32:13,694-Speed 2496.71 samples/sec Loss 1.0710 LearningRate 0.000010 Epoch: 36 Global Step: 756260 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:32:21,898-Speed 2496.93 samples/sec Loss 1.0854 LearningRate 0.000010 Epoch: 36 Global Step: 756270 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:32:30,112-Speed 2493.72 samples/sec Loss 1.0927 LearningRate 0.000010 Epoch: 36 Global Step: 756280 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:32:38,315-Speed 2497.06 samples/sec Loss 1.0802 LearningRate 0.000010 Epoch: 36 Global Step: 756290 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:32:46,518-Speed 2497.03 samples/sec Loss 1.1054 LearningRate 0.000010 Epoch: 36 Global Step: 756300 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:32:54,669-Speed 2513.08 samples/sec Loss 1.0942 LearningRate 0.000010 Epoch: 36 Global Step: 756310 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:33:02,876-Speed 2495.84 samples/sec Loss 1.0952 LearningRate 0.000010 Epoch: 36 Global Step: 756320 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:33:11,081-Speed 2496.17 samples/sec Loss 1.1079 LearningRate 0.000010 Epoch: 36 Global Step: 756330 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:33:19,285-Speed 2497.29 samples/sec Loss 1.0868 LearningRate 0.000010 Epoch: 36 Global Step: 756340 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:33:27,489-Speed 2496.58 samples/sec Loss 1.1223 LearningRate 0.000010 Epoch: 36 Global Step: 756350 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:33:35,692-Speed 2497.18 samples/sec Loss 1.1053 LearningRate 0.000010 Epoch: 36 Global Step: 756360 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:33:43,841-Speed 2513.53 samples/sec Loss 1.0909 LearningRate 0.000010 Epoch: 36 Global Step: 756370 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:33:52,059-Speed 2492.64 samples/sec Loss 1.1012 LearningRate 0.000010 Epoch: 36 Global Step: 756380 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:34:00,263-Speed 2496.57 samples/sec Loss 1.1275 LearningRate 0.000010 Epoch: 36 Global Step: 756390 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:34:08,469-Speed 2496.10 samples/sec Loss 1.0950 LearningRate 0.000010 Epoch: 36 Global Step: 756400 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:34:16,672-Speed 2496.94 samples/sec Loss 1.0988 LearningRate 0.000010 Epoch: 36 Global Step: 756410 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:34:24,877-Speed 2496.46 samples/sec Loss 1.1078 LearningRate 0.000010 Epoch: 36 Global Step: 756420 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:34:33,028-Speed 2512.86 samples/sec Loss 1.0847 LearningRate 0.000010 Epoch: 36 Global Step: 756430 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:34:41,231-Speed 2497.67 samples/sec Loss 1.0829 LearningRate 0.000010 Epoch: 36 Global Step: 756440 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:34:49,435-Speed 2496.93 samples/sec Loss 1.1061 LearningRate 0.000010 Epoch: 36 Global Step: 756450 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:34:57,637-Speed 2497.57 samples/sec Loss 1.0718 LearningRate 0.000010 Epoch: 36 Global Step: 756460 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:35:05,839-Speed 2497.51 samples/sec Loss 1.0920 LearningRate 0.000010 Epoch: 36 Global Step: 756470 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:35:14,043-Speed 2496.69 samples/sec Loss 1.0948 LearningRate 0.000010 Epoch: 36 Global Step: 756480 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:35:22,191-Speed 2513.81 samples/sec Loss 1.1094 LearningRate 0.000010 Epoch: 36 Global Step: 756490 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:35:30,395-Speed 2496.99 samples/sec Loss 1.0596 LearningRate 0.000010 Epoch: 36 Global Step: 756500 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:35:38,600-Speed 2496.19 samples/sec Loss 1.0833 LearningRate 0.000010 Epoch: 36 Global Step: 756510 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:35:46,806-Speed 2496.33 samples/sec Loss 1.1247 LearningRate 0.000010 Epoch: 36 Global Step: 756520 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:35:55,010-Speed 2497.07 samples/sec Loss 1.0776 LearningRate 0.000010 Epoch: 36 Global Step: 756530 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:36:03,222-Speed 2494.38 samples/sec Loss 1.0858 LearningRate 0.000010 Epoch: 36 Global Step: 756540 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:36:11,376-Speed 2512.27 samples/sec Loss 1.1044 LearningRate 0.000010 Epoch: 36 Global Step: 756550 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:36:19,581-Speed 2496.24 samples/sec Loss 1.0990 LearningRate 0.000010 Epoch: 36 Global Step: 756560 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:36:27,787-Speed 2496.34 samples/sec Loss 1.1032 LearningRate 0.000010 Epoch: 36 Global Step: 756570 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:36:35,993-Speed 2496.10 samples/sec Loss 1.1093 LearningRate 0.000010 Epoch: 36 Global Step: 756580 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:36:44,199-Speed 2496.24 samples/sec Loss 1.1136 LearningRate 0.000010 Epoch: 36 Global Step: 756590 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:36:52,403-Speed 2496.75 samples/sec Loss 1.1288 LearningRate 0.000010 Epoch: 36 Global Step: 756600 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:37:00,553-Speed 2513.27 samples/sec Loss 1.0965 LearningRate 0.000010 Epoch: 36 Global Step: 756610 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:37:08,755-Speed 2497.56 samples/sec Loss 1.1245 LearningRate 0.000010 Epoch: 36 Global Step: 756620 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:37:16,957-Speed 2497.13 samples/sec Loss 1.1155 LearningRate 0.000010 Epoch: 36 Global Step: 756630 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:37:25,165-Speed 2495.64 samples/sec Loss 1.0595 LearningRate 0.000010 Epoch: 36 Global Step: 756640 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:37:33,369-Speed 2496.81 samples/sec Loss 1.0807 LearningRate 0.000010 Epoch: 36 Global Step: 756650 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:37:41,574-Speed 2496.23 samples/sec Loss 1.0902 LearningRate 0.000010 Epoch: 36 Global Step: 756660 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:37:49,732-Speed 2511.10 samples/sec Loss 1.0887 LearningRate 0.000010 Epoch: 36 Global Step: 756670 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:37:57,936-Speed 2496.48 samples/sec Loss 1.1306 LearningRate 0.000010 Epoch: 36 Global Step: 756680 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:38:06,147-Speed 2494.77 samples/sec Loss 1.0815 LearningRate 0.000010 Epoch: 36 Global Step: 756690 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:38:14,351-Speed 2496.75 samples/sec Loss 1.0873 LearningRate 0.000010 Epoch: 36 Global Step: 756700 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:38:22,551-Speed 2497.99 samples/sec Loss 1.1085 LearningRate 0.000010 Epoch: 36 Global Step: 756710 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:38:30,757-Speed 2496.04 samples/sec Loss 1.1119 LearningRate 0.000010 Epoch: 36 Global Step: 756720 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:38:38,908-Speed 2513.36 samples/sec Loss 1.0882 LearningRate 0.000010 Epoch: 36 Global Step: 756730 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:38:47,112-Speed 2496.84 samples/sec Loss 1.1004 LearningRate 0.000010 Epoch: 36 Global Step: 756740 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:38:55,318-Speed 2496.03 samples/sec Loss 1.0754 LearningRate 0.000010 Epoch: 36 Global Step: 756750 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:39:03,523-Speed 2496.64 samples/sec Loss 1.1061 LearningRate 0.000010 Epoch: 36 Global Step: 756760 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:39:11,738-Speed 2493.56 samples/sec Loss 1.1282 LearningRate 0.000010 Epoch: 36 Global Step: 756770 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:39:19,946-Speed 2495.46 samples/sec Loss 1.0709 LearningRate 0.000010 Epoch: 36 Global Step: 756780 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:39:28,098-Speed 2512.79 samples/sec Loss 1.0824 LearningRate 0.000010 Epoch: 36 Global Step: 756790 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:39:36,302-Speed 2496.74 samples/sec Loss 1.1158 LearningRate 0.000009 Epoch: 36 Global Step: 756800 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:39:44,505-Speed 2497.23 samples/sec Loss 1.1364 LearningRate 0.000009 Epoch: 36 Global Step: 756810 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:39:52,713-Speed 2495.51 samples/sec Loss 1.1114 LearningRate 0.000009 Epoch: 36 Global Step: 756820 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:40:00,920-Speed 2495.80 samples/sec Loss 1.1266 LearningRate 0.000009 Epoch: 36 Global Step: 756830 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:40:09,125-Speed 2496.38 samples/sec Loss 1.0861 LearningRate 0.000009 Epoch: 36 Global Step: 756840 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:40:17,279-Speed 2512.07 samples/sec Loss 1.0797 LearningRate 0.000009 Epoch: 36 Global Step: 756850 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:40:25,484-Speed 2496.25 samples/sec Loss 1.0985 LearningRate 0.000009 Epoch: 36 Global Step: 756860 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:40:33,690-Speed 2496.01 samples/sec Loss 1.1257 LearningRate 0.000009 Epoch: 36 Global Step: 756870 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:40:41,900-Speed 2495.02 samples/sec Loss 1.0928 LearningRate 0.000009 Epoch: 36 Global Step: 756880 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:40:50,108-Speed 2495.52 samples/sec Loss 1.0797 LearningRate 0.000009 Epoch: 36 Global Step: 756890 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:40:58,315-Speed 2495.88 samples/sec Loss 1.1239 LearningRate 0.000009 Epoch: 36 Global Step: 756900 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:41:06,469-Speed 2512.10 samples/sec Loss 1.1028 LearningRate 0.000009 Epoch: 36 Global Step: 756910 Fp16 Grad Scale: 16384 Required: 17 hours 
Training: 2022-07-12 19:41:14,672-Speed 2496.78 samples/sec Loss 1.0560 LearningRate 0.000009 Epoch: 36 Global Step: 756920 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:41:22,877-Speed 2496.57 samples/sec Loss 1.0995 LearningRate 0.000009 Epoch: 36 Global Step: 756930 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:41:31,077-Speed 2497.97 samples/sec Loss 1.0649 LearningRate 0.000009 Epoch: 36 Global Step: 756940 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:41:39,280-Speed 2496.97 samples/sec Loss 1.0959 LearningRate 0.000009 Epoch: 36 Global Step: 756950 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:41:47,482-Speed 2497.45 samples/sec Loss 1.0965 LearningRate 0.000009 Epoch: 36 Global Step: 756960 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:41:55,628-Speed 2514.41 samples/sec Loss 1.1079 LearningRate 0.000009 Epoch: 36 Global Step: 756970 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:42:03,833-Speed 2496.42 samples/sec Loss 1.0890 LearningRate 0.000009 Epoch: 36 Global Step: 756980 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-07-12 19:42:11,995-Speed 2509.71 samples/sec Loss 1.0981 LearningRate 0.000009 Epoch: 36 Global Step: 756990 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:42:20,201-Speed 2495.97 samples/sec Loss 1.1052 LearningRate 0.000009 Epoch: 36 Global Step: 757000 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:42:28,408-Speed 2495.92 samples/sec Loss 1.0863 LearningRate 0.000009 Epoch: 36 Global Step: 757010 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:42:36,614-Speed 2496.22 samples/sec Loss 1.0726 LearningRate 0.000009 Epoch: 36 Global Step: 757020 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:42:44,767-Speed 2513.03 samples/sec Loss 1.1126 LearningRate 0.000009 Epoch: 36 Global Step: 757030 Fp16 Grad Scale: 8192 Required: 17 hours 
Training: 2022-07-12 19:42:52,966-Speed 2498.45 samples/sec Loss 1.0788 LearningRate 0.000009 Epoch: 36 Global Step: 757040 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:43:01,170-Speed 2496.71 samples/sec Loss 1.0900 LearningRate 0.000009 Epoch: 36 Global Step: 757050 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:43:09,376-Speed 2496.15 samples/sec Loss 1.0576 LearningRate 0.000009 Epoch: 36 Global Step: 757060 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:43:17,578-Speed 2497.67 samples/sec Loss 1.0910 LearningRate 0.000009 Epoch: 36 Global Step: 757070 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:43:25,786-Speed 2495.42 samples/sec Loss 1.1060 LearningRate 0.000009 Epoch: 36 Global Step: 757080 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:43:33,939-Speed 2512.24 samples/sec Loss 1.0985 LearningRate 0.000009 Epoch: 36 Global Step: 757090 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:43:42,142-Speed 2497.11 samples/sec Loss 1.1043 LearningRate 0.000009 Epoch: 36 Global Step: 757100 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:43:50,343-Speed 2497.73 samples/sec Loss 1.1210 LearningRate 0.000009 Epoch: 36 Global Step: 757110 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:43:58,547-Speed 2496.67 samples/sec Loss 1.0926 LearningRate 0.000009 Epoch: 36 Global Step: 757120 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:44:06,771-Speed 2490.60 samples/sec Loss 1.0880 LearningRate 0.000009 Epoch: 36 Global Step: 757130 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:44:14,977-Speed 2496.31 samples/sec Loss 1.0831 LearningRate 0.000009 Epoch: 36 Global Step: 757140 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:44:23,137-Speed 2510.00 samples/sec Loss 1.0766 LearningRate 0.000009 Epoch: 36 Global Step: 757150 Fp16 Grad Scale: 8192 Required: 17 hours 
Training: 2022-07-12 19:44:31,338-Speed 2497.65 samples/sec Loss 1.0911 LearningRate 0.000009 Epoch: 36 Global Step: 757160 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:44:39,542-Speed 2497.00 samples/sec Loss 1.1049 LearningRate 0.000009 Epoch: 36 Global Step: 757170 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:44:47,745-Speed 2496.79 samples/sec Loss 1.0996 LearningRate 0.000009 Epoch: 36 Global Step: 757180 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:44:55,948-Speed 2497.25 samples/sec Loss 1.0873 LearningRate 0.000009 Epoch: 36 Global Step: 757190 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:45:04,157-Speed 2495.27 samples/sec Loss 1.0917 LearningRate 0.000009 Epoch: 36 Global Step: 757200 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:45:12,317-Speed 2510.27 samples/sec Loss 1.1096 LearningRate 0.000009 Epoch: 36 Global Step: 757210 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:45:20,522-Speed 2496.25 samples/sec Loss 1.0742 LearningRate 0.000009 Epoch: 36 Global Step: 757220 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:45:28,726-Speed 2496.62 samples/sec Loss 1.0937 LearningRate 0.000009 Epoch: 36 Global Step: 757230 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:45:36,929-Speed 2497.19 samples/sec Loss 1.0713 LearningRate 0.000009 Epoch: 36 Global Step: 757240 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:45:45,131-Speed 2497.15 samples/sec Loss 1.0968 LearningRate 0.000009 Epoch: 36 Global Step: 757250 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:45:53,334-Speed 2496.90 samples/sec Loss 1.0932 LearningRate 0.000009 Epoch: 36 Global Step: 757260 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:46:01,486-Speed 2512.79 samples/sec Loss 1.0714 LearningRate 0.000009 Epoch: 36 Global Step: 757270 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:46:09,695-Speed 
2495.49 samples/sec Loss 1.0975 LearningRate 0.000009 Epoch: 36 Global Step: 757280 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:46:17,898-Speed 2496.89 samples/sec Loss 1.0924 LearningRate 0.000009 Epoch: 36 Global Step: 757290 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:46:26,103-Speed 2496.61 samples/sec Loss 1.1078 LearningRate 0.000009 Epoch: 36 Global Step: 757300 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:46:34,305-Speed 2497.49 samples/sec Loss 1.1034 LearningRate 0.000009 Epoch: 36 Global Step: 757310 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:46:42,508-Speed 2496.88 samples/sec Loss 1.1133 LearningRate 0.000009 Epoch: 36 Global Step: 757320 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:46:50,656-Speed 2513.86 samples/sec Loss 1.1019 LearningRate 0.000009 Epoch: 36 Global Step: 757330 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:46:58,866-Speed 2495.55 samples/sec Loss 1.1055 LearningRate 0.000009 Epoch: 36 Global Step: 757340 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:47:07,076-Speed 2495.09 samples/sec Loss 1.0787 LearningRate 0.000009 Epoch: 36 Global Step: 757350 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:47:15,284-Speed 2495.44 samples/sec Loss 1.0974 LearningRate 0.000009 Epoch: 36 Global Step: 757360 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:47:23,489-Speed 2496.50 samples/sec Loss 1.0788 LearningRate 0.000009 Epoch: 36 Global Step: 757370 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:47:31,694-Speed 2496.94 samples/sec Loss 1.1154 LearningRate 0.000009 Epoch: 36 Global Step: 757380 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:47:39,844-Speed 2513.38 samples/sec Loss 1.0887 LearningRate 0.000009 Epoch: 36 Global Step: 757390 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:47:48,046-Speed 2497.25 samples/sec 
Loss 1.1059 LearningRate 0.000009 Epoch: 36 Global Step: 757400 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:47:56,252-Speed 2496.09 samples/sec Loss 1.0804 LearningRate 0.000009 Epoch: 36 Global Step: 757410 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:48:04,459-Speed 2495.81 samples/sec Loss 1.1128 LearningRate 0.000009 Epoch: 36 Global Step: 757420 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:48:12,677-Speed 2493.48 samples/sec Loss 1.0697 LearningRate 0.000009 Epoch: 36 Global Step: 757430 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:48:20,889-Speed 2494.39 samples/sec Loss 1.0487 LearningRate 0.000009 Epoch: 36 Global Step: 757440 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:48:29,036-Speed 2514.00 samples/sec Loss 1.0940 LearningRate 0.000009 Epoch: 36 Global Step: 757450 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:48:37,247-Speed 2494.88 samples/sec Loss 1.1027 LearningRate 0.000009 Epoch: 36 Global Step: 757460 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-07-12 19:48:45,453-Speed 2495.86 samples/sec Loss 1.1105 LearningRate 0.000009 Epoch: 36 Global Step: 757470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:48:53,657-Speed 2496.94 samples/sec Loss 1.0789 LearningRate 0.000009 Epoch: 36 Global Step: 757480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:49:01,861-Speed 2496.56 samples/sec Loss 1.1106 LearningRate 0.000009 Epoch: 36 Global Step: 757490 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:49:10,067-Speed 2496.17 samples/sec Loss 1.1494 LearningRate 0.000009 Epoch: 36 Global Step: 757500 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:49:18,220-Speed 2512.48 samples/sec Loss 1.1093 LearningRate 0.000009 Epoch: 36 Global Step: 757510 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:49:26,427-Speed 2495.79 samples/sec Loss 1.1057 
LearningRate 0.000009 Epoch: 36 Global Step: 757520 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:49:34,631-Speed 2496.54 samples/sec Loss 1.1349 LearningRate 0.000009 Epoch: 36 Global Step: 757530 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:49:42,839-Speed 2495.59 samples/sec Loss 1.0994 LearningRate 0.000009 Epoch: 36 Global Step: 757540 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:49:51,042-Speed 2496.95 samples/sec Loss 1.0871 LearningRate 0.000009 Epoch: 36 Global Step: 757550 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:49:59,245-Speed 2497.08 samples/sec Loss 1.0801 LearningRate 0.000009 Epoch: 36 Global Step: 757560 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:50:07,403-Speed 2510.65 samples/sec Loss 1.0813 LearningRate 0.000009 Epoch: 36 Global Step: 757570 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:50:15,605-Speed 2497.39 samples/sec Loss 1.1217 LearningRate 0.000009 Epoch: 36 Global Step: 757580 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:50:23,809-Speed 2496.77 samples/sec Loss 1.0996 LearningRate 0.000009 Epoch: 36 Global Step: 757590 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:50:32,016-Speed 2495.76 samples/sec Loss 1.1092 LearningRate 0.000009 Epoch: 36 Global Step: 757600 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:50:40,219-Speed 2497.09 samples/sec Loss 1.0821 LearningRate 0.000009 Epoch: 36 Global Step: 757610 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:50:48,424-Speed 2496.60 samples/sec Loss 1.0690 LearningRate 0.000009 Epoch: 36 Global Step: 757620 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:50:56,572-Speed 2513.70 samples/sec Loss 1.0719 LearningRate 0.000009 Epoch: 36 Global Step: 757630 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:51:04,776-Speed 2496.71 samples/sec Loss 1.1205 LearningRate 
0.000009 Epoch: 36 Global Step: 757640 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:51:12,981-Speed 2496.47 samples/sec Loss 1.0636 LearningRate 0.000009 Epoch: 36 Global Step: 757650 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:51:21,196-Speed 2493.57 samples/sec Loss 1.0825 LearningRate 0.000009 Epoch: 36 Global Step: 757660 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:51:29,399-Speed 2496.88 samples/sec Loss 1.0812 LearningRate 0.000009 Epoch: 36 Global Step: 757670 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:51:37,605-Speed 2496.39 samples/sec Loss 1.0602 LearningRate 0.000009 Epoch: 36 Global Step: 757680 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:51:45,770-Speed 2508.61 samples/sec Loss 1.0646 LearningRate 0.000009 Epoch: 36 Global Step: 757690 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:51:53,972-Speed 2497.29 samples/sec Loss 1.0900 LearningRate 0.000009 Epoch: 36 Global Step: 757700 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:52:02,177-Speed 2496.43 samples/sec Loss 1.0935 LearningRate 0.000009 Epoch: 36 Global Step: 757710 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:52:10,381-Speed 2496.84 samples/sec Loss 1.1030 LearningRate 0.000009 Epoch: 36 Global Step: 757720 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:52:18,589-Speed 2495.96 samples/sec Loss 1.1184 LearningRate 0.000009 Epoch: 36 Global Step: 757730 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:52:26,793-Speed 2496.64 samples/sec Loss 1.0850 LearningRate 0.000009 Epoch: 36 Global Step: 757740 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:52:34,957-Speed 2509.02 samples/sec Loss 1.1121 LearningRate 0.000009 Epoch: 36 Global Step: 757750 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:52:43,164-Speed 2495.83 samples/sec Loss 1.0627 LearningRate 0.000009 Epoch: 36 
Global Step: 757760 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:52:51,367-Speed 2497.29 samples/sec Loss 1.0842 LearningRate 0.000009 Epoch: 36 Global Step: 757770 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:52:59,584-Speed 2492.72 samples/sec Loss 1.1078 LearningRate 0.000009 Epoch: 36 Global Step: 757780 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:53:07,797-Speed 2493.71 samples/sec Loss 1.0661 LearningRate 0.000009 Epoch: 36 Global Step: 757790 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:53:16,001-Speed 2497.17 samples/sec Loss 1.1105 LearningRate 0.000009 Epoch: 36 Global Step: 757800 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:53:24,151-Speed 2513.34 samples/sec Loss 1.1245 LearningRate 0.000009 Epoch: 36 Global Step: 757810 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:53:32,352-Speed 2497.54 samples/sec Loss 1.0953 LearningRate 0.000009 Epoch: 36 Global Step: 757820 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:53:40,559-Speed 2495.91 samples/sec Loss 1.0720 LearningRate 0.000009 Epoch: 36 Global Step: 757830 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:53:48,764-Speed 2496.60 samples/sec Loss 1.0787 LearningRate 0.000009 Epoch: 36 Global Step: 757840 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:53:56,971-Speed 2495.98 samples/sec Loss 1.1139 LearningRate 0.000009 Epoch: 36 Global Step: 757850 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:54:05,174-Speed 2497.08 samples/sec Loss 1.0877 LearningRate 0.000009 Epoch: 36 Global Step: 757860 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:54:13,321-Speed 2513.90 samples/sec Loss 1.1051 LearningRate 0.000009 Epoch: 36 Global Step: 757870 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:54:21,527-Speed 2496.21 samples/sec Loss 1.0977 LearningRate 0.000009 Epoch: 36 Global Step: 757880 
Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:54:29,729-Speed 2497.28 samples/sec Loss 1.1097 LearningRate 0.000009 Epoch: 36 Global Step: 757890 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:54:37,929-Speed 2498.02 samples/sec Loss 1.0937 LearningRate 0.000009 Epoch: 36 Global Step: 757900 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:54:46,131-Speed 2497.51 samples/sec Loss 1.1131 LearningRate 0.000009 Epoch: 36 Global Step: 757910 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:54:54,332-Speed 2497.48 samples/sec Loss 1.1150 LearningRate 0.000009 Epoch: 36 Global Step: 757920 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:55:02,485-Speed 2512.32 samples/sec Loss 1.0870 LearningRate 0.000009 Epoch: 36 Global Step: 757930 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:55:10,688-Speed 2496.87 samples/sec Loss 1.1369 LearningRate 0.000009 Epoch: 36 Global Step: 757940 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:55:18,893-Speed 2496.49 samples/sec Loss 1.0821 LearningRate 0.000009 Epoch: 36 Global Step: 757950 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:55:27,096-Speed 2497.35 samples/sec Loss 1.0926 LearningRate 0.000009 Epoch: 36 Global Step: 757960 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:55:35,305-Speed 2494.92 samples/sec Loss 1.0723 LearningRate 0.000009 Epoch: 36 Global Step: 757970 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:55:43,512-Speed 2496.02 samples/sec Loss 1.0871 LearningRate 0.000009 Epoch: 36 Global Step: 757980 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:55:51,658-Speed 2514.48 samples/sec Loss 1.0799 LearningRate 0.000009 Epoch: 36 Global Step: 757990 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:55:59,869-Speed 2494.88 samples/sec Loss 1.0794 LearningRate 0.000009 Epoch: 36 Global Step: 758000 Fp16 Grad Scale: 
8192 Required: 16 hours Training: 2022-07-12 19:56:08,072-Speed 2496.77 samples/sec Loss 1.0923 LearningRate 0.000009 Epoch: 36 Global Step: 758010 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:56:16,290-Speed 2492.74 samples/sec Loss 1.1333 LearningRate 0.000009 Epoch: 36 Global Step: 758020 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:56:24,496-Speed 2496.30 samples/sec Loss 1.0964 LearningRate 0.000009 Epoch: 36 Global Step: 758030 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:56:32,716-Speed 2491.76 samples/sec Loss 1.0950 LearningRate 0.000009 Epoch: 36 Global Step: 758040 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:56:40,867-Speed 2513.18 samples/sec Loss 1.1075 LearningRate 0.000009 Epoch: 36 Global Step: 758050 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:56:49,071-Speed 2496.71 samples/sec Loss 1.1154 LearningRate 0.000009 Epoch: 36 Global Step: 758060 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:56:57,280-Speed 2495.15 samples/sec Loss 1.0938 LearningRate 0.000009 Epoch: 36 Global Step: 758070 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:57:05,490-Speed 2494.85 samples/sec Loss 1.1059 LearningRate 0.000009 Epoch: 36 Global Step: 758080 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:57:13,693-Speed 2497.04 samples/sec Loss 1.0881 LearningRate 0.000009 Epoch: 36 Global Step: 758090 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:57:21,896-Speed 2497.10 samples/sec Loss 1.1050 LearningRate 0.000009 Epoch: 36 Global Step: 758100 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:57:30,046-Speed 2513.05 samples/sec Loss 1.0897 LearningRate 0.000009 Epoch: 36 Global Step: 758110 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:57:38,248-Speed 2497.72 samples/sec Loss 1.1111 LearningRate 0.000009 Epoch: 36 Global Step: 758120 Fp16 Grad Scale: 8192 Required: 16 
hours Training: 2022-07-12 19:57:46,450-Speed 2497.13 samples/sec Loss 1.0696 LearningRate 0.000009 Epoch: 36 Global Step: 758130 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:57:54,653-Speed 2497.16 samples/sec Loss 1.0794 LearningRate 0.000009 Epoch: 36 Global Step: 758140 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:58:02,864-Speed 2494.43 samples/sec Loss 1.0774 LearningRate 0.000009 Epoch: 36 Global Step: 758150 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:58:11,069-Speed 2496.40 samples/sec Loss 1.0893 LearningRate 0.000009 Epoch: 36 Global Step: 758160 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:58:19,216-Speed 2514.20 samples/sec Loss 1.0986 LearningRate 0.000009 Epoch: 36 Global Step: 758170 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:58:27,420-Speed 2496.65 samples/sec Loss 1.0815 LearningRate 0.000009 Epoch: 36 Global Step: 758180 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 19:58:35,626-Speed 2496.29 samples/sec Loss 1.1251 LearningRate 0.000009 Epoch: 36 Global Step: 758190 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 19:58:43,828-Speed 2497.11 samples/sec Loss 1.0808 LearningRate 0.000009 Epoch: 36 Global Step: 758200 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 19:58:52,033-Speed 2496.43 samples/sec Loss 1.1148 LearningRate 0.000009 Epoch: 36 Global Step: 758210 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 19:59:00,234-Speed 2497.82 samples/sec Loss 1.0773 LearningRate 0.000009 Epoch: 36 Global Step: 758220 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 19:59:08,385-Speed 2513.13 samples/sec Loss 1.0927 LearningRate 0.000009 Epoch: 36 Global Step: 758230 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 19:59:16,586-Speed 2497.50 samples/sec Loss 1.0627 LearningRate 0.000009 Epoch: 36 Global Step: 758240 Fp16 Grad Scale: 16384 Required: 16 hours 
Training: 2022-07-12 19:59:24,788-Speed 2497.25 samples/sec Loss 1.1080 LearningRate 0.000009 Epoch: 36 Global Step: 758250 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 19:59:32,995-Speed 2495.98 samples/sec Loss 1.0840 LearningRate 0.000009 Epoch: 36 Global Step: 758260 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 19:59:41,203-Speed 2495.45 samples/sec Loss 1.1192 LearningRate 0.000009 Epoch: 36 Global Step: 758270 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 19:59:49,412-Speed 2495.30 samples/sec Loss 1.0672 LearningRate 0.000009 Epoch: 36 Global Step: 758280 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 19:59:57,565-Speed 2512.53 samples/sec Loss 1.0907 LearningRate 0.000009 Epoch: 36 Global Step: 758290 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:00:05,769-Speed 2496.70 samples/sec Loss 1.1049 LearningRate 0.000009 Epoch: 36 Global Step: 758300 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:00:13,979-Speed 2494.91 samples/sec Loss 1.0787 LearningRate 0.000009 Epoch: 36 Global Step: 758310 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:00:22,137-Speed 2510.87 samples/sec Loss 1.0738 LearningRate 0.000009 Epoch: 36 Global Step: 758320 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:00:30,340-Speed 2497.09 samples/sec Loss 1.1131 LearningRate 0.000009 Epoch: 36 Global Step: 758330 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:00:38,544-Speed 2496.83 samples/sec Loss 1.0988 LearningRate 0.000009 Epoch: 36 Global Step: 758340 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:00:46,693-Speed 2513.69 samples/sec Loss 1.0985 LearningRate 0.000009 Epoch: 36 Global Step: 758350 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:00:54,895-Speed 2497.30 samples/sec Loss 1.0830 LearningRate 0.000009 Epoch: 36 Global Step: 758360 Fp16 Grad Scale: 8192 Required: 16 hours Training: 
2022-07-12 20:01:03,103-Speed 2495.52 samples/sec Loss 1.0971 LearningRate 0.000009 Epoch: 36 Global Step: 758370 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:01:11,325-Speed 2491.13 samples/sec Loss 1.0972 LearningRate 0.000009 Epoch: 36 Global Step: 758380 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:01:19,527-Speed 2497.40 samples/sec Loss 1.0612 LearningRate 0.000009 Epoch: 36 Global Step: 758390 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:01:27,734-Speed 2495.98 samples/sec Loss 1.0996 LearningRate 0.000009 Epoch: 36 Global Step: 758400 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:01:35,883-Speed 2513.67 samples/sec Loss 1.0930 LearningRate 0.000009 Epoch: 36 Global Step: 758410 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:01:44,086-Speed 2497.01 samples/sec Loss 1.0743 LearningRate 0.000009 Epoch: 36 Global Step: 758420 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:01:52,289-Speed 2497.01 samples/sec Loss 1.1182 LearningRate 0.000009 Epoch: 36 Global Step: 758430 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:02:00,500-Speed 2494.64 samples/sec Loss 1.1223 LearningRate 0.000009 Epoch: 36 Global Step: 758440 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:02:08,715-Speed 2493.46 samples/sec Loss 1.0698 LearningRate 0.000009 Epoch: 36 Global Step: 758450 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:02:16,916-Speed 2497.56 samples/sec Loss 1.0974 LearningRate 0.000009 Epoch: 36 Global Step: 758460 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:02:25,070-Speed 2512.00 samples/sec Loss 1.0957 LearningRate 0.000009 Epoch: 36 Global Step: 758470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:02:33,273-Speed 2497.30 samples/sec Loss 1.0817 LearningRate 0.000009 Epoch: 36 Global Step: 758480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 
20:02:41,477-Speed 2496.92 samples/sec Loss 1.0783 LearningRate 0.000009 Epoch: 36 Global Step: 758490 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:02:49,688-Speed 2494.76 samples/sec Loss 1.0815 LearningRate 0.000009 Epoch: 36 Global Step: 758500 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:02:57,890-Speed 2497.20 samples/sec Loss 1.0946 LearningRate 0.000009 Epoch: 36 Global Step: 758510 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:03:06,110-Speed 2492.21 samples/sec Loss 1.0846 LearningRate 0.000009 Epoch: 36 Global Step: 758520 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:03:14,272-Speed 2509.44 samples/sec Loss 1.1108 LearningRate 0.000009 Epoch: 36 Global Step: 758530 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:03:22,475-Speed 2497.15 samples/sec Loss 1.0868 LearningRate 0.000009 Epoch: 36 Global Step: 758540 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:03:30,679-Speed 2496.92 samples/sec Loss 1.0973 LearningRate 0.000009 Epoch: 36 Global Step: 758550 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:03:38,882-Speed 2496.91 samples/sec Loss 1.0857 LearningRate 0.000009 Epoch: 36 Global Step: 758560 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:03:47,084-Speed 2497.29 samples/sec Loss 1.0771 LearningRate 0.000009 Epoch: 36 Global Step: 758570 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:03:55,297-Speed 2494.20 samples/sec Loss 1.0586 LearningRate 0.000009 Epoch: 36 Global Step: 758580 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:04:03,443-Speed 2514.39 samples/sec Loss 1.1027 LearningRate 0.000009 Epoch: 36 Global Step: 758590 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:04:11,656-Speed 2494.07 samples/sec Loss 1.0515 LearningRate 0.000009 Epoch: 36 Global Step: 758600 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:04:19,856-Speed 
2497.93 samples/sec Loss 1.0860 LearningRate 0.000009 Epoch: 36 Global Step: 758610 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:04:28,060-Speed 2496.81 samples/sec Loss 1.0581 LearningRate 0.000009 Epoch: 36 Global Step: 758620 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:04:36,264-Speed 2496.71 samples/sec Loss 1.1090 LearningRate 0.000009 Epoch: 36 Global Step: 758630 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:04:44,466-Speed 2497.24 samples/sec Loss 1.0528 LearningRate 0.000009 Epoch: 36 Global Step: 758640 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:04:52,616-Speed 2514.00 samples/sec Loss 1.0757 LearningRate 0.000009 Epoch: 36 Global Step: 758650 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:05:00,822-Speed 2496.27 samples/sec Loss 1.0897 LearningRate 0.000009 Epoch: 36 Global Step: 758660 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:05:09,028-Speed 2496.10 samples/sec Loss 1.0676 LearningRate 0.000009 Epoch: 36 Global Step: 758670 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:05:17,241-Speed 2494.02 samples/sec Loss 1.0738 LearningRate 0.000009 Epoch: 36 Global Step: 758680 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:05:25,445-Speed 2496.46 samples/sec Loss 1.0637 LearningRate 0.000009 Epoch: 36 Global Step: 758690 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:05:33,649-Speed 2497.08 samples/sec Loss 1.0866 LearningRate 0.000009 Epoch: 36 Global Step: 758700 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:05:41,801-Speed 2512.67 samples/sec Loss 1.0930 LearningRate 0.000009 Epoch: 36 Global Step: 758710 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:05:50,003-Speed 2497.32 samples/sec Loss 1.1227 LearningRate 0.000009 Epoch: 36 Global Step: 758720 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:05:58,204-Speed 2497.56 samples/sec 
Loss 1.0914 LearningRate 0.000009 Epoch: 36 Global Step: 758730 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:06:06,442-Speed 2486.82 samples/sec Loss 1.1138 LearningRate 0.000009 Epoch: 36 Global Step: 758740 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:06:14,646-Speed 2496.72 samples/sec Loss 1.1012 LearningRate 0.000009 Epoch: 36 Global Step: 758750 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:06:22,847-Speed 2497.57 samples/sec Loss 1.0629 LearningRate 0.000009 Epoch: 36 Global Step: 758760 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:06:31,001-Speed 2512.04 samples/sec Loss 1.0864 LearningRate 0.000009 Epoch: 36 Global Step: 758770 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:06:39,206-Speed 2496.65 samples/sec Loss 1.0815 LearningRate 0.000009 Epoch: 36 Global Step: 758780 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:06:47,410-Speed 2496.78 samples/sec Loss 1.0842 LearningRate 0.000009 Epoch: 36 Global Step: 758790 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:06:55,613-Speed 2496.95 samples/sec Loss 1.0728 LearningRate 0.000009 Epoch: 36 Global Step: 758800 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:07:03,817-Speed 2496.76 samples/sec Loss 1.0745 LearningRate 0.000009 Epoch: 36 Global Step: 758810 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:07:12,034-Speed 2492.68 samples/sec Loss 1.0845 LearningRate 0.000009 Epoch: 36 Global Step: 758820 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:07:20,195-Speed 2509.78 samples/sec Loss 1.0864 LearningRate 0.000009 Epoch: 36 Global Step: 758830 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:07:28,400-Speed 2496.62 samples/sec Loss 1.0908 LearningRate 0.000009 Epoch: 36 Global Step: 758840 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:07:36,603-Speed 2496.93 samples/sec Loss 1.0790 
LearningRate 0.000009 Epoch: 36 Global Step: 758850 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:07:44,810-Speed 2496.17 samples/sec Loss 1.0764 LearningRate 0.000009 Epoch: 36 Global Step: 758860 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:07:53,014-Speed 2496.62 samples/sec Loss 1.0784 LearningRate 0.000009 Epoch: 36 Global Step: 758870 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:08:01,218-Speed 2496.80 samples/sec Loss 1.0591 LearningRate 0.000009 Epoch: 36 Global Step: 758880 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:08:09,365-Speed 2514.87 samples/sec Loss 1.0998 LearningRate 0.000009 Epoch: 36 Global Step: 758890 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:08:17,570-Speed 2496.54 samples/sec Loss 1.0521 LearningRate 0.000009 Epoch: 36 Global Step: 758900 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:08:25,772-Speed 2497.41 samples/sec Loss 1.0722 LearningRate 0.000009 Epoch: 36 Global Step: 758910 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:08:33,975-Speed 2496.96 samples/sec Loss 1.0990 LearningRate 0.000009 Epoch: 36 Global Step: 758920 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:08:42,177-Speed 2497.20 samples/sec Loss 1.0631 LearningRate 0.000009 Epoch: 36 Global Step: 758930 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:08:50,379-Speed 2497.74 samples/sec Loss 1.0992 LearningRate 0.000009 Epoch: 36 Global Step: 758940 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:08:58,528-Speed 2513.57 samples/sec Loss 1.0704 LearningRate 0.000009 Epoch: 36 Global Step: 758950 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:09:06,735-Speed 2495.77 samples/sec Loss 1.0883 LearningRate 0.000009 Epoch: 36 Global Step: 758960 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:09:14,950-Speed 2493.48 samples/sec Loss 1.1014 LearningRate 
0.000009 Epoch: 36 Global Step: 758970 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:09:23,154-Speed 2496.83 samples/sec Loss 1.0709 LearningRate 0.000009 Epoch: 36 Global Step: 758980 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:09:31,358-Speed 2496.48 samples/sec Loss 1.0672 LearningRate 0.000009 Epoch: 36 Global Step: 758990 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:09:39,561-Speed 2497.40 samples/sec Loss 1.0742 LearningRate 0.000009 Epoch: 36 Global Step: 759000 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:09:47,710-Speed 2513.74 samples/sec Loss 1.0850 LearningRate 0.000009 Epoch: 36 Global Step: 759010 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:09:55,912-Speed 2497.46 samples/sec Loss 1.0894 LearningRate 0.000009 Epoch: 36 Global Step: 759020 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:10:04,117-Speed 2496.36 samples/sec Loss 1.0881 LearningRate 0.000009 Epoch: 36 Global Step: 759030 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:10:12,330-Speed 2494.12 samples/sec Loss 1.1036 LearningRate 0.000009 Epoch: 36 Global Step: 759040 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:10:20,531-Speed 2497.83 samples/sec Loss 1.0915 LearningRate 0.000009 Epoch: 36 Global Step: 759050 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:10:28,732-Speed 2497.85 samples/sec Loss 1.1062 LearningRate 0.000009 Epoch: 36 Global Step: 759060 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:10:36,884-Speed 2512.50 samples/sec Loss 1.0607 LearningRate 0.000009 Epoch: 36 Global Step: 759070 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:10:45,089-Speed 2496.37 samples/sec Loss 1.0563 LearningRate 0.000009 Epoch: 36 Global Step: 759080 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:10:53,291-Speed 2497.39 samples/sec Loss 1.0892 LearningRate 0.000009 Epoch: 36 
Global Step: 759090 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:11:01,494-Speed 2497.02 samples/sec Loss 1.0899 LearningRate 0.000009 Epoch: 36 Global Step: 759100 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:11:09,698-Speed 2496.84 samples/sec Loss 1.0872 LearningRate 0.000009 Epoch: 36 Global Step: 759110 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:11:17,901-Speed 2496.94 samples/sec Loss 1.1124 LearningRate 0.000009 Epoch: 36 Global Step: 759120 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:11:26,053-Speed 2512.72 samples/sec Loss 1.0907 LearningRate 0.000009 Epoch: 36 Global Step: 759130 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:11:34,257-Speed 2497.02 samples/sec Loss 1.0536 LearningRate 0.000009 Epoch: 36 Global Step: 759140 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:11:42,457-Speed 2498.15 samples/sec Loss 1.0986 LearningRate 0.000009 Epoch: 36 Global Step: 759150 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:11:50,659-Speed 2497.19 samples/sec Loss 1.1048 LearningRate 0.000009 Epoch: 36 Global Step: 759160 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:11:58,875-Speed 2492.94 samples/sec Loss 1.0798 LearningRate 0.000009 Epoch: 36 Global Step: 759170 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:12:07,078-Speed 2497.34 samples/sec Loss 1.0875 LearningRate 0.000009 Epoch: 36 Global Step: 759180 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:12:15,230-Speed 2512.48 samples/sec Loss 1.0808 LearningRate 0.000009 Epoch: 36 Global Step: 759190 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:12:23,431-Speed 2497.68 samples/sec Loss 1.1095 LearningRate 0.000009 Epoch: 36 Global Step: 759200 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:12:31,634-Speed 2497.17 samples/sec Loss 1.1111 LearningRate 0.000009 Epoch: 36 Global Step: 759210 
Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:12:39,839-Speed 2496.34 samples/sec Loss 1.0947 LearningRate 0.000009 Epoch: 36 Global Step: 759220 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:12:48,044-Speed 2496.30 samples/sec Loss 1.1032 LearningRate 0.000009 Epoch: 36 Global Step: 759230 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:12:56,250-Speed 2496.10 samples/sec Loss 1.1029 LearningRate 0.000009 Epoch: 36 Global Step: 759240 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:13:04,405-Speed 2512.03 samples/sec Loss 1.0667 LearningRate 0.000009 Epoch: 36 Global Step: 759250 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:13:12,608-Speed 2496.99 samples/sec Loss 1.1116 LearningRate 0.000009 Epoch: 36 Global Step: 759260 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:13:20,814-Speed 2496.06 samples/sec Loss 1.1205 LearningRate 0.000009 Epoch: 36 Global Step: 759270 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:13:29,018-Speed 2496.62 samples/sec Loss 1.0937 LearningRate 0.000009 Epoch: 36 Global Step: 759280 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:13:37,223-Speed 2496.57 samples/sec Loss 1.0858 LearningRate 0.000009 Epoch: 36 Global Step: 759290 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:13:45,450-Speed 2489.71 samples/sec Loss 1.1210 LearningRate 0.000009 Epoch: 36 Global Step: 759300 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:13:53,599-Speed 2513.28 samples/sec Loss 1.0531 LearningRate 0.000009 Epoch: 36 Global Step: 759310 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:14:01,804-Speed 2497.00 samples/sec Loss 1.1242 LearningRate 0.000009 Epoch: 36 Global Step: 759320 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:14:10,012-Speed 2495.74 samples/sec Loss 1.1030 LearningRate 0.000009 Epoch: 36 Global Step: 759330 Fp16 Grad Scale: 
8192 Required: 16 hours Training: 2022-07-12 20:14:18,219-Speed 2495.79 samples/sec Loss 1.0878 LearningRate 0.000009 Epoch: 36 Global Step: 759340 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:14:26,424-Speed 2496.27 samples/sec Loss 1.0869 LearningRate 0.000009 Epoch: 36 Global Step: 759350 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:14:34,629-Speed 2496.42 samples/sec Loss 1.0962 LearningRate 0.000009 Epoch: 36 Global Step: 759360 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:14:42,780-Speed 2512.93 samples/sec Loss 1.0917 LearningRate 0.000009 Epoch: 36 Global Step: 759370 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:14:50,991-Speed 2494.55 samples/sec Loss 1.1156 LearningRate 0.000009 Epoch: 36 Global Step: 759380 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:14:59,201-Speed 2494.88 samples/sec Loss 1.0945 LearningRate 0.000009 Epoch: 36 Global Step: 759390 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:15:07,440-Speed 2486.17 samples/sec Loss 1.0854 LearningRate 0.000009 Epoch: 36 Global Step: 759400 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:15:15,647-Speed 2496.04 samples/sec Loss 1.0919 LearningRate 0.000009 Epoch: 36 Global Step: 759410 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:15:23,854-Speed 2495.57 samples/sec Loss 1.0824 LearningRate 0.000009 Epoch: 36 Global Step: 759420 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:15:32,005-Speed 2513.04 samples/sec Loss 1.0958 LearningRate 0.000009 Epoch: 36 Global Step: 759430 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:15:40,210-Speed 2496.41 samples/sec Loss 1.0721 LearningRate 0.000009 Epoch: 36 Global Step: 759440 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:15:48,414-Speed 2496.79 samples/sec Loss 1.0907 LearningRate 0.000009 Epoch: 36 Global Step: 759450 Fp16 Grad Scale: 8192 Required: 16 
hours Training: 2022-07-12 20:15:56,619-Speed 2496.37 samples/sec Loss 1.0843 LearningRate 0.000009 Epoch: 36 Global Step: 759460 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:16:04,833-Speed 2493.51 samples/sec Loss 1.1113 LearningRate 0.000009 Epoch: 36 Global Step: 759470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:16:13,046-Speed 2494.46 samples/sec Loss 1.0882 LearningRate 0.000009 Epoch: 36 Global Step: 759480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:16:21,194-Speed 2513.76 samples/sec Loss 1.0900 LearningRate 0.000009 Epoch: 36 Global Step: 759490 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:16:29,397-Speed 2497.17 samples/sec Loss 1.0468 LearningRate 0.000009 Epoch: 36 Global Step: 759500 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:16:37,602-Speed 2496.63 samples/sec Loss 1.0916 LearningRate 0.000009 Epoch: 36 Global Step: 759510 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:16:45,807-Speed 2496.62 samples/sec Loss 1.0831 LearningRate 0.000009 Epoch: 36 Global Step: 759520 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:16:54,012-Speed 2496.32 samples/sec Loss 1.1008 LearningRate 0.000009 Epoch: 36 Global Step: 759530 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:17:02,219-Speed 2496.05 samples/sec Loss 1.0982 LearningRate 0.000009 Epoch: 36 Global Step: 759540 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:17:10,370-Speed 2512.87 samples/sec Loss 1.1094 LearningRate 0.000009 Epoch: 36 Global Step: 759550 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:17:18,576-Speed 2496.10 samples/sec Loss 1.0986 LearningRate 0.000009 Epoch: 36 Global Step: 759560 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:17:26,792-Speed 2493.41 samples/sec Loss 1.0812 LearningRate 0.000009 Epoch: 36 Global Step: 759570 Fp16 Grad Scale: 16384 Required: 16 hours 
Training: 2022-07-12 20:17:34,996-Speed 2496.42 samples/sec Loss 1.1084 LearningRate 0.000009 Epoch: 36 Global Step: 759580 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:17:43,210-Speed 2493.71 samples/sec Loss 1.0919 LearningRate 0.000009 Epoch: 36 Global Step: 759590 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:17:51,421-Speed 2495.08 samples/sec Loss 1.0951 LearningRate 0.000009 Epoch: 36 Global Step: 759600 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:17:59,572-Speed 2513.11 samples/sec Loss 1.0795 LearningRate 0.000009 Epoch: 36 Global Step: 759610 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:18:07,774-Speed 2497.10 samples/sec Loss 1.0713 LearningRate 0.000009 Epoch: 36 Global Step: 759620 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:18:15,978-Speed 2497.15 samples/sec Loss 1.0909 LearningRate 0.000009 Epoch: 36 Global Step: 759630 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:18:24,183-Speed 2496.75 samples/sec Loss 1.0676 LearningRate 0.000009 Epoch: 36 Global Step: 759640 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:18:32,396-Speed 2493.74 samples/sec Loss 1.1133 LearningRate 0.000009 Epoch: 36 Global Step: 759650 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:18:40,604-Speed 2495.33 samples/sec Loss 1.0834 LearningRate 0.000009 Epoch: 36 Global Step: 759660 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:18:48,755-Speed 2513.26 samples/sec Loss 1.0892 LearningRate 0.000009 Epoch: 36 Global Step: 759670 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:18:56,956-Speed 2497.46 samples/sec Loss 1.0897 LearningRate 0.000009 Epoch: 36 Global Step: 759680 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:19:05,159-Speed 2496.93 samples/sec Loss 1.1074 LearningRate 0.000009 Epoch: 36 Global Step: 759690 Fp16 Grad Scale: 16384 Required: 16 hours 
Training: 2022-07-12 20:19:13,364-Speed 2496.67 samples/sec Loss 1.0959 LearningRate 0.000009 Epoch: 36 Global Step: 759700 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:19:21,567-Speed 2496.94 samples/sec Loss 1.0835 LearningRate 0.000009 Epoch: 36 Global Step: 759710 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:19:29,770-Speed 2496.78 samples/sec Loss 1.0936 LearningRate 0.000009 Epoch: 36 Global Step: 759720 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:19:37,922-Speed 2512.75 samples/sec Loss 1.1010 LearningRate 0.000009 Epoch: 36 Global Step: 759730 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:19:46,136-Speed 2493.81 samples/sec Loss 1.0954 LearningRate 0.000009 Epoch: 36 Global Step: 759740 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:19:54,349-Speed 2493.85 samples/sec Loss 1.0669 LearningRate 0.000009 Epoch: 36 Global Step: 759750 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:20:02,553-Speed 2496.86 samples/sec Loss 1.0933 LearningRate 0.000009 Epoch: 36 Global Step: 759760 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:20:10,756-Speed 2497.16 samples/sec Loss 1.0977 LearningRate 0.000009 Epoch: 36 Global Step: 759770 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:20:18,958-Speed 2497.35 samples/sec Loss 1.0512 LearningRate 0.000009 Epoch: 36 Global Step: 759780 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:20:27,112-Speed 2511.94 samples/sec Loss 1.1011 LearningRate 0.000009 Epoch: 36 Global Step: 759790 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:20:35,314-Speed 2497.46 samples/sec Loss 1.1077 LearningRate 0.000009 Epoch: 36 Global Step: 759800 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:20:43,523-Speed 2495.34 samples/sec Loss 1.0902 LearningRate 0.000009 Epoch: 36 Global Step: 759810 Fp16 Grad Scale: 16384 Required: 16 hours 
Training: 2022-07-12 20:20:51,725-Speed 2497.33 samples/sec Loss 1.0928 LearningRate 0.000009 Epoch: 36 Global Step: 759820 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:20:59,928-Speed 2497.08 samples/sec Loss 1.1358 LearningRate 0.000009 Epoch: 36 Global Step: 759830 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:21:08,146-Speed 2493.82 samples/sec Loss 1.1006 LearningRate 0.000009 Epoch: 36 Global Step: 759840 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:21:16,310-Speed 2511.91 samples/sec Loss 1.0824 LearningRate 0.000009 Epoch: 36 Global Step: 759850 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:21:24,512-Speed 2497.30 samples/sec Loss 1.0932 LearningRate 0.000009 Epoch: 36 Global Step: 759860 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:21:32,719-Speed 2495.79 samples/sec Loss 1.0778 LearningRate 0.000009 Epoch: 36 Global Step: 759870 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:21:40,921-Speed 2497.34 samples/sec Loss 1.1066 LearningRate 0.000009 Epoch: 36 Global Step: 759880 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:21:49,126-Speed 2496.39 samples/sec Loss 1.0858 LearningRate 0.000009 Epoch: 36 Global Step: 759890 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:21:57,331-Speed 2496.54 samples/sec Loss 1.0871 LearningRate 0.000009 Epoch: 36 Global Step: 759900 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:22:05,480-Speed 2513.60 samples/sec Loss 1.0854 LearningRate 0.000009 Epoch: 36 Global Step: 759910 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:22:13,688-Speed 2495.74 samples/sec Loss 1.0935 LearningRate 0.000009 Epoch: 36 Global Step: 759920 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:22:21,889-Speed 2497.78 samples/sec Loss 1.0604 LearningRate 0.000009 Epoch: 36 Global Step: 759930 Fp16 Grad Scale: 16384 Required: 16 hours 
Training: 2022-07-12 20:22:30,095-Speed 2496.02 samples/sec Loss 1.1037 LearningRate 0.000009 Epoch: 36 Global Step: 759940 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:22:38,301-Speed 2496.15 samples/sec Loss 1.1223 LearningRate 0.000009 Epoch: 36 Global Step: 759950 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:22:46,506-Speed 2496.42 samples/sec Loss 1.0965 LearningRate 0.000009 Epoch: 36 Global Step: 759960 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:22:54,674-Speed 2507.96 samples/sec Loss 1.1016 LearningRate 0.000009 Epoch: 36 Global Step: 759970 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:23:02,878-Speed 2496.68 samples/sec Loss 1.1168 LearningRate 0.000009 Epoch: 36 Global Step: 759980 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:23:11,083-Speed 2496.45 samples/sec Loss 1.1108 LearningRate 0.000009 Epoch: 36 Global Step: 759990 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:23:19,290-Speed 2495.78 samples/sec Loss 1.0898 LearningRate 0.000009 Epoch: 36 Global Step: 760000 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:23:27,497-Speed 2495.95 samples/sec Loss 1.0840 LearningRate 0.000009 Epoch: 36 Global Step: 760010 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:23:35,700-Speed 2497.15 samples/sec Loss 1.1018 LearningRate 0.000009 Epoch: 36 Global Step: 760020 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:23:43,864-Speed 2508.97 samples/sec Loss 1.0924 LearningRate 0.000009 Epoch: 36 Global Step: 760030 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:23:52,067-Speed 2496.75 samples/sec Loss 1.1046 LearningRate 0.000009 Epoch: 36 Global Step: 760040 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:24:00,273-Speed 2496.64 samples/sec Loss 1.0945 LearningRate 0.000009 Epoch: 36 Global Step: 760050 Fp16 Grad Scale: 16384 Required: 16 hours 
Training: 2022-07-12 20:24:08,478-Speed 2496.34 samples/sec Loss 1.0530 LearningRate 0.000009 Epoch: 36 Global Step: 760060 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:24:16,681-Speed 2497.06 samples/sec Loss 1.0920 LearningRate 0.000009 Epoch: 36 Global Step: 760070 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:24:24,885-Speed 2496.91 samples/sec Loss 1.0848 LearningRate 0.000009 Epoch: 36 Global Step: 760080 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:24:33,039-Speed 2511.76 samples/sec Loss 1.0962 LearningRate 0.000009 Epoch: 36 Global Step: 760090 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:24:41,248-Speed 2495.27 samples/sec Loss 1.0611 LearningRate 0.000009 Epoch: 36 Global Step: 760100 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:24:49,451-Speed 2497.27 samples/sec Loss 1.0876 LearningRate 0.000009 Epoch: 36 Global Step: 760110 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:24:57,608-Speed 2511.05 samples/sec Loss 1.1081 LearningRate 0.000009 Epoch: 36 Global Step: 760120 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:25:05,813-Speed 2496.65 samples/sec Loss 1.1156 LearningRate 0.000009 Epoch: 36 Global Step: 760130 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:25:14,018-Speed 2496.52 samples/sec Loss 1.1237 LearningRate 0.000009 Epoch: 36 Global Step: 760140 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:25:22,165-Speed 2513.99 samples/sec Loss 1.0922 LearningRate 0.000009 Epoch: 36 Global Step: 760150 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:25:30,370-Speed 2496.56 samples/sec Loss 1.1170 LearningRate 0.000009 Epoch: 36 Global Step: 760160 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:25:38,575-Speed 2497.03 samples/sec Loss 1.1055 LearningRate 0.000009 Epoch: 36 Global Step: 760170 Fp16 Grad Scale: 8192 Required: 16 hours Training: 
2022-07-12 20:25:46,779-Speed 2496.67 samples/sec Loss 1.1041 LearningRate 0.000009 Epoch: 36 Global Step: 760180 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:25:54,981-Speed 2497.48 samples/sec Loss 1.0670 LearningRate 0.000009 Epoch: 36 Global Step: 760190 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:26:03,183-Speed 2497.33 samples/sec Loss 1.1067 LearningRate 0.000009 Epoch: 36 Global Step: 760200 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:26:11,331-Speed 2513.85 samples/sec Loss 1.0708 LearningRate 0.000009 Epoch: 36 Global Step: 760210 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:26:19,546-Speed 2493.34 samples/sec Loss 1.0777 LearningRate 0.000009 Epoch: 36 Global Step: 760220 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:26:27,748-Speed 2497.36 samples/sec Loss 1.0722 LearningRate 0.000009 Epoch: 36 Global Step: 760230 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:26:35,951-Speed 2497.28 samples/sec Loss 1.0719 LearningRate 0.000009 Epoch: 36 Global Step: 760240 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:26:44,153-Speed 2497.56 samples/sec Loss 1.0690 LearningRate 0.000009 Epoch: 36 Global Step: 760250 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:26:52,355-Speed 2497.43 samples/sec Loss 1.0811 LearningRate 0.000009 Epoch: 36 Global Step: 760260 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:27:00,504-Speed 2513.47 samples/sec Loss 1.1158 LearningRate 0.000009 Epoch: 36 Global Step: 760270 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:27:08,705-Speed 2497.73 samples/sec Loss 1.0943 LearningRate 0.000009 Epoch: 36 Global Step: 760280 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:27:16,910-Speed 2496.39 samples/sec Loss 1.0868 LearningRate 0.000009 Epoch: 36 Global Step: 760290 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 
20:27:25,112-Speed 2497.40 samples/sec Loss 1.0913 LearningRate 0.000009 Epoch: 36 Global Step: 760300 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:27:33,311-Speed 2498.16 samples/sec Loss 1.0942 LearningRate 0.000009 Epoch: 36 Global Step: 760310 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:27:41,512-Speed 2497.79 samples/sec Loss 1.1009 LearningRate 0.000009 Epoch: 36 Global Step: 760320 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:27:49,661-Speed 2513.63 samples/sec Loss 1.0527 LearningRate 0.000009 Epoch: 36 Global Step: 760330 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:27:57,863-Speed 2497.06 samples/sec Loss 1.1070 LearningRate 0.000009 Epoch: 36 Global Step: 760340 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:28:06,072-Speed 2495.42 samples/sec Loss 1.1148 LearningRate 0.000009 Epoch: 36 Global Step: 760350 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:28:14,283-Speed 2494.55 samples/sec Loss 1.0981 LearningRate 0.000009 Epoch: 36 Global Step: 760360 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:28:22,484-Speed 2497.75 samples/sec Loss 1.0636 LearningRate 0.000009 Epoch: 36 Global Step: 760370 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:28:30,687-Speed 2496.91 samples/sec Loss 1.0879 LearningRate 0.000009 Epoch: 36 Global Step: 760380 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:28:38,846-Speed 2510.52 samples/sec Loss 1.1077 LearningRate 0.000009 Epoch: 36 Global Step: 760390 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:28:47,050-Speed 2496.68 samples/sec Loss 1.0910 LearningRate 0.000009 Epoch: 36 Global Step: 760400 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:28:55,258-Speed 2495.55 samples/sec Loss 1.1101 LearningRate 0.000009 Epoch: 36 Global Step: 760410 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:29:03,465-Speed 
2495.87 samples/sec Loss 1.1159 LearningRate 0.000009 Epoch: 36 Global Step: 760420 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:29:11,668-Speed 2497.34 samples/sec Loss 1.1062 LearningRate 0.000009 Epoch: 36 Global Step: 760430 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:29:19,869-Speed 2497.45 samples/sec Loss 1.0949 LearningRate 0.000009 Epoch: 36 Global Step: 760440 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:29:28,018-Speed 2513.64 samples/sec Loss 1.0801 LearningRate 0.000009 Epoch: 36 Global Step: 760450 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:29:36,235-Speed 2492.62 samples/sec Loss 1.0553 LearningRate 0.000009 Epoch: 36 Global Step: 760460 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:29:44,438-Speed 2497.09 samples/sec Loss 1.0899 LearningRate 0.000009 Epoch: 36 Global Step: 760470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:29:52,640-Speed 2497.56 samples/sec Loss 1.1054 LearningRate 0.000009 Epoch: 36 Global Step: 760480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:30:00,845-Speed 2496.30 samples/sec Loss 1.1051 LearningRate 0.000009 Epoch: 36 Global Step: 760490 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:30:09,049-Speed 2496.90 samples/sec Loss 1.1279 LearningRate 0.000009 Epoch: 36 Global Step: 760500 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:30:17,198-Speed 2513.39 samples/sec Loss 1.0795 LearningRate 0.000009 Epoch: 36 Global Step: 760510 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:30:25,399-Speed 2498.05 samples/sec Loss 1.0678 LearningRate 0.000009 Epoch: 36 Global Step: 760520 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:30:33,600-Speed 2497.57 samples/sec Loss 1.1174 LearningRate 0.000009 Epoch: 36 Global Step: 760530 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:30:41,799-Speed 2498.41 samples/sec 
Loss 1.0899 LearningRate 0.000009 Epoch: 36 Global Step: 760540 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:30:50,001-Speed 2497.25 samples/sec Loss 1.0919 LearningRate 0.000009 Epoch: 36 Global Step: 760550 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:30:58,203-Speed 2497.55 samples/sec Loss 1.0991 LearningRate 0.000009 Epoch: 36 Global Step: 760560 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:31:06,350-Speed 2514.06 samples/sec Loss 1.0938 LearningRate 0.000009 Epoch: 36 Global Step: 760570 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:31:14,553-Speed 2497.06 samples/sec Loss 1.1065 LearningRate 0.000009 Epoch: 36 Global Step: 760580 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:31:22,762-Speed 2495.32 samples/sec Loss 1.0945 LearningRate 0.000009 Epoch: 36 Global Step: 760590 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:31:30,965-Speed 2496.92 samples/sec Loss 1.0999 LearningRate 0.000009 Epoch: 36 Global Step: 760600 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:31:39,165-Speed 2497.89 samples/sec Loss 1.0842 LearningRate 0.000009 Epoch: 36 Global Step: 760610 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:31:47,369-Speed 2496.80 samples/sec Loss 1.0968 LearningRate 0.000009 Epoch: 36 Global Step: 760620 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:31:55,519-Speed 2513.76 samples/sec Loss 1.1301 LearningRate 0.000009 Epoch: 36 Global Step: 760630 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:32:03,722-Speed 2496.90 samples/sec Loss 1.1109 LearningRate 0.000009 Epoch: 36 Global Step: 760640 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:32:11,926-Speed 2496.77 samples/sec Loss 1.0928 LearningRate 0.000009 Epoch: 36 Global Step: 760650 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:32:20,130-Speed 2496.88 samples/sec Loss 1.0856 
LearningRate 0.000009 Epoch: 36 Global Step: 760660 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:32:28,334-Speed 2496.66 samples/sec Loss 1.1009 LearningRate 0.000009 Epoch: 36 Global Step: 760670 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:32:36,541-Speed 2495.72 samples/sec Loss 1.0950 LearningRate 0.000009 Epoch: 36 Global Step: 760680 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:32:44,693-Speed 2512.59 samples/sec Loss 1.0932 LearningRate 0.000009 Epoch: 36 Global Step: 760690 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:32:52,897-Speed 2496.72 samples/sec Loss 1.0750 LearningRate 0.000009 Epoch: 36 Global Step: 760700 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:33:01,098-Speed 2497.57 samples/sec Loss 1.1113 LearningRate 0.000009 Epoch: 36 Global Step: 760710 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:33:09,300-Speed 2497.54 samples/sec Loss 1.1322 LearningRate 0.000009 Epoch: 36 Global Step: 760720 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:33:17,515-Speed 2493.58 samples/sec Loss 1.1169 LearningRate 0.000008 Epoch: 36 Global Step: 760730 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:33:25,714-Speed 2498.17 samples/sec Loss 1.0805 LearningRate 0.000008 Epoch: 36 Global Step: 760740 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:33:33,863-Speed 2513.61 samples/sec Loss 1.0813 LearningRate 0.000008 Epoch: 36 Global Step: 760750 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:33:42,078-Speed 2493.48 samples/sec Loss 1.0969 LearningRate 0.000008 Epoch: 36 Global Step: 760760 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:33:50,279-Speed 2497.62 samples/sec Loss 1.0987 LearningRate 0.000008 Epoch: 36 Global Step: 760770 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:33:58,482-Speed 2497.24 samples/sec Loss 1.1050 LearningRate 
0.000008 Epoch: 36 Global Step: 760780 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:34:06,685-Speed 2497.12 samples/sec Loss 1.0849 LearningRate 0.000008 Epoch: 36 Global Step: 760790 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:34:14,888-Speed 2496.85 samples/sec Loss 1.0942 LearningRate 0.000008 Epoch: 36 Global Step: 760800 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:34:23,037-Speed 2513.71 samples/sec Loss 1.1227 LearningRate 0.000008 Epoch: 36 Global Step: 760810 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:34:31,237-Speed 2497.76 samples/sec Loss 1.0963 LearningRate 0.000008 Epoch: 36 Global Step: 760820 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:34:39,439-Speed 2497.43 samples/sec Loss 1.0643 LearningRate 0.000008 Epoch: 36 Global Step: 760830 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:34:47,640-Speed 2497.56 samples/sec Loss 1.0898 LearningRate 0.000008 Epoch: 36 Global Step: 760840 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:34:55,843-Speed 2497.25 samples/sec Loss 1.0931 LearningRate 0.000008 Epoch: 36 Global Step: 760850 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:35:04,046-Speed 2497.13 samples/sec Loss 1.0879 LearningRate 0.000008 Epoch: 36 Global Step: 760860 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:35:12,196-Speed 2513.19 samples/sec Loss 1.0867 LearningRate 0.000008 Epoch: 36 Global Step: 760870 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:35:20,399-Speed 2496.92 samples/sec Loss 1.0972 LearningRate 0.000008 Epoch: 36 Global Step: 760880 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:35:28,605-Speed 2496.59 samples/sec Loss 1.1127 LearningRate 0.000008 Epoch: 36 Global Step: 760890 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:35:36,812-Speed 2496.20 samples/sec Loss 1.0742 LearningRate 0.000008 Epoch: 36 
Global Step: 760900 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:35:45,027-Speed 2493.35 samples/sec Loss 1.1037 LearningRate 0.000008 Epoch: 36 Global Step: 760910 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:35:53,231-Speed 2496.75 samples/sec Loss 1.0928 LearningRate 0.000008 Epoch: 36 Global Step: 760920 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:36:01,380-Speed 2513.72 samples/sec Loss 1.0571 LearningRate 0.000008 Epoch: 36 Global Step: 760930 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:36:09,595-Speed 2493.46 samples/sec Loss 1.0727 LearningRate 0.000008 Epoch: 36 Global Step: 760940 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:36:17,797-Speed 2497.61 samples/sec Loss 1.0751 LearningRate 0.000008 Epoch: 36 Global Step: 760950 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:36:25,999-Speed 2497.18 samples/sec Loss 1.0917 LearningRate 0.000008 Epoch: 36 Global Step: 760960 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:36:34,205-Speed 2495.87 samples/sec Loss 1.0772 LearningRate 0.000008 Epoch: 36 Global Step: 760970 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:36:42,406-Speed 2497.94 samples/sec Loss 1.0789 LearningRate 0.000008 Epoch: 36 Global Step: 760980 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:36:50,553-Speed 2514.02 samples/sec Loss 1.0866 LearningRate 0.000008 Epoch: 36 Global Step: 760990 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:36:58,751-Speed 2498.37 samples/sec Loss 1.0815 LearningRate 0.000008 Epoch: 36 Global Step: 761000 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:37:06,952-Speed 2497.77 samples/sec Loss 1.0720 LearningRate 0.000008 Epoch: 36 Global Step: 761010 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:37:15,152-Speed 2498.05 samples/sec Loss 1.1004 LearningRate 0.000008 Epoch: 36 Global Step: 761020 
Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:37:23,354-Speed 2497.26 samples/sec Loss 1.0561 LearningRate 0.000008 Epoch: 36 Global Step: 761030 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:37:31,567-Speed 2494.18 samples/sec Loss 1.1060 LearningRate 0.000008 Epoch: 36 Global Step: 761040 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:37:39,715-Speed 2514.03 samples/sec Loss 1.1086 LearningRate 0.000008 Epoch: 36 Global Step: 761050 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:37:47,917-Speed 2497.28 samples/sec Loss 1.0920 LearningRate 0.000008 Epoch: 36 Global Step: 761060 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:37:56,120-Speed 2496.94 samples/sec Loss 1.0989 LearningRate 0.000008 Epoch: 36 Global Step: 761070 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:38:04,321-Speed 2497.81 samples/sec Loss 1.0789 LearningRate 0.000008 Epoch: 36 Global Step: 761080 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:38:12,522-Speed 2497.78 samples/sec Loss 1.0560 LearningRate 0.000008 Epoch: 36 Global Step: 761090 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:38:20,729-Speed 2495.90 samples/sec Loss 1.0987 LearningRate 0.000008 Epoch: 36 Global Step: 761100 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:38:28,876-Speed 2514.11 samples/sec Loss 1.0882 LearningRate 0.000008 Epoch: 36 Global Step: 761110 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:38:37,077-Speed 2497.79 samples/sec Loss 1.0760 LearningRate 0.000008 Epoch: 36 Global Step: 761120 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:38:45,290-Speed 2494.03 samples/sec Loss 1.1044 LearningRate 0.000008 Epoch: 36 Global Step: 761130 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:38:53,498-Speed 2495.59 samples/sec Loss 1.0956 LearningRate 0.000008 Epoch: 36 Global Step: 761140 Fp16 Grad Scale: 
8192 Required: 16 hours Training: 2022-07-12 20:39:01,701-Speed 2497.04 samples/sec Loss 1.0842 LearningRate 0.000008 Epoch: 36 Global Step: 761150 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:39:09,906-Speed 2496.60 samples/sec Loss 1.0602 LearningRate 0.000008 Epoch: 36 Global Step: 761160 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:39:18,054-Speed 2513.88 samples/sec Loss 1.1131 LearningRate 0.000008 Epoch: 36 Global Step: 761170 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:39:26,258-Speed 2496.85 samples/sec Loss 1.0651 LearningRate 0.000008 Epoch: 36 Global Step: 761180 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:39:34,455-Speed 2499.37 samples/sec Loss 1.0915 LearningRate 0.000008 Epoch: 36 Global Step: 761190 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:39:42,658-Speed 2497.21 samples/sec Loss 1.1098 LearningRate 0.000008 Epoch: 36 Global Step: 761200 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:39:50,865-Speed 2495.92 samples/sec Loss 1.1135 LearningRate 0.000008 Epoch: 36 Global Step: 761210 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:39:59,079-Speed 2493.63 samples/sec Loss 1.0697 LearningRate 0.000008 Epoch: 36 Global Step: 761220 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:40:07,228-Speed 2513.68 samples/sec Loss 1.1224 LearningRate 0.000008 Epoch: 36 Global Step: 761230 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:40:15,428-Speed 2497.76 samples/sec Loss 1.0558 LearningRate 0.000008 Epoch: 36 Global Step: 761240 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:40:23,630-Speed 2497.31 samples/sec Loss 1.0767 LearningRate 0.000008 Epoch: 36 Global Step: 761250 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:40:31,831-Speed 2497.50 samples/sec Loss 1.0801 LearningRate 0.000008 Epoch: 36 Global Step: 761260 Fp16 Grad Scale: 8192 Required: 16 
hours Training: 2022-07-12 20:40:40,038-Speed 2496.50 samples/sec Loss 1.0780 LearningRate 0.000008 Epoch: 36 Global Step: 761270 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:40:48,244-Speed 2496.39 samples/sec Loss 1.1090 LearningRate 0.000008 Epoch: 36 Global Step: 761280 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:40:56,389-Speed 2514.58 samples/sec Loss 1.0774 LearningRate 0.000008 Epoch: 36 Global Step: 761290 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:41:04,590-Speed 2497.89 samples/sec Loss 1.0536 LearningRate 0.000008 Epoch: 36 Global Step: 761300 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:41:12,800-Speed 2495.02 samples/sec Loss 1.0910 LearningRate 0.000008 Epoch: 36 Global Step: 761310 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:41:21,003-Speed 2497.19 samples/sec Loss 1.1066 LearningRate 0.000008 Epoch: 36 Global Step: 761320 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:41:29,204-Speed 2497.62 samples/sec Loss 1.0675 LearningRate 0.000008 Epoch: 36 Global Step: 761330 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:41:37,409-Speed 2496.45 samples/sec Loss 1.0852 LearningRate 0.000008 Epoch: 36 Global Step: 761340 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:41:45,558-Speed 2513.59 samples/sec Loss 1.0815 LearningRate 0.000008 Epoch: 36 Global Step: 761350 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:41:53,763-Speed 2496.47 samples/sec Loss 1.0889 LearningRate 0.000008 Epoch: 36 Global Step: 761360 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:42:01,966-Speed 2497.19 samples/sec Loss 1.0836 LearningRate 0.000008 Epoch: 36 Global Step: 761370 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:42:10,167-Speed 2497.75 samples/sec Loss 1.0800 LearningRate 0.000008 Epoch: 36 Global Step: 761380 Fp16 Grad Scale: 16384 Required: 16 hours 
Training: 2022-07-12 20:42:18,369-Speed 2497.29 samples/sec Loss 1.0881 LearningRate 0.000008 Epoch: 36 Global Step: 761390 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-07-12 20:42:26,541-Speed 2506.47 samples/sec Loss 1.1188 LearningRate 0.000008 Epoch: 36 Global Step: 761400 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:42:34,696-Speed 2511.70 samples/sec Loss 1.0810 LearningRate 0.000008 Epoch: 36 Global Step: 761410 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:42:42,908-Speed 2494.12 samples/sec Loss 1.1138 LearningRate 0.000008 Epoch: 36 Global Step: 761420 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:42:51,113-Speed 2496.64 samples/sec Loss 1.0619 LearningRate 0.000008 Epoch: 36 Global Step: 761430 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:42:59,313-Speed 2497.84 samples/sec Loss 1.1090 LearningRate 0.000008 Epoch: 36 Global Step: 761440 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:43:07,522-Speed 2495.22 samples/sec Loss 1.1025 LearningRate 0.000008 Epoch: 36 Global Step: 761450 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:43:15,724-Speed 2497.33 samples/sec Loss 1.0908 LearningRate 0.000008 Epoch: 36 Global Step: 761460 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:43:23,873-Speed 2513.62 samples/sec Loss 1.0807 LearningRate 0.000008 Epoch: 36 Global Step: 761470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:43:32,075-Speed 2497.42 samples/sec Loss 1.0675 LearningRate 0.000008 Epoch: 36 Global Step: 761480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:43:40,284-Speed 2495.50 samples/sec Loss 1.1116 LearningRate 0.000008 Epoch: 36 Global Step: 761490 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:43:48,525-Speed 2485.28 samples/sec Loss 1.1186 LearningRate 0.000008 Epoch: 36 Global Step: 761500 Fp16 Grad Scale: 8192 Required: 16 hours Training: 
2022-07-12 20:43:56,730-Speed 2496.46 samples/sec Loss 1.0993 LearningRate 0.000008 Epoch: 36 Global Step: 761510 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:44:04,932-Speed 2497.40 samples/sec Loss 1.0789 LearningRate 0.000008 Epoch: 36 Global Step: 761520 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:44:13,082-Speed 2513.16 samples/sec Loss 1.1044 LearningRate 0.000008 Epoch: 36 Global Step: 761530 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:44:21,285-Speed 2497.00 samples/sec Loss 1.1312 LearningRate 0.000008 Epoch: 36 Global Step: 761540 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:44:29,488-Speed 2497.08 samples/sec Loss 1.1329 LearningRate 0.000008 Epoch: 36 Global Step: 761550 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:44:37,687-Speed 2498.40 samples/sec Loss 1.0858 LearningRate 0.000008 Epoch: 36 Global Step: 761560 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:44:45,890-Speed 2497.08 samples/sec Loss 1.1070 LearningRate 0.000008 Epoch: 36 Global Step: 761570 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:44:54,092-Speed 2497.34 samples/sec Loss 1.0696 LearningRate 0.000008 Epoch: 36 Global Step: 761580 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:45:02,241-Speed 2513.73 samples/sec Loss 1.0744 LearningRate 0.000008 Epoch: 36 Global Step: 761590 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:45:10,449-Speed 2495.52 samples/sec Loss 1.0582 LearningRate 0.000008 Epoch: 36 Global Step: 761600 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:45:18,653-Speed 2497.21 samples/sec Loss 1.1084 LearningRate 0.000008 Epoch: 36 Global Step: 761610 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:45:26,865-Speed 2494.28 samples/sec Loss 1.1081 LearningRate 0.000008 Epoch: 36 Global Step: 761620 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 
20:45:35,067-Speed 2497.18 samples/sec Loss 1.1060 LearningRate 0.000008 Epoch: 36 Global Step: 761630 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:45:43,269-Speed 2497.27 samples/sec Loss 1.0978 LearningRate 0.000008 Epoch: 36 Global Step: 761640 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:45:51,417-Speed 2514.17 samples/sec Loss 1.0873 LearningRate 0.000008 Epoch: 36 Global Step: 761650 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:45:59,620-Speed 2497.07 samples/sec Loss 1.0685 LearningRate 0.000008 Epoch: 36 Global Step: 761660 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:46:07,833-Speed 2494.08 samples/sec Loss 1.1048 LearningRate 0.000008 Epoch: 36 Global Step: 761670 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:46:16,036-Speed 2496.89 samples/sec Loss 1.1045 LearningRate 0.000008 Epoch: 36 Global Step: 761680 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:46:24,263-Speed 2490.07 samples/sec Loss 1.0923 LearningRate 0.000008 Epoch: 36 Global Step: 761690 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:46:32,464-Speed 2497.79 samples/sec Loss 1.1105 LearningRate 0.000008 Epoch: 36 Global Step: 761700 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:46:40,615-Speed 2513.19 samples/sec Loss 1.0645 LearningRate 0.000008 Epoch: 36 Global Step: 761710 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:46:48,818-Speed 2497.08 samples/sec Loss 1.1162 LearningRate 0.000008 Epoch: 36 Global Step: 761720 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:46:57,020-Speed 2497.11 samples/sec Loss 1.1176 LearningRate 0.000008 Epoch: 36 Global Step: 761730 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:47:05,230-Speed 2495.23 samples/sec Loss 1.1130 LearningRate 0.000008 Epoch: 36 Global Step: 761740 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:47:13,433-Speed 
2496.90 samples/sec Loss 1.0874 LearningRate 0.000008 Epoch: 36 Global Step: 761750 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:47:21,636-Speed 2496.91 samples/sec Loss 1.0675 LearningRate 0.000008 Epoch: 36 Global Step: 761760 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:47:29,791-Speed 2512.01 samples/sec Loss 1.0607 LearningRate 0.000008 Epoch: 36 Global Step: 761770 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:47:37,993-Speed 2497.16 samples/sec Loss 1.0953 LearningRate 0.000008 Epoch: 36 Global Step: 761780 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:47:46,199-Speed 2496.10 samples/sec Loss 1.0839 LearningRate 0.000008 Epoch: 36 Global Step: 761790 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:47:54,409-Speed 2495.05 samples/sec Loss 1.1074 LearningRate 0.000008 Epoch: 36 Global Step: 761800 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:48:02,613-Speed 2496.81 samples/sec Loss 1.1010 LearningRate 0.000008 Epoch: 36 Global Step: 761810 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:48:10,829-Speed 2492.99 samples/sec Loss 1.0879 LearningRate 0.000008 Epoch: 36 Global Step: 761820 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:48:18,991-Speed 2509.23 samples/sec Loss 1.0945 LearningRate 0.000008 Epoch: 36 Global Step: 761830 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-07-12 20:48:27,197-Speed 2496.23 samples/sec Loss 1.1006 LearningRate 0.000008 Epoch: 36 Global Step: 761840 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:48:35,404-Speed 2496.18 samples/sec Loss 1.1027 LearningRate 0.000008 Epoch: 36 Global Step: 761850 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:48:43,606-Speed 2497.26 samples/sec Loss 1.1012 LearningRate 0.000008 Epoch: 36 Global Step: 761860 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:48:51,809-Speed 2497.03 samples/sec 
Loss 1.1078 LearningRate 0.000008 Epoch: 36 Global Step: 761870 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:49:00,015-Speed 2496.33 samples/sec Loss 1.1149 LearningRate 0.000008 Epoch: 36 Global Step: 761880 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:49:08,164-Speed 2513.55 samples/sec Loss 1.1120 LearningRate 0.000008 Epoch: 36 Global Step: 761890 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:49:16,366-Speed 2497.26 samples/sec Loss 1.1066 LearningRate 0.000008 Epoch: 36 Global Step: 761900 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:49:24,568-Speed 2497.45 samples/sec Loss 1.0975 LearningRate 0.000008 Epoch: 36 Global Step: 761910 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:49:32,767-Speed 2498.17 samples/sec Loss 1.1005 LearningRate 0.000008 Epoch: 36 Global Step: 761920 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:49:40,980-Speed 2493.90 samples/sec Loss 1.0785 LearningRate 0.000008 Epoch: 36 Global Step: 761930 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:49:49,181-Speed 2497.69 samples/sec Loss 1.0831 LearningRate 0.000008 Epoch: 36 Global Step: 761940 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:49:57,333-Speed 2512.88 samples/sec Loss 1.1168 LearningRate 0.000008 Epoch: 36 Global Step: 761950 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:50:05,534-Speed 2497.63 samples/sec Loss 1.0588 LearningRate 0.000008 Epoch: 36 Global Step: 761960 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:50:13,737-Speed 2496.97 samples/sec Loss 1.1148 LearningRate 0.000008 Epoch: 36 Global Step: 761970 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:50:21,938-Speed 2497.50 samples/sec Loss 1.0775 LearningRate 0.000008 Epoch: 36 Global Step: 761980 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:50:30,138-Speed 2498.14 samples/sec Loss 1.0833 
LearningRate 0.000008 Epoch: 36 Global Step: 761990 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:50:38,338-Speed 2497.90 samples/sec Loss 1.1091 LearningRate 0.000008 Epoch: 36 Global Step: 762000 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:50:46,492-Speed 2512.14 samples/sec Loss 1.0736 LearningRate 0.000008 Epoch: 36 Global Step: 762010 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:50:54,709-Speed 2492.68 samples/sec Loss 1.0746 LearningRate 0.000008 Epoch: 36 Global Step: 762020 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:51:02,916-Speed 2495.87 samples/sec Loss 1.0864 LearningRate 0.000008 Epoch: 36 Global Step: 762030 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:51:11,120-Speed 2496.64 samples/sec Loss 1.1140 LearningRate 0.000008 Epoch: 36 Global Step: 762040 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:51:19,319-Speed 2498.36 samples/sec Loss 1.1004 LearningRate 0.000008 Epoch: 36 Global Step: 762050 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:51:27,521-Speed 2497.07 samples/sec Loss 1.0769 LearningRate 0.000008 Epoch: 36 Global Step: 762060 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:51:35,674-Speed 2512.23 samples/sec Loss 1.0891 LearningRate 0.000008 Epoch: 36 Global Step: 762070 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:51:43,875-Speed 2497.67 samples/sec Loss 1.0904 LearningRate 0.000008 Epoch: 36 Global Step: 762080 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:51:52,090-Speed 2493.39 samples/sec Loss 1.1094 LearningRate 0.000008 Epoch: 36 Global Step: 762090 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:52:00,293-Speed 2497.05 samples/sec Loss 1.0818 LearningRate 0.000008 Epoch: 36 Global Step: 762100 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:52:08,495-Speed 2497.25 samples/sec Loss 1.0801 LearningRate 
0.000008 Epoch: 36 Global Step: 762110 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:52:16,697-Speed 2497.44 samples/sec Loss 1.0938 LearningRate 0.000008 Epoch: 36 Global Step: 762120 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:52:24,845-Speed 2513.86 samples/sec Loss 1.1195 LearningRate 0.000008 Epoch: 36 Global Step: 762130 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:52:33,047-Speed 2497.20 samples/sec Loss 1.0767 LearningRate 0.000008 Epoch: 36 Global Step: 762140 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:52:41,247-Speed 2498.16 samples/sec Loss 1.0787 LearningRate 0.000008 Epoch: 36 Global Step: 762150 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:52:49,456-Speed 2495.30 samples/sec Loss 1.0839 LearningRate 0.000008 Epoch: 36 Global Step: 762160 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:52:57,671-Speed 2493.82 samples/sec Loss 1.0836 LearningRate 0.000008 Epoch: 36 Global Step: 762170 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:53:05,875-Speed 2496.76 samples/sec Loss 1.0997 LearningRate 0.000008 Epoch: 36 Global Step: 762180 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:53:14,026-Speed 2512.93 samples/sec Loss 1.1345 LearningRate 0.000008 Epoch: 36 Global Step: 762190 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:53:22,233-Speed 2495.92 samples/sec Loss 1.0985 LearningRate 0.000008 Epoch: 36 Global Step: 762200 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:53:30,435-Speed 2497.34 samples/sec Loss 1.0962 LearningRate 0.000008 Epoch: 36 Global Step: 762210 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:53:38,639-Speed 2496.85 samples/sec Loss 1.0671 LearningRate 0.000008 Epoch: 36 Global Step: 762220 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:53:46,843-Speed 2496.86 samples/sec Loss 1.1088 LearningRate 0.000008 Epoch: 36 
Global Step: 762230 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:53:55,053-Speed 2495.03 samples/sec Loss 1.0772 LearningRate 0.000008 Epoch: 36 Global Step: 762240 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:54:03,206-Speed 2512.27 samples/sec Loss 1.0661 LearningRate 0.000008 Epoch: 36 Global Step: 762250 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:54:11,404-Speed 2498.41 samples/sec Loss 1.0753 LearningRate 0.000008 Epoch: 36 Global Step: 762260 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:54:19,606-Speed 2497.31 samples/sec Loss 1.0726 LearningRate 0.000008 Epoch: 36 Global Step: 762270 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:54:27,812-Speed 2496.25 samples/sec Loss 1.0829 LearningRate 0.000008 Epoch: 36 Global Step: 762280 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:54:36,013-Speed 2497.73 samples/sec Loss 1.0413 LearningRate 0.000008 Epoch: 36 Global Step: 762290 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:54:44,216-Speed 2497.04 samples/sec Loss 1.0478 LearningRate 0.000008 Epoch: 36 Global Step: 762300 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:54:52,367-Speed 2513.24 samples/sec Loss 1.0745 LearningRate 0.000008 Epoch: 36 Global Step: 762310 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:55:00,571-Speed 2496.80 samples/sec Loss 1.0885 LearningRate 0.000008 Epoch: 36 Global Step: 762320 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:55:08,773-Speed 2497.20 samples/sec Loss 1.0769 LearningRate 0.000008 Epoch: 36 Global Step: 762330 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:55:16,975-Speed 2497.50 samples/sec Loss 1.0821 LearningRate 0.000008 Epoch: 36 Global Step: 762340 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:55:25,179-Speed 2496.88 samples/sec Loss 1.0736 LearningRate 0.000008 Epoch: 36 Global Step: 762350 
Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:55:33,384-Speed 2496.44 samples/sec Loss 1.0376 LearningRate 0.000008 Epoch: 36 Global Step: 762360 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:55:41,534-Speed 2513.65 samples/sec Loss 1.0431 LearningRate 0.000008 Epoch: 36 Global Step: 762370 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:55:49,734-Speed 2497.68 samples/sec Loss 1.0723 LearningRate 0.000008 Epoch: 36 Global Step: 762380 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:55:57,935-Speed 2497.74 samples/sec Loss 1.0789 LearningRate 0.000008 Epoch: 36 Global Step: 762390 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:56:06,137-Speed 2497.45 samples/sec Loss 1.0794 LearningRate 0.000008 Epoch: 36 Global Step: 762400 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:56:14,338-Speed 2497.75 samples/sec Loss 1.0560 LearningRate 0.000008 Epoch: 36 Global Step: 762410 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:56:22,550-Speed 2494.46 samples/sec Loss 1.0734 LearningRate 0.000008 Epoch: 36 Global Step: 762420 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:56:30,700-Speed 2513.24 samples/sec Loss 1.0894 LearningRate 0.000008 Epoch: 36 Global Step: 762430 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:56:38,900-Speed 2497.92 samples/sec Loss 1.1092 LearningRate 0.000008 Epoch: 36 Global Step: 762440 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:56:47,101-Speed 2497.55 samples/sec Loss 1.1267 LearningRate 0.000008 Epoch: 36 Global Step: 762450 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:56:55,310-Speed 2495.36 samples/sec Loss 1.0786 LearningRate 0.000008 Epoch: 36 Global Step: 762460 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:57:03,513-Speed 2496.82 samples/sec Loss 1.0870 LearningRate 0.000008 Epoch: 36 Global Step: 762470 Fp16 Grad Scale: 
8192 Required: 15 hours Training: 2022-07-12 20:57:11,715-Speed 2497.67 samples/sec Loss 1.0679 LearningRate 0.000008 Epoch: 36 Global Step: 762480 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:57:19,863-Speed 2513.85 samples/sec Loss 1.1206 LearningRate 0.000008 Epoch: 36 Global Step: 762490 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:57:28,066-Speed 2496.93 samples/sec Loss 1.0892 LearningRate 0.000008 Epoch: 36 Global Step: 762500 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:57:36,269-Speed 2497.00 samples/sec Loss 1.0927 LearningRate 0.000008 Epoch: 36 Global Step: 762510 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:57:44,471-Speed 2497.67 samples/sec Loss 1.0812 LearningRate 0.000008 Epoch: 36 Global Step: 762520 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:57:52,672-Speed 2497.67 samples/sec Loss 1.0744 LearningRate 0.000008 Epoch: 36 Global Step: 762530 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:58:00,873-Speed 2497.51 samples/sec Loss 1.0871 LearningRate 0.000008 Epoch: 36 Global Step: 762540 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:58:09,023-Speed 2513.15 samples/sec Loss 1.0890 LearningRate 0.000008 Epoch: 36 Global Step: 762550 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:58:17,227-Speed 2496.79 samples/sec Loss 1.1165 LearningRate 0.000008 Epoch: 36 Global Step: 762560 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:58:25,430-Speed 2497.54 samples/sec Loss 1.0940 LearningRate 0.000008 Epoch: 36 Global Step: 762570 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:58:33,643-Speed 2494.04 samples/sec Loss 1.0637 LearningRate 0.000008 Epoch: 36 Global Step: 762580 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-07-12 20:58:41,848-Speed 2496.19 samples/sec Loss 1.0710 LearningRate 0.000008 Epoch: 36 Global Step: 762590 Fp16 Grad Scale: 8192 Required: 15 
hours Training: 2022-07-12 20:58:50,049-Speed 2497.80 samples/sec Loss 1.1378 LearningRate 0.000008 Epoch: 36 Global Step: 762600 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 20:58:58,209-Speed 2510.50 samples/sec Loss 1.1060 LearningRate 0.000008 Epoch: 36 Global Step: 762610 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 20:59:06,414-Speed 2496.22 samples/sec Loss 1.0864 LearningRate 0.000008 Epoch: 36 Global Step: 762620 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 20:59:14,614-Speed 2497.91 samples/sec Loss 1.0298 LearningRate 0.000008 Epoch: 36 Global Step: 762630 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 20:59:22,817-Speed 2497.01 samples/sec Loss 1.1198 LearningRate 0.000008 Epoch: 36 Global Step: 762640 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 20:59:31,036-Speed 2492.44 samples/sec Loss 1.0727 LearningRate 0.000008 Epoch: 36 Global Step: 762650 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 20:59:39,237-Speed 2497.64 samples/sec Loss 1.1000 LearningRate 0.000008 Epoch: 36 Global Step: 762660 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 20:59:47,391-Speed 2512.24 samples/sec Loss 1.0741 LearningRate 0.000008 Epoch: 36 Global Step: 762670 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 20:59:55,591-Speed 2497.92 samples/sec Loss 1.0952 LearningRate 0.000008 Epoch: 36 Global Step: 762680 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:00:03,791-Speed 2498.02 samples/sec Loss 1.0799 LearningRate 0.000008 Epoch: 36 Global Step: 762690 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:00:11,995-Speed 2496.61 samples/sec Loss 1.1121 LearningRate 0.000008 Epoch: 36 Global Step: 762700 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:00:20,199-Speed 2496.95 samples/sec Loss 1.1223 LearningRate 0.000008 Epoch: 36 Global Step: 762710 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:00:28,402-Speed 2497.08 samples/sec Loss 1.1148 LearningRate 0.000008 Epoch: 36 Global Step: 762720 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:00:36,550-Speed 2513.63 samples/sec Loss 1.0967 LearningRate 0.000008 Epoch: 36 Global Step: 762730 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:00:44,750-Speed 2497.87 samples/sec Loss 1.0799 LearningRate 0.000008 Epoch: 36 Global Step: 762740 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:00:52,956-Speed 2496.19 samples/sec Loss 1.0863 LearningRate 0.000008 Epoch: 36 Global Step: 762750 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:01:01,160-Speed 2496.69 samples/sec Loss 1.0908 LearningRate 0.000008 Epoch: 36 Global Step: 762760 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:01:09,365-Speed 2496.47 samples/sec Loss 1.1046 LearningRate 0.000008 Epoch: 36 Global Step: 762770 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:01:17,570-Speed 2496.34 samples/sec Loss 1.1195 LearningRate 0.000008 Epoch: 36 Global Step: 762780 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:01:25,720-Speed 2513.46 samples/sec Loss 1.0486 LearningRate 0.000008 Epoch: 36 Global Step: 762790 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:01:33,928-Speed 2495.59 samples/sec Loss 1.1096 LearningRate 0.000008 Epoch: 36 Global Step: 762800 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:01:42,145-Speed 2492.85 samples/sec Loss 1.1064 LearningRate 0.000008 Epoch: 36 Global Step: 762810 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:01:50,347-Speed 2497.34 samples/sec Loss 1.0865 LearningRate 0.000008 Epoch: 36 Global Step: 762820 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:01:58,553-Speed 2496.08 samples/sec Loss 1.0887 LearningRate 0.000008 Epoch: 36 Global Step: 762830 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:02:06,752-Speed 2498.16 samples/sec Loss 1.0800 LearningRate 0.000008 Epoch: 36 Global Step: 762840 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:02:14,901-Speed 2513.94 samples/sec Loss 1.0821 LearningRate 0.000008 Epoch: 36 Global Step: 762850 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:02:23,103-Speed 2497.26 samples/sec Loss 1.1028 LearningRate 0.000008 Epoch: 36 Global Step: 762860 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:02:31,309-Speed 2496.21 samples/sec Loss 1.0979 LearningRate 0.000008 Epoch: 36 Global Step: 762870 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:02:39,511-Speed 2497.17 samples/sec Loss 1.0943 LearningRate 0.000008 Epoch: 36 Global Step: 762880 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:02:47,718-Speed 2495.93 samples/sec Loss 1.0966 LearningRate 0.000008 Epoch: 36 Global Step: 762890 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:02:55,927-Speed 2495.26 samples/sec Loss 1.0873 LearningRate 0.000008 Epoch: 36 Global Step: 762900 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:03:04,079-Speed 2512.66 samples/sec Loss 1.0628 LearningRate 0.000008 Epoch: 36 Global Step: 762910 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:03:12,285-Speed 2496.29 samples/sec Loss 1.0927 LearningRate 0.000008 Epoch: 36 Global Step: 762920 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:03:20,491-Speed 2495.79 samples/sec Loss 1.0505 LearningRate 0.000008 Epoch: 36 Global Step: 762930 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:03:28,695-Speed 2496.79 samples/sec Loss 1.0816 LearningRate 0.000008 Epoch: 36 Global Step: 762940 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:03:36,898-Speed 2497.22 samples/sec Loss 1.1293 LearningRate 0.000008 Epoch: 36 Global Step: 762950 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:03:45,103-Speed 2496.52 samples/sec Loss 1.1203 LearningRate 0.000008 Epoch: 36 Global Step: 762960 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:03:53,258-Speed 2511.67 samples/sec Loss 1.0902 LearningRate 0.000008 Epoch: 36 Global Step: 762970 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:04:01,458-Speed 2497.93 samples/sec Loss 1.0812 LearningRate 0.000008 Epoch: 36 Global Step: 762980 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:04:09,661-Speed 2497.27 samples/sec Loss 1.1042 LearningRate 0.000008 Epoch: 36 Global Step: 762990 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:04:17,866-Speed 2496.61 samples/sec Loss 1.0737 LearningRate 0.000008 Epoch: 36 Global Step: 763000 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:04:26,065-Speed 2498.09 samples/sec Loss 1.0799 LearningRate 0.000008 Epoch: 36 Global Step: 763010 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:04:34,270-Speed 2496.55 samples/sec Loss 1.0967 LearningRate 0.000008 Epoch: 36 Global Step: 763020 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:04:42,417-Speed 2514.20 samples/sec Loss 1.1091 LearningRate 0.000008 Epoch: 36 Global Step: 763030 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:04:50,620-Speed 2497.14 samples/sec Loss 1.0786 LearningRate 0.000008 Epoch: 36 Global Step: 763040 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:04:58,825-Speed 2496.46 samples/sec Loss 1.0806 LearningRate 0.000008 Epoch: 36 Global Step: 763050 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:05:07,031-Speed 2496.27 samples/sec Loss 1.1238 LearningRate 0.000008 Epoch: 36 Global Step: 763060 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:05:15,235-Speed 2496.64 samples/sec Loss 1.0880 LearningRate 0.000008 Epoch: 36 Global Step: 763070 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:05:23,444-Speed 2495.18 samples/sec Loss 1.0852 LearningRate 0.000008 Epoch: 36 Global Step: 763080 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:05:31,593-Speed 2513.76 samples/sec Loss 1.1047 LearningRate 0.000008 Epoch: 36 Global Step: 763090 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:05:39,798-Speed 2496.39 samples/sec Loss 1.1098 LearningRate 0.000008 Epoch: 36 Global Step: 763100 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:05:48,009-Speed 2494.82 samples/sec Loss 1.0965 LearningRate 0.000008 Epoch: 36 Global Step: 763110 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:05:56,212-Speed 2496.92 samples/sec Loss 1.0640 LearningRate 0.000008 Epoch: 36 Global Step: 763120 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:06:04,422-Speed 2494.96 samples/sec Loss 1.1192 LearningRate 0.000008 Epoch: 36 Global Step: 763130 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:06:12,623-Speed 2498.16 samples/sec Loss 1.0811 LearningRate 0.000008 Epoch: 36 Global Step: 763140 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:06:20,774-Speed 2513.03 samples/sec Loss 1.0819 LearningRate 0.000008 Epoch: 36 Global Step: 763150 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:06:28,979-Speed 2496.28 samples/sec Loss 1.0884 LearningRate 0.000008 Epoch: 36 Global Step: 763160 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:06:37,183-Speed 2496.91 samples/sec Loss 1.0694 LearningRate 0.000008 Epoch: 36 Global Step: 763170 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:06:45,387-Speed 2496.66 samples/sec Loss 1.0535 LearningRate 0.000008 Epoch: 36 Global Step: 763180 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:06:53,591-Speed 2496.81 samples/sec Loss 1.0894 LearningRate 0.000008 Epoch: 36 Global Step: 763190 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:07:01,791-Speed 2497.81 samples/sec Loss 1.1056 LearningRate 0.000008 Epoch: 36 Global Step: 763200 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:07:09,939-Speed 2514.13 samples/sec Loss 1.0977 LearningRate 0.000008 Epoch: 36 Global Step: 763210 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:07:18,160-Speed 2491.55 samples/sec Loss 1.0793 LearningRate 0.000008 Epoch: 36 Global Step: 763220 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:07:26,362-Speed 2497.51 samples/sec Loss 1.0893 LearningRate 0.000008 Epoch: 36 Global Step: 763230 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:07:34,568-Speed 2496.22 samples/sec Loss 1.0889 LearningRate 0.000008 Epoch: 36 Global Step: 763240 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:07:42,771-Speed 2496.69 samples/sec Loss 1.0605 LearningRate 0.000008 Epoch: 36 Global Step: 763250 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:07:50,974-Speed 2497.12 samples/sec Loss 1.1211 LearningRate 0.000008 Epoch: 36 Global Step: 763260 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:07:59,130-Speed 2511.81 samples/sec Loss 1.0898 LearningRate 0.000008 Epoch: 36 Global Step: 763270 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:08:07,331-Speed 2497.54 samples/sec Loss 1.0726 LearningRate 0.000008 Epoch: 36 Global Step: 763280 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:08:15,537-Speed 2496.24 samples/sec Loss 1.0921 LearningRate 0.000008 Epoch: 36 Global Step: 763290 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:08:23,736-Speed 2498.05 samples/sec Loss 1.1197 LearningRate 0.000008 Epoch: 36 Global Step: 763300 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:08:31,941-Speed 2496.85 samples/sec Loss 1.1130 LearningRate 0.000008 Epoch: 36 Global Step: 763310 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:08:40,144-Speed 2496.99 samples/sec Loss 1.0900 LearningRate 0.000008 Epoch: 36 Global Step: 763320 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:08:48,292-Speed 2513.86 samples/sec Loss 1.1155 LearningRate 0.000008 Epoch: 36 Global Step: 763330 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:08:56,495-Speed 2497.15 samples/sec Loss 1.1157 LearningRate 0.000008 Epoch: 36 Global Step: 763340 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:09:04,698-Speed 2496.82 samples/sec Loss 1.0896 LearningRate 0.000008 Epoch: 36 Global Step: 763350 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:09:12,900-Speed 2497.52 samples/sec Loss 1.1002 LearningRate 0.000008 Epoch: 36 Global Step: 763360 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:09:21,104-Speed 2496.57 samples/sec Loss 1.1160 LearningRate 0.000008 Epoch: 36 Global Step: 763370 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:09:29,307-Speed 2497.28 samples/sec Loss 1.0989 LearningRate 0.000008 Epoch: 36 Global Step: 763380 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:09:37,461-Speed 2511.97 samples/sec Loss 1.0813 LearningRate 0.000008 Epoch: 36 Global Step: 763390 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:09:45,668-Speed 2495.84 samples/sec Loss 1.0773 LearningRate 0.000008 Epoch: 36 Global Step: 763400 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:09:53,871-Speed 2497.05 samples/sec Loss 1.0785 LearningRate 0.000008 Epoch: 36 Global Step: 763410 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:10:02,075-Speed 2496.93 samples/sec Loss 1.1088 LearningRate 0.000008 Epoch: 36 Global Step: 763420 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:10:10,280-Speed 2496.34 samples/sec Loss 1.1143 LearningRate 0.000008 Epoch: 36 Global Step: 763430 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:10:18,484-Speed 2496.84 samples/sec Loss 1.1099 LearningRate 0.000008 Epoch: 36 Global Step: 763440 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:10:26,637-Speed 2512.03 samples/sec Loss 1.0826 LearningRate 0.000008 Epoch: 36 Global Step: 763450 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:10:34,840-Speed 2497.13 samples/sec Loss 1.0873 LearningRate 0.000008 Epoch: 36 Global Step: 763460 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:10:43,047-Speed 2496.05 samples/sec Loss 1.0667 LearningRate 0.000008 Epoch: 36 Global Step: 763470 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:10:51,253-Speed 2496.17 samples/sec Loss 1.1019 LearningRate 0.000008 Epoch: 36 Global Step: 763480 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:10:59,458-Speed 2496.17 samples/sec Loss 1.0449 LearningRate 0.000008 Epoch: 36 Global Step: 763490 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:11:07,666-Speed 2495.54 samples/sec Loss 1.0851 LearningRate 0.000008 Epoch: 36 Global Step: 763500 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:11:15,816-Speed 2513.40 samples/sec Loss 1.1007 LearningRate 0.000008 Epoch: 36 Global Step: 763510 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:11:24,019-Speed 2497.25 samples/sec Loss 1.1020 LearningRate 0.000008 Epoch: 36 Global Step: 763520 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:11:32,225-Speed 2496.11 samples/sec Loss 1.0788 LearningRate 0.000008 Epoch: 36 Global Step: 763530 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:11:40,436-Speed 2494.68 samples/sec Loss 1.0934 LearningRate 0.000008 Epoch: 36 Global Step: 763540 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:11:48,668-Speed 2488.14 samples/sec Loss 1.0726 LearningRate 0.000008 Epoch: 36 Global Step: 763550 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:11:56,878-Speed 2495.16 samples/sec Loss 1.0853 LearningRate 0.000008 Epoch: 36 Global Step: 763560 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:12:05,031-Speed 2512.30 samples/sec Loss 1.0654 LearningRate 0.000008 Epoch: 36 Global Step: 763570 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:12:13,270-Speed 2486.49 samples/sec Loss 1.0852 LearningRate 0.000008 Epoch: 36 Global Step: 763580 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:12:21,481-Speed 2494.60 samples/sec Loss 1.0733 LearningRate 0.000008 Epoch: 36 Global Step: 763590 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:12:29,690-Speed 2495.17 samples/sec Loss 1.0779 LearningRate 0.000008 Epoch: 36 Global Step: 763600 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:12:37,913-Speed 2491.02 samples/sec Loss 1.0379 LearningRate 0.000008 Epoch: 36 Global Step: 763610 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:12:46,122-Speed 2495.43 samples/sec Loss 1.1029 LearningRate 0.000008 Epoch: 36 Global Step: 763620 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:12:54,281-Speed 2510.46 samples/sec Loss 1.0366 LearningRate 0.000008 Epoch: 36 Global Step: 763630 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:13:02,493-Speed 2494.35 samples/sec Loss 1.0707 LearningRate 0.000008 Epoch: 36 Global Step: 763640 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:13:10,703-Speed 2495.24 samples/sec Loss 1.0757 LearningRate 0.000008 Epoch: 36 Global Step: 763650 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:13:18,913-Speed 2494.76 samples/sec Loss 1.0922 LearningRate 0.000008 Epoch: 36 Global Step: 763660 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:13:27,119-Speed 2496.26 samples/sec Loss 1.0742 LearningRate 0.000008 Epoch: 36 Global Step: 763670 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:13:35,325-Speed 2496.17 samples/sec Loss 1.0834 LearningRate 0.000008 Epoch: 36 Global Step: 763680 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:13:43,480-Speed 2511.80 samples/sec Loss 1.0950 LearningRate 0.000008 Epoch: 36 Global Step: 763690 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:13:51,684-Speed 2496.73 samples/sec Loss 1.0954 LearningRate 0.000008 Epoch: 36 Global Step: 763700 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:13:59,900-Speed 2493.21 samples/sec Loss 1.0742 LearningRate 0.000008 Epoch: 36 Global Step: 763710 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:14:08,107-Speed 2495.67 samples/sec Loss 1.0984 LearningRate 0.000008 Epoch: 36 Global Step: 763720 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:14:16,315-Speed 2495.78 samples/sec Loss 1.1052 LearningRate 0.000008 Epoch: 36 Global Step: 763730 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:14:24,520-Speed 2496.31 samples/sec Loss 1.1025 LearningRate 0.000008 Epoch: 36 Global Step: 763740 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:14:32,686-Speed 2508.51 samples/sec Loss 1.1133 LearningRate 0.000008 Epoch: 36 Global Step: 763750 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:14:40,889-Speed 2496.83 samples/sec Loss 1.0589 LearningRate 0.000008 Epoch: 36 Global Step: 763760 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:14:49,095-Speed 2496.03 samples/sec Loss 1.1029 LearningRate 0.000008 Epoch: 36 Global Step: 763770 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:14:57,295-Speed 2498.15 samples/sec Loss 1.1132 LearningRate 0.000008 Epoch: 36 Global Step: 763780 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:15:05,500-Speed 2496.59 samples/sec Loss 1.0806 LearningRate 0.000008 Epoch: 36 Global Step: 763790 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:15:13,705-Speed 2496.48 samples/sec Loss 1.1046 LearningRate 0.000008 Epoch: 36 Global Step: 763800 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:15:21,859-Speed 2512.05 samples/sec Loss 1.0963 LearningRate 0.000008 Epoch: 36 Global Step: 763810 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:15:30,064-Speed 2496.56 samples/sec Loss 1.0814 LearningRate 0.000008 Epoch: 36 Global Step: 763820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:15:38,269-Speed 2496.45 samples/sec Loss 1.0949 LearningRate 0.000008 Epoch: 36 Global Step: 763830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:15:46,477-Speed 2495.38 samples/sec Loss 1.0691 LearningRate 0.000008 Epoch: 36 Global Step: 763840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:15:54,686-Speed 2495.48 samples/sec Loss 1.1046 LearningRate 0.000008 Epoch: 36 Global Step: 763850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:16:02,891-Speed 2496.21 samples/sec Loss 1.1163 LearningRate 0.000008 Epoch: 36 Global Step: 763860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:16:11,044-Speed 2512.36 samples/sec Loss 1.0829 LearningRate 0.000008 Epoch: 36 Global Step: 763870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:16:19,250-Speed 2496.45 samples/sec Loss 1.1142 LearningRate 0.000008 Epoch: 36 Global Step: 763880 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:16:27,455-Speed 2496.33 samples/sec Loss 1.0917 LearningRate 0.000008 Epoch: 36 Global Step: 763890 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:16:35,660-Speed 2496.43 samples/sec Loss 1.0797 LearningRate 0.000008 Epoch: 36 Global Step: 763900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:16:43,863-Speed 2496.99 samples/sec Loss 1.0867 LearningRate 0.000008 Epoch: 36 Global Step: 763910 Fp16 Grad Scale: 32768 Required: 15 hours 
Training: 2022-07-12 21:16:52,062-Speed 2498.13 samples/sec Loss 1.0851 LearningRate 0.000008 Epoch: 36 Global Step: 763920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:17:00,212-Speed 2513.45 samples/sec Loss 1.0666 LearningRate 0.000008 Epoch: 36 Global Step: 763930 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:17:08,412-Speed 2497.88 samples/sec Loss 1.0507 LearningRate 0.000008 Epoch: 36 Global Step: 763940 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:17:16,616-Speed 2496.63 samples/sec Loss 1.0906 LearningRate 0.000008 Epoch: 36 Global Step: 763950 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:17:24,825-Speed 2495.32 samples/sec Loss 1.0533 LearningRate 0.000008 Epoch: 36 Global Step: 763960 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:17:33,035-Speed 2494.82 samples/sec Loss 1.1175 LearningRate 0.000008 Epoch: 36 Global Step: 763970 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:17:41,242-Speed 2496.02 samples/sec Loss 1.0702 LearningRate 0.000008 Epoch: 36 Global Step: 763980 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:17:49,392-Speed 2513.15 samples/sec Loss 1.1012 LearningRate 0.000008 Epoch: 36 Global Step: 763990 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:17:57,594-Speed 2497.37 samples/sec Loss 1.0603 LearningRate 0.000008 Epoch: 36 Global Step: 764000 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:18:05,799-Speed 2496.38 samples/sec Loss 1.1121 LearningRate 0.000008 Epoch: 36 Global Step: 764010 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:18:14,005-Speed 2496.63 samples/sec Loss 1.0914 LearningRate 0.000008 Epoch: 36 Global Step: 764020 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:18:22,209-Speed 2496.48 samples/sec Loss 1.0942 LearningRate 0.000008 Epoch: 36 Global Step: 764030 Fp16 Grad Scale: 32768 Required: 15 hours 
Training: 2022-07-12 21:18:30,413-Speed 2496.65 samples/sec Loss 1.0996 LearningRate 0.000008 Epoch: 36 Global Step: 764040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:18:38,564-Speed 2513.04 samples/sec Loss 1.0593 LearningRate 0.000008 Epoch: 36 Global Step: 764050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:18:46,771-Speed 2496.30 samples/sec Loss 1.1042 LearningRate 0.000008 Epoch: 36 Global Step: 764060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:18:54,989-Speed 2492.69 samples/sec Loss 1.0994 LearningRate 0.000008 Epoch: 36 Global Step: 764070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:19:03,195-Speed 2496.15 samples/sec Loss 1.0640 LearningRate 0.000008 Epoch: 36 Global Step: 764080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:19:11,400-Speed 2496.32 samples/sec Loss 1.0976 LearningRate 0.000008 Epoch: 36 Global Step: 764090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:19:19,560-Speed 2510.23 samples/sec Loss 1.0915 LearningRate 0.000008 Epoch: 36 Global Step: 764100 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:19:27,710-Speed 2513.37 samples/sec Loss 1.0770 LearningRate 0.000008 Epoch: 36 Global Step: 764110 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:19:35,913-Speed 2496.87 samples/sec Loss 1.1086 LearningRate 0.000008 Epoch: 36 Global Step: 764120 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:19:44,118-Speed 2496.40 samples/sec Loss 1.0988 LearningRate 0.000008 Epoch: 36 Global Step: 764130 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:19:52,320-Speed 2497.90 samples/sec Loss 1.1157 LearningRate 0.000008 Epoch: 36 Global Step: 764140 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:20:00,527-Speed 2495.75 samples/sec Loss 1.0519 LearningRate 0.000008 Epoch: 36 Global Step: 764150 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:20:08,728-Speed 2497.59 samples/sec Loss 1.0829 LearningRate 0.000008 Epoch: 36 Global Step: 764160 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:20:16,879-Speed 2513.20 samples/sec Loss 1.0718 LearningRate 0.000008 Epoch: 36 Global Step: 764170 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:20:25,080-Speed 2497.35 samples/sec Loss 1.0977 LearningRate 0.000008 Epoch: 36 Global Step: 764180 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:20:33,286-Speed 2496.28 samples/sec Loss 1.0900 LearningRate 0.000008 Epoch: 36 Global Step: 764190 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:20:41,491-Speed 2496.85 samples/sec Loss 1.0772 LearningRate 0.000008 Epoch: 36 Global Step: 764200 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:20:49,695-Speed 2496.54 samples/sec Loss 1.1160 LearningRate 0.000008 Epoch: 36 Global Step: 764210 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:20:57,915-Speed 2491.67 samples/sec Loss 1.0740 LearningRate 0.000008 Epoch: 36 Global Step: 764220 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:21:06,066-Speed 2513.18 samples/sec Loss 1.1215 LearningRate 0.000008 Epoch: 36 Global Step: 764230 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:21:14,282-Speed 2492.84 samples/sec Loss 1.0541 LearningRate 0.000008 Epoch: 36 Global Step: 764240 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:21:22,485-Speed 2497.15 samples/sec Loss 1.0720 LearningRate 0.000008 Epoch: 36 Global Step: 764250 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:21:30,700-Speed 2493.62 samples/sec Loss 1.0863 LearningRate 0.000008 Epoch: 36 Global Step: 764260 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:21:38,914-Speed 2493.53 samples/sec Loss 1.0966 LearningRate 0.000008 Epoch: 36 Global Step: 764270 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:21:47,116-Speed 2497.41 samples/sec Loss 1.0817 LearningRate 0.000008 Epoch: 36 Global Step: 764280 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:21:55,272-Speed 2511.57 samples/sec Loss 1.0734 LearningRate 0.000008 Epoch: 36 Global Step: 764290 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:22:03,478-Speed 2496.04 samples/sec Loss 1.0930 LearningRate 0.000008 Epoch: 36 Global Step: 764300 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:22:11,683-Speed 2496.54 samples/sec Loss 1.0814 LearningRate 0.000008 Epoch: 36 Global Step: 764310 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:22:19,885-Speed 2497.32 samples/sec Loss 1.0756 LearningRate 0.000008 Epoch: 36 Global Step: 764320 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:22:28,088-Speed 2497.08 samples/sec Loss 1.0983 LearningRate 0.000008 Epoch: 36 Global Step: 764330 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:22:36,298-Speed 2494.92 samples/sec Loss 1.0865 LearningRate 0.000008 Epoch: 36 Global Step: 764340 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:22:44,453-Speed 2511.77 samples/sec Loss 1.0726 LearningRate 0.000008 Epoch: 36 Global Step: 764350 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:22:52,657-Speed 2496.67 samples/sec Loss 1.0877 LearningRate 0.000008 Epoch: 36 Global Step: 764360 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:23:00,862-Speed 2496.66 samples/sec Loss 1.1095 LearningRate 0.000008 Epoch: 36 Global Step: 764370 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:23:09,065-Speed 2496.89 samples/sec Loss 1.0866 LearningRate 0.000008 Epoch: 36 Global Step: 764380 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:23:17,268-Speed 2497.06 samples/sec Loss 1.1222 LearningRate 0.000008 Epoch: 36 Global Step: 764390 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:23:25,469-Speed 2497.49 samples/sec Loss 1.0727 LearningRate 0.000008 Epoch: 36 Global Step: 764400 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:23:33,622-Speed 2512.64 samples/sec Loss 1.0813 LearningRate 0.000008 Epoch: 36 Global Step: 764410 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:23:41,827-Speed 2496.59 samples/sec Loss 1.1120 LearningRate 0.000008 Epoch: 36 Global Step: 764420 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:23:50,028-Speed 2497.47 samples/sec Loss 1.1067 LearningRate 0.000008 Epoch: 36 Global Step: 764430 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:23:58,230-Speed 2497.29 samples/sec Loss 1.0886 LearningRate 0.000008 Epoch: 36 Global Step: 764440 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:24:06,435-Speed 2496.37 samples/sec Loss 1.0886 LearningRate 0.000008 Epoch: 36 Global Step: 764450 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:24:14,637-Speed 2497.66 samples/sec Loss 1.0735 LearningRate 0.000008 Epoch: 36 Global Step: 764460 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:24:22,786-Speed 2513.36 samples/sec Loss 1.0947 LearningRate 0.000008 Epoch: 36 Global Step: 764470 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:24:30,990-Speed 2496.88 samples/sec Loss 1.0684 LearningRate 0.000008 Epoch: 36 Global Step: 764480 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:24:39,193-Speed 2497.02 samples/sec Loss 1.0583 LearningRate 0.000008 Epoch: 36 Global Step: 764490 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:24:47,400-Speed 2495.98 samples/sec Loss 1.0976 LearningRate 0.000008 Epoch: 36 Global Step: 764500 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:24:55,602-Speed 2497.57 samples/sec Loss 1.1074 LearningRate 0.000008 Epoch: 36 Global Step: 764510 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:25:03,807-Speed 2496.64 samples/sec Loss 1.0970 LearningRate 0.000008 Epoch: 36 Global Step: 764520 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:25:11,972-Speed 2508.36 samples/sec Loss 1.0946 LearningRate 0.000008 Epoch: 36 Global Step: 764530 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:25:20,176-Speed 2496.80 samples/sec Loss 1.0899 LearningRate 0.000008 Epoch: 36 Global Step: 764540 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:25:28,381-Speed 2496.60 samples/sec Loss 1.0718 LearningRate 0.000008 Epoch: 36 Global Step: 764550 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:25:36,584-Speed 2496.83 samples/sec Loss 1.0573 LearningRate 0.000008 Epoch: 36 Global Step: 764560 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:25:44,799-Speed 2493.47 samples/sec Loss 1.1198 LearningRate 0.000008 Epoch: 36 Global Step: 764570 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:25:53,002-Speed 2497.13 samples/sec Loss 1.1074 LearningRate 0.000008 Epoch: 36 Global Step: 764580 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:26:01,156-Speed 2511.99 samples/sec Loss 1.0571 LearningRate 0.000008 Epoch: 36 Global Step: 764590 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:26:09,362-Speed 2496.00 samples/sec Loss 1.0460 LearningRate 0.000008 Epoch: 36 Global Step: 764600 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:26:17,565-Speed 2497.15 samples/sec Loss 1.0867 LearningRate 0.000008 Epoch: 36 Global Step: 764610 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:26:25,772-Speed 2496.14 samples/sec Loss 1.1162 LearningRate 0.000008 Epoch: 36 Global Step: 764620 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:26:33,977-Speed 2496.26 samples/sec Loss 1.0888 LearningRate 0.000008 Epoch: 36 Global Step: 764630 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:26:42,182-Speed 2496.47 samples/sec Loss 1.1033 LearningRate 0.000008 Epoch: 36 Global Step: 764640 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:26:50,333-Speed 2512.88 samples/sec Loss 1.0892 LearningRate 0.000008 Epoch: 36 Global Step: 764650 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:26:58,538-Speed 2496.65 samples/sec Loss 1.0700 LearningRate 0.000008 Epoch: 36 Global Step: 764660 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:27:06,741-Speed 2496.82 samples/sec Loss 1.0792 LearningRate 0.000008 Epoch: 36 Global Step: 764670 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:27:14,947-Speed 2496.22 samples/sec Loss 1.0608 LearningRate 0.000008 Epoch: 36 Global Step: 764680 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:27:23,150-Speed 2496.94 samples/sec Loss 1.0929 LearningRate 0.000008 Epoch: 36 Global Step: 764690 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:27:31,351-Speed 2497.49 samples/sec Loss 1.0803 LearningRate 0.000008 Epoch: 36 Global Step: 764700 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:27:39,500-Speed 2513.47 samples/sec Loss 1.0699 LearningRate 0.000008 Epoch: 36 Global Step: 764710 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:27:47,702-Speed 2498.02 samples/sec Loss 1.1016 LearningRate 0.000008 Epoch: 36 Global Step: 764720 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:27:55,908-Speed 2496.35 samples/sec Loss 1.1059 LearningRate 0.000008 Epoch: 36 Global Step: 764730 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:28:04,111-Speed 2496.98 samples/sec Loss 1.1034 LearningRate 0.000008 Epoch: 36 Global Step: 764740 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:28:12,312-Speed 2497.61 samples/sec Loss 1.1214 LearningRate 0.000008 Epoch: 36 Global Step: 764750 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:28:20,517-Speed 2496.68 samples/sec Loss 1.0734 LearningRate 0.000008 Epoch: 36 Global Step: 764760 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:28:28,672-Speed 2511.67 samples/sec Loss 1.0659 LearningRate 0.000008 Epoch: 36 Global Step: 764770 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:28:36,879-Speed 2495.69 samples/sec Loss 1.1010 LearningRate 0.000008 Epoch: 36 Global Step: 764780 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:28:45,083-Speed 2496.74 samples/sec Loss 1.0999 LearningRate 0.000008 Epoch: 36 Global Step: 764790 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:28:53,285-Speed 2497.12 samples/sec Loss 1.0663 LearningRate 0.000008 Epoch: 36 Global Step: 764800 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:29:01,497-Speed 2494.46 samples/sec Loss 1.0796 LearningRate 0.000008 Epoch: 36 Global Step: 764810 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:29:09,702-Speed 2496.51 samples/sec Loss 1.0999 LearningRate 0.000008 Epoch: 36 Global Step: 764820 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:29:17,851-Speed 2513.50 samples/sec Loss 1.0982 LearningRate 0.000008 Epoch: 36 Global Step: 764830 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:29:26,055-Speed 2496.89 samples/sec Loss 1.0763 LearningRate 0.000008 Epoch: 36 Global Step: 764840 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:29:34,263-Speed 2495.91 samples/sec Loss 1.0785 LearningRate 0.000008 Epoch: 36 Global Step: 764850 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:29:42,471-Speed 2495.32 samples/sec Loss 1.0823 LearningRate 0.000008 Epoch: 36 Global Step: 764860 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:29:50,680-Speed 2494.97 samples/sec Loss 1.1013 LearningRate 0.000008 Epoch: 36 Global Step: 764870 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:29:58,888-Speed 2495.69 samples/sec Loss 1.0833 LearningRate 0.000008 Epoch: 36 Global Step: 764880 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:30:07,043-Speed 2511.79 samples/sec Loss 1.0596 LearningRate 0.000008 Epoch: 36 Global Step: 764890 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:30:15,248-Speed 2496.31 samples/sec Loss 1.0760 LearningRate 0.000008 Epoch: 36 Global Step: 764900 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:30:23,450-Speed 2497.44 samples/sec Loss 1.0947 LearningRate 0.000007 Epoch: 36 Global Step: 764910 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:30:31,654-Speed 2496.73 samples/sec Loss 1.0903 LearningRate 0.000007 Epoch: 36 Global Step: 764920 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:30:39,867-Speed 2494.03 samples/sec Loss 1.1155 LearningRate 0.000007 Epoch: 36 Global Step: 764930 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:30:48,069-Speed 2497.21 samples/sec Loss 1.0647 LearningRate 0.000007 Epoch: 36 Global Step: 764940 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:30:56,222-Speed 2512.48 samples/sec Loss 1.0865 LearningRate 0.000007 Epoch: 36 Global Step: 764950 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:31:04,427-Speed 2496.40 samples/sec Loss 1.0672 LearningRate 0.000007 Epoch: 36 Global Step: 764960 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:31:12,629-Speed 2497.31 samples/sec Loss 1.0710 LearningRate 0.000007 Epoch: 36 Global Step: 764970 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:31:20,843-Speed 2493.47 samples/sec Loss 1.1038 LearningRate 0.000007 Epoch: 36 Global Step: 764980 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:31:29,044-Speed 2497.98 samples/sec Loss 1.0896 LearningRate 0.000007 Epoch: 36 Global Step: 764990 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:31:37,246-Speed 2497.48 samples/sec Loss 1.1193 LearningRate 0.000007 Epoch: 36 Global Step: 765000 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:31:45,398-Speed 2512.62 samples/sec Loss 1.0876 LearningRate 0.000007 Epoch: 36 Global Step: 765010 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:31:53,601-Speed 2497.09 samples/sec Loss 1.0880 LearningRate 0.000007 Epoch: 36 Global Step: 765020 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:32:01,804-Speed 2496.89 samples/sec Loss 1.0996 LearningRate 0.000007 Epoch: 36 Global Step: 765030 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:32:10,005-Speed 2497.70 samples/sec Loss 1.0607 LearningRate 0.000007 Epoch: 36 Global Step: 765040 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:32:18,207-Speed 2497.38 samples/sec Loss 1.0817 LearningRate 0.000007 Epoch: 36 Global Step: 765050 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:32:26,412-Speed 2496.16 samples/sec Loss 1.0507 LearningRate 0.000007 Epoch: 36 Global Step: 765060 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:32:34,563-Speed 2513.19 samples/sec Loss 1.0955 LearningRate 0.000007 Epoch: 36 Global Step: 765070 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:32:42,768-Speed 2496.34 samples/sec Loss 1.0795 LearningRate 0.000007 Epoch: 36 Global Step: 765080 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:32:50,972-Speed 2496.78 samples/sec Loss 1.1265 LearningRate 0.000007 Epoch: 36 Global Step: 765090 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:32:59,173-Speed 2497.54 samples/sec Loss 1.0857 LearningRate 0.000007 Epoch: 36 Global Step: 765100 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:33:07,383-Speed 2495.16 samples/sec Loss 1.0951 LearningRate 0.000007 Epoch: 36 Global Step: 765110 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:33:15,585-Speed 2497.60 samples/sec Loss 1.0330 LearningRate 0.000007 Epoch: 36 Global Step: 765120 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:33:23,735-Speed 2513.24 samples/sec Loss 1.0971 LearningRate 0.000007 Epoch: 36 Global Step: 765130 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:33:31,940-Speed 2496.58 samples/sec Loss 1.0811 LearningRate 0.000007 Epoch: 36 Global Step: 765140 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:33:40,143-Speed 2497.01 samples/sec Loss 1.0798 LearningRate 0.000007 Epoch: 36 Global Step: 765150 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:33:48,344-Speed 2497.53 samples/sec Loss 1.0520 LearningRate 0.000007 Epoch: 36 Global Step: 765160 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:33:56,551-Speed 2495.88 samples/sec Loss 1.1155 LearningRate 0.000007 Epoch: 36 Global Step: 765170 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:34:04,756-Speed 2496.48 samples/sec Loss 1.0900 LearningRate 0.000007 Epoch: 36 Global Step: 765180 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:34:12,907-Speed 2512.90 samples/sec Loss 1.0977 LearningRate 0.000007 Epoch: 36 Global Step: 765190 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:34:21,110-Speed 2497.18 samples/sec Loss 1.1001 LearningRate 0.000007 Epoch: 36 Global Step: 765200 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:34:29,317-Speed 2495.86 samples/sec Loss 1.0919 LearningRate 0.000007 Epoch: 36 Global Step: 765210 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:34:37,522-Speed 2496.49 samples/sec Loss 1.0813 LearningRate 0.000007 Epoch: 36 Global Step: 765220 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:34:45,723-Speed 2497.64 samples/sec Loss 1.1021 LearningRate 0.000007 Epoch: 36 Global Step: 765230 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:34:53,926-Speed 2496.87 samples/sec Loss 1.0781 LearningRate 0.000007 Epoch: 36 Global Step: 765240 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:35:02,075-Speed 2513.58 samples/sec Loss 1.0814 LearningRate 0.000007 Epoch: 36 Global Step: 765250 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:35:10,286-Speed 2494.90 samples/sec Loss 1.1001 LearningRate 0.000007 Epoch: 36 Global Step: 765260 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:35:18,491-Speed 2496.38 samples/sec Loss 1.0842 LearningRate 0.000007 Epoch: 36 Global Step: 765270 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:35:26,709-Speed 2492.41 samples/sec Loss 1.0793 LearningRate 0.000007 Epoch: 36 Global Step: 765280 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:35:34,913-Speed 2496.98 samples/sec Loss 1.0823 LearningRate 0.000007 Epoch: 36 Global Step: 765290 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:35:43,119-Speed 2496.17 samples/sec Loss 1.0981 LearningRate 0.000007 Epoch: 36 Global Step: 765300 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:35:51,271-Speed 2512.68 samples/sec Loss 1.0906 LearningRate 0.000007 Epoch: 36 Global Step: 765310 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:35:59,473-Speed 2497.47 samples/sec Loss 1.1235 LearningRate 0.000007 Epoch: 36 Global Step: 765320 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:36:07,679-Speed 2495.92 samples/sec Loss 1.0963 LearningRate 0.000007 Epoch: 36 Global Step: 765330 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:36:15,881-Speed 2497.43 samples/sec Loss 1.0774 LearningRate 0.000007 Epoch: 36 Global Step: 765340 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:36:24,089-Speed 2495.53 samples/sec Loss 1.0676 LearningRate 0.000007 Epoch: 36 Global Step: 765350 Fp16 Grad Scale: 32768 Required: 15 hours 
Training: 2022-07-12 21:36:32,290-Speed 2497.28 samples/sec Loss 1.0701 LearningRate 0.000007 Epoch: 36 Global Step: 765360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:36:40,440-Speed 2513.49 samples/sec Loss 1.0937 LearningRate 0.000007 Epoch: 36 Global Step: 765370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-07-12 21:36:48,600-Speed 2510.27 samples/sec Loss 1.0776 LearningRate 0.000007 Epoch: 36 Global Step: 765380 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:36:56,801-Speed 2497.64 samples/sec Loss 1.0853 LearningRate 0.000007 Epoch: 36 Global Step: 765390 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:37:05,008-Speed 2495.93 samples/sec Loss 1.1047 LearningRate 0.000007 Epoch: 36 Global Step: 765400 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:37:13,214-Speed 2496.28 samples/sec Loss 1.0948 LearningRate 0.000007 Epoch: 36 Global Step: 765410 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:37:21,418-Speed 2496.80 samples/sec Loss 1.0769 LearningRate 0.000007 Epoch: 36 Global Step: 765420 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:37:29,574-Speed 2511.49 samples/sec Loss 1.0964 LearningRate 0.000007 Epoch: 36 Global Step: 765430 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:37:37,778-Speed 2496.64 samples/sec Loss 1.0873 LearningRate 0.000007 Epoch: 36 Global Step: 765440 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:37:45,980-Speed 2497.33 samples/sec Loss 1.0828 LearningRate 0.000007 Epoch: 36 Global Step: 765450 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:37:54,189-Speed 2495.13 samples/sec Loss 1.0881 LearningRate 0.000007 Epoch: 36 Global Step: 765460 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:38:02,395-Speed 2495.99 samples/sec Loss 1.0728 LearningRate 0.000007 Epoch: 36 Global Step: 765470 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:38:10,600-Speed 2496.32 samples/sec Loss 1.0992 LearningRate 0.000007 Epoch: 36 Global Step: 765480 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:38:18,752-Speed 2512.82 samples/sec Loss 1.0743 LearningRate 0.000007 Epoch: 36 Global Step: 765490 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:38:26,959-Speed 2496.15 samples/sec Loss 1.0650 LearningRate 0.000007 Epoch: 36 Global Step: 765500 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:38:35,165-Speed 2496.24 samples/sec Loss 1.1003 LearningRate 0.000007 Epoch: 36 Global Step: 765510 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:38:43,370-Speed 2496.59 samples/sec Loss 1.0757 LearningRate 0.000007 Epoch: 36 Global Step: 765520 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:38:51,574-Speed 2496.65 samples/sec Loss 1.0876 LearningRate 0.000007 Epoch: 36 Global Step: 765530 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:38:59,782-Speed 2495.20 samples/sec Loss 1.0987 LearningRate 0.000007 Epoch: 36 Global Step: 765540 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:39:07,933-Speed 2513.10 samples/sec Loss 1.1075 LearningRate 0.000007 Epoch: 36 Global Step: 765550 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:39:16,139-Speed 2496.24 samples/sec Loss 1.0721 LearningRate 0.000007 Epoch: 36 Global Step: 765560 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:39:24,346-Speed 2496.07 samples/sec Loss 1.0806 LearningRate 0.000007 Epoch: 36 Global Step: 765570 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:39:32,553-Speed 2495.59 samples/sec Loss 1.0948 LearningRate 0.000007 Epoch: 36 Global Step: 765580 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:39:40,759-Speed 2496.22 samples/sec Loss 1.0570 LearningRate 0.000007 Epoch: 36 Global Step: 765590 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:39:48,976-Speed 2492.81 samples/sec Loss 1.0713 LearningRate 0.000007 Epoch: 36 Global Step: 765600 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:39:57,126-Speed 2513.49 samples/sec Loss 1.0780 LearningRate 0.000007 Epoch: 36 Global Step: 765610 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:40:05,333-Speed 2495.56 samples/sec Loss 1.0944 LearningRate 0.000007 Epoch: 36 Global Step: 765620 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:40:13,538-Speed 2496.46 samples/sec Loss 1.0915 LearningRate 0.000007 Epoch: 36 Global Step: 765630 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:40:21,738-Speed 2498.11 samples/sec Loss 1.0812 LearningRate 0.000007 Epoch: 36 Global Step: 765640 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:40:29,942-Speed 2496.92 samples/sec Loss 1.1032 LearningRate 0.000007 Epoch: 36 Global Step: 765650 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:40:38,144-Speed 2497.39 samples/sec Loss 1.0811 LearningRate 0.000007 Epoch: 36 Global Step: 765660 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:40:46,293-Speed 2513.70 samples/sec Loss 1.0543 LearningRate 0.000007 Epoch: 36 Global Step: 765670 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:40:54,496-Speed 2497.01 samples/sec Loss 1.0787 LearningRate 0.000007 Epoch: 36 Global Step: 765680 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:41:02,700-Speed 2496.75 samples/sec Loss 1.0797 LearningRate 0.000007 Epoch: 36 Global Step: 765690 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:41:10,905-Speed 2496.24 samples/sec Loss 1.0980 LearningRate 0.000007 Epoch: 36 Global Step: 765700 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:41:19,109-Speed 2496.99 samples/sec Loss 1.0933 LearningRate 0.000007 Epoch: 36 Global Step: 765710 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:41:27,315-Speed 2495.93 samples/sec Loss 1.0937 LearningRate 0.000007 Epoch: 36 Global Step: 765720 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:41:35,467-Speed 2512.77 samples/sec Loss 1.0706 LearningRate 0.000007 Epoch: 36 Global Step: 765730 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:41:43,671-Speed 2496.81 samples/sec Loss 1.0892 LearningRate 0.000007 Epoch: 36 Global Step: 765740 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:41:51,876-Speed 2496.48 samples/sec Loss 1.0844 LearningRate 0.000007 Epoch: 36 Global Step: 765750 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:42:00,081-Speed 2496.63 samples/sec Loss 1.1218 LearningRate 0.000007 Epoch: 36 Global Step: 765760 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:42:08,296-Speed 2493.24 samples/sec Loss 1.0533 LearningRate 0.000007 Epoch: 36 Global Step: 765770 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:42:16,501-Speed 2496.30 samples/sec Loss 1.0981 LearningRate 0.000007 Epoch: 36 Global Step: 765780 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:42:24,654-Speed 2512.32 samples/sec Loss 1.1068 LearningRate 0.000007 Epoch: 36 Global Step: 765790 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:42:32,857-Speed 2497.01 samples/sec Loss 1.0761 LearningRate 0.000007 Epoch: 36 Global Step: 765800 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:42:41,070-Speed 2494.13 samples/sec Loss 1.1007 LearningRate 0.000007 Epoch: 36 Global Step: 765810 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:42:49,274-Speed 2496.44 samples/sec Loss 1.1114 LearningRate 0.000007 Epoch: 36 Global Step: 765820 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:42:57,495-Speed 2491.59 samples/sec Loss 1.0858 LearningRate 0.000007 Epoch: 36 Global Step: 765830 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:43:05,703-Speed 2495.52 samples/sec Loss 1.0972 LearningRate 0.000007 Epoch: 36 Global Step: 765840 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:43:13,855-Speed 2512.86 samples/sec Loss 1.0816 LearningRate 0.000007 Epoch: 36 Global Step: 765850 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:43:22,060-Speed 2496.36 samples/sec Loss 1.0908 LearningRate 0.000007 Epoch: 36 Global Step: 765860 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:43:30,264-Speed 2496.71 samples/sec Loss 1.0892 LearningRate 0.000007 Epoch: 36 Global Step: 765870 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:43:38,468-Speed 2496.76 samples/sec Loss 1.0947 LearningRate 0.000007 Epoch: 36 Global Step: 765880 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:43:46,675-Speed 2495.80 samples/sec Loss 1.1062 LearningRate 0.000007 Epoch: 36 Global Step: 765890 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:43:54,880-Speed 2496.16 samples/sec Loss 1.1093 LearningRate 0.000007 Epoch: 36 Global Step: 765900 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:44:03,030-Speed 2513.29 samples/sec Loss 1.0948 LearningRate 0.000007 Epoch: 36 Global Step: 765910 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:44:11,240-Speed 2495.12 samples/sec Loss 1.0691 LearningRate 0.000007 Epoch: 36 Global Step: 765920 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:44:19,444-Speed 2496.51 samples/sec Loss 1.1115 LearningRate 0.000007 Epoch: 36 Global Step: 765930 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:44:27,654-Speed 2494.92 samples/sec Loss 1.1009 LearningRate 0.000007 Epoch: 36 Global Step: 765940 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:44:35,857-Speed 2497.17 samples/sec Loss 1.0535 LearningRate 0.000007 Epoch: 36 Global Step: 765950 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:44:44,062-Speed 2496.51 samples/sec Loss 1.1169 LearningRate 0.000007 Epoch: 36 Global Step: 765960 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:44:52,214-Speed 2512.71 samples/sec Loss 1.1317 LearningRate 0.000007 Epoch: 36 Global Step: 765970 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:45:00,422-Speed 2495.44 samples/sec Loss 1.0905 LearningRate 0.000007 Epoch: 36 Global Step: 765980 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:45:08,626-Speed 2496.71 samples/sec Loss 1.0556 LearningRate 0.000007 Epoch: 36 Global Step: 765990 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:45:16,840-Speed 2493.63 samples/sec Loss 1.0844 LearningRate 0.000007 Epoch: 36 Global Step: 766000 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:45:25,044-Speed 2497.06 samples/sec Loss 1.0928 LearningRate 0.000007 Epoch: 36 Global Step: 766010 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:45:33,250-Speed 2496.20 samples/sec Loss 1.0733 LearningRate 0.000007 Epoch: 36 Global Step: 766020 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:45:41,400-Speed 2513.18 samples/sec Loss 1.0593 LearningRate 0.000007 Epoch: 36 Global Step: 766030 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:45:49,603-Speed 2497.06 samples/sec Loss 1.0552 LearningRate 0.000007 Epoch: 36 Global Step: 766040 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:45:57,806-Speed 2497.04 samples/sec Loss 1.0629 LearningRate 0.000007 Epoch: 36 Global Step: 766050 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:46:06,008-Speed 2497.19 samples/sec Loss 1.0739 LearningRate 0.000007 Epoch: 36 Global Step: 766060 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:46:14,215-Speed 2496.16 samples/sec Loss 1.0651 LearningRate 0.000007 Epoch: 36 Global Step: 766070 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:46:22,419-Speed 2496.60 samples/sec Loss 1.0693 LearningRate 0.000007 Epoch: 36 Global Step: 766080 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:46:30,571-Speed 2512.61 samples/sec Loss 1.0841 LearningRate 0.000007 Epoch: 36 Global Step: 766090 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:46:38,778-Speed 2495.86 samples/sec Loss 1.0519 LearningRate 0.000007 Epoch: 36 Global Step: 766100 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:46:46,983-Speed 2496.50 samples/sec Loss 1.0530 LearningRate 0.000007 Epoch: 36 Global Step: 766110 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:46:55,185-Speed 2497.36 samples/sec Loss 1.0633 LearningRate 0.000007 Epoch: 36 Global Step: 766120 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:47:03,392-Speed 2496.27 samples/sec Loss 1.0501 LearningRate 0.000007 Epoch: 36 Global Step: 766130 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:47:11,595-Speed 2496.76 samples/sec Loss 1.0778 LearningRate 0.000007 Epoch: 36 Global Step: 766140 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:47:19,747-Speed 2512.95 samples/sec Loss 1.0855 LearningRate 0.000007 Epoch: 36 Global Step: 766150 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:47:27,950-Speed 2496.87 samples/sec Loss 1.0864 LearningRate 0.000007 Epoch: 36 Global Step: 766160 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:47:36,154-Speed 2496.85 samples/sec Loss 1.0810 LearningRate 0.000007 Epoch: 36 Global Step: 766170 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:47:44,362-Speed 2495.31 samples/sec Loss 1.1064 LearningRate 0.000007 Epoch: 36 Global Step: 766180 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:47:52,572-Speed 2495.12 samples/sec Loss 1.0926 LearningRate 0.000007 Epoch: 36 Global Step: 766190 Fp16 Grad Scale: 16384 Required: 15 hours 
Training: 2022-07-12 21:48:00,775-Speed 2496.84 samples/sec Loss 1.0513 LearningRate 0.000007 Epoch: 36 Global Step: 766200 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-07-12 21:48:08,926-Speed 2512.92 samples/sec Loss 1.0868 LearningRate 0.000007 Epoch: 36 Global Step: 766210 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 21:48:17,137-Speed 2494.72 samples/sec Loss 1.1158 LearningRate 0.000007 Epoch: 36 Global Step: 766220 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 21:48:25,341-Speed 2496.92 samples/sec Loss 1.0824 LearningRate 0.000007 Epoch: 36 Global Step: 766230 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 21:48:33,545-Speed 2496.78 samples/sec Loss 1.0710 LearningRate 0.000007 Epoch: 36 Global Step: 766240 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 21:48:41,752-Speed 2495.94 samples/sec Loss 1.1005 LearningRate 0.000007 Epoch: 36 Global Step: 766250 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 21:48:49,962-Speed 2494.68 samples/sec Loss 1.0760 LearningRate 0.000007 Epoch: 36 Global Step: 766260 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 21:48:58,117-Speed 2512.06 samples/sec Loss 1.0792 LearningRate 0.000007 Epoch: 36 Global Step: 766270 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 21:49:06,322-Speed 2496.06 samples/sec Loss 1.1296 LearningRate 0.000007 Epoch: 36 Global Step: 766280 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 21:49:14,494-Speed 2506.68 samples/sec Loss 1.0668 LearningRate 0.000007 Epoch: 36 Global Step: 766290 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:49:22,695-Speed 2497.76 samples/sec Loss 1.0718 LearningRate 0.000007 Epoch: 36 Global Step: 766300 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:49:30,900-Speed 2496.14 samples/sec Loss 1.0917 LearningRate 0.000007 Epoch: 36 Global Step: 766310 Fp16 Grad Scale: 8192 Required: 14 hours Training: 
2022-07-12 21:49:39,106-Speed 2496.21 samples/sec Loss 1.0752 LearningRate 0.000007 Epoch: 36 Global Step: 766320 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:49:47,256-Speed 2513.51 samples/sec Loss 1.0711 LearningRate 0.000007 Epoch: 36 Global Step: 766330 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:49:55,464-Speed 2495.71 samples/sec Loss 1.0642 LearningRate 0.000007 Epoch: 36 Global Step: 766340 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:50:03,667-Speed 2496.85 samples/sec Loss 1.0915 LearningRate 0.000007 Epoch: 36 Global Step: 766350 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:50:11,868-Speed 2497.51 samples/sec Loss 1.0876 LearningRate 0.000007 Epoch: 36 Global Step: 766360 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:50:20,070-Speed 2497.31 samples/sec Loss 1.0637 LearningRate 0.000007 Epoch: 36 Global Step: 766370 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:50:28,269-Speed 2498.24 samples/sec Loss 1.0845 LearningRate 0.000007 Epoch: 36 Global Step: 766380 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:50:36,435-Speed 2508.55 samples/sec Loss 1.0857 LearningRate 0.000007 Epoch: 36 Global Step: 766390 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:50:44,643-Speed 2495.44 samples/sec Loss 1.0770 LearningRate 0.000007 Epoch: 36 Global Step: 766400 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:50:52,848-Speed 2496.91 samples/sec Loss 1.0991 LearningRate 0.000007 Epoch: 36 Global Step: 766410 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:51:01,053-Speed 2496.12 samples/sec Loss 1.0847 LearningRate 0.000007 Epoch: 36 Global Step: 766420 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:51:09,273-Speed 2491.95 samples/sec Loss 1.0885 LearningRate 0.000007 Epoch: 36 Global Step: 766430 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 
21:51:17,476-Speed 2497.10 samples/sec Loss 1.0727 LearningRate 0.000007 Epoch: 36 Global Step: 766440 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:51:25,624-Speed 2514.15 samples/sec Loss 1.1001 LearningRate 0.000007 Epoch: 36 Global Step: 766450 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:51:33,827-Speed 2496.88 samples/sec Loss 1.0778 LearningRate 0.000007 Epoch: 36 Global Step: 766460 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:51:42,031-Speed 2496.73 samples/sec Loss 1.0915 LearningRate 0.000007 Epoch: 36 Global Step: 766470 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:51:50,235-Speed 2497.06 samples/sec Loss 1.0585 LearningRate 0.000007 Epoch: 36 Global Step: 766480 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:51:58,445-Speed 2494.91 samples/sec Loss 1.1090 LearningRate 0.000007 Epoch: 36 Global Step: 766490 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:52:06,651-Speed 2495.98 samples/sec Loss 1.0666 LearningRate 0.000007 Epoch: 36 Global Step: 766500 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:52:14,800-Speed 2513.46 samples/sec Loss 1.0902 LearningRate 0.000007 Epoch: 36 Global Step: 766510 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:52:23,003-Speed 2497.01 samples/sec Loss 1.1101 LearningRate 0.000007 Epoch: 36 Global Step: 766520 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:52:31,208-Speed 2496.57 samples/sec Loss 1.0903 LearningRate 0.000007 Epoch: 36 Global Step: 766530 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:52:39,411-Speed 2496.82 samples/sec Loss 1.1047 LearningRate 0.000007 Epoch: 36 Global Step: 766540 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:52:47,616-Speed 2496.63 samples/sec Loss 1.0853 LearningRate 0.000007 Epoch: 36 Global Step: 766550 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:52:55,816-Speed 
2497.78 samples/sec Loss 1.0377 LearningRate 0.000007 Epoch: 36 Global Step: 766560 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:53:03,967-Speed 2513.00 samples/sec Loss 1.0783 LearningRate 0.000007 Epoch: 36 Global Step: 766570 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:53:12,181-Speed 2493.83 samples/sec Loss 1.0861 LearningRate 0.000007 Epoch: 36 Global Step: 766580 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:53:20,383-Speed 2497.26 samples/sec Loss 1.0980 LearningRate 0.000007 Epoch: 36 Global Step: 766590 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:53:28,588-Speed 2496.62 samples/sec Loss 1.0685 LearningRate 0.000007 Epoch: 36 Global Step: 766600 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:53:36,792-Speed 2496.70 samples/sec Loss 1.1126 LearningRate 0.000007 Epoch: 36 Global Step: 766610 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:53:44,996-Speed 2496.56 samples/sec Loss 1.0884 LearningRate 0.000007 Epoch: 36 Global Step: 766620 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:53:53,148-Speed 2512.60 samples/sec Loss 1.0850 LearningRate 0.000007 Epoch: 36 Global Step: 766630 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:54:01,350-Speed 2497.30 samples/sec Loss 1.0436 LearningRate 0.000007 Epoch: 36 Global Step: 766640 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:54:09,564-Speed 2493.83 samples/sec Loss 1.0827 LearningRate 0.000007 Epoch: 36 Global Step: 766650 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:54:17,768-Speed 2496.59 samples/sec Loss 1.1134 LearningRate 0.000007 Epoch: 36 Global Step: 766660 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:54:25,972-Speed 2496.93 samples/sec Loss 1.0969 LearningRate 0.000007 Epoch: 36 Global Step: 766670 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:54:34,187-Speed 2493.52 samples/sec 
Loss 1.1010 LearningRate 0.000007 Epoch: 36 Global Step: 766680 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:54:42,337-Speed 2513.16 samples/sec Loss 1.1047 LearningRate 0.000007 Epoch: 36 Global Step: 766690 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:54:50,541-Speed 2496.57 samples/sec Loss 1.0844 LearningRate 0.000007 Epoch: 36 Global Step: 766700 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:54:58,745-Speed 2497.01 samples/sec Loss 1.0879 LearningRate 0.000007 Epoch: 36 Global Step: 766710 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:55:06,952-Speed 2495.83 samples/sec Loss 1.0657 LearningRate 0.000007 Epoch: 36 Global Step: 766720 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:55:15,166-Speed 2493.60 samples/sec Loss 1.0807 LearningRate 0.000007 Epoch: 36 Global Step: 766730 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:55:23,366-Speed 2497.71 samples/sec Loss 1.0822 LearningRate 0.000007 Epoch: 36 Global Step: 766740 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:55:31,519-Speed 2513.04 samples/sec Loss 1.1009 LearningRate 0.000007 Epoch: 36 Global Step: 766750 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:55:39,720-Speed 2497.84 samples/sec Loss 1.1139 LearningRate 0.000007 Epoch: 36 Global Step: 766760 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:55:47,924-Speed 2496.78 samples/sec Loss 1.1060 LearningRate 0.000007 Epoch: 36 Global Step: 766770 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:55:56,126-Speed 2497.42 samples/sec Loss 1.1120 LearningRate 0.000007 Epoch: 36 Global Step: 766780 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:56:04,329-Speed 2497.09 samples/sec Loss 1.1067 LearningRate 0.000007 Epoch: 36 Global Step: 766790 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:56:12,531-Speed 2497.45 samples/sec Loss 1.0990 
LearningRate 0.000007 Epoch: 36 Global Step: 766800 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:56:20,681-Speed 2513.16 samples/sec Loss 1.1125 LearningRate 0.000007 Epoch: 36 Global Step: 766810 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:56:28,885-Speed 2496.73 samples/sec Loss 1.0715 LearningRate 0.000007 Epoch: 36 Global Step: 766820 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:56:37,088-Speed 2497.18 samples/sec Loss 1.0861 LearningRate 0.000007 Epoch: 36 Global Step: 766830 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:56:45,291-Speed 2496.90 samples/sec Loss 1.0659 LearningRate 0.000007 Epoch: 36 Global Step: 766840 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:56:53,496-Speed 2496.70 samples/sec Loss 1.1216 LearningRate 0.000007 Epoch: 36 Global Step: 766850 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:57:01,698-Speed 2497.07 samples/sec Loss 1.0905 LearningRate 0.000007 Epoch: 36 Global Step: 766860 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:57:09,867-Speed 2507.73 samples/sec Loss 1.1055 LearningRate 0.000007 Epoch: 36 Global Step: 766870 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:57:18,069-Speed 2497.24 samples/sec Loss 1.0557 LearningRate 0.000007 Epoch: 36 Global Step: 766880 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:57:26,272-Speed 2497.00 samples/sec Loss 1.0841 LearningRate 0.000007 Epoch: 36 Global Step: 766890 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:57:34,477-Speed 2496.22 samples/sec Loss 1.0748 LearningRate 0.000007 Epoch: 36 Global Step: 766900 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:57:42,677-Speed 2498.11 samples/sec Loss 1.0900 LearningRate 0.000007 Epoch: 36 Global Step: 766910 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:57:50,904-Speed 2489.53 samples/sec Loss 1.0917 LearningRate 
0.000007 Epoch: 36 Global Step: 766920 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:57:59,053-Speed 2513.59 samples/sec Loss 1.0876 LearningRate 0.000007 Epoch: 36 Global Step: 766930 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:58:07,255-Speed 2497.22 samples/sec Loss 1.0756 LearningRate 0.000007 Epoch: 36 Global Step: 766940 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:58:15,458-Speed 2497.27 samples/sec Loss 1.0992 LearningRate 0.000007 Epoch: 36 Global Step: 766950 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:58:23,662-Speed 2496.92 samples/sec Loss 1.0988 LearningRate 0.000007 Epoch: 36 Global Step: 766960 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:58:31,866-Speed 2496.54 samples/sec Loss 1.1034 LearningRate 0.000007 Epoch: 36 Global Step: 766970 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:58:40,069-Speed 2497.11 samples/sec Loss 1.0808 LearningRate 0.000007 Epoch: 36 Global Step: 766980 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:58:48,218-Speed 2513.73 samples/sec Loss 1.0814 LearningRate 0.000007 Epoch: 36 Global Step: 766990 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:58:56,422-Speed 2496.74 samples/sec Loss 1.0892 LearningRate 0.000007 Epoch: 36 Global Step: 767000 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:59:04,625-Speed 2496.62 samples/sec Loss 1.0928 LearningRate 0.000007 Epoch: 36 Global Step: 767010 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:59:12,842-Speed 2492.92 samples/sec Loss 1.0864 LearningRate 0.000007 Epoch: 36 Global Step: 767020 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:59:21,049-Speed 2495.96 samples/sec Loss 1.1001 LearningRate 0.000007 Epoch: 36 Global Step: 767030 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:59:29,254-Speed 2496.61 samples/sec Loss 1.1016 LearningRate 0.000007 Epoch: 36 
Global Step: 767040 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:59:37,405-Speed 2513.12 samples/sec Loss 1.1112 LearningRate 0.000007 Epoch: 36 Global Step: 767050 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:59:45,607-Speed 2497.34 samples/sec Loss 1.1129 LearningRate 0.000007 Epoch: 36 Global Step: 767060 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 21:59:53,809-Speed 2497.42 samples/sec Loss 1.0828 LearningRate 0.000007 Epoch: 36 Global Step: 767070 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:00:02,013-Speed 2496.57 samples/sec Loss 1.0756 LearningRate 0.000007 Epoch: 36 Global Step: 767080 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:00:10,219-Speed 2496.11 samples/sec Loss 1.1078 LearningRate 0.000007 Epoch: 36 Global Step: 767090 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:00:18,426-Speed 2495.92 samples/sec Loss 1.1067 LearningRate 0.000007 Epoch: 36 Global Step: 767100 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:00:26,578-Speed 2512.86 samples/sec Loss 1.0707 LearningRate 0.000007 Epoch: 36 Global Step: 767110 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:00:34,786-Speed 2495.16 samples/sec Loss 1.0907 LearningRate 0.000007 Epoch: 36 Global Step: 767120 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:00:42,990-Speed 2496.78 samples/sec Loss 1.0742 LearningRate 0.000007 Epoch: 36 Global Step: 767130 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:00:51,196-Speed 2496.35 samples/sec Loss 1.0781 LearningRate 0.000007 Epoch: 36 Global Step: 767140 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:00:59,401-Speed 2496.36 samples/sec Loss 1.0724 LearningRate 0.000007 Epoch: 36 Global Step: 767150 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:01:07,615-Speed 2493.68 samples/sec Loss 1.0845 LearningRate 0.000007 Epoch: 36 Global Step: 767160 
Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:01:15,773-Speed 2510.84 samples/sec Loss 1.0852 LearningRate 0.000007 Epoch: 36 Global Step: 767170 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:01:23,978-Speed 2496.59 samples/sec Loss 1.1065 LearningRate 0.000007 Epoch: 36 Global Step: 767180 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:01:32,180-Speed 2497.45 samples/sec Loss 1.0815 LearningRate 0.000007 Epoch: 36 Global Step: 767190 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:01:40,389-Speed 2495.18 samples/sec Loss 1.0763 LearningRate 0.000007 Epoch: 36 Global Step: 767200 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:01:48,592-Speed 2496.89 samples/sec Loss 1.0878 LearningRate 0.000007 Epoch: 36 Global Step: 767210 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:01:56,795-Speed 2497.18 samples/sec Loss 1.0897 LearningRate 0.000007 Epoch: 36 Global Step: 767220 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:02:04,957-Speed 2509.70 samples/sec Loss 1.0650 LearningRate 0.000007 Epoch: 36 Global Step: 767230 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:02:13,159-Speed 2497.10 samples/sec Loss 1.0827 LearningRate 0.000007 Epoch: 36 Global Step: 767240 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:02:21,367-Speed 2495.53 samples/sec Loss 1.0996 LearningRate 0.000007 Epoch: 36 Global Step: 767250 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:02:29,564-Speed 2498.98 samples/sec Loss 1.1000 LearningRate 0.000007 Epoch: 36 Global Step: 767260 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:02:37,764-Speed 2497.91 samples/sec Loss 1.0777 LearningRate 0.000007 Epoch: 36 Global Step: 767270 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:02:45,967-Speed 2496.97 samples/sec Loss 1.0948 LearningRate 0.000007 Epoch: 36 Global Step: 767280 Fp16 Grad Scale: 
8192 Required: 14 hours Training: 2022-07-12 22:02:54,116-Speed 2513.38 samples/sec Loss 1.0897 LearningRate 0.000007 Epoch: 36 Global Step: 767290 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:03:02,321-Speed 2496.69 samples/sec Loss 1.0851 LearningRate 0.000007 Epoch: 36 Global Step: 767300 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:03:10,523-Speed 2498.00 samples/sec Loss 1.1195 LearningRate 0.000007 Epoch: 36 Global Step: 767310 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:03:18,727-Speed 2496.64 samples/sec Loss 1.0866 LearningRate 0.000007 Epoch: 36 Global Step: 767320 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:03:26,930-Speed 2497.15 samples/sec Loss 1.0965 LearningRate 0.000007 Epoch: 36 Global Step: 767330 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:03:35,133-Speed 2497.06 samples/sec Loss 1.0755 LearningRate 0.000007 Epoch: 36 Global Step: 767340 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:03:43,296-Speed 2509.09 samples/sec Loss 1.1142 LearningRate 0.000007 Epoch: 36 Global Step: 767350 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:03:51,499-Speed 2496.92 samples/sec Loss 1.0733 LearningRate 0.000007 Epoch: 36 Global Step: 767360 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:04:02,167-Speed 1920.11 samples/sec Loss 1.1237 LearningRate 0.000007 Epoch: 37 Global Step: 767370 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:04:10,368-Speed 2498.09 samples/sec Loss 1.0839 LearningRate 0.000007 Epoch: 37 Global Step: 767380 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:04:18,573-Speed 2496.54 samples/sec Loss 1.0768 LearningRate 0.000007 Epoch: 37 Global Step: 767390 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:04:26,775-Speed 2497.21 samples/sec Loss 1.0936 LearningRate 0.000007 Epoch: 37 Global Step: 767400 Fp16 Grad Scale: 8192 Required: 14 
hours Training: 2022-07-12 22:04:34,929-Speed 2512.14 samples/sec Loss 1.0834 LearningRate 0.000007 Epoch: 37 Global Step: 767410 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:04:43,136-Speed 2495.67 samples/sec Loss 1.0828 LearningRate 0.000007 Epoch: 37 Global Step: 767420 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:04:51,351-Speed 2493.74 samples/sec Loss 1.0841 LearningRate 0.000007 Epoch: 37 Global Step: 767430 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:04:59,555-Speed 2497.00 samples/sec Loss 1.1115 LearningRate 0.000007 Epoch: 37 Global Step: 767440 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:05:07,759-Speed 2496.82 samples/sec Loss 1.0935 LearningRate 0.000007 Epoch: 37 Global Step: 767450 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:05:15,962-Speed 2496.80 samples/sec Loss 1.0875 LearningRate 0.000007 Epoch: 37 Global Step: 767460 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:05:24,117-Speed 2511.97 samples/sec Loss 1.0706 LearningRate 0.000007 Epoch: 37 Global Step: 767470 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:05:32,319-Speed 2497.40 samples/sec Loss 1.1101 LearningRate 0.000007 Epoch: 37 Global Step: 767480 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-07-12 22:05:40,528-Speed 2494.94 samples/sec Loss 1.0988 LearningRate 0.000007 Epoch: 37 Global Step: 767490 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:05:48,747-Speed 2492.19 samples/sec Loss 1.0840 LearningRate 0.000007 Epoch: 37 Global Step: 767500 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:05:56,955-Speed 2495.78 samples/sec Loss 1.0650 LearningRate 0.000007 Epoch: 37 Global Step: 767510 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:06:05,158-Speed 2496.75 samples/sec Loss 1.0840 LearningRate 0.000007 Epoch: 37 Global Step: 767520 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-07-12 22:06:13,310-Speed 2512.73 samples/sec Loss 1.0704 LearningRate 0.000007 Epoch: 37 Global Step: 767530 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:06:21,515-Speed 2496.51 samples/sec Loss 1.0485 LearningRate 0.000007 Epoch: 37 Global Step: 767540 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:06:29,722-Speed 2495.85 samples/sec Loss 1.0968 LearningRate 0.000007 Epoch: 37 Global Step: 767550 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:06:37,928-Speed 2496.12 samples/sec Loss 1.0724 LearningRate 0.000007 Epoch: 37 Global Step: 767560 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:06:46,129-Speed 2497.66 samples/sec Loss 1.0746 LearningRate 0.000007 Epoch: 37 Global Step: 767570 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:06:54,334-Speed 2496.43 samples/sec Loss 1.0967 LearningRate 0.000007 Epoch: 37 Global Step: 767580 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:07:02,488-Speed 2512.29 samples/sec Loss 1.0959 LearningRate 0.000007 Epoch: 37 Global Step: 767590 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:07:10,693-Speed 2496.70 samples/sec Loss 1.0750 LearningRate 0.000007 Epoch: 37 Global Step: 767600 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:07:18,896-Speed 2497.06 samples/sec Loss 1.0896 LearningRate 0.000007 Epoch: 37 Global Step: 767610 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:07:27,097-Speed 2497.83 samples/sec Loss 1.0792 LearningRate 0.000007 Epoch: 37 Global Step: 767620 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:07:35,303-Speed 2496.19 samples/sec Loss 1.0611 LearningRate 0.000007 Epoch: 37 Global Step: 767630 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:07:43,505-Speed 2497.46 samples/sec Loss 1.0818 LearningRate 0.000007 Epoch: 37 Global Step: 767640 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-07-12 22:07:51,652-Speed 2514.21 samples/sec Loss 1.0620 LearningRate 0.000007 Epoch: 37 Global Step: 767650 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:07:59,861-Speed 2495.02 samples/sec Loss 1.0817 LearningRate 0.000007 Epoch: 37 Global Step: 767660 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:08:08,066-Speed 2496.65 samples/sec Loss 1.0923 LearningRate 0.000007 Epoch: 37 Global Step: 767670 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:08:16,270-Speed 2496.86 samples/sec Loss 1.0561 LearningRate 0.000007 Epoch: 37 Global Step: 767680 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:08:24,473-Speed 2496.88 samples/sec Loss 1.0832 LearningRate 0.000007 Epoch: 37 Global Step: 767690 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:08:32,676-Speed 2496.80 samples/sec Loss 1.0983 LearningRate 0.000007 Epoch: 37 Global Step: 767700 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:08:40,832-Speed 2511.62 samples/sec Loss 1.0597 LearningRate 0.000007 Epoch: 37 Global Step: 767710 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:08:49,040-Speed 2495.34 samples/sec Loss 1.0655 LearningRate 0.000007 Epoch: 37 Global Step: 767720 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:08:57,248-Speed 2495.59 samples/sec Loss 1.1031 LearningRate 0.000007 Epoch: 37 Global Step: 767730 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:09:05,454-Speed 2496.26 samples/sec Loss 1.0819 LearningRate 0.000007 Epoch: 37 Global Step: 767740 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:09:13,658-Speed 2497.03 samples/sec Loss 1.0560 LearningRate 0.000007 Epoch: 37 Global Step: 767750 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:09:21,864-Speed 2495.92 samples/sec Loss 1.0895 LearningRate 0.000007 Epoch: 37 Global Step: 767760 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-07-12 22:09:30,014-Speed 2513.27 samples/sec Loss 1.0716 LearningRate 0.000007 Epoch: 37 Global Step: 767770 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:09:38,216-Speed 2497.42 samples/sec Loss 1.0775 LearningRate 0.000007 Epoch: 37 Global Step: 767780 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:09:46,421-Speed 2496.35 samples/sec Loss 1.1107 LearningRate 0.000007 Epoch: 37 Global Step: 767790 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:09:54,626-Speed 2496.39 samples/sec Loss 1.0333 LearningRate 0.000007 Epoch: 37 Global Step: 767800 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:10:02,829-Speed 2497.09 samples/sec Loss 1.0641 LearningRate 0.000007 Epoch: 37 Global Step: 767810 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:10:11,036-Speed 2495.84 samples/sec Loss 1.0840 LearningRate 0.000007 Epoch: 37 Global Step: 767820 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:10:19,187-Speed 2512.95 samples/sec Loss 1.0924 LearningRate 0.000007 Epoch: 37 Global Step: 767830 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:10:27,393-Speed 2496.07 samples/sec Loss 1.0765 LearningRate 0.000007 Epoch: 37 Global Step: 767840 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:10:35,597-Speed 2496.99 samples/sec Loss 1.0826 LearningRate 0.000007 Epoch: 37 Global Step: 767850 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:10:43,803-Speed 2495.99 samples/sec Loss 1.0868 LearningRate 0.000007 Epoch: 37 Global Step: 767860 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:10:52,006-Speed 2497.26 samples/sec Loss 1.0696 LearningRate 0.000007 Epoch: 37 Global Step: 767870 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:11:00,211-Speed 2496.54 samples/sec Loss 1.0654 LearningRate 0.000007 Epoch: 37 Global Step: 767880 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-07-12 22:11:08,359-Speed 2513.72 samples/sec Loss 1.0888 LearningRate 0.000007 Epoch: 37 Global Step: 767890 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:11:16,565-Speed 2496.11 samples/sec Loss 1.0650 LearningRate 0.000007 Epoch: 37 Global Step: 767900 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:11:24,772-Speed 2495.96 samples/sec Loss 1.0831 LearningRate 0.000007 Epoch: 37 Global Step: 767910 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:11:32,978-Speed 2495.94 samples/sec Loss 1.0424 LearningRate 0.000007 Epoch: 37 Global Step: 767920 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:11:41,185-Speed 2495.89 samples/sec Loss 1.0952 LearningRate 0.000007 Epoch: 37 Global Step: 767930 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:11:49,387-Speed 2497.33 samples/sec Loss 1.0752 LearningRate 0.000007 Epoch: 37 Global Step: 767940 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:11:57,538-Speed 2513.02 samples/sec Loss 1.0933 LearningRate 0.000007 Epoch: 37 Global Step: 767950 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:12:05,745-Speed 2495.90 samples/sec Loss 1.0775 LearningRate 0.000007 Epoch: 37 Global Step: 767960 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:12:13,951-Speed 2496.26 samples/sec Loss 1.0464 LearningRate 0.000007 Epoch: 37 Global Step: 767970 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:12:22,152-Speed 2497.57 samples/sec Loss 1.0645 LearningRate 0.000007 Epoch: 37 Global Step: 767980 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:12:30,356-Speed 2496.92 samples/sec Loss 1.0798 LearningRate 0.000007 Epoch: 37 Global Step: 767990 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:12:38,557-Speed 2497.43 samples/sec Loss 1.0683 LearningRate 0.000007 Epoch: 37 Global Step: 768000 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-07-12 22:12:46,734-Speed 2505.20 samples/sec Loss 1.0722 LearningRate 0.000007 Epoch: 37 Global Step: 768010 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:12:54,941-Speed 2495.52 samples/sec Loss 1.0882 LearningRate 0.000007 Epoch: 37 Global Step: 768020 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:13:03,149-Speed 2495.61 samples/sec Loss 1.0916 LearningRate 0.000007 Epoch: 37 Global Step: 768030 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:13:11,355-Speed 2495.96 samples/sec Loss 1.0729 LearningRate 0.000007 Epoch: 37 Global Step: 768040 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:13:19,560-Speed 2496.59 samples/sec Loss 1.0811 LearningRate 0.000007 Epoch: 37 Global Step: 768050 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:13:27,764-Speed 2496.73 samples/sec Loss 1.1098 LearningRate 0.000007 Epoch: 37 Global Step: 768060 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:13:35,916-Speed 2512.77 samples/sec Loss 1.0841 LearningRate 0.000007 Epoch: 37 Global Step: 768070 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:13:44,130-Speed 2493.67 samples/sec Loss 1.0950 LearningRate 0.000007 Epoch: 37 Global Step: 768080 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:13:52,333-Speed 2497.41 samples/sec Loss 1.0885 LearningRate 0.000007 Epoch: 37 Global Step: 768090 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:14:00,535-Speed 2497.45 samples/sec Loss 1.0787 LearningRate 0.000007 Epoch: 37 Global Step: 768100 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:14:08,744-Speed 2495.66 samples/sec Loss 1.0616 LearningRate 0.000007 Epoch: 37 Global Step: 768110 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:14:16,947-Speed 2496.91 samples/sec Loss 1.0940 LearningRate 0.000007 Epoch: 37 Global Step: 768120 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-07-12 22:14:25,101-Speed 2512.13 samples/sec Loss 1.1123 LearningRate 0.000007 Epoch: 37 Global Step: 768130 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:14:33,310-Speed 2495.24 samples/sec Loss 1.1000 LearningRate 0.000007 Epoch: 37 Global Step: 768140 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:14:41,515-Speed 2496.74 samples/sec Loss 1.0634 LearningRate 0.000007 Epoch: 37 Global Step: 768150 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:14:49,719-Speed 2496.73 samples/sec Loss 1.0789 LearningRate 0.000007 Epoch: 37 Global Step: 768160 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:14:57,925-Speed 2495.99 samples/sec Loss 1.0836 LearningRate 0.000007 Epoch: 37 Global Step: 768170 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:15:06,126-Speed 2497.58 samples/sec Loss 1.1054 LearningRate 0.000007 Epoch: 37 Global Step: 768180 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:15:14,280-Speed 2512.27 samples/sec Loss 1.1091 LearningRate 0.000007 Epoch: 37 Global Step: 768190 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:15:22,486-Speed 2496.39 samples/sec Loss 1.1079 LearningRate 0.000007 Epoch: 37 Global Step: 768200 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:15:30,689-Speed 2497.16 samples/sec Loss 1.0791 LearningRate 0.000007 Epoch: 37 Global Step: 768210 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:15:38,907-Speed 2492.31 samples/sec Loss 1.1012 LearningRate 0.000007 Epoch: 37 Global Step: 768220 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:15:47,121-Speed 2493.93 samples/sec Loss 1.0721 LearningRate 0.000007 Epoch: 37 Global Step: 768230 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:15:55,325-Speed 2496.69 samples/sec Loss 1.0954 LearningRate 0.000007 Epoch: 37 Global Step: 768240 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-07-12 22:16:03,478-Speed 2512.24 samples/sec Loss 1.0768 LearningRate 0.000007 Epoch: 37 Global Step: 768250 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:16:11,681-Speed 2497.46 samples/sec Loss 1.0907 LearningRate 0.000007 Epoch: 37 Global Step: 768260 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:16:19,888-Speed 2495.82 samples/sec Loss 1.0703 LearningRate 0.000007 Epoch: 37 Global Step: 768270 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:16:28,095-Speed 2495.58 samples/sec Loss 1.0489 LearningRate 0.000007 Epoch: 37 Global Step: 768280 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:16:36,298-Speed 2497.07 samples/sec Loss 1.0806 LearningRate 0.000007 Epoch: 37 Global Step: 768290 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:16:44,504-Speed 2496.27 samples/sec Loss 1.1072 LearningRate 0.000007 Epoch: 37 Global Step: 768300 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:16:52,652-Speed 2513.90 samples/sec Loss 1.0620 LearningRate 0.000007 Epoch: 37 Global Step: 768310 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:17:00,854-Speed 2497.04 samples/sec Loss 1.0921 LearningRate 0.000007 Epoch: 37 Global Step: 768320 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:17:09,060-Speed 2496.40 samples/sec Loss 1.0712 LearningRate 0.000007 Epoch: 37 Global Step: 768330 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:17:17,265-Speed 2496.39 samples/sec Loss 1.0944 LearningRate 0.000007 Epoch: 37 Global Step: 768340 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:17:25,468-Speed 2497.01 samples/sec Loss 1.0794 LearningRate 0.000007 Epoch: 37 Global Step: 768350 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:17:33,675-Speed 2495.94 samples/sec Loss 1.0929 LearningRate 0.000007 Epoch: 37 Global Step: 768360 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-07-12 22:17:41,827-Speed 2512.68 samples/sec Loss 1.0900 LearningRate 0.000007 Epoch: 37 Global Step: 768370 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:17:50,030-Speed 2496.73 samples/sec Loss 1.0550 LearningRate 0.000007 Epoch: 37 Global Step: 768380 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:17:58,240-Speed 2495.12 samples/sec Loss 1.1271 LearningRate 0.000007 Epoch: 37 Global Step: 768390 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:18:06,458-Speed 2492.18 samples/sec Loss 1.0684 LearningRate 0.000007 Epoch: 37 Global Step: 768400 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:18:14,665-Speed 2495.82 samples/sec Loss 1.0814 LearningRate 0.000007 Epoch: 37 Global Step: 768410 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:18:22,870-Speed 2496.41 samples/sec Loss 1.0962 LearningRate 0.000007 Epoch: 37 Global Step: 768420 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:18:31,024-Speed 2512.31 samples/sec Loss 1.0495 LearningRate 0.000007 Epoch: 37 Global Step: 768430 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:18:39,230-Speed 2496.43 samples/sec Loss 1.0681 LearningRate 0.000007 Epoch: 37 Global Step: 768440 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:18:47,435-Speed 2496.36 samples/sec Loss 1.0724 LearningRate 0.000007 Epoch: 37 Global Step: 768450 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:18:55,637-Speed 2497.13 samples/sec Loss 1.0757 LearningRate 0.000007 Epoch: 37 Global Step: 768460 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:19:03,842-Speed 2496.56 samples/sec Loss 1.0741 LearningRate 0.000007 Epoch: 37 Global Step: 768470 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:19:12,049-Speed 2496.15 samples/sec Loss 1.0907 LearningRate 0.000007 Epoch: 37 Global Step: 768480 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-07-12 22:19:20,206-Speed 2511.26 samples/sec Loss 1.0752 LearningRate 0.000007 Epoch: 37 Global Step: 768490 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:19:28,408-Speed 2497.13 samples/sec Loss 1.0871 LearningRate 0.000007 Epoch: 37 Global Step: 768500 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:19:36,612-Speed 2496.70 samples/sec Loss 1.0770 LearningRate 0.000007 Epoch: 37 Global Step: 768510 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:19:44,821-Speed 2495.49 samples/sec Loss 1.0686 LearningRate 0.000007 Epoch: 37 Global Step: 768520 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:19:53,025-Speed 2496.76 samples/sec Loss 1.0439 LearningRate 0.000007 Epoch: 37 Global Step: 768530 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:20:01,231-Speed 2495.93 samples/sec Loss 1.0816 LearningRate 0.000007 Epoch: 37 Global Step: 768540 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:20:09,384-Speed 2512.25 samples/sec Loss 1.0797 LearningRate 0.000007 Epoch: 37 Global Step: 768550 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:20:17,595-Speed 2494.56 samples/sec Loss 1.0813 LearningRate 0.000007 Epoch: 37 Global Step: 768560 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:20:25,799-Speed 2496.67 samples/sec Loss 1.0835 LearningRate 0.000007 Epoch: 37 Global Step: 768570 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:20:34,021-Speed 2491.50 samples/sec Loss 1.0740 LearningRate 0.000007 Epoch: 37 Global Step: 768580 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:20:42,224-Speed 2496.89 samples/sec Loss 1.0997 LearningRate 0.000007 Epoch: 37 Global Step: 768590 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:20:50,431-Speed 2496.02 samples/sec Loss 1.0950 LearningRate 0.000007 Epoch: 37 Global Step: 768600 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-07-12 22:20:58,589-Speed 2510.57 samples/sec Loss 1.0781 LearningRate 0.000007 Epoch: 37 Global Step: 768610 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:21:06,794-Speed 2496.46 samples/sec Loss 1.0911 LearningRate 0.000007 Epoch: 37 Global Step: 768620 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:21:15,000-Speed 2496.12 samples/sec Loss 1.0912 LearningRate 0.000007 Epoch: 37 Global Step: 768630 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:21:23,204-Speed 2496.80 samples/sec Loss 1.0903 LearningRate 0.000007 Epoch: 37 Global Step: 768640 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:21:31,424-Speed 2492.11 samples/sec Loss 1.1046 LearningRate 0.000007 Epoch: 37 Global Step: 768650 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:21:39,628-Speed 2496.74 samples/sec Loss 1.0684 LearningRate 0.000007 Epoch: 37 Global Step: 768660 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:21:47,782-Speed 2512.01 samples/sec Loss 1.0611 LearningRate 0.000007 Epoch: 37 Global Step: 768670 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:21:55,985-Speed 2496.89 samples/sec Loss 1.1092 LearningRate 0.000007 Epoch: 37 Global Step: 768680 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:22:04,189-Speed 2497.22 samples/sec Loss 1.0738 LearningRate 0.000007 Epoch: 37 Global Step: 768690 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-07-12 22:22:12,394-Speed 2496.26 samples/sec Loss 1.0881 LearningRate 0.000007 Epoch: 37 Global Step: 768700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-07-12 22:22:20,557-Speed 2509.41 samples/sec Loss 1.0978 LearningRate 0.000007 Epoch: 37 Global Step: 768710 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:22:28,768-Speed 2494.37 samples/sec Loss 1.0929 LearningRate 0.000007 Epoch: 37 Global Step: 768720 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-07-12 22:22:36,925-Speed 2511.15 samples/sec Loss 1.0536 LearningRate 0.000007 Epoch: 37 Global Step: 768730 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:22:45,132-Speed 2495.74 samples/sec Loss 1.0691 LearningRate 0.000007 Epoch: 37 Global Step: 768740 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:22:53,342-Speed 2494.83 samples/sec Loss 1.0730 LearningRate 0.000007 Epoch: 37 Global Step: 768750 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:23:01,557-Speed 2493.53 samples/sec Loss 1.0743 LearningRate 0.000007 Epoch: 37 Global Step: 768760 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:23:09,771-Speed 2493.57 samples/sec Loss 1.0486 LearningRate 0.000007 Epoch: 37 Global Step: 768770 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:23:17,977-Speed 2496.08 samples/sec Loss 1.0662 LearningRate 0.000007 Epoch: 37 Global Step: 768780 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:23:26,134-Speed 2511.09 samples/sec Loss 1.0764 LearningRate 0.000007 Epoch: 37 Global Step: 768790 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:23:34,339-Speed 2496.67 samples/sec Loss 1.0963 LearningRate 0.000007 Epoch: 37 Global Step: 768800 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:23:42,547-Speed 2495.45 samples/sec Loss 1.0813 LearningRate 0.000007 Epoch: 37 Global Step: 768810 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:23:50,751-Speed 2496.96 samples/sec Loss 1.0799 LearningRate 0.000007 Epoch: 37 Global Step: 768820 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:23:58,958-Speed 2495.81 samples/sec Loss 1.0473 LearningRate 0.000007 Epoch: 37 Global Step: 768830 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:24:07,162-Speed 2497.11 samples/sec Loss 1.0563 LearningRate 0.000007 Epoch: 37 Global Step: 768840 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-07-12 22:24:15,312-Speed 2513.19 samples/sec Loss 1.0970 LearningRate 0.000007 Epoch: 37 Global Step: 768850 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:24:23,517-Speed 2497.53 samples/sec Loss 1.0707 LearningRate 0.000007 Epoch: 37 Global Step: 768860 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:24:31,729-Speed 2494.62 samples/sec Loss 1.0840 LearningRate 0.000007 Epoch: 37 Global Step: 768870 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:24:39,936-Speed 2495.78 samples/sec Loss 1.0662 LearningRate 0.000007 Epoch: 37 Global Step: 768880 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:24:48,155-Speed 2492.07 samples/sec Loss 1.0803 LearningRate 0.000007 Epoch: 37 Global Step: 768890 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:24:56,368-Speed 2494.08 samples/sec Loss 1.0664 LearningRate 0.000007 Epoch: 37 Global Step: 768900 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:25:04,533-Speed 2508.74 samples/sec Loss 1.0660 LearningRate 0.000007 Epoch: 37 Global Step: 768910 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:25:12,736-Speed 2497.16 samples/sec Loss 1.0712 LearningRate 0.000007 Epoch: 37 Global Step: 768920 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:25:20,954-Speed 2492.70 samples/sec Loss 1.1034 LearningRate 0.000007 Epoch: 37 Global Step: 768930 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:25:29,164-Speed 2494.87 samples/sec Loss 1.0527 LearningRate 0.000007 Epoch: 37 Global Step: 768940 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:25:37,371-Speed 2495.79 samples/sec Loss 1.0643 LearningRate 0.000007 Epoch: 37 Global Step: 768950 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:25:45,580-Speed 2495.09 samples/sec Loss 1.0886 LearningRate 0.000007 Epoch: 37 Global Step: 768960 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-07-12 22:25:53,736-Speed 2511.54 samples/sec Loss 1.1269 LearningRate 0.000007 Epoch: 37 Global Step: 768970 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:26:01,941-Speed 2496.39 samples/sec Loss 1.0830 LearningRate 0.000007 Epoch: 37 Global Step: 768980 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:26:10,152-Speed 2494.91 samples/sec Loss 1.0841 LearningRate 0.000007 Epoch: 37 Global Step: 768990 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:26:18,363-Speed 2494.62 samples/sec Loss 1.0803 LearningRate 0.000007 Epoch: 37 Global Step: 769000 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:26:26,566-Speed 2497.16 samples/sec Loss 1.0607 LearningRate 0.000007 Epoch: 37 Global Step: 769010 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:26:34,781-Speed 2493.22 samples/sec Loss 1.0861 LearningRate 0.000007 Epoch: 37 Global Step: 769020 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:26:42,934-Speed 2512.43 samples/sec Loss 1.1011 LearningRate 0.000007 Epoch: 37 Global Step: 769030 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:26:51,136-Speed 2497.16 samples/sec Loss 1.0921 LearningRate 0.000007 Epoch: 37 Global Step: 769040 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:26:59,343-Speed 2495.93 samples/sec Loss 1.1010 LearningRate 0.000007 Epoch: 37 Global Step: 769050 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:27:07,547-Speed 2496.82 samples/sec Loss 1.0933 LearningRate 0.000007 Epoch: 37 Global Step: 769060 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:27:15,752-Speed 2496.39 samples/sec Loss 1.0731 LearningRate 0.000007 Epoch: 37 Global Step: 769070 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-07-12 22:27:23,967-Speed 2493.26 samples/sec Loss 1.0812 LearningRate 0.000007 Epoch: 37 Global Step: 769080 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-07-12 22:27:32,121-Speed 2512.28 samples/sec Loss 1.1065 LearningRate 0.000007 Epoch: 37 Global Step: 769090 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:27:40,325-Speed 2496.74 samples/sec Loss 1.0532 LearningRate 0.000007 Epoch: 37 Global Step: 769100 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:27:48,531-Speed 2496.17 samples/sec Loss 1.0699 LearningRate 0.000007 Epoch: 37 Global Step: 769110 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:27:56,736-Speed 2496.15 samples/sec Loss 1.0673 LearningRate 0.000007 Epoch: 37 Global Step: 769120 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:28:04,944-Speed 2495.63 samples/sec Loss 1.0956 LearningRate 0.000007 Epoch: 37 Global Step: 769130 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:28:13,148-Speed 2496.76 samples/sec Loss 1.0796 LearningRate 0.000007 Epoch: 37 Global Step: 769140 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:28:21,298-Speed 2513.35 samples/sec Loss 1.0707 LearningRate 0.000007 Epoch: 37 Global Step: 769150 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:28:29,505-Speed 2495.99 samples/sec Loss 1.0862 LearningRate 0.000007 Epoch: 37 Global Step: 769160 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:28:37,720-Speed 2493.32 samples/sec Loss 1.0862 LearningRate 0.000007 Epoch: 37 Global Step: 769170 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:28:45,924-Speed 2496.82 samples/sec Loss 1.0867 LearningRate 0.000007 Epoch: 37 Global Step: 769180 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:28:54,129-Speed 2496.47 samples/sec Loss 1.0969 LearningRate 0.000007 Epoch: 37 Global Step: 769190 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:29:02,332-Speed 2496.78 samples/sec Loss 1.0779 LearningRate 0.000007 Epoch: 37 Global Step: 769200 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:29:10,499-Speed 2508.26 samples/sec Loss 1.0949 LearningRate 0.000007 Epoch: 37 Global Step: 769210 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:29:18,703-Speed 2496.70 samples/sec Loss 1.0908 LearningRate 0.000007 Epoch: 37 Global Step: 769220 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:29:26,921-Speed 2492.73 samples/sec Loss 1.0892 LearningRate 0.000007 Epoch: 37 Global Step: 769230 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:29:35,149-Speed 2489.44 samples/sec Loss 1.1058 LearningRate 0.000007 Epoch: 37 Global Step: 769240 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:29:43,353-Speed 2496.43 samples/sec Loss 1.0887 LearningRate 0.000007 Epoch: 37 Global Step: 769250 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:29:51,560-Speed 2495.90 samples/sec Loss 1.0493 LearningRate 0.000007 Epoch: 37 Global Step: 769260 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:29:59,712-Speed 2512.78 samples/sec Loss 1.0706 LearningRate 0.000007 Epoch: 37 Global Step: 769270 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:30:07,874-Speed 2509.31 samples/sec Loss 1.0620 LearningRate 0.000007 Epoch: 37 Global Step: 769280 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:30:16,090-Speed 2492.91 samples/sec Loss 1.0802 LearningRate 0.000007 Epoch: 37 Global Step: 769290 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:30:24,296-Speed 2496.19 samples/sec Loss 1.0780 LearningRate 0.000007 Epoch: 37 Global Step: 769300 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:30:32,505-Speed 2495.28 samples/sec Loss 1.0729 LearningRate 0.000007 Epoch: 37 Global Step: 769310 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:30:40,722-Speed 2492.67 samples/sec Loss 1.1212 LearningRate 0.000007 Epoch: 37 Global Step: 769320 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:30:48,887-Speed 2508.98 samples/sec Loss 1.0903 LearningRate 0.000007 Epoch: 37 Global Step: 769330 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:30:57,093-Speed 2496.32 samples/sec Loss 1.0884 LearningRate 0.000007 Epoch: 37 Global Step: 769340 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:31:05,298-Speed 2496.13 samples/sec Loss 1.0851 LearningRate 0.000007 Epoch: 37 Global Step: 769350 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:31:13,511-Speed 2493.98 samples/sec Loss 1.0668 LearningRate 0.000007 Epoch: 37 Global Step: 769360 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:31:21,725-Speed 2493.63 samples/sec Loss 1.0975 LearningRate 0.000006 Epoch: 37 Global Step: 769370 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:31:29,946-Speed 2491.58 samples/sec Loss 1.0861 LearningRate 0.000006 Epoch: 37 Global Step: 769380 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:31:38,100-Speed 2512.09 samples/sec Loss 1.0629 LearningRate 0.000006 Epoch: 37 Global Step: 769390 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:31:46,318-Speed 2492.55 samples/sec Loss 1.0815 LearningRate 0.000006 Epoch: 37 Global Step: 769400 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:31:54,521-Speed 2496.92 samples/sec Loss 1.0584 LearningRate 0.000006 Epoch: 37 Global Step: 769410 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:32:02,731-Speed 2494.99 samples/sec Loss 1.1270 LearningRate 0.000006 Epoch: 37 Global Step: 769420 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:32:10,939-Speed 2495.41 samples/sec Loss 1.0534 LearningRate 0.000006 Epoch: 37 Global Step: 769430 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:32:19,151-Speed 2494.41 samples/sec Loss 1.0950 LearningRate 0.000006 Epoch: 37 Global Step: 769440 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:32:27,303-Speed 2512.52 samples/sec Loss 1.0919 LearningRate 0.000006 Epoch: 37 Global Step: 769450 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:32:35,508-Speed 2497.18 samples/sec Loss 1.0768 LearningRate 0.000006 Epoch: 37 Global Step: 769460 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:32:43,713-Speed 2496.48 samples/sec Loss 1.0937 LearningRate 0.000006 Epoch: 37 Global Step: 769470 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:32:51,918-Speed 2496.41 samples/sec Loss 1.0823 LearningRate 0.000006 Epoch: 37 Global Step: 769480 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:33:00,125-Speed 2495.93 samples/sec Loss 1.0929 LearningRate 0.000006 Epoch: 37 Global Step: 769490 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:33:08,332-Speed 2495.99 samples/sec Loss 1.0756 LearningRate 0.000006 Epoch: 37 Global Step: 769500 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:33:16,484-Speed 2512.70 samples/sec Loss 1.0985 LearningRate 0.000006 Epoch: 37 Global Step: 769510 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:33:24,685-Speed 2497.49 samples/sec Loss 1.0729 LearningRate 0.000006 Epoch: 37 Global Step: 769520 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:33:32,898-Speed 2494.00 samples/sec Loss 1.0714 LearningRate 0.000006 Epoch: 37 Global Step: 769530 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:33:41,102-Speed 2496.91 samples/sec Loss 1.0558 LearningRate 0.000006 Epoch: 37 Global Step: 769540 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:33:49,303-Speed 2497.57 samples/sec Loss 1.0939 LearningRate 0.000006 Epoch: 37 Global Step: 769550 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:33:57,504-Speed 2497.54 samples/sec Loss 1.1101 LearningRate 0.000006 Epoch: 37 Global Step: 769560 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:34:05,655-Speed 2513.03 samples/sec Loss 1.0688 LearningRate 0.000006 Epoch: 37 Global Step: 769570 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:34:13,860-Speed 2496.33 samples/sec Loss 1.0577 LearningRate 0.000006 Epoch: 37 Global Step: 769580 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:34:22,063-Speed 2497.08 samples/sec Loss 1.0864 LearningRate 0.000006 Epoch: 37 Global Step: 769590 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:34:30,265-Speed 2497.37 samples/sec Loss 1.0889 LearningRate 0.000006 Epoch: 37 Global Step: 769600 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:34:38,470-Speed 2496.62 samples/sec Loss 1.1062 LearningRate 0.000006 Epoch: 37 Global Step: 769610 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:34:46,678-Speed 2495.63 samples/sec Loss 1.0740 LearningRate 0.000006 Epoch: 37 Global Step: 769620 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:34:54,833-Speed 2511.66 samples/sec Loss 1.0857 LearningRate 0.000006 Epoch: 37 Global Step: 769630 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:35:03,032-Speed 2498.47 samples/sec Loss 1.1024 LearningRate 0.000006 Epoch: 37 Global Step: 769640 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:35:11,233-Speed 2497.74 samples/sec Loss 1.0774 LearningRate 0.000006 Epoch: 37 Global Step: 769650 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:35:19,436-Speed 2496.91 samples/sec Loss 1.0888 LearningRate 0.000006 Epoch: 37 Global Step: 769660 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:35:27,640-Speed 2497.28 samples/sec Loss 1.0980 LearningRate 0.000006 Epoch: 37 Global Step: 769670 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:35:35,846-Speed 2496.29 samples/sec Loss 1.1121 LearningRate 0.000006 Epoch: 37 Global Step: 769680 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:35:43,997-Speed 2512.92 samples/sec Loss 1.0589 LearningRate 0.000006 Epoch: 37 Global Step: 769690 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:35:52,204-Speed 2495.82 samples/sec Loss 1.1207 LearningRate 0.000006 Epoch: 37 Global Step: 769700 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:36:00,412-Speed 2495.91 samples/sec Loss 1.0502 LearningRate 0.000006 Epoch: 37 Global Step: 769710 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:36:08,616-Speed 2496.76 samples/sec Loss 1.0504 LearningRate 0.000006 Epoch: 37 Global Step: 769720 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:36:16,823-Speed 2496.04 samples/sec Loss 1.0663 LearningRate 0.000006 Epoch: 37 Global Step: 769730 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:36:25,031-Speed 2495.49 samples/sec Loss 1.0775 LearningRate 0.000006 Epoch: 37 Global Step: 769740 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:36:33,184-Speed 2512.40 samples/sec Loss 1.0961 LearningRate 0.000006 Epoch: 37 Global Step: 769750 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:36:41,387-Speed 2497.30 samples/sec Loss 1.0781 LearningRate 0.000006 Epoch: 37 Global Step: 769760 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:36:49,589-Speed 2497.14 samples/sec Loss 1.0674 LearningRate 0.000006 Epoch: 37 Global Step: 769770 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:36:57,789-Speed 2498.01 samples/sec Loss 1.0928 LearningRate 0.000006 Epoch: 37 Global Step: 769780 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:37:06,005-Speed 2493.11 samples/sec Loss 1.0814 LearningRate 0.000006 Epoch: 37 Global Step: 769790 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:37:14,212-Speed 2496.08 samples/sec Loss 1.1008 LearningRate 0.000006 Epoch: 37 Global Step: 769800 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:37:22,367-Speed 2511.70 samples/sec Loss 1.0699 LearningRate 0.000006 Epoch: 37 Global Step: 769810 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:37:30,588-Speed 2491.61 samples/sec Loss 1.0922 LearningRate 0.000006 Epoch: 37 Global Step: 769820 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:37:38,794-Speed 2495.80 samples/sec Loss 1.0947 LearningRate 0.000006 Epoch: 37 Global Step: 769830 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:37:46,999-Speed 2496.58 samples/sec Loss 1.0882 LearningRate 0.000006 Epoch: 37 Global Step: 769840 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:37:55,206-Speed 2495.76 samples/sec Loss 1.0722 LearningRate 0.000006 Epoch: 37 Global Step: 769850 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:38:03,415-Speed 2495.21 samples/sec Loss 1.0652 LearningRate 0.000006 Epoch: 37 Global Step: 769860 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:38:11,576-Speed 2509.65 samples/sec Loss 1.0870 LearningRate 0.000006 Epoch: 37 Global Step: 769870 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:38:19,778-Speed 2497.40 samples/sec Loss 1.0829 LearningRate 0.000006 Epoch: 37 Global Step: 769880 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:38:27,984-Speed 2496.23 samples/sec Loss 1.0626 LearningRate 0.000006 Epoch: 37 Global Step: 769890 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:38:36,186-Speed 2497.14 samples/sec Loss 1.1015 LearningRate 0.000006 Epoch: 37 Global Step: 769900 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:38:44,388-Speed 2497.47 samples/sec Loss 1.0599 LearningRate 0.000006 Epoch: 37 Global Step: 769910 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:38:52,601-Speed 2493.92 samples/sec Loss 1.0680 LearningRate 0.000006 Epoch: 37 Global Step: 769920 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:39:00,746-Speed 2514.96 samples/sec Loss 1.0620 LearningRate 0.000006 Epoch: 37 Global Step: 769930 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:39:08,947-Speed 2497.57 samples/sec Loss 1.0572 LearningRate 0.000006 Epoch: 37 Global Step: 769940 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:39:17,155-Speed 2495.46 samples/sec Loss 1.0790 LearningRate 0.000006 Epoch: 37 Global Step: 769950 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:39:25,357-Speed 2497.34 samples/sec Loss 1.0734 LearningRate 0.000006 Epoch: 37 Global Step: 769960 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:39:33,573-Speed 2492.93 samples/sec Loss 1.0486 LearningRate 0.000006 Epoch: 37 Global Step: 769970 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:39:41,777-Speed 2496.84 samples/sec Loss 1.1034 LearningRate 0.000006 Epoch: 37 Global Step: 769980 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:39:49,929-Speed 2512.95 samples/sec Loss 1.0938 LearningRate 0.000006 Epoch: 37 Global Step: 769990 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:39:58,131-Speed 2496.95 samples/sec Loss 1.0816 LearningRate 0.000006 Epoch: 37 Global Step: 770000 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:40:06,336-Speed 2496.34 samples/sec Loss 1.0669 LearningRate 0.000006 Epoch: 37 Global Step: 770010 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:40:14,541-Speed 2496.57 samples/sec Loss 1.0605 LearningRate 0.000006 Epoch: 37 Global Step: 770020 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:40:22,746-Speed 2496.35 samples/sec Loss 1.0461 LearningRate 0.000006 Epoch: 37 Global Step: 770030 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:40:30,949-Speed 2497.12 samples/sec Loss 1.0879 LearningRate 0.000006 Epoch: 37 Global Step: 770040 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:40:39,097-Speed 2513.72 samples/sec Loss 1.0634 LearningRate 0.000006 Epoch: 37 Global Step: 770050 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:40:47,314-Speed 2492.98 samples/sec Loss 1.0894 LearningRate 0.000006 Epoch: 37 Global Step: 770060 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:40:55,519-Speed 2496.78 samples/sec Loss 1.1036 LearningRate 0.000006 Epoch: 37 Global Step: 770070 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:41:03,724-Speed 2496.37 samples/sec Loss 1.1083 LearningRate 0.000006 Epoch: 37 Global Step: 770080 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:41:11,932-Speed 2495.45 samples/sec Loss 1.0741 LearningRate 0.000006 Epoch: 37 Global Step: 770090 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:41:20,141-Speed 2495.57 samples/sec Loss 1.0683 LearningRate 0.000006 Epoch: 37 Global Step: 770100 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:41:28,292-Speed 2513.18 samples/sec Loss 1.0967 LearningRate 0.000006 Epoch: 37 Global Step: 770110 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:41:36,491-Speed 2498.23 samples/sec Loss 1.1019 LearningRate 0.000006 Epoch: 37 Global Step: 770120 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:41:44,709-Speed 2492.46 samples/sec Loss 1.0841 LearningRate 0.000006 Epoch: 37 Global Step: 770130 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:41:52,910-Speed 2498.05 samples/sec Loss 1.0752 LearningRate 0.000006 Epoch: 37 Global Step: 770140 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:42:01,111-Speed 2497.63 samples/sec Loss 1.0611 LearningRate 0.000006 Epoch: 37 Global Step: 770150 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:42:09,312-Speed 2497.35 samples/sec Loss 1.0784 LearningRate 0.000006 Epoch: 37 Global Step: 770160 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:42:17,465-Speed 2512.65 samples/sec Loss 1.0787 LearningRate 0.000006 Epoch: 37 Global Step: 770170 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:42:25,670-Speed 2496.31 samples/sec Loss 1.0765 LearningRate 0.000006 Epoch: 37 Global Step: 770180 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:42:33,871-Speed 2497.51 samples/sec Loss 1.0834 LearningRate 0.000006 Epoch: 37 Global Step: 770190 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:42:42,075-Speed 2497.21 samples/sec Loss 1.0900 LearningRate 0.000006 Epoch: 37 Global Step: 770200 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:42:50,289-Speed 2493.63 samples/sec Loss 1.0582 LearningRate 0.000006 Epoch: 37 Global Step: 770210 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:42:58,491-Speed 2497.40 samples/sec Loss 1.0957 LearningRate 0.000006 Epoch: 37 Global Step: 770220 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:43:06,639-Speed 2513.81 samples/sec Loss 1.0641 LearningRate 0.000006 Epoch: 37 Global Step: 770230 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:43:14,842-Speed 2497.13 samples/sec Loss 1.0815 LearningRate 0.000006 Epoch: 37 Global Step: 770240 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:43:23,048-Speed 2495.93 samples/sec Loss 1.0866 LearningRate 0.000006 Epoch: 37 Global Step: 770250 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:43:31,259-Speed 2494.84 samples/sec Loss 1.0861 LearningRate 0.000006 Epoch: 37 Global Step: 770260 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:43:39,475-Speed 2493.32 samples/sec Loss 1.0698 LearningRate 0.000006 Epoch: 37 Global Step: 770270 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:43:47,676-Speed 2497.42 samples/sec Loss 1.1033 LearningRate 0.000006 Epoch: 37 Global Step: 770280 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:43:55,827-Speed 2512.92 samples/sec Loss 1.0717 LearningRate 0.000006 Epoch: 37 Global Step: 770290 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:44:04,028-Speed 2497.68 samples/sec Loss 1.0657 LearningRate 0.000006 Epoch: 37 Global Step: 770300 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:44:12,230-Speed 2497.52 samples/sec Loss 1.0888 LearningRate 0.000006 Epoch: 37 Global Step: 770310 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:44:20,438-Speed 2495.31 samples/sec Loss 1.1040 LearningRate 0.000006 Epoch: 37 Global Step: 770320 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:44:28,645-Speed 2495.87 samples/sec Loss 1.0971 LearningRate 0.000006 Epoch: 37 Global Step: 770330 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:44:36,847-Speed 2497.28 samples/sec Loss 1.0639 LearningRate 0.000006 Epoch: 37 Global Step: 770340 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:44:45,002-Speed 2511.73 samples/sec Loss 1.0721 LearningRate 0.000006 Epoch: 37 Global Step: 770350 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:44:53,205-Speed 2497.34 samples/sec Loss 1.0820 LearningRate 0.000006 Epoch: 37 Global Step: 770360 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:45:01,412-Speed 2495.87 samples/sec Loss 1.0836 LearningRate 0.000006 Epoch: 37 Global Step: 770370 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:45:09,623-Speed 2494.87 samples/sec Loss 1.0779 LearningRate 0.000006 Epoch: 37 Global Step: 770380 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:45:17,827-Speed 2496.71 samples/sec Loss 1.0764 LearningRate 0.000006 Epoch: 37 Global Step: 770390 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:45:26,035-Speed 2495.35 samples/sec Loss 1.0994 LearningRate 0.000006 Epoch: 37 Global Step: 770400 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:45:34,184-Speed 2513.36 samples/sec Loss 1.1072 LearningRate 0.000006 Epoch: 37 Global Step: 770410 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:45:42,406-Speed 2491.53 samples/sec Loss 1.0689 LearningRate 0.000006 Epoch: 37 Global Step: 770420 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:45:50,606-Speed 2497.82 samples/sec Loss 1.0774 LearningRate 0.000006 Epoch: 37 Global Step: 770430 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:45:58,814-Speed 2495.67 samples/sec Loss 1.1006 LearningRate 0.000006 Epoch: 37 Global Step: 770440 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:46:07,020-Speed 2496.04 samples/sec Loss 1.0906 LearningRate 0.000006 Epoch: 37 Global Step: 770450 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:46:15,221-Speed 2497.64 samples/sec Loss 1.0832 LearningRate 0.000006 Epoch: 37 Global Step: 770460 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:46:23,382-Speed 2509.93 samples/sec Loss 1.1064 LearningRate 0.000006 Epoch: 37 Global Step: 770470 Fp16 Grad Scale: 8192 Required: 14 hours
Training: 2022-07-12 22:46:31,583-Speed 2497.81 samples/sec Loss 1.0662 LearningRate 0.000006 Epoch: 37 Global Step: 770480 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:46:39,801-Speed 2492.57 samples/sec Loss 1.0697 LearningRate 0.000006 Epoch: 37 Global Step: 770490 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:46:48,007-Speed 2496.32 samples/sec Loss 1.0927 LearningRate 0.000006 Epoch: 37 Global Step: 770500 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:46:56,212-Speed 2496.30 samples/sec Loss 1.1144 LearningRate 0.000006 Epoch: 37 Global Step: 770510 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:47:04,425-Speed 2494.26 samples/sec Loss 1.1105 LearningRate 0.000006 Epoch: 37 Global Step: 770520 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:47:12,577-Speed 2512.66 samples/sec Loss 1.1029 LearningRate 0.000006 Epoch: 37 Global Step: 770530 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:47:20,782-Speed 2496.74 samples/sec Loss 1.0529 LearningRate 0.000006 Epoch: 37 Global Step: 770540 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:47:28,990-Speed 2495.42 samples/sec Loss 1.0842 LearningRate 0.000006 Epoch: 37 Global Step: 770550 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:47:37,197-Speed 2495.77 samples/sec Loss 1.0774 LearningRate 0.000006 Epoch: 37 Global Step: 770560 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:47:45,410-Speed 2494.16 samples/sec Loss 1.0872 LearningRate 0.000006 Epoch: 37 Global Step: 770570 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-07-12 22:47:53,619-Speed 2495.47 samples/sec Loss 1.0845 LearningRate 0.000006 Epoch: 37 Global Step: 770580 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:48:01,785-Speed 2508.18 samples/sec Loss 1.0816 LearningRate 0.000006 Epoch: 37 Global Step: 770590 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:48:10,001-Speed 2493.10 samples/sec Loss 1.0707 LearningRate 0.000006 Epoch: 37 Global Step: 770600 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:48:18,200-Speed 2498.36 samples/sec Loss 1.0983 LearningRate 0.000006 Epoch: 37 Global Step: 770610 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:48:26,403-Speed 2497.02 samples/sec Loss 1.0870 LearningRate 0.000006 Epoch: 37 Global Step: 770620 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:48:34,625-Speed 2491.35 samples/sec Loss 1.1344 LearningRate 0.000006 Epoch: 37 Global Step: 770630 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:48:42,824-Speed 2498.26 samples/sec Loss 1.0667 LearningRate 0.000006 Epoch: 37 Global Step: 770640 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:48:50,972-Speed 2513.91 samples/sec Loss 1.0825 LearningRate 0.000006 Epoch: 37 Global Step: 770650 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:48:59,178-Speed 2496.06 samples/sec Loss 1.0698 LearningRate 0.000006 Epoch: 37 Global Step: 770660 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:49:07,386-Speed 2495.25 samples/sec Loss 1.0648 LearningRate 0.000006 Epoch: 37 Global Step: 770670 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:49:15,607-Speed 2492.19 samples/sec Loss 1.0738 LearningRate 0.000006 Epoch: 37 Global Step: 770680 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:49:23,808-Speed 2497.66 samples/sec Loss 1.0851 LearningRate 0.000006 Epoch: 37 Global Step: 770690 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:49:32,012-Speed 2496.59 samples/sec Loss 1.0647 LearningRate 0.000006 Epoch: 37 Global Step: 770700 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:49:40,164-Speed 2512.62 samples/sec Loss 1.0948 LearningRate 0.000006 Epoch: 37 Global Step: 770710 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:49:48,371-Speed 2495.98 samples/sec Loss 1.1080 LearningRate 0.000006 Epoch: 37 Global Step: 770720 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:49:56,576-Speed 2496.35 samples/sec Loss 1.0837 LearningRate 0.000006 Epoch: 37 Global Step: 770730 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:50:04,794-Speed 2492.39 samples/sec Loss 1.0955 LearningRate 0.000006 Epoch: 37 Global Step: 770740 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:50:12,998-Speed 2496.88 samples/sec Loss 1.0856 LearningRate 0.000006 Epoch: 37 Global Step: 770750 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:50:21,207-Speed 2495.18 samples/sec Loss 1.0628 LearningRate 0.000006 Epoch: 37 Global Step: 770760 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:50:29,359-Speed 2512.56 samples/sec Loss 1.0776 LearningRate 0.000006 Epoch: 37 Global Step: 770770 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:50:37,570-Speed 2494.59 samples/sec Loss 1.0647 LearningRate 0.000006 Epoch: 37 Global Step: 770780 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:50:45,776-Speed 2496.37 samples/sec Loss 1.0975 LearningRate 0.000006 Epoch: 37 Global Step: 770790 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:50:53,978-Speed 2497.22 samples/sec Loss 1.0705 LearningRate 0.000006 Epoch: 37 Global Step: 770800 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:51:02,184-Speed 2496.14 samples/sec Loss 1.0830 LearningRate 0.000006 Epoch: 37 Global Step: 770810 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:51:10,394-Speed 2495.26 samples/sec Loss 1.0997 LearningRate 0.000006 Epoch: 37 Global Step: 770820 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:51:18,547-Speed 2512.34 samples/sec Loss 1.0770 LearningRate 0.000006 Epoch: 37 Global Step: 770830 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:51:26,765-Speed 2492.67 samples/sec Loss 1.0676 LearningRate 0.000006 Epoch: 37 Global Step: 770840 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:51:34,972-Speed 2495.93 samples/sec Loss 1.0637 LearningRate 0.000006 Epoch: 37 Global Step: 770850 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:51:43,186-Speed 2493.51 samples/sec Loss 1.0709 LearningRate 0.000006 Epoch: 37 Global Step: 770860 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:51:51,393-Speed 2495.97 samples/sec Loss 1.0836 LearningRate 0.000006 Epoch: 37 Global Step: 770870 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:51:59,595-Speed 2497.11 samples/sec Loss 1.0969 LearningRate 0.000006 Epoch: 37 Global Step: 770880 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:52:07,747-Speed 2512.97 samples/sec Loss 1.1086 LearningRate 0.000006 Epoch: 37 Global Step: 770890 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:52:15,951-Speed 2496.74 samples/sec Loss 1.0946 LearningRate 0.000006 Epoch: 37 Global Step: 770900 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:52:24,158-Speed 2495.72 samples/sec Loss 1.0878 LearningRate 0.000006 Epoch: 37 Global Step: 770910 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:52:32,360-Speed 2497.57 samples/sec Loss 1.0849 LearningRate 0.000006 Epoch: 37 Global Step: 770920 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:52:40,568-Speed 2495.59 samples/sec Loss 1.0846 LearningRate 0.000006 Epoch: 37 Global Step: 770930 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:52:48,773-Speed 2496.36 samples/sec Loss 1.0781 LearningRate 0.000006 Epoch: 37 Global Step: 770940 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:52:56,926-Speed 2512.51 samples/sec Loss 1.0718 LearningRate 0.000006 Epoch: 37 Global Step: 770950 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:53:05,130-Speed 2496.63 samples/sec Loss 1.0932 LearningRate 0.000006 Epoch: 37 Global Step: 770960 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:53:13,337-Speed 2495.91 samples/sec Loss 1.0875 LearningRate 0.000006 Epoch: 37 Global Step: 770970 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:53:21,539-Speed 2497.32 samples/sec Loss 1.0990 LearningRate 0.000006 Epoch: 37 Global Step: 770980 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:53:29,744-Speed 2496.43 samples/sec Loss 1.0812 LearningRate 0.000006 Epoch: 37 Global Step: 770990 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:53:37,948-Speed 2496.75 samples/sec Loss 1.0787 LearningRate 0.000006 Epoch: 37 Global Step: 771000 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:53:46,114-Speed 2508.31 samples/sec Loss 1.0736 LearningRate 0.000006 Epoch: 37 Global Step: 771010 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:53:54,322-Speed 2495.47 samples/sec Loss 1.0947 LearningRate 0.000006 Epoch: 37 Global Step: 771020 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:54:02,535-Speed 2494.25 samples/sec Loss 1.0944 LearningRate 0.000006 Epoch: 37 Global Step: 771030 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:54:10,746-Speed 2494.80 samples/sec Loss 1.0778 LearningRate 0.000006 Epoch: 37 Global Step: 771040 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:54:18,958-Speed 2494.34 samples/sec Loss 1.1062 LearningRate 0.000006 Epoch: 37 Global Step: 771050 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:54:27,164-Speed 2496.00 samples/sec Loss 1.0805 LearningRate 0.000006 Epoch: 37 Global Step: 771060 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:54:35,318-Speed 2511.91 samples/sec Loss 1.0396 LearningRate 0.000006 Epoch: 37 Global Step: 771070 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:54:43,524-Speed 2496.31 samples/sec Loss 1.0801 LearningRate 0.000006 Epoch: 37 Global Step: 771080 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:54:51,738-Speed 2493.56 samples/sec Loss 1.0901 LearningRate 0.000006 Epoch: 37 Global Step: 771090 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:54:59,947-Speed 2495.22 samples/sec Loss 1.1081 LearningRate 0.000006 Epoch: 37 Global Step: 771100 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:55:08,166-Speed 2492.27 samples/sec Loss 1.0615 LearningRate 0.000006 Epoch: 37 Global Step: 771110 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:55:16,375-Speed 2495.13 samples/sec Loss 1.0987 LearningRate 0.000006 Epoch: 37 Global Step: 771120 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:55:24,530-Speed 2512.05 samples/sec Loss 1.0925 LearningRate 0.000006 Epoch: 37 Global Step: 771130 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:55:32,736-Speed 2495.93 samples/sec Loss 1.0811 LearningRate 0.000006 Epoch: 37 Global Step: 771140 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:55:40,960-Speed 2490.90 samples/sec Loss 1.0772 LearningRate 0.000006 Epoch: 37 Global Step: 771150 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:55:49,168-Speed 2495.73 samples/sec Loss 1.0759 LearningRate 0.000006 Epoch: 37 Global Step: 771160 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:55:57,372-Speed 2496.47 samples/sec Loss 1.0778 LearningRate 0.000006 Epoch: 37 Global Step: 771170 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:56:05,577-Speed 2496.39 samples/sec Loss 1.1041 LearningRate 0.000006 Epoch: 37 Global Step: 771180 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:56:13,726-Speed 2513.77 samples/sec Loss 1.0914 LearningRate 0.000006 Epoch: 37 Global Step: 771190 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:56:21,940-Speed 2493.57 samples/sec Loss 1.0855 LearningRate 0.000006 Epoch: 37 Global Step: 771200 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:56:30,143-Speed 2496.82 samples/sec Loss 1.0547 LearningRate 0.000006 Epoch: 37 Global Step: 771210 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:56:38,361-Speed 2493.48 samples/sec Loss 1.0652 LearningRate 0.000006 Epoch: 37 Global Step: 771220 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:56:46,567-Speed 2496.25 samples/sec Loss 1.0876 LearningRate 0.000006 Epoch: 37 Global Step: 771230 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:56:54,772-Speed 2496.12 samples/sec Loss 1.0577 LearningRate 0.000006 Epoch: 37 Global Step: 771240 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:57:02,926-Speed 2512.31 samples/sec Loss 1.0666 LearningRate 0.000006 Epoch: 37 Global Step: 771250 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:57:11,131-Speed 2496.37 samples/sec Loss 1.0729 LearningRate 0.000006 Epoch: 37 Global Step: 771260 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:57:19,341-Speed 2495.07 samples/sec Loss 1.0750 LearningRate 0.000006 Epoch: 37 Global Step: 771270 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:57:27,546-Speed 2496.45 samples/sec Loss 1.0737 LearningRate 0.000006 Epoch: 37 Global Step: 771280 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:57:35,751-Speed 2496.34 samples/sec Loss 1.0814 LearningRate 0.000006 Epoch: 37 Global Step: 771290 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:57:43,953-Speed 2497.43 samples/sec Loss 1.0873 LearningRate 0.000006 Epoch: 37 Global Step: 771300 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:57:52,117-Speed 2508.80 samples/sec Loss 1.0878 LearningRate 0.000006 Epoch: 37 Global Step: 771310 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:58:00,325-Speed 2495.64 samples/sec Loss 1.0650 LearningRate 0.000006 Epoch: 37 Global Step: 771320 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:58:08,535-Speed 2495.12 samples/sec Loss 1.0697 LearningRate 0.000006 Epoch: 37 Global Step: 771330 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:58:16,739-Speed 2496.50 samples/sec Loss 1.0809 LearningRate 0.000006 Epoch: 37 Global Step: 771340 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:58:24,943-Speed 2496.85 samples/sec Loss 1.0991 LearningRate 0.000006 Epoch: 37 Global Step: 771350 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:58:33,148-Speed 2496.53 samples/sec Loss 1.0954 LearningRate 0.000006 Epoch: 37 Global Step: 771360 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:58:41,301-Speed 2512.20 samples/sec Loss 1.1069 LearningRate 0.000006 Epoch: 37 Global Step: 771370 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:58:49,514-Speed 2494.11 samples/sec Loss 1.0655 LearningRate 0.000006 Epoch: 37 Global Step: 771380 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:58:57,720-Speed 2496.13 samples/sec Loss 1.0622 LearningRate 0.000006 Epoch: 37 Global Step: 771390 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:59:05,927-Speed 2495.92 samples/sec Loss 1.0825 LearningRate 0.000006 Epoch: 37 Global Step: 771400 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:59:14,134-Speed 2495.90 samples/sec Loss 1.0645 LearningRate 0.000006 Epoch: 37 Global Step: 771410 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:59:22,340-Speed 2496.07 samples/sec Loss 1.0542 LearningRate 0.000006 Epoch: 37 Global Step: 771420 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:59:30,492-Speed 2512.72 samples/sec Loss 1.0713 LearningRate 0.000006 Epoch: 37 Global Step: 771430 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:59:38,695-Speed 2497.71 samples/sec Loss 1.0853 LearningRate 0.000006 Epoch: 37 Global Step: 771440 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:59:46,898-Speed 2496.83 samples/sec Loss 1.0945 LearningRate 0.000006 Epoch: 37 Global Step: 771450 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 22:59:55,128-Speed 2488.93 samples/sec Loss 1.1101 LearningRate 0.000006 Epoch: 37 Global Step: 771460 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:00:03,341-Speed 2494.37 samples/sec Loss 1.0643 LearningRate 0.000006 Epoch: 37 Global Step: 771470 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:00:11,547-Speed 2495.88 samples/sec Loss 1.0832 LearningRate 0.000006 Epoch: 37 Global Step: 771480 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:00:19,701-Speed 2512.33 samples/sec Loss 1.0615 LearningRate 0.000006 Epoch: 37 Global Step: 771490 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:00:27,904-Speed 2497.04 samples/sec Loss 1.0881 LearningRate 0.000006 Epoch: 37 Global Step: 771500 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:00:36,118-Speed 2493.48 samples/sec Loss 1.0908 LearningRate 0.000006 Epoch: 37 Global Step: 771510 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:00:44,325-Speed 2495.96 samples/sec Loss 1.0658 LearningRate 0.000006 Epoch: 37 Global Step: 771520 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:00:52,529-Speed 2496.64 samples/sec Loss 1.0639 LearningRate 0.000006 Epoch: 37 Global Step: 771530 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:01:00,747-Speed 2492.39 samples/sec Loss 1.0912 LearningRate 0.000006 Epoch: 37 Global Step: 771540 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:01:08,907-Speed 2510.30 samples/sec Loss 1.0823 LearningRate 0.000006 Epoch: 37 Global Step: 771550 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:01:17,118-Speed 2494.65 samples/sec Loss 1.0692 LearningRate 0.000006 Epoch: 37 Global Step: 771560 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:01:25,323-Speed 2496.39 samples/sec Loss 1.0893 LearningRate 0.000006 Epoch: 37 Global Step: 771570 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:01:33,537-Speed 2493.73 samples/sec Loss 1.0834 LearningRate 0.000006 Epoch: 37 Global Step: 771580 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:01:41,740-Speed 2496.90 samples/sec Loss 1.0830 LearningRate 0.000006 Epoch: 37 Global Step: 771590 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:01:49,945-Speed 2496.70 samples/sec Loss 1.0865 LearningRate 0.000006 Epoch: 37 Global Step: 771600 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 23:01:58,096-Speed 2513.14 samples/sec Loss 1.0922 LearningRate 0.000006 Epoch: 37 Global Step: 771610 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-07-12 
23:02:06,302-Speed 2496.03 samples/sec Loss 1.0809 LearningRate 0.000006 Epoch: 37 Global Step: 771620 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:02:14,511-Speed 2495.47 samples/sec Loss 1.0791 LearningRate 0.000006 Epoch: 37 Global Step: 771630 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:02:22,717-Speed 2495.89 samples/sec Loss 1.0763 LearningRate 0.000006 Epoch: 37 Global Step: 771640 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:02:30,920-Speed 2496.87 samples/sec Loss 1.0622 LearningRate 0.000006 Epoch: 37 Global Step: 771650 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:02:39,124-Speed 2496.64 samples/sec Loss 1.0947 LearningRate 0.000006 Epoch: 37 Global Step: 771660 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:02:47,276-Speed 2512.89 samples/sec Loss 1.0902 LearningRate 0.000006 Epoch: 37 Global Step: 771670 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:02:55,485-Speed 2495.41 samples/sec Loss 1.0741 LearningRate 0.000006 Epoch: 37 Global Step: 771680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:03:03,696-Speed 2494.44 samples/sec Loss 1.0780 LearningRate 0.000006 Epoch: 37 Global Step: 771690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:03:11,908-Speed 2494.26 samples/sec Loss 1.0737 LearningRate 0.000006 Epoch: 37 Global Step: 771700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:03:20,126-Speed 2492.66 samples/sec Loss 1.0765 LearningRate 0.000006 Epoch: 37 Global Step: 771710 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:03:28,332-Speed 2496.12 samples/sec Loss 1.0482 LearningRate 0.000006 Epoch: 37 Global Step: 771720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:03:36,497-Speed 2508.68 samples/sec Loss 1.1014 LearningRate 0.000006 Epoch: 37 Global Step: 771730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 
23:03:44,708-Speed 2494.73 samples/sec Loss 1.1079 LearningRate 0.000006 Epoch: 37 Global Step: 771740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:03:52,915-Speed 2495.96 samples/sec Loss 1.0888 LearningRate 0.000006 Epoch: 37 Global Step: 771750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:04:01,117-Speed 2497.22 samples/sec Loss 1.0627 LearningRate 0.000006 Epoch: 37 Global Step: 771760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:04:09,321-Speed 2496.87 samples/sec Loss 1.0759 LearningRate 0.000006 Epoch: 37 Global Step: 771770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:04:17,527-Speed 2495.79 samples/sec Loss 1.0668 LearningRate 0.000006 Epoch: 37 Global Step: 771780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:04:25,675-Speed 2514.09 samples/sec Loss 1.0791 LearningRate 0.000006 Epoch: 37 Global Step: 771790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:04:33,882-Speed 2496.00 samples/sec Loss 1.0827 LearningRate 0.000006 Epoch: 37 Global Step: 771800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:04:42,087-Speed 2496.37 samples/sec Loss 1.0982 LearningRate 0.000006 Epoch: 37 Global Step: 771810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:04:50,290-Speed 2496.94 samples/sec Loss 1.1007 LearningRate 0.000006 Epoch: 37 Global Step: 771820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:04:58,508-Speed 2493.18 samples/sec Loss 1.0766 LearningRate 0.000006 Epoch: 37 Global Step: 771830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:05:06,712-Speed 2496.79 samples/sec Loss 1.0872 LearningRate 0.000006 Epoch: 37 Global Step: 771840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:05:14,864-Speed 2512.40 samples/sec Loss 1.0679 LearningRate 0.000006 Epoch: 37 Global Step: 771850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 
23:05:23,071-Speed 2495.82 samples/sec Loss 1.1012 LearningRate 0.000006 Epoch: 37 Global Step: 771860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:05:31,276-Speed 2496.74 samples/sec Loss 1.0842 LearningRate 0.000006 Epoch: 37 Global Step: 771870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:05:39,483-Speed 2495.66 samples/sec Loss 1.0850 LearningRate 0.000006 Epoch: 37 Global Step: 771880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:05:47,690-Speed 2495.91 samples/sec Loss 1.0778 LearningRate 0.000006 Epoch: 37 Global Step: 771890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:05:55,908-Speed 2492.42 samples/sec Loss 1.0727 LearningRate 0.000006 Epoch: 37 Global Step: 771900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:06:04,062-Speed 2512.13 samples/sec Loss 1.0401 LearningRate 0.000006 Epoch: 37 Global Step: 771910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:06:12,267-Speed 2496.32 samples/sec Loss 1.0952 LearningRate 0.000006 Epoch: 37 Global Step: 771920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:06:20,475-Speed 2495.53 samples/sec Loss 1.1001 LearningRate 0.000006 Epoch: 37 Global Step: 771930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:06:28,679-Speed 2496.65 samples/sec Loss 1.0649 LearningRate 0.000006 Epoch: 37 Global Step: 771940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:06:36,886-Speed 2495.97 samples/sec Loss 1.0905 LearningRate 0.000006 Epoch: 37 Global Step: 771950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:06:45,091-Speed 2496.56 samples/sec Loss 1.0935 LearningRate 0.000006 Epoch: 37 Global Step: 771960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:06:53,244-Speed 2512.42 samples/sec Loss 1.0811 LearningRate 0.000006 Epoch: 37 Global Step: 771970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 
23:07:01,448-Speed 2496.60 samples/sec Loss 1.0908 LearningRate 0.000006 Epoch: 37 Global Step: 771980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:07:09,653-Speed 2496.73 samples/sec Loss 1.0856 LearningRate 0.000006 Epoch: 37 Global Step: 771990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:07:17,858-Speed 2496.29 samples/sec Loss 1.0709 LearningRate 0.000006 Epoch: 37 Global Step: 772000 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:07:26,078-Speed 2491.93 samples/sec Loss 1.0866 LearningRate 0.000006 Epoch: 37 Global Step: 772010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:07:34,290-Speed 2494.47 samples/sec Loss 1.0783 LearningRate 0.000006 Epoch: 37 Global Step: 772020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:07:42,441-Speed 2513.07 samples/sec Loss 1.0974 LearningRate 0.000006 Epoch: 37 Global Step: 772030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:07:50,646-Speed 2496.32 samples/sec Loss 1.0926 LearningRate 0.000006 Epoch: 37 Global Step: 772040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-07-12 23:07:58,827-Speed 2503.79 samples/sec Loss 1.0714 LearningRate 0.000006 Epoch: 37 Global Step: 772050 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:08:07,034-Speed 2495.89 samples/sec Loss 1.0671 LearningRate 0.000006 Epoch: 37 Global Step: 772060 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:08:15,244-Speed 2495.17 samples/sec Loss 1.0728 LearningRate 0.000006 Epoch: 37 Global Step: 772070 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:08:23,459-Speed 2493.11 samples/sec Loss 1.0839 LearningRate 0.000006 Epoch: 37 Global Step: 772080 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:08:31,612-Speed 2512.47 samples/sec Loss 1.0766 LearningRate 0.000006 Epoch: 37 Global Step: 772090 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 
23:08:39,829-Speed 2492.65 samples/sec Loss 1.0745 LearningRate 0.000006 Epoch: 37 Global Step: 772100 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:08:48,038-Speed 2495.42 samples/sec Loss 1.0971 LearningRate 0.000006 Epoch: 37 Global Step: 772110 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:08:56,241-Speed 2496.89 samples/sec Loss 1.0898 LearningRate 0.000006 Epoch: 37 Global Step: 772120 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:09:04,447-Speed 2496.13 samples/sec Loss 1.0719 LearningRate 0.000006 Epoch: 37 Global Step: 772130 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:09:12,652-Speed 2496.59 samples/sec Loss 1.0836 LearningRate 0.000006 Epoch: 37 Global Step: 772140 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:09:20,803-Speed 2512.83 samples/sec Loss 1.0707 LearningRate 0.000006 Epoch: 37 Global Step: 772150 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:09:29,009-Speed 2496.08 samples/sec Loss 1.0719 LearningRate 0.000006 Epoch: 37 Global Step: 772160 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:09:37,235-Speed 2490.30 samples/sec Loss 1.0753 LearningRate 0.000006 Epoch: 37 Global Step: 772170 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:09:45,443-Speed 2495.39 samples/sec Loss 1.0791 LearningRate 0.000006 Epoch: 37 Global Step: 772180 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:09:53,662-Speed 2492.20 samples/sec Loss 1.0913 LearningRate 0.000006 Epoch: 37 Global Step: 772190 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:10:01,867-Speed 2496.17 samples/sec Loss 1.0574 LearningRate 0.000006 Epoch: 37 Global Step: 772200 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:10:10,021-Speed 2512.77 samples/sec Loss 1.0855 LearningRate 0.000006 Epoch: 37 Global Step: 772210 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 
23:10:18,227-Speed 2496.00 samples/sec Loss 1.0855 LearningRate 0.000006 Epoch: 37 Global Step: 772220 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:10:26,436-Speed 2495.29 samples/sec Loss 1.0649 LearningRate 0.000006 Epoch: 37 Global Step: 772230 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:10:34,641-Speed 2496.29 samples/sec Loss 1.0576 LearningRate 0.000006 Epoch: 37 Global Step: 772240 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:10:42,849-Speed 2495.76 samples/sec Loss 1.0551 LearningRate 0.000006 Epoch: 37 Global Step: 772250 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:10:51,054-Speed 2496.37 samples/sec Loss 1.0643 LearningRate 0.000006 Epoch: 37 Global Step: 772260 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:10:59,204-Speed 2512.99 samples/sec Loss 1.0634 LearningRate 0.000006 Epoch: 37 Global Step: 772270 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:11:07,418-Speed 2493.84 samples/sec Loss 1.0847 LearningRate 0.000006 Epoch: 37 Global Step: 772280 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:11:15,621-Speed 2497.04 samples/sec Loss 1.0888 LearningRate 0.000006 Epoch: 37 Global Step: 772290 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:11:23,825-Speed 2496.77 samples/sec Loss 1.0728 LearningRate 0.000006 Epoch: 37 Global Step: 772300 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:11:32,028-Speed 2496.83 samples/sec Loss 1.0865 LearningRate 0.000006 Epoch: 37 Global Step: 772310 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:11:40,230-Speed 2497.22 samples/sec Loss 1.0921 LearningRate 0.000006 Epoch: 37 Global Step: 772320 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:11:48,381-Speed 2513.27 samples/sec Loss 1.0887 LearningRate 0.000006 Epoch: 37 Global Step: 772330 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 
23:11:56,591-Speed 2494.87 samples/sec Loss 1.1006 LearningRate 0.000006 Epoch: 37 Global Step: 772340 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:12:04,808-Speed 2492.74 samples/sec Loss 1.0465 LearningRate 0.000006 Epoch: 37 Global Step: 772350 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:12:13,016-Speed 2495.60 samples/sec Loss 1.0626 LearningRate 0.000006 Epoch: 37 Global Step: 772360 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:12:21,303-Speed 2471.64 samples/sec Loss 1.0830 LearningRate 0.000006 Epoch: 37 Global Step: 772370 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:12:29,509-Speed 2496.19 samples/sec Loss 1.0519 LearningRate 0.000006 Epoch: 37 Global Step: 772380 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:12:37,662-Speed 2512.32 samples/sec Loss 1.0874 LearningRate 0.000006 Epoch: 37 Global Step: 772390 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:12:45,872-Speed 2494.81 samples/sec Loss 1.0412 LearningRate 0.000006 Epoch: 37 Global Step: 772400 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:12:54,080-Speed 2495.61 samples/sec Loss 1.0656 LearningRate 0.000006 Epoch: 37 Global Step: 772410 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:13:02,288-Speed 2495.38 samples/sec Loss 1.0809 LearningRate 0.000006 Epoch: 37 Global Step: 772420 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:13:10,500-Speed 2494.34 samples/sec Loss 1.0587 LearningRate 0.000006 Epoch: 37 Global Step: 772430 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:13:18,711-Speed 2494.67 samples/sec Loss 1.0879 LearningRate 0.000006 Epoch: 37 Global Step: 772440 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:13:26,864-Speed 2512.47 samples/sec Loss 1.0796 LearningRate 0.000006 Epoch: 37 Global Step: 772450 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 
23:13:35,073-Speed 2495.39 samples/sec Loss 1.1026 LearningRate 0.000006 Epoch: 37 Global Step: 772460 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:13:43,278-Speed 2496.59 samples/sec Loss 1.0728 LearningRate 0.000006 Epoch: 37 Global Step: 772470 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:13:51,484-Speed 2495.97 samples/sec Loss 1.0781 LearningRate 0.000006 Epoch: 37 Global Step: 772480 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:13:59,690-Speed 2496.16 samples/sec Loss 1.0633 LearningRate 0.000006 Epoch: 37 Global Step: 772490 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:14:07,899-Speed 2495.05 samples/sec Loss 1.0811 LearningRate 0.000006 Epoch: 37 Global Step: 772500 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:14:16,049-Speed 2513.57 samples/sec Loss 1.1132 LearningRate 0.000006 Epoch: 37 Global Step: 772510 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:14:24,254-Speed 2496.61 samples/sec Loss 1.0819 LearningRate 0.000006 Epoch: 37 Global Step: 772520 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:14:32,457-Speed 2496.86 samples/sec Loss 1.0772 LearningRate 0.000006 Epoch: 37 Global Step: 772530 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:14:40,661-Speed 2496.76 samples/sec Loss 1.1088 LearningRate 0.000006 Epoch: 37 Global Step: 772540 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:14:48,864-Speed 2496.76 samples/sec Loss 1.0920 LearningRate 0.000006 Epoch: 37 Global Step: 772550 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:14:57,082-Speed 2492.43 samples/sec Loss 1.0728 LearningRate 0.000006 Epoch: 37 Global Step: 772560 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:15:05,232-Speed 2513.67 samples/sec Loss 1.0505 LearningRate 0.000006 Epoch: 37 Global Step: 772570 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 
23:15:13,434-Speed 2497.38 samples/sec Loss 1.0859 LearningRate 0.000006 Epoch: 37 Global Step: 772580 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:15:21,637-Speed 2496.70 samples/sec Loss 1.0579 LearningRate 0.000006 Epoch: 37 Global Step: 772590 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:15:29,841-Speed 2497.12 samples/sec Loss 1.0553 LearningRate 0.000006 Epoch: 37 Global Step: 772600 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:15:38,045-Speed 2497.37 samples/sec Loss 1.0683 LearningRate 0.000006 Epoch: 37 Global Step: 772610 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:15:46,248-Speed 2497.01 samples/sec Loss 1.1206 LearningRate 0.000006 Epoch: 37 Global Step: 772620 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:15:54,398-Speed 2513.20 samples/sec Loss 1.0826 LearningRate 0.000006 Epoch: 37 Global Step: 772630 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:16:02,605-Speed 2496.12 samples/sec Loss 1.0933 LearningRate 0.000006 Epoch: 37 Global Step: 772640 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:16:10,810-Speed 2496.50 samples/sec Loss 1.0927 LearningRate 0.000006 Epoch: 37 Global Step: 772650 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:16:19,014-Speed 2496.66 samples/sec Loss 1.0986 LearningRate 0.000006 Epoch: 37 Global Step: 772660 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:16:27,236-Speed 2491.42 samples/sec Loss 1.0818 LearningRate 0.000006 Epoch: 37 Global Step: 772670 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:16:35,440-Speed 2496.84 samples/sec Loss 1.1030 LearningRate 0.000006 Epoch: 37 Global Step: 772680 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:16:43,593-Speed 2512.35 samples/sec Loss 1.0794 LearningRate 0.000006 Epoch: 37 Global Step: 772690 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 
23:16:51,797-Speed 2496.71 samples/sec Loss 1.0874 LearningRate 0.000006 Epoch: 37 Global Step: 772700 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:17:00,003-Speed 2496.08 samples/sec Loss 1.0549 LearningRate 0.000006 Epoch: 37 Global Step: 772710 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-07-12 23:17:08,164-Speed 2510.14 samples/sec Loss 1.0880 LearningRate 0.000006 Epoch: 37 Global Step: 772720 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:17:16,367-Speed 2496.81 samples/sec Loss 1.0755 LearningRate 0.000006 Epoch: 37 Global Step: 772730 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:17:24,571-Speed 2496.82 samples/sec Loss 1.0319 LearningRate 0.000006 Epoch: 37 Global Step: 772740 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:17:32,722-Speed 2512.95 samples/sec Loss 1.0840 LearningRate 0.000006 Epoch: 37 Global Step: 772750 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:17:40,924-Speed 2497.33 samples/sec Loss 1.0557 LearningRate 0.000006 Epoch: 37 Global Step: 772760 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:17:49,131-Speed 2495.95 samples/sec Loss 1.0872 LearningRate 0.000006 Epoch: 37 Global Step: 772770 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:17:57,338-Speed 2495.77 samples/sec Loss 1.0960 LearningRate 0.000006 Epoch: 37 Global Step: 772780 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:18:05,542-Speed 2497.12 samples/sec Loss 1.0875 LearningRate 0.000006 Epoch: 37 Global Step: 772790 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:18:13,750-Speed 2499.42 samples/sec Loss 1.0743 LearningRate 0.000006 Epoch: 37 Global Step: 772800 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:18:21,902-Speed 2512.52 samples/sec Loss 1.0787 LearningRate 0.000006 Epoch: 37 Global Step: 772810 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 
23:18:30,104-Speed 2497.35 samples/sec Loss 1.0684 LearningRate 0.000006 Epoch: 37 Global Step: 772820 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:18:38,311-Speed 2495.65 samples/sec Loss 1.0512 LearningRate 0.000006 Epoch: 37 Global Step: 772830 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:18:46,515-Speed 2497.31 samples/sec Loss 1.0977 LearningRate 0.000006 Epoch: 37 Global Step: 772840 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:18:54,721-Speed 2496.07 samples/sec Loss 1.0459 LearningRate 0.000006 Epoch: 37 Global Step: 772850 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:19:02,926-Speed 2496.30 samples/sec Loss 1.1049 LearningRate 0.000006 Epoch: 37 Global Step: 772860 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:19:11,080-Speed 2511.91 samples/sec Loss 1.1120 LearningRate 0.000006 Epoch: 37 Global Step: 772870 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:19:19,285-Speed 2496.98 samples/sec Loss 1.0703 LearningRate 0.000006 Epoch: 37 Global Step: 772880 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:19:27,490-Speed 2496.26 samples/sec Loss 1.0619 LearningRate 0.000006 Epoch: 37 Global Step: 772890 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:19:35,695-Speed 2496.34 samples/sec Loss 1.1013 LearningRate 0.000006 Epoch: 37 Global Step: 772900 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:19:43,900-Speed 2496.35 samples/sec Loss 1.0604 LearningRate 0.000006 Epoch: 37 Global Step: 772910 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:19:52,065-Speed 2509.01 samples/sec Loss 1.1152 LearningRate 0.000006 Epoch: 37 Global Step: 772920 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:20:00,212-Speed 2514.16 samples/sec Loss 1.0795 LearningRate 0.000006 Epoch: 37 Global Step: 772930 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:20:08,419-Speed 
2496.07 samples/sec Loss 1.0941 LearningRate 0.000006 Epoch: 37 Global Step: 772940 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:20:16,619-Speed 2497.98 samples/sec Loss 1.0920 LearningRate 0.000006 Epoch: 37 Global Step: 772950 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:20:24,820-Speed 2497.52 samples/sec Loss 1.0949 LearningRate 0.000006 Epoch: 37 Global Step: 772960 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:20:33,026-Speed 2496.12 samples/sec Loss 1.0656 LearningRate 0.000006 Epoch: 37 Global Step: 772970 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:20:41,229-Speed 2497.20 samples/sec Loss 1.0863 LearningRate 0.000006 Epoch: 37 Global Step: 772980 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:20:49,379-Speed 2513.25 samples/sec Loss 1.0666 LearningRate 0.000006 Epoch: 37 Global Step: 772990 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:20:57,584-Speed 2496.54 samples/sec Loss 1.0993 LearningRate 0.000006 Epoch: 37 Global Step: 773000 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:21:05,796-Speed 2494.21 samples/sec Loss 1.0753 LearningRate 0.000006 Epoch: 37 Global Step: 773010 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:21:13,999-Speed 2497.26 samples/sec Loss 1.1121 LearningRate 0.000006 Epoch: 37 Global Step: 773020 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:21:22,203-Speed 2496.59 samples/sec Loss 1.0607 LearningRate 0.000006 Epoch: 37 Global Step: 773030 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:21:30,404-Speed 2497.47 samples/sec Loss 1.0809 LearningRate 0.000006 Epoch: 37 Global Step: 773040 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:21:38,556-Speed 2512.74 samples/sec Loss 1.0934 LearningRate 0.000006 Epoch: 37 Global Step: 773050 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:21:46,759-Speed 2497.28 samples/sec 
Loss 1.0537 LearningRate 0.000006 Epoch: 37 Global Step: 773060 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:21:54,959-Speed 2498.01 samples/sec Loss 1.0787 LearningRate 0.000006 Epoch: 37 Global Step: 773070 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:22:03,164-Speed 2496.31 samples/sec Loss 1.0944 LearningRate 0.000006 Epoch: 37 Global Step: 773080 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:22:11,367-Speed 2497.19 samples/sec Loss 1.0891 LearningRate 0.000006 Epoch: 37 Global Step: 773090 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:22:19,567-Speed 2498.07 samples/sec Loss 1.1093 LearningRate 0.000006 Epoch: 37 Global Step: 773100 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:22:27,718-Speed 2512.91 samples/sec Loss 1.0923 LearningRate 0.000006 Epoch: 37 Global Step: 773110 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:22:35,920-Speed 2497.57 samples/sec Loss 1.0670 LearningRate 0.000006 Epoch: 37 Global Step: 773120 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:22:44,123-Speed 2496.92 samples/sec Loss 1.0864 LearningRate 0.000006 Epoch: 37 Global Step: 773130 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:22:52,326-Speed 2496.87 samples/sec Loss 1.0832 LearningRate 0.000006 Epoch: 37 Global Step: 773140 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:23:00,526-Speed 2498.09 samples/sec Loss 1.0759 LearningRate 0.000006 Epoch: 37 Global Step: 773150 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:23:08,730-Speed 2496.86 samples/sec Loss 1.1127 LearningRate 0.000006 Epoch: 37 Global Step: 773160 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:23:16,878-Speed 2514.15 samples/sec Loss 1.0932 LearningRate 0.000006 Epoch: 37 Global Step: 773170 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:23:25,081-Speed 2497.00 samples/sec Loss 1.0695 
LearningRate 0.000006 Epoch: 37 Global Step: 773180 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:23:33,282-Speed 2497.78 samples/sec Loss 1.0785 LearningRate 0.000006 Epoch: 37 Global Step: 773190 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:23:41,483-Speed 2497.56 samples/sec Loss 1.0713 LearningRate 0.000006 Epoch: 37 Global Step: 773200 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:23:49,687-Speed 2497.14 samples/sec Loss 1.0671 LearningRate 0.000006 Epoch: 37 Global Step: 773210 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:23:57,891-Speed 2496.66 samples/sec Loss 1.0855 LearningRate 0.000006 Epoch: 37 Global Step: 773220 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:24:06,040-Speed 2513.55 samples/sec Loss 1.0996 LearningRate 0.000006 Epoch: 37 Global Step: 773230 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:24:14,249-Speed 2495.26 samples/sec Loss 1.0478 LearningRate 0.000006 Epoch: 37 Global Step: 773240 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:24:22,464-Speed 2493.46 samples/sec Loss 1.0876 LearningRate 0.000006 Epoch: 37 Global Step: 773250 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:24:30,664-Speed 2497.77 samples/sec Loss 1.1019 LearningRate 0.000006 Epoch: 37 Global Step: 773260 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:24:38,864-Speed 2497.71 samples/sec Loss 1.1061 LearningRate 0.000006 Epoch: 37 Global Step: 773270 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:24:47,069-Speed 2496.63 samples/sec Loss 1.0894 LearningRate 0.000006 Epoch: 37 Global Step: 773280 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:24:55,234-Speed 2508.64 samples/sec Loss 1.0847 LearningRate 0.000006 Epoch: 37 Global Step: 773290 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:25:03,437-Speed 2496.88 samples/sec Loss 1.0794 LearningRate 
0.000006 Epoch: 37 Global Step: 773300 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:25:11,640-Speed 2497.20 samples/sec Loss 1.0799 LearningRate 0.000006 Epoch: 37 Global Step: 773310 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:25:19,844-Speed 2496.76 samples/sec Loss 1.0776 LearningRate 0.000006 Epoch: 37 Global Step: 773320 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:25:28,047-Speed 2497.23 samples/sec Loss 1.0688 LearningRate 0.000006 Epoch: 37 Global Step: 773330 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:25:36,250-Speed 2497.07 samples/sec Loss 1.0820 LearningRate 0.000006 Epoch: 37 Global Step: 773340 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:25:44,401-Speed 2512.78 samples/sec Loss 1.1176 LearningRate 0.000006 Epoch: 37 Global Step: 773350 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:25:52,603-Speed 2497.26 samples/sec Loss 1.0762 LearningRate 0.000006 Epoch: 37 Global Step: 773360 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:26:00,806-Speed 2497.21 samples/sec Loss 1.1145 LearningRate 0.000006 Epoch: 37 Global Step: 773370 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:26:09,008-Speed 2497.26 samples/sec Loss 1.0731 LearningRate 0.000006 Epoch: 37 Global Step: 773380 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:26:17,209-Speed 2497.55 samples/sec Loss 1.0856 LearningRate 0.000006 Epoch: 37 Global Step: 773390 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:26:25,418-Speed 2495.16 samples/sec Loss 1.0958 LearningRate 0.000006 Epoch: 37 Global Step: 773400 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:26:33,569-Speed 2513.28 samples/sec Loss 1.1027 LearningRate 0.000006 Epoch: 37 Global Step: 773410 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:26:41,771-Speed 2497.43 samples/sec Loss 1.0904 LearningRate 0.000006 Epoch: 37 
Global Step: 773420 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:26:49,971-Speed 2497.77 samples/sec Loss 1.1258 LearningRate 0.000006 Epoch: 37 Global Step: 773430 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:26:58,171-Speed 2497.95 samples/sec Loss 1.0792 LearningRate 0.000006 Epoch: 37 Global Step: 773440 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:27:06,376-Speed 2496.50 samples/sec Loss 1.0828 LearningRate 0.000006 Epoch: 37 Global Step: 773450 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:27:14,588-Speed 2494.33 samples/sec Loss 1.0781 LearningRate 0.000006 Epoch: 37 Global Step: 773460 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:27:22,735-Speed 2514.40 samples/sec Loss 1.0564 LearningRate 0.000006 Epoch: 37 Global Step: 773470 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:27:30,938-Speed 2497.23 samples/sec Loss 1.1156 LearningRate 0.000006 Epoch: 37 Global Step: 773480 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:27:39,139-Speed 2497.60 samples/sec Loss 1.0963 LearningRate 0.000006 Epoch: 37 Global Step: 773490 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:27:47,344-Speed 2496.61 samples/sec Loss 1.1134 LearningRate 0.000006 Epoch: 37 Global Step: 773500 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:27:55,546-Speed 2497.37 samples/sec Loss 1.0813 LearningRate 0.000006 Epoch: 37 Global Step: 773510 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:28:03,752-Speed 2496.15 samples/sec Loss 1.0587 LearningRate 0.000006 Epoch: 37 Global Step: 773520 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:28:11,901-Speed 2513.55 samples/sec Loss 1.0837 LearningRate 0.000006 Epoch: 37 Global Step: 773530 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:28:20,100-Speed 2498.17 samples/sec Loss 1.0466 LearningRate 0.000006 Epoch: 37 Global Step: 773540 
Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:28:28,304-Speed 2496.64 samples/sec Loss 1.0914 LearningRate 0.000006 Epoch: 37 Global Step: 773550 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:28:36,510-Speed 2496.45 samples/sec Loss 1.0801 LearningRate 0.000006 Epoch: 37 Global Step: 773560 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:28:44,716-Speed 2496.02 samples/sec Loss 1.0779 LearningRate 0.000006 Epoch: 37 Global Step: 773570 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:28:52,918-Speed 2497.42 samples/sec Loss 1.1111 LearningRate 0.000006 Epoch: 37 Global Step: 773580 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:29:01,068-Speed 2513.45 samples/sec Loss 1.1037 LearningRate 0.000006 Epoch: 37 Global Step: 773590 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:29:09,272-Speed 2496.83 samples/sec Loss 1.0607 LearningRate 0.000006 Epoch: 37 Global Step: 773600 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:29:17,474-Speed 2497.14 samples/sec Loss 1.0487 LearningRate 0.000006 Epoch: 37 Global Step: 773610 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:29:25,676-Speed 2497.22 samples/sec Loss 1.0810 LearningRate 0.000006 Epoch: 37 Global Step: 773620 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:29:33,884-Speed 2495.63 samples/sec Loss 1.0861 LearningRate 0.000006 Epoch: 37 Global Step: 773630 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:29:42,088-Speed 2496.68 samples/sec Loss 1.0801 LearningRate 0.000006 Epoch: 37 Global Step: 773640 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:29:50,235-Speed 2514.24 samples/sec Loss 1.0779 LearningRate 0.000006 Epoch: 37 Global Step: 773650 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:29:58,438-Speed 2497.08 samples/sec Loss 1.0827 LearningRate 0.000006 Epoch: 37 Global Step: 773660 Fp16 Grad Scale: 
4096 Required: 13 hours Training: 2022-07-12 23:30:06,640-Speed 2497.34 samples/sec Loss 1.0646 LearningRate 0.000006 Epoch: 37 Global Step: 773670 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:30:14,840-Speed 2498.06 samples/sec Loss 1.0722 LearningRate 0.000006 Epoch: 37 Global Step: 773680 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:30:23,045-Speed 2496.38 samples/sec Loss 1.0555 LearningRate 0.000006 Epoch: 37 Global Step: 773690 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:30:31,250-Speed 2496.60 samples/sec Loss 1.0917 LearningRate 0.000006 Epoch: 37 Global Step: 773700 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:30:39,398-Speed 2513.78 samples/sec Loss 1.0845 LearningRate 0.000006 Epoch: 37 Global Step: 773710 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:30:47,604-Speed 2496.29 samples/sec Loss 1.0740 LearningRate 0.000006 Epoch: 37 Global Step: 773720 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:30:55,804-Speed 2497.67 samples/sec Loss 1.0825 LearningRate 0.000006 Epoch: 37 Global Step: 773730 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:31:04,007-Speed 2497.27 samples/sec Loss 1.0847 LearningRate 0.000006 Epoch: 37 Global Step: 773740 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:31:12,214-Speed 2495.65 samples/sec Loss 1.0636 LearningRate 0.000006 Epoch: 37 Global Step: 773750 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:31:20,419-Speed 2496.55 samples/sec Loss 1.0539 LearningRate 0.000006 Epoch: 37 Global Step: 773760 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:31:28,569-Speed 2513.34 samples/sec Loss 1.0641 LearningRate 0.000006 Epoch: 37 Global Step: 773770 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:31:36,775-Speed 2496.05 samples/sec Loss 1.0562 LearningRate 0.000006 Epoch: 37 Global Step: 773780 Fp16 Grad Scale: 4096 Required: 13 
hours Training: 2022-07-12 23:31:44,975-Speed 2497.85 samples/sec Loss 1.0708 LearningRate 0.000006 Epoch: 37 Global Step: 773790 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:31:53,177-Speed 2497.77 samples/sec Loss 1.0689 LearningRate 0.000006 Epoch: 37 Global Step: 773800 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:32:01,390-Speed 2494.09 samples/sec Loss 1.0909 LearningRate 0.000006 Epoch: 37 Global Step: 773810 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:32:09,591-Speed 2497.37 samples/sec Loss 1.0856 LearningRate 0.000006 Epoch: 37 Global Step: 773820 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:32:17,750-Speed 2510.78 samples/sec Loss 1.0576 LearningRate 0.000006 Epoch: 37 Global Step: 773830 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:32:25,951-Speed 2497.53 samples/sec Loss 1.1007 LearningRate 0.000006 Epoch: 37 Global Step: 773840 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:32:34,153-Speed 2497.28 samples/sec Loss 1.0708 LearningRate 0.000006 Epoch: 37 Global Step: 773850 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:32:42,354-Speed 2497.46 samples/sec Loss 1.0683 LearningRate 0.000006 Epoch: 37 Global Step: 773860 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:32:50,558-Speed 2496.80 samples/sec Loss 1.0757 LearningRate 0.000006 Epoch: 37 Global Step: 773870 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:32:58,765-Speed 2495.70 samples/sec Loss 1.0477 LearningRate 0.000006 Epoch: 37 Global Step: 773880 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:33:06,913-Speed 2514.01 samples/sec Loss 1.0843 LearningRate 0.000006 Epoch: 37 Global Step: 773890 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:33:15,114-Speed 2497.51 samples/sec Loss 1.0346 LearningRate 0.000006 Epoch: 37 Global Step: 773900 Fp16 Grad Scale: 4096 Required: 13 hours Training: 
2022-07-12 23:33:23,327-Speed 2493.82 samples/sec Loss 1.0713 LearningRate 0.000006 Epoch: 37 Global Step: 773910 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:33:31,532-Speed 2496.64 samples/sec Loss 1.1106 LearningRate 0.000006 Epoch: 37 Global Step: 773920 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:33:39,734-Speed 2497.25 samples/sec Loss 1.0330 LearningRate 0.000006 Epoch: 37 Global Step: 773930 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:33:47,947-Speed 2493.85 samples/sec Loss 1.0857 LearningRate 0.000006 Epoch: 37 Global Step: 773940 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:33:56,097-Speed 2513.23 samples/sec Loss 1.0473 LearningRate 0.000006 Epoch: 37 Global Step: 773950 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:34:04,300-Speed 2497.05 samples/sec Loss 1.0698 LearningRate 0.000006 Epoch: 37 Global Step: 773960 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:34:12,501-Speed 2497.59 samples/sec Loss 1.0647 LearningRate 0.000006 Epoch: 37 Global Step: 773970 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:34:20,700-Speed 2498.32 samples/sec Loss 1.0761 LearningRate 0.000006 Epoch: 37 Global Step: 773980 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:34:28,903-Speed 2496.81 samples/sec Loss 1.0506 LearningRate 0.000006 Epoch: 37 Global Step: 773990 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:34:37,108-Speed 2496.49 samples/sec Loss 1.0695 LearningRate 0.000006 Epoch: 37 Global Step: 774000 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:34:45,259-Speed 2512.97 samples/sec Loss 1.0812 LearningRate 0.000006 Epoch: 37 Global Step: 774010 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:34:53,464-Speed 2496.55 samples/sec Loss 1.0778 LearningRate 0.000006 Epoch: 37 Global Step: 774020 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 
23:35:01,665-Speed 2499.24 samples/sec Loss 1.0550 LearningRate 0.000006 Epoch: 37 Global Step: 774030 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:35:09,868-Speed 2497.22 samples/sec Loss 1.0711 LearningRate 0.000006 Epoch: 37 Global Step: 774040 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:35:18,067-Speed 2498.14 samples/sec Loss 1.0740 LearningRate 0.000006 Epoch: 37 Global Step: 774050 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:35:26,268-Speed 2497.85 samples/sec Loss 1.0678 LearningRate 0.000006 Epoch: 37 Global Step: 774060 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:35:34,416-Speed 2514.00 samples/sec Loss 1.0775 LearningRate 0.000006 Epoch: 37 Global Step: 774070 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:35:42,615-Speed 2498.25 samples/sec Loss 1.0727 LearningRate 0.000006 Epoch: 37 Global Step: 774080 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:35:50,819-Speed 2496.90 samples/sec Loss 1.1082 LearningRate 0.000006 Epoch: 37 Global Step: 774090 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:35:59,022-Speed 2496.79 samples/sec Loss 1.0994 LearningRate 0.000006 Epoch: 37 Global Step: 774100 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:36:07,223-Speed 2497.72 samples/sec Loss 1.0752 LearningRate 0.000006 Epoch: 37 Global Step: 774110 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:36:15,425-Speed 2497.38 samples/sec Loss 1.0794 LearningRate 0.000006 Epoch: 37 Global Step: 774120 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:36:23,575-Speed 2513.42 samples/sec Loss 1.0876 LearningRate 0.000006 Epoch: 37 Global Step: 774130 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:36:31,776-Speed 2497.48 samples/sec Loss 1.0686 LearningRate 0.000006 Epoch: 37 Global Step: 774140 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:36:39,980-Speed 
2496.99 samples/sec Loss 1.0651 LearningRate 0.000006 Epoch: 37 Global Step: 774150 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:36:48,182-Speed 2497.20 samples/sec Loss 1.0845 LearningRate 0.000006 Epoch: 37 Global Step: 774160 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:36:56,388-Speed 2496.35 samples/sec Loss 1.0935 LearningRate 0.000006 Epoch: 37 Global Step: 774170 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:37:04,593-Speed 2496.46 samples/sec Loss 1.0783 LearningRate 0.000006 Epoch: 37 Global Step: 774180 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:37:12,740-Speed 2514.23 samples/sec Loss 1.0586 LearningRate 0.000006 Epoch: 37 Global Step: 774190 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:37:20,943-Speed 2496.93 samples/sec Loss 1.0688 LearningRate 0.000005 Epoch: 37 Global Step: 774200 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:37:29,149-Speed 2496.04 samples/sec Loss 1.0675 LearningRate 0.000005 Epoch: 37 Global Step: 774210 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:37:37,353-Speed 2496.65 samples/sec Loss 1.1000 LearningRate 0.000005 Epoch: 37 Global Step: 774220 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:37:45,558-Speed 2496.66 samples/sec Loss 1.1295 LearningRate 0.000005 Epoch: 37 Global Step: 774230 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:37:53,775-Speed 2492.85 samples/sec Loss 1.0718 LearningRate 0.000005 Epoch: 37 Global Step: 774240 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:38:01,926-Speed 2512.95 samples/sec Loss 1.0573 LearningRate 0.000005 Epoch: 37 Global Step: 774250 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:38:10,132-Speed 2496.07 samples/sec Loss 1.0636 LearningRate 0.000005 Epoch: 37 Global Step: 774260 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:38:18,340-Speed 2495.74 samples/sec 
Loss 1.0530 LearningRate 0.000005 Epoch: 37 Global Step: 774270 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:38:26,545-Speed 2496.30 samples/sec Loss 1.0763 LearningRate 0.000005 Epoch: 37 Global Step: 774280 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:38:34,749-Speed 2496.62 samples/sec Loss 1.0775 LearningRate 0.000005 Epoch: 37 Global Step: 774290 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:38:42,969-Speed 2491.93 samples/sec Loss 1.0602 LearningRate 0.000005 Epoch: 37 Global Step: 774300 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:38:51,124-Speed 2511.85 samples/sec Loss 1.1018 LearningRate 0.000005 Epoch: 37 Global Step: 774310 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:38:59,330-Speed 2495.84 samples/sec Loss 1.0744 LearningRate 0.000005 Epoch: 37 Global Step: 774320 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:39:07,544-Speed 2493.97 samples/sec Loss 1.0781 LearningRate 0.000005 Epoch: 37 Global Step: 774330 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:39:15,755-Speed 2494.55 samples/sec Loss 1.0875 LearningRate 0.000005 Epoch: 37 Global Step: 774340 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:39:23,961-Speed 2496.44 samples/sec Loss 1.0781 LearningRate 0.000005 Epoch: 37 Global Step: 774350 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-07-12 23:39:32,122-Speed 2509.72 samples/sec Loss 1.0908 LearningRate 0.000005 Epoch: 37 Global Step: 774360 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:39:40,284-Speed 2509.57 samples/sec Loss 1.1048 LearningRate 0.000005 Epoch: 37 Global Step: 774370 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:39:48,485-Speed 2497.84 samples/sec Loss 1.0757 LearningRate 0.000005 Epoch: 37 Global Step: 774380 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:39:56,688-Speed 2496.72 samples/sec Loss 1.0697 
LearningRate 0.000005 Epoch: 37 Global Step: 774390 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:40:04,889-Speed 2497.83 samples/sec Loss 1.1026 LearningRate 0.000005 Epoch: 37 Global Step: 774400 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:40:13,087-Speed 2498.37 samples/sec Loss 1.0794 LearningRate 0.000005 Epoch: 37 Global Step: 774410 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:40:21,293-Speed 2496.27 samples/sec Loss 1.0758 LearningRate 0.000005 Epoch: 37 Global Step: 774420 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:40:29,438-Speed 2514.77 samples/sec Loss 1.0836 LearningRate 0.000005 Epoch: 37 Global Step: 774430 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:40:37,635-Speed 2499.11 samples/sec Loss 1.0829 LearningRate 0.000005 Epoch: 37 Global Step: 774440 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:40:45,864-Speed 2489.21 samples/sec Loss 1.0686 LearningRate 0.000005 Epoch: 37 Global Step: 774450 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:40:54,065-Speed 2497.49 samples/sec Loss 1.0846 LearningRate 0.000005 Epoch: 37 Global Step: 774460 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:41:02,269-Speed 2496.82 samples/sec Loss 1.0930 LearningRate 0.000005 Epoch: 37 Global Step: 774470 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:41:10,486-Speed 2492.80 samples/sec Loss 1.1137 LearningRate 0.000005 Epoch: 37 Global Step: 774480 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:41:18,633-Speed 2514.04 samples/sec Loss 1.0583 LearningRate 0.000005 Epoch: 37 Global Step: 774490 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:41:26,851-Speed 2492.54 samples/sec Loss 1.0663 LearningRate 0.000005 Epoch: 37 Global Step: 774500 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:41:35,055-Speed 2496.83 samples/sec Loss 1.1042 LearningRate 
0.000005 Epoch: 37 Global Step: 774510 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:41:43,255-Speed 2497.95 samples/sec Loss 1.0857 LearningRate 0.000005 Epoch: 37 Global Step: 774520 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:41:51,458-Speed 2497.15 samples/sec Loss 1.0820 LearningRate 0.000005 Epoch: 37 Global Step: 774530 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:41:59,674-Speed 2492.95 samples/sec Loss 1.0664 LearningRate 0.000005 Epoch: 37 Global Step: 774540 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:42:07,822-Speed 2513.65 samples/sec Loss 1.0731 LearningRate 0.000005 Epoch: 37 Global Step: 774550 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:42:16,026-Speed 2496.91 samples/sec Loss 1.0550 LearningRate 0.000005 Epoch: 37 Global Step: 774560 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:42:24,228-Speed 2497.21 samples/sec Loss 1.0850 LearningRate 0.000005 Epoch: 37 Global Step: 774570 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:42:32,433-Speed 2496.54 samples/sec Loss 1.0868 LearningRate 0.000005 Epoch: 37 Global Step: 774580 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:42:40,637-Speed 2497.04 samples/sec Loss 1.0786 LearningRate 0.000005 Epoch: 37 Global Step: 774590 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:42:48,840-Speed 2496.83 samples/sec Loss 1.0756 LearningRate 0.000005 Epoch: 37 Global Step: 774600 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:42:56,991-Speed 2513.05 samples/sec Loss 1.0568 LearningRate 0.000005 Epoch: 37 Global Step: 774610 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:43:05,193-Speed 2497.06 samples/sec Loss 1.1133 LearningRate 0.000005 Epoch: 37 Global Step: 774620 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:43:13,405-Speed 2494.56 samples/sec Loss 1.0708 LearningRate 0.000005 Epoch: 37 
Global Step: 774630 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:43:21,610-Speed 2496.41 samples/sec Loss 1.0818 LearningRate 0.000005 Epoch: 37 Global Step: 774640 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:43:29,813-Speed 2497.16 samples/sec Loss 1.0769 LearningRate 0.000005 Epoch: 37 Global Step: 774650 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:43:38,016-Speed 2497.10 samples/sec Loss 1.0945 LearningRate 0.000005 Epoch: 37 Global Step: 774660 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:43:46,164-Speed 2513.78 samples/sec Loss 1.0898 LearningRate 0.000005 Epoch: 37 Global Step: 774670 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:43:54,373-Speed 2495.25 samples/sec Loss 1.0793 LearningRate 0.000005 Epoch: 37 Global Step: 774680 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:44:02,587-Speed 2493.67 samples/sec Loss 1.0797 LearningRate 0.000005 Epoch: 37 Global Step: 774690 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:44:10,790-Speed 2497.28 samples/sec Loss 1.0620 LearningRate 0.000005 Epoch: 37 Global Step: 774700 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:44:18,994-Speed 2496.78 samples/sec Loss 1.0710 LearningRate 0.000005 Epoch: 37 Global Step: 774710 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:44:27,200-Speed 2496.04 samples/sec Loss 1.0926 LearningRate 0.000005 Epoch: 37 Global Step: 774720 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:44:35,345-Speed 2514.70 samples/sec Loss 1.0671 LearningRate 0.000005 Epoch: 37 Global Step: 774730 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:44:43,560-Speed 2493.42 samples/sec Loss 1.0847 LearningRate 0.000005 Epoch: 37 Global Step: 774740 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:44:51,761-Speed 2497.56 samples/sec Loss 1.1015 LearningRate 0.000005 Epoch: 37 Global Step: 774750 
Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:44:59,961-Speed 2498.11 samples/sec Loss 1.1037 LearningRate 0.000005 Epoch: 37 Global Step: 774760 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:45:08,162-Speed 2497.67 samples/sec Loss 1.1017 LearningRate 0.000005 Epoch: 37 Global Step: 774770 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:45:16,363-Speed 2497.59 samples/sec Loss 1.1007 LearningRate 0.000005 Epoch: 37 Global Step: 774780 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:45:24,514-Speed 2513.02 samples/sec Loss 1.1060 LearningRate 0.000005 Epoch: 37 Global Step: 774790 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:45:32,721-Speed 2495.84 samples/sec Loss 1.0801 LearningRate 0.000005 Epoch: 37 Global Step: 774800 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:45:40,922-Speed 2497.64 samples/sec Loss 1.0483 LearningRate 0.000005 Epoch: 37 Global Step: 774810 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:45:49,123-Speed 2497.46 samples/sec Loss 1.0732 LearningRate 0.000005 Epoch: 37 Global Step: 774820 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:45:57,337-Speed 2493.94 samples/sec Loss 1.0488 LearningRate 0.000005 Epoch: 37 Global Step: 774830 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:46:05,536-Speed 2498.41 samples/sec Loss 1.0678 LearningRate 0.000005 Epoch: 37 Global Step: 774840 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:46:13,684-Speed 2514.03 samples/sec Loss 1.0727 LearningRate 0.000005 Epoch: 37 Global Step: 774850 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:46:21,885-Speed 2497.39 samples/sec Loss 1.0664 LearningRate 0.000005 Epoch: 37 Global Step: 774860 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:46:30,086-Speed 2497.96 samples/sec Loss 1.0837 LearningRate 0.000005 Epoch: 37 Global Step: 774870 Fp16 Grad Scale: 
4096 Required: 13 hours Training: 2022-07-12 23:46:38,301-Speed 2493.67 samples/sec Loss 1.0874 LearningRate 0.000005 Epoch: 37 Global Step: 774880 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:46:46,511-Speed 2494.77 samples/sec Loss 1.1048 LearningRate 0.000005 Epoch: 37 Global Step: 774890 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:46:54,727-Speed 2493.02 samples/sec Loss 1.0755 LearningRate 0.000005 Epoch: 37 Global Step: 774900 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:47:02,874-Speed 2514.06 samples/sec Loss 1.0783 LearningRate 0.000005 Epoch: 37 Global Step: 774910 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:47:11,076-Speed 2497.47 samples/sec Loss 1.0734 LearningRate 0.000005 Epoch: 37 Global Step: 774920 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:47:19,279-Speed 2496.95 samples/sec Loss 1.0842 LearningRate 0.000005 Epoch: 37 Global Step: 774930 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-07-12 23:47:27,482-Speed 2497.10 samples/sec Loss 1.0828 LearningRate 0.000005 Epoch: 37 Global Step: 774940 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:47:35,687-Speed 2496.44 samples/sec Loss 1.0861 LearningRate 0.000005 Epoch: 37 Global Step: 774950 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:47:43,889-Speed 2497.47 samples/sec Loss 1.0830 LearningRate 0.000005 Epoch: 37 Global Step: 774960 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:47:52,040-Speed 2512.92 samples/sec Loss 1.0765 LearningRate 0.000005 Epoch: 37 Global Step: 774970 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:48:00,244-Speed 2496.84 samples/sec Loss 1.0659 LearningRate 0.000005 Epoch: 37 Global Step: 774980 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:48:08,446-Speed 2496.97 samples/sec Loss 1.0872 LearningRate 0.000005 Epoch: 37 Global Step: 774990 Fp16 Grad Scale: 4096 Required: 12 
hours Training: 2022-07-12 23:48:16,652-Speed 2496.22 samples/sec Loss 1.0560 LearningRate 0.000005 Epoch: 37 Global Step: 775000 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:48:24,854-Speed 2497.35 samples/sec Loss 1.0883 LearningRate 0.000005 Epoch: 37 Global Step: 775010 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:48:33,057-Speed 2496.89 samples/sec Loss 1.0821 LearningRate 0.000005 Epoch: 37 Global Step: 775020 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:48:41,210-Speed 2512.09 samples/sec Loss 1.1075 LearningRate 0.000005 Epoch: 37 Global Step: 775030 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:48:49,419-Speed 2495.47 samples/sec Loss 1.0873 LearningRate 0.000005 Epoch: 37 Global Step: 775040 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:48:57,630-Speed 2494.77 samples/sec Loss 1.0852 LearningRate 0.000005 Epoch: 37 Global Step: 775050 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:49:05,836-Speed 2496.11 samples/sec Loss 1.0876 LearningRate 0.000005 Epoch: 37 Global Step: 775060 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:49:14,034-Speed 2498.26 samples/sec Loss 1.0612 LearningRate 0.000005 Epoch: 37 Global Step: 775070 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:49:22,239-Speed 2496.54 samples/sec Loss 1.0922 LearningRate 0.000005 Epoch: 37 Global Step: 775080 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:49:30,389-Speed 2513.39 samples/sec Loss 1.1008 LearningRate 0.000005 Epoch: 37 Global Step: 775090 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:49:38,590-Speed 2497.62 samples/sec Loss 1.0464 LearningRate 0.000005 Epoch: 37 Global Step: 775100 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:49:46,790-Speed 2498.01 samples/sec Loss 1.0991 LearningRate 0.000005 Epoch: 37 Global Step: 775110 Fp16 Grad Scale: 4096 Required: 12 hours Training: 
2022-07-12 23:49:54,991-Speed 2497.50 samples/sec Loss 1.0563 LearningRate 0.000005 Epoch: 37 Global Step: 775120 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:50:03,196-Speed 2496.51 samples/sec Loss 1.0358 LearningRate 0.000005 Epoch: 37 Global Step: 775130 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:50:11,399-Speed 2497.02 samples/sec Loss 1.1119 LearningRate 0.000005 Epoch: 37 Global Step: 775140 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:50:19,548-Speed 2513.47 samples/sec Loss 1.0964 LearningRate 0.000005 Epoch: 37 Global Step: 775150 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:50:27,750-Speed 2497.54 samples/sec Loss 1.0794 LearningRate 0.000005 Epoch: 37 Global Step: 775160 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:50:35,950-Speed 2497.65 samples/sec Loss 1.0779 LearningRate 0.000005 Epoch: 37 Global Step: 775170 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:50:44,156-Speed 2496.09 samples/sec Loss 1.0860 LearningRate 0.000005 Epoch: 37 Global Step: 775180 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:50:52,355-Speed 2498.26 samples/sec Loss 1.0806 LearningRate 0.000005 Epoch: 37 Global Step: 775190 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:51:00,558-Speed 2497.15 samples/sec Loss 1.0876 LearningRate 0.000005 Epoch: 37 Global Step: 775200 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:51:08,708-Speed 2513.16 samples/sec Loss 1.0410 LearningRate 0.000005 Epoch: 37 Global Step: 775210 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:51:16,906-Speed 2498.34 samples/sec Loss 1.0744 LearningRate 0.000005 Epoch: 37 Global Step: 775220 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:51:25,107-Speed 2497.67 samples/sec Loss 1.0773 LearningRate 0.000005 Epoch: 37 Global Step: 775230 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 
23:51:33,306-Speed 2498.48 samples/sec Loss 1.0550 LearningRate 0.000005 Epoch: 37 Global Step: 775240 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:51:41,506-Speed 2497.87 samples/sec Loss 1.1086 LearningRate 0.000005 Epoch: 37 Global Step: 775250 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:51:49,710-Speed 2496.82 samples/sec Loss 1.0951 LearningRate 0.000005 Epoch: 37 Global Step: 775260 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:51:57,862-Speed 2512.73 samples/sec Loss 1.1016 LearningRate 0.000005 Epoch: 37 Global Step: 775270 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:52:06,067-Speed 2496.38 samples/sec Loss 1.0880 LearningRate 0.000005 Epoch: 37 Global Step: 775280 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:52:14,273-Speed 2496.23 samples/sec Loss 1.0748 LearningRate 0.000005 Epoch: 37 Global Step: 775290 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:52:22,481-Speed 2495.42 samples/sec Loss 1.0833 LearningRate 0.000005 Epoch: 37 Global Step: 775300 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:52:30,684-Speed 2497.13 samples/sec Loss 1.0513 LearningRate 0.000005 Epoch: 37 Global Step: 775310 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:52:38,883-Speed 2498.34 samples/sec Loss 1.0632 LearningRate 0.000005 Epoch: 37 Global Step: 775320 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:52:47,029-Speed 2514.35 samples/sec Loss 1.0618 LearningRate 0.000005 Epoch: 37 Global Step: 775330 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:52:55,240-Speed 2494.81 samples/sec Loss 1.0878 LearningRate 0.000005 Epoch: 37 Global Step: 775340 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:53:03,441-Speed 2497.71 samples/sec Loss 1.0622 LearningRate 0.000005 Epoch: 37 Global Step: 775350 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:53:11,641-Speed 
2497.69 samples/sec Loss 1.0643 LearningRate 0.000005 Epoch: 37 Global Step: 775360 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:53:19,840-Speed 2498.16 samples/sec Loss 1.0945 LearningRate 0.000005 Epoch: 37 Global Step: 775370 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:53:28,042-Speed 2497.30 samples/sec Loss 1.0717 LearningRate 0.000005 Epoch: 37 Global Step: 775380 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:53:36,201-Speed 2510.68 samples/sec Loss 1.0984 LearningRate 0.000005 Epoch: 37 Global Step: 775390 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:53:44,401-Speed 2497.94 samples/sec Loss 1.0670 LearningRate 0.000005 Epoch: 37 Global Step: 775400 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:53:52,603-Speed 2497.32 samples/sec Loss 1.1128 LearningRate 0.000005 Epoch: 37 Global Step: 775410 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:54:00,806-Speed 2497.47 samples/sec Loss 1.0760 LearningRate 0.000005 Epoch: 37 Global Step: 775420 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:54:09,008-Speed 2497.56 samples/sec Loss 1.0654 LearningRate 0.000005 Epoch: 37 Global Step: 775430 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:54:17,210-Speed 2497.33 samples/sec Loss 1.0808 LearningRate 0.000005 Epoch: 37 Global Step: 775440 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:54:25,361-Speed 2512.91 samples/sec Loss 1.0550 LearningRate 0.000005 Epoch: 37 Global Step: 775450 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:54:33,575-Speed 2493.60 samples/sec Loss 1.0632 LearningRate 0.000005 Epoch: 37 Global Step: 775460 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:54:41,777-Speed 2497.48 samples/sec Loss 1.0772 LearningRate 0.000005 Epoch: 37 Global Step: 775470 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:54:49,978-Speed 2497.57 samples/sec 
Loss 1.1026 LearningRate 0.000005 Epoch: 37 Global Step: 775480 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:54:58,183-Speed 2496.42 samples/sec Loss 1.0793 LearningRate 0.000005 Epoch: 37 Global Step: 775490 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:55:06,385-Speed 2497.26 samples/sec Loss 1.0611 LearningRate 0.000005 Epoch: 37 Global Step: 775500 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:55:14,530-Speed 2515.00 samples/sec Loss 1.1078 LearningRate 0.000005 Epoch: 37 Global Step: 775510 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:55:22,730-Speed 2498.00 samples/sec Loss 1.0833 LearningRate 0.000005 Epoch: 37 Global Step: 775520 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:55:30,929-Speed 2498.21 samples/sec Loss 1.0983 LearningRate 0.000005 Epoch: 37 Global Step: 775530 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:55:39,129-Speed 2497.83 samples/sec Loss 1.0377 LearningRate 0.000005 Epoch: 37 Global Step: 775540 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:55:47,329-Speed 2497.84 samples/sec Loss 1.0649 LearningRate 0.000005 Epoch: 37 Global Step: 775550 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-12 23:55:55,529-Speed 2497.88 samples/sec Loss 1.0773 LearningRate 0.000005 Epoch: 37 Global Step: 775560 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:56:03,677-Speed 2513.95 samples/sec Loss 1.1040 LearningRate 0.000005 Epoch: 37 Global Step: 775570 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:56:11,884-Speed 2495.87 samples/sec Loss 1.1128 LearningRate 0.000005 Epoch: 37 Global Step: 775580 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:56:20,083-Speed 2498.20 samples/sec Loss 1.0863 LearningRate 0.000005 Epoch: 37 Global Step: 775590 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:56:28,288-Speed 2496.42 samples/sec Loss 1.1037 
LearningRate 0.000005 Epoch: 37 Global Step: 775600 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:56:36,490-Speed 2498.44 samples/sec Loss 1.0691 LearningRate 0.000005 Epoch: 37 Global Step: 775610 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:56:44,691-Speed 2497.67 samples/sec Loss 1.0649 LearningRate 0.000005 Epoch: 37 Global Step: 775620 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:56:52,841-Speed 2513.46 samples/sec Loss 1.0820 LearningRate 0.000005 Epoch: 37 Global Step: 775630 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:57:01,040-Speed 2498.34 samples/sec Loss 1.0995 LearningRate 0.000005 Epoch: 37 Global Step: 775640 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:57:09,241-Speed 2497.74 samples/sec Loss 1.0755 LearningRate 0.000005 Epoch: 37 Global Step: 775650 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:57:17,446-Speed 2496.38 samples/sec Loss 1.0854 LearningRate 0.000005 Epoch: 37 Global Step: 775660 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:57:25,647-Speed 2497.89 samples/sec Loss 1.0879 LearningRate 0.000005 Epoch: 37 Global Step: 775670 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:57:33,850-Speed 2496.94 samples/sec Loss 1.0845 LearningRate 0.000005 Epoch: 37 Global Step: 775680 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:57:42,001-Speed 2512.73 samples/sec Loss 1.0855 LearningRate 0.000005 Epoch: 37 Global Step: 775690 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:57:50,203-Speed 2497.63 samples/sec Loss 1.0594 LearningRate 0.000005 Epoch: 37 Global Step: 775700 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:57:58,406-Speed 2497.15 samples/sec Loss 1.0703 LearningRate 0.000005 Epoch: 37 Global Step: 775710 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:58:06,606-Speed 2498.06 samples/sec Loss 1.0771 LearningRate 
0.000005 Epoch: 37 Global Step: 775720 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:58:14,812-Speed 2496.03 samples/sec Loss 1.0663 LearningRate 0.000005 Epoch: 37 Global Step: 775730 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:58:23,014-Speed 2497.39 samples/sec Loss 1.0899 LearningRate 0.000005 Epoch: 37 Global Step: 775740 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:58:31,164-Speed 2513.38 samples/sec Loss 1.0540 LearningRate 0.000005 Epoch: 37 Global Step: 775750 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:58:39,364-Speed 2497.68 samples/sec Loss 1.0645 LearningRate 0.000005 Epoch: 37 Global Step: 775760 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:58:47,566-Speed 2497.63 samples/sec Loss 1.0623 LearningRate 0.000005 Epoch: 37 Global Step: 775770 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:58:55,767-Speed 2497.86 samples/sec Loss 1.1155 LearningRate 0.000005 Epoch: 37 Global Step: 775780 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:59:03,968-Speed 2497.35 samples/sec Loss 1.0854 LearningRate 0.000005 Epoch: 37 Global Step: 775790 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:59:12,170-Speed 2497.37 samples/sec Loss 1.1036 LearningRate 0.000005 Epoch: 37 Global Step: 775800 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:59:20,328-Speed 2511.06 samples/sec Loss 1.0781 LearningRate 0.000005 Epoch: 37 Global Step: 775810 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:59:28,530-Speed 2497.40 samples/sec Loss 1.0767 LearningRate 0.000005 Epoch: 37 Global Step: 775820 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:59:36,734-Speed 2496.49 samples/sec Loss 1.1044 LearningRate 0.000005 Epoch: 37 Global Step: 775830 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:59:44,937-Speed 2497.24 samples/sec Loss 1.0911 LearningRate 0.000005 Epoch: 37 
Global Step: 775840 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-12 23:59:53,139-Speed 2497.39 samples/sec Loss 1.0619 LearningRate 0.000005 Epoch: 37 Global Step: 775850 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:00:01,341-Speed 2497.38 samples/sec Loss 1.0821 LearningRate 0.000005 Epoch: 37 Global Step: 775860 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:00:09,490-Speed 2513.61 samples/sec Loss 1.0822 LearningRate 0.000005 Epoch: 37 Global Step: 775870 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:00:17,696-Speed 2496.33 samples/sec Loss 1.0430 LearningRate 0.000005 Epoch: 37 Global Step: 775880 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:00:25,901-Speed 2496.45 samples/sec Loss 1.0985 LearningRate 0.000005 Epoch: 37 Global Step: 775890 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:00:34,106-Speed 2496.34 samples/sec Loss 1.0777 LearningRate 0.000005 Epoch: 37 Global Step: 775900 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:00:42,313-Speed 2495.78 samples/sec Loss 1.0781 LearningRate 0.000005 Epoch: 37 Global Step: 775910 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:00:50,516-Speed 2496.98 samples/sec Loss 1.0880 LearningRate 0.000005 Epoch: 37 Global Step: 775920 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:00:58,664-Speed 2514.04 samples/sec Loss 1.0599 LearningRate 0.000005 Epoch: 37 Global Step: 775930 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:01:06,868-Speed 2496.74 samples/sec Loss 1.0955 LearningRate 0.000005 Epoch: 37 Global Step: 775940 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:01:15,075-Speed 2495.90 samples/sec Loss 1.0851 LearningRate 0.000005 Epoch: 37 Global Step: 775950 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:01:23,285-Speed 2494.93 samples/sec Loss 1.0898 LearningRate 0.000005 Epoch: 37 Global Step: 775960 
Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:01:31,487-Speed 2497.20 samples/sec Loss 1.1084 LearningRate 0.000005 Epoch: 37 Global Step: 775970 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:01:39,695-Speed 2495.51 samples/sec Loss 1.0643 LearningRate 0.000005 Epoch: 37 Global Step: 775980 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:01:47,847-Speed 2512.95 samples/sec Loss 1.0730 LearningRate 0.000005 Epoch: 37 Global Step: 775990 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:01:56,048-Speed 2497.63 samples/sec Loss 1.0842 LearningRate 0.000005 Epoch: 37 Global Step: 776000 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:02:04,251-Speed 2497.01 samples/sec Loss 1.0808 LearningRate 0.000005 Epoch: 37 Global Step: 776010 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:02:12,456-Speed 2496.44 samples/sec Loss 1.1257 LearningRate 0.000005 Epoch: 37 Global Step: 776020 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:02:20,617-Speed 2510.12 samples/sec Loss 1.0854 LearningRate 0.000005 Epoch: 37 Global Step: 776030 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:02:28,820-Speed 2497.06 samples/sec Loss 1.0533 LearningRate 0.000005 Epoch: 37 Global Step: 776040 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:02:36,971-Speed 2512.95 samples/sec Loss 1.0683 LearningRate 0.000005 Epoch: 37 Global Step: 776050 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:02:45,172-Speed 2497.57 samples/sec Loss 1.0813 LearningRate 0.000005 Epoch: 37 Global Step: 776060 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:02:53,381-Speed 2495.21 samples/sec Loss 1.0709 LearningRate 0.000005 Epoch: 37 Global Step: 776070 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:03:01,582-Speed 2497.82 samples/sec Loss 1.1081 LearningRate 0.000005 Epoch: 37 Global Step: 776080 Fp16 Grad Scale: 
4096 Required: 12 hours Training: 2022-07-13 00:03:09,783-Speed 2497.41 samples/sec Loss 1.0775 LearningRate 0.000005 Epoch: 37 Global Step: 776090 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:03:17,984-Speed 2497.54 samples/sec Loss 1.1363 LearningRate 0.000005 Epoch: 37 Global Step: 776100 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:03:26,136-Speed 2512.89 samples/sec Loss 1.0806 LearningRate 0.000005 Epoch: 37 Global Step: 776110 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:03:34,336-Speed 2497.97 samples/sec Loss 1.0949 LearningRate 0.000005 Epoch: 37 Global Step: 776120 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:03:42,541-Speed 2496.46 samples/sec Loss 1.0889 LearningRate 0.000005 Epoch: 37 Global Step: 776130 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:03:50,745-Speed 2497.06 samples/sec Loss 1.0879 LearningRate 0.000005 Epoch: 37 Global Step: 776140 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:03:58,950-Speed 2496.53 samples/sec Loss 1.0674 LearningRate 0.000005 Epoch: 37 Global Step: 776150 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:04:07,154-Speed 2497.08 samples/sec Loss 1.0610 LearningRate 0.000005 Epoch: 37 Global Step: 776160 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:04:15,308-Speed 2511.96 samples/sec Loss 1.0830 LearningRate 0.000005 Epoch: 37 Global Step: 776170 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:04:23,514-Speed 2496.26 samples/sec Loss 1.0512 LearningRate 0.000005 Epoch: 37 Global Step: 776180 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:04:31,715-Speed 2497.39 samples/sec Loss 1.0785 LearningRate 0.000005 Epoch: 37 Global Step: 776190 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:04:39,916-Speed 2497.70 samples/sec Loss 1.0824 LearningRate 0.000005 Epoch: 37 Global Step: 776200 Fp16 Grad Scale: 4096 Required: 12 
hours Training: 2022-07-13 00:04:48,121-Speed 2496.46 samples/sec Loss 1.0697 LearningRate 0.000005 Epoch: 37 Global Step: 776210 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:04:56,328-Speed 2495.77 samples/sec Loss 1.0400 LearningRate 0.000005 Epoch: 37 Global Step: 776220 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:05:04,477-Speed 2513.56 samples/sec Loss 1.0692 LearningRate 0.000005 Epoch: 37 Global Step: 776230 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:05:12,688-Speed 2494.44 samples/sec Loss 1.0837 LearningRate 0.000005 Epoch: 37 Global Step: 776240 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:05:20,892-Speed 2496.95 samples/sec Loss 1.0825 LearningRate 0.000005 Epoch: 37 Global Step: 776250 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:05:29,090-Speed 2498.27 samples/sec Loss 1.0924 LearningRate 0.000005 Epoch: 37 Global Step: 776260 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:05:37,297-Speed 2496.04 samples/sec Loss 1.0545 LearningRate 0.000005 Epoch: 37 Global Step: 776270 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:05:45,499-Speed 2497.30 samples/sec Loss 1.0647 LearningRate 0.000005 Epoch: 37 Global Step: 776280 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:05:53,646-Speed 2514.16 samples/sec Loss 1.0978 LearningRate 0.000005 Epoch: 37 Global Step: 776290 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:06:01,846-Speed 2497.91 samples/sec Loss 1.0951 LearningRate 0.000005 Epoch: 37 Global Step: 776300 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:06:10,050-Speed 2496.74 samples/sec Loss 1.0908 LearningRate 0.000005 Epoch: 37 Global Step: 776310 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:06:18,253-Speed 2496.93 samples/sec Loss 1.0854 LearningRate 0.000005 Epoch: 37 Global Step: 776320 Fp16 Grad Scale: 4096 Required: 12 hours Training: 
2022-07-13 00:06:26,469-Speed 2493.36 samples/sec Loss 1.0709 LearningRate 0.000005 Epoch: 37 Global Step: 776330 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:06:34,684-Speed 2493.72 samples/sec Loss 1.0562 LearningRate 0.000005 Epoch: 37 Global Step: 776340 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:06:42,835-Speed 2512.75 samples/sec Loss 1.1025 LearningRate 0.000005 Epoch: 37 Global Step: 776350 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:06:51,050-Speed 2493.47 samples/sec Loss 1.0764 LearningRate 0.000005 Epoch: 37 Global Step: 776360 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:06:59,253-Speed 2496.96 samples/sec Loss 1.0628 LearningRate 0.000005 Epoch: 37 Global Step: 776370 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:07:07,472-Speed 2492.23 samples/sec Loss 1.0783 LearningRate 0.000005 Epoch: 37 Global Step: 776380 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:07:15,681-Speed 2495.22 samples/sec Loss 1.0753 LearningRate 0.000005 Epoch: 37 Global Step: 776390 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:07:23,896-Speed 2493.43 samples/sec Loss 1.1075 LearningRate 0.000005 Epoch: 37 Global Step: 776400 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:07:32,044-Speed 2514.04 samples/sec Loss 1.0474 LearningRate 0.000005 Epoch: 37 Global Step: 776410 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:07:40,247-Speed 2497.14 samples/sec Loss 1.0567 LearningRate 0.000005 Epoch: 37 Global Step: 776420 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:07:48,452-Speed 2496.36 samples/sec Loss 1.0904 LearningRate 0.000005 Epoch: 37 Global Step: 776430 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:07:56,652-Speed 2497.89 samples/sec Loss 1.0578 LearningRate 0.000005 Epoch: 37 Global Step: 776440 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 
00:08:04,855-Speed 2497.03 samples/sec Loss 1.0835 LearningRate 0.000005 Epoch: 37 Global Step: 776450 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:08:13,058-Speed 2497.35 samples/sec Loss 1.0738 LearningRate 0.000005 Epoch: 37 Global Step: 776460 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:08:21,211-Speed 2512.69 samples/sec Loss 1.0673 LearningRate 0.000005 Epoch: 37 Global Step: 776470 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:08:29,413-Speed 2497.13 samples/sec Loss 1.0803 LearningRate 0.000005 Epoch: 37 Global Step: 776480 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:08:37,617-Speed 2496.64 samples/sec Loss 1.0740 LearningRate 0.000005 Epoch: 37 Global Step: 776490 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:08:45,820-Speed 2497.22 samples/sec Loss 1.0884 LearningRate 0.000005 Epoch: 37 Global Step: 776500 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:08:54,021-Speed 2497.79 samples/sec Loss 1.0822 LearningRate 0.000005 Epoch: 37 Global Step: 776510 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:09:02,223-Speed 2497.20 samples/sec Loss 1.0639 LearningRate 0.000005 Epoch: 37 Global Step: 776520 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:09:10,382-Speed 2511.22 samples/sec Loss 1.1092 LearningRate 0.000005 Epoch: 37 Global Step: 776530 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:09:18,584-Speed 2497.26 samples/sec Loss 1.0597 LearningRate 0.000005 Epoch: 37 Global Step: 776540 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:09:26,786-Speed 2497.34 samples/sec Loss 1.0837 LearningRate 0.000005 Epoch: 37 Global Step: 776550 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:09:34,990-Speed 2496.73 samples/sec Loss 1.0566 LearningRate 0.000005 Epoch: 37 Global Step: 776560 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:09:43,192-Speed 
2497.37 samples/sec Loss 1.0967 LearningRate 0.000005 Epoch: 37 Global Step: 776570 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:09:51,392-Speed 2497.84 samples/sec Loss 1.0810 LearningRate 0.000005 Epoch: 37 Global Step: 776580 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:09:59,538-Speed 2514.29 samples/sec Loss 1.0759 LearningRate 0.000005 Epoch: 37 Global Step: 776590 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:10:07,741-Speed 2497.34 samples/sec Loss 1.0913 LearningRate 0.000005 Epoch: 37 Global Step: 776600 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:10:15,945-Speed 2496.80 samples/sec Loss 1.0553 LearningRate 0.000005 Epoch: 37 Global Step: 776610 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:10:24,146-Speed 2498.16 samples/sec Loss 1.0891 LearningRate 0.000005 Epoch: 37 Global Step: 776620 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:10:32,347-Speed 2497.54 samples/sec Loss 1.0734 LearningRate 0.000005 Epoch: 37 Global Step: 776630 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:10:40,547-Speed 2497.91 samples/sec Loss 1.0902 LearningRate 0.000005 Epoch: 37 Global Step: 776640 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:10:48,696-Speed 2513.76 samples/sec Loss 1.0440 LearningRate 0.000005 Epoch: 37 Global Step: 776650 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:10:56,896-Speed 2497.93 samples/sec Loss 1.0932 LearningRate 0.000005 Epoch: 37 Global Step: 776660 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:11:05,098-Speed 2497.39 samples/sec Loss 1.0972 LearningRate 0.000005 Epoch: 37 Global Step: 776670 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:11:13,299-Speed 2497.65 samples/sec Loss 1.0777 LearningRate 0.000005 Epoch: 37 Global Step: 776680 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:11:21,498-Speed 2498.05 samples/sec 
Loss 1.0931 LearningRate 0.000005 Epoch: 37 Global Step: 776690 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:11:29,702-Speed 2496.61 samples/sec Loss 1.0976 LearningRate 0.000005 Epoch: 37 Global Step: 776700 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:11:37,851-Speed 2513.75 samples/sec Loss 1.0416 LearningRate 0.000005 Epoch: 37 Global Step: 776710 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:11:46,067-Speed 2492.96 samples/sec Loss 1.0756 LearningRate 0.000005 Epoch: 37 Global Step: 776720 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:11:54,270-Speed 2496.98 samples/sec Loss 1.0663 LearningRate 0.000005 Epoch: 37 Global Step: 776730 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:12:02,474-Speed 2496.83 samples/sec Loss 1.0818 LearningRate 0.000005 Epoch: 37 Global Step: 776740 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:12:10,683-Speed 2495.29 samples/sec Loss 1.0633 LearningRate 0.000005 Epoch: 37 Global Step: 776750 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:12:18,881-Speed 2498.54 samples/sec Loss 1.0787 LearningRate 0.000005 Epoch: 37 Global Step: 776760 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:12:27,031-Speed 2513.42 samples/sec Loss 1.0808 LearningRate 0.000005 Epoch: 37 Global Step: 776770 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:12:35,234-Speed 2497.18 samples/sec Loss 1.0669 LearningRate 0.000005 Epoch: 37 Global Step: 776780 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:12:43,433-Speed 2498.21 samples/sec Loss 1.0794 LearningRate 0.000005 Epoch: 37 Global Step: 776790 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:12:51,633-Speed 2498.00 samples/sec Loss 1.0551 LearningRate 0.000005 Epoch: 37 Global Step: 776800 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:12:59,834-Speed 2497.72 samples/sec Loss 1.0922 
LearningRate 0.000005 Epoch: 37 Global Step: 776810 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:13:08,039-Speed 2496.51 samples/sec Loss 1.0898 LearningRate 0.000005 Epoch: 37 Global Step: 776820 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:13:16,191-Speed 2512.84 samples/sec Loss 1.0988 LearningRate 0.000005 Epoch: 37 Global Step: 776830 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:13:24,393-Speed 2497.46 samples/sec Loss 1.1002 LearningRate 0.000005 Epoch: 37 Global Step: 776840 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:13:32,607-Speed 2493.68 samples/sec Loss 1.0352 LearningRate 0.000005 Epoch: 37 Global Step: 776850 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:13:40,817-Speed 2494.94 samples/sec Loss 1.0710 LearningRate 0.000005 Epoch: 37 Global Step: 776860 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:13:49,019-Speed 2497.63 samples/sec Loss 1.0577 LearningRate 0.000005 Epoch: 37 Global Step: 776870 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:13:57,225-Speed 2496.02 samples/sec Loss 1.1033 LearningRate 0.000005 Epoch: 37 Global Step: 776880 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:14:05,387-Speed 2509.76 samples/sec Loss 1.1043 LearningRate 0.000005 Epoch: 37 Global Step: 776890 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:14:13,591-Speed 2496.64 samples/sec Loss 1.1069 LearningRate 0.000005 Epoch: 37 Global Step: 776900 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:14:21,811-Speed 2491.75 samples/sec Loss 1.0954 LearningRate 0.000005 Epoch: 37 Global Step: 776910 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:14:30,016-Speed 2496.62 samples/sec Loss 1.0732 LearningRate 0.000005 Epoch: 37 Global Step: 776920 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:14:38,217-Speed 2497.40 samples/sec Loss 1.0872 LearningRate 
0.000005 Epoch: 37 Global Step: 776930 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:14:46,426-Speed 2495.43 samples/sec Loss 1.0854 LearningRate 0.000005 Epoch: 37 Global Step: 776940 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:14:54,578-Speed 2512.45 samples/sec Loss 1.0744 LearningRate 0.000005 Epoch: 37 Global Step: 776950 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:15:02,790-Speed 2494.52 samples/sec Loss 1.0680 LearningRate 0.000005 Epoch: 37 Global Step: 776960 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:15:10,997-Speed 2495.88 samples/sec Loss 1.0764 LearningRate 0.000005 Epoch: 37 Global Step: 776970 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:15:19,199-Speed 2497.21 samples/sec Loss 1.0569 LearningRate 0.000005 Epoch: 37 Global Step: 776980 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:15:27,404-Speed 2496.61 samples/sec Loss 1.0786 LearningRate 0.000005 Epoch: 37 Global Step: 776990 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:15:35,608-Speed 2496.76 samples/sec Loss 1.0762 LearningRate 0.000005 Epoch: 37 Global Step: 777000 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:15:43,759-Speed 2512.92 samples/sec Loss 1.0940 LearningRate 0.000005 Epoch: 37 Global Step: 777010 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:15:51,962-Speed 2496.96 samples/sec Loss 1.0867 LearningRate 0.000005 Epoch: 37 Global Step: 777020 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:16:00,178-Speed 2493.51 samples/sec Loss 1.0957 LearningRate 0.000005 Epoch: 37 Global Step: 777030 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:16:08,381-Speed 2497.10 samples/sec Loss 1.0635 LearningRate 0.000005 Epoch: 37 Global Step: 777040 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:16:16,584-Speed 2497.19 samples/sec Loss 1.0569 LearningRate 0.000005 Epoch: 37 
Global Step: 777050 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:16:24,785-Speed 2497.58 samples/sec Loss 1.0691 LearningRate 0.000005 Epoch: 37 Global Step: 777060 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:16:32,936-Speed 2512.96 samples/sec Loss 1.0846 LearningRate 0.000005 Epoch: 37 Global Step: 777070 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:16:41,146-Speed 2495.35 samples/sec Loss 1.0747 LearningRate 0.000005 Epoch: 37 Global Step: 777080 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:16:49,352-Speed 2495.79 samples/sec Loss 1.1078 LearningRate 0.000005 Epoch: 37 Global Step: 777090 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:16:57,558-Speed 2496.16 samples/sec Loss 1.0688 LearningRate 0.000005 Epoch: 37 Global Step: 777100 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:17:05,763-Speed 2496.50 samples/sec Loss 1.0721 LearningRate 0.000005 Epoch: 37 Global Step: 777110 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:17:13,967-Speed 2496.77 samples/sec Loss 1.1114 LearningRate 0.000005 Epoch: 37 Global Step: 777120 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:17:22,112-Speed 2514.99 samples/sec Loss 1.0626 LearningRate 0.000005 Epoch: 37 Global Step: 777130 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:17:30,314-Speed 2497.62 samples/sec Loss 1.0767 LearningRate 0.000005 Epoch: 37 Global Step: 777140 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:17:38,516-Speed 2497.32 samples/sec Loss 1.0827 LearningRate 0.000005 Epoch: 37 Global Step: 777150 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:17:46,717-Speed 2497.42 samples/sec Loss 1.0825 LearningRate 0.000005 Epoch: 37 Global Step: 777160 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:17:54,916-Speed 2498.22 samples/sec Loss 1.0469 LearningRate 0.000005 Epoch: 37 Global Step: 777170 
Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:18:03,116-Speed 2498.28 samples/sec Loss 1.0835 LearningRate 0.000005 Epoch: 37 Global Step: 777180 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:18:11,276-Speed 2510.08 samples/sec Loss 1.0824 LearningRate 0.000005 Epoch: 37 Global Step: 777190 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:18:19,478-Speed 2497.46 samples/sec Loss 1.0626 LearningRate 0.000005 Epoch: 37 Global Step: 777200 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:18:27,679-Speed 2497.56 samples/sec Loss 1.1110 LearningRate 0.000005 Epoch: 37 Global Step: 777210 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:18:35,894-Speed 2493.39 samples/sec Loss 1.0456 LearningRate 0.000005 Epoch: 37 Global Step: 777220 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-07-13 00:18:44,105-Speed 2494.75 samples/sec Loss 1.0775 LearningRate 0.000005 Epoch: 37 Global Step: 777230 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:18:52,308-Speed 2496.98 samples/sec Loss 1.0847 LearningRate 0.000005 Epoch: 37 Global Step: 777240 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:19:00,466-Speed 2510.60 samples/sec Loss 1.0708 LearningRate 0.000005 Epoch: 37 Global Step: 777250 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:19:08,673-Speed 2496.11 samples/sec Loss 1.0745 LearningRate 0.000005 Epoch: 37 Global Step: 777260 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:19:16,882-Speed 2495.39 samples/sec Loss 1.0584 LearningRate 0.000005 Epoch: 37 Global Step: 777270 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:19:25,092-Speed 2494.91 samples/sec Loss 1.0994 LearningRate 0.000005 Epoch: 37 Global Step: 777280 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:19:33,298-Speed 2495.99 samples/sec Loss 1.0683 LearningRate 0.000005 Epoch: 37 Global Step: 777290 Fp16 Grad Scale: 
8192 Required: 12 hours Training: 2022-07-13 00:19:41,505-Speed 2495.87 samples/sec Loss 1.0983 LearningRate 0.000005 Epoch: 37 Global Step: 777300 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:19:49,656-Speed 2512.86 samples/sec Loss 1.0963 LearningRate 0.000005 Epoch: 37 Global Step: 777310 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:19:57,858-Speed 2497.44 samples/sec Loss 1.1061 LearningRate 0.000005 Epoch: 37 Global Step: 777320 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:20:06,061-Speed 2497.04 samples/sec Loss 1.0875 LearningRate 0.000005 Epoch: 37 Global Step: 777330 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:20:14,274-Speed 2497.61 samples/sec Loss 1.0533 LearningRate 0.000005 Epoch: 37 Global Step: 777340 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:20:22,489-Speed 2493.60 samples/sec Loss 1.0846 LearningRate 0.000005 Epoch: 37 Global Step: 777350 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:20:30,692-Speed 2497.21 samples/sec Loss 1.1017 LearningRate 0.000005 Epoch: 37 Global Step: 777360 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:20:38,843-Speed 2513.11 samples/sec Loss 1.0861 LearningRate 0.000005 Epoch: 37 Global Step: 777370 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:20:47,053-Speed 2495.30 samples/sec Loss 1.0506 LearningRate 0.000005 Epoch: 37 Global Step: 777380 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:20:55,257-Speed 2496.70 samples/sec Loss 1.0888 LearningRate 0.000005 Epoch: 37 Global Step: 777390 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:21:03,460-Speed 2496.86 samples/sec Loss 1.1144 LearningRate 0.000005 Epoch: 37 Global Step: 777400 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:21:11,666-Speed 2496.60 samples/sec Loss 1.0594 LearningRate 0.000005 Epoch: 37 Global Step: 777410 Fp16 Grad Scale: 8192 Required: 12 
hours Training: 2022-07-13 00:21:19,871-Speed 2496.48 samples/sec Loss 1.0949 LearningRate 0.000005 Epoch: 37 Global Step: 777420 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:21:28,034-Speed 2509.35 samples/sec Loss 1.1064 LearningRate 0.000005 Epoch: 37 Global Step: 777430 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:21:36,238-Speed 2496.81 samples/sec Loss 1.0811 LearningRate 0.000005 Epoch: 37 Global Step: 777440 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:21:44,440-Speed 2497.24 samples/sec Loss 1.0691 LearningRate 0.000005 Epoch: 37 Global Step: 777450 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:21:52,643-Speed 2496.90 samples/sec Loss 1.0802 LearningRate 0.000005 Epoch: 37 Global Step: 777460 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:22:00,848-Speed 2496.34 samples/sec Loss 1.0771 LearningRate 0.000005 Epoch: 37 Global Step: 777470 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:22:09,051-Speed 2497.26 samples/sec Loss 1.0729 LearningRate 0.000005 Epoch: 37 Global Step: 777480 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:22:17,199-Speed 2513.61 samples/sec Loss 1.0795 LearningRate 0.000005 Epoch: 37 Global Step: 777490 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:22:25,404-Speed 2496.75 samples/sec Loss 1.0480 LearningRate 0.000005 Epoch: 37 Global Step: 777500 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:22:33,612-Speed 2495.50 samples/sec Loss 1.0823 LearningRate 0.000005 Epoch: 37 Global Step: 777510 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:22:41,821-Speed 2495.31 samples/sec Loss 1.0943 LearningRate 0.000005 Epoch: 37 Global Step: 777520 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:22:50,023-Speed 2497.29 samples/sec Loss 1.0789 LearningRate 0.000005 Epoch: 37 Global Step: 777530 Fp16 Grad Scale: 8192 Required: 12 hours Training: 
2022-07-13 00:22:58,229-Speed 2496.26 samples/sec Loss 1.0798 LearningRate 0.000005 Epoch: 37 Global Step: 777540 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:23:06,382-Speed 2512.40 samples/sec Loss 1.0853 LearningRate 0.000005 Epoch: 37 Global Step: 777550 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:23:14,588-Speed 2496.10 samples/sec Loss 1.0807 LearningRate 0.000005 Epoch: 37 Global Step: 777560 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:23:22,806-Speed 2492.53 samples/sec Loss 1.0712 LearningRate 0.000005 Epoch: 37 Global Step: 777570 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:23:31,008-Speed 2497.33 samples/sec Loss 1.0729 LearningRate 0.000005 Epoch: 37 Global Step: 777580 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:23:39,208-Speed 2497.90 samples/sec Loss 1.0535 LearningRate 0.000005 Epoch: 37 Global Step: 777590 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:23:47,412-Speed 2496.70 samples/sec Loss 1.0685 LearningRate 0.000005 Epoch: 37 Global Step: 777600 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:23:55,561-Speed 2513.77 samples/sec Loss 1.0632 LearningRate 0.000005 Epoch: 37 Global Step: 777610 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:24:03,771-Speed 2494.73 samples/sec Loss 1.0562 LearningRate 0.000005 Epoch: 37 Global Step: 777620 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:24:11,975-Speed 2496.84 samples/sec Loss 1.0783 LearningRate 0.000005 Epoch: 37 Global Step: 777630 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:24:20,184-Speed 2495.15 samples/sec Loss 1.0846 LearningRate 0.000005 Epoch: 37 Global Step: 777640 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:24:28,387-Speed 2497.14 samples/sec Loss 1.0686 LearningRate 0.000005 Epoch: 37 Global Step: 777650 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 
00:24:36,589-Speed 2497.52 samples/sec Loss 1.0800 LearningRate 0.000005 Epoch: 37 Global Step: 777660 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:24:44,739-Speed 2513.18 samples/sec Loss 1.0812 LearningRate 0.000005 Epoch: 37 Global Step: 777670 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:24:52,941-Speed 2497.23 samples/sec Loss 1.1109 LearningRate 0.000005 Epoch: 37 Global Step: 777680 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:25:01,143-Speed 2497.19 samples/sec Loss 1.0449 LearningRate 0.000005 Epoch: 37 Global Step: 777690 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:25:09,345-Speed 2497.52 samples/sec Loss 1.0632 LearningRate 0.000005 Epoch: 37 Global Step: 777700 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:25:17,551-Speed 2495.93 samples/sec Loss 1.0898 LearningRate 0.000005 Epoch: 37 Global Step: 777710 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:25:25,755-Speed 2496.66 samples/sec Loss 1.0567 LearningRate 0.000005 Epoch: 37 Global Step: 777720 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:25:33,915-Speed 2510.43 samples/sec Loss 1.0760 LearningRate 0.000005 Epoch: 37 Global Step: 777730 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:25:42,115-Speed 2497.88 samples/sec Loss 1.0985 LearningRate 0.000005 Epoch: 37 Global Step: 777740 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:25:50,319-Speed 2496.86 samples/sec Loss 1.0872 LearningRate 0.000005 Epoch: 37 Global Step: 777750 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:25:58,525-Speed 2496.08 samples/sec Loss 1.0754 LearningRate 0.000005 Epoch: 37 Global Step: 777760 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:26:06,735-Speed 2495.09 samples/sec Loss 1.0525 LearningRate 0.000005 Epoch: 37 Global Step: 777770 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:26:14,938-Speed 
2496.69 samples/sec Loss 1.1118 LearningRate 0.000005 Epoch: 37 Global Step: 777780 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:26:23,094-Speed 2511.47 samples/sec Loss 1.0568 LearningRate 0.000005 Epoch: 37 Global Step: 777790 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:26:31,297-Speed 2497.06 samples/sec Loss 1.0936 LearningRate 0.000005 Epoch: 37 Global Step: 777800 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:26:39,498-Speed 2497.85 samples/sec Loss 1.1025 LearningRate 0.000005 Epoch: 37 Global Step: 777810 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:26:47,712-Speed 2493.60 samples/sec Loss 1.1021 LearningRate 0.000005 Epoch: 37 Global Step: 777820 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:26:55,914-Speed 2497.29 samples/sec Loss 1.0855 LearningRate 0.000005 Epoch: 37 Global Step: 777830 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:27:04,117-Speed 2497.06 samples/sec Loss 1.0694 LearningRate 0.000005 Epoch: 37 Global Step: 777840 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:27:12,267-Speed 2513.43 samples/sec Loss 1.0987 LearningRate 0.000005 Epoch: 37 Global Step: 777850 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:27:20,473-Speed 2495.93 samples/sec Loss 1.0636 LearningRate 0.000005 Epoch: 37 Global Step: 777860 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:27:28,680-Speed 2495.98 samples/sec Loss 1.0775 LearningRate 0.000005 Epoch: 37 Global Step: 777870 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:27:36,895-Speed 2493.54 samples/sec Loss 1.0932 LearningRate 0.000005 Epoch: 37 Global Step: 777880 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:27:45,099-Speed 2496.80 samples/sec Loss 1.0643 LearningRate 0.000005 Epoch: 37 Global Step: 777890 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:27:53,315-Speed 2493.00 samples/sec 
Loss 1.0862 LearningRate 0.000005 Epoch: 37 Global Step: 777900 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:28:01,465-Speed 2513.33 samples/sec Loss 1.0773 LearningRate 0.000005 Epoch: 37 Global Step: 777910 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:28:09,667-Speed 2497.43 samples/sec Loss 1.0727 LearningRate 0.000005 Epoch: 37 Global Step: 777920 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:28:17,868-Speed 2497.59 samples/sec Loss 1.0921 LearningRate 0.000005 Epoch: 37 Global Step: 777930 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:28:26,070-Speed 2497.68 samples/sec Loss 1.0850 LearningRate 0.000005 Epoch: 37 Global Step: 777940 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:28:34,279-Speed 2495.14 samples/sec Loss 1.0694 LearningRate 0.000005 Epoch: 37 Global Step: 777950 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:28:42,485-Speed 2496.37 samples/sec Loss 1.0663 LearningRate 0.000005 Epoch: 37 Global Step: 777960 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:28:50,643-Speed 2511.07 samples/sec Loss 1.0698 LearningRate 0.000005 Epoch: 37 Global Step: 777970 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:28:58,846-Speed 2496.97 samples/sec Loss 1.0869 LearningRate 0.000005 Epoch: 37 Global Step: 777980 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:29:07,052-Speed 2496.22 samples/sec Loss 1.0664 LearningRate 0.000005 Epoch: 37 Global Step: 777990 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:29:15,258-Speed 2496.36 samples/sec Loss 1.0743 LearningRate 0.000005 Epoch: 37 Global Step: 778000 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:29:23,466-Speed 2495.46 samples/sec Loss 1.0779 LearningRate 0.000005 Epoch: 37 Global Step: 778010 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:29:31,668-Speed 2497.29 samples/sec Loss 1.1041 
LearningRate 0.000005 Epoch: 37 Global Step: 778020 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:29:39,818-Speed 2513.32 samples/sec Loss 1.0664 LearningRate 0.000005 Epoch: 37 Global Step: 778030 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:29:48,018-Speed 2497.98 samples/sec Loss 1.0858 LearningRate 0.000005 Epoch: 37 Global Step: 778040 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:29:56,222-Speed 2496.88 samples/sec Loss 1.0597 LearningRate 0.000005 Epoch: 37 Global Step: 778050 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:30:04,423-Speed 2497.45 samples/sec Loss 1.0679 LearningRate 0.000005 Epoch: 37 Global Step: 778060 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:30:12,625-Speed 2497.51 samples/sec Loss 1.0886 LearningRate 0.000005 Epoch: 37 Global Step: 778070 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:30:20,832-Speed 2495.97 samples/sec Loss 1.0631 LearningRate 0.000005 Epoch: 37 Global Step: 778080 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:30:28,983-Speed 2513.07 samples/sec Loss 1.0804 LearningRate 0.000005 Epoch: 37 Global Step: 778090 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:30:37,184-Speed 2497.77 samples/sec Loss 1.0757 LearningRate 0.000005 Epoch: 37 Global Step: 778100 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:30:45,392-Speed 2495.46 samples/sec Loss 1.0727 LearningRate 0.000005 Epoch: 37 Global Step: 778110 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:30:53,597-Speed 2496.48 samples/sec Loss 1.1000 LearningRate 0.000005 Epoch: 37 Global Step: 778120 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:31:01,802-Speed 2496.45 samples/sec Loss 1.0527 LearningRate 0.000005 Epoch: 37 Global Step: 778130 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:31:10,008-Speed 2496.26 samples/sec Loss 1.0735 LearningRate 
0.000005 Epoch: 37 Global Step: 778140 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:31:18,159-Speed 2513.03 samples/sec Loss 1.0721 LearningRate 0.000005 Epoch: 37 Global Step: 778150 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:31:26,363-Speed 2496.78 samples/sec Loss 1.0726 LearningRate 0.000005 Epoch: 37 Global Step: 778160 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:31:34,581-Speed 2492.47 samples/sec Loss 1.0557 LearningRate 0.000005 Epoch: 37 Global Step: 778170 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:31:42,785-Speed 2496.67 samples/sec Loss 1.0634 LearningRate 0.000005 Epoch: 37 Global Step: 778180 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:31:50,989-Speed 2496.53 samples/sec Loss 1.0502 LearningRate 0.000005 Epoch: 37 Global Step: 778190 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:31:59,192-Speed 2497.25 samples/sec Loss 1.0834 LearningRate 0.000005 Epoch: 37 Global Step: 778200 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:32:07,343-Speed 2512.77 samples/sec Loss 1.0814 LearningRate 0.000005 Epoch: 37 Global Step: 778210 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:32:15,564-Speed 2491.57 samples/sec Loss 1.1204 LearningRate 0.000005 Epoch: 37 Global Step: 778220 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:32:23,768-Speed 2496.89 samples/sec Loss 1.0702 LearningRate 0.000005 Epoch: 37 Global Step: 778230 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:32:31,969-Speed 2497.51 samples/sec Loss 1.0594 LearningRate 0.000005 Epoch: 37 Global Step: 778240 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:32:40,171-Speed 2497.32 samples/sec Loss 1.0722 LearningRate 0.000005 Epoch: 37 Global Step: 778250 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:32:48,372-Speed 2497.80 samples/sec Loss 1.0887 LearningRate 0.000005 Epoch: 37 
Global Step: 778260 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:32:56,524-Speed 2512.50 samples/sec Loss 1.0863 LearningRate 0.000005 Epoch: 37 Global Step: 778270 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:33:04,726-Speed 2497.22 samples/sec Loss 1.0908 LearningRate 0.000005 Epoch: 37 Global Step: 778280 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:33:12,929-Speed 2496.91 samples/sec Loss 1.0995 LearningRate 0.000005 Epoch: 37 Global Step: 778290 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:33:21,136-Speed 2495.94 samples/sec Loss 1.0626 LearningRate 0.000005 Epoch: 37 Global Step: 778300 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:33:29,340-Speed 2496.58 samples/sec Loss 1.0820 LearningRate 0.000005 Epoch: 37 Global Step: 778310 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:33:37,544-Speed 2496.72 samples/sec Loss 1.0861 LearningRate 0.000005 Epoch: 37 Global Step: 778320 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:33:45,702-Speed 2510.74 samples/sec Loss 1.0637 LearningRate 0.000005 Epoch: 37 Global Step: 778330 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:33:53,905-Speed 2497.25 samples/sec Loss 1.0896 LearningRate 0.000005 Epoch: 37 Global Step: 778340 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:34:02,109-Speed 2496.76 samples/sec Loss 1.1159 LearningRate 0.000005 Epoch: 37 Global Step: 778350 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:34:10,311-Speed 2497.21 samples/sec Loss 1.0687 LearningRate 0.000005 Epoch: 37 Global Step: 778360 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:34:18,516-Speed 2496.48 samples/sec Loss 1.0571 LearningRate 0.000005 Epoch: 37 Global Step: 778370 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:34:26,718-Speed 2497.35 samples/sec Loss 1.0979 LearningRate 0.000005 Epoch: 37 Global Step: 778380 
Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:34:34,870-Speed 2512.75 samples/sec Loss 1.0967 LearningRate 0.000005 Epoch: 37 Global Step: 778390 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:34:43,075-Speed 2496.50 samples/sec Loss 1.0665 LearningRate 0.000005 Epoch: 37 Global Step: 778400 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:34:51,290-Speed 2493.40 samples/sec Loss 1.0959 LearningRate 0.000005 Epoch: 37 Global Step: 778410 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:34:59,502-Speed 2494.26 samples/sec Loss 1.0879 LearningRate 0.000005 Epoch: 37 Global Step: 778420 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-07-13 00:35:07,716-Speed 2493.86 samples/sec Loss 1.0831 LearningRate 0.000005 Epoch: 37 Global Step: 778430 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:35:15,924-Speed 2495.44 samples/sec Loss 1.0584 LearningRate 0.000005 Epoch: 37 Global Step: 778440 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:35:24,074-Speed 2513.06 samples/sec Loss 1.0750 LearningRate 0.000005 Epoch: 37 Global Step: 778450 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:35:32,288-Speed 2493.90 samples/sec Loss 1.0648 LearningRate 0.000005 Epoch: 37 Global Step: 778460 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:35:40,491-Speed 2497.18 samples/sec Loss 1.0825 LearningRate 0.000005 Epoch: 37 Global Step: 778470 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:35:48,698-Speed 2495.78 samples/sec Loss 1.1068 LearningRate 0.000005 Epoch: 37 Global Step: 778480 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:35:56,904-Speed 2496.13 samples/sec Loss 1.0630 LearningRate 0.000005 Epoch: 37 Global Step: 778490 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:36:05,118-Speed 2493.62 samples/sec Loss 1.0818 LearningRate 0.000005 Epoch: 37 Global Step: 778500 Fp16 Grad 
Scale: 16384 Required: 12 hours Training: 2022-07-13 00:36:13,272-Speed 2512.11 samples/sec Loss 1.0835 LearningRate 0.000005 Epoch: 37 Global Step: 778510 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:36:21,481-Speed 2495.31 samples/sec Loss 1.0661 LearningRate 0.000005 Epoch: 37 Global Step: 778520 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:36:29,685-Speed 2496.60 samples/sec Loss 1.0543 LearningRate 0.000005 Epoch: 37 Global Step: 778530 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:36:37,891-Speed 2496.26 samples/sec Loss 1.0774 LearningRate 0.000005 Epoch: 37 Global Step: 778540 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:36:46,097-Speed 2496.08 samples/sec Loss 1.0884 LearningRate 0.000005 Epoch: 37 Global Step: 778550 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:36:54,309-Speed 2494.37 samples/sec Loss 1.0645 LearningRate 0.000005 Epoch: 37 Global Step: 778560 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:37:02,466-Speed 2511.76 samples/sec Loss 1.0772 LearningRate 0.000005 Epoch: 37 Global Step: 778570 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:37:10,672-Speed 2495.98 samples/sec Loss 1.0664 LearningRate 0.000005 Epoch: 37 Global Step: 778580 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:37:18,880-Speed 2495.56 samples/sec Loss 1.0815 LearningRate 0.000005 Epoch: 37 Global Step: 778590 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:37:27,083-Speed 2496.93 samples/sec Loss 1.1117 LearningRate 0.000005 Epoch: 37 Global Step: 778600 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:37:35,290-Speed 2495.81 samples/sec Loss 1.0611 LearningRate 0.000005 Epoch: 37 Global Step: 778610 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:37:43,504-Speed 2493.66 samples/sec Loss 1.0702 LearningRate 0.000005 Epoch: 37 Global Step: 778620 Fp16 Grad Scale: 
16384 Required: 12 hours Training: 2022-07-13 00:37:51,659-Speed 2511.91 samples/sec Loss 1.0520 LearningRate 0.000005 Epoch: 37 Global Step: 778630 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:37:59,863-Speed 2496.60 samples/sec Loss 1.0726 LearningRate 0.000005 Epoch: 37 Global Step: 778640 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:38:08,069-Speed 2496.27 samples/sec Loss 1.0830 LearningRate 0.000005 Epoch: 37 Global Step: 778650 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:38:16,273-Speed 2496.87 samples/sec Loss 1.0835 LearningRate 0.000005 Epoch: 37 Global Step: 778660 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:38:24,475-Speed 2497.35 samples/sec Loss 1.0820 LearningRate 0.000005 Epoch: 37 Global Step: 778670 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:38:32,679-Speed 2496.70 samples/sec Loss 1.0631 LearningRate 0.000005 Epoch: 37 Global Step: 778680 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:38:40,827-Speed 2513.70 samples/sec Loss 1.0859 LearningRate 0.000005 Epoch: 37 Global Step: 778690 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:38:49,032-Speed 2496.80 samples/sec Loss 1.0989 LearningRate 0.000005 Epoch: 37 Global Step: 778700 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:38:57,235-Speed 2496.74 samples/sec Loss 1.0905 LearningRate 0.000005 Epoch: 37 Global Step: 778710 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:39:05,442-Speed 2496.04 samples/sec Loss 1.0752 LearningRate 0.000005 Epoch: 37 Global Step: 778720 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:39:13,646-Speed 2496.59 samples/sec Loss 1.0895 LearningRate 0.000005 Epoch: 37 Global Step: 778730 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:39:21,861-Speed 2493.49 samples/sec Loss 1.0736 LearningRate 0.000005 Epoch: 37 Global Step: 778740 Fp16 Grad Scale: 16384 
Required: 12 hours Training: 2022-07-13 00:39:30,010-Speed 2513.73 samples/sec Loss 1.0801 LearningRate 0.000005 Epoch: 37 Global Step: 778750 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:39:38,214-Speed 2496.61 samples/sec Loss 1.0822 LearningRate 0.000005 Epoch: 37 Global Step: 778760 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:39:46,420-Speed 2496.32 samples/sec Loss 1.0816 LearningRate 0.000005 Epoch: 37 Global Step: 778770 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:39:54,629-Speed 2495.03 samples/sec Loss 1.0444 LearningRate 0.000005 Epoch: 37 Global Step: 778780 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:40:02,838-Speed 2495.27 samples/sec Loss 1.0659 LearningRate 0.000005 Epoch: 37 Global Step: 778790 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:40:11,048-Speed 2494.85 samples/sec Loss 1.0836 LearningRate 0.000005 Epoch: 37 Global Step: 778800 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:40:19,201-Speed 2512.41 samples/sec Loss 1.0858 LearningRate 0.000005 Epoch: 37 Global Step: 778810 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:40:27,422-Speed 2491.76 samples/sec Loss 1.0677 LearningRate 0.000005 Epoch: 37 Global Step: 778820 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:40:35,633-Speed 2494.58 samples/sec Loss 1.0682 LearningRate 0.000005 Epoch: 37 Global Step: 778830 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:40:43,836-Speed 2496.87 samples/sec Loss 1.0748 LearningRate 0.000005 Epoch: 37 Global Step: 778840 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:40:52,037-Speed 2497.82 samples/sec Loss 1.0780 LearningRate 0.000005 Epoch: 37 Global Step: 778850 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:41:00,244-Speed 2495.89 samples/sec Loss 1.0703 LearningRate 0.000005 Epoch: 37 Global Step: 778860 Fp16 Grad Scale: 16384 
Required: 12 hours Training: 2022-07-13 00:41:08,395-Speed 2512.81 samples/sec Loss 1.0735 LearningRate 0.000005 Epoch: 37 Global Step: 778870 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:41:16,600-Speed 2496.56 samples/sec Loss 1.0593 LearningRate 0.000005 Epoch: 37 Global Step: 778880 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:41:24,804-Speed 2496.80 samples/sec Loss 1.0783 LearningRate 0.000005 Epoch: 37 Global Step: 778890 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:41:33,008-Speed 2496.64 samples/sec Loss 1.0683 LearningRate 0.000005 Epoch: 37 Global Step: 778900 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:41:41,211-Speed 2496.85 samples/sec Loss 1.0777 LearningRate 0.000005 Epoch: 37 Global Step: 778910 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:41:49,424-Speed 2493.92 samples/sec Loss 1.0946 LearningRate 0.000005 Epoch: 37 Global Step: 778920 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:41:57,577-Speed 2512.57 samples/sec Loss 1.1041 LearningRate 0.000005 Epoch: 37 Global Step: 778930 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:42:05,781-Speed 2496.79 samples/sec Loss 1.0756 LearningRate 0.000005 Epoch: 37 Global Step: 778940 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:42:13,981-Speed 2497.82 samples/sec Loss 1.0703 LearningRate 0.000005 Epoch: 37 Global Step: 778950 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:42:22,188-Speed 2496.08 samples/sec Loss 1.0677 LearningRate 0.000005 Epoch: 37 Global Step: 778960 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:42:30,393-Speed 2496.43 samples/sec Loss 1.0767 LearningRate 0.000005 Epoch: 37 Global Step: 778970 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:42:38,612-Speed 2492.09 samples/sec Loss 1.0700 LearningRate 0.000005 Epoch: 37 Global Step: 778980 Fp16 Grad Scale: 16384 
Required: 12 hours Training: 2022-07-13 00:42:46,764-Speed 2512.93 samples/sec Loss 1.0621 LearningRate 0.000005 Epoch: 37 Global Step: 778990 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:42:54,965-Speed 2497.58 samples/sec Loss 1.0771 LearningRate 0.000005 Epoch: 37 Global Step: 779000 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:43:03,169-Speed 2496.88 samples/sec Loss 1.0806 LearningRate 0.000005 Epoch: 37 Global Step: 779010 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:43:11,378-Speed 2495.12 samples/sec Loss 1.0680 LearningRate 0.000005 Epoch: 37 Global Step: 779020 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:43:19,584-Speed 2496.06 samples/sec Loss 1.0481 LearningRate 0.000005 Epoch: 37 Global Step: 779030 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:43:27,788-Speed 2496.61 samples/sec Loss 1.0655 LearningRate 0.000005 Epoch: 37 Global Step: 779040 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:43:35,946-Speed 2511.01 samples/sec Loss 1.0724 LearningRate 0.000005 Epoch: 37 Global Step: 779050 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:43:44,148-Speed 2497.12 samples/sec Loss 1.0628 LearningRate 0.000005 Epoch: 37 Global Step: 779060 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:43:52,358-Speed 2495.17 samples/sec Loss 1.0495 LearningRate 0.000005 Epoch: 37 Global Step: 779070 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:44:00,580-Speed 2491.29 samples/sec Loss 1.0704 LearningRate 0.000005 Epoch: 37 Global Step: 779080 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:44:08,782-Speed 2497.26 samples/sec Loss 1.0752 LearningRate 0.000005 Epoch: 37 Global Step: 779090 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:44:16,991-Speed 2495.17 samples/sec Loss 1.0879 LearningRate 0.000005 Epoch: 37 Global Step: 779100 Fp16 Grad Scale: 16384 
Required: 12 hours Training: 2022-07-13 00:44:25,143-Speed 2512.68 samples/sec Loss 1.0775 LearningRate 0.000005 Epoch: 37 Global Step: 779110 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:44:33,353-Speed 2495.14 samples/sec Loss 1.0492 LearningRate 0.000005 Epoch: 37 Global Step: 779120 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:44:41,560-Speed 2495.82 samples/sec Loss 1.0772 LearningRate 0.000005 Epoch: 37 Global Step: 779130 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:44:49,770-Speed 2494.85 samples/sec Loss 1.0555 LearningRate 0.000005 Epoch: 37 Global Step: 779140 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:44:57,975-Speed 2496.22 samples/sec Loss 1.0842 LearningRate 0.000005 Epoch: 37 Global Step: 779150 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:45:06,186-Speed 2494.58 samples/sec Loss 1.0748 LearningRate 0.000005 Epoch: 37 Global Step: 779160 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:45:14,337-Speed 2513.05 samples/sec Loss 1.0865 LearningRate 0.000005 Epoch: 37 Global Step: 779170 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:45:22,542-Speed 2496.45 samples/sec Loss 1.0741 LearningRate 0.000005 Epoch: 37 Global Step: 779180 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:45:30,756-Speed 2493.39 samples/sec Loss 1.0783 LearningRate 0.000005 Epoch: 37 Global Step: 779190 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:45:38,960-Speed 2497.08 samples/sec Loss 1.0980 LearningRate 0.000005 Epoch: 37 Global Step: 779200 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:45:47,168-Speed 2495.72 samples/sec Loss 1.0760 LearningRate 0.000005 Epoch: 37 Global Step: 779210 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:45:55,374-Speed 2495.85 samples/sec Loss 1.0853 LearningRate 0.000005 Epoch: 37 Global Step: 779220 Fp16 Grad Scale: 16384 
Required: 12 hours Training: 2022-07-13 00:46:03,526-Speed 2512.76 samples/sec Loss 1.0729 LearningRate 0.000005 Epoch: 37 Global Step: 779230 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:46:11,731-Speed 2496.48 samples/sec Loss 1.0871 LearningRate 0.000005 Epoch: 37 Global Step: 779240 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:46:19,935-Speed 2496.62 samples/sec Loss 1.0795 LearningRate 0.000005 Epoch: 37 Global Step: 779250 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:46:28,140-Speed 2496.27 samples/sec Loss 1.0928 LearningRate 0.000005 Epoch: 37 Global Step: 779260 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:46:36,357-Speed 2492.88 samples/sec Loss 1.0535 LearningRate 0.000005 Epoch: 37 Global Step: 779270 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:46:44,567-Speed 2494.95 samples/sec Loss 1.0562 LearningRate 0.000005 Epoch: 37 Global Step: 779280 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:46:52,718-Speed 2512.99 samples/sec Loss 1.0423 LearningRate 0.000005 Epoch: 37 Global Step: 779290 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:47:00,923-Speed 2496.55 samples/sec Loss 1.0851 LearningRate 0.000005 Epoch: 37 Global Step: 779300 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-07-13 00:47:09,135-Speed 2494.47 samples/sec Loss 1.0717 LearningRate 0.000005 Epoch: 37 Global Step: 779310 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:47:17,340-Speed 2496.40 samples/sec Loss 1.0540 LearningRate 0.000005 Epoch: 37 Global Step: 779320 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:47:25,542-Speed 2497.17 samples/sec Loss 1.0802 LearningRate 0.000005 Epoch: 37 Global Step: 779330 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:47:33,755-Speed 2494.30 samples/sec Loss 1.0572 LearningRate 0.000005 Epoch: 37 Global Step: 779340 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 00:47:41,907-Speed 2512.48 samples/sec Loss 1.0413 LearningRate 0.000005 Epoch: 37 Global Step: 779350 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:47:50,121-Speed 2493.72 samples/sec Loss 1.0705 LearningRate 0.000005 Epoch: 37 Global Step: 779360 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:47:58,327-Speed 2496.16 samples/sec Loss 1.0713 LearningRate 0.000005 Epoch: 37 Global Step: 779370 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:48:06,532-Speed 2496.67 samples/sec Loss 1.0887 LearningRate 0.000005 Epoch: 37 Global Step: 779380 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:48:14,737-Speed 2496.57 samples/sec Loss 1.0581 LearningRate 0.000005 Epoch: 37 Global Step: 779390 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:48:22,952-Speed 2493.23 samples/sec Loss 1.0753 LearningRate 0.000005 Epoch: 37 Global Step: 779400 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:48:31,105-Speed 2512.37 samples/sec Loss 1.0798 LearningRate 0.000005 Epoch: 37 Global Step: 779410 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:48:39,320-Speed 2493.44 samples/sec Loss 1.0368 LearningRate 0.000005 Epoch: 37 Global Step: 779420 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:48:47,542-Speed 2491.47 samples/sec Loss 1.0657 LearningRate 0.000005 Epoch: 37 Global Step: 779430 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:48:55,750-Speed 2495.57 samples/sec Loss 1.0952 LearningRate 0.000005 Epoch: 37 Global Step: 779440 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:49:03,957-Speed 2495.81 samples/sec Loss 1.0693 LearningRate 0.000005 Epoch: 37 Global Step: 779450 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:49:12,171-Speed 2493.44 samples/sec Loss 1.0663 LearningRate 0.000005 Epoch: 37 Global Step: 779460 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 00:49:20,326-Speed 2511.96 samples/sec Loss 1.0748 LearningRate 0.000005 Epoch: 37 Global Step: 779470 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:49:28,531-Speed 2496.57 samples/sec Loss 1.0593 LearningRate 0.000004 Epoch: 37 Global Step: 779480 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:49:36,735-Speed 2496.57 samples/sec Loss 1.0613 LearningRate 0.000004 Epoch: 37 Global Step: 779490 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:49:44,943-Speed 2495.54 samples/sec Loss 1.0687 LearningRate 0.000004 Epoch: 37 Global Step: 779500 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:49:53,150-Speed 2495.95 samples/sec Loss 1.1135 LearningRate 0.000004 Epoch: 37 Global Step: 779510 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:50:01,356-Speed 2495.98 samples/sec Loss 1.0838 LearningRate 0.000004 Epoch: 37 Global Step: 779520 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:50:09,508-Speed 2512.81 samples/sec Loss 1.0681 LearningRate 0.000004 Epoch: 37 Global Step: 779530 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:50:17,710-Speed 2497.41 samples/sec Loss 1.0552 LearningRate 0.000004 Epoch: 37 Global Step: 779540 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:50:25,919-Speed 2495.13 samples/sec Loss 1.0877 LearningRate 0.000004 Epoch: 37 Global Step: 779550 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:50:34,123-Speed 2496.53 samples/sec Loss 1.0577 LearningRate 0.000004 Epoch: 37 Global Step: 779560 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:50:42,325-Speed 2497.46 samples/sec Loss 1.0363 LearningRate 0.000004 Epoch: 37 Global Step: 779570 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:50:50,531-Speed 2495.94 samples/sec Loss 1.0622 LearningRate 0.000004 Epoch: 37 Global Step: 779580 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 00:50:58,683-Speed 2512.89 samples/sec Loss 1.0327 LearningRate 0.000004 Epoch: 37 Global Step: 779590 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:51:06,890-Speed 2495.67 samples/sec Loss 1.0596 LearningRate 0.000004 Epoch: 37 Global Step: 779600 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:51:15,091-Speed 2497.61 samples/sec Loss 1.0742 LearningRate 0.000004 Epoch: 37 Global Step: 779610 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:51:23,312-Speed 2491.77 samples/sec Loss 1.0510 LearningRate 0.000004 Epoch: 37 Global Step: 779620 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:51:31,517-Speed 2496.27 samples/sec Loss 1.0902 LearningRate 0.000004 Epoch: 37 Global Step: 779630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 00:51:39,727-Speed 2495.10 samples/sec Loss 1.0554 LearningRate 0.000004 Epoch: 37 Global Step: 779640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 00:51:47,874-Speed 2514.28 samples/sec Loss 1.0729 LearningRate 0.000004 Epoch: 37 Global Step: 779650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 00:51:56,080-Speed 2496.28 samples/sec Loss 1.0734 LearningRate 0.000004 Epoch: 37 Global Step: 779660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 00:52:04,284-Speed 2496.97 samples/sec Loss 1.0534 LearningRate 0.000004 Epoch: 37 Global Step: 779670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 00:52:12,486-Speed 2497.17 samples/sec Loss 1.1002 LearningRate 0.000004 Epoch: 37 Global Step: 779680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 00:52:20,691-Speed 2496.30 samples/sec Loss 1.0430 LearningRate 0.000004 Epoch: 37 Global Step: 779690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 00:52:28,894-Speed 2497.05 samples/sec Loss 1.0770 LearningRate 0.000004 Epoch: 37 Global Step: 779700 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-07-13 00:52:37,046-Speed 2512.97 samples/sec Loss 1.1068 LearningRate 0.000004 Epoch: 37 Global Step: 779710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 00:52:45,229-Speed 2503.16 samples/sec Loss 1.0811 LearningRate 0.000004 Epoch: 37 Global Step: 779720 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:52:53,432-Speed 2496.89 samples/sec Loss 1.0694 LearningRate 0.000004 Epoch: 37 Global Step: 779730 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:53:01,638-Speed 2496.11 samples/sec Loss 1.0518 LearningRate 0.000004 Epoch: 37 Global Step: 779740 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:53:09,853-Speed 2493.39 samples/sec Loss 1.0680 LearningRate 0.000004 Epoch: 37 Global Step: 779750 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:53:18,059-Speed 2496.18 samples/sec Loss 1.0662 LearningRate 0.000004 Epoch: 37 Global Step: 779760 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:53:26,211-Speed 2512.71 samples/sec Loss 1.0463 LearningRate 0.000004 Epoch: 37 Global Step: 779770 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:53:34,423-Speed 2494.61 samples/sec Loss 1.0771 LearningRate 0.000004 Epoch: 37 Global Step: 779780 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:53:42,629-Speed 2496.36 samples/sec Loss 1.0491 LearningRate 0.000004 Epoch: 37 Global Step: 779790 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:53:50,831-Speed 2497.22 samples/sec Loss 1.0521 LearningRate 0.000004 Epoch: 37 Global Step: 779800 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:53:59,039-Speed 2495.56 samples/sec Loss 1.0618 LearningRate 0.000004 Epoch: 37 Global Step: 779810 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:54:07,252-Speed 2493.93 samples/sec Loss 1.0322 LearningRate 0.000004 Epoch: 37 Global Step: 779820 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 00:54:15,402-Speed 2513.52 samples/sec Loss 1.0827 LearningRate 0.000004 Epoch: 37 Global Step: 779830 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:54:23,604-Speed 2497.14 samples/sec Loss 1.0881 LearningRate 0.000004 Epoch: 37 Global Step: 779840 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:54:31,809-Speed 2496.56 samples/sec Loss 1.0704 LearningRate 0.000004 Epoch: 37 Global Step: 779850 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:54:40,011-Speed 2497.34 samples/sec Loss 1.0636 LearningRate 0.000004 Epoch: 37 Global Step: 779860 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:54:48,215-Speed 2496.50 samples/sec Loss 1.0797 LearningRate 0.000004 Epoch: 37 Global Step: 779870 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:54:56,419-Speed 2496.90 samples/sec Loss 1.0687 LearningRate 0.000004 Epoch: 37 Global Step: 779880 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:55:04,571-Speed 2512.62 samples/sec Loss 1.0518 LearningRate 0.000004 Epoch: 37 Global Step: 779890 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:55:12,780-Speed 2495.39 samples/sec Loss 1.0448 LearningRate 0.000004 Epoch: 37 Global Step: 779900 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:55:20,995-Speed 2493.55 samples/sec Loss 1.0706 LearningRate 0.000004 Epoch: 37 Global Step: 779910 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:55:29,209-Speed 2493.95 samples/sec Loss 1.0508 LearningRate 0.000004 Epoch: 37 Global Step: 779920 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:55:37,411-Speed 2497.09 samples/sec Loss 1.0805 LearningRate 0.000004 Epoch: 37 Global Step: 779930 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:55:45,629-Speed 2492.49 samples/sec Loss 1.0751 LearningRate 0.000004 Epoch: 37 Global Step: 779940 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 00:55:53,779-Speed 2513.50 samples/sec Loss 1.0520 LearningRate 0.000004 Epoch: 37 Global Step: 779950 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:56:01,986-Speed 2495.82 samples/sec Loss 1.0634 LearningRate 0.000004 Epoch: 37 Global Step: 779960 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:56:10,192-Speed 2496.15 samples/sec Loss 1.0768 LearningRate 0.000004 Epoch: 37 Global Step: 779970 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:56:18,399-Speed 2495.67 samples/sec Loss 1.0746 LearningRate 0.000004 Epoch: 37 Global Step: 779980 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:56:26,602-Speed 2497.06 samples/sec Loss 1.1099 LearningRate 0.000004 Epoch: 37 Global Step: 779990 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:56:34,828-Speed 2493.87 samples/sec Loss 1.0752 LearningRate 0.000004 Epoch: 37 Global Step: 780000 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:56:42,977-Speed 2513.43 samples/sec Loss 1.1160 LearningRate 0.000004 Epoch: 37 Global Step: 780010 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:56:51,180-Speed 2497.08 samples/sec Loss 1.0984 LearningRate 0.000004 Epoch: 37 Global Step: 780020 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:56:59,390-Speed 2494.86 samples/sec Loss 1.1048 LearningRate 0.000004 Epoch: 37 Global Step: 780030 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:57:07,594-Speed 2496.83 samples/sec Loss 1.0608 LearningRate 0.000004 Epoch: 37 Global Step: 780040 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:57:15,799-Speed 2496.18 samples/sec Loss 1.1030 LearningRate 0.000004 Epoch: 37 Global Step: 780050 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:57:24,016-Speed 2492.96 samples/sec Loss 1.0695 LearningRate 0.000004 Epoch: 37 Global Step: 780060 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 00:57:32,184-Speed 2507.68 samples/sec Loss 1.0631 LearningRate 0.000004 Epoch: 37 Global Step: 780070 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:57:40,403-Speed 2492.41 samples/sec Loss 1.1080 LearningRate 0.000004 Epoch: 37 Global Step: 780080 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:57:48,607-Speed 2496.68 samples/sec Loss 1.0758 LearningRate 0.000004 Epoch: 37 Global Step: 780090 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:57:56,825-Speed 2492.38 samples/sec Loss 1.0650 LearningRate 0.000004 Epoch: 37 Global Step: 780100 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:58:05,029-Speed 2496.73 samples/sec Loss 1.0461 LearningRate 0.000004 Epoch: 37 Global Step: 780110 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:58:13,244-Speed 2493.35 samples/sec Loss 1.0686 LearningRate 0.000004 Epoch: 37 Global Step: 780120 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:58:21,395-Speed 2513.26 samples/sec Loss 1.1233 LearningRate 0.000004 Epoch: 37 Global Step: 780130 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:58:29,598-Speed 2497.02 samples/sec Loss 1.0583 LearningRate 0.000004 Epoch: 37 Global Step: 780140 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:58:37,800-Speed 2497.55 samples/sec Loss 1.0814 LearningRate 0.000004 Epoch: 37 Global Step: 780150 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:58:46,010-Speed 2494.76 samples/sec Loss 1.0738 LearningRate 0.000004 Epoch: 37 Global Step: 780160 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:58:54,216-Speed 2496.34 samples/sec Loss 1.0518 LearningRate 0.000004 Epoch: 37 Global Step: 780170 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:59:02,428-Speed 2494.21 samples/sec Loss 1.0847 LearningRate 0.000004 Epoch: 37 Global Step: 780180 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 00:59:10,583-Speed 2512.13 samples/sec Loss 1.0706 LearningRate 0.000004 Epoch: 37 Global Step: 780190 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:59:18,801-Speed 2492.33 samples/sec Loss 1.1124 LearningRate 0.000004 Epoch: 37 Global Step: 780200 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:59:27,020-Speed 2492.17 samples/sec Loss 1.0407 LearningRate 0.000004 Epoch: 37 Global Step: 780210 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:59:35,222-Speed 2497.69 samples/sec Loss 1.0541 LearningRate 0.000004 Epoch: 37 Global Step: 780220 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:59:43,426-Speed 2496.85 samples/sec Loss 1.0953 LearningRate 0.000004 Epoch: 37 Global Step: 780230 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:59:51,653-Speed 2489.69 samples/sec Loss 1.0697 LearningRate 0.000004 Epoch: 37 Global Step: 780240 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 00:59:59,802-Speed 2513.46 samples/sec Loss 1.0825 LearningRate 0.000004 Epoch: 37 Global Step: 780250 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:00:08,007-Speed 2496.48 samples/sec Loss 1.0729 LearningRate 0.000004 Epoch: 37 Global Step: 780260 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:00:16,210-Speed 2497.27 samples/sec Loss 1.0823 LearningRate 0.000004 Epoch: 37 Global Step: 780270 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:00:24,413-Speed 2496.70 samples/sec Loss 1.0714 LearningRate 0.000004 Epoch: 37 Global Step: 780280 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:00:32,619-Speed 2496.33 samples/sec Loss 1.0396 LearningRate 0.000004 Epoch: 37 Global Step: 780290 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:00:40,821-Speed 2497.08 samples/sec Loss 1.0674 LearningRate 0.000004 Epoch: 37 Global Step: 780300 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 01:00:48,971-Speed 2513.61 samples/sec Loss 1.0615 LearningRate 0.000004 Epoch: 37 Global Step: 780310 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:00:57,178-Speed 2495.62 samples/sec Loss 1.0749 LearningRate 0.000004 Epoch: 37 Global Step: 780320 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:01:05,382-Speed 2496.90 samples/sec Loss 1.0666 LearningRate 0.000004 Epoch: 37 Global Step: 780330 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:01:13,602-Speed 2491.69 samples/sec Loss 1.0687 LearningRate 0.000004 Epoch: 37 Global Step: 780340 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:01:21,816-Speed 2493.92 samples/sec Loss 1.0950 LearningRate 0.000004 Epoch: 37 Global Step: 780350 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:01:30,044-Speed 2489.54 samples/sec Loss 1.1165 LearningRate 0.000004 Epoch: 37 Global Step: 780360 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:01:38,204-Speed 2510.01 samples/sec Loss 1.0660 LearningRate 0.000004 Epoch: 37 Global Step: 780370 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:01:46,407-Speed 2496.93 samples/sec Loss 1.0534 LearningRate 0.000004 Epoch: 37 Global Step: 780380 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:01:54,625-Speed 2492.86 samples/sec Loss 1.0539 LearningRate 0.000004 Epoch: 37 Global Step: 780390 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:02:02,833-Speed 2495.57 samples/sec Loss 1.0505 LearningRate 0.000004 Epoch: 37 Global Step: 780400 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:02:11,040-Speed 2495.65 samples/sec Loss 1.0829 LearningRate 0.000004 Epoch: 37 Global Step: 780410 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:02:19,248-Speed 2495.52 samples/sec Loss 1.0778 LearningRate 0.000004 Epoch: 37 Global Step: 780420 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 01:02:27,402-Speed 2512.28 samples/sec Loss 1.0647 LearningRate 0.000004 Epoch: 37 Global Step: 780430 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:02:35,606-Speed 2496.48 samples/sec Loss 1.0885 LearningRate 0.000004 Epoch: 37 Global Step: 780440 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:02:43,821-Speed 2493.41 samples/sec Loss 1.0600 LearningRate 0.000004 Epoch: 37 Global Step: 780450 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:02:52,031-Speed 2494.96 samples/sec Loss 1.0505 LearningRate 0.000004 Epoch: 37 Global Step: 780460 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:03:00,237-Speed 2496.38 samples/sec Loss 1.0523 LearningRate 0.000004 Epoch: 37 Global Step: 780470 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:03:08,445-Speed 2495.55 samples/sec Loss 1.0661 LearningRate 0.000004 Epoch: 37 Global Step: 780480 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:03:16,594-Speed 2513.68 samples/sec Loss 1.0752 LearningRate 0.000004 Epoch: 37 Global Step: 780490 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:03:24,800-Speed 2496.33 samples/sec Loss 1.1025 LearningRate 0.000004 Epoch: 37 Global Step: 780500 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:03:33,005-Speed 2496.27 samples/sec Loss 1.0765 LearningRate 0.000004 Epoch: 37 Global Step: 780510 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:03:41,214-Speed 2495.11 samples/sec Loss 1.0940 LearningRate 0.000004 Epoch: 37 Global Step: 780520 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:03:49,438-Speed 2490.81 samples/sec Loss 1.0703 LearningRate 0.000004 Epoch: 37 Global Step: 780530 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:03:57,644-Speed 2496.07 samples/sec Loss 1.1151 LearningRate 0.000004 Epoch: 37 Global Step: 780540 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 01:04:05,798-Speed 2512.27 samples/sec Loss 1.0827 LearningRate 0.000004 Epoch: 37 Global Step: 780550 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:04:14,001-Speed 2497.08 samples/sec Loss 1.0854 LearningRate 0.000004 Epoch: 37 Global Step: 780560 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:04:22,204-Speed 2497.04 samples/sec Loss 1.0858 LearningRate 0.000004 Epoch: 37 Global Step: 780570 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:04:30,407-Speed 2497.21 samples/sec Loss 1.0551 LearningRate 0.000004 Epoch: 37 Global Step: 780580 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:04:38,613-Speed 2496.05 samples/sec Loss 1.0276 LearningRate 0.000004 Epoch: 37 Global Step: 780590 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:04:46,820-Speed 2495.81 samples/sec Loss 1.0649 LearningRate 0.000004 Epoch: 37 Global Step: 780600 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:04:54,979-Speed 2510.65 samples/sec Loss 1.0857 LearningRate 0.000004 Epoch: 37 Global Step: 780610 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:05:03,184-Speed 2496.69 samples/sec Loss 1.1053 LearningRate 0.000004 Epoch: 37 Global Step: 780620 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:05:11,394-Speed 2494.86 samples/sec Loss 1.0594 LearningRate 0.000004 Epoch: 37 Global Step: 780630 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:05:19,597-Speed 2496.90 samples/sec Loss 1.0643 LearningRate 0.000004 Epoch: 37 Global Step: 780640 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:05:27,801-Speed 2496.99 samples/sec Loss 1.0477 LearningRate 0.000004 Epoch: 37 Global Step: 780650 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:05:36,006-Speed 2496.54 samples/sec Loss 1.0586 LearningRate 0.000004 Epoch: 37 Global Step: 780660 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 01:05:44,163-Speed 2511.14 samples/sec Loss 1.0749 LearningRate 0.000004 Epoch: 37 Global Step: 780670 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:05:52,370-Speed 2495.72 samples/sec Loss 1.0761 LearningRate 0.000004 Epoch: 37 Global Step: 780680 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:06:00,578-Speed 2495.62 samples/sec Loss 1.0790 LearningRate 0.000004 Epoch: 37 Global Step: 780690 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:06:08,783-Speed 2496.54 samples/sec Loss 1.0721 LearningRate 0.000004 Epoch: 37 Global Step: 780700 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:06:16,988-Speed 2496.38 samples/sec Loss 1.0921 LearningRate 0.000004 Epoch: 37 Global Step: 780710 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:06:25,192-Speed 2496.70 samples/sec Loss 1.0900 LearningRate 0.000004 Epoch: 37 Global Step: 780720 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:06:33,353-Speed 2509.94 samples/sec Loss 1.0517 LearningRate 0.000004 Epoch: 37 Global Step: 780730 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:06:41,571-Speed 2492.43 samples/sec Loss 1.0933 LearningRate 0.000004 Epoch: 37 Global Step: 780740 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:06:49,780-Speed 2495.09 samples/sec Loss 1.0724 LearningRate 0.000004 Epoch: 37 Global Step: 780750 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:06:57,986-Speed 2496.38 samples/sec Loss 1.0594 LearningRate 0.000004 Epoch: 37 Global Step: 780760 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:07:06,191-Speed 2496.48 samples/sec Loss 1.0489 LearningRate 0.000004 Epoch: 37 Global Step: 780770 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:07:14,400-Speed 2495.04 samples/sec Loss 1.0923 LearningRate 0.000004 Epoch: 37 Global Step: 780780 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 01:07:22,554-Speed 2512.36 samples/sec Loss 1.0687 LearningRate 0.000004 Epoch: 37 Global Step: 780790 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:07:30,764-Speed 2495.00 samples/sec Loss 1.0900 LearningRate 0.000004 Epoch: 37 Global Step: 780800 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:07:38,969-Speed 2496.45 samples/sec Loss 1.1034 LearningRate 0.000004 Epoch: 37 Global Step: 780810 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:07:47,178-Speed 2495.03 samples/sec Loss 1.0457 LearningRate 0.000004 Epoch: 37 Global Step: 780820 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:07:55,383-Speed 2496.46 samples/sec Loss 1.0864 LearningRate 0.000004 Epoch: 37 Global Step: 780830 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:08:03,591-Speed 2495.51 samples/sec Loss 1.0560 LearningRate 0.000004 Epoch: 37 Global Step: 780840 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:08:11,745-Speed 2511.98 samples/sec Loss 1.0814 LearningRate 0.000004 Epoch: 37 Global Step: 780850 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:08:19,948-Speed 2496.92 samples/sec Loss 1.1015 LearningRate 0.000004 Epoch: 37 Global Step: 780860 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:08:28,154-Speed 2496.18 samples/sec Loss 1.0933 LearningRate 0.000004 Epoch: 37 Global Step: 780870 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:08:36,360-Speed 2496.36 samples/sec Loss 1.0649 LearningRate 0.000004 Epoch: 37 Global Step: 780880 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:08:44,568-Speed 2495.56 samples/sec Loss 1.0946 LearningRate 0.000004 Epoch: 37 Global Step: 780890 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:08:52,774-Speed 2495.81 samples/sec Loss 1.0773 LearningRate 0.000004 Epoch: 37 Global Step: 780900 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 01:09:00,927-Speed 2512.49 samples/sec Loss 1.0761 LearningRate 0.000004 Epoch: 37 Global Step: 780910 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:09:09,133-Speed 2496.28 samples/sec Loss 1.1080 LearningRate 0.000004 Epoch: 37 Global Step: 780920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:09:17,345-Speed 2494.08 samples/sec Loss 1.0761 LearningRate 0.000004 Epoch: 37 Global Step: 780930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:09:25,550-Speed 2496.38 samples/sec Loss 1.1108 LearningRate 0.000004 Epoch: 37 Global Step: 780940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:09:33,755-Speed 2496.45 samples/sec Loss 1.0808 LearningRate 0.000004 Epoch: 37 Global Step: 780950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:09:41,961-Speed 2496.13 samples/sec Loss 1.0813 LearningRate 0.000004 Epoch: 37 Global Step: 780960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:09:50,116-Speed 2511.62 samples/sec Loss 1.0679 LearningRate 0.000004 Epoch: 37 Global Step: 780970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:09:58,320-Speed 2497.00 samples/sec Loss 1.0888 LearningRate 0.000004 Epoch: 37 Global Step: 780980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:10:06,525-Speed 2496.50 samples/sec Loss 1.0705 LearningRate 0.000004 Epoch: 37 Global Step: 780990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:10:14,732-Speed 2496.26 samples/sec Loss 1.0784 LearningRate 0.000004 Epoch: 37 Global Step: 781000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:10:22,938-Speed 2495.86 samples/sec Loss 1.0967 LearningRate 0.000004 Epoch: 37 Global Step: 781010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:10:31,142-Speed 2496.84 samples/sec Loss 1.0570 LearningRate 0.000004 Epoch: 37 Global Step: 781020 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-07-13 01:10:39,294-Speed 2512.87 samples/sec Loss 1.0497 LearningRate 0.000004 Epoch: 37 Global Step: 781030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:10:47,505-Speed 2494.62 samples/sec Loss 1.0903 LearningRate 0.000004 Epoch: 37 Global Step: 781040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:10:55,711-Speed 2495.92 samples/sec Loss 1.0490 LearningRate 0.000004 Epoch: 37 Global Step: 781050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:11:03,927-Speed 2493.08 samples/sec Loss 1.0888 LearningRate 0.000004 Epoch: 37 Global Step: 781060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:11:12,130-Speed 2497.48 samples/sec Loss 1.0677 LearningRate 0.000004 Epoch: 37 Global Step: 781070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:11:20,336-Speed 2496.17 samples/sec Loss 1.0569 LearningRate 0.000004 Epoch: 37 Global Step: 781080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:11:28,488-Speed 2512.63 samples/sec Loss 1.0629 LearningRate 0.000004 Epoch: 37 Global Step: 781090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:11:36,696-Speed 2495.30 samples/sec Loss 1.0460 LearningRate 0.000004 Epoch: 37 Global Step: 781100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:11:44,905-Speed 2495.61 samples/sec Loss 1.0749 LearningRate 0.000004 Epoch: 37 Global Step: 781110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:11:53,112-Speed 2495.93 samples/sec Loss 1.0847 LearningRate 0.000004 Epoch: 37 Global Step: 781120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:12:01,321-Speed 2495.34 samples/sec Loss 1.0806 LearningRate 0.000004 Epoch: 37 Global Step: 781130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:12:09,522-Speed 2497.29 samples/sec Loss 1.0933 LearningRate 0.000004 Epoch: 37 Global Step: 781140 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-07-13 01:12:17,674-Speed 2512.90 samples/sec Loss 1.0672 LearningRate 0.000004 Epoch: 37 Global Step: 781150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:12:25,875-Speed 2497.56 samples/sec Loss 1.0565 LearningRate 0.000004 Epoch: 37 Global Step: 781160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:12:34,078-Speed 2497.03 samples/sec Loss 1.0830 LearningRate 0.000004 Epoch: 37 Global Step: 781170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:12:42,279-Speed 2497.57 samples/sec Loss 1.0891 LearningRate 0.000004 Epoch: 37 Global Step: 781180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:12:50,481-Speed 2497.10 samples/sec Loss 1.0841 LearningRate 0.000004 Epoch: 37 Global Step: 781190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:12:58,693-Speed 2494.25 samples/sec Loss 1.0835 LearningRate 0.000004 Epoch: 37 Global Step: 781200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:13:06,848-Speed 2511.76 samples/sec Loss 1.0854 LearningRate 0.000004 Epoch: 37 Global Step: 781210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:13:15,056-Speed 2495.83 samples/sec Loss 1.1013 LearningRate 0.000004 Epoch: 37 Global Step: 781220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:13:23,272-Speed 2492.98 samples/sec Loss 1.0642 LearningRate 0.000004 Epoch: 37 Global Step: 781230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:13:31,476-Speed 2496.81 samples/sec Loss 1.0608 LearningRate 0.000004 Epoch: 37 Global Step: 781240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:13:39,689-Speed 2493.81 samples/sec Loss 1.0913 LearningRate 0.000004 Epoch: 37 Global Step: 781250 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:13:47,899-Speed 2494.92 samples/sec Loss 1.0691 LearningRate 0.000004 Epoch: 37 Global Step: 781260 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-07-13 01:13:56,059-Speed 2510.38 samples/sec Loss 1.0695 LearningRate 0.000004 Epoch: 37 Global Step: 781270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:14:04,264-Speed 2496.46 samples/sec Loss 1.1052 LearningRate 0.000004 Epoch: 37 Global Step: 781280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:14:12,467-Speed 2496.88 samples/sec Loss 1.0615 LearningRate 0.000004 Epoch: 37 Global Step: 781290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-07-13 01:14:20,630-Speed 2509.40 samples/sec Loss 1.0816 LearningRate 0.000004 Epoch: 37 Global Step: 781300 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:14:28,834-Speed 2496.89 samples/sec Loss 1.1016 LearningRate 0.000004 Epoch: 37 Global Step: 781310 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:14:37,048-Speed 2493.39 samples/sec Loss 1.0611 LearningRate 0.000004 Epoch: 37 Global Step: 781320 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:14:45,198-Speed 2513.50 samples/sec Loss 1.0953 LearningRate 0.000004 Epoch: 37 Global Step: 781330 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:14:53,400-Speed 2497.18 samples/sec Loss 1.0624 LearningRate 0.000004 Epoch: 37 Global Step: 781340 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:15:01,607-Speed 2496.03 samples/sec Loss 1.1022 LearningRate 0.000004 Epoch: 37 Global Step: 781350 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:15:09,814-Speed 2495.57 samples/sec Loss 1.0701 LearningRate 0.000004 Epoch: 37 Global Step: 781360 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:15:18,030-Speed 2493.05 samples/sec Loss 1.0837 LearningRate 0.000004 Epoch: 37 Global Step: 781370 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:15:26,235-Speed 2496.47 samples/sec Loss 1.0883 LearningRate 0.000004 Epoch: 37 Global Step: 781380 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 01:15:34,387-Speed 2512.54 samples/sec Loss 1.0560 LearningRate 0.000004 Epoch: 37 Global Step: 781390 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:15:42,595-Speed 2495.37 samples/sec Loss 1.1032 LearningRate 0.000004 Epoch: 37 Global Step: 781400 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:15:50,802-Speed 2495.89 samples/sec Loss 1.0821 LearningRate 0.000004 Epoch: 37 Global Step: 781410 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:15:59,008-Speed 2496.27 samples/sec Loss 1.0805 LearningRate 0.000004 Epoch: 37 Global Step: 781420 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:16:07,215-Speed 2495.92 samples/sec Loss 1.0601 LearningRate 0.000004 Epoch: 37 Global Step: 781430 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:16:15,423-Speed 2495.53 samples/sec Loss 1.0705 LearningRate 0.000004 Epoch: 37 Global Step: 781440 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:16:23,574-Speed 2513.10 samples/sec Loss 1.0368 LearningRate 0.000004 Epoch: 37 Global Step: 781450 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:16:31,779-Speed 2496.36 samples/sec Loss 1.0555 LearningRate 0.000004 Epoch: 37 Global Step: 781460 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:16:39,984-Speed 2496.70 samples/sec Loss 1.0716 LearningRate 0.000004 Epoch: 37 Global Step: 781470 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:16:48,187-Speed 2497.08 samples/sec Loss 1.1044 LearningRate 0.000004 Epoch: 37 Global Step: 781480 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:16:56,391-Speed 2496.52 samples/sec Loss 1.0706 LearningRate 0.000004 Epoch: 37 Global Step: 781490 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:17:04,597-Speed 2496.08 samples/sec Loss 1.0810 LearningRate 0.000004 Epoch: 37 Global Step: 781500 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 01:17:12,746-Speed 2513.76 samples/sec Loss 1.0779 LearningRate 0.000004 Epoch: 37 Global Step: 781510 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:17:20,953-Speed 2495.67 samples/sec Loss 1.0981 LearningRate 0.000004 Epoch: 37 Global Step: 781520 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:17:29,157-Speed 2496.74 samples/sec Loss 1.0601 LearningRate 0.000004 Epoch: 37 Global Step: 781530 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:17:37,360-Speed 2497.40 samples/sec Loss 1.1142 LearningRate 0.000004 Epoch: 37 Global Step: 781540 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:17:45,566-Speed 2496.02 samples/sec Loss 1.0736 LearningRate 0.000004 Epoch: 37 Global Step: 781550 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:17:53,769-Speed 2496.91 samples/sec Loss 1.0803 LearningRate 0.000004 Epoch: 37 Global Step: 781560 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:18:01,923-Speed 2512.03 samples/sec Loss 1.0624 LearningRate 0.000004 Epoch: 37 Global Step: 781570 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:18:10,123-Speed 2498.24 samples/sec Loss 1.0665 LearningRate 0.000004 Epoch: 37 Global Step: 781580 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:18:18,330-Speed 2495.84 samples/sec Loss 1.0846 LearningRate 0.000004 Epoch: 37 Global Step: 781590 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:18:26,539-Speed 2495.07 samples/sec Loss 1.0723 LearningRate 0.000004 Epoch: 37 Global Step: 781600 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:18:34,744-Speed 2496.58 samples/sec Loss 1.0666 LearningRate 0.000004 Epoch: 37 Global Step: 781610 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:18:42,956-Speed 2494.45 samples/sec Loss 1.0986 LearningRate 0.000004 Epoch: 37 Global Step: 781620 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 01:18:51,103-Speed 2513.90 samples/sec Loss 1.0898 LearningRate 0.000004 Epoch: 37 Global Step: 781630 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:18:59,306-Speed 2497.10 samples/sec Loss 1.0882 LearningRate 0.000004 Epoch: 37 Global Step: 781640 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:19:07,512-Speed 2496.31 samples/sec Loss 1.0827 LearningRate 0.000004 Epoch: 37 Global Step: 781650 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:19:15,727-Speed 2493.03 samples/sec Loss 1.0734 LearningRate 0.000004 Epoch: 37 Global Step: 781660 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:19:23,931-Speed 2496.53 samples/sec Loss 1.0759 LearningRate 0.000004 Epoch: 37 Global Step: 781670 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:19:32,138-Speed 2495.86 samples/sec Loss 1.0341 LearningRate 0.000004 Epoch: 37 Global Step: 781680 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:19:40,288-Speed 2513.37 samples/sec Loss 1.0695 LearningRate 0.000004 Epoch: 37 Global Step: 781690 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:19:48,449-Speed 2509.95 samples/sec Loss 1.0878 LearningRate 0.000004 Epoch: 37 Global Step: 781700 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:19:56,652-Speed 2496.99 samples/sec Loss 1.0694 LearningRate 0.000004 Epoch: 37 Global Step: 781710 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:20:04,857-Speed 2496.43 samples/sec Loss 1.0774 LearningRate 0.000004 Epoch: 37 Global Step: 781720 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:20:13,066-Speed 2495.21 samples/sec Loss 1.0762 LearningRate 0.000004 Epoch: 37 Global Step: 781730 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:20:21,271-Speed 2496.07 samples/sec Loss 1.0731 LearningRate 0.000004 Epoch: 37 Global Step: 781740 Fp16 Grad Scale: 8192 Required: 11 
hours Training: 2022-07-13 01:20:29,434-Speed 2509.45 samples/sec Loss 1.0633 LearningRate 0.000004 Epoch: 37 Global Step: 781750 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:20:37,643-Speed 2495.28 samples/sec Loss 1.0976 LearningRate 0.000004 Epoch: 37 Global Step: 781760 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:20:45,844-Speed 2497.80 samples/sec Loss 1.0601 LearningRate 0.000004 Epoch: 37 Global Step: 781770 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:20:54,050-Speed 2495.85 samples/sec Loss 1.0929 LearningRate 0.000004 Epoch: 37 Global Step: 781780 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:21:02,254-Speed 2496.81 samples/sec Loss 1.0787 LearningRate 0.000004 Epoch: 37 Global Step: 781790 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:21:10,459-Speed 2496.76 samples/sec Loss 1.0755 LearningRate 0.000004 Epoch: 37 Global Step: 781800 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:21:18,610-Speed 2512.95 samples/sec Loss 1.0912 LearningRate 0.000004 Epoch: 37 Global Step: 781810 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:21:26,819-Speed 2495.17 samples/sec Loss 1.0767 LearningRate 0.000004 Epoch: 37 Global Step: 781820 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:21:35,019-Speed 2498.15 samples/sec Loss 1.0718 LearningRate 0.000004 Epoch: 37 Global Step: 781830 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:21:43,225-Speed 2495.97 samples/sec Loss 1.0977 LearningRate 0.000004 Epoch: 37 Global Step: 781840 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:21:51,432-Speed 2495.89 samples/sec Loss 1.0608 LearningRate 0.000004 Epoch: 37 Global Step: 781850 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:21:59,641-Speed 2495.54 samples/sec Loss 1.0752 LearningRate 0.000004 Epoch: 37 Global Step: 781860 Fp16 Grad Scale: 8192 Required: 11 hours Training: 
2022-07-13 01:22:07,791-Speed 2513.30 samples/sec Loss 1.1088 LearningRate 0.000004 Epoch: 37 Global Step: 781870 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:22:15,995-Speed 2496.88 samples/sec Loss 1.0509 LearningRate 0.000004 Epoch: 37 Global Step: 781880 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:22:24,206-Speed 2494.46 samples/sec Loss 1.0944 LearningRate 0.000004 Epoch: 37 Global Step: 781890 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:22:32,427-Speed 2491.67 samples/sec Loss 1.0611 LearningRate 0.000004 Epoch: 37 Global Step: 781900 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:22:40,627-Speed 2497.84 samples/sec Loss 1.0797 LearningRate 0.000004 Epoch: 37 Global Step: 781910 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:22:48,828-Speed 2497.91 samples/sec Loss 1.0298 LearningRate 0.000004 Epoch: 37 Global Step: 781920 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:22:57,009-Speed 2504.00 samples/sec Loss 1.0818 LearningRate 0.000004 Epoch: 37 Global Step: 781930 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:23:05,212-Speed 2496.78 samples/sec Loss 1.0377 LearningRate 0.000004 Epoch: 37 Global Step: 781940 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:23:13,416-Speed 2496.95 samples/sec Loss 1.0406 LearningRate 0.000004 Epoch: 37 Global Step: 781950 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:23:21,628-Speed 2494.47 samples/sec Loss 1.0575 LearningRate 0.000004 Epoch: 37 Global Step: 781960 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:23:29,830-Speed 2497.40 samples/sec Loss 1.0891 LearningRate 0.000004 Epoch: 37 Global Step: 781970 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:23:38,038-Speed 2495.38 samples/sec Loss 1.0736 LearningRate 0.000004 Epoch: 37 Global Step: 781980 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 
01:23:46,192-Speed 2511.88 samples/sec Loss 1.0486 LearningRate 0.000004 Epoch: 37 Global Step: 781990 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:23:54,396-Speed 2496.91 samples/sec Loss 1.0602 LearningRate 0.000004 Epoch: 37 Global Step: 782000 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:24:02,606-Speed 2495.02 samples/sec Loss 1.0845 LearningRate 0.000004 Epoch: 37 Global Step: 782010 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:24:10,805-Speed 2497.94 samples/sec Loss 1.0637 LearningRate 0.000004 Epoch: 37 Global Step: 782020 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:24:19,006-Speed 2497.65 samples/sec Loss 1.0388 LearningRate 0.000004 Epoch: 37 Global Step: 782030 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:24:27,208-Speed 2497.43 samples/sec Loss 1.0670 LearningRate 0.000004 Epoch: 37 Global Step: 782040 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:24:35,359-Speed 2513.04 samples/sec Loss 1.0448 LearningRate 0.000004 Epoch: 37 Global Step: 782050 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:24:43,562-Speed 2496.96 samples/sec Loss 1.0675 LearningRate 0.000004 Epoch: 37 Global Step: 782060 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:24:51,762-Speed 2497.81 samples/sec Loss 1.1016 LearningRate 0.000004 Epoch: 37 Global Step: 782070 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:24:59,966-Speed 2496.94 samples/sec Loss 1.0762 LearningRate 0.000004 Epoch: 37 Global Step: 782080 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:25:08,170-Speed 2496.75 samples/sec Loss 1.0603 LearningRate 0.000004 Epoch: 37 Global Step: 782090 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:25:16,397-Speed 2489.84 samples/sec Loss 1.0700 LearningRate 0.000004 Epoch: 37 Global Step: 782100 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:25:24,572-Speed 
2505.64 samples/sec Loss 1.1048 LearningRate 0.000004 Epoch: 37 Global Step: 782110 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:25:32,787-Speed 2493.51 samples/sec Loss 1.0642 LearningRate 0.000004 Epoch: 37 Global Step: 782120 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:25:40,998-Speed 2494.60 samples/sec Loss 1.0729 LearningRate 0.000004 Epoch: 37 Global Step: 782130 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:25:49,205-Speed 2495.82 samples/sec Loss 1.0811 LearningRate 0.000004 Epoch: 37 Global Step: 782140 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:25:57,408-Speed 2497.00 samples/sec Loss 1.0811 LearningRate 0.000004 Epoch: 37 Global Step: 782150 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:26:05,618-Speed 2494.85 samples/sec Loss 1.0549 LearningRate 0.000004 Epoch: 37 Global Step: 782160 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:26:13,768-Speed 2513.33 samples/sec Loss 1.0589 LearningRate 0.000004 Epoch: 37 Global Step: 782170 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:26:21,970-Speed 2497.39 samples/sec Loss 1.1054 LearningRate 0.000004 Epoch: 37 Global Step: 782180 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:26:30,173-Speed 2497.19 samples/sec Loss 1.1002 LearningRate 0.000004 Epoch: 37 Global Step: 782190 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:26:38,380-Speed 2495.60 samples/sec Loss 1.1068 LearningRate 0.000004 Epoch: 37 Global Step: 782200 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:26:46,598-Speed 2492.49 samples/sec Loss 1.0747 LearningRate 0.000004 Epoch: 37 Global Step: 782210 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:26:54,802-Speed 2496.38 samples/sec Loss 1.0702 LearningRate 0.000004 Epoch: 37 Global Step: 782220 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:27:02,952-Speed 2513.45 samples/sec 
Loss 1.0785 LearningRate 0.000004 Epoch: 37 Global Step: 782230 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:27:11,151-Speed 2498.17 samples/sec Loss 1.0760 LearningRate 0.000004 Epoch: 37 Global Step: 782240 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:27:19,363-Speed 2494.34 samples/sec Loss 1.0665 LearningRate 0.000004 Epoch: 37 Global Step: 782250 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:27:27,568-Speed 2496.78 samples/sec Loss 1.0804 LearningRate 0.000004 Epoch: 37 Global Step: 782260 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:27:35,769-Speed 2497.74 samples/sec Loss 1.1078 LearningRate 0.000004 Epoch: 37 Global Step: 782270 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:27:43,979-Speed 2495.00 samples/sec Loss 1.0451 LearningRate 0.000004 Epoch: 37 Global Step: 782280 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:27:52,133-Speed 2511.83 samples/sec Loss 1.1095 LearningRate 0.000004 Epoch: 37 Global Step: 782290 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:28:00,347-Speed 2493.69 samples/sec Loss 1.0547 LearningRate 0.000004 Epoch: 37 Global Step: 782300 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:28:08,556-Speed 2495.53 samples/sec Loss 1.1108 LearningRate 0.000004 Epoch: 37 Global Step: 782310 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:28:16,759-Speed 2496.78 samples/sec Loss 1.0534 LearningRate 0.000004 Epoch: 37 Global Step: 782320 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:28:24,967-Speed 2495.74 samples/sec Loss 1.0728 LearningRate 0.000004 Epoch: 37 Global Step: 782330 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:28:33,169-Speed 2497.41 samples/sec Loss 1.0775 LearningRate 0.000004 Epoch: 37 Global Step: 782340 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:28:41,321-Speed 2512.54 samples/sec Loss 1.0748 
LearningRate 0.000004 Epoch: 37 Global Step: 782350 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:28:49,522-Speed 2497.46 samples/sec Loss 1.0721 LearningRate 0.000004 Epoch: 37 Global Step: 782360 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:28:57,731-Speed 2495.33 samples/sec Loss 1.0829 LearningRate 0.000004 Epoch: 37 Global Step: 782370 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:29:05,937-Speed 2496.08 samples/sec Loss 1.0514 LearningRate 0.000004 Epoch: 37 Global Step: 782380 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:29:14,141-Speed 2496.83 samples/sec Loss 1.0599 LearningRate 0.000004 Epoch: 37 Global Step: 782390 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:29:22,344-Speed 2497.13 samples/sec Loss 1.0922 LearningRate 0.000004 Epoch: 37 Global Step: 782400 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:29:30,510-Speed 2508.05 samples/sec Loss 1.0692 LearningRate 0.000004 Epoch: 37 Global Step: 782410 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:29:38,717-Speed 2496.08 samples/sec Loss 1.0500 LearningRate 0.000004 Epoch: 37 Global Step: 782420 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:29:46,924-Speed 2496.00 samples/sec Loss 1.0557 LearningRate 0.000004 Epoch: 37 Global Step: 782430 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:29:55,144-Speed 2492.00 samples/sec Loss 1.0984 LearningRate 0.000004 Epoch: 37 Global Step: 782440 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:30:03,350-Speed 2496.13 samples/sec Loss 1.0776 LearningRate 0.000004 Epoch: 37 Global Step: 782450 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:30:11,555-Speed 2496.31 samples/sec Loss 1.0708 LearningRate 0.000004 Epoch: 37 Global Step: 782460 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:30:19,706-Speed 2513.09 samples/sec Loss 1.0855 LearningRate 
0.000004 Epoch: 37 Global Step: 782470 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:30:27,914-Speed 2495.45 samples/sec Loss 1.0636 LearningRate 0.000004 Epoch: 37 Global Step: 782480 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:30:36,131-Speed 2492.89 samples/sec Loss 1.0898 LearningRate 0.000004 Epoch: 37 Global Step: 782490 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:30:44,335-Speed 2496.66 samples/sec Loss 1.0605 LearningRate 0.000004 Epoch: 37 Global Step: 782500 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:30:52,556-Speed 2491.69 samples/sec Loss 1.0705 LearningRate 0.000004 Epoch: 37 Global Step: 782510 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:31:00,766-Speed 2494.85 samples/sec Loss 1.0719 LearningRate 0.000004 Epoch: 37 Global Step: 782520 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:31:08,923-Speed 2511.02 samples/sec Loss 1.0562 LearningRate 0.000004 Epoch: 37 Global Step: 782530 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:31:17,132-Speed 2495.33 samples/sec Loss 1.0688 LearningRate 0.000004 Epoch: 37 Global Step: 782540 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:31:25,339-Speed 2495.93 samples/sec Loss 1.1071 LearningRate 0.000004 Epoch: 37 Global Step: 782550 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:31:33,545-Speed 2496.15 samples/sec Loss 1.0652 LearningRate 0.000004 Epoch: 37 Global Step: 782560 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:31:41,757-Speed 2494.67 samples/sec Loss 1.0710 LearningRate 0.000004 Epoch: 37 Global Step: 782570 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:31:49,962-Speed 2496.48 samples/sec Loss 1.0758 LearningRate 0.000004 Epoch: 37 Global Step: 782580 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:31:58,115-Speed 2512.37 samples/sec Loss 1.0818 LearningRate 0.000004 Epoch: 37 
Global Step: 782590 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:32:06,316-Speed 2497.54 samples/sec Loss 1.0569 LearningRate 0.000004 Epoch: 37 Global Step: 782600 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:32:14,610-Speed 2469.81 samples/sec Loss 1.0756 LearningRate 0.000004 Epoch: 37 Global Step: 782610 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:32:22,812-Speed 2497.01 samples/sec Loss 1.0948 LearningRate 0.000004 Epoch: 37 Global Step: 782620 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:32:31,023-Speed 2494.86 samples/sec Loss 1.0950 LearningRate 0.000004 Epoch: 37 Global Step: 782630 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:32:39,224-Speed 2497.69 samples/sec Loss 1.0651 LearningRate 0.000004 Epoch: 37 Global Step: 782640 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:32:47,372-Speed 2513.73 samples/sec Loss 1.0891 LearningRate 0.000004 Epoch: 37 Global Step: 782650 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:32:55,593-Speed 2491.58 samples/sec Loss 1.0771 LearningRate 0.000004 Epoch: 37 Global Step: 782660 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:33:03,796-Speed 2497.05 samples/sec Loss 1.0632 LearningRate 0.000004 Epoch: 37 Global Step: 782670 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:33:12,005-Speed 2495.30 samples/sec Loss 1.0840 LearningRate 0.000004 Epoch: 37 Global Step: 782680 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:33:20,207-Speed 2497.25 samples/sec Loss 1.0863 LearningRate 0.000004 Epoch: 37 Global Step: 782690 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:33:28,416-Speed 2495.53 samples/sec Loss 1.0511 LearningRate 0.000004 Epoch: 37 Global Step: 782700 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:33:36,565-Speed 2513.61 samples/sec Loss 1.0560 LearningRate 0.000004 Epoch: 37 Global Step: 782710 
Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:33:44,768-Speed 2496.92 samples/sec Loss 1.1003 LearningRate 0.000004 Epoch: 37 Global Step: 782720 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:33:52,986-Speed 2492.32 samples/sec Loss 1.0850 LearningRate 0.000004 Epoch: 37 Global Step: 782730 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:34:01,194-Speed 2495.85 samples/sec Loss 1.0689 LearningRate 0.000004 Epoch: 37 Global Step: 782740 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:34:09,404-Speed 2494.70 samples/sec Loss 1.0780 LearningRate 0.000004 Epoch: 37 Global Step: 782750 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:34:17,607-Speed 2497.10 samples/sec Loss 1.0576 LearningRate 0.000004 Epoch: 37 Global Step: 782760 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:34:25,757-Speed 2513.24 samples/sec Loss 1.1019 LearningRate 0.000004 Epoch: 37 Global Step: 782770 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:34:33,960-Speed 2497.18 samples/sec Loss 1.0679 LearningRate 0.000004 Epoch: 37 Global Step: 782780 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:34:42,159-Speed 2498.14 samples/sec Loss 1.0838 LearningRate 0.000004 Epoch: 37 Global Step: 782790 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:34:50,367-Speed 2495.51 samples/sec Loss 1.0644 LearningRate 0.000004 Epoch: 37 Global Step: 782800 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:34:58,568-Speed 2497.57 samples/sec Loss 1.0788 LearningRate 0.000004 Epoch: 37 Global Step: 782810 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:35:06,771-Speed 2497.01 samples/sec Loss 1.0836 LearningRate 0.000004 Epoch: 37 Global Step: 782820 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:35:14,922-Speed 2512.83 samples/sec Loss 1.0825 LearningRate 0.000004 Epoch: 37 Global Step: 782830 Fp16 Grad Scale: 
8192 Required: 11 hours Training: 2022-07-13 01:35:23,122-Speed 2498.44 samples/sec Loss 1.0577 LearningRate 0.000004 Epoch: 37 Global Step: 782840 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:35:31,324-Speed 2497.35 samples/sec Loss 1.0606 LearningRate 0.000004 Epoch: 37 Global Step: 782850 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:35:39,528-Speed 2496.63 samples/sec Loss 1.0823 LearningRate 0.000004 Epoch: 37 Global Step: 782860 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:35:47,734-Speed 2496.20 samples/sec Loss 1.0743 LearningRate 0.000004 Epoch: 37 Global Step: 782870 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:35:55,934-Speed 2498.25 samples/sec Loss 1.0949 LearningRate 0.000004 Epoch: 37 Global Step: 782880 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:36:04,086-Speed 2512.96 samples/sec Loss 1.0492 LearningRate 0.000004 Epoch: 37 Global Step: 782890 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:36:12,292-Speed 2496.33 samples/sec Loss 1.0944 LearningRate 0.000004 Epoch: 37 Global Step: 782900 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:36:20,507-Speed 2493.67 samples/sec Loss 1.0880 LearningRate 0.000004 Epoch: 37 Global Step: 782910 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:36:28,708-Speed 2497.60 samples/sec Loss 1.0544 LearningRate 0.000004 Epoch: 37 Global Step: 782920 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:36:36,914-Speed 2496.20 samples/sec Loss 1.0747 LearningRate 0.000004 Epoch: 37 Global Step: 782930 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:36:45,119-Speed 2496.20 samples/sec Loss 1.0382 LearningRate 0.000004 Epoch: 37 Global Step: 782940 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:36:53,271-Speed 2513.01 samples/sec Loss 1.0840 LearningRate 0.000004 Epoch: 37 Global Step: 782950 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 01:37:01,474-Speed 2497.02 samples/sec Loss 1.0342 LearningRate 0.000004 Epoch: 37 Global Step: 782960 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:37:09,683-Speed 2495.13 samples/sec Loss 1.0762 LearningRate 0.000004 Epoch: 37 Global Step: 782970 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:37:17,889-Speed 2496.20 samples/sec Loss 1.0630 LearningRate 0.000004 Epoch: 37 Global Step: 782980 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:37:26,097-Speed 2495.52 samples/sec Loss 1.0562 LearningRate 0.000004 Epoch: 37 Global Step: 782990 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:37:34,308-Speed 2495.13 samples/sec Loss 1.0414 LearningRate 0.000004 Epoch: 37 Global Step: 783000 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:37:42,457-Speed 2513.35 samples/sec Loss 1.0699 LearningRate 0.000004 Epoch: 37 Global Step: 783010 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:37:50,662-Speed 2496.37 samples/sec Loss 1.0826 LearningRate 0.000004 Epoch: 37 Global Step: 783020 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:37:58,871-Speed 2495.38 samples/sec Loss 1.0520 LearningRate 0.000004 Epoch: 37 Global Step: 783030 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:38:07,073-Speed 2497.35 samples/sec Loss 1.0651 LearningRate 0.000004 Epoch: 37 Global Step: 783040 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:38:15,275-Speed 2497.11 samples/sec Loss 1.0262 LearningRate 0.000004 Epoch: 37 Global Step: 783050 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:38:23,483-Speed 2495.62 samples/sec Loss 1.0665 LearningRate 0.000004 Epoch: 37 Global Step: 783060 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:38:31,635-Speed 2512.62 samples/sec Loss 1.0768 LearningRate 0.000004 Epoch: 37 Global Step: 783070 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 01:38:39,838-Speed 2497.40 samples/sec Loss 1.0432 LearningRate 0.000004 Epoch: 37 Global Step: 783080 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:38:48,040-Speed 2497.15 samples/sec Loss 1.0456 LearningRate 0.000004 Epoch: 37 Global Step: 783090 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:38:56,254-Speed 2493.67 samples/sec Loss 1.0919 LearningRate 0.000004 Epoch: 37 Global Step: 783100 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:39:04,458-Speed 2496.93 samples/sec Loss 1.0803 LearningRate 0.000004 Epoch: 37 Global Step: 783110 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:39:12,664-Speed 2496.24 samples/sec Loss 1.0579 LearningRate 0.000004 Epoch: 37 Global Step: 783120 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:39:20,816-Speed 2512.98 samples/sec Loss 1.0905 LearningRate 0.000004 Epoch: 37 Global Step: 783130 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:39:29,035-Speed 2492.09 samples/sec Loss 1.0665 LearningRate 0.000004 Epoch: 37 Global Step: 783140 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:39:37,236-Speed 2497.52 samples/sec Loss 1.0686 LearningRate 0.000004 Epoch: 37 Global Step: 783150 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:39:45,445-Speed 2495.51 samples/sec Loss 1.0812 LearningRate 0.000004 Epoch: 37 Global Step: 783160 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:39:53,648-Speed 2496.86 samples/sec Loss 1.0795 LearningRate 0.000004 Epoch: 37 Global Step: 783170 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:40:01,853-Speed 2496.31 samples/sec Loss 1.0728 LearningRate 0.000004 Epoch: 37 Global Step: 783180 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:40:10,004-Speed 2513.37 samples/sec Loss 1.0657 LearningRate 0.000004 Epoch: 37 Global Step: 783190 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 01:40:18,211-Speed 2495.71 samples/sec Loss 1.0633 LearningRate 0.000004 Epoch: 37 Global Step: 783200 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:40:26,416-Speed 2496.24 samples/sec Loss 1.0410 LearningRate 0.000004 Epoch: 37 Global Step: 783210 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:40:34,622-Speed 2496.15 samples/sec Loss 1.0651 LearningRate 0.000004 Epoch: 37 Global Step: 783220 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:40:42,825-Speed 2497.08 samples/sec Loss 1.0886 LearningRate 0.000004 Epoch: 37 Global Step: 783230 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:40:51,027-Speed 2497.32 samples/sec Loss 1.0759 LearningRate 0.000004 Epoch: 37 Global Step: 783240 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:40:59,178-Speed 2512.74 samples/sec Loss 1.0710 LearningRate 0.000004 Epoch: 37 Global Step: 783250 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:41:07,380-Speed 2497.45 samples/sec Loss 1.0878 LearningRate 0.000004 Epoch: 37 Global Step: 783260 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:41:15,583-Speed 2496.92 samples/sec Loss 1.0841 LearningRate 0.000004 Epoch: 37 Global Step: 783270 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:41:23,789-Speed 2496.18 samples/sec Loss 1.0576 LearningRate 0.000004 Epoch: 37 Global Step: 783280 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:41:31,994-Speed 2496.67 samples/sec Loss 1.0603 LearningRate 0.000004 Epoch: 37 Global Step: 783290 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:41:40,197-Speed 2496.99 samples/sec Loss 1.0784 LearningRate 0.000004 Epoch: 37 Global Step: 783300 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:41:48,350-Speed 2512.28 samples/sec Loss 1.0908 LearningRate 0.000004 Epoch: 37 Global Step: 783310 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-07-13 01:41:56,559-Speed 2495.19 samples/sec Loss 1.0871 LearningRate 0.000004 Epoch: 37 Global Step: 783320 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:42:04,764-Speed 2496.59 samples/sec Loss 1.0868 LearningRate 0.000004 Epoch: 37 Global Step: 783330 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:42:12,966-Speed 2497.21 samples/sec Loss 1.0810 LearningRate 0.000004 Epoch: 37 Global Step: 783340 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:42:21,168-Speed 2497.33 samples/sec Loss 1.0703 LearningRate 0.000004 Epoch: 37 Global Step: 783350 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:42:29,379-Speed 2494.52 samples/sec Loss 1.0710 LearningRate 0.000004 Epoch: 37 Global Step: 783360 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:42:37,528-Speed 2513.58 samples/sec Loss 1.0838 LearningRate 0.000004 Epoch: 37 Global Step: 783370 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:42:45,733-Speed 2496.47 samples/sec Loss 1.0736 LearningRate 0.000004 Epoch: 37 Global Step: 783380 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-07-13 01:42:53,896-Speed 2509.51 samples/sec Loss 1.0802 LearningRate 0.000004 Epoch: 37 Global Step: 783390 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:43:02,100-Speed 2496.68 samples/sec Loss 1.0830 LearningRate 0.000004 Epoch: 37 Global Step: 783400 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:43:10,310-Speed 2495.22 samples/sec Loss 1.0457 LearningRate 0.000004 Epoch: 37 Global Step: 783410 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:43:18,514-Speed 2496.77 samples/sec Loss 1.0873 LearningRate 0.000004 Epoch: 37 Global Step: 783420 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:43:26,661-Speed 2514.46 samples/sec Loss 1.0974 LearningRate 0.000004 Epoch: 37 Global Step: 783430 Fp16 Grad Scale: 8192 Required: 11 
hours Training: 2022-07-13 01:43:34,876-Speed 2493.54 samples/sec Loss 1.0705 LearningRate 0.000004 Epoch: 37 Global Step: 783440 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:43:43,080-Speed 2496.59 samples/sec Loss 1.0884 LearningRate 0.000004 Epoch: 37 Global Step: 783450 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:43:51,284-Speed 2496.75 samples/sec Loss 1.0650 LearningRate 0.000004 Epoch: 37 Global Step: 783460 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:43:59,490-Speed 2496.37 samples/sec Loss 1.0985 LearningRate 0.000004 Epoch: 37 Global Step: 783470 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:44:07,697-Speed 2496.11 samples/sec Loss 1.0798 LearningRate 0.000004 Epoch: 37 Global Step: 783480 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:44:15,849-Speed 2512.57 samples/sec Loss 1.0796 LearningRate 0.000004 Epoch: 37 Global Step: 783490 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:44:24,052-Speed 2497.12 samples/sec Loss 1.0655 LearningRate 0.000004 Epoch: 37 Global Step: 783500 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:44:32,254-Speed 2497.50 samples/sec Loss 1.0774 LearningRate 0.000004 Epoch: 37 Global Step: 783510 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:44:40,456-Speed 2497.06 samples/sec Loss 1.0612 LearningRate 0.000004 Epoch: 37 Global Step: 783520 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:44:48,658-Speed 2497.57 samples/sec Loss 1.0680 LearningRate 0.000004 Epoch: 37 Global Step: 783530 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:44:56,860-Speed 2497.23 samples/sec Loss 1.1033 LearningRate 0.000004 Epoch: 37 Global Step: 783540 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:45:05,018-Speed 2510.79 samples/sec Loss 1.0956 LearningRate 0.000004 Epoch: 37 Global Step: 783550 Fp16 Grad Scale: 8192 Required: 11 hours Training: 
2022-07-13 01:45:13,253-Speed 2487.46 samples/sec Loss 1.0784 LearningRate 0.000004 Epoch: 37 Global Step: 783560 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:45:21,462-Speed 2495.20 samples/sec Loss 1.0743 LearningRate 0.000004 Epoch: 37 Global Step: 783570 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:45:29,679-Speed 2492.59 samples/sec Loss 1.0547 LearningRate 0.000004 Epoch: 37 Global Step: 783580 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:45:37,890-Speed 2494.76 samples/sec Loss 1.0468 LearningRate 0.000004 Epoch: 37 Global Step: 783590 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:45:46,108-Speed 2492.65 samples/sec Loss 1.0704 LearningRate 0.000004 Epoch: 37 Global Step: 783600 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:45:54,265-Speed 2511.13 samples/sec Loss 1.0997 LearningRate 0.000004 Epoch: 37 Global Step: 783610 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:46:02,473-Speed 2495.49 samples/sec Loss 1.0818 LearningRate 0.000004 Epoch: 37 Global Step: 783620 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:46:10,679-Speed 2496.16 samples/sec Loss 1.0745 LearningRate 0.000004 Epoch: 37 Global Step: 783630 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:46:18,897-Speed 2492.51 samples/sec Loss 1.0836 LearningRate 0.000004 Epoch: 37 Global Step: 783640 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:46:27,113-Speed 2493.13 samples/sec Loss 1.0702 LearningRate 0.000004 Epoch: 37 Global Step: 783650 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:46:35,319-Speed 2496.11 samples/sec Loss 1.0518 LearningRate 0.000004 Epoch: 37 Global Step: 783660 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 01:46:43,470-Speed 2513.10 samples/sec Loss 1.0701 LearningRate 0.000004 Epoch: 37 Global Step: 783670 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-07-13 
01:46:51,671-Speed 2497.66 samples/sec Loss 1.0702 LearningRate 0.000004 Epoch: 37 Global Step: 783680 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:46:59,879-Speed 2495.80 samples/sec Loss 1.1049 LearningRate 0.000004 Epoch: 37 Global Step: 783690 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:47:08,088-Speed 2495.39 samples/sec Loss 1.0633 LearningRate 0.000004 Epoch: 37 Global Step: 783700 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:47:16,295-Speed 2495.73 samples/sec Loss 1.0526 LearningRate 0.000004 Epoch: 37 Global Step: 783710 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:47:24,514-Speed 2492.97 samples/sec Loss 1.0732 LearningRate 0.000004 Epoch: 37 Global Step: 783720 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:47:32,667-Speed 2512.55 samples/sec Loss 1.1059 LearningRate 0.000004 Epoch: 37 Global Step: 783730 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:47:40,870-Speed 2497.02 samples/sec Loss 1.0490 LearningRate 0.000004 Epoch: 37 Global Step: 783740 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:47:49,072-Speed 2497.52 samples/sec Loss 1.0718 LearningRate 0.000004 Epoch: 37 Global Step: 783750 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:47:57,275-Speed 2496.79 samples/sec Loss 1.0796 LearningRate 0.000004 Epoch: 37 Global Step: 783760 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:48:05,501-Speed 2490.14 samples/sec Loss 1.0805 LearningRate 0.000004 Epoch: 37 Global Step: 783770 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:48:13,703-Speed 2497.37 samples/sec Loss 1.0658 LearningRate 0.000004 Epoch: 37 Global Step: 783780 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:48:21,851-Speed 2513.76 samples/sec Loss 1.1247 LearningRate 0.000004 Epoch: 37 Global Step: 783790 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:48:30,056-Speed 
2496.44 samples/sec Loss 1.0735 LearningRate 0.000004 Epoch: 37 Global Step: 783800 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:48:38,262-Speed 2496.19 samples/sec Loss 1.0574 LearningRate 0.000004 Epoch: 37 Global Step: 783810 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:48:46,465-Speed 2497.06 samples/sec Loss 1.0844 LearningRate 0.000004 Epoch: 37 Global Step: 783820 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:48:54,668-Speed 2496.86 samples/sec Loss 1.0765 LearningRate 0.000004 Epoch: 37 Global Step: 783830 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:49:02,872-Speed 2496.62 samples/sec Loss 1.0623 LearningRate 0.000004 Epoch: 37 Global Step: 783840 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:49:11,021-Speed 2513.54 samples/sec Loss 1.1004 LearningRate 0.000004 Epoch: 37 Global Step: 783850 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:49:19,227-Speed 2496.39 samples/sec Loss 1.0961 LearningRate 0.000004 Epoch: 37 Global Step: 783860 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:49:27,431-Speed 2496.56 samples/sec Loss 1.0576 LearningRate 0.000004 Epoch: 37 Global Step: 783870 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:49:35,637-Speed 2496.04 samples/sec Loss 1.0619 LearningRate 0.000004 Epoch: 37 Global Step: 783880 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:49:43,844-Speed 2495.96 samples/sec Loss 1.0710 LearningRate 0.000004 Epoch: 37 Global Step: 783890 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:49:52,058-Speed 2493.74 samples/sec Loss 1.0789 LearningRate 0.000004 Epoch: 37 Global Step: 783900 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:50:00,210-Speed 2512.81 samples/sec Loss 1.0530 LearningRate 0.000004 Epoch: 37 Global Step: 783910 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:50:08,414-Speed 2496.95 samples/sec 
Loss 1.0395 LearningRate 0.000004 Epoch: 37 Global Step: 783920 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:50:16,618-Speed 2496.55 samples/sec Loss 1.0840 LearningRate 0.000004 Epoch: 37 Global Step: 783930 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:50:24,835-Speed 2492.99 samples/sec Loss 1.0745 LearningRate 0.000004 Epoch: 37 Global Step: 783940 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:50:33,039-Speed 2496.68 samples/sec Loss 1.0999 LearningRate 0.000004 Epoch: 37 Global Step: 783950 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:50:41,250-Speed 2494.53 samples/sec Loss 1.0796 LearningRate 0.000004 Epoch: 37 Global Step: 783960 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:50:49,400-Speed 2513.38 samples/sec Loss 1.0921 LearningRate 0.000004 Epoch: 37 Global Step: 783970 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:50:57,603-Speed 2497.08 samples/sec Loss 1.0661 LearningRate 0.000004 Epoch: 37 Global Step: 783980 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:51:05,808-Speed 2496.36 samples/sec Loss 1.0665 LearningRate 0.000004 Epoch: 37 Global Step: 783990 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:51:14,011-Speed 2497.13 samples/sec Loss 1.0661 LearningRate 0.000004 Epoch: 37 Global Step: 784000 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:51:22,214-Speed 2496.94 samples/sec Loss 1.1013 LearningRate 0.000004 Epoch: 37 Global Step: 784010 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:51:30,422-Speed 2495.51 samples/sec Loss 1.0864 LearningRate 0.000004 Epoch: 37 Global Step: 784020 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:51:38,582-Speed 2510.19 samples/sec Loss 1.0867 LearningRate 0.000004 Epoch: 37 Global Step: 784030 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:51:46,787-Speed 2496.32 samples/sec Loss 1.0983 
LearningRate 0.000004 Epoch: 37 Global Step: 784040 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:51:55,015-Speed 2489.62 samples/sec Loss 1.0858 LearningRate 0.000004 Epoch: 37 Global Step: 784050 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:52:03,240-Speed 2490.45 samples/sec Loss 1.0829 LearningRate 0.000004 Epoch: 37 Global Step: 784060 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:52:11,442-Speed 2497.20 samples/sec Loss 1.0841 LearningRate 0.000004 Epoch: 37 Global Step: 784070 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:52:19,671-Speed 2488.98 samples/sec Loss 1.0751 LearningRate 0.000004 Epoch: 37 Global Step: 784080 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:52:27,814-Speed 2515.46 samples/sec Loss 1.0732 LearningRate 0.000004 Epoch: 37 Global Step: 784090 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:52:36,019-Speed 2496.34 samples/sec Loss 1.0970 LearningRate 0.000004 Epoch: 37 Global Step: 784100 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:52:44,220-Speed 2497.61 samples/sec Loss 1.0511 LearningRate 0.000004 Epoch: 37 Global Step: 784110 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:52:52,430-Speed 2494.96 samples/sec Loss 1.0483 LearningRate 0.000004 Epoch: 37 Global Step: 784120 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:53:00,638-Speed 2496.09 samples/sec Loss 1.0576 LearningRate 0.000004 Epoch: 37 Global Step: 784130 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:53:08,850-Speed 2494.05 samples/sec Loss 1.0543 LearningRate 0.000004 Epoch: 37 Global Step: 784140 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:53:17,003-Speed 2512.45 samples/sec Loss 1.0727 LearningRate 0.000004 Epoch: 37 Global Step: 784150 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:53:25,207-Speed 2496.75 samples/sec Loss 1.0900 LearningRate 
0.000004 Epoch: 37 Global Step: 784160 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:53:33,427-Speed 2491.75 samples/sec Loss 1.0728 LearningRate 0.000004 Epoch: 37 Global Step: 784170 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:53:41,631-Speed 2496.84 samples/sec Loss 1.0653 LearningRate 0.000004 Epoch: 37 Global Step: 784180 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:53:49,836-Speed 2496.73 samples/sec Loss 1.0769 LearningRate 0.000004 Epoch: 37 Global Step: 784190 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:53:58,050-Speed 2493.63 samples/sec Loss 1.0725 LearningRate 0.000004 Epoch: 37 Global Step: 784200 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:54:06,201-Speed 2512.83 samples/sec Loss 1.0337 LearningRate 0.000004 Epoch: 37 Global Step: 784210 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:54:14,414-Speed 2494.21 samples/sec Loss 1.0873 LearningRate 0.000004 Epoch: 37 Global Step: 784220 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:54:22,616-Speed 2497.49 samples/sec Loss 1.0514 LearningRate 0.000004 Epoch: 37 Global Step: 784230 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:54:30,824-Speed 2495.49 samples/sec Loss 1.0680 LearningRate 0.000004 Epoch: 37 Global Step: 784240 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:54:39,028-Speed 2496.69 samples/sec Loss 1.0756 LearningRate 0.000004 Epoch: 37 Global Step: 784250 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:54:47,233-Speed 2496.59 samples/sec Loss 1.0708 LearningRate 0.000004 Epoch: 37 Global Step: 784260 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:54:55,383-Speed 2513.09 samples/sec Loss 1.0713 LearningRate 0.000004 Epoch: 37 Global Step: 784270 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:55:03,592-Speed 2495.49 samples/sec Loss 1.0721 LearningRate 0.000004 Epoch: 37 
Global Step: 784280 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:55:11,802-Speed 2494.79 samples/sec Loss 1.0829 LearningRate 0.000004 Epoch: 37 Global Step: 784290 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:55:20,005-Speed 2496.98 samples/sec Loss 1.0612 LearningRate 0.000004 Epoch: 37 Global Step: 784300 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:55:28,211-Speed 2496.26 samples/sec Loss 1.0457 LearningRate 0.000004 Epoch: 37 Global Step: 784310 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:55:36,429-Speed 2492.51 samples/sec Loss 1.0674 LearningRate 0.000004 Epoch: 37 Global Step: 784320 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:55:44,578-Speed 2513.62 samples/sec Loss 1.0682 LearningRate 0.000004 Epoch: 37 Global Step: 784330 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:55:52,787-Speed 2495.60 samples/sec Loss 1.0552 LearningRate 0.000004 Epoch: 37 Global Step: 784340 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:56:01,006-Speed 2492.51 samples/sec Loss 1.0610 LearningRate 0.000004 Epoch: 37 Global Step: 784350 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:56:09,227-Speed 2491.59 samples/sec Loss 1.0404 LearningRate 0.000004 Epoch: 37 Global Step: 784360 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:56:17,435-Speed 2495.56 samples/sec Loss 1.0903 LearningRate 0.000004 Epoch: 37 Global Step: 784370 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:56:25,646-Speed 2494.72 samples/sec Loss 1.0573 LearningRate 0.000004 Epoch: 37 Global Step: 784380 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:56:33,805-Speed 2510.28 samples/sec Loss 1.0757 LearningRate 0.000004 Epoch: 37 Global Step: 784390 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:56:42,016-Speed 2494.81 samples/sec Loss 1.0637 LearningRate 0.000004 Epoch: 37 Global Step: 784400 
Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:56:50,226-Speed 2494.66 samples/sec Loss 1.0595 LearningRate 0.000004 Epoch: 37 Global Step: 784410 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:56:58,431-Speed 2497.08 samples/sec Loss 1.0722 LearningRate 0.000004 Epoch: 37 Global Step: 784420 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:57:06,638-Speed 2495.92 samples/sec Loss 1.0688 LearningRate 0.000004 Epoch: 37 Global Step: 784430 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:57:14,843-Speed 2496.20 samples/sec Loss 1.0778 LearningRate 0.000004 Epoch: 37 Global Step: 784440 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:57:22,994-Speed 2512.98 samples/sec Loss 1.0920 LearningRate 0.000004 Epoch: 37 Global Step: 784450 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:57:31,195-Speed 2497.76 samples/sec Loss 1.0580 LearningRate 0.000004 Epoch: 37 Global Step: 784460 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:57:39,402-Speed 2495.78 samples/sec Loss 1.0908 LearningRate 0.000004 Epoch: 37 Global Step: 784470 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:57:47,616-Speed 2493.75 samples/sec Loss 1.0897 LearningRate 0.000004 Epoch: 37 Global Step: 784480 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:57:55,831-Speed 2493.35 samples/sec Loss 1.0789 LearningRate 0.000004 Epoch: 37 Global Step: 784490 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:58:04,035-Speed 2496.80 samples/sec Loss 1.0702 LearningRate 0.000004 Epoch: 37 Global Step: 784500 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:58:12,185-Speed 2513.42 samples/sec Loss 1.0774 LearningRate 0.000004 Epoch: 37 Global Step: 784510 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:58:20,390-Speed 2496.19 samples/sec Loss 1.0839 LearningRate 0.000004 Epoch: 37 Global Step: 784520 Fp16 Grad Scale: 
8192 Required: 10 hours Training: 2022-07-13 01:58:28,598-Speed 2495.83 samples/sec Loss 1.0883 LearningRate 0.000004 Epoch: 37 Global Step: 784530 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:58:36,802-Speed 2496.63 samples/sec Loss 1.0829 LearningRate 0.000004 Epoch: 37 Global Step: 784540 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:58:45,014-Speed 2494.16 samples/sec Loss 1.0856 LearningRate 0.000004 Epoch: 37 Global Step: 784550 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:58:53,220-Speed 2496.02 samples/sec Loss 1.0655 LearningRate 0.000004 Epoch: 37 Global Step: 784560 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:59:01,366-Speed 2514.84 samples/sec Loss 1.0696 LearningRate 0.000004 Epoch: 37 Global Step: 784570 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:59:09,570-Speed 2496.57 samples/sec Loss 1.0799 LearningRate 0.000004 Epoch: 37 Global Step: 784580 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 01:59:17,774-Speed 2496.81 samples/sec Loss 1.0511 LearningRate 0.000004 Epoch: 37 Global Step: 784590 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 01:59:25,985-Speed 2494.38 samples/sec Loss 1.0565 LearningRate 0.000004 Epoch: 37 Global Step: 784600 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 01:59:34,191-Speed 2496.00 samples/sec Loss 1.0951 LearningRate 0.000004 Epoch: 37 Global Step: 784610 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 01:59:42,397-Speed 2496.54 samples/sec Loss 1.0862 LearningRate 0.000004 Epoch: 37 Global Step: 784620 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 01:59:50,560-Speed 2509.01 samples/sec Loss 1.0627 LearningRate 0.000004 Epoch: 37 Global Step: 784630 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 01:59:58,764-Speed 2496.84 samples/sec Loss 1.0527 LearningRate 0.000004 Epoch: 37 Global Step: 784640 Fp16 Grad Scale: 16384 
Required: 10 hours Training: 2022-07-13 02:00:06,971-Speed 2495.76 samples/sec Loss 1.0632 LearningRate 0.000004 Epoch: 37 Global Step: 784650 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:00:15,176-Speed 2496.56 samples/sec Loss 1.0516 LearningRate 0.000004 Epoch: 37 Global Step: 784660 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:00:23,380-Speed 2496.78 samples/sec Loss 1.0689 LearningRate 0.000004 Epoch: 37 Global Step: 784670 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:00:31,605-Speed 2490.31 samples/sec Loss 1.0785 LearningRate 0.000004 Epoch: 37 Global Step: 784680 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:00:39,758-Speed 2512.28 samples/sec Loss 1.0614 LearningRate 0.000004 Epoch: 37 Global Step: 784690 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:00:47,961-Speed 2497.07 samples/sec Loss 1.0709 LearningRate 0.000004 Epoch: 37 Global Step: 784700 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:00:56,168-Speed 2495.85 samples/sec Loss 1.0781 LearningRate 0.000004 Epoch: 37 Global Step: 784710 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:01:04,372-Speed 2496.62 samples/sec Loss 1.0826 LearningRate 0.000004 Epoch: 37 Global Step: 784720 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:01:12,600-Speed 2489.32 samples/sec Loss 1.0660 LearningRate 0.000004 Epoch: 37 Global Step: 784730 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:01:20,814-Speed 2493.63 samples/sec Loss 1.0554 LearningRate 0.000004 Epoch: 37 Global Step: 784740 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:01:28,974-Speed 2510.56 samples/sec Loss 1.0918 LearningRate 0.000004 Epoch: 37 Global Step: 784750 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:01:37,176-Speed 2497.14 samples/sec Loss 1.0672 LearningRate 0.000004 Epoch: 37 Global Step: 784760 Fp16 Grad Scale: 16384 
Required: 10 hours Training: 2022-07-13 02:01:45,381-Speed 2496.60 samples/sec Loss 1.0854 LearningRate 0.000004 Epoch: 37 Global Step: 784770 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:01:53,597-Speed 2493.03 samples/sec Loss 1.0416 LearningRate 0.000004 Epoch: 37 Global Step: 784780 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:02:01,802-Speed 2496.44 samples/sec Loss 1.0872 LearningRate 0.000004 Epoch: 37 Global Step: 784790 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:02:10,019-Speed 2492.92 samples/sec Loss 1.0655 LearningRate 0.000004 Epoch: 37 Global Step: 784800 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:02:18,171-Speed 2512.57 samples/sec Loss 1.0904 LearningRate 0.000004 Epoch: 37 Global Step: 784810 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:02:26,375-Speed 2496.90 samples/sec Loss 1.0803 LearningRate 0.000004 Epoch: 37 Global Step: 784820 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:02:34,582-Speed 2495.79 samples/sec Loss 1.0556 LearningRate 0.000004 Epoch: 37 Global Step: 784830 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:02:42,807-Speed 2490.43 samples/sec Loss 1.0535 LearningRate 0.000004 Epoch: 37 Global Step: 784840 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:02:51,014-Speed 2495.93 samples/sec Loss 1.0776 LearningRate 0.000004 Epoch: 37 Global Step: 784850 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:02:59,217-Speed 2496.87 samples/sec Loss 1.0756 LearningRate 0.000004 Epoch: 37 Global Step: 784860 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:03:07,380-Speed 2509.43 samples/sec Loss 1.0817 LearningRate 0.000004 Epoch: 37 Global Step: 784870 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:03:15,585-Speed 2496.47 samples/sec Loss 1.0652 LearningRate 0.000004 Epoch: 37 Global Step: 784880 Fp16 Grad Scale: 16384 
Required: 10 hours Training: 2022-07-13 02:03:23,796-Speed 2494.56 samples/sec Loss 1.0840 LearningRate 0.000004 Epoch: 37 Global Step: 784890 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:03:32,002-Speed 2496.21 samples/sec Loss 1.0907 LearningRate 0.000004 Epoch: 37 Global Step: 784900 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:03:40,222-Speed 2492.09 samples/sec Loss 1.0526 LearningRate 0.000004 Epoch: 37 Global Step: 784910 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:03:48,428-Speed 2496.12 samples/sec Loss 1.0681 LearningRate 0.000004 Epoch: 37 Global Step: 784920 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:03:56,580-Speed 2512.58 samples/sec Loss 1.0414 LearningRate 0.000004 Epoch: 37 Global Step: 784930 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:04:04,782-Speed 2497.43 samples/sec Loss 1.0375 LearningRate 0.000004 Epoch: 37 Global Step: 784940 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:04:12,986-Speed 2496.69 samples/sec Loss 1.0866 LearningRate 0.000004 Epoch: 37 Global Step: 784950 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:04:21,190-Speed 2496.82 samples/sec Loss 1.0811 LearningRate 0.000004 Epoch: 37 Global Step: 784960 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:04:29,394-Speed 2496.78 samples/sec Loss 1.0661 LearningRate 0.000004 Epoch: 37 Global Step: 784970 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:04:37,603-Speed 2495.11 samples/sec Loss 1.1095 LearningRate 0.000004 Epoch: 37 Global Step: 784980 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:04:45,754-Speed 2513.32 samples/sec Loss 1.0519 LearningRate 0.000004 Epoch: 37 Global Step: 784990 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:04:53,959-Speed 2496.53 samples/sec Loss 1.0628 LearningRate 0.000004 Epoch: 37 Global Step: 785000 Fp16 Grad Scale: 16384 
Required: 10 hours Training: 2022-07-13 02:05:02,164-Speed 2496.34 samples/sec Loss 1.0576 LearningRate 0.000004 Epoch: 37 Global Step: 785010 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:05:10,364-Speed 2497.70 samples/sec Loss 1.0616 LearningRate 0.000004 Epoch: 37 Global Step: 785020 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:05:18,575-Speed 2494.79 samples/sec Loss 1.0857 LearningRate 0.000004 Epoch: 37 Global Step: 785030 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:05:26,778-Speed 2497.27 samples/sec Loss 1.0618 LearningRate 0.000004 Epoch: 37 Global Step: 785040 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:05:34,943-Speed 2508.50 samples/sec Loss 1.0615 LearningRate 0.000004 Epoch: 37 Global Step: 785050 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:05:43,155-Speed 2494.36 samples/sec Loss 1.0620 LearningRate 0.000004 Epoch: 37 Global Step: 785060 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:05:51,365-Speed 2495.14 samples/sec Loss 1.0476 LearningRate 0.000004 Epoch: 37 Global Step: 785070 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:05:59,580-Speed 2493.80 samples/sec Loss 1.0743 LearningRate 0.000004 Epoch: 37 Global Step: 785080 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:06:07,784-Speed 2496.34 samples/sec Loss 1.0771 LearningRate 0.000004 Epoch: 37 Global Step: 785090 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:06:15,990-Speed 2496.05 samples/sec Loss 1.0686 LearningRate 0.000004 Epoch: 37 Global Step: 785100 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:06:24,146-Speed 2511.80 samples/sec Loss 1.0630 LearningRate 0.000004 Epoch: 37 Global Step: 785110 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:06:32,351-Speed 2496.50 samples/sec Loss 1.0575 LearningRate 0.000004 Epoch: 37 Global Step: 785120 Fp16 Grad Scale: 16384 
Required: 10 hours Training: 2022-07-13 02:06:40,557-Speed 2496.07 samples/sec Loss 1.1052 LearningRate 0.000004 Epoch: 37 Global Step: 785130 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:06:48,764-Speed 2496.22 samples/sec Loss 1.0465 LearningRate 0.000004 Epoch: 37 Global Step: 785140 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:06:56,971-Speed 2495.62 samples/sec Loss 1.0616 LearningRate 0.000004 Epoch: 37 Global Step: 785150 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:07:05,180-Speed 2495.55 samples/sec Loss 1.0746 LearningRate 0.000004 Epoch: 37 Global Step: 785160 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:07:13,335-Speed 2511.41 samples/sec Loss 1.0765 LearningRate 0.000004 Epoch: 37 Global Step: 785170 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:07:21,540-Speed 2496.89 samples/sec Loss 1.0841 LearningRate 0.000004 Epoch: 37 Global Step: 785180 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:07:29,745-Speed 2496.43 samples/sec Loss 1.0736 LearningRate 0.000004 Epoch: 37 Global Step: 785190 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:07:37,952-Speed 2495.74 samples/sec Loss 1.0894 LearningRate 0.000004 Epoch: 37 Global Step: 785200 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:07:46,156-Speed 2497.18 samples/sec Loss 1.0328 LearningRate 0.000004 Epoch: 37 Global Step: 785210 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:07:54,375-Speed 2492.21 samples/sec Loss 1.0638 LearningRate 0.000004 Epoch: 37 Global Step: 785220 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:08:02,533-Speed 2510.86 samples/sec Loss 1.0826 LearningRate 0.000004 Epoch: 37 Global Step: 785230 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:08:10,741-Speed 2495.78 samples/sec Loss 1.0716 LearningRate 0.000004 Epoch: 37 Global Step: 785240 Fp16 Grad Scale: 16384 
Required: 10 hours Training: 2022-07-13 02:08:18,954-Speed 2493.85 samples/sec Loss 1.0976 LearningRate 0.000004 Epoch: 37 Global Step: 785250 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:08:27,164-Speed 2494.85 samples/sec Loss 1.0772 LearningRate 0.000004 Epoch: 37 Global Step: 785260 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:08:35,376-Speed 2494.42 samples/sec Loss 1.0777 LearningRate 0.000004 Epoch: 37 Global Step: 785270 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:08:43,584-Speed 2495.56 samples/sec Loss 1.0868 LearningRate 0.000004 Epoch: 37 Global Step: 785280 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:08:51,743-Speed 2510.63 samples/sec Loss 1.0862 LearningRate 0.000004 Epoch: 37 Global Step: 785290 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:08:59,953-Speed 2494.94 samples/sec Loss 1.0592 LearningRate 0.000004 Epoch: 37 Global Step: 785300 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:09:08,161-Speed 2495.62 samples/sec Loss 1.0578 LearningRate 0.000004 Epoch: 37 Global Step: 785310 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:09:16,369-Speed 2495.35 samples/sec Loss 1.0603 LearningRate 0.000004 Epoch: 37 Global Step: 785320 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:09:24,579-Speed 2494.83 samples/sec Loss 1.0521 LearningRate 0.000004 Epoch: 37 Global Step: 785330 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:09:32,789-Speed 2494.96 samples/sec Loss 1.0692 LearningRate 0.000004 Epoch: 37 Global Step: 785340 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:09:40,941-Speed 2512.66 samples/sec Loss 1.0716 LearningRate 0.000004 Epoch: 37 Global Step: 785350 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:09:49,153-Speed 2494.49 samples/sec Loss 1.0710 LearningRate 0.000004 Epoch: 37 Global Step: 785360 Fp16 Grad Scale: 16384 
Required: 10 hours Training: 2022-07-13 02:09:57,357-Speed 2496.54 samples/sec Loss 1.1014 LearningRate 0.000004 Epoch: 37 Global Step: 785370 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:10:05,562-Speed 2496.43 samples/sec Loss 1.0678 LearningRate 0.000004 Epoch: 37 Global Step: 785380 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:10:13,766-Speed 2497.18 samples/sec Loss 1.0554 LearningRate 0.000004 Epoch: 37 Global Step: 785390 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:10:21,971-Speed 2496.28 samples/sec Loss 1.0906 LearningRate 0.000003 Epoch: 37 Global Step: 785400 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:10:30,121-Speed 2513.31 samples/sec Loss 1.0868 LearningRate 0.000003 Epoch: 37 Global Step: 785410 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:10:38,325-Speed 2496.65 samples/sec Loss 1.0630 LearningRate 0.000003 Epoch: 37 Global Step: 785420 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:10:46,533-Speed 2495.92 samples/sec Loss 1.0699 LearningRate 0.000003 Epoch: 37 Global Step: 785430 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:10:54,739-Speed 2496.17 samples/sec Loss 1.0720 LearningRate 0.000003 Epoch: 37 Global Step: 785440 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:11:02,943-Speed 2496.91 samples/sec Loss 1.0879 LearningRate 0.000003 Epoch: 37 Global Step: 785450 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:11:11,147-Speed 2496.67 samples/sec Loss 1.0704 LearningRate 0.000003 Epoch: 37 Global Step: 785460 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:11:19,299-Speed 2512.66 samples/sec Loss 1.0834 LearningRate 0.000003 Epoch: 37 Global Step: 785470 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-07-13 02:11:27,505-Speed 2496.11 samples/sec Loss 1.0684 LearningRate 0.000003 Epoch: 37 Global Step: 785480 Fp16 Grad Scale: 16384 
Required: 10 hours
Training: 2022-07-13 02:11:35,712-Speed 2495.80 samples/sec Loss 1.0798 LearningRate 0.000003 Epoch: 37 Global Step: 785490 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:11:43,920-Speed 2495.58 samples/sec Loss 1.0817 LearningRate 0.000003 Epoch: 37 Global Step: 785500 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:11:52,135-Speed 2493.58 samples/sec Loss 1.0753 LearningRate 0.000003 Epoch: 37 Global Step: 785510 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:12:00,337-Speed 2497.36 samples/sec Loss 1.0671 LearningRate 0.000003 Epoch: 37 Global Step: 785520 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:12:08,492-Speed 2511.57 samples/sec Loss 1.0825 LearningRate 0.000003 Epoch: 37 Global Step: 785530 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:12:16,702-Speed 2495.26 samples/sec Loss 1.0720 LearningRate 0.000003 Epoch: 37 Global Step: 785540 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:12:24,913-Speed 2494.58 samples/sec Loss 1.0683 LearningRate 0.000003 Epoch: 37 Global Step: 785550 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:12:33,116-Speed 2496.79 samples/sec Loss 1.0466 LearningRate 0.000003 Epoch: 37 Global Step: 785560 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:12:41,323-Speed 2496.11 samples/sec Loss 1.0585 LearningRate 0.000003 Epoch: 37 Global Step: 785570 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:12:49,528-Speed 2496.41 samples/sec Loss 1.0558 LearningRate 0.000003 Epoch: 37 Global Step: 785580 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:12:57,689-Speed 2509.97 samples/sec Loss 1.0807 LearningRate 0.000003 Epoch: 37 Global Step: 785590 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:13:05,895-Speed 2496.04 samples/sec Loss 1.0798 LearningRate 0.000003 Epoch: 37 Global Step: 785600 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:13:14,101-Speed 2496.54 samples/sec Loss 1.0863 LearningRate 0.000003 Epoch: 37 Global Step: 785610 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:13:22,307-Speed 2496.01 samples/sec Loss 1.0788 LearningRate 0.000003 Epoch: 37 Global Step: 785620 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:13:30,511-Speed 2496.74 samples/sec Loss 1.0477 LearningRate 0.000003 Epoch: 37 Global Step: 785630 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:13:38,714-Speed 2496.98 samples/sec Loss 1.0618 LearningRate 0.000003 Epoch: 37 Global Step: 785640 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:13:46,870-Speed 2511.31 samples/sec Loss 1.0607 LearningRate 0.000003 Epoch: 37 Global Step: 785650 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:13:55,075-Speed 2496.49 samples/sec Loss 1.0655 LearningRate 0.000003 Epoch: 37 Global Step: 785660 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:14:03,286-Speed 2494.79 samples/sec Loss 1.0773 LearningRate 0.000003 Epoch: 37 Global Step: 785670 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:14:11,490-Speed 2496.52 samples/sec Loss 1.0712 LearningRate 0.000003 Epoch: 37 Global Step: 785680 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:14:19,700-Speed 2495.04 samples/sec Loss 1.1001 LearningRate 0.000003 Epoch: 37 Global Step: 785690 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:14:27,911-Speed 2494.62 samples/sec Loss 1.0851 LearningRate 0.000003 Epoch: 37 Global Step: 785700 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:14:36,062-Speed 2512.82 samples/sec Loss 1.0970 LearningRate 0.000003 Epoch: 37 Global Step: 785710 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:14:44,273-Speed 2494.63 samples/sec Loss 1.0796 LearningRate 0.000003 Epoch: 37 Global Step: 785720 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:14:52,477-Speed 2496.65 samples/sec Loss 1.0841 LearningRate 0.000003 Epoch: 37 Global Step: 785730 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:15:00,685-Speed 2495.87 samples/sec Loss 1.0629 LearningRate 0.000003 Epoch: 37 Global Step: 785740 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:15:08,892-Speed 2495.85 samples/sec Loss 1.0745 LearningRate 0.000003 Epoch: 37 Global Step: 785750 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:15:17,095-Speed 2496.79 samples/sec Loss 1.0725 LearningRate 0.000003 Epoch: 37 Global Step: 785760 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:15:25,248-Speed 2512.65 samples/sec Loss 1.0767 LearningRate 0.000003 Epoch: 37 Global Step: 785770 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:15:33,455-Speed 2495.79 samples/sec Loss 1.0867 LearningRate 0.000003 Epoch: 37 Global Step: 785780 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:15:41,659-Speed 2496.71 samples/sec Loss 1.0435 LearningRate 0.000003 Epoch: 37 Global Step: 785790 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:15:49,862-Speed 2497.15 samples/sec Loss 1.0944 LearningRate 0.000003 Epoch: 37 Global Step: 785800 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:15:58,065-Speed 2497.21 samples/sec Loss 1.0650 LearningRate 0.000003 Epoch: 37 Global Step: 785810 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:16:06,265-Speed 2497.77 samples/sec Loss 1.0673 LearningRate 0.000003 Epoch: 37 Global Step: 785820 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:16:14,414-Speed 2513.56 samples/sec Loss 1.0499 LearningRate 0.000003 Epoch: 37 Global Step: 785830 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:16:22,618-Speed 2496.92 samples/sec Loss 1.0616 LearningRate 0.000003 Epoch: 37 Global Step: 785840 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:16:30,819-Speed 2498.91 samples/sec Loss 1.0667 LearningRate 0.000003 Epoch: 37 Global Step: 785850 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:16:39,041-Speed 2491.13 samples/sec Loss 1.0497 LearningRate 0.000003 Epoch: 37 Global Step: 785860 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:16:47,249-Speed 2495.41 samples/sec Loss 1.0520 LearningRate 0.000003 Epoch: 37 Global Step: 785870 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:16:55,458-Speed 2495.17 samples/sec Loss 1.0833 LearningRate 0.000003 Epoch: 37 Global Step: 785880 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:17:03,607-Speed 2513.59 samples/sec Loss 1.0551 LearningRate 0.000003 Epoch: 37 Global Step: 785890 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:17:11,817-Speed 2494.81 samples/sec Loss 1.0554 LearningRate 0.000003 Epoch: 37 Global Step: 785900 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:17:20,023-Speed 2496.08 samples/sec Loss 1.0698 LearningRate 0.000003 Epoch: 37 Global Step: 785910 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:17:28,226-Speed 2497.08 samples/sec Loss 1.0938 LearningRate 0.000003 Epoch: 37 Global Step: 785920 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:17:36,430-Speed 2496.63 samples/sec Loss 1.0621 LearningRate 0.000003 Epoch: 37 Global Step: 785930 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:17:44,634-Speed 2496.77 samples/sec Loss 1.0764 LearningRate 0.000003 Epoch: 37 Global Step: 785940 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:17:52,787-Speed 2512.62 samples/sec Loss 1.0651 LearningRate 0.000003 Epoch: 37 Global Step: 785950 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:18:00,987-Speed 2497.90 samples/sec Loss 1.0598 LearningRate 0.000003 Epoch: 37 Global Step: 785960 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:18:09,189-Speed 2497.15 samples/sec Loss 1.0784 LearningRate 0.000003 Epoch: 37 Global Step: 785970 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:18:17,392-Speed 2497.11 samples/sec Loss 1.0694 LearningRate 0.000003 Epoch: 37 Global Step: 785980 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:18:25,597-Speed 2496.70 samples/sec Loss 1.0686 LearningRate 0.000003 Epoch: 37 Global Step: 785990 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:18:33,804-Speed 2495.84 samples/sec Loss 1.0867 LearningRate 0.000003 Epoch: 37 Global Step: 786000 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:18:41,956-Speed 2512.45 samples/sec Loss 1.0571 LearningRate 0.000003 Epoch: 37 Global Step: 786010 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:18:50,181-Speed 2490.37 samples/sec Loss 1.0744 LearningRate 0.000003 Epoch: 37 Global Step: 786020 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:18:58,383-Speed 2497.35 samples/sec Loss 1.0515 LearningRate 0.000003 Epoch: 37 Global Step: 786030 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-07-13 02:19:06,547-Speed 2509.07 samples/sec Loss 1.0691 LearningRate 0.000003 Epoch: 37 Global Step: 786040 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:19:14,764-Speed 2492.60 samples/sec Loss 1.0917 LearningRate 0.000003 Epoch: 37 Global Step: 786050 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:19:22,967-Speed 2497.24 samples/sec Loss 1.0834 LearningRate 0.000003 Epoch: 37 Global Step: 786060 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:19:31,124-Speed 2511.04 samples/sec Loss 1.1020 LearningRate 0.000003 Epoch: 37 Global Step: 786070 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:19:39,328-Speed 2496.76 samples/sec Loss 1.0700 LearningRate 0.000003 Epoch: 37 Global Step: 786080 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:19:47,530-Speed 2497.50 samples/sec Loss 1.0799 LearningRate 0.000003 Epoch: 37 Global Step: 786090 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:19:55,733-Speed 2497.06 samples/sec Loss 1.0802 LearningRate 0.000003 Epoch: 37 Global Step: 786100 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:20:03,936-Speed 2497.08 samples/sec Loss 1.0278 LearningRate 0.000003 Epoch: 37 Global Step: 786110 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:20:12,141-Speed 2496.51 samples/sec Loss 1.1009 LearningRate 0.000003 Epoch: 37 Global Step: 786120 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:20:20,294-Speed 2512.37 samples/sec Loss 1.0753 LearningRate 0.000003 Epoch: 37 Global Step: 786130 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:20:28,496-Speed 2497.27 samples/sec Loss 1.0877 LearningRate 0.000003 Epoch: 37 Global Step: 786140 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:20:36,735-Speed 2486.01 samples/sec Loss 1.0868 LearningRate 0.000003 Epoch: 37 Global Step: 786150 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:20:44,943-Speed 2495.57 samples/sec Loss 1.0604 LearningRate 0.000003 Epoch: 37 Global Step: 786160 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:20:53,145-Speed 2497.23 samples/sec Loss 1.0629 LearningRate 0.000003 Epoch: 37 Global Step: 786170 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:21:01,354-Speed 2495.05 samples/sec Loss 1.0537 LearningRate 0.000003 Epoch: 37 Global Step: 786180 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:21:09,504-Speed 2513.65 samples/sec Loss 1.0748 LearningRate 0.000003 Epoch: 37 Global Step: 786190 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:21:17,713-Speed 2495.47 samples/sec Loss 1.0492 LearningRate 0.000003 Epoch: 37 Global Step: 786200 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:21:25,917-Speed 2496.47 samples/sec Loss 1.0485 LearningRate 0.000003 Epoch: 37 Global Step: 786210 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:21:34,120-Speed 2497.12 samples/sec Loss 1.0687 LearningRate 0.000003 Epoch: 37 Global Step: 786220 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:21:42,325-Speed 2496.50 samples/sec Loss 1.0653 LearningRate 0.000003 Epoch: 37 Global Step: 786230 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:21:50,534-Speed 2495.24 samples/sec Loss 1.0563 LearningRate 0.000003 Epoch: 37 Global Step: 786240 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:21:58,686-Speed 2512.79 samples/sec Loss 1.0625 LearningRate 0.000003 Epoch: 37 Global Step: 786250 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:22:06,894-Speed 2495.45 samples/sec Loss 1.1040 LearningRate 0.000003 Epoch: 37 Global Step: 786260 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:22:15,112-Speed 2492.71 samples/sec Loss 1.0760 LearningRate 0.000003 Epoch: 37 Global Step: 786270 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:22:23,318-Speed 2496.04 samples/sec Loss 1.0665 LearningRate 0.000003 Epoch: 37 Global Step: 786280 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:22:31,537-Speed 2492.21 samples/sec Loss 1.0961 LearningRate 0.000003 Epoch: 37 Global Step: 786290 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:22:39,740-Speed 2496.94 samples/sec Loss 1.0593 LearningRate 0.000003 Epoch: 37 Global Step: 786300 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:22:47,894-Speed 2512.21 samples/sec Loss 1.0726 LearningRate 0.000003 Epoch: 37 Global Step: 786310 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:22:56,100-Speed 2495.89 samples/sec Loss 1.0781 LearningRate 0.000003 Epoch: 37 Global Step: 786320 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:23:04,307-Speed 2495.84 samples/sec Loss 1.0466 LearningRate 0.000003 Epoch: 37 Global Step: 786330 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:23:12,515-Speed 2495.59 samples/sec Loss 1.0598 LearningRate 0.000003 Epoch: 37 Global Step: 786340 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:23:20,721-Speed 2496.08 samples/sec Loss 1.0349 LearningRate 0.000003 Epoch: 37 Global Step: 786350 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:23:28,938-Speed 2492.60 samples/sec Loss 1.0428 LearningRate 0.000003 Epoch: 37 Global Step: 786360 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:23:37,087-Speed 2514.12 samples/sec Loss 1.0726 LearningRate 0.000003 Epoch: 37 Global Step: 786370 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:23:45,305-Speed 2492.42 samples/sec Loss 1.0958 LearningRate 0.000003 Epoch: 37 Global Step: 786380 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:23:53,512-Speed 2495.74 samples/sec Loss 1.0689 LearningRate 0.000003 Epoch: 37 Global Step: 786390 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:24:01,720-Speed 2495.65 samples/sec Loss 1.0625 LearningRate 0.000003 Epoch: 37 Global Step: 786400 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:24:09,939-Speed 2492.27 samples/sec Loss 1.0697 LearningRate 0.000003 Epoch: 37 Global Step: 786410 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:24:18,142-Speed 2496.93 samples/sec Loss 1.1005 LearningRate 0.000003 Epoch: 37 Global Step: 786420 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:24:26,304-Speed 2509.61 samples/sec Loss 1.0700 LearningRate 0.000003 Epoch: 37 Global Step: 786430 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:24:34,510-Speed 2496.31 samples/sec Loss 1.0565 LearningRate 0.000003 Epoch: 37 Global Step: 786440 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:24:42,717-Speed 2495.76 samples/sec Loss 1.0538 LearningRate 0.000003 Epoch: 37 Global Step: 786450 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:24:50,920-Speed 2496.89 samples/sec Loss 1.0946 LearningRate 0.000003 Epoch: 37 Global Step: 786460 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:24:59,126-Speed 2496.31 samples/sec Loss 1.0539 LearningRate 0.000003 Epoch: 37 Global Step: 786470 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:25:07,354-Speed 2489.33 samples/sec Loss 1.0838 LearningRate 0.000003 Epoch: 37 Global Step: 786480 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:25:15,515-Speed 2510.09 samples/sec Loss 1.0629 LearningRate 0.000003 Epoch: 37 Global Step: 786490 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:25:23,727-Speed 2494.21 samples/sec Loss 1.0627 LearningRate 0.000003 Epoch: 37 Global Step: 786500 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:25:31,932-Speed 2496.55 samples/sec Loss 1.0951 LearningRate 0.000003 Epoch: 37 Global Step: 786510 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:25:40,136-Speed 2496.85 samples/sec Loss 1.0747 LearningRate 0.000003 Epoch: 37 Global Step: 786520 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:25:48,354-Speed 2492.58 samples/sec Loss 1.0716 LearningRate 0.000003 Epoch: 37 Global Step: 786530 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:25:56,559-Speed 2496.41 samples/sec Loss 1.0704 LearningRate 0.000003 Epoch: 37 Global Step: 786540 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:26:04,713-Speed 2512.03 samples/sec Loss 1.0584 LearningRate 0.000003 Epoch: 37 Global Step: 786550 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:26:12,915-Speed 2497.37 samples/sec Loss 1.0687 LearningRate 0.000003 Epoch: 37 Global Step: 786560 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:26:21,119-Speed 2496.56 samples/sec Loss 1.1032 LearningRate 0.000003 Epoch: 37 Global Step: 786570 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:26:29,324-Speed 2496.76 samples/sec Loss 1.0892 LearningRate 0.000003 Epoch: 37 Global Step: 786580 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:26:37,536-Speed 2494.06 samples/sec Loss 1.0634 LearningRate 0.000003 Epoch: 37 Global Step: 786590 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:26:45,741-Speed 2496.50 samples/sec Loss 1.0837 LearningRate 0.000003 Epoch: 37 Global Step: 786600 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:26:53,898-Speed 2511.52 samples/sec Loss 1.0875 LearningRate 0.000003 Epoch: 37 Global Step: 786610 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:27:02,103-Speed 2496.46 samples/sec Loss 1.0519 LearningRate 0.000003 Epoch: 37 Global Step: 786620 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:27:10,310-Speed 2495.80 samples/sec Loss 1.0915 LearningRate 0.000003 Epoch: 37 Global Step: 786630 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:27:18,516-Speed 2496.12 samples/sec Loss 1.0624 LearningRate 0.000003 Epoch: 37 Global Step: 786640 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:27:26,722-Speed 2496.14 samples/sec Loss 1.0859 LearningRate 0.000003 Epoch: 37 Global Step: 786650 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:27:34,931-Speed 2495.21 samples/sec Loss 1.0760 LearningRate 0.000003 Epoch: 37 Global Step: 786660 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:27:43,081-Speed 2513.66 samples/sec Loss 1.0572 LearningRate 0.000003 Epoch: 37 Global Step: 786670 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:27:51,282-Speed 2497.53 samples/sec Loss 1.1005 LearningRate 0.000003 Epoch: 37 Global Step: 786680 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:27:59,488-Speed 2496.19 samples/sec Loss 1.0857 LearningRate 0.000003 Epoch: 37 Global Step: 786690 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:28:07,694-Speed 2496.08 samples/sec Loss 1.0591 LearningRate 0.000003 Epoch: 37 Global Step: 786700 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:28:15,900-Speed 2495.97 samples/sec Loss 1.0833 LearningRate 0.000003 Epoch: 37 Global Step: 786710 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:28:24,112-Speed 2494.39 samples/sec Loss 1.0733 LearningRate 0.000003 Epoch: 37 Global Step: 786720 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:28:32,266-Speed 2512.15 samples/sec Loss 1.0802 LearningRate 0.000003 Epoch: 37 Global Step: 786730 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:28:40,470-Speed 2496.53 samples/sec Loss 1.0205 LearningRate 0.000003 Epoch: 37 Global Step: 786740 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:28:48,674-Speed 2496.77 samples/sec Loss 1.0803 LearningRate 0.000003 Epoch: 37 Global Step: 786750 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:28:56,877-Speed 2497.17 samples/sec Loss 1.0405 LearningRate 0.000003 Epoch: 37 Global Step: 786760 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:29:05,079-Speed 2497.28 samples/sec Loss 1.0837 LearningRate 0.000003 Epoch: 37 Global Step: 786770 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:29:13,281-Speed 2497.35 samples/sec Loss 1.0732 LearningRate 0.000003 Epoch: 37 Global Step: 786780 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:29:21,447-Speed 2508.26 samples/sec Loss 1.0732 LearningRate 0.000003 Epoch: 37 Global Step: 786790 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:29:29,656-Speed 2495.20 samples/sec Loss 1.0727 LearningRate 0.000003 Epoch: 37 Global Step: 786800 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:29:37,863-Speed 2495.85 samples/sec Loss 1.0394 LearningRate 0.000003 Epoch: 37 Global Step: 786810 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:29:46,087-Speed 2490.71 samples/sec Loss 1.0678 LearningRate 0.000003 Epoch: 37 Global Step: 786820 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:29:54,318-Speed 2488.54 samples/sec Loss 1.0746 LearningRate 0.000003 Epoch: 37 Global Step: 786830 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:30:02,528-Speed 2494.87 samples/sec Loss 1.0781 LearningRate 0.000003 Epoch: 37 Global Step: 786840 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:30:10,694-Speed 2509.37 samples/sec Loss 1.0775 LearningRate 0.000003 Epoch: 37 Global Step: 786850 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:30:18,899-Speed 2496.26 samples/sec Loss 1.0734 LearningRate 0.000003 Epoch: 37 Global Step: 786860 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:30:27,103-Speed 2496.85 samples/sec Loss 1.0872 LearningRate 0.000003 Epoch: 37 Global Step: 786870 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:30:35,304-Speed 2497.84 samples/sec Loss 1.0563 LearningRate 0.000003 Epoch: 37 Global Step: 786880 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:30:43,519-Speed 2493.50 samples/sec Loss 1.0786 LearningRate 0.000003 Epoch: 37 Global Step: 786890 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:30:51,738-Speed 2491.87 samples/sec Loss 1.0633 LearningRate 0.000003 Epoch: 37 Global Step: 786900 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:30:59,897-Speed 2510.71 samples/sec Loss 1.0386 LearningRate 0.000003 Epoch: 37 Global Step: 786910 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:31:08,100-Speed 2496.93 samples/sec Loss 1.0624 LearningRate 0.000003 Epoch: 37 Global Step: 786920 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:31:16,308-Speed 2495.70 samples/sec Loss 1.0445 LearningRate 0.000003 Epoch: 37 Global Step: 786930 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:31:24,511-Speed 2497.22 samples/sec Loss 1.0612 LearningRate 0.000003 Epoch: 37 Global Step: 786940 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:31:32,733-Speed 2491.04 samples/sec Loss 1.0786 LearningRate 0.000003 Epoch: 37 Global Step: 786950 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:31:40,939-Speed 2496.35 samples/sec Loss 1.0678 LearningRate 0.000003 Epoch: 37 Global Step: 786960 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:31:49,090-Speed 2513.02 samples/sec Loss 1.0452 LearningRate 0.000003 Epoch: 37 Global Step: 786970 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:31:57,294-Speed 2496.62 samples/sec Loss 1.0895 LearningRate 0.000003 Epoch: 37 Global Step: 786980 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:32:05,497-Speed 2497.10 samples/sec Loss 1.0524 LearningRate 0.000003 Epoch: 37 Global Step: 786990 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:32:13,700-Speed 2497.00 samples/sec Loss 1.0812 LearningRate 0.000003 Epoch: 37 Global Step: 787000 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:32:21,905-Speed 2496.51 samples/sec Loss 1.0571 LearningRate 0.000003 Epoch: 37 Global Step: 787010 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:32:30,109-Speed 2496.83 samples/sec Loss 1.0515 LearningRate 0.000003 Epoch: 37 Global Step: 787020 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:32:38,261-Speed 2512.64 samples/sec Loss 1.0624 LearningRate 0.000003 Epoch: 37 Global Step: 787030 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-07-13 02:32:46,420-Speed 2510.55 samples/sec Loss 1.0560 LearningRate 0.000003 Epoch: 37 Global Step: 787040 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:32:54,628-Speed 2495.66 samples/sec Loss 1.0635 LearningRate 0.000003 Epoch: 37 Global Step: 787050 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:33:02,844-Speed 2493.16 samples/sec Loss 1.0919 LearningRate 0.000003 Epoch: 37 Global Step: 787060 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:33:11,061-Speed 2492.60 samples/sec Loss 1.0951 LearningRate 0.000003 Epoch: 37 Global Step: 787070 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:33:19,262-Speed 2497.99 samples/sec Loss 1.0998 LearningRate 0.000003 Epoch: 37 Global Step: 787080 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:33:27,411-Speed 2513.28 samples/sec Loss 1.0656 LearningRate 0.000003 Epoch: 37 Global Step: 787090 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:33:35,614-Speed 2497.08 samples/sec Loss 1.0577 LearningRate 0.000003 Epoch: 37 Global Step: 787100 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:33:43,817-Speed 2496.92 samples/sec Loss 1.0795 LearningRate 0.000003 Epoch: 37 Global Step: 787110 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:33:52,029-Speed 2494.26 samples/sec Loss 1.1065 LearningRate 0.000003 Epoch: 37 Global Step: 787120 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:34:00,236-Speed 2495.86 samples/sec Loss 1.0506 LearningRate 0.000003 Epoch: 37 Global Step: 787130 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:34:08,438-Speed 2497.31 samples/sec Loss 1.0764 LearningRate 0.000003 Epoch: 37 Global Step: 787140 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:34:16,589-Speed 2512.92 samples/sec Loss 1.0610 LearningRate 0.000003 Epoch: 37 Global Step: 787150 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:34:24,801-Speed 2494.57 samples/sec Loss 1.0908 LearningRate 0.000003 Epoch: 37 Global Step: 787160 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:34:33,008-Speed 2495.60 samples/sec Loss 1.0456 LearningRate 0.000003 Epoch: 37 Global Step: 787170 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:34:41,212-Speed 2496.68 samples/sec Loss 1.0822 LearningRate 0.000003 Epoch: 37 Global Step: 787180 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:34:49,416-Speed 2496.85 samples/sec Loss 1.0250 LearningRate 0.000003 Epoch: 37 Global Step: 787190 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:34:57,619-Speed 2496.83 samples/sec Loss 1.0724 LearningRate 0.000003 Epoch: 37 Global Step: 787200 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:35:05,770-Speed 2513.17 samples/sec Loss 1.0614 LearningRate 0.000003 Epoch: 37 Global Step: 787210 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:35:13,974-Speed 2496.55 samples/sec Loss 1.0789 LearningRate 0.000003 Epoch: 37 Global Step: 787220 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:35:22,189-Speed 2493.71 samples/sec Loss 1.0854 LearningRate 0.000003 Epoch: 37 Global Step: 787230 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:35:30,392-Speed 2496.86 samples/sec Loss 1.0531 LearningRate 0.000003 Epoch: 37 Global Step: 787240 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:35:38,596-Speed 2496.92 samples/sec Loss 1.0955 LearningRate 0.000003 Epoch: 37 Global Step: 787250 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:35:46,802-Speed 2496.23 samples/sec Loss 1.0707 LearningRate 0.000003 Epoch: 37 Global Step: 787260 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:35:54,952-Speed 2513.12 samples/sec Loss 1.0578 LearningRate 0.000003 Epoch: 37 Global Step: 787270 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:36:03,158-Speed 2496.08 samples/sec Loss 1.0824 LearningRate 0.000003 Epoch: 37 Global Step: 787280 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:36:11,372-Speed 2493.82 samples/sec Loss 1.1067 LearningRate 0.000003 Epoch: 37 Global Step: 787290 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:36:19,577-Speed 2496.44 samples/sec Loss 1.0620 LearningRate 0.000003 Epoch: 37 Global Step: 787300 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:36:27,786-Speed 2495.34 samples/sec Loss 1.0395 LearningRate 0.000003 Epoch: 37 Global Step: 787310 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:36:35,992-Speed 2496.04 samples/sec Loss 1.0936 LearningRate 0.000003 Epoch: 37 Global Step: 787320 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:36:44,170-Speed 2504.35 samples/sec Loss 1.0489 LearningRate 0.000003 Epoch: 37 Global Step: 787330 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:36:52,377-Speed 2495.84 samples/sec Loss 1.0647 LearningRate 0.000003 Epoch: 37 Global Step: 787340 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:37:00,586-Speed 2495.44 samples/sec Loss 1.0791 LearningRate 0.000003 Epoch: 37 Global Step: 787350 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:37:08,790-Speed 2496.77 samples/sec Loss 1.0514 LearningRate 0.000003 Epoch: 37 Global Step: 787360 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:37:16,993-Speed 2497.08 samples/sec Loss 1.0723 LearningRate 0.000003 Epoch: 37 Global Step: 787370 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:37:25,200-Speed 2495.54 samples/sec Loss 1.0706 LearningRate 0.000003 Epoch: 37 Global Step: 787380 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:37:33,355-Speed 2511.95 samples/sec Loss 1.0948 LearningRate 0.000003 Epoch: 37 Global Step: 787390 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:37:41,562-Speed 2496.00 samples/sec Loss 1.0515 LearningRate 0.000003 Epoch: 37 Global Step: 787400 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:37:49,764-Speed 2497.34 samples/sec Loss 1.0652 LearningRate 0.000003 Epoch: 37 Global Step: 787410 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:37:57,974-Speed 2495.09 samples/sec Loss 1.0638 LearningRate 0.000003 Epoch: 37 Global Step: 787420 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:38:06,179-Speed 2496.52 samples/sec Loss 1.0665 LearningRate 0.000003 Epoch: 37 Global Step: 787430 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:38:14,380-Speed 2497.32 samples/sec Loss 1.0903 LearningRate 0.000003 Epoch: 37 Global Step: 787440 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:38:22,531-Speed 2513.13 samples/sec Loss 1.0888 LearningRate 0.000003 Epoch: 37 Global Step: 787450 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:38:30,737-Speed 2496.25 samples/sec Loss 1.0598 LearningRate 0.000003 Epoch: 37 Global Step: 787460 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:38:38,952-Speed 2493.47 samples/sec Loss 1.0667 LearningRate 0.000003 Epoch: 37 Global Step: 787470 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:38:47,158-Speed 2495.96 samples/sec Loss 1.0965 LearningRate 0.000003 Epoch: 37 Global Step: 787480 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:38:55,365-Speed 2496.06 samples/sec Loss 1.0682 LearningRate 0.000003 Epoch: 37 Global Step: 787490 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:39:03,568-Speed 2496.98 samples/sec Loss 1.0803 LearningRate 0.000003 Epoch: 37 Global Step: 787500 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:39:11,718-Speed 2513.61 samples/sec Loss 1.0649 LearningRate 0.000003 Epoch: 37 Global Step: 787510 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:39:19,920-Speed 2497.23 samples/sec Loss 1.0880 LearningRate 0.000003 Epoch: 37 Global Step: 787520 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:39:28,121-Speed 2497.69 samples/sec Loss 1.0908 LearningRate 0.000003 Epoch: 37 Global Step: 787530 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:39:36,326-Speed 2496.56 samples/sec Loss 1.0699 LearningRate 0.000003 Epoch: 37 Global Step: 787540 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:39:44,532-Speed 2495.98 samples/sec Loss 1.0674 LearningRate 0.000003 Epoch: 37 Global Step: 787550 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:39:52,735-Speed 2496.94 samples/sec Loss 1.1068 LearningRate 0.000003 Epoch: 37 Global Step: 787560 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:40:00,892-Speed 2511.34 samples/sec Loss 1.0822 LearningRate 0.000003 Epoch: 37 Global Step: 787570 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:40:09,100-Speed 2495.34 samples/sec Loss 1.0893 LearningRate 0.000003 Epoch: 37 Global Step: 787580 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:40:17,307-Speed 2496.05 samples/sec Loss 1.0846 LearningRate 0.000003 Epoch: 37 Global Step: 787590 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:40:25,512-Speed 2496.21 samples/sec Loss 1.0719 LearningRate 0.000003 Epoch: 37 Global Step: 787600 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:40:33,725-Speed 2494.22 samples/sec Loss 1.0911 LearningRate 0.000003 Epoch: 37 Global Step: 787610 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:40:41,935-Speed 2495.23 samples/sec Loss 1.0777 LearningRate 0.000003 Epoch: 37 Global Step: 787620 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:40:50,101-Speed 2508.43 samples/sec Loss 1.0845 LearningRate 0.000003 Epoch: 37 Global Step: 787630 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:40:58,319-Speed 2492.42 samples/sec Loss 1.0652 LearningRate 0.000003 Epoch: 37 Global Step: 787640 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:41:06,526-Speed 2495.59 samples/sec Loss 1.0992 LearningRate 0.000003 Epoch: 37 Global Step: 787650 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:41:14,740-Speed 2493.73 samples/sec Loss 1.0838 LearningRate 0.000003 Epoch: 37 Global Step: 787660 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:41:22,947-Speed 2495.49 samples/sec Loss 1.0668 LearningRate 0.000003 Epoch: 37 Global Step: 787670 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:41:31,155-Speed 2495.62 samples/sec Loss 1.0880 LearningRate 0.000003 Epoch: 37 Global Step: 787680 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:41:39,320-Speed 2508.81 samples/sec Loss 1.1001 LearningRate 0.000003 Epoch: 37 Global Step: 787690 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:41:47,528-Speed 2495.61 samples/sec Loss 1.0752 LearningRate 0.000003 Epoch: 37 Global Step: 787700 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:41:55,735-Speed 2495.46 samples/sec Loss 1.0952 LearningRate 0.000003 Epoch: 37 Global Step: 787710 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:42:03,942-Speed 2495.79 samples/sec Loss 1.0887 LearningRate 0.000003 Epoch: 37 Global Step: 787720 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:42:12,150-Speed 2495.72 samples/sec Loss 1.0866 LearningRate 0.000003 Epoch: 37 Global Step: 787730 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:42:20,357-Speed 2495.98 samples/sec Loss 1.0580 LearningRate 0.000003 Epoch: 37 Global Step: 787740 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:42:28,510-Speed 2512.43 samples/sec Loss 1.0814 LearningRate 0.000003 Epoch: 37 Global Step: 787750 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:42:36,723-Speed 2493.76 samples/sec Loss 1.0811 LearningRate 0.000003 Epoch: 37 Global Step: 787760 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:42:44,933-Speed 2494.88 samples/sec Loss 1.0747 LearningRate 0.000003 Epoch: 37 Global Step: 787770 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:42:53,143-Speed 2495.21 samples/sec Loss 1.0794 LearningRate 0.000003 Epoch: 37 Global Step: 787780 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:43:01,346-Speed 2496.89 samples/sec Loss 1.0997 LearningRate 0.000003 Epoch: 37 Global Step: 787790 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:43:09,558-Speed 2494.42 samples/sec Loss 1.0635 LearningRate 0.000003 Epoch: 37 Global Step: 787800 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:43:17,725-Speed 2508.01 samples/sec Loss 1.0823 LearningRate 0.000003 Epoch: 37 Global Step: 787810 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:43:25,933-Speed 2495.46 samples/sec Loss 1.0859 LearningRate 0.000003 Epoch: 37 Global Step: 787820 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:43:34,139-Speed 2496.26 samples/sec Loss 1.0443 LearningRate 0.000003 Epoch: 37 Global Step: 787830 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:43:42,344-Speed 2496.45 samples/sec Loss 1.1039 LearningRate 0.000003 Epoch: 37 Global Step: 787840 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:43:50,546-Speed 2497.40 samples/sec Loss 1.0739 LearningRate 0.000003 Epoch: 37 Global Step: 787850 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:43:58,752-Speed 2496.18 samples/sec Loss 1.0829 LearningRate 0.000003 Epoch: 37 Global Step: 787860 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:44:06,908-Speed 2511.41 samples/sec Loss 1.0625 LearningRate 0.000003 Epoch: 37 Global Step: 787870 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:44:15,113-Speed 2496.19 samples/sec Loss 1.0379 LearningRate 0.000003 Epoch: 37 Global Step: 787880 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:44:23,316-Speed 2497.34 samples/sec Loss 1.0586 LearningRate 0.000003 Epoch: 37 Global Step: 787890 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:44:31,519-Speed 2496.91 samples/sec Loss 1.0791 LearningRate 0.000003 Epoch: 37 Global Step: 787900 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:44:39,724-Speed 2496.41 samples/sec Loss 1.0703 LearningRate 0.000003 Epoch: 37 Global Step: 787910 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:44:47,930-Speed 2496.05 samples/sec Loss 1.0855 LearningRate 0.000003 Epoch: 37 Global Step: 787920 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:44:56,080-Speed 2513.61 samples/sec Loss 1.0591 LearningRate 0.000003 Epoch: 37 Global Step: 787930 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:45:04,287-Speed 2495.72 samples/sec Loss 1.1018 LearningRate 0.000003 Epoch: 37 Global Step: 787940 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:45:12,496-Speed 2495.15 samples/sec Loss 1.0439 LearningRate 0.000003 Epoch: 37 Global Step: 787950 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:45:20,699-Speed 2497.17 samples/sec Loss 1.1012 LearningRate 0.000003 Epoch: 37 Global Step: 787960 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:45:28,906-Speed 2495.57 samples/sec Loss 1.1022 LearningRate 0.000003 Epoch: 37 Global Step: 787970 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:45:37,113-Speed 2495.91 samples/sec Loss 1.0730 LearningRate 0.000003 Epoch: 37 Global Step: 787980 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:45:45,266-Speed 2512.30 samples/sec Loss 1.0505 LearningRate 0.000003 Epoch: 37 Global Step: 787990 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:45:53,477-Speed 2495.10 samples/sec Loss 1.0573 LearningRate 0.000003 Epoch: 37 Global Step: 788000 Fp16 Grad Scale: 8192 Required: 10 hours
Training: 2022-07-13 02:46:01,682-Speed 2496.19 samples/sec Loss 1.0893 LearningRate 0.000003 Epoch: 37
Global Step: 788010 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 02:46:09,897-Speed 2493.46 samples/sec Loss 1.0719 LearningRate 0.000003 Epoch: 37 Global Step: 788020 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 02:46:18,100-Speed 2496.93 samples/sec Loss 1.0733 LearningRate 0.000003 Epoch: 37 Global Step: 788030 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 02:46:26,306-Speed 2496.12 samples/sec Loss 1.0221 LearningRate 0.000003 Epoch: 37 Global Step: 788040 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-07-13 02:46:34,461-Speed 2512.00 samples/sec Loss 1.0593 LearningRate 0.000003 Epoch: 37 Global Step: 788050 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:46:42,664-Speed 2497.04 samples/sec Loss 1.0654 LearningRate 0.000003 Epoch: 37 Global Step: 788060 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:46:50,871-Speed 2495.93 samples/sec Loss 1.0952 LearningRate 0.000003 Epoch: 37 Global Step: 788070 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:46:59,079-Speed 2495.43 samples/sec Loss 1.0773 LearningRate 0.000003 Epoch: 37 Global Step: 788080 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:47:07,284-Speed 2496.39 samples/sec Loss 1.1022 LearningRate 0.000003 Epoch: 37 Global Step: 788090 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:47:15,491-Speed 2495.84 samples/sec Loss 1.0860 LearningRate 0.000003 Epoch: 37 Global Step: 788100 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:47:25,993-Speed 1950.54 samples/sec Loss 1.0895 LearningRate 0.000003 Epoch: 38 Global Step: 788110 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:47:34,196-Speed 2497.08 samples/sec Loss 1.0864 LearningRate 0.000003 Epoch: 38 Global Step: 788120 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:47:42,403-Speed 2495.85 samples/sec Loss 1.0585 LearningRate 0.000003 Epoch: 38 Global Step: 788130 Fp16 
Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:47:50,610-Speed 2495.65 samples/sec Loss 1.0884 LearningRate 0.000003 Epoch: 38 Global Step: 788140 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:47:58,810-Speed 2497.98 samples/sec Loss 1.0558 LearningRate 0.000003 Epoch: 38 Global Step: 788150 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:48:07,014-Speed 2496.93 samples/sec Loss 1.0941 LearningRate 0.000003 Epoch: 38 Global Step: 788160 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:48:15,168-Speed 2511.97 samples/sec Loss 1.0773 LearningRate 0.000003 Epoch: 38 Global Step: 788170 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:48:23,372-Speed 2496.64 samples/sec Loss 1.0578 LearningRate 0.000003 Epoch: 38 Global Step: 788180 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:48:31,575-Speed 2496.98 samples/sec Loss 1.0525 LearningRate 0.000003 Epoch: 38 Global Step: 788190 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:48:39,784-Speed 2495.04 samples/sec Loss 1.0506 LearningRate 0.000003 Epoch: 38 Global Step: 788200 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:48:47,990-Speed 2496.40 samples/sec Loss 1.0926 LearningRate 0.000003 Epoch: 38 Global Step: 788210 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:48:56,195-Speed 2496.35 samples/sec Loss 1.0734 LearningRate 0.000003 Epoch: 38 Global Step: 788220 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:49:04,346-Speed 2513.05 samples/sec Loss 1.0546 LearningRate 0.000003 Epoch: 38 Global Step: 788230 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:49:12,551-Speed 2496.47 samples/sec Loss 1.0600 LearningRate 0.000003 Epoch: 38 Global Step: 788240 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:49:20,758-Speed 2495.72 samples/sec Loss 1.0980 LearningRate 0.000003 Epoch: 38 Global Step: 788250 Fp16 Grad Scale: 16384 Required: 9 
hours Training: 2022-07-13 02:49:28,966-Speed 2495.55 samples/sec Loss 1.0523 LearningRate 0.000003 Epoch: 38 Global Step: 788260 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:49:37,180-Speed 2493.63 samples/sec Loss 1.0733 LearningRate 0.000003 Epoch: 38 Global Step: 788270 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:49:45,386-Speed 2496.41 samples/sec Loss 1.0808 LearningRate 0.000003 Epoch: 38 Global Step: 788280 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:49:53,538-Speed 2512.52 samples/sec Loss 1.0589 LearningRate 0.000003 Epoch: 38 Global Step: 788290 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:50:01,742-Speed 2496.81 samples/sec Loss 1.0963 LearningRate 0.000003 Epoch: 38 Global Step: 788300 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:50:09,946-Speed 2496.82 samples/sec Loss 1.0746 LearningRate 0.000003 Epoch: 38 Global Step: 788310 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:50:18,161-Speed 2493.68 samples/sec Loss 1.0749 LearningRate 0.000003 Epoch: 38 Global Step: 788320 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:50:26,373-Speed 2494.23 samples/sec Loss 1.0725 LearningRate 0.000003 Epoch: 38 Global Step: 788330 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:50:34,583-Speed 2495.35 samples/sec Loss 1.0645 LearningRate 0.000003 Epoch: 38 Global Step: 788340 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:50:42,733-Speed 2513.14 samples/sec Loss 1.0621 LearningRate 0.000003 Epoch: 38 Global Step: 788350 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:50:50,941-Speed 2495.59 samples/sec Loss 1.0614 LearningRate 0.000003 Epoch: 38 Global Step: 788360 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:50:59,144-Speed 2497.28 samples/sec Loss 1.0324 LearningRate 0.000003 Epoch: 38 Global Step: 788370 Fp16 Grad Scale: 16384 Required: 9 hours Training: 
2022-07-13 02:51:07,352-Speed 2495.36 samples/sec Loss 1.0671 LearningRate 0.000003 Epoch: 38 Global Step: 788380 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:51:15,559-Speed 2495.94 samples/sec Loss 1.0579 LearningRate 0.000003 Epoch: 38 Global Step: 788390 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:51:23,762-Speed 2497.13 samples/sec Loss 1.0606 LearningRate 0.000003 Epoch: 38 Global Step: 788400 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:51:31,916-Speed 2512.12 samples/sec Loss 1.0357 LearningRate 0.000003 Epoch: 38 Global Step: 788410 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:51:40,120-Speed 2496.50 samples/sec Loss 1.0371 LearningRate 0.000003 Epoch: 38 Global Step: 788420 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:51:48,337-Speed 2493.22 samples/sec Loss 1.0290 LearningRate 0.000003 Epoch: 38 Global Step: 788430 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:51:56,554-Speed 2492.89 samples/sec Loss 1.0779 LearningRate 0.000003 Epoch: 38 Global Step: 788440 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:52:04,762-Speed 2495.57 samples/sec Loss 1.0769 LearningRate 0.000003 Epoch: 38 Global Step: 788450 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:52:12,969-Speed 2495.76 samples/sec Loss 1.0358 LearningRate 0.000003 Epoch: 38 Global Step: 788460 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:52:21,135-Speed 2508.23 samples/sec Loss 1.0650 LearningRate 0.000003 Epoch: 38 Global Step: 788470 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:52:29,340-Speed 2496.53 samples/sec Loss 1.0770 LearningRate 0.000003 Epoch: 38 Global Step: 788480 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:52:37,545-Speed 2496.54 samples/sec Loss 1.0705 LearningRate 0.000003 Epoch: 38 Global Step: 788490 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 
02:52:45,752-Speed 2495.80 samples/sec Loss 1.0442 LearningRate 0.000003 Epoch: 38 Global Step: 788500 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:52:53,961-Speed 2495.32 samples/sec Loss 1.1040 LearningRate 0.000003 Epoch: 38 Global Step: 788510 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:53:02,169-Speed 2495.36 samples/sec Loss 1.0948 LearningRate 0.000003 Epoch: 38 Global Step: 788520 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:53:10,323-Speed 2512.21 samples/sec Loss 1.0814 LearningRate 0.000003 Epoch: 38 Global Step: 788530 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:53:18,536-Speed 2493.98 samples/sec Loss 1.0770 LearningRate 0.000003 Epoch: 38 Global Step: 788540 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:53:26,747-Speed 2494.77 samples/sec Loss 1.0760 LearningRate 0.000003 Epoch: 38 Global Step: 788550 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:53:34,957-Speed 2494.87 samples/sec Loss 1.0690 LearningRate 0.000003 Epoch: 38 Global Step: 788560 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:53:43,161-Speed 2496.89 samples/sec Loss 1.0664 LearningRate 0.000003 Epoch: 38 Global Step: 788570 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:53:51,370-Speed 2495.46 samples/sec Loss 1.0801 LearningRate 0.000003 Epoch: 38 Global Step: 788580 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:53:59,527-Speed 2510.87 samples/sec Loss 1.0653 LearningRate 0.000003 Epoch: 38 Global Step: 788590 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:54:07,734-Speed 2495.84 samples/sec Loss 1.0479 LearningRate 0.000003 Epoch: 38 Global Step: 788600 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:54:15,947-Speed 2494.31 samples/sec Loss 1.0650 LearningRate 0.000003 Epoch: 38 Global Step: 788610 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:54:24,153-Speed 
2496.01 samples/sec Loss 1.0798 LearningRate 0.000003 Epoch: 38 Global Step: 788620 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:54:32,356-Speed 2496.80 samples/sec Loss 1.0848 LearningRate 0.000003 Epoch: 38 Global Step: 788630 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:54:40,563-Speed 2495.81 samples/sec Loss 1.0512 LearningRate 0.000003 Epoch: 38 Global Step: 788640 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:54:48,715-Speed 2512.78 samples/sec Loss 1.0294 LearningRate 0.000003 Epoch: 38 Global Step: 788650 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:54:56,922-Speed 2495.98 samples/sec Loss 1.0770 LearningRate 0.000003 Epoch: 38 Global Step: 788660 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:55:05,126-Speed 2496.74 samples/sec Loss 1.0462 LearningRate 0.000003 Epoch: 38 Global Step: 788670 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:55:13,333-Speed 2496.03 samples/sec Loss 1.0737 LearningRate 0.000003 Epoch: 38 Global Step: 788680 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:55:21,537-Speed 2496.90 samples/sec Loss 1.0615 LearningRate 0.000003 Epoch: 38 Global Step: 788690 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:55:29,746-Speed 2495.14 samples/sec Loss 1.0748 LearningRate 0.000003 Epoch: 38 Global Step: 788700 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:55:37,915-Speed 2507.62 samples/sec Loss 1.0801 LearningRate 0.000003 Epoch: 38 Global Step: 788710 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:55:46,119-Speed 2496.56 samples/sec Loss 1.0751 LearningRate 0.000003 Epoch: 38 Global Step: 788720 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:55:54,323-Speed 2496.67 samples/sec Loss 1.0695 LearningRate 0.000003 Epoch: 38 Global Step: 788730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:56:02,525-Speed 2497.25 samples/sec 
Loss 1.0668 LearningRate 0.000003 Epoch: 38 Global Step: 788740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 02:56:10,688-Speed 2509.82 samples/sec Loss 1.0504 LearningRate 0.000003 Epoch: 38 Global Step: 788750 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:56:18,896-Speed 2495.82 samples/sec Loss 1.0716 LearningRate 0.000003 Epoch: 38 Global Step: 788760 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:56:27,045-Speed 2513.40 samples/sec Loss 1.0654 LearningRate 0.000003 Epoch: 38 Global Step: 788770 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:56:35,259-Speed 2493.90 samples/sec Loss 1.0597 LearningRate 0.000003 Epoch: 38 Global Step: 788780 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:56:43,464-Speed 2496.39 samples/sec Loss 1.0756 LearningRate 0.000003 Epoch: 38 Global Step: 788790 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:56:51,668-Speed 2497.04 samples/sec Loss 1.1005 LearningRate 0.000003 Epoch: 38 Global Step: 788800 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:56:59,879-Speed 2494.53 samples/sec Loss 1.0750 LearningRate 0.000003 Epoch: 38 Global Step: 788810 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:57:08,105-Speed 2489.95 samples/sec Loss 1.0794 LearningRate 0.000003 Epoch: 38 Global Step: 788820 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:57:16,258-Speed 2512.38 samples/sec Loss 1.0390 LearningRate 0.000003 Epoch: 38 Global Step: 788830 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:57:24,464-Speed 2496.24 samples/sec Loss 1.0639 LearningRate 0.000003 Epoch: 38 Global Step: 788840 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:57:32,674-Speed 2494.95 samples/sec Loss 1.0794 LearningRate 0.000003 Epoch: 38 Global Step: 788850 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:57:40,881-Speed 2495.88 samples/sec Loss 1.0831 LearningRate 
0.000003 Epoch: 38 Global Step: 788860 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:57:49,089-Speed 2495.84 samples/sec Loss 1.0831 LearningRate 0.000003 Epoch: 38 Global Step: 788870 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:57:57,291-Speed 2497.20 samples/sec Loss 1.0957 LearningRate 0.000003 Epoch: 38 Global Step: 788880 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:58:05,446-Speed 2511.85 samples/sec Loss 1.0513 LearningRate 0.000003 Epoch: 38 Global Step: 788890 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:58:13,654-Speed 2495.45 samples/sec Loss 1.0818 LearningRate 0.000003 Epoch: 38 Global Step: 788900 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:58:21,865-Speed 2495.09 samples/sec Loss 1.0653 LearningRate 0.000003 Epoch: 38 Global Step: 788910 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:58:30,076-Speed 2494.75 samples/sec Loss 1.0583 LearningRate 0.000003 Epoch: 38 Global Step: 788920 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:58:38,276-Speed 2497.84 samples/sec Loss 1.0750 LearningRate 0.000003 Epoch: 38 Global Step: 788930 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:58:46,477-Speed 2497.73 samples/sec Loss 1.0748 LearningRate 0.000003 Epoch: 38 Global Step: 788940 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:58:54,626-Speed 2513.42 samples/sec Loss 1.0568 LearningRate 0.000003 Epoch: 38 Global Step: 788950 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:59:02,832-Speed 2496.39 samples/sec Loss 1.0981 LearningRate 0.000003 Epoch: 38 Global Step: 788960 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:59:11,046-Speed 2493.75 samples/sec Loss 1.1050 LearningRate 0.000003 Epoch: 38 Global Step: 788970 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:59:19,253-Speed 2495.85 samples/sec Loss 1.0892 LearningRate 0.000003 Epoch: 38 Global Step: 
788980 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:59:27,454-Speed 2497.73 samples/sec Loss 1.0790 LearningRate 0.000003 Epoch: 38 Global Step: 788990 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:59:35,666-Speed 2494.13 samples/sec Loss 1.0713 LearningRate 0.000003 Epoch: 38 Global Step: 789000 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:59:43,817-Speed 2513.16 samples/sec Loss 1.0967 LearningRate 0.000003 Epoch: 38 Global Step: 789010 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 02:59:52,042-Speed 2490.36 samples/sec Loss 1.0702 LearningRate 0.000003 Epoch: 38 Global Step: 789020 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:00:00,246-Speed 2496.76 samples/sec Loss 1.0779 LearningRate 0.000003 Epoch: 38 Global Step: 789030 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:00:08,454-Speed 2495.59 samples/sec Loss 1.0556 LearningRate 0.000003 Epoch: 38 Global Step: 789040 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:00:16,656-Speed 2497.28 samples/sec Loss 1.0778 LearningRate 0.000003 Epoch: 38 Global Step: 789050 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:00:24,861-Speed 2496.52 samples/sec Loss 1.0515 LearningRate 0.000003 Epoch: 38 Global Step: 789060 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:00:33,008-Speed 2514.24 samples/sec Loss 1.0472 LearningRate 0.000003 Epoch: 38 Global Step: 789070 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:00:41,213-Speed 2496.30 samples/sec Loss 1.0492 LearningRate 0.000003 Epoch: 38 Global Step: 789080 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:00:49,416-Speed 2497.06 samples/sec Loss 1.0599 LearningRate 0.000003 Epoch: 38 Global Step: 789090 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:00:57,619-Speed 2496.95 samples/sec Loss 1.0443 LearningRate 0.000003 Epoch: 38 Global Step: 789100 Fp16 Grad Scale: 8192 
Required: 9 hours Training: 2022-07-13 03:01:05,825-Speed 2496.39 samples/sec Loss 1.0562 LearningRate 0.000003 Epoch: 38 Global Step: 789110 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:01:14,032-Speed 2495.66 samples/sec Loss 1.1139 LearningRate 0.000003 Epoch: 38 Global Step: 789120 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:01:22,183-Speed 2513.20 samples/sec Loss 1.0866 LearningRate 0.000003 Epoch: 38 Global Step: 789130 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:01:30,386-Speed 2497.02 samples/sec Loss 1.0916 LearningRate 0.000003 Epoch: 38 Global Step: 789140 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:01:38,587-Speed 2497.67 samples/sec Loss 1.0901 LearningRate 0.000003 Epoch: 38 Global Step: 789150 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:01:46,809-Speed 2491.19 samples/sec Loss 1.0548 LearningRate 0.000003 Epoch: 38 Global Step: 789160 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:01:55,015-Speed 2496.27 samples/sec Loss 1.0500 LearningRate 0.000003 Epoch: 38 Global Step: 789170 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:02:03,216-Speed 2497.59 samples/sec Loss 1.0662 LearningRate 0.000003 Epoch: 38 Global Step: 789180 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:02:11,375-Speed 2510.70 samples/sec Loss 1.0638 LearningRate 0.000003 Epoch: 38 Global Step: 789190 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:02:19,586-Speed 2494.47 samples/sec Loss 1.0921 LearningRate 0.000003 Epoch: 38 Global Step: 789200 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:02:27,803-Speed 2493.00 samples/sec Loss 1.0474 LearningRate 0.000003 Epoch: 38 Global Step: 789210 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:02:36,013-Speed 2495.00 samples/sec Loss 1.0691 LearningRate 0.000003 Epoch: 38 Global Step: 789220 Fp16 Grad Scale: 8192 Required: 9 hours Training: 
2022-07-13 03:02:44,214-Speed 2497.61 samples/sec Loss 1.0575 LearningRate 0.000003 Epoch: 38 Global Step: 789230 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:02:52,424-Speed 2494.85 samples/sec Loss 1.0747 LearningRate 0.000003 Epoch: 38 Global Step: 789240 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:03:00,579-Speed 2512.08 samples/sec Loss 1.0717 LearningRate 0.000003 Epoch: 38 Global Step: 789250 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:03:08,786-Speed 2495.70 samples/sec Loss 1.0822 LearningRate 0.000003 Epoch: 38 Global Step: 789260 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:03:16,990-Speed 2496.61 samples/sec Loss 1.0766 LearningRate 0.000003 Epoch: 38 Global Step: 789270 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:03:25,197-Speed 2495.77 samples/sec Loss 1.0945 LearningRate 0.000003 Epoch: 38 Global Step: 789280 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:03:33,405-Speed 2495.79 samples/sec Loss 1.0721 LearningRate 0.000003 Epoch: 38 Global Step: 789290 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:03:41,612-Speed 2495.83 samples/sec Loss 1.0700 LearningRate 0.000003 Epoch: 38 Global Step: 789300 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:03:49,766-Speed 2512.09 samples/sec Loss 1.0491 LearningRate 0.000003 Epoch: 38 Global Step: 789310 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:03:57,981-Speed 2493.41 samples/sec Loss 1.0611 LearningRate 0.000003 Epoch: 38 Global Step: 789320 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:04:06,186-Speed 2496.47 samples/sec Loss 1.0660 LearningRate 0.000003 Epoch: 38 Global Step: 789330 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:04:14,389-Speed 2497.12 samples/sec Loss 1.0884 LearningRate 0.000003 Epoch: 38 Global Step: 789340 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:04:22,592-Speed 
2496.86 samples/sec Loss 1.0657 LearningRate 0.000003 Epoch: 38 Global Step: 789350 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:04:30,798-Speed 2496.21 samples/sec Loss 1.0691 LearningRate 0.000003 Epoch: 38 Global Step: 789360 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:04:38,950-Speed 2512.78 samples/sec Loss 1.0834 LearningRate 0.000003 Epoch: 38 Global Step: 789370 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:04:47,152-Speed 2497.29 samples/sec Loss 1.0392 LearningRate 0.000003 Epoch: 38 Global Step: 789380 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:04:55,372-Speed 2491.84 samples/sec Loss 1.0223 LearningRate 0.000003 Epoch: 38 Global Step: 789390 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:05:03,578-Speed 2496.14 samples/sec Loss 1.0526 LearningRate 0.000003 Epoch: 38 Global Step: 789400 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:05:11,780-Speed 2497.31 samples/sec Loss 1.0864 LearningRate 0.000003 Epoch: 38 Global Step: 789410 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:05:19,995-Speed 2493.71 samples/sec Loss 1.0612 LearningRate 0.000003 Epoch: 38 Global Step: 789420 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:05:28,156-Speed 2509.81 samples/sec Loss 1.0475 LearningRate 0.000003 Epoch: 38 Global Step: 789430 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:05:36,359-Speed 2496.85 samples/sec Loss 1.0540 LearningRate 0.000003 Epoch: 38 Global Step: 789440 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:05:44,565-Speed 2496.34 samples/sec Loss 1.0251 LearningRate 0.000003 Epoch: 38 Global Step: 789450 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:05:52,769-Speed 2496.61 samples/sec Loss 1.0649 LearningRate 0.000003 Epoch: 38 Global Step: 789460 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:06:00,975-Speed 2496.15 samples/sec Loss 1.0696 
LearningRate 0.000003 Epoch: 38 Global Step: 789470 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:06:09,181-Speed 2496.38 samples/sec Loss 1.0742 LearningRate 0.000003 Epoch: 38 Global Step: 789480 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:06:17,330-Speed 2513.70 samples/sec Loss 1.0310 LearningRate 0.000003 Epoch: 38 Global Step: 789490 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:06:25,535-Speed 2496.45 samples/sec Loss 1.0705 LearningRate 0.000003 Epoch: 38 Global Step: 789500 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:06:33,743-Speed 2495.50 samples/sec Loss 1.0643 LearningRate 0.000003 Epoch: 38 Global Step: 789510 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:06:41,951-Speed 2495.41 samples/sec Loss 1.0521 LearningRate 0.000003 Epoch: 38 Global Step: 789520 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:06:50,158-Speed 2495.68 samples/sec Loss 1.1006 LearningRate 0.000003 Epoch: 38 Global Step: 789530 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:06:58,365-Speed 2496.09 samples/sec Loss 1.0365 LearningRate 0.000003 Epoch: 38 Global Step: 789540 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:07:06,516-Speed 2512.92 samples/sec Loss 1.0675 LearningRate 0.000003 Epoch: 38 Global Step: 789550 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:07:14,723-Speed 2496.01 samples/sec Loss 1.0777 LearningRate 0.000003 Epoch: 38 Global Step: 789560 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:07:22,942-Speed 2492.18 samples/sec Loss 1.0532 LearningRate 0.000003 Epoch: 38 Global Step: 789570 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:07:31,149-Speed 2495.77 samples/sec Loss 1.0738 LearningRate 0.000003 Epoch: 38 Global Step: 789580 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:07:39,354-Speed 2496.40 samples/sec Loss 1.0731 LearningRate 0.000003 Epoch: 38 
Global Step: 789590 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:07:47,564-Speed 2494.85 samples/sec Loss 1.0561 LearningRate 0.000003 Epoch: 38 Global Step: 789600 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:07:55,715-Speed 2513.20 samples/sec Loss 1.0645 LearningRate 0.000003 Epoch: 38 Global Step: 789610 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:08:03,922-Speed 2495.65 samples/sec Loss 1.0375 LearningRate 0.000003 Epoch: 38 Global Step: 789620 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:08:12,126-Speed 2496.71 samples/sec Loss 1.0705 LearningRate 0.000003 Epoch: 38 Global Step: 789630 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:08:20,328-Speed 2497.29 samples/sec Loss 1.0817 LearningRate 0.000003 Epoch: 38 Global Step: 789640 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:08:28,538-Speed 2495.10 samples/sec Loss 1.0481 LearningRate 0.000003 Epoch: 38 Global Step: 789650 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:08:36,740-Speed 2497.14 samples/sec Loss 1.0526 LearningRate 0.000003 Epoch: 38 Global Step: 789660 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:08:44,891-Speed 2512.91 samples/sec Loss 1.0557 LearningRate 0.000003 Epoch: 38 Global Step: 789670 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:08:53,098-Speed 2495.99 samples/sec Loss 1.0730 LearningRate 0.000003 Epoch: 38 Global Step: 789680 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:09:01,315-Speed 2493.06 samples/sec Loss 1.0531 LearningRate 0.000003 Epoch: 38 Global Step: 789690 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:09:09,527-Speed 2494.24 samples/sec Loss 1.0780 LearningRate 0.000003 Epoch: 38 Global Step: 789700 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:09:17,729-Speed 2497.53 samples/sec Loss 1.0520 LearningRate 0.000003 Epoch: 38 Global Step: 789710 Fp16 Grad 
Scale: 8192 Required: 9 hours Training: 2022-07-13 03:09:25,936-Speed 2495.66 samples/sec Loss 1.0354 LearningRate 0.000003 Epoch: 38 Global Step: 789720 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:09:34,087-Speed 2513.12 samples/sec Loss 1.0707 LearningRate 0.000003 Epoch: 38 Global Step: 789730 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:09:42,291-Speed 2496.50 samples/sec Loss 1.0806 LearningRate 0.000003 Epoch: 38 Global Step: 789740 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:09:50,510-Speed 2492.31 samples/sec Loss 1.0788 LearningRate 0.000003 Epoch: 38 Global Step: 789750 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:09:58,719-Speed 2495.17 samples/sec Loss 1.0818 LearningRate 0.000003 Epoch: 38 Global Step: 789760 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:10:06,929-Speed 2495.16 samples/sec Loss 1.0653 LearningRate 0.000003 Epoch: 38 Global Step: 789770 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:10:15,135-Speed 2496.12 samples/sec Loss 1.0850 LearningRate 0.000003 Epoch: 38 Global Step: 789780 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:10:23,285-Speed 2512.88 samples/sec Loss 1.0638 LearningRate 0.000003 Epoch: 38 Global Step: 789790 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:10:31,492-Speed 2496.39 samples/sec Loss 1.0440 LearningRate 0.000003 Epoch: 38 Global Step: 789800 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:10:39,711-Speed 2492.01 samples/sec Loss 1.1026 LearningRate 0.000003 Epoch: 38 Global Step: 789810 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:10:47,916-Speed 2496.66 samples/sec Loss 1.0806 LearningRate 0.000003 Epoch: 38 Global Step: 789820 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:10:56,120-Speed 2496.51 samples/sec Loss 1.0478 LearningRate 0.000003 Epoch: 38 Global Step: 789830 Fp16 Grad Scale: 8192 Required: 9 hours 
Training: 2022-07-13 03:11:04,325-Speed 2496.42 samples/sec Loss 1.0781 LearningRate 0.000003 Epoch: 38 Global Step: 789840 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:11:12,474-Speed 2513.73 samples/sec Loss 1.0237 LearningRate 0.000003 Epoch: 38 Global Step: 789850 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:11:20,678-Speed 2496.45 samples/sec Loss 1.0504 LearningRate 0.000003 Epoch: 38 Global Step: 789860 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:11:28,881-Speed 2497.10 samples/sec Loss 1.0583 LearningRate 0.000003 Epoch: 38 Global Step: 789870 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:11:37,086-Speed 2496.37 samples/sec Loss 1.0529 LearningRate 0.000003 Epoch: 38 Global Step: 789880 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:11:45,293-Speed 2495.94 samples/sec Loss 1.0934 LearningRate 0.000003 Epoch: 38 Global Step: 789890 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:11:53,503-Speed 2494.83 samples/sec Loss 1.0797 LearningRate 0.000003 Epoch: 38 Global Step: 789900 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:12:01,653-Speed 2513.45 samples/sec Loss 1.0566 LearningRate 0.000003 Epoch: 38 Global Step: 789910 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:12:09,858-Speed 2496.26 samples/sec Loss 1.0683 LearningRate 0.000003 Epoch: 38 Global Step: 789920 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:12:18,070-Speed 2494.53 samples/sec Loss 1.0998 LearningRate 0.000003 Epoch: 38 Global Step: 789930 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:12:26,275-Speed 2496.34 samples/sec Loss 1.0741 LearningRate 0.000003 Epoch: 38 Global Step: 789940 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-07-13 03:12:34,480-Speed 2496.69 samples/sec Loss 1.0735 LearningRate 0.000003 Epoch: 38 Global Step: 789950 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 
03:12:42,706-Speed 2489.88 samples/sec Loss 1.0857 LearningRate 0.000003 Epoch: 38 Global Step: 789960 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:12:50,858-Speed 2512.76 samples/sec Loss 1.0712 LearningRate 0.000003 Epoch: 38 Global Step: 789970 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:12:59,061-Speed 2496.85 samples/sec Loss 1.0770 LearningRate 0.000003 Epoch: 38 Global Step: 789980 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:13:07,271-Speed 2495.10 samples/sec Loss 1.0435 LearningRate 0.000003 Epoch: 38 Global Step: 789990 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:13:15,480-Speed 2495.49 samples/sec Loss 1.0576 LearningRate 0.000003 Epoch: 38 Global Step: 790000 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:13:23,687-Speed 2495.87 samples/sec Loss 1.0849 LearningRate 0.000003 Epoch: 38 Global Step: 790010 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:13:31,908-Speed 2491.62 samples/sec Loss 1.0881 LearningRate 0.000003 Epoch: 38 Global Step: 790020 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:13:40,063-Speed 2511.82 samples/sec Loss 1.0726 LearningRate 0.000003 Epoch: 38 Global Step: 790030 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:13:48,280-Speed 2492.88 samples/sec Loss 1.0676 LearningRate 0.000003 Epoch: 38 Global Step: 790040 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:13:56,486-Speed 2495.93 samples/sec Loss 1.0814 LearningRate 0.000003 Epoch: 38 Global Step: 790050 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:14:04,694-Speed 2495.41 samples/sec Loss 1.0508 LearningRate 0.000003 Epoch: 38 Global Step: 790060 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:14:12,901-Speed 2496.06 samples/sec Loss 1.0849 LearningRate 0.000003 Epoch: 38 Global Step: 790070 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:14:21,105-Speed 
2496.64 samples/sec Loss 1.0679 LearningRate 0.000003 Epoch: 38 Global Step: 790080 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:14:29,258-Speed 2512.49 samples/sec Loss 1.0539 LearningRate 0.000003 Epoch: 38 Global Step: 790090 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:14:37,459-Speed 2497.46 samples/sec Loss 1.0583 LearningRate 0.000003 Epoch: 38 Global Step: 790100 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:14:45,664-Speed 2496.39 samples/sec Loss 1.0663 LearningRate 0.000003 Epoch: 38 Global Step: 790110 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:14:53,868-Speed 2496.75 samples/sec Loss 1.0415 LearningRate 0.000003 Epoch: 38 Global Step: 790120 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:15:02,069-Speed 2497.46 samples/sec Loss 1.1061 LearningRate 0.000003 Epoch: 38 Global Step: 790130 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:15:10,275-Speed 2496.70 samples/sec Loss 1.0865 LearningRate 0.000003 Epoch: 38 Global Step: 790140 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:15:18,429-Speed 2512.23 samples/sec Loss 1.0494 LearningRate 0.000003 Epoch: 38 Global Step: 790150 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:15:26,632-Speed 2496.94 samples/sec Loss 1.0672 LearningRate 0.000003 Epoch: 38 Global Step: 790160 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:15:34,836-Speed 2496.59 samples/sec Loss 1.0915 LearningRate 0.000003 Epoch: 38 Global Step: 790170 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:15:43,046-Speed 2494.97 samples/sec Loss 1.0657 LearningRate 0.000003 Epoch: 38 Global Step: 790180 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:15:51,251-Speed 2496.45 samples/sec Loss 1.0665 LearningRate 0.000003 Epoch: 38 Global Step: 790190 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:15:59,456-Speed 2496.61 samples/sec 
Loss 1.1061 LearningRate 0.000003 Epoch: 38 Global Step: 790200 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:16:07,605-Speed 2513.41 samples/sec Loss 1.0634 LearningRate 0.000003 Epoch: 38 Global Step: 790210 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:16:15,809-Speed 2497.09 samples/sec Loss 1.0493 LearningRate 0.000003 Epoch: 38 Global Step: 790220 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:16:24,015-Speed 2496.13 samples/sec Loss 1.1061 LearningRate 0.000003 Epoch: 38 Global Step: 790230 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:16:32,222-Speed 2495.88 samples/sec Loss 1.0551 LearningRate 0.000003 Epoch: 38 Global Step: 790240 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:16:40,427-Speed 2496.54 samples/sec Loss 1.0605 LearningRate 0.000003 Epoch: 38 Global Step: 790250 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:16:48,645-Speed 2492.30 samples/sec Loss 1.0708 LearningRate 0.000003 Epoch: 38 Global Step: 790260 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:16:56,798-Speed 2512.65 samples/sec Loss 1.0986 LearningRate 0.000003 Epoch: 38 Global Step: 790270 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:17:05,000-Speed 2497.55 samples/sec Loss 1.1139 LearningRate 0.000003 Epoch: 38 Global Step: 790280 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:17:13,205-Speed 2496.16 samples/sec Loss 1.0536 LearningRate 0.000003 Epoch: 38 Global Step: 790290 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:17:21,423-Speed 2492.48 samples/sec Loss 1.0471 LearningRate 0.000003 Epoch: 38 Global Step: 790300 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:17:29,626-Speed 2497.04 samples/sec Loss 1.0709 LearningRate 0.000003 Epoch: 38 Global Step: 790310 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:17:37,831-Speed 2496.85 samples/sec Loss 1.0867 
LearningRate 0.000003 Epoch: 38 Global Step: 790320 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:17:45,987-Speed 2511.18 samples/sec Loss 1.0579 LearningRate 0.000003 Epoch: 38 Global Step: 790330 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:17:54,195-Speed 2495.59 samples/sec Loss 1.1020 LearningRate 0.000003 Epoch: 38 Global Step: 790340 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:18:02,401-Speed 2496.36 samples/sec Loss 1.0521 LearningRate 0.000003 Epoch: 38 Global Step: 790350 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:18:10,607-Speed 2496.06 samples/sec Loss 1.0265 LearningRate 0.000003 Epoch: 38 Global Step: 790360 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:18:18,813-Speed 2496.00 samples/sec Loss 1.0771 LearningRate 0.000003 Epoch: 38 Global Step: 790370 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:18:27,025-Speed 2494.56 samples/sec Loss 1.0650 LearningRate 0.000003 Epoch: 38 Global Step: 790380 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:18:35,184-Speed 2510.37 samples/sec Loss 1.0462 LearningRate 0.000003 Epoch: 38 Global Step: 790390 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:18:43,387-Speed 2496.97 samples/sec Loss 1.0690 LearningRate 0.000003 Epoch: 38 Global Step: 790400 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:18:51,600-Speed 2494.05 samples/sec Loss 1.0520 LearningRate 0.000003 Epoch: 38 Global Step: 790410 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:18:59,802-Speed 2497.38 samples/sec Loss 1.0582 LearningRate 0.000003 Epoch: 38 Global Step: 790420 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:19:08,003-Speed 2497.64 samples/sec Loss 1.0600 LearningRate 0.000003 Epoch: 38 Global Step: 790430 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:19:16,206-Speed 2496.77 samples/sec Loss 1.0461 LearningRate 
0.000003 Epoch: 38 Global Step: 790440 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:19:24,362-Speed 2511.50 samples/sec Loss 1.0378 LearningRate 0.000003 Epoch: 38 Global Step: 790450 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:19:32,569-Speed 2496.03 samples/sec Loss 1.0809 LearningRate 0.000003 Epoch: 38 Global Step: 790460 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:19:40,774-Speed 2496.29 samples/sec Loss 1.0598 LearningRate 0.000003 Epoch: 38 Global Step: 790470 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:19:48,980-Speed 2496.39 samples/sec Loss 1.0347 LearningRate 0.000003 Epoch: 38 Global Step: 790480 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:19:57,190-Speed 2495.11 samples/sec Loss 1.0628 LearningRate 0.000003 Epoch: 38 Global Step: 790490 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:20:05,398-Speed 2495.53 samples/sec Loss 1.0744 LearningRate 0.000003 Epoch: 38 Global Step: 790500 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:20:13,547-Speed 2513.67 samples/sec Loss 1.0909 LearningRate 0.000003 Epoch: 38 Global Step: 790510 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:20:21,754-Speed 2495.69 samples/sec Loss 1.0371 LearningRate 0.000003 Epoch: 38 Global Step: 790520 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:20:29,956-Speed 2497.65 samples/sec Loss 1.0527 LearningRate 0.000003 Epoch: 38 Global Step: 790530 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:20:38,157-Speed 2497.44 samples/sec Loss 1.0528 LearningRate 0.000003 Epoch: 38 Global Step: 790540 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:20:46,362-Speed 2496.50 samples/sec Loss 1.0664 LearningRate 0.000003 Epoch: 38 Global Step: 790550 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:20:54,565-Speed 2497.06 samples/sec Loss 1.0845 LearningRate 0.000003 Epoch: 38 
Global Step: 790560 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:21:02,717-Speed 2512.78 samples/sec Loss 1.0488 LearningRate 0.000003 Epoch: 38 Global Step: 790570 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:21:10,921-Speed 2496.60 samples/sec Loss 1.0506 LearningRate 0.000003 Epoch: 38 Global Step: 790580 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:21:19,125-Speed 2496.55 samples/sec Loss 1.0495 LearningRate 0.000003 Epoch: 38 Global Step: 790590 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:21:27,328-Speed 2497.00 samples/sec Loss 1.0825 LearningRate 0.000003 Epoch: 38 Global Step: 790600 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:21:35,534-Speed 2496.41 samples/sec Loss 1.0511 LearningRate 0.000003 Epoch: 38 Global Step: 790610 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:21:43,737-Speed 2496.90 samples/sec Loss 1.0924 LearningRate 0.000003 Epoch: 38 Global Step: 790620 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:21:51,890-Speed 2512.28 samples/sec Loss 1.0876 LearningRate 0.000003 Epoch: 38 Global Step: 790630 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:22:00,094-Speed 2496.94 samples/sec Loss 1.0462 LearningRate 0.000003 Epoch: 38 Global Step: 790640 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:22:08,305-Speed 2494.51 samples/sec Loss 1.0709 LearningRate 0.000003 Epoch: 38 Global Step: 790650 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:22:16,510-Speed 2496.18 samples/sec Loss 1.0891 LearningRate 0.000003 Epoch: 38 Global Step: 790660 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:22:24,714-Speed 2496.86 samples/sec Loss 1.0334 LearningRate 0.000003 Epoch: 38 Global Step: 790670 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:22:32,935-Speed 2491.76 samples/sec Loss 1.0816 LearningRate 0.000003 Epoch: 38 Global Step: 790680 
Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:22:41,101-Speed 2508.37 samples/sec Loss 1.0797 LearningRate 0.000003 Epoch: 38 Global Step: 790690 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:22:49,303-Speed 2497.15 samples/sec Loss 1.0509 LearningRate 0.000003 Epoch: 38 Global Step: 790700 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:22:57,519-Speed 2493.20 samples/sec Loss 1.0788 LearningRate 0.000003 Epoch: 38 Global Step: 790710 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:23:05,724-Speed 2496.49 samples/sec Loss 1.0724 LearningRate 0.000003 Epoch: 38 Global Step: 790720 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:23:13,927-Speed 2496.93 samples/sec Loss 1.0854 LearningRate 0.000003 Epoch: 38 Global Step: 790730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:23:22,131-Speed 2497.04 samples/sec Loss 1.0449 LearningRate 0.000003 Epoch: 38 Global Step: 790740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:23:30,286-Speed 2511.49 samples/sec Loss 1.0700 LearningRate 0.000003 Epoch: 38 Global Step: 790750 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:23:38,495-Speed 2495.40 samples/sec Loss 1.0491 LearningRate 0.000003 Epoch: 38 Global Step: 790760 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:23:46,702-Speed 2495.96 samples/sec Loss 1.0635 LearningRate 0.000003 Epoch: 38 Global Step: 790770 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:23:54,914-Speed 2494.50 samples/sec Loss 1.0652 LearningRate 0.000003 Epoch: 38 Global Step: 790780 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:24:03,117-Speed 2496.86 samples/sec Loss 1.0565 LearningRate 0.000003 Epoch: 38 Global Step: 790790 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:24:11,326-Speed 2495.27 samples/sec Loss 1.0621 LearningRate 0.000003 Epoch: 38 Global Step: 790800 Fp16 Grad Scale: 
16384 Required: 9 hours Training: 2022-07-13 03:24:19,479-Speed 2512.49 samples/sec Loss 1.0534 LearningRate 0.000003 Epoch: 38 Global Step: 790810 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:24:27,682-Speed 2496.85 samples/sec Loss 1.0589 LearningRate 0.000003 Epoch: 38 Global Step: 790820 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:24:35,898-Speed 2493.10 samples/sec Loss 1.0585 LearningRate 0.000003 Epoch: 38 Global Step: 790830 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:24:44,104-Speed 2496.37 samples/sec Loss 1.0687 LearningRate 0.000003 Epoch: 38 Global Step: 790840 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:24:52,310-Speed 2496.26 samples/sec Loss 1.1072 LearningRate 0.000003 Epoch: 38 Global Step: 790850 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:25:00,512-Speed 2497.22 samples/sec Loss 1.0619 LearningRate 0.000003 Epoch: 38 Global Step: 790860 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:25:08,666-Speed 2512.24 samples/sec Loss 1.0806 LearningRate 0.000003 Epoch: 38 Global Step: 790870 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:25:16,873-Speed 2495.83 samples/sec Loss 1.0611 LearningRate 0.000003 Epoch: 38 Global Step: 790880 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:25:25,075-Speed 2497.11 samples/sec Loss 1.0709 LearningRate 0.000003 Epoch: 38 Global Step: 790890 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:25:33,299-Speed 2490.64 samples/sec Loss 1.0661 LearningRate 0.000003 Epoch: 38 Global Step: 790900 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:25:41,504-Speed 2496.83 samples/sec Loss 1.0421 LearningRate 0.000003 Epoch: 38 Global Step: 790910 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:25:49,709-Speed 2496.37 samples/sec Loss 1.0717 LearningRate 0.000003 Epoch: 38 Global Step: 790920 Fp16 Grad Scale: 16384 Required: 9 
hours Training: 2022-07-13 03:25:57,859-Speed 2513.34 samples/sec Loss 1.0699 LearningRate 0.000003 Epoch: 38 Global Step: 790930 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:26:06,066-Speed 2495.60 samples/sec Loss 1.0718 LearningRate 0.000003 Epoch: 38 Global Step: 790940 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:26:14,276-Speed 2495.12 samples/sec Loss 1.0446 LearningRate 0.000003 Epoch: 38 Global Step: 790950 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:26:22,484-Speed 2495.76 samples/sec Loss 1.0574 LearningRate 0.000003 Epoch: 38 Global Step: 790960 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:26:30,687-Speed 2497.12 samples/sec Loss 1.0974 LearningRate 0.000003 Epoch: 38 Global Step: 790970 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:26:38,890-Speed 2497.23 samples/sec Loss 1.0619 LearningRate 0.000003 Epoch: 38 Global Step: 790980 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:26:47,044-Speed 2512.11 samples/sec Loss 1.0758 LearningRate 0.000003 Epoch: 38 Global Step: 790990 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:26:55,248-Speed 2496.73 samples/sec Loss 1.0277 LearningRate 0.000003 Epoch: 38 Global Step: 791000 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:27:03,453-Speed 2496.31 samples/sec Loss 1.0728 LearningRate 0.000003 Epoch: 38 Global Step: 791010 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:27:11,659-Speed 2496.29 samples/sec Loss 1.0689 LearningRate 0.000003 Epoch: 38 Global Step: 791020 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:27:19,861-Speed 2497.09 samples/sec Loss 1.0642 LearningRate 0.000003 Epoch: 38 Global Step: 791030 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:27:28,065-Speed 2496.72 samples/sec Loss 1.0610 LearningRate 0.000003 Epoch: 38 Global Step: 791040 Fp16 Grad Scale: 16384 Required: 9 hours Training: 
2022-07-13 03:27:36,216-Speed 2512.96 samples/sec Loss 1.0775 LearningRate 0.000003 Epoch: 38 Global Step: 791050 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:27:44,419-Speed 2496.93 samples/sec Loss 1.0582 LearningRate 0.000003 Epoch: 38 Global Step: 791060 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:27:52,629-Speed 2496.25 samples/sec Loss 1.0420 LearningRate 0.000003 Epoch: 38 Global Step: 791070 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:28:00,831-Speed 2497.31 samples/sec Loss 1.0499 LearningRate 0.000003 Epoch: 38 Global Step: 791080 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:28:09,034-Speed 2496.88 samples/sec Loss 1.0598 LearningRate 0.000003 Epoch: 38 Global Step: 791090 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:28:17,240-Speed 2496.09 samples/sec Loss 1.0654 LearningRate 0.000003 Epoch: 38 Global Step: 791100 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:28:25,391-Speed 2513.04 samples/sec Loss 1.0597 LearningRate 0.000003 Epoch: 38 Global Step: 791110 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:28:33,593-Speed 2497.21 samples/sec Loss 1.0706 LearningRate 0.000003 Epoch: 38 Global Step: 791120 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:28:41,798-Speed 2496.73 samples/sec Loss 1.0798 LearningRate 0.000003 Epoch: 38 Global Step: 791130 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:28:50,003-Speed 2496.51 samples/sec Loss 1.0652 LearningRate 0.000003 Epoch: 38 Global Step: 791140 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:28:58,208-Speed 2496.43 samples/sec Loss 1.0581 LearningRate 0.000003 Epoch: 38 Global Step: 791150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:29:06,416-Speed 2495.46 samples/sec Loss 1.0809 LearningRate 0.000003 Epoch: 38 Global Step: 791160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 
03:29:14,566-Speed 2513.42 samples/sec Loss 1.0560 LearningRate 0.000003 Epoch: 38 Global Step: 791170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:29:22,770-Speed 2496.63 samples/sec Loss 1.0707 LearningRate 0.000003 Epoch: 38 Global Step: 791180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:29:30,974-Speed 2496.98 samples/sec Loss 1.0686 LearningRate 0.000003 Epoch: 38 Global Step: 791190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:29:39,177-Speed 2497.38 samples/sec Loss 1.0737 LearningRate 0.000003 Epoch: 38 Global Step: 791200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:29:47,384-Speed 2495.47 samples/sec Loss 1.0582 LearningRate 0.000003 Epoch: 38 Global Step: 791210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:29:55,592-Speed 2495.81 samples/sec Loss 1.0702 LearningRate 0.000003 Epoch: 38 Global Step: 791220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:30:03,795-Speed 2497.26 samples/sec Loss 1.0688 LearningRate 0.000003 Epoch: 38 Global Step: 791230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:30:12,007-Speed 2494.31 samples/sec Loss 1.0737 LearningRate 0.000003 Epoch: 38 Global Step: 791240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:30:20,209-Speed 2497.04 samples/sec Loss 1.0969 LearningRate 0.000003 Epoch: 38 Global Step: 791250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:30:28,422-Speed 2494.31 samples/sec Loss 1.0894 LearningRate 0.000003 Epoch: 38 Global Step: 791260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:30:36,626-Speed 2496.88 samples/sec Loss 1.0342 LearningRate 0.000003 Epoch: 38 Global Step: 791270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:30:44,833-Speed 2495.70 samples/sec Loss 1.0839 LearningRate 0.000003 Epoch: 38 Global Step: 791280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:30:53,004-Speed 
2506.67 samples/sec Loss 1.0513 LearningRate 0.000003 Epoch: 38 Global Step: 791290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:31:01,211-Speed 2496.20 samples/sec Loss 1.0636 LearningRate 0.000003 Epoch: 38 Global Step: 791300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:31:09,417-Speed 2496.02 samples/sec Loss 1.0319 LearningRate 0.000003 Epoch: 38 Global Step: 791310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:31:17,622-Speed 2496.34 samples/sec Loss 1.0811 LearningRate 0.000003 Epoch: 38 Global Step: 791320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:31:25,830-Speed 2495.65 samples/sec Loss 1.0670 LearningRate 0.000003 Epoch: 38 Global Step: 791330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:31:34,039-Speed 2495.40 samples/sec Loss 1.0427 LearningRate 0.000003 Epoch: 38 Global Step: 791340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:31:42,188-Speed 2513.62 samples/sec Loss 1.0581 LearningRate 0.000003 Epoch: 38 Global Step: 791350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:31:50,391-Speed 2496.87 samples/sec Loss 1.0736 LearningRate 0.000003 Epoch: 38 Global Step: 791360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:31:58,600-Speed 2495.25 samples/sec Loss 1.0572 LearningRate 0.000003 Epoch: 38 Global Step: 791370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:32:06,805-Speed 2496.57 samples/sec Loss 1.0958 LearningRate 0.000003 Epoch: 38 Global Step: 791380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:32:15,022-Speed 2492.72 samples/sec Loss 1.0730 LearningRate 0.000003 Epoch: 38 Global Step: 791390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:32:23,232-Speed 2495.20 samples/sec Loss 1.0596 LearningRate 0.000003 Epoch: 38 Global Step: 791400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:32:31,385-Speed 2512.68 samples/sec 
Loss 1.0653 LearningRate 0.000003 Epoch: 38 Global Step: 791410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:32:39,595-Speed 2495.14 samples/sec Loss 1.0568 LearningRate 0.000003 Epoch: 38 Global Step: 791420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:32:47,798-Speed 2496.92 samples/sec Loss 1.0772 LearningRate 0.000003 Epoch: 38 Global Step: 791430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:32:56,015-Speed 2492.72 samples/sec Loss 1.0633 LearningRate 0.000003 Epoch: 38 Global Step: 791440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:33:04,222-Speed 2495.87 samples/sec Loss 1.0743 LearningRate 0.000003 Epoch: 38 Global Step: 791450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:33:12,430-Speed 2495.46 samples/sec Loss 1.0728 LearningRate 0.000003 Epoch: 38 Global Step: 791460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:33:20,583-Speed 2512.39 samples/sec Loss 1.0768 LearningRate 0.000003 Epoch: 38 Global Step: 791470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:33:28,790-Speed 2495.88 samples/sec Loss 1.0913 LearningRate 0.000003 Epoch: 38 Global Step: 791480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:33:36,994-Speed 2496.69 samples/sec Loss 1.0772 LearningRate 0.000003 Epoch: 38 Global Step: 791490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:33:45,200-Speed 2496.04 samples/sec Loss 1.0590 LearningRate 0.000003 Epoch: 38 Global Step: 791500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:33:53,403-Speed 2496.89 samples/sec Loss 1.0907 LearningRate 0.000003 Epoch: 38 Global Step: 791510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:34:01,623-Speed 2491.88 samples/sec Loss 1.0526 LearningRate 0.000003 Epoch: 38 Global Step: 791520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:34:09,775-Speed 2513.19 samples/sec Loss 1.0481 
LearningRate 0.000003 Epoch: 38 Global Step: 791530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:34:17,982-Speed 2495.87 samples/sec Loss 1.0496 LearningRate 0.000003 Epoch: 38 Global Step: 791540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:34:26,189-Speed 2495.55 samples/sec Loss 1.0797 LearningRate 0.000003 Epoch: 38 Global Step: 791550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:34:34,396-Speed 2496.20 samples/sec Loss 1.0755 LearningRate 0.000003 Epoch: 38 Global Step: 791560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-07-13 03:34:42,567-Speed 2506.87 samples/sec Loss 1.0623 LearningRate 0.000003 Epoch: 38 Global Step: 791570 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:34:50,773-Speed 2495.83 samples/sec Loss 1.0561 LearningRate 0.000003 Epoch: 38 Global Step: 791580 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:34:58,924-Speed 2513.15 samples/sec Loss 1.1061 LearningRate 0.000003 Epoch: 38 Global Step: 791590 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:35:07,128-Speed 2496.95 samples/sec Loss 1.0608 LearningRate 0.000003 Epoch: 38 Global Step: 791600 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:35:15,333-Speed 2496.36 samples/sec Loss 1.0827 LearningRate 0.000003 Epoch: 38 Global Step: 791610 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:35:23,537-Speed 2496.69 samples/sec Loss 1.0781 LearningRate 0.000003 Epoch: 38 Global Step: 791620 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:35:31,744-Speed 2496.23 samples/sec Loss 1.0817 LearningRate 0.000003 Epoch: 38 Global Step: 791630 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:35:39,950-Speed 2495.85 samples/sec Loss 1.0633 LearningRate 0.000003 Epoch: 38 Global Step: 791640 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:35:48,098-Speed 2513.93 samples/sec Loss 1.0688 LearningRate 
0.000003 Epoch: 38 Global Step: 791650 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:35:56,304-Speed 2496.13 samples/sec Loss 1.0667 LearningRate 0.000003 Epoch: 38 Global Step: 791660 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:36:04,514-Speed 2495.07 samples/sec Loss 1.0669 LearningRate 0.000003 Epoch: 38 Global Step: 791670 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:36:12,716-Speed 2497.40 samples/sec Loss 1.0865 LearningRate 0.000003 Epoch: 38 Global Step: 791680 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:36:20,924-Speed 2495.67 samples/sec Loss 1.0754 LearningRate 0.000003 Epoch: 38 Global Step: 791690 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:36:29,132-Speed 2495.49 samples/sec Loss 1.0618 LearningRate 0.000003 Epoch: 38 Global Step: 791700 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:36:37,284-Speed 2512.55 samples/sec Loss 1.0783 LearningRate 0.000003 Epoch: 38 Global Step: 791710 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:36:45,491-Speed 2495.79 samples/sec Loss 1.0595 LearningRate 0.000003 Epoch: 38 Global Step: 791720 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:36:53,700-Speed 2495.46 samples/sec Loss 1.0949 LearningRate 0.000003 Epoch: 38 Global Step: 791730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:37:01,918-Speed 2492.18 samples/sec Loss 1.1010 LearningRate 0.000003 Epoch: 38 Global Step: 791740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:37:10,124-Speed 2496.29 samples/sec Loss 1.0888 LearningRate 0.000003 Epoch: 38 Global Step: 791750 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:37:18,330-Speed 2495.92 samples/sec Loss 1.0584 LearningRate 0.000003 Epoch: 38 Global Step: 791760 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-07-13 03:37:26,483-Speed 2512.62 samples/sec Loss 1.0617 LearningRate 0.000003 Epoch: 38 
Global Step: 791770 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-07-13 03:37:34,687-Speed 2496.63 samples/sec Loss 1.0667 LearningRate 0.000003 Epoch: 38 Global Step: 791780 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-07-13 03:37:42,904-Speed 2492.82 samples/sec Loss 1.0446 LearningRate 0.000003 Epoch: 38 Global Step: 791790 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-07-13 03:37:51,108-Speed 2496.74 samples/sec Loss 1.0551 LearningRate 0.000003 Epoch: 38 Global Step: 791800 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-07-13 03:37:59,313-Speed 2496.33 samples/sec Loss 1.0513 LearningRate 0.000003 Epoch: 38 Global Step: 791810 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-07-13 03:38:07,524-Speed 2494.60 samples/sec Loss 1.1087 LearningRate 0.000003 Epoch: 38 Global Step: 791820 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-07-13 03:38:15,683-Speed 2510.63 samples/sec Loss 1.0739 LearningRate 0.000003 Epoch: 38 Global Step: 791830 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-07-13 03:38:23,845-Speed 2509.75 samples/sec Loss 1.0875 LearningRate 0.000003 Epoch: 38 Global Step: 791840 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:38:32,051-Speed 2495.74 samples/sec Loss 1.0514 LearningRate 0.000003 Epoch: 38 Global Step: 791850 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:38:40,253-Speed 2497.49 samples/sec Loss 1.0758 LearningRate 0.000003 Epoch: 38 Global Step: 791860 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:38:48,456-Speed 2498.04 samples/sec Loss 1.0834 LearningRate 0.000003 Epoch: 38 Global Step: 791870 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:38:56,672-Speed 2493.16 samples/sec Loss 1.0652 LearningRate 0.000003 Epoch: 38 Global Step: 791880 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:39:04,824-Speed 2512.53 samples/sec Loss 1.0748 LearningRate 0.000003 Epoch: 38 Global Step: 791890 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:39:13,035-Speed 2494.46 samples/sec Loss 1.0784 LearningRate 0.000003 Epoch: 38 Global Step: 791900 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:39:21,258-Speed 2491.28 samples/sec Loss 1.0642 LearningRate 0.000003 Epoch: 38 Global Step: 791910 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:39:29,458-Speed 2497.76 samples/sec Loss 1.0815 LearningRate 0.000003 Epoch: 38 Global Step: 791920 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:39:37,663-Speed 2496.25 samples/sec Loss 1.0897 LearningRate 0.000003 Epoch: 38 Global Step: 791930 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:39:45,866-Speed 2497.07 samples/sec Loss 1.0851 LearningRate 0.000003 Epoch: 38 Global Step: 791940 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:39:54,017-Speed 2513.33 samples/sec Loss 1.0701 LearningRate 0.000003 Epoch: 38 Global Step: 791950 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:40:02,230-Speed 2494.06 samples/sec Loss 1.0683 LearningRate 0.000003 Epoch: 38 Global Step: 791960 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:40:10,441-Speed 2494.65 samples/sec Loss 1.0666 LearningRate 0.000003 Epoch: 38 Global Step: 791970 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:40:18,650-Speed 2495.31 samples/sec Loss 1.0355 LearningRate 0.000003 Epoch: 38 Global Step: 791980 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:40:26,851-Speed 2497.69 samples/sec Loss 1.0732 LearningRate 0.000003 Epoch: 38 Global Step: 791990 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:40:35,058-Speed 2495.63 samples/sec Loss 1.0653 LearningRate 0.000003 Epoch: 38 Global Step: 792000 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:40:43,208-Speed 2513.29 samples/sec Loss 1.0712 LearningRate 0.000003 Epoch: 38 Global Step: 792010 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:40:51,408-Speed 2498.01 samples/sec Loss 1.0496 LearningRate 0.000003 Epoch: 38 Global Step: 792020 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:40:59,616-Speed 2495.74 samples/sec Loss 1.0673 LearningRate 0.000003 Epoch: 38 Global Step: 792030 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:41:07,821-Speed 2496.23 samples/sec Loss 1.0659 LearningRate 0.000003 Epoch: 38 Global Step: 792040 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:41:16,042-Speed 2491.66 samples/sec Loss 1.0535 LearningRate 0.000003 Epoch: 38 Global Step: 792050 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:41:24,247-Speed 2496.52 samples/sec Loss 1.0604 LearningRate 0.000003 Epoch: 38 Global Step: 792060 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:41:32,395-Speed 2514.20 samples/sec Loss 1.0421 LearningRate 0.000003 Epoch: 38 Global Step: 792070 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:41:40,598-Speed 2497.43 samples/sec Loss 1.0625 LearningRate 0.000003 Epoch: 38 Global Step: 792080 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:41:48,800-Speed 2497.39 samples/sec Loss 1.0859 LearningRate 0.000003 Epoch: 38 Global Step: 792090 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:41:57,006-Speed 2496.15 samples/sec Loss 1.0785 LearningRate 0.000003 Epoch: 38 Global Step: 792100 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:42:05,207-Speed 2497.58 samples/sec Loss 1.0612 LearningRate 0.000003 Epoch: 38 Global Step: 792110 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:42:13,411-Speed 2496.91 samples/sec Loss 1.0727 LearningRate 0.000003 Epoch: 38 Global Step: 792120 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:42:21,563-Speed 2512.83 samples/sec Loss 1.0626 LearningRate 0.000003 Epoch: 38 Global Step: 792130 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:42:29,775-Speed 2494.28 samples/sec Loss 1.0719 LearningRate 0.000003 Epoch: 38 Global Step: 792140 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:42:37,977-Speed 2497.30 samples/sec Loss 1.0877 LearningRate 0.000003 Epoch: 38 Global Step: 792150 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:42:46,188-Speed 2494.58 samples/sec Loss 1.0431 LearningRate 0.000003 Epoch: 38 Global Step: 792160 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:42:54,406-Speed 2492.39 samples/sec Loss 1.0652 LearningRate 0.000003 Epoch: 38 Global Step: 792170 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:43:02,607-Speed 2497.68 samples/sec Loss 1.0570 LearningRate 0.000003 Epoch: 38 Global Step: 792180 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:43:10,756-Speed 2513.76 samples/sec Loss 1.0687 LearningRate 0.000003 Epoch: 38 Global Step: 792190 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:43:18,956-Speed 2497.82 samples/sec Loss 1.0710 LearningRate 0.000003 Epoch: 38 Global Step: 792200 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:43:27,159-Speed 2497.02 samples/sec Loss 1.0560 LearningRate 0.000003 Epoch: 38 Global Step: 792210 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:43:35,375-Speed 2493.26 samples/sec Loss 1.0657 LearningRate 0.000003 Epoch: 38 Global Step: 792220 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:43:43,578-Speed 2497.32 samples/sec Loss 1.0567 LearningRate 0.000003 Epoch: 38 Global Step: 792230 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:43:51,780-Speed 2497.33 samples/sec Loss 1.0443 LearningRate 0.000002 Epoch: 38 Global Step: 792240 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:43:59,931-Speed 2512.85 samples/sec Loss 1.0878 LearningRate 0.000002 Epoch: 38 Global Step: 792250 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:44:08,134-Speed 2496.96 samples/sec Loss 1.0461 LearningRate 0.000002 Epoch: 38 Global Step: 792260 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:44:16,338-Speed 2496.77 samples/sec Loss 1.0829 LearningRate 0.000002 Epoch: 38 Global Step: 792270 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:44:24,543-Speed 2496.22 samples/sec Loss 1.0907 LearningRate 0.000002 Epoch: 38 Global Step: 792280 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:44:32,749-Speed 2496.31 samples/sec Loss 1.0317 LearningRate 0.000002 Epoch: 38 Global Step: 792290 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:44:40,960-Speed 2494.83 samples/sec Loss 1.0796 LearningRate 0.000002 Epoch: 38 Global Step: 792300 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:44:49,111-Speed 2513.08 samples/sec Loss 1.0771 LearningRate 0.000002 Epoch: 38 Global Step: 792310 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:44:57,312-Speed 2497.43 samples/sec Loss 1.0898 LearningRate 0.000002 Epoch: 38 Global Step: 792320 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:45:05,517-Speed 2496.42 samples/sec Loss 1.0680 LearningRate 0.000002 Epoch: 38 Global Step: 792330 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:45:13,725-Speed 2495.73 samples/sec Loss 1.0759 LearningRate 0.000002 Epoch: 38 Global Step: 792340 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:45:21,932-Speed 2495.69 samples/sec Loss 1.0697 LearningRate 0.000002 Epoch: 38 Global Step: 792350 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:45:30,139-Speed 2495.80 samples/sec Loss 1.0635 LearningRate 0.000002 Epoch: 38 Global Step: 792360 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:45:38,289-Speed 2513.44 samples/sec Loss 1.0618 LearningRate 0.000002 Epoch: 38 Global Step: 792370 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:45:46,508-Speed 2492.09 samples/sec Loss 1.0388 LearningRate 0.000002 Epoch: 38 Global Step: 792380 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:45:54,715-Speed 2495.89 samples/sec Loss 1.0836 LearningRate 0.000002 Epoch: 38 Global Step: 792390 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:46:02,918-Speed 2496.76 samples/sec Loss 1.0920 LearningRate 0.000002 Epoch: 38 Global Step: 792400 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:46:11,121-Speed 2497.25 samples/sec Loss 1.0818 LearningRate 0.000002 Epoch: 38 Global Step: 792410 Fp16 Grad Scale: 8192 Required: 9 hours
Training: 2022-07-13 03:46:19,324-Speed 2496.73 samples/sec Loss 1.0714 LearningRate 0.000002 Epoch: 38 Global Step: 792420 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:46:27,480-Speed 2511.56 samples/sec Loss 1.0654 LearningRate 0.000002 Epoch: 38 Global Step: 792430 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:46:35,691-Speed 2494.71 samples/sec Loss 1.0595 LearningRate 0.000002 Epoch: 38 Global Step: 792440 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:46:43,895-Speed 2496.74 samples/sec Loss 1.0743 LearningRate 0.000002 Epoch: 38 Global Step: 792450 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:46:52,106-Speed 2494.46 samples/sec Loss 1.0751 LearningRate 0.000002 Epoch: 38 Global Step: 792460 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:47:00,312-Speed 2496.08 samples/sec Loss 1.0695 LearningRate 0.000002 Epoch: 38 Global Step: 792470 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:47:08,516-Speed 2496.79 samples/sec Loss 1.0381 LearningRate 0.000002 Epoch: 38 Global Step: 792480 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:47:16,675-Speed 2510.25 samples/sec Loss 1.0464 LearningRate 0.000002 Epoch: 38 Global Step: 792490 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:47:24,876-Speed 2497.43 samples/sec Loss 1.0189 LearningRate 0.000002 Epoch: 38 Global Step: 792500 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:47:33,080-Speed 2497.10 samples/sec Loss 1.0795 LearningRate 0.000002 Epoch: 38 Global Step: 792510 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:47:41,281-Speed 2497.44 samples/sec Loss 1.0656 LearningRate 0.000002 Epoch: 38 Global Step: 792520 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:47:49,484-Speed 2496.86 samples/sec Loss 1.0667 LearningRate 0.000002 Epoch: 38 Global Step: 792530 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:47:57,689-Speed 2496.68 samples/sec Loss 1.0734 LearningRate 0.000002 Epoch: 38 Global Step: 792540 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:48:05,837-Speed 2513.93 samples/sec Loss 1.0730 LearningRate 0.000002 Epoch: 38 Global Step: 792550 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:48:14,039-Speed 2497.48 samples/sec Loss 1.0655 LearningRate 0.000002 Epoch: 38 Global Step: 792560 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:48:22,252-Speed 2493.74 samples/sec Loss 1.0909 LearningRate 0.000002 Epoch: 38 Global Step: 792570 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:48:30,455-Speed 2497.22 samples/sec Loss 1.0665 LearningRate 0.000002 Epoch: 38 Global Step: 792580 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:48:38,658-Speed 2496.84 samples/sec Loss 1.0932 LearningRate 0.000002 Epoch: 38 Global Step: 792590 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:48:46,862-Speed 2496.70 samples/sec Loss 1.0292 LearningRate 0.000002 Epoch: 38 Global Step: 792600 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:48:55,015-Speed 2512.82 samples/sec Loss 1.0634 LearningRate 0.000002 Epoch: 38 Global Step: 792610 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:49:03,224-Speed 2495.12 samples/sec Loss 1.0321 LearningRate 0.000002 Epoch: 38 Global Step: 792620 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:49:11,428-Speed 2496.72 samples/sec Loss 1.0771 LearningRate 0.000002 Epoch: 38 Global Step: 792630 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:49:19,633-Speed 2496.44 samples/sec Loss 1.0770 LearningRate 0.000002 Epoch: 38 Global Step: 792640 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:49:27,837-Speed 2496.74 samples/sec Loss 1.0744 LearningRate 0.000002 Epoch: 38 Global Step: 792650 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:49:36,046-Speed 2495.01 samples/sec Loss 1.0398 LearningRate 0.000002 Epoch: 38 Global Step: 792660 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:49:44,196-Speed 2513.51 samples/sec Loss 1.0375 LearningRate 0.000002 Epoch: 38 Global Step: 792670 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:49:52,404-Speed 2495.45 samples/sec Loss 1.0417 LearningRate 0.000002 Epoch: 38 Global Step: 792680 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:50:00,609-Speed 2496.38 samples/sec Loss 1.0729 LearningRate 0.000002 Epoch: 38 Global Step: 792690 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:50:08,829-Speed 2492.04 samples/sec Loss 1.0765 LearningRate 0.000002 Epoch: 38 Global Step: 792700 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:50:17,033-Speed 2496.73 samples/sec Loss 1.0377 LearningRate 0.000002 Epoch: 38 Global Step: 792710 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:50:25,326-Speed 2470.01 samples/sec Loss 1.1101 LearningRate 0.000002 Epoch: 38 Global Step: 792720 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:50:33,492-Speed 2508.16 samples/sec Loss 1.0716 LearningRate 0.000002 Epoch: 38 Global Step: 792730 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:50:41,696-Speed 2496.73 samples/sec Loss 1.0642 LearningRate 0.000002 Epoch: 38 Global Step: 792740 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:50:49,902-Speed 2496.37 samples/sec Loss 1.0657 LearningRate 0.000002 Epoch: 38 Global Step: 792750 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:50:58,133-Speed 2488.52 samples/sec Loss 1.0410 LearningRate 0.000002 Epoch: 38 Global Step: 792760 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:51:06,347-Speed 2493.51 samples/sec Loss 1.0452 LearningRate 0.000002 Epoch: 38 Global Step: 792770 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:51:14,558-Speed 2494.73 samples/sec Loss 1.0886 LearningRate 0.000002 Epoch: 38 Global Step: 792780 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:51:22,713-Speed 2511.92 samples/sec Loss 1.0465 LearningRate 0.000002 Epoch: 38 Global Step: 792790 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:51:30,923-Speed 2494.83 samples/sec Loss 1.0527 LearningRate 0.000002 Epoch: 38 Global Step: 792800 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:51:39,130-Speed 2495.76 samples/sec Loss 1.0662 LearningRate 0.000002 Epoch: 38 Global Step: 792810 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:51:47,342-Speed 2494.16 samples/sec Loss 1.0685 LearningRate 0.000002 Epoch: 38 Global Step: 792820 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:51:55,552-Speed 2495.12 samples/sec Loss 1.0905 LearningRate 0.000002 Epoch: 38 Global Step: 792830 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:52:03,760-Speed 2495.52 samples/sec Loss 1.0874 LearningRate 0.000002 Epoch: 38 Global Step: 792840 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:52:11,915-Speed 2511.53 samples/sec Loss 1.0761 LearningRate 0.000002 Epoch: 38 Global Step: 792850 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:52:20,121-Speed 2496.26 samples/sec Loss 1.0745 LearningRate 0.000002 Epoch: 38 Global Step: 792860 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:52:28,325-Speed 2496.78 samples/sec Loss 1.0850 LearningRate 0.000002 Epoch: 38 Global Step: 792870 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:52:36,538-Speed 2493.67 samples/sec Loss 1.0719 LearningRate 0.000002 Epoch: 38 Global Step: 792880 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:52:44,752-Speed 2493.76 samples/sec Loss 1.0537 LearningRate 0.000002 Epoch: 38 Global Step: 792890 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:52:52,956-Speed 2496.92 samples/sec Loss 1.0943 LearningRate 0.000002 Epoch: 38 Global Step: 792900 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:53:01,111-Speed 2511.55 samples/sec Loss 1.0644 LearningRate 0.000002 Epoch: 38 Global Step: 792910 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:53:09,314-Speed 2496.88 samples/sec Loss 1.0474 LearningRate 0.000002 Epoch: 38 Global Step: 792920 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:53:17,519-Speed 2496.54 samples/sec Loss 1.0674 LearningRate 0.000002 Epoch: 38 Global Step: 792930 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:53:25,722-Speed 2497.09 samples/sec Loss 1.0713 LearningRate 0.000002 Epoch: 38 Global Step: 792940 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:53:33,929-Speed 2496.07 samples/sec Loss 1.0481 LearningRate 0.000002 Epoch: 38 Global Step: 792950 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:53:42,136-Speed 2495.83 samples/sec Loss 1.0551 LearningRate 0.000002 Epoch: 38 Global Step: 792960 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:53:50,288-Speed 2512.46 samples/sec Loss 1.0333 LearningRate 0.000002 Epoch: 38 Global Step: 792970 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:53:58,493-Speed 2496.54 samples/sec Loss 1.0478 LearningRate 0.000002 Epoch: 38 Global Step: 792980 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:54:06,702-Speed 2495.27 samples/sec Loss 1.0852 LearningRate 0.000002 Epoch: 38 Global Step: 792990 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:54:14,904-Speed 2497.50 samples/sec Loss 1.0725 LearningRate 0.000002 Epoch: 38 Global Step: 793000 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:54:23,113-Speed 2495.58 samples/sec Loss 1.0673 LearningRate 0.000002 Epoch: 38 Global Step: 793010 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:54:31,323-Speed 2495.12 samples/sec Loss 1.0741 LearningRate 0.000002 Epoch: 38 Global Step: 793020 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:54:39,482-Speed 2510.16 samples/sec Loss 1.0613 LearningRate 0.000002 Epoch: 38 Global Step: 793030 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2022-07-13 03:54:47,693-Speed 2494.92 samples/sec Loss 1.0930 LearningRate 0.000002 Epoch: 38 Global Step: 793040 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:54:55,902-Speed 2495.13 samples/sec Loss 1.0443 LearningRate 0.000002 Epoch: 38 Global Step: 793050 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:55:04,125-Speed 2491.24 samples/sec Loss 1.0707 LearningRate 0.000002 Epoch: 38 Global Step: 793060 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:55:12,333-Speed 2495.61 samples/sec Loss 1.0721 LearningRate 0.000002 Epoch: 38 Global Step: 793070 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:55:20,546-Speed 2494.16 samples/sec Loss 1.0513 LearningRate 0.000002 Epoch: 38 Global Step: 793080 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:55:28,700-Speed 2512.04 samples/sec Loss 1.0937 LearningRate 0.000002 Epoch: 38 Global Step: 793090 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:55:36,908-Speed 2495.40 samples/sec Loss 1.0972 LearningRate 0.000002 Epoch: 38 Global Step: 793100 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:55:45,114-Speed 2496.00 samples/sec Loss 1.0725 LearningRate 0.000002 Epoch: 38 Global Step: 793110 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:55:53,316-Speed 2497.30 samples/sec Loss 1.0947 LearningRate 0.000002 Epoch: 38 Global Step: 793120 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:56:01,515-Speed 2498.58 samples/sec Loss 1.0440 LearningRate 0.000002 Epoch: 38 Global Step: 793130 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:56:09,717-Speed 2497.30 samples/sec Loss 1.0645 LearningRate 0.000002 Epoch: 38 Global Step: 793140 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:56:17,866-Speed 2513.38 samples/sec Loss 1.0715 LearningRate 0.000002 Epoch: 38 Global Step: 793150 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:56:26,070-Speed 2496.85 samples/sec Loss 1.0918 LearningRate 0.000002 Epoch: 38 Global Step: 793160 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:56:34,273-Speed 2497.21 samples/sec Loss 1.0684 LearningRate 0.000002 Epoch: 38 Global Step: 793170 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:56:42,487-Speed 2493.85 samples/sec Loss 1.0707 LearningRate 0.000002 Epoch: 38 Global Step: 793180 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:56:50,694-Speed 2495.74 samples/sec Loss 1.0486 LearningRate 0.000002 Epoch: 38 Global Step: 793190 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:56:58,904-Speed 2495.13 samples/sec Loss 1.0698 LearningRate 0.000002 Epoch: 38 Global Step: 793200 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:57:07,054-Speed 2514.70 samples/sec Loss 1.0525 LearningRate 0.000002 Epoch: 38 Global Step: 793210 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:57:15,259-Speed 2496.34 samples/sec Loss 1.0647 LearningRate 0.000002 Epoch: 38 Global Step: 793220 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:57:23,464-Speed 2496.55 samples/sec Loss 1.0710 LearningRate 0.000002 Epoch: 38 Global Step: 793230 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:57:31,675-Speed 2494.67 samples/sec Loss 1.0853 LearningRate 0.000002 Epoch: 38 Global Step: 793240 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:57:39,885-Speed 2494.99 samples/sec Loss 1.0242 LearningRate 0.000002 Epoch: 38 Global Step: 793250 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:57:48,088-Speed 2496.98 samples/sec Loss 1.0856 LearningRate 0.000002 Epoch: 38 Global Step: 793260 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:57:56,243-Speed 2511.60 samples/sec Loss 1.0742 LearningRate 0.000002 Epoch: 38 Global Step: 793270 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:58:04,448-Speed 2496.68 samples/sec Loss 1.0437 LearningRate 0.000002 Epoch: 38 Global Step: 793280 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:58:12,654-Speed 2496.55 samples/sec Loss 1.0758 LearningRate 0.000002 Epoch: 38 Global Step: 793290 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:58:20,871-Speed 2492.43 samples/sec Loss 1.0983 LearningRate 0.000002 Epoch: 38 Global Step: 793300 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:58:29,077-Speed 2496.54 samples/sec Loss 1.0401 LearningRate 0.000002 Epoch: 38 Global Step: 793310 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:58:37,283-Speed 2496.02 samples/sec Loss 1.0683 LearningRate 0.000002 Epoch: 38 Global Step: 793320 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:58:45,441-Speed 2510.78 samples/sec Loss 1.0821 LearningRate 0.000002 Epoch: 38 Global Step: 793330 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:58:53,644-Speed 2497.04 samples/sec Loss 1.0742 LearningRate 0.000002 Epoch: 38 Global Step: 793340 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:59:01,848-Speed 2496.66 samples/sec Loss 1.0744 LearningRate 0.000002 Epoch: 38 Global Step: 793350 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:59:10,054-Speed 2496.30 samples/sec Loss 1.0518 LearningRate 0.000002 Epoch: 38 Global Step: 793360 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:59:18,269-Speed 2493.12 samples/sec Loss 1.0602 LearningRate 0.000002 Epoch: 38 Global Step: 793370 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:59:26,473-Speed 2496.92 samples/sec Loss 1.0765 LearningRate 0.000002 Epoch: 38 Global Step: 793380 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:59:34,627-Speed 2511.95 samples/sec Loss 1.0600 LearningRate 0.000002 Epoch: 38 Global Step: 793390 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:59:42,841-Speed 2493.69 samples/sec Loss 1.0507 LearningRate 0.000002 Epoch: 38 Global Step: 793400 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:59:51,046-Speed 2496.24 samples/sec Loss 1.0480 LearningRate 0.000002 Epoch: 38 Global Step: 793410 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 03:59:59,258-Speed 2494.30 samples/sec Loss 1.0561 LearningRate 0.000002 Epoch: 38 Global Step: 793420 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:00:07,476-Speed 2492.63 samples/sec Loss 1.0777 LearningRate 0.000002 Epoch: 38 Global Step: 793430 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:00:15,680-Speed 2497.03 samples/sec Loss 1.0643 LearningRate 0.000002 Epoch: 38 Global Step: 793440 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:00:23,829-Speed 2513.40 samples/sec Loss 1.0496 LearningRate 0.000002 Epoch: 38 Global Step: 793450 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:00:32,032-Speed 2497.10 samples/sec Loss 1.0796 LearningRate 0.000002 Epoch: 38 Global Step: 793460 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:00:40,237-Speed 2496.67 samples/sec Loss 1.0411 LearningRate 0.000002 Epoch: 38 Global Step: 793470 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:00:48,445-Speed 2495.78 samples/sec Loss 1.0762 LearningRate 0.000002 Epoch: 38 Global Step: 793480 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:00:56,650-Speed 2496.32 samples/sec Loss 1.0665 LearningRate 0.000002 Epoch: 38 Global Step: 793490 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:01:04,865-Speed 2493.45 samples/sec Loss 1.0884 LearningRate 0.000002 Epoch: 38 Global Step: 793500 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:01:13,014-Speed 2513.37 samples/sec Loss 1.0650 LearningRate 0.000002 Epoch: 38 Global Step: 793510 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:01:21,222-Speed 2495.65 samples/sec Loss 1.0516 LearningRate 0.000002 Epoch: 38 Global Step: 793520 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:01:29,430-Speed 2495.44 samples/sec Loss 1.0646 LearningRate 0.000002 Epoch: 38 Global Step: 793530 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:01:37,632-Speed 2497.53 samples/sec Loss 1.0406 LearningRate 0.000002 Epoch: 38 Global Step: 793540 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:01:45,833-Speed 2497.83 samples/sec Loss 1.0743 LearningRate 0.000002 Epoch: 38 Global Step: 793550 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:01:54,037-Speed 2497.02 samples/sec Loss 1.0965 LearningRate 0.000002 Epoch: 38 Global Step: 793560 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:02:02,188-Speed 2513.06 samples/sec Loss 1.0745 LearningRate 0.000002 Epoch: 38 Global Step: 793570 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:02:10,401-Speed 2493.77 samples/sec Loss 1.0798 LearningRate 0.000002 Epoch: 38 Global Step: 793580 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:02:18,604-Speed 2497.17 samples/sec Loss 1.0592 LearningRate 0.000002 Epoch: 38 Global Step: 793590 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:02:26,807-Speed 2497.20 samples/sec Loss 1.0743 LearningRate 0.000002 Epoch: 38 Global Step: 793600 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:02:35,019-Speed 2494.17 samples/sec Loss 1.0698 LearningRate 0.000002 Epoch: 38 Global Step: 793610 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:02:43,221-Speed 2497.12 samples/sec Loss 1.0580 LearningRate 0.000002 Epoch: 38 Global Step: 793620 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:02:51,373-Speed 2512.81 samples/sec Loss 1.0857 LearningRate 0.000002 Epoch: 38 Global Step: 793630 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:02:59,581-Speed 2495.54 samples/sec Loss 1.0954 LearningRate 0.000002 Epoch: 38 Global Step: 793640 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:03:07,794-Speed 2494.01 samples/sec Loss 1.0951 LearningRate 0.000002 Epoch: 38 Global Step: 793650 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:03:15,999-Speed 2496.52 samples/sec Loss 1.0885 LearningRate 0.000002 Epoch: 38 Global Step: 793660 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:03:24,213-Speed 2493.52 samples/sec Loss 1.0646 LearningRate 0.000002 Epoch: 38 Global Step: 793670 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:03:32,418-Speed 2496.63 samples/sec Loss 1.0735 LearningRate 0.000002 Epoch: 38 Global Step: 793680 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:03:40,574-Speed 2511.38 samples/sec Loss 1.0677 LearningRate 0.000002 Epoch: 38 Global Step: 793690 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:03:48,776-Speed 2497.45 samples/sec Loss 1.0842 LearningRate 0.000002 Epoch: 38 Global Step: 793700 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:03:56,982-Speed 2496.22 samples/sec Loss 1.0695 LearningRate 0.000002 Epoch: 38 Global Step: 793710 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:04:05,184-Speed 2497.28 samples/sec Loss 1.0816 LearningRate 0.000002 Epoch: 38 Global Step: 793720 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:04:13,385-Speed 2497.43 samples/sec Loss 1.0771 LearningRate 0.000002 Epoch: 38 Global Step: 793730 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:04:21,593-Speed 2495.57 samples/sec Loss 1.0620 LearningRate 0.000002 Epoch: 38 Global Step: 793740 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:04:29,742-Speed 2514.25 samples/sec Loss 1.0780 LearningRate 0.000002 Epoch: 38 Global Step: 793750 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:04:37,947-Speed 2496.59 samples/sec Loss 1.0758 LearningRate 0.000002 Epoch: 38 Global Step: 793760 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:04:46,150-Speed 2497.09 samples/sec Loss 1.0627 LearningRate 0.000002 Epoch: 38 Global Step: 793770 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:04:54,354-Speed 2496.90 samples/sec Loss 1.0954 LearningRate 0.000002 Epoch: 38 Global Step: 793780 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:05:02,562-Speed 2495.29 samples/sec Loss 1.0671 LearningRate 0.000002 Epoch: 38 Global Step: 793790 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:05:10,769-Speed 2495.86 samples/sec Loss 1.0677 LearningRate 0.000002 Epoch: 38 Global Step: 793800 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:05:18,921-Speed 2512.82 samples/sec Loss 1.0876 LearningRate 0.000002 Epoch: 38 Global Step: 793810 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:05:27,126-Speed 2496.24 samples/sec Loss 1.0699 LearningRate 0.000002 Epoch: 38 Global Step: 793820 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:05:35,328-Speed 2497.29 samples/sec Loss 1.0738 LearningRate 0.000002 Epoch: 38 Global Step: 793830 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:05:43,544-Speed 2492.92 samples/sec Loss 1.0803 LearningRate 0.000002 Epoch: 38 Global Step: 793840 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:05:51,750-Speed 2496.22 samples/sec Loss 1.0637 LearningRate 0.000002 Epoch: 38 Global Step: 793850 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:05:59,959-Speed 2495.28 samples/sec Loss 1.0706 LearningRate 0.000002 Epoch: 38 Global Step: 793860 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:06:08,116-Speed 2511.06 samples/sec Loss 1.0836 LearningRate 0.000002 Epoch: 38 Global Step: 793870 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:06:16,321-Speed 2496.17 samples/sec Loss 1.0619 LearningRate 0.000002 Epoch: 38 Global Step: 793880 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:06:24,525-Speed 2496.84 samples/sec Loss 1.0732 LearningRate 0.000002 Epoch: 38 Global Step: 793890 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:06:32,731-Speed 2496.24 samples/sec Loss 1.0777 LearningRate 0.000002 Epoch: 38 Global Step: 793900 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:06:40,937-Speed 2496.10 samples/sec Loss 1.0757 LearningRate 0.000002 Epoch: 38 Global Step: 793910 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:06:49,143-Speed 2495.89 samples/sec Loss 1.0846 LearningRate 0.000002 Epoch: 38 Global Step: 793920 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:06:57,295-Speed 2512.94 samples/sec Loss 1.0667 LearningRate 0.000002 Epoch: 38 Global Step: 793930 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:07:05,502-Speed 2495.62 samples/sec Loss 1.0685 LearningRate 0.000002 Epoch: 38 Global Step: 793940 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:07:13,709-Speed 2495.84 samples/sec Loss 1.0597 LearningRate 0.000002 Epoch: 38 Global Step: 793950 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:07:21,922-Speed 2493.84 samples/sec Loss 1.0815 LearningRate 0.000002 Epoch: 38 Global Step: 793960 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:07:30,133-Speed 2494.81 samples/sec Loss 1.0604 LearningRate 0.000002 Epoch: 38 Global Step: 793970 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:07:38,341-Speed 2495.40 samples/sec Loss 1.0738 LearningRate 0.000002 Epoch: 38 Global Step: 793980 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:07:46,495-Speed 2512.20 samples/sec Loss 1.0947 LearningRate 0.000002 Epoch: 38 Global Step: 793990 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:07:54,707-Speed 2493.97 samples/sec Loss 1.0819 LearningRate 0.000002 Epoch: 38 Global Step: 794000 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:08:02,916-Speed 2495.38 samples/sec Loss 1.0907 LearningRate 0.000002 Epoch: 38 Global Step: 794010 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:08:11,124-Speed 2495.74 samples/sec Loss 1.0345 LearningRate 0.000002 Epoch: 38 Global Step: 794020 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:08:19,327-Speed 2497.05 samples/sec Loss 1.0402 LearningRate 0.000002 Epoch: 38 Global Step: 794030 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:08:27,531-Speed 2496.76 samples/sec Loss 1.0766 LearningRate 0.000002 Epoch: 38 Global Step: 794040 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:08:35,679-Speed 2513.61 samples/sec Loss 1.0880 LearningRate 0.000002 Epoch: 38 Global Step: 794050 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:08:43,885-Speed 2496.33 samples/sec Loss 1.0793 LearningRate 0.000002 Epoch: 38 Global Step: 794060 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:08:52,087-Speed 2497.30 samples/sec Loss 1.0603 LearningRate 0.000002 Epoch: 38 Global Step: 794070 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:09:00,293-Speed
2496.22 samples/sec Loss 1.0800 LearningRate 0.000002 Epoch: 38 Global Step: 794080 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:09:08,497-Speed 2496.53 samples/sec Loss 1.0844 LearningRate 0.000002 Epoch: 38 Global Step: 794090 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:09:16,703-Speed 2496.53 samples/sec Loss 1.0708 LearningRate 0.000002 Epoch: 38 Global Step: 794100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:09:24,859-Speed 2511.28 samples/sec Loss 1.0824 LearningRate 0.000002 Epoch: 38 Global Step: 794110 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:09:33,075-Speed 2493.02 samples/sec Loss 1.0530 LearningRate 0.000002 Epoch: 38 Global Step: 794120 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:09:41,281-Speed 2496.25 samples/sec Loss 1.0990 LearningRate 0.000002 Epoch: 38 Global Step: 794130 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:09:49,489-Speed 2495.78 samples/sec Loss 1.0619 LearningRate 0.000002 Epoch: 38 Global Step: 794140 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:09:57,696-Speed 2495.74 samples/sec Loss 1.0801 LearningRate 0.000002 Epoch: 38 Global Step: 794150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:10:05,903-Speed 2495.91 samples/sec Loss 1.0542 LearningRate 0.000002 Epoch: 38 Global Step: 794160 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:10:14,054-Speed 2512.96 samples/sec Loss 1.0616 LearningRate 0.000002 Epoch: 38 Global Step: 794170 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:10:22,260-Speed 2495.99 samples/sec Loss 1.0812 LearningRate 0.000002 Epoch: 38 Global Step: 794180 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:10:30,468-Speed 2495.34 samples/sec Loss 1.0519 LearningRate 0.000002 Epoch: 38 Global Step: 794190 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:10:38,673-Speed 2496.42 samples/sec 
Loss 1.0843 LearningRate 0.000002 Epoch: 38 Global Step: 794200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:10:46,882-Speed 2495.40 samples/sec Loss 1.0725 LearningRate 0.000002 Epoch: 38 Global Step: 794210 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:10:55,088-Speed 2496.06 samples/sec Loss 1.0751 LearningRate 0.000002 Epoch: 38 Global Step: 794220 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:11:03,254-Speed 2508.52 samples/sec Loss 1.0973 LearningRate 0.000002 Epoch: 38 Global Step: 794230 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:11:11,458-Speed 2496.75 samples/sec Loss 1.0600 LearningRate 0.000002 Epoch: 38 Global Step: 794240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:11:19,668-Speed 2494.91 samples/sec Loss 1.0992 LearningRate 0.000002 Epoch: 38 Global Step: 794250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:11:27,876-Speed 2495.29 samples/sec Loss 1.0801 LearningRate 0.000002 Epoch: 38 Global Step: 794260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:11:36,080-Speed 2496.93 samples/sec Loss 1.0781 LearningRate 0.000002 Epoch: 38 Global Step: 794270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:11:44,284-Speed 2496.77 samples/sec Loss 1.0637 LearningRate 0.000002 Epoch: 38 Global Step: 794280 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:11:52,435-Speed 2513.03 samples/sec Loss 1.0889 LearningRate 0.000002 Epoch: 38 Global Step: 794290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:12:00,606-Speed 2506.70 samples/sec Loss 1.0656 LearningRate 0.000002 Epoch: 38 Global Step: 794300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:12:08,810-Speed 2496.78 samples/sec Loss 1.0894 LearningRate 0.000002 Epoch: 38 Global Step: 794310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:12:17,014-Speed 2496.93 samples/sec Loss 1.0840 
LearningRate 0.000002 Epoch: 38 Global Step: 794320 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:12:25,234-Speed 2491.99 samples/sec Loss 1.0732 LearningRate 0.000002 Epoch: 38 Global Step: 794330 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:12:33,442-Speed 2495.21 samples/sec Loss 1.0719 LearningRate 0.000002 Epoch: 38 Global Step: 794340 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:12:41,594-Speed 2512.68 samples/sec Loss 1.0441 LearningRate 0.000002 Epoch: 38 Global Step: 794350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:12:49,804-Speed 2494.94 samples/sec Loss 1.0810 LearningRate 0.000002 Epoch: 38 Global Step: 794360 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:12:58,024-Speed 2491.87 samples/sec Loss 1.0876 LearningRate 0.000002 Epoch: 38 Global Step: 794370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:13:06,234-Speed 2494.92 samples/sec Loss 1.0814 LearningRate 0.000002 Epoch: 38 Global Step: 794380 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:13:14,438-Speed 2496.91 samples/sec Loss 1.0607 LearningRate 0.000002 Epoch: 38 Global Step: 794390 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:13:22,647-Speed 2495.37 samples/sec Loss 1.0532 LearningRate 0.000002 Epoch: 38 Global Step: 794400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:13:30,801-Speed 2512.00 samples/sec Loss 1.0981 LearningRate 0.000002 Epoch: 38 Global Step: 794410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:13:39,006-Speed 2496.32 samples/sec Loss 1.0584 LearningRate 0.000002 Epoch: 38 Global Step: 794420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:13:47,213-Speed 2495.77 samples/sec Loss 1.0397 LearningRate 0.000002 Epoch: 38 Global Step: 794430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:13:55,420-Speed 2496.05 samples/sec Loss 1.1003 LearningRate 
0.000002 Epoch: 38 Global Step: 794440 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:14:03,625-Speed 2496.62 samples/sec Loss 1.0408 LearningRate 0.000002 Epoch: 38 Global Step: 794450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:14:11,833-Speed 2495.37 samples/sec Loss 1.0540 LearningRate 0.000002 Epoch: 38 Global Step: 794460 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:14:19,983-Speed 2513.29 samples/sec Loss 1.0602 LearningRate 0.000002 Epoch: 38 Global Step: 794470 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:14:28,190-Speed 2495.98 samples/sec Loss 1.0553 LearningRate 0.000002 Epoch: 38 Global Step: 794480 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:14:36,394-Speed 2496.49 samples/sec Loss 1.0282 LearningRate 0.000002 Epoch: 38 Global Step: 794490 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:14:44,600-Speed 2496.18 samples/sec Loss 1.0667 LearningRate 0.000002 Epoch: 38 Global Step: 794500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:14:52,807-Speed 2496.07 samples/sec Loss 1.0543 LearningRate 0.000002 Epoch: 38 Global Step: 794510 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:15:01,014-Speed 2495.64 samples/sec Loss 1.0829 LearningRate 0.000002 Epoch: 38 Global Step: 794520 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:15:09,169-Speed 2511.65 samples/sec Loss 1.0624 LearningRate 0.000002 Epoch: 38 Global Step: 794530 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:15:17,374-Speed 2496.61 samples/sec Loss 1.0680 LearningRate 0.000002 Epoch: 38 Global Step: 794540 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:15:25,580-Speed 2495.93 samples/sec Loss 1.0557 LearningRate 0.000002 Epoch: 38 Global Step: 794550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:15:33,787-Speed 2495.85 samples/sec Loss 1.0591 LearningRate 0.000002 Epoch: 38 
Global Step: 794560 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:15:41,992-Speed 2496.73 samples/sec Loss 1.0499 LearningRate 0.000002 Epoch: 38 Global Step: 794570 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:15:50,195-Speed 2496.90 samples/sec Loss 1.0748 LearningRate 0.000002 Epoch: 38 Global Step: 794580 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:15:58,356-Speed 2510.12 samples/sec Loss 1.0373 LearningRate 0.000002 Epoch: 38 Global Step: 794590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:16:06,560-Speed 2496.62 samples/sec Loss 1.0548 LearningRate 0.000002 Epoch: 38 Global Step: 794600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:16:14,765-Speed 2496.26 samples/sec Loss 1.0412 LearningRate 0.000002 Epoch: 38 Global Step: 794610 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:16:22,970-Speed 2496.75 samples/sec Loss 1.0690 LearningRate 0.000002 Epoch: 38 Global Step: 794620 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:16:31,180-Speed 2495.13 samples/sec Loss 1.0586 LearningRate 0.000002 Epoch: 38 Global Step: 794630 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:16:39,389-Speed 2495.21 samples/sec Loss 1.0459 LearningRate 0.000002 Epoch: 38 Global Step: 794640 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:16:47,539-Speed 2513.03 samples/sec Loss 1.0603 LearningRate 0.000002 Epoch: 38 Global Step: 794650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:16:55,763-Speed 2490.86 samples/sec Loss 1.0495 LearningRate 0.000002 Epoch: 38 Global Step: 794660 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:17:03,970-Speed 2495.70 samples/sec Loss 1.0791 LearningRate 0.000002 Epoch: 38 Global Step: 794670 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:17:12,181-Speed 2494.50 samples/sec Loss 1.0429 LearningRate 0.000002 Epoch: 38 Global Step: 794680 
Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:17:20,394-Speed 2493.90 samples/sec Loss 1.0836 LearningRate 0.000002 Epoch: 38 Global Step: 794690 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:17:28,603-Speed 2495.56 samples/sec Loss 1.0725 LearningRate 0.000002 Epoch: 38 Global Step: 794700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:17:36,758-Speed 2511.77 samples/sec Loss 1.0689 LearningRate 0.000002 Epoch: 38 Global Step: 794710 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:17:44,965-Speed 2495.59 samples/sec Loss 1.0553 LearningRate 0.000002 Epoch: 38 Global Step: 794720 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:17:53,173-Speed 2495.53 samples/sec Loss 1.0505 LearningRate 0.000002 Epoch: 38 Global Step: 794730 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:18:01,382-Speed 2495.26 samples/sec Loss 1.0564 LearningRate 0.000002 Epoch: 38 Global Step: 794740 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:18:09,592-Speed 2495.07 samples/sec Loss 1.0631 LearningRate 0.000002 Epoch: 38 Global Step: 794750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:18:17,806-Speed 2493.76 samples/sec Loss 1.0813 LearningRate 0.000002 Epoch: 38 Global Step: 794760 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:18:25,961-Speed 2511.74 samples/sec Loss 1.0936 LearningRate 0.000002 Epoch: 38 Global Step: 794770 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:18:34,169-Speed 2495.33 samples/sec Loss 1.0741 LearningRate 0.000002 Epoch: 38 Global Step: 794780 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:18:42,376-Speed 2495.80 samples/sec Loss 1.0710 LearningRate 0.000002 Epoch: 38 Global Step: 794790 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:18:50,589-Speed 2494.15 samples/sec Loss 1.0862 LearningRate 0.000002 Epoch: 38 Global Step: 794800 Fp16 Grad Scale: 
16384 Required: 8 hours Training: 2022-07-13 04:18:58,795-Speed 2496.19 samples/sec Loss 1.0619 LearningRate 0.000002 Epoch: 38 Global Step: 794810 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:19:06,998-Speed 2496.75 samples/sec Loss 1.0555 LearningRate 0.000002 Epoch: 38 Global Step: 794820 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:19:15,151-Speed 2512.41 samples/sec Loss 1.0778 LearningRate 0.000002 Epoch: 38 Global Step: 794830 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:19:23,369-Speed 2492.84 samples/sec Loss 1.0781 LearningRate 0.000002 Epoch: 38 Global Step: 794840 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:19:31,580-Speed 2494.51 samples/sec Loss 1.0638 LearningRate 0.000002 Epoch: 38 Global Step: 794850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:19:39,788-Speed 2495.31 samples/sec Loss 1.0285 LearningRate 0.000002 Epoch: 38 Global Step: 794860 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:19:47,993-Speed 2496.55 samples/sec Loss 1.0681 LearningRate 0.000002 Epoch: 38 Global Step: 794870 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:19:56,208-Speed 2493.67 samples/sec Loss 1.1034 LearningRate 0.000002 Epoch: 38 Global Step: 794880 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:20:04,359-Speed 2512.92 samples/sec Loss 1.0759 LearningRate 0.000002 Epoch: 38 Global Step: 794890 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:20:12,575-Speed 2492.92 samples/sec Loss 1.0640 LearningRate 0.000002 Epoch: 38 Global Step: 794900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:20:20,781-Speed 2496.23 samples/sec Loss 1.0724 LearningRate 0.000002 Epoch: 38 Global Step: 794910 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:20:28,986-Speed 2496.19 samples/sec Loss 1.0404 LearningRate 0.000002 Epoch: 38 Global Step: 794920 Fp16 Grad Scale: 16384 Required: 8 
hours Training: 2022-07-13 04:20:37,194-Speed 2495.66 samples/sec Loss 1.0756 LearningRate 0.000002 Epoch: 38 Global Step: 794930 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:20:45,396-Speed 2497.79 samples/sec Loss 1.0412 LearningRate 0.000002 Epoch: 38 Global Step: 794940 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:20:53,572-Speed 2505.21 samples/sec Loss 1.0492 LearningRate 0.000002 Epoch: 38 Global Step: 794950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:21:01,778-Speed 2496.25 samples/sec Loss 1.0376 LearningRate 0.000002 Epoch: 38 Global Step: 794960 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:21:09,983-Speed 2496.87 samples/sec Loss 1.0859 LearningRate 0.000002 Epoch: 38 Global Step: 794970 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:21:18,187-Speed 2496.67 samples/sec Loss 1.0606 LearningRate 0.000002 Epoch: 38 Global Step: 794980 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:21:26,389-Speed 2497.20 samples/sec Loss 1.0611 LearningRate 0.000002 Epoch: 38 Global Step: 794990 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:21:34,593-Speed 2497.03 samples/sec Loss 1.0764 LearningRate 0.000002 Epoch: 38 Global Step: 795000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:21:42,747-Speed 2511.83 samples/sec Loss 1.0657 LearningRate 0.000002 Epoch: 38 Global Step: 795010 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:21:50,958-Speed 2494.89 samples/sec Loss 1.0488 LearningRate 0.000002 Epoch: 38 Global Step: 795020 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:21:59,165-Speed 2495.90 samples/sec Loss 1.0779 LearningRate 0.000002 Epoch: 38 Global Step: 795030 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:22:07,370-Speed 2496.61 samples/sec Loss 1.0616 LearningRate 0.000002 Epoch: 38 Global Step: 795040 Fp16 Grad Scale: 16384 Required: 8 hours Training: 
2022-07-13 04:22:15,578-Speed 2495.50 samples/sec Loss 1.0504 LearningRate 0.000002 Epoch: 38 Global Step: 795050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:22:23,783-Speed 2496.18 samples/sec Loss 1.0476 LearningRate 0.000002 Epoch: 38 Global Step: 795060 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:22:31,938-Speed 2512.12 samples/sec Loss 1.0545 LearningRate 0.000002 Epoch: 38 Global Step: 795070 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:22:40,148-Speed 2494.77 samples/sec Loss 1.0731 LearningRate 0.000002 Epoch: 38 Global Step: 795080 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:22:48,355-Speed 2495.94 samples/sec Loss 1.0333 LearningRate 0.000002 Epoch: 38 Global Step: 795090 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:22:56,566-Speed 2494.35 samples/sec Loss 1.0835 LearningRate 0.000002 Epoch: 38 Global Step: 795100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:23:04,773-Speed 2495.96 samples/sec Loss 1.0390 LearningRate 0.000002 Epoch: 38 Global Step: 795110 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:23:12,980-Speed 2496.15 samples/sec Loss 1.0766 LearningRate 0.000002 Epoch: 38 Global Step: 795120 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:23:21,132-Speed 2512.41 samples/sec Loss 1.0706 LearningRate 0.000002 Epoch: 38 Global Step: 795130 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:23:29,344-Speed 2494.45 samples/sec Loss 1.0851 LearningRate 0.000002 Epoch: 38 Global Step: 795140 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:23:37,548-Speed 2496.60 samples/sec Loss 1.0529 LearningRate 0.000002 Epoch: 38 Global Step: 795150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:23:45,753-Speed 2496.66 samples/sec Loss 1.0788 LearningRate 0.000002 Epoch: 38 Global Step: 795160 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 
04:23:53,963-Speed 2494.85 samples/sec Loss 1.0682 LearningRate 0.000002 Epoch: 38 Global Step: 795170 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:24:02,170-Speed 2495.88 samples/sec Loss 1.0411 LearningRate 0.000002 Epoch: 38 Global Step: 795180 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:24:10,337-Speed 2508.26 samples/sec Loss 1.0836 LearningRate 0.000002 Epoch: 38 Global Step: 795190 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:24:18,543-Speed 2495.95 samples/sec Loss 1.1065 LearningRate 0.000002 Epoch: 38 Global Step: 795200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:24:26,757-Speed 2493.57 samples/sec Loss 1.0673 LearningRate 0.000002 Epoch: 38 Global Step: 795210 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:24:34,967-Speed 2494.93 samples/sec Loss 1.0603 LearningRate 0.000002 Epoch: 38 Global Step: 795220 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:24:43,170-Speed 2497.37 samples/sec Loss 1.0502 LearningRate 0.000002 Epoch: 38 Global Step: 795230 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:24:51,373-Speed 2496.91 samples/sec Loss 1.1003 LearningRate 0.000002 Epoch: 38 Global Step: 795240 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:24:59,529-Speed 2511.29 samples/sec Loss 1.0849 LearningRate 0.000002 Epoch: 38 Global Step: 795250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:25:07,740-Speed 2494.79 samples/sec Loss 1.0806 LearningRate 0.000002 Epoch: 38 Global Step: 795260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:25:15,961-Speed 2491.67 samples/sec Loss 1.0609 LearningRate 0.000002 Epoch: 38 Global Step: 795270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:25:24,169-Speed 2495.50 samples/sec Loss 1.0649 LearningRate 0.000002 Epoch: 38 Global Step: 795280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:25:32,382-Speed 
2493.69 samples/sec Loss 1.0744 LearningRate 0.000002 Epoch: 38 Global Step: 795290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:25:40,606-Speed 2491.15 samples/sec Loss 1.0923 LearningRate 0.000002 Epoch: 38 Global Step: 795300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:25:48,762-Speed 2511.52 samples/sec Loss 1.0464 LearningRate 0.000002 Epoch: 38 Global Step: 795310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:25:56,968-Speed 2495.91 samples/sec Loss 1.0751 LearningRate 0.000002 Epoch: 38 Global Step: 795320 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:26:05,180-Speed 2494.60 samples/sec Loss 1.0507 LearningRate 0.000002 Epoch: 38 Global Step: 795330 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:26:13,398-Speed 2492.41 samples/sec Loss 1.0762 LearningRate 0.000002 Epoch: 38 Global Step: 795340 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:26:21,605-Speed 2496.21 samples/sec Loss 1.0648 LearningRate 0.000002 Epoch: 38 Global Step: 795350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:26:29,810-Speed 2496.32 samples/sec Loss 1.0700 LearningRate 0.000002 Epoch: 38 Global Step: 795360 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:26:37,963-Speed 2512.49 samples/sec Loss 1.0627 LearningRate 0.000002 Epoch: 38 Global Step: 795370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:26:46,168-Speed 2496.22 samples/sec Loss 1.0713 LearningRate 0.000002 Epoch: 38 Global Step: 795380 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:26:54,374-Speed 2496.25 samples/sec Loss 1.0647 LearningRate 0.000002 Epoch: 38 Global Step: 795390 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:27:02,591-Speed 2492.65 samples/sec Loss 1.0925 LearningRate 0.000002 Epoch: 38 Global Step: 795400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:27:10,804-Speed 2494.04 samples/sec 
Loss 1.0445 LearningRate 0.000002 Epoch: 38 Global Step: 795410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:27:19,015-Speed 2494.45 samples/sec Loss 1.0514 LearningRate 0.000002 Epoch: 38 Global Step: 795420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:27:27,168-Speed 2512.40 samples/sec Loss 1.0465 LearningRate 0.000002 Epoch: 38 Global Step: 795430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:27:35,392-Speed 2490.80 samples/sec Loss 1.0703 LearningRate 0.000002 Epoch: 38 Global Step: 795440 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:27:43,600-Speed 2495.46 samples/sec Loss 1.0782 LearningRate 0.000002 Epoch: 38 Global Step: 795450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:27:51,809-Speed 2495.62 samples/sec Loss 1.1033 LearningRate 0.000002 Epoch: 38 Global Step: 795460 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:28:00,018-Speed 2495.83 samples/sec Loss 1.0795 LearningRate 0.000002 Epoch: 38 Global Step: 795470 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:28:08,225-Speed 2495.70 samples/sec Loss 1.0901 LearningRate 0.000002 Epoch: 38 Global Step: 795480 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:28:16,379-Speed 2511.76 samples/sec Loss 1.0664 LearningRate 0.000002 Epoch: 38 Global Step: 795490 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-07-13 04:28:24,586-Speed 2495.95 samples/sec Loss 1.0646 LearningRate 0.000002 Epoch: 38 Global Step: 795500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:28:32,791-Speed 2496.34 samples/sec Loss 1.0278 LearningRate 0.000002 Epoch: 38 Global Step: 795510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:28:41,000-Speed 2495.27 samples/sec Loss 1.0527 LearningRate 0.000002 Epoch: 38 Global Step: 795520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:28:49,209-Speed 2495.21 samples/sec Loss 1.0700 
LearningRate 0.000002 Epoch: 38 Global Step: 795530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:28:57,416-Speed 2495.65 samples/sec Loss 1.0342 LearningRate 0.000002 Epoch: 38 Global Step: 795540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:29:05,569-Speed 2512.70 samples/sec Loss 1.0817 LearningRate 0.000002 Epoch: 38 Global Step: 795550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:29:13,777-Speed 2495.33 samples/sec Loss 1.0575 LearningRate 0.000002 Epoch: 38 Global Step: 795560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:29:21,987-Speed 2494.97 samples/sec Loss 1.0691 LearningRate 0.000002 Epoch: 38 Global Step: 795570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:29:30,197-Speed 2495.03 samples/sec Loss 1.0402 LearningRate 0.000002 Epoch: 38 Global Step: 795580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:29:38,406-Speed 2495.18 samples/sec Loss 1.0745 LearningRate 0.000002 Epoch: 38 Global Step: 795590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:29:46,633-Speed 2489.73 samples/sec Loss 1.0625 LearningRate 0.000002 Epoch: 38 Global Step: 795600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:29:54,788-Speed 2511.62 samples/sec Loss 1.0741 LearningRate 0.000002 Epoch: 38 Global Step: 795610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:30:03,005-Speed 2493.08 samples/sec Loss 1.0880 LearningRate 0.000002 Epoch: 38 Global Step: 795620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:30:11,222-Speed 2492.66 samples/sec Loss 1.0908 LearningRate 0.000002 Epoch: 38 Global Step: 795630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:30:19,434-Speed 2494.46 samples/sec Loss 1.0604 LearningRate 0.000002 Epoch: 38 Global Step: 795640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:30:27,640-Speed 2496.12 samples/sec Loss 1.1033 LearningRate 
0.000002 Epoch: 38 Global Step: 795650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:30:35,848-Speed 2495.49 samples/sec Loss 1.0847 LearningRate 0.000002 Epoch: 38 Global Step: 795660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:30:44,008-Speed 2510.22 samples/sec Loss 1.0746 LearningRate 0.000002 Epoch: 38 Global Step: 795670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:30:52,217-Speed 2495.66 samples/sec Loss 1.0466 LearningRate 0.000002 Epoch: 38 Global Step: 795680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:31:00,427-Speed 2495.38 samples/sec Loss 1.0674 LearningRate 0.000002 Epoch: 38 Global Step: 795690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:31:08,637-Speed 2494.74 samples/sec Loss 1.0514 LearningRate 0.000002 Epoch: 38 Global Step: 795700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:31:16,849-Speed 2494.24 samples/sec Loss 1.0437 LearningRate 0.000002 Epoch: 38 Global Step: 795710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:31:25,057-Speed 2495.70 samples/sec Loss 1.0734 LearningRate 0.000002 Epoch: 38 Global Step: 795720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:31:33,216-Speed 2510.44 samples/sec Loss 1.0700 LearningRate 0.000002 Epoch: 38 Global Step: 795730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:31:41,423-Speed 2495.73 samples/sec Loss 1.0757 LearningRate 0.000002 Epoch: 38 Global Step: 795740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:31:49,634-Speed 2494.59 samples/sec Loss 1.0777 LearningRate 0.000002 Epoch: 38 Global Step: 795750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:31:57,840-Speed 2496.20 samples/sec Loss 1.0763 LearningRate 0.000002 Epoch: 38 Global Step: 795760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-07-13 04:32:06,050-Speed 2495.39 samples/sec Loss 1.0708 LearningRate 0.000002 Epoch: 38 
Global Step: 795770 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:32:14,262-Speed 2494.37 samples/sec Loss 1.0671 LearningRate 0.000002 Epoch: 38 Global Step: 795780 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:32:22,428-Speed 2508.34 samples/sec Loss 1.0628 LearningRate 0.000002 Epoch: 38 Global Step: 795790 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:32:30,637-Speed 2495.17 samples/sec Loss 1.0662 LearningRate 0.000002 Epoch: 38 Global Step: 795800 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:32:38,842-Speed 2496.62 samples/sec Loss 1.0349 LearningRate 0.000002 Epoch: 38 Global Step: 795810 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:32:47,047-Speed 2496.27 samples/sec Loss 1.0707 LearningRate 0.000002 Epoch: 38 Global Step: 795820 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:32:55,262-Speed 2493.63 samples/sec Loss 1.0748 LearningRate 0.000002 Epoch: 38 Global Step: 795830 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:33:03,469-Speed 2495.58 samples/sec Loss 1.0331 LearningRate 0.000002 Epoch: 38 Global Step: 795840 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:33:11,621-Speed 2512.84 samples/sec Loss 1.0696 LearningRate 0.000002 Epoch: 38 Global Step: 795850 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:33:19,826-Speed 2496.45 samples/sec Loss 1.0956 LearningRate 0.000002 Epoch: 38 Global Step: 795860 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:33:28,032-Speed 2496.13 samples/sec Loss 1.0414 LearningRate 0.000002 Epoch: 38 Global Step: 795870 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:33:36,239-Speed 2495.67 samples/sec Loss 1.0828 LearningRate 0.000002 Epoch: 38 Global Step: 795880 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:33:44,456-Speed 2492.98 samples/sec Loss 1.0753 LearningRate 0.000002 Epoch: 38 Global Step: 795890 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:33:52,663-Speed 2495.97 samples/sec Loss 1.0750 LearningRate 0.000002 Epoch: 38 Global Step: 795900 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:34:00,815-Speed 2512.64 samples/sec Loss 1.0827 LearningRate 0.000002 Epoch: 38 Global Step: 795910 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:34:09,021-Speed 2496.38 samples/sec Loss 1.0515 LearningRate 0.000002 Epoch: 38 Global Step: 795920 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:34:17,227-Speed 2496.17 samples/sec Loss 1.0495 LearningRate 0.000002 Epoch: 38 Global Step: 795930 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:34:25,438-Speed 2494.50 samples/sec Loss 1.0811 LearningRate 0.000002 Epoch: 38 Global Step: 795940 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:34:33,658-Speed 2491.80 samples/sec Loss 1.0939 LearningRate 0.000002 Epoch: 38 Global Step: 795950 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:34:41,883-Speed 2490.68 samples/sec Loss 1.0799 LearningRate 0.000002 Epoch: 38 Global Step: 795960 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:34:50,039-Speed 2511.52 samples/sec Loss 1.0407 LearningRate 0.000002 Epoch: 38 Global Step: 795970 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:34:58,246-Speed 2495.66 samples/sec Loss 1.0678 LearningRate 0.000002 Epoch: 38 Global Step: 795980 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:35:06,456-Speed 2495.14 samples/sec Loss 1.0678 LearningRate 0.000002 Epoch: 38 Global Step: 795990 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:35:14,663-Speed 2495.63 samples/sec Loss 1.0561 LearningRate 0.000002 Epoch: 38 Global Step: 796000 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-07-13 04:35:22,826-Speed 2509.32 samples/sec Loss 1.0711 LearningRate 0.000002 Epoch: 38 Global Step: 796010 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:35:31,030-Speed 2496.68 samples/sec Loss 1.0382 LearningRate 0.000002 Epoch: 38 Global Step: 796020 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:35:39,182-Speed 2512.49 samples/sec Loss 1.0185 LearningRate 0.000002 Epoch: 38 Global Step: 796030 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:35:47,386-Speed 2496.99 samples/sec Loss 1.0711 LearningRate 0.000002 Epoch: 38 Global Step: 796040 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:35:55,590-Speed 2496.60 samples/sec Loss 1.0701 LearningRate 0.000002 Epoch: 38 Global Step: 796050 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:36:03,798-Speed 2495.40 samples/sec Loss 1.0461 LearningRate 0.000002 Epoch: 38 Global Step: 796060 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:36:12,005-Speed 2495.96 samples/sec Loss 1.0692 LearningRate 0.000002 Epoch: 38 Global Step: 796070 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:36:20,213-Speed 2495.62 samples/sec Loss 1.0895 LearningRate 0.000002 Epoch: 38 Global Step: 796080 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:36:28,365-Speed 2512.60 samples/sec Loss 1.0779 LearningRate 0.000002 Epoch: 38 Global Step: 796090 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:36:36,580-Speed 2493.95 samples/sec Loss 1.0803 LearningRate 0.000002 Epoch: 38 Global Step: 796100 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:36:44,784-Speed 2496.60 samples/sec Loss 1.0803 LearningRate 0.000002 Epoch: 38 Global Step: 796110 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:36:52,993-Speed 2495.20 samples/sec Loss 1.0725 LearningRate 0.000002 Epoch: 38 Global Step: 796120 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:37:01,196-Speed 2497.00 samples/sec Loss 1.0935 LearningRate 0.000002 Epoch: 38 Global Step: 796130 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:37:09,404-Speed 2495.67 samples/sec Loss 1.0599 LearningRate 0.000002 Epoch: 38 Global Step: 796140 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:37:17,553-Speed 2513.68 samples/sec Loss 1.0648 LearningRate 0.000002 Epoch: 38 Global Step: 796150 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:37:25,770-Speed 2492.81 samples/sec Loss 1.0617 LearningRate 0.000002 Epoch: 38 Global Step: 796160 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:37:33,975-Speed 2496.65 samples/sec Loss 1.0509 LearningRate 0.000002 Epoch: 38 Global Step: 796170 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:37:42,195-Speed 2491.87 samples/sec Loss 1.0517 LearningRate 0.000002 Epoch: 38 Global Step: 796180 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:37:50,409-Speed 2494.00 samples/sec Loss 1.0500 LearningRate 0.000002 Epoch: 38 Global Step: 796190 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:37:58,617-Speed 2495.66 samples/sec Loss 1.0627 LearningRate 0.000002 Epoch: 38 Global Step: 796200 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:38:06,773-Speed 2511.14 samples/sec Loss 1.0880 LearningRate 0.000002 Epoch: 38 Global Step: 796210 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:38:14,980-Speed 2495.94 samples/sec Loss 1.0566 LearningRate 0.000002 Epoch: 38 Global Step: 796220 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:38:23,183-Speed 2496.94 samples/sec Loss 1.0548 LearningRate 0.000002 Epoch: 38 Global Step: 796230 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:38:31,389-Speed 2496.30 samples/sec Loss 1.0917 LearningRate 0.000002 Epoch: 38 Global Step: 796240 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:38:39,593-Speed 2496.49 samples/sec Loss 1.0872 LearningRate 0.000002 Epoch: 38 Global Step: 796250 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:38:47,799-Speed 2496.58 samples/sec Loss 1.0494 LearningRate 0.000002 Epoch: 38 Global Step: 796260 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:38:55,952-Speed 2512.47 samples/sec Loss 1.0595 LearningRate 0.000002 Epoch: 38 Global Step: 796270 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:39:04,159-Speed 2495.69 samples/sec Loss 1.0946 LearningRate 0.000002 Epoch: 38 Global Step: 796280 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:39:12,361-Speed 2497.58 samples/sec Loss 1.0664 LearningRate 0.000002 Epoch: 38 Global Step: 796290 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:39:20,575-Speed 2493.55 samples/sec Loss 1.0700 LearningRate 0.000002 Epoch: 38 Global Step: 796300 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:39:28,784-Speed 2495.19 samples/sec Loss 1.0799 LearningRate 0.000002 Epoch: 38 Global Step: 796310 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:39:36,994-Speed 2495.14 samples/sec Loss 1.0538 LearningRate 0.000002 Epoch: 38 Global Step: 796320 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:39:45,145-Speed 2513.11 samples/sec Loss 1.0522 LearningRate 0.000002 Epoch: 38 Global Step: 796330 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:39:53,356-Speed 2494.56 samples/sec Loss 1.0523 LearningRate 0.000002 Epoch: 38 Global Step: 796340 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:40:01,558-Speed 2497.24 samples/sec Loss 1.0506 LearningRate 0.000002 Epoch: 38 Global Step: 796350 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:40:09,765-Speed 2495.77 samples/sec Loss 1.0767 LearningRate 0.000002 Epoch: 38 Global Step: 796360 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:40:17,973-Speed 2495.64 samples/sec Loss 1.0557 LearningRate 0.000002 Epoch: 38 Global Step: 796370 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:40:26,175-Speed 2497.39 samples/sec Loss 1.0720 LearningRate 0.000002 Epoch: 38 Global Step: 796380 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:40:34,327-Speed 2512.69 samples/sec Loss 1.0761 LearningRate 0.000002 Epoch: 38 Global Step: 796390 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:40:42,540-Speed 2493.76 samples/sec Loss 1.0965 LearningRate 0.000002 Epoch: 38 Global Step: 796400 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:40:50,748-Speed 2495.51 samples/sec Loss 1.0649 LearningRate 0.000002 Epoch: 38 Global Step: 796410 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:40:58,954-Speed 2496.37 samples/sec Loss 1.0596 LearningRate 0.000002 Epoch: 38 Global Step: 796420 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:41:07,165-Speed 2494.62 samples/sec Loss 1.0592 LearningRate 0.000002 Epoch: 38 Global Step: 796430 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:41:15,383-Speed 2492.40 samples/sec Loss 1.1082 LearningRate 0.000002 Epoch: 38 Global Step: 796440 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:41:23,546-Speed 2509.40 samples/sec Loss 1.0840 LearningRate 0.000002 Epoch: 38 Global Step: 796450 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:41:31,752-Speed 2496.05 samples/sec Loss 1.0926 LearningRate 0.000002 Epoch: 38 Global Step: 796460 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:41:39,983-Speed 2488.63 samples/sec Loss 1.0513 LearningRate 0.000002 Epoch: 38 Global Step: 796470 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:41:48,188-Speed 2496.42 samples/sec Loss 1.0530 LearningRate 0.000002 Epoch: 38 Global Step: 796480 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:41:56,396-Speed 2495.55 samples/sec Loss 1.0501 LearningRate 0.000002 Epoch: 38 Global Step: 796490 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:42:04,606-Speed 2494.71 samples/sec Loss 1.0678 LearningRate 0.000002 Epoch: 38 Global Step: 796500 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:42:12,758-Speed 2512.58 samples/sec Loss 1.0880 LearningRate 0.000002 Epoch: 38 Global Step: 796510 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:42:20,980-Speed 2491.23 samples/sec Loss 1.0648 LearningRate 0.000002 Epoch: 38 Global Step: 796520 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:42:29,191-Speed 2494.91 samples/sec Loss 1.0707 LearningRate 0.000002 Epoch: 38 Global Step: 796530 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:42:37,402-Speed 2494.56 samples/sec Loss 1.0874 LearningRate 0.000002 Epoch: 38 Global Step: 796540 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:42:45,611-Speed 2495.25 samples/sec Loss 1.0353 LearningRate 0.000002 Epoch: 38 Global Step: 796550 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:42:53,821-Speed 2494.73 samples/sec Loss 1.0624 LearningRate 0.000002 Epoch: 38 Global Step: 796560 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:43:01,972-Speed 2513.10 samples/sec Loss 1.0837 LearningRate 0.000002 Epoch: 38 Global Step: 796570 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:43:10,190-Speed 2492.40 samples/sec Loss 1.0758 LearningRate 0.000002 Epoch: 38 Global Step: 796580 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:43:18,396-Speed 2496.00 samples/sec Loss 1.0611 LearningRate 0.000002 Epoch: 38 Global Step: 796590 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:43:26,603-Speed 2495.94 samples/sec Loss 1.0737 LearningRate 0.000002 Epoch: 38 Global Step: 796600 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:43:34,806-Speed 2497.50 samples/sec Loss 1.0696 LearningRate 0.000002 Epoch: 38 Global Step: 796610 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:43:43,028-Speed 2491.10 samples/sec Loss 1.0585 LearningRate 0.000002 Epoch: 38 Global Step: 796620 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:43:51,182-Speed 2512.03 samples/sec Loss 1.0537 LearningRate 0.000002 Epoch: 38 Global Step: 796630 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:43:59,396-Speed 2493.64 samples/sec Loss 1.0629 LearningRate 0.000002 Epoch: 38 Global Step: 796640 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:44:07,615-Speed 2492.38 samples/sec Loss 1.0731 LearningRate 0.000002 Epoch: 38 Global Step: 796650 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:44:15,814-Speed 2498.02 samples/sec Loss 1.0601 LearningRate 0.000002 Epoch: 38 Global Step: 796660 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:44:24,054-Speed 2486.02 samples/sec Loss 1.0745 LearningRate 0.000002 Epoch: 38 Global Step: 796670 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:44:32,259-Speed 2496.16 samples/sec Loss 1.0931 LearningRate 0.000002 Epoch: 38 Global Step: 796680 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:44:40,414-Speed 2512.13 samples/sec Loss 1.1129 LearningRate 0.000002 Epoch: 38 Global Step: 796690 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:44:48,619-Speed 2496.31 samples/sec Loss 1.0903 LearningRate 0.000002 Epoch: 38 Global Step: 796700 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:44:56,822-Speed 2497.13 samples/sec Loss 1.0776 LearningRate 0.000002 Epoch: 38 Global Step: 796710 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:45:05,025-Speed 2497.19 samples/sec Loss 1.0811 LearningRate 0.000002 Epoch: 38 Global Step: 796720 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:45:13,229-Speed 2496.75 samples/sec Loss 1.0763 LearningRate 0.000002 Epoch: 38 Global Step: 796730 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:45:21,432-Speed 2497.02 samples/sec Loss 1.0856 LearningRate 0.000002 Epoch: 38 Global Step: 796740 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:45:29,596-Speed 2508.91 samples/sec Loss 1.0323 LearningRate 0.000002 Epoch: 38 Global Step: 796750 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:45:37,800-Speed 2496.50 samples/sec Loss 1.0515 LearningRate 0.000002 Epoch: 38 Global Step: 796760 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:45:46,006-Speed 2496.58 samples/sec Loss 1.0521 LearningRate 0.000002 Epoch: 38 Global Step: 796770 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:45:54,209-Speed 2497.03 samples/sec Loss 1.0679 LearningRate 0.000002 Epoch: 38 Global Step: 796780 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-07-13 04:46:02,417-Speed 2495.61 samples/sec Loss 1.0963 LearningRate 0.000002 Epoch: 38 Global Step: 796790 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:46:10,622-Speed 2496.44 samples/sec Loss 1.0681 LearningRate 0.000002 Epoch: 38 Global Step: 796800 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:46:18,775-Speed 2512.66 samples/sec Loss 1.0781 LearningRate 0.000002 Epoch: 38 Global Step: 796810 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:46:26,977-Speed 2497.38 samples/sec Loss 1.0812 LearningRate 0.000002 Epoch: 38 Global Step: 796820 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:46:35,182-Speed 2496.19 samples/sec Loss 1.0607 LearningRate 0.000002 Epoch: 38 Global Step: 796830 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:46:43,388-Speed 2496.22 samples/sec Loss 1.0501 LearningRate 0.000002 Epoch: 38 Global Step: 796840 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:46:51,588-Speed 2498.11 samples/sec Loss 1.0744 LearningRate 0.000002 Epoch: 38 Global Step: 796850 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:46:59,797-Speed 2495.10 samples/sec Loss 1.0508 LearningRate 0.000002 Epoch: 38 Global Step: 796860 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:47:07,948-Speed 2512.93 samples/sec Loss 1.1016 LearningRate 0.000002 Epoch: 38 Global Step: 796870 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:47:16,150-Speed 2497.86 samples/sec Loss 1.0586 LearningRate 0.000002 Epoch: 38 Global Step: 796880 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:47:24,360-Speed 2494.72 samples/sec Loss 1.0468 LearningRate 0.000002 Epoch: 38 Global Step: 796890 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:47:32,567-Speed 2495.81 samples/sec Loss 1.0682 LearningRate 0.000002 Epoch: 38 Global Step: 796900 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:47:40,776-Speed 2495.09 samples/sec Loss 1.0722 LearningRate 0.000002 Epoch: 38 Global Step: 796910 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:47:48,981-Speed 2496.55 samples/sec Loss 1.0707 LearningRate 0.000002 Epoch: 38 Global Step: 796920 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:47:57,131-Speed 2513.27 samples/sec Loss 1.0825 LearningRate 0.000002 Epoch: 38 Global Step: 796930 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:48:05,336-Speed 2496.36 samples/sec Loss 1.0890 LearningRate 0.000002 Epoch: 38 Global Step: 796940 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:48:13,539-Speed 2497.06 samples/sec Loss 1.0793 LearningRate 0.000002 Epoch: 38 Global Step: 796950 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 04:48:21,703-Speed 2508.96 samples/sec Loss 1.0469 LearningRate 0.000002 Epoch: 38 Global Step: 796960 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:48:29,905-Speed 2497.27 samples/sec Loss 1.0774 LearningRate 0.000002 Epoch: 38 Global Step: 796970 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:48:38,120-Speed 2493.46 samples/sec Loss 1.0812 LearningRate 0.000002 Epoch: 38 Global Step: 796980 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:48:46,279-Speed 2510.25 samples/sec Loss 1.0598 LearningRate 0.000002 Epoch: 38 Global Step: 796990 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:48:54,483-Speed 2497.02 samples/sec Loss 1.0605 LearningRate 0.000002 Epoch: 38 Global Step: 797000 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:49:02,684-Speed 2497.53 samples/sec Loss 1.0541 LearningRate 0.000002 Epoch: 38 Global Step: 797010 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:49:10,886-Speed 2497.25 samples/sec Loss 1.0612 LearningRate 0.000002 Epoch: 38 Global Step: 797020 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:49:19,087-Speed 2497.58 samples/sec Loss 1.0446 LearningRate 0.000002 Epoch: 38 Global Step: 797030 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:49:27,291-Speed 2497.14 samples/sec Loss 1.0601 LearningRate 0.000002 Epoch: 38 Global Step: 797040 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:49:35,445-Speed 2511.97 samples/sec Loss 1.0549 LearningRate 0.000002 Epoch: 38 Global Step: 797050 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:49:43,658-Speed 2493.77 samples/sec Loss 1.0500 LearningRate 0.000002 Epoch: 38 Global Step: 797060 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:49:51,859-Speed 2497.60 samples/sec Loss 1.0760 LearningRate 0.000002 Epoch: 38 Global Step: 797070 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:50:00,063-Speed 2496.73 samples/sec Loss 1.0625 LearningRate 0.000002 Epoch: 38 Global Step: 797080 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:50:08,278-Speed 2493.36 samples/sec Loss 1.0867 LearningRate 0.000002 Epoch: 38 Global Step: 797090 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:50:16,482-Speed 2496.71 samples/sec Loss 1.0635 LearningRate 0.000002 Epoch: 38 Global Step: 797100 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:50:24,633-Speed 2513.08 samples/sec Loss 1.0542 LearningRate 0.000002 Epoch: 38 Global Step: 797110 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:50:32,841-Speed 2495.49 samples/sec Loss 1.0439 LearningRate 0.000002 Epoch: 38 Global Step: 797120 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:50:41,052-Speed 2494.52 samples/sec Loss 1.0558 LearningRate 0.000002 Epoch: 38 Global Step: 797130 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:50:49,259-Speed 2495.73 samples/sec Loss 1.0739 LearningRate 0.000002 Epoch: 38 Global Step: 797140 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:50:57,468-Speed 2495.66 samples/sec Loss 1.0538 LearningRate 0.000002 Epoch: 38 Global Step: 797150 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:51:05,677-Speed 2495.03 samples/sec Loss 1.0439 LearningRate 0.000002 Epoch: 38 Global Step: 797160 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:51:13,846-Speed 2507.51 samples/sec Loss 1.0487 LearningRate 0.000002 Epoch: 38 Global Step: 797170 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:51:22,058-Speed 2494.48 samples/sec Loss 1.0520 LearningRate 0.000002 Epoch: 38 Global Step: 797180 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:51:30,265-Speed 2495.93 samples/sec Loss 1.0765 LearningRate 0.000002 Epoch: 38 Global Step: 797190 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:51:38,481-Speed 2493.18 samples/sec Loss 1.0626 LearningRate 0.000002 Epoch: 38 Global Step: 797200 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:51:46,684-Speed 2497.00 samples/sec Loss 1.0804 LearningRate 0.000002 Epoch: 38 Global Step: 797210 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:51:54,887-Speed 2496.92 samples/sec Loss 1.0448 LearningRate 0.000002 Epoch: 38 Global Step: 797220 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:52:03,037-Speed 2513.01 samples/sec Loss 1.0611 LearningRate 0.000002 Epoch: 38 Global Step: 797230 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:52:11,253-Speed 2493.07 samples/sec Loss 1.1034 LearningRate 0.000002 Epoch: 38 Global Step: 797240 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:52:19,472-Speed 2492.21 samples/sec Loss 1.0616 LearningRate 0.000002 Epoch: 38 Global Step: 797250 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:52:27,679-Speed 2495.92 samples/sec Loss 1.1057 LearningRate 0.000002 Epoch: 38 Global Step: 797260 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:52:35,880-Speed 2497.70 samples/sec Loss 1.0611 LearningRate 0.000002 Epoch: 38 Global Step: 797270 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:52:44,090-Speed 2494.78 samples/sec Loss 1.0890 LearningRate 0.000002 Epoch: 38 Global Step: 797280 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:52:52,247-Speed 2511.02 samples/sec Loss 1.0335 LearningRate 0.000002 Epoch: 38 Global Step: 797290 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:53:00,452-Speed 2496.86 samples/sec Loss 1.0772 LearningRate 0.000002 Epoch: 38 Global Step: 797300 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:53:08,655-Speed 2496.92 samples/sec Loss 1.0546 LearningRate 0.000002 Epoch: 38 Global Step: 797310 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:53:16,857-Speed 2497.39 samples/sec Loss 1.0315 LearningRate 0.000002 Epoch: 38 Global Step: 797320 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:53:25,060-Speed 2497.03 samples/sec Loss 1.0256 LearningRate 0.000002 Epoch: 38 Global Step: 797330 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:53:33,262-Speed 2497.23 samples/sec Loss 1.0584 LearningRate 0.000002 Epoch: 38 Global Step: 797340 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:53:41,412-Speed 2513.45 samples/sec Loss 1.0583 LearningRate 0.000002 Epoch: 38 Global Step: 797350 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:53:49,615-Speed 2497.19 samples/sec Loss 1.0803 LearningRate 0.000002 Epoch: 38 Global Step: 797360 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:53:57,833-Speed 2492.64 samples/sec Loss 1.0804 LearningRate 0.000002 Epoch: 38 Global Step: 797370 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:54:06,039-Speed 2496.06 samples/sec Loss 1.0815 LearningRate 0.000002 Epoch: 38 Global Step: 797380 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:54:14,242-Speed 2497.10 samples/sec Loss 1.0923 LearningRate 0.000002 Epoch: 38 Global Step: 797390 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:54:22,446-Speed 2497.06 samples/sec Loss 1.0505 LearningRate 0.000002 Epoch: 38 Global Step: 797400 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:54:30,606-Speed 2509.99 samples/sec Loss 1.0549 LearningRate 0.000002 Epoch: 38 Global Step: 797410 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:54:38,811-Speed 2496.54 samples/sec Loss 1.0821 LearningRate 0.000002 Epoch: 38 Global Step: 797420 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:54:47,018-Speed 2495.65 samples/sec Loss 1.0609 LearningRate 0.000002 Epoch: 38 Global Step: 797430 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:54:55,224-Speed 2496.27 samples/sec Loss 1.1020 LearningRate 0.000002 Epoch: 38 Global Step: 797440 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:55:03,439-Speed 2493.11 samples/sec Loss 1.1061 LearningRate 0.000002 Epoch: 38 Global Step: 797450 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:55:11,638-Speed 2498.59 samples/sec Loss 1.0472 LearningRate 0.000002 Epoch: 38 Global Step: 797460 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:55:19,790-Speed 2512.59 samples/sec Loss 1.0598 LearningRate 0.000002 Epoch: 38 Global Step: 797470 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:55:27,998-Speed 2495.39 samples/sec Loss 1.0894 LearningRate 0.000002 Epoch: 38 Global Step: 797480 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:55:36,198-Speed 2497.91 samples/sec Loss 1.0827 LearningRate 0.000002 Epoch: 38 Global Step: 797490 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:55:44,403-Speed 2496.46 samples/sec Loss 1.0896 LearningRate 0.000002 Epoch: 38 Global Step: 797500 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:55:52,623-Speed 2491.79 samples/sec Loss 1.0607 LearningRate 0.000002 Epoch: 38 Global Step: 797510 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:56:00,831-Speed 2495.74 samples/sec Loss 1.0723 LearningRate 0.000002 Epoch: 38 Global Step: 797520 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:56:08,984-Speed 2512.32 samples/sec Loss 1.0687 LearningRate 0.000002 Epoch: 38 Global Step: 797530 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:56:17,189-Speed 2496.59 samples/sec Loss 1.0759 LearningRate 0.000002 Epoch: 38 Global Step: 797540 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:56:25,397-Speed 2495.61 samples/sec Loss 1.0561 LearningRate 0.000002 Epoch: 38 Global Step: 797550 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:56:33,600-Speed 2496.77 samples/sec Loss 1.0567 LearningRate 0.000002 Epoch: 38 Global Step: 797560 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:56:41,807-Speed 2496.05 samples/sec Loss 1.0576 LearningRate 0.000002 Epoch: 38 Global Step: 797570 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:56:50,013-Speed 2496.01 samples/sec Loss 1.0451 LearningRate 0.000002 Epoch: 38 Global Step: 797580 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:56:58,166-Speed 2512.17 samples/sec Loss 1.0650 LearningRate 0.000002 Epoch: 38 Global Step: 797590 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:57:06,370-Speed 2497.11 samples/sec Loss 1.0749 LearningRate 0.000002 Epoch: 38 Global Step: 797600 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:57:14,570-Speed 2497.73 samples/sec Loss 1.0733 LearningRate 0.000002 Epoch: 38 Global Step: 797610 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:57:22,777-Speed 2496.10 samples/sec Loss 1.0414 LearningRate 0.000002 Epoch: 38 Global Step: 797620 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:57:30,981-Speed 2496.59 samples/sec Loss 1.0860 LearningRate 0.000002 Epoch: 38 Global Step: 797630 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:57:39,184-Speed 2497.01 samples/sec Loss 1.0810 LearningRate 0.000002 Epoch: 38 Global Step: 797640 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:57:47,333-Speed 2513.62 samples/sec Loss 1.0741 LearningRate 0.000002 Epoch: 38 Global Step: 797650 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:57:55,549-Speed 2492.93 samples/sec Loss 1.0768 LearningRate 0.000002 Epoch: 38 Global Step: 797660 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:58:03,755-Speed 2495.92 samples/sec Loss 1.0348 LearningRate 0.000002 Epoch: 38 Global Step: 797670 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:58:11,971-Speed 2493.16 samples/sec Loss 1.0946 LearningRate 0.000002 Epoch: 38 Global Step: 797680 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:58:20,186-Speed 2493.69 samples/sec Loss 1.0819 LearningRate 0.000002 Epoch: 38 Global Step: 797690 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:58:28,389-Speed 2496.98 samples/sec Loss 1.0836 LearningRate 0.000002 Epoch: 38 Global Step: 797700 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:58:36,545-Speed 2511.56 samples/sec Loss 1.0575 LearningRate 0.000002 Epoch: 38 Global Step: 797710 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:58:44,750-Speed 2496.20 samples/sec Loss 1.0690 LearningRate 0.000002 Epoch: 38 Global Step: 797720 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:58:52,959-Speed 2495.52 samples/sec Loss 1.0788 LearningRate 0.000002 Epoch: 38 Global Step: 797730 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:59:01,164-Speed 2496.34 samples/sec Loss 1.0766 LearningRate 0.000002 Epoch: 38 Global Step: 797740 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:59:09,374-Speed 2495.00 samples/sec Loss 1.0485 LearningRate 0.000002 Epoch: 38 Global Step: 797750 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:59:17,574-Speed 2497.69 samples/sec Loss 1.0894 LearningRate 0.000002 Epoch: 38 Global Step: 797760 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:59:25,729-Speed 2511.83 samples/sec Loss 1.0582 LearningRate 0.000002 Epoch: 38 Global Step: 797770 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:59:33,933-Speed 2496.64 samples/sec Loss 1.0648 LearningRate 0.000002 Epoch: 38 Global Step: 797780 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:59:42,137-Speed 2497.07 samples/sec Loss 1.0546 LearningRate 0.000002 Epoch: 38 Global Step: 797790 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:59:50,348-Speed 2494.78 samples/sec Loss 1.0578 LearningRate 0.000002 Epoch: 38 Global Step: 797800 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 04:59:58,550-Speed 2497.25 samples/sec Loss 1.0833 LearningRate 0.000002 Epoch: 38 Global Step: 797810 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:00:06,760-Speed 2494.83 samples/sec Loss 1.0752 LearningRate 0.000002 Epoch: 38 Global Step: 797820 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:00:14,911-Speed 2512.88 samples/sec Loss 1.0614 LearningRate 0.000002 Epoch: 38 Global Step: 797830 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:00:23,117-Speed 2496.13 samples/sec Loss 1.0819 LearningRate 0.000002 Epoch: 38 Global Step: 797840 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:00:31,323-Speed 2496.27 samples/sec Loss 1.0695 LearningRate 0.000002 Epoch: 38 Global Step: 797850 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:00:39,525-Speed 2497.20 samples/sec Loss 1.0556 LearningRate 0.000002 Epoch: 38 Global Step: 797860 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:00:47,729-Speed 2496.58 samples/sec Loss 1.0575 LearningRate 0.000002 Epoch: 38 Global Step: 797870 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:00:55,933-Speed 2496.99 samples/sec Loss 1.0805 LearningRate 0.000002 Epoch: 38 Global Step: 797880 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:01:04,096-Speed 2509.13 samples/sec Loss 1.0595 LearningRate 0.000002 Epoch: 38 Global Step: 797890 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:01:12,304-Speed 2495.37 samples/sec Loss 1.0356 LearningRate 0.000002 Epoch: 38 Global Step: 797900 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:01:20,508-Speed 2496.81 samples/sec Loss 1.1053 LearningRate 0.000002 Epoch: 38 Global Step: 797910 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:01:28,737-Speed 2489.31 samples/sec Loss 1.0839 LearningRate 0.000002 Epoch: 38 Global Step: 797920 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:01:36,941-Speed 2496.42 samples/sec Loss 1.0807 LearningRate 0.000002 Epoch: 38 Global Step: 797930 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:01:45,145-Speed 2496.61 samples/sec Loss 1.0425 LearningRate 0.000002 Epoch: 38 Global Step: 797940 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:01:53,299-Speed 2512.10 samples/sec Loss 1.0638 LearningRate 0.000002 Epoch: 38 Global Step: 797950 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:02:01,501-Speed 2497.23 samples/sec Loss 1.0569 LearningRate 0.000002 Epoch: 38 Global Step: 797960 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:02:09,708-Speed 2495.85 samples/sec Loss 1.0445 LearningRate 0.000002 Epoch: 38 Global Step: 797970 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:02:17,911-Speed 2497.08 samples/sec Loss 1.0650 LearningRate 0.000002 Epoch: 38 Global Step: 797980 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:02:26,115-Speed 2496.75 samples/sec Loss 1.0943 LearningRate 0.000002 Epoch: 38 Global Step: 797990 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:02:34,321-Speed 2495.99 samples/sec Loss 1.0726 LearningRate 0.000002 Epoch: 38 Global Step: 798000 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:02:42,474-Speed 2512.47 samples/sec Loss 1.0456 LearningRate 0.000002 Epoch: 38 Global Step: 798010 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:02:50,677-Speed 2496.96 samples/sec Loss 1.0452 LearningRate 0.000002 Epoch: 38 Global Step: 798020 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:02:58,886-Speed 2495.67 samples/sec Loss 1.0734 LearningRate 0.000002 Epoch: 38 Global Step: 798030 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:03:07,088-Speed 2497.47 samples/sec Loss 1.1018 LearningRate 0.000002 Epoch: 38 Global Step: 798040 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:03:15,290-Speed 2497.15 samples/sec Loss 1.0565 LearningRate 0.000002 Epoch: 38 Global Step: 798050 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:03:23,495-Speed 2496.46 samples/sec Loss 1.0617 LearningRate 0.000002 Epoch: 38 Global Step: 798060 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:03:31,644-Speed 2513.68 samples/sec Loss 1.0868 LearningRate 0.000002 Epoch: 38 Global Step: 798070 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:03:39,855-Speed 2494.55 samples/sec Loss 1.0647 LearningRate 0.000002 Epoch: 38 Global Step: 798080 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:03:48,062-Speed 2495.55 samples/sec Loss 1.0585 LearningRate 0.000002 Epoch: 38 Global Step: 798090 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:03:56,270-Speed 2495.47 samples/sec Loss 1.0740 LearningRate 0.000002 Epoch: 38 Global Step: 798100 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:04:04,478-Speed 2495.74 samples/sec Loss 1.0858 LearningRate 0.000002 Epoch: 38 Global Step: 798110 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:04:12,680-Speed 2497.25 samples/sec Loss 1.0795 LearningRate 0.000002 Epoch: 38 Global Step: 798120 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:04:20,832-Speed 2512.44 samples/sec Loss 1.0775 LearningRate 0.000002 Epoch: 38 Global Step: 798130 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:04:29,048-Speed 2492.95 samples/sec Loss 1.0807 LearningRate 0.000002 Epoch: 38 Global Step: 798140 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:04:37,253-Speed 2496.50 samples/sec Loss 1.0698 LearningRate 0.000002 Epoch: 38 Global Step: 798150 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-07-13 05:04:45,457-Speed 2496.79 samples/sec Loss 1.0901 LearningRate 0.000002 Epoch: 38 Global Step: 798160 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:04:53,662-Speed 2496.23 samples/sec Loss 1.0541 LearningRate 0.000002 Epoch: 38 Global Step: 798170 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:05:01,866-Speed 2496.99 samples/sec Loss 1.0529 LearningRate 0.000002 Epoch: 38 Global Step: 798180 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:05:10,017-Speed 2513.02 samples/sec Loss 1.0797 LearningRate 0.000002 Epoch: 38 Global Step: 798190 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:05:18,218-Speed 2497.39
samples/sec Loss 1.0994 LearningRate 0.000002 Epoch: 38 Global Step: 798200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:05:26,436-Speed 2492.59 samples/sec Loss 1.0255 LearningRate 0.000002 Epoch: 38 Global Step: 798210 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:05:34,644-Speed 2495.47 samples/sec Loss 1.0867 LearningRate 0.000002 Epoch: 38 Global Step: 798220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:05:42,849-Speed 2496.54 samples/sec Loss 1.0537 LearningRate 0.000002 Epoch: 38 Global Step: 798230 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:05:51,054-Speed 2496.18 samples/sec Loss 1.0607 LearningRate 0.000002 Epoch: 38 Global Step: 798240 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:05:59,203-Speed 2513.79 samples/sec Loss 1.0942 LearningRate 0.000002 Epoch: 38 Global Step: 798250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:06:07,405-Speed 2497.39 samples/sec Loss 1.0949 LearningRate 0.000002 Epoch: 38 Global Step: 798260 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:06:15,612-Speed 2495.61 samples/sec Loss 1.0717 LearningRate 0.000002 Epoch: 38 Global Step: 798270 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:06:23,821-Speed 2495.70 samples/sec Loss 1.0724 LearningRate 0.000002 Epoch: 38 Global Step: 798280 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:06:32,027-Speed 2496.06 samples/sec Loss 1.0589 LearningRate 0.000002 Epoch: 38 Global Step: 798290 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:06:40,235-Speed 2495.47 samples/sec Loss 1.0590 LearningRate 0.000002 Epoch: 38 Global Step: 798300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:06:48,384-Speed 2513.41 samples/sec Loss 1.0606 LearningRate 0.000002 Epoch: 38 Global Step: 798310 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:06:56,591-Speed 2495.99 samples/sec Loss 
1.0568 LearningRate 0.000002 Epoch: 38 Global Step: 798320 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:07:04,800-Speed 2495.29 samples/sec Loss 1.0586 LearningRate 0.000002 Epoch: 38 Global Step: 798330 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:07:13,007-Speed 2495.91 samples/sec Loss 1.0456 LearningRate 0.000002 Epoch: 38 Global Step: 798340 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:07:21,222-Speed 2493.31 samples/sec Loss 1.0317 LearningRate 0.000002 Epoch: 38 Global Step: 798350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:07:29,429-Speed 2495.71 samples/sec Loss 1.0713 LearningRate 0.000002 Epoch: 38 Global Step: 798360 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:07:37,581-Speed 2512.80 samples/sec Loss 1.0838 LearningRate 0.000002 Epoch: 38 Global Step: 798370 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:07:45,788-Speed 2496.04 samples/sec Loss 1.0858 LearningRate 0.000002 Epoch: 38 Global Step: 798380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:07:53,994-Speed 2496.20 samples/sec Loss 1.0587 LearningRate 0.000002 Epoch: 38 Global Step: 798390 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:08:02,201-Speed 2495.59 samples/sec Loss 1.0608 LearningRate 0.000002 Epoch: 38 Global Step: 798400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:08:10,411-Speed 2495.06 samples/sec Loss 1.0603 LearningRate 0.000002 Epoch: 38 Global Step: 798410 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:08:18,619-Speed 2495.42 samples/sec Loss 1.0809 LearningRate 0.000002 Epoch: 38 Global Step: 798420 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:08:26,773-Speed 2512.09 samples/sec Loss 1.0718 LearningRate 0.000002 Epoch: 38 Global Step: 798430 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:08:34,982-Speed 2495.28 samples/sec Loss 1.0653 LearningRate 
0.000002 Epoch: 38 Global Step: 798440 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:08:43,190-Speed 2495.32 samples/sec Loss 1.0483 LearningRate 0.000002 Epoch: 38 Global Step: 798450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:08:51,393-Speed 2497.00 samples/sec Loss 1.0541 LearningRate 0.000002 Epoch: 38 Global Step: 798460 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:08:59,600-Speed 2495.88 samples/sec Loss 1.0537 LearningRate 0.000002 Epoch: 38 Global Step: 798470 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:09:07,810-Speed 2495.16 samples/sec Loss 1.0958 LearningRate 0.000002 Epoch: 38 Global Step: 798480 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:09:15,979-Speed 2507.66 samples/sec Loss 1.0730 LearningRate 0.000002 Epoch: 38 Global Step: 798490 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:09:24,185-Speed 2495.82 samples/sec Loss 1.0649 LearningRate 0.000002 Epoch: 38 Global Step: 798500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:09:32,388-Speed 2497.17 samples/sec Loss 1.0603 LearningRate 0.000002 Epoch: 38 Global Step: 798510 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:09:40,590-Speed 2497.20 samples/sec Loss 1.0470 LearningRate 0.000002 Epoch: 38 Global Step: 798520 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:09:48,793-Speed 2497.17 samples/sec Loss 1.0679 LearningRate 0.000002 Epoch: 38 Global Step: 798530 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:09:56,998-Speed 2496.11 samples/sec Loss 1.0667 LearningRate 0.000002 Epoch: 38 Global Step: 798540 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:10:05,149-Speed 2513.03 samples/sec Loss 1.0694 LearningRate 0.000002 Epoch: 38 Global Step: 798550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:10:13,356-Speed 2495.87 samples/sec Loss 1.0698 LearningRate 0.000002 Epoch: 38 
Global Step: 798560 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:10:21,560-Speed 2496.86 samples/sec Loss 1.0698 LearningRate 0.000002 Epoch: 38 Global Step: 798570 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:10:29,763-Speed 2496.89 samples/sec Loss 1.0509 LearningRate 0.000002 Epoch: 38 Global Step: 798580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:10:37,972-Speed 2495.38 samples/sec Loss 1.0482 LearningRate 0.000002 Epoch: 38 Global Step: 798590 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:10:46,178-Speed 2496.17 samples/sec Loss 1.0775 LearningRate 0.000002 Epoch: 38 Global Step: 798600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:10:54,331-Speed 2512.28 samples/sec Loss 1.0560 LearningRate 0.000002 Epoch: 38 Global Step: 798610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:11:02,538-Speed 2495.82 samples/sec Loss 1.0797 LearningRate 0.000002 Epoch: 38 Global Step: 798620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:11:10,745-Speed 2495.67 samples/sec Loss 1.0560 LearningRate 0.000002 Epoch: 38 Global Step: 798630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:11:18,954-Speed 2495.32 samples/sec Loss 1.0406 LearningRate 0.000002 Epoch: 38 Global Step: 798640 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:11:27,166-Speed 2494.43 samples/sec Loss 1.0797 LearningRate 0.000002 Epoch: 38 Global Step: 798650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:11:35,370-Speed 2496.71 samples/sec Loss 1.0494 LearningRate 0.000002 Epoch: 38 Global Step: 798660 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:11:43,540-Speed 2506.98 samples/sec Loss 1.0436 LearningRate 0.000002 Epoch: 38 Global Step: 798670 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:11:51,744-Speed 2496.65 samples/sec Loss 1.0433 LearningRate 0.000002 Epoch: 38 Global Step: 798680 
Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:11:59,947-Speed 2497.57 samples/sec Loss 1.0867 LearningRate 0.000002 Epoch: 38 Global Step: 798690 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:12:08,150-Speed 2496.76 samples/sec Loss 1.0612 LearningRate 0.000002 Epoch: 38 Global Step: 798700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:12:16,355-Speed 2496.49 samples/sec Loss 1.0489 LearningRate 0.000002 Epoch: 38 Global Step: 798710 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:12:24,576-Speed 2491.54 samples/sec Loss 1.0671 LearningRate 0.000002 Epoch: 38 Global Step: 798720 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:12:32,737-Speed 2509.86 samples/sec Loss 1.0720 LearningRate 0.000002 Epoch: 38 Global Step: 798730 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:12:40,950-Speed 2494.13 samples/sec Loss 1.0477 LearningRate 0.000002 Epoch: 38 Global Step: 798740 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:12:49,179-Speed 2489.11 samples/sec Loss 1.1061 LearningRate 0.000002 Epoch: 38 Global Step: 798750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:12:57,397-Speed 2492.60 samples/sec Loss 1.0834 LearningRate 0.000002 Epoch: 38 Global Step: 798760 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:13:05,601-Speed 2496.70 samples/sec Loss 1.1217 LearningRate 0.000002 Epoch: 38 Global Step: 798770 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:13:13,813-Speed 2494.29 samples/sec Loss 1.0660 LearningRate 0.000002 Epoch: 38 Global Step: 798780 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:13:21,966-Speed 2512.55 samples/sec Loss 1.0754 LearningRate 0.000002 Epoch: 38 Global Step: 798790 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:13:30,171-Speed 2496.72 samples/sec Loss 1.0709 LearningRate 0.000002 Epoch: 38 Global Step: 798800 Fp16 Grad Scale: 
16384 Required: 7 hours Training: 2022-07-13 05:13:38,388-Speed 2492.50 samples/sec Loss 1.0542 LearningRate 0.000002 Epoch: 38 Global Step: 798810 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:13:46,593-Speed 2496.51 samples/sec Loss 1.0563 LearningRate 0.000002 Epoch: 38 Global Step: 798820 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:13:54,798-Speed 2496.35 samples/sec Loss 1.0838 LearningRate 0.000002 Epoch: 38 Global Step: 798830 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:14:03,006-Speed 2495.65 samples/sec Loss 1.0675 LearningRate 0.000002 Epoch: 38 Global Step: 798840 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:14:11,158-Speed 2512.58 samples/sec Loss 1.0524 LearningRate 0.000002 Epoch: 38 Global Step: 798850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:14:19,363-Speed 2496.70 samples/sec Loss 1.0538 LearningRate 0.000002 Epoch: 38 Global Step: 798860 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:14:27,566-Speed 2496.72 samples/sec Loss 1.0875 LearningRate 0.000002 Epoch: 38 Global Step: 798870 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:14:35,774-Speed 2495.51 samples/sec Loss 1.0819 LearningRate 0.000002 Epoch: 38 Global Step: 798880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:14:43,992-Speed 2492.55 samples/sec Loss 1.0701 LearningRate 0.000002 Epoch: 38 Global Step: 798890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:14:52,198-Speed 2496.41 samples/sec Loss 1.0802 LearningRate 0.000002 Epoch: 38 Global Step: 798900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:15:00,353-Speed 2511.63 samples/sec Loss 1.0675 LearningRate 0.000002 Epoch: 38 Global Step: 798910 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:15:08,558-Speed 2496.44 samples/sec Loss 1.0517 LearningRate 0.000002 Epoch: 38 Global Step: 798920 Fp16 Grad Scale: 16384 Required: 7 
hours Training: 2022-07-13 05:15:16,763-Speed 2496.48 samples/sec Loss 1.0566 LearningRate 0.000002 Epoch: 38 Global Step: 798930 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:15:24,967-Speed 2496.43 samples/sec Loss 1.1050 LearningRate 0.000002 Epoch: 38 Global Step: 798940 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:15:33,171-Speed 2496.91 samples/sec Loss 1.0688 LearningRate 0.000002 Epoch: 38 Global Step: 798950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:15:41,379-Speed 2495.34 samples/sec Loss 1.0770 LearningRate 0.000002 Epoch: 38 Global Step: 798960 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:15:49,531-Speed 2512.77 samples/sec Loss 1.0905 LearningRate 0.000002 Epoch: 38 Global Step: 798970 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:15:57,740-Speed 2495.16 samples/sec Loss 1.0455 LearningRate 0.000002 Epoch: 38 Global Step: 798980 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:16:05,952-Speed 2494.13 samples/sec Loss 1.0753 LearningRate 0.000002 Epoch: 38 Global Step: 798990 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:16:14,160-Speed 2495.65 samples/sec Loss 1.0806 LearningRate 0.000002 Epoch: 38 Global Step: 799000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:16:22,362-Speed 2497.61 samples/sec Loss 1.0737 LearningRate 0.000002 Epoch: 38 Global Step: 799010 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:16:30,573-Speed 2494.73 samples/sec Loss 1.0807 LearningRate 0.000002 Epoch: 38 Global Step: 799020 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:16:38,724-Speed 2512.87 samples/sec Loss 1.0837 LearningRate 0.000002 Epoch: 38 Global Step: 799030 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:16:46,937-Speed 2493.95 samples/sec Loss 1.0624 LearningRate 0.000002 Epoch: 38 Global Step: 799040 Fp16 Grad Scale: 16384 Required: 7 hours Training: 
2022-07-13 05:16:55,154-Speed 2493.08 samples/sec Loss 1.0689 LearningRate 0.000002 Epoch: 38 Global Step: 799050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:17:03,354-Speed 2497.97 samples/sec Loss 1.0792 LearningRate 0.000002 Epoch: 38 Global Step: 799060 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:17:11,558-Speed 2497.27 samples/sec Loss 1.0863 LearningRate 0.000002 Epoch: 38 Global Step: 799070 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:17:19,762-Speed 2497.05 samples/sec Loss 1.0497 LearningRate 0.000002 Epoch: 38 Global Step: 799080 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:17:27,914-Speed 2512.70 samples/sec Loss 1.0648 LearningRate 0.000002 Epoch: 38 Global Step: 799090 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:17:36,119-Speed 2496.53 samples/sec Loss 1.0518 LearningRate 0.000002 Epoch: 38 Global Step: 799100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:17:44,323-Speed 2496.74 samples/sec Loss 1.0491 LearningRate 0.000002 Epoch: 38 Global Step: 799110 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:17:52,535-Speed 2494.26 samples/sec Loss 1.0721 LearningRate 0.000002 Epoch: 38 Global Step: 799120 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:18:00,745-Speed 2494.88 samples/sec Loss 1.0497 LearningRate 0.000002 Epoch: 38 Global Step: 799130 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:18:08,947-Speed 2497.38 samples/sec Loss 1.0449 LearningRate 0.000002 Epoch: 38 Global Step: 799140 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:18:17,102-Speed 2511.90 samples/sec Loss 1.0737 LearningRate 0.000002 Epoch: 38 Global Step: 799150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:18:25,306-Speed 2496.53 samples/sec Loss 1.0638 LearningRate 0.000002 Epoch: 38 Global Step: 799160 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 
05:18:33,510-Speed 2496.66 samples/sec Loss 1.0599 LearningRate 0.000002 Epoch: 38 Global Step: 799170 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:18:41,713-Speed 2497.04 samples/sec Loss 1.0645 LearningRate 0.000002 Epoch: 38 Global Step: 799180 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:18:49,920-Speed 2495.92 samples/sec Loss 1.0681 LearningRate 0.000002 Epoch: 38 Global Step: 799190 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:18:58,126-Speed 2496.33 samples/sec Loss 1.0773 LearningRate 0.000002 Epoch: 38 Global Step: 799200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:19:06,274-Speed 2514.24 samples/sec Loss 1.0554 LearningRate 0.000002 Epoch: 38 Global Step: 799210 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:19:14,478-Speed 2496.88 samples/sec Loss 1.0727 LearningRate 0.000002 Epoch: 38 Global Step: 799220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:19:22,684-Speed 2496.04 samples/sec Loss 1.0442 LearningRate 0.000002 Epoch: 38 Global Step: 799230 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:19:30,888-Speed 2496.79 samples/sec Loss 1.0605 LearningRate 0.000002 Epoch: 38 Global Step: 799240 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:19:39,090-Speed 2497.21 samples/sec Loss 1.0464 LearningRate 0.000002 Epoch: 38 Global Step: 799250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:19:47,297-Speed 2496.03 samples/sec Loss 1.0825 LearningRate 0.000002 Epoch: 38 Global Step: 799260 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:19:55,449-Speed 2512.46 samples/sec Loss 1.0960 LearningRate 0.000002 Epoch: 38 Global Step: 799270 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:20:03,653-Speed 2496.67 samples/sec Loss 1.0420 LearningRate 0.000002 Epoch: 38 Global Step: 799280 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:20:11,862-Speed 
2495.44 samples/sec Loss 1.0576 LearningRate 0.000002 Epoch: 38 Global Step: 799290 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:20:20,066-Speed 2496.70 samples/sec Loss 1.0741 LearningRate 0.000002 Epoch: 38 Global Step: 799300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:20:28,286-Speed 2491.93 samples/sec Loss 1.0492 LearningRate 0.000002 Epoch: 38 Global Step: 799310 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:20:36,488-Speed 2497.28 samples/sec Loss 1.0987 LearningRate 0.000002 Epoch: 38 Global Step: 799320 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:20:44,645-Speed 2511.03 samples/sec Loss 1.0504 LearningRate 0.000002 Epoch: 38 Global Step: 799330 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:20:52,849-Speed 2496.69 samples/sec Loss 1.0616 LearningRate 0.000002 Epoch: 38 Global Step: 799340 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:21:01,053-Speed 2496.66 samples/sec Loss 1.0642 LearningRate 0.000002 Epoch: 38 Global Step: 799350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:21:09,257-Speed 2496.74 samples/sec Loss 1.0799 LearningRate 0.000002 Epoch: 38 Global Step: 799360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:21:17,462-Speed 2496.44 samples/sec Loss 1.0626 LearningRate 0.000002 Epoch: 38 Global Step: 799370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:21:25,671-Speed 2495.23 samples/sec Loss 1.0521 LearningRate 0.000002 Epoch: 38 Global Step: 799380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:21:33,826-Speed 2511.84 samples/sec Loss 1.0546 LearningRate 0.000002 Epoch: 38 Global Step: 799390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:21:42,034-Speed 2495.42 samples/sec Loss 1.0731 LearningRate 0.000002 Epoch: 38 Global Step: 799400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:21:50,241-Speed 2496.02 samples/sec 
Loss 1.0900 LearningRate 0.000002 Epoch: 38 Global Step: 799410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:21:58,450-Speed 2494.99 samples/sec Loss 1.0551 LearningRate 0.000002 Epoch: 38 Global Step: 799420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:22:06,655-Speed 2496.43 samples/sec Loss 1.0811 LearningRate 0.000002 Epoch: 38 Global Step: 799430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:22:14,857-Speed 2497.89 samples/sec Loss 1.0843 LearningRate 0.000002 Epoch: 38 Global Step: 799440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:22:23,012-Speed 2511.54 samples/sec Loss 1.0653 LearningRate 0.000002 Epoch: 38 Global Step: 799450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:22:31,213-Speed 2497.70 samples/sec Loss 1.0665 LearningRate 0.000002 Epoch: 38 Global Step: 799460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:22:39,419-Speed 2496.15 samples/sec Loss 1.0360 LearningRate 0.000002 Epoch: 38 Global Step: 799470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:22:47,622-Speed 2497.09 samples/sec Loss 1.0695 LearningRate 0.000002 Epoch: 38 Global Step: 799480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:22:55,825-Speed 2497.25 samples/sec Loss 1.0808 LearningRate 0.000002 Epoch: 38 Global Step: 799490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:23:04,031-Speed 2496.17 samples/sec Loss 1.0688 LearningRate 0.000002 Epoch: 38 Global Step: 799500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:23:12,187-Speed 2511.38 samples/sec Loss 1.0538 LearningRate 0.000002 Epoch: 38 Global Step: 799510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:23:20,394-Speed 2495.79 samples/sec Loss 1.0415 LearningRate 0.000002 Epoch: 38 Global Step: 799520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:23:28,599-Speed 2496.48 samples/sec Loss 1.0469 
LearningRate 0.000002 Epoch: 38 Global Step: 799530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:23:36,801-Speed 2497.43 samples/sec Loss 1.0794 LearningRate 0.000002 Epoch: 38 Global Step: 799540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:23:45,007-Speed 2495.92 samples/sec Loss 1.0704 LearningRate 0.000002 Epoch: 38 Global Step: 799550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:23:53,212-Speed 2496.63 samples/sec Loss 1.0592 LearningRate 0.000002 Epoch: 38 Global Step: 799560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:24:01,375-Speed 2510.73 samples/sec Loss 1.0750 LearningRate 0.000002 Epoch: 38 Global Step: 799570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:24:09,605-Speed 2489.08 samples/sec Loss 1.0641 LearningRate 0.000002 Epoch: 38 Global Step: 799580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:24:17,807-Speed 2497.59 samples/sec Loss 1.0604 LearningRate 0.000002 Epoch: 38 Global Step: 799590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:24:26,022-Speed 2493.65 samples/sec Loss 1.0652 LearningRate 0.000002 Epoch: 38 Global Step: 799600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:24:34,237-Speed 2493.21 samples/sec Loss 1.0813 LearningRate 0.000002 Epoch: 38 Global Step: 799610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:24:42,446-Speed 2495.09 samples/sec Loss 1.0480 LearningRate 0.000002 Epoch: 38 Global Step: 799620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:24:50,597-Speed 2513.18 samples/sec Loss 1.0851 LearningRate 0.000002 Epoch: 38 Global Step: 799630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:24:58,816-Speed 2492.15 samples/sec Loss 1.0829 LearningRate 0.000002 Epoch: 38 Global Step: 799640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:25:07,024-Speed 2495.27 samples/sec Loss 1.0975 LearningRate 
0.000002 Epoch: 38 Global Step: 799650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:25:15,231-Speed 2495.98 samples/sec Loss 1.0398 LearningRate 0.000002 Epoch: 38 Global Step: 799660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:25:23,439-Speed 2495.31 samples/sec Loss 1.0573 LearningRate 0.000002 Epoch: 38 Global Step: 799670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:25:31,642-Speed 2496.95 samples/sec Loss 1.0766 LearningRate 0.000002 Epoch: 38 Global Step: 799680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:25:39,792-Speed 2513.35 samples/sec Loss 1.0802 LearningRate 0.000002 Epoch: 38 Global Step: 799690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:25:47,996-Speed 2496.86 samples/sec Loss 1.0365 LearningRate 0.000002 Epoch: 38 Global Step: 799700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-07-13 05:25:56,160-Speed 2509.09 samples/sec Loss 1.0718 LearningRate 0.000002 Epoch: 38 Global Step: 799710 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:26:04,377-Speed 2492.53 samples/sec Loss 1.0545 LearningRate 0.000002 Epoch: 38 Global Step: 799720 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:26:12,580-Speed 2496.91 samples/sec Loss 1.0842 LearningRate 0.000002 Epoch: 38 Global Step: 799730 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:26:20,784-Speed 2496.88 samples/sec Loss 1.0715 LearningRate 0.000002 Epoch: 38 Global Step: 799740 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:26:28,938-Speed 2512.10 samples/sec Loss 1.0935 LearningRate 0.000002 Epoch: 38 Global Step: 799750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:26:37,143-Speed 2496.42 samples/sec Loss 1.0595 LearningRate 0.000002 Epoch: 38 Global Step: 799760 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:26:45,356-Speed 2493.86 samples/sec Loss 1.0619 LearningRate 0.000002 Epoch: 38 
Global Step: 799770 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:26:53,561-Speed 2496.55 samples/sec Loss 1.0607 LearningRate 0.000002 Epoch: 38 Global Step: 799780 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:27:01,776-Speed 2493.43 samples/sec Loss 1.0534 LearningRate 0.000002 Epoch: 38 Global Step: 799790 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:27:09,980-Speed 2496.69 samples/sec Loss 1.0550 LearningRate 0.000002 Epoch: 38 Global Step: 799800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:27:18,133-Speed 2512.73 samples/sec Loss 1.0351 LearningRate 0.000002 Epoch: 38 Global Step: 799810 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:27:26,337-Speed 2496.78 samples/sec Loss 1.0373 LearningRate 0.000002 Epoch: 38 Global Step: 799820 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:27:34,552-Speed 2493.19 samples/sec Loss 1.0482 LearningRate 0.000002 Epoch: 38 Global Step: 799830 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:27:42,758-Speed 2496.11 samples/sec Loss 1.0426 LearningRate 0.000002 Epoch: 38 Global Step: 799840 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:27:50,966-Speed 2495.68 samples/sec Loss 1.0814 LearningRate 0.000002 Epoch: 38 Global Step: 799850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:27:59,171-Speed 2496.56 samples/sec Loss 1.0670 LearningRate 0.000002 Epoch: 38 Global Step: 799860 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:28:07,325-Speed 2511.91 samples/sec Loss 1.0495 LearningRate 0.000002 Epoch: 38 Global Step: 799870 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:28:15,533-Speed 2495.59 samples/sec Loss 1.0417 LearningRate 0.000002 Epoch: 38 Global Step: 799880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-07-13 05:28:23,734-Speed 2497.81 samples/sec Loss 1.0573 LearningRate 0.000002 Epoch: 38 Global Step: 799890 
Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:28:31,939-Speed 2496.37 samples/sec Loss 1.0779 LearningRate 0.000002 Epoch: 38 Global Step: 799900 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:28:40,142-Speed 2496.87 samples/sec Loss 1.0616 LearningRate 0.000002 Epoch: 38 Global Step: 799910 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:28:48,347-Speed 2496.22 samples/sec Loss 1.0763 LearningRate 0.000002 Epoch: 38 Global Step: 799920 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:28:56,496-Speed 2513.70 samples/sec Loss 1.0440 LearningRate 0.000002 Epoch: 38 Global Step: 799930 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:29:04,704-Speed 2495.53 samples/sec Loss 1.0607 LearningRate 0.000002 Epoch: 38 Global Step: 799940 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:29:12,920-Speed 2492.95 samples/sec Loss 1.0584 LearningRate 0.000002 Epoch: 38 Global Step: 799950 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:29:21,127-Speed 2495.99 samples/sec Loss 1.0668 LearningRate 0.000002 Epoch: 38 Global Step: 799960 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:29:29,340-Speed 2494.12 samples/sec Loss 1.0693 LearningRate 0.000002 Epoch: 38 Global Step: 799970 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:29:37,560-Speed 2492.02 samples/sec Loss 1.0915 LearningRate 0.000002 Epoch: 38 Global Step: 799980 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:29:45,715-Speed 2511.66 samples/sec Loss 1.0525 LearningRate 0.000002 Epoch: 38 Global Step: 799990 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:29:53,920-Speed 2496.49 samples/sec Loss 1.0414 LearningRate 0.000002 Epoch: 38 Global Step: 800000 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:30:02,121-Speed 2497.55 samples/sec Loss 1.0512 LearningRate 0.000002 Epoch: 38 Global Step: 800010 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:30:10,336-Speed 2493.40 samples/sec Loss 1.0523 LearningRate 0.000002 Epoch: 38 Global Step: 800020 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:30:18,538-Speed 2497.79 samples/sec Loss 1.0682 LearningRate 0.000002 Epoch: 38 Global Step: 800030 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:30:26,744-Speed 2496.11 samples/sec Loss 1.0631 LearningRate 0.000002 Epoch: 38 Global Step: 800040 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:30:34,895-Speed 2512.97 samples/sec Loss 1.0598 LearningRate 0.000002 Epoch: 38 Global Step: 800050 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:30:43,105-Speed 2494.99 samples/sec Loss 1.0666 LearningRate 0.000002 Epoch: 38 Global Step: 800060 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:30:51,309-Speed 2496.59 samples/sec Loss 1.0498 LearningRate 0.000002 Epoch: 38 Global Step: 800070 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:30:59,512-Speed 2497.07 samples/sec Loss 1.0750 LearningRate 0.000002 Epoch: 38 Global Step: 800080 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:31:07,717-Speed 2496.35 samples/sec Loss 1.0420 LearningRate 0.000002 Epoch: 38 Global Step: 800090 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:31:15,921-Speed 2497.13 samples/sec Loss 1.0583 LearningRate 0.000002 Epoch: 38 Global Step: 800100 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:31:24,070-Speed 2513.36 samples/sec Loss 1.0670 LearningRate 0.000002 Epoch: 38 Global Step: 800110 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:31:32,280-Speed 2495.22 samples/sec Loss 1.0881 LearningRate 0.000002 Epoch: 38 Global Step: 800120 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:31:40,490-Speed 2494.90 samples/sec Loss 1.0481 LearningRate 0.000002 Epoch: 38 Global Step: 800130 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:31:48,711-Speed 2491.72 samples/sec Loss 1.0843 LearningRate 0.000002 Epoch: 38 Global Step: 800140 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:31:56,915-Speed 2496.56 samples/sec Loss 1.0565 LearningRate 0.000002 Epoch: 38 Global Step: 800150 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:32:05,120-Speed 2496.58 samples/sec Loss 1.0817 LearningRate 0.000002 Epoch: 38 Global Step: 800160 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:32:13,273-Speed 2512.62 samples/sec Loss 1.0456 LearningRate 0.000002 Epoch: 38 Global Step: 800170 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:32:21,482-Speed 2495.08 samples/sec Loss 1.0409 LearningRate 0.000002 Epoch: 38 Global Step: 800180 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:32:29,688-Speed 2496.17 samples/sec Loss 1.0482 LearningRate 0.000002 Epoch: 38 Global Step: 800190 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:32:37,896-Speed 2495.73 samples/sec Loss 1.0596 LearningRate 0.000002 Epoch: 38 Global Step: 800200 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:32:46,104-Speed 2495.46 samples/sec Loss 1.0601 LearningRate 0.000002 Epoch: 38 Global Step: 800210 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:32:54,308-Speed 2496.53 samples/sec Loss 1.0681 LearningRate 0.000002 Epoch: 38 Global Step: 800220 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:33:02,482-Speed 2505.82 samples/sec Loss 1.0578 LearningRate 0.000002 Epoch: 38 Global Step: 800230 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:33:10,689-Speed 2496.03 samples/sec Loss 1.0479 LearningRate 0.000002 Epoch: 38 Global Step: 800240 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:33:18,900-Speed 2494.77 samples/sec Loss 1.0973 LearningRate 0.000002 Epoch: 38 Global Step: 800250 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:33:27,110-Speed 2494.96 samples/sec Loss 1.0461 LearningRate 0.000002 Epoch: 38 Global Step: 800260 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:33:35,316-Speed 2495.95 samples/sec Loss 1.0541 LearningRate 0.000002 Epoch: 38 Global Step: 800270 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:33:43,521-Speed 2496.89 samples/sec Loss 1.0475 LearningRate 0.000002 Epoch: 38 Global Step: 800280 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:33:51,677-Speed 2511.34 samples/sec Loss 1.0621 LearningRate 0.000002 Epoch: 38 Global Step: 800290 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:33:59,891-Speed 2494.26 samples/sec Loss 1.0751 LearningRate 0.000002 Epoch: 38 Global Step: 800300 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:34:08,101-Speed 2495.13 samples/sec Loss 1.0503 LearningRate 0.000002 Epoch: 38 Global Step: 800310 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:34:16,317-Speed 2493.20 samples/sec Loss 1.0370 LearningRate 0.000002 Epoch: 38 Global Step: 800320 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:34:24,524-Speed 2495.58 samples/sec Loss 1.0838 LearningRate 0.000002 Epoch: 38 Global Step: 800330 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:34:32,728-Speed 2496.61 samples/sec Loss 1.0533 LearningRate 0.000002 Epoch: 38 Global Step: 800340 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:34:40,882-Speed 2512.36 samples/sec Loss 1.0765 LearningRate 0.000002 Epoch: 38 Global Step: 800350 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:34:49,086-Speed 2496.60 samples/sec Loss 1.0678 LearningRate 0.000002 Epoch: 38 Global Step: 800360 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:34:57,308-Speed 2491.33 samples/sec Loss 1.0371 LearningRate 0.000002 Epoch: 38 Global Step: 800370 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:35:05,513-Speed 2496.07 samples/sec Loss 1.0727 LearningRate 0.000002 Epoch: 38 Global Step: 800380 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:35:13,725-Speed 2494.64 samples/sec Loss 1.0458 LearningRate 0.000002 Epoch: 38 Global Step: 800390 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:35:21,933-Speed 2495.36 samples/sec Loss 1.0496 LearningRate 0.000002 Epoch: 38 Global Step: 800400 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:35:30,096-Speed 2509.37 samples/sec Loss 1.0499 LearningRate 0.000002 Epoch: 38 Global Step: 800410 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:35:38,310-Speed 2493.65 samples/sec Loss 1.0539 LearningRate 0.000002 Epoch: 38 Global Step: 800420 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:35:46,516-Speed 2496.55 samples/sec Loss 1.0539 LearningRate 0.000002 Epoch: 38 Global Step: 800430 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:35:54,725-Speed 2495.23 samples/sec Loss 1.0611 LearningRate 0.000002 Epoch: 38 Global Step: 800440 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:36:02,939-Speed 2493.65 samples/sec Loss 1.0469 LearningRate 0.000002 Epoch: 38 Global Step: 800450 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:36:11,147-Speed 2495.31 samples/sec Loss 1.0785 LearningRate 0.000002 Epoch: 38 Global Step: 800460 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:36:19,319-Speed 2506.65 samples/sec Loss 1.0680 LearningRate 0.000002 Epoch: 38 Global Step: 800470 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:36:27,524-Speed 2496.25 samples/sec Loss 1.0555 LearningRate 0.000002 Epoch: 38 Global Step: 800480 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:36:35,734-Speed 2495.40 samples/sec Loss 1.0550 LearningRate 0.000002 Epoch: 38 Global Step: 800490 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:36:43,942-Speed 2495.31 samples/sec Loss 1.0586 LearningRate 0.000002 Epoch: 38 Global Step: 800500 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:36:52,149-Speed 2495.78 samples/sec Loss 1.0269 LearningRate 0.000002 Epoch: 38 Global Step: 800510 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:37:00,357-Speed 2495.73 samples/sec Loss 1.0197 LearningRate 0.000002 Epoch: 38 Global Step: 800520 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:37:08,511-Speed 2511.90 samples/sec Loss 1.0440 LearningRate 0.000002 Epoch: 38 Global Step: 800530 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:37:16,730-Speed 2492.17 samples/sec Loss 1.0705 LearningRate 0.000002 Epoch: 38 Global Step: 800540 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:37:24,936-Speed 2496.52 samples/sec Loss 1.0478 LearningRate 0.000002 Epoch: 38 Global Step: 800550 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:37:33,141-Speed 2496.40 samples/sec Loss 1.0742 LearningRate 0.000002 Epoch: 38 Global Step: 800560 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:37:41,343-Speed 2497.24 samples/sec Loss 1.0253 LearningRate 0.000002 Epoch: 38 Global Step: 800570 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:37:49,554-Speed 2495.41 samples/sec Loss 1.0657 LearningRate 0.000002 Epoch: 38 Global Step: 800580 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:37:57,700-Speed 2514.35 samples/sec Loss 1.0390 LearningRate 0.000002 Epoch: 38 Global Step: 800590 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:38:05,904-Speed 2497.44 samples/sec Loss 1.0548 LearningRate 0.000002 Epoch: 38 Global Step: 800600 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:38:14,108-Speed 2496.68 samples/sec Loss 1.0594 LearningRate 0.000002 Epoch: 38 Global Step: 800610 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:38:22,313-Speed 2496.48 samples/sec Loss 1.0453 LearningRate 0.000002 Epoch: 38 Global Step: 800620 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:38:30,519-Speed 2496.08 samples/sec Loss 1.0569 LearningRate 0.000002 Epoch: 38 Global Step: 800630 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:38:38,724-Speed 2496.31 samples/sec Loss 1.0501 LearningRate 0.000002 Epoch: 38 Global Step: 800640 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:38:46,875-Speed 2512.81 samples/sec Loss 1.0745 LearningRate 0.000001 Epoch: 38 Global Step: 800650 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:38:55,080-Speed 2496.48 samples/sec Loss 1.0803 LearningRate 0.000001 Epoch: 38 Global Step: 800660 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:39:03,297-Speed 2492.66 samples/sec Loss 1.0778 LearningRate 0.000001 Epoch: 38 Global Step: 800670 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:39:11,501-Speed 2496.97 samples/sec Loss 1.0402 LearningRate 0.000001 Epoch: 38 Global Step: 800680 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:39:19,704-Speed 2496.82 samples/sec Loss 1.0602 LearningRate 0.000001 Epoch: 38 Global Step: 800690 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:39:27,911-Speed 2495.97 samples/sec Loss 1.0703 LearningRate 0.000001 Epoch: 38 Global Step: 800700 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:39:36,062-Speed 2512.92 samples/sec Loss 1.0562 LearningRate 0.000001 Epoch: 38 Global Step: 800710 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:39:44,267-Speed 2496.51 samples/sec Loss 1.0560 LearningRate 0.000001 Epoch: 38 Global Step: 800720 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:39:52,468-Speed 2497.67 samples/sec Loss 1.0692 LearningRate 0.000001 Epoch: 38 Global Step: 800730 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:40:00,673-Speed 2496.40 samples/sec Loss 1.0565 LearningRate 0.000001 Epoch: 38 Global Step: 800740 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:40:08,878-Speed 2496.17 samples/sec Loss 1.0423 LearningRate 0.000001 Epoch: 38 Global Step: 800750 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:40:17,085-Speed 2495.85 samples/sec Loss 1.0806 LearningRate 0.000001 Epoch: 38 Global Step: 800760 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:40:25,234-Speed 2513.63 samples/sec Loss 1.0760 LearningRate 0.000001 Epoch: 38 Global Step: 800770 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:40:33,448-Speed 2493.48 samples/sec Loss 1.0443 LearningRate 0.000001 Epoch: 38 Global Step: 800780 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:40:41,656-Speed 2495.43 samples/sec Loss 1.0513 LearningRate 0.000001 Epoch: 38 Global Step: 800790 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:40:49,859-Speed 2496.92 samples/sec Loss 1.0527 LearningRate 0.000001 Epoch: 38 Global Step: 800800 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:40:58,064-Speed 2496.46 samples/sec Loss 1.0439 LearningRate 0.000001 Epoch: 38 Global Step: 800810 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:41:06,278-Speed 2493.93 samples/sec Loss 1.0853 LearningRate 0.000001 Epoch: 38 Global Step: 800820 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:41:14,436-Speed 2510.51 samples/sec Loss 1.0749 LearningRate 0.000001 Epoch: 38 Global Step: 800830 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:41:22,646-Speed 2494.92 samples/sec Loss 1.0799 LearningRate 0.000001 Epoch: 38 Global Step: 800840 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:41:30,856-Speed 2494.99 samples/sec Loss 1.0391 LearningRate 0.000001 Epoch: 38 Global Step: 800850 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:41:39,072-Speed 2492.92 samples/sec Loss 1.0719 LearningRate 0.000001 Epoch: 38 Global Step: 800860 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:41:47,279-Speed 2496.00 samples/sec Loss 1.0742 LearningRate 0.000001 Epoch: 38 Global Step: 800870 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:41:55,483-Speed 2496.46 samples/sec Loss 1.0460 LearningRate 0.000001 Epoch: 38 Global Step: 800880 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:42:03,636-Speed 2512.74 samples/sec Loss 1.0586 LearningRate 0.000001 Epoch: 38 Global Step: 800890 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:42:11,840-Speed 2496.76 samples/sec Loss 1.0446 LearningRate 0.000001 Epoch: 38 Global Step: 800900 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-07-13 05:42:20,045-Speed 2496.37 samples/sec Loss 1.0381 LearningRate 0.000001 Epoch: 38 Global Step: 800910 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:42:28,249-Speed 2496.86 samples/sec Loss 1.0747 LearningRate 0.000001 Epoch: 38 Global Step: 800920 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:42:36,456-Speed 2495.99 samples/sec Loss 1.0569 LearningRate 0.000001 Epoch: 38 Global Step: 800930 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:42:44,663-Speed 2495.57 samples/sec Loss 1.0797 LearningRate 0.000001 Epoch: 38 Global Step: 800940 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:42:52,816-Speed 2512.41 samples/sec Loss 1.0463 LearningRate 0.000001 Epoch: 38 Global Step: 800950 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:43:01,018-Speed 2497.46 samples/sec Loss 1.0968 LearningRate 0.000001 Epoch: 38 Global Step: 800960 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:43:09,225-Speed 2495.97 samples/sec Loss 1.0683 LearningRate 0.000001 Epoch: 38 Global Step: 800970 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:43:17,434-Speed 2495.63 samples/sec Loss 1.0639 LearningRate 0.000001 Epoch: 38 Global Step: 800980 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:43:25,637-Speed 2496.91 samples/sec Loss 1.0768 LearningRate 0.000001 Epoch: 38 Global Step: 800990 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:43:33,857-Speed 2491.63 samples/sec Loss 1.0496 LearningRate 0.000001 Epoch: 38 Global Step: 801000 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:43:42,007-Speed 2513.39 samples/sec Loss 1.0536 LearningRate 0.000001 Epoch: 38 Global Step: 801010 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:43:50,210-Speed 2496.91 samples/sec Loss 1.0697 LearningRate 0.000001 Epoch: 38 Global Step: 801020 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:43:58,430-Speed 2491.89 samples/sec Loss 1.0747 LearningRate 0.000001 Epoch: 38 Global Step: 801030 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:44:06,632-Speed 2497.24 samples/sec Loss 1.0415 LearningRate 0.000001 Epoch: 38 Global Step: 801040 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:44:14,847-Speed 2493.33 samples/sec Loss 1.0860 LearningRate 0.000001 Epoch: 38 Global Step: 801050 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:44:23,060-Speed 2494.10 samples/sec Loss 1.0463 LearningRate 0.000001 Epoch: 38 Global Step: 801060 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:44:31,210-Speed 2513.24 samples/sec Loss 1.0564 LearningRate 0.000001 Epoch: 38 Global Step: 801070 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:44:39,412-Speed 2497.33 samples/sec Loss 1.0641 LearningRate 0.000001 Epoch: 38 Global Step: 801080 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:44:47,628-Speed 2493.19 samples/sec Loss 1.0612 LearningRate 0.000001 Epoch: 38 Global Step: 801090 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:44:55,829-Speed 2497.33 samples/sec Loss 1.0681 LearningRate 0.000001 Epoch: 38 Global Step: 801100 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:45:04,033-Speed 2496.76 samples/sec Loss 1.0210 LearningRate 0.000001 Epoch: 38 Global Step: 801110 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:45:12,243-Speed 2494.93 samples/sec Loss 1.0548 LearningRate 0.000001 Epoch: 38 Global Step: 801120 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:45:20,394-Speed 2513.00 samples/sec Loss 1.0551 LearningRate 0.000001 Epoch: 38 Global Step: 801130 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:45:28,603-Speed 2495.19 samples/sec Loss 1.0535 LearningRate 0.000001 Epoch: 38 Global Step: 801140 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:45:36,807-Speed 2496.88 samples/sec Loss 1.0686 LearningRate 0.000001 Epoch: 38 Global Step: 801150 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-07-13 05:45:45,013-Speed 2496.14 samples/sec Loss 1.0847 LearningRate 0.000001 Epoch: 38 Global Step: 801160 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:45:53,220-Speed 2495.81 samples/sec Loss 1.0650 LearningRate 0.000001 Epoch: 38 Global Step: 801170 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:46:01,426-Speed 2496.17 samples/sec Loss 1.0537 LearningRate 0.000001 Epoch: 38 Global Step: 801180 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:46:09,576-Speed 2513.67 samples/sec Loss 1.0525 LearningRate 0.000001 Epoch: 38 Global Step: 801190 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:46:17,788-Speed 2494.29 samples/sec Loss 1.0576 LearningRate 0.000001 Epoch: 38 Global Step: 801200 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:46:25,994-Speed 2496.32 samples/sec Loss 1.0681 LearningRate 0.000001 Epoch: 38 Global Step: 801210 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:46:34,203-Speed 2494.95 samples/sec Loss 1.0517 LearningRate 0.000001 Epoch: 38 Global Step: 801220 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:46:42,411-Speed 2495.65 samples/sec Loss 1.0591 LearningRate 0.000001 Epoch: 38 Global Step: 801230 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:46:50,621-Speed 2495.13 samples/sec Loss 1.0567 LearningRate 0.000001 Epoch: 38 Global Step: 801240 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:46:58,774-Speed 2512.28 samples/sec Loss 1.0733 LearningRate 0.000001 Epoch: 38 Global Step: 801250 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:47:06,985-Speed 2494.46 samples/sec Loss 1.0619 LearningRate 0.000001 Epoch: 38 Global Step: 801260 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:47:15,195-Speed 2494.98 samples/sec Loss 1.0478 LearningRate 0.000001 Epoch: 38 Global Step: 801270 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:47:23,406-Speed 2495.54 samples/sec Loss 1.0608 LearningRate 0.000001 Epoch: 38 Global Step: 801280 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:47:31,612-Speed 2496.02 samples/sec Loss 1.0558 LearningRate 0.000001 Epoch: 38 Global Step: 801290 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:47:39,821-Speed 2495.36 samples/sec Loss 1.0713 LearningRate 0.000001 Epoch: 38 Global Step: 801300 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:47:47,975-Speed 2514.66 samples/sec Loss 1.0750 LearningRate 0.000001 Epoch: 38 Global Step: 801310 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:47:56,182-Speed 2495.98 samples/sec Loss 1.0600 LearningRate 0.000001 Epoch: 38 Global Step: 801320 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:48:04,389-Speed 2496.07 samples/sec Loss 1.0562 LearningRate 0.000001 Epoch: 38 Global Step: 801330 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:48:12,592-Speed 2496.77 samples/sec Loss 1.0571 LearningRate 0.000001 Epoch: 38 Global Step: 801340 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:48:20,797-Speed 2496.80 samples/sec Loss 1.0736 LearningRate 0.000001 Epoch: 38 Global Step: 801350 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:48:29,006-Speed 2495.11 samples/sec Loss 1.0544 LearningRate 0.000001 Epoch: 38 Global Step: 801360 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:48:37,160-Speed 2512.16 samples/sec Loss 1.0587 LearningRate 0.000001 Epoch: 38 Global Step: 801370 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:48:45,368-Speed 2495.49 samples/sec Loss 1.0480 LearningRate 0.000001 Epoch: 38 Global Step: 801380 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:48:53,572-Speed 2496.58 samples/sec Loss 1.0670 LearningRate 0.000001 Epoch: 38 Global Step: 801390 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:49:01,776-Speed 2496.99 samples/sec Loss 1.0569 LearningRate 0.000001 Epoch: 38 Global Step: 801400 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:49:09,990-Speed 2493.60 samples/sec Loss 1.0877 LearningRate 0.000001 Epoch: 38 Global Step: 801410 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:49:18,193-Speed 2496.82 samples/sec Loss 1.0457 LearningRate 0.000001 Epoch: 38 Global Step: 801420 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:49:26,361-Speed 2507.80 samples/sec Loss 1.0953 LearningRate 0.000001 Epoch: 38 Global Step: 801430 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:49:34,566-Speed 2496.44 samples/sec Loss 1.0673 LearningRate 0.000001 Epoch: 38 Global Step: 801440 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:49:42,775-Speed 2495.08 samples/sec Loss 1.0797 LearningRate 0.000001 Epoch: 38 Global Step: 801450 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:49:50,986-Speed 2494.55 samples/sec Loss 1.0850 LearningRate 0.000001 Epoch: 38 Global Step: 801460 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:49:59,190-Speed 2496.94 samples/sec Loss 1.0608 LearningRate 0.000001 Epoch: 38 Global Step: 801470 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:50:07,399-Speed 2495.26 samples/sec Loss 1.0634 LearningRate 0.000001 Epoch: 38 Global Step: 801480 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:50:15,553-Speed 2511.84 samples/sec Loss 1.0449 LearningRate 0.000001 Epoch: 38 Global Step: 801490 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:50:23,760-Speed 2495.82 samples/sec Loss 1.0531 LearningRate 0.000001 Epoch: 38 Global Step: 801500 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:50:31,972-Speed 2494.23 samples/sec Loss 1.0687 LearningRate 0.000001 Epoch: 38 Global Step: 801510 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:50:40,180-Speed 2495.76 samples/sec Loss 1.0756 LearningRate 0.000001 Epoch: 38 Global Step: 801520 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:50:48,385-Speed 2496.57 samples/sec Loss 1.0835 LearningRate 0.000001 Epoch: 38 Global Step: 801530 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:50:56,589-Speed 2496.62 samples/sec Loss 1.0588 LearningRate 0.000001 Epoch: 38 Global Step: 801540 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:51:04,743-Speed 2512.07 samples/sec Loss 1.0528 LearningRate 0.000001 Epoch: 38 Global Step: 801550 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:51:12,949-Speed 2495.83 samples/sec Loss 1.0665 LearningRate 0.000001 Epoch: 38 Global Step: 801560 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:51:21,154-Speed 2496.41 samples/sec Loss 1.0809 LearningRate 0.000001 Epoch: 38 Global Step: 801570 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:51:29,366-Speed 2494.51 samples/sec Loss 1.0653 LearningRate 0.000001 Epoch: 38 Global Step: 801580 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:51:37,580-Speed 2493.70 samples/sec Loss 1.0550 LearningRate 0.000001 Epoch: 38 Global Step: 801590 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:51:45,785-Speed 2496.29 samples/sec Loss 1.0860 LearningRate 0.000001 Epoch: 38 Global Step: 801600 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:51:53,938-Speed 2512.36 samples/sec Loss 1.0510 LearningRate 0.000001 Epoch: 38 Global Step: 801610 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:52:02,142-Speed 2496.62 samples/sec Loss 1.0533 LearningRate 0.000001 Epoch: 38 Global Step: 801620 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:52:10,348-Speed 2496.32 samples/sec Loss 1.0994 LearningRate 0.000001 Epoch: 38 Global Step: 801630 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:52:18,550-Speed 2497.65 samples/sec Loss 1.1120 LearningRate 0.000001 Epoch: 38 Global Step: 801640 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:52:26,760-Speed 2495.14 samples/sec Loss 1.0775 LearningRate 0.000001 Epoch: 38 Global Step: 801650 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:52:34,966-Speed 2496.36 samples/sec Loss 1.0518 LearningRate 0.000001 Epoch: 38 Global Step: 801660 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:52:43,118-Speed 2512.44 samples/sec Loss 1.0812 LearningRate 0.000001 Epoch: 38 Global Step: 801670 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:52:51,332-Speed 2493.63 samples/sec Loss 1.0575 LearningRate 0.000001 Epoch: 38 Global Step: 801680 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:52:59,547-Speed 2493.67 samples/sec Loss 1.0852 LearningRate 0.000001 Epoch: 38 Global Step: 801690 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:53:07,757-Speed 2494.83 samples/sec Loss 1.0679 LearningRate 0.000001 Epoch: 38 Global Step: 801700 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:53:15,965-Speed 2495.49 samples/sec Loss 1.0563 LearningRate 0.000001 Epoch: 38 Global Step: 801710 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:53:24,170-Speed 2496.46 samples/sec Loss 1.0765 LearningRate 0.000001 Epoch: 38 Global Step: 801720 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:53:32,319-Speed 2513.75 samples/sec Loss 1.0702 LearningRate 0.000001 Epoch: 38 Global Step: 801730 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:53:40,525-Speed 2496.14 samples/sec Loss 1.0497 LearningRate 0.000001 Epoch: 38 Global Step: 801740 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:53:48,740-Speed 2493.21 samples/sec Loss 1.0542 LearningRate 0.000001 Epoch: 38 Global Step: 801750 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:53:56,958-Speed 2492.42 samples/sec Loss 1.0855 LearningRate 0.000001 Epoch: 38 Global Step: 801760 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:54:05,167-Speed 2495.37 samples/sec Loss 1.0777 LearningRate 0.000001 Epoch: 38 Global Step: 801770 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:54:13,370-Speed 2496.85 samples/sec Loss 1.0510 LearningRate 0.000001 Epoch: 38 Global Step: 801780 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:54:21,525-Speed 2511.94 samples/sec Loss 1.0832 LearningRate 0.000001 Epoch: 38 Global Step: 801790 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:54:29,732-Speed 2495.91 samples/sec Loss 1.0555 LearningRate 0.000001 Epoch: 38 Global Step: 801800 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:54:37,950-Speed 2492.62 samples/sec Loss 1.0902 LearningRate 0.000001 Epoch: 38 Global Step: 801810 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:54:46,153-Speed 2497.17 samples/sec Loss 1.0861 LearningRate 0.000001 Epoch: 38 Global Step: 801820 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:54:54,364-Speed 2494.39 samples/sec Loss 1.0626 LearningRate 0.000001 Epoch: 38 Global Step: 801830 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:55:02,580-Speed 2493.01 samples/sec Loss 1.0646 LearningRate 0.000001 Epoch: 38 Global Step: 801840 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:55:10,729-Speed 2513.84 samples/sec Loss 1.0965 LearningRate 0.000001 Epoch: 38 Global Step: 801850 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:55:18,933-Speed 2496.62 samples/sec Loss 1.0446 LearningRate 0.000001 Epoch: 38 Global Step: 801860 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:55:27,137-Speed 2496.84 samples/sec Loss 1.0706 LearningRate 0.000001 Epoch: 38 Global Step: 801870 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:55:35,341-Speed 2496.69 samples/sec Loss 1.0238 LearningRate 0.000001 Epoch: 38 Global Step: 801880 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:55:43,547-Speed 2496.32 samples/sec Loss 1.0937 LearningRate 0.000001 Epoch: 38 Global Step: 801890 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:55:51,751-Speed 2496.89 samples/sec Loss 1.0912 LearningRate 0.000001 Epoch: 38 Global Step: 801900 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:55:59,905-Speed 2511.97 samples/sec Loss 1.0570 LearningRate 0.000001 Epoch: 38 Global Step: 801910 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:56:08,110-Speed 2496.30 samples/sec Loss 1.0606 LearningRate 0.000001 Epoch: 38 Global Step: 801920 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:56:16,315-Speed 2496.55 samples/sec Loss 1.0570 LearningRate 0.000001 Epoch: 38 Global Step: 801930 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:56:24,520-Speed 2496.61 samples/sec Loss 1.0439 LearningRate 0.000001 Epoch: 38 Global Step: 801940 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:56:32,725-Speed 2496.60 samples/sec Loss 1.0932 LearningRate 0.000001 Epoch: 38 Global Step: 801950 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:56:40,929-Speed 2496.56 samples/sec Loss 1.0656 LearningRate 0.000001 Epoch: 38 Global Step: 801960 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:56:49,079-Speed 2513.11 samples/sec Loss 1.0465 LearningRate 0.000001 Epoch: 38 Global Step: 801970 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:56:57,284-Speed 2496.58 samples/sec Loss 1.0491 LearningRate 0.000001 Epoch: 38 Global Step: 801980 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:57:05,491-Speed 2495.65 samples/sec Loss 1.0741 LearningRate 0.000001 Epoch: 38 Global Step: 801990 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:57:13,695-Speed 2496.93 samples/sec Loss 1.0580 LearningRate 0.000001 Epoch: 38 Global Step: 802000 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:57:21,902-Speed 2496.05 samples/sec Loss 1.0513 LearningRate 0.000001 Epoch: 38 Global Step: 802010 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:57:30,107-Speed 2496.35 samples/sec Loss 1.0950 LearningRate 0.000001 Epoch: 38 Global Step: 802020 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:57:38,256-Speed 2513.57 samples/sec Loss 1.0541 LearningRate 0.000001 Epoch: 38 Global Step: 802030 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:57:46,457-Speed 2497.60 samples/sec Loss 1.0754 LearningRate 0.000001 Epoch: 38 Global Step: 802040 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:57:54,658-Speed 2497.47 samples/sec Loss 1.0505 LearningRate 0.000001 Epoch: 38 Global Step: 802050 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:58:02,865-Speed 2495.85 samples/sec Loss 1.0929 LearningRate 0.000001 Epoch: 38 Global Step: 802060 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:58:11,071-Speed 2496.38 samples/sec Loss 1.0622 LearningRate 0.000001 Epoch: 38 Global Step: 802070 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:58:19,276-Speed 2496.47 samples/sec Loss 1.0606 LearningRate 0.000001 Epoch: 38 Global Step: 802080 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:58:27,430-Speed 2512.55 samples/sec Loss 1.0776 LearningRate 0.000001 Epoch: 38 Global Step: 802090 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:58:35,635-Speed 2496.24 samples/sec Loss 1.0718 LearningRate 0.000001 Epoch: 38 Global Step: 802100 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:58:43,838-Speed 2497.35 samples/sec Loss 1.0716 LearningRate 0.000001 Epoch: 38 Global Step: 802110 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-07-13 05:58:52,045-Speed 2495.91 samples/sec Loss 1.0729 LearningRate 0.000001 Epoch: 38 Global Step: 802120 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-07-13 05:59:00,248-Speed 2496.86 samples/sec Loss 1.0537 LearningRate 0.000001 Epoch: 38 Global Step: 802130 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-07-13 05:59:08,449-Speed 2497.55 samples/sec Loss 1.0483 LearningRate 0.000001 Epoch: 38 Global Step: 802140 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-07-13 05:59:16,599-Speed 2513.61 samples/sec Loss 1.0778 LearningRate 0.000001 Epoch: 38 Global Step: 802150 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-07-13 05:59:24,800-Speed 2497.46 samples/sec Loss 1.0749 LearningRate 0.000001 Epoch: 38 Global Step: 802160 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-07-13 05:59:32,965-Speed 2508.62 samples/sec Loss 1.0671 LearningRate 0.000001 Epoch: 38 Global Step: 802170 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:59:41,169-Speed 2496.61 samples/sec Loss 1.0551 LearningRate 0.000001 Epoch: 38 Global Step: 802180 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:59:49,373-Speed 2496.61 samples/sec Loss 1.0748 LearningRate 0.000001 Epoch: 38 Global Step: 802190 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 05:59:57,578-Speed 2496.52 samples/sec Loss 1.0567 LearningRate 0.000001 Epoch: 38 Global Step: 802200 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 06:00:05,734-Speed 2511.66 samples/sec Loss 1.0915 LearningRate 0.000001 Epoch: 38 Global Step: 802210 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 06:00:13,938-Speed 2496.73 samples/sec Loss 1.0779 LearningRate 0.000001 Epoch: 38 Global Step: 802220 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 06:00:22,156-Speed 2492.92 samples/sec Loss 1.0473 LearningRate 0.000001 Epoch: 38 Global Step: 802230 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 06:00:30,363-Speed 2495.77 samples/sec Loss 1.0768 LearningRate 0.000001 Epoch: 38 Global Step: 802240 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 06:00:38,567-Speed 2496.55 samples/sec Loss 1.0754 LearningRate 0.000001 Epoch: 38 Global Step: 802250 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 06:00:46,773-Speed 2496.32 samples/sec Loss 1.0534 LearningRate 0.000001 Epoch: 38 Global Step: 802260 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 06:00:54,923-Speed 2513.46 samples/sec Loss 1.0617 LearningRate 0.000001 Epoch: 38 Global Step: 802270 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 06:01:03,129-Speed 2496.03 samples/sec Loss 1.0735 LearningRate 0.000001 Epoch: 38 Global Step: 802280 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-07-13 06:01:11,290-Speed 2509.88 samples/sec Loss 1.0534 LearningRate 0.000001 Epoch: 38 Global Step: 802290 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-07-13 06:01:19,496-Speed 2496.17 samples/sec Loss 1.0546 LearningRate 0.000001 Epoch: 38 Global Step: 802300 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-07-13 06:01:27,698-Speed 2497.06 samples/sec Loss 1.0692 LearningRate 0.000001 Epoch: 38 Global Step: 802310
Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:01:35,905-Speed 2496.01 samples/sec Loss 1.0445 LearningRate 0.000001 Epoch: 38 Global Step: 802320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:01:44,056-Speed 2513.04 samples/sec Loss 1.0533 LearningRate 0.000001 Epoch: 38 Global Step: 802330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:01:52,264-Speed 2495.08 samples/sec Loss 1.0612 LearningRate 0.000001 Epoch: 38 Global Step: 802340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:02:00,472-Speed 2495.72 samples/sec Loss 1.0568 LearningRate 0.000001 Epoch: 38 Global Step: 802350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:02:08,684-Speed 2494.28 samples/sec Loss 1.0743 LearningRate 0.000001 Epoch: 38 Global Step: 802360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:02:16,895-Speed 2494.66 samples/sec Loss 1.0777 LearningRate 0.000001 Epoch: 38 Global Step: 802370 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:02:25,104-Speed 2495.25 samples/sec Loss 1.0735 LearningRate 0.000001 Epoch: 38 Global Step: 802380 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:02:33,264-Speed 2510.40 samples/sec Loss 1.0764 LearningRate 0.000001 Epoch: 38 Global Step: 802390 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:02:41,470-Speed 2496.09 samples/sec Loss 1.0728 LearningRate 0.000001 Epoch: 38 Global Step: 802400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:02:49,677-Speed 2495.76 samples/sec Loss 1.0838 LearningRate 0.000001 Epoch: 38 Global Step: 802410 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:02:57,887-Speed 2495.03 samples/sec Loss 1.0621 LearningRate 0.000001 Epoch: 38 Global Step: 802420 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:03:06,108-Speed 2491.58 samples/sec Loss 1.0612 LearningRate 0.000001 Epoch: 38 Global Step: 802430 Fp16 Grad Scale: 
16384 Required: 6 hours Training: 2022-07-13 06:03:14,315-Speed 2495.86 samples/sec Loss 1.0149 LearningRate 0.000001 Epoch: 38 Global Step: 802440 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:03:22,464-Speed 2513.54 samples/sec Loss 1.0875 LearningRate 0.000001 Epoch: 38 Global Step: 802450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:03:30,681-Speed 2492.62 samples/sec Loss 1.0678 LearningRate 0.000001 Epoch: 38 Global Step: 802460 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:03:38,891-Speed 2495.23 samples/sec Loss 1.0589 LearningRate 0.000001 Epoch: 38 Global Step: 802470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:03:47,098-Speed 2495.47 samples/sec Loss 1.0383 LearningRate 0.000001 Epoch: 38 Global Step: 802480 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:03:55,306-Speed 2495.53 samples/sec Loss 1.0850 LearningRate 0.000001 Epoch: 38 Global Step: 802490 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:04:03,523-Speed 2493.00 samples/sec Loss 1.0426 LearningRate 0.000001 Epoch: 38 Global Step: 802500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:04:11,684-Speed 2510.11 samples/sec Loss 1.0724 LearningRate 0.000001 Epoch: 38 Global Step: 802510 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:04:19,890-Speed 2496.26 samples/sec Loss 1.0855 LearningRate 0.000001 Epoch: 38 Global Step: 802520 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:04:28,095-Speed 2496.46 samples/sec Loss 1.0551 LearningRate 0.000001 Epoch: 38 Global Step: 802530 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:04:36,303-Speed 2495.62 samples/sec Loss 1.0593 LearningRate 0.000001 Epoch: 38 Global Step: 802540 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:04:44,510-Speed 2495.74 samples/sec Loss 1.0449 LearningRate 0.000001 Epoch: 38 Global Step: 802550 Fp16 Grad Scale: 16384 Required: 6 
hours Training: 2022-07-13 06:04:52,721-Speed 2494.70 samples/sec Loss 1.0625 LearningRate 0.000001 Epoch: 38 Global Step: 802560 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:05:00,875-Speed 2511.87 samples/sec Loss 1.0486 LearningRate 0.000001 Epoch: 38 Global Step: 802570 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:05:09,083-Speed 2495.81 samples/sec Loss 1.0569 LearningRate 0.000001 Epoch: 38 Global Step: 802580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:05:17,287-Speed 2496.68 samples/sec Loss 1.0651 LearningRate 0.000001 Epoch: 38 Global Step: 802590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:05:25,492-Speed 2496.35 samples/sec Loss 1.0844 LearningRate 0.000001 Epoch: 38 Global Step: 802600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:05:33,700-Speed 2495.67 samples/sec Loss 1.0568 LearningRate 0.000001 Epoch: 38 Global Step: 802610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:05:41,911-Speed 2494.54 samples/sec Loss 1.0750 LearningRate 0.000001 Epoch: 38 Global Step: 802620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:05:50,069-Speed 2510.89 samples/sec Loss 1.0579 LearningRate 0.000001 Epoch: 38 Global Step: 802630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:05:58,278-Speed 2495.14 samples/sec Loss 1.0367 LearningRate 0.000001 Epoch: 38 Global Step: 802640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:06:06,487-Speed 2495.33 samples/sec Loss 1.0541 LearningRate 0.000001 Epoch: 38 Global Step: 802650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:06:14,701-Speed 2493.90 samples/sec Loss 1.0618 LearningRate 0.000001 Epoch: 38 Global Step: 802660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:06:22,907-Speed 2496.35 samples/sec Loss 1.0565 LearningRate 0.000001 Epoch: 38 Global Step: 802670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 
2022-07-13 06:06:31,111-Speed 2496.72 samples/sec Loss 1.0475 LearningRate 0.000001 Epoch: 38 Global Step: 802680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:06:39,263-Speed 2512.78 samples/sec Loss 1.0415 LearningRate 0.000001 Epoch: 38 Global Step: 802690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:06:47,472-Speed 2495.49 samples/sec Loss 1.0508 LearningRate 0.000001 Epoch: 38 Global Step: 802700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:06:55,675-Speed 2497.00 samples/sec Loss 1.0655 LearningRate 0.000001 Epoch: 38 Global Step: 802710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:07:03,880-Speed 2496.65 samples/sec Loss 1.0498 LearningRate 0.000001 Epoch: 38 Global Step: 802720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:07:12,084-Speed 2496.52 samples/sec Loss 1.0436 LearningRate 0.000001 Epoch: 38 Global Step: 802730 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:07:20,290-Speed 2496.30 samples/sec Loss 1.0901 LearningRate 0.000001 Epoch: 38 Global Step: 802740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:07:28,442-Speed 2512.98 samples/sec Loss 1.0442 LearningRate 0.000001 Epoch: 38 Global Step: 802750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:07:36,659-Speed 2492.73 samples/sec Loss 1.0611 LearningRate 0.000001 Epoch: 38 Global Step: 802760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:07:44,869-Speed 2494.79 samples/sec Loss 1.0393 LearningRate 0.000001 Epoch: 38 Global Step: 802770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:07:53,079-Speed 2494.98 samples/sec Loss 1.0804 LearningRate 0.000001 Epoch: 38 Global Step: 802780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:08:01,287-Speed 2495.82 samples/sec Loss 1.0884 LearningRate 0.000001 Epoch: 38 Global Step: 802790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 
06:08:09,493-Speed 2495.99 samples/sec Loss 1.0349 LearningRate 0.000001 Epoch: 38 Global Step: 802800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:08:17,648-Speed 2512.04 samples/sec Loss 1.0729 LearningRate 0.000001 Epoch: 38 Global Step: 802810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:08:25,855-Speed 2495.83 samples/sec Loss 1.0723 LearningRate 0.000001 Epoch: 38 Global Step: 802820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:08:34,062-Speed 2495.60 samples/sec Loss 1.0670 LearningRate 0.000001 Epoch: 38 Global Step: 802830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:08:42,274-Speed 2494.42 samples/sec Loss 1.0470 LearningRate 0.000001 Epoch: 38 Global Step: 802840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:08:50,480-Speed 2496.27 samples/sec Loss 1.0838 LearningRate 0.000001 Epoch: 38 Global Step: 802850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:08:58,687-Speed 2495.97 samples/sec Loss 1.0673 LearningRate 0.000001 Epoch: 38 Global Step: 802860 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:09:06,844-Speed 2510.83 samples/sec Loss 1.0550 LearningRate 0.000001 Epoch: 38 Global Step: 802870 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:09:15,069-Speed 2490.51 samples/sec Loss 1.0461 LearningRate 0.000001 Epoch: 38 Global Step: 802880 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:09:23,274-Speed 2496.58 samples/sec Loss 1.0523 LearningRate 0.000001 Epoch: 38 Global Step: 802890 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:09:31,476-Speed 2497.19 samples/sec Loss 1.0652 LearningRate 0.000001 Epoch: 38 Global Step: 802900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:09:39,682-Speed 2496.27 samples/sec Loss 1.0887 LearningRate 0.000001 Epoch: 38 Global Step: 802910 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:09:47,884-Speed 
2497.22 samples/sec Loss 1.0705 LearningRate 0.000001 Epoch: 38 Global Step: 802920 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:09:56,035-Speed 2513.00 samples/sec Loss 1.0582 LearningRate 0.000001 Epoch: 38 Global Step: 802930 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:10:04,238-Speed 2496.98 samples/sec Loss 1.0962 LearningRate 0.000001 Epoch: 38 Global Step: 802940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:10:12,441-Speed 2497.27 samples/sec Loss 1.0728 LearningRate 0.000001 Epoch: 38 Global Step: 802950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:10:20,731-Speed 2470.67 samples/sec Loss 1.0636 LearningRate 0.000001 Epoch: 38 Global Step: 802960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:10:28,933-Speed 2497.79 samples/sec Loss 1.0693 LearningRate 0.000001 Epoch: 38 Global Step: 802970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:10:37,136-Speed 2497.28 samples/sec Loss 1.0947 LearningRate 0.000001 Epoch: 38 Global Step: 802980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:10:45,295-Speed 2510.57 samples/sec Loss 1.0679 LearningRate 0.000001 Epoch: 38 Global Step: 802990 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:10:53,502-Speed 2495.53 samples/sec Loss 1.0463 LearningRate 0.000001 Epoch: 38 Global Step: 803000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:11:01,715-Speed 2494.00 samples/sec Loss 1.0586 LearningRate 0.000001 Epoch: 38 Global Step: 803010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:11:09,921-Speed 2496.23 samples/sec Loss 1.0607 LearningRate 0.000001 Epoch: 38 Global Step: 803020 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:11:18,142-Speed 2491.72 samples/sec Loss 1.0495 LearningRate 0.000001 Epoch: 38 Global Step: 803030 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:11:26,346-Speed 2496.80 samples/sec 
Loss 1.0817 LearningRate 0.000001 Epoch: 38 Global Step: 803040 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:11:34,501-Speed 2511.84 samples/sec Loss 1.0574 LearningRate 0.000001 Epoch: 38 Global Step: 803050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:11:42,705-Speed 2496.70 samples/sec Loss 1.0342 LearningRate 0.000001 Epoch: 38 Global Step: 803060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:11:50,908-Speed 2496.81 samples/sec Loss 1.0351 LearningRate 0.000001 Epoch: 38 Global Step: 803070 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:11:59,114-Speed 2496.27 samples/sec Loss 1.0564 LearningRate 0.000001 Epoch: 38 Global Step: 803080 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:12:07,319-Speed 2496.39 samples/sec Loss 1.0706 LearningRate 0.000001 Epoch: 38 Global Step: 803090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:12:15,528-Speed 2495.27 samples/sec Loss 1.0632 LearningRate 0.000001 Epoch: 38 Global Step: 803100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:12:23,678-Speed 2513.31 samples/sec Loss 1.0698 LearningRate 0.000001 Epoch: 38 Global Step: 803110 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:12:31,916-Speed 2486.45 samples/sec Loss 1.0720 LearningRate 0.000001 Epoch: 38 Global Step: 803120 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:12:40,115-Speed 2498.06 samples/sec Loss 1.0700 LearningRate 0.000001 Epoch: 38 Global Step: 803130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:12:48,319-Speed 2496.69 samples/sec Loss 1.0993 LearningRate 0.000001 Epoch: 38 Global Step: 803140 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:12:56,528-Speed 2495.23 samples/sec Loss 1.0665 LearningRate 0.000001 Epoch: 38 Global Step: 803150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:13:04,753-Speed 2490.34 samples/sec Loss 1.0723 
LearningRate 0.000001 Epoch: 38 Global Step: 803160 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:13:12,907-Speed 2512.20 samples/sec Loss 1.0633 LearningRate 0.000001 Epoch: 38 Global Step: 803170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:13:21,113-Speed 2496.15 samples/sec Loss 1.0558 LearningRate 0.000001 Epoch: 38 Global Step: 803180 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:13:29,319-Speed 2496.37 samples/sec Loss 1.0642 LearningRate 0.000001 Epoch: 38 Global Step: 803190 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:13:37,523-Speed 2496.88 samples/sec Loss 1.0624 LearningRate 0.000001 Epoch: 38 Global Step: 803200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:13:45,729-Speed 2496.30 samples/sec Loss 1.0542 LearningRate 0.000001 Epoch: 38 Global Step: 803210 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:13:53,932-Speed 2497.01 samples/sec Loss 1.0690 LearningRate 0.000001 Epoch: 38 Global Step: 803220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:14:02,085-Speed 2512.20 samples/sec Loss 1.0297 LearningRate 0.000001 Epoch: 38 Global Step: 803230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:14:10,301-Speed 2493.13 samples/sec Loss 1.0777 LearningRate 0.000001 Epoch: 38 Global Step: 803240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:14:18,517-Speed 2492.93 samples/sec Loss 1.0626 LearningRate 0.000001 Epoch: 38 Global Step: 803250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:14:26,725-Speed 2495.67 samples/sec Loss 1.0652 LearningRate 0.000001 Epoch: 38 Global Step: 803260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:14:34,931-Speed 2496.10 samples/sec Loss 1.1016 LearningRate 0.000001 Epoch: 38 Global Step: 803270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:14:43,140-Speed 2495.30 samples/sec Loss 1.0858 LearningRate 
0.000001 Epoch: 38 Global Step: 803280 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:14:51,290-Speed 2513.16 samples/sec Loss 1.0653 LearningRate 0.000001 Epoch: 38 Global Step: 803290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:14:59,491-Speed 2497.64 samples/sec Loss 1.0845 LearningRate 0.000001 Epoch: 38 Global Step: 803300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:15:07,697-Speed 2496.02 samples/sec Loss 1.0988 LearningRate 0.000001 Epoch: 38 Global Step: 803310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:15:15,902-Speed 2496.62 samples/sec Loss 1.0588 LearningRate 0.000001 Epoch: 38 Global Step: 803320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:15:24,106-Speed 2496.62 samples/sec Loss 1.0787 LearningRate 0.000001 Epoch: 38 Global Step: 803330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:15:32,315-Speed 2495.14 samples/sec Loss 1.0951 LearningRate 0.000001 Epoch: 38 Global Step: 803340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:15:40,466-Speed 2513.19 samples/sec Loss 1.0935 LearningRate 0.000001 Epoch: 38 Global Step: 803350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:15:48,672-Speed 2496.26 samples/sec Loss 1.0383 LearningRate 0.000001 Epoch: 38 Global Step: 803360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:15:56,885-Speed 2494.09 samples/sec Loss 1.0552 LearningRate 0.000001 Epoch: 38 Global Step: 803370 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:16:05,095-Speed 2494.85 samples/sec Loss 1.0599 LearningRate 0.000001 Epoch: 38 Global Step: 803380 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:16:13,302-Speed 2495.78 samples/sec Loss 1.0444 LearningRate 0.000001 Epoch: 38 Global Step: 803390 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:16:21,518-Speed 2493.23 samples/sec Loss 1.1080 LearningRate 0.000001 Epoch: 38 
Global Step: 803400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:16:29,670-Speed 2512.82 samples/sec Loss 1.0560 LearningRate 0.000001 Epoch: 38 Global Step: 803410 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:16:37,879-Speed 2495.20 samples/sec Loss 1.0631 LearningRate 0.000001 Epoch: 38 Global Step: 803420 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:16:46,087-Speed 2495.53 samples/sec Loss 1.0871 LearningRate 0.000001 Epoch: 38 Global Step: 803430 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:16:54,303-Speed 2493.18 samples/sec Loss 1.0456 LearningRate 0.000001 Epoch: 38 Global Step: 803440 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:17:02,511-Speed 2495.76 samples/sec Loss 1.0759 LearningRate 0.000001 Epoch: 38 Global Step: 803450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:17:10,715-Speed 2496.69 samples/sec Loss 1.0738 LearningRate 0.000001 Epoch: 38 Global Step: 803460 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:17:18,872-Speed 2511.32 samples/sec Loss 1.0311 LearningRate 0.000001 Epoch: 38 Global Step: 803470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:17:27,079-Speed 2495.85 samples/sec Loss 1.0648 LearningRate 0.000001 Epoch: 38 Global Step: 803480 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:17:35,293-Speed 2494.00 samples/sec Loss 1.0679 LearningRate 0.000001 Epoch: 38 Global Step: 803490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-07-13 06:17:43,498-Speed 2496.49 samples/sec Loss 1.0835 LearningRate 0.000001 Epoch: 38 Global Step: 803500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-07-13 06:17:51,702-Speed 2496.77 samples/sec Loss 1.0489 LearningRate 0.000001 Epoch: 38 Global Step: 803510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-07-13 06:17:59,911-Speed 2495.40 samples/sec Loss 1.0305 LearningRate 0.000001 Epoch: 38 Global Step: 803520 
Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-07-13 06:18:08,064-Speed 2512.30 samples/sec Loss 1.0566 LearningRate 0.000001 Epoch: 38 Global Step: 803530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-07-13 06:18:16,281-Speed 2492.87 samples/sec Loss 1.0683 LearningRate 0.000001 Epoch: 38 Global Step: 803540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-07-13 06:18:24,489-Speed 2495.62 samples/sec Loss 1.0882 LearningRate 0.000001 Epoch: 38 Global Step: 803550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-07-13 06:18:32,707-Speed 2492.53 samples/sec Loss 1.0656 LearningRate 0.000001 Epoch: 38 Global Step: 803560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-07-13 06:18:40,869-Speed 2509.44 samples/sec Loss 1.0442 LearningRate 0.000001 Epoch: 38 Global Step: 803570 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:18:49,082-Speed 2493.85 samples/sec Loss 1.0674 LearningRate 0.000001 Epoch: 38 Global Step: 803580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:18:57,235-Speed 2512.54 samples/sec Loss 1.0646 LearningRate 0.000001 Epoch: 38 Global Step: 803590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:19:05,445-Speed 2494.95 samples/sec Loss 1.0492 LearningRate 0.000001 Epoch: 38 Global Step: 803600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:19:13,651-Speed 2495.98 samples/sec Loss 1.0614 LearningRate 0.000001 Epoch: 38 Global Step: 803610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:19:21,855-Speed 2497.09 samples/sec Loss 1.0313 LearningRate 0.000001 Epoch: 38 Global Step: 803620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:19:30,062-Speed 2495.65 samples/sec Loss 1.0857 LearningRate 0.000001 Epoch: 38 Global Step: 803630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:19:38,272-Speed 2494.87 samples/sec Loss 1.0601 LearningRate 0.000001 Epoch: 38 Global Step: 803640 Fp16 Grad Scale: 
16384 Required: 6 hours Training: 2022-07-13 06:19:46,427-Speed 2511.66 samples/sec Loss 1.0898 LearningRate 0.000001 Epoch: 38 Global Step: 803650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:19:54,634-Speed 2496.11 samples/sec Loss 1.0803 LearningRate 0.000001 Epoch: 38 Global Step: 803660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:20:02,834-Speed 2497.91 samples/sec Loss 1.0757 LearningRate 0.000001 Epoch: 38 Global Step: 803670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:20:11,043-Speed 2495.23 samples/sec Loss 1.0543 LearningRate 0.000001 Epoch: 38 Global Step: 803680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:20:19,262-Speed 2492.09 samples/sec Loss 1.0706 LearningRate 0.000001 Epoch: 38 Global Step: 803690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:20:27,466-Speed 2496.69 samples/sec Loss 1.0418 LearningRate 0.000001 Epoch: 38 Global Step: 803700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:20:35,618-Speed 2512.88 samples/sec Loss 1.0565 LearningRate 0.000001 Epoch: 38 Global Step: 803710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:20:43,821-Speed 2496.83 samples/sec Loss 1.0467 LearningRate 0.000001 Epoch: 38 Global Step: 803720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:20:52,028-Speed 2495.60 samples/sec Loss 1.0318 LearningRate 0.000001 Epoch: 38 Global Step: 803730 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:21:00,234-Speed 2496.15 samples/sec Loss 1.0450 LearningRate 0.000001 Epoch: 38 Global Step: 803740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:21:08,443-Speed 2495.24 samples/sec Loss 1.0739 LearningRate 0.000001 Epoch: 38 Global Step: 803750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:21:16,659-Speed 2493.02 samples/sec Loss 1.0374 LearningRate 0.000001 Epoch: 38 Global Step: 803760 Fp16 Grad Scale: 16384 Required: 6 
hours Training: 2022-07-13 06:21:24,809-Speed 2513.57 samples/sec Loss 1.0378 LearningRate 0.000001 Epoch: 38 Global Step: 803770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:21:33,012-Speed 2497.16 samples/sec Loss 1.0614 LearningRate 0.000001 Epoch: 38 Global Step: 803780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:21:41,215-Speed 2496.87 samples/sec Loss 1.0661 LearningRate 0.000001 Epoch: 38 Global Step: 803790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:21:49,422-Speed 2495.84 samples/sec Loss 1.0394 LearningRate 0.000001 Epoch: 38 Global Step: 803800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:21:57,627-Speed 2496.79 samples/sec Loss 1.0598 LearningRate 0.000001 Epoch: 38 Global Step: 803810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:22:05,835-Speed 2495.38 samples/sec Loss 1.0595 LearningRate 0.000001 Epoch: 38 Global Step: 803820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:22:13,989-Speed 2512.14 samples/sec Loss 1.0604 LearningRate 0.000001 Epoch: 38 Global Step: 803830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:22:22,200-Speed 2494.45 samples/sec Loss 1.0354 LearningRate 0.000001 Epoch: 38 Global Step: 803840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:22:30,404-Speed 2496.73 samples/sec Loss 1.1003 LearningRate 0.000001 Epoch: 38 Global Step: 803850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:22:38,607-Speed 2497.51 samples/sec Loss 1.0617 LearningRate 0.000001 Epoch: 38 Global Step: 803860 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:22:46,825-Speed 2492.47 samples/sec Loss 1.0254 LearningRate 0.000001 Epoch: 38 Global Step: 803870 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:22:55,035-Speed 2494.66 samples/sec Loss 1.0611 LearningRate 0.000001 Epoch: 38 Global Step: 803880 Fp16 Grad Scale: 16384 Required: 6 hours Training: 
2022-07-13 06:23:03,188-Speed 2512.68 samples/sec Loss 1.0376 LearningRate 0.000001 Epoch: 38 Global Step: 803890 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:23:11,401-Speed 2494.00 samples/sec Loss 1.0762 LearningRate 0.000001 Epoch: 38 Global Step: 803900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:23:19,610-Speed 2495.25 samples/sec Loss 1.0532 LearningRate 0.000001 Epoch: 38 Global Step: 803910 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:23:27,816-Speed 2496.11 samples/sec Loss 1.0591 LearningRate 0.000001 Epoch: 38 Global Step: 803920 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:23:36,032-Speed 2493.02 samples/sec Loss 1.0579 LearningRate 0.000001 Epoch: 38 Global Step: 803930 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:23:44,244-Speed 2494.02 samples/sec Loss 1.0502 LearningRate 0.000001 Epoch: 38 Global Step: 803940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:23:52,404-Speed 2510.61 samples/sec Loss 1.0574 LearningRate 0.000001 Epoch: 38 Global Step: 803950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:24:00,611-Speed 2495.69 samples/sec Loss 1.0597 LearningRate 0.000001 Epoch: 38 Global Step: 803960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:24:08,817-Speed 2496.19 samples/sec Loss 1.0558 LearningRate 0.000001 Epoch: 38 Global Step: 803970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:24:17,022-Speed 2496.38 samples/sec Loss 1.0504 LearningRate 0.000001 Epoch: 38 Global Step: 803980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:24:25,229-Speed 2495.97 samples/sec Loss 1.0534 LearningRate 0.000001 Epoch: 38 Global Step: 803990 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:24:33,437-Speed 2495.48 samples/sec Loss 1.0732 LearningRate 0.000001 Epoch: 38 Global Step: 804000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 
06:24:41,596-Speed 2510.79 samples/sec Loss 1.0447 LearningRate 0.000001 Epoch: 38 Global Step: 804010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:24:49,798-Speed 2497.42 samples/sec Loss 1.0723 LearningRate 0.000001 Epoch: 38 Global Step: 804020 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:24:58,002-Speed 2496.73 samples/sec Loss 1.0602 LearningRate 0.000001 Epoch: 38 Global Step: 804030 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:25:06,206-Speed 2496.45 samples/sec Loss 1.0768 LearningRate 0.000001 Epoch: 38 Global Step: 804040 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:25:14,418-Speed 2494.51 samples/sec Loss 1.0726 LearningRate 0.000001 Epoch: 38 Global Step: 804050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:25:22,624-Speed 2496.28 samples/sec Loss 1.0658 LearningRate 0.000001 Epoch: 38 Global Step: 804060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:25:30,774-Speed 2513.16 samples/sec Loss 1.0724 LearningRate 0.000001 Epoch: 38 Global Step: 804070 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:25:38,979-Speed 2496.55 samples/sec Loss 1.0389 LearningRate 0.000001 Epoch: 38 Global Step: 804080 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:25:47,184-Speed 2496.53 samples/sec Loss 1.0728 LearningRate 0.000001 Epoch: 38 Global Step: 804090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:25:55,386-Speed 2497.10 samples/sec Loss 1.0622 LearningRate 0.000001 Epoch: 38 Global Step: 804100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:26:03,592-Speed 2496.11 samples/sec Loss 1.0721 LearningRate 0.000001 Epoch: 38 Global Step: 804110 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:26:11,799-Speed 2496.14 samples/sec Loss 1.0704 LearningRate 0.000001 Epoch: 38 Global Step: 804120 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:26:19,958-Speed 
2510.46 samples/sec Loss 1.0583 LearningRate 0.000001 Epoch: 38 Global Step: 804130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:26:28,165-Speed 2496.11 samples/sec Loss 1.0692 LearningRate 0.000001 Epoch: 38 Global Step: 804140 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:26:36,373-Speed 2495.30 samples/sec Loss 1.0642 LearningRate 0.000001 Epoch: 38 Global Step: 804150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:26:44,582-Speed 2495.58 samples/sec Loss 1.0860 LearningRate 0.000001 Epoch: 38 Global Step: 804160 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:26:52,791-Speed 2495.59 samples/sec Loss 1.0809 LearningRate 0.000001 Epoch: 38 Global Step: 804170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:27:00,997-Speed 2496.33 samples/sec Loss 1.0693 LearningRate 0.000001 Epoch: 38 Global Step: 804180 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:27:09,152-Speed 2511.70 samples/sec Loss 1.0923 LearningRate 0.000001 Epoch: 38 Global Step: 804190 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:27:17,356-Speed 2496.68 samples/sec Loss 1.0531 LearningRate 0.000001 Epoch: 38 Global Step: 804200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:27:25,558-Speed 2497.26 samples/sec Loss 1.0978 LearningRate 0.000001 Epoch: 38 Global Step: 804210 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:27:33,784-Speed 2490.16 samples/sec Loss 1.0663 LearningRate 0.000001 Epoch: 38 Global Step: 804220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:27:41,991-Speed 2495.66 samples/sec Loss 1.0787 LearningRate 0.000001 Epoch: 38 Global Step: 804230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:27:50,199-Speed 2495.60 samples/sec Loss 1.0557 LearningRate 0.000001 Epoch: 38 Global Step: 804240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:27:58,347-Speed 2513.85 samples/sec 
Loss 1.0779 LearningRate 0.000001 Epoch: 38 Global Step: 804250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:28:06,561-Speed 2493.74 samples/sec Loss 1.0658 LearningRate 0.000001 Epoch: 38 Global Step: 804260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:28:14,764-Speed 2496.89 samples/sec Loss 1.0578 LearningRate 0.000001 Epoch: 38 Global Step: 804270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:28:22,971-Speed 2495.85 samples/sec Loss 1.0529 LearningRate 0.000001 Epoch: 38 Global Step: 804280 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:28:31,174-Speed 2497.19 samples/sec Loss 1.0754 LearningRate 0.000001 Epoch: 38 Global Step: 804290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:28:39,379-Speed 2496.43 samples/sec Loss 1.0513 LearningRate 0.000001 Epoch: 38 Global Step: 804300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:28:47,532-Speed 2512.15 samples/sec Loss 1.0560 LearningRate 0.000001 Epoch: 38 Global Step: 804310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:28:55,737-Speed 2496.63 samples/sec Loss 1.0969 LearningRate 0.000001 Epoch: 38 Global Step: 804320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:29:03,941-Speed 2496.78 samples/sec Loss 1.0771 LearningRate 0.000001 Epoch: 38 Global Step: 804330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:29:12,146-Speed 2496.61 samples/sec Loss 1.0749 LearningRate 0.000001 Epoch: 38 Global Step: 804340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:29:20,356-Speed 2495.11 samples/sec Loss 1.0609 LearningRate 0.000001 Epoch: 38 Global Step: 804350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:29:28,567-Speed 2494.56 samples/sec Loss 1.0433 LearningRate 0.000001 Epoch: 38 Global Step: 804360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:29:36,714-Speed 2514.36 samples/sec Loss 1.0773 
LearningRate 0.000001 Epoch: 38 Global Step: 804370 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:29:44,919-Speed 2496.56 samples/sec Loss 1.0676 LearningRate 0.000001 Epoch: 38 Global Step: 804380 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:29:53,124-Speed 2496.41 samples/sec Loss 1.0504 LearningRate 0.000001 Epoch: 38 Global Step: 804390 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:30:01,330-Speed 2496.27 samples/sec Loss 1.0724 LearningRate 0.000001 Epoch: 38 Global Step: 804400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:30:09,559-Speed 2489.31 samples/sec Loss 1.0637 LearningRate 0.000001 Epoch: 38 Global Step: 804410 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:30:17,765-Speed 2496.03 samples/sec Loss 1.0428 LearningRate 0.000001 Epoch: 38 Global Step: 804420 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:30:25,917-Speed 2512.68 samples/sec Loss 1.0669 LearningRate 0.000001 Epoch: 38 Global Step: 804430 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:30:34,133-Speed 2493.28 samples/sec Loss 1.0741 LearningRate 0.000001 Epoch: 38 Global Step: 804440 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:30:42,350-Speed 2492.65 samples/sec Loss 1.0730 LearningRate 0.000001 Epoch: 38 Global Step: 804450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:30:50,555-Speed 2496.62 samples/sec Loss 1.0487 LearningRate 0.000001 Epoch: 38 Global Step: 804460 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:30:58,762-Speed 2495.67 samples/sec Loss 1.0746 LearningRate 0.000001 Epoch: 38 Global Step: 804470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:31:06,969-Speed 2496.19 samples/sec Loss 1.0517 LearningRate 0.000001 Epoch: 38 Global Step: 804480 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:31:15,121-Speed 2512.71 samples/sec Loss 1.0668 LearningRate 
0.000001 Epoch: 38 Global Step: 804490 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:31:23,335-Speed 2493.84 samples/sec Loss 1.0757 LearningRate 0.000001 Epoch: 38 Global Step: 804500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:31:31,539-Speed 2496.41 samples/sec Loss 1.0590 LearningRate 0.000001 Epoch: 38 Global Step: 804510 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:31:39,748-Speed 2495.90 samples/sec Loss 1.0597 LearningRate 0.000001 Epoch: 38 Global Step: 804520 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:31:47,952-Speed 2496.62 samples/sec Loss 1.0593 LearningRate 0.000001 Epoch: 38 Global Step: 804530 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:31:56,175-Speed 2491.13 samples/sec Loss 1.0764 LearningRate 0.000001 Epoch: 38 Global Step: 804540 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:32:04,327-Speed 2512.61 samples/sec Loss 1.0384 LearningRate 0.000001 Epoch: 38 Global Step: 804550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:32:12,545-Speed 2492.73 samples/sec Loss 1.0445 LearningRate 0.000001 Epoch: 38 Global Step: 804560 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:32:20,759-Speed 2493.53 samples/sec Loss 1.0841 LearningRate 0.000001 Epoch: 38 Global Step: 804570 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:32:28,963-Speed 2496.88 samples/sec Loss 1.0575 LearningRate 0.000001 Epoch: 38 Global Step: 804580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:32:37,165-Speed 2497.30 samples/sec Loss 1.0534 LearningRate 0.000001 Epoch: 38 Global Step: 804590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:32:45,382-Speed 2492.80 samples/sec Loss 1.0744 LearningRate 0.000001 Epoch: 38 Global Step: 804600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:32:53,531-Speed 2513.54 samples/sec Loss 1.0774 LearningRate 0.000001 Epoch: 38 
Global Step: 804610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:33:01,739-Speed 2495.87 samples/sec Loss 1.0508 LearningRate 0.000001 Epoch: 38 Global Step: 804620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:33:09,941-Speed 2497.33 samples/sec Loss 1.0582 LearningRate 0.000001 Epoch: 38 Global Step: 804630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:33:18,151-Speed 2494.90 samples/sec Loss 1.0484 LearningRate 0.000001 Epoch: 38 Global Step: 804640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:33:26,374-Speed 2491.18 samples/sec Loss 1.0630 LearningRate 0.000001 Epoch: 38 Global Step: 804650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:33:34,585-Speed 2494.53 samples/sec Loss 1.0761 LearningRate 0.000001 Epoch: 38 Global Step: 804660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:33:42,750-Speed 2508.70 samples/sec Loss 1.0791 LearningRate 0.000001 Epoch: 38 Global Step: 804670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:33:50,958-Speed 2495.39 samples/sec Loss 1.0328 LearningRate 0.000001 Epoch: 38 Global Step: 804680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:33:59,170-Speed 2494.53 samples/sec Loss 1.0725 LearningRate 0.000001 Epoch: 38 Global Step: 804690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:34:07,376-Speed 2496.13 samples/sec Loss 1.0714 LearningRate 0.000001 Epoch: 38 Global Step: 804700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:34:15,584-Speed 2495.45 samples/sec Loss 1.0727 LearningRate 0.000001 Epoch: 38 Global Step: 804710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:34:23,787-Speed 2496.85 samples/sec Loss 1.0741 LearningRate 0.000001 Epoch: 38 Global Step: 804720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:34:31,942-Speed 2511.79 samples/sec Loss 1.0507 LearningRate 0.000001 Epoch: 38 Global Step: 804730 
Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:34:40,149-Speed 2496.08 samples/sec Loss 1.0577 LearningRate 0.000001 Epoch: 38 Global Step: 804740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:34:48,358-Speed 2495.21 samples/sec Loss 1.0750 LearningRate 0.000001 Epoch: 38 Global Step: 804750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:34:56,561-Speed 2496.88 samples/sec Loss 1.0591 LearningRate 0.000001 Epoch: 38 Global Step: 804760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-07-13 06:35:04,727-Speed 2508.48 samples/sec Loss 1.0738 LearningRate 0.000001 Epoch: 38 Global Step: 804770 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:35:12,931-Speed 2496.63 samples/sec Loss 1.0699 LearningRate 0.000001 Epoch: 38 Global Step: 804780 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:35:21,080-Speed 2513.50 samples/sec Loss 1.0744 LearningRate 0.000001 Epoch: 38 Global Step: 804790 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:35:29,283-Speed 2497.10 samples/sec Loss 1.0510 LearningRate 0.000001 Epoch: 38 Global Step: 804800 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:35:37,497-Speed 2493.85 samples/sec Loss 1.0673 LearningRate 0.000001 Epoch: 38 Global Step: 804810 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:35:45,700-Speed 2496.91 samples/sec Loss 1.0728 LearningRate 0.000001 Epoch: 38 Global Step: 804820 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:35:53,906-Speed 2496.14 samples/sec Loss 1.0588 LearningRate 0.000001 Epoch: 38 Global Step: 804830 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:36:02,117-Speed 2494.57 samples/sec Loss 1.0798 LearningRate 0.000001 Epoch: 38 Global Step: 804840 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:36:10,267-Speed 2513.13 samples/sec Loss 1.0534 LearningRate 0.000001 Epoch: 38 Global Step: 804850 Fp16 Grad Scale: 8192 
Required: 6 hours Training: 2022-07-13 06:36:18,481-Speed 2493.98 samples/sec Loss 1.0749 LearningRate 0.000001 Epoch: 38 Global Step: 804860 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:36:26,698-Speed 2492.70 samples/sec Loss 1.0611 LearningRate 0.000001 Epoch: 38 Global Step: 804870 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:36:34,909-Speed 2494.55 samples/sec Loss 1.0619 LearningRate 0.000001 Epoch: 38 Global Step: 804880 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:36:43,119-Speed 2495.03 samples/sec Loss 1.0336 LearningRate 0.000001 Epoch: 38 Global Step: 804890 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:36:51,325-Speed 2496.24 samples/sec Loss 1.0847 LearningRate 0.000001 Epoch: 38 Global Step: 804900 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:36:59,473-Speed 2513.74 samples/sec Loss 1.0612 LearningRate 0.000001 Epoch: 38 Global Step: 804910 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:37:07,682-Speed 2495.21 samples/sec Loss 1.0633 LearningRate 0.000001 Epoch: 38 Global Step: 804920 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:37:15,883-Speed 2497.72 samples/sec Loss 1.0601 LearningRate 0.000001 Epoch: 38 Global Step: 804930 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:37:24,104-Speed 2491.42 samples/sec Loss 1.0391 LearningRate 0.000001 Epoch: 38 Global Step: 804940 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:37:32,311-Speed 2495.71 samples/sec Loss 1.0952 LearningRate 0.000001 Epoch: 38 Global Step: 804950 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:37:40,527-Speed 2492.96 samples/sec Loss 1.0503 LearningRate 0.000001 Epoch: 38 Global Step: 804960 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:37:48,685-Speed 2510.81 samples/sec Loss 1.0787 LearningRate 0.000001 Epoch: 38 Global Step: 804970 Fp16 Grad Scale: 8192 Required: 6 hours Training: 
2022-07-13 06:37:56,894-Speed 2495.28 samples/sec Loss 1.0724 LearningRate 0.000001 Epoch: 38 Global Step: 804980 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:38:05,101-Speed 2495.86 samples/sec Loss 1.0539 LearningRate 0.000001 Epoch: 38 Global Step: 804990 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:38:13,308-Speed 2496.16 samples/sec Loss 1.0481 LearningRate 0.000001 Epoch: 38 Global Step: 805000 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:38:21,515-Speed 2495.60 samples/sec Loss 1.0510 LearningRate 0.000001 Epoch: 38 Global Step: 805010 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:38:29,721-Speed 2496.14 samples/sec Loss 1.0936 LearningRate 0.000001 Epoch: 38 Global Step: 805020 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:38:37,877-Speed 2511.39 samples/sec Loss 1.0739 LearningRate 0.000001 Epoch: 38 Global Step: 805030 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:38:46,100-Speed 2491.05 samples/sec Loss 1.0256 LearningRate 0.000001 Epoch: 38 Global Step: 805040 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:38:54,305-Speed 2496.47 samples/sec Loss 1.0217 LearningRate 0.000001 Epoch: 38 Global Step: 805050 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:39:02,517-Speed 2494.34 samples/sec Loss 1.0505 LearningRate 0.000001 Epoch: 38 Global Step: 805060 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:39:10,722-Speed 2496.36 samples/sec Loss 1.0680 LearningRate 0.000001 Epoch: 38 Global Step: 805070 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:39:18,931-Speed 2495.06 samples/sec Loss 1.0431 LearningRate 0.000001 Epoch: 38 Global Step: 805080 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:39:27,084-Speed 2512.55 samples/sec Loss 1.0819 LearningRate 0.000001 Epoch: 38 Global Step: 805090 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:39:35,289-Speed 
2496.43 samples/sec Loss 1.0887 LearningRate 0.000001 Epoch: 38 Global Step: 805100 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:39:43,492-Speed 2496.81 samples/sec Loss 1.0778 LearningRate 0.000001 Epoch: 38 Global Step: 805110 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:39:51,696-Speed 2497.09 samples/sec Loss 1.0791 LearningRate 0.000001 Epoch: 38 Global Step: 805120 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:39:59,904-Speed 2495.69 samples/sec Loss 1.0547 LearningRate 0.000001 Epoch: 38 Global Step: 805130 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:40:08,111-Speed 2495.65 samples/sec Loss 1.0735 LearningRate 0.000001 Epoch: 38 Global Step: 805140 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:40:16,259-Speed 2513.79 samples/sec Loss 1.0431 LearningRate 0.000001 Epoch: 38 Global Step: 805150 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:40:24,463-Speed 2497.14 samples/sec Loss 1.0615 LearningRate 0.000001 Epoch: 38 Global Step: 805160 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:40:32,667-Speed 2496.70 samples/sec Loss 1.0755 LearningRate 0.000001 Epoch: 38 Global Step: 805170 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:40:40,874-Speed 2495.68 samples/sec Loss 1.0741 LearningRate 0.000001 Epoch: 38 Global Step: 805180 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:40:49,078-Speed 2496.88 samples/sec Loss 1.0684 LearningRate 0.000001 Epoch: 38 Global Step: 805190 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:40:57,281-Speed 2497.05 samples/sec Loss 1.0488 LearningRate 0.000001 Epoch: 38 Global Step: 805200 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:41:05,434-Speed 2512.44 samples/sec Loss 1.0298 LearningRate 0.000001 Epoch: 38 Global Step: 805210 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:41:13,642-Speed 2495.46 samples/sec Loss 1.0810 
LearningRate 0.000001 Epoch: 38 Global Step: 805220 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:41:21,846-Speed 2496.67 samples/sec Loss 1.0602 LearningRate 0.000001 Epoch: 38 Global Step: 805230 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:41:30,049-Speed 2497.06 samples/sec Loss 1.0580 LearningRate 0.000001 Epoch: 38 Global Step: 805240 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:41:38,269-Speed 2491.81 samples/sec Loss 1.0791 LearningRate 0.000001 Epoch: 38 Global Step: 805250 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:41:46,484-Speed 2493.51 samples/sec Loss 1.0902 LearningRate 0.000001 Epoch: 38 Global Step: 805260 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:41:54,634-Speed 2513.30 samples/sec Loss 1.1150 LearningRate 0.000001 Epoch: 38 Global Step: 805270 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:42:02,837-Speed 2497.04 samples/sec Loss 1.0507 LearningRate 0.000001 Epoch: 38 Global Step: 805280 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:42:11,040-Speed 2497.36 samples/sec Loss 1.0793 LearningRate 0.000001 Epoch: 38 Global Step: 805290 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:42:19,245-Speed 2496.35 samples/sec Loss 1.0853 LearningRate 0.000001 Epoch: 38 Global Step: 805300 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:42:27,447-Speed 2497.26 samples/sec Loss 1.0720 LearningRate 0.000001 Epoch: 38 Global Step: 805310 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:42:35,652-Speed 2496.37 samples/sec Loss 1.0948 LearningRate 0.000001 Epoch: 38 Global Step: 805320 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:42:43,800-Speed 2514.33 samples/sec Loss 1.0660 LearningRate 0.000001 Epoch: 38 Global Step: 805330 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:42:52,003-Speed 2496.96 samples/sec Loss 1.0513 LearningRate 0.000001 Epoch: 38 
Global Step: 805340 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:43:00,210-Speed 2495.94 samples/sec Loss 1.0308 LearningRate 0.000001 Epoch: 38 Global Step: 805350 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:43:08,413-Speed 2497.01 samples/sec Loss 1.0776 LearningRate 0.000001 Epoch: 38 Global Step: 805360 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:43:16,621-Speed 2495.45 samples/sec Loss 1.0424 LearningRate 0.000001 Epoch: 38 Global Step: 805370 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:43:24,826-Speed 2496.42 samples/sec Loss 1.0545 LearningRate 0.000001 Epoch: 38 Global Step: 805380 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:43:32,982-Speed 2511.52 samples/sec Loss 1.0507 LearningRate 0.000001 Epoch: 38 Global Step: 805390 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:43:41,186-Speed 2496.83 samples/sec Loss 1.0698 LearningRate 0.000001 Epoch: 38 Global Step: 805400 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:43:49,401-Speed 2493.47 samples/sec Loss 1.0441 LearningRate 0.000001 Epoch: 38 Global Step: 805410 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:43:57,603-Speed 2497.14 samples/sec Loss 1.0324 LearningRate 0.000001 Epoch: 38 Global Step: 805420 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:44:05,823-Speed 2492.10 samples/sec Loss 1.0854 LearningRate 0.000001 Epoch: 38 Global Step: 805430 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:44:14,026-Speed 2497.15 samples/sec Loss 1.0505 LearningRate 0.000001 Epoch: 38 Global Step: 805440 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:44:22,176-Speed 2513.08 samples/sec Loss 1.0687 LearningRate 0.000001 Epoch: 38 Global Step: 805450 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:44:30,377-Speed 2497.68 samples/sec Loss 1.0887 LearningRate 0.000001 Epoch: 38 Global Step: 805460 Fp16 Grad 
Scale: 8192 Required: 6 hours Training: 2022-07-13 06:44:38,592-Speed 2493.53 samples/sec Loss 1.0545 LearningRate 0.000001 Epoch: 38 Global Step: 805470 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:44:46,798-Speed 2496.19 samples/sec Loss 1.0488 LearningRate 0.000001 Epoch: 38 Global Step: 805480 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:44:55,002-Speed 2496.62 samples/sec Loss 1.0458 LearningRate 0.000001 Epoch: 38 Global Step: 805490 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:45:03,206-Speed 2496.83 samples/sec Loss 1.0712 LearningRate 0.000001 Epoch: 38 Global Step: 805500 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:45:11,358-Speed 2512.58 samples/sec Loss 1.0857 LearningRate 0.000001 Epoch: 38 Global Step: 805510 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:45:19,573-Speed 2493.40 samples/sec Loss 1.0393 LearningRate 0.000001 Epoch: 38 Global Step: 805520 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-07-13 06:45:27,782-Speed 2495.16 samples/sec Loss 1.0401 LearningRate 0.000001 Epoch: 38 Global Step: 805530 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:45:35,992-Speed 2495.12 samples/sec Loss 1.0458 LearningRate 0.000001 Epoch: 38 Global Step: 805540 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:45:44,195-Speed 2497.08 samples/sec Loss 1.0416 LearningRate 0.000001 Epoch: 38 Global Step: 805550 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:45:52,406-Speed 2494.58 samples/sec Loss 1.0450 LearningRate 0.000001 Epoch: 38 Global Step: 805560 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:46:00,557-Speed 2512.74 samples/sec Loss 1.0545 LearningRate 0.000001 Epoch: 38 Global Step: 805570 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:46:08,767-Speed 2495.04 samples/sec Loss 1.0982 LearningRate 0.000001 Epoch: 38 Global Step: 805580 Fp16 Grad Scale: 8192 Required: 5 hours 
Training: 2022-07-13 06:46:16,974-Speed 2496.02 samples/sec Loss 1.0601 LearningRate 0.000001 Epoch: 38 Global Step: 805590 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:46:25,179-Speed 2496.38 samples/sec Loss 1.0666 LearningRate 0.000001 Epoch: 38 Global Step: 805600 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:46:33,383-Speed 2496.70 samples/sec Loss 1.0840 LearningRate 0.000001 Epoch: 38 Global Step: 805610 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:46:41,587-Speed 2496.69 samples/sec Loss 1.0659 LearningRate 0.000001 Epoch: 38 Global Step: 805620 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:46:49,742-Speed 2511.94 samples/sec Loss 1.0820 LearningRate 0.000001 Epoch: 38 Global Step: 805630 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:46:57,953-Speed 2494.75 samples/sec Loss 1.0796 LearningRate 0.000001 Epoch: 38 Global Step: 805640 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:47:06,161-Speed 2495.39 samples/sec Loss 1.0799 LearningRate 0.000001 Epoch: 38 Global Step: 805650 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:47:14,364-Speed 2497.08 samples/sec Loss 1.0881 LearningRate 0.000001 Epoch: 38 Global Step: 805660 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:47:22,576-Speed 2494.42 samples/sec Loss 1.0556 LearningRate 0.000001 Epoch: 38 Global Step: 805670 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:47:30,784-Speed 2495.67 samples/sec Loss 1.0875 LearningRate 0.000001 Epoch: 38 Global Step: 805680 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:47:38,936-Speed 2512.69 samples/sec Loss 1.0745 LearningRate 0.000001 Epoch: 38 Global Step: 805690 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:47:47,139-Speed 2496.99 samples/sec Loss 1.0551 LearningRate 0.000001 Epoch: 38 Global Step: 805700 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 
06:47:55,342-Speed 2497.05 samples/sec Loss 1.0523 LearningRate 0.000001 Epoch: 38 Global Step: 805710 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:48:03,547-Speed 2496.42 samples/sec Loss 1.0418 LearningRate 0.000001 Epoch: 38 Global Step: 805720 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:48:11,753-Speed 2495.98 samples/sec Loss 1.0658 LearningRate 0.000001 Epoch: 38 Global Step: 805730 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:48:19,959-Speed 2496.23 samples/sec Loss 1.0565 LearningRate 0.000001 Epoch: 38 Global Step: 805740 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:48:28,109-Speed 2513.14 samples/sec Loss 1.0613 LearningRate 0.000001 Epoch: 38 Global Step: 805750 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:48:36,317-Speed 2495.91 samples/sec Loss 1.0526 LearningRate 0.000001 Epoch: 38 Global Step: 805760 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:48:44,519-Speed 2497.19 samples/sec Loss 1.0666 LearningRate 0.000001 Epoch: 38 Global Step: 805770 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:48:52,735-Speed 2493.04 samples/sec Loss 1.0376 LearningRate 0.000001 Epoch: 38 Global Step: 805780 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:49:00,940-Speed 2496.56 samples/sec Loss 1.0726 LearningRate 0.000001 Epoch: 38 Global Step: 805790 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:49:09,147-Speed 2495.98 samples/sec Loss 1.0466 LearningRate 0.000001 Epoch: 38 Global Step: 805800 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:49:17,298-Speed 2512.96 samples/sec Loss 1.0615 LearningRate 0.000001 Epoch: 38 Global Step: 805810 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:49:25,508-Speed 2494.77 samples/sec Loss 1.0576 LearningRate 0.000001 Epoch: 38 Global Step: 805820 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:49:33,727-Speed 2492.35 
samples/sec Loss 1.0642 LearningRate 0.000001 Epoch: 38 Global Step: 805830 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:49:41,945-Speed 2492.31 samples/sec Loss 1.0807 LearningRate 0.000001 Epoch: 38 Global Step: 805840 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:49:50,146-Speed 2497.80 samples/sec Loss 1.0536 LearningRate 0.000001 Epoch: 38 Global Step: 805850 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:49:58,349-Speed 2497.08 samples/sec Loss 1.0406 LearningRate 0.000001 Epoch: 38 Global Step: 805860 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:50:06,499-Speed 2513.25 samples/sec Loss 1.0737 LearningRate 0.000001 Epoch: 38 Global Step: 805870 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:50:14,703-Speed 2496.87 samples/sec Loss 1.0364 LearningRate 0.000001 Epoch: 38 Global Step: 805880 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:50:22,908-Speed 2496.77 samples/sec Loss 1.0551 LearningRate 0.000001 Epoch: 38 Global Step: 805890 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:50:31,114-Speed 2496.39 samples/sec Loss 1.0791 LearningRate 0.000001 Epoch: 38 Global Step: 805900 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:50:39,317-Speed 2497.07 samples/sec Loss 1.0680 LearningRate 0.000001 Epoch: 38 Global Step: 805910 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:50:47,519-Speed 2497.38 samples/sec Loss 1.0436 LearningRate 0.000001 Epoch: 38 Global Step: 805920 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:50:55,670-Speed 2512.80 samples/sec Loss 1.0590 LearningRate 0.000001 Epoch: 38 Global Step: 805930 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:51:03,883-Speed 2493.89 samples/sec Loss 1.0691 LearningRate 0.000001 Epoch: 38 Global Step: 805940 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:51:12,090-Speed 2496.05 samples/sec Loss 1.0670 
LearningRate 0.000001 Epoch: 38 Global Step: 805950 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:51:20,298-Speed 2496.55 samples/sec Loss 1.0845 LearningRate 0.000001 Epoch: 38 Global Step: 805960 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 06:51:28,499-Speed 2497.48 samples/sec Loss 1.0468 LearningRate 0.000001 Epoch: 38 Global Step: 805970 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:51:36,702-Speed 2497.29 samples/sec Loss 1.0590 LearningRate 0.000001 Epoch: 38 Global Step: 805980 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:51:44,852-Speed 2513.23 samples/sec Loss 1.0705 LearningRate 0.000001 Epoch: 38 Global Step: 805990 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:51:53,069-Speed 2492.69 samples/sec Loss 1.0693 LearningRate 0.000001 Epoch: 38 Global Step: 806000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:52:01,272-Speed 2497.29 samples/sec Loss 1.0640 LearningRate 0.000001 Epoch: 38 Global Step: 806010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:52:09,481-Speed 2495.07 samples/sec Loss 1.0547 LearningRate 0.000001 Epoch: 38 Global Step: 806020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:52:17,689-Speed 2495.49 samples/sec Loss 1.0721 LearningRate 0.000001 Epoch: 38 Global Step: 806030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:52:25,894-Speed 2496.65 samples/sec Loss 1.0520 LearningRate 0.000001 Epoch: 38 Global Step: 806040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:52:34,047-Speed 2512.05 samples/sec Loss 1.0603 LearningRate 0.000001 Epoch: 38 Global Step: 806050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:52:42,271-Speed 2491.06 samples/sec Loss 1.0611 LearningRate 0.000001 Epoch: 38 Global Step: 806060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:52:50,475-Speed 2496.59 samples/sec Loss 1.0329 LearningRate 0.000001 
Epoch: 38 Global Step: 806070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:52:58,680-Speed 2496.37 samples/sec Loss 1.0821 LearningRate 0.000001 Epoch: 38 Global Step: 806080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:53:06,887-Speed 2495.81 samples/sec Loss 1.0787 LearningRate 0.000001 Epoch: 38 Global Step: 806090 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:53:15,092-Speed 2496.26 samples/sec Loss 1.0677 LearningRate 0.000001 Epoch: 38 Global Step: 806100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:53:23,240-Speed 2514.12 samples/sec Loss 1.0450 LearningRate 0.000001 Epoch: 38 Global Step: 806110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:53:31,447-Speed 2495.54 samples/sec Loss 1.0729 LearningRate 0.000001 Epoch: 38 Global Step: 806120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:53:39,654-Speed 2495.86 samples/sec Loss 1.0522 LearningRate 0.000001 Epoch: 38 Global Step: 806130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:53:47,853-Speed 2498.65 samples/sec Loss 1.0828 LearningRate 0.000001 Epoch: 38 Global Step: 806140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:53:56,074-Speed 2491.65 samples/sec Loss 1.0913 LearningRate 0.000001 Epoch: 38 Global Step: 806150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:54:04,284-Speed 2494.84 samples/sec Loss 1.0438 LearningRate 0.000001 Epoch: 38 Global Step: 806160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:54:12,439-Speed 2511.81 samples/sec Loss 1.0670 LearningRate 0.000001 Epoch: 38 Global Step: 806170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:54:20,648-Speed 2495.38 samples/sec Loss 1.0765 LearningRate 0.000001 Epoch: 38 Global Step: 806180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:54:28,851-Speed 2497.18 samples/sec Loss 1.0756 LearningRate 0.000001 Epoch: 38 Global 
Step: 806190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:54:37,055-Speed 2496.47 samples/sec Loss 1.0534 LearningRate 0.000001 Epoch: 38 Global Step: 806200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:54:45,261-Speed 2496.21 samples/sec Loss 1.0487 LearningRate 0.000001 Epoch: 38 Global Step: 806210 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:54:53,466-Speed 2496.36 samples/sec Loss 1.0428 LearningRate 0.000001 Epoch: 38 Global Step: 806220 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:55:01,655-Speed 2501.31 samples/sec Loss 1.0324 LearningRate 0.000001 Epoch: 38 Global Step: 806230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:55:09,863-Speed 2495.62 samples/sec Loss 1.0428 LearningRate 0.000001 Epoch: 38 Global Step: 806240 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:55:18,068-Speed 2496.34 samples/sec Loss 1.0778 LearningRate 0.000001 Epoch: 38 Global Step: 806250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:55:26,273-Speed 2496.50 samples/sec Loss 1.0735 LearningRate 0.000001 Epoch: 38 Global Step: 806260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:55:34,474-Speed 2497.85 samples/sec Loss 1.0581 LearningRate 0.000001 Epoch: 38 Global Step: 806270 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:55:42,682-Speed 2495.40 samples/sec Loss 1.0536 LearningRate 0.000001 Epoch: 38 Global Step: 806280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:55:50,833-Speed 2512.96 samples/sec Loss 1.0795 LearningRate 0.000001 Epoch: 38 Global Step: 806290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:55:59,038-Speed 2496.38 samples/sec Loss 1.0532 LearningRate 0.000001 Epoch: 38 Global Step: 806300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:56:07,251-Speed 2494.08 samples/sec Loss 1.0702 LearningRate 0.000001 Epoch: 38 Global Step: 806310 Fp16 
Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:56:15,459-Speed 2495.62 samples/sec Loss 1.0458 LearningRate 0.000001 Epoch: 38 Global Step: 806320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:56:23,664-Speed 2496.31 samples/sec Loss 1.0542 LearningRate 0.000001 Epoch: 38 Global Step: 806330 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:56:31,884-Speed 2492.02 samples/sec Loss 1.0442 LearningRate 0.000001 Epoch: 38 Global Step: 806340 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:56:40,035-Speed 2512.93 samples/sec Loss 1.0567 LearningRate 0.000001 Epoch: 38 Global Step: 806350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:56:48,243-Speed 2495.62 samples/sec Loss 1.0734 LearningRate 0.000001 Epoch: 38 Global Step: 806360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:56:56,445-Speed 2497.55 samples/sec Loss 1.0425 LearningRate 0.000001 Epoch: 38 Global Step: 806370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:57:04,665-Speed 2491.71 samples/sec Loss 1.0505 LearningRate 0.000001 Epoch: 38 Global Step: 806380 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:57:12,867-Speed 2497.36 samples/sec Loss 1.0541 LearningRate 0.000001 Epoch: 38 Global Step: 806390 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:57:21,071-Speed 2496.85 samples/sec Loss 1.0717 LearningRate 0.000001 Epoch: 38 Global Step: 806400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:57:29,228-Speed 2511.24 samples/sec Loss 1.0679 LearningRate 0.000001 Epoch: 38 Global Step: 806410 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:57:37,432-Speed 2496.71 samples/sec Loss 1.0544 LearningRate 0.000001 Epoch: 38 Global Step: 806420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:57:45,637-Speed 2496.51 samples/sec Loss 1.0464 LearningRate 0.000001 Epoch: 38 Global Step: 806430 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-07-13 06:57:53,847-Speed 2494.78 samples/sec Loss 1.0761 LearningRate 0.000001 Epoch: 38 Global Step: 806440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:58:02,051-Speed 2496.69 samples/sec Loss 1.0307 LearningRate 0.000001 Epoch: 38 Global Step: 806450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:58:10,259-Speed 2495.68 samples/sec Loss 1.0892 LearningRate 0.000001 Epoch: 38 Global Step: 806460 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:58:18,416-Speed 2511.22 samples/sec Loss 1.0520 LearningRate 0.000001 Epoch: 38 Global Step: 806470 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:58:26,621-Speed 2496.14 samples/sec Loss 1.0850 LearningRate 0.000001 Epoch: 38 Global Step: 806480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:58:34,825-Speed 2496.84 samples/sec Loss 1.0563 LearningRate 0.000001 Epoch: 38 Global Step: 806490 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:58:43,040-Speed 2493.31 samples/sec Loss 1.0856 LearningRate 0.000001 Epoch: 38 Global Step: 806500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:58:51,242-Speed 2497.42 samples/sec Loss 1.0826 LearningRate 0.000001 Epoch: 38 Global Step: 806510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:58:59,446-Speed 2496.63 samples/sec Loss 1.0718 LearningRate 0.000001 Epoch: 38 Global Step: 806520 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:59:07,596-Speed 2513.27 samples/sec Loss 1.0532 LearningRate 0.000001 Epoch: 38 Global Step: 806530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:59:15,799-Speed 2497.10 samples/sec Loss 1.0546 LearningRate 0.000001 Epoch: 38 Global Step: 806540 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:59:24,010-Speed 2494.60 samples/sec Loss 1.1011 LearningRate 0.000001 Epoch: 38 Global Step: 806550 Fp16 Grad Scale: 16384 Required: 5 hours 
Training: 2022-07-13 06:59:32,214-Speed 2496.78 samples/sec Loss 1.0789 LearningRate 0.000001 Epoch: 38 Global Step: 806560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:59:40,421-Speed 2495.99 samples/sec Loss 1.0480 LearningRate 0.000001 Epoch: 38 Global Step: 806570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:59:48,625-Speed 2496.42 samples/sec Loss 1.0868 LearningRate 0.000001 Epoch: 38 Global Step: 806580 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 06:59:56,777-Speed 2512.58 samples/sec Loss 1.0518 LearningRate 0.000001 Epoch: 38 Global Step: 806590 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:00:04,985-Speed 2495.63 samples/sec Loss 1.0585 LearningRate 0.000001 Epoch: 38 Global Step: 806600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:00:13,189-Speed 2496.68 samples/sec Loss 1.0572 LearningRate 0.000001 Epoch: 38 Global Step: 806610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:00:21,399-Speed 2495.17 samples/sec Loss 1.0567 LearningRate 0.000001 Epoch: 38 Global Step: 806620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:00:29,602-Speed 2496.93 samples/sec Loss 1.0860 LearningRate 0.000001 Epoch: 38 Global Step: 806630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:00:37,809-Speed 2495.85 samples/sec Loss 1.0492 LearningRate 0.000001 Epoch: 38 Global Step: 806640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:00:45,969-Speed 2510.32 samples/sec Loss 1.0901 LearningRate 0.000001 Epoch: 38 Global Step: 806650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:00:54,176-Speed 2495.83 samples/sec Loss 1.0833 LearningRate 0.000001 Epoch: 38 Global Step: 806660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:01:02,378-Speed 2497.67 samples/sec Loss 1.0546 LearningRate 0.000001 Epoch: 38 Global Step: 806670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2022-07-13 07:01:10,585-Speed 2495.73 samples/sec Loss 1.0331 LearningRate 0.000001 Epoch: 38 Global Step: 806680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:01:18,789-Speed 2496.91 samples/sec Loss 1.0814 LearningRate 0.000001 Epoch: 38 Global Step: 806690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:01:26,994-Speed 2496.32 samples/sec Loss 1.0335 LearningRate 0.000001 Epoch: 38 Global Step: 806700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:01:35,159-Speed 2508.73 samples/sec Loss 1.0663 LearningRate 0.000001 Epoch: 38 Global Step: 806710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:01:43,365-Speed 2496.21 samples/sec Loss 1.0311 LearningRate 0.000001 Epoch: 38 Global Step: 806720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:01:51,570-Speed 2496.63 samples/sec Loss 1.0271 LearningRate 0.000001 Epoch: 38 Global Step: 806730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:01:59,774-Speed 2496.67 samples/sec Loss 1.0859 LearningRate 0.000001 Epoch: 38 Global Step: 806740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:02:07,978-Speed 2496.83 samples/sec Loss 1.0654 LearningRate 0.000001 Epoch: 38 Global Step: 806750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:02:16,182-Speed 2496.59 samples/sec Loss 1.0822 LearningRate 0.000001 Epoch: 38 Global Step: 806760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:02:24,336-Speed 2512.10 samples/sec Loss 1.0568 LearningRate 0.000001 Epoch: 38 Global Step: 806770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:02:32,544-Speed 2495.57 samples/sec Loss 1.0559 LearningRate 0.000001 Epoch: 38 Global Step: 806780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:02:40,752-Speed 2495.47 samples/sec Loss 1.0508 LearningRate 0.000001 Epoch: 38 Global Step: 806790 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 
07:02:48,952-Speed 2497.84 samples/sec Loss 1.0537 LearningRate 0.000001 Epoch: 38 Global Step: 806800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:02:57,158-Speed 2496.03 samples/sec Loss 1.0571 LearningRate 0.000001 Epoch: 38 Global Step: 806810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:03:05,369-Speed 2494.75 samples/sec Loss 1.0728 LearningRate 0.000001 Epoch: 38 Global Step: 806820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:03:13,521-Speed 2512.53 samples/sec Loss 1.0524 LearningRate 0.000001 Epoch: 38 Global Step: 806830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:03:21,725-Speed 2496.75 samples/sec Loss 1.0552 LearningRate 0.000001 Epoch: 38 Global Step: 806840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:03:29,931-Speed 2496.10 samples/sec Loss 1.0724 LearningRate 0.000001 Epoch: 38 Global Step: 806850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:03:38,135-Speed 2496.61 samples/sec Loss 1.0313 LearningRate 0.000001 Epoch: 38 Global Step: 806860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:03:46,339-Speed 2496.84 samples/sec Loss 1.0823 LearningRate 0.000001 Epoch: 38 Global Step: 806870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:03:54,547-Speed 2495.72 samples/sec Loss 1.0733 LearningRate 0.000001 Epoch: 38 Global Step: 806880 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:04:02,701-Speed 2512.05 samples/sec Loss 1.0698 LearningRate 0.000001 Epoch: 38 Global Step: 806890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:04:10,921-Speed 2491.74 samples/sec Loss 1.0461 LearningRate 0.000001 Epoch: 38 Global Step: 806900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:04:19,125-Speed 2496.99 samples/sec Loss 1.0709 LearningRate 0.000001 Epoch: 38 Global Step: 806910 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:04:27,331-Speed 
2496.31 samples/sec Loss 1.0604 LearningRate 0.000001 Epoch: 38 Global Step: 806920 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:04:35,546-Speed 2493.12 samples/sec Loss 1.0651 LearningRate 0.000001 Epoch: 38 Global Step: 806930 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:04:43,754-Speed 2495.64 samples/sec Loss 1.0785 LearningRate 0.000001 Epoch: 38 Global Step: 806940 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:04:51,912-Speed 2510.84 samples/sec Loss 1.0431 LearningRate 0.000001 Epoch: 38 Global Step: 806950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:05:00,132-Speed 2492.22 samples/sec Loss 1.0551 LearningRate 0.000001 Epoch: 38 Global Step: 806960 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:05:08,339-Speed 2495.75 samples/sec Loss 1.0599 LearningRate 0.000001 Epoch: 38 Global Step: 806970 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:05:16,543-Speed 2496.53 samples/sec Loss 1.0988 LearningRate 0.000001 Epoch: 38 Global Step: 806980 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:05:24,751-Speed 2495.70 samples/sec Loss 1.0348 LearningRate 0.000001 Epoch: 38 Global Step: 806990 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:05:32,959-Speed 2495.43 samples/sec Loss 1.1007 LearningRate 0.000001 Epoch: 38 Global Step: 807000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:05:41,116-Speed 2511.18 samples/sec Loss 1.0491 LearningRate 0.000001 Epoch: 38 Global Step: 807010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:05:49,324-Speed 2495.88 samples/sec Loss 1.0330 LearningRate 0.000001 Epoch: 38 Global Step: 807020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:05:57,529-Speed 2496.44 samples/sec Loss 1.0523 LearningRate 0.000001 Epoch: 38 Global Step: 807030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:06:05,734-Speed 2496.31 samples/sec 
Loss 1.0411 LearningRate 0.000001 Epoch: 38 Global Step: 807040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:06:13,946-Speed 2494.16 samples/sec Loss 1.0543 LearningRate 0.000001 Epoch: 38 Global Step: 807050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:06:22,152-Speed 2496.17 samples/sec Loss 1.0695 LearningRate 0.000001 Epoch: 38 Global Step: 807060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:06:30,305-Speed 2512.31 samples/sec Loss 1.0583 LearningRate 0.000001 Epoch: 38 Global Step: 807070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:06:38,512-Speed 2495.78 samples/sec Loss 1.0567 LearningRate 0.000001 Epoch: 38 Global Step: 807080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:06:46,717-Speed 2496.44 samples/sec Loss 1.0607 LearningRate 0.000001 Epoch: 38 Global Step: 807090 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:06:54,923-Speed 2495.84 samples/sec Loss 1.0431 LearningRate 0.000001 Epoch: 38 Global Step: 807100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:07:03,135-Speed 2494.57 samples/sec Loss 1.0726 LearningRate 0.000001 Epoch: 38 Global Step: 807110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:07:11,338-Speed 2497.09 samples/sec Loss 1.0753 LearningRate 0.000001 Epoch: 38 Global Step: 807120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:07:19,491-Speed 2512.39 samples/sec Loss 1.0571 LearningRate 0.000001 Epoch: 38 Global Step: 807130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:07:27,694-Speed 2497.10 samples/sec Loss 1.0519 LearningRate 0.000001 Epoch: 38 Global Step: 807140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:07:35,903-Speed 2495.19 samples/sec Loss 1.0293 LearningRate 0.000001 Epoch: 38 Global Step: 807150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:07:44,108-Speed 2496.45 samples/sec Loss 1.0693 
LearningRate 0.000001 Epoch: 38 Global Step: 807160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:07:52,317-Speed 2495.29 samples/sec Loss 1.0419 LearningRate 0.000001 Epoch: 38 Global Step: 807170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:08:00,519-Speed 2497.44 samples/sec Loss 1.0866 LearningRate 0.000001 Epoch: 38 Global Step: 807180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:08:08,672-Speed 2512.19 samples/sec Loss 1.0571 LearningRate 0.000001 Epoch: 38 Global Step: 807190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:08:16,878-Speed 2496.17 samples/sec Loss 1.0410 LearningRate 0.000001 Epoch: 38 Global Step: 807200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:08:25,083-Speed 2496.29 samples/sec Loss 1.0434 LearningRate 0.000001 Epoch: 38 Global Step: 807210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:08:33,286-Speed 2497.14 samples/sec Loss 1.0744 LearningRate 0.000001 Epoch: 38 Global Step: 807220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:08:41,498-Speed 2494.47 samples/sec Loss 1.0861 LearningRate 0.000001 Epoch: 38 Global Step: 807230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:08:49,707-Speed 2495.30 samples/sec Loss 1.0645 LearningRate 0.000001 Epoch: 38 Global Step: 807240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:08:57,857-Speed 2513.11 samples/sec Loss 1.0864 LearningRate 0.000001 Epoch: 38 Global Step: 807250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:09:06,064-Speed 2495.69 samples/sec Loss 1.0767 LearningRate 0.000001 Epoch: 38 Global Step: 807260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:09:14,272-Speed 2495.75 samples/sec Loss 1.0739 LearningRate 0.000001 Epoch: 38 Global Step: 807270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:09:22,480-Speed 2495.46 samples/sec Loss 1.0788 LearningRate 
0.000001 Epoch: 38 Global Step: 807280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:09:30,691-Speed 2494.83 samples/sec Loss 1.0607 LearningRate 0.000001 Epoch: 38 Global Step: 807290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:09:38,900-Speed 2495.22 samples/sec Loss 1.0728 LearningRate 0.000001 Epoch: 38 Global Step: 807300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:09:47,055-Speed 2511.54 samples/sec Loss 1.0628 LearningRate 0.000001 Epoch: 38 Global Step: 807310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:09:55,267-Speed 2494.31 samples/sec Loss 1.0503 LearningRate 0.000001 Epoch: 38 Global Step: 807320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:10:03,472-Speed 2496.59 samples/sec Loss 1.0786 LearningRate 0.000001 Epoch: 38 Global Step: 807330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:10:11,682-Speed 2494.91 samples/sec Loss 1.0678 LearningRate 0.000001 Epoch: 38 Global Step: 807340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:10:19,887-Speed 2496.36 samples/sec Loss 1.0543 LearningRate 0.000001 Epoch: 38 Global Step: 807350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:10:28,097-Speed 2494.74 samples/sec Loss 1.0958 LearningRate 0.000001 Epoch: 38 Global Step: 807360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:10:36,252-Speed 2511.89 samples/sec Loss 1.0545 LearningRate 0.000001 Epoch: 38 Global Step: 807370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:10:44,461-Speed 2495.29 samples/sec Loss 1.0820 LearningRate 0.000001 Epoch: 38 Global Step: 807380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:10:52,673-Speed 2494.43 samples/sec Loss 1.0711 LearningRate 0.000001 Epoch: 38 Global Step: 807390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:11:00,883-Speed 2494.74 samples/sec Loss 1.0759 LearningRate 0.000001 Epoch: 38 
Global Step: 807400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:11:09,088-Speed 2496.50 samples/sec Loss 1.0427 LearningRate 0.000001 Epoch: 38 Global Step: 807410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:11:17,293-Speed 2496.48 samples/sec Loss 1.0849 LearningRate 0.000001 Epoch: 38 Global Step: 807420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:11:25,444-Speed 2512.94 samples/sec Loss 1.0677 LearningRate 0.000001 Epoch: 38 Global Step: 807430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:11:33,651-Speed 2495.84 samples/sec Loss 1.0794 LearningRate 0.000001 Epoch: 38 Global Step: 807440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:11:41,855-Speed 2496.64 samples/sec Loss 1.0459 LearningRate 0.000001 Epoch: 38 Global Step: 807450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:11:50,059-Speed 2496.49 samples/sec Loss 1.0162 LearningRate 0.000001 Epoch: 38 Global Step: 807460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:11:58,262-Speed 2497.00 samples/sec Loss 1.0693 LearningRate 0.000001 Epoch: 38 Global Step: 807470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:12:06,470-Speed 2495.71 samples/sec Loss 1.0818 LearningRate 0.000001 Epoch: 38 Global Step: 807480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:12:14,622-Speed 2512.75 samples/sec Loss 1.0451 LearningRate 0.000001 Epoch: 38 Global Step: 807490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:12:22,826-Speed 2496.60 samples/sec Loss 1.0652 LearningRate 0.000001 Epoch: 38 Global Step: 807500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:12:31,031-Speed 2496.37 samples/sec Loss 1.0602 LearningRate 0.000001 Epoch: 38 Global Step: 807510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:12:39,238-Speed 2495.84 samples/sec Loss 1.0355 LearningRate 0.000001 Epoch: 38 Global Step: 807520 
Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:12:47,446-Speed 2495.81 samples/sec Loss 1.0242 LearningRate 0.000001 Epoch: 38 Global Step: 807530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:12:55,647-Speed 2497.46 samples/sec Loss 1.0513 LearningRate 0.000001 Epoch: 38 Global Step: 807540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:13:03,805-Speed 2510.99 samples/sec Loss 1.0740 LearningRate 0.000001 Epoch: 38 Global Step: 807550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:13:12,016-Speed 2494.66 samples/sec Loss 1.0612 LearningRate 0.000001 Epoch: 38 Global Step: 807560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:13:20,221-Speed 2496.32 samples/sec Loss 1.0886 LearningRate 0.000001 Epoch: 38 Global Step: 807570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:13:28,426-Speed 2496.31 samples/sec Loss 1.0222 LearningRate 0.000001 Epoch: 38 Global Step: 807580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:13:36,632-Speed 2496.35 samples/sec Loss 1.0620 LearningRate 0.000001 Epoch: 38 Global Step: 807590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:13:44,836-Speed 2496.56 samples/sec Loss 1.0771 LearningRate 0.000001 Epoch: 38 Global Step: 807600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:13:52,989-Speed 2512.30 samples/sec Loss 1.0436 LearningRate 0.000001 Epoch: 38 Global Step: 807610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:14:01,194-Speed 2496.71 samples/sec Loss 1.0535 LearningRate 0.000001 Epoch: 38 Global Step: 807620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:14:09,400-Speed 2496.19 samples/sec Loss 1.0492 LearningRate 0.000001 Epoch: 38 Global Step: 807630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:14:17,606-Speed 2495.98 samples/sec Loss 1.0369 LearningRate 0.000001 Epoch: 38 Global Step: 807640 Fp16 Grad Scale: 
32768 Required: 5 hours Training: 2022-07-13 07:14:25,810-Speed 2496.88 samples/sec Loss 1.0673 LearningRate 0.000001 Epoch: 38 Global Step: 807650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:14:34,014-Speed 2496.91 samples/sec Loss 1.0632 LearningRate 0.000001 Epoch: 38 Global Step: 807660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:14:42,169-Speed 2511.60 samples/sec Loss 1.0604 LearningRate 0.000001 Epoch: 38 Global Step: 807670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:14:50,387-Speed 2492.38 samples/sec Loss 1.0717 LearningRate 0.000001 Epoch: 38 Global Step: 807680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:14:58,592-Speed 2496.29 samples/sec Loss 1.0327 LearningRate 0.000001 Epoch: 38 Global Step: 807690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:15:06,808-Speed 2493.49 samples/sec Loss 1.0370 LearningRate 0.000001 Epoch: 38 Global Step: 807700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:15:15,015-Speed 2495.84 samples/sec Loss 1.0521 LearningRate 0.000001 Epoch: 38 Global Step: 807710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:15:23,219-Speed 2496.62 samples/sec Loss 1.0356 LearningRate 0.000001 Epoch: 38 Global Step: 807720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:15:31,374-Speed 2511.90 samples/sec Loss 1.0827 LearningRate 0.000001 Epoch: 38 Global Step: 807730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:15:39,581-Speed 2495.60 samples/sec Loss 1.1016 LearningRate 0.000001 Epoch: 38 Global Step: 807740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:15:47,785-Speed 2496.72 samples/sec Loss 1.0641 LearningRate 0.000001 Epoch: 38 Global Step: 807750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:15:55,988-Speed 2497.00 samples/sec Loss 1.1036 LearningRate 0.000001 Epoch: 38 Global Step: 807760 Fp16 Grad Scale: 32768 Required: 5 
hours Training: 2022-07-13 07:16:04,194-Speed 2496.40 samples/sec Loss 1.0404 LearningRate 0.000001 Epoch: 38 Global Step: 807770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:16:12,397-Speed 2496.97 samples/sec Loss 1.0439 LearningRate 0.000001 Epoch: 38 Global Step: 807780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:16:20,552-Speed 2511.58 samples/sec Loss 1.0571 LearningRate 0.000001 Epoch: 38 Global Step: 807790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:16:28,758-Speed 2496.10 samples/sec Loss 1.0602 LearningRate 0.000001 Epoch: 38 Global Step: 807800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:16:36,963-Speed 2496.49 samples/sec Loss 1.0581 LearningRate 0.000001 Epoch: 38 Global Step: 807810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:16:45,172-Speed 2495.26 samples/sec Loss 1.0784 LearningRate 0.000001 Epoch: 38 Global Step: 807820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:16:53,380-Speed 2495.52 samples/sec Loss 1.0353 LearningRate 0.000001 Epoch: 38 Global Step: 807830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:17:01,586-Speed 2496.18 samples/sec Loss 1.0656 LearningRate 0.000001 Epoch: 38 Global Step: 807840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:17:09,752-Speed 2508.33 samples/sec Loss 1.0611 LearningRate 0.000001 Epoch: 38 Global Step: 807850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:17:17,967-Speed 2493.38 samples/sec Loss 1.0279 LearningRate 0.000001 Epoch: 38 Global Step: 807860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:17:26,173-Speed 2496.11 samples/sec Loss 1.0828 LearningRate 0.000001 Epoch: 38 Global Step: 807870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:17:34,385-Speed 2494.37 samples/sec Loss 1.0539 LearningRate 0.000001 Epoch: 38 Global Step: 807880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 
2022-07-13 07:17:42,586-Speed 2497.62 samples/sec Loss 1.0564 LearningRate 0.000001 Epoch: 38 Global Step: 807890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:17:50,792-Speed 2496.10 samples/sec Loss 1.0743 LearningRate 0.000001 Epoch: 38 Global Step: 807900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:17:58,944-Speed 2512.59 samples/sec Loss 1.0405 LearningRate 0.000001 Epoch: 38 Global Step: 807910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:18:07,146-Speed 2497.39 samples/sec Loss 1.0675 LearningRate 0.000001 Epoch: 38 Global Step: 807920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:18:15,353-Speed 2495.93 samples/sec Loss 1.0543 LearningRate 0.000001 Epoch: 38 Global Step: 807930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:18:23,556-Speed 2496.98 samples/sec Loss 1.0917 LearningRate 0.000001 Epoch: 38 Global Step: 807940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:18:31,760-Speed 2496.67 samples/sec Loss 1.0666 LearningRate 0.000001 Epoch: 38 Global Step: 807950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:18:39,963-Speed 2496.98 samples/sec Loss 1.0497 LearningRate 0.000001 Epoch: 38 Global Step: 807960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:18:48,114-Speed 2512.83 samples/sec Loss 1.0547 LearningRate 0.000001 Epoch: 38 Global Step: 807970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:18:56,327-Speed 2493.92 samples/sec Loss 1.0794 LearningRate 0.000001 Epoch: 38 Global Step: 807980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:19:04,560-Speed 2487.93 samples/sec Loss 1.0909 LearningRate 0.000001 Epoch: 38 Global Step: 807990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:19:12,773-Speed 2494.26 samples/sec Loss 1.0637 LearningRate 0.000001 Epoch: 38 Global Step: 808000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 
07:19:20,981-Speed 2495.48 samples/sec Loss 1.0469 LearningRate 0.000001 Epoch: 38 Global Step: 808010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:19:29,191-Speed 2494.79 samples/sec Loss 1.0546 LearningRate 0.000001 Epoch: 38 Global Step: 808020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:19:37,346-Speed 2511.75 samples/sec Loss 1.0509 LearningRate 0.000001 Epoch: 38 Global Step: 808030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:19:45,553-Speed 2496.02 samples/sec Loss 1.0436 LearningRate 0.000001 Epoch: 38 Global Step: 808040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:19:53,758-Speed 2496.25 samples/sec Loss 1.0724 LearningRate 0.000001 Epoch: 38 Global Step: 808050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:20:01,963-Speed 2496.45 samples/sec Loss 1.0436 LearningRate 0.000001 Epoch: 38 Global Step: 808060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:20:10,172-Speed 2495.10 samples/sec Loss 1.0391 LearningRate 0.000001 Epoch: 38 Global Step: 808070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:20:18,374-Speed 2497.03 samples/sec Loss 1.0271 LearningRate 0.000001 Epoch: 38 Global Step: 808080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:20:26,529-Speed 2511.72 samples/sec Loss 1.0849 LearningRate 0.000001 Epoch: 38 Global Step: 808090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:20:34,733-Speed 2496.85 samples/sec Loss 1.0776 LearningRate 0.000001 Epoch: 38 Global Step: 808100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:20:42,938-Speed 2496.57 samples/sec Loss 1.0831 LearningRate 0.000001 Epoch: 38 Global Step: 808110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:20:51,146-Speed 2495.58 samples/sec Loss 1.0806 LearningRate 0.000001 Epoch: 38 Global Step: 808120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:20:59,347-Speed 
2497.30 samples/sec Loss 1.0829 LearningRate 0.000001 Epoch: 38 Global Step: 808130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:21:07,557-Speed 2495.00 samples/sec Loss 1.0571 LearningRate 0.000001 Epoch: 38 Global Step: 808140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:21:15,718-Speed 2510.01 samples/sec Loss 1.0642 LearningRate 0.000001 Epoch: 38 Global Step: 808150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:21:23,921-Speed 2497.13 samples/sec Loss 1.0634 LearningRate 0.000001 Epoch: 38 Global Step: 808160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:21:32,130-Speed 2495.04 samples/sec Loss 1.0845 LearningRate 0.000001 Epoch: 38 Global Step: 808170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:21:40,336-Speed 2496.26 samples/sec Loss 1.0796 LearningRate 0.000001 Epoch: 38 Global Step: 808180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:21:48,541-Speed 2496.53 samples/sec Loss 1.0523 LearningRate 0.000001 Epoch: 38 Global Step: 808190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:21:56,744-Speed 2496.94 samples/sec Loss 1.0399 LearningRate 0.000001 Epoch: 38 Global Step: 808200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:22:04,905-Speed 2509.91 samples/sec Loss 1.0748 LearningRate 0.000001 Epoch: 38 Global Step: 808210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-07-13 07:22:13,066-Speed 2509.83 samples/sec Loss 1.0403 LearningRate 0.000001 Epoch: 38 Global Step: 808220 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:22:21,273-Speed 2496.06 samples/sec Loss 1.0743 LearningRate 0.000001 Epoch: 38 Global Step: 808230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:22:29,478-Speed 2496.51 samples/sec Loss 1.0702 LearningRate 0.000001 Epoch: 38 Global Step: 808240 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:22:37,687-Speed 2495.19 samples/sec 
Loss 1.0890 LearningRate 0.000001 Epoch: 38 Global Step: 808250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:22:45,892-Speed 2496.29 samples/sec Loss 1.0733 LearningRate 0.000001 Epoch: 38 Global Step: 808260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:22:54,045-Speed 2512.30 samples/sec Loss 1.0665 LearningRate 0.000001 Epoch: 38 Global Step: 808270 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:23:02,250-Speed 2496.34 samples/sec Loss 1.0511 LearningRate 0.000001 Epoch: 38 Global Step: 808280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:23:10,454-Speed 2497.09 samples/sec Loss 1.0565 LearningRate 0.000001 Epoch: 38 Global Step: 808290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:23:18,660-Speed 2496.70 samples/sec Loss 1.0667 LearningRate 0.000001 Epoch: 38 Global Step: 808300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:23:26,868-Speed 2495.67 samples/sec Loss 1.0731 LearningRate 0.000001 Epoch: 38 Global Step: 808310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:23:35,028-Speed 2510.14 samples/sec Loss 1.0810 LearningRate 0.000001 Epoch: 38 Global Step: 808320 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:23:43,181-Speed 2512.43 samples/sec Loss 1.0739 LearningRate 0.000001 Epoch: 38 Global Step: 808330 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:23:51,401-Speed 2491.86 samples/sec Loss 1.0482 LearningRate 0.000001 Epoch: 38 Global Step: 808340 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:23:59,607-Speed 2496.11 samples/sec Loss 1.0534 LearningRate 0.000001 Epoch: 38 Global Step: 808350 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:24:07,811-Speed 2496.53 samples/sec Loss 1.0753 LearningRate 0.000001 Epoch: 38 Global Step: 808360 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:24:16,016-Speed 2496.39 samples/sec Loss 1.0482 LearningRate 
0.000001 Epoch: 38 Global Step: 808370 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:24:24,220-Speed 2497.01 samples/sec Loss 1.0750 LearningRate 0.000001 Epoch: 38 Global Step: 808380 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:24:32,376-Speed 2511.46 samples/sec Loss 1.0671 LearningRate 0.000001 Epoch: 38 Global Step: 808390 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:24:40,580-Speed 2496.87 samples/sec Loss 1.0395 LearningRate 0.000001 Epoch: 38 Global Step: 808400 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:24:48,784-Speed 2496.71 samples/sec Loss 1.0672 LearningRate 0.000001 Epoch: 38 Global Step: 808410 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:24:57,000-Speed 2493.11 samples/sec Loss 1.0540 LearningRate 0.000001 Epoch: 38 Global Step: 808420 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:25:05,206-Speed 2496.12 samples/sec Loss 1.0439 LearningRate 0.000001 Epoch: 38 Global Step: 808430 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:25:13,412-Speed 2496.04 samples/sec Loss 1.0595 LearningRate 0.000001 Epoch: 38 Global Step: 808440 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:25:21,562-Speed 2513.15 samples/sec Loss 1.0702 LearningRate 0.000001 Epoch: 38 Global Step: 808450 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:25:29,766-Speed 2496.79 samples/sec Loss 1.0517 LearningRate 0.000001 Epoch: 38 Global Step: 808460 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:25:37,971-Speed 2496.48 samples/sec Loss 1.0755 LearningRate 0.000001 Epoch: 38 Global Step: 808470 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:25:46,175-Speed 2496.29 samples/sec Loss 1.0733 LearningRate 0.000001 Epoch: 38 Global Step: 808480 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:25:54,381-Speed 2496.18 samples/sec Loss 1.0797 LearningRate 0.000001 Epoch: 38 Global Step: 
808490 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:26:02,586-Speed 2496.75 samples/sec Loss 1.0746 LearningRate 0.000001 Epoch: 38 Global Step: 808500 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:26:10,742-Speed 2511.31 samples/sec Loss 1.0736 LearningRate 0.000001 Epoch: 38 Global Step: 808510 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:26:18,947-Speed 2496.35 samples/sec Loss 1.0906 LearningRate 0.000001 Epoch: 38 Global Step: 808520 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:26:27,179-Speed 2488.51 samples/sec Loss 1.0641 LearningRate 0.000001 Epoch: 38 Global Step: 808530 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:26:35,387-Speed 2495.38 samples/sec Loss 1.0629 LearningRate 0.000001 Epoch: 38 Global Step: 808540 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:26:43,594-Speed 2495.79 samples/sec Loss 1.0556 LearningRate 0.000001 Epoch: 38 Global Step: 808550 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:26:51,799-Speed 2496.36 samples/sec Loss 1.0600 LearningRate 0.000001 Epoch: 38 Global Step: 808560 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:26:59,949-Speed 2513.47 samples/sec Loss 1.0650 LearningRate 0.000001 Epoch: 38 Global Step: 808570 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:27:08,178-Speed 2489.13 samples/sec Loss 1.0657 LearningRate 0.000001 Epoch: 38 Global Step: 808580 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:27:16,382-Speed 2496.90 samples/sec Loss 1.0888 LearningRate 0.000001 Epoch: 38 Global Step: 808590 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:27:24,586-Speed 2496.51 samples/sec Loss 1.0581 LearningRate 0.000001 Epoch: 38 Global Step: 808600 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:27:32,790-Speed 2496.59 samples/sec Loss 1.0927 LearningRate 0.000001 Epoch: 38 Global Step: 808610 Fp16 Grad Scale: 8192 
Required: 5 hours Training: 2022-07-13 07:27:40,994-Speed 2496.91 samples/sec Loss 1.0631 LearningRate 0.000001 Epoch: 38 Global Step: 808620 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:27:49,145-Speed 2512.81 samples/sec Loss 1.0582 LearningRate 0.000001 Epoch: 38 Global Step: 808630 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:27:57,352-Speed 2496.02 samples/sec Loss 1.0799 LearningRate 0.000001 Epoch: 38 Global Step: 808640 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:28:05,564-Speed 2494.47 samples/sec Loss 1.0843 LearningRate 0.000001 Epoch: 38 Global Step: 808650 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:28:13,778-Speed 2493.72 samples/sec Loss 1.0992 LearningRate 0.000001 Epoch: 38 Global Step: 808660 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:28:21,996-Speed 2492.35 samples/sec Loss 1.0820 LearningRate 0.000001 Epoch: 38 Global Step: 808670 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:28:30,199-Speed 2497.04 samples/sec Loss 1.0532 LearningRate 0.000001 Epoch: 38 Global Step: 808680 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:28:38,349-Speed 2513.32 samples/sec Loss 1.0543 LearningRate 0.000001 Epoch: 38 Global Step: 808690 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:28:46,549-Speed 2497.66 samples/sec Loss 1.0831 LearningRate 0.000001 Epoch: 38 Global Step: 808700 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:28:54,754-Speed 2496.47 samples/sec Loss 1.0761 LearningRate 0.000001 Epoch: 38 Global Step: 808710 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:29:02,960-Speed 2496.34 samples/sec Loss 1.0567 LearningRate 0.000001 Epoch: 38 Global Step: 808720 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:29:11,169-Speed 2495.33 samples/sec Loss 1.0742 LearningRate 0.000001 Epoch: 38 Global Step: 808730 Fp16 Grad Scale: 8192 Required: 5 hours Training: 
2022-07-13 07:29:19,376-Speed 2495.92 samples/sec Loss 1.0616 LearningRate 0.000001 Epoch: 38 Global Step: 808740 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:29:27,527-Speed 2513.10 samples/sec Loss 1.0591 LearningRate 0.000001 Epoch: 38 Global Step: 808750 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:29:35,734-Speed 2495.86 samples/sec Loss 1.0648 LearningRate 0.000001 Epoch: 38 Global Step: 808760 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:29:43,939-Speed 2496.27 samples/sec Loss 1.0740 LearningRate 0.000001 Epoch: 38 Global Step: 808770 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:29:52,148-Speed 2495.34 samples/sec Loss 1.0558 LearningRate 0.000001 Epoch: 38 Global Step: 808780 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:30:00,360-Speed 2494.33 samples/sec Loss 1.0816 LearningRate 0.000001 Epoch: 38 Global Step: 808790 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:30:08,567-Speed 2495.56 samples/sec Loss 1.0728 LearningRate 0.000001 Epoch: 38 Global Step: 808800 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:30:16,720-Speed 2512.49 samples/sec Loss 1.0567 LearningRate 0.000001 Epoch: 38 Global Step: 808810 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:30:24,928-Speed 2495.43 samples/sec Loss 1.0916 LearningRate 0.000001 Epoch: 38 Global Step: 808820 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:30:33,151-Speed 2490.91 samples/sec Loss 1.0759 LearningRate 0.000001 Epoch: 38 Global Step: 808830 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:30:41,363-Speed 2494.29 samples/sec Loss 1.0585 LearningRate 0.000001 Epoch: 38 Global Step: 808840 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:30:51,898-Speed 1944.40 samples/sec Loss 1.0825 LearningRate 0.000001 Epoch: 39 Global Step: 808850 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:31:00,102-Speed 
2496.76 samples/sec Loss 1.0691 LearningRate 0.000001 Epoch: 39 Global Step: 808860 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:31:08,255-Speed 2512.33 samples/sec Loss 1.0668 LearningRate 0.000001 Epoch: 39 Global Step: 808870 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:31:16,460-Speed 2496.53 samples/sec Loss 1.0930 LearningRate 0.000001 Epoch: 39 Global Step: 808880 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:31:24,664-Speed 2497.09 samples/sec Loss 1.0387 LearningRate 0.000001 Epoch: 39 Global Step: 808890 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:31:32,889-Speed 2490.26 samples/sec Loss 1.0751 LearningRate 0.000001 Epoch: 39 Global Step: 808900 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:31:41,089-Speed 2498.04 samples/sec Loss 1.0543 LearningRate 0.000001 Epoch: 39 Global Step: 808910 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:31:49,290-Speed 2497.80 samples/sec Loss 1.0900 LearningRate 0.000001 Epoch: 39 Global Step: 808920 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:31:57,440-Speed 2513.43 samples/sec Loss 1.0612 LearningRate 0.000001 Epoch: 39 Global Step: 808930 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:32:05,644-Speed 2496.87 samples/sec Loss 1.0592 LearningRate 0.000001 Epoch: 39 Global Step: 808940 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:32:13,850-Speed 2495.91 samples/sec Loss 1.0488 LearningRate 0.000001 Epoch: 39 Global Step: 808950 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:32:22,054-Speed 2496.57 samples/sec Loss 1.0794 LearningRate 0.000001 Epoch: 39 Global Step: 808960 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:32:30,259-Speed 2496.44 samples/sec Loss 1.0600 LearningRate 0.000001 Epoch: 39 Global Step: 808970 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:32:38,461-Speed 2497.20 samples/sec Loss 1.0682 
LearningRate 0.000001 Epoch: 39 Global Step: 808980 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:32:46,618-Speed 2511.34 samples/sec Loss 1.0376 LearningRate 0.000001 Epoch: 39 Global Step: 808990 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:32:54,836-Speed 2492.34 samples/sec Loss 1.0448 LearningRate 0.000001 Epoch: 39 Global Step: 809000 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:33:03,039-Speed 2496.85 samples/sec Loss 1.0534 LearningRate 0.000001 Epoch: 39 Global Step: 809010 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:33:11,265-Speed 2490.00 samples/sec Loss 1.0651 LearningRate 0.000001 Epoch: 39 Global Step: 809020 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:33:19,470-Speed 2496.92 samples/sec Loss 1.0533 LearningRate 0.000001 Epoch: 39 Global Step: 809030 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:33:27,672-Speed 2497.39 samples/sec Loss 1.0872 LearningRate 0.000001 Epoch: 39 Global Step: 809040 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:33:35,821-Speed 2513.63 samples/sec Loss 1.0350 LearningRate 0.000001 Epoch: 39 Global Step: 809050 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:33:44,025-Speed 2496.72 samples/sec Loss 1.0588 LearningRate 0.000001 Epoch: 39 Global Step: 809060 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:33:52,232-Speed 2496.04 samples/sec Loss 1.0799 LearningRate 0.000001 Epoch: 39 Global Step: 809070 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:34:00,436-Speed 2496.51 samples/sec Loss 1.0262 LearningRate 0.000001 Epoch: 39 Global Step: 809080 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:34:08,642-Speed 2496.30 samples/sec Loss 1.0308 LearningRate 0.000001 Epoch: 39 Global Step: 809090 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:34:16,846-Speed 2496.81 samples/sec Loss 1.0513 LearningRate 0.000001 Epoch: 39 
Global Step: 809100 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:34:25,004-Speed 2510.82 samples/sec Loss 1.0603 LearningRate 0.000001 Epoch: 39 Global Step: 809110 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:34:33,212-Speed 2495.47 samples/sec Loss 1.0441 LearningRate 0.000001 Epoch: 39 Global Step: 809120 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:34:41,415-Speed 2497.24 samples/sec Loss 1.0468 LearningRate 0.000001 Epoch: 39 Global Step: 809130 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:34:49,620-Speed 2496.93 samples/sec Loss 1.0548 LearningRate 0.000001 Epoch: 39 Global Step: 809140 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:34:57,826-Speed 2495.90 samples/sec Loss 1.0528 LearningRate 0.000001 Epoch: 39 Global Step: 809150 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:35:06,033-Speed 2496.04 samples/sec Loss 1.0681 LearningRate 0.000001 Epoch: 39 Global Step: 809160 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:35:14,182-Speed 2513.58 samples/sec Loss 1.0562 LearningRate 0.000001 Epoch: 39 Global Step: 809170 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:35:22,388-Speed 2496.52 samples/sec Loss 1.0507 LearningRate 0.000001 Epoch: 39 Global Step: 809180 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:35:30,594-Speed 2496.26 samples/sec Loss 1.0450 LearningRate 0.000001 Epoch: 39 Global Step: 809190 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:35:38,800-Speed 2496.05 samples/sec Loss 1.0773 LearningRate 0.000001 Epoch: 39 Global Step: 809200 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:35:47,006-Speed 2495.92 samples/sec Loss 1.0268 LearningRate 0.000001 Epoch: 39 Global Step: 809210 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:35:55,215-Speed 2495.33 samples/sec Loss 1.0576 LearningRate 0.000001 Epoch: 39 Global Step: 809220 Fp16 Grad 
Scale: 8192 Required: 5 hours Training: 2022-07-13 07:36:03,373-Speed 2510.78 samples/sec Loss 1.0532 LearningRate 0.000001 Epoch: 39 Global Step: 809230 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:36:11,584-Speed 2494.77 samples/sec Loss 1.0682 LearningRate 0.000001 Epoch: 39 Global Step: 809240 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:36:19,791-Speed 2495.95 samples/sec Loss 1.0726 LearningRate 0.000001 Epoch: 39 Global Step: 809250 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:36:27,994-Speed 2496.98 samples/sec Loss 1.0403 LearningRate 0.000001 Epoch: 39 Global Step: 809260 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:36:36,197-Speed 2496.88 samples/sec Loss 1.0564 LearningRate 0.000001 Epoch: 39 Global Step: 809270 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:36:44,408-Speed 2494.93 samples/sec Loss 1.0401 LearningRate 0.000001 Epoch: 39 Global Step: 809280 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:36:52,565-Speed 2511.25 samples/sec Loss 1.0664 LearningRate 0.000001 Epoch: 39 Global Step: 809290 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:37:00,795-Speed 2488.87 samples/sec Loss 1.0436 LearningRate 0.000001 Epoch: 39 Global Step: 809300 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:37:09,000-Speed 2496.41 samples/sec Loss 1.0479 LearningRate 0.000001 Epoch: 39 Global Step: 809310 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:37:17,225-Speed 2490.31 samples/sec Loss 1.0515 LearningRate 0.000001 Epoch: 39 Global Step: 809320 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:37:25,432-Speed 2495.97 samples/sec Loss 1.0492 LearningRate 0.000001 Epoch: 39 Global Step: 809330 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:37:33,634-Speed 2497.28 samples/sec Loss 1.0598 LearningRate 0.000001 Epoch: 39 Global Step: 809340 Fp16 Grad Scale: 8192 Required: 5 hours 
Training: 2022-07-13 07:37:41,785-Speed 2512.81 samples/sec Loss 1.0473 LearningRate 0.000001 Epoch: 39 Global Step: 809350 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:37:49,988-Speed 2497.12 samples/sec Loss 1.0629 LearningRate 0.000001 Epoch: 39 Global Step: 809360 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:37:58,202-Speed 2493.59 samples/sec Loss 1.0642 LearningRate 0.000001 Epoch: 39 Global Step: 809370 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:38:06,426-Speed 2490.81 samples/sec Loss 1.0505 LearningRate 0.000001 Epoch: 39 Global Step: 809380 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:38:14,627-Speed 2497.57 samples/sec Loss 1.0399 LearningRate 0.000001 Epoch: 39 Global Step: 809390 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:38:22,844-Speed 2492.90 samples/sec Loss 1.0653 LearningRate 0.000001 Epoch: 39 Global Step: 809400 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:38:30,996-Speed 2512.73 samples/sec Loss 1.0752 LearningRate 0.000001 Epoch: 39 Global Step: 809410 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:38:39,206-Speed 2494.91 samples/sec Loss 1.0591 LearningRate 0.000001 Epoch: 39 Global Step: 809420 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:38:47,410-Speed 2496.74 samples/sec Loss 1.0569 LearningRate 0.000001 Epoch: 39 Global Step: 809430 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:38:55,619-Speed 2495.28 samples/sec Loss 1.0690 LearningRate 0.000001 Epoch: 39 Global Step: 809440 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:39:03,822-Speed 2496.85 samples/sec Loss 1.0522 LearningRate 0.000001 Epoch: 39 Global Step: 809450 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:39:12,041-Speed 2492.20 samples/sec Loss 1.0571 LearningRate 0.000001 Epoch: 39 Global Step: 809460 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 
07:39:20,190-Speed 2513.74 samples/sec Loss 1.0598 LearningRate 0.000001 Epoch: 39 Global Step: 809470 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:39:28,396-Speed 2496.23 samples/sec Loss 1.0489 LearningRate 0.000001 Epoch: 39 Global Step: 809480 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:39:36,597-Speed 2497.61 samples/sec Loss 1.0354 LearningRate 0.000001 Epoch: 39 Global Step: 809490 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:39:44,805-Speed 2495.34 samples/sec Loss 1.0522 LearningRate 0.000001 Epoch: 39 Global Step: 809500 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:39:53,021-Speed 2493.23 samples/sec Loss 1.0833 LearningRate 0.000001 Epoch: 39 Global Step: 809510 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-07-13 07:40:01,226-Speed 2496.60 samples/sec Loss 1.0736 LearningRate 0.000001 Epoch: 39 Global Step: 809520 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:40:09,385-Speed 2510.65 samples/sec Loss 1.0463 LearningRate 0.000001 Epoch: 39 Global Step: 809530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:40:17,592-Speed 2495.69 samples/sec Loss 1.0694 LearningRate 0.000001 Epoch: 39 Global Step: 809540 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:40:25,800-Speed 2495.69 samples/sec Loss 1.0208 LearningRate 0.000001 Epoch: 39 Global Step: 809550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:40:34,009-Speed 2495.31 samples/sec Loss 1.0847 LearningRate 0.000001 Epoch: 39 Global Step: 809560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:40:42,225-Speed 2492.94 samples/sec Loss 1.0793 LearningRate 0.000001 Epoch: 39 Global Step: 809570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:40:50,430-Speed 2496.23 samples/sec Loss 1.0861 LearningRate 0.000001 Epoch: 39 Global Step: 809580 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:40:58,583-Speed 
2512.63 samples/sec Loss 1.0496 LearningRate 0.000001 Epoch: 39 Global Step: 809590 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:41:06,790-Speed 2495.91 samples/sec Loss 1.0724 LearningRate 0.000001 Epoch: 39 Global Step: 809600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:41:14,997-Speed 2495.41 samples/sec Loss 1.0501 LearningRate 0.000001 Epoch: 39 Global Step: 809610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:41:23,204-Speed 2496.11 samples/sec Loss 1.0647 LearningRate 0.000001 Epoch: 39 Global Step: 809620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:41:31,407-Speed 2497.00 samples/sec Loss 1.0763 LearningRate 0.000001 Epoch: 39 Global Step: 809630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:41:39,615-Speed 2495.31 samples/sec Loss 1.0468 LearningRate 0.000001 Epoch: 39 Global Step: 809640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:41:47,766-Speed 2512.95 samples/sec Loss 1.0812 LearningRate 0.000001 Epoch: 39 Global Step: 809650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:41:55,971-Speed 2496.43 samples/sec Loss 1.0817 LearningRate 0.000001 Epoch: 39 Global Step: 809660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:42:04,173-Speed 2497.34 samples/sec Loss 1.0623 LearningRate 0.000001 Epoch: 39 Global Step: 809670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:42:12,382-Speed 2495.12 samples/sec Loss 1.0822 LearningRate 0.000001 Epoch: 39 Global Step: 809680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:42:20,600-Speed 2492.74 samples/sec Loss 1.0496 LearningRate 0.000001 Epoch: 39 Global Step: 809690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:42:28,817-Speed 2492.75 samples/sec Loss 1.0760 LearningRate 0.000001 Epoch: 39 Global Step: 809700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:42:36,972-Speed 2511.74 samples/sec 
Loss 1.0646 LearningRate 0.000001 Epoch: 39 Global Step: 809710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:42:45,174-Speed 2497.36 samples/sec Loss 1.0658 LearningRate 0.000001 Epoch: 39 Global Step: 809720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:42:53,376-Speed 2497.01 samples/sec Loss 1.0664 LearningRate 0.000001 Epoch: 39 Global Step: 809730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:43:01,580-Speed 2496.74 samples/sec Loss 1.0518 LearningRate 0.000001 Epoch: 39 Global Step: 809740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:43:09,785-Speed 2496.50 samples/sec Loss 1.0502 LearningRate 0.000001 Epoch: 39 Global Step: 809750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:43:17,987-Speed 2497.13 samples/sec Loss 1.0605 LearningRate 0.000001 Epoch: 39 Global Step: 809760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:43:26,149-Speed 2509.62 samples/sec Loss 1.0741 LearningRate 0.000001 Epoch: 39 Global Step: 809770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:43:34,350-Speed 2497.83 samples/sec Loss 1.0776 LearningRate 0.000001 Epoch: 39 Global Step: 809780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:43:42,555-Speed 2496.29 samples/sec Loss 1.0576 LearningRate 0.000001 Epoch: 39 Global Step: 809790 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:43:50,760-Speed 2496.81 samples/sec Loss 1.0840 LearningRate 0.000001 Epoch: 39 Global Step: 809800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:43:58,964-Speed 2496.56 samples/sec Loss 1.0641 LearningRate 0.000001 Epoch: 39 Global Step: 809810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:44:07,183-Speed 2492.42 samples/sec Loss 1.0488 LearningRate 0.000001 Epoch: 39 Global Step: 809820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:44:15,350-Speed 2508.21 samples/sec Loss 1.0613 
LearningRate 0.000001 Epoch: 39 Global Step: 809830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:44:23,551-Speed 2497.32 samples/sec Loss 1.0504 LearningRate 0.000001 Epoch: 39 Global Step: 809840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:44:31,755-Speed 2496.77 samples/sec Loss 1.0816 LearningRate 0.000001 Epoch: 39 Global Step: 809850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:44:39,960-Speed 2496.92 samples/sec Loss 1.0793 LearningRate 0.000001 Epoch: 39 Global Step: 809860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:44:48,167-Speed 2495.92 samples/sec Loss 1.0782 LearningRate 0.000001 Epoch: 39 Global Step: 809870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:44:56,369-Speed 2496.99 samples/sec Loss 1.0421 LearningRate 0.000001 Epoch: 39 Global Step: 809880 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:45:04,519-Speed 2513.54 samples/sec Loss 1.0963 LearningRate 0.000001 Epoch: 39 Global Step: 809890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-07-13 07:45:12,723-Speed 2496.69 samples/sec Loss 1.0378 LearningRate 0.000001 Epoch: 39 Global Step: 809900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:45:20,932-Speed 2495.28 samples/sec Loss 1.0410 LearningRate 0.000001 Epoch: 39 Global Step: 809910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:45:29,137-Speed 2496.80 samples/sec Loss 1.0541 LearningRate 0.000001 Epoch: 39 Global Step: 809920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:45:37,344-Speed 2495.92 samples/sec Loss 1.0650 LearningRate 0.000001 Epoch: 39 Global Step: 809930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:45:45,550-Speed 2496.14 samples/sec Loss 1.0336 LearningRate 0.000001 Epoch: 39 Global Step: 809940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:45:53,704-Speed 2512.03 samples/sec Loss 1.0600 LearningRate 
0.000001 Epoch: 39 Global Step: 809950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:46:01,906-Speed 2497.29 samples/sec Loss 1.0720 LearningRate 0.000001 Epoch: 39 Global Step: 809960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:46:10,108-Speed 2497.50 samples/sec Loss 1.0361 LearningRate 0.000001 Epoch: 39 Global Step: 809970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:46:18,314-Speed 2496.10 samples/sec Loss 1.0672 LearningRate 0.000001 Epoch: 39 Global Step: 809980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:46:26,519-Speed 2496.26 samples/sec Loss 1.0954 LearningRate 0.000001 Epoch: 39 Global Step: 809990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:46:34,729-Speed 2494.75 samples/sec Loss 1.0707 LearningRate 0.000001 Epoch: 39 Global Step: 810000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:46:42,882-Speed 2512.36 samples/sec Loss 1.0885 LearningRate 0.000001 Epoch: 39 Global Step: 810010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:46:51,084-Speed 2497.37 samples/sec Loss 1.0675 LearningRate 0.000001 Epoch: 39 Global Step: 810020 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:46:59,300-Speed 2493.14 samples/sec Loss 1.0585 LearningRate 0.000001 Epoch: 39 Global Step: 810030 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:47:07,505-Speed 2496.45 samples/sec Loss 1.0322 LearningRate 0.000001 Epoch: 39 Global Step: 810040 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:47:15,712-Speed 2495.75 samples/sec Loss 1.0336 LearningRate 0.000001 Epoch: 39 Global Step: 810050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:47:23,918-Speed 2496.15 samples/sec Loss 1.0675 LearningRate 0.000001 Epoch: 39 Global Step: 810060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:47:32,095-Speed 2504.90 samples/sec Loss 1.0550 LearningRate 0.000001 Epoch: 39 
Global Step: 810070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:47:40,298-Speed 2497.01 samples/sec Loss 1.0900 LearningRate 0.000001 Epoch: 39 Global Step: 810080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:47:48,503-Speed 2496.59 samples/sec Loss 1.0389 LearningRate 0.000001 Epoch: 39 Global Step: 810090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:47:56,704-Speed 2497.65 samples/sec Loss 1.0446 LearningRate 0.000001 Epoch: 39 Global Step: 810100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:48:04,918-Speed 2493.70 samples/sec Loss 1.0450 LearningRate 0.000001 Epoch: 39 Global Step: 810110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:48:13,124-Speed 2496.14 samples/sec Loss 1.0877 LearningRate 0.000001 Epoch: 39 Global Step: 810120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:48:21,286-Speed 2509.44 samples/sec Loss 1.0611 LearningRate 0.000001 Epoch: 39 Global Step: 810130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:48:29,512-Speed 2490.17 samples/sec Loss 1.0846 LearningRate 0.000001 Epoch: 39 Global Step: 810140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:48:37,714-Speed 2497.39 samples/sec Loss 1.0378 LearningRate 0.000001 Epoch: 39 Global Step: 810150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:48:45,924-Speed 2494.73 samples/sec Loss 1.0609 LearningRate 0.000001 Epoch: 39 Global Step: 810160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:48:54,141-Speed 2492.88 samples/sec Loss 1.0215 LearningRate 0.000001 Epoch: 39 Global Step: 810170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:49:02,347-Speed 2496.18 samples/sec Loss 1.0447 LearningRate 0.000001 Epoch: 39 Global Step: 810180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:49:10,498-Speed 2512.75 samples/sec Loss 1.0586 LearningRate 0.000001 Epoch: 39 Global Step: 810190 
Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:49:18,706-Speed 2495.48 samples/sec Loss 1.0529 LearningRate 0.000001 Epoch: 39 Global Step: 810200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:49:26,908-Speed 2497.68 samples/sec Loss 1.0614 LearningRate 0.000001 Epoch: 39 Global Step: 810210 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:49:35,111-Speed 2497.02 samples/sec Loss 1.0898 LearningRate 0.000001 Epoch: 39 Global Step: 810220 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:49:43,317-Speed 2496.01 samples/sec Loss 1.0590 LearningRate 0.000001 Epoch: 39 Global Step: 810230 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:49:51,521-Speed 2497.36 samples/sec Loss 1.0459 LearningRate 0.000001 Epoch: 39 Global Step: 810240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:49:59,676-Speed 2511.43 samples/sec Loss 1.0352 LearningRate 0.000001 Epoch: 39 Global Step: 810250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:50:07,877-Speed 2497.68 samples/sec Loss 1.0627 LearningRate 0.000001 Epoch: 39 Global Step: 810260 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:50:16,082-Speed 2496.40 samples/sec Loss 1.0703 LearningRate 0.000001 Epoch: 39 Global Step: 810270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:50:24,292-Speed 2495.06 samples/sec Loss 1.0464 LearningRate 0.000001 Epoch: 39 Global Step: 810280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:50:32,500-Speed 2495.52 samples/sec Loss 1.0375 LearningRate 0.000001 Epoch: 39 Global Step: 810290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:50:40,703-Speed 2496.99 samples/sec Loss 1.0615 LearningRate 0.000001 Epoch: 39 Global Step: 810300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:50:48,867-Speed 2509.04 samples/sec Loss 1.0504 LearningRate 0.000001 Epoch: 39 Global Step: 810310 Fp16 Grad Scale: 
16384 Required: 4 hours Training: 2022-07-13 07:50:57,087-Speed 2492.02 samples/sec Loss 1.0442 LearningRate 0.000001 Epoch: 39 Global Step: 810320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:51:05,292-Speed 2496.46 samples/sec Loss 1.0631 LearningRate 0.000001 Epoch: 39 Global Step: 810330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:51:13,495-Speed 2496.83 samples/sec Loss 1.0259 LearningRate 0.000001 Epoch: 39 Global Step: 810340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:51:21,700-Speed 2496.97 samples/sec Loss 1.0828 LearningRate 0.000001 Epoch: 39 Global Step: 810350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:51:29,901-Speed 2497.71 samples/sec Loss 1.0535 LearningRate 0.000001 Epoch: 39 Global Step: 810360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:51:38,051-Speed 2513.29 samples/sec Loss 1.0365 LearningRate 0.000001 Epoch: 39 Global Step: 810370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:51:46,252-Speed 2497.75 samples/sec Loss 1.0582 LearningRate 0.000001 Epoch: 39 Global Step: 810380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:51:54,453-Speed 2497.68 samples/sec Loss 1.0386 LearningRate 0.000001 Epoch: 39 Global Step: 810390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:52:02,663-Speed 2494.88 samples/sec Loss 1.0407 LearningRate 0.000001 Epoch: 39 Global Step: 810400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:52:10,872-Speed 2495.23 samples/sec Loss 1.0677 LearningRate 0.000001 Epoch: 39 Global Step: 810410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:52:19,076-Speed 2496.82 samples/sec Loss 1.0318 LearningRate 0.000001 Epoch: 39 Global Step: 810420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:52:27,227-Speed 2513.02 samples/sec Loss 1.0618 LearningRate 0.000001 Epoch: 39 Global Step: 810430 Fp16 Grad Scale: 16384 Required: 4 
hours Training: 2022-07-13 07:52:35,431-Speed 2496.63 samples/sec Loss 1.0368 LearningRate 0.000001 Epoch: 39 Global Step: 810440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:52:43,638-Speed 2496.04 samples/sec Loss 1.0333 LearningRate 0.000001 Epoch: 39 Global Step: 810450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:52:51,843-Speed 2496.28 samples/sec Loss 1.0692 LearningRate 0.000001 Epoch: 39 Global Step: 810460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:53:00,047-Speed 2496.63 samples/sec Loss 1.0454 LearningRate 0.000001 Epoch: 39 Global Step: 810470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:53:08,252-Speed 2496.43 samples/sec Loss 1.0661 LearningRate 0.000001 Epoch: 39 Global Step: 810480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:53:16,404-Speed 2512.71 samples/sec Loss 1.0733 LearningRate 0.000001 Epoch: 39 Global Step: 810490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:53:24,611-Speed 2495.76 samples/sec Loss 1.0833 LearningRate 0.000001 Epoch: 39 Global Step: 810500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:53:32,817-Speed 2496.07 samples/sec Loss 1.0719 LearningRate 0.000001 Epoch: 39 Global Step: 810510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:53:41,020-Speed 2497.27 samples/sec Loss 1.0789 LearningRate 0.000001 Epoch: 39 Global Step: 810520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:53:49,230-Speed 2495.01 samples/sec Loss 1.0652 LearningRate 0.000001 Epoch: 39 Global Step: 810530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:53:57,449-Speed 2491.92 samples/sec Loss 1.0441 LearningRate 0.000001 Epoch: 39 Global Step: 810540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:54:05,597-Speed 2514.05 samples/sec Loss 1.0459 LearningRate 0.000001 Epoch: 39 Global Step: 810550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 
2022-07-13 07:54:13,813-Speed 2493.11 samples/sec Loss 1.0589 LearningRate 0.000001 Epoch: 39 Global Step: 810560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:54:22,018-Speed 2496.22 samples/sec Loss 1.0533 LearningRate 0.000001 Epoch: 39 Global Step: 810570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:54:30,223-Speed 2496.64 samples/sec Loss 1.0530 LearningRate 0.000001 Epoch: 39 Global Step: 810580 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:54:38,433-Speed 2494.75 samples/sec Loss 1.1011 LearningRate 0.000001 Epoch: 39 Global Step: 810590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:54:46,637-Speed 2496.83 samples/sec Loss 1.0777 LearningRate 0.000001 Epoch: 39 Global Step: 810600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:54:54,791-Speed 2512.00 samples/sec Loss 1.0238 LearningRate 0.000001 Epoch: 39 Global Step: 810610 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:55:02,996-Speed 2496.38 samples/sec Loss 1.0598 LearningRate 0.000001 Epoch: 39 Global Step: 810620 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:55:11,202-Speed 2496.24 samples/sec Loss 1.0457 LearningRate 0.000001 Epoch: 39 Global Step: 810630 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:55:19,423-Speed 2491.43 samples/sec Loss 1.0578 LearningRate 0.000001 Epoch: 39 Global Step: 810640 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:55:27,630-Speed 2495.62 samples/sec Loss 1.0560 LearningRate 0.000001 Epoch: 39 Global Step: 810650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:55:35,833-Speed 2497.00 samples/sec Loss 1.0632 LearningRate 0.000001 Epoch: 39 Global Step: 810660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:55:43,995-Speed 2509.66 samples/sec Loss 1.0417 LearningRate 0.000001 Epoch: 39 Global Step: 810670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 
07:55:52,198-Speed 2496.98 samples/sec Loss 1.0541 LearningRate 0.000001 Epoch: 39 Global Step: 810680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:56:00,401-Speed 2497.55 samples/sec Loss 1.0841 LearningRate 0.000001 Epoch: 39 Global Step: 810690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:56:08,607-Speed 2496.11 samples/sec Loss 1.0786 LearningRate 0.000001 Epoch: 39 Global Step: 810700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:56:16,822-Speed 2493.51 samples/sec Loss 1.0711 LearningRate 0.000001 Epoch: 39 Global Step: 810710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 07:56:25,023-Speed 2497.57 samples/sec Loss 1.0547 LearningRate 0.000001 Epoch: 39 Global Step: 810720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:56:33,172-Speed 2513.47 samples/sec Loss 1.0814 LearningRate 0.000001 Epoch: 39 Global Step: 810730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:56:41,374-Speed 2497.43 samples/sec Loss 1.0496 LearningRate 0.000001 Epoch: 39 Global Step: 810740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:56:49,589-Speed 2493.30 samples/sec Loss 1.0473 LearningRate 0.000001 Epoch: 39 Global Step: 810750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:56:57,796-Speed 2495.79 samples/sec Loss 1.0555 LearningRate 0.000001 Epoch: 39 Global Step: 810760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:57:06,001-Speed 2496.37 samples/sec Loss 1.0549 LearningRate 0.000001 Epoch: 39 Global Step: 810770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:57:14,208-Speed 2495.77 samples/sec Loss 1.0978 LearningRate 0.000001 Epoch: 39 Global Step: 810780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:57:22,362-Speed 2512.06 samples/sec Loss 1.0536 LearningRate 0.000001 Epoch: 39 Global Step: 810790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:57:30,567-Speed 
2496.40 samples/sec Loss 1.0868 LearningRate 0.000001 Epoch: 39 Global Step: 810800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:57:38,772-Speed 2496.44 samples/sec Loss 1.0642 LearningRate 0.000001 Epoch: 39 Global Step: 810810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:57:46,972-Speed 2497.79 samples/sec Loss 1.0672 LearningRate 0.000001 Epoch: 39 Global Step: 810820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:57:55,176-Speed 2496.84 samples/sec Loss 1.0675 LearningRate 0.000001 Epoch: 39 Global Step: 810830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:58:03,389-Speed 2493.95 samples/sec Loss 1.0782 LearningRate 0.000001 Epoch: 39 Global Step: 810840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:58:11,539-Speed 2513.26 samples/sec Loss 1.0401 LearningRate 0.000001 Epoch: 39 Global Step: 810850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:58:19,757-Speed 2492.76 samples/sec Loss 1.0527 LearningRate 0.000001 Epoch: 39 Global Step: 810860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:58:27,963-Speed 2495.92 samples/sec Loss 1.0326 LearningRate 0.000001 Epoch: 39 Global Step: 810870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:58:36,165-Speed 2497.62 samples/sec Loss 1.0511 LearningRate 0.000001 Epoch: 39 Global Step: 810880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:58:44,373-Speed 2495.59 samples/sec Loss 1.0601 LearningRate 0.000001 Epoch: 39 Global Step: 810890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:58:52,587-Speed 2493.83 samples/sec Loss 1.0441 LearningRate 0.000001 Epoch: 39 Global Step: 810900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:59:00,741-Speed 2512.04 samples/sec Loss 1.0502 LearningRate 0.000001 Epoch: 39 Global Step: 810910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:59:08,949-Speed 2495.65 samples/sec 
Loss 1.0548 LearningRate 0.000001 Epoch: 39 Global Step: 810920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:59:17,164-Speed 2493.43 samples/sec Loss 1.0697 LearningRate 0.000001 Epoch: 39 Global Step: 810930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:59:25,367-Speed 2496.97 samples/sec Loss 1.0867 LearningRate 0.000001 Epoch: 39 Global Step: 810940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:59:33,578-Speed 2494.45 samples/sec Loss 1.0444 LearningRate 0.000001 Epoch: 39 Global Step: 810950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:59:41,782-Speed 2496.44 samples/sec Loss 1.0520 LearningRate 0.000001 Epoch: 39 Global Step: 810960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:59:49,933-Speed 2513.30 samples/sec Loss 1.0702 LearningRate 0.000001 Epoch: 39 Global Step: 810970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 07:59:58,136-Speed 2497.04 samples/sec Loss 1.0724 LearningRate 0.000001 Epoch: 39 Global Step: 810980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:00:06,361-Speed 2490.25 samples/sec Loss 1.0597 LearningRate 0.000001 Epoch: 39 Global Step: 810990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:00:14,584-Speed 2491.17 samples/sec Loss 1.0720 LearningRate 0.000001 Epoch: 39 Global Step: 811000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:00:22,786-Speed 2497.15 samples/sec Loss 1.0772 LearningRate 0.000001 Epoch: 39 Global Step: 811010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:00:30,994-Speed 2495.84 samples/sec Loss 1.0758 LearningRate 0.000001 Epoch: 39 Global Step: 811020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:00:39,144-Speed 2513.11 samples/sec Loss 1.0319 LearningRate 0.000001 Epoch: 39 Global Step: 811030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:00:47,361-Speed 2493.16 samples/sec Loss 1.0447 
LearningRate 0.000001 Epoch: 39 Global Step: 811040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:00:55,566-Speed 2496.32 samples/sec Loss 1.0710 LearningRate 0.000001 Epoch: 39 Global Step: 811050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:01:03,770-Speed 2496.63 samples/sec Loss 1.0467 LearningRate 0.000001 Epoch: 39 Global Step: 811060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:01:11,975-Speed 2496.44 samples/sec Loss 1.0711 LearningRate 0.000001 Epoch: 39 Global Step: 811070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:01:20,191-Speed 2493.41 samples/sec Loss 1.0722 LearningRate 0.000001 Epoch: 39 Global Step: 811080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:01:28,339-Speed 2513.84 samples/sec Loss 1.0463 LearningRate 0.000001 Epoch: 39 Global Step: 811090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:01:36,541-Speed 2497.28 samples/sec Loss 1.0637 LearningRate 0.000001 Epoch: 39 Global Step: 811100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:01:44,744-Speed 2496.84 samples/sec Loss 1.0586 LearningRate 0.000001 Epoch: 39 Global Step: 811110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:01:52,950-Speed 2496.25 samples/sec Loss 1.0727 LearningRate 0.000001 Epoch: 39 Global Step: 811120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:02:01,155-Speed 2496.56 samples/sec Loss 1.0462 LearningRate 0.000001 Epoch: 39 Global Step: 811130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:02:09,356-Speed 2497.70 samples/sec Loss 1.0534 LearningRate 0.000001 Epoch: 39 Global Step: 811140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:02:17,508-Speed 2512.73 samples/sec Loss 1.0668 LearningRate 0.000001 Epoch: 39 Global Step: 811150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:02:25,712-Speed 2496.75 samples/sec Loss 1.0571 LearningRate 
0.000001 Epoch: 39 Global Step: 811160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:02:33,920-Speed 2495.34 samples/sec Loss 1.0856 LearningRate 0.000001 Epoch: 39 Global Step: 811170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:02:42,129-Speed 2495.17 samples/sec Loss 1.0687 LearningRate 0.000001 Epoch: 39 Global Step: 811180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:02:50,293-Speed 2509.08 samples/sec Loss 1.0431 LearningRate 0.000001 Epoch: 39 Global Step: 811190 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:02:58,496-Speed 2496.91 samples/sec Loss 1.0843 LearningRate 0.000001 Epoch: 39 Global Step: 811200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:03:06,659-Speed 2509.29 samples/sec Loss 1.0899 LearningRate 0.000001 Epoch: 39 Global Step: 811210 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:03:14,864-Speed 2496.31 samples/sec Loss 1.0735 LearningRate 0.000001 Epoch: 39 Global Step: 811220 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:03:23,076-Speed 2494.61 samples/sec Loss 1.0717 LearningRate 0.000001 Epoch: 39 Global Step: 811230 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:03:31,279-Speed 2496.99 samples/sec Loss 1.0537 LearningRate 0.000001 Epoch: 39 Global Step: 811240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:03:39,487-Speed 2495.44 samples/sec Loss 1.0566 LearningRate 0.000001 Epoch: 39 Global Step: 811250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:03:47,699-Speed 2494.15 samples/sec Loss 1.0551 LearningRate 0.000001 Epoch: 39 Global Step: 811260 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:03:55,862-Speed 2509.24 samples/sec Loss 1.0558 LearningRate 0.000001 Epoch: 39 Global Step: 811270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:04:04,065-Speed 2497.13 samples/sec Loss 1.0437 LearningRate 0.000001 Epoch: 39 
Global Step: 811280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:04:12,270-Speed 2496.58 samples/sec Loss 1.0566 LearningRate 0.000001 Epoch: 39 Global Step: 811290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:04:20,486-Speed 2493.04 samples/sec Loss 1.0686 LearningRate 0.000001 Epoch: 39 Global Step: 811300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:04:28,690-Speed 2496.75 samples/sec Loss 1.0514 LearningRate 0.000001 Epoch: 39 Global Step: 811310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:04:36,914-Speed 2490.98 samples/sec Loss 1.0287 LearningRate 0.000001 Epoch: 39 Global Step: 811320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:04:45,066-Speed 2512.67 samples/sec Loss 1.0670 LearningRate 0.000001 Epoch: 39 Global Step: 811330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:04:53,270-Speed 2496.75 samples/sec Loss 1.0674 LearningRate 0.000001 Epoch: 39 Global Step: 811340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:05:01,479-Speed 2495.06 samples/sec Loss 1.0717 LearningRate 0.000001 Epoch: 39 Global Step: 811350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:05:09,692-Speed 2494.25 samples/sec Loss 1.0606 LearningRate 0.000001 Epoch: 39 Global Step: 811360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:05:17,896-Speed 2496.93 samples/sec Loss 1.0659 LearningRate 0.000001 Epoch: 39 Global Step: 811370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:05:26,099-Speed 2496.89 samples/sec Loss 1.0432 LearningRate 0.000001 Epoch: 39 Global Step: 811380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:05:34,264-Speed 2508.85 samples/sec Loss 1.0453 LearningRate 0.000001 Epoch: 39 Global Step: 811390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:05:42,467-Speed 2497.14 samples/sec Loss 1.0534 LearningRate 0.000001 Epoch: 39 Global Step: 811400 
Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:05:50,671-Speed 2496.61 samples/sec Loss 1.0625 LearningRate 0.000001 Epoch: 39 Global Step: 811410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:05:58,883-Speed 2494.29 samples/sec Loss 1.0312 LearningRate 0.000001 Epoch: 39 Global Step: 811420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:06:07,086-Speed 2497.43 samples/sec Loss 1.0485 LearningRate 0.000001 Epoch: 39 Global Step: 811430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:06:15,290-Speed 2496.62 samples/sec Loss 1.0746 LearningRate 0.000001 Epoch: 39 Global Step: 811440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:06:23,446-Speed 2511.36 samples/sec Loss 1.0973 LearningRate 0.000001 Epoch: 39 Global Step: 811450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:06:31,657-Speed 2494.65 samples/sec Loss 1.0445 LearningRate 0.000001 Epoch: 39 Global Step: 811460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:06:39,861-Speed 2496.98 samples/sec Loss 1.0769 LearningRate 0.000001 Epoch: 39 Global Step: 811470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:06:48,062-Speed 2497.44 samples/sec Loss 1.0767 LearningRate 0.000001 Epoch: 39 Global Step: 811480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:06:56,264-Speed 2497.42 samples/sec Loss 1.0788 LearningRate 0.000001 Epoch: 39 Global Step: 811490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:07:04,469-Speed 2496.49 samples/sec Loss 1.0222 LearningRate 0.000001 Epoch: 39 Global Step: 811500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:07:12,624-Speed 2511.84 samples/sec Loss 1.0511 LearningRate 0.000001 Epoch: 39 Global Step: 811510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:07:20,829-Speed 2496.56 samples/sec Loss 1.0387 LearningRate 0.000001 Epoch: 39 Global Step: 811520 Fp16 Grad Scale: 
16384 Required: 4 hours Training: 2022-07-13 08:07:29,038-Speed 2495.20 samples/sec Loss 1.0582 LearningRate 0.000001 Epoch: 39 Global Step: 811530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:07:37,247-Speed 2495.45 samples/sec Loss 1.0689 LearningRate 0.000001 Epoch: 39 Global Step: 811540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:07:45,450-Speed 2497.02 samples/sec Loss 1.0841 LearningRate 0.000001 Epoch: 39 Global Step: 811550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:07:53,653-Speed 2496.85 samples/sec Loss 1.0499 LearningRate 0.000001 Epoch: 39 Global Step: 811560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:08:01,804-Speed 2513.06 samples/sec Loss 1.0605 LearningRate 0.000001 Epoch: 39 Global Step: 811570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:08:10,015-Speed 2494.51 samples/sec Loss 1.0654 LearningRate 0.000001 Epoch: 39 Global Step: 811580 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:08:18,217-Speed 2497.72 samples/sec Loss 1.0528 LearningRate 0.000001 Epoch: 39 Global Step: 811590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:08:26,430-Speed 2493.93 samples/sec Loss 1.0652 LearningRate 0.000001 Epoch: 39 Global Step: 811600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:08:34,649-Speed 2492.21 samples/sec Loss 1.0419 LearningRate 0.000001 Epoch: 39 Global Step: 811610 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:08:42,855-Speed 2496.40 samples/sec Loss 1.0647 LearningRate 0.000001 Epoch: 39 Global Step: 811620 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:08:51,017-Speed 2509.60 samples/sec Loss 1.0311 LearningRate 0.000001 Epoch: 39 Global Step: 811630 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:08:59,224-Speed 2495.85 samples/sec Loss 1.0432 LearningRate 0.000001 Epoch: 39 Global Step: 811640 Fp16 Grad Scale: 16384 Required: 4 
hours Training: 2022-07-13 08:09:07,426-Speed 2497.32 samples/sec Loss 1.0323 LearningRate 0.000001 Epoch: 39 Global Step: 811650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:09:15,628-Speed 2497.31 samples/sec Loss 1.0710 LearningRate 0.000001 Epoch: 39 Global Step: 811660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:09:23,836-Speed 2495.61 samples/sec Loss 1.0541 LearningRate 0.000001 Epoch: 39 Global Step: 811670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:09:32,041-Speed 2496.20 samples/sec Loss 1.0348 LearningRate 0.000001 Epoch: 39 Global Step: 811680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:09:40,191-Speed 2513.42 samples/sec Loss 1.0677 LearningRate 0.000001 Epoch: 39 Global Step: 811690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:09:48,398-Speed 2495.89 samples/sec Loss 1.0843 LearningRate 0.000001 Epoch: 39 Global Step: 811700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:09:56,607-Speed 2495.02 samples/sec Loss 1.0445 LearningRate 0.000001 Epoch: 39 Global Step: 811710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:10:04,814-Speed 2496.02 samples/sec Loss 1.0490 LearningRate 0.000001 Epoch: 39 Global Step: 811720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:10:13,019-Speed 2496.11 samples/sec Loss 1.0768 LearningRate 0.000001 Epoch: 39 Global Step: 811730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:10:21,226-Speed 2495.86 samples/sec Loss 1.0724 LearningRate 0.000001 Epoch: 39 Global Step: 811740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:10:29,378-Speed 2512.62 samples/sec Loss 1.0417 LearningRate 0.000001 Epoch: 39 Global Step: 811750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:10:37,596-Speed 2492.30 samples/sec Loss 1.0580 LearningRate 0.000001 Epoch: 39 Global Step: 811760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 
2022-07-13 08:10:45,805-Speed 2495.32 samples/sec Loss 1.0690 LearningRate 0.000001 Epoch: 39 Global Step: 811770 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:10:54,010-Speed 2496.42 samples/sec Loss 1.0632 LearningRate 0.000001 Epoch: 39 Global Step: 811780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:11:02,220-Speed 2495.24 samples/sec Loss 1.0346 LearningRate 0.000001 Epoch: 39 Global Step: 811790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:11:10,427-Speed 2495.66 samples/sec Loss 1.0641 LearningRate 0.000001 Epoch: 39 Global Step: 811800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:11:18,581-Speed 2511.89 samples/sec Loss 1.0213 LearningRate 0.000001 Epoch: 39 Global Step: 811810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:11:26,788-Speed 2495.75 samples/sec Loss 1.0538 LearningRate 0.000001 Epoch: 39 Global Step: 811820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:11:34,993-Speed 2496.70 samples/sec Loss 1.0618 LearningRate 0.000001 Epoch: 39 Global Step: 811830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:11:43,198-Speed 2496.36 samples/sec Loss 1.0531 LearningRate 0.000001 Epoch: 39 Global Step: 811840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:11:51,403-Speed 2496.38 samples/sec Loss 1.0638 LearningRate 0.000001 Epoch: 39 Global Step: 811850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:11:59,609-Speed 2496.24 samples/sec Loss 1.0548 LearningRate 0.000001 Epoch: 39 Global Step: 811860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:12:07,762-Speed 2511.99 samples/sec Loss 1.0458 LearningRate 0.000001 Epoch: 39 Global Step: 811870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:12:15,968-Speed 2496.21 samples/sec Loss 1.0716 LearningRate 0.000001 Epoch: 39 Global Step: 811880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 
08:12:24,173-Speed 2496.60 samples/sec Loss 1.0409 LearningRate 0.000001 Epoch: 39 Global Step: 811890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:12:32,380-Speed 2495.79 samples/sec Loss 1.0546 LearningRate 0.000001 Epoch: 39 Global Step: 811900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:12:40,593-Speed 2494.17 samples/sec Loss 1.0581 LearningRate 0.000001 Epoch: 39 Global Step: 811910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:12:48,801-Speed 2495.23 samples/sec Loss 1.0806 LearningRate 0.000001 Epoch: 39 Global Step: 811920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:12:56,969-Speed 2507.91 samples/sec Loss 1.0454 LearningRate 0.000001 Epoch: 39 Global Step: 811930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:13:05,174-Speed 2496.30 samples/sec Loss 1.0879 LearningRate 0.000001 Epoch: 39 Global Step: 811940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:13:13,380-Speed 2496.14 samples/sec Loss 1.0605 LearningRate 0.000001 Epoch: 39 Global Step: 811950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:13:21,589-Speed 2495.08 samples/sec Loss 1.0426 LearningRate 0.000001 Epoch: 39 Global Step: 811960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:13:29,800-Speed 2494.66 samples/sec Loss 1.0571 LearningRate 0.000001 Epoch: 39 Global Step: 811970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:13:38,015-Speed 2493.63 samples/sec Loss 1.0945 LearningRate 0.000001 Epoch: 39 Global Step: 811980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:13:46,164-Speed 2513.55 samples/sec Loss 1.0862 LearningRate 0.000001 Epoch: 39 Global Step: 811990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:13:54,368-Speed 2496.46 samples/sec Loss 1.0764 LearningRate 0.000001 Epoch: 39 Global Step: 812000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:14:02,574-Speed 
2496.34 samples/sec Loss 1.0533 LearningRate 0.000001 Epoch: 39 Global Step: 812010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:14:10,786-Speed 2494.86 samples/sec Loss 1.0813 LearningRate 0.000001 Epoch: 39 Global Step: 812020 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:14:18,994-Speed 2495.53 samples/sec Loss 1.0480 LearningRate 0.000001 Epoch: 39 Global Step: 812030 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:14:27,199-Speed 2496.53 samples/sec Loss 1.0463 LearningRate 0.000001 Epoch: 39 Global Step: 812040 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:14:35,348-Speed 2513.68 samples/sec Loss 1.0402 LearningRate 0.000001 Epoch: 39 Global Step: 812050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:14:43,557-Speed 2495.41 samples/sec Loss 1.0480 LearningRate 0.000001 Epoch: 39 Global Step: 812060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:14:51,771-Speed 2493.56 samples/sec Loss 1.0898 LearningRate 0.000001 Epoch: 39 Global Step: 812070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:14:59,979-Speed 2495.52 samples/sec Loss 1.0565 LearningRate 0.000001 Epoch: 39 Global Step: 812080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:15:08,185-Speed 2496.05 samples/sec Loss 1.0862 LearningRate 0.000001 Epoch: 39 Global Step: 812090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:15:16,393-Speed 2495.75 samples/sec Loss 1.0925 LearningRate 0.000001 Epoch: 39 Global Step: 812100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:15:24,547-Speed 2511.92 samples/sec Loss 1.0419 LearningRate 0.000001 Epoch: 39 Global Step: 812110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:15:32,752-Speed 2496.45 samples/sec Loss 1.1001 LearningRate 0.000001 Epoch: 39 Global Step: 812120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:15:40,963-Speed 2494.96 samples/sec 
Loss 1.0802 LearningRate 0.000001 Epoch: 39 Global Step: 812130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:15:49,178-Speed 2493.49 samples/sec Loss 1.0541 LearningRate 0.000001 Epoch: 39 Global Step: 812140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:15:57,385-Speed 2495.58 samples/sec Loss 1.0640 LearningRate 0.000001 Epoch: 39 Global Step: 812150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:16:05,591-Speed 2496.39 samples/sec Loss 1.0665 LearningRate 0.000001 Epoch: 39 Global Step: 812160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:16:13,738-Speed 2514.21 samples/sec Loss 1.0469 LearningRate 0.000001 Epoch: 39 Global Step: 812170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:16:21,941-Speed 2496.99 samples/sec Loss 1.0720 LearningRate 0.000001 Epoch: 39 Global Step: 812180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:16:30,146-Speed 2496.29 samples/sec Loss 1.0692 LearningRate 0.000001 Epoch: 39 Global Step: 812190 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:16:38,351-Speed 2496.47 samples/sec Loss 1.0489 LearningRate 0.000001 Epoch: 39 Global Step: 812200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:16:46,559-Speed 2495.63 samples/sec Loss 1.0371 LearningRate 0.000001 Epoch: 39 Global Step: 812210 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:16:54,773-Speed 2493.58 samples/sec Loss 1.0446 LearningRate 0.000001 Epoch: 39 Global Step: 812220 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:17:02,916-Speed 2515.46 samples/sec Loss 1.0718 LearningRate 0.000001 Epoch: 39 Global Step: 812230 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:17:11,120-Speed 2496.95 samples/sec Loss 1.0403 LearningRate 0.000001 Epoch: 39 Global Step: 812240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:17:19,325-Speed 2496.54 samples/sec Loss 1.0500 
LearningRate 0.000001 Epoch: 39 Global Step: 812250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:17:27,529-Speed 2496.61 samples/sec Loss 1.0723 LearningRate 0.000001 Epoch: 39 Global Step: 812260 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:17:35,739-Speed 2495.00 samples/sec Loss 1.0204 LearningRate 0.000001 Epoch: 39 Global Step: 812270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:17:43,945-Speed 2495.87 samples/sec Loss 1.0806 LearningRate 0.000001 Epoch: 39 Global Step: 812280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:17:52,098-Speed 2512.58 samples/sec Loss 1.0642 LearningRate 0.000001 Epoch: 39 Global Step: 812290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:18:00,311-Speed 2493.84 samples/sec Loss 1.0656 LearningRate 0.000001 Epoch: 39 Global Step: 812300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:18:08,518-Speed 2496.03 samples/sec Loss 1.0764 LearningRate 0.000001 Epoch: 39 Global Step: 812310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:18:16,722-Speed 2496.67 samples/sec Loss 1.0498 LearningRate 0.000001 Epoch: 39 Global Step: 812320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:18:24,938-Speed 2493.14 samples/sec Loss 1.0598 LearningRate 0.000001 Epoch: 39 Global Step: 812330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:18:33,145-Speed 2495.77 samples/sec Loss 1.0552 LearningRate 0.000001 Epoch: 39 Global Step: 812340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:18:41,296-Speed 2513.00 samples/sec Loss 1.0794 LearningRate 0.000001 Epoch: 39 Global Step: 812350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:18:49,502-Speed 2496.17 samples/sec Loss 1.0704 LearningRate 0.000001 Epoch: 39 Global Step: 812360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:18:57,716-Speed 2493.66 samples/sec Loss 1.0495 LearningRate 
0.000001 Epoch: 39 Global Step: 812370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:19:05,921-Speed 2496.15 samples/sec Loss 1.0532 LearningRate 0.000001 Epoch: 39 Global Step: 812380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:19:14,136-Speed 2493.46 samples/sec Loss 1.0398 LearningRate 0.000001 Epoch: 39 Global Step: 812390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:19:22,341-Speed 2496.51 samples/sec Loss 1.0600 LearningRate 0.000001 Epoch: 39 Global Step: 812400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:19:30,505-Speed 2508.85 samples/sec Loss 1.0505 LearningRate 0.000001 Epoch: 39 Global Step: 812410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:19:38,717-Speed 2494.44 samples/sec Loss 1.0390 LearningRate 0.000001 Epoch: 39 Global Step: 812420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:19:46,921-Speed 2496.63 samples/sec Loss 1.0650 LearningRate 0.000001 Epoch: 39 Global Step: 812430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:19:55,125-Speed 2496.96 samples/sec Loss 1.0694 LearningRate 0.000001 Epoch: 39 Global Step: 812440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:20:03,330-Speed 2496.35 samples/sec Loss 1.0502 LearningRate 0.000001 Epoch: 39 Global Step: 812450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:20:11,539-Speed 2495.59 samples/sec Loss 1.0528 LearningRate 0.000001 Epoch: 39 Global Step: 812460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:20:19,688-Speed 2513.51 samples/sec Loss 1.0709 LearningRate 0.000001 Epoch: 39 Global Step: 812470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:20:27,894-Speed 2496.06 samples/sec Loss 1.0755 LearningRate 0.000001 Epoch: 39 Global Step: 812480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:20:36,099-Speed 2496.45 samples/sec Loss 1.0836 LearningRate 0.000001 Epoch: 39 
Global Step: 812490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:20:44,305-Speed 2496.19 samples/sec Loss 1.0552 LearningRate 0.000001 Epoch: 39 Global Step: 812500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:20:52,510-Speed 2496.24 samples/sec Loss 1.0643 LearningRate 0.000001 Epoch: 39 Global Step: 812510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:21:00,718-Speed 2495.55 samples/sec Loss 1.0377 LearningRate 0.000001 Epoch: 39 Global Step: 812520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:21:08,870-Speed 2512.85 samples/sec Loss 1.0851 LearningRate 0.000001 Epoch: 39 Global Step: 812530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:21:17,077-Speed 2495.83 samples/sec Loss 1.0607 LearningRate 0.000001 Epoch: 39 Global Step: 812540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:21:25,287-Speed 2498.72 samples/sec Loss 1.0559 LearningRate 0.000001 Epoch: 39 Global Step: 812550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:21:33,495-Speed 2495.47 samples/sec Loss 1.0658 LearningRate 0.000001 Epoch: 39 Global Step: 812560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:21:41,708-Speed 2493.94 samples/sec Loss 1.0412 LearningRate 0.000001 Epoch: 39 Global Step: 812570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:21:49,917-Speed 2495.20 samples/sec Loss 1.0886 LearningRate 0.000001 Epoch: 39 Global Step: 812580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:21:58,066-Speed 2513.67 samples/sec Loss 1.0604 LearningRate 0.000001 Epoch: 39 Global Step: 812590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:22:06,275-Speed 2495.33 samples/sec Loss 1.0616 LearningRate 0.000001 Epoch: 39 Global Step: 812600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:22:14,482-Speed 2495.78 samples/sec Loss 1.0381 LearningRate 0.000001 Epoch: 39 Global Step: 812610 
Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:22:22,711-Speed 2489.28 samples/sec Loss 1.0659 LearningRate 0.000001 Epoch: 39 Global Step: 812620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:22:30,909-Speed 2498.32 samples/sec Loss 1.0492 LearningRate 0.000001 Epoch: 39 Global Step: 812630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:22:39,120-Speed 2495.18 samples/sec Loss 1.0472 LearningRate 0.000001 Epoch: 39 Global Step: 812640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:22:47,270-Speed 2513.33 samples/sec Loss 1.0396 LearningRate 0.000001 Epoch: 39 Global Step: 812650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:22:55,497-Speed 2489.66 samples/sec Loss 1.0761 LearningRate 0.000001 Epoch: 39 Global Step: 812660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:23:03,708-Speed 2494.90 samples/sec Loss 1.0398 LearningRate 0.000001 Epoch: 39 Global Step: 812670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:23:11,918-Speed 2495.01 samples/sec Loss 1.0905 LearningRate 0.000001 Epoch: 39 Global Step: 812680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:23:20,139-Speed 2491.43 samples/sec Loss 1.0874 LearningRate 0.000001 Epoch: 39 Global Step: 812690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:23:28,362-Speed 2491.01 samples/sec Loss 1.0402 LearningRate 0.000001 Epoch: 39 Global Step: 812700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:23:36,513-Speed 2513.09 samples/sec Loss 1.0672 LearningRate 0.000001 Epoch: 39 Global Step: 812710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:23:44,719-Speed 2495.92 samples/sec Loss 1.0643 LearningRate 0.000001 Epoch: 39 Global Step: 812720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:23:52,928-Speed 2495.29 samples/sec Loss 1.0398 LearningRate 0.000001 Epoch: 39 Global Step: 812730 Fp16 Grad Scale: 
32768 Required: 4 hours Training: 2022-07-13 08:24:01,136-Speed 2495.98 samples/sec Loss 1.0417 LearningRate 0.000001 Epoch: 39 Global Step: 812740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:24:09,342-Speed 2496.21 samples/sec Loss 1.0633 LearningRate 0.000001 Epoch: 39 Global Step: 812750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:24:17,544-Speed 2497.22 samples/sec Loss 1.0915 LearningRate 0.000001 Epoch: 39 Global Step: 812760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:24:25,697-Speed 2512.31 samples/sec Loss 1.0717 LearningRate 0.000001 Epoch: 39 Global Step: 812770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:24:33,905-Speed 2495.40 samples/sec Loss 1.0542 LearningRate 0.000001 Epoch: 39 Global Step: 812780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:24:42,106-Speed 2497.67 samples/sec Loss 1.0490 LearningRate 0.000001 Epoch: 39 Global Step: 812790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:24:50,308-Speed 2497.25 samples/sec Loss 1.0495 LearningRate 0.000001 Epoch: 39 Global Step: 812800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:24:58,513-Speed 2496.46 samples/sec Loss 1.0636 LearningRate 0.000001 Epoch: 39 Global Step: 812810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:25:06,719-Speed 2495.87 samples/sec Loss 1.0627 LearningRate 0.000001 Epoch: 39 Global Step: 812820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:25:14,872-Speed 2513.14 samples/sec Loss 1.0625 LearningRate 0.000001 Epoch: 39 Global Step: 812830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:25:23,076-Speed 2496.67 samples/sec Loss 1.0510 LearningRate 0.000001 Epoch: 39 Global Step: 812840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:25:31,283-Speed 2495.69 samples/sec Loss 1.0499 LearningRate 0.000001 Epoch: 39 Global Step: 812850 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-07-13 08:25:39,492-Speed 2495.43 samples/sec Loss 1.0146 LearningRate 0.000001 Epoch: 39 Global Step: 812860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:25:47,712-Speed 2491.75 samples/sec Loss 1.0737 LearningRate 0.000000 Epoch: 39 Global Step: 812870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:25:55,918-Speed 2495.90 samples/sec Loss 1.0569 LearningRate 0.000000 Epoch: 39 Global Step: 812880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:26:04,077-Speed 2510.71 samples/sec Loss 1.0544 LearningRate 0.000000 Epoch: 39 Global Step: 812890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:26:12,284-Speed 2495.89 samples/sec Loss 1.0611 LearningRate 0.000000 Epoch: 39 Global Step: 812900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:26:20,490-Speed 2495.97 samples/sec Loss 1.0387 LearningRate 0.000000 Epoch: 39 Global Step: 812910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:26:28,698-Speed 2495.59 samples/sec Loss 1.0835 LearningRate 0.000000 Epoch: 39 Global Step: 812920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:26:36,901-Speed 2496.83 samples/sec Loss 1.0567 LearningRate 0.000000 Epoch: 39 Global Step: 812930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:26:45,120-Speed 2492.28 samples/sec Loss 1.0460 LearningRate 0.000000 Epoch: 39 Global Step: 812940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:26:53,274-Speed 2512.16 samples/sec Loss 1.0657 LearningRate 0.000000 Epoch: 39 Global Step: 812950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:27:01,484-Speed 2494.87 samples/sec Loss 1.0712 LearningRate 0.000000 Epoch: 39 Global Step: 812960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:27:09,700-Speed 2493.01 samples/sec Loss 1.0450 LearningRate 0.000000 Epoch: 39 Global Step: 812970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 
2022-07-13 08:27:17,906-Speed 2496.02 samples/sec Loss 1.0701 LearningRate 0.000000 Epoch: 39 Global Step: 812980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:27:26,131-Speed 2490.31 samples/sec Loss 1.0585 LearningRate 0.000000 Epoch: 39 Global Step: 812990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:27:34,336-Speed 2496.27 samples/sec Loss 1.0681 LearningRate 0.000000 Epoch: 39 Global Step: 813000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:27:42,488-Speed 2512.69 samples/sec Loss 1.0757 LearningRate 0.000000 Epoch: 39 Global Step: 813010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:27:50,701-Speed 2493.96 samples/sec Loss 1.0661 LearningRate 0.000000 Epoch: 39 Global Step: 813020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:27:58,931-Speed 2488.99 samples/sec Loss 1.0721 LearningRate 0.000000 Epoch: 39 Global Step: 813030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:28:07,144-Speed 2493.90 samples/sec Loss 1.0524 LearningRate 0.000000 Epoch: 39 Global Step: 813040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:28:15,349-Speed 2496.35 samples/sec Loss 1.0425 LearningRate 0.000000 Epoch: 39 Global Step: 813050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:28:23,569-Speed 2492.01 samples/sec Loss 1.0208 LearningRate 0.000000 Epoch: 39 Global Step: 813060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:28:31,743-Speed 2505.87 samples/sec Loss 1.0561 LearningRate 0.000000 Epoch: 39 Global Step: 813070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:28:39,951-Speed 2495.31 samples/sec Loss 1.0579 LearningRate 0.000000 Epoch: 39 Global Step: 813080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:28:48,159-Speed 2495.89 samples/sec Loss 1.0750 LearningRate 0.000000 Epoch: 39 Global Step: 813090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 
08:28:56,367-Speed 2495.85 samples/sec Loss 1.0427 LearningRate 0.000000 Epoch: 39 Global Step: 813100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:29:04,576-Speed 2495.17 samples/sec Loss 1.0635 LearningRate 0.000000 Epoch: 39 Global Step: 813110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:29:12,783-Speed 2495.88 samples/sec Loss 1.0540 LearningRate 0.000000 Epoch: 39 Global Step: 813120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:29:20,939-Speed 2511.51 samples/sec Loss 1.0527 LearningRate 0.000000 Epoch: 39 Global Step: 813130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:29:29,145-Speed 2496.04 samples/sec Loss 1.0508 LearningRate 0.000000 Epoch: 39 Global Step: 813140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:29:37,365-Speed 2491.94 samples/sec Loss 1.0648 LearningRate 0.000000 Epoch: 39 Global Step: 813150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:29:45,571-Speed 2496.28 samples/sec Loss 1.0495 LearningRate 0.000000 Epoch: 39 Global Step: 813160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:29:53,776-Speed 2496.38 samples/sec Loss 1.0638 LearningRate 0.000000 Epoch: 39 Global Step: 813170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:30:02,002-Speed 2489.76 samples/sec Loss 1.0744 LearningRate 0.000000 Epoch: 39 Global Step: 813180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:30:10,148-Speed 2514.75 samples/sec Loss 1.0807 LearningRate 0.000000 Epoch: 39 Global Step: 813190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:30:18,354-Speed 2495.88 samples/sec Loss 1.0590 LearningRate 0.000000 Epoch: 39 Global Step: 813200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:30:26,560-Speed 2496.35 samples/sec Loss 1.0396 LearningRate 0.000000 Epoch: 39 Global Step: 813210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:30:34,765-Speed 
2496.09 samples/sec Loss 1.0697 LearningRate 0.000000 Epoch: 39 Global Step: 813220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:30:42,973-Speed 2495.94 samples/sec Loss 1.0645 LearningRate 0.000000 Epoch: 39 Global Step: 813230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:30:51,182-Speed 2495.21 samples/sec Loss 1.0600 LearningRate 0.000000 Epoch: 39 Global Step: 813240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:30:59,338-Speed 2511.36 samples/sec Loss 1.0697 LearningRate 0.000000 Epoch: 39 Global Step: 813250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:31:07,546-Speed 2495.57 samples/sec Loss 1.0944 LearningRate 0.000000 Epoch: 39 Global Step: 813260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:31:15,761-Speed 2493.68 samples/sec Loss 1.0425 LearningRate 0.000000 Epoch: 39 Global Step: 813270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:31:23,966-Speed 2496.75 samples/sec Loss 1.0581 LearningRate 0.000000 Epoch: 39 Global Step: 813280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:31:32,179-Speed 2493.77 samples/sec Loss 1.0489 LearningRate 0.000000 Epoch: 39 Global Step: 813290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:31:40,383-Speed 2496.77 samples/sec Loss 1.0410 LearningRate 0.000000 Epoch: 39 Global Step: 813300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:31:48,538-Speed 2511.85 samples/sec Loss 1.0646 LearningRate 0.000000 Epoch: 39 Global Step: 813310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:31:56,746-Speed 2495.55 samples/sec Loss 1.0534 LearningRate 0.000000 Epoch: 39 Global Step: 813320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:32:04,953-Speed 2495.77 samples/sec Loss 1.0948 LearningRate 0.000000 Epoch: 39 Global Step: 813330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:32:13,162-Speed 2495.37 samples/sec 
Loss 1.0484 LearningRate 0.000000 Epoch: 39 Global Step: 813340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:32:21,367-Speed 2496.16 samples/sec Loss 1.0680 LearningRate 0.000000 Epoch: 39 Global Step: 813350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:32:29,573-Speed 2496.12 samples/sec Loss 1.0727 LearningRate 0.000000 Epoch: 39 Global Step: 813360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:32:37,730-Speed 2511.21 samples/sec Loss 1.0360 LearningRate 0.000000 Epoch: 39 Global Step: 813370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:32:45,934-Speed 2496.64 samples/sec Loss 1.0488 LearningRate 0.000000 Epoch: 39 Global Step: 813380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:32:54,141-Speed 2495.79 samples/sec Loss 1.0485 LearningRate 0.000000 Epoch: 39 Global Step: 813390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:33:02,349-Speed 2495.86 samples/sec Loss 1.0519 LearningRate 0.000000 Epoch: 39 Global Step: 813400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:33:10,557-Speed 2495.25 samples/sec Loss 1.0765 LearningRate 0.000000 Epoch: 39 Global Step: 813410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-07-13 08:33:18,726-Speed 2507.39 samples/sec Loss 1.0608 LearningRate 0.000000 Epoch: 39 Global Step: 813420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:33:26,884-Speed 2510.99 samples/sec Loss 1.0656 LearningRate 0.000000 Epoch: 39 Global Step: 813430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:33:35,094-Speed 2495.09 samples/sec Loss 1.0715 LearningRate 0.000000 Epoch: 39 Global Step: 813440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:33:43,298-Speed 2497.25 samples/sec Loss 1.0586 LearningRate 0.000000 Epoch: 39 Global Step: 813450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:33:51,501-Speed 2497.12 samples/sec Loss 1.0602 
LearningRate 0.000000 Epoch: 39 Global Step: 813460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:33:59,729-Speed 2490.11 samples/sec Loss 1.0534 LearningRate 0.000000 Epoch: 39 Global Step: 813470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:34:07,936-Speed 2495.92 samples/sec Loss 1.0610 LearningRate 0.000000 Epoch: 39 Global Step: 813480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:34:16,088-Speed 2512.70 samples/sec Loss 1.0470 LearningRate 0.000000 Epoch: 39 Global Step: 813490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:34:24,308-Speed 2491.75 samples/sec Loss 1.0566 LearningRate 0.000000 Epoch: 39 Global Step: 813500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:34:32,517-Speed 2495.11 samples/sec Loss 1.0841 LearningRate 0.000000 Epoch: 39 Global Step: 813510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:34:40,721-Speed 2496.66 samples/sec Loss 1.0612 LearningRate 0.000000 Epoch: 39 Global Step: 813520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:34:48,927-Speed 2495.95 samples/sec Loss 1.0555 LearningRate 0.000000 Epoch: 39 Global Step: 813530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:34:57,131-Speed 2496.79 samples/sec Loss 1.0533 LearningRate 0.000000 Epoch: 39 Global Step: 813540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:35:05,285-Speed 2512.18 samples/sec Loss 1.0606 LearningRate 0.000000 Epoch: 39 Global Step: 813550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:35:13,503-Speed 2492.63 samples/sec Loss 1.0500 LearningRate 0.000000 Epoch: 39 Global Step: 813560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:35:21,711-Speed 2495.36 samples/sec Loss 1.0558 LearningRate 0.000000 Epoch: 39 Global Step: 813570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-07-13 08:35:29,872-Speed 2509.76 samples/sec Loss 1.0546 LearningRate 
0.000000 Epoch: 39 Global Step: 813580 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:35:38,081-Speed 2495.24 samples/sec Loss 1.0759 LearningRate 0.000000 Epoch: 39 Global Step: 813590 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:35:46,285-Speed 2496.84 samples/sec Loss 1.0534 LearningRate 0.000000 Epoch: 39 Global Step: 813600 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:35:54,436-Speed 2512.65 samples/sec Loss 1.0619 LearningRate 0.000000 Epoch: 39 Global Step: 813610 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:36:02,654-Speed 2492.66 samples/sec Loss 1.0651 LearningRate 0.000000 Epoch: 39 Global Step: 813620 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:36:10,875-Speed 2491.35 samples/sec Loss 1.0534 LearningRate 0.000000 Epoch: 39 Global Step: 813630 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:36:19,081-Speed 2496.32 samples/sec Loss 1.0536 LearningRate 0.000000 Epoch: 39 Global Step: 813640 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:36:27,288-Speed 2495.59 samples/sec Loss 1.0901 LearningRate 0.000000 Epoch: 39 Global Step: 813650 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:36:35,498-Speed 2495.26 samples/sec Loss 1.0466 LearningRate 0.000000 Epoch: 39 Global Step: 813660 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:36:43,650-Speed 2512.58 samples/sec Loss 1.0704 LearningRate 0.000000 Epoch: 39 Global Step: 813670 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:36:51,854-Speed 2497.01 samples/sec Loss 1.0736 LearningRate 0.000000 Epoch: 39 Global Step: 813680 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:37:00,059-Speed 2496.47 samples/sec Loss 1.0625 LearningRate 0.000000 Epoch: 39 Global Step: 813690 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:37:08,269-Speed 2494.86 samples/sec Loss 1.0193 LearningRate 0.000000 Epoch: 39 Global Step: 
813700 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:37:16,477-Speed 2495.58 samples/sec Loss 1.0862 LearningRate 0.000000 Epoch: 39 Global Step: 813710 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:37:24,683-Speed 2496.35 samples/sec Loss 1.0452 LearningRate 0.000000 Epoch: 39 Global Step: 813720 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:37:32,834-Speed 2512.87 samples/sec Loss 1.0978 LearningRate 0.000000 Epoch: 39 Global Step: 813730 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:37:41,043-Speed 2495.28 samples/sec Loss 1.0624 LearningRate 0.000000 Epoch: 39 Global Step: 813740 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:37:49,248-Speed 2496.35 samples/sec Loss 1.0494 LearningRate 0.000000 Epoch: 39 Global Step: 813750 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:37:57,460-Speed 2494.24 samples/sec Loss 1.0771 LearningRate 0.000000 Epoch: 39 Global Step: 813760 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:38:05,666-Speed 2496.31 samples/sec Loss 1.0498 LearningRate 0.000000 Epoch: 39 Global Step: 813770 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:38:13,870-Speed 2496.62 samples/sec Loss 1.0592 LearningRate 0.000000 Epoch: 39 Global Step: 813780 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:38:22,026-Speed 2511.65 samples/sec Loss 1.0520 LearningRate 0.000000 Epoch: 39 Global Step: 813790 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:38:30,232-Speed 2495.94 samples/sec Loss 1.0505 LearningRate 0.000000 Epoch: 39 Global Step: 813800 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:38:38,439-Speed 2496.22 samples/sec Loss 1.0487 LearningRate 0.000000 Epoch: 39 Global Step: 813810 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:38:46,644-Speed 2496.56 samples/sec Loss 1.0383 LearningRate 0.000000 Epoch: 39 Global Step: 813820 Fp16 Grad Scale: 8192 
Required: 4 hours Training: 2022-07-13 08:38:54,851-Speed 2495.57 samples/sec Loss 1.0738 LearningRate 0.000000 Epoch: 39 Global Step: 813830 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:39:03,058-Speed 2495.77 samples/sec Loss 1.0290 LearningRate 0.000000 Epoch: 39 Global Step: 813840 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:39:11,210-Speed 2512.71 samples/sec Loss 1.0773 LearningRate 0.000000 Epoch: 39 Global Step: 813850 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:39:19,413-Speed 2497.26 samples/sec Loss 1.0645 LearningRate 0.000000 Epoch: 39 Global Step: 813860 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:39:27,617-Speed 2496.81 samples/sec Loss 1.0629 LearningRate 0.000000 Epoch: 39 Global Step: 813870 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:39:35,827-Speed 2495.07 samples/sec Loss 1.0511 LearningRate 0.000000 Epoch: 39 Global Step: 813880 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:39:44,035-Speed 2495.28 samples/sec Loss 1.0507 LearningRate 0.000000 Epoch: 39 Global Step: 813890 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:39:52,245-Speed 2496.21 samples/sec Loss 1.0594 LearningRate 0.000000 Epoch: 39 Global Step: 813900 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:40:00,395-Speed 2513.19 samples/sec Loss 1.0428 LearningRate 0.000000 Epoch: 39 Global Step: 813910 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:40:08,600-Speed 2496.47 samples/sec Loss 1.0742 LearningRate 0.000000 Epoch: 39 Global Step: 813920 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:40:16,808-Speed 2495.55 samples/sec Loss 1.0558 LearningRate 0.000000 Epoch: 39 Global Step: 813930 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:40:25,012-Speed 2497.58 samples/sec Loss 1.0491 LearningRate 0.000000 Epoch: 39 Global Step: 813940 Fp16 Grad Scale: 8192 Required: 4 hours Training: 
2022-07-13 08:40:33,222-Speed 2495.07 samples/sec Loss 1.0397 LearningRate 0.000000 Epoch: 39 Global Step: 813950 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:40:41,424-Speed 2496.97 samples/sec Loss 1.0753 LearningRate 0.000000 Epoch: 39 Global Step: 813960 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:40:49,582-Speed 2510.98 samples/sec Loss 1.0580 LearningRate 0.000000 Epoch: 39 Global Step: 813970 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:40:57,786-Speed 2496.64 samples/sec Loss 1.0661 LearningRate 0.000000 Epoch: 39 Global Step: 813980 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:41:05,995-Speed 2495.30 samples/sec Loss 1.0846 LearningRate 0.000000 Epoch: 39 Global Step: 813990 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:41:14,199-Speed 2496.71 samples/sec Loss 1.0545 LearningRate 0.000000 Epoch: 39 Global Step: 814000 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:41:22,402-Speed 2496.95 samples/sec Loss 1.0473 LearningRate 0.000000 Epoch: 39 Global Step: 814010 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:41:30,631-Speed 2489.24 samples/sec Loss 1.0610 LearningRate 0.000000 Epoch: 39 Global Step: 814020 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:41:38,781-Speed 2513.18 samples/sec Loss 1.0719 LearningRate 0.000000 Epoch: 39 Global Step: 814030 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:41:46,985-Speed 2496.83 samples/sec Loss 1.0279 LearningRate 0.000000 Epoch: 39 Global Step: 814040 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:41:55,190-Speed 2496.79 samples/sec Loss 1.0539 LearningRate 0.000000 Epoch: 39 Global Step: 814050 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:42:03,395-Speed 2496.47 samples/sec Loss 1.0703 LearningRate 0.000000 Epoch: 39 Global Step: 814060 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:42:11,600-Speed 
2496.23 samples/sec Loss 1.0600 LearningRate 0.000000 Epoch: 39 Global Step: 814070 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:42:19,809-Speed 2495.24 samples/sec Loss 1.0702 LearningRate 0.000000 Epoch: 39 Global Step: 814080 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:42:27,968-Speed 2510.48 samples/sec Loss 1.0722 LearningRate 0.000000 Epoch: 39 Global Step: 814090 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:42:36,189-Speed 2491.73 samples/sec Loss 1.0571 LearningRate 0.000000 Epoch: 39 Global Step: 814100 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:42:44,398-Speed 2495.31 samples/sec Loss 1.0519 LearningRate 0.000000 Epoch: 39 Global Step: 814110 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:42:52,606-Speed 2495.49 samples/sec Loss 1.0707 LearningRate 0.000000 Epoch: 39 Global Step: 814120 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:43:00,822-Speed 2493.27 samples/sec Loss 1.0668 LearningRate 0.000000 Epoch: 39 Global Step: 814130 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:43:09,031-Speed 2495.39 samples/sec Loss 1.0824 LearningRate 0.000000 Epoch: 39 Global Step: 814140 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:43:17,196-Speed 2512.03 samples/sec Loss 1.0631 LearningRate 0.000000 Epoch: 39 Global Step: 814150 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:43:25,396-Speed 2497.91 samples/sec Loss 1.0259 LearningRate 0.000000 Epoch: 39 Global Step: 814160 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:43:33,601-Speed 2496.43 samples/sec Loss 1.0664 LearningRate 0.000000 Epoch: 39 Global Step: 814170 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:43:41,810-Speed 2495.32 samples/sec Loss 1.0568 LearningRate 0.000000 Epoch: 39 Global Step: 814180 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:43:50,018-Speed 2495.61 samples/sec Loss 1.0763 
LearningRate 0.000000 Epoch: 39 Global Step: 814190 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:43:58,222-Speed 2496.69 samples/sec Loss 1.0695 LearningRate 0.000000 Epoch: 39 Global Step: 814200 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:44:06,376-Speed 2511.94 samples/sec Loss 1.0556 LearningRate 0.000000 Epoch: 39 Global Step: 814210 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:44:14,584-Speed 2495.34 samples/sec Loss 1.0470 LearningRate 0.000000 Epoch: 39 Global Step: 814220 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:44:22,791-Speed 2495.78 samples/sec Loss 1.0520 LearningRate 0.000000 Epoch: 39 Global Step: 814230 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:44:31,010-Speed 2492.62 samples/sec Loss 1.0433 LearningRate 0.000000 Epoch: 39 Global Step: 814240 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:44:39,220-Speed 2495.07 samples/sec Loss 1.0716 LearningRate 0.000000 Epoch: 39 Global Step: 814250 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:44:47,426-Speed 2495.86 samples/sec Loss 1.0704 LearningRate 0.000000 Epoch: 39 Global Step: 814260 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-07-13 08:44:55,581-Speed 2511.78 samples/sec Loss 1.0697 LearningRate 0.000000 Epoch: 39 Global Step: 814270 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:45:03,804-Speed 2491.16 samples/sec Loss 1.0599 LearningRate 0.000000 Epoch: 39 Global Step: 814280 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:45:12,017-Speed 2493.69 samples/sec Loss 1.0661 LearningRate 0.000000 Epoch: 39 Global Step: 814290 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:45:20,223-Speed 2496.08 samples/sec Loss 1.0729 LearningRate 0.000000 Epoch: 39 Global Step: 814300 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:45:28,429-Speed 2496.20 samples/sec Loss 1.0754 LearningRate 0.000000 Epoch: 39 
Global Step: 814310 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:45:36,638-Speed 2495.30 samples/sec Loss 1.0702 LearningRate 0.000000 Epoch: 39 Global Step: 814320 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:45:44,794-Speed 2511.30 samples/sec Loss 1.0236 LearningRate 0.000000 Epoch: 39 Global Step: 814330 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:45:52,999-Speed 2496.65 samples/sec Loss 1.0634 LearningRate 0.000000 Epoch: 39 Global Step: 814340 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:46:01,203-Speed 2496.66 samples/sec Loss 1.0281 LearningRate 0.000000 Epoch: 39 Global Step: 814350 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:46:09,414-Speed 2495.15 samples/sec Loss 1.0736 LearningRate 0.000000 Epoch: 39 Global Step: 814360 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:46:17,653-Speed 2486.15 samples/sec Loss 1.0627 LearningRate 0.000000 Epoch: 39 Global Step: 814370 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:46:25,858-Speed 2496.24 samples/sec Loss 1.0814 LearningRate 0.000000 Epoch: 39 Global Step: 814380 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:46:34,008-Speed 2513.40 samples/sec Loss 1.0654 LearningRate 0.000000 Epoch: 39 Global Step: 814390 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:46:42,214-Speed 2495.93 samples/sec Loss 1.0424 LearningRate 0.000000 Epoch: 39 Global Step: 814400 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:46:50,418-Speed 2496.77 samples/sec Loss 1.0757 LearningRate 0.000000 Epoch: 39 Global Step: 814410 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:46:58,622-Speed 2496.48 samples/sec Loss 1.0784 LearningRate 0.000000 Epoch: 39 Global Step: 814420 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:47:06,828-Speed 2496.37 samples/sec Loss 1.0798 LearningRate 0.000000 Epoch: 39 Global Step: 814430 Fp16 Grad 
Scale: 8192 Required: 3 hours Training: 2022-07-13 08:47:15,035-Speed 2495.80 samples/sec Loss 1.0510 LearningRate 0.000000 Epoch: 39 Global Step: 814440 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:47:23,187-Speed 2512.53 samples/sec Loss 1.0577 LearningRate 0.000000 Epoch: 39 Global Step: 814450 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:47:31,392-Speed 2496.30 samples/sec Loss 1.0467 LearningRate 0.000000 Epoch: 39 Global Step: 814460 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:47:39,600-Speed 2495.65 samples/sec Loss 1.0499 LearningRate 0.000000 Epoch: 39 Global Step: 814470 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:47:47,804-Speed 2496.97 samples/sec Loss 1.0642 LearningRate 0.000000 Epoch: 39 Global Step: 814480 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:47:56,015-Speed 2494.37 samples/sec Loss 1.0592 LearningRate 0.000000 Epoch: 39 Global Step: 814490 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:48:04,228-Speed 2494.04 samples/sec Loss 1.0734 LearningRate 0.000000 Epoch: 39 Global Step: 814500 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:48:12,382-Speed 2512.22 samples/sec Loss 1.0437 LearningRate 0.000000 Epoch: 39 Global Step: 814510 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:48:20,595-Speed 2493.69 samples/sec Loss 1.0645 LearningRate 0.000000 Epoch: 39 Global Step: 814520 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:48:28,800-Speed 2496.31 samples/sec Loss 1.0440 LearningRate 0.000000 Epoch: 39 Global Step: 814530 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:48:37,005-Speed 2496.63 samples/sec Loss 1.0958 LearningRate 0.000000 Epoch: 39 Global Step: 814540 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:48:45,216-Speed 2494.45 samples/sec Loss 1.0853 LearningRate 0.000000 Epoch: 39 Global Step: 814550 Fp16 Grad Scale: 8192 Required: 3 hours 
Training: 2022-07-13 08:48:53,420-Speed 2496.80 samples/sec Loss 1.0561 LearningRate 0.000000 Epoch: 39 Global Step: 814560 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:49:01,574-Speed 2512.04 samples/sec Loss 1.0591 LearningRate 0.000000 Epoch: 39 Global Step: 814570 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:49:09,778-Speed 2496.91 samples/sec Loss 1.0571 LearningRate 0.000000 Epoch: 39 Global Step: 814580 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:49:17,980-Speed 2497.09 samples/sec Loss 1.0677 LearningRate 0.000000 Epoch: 39 Global Step: 814590 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:49:26,184-Speed 2496.63 samples/sec Loss 1.0737 LearningRate 0.000000 Epoch: 39 Global Step: 814600 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:49:34,388-Speed 2497.05 samples/sec Loss 1.0869 LearningRate 0.000000 Epoch: 39 Global Step: 814610 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:49:42,590-Speed 2497.44 samples/sec Loss 1.0596 LearningRate 0.000000 Epoch: 39 Global Step: 814620 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:49:50,743-Speed 2513.16 samples/sec Loss 1.0707 LearningRate 0.000000 Epoch: 39 Global Step: 814630 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:49:58,942-Speed 2498.15 samples/sec Loss 1.0571 LearningRate 0.000000 Epoch: 39 Global Step: 814640 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:50:07,153-Speed 2494.44 samples/sec Loss 1.0368 LearningRate 0.000000 Epoch: 39 Global Step: 814650 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:50:15,368-Speed 2493.62 samples/sec Loss 1.0763 LearningRate 0.000000 Epoch: 39 Global Step: 814660 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:50:23,577-Speed 2495.20 samples/sec Loss 1.0916 LearningRate 0.000000 Epoch: 39 Global Step: 814670 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 
08:50:31,782-Speed 2496.35 samples/sec Loss 1.0684 LearningRate 0.000000 Epoch: 39 Global Step: 814680 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:50:39,930-Speed 2513.96 samples/sec Loss 1.0821 LearningRate 0.000000 Epoch: 39 Global Step: 814690 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:50:48,144-Speed 2493.63 samples/sec Loss 1.0440 LearningRate 0.000000 Epoch: 39 Global Step: 814700 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:50:56,346-Speed 2497.44 samples/sec Loss 1.0574 LearningRate 0.000000 Epoch: 39 Global Step: 814710 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:51:04,547-Speed 2497.59 samples/sec Loss 1.0625 LearningRate 0.000000 Epoch: 39 Global Step: 814720 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:51:12,756-Speed 2495.34 samples/sec Loss 1.0317 LearningRate 0.000000 Epoch: 39 Global Step: 814730 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:51:20,960-Speed 2496.75 samples/sec Loss 1.0363 LearningRate 0.000000 Epoch: 39 Global Step: 814740 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:51:29,109-Speed 2513.61 samples/sec Loss 1.0825 LearningRate 0.000000 Epoch: 39 Global Step: 814750 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:51:37,311-Speed 2497.45 samples/sec Loss 1.0571 LearningRate 0.000000 Epoch: 39 Global Step: 814760 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:51:45,516-Speed 2496.44 samples/sec Loss 1.0797 LearningRate 0.000000 Epoch: 39 Global Step: 814770 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 08:51:53,718-Speed 2497.43 samples/sec Loss 1.0551 LearningRate 0.000000 Epoch: 39 Global Step: 814780 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:52:01,924-Speed 2495.99 samples/sec Loss 1.0562 LearningRate 0.000000 Epoch: 39 Global Step: 814790 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:52:10,130-Speed 2496.71 
samples/sec Loss 1.0571 LearningRate 0.000000 Epoch: 39 Global Step: 814800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:52:18,282-Speed 2512.72 samples/sec Loss 1.0588 LearningRate 0.000000 Epoch: 39 Global Step: 814810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:52:26,488-Speed 2496.45 samples/sec Loss 1.0774 LearningRate 0.000000 Epoch: 39 Global Step: 814820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:52:34,698-Speed 2494.75 samples/sec Loss 1.0826 LearningRate 0.000000 Epoch: 39 Global Step: 814830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:52:42,922-Speed 2490.83 samples/sec Loss 1.0617 LearningRate 0.000000 Epoch: 39 Global Step: 814840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:52:51,126-Speed 2496.94 samples/sec Loss 1.0760 LearningRate 0.000000 Epoch: 39 Global Step: 814850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:52:59,332-Speed 2496.00 samples/sec Loss 1.0531 LearningRate 0.000000 Epoch: 39 Global Step: 814860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:53:07,483-Speed 2512.78 samples/sec Loss 1.0871 LearningRate 0.000000 Epoch: 39 Global Step: 814870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:53:15,702-Speed 2492.33 samples/sec Loss 1.0584 LearningRate 0.000000 Epoch: 39 Global Step: 814880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:53:23,918-Speed 2493.01 samples/sec Loss 1.0696 LearningRate 0.000000 Epoch: 39 Global Step: 814890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:53:32,127-Speed 2495.23 samples/sec Loss 1.0675 LearningRate 0.000000 Epoch: 39 Global Step: 814900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:53:40,333-Speed 2496.27 samples/sec Loss 1.0731 LearningRate 0.000000 Epoch: 39 Global Step: 814910 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:53:48,540-Speed 2495.63 samples/sec Loss 
1.0716 LearningRate 0.000000 Epoch: 39 Global Step: 814920 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:53:56,692-Speed 2512.54 samples/sec Loss 1.0542 LearningRate 0.000000 Epoch: 39 Global Step: 814930 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:54:04,898-Speed 2496.38 samples/sec Loss 1.0570 LearningRate 0.000000 Epoch: 39 Global Step: 814940 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:54:13,110-Speed 2494.31 samples/sec Loss 1.0507 LearningRate 0.000000 Epoch: 39 Global Step: 814950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:54:21,316-Speed 2496.28 samples/sec Loss 1.0715 LearningRate 0.000000 Epoch: 39 Global Step: 814960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:54:29,520-Speed 2496.77 samples/sec Loss 1.0597 LearningRate 0.000000 Epoch: 39 Global Step: 814970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:54:37,722-Speed 2497.32 samples/sec Loss 1.0747 LearningRate 0.000000 Epoch: 39 Global Step: 814980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:54:45,876-Speed 2512.16 samples/sec Loss 1.0617 LearningRate 0.000000 Epoch: 39 Global Step: 814990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:54:54,081-Speed 2496.50 samples/sec Loss 1.0611 LearningRate 0.000000 Epoch: 39 Global Step: 815000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:55:02,287-Speed 2496.51 samples/sec Loss 1.0627 LearningRate 0.000000 Epoch: 39 Global Step: 815010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:55:10,493-Speed 2496.08 samples/sec Loss 1.0448 LearningRate 0.000000 Epoch: 39 Global Step: 815020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:55:18,701-Speed 2495.76 samples/sec Loss 1.0694 LearningRate 0.000000 Epoch: 39 Global Step: 815030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:55:26,905-Speed 2496.41 samples/sec Loss 1.0692 LearningRate 
0.000000 Epoch: 39 Global Step: 815040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:55:35,058-Speed 2512.51 samples/sec Loss 1.0217 LearningRate 0.000000 Epoch: 39 Global Step: 815050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:55:43,261-Speed 2497.04 samples/sec Loss 1.0815 LearningRate 0.000000 Epoch: 39 Global Step: 815060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:55:51,484-Speed 2491.18 samples/sec Loss 1.0552 LearningRate 0.000000 Epoch: 39 Global Step: 815070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:55:59,688-Speed 2496.71 samples/sec Loss 1.0741 LearningRate 0.000000 Epoch: 39 Global Step: 815080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:56:07,889-Speed 2497.41 samples/sec Loss 1.0580 LearningRate 0.000000 Epoch: 39 Global Step: 815090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:56:16,096-Speed 2495.92 samples/sec Loss 1.0789 LearningRate 0.000000 Epoch: 39 Global Step: 815100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:56:24,247-Speed 2513.09 samples/sec Loss 1.0229 LearningRate 0.000000 Epoch: 39 Global Step: 815110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:56:32,458-Speed 2494.80 samples/sec Loss 1.0851 LearningRate 0.000000 Epoch: 39 Global Step: 815120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:56:40,664-Speed 2496.11 samples/sec Loss 1.0806 LearningRate 0.000000 Epoch: 39 Global Step: 815130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:56:48,875-Speed 2494.32 samples/sec Loss 1.0435 LearningRate 0.000000 Epoch: 39 Global Step: 815140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:56:57,082-Speed 2495.72 samples/sec Loss 1.0906 LearningRate 0.000000 Epoch: 39 Global Step: 815150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:57:05,299-Speed 2493.07 samples/sec Loss 1.0392 LearningRate 0.000000 Epoch: 39 
Global Step: 815160 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:57:13,450-Speed 2512.86 samples/sec Loss 1.0771 LearningRate 0.000000 Epoch: 39 Global Step: 815170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:57:21,655-Speed 2496.45 samples/sec Loss 1.0322 LearningRate 0.000000 Epoch: 39 Global Step: 815180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:57:29,870-Speed 2493.37 samples/sec Loss 1.0731 LearningRate 0.000000 Epoch: 39 Global Step: 815190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:57:38,075-Speed 2496.35 samples/sec Loss 1.0500 LearningRate 0.000000 Epoch: 39 Global Step: 815200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:57:46,279-Speed 2496.81 samples/sec Loss 1.0547 LearningRate 0.000000 Epoch: 39 Global Step: 815210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:57:54,491-Speed 2494.12 samples/sec Loss 1.0319 LearningRate 0.000000 Epoch: 39 Global Step: 815220 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:58:02,642-Speed 2513.39 samples/sec Loss 1.0473 LearningRate 0.000000 Epoch: 39 Global Step: 815230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:58:10,843-Speed 2497.62 samples/sec Loss 1.0794 LearningRate 0.000000 Epoch: 39 Global Step: 815240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:58:19,050-Speed 2495.79 samples/sec Loss 1.0738 LearningRate 0.000000 Epoch: 39 Global Step: 815250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:58:27,261-Speed 2494.80 samples/sec Loss 1.0774 LearningRate 0.000000 Epoch: 39 Global Step: 815260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:58:35,477-Speed 2493.03 samples/sec Loss 1.0347 LearningRate 0.000000 Epoch: 39 Global Step: 815270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:58:43,690-Speed 2494.03 samples/sec Loss 1.0453 LearningRate 0.000000 Epoch: 39 Global Step: 815280 
Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:58:51,845-Speed 2511.82 samples/sec Loss 1.0464 LearningRate 0.000000 Epoch: 39 Global Step: 815290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:59:00,057-Speed 2494.07 samples/sec Loss 1.0559 LearningRate 0.000000 Epoch: 39 Global Step: 815300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:59:08,265-Speed 2495.48 samples/sec Loss 1.0431 LearningRate 0.000000 Epoch: 39 Global Step: 815310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:59:16,485-Speed 2492.21 samples/sec Loss 1.0725 LearningRate 0.000000 Epoch: 39 Global Step: 815320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:59:24,702-Speed 2492.51 samples/sec Loss 1.0734 LearningRate 0.000000 Epoch: 39 Global Step: 815330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:59:32,908-Speed 2496.14 samples/sec Loss 1.0599 LearningRate 0.000000 Epoch: 39 Global Step: 815340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:59:41,069-Speed 2510.28 samples/sec Loss 1.0581 LearningRate 0.000000 Epoch: 39 Global Step: 815350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:59:49,280-Speed 2494.36 samples/sec Loss 1.0834 LearningRate 0.000000 Epoch: 39 Global Step: 815360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 08:59:57,492-Speed 2494.32 samples/sec Loss 1.0588 LearningRate 0.000000 Epoch: 39 Global Step: 815370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 09:00:05,724-Speed 2488.22 samples/sec Loss 1.0374 LearningRate 0.000000 Epoch: 39 Global Step: 815380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 09:00:13,932-Speed 2495.56 samples/sec Loss 1.0386 LearningRate 0.000000 Epoch: 39 Global Step: 815390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 09:00:22,151-Speed 2492.03 samples/sec Loss 1.0376 LearningRate 0.000000 Epoch: 39 Global Step: 815400 Fp16 Grad Scale: 
16384 Required: 3 hours Training: 2022-07-13 09:00:30,306-Speed 2511.85 samples/sec Loss 1.0443 LearningRate 0.000000 Epoch: 39 Global Step: 815410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 09:00:38,514-Speed 2495.50 samples/sec Loss 1.0517 LearningRate 0.000000 Epoch: 39 Global Step: 815420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-07-13 09:00:46,679-Speed 2508.73 samples/sec Loss 1.0377 LearningRate 0.000000 Epoch: 39 Global Step: 815430 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:00:54,886-Speed 2495.72 samples/sec Loss 1.0272 LearningRate 0.000000 Epoch: 39 Global Step: 815440 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:01:03,095-Speed 2495.13 samples/sec Loss 1.0722 LearningRate 0.000000 Epoch: 39 Global Step: 815450 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:01:11,301-Speed 2496.23 samples/sec Loss 1.0638 LearningRate 0.000000 Epoch: 39 Global Step: 815460 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:01:19,458-Speed 2511.18 samples/sec Loss 1.0636 LearningRate 0.000000 Epoch: 39 Global Step: 815470 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:01:27,667-Speed 2495.35 samples/sec Loss 1.0634 LearningRate 0.000000 Epoch: 39 Global Step: 815480 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:01:35,883-Speed 2493.05 samples/sec Loss 1.0570 LearningRate 0.000000 Epoch: 39 Global Step: 815490 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:01:44,087-Speed 2496.56 samples/sec Loss 1.0489 LearningRate 0.000000 Epoch: 39 Global Step: 815500 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:01:52,294-Speed 2495.78 samples/sec Loss 1.0686 LearningRate 0.000000 Epoch: 39 Global Step: 815510 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:02:00,506-Speed 2494.49 samples/sec Loss 1.0873 LearningRate 0.000000 Epoch: 39 Global Step: 815520 Fp16 Grad Scale: 8192 Required: 3 hours 
Training: 2022-07-13 09:02:08,666-Speed 2510.18 samples/sec Loss 1.0695 LearningRate 0.000000 Epoch: 39 Global Step: 815530 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:02:16,869-Speed 2497.11 samples/sec Loss 1.0450 LearningRate 0.000000 Epoch: 39 Global Step: 815540 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:02:25,068-Speed 2498.03 samples/sec Loss 1.0589 LearningRate 0.000000 Epoch: 39 Global Step: 815550 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:02:33,269-Speed 2497.82 samples/sec Loss 1.0251 LearningRate 0.000000 Epoch: 39 Global Step: 815560 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:02:41,485-Speed 2492.75 samples/sec Loss 1.0332 LearningRate 0.000000 Epoch: 39 Global Step: 815570 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:02:49,690-Speed 2496.44 samples/sec Loss 1.0772 LearningRate 0.000000 Epoch: 39 Global Step: 815580 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:02:57,850-Speed 2511.48 samples/sec Loss 1.0644 LearningRate 0.000000 Epoch: 39 Global Step: 815590 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:03:06,056-Speed 2496.01 samples/sec Loss 1.0721 LearningRate 0.000000 Epoch: 39 Global Step: 815600 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:03:14,259-Speed 2496.95 samples/sec Loss 1.0617 LearningRate 0.000000 Epoch: 39 Global Step: 815610 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:03:22,462-Speed 2497.08 samples/sec Loss 1.0795 LearningRate 0.000000 Epoch: 39 Global Step: 815620 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:03:30,664-Speed 2497.26 samples/sec Loss 1.0367 LearningRate 0.000000 Epoch: 39 Global Step: 815630 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:03:38,869-Speed 2496.46 samples/sec Loss 1.0412 LearningRate 0.000000 Epoch: 39 Global Step: 815640 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 
09:03:47,019-Speed 2513.64 samples/sec Loss 1.0633 LearningRate 0.000000 Epoch: 39 Global Step: 815650 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:03:55,228-Speed 2495.38 samples/sec Loss 1.0719 LearningRate 0.000000 Epoch: 39 Global Step: 815660 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:04:03,438-Speed 2494.89 samples/sec Loss 1.0689 LearningRate 0.000000 Epoch: 39 Global Step: 815670 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:04:11,650-Speed 2494.38 samples/sec Loss 1.0584 LearningRate 0.000000 Epoch: 39 Global Step: 815680 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:04:19,857-Speed 2495.86 samples/sec Loss 1.0653 LearningRate 0.000000 Epoch: 39 Global Step: 815690 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:04:28,070-Speed 2494.12 samples/sec Loss 1.0594 LearningRate 0.000000 Epoch: 39 Global Step: 815700 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:04:36,223-Speed 2512.22 samples/sec Loss 1.0643 LearningRate 0.000000 Epoch: 39 Global Step: 815710 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:04:44,428-Speed 2496.59 samples/sec Loss 1.0551 LearningRate 0.000000 Epoch: 39 Global Step: 815720 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:04:52,633-Speed 2496.59 samples/sec Loss 1.0557 LearningRate 0.000000 Epoch: 39 Global Step: 815730 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:05:00,835-Speed 2497.36 samples/sec Loss 1.0715 LearningRate 0.000000 Epoch: 39 Global Step: 815740 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:05:09,042-Speed 2495.70 samples/sec Loss 1.0510 LearningRate 0.000000 Epoch: 39 Global Step: 815750 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:05:17,244-Speed 2497.34 samples/sec Loss 1.0625 LearningRate 0.000000 Epoch: 39 Global Step: 815760 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:05:25,397-Speed 2512.45 
samples/sec Loss 1.0622 LearningRate 0.000000 Epoch: 39 Global Step: 815770 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:05:33,599-Speed 2497.32 samples/sec Loss 1.0454 LearningRate 0.000000 Epoch: 39 Global Step: 815780 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:05:41,804-Speed 2496.46 samples/sec Loss 1.0714 LearningRate 0.000000 Epoch: 39 Global Step: 815790 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:05:50,020-Speed 2493.28 samples/sec Loss 1.0552 LearningRate 0.000000 Epoch: 39 Global Step: 815800 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:05:58,228-Speed 2495.41 samples/sec Loss 1.0237 LearningRate 0.000000 Epoch: 39 Global Step: 815810 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:06:06,432-Speed 2496.87 samples/sec Loss 1.0166 LearningRate 0.000000 Epoch: 39 Global Step: 815820 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:06:14,584-Speed 2512.48 samples/sec Loss 1.0664 LearningRate 0.000000 Epoch: 39 Global Step: 815830 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:06:22,792-Speed 2495.65 samples/sec Loss 1.0503 LearningRate 0.000000 Epoch: 39 Global Step: 815840 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:06:30,996-Speed 2496.67 samples/sec Loss 1.0664 LearningRate 0.000000 Epoch: 39 Global Step: 815850 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:06:39,202-Speed 2496.06 samples/sec Loss 1.0654 LearningRate 0.000000 Epoch: 39 Global Step: 815860 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:06:47,416-Speed 2493.85 samples/sec Loss 1.0558 LearningRate 0.000000 Epoch: 39 Global Step: 815870 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:06:55,637-Speed 2491.70 samples/sec Loss 1.0386 LearningRate 0.000000 Epoch: 39 Global Step: 815880 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:07:03,787-Speed 2513.24 samples/sec Loss 1.0402 
LearningRate 0.000000 Epoch: 39 Global Step: 815890 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:07:11,991-Speed 2496.82 samples/sec Loss 1.0473 LearningRate 0.000000 Epoch: 39 Global Step: 815900 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:07:20,195-Speed 2496.44 samples/sec Loss 1.0804 LearningRate 0.000000 Epoch: 39 Global Step: 815910 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:07:28,400-Speed 2496.98 samples/sec Loss 1.0856 LearningRate 0.000000 Epoch: 39 Global Step: 815920 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:07:36,604-Speed 2496.49 samples/sec Loss 1.0506 LearningRate 0.000000 Epoch: 39 Global Step: 815930 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:07:44,810-Speed 2496.16 samples/sec Loss 1.0535 LearningRate 0.000000 Epoch: 39 Global Step: 815940 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:07:52,974-Speed 2508.96 samples/sec Loss 1.0754 LearningRate 0.000000 Epoch: 39 Global Step: 815950 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:08:01,190-Speed 2493.10 samples/sec Loss 1.0474 LearningRate 0.000000 Epoch: 39 Global Step: 815960 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:08:09,395-Speed 2496.94 samples/sec Loss 1.0789 LearningRate 0.000000 Epoch: 39 Global Step: 815970 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:08:17,604-Speed 2495.03 samples/sec Loss 1.0662 LearningRate 0.000000 Epoch: 39 Global Step: 815980 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:08:25,822-Speed 2492.61 samples/sec Loss 1.0613 LearningRate 0.000000 Epoch: 39 Global Step: 815990 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:08:34,023-Speed 2497.46 samples/sec Loss 1.0500 LearningRate 0.000000 Epoch: 39 Global Step: 816000 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:08:42,181-Speed 2511.09 samples/sec Loss 1.0510 LearningRate 0.000000 Epoch: 39 
Global Step: 816010 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:08:50,390-Speed 2495.55 samples/sec Loss 1.0846 LearningRate 0.000000 Epoch: 39 Global Step: 816020 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:08:58,598-Speed 2495.43 samples/sec Loss 1.0357 LearningRate 0.000000 Epoch: 39 Global Step: 816030 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:09:06,803-Speed 2496.30 samples/sec Loss 1.0822 LearningRate 0.000000 Epoch: 39 Global Step: 816040 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:09:15,018-Speed 2493.56 samples/sec Loss 1.0369 LearningRate 0.000000 Epoch: 39 Global Step: 816050 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:09:23,223-Speed 2496.28 samples/sec Loss 1.0413 LearningRate 0.000000 Epoch: 39 Global Step: 816060 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:09:31,376-Speed 2512.43 samples/sec Loss 1.0513 LearningRate 0.000000 Epoch: 39 Global Step: 816070 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:09:39,594-Speed 2492.29 samples/sec Loss 1.0482 LearningRate 0.000000 Epoch: 39 Global Step: 816080 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:09:47,806-Speed 2494.22 samples/sec Loss 1.0884 LearningRate 0.000000 Epoch: 39 Global Step: 816090 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:09:56,010-Speed 2496.81 samples/sec Loss 1.0668 LearningRate 0.000000 Epoch: 39 Global Step: 816100 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:10:04,218-Speed 2495.47 samples/sec Loss 1.0866 LearningRate 0.000000 Epoch: 39 Global Step: 816110 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:10:12,422-Speed 2496.71 samples/sec Loss 1.0623 LearningRate 0.000000 Epoch: 39 Global Step: 816120 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:10:20,572-Speed 2513.17 samples/sec Loss 1.0632 LearningRate 0.000000 Epoch: 39 Global Step: 816130 Fp16 Grad 
Scale: 8192 Required: 3 hours Training: 2022-07-13 09:10:28,788-Speed 2493.15 samples/sec Loss 1.0437 LearningRate 0.000000 Epoch: 39 Global Step: 816140 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:10:37,004-Speed 2493.31 samples/sec Loss 1.0741 LearningRate 0.000000 Epoch: 39 Global Step: 816150 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:10:45,225-Speed 2491.92 samples/sec Loss 1.0330 LearningRate 0.000000 Epoch: 39 Global Step: 816160 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:10:53,432-Speed 2495.95 samples/sec Loss 1.0745 LearningRate 0.000000 Epoch: 39 Global Step: 816170 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:11:01,645-Speed 2493.92 samples/sec Loss 1.0471 LearningRate 0.000000 Epoch: 39 Global Step: 816180 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:11:09,800-Speed 2512.02 samples/sec Loss 1.0501 LearningRate 0.000000 Epoch: 39 Global Step: 816190 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:11:18,016-Speed 2492.95 samples/sec Loss 1.0769 LearningRate 0.000000 Epoch: 39 Global Step: 816200 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:11:26,221-Speed 2496.42 samples/sec Loss 1.0850 LearningRate 0.000000 Epoch: 39 Global Step: 816210 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:11:34,434-Speed 2493.91 samples/sec Loss 1.0396 LearningRate 0.000000 Epoch: 39 Global Step: 816220 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:11:42,637-Speed 2497.16 samples/sec Loss 1.0602 LearningRate 0.000000 Epoch: 39 Global Step: 816230 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:11:50,839-Speed 2497.45 samples/sec Loss 1.0714 LearningRate 0.000000 Epoch: 39 Global Step: 816240 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:11:58,990-Speed 2512.84 samples/sec Loss 1.0589 LearningRate 0.000000 Epoch: 39 Global Step: 816250 Fp16 Grad Scale: 8192 Required: 3 hours 
Training: 2022-07-13 09:12:07,196-Speed 2496.44 samples/sec Loss 1.0331 LearningRate 0.000000 Epoch: 39 Global Step: 816260 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:12:15,403-Speed 2495.99 samples/sec Loss 1.0589 LearningRate 0.000000 Epoch: 39 Global Step: 816270 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:12:23,620-Speed 2492.77 samples/sec Loss 1.0456 LearningRate 0.000000 Epoch: 39 Global Step: 816280 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:12:31,822-Speed 2497.03 samples/sec Loss 1.0707 LearningRate 0.000000 Epoch: 39 Global Step: 816290 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:12:40,036-Speed 2493.80 samples/sec Loss 1.0706 LearningRate 0.000000 Epoch: 39 Global Step: 816300 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:12:48,209-Speed 2506.39 samples/sec Loss 1.0582 LearningRate 0.000000 Epoch: 39 Global Step: 816310 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:12:56,419-Speed 2494.84 samples/sec Loss 1.0564 LearningRate 0.000000 Epoch: 39 Global Step: 816320 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:13:04,661-Speed 2485.60 samples/sec Loss 1.0895 LearningRate 0.000000 Epoch: 39 Global Step: 816330 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:13:12,866-Speed 2496.39 samples/sec Loss 1.0562 LearningRate 0.000000 Epoch: 39 Global Step: 816340 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:13:21,067-Speed 2497.72 samples/sec Loss 1.0498 LearningRate 0.000000 Epoch: 39 Global Step: 816350 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:13:29,276-Speed 2495.11 samples/sec Loss 1.0419 LearningRate 0.000000 Epoch: 39 Global Step: 816360 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 09:13:37,425-Speed 2513.58 samples/sec Loss 1.0526 LearningRate 0.000000 Epoch: 39 Global Step: 816370 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-07-13 
09:13:45,630-Speed 2496.24 samples/sec Loss 1.0605 LearningRate 0.000000 Epoch: 39 Global Step: 816380 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:13:53,836-Speed 2496.23 samples/sec Loss 1.0377 LearningRate 0.000000 Epoch: 39 Global Step: 816390 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:14:02,045-Speed 2495.23 samples/sec Loss 1.0815 LearningRate 0.000000 Epoch: 39 Global Step: 816400 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:14:10,248-Speed 2497.08 samples/sec Loss 1.0821 LearningRate 0.000000 Epoch: 39 Global Step: 816410 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:14:18,454-Speed 2496.15 samples/sec Loss 1.0754 LearningRate 0.000000 Epoch: 39 Global Step: 816420 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:14:26,603-Speed 2513.80 samples/sec Loss 1.0786 LearningRate 0.000000 Epoch: 39 Global Step: 816430 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:14:34,808-Speed 2496.24 samples/sec Loss 1.0893 LearningRate 0.000000 Epoch: 39 Global Step: 816440 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:14:43,017-Speed 2495.92 samples/sec Loss 1.0768 LearningRate 0.000000 Epoch: 39 Global Step: 816450 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:14:51,220-Speed 2496.77 samples/sec Loss 1.0400 LearningRate 0.000000 Epoch: 39 Global Step: 816460 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:14:59,434-Speed 2493.81 samples/sec Loss 1.0676 LearningRate 0.000000 Epoch: 39 Global Step: 816470 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:15:07,655-Speed 2491.58 samples/sec Loss 1.0792 LearningRate 0.000000 Epoch: 39 Global Step: 816480 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:15:15,810-Speed 2511.78 samples/sec Loss 1.0639 LearningRate 0.000000 Epoch: 39 Global Step: 816490 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:15:24,018-Speed 2495.43 samples/sec Loss 1.0829 LearningRate 0.000000 Epoch: 39 Global Step: 816500 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:15:32,223-Speed 2496.90 samples/sec Loss 1.0668 LearningRate 0.000000 Epoch: 39 Global Step: 816510 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:15:40,434-Speed 2494.61 samples/sec Loss 1.0823 LearningRate 0.000000 Epoch: 39 Global Step: 816520 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:15:48,636-Speed 2497.63 samples/sec Loss 1.0815 LearningRate 0.000000 Epoch: 39 Global Step: 816530 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:15:56,843-Speed 2495.85 samples/sec Loss 1.0528 LearningRate 0.000000 Epoch: 39 Global Step: 816540 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:16:04,996-Speed 2512.23 samples/sec Loss 1.0331 LearningRate 0.000000 Epoch: 39 Global Step: 816550 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:16:13,200-Speed 2496.72 samples/sec Loss 1.0504 LearningRate 0.000000 Epoch: 39 Global Step: 816560 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:16:21,405-Speed 2496.62 samples/sec Loss 1.0782 LearningRate 0.000000 Epoch: 39 Global Step: 816570 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:16:29,622-Speed 2492.70 samples/sec Loss 1.0515 LearningRate 0.000000 Epoch: 39 Global Step: 816580 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:16:37,828-Speed 2496.19 samples/sec Loss 1.0531 LearningRate 0.000000 Epoch: 39 Global Step: 816590 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:16:46,038-Speed 2494.89 samples/sec Loss 1.0666 LearningRate 0.000000 Epoch: 39 Global Step: 816600 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:16:54,189-Speed 2512.86 samples/sec Loss 1.0616 LearningRate 0.000000 Epoch: 39 Global Step: 816610 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:17:02,393-Speed 2496.87 samples/sec Loss 1.0651 LearningRate 0.000000 Epoch: 39 Global Step: 816620 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:17:10,598-Speed 2496.45 samples/sec Loss 1.0452 LearningRate 0.000000 Epoch: 39 Global Step: 816630 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:17:18,801-Speed 2496.88 samples/sec Loss 1.0580 LearningRate 0.000000 Epoch: 39 Global Step: 816640 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:17:27,008-Speed 2495.65 samples/sec Loss 1.0789 LearningRate 0.000000 Epoch: 39 Global Step: 816650 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:17:35,213-Speed 2496.89 samples/sec Loss 1.0639 LearningRate 0.000000 Epoch: 39 Global Step: 816660 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:17:43,366-Speed 2512.56 samples/sec Loss 1.0676 LearningRate 0.000000 Epoch: 39 Global Step: 816670 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:17:51,569-Speed 2497.05 samples/sec Loss 1.0500 LearningRate 0.000000 Epoch: 39 Global Step: 816680 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:17:59,777-Speed 2495.55 samples/sec Loss 1.0488 LearningRate 0.000000 Epoch: 39 Global Step: 816690 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:18:07,992-Speed 2493.32 samples/sec Loss 1.0290 LearningRate 0.000000 Epoch: 39 Global Step: 816700 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:18:16,203-Speed 2494.68 samples/sec Loss 1.0606 LearningRate 0.000000 Epoch: 39 Global Step: 816710 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:18:24,406-Speed 2497.07 samples/sec Loss 1.0527 LearningRate 0.000000 Epoch: 39 Global Step: 816720 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:18:32,567-Speed 2509.85 samples/sec Loss 1.0377 LearningRate 0.000000 Epoch: 39 Global Step: 816730 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:18:40,772-Speed 2496.49 samples/sec Loss 1.0443 LearningRate 0.000000 Epoch: 39 Global Step: 816740 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:18:48,980-Speed 2495.28 samples/sec Loss 1.0262 LearningRate 0.000000 Epoch: 39 Global Step: 816750 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:18:57,188-Speed 2495.59 samples/sec Loss 1.0531 LearningRate 0.000000 Epoch: 39 Global Step: 816760 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:19:05,405-Speed 2492.91 samples/sec Loss 1.0480 LearningRate 0.000000 Epoch: 39 Global Step: 816770 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:19:13,609-Speed 2496.52 samples/sec Loss 1.0528 LearningRate 0.000000 Epoch: 39 Global Step: 816780 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:19:21,759-Speed 2513.35 samples/sec Loss 1.0654 LearningRate 0.000000 Epoch: 39 Global Step: 816790 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:19:29,963-Speed 2496.66 samples/sec Loss 1.0932 LearningRate 0.000000 Epoch: 39 Global Step: 816800 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:19:38,172-Speed 2495.53 samples/sec Loss 1.0496 LearningRate 0.000000 Epoch: 39 Global Step: 816810 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:19:46,372-Speed 2497.85 samples/sec Loss 1.0425 LearningRate 0.000000 Epoch: 39 Global Step: 816820 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:19:54,576-Speed 2496.96 samples/sec Loss 1.0381 LearningRate 0.000000 Epoch: 39 Global Step: 816830 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:20:02,778-Speed 2497.10 samples/sec Loss 1.0554 LearningRate 0.000000 Epoch: 39 Global Step: 816840 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:20:10,930-Speed 2512.86 samples/sec Loss 1.0576 LearningRate 0.000000 Epoch: 39 Global Step: 816850 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:20:19,130-Speed 2497.73 samples/sec Loss 1.0577 LearningRate 0.000000 Epoch: 39 Global Step: 816860 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:20:27,332-Speed 2497.92 samples/sec Loss 1.0855 LearningRate 0.000000 Epoch: 39 Global Step: 816870 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:20:35,536-Speed 2496.70 samples/sec Loss 1.0576 LearningRate 0.000000 Epoch: 39 Global Step: 816880 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:20:43,745-Speed 2495.26 samples/sec Loss 1.0866 LearningRate 0.000000 Epoch: 39 Global Step: 816890 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:20:51,949-Speed 2496.88 samples/sec Loss 1.0440 LearningRate 0.000000 Epoch: 39 Global Step: 816900 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:21:00,104-Speed 2512.18 samples/sec Loss 1.0342 LearningRate 0.000000 Epoch: 39 Global Step: 816910 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:21:08,309-Speed 2496.55 samples/sec Loss 1.0415 LearningRate 0.000000 Epoch: 39 Global Step: 816920 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:21:16,515-Speed 2495.95 samples/sec Loss 1.0810 LearningRate 0.000000 Epoch: 39 Global Step: 816930 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:21:24,721-Speed 2496.39 samples/sec Loss 1.0666 LearningRate 0.000000 Epoch: 39 Global Step: 816940 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:21:32,931-Speed 2494.89 samples/sec Loss 1.0363 LearningRate 0.000000 Epoch: 39 Global Step: 816950 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:21:41,092-Speed 2510.25 samples/sec Loss 1.0721 LearningRate 0.000000 Epoch: 39 Global Step: 816960 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:21:49,242-Speed 2513.35 samples/sec Loss 1.0759 LearningRate 0.000000 Epoch: 39 Global Step: 816970 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:21:57,444-Speed 2497.38 samples/sec Loss 1.0226 LearningRate 0.000000 Epoch: 39 Global Step: 816980 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:22:05,659-Speed 2493.30 samples/sec Loss 1.0581 LearningRate 0.000000 Epoch: 39 Global Step: 816990 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:22:13,864-Speed 2496.31 samples/sec Loss 1.0907 LearningRate 0.000000 Epoch: 39 Global Step: 817000 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:22:22,076-Speed 2494.45 samples/sec Loss 1.0655 LearningRate 0.000000 Epoch: 39 Global Step: 817010 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:22:30,293-Speed 2492.71 samples/sec Loss 1.0751 LearningRate 0.000000 Epoch: 39 Global Step: 817020 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:22:38,475-Speed 2503.46 samples/sec Loss 1.0877 LearningRate 0.000000 Epoch: 39 Global Step: 817030 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:22:46,692-Speed 2492.85 samples/sec Loss 1.0437 LearningRate 0.000000 Epoch: 39 Global Step: 817040 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:22:54,900-Speed 2495.65 samples/sec Loss 1.0630 LearningRate 0.000000 Epoch: 39 Global Step: 817050 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:23:03,104-Speed 2496.35 samples/sec Loss 1.0590 LearningRate 0.000000 Epoch: 39 Global Step: 817060 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:23:11,316-Speed 2494.56 samples/sec Loss 1.0651 LearningRate 0.000000 Epoch: 39 Global Step: 817070 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:23:19,520-Speed 2496.62 samples/sec Loss 1.0589 LearningRate 0.000000 Epoch: 39 Global Step: 817080 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:23:27,671-Speed 2513.09 samples/sec Loss 1.0891 LearningRate 0.000000 Epoch: 39 Global Step: 817090 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:23:35,875-Speed 2496.46 samples/sec Loss 1.0382 LearningRate 0.000000 Epoch: 39 Global Step: 817100 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:23:44,077-Speed 2497.33 samples/sec Loss 1.0522 LearningRate 0.000000 Epoch: 39 Global Step: 817110 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:23:52,282-Speed 2496.43 samples/sec Loss 1.0672 LearningRate 0.000000 Epoch: 39 Global Step: 817120 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:24:00,485-Speed 2497.18 samples/sec Loss 1.0646 LearningRate 0.000000 Epoch: 39 Global Step: 817130 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:24:08,691-Speed 2496.06 samples/sec Loss 1.0553 LearningRate 0.000000 Epoch: 39 Global Step: 817140 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:24:16,843-Speed 2512.51 samples/sec Loss 1.0417 LearningRate 0.000000 Epoch: 39 Global Step: 817150 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:24:25,051-Speed 2495.64 samples/sec Loss 1.0720 LearningRate 0.000000 Epoch: 39 Global Step: 817160 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:24:33,271-Speed 2492.01 samples/sec Loss 1.0928 LearningRate 0.000000 Epoch: 39 Global Step: 817170 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:24:41,479-Speed 2495.73 samples/sec Loss 1.0565 LearningRate 0.000000 Epoch: 39 Global Step: 817180 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:24:49,686-Speed 2495.95 samples/sec Loss 1.0610 LearningRate 0.000000 Epoch: 39 Global Step: 817190 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:24:57,892-Speed 2496.27 samples/sec Loss 1.0587 LearningRate 0.000000 Epoch: 39 Global Step: 817200 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:25:06,043-Speed 2512.96 samples/sec Loss 1.0999 LearningRate 0.000000 Epoch: 39 Global Step: 817210 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:25:14,250-Speed 2495.60 samples/sec Loss 1.0630 LearningRate 0.000000 Epoch: 39 Global Step: 817220 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:25:22,454-Speed 2497.12 samples/sec Loss 1.0484 LearningRate 0.000000 Epoch: 39 Global Step: 817230 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:25:30,663-Speed 2495.29 samples/sec Loss 1.0662 LearningRate 0.000000 Epoch: 39 Global Step: 817240 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:25:38,872-Speed 2495.02 samples/sec Loss 1.0881 LearningRate 0.000000 Epoch: 39 Global Step: 817250 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:25:47,086-Speed 2493.79 samples/sec Loss 1.0460 LearningRate 0.000000 Epoch: 39 Global Step: 817260 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:25:55,238-Speed 2512.85 samples/sec Loss 1.0480 LearningRate 0.000000 Epoch: 39 Global Step: 817270 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:26:03,442-Speed 2496.49 samples/sec Loss 1.0533 LearningRate 0.000000 Epoch: 39 Global Step: 817280 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:26:11,646-Speed 2496.96 samples/sec Loss 1.0618 LearningRate 0.000000 Epoch: 39 Global Step: 817290 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:26:19,848-Speed 2497.26 samples/sec Loss 1.0663 LearningRate 0.000000 Epoch: 39 Global Step: 817300 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:26:28,049-Speed 2497.59 samples/sec Loss 1.0292 LearningRate 0.000000 Epoch: 39 Global Step: 817310 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:26:36,256-Speed 2495.81 samples/sec Loss 1.0648 LearningRate 0.000000 Epoch: 39 Global Step: 817320 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:26:44,407-Speed 2512.76 samples/sec Loss 1.0501 LearningRate 0.000000 Epoch: 39 Global Step: 817330 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:26:52,614-Speed 2495.86 samples/sec Loss 1.0703 LearningRate 0.000000 Epoch: 39 Global Step: 817340 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:27:00,823-Speed 2495.28 samples/sec Loss 1.0831 LearningRate 0.000000 Epoch: 39 Global Step: 817350 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:27:09,040-Speed 2492.92 samples/sec Loss 1.0794 LearningRate 0.000000 Epoch: 39 Global Step: 817360 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:27:17,242-Speed 2497.19 samples/sec Loss 1.0792 LearningRate 0.000000 Epoch: 39 Global Step: 817370 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:27:25,449-Speed 2495.81 samples/sec Loss 1.0502 LearningRate 0.000000 Epoch: 39 Global Step: 817380 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:27:33,600-Speed 2513.07 samples/sec Loss 1.0696 LearningRate 0.000000 Epoch: 39 Global Step: 817390 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:27:41,800-Speed 2497.95 samples/sec Loss 1.0631 LearningRate 0.000000 Epoch: 39 Global Step: 817400 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:27:50,016-Speed 2493.14 samples/sec Loss 1.0309 LearningRate 0.000000 Epoch: 39 Global Step: 817410 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:27:58,220-Speed 2496.91 samples/sec Loss 1.0780 LearningRate 0.000000 Epoch: 39 Global Step: 817420 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:28:06,465-Speed 2484.25 samples/sec Loss 1.0463 LearningRate 0.000000 Epoch: 39 Global Step: 817430 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:28:14,673-Speed 2495.50 samples/sec Loss 1.0264 LearningRate 0.000000 Epoch: 39 Global Step: 817440 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:28:22,823-Speed 2513.23 samples/sec Loss 1.0781 LearningRate 0.000000 Epoch: 39 Global Step: 817450 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:28:31,029-Speed 2496.17 samples/sec Loss 1.0468 LearningRate 0.000000 Epoch: 39 Global Step: 817460 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:28:39,235-Speed 2496.26 samples/sec Loss 1.0702 LearningRate 0.000000 Epoch: 39 Global Step: 817470 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:28:47,437-Speed 2497.40 samples/sec Loss 1.0615 LearningRate 0.000000 Epoch: 39 Global Step: 817480 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:28:55,648-Speed 2494.61 samples/sec Loss 1.0694 LearningRate 0.000000 Epoch: 39 Global Step: 817490 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:29:03,857-Speed 2495.13 samples/sec Loss 1.0921 LearningRate 0.000000 Epoch: 39 Global Step: 817500 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:29:12,009-Speed 2512.55 samples/sec Loss 1.0584 LearningRate 0.000000 Epoch: 39 Global Step: 817510 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:29:20,215-Speed 2496.02 samples/sec Loss 1.0595 LearningRate 0.000000 Epoch: 39 Global Step: 817520 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:29:28,418-Speed 2497.28 samples/sec Loss 1.0751 LearningRate 0.000000 Epoch: 39 Global Step: 817530 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:29:36,625-Speed 2495.45 samples/sec Loss 1.0619 LearningRate 0.000000 Epoch: 39 Global Step: 817540 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:29:44,833-Speed 2495.61 samples/sec Loss 1.0736 LearningRate 0.000000 Epoch: 39 Global Step: 817550 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:29:53,037-Speed 2496.77 samples/sec Loss 1.0579 LearningRate 0.000000 Epoch: 39 Global Step: 817560 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:30:01,192-Speed 2511.74 samples/sec Loss 1.0331 LearningRate 0.000000 Epoch: 39 Global Step: 817570 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:30:09,405-Speed 2493.90 samples/sec Loss 1.0450 LearningRate 0.000000 Epoch: 39 Global Step: 817580 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:30:17,612-Speed 2496.10 samples/sec Loss 1.0687 LearningRate 0.000000 Epoch: 39 Global Step: 817590 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:30:25,818-Speed 2495.98 samples/sec Loss 1.0811 LearningRate 0.000000 Epoch: 39 Global Step: 817600 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:30:34,026-Speed 2495.57 samples/sec Loss 1.0735 LearningRate 0.000000 Epoch: 39 Global Step: 817610 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:30:42,231-Speed 2496.48 samples/sec Loss 1.0638 LearningRate 0.000000 Epoch: 39 Global Step: 817620 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:30:50,384-Speed 2512.38 samples/sec Loss 1.0663 LearningRate 0.000000 Epoch: 39 Global Step: 817630 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:30:58,589-Speed 2496.01 samples/sec Loss 1.0948 LearningRate 0.000000 Epoch: 39 Global Step: 817640 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:31:06,799-Speed 2494.92 samples/sec Loss 1.0744 LearningRate 0.000000 Epoch: 39 Global Step: 817650 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:31:15,003-Speed 2496.92 samples/sec Loss 1.0557 LearningRate 0.000000 Epoch: 39 Global Step: 817660 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:31:23,209-Speed 2496.29 samples/sec Loss 1.0588 LearningRate 0.000000 Epoch: 39 Global Step: 817670 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:31:31,416-Speed 2495.73 samples/sec Loss 1.0939 LearningRate 0.000000 Epoch: 39 Global Step: 817680 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:31:39,573-Speed 2511.24 samples/sec Loss 1.0511 LearningRate 0.000000 Epoch: 39 Global Step: 817690 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:31:47,783-Speed 2494.82 samples/sec Loss 1.0763 LearningRate 0.000000 Epoch: 39 Global Step: 817700 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:31:55,986-Speed 2496.97 samples/sec Loss 1.0431 LearningRate 0.000000 Epoch: 39 Global Step: 817710 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:32:04,191-Speed 2496.35 samples/sec Loss 1.0493 LearningRate 0.000000 Epoch: 39 Global Step: 817720 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:32:12,395-Speed 2497.00 samples/sec Loss 1.0405 LearningRate 0.000000 Epoch: 39 Global Step: 817730 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:32:20,600-Speed 2496.39 samples/sec Loss 1.0427 LearningRate 0.000000 Epoch: 39 Global Step: 817740 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:32:28,756-Speed 2511.41 samples/sec Loss 1.0454 LearningRate 0.000000 Epoch: 39 Global Step: 817750 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:32:36,973-Speed 2493.08 samples/sec Loss 1.0561 LearningRate 0.000000 Epoch: 39 Global Step: 817760 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:32:45,174-Speed 2497.63 samples/sec Loss 1.0588 LearningRate 0.000000 Epoch: 39 Global Step: 817770 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:32:53,378-Speed 2496.91 samples/sec Loss 1.0612 LearningRate 0.000000 Epoch: 39 Global Step: 817780 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:33:01,577-Speed 2498.25 samples/sec Loss 1.0287 LearningRate 0.000000 Epoch: 39 Global Step: 817790 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:33:09,780-Speed 2497.03 samples/sec Loss 1.0740 LearningRate 0.000000 Epoch: 39 Global Step: 817800 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:33:17,931-Speed 2512.78 samples/sec Loss 1.0596 LearningRate 0.000000 Epoch: 39 Global Step: 817810 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:33:26,139-Speed 2495.73 samples/sec Loss 1.0707 LearningRate 0.000000 Epoch: 39 Global Step: 817820 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:33:34,346-Speed 2495.72 samples/sec Loss 1.0627 LearningRate 0.000000 Epoch: 39 Global Step: 817830 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:33:42,549-Speed 2497.16 samples/sec Loss 1.0561 LearningRate 0.000000 Epoch: 39 Global Step: 817840 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:33:50,750-Speed 2497.41 samples/sec Loss 1.0533 LearningRate 0.000000 Epoch: 39 Global Step: 817850 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:33:58,959-Speed 2495.35 samples/sec Loss 1.0663 LearningRate 0.000000 Epoch: 39 Global Step: 817860 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:34:07,110-Speed 2512.90 samples/sec Loss 1.0588 LearningRate 0.000000 Epoch: 39 Global Step: 817870 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:34:15,316-Speed 2496.26 samples/sec Loss 1.0505 LearningRate 0.000000 Epoch: 39 Global Step: 817880 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:34:23,521-Speed 2496.72 samples/sec Loss 1.0563 LearningRate 0.000000 Epoch: 39 Global Step: 817890 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:34:31,728-Speed 2495.88 samples/sec Loss 1.0852 LearningRate 0.000000 Epoch: 39 Global Step: 817900 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:34:39,946-Speed 2492.39 samples/sec Loss 1.0373 LearningRate 0.000000 Epoch: 39 Global Step: 817910 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:34:48,157-Speed 2494.69 samples/sec Loss 1.0599 LearningRate 0.000000 Epoch: 39 Global Step: 817920 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:34:56,309-Speed 2512.80 samples/sec Loss 1.0712 LearningRate 0.000000 Epoch: 39 Global Step: 817930 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:35:04,513-Speed 2496.58 samples/sec Loss 1.0556 LearningRate 0.000000 Epoch: 39 Global Step: 817940 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:35:12,724-Speed 2494.44 samples/sec Loss 1.0519 LearningRate 0.000000 Epoch: 39 Global Step: 817950 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:35:20,932-Speed 2495.61 samples/sec Loss 1.0697 LearningRate 0.000000 Epoch: 39 Global Step: 817960 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:35:29,148-Speed 2493.45 samples/sec Loss 1.0642 LearningRate 0.000000 Epoch: 39 Global Step: 817970 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:35:37,353-Speed 2496.26 samples/sec Loss 1.0449 LearningRate 0.000000 Epoch: 39 Global Step: 817980 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:35:45,507-Speed 2512.13 samples/sec Loss 1.0564 LearningRate 0.000000 Epoch: 39 Global Step: 817990 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:35:53,722-Speed 2493.56 samples/sec Loss 1.0665 LearningRate 0.000000 Epoch: 39 Global Step: 818000 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:36:01,928-Speed 2496.39 samples/sec Loss 1.0557 LearningRate 0.000000 Epoch: 39 Global Step: 818010 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:36:10,139-Speed 2494.40 samples/sec Loss 1.0580 LearningRate 0.000000 Epoch: 39 Global Step: 818020 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:36:18,343-Speed 2496.71 samples/sec Loss 1.0405 LearningRate 0.000000 Epoch: 39 Global Step: 818030 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:36:26,550-Speed 2495.91 samples/sec Loss 1.0568 LearningRate 0.000000 Epoch: 39 Global Step: 818040 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:36:34,703-Speed 2512.56 samples/sec Loss 1.0633 LearningRate 0.000000 Epoch: 39 Global Step: 818050 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:36:42,909-Speed 2495.96 samples/sec Loss 1.0600 LearningRate 0.000000 Epoch: 39 Global Step: 818060 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:36:51,116-Speed 2495.86 samples/sec Loss 1.0904 LearningRate 0.000000 Epoch: 39 Global Step: 818070 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:36:59,326-Speed 2494.99 samples/sec Loss 1.0519 LearningRate 0.000000 Epoch: 39 Global Step: 818080 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:37:07,556-Speed 2489.03 samples/sec Loss 1.0417 LearningRate 0.000000 Epoch: 39 Global Step: 818090 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:37:15,764-Speed 2495.73 samples/sec Loss 1.0640 LearningRate 0.000000 Epoch: 39 Global Step: 818100 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:37:23,915-Speed 2512.81 samples/sec Loss 1.0253 LearningRate 0.000000 Epoch: 39 Global Step: 818110 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:37:32,126-Speed 2494.80 samples/sec Loss 1.0634 LearningRate 0.000000 Epoch: 39 Global Step: 818120 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:37:40,336-Speed 2494.95 samples/sec Loss 1.0836 LearningRate 0.000000 Epoch: 39 Global Step: 818130 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:37:48,565-Speed 2489.34 samples/sec Loss 1.0685 LearningRate 0.000000 Epoch: 39 Global Step: 818140 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:37:56,772-Speed 2495.96 samples/sec Loss 1.0722 LearningRate 0.000000 Epoch: 39 Global Step: 818150 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:38:04,982-Speed 2494.92 samples/sec Loss 1.0560 LearningRate 0.000000 Epoch: 39 Global Step: 818160 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:38:13,140-Speed 2510.74 samples/sec Loss 1.0497 LearningRate 0.000000 Epoch: 39 Global Step: 818170 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:38:21,355-Speed 2493.27 samples/sec Loss 1.0859 LearningRate 0.000000 Epoch: 39 Global Step: 818180 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:38:29,561-Speed 2496.12 samples/sec Loss 1.0561 LearningRate 0.000000 Epoch: 39 Global Step: 818190 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:38:37,769-Speed 2495.66 samples/sec Loss 1.0474 LearningRate 0.000000 Epoch: 39 Global Step: 818200 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:38:45,976-Speed 2495.62 samples/sec Loss 1.0615 LearningRate 0.000000 Epoch: 39 Global Step: 818210 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:38:54,185-Speed 2495.20 samples/sec Loss 1.0247 LearningRate 0.000000 Epoch: 39 Global Step: 818220 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:39:02,348-Speed 2509.70 samples/sec Loss 1.0444 LearningRate 0.000000 Epoch: 39 Global Step: 818230 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:39:10,562-Speed 2493.43 samples/sec Loss 1.0803 LearningRate 0.000000 Epoch: 39 Global Step: 818240 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:39:18,774-Speed 2494.75 samples/sec Loss 1.1065 LearningRate 0.000000 Epoch: 39 Global Step: 818250 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:39:26,978-Speed 2496.67 samples/sec Loss 1.0522 LearningRate 0.000000 Epoch: 39 Global Step: 818260 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:39:35,181-Speed 2496.97 samples/sec Loss 1.0630 LearningRate 0.000000 Epoch: 39 Global Step: 818270 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:39:43,383-Speed 2497.32 samples/sec Loss 1.0511 LearningRate 0.000000 Epoch: 39 Global Step: 818280 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:39:51,531-Speed 2513.77 samples/sec Loss 1.0508 LearningRate 0.000000 Epoch: 39 Global Step: 818290 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:39:59,734-Speed 2497.07 samples/sec Loss 1.0437 LearningRate 0.000000 Epoch: 39 Global Step: 818300 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:40:07,939-Speed 2496.31 samples/sec Loss 1.0680 LearningRate 0.000000 Epoch: 39 Global Step: 818310 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:40:16,144-Speed 2496.65 samples/sec Loss 1.0577 LearningRate 0.000000 Epoch: 39 Global Step: 818320 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:40:24,345-Speed 2497.48 samples/sec Loss 1.0447 LearningRate 0.000000 Epoch: 39 Global Step: 818330 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:40:32,554-Speed 2495.45 samples/sec Loss 1.0438 LearningRate 0.000000 Epoch: 39 Global Step: 818340 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:40:40,711-Speed 2510.99 samples/sec Loss 1.0382 LearningRate 0.000000 Epoch: 39 Global Step: 818350 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:40:48,926-Speed 2493.48 samples/sec Loss 1.0596 LearningRate 0.000000 Epoch: 39 Global Step: 818360 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:40:57,131-Speed 2496.50 samples/sec Loss 1.0617 LearningRate 0.000000 Epoch: 39 Global Step: 818370 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:41:05,338-Speed 2495.85 samples/sec Loss 1.0642 LearningRate 0.000000 Epoch: 39 Global Step: 818380 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:41:13,544-Speed 2496.28 samples/sec Loss 1.0836 LearningRate 0.000000 Epoch: 39 Global Step: 818390 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:41:21,751-Speed 2495.66 samples/sec Loss 1.0900 LearningRate 0.000000 Epoch: 39 Global Step: 818400 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:41:29,904-Speed 2512.32 samples/sec Loss 1.0596 LearningRate 0.000000 Epoch: 39 Global Step: 818410 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:41:38,113-Speed 2495.56 samples/sec Loss 1.0663 LearningRate 0.000000 Epoch: 39 Global Step: 818420 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:41:46,326-Speed 2494.07 samples/sec Loss 1.0551 LearningRate 0.000000 Epoch: 39 Global Step: 818430 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:41:54,536-Speed 2494.91 samples/sec Loss 1.0677 LearningRate 0.000000 Epoch: 39 Global Step: 818440 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:42:02,744-Speed 2495.37 samples/sec Loss 1.0620 LearningRate 0.000000 Epoch: 39 Global Step: 818450 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:42:10,953-Speed 2495.30 samples/sec Loss 1.0545 LearningRate 0.000000 Epoch: 39 Global Step: 818460 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:42:19,120-Speed 2507.97 samples/sec Loss 1.0789 LearningRate 0.000000 Epoch: 39 Global Step: 818470 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:42:27,328-Speed 2495.69 samples/sec Loss 1.0425 LearningRate 0.000000 Epoch: 39 Global Step: 818480 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:42:35,536-Speed 2495.42 samples/sec Loss 1.0663 LearningRate 0.000000 Epoch: 39 Global Step: 818490 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:42:43,744-Speed 2495.66 samples/sec Loss 1.0741 LearningRate 0.000000 Epoch: 39 Global Step: 818500 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:42:51,949-Speed 2496.24 samples/sec Loss 1.0757 LearningRate 0.000000 Epoch: 39 Global Step: 818510 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:43:00,158-Speed 2495.13 samples/sec Loss 1.0408 LearningRate 0.000000 Epoch: 39 Global Step: 818520 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:43:08,309-Speed 2512.95 samples/sec Loss 1.0676 LearningRate 0.000000 Epoch: 39 Global Step: 818530 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:43:16,519-Speed 2495.71 samples/sec Loss 1.0528 LearningRate 0.000000 Epoch: 39 Global Step: 818540 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:43:24,721-Speed 2497.49 samples/sec Loss 1.0911 LearningRate 0.000000 Epoch: 39 Global Step: 818550 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:43:32,928-Speed 2495.67 samples/sec Loss 1.0359 LearningRate 0.000000 Epoch: 39 Global Step: 818560 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:43:41,134-Speed 2495.98 samples/sec Loss 1.0547 LearningRate 0.000000 Epoch: 39 Global Step: 818570 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:43:49,345-Speed 2495.02 samples/sec Loss 1.0359 LearningRate 0.000000 Epoch: 39 Global Step: 818580 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:43:57,498-Speed 2512.45 samples/sec Loss 1.0636 LearningRate 0.000000 Epoch: 39 Global Step: 818590 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:44:05,706-Speed 2495.36 samples/sec Loss 1.0440 LearningRate 0.000000 Epoch: 39 Global Step: 818600 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:44:13,910-Speed 2496.58 samples/sec Loss 1.0623 LearningRate 0.000000 Epoch: 39 Global Step: 818610 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:44:22,118-Speed 2496.02 samples/sec Loss 1.0757 LearningRate 0.000000 Epoch: 39 Global Step: 818620 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-07-13 09:44:30,283-Speed 2508.49 samples/sec Loss 1.0840 LearningRate 0.000000 Epoch: 39 Global Step: 818630 Fp16 Grad Scale: 8192 Required: 3 hours
Training: 2022-07-13 09:44:38,487-Speed 2496.83 samples/sec Loss 1.0686 LearningRate 0.000000 Epoch: 39 Global Step: 818640 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:44:46,650-Speed 2509.14 samples/sec Loss 1.0826 LearningRate 0.000000 Epoch: 39 Global Step: 818650 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:44:54,854-Speed 2496.78 samples/sec Loss 1.0477 LearningRate 0.000000 Epoch: 39 Global Step: 818660 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:45:03,059-Speed 2496.50 samples/sec Loss 1.0556 LearningRate 0.000000 Epoch: 39 Global Step: 818670 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:45:11,263-Speed 2496.74 samples/sec Loss 1.0504 LearningRate 0.000000 Epoch: 39 Global Step: 818680 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:45:19,475-Speed 2494.35 samples/sec Loss 1.0798 LearningRate 0.000000 Epoch: 39 Global Step: 818690 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:45:27,682-Speed 2495.79 samples/sec Loss 1.0674 LearningRate 0.000000 Epoch: 39 Global Step: 818700 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:45:35,844-Speed 2509.50 samples/sec Loss 1.0586 LearningRate 0.000000 Epoch: 39 Global Step: 818710 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:45:44,050-Speed 2496.33 samples/sec Loss 1.0672 LearningRate 0.000000 Epoch: 39 Global Step: 818720 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:45:52,257-Speed 2495.63 samples/sec Loss 1.0846 LearningRate 0.000000 Epoch: 39 Global Step: 818730 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:46:00,464-Speed 2496.12 samples/sec Loss 1.0770 LearningRate 0.000000 Epoch: 39 Global Step: 818740 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:46:08,672-Speed 2495.18 samples/sec Loss 1.0611 LearningRate 0.000000 Epoch: 39 Global Step: 818750 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:46:16,885-Speed 2493.99 samples/sec Loss 1.0370 LearningRate 0.000000 Epoch: 39 Global Step: 818760 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:46:25,044-Speed 2510.72 samples/sec Loss 1.0590 LearningRate 0.000000 Epoch: 39 Global Step: 818770 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:46:33,256-Speed 2494.29 samples/sec Loss 1.0687 LearningRate 0.000000 Epoch: 39 Global Step: 818780 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:46:41,460-Speed 2496.61 samples/sec Loss 1.0686 LearningRate 0.000000 Epoch: 39 Global Step: 818790 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:46:49,664-Speed 2496.65 samples/sec Loss 1.0587 LearningRate 0.000000 Epoch: 39 Global Step: 818800 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:46:57,874-Speed 2495.04 samples/sec Loss 1.0777 LearningRate 0.000000 Epoch: 39 Global Step: 818810 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:47:06,077-Speed 2497.08 samples/sec Loss 1.0726 LearningRate 0.000000 Epoch: 39 Global Step: 818820 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:47:14,229-Speed 2512.54 samples/sec Loss 1.0750 LearningRate 0.000000 Epoch: 39 Global Step: 818830 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:47:22,433-Speed 2496.77 samples/sec Loss 1.0638 LearningRate 0.000000 Epoch: 39 Global Step: 818840 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:47:30,639-Speed 2496.03 samples/sec Loss 1.0776 LearningRate 0.000000 Epoch: 39 Global Step: 818850 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:47:38,841-Speed 2497.48 samples/sec Loss 1.0443 LearningRate 0.000000 Epoch: 39 Global Step: 818860 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:47:47,053-Speed 2494.33 samples/sec Loss 1.0573 LearningRate 0.000000 Epoch: 39 Global Step: 818870 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:47:55,265-Speed 2494.37 samples/sec Loss 1.0705 LearningRate 0.000000 Epoch: 39 Global Step: 818880 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:48:03,420-Speed 2511.63 samples/sec Loss 1.0527 LearningRate 0.000000 Epoch: 39 Global Step: 818890 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:48:11,623-Speed 2497.05 samples/sec Loss 1.0787 LearningRate 0.000000 Epoch: 39 Global Step: 818900 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:48:19,829-Speed 2496.28 samples/sec Loss 1.0695 LearningRate 0.000000 Epoch: 39 Global Step: 818910 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:48:28,034-Speed 2496.33 samples/sec Loss 1.0615 LearningRate 0.000000 Epoch: 39 Global Step: 818920 Fp16 Grad Scale: 8192 Required: 2 hours
Training: 2022-07-13 09:48:36,237-Speed
2497.28 samples/sec Loss 1.0924 LearningRate 0.000000 Epoch: 39 Global Step: 818930 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:48:44,445-Speed 2495.63 samples/sec Loss 1.0743 LearningRate 0.000000 Epoch: 39 Global Step: 818940 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:48:52,595-Speed 2513.17 samples/sec Loss 1.0662 LearningRate 0.000000 Epoch: 39 Global Step: 818950 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:49:00,810-Speed 2493.37 samples/sec Loss 1.0523 LearningRate 0.000000 Epoch: 39 Global Step: 818960 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:49:09,016-Speed 2495.89 samples/sec Loss 1.0608 LearningRate 0.000000 Epoch: 39 Global Step: 818970 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:49:17,225-Speed 2495.57 samples/sec Loss 1.0498 LearningRate 0.000000 Epoch: 39 Global Step: 818980 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:49:25,433-Speed 2495.19 samples/sec Loss 1.0803 LearningRate 0.000000 Epoch: 39 Global Step: 818990 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:49:33,641-Speed 2495.41 samples/sec Loss 1.0768 LearningRate 0.000000 Epoch: 39 Global Step: 819000 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:49:41,801-Speed 2510.39 samples/sec Loss 1.0367 LearningRate 0.000000 Epoch: 39 Global Step: 819010 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:49:50,005-Speed 2496.91 samples/sec Loss 1.0519 LearningRate 0.000000 Epoch: 39 Global Step: 819020 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:49:58,209-Speed 2496.65 samples/sec Loss 1.0732 LearningRate 0.000000 Epoch: 39 Global Step: 819030 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:50:06,416-Speed 2495.79 samples/sec Loss 1.0749 LearningRate 0.000000 Epoch: 39 Global Step: 819040 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:50:14,622-Speed 2496.27 samples/sec Loss 1.0780 
LearningRate 0.000000 Epoch: 39 Global Step: 819050 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:50:22,833-Speed 2494.44 samples/sec Loss 1.0433 LearningRate 0.000000 Epoch: 39 Global Step: 819060 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:50:30,982-Speed 2513.40 samples/sec Loss 1.0483 LearningRate 0.000000 Epoch: 39 Global Step: 819070 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:50:39,188-Speed 2496.29 samples/sec Loss 1.0694 LearningRate 0.000000 Epoch: 39 Global Step: 819080 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:50:47,395-Speed 2495.88 samples/sec Loss 1.0713 LearningRate 0.000000 Epoch: 39 Global Step: 819090 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:50:55,607-Speed 2494.06 samples/sec Loss 1.0785 LearningRate 0.000000 Epoch: 39 Global Step: 819100 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:51:03,813-Speed 2496.41 samples/sec Loss 1.0854 LearningRate 0.000000 Epoch: 39 Global Step: 819110 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:51:12,036-Speed 2490.80 samples/sec Loss 1.0531 LearningRate 0.000000 Epoch: 39 Global Step: 819120 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:51:20,187-Speed 2513.17 samples/sec Loss 1.0961 LearningRate 0.000000 Epoch: 39 Global Step: 819130 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:51:28,399-Speed 2494.27 samples/sec Loss 1.0566 LearningRate 0.000000 Epoch: 39 Global Step: 819140 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:51:36,612-Speed 2493.89 samples/sec Loss 1.0623 LearningRate 0.000000 Epoch: 39 Global Step: 819150 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:51:44,820-Speed 2495.57 samples/sec Loss 1.0384 LearningRate 0.000000 Epoch: 39 Global Step: 819160 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:51:53,031-Speed 2494.77 samples/sec Loss 1.0615 LearningRate 0.000000 Epoch: 39 
Global Step: 819170 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:52:01,238-Speed 2495.62 samples/sec Loss 1.0669 LearningRate 0.000000 Epoch: 39 Global Step: 819180 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:52:09,394-Speed 2511.58 samples/sec Loss 1.0607 LearningRate 0.000000 Epoch: 39 Global Step: 819190 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:52:17,599-Speed 2496.27 samples/sec Loss 1.0838 LearningRate 0.000000 Epoch: 39 Global Step: 819200 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:52:25,804-Speed 2497.06 samples/sec Loss 1.0609 LearningRate 0.000000 Epoch: 39 Global Step: 819210 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:52:34,008-Speed 2496.69 samples/sec Loss 1.0719 LearningRate 0.000000 Epoch: 39 Global Step: 819220 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:52:42,214-Speed 2495.83 samples/sec Loss 1.0722 LearningRate 0.000000 Epoch: 39 Global Step: 819230 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:52:50,418-Speed 2496.78 samples/sec Loss 1.0273 LearningRate 0.000000 Epoch: 39 Global Step: 819240 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:52:58,569-Speed 2513.09 samples/sec Loss 1.0396 LearningRate 0.000000 Epoch: 39 Global Step: 819250 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:53:06,775-Speed 2496.10 samples/sec Loss 1.0504 LearningRate 0.000000 Epoch: 39 Global Step: 819260 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:53:14,979-Speed 2496.74 samples/sec Loss 1.0554 LearningRate 0.000000 Epoch: 39 Global Step: 819270 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:53:23,182-Speed 2496.86 samples/sec Loss 1.0759 LearningRate 0.000000 Epoch: 39 Global Step: 819280 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:53:31,388-Speed 2496.29 samples/sec Loss 1.0710 LearningRate 0.000000 Epoch: 39 Global Step: 819290 Fp16 Grad 
Scale: 8192 Required: 2 hours Training: 2022-07-13 09:53:39,595-Speed 2495.86 samples/sec Loss 1.0426 LearningRate 0.000000 Epoch: 39 Global Step: 819300 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:53:47,745-Speed 2513.16 samples/sec Loss 1.0282 LearningRate 0.000000 Epoch: 39 Global Step: 819310 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:53:55,951-Speed 2496.43 samples/sec Loss 1.0530 LearningRate 0.000000 Epoch: 39 Global Step: 819320 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:54:04,158-Speed 2495.77 samples/sec Loss 1.0408 LearningRate 0.000000 Epoch: 39 Global Step: 819330 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:54:12,363-Speed 2496.23 samples/sec Loss 1.0783 LearningRate 0.000000 Epoch: 39 Global Step: 819340 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:54:20,571-Speed 2495.68 samples/sec Loss 1.0587 LearningRate 0.000000 Epoch: 39 Global Step: 819350 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:54:28,773-Speed 2497.37 samples/sec Loss 1.0476 LearningRate 0.000000 Epoch: 39 Global Step: 819360 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:54:36,925-Speed 2512.62 samples/sec Loss 1.0373 LearningRate 0.000000 Epoch: 39 Global Step: 819370 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:54:45,131-Speed 2496.11 samples/sec Loss 1.0855 LearningRate 0.000000 Epoch: 39 Global Step: 819380 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:54:53,336-Speed 2496.68 samples/sec Loss 1.0840 LearningRate 0.000000 Epoch: 39 Global Step: 819390 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:55:01,546-Speed 2495.19 samples/sec Loss 1.0540 LearningRate 0.000000 Epoch: 39 Global Step: 819400 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:55:09,753-Speed 2495.97 samples/sec Loss 1.0723 LearningRate 0.000000 Epoch: 39 Global Step: 819410 Fp16 Grad Scale: 8192 Required: 2 hours 
Training: 2022-07-13 09:55:17,968-Speed 2493.31 samples/sec Loss 1.0562 LearningRate 0.000000 Epoch: 39 Global Step: 819420 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:55:26,123-Speed 2512.07 samples/sec Loss 1.0629 LearningRate 0.000000 Epoch: 39 Global Step: 819430 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:55:34,331-Speed 2495.95 samples/sec Loss 1.0715 LearningRate 0.000000 Epoch: 39 Global Step: 819440 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:55:42,538-Speed 2495.76 samples/sec Loss 1.0493 LearningRate 0.000000 Epoch: 39 Global Step: 819450 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:55:50,748-Speed 2495.05 samples/sec Loss 1.0521 LearningRate 0.000000 Epoch: 39 Global Step: 819460 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:55:58,954-Speed 2495.93 samples/sec Loss 1.0688 LearningRate 0.000000 Epoch: 39 Global Step: 819470 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:56:07,170-Speed 2493.33 samples/sec Loss 1.0550 LearningRate 0.000000 Epoch: 39 Global Step: 819480 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:56:15,321-Speed 2513.07 samples/sec Loss 1.0867 LearningRate 0.000000 Epoch: 39 Global Step: 819490 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:56:23,527-Speed 2496.12 samples/sec Loss 1.0676 LearningRate 0.000000 Epoch: 39 Global Step: 819500 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:56:31,734-Speed 2495.85 samples/sec Loss 1.0400 LearningRate 0.000000 Epoch: 39 Global Step: 819510 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:56:39,941-Speed 2495.89 samples/sec Loss 1.0884 LearningRate 0.000000 Epoch: 39 Global Step: 819520 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:56:48,148-Speed 2495.81 samples/sec Loss 1.0584 LearningRate 0.000000 Epoch: 39 Global Step: 819530 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 
09:56:56,369-Speed 2491.67 samples/sec Loss 1.0580 LearningRate 0.000000 Epoch: 39 Global Step: 819540 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:57:04,528-Speed 2510.58 samples/sec Loss 1.0365 LearningRate 0.000000 Epoch: 39 Global Step: 819550 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:57:12,731-Speed 2496.95 samples/sec Loss 1.0809 LearningRate 0.000000 Epoch: 39 Global Step: 819560 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:57:20,936-Speed 2496.31 samples/sec Loss 1.0558 LearningRate 0.000000 Epoch: 39 Global Step: 819570 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:57:29,139-Speed 2497.18 samples/sec Loss 1.0527 LearningRate 0.000000 Epoch: 39 Global Step: 819580 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:57:37,344-Speed 2496.33 samples/sec Loss 1.0847 LearningRate 0.000000 Epoch: 39 Global Step: 819590 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:57:45,548-Speed 2496.86 samples/sec Loss 1.0365 LearningRate 0.000000 Epoch: 39 Global Step: 819600 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:57:53,697-Speed 2513.24 samples/sec Loss 1.0259 LearningRate 0.000000 Epoch: 39 Global Step: 819610 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:58:01,916-Speed 2492.47 samples/sec Loss 1.0639 LearningRate 0.000000 Epoch: 39 Global Step: 819620 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:58:10,121-Speed 2496.57 samples/sec Loss 1.0740 LearningRate 0.000000 Epoch: 39 Global Step: 819630 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:58:18,329-Speed 2495.77 samples/sec Loss 1.0434 LearningRate 0.000000 Epoch: 39 Global Step: 819640 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:58:26,543-Speed 2493.53 samples/sec Loss 1.0694 LearningRate 0.000000 Epoch: 39 Global Step: 819650 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:58:34,750-Speed 2495.76 
samples/sec Loss 1.0769 LearningRate 0.000000 Epoch: 39 Global Step: 819660 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:58:42,903-Speed 2512.58 samples/sec Loss 1.0394 LearningRate 0.000000 Epoch: 39 Global Step: 819670 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:58:51,119-Speed 2492.86 samples/sec Loss 1.0688 LearningRate 0.000000 Epoch: 39 Global Step: 819680 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:58:59,323-Speed 2496.88 samples/sec Loss 1.0590 LearningRate 0.000000 Epoch: 39 Global Step: 819690 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:59:07,533-Speed 2494.73 samples/sec Loss 1.0958 LearningRate 0.000000 Epoch: 39 Global Step: 819700 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:59:15,737-Speed 2496.79 samples/sec Loss 1.0565 LearningRate 0.000000 Epoch: 39 Global Step: 819710 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:59:23,944-Speed 2495.80 samples/sec Loss 1.0549 LearningRate 0.000000 Epoch: 39 Global Step: 819720 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:59:32,096-Speed 2512.52 samples/sec Loss 1.0429 LearningRate 0.000000 Epoch: 39 Global Step: 819730 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:59:40,300-Speed 2496.95 samples/sec Loss 1.0371 LearningRate 0.000000 Epoch: 39 Global Step: 819740 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:59:48,516-Speed 2493.00 samples/sec Loss 1.0582 LearningRate 0.000000 Epoch: 39 Global Step: 819750 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 09:59:56,727-Speed 2494.44 samples/sec Loss 1.0606 LearningRate 0.000000 Epoch: 39 Global Step: 819760 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 10:00:04,934-Speed 2496.09 samples/sec Loss 1.0556 LearningRate 0.000000 Epoch: 39 Global Step: 819770 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 10:00:13,142-Speed 2495.39 samples/sec Loss 1.0379 
LearningRate 0.000000 Epoch: 39 Global Step: 819780 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 10:00:21,293-Speed 2513.19 samples/sec Loss 1.0968 LearningRate 0.000000 Epoch: 39 Global Step: 819790 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 10:00:29,498-Speed 2496.24 samples/sec Loss 1.0788 LearningRate 0.000000 Epoch: 39 Global Step: 819800 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 10:00:37,704-Speed 2496.41 samples/sec Loss 1.0572 LearningRate 0.000000 Epoch: 39 Global Step: 819810 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 10:00:45,913-Speed 2495.12 samples/sec Loss 1.0661 LearningRate 0.000000 Epoch: 39 Global Step: 819820 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-07-13 10:00:54,119-Speed 2496.22 samples/sec Loss 1.0772 LearningRate 0.000000 Epoch: 39 Global Step: 819830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:01:02,324-Speed 2496.27 samples/sec Loss 1.0734 LearningRate 0.000000 Epoch: 39 Global Step: 819840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:01:10,484-Speed 2510.59 samples/sec Loss 1.0488 LearningRate 0.000000 Epoch: 39 Global Step: 819850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:01:18,697-Speed 2494.24 samples/sec Loss 1.0774 LearningRate 0.000000 Epoch: 39 Global Step: 819860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:01:26,907-Speed 2494.64 samples/sec Loss 1.0492 LearningRate 0.000000 Epoch: 39 Global Step: 819870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:01:35,120-Speed 2494.33 samples/sec Loss 1.0365 LearningRate 0.000000 Epoch: 39 Global Step: 819880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:01:43,333-Speed 2494.31 samples/sec Loss 1.0538 LearningRate 0.000000 Epoch: 39 Global Step: 819890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:01:51,545-Speed 2494.19 samples/sec Loss 1.0910 LearningRate 0.000000 
Epoch: 39 Global Step: 819900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:01:59,697-Speed 2512.51 samples/sec Loss 1.0481 LearningRate 0.000000 Epoch: 39 Global Step: 819910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:02:07,908-Speed 2494.63 samples/sec Loss 1.0530 LearningRate 0.000000 Epoch: 39 Global Step: 819920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:02:16,117-Speed 2495.48 samples/sec Loss 1.0370 LearningRate 0.000000 Epoch: 39 Global Step: 819930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:02:24,327-Speed 2494.67 samples/sec Loss 1.0535 LearningRate 0.000000 Epoch: 39 Global Step: 819940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:02:32,546-Speed 2491.97 samples/sec Loss 1.0553 LearningRate 0.000000 Epoch: 39 Global Step: 819950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:02:40,754-Speed 2495.72 samples/sec Loss 1.0523 LearningRate 0.000000 Epoch: 39 Global Step: 819960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:02:48,912-Speed 2511.24 samples/sec Loss 1.0634 LearningRate 0.000000 Epoch: 39 Global Step: 819970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:02:57,120-Speed 2495.29 samples/sec Loss 1.0541 LearningRate 0.000000 Epoch: 39 Global Step: 819980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:03:05,326-Speed 2496.14 samples/sec Loss 1.0611 LearningRate 0.000000 Epoch: 39 Global Step: 819990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:03:13,533-Speed 2496.04 samples/sec Loss 1.0486 LearningRate 0.000000 Epoch: 39 Global Step: 820000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:03:21,748-Speed 2493.21 samples/sec Loss 1.0536 LearningRate 0.000000 Epoch: 39 Global Step: 820010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:03:29,952-Speed 2496.64 samples/sec Loss 1.0432 LearningRate 0.000000 Epoch: 39 Global 
Step: 820020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:03:38,110-Speed 2510.76 samples/sec Loss 1.0810 LearningRate 0.000000 Epoch: 39 Global Step: 820030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:03:46,320-Speed 2494.98 samples/sec Loss 1.0505 LearningRate 0.000000 Epoch: 39 Global Step: 820040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:03:54,531-Speed 2494.80 samples/sec Loss 1.0778 LearningRate 0.000000 Epoch: 39 Global Step: 820050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:04:02,738-Speed 2495.88 samples/sec Loss 1.0976 LearningRate 0.000000 Epoch: 39 Global Step: 820060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:04:10,946-Speed 2495.91 samples/sec Loss 1.0670 LearningRate 0.000000 Epoch: 39 Global Step: 820070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:04:19,149-Speed 2496.89 samples/sec Loss 1.0810 LearningRate 0.000000 Epoch: 39 Global Step: 820080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:04:27,299-Speed 2513.28 samples/sec Loss 1.0502 LearningRate 0.000000 Epoch: 39 Global Step: 820090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:04:35,504-Speed 2496.55 samples/sec Loss 1.0588 LearningRate 0.000000 Epoch: 39 Global Step: 820100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:04:43,708-Speed 2496.73 samples/sec Loss 1.0528 LearningRate 0.000000 Epoch: 39 Global Step: 820110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:04:51,911-Speed 2496.88 samples/sec Loss 1.0499 LearningRate 0.000000 Epoch: 39 Global Step: 820120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:05:00,115-Speed 2496.98 samples/sec Loss 1.0534 LearningRate 0.000000 Epoch: 39 Global Step: 820130 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:05:08,318-Speed 2496.79 samples/sec Loss 1.0711 LearningRate 0.000000 Epoch: 39 Global Step: 820140 Fp16 
Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:05:16,469-Speed 2512.91 samples/sec Loss 1.0564 LearningRate 0.000000 Epoch: 39 Global Step: 820150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:05:24,676-Speed 2495.84 samples/sec Loss 1.0564 LearningRate 0.000000 Epoch: 39 Global Step: 820160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:05:32,880-Speed 2496.66 samples/sec Loss 1.0620 LearningRate 0.000000 Epoch: 39 Global Step: 820170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:05:41,086-Speed 2496.21 samples/sec Loss 1.0541 LearningRate 0.000000 Epoch: 39 Global Step: 820180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:05:49,288-Speed 2497.30 samples/sec Loss 1.0707 LearningRate 0.000000 Epoch: 39 Global Step: 820190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:05:57,497-Speed 2495.20 samples/sec Loss 1.0772 LearningRate 0.000000 Epoch: 39 Global Step: 820200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:06:05,651-Speed 2511.95 samples/sec Loss 1.0621 LearningRate 0.000000 Epoch: 39 Global Step: 820210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:06:13,855-Speed 2496.78 samples/sec Loss 1.0297 LearningRate 0.000000 Epoch: 39 Global Step: 820220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:06:22,066-Speed 2494.78 samples/sec Loss 1.0356 LearningRate 0.000000 Epoch: 39 Global Step: 820230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:06:30,280-Speed 2493.84 samples/sec Loss 1.0778 LearningRate 0.000000 Epoch: 39 Global Step: 820240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:06:38,484-Speed 2496.71 samples/sec Loss 1.0989 LearningRate 0.000000 Epoch: 39 Global Step: 820250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:06:46,696-Speed 2494.11 samples/sec Loss 1.0811 LearningRate 0.000000 Epoch: 39 Global Step: 820260 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-07-13 10:06:54,848-Speed 2512.74 samples/sec Loss 1.0674 LearningRate 0.000000 Epoch: 39 Global Step: 820270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:07:03,055-Speed 2495.82 samples/sec Loss 1.0818 LearningRate 0.000000 Epoch: 39 Global Step: 820280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:07:11,264-Speed 2495.23 samples/sec Loss 1.1010 LearningRate 0.000000 Epoch: 39 Global Step: 820290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:07:19,475-Speed 2494.70 samples/sec Loss 1.0674 LearningRate 0.000000 Epoch: 39 Global Step: 820300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:07:27,678-Speed 2496.98 samples/sec Loss 1.0560 LearningRate 0.000000 Epoch: 39 Global Step: 820310 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:07:35,889-Speed 2494.51 samples/sec Loss 1.0911 LearningRate 0.000000 Epoch: 39 Global Step: 820320 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:07:44,042-Speed 2512.59 samples/sec Loss 1.0931 LearningRate 0.000000 Epoch: 39 Global Step: 820330 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:07:52,246-Speed 2496.73 samples/sec Loss 1.0748 LearningRate 0.000000 Epoch: 39 Global Step: 820340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:08:00,451-Speed 2496.55 samples/sec Loss 1.0349 LearningRate 0.000000 Epoch: 39 Global Step: 820350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:08:08,667-Speed 2492.85 samples/sec Loss 1.0641 LearningRate 0.000000 Epoch: 39 Global Step: 820360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:08:16,875-Speed 2495.47 samples/sec Loss 1.0488 LearningRate 0.000000 Epoch: 39 Global Step: 820370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:08:25,083-Speed 2495.40 samples/sec Loss 1.0397 LearningRate 0.000000 Epoch: 39 Global Step: 820380 Fp16 Grad Scale: 16384 Required: 2 hours 
Training: 2022-07-13 10:08:33,234-Speed 2512.98 samples/sec Loss 1.0774 LearningRate 0.000000 Epoch: 39 Global Step: 820390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:08:41,442-Speed 2495.77 samples/sec Loss 1.0363 LearningRate 0.000000 Epoch: 39 Global Step: 820400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:08:49,648-Speed 2496.23 samples/sec Loss 1.0659 LearningRate 0.000000 Epoch: 39 Global Step: 820410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:08:57,855-Speed 2495.56 samples/sec Loss 1.0579 LearningRate 0.000000 Epoch: 39 Global Step: 820420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:09:06,061-Speed 2496.30 samples/sec Loss 1.0705 LearningRate 0.000000 Epoch: 39 Global Step: 820430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:09:14,265-Speed 2496.49 samples/sec Loss 1.0413 LearningRate 0.000000 Epoch: 39 Global Step: 820440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:09:22,438-Speed 2506.01 samples/sec Loss 1.0411 LearningRate 0.000000 Epoch: 39 Global Step: 820450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:09:30,640-Speed 2497.40 samples/sec Loss 1.0779 LearningRate 0.000000 Epoch: 39 Global Step: 820460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:09:38,848-Speed 2495.83 samples/sec Loss 1.0487 LearningRate 0.000000 Epoch: 39 Global Step: 820470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:09:47,052-Speed 2496.57 samples/sec Loss 1.0768 LearningRate 0.000000 Epoch: 39 Global Step: 820480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:09:55,260-Speed 2495.63 samples/sec Loss 1.0517 LearningRate 0.000000 Epoch: 39 Global Step: 820490 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:10:03,466-Speed 2496.47 samples/sec Loss 1.0555 LearningRate 0.000000 Epoch: 39 Global Step: 820500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 
2022-07-13 10:10:11,617-Speed 2512.72 samples/sec Loss 1.0352 LearningRate 0.000000 Epoch: 39 Global Step: 820510 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:10:19,826-Speed 2495.49 samples/sec Loss 1.0350 LearningRate 0.000000 Epoch: 39 Global Step: 820520 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:10:28,029-Speed 2497.03 samples/sec Loss 1.0365 LearningRate 0.000000 Epoch: 39 Global Step: 820530 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:10:36,241-Speed 2494.38 samples/sec Loss 1.0547 LearningRate 0.000000 Epoch: 39 Global Step: 820540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:10:44,450-Speed 2494.84 samples/sec Loss 1.0433 LearningRate 0.000000 Epoch: 39 Global Step: 820550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:10:52,655-Speed 2496.37 samples/sec Loss 1.0573 LearningRate 0.000000 Epoch: 39 Global Step: 820560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:11:00,810-Speed 2511.87 samples/sec Loss 1.0815 LearningRate 0.000000 Epoch: 39 Global Step: 820570 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:11:09,034-Speed 2490.71 samples/sec Loss 1.0553 LearningRate 0.000000 Epoch: 39 Global Step: 820580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:11:17,238-Speed 2496.71 samples/sec Loss 1.0486 LearningRate 0.000000 Epoch: 39 Global Step: 820590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:11:25,443-Speed 2496.31 samples/sec Loss 1.0393 LearningRate 0.000000 Epoch: 39 Global Step: 820600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:11:33,648-Speed 2496.56 samples/sec Loss 1.0783 LearningRate 0.000000 Epoch: 39 Global Step: 820610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 10:11:41,852-Speed 2496.80 samples/sec Loss 1.0494 LearningRate 0.000000 Epoch: 39 Global Step: 820620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-07-13 
10:11:50,001-Speed 2514.28 samples/sec Loss 1.0331 LearningRate 0.000000 Epoch: 39 Global Step: 820630 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:11:58,208-Speed 2496.52 samples/sec Loss 1.0398 LearningRate 0.000000 Epoch: 39 Global Step: 820640 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:12:06,415-Speed 2495.84 samples/sec Loss 1.0687 LearningRate 0.000000 Epoch: 39 Global Step: 820650 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:12:14,620-Speed 2496.32 samples/sec Loss 1.0682 LearningRate 0.000000 Epoch: 39 Global Step: 820660 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:12:22,826-Speed 2496.10 samples/sec Loss 1.0294 LearningRate 0.000000 Epoch: 39 Global Step: 820670 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:12:31,033-Speed 2495.56 samples/sec Loss 1.0579 LearningRate 0.000000 Epoch: 39 Global Step: 820680 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:12:39,185-Speed 2512.65 samples/sec Loss 1.0291 LearningRate 0.000000 Epoch: 39 Global Step: 820690 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:12:47,391-Speed 2496.16 samples/sec Loss 1.0448 LearningRate 0.000000 Epoch: 39 Global Step: 820700 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:12:55,598-Speed 2495.98 samples/sec Loss 1.0385 LearningRate 0.000000 Epoch: 39 Global Step: 820710 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:13:03,801-Speed 2496.91 samples/sec Loss 1.0770 LearningRate 0.000000 Epoch: 39 Global Step: 820720 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:13:12,007-Speed 2496.05 samples/sec Loss 1.0400 LearningRate 0.000000 Epoch: 39 Global Step: 820730 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:13:20,214-Speed 2495.85 samples/sec Loss 1.0344 LearningRate 0.000000 Epoch: 39 Global Step: 820740 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:13:28,367-Speed 2512.58 samples/sec Loss 1.0660 LearningRate 0.000000 Epoch: 39 Global Step: 820750 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:13:36,574-Speed 2495.86 samples/sec Loss 1.0451 LearningRate 0.000000 Epoch: 39 Global Step: 820760 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:13:44,778-Speed 2496.60 samples/sec Loss 1.0523 LearningRate 0.000000 Epoch: 39 Global Step: 820770 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:13:52,983-Speed 2496.56 samples/sec Loss 1.0399 LearningRate 0.000000 Epoch: 39 Global Step: 820780 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:14:01,191-Speed 2495.12 samples/sec Loss 1.0465 LearningRate 0.000000 Epoch: 39 Global Step: 820790 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:14:09,399-Speed 2495.85 samples/sec Loss 1.0638 LearningRate 0.000000 Epoch: 39 Global Step: 820800 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:14:17,550-Speed 2512.99 samples/sec Loss 1.0584 LearningRate 0.000000 Epoch: 39 Global Step: 820810 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:14:25,754-Speed 2496.87 samples/sec Loss 1.0537 LearningRate 0.000000 Epoch: 39 Global Step: 820820 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:14:33,958-Speed 2496.50 samples/sec Loss 1.0370 LearningRate 0.000000 Epoch: 39 Global Step: 820830 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:14:42,163-Speed 2496.56 samples/sec Loss 1.0541 LearningRate 0.000000 Epoch: 39 Global Step: 820840 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:14:50,367-Speed 2496.82 samples/sec Loss 1.0557 LearningRate 0.000000 Epoch: 39 Global Step: 820850 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:14:58,571-Speed 2496.57 samples/sec Loss 1.0595 LearningRate 0.000000 Epoch: 39 Global Step: 820860 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:15:06,724-Speed 2512.55 samples/sec Loss 1.0051 LearningRate 0.000000 Epoch: 39 Global Step: 820870 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:15:14,928-Speed 2496.64 samples/sec Loss 1.0707 LearningRate 0.000000 Epoch: 39 Global Step: 820880 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:15:23,140-Speed 2494.21 samples/sec Loss 1.0307 LearningRate 0.000000 Epoch: 39 Global Step: 820890 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:15:31,350-Speed 2495.02 samples/sec Loss 1.0499 LearningRate 0.000000 Epoch: 39 Global Step: 820900 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:15:39,555-Speed 2496.47 samples/sec Loss 1.0785 LearningRate 0.000000 Epoch: 39 Global Step: 820910 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:15:47,761-Speed 2496.23 samples/sec Loss 1.0600 LearningRate 0.000000 Epoch: 39 Global Step: 820920 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:15:55,912-Speed 2512.96 samples/sec Loss 1.0473 LearningRate 0.000000 Epoch: 39 Global Step: 820930 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:16:04,118-Speed 2496.21 samples/sec Loss 1.0045 LearningRate 0.000000 Epoch: 39 Global Step: 820940 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:16:12,321-Speed 2496.94 samples/sec Loss 1.0670 LearningRate 0.000000 Epoch: 39 Global Step: 820950 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:16:20,524-Speed 2496.87 samples/sec Loss 1.0368 LearningRate 0.000000 Epoch: 39 Global Step: 820960 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:16:28,736-Speed 2494.43 samples/sec Loss 1.0731 LearningRate 0.000000 Epoch: 39 Global Step: 820970 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:16:36,940-Speed 2496.87 samples/sec Loss 1.0729 LearningRate 0.000000 Epoch: 39 Global Step: 820980 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:16:45,094-Speed 2511.94 samples/sec Loss 1.0557 LearningRate 0.000000 Epoch: 39 Global Step: 820990 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:16:53,301-Speed 2496.01 samples/sec Loss 1.0581 LearningRate 0.000000 Epoch: 39 Global Step: 821000 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:17:01,502-Speed 2497.43 samples/sec Loss 1.0395 LearningRate 0.000000 Epoch: 39 Global Step: 821010 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:17:09,708-Speed 2496.39 samples/sec Loss 1.0839 LearningRate 0.000000 Epoch: 39 Global Step: 821020 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:17:17,919-Speed 2494.64 samples/sec Loss 1.0741 LearningRate 0.000000 Epoch: 39 Global Step: 821030 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:17:26,121-Speed 2497.12 samples/sec Loss 1.0576 LearningRate 0.000000 Epoch: 39 Global Step: 821040 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:17:34,278-Speed 2511.00 samples/sec Loss 1.0516 LearningRate 0.000000 Epoch: 39 Global Step: 821050 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:17:42,481-Speed 2497.01 samples/sec Loss 1.0480 LearningRate 0.000000 Epoch: 39 Global Step: 821060 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:17:50,685-Speed 2496.82 samples/sec Loss 1.0293 LearningRate 0.000000 Epoch: 39 Global Step: 821070 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:17:58,892-Speed 2495.69 samples/sec Loss 1.0740 LearningRate 0.000000 Epoch: 39 Global Step: 821080 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:18:07,100-Speed 2495.66 samples/sec Loss 1.0510 LearningRate 0.000000 Epoch: 39 Global Step: 821090 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:18:15,310-Speed 2495.04 samples/sec Loss 1.0463 LearningRate 0.000000 Epoch: 39 Global Step: 821100 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:18:23,460-Speed 2513.20 samples/sec Loss 1.0499 LearningRate 0.000000 Epoch: 39 Global Step: 821110 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:18:31,662-Speed 2497.05 samples/sec Loss 1.0849 LearningRate 0.000000 Epoch: 39 Global Step: 821120 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:18:39,866-Speed 2496.69 samples/sec Loss 1.0227 LearningRate 0.000000 Epoch: 39 Global Step: 821130 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:18:48,070-Speed 2496.81 samples/sec Loss 1.0291 LearningRate 0.000000 Epoch: 39 Global Step: 821140 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:18:56,275-Speed 2496.50 samples/sec Loss 1.0321 LearningRate 0.000000 Epoch: 39 Global Step: 821150 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:19:04,481-Speed 2496.19 samples/sec Loss 1.0443 LearningRate 0.000000 Epoch: 39 Global Step: 821160 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:19:12,633-Speed 2512.86 samples/sec Loss 1.0559 LearningRate 0.000000 Epoch: 39 Global Step: 821170 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:19:20,842-Speed 2495.26 samples/sec Loss 1.0464 LearningRate 0.000000 Epoch: 39 Global Step: 821180 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:19:29,045-Speed 2497.04 samples/sec Loss 1.0480 LearningRate 0.000000 Epoch: 39 Global Step: 821190 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:19:37,254-Speed 2495.44 samples/sec Loss 1.0763 LearningRate 0.000000 Epoch: 39 Global Step: 821200 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:19:45,462-Speed 2495.71 samples/sec Loss 1.0885 LearningRate 0.000000 Epoch: 39 Global Step: 821210 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:19:53,664-Speed 2497.24 samples/sec Loss 1.0925 LearningRate 0.000000 Epoch: 39 Global Step: 821220 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:20:01,816-Speed 2512.54 samples/sec Loss 1.0518 LearningRate 0.000000 Epoch: 39 Global Step: 821230 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:20:10,021-Speed 2496.21 samples/sec Loss 1.0573 LearningRate 0.000000 Epoch: 39 Global Step: 821240 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:20:18,227-Speed 2496.21 samples/sec Loss 1.0352 LearningRate 0.000000 Epoch: 39 Global Step: 821250 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:20:26,430-Speed 2497.16 samples/sec Loss 1.0622 LearningRate 0.000000 Epoch: 39 Global Step: 821260 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:20:34,633-Speed 2497.09 samples/sec Loss 1.0470 LearningRate 0.000000 Epoch: 39 Global Step: 821270 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:20:42,838-Speed 2496.44 samples/sec Loss 1.0513 LearningRate 0.000000 Epoch: 39 Global Step: 821280 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:20:50,988-Speed 2513.45 samples/sec Loss 1.0906 LearningRate 0.000000 Epoch: 39 Global Step: 821290 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:20:59,190-Speed 2497.05 samples/sec Loss 1.0457 LearningRate 0.000000 Epoch: 39 Global Step: 821300 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:21:07,396-Speed 2496.35 samples/sec Loss 1.0288 LearningRate 0.000000 Epoch: 39 Global Step: 821310 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:21:15,602-Speed 2496.00 samples/sec Loss 1.0644 LearningRate 0.000000 Epoch: 39 Global Step: 821320 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:21:23,812-Speed 2495.09 samples/sec Loss 1.0441 LearningRate 0.000000 Epoch: 39 Global Step: 821330 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:21:32,021-Speed 2495.02 samples/sec Loss 1.0595 LearningRate 0.000000 Epoch: 39 Global Step: 821340 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:21:40,175-Speed 2512.25 samples/sec Loss 1.0453 LearningRate 0.000000 Epoch: 39 Global Step: 821350 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:21:48,382-Speed 2495.74 samples/sec Loss 1.0687 LearningRate 0.000000 Epoch: 39 Global Step: 821360 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:21:56,589-Speed 2495.93 samples/sec Loss 1.0546 LearningRate 0.000000 Epoch: 39 Global Step: 821370 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:22:04,791-Speed 2497.45 samples/sec Loss 1.0618 LearningRate 0.000000 Epoch: 39 Global Step: 821380 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:22:12,995-Speed 2496.60 samples/sec Loss 1.0343 LearningRate 0.000000 Epoch: 39 Global Step: 821390 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:22:21,200-Speed 2496.48 samples/sec Loss 1.0615 LearningRate 0.000000 Epoch: 39 Global Step: 821400 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:22:29,376-Speed 2505.18 samples/sec Loss 1.0605 LearningRate 0.000000 Epoch: 39 Global Step: 821410 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:22:37,580-Speed 2496.86 samples/sec Loss 1.0313 LearningRate 0.000000 Epoch: 39 Global Step: 821420 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:22:45,786-Speed 2495.95 samples/sec Loss 1.0254 LearningRate 0.000000 Epoch: 39 Global Step: 821430 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:22:53,989-Speed 2497.01 samples/sec Loss 1.0888 LearningRate 0.000000 Epoch: 39 Global Step: 821440 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:23:02,197-Speed 2495.95 samples/sec Loss 1.0700 LearningRate 0.000000 Epoch: 39 Global Step: 821450 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:23:10,405-Speed 2495.44 samples/sec Loss 1.0401 LearningRate 0.000000 Epoch: 39 Global Step: 821460 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:23:18,558-Speed 2512.30 samples/sec Loss 1.0255 LearningRate 0.000000 Epoch: 39 Global Step: 821470 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:23:26,765-Speed 2496.06 samples/sec Loss 1.0447 LearningRate 0.000000 Epoch: 39 Global Step: 821480 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:23:34,966-Speed 2497.55 samples/sec Loss 1.0625 LearningRate 0.000000 Epoch: 39 Global Step: 821490 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:23:43,170-Speed 2496.91 samples/sec Loss 1.0320 LearningRate 0.000000 Epoch: 39 Global Step: 821500 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:23:51,373-Speed 2496.94 samples/sec Loss 1.0755 LearningRate 0.000000 Epoch: 39 Global Step: 821510 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:23:59,580-Speed 2495.71 samples/sec Loss 1.0730 LearningRate 0.000000 Epoch: 39 Global Step: 821520 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:24:07,732-Speed 2512.57 samples/sec Loss 1.0907 LearningRate 0.000000 Epoch: 39 Global Step: 821530 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:24:15,936-Speed 2496.78 samples/sec Loss 1.0525 LearningRate 0.000000 Epoch: 39 Global Step: 821540 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:24:24,096-Speed 2510.30 samples/sec Loss 1.0695 LearningRate 0.000000 Epoch: 39 Global Step: 821550 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:24:32,299-Speed 2496.89 samples/sec Loss 1.0897 LearningRate 0.000000 Epoch: 39 Global Step: 821560 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:24:40,512-Speed 2493.87 samples/sec Loss 1.0588 LearningRate 0.000000 Epoch: 39 Global Step: 821570 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:24:48,720-Speed 2495.64 samples/sec Loss 1.0747 LearningRate 0.000000 Epoch: 39 Global Step: 821580 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:24:56,876-Speed 2511.66 samples/sec Loss 1.0487 LearningRate 0.000000 Epoch: 39 Global Step: 821590 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:25:05,078-Speed 2497.33 samples/sec Loss 1.0414 LearningRate 0.000000 Epoch: 39 Global Step: 821600 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:25:13,282-Speed 2496.74 samples/sec Loss 1.0575 LearningRate 0.000000 Epoch: 39 Global Step: 821610 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:25:21,484-Speed 2497.29 samples/sec Loss 1.0591 LearningRate 0.000000 Epoch: 39 Global Step: 821620 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:25:29,702-Speed 2492.54 samples/sec Loss 1.0692 LearningRate 0.000000 Epoch: 39 Global Step: 821630 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:25:37,914-Speed 2494.13 samples/sec Loss 1.0636 LearningRate 0.000000 Epoch: 39 Global Step: 821640 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:25:46,063-Speed 2513.46 samples/sec Loss 1.0433 LearningRate 0.000000 Epoch: 39 Global Step: 821650 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:25:54,280-Speed 2492.96 samples/sec Loss 1.0438 LearningRate 0.000000 Epoch: 39 Global Step: 821660 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:26:02,485-Speed 2496.18 samples/sec Loss 1.0548 LearningRate 0.000000 Epoch: 39 Global Step: 821670 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:26:10,694-Speed 2496.05 samples/sec Loss 1.0620 LearningRate 0.000000 Epoch: 39 Global Step: 821680 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:26:18,899-Speed 2496.44 samples/sec Loss 1.0547 LearningRate 0.000000 Epoch: 39 Global Step: 821690 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:26:27,103-Speed 2496.62 samples/sec Loss 1.0458 LearningRate 0.000000 Epoch: 39 Global Step: 821700 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:26:35,255-Speed 2512.48 samples/sec Loss 1.0698 LearningRate 0.000000 Epoch: 39 Global Step: 821710 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:26:43,466-Speed 2494.90 samples/sec Loss 1.0872 LearningRate 0.000000 Epoch: 39 Global Step: 821720 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:26:51,673-Speed 2495.89 samples/sec Loss 1.0416 LearningRate 0.000000 Epoch: 39 Global Step: 821730 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:26:59,878-Speed 2496.55 samples/sec Loss 1.0354 LearningRate 0.000000 Epoch: 39 Global Step: 821740 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:27:08,084-Speed 2496.14 samples/sec Loss 1.0918 LearningRate 0.000000 Epoch: 39 Global Step: 821750 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:27:16,292-Speed 2495.58 samples/sec Loss 1.0818 LearningRate 0.000000 Epoch: 39 Global Step: 821760 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:27:24,443-Speed 2512.78 samples/sec Loss 1.0676 LearningRate 0.000000 Epoch: 39 Global Step: 821770 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:27:32,655-Speed 2494.42 samples/sec Loss 1.0621 LearningRate 0.000000 Epoch: 39 Global Step: 821780 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:27:40,859-Speed 2496.67 samples/sec Loss 1.0470 LearningRate 0.000000 Epoch: 39 Global Step: 821790 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:27:49,065-Speed 2497.04 samples/sec Loss 1.0726 LearningRate 0.000000 Epoch: 39 Global Step: 821800 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:27:57,283-Speed 2492.37 samples/sec Loss 1.0545 LearningRate 0.000000 Epoch: 39 Global Step: 821810 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:28:05,489-Speed 2496.22 samples/sec Loss 1.0701 LearningRate 0.000000 Epoch: 39 Global Step: 821820 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:28:13,639-Speed 2513.45 samples/sec Loss 1.0583 LearningRate 0.000000 Epoch: 39 Global Step: 821830 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:28:21,847-Speed 2495.65 samples/sec Loss 1.0547 LearningRate 0.000000 Epoch: 39 Global Step: 821840 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:28:30,050-Speed 2496.78 samples/sec Loss 1.0488 LearningRate 0.000000 Epoch: 39 Global Step: 821850 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:28:38,256-Speed 2496.53 samples/sec Loss 1.0782 LearningRate 0.000000 Epoch: 39 Global Step: 821860 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:28:46,463-Speed 2496.08 samples/sec Loss 1.0408 LearningRate 0.000000 Epoch: 39 Global Step: 821870 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:28:54,680-Speed 2492.52 samples/sec Loss 1.0358 LearningRate 0.000000 Epoch: 39 Global Step: 821880 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:29:02,832-Speed 2512.47 samples/sec Loss 1.0683 LearningRate 0.000000 Epoch: 39 Global Step: 821890 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:29:11,053-Speed 2492.03 samples/sec Loss 1.0858 LearningRate 0.000000 Epoch: 39 Global Step: 821900 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:29:19,255-Speed 2497.30 samples/sec Loss 1.0506 LearningRate 0.000000 Epoch: 39 Global Step: 821910 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:29:27,461-Speed 2496.08 samples/sec Loss 1.0433 LearningRate 0.000000 Epoch: 39 Global Step: 821920 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:29:35,665-Speed 2496.89 samples/sec Loss 1.0758 LearningRate 0.000000 Epoch: 39 Global Step: 821930 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:29:43,869-Speed 2496.62 samples/sec Loss 1.0537 LearningRate 0.000000 Epoch: 39 Global Step: 821940 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:29:52,022-Speed 2512.28 samples/sec Loss 1.0763 LearningRate 0.000000 Epoch: 39 Global Step: 821950 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:30:00,228-Speed 2496.03 samples/sec Loss 1.0462 LearningRate 0.000000 Epoch: 39 Global Step: 821960 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:30:08,434-Speed 2496.20 samples/sec Loss 1.0375 LearningRate 0.000000 Epoch: 39 Global Step: 821970 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:30:16,639-Speed 2496.39 samples/sec Loss 1.0531 LearningRate 0.000000 Epoch: 39 Global Step: 821980 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:30:24,844-Speed 2496.70 samples/sec Loss 1.0563 LearningRate 0.000000 Epoch: 39 Global Step: 821990 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:30:33,050-Speed 2495.92 samples/sec Loss 1.0709 LearningRate 0.000000 Epoch: 39 Global Step: 822000 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:30:41,201-Speed 2513.05 samples/sec Loss 1.0583 LearningRate 0.000000 Epoch: 39 Global Step: 822010 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:30:49,409-Speed 2495.54 samples/sec Loss 1.1053 LearningRate 0.000000 Epoch: 39 Global Step: 822020 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:30:57,617-Speed 2495.40 samples/sec Loss 1.0747 LearningRate 0.000000 Epoch: 39 Global Step: 822030 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:31:05,824-Speed 2495.86 samples/sec Loss 1.0732 LearningRate 0.000000 Epoch: 39 Global Step: 822040 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:31:14,036-Speed 2494.23 samples/sec Loss 1.0235 LearningRate 0.000000 Epoch: 39 Global Step: 822050 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:31:22,241-Speed 2496.64 samples/sec Loss 1.0457 LearningRate 0.000000 Epoch: 39 Global Step: 822060 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:31:30,392-Speed 2512.96 samples/sec Loss 1.0543 LearningRate 0.000000 Epoch: 39 Global Step: 822070 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:31:38,597-Speed 2496.53 samples/sec Loss 1.0519 LearningRate 0.000000 Epoch: 39 Global Step: 822080 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:31:46,799-Speed 2497.14 samples/sec Loss 1.0650 LearningRate 0.000000 Epoch: 39 Global Step: 822090 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:31:55,004-Speed 2496.26 samples/sec Loss 1.0579 LearningRate 0.000000 Epoch: 39 Global Step: 822100 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:32:03,228-Speed 2490.87 samples/sec Loss 1.0567 LearningRate 0.000000 Epoch: 39 Global Step: 822110 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:32:11,436-Speed 2495.22 samples/sec Loss 1.0274 LearningRate 0.000000 Epoch: 39 Global Step: 822120 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:32:19,588-Speed 2512.99 samples/sec Loss 1.0410 LearningRate 0.000000 Epoch: 39 Global Step: 822130 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:32:27,802-Speed 2494.01 samples/sec Loss 1.0893 LearningRate 0.000000 Epoch: 39 Global Step: 822140 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:32:36,008-Speed 2496.06 samples/sec Loss 1.0335 LearningRate 0.000000 Epoch: 39 Global Step: 822150 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:32:44,211-Speed 2496.78 samples/sec Loss 1.0531 LearningRate 0.000000 Epoch: 39 Global Step: 822160 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:32:52,419-Speed 2495.72 samples/sec Loss 1.0615 LearningRate 0.000000 Epoch: 39 Global Step: 822170 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:33:00,625-Speed 2496.13 samples/sec Loss 1.0311 LearningRate 0.000000 Epoch: 39 Global Step: 822180 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:33:08,781-Speed 2511.20 samples/sec Loss 1.0686 LearningRate 0.000000 Epoch: 39 Global Step: 822190 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:33:16,996-Speed 2493.51 samples/sec Loss 1.0811 LearningRate 0.000000 Epoch: 39 Global Step: 822200 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:33:25,202-Speed 2496.19 samples/sec Loss 1.0654 LearningRate 0.000000 Epoch: 39 Global Step: 822210 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:33:33,405-Speed 2496.99 samples/sec Loss 1.0611 LearningRate 0.000000 Epoch: 39 Global Step: 822220 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:33:41,618-Speed 2493.97 samples/sec Loss 1.0386 LearningRate 0.000000 Epoch: 39 Global Step: 822230 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:33:49,828-Speed 2495.15 samples/sec Loss 1.0485 LearningRate 0.000000 Epoch: 39 Global Step: 822240 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:33:57,981-Speed 2512.49 samples/sec Loss 1.0646 LearningRate 0.000000 Epoch: 39 Global Step: 822250 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:34:06,185-Speed 2496.79 samples/sec Loss 1.0607 LearningRate 0.000000 Epoch: 39 Global Step: 822260 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:34:14,390-Speed 2496.95 samples/sec Loss 1.0823 LearningRate 0.000000 Epoch: 39 Global Step: 822270 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:34:22,594-Speed 2496.44 samples/sec Loss 1.0440 LearningRate 0.000000 Epoch: 39 Global Step: 822280 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:34:30,800-Speed 2496.12 samples/sec Loss 1.0357 LearningRate 0.000000 Epoch: 39 Global Step: 822290 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:34:39,011-Speed 2494.87 samples/sec Loss 1.0600 LearningRate 0.000000 Epoch: 39 Global Step: 822300 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:34:47,162-Speed 2512.84 samples/sec Loss 1.0482 LearningRate 0.000000 Epoch: 39 Global Step: 822310 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:34:55,365-Speed 2497.06 samples/sec Loss 1.0598 LearningRate 0.000000 Epoch: 39 Global Step: 822320 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:35:03,570-Speed 2496.32 samples/sec Loss 1.0505 LearningRate 0.000000 Epoch: 39 Global Step: 822330 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:35:11,776-Speed 2496.22 samples/sec Loss 1.0565 LearningRate 0.000000 Epoch: 39 Global Step: 822340 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:35:19,983-Speed 2495.70 samples/sec Loss 1.0540 LearningRate 0.000000 Epoch: 39 Global Step: 822350 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:35:28,190-Speed 2495.89 samples/sec Loss 1.0855 LearningRate 0.000000 Epoch: 39 Global Step: 822360 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:35:36,339-Speed 2513.35 samples/sec Loss 1.0783 LearningRate 0.000000 Epoch: 39 Global Step: 822370 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:35:44,544-Speed 2496.59 samples/sec Loss 1.0788 LearningRate 0.000000 Epoch: 39 Global Step: 822380 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:35:52,753-Speed 2495.25 samples/sec Loss 1.0853 LearningRate 0.000000 Epoch: 39 Global Step: 822390 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:36:00,958-Speed 2496.46 samples/sec Loss 1.0746 LearningRate 0.000000 Epoch: 39 Global Step: 822400 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:36:09,164-Speed 2496.38 samples/sec Loss 1.1062 LearningRate 0.000000 Epoch: 39 Global Step: 822410 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:36:17,370-Speed 2495.85 samples/sec Loss 1.0525 LearningRate 0.000000 Epoch: 39 Global Step: 822420 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:36:25,524-Speed 2512.11 samples/sec Loss 1.0404 LearningRate 0.000000 Epoch: 39 Global Step: 822430 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:36:33,732-Speed 2495.93 samples/sec Loss 1.0885 LearningRate 0.000000 Epoch: 39 Global Step: 822440 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:36:41,939-Speed 2496.26 samples/sec Loss 1.0479 LearningRate 0.000000 Epoch: 39 Global Step: 822450 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:36:50,149-Speed 2494.52 samples/sec Loss 1.0525 LearningRate 0.000000 Epoch: 39 Global Step: 822460 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:36:58,374-Speed 2490.40 samples/sec Loss 1.0592 LearningRate 0.000000 Epoch: 39 Global Step: 822470 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:37:06,579-Speed 2496.48 samples/sec Loss 1.0417 LearningRate 0.000000 Epoch: 39 Global Step: 822480 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:37:14,735-Speed 2511.41 samples/sec Loss 1.0998 LearningRate 0.000000 Epoch: 39 Global Step: 822490 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:37:22,942-Speed 2496.11 samples/sec Loss 1.0302 LearningRate 0.000000 Epoch: 39 Global Step: 822500 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:37:31,160-Speed 2492.76 samples/sec Loss 1.0665 LearningRate 0.000000 Epoch: 39 Global Step: 822510 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:37:39,365-Speed 2496.37 samples/sec Loss 1.1015 LearningRate 0.000000 Epoch: 39 Global Step: 822520 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:37:47,582-Speed 2492.78 samples/sec Loss 1.0784 LearningRate 0.000000 Epoch: 39 Global Step: 822530 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:37:55,795-Speed 2493.84 samples/sec Loss 1.0802 LearningRate 0.000000 Epoch: 39 Global Step: 822540 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:38:03,952-Speed 2511.11 samples/sec Loss 1.0953 LearningRate 0.000000 Epoch: 39 Global Step: 822550 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:38:12,161-Speed 2495.22 samples/sec Loss 1.0407 LearningRate 0.000000 Epoch: 39 Global Step: 822560 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:38:20,367-Speed 2496.37 samples/sec Loss 1.0676 LearningRate 0.000000 Epoch: 39 Global Step: 822570 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:38:28,571-Speed 2496.41 samples/sec Loss 1.0415 LearningRate 0.000000 Epoch: 39 Global Step: 822580 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:38:36,779-Speed 2495.66 samples/sec Loss 1.0317 LearningRate 0.000000 Epoch: 39 Global Step: 822590 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:38:44,998-Speed 2492.19 samples/sec Loss 1.0735 LearningRate 0.000000 Epoch: 39 Global Step: 822600 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:38:53,163-Speed 2508.76 samples/sec Loss 1.0923 LearningRate 0.000000 Epoch: 39 Global Step: 822610 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:39:01,367-Speed 2496.55 samples/sec Loss 1.0829 LearningRate 0.000000 Epoch: 39 Global Step: 822620 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:39:09,592-Speed 2490.29 samples/sec Loss 1.0368 LearningRate 0.000000 Epoch: 39 Global Step: 822630 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:39:17,811-Speed 2492.43 samples/sec Loss 1.0439 LearningRate 0.000000 Epoch: 39 Global Step: 822640 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:39:26,015-Speed 2496.66 samples/sec Loss 1.0616 LearningRate 0.000000 Epoch: 39 Global Step: 822650 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:39:34,221-Speed 2496.57 samples/sec Loss 1.0362 LearningRate 0.000000 Epoch: 39 Global Step: 822660 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:39:42,372-Speed 2513.11 samples/sec Loss 1.0368 LearningRate 0.000000 Epoch: 39 Global Step: 822670 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:39:50,577-Speed 2496.52 samples/sec Loss 1.0483 LearningRate 0.000000 Epoch: 39 Global Step: 822680 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:39:58,786-Speed 2495.13 samples/sec Loss 1.0257 LearningRate 0.000000 Epoch: 39 Global Step: 822690 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:40:06,989-Speed 2497.02 samples/sec Loss 1.0359 LearningRate 0.000000 Epoch: 39 Global Step: 822700 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:40:15,194-Speed 2496.45 samples/sec Loss 1.0842 LearningRate 0.000000 Epoch: 39 Global Step: 822710 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:40:23,399-Speed 2496.74 samples/sec Loss 1.0050 LearningRate 0.000000 Epoch: 39 Global Step: 822720 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:40:31,565-Speed 2508.33 samples/sec Loss 1.0532 LearningRate 0.000000 Epoch: 39 Global Step: 822730 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:40:39,771-Speed 2496.15 samples/sec Loss 1.0314 LearningRate 0.000000 Epoch: 39 Global Step: 822740 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:40:47,977-Speed 2496.13 samples/sec Loss 1.0579 LearningRate 0.000000 Epoch: 39 Global Step: 822750 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:40:56,187-Speed 2494.75 samples/sec Loss 1.0613 LearningRate 0.000000 Epoch: 39 Global Step: 822760 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:41:04,396-Speed 2495.46 samples/sec Loss 1.0768 LearningRate 0.000000 Epoch: 39 Global Step: 822770 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:41:12,601-Speed 2496.52 samples/sec Loss 1.0582 LearningRate 0.000000 Epoch: 39 Global Step: 822780 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:41:20,756-Speed 2511.67 samples/sec Loss 1.0817 LearningRate 0.000000 Epoch: 39 Global Step: 822790 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:41:28,967-Speed 2494.90 samples/sec Loss 1.0608 LearningRate 0.000000 Epoch: 39 Global Step: 822800 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:41:37,172-Speed 2496.56 samples/sec Loss 1.0881 LearningRate 0.000000 Epoch: 39 Global Step: 822810 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:41:45,390-Speed 2492.71 samples/sec Loss 1.0134 LearningRate 0.000000 Epoch: 39 Global Step: 822820 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:41:53,596-Speed 2496.19 samples/sec Loss 1.0640 LearningRate 0.000000 Epoch: 39 Global Step: 822830 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:42:01,803-Speed 2495.83 samples/sec Loss 1.0472 LearningRate 0.000000 Epoch: 39 Global Step: 822840 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:42:09,957-Speed 2511.86 samples/sec Loss 1.0902 LearningRate 0.000000 Epoch: 39 Global Step: 822850 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:42:18,163-Speed 2496.25 samples/sec Loss 1.0712 LearningRate 0.000000 Epoch: 39 Global Step: 822860 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:42:26,367-Speed 2496.67 samples/sec Loss 1.0670 LearningRate 0.000000 Epoch: 39 Global Step: 822870 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:42:34,569-Speed 2497.19 samples/sec Loss 1.0915 LearningRate 0.000000 Epoch: 39 Global Step: 822880 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:42:42,776-Speed 2496.08 samples/sec Loss 1.0681 LearningRate 0.000000 Epoch: 39 Global Step: 822890 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:42:50,984-Speed 2495.52 samples/sec Loss 1.0496 LearningRate 0.000000 Epoch: 39 Global Step: 822900 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:42:59,133-Speed 2513.45 samples/sec Loss 1.0437 LearningRate 0.000000 Epoch: 39 Global Step: 822910 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:43:07,342-Speed 2495.23 samples/sec Loss 1.0703 LearningRate 0.000000 Epoch: 39 Global Step: 822920 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:43:15,551-Speed 2495.12 samples/sec Loss 1.0730 LearningRate 0.000000 Epoch: 39 Global Step: 822930 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:43:23,760-Speed 2495.53 samples/sec Loss 1.0814 LearningRate 0.000000 Epoch: 39 Global Step: 822940 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:43:31,966-Speed 2496.09 samples/sec Loss 1.0570 LearningRate 0.000000 Epoch: 39 Global Step: 822950 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-07-13 10:43:40,131-Speed 2508.56 samples/sec Loss 1.0564 LearningRate 0.000000 Epoch: 39 Global Step: 822960 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:43:48,286-Speed 2511.67 samples/sec Loss 1.0656 LearningRate 0.000000 Epoch: 39 Global Step: 822970 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:43:56,492-Speed 2496.35 samples/sec Loss 1.0407 LearningRate 0.000000 Epoch: 39 Global Step: 822980 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:44:04,696-Speed 2496.67 samples/sec Loss 1.0631 LearningRate 0.000000 Epoch: 39 Global Step: 822990 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:44:12,899-Speed 2496.75 samples/sec Loss 1.0860 LearningRate 0.000000 Epoch: 39 Global Step: 823000 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-07-13 10:44:21,115-Speed 2493.46 samples/sec Loss 1.0943 LearningRate 0.000000 Epoch: 39 Global Step: 823010 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 10:44:29,320-Speed 2496.59 samples/sec Loss 1.0824 LearningRate 0.000000 Epoch: 39 Global Step: 823020 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 10:44:37,470-Speed 2513.09 samples/sec Loss 1.0702 LearningRate 0.000000 Epoch: 39 Global Step: 823030 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 10:44:45,670-Speed 2497.66 samples/sec Loss 1.0670 LearningRate 0.000000 Epoch: 39 Global Step: 823040 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13
10:44:53,873-Speed 2497.12 samples/sec Loss 1.0576 LearningRate 0.000000 Epoch: 39 Global Step: 823050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:45:02,084-Speed 2494.81 samples/sec Loss 1.0543 LearningRate 0.000000 Epoch: 39 Global Step: 823060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:45:10,302-Speed 2492.41 samples/sec Loss 1.0726 LearningRate 0.000000 Epoch: 39 Global Step: 823070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:45:18,512-Speed 2494.97 samples/sec Loss 1.0617 LearningRate 0.000000 Epoch: 39 Global Step: 823080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:45:26,665-Speed 2512.26 samples/sec Loss 1.0553 LearningRate 0.000000 Epoch: 39 Global Step: 823090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:45:34,870-Speed 2496.45 samples/sec Loss 1.0612 LearningRate 0.000000 Epoch: 39 Global Step: 823100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:45:43,075-Speed 2496.61 samples/sec Loss 1.0369 LearningRate 0.000000 Epoch: 39 Global Step: 823110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:45:51,277-Speed 2497.31 samples/sec Loss 1.0482 LearningRate 0.000000 Epoch: 39 Global Step: 823120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:45:59,494-Speed 2492.87 samples/sec Loss 1.0720 LearningRate 0.000000 Epoch: 39 Global Step: 823130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:46:07,698-Speed 2496.76 samples/sec Loss 1.0535 LearningRate 0.000000 Epoch: 39 Global Step: 823140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:46:15,859-Speed 2509.54 samples/sec Loss 1.0686 LearningRate 0.000000 Epoch: 39 Global Step: 823150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:46:24,069-Speed 2495.04 samples/sec Loss 1.0406 LearningRate 0.000000 Epoch: 39 Global Step: 823160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:46:32,273-Speed 
2496.90 samples/sec Loss 1.0324 LearningRate 0.000000 Epoch: 39 Global Step: 823170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:46:40,478-Speed 2496.35 samples/sec Loss 1.0540 LearningRate 0.000000 Epoch: 39 Global Step: 823180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:46:48,694-Speed 2493.13 samples/sec Loss 1.0423 LearningRate 0.000000 Epoch: 39 Global Step: 823190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:46:56,908-Speed 2493.52 samples/sec Loss 1.0471 LearningRate 0.000000 Epoch: 39 Global Step: 823200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:47:05,060-Speed 2512.84 samples/sec Loss 1.0753 LearningRate 0.000000 Epoch: 39 Global Step: 823210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:47:13,269-Speed 2495.28 samples/sec Loss 1.0321 LearningRate 0.000000 Epoch: 39 Global Step: 823220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:47:21,474-Speed 2496.60 samples/sec Loss 1.0694 LearningRate 0.000000 Epoch: 39 Global Step: 823230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:47:29,684-Speed 2494.97 samples/sec Loss 1.0632 LearningRate 0.000000 Epoch: 39 Global Step: 823240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:47:37,888-Speed 2496.80 samples/sec Loss 1.0670 LearningRate 0.000000 Epoch: 39 Global Step: 823250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:47:46,092-Speed 2496.69 samples/sec Loss 1.0735 LearningRate 0.000000 Epoch: 39 Global Step: 823260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:47:54,257-Speed 2508.62 samples/sec Loss 1.0597 LearningRate 0.000000 Epoch: 39 Global Step: 823270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:48:02,463-Speed 2496.02 samples/sec Loss 1.0745 LearningRate 0.000000 Epoch: 39 Global Step: 823280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:48:10,678-Speed 2493.51 samples/sec 
Loss 1.0426 LearningRate 0.000000 Epoch: 39 Global Step: 823290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:48:18,883-Speed 2496.51 samples/sec Loss 1.0246 LearningRate 0.000000 Epoch: 39 Global Step: 823300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:48:27,093-Speed 2494.94 samples/sec Loss 1.0580 LearningRate 0.000000 Epoch: 39 Global Step: 823310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:48:35,314-Speed 2491.59 samples/sec Loss 1.0429 LearningRate 0.000000 Epoch: 39 Global Step: 823320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:48:43,467-Speed 2512.47 samples/sec Loss 1.0536 LearningRate 0.000000 Epoch: 39 Global Step: 823330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:48:51,675-Speed 2495.72 samples/sec Loss 1.0464 LearningRate 0.000000 Epoch: 39 Global Step: 823340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:48:59,900-Speed 2490.34 samples/sec Loss 1.0704 LearningRate 0.000000 Epoch: 39 Global Step: 823350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:49:08,104-Speed 2496.63 samples/sec Loss 1.0353 LearningRate 0.000000 Epoch: 39 Global Step: 823360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:49:16,315-Speed 2494.74 samples/sec Loss 1.0434 LearningRate 0.000000 Epoch: 39 Global Step: 823370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:49:24,520-Speed 2496.52 samples/sec Loss 1.0400 LearningRate 0.000000 Epoch: 39 Global Step: 823380 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:49:32,671-Speed 2512.77 samples/sec Loss 1.0565 LearningRate 0.000000 Epoch: 39 Global Step: 823390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:49:40,876-Speed 2496.61 samples/sec Loss 1.0453 LearningRate 0.000000 Epoch: 39 Global Step: 823400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:49:49,080-Speed 2496.95 samples/sec Loss 1.0540 
LearningRate 0.000000 Epoch: 39 Global Step: 823410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:49:57,287-Speed 2495.58 samples/sec Loss 1.0742 LearningRate 0.000000 Epoch: 39 Global Step: 823420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:50:05,579-Speed 2470.40 samples/sec Loss 1.0355 LearningRate 0.000000 Epoch: 39 Global Step: 823430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:50:13,789-Speed 2494.95 samples/sec Loss 1.0384 LearningRate 0.000000 Epoch: 39 Global Step: 823440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:50:21,945-Speed 2511.45 samples/sec Loss 1.0547 LearningRate 0.000000 Epoch: 39 Global Step: 823450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:50:30,147-Speed 2497.13 samples/sec Loss 1.0502 LearningRate 0.000000 Epoch: 39 Global Step: 823460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:50:38,351-Speed 2496.71 samples/sec Loss 1.0463 LearningRate 0.000000 Epoch: 39 Global Step: 823470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:50:46,567-Speed 2493.26 samples/sec Loss 1.0782 LearningRate 0.000000 Epoch: 39 Global Step: 823480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:50:54,782-Speed 2493.32 samples/sec Loss 1.0528 LearningRate 0.000000 Epoch: 39 Global Step: 823490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:51:02,996-Speed 2494.14 samples/sec Loss 1.0319 LearningRate 0.000000 Epoch: 39 Global Step: 823500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:51:11,154-Speed 2510.75 samples/sec Loss 1.0080 LearningRate 0.000000 Epoch: 39 Global Step: 823510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:51:19,358-Speed 2496.61 samples/sec Loss 1.0523 LearningRate 0.000000 Epoch: 39 Global Step: 823520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:51:27,561-Speed 2497.09 samples/sec Loss 1.0897 LearningRate 
0.000000 Epoch: 39 Global Step: 823530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:51:35,765-Speed 2496.90 samples/sec Loss 1.0512 LearningRate 0.000000 Epoch: 39 Global Step: 823540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:51:43,978-Speed 2494.08 samples/sec Loss 1.0453 LearningRate 0.000000 Epoch: 39 Global Step: 823550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:51:52,182-Speed 2496.49 samples/sec Loss 1.0885 LearningRate 0.000000 Epoch: 39 Global Step: 823560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:52:00,336-Speed 2512.07 samples/sec Loss 1.0602 LearningRate 0.000000 Epoch: 39 Global Step: 823570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:52:08,541-Speed 2496.49 samples/sec Loss 1.0555 LearningRate 0.000000 Epoch: 39 Global Step: 823580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:52:16,744-Speed 2497.18 samples/sec Loss 1.0737 LearningRate 0.000000 Epoch: 39 Global Step: 823590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:52:24,948-Speed 2496.91 samples/sec Loss 1.0762 LearningRate 0.000000 Epoch: 39 Global Step: 823600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:52:33,157-Speed 2495.19 samples/sec Loss 1.0772 LearningRate 0.000000 Epoch: 39 Global Step: 823610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:52:41,364-Speed 2495.78 samples/sec Loss 1.0723 LearningRate 0.000000 Epoch: 39 Global Step: 823620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:52:49,522-Speed 2511.26 samples/sec Loss 1.0793 LearningRate 0.000000 Epoch: 39 Global Step: 823630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:52:57,735-Speed 2494.02 samples/sec Loss 1.0884 LearningRate 0.000000 Epoch: 39 Global Step: 823640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:53:05,940-Speed 2496.45 samples/sec Loss 1.0593 LearningRate 0.000000 Epoch: 39 
Global Step: 823650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:53:14,150-Speed 2495.06 samples/sec Loss 1.0427 LearningRate 0.000000 Epoch: 39 Global Step: 823660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:53:22,357-Speed 2495.74 samples/sec Loss 1.0777 LearningRate 0.000000 Epoch: 39 Global Step: 823670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:53:30,568-Speed 2494.79 samples/sec Loss 1.0307 LearningRate 0.000000 Epoch: 39 Global Step: 823680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:53:38,740-Speed 2506.51 samples/sec Loss 1.0655 LearningRate 0.000000 Epoch: 39 Global Step: 823690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:53:46,959-Speed 2492.10 samples/sec Loss 1.0659 LearningRate 0.000000 Epoch: 39 Global Step: 823700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:53:55,164-Speed 2496.49 samples/sec Loss 1.0655 LearningRate 0.000000 Epoch: 39 Global Step: 823710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:54:03,374-Speed 2494.99 samples/sec Loss 1.0720 LearningRate 0.000000 Epoch: 39 Global Step: 823720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:54:11,580-Speed 2495.83 samples/sec Loss 1.0593 LearningRate 0.000000 Epoch: 39 Global Step: 823730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:54:19,792-Speed 2494.30 samples/sec Loss 1.0781 LearningRate 0.000000 Epoch: 39 Global Step: 823740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:54:27,945-Speed 2512.39 samples/sec Loss 1.0749 LearningRate 0.000000 Epoch: 39 Global Step: 823750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:54:36,151-Speed 2496.22 samples/sec Loss 1.0384 LearningRate 0.000000 Epoch: 39 Global Step: 823760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:54:44,372-Speed 2491.46 samples/sec Loss 1.0702 LearningRate 0.000000 Epoch: 39 Global Step: 823770 
Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:54:52,580-Speed 2495.77 samples/sec Loss 1.0606 LearningRate 0.000000 Epoch: 39 Global Step: 823780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:55:00,786-Speed 2496.18 samples/sec Loss 1.0671 LearningRate 0.000000 Epoch: 39 Global Step: 823790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:55:08,997-Speed 2494.46 samples/sec Loss 1.0487 LearningRate 0.000000 Epoch: 39 Global Step: 823800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:55:17,151-Speed 2511.88 samples/sec Loss 1.0711 LearningRate 0.000000 Epoch: 39 Global Step: 823810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:55:25,358-Speed 2495.86 samples/sec Loss 1.0297 LearningRate 0.000000 Epoch: 39 Global Step: 823820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:55:33,598-Speed 2485.56 samples/sec Loss 1.0539 LearningRate 0.000000 Epoch: 39 Global Step: 823830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:55:41,803-Speed 2496.46 samples/sec Loss 1.0582 LearningRate 0.000000 Epoch: 39 Global Step: 823840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:55:50,015-Speed 2494.20 samples/sec Loss 1.0698 LearningRate 0.000000 Epoch: 39 Global Step: 823850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:55:58,222-Speed 2495.83 samples/sec Loss 1.0661 LearningRate 0.000000 Epoch: 39 Global Step: 823860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:56:06,377-Speed 2511.78 samples/sec Loss 1.0397 LearningRate 0.000000 Epoch: 39 Global Step: 823870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:56:14,587-Speed 2494.93 samples/sec Loss 1.0743 LearningRate 0.000000 Epoch: 39 Global Step: 823880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:56:22,803-Speed 2493.19 samples/sec Loss 1.0630 LearningRate 0.000000 Epoch: 39 Global Step: 823890 Fp16 Grad Scale: 
16384 Required: 1 hours Training: 2022-07-13 10:56:31,015-Speed 2494.36 samples/sec Loss 1.0806 LearningRate 0.000000 Epoch: 39 Global Step: 823900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:56:39,226-Speed 2494.69 samples/sec Loss 1.0621 LearningRate 0.000000 Epoch: 39 Global Step: 823910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:56:47,435-Speed 2495.16 samples/sec Loss 1.0626 LearningRate 0.000000 Epoch: 39 Global Step: 823920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:56:55,590-Speed 2511.92 samples/sec Loss 1.0621 LearningRate 0.000000 Epoch: 39 Global Step: 823930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:57:03,800-Speed 2495.07 samples/sec Loss 1.0506 LearningRate 0.000000 Epoch: 39 Global Step: 823940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:57:12,021-Speed 2491.54 samples/sec Loss 1.0575 LearningRate 0.000000 Epoch: 39 Global Step: 823950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:57:20,228-Speed 2495.85 samples/sec Loss 1.0718 LearningRate 0.000000 Epoch: 39 Global Step: 823960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:57:28,438-Speed 2495.02 samples/sec Loss 1.0670 LearningRate 0.000000 Epoch: 39 Global Step: 823970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:57:36,649-Speed 2494.78 samples/sec Loss 1.0564 LearningRate 0.000000 Epoch: 39 Global Step: 823980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:57:44,813-Speed 2508.79 samples/sec Loss 1.0706 LearningRate 0.000000 Epoch: 39 Global Step: 823990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:57:53,019-Speed 2496.09 samples/sec Loss 1.0583 LearningRate 0.000000 Epoch: 39 Global Step: 824000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:58:01,226-Speed 2495.99 samples/sec Loss 1.0689 LearningRate 0.000000 Epoch: 39 Global Step: 824010 Fp16 Grad Scale: 16384 Required: 1 
hours Training: 2022-07-13 10:58:09,443-Speed 2492.68 samples/sec Loss 1.0593 LearningRate 0.000000 Epoch: 39 Global Step: 824020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:58:17,650-Speed 2495.74 samples/sec Loss 1.0448 LearningRate 0.000000 Epoch: 39 Global Step: 824030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:58:25,855-Speed 2496.63 samples/sec Loss 1.0701 LearningRate 0.000000 Epoch: 39 Global Step: 824040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:58:34,006-Speed 2512.86 samples/sec Loss 1.0670 LearningRate 0.000000 Epoch: 39 Global Step: 824050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:58:42,211-Speed 2496.53 samples/sec Loss 1.0746 LearningRate 0.000000 Epoch: 39 Global Step: 824060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:58:50,418-Speed 2495.76 samples/sec Loss 1.0442 LearningRate 0.000000 Epoch: 39 Global Step: 824070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:58:58,625-Speed 2495.98 samples/sec Loss 1.0738 LearningRate 0.000000 Epoch: 39 Global Step: 824080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:59:06,832-Speed 2495.45 samples/sec Loss 1.0433 LearningRate 0.000000 Epoch: 39 Global Step: 824090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:59:15,040-Speed 2495.66 samples/sec Loss 1.0825 LearningRate 0.000000 Epoch: 39 Global Step: 824100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:59:23,212-Speed 2506.41 samples/sec Loss 1.0678 LearningRate 0.000000 Epoch: 39 Global Step: 824110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:59:31,417-Speed 2496.52 samples/sec Loss 1.0479 LearningRate 0.000000 Epoch: 39 Global Step: 824120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:59:39,622-Speed 2496.44 samples/sec Loss 1.0928 LearningRate 0.000000 Epoch: 39 Global Step: 824130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2022-07-13 10:59:47,835-Speed 2493.98 samples/sec Loss 1.0525 LearningRate 0.000000 Epoch: 39 Global Step: 824140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 10:59:56,041-Speed 2496.50 samples/sec Loss 1.0622 LearningRate 0.000000 Epoch: 39 Global Step: 824150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:00:04,248-Speed 2495.69 samples/sec Loss 1.0401 LearningRate 0.000000 Epoch: 39 Global Step: 824160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-07-13 11:00:12,406-Speed 2511.06 samples/sec Loss 1.0475 LearningRate 0.000000 Epoch: 39 Global Step: 824170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-07-13 11:00:20,629-Speed 2490.88 samples/sec Loss 1.0864 LearningRate 0.000000 Epoch: 39 Global Step: 824180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-07-13 11:00:28,796-Speed 2507.94 samples/sec Loss 1.0444 LearningRate 0.000000 Epoch: 39 Global Step: 824190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:00:37,004-Speed 2495.60 samples/sec Loss 1.0568 LearningRate 0.000000 Epoch: 39 Global Step: 824200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:00:45,222-Speed 2492.44 samples/sec Loss 1.0405 LearningRate 0.000000 Epoch: 39 Global Step: 824210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:00:53,434-Speed 2494.45 samples/sec Loss 1.0432 LearningRate 0.000000 Epoch: 39 Global Step: 824220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:01:01,589-Speed 2511.68 samples/sec Loss 1.0689 LearningRate 0.000000 Epoch: 39 Global Step: 824230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:01:09,799-Speed 2495.00 samples/sec Loss 1.0594 LearningRate 0.000000 Epoch: 39 Global Step: 824240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:01:18,009-Speed 2495.00 samples/sec Loss 1.0565 LearningRate 0.000000 Epoch: 39 Global Step: 824250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 
11:01:26,213-Speed 2496.62 samples/sec Loss 1.0577 LearningRate 0.000000 Epoch: 39 Global Step: 824260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:01:34,420-Speed 2495.76 samples/sec Loss 1.0523 LearningRate 0.000000 Epoch: 39 Global Step: 824270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:01:42,641-Speed 2491.72 samples/sec Loss 1.0770 LearningRate 0.000000 Epoch: 39 Global Step: 824280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:01:50,796-Speed 2511.68 samples/sec Loss 1.0655 LearningRate 0.000000 Epoch: 39 Global Step: 824290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:01:59,003-Speed 2495.78 samples/sec Loss 1.0609 LearningRate 0.000000 Epoch: 39 Global Step: 824300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:02:07,211-Speed 2495.86 samples/sec Loss 1.0442 LearningRate 0.000000 Epoch: 39 Global Step: 824310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:02:15,418-Speed 2495.70 samples/sec Loss 1.0735 LearningRate 0.000000 Epoch: 39 Global Step: 824320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:02:23,633-Speed 2493.56 samples/sec Loss 1.0835 LearningRate 0.000000 Epoch: 39 Global Step: 824330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:02:31,841-Speed 2495.36 samples/sec Loss 1.0505 LearningRate 0.000000 Epoch: 39 Global Step: 824340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:02:39,997-Speed 2511.44 samples/sec Loss 1.0772 LearningRate 0.000000 Epoch: 39 Global Step: 824350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:02:48,211-Speed 2493.79 samples/sec Loss 1.0383 LearningRate 0.000000 Epoch: 39 Global Step: 824360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:02:56,415-Speed 2496.60 samples/sec Loss 1.0690 LearningRate 0.000000 Epoch: 39 Global Step: 824370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:03:04,625-Speed 
2495.01 samples/sec Loss 1.0940 LearningRate 0.000000 Epoch: 39 Global Step: 824380 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:03:12,830-Speed 2496.27 samples/sec Loss 1.0460 LearningRate 0.000000 Epoch: 39 Global Step: 824390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:03:21,035-Speed 2496.39 samples/sec Loss 1.0639 LearningRate 0.000000 Epoch: 39 Global Step: 824400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:03:29,189-Speed 2512.20 samples/sec Loss 1.0665 LearningRate 0.000000 Epoch: 39 Global Step: 824410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:03:37,404-Speed 2493.40 samples/sec Loss 1.0336 LearningRate 0.000000 Epoch: 39 Global Step: 824420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:03:45,611-Speed 2495.60 samples/sec Loss 1.0590 LearningRate 0.000000 Epoch: 39 Global Step: 824430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:03:53,820-Speed 2495.41 samples/sec Loss 1.0643 LearningRate 0.000000 Epoch: 39 Global Step: 824440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:04:02,025-Speed 2496.34 samples/sec Loss 1.0441 LearningRate 0.000000 Epoch: 39 Global Step: 824450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:04:10,248-Speed 2490.94 samples/sec Loss 1.0436 LearningRate 0.000000 Epoch: 39 Global Step: 824460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:04:18,415-Speed 2508.04 samples/sec Loss 1.0396 LearningRate 0.000000 Epoch: 39 Global Step: 824470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:04:26,626-Speed 2494.60 samples/sec Loss 1.0635 LearningRate 0.000000 Epoch: 39 Global Step: 824480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:04:34,834-Speed 2495.42 samples/sec Loss 1.0629 LearningRate 0.000000 Epoch: 39 Global Step: 824490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:04:43,044-Speed 2494.97 samples/sec 
Loss 1.0340 LearningRate 0.000000 Epoch: 39 Global Step: 824500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:04:51,251-Speed 2495.88 samples/sec Loss 1.0060 LearningRate 0.000000 Epoch: 39 Global Step: 824510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:04:59,456-Speed 2496.41 samples/sec Loss 1.0755 LearningRate 0.000000 Epoch: 39 Global Step: 824520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:05:07,607-Speed 2512.73 samples/sec Loss 1.0503 LearningRate 0.000000 Epoch: 39 Global Step: 824530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:05:15,813-Speed 2496.39 samples/sec Loss 1.0320 LearningRate 0.000000 Epoch: 39 Global Step: 824540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:05:24,016-Speed 2496.98 samples/sec Loss 1.0407 LearningRate 0.000000 Epoch: 39 Global Step: 824550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:05:32,222-Speed 2496.34 samples/sec Loss 1.0500 LearningRate 0.000000 Epoch: 39 Global Step: 824560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:05:40,428-Speed 2496.33 samples/sec Loss 1.0345 LearningRate 0.000000 Epoch: 39 Global Step: 824570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:05:48,631-Speed 2496.91 samples/sec Loss 1.0512 LearningRate 0.000000 Epoch: 39 Global Step: 824580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:05:56,814-Speed 2503.06 samples/sec Loss 1.0477 LearningRate 0.000000 Epoch: 39 Global Step: 824590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:06:05,018-Speed 2497.03 samples/sec Loss 1.0576 LearningRate 0.000000 Epoch: 39 Global Step: 824600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:06:13,222-Speed 2496.85 samples/sec Loss 1.0703 LearningRate 0.000000 Epoch: 39 Global Step: 824610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:06:21,425-Speed 2496.78 samples/sec Loss 1.0325 
LearningRate 0.000000 Epoch: 39 Global Step: 824620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:06:29,630-Speed 2497.01 samples/sec Loss 1.0383 LearningRate 0.000000 Epoch: 39 Global Step: 824630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:06:37,836-Speed 2495.83 samples/sec Loss 1.0620 LearningRate 0.000000 Epoch: 39 Global Step: 824640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:06:45,989-Speed 2512.36 samples/sec Loss 1.0511 LearningRate 0.000000 Epoch: 39 Global Step: 824650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:06:54,195-Speed 2496.28 samples/sec Loss 1.0805 LearningRate 0.000000 Epoch: 39 Global Step: 824660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:07:02,401-Speed 2496.26 samples/sec Loss 1.0765 LearningRate 0.000000 Epoch: 39 Global Step: 824670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:07:10,604-Speed 2497.00 samples/sec Loss 1.0260 LearningRate 0.000000 Epoch: 39 Global Step: 824680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:07:18,806-Speed 2497.17 samples/sec Loss 1.0567 LearningRate 0.000000 Epoch: 39 Global Step: 824690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:07:27,011-Speed 2496.35 samples/sec Loss 1.0219 LearningRate 0.000000 Epoch: 39 Global Step: 824700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:07:35,163-Speed 2512.65 samples/sec Loss 1.0255 LearningRate 0.000000 Epoch: 39 Global Step: 824710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:07:43,368-Speed 2496.46 samples/sec Loss 1.0379 LearningRate 0.000000 Epoch: 39 Global Step: 824720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:07:51,577-Speed 2495.10 samples/sec Loss 1.0622 LearningRate 0.000000 Epoch: 39 Global Step: 824730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-07-13 11:07:59,783-Speed 2496.38 samples/sec Loss 1.0601 LearningRate 
0.000000 Epoch: 39 Global Step: 824740 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:08:07,988-Speed 2496.31 samples/sec Loss 1.0788 LearningRate 0.000000 Epoch: 39 Global Step: 824750 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:08:16,192-Speed 2496.78 samples/sec Loss 1.0868 LearningRate 0.000000 Epoch: 39 Global Step: 824760 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:08:24,342-Speed 2513.22 samples/sec Loss 1.0550 LearningRate 0.000000 Epoch: 39 Global Step: 824770 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:08:32,544-Speed 2497.43 samples/sec Loss 1.0373 LearningRate 0.000000 Epoch: 39 Global Step: 824780 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:08:40,746-Speed 2497.36 samples/sec Loss 1.0507 LearningRate 0.000000 Epoch: 39 Global Step: 824790 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:08:48,954-Speed 2495.24 samples/sec Loss 1.0564 LearningRate 0.000000 Epoch: 39 Global Step: 824800 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:08:57,165-Speed 2494.84 samples/sec Loss 1.0693 LearningRate 0.000000 Epoch: 39 Global Step: 824810 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:09:05,366-Speed 2497.43 samples/sec Loss 1.0528 LearningRate 0.000000 Epoch: 39 Global Step: 824820 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:09:13,520-Speed 2512.15 samples/sec Loss 1.0515 LearningRate 0.000000 Epoch: 39 Global Step: 824830 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:09:21,737-Speed 2492.90 samples/sec Loss 1.0505 LearningRate 0.000000 Epoch: 39 Global Step: 824840 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:09:29,938-Speed 2497.68 samples/sec Loss 1.0389 LearningRate 0.000000 Epoch: 39 Global Step: 824850 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:09:38,142-Speed 2496.74 samples/sec Loss 1.0519 LearningRate 0.000000 Epoch: 39 Global Step: 824860 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:09:46,345-Speed 2497.04 samples/sec Loss 1.0660 LearningRate 0.000000 Epoch: 39 Global Step: 824870 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:09:54,549-Speed 2496.60 samples/sec Loss 1.0652 LearningRate 0.000000 Epoch: 39 Global Step: 824880 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:10:02,699-Speed 2513.28 samples/sec Loss 1.0621 LearningRate 0.000000 Epoch: 39 Global Step: 824890 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:10:10,901-Speed 2497.36 samples/sec Loss 1.0576 LearningRate 0.000000 Epoch: 39 Global Step: 824900 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:10:19,104-Speed 2496.93 samples/sec Loss 1.0924 LearningRate 0.000000 Epoch: 39 Global Step: 824910 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:10:27,307-Speed 2497.11 samples/sec Loss 1.0635 LearningRate 0.000000 Epoch: 39 Global Step: 824920 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:10:35,512-Speed 2496.83 samples/sec Loss 1.0615 LearningRate 0.000000 Epoch: 39 Global Step: 824930 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:10:43,717-Speed 2496.48 samples/sec Loss 1.0967 LearningRate 0.000000 Epoch: 39 Global Step: 824940 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:10:51,866-Speed 2513.59 samples/sec Loss 1.0696 LearningRate 0.000000 Epoch: 39 Global Step: 824950 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:11:00,072-Speed 2496.34 samples/sec Loss 1.0647 LearningRate 0.000000 Epoch: 39 Global Step: 824960 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:11:08,293-Speed 2491.98 samples/sec Loss 1.0482 LearningRate 0.000000 Epoch: 39 Global Step: 824970 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:11:16,495-Speed 2497.24 samples/sec Loss 1.0573 LearningRate 0.000000 Epoch: 39 Global Step: 824980 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:11:24,703-Speed 2495.43 samples/sec Loss 1.0484 LearningRate 0.000000 Epoch: 39 Global Step: 824990 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:11:32,909-Speed 2496.39 samples/sec Loss 1.0471 LearningRate 0.000000 Epoch: 39 Global Step: 825000 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:11:41,059-Speed 2513.38 samples/sec Loss 1.0718 LearningRate 0.000000 Epoch: 39 Global Step: 825010 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:11:49,264-Speed 2496.23 samples/sec Loss 1.0474 LearningRate 0.000000 Epoch: 39 Global Step: 825020 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:11:57,468-Speed 2496.66 samples/sec Loss 1.0766 LearningRate 0.000000 Epoch: 39 Global Step: 825030 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:12:05,675-Speed 2496.04 samples/sec Loss 1.0492 LearningRate 0.000000 Epoch: 39 Global Step: 825040 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:12:13,885-Speed 2494.98 samples/sec Loss 1.0444 LearningRate 0.000000 Epoch: 39 Global Step: 825050 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:12:22,090-Speed 2496.33 samples/sec Loss 1.0650 LearningRate 0.000000 Epoch: 39 Global Step: 825060 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:12:30,244-Speed 2512.73 samples/sec Loss 1.0808 LearningRate 0.000000 Epoch: 39 Global Step: 825070 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:12:38,452-Speed 2495.95 samples/sec Loss 1.0456 LearningRate 0.000000 Epoch: 39 Global Step: 825080 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:12:46,673-Speed 2491.42 samples/sec Loss 1.0836 LearningRate 0.000000 Epoch: 39 Global Step: 825090 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:12:54,882-Speed 2495.35 samples/sec Loss 1.0697 LearningRate 0.000000 Epoch: 39 Global Step: 825100 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:13:03,087-Speed 2496.27 samples/sec Loss 1.0688 LearningRate 0.000000 Epoch: 39 Global Step: 825110 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:13:11,293-Speed 2496.17 samples/sec Loss 1.0586 LearningRate 0.000000 Epoch: 39 Global Step: 825120 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:13:19,447-Speed 2512.19 samples/sec Loss 1.0623 LearningRate 0.000000 Epoch: 39 Global Step: 825130 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:13:27,651-Speed 2496.75 samples/sec Loss 1.0491 LearningRate 0.000000 Epoch: 39 Global Step: 825140 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:13:35,861-Speed 2494.88 samples/sec Loss 1.0598 LearningRate 0.000000 Epoch: 39 Global Step: 825150 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:13:44,067-Speed 2495.92 samples/sec Loss 1.0438 LearningRate 0.000000 Epoch: 39 Global Step: 825160 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:13:52,273-Speed 2496.35 samples/sec Loss 1.1037 LearningRate 0.000000 Epoch: 39 Global Step: 825170 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:14:00,475-Speed 2497.03 samples/sec Loss 1.0628 LearningRate 0.000000 Epoch: 39 Global Step: 825180 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:14:08,629-Speed 2512.21 samples/sec Loss 1.0785 LearningRate 0.000000 Epoch: 39 Global Step: 825190 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:14:16,833-Speed 2497.06 samples/sec Loss 1.0727 LearningRate 0.000000 Epoch: 39 Global Step: 825200 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:14:25,041-Speed 2495.66 samples/sec Loss 1.0721 LearningRate 0.000000 Epoch: 39 Global Step: 825210 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:14:33,244-Speed 2496.87 samples/sec Loss 1.0840 LearningRate 0.000000 Epoch: 39 Global Step: 825220 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:14:41,448-Speed 2496.67 samples/sec Loss 1.0837 LearningRate 0.000000 Epoch: 39 Global Step: 825230 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:14:49,653-Speed 2496.63 samples/sec Loss 1.0624 LearningRate 0.000000 Epoch: 39 Global Step: 825240 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:14:57,803-Speed 2513.49 samples/sec Loss 1.0622 LearningRate 0.000000 Epoch: 39 Global Step: 825250 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:15:06,009-Speed 2495.96 samples/sec Loss 1.0691 LearningRate 0.000000 Epoch: 39 Global Step: 825260 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:15:14,215-Speed 2496.01 samples/sec Loss 1.0552 LearningRate 0.000000 Epoch: 39 Global Step: 825270 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:15:22,421-Speed 2496.37 samples/sec Loss 1.0479 LearningRate 0.000000 Epoch: 39 Global Step: 825280 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:15:30,627-Speed 2496.30 samples/sec Loss 1.0756 LearningRate 0.000000 Epoch: 39 Global Step: 825290 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:15:38,833-Speed 2495.77 samples/sec Loss 1.1059 LearningRate 0.000000 Epoch: 39 Global Step: 825300 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:15:46,986-Speed 2512.62 samples/sec Loss 1.0538 LearningRate 0.000000 Epoch: 39 Global Step: 825310 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:15:55,191-Speed 2496.76 samples/sec Loss 1.0598 LearningRate 0.000000 Epoch: 39 Global Step: 825320 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:16:03,397-Speed 2496.16 samples/sec Loss 1.0561 LearningRate 0.000000 Epoch: 39 Global Step: 825330 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:16:11,606-Speed 2495.10 samples/sec Loss 1.0203 LearningRate 0.000000 Epoch: 39 Global Step: 825340 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:16:19,827-Speed 2491.90 samples/sec Loss 1.0228 LearningRate 0.000000 Epoch: 39 Global Step: 825350 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:16:28,040-Speed 2494.06 samples/sec Loss 1.0764 LearningRate 0.000000 Epoch: 39 Global Step: 825360 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:16:36,195-Speed 2511.64 samples/sec Loss 1.0764 LearningRate 0.000000 Epoch: 39 Global Step: 825370 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:16:44,400-Speed 2496.13 samples/sec Loss 1.0578 LearningRate 0.000000 Epoch: 39 Global Step: 825380 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:16:52,613-Speed 2494.69 samples/sec Loss 1.0606 LearningRate 0.000000 Epoch: 39 Global Step: 825390 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:17:00,833-Speed 2491.64 samples/sec Loss 1.0701 LearningRate 0.000000 Epoch: 39 Global Step: 825400 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:17:09,042-Speed 2495.42 samples/sec Loss 1.0619 LearningRate 0.000000 Epoch: 39 Global Step: 825410 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:17:17,247-Speed 2496.30 samples/sec Loss 1.0557 LearningRate 0.000000 Epoch: 39 Global Step: 825420 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:17:25,415-Speed 2508.08 samples/sec Loss 1.0598 LearningRate 0.000000 Epoch: 39 Global Step: 825430 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:17:33,623-Speed 2495.40 samples/sec Loss 1.0530 LearningRate 0.000000 Epoch: 39 Global Step: 825440 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:17:41,827-Speed 2496.75 samples/sec Loss 1.0866 LearningRate 0.000000 Epoch: 39 Global Step: 825450 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:17:50,031-Speed 2496.78 samples/sec Loss 1.0451 LearningRate 0.000000 Epoch: 39 Global Step: 825460 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:17:58,239-Speed 2495.61 samples/sec Loss 1.0490 LearningRate 0.000000 Epoch: 39 Global Step: 825470 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:18:06,445-Speed 2496.13 samples/sec Loss 1.0582 LearningRate 0.000000 Epoch: 39 Global Step: 825480 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:18:14,602-Speed 2511.13 samples/sec Loss 1.0532 LearningRate 0.000000 Epoch: 39 Global Step: 825490 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:18:22,809-Speed 2495.77 samples/sec Loss 1.0877 LearningRate 0.000000 Epoch: 39 Global Step: 825500 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:18:31,011-Speed 2497.18 samples/sec Loss 1.0697 LearningRate 0.000000 Epoch: 39 Global Step: 825510 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:18:39,217-Speed 2496.17 samples/sec Loss 1.0241 LearningRate 0.000000 Epoch: 39 Global Step: 825520 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:18:47,426-Speed 2495.20 samples/sec Loss 1.0642 LearningRate 0.000000 Epoch: 39 Global Step: 825530 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:18:55,627-Speed 2497.69 samples/sec Loss 1.1049 LearningRate 0.000000 Epoch: 39 Global Step: 825540 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:19:03,780-Speed 2512.41 samples/sec Loss 1.0672 LearningRate 0.000000 Epoch: 39 Global Step: 825550 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:19:11,984-Speed 2496.52 samples/sec Loss 1.0736 LearningRate 0.000000 Epoch: 39 Global Step: 825560 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:19:20,191-Speed 2495.98 samples/sec Loss 1.0896 LearningRate 0.000000 Epoch: 39 Global Step: 825570 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:19:28,399-Speed 2495.46 samples/sec Loss 1.0546 LearningRate 0.000000 Epoch: 39 Global Step: 825580 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:19:36,605-Speed 2496.30 samples/sec Loss 1.0471 LearningRate 0.000000 Epoch: 39 Global Step: 825590 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:19:44,813-Speed 2495.45 samples/sec Loss 1.0895 LearningRate 0.000000 Epoch: 39 Global Step: 825600 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:19:52,970-Speed 2511.32 samples/sec Loss 1.0581 LearningRate 0.000000 Epoch: 39 Global Step: 825610 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:20:01,176-Speed 2496.65 samples/sec Loss 1.0587 LearningRate 0.000000 Epoch: 39 Global Step: 825620 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:20:09,380-Speed 2496.67 samples/sec Loss 1.0496 LearningRate 0.000000 Epoch: 39 Global Step: 825630 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:20:17,586-Speed 2496.28 samples/sec Loss 1.0502 LearningRate 0.000000 Epoch: 39 Global Step: 825640 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:20:25,790-Speed 2496.78 samples/sec Loss 1.0687 LearningRate 0.000000 Epoch: 39 Global Step: 825650 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:20:33,994-Speed 2496.93 samples/sec Loss 1.0697 LearningRate 0.000000 Epoch: 39 Global Step: 825660 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:20:42,145-Speed 2512.62 samples/sec Loss 1.0922 LearningRate 0.000000 Epoch: 39 Global Step: 825670 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:20:50,351-Speed 2496.25 samples/sec Loss 1.0306 LearningRate 0.000000 Epoch: 39 Global Step: 825680 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:20:58,557-Speed 2496.00 samples/sec Loss 1.0378 LearningRate 0.000000 Epoch: 39 Global Step: 825690 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:21:06,769-Speed 2494.61 samples/sec Loss 1.0676 LearningRate 0.000000 Epoch: 39 Global Step: 825700 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:21:14,974-Speed 2496.45 samples/sec Loss 1.0621 LearningRate 0.000000 Epoch: 39 Global Step: 825710 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:21:23,179-Speed 2496.22 samples/sec Loss 1.0636 LearningRate 0.000000 Epoch: 39 Global Step: 825720 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:21:31,330-Speed 2513.27 samples/sec Loss 1.0729 LearningRate 0.000000 Epoch: 39 Global Step: 825730 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:21:39,532-Speed 2497.21 samples/sec Loss 1.0524 LearningRate 0.000000 Epoch: 39 Global Step: 825740 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:21:47,737-Speed 2496.55 samples/sec Loss 1.0389 LearningRate 0.000000 Epoch: 39 Global Step: 825750 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:21:55,939-Speed 2497.38 samples/sec Loss 1.0284 LearningRate 0.000000 Epoch: 39 Global Step: 825760 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-07-13 11:22:04,102-Speed 2509.04 samples/sec Loss 1.0694 LearningRate 0.000000 Epoch: 39 Global Step: 825770 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:22:12,303-Speed 2497.74 samples/sec Loss 1.0574 LearningRate 0.000000 Epoch: 39 Global Step: 825780 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:22:20,457-Speed 2512.34 samples/sec Loss 1.0802 LearningRate 0.000000 Epoch: 39 Global Step: 825790 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:22:28,675-Speed 2492.39 samples/sec Loss 1.0310 LearningRate 0.000000 Epoch: 39 Global Step: 825800 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:22:36,879-Speed 2496.86 samples/sec Loss 1.0477 LearningRate 0.000000 Epoch: 39 Global Step: 825810 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:22:45,089-Speed 2494.90 samples/sec Loss 1.0534 LearningRate 0.000000 Epoch: 39 Global Step: 825820 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:22:53,305-Speed 2493.10 samples/sec Loss 1.0793 LearningRate 0.000000 Epoch: 39 Global Step: 825830 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:23:01,508-Speed 2497.08 samples/sec Loss 1.0864 LearningRate 0.000000 Epoch: 39 Global Step: 825840 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:23:09,658-Speed 2513.27 samples/sec Loss 1.0432 LearningRate 0.000000 Epoch: 39 Global Step: 825850 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:23:17,864-Speed 2496.08 samples/sec Loss 1.0326 LearningRate 0.000000 Epoch: 39 Global Step: 825860 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:23:26,071-Speed 2495.88 samples/sec Loss 1.0259 LearningRate 0.000000 Epoch: 39 Global Step: 825870 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:23:34,273-Speed 2497.40 samples/sec Loss 1.0327 LearningRate 0.000000 Epoch: 39 Global Step: 825880 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:23:42,476-Speed 2497.26 samples/sec Loss 1.0463 LearningRate 0.000000 Epoch: 39 Global Step: 825890 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:23:50,690-Speed 2493.88 samples/sec Loss 1.0787 LearningRate 0.000000 Epoch: 39 Global Step: 825900 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:23:58,843-Speed 2512.11 samples/sec Loss 1.0652 LearningRate 0.000000 Epoch: 39 Global Step: 825910 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:24:07,053-Speed 2495.11 samples/sec Loss 1.0571 LearningRate 0.000000 Epoch: 39 Global Step: 825920 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:24:15,253-Speed 2497.95 samples/sec Loss 1.0779 LearningRate 0.000000 Epoch: 39 Global Step: 825930 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:24:23,458-Speed 2496.39 samples/sec Loss 1.0429 LearningRate 0.000000 Epoch: 39 Global Step: 825940 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:24:31,661-Speed 2497.07 samples/sec Loss 1.0690 LearningRate 0.000000 Epoch: 39 Global Step: 825950 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:24:39,877-Speed 2493.16 samples/sec Loss 1.0618 LearningRate 0.000000 Epoch: 39 Global Step: 825960 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:24:48,048-Speed 2506.54 samples/sec Loss 1.0674 LearningRate 0.000000 Epoch: 39 Global Step: 825970 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:24:56,252-Speed 2496.94 samples/sec Loss 1.0403 LearningRate 0.000000 Epoch: 39 Global Step: 825980 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:25:04,458-Speed 2496.18 samples/sec Loss 1.0605 LearningRate 0.000000 Epoch: 39 Global Step: 825990 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:25:12,673-Speed 2493.72 samples/sec Loss 1.0662 LearningRate 0.000000 Epoch: 39 Global Step: 826000 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:25:20,890-Speed 2492.80 samples/sec Loss 1.0400 LearningRate 0.000000 Epoch: 39 Global Step: 826010 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:25:29,096-Speed 2495.95 samples/sec Loss 1.0467 LearningRate 0.000000 Epoch: 39 Global Step: 826020 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:25:37,243-Speed 2514.07 samples/sec Loss 1.0486 LearningRate 0.000000 Epoch: 39 Global Step: 826030 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:25:45,443-Speed 2498.00 samples/sec Loss 1.0675 LearningRate 0.000000 Epoch: 39 Global Step: 826040 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:25:53,653-Speed 2495.26 samples/sec Loss 1.0738 LearningRate 0.000000 Epoch: 39 Global Step: 826050 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:26:01,858-Speed 2496.26 samples/sec Loss 1.0554 LearningRate 0.000000 Epoch: 39 Global Step: 826060 Fp16 Grad Scale: 16384 Required: 1 hours
Training: 2022-07-13 11:26:10,017-Speed 2510.61 samples/sec Loss 1.0421 LearningRate 0.000000 Epoch: 39 Global Step: 826070 Fp16 Grad Scale: 8192 Required: 1 hours
Training: 2022-07-13 11:26:18,230-Speed 2493.99 samples/sec Loss 1.0576 LearningRate 0.000000 Epoch: 39 Global Step: 826080 Fp16 Grad Scale: 8192 Required: 1 hours
Training: 2022-07-13 11:26:26,382-Speed 2512.71 samples/sec Loss 1.0558 LearningRate 0.000000 Epoch: 39 Global Step: 826090 Fp16 Grad Scale: 8192 Required: 1 hours
Training: 2022-07-13 11:26:34,584-Speed 2497.17 samples/sec Loss 1.0667 LearningRate 0.000000 Epoch: 39 Global Step: 826100 Fp16 Grad Scale: 8192 Required: 1 hours
Training: 2022-07-13 11:26:42,788-Speed 2496.81 samples/sec Loss 1.0446 LearningRate 0.000000 Epoch: 39 Global Step: 826110 Fp16 Grad Scale: 8192 Required: 1 hours
Training: 2022-07-13 11:26:50,993-Speed 2496.61 samples/sec Loss 1.0803 LearningRate 0.000000 Epoch: 39 Global Step: 826120 Fp16 Grad Scale: 8192 Required: 1 hours
Training: 2022-07-13 11:26:59,198-Speed 2496.60 samples/sec Loss 1.0598 LearningRate 0.000000 Epoch: 39 Global Step: 826130 Fp16 Grad Scale: 8192 Required: 1 hours
Training: 2022-07-13 11:27:07,418-Speed 2492.07 samples/sec Loss 1.0727 LearningRate 0.000000 Epoch: 39 Global Step: 826140 Fp16 Grad Scale: 8192 Required: 1 hours
Training: 2022-07-13 11:27:15,567-Speed 2513.32 samples/sec Loss 1.0623 LearningRate 0.000000 Epoch: 39 Global Step: 826150 Fp16 Grad Scale: 8192 Required: 1 hours
Training: 2022-07-13 11:27:23,774-Speed 2496.04 samples/sec Loss 1.0359 LearningRate 0.000000 Epoch: 39 Global Step: 826160 Fp16 Grad Scale: 8192 Required: 1 hours
Training: 2022-07-13 11:27:31,935-Speed 2510.02 samples/sec Loss 1.0478 LearningRate 0.000000 Epoch: 39 Global Step: 826170 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:27:40,138-Speed 2497.04 samples/sec Loss 1.0590 LearningRate 0.000000 Epoch: 39 Global Step: 826180 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:27:48,340-Speed 2497.13 samples/sec Loss 1.0647 LearningRate 0.000000 Epoch: 39 Global Step: 826190 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:27:56,543-Speed 2497.32 samples/sec Loss 1.0392 LearningRate 0.000000 Epoch: 39 Global Step: 826200 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:28:04,691-Speed 2513.88 samples/sec Loss 1.0299 LearningRate 0.000000 Epoch: 39 Global Step: 826210 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:28:12,893-Speed 2497.27 samples/sec Loss 1.0634 LearningRate 0.000000 Epoch: 39 Global Step: 826220 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:28:21,093-Speed 2497.86 samples/sec Loss 1.0534 LearningRate 0.000000 Epoch: 39 Global Step: 826230 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:28:29,294-Speed 2497.60 samples/sec Loss 1.0666 LearningRate 0.000000 Epoch: 39 Global Step: 826240 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:28:37,496-Speed 2497.24 samples/sec Loss 1.0572 LearningRate 0.000000 Epoch: 39 Global Step: 826250 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:28:45,703-Speed 2495.81 samples/sec Loss 1.0665 LearningRate 0.000000 Epoch: 39 Global Step: 826260 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:28:53,850-Speed 2514.48 samples/sec Loss 1.0641 LearningRate 0.000000 Epoch: 39 Global Step: 826270 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:29:02,049-Speed 2498.53 samples/sec Loss 1.0270 LearningRate 0.000000 Epoch: 39 Global Step: 826280 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:29:10,250-Speed 2497.69 samples/sec Loss 1.0382 LearningRate 0.000000 Epoch: 39 Global Step: 826290 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:29:18,450-Speed 2497.97 samples/sec Loss 1.1032 LearningRate 0.000000 Epoch: 39 Global Step: 826300 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:29:26,655-Speed 2496.75 samples/sec Loss 1.0669 LearningRate 0.000000 Epoch: 39 Global Step: 826310 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:29:34,859-Speed 2496.83 samples/sec Loss 1.0417 LearningRate 0.000000 Epoch: 39 Global Step: 826320 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:29:43,007-Speed 2513.89 samples/sec Loss 1.0900 LearningRate 0.000000 Epoch: 39 Global Step: 826330 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:29:51,210-Speed 2497.33 samples/sec Loss 1.0554 LearningRate 0.000000 Epoch: 39 Global Step: 826340 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:29:59,413-Speed 2496.90 samples/sec Loss 1.0690 LearningRate 0.000000 Epoch: 39 Global Step: 826350 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:30:07,615-Speed 2497.31 samples/sec Loss 1.0488 LearningRate 0.000000 Epoch: 39 Global Step: 826360 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:30:15,818-Speed 2497.14 samples/sec Loss 1.0270 LearningRate 0.000000 Epoch: 39 Global Step: 826370 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:30:24,021-Speed 2497.08 samples/sec Loss 1.0553 LearningRate 0.000000 Epoch: 39 Global Step: 826380 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:30:32,171-Speed 2513.35 samples/sec Loss 1.0464 LearningRate 0.000000 Epoch: 39 Global Step: 826390 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:30:40,380-Speed 2495.72 samples/sec Loss 1.0336 LearningRate 0.000000 Epoch: 39 Global Step: 826400 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:30:48,585-Speed 2496.32 samples/sec Loss 1.0654 LearningRate 0.000000 Epoch: 39 Global Step: 826410 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:30:56,799-Speed 2493.59 samples/sec Loss 1.0517 LearningRate 0.000000 Epoch: 39 Global Step: 826420 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:31:05,004-Speed 2496.83 samples/sec Loss 1.0558 LearningRate 0.000000 Epoch: 39 Global Step: 826430 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:31:13,210-Speed 2495.85 samples/sec Loss 1.0792 LearningRate 0.000000 Epoch: 39 Global Step: 826440 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:31:21,371-Speed 2510.11 samples/sec Loss 1.0529 LearningRate 0.000000 Epoch: 39 Global Step: 826450 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:31:29,590-Speed 2491.89 samples/sec Loss 1.0504 LearningRate 0.000000 Epoch: 39 Global Step: 826460 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:31:37,795-Speed 2496.85 samples/sec Loss 1.0714 LearningRate 0.000000 Epoch: 39 Global Step: 826470 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:31:45,998-Speed 2497.26 samples/sec Loss 1.0666 LearningRate 0.000000 Epoch: 39 Global Step: 826480 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:31:54,214-Speed 2492.85 samples/sec Loss 1.0810 LearningRate 0.000000 Epoch: 39 Global Step: 826490 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:32:02,417-Speed 2497.40 samples/sec Loss 1.0318 LearningRate 0.000000 Epoch: 39 Global Step: 826500 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:32:10,568-Speed 2512.88 samples/sec Loss 1.0601 LearningRate 0.000000 Epoch: 39 Global Step: 826510 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:32:18,770-Speed 2497.45 samples/sec Loss 1.0599 LearningRate 0.000000 Epoch: 39 Global Step: 826520 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:32:26,977-Speed 2496.23 samples/sec Loss 1.0224 LearningRate 0.000000 Epoch: 39 Global Step: 826530 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:32:35,178-Speed 2497.42 samples/sec Loss 1.0567 LearningRate 0.000000 Epoch: 39 Global Step: 826540 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:32:43,381-Speed 2497.21 samples/sec Loss 1.0684 LearningRate 0.000000 Epoch: 39 Global Step: 826550 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:32:51,583-Speed 2497.23 samples/sec Loss 1.0592 LearningRate 0.000000 Epoch: 39 Global Step: 826560 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:32:59,734-Speed 2513.05 samples/sec Loss 1.0599 LearningRate 0.000000 Epoch: 39 Global Step: 826570 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:33:07,943-Speed 2495.37 samples/sec Loss 1.0488 LearningRate 0.000000 Epoch: 39 Global Step: 826580 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:33:16,150-Speed 2495.65 samples/sec Loss 1.0505 LearningRate 0.000000 Epoch: 39 Global Step: 826590 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:33:24,353-Speed 2497.01 samples/sec Loss 1.0348 LearningRate 0.000000 Epoch: 39 Global Step: 826600 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:33:32,559-Speed 2496.37 samples/sec Loss 1.0733 LearningRate 0.000000 Epoch: 39 Global Step: 826610 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:33:40,762-Speed 2497.09 samples/sec Loss 1.0510 LearningRate 0.000000 Epoch: 39 Global Step: 826620 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:33:48,910-Speed 2514.00 samples/sec Loss 1.0746 LearningRate 0.000000 Epoch: 39 Global Step: 826630 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:33:57,116-Speed 2495.93 samples/sec Loss 1.0813 LearningRate 0.000000 Epoch: 39 Global Step: 826640 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:34:05,319-Speed 2497.52 samples/sec Loss 1.0729 LearningRate 0.000000 Epoch: 39 Global Step: 826650 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:34:13,520-Speed 2497.43 samples/sec Loss 1.0494 LearningRate 0.000000 Epoch: 39 Global Step: 826660 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:34:21,740-Speed 2492.14 samples/sec Loss 1.0569 LearningRate 0.000000 Epoch: 39 Global Step: 826670 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:34:29,942-Speed 2497.16 samples/sec Loss 1.0757 LearningRate 0.000000 Epoch: 39 Global Step: 826680 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:34:38,093-Speed 2512.80 samples/sec Loss 1.0502 LearningRate 0.000000 Epoch: 39 Global Step: 826690 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:34:46,297-Speed 2497.10 samples/sec Loss 1.0536 LearningRate 0.000000 Epoch: 39 Global Step: 826700 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:34:54,499-Speed 2497.33 samples/sec Loss 1.0520 LearningRate 0.000000 Epoch: 39 Global Step: 826710 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:35:02,703-Speed 2496.65 samples/sec Loss 1.0391 LearningRate 0.000000 Epoch: 39 Global Step: 826720 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:35:10,909-Speed 2496.02 samples/sec Loss 1.0671 LearningRate 0.000000 Epoch: 39 Global Step: 826730 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:35:19,111-Speed 2497.65 samples/sec Loss 1.0710 LearningRate 0.000000 Epoch: 39 Global Step: 826740 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:35:27,263-Speed 2512.67 samples/sec Loss 1.0356 LearningRate 0.000000 Epoch: 39 Global Step: 826750 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:35:35,468-Speed 2496.58 samples/sec Loss 1.0645 LearningRate 0.000000 Epoch: 39 Global Step: 826760 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:35:43,678-Speed 2495.03 samples/sec Loss 1.0193 LearningRate 0.000000 Epoch: 39 Global Step: 826770 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:35:51,884-Speed 2496.10 samples/sec Loss 1.0637 LearningRate 0.000000 Epoch: 39 Global Step: 826780 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:36:00,088-Speed 2496.61 samples/sec Loss 1.0719 LearningRate 0.000000 Epoch: 39 Global Step: 826790 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:36:08,290-Speed 2497.36 samples/sec Loss 1.0787 LearningRate 0.000000 Epoch: 39 Global Step: 826800 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:36:16,437-Speed 2514.30 samples/sec Loss 1.0471 LearningRate 0.000000 Epoch: 39 Global Step: 826810 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:36:24,644-Speed 2496.26 samples/sec Loss 1.0845 LearningRate 0.000000 Epoch: 39 Global Step: 826820 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:36:32,850-Speed 2495.94 samples/sec Loss 1.0611 LearningRate 0.000000 Epoch: 39 Global Step: 826830 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:36:41,055-Speed 2496.63 samples/sec Loss 1.0508 LearningRate 0.000000 Epoch: 39 Global Step: 826840 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:36:49,258-Speed 2496.98 samples/sec Loss 1.0849 LearningRate 0.000000 Epoch: 39 Global Step: 826850 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:36:57,464-Speed 2495.97 samples/sec Loss 1.0682 LearningRate 0.000000 Epoch: 39 Global Step: 826860 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:37:05,618-Speed 2512.23 samples/sec Loss 1.0434 LearningRate 0.000000 Epoch: 39 Global Step: 826870 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:37:13,820-Speed 2497.48 samples/sec Loss 1.0611 LearningRate 0.000000 Epoch: 39 Global Step: 826880 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:37:22,025-Speed 2496.36 samples/sec Loss 1.0791 LearningRate 0.000000 Epoch: 39 Global Step: 826890 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:37:30,231-Speed 2496.39 samples/sec Loss 1.0898 LearningRate 0.000000 Epoch: 39 Global Step: 826900 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:37:38,442-Speed 2494.61 samples/sec Loss 1.0554 LearningRate 0.000000 Epoch: 39 Global Step: 826910 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:37:46,646-Speed 2496.49 samples/sec Loss 1.0365 LearningRate 0.000000 Epoch: 39 Global Step: 826920 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:37:54,804-Speed 2511.14 samples/sec Loss 1.0638 LearningRate 0.000000 Epoch: 39 Global Step: 826930 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:38:03,008-Speed 2496.57 samples/sec Loss 1.0530 LearningRate 0.000000 Epoch: 39 Global Step: 826940 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:38:11,212-Speed 2496.99 samples/sec Loss 1.0509 LearningRate 0.000000 Epoch: 39 Global Step: 826950 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:38:19,417-Speed 2496.44 samples/sec Loss 1.0623 LearningRate 0.000000 Epoch: 39 Global Step: 826960 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:38:27,623-Speed 2496.08 samples/sec Loss 1.0513 LearningRate 0.000000 Epoch: 39 Global Step: 826970 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:38:35,824-Speed 2497.82 samples/sec Loss 1.0813 LearningRate 0.000000 Epoch: 39 Global Step: 826980 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:38:43,974-Speed 2513.40 samples/sec Loss 1.0665 LearningRate 0.000000 Epoch: 39 Global Step: 826990 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:38:52,174-Speed 2498.03 samples/sec Loss 1.0543 LearningRate 0.000000 Epoch: 39 Global Step: 827000 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:39:00,375-Speed 2497.48 samples/sec Loss 1.0589 LearningRate 0.000000 Epoch: 39 Global Step: 827010 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:39:08,580-Speed 2496.57 samples/sec Loss 1.0732 LearningRate 0.000000 Epoch: 39 Global Step: 827020 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:39:16,782-Speed 2497.40 samples/sec Loss 1.0503 LearningRate 0.000000 Epoch: 39 Global Step: 827030 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:39:24,987-Speed 2496.27 samples/sec Loss 1.0556 LearningRate 0.000000 Epoch: 39 Global Step: 827040 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:39:33,141-Speed 2512.33 samples/sec Loss 1.0517 LearningRate 0.000000 Epoch: 39 Global Step: 827050 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:39:41,347-Speed 2496.10 samples/sec Loss 1.0546 LearningRate 0.000000 Epoch: 39 Global Step: 827060 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:39:49,551-Speed 2496.83 samples/sec Loss 1.0429 LearningRate 0.000000 Epoch: 39 Global Step: 827070 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:39:57,776-Speed 2490.37 samples/sec Loss 1.0459 LearningRate 0.000000 Epoch: 39 Global Step: 827080 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:40:05,993-Speed 2492.64 samples/sec Loss 1.0731 LearningRate 0.000000 Epoch: 39 Global Step: 827090 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:40:14,196-Speed 2496.80 samples/sec Loss 1.0823 LearningRate 0.000000 Epoch: 39 Global Step: 827100 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:40:22,356-Speed 2510.27 samples/sec Loss 1.0478 LearningRate 0.000000 Epoch: 39 Global Step: 827110 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:40:30,564-Speed 2495.68 samples/sec Loss 1.0533 LearningRate 0.000000 Epoch: 39 Global Step: 827120 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:40:38,769-Speed 2496.44 samples/sec Loss 1.0547 LearningRate 0.000000 Epoch: 39 Global Step: 827130 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:40:46,972-Speed 2496.90 samples/sec Loss 1.0607 LearningRate 0.000000 Epoch: 39 Global Step: 827140 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:40:55,181-Speed 2495.12 samples/sec Loss 1.0598 LearningRate 0.000000 Epoch: 39 Global Step: 827150 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13 11:41:03,386-Speed 2496.55 samples/sec Loss 1.0693 LearningRate 0.000000 Epoch: 39 Global Step: 827160 Fp16 Grad Scale: 4096 Required: 1 hours
Training: 2022-07-13
11:41:11,548-Speed 2509.68 samples/sec Loss 1.0643 LearningRate 0.000000 Epoch: 39 Global Step: 827170 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:41:19,749-Speed 2497.66 samples/sec Loss 1.0523 LearningRate 0.000000 Epoch: 39 Global Step: 827180 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:41:27,959-Speed 2494.78 samples/sec Loss 1.0410 LearningRate 0.000000 Epoch: 39 Global Step: 827190 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:41:36,171-Speed 2494.22 samples/sec Loss 1.0666 LearningRate 0.000000 Epoch: 39 Global Step: 827200 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:41:44,380-Speed 2495.41 samples/sec Loss 1.0671 LearningRate 0.000000 Epoch: 39 Global Step: 827210 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:41:52,584-Speed 2496.73 samples/sec Loss 1.0503 LearningRate 0.000000 Epoch: 39 Global Step: 827220 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:42:00,735-Speed 2512.99 samples/sec Loss 1.0592 LearningRate 0.000000 Epoch: 39 Global Step: 827230 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:42:08,939-Speed 2496.65 samples/sec Loss 1.0897 LearningRate 0.000000 Epoch: 39 Global Step: 827240 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:42:17,146-Speed 2495.74 samples/sec Loss 1.0619 LearningRate 0.000000 Epoch: 39 Global Step: 827250 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:42:25,354-Speed 2495.63 samples/sec Loss 1.0698 LearningRate 0.000000 Epoch: 39 Global Step: 827260 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:42:33,556-Speed 2497.35 samples/sec Loss 1.0362 LearningRate 0.000000 Epoch: 39 Global Step: 827270 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:42:41,762-Speed 2496.03 samples/sec Loss 1.0691 LearningRate 0.000000 Epoch: 39 Global Step: 827280 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:42:49,915-Speed 2512.39 
samples/sec Loss 1.0853 LearningRate 0.000000 Epoch: 39 Global Step: 827290 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:42:58,120-Speed 2496.34 samples/sec Loss 1.0594 LearningRate 0.000000 Epoch: 39 Global Step: 827300 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:43:06,325-Speed 2496.40 samples/sec Loss 1.0338 LearningRate 0.000000 Epoch: 39 Global Step: 827310 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:43:14,532-Speed 2496.19 samples/sec Loss 1.0712 LearningRate 0.000000 Epoch: 39 Global Step: 827320 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:43:22,735-Speed 2496.68 samples/sec Loss 1.0435 LearningRate 0.000000 Epoch: 39 Global Step: 827330 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:43:30,945-Speed 2495.04 samples/sec Loss 1.0497 LearningRate 0.000000 Epoch: 39 Global Step: 827340 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:43:39,094-Speed 2513.50 samples/sec Loss 1.0533 LearningRate 0.000000 Epoch: 39 Global Step: 827350 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:43:47,299-Speed 2496.53 samples/sec Loss 1.0593 LearningRate 0.000000 Epoch: 39 Global Step: 827360 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-07-13 11:43:55,500-Speed 2497.45 samples/sec Loss 1.0639 LearningRate 0.000000 Epoch: 39 Global Step: 827370 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-07-13 11:44:03,700-Speed 2497.96 samples/sec Loss 1.0674 LearningRate 0.000000 Epoch: 39 Global Step: 827380 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:44:11,915-Speed 2493.63 samples/sec Loss 1.0525 LearningRate 0.000000 Epoch: 39 Global Step: 827390 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:44:20,122-Speed 2495.78 samples/sec Loss 1.0501 LearningRate 0.000000 Epoch: 39 Global Step: 827400 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:44:28,272-Speed 2513.24 samples/sec Loss 1.0711 
LearningRate 0.000000 Epoch: 39 Global Step: 827410 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:44:36,479-Speed 2496.08 samples/sec Loss 1.0571 LearningRate 0.000000 Epoch: 39 Global Step: 827420 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:44:44,680-Speed 2497.64 samples/sec Loss 1.0356 LearningRate 0.000000 Epoch: 39 Global Step: 827430 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:44:52,885-Speed 2496.37 samples/sec Loss 1.0704 LearningRate 0.000000 Epoch: 39 Global Step: 827440 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:45:01,088-Speed 2497.23 samples/sec Loss 1.0189 LearningRate 0.000000 Epoch: 39 Global Step: 827450 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:45:09,291-Speed 2496.93 samples/sec Loss 1.0576 LearningRate 0.000000 Epoch: 39 Global Step: 827460 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:45:17,441-Speed 2513.41 samples/sec Loss 1.0423 LearningRate 0.000000 Epoch: 39 Global Step: 827470 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:45:25,642-Speed 2497.39 samples/sec Loss 1.0539 LearningRate 0.000000 Epoch: 39 Global Step: 827480 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:45:33,850-Speed 2495.76 samples/sec Loss 1.0633 LearningRate 0.000000 Epoch: 39 Global Step: 827490 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:45:42,056-Speed 2496.20 samples/sec Loss 1.0901 LearningRate 0.000000 Epoch: 39 Global Step: 827500 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:45:50,262-Speed 2496.23 samples/sec Loss 1.0612 LearningRate 0.000000 Epoch: 39 Global Step: 827510 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:45:58,475-Speed 2494.04 samples/sec Loss 1.0638 LearningRate 0.000000 Epoch: 39 Global Step: 827520 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:46:06,625-Speed 2513.28 samples/sec Loss 1.0759 LearningRate 0.000000 Epoch: 39 
Global Step: 827530 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:46:14,831-Speed 2496.15 samples/sec Loss 1.0524 LearningRate 0.000000 Epoch: 39 Global Step: 827540 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:46:23,042-Speed 2494.55 samples/sec Loss 1.0540 LearningRate 0.000000 Epoch: 39 Global Step: 827550 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:46:31,247-Speed 2496.59 samples/sec Loss 1.0674 LearningRate 0.000000 Epoch: 39 Global Step: 827560 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:46:39,454-Speed 2495.73 samples/sec Loss 1.0766 LearningRate 0.000000 Epoch: 39 Global Step: 827570 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:46:47,661-Speed 2496.04 samples/sec Loss 1.0751 LearningRate 0.000000 Epoch: 39 Global Step: 827580 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:46:55,811-Speed 2513.32 samples/sec Loss 1.0601 LearningRate 0.000000 Epoch: 39 Global Step: 827590 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:47:04,018-Speed 2495.90 samples/sec Loss 1.0614 LearningRate 0.000000 Epoch: 39 Global Step: 827600 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:47:12,220-Speed 2497.39 samples/sec Loss 1.0675 LearningRate 0.000000 Epoch: 39 Global Step: 827610 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:47:20,429-Speed 2495.14 samples/sec Loss 1.0696 LearningRate 0.000000 Epoch: 39 Global Step: 827620 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:47:28,638-Speed 2495.49 samples/sec Loss 1.0496 LearningRate 0.000000 Epoch: 39 Global Step: 827630 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:47:36,840-Speed 2497.83 samples/sec Loss 1.0906 LearningRate 0.000000 Epoch: 39 Global Step: 827640 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:47:44,992-Speed 2512.72 samples/sec Loss 1.0547 LearningRate 0.000000 Epoch: 39 Global Step: 827650 Fp16 Grad 
Scale: 8192 Required: 0 hours Training: 2022-07-13 11:47:53,196-Speed 2496.91 samples/sec Loss 1.0485 LearningRate 0.000000 Epoch: 39 Global Step: 827660 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:48:01,401-Speed 2496.26 samples/sec Loss 1.0390 LearningRate 0.000000 Epoch: 39 Global Step: 827670 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:48:09,609-Speed 2495.54 samples/sec Loss 1.0610 LearningRate 0.000000 Epoch: 39 Global Step: 827680 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:48:17,814-Speed 2496.44 samples/sec Loss 1.0196 LearningRate 0.000000 Epoch: 39 Global Step: 827690 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:48:26,018-Speed 2496.94 samples/sec Loss 1.0480 LearningRate 0.000000 Epoch: 39 Global Step: 827700 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:48:34,171-Speed 2512.26 samples/sec Loss 1.0597 LearningRate 0.000000 Epoch: 39 Global Step: 827710 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:48:42,374-Speed 2497.33 samples/sec Loss 1.0593 LearningRate 0.000000 Epoch: 39 Global Step: 827720 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:48:50,587-Speed 2493.89 samples/sec Loss 1.0553 LearningRate 0.000000 Epoch: 39 Global Step: 827730 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:48:58,790-Speed 2497.03 samples/sec Loss 1.0668 LearningRate 0.000000 Epoch: 39 Global Step: 827740 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:49:07,011-Speed 2491.69 samples/sec Loss 1.0420 LearningRate 0.000000 Epoch: 39 Global Step: 827750 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:49:15,216-Speed 2496.31 samples/sec Loss 1.0508 LearningRate 0.000000 Epoch: 39 Global Step: 827760 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:49:23,364-Speed 2513.96 samples/sec Loss 1.0848 LearningRate 0.000000 Epoch: 39 Global Step: 827770 Fp16 Grad Scale: 8192 Required: 0 hours 
Training: 2022-07-13 11:49:31,582-Speed 2492.59 samples/sec Loss 1.0463 LearningRate 0.000000 Epoch: 39 Global Step: 827780 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:49:39,788-Speed 2496.19 samples/sec Loss 1.0399 LearningRate 0.000000 Epoch: 39 Global Step: 827790 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:49:47,992-Speed 2497.17 samples/sec Loss 1.0393 LearningRate 0.000000 Epoch: 39 Global Step: 827800 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:49:56,194-Speed 2497.32 samples/sec Loss 1.0635 LearningRate 0.000000 Epoch: 39 Global Step: 827810 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:50:04,402-Speed 2495.42 samples/sec Loss 1.0796 LearningRate 0.000000 Epoch: 39 Global Step: 827820 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:50:12,553-Speed 2513.32 samples/sec Loss 1.0477 LearningRate 0.000000 Epoch: 39 Global Step: 827830 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:50:20,786-Speed 2487.82 samples/sec Loss 1.0735 LearningRate 0.000000 Epoch: 39 Global Step: 827840 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:50:28,988-Speed 2497.45 samples/sec Loss 1.0496 LearningRate 0.000000 Epoch: 39 Global Step: 827850 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:50:37,191-Speed 2497.16 samples/sec Loss 1.0610 LearningRate 0.000000 Epoch: 39 Global Step: 827860 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:50:45,394-Speed 2497.44 samples/sec Loss 1.0855 LearningRate 0.000000 Epoch: 39 Global Step: 827870 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:50:53,609-Speed 2493.37 samples/sec Loss 1.0536 LearningRate 0.000000 Epoch: 39 Global Step: 827880 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:51:01,760-Speed 2512.89 samples/sec Loss 1.0399 LearningRate 0.000000 Epoch: 39 Global Step: 827890 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 
11:51:09,964-Speed 2496.64 samples/sec Loss 1.0277 LearningRate 0.000000 Epoch: 39 Global Step: 827900 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:51:18,168-Speed 2497.02 samples/sec Loss 1.0669 LearningRate 0.000000 Epoch: 39 Global Step: 827910 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:51:26,379-Speed 2494.30 samples/sec Loss 1.0523 LearningRate 0.000000 Epoch: 39 Global Step: 827920 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:51:34,608-Speed 2489.48 samples/sec Loss 1.0487 LearningRate 0.000000 Epoch: 39 Global Step: 827930 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:51:42,811-Speed 2497.01 samples/sec Loss 1.0599 LearningRate 0.000000 Epoch: 39 Global Step: 827940 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:51:50,963-Speed 2512.92 samples/sec Loss 1.0529 LearningRate 0.000000 Epoch: 39 Global Step: 827950 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:51:59,167-Speed 2496.65 samples/sec Loss 1.0764 LearningRate 0.000000 Epoch: 39 Global Step: 827960 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:52:07,371-Speed 2496.59 samples/sec Loss 1.0448 LearningRate 0.000000 Epoch: 39 Global Step: 827970 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:52:15,575-Speed 2496.89 samples/sec Loss 1.0596 LearningRate 0.000000 Epoch: 39 Global Step: 827980 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:52:23,778-Speed 2497.02 samples/sec Loss 1.0546 LearningRate 0.000000 Epoch: 39 Global Step: 827990 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:52:31,990-Speed 2494.15 samples/sec Loss 1.0491 LearningRate 0.000000 Epoch: 39 Global Step: 828000 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:52:40,138-Speed 2514.08 samples/sec Loss 1.0880 LearningRate 0.000000 Epoch: 39 Global Step: 828010 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:52:48,341-Speed 2497.09 
samples/sec Loss 1.0587 LearningRate 0.000000 Epoch: 39 Global Step: 828020 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:52:56,545-Speed 2496.52 samples/sec Loss 1.0271 LearningRate 0.000000 Epoch: 39 Global Step: 828030 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:53:04,747-Speed 2497.36 samples/sec Loss 1.0617 LearningRate 0.000000 Epoch: 39 Global Step: 828040 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:53:12,951-Speed 2496.67 samples/sec Loss 1.1022 LearningRate 0.000000 Epoch: 39 Global Step: 828050 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:53:21,160-Speed 2495.33 samples/sec Loss 1.0420 LearningRate 0.000000 Epoch: 39 Global Step: 828060 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:53:29,312-Speed 2513.09 samples/sec Loss 1.0507 LearningRate 0.000000 Epoch: 39 Global Step: 828070 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:53:37,519-Speed 2495.75 samples/sec Loss 1.0845 LearningRate 0.000000 Epoch: 39 Global Step: 828080 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:53:45,719-Speed 2497.84 samples/sec Loss 1.0493 LearningRate 0.000000 Epoch: 39 Global Step: 828090 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:53:53,920-Speed 2497.74 samples/sec Loss 1.0388 LearningRate 0.000000 Epoch: 39 Global Step: 828100 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:54:02,124-Speed 2497.03 samples/sec Loss 1.0150 LearningRate 0.000000 Epoch: 39 Global Step: 828110 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:54:10,339-Speed 2493.14 samples/sec Loss 1.0751 LearningRate 0.000000 Epoch: 39 Global Step: 828120 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:54:18,493-Speed 2512.14 samples/sec Loss 1.0576 LearningRate 0.000000 Epoch: 39 Global Step: 828130 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:54:26,700-Speed 2496.17 samples/sec Loss 1.0230 
LearningRate 0.000000 Epoch: 39 Global Step: 828140 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:54:34,913-Speed 2494.05 samples/sec Loss 1.0735 LearningRate 0.000000 Epoch: 39 Global Step: 828150 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:54:43,125-Speed 2494.10 samples/sec Loss 1.0754 LearningRate 0.000000 Epoch: 39 Global Step: 828160 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:54:51,337-Speed 2494.58 samples/sec Loss 1.0765 LearningRate 0.000000 Epoch: 39 Global Step: 828170 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:54:59,541-Speed 2496.87 samples/sec Loss 1.0798 LearningRate 0.000000 Epoch: 39 Global Step: 828180 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:55:07,694-Speed 2512.48 samples/sec Loss 1.0333 LearningRate 0.000000 Epoch: 39 Global Step: 828190 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:55:15,897-Speed 2496.91 samples/sec Loss 1.0649 LearningRate 0.000000 Epoch: 39 Global Step: 828200 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:55:24,099-Speed 2497.38 samples/sec Loss 1.0651 LearningRate 0.000000 Epoch: 39 Global Step: 828210 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:55:32,307-Speed 2495.76 samples/sec Loss 1.0608 LearningRate 0.000000 Epoch: 39 Global Step: 828220 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:55:40,510-Speed 2496.93 samples/sec Loss 1.0444 LearningRate 0.000000 Epoch: 39 Global Step: 828230 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:55:48,714-Speed 2496.63 samples/sec Loss 1.0287 LearningRate 0.000000 Epoch: 39 Global Step: 828240 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:55:56,871-Speed 2511.24 samples/sec Loss 1.0801 LearningRate 0.000000 Epoch: 39 Global Step: 828250 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:56:05,086-Speed 2493.52 samples/sec Loss 1.0515 LearningRate 0.000000 Epoch: 39 
Global Step: 828260 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:56:13,291-Speed 2496.36 samples/sec Loss 1.0597 LearningRate 0.000000 Epoch: 39 Global Step: 828270 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:56:21,494-Speed 2497.05 samples/sec Loss 1.0406 LearningRate 0.000000 Epoch: 39 Global Step: 828280 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:56:29,702-Speed 2495.44 samples/sec Loss 1.0612 LearningRate 0.000000 Epoch: 39 Global Step: 828290 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:56:37,910-Speed 2495.88 samples/sec Loss 1.0458 LearningRate 0.000000 Epoch: 39 Global Step: 828300 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:56:46,057-Speed 2514.10 samples/sec Loss 1.0605 LearningRate 0.000000 Epoch: 39 Global Step: 828310 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:56:54,260-Speed 2497.04 samples/sec Loss 1.0637 LearningRate 0.000000 Epoch: 39 Global Step: 828320 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:57:02,461-Speed 2497.63 samples/sec Loss 1.0448 LearningRate 0.000000 Epoch: 39 Global Step: 828330 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:57:10,667-Speed 2496.87 samples/sec Loss 1.0563 LearningRate 0.000000 Epoch: 39 Global Step: 828340 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:57:18,868-Speed 2497.43 samples/sec Loss 1.0358 LearningRate 0.000000 Epoch: 39 Global Step: 828350 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:57:27,070-Speed 2497.42 samples/sec Loss 1.0864 LearningRate 0.000000 Epoch: 39 Global Step: 828360 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:57:35,224-Speed 2512.16 samples/sec Loss 1.0585 LearningRate 0.000000 Epoch: 39 Global Step: 828370 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:57:43,440-Speed 2493.00 samples/sec Loss 1.0487 LearningRate 0.000000 Epoch: 39 Global Step: 828380 Fp16 Grad 
Scale: 8192 Required: 0 hours Training: 2022-07-13 11:57:51,644-Speed 2496.86 samples/sec Loss 1.0585 LearningRate 0.000000 Epoch: 39 Global Step: 828390 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:57:59,861-Speed 2492.91 samples/sec Loss 1.0632 LearningRate 0.000000 Epoch: 39 Global Step: 828400 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:58:08,061-Speed 2497.73 samples/sec Loss 1.0523 LearningRate 0.000000 Epoch: 39 Global Step: 828410 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:58:16,261-Speed 2497.78 samples/sec Loss 1.0460 LearningRate 0.000000 Epoch: 39 Global Step: 828420 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:58:24,416-Speed 2511.79 samples/sec Loss 1.0557 LearningRate 0.000000 Epoch: 39 Global Step: 828430 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:58:32,624-Speed 2495.70 samples/sec Loss 1.0667 LearningRate 0.000000 Epoch: 39 Global Step: 828440 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:58:40,837-Speed 2494.13 samples/sec Loss 1.0198 LearningRate 0.000000 Epoch: 39 Global Step: 828450 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:58:49,053-Speed 2492.97 samples/sec Loss 1.0945 LearningRate 0.000000 Epoch: 39 Global Step: 828460 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:58:57,255-Speed 2497.51 samples/sec Loss 1.0608 LearningRate 0.000000 Epoch: 39 Global Step: 828470 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:59:05,457-Speed 2497.21 samples/sec Loss 1.0602 LearningRate 0.000000 Epoch: 39 Global Step: 828480 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:59:13,610-Speed 2512.37 samples/sec Loss 1.0446 LearningRate 0.000000 Epoch: 39 Global Step: 828490 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:59:21,812-Speed 2497.23 samples/sec Loss 1.0700 LearningRate 0.000000 Epoch: 39 Global Step: 828500 Fp16 Grad Scale: 8192 Required: 0 hours 
Training: 2022-07-13 11:59:30,019-Speed 2495.90 samples/sec Loss 1.0600 LearningRate 0.000000 Epoch: 39 Global Step: 828510 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:59:38,220-Speed 2497.82 samples/sec Loss 1.0496 LearningRate 0.000000 Epoch: 39 Global Step: 828520 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:59:46,422-Speed 2497.32 samples/sec Loss 1.0239 LearningRate 0.000000 Epoch: 39 Global Step: 828530 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 11:59:54,625-Speed 2497.24 samples/sec Loss 1.0371 LearningRate 0.000000 Epoch: 39 Global Step: 828540 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 12:00:02,776-Speed 2512.93 samples/sec Loss 1.0496 LearningRate 0.000000 Epoch: 39 Global Step: 828550 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 12:00:10,982-Speed 2496.16 samples/sec Loss 1.0528 LearningRate 0.000000 Epoch: 39 Global Step: 828560 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-07-13 12:00:19,183-Speed 2497.83 samples/sec Loss 1.0582 LearningRate 0.000000 Epoch: 39 Global Step: 828570 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:00:27,386-Speed 2497.03 samples/sec Loss 1.1000 LearningRate 0.000000 Epoch: 39 Global Step: 828580 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:00:35,589-Speed 2496.96 samples/sec Loss 1.0903 LearningRate 0.000000 Epoch: 39 Global Step: 828590 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:00:43,794-Speed 2496.56 samples/sec Loss 1.0780 LearningRate 0.000000 Epoch: 39 Global Step: 828600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:00:51,943-Speed 2513.42 samples/sec Loss 1.0425 LearningRate 0.000000 Epoch: 39 Global Step: 828610 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:01:00,146-Speed 2497.10 samples/sec Loss 1.0373 LearningRate 0.000000 Epoch: 39 Global Step: 828620 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 
12:01:08,356-Speed 2494.88 samples/sec Loss 1.0781 LearningRate 0.000000 Epoch: 39 Global Step: 828630 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:01:16,560-Speed 2496.74 samples/sec Loss 1.0365 LearningRate 0.000000 Epoch: 39 Global Step: 828640 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:01:24,762-Speed 2497.58 samples/sec Loss 1.0341 LearningRate 0.000000 Epoch: 39 Global Step: 828650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:01:32,964-Speed 2497.40 samples/sec Loss 1.0832 LearningRate 0.000000 Epoch: 39 Global Step: 828660 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:01:41,117-Speed 2512.54 samples/sec Loss 1.0526 LearningRate 0.000000 Epoch: 39 Global Step: 828670 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:01:49,321-Speed 2496.81 samples/sec Loss 1.0258 LearningRate 0.000000 Epoch: 39 Global Step: 828680 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:01:57,526-Speed 2496.43 samples/sec Loss 1.0307 LearningRate 0.000000 Epoch: 39 Global Step: 828690 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:02:05,730-Speed 2496.77 samples/sec Loss 1.0675 LearningRate 0.000000 Epoch: 39 Global Step: 828700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:02:13,931-Speed 2497.67 samples/sec Loss 1.0606 LearningRate 0.000000 Epoch: 39 Global Step: 828710 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:02:22,136-Speed 2496.55 samples/sec Loss 1.0746 LearningRate 0.000000 Epoch: 39 Global Step: 828720 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:02:30,284-Speed 2513.60 samples/sec Loss 1.0862 LearningRate 0.000000 Epoch: 39 Global Step: 828730 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:02:38,487-Speed 2497.15 samples/sec Loss 1.0554 LearningRate 0.000000 Epoch: 39 Global Step: 828740 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:02:46,690-Speed 
2497.03 samples/sec Loss 1.0449 LearningRate 0.000000 Epoch: 39 Global Step: 828750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:02:54,890-Speed 2497.91 samples/sec Loss 1.0701 LearningRate 0.000000 Epoch: 39 Global Step: 828760 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:03:03,096-Speed 2496.28 samples/sec Loss 1.0638 LearningRate 0.000000 Epoch: 39 Global Step: 828770 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:03:11,305-Speed 2495.08 samples/sec Loss 1.0863 LearningRate 0.000000 Epoch: 39 Global Step: 828780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:03:19,454-Speed 2513.63 samples/sec Loss 1.0594 LearningRate 0.000000 Epoch: 39 Global Step: 828790 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:03:27,659-Speed 2496.27 samples/sec Loss 1.0530 LearningRate 0.000000 Epoch: 39 Global Step: 828800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:03:35,861-Speed 2497.36 samples/sec Loss 1.0706 LearningRate 0.000000 Epoch: 39 Global Step: 828810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:03:44,062-Speed 2497.79 samples/sec Loss 1.0890 LearningRate 0.000000 Epoch: 39 Global Step: 828820 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:03:52,277-Speed 2493.52 samples/sec Loss 1.0501 LearningRate 0.000000 Epoch: 39 Global Step: 828830 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:04:00,483-Speed 2496.10 samples/sec Loss 1.0459 LearningRate 0.000000 Epoch: 39 Global Step: 828840 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:04:08,643-Speed 2510.24 samples/sec Loss 1.0457 LearningRate 0.000000 Epoch: 39 Global Step: 828850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:04:16,851-Speed 2495.49 samples/sec Loss 1.0538 LearningRate 0.000000 Epoch: 39 Global Step: 828860 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:04:25,056-Speed 2496.37 samples/sec 
Loss 1.0547 LearningRate 0.000000 Epoch: 39 Global Step: 828870 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:04:33,262-Speed 2496.11 samples/sec Loss 1.0517 LearningRate 0.000000 Epoch: 39 Global Step: 828880 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:04:41,469-Speed 2496.01 samples/sec Loss 1.0447 LearningRate 0.000000 Epoch: 39 Global Step: 828890 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:04:49,676-Speed 2495.70 samples/sec Loss 1.0776 LearningRate 0.000000 Epoch: 39 Global Step: 828900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:04:57,829-Speed 2512.59 samples/sec Loss 1.0599 LearningRate 0.000000 Epoch: 39 Global Step: 828910 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:05:06,037-Speed 2495.58 samples/sec Loss 1.0402 LearningRate 0.000000 Epoch: 39 Global Step: 828920 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:05:14,242-Speed 2496.29 samples/sec Loss 1.0666 LearningRate 0.000000 Epoch: 39 Global Step: 828930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:05:22,451-Speed 2495.15 samples/sec Loss 1.0577 LearningRate 0.000000 Epoch: 39 Global Step: 828940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:05:30,658-Speed 2495.85 samples/sec Loss 1.0734 LearningRate 0.000000 Epoch: 39 Global Step: 828950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:05:38,866-Speed 2495.95 samples/sec Loss 1.0927 LearningRate 0.000000 Epoch: 39 Global Step: 828960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:05:47,017-Speed 2512.96 samples/sec Loss 1.0675 LearningRate 0.000000 Epoch: 39 Global Step: 828970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:05:55,224-Speed 2496.15 samples/sec Loss 1.0723 LearningRate 0.000000 Epoch: 39 Global Step: 828980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:06:03,431-Speed 2495.88 samples/sec Loss 1.0707 
LearningRate 0.000000 Epoch: 39 Global Step: 828990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:06:11,639-Speed 2495.58 samples/sec Loss 1.0694 LearningRate 0.000000 Epoch: 39 Global Step: 829000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:06:19,845-Speed 2496.25 samples/sec Loss 1.0479 LearningRate 0.000000 Epoch: 39 Global Step: 829010 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:06:28,055-Speed 2494.59 samples/sec Loss 1.0737 LearningRate 0.000000 Epoch: 39 Global Step: 829020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:06:36,208-Speed 2512.76 samples/sec Loss 1.0633 LearningRate 0.000000 Epoch: 39 Global Step: 829030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:06:44,412-Speed 2496.73 samples/sec Loss 1.0635 LearningRate 0.000000 Epoch: 39 Global Step: 829040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:06:52,617-Speed 2496.38 samples/sec Loss 1.0611 LearningRate 0.000000 Epoch: 39 Global Step: 829050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:07:00,817-Speed 2497.87 samples/sec Loss 1.0637 LearningRate 0.000000 Epoch: 39 Global Step: 829060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:07:09,023-Speed 2496.05 samples/sec Loss 1.0843 LearningRate 0.000000 Epoch: 39 Global Step: 829070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:07:17,241-Speed 2492.58 samples/sec Loss 1.0242 LearningRate 0.000000 Epoch: 39 Global Step: 829080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:07:25,398-Speed 2511.48 samples/sec Loss 1.0585 LearningRate 0.000000 Epoch: 39 Global Step: 829090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:07:33,606-Speed 2495.49 samples/sec Loss 1.0737 LearningRate 0.000000 Epoch: 39 Global Step: 829100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:07:41,812-Speed 2495.93 samples/sec Loss 1.0534 LearningRate 
0.000000 Epoch: 39 Global Step: 829110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:07:50,018-Speed 2496.34 samples/sec Loss 1.0632 LearningRate 0.000000 Epoch: 39 Global Step: 829120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:07:58,223-Speed 2496.24 samples/sec Loss 1.0651 LearningRate 0.000000 Epoch: 39 Global Step: 829130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:08:06,427-Speed 2497.11 samples/sec Loss 1.0583 LearningRate 0.000000 Epoch: 39 Global Step: 829140 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:08:14,618-Speed 2500.62 samples/sec Loss 1.0662 LearningRate 0.000000 Epoch: 39 Global Step: 829150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:08:22,831-Speed 2493.90 samples/sec Loss 1.0399 LearningRate 0.000000 Epoch: 39 Global Step: 829160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:08:31,037-Speed 2496.18 samples/sec Loss 1.0495 LearningRate 0.000000 Epoch: 39 Global Step: 829170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:08:39,253-Speed 2493.13 samples/sec Loss 1.0601 LearningRate 0.000000 Epoch: 39 Global Step: 829180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:08:47,465-Speed 2494.35 samples/sec Loss 1.0834 LearningRate 0.000000 Epoch: 39 Global Step: 829190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:08:55,672-Speed 2495.51 samples/sec Loss 1.0644 LearningRate 0.000000 Epoch: 39 Global Step: 829200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:09:03,825-Speed 2512.51 samples/sec Loss 1.1003 LearningRate 0.000000 Epoch: 39 Global Step: 829210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:09:12,032-Speed 2496.10 samples/sec Loss 1.0751 LearningRate 0.000000 Epoch: 39 Global Step: 829220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:09:20,238-Speed 2495.94 samples/sec Loss 1.0651 LearningRate 0.000000 Epoch: 39 
Global Step: 829230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:09:28,444-Speed 2496.25 samples/sec Loss 1.0629 LearningRate 0.000000 Epoch: 39 Global Step: 829240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:09:36,653-Speed 2495.40 samples/sec Loss 1.0992 LearningRate 0.000000 Epoch: 39 Global Step: 829250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:09:44,860-Speed 2495.90 samples/sec Loss 1.0767 LearningRate 0.000000 Epoch: 39 Global Step: 829260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:09:53,014-Speed 2512.25 samples/sec Loss 1.0705 LearningRate 0.000000 Epoch: 39 Global Step: 829270 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:10:01,226-Speed 2494.09 samples/sec Loss 1.0796 LearningRate 0.000000 Epoch: 39 Global Step: 829280 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:10:09,431-Speed 2496.64 samples/sec Loss 1.0664 LearningRate 0.000000 Epoch: 39 Global Step: 829290 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:10:17,637-Speed 2496.33 samples/sec Loss 1.0731 LearningRate 0.000000 Epoch: 39 Global Step: 829300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:10:25,844-Speed 2495.68 samples/sec Loss 1.0560 LearningRate 0.000000 Epoch: 39 Global Step: 829310 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:10:34,059-Speed 2493.46 samples/sec Loss 1.0486 LearningRate 0.000000 Epoch: 39 Global Step: 829320 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:10:42,212-Speed 2512.39 samples/sec Loss 1.0562 LearningRate 0.000000 Epoch: 39 Global Step: 829330 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:10:50,421-Speed 2495.38 samples/sec Loss 1.0626 LearningRate 0.000000 Epoch: 39 Global Step: 829340 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:10:58,624-Speed 2496.94 samples/sec Loss 1.0497 LearningRate 0.000000 Epoch: 39 Global Step: 829350 
Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:11:06,833-Speed 2495.51 samples/sec Loss 1.0569 LearningRate 0.000000 Epoch: 39 Global Step: 829360 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:11:15,043-Speed 2494.78 samples/sec Loss 1.0502 LearningRate 0.000000 Epoch: 39 Global Step: 829370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:11:23,252-Speed 2495.23 samples/sec Loss 1.1003 LearningRate 0.000000 Epoch: 39 Global Step: 829380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:11:31,403-Speed 2513.08 samples/sec Loss 1.0705 LearningRate 0.000000 Epoch: 39 Global Step: 829390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:11:39,605-Speed 2497.33 samples/sec Loss 1.0593 LearningRate 0.000000 Epoch: 39 Global Step: 829400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:11:47,820-Speed 2493.67 samples/sec Loss 1.0595 LearningRate 0.000000 Epoch: 39 Global Step: 829410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:11:56,055-Speed 2487.55 samples/sec Loss 1.0647 LearningRate 0.000000 Epoch: 39 Global Step: 829420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:12:04,277-Speed 2491.10 samples/sec Loss 1.0759 LearningRate 0.000000 Epoch: 39 Global Step: 829430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:12:12,487-Speed 2494.82 samples/sec Loss 1.0749 LearningRate 0.000000 Epoch: 39 Global Step: 829440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:12:20,638-Speed 2512.95 samples/sec Loss 1.0544 LearningRate 0.000000 Epoch: 39 Global Step: 829450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:12:28,844-Speed 2496.14 samples/sec Loss 1.0715 LearningRate 0.000000 Epoch: 39 Global Step: 829460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:12:37,051-Speed 2495.91 samples/sec Loss 1.0476 LearningRate 0.000000 Epoch: 39 Global Step: 829470 Fp16 Grad Scale: 
16384 Required: 0 hours Training: 2022-07-13 12:12:45,270-Speed 2492.40 samples/sec Loss 1.0660 LearningRate 0.000000 Epoch: 39 Global Step: 829480 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:12:53,490-Speed 2491.79 samples/sec Loss 1.0482 LearningRate 0.000000 Epoch: 39 Global Step: 829490 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:13:01,696-Speed 2496.61 samples/sec Loss 1.0706 LearningRate 0.000000 Epoch: 39 Global Step: 829500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:13:09,849-Speed 2512.44 samples/sec Loss 1.0878 LearningRate 0.000000 Epoch: 39 Global Step: 829510 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:13:18,056-Speed 2496.09 samples/sec Loss 1.0610 LearningRate 0.000000 Epoch: 39 Global Step: 829520 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:13:26,262-Speed 2496.10 samples/sec Loss 1.0340 LearningRate 0.000000 Epoch: 39 Global Step: 829530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:13:34,478-Speed 2493.35 samples/sec Loss 1.0624 LearningRate 0.000000 Epoch: 39 Global Step: 829540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:13:42,684-Speed 2496.08 samples/sec Loss 1.0691 LearningRate 0.000000 Epoch: 39 Global Step: 829550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-07-13 12:13:50,891-Speed 2495.86 samples/sec Loss 1.0680 LearningRate 0.000000 Epoch: 39 Global Step: 829560 Fp16 Grad Scale: 16384 Required: -0 hours Training: 2022-07-13 12:13:59,045-Speed 2512.01 samples/sec Loss 1.0489 LearningRate 0.000000 Epoch: 39 Global Step: 829570 Fp16 Grad Scale: 16384 Required: -0 hours Training: 2022-07-13 12:14:07,250-Speed 2496.44 samples/sec Loss 1.0596 LearningRate 0.000000 Epoch: 39 Global Step: 829580 Fp16 Grad Scale: 16384 Required: -0 hours